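We fix the random walk length at 20 and vary the number of walks generated from each node, trying 10, 20, and 50. For each setting we generate the Embedding and check the resulting classification performance. The complete code is as follows: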
BatchOperator<?> edges = new UnionBatchOp().linkFrom(
    paper_author.select("paper_id AS source_id, author_id AS target_id"),
    paper_conf.select("paper_id AS source_id, conf_id AS target_id")
);

for (int walkNum : new int[] {10, 20, 50}) {
    edges
        .link(
            new DeepWalkBatchOp()
                .setSourceCol("source_id")
                .setTargetCol("target_id")
                .setIsToUndigraph(true)
                .setVectorSize(100)
                .setWalkLength(20)
                .setWalkNum(walkNum)
                .setNumIter(1)
        )
        .link(
            new AkSinkBatchOp()
                .setFilePath(DATA_DIR + String.valueOf(walkNum) + "_" + DEEPWALK_EMBEDDING)
                .setOverwriteSink(true)
        );
    BatchOperator.execute();

    classifyWithEmbedding(
        new AkSourceBatchOp()
            .setFilePath(DATA_DIR + String.valueOf(walkNum) + "_" + DEEPWALK_EMBEDDING)
    );
}
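The results are summarized below. Overall, every classifier improves as WalkNum increases. For the Softmax classifier, Accuracy rises sharply when WalkNum goes from 10 to 20 (0.5361 to 0.5727), even overtaking KnnClassifier (0.5669).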
| WalkNum | Softmax Accuracy | Softmax Kappa | KnnClassifier Accuracy | KnnClassifier Kappa |
|---|---|---|---|---|
| 10 | 0.5361 | 0.2547 | 0.5595 | 0.3649 |
| 20 | 0.5727 | 0.3555 | 0.5669 | 0.3781 |
| 50 | 0.5752 | 0.3647 | 0.5791 | 0.3999 |
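To make the roles of WalkLength and WalkNum more concrete, here is a minimal plain-Java sketch of uniform random walks on an undirected graph. It is only an illustration and not Alink's DeepWalkBatchOp implementation: the class `RandomWalkSketch`, its methods, the toy edges, and the seed are all hypothetical. The sketch assumes the walk length counts nodes, which may differ from how Alink interprets the parameter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical illustration; not Alink's DeepWalkBatchOp implementation. */
class RandomWalkSketch {

    /** Builds an undirected adjacency list from (source, target) pairs. */
    static Map<String, List<String>> toUndirected(String[][] edges) {
        Map<String, List<String>> adj = new HashMap<>();
        for (String[] e : edges) {
            adj.computeIfAbsent(e[0], k -> new ArrayList<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new ArrayList<>()).add(e[0]);
        }
        return adj;
    }

    /** Starts walkNum uniform random walks from every node; each walk holds walkLength nodes. */
    static List<List<String>> generateWalks(
            Map<String, List<String>> adj, int walkLength, int walkNum, long seed) {
        Random rnd = new Random(seed);
        List<List<String>> walks = new ArrayList<>();
        for (String start : adj.keySet()) {
            for (int i = 0; i < walkNum; i++) {
                List<String> walk = new ArrayList<>();
                String current = start;
                walk.add(current);
                while (walk.size() < walkLength) {
                    // Every node in the undirected graph has at least one neighbor.
                    List<String> neighbors = adj.get(current);
                    current = neighbors.get(rnd.nextInt(neighbors.size()));
                    walk.add(current);
                }
                walks.add(walk);
            }
        }
        return walks;
    }

    public static void main(String[] args) {
        // Toy paper-author and paper-conference edges (made-up data).
        String[][] edges = {
            {"paper1", "authorA"}, {"paper1", "confX"},
            {"paper2", "authorA"}, {"paper2", "confY"}
        };
        Map<String, List<String>> adj = toUndirected(edges);
        for (int walkNum : new int[] {10, 20, 50}) {
            List<List<String>> walks = generateWalks(adj, 20, walkNum, 42L);
            System.out.println("walkNum=" + walkNum + " -> " + walks.size() + " walks");
        }
    }
}
```

In this sketch the number of generated sequences is the node count times walkNum, so raising WalkNum from 10 to 50 gives the embedding training step five times as many node sequences. That is one plausible reading of why larger WalkNum values tend to help in the table above.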
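Next, we modify how the graph is built to see how that affects the Embedding. As the code below shows, we add one more type of edge: edges from authors to conferences.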
BatchOperator<?> edges = new UnionBatchOp().linkFrom(
    paper_author.select("paper_id AS source_id, author_id AS target_id"),
    paper_conf.select("paper_id AS source_id, conf_id AS target_id"),
    new LookupBatchOp()
        .setSelectedCols("paper_id")
        .setOutputCols("target_id")
        .setMapKeyCols("paper_id")
        .setMapValueCols("conf_id")
        .linkFrom(paper_conf, paper_author)
        .select("author_id AS source_id, target_id")
);
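The lookup step above can be read as a key-value join: for each (paper_id, author_id) row, find the conf_id of that paper and emit an (author_id, conf_id) edge. The following plain-Java sketch shows that logic on made-up data. The class `AuthorConfEdgesSketch` and its method are hypothetical, and skipping papers without a conference entry is an assumption of this sketch, not a statement about how LookupBatchOp handles missing keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of the author-to-conference join; not Alink code. */
class AuthorConfEdgesSketch {

    /**
     * paperConf rows are {paper_id, conf_id}; paperAuthor rows are {paper_id, author_id}.
     * Returns {author_id, conf_id} edges.
     */
    static List<String[]> authorToConf(String[][] paperConf, String[][] paperAuthor) {
        // Lookup table: paper_id -> conf_id.
        Map<String, String> confByPaper = new HashMap<>();
        for (String[] row : paperConf) {
            confByPaper.put(row[0], row[1]);
        }
        List<String[]> edges = new ArrayList<>();
        for (String[] row : paperAuthor) {
            String conf = confByPaper.get(row[0]);
            // Assumption of this sketch: rows whose paper has no conference are dropped.
            if (conf != null) {
                edges.add(new String[] {row[1], conf});
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        String[][] paperConf = {{"paper1", "confX"}, {"paper2", "confY"}};
        String[][] paperAuthor = {
            {"paper1", "authorA"}, {"paper2", "authorA"}, {"paper2", "authorB"}
        };
        for (String[] edge : authorToConf(paperConf, paperAuthor)) {
            System.out.println(edge[0] + " -> " + edge[1]);
        }
    }
}
```

With these direct edges, an author node and a conference node become neighbors, whereas in the original graph they are two hops apart through a paper node.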
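The results are summarized below. The most visible change is at WalkNum=10, where both Softmax and KnnClassifier do better than in the previous experiment. The gain is large for Softmax (Accuracy 0.5361 to 0.5669) and small for KnnClassifier (0.5595 to 0.5633). As WalkNum increases further, however, classification performance improves only slightly.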
| WalkNum | Softmax Accuracy | Softmax Kappa | KnnClassifier Accuracy | KnnClassifier Kappa |
|---|---|---|---|---|
| 10 | 0.5669 | 0.3524 | 0.5633 | 0.3717 |
| 20 | 0.5746 | 0.3748 | 0.565 | 0.3836 |
| 50 | 0.5761 | 0.38 | 0.5652 | 0.3936 |
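Both tables report Kappa alongside Accuracy. As a reminder of how the two relate, the sketch below computes Accuracy and Cohen's kappa from a confusion matrix, using kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (the accuracy) and p_e is the agreement expected by chance from the row and column totals. The class `KappaSketch` and the matrix values are hypothetical; this is not the evaluation code behind the numbers above.

```java
/** Hypothetical illustration of Accuracy and Cohen's kappa; not Alink's evaluation code. */
class KappaSketch {

    /** Fraction of samples on the diagonal of the confusion matrix. */
    static double accuracy(long[][] cm) {
        long correct = 0;
        long total = 0;
        for (int i = 0; i < cm.length; i++) {
            for (int j = 0; j < cm[i].length; j++) {
                total += cm[i][j];
                if (i == j) {
                    correct += cm[i][j];
                }
            }
        }
        return (double) correct / total;
    }

    /** Cohen's kappa: (p_o - p_e) / (1 - p_e) for a square confusion matrix. */
    static double kappa(long[][] cm) {
        int k = cm.length;
        long[] rowSum = new long[k];
        long[] colSum = new long[k];
        long total = 0;
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                rowSum[i] += cm[i][j];
                colSum[j] += cm[i][j];
                total += cm[i][j];
            }
        }
        // Chance agreement: sum over classes of P(actual = c) * P(predicted = c).
        double pe = 0.0;
        for (int i = 0; i < k; i++) {
            pe += ((double) rowSum[i] / total) * ((double) colSum[i] / total);
        }
        double po = accuracy(cm);
        return (po - pe) / (1.0 - pe);
    }

    public static void main(String[] args) {
        // Made-up 2-class confusion matrix: rows are actual labels, columns are predictions.
        long[][] cm = {{40, 10}, {20, 30}};
        System.out.printf("Accuracy: %.4f Kappa: %.4f%n", accuracy(cm), kappa(cm));
    }
}
```

For this made-up matrix, p_o is 0.7 and p_e is 0.5, so kappa is 0.4. Because kappa discounts chance agreement, it can move noticeably while Accuracy barely changes, as in the KnnClassifier column of the second table.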