IndexOutOfBoundsException when trying to add more instances to the training set with Weka

I am trying to add more instances to my training set and perform 10-fold cross-validation.

My instances are in String format, so I use the StringToWordVector filter to convert them to numeric. If I don't add the extra pages I want, everything works fine. But when I add the statement trainSet.addAll(data2); and pass trainSet to the filter, I get a strange IndexOutOfBoundsException in the first iteration, at the line Instances fTrainSet = Filter.useFilter(trainSet, filter);

    Instances data = getDataFromFile("pathtofile.arff");    // main dataset, 1821 instances
    Instances data2 = getDataFromFile("anotherpath.arff");  // 709 instances I want to add

    int folds = 10;
    for (int i = 0; i < folds; i++) {
        Instances trainSet = data.trainCV(folds, i);  // training set
        System.out.println(trainSet.numInstances());  // prints 1638
        Instances testSet = data.testCV(folds, i);    // testing set

        // add more instances
        trainSet.addAll(data2);
        System.out.println(trainSet.numInstances());  // prints 2347

        // filter
        StringToWordVector filter = new StringToWordVector();
        filter.setInputFormat(trainSet);
        filter.setWordsToKeep(10000);
        filter.setTFTransform(true);
        filter.setLowerCaseTokens(true);
        filter.setOutputWordCounts(true);
        Stemmer stemmer = new IteratedLovinsStemmer();
        filter.setStemmer(stemmer);
        WordsFromFile stopwords = new WordsFromFile();
        stopwords.setStopwords(new File(".data/stopwords2.txt"));
        filter.setStopwordsHandler(stopwords);

        Instances fTrainSet = Filter.useFilter(trainSet, filter);  // error!!!
        Instances fTestSet = Filter.useFilter(testSet, filter);
        ....
        // classification and evaluation....

When I try to use the filter, I get the following error:

    Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 2161, Size: 1749
        at java.util.ArrayList.rangeCheck(Unknown Source)
        at java.util.ArrayList.get(Unknown Source)
        at weka.core.Attribute.addStringValue(Attribute.java:924)
        at weka.core.StringLocator.copyStringValues(StringLocator.java:150)
        at weka.core.StringLocator.copyStringValues(StringLocator.java:91)
        at weka.filters.Filter.copyValues(Filter.java:399)
        at weka.filters.Filter.bufferInput(Filter.java:342)
        at weka.filters.unsupervised.attribute.StringToWordVector.input(StringToWordVector.java:655)
        at weka.filters.Filter.useFilter(Filter.java:692)
        at CrossValidationExample.main(CrossValidationExample.java:108)

What could be wrong?

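After some searching, I found that the addAll function is the problem. One reason I can think of is that addAll only adds references to the instances, which becomes a problem when I try to use them with the filter. Instead, I used the merge function proposed here: https://stackoverflow.com/a/12359788/3923800. I replaced trainSet.addAll(data2); with Instances newTrainSet = merge(trainSet, data2); and everything works.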
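To illustrate why copying instances between datasets can break while merging by value works, here is a minimal plain-Java sketch. It does not use Weka. It assumes a simplified model in which a string attribute keeps a table of values and each row stores only an index into that table, which appears consistent with the stack trace (Index: 2161, Size: 1749 raised inside Attribute.addStringValue). The names StringTableDemo, Dataset, naiveAppend and mergeByValue are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class StringTableDemo {

    /** Simplified stand-in for a dataset with a single string attribute. */
    static class Dataset {
        final List<String> table = new ArrayList<>();  // distinct string values of the attribute
        final List<Integer> rows = new ArrayList<>();  // each row stores an index into table

        /** Adds a row by value, registering the string in this dataset's own table. */
        void addByValue(String s) {
            int idx = table.indexOf(s);
            if (idx < 0) {
                table.add(s);
                idx = table.size() - 1;
            }
            rows.add(idx);
        }

        /** Resolves a row to its string; throws if the stored index is not in this table. */
        String valueOf(int row) {
            return table.get(rows.get(row));
        }
    }

    /** Naive append: copies raw indices, which still refer to the source's table. */
    static void naiveAppend(Dataset dest, Dataset src) {
        dest.rows.addAll(src.rows);
    }

    /** Merge by value: resolves each string in the source and re-registers it in dest. */
    static void mergeByValue(Dataset dest, Dataset src) {
        for (int r = 0; r < src.rows.size(); r++) {
            dest.addByValue(src.valueOf(r));
        }
    }

    /** Builds a dataset from the given strings. */
    static Dataset of(String... values) {
        Dataset d = new Dataset();
        for (String v : values) {
            d.addByValue(v);
        }
        return d;
    }

    public static void main(String[] args) {
        Dataset extra = of("x", "y", "z");

        Dataset naive = of("a");
        naiveAppend(naive, extra);
        try {
            // Row 3 holds index 2, but naive's table contains only one value.
            naive.valueOf(3);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("naive append failed: " + e.getMessage());
        }

        Dataset merged = of("a");
        mergeByValue(merged, extra);
        System.out.println(merged.valueOf(3));  // prints z
    }
}
```

In this model the naive append can also go wrong silently: row 1 of naive resolves to "a" instead of "x" because index 0 happens to exist in the destination table. In Weka terms, the by-value approach would roughly correspond to reading each string with instance.stringValue(i) and writing it into the destination instance with setValue(i, value), which is what a merge helper like the linked one seems to do. I have not verified this against every Weka version.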