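uniVocity doesn't parse the first column into beans

I'm trying to read CSV files from a GTFS.zip with the help of the uniVocity parsers, and I've run into an issue I can't figure out. For some reason, the first column of some CSV files isn't parsed correctly. For example, the "stops.txt" file looks like this: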

    stop_id,stop_name,stop_lat,stop_lon,location_type,parent_station
    "de:3811:30215:0:6","Freiburg Stübeweg","48.0248455941735","7.85563688037231","","Parent30215"
    "de:8311:30054:0:1","Freiburg Schutternstraße","48.0236251356332","7.72434519425597","","Parent30054"
    "de:8311:30054:0:2","Freiburg Schutternstraße","48.0235446600679","7.72438739944883","","Parent30054"

The "stop_id" field isn't parsed correctly and ends up with the value "null".

This is the method I use to read the file:

    public List readCSV(String path, String file, BeanListProcessor processor) {
        List content = null;
        try {
            // Get zip file
            ZipFile zip = new ZipFile(path);
            // Get CSV file
            ZipEntry entry = zip.getEntry(file);
            InputStream in = zip.getInputStream(entry);
            CsvParserSettings parserSettings = new CsvParserSettings();
            parserSettings.setProcessor(processor);
            parserSettings.setHeaderExtractionEnabled(true);
            CsvParser parser = new CsvParser(parserSettings);
            parser.parse(new InputStreamReader(in));
            content = processor.getBeans();
            zip.close();
            return content;
        } catch (Exception e) {
            e.printStackTrace();
        }
        return content;
    }

And this is what my Stop class looks like:

    public class Stop {
        @Parsed
        private String stop_id;
        @Parsed
        private String stop_name;
        @Parsed
        private String stop_lat;
        @Parsed
        private String stop_lon;
        @Parsed
        private String location_type;
        @Parsed
        private String parent_station;

        public Stop() {
        }

        public Stop(String stop_id, String stop_name, String stop_lat, String stop_lon,
                    String location_type, String parent_station) {
            this.stop_id = stop_id;
            this.stop_name = stop_name;
            this.stop_lat = stop_lat;
            this.stop_lon = stop_lon;
            this.location_type = location_type;
            this.parent_station = parent_station;
        }

        // --------------------- Getter --------------------------------
        public String getStop_id() { return stop_id; }
        public String getStop_name() { return stop_name; }
        public String getStop_lat() { return stop_lat; }
        public String getStop_lon() { return stop_lon; }
        public String getLocation_type() { return location_type; }
        public String getParent_station() { return parent_station; }

        // --------------------- Setter --------------------------------
        public void setStop_id(String stop_id) { this.stop_id = stop_id; }
        public void setStop_name(String stop_name) { this.stop_name = stop_name; }
        public void setStop_lat(String stop_lat) { this.stop_lat = stop_lat; }
        public void setStop_lon(String stop_lon) { this.stop_lon = stop_lon; }
        public void setLocation_type(String location_type) { this.location_type = location_type; }
        public void setParent_station(String parent_station) { this.parent_station = parent_station; }

        @Override
        public String toString() {
            return "Stop [stop_id=" + stop_id + ", stop_name=" + stop_name
                    + ", stop_lat=" + stop_lat + ", stop_lon=" + stop_lon
                    + ", location_type=" + location_type
                    + ", parent_station=" + parent_station + "]";
        }
    }

When I call the method, the output I get is incorrect:

    PartialReading pr = new PartialReading();
    List stops = pr.readCSV("VAGFR.zip", "stops.txt", new BeanListProcessor(Stop.class));
    for (int i = 0; i < 4; i++) {
        System.out.println(stops.get(i).toString());
    }

Output:

    Stop [stop_id=null, stop_name=Freiburg Stübeweg, stop_lat=48.0248455941735, stop_lon=7.85563688037231, location_type=null, parent_station=Parent30215]
    Stop [stop_id=null, stop_name=Freiburg Schutternstraße, stop_lat=48.0236251356332, stop_lon=7.72434519425597, location_type=null, parent_station=Parent30054]
    Stop [stop_id=null, stop_name=Freiburg Schutternstraße, stop_lat=48.0235446600679, stop_lon=7.72438739944883, location_type=null, parent_station=Parent30054]
    Stop [stop_id=null, stop_name=Freiburg Waltershofen Ochsen, stop_lat=48.0220902613143, stop_lon=7.7205756507492, location_type=null, parent_station=Parent30055]

Does anyone know why this happens and how I can fix it? It also happens with the "routes.txt" and "trips.txt" files I tested. This is the GTFS file: http://stadtplan.freiburg.de/sld/VAGFR.zip

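If you print the headers, you will notice that the first column doesn't look right. That's because you are parsing a file encoded in UTF-8 with a BOM marker.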
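To make the mismatch concrete, here is a minimal sketch using only the JDK (the class and method names `BomHeaderDemo`, `withBom` and `firstHeader` are made up for illustration and are not part of uniVocity). It decodes a header line that starts with the UTF-8 BOM bytes and shows that the first header name carries an invisible leading `\uFEFF` character, so it no longer equals "stop_id":

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of uniVocity.
final class BomHeaderDemo {

    // Prepends the three UTF-8 BOM bytes (0xEF 0xBB 0xBF) to the encoded text.
    static byte[] withBom(String text) {
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        byte[] body = text.getBytes(StandardCharsets.UTF_8);
        byte[] result = new byte[bom.length + body.length];
        System.arraycopy(bom, 0, result, 0, bom.length);
        System.arraycopy(body, 0, result, bom.length, body.length);
        return result;
    }

    // Decodes the bytes as UTF-8 and returns the first comma-separated header.
    // Java's UTF-8 decoder keeps the BOM as the character U+FEFF.
    static String firstHeader(byte[] rawLine) {
        String line = new String(rawLine, StandardCharsets.UTF_8);
        return line.split(",")[0];
    }

    public static void main(String[] args) {
        String first = firstHeader(withBom("stop_id,stop_name"));
        System.out.println(first.equals("stop_id"));    // false
        System.out.println(first.length());             // 8, not 7
        System.out.println(first.charAt(0) == '\uFEFF'); // true
    }
}
```

Since the header text differs from the field name by that one invisible character, it is plausible that the `stop_id` field finds no matching column and stays `null`.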

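Basically, the file starts with a few bytes that indicate what the encoding is. Until version 2.5.*, the parser didn't handle that internally, and you had to skip these bytes yourself to get the correct output: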

    //... your code here
    ZipEntry entry = zip.getEntry(file);
    InputStream in = zip.getInputStream(entry);

    // 239, 187, 191 are the UTF-8 BOM bytes (0xEF 0xBB 0xBF)
    if (in.read() == 239 & in.read() == 187 & in.read() == 191) {
        System.out.println("UTF-8 with BOM, bytes discarded");
    }

    CsvParserSettings parserSettings = new CsvParserSettings();
    //...rest of your code here

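The hack above works on any version prior to 2.5.*, but you could also use the BOMInputStream provided by Commons-IO for a more convenient and cleaner way of handling this sort of thing, though it is VERY slow.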
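One caveat with the three-byte check: it always consumes the first three bytes, so a file without a BOM would lose the start of its header. As a rough JDK-only alternative, the sketch below peeks at the first bytes through a `PushbackInputStream` and puts them back when they are not a BOM. The class and method names (`BomSkipper`, `skipUtf8Bom`) are hypothetical, and this only handles the UTF-8 BOM, not UTF-16 or UTF-32 variants:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of uniVocity or Commons-IO.
final class BomSkipper {

    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    // Returns a stream positioned after the UTF-8 BOM if one is present,
    // or at the very beginning of the data otherwise.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];

        // read() may return fewer bytes than requested, so loop until
        // three bytes are available or the stream ends.
        int read = 0;
        while (read < head.length) {
            int n = pushback.read(head, read, head.length - read);
            if (n < 0) {
                break;
            }
            read += n;
        }

        boolean isBom = read == head.length;
        for (int i = 0; isBom && i < head.length; i++) {
            isBom = (head[i] & 0xFF) == UTF8_BOM[i];
        }

        // Not a BOM: push the bytes back so the caller sees the full content.
        if (!isBom && read > 0) {
            pushback.unread(head, 0, read);
        }
        return pushback;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 's', 't', 'o', 'p', '_', 'i', 'd'};
        InputStream clean = skipUtf8Bom(new ByteArrayInputStream(data));
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(clean, StandardCharsets.UTF_8));
        System.out.println("stop_id".equals(reader.readLine())); // true
    }
}
```

In the `readCSV` method from the question, this would presumably be used as `parser.parse(new InputStreamReader(BomSkipper.skipUtf8Bom(in), StandardCharsets.UTF_8));`, which also makes the charset explicit instead of relying on the platform default.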

Updating to the latest version should take care of it automatically.

Hope it helps.