Tag: spark csv

Parse CSV as DataFrame/DataSet with Apache Spark and Java

I am new to Spark, and I want to use group-by & reduce to find the following from a CSV (one record per line):

    Department, Designation, costToCompany, State
    Sales, Trainee, 12000, UP
    Sales, Lead, 32000, AP
    Sales, Lead, 32000, LA
    Sales, Lead, 32000, TN
    Sales, Lead, 32000, AP
    Sales, Lead, 32000, TN
    Sales, Lead, 32000, LA
    Sales, Lead, 32000, LA
    Marketing, Associate, 18000, TN
    Marketing, Associate, 18000, TN
    HR, Manager, 58000, TN

I would like to reduce the CSV by grouping on Department, Designation and State, with additional columns for sum(costToCompany) and TotalEmployeeCount.

The result should look like this:

    Dept, Desg, state, empCount, […]
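To make the requested grouping concrete, here is a minimal plain-Java sketch of the same group-by and reduce logic over the sample rows. It is not Spark code and not the asker's solution: the class `DeptAggregation`, the helper `aggregate`, the `Totals` holder and the `SAMPLE` list are hypothetical names introduced for illustration, and the column order is assumed to match the header shown in the question. In Spark's Java API the equivalent shape would typically be a `Dataset.groupBy(...)` on the three key columns followed by `agg(...)` with `functions.sum` and `functions.count`.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DeptAggregation {

    /** Running totals for one (Department, Designation, State) group. */
    static final class Totals {
        long empCount;   // number of rows that fell into the group
        long totalCost;  // sum of costToCompany over those rows
    }

    // Sample rows copied from the question; the first line is the header.
    static final List<String> SAMPLE = Arrays.asList(
        "Department, Designation, costToCompany, State",
        "Sales, Trainee, 12000, UP",
        "Sales, Lead, 32000, AP",
        "Sales, Lead, 32000, LA",
        "Sales, Lead, 32000, TN",
        "Sales, Lead, 32000, AP",
        "Sales, Lead, 32000, TN",
        "Sales, Lead, 32000, LA",
        "Sales, Lead, 32000, LA",
        "Marketing, Associate, 18000, TN",
        "Marketing, Associate, 18000, TN",
        "HR, Manager, 58000, TN");

    /**
     * Groups CSV lines by Department, Designation and State.
     * Assumes the columns are Department, Designation, costToCompany, State
     * and that the first line is a header to be skipped.
     */
    static Map<String, Totals> aggregate(List<String> lines) {
        // LinkedHashMap keeps groups in first-seen order.
        Map<String, Totals> groups = new LinkedHashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            // Split on commas, discarding the surrounding whitespace.
            String[] f = line.trim().split("\\s*,\\s*");
            String key = f[0] + "," + f[1] + "," + f[3];
            Totals t = groups.computeIfAbsent(key, k -> new Totals());
            t.empCount += 1;                      // the "count" part of the reduce
            t.totalCost += Long.parseLong(f[2]);  // the "sum" part of the reduce
        }
        return groups;
    }

    public static void main(String[] args) {
        // Prints one line per group: Dept,Desg,state,empCount,totalCost
        aggregate(SAMPLE).forEach((key, t) ->
            System.out.println(key + "," + t.empCount + "," + t.totalCost));
    }
}
```

The composite string key stands in for the three grouping columns; a real Spark job would keep them as separate columns and let the engine distribute the aggregation.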