Hadoop 1.2.1 – multi-node cluster – reducer phase hangs for WordCount program?

My question may sound redundant, but the solutions to earlier questions on this were all ad hoc. I tried a few of them, but with no luck.

I am working with hadoop-1.2.1 (on Ubuntu 14). Initially I had a single-node setup, on which I ran the WordCount program successfully. Then I added a second node, following this tutorial. The cluster started successfully without any errors, but now when I run the same WordCount program it hangs in the reduce phase. I looked at the task tracker logs, which are as follows:

    INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201509110037_0001_m_000002_0 task's state:UNASSIGNED
    INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201509110037_0001_m_000002_0 which needs 1 slots
    INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201509110037_0001_m_000002_0 which needs 1 slots
    INFO org.apache.hadoop.mapred.JobLocalizer: Initializing user hadoopuser on this TT.
    INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201509110037_0001_m_18975496
    INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201509110037_0001_m_18975496 spawned.
    INFO org.apache.hadoop.mapred.TaskController: Writing commands to /app/hadoop/tmp/mapred/local/ttprivate/taskTracker/hadoopuser/jobcache/job_201509110037_0001/attempt_201509110037_0001_m_000002_0/taskjvm.sh
    INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201509110037_0001_m_18975496 given task: attempt_201509110037_0001_m_000002_0
    INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_m_000002_0 0.0% hdfs://HadoopMaster:54310/input/file02:25+3
    INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201509110037_0001_m_000002_0 is done.
    INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201509110037_0001_m_000002_0 was 6
    INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free slots : 2
    INFO org.apache.hadoop.mapred.JvmManager: JVM : jvm_201509110037_0001_m_18975496 exited with exit code 0. Number of tasks it ran: 1
    INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201509110037_0001_r_000000_0 task's state:UNASSIGNED
    INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201509110037_0001_r_000000_0 which needs 1 slots
    INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201509110037_0001_r_000000_0 which needs 1 slots
    INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
    INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName hadoopuser for UID 10 from the native implementation
    INFO org.apache.hadoop.mapred.JvmManager: In JvmRunner constructed JVM ID: jvm_201509110037_0001_r_18975496
    INFO org.apache.hadoop.mapred.JvmManager: JVM Runner jvm_201509110037_0001_r_18975496 spawned.
    INFO org.apache.hadoop.mapred.TaskController: Writing commands to /app/hadoop/tmp/mapred/local/ttprivate/taskTracker/hadoopuser/jobcache/job_201509110037_0001/attempt_201509110037_0001_r_000000_0/taskjvm.sh
    INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201509110037_0001_r_18975496 given task: attempt_201509110037_0001_r_000000_0
    INFO org.apache.hadoop.mapred.TaskTracker.clienttrace: src: 127.0.1.1:500, dest: 127.0.0.1:55946, bytes: 6, op: MAPRED_SHUFFLE, cliID: attempt_201509110037_0001_m_000002_0, duration: 7129894
    INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) >
    INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) >
    INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) >
    INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) >
    INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) >
    INFO org.apache.hadoop.mapred.TaskTracker: attempt_201509110037_0001_r_000000_0 0.11111112% reduce > copy (1 of 3 at 0.00 MB/s) >

Also, on the console where I am running the program, it hangs at:

    00:39:24 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    00:39:24 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    00:39:24 WARN snappy.LoadSnappy: Snappy native library not loaded
    00:39:24 INFO mapred.FileInputFormat: Total input paths to process : 2
    00:39:24 INFO mapred.JobClient: Running job: job_201509110037_0001
    00:39:25 INFO mapred.JobClient:  map 0% reduce 0%
    00:39:28 INFO mapred.JobClient:  map 100% reduce 0%
    00:39:35 INFO mapred.JobClient:  map 100% reduce 11%

My configuration files are as follows:

//core-site.xml

    <configuration>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/app/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
      </property>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://HadoopMaster:54310</value>
        <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
      </property>
    </configuration>

//hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
        <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.</description>
      </property>
    </configuration>

//mapred-site.xml

    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>HadoopMaster:54311</value>
        <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
      </property>
      <property>
        <name>mapred.reduce.slowstart.completed.maps</name>
        <value>0.80</value>
      </property>
    </configuration>

// /etc/hosts

    127.0.0.1 localhost
    127.0.1.1 M-1947

    #HADOOP CLUSTER SETUP
    172.50.88.54 HadoopMaster
    172.50.88.60 HadoopSlave1

    # The following lines are desirable for IPv6 capable hosts
    ::1 ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters

// /etc/hostname

M-1947

//masters

HadoopMaster

//slaves

HadoopMaster

HadoopSlave1

I have been struggling with this for quite a while; any help is appreciated. Thanks!

Solved it. Although there are multiple questions about this same issue on the forums, the solution, as verified by me, is that hostname resolution must be correct for every node in the cluster (and the problem does not depend on the size of the cluster).

It is actually a DNS-lookup problem. Make the following changes to fix it:

  1. Print the hostname on each machine with `$ hostname`.

  2. Check that the hostname printed on each machine matches the entry for that machine in the masters/slaves files.

  3. If they do not match, rename the host by editing the /etc/hostname file and reboot the system.
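The steps above can be sketched as a small shell check. This is only a sketch under assumptions: the function name is mine, and the conf path is illustrative (on Hadoop 1.x the masters/slaves lists usually live under `$HADOOP_HOME/conf`).

```shell
#!/bin/sh
# Sketch: verify that a node's hostname appears in the cluster's
# masters/slaves lists. check_host and the conf path are illustrative,
# not part of Hadoop itself.
check_host() {
    name="$1"; conf="$2"
    # grep -x matches the whole line, -F treats the name literally;
    # a match in either file means the hostname is consistent.
    if grep -qxF "$name" "$conf/masters" "$conf/slaves" 2>/dev/null; then
        echo "OK: $name is listed in masters/slaves"
        return 0
    else
        echo "MISMATCH: $name not found; fix /etc/hostname and reboot"
        return 1
    fi
}

# On a real node you would run something like:
#   check_host "$(hostname)" /usr/local/hadoop/conf
```

Run it on every node of the cluster; any `MISMATCH` output points at the machine whose `/etc/hostname` needs fixing.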

Example:

The /etc/hosts file (say, on the master machine of the hadoop cluster) contains:

127.0.0.1 localhost

127.0.1.1 john-machine

#Hadoop cluster

172.50.88.21 HadoopMaster

172.50.88.22 HadoopSlave1

172.50.88.23 HadoopSlave2

Then the /etc/hostname file (on the master machine) should contain the following entry (for the above problem to be fixed):

HadoopMaster

Similarly, verify the /etc/hostname file of each slave node.
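Beyond matching /etc/hostname to the masters/slaves entries, the hang itself comes from the loopback line: in my logs the shuffle clienttrace shows `src: 127.0.1.1:500`, i.e. the TaskTracker resolved its own hostname to a loopback address, which other nodes cannot reach. A hedged sketch of a check that flags cluster hostnames which /etc/hosts maps to a loopback address (the function name and file layout are illustrative):

```shell
#!/bin/sh
# Sketch: warn when /etc/hosts maps a cluster hostname to 127.x,
# which makes the shuffle advertise an unreachable address.
check_hosts_file() {
    hosts_file="$1"; shift
    for name in "$@"; do
        # Find the first non-comment line whose host column matches $name
        # and print its IP (column 1).
        ip=$(awk -v h="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) print $1 }' "$hosts_file" | head -n1)
        case "$ip" in
            127.*) echo "WARN: $name -> $ip (loopback; slaves cannot reach it)" ;;
            "")    echo "WARN: $name has no entry in $hosts_file" ;;
            *)     echo "OK: $name -> $ip" ;;
        esac
    done
}

# e.g. check_hosts_file /etc/hosts HadoopMaster HadoopSlave1
```

If a cluster hostname shows up as a `127.*` address, move it to the machine's real LAN IP (as in the /etc/hosts example above) rather than the `127.0.1.1` line Ubuntu adds by default.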