How do I configure Hazelcast programmatically for the multicast discovery mechanism?



Details:

The documentation only gives an example for TCP/IP, and it is outdated: it uses Config.setPort(), which no longer exists.

My configuration looks like this, but discovery does not work (i.e. I get the output "Members: 1"):

    import com.hazelcast.config.Config;
    import com.hazelcast.config.JoinConfig;
    import com.hazelcast.config.NetworkConfig;
    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;

    // PORT_NUMBER and MULTICAST_ADDRESS are constants defined elsewhere in my program
    Config cfg = new Config();
    NetworkConfig network = cfg.getNetworkConfig();
    network.setPort(PORT_NUMBER);

    JoinConfig join = network.getJoin();
    join.getTcpIpConfig().setEnabled(false);
    join.getAwsConfig().setEnabled(false);
    join.getMulticastConfig().setEnabled(true);
    join.getMulticastConfig().setMulticastGroup(MULTICAST_ADDRESS);
    join.getMulticastConfig().setMulticastPort(PORT_NUMBER);
    join.getMulticastConfig().setMulticastTimeoutSeconds(200);

    HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg);
    System.out.println("Members: " + instance.getCluster().getMembers().size());

Update 1, taking asimarslan's answer into account

If I play around with the MulticastTimeout, I either get "Members: 1" or

    Dec 5, 2013 8:50:42 PM com.hazelcast.nio.ReadHandler
    WARNING: [192.168.0.9]:4446 [dev] hz._hzInstance_1_dev.IO.thread-in-0 Closing socket to endpoint Address[192.168.0.7]:4446, Cause:java.io.EOFException: Remote socket closed!
    Dec 5, 2013 8:57:24 PM com.hazelcast.instance.Node
    SEVERE: [192.168.0.9]:4446 [dev] Could not join cluster, shutting down!
    com.hazelcast.core.HazelcastException: Failed to join in 300 seconds!


Update 2, following pveentjer's answer about using TCP/IP

If I change the configuration to the following, I still get only 1 member:

    Config cfg = new Config();
    NetworkConfig network = cfg.getNetworkConfig();
    network.setPort(PORT_NUMBER);

    JoinConfig join = network.getJoin();
    join.getMulticastConfig().setEnabled(false);
    join.getTcpIpConfig()
        .addMember("192.168.0.1").addMember("192.168.0.2")
        .addMember("192.168.0.3").addMember("192.168.0.4")
        .addMember("192.168.0.5").addMember("192.168.0.6")
        .addMember("192.168.0.7").addMember("192.168.0.8")
        .addMember("192.168.0.9").addMember("192.168.0.10")
        .addMember("192.168.0.11")
        .setRequiredMember(null)
        .setEnabled(true);

    // this sets the allowed connections to the cluster? necessary for multicast, too?
    network.getInterfaces().setEnabled(true).addInterface("192.168.0.*");

    HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg);
    System.out.println("debug: joined via " + join + " with "
            + instance.getCluster().getMembers().size() + " members.");

More precisely, this run produces the output

    debug: joined via JoinConfig{multicastConfig=MulticastConfig [enabled=false, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=true, connectionTimeoutSeconds=5, members=[192.168.0.1, 192.168.0.2, 192.168.0.3, 192.168.0.4, 192.168.0.5, 192.168.0.6, 192.168.0.7, 192.168.0.8, 192.168.0.9, 192.168.0.10, 192.168.0.11], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}} with 1 members.

My non-Hazelcast implementation uses UDP multicast and works fine. Could the firewall really be the problem?


Update 3, following pveentjer's answer about checking the network

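Since I have neither permission to change iptables nor to install iperf, I used com.hazelcast.examples.TestApp to check whether my network works, as described in the "Showing off straightaway" section of chapter 2 of Getting Started with Hazelcast: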

I run java -cp hazelcast-3.1.2.jar com.hazelcast.examples.TestApp on 192.168.0.1 and get the output

    ...
    Dec 10, 2013 11:31:21 PM com.hazelcast.instance.DefaultAddressPicker
    INFO: Prefer IPv4 stack is true.
    Dec 10, 2013 11:31:21 PM com.hazelcast.instance.DefaultAddressPicker
    INFO: Picked Address[192.168.0.1]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
    Dec 10, 2013 11:31:22 PM com.hazelcast.system
    INFO: [192.168.0.1]:5701 [dev] Hazelcast Community Edition 3.1.2 (20131120) starting at Address[192.168.0.1]:5701
    Dec 10, 2013 11:31:22 PM com.hazelcast.system
    INFO: [192.168.0.1]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
    Dec 10, 2013 11:31:22 PM com.hazelcast.instance.Node
    INFO: [192.168.0.1]:5701 [dev] Creating MulticastJoiner
    Dec 10, 2013 11:31:22 PM com.hazelcast.core.LifecycleService
    INFO: [192.168.0.1]:5701 [dev] Address[192.168.0.1]:5701 is STARTING
    Dec 10, 2013 11:31:24 PM com.hazelcast.cluster.MulticastJoiner
    INFO: [192.168.0.1]:5701 [dev]
    Members [1] {
        Member [192.168.0.1]:5701 this
    }
    Dec 10, 2013 11:31:24 PM com.hazelcast.core.LifecycleService
    INFO: [192.168.0.1]:5701 [dev] Address[192.168.0.1]:5701 is STARTED

Then I run java -cp hazelcast-3.1.2.jar com.hazelcast.examples.TestApp on 192.168.0.2 and get the output

    ...
    Dec 10, 2013 9:50:22 PM com.hazelcast.instance.DefaultAddressPicker
    INFO: Prefer IPv4 stack is true.
    Dec 10, 2013 9:50:22 PM com.hazelcast.instance.DefaultAddressPicker
    INFO: Picked Address[192.168.0.2]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
    Dec 10, 2013 9:50:23 PM com.hazelcast.system
    INFO: [192.168.0.2]:5701 [dev] Hazelcast Community Edition 3.1.2 (20131120) starting at Address[192.168.0.2]:5701
    Dec 10, 2013 9:50:23 PM com.hazelcast.system
    INFO: [192.168.0.2]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
    Dec 10, 2013 9:50:23 PM com.hazelcast.instance.Node
    INFO: [192.168.0.2]:5701 [dev] Creating MulticastJoiner
    Dec 10, 2013 9:50:23 PM com.hazelcast.core.LifecycleService
    INFO: [192.168.0.2]:5701 [dev] Address[192.168.0.2]:5701 is STARTING
    Dec 10, 2013 9:50:23 PM com.hazelcast.nio.SocketConnector
    INFO: [192.168.0.2]:5701 [dev] Connecting to /192.168.0.1:5701, timeout: 0, bind-any: true
    Dec 10, 2013 9:50:23 PM com.hazelcast.nio.TcpIpConnectionManager
    INFO: [192.168.0.2]:5701 [dev] 38476 accepted socket connection from /192.168.0.1:5701
    Dec 10, 2013 9:50:28 PM com.hazelcast.cluster.ClusterService
    INFO: [192.168.0.2]:5701 [dev]
    Members [2] {
        Member [192.168.0.1]:5701
        Member [192.168.0.2]:5701 this
    }
    Dec 10, 2013 9:50:30 PM com.hazelcast.core.LifecycleService
    INFO: [192.168.0.2]:5701 [dev] Address[192.168.0.2]:5701 is STARTED

So multicast discovery does generally work on my cluster, right? Is 5701 also the port used for discovery? Is the 38476 in the last output an ID or a port?

Joining still does not work with my own programmatic configuration code :(


Update 4, following pveentjer's answer about using the default configuration

The modified TestApp gives the output

 joinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=false, connectionTimeoutSeconds=5, members=[], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}} 

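and detects the other members after a few seconds (if all instances are started at the same time, each of them first lists only itself as a member), whereas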

myProgram gives the output

    joined via JoinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=false, connectionTimeoutSeconds=5, members=[], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}} with 1 members.

and detects no other members during about 1 minute of runtime (I count the members roughly every 5 seconds).

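However, if at least one TestApp instance is running on the cluster at the same time, all TestApp instances and all myProgram instances are detected, and my program works fine. If I start TestApp once and then myProgram twice, TestApp gives the following output: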

    java -cp ~/CaseStudy/jtorx-1.10.0-beta8/lib/hazelcast-3.1.2.jar:. TestApp
    Dec 12, 2013 12:02:15 PM com.hazelcast.instance.DefaultAddressPicker
    INFO: Prefer IPv4 stack is true.
    Dec 12, 2013 12:02:15 PM com.hazelcast.instance.DefaultAddressPicker
    INFO: Picked Address[192.168.180.240]:5701, using socket ServerSocket[addr=/0:0:0:0:0:0:0:0,localport=5701], bind any local is true
    Dec 12, 2013 12:02:15 PM com.hazelcast.system
    INFO: [192.168.180.240]:5701 [dev] Hazelcast Community Edition 3.1.2 (20131120) starting at Address[192.168.180.240]:5701
    Dec 12, 2013 12:02:15 PM com.hazelcast.system
    INFO: [192.168.180.240]:5701 [dev] Copyright (C) 2008-2013 Hazelcast.com
    Dec 12, 2013 12:02:15 PM com.hazelcast.instance.Node
    INFO: [192.168.180.240]:5701 [dev] Creating MulticastJoiner
    Dec 12, 2013 12:02:15 PM com.hazelcast.core.LifecycleService
    INFO: [192.168.180.240]:5701 [dev] Address[192.168.180.240]:5701 is STARTING
    Dec 12, 2013 12:02:21 PM com.hazelcast.cluster.MulticastJoiner
    INFO: [192.168.180.240]:5701 [dev]
    Members [1] {
        Member [192.168.180.240]:5701 this
    }
    Dec 12, 2013 12:02:22 PM com.hazelcast.core.LifecycleService
    INFO: [192.168.180.240]:5701 [dev] Address[192.168.180.240]:5701 is STARTED
    Dec 12, 2013 12:02:22 PM com.hazelcast.management.ManagementCenterService
    INFO: [192.168.180.240]:5701 [dev] Hazelcast will connect to Management Center on address: http://localhost:8080/mancenter-3.1.2/
    Join: JoinConfig{multicastConfig=MulticastConfig [enabled=true, multicastGroup=224.2.2.3, multicastPort=54327, multicastTimeToLive=32, multicastTimeoutSeconds=2, trustedInterfaces=[]], tcpIpConfig=TcpIpConfig [enabled=false, connectionTimeoutSeconds=5, members=[], requiredMember=null], awsConfig=AwsConfig{enabled=false, region='us-east-1', securityGroupName='null', tagKey='null', tagValue='null', hostHeader='ec2.amazonaws.com', connectionTimeoutSeconds=5}}
    Dec 12, 2013 12:02:22 PM com.hazelcast.partition.PartitionService
    INFO: [192.168.180.240]:5701 [dev] Initializing cluster partition table first arrangement...
    hazelcast[default] > Dec 12, 2013 12:03:27 PM com.hazelcast.nio.SocketAcceptor
    INFO: [192.168.180.240]:5701 [dev] Accepting socket connection from /192.168.0.8:38764
    Dec 12, 2013 12:03:27 PM com.hazelcast.nio.TcpIpConnectionManager
    INFO: [192.168.180.240]:5701 [dev] 5701 accepted socket connection from /192.168.0.8:38764
    Dec 12, 2013 12:03:27 PM com.hazelcast.nio.SocketAcceptor
    INFO: [192.168.180.240]:5701 [dev] Accepting socket connection from /192.168.0.7:54436
    Dec 12, 2013 12:03:27 PM com.hazelcast.nio.TcpIpConnectionManager
    INFO: [192.168.180.240]:5701 [dev] 5701 accepted socket connection from /192.168.0.7:54436
    Dec 12, 2013 12:03:32 PM com.hazelcast.partition.PartitionService
    INFO: [192.168.180.240]:5701 [dev] Re-partitioning cluster data... Migration queue size: 181
    Dec 12, 2013 12:03:32 PM com.hazelcast.cluster.ClusterService
    INFO: [192.168.180.240]:5701 [dev]
    Members [3] {
        Member [192.168.180.240]:5701 this
        Member [192.168.0.8]:5701
        Member [192.168.0.7]:5701
    }
    Dec 12, 2013 12:03:43 PM com.hazelcast.partition.PartitionService
    INFO: [192.168.180.240]:5701 [dev] Re-partitioning cluster data... Migration queue size: 181
    Dec 12, 2013 12:03:45 PM com.hazelcast.partition.PartitionService
    INFO: [192.168.180.240]:5701 [dev] All migration tasks has been completed, queues are empty.
    Dec 12, 2013 12:03:46 PM com.hazelcast.nio.TcpIpConnection
    INFO: [192.168.180.240]:5701 [dev] Connection [Address[192.168.0.8]:5701] lost. Reason: Socket explicitly closed
    Dec 12, 2013 12:03:46 PM com.hazelcast.cluster.ClusterService
    INFO: [192.168.180.240]:5701 [dev] Removing Member [192.168.0.8]:5701
    Dec 12, 2013 12:03:46 PM com.hazelcast.cluster.ClusterService
    INFO: [192.168.180.240]:5701 [dev]
    Members [2] {
        Member [192.168.180.240]:5701 this
        Member [192.168.0.7]:5701
    }
    Dec 12, 2013 12:03:48 PM com.hazelcast.partition.PartitionService
    INFO: [192.168.180.240]:5701 [dev] Partition balance is ok, no need to re-partition cluster data...
    Dec 12, 2013 12:03:48 PM com.hazelcast.nio.TcpIpConnection
    INFO: [192.168.180.240]:5701 [dev] Connection [Address[192.168.0.7]:5701] lost. Reason: Socket explicitly closed
    Dec 12, 2013 12:03:48 PM com.hazelcast.cluster.ClusterService
    INFO: [192.168.180.240]:5701 [dev] Removing Member [192.168.0.7]:5701
    Dec 12, 2013 12:03:48 PM com.hazelcast.cluster.ClusterService
    INFO: [192.168.180.240]:5701 [dev]
    Members [1] {
        Member [192.168.180.240]:5701 this
    }
    Dec 12, 2013 12:03:48 PM com.hazelcast.partition.PartitionService
    INFO: [192.168.180.240]:5701 [dev] Partition balance is ok, no need to re-partition cluster data...

The only difference I can see in TestApp's configuration is

    config.getManagementCenterConfig().setEnabled(true);
    config.getManagementCenterConfig().setUrl("http://localhost:8080/mancenter-" + version);
    for (int k = 1; k <= LOAD_EXECUTORS_COUNT; k++) {
        config.addExecutorConfig(new ExecutorConfig("e" + k).setPoolSize(k));
    }

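So, out of desperation, I added this to myProgram as well. But it did not solve the problem: each instance still detects only itself as a member during the entire run.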


Update on the runtime of myProgram

Could it be that the program does not run long enough (as pveentjer suggested)?

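My experiments seem to confirm this: if the time t between Hazelcast.newHazelcastInstance(cfg); and the start of cleanUp() (i.e. the point after which there is no more communication via Hazelcast and no more checking of the member count) is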

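  • less than 30 seconds: there is no communication and members: 1
  • more than 30 seconds: all members are found and communication takes place (strangely, for far longer than t - 30 seconds).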

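Is 30 seconds a realistic time span for a Hazelcast cluster to need, or is something strange going on? Here are the logs of 4 myProgram instances running simultaneously (the periods in which they look for Hazelcast members overlap by about 30 seconds, e.g. for instances 1 and 3):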

    instance 1:
    2013-12-19T12:39:16.553+0100 LOG 0 (START) engine started
    looking for members between 2013-12-19T12:39:21.973+0100 and 2013-12-19T12:40:27.863+0100
    2013-12-19T12:40:28.205+0100 LOG 35 (Torx-Explorer) Model SymToSim is about to exit

    instance 2:
    2013-12-19T12:39:16.592+0100 LOG 0 (START) engine started
    looking for members between 2013-12-19T12:39:22.192+0100 and 2013-12-19T12:39:28.429+0100
    2013-12-19T12:39:28.711+0100 LOG 52 (Torx-Explorer) Model SymToSim is about to exit

    instance 3:
    2013-12-19T12:39:16.593+0100 LOG 0 (START) engine started
    looking for members between 2013-12-19T12:39:22.145+0100 and 2013-12-19T12:39:52.425+0100
    2013-12-19T12:39:52.639+0100 LOG 54 (Torx-Explorer) Model SymToSim is about to exit

    instance 4:
    2013-12-19T12:39:16.885+0100 LOG 0 (START) engine started
    looking for members between 2013-12-19T12:39:21.478+0100 and 2013-12-19T12:39:35.980+0100
    2013-12-19T12:39:36.024+0100 LOG 34 (Torx-Explorer) Model SymToSim is about to exit
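To check the overlap claim, the "looking for members" windows from the four logs can be intersected pairwise. This is a minimal JDK-only sketch; the class and method names (`MemberSearchOverlap`, `overlapMillis`) are made up for illustration, and the timestamps are copied verbatim from the logs.

```java
import java.time.Duration;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

class MemberSearchOverlap {
    // Parses timestamps such as 2013-12-19T12:39:21.973+0100 (offset without a colon).
    static final DateTimeFormatter LOG_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ");

    // {start, end} of "looking for members" for instances 1 to 4, copied from the logs.
    static final String[][] WINDOWS = {
        {"2013-12-19T12:39:21.973+0100", "2013-12-19T12:40:27.863+0100"},
        {"2013-12-19T12:39:22.192+0100", "2013-12-19T12:39:28.429+0100"},
        {"2013-12-19T12:39:22.145+0100", "2013-12-19T12:39:52.425+0100"},
        {"2013-12-19T12:39:21.478+0100", "2013-12-19T12:39:35.980+0100"},
    };

    // Length of the intersection of [s1, e1] and [s2, e2] in milliseconds, 0 if disjoint.
    static long overlapMillis(String s1, String e1, String s2, String e2) {
        OffsetDateTime a1 = OffsetDateTime.parse(s1, LOG_TIME);
        OffsetDateTime b1 = OffsetDateTime.parse(e1, LOG_TIME);
        OffsetDateTime a2 = OffsetDateTime.parse(s2, LOG_TIME);
        OffsetDateTime b2 = OffsetDateTime.parse(e2, LOG_TIME);
        OffsetDateTime start = a1.isAfter(a2) ? a1 : a2; // later of the two starts
        OffsetDateTime end = b1.isBefore(b2) ? b1 : b2;  // earlier of the two ends
        return end.isAfter(start) ? Duration.between(start, end).toMillis() : 0L;
    }

    public static void main(String[] args) {
        for (int i = 0; i < WINDOWS.length; i++) {
            for (int j = i + 1; j < WINDOWS.length; j++) {
                long ms = overlapMillis(WINDOWS[i][0], WINDOWS[i][1], WINDOWS[j][0], WINDOWS[j][1]);
                System.out.printf("instances %d and %d overlap for %.3f s%n", i + 1, j + 1, ms / 1000.0);
            }
        }
    }
}
```

For these logs the pairwise overlaps come out as 6.237 s (1/2, 2/3, 2/4), 14.007 s (1/4), 13.835 s (3/4) and 30.280 s (1/3), so instances 1 and 3 are the only pair that searched for members at the same time for more than 30 seconds.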

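Should I start my actual distributed algorithm only once there are enough members in the Hazelcast cluster? Can I set hazelcast.initial.min.cluster.size programmatically? https://groups.google.com/forum/#!topic/hazelcast/sa-lmpEDa6A sounds as if it would block Hazelcast.newHazelcastInstance(cfg); until initial.min.cluster.size is reached. Is that correct? How synchronously (within what time span) are the different instances unblocked?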
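One application-level way to "start the algorithm only once enough members have joined" is to poll the member count until it reaches a target or a timeout expires. This is a minimal sketch, not a Hazelcast feature: `awaitMembers` is a hypothetical helper, and the `IntSupplier` stands in for `instance.getCluster().getMembers().size()`, which the question's code already uses.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

class MemberWait {
    // Hypothetical helper (not part of Hazelcast): polls memberCount until it reports at
    // least 'expected' members or 'timeout' has elapsed. Returns true if the target was reached.
    static boolean awaitMembers(IntSupplier memberCount, int expected,
                                Duration timeout, Duration pollInterval) throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (memberCount.getAsInt() < expected) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: still too few members
            }
            Thread.sleep(pollInterval.toMillis());
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated cluster that gains one member per poll. With Hazelcast, one would pass
        // () -> instance.getCluster().getMembers().size() instead.
        AtomicInteger fakeMembers = new AtomicInteger(1);
        boolean ready = awaitMembers(fakeMembers::getAndIncrement, 4,
                Duration.ofSeconds(5), Duration.ofMillis(10));
        System.out.println(ready ? "enough members, starting the algorithm" : "timed out");
    }
}
```

Each instance polls on its own, so the instances are not released at exactly the same moment; they only agree on having seen enough members.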

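The problem apparently is that your cluster starts (and stops) without waiting until there are enough members in it. You can set the hazelcast.initial.min.cluster.size property to prevent this.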

You can set 'hazelcast.initial.min.cluster.size' programmatically like this:

    Config config = new Config();
    config.setProperty("hazelcast.initial.min.cluster.size", "3");

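Your configuration is correct, but you have set a very long multicast timeout of 200 seconds, whereas the default is 2 seconds. Setting a smaller value will solve it.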

From the Hazelcast Java API doc, MulticastConfig.html#setMulticastTimeoutSeconds(int):

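Specifies the time in seconds that a node should wait for a valid multicast response from another node running in the network before declaring itself as master node and creating its own cluster. This applies only to the startup of nodes where no master has been assigned yet. If you specify a high value, e.g. 60 seconds, it means that until a master is selected, each node is going to wait 60 seconds before continuing, so be careful with providing a high value. If the value is set too low, it might be that nodes are giving up too early and will create their own cluster.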

You seem to be using TCP/IP clustering, so that's good. Try the following (from the Hazelcast book):

If you are using iptables, you can add the following rule to allow outbound traffic in the port range 31000-33000:

 iptables -A OUTPUT -p TCP --dport 33000:31000 -m state --state NEW -j ACCEPT 

and this one to allow incoming traffic from any address to port 5701:

 iptables -A INPUT -p tcp -d 0/0 -s 0/0 --dport 5701 -j ACCEPT 

and this one to allow incoming multicast traffic:

 iptables -A INPUT -m pkttype --pkt-type multicast -j ACCEPT 

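Connectivity test: if you are having trouble because machines won't join a cluster, you can check the network connectivity between the two machines with a tool called iperf. On one machine, execute:

    iperf -s -p 5701

This means you are listening on port 5701.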

On the other machine, execute the following command:

 iperf -c 192.168.1.107 -d -p 5701 

where you replace '192.168.1.107' with the IP address of your first machine. If you run the command, you will get output like this:

    ------------------------------------------------------------
    Server listening on TCP port 5701
    TCP window size: 85.3 KByte (default)
    ------------------------------------------------------------
    ------------------------------------------------------------
    Client connecting to 192.168.1.107, TCP port 5701
    TCP window size: 59.4 KByte (default)
    ------------------------------------------------------------
    [ 5] local 192.168.1.105 port 40524 connected with 192.168.1.107 port 5701
    [ 4] local 192.168.1.105 port 5701 connected with 192.168.1.107 port 33641
    [ ID] Interval Transfer Bandwidth
    [ 4] 0.0-10.2 sec 55.8 MBytes 45.7 Mbits/sec
    [ 5] 0.0-10.3 sec 6.25 MBytes 5.07 Mbits/sec

Then you know that the two machines can connect to each other. However, if you see something like this:

    Server listening on TCP port 5701
    TCP window size: 85.3 KByte (default)
    ------------------------------------------------------------
    connect failed: No route to host

Then you know that you may have a network connectivity problem on your hands.
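If iperf cannot be installed, a plain TCP connect from Java gives a cruder yes/no answer. This is a minimal sketch with made-up names (`PortCheck`, `canConnect`); unlike iperf it measures no bandwidth, and it only succeeds if something is already listening on the target port, for example a running Hazelcast member on 5701.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {
    // Returns true if a TCP connection to host:port is established within timeoutMillis.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // covers refused connections, timeouts, "No route to host" and unknown hosts
            return false;
        }
    }

    // Usage: java PortCheck 192.168.1.107 5701
    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 5701;
        boolean ok = canConnect(host, port, 2000);
        System.out.println(host + ":" + port + (ok ? " is reachable" : " is NOT reachable"));
    }
}
```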

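It looks like Hazelcast uses the multicast address 224.2.2.3 on UDP port 54327 (by default) for discovery, and then port 5701 for TCP communication. Opening UDP port 54327 in the firewall fixed discovery for me. (I had also opened TCP port 5701, but that was not enough.)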
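To check whether multicast traffic on Hazelcast's default group and port actually gets through between two machines, a JDK-only probe can help. This is a sketch assuming the defaults seen in the logs (224.2.2.3, UDP 54327, TTL 32); `MulticastProbe` and its helper are hypothetical names. Run `java MulticastProbe receive` on one host, then `java MulticastProbe send` on another; if the receiver times out, something between the hosts is dropping the multicast packets.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;
import java.nio.charset.StandardCharsets;

class MulticastProbe {
    static final String GROUP = "224.2.2.3"; // Hazelcast's default multicast group
    static final int PORT = 54327;           // Hazelcast's default multicast port (UDP)
    static final String PREFIX = "probe from ";

    // Wraps a text message in a datagram addressed to the multicast group.
    static DatagramPacket packetFor(String message, InetAddress group, int port) {
        byte[] data = message.getBytes(StandardCharsets.UTF_8);
        return new DatagramPacket(data, data.length, group, port);
    }

    public static void main(String[] args) throws IOException {
        InetAddress group = InetAddress.getByName(GROUP);
        try (MulticastSocket socket = new MulticastSocket(PORT)) {
            socket.setTimeToLive(32); // same as multicastTimeToLive=32 in the logs above
            socket.joinGroup(group);  // deprecated since JDK 14, still functional
            if (args.length > 0 && args[0].equals("send")) {
                String msg = PREFIX + InetAddress.getLocalHost().getHostAddress();
                socket.send(packetFor(msg, group, PORT));
                System.out.println("sent: " + msg);
                return;
            }
            socket.setSoTimeout(30_000); // receive() throws SocketTimeoutException after 30 s
            byte[] buf = new byte[1024];
            while (true) {
                DatagramPacket in = new DatagramPacket(buf, buf.length);
                socket.receive(in);
                String text = new String(in.getData(), 0, in.getLength(), StandardCharsets.UTF_8);
                if (text.startsWith(PREFIX)) { // skip other traffic on the group, e.g. Hazelcast's own
                    System.out.println("received '" + text + "' from " + in.getAddress());
                    return;
                }
            }
        }
    }
}
```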

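Can you first try TCP/IP clustering to make sure everything else works? Once you have confirmed there are no problems, try multicast. It could also be a firewall problem, by the way.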

So it looks like multicast is working on your network; that's good.

You could try the following settings:

    Config cfg = new Config();
    NetworkConfig network = cfg.getNetworkConfig();
    JoinConfig join = network.getJoin();
    join.getTcpIpConfig().setEnabled(false);
    join.getAwsConfig().setEnabled(false);
    join.getMulticastConfig().setEnabled(true);
    HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg);

As you can see, I removed all customizations.

You could try creating the Hazelcast instance like this:

    Config cfg = new Config();
    HazelcastInstance hz = Hazelcast.newHazelcastInstance(cfg);

The Management Center settings and the creation of the executors are irrelevant (I added that code to TestApp myself, so I'm 100% sure).

Then you should have exactly the same network configuration as TestApp.