Loading data larger than the memory size in H2O

I am experimenting with loading data that is larger than the memory allotted to H2O.

The H2O blog says: A note on Bigger Data and GC: We do a user-mode swap-to-disk when the Java heap gets too full, ie, you're using more Big Data than physical DRAM. We won't die with a GC death-spiral, but we will degrade to out-of-core speeds. We'll go as fast as the disk will allow. I've personally tested loading a 12Gb dataset into a 2Gb (32bit) JVM; it took about 5 minutes to load the data, and another 5 minutes to run a Logistic Regression.

Here is the R code used to connect to H2O 3.6.0.8:

 h2o.init(max_mem_size = '60m') # allotting 60 MB to H2O; R is running on a machine with 8 GB RAM

 java version "1.8.0_65"
 Java(TM) SE Runtime Environment (build 1.8.0_65-b17)
 Java HotSpot(TM) 64-Bit Server VM (build 25.65-b01, mixed mode)
 .Successfully connected to http://127.0.0.1:54321/
 R is connected to the H2O cluster:
     H2O cluster uptime:         2 seconds 561 milliseconds
     H2O cluster version:        3.6.0.8
     H2O cluster name:           H2O_started_from_R_RILITS-HWLTP_tkn816
     H2O cluster total nodes:    1
     H2O cluster total memory:   0.06 GB
     H2O cluster total cores:    4
     H2O cluster allowed cores:  2
     H2O cluster healthy:        TRUE
 Note: As started, H2O is limited to the CRAN default of 2 CPUs.
       Shut down and restart H2O as shown below to use all your CPUs.
            > h2o.shutdown()
            > h2o.init(nthreads = -1)
 IP Address: 127.0.0.1
 Port      : 54321
 Session ID: _sid_b2e0af0f0c62cd64a8fcdee65b244d75
 Key Count : 3

I then tried to load a 169 MB CSV file into H2O:

 dat.hex <- h2o.importFile('dat.csv') 

which failed with this error:

 Error in .h2o.__checkConnectionHealth() :
   H2O connection has been severed. Cannot connect to instance at http://127.0.0.1:54321/
 Failed to connect to 127.0.0.1 port 54321: Connection refused

This is indicative of an out-of-memory error.

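Question: If H2O promises to load a dataset larger than its memory capacity (through the swap-to-disk mechanism described in the blog quote above), is this the correct way to load the data?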

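Swap-to-disk was disabled by default a while ago, because the performance was so bad. The bleeding edge (not the latest stable release) has a flag to enable it: "--cleaner" (for "memory cleaner").
Note that your cluster has an extremely small amount of memory: H2O cluster total memory: 0.06 GB. That's 60 MB! That is barely enough to start a JVM, much less run H2O. I would be surprised if H2O could come up properly there at all, never mind swap to disk. Swapping is limited to swapping the data alone. If you are trying to do a swap test, raise your JVM heap to 1 or 2 GB of RAM, then load datasets that sum to more than that.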

Cliff
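
For anyone wanting to retry with a more realistic heap, here is a minimal sketch in R of the restart suggested in the answer. It assumes the h2o R package is installed and that `dat.csv` (the file from the question) sits in the working directory. The 2 GB heap is just one of the sizes the answer mentions, and this has not been run against 3.6.0.8. It only shows the 169 MB file loading into a larger heap. A real swap test would still need data that sums to more than the heap, plus swap-to-disk enabled as described above.

```r
library(h2o)

# Start a local H2O node with a 2 GB heap instead of 60 MB.
# nthreads = -1 lifts the CRAN default limit of 2 CPUs.
h2o.init(max_mem_size = "2g", nthreads = -1)

# "dat.csv" is the file from the question (assumed to be in the working directory).
csv_path <- "dat.csv"
cat("File size on disk (MB):", file.size(csv_path) / 1024^2, "\n")

# Import the file into the cluster and check its dimensions.
dat.hex <- h2o.importFile(csv_path)
print(dim(dat.hex))

# Report cluster status, including total memory, after the import.
h2o.clusterInfo()
```

If the cluster summary still reports 0.06 GB of total memory, R has most likely reconnected to the old 60 MB instance. Shut that one down with h2o.shutdown() first, then call h2o.init() again.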