OpenStack nova resize fails: nova.exception.NoValidHost: No valid host was found

I have a VM in VMware with OpenStack installed on it. OpenStack has two instances (2 GB/1 VCPU and 4 GB/4 VCPU). I changed the size of the instance with 4 GB/4 VCPU to 8 GB using nova resize, and it worked. Now I need at least 16 GB of RAM and 6 VCPUs, but the resize command finishes far too early (within seconds) without generating an error, and the RAM size is not upgraded.
I finally found the following error in nova-conductor.log:
nova.exception.NoValidHost: No valid host was found.
: nova.exception_Remote.NoValidHost_Remote: No valid host was found.
2022-05-09 10:57:11.622 56 WARNING nova.scheduler.utils [req-31c98f42-d465-4773-9b4a-3474aef85a1c 10f23c55b2ce420aa1e757062f3874f8 c458f62c51d249ef9bf2b9f11c4ddb98 - default default] [instance: 7b0e27e4-8c78-4dc5-bf38-3cfc02d356d8] Setting instance to ACTIVE state.: nova.exception_Remote.NoValidHost_Remote: No valid host was found.
2022-05-09 11:01:46.757 55 WARNING nova.scheduler.utils [req-cfa535cc-c19f-4b75-8fde-224d8f3610d5 10f23c55b2ce420aa1e757062f3874f8 c458f62c51d249ef9bf2b9f11c4ddb98 - default default] Failed to compute_task_migrate_server: No valid host was found.
Traceback (most recent call last):
File "/var/lib/kolla/venv/lib/python3.8/site-packages/oslo_messaging/rpc/server.py", line 241, in inner
return func(*args, **kwargs)
File "/var/lib/kolla/venv/lib/python3.8/site-packages/nova/scheduler/manager.py", line 209, in select_destinations
raise exception.NoValidHost(reason="")
nova.exception.NoValidHost: No valid host was found.
: nova.exception_Remote.NoValidHost_Remote: No valid host was found.
2022-05-09 11:01:46.759 55 WARNING nova.scheduler.utils [req-cfa535cc-c19f-4b75-8fde-224d8f3610d5 10f23c55b2ce420aa1e757062f3874f8 c458f62c51d249ef9bf2b9f11c4ddb98 - default default] [instance: 7b0e27e4-8c78-4dc5-bf38-3cfc02d356d8] Setting instance to ACTIVE state.: nova.exception_Remote.NoValidHost_Remote: No valid host was found.
I have read in some articles that changing ram_allocation_ratio and cpu_allocation_ratio will work, but I don't know exactly how I should change them in my case. The VM where OpenStack is deployed has 25 GB of RAM and 16 processor cores, while the host machine has 32 GB of RAM and 32 logical processors (16 cores). How can I modify these variables, or is there something else I am missing?
EDIT:
I was able to solve the problem by analyzing the output of openstack hypervisor stats show. The flavor I was using to resize demanded more disk space (80G) than the hypervisor stats showed as free (47G). By extending the size of the VM and then the /var/lib directory (since I am using a Docker-style deployment), the issue was resolved.

Have you set the reserved_host_memory_mb in nova.conf?
Which version of OpenStack have you deployed?
The default ram_allocation_ratio is 1.5; you could set it to 2 or more, then check whether the No valid host was found error is fixed. See the Sample Configuration File.
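For example, a minimal sketch of the relevant nova.conf settings (the option names come from the Nova sample configuration; the values here are purely illustrative):

[DEFAULT]
# Allow RAM to be overcommitted 2x relative to physical memory (illustrative)
ram_allocation_ratio = 2.0
# Allow 4 virtual CPUs per physical core (illustrative)
cpu_allocation_ratio = 4.0

After changing these, restart the Nova scheduler/compute services so the new ratios take effect.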
Get your hypervisor stats with openstack hypervisor stats show, as sketched below. Make sure the system resources are sufficient; there may be a limit in your environment, such as ports, disk, or something else.
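For example (standard python-openstackclient commands; the flavor name is illustrative):

# Aggregate capacity across hypervisors; compare free_ram_mb, free_disk_gb
# and vcpus/vcpus_used against what the target flavor demands.
openstack hypervisor stats show

# Show the RAM, VCPUs, and disk the target flavor requires:
openstack flavor show m1.xlarge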
Update for others' reference:
The root cause was that the flavor the instance was resized to requires 80G of disk, but hypervisor stats reported free_disk_gb as only 47. The OP fixed it by extending the host's (hypervisor's) disk capacity.

Related

Cannot start h2o

As the title says: I cannot run h2o.init().
I have already downloaded the 64-bit version of the Java SE Development Kit 8u291. I also downloaded the xgboost library in R (install.packages("xgboost")). Finally, I have updated all my NVIDIA drivers and downloaded the latest CUDA (although, tbh, I don't even know what that does). I followed the steps described in the NVIDIA forums to avoid the crash I had when installing (i.e., removing the integration with Visual Studio). FWIW, I'm using a Dell Inspiron 15 Gaming, and it has an NVIDIA GTX 1050 with 4GB.
Here's the full code I'm using (straight from the h2o download instructions except for the first line):
library(xgboost)
library(h2o)
localH2O = h2o.init()
demo(h2o.kmeans)
Any help would be much appreciated.
The full message I get when running the above code chunk:
H2O is not running yet, starting it now...
Note: In case of errors look at the following log files:
C:\Users\<my username>\AppData\Local\Temp\RtmpcdvCce\file1a106074110b/h2o_<my username>_started_from_r.out
C:\Users\<my username>\AppData\Local\Temp\RtmpcdvCce\file1a10253139db/h2o_<my username>_started_from_r.err
java version "15.0.2" 2021-01-19
Java(TM) SE Runtime Environment (build 15.0.2+7-27)
Java HotSpot(TM) 64-Bit Server VM (build 15.0.2+7-27, mixed mode, sharing)
Starting H2O JVM and connecting: ............................................................Diagnostic HTTP Request:
HTTP Status Code: -1
HTTP Error Message: Failed to connect to localhost port 54321: Connection refused
Cannot load library from path lib/windows_64/xgboost4j_gpu.dll
Cannot load library from path lib/xgboost4j_gpu.dll
Failed to load library from both native path and jar!
Cannot load library from path lib/windows_64/xgboost4j_omp.dll
Cannot load library from path lib/xgboost4j_omp.dll
Failed to load library from both native path and jar!
Cannot load library from path lib/windows_64/xgboost4j_minimal.dll
Cannot load library from path lib/xgboost4j_minimal.dll
Failed to load library from both native path and jar!
Failed to add native path to the classpath at runtime
java.io.IOException: Failed to get field handle to set library path
at ai.h2o.xgboost4j.java.NativeLibLoader.addNativeDir(NativeLibLoader.java:229)
at ai.h2o.xgboost4j.java.NativeLibLoader.initXGBoost(NativeLibLoader.java:43)
at ai.h2o.xgboost4j.java.NativeLibLoader.getLoader(NativeLibLoader.java:66)
at hex.tree.xgboost.XGBoostExtension.initXgboost(XGBoostExtension.java:70)
at hex.tree.xgboost.XGBoostExtension.isEnabled(XGBoostExtension.java:51)
at water.ExtensionManager.isEnabled(ExtensionManager.java:189)
at water.ExtensionManager.registerCoreExtensions(ExtensionManager.java:103)
at water.H2O.main(H2O.java:2203)
at water.H2OStarter.start(H2OStarter.java:22)
at water.H2OStarter.start(H2OStarter.java:48)
at water.H2OApp.main(H2OApp.java:12)
Cannot initialize XGBoost backend! Xgboost (enabled GPUs) needs:
- CUDA 8.0
XGboost (minimal version) needs:
- GCC 4.7+
For more details, run in debug mode: `java -Dlog4j.configuration=file:///tmp/log4j.properties -jar h2o.jar`
ERROR: Unknown argument (<my username>/AppData/Local/Temp/RtmpcdvCce)
Usage: java [-Xmx<size>] -jar h2o.jar [options]
(Note that every option has a default and is optional.)
-h | -help
Print this help.
-version
Print version info and exit.
-name <h2oCloudName>
Cloud name used for discovery of other nodes.
Nodes with the same cloud name will form an H2O cloud
(also known as an H2O cluster).
-flatfile <flatFileName>
Configuration file explicitly listing H2O cloud node members.
-ip <ipAddressOfNode>
IP address of this node.
-port <port>
Port number for this node (note: port+1 is also used by default).
(The default port is 54321.)
-network <IPv4network1Specification>[,<IPv4network2Specification> ...]
The IP address discovery code will bind to the first interface
that matches one of the networks in the comma-separated list.
Use instead of -ip when a broad range of addresses is legal.
(Example network specification: '10.1.2.0/24' allows 256 legal
possibilities.)
-ice_root <fileSystemPath>
The directory where H2O spills temporary data to disk.
-log_dir <fileSystemPath>
The directory where H2O writes logs to disk.
(This usually has a good default that you need not change.)
-log_level <TRACE,DEBUG,INFO,WARN,ERRR,FATAL>
Write messages at this logging level, or above. Default is INFO.
-max_log_file_size
Maximum size of INFO and DEBUG log files. The file is rolled over after a specified size has been reached.
(The default is 3MB. Minimum is 1MB and maximum is 99999MB)
-flow_dir <server side directory or HDFS directory>
The directory where H2O stores saved flows.
(The default is 'C:\Users\<my username>\h2oflows'.)
-nthreads <#threads>
Maximum number of threads in the low priority batch-work queue.
(The default is.)
-client
Launch H2O node in client mode.
-notify_local <fileSystemPath>
Specifies a file to write when the node is up. The file contains one line with the IP and
port of the embedded web server. e.g. 192.168.1.100:54321
-context_path <context_path>
The context path for jetty.
Authentication options:
-jks <filename>
Java keystore file
-jks_pass <password>
(Default is 'h2oh2o')
-jks_alias <alias>
(Optional, use if the keystore has multiple certificates and you want to use a specific one.)
-hostname_as_jks_alias
(Optional, use if you want to use the machine hostname as your certificate alias.)
-hash_login
Use Jetty HashLoginService
-ldap_login
Use Jetty Ldap login module
-kerberos_login
Use Jetty Kerberos login module
-spnego_login
Use Jetty SPNEGO login service
-pam_login
Use Jetty PAM login module
-login_conf <filename>
LoginService configuration file
-spnego_properties <filename>
SPNEGO login module configuration file
-form_auth
Enables Form-based authentication for Flow (default is Basic authentication)
-session_timeout <minutes>
Specifies the number of minutes that a session can remain idle before the server invalidates
the session and requests a new login. Requires '-form_auth'. Default is no timeout
-internal_security_conf <filename>
Path (absolute or relative) to a file containing all internal security related configurations
Cloud formation behavior:
New H2O nodes join together to form a cloud at startup time.
Once a cloud is given work to perform, it locks out new members
from joining.
Examples:
Start an H2O node with 4GB of memory and a default cloud name:
$ java -Xmx4g -jar h2o.jar
Start an H2O node with 6GB of memory and a specific cloud name:
$ java -Xmx6g -jar h2o.jar -name MyCloud
Start an H2O cloud with three 2GB nodes and a default cloud name:
$ java -Xmx2g -jar h2o.jar &
$ java -Xmx2g -jar h2o.jar &
$ java -Xmx2g -jar h2o.jar &
So... after a lot of poking around I found the answer. Windows Defender (ughhh) was blocking access to h2o.jar. The solution was to open PowerShell in the h2o Java folder and run the jar with java -jar h2o.jar. Then you'll get the security prompt asking you to authorize the program (I've had to do it every time, so you might want to check your settings). Once you do that, h2o.init() runs very smoothly in R.
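For anyone trying the same fix, the steps were roughly these (the jar location is an assumption; the h2o R package keeps h2o.jar under its install directory, which you can print from R with system.file(package = "h2o")):

# In PowerShell, change to the folder that contains h2o.jar
# (path below is a placeholder - use the one printed by system.file):
cd "C:\path\to\h2o\java"
# Running the jar once triggers the Windows security prompt; approve it:
java -jar h2o.jar

After approving the prompt, stop that JVM and run h2o.init() from R again.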

Openstack Compute fails to run instances based on flavor and number of instances

I am attempting to create various instances and Compute is failing to spawn some of them.
My instance has the following characteristics:
Name: ThirdInstance
Created from image: CentOS-7-x86_64
Flavor: m1.medium (2 VCPU, 4GB RAM, 40GB Disk)
I have two other instances running. I was unable to spawn these instances unless I used the flavor m1.small (1 VCPU, 2GB RAM, 20GB Disk). Any deviation from that flavor and the instance spawning failed.
Unfortunately, my ThirdInstance fails to spawn regardless of the flavor used. I have tried creating it with m1.small and it fails consistently.
I looked at the Nova logs, and am noting that when I attempt to create this instance I am consistently getting the following message in the nova-conductor.log file:
2020-08-29 13:21:09.637 98391 ERROR nova.conductor.manager
2020-08-29 13:21:09.637 98391 ERROR nova.conductor.manager
2020-08-29 13:21:09.890 98391 WARNING nova.scheduler.utils [req-30539015-22f1-4d46-b8b7-63f9c679eed1 4c4c7de6dd134250972958ce260530f2 166dc91ccec24f21963c71a437380ee9 - default default] Failed to compute_task_build_instances: No valid host was found.
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/oslo_messaging/rpc/server.py", line 241, in inner
return func(*args, **kwargs)
File "/usr/lib/python3.6/site-packages/nova/scheduler/manager.py", line 200, in select_destinations
raise exception.NoValidHost(reason="")
nova.exception.NoValidHost: No valid host was found.
: nova.exception_Remote.NoValidHost_Remote: No valid host was found.
2020-08-29 13:21:09.891 98391 WARNING nova.scheduler.utils [req-30539015-22f1-4d46-b8b7-63f9c679eed1 4c4c7de6dd134250972958ce260530f2 166dc91ccec24f21963c71a437380ee9 - default default] [instance: fe54feaf-ecb6-4725-97e9-7d208066ddb0] Setting instance to ERROR state.: nova.exception_Remote.NoValidHost_Remote: No valid host was found.
What am I missing here? What causes these No valid host found failures when I attempt to use flavors other than m1.small, and why does a third instance fail to spawn regardless of the flavor used? How (if possible) can I get these instances to run properly?
NOTE: I am using an installation created from Packstack on CentOS 8. My machine is a 2-core with 32 GB of RAM and 3 TB of disk space. The OpenStack version is Ussuri.
It seems to me that you do not have enough resources, especially CPU cores. You wrote that your node has only two cores and you had already spawned two VMs with the small flavor, which requires one core each. This No valid host was found error also occurs when no compute host with enough resources for the selected flavor can be found.
You can check this by yourself:
Run openstack hypervisor list to list your hypervisors, and then openstack hypervisor show <ID> with the ID of your hypervisor. In the output you will find vcpus and vcpus_used; vcpus is the maximum number of CPU cores available on the selected compute host. Based on the information in your question, I think both of these values are 2 in your case, which would show that you do not have enough resources for your third VM.
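As a concrete sketch of that check:

# List hypervisors, then inspect the one the instances land on:
openstack hypervisor list
openstack hypervisor show <ID>
# In the output, compare:
#   vcpus      - physical cores available on the compute host
#   vcpus_used - cores already claimed by running instances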

openstack: Failed to launch instance from the glance

We have set up OpenStack using conjure-up on a single machine (Ubuntu Server 16.04.3 LTS). All services are up and running, and I am able to successfully upload images to Glance.
We wanted to save the Glance images created by "glance image-create" on a remote machine that runs an NFS server, so we configured the glance-api.conf file as below.
My glance-api.conf looks like this:
[glance_store]
filesystem_store_datadir = /var/lib/glance/images
default_store = file
On the Glance controller node, I mounted the remote machine's /home/glance/images/ directory at /var/lib/glance/images and referenced the same mount path in the glance-api.conf file.
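For reference, the mount described above might look like this (the remote IP and export path are placeholders taken from the description):

# Mount the NFS export over the Glance image store directory:
sudo mount -t nfs <remote-ip>:/home/glance/images /var/lib/glance/images

# Optional /etc/fstab entry to make the mount persistent:
<remote-ip>:/home/glance/images  /var/lib/glance/images  nfs  defaults  0  0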
I have created two sample private networks (192.168.1.0 and 10.221.50.0) but have not created a public network, since at this moment I don't want to access the VM instance from outside.
When I try to launch an instance from the dashboard UI, as well as through the CLI, I get the error below.
Error: Failed to perform requested operation on instance "Ubuntu_Hawkbit", the instance has an error status: Please try again later [Error: No valid host was found. There are not enough hosts available.].
Note: I have tried associating the instance with a different private network, thinking it might be a network IP address issue, but I face the same error.
When I check the /var/log/nova/nova-compute.log logs, I see the error below.
ERROR nova.image.glance [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] Error contacting glance server 'http://10.206.193.159:9292' for 'data', done trying.
ERROR nova.image.glance CommunicationError: Error finding address for http://10.206.193.159:9292/v1/images/6c30e2ab-1078-45ad-bed2-3e3a75f6af8c: ('Connection aborted.', BadStatusLine("''",))
ERROR nova_lxd.nova.virt.lxd.image [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] [instance: eedc008d-ef34-498d-8774-b3813ce032f4] Failed to upload 6c30e2ab-1078-45ad-bed2-3e3a75f6af8c to LXD: Connection to glance
ERROR nova_lxd.nova.virt.lxd.operations [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] [instance: eedc008d-ef34-498d-8774-b3813ce032f4] Faild to start container instance-00000020: Connection to glance host http://10.206.193.159:9292 failed: Error finding address for http://10.206.193.159:9292/v1/images/6c30e2ab-1078-45ad-bed2-3e3a75f6af8c: ('Connection aborted.', BadStatusLine("''",))
ERROR nova.compute.manager [req-1459f1b2-491c-46a2-b803-6ff621a79d30 6ebc7996240c4ce688234f544c9d0116 07427c9d49704357a049b24193ee0a28 - - -] [instance: eedc008d-ef34-498d-8774-b3813ce032f4] Instance failed to spawn
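The log shows nova-compute failing to reach the Glance API at all, so a quick connectivity check from the compute host may help narrow things down (a diagnostic sketch, not from the original post):

# A healthy Glance API answers its root URL with a JSON list of API versions:
curl -v http://10.206.193.159:9292/

# Confirm glance-api is actually listening on that port on the controller:
ss -tlnp | grep 9292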

Cloudify: "Could not determine the type of file "sftp://root:***@192.168.10.xxx/root/gs-files"."

I am attempting to bootstrap a private OpenStack cloud using Cloudify 2.7.1. It boots the Linux instance correctly but fails at "Uploading files to 192.168.10.XXX." due to an SFTP problem: "Could not determine the type of file "sftp://root:***@192.168.10.xxx/root/gs-files".".
I can access the instance using ssh (there is no problem with the connection). I tried other images (CentOS, Ubuntu, Cirros, ...) but always get the same error.
Can anyone help me, please?
I attached a screenshot of the network topology created by Cloudify, and the stack trace.
Full stack trace:
2015-04-30 10:26:27,470 INFO [org.cloudifysource.shell.commands.AbstractGSCommand] - Setting security profile to "nonsecure".
2015-04-30 10:26:27,589 INFO [org.cloudifysource.shell.commands.AbstractGSCommand] - Bootstrapping cloud openstack-havana. This may take a few minutes.
2015-04-30 10:26:27,677 INFO [org.cloudifysource.esc.driver.provisioning.BaseProvisioningDriver] - Setup network configuration for managers
2015-04-30 10:26:27,677 INFO [org.cloudifysource.esc.driver.provisioning.BaseProvisioningDriver] - Using management network : Cloudify-Management-Network
2015-04-30 10:26:51,536 INFO [org.cloudifysource.esc.shell.listener.CliAgentlessInstallerListener] - Attempting to access Management VM 192.168.10.241.
2015-04-30 10:27:10,551 INFO [org.cloudifysource.esc.shell.listener.CliAgentlessInstallerListener] - Uploading files to 192.168.10.241.
2015-04-30 10:27:15,708 WARNING [com.jcraft.jsch] - Permanently added '192.168.10.241' (RSA) to the list of known hosts.
2015-04-30 10:27:25,998 INFO [org.cloudifysource.esc.shell.installer.CloudGridAgentBootstrapper] - Failed accessing management VM 192.168.10.241 Reason: Failed to set up file transfer: Unknown message with code "Could not determine the type of file "sftp://cirros@192.168.10.241/cirros/gs-files".".; Caused by: org.cloudifysource.esc.installer.InstallerException: Failed to set up file transfer: Unknown message with code "Could not determine the type of file "sftp://cirros@192.168.10.241/cirros/gs-files".".
2015-04-30 10:27:26,210 INFO [org.cloudifysource.esc.driver.provisioning.openstack.OpenStackCloudifyDriver] - Deleting Floating ip: FloatingIp[floatingNetworkId=15578898-5e6b-44d9-a73a-1328ca6ea140,floatingIpAddress=192.168.10.241,portId=4b8dc211-12e8-4383-8799-f783d2786e98,id=593d8424-cfec-41ed-8204-ed8609366416]
2015-04-30 10:27:29,607 SEVERE [org.cloudifysource.shell.commands.AbstractGSCommand] - Failed to set up file transfer: Unknown message with code "Could not determine the type of file "sftp://cirros@192.168.10.241/cirros/gs-files".". : org.cloudifysource.esc.installer.InstallerException: Failed to set up file transfer: Unknown message with code "Could not determine the type of file "sftp://cirros@192.168.10.241/cirros/gs-files".".
at org.cloudifysource.esc.installer.filetransfer.VfsFileTransfer.initialize(VfsFileTransfer.java:206)
at org.cloudifysource.esc.installer.filetransfer.VfsFileTransfer.initialize(VfsFileTransfer.java:206)
at org.cloudifysource.esc.installer.AgentlessInstaller.uploadFilesToServer(AgentlessInstaller.java:306)
at org.cloudifysource.esc.installer.AgentlessInstaller.installOnMachineWithIP(AgentlessInstaller.java:210)
at org.cloudifysource.esc.shell.installer.CloudGridAgentBootstrapper$1.call(CloudGridAgentBootstrapper.java:865)
at org.cloudifysource.esc.shell.installer.CloudGridAgentBootstrapper$1.call(CloudGridAgentBootstrapper.java:860)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.commons.vfs2.FileSystemException: Unknown message with code "Could not determine the type of file "sftp://cirros@192.168.10.241/cirros/gs-files".".
at org.apache.commons.vfs2.provider.sftp.SftpFileObject.refresh(SftpFileObject.java:95)
at org.apache.commons.vfs2.provider.AbstractFileSystem.resolveFile(AbstractFileSystem.java:366)
at org.apache.commons.vfs2.provider.AbstractFileSystem.resolveFile(AbstractFileSystem.java:317)
at org.apache.commons.vfs2.provider.AbstractOriginatingFileProvider.findFile(AbstractOriginatingFileProvider.java:85)
at org.apache.commons.vfs2.provider.AbstractOriginatingFileProvider.findFile(AbstractOriginatingFileProvider.java:65)
at org.apache.commons.vfs2.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:693)
at org.apache.commons.vfs2.impl.DefaultFileSystemManager.resolveFile(DefaultFileSystemManager.java:621)
at org.cloudifysource.esc.installer.filetransfer.VfsFileTransfer.resolveTargetDirectory(VfsFileTransfer.java:218)
at org.cloudifysource.esc.installer.filetransfer.VfsFileTransfer.initialize(VfsFileTransfer.java:203)
... 8 more
Caused by: org.apache.commons.vfs2.FileSystemException: Could not determine the type of file "sftp://cirros@192.168.10.241/cirros/gs-files".
at org.apache.commons.vfs2.provider.AbstractFileObject.getType(AbstractFileObject.java:505)
at org.apache.commons.vfs2.provider.sftp.SftpFileObject.refresh(SftpFileObject.java:91)
... 16 more
Caused by: org.apache.commons.vfs2.FileSystemException: Could not connect to SFTP server at "sftp://cirros@192.168.10.241/".
at org.apache.commons.vfs2.provider.sftp.SftpFileSystem.getChannel(SftpFileSystem.java:153)
at org.apache.commons.vfs2.provider.sftp.SftpFileObject.statSelf(SftpFileObject.java:151)
at org.apache.commons.vfs2.provider.sftp.SftpFileObject.doGetType(SftpFileObject.java:114)
at org.apache.commons.vfs2.provider.AbstractFileObject.getType(AbstractFileObject.java:496)
... 17 more
Caused by: com.jcraft.jsch.JSchException: java.io.IOException: Pipe closed
at com.jcraft.jsch.ChannelSftp.start(ChannelSftp.java:288)
at com.jcraft.jsch.Channel.connect(Channel.java:152)
at com.jcraft.jsch.Channel.connect(Channel.java:145)
at org.apache.commons.vfs2.provider.sftp.SftpFileSystem.getChannel(SftpFileSystem.java:130)
... 20 more
Caused by: java.io.IOException: Pipe closed
at java.io.PipedInputStream.read(PipedInputStream.java:308)
at java.io.PipedInputStream.read(PipedInputStream.java:378)
at com.jcraft.jsch.ChannelSftp.fill(ChannelSftp.java:2665)
at com.jcraft.jsch.ChannelSftp.header(ChannelSftp.java:2691)
at com.jcraft.jsch.ChannelSftp.start(ChannelSftp.java:257)
... 23 more
Looks like you are trying to sftp into a cirros instance - I am not sure cirros even supports sftp. You can try this by using the sftp command line utility.
In general, sftp has to be configured and available on the target machine.
You can try using the SCP file transfer mode by setting this in your compute template:
fileTransfer org.cloudifysource.domain.cloud.FileTransferModes.SCP
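In the Cloudify 2 cloud driver's groovy DSL, this line goes inside the relevant compute template; a rough sketch (the template name and other fields are placeholders, not a complete template):

MANAGER_TEMPLATE : computeTemplate{
    // imageId, hardwareId, remoteDirectory, etc. go here as usual
    fileTransfer org.cloudifysource.domain.cloud.FileTransferModes.SCP
}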
If you are really using cirros, I suspect bootstrapping will fail. Cloudify was never tested on cirros, and cirros lacks some very basic utilities (I think it does not run bash, and I'm not sure it has wget). Cirros was never meant as a generic distribution; it is meant for testing your cloud's basic functionality.
One more thing - Cloudify 2 has reached End-of-Life - it is no longer supported. You should check out Cloudify 3.

GridGain Out of Memory Exception: Unable to create new native thread

I'm trying to create more than 2 instances of GridGain (just by running the shell script) on Red Hat Release 6.5 (Santiago), but I get the following error when I try to run the shell script a third time:
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
at java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1604)
at org.gridgain.grid.kernal.GridGainEx$GridNamedInstance.start0(GridGainEx.java:1507)
at org.gridgain.grid.kernal.GridGainEx$GridNamedInstance.start(GridGainEx.java:1289)
at org.gridgain.grid.kernal.GridGainEx.start0(GridGainEx.java:832)
at org.gridgain.grid.kernal.GridGainEx.start(GridGainEx.java:759)
at org.gridgain.grid.kernal.GridGainEx.start(GridGainEx.java:677)
at org.gridgain.grid.kernal.GridGainEx.start(GridGainEx.java:524)
at org.gridgain.grid.kernal.GridGainEx.start(GridGainEx.java:494)
at org.gridgain.grid.GridGain.start(GridGain.java:314)
at org.gridgain.grid.startup.cmdline.GridCommandLineStartup.main(GridCommandLineStartup.java:293)
I have set ulimit -n 4096, but still no joy.
The box has 64GB of memory, an ample amount to run more than 2 instances of GridGain.
Can anyone help with this error? Are there any configuration changes I can make in Red Hat?
Thanks
Most likely you are running out of the allowed number of user processes. We encountered the same issue on our CentOS servers, and setting ulimit -u 10240 helped.
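For example (a sketch for RHEL/CentOS; the username and limit value are illustrative):

# Check the current per-user process limit:
ulimit -u

# Raise it for the current shell session:
ulimit -u 10240

# To make it persistent, add lines like these to /etc/security/limits.conf
# ("gridgain" being whatever user runs the nodes):
gridgain  soft  nproc  10240
gridgain  hard  nproc  10240

Note that on RHEL/CentOS 6, /etc/security/limits.d/90-nproc.conf caps nproc for non-root users by default, so it may need adjusting as well.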
