GridGain Out of Memory Exception: Unable to create new native thread - out-of-memory

I'm trying to start more than two instances of GridGain (just by running the shell script) on Red Hat release 6.5 (Santiago), but I get the following error when I run the shell script a third time:
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
at java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1604)
at org.gridgain.grid.kernal.GridGainEx$GridNamedInstance.start0(GridGainEx.java:1507)
at org.gridgain.grid.kernal.GridGainEx$GridNamedInstance.start(GridGainEx.java:1289)
at org.gridgain.grid.kernal.GridGainEx.start0(GridGainEx.java:832)
at org.gridgain.grid.kernal.GridGainEx.start(GridGainEx.java:759)
at org.gridgain.grid.kernal.GridGainEx.start(GridGainEx.java:677)
at org.gridgain.grid.kernal.GridGainEx.start(GridGainEx.java:524)
at org.gridgain.grid.kernal.GridGainEx.start(GridGainEx.java:494)
at org.gridgain.grid.GridGain.start(GridGain.java:314)
at org.gridgain.grid.startup.cmdline.GridCommandLineStartup.main(GridCommandLineStartup.java:293)
I have set ulimit -n 4096, but still no joy.
The box has 64 GB of memory, which is ample to run more than two instances of GridGain.
Can anyone help with this error? Are there any configuration changes I can make in Red Hat?
Thanks

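Most likely you are running out of the allowed number of user processes. On Linux every thread counts against that limit, so "unable to create new native thread" can appear even with plenty of free memory, and ulimit -n (open files) does not affect it. We encountered the same issue on our CentOS servers, and setting ulimit -u 10240 helped.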
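To see which limit is actually in effect for a process, the two limits mentioned above can be read with Python's standard resource module. This is only a minimal sketch, assuming a Linux box; the helper name describe_limit is made up for illustration and is not part of GridGain or any library.

```python
import resource


def describe_limit(name, rlimit):
    """Return 'name: soft=<n> hard=<n>' for one resource limit (hypothetical helper)."""
    soft, hard = resource.getrlimit(rlimit)

    def fmt(value):
        # RLIM_INFINITY is how the kernel reports "unlimited".
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"{name}: soft={fmt(soft)} hard={fmt(hard)}"


if __name__ == "__main__":
    # RLIMIT_NPROC is what `ulimit -u` shows; on Linux threads count against it.
    print(describe_limit("max user processes (ulimit -u)", resource.RLIMIT_NPROC))
    # RLIMIT_NOFILE is what `ulimit -n` shows; it does not limit thread creation.
    print(describe_limit("open files (ulimit -n)", resource.RLIMIT_NOFILE))
```

Run it as the same user, and from the same kind of shell, that launches the GridGain script, since limits are inherited per session.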

Cstack_info() output different between Rstudio Server and Rstudio Desktop on Ubuntu 20.04LTS

I am having trouble getting rid of the CStack limit when running my code.
I managed to get rid of the error by appending
* hard stack unlimited
* soft stack unlimited
* soft memlock unlimited
* hard memlock unlimited
root soft stack unlimited
root hard stack unlimited
root soft memlock unlimited
root hard memlock unlimited
to /etc/security/limits.conf which fixes the problem on RStudio Desktop.
I get the following output from running Cstack_info()
> Cstack_info()
size current direction eval_depth
NA NA 1 2
This is the output from ulimit -s on the desktop terminal
coolshades@coolshades-ws:~$ ulimit -s
unlimited
Code runs perfectly on RStudio Desktop.
On the same machine, I also am running RStudio Server (free) to run code remotely. It would seem that these settings are not sticking when running RStudio Server.
This is the output from Cstack_info() on the RStudio Server
> Cstack_info()
size current direction eval_depth
7969177 26336 1 2
This is the ulimit output from terminal on the RStudio Server
coolshades@coolshades-ws:~$ ulimit -s
8192
I am able to change the limit back to unlimited with ulimit -s unlimited. But it will only kick in after Rsession is restarted. However, when I restart the R session, the output of ulimit -s reverts back to 8192.
I am out of ideas as to how best to tackle this problem and hope a more experienced RStudio Server user will be able to advise on this matter.

I have solved this problem.
I had to make the following changes:
sudo nano /etc/systemd/user.conf and add DefaultLimitSTACK=134217728
sudo nano /etc/systemd/system.conf and add DefaultLimitSTACK=134217728
Make sure the number you define is a power of 2; otherwise Ubuntu fails to log in for some reason.
I have 128 GB of RAM, so I set my limit to 2^27 (134217728).
Hope this helps someone with the same problem.
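The arithmetic behind that value, and the limit a running session actually received, can be checked with a short Python sketch. It assumes the systemd setting is read as a byte count (so 2^27 would be 128 MiB); the helper names are made up for illustration. Note that ulimit -s prints KiB, while getrlimit returns bytes.

```python
import resource


def is_power_of_two(n):
    """True if n is a positive power of two (exactly one bit set)."""
    return n > 0 and (n & (n - 1)) == 0


def stack_limit_bytes():
    """Soft stack limit of this process in bytes, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    return None if soft == resource.RLIM_INFINITY else soft


if __name__ == "__main__":
    value = 2 ** 27
    print(value, is_power_of_two(value), value // (1024 * 1024), "MiB")
    limit = stack_limit_bytes()
    print("current soft stack limit:", "unlimited" if limit is None else limit)
```

Running the script from a terminal inside RStudio Server and again from a desktop terminal shows whether the two sessions were started with different stack limits.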

Stress-ng - Overload Memory

I want to test the system's reaction to a process that tries to consume more memory than is available.
I run stress-ng with the following command (on a machine with 6 GB of RAM):
stress-ng --vm-bytes 8G --vm-keep -m 1 --aggressive
but I get this error:
stress-ng: error: [5035] stress-ng-vm: gave up trying to mmap, no available memory
Is it possible to force the program to ignore its own safety mechanism?

Try adding the parameter --vm 4.
I was having the same problem, and it went away after that.

502 Gitlab is taking too much time to respond

Every day, after the GitLab backup runs, GitLab starts returning a 502 error.
I looked at the nginx logs but did not find much information.
After gitlab-ctl restart it works again.
System configuration:
OS: Ubuntu 16.04 LTS
RAM: 4 GB
Disk space: 200 GB
Can anyone suggest a permanent solution?

It has most likely run out of shared memory, since you get the 502 error each time after the backup.
To check, run gitlab-ctl tail and look at the output.
It will show something like:
2019-04-12_12:37:17.27154 FATAL: could not map anonymous shared memory: Cannot allocate memory
2019-04-12_12:37:17.27157 HINT: This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space, or huge pages. To reduce the request size (currently 4345470976 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
2019-04-12_12:37:17.27171 LOG: database system is shut down
Then check with free -m, which shows that no shared memory is available:
             total       used       free     shared    buffers     cached
Mem:         16081      13715       2365          0        104        753
-/+ buffers/cache:      12857       3223
Then check whether some process is holding too much shared memory, or whether there are too many zombie processes, and kill them with a command like ps -aef | grep ffmpeg | awk '{print $2}' | xargs kill -9
Check again with free -h; there is now about 112M of shared memory:
             total       used       free     shared    buffers     cached
Mem:           15G       4.4G        11G       112M        46M       416M
-/+ buffers/cache:       3.9G        11G
Swap:           0B         0B         0B
Finally, restart GitLab with gitlab-ctl restart. After some time GitLab finishes booting and the 502 is gone.
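The same figures can be collected by a script, for example from the backup cron job, by reading /proc/meminfo, which is where free gets its numbers on Linux. This is a rough sketch only; parse_meminfo is a made-up helper name and the fields printed are just a suggestion.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: value} dict (values mostly in kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Field: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        info = parse_meminfo(fh.read())
    # Shmem corresponds to the "shared" column printed by free.
    for key in ("MemTotal", "MemFree", "Shmem", "SwapTotal"):
        if key in info:
            print(f"{key}: {info[key] // 1024} MiB")
```

Logging this output just before and just after the backup would show whether memory really is exhausted at the moment the 502 appears.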

After a long search I found something. After the backup, my gitlab-workhorse goes idle and gitlab.socket refuses connections. As a temporary solution I have added a cron job that restarts the GitLab service after the GitLab backup cron job completes.

If GitLab is installed in VirtualBox on Ubuntu Server 18.04 or 20.04, please increase the RAM to 4 GB and provide at least 3 processors.

Spark - No Space Left on device error

I am getting the error below. SPARK_LOCAL_DIRS has been set, and the directory has enough space and inodes left.
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:58)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at org.xerial.snappy.SnappyOutputStream.dumpOutput(SnappyOutputStream.java:294)
at org.xerial.snappy.SnappyOutputStream.compressInput(SnappyOutputStream.java:306)
at org.xerial.snappy.SnappyOutputStream.rawWrite(SnappyOutputStream.java:245)
at org.xerial.snappy.SnappyOutputStream.write(SnappyOutputStream.java:107)
at org.apache.spark.io.SnappyOutputStreamWrapper.write(CompressionCodec.scala:190)
at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:218)
at org.apache.spark.util.collection.ChainedBuffer.read(ChainedBuffer.scala:56)
at org.apache.spark.util.collection.PartitionedSerializedPairBuffer$$anon$2.writeNext(PartitionedSerializedPairBuffer.scala:137)
at org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:757)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
cat spark-env.sh |grep -i local
export SPARK_LOCAL_DIRS=/var/log/hadoop/spark
Disk usage:
df -h /var/log/hadoop/spark
Filesystem        Size  Used Avail Use% Mounted on
/dev/mapper/meta  200G  1.1G  199G   1% /var/log/hadoop
Inodes:
df -i /var/log/hadoop/spark
Filesystem           Inodes IUsed     IFree IUse% Mounted on
/dev/mapper/meta  209711104   185 209710919    1% /var/log/hadoop

I also encountered the same issue. To resolve it, I first checked my HDFS disk usage by running hdfs dfsadmin -report.
The Non DFS Used value was above 250 GB, which implied that logs, tmp files, or intermediate data were consuming too much space.
After running du -lh | grep G from the root folder, I found that spark/work was consuming over 200 GB.
Looking at the folders inside spark/work, I realized that I had forgotten to comment out a System.out.println statement, so the logs were taking up a lot of space.

If you're running on YARN in yarn-cluster mode, the local dirs used by both the Spark executors and the driver are taken from the YARN config (yarn.nodemanager.local-dirs). spark.local.dir and your env variable are ignored.
If you're running on YARN in yarn-client mode, the executors again use the local dirs configured in the YARN config, but the driver uses the one you specified in your env variable, because in that mode the driver does not run on the YARN cluster.
So try setting that config.
You can find a bit more information in the documentation.
There's even a whole section on running Spark on YARN.

Please check how many inodes Hadoop has used. If they are all gone, you get the same generic "No space left on device" error even though there is still free space.
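Both checks (free bytes and free inodes) can be run in one go against every directory Spark might really be writing to, which on YARN means the yarn.nodemanager.local-dirs paths rather than SPARK_LOCAL_DIRS. A minimal sketch using Python's os.statvfs follows; the function name fs_usage and the /tmp fallback path are assumptions for illustration.

```python
import os
import sys


def fs_usage(path):
    """Return free/total bytes and inodes for the filesystem that holds path."""
    st = os.statvfs(path)
    return {
        "free_bytes": st.f_bavail * st.f_frsize,   # space available to non-root users
        "total_bytes": st.f_blocks * st.f_frsize,
        "free_inodes": st.f_favail,                # inodes available to non-root users
        "total_inodes": st.f_files,
    }


if __name__ == "__main__":
    # Pass the candidate local dirs as arguments; /tmp is only a fallback example.
    for path in sys.argv[1:] or ["/tmp"]:
        usage = fs_usage(path)
        print(path)
        print(f"  free space : {usage['free_bytes'] / 2**30:.1f} GiB of {usage['total_bytes'] / 2**30:.1f} GiB")
        print(f"  free inodes: {usage['free_inodes']} of {usage['total_inodes']}")
```

Run it on the worker nodes rather than only on the machine that submits the job, since the shuffle files in the stack trace are written by the executors.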

Installation of Riak under Ubuntu 14.04 LTS

I can't get Riak to work on Ubuntu 14.04 LTS using the bash instructions at
http://docs.basho.com/riak/latest/ops/building/installing/debian-ubuntu/.
When running riak start I get:
riak failed to start within 15 seconds,
see the output of 'riak console' for more information.
If you want to wait longer, set the environment variable
WAIT_FOR_ERLANG to the number of seconds to wait.
When running riak console afterwards:
Exec: /usr/lib/riak/erts-5.10.3/bin/erlexec -boot /usr/lib/riak/releases/2.1.3/riak -config /var/lib/riak/generated.configs/app.2016.02.28.21.43.04.config -args_file /var/lib/riak/generated.configs/vm.2016.02.28.21.43.04.args -vm_args /var/lib/riak/generated.configs/vm.2016.02.28.21.43.04.args -pa /usr/lib/riak/lib/basho-patches -- console -x
Root: /usr/lib/riak
Erlang R16B02_basho8 (erts-5.10.3) [source] [64-bit] [smp:2:2] [async-threads:64] [kernel-poll:true] [frame-pointer]
[os_mon] memory supervisor port (memsup): Erlang has closed
[os_mon] cpu supervisor port (cpu_sup): Erlang has closed
{"Kernel pid terminated",application_controller,"{application_start_failure,riak_core,{bad_return,{{riak_core_app,start,[normal,[]]},{'EXIT',{{function_clause,[{orddict,fetch,['riak@127.0.0.1',[{'riak@54.194.69.48',[{{riak_core,bucket_types},[true,false]},{{riak_core,fold_req_version},[v2,v1]},{{riak_core,net_ticktime},[true,false]},{{riak_core,resizable_ring},[true,false]},{{riak_core,security},[true,false]},{{riak_core,staged_joins},[true,false]},{{riak_core,vnode_routing},[proxy,legacy]},{{riak_pipe,trace_format},[ordsets,sets]}]}]],[{file,\"orddict.erl\"},{line,72}]},{riak_core_capability,renegotiate_capabilities,1,[{file,\"src/riak_core_capability.erl\"},{line,441}]},{riak_core_capability,handle_call,3,[{file,\"src/riak_core_capability.erl\"},{line,213}]},{gen_server,handle_msg,5,[{file,\"gen_server.erl\"},{line,585}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]},{gen_server,call,[riak_core_capability,{register,{riak_core,vnode_routing},{capability,[proxy,legacy],legacy,{riak_core,legacy_vnode_routing,[{true,legacy},{false,proxy}]}}},infinity]}}}}}}"}
Any idea how to fix this? Installation has been done via apt-get. Default riak.conf. Riak version is 2.1.3.

This is a Riak error, not at all related to Ubuntu.
The error message indicates that the current name of the node does not match the name of any node in the ring file. This can happen if you start the node with a default configuration before configuring the node's name. See the note on changing the name value at http://docs.basho.com/riak/latest/ops/building/basic-cluster-setup/
If this is a singleton node, the simplest solution is to delete the files in /var/lib/riak/ring (make a backup first). A new ring file will be created when you start the node.
