Spark - "No space left on device" error - Unix

I am getting the error below. SPARK_LOCAL_DIRS has been set, and the directory has enough space and inodes left.
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at org.apache.spark.storage.TimeTrackingOutputStream.write(TimeTrackingOutputStream.java:58)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at org.xerial.snappy.SnappyOutputStream.dumpOutput(SnappyOutputStream.java:294)
at org.xerial.snappy.SnappyOutputStream.compressInput(SnappyOutputStream.java:306)
at org.xerial.snappy.SnappyOutputStream.rawWrite(SnappyOutputStream.java:245)
at org.xerial.snappy.SnappyOutputStream.write(SnappyOutputStream.java:107)
at org.apache.spark.io.SnappyOutputStreamWrapper.write(CompressionCodec.scala:190)
at org.apache.spark.storage.DiskBlockObjectWriter.write(BlockObjectWriter.scala:218)
at org.apache.spark.util.collection.ChainedBuffer.read(ChainedBuffer.scala:56)
at org.apache.spark.util.collection.PartitionedSerializedPairBuffer$$anon$2.writeNext(PartitionedSerializedPairBuffer.scala:137)
at org.apache.spark.util.collection.ExternalSorter.writePartitionedFile(ExternalSorter.scala:757)
at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:70)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
cat spark-env.sh | grep -i local
export SPARK_LOCAL_DIRS=/var/log/hadoop/spark
Disk usage:
df -h /var/log/hadoop/spark
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/meta 200G 1.1G 199G 1% /var/log/hadoop
Inodes:
df -i /var/log/hadoop/spark
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/mapper/meta 209711104 185 209710919 1% /var/log/hadoop

I also encountered the same issue. To resolve it, I first checked my HDFS disk usage by running hdfs dfsadmin -report.
The Non DFS Used column was above 250 GB. This implied that my logs, tmp, or intermediate data were consuming too much space.
After running du -lh | grep G from the root folder, I figured out that spark/work was consuming over 200 GB.
After looking at the folders inside spark/work, I realized that I had forgotten to comment out a System.out.println statement, and hence the logs were consuming a huge amount of space.
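Roughly, those checks look like the following; the /opt/spark/work path is only an assumption, since the work directory depends on where Spark is installed:
hdfs dfsadmin -report | grep -i "non dfs"      # look at the Non DFS Used figure
du -h --max-depth=1 / 2>/dev/null | grep G     # find the directories eating gigabytes
du -sh /opt/spark/work                         # size of the Spark work directory
rm -rf /opt/spark/work/app-*                   # clear out old application output once confirmed safe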

If you're running Spark on YARN in yarn-cluster mode, then the local dirs used by both the Spark executors and the driver are taken from the YARN config (yarn.nodemanager.local-dirs); spark.local.dir and your env variable are ignored.
If you're running in yarn-client mode, then the executors again use the local dirs configured in the YARN config, but the driver uses the one you specified in your env variable, because in that mode the driver does not run on the YARN cluster.
So try setting that YARN config.
You can find a bit more information in the documentation, and there's even a whole section on running Spark on YARN.
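For reference, a minimal sketch of that setting in yarn-site.xml on each NodeManager; the paths are only illustrative and should point at volumes with plenty of free space:
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data1/yarn/local,/data2/yarn/local</value>
</property>
Outside YARN, spark.local.dir in spark-defaults.conf (or SPARK_LOCAL_DIRS in spark-env.sh) is what takes effect instead.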

Please also check how many inodes are used by Hadoop. If they have all been used up, the generic error is the same - no space left - even though there is still free space.
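For example, against the directory from the question:
df -i /var/log/hadoop/spark                  # an IUse% near 100% means the inodes are exhausted
find /var/log/hadoop -xdev -type f | wc -l   # rough count of the files consuming those inodes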

Related

Cannot delete Cinder volume with error message "image still has watchers"

I run OpenStack Cinder with Ceph as its storage backend. When I tried to delete one of the Cinder volumes, it failed.
So I turned to the rbd commands to troubleshoot this issue; below is the error message printed by the command rbd rm ${pool}/${volume-id}:
rbd: error: image still has watchers
This means the image is still
open or the client using it crashed. Try again after closing/unmapping
it or waiting 30s for the crashed client to timeout.
Then rbd status ${pool}/${volume-id} shows
Watchers:
watcher=172.18.0.1:0/523356342 client.230016780
cookie=94001004445696
I am confused about why the watcher sticks to the volume and prevents it from being deleted. Is there a reason for this, or did I do something wrong?
And how can I delete the volume in this case?
I found a solution to this issue: add the watcher to the blacklist using ceph osd blacklist, after which the volume becomes removable; once it has been deleted, remove the watcher from the blacklist.
Add the watcher to the blacklist:
$ ceph osd blacklist add 172.18.0.1:0/523356342
blacklisting 172.18.0.1:0/523356342
Check the status and delete the volume:
$ rbd status ${pool}/${volume-id}
Watchers: none
$ rbd rm ${pool}/${volume-id}
Removing image: 100% complete...done.
Remove the watcher from the blacklist:
$ ceph osd blacklist rm 172.18.0.1:0/523356342
un-blacklisting 172.18.0.1:0/523356342
That's all, but I am still looking for the root cause.
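If you want to verify the blacklist while doing this, ceph osd blacklist ls lists the current entries; the watcher address from the example above should appear there between the add and the rm:
$ ceph osd blacklist ls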

Docker Per-Container Disk Quota on Bind Mounted Volumes

I am trying to create a simple hosting platform for my clients. I am deploying all of my apps via Docker on a VPS behind nginx-proxy. For WordPress applications I want to be able to limit disk space so that my clients do not use too much and affect other applications. I bind mount all volumes to a single directory so that I can back up easily with cron.
I've changed the storage driver to overlay2 and am on CentOS 7.
[root@my-ip ~]# docker info
Server:
Containers: 12
Running: 12
Paused: 0
Stopped: 0
Images: 11
Server Version: 19.03.1
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
When I run a WordPress container with --storage-opt size=10G I get the following error:
docker: Error response from daemon: --storage-opt is supported only for overlay over xfs with 'pquota' mount option.
This is an example of the bind mount I am using:
-v /DOCKER_VOLUMES/wordpress/appname/www/html:/var/www/html
How do I fix this? Can you please provide a full list of instructions to enable it?
from the Docs:
This (size) will allow to set the container rootfs size to 120G at creation time. This option is only available for the devicemapper, btrfs, overlay2, windowsfilter and zfs graph drivers. For the devicemapper, btrfs, windowsfilter and zfs graph drivers, user cannot pass a size less than the Default BaseFS Size. For the overlay2 storage driver, the size option is only available if the backing fs is xfs and mounted with the pquota mount option. Under these conditions, user can pass any size less than the backing fs size.
So the pquota mount option needs to be enabled on your system.
You can edit the file /etc/default/grub like so (this applies the flags to the root filesystem) and restart your machine:
GRUB_CMDLINE_LINUX_DEFAULT="rootflags=uquota,pquota"
Then try rerunning your command with --storage-opt size=10G.
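The grub flags above only affect the root filesystem. A hedged sketch for checking the current state, and for the case where Docker's data directory sits on its own xfs partition (the device and mount point are assumptions for illustration):
findmnt -no OPTIONS -T /var/lib/docker       # look for pquota/prjquota among the mount options
# /etc/fstab entry for a dedicated xfs partition:
# /dev/sdb1  /var/lib/docker  xfs  defaults,pquota  0 0
# xfs quota options cannot be changed by a simple remount, so unmount and mount again (or reboot),
# restart the docker daemon, then retry the run with --storage-opt size=10G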

Persistence with Bitnami's Wordpress Docker Setup

I'm trying to set up WordPress with this documentation:
https://github.com/bitnami/bitnami-docker-wordpress#mount-host-directories-as-data-volumes-with-docker-compose
My host directories for the volumes look like this in the docker-compose file:
volumes:
- './mariadb_data:/bitnami'
...
volumes:
- './wordpress_data:/bitnami'
When running docker-compose up, the following errors occur:
mariadb_1 | INFO ==> Starting mysqld_safe...
mariadb_1 | Could not open required defaults file: /opt/bitnami/mariadb/conf/my.cnf
mariadb_1 | Fatal error in defaults handling. Program aborted
mariadb_1 | WARNING: Defaults file '/opt/bitnami/mariadb/conf/my.cnf' not found!
mariadb_1 | Could not open required defaults file: /opt/bitnami/mariadb/conf/my.cnf
mariadb_1 | Fatal error in defaults handling. Program aborted
mariadb_1 | WARNING: Defaults file '/opt/bitnami/mariadb/conf/my.cnf' not found!
mariadb_1 | 171105 05:15:41 mysqld_safe Logging to '/opt/bitnami/mariadb/data/200101d1b330.err'.
mariadb_1 | 171105 05:15:41 mysqld_safe Starting mysqld daemon with databases from /opt/bitnami/mariadb/data
mariadb_1 | /opt/bitnami/mariadb/bin/mysqld_safe_helper: Can't create/write to file '/opt/bitnami/mariadb/data/200101d1b330.err' (Errcode: 2 "No such file or directory")
myproject_mariadb_1 exited with code 1
However, if I change my docker-compose file to use named volumes instead of host directories:
volumes:
- 'mariadb_data:/bitnami'
...
volumes:
- 'wordpress_data:/bitnami'
... then docker-compose up works.
If I then stop Docker and revert my docker-compose file to use host directories again, docker-compose up now works, and the host directories are populated correctly.
This solves my problem, but I would like to know why, and whether there is a way to make things work without this workaround.
Check whether bitnami/bitnami-docker-mariadb issue 123 is relevant in your case:
It seems that docker-compose up did not create a container from scratch (with a clean filesystem), but instead reused a preexisting one. I deduce this from the beginning sequence:
Starting mariadb_mariadb_1
Attaching to mariadb_mariadb_1
...
It seems to me that this container, in its previous execution, was started with a volume attached at /bitnami/mariadb. After that, the container was stopped, that volume was detached, and then the container was restarted. It didn't configure anything and just tried to run the mysql server binary. Since we create symbolic links from /opt/bitnami/mariadb pointing to /bitnami/mariadb (the my.cnf file included), that file went missing and the binaries crashed at start time.
Could you please try using the docker-compose file we provide in this repo? If you only modify it to add environment variables, you shouldn't run into these kinds of issues.
As a workaround, just run the following:
docker-compose down -v
docker-compose up
It will remove the MariaDB container, along with any associated volumes, and start from scratch. Bear in mind that you will lose any state you set in the container.

GridGain Out of Memory Exception: Unable to create new native thread

I'm trying to create more than 2 instances of GridGain (just by running the shell script) on Red Hat Enterprise Linux 6.5 (Santiago), but I get the following error when I try to run the shell script a third time:
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:714)
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
at java.util.concurrent.ThreadPoolExecutor.prestartAllCoreThreads(ThreadPoolExecutor.java:1604)
at org.gridgain.grid.kernal.GridGainEx$GridNamedInstance.start0(GridGainEx.java:1507)
at org.gridgain.grid.kernal.GridGainEx$GridNamedInstance.start(GridGainEx.java:1289)
at org.gridgain.grid.kernal.GridGainEx.start0(GridGainEx.java:832)
at org.gridgain.grid.kernal.GridGainEx.start(GridGainEx.java:759)
at org.gridgain.grid.kernal.GridGainEx.start(GridGainEx.java:677)
at org.gridgain.grid.kernal.GridGainEx.start(GridGainEx.java:524)
at org.gridgain.grid.kernal.GridGainEx.start(GridGainEx.java:494)
at org.gridgain.grid.GridGain.start(GridGain.java:314)
at org.gridgain.grid.startup.cmdline.GridCommandLineStartup.main(GridCommandLineStartup.java:293)
I have set ulimit -n 4096 but still no joy.
The box has 64 GB of memory - an ample amount to run more than 2 instances of GridGain.
Can anyone help with this error? Are there any configuration changes I can make in Red Hat?
Thanks
Most likely you are running out of the allowed number of user processes. We encountered the same issue on our CentOS servers, and setting ulimit -u 10240 helped.
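A sketch of how to make that limit survive new logins; the user name and value are illustrative, and on RHEL/CentOS 6 it is also worth checking /etc/security/limits.d/90-nproc.conf, which caps nproc for non-root users:
ulimit -u                         # current per-user process/thread limit
# /etc/security/limits.conf
gridgain   soft   nproc   10240
gridgain   hard   nproc   10240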

Unable to rsync between my server and my Mac

I have a server where I store data from Mac A and Mac B.
I use rsync to keep the files updated between my Macs.
I run the following code unsuccessfully
#!/bin/zsh
# to copy files from my server to my folder
rsync -Pav $Masi:~/private/ ~/Dropbox/Courses/math/
# to copy files from my folder to my server
rsync -Pav ~/Dropbox/Courses/math $Masi:~/private/
I get the following error message
ssh: connect to host port 22: Connection refused
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: unexplained error (code 255) at io.c(600) [receiver=3.0.5]
ssh: connect to host port 22: Connection refused
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.5]
I have SSH keys in place, so the connection should work; I can use scp without problems.
How can I use rsync between my server and one of my Macs?
I used to do a lot of this. I just ran a test; a few suggestions:
Spell out your entire user@host pattern.
Run the ssh connection without the rsync first; you may need to approve the host fingerprint first.
You do not seem to pass a flag to protect extended attributes, which can yield broken files on OS X. If you do not need resource forks you are OK, but most of the time you do need them.
My test case:
$ rsync -Pav ~/Desktop/ me@remote.example.com:~/rsyc-test
In that case, all the files within ~/Desktop were copied to the remote host, into my home dir. Since the directory 'rsyc-test' did not exist, it was created for me. I had a .app on my Desktop; it made it over and, surprisingly, it works. Even some .webloc files made it and appear to work, though I do not trust it.
I would strongly suggest adding the -E flag:
-E, --extended-attributes
Apple specific option to copy extended attributes, resource
forks, and ACLs. Requires at least Mac OS X 10.4 or suitably
patched rsync.
I ran a new test: I moved an Interarchy bookmark to my desktop, and I know for a fact these break if they are copied without resource forks. Running without -E versus with -E, there is a difference of 152 bytes in transferred data. The first file on the remote machine did not work; the second transferred file did work.
I cannot help but notice that one of your paths is ~/Dropbox, so this may not matter at all, since Dropbox, the app, does not currently support resource forks, though I hear there are plans to in the future.
You are also not passing the --delete flag. If your end goal is a mirror of your data, you are not getting that; if your end goal is a backup that continually grows, keeping everything that was ever on the source, the lack of --delete is good.
Other notes:
You can exclude those silly .DS_Store files
--exclude '.DS_Store'
You can also set rsync up in a way to be a true mirror, so you would not need to run your other command, see the man page for details.
My final working command to shove the Desktop of my laptop to a remote machine:
$ rsync -PEav --delete --exclude '.DS_Store' ~/Desktop/ me@remote.example.com:~/rsycn-test
Check "$Masi". Is that the hostname you are trying to reach?
Try the following command to debug it:
rsync -e 'ssh -v' -Pav $Masi:~/private/ ~/Dropbox/Courses/math/
A "Connection refused" usually happens when there is a connection issue to the remote host (e.g. a firewall).
In your case the problem is that the $Masi variable is empty. If it's not a variable, use Masi directly.
As per this error:
ssh: connect to host  port 22: Connection refused
Notice the double space above, after the word "host".
The "connect to host" message doesn't say which host, so you're trying to connect to an empty host name. It sounds like a typo in the host name.
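Either way, a quick check of what the script actually expands (the host below is made up):
echo "host is: '$Masi'"                                # prints an empty string if the variable is unset
Masi=user@server.example.com                           # or hard-code the host in the script
rsync -Pav "$Masi":~/private/ ~/Dropbox/Courses/math/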
