rsync fails intermittently with different error codes on the same command - rsync

I am running rsync on a huge server (hundreds of GB) and got two failures that didn't seem to make sense:
3 Errors selecting input/output files, dirs
and
10 Error in socket I/O
The exact rsync command is rsync -rtlzv --delete. The same command can result in these errors on different runs (before completing). What might cause this?

In my case, the issue was that my target directory was on an external drive, and there is some instability in that drive's connection (I suspect it periodically re-mounts), leading to sudden failures of rsync at whatever step happens to be running when that occurs.
The solution for me is to manually re-mount the drive and try again (I haven't come up with a way to make the drive's mount more stable, unfortunately).
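If the failures are transient like this, a small retry wrapper can help. This is a minimal sketch, assuming hypothetical SRC and DEST paths and relying only on the fact that rsync exits with 0 on success:
# keep retrying until rsync completes cleanly (exit code 0)
# SRC and DEST are placeholders for your actual source and target paths
until rsync -rtlzv --delete "$SRC" "$DEST"; do
    echo "rsync exited with code $?, retrying in 60s..." >&2
    # if the external drive dropped, this is where you would re-mount it
    sleep 60
done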

Related

When using mpirun with R script, should I copy manually file/script on clusters?

I'm trying to understand how openmpi/mpirun handles a script file associated with an external program, in this case an R process (doMPI/Rmpi).
I can't imagine that I have to copy my script onto each host before running something like:
mpirun --prefix /home/randy/openmpi -H clust1,clust2 -n 32 R --slave -f file.R
But apparently it doesn't work until I copy the script 'file.R' onto the cluster nodes and then run mpirun. And when I do that, the results are written on the cluster nodes, whereas I expected them to be returned to the working directory of the localhost.
Is there another way to send an R job from the localhost to multiple hosts, including the script to be evaluated?
Thanks!
I don't think it's surprising that mpirun doesn't know details of how scripts are specified to commands such as "R", but the Open MPI version of mpirun does include the --preload-files option to help in such situations:
--preload-files <files>
Preload the comma separated list of files to the current working
directory of the remote machines where processes will be
launched prior to starting those processes.
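In principle, the invocation would look something like this (an untested sketch that just combines the option with the hosts, paths, and script name from the question):
mpirun --prefix /home/randy/openmpi -H clust1,clust2 -n 32 \
       --preload-files file.R R --slave -f file.R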
Unfortunately, I couldn't get it to work, which may be because I misunderstood something, but I suspect it isn't well tested: few people use that option, since doing parallel computing without a distributed file system is quite painful.
If --preload-files doesn't work for you either, I suggest that you write a little script that calls scp repeatedly to copy the script to the cluster nodes. There are some utilities that do that, but none seem to be very common or popular, which I again think is because most people prefer to use a distributed file system. Another option is to set up an sshfs file system.
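A minimal sketch of such a helper, assuming the node names clust1 and clust2 from the question and that the working directory has the same path on every node:
# copy file.R into the matching working directory on each node before calling mpirun
# (add user@ prefixes or different paths as your setup requires)
for host in clust1 clust2; do
    scp file.R "$host:$(pwd)/" || { echo "copy to $host failed" >&2; exit 1; }
done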

Rsync freezing mid transfer with no warning

Intermittently, I'll run into an issue where my rsync script will simply freeze mid transfer. The freeze may occur while downloading a file or while listing up-to-date files.
I'm running this on my Mac; here's the command:
rsync -vvhrtplHP -e "ssh" --rsync-path="sudo rsync" --filter=". $FILTER" --delete --delete-excluded --log-file="$BACKUP/log" --link-dest="$BACKUP/current/" $CONNECT:$BASE $BACKUP/$DATE/
For example, the console will output the download progress of a file, and stop at an arbitrary percentage and speed. The log doesn't even list the file (probably because it's incomplete).
I've made numerous attempts, and it freezes on different files or steps with no rhyme or reason. Terminal shows the loading icon while it's working, then the output freezes, and after a few seconds the loading icon vanishes.
Any ideas what could be causing this? I'm using rsync 3.1.0 on Mavericks. Could it be a connectivity issue or a system max execution time issue?
I have had rsync freezes in the past, and I recall reading somewhere that it may have to do with rsync having to look for files to link, something that gets increasingly difficult as you accumulate backup upon backup. I suggest you skip --link-dest in the next backup if your disk space allows it (to break the chain, so to speak).
As mentioned in https://serverfault.com/a/207693, you could use the hardlink command afterwards; I haven't tried it yet.
I just had a similar problem while running rsync from a hard disk to a FAT32 USB drive. In my case rsync froze in less than a second and did not respond at all after that.
It turned out the problem was a combination of hard links being used on the hard disk and the FAT32 filesystem on the USB drive, which does not support hard links.
Formatting the USB drive with ext4 solved the problem for me.
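For reference, a sketch of that reformatting step, assuming the USB stick shows up as /dev/sdb1 (a hypothetical device name; check it first, since mkfs erases everything on the partition):
lsblk                      # identify the USB partition (e.g. /dev/sdb1)
sudo umount /dev/sdb1      # unmount it before formatting
sudo mkfs.ext4 /dev/sdb1   # WARNING: destroys all data on the partition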

Update deployed meteor app while running with minimum downtime - best practice

I run my meteor app on EC2 like this: node main.js (in a tmux session)
Here are the steps I use to update my meteor app:
1) meteor bundle app.tgz
2) scp app.tgz EC2-server:/path
3) ssh EC2-server and attach to tmux
4) kill the current meteor-node process by C-c
5) extract app.tgz
6) run "node main.js" of the extracted app.tgz
Is this the standard practice?
I realize forever can be used too, but do I still have to kill the old node process and start a new one every time I update my app? Can the upgrade be more seamless, without killing the Node process?
You can't do this without killing the node process, but I haven't found that really matters. What's actually more annoying is the browser refresh on the client, but there isn't much you can do about that.
First, let's assume the application is already running. We start our app via forever with a script like the one in my answer here. I'd show you my whole upgrade script but it contains all kinds of Edthena-specific stuff, so I'll outline the steps we take below (consolidated into a short sketch after the list):
Build a new bundle. We do this on the server itself, which avoids any missing fibers issues. The bundle file is written to /home/ubuntu/apps/edthena/edthena.tar.gz.
We cd into the /home/ubuntu/apps/edthena directory and rm -rf bundle. That will blow away the files used by the currently running process. Because the server is still running in memory, it will keep executing. However, this step is problematic if your app regularly does uncached disk operations, like reading from the private directory after startup. We don't, and all of the static assets are served by nginx, so I feel safe in doing this. Alternatively, you can move the old bundle directory to something like bundle.old and it should work.
tar xzf edthena.tar.gz
cd bundle/programs/server && npm install
forever restart /home/ubuntu/apps/edthena/bundle/main.js
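Put together, those steps look roughly like this untested sketch, using the Edthena paths from the answer (adjust names and paths for your own app):
# a new bundle has already been built to edthena.tar.gz at this point
cd /home/ubuntu/apps/edthena
rm -rf bundle                  # the running process keeps executing from memory
tar xzf edthena.tar.gz
(cd bundle/programs/server && npm install)
forever restart /home/ubuntu/apps/edthena/bundle/main.js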
There really isn't any downtime with this approach - it just restarts the app in the same way it would if the server threw an exception. Forever also keeps the environment from your original script, so you don't need to specify your environment variables again.
Finally, you can have a look at the log files in your ~/.forever directory. The exact path can be found via forever list.
David's method is better than this one, because there's less downtime when using forever restart compared to forever stop; ...; forever start.
Here's the deploy script spelled out, using the latter technique. In ~/MyApp, I run this bash script:
echo "Meteor bundling..."
meteor bundle myapp.tgz
mkdir ~/myapp.prod 2> /dev/null
cd ~/myapp.prod
forever stop bundle/myapp.js
rm -rf bundle
echo "Unpacking bundle"
tar xzf ~/MyApp/myapp.tgz
mv bundle/main.js bundle/myapp.js
# `pwd` is needed because ./myapp.log would actually create the log in ~/.forever/myapp.log
PORT=3030 ROOT_URL=http://myapp.example.com MONGO_URL=mongodb://localhost:27017/myapp forever -a -l `pwd`/myapp.log start bundle/myapp.js
You're asking about best practices.
I'd recommend mup and cluster
They allow for horizontal scaling, and a bunch of other nice features, while using simple commands and configuration.
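As a rough idea of how simple the commands are (this is the generic Meteor Up workflow, not something from the answer above, and the details vary by mup version):
mup init     # generate the config files in the project's deploy directory
mup setup    # provision the server according to the config
mup deploy   # bundle the app and (re)start it on the server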

Running different versions of the same binary, same file

I have a single binary that can run in server or client mode. It can be used like this:
$ ./a.out --server &
$ ./a.out --client &
They talk to each other, and this is working well. My question is what is the expected behavior when I launch the server:
$ ./a.out --server &
But then I forget to kill it, and go about my development work, editing and building, and running the client:
$ edit client.c
$ make
$ ./a.out --client
^C
<repeat>
Now without the sticky bit set, is my OS (Ubuntu) running two different versions of my binary? Or is it taking a shortcut and using the in-memory instance and therefore ignoring my latest build? Are there any other side effects to this mistake?
make replaces the executable by deleting the original file. However, since the old binary is executing in the background, there is still a reference to it. The file isn't completely deleted until that reference is released (though its directory entry is removed to make way for the new executable).
So, in your example there are two versions of the program running. One side effect shows up if you make changes that cause a major incompatibility between your server and client code, such as changes in packet structures: you'll probably see weird, unexplainable behavior, crashes, etc. It's always a good idea to kill the background server and re-run your entire test.
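On Linux (Ubuntu included) you can see this for yourself; a quick check, sketched below, shows the old, now-deleted binary still backing the running server:
./a.out --server &
SERVER_PID=$!
make                           # relinks a.out, unlinking the old file
ls -l /proc/$SERVER_PID/exe    # the link target shows the old binary marked "(deleted)"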
If you do not change the server code, then just copy your a.out to, e.g., 'my_server' and run it as my_server --server. make will replace a.out, but not my_server.
Another way: tell make to kill all running a.out processes whenever you recompile. Add a target 'all' (it must be the first target in the makefile) that depends on a.out and executes 'killall a.out'.
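A sketch of that makefile arrangement (recipe lines must start with a tab; the compile rule and source file names are just an illustration):
all: a.out
	-killall a.out        # leading '-' tells make to ignore the error when nothing is running

a.out: client.c server.c
	cc -o a.out client.c server.c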

Does unix SCP return an error code for network issues

In a script, I am using scp to copy a huge file to another host.
scp -qrp hugefile.txt /opt/perf05/tmp
However, we have noticed that this file is not being copied. I suspect this is because we are losing the connection while copying this huge file across the network. Does the scp command return an error code in case of such a network disconnect, or is there any other way of debugging this to find out what exactly causes the failed copy? Thanks in advance.
-Steve
scp -v
gives debugging information.
scp does return non-zero (>1) when the connection fails or any other network error occurs.
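So in a script you can check the exit status and react to it; a minimal sketch (the remote host name perf05 is a placeholder, since the command in the question copies locally):
scp -qrp hugefile.txt perf05:/opt/perf05/tmp
status=$?
if [ "$status" -ne 0 ]; then
    echo "scp failed with exit code $status" >&2   # log, retry, or alert here
fi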
