Resume Rsnapshot to same drive - rsync

Sometimes, on a large rsync using Rsnapshot, the NFS mount we are syncing to will drop.
Then when you run:
rsnapshot monthly
to resume it, it will behave as if this were a brand new run, rotating monthly.0 to monthly.1 and so on.
Is there a way to resume the rsync using rsnapshot monthly if something gets interrupted, without starting a brand new backup?

The answer is no, not really, but sort of. rsnapshot runs as a batch job, generally triggered from cron. It does not keep any state between runs apart from the backups themselves. If your NFS mount goes away mid-backup, after a while you'll get some sort of I/O error, rsnapshot will give up and die with an error, and the backup will fail. The next time it runs after the failure, it will start the backup as if from scratch.
However, if you use the sync_first config option, and not the link_dest option, rsnapshot will do a better job of recovery. It will leave the files it has already transferred in place and won't have to transfer them again, though it will still have to check that the source and destination match in the usual rsync way. The man page gives some detail on this.
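For example, a sketch of what that looks like, assuming monthly is your lowest (most frequent) backup level and an otherwise standard rsnapshot.conf (fields in the config file must be separated by tabs):
# in rsnapshot.conf
sync_first	1
# the transfer and the rotation then become separate steps
rsnapshot sync      # does the rsync into the .sync/ directory; safe to re-run after a failure
rsnapshot monthly   # only rotates the monthly.N directories and pulls in the already-synced data; no new transfer happens here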
This is not the case with the link_dest method, which for most errors removes the work it has done and starts again. Specifically, on error link_dest does a "rollback" like this:
ERROR: /usr/bin/rsync returned 255 while processing sam#localhost:..
WARNING: Rolling back "localhost/"
/bin/rm -rf /tmp/rs-test/backups/hourly.0/localhost/
/bin/cp -al /tmp/rs-test/backups/hourly.1/localhost \
/tmp/rs-test/backups/hourly.0/localhost
touch /tmp/rs-test/backups/hourly.0/
rm -f /tmp/rs-test/rsnapshot-test.lock
If you're not using it already, and you can use it (apparently some non-UNIX systems can't), use the sync_first method.

I don't use rsnapshot, but I've written my own equivalent wrapper over rsync [in perl] and I do delta backups, so I've encountered similar problems.
To solve the problem of creating "false whole backups", the basic idea, sketched as a shell loop, is:
while true; do
    rsnapshot remote_whatever/tmp
    if [ $? -eq 0 ]; then          # rsnapshot_was_okay
        mv remote_whatever/tmp remote_whatever/monthly_whatever
        break
    fi
done
The above mv should be atomic, even over NFS.
I just answered a similar question here: https://serverfault.com/questions/741346/rsync-directory-so-all-changes-appear-atomically/741420#741420 that has more details

Related

fwrite(): write of XX bytes failed with errno=5 Input/output error

I had two similar questions before; however, after more debugging I came to the conclusion that the problem was (probably) not in my own code.
In my code I am trying to unzip a gzipped file; for this I wrote a small method:
<?php
namespace App\Helpers;
class Gzip
{
    public static function unzip($filePath)
    {
        $outFilePath = str_replace('.gz', '', $filePath);
        // Open our files (in binary mode)
        $file = gzopen($filePath, 'rb');
        $outFile = fopen($outFilePath, 'wb');
        // Keep repeating until the end of the input file
        while (!gzeof($file)) {
            // Read buffer-size bytes
            // Both fwrite and gzread are binary-safe
            fwrite($outFile, gzread($file, 4096));
        }
        // Files are done, close files
        fclose($outFile);
        gzclose($file);
    }
}
This should result in the unzipped file:
Gzip::unzip('path/to/file.csv.gz');
This is where it gets tricky: sometimes it will unzip the file and sometimes it will throw the errno=5 exception from the title (keep in mind that this has nothing to do with the StreamHandler itself; this is a pure input/output error problem).
I can refresh the page as many times as I want, but nothing will change. If I try the gunzip command on the command line, it fails with sort of the same error.
Which file I am unzipping does not matter; it randomly happens to a random file.
It also won't matter if I run the gunzip command multiple times, but like I said, these exceptions/errors happen randomly, so they also randomly "fix" themselves.
The application is written in Laravel 8.0 on PHP 7.4, running in a Homestead environment (Ubuntu 18.04.5 LTS); my host laptop runs Windows 10.
To me it's super weird that this exception/error happens randomly and also randomly, out of nowhere, "fixes" itself, so my question is: how does this happen, why does this happen, and ultimately how can I fix it?
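A hedged sketch (not the original class; unzipChecked is a made-up name, for debugging only) of the same loop with return-value checks, which would at least show whether gzopen, gzread or fwrite is the call that hits the I/O error:
<?php
// Hypothetical helper mirroring Gzip::unzip, but checking each call.
function unzipChecked(string $filePath): void
{
    $outFilePath = str_replace('.gz', '', $filePath);
    $in = gzopen($filePath, 'rb');
    if ($in === false) {
        throw new \RuntimeException("gzopen failed for $filePath");
    }
    $out = fopen($outFilePath, 'wb');
    if ($out === false) {
        gzclose($in);
        throw new \RuntimeException("fopen failed for $outFilePath");
    }
    while (!gzeof($in)) {
        $chunk = gzread($in, 4096);
        if ($chunk === false) {
            throw new \RuntimeException('gzread failed');
        }
        if (fwrite($out, $chunk) === false) {
            throw new \RuntimeException('fwrite failed (I/O error on the shared mount?)');
        }
    }
    fclose($out);
    gzclose($in);
}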
errno=5 Input/output error is a failure to read/write on the Linux file system.
On a real server, you would need to check the disk with fsck, etc.
With Homestead running on Windows, I think we should look at the Windows 10 Homestead errno=5 issue.
winnfsd - https://github.com/winnfsd/vagrant-winnfsd/issues/96#issuecomment-336105685
If your Vagrant on Windows is using VirtualBox, Hyper-V can cause this.
Try disabling Hyper-V for VirtualBox from PowerShell, then reboot Windows:
Disable-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V-Hypervisor
The problem lay in me using Homestead (a Vagrant box) with NFS turned on; Vagrant + NFS + Windows = problems. There are many possible solutions to the problem; most exceptions regarding errno=5 come down to NFS + Vagrant.
The solution for me was to stop using NFS. For now this will be the accepted answer, as it fixes my problem. However, if someone manages to find an actual solution to this error, I will accept that.
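For reference, NFS is enabled per shared folder in Homestead.yaml; a minimal sketch (the paths are placeholders) of the mapping whose type: "nfs" line would need to be removed to fall back to the default shared-folder driver:
folders:
    - map: ~/code/project
      to: /home/vagrant/project
      type: "nfs"    # removing this line disables NFS for this share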
I solved it and I keep using NFS.
This usually happens to me when I'm dropping a database schema, or creating a new one and filling it with a lot of data, on my VirtualBox VM. I'm talking about volumes of around 50 megabytes. I guess this is enough for VirtualBox to start resizing the virtual hard disk, which drives Ubuntu crazy and causes kernel panics.
So the solution is to reboot Vagrant as many times as it takes for it to fix the issue.
That is what usually works for me:
run vagrant halt - it will complete with errors
then vagrant up - it probably won't work
then vagrant halt - it will probably complete with errors again
then vagrant up --provision - it will take time and probably also give errors
then vagrant halt - it should work this time
then vagrant up --provision - because "why not provision it again", and that is usually enough.
In 9 out of 10 cases this is enough. When it is not enough, I just create a new Homestead box.

How does execlp work exactly?

So I am looking at code my professor handed out to give us an idea of how to implement >, <, and | support in our Unix shell. I ran his code and was amazed at what actually happened.
if( pid == 0 )
{
    close(1);                        // close
    fd = creat( "userlist", 0644 );  // then open
    execlp( "who", "who", NULL );    // and run
    perror( "execlp" );
    exit(1);
}
This created a userlist file in the directory I was currently in, with the "who" data inside that file. I don't see where any connection between fd and execlp is being made. How did execlp manage to put the information into userlist? How did execlp even know userlist existed?
Read Advanced Linux Programming. It has several chapters related to the issue. And we cannot explain all this in a few sentences. See also the standard stream and process wikipages.
First, all the system calls (see syscalls(2) for a list, and read the documentation of every individual system call that you are using) your program makes should be tested against failure. But assume they all succeed. After close(1); the file descriptor 1 (STDOUT_FILENO) is free. So creat("userlist",0644) is likely to re-use it, hence fd is 1; you have redirected your stdout to the newly created userlist file.
At last, you are calling execlp(3) which will call execve(2). When successful, your entire process is restarted with the new executable (so a fresh virtual address space is given to it), and its stdout is still the userlist file descriptor. In particular (unless execve fails) the perror call is not reached.
So your code does a bit of what a shell running who > userlist does: it redirects stdout to userlist and then runs the who command.
If you are coding a shell, use strace(1), notably with the -f option, to understand what system calls are done. Try also strace -f /bin/sh -c ls to look into the behavior of a shell. Study also the source code of existing free software shells (e.g. bash and sash).
See also this and the references I gave there.
execlp knows nothing. Before execing, stdout was closed and a file opened, so the new descriptor is the one corresponding to stdout (open always returns the lowest free descriptor). At that point the process has a "stdout" plugged into the file. Then exec is called and this replaces the whole address space, but some properties remain, such as the descriptors, so now the code of who is executed with a stdout that corresponds to the file. This is the way redirections are managed by shells.
Remember that when you use printf (for example) you never specify what stdout exactly is... That can be a file, a terminal, etc.
Basile Starynkevitch correctly explained:
After close(1); the file descriptor 1 (STDOUT_FILENO) is free. So creat("userlist",0644) is likely to re-use it…
This is because, as Jean-Baptiste Yunès wrote, "opens always returns the lowest free descriptor".
It should be stressed that the professor's code only likely works; it fails if file descriptor 0 is closed.
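A hedged sketch (not the professor's code) of the same redirection made explicit with dup2, so it does not rely on descriptor 1 happening to be the lowest free one:
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
int main(void)
{
    pid_t pid = fork();
    if (pid == 0)
    {
        /* Open the output file; fd may be any free descriptor number. */
        int fd = open("userlist", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); exit(1); }
        /* Explicitly make descriptor 1 (stdout) refer to the file. */
        if (dup2(fd, STDOUT_FILENO) < 0) { perror("dup2"); exit(1); }
        close(fd);
        execlp("who", "who", (char *)NULL); /* who inherits the redirected stdout */
        perror("execlp");
        exit(1);
    }
    wait(NULL); /* parent waits for the child */
    return 0;
}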

Phabricator Making Assumptions About Environment

I am attempting to get Phabricator running on Solaris over Apache. The website is working, but all of the CLI scripts are not, for example phd.
The first problem is that it is not passing arguments to the underlying manage_daemons.php script that it invokes. Looking at the phd file, this does not surprise me:
$> cat phd
../scripts/daemon/manage_daemons.php
Now, given my default shell is bash, this isn't going to pass through my arguments. To make it do so, I have modified the script:
#! /bin/bash
../scripts/daemon/manage_daemons.php $*
This will now pass through the arguments, but it's now failing to find transitive scripts that it requires via relative paths:
./phd start
Preparing to launch daemons.
NOTE: Logs will appear in '/var/tmp/phd/log/daemons.log'.
Launching daemon "PhabricatorRepositoryPullLocalDaemon".
[2014-05-09 19:29:59] EXCEPTION: (CommandException) Command failed with error #127!
COMMAND
exec ./phd-daemon 'PhabricatorRepositoryPullLocalDaemon' --daemonize --log='/var/tmp/phd/log/daemons.log' --phd='/var/tmp/phd/pid'
STDOUT
(empty)
STDERR
./phd-daemon: line 1: launch_daemon.php: not found
at [/XXX/XXX/libphutil/src/future/exec/ExecFuture.php:398]
#0 ExecFuture::resolvex() called at [/XXX/XXX/phabricator/src/applications/daemon/management/PhabricatorDaemonManagementWorkflow.php:167]
#1 PhabricatorDaemonManagementWorkflow::launchDaemon(PhabricatorRepositoryPullLocalDaemon, Array , false) called at [/XXX/XXX/phabricator/src/applications/daemon/management/PhabricatorDaemonManagementWorkflow.php:246]
#2 PhabricatorDaemonManagementWorkflow::executeStartCommand() called at [/XXX/XXX/phabricator/src/applications/daemon/management/PhabricatorDaemonManagementStartWorkflow.php:18]
#3 PhabricatorDaemonManagementStartWorkflow::execute(Object PhutilArgumentParser) called at [/XXX/XXX/libphutil/src/parser/argument/PhutilArgumentParser.php:396]
#4 PhutilArgumentParser::parseWorkflowsFull(Array of size 9 starting with: { 0 => Object PhabricatorDaemonManagementListWorkflow }) called at [/XXX/XXX/libphutil/src/parser/argument/PhutilArgumentParser.php:292]
#5 PhutilArgumentParser::parseWorkflows(Array of size 9 starting with: { 0 => Object PhabricatorDaemonManagementListWorkflow }) called at [/XXX/XXX/phabricator/scripts/daemon/manage_daemons.php:30]
Note I have obscured my paths with XXX as they give away sensitive information.
Now, obviously I shouldn't be modifying these scripts. This is an indication that some prerequisite is not set up properly.
It's clear to me that Phabricator is making some (bold) assumption about my setup. But I'm not quite sure what...?
These are supposed to be symlinks. For example, if you look at "phd" in the repository on GitHub, you can see that the file type is "symbolic link":
https://github.com/facebook/phabricator/blob/master/bin/phd
Something in your environment is incorrectly turning the symlinks into normal files. I'm not aware of any Git configuration which can cause this, although it's possible there is something. One situation where I've seen this happen is when a working copy was cloned, then copied using something like rsync without appropriate flags to preserve symlinks.
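A quick way to check, and a hedged sketch of restoring the link by hand (the relative target below is taken from the phd file contents shown in the question); re-cloning or re-copying the working copy with symlinks preserved is the cleaner fix:
# A healthy bin/phd is a symlink:
ls -l bin/phd
# ... bin/phd -> ../scripts/daemon/manage_daemons.php
# If it has become a regular file, recreate the link:
rm bin/phd
ln -s ../scripts/daemon/manage_daemons.php bin/phd
# When copying a working copy with rsync, use flags that preserve symlinks, e.g. rsync -a.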

RHadoop Stream Job Fail with Apache Oozie

I'm really just looking to pick the community's brain for some leads in figuring out what is going on with the issue I'm having.
I'm writing an MR job with RHadoop (rmr2, v3.0.0) and things are great -- IO with HDFS, mapping, reducing. No problems. Life is great.
I'm trying to schedule the job with Apache Oozie, and am running into some issues:
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 1
I've read the rmr2 debugging guide, but nothing is really getting to the stderr because the job fails before anything even gets scheduled.
In my head, everything points to a difference in environments. However, Oozie is running the job as the same user that I'm able to run everything with via the CLI, and all of the R environment variables (fetched with Sys.getenv()) are the same, except that there's some additional classpath stuff set by Oozie.
I can post more of the OS or Hadoop versions and config details, but sleuthing some version-specific bugs seems like a bit of a red herring as everything runs fine at the command line.
Anybody have any thoughts what might be some helpful next steps in hunting this beast down?
UPDATE:
I overwrote the system function in the base package to log the user, the host name of the node, and the command being executed before the internal call to system. So before any system call is actually executed, I get something like the following in the stderr:
user#host.name
/usr/bin/hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming-2.2.0.2.0.6.0-102.jar ...
When run with Oozie, the command printed to stderr fails with an exit status of 1. When I run the command on user#host.name, it runs successfully. So essentially the EXACT same command with the SAME user on the SAME node fails with Oozie, but runs successfully from the CLI.

SVN - SQLite - disk I/O error

When trying to commit to my SVN repository, I got the following error:
Working copy 'Z:\prace-pj\projects\other\CopyRT' locked.
So I ran the cleanup command and then the commit succeeded, but at the end of the response message there was the following error:
Error bumping revisions post-commit (details follow):
disk I/O error, executing statement 'RELEASE s11'
Now when I try to e.g. update the working copy, it says that it is still locked. When I clean up and try to update again, I get an error like this:
disk I/O error, executing statement 'RELEASE s2'
sqlite: disk I/O error
What should I do to fix this?
For others' reference: I just had this same error and found that one of my log files was taking up all my space (so nothing could be written to the disk because there was no free space).
Run (to make sure you have enough disk space)
df -h
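If df -h shows the disk is full, something like du can help find what is eating the space (the path below is just an example):
du -sh /var/log/* 2>/dev/null | sort -h | tail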
Then I just needed to run:
svn cleanup
This resolved the error for me.
have you tried:
svn unlock --force path/to/workingcopy
? Seems it can be pointed at a URL if the problem is in the repository itself... I've only used an unlock operation via the TortoiseSVN GUI before, but I assume it just wraps the svn command anyway.
Hope that helps.

Resources