Is there a way to backup a btrfs file system by copying the entire disk over at first backup, but then copying over snapshot files in place of using rsync (or is this a bad idea)?
You can definitely do this, though rsync will duplicate some blocks on the new system.
You might be interested in ButterSink. ButterSink is like rsync, but for btrfs subvolumes instead of files, which makes it much more efficient for things like archiving backup snapshots. It is built on top of btrfs send and receive capabilities. Sources and destinations can be local btrfs file systems, remote btrfs file systems over SSH, or S3 buckets.
For example, the following will copy over just snapshot differences to the remote machine, and create an efficient mirror of your snapshots there:
buttersink /home/snaps/ ssh://backup-server/bak/snaps/
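Since ButterSink is built on btrfs send and receive, you can also drive those directly if you prefer; the incremental flow looks roughly like this (snapshot names and paths are illustrative, not from the question):
# Initial full transfer of a read-only snapshot:
btrfs subvolume snapshot -r /home /home/snaps/home.1
btrfs send /home/snaps/home.1 | ssh backup-server "btrfs receive /bak/snaps"
# Later, send only the difference relative to the previous snapshot:
btrfs subvolume snapshot -r /home /home/snaps/home.2
btrfs send -p /home/snaps/home.1 /home/snaps/home.2 | ssh backup-server "btrfs receive /bak/snaps"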
Related
The remote folder in question does not have a shell or allow SSH connections, nor does it have an FTP server. The folder to be synced is large, with hundreds of files and continuously updated, and I need some way of downloading any new or modified files quickly.
I have tried lftp and rsync, but the connection is not allowed. I thought about writing a script over sftp, but it seems too complex and slow. I could also use Python, but from what I know the script would be similar.
I would like to install only well-known libraries, if any, but I am not sure about my options here.
I am looking for a way to store files in an Artifactory repository in a storage-efficient way, and to upload/download only the difference between the local version and the remote one, in order to save disk space, bandwidth, and time.
There are two good utilities that work in this way: rsync and rdiff-backup. Surely there are others.
Is there a way to organize something similar with Artifactory stack?
What is rsync:
DESCRIPTION
Rsync is a fast and extraordinarily versatile file copying tool. It can copy locally, to/from another host over any remote shell, or to/from a remote rsync daemon. It offers a large number of options that control every aspect of its behavior and permit very flexible specification of the set of files to be copied. It is famous for its delta-transfer algorithm, which reduces the amount of data sent over the network by sending only the differences between the source files and the existing files in the destination. Rsync is widely used for backups and mirroring and as an improved copy command for everyday use.
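For instance, a typical mirroring run that relies on the delta-transfer algorithm might look like the following (host and paths are illustrative):
rsync -av --delete user@host:/srv/data/ /backup/data/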
JFrog CLI includes functionality called "Sync Deletes", which allows syncing files between the local file system and Artifactory.
This functionality is supported by both the "jfrog rt upload" and "jfrog rt download" commands. Both commands accept the optional --sync-deletes flag.
When uploading, the value of this flag specifies a path in Artifactory under which to sync the files after the upload. Once the upload completes, this path will include only the files uploaded during that upload operation; the other files under this path will be deleted.
The same goes for downloading, but this time the value of the --sync-deletes flag specifies a path in the local file system, under which files that were not downloaded from Artifactory are deleted.
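For example, an upload that mirrors a local folder into Artifactory and then removes remote files no longer present locally might look roughly like this (the repository name and paths are illustrative):
jfrog rt upload "build/output/" my-repo/app/ --sync-deletes=my-repo/app/
jfrog rt download my-repo/app/ build/output/ --sync-deletes=build/output/
The second command does the reverse: after downloading, local files under build/output/ that did not come from my-repo/app/ are removed.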
Read more about this in the following link:
https://www.jfrog.com/confluence/display/CLI/CLI+for+JFrog+Artifactory
Let us assume we have a static file server (Nginx + Linux) that serves 10 files. The files are read almost as frequently as the server can process them. However, some of the files need to be replaced with new versions, while the filename and URL remain unaltered. How can the files be replaced safely, without the fear that some reads fail or return a mix of two versions?
I understand this is a rather basic operating system matter and has something to do with renames, symlinks, and file sizes. However, I failed to find a clear reference or a good discussion and I hope we can build one here.
Use rsync. Typically I choose rsync -av src dst, but YMMV.
What is terrific about rsync is that, in addition to having essentially zero cost when little or nothing has changed, it uses an atomic rename. During the file transfer, a ".fooNNNNN" temp file gets bigger and bigger. Once the transfer completes, rsync closes the file and renames it on top of "foo". So web clients see either all of the old file or all of the new file. Notice that range downloads (say, from a restart after an error) are not atomic, exposing such clients to corrupted downloads, especially if bytes were inserted near the beginning of the file. A SHA1 checksum wouldn't validate for such a client, and they would have to restart the download from scratch. BTW, if these are "large" files, tell nginx to use zero-copy sendfile().
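If you ever replace a file by hand instead of through rsync, the same idea applies: write the new content to a temporary file on the same filesystem, then rename it over the old name. A minimal sketch (filenames are illustrative):
# rename(2) is atomic on POSIX filesystems, so readers see either the old or the new file.
cp /tmp/new-build/foo /var/www/static/.foo.tmp
mv /var/www/static/.foo.tmp /var/www/static/foo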
I use rsync to back up a 60 GB folder between my laptop and an external USB drive. Only 4 GB of data had been added. It took a long time to finish: 2 hours.
Here is the command :
rsync -av --exclude=target/ --exclude=".git/" --delete --link-dest=$destdir/backup.1 $element $destdir/backup.0
Do you have an explanation?
What slows down rsync more: a lot of small files or big binary files (photos)?
As I don't exactly know your system, I am making a few assumptions here. If they don't match your situation, please clarify your question and I'll happily update my answer.
I am assuming you have a lot of files, regardless of their sizes, in the location you are copying from. This makes rsync rather slow, due to the design of the rsync protocol.
rsync works like this:
1. Build a file-list of the source location.
2. For all files in the source location:
   a. Get the size and the mtime (modification timestamp)
   b. Compare it with the size and mtime of the copy in the destination location
   c. If they differ, copy the file from the source to the destination
Done.
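To see which files rsync decides to transfer, and why, without copying anything, a dry run with itemized output can help. A sketch based on the command from your question:
rsync -avn --itemize-changes --exclude=target/ --exclude=".git/" --delete --link-dest=$destdir/backup.1 $element $destdir/backup.0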
If you just have a few files, this will obviously be faster than for many files. Your USB drive might be the bottleneck, as retrieving the size and timestamp creates a lot of jumps in the inode table.
Maybe a tool like iotop (in case you are on Linux; similar tools are available for almost all platforms) can help you identify the bottleneck.
The --delete option can also slow rsync down, if retrieving the complete file-list of the target location is slow (which is probable for an external, rotating USB disk). To verify that this is the problem, on any OS with bash, just type time ls -Ral <target-location> > filelist.txt (redirecting the output to a file, since printing the data to the screen is much slower). If this takes a lot longer than for your source disk, your target disk could be the bottleneck.
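A quick way to compare both sides, assuming the directory listing dominates the cost (the paths reuse the variables from the command above):
time ls -Ral $element > /tmp/source-filelist.txt
time ls -Ral $destdir/backup.1 > /tmp/target-filelist.txt
If the second listing takes far longer than the first, the external disk is the likely bottleneck.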
I have one folder on my computer and one folder on a remote server. I transferred a large number of files, but for some reason I now have 2 more files in my own folder than on the server, so I would like to check which ones these are instead of going through them all manually.
I looked for directory comparison tools and found the diff command to display differences, but when I tried it on my two folders it couldn't find the directory on the remote server. This is what I tried:
diff /Volumes/TC1-SIMDATA/Parallel/ModelWSSim/ fraukje#localhost:parallel/ModelWSSim/
Could anyone hint me what I am doing wrong here?
The diff command works only with files and folders that are accessible through the file system (i.e., mounted folders, generally speaking).
If you can mount the remote folder, you'll be able to compare the two with diff; otherwise you need to invest some time in finding a good diff/merge tool that supports FTP, SFTP, or whatever access protocol you need.
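For example, if SSH access to the server is available, one option is to mount the remote folder with sshfs and let diff compare the trees recursively (host and paths follow the question and are illustrative):
mkdir -p /tmp/remote-ModelWSSim
sshfs fraukje@localhost:parallel/ModelWSSim/ /tmp/remote-ModelWSSim
diff -rq /Volumes/TC1-SIMDATA/Parallel/ModelWSSim/ /tmp/remote-ModelWSSim
Alternatively, a dry-run rsync (rsync -avn local/ user@host:remote/) shows which files would be copied, without transferring anything.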