How do I Download efficiently with rsync? - rsync

A couple of questions related to one theme: downloading efficiently with Rsync.
Currently, I move files from an 'upload' folder onto a local server using rsync. Files to be moved are often dumped there, and I regularly run rsync so the files don't build up. I use '--remove-source-files' to remove files that have been transferred.
1) the '--delete' options that remove destination files have various options that allow you to choose when to remove the files. This would be handly for '--remove-source-files' since is seems that, by default, rsync only removes the files after all files have been transferred, rather than after each file; Othere than writing a script to make rsync transfer files one-by-one, is there a better way to do this?
2) on the same problem, if a large (single) file is transferred, it can only be deleted after the whole thing has been sucessfully moved. It strikes me that I might be able to use 'split' to split the file up into smaller chunks, to allow each to be deleted as the file downloads; is there a better way to do this?
Thanks.

Related

Recoverable file deletion in R

According to these questions:
Automatically Delete Files/Folders
how to delete a file with R?
the two ways to delete files in R are file.remove and unlink. These are both permanent and non-recoverable.
Is there an alternative method to delete files so they end up in the trash / recycle bin?
I wouldn't know about a solution that is fully compatible with Windows' "recycle bin", but if you're looking for something that doesn't quite delete files, but prevents them from being stored indefinitely, a possible solution would be to move files to the temporary folder for the current session.
The command tempdir() will give the location of the temporary folder, and you can just move files there - to move files, use file.rename().
They will remain available for as long as the current session is running, and will automatically be deleted afterwards . This is less persistent than the classic recycle bin, but if that's what you're looking for, you probably just want to move files to a different folder and delete it completely when you're done.
For a slightly more consistent syntax, you can use the fs package (https://github.com/r-lib/fs), and its fs::path_temp() and fs::file_move().

Rsync - How to display only changed files

When my colleague and I upload a PHP web project to production, we use rsync for the file transfer with these arguments:
rsync -rltz --progress --stats --delete --perms --chmod=u=rwX,g=rwX,o=rX
When this runs, we see a long list of files that were changed.
Running this 2 times in a row, will always show the files that were changed between the 2 transfers.
However, when my colleague runs the same command after I did it, he will see a very long list of all files being changed (though the contents are identical) and this is extremely fast.
If he uploads again, then again there will be only minimal output.
So it seams to me that we get the correct output, only showing changes, but if someone else uploads from another computer, rsync will regard everything as changed.
I believe this may have something to do with the file permissions or times, but would like to know how to best solve this.
The idea is that we only see the changes, regardless who does the upload and in what order.
The huge file list is quite scary to see in a huge project, so we have no idea what was actually changed.
PS: We both deploy using the same user#server as target.
The t in your command says to copy the timestamps of the files, so if they don't match you'll see them get updated. If you think the timestamps on your two machines should match then the problem is something else.
The easiest way to ensure that the timestamps match would be to rsync them down from the server before making your edits.
Incidentally, having two people use rsync to update a production server seems error prone and fragile. You should consider putting your files in Git and pushing them to the server that way (you'd need a server-side hook to update the working copy for the web server to use).

IPython Notebook - Is it safe to delete temporary files and folders?

IPython Notebook has been opening lots of temporary folders, including those ending with .ipynb[some random chars] and folders with checkpoints.
I guess some of them were created when my compyter crashed or all sort of things happened. I would have assumed those files would clean themselves once everything is normally saved, but they don't. They keep being there, trashing my workspace.
Is it safe to delete those files and folders, once I've saved my original .ipynb file?
Thanks
It works like any other checkpoint-systems. They are always safe to delete, but you never know when you will need them, which is why they are created.
An interesting question could be how you can disable the whole checkpoint system.
It is safe to delete those files(.ipynb_checkpoints) after saving your progress to the main file
This files are there as a backup but you can delete it if it is cluttering your workspace

Identifying Files in Plone BlobStorage

Files in var/blobstorage can be listed and sorted by their sizes via Unix commands. This way shows big files on top list. How can I identify these files belongs to which IDs/paths in a Plone site?
There is no 'supported' way to do this. You could probably write a script to inspect the ZODB storage, but it'd be complicated. If you want to find the biggest files in your Plone site, you're probably better off writing a script that runs in Plone and using it to search (using portal_catalog) for all File objects (or whatever content type is most likely to have big files) and calling get_size() on it. That should return the (cached) size, and you can delete what you want to clean up.

rsync list of specific local files in 1 step

I'm working on a web application where a user uploads a list of files, which should then be immediately rsynced to a remote server. I have a list of all the local files that need to be rsynced, but they will be mixed in with other files that I do not want rsynced every time. I know rsync will only send the changed files, but this directory structure and contents will grow very large over time and the delay would not be acceptable.
I know that doing a remote rsync, I can specify a list of remote files, i.e...
rsync "host:/path/to/file1 /path/to/file2 /path/to/file3"
... but that does not work once I remove "host:" and try to specify the files locally.
I also know I can use --files-from, but that would require me to create a file ahead of time with a list of files that I want to rsync (and then delete it afterwards). I think it'd be cleaner to just effectively say "rsync these 4 specific files to this remote server", but I can't seem to get that to work.
Is there any way to do what I'm trying to accomplish, or do I have to resort to creating a tmp file with a list in it?
Thanks!
You should be able to list the files similar to the example you gave. I did this on my machine to copy 2 specific files from a directory with many other files present.
rsync test.sql test2.cpp myUser#myHost:path/to/files/synced/

Resources