Move/rsync while excluding some patterns - rsync

I have several TB of photos, spread throughout subfolders. Each photo has an original, a watermarked resized version, and a thumbnail.
Named as such:
img1001.jpg
img1001_w.jpg
img1001_t.jpg
DSC9876.jpg
DSC9876_w.jpg
DSC9876_t.jpg
etc etc.
What I need to do is move all of the originals to a different server. Presumably rsync is the best tool for this?
Is it possible to rsync a directory, while excluding any files that end in _t.jpg or _w.jpg? I'm not concerned about possible edge cases where the original file ends with either of those, as there are no such cases in my data.
Or am I better off just rsync'ing the whole lot, and then selectively deleting the _t & _w files from the destination?
Thanks

Yes, rsync is a good choice, also because it works incrementally, so you can stop and restart it when needed.
By default, rsync does not delete anything on the remote side, I believe.
Yes, you can sync whole directory structures.
It is possible to exclude files or folders from syncing.
I think I'm using a command like
rsync -av [--exclude <pattern>] [--exclude-from <excludes-file>] <source> <destination>
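For this particular layout, a minimal sketch (the paths and host are placeholders, and -n makes it a dry run so you can preview the file list first):
rsync -avn --exclude='*_t.jpg' --exclude='*_w.jpg' /path/to/photos/ user@newserver:/path/to/photos/
Drop the -n once the list looks right. Note the trailing slash on the source: it copies the contents of the directory rather than the directory itself.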

Related

Need help deleting extra files from Wordpress

Hey guys, I need either a SQL query or a script in order to delete all the images from my media library in WordPress, except the ones that have '320x180' as a resolution (or in their name).
The reason why is that I have more than 100k files and I can't delete them manually, as this would take centuries. SQL/programming is not my strongest point either.
Thanks
You can easily delete files using the "find" command on Linux; I just needed to do the same thing. Try this in the bash console:
cd /var/www/wordpress/wp-content/uploads
find . -type f -regex '.*[0-9]+x[0-9]+\.\(jpg\|png\|jpeg\)$' -delete
The above will delete all resized image versions, leaving just the original uploads.
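Before deleting anything, it is safer to preview the matches, and, since the question wants to keep the 320x180 versions, to filter those out; a sketch reusing the same regex (the ! -name filter is my addition, not part of the original answer):
cd /var/www/wordpress/wp-content/uploads
find . -type f -regex '.*[0-9]+x[0-9]+\.\(jpg\|png\|jpeg\)$' ! -name '*320x180*' -print
Once the list looks right, replace -print with -delete.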

Is there an always existing, known Unix path string guaranteed to name an always empty directory?

Under many, most, or maybe all Unix file systems, if you iterate over the links in a directory, there will usually/always be at least two, one pointing to the current directory ("./"), and one back-pointing to the parent directory ("../"). Except maybe for the root, which would have only the first of these two links.
But it might be that this is not true under some other file systems that purport to comport with most Unix conventions (but don't quite).
Is there a directory somewhere in a Unix file system guaranteed to always be an empty directory and whose link count can always be read using, e.g., stat() or equivalent?
If so, one can check the link count and expect it to be 2, or perhaps something else, which would allow a program to adjust its behavior accordingly.
There is no standard directory which is always empty -- but you could create one, if you needed to. One easy way to do this would be using the mkdtemp() function.
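From the shell, a minimal sketch of that approach (mktemp -d is the command-line counterpart of mkdtemp(); -c %h is GNU stat's link-count format, which BSD stat spells -f %l):
dir=$(mktemp -d)    # create a fresh, guaranteed-empty directory
stat -c %h "$dir"   # print its link count; 2 on traditional Unix file systems
rmdir "$dir"        # clean up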
However, there is no guarantee that all directories will be on the same file system. For instance, if a FAT filesystem is mounted, directories belonging to that file system may behave differently from the others.

Mass Thunderbird folder to Gnus nnfolder conversions

I'm pondering the idea of importing a few thousand Thunderbird folders, each folder containing many emails of course, as a set of Emacs' Gnus mailgroups. Each mailgroup name would be derived from the folder hierarchy. Because of the quantity, the work is going to be fairly tedious, so I would automate this massive import if possible.
Among the available backends, nnfolder seems the most promising in this case. I presume it would be better to populate the mailgroups from within Gnus. Otherwise, I would have to thoroughly understand the nnfolder format, and this might require many iterations before I really get it right. Moreover, as email continues to flow in, iterations may become difficult to properly organize without losing anything.
I guess I have to respool everything, under the constraint that the selected mailgroup is a function of the Thunderbird origin, overriding the standard Gnus selection mechanism. I did some Gnus coding in the past, but since I have not touched Emacs for a dozen years, it is all very rusty. I'm a bit lost about how to approach this task as efficiently and quickly as possible. So my question: how would you handle it? Or is there some clever hidden corner of Gnus that I should explore more deeply? :-)
François
P.S. After I wrote this question, I found out that Gnus has a nice helper function towards this goal. The idea is to first copy all Thunderbird folder files into the ~/Mail directory, keeping their contents as they are but renaming them properly. Once this is done, M-x nnfolder-generate-active-file does everything at once: for each copied folder, it edits the contents, leaves a ~ backup, generates NOV data, creates one mailgroup and, of course, adjusts the ~/Mail/active file.
To copy the folders underneath the ~/.thunderbird/LOGIN/Mail/Local Folders/ directory, I wrote a small Python script. It ignores all .msf files and recurses into .sbd directories. The folder path name, relative to Local Folders/, has all its .sbd/ strings turned into periods to produce the mailgroup name; the script also lowercases the name, turns spaces and underscores into dashes, and handles other special characters appropriately. In particular, non-ASCII characters are not handled properly, as nnfolder confuses UTF-8 and ISO-8859-1 here and there. The script also has to skip msgFilterRules.dat and likely drafts, junk and similar folders; a rough shell equivalent of the renaming logic is sketched below.
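For illustration only, a rough shell sketch of that renaming logic (LOGIN is a placeholder profile name; the real script was Python, and this sketch omits the special-character handling):
cd ~/.thunderbird/LOGIN/Mail/"Local Folders"
find . -type f ! -name '*.msf' ! -name 'msgFilterRules.dat' |
while IFS= read -r f; do
    # turn "a.sbd/b.sbd/c" into "a.b.c", lowercase it, map spaces/underscores to dashes
    group=$(printf '%s' "${f#./}" | sed -e 's|\.sbd/|.|g' -e 's/[ _]/-/g' | tr '[:upper:]' '[:lower:]')
    cp "$f" ~/Mail/"$group"
done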
I notice two details requiring attention:
Thunderbird itself can be used to compact folders before copying them; otherwise, one might unwillingly recover messages which were already deleted.
(setq nnmail-use-long-file-names t) is needed in ~/.emacs prior to the whole operation.
The batch transformation aborted, saying it was not able to decrypt one of the messages. I moved the offending folder out of the way, and then the lengthy operation succeeded.

Drupal - Attach files automatically by name to nodes

I need a better file attachment function. Ideally, files uploaded over FTP that have a name similar to a node's name (containing the same word) would appear under that node, so that you don't have to attach each file separately when there are many nodes. Can you think of a solution? Alternatively, something that would not be as laborious as always adding them manually again.
Dan.
This would take a fair bit of coding. Basically you want to implement hook_cron() and run a function that loops through every file in your FTP folder. The function will look for names of files that have not already been added to any node and then decide which node to add them to.
Bear in mind there will be a delay between uploading your files and their being attached to a node, since that only happens when the next cron job runs.
This is not a good solution, and if I could give you any advice it would be not to do it: the reason you upload files through the Drupal interface is so that they are tracked in the files table and can be re-used.
Also the way you're proposing leaves massive amounts of ambiguity as to which file will go where. Consider this:
You have two nodes, one about cars and one about motorcycle sidecars. Your code will have to be extremely complex to make the decision of which node to add to if the file you've uploaded is called 'my-favourite-sidecar.jpg'.

Best backup strategy for checked-out and hijacked files in all ClearCase VOBs and views

Our policy here is that only the "most important" CCase views are backed up.
All the important data are considered to be in the VOBs, and also under non-CCase directories, but never in views.
However, a special case is the checked-out files in views.
People quite often forget that these become private files in their dynamic view.
Sometimes they cannot be found easily (or at all) under the dynamic view storage area.
In snapshot views, hijacked elements may also become important.
What is the best strategy to find and back up only those files (checked-out / hijacked) in every (dynamic / snapshot) view and VOB?
(It should be possible to script it in very few lines, I think: ct lsco, ct lspriv ...)
Thank you very much in advance, Javier.
(FJCobas, Spain).
The idea is to use the SO question "Command line to delete all ClearCase view-private files", adapting it to select only checked-out, hijacked and/or eclipsed files.
With Unix:
cleartool ls -r -nxn | grep -E "(CHECKEDOUT|hijacked|eclipsed)"
Note: as mentioned in the SO question "ClearCase: Backup for only modified checked-out elements in all views", an optimized solution would check whether a checked-out file actually introduced any changes. But if you have lots of checkouts, this wouldn't scale: a full copy (of all files) every time will be faster.
You can then copy them to a safe backup location.
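As a minimal sketch of that copy step for a single view (the view tag, VOB path and backup location are placeholders, and the sed-based parsing of cleartool's output is approximate):
cd /view/myview/vobs/myvob
cleartool ls -r -nxn |
grep -E "(CHECKEDOUT|hijacked|eclipsed)" |
sed -e 's/@@.*//' -e 's/ *\[[^]]*\]$//' |    # strip version suffixes and [hijacked]/[eclipsed] markers
while IFS= read -r f; do
    cp --parents "$f" /backup/myview/        # --parents (GNU cp) preserves the directory structure
done
You would then repeat this for every view; cleartool lsview lists the registered view tags to iterate over.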
