Searching with path pattern in ack is very slow - ack

I'm trying to use ack to search for foo, but only in all Gemfile files in the current directory and all subdirectories.
Searching in all files is crazy fast: ack foo
But this is awfully slow: ack -g Gemfile | ack -x foo
The latter command takes 25 seconds to return which makes it basically unusable.
Am I doing it wrong? How do I search for a term in specific files only? (I know about --ruby and --php, but sometimes you want to specify the pattern yourself.)
UPDATE
After reporting this issue in the GitHub project, a fix was added that speeds up the -g option on large codebases (https://github.com/petdance/ack2/issues/458).
It should be included in the next release (> 2.12).

If you're running ack 2.x, you can easily create a file type for Gemfile:
ack --type-add=gemfile:is:Gemfile --gemfile foo
If you do that a lot, put --type-add=gemfile:is:Gemfile in your .ackrc, and then you can just say
ack --gemfile foo
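For instance, a minimal sketch of adding that line from the shell (assuming your .ackrc is at ~/.ackrc, which is where ack looks by default):
echo "--type-add=gemfile:is:Gemfile" >> ~/.ackrc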
As the ack author/maintainer, I'm a little concerned that your original pipeline takes so long. If you have some time and are willing to submit it as an issue, someone might be able to look into why that is: https://github.com/petdance/ack2/issues

Related

copy with rsync when files are different

I have to copy a big directory to my NAS using rsync. I would like to tell rsync to copy files only when source and destination differ, to avoid re-copying files that have already been copied.
Skipping identical files is the main reason people use rsync in the first place; it is rsync's default behavior. Most of the time the only option you want is -a:
rsync -a -P <source> <dest>
The -P just means show progress, and -a means "archive", which in turn means "when copying files, try to make the copy as identical as possible" (keep permissions, ownership, timestamps, etc.), but it also means "only update files if you have to". It's like saying "make sure <dest> is an up-to-date backup of <source>".
However, by default rsync considers two files identical if they have the same size and the same last modification date. Of course, two files may have the same size and the same modification date and still not be identical. So when running that command for the very first time, if you are not sure which files need updating and which don't, try this:
rsync -a -c -P <source> <dest>
-c means don't rely just on size and date: checksum every file and compare the checksums. Only if the checksums are identical are the files considered identical. Note that rsync will not necessarily checksum the whole file; big files are broken into smaller chunks, every chunk is checksummed separately, and only chunks that have changed are transferred.
So even with checksumming, rsync can save you a lot of time when copying over a network connection. It won't save you any time when copying locally, because just copying everything is probably faster than checksumming everything. A plain copy will always beat a checksumming rsync in speed when both source and destination are local drives. In that case use
cp -a -v <source> <dest>
or if your system doesn't know -a, use
cp -pPR -v <source> <dest>
which is equivalent to -a. Again, the -v is just there to show some progress.
I'd only use -c for the very first sync; after that, relying on file size and last modification date usually works very well for updating, and it is a whole lot faster. It works because if a file has been altered since the last sync, it will have a different last modification date, so by just comparing dates rsync knows that the file must be updated at the destination. Of course, that only works if all your systems have the correct date/time set, you don't manipulate the last modification dates of files, and you don't prevent your system from updating them.
If you want to skip files solely on presence, use this:
rsync -a -P --ignore-existing <source> <dest>
That's like telling rsync "If you see a file with the same name at the destination, always consider it to be identical and never update it".
Please note that if -a detects that a file in <source> is different from a file in <dest>, whether this is determined by size and modification date or by checksumming, it will always update the file at <dest> to match the file at <source>. If multiple sources are syncing to the same destination, you might also want to add -u, which means "in case two files are different, only update if the file at <source> has a newer last modification date than the file at <dest>".
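For example, a minimal sketch of such a command (same placeholders as above):
rsync -a -u -P <source> <dest>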
Just as a general tip, if you type
man <command>
in a terminal, you will get a nice help page on most systems (Linux, macOS and other UNIX systems) explaining all the options in detail. You can scroll up/down using the arrow keys or Page Up/Down, and you can leave that view by hitting "q" for quit. E.g.
man rsync

rsync incremental file list taking forever

I'm copying from one NAS to another (Netgear ReadyNAS -> QNAP). I tried pulling the files by running rsync on the QNAP, and that took forever, so I'm currently trying to push them from the Netgear. The command I'm using is:
rsync -avhr /sauce/folder admin@xxx.xxx.xxx.xxx:/dest/folder
I'm seeing:
sending incremental file list
and nothing after that.
The transfer is 577 GB and there are a lot of files; however, I'm seeing almost no network traffic on the QNAP (it fluctuates between 0 KB/s and 6 KB/s), so it looks like it's not sending any kind of incremental file list.
All folders are created on the destination, and then nothing happens after that.
Does anyone have any thoughts? Or any ideas on whether there is a better way to copy files from a ReadyNAS to a QNAP?
The documentation for -v says "increase verbosity".
If the only thing you're interested in is seeing more progress, you can chain -v together like so:
rsync -avvvhr /sauce/folder/ admin@xxx.xxx.xxx.xxx:/dest/folder/
and you should see more interesting progress.
This can also tell you whether your copying requirements (-a) are stricter than you need, costing a lot of unnecessary processing time.
For example, I attempted to use -a, which is equivalent to -rlptgoD, on over 100,000 images. Sending the incremental file list did not finish, even overnight.
After changing it to
rsync -rtvvv /sauce/folder/ admin@xxx.xxx.xxx.xxx:/dest/folder/
sending the incremental file list became much faster, and file transfers were visible within 15 minutes.
After leaving it overnight and seeing it do nothing, I came in and tried again.
The command that worked appended a '*' to the end of the source folder, so this is what worked:
rsync -avhr /sauce/folder/* admin@xxx.xxx.xxx.xxx:/dest/folder
If anyone else has troubles - give this a shot.
In my case, the cause was a large file that was incomplete but considered a "finished transfer".
I deleted the large (incomplete) file on the remote side and did another sync, which appears to have resolved the issue.
I am using QNAP 1 as a production system and QNAP 2 as a backup server. On QNAP 1, I use the following script as a cronjob to copy files to the backup QNAP at regular intervals. Maybe you could try this:
DATUM=`date '+%Y-%m-%d'`;
MAILFILE="/tmp/rsync_svn.txt"
EMAIL="my.mail#mail.com"
echo "Subject: SVN Sync" > $MAILFILE
echo "From: $EMAIL" >> $MAILFILE
echo "To: $EMAIL" >> $MAILFILE
echo "" >> $MAILFILE
echo "-----------------------------------" >> $MAILFILE
rsync -e ssh -av /share/MD0_DATA/subversion 192.168.2.201:/share/HDA_DATA/subversion_backup >> $MAILFILE 2>&1
echo "-----------------------------------" >> $MAILFILE
cat $MAILFILE | sendmail -t
I encountered the same thing and determined that it was because rsync was attempting to calculate checksums for comparison, which is very slow. By default rsync uses file size, modification time, and some other attributes to check whether two files are identical.
To avoid this, make sure you are not passing -c / --checksum; the default size-and-time comparison is much faster.
Checksumming is a problem with large files or large numbers of files, so it may look like an issue with building the file list, but most often it is not.
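For illustration, a sketch of the two variants (host and paths are the placeholders from the question):
rsync -avh /sauce/folder/ admin@xxx.xxx.xxx.xxx:/dest/folder/
compares only size and modification time (the fast default), while
rsync -avhc /sauce/folder/ admin@xxx.xxx.xxx.xxx:/dest/folder/
checksums every file and can take far longer on a large tree.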

Copy or rsync command

The following command is working as expected...
cp -ur /home/abc/* /mnt/windowsabc/
Does rsync have any advantage over it? Is there a better way to keep the backup folder in sync every 24 hours?
Rsync is better since it will copy only the updated parts of changed files, instead of whole files. It can also use compression and encryption if you want. Check out this tutorial.
rsync is not necessarily more efficient, due to the more detailed inventory of files and blocks it performs. The algorithm is fantastic at what it does, but you need to understand your problem to know if it is really going to be the best choice.
On a very large file system (say many thousands or millions of files) where files tend to be added but not updated, "cp -u" will likely be more efficient. cp makes the decision to copy solely on metadata and can simply get to the business of copying.
Note that you might want some buffering, e.g. by using tar rather than straight cp, depending on the size of the files, network performance, other disk activity, etc. I find the following idea very useful:
tar cf - . | tar xCf directory -
Metadata itself may actually become a significant overhead on very large (cluster) file systems, but rsync and cp will share this problem.
rsync seems to frequently be the preferred tool (and in general purpose applications is my usual default choice), but there are probably many people who blindly use rsync without thinking it through.
The command as written will create new directories and files with the current date and time stamp, and yourself as the owner. If you are the only user on your system and you are doing this daily it may not matter much. But if preserving those attributes matters to you, you can modify your command with
cp -pur /home/abc/* /mnt/windowsabc/
The -p will preserve ownership, timestamps, and mode of the file. This can be pretty important depending on what you're backing up.
The alternative command with rsync would be
rsync -avh /home/abc/* /mnt/windowsabc
With rsync, -a indicates "archive", which preserves all the attributes mentioned above. -v indicates "verbose", which just lists what it's doing with each file as it runs. -z is left out here for local copies, but it enables compression, which helps if you are backing up over a network. Finally, -h tells rsync to report sizes in human-readable formats like MB, GB, etc.
Out of curiosity, I ran one copy to prime the system and avoid biasing against the first run, then I timed the following on a test run of 1GB of files from an internal SSD drive to a USB-connected HDD. These simply copied to empty target directories.
cp -pur : 19.5 seconds
rsync -ah : 19.6 seconds
rsync -azh : 61.5 seconds
Both commands seem to be about the same, although compression and decompression obviously tax the system when bandwidth is not the bottleneck.
Especially if you use a copy-on-write filesystem like BTRFS or ZFS, rsync is much better.
I use BTRFS, and I have this in my ~/.bashrc:
alias cp="rsync -ah --inplace --no-whole-file --info=progress2"
The important flag here for CoW FSs like BTRFS is --inplace because it only copies the changed part of the files, doesn't create new inodes for small changes between files, etc. See this.
It's not really a question of what's more efficient.
The commands 'rsync', and 'cp' are not equivalent and achieve different goals.
1- rsync can preserve the modification times and other metadata of existing files (using the -a option).
2- rsync runs as multiple processes (it forks into a sender and a receiver) and transfers over either local pipes/sockets or network sockets.
3- This multiprocessing can increase your throughput when copying a large number of small files, and even with multiple larger files.
So the bottom line is that rsync shines for large data, and cp is fine for smaller local copying (the MB to small-GB range). When you start getting into multiple GB or into the TB range, go with rsync. And of course for network copies, rsync all the way.
For a local copy, the only advantage of rsync is that it will avoid copying if the file already exists in the destination directory. The definition of "already exists" is (a) same file name (b) same size (c) same timestamp. (Maybe same owner/group; I am not sure...)
The "rsync algorithm" is great for incremental updates of a file over a slow network link, but it will not buy you much for a local copy, as it needs to read the existing (partial) file to run it's "diff" computation.
So if you are running this sort of command frequently, and the set of changed files is small relative to the total number of files, you should find that rsync is faster than cp. (Also rsync has a --delete option that you might find useful.)
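For example, a minimal sketch using the paths from the question (note that --delete removes files at the destination that no longer exist at the source, so test with --dry-run first):
rsync -avh --delete /home/abc/ /mnt/windowsabc/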
Keep in mind that when transferring files locally on a machine (i.e. not a network transfer), the -z flag can make a massive difference in the time the transfer takes.
Transfer within same machine
Case 1: With -z flag:
TAR took: 9.48345208168
Encryption took: 2.79352903366
CP took = 5.07273387909
Rsync took = 30.5113282204
Case 2: Without the -z flag:
TAR took: 10.7535531521
Encryption took: 3.0386879921
CP took = 4.85565590858
Rsync took = 4.94515299797
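For reference, the two variants being compared here look roughly like this (paths are placeholders):
rsync -az /path/to/src/ /path/to/dest/
rsync -a /path/to/src/ /path/to/dest/
For a purely local copy, -z only adds CPU work, since there is no network bandwidth to save.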
Note that cp doesn't preserve existing files when copying folders of the same name. Let's say you have these folders:
/myFolder
  someTextFile.txt
/someOtherFolder
  /myFolder
    wellHelloThere.txt
Then you copy one over the other:
cp -R /someOtherFolder/myFolder /myFolder
result:
/myFolder
  wellHelloThere.txt
This is at least what happens on macOS, and since I wanted to preserve the differing files, I used rsync.
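A minimal sketch of the rsync equivalent that merges instead of replacing (reusing the folders from the example above):
rsync -a /someOtherFolder/myFolder/ /myFolder/
This leaves someTextFile.txt in place and adds wellHelloThere.txt next to it.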
I prefer to use rsync with the following options:
rsync -avhW --no-compress --progress --info=progress2 <src directory> <dst directory>
The above parameters are as follows:
-a for archive mode, to preserve ownership, permissions, etc.
-v for verbose
-h for human-readable sizes
-W for copying whole files only (skipping the delta-transfer algorithm)
--no-compress as there is no shortage of bandwidth between local devices
--progress to see the progress of large files
--info=progress2 to see the overall progress
<src directory> for the source directory path
<dst directory> for the destination directory path
rsync is much better than cp because rsync copies whole files/directories only the first time. The next time you use rsync with the same files/directories, only the changes are copied to the destination folder, not the entire files.
I used rsync to transfer 330 GB of data from a local HD to an external HD via USB 3.0. It took me three days. The transfer rate went down to 800 KB/s and only rose to 50 MB/s for a while after pausing the job. It is a typical overbuffering issue. It was a bad experience for local file transfers: as the name indicates, (r)sync stands for remote sync (it is optimized for transfers over a network). As often happens, I only discovered the "-z" flag after I wondered about the issue and went looking for an explanation.

How do I synchronize in both directions?

I want to use rsync to synchronize two directories in both directions.
I refer to synchronization in classical sense
(not how it is meant in rsync manuals):
I want to update the directories in both directions,
depending on which of them is newer.
Can this be done by rsync (preferable in a Linux-way)?
If not, what other solutions exist?
Just run it twice, with "newer" mode (the -u or --update flag) plus -t (to copy file modification times), -r (to recurse into folders), and -v (for verbose output so you can see what it is doing):
rsync -rtuv /path/to/dir_a/* /path/to/dir_b
rsync -rtuv /path/to/dir_b/* /path/to/dir_a
This won't handle deletes, but I'm not sure there is a good solution to that problem with only periodic sync'ing.
Do you know Unison File Synchronizer?
Unison is a file-synchronization tool for Unix and Windows. It allows two replicas of a collection of files and directories to be stored on different hosts (or different disks on the same host), modified separately, and then brought up to date by propagating the changes in each replica to the other. ...
Note also that it is resilient to failure:
Unison is resilient to failure. It is careful to leave the replicas and its own private structures in a sensible state at all times, even in case of abnormal termination or communication failures.
You need to run rsync twice, and I recommend running it with -au:
rsync -au /local/source/* /remote/destination
rsync -au /remote/destination/* /local/source
-a (a for archive) is a shortcut for -rlptgoD:
-r Recurse into sub directories
-l Also sync symbolic links
-p Also sync file permissions
-t Also sync file modification times
-g Also sync file groups
-o Also sync file owner
-D Also sync device files and other special files
Basically, whenever you want to create an identical one-to-one copy using rsync, you should always use -a, as that's what most users expect to happen when they talk about "syncing". Other answers here seem to overlook that sometimes a file's content stays unchanged while its owner or access permissions have changed; in that case rsync would not sync the file, which could be fatal.
But you also need -u, as that tells rsync to leave a file/folder completely alone if it already exists at the destination with a newer last modification date. Without -u, rsync would sync regardless of whether a file/folder is newer or not.
Please note that this solution cannot handle deleted files. Handling deletes is not easily possible; consider the following situation: a file has been deleted at the source. How shall rsync know whether that file once existed and has been deleted (in which case it must be deleted at the destination as well), or whether it never existed at the source (in which case it must be copied from the destination)? These two situations look identical to rsync, so it cannot know how to react correctly. Syncing the other way round won't help, as that leads to the same situation: a file exists at the source but not at the destination. Why? Has it never existed at the destination, or has it been deleted? Both cases look identical to rsync.
Sync tools that can reliably sync deleted files usually keep a sync log of all past sync operations. If that log reveals that a file once existed and was synced but is now missing, it's clear that it has been deleted. If, according to the log, there never was such a file, it must be synced. By storing all log entries with timestamps, it's even possible for a deleted file to come back and get deleted multiple times, yet the sync tool will always know what to do and the result is always correct. rsync has no such log; it relies only on the current state of the files on the two sides of the operation.
You can, however, build yourself a sync command using rsync and a bit of POSIX shell scripting that gets very close to a sync tool as described above; see the sketch below. As I needed such a tool myself, here is an answer on Stack Overflow that guides you through the creation of such a script.
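For illustration only, here is a minimal sketch of that idea in POSIX shell. The directory and snapshot paths are hypothetical, deletions are detected on one side only, and this is far less robust than the script in the linked answer:
#!/bin/sh
DIR_A=/path/to/dir_a
DIR_B=/path/to/dir_b
SNAPSHOT="$HOME/.last_sync_filelist"

# Anything present after the last sync but now gone from DIR_A was deleted
# locally, so delete it from DIR_B as well before syncing.
if [ -f "$SNAPSHOT" ]; then
    ( cd "$DIR_A" && find . -type f ) | sort > /tmp/current_list
    comm -23 "$SNAPSHOT" /tmp/current_list | while IFS= read -r f; do
        rm -f "$DIR_B/$f"
    done
fi

# Two-way "newer wins" sync, as described above.
rsync -au "$DIR_A"/ "$DIR_B"/
rsync -au "$DIR_B"/ "$DIR_A"/

# Remember the current state for the next run.
( cd "$DIR_A" && find . -type f ) | sort > "$SNAPSHOT"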
Thanks jsight
rsync -urv --progress dir_a dir_b && rsync -urv --progress dir_b dir_a
This results in the second sync starting immediately after the first sync is over, so if the directory structure is huge you won't need to sit in front of the PC waiting to start it. If the structure is huge, drop the verbose and progress options:
rsync -ur dir_a dir_b && rsync -ur dir_b dir_a
Use rsync <OPTIONS> [hostname:]source-dir [hostname:]dest-dir
for example:
rsync -pogtEtvr --progress --bwlimit=2000 xxx-files different-stuff
This will sync xxx-files to different-stuff/xxx-files. If different-stuff/xxx-files does not exist, it will create it, i.e. copy it.
-pogtEtv - just a bunch of options to preserve file metadata, plus v for verbose and r for recursive
--progress - show progress of syncing in real time - super useful if you copy big files
--bwlimit=2000 - sets maximum speed of copying/syncing (bw = bandwidth)
P.S. rsync is most valuable when you work over a network; on a local machine you can use commands like cp.
Good Luck!
What you need is Rclone. Rclone ("rsync for cloud storage") is a command-line Linux program to sync files and directories to and from different cloud storage providers (Box, Dropbox, FTP, etc.) and local filesystems. Rclone supports mirror (one-way) syncing only.
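For example, a one-way mirror with rclone might look like this (the remote name "myremote" is hypothetical and has to be set up with rclone config first):
rclone sync /path/to/dir_a myremote:dir_b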
Another, more graphical solution that includes real-time syncing is FreeFileSync, which comes with the program RealTimeSync. FreeFileSync supports two-way bidirectional syncing, including handling of deletes.
I had the same question and ended up using git. It might not fit your situation, but if anyone finds this topic and has the same question, consider a version control system.
I'm using rsync with inotifywait.
When you change any file, rsync will be executed.
inotifywait -m --exclude "$_LOG_FILE" -r -e create,delete,delete_self,modify,moved_to --format "%w%f" "$folder"
You need to run inotifywait on both hosts. See the inotifywait examples for details.
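For illustration, a minimal sketch of tying the two together as a one-way push (the source path, destination and event list are assumptions, not the exact setup from this answer; run the mirror image of it on the other host for two-way syncing):
SRC=/path/to/dir_a
inotifywait -m -r -e create,delete,delete_self,modify,moved_to --format "%w%f" "$SRC" |
while IFS= read -r changed; do
    rsync -au "$SRC"/ user@host:/path/to/dir_b/
done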

Add last n lines of files to tar/zip

I need to regularly send a collection of log files that can grow quite large, so I would like to send only the last n lines of each of the files.
for example:
/usr/local/data_store1/file.txt (500 lines)
/usr/local/data_store2/file.txt (800 lines)
Given a file with a list of needed files named files.txt, I would like to create an archive (tar or zip) with the last 100 lines of each of those files.
I can do this by creating a separate directory structure with the tail-ed files, but that seems like a waste of resources when there's probably some piping magic that can accomplish it. The full directory structure must also be preserved, since files can have the same names in different directories.
I would like the solution to be a shell script if possible, but perl (without added modules) is also acceptable (this is for Solaris machines that don't have ruby/python/etc. installed on them).
You could try
tail -n 10 your_file.txt | while read line; do zip /tmp/a.zip "$line"; done
where a.zip is the zip file and 10 is n, or
tail -n 10 your_file.txt | xargs tar -czvf test.tar.gz --
for tar.gz
You are focusing on a specific implementation instead of looking at the bigger picture.
If the final goal is to have an exact copy of the files on the target machine while minimizing the amount of data transferred, what you should use is rsync, which automatically sends only the parts of the files that have changed and can also compress while sending and decompress while receiving.
Running rsync doesn't need any daemon on the target machine other than the standard sshd, and to set up automatic transfers without passwords you just need to use public key authentication.
There is no piping magic for that; you will have to create the folder structure you want and zip it.
mkdir tmp
for i in /usr/local/*/file.txt; do
    # recreate the directory structure under tmp/ (${i:1} strips the leading slash)
    mkdir -p "`dirname tmp/${i:1}`"
    # keep only the last 100 lines of each file
    tail -n 100 "$i" > "tmp/${i:1}"
done
zip -r zipfile tmp/*
Use logrotate.
Have a look inside /etc/logrotate.d for examples.
Why not put your log files in SCM?
Your receiver creates a repository on his machine from where he retrieves the files by checking them out.
You send the files just by committing them. Only the diff will be transmitted.
