Concatenating input to svn list command with output, then pass it to grep - unix

I currently have the following shell command which is only partially working:
svn list $myrepo/libs/ |
xargs -P 10 -L 1 -I {} echo $myrepo/libs/ {} trunk |
sed 's/ //g' |
xargs -P 20 -L 1 svn list --depth infinity |
grep .xlsx
where $myrepo corresponds to the svn server address.
The libs folder contains a number of subfolders (currently about 30, although eventually up to 100), each of which contains a number of tags, branches and a trunk. I wish to get a list of xlsx files contained only within the trunk folder of each of these subfolders. The command above works fine; however, it only returns the relative path from $myrepo/libs/subfolder/trunk/, so I get this back:
1/2/3/file.xlsx
Because of the potentially large number of files I would have to search through, I am performing it in two parallel steps by using xargs -P (I do not have and cannot use GNU parallel). I am also trying to do this in one command so it can be used from php/perl/etc. and avoid multiple system calls.
What I would like to do is concatenate the input to this part of the command:
xargs -P 20 -L 1 svn list --depth infinity
with the output from it, to give the following:
$myrepo/libs/subfolder/trunk/1/2/3/file.xlsx
Then pass this to the grep to find the xlsx files.
I appreciate any assistance that could be provided.

If I manage to correctly divine your intention, something like this might work for you.
svn list "$myrepo/libs/" |
xargs -P 20 -n 1 sh -c 'svn list -R "$0/libs/${1%/}/trunk" |
sed -n "s%.*\.xlsx$%$0/libs/${1%/}/trunk/&%p"' "$myrepo"
Briefly, we postprocess the output from the inner svn list to filter to just .xlsx files and tack the full SVN path back on at the same time. This way, the processing happens where the repo path is still known.
We hack things a bit by passing in "$myrepo" as "$0" to the subordinate sh so we don't have to export this variable. The input from the outer svn list comes as $1.
(The repos I have access to have a slightly different layout so there could be a copy/paste error somewhere.)
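If exporting the variable is acceptable, an equivalent sketch (untested, same layout assumptions as above) avoids the $0 trick by letting the inner shell read $myrepo from the environment:
export myrepo
svn list "$myrepo/libs/" |
xargs -P 20 -n 1 sh -c 'svn list -R "$myrepo/libs/${1%/}/trunk" |
sed -n "s%.*\.xlsx$%$myrepo/libs/${1%/}/trunk/&%p"' sh
Here the trailing "sh" only fills the $0 slot; the subfolder name from the outer svn list still arrives as $1.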

Related

Loop through a directory and perform action on files with specific permissions in unix

I want to loop through a directory with many subdirectories that contain hidden files. I want to loop through the directory and open only the entries that have a certain permission string, "drwx-----T+" in this case.
My current script is
#!/bin/sh
cd /z/vendors #vendors has a list of directories which contain hidden files
for FILE in *; do
if [ <what should i put here to select files with permission "drwx-----T+">]; then
cd "$FILE" #and do something here e.g open the hidden files
cd ..
fi
done
I don't know what test condition to use. I know the command ls -l | grep "drwx-----T+" will list the files I need, but how can I include this in the if test?
The exit status of grep indicates whether the input matched the pattern. So you can use it as the condition in if.
if ls -ld "$FILE" | grep -q -F 'drwx-----T+'; then
# do what you want
fi
The -q option prevents grep from printing the match, and -F makes it match a fixed string rather than treating it as a regular expression (+ has special meaning in regexp).
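Putting that back into your loop, a minimal sketch might look like this (ls -ld is used so the directory entry itself, rather than its contents, is tested; the subshell with ls -A is just a placeholder for whatever you want to do with the hidden files):
#!/bin/sh
cd /z/vendors || exit 1
for FILE in *; do
    if ls -ld "$FILE" | grep -q -F 'drwx-----T+'; then
        # mode string matches; work inside the directory
        ( cd "$FILE" && ls -A )
    fi
done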

Mac OS: How to use RSYNC to copy files modified within the last 24 hours and keep folder structure?

It's a simple question that I can't seem to figure out. I'm on a Mac with Big Sur with all the latest updates, and I'm going through Terminal to get these commands to run. If there's a better way please let me know.
This is, in basic terms, what I'm trying to do--I want RSYNC to recursively go through a source directory (which in this case would ideally be an entire drive), find any files modified within the last 24 hours, and copy those to another drive, while preserving the folder structure. So if I have:
/Volumes/Drive1/Folder1/File1.file
/Volumes/Drive1/Folder1/File2.file
/Volumes/Drive1/Folder1/File3.file
And File1 has been modified in the last 24 hours, but the other two haven't, I want it to copy that file, so that on the second drive I wind up with:
/Volumes/Drive2/Folder1/File1.file
But without copying File2 and File3.
I've tried a lot of different solutions and strings, but I'm running into problems. The closest I've been able to get is this:
find /Volumes/Drive1/ -type f -mtime -1 -exec cp -a "{}" /Volumes/Drive2/ \;
The problem is that while this one does go through Drive1 and find all the files newer than a day like I want, when it copies them it just dumps them all into the root of Drive2.
This one also seems to come close:
rsync --progress --files-from=<(find /Volumes/Drive1/ -mtime -1 -type f -exec basename {} \;) /Volumes/Drive1/ /Volumes/Drive2/
This one also identifies all the files modified in the last 24 hours, but instead of copying them it gives an error, "link_stat (filename and path) failed: no such file or directory (2)."
I've spent several days trying to figure out what I'm doing wrong but I can't figure it out. Help please!
I think this'll work:
srcDir=/Volumes/Drive1
destDir=/Volumes/Drive2
(cd "$srcDir" && find . -type f -mtime -1 -print0) |
while IFS= read -r -d $'\0' filepath; do
mkdir -p "$(dirname "$destDir/$filepath")"
cp -a "$srcDir/$filepath" "$destDir/$filepath"
done
Explanation:
Using cd "$srcDir"; find . -whatever will generate relative paths (starting with "./") from the source directory to the found files; that means appending the results to $srcDir and $destDir will give the full source and destination paths for each file.
Putting it in parentheses makes it run in a subshell, so the cd won't affect other commands. Coupling cd and find with && means that if cd fails, it won't run find (which would run in the wrong place, generate a list of the wrong files, and generally cause trouble).
Using -print0 and while IFS= read -r -d $'\0' is a standard weird-filename-safe way of iterating over found files (see BashFAQ #20). Note that if anything in the loop reads from standard input (e.g. cp -i asking for confirmation), it'll steal part of the file list; if this is a worry, use this variant (instead of the pipe) to send the file list over file descriptor #3 instead of standard input:
while IFS= read -r -d $'\0' filepath <&3; do
...
done 3< <(cd "$srcDir" && find . -type f -mtime -1 -print0)
Finally, mkdir -p is used to make sure the destination directory exists, and then cp to copy the file.
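If you would rather let rsync do the copying, as in your original attempt, the same relative-path trick can feed --files-from. This is only a sketch and assumes your rsync understands --files-from and --from0:
srcDir=/Volumes/Drive1
destDir=/Volumes/Drive2
(cd "$srcDir" && find . -type f -mtime -1 -print0) |
rsync -a --from0 --files-from=- "$srcDir"/ "$destDir"/
Because the list is relative to $srcDir, rsync recreates the same directory structure under $destDir, which is exactly what the basename-based attempt threw away.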

Git Status - List only the direct subfolders with changes, not inner files

I love using Git to organize version control and back up all my web files in WordPress.
After updating plugins, I'd like to use git status to get the list of changes only at the level of the direct subfolders. Typically, git status produces a very long list of changes, including the inner files of each subfolder.
What I'd like is to limit the result to the subfolders with changes inside the plugins directory.
For example, this git command:
git status project_folder/wp-content/plugins
will result in:
plugins/wpml-translation-management/classes/translation-basket/
plugins/wpml-translation-management/classes/translation-dashboard/
plugins/acfml/assets/
plugins/acfml/classes/class-wpml-acf-attachments.php
plugins/wordpress-seo/js/dist/commons-921.min.js
plugins/wordpress-seo/js/dist/components-921.min.js
In practice, the git command produces a really long list of such lines.
What I would love to know is the git command to output only:
plugins/wpml-translation-management/
plugins/acfml/
plugins/wordpress-seo/
A command such as:
git status project_folder/wp-content/plugins --{display_only_direct_subfolders_with_changes}
try this:
git status --porcelain | awk '{print $2}' | xargs -n 1 dirname | uniq
awk '{print $2}'
get filename from list
xargs -n 1 dirname
extract dir from a full path
uniq
show only unique directories
note that uniq only collapses adjacent duplicate lines, so if the output is not already grouped you may want to sort it first (or use sort -u)
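That gives you the immediate parent directory of each changed file. To collapse the output further, down to just the top-level plugin folders, one possible refinement is the sketch below; it assumes you run it from project_folder/wp-content (so paths start with plugins/) and that no paths contain spaces:
git status --porcelain -- plugins |
awk '{print $NF}' |   # keep only the path column
cut -d/ -f1-2 |       # truncate to plugins/<plugin-name>
sort -u |             # collapse duplicates
sed 's%$%/%'          # trailing slash, to match git's directory style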

Pipe output of cat to cURL to download a list of files

I have a list of URLs in a file called urls.txt. Each line contains 1 URL. I want to download all of the files at once using cURL. I can't seem to get the right one-liner down.
I tried:
$ cat urls.txt | xargs -0 curl -O
But that only gives me the last file in the list.
This works for me:
$ xargs -n 1 curl -O < urls.txt
I'm in FreeBSD. Your xargs may work differently.
Note that this runs sequential curls, which you may view as unnecessarily heavy. If you'd like to save some of that overhead, the following may work in bash:
$ mapfile -t urls < urls.txt
$ curl ${urls[@]/#/-O }
This saves your URL list to an array, then expands the array with options to curl to cause targets to be downloaded. The curl command can take multiple URLs and fetch all of them, recycling the existing connection (HTTP/1.1), but it needs the -O option before each one in order to download and save each target. Note that some characters within URLs (such as [, ] or &) may need to be escaped to avoid interacting with your shell.
Or if you are using a POSIX shell rather than bash:
$ curl $(printf ' -O %s' $(cat urls.txt))
This relies on printf's behaviour of repeating the format pattern to exhaust the list of data arguments; not all stand-alone printfs will do this.
Note that this non-xargs method also may bump up against system limits for very large lists of URLs. Research ARG_MAX and MAX_ARG_STRLEN if this is a concern.
A very simple solution would be the following:
If you have a file 'file.txt' like
url="http://www.google.de"
url="http://www.yahoo.de"
url="http://www.bing.de"
Then you can use curl and simply do
curl -K file.txt
And curl will call all URLs contained in your file.txt!
So if you have control over your input-file-format, maybe this is the simplest solution for you!
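If you can't change the input format, you could also generate that config on the fly and feed it to curl on stdin. A rough sketch, assuming your curl is new enough to have --remote-name-all (so every URL is saved under its remote file name):
sed 's/.*/url = "&"/' urls.txt | curl --remote-name-all -K -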
Or you could just do this:
cat urls.txt | xargs curl -O
You only need to use the -I parameter when you want to insert the cat output in the middle of a command.
xargs -P 10 | curl
GNU xargs -P can run multiple curl processes in parallel. E.g. to run 10 processes:
xargs -P 10 -n 1 curl -O < urls.txt
This will speed up the download 10x if your maximum download speed is not reached and if the server does not throttle IPs, which is the most common scenario.
Just don't set -P too high or your RAM may be overwhelmed.
GNU parallel can achieve similar results.
The downside of those methods is that they don't use a single connection for all files, which is what curl does if you pass multiple URLs to it at once, as in:
curl -o out1.txt http://example.com/1 -o out2.txt http://example.com/2
as mentioned at https://serverfault.com/questions/199434/how-do-i-make-curl-use-keepalive-from-the-command-line
Maybe combining both methods would give the best results? But I imagine that parallelization is more important than keeping the connection alive.
See also: Parallel download using Curl command line utility
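One possible way to combine the two, assuming again that your curl has --remote-name-all: hand each curl process a batch of URLs so it can reuse a connection within the batch, while xargs still runs several processes in parallel. A sketch:
xargs -P 10 -n 20 curl --remote-name-all < urls.txt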
Here is how I do it on a Mac (OSX), but it should work equally well on other systems:
What you need is a text file that contains your links for curl
like so:
http://www.site1.com/subdirectory/file1-[01-15].jpg
http://www.site1.com/subdirectory/file2-[01-15].jpg
.
.
http://www.site1.com/subdirectory/file3287-[01-15].jpg
In this hypothetical case, the text file has 3287 lines and each line is coding for 15 pictures.
Let's say we save these links in a text file called testcurl.txt on the top level (/) of our hard drive.
Now we have to go into the terminal and enter the following command in the bash shell:
for i in `cat /testcurl.txt` ; do curl -O "$i" ; done
Make sure you are using back ticks (`)
Also make sure the flag (-O) is a capital O and NOT a zero
with the -O flag, the original filename will be taken
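An alternative that doesn't rely on word splitting at all (my addition, not part of the original recipe) reads the file line by line:
while IFS= read -r i ; do curl -O "$i" ; done < /testcurl.txt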
Happy downloading!
As others have rightly mentioned:
-cat urls.txt | xargs -0 curl -O
+cat urls.txt | xargs -n1 curl -O
However, this paradigm is a very bad idea, especially if all of your URLs come from the same server -- you're not only going to be spawning another curl instance, but will also be establishing a new TCP connection for each request, which is highly inefficient, and even more so with the now ubiquitous https.
Please use this instead:
-cat urls.txt | xargs -n1 curl -O
+cat urls.txt | wget -i/dev/fd/0
Or, even simpler:
-cat urls.txt | wget -i/dev/fd/0
+wget -i/dev/fd/0 < urls.txt
Simplest yet:
-wget -i/dev/fd/0 < urls.txt
+wget -iurls.txt

how to delete all files except the latest three in a folder

I have a folder which contains some subversion revision checkouts (these are checked out when running a capistrano deployment recipe).
What I really want to do is keep the latest 3 revisions that the Capistrano script checks out and delete the other ones, so I am planning to run some command on the terminal using a run command; Capistrano hasn't really got anything to do here, it's just a Unix command.
I was trying to run a command to get a list of files except the latest three and delete the rest. I could get the list of files using the following command.
(ls -t /var/path/to/folder |head -n 3; ls /var/path/to/folder)|sort|uniq -u|xargs
Now if I add rm -Rf to the end of this command, it tells me the files are not found, which is to be expected because the command returns only the names of the folders, not their full paths.
Is there any way to delete these files/folders using one Unix command?
Alright, there are a few things wrong with your script.
First, and most problematically, is this line:
ls -t /var/path/to/folder |head -n 3;
ls -t will return a list of files in order of their last modification time, starting with the most recently modified. head -n 3 says to only list the first three lines. So what this is saying is "give me a list of only the three most recently modified files", which I don't think is what you want.
I'm not really sure what you're doing with the second ls command, but I'm pretty sure that's just going to concatenate all the files in the directory into your list. That means when it gets sorted and uniq'ed, you'll just be left with an alphabetical list of all the files in that directory. When this gets passed to something like xargs rm, you'll wipe out everything in that directory.
Next, sort | uniq doesn't need the uniq part. You can just use the -u switch on sort to get rid of duplicates. You don't need this part anyway.
Finally, the actual removal of the directory. On that part, you had it right in your question: just use rm -r
Here's the easiest way I can think to do this:
ls -t1 /var/path/to/folder | tail -n +4 | xargs rm -r
Here's what's happening here:
ls -t1 is printing a list, one file/directory per line, of all files in /var/path/to/folder, ordering by the most recent modification date.
tail -n +4 is printing all lines in the output of ls -t1 starting with the fourth line (i.e. the three most recently modified files won't be listed)
xargs rm -r says to delete any file output from the tail. The -r means to recursively delete files, so if it encounters a directory, it will delete everything in that directory, then delete the directory itself.
Note that I'm not sorting anything or removing any duplicates. That's because:
ls only reports a file once, so there are no duplicates to remove
You're deleting every file passed anyway, so it doesn't matter in what order they're deleted.
Does all of that make sense?
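If you want to see what would be removed before actually deleting anything, one cautious variation (my suggestion, not part of the answer above) is to put an echo in front of rm first:
ls -t1 /var/path/to/folder | tail -n +4 | xargs echo rm -r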
Edit:
Since I was wrong about ls specifying the full path when passed an absolute directory, and since you might not be able to perform a cd, perhaps you could use find to resolve the full paths instead.
For example:
ls -t1 /var/path/to/folder | tail -n +4 | xargs -I {} find /var/path/to/folder -name {} | xargs rm -r
Below is a useful way of doing the task:
for Linux and HP-UX:
ls -t1 | tail -n +51 | xargs rm -r # to leave the latest 50 files/directories.
for SunOS:
rm `(ls -t |head -n 100; ls)|sort|uniq -u`
Hi, I found a way to do this: we can use the Unix &&,
so the command will look like this:
cd /var/path/to/folder && ls -t1 /var/path/to/folder | tail -n +4 | xargs rm -r
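Note that all of these variants break on names containing spaces, because xargs splits on whitespace. If your xargs supports -0 (GNU and BSD both do), a slightly safer sketch of the same idea is:
cd /var/path/to/folder &&
ls -t1 | tail -n +4 | tr '\n' '\0' | xargs -0 rm -r
Names containing newlines are still a problem, since ls separates its output with newlines.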
