copy first 100 files from directory which have a specific sentence inside the file

copy first 100 files from directory which have a specific sentence inside the file - unix

Copy first 100 files from one directory /ctm/logs to another /data/input
These files must have the sentence "completed with error code zero" inside them
Is there any method?

cd into your /ctm/logs directory, then use this:
grep -lr "completed with error code zero" . | head -100 | xargs -I% cp "%" /data/input
Double check to make sure the desired 100 files are being copied. If there is an issue with the ordering this is better suited to be solved with a small program/script rather than a one-liner.

Related

Unix Find directories containing more than 2 files

I have a directory containing over a thousand subdirectories. I only want to 'ls' all the directories that contain more than 2 files. I don't need the directories that contain less than 2 files. This is in C-shell, not bash. Anyone know of a good command for this?
I tried this command but it's not giving the desired output. I simply want the full list of directories with more than 2 files. A reason it isn't working is because it will go into sub dirs in those dirs to find if they have more than 2 files. I don't want a recursive search. Just a list of first level directories in the main directory they are in.
$ find . -type f -printf '%h\n' | sort | uniq -c | awk '$1 > 2'

My mistake, I was thinking bash rather than csh. Although I don't have a csh to test with, I think this is the csh syntax for the same thing:
foreach d (*)
if (d "$d" && `ls -1 "$d"|wc -l` > 2) echo "$d"
end
I've added a guard so that non-directories aren't unnecessarily processed, and I've included double-quotes in case there are any "funny" file or directory names (containing spaces e.g.).
One possible problem (I don't know what your exact task is): any immediate subdirectories will also count as files.
Sorry, I was working in bash here:
for d in *;do if [ $(ls -1 $d|wc -l) -gt 2 ];then echo $d;fi;done
For a faster solution, you could try "cheating" by deconstructing the format of the directories themselves if you're on pure Unix. They're just files themselves with contents that can be analyzed in that case. Needless to say that is NOT PORTABLE, to e.g. any bash running on Windows, so not recommended.

Script Issues with find -> tar/gzip

I am currently working on a script, to store/backup our old files, so that we have more space on our server. This script will be used as a cronjob to backup the stuff every week. My script currently looks like this:
#!/bin/bash
currentDate=$(date '+%Y%m%d%T' | sed -e 's/://g')
find /Directory1/ -type f -mtime +90 | xargs tar cvf - | gzip > /Directory2/Backup$currentDate.tar.gz
find /Directory1/ -type f -mtime +90 -exec rm {} \;
The script is at first saving the current Date + Timestamp(without ":") as a variable. Afterwards it searches for files older than 90 days, tars them and finally makes a gzip out of them, which has the name "Backup$currentDate.tar.gz".
Then it's supposed to find the files again and remove them.
I do however have some issues here:
Directory1 consists of multiple Directories. It does find the files and creates the gz file, but while some files are zipped properly(for instance /DirName1/DirName2/DirName3/File), others appear directly in the "root" Dir. What could be the issue here?
Is there a way to tell the Script, to only create the gz file, if files are found? Because currently, we get gz files, even if there was nothing found, leading to empty directories.
Can I somehow use the find output later on(store variable?), so that the remove at the end really only targets those files found in the step before? Because if the third step would take, let's say a hour and the last step gets executed after it's finished, it could potentially remove files, that weren't older than 90 days before, but are now, so they are never backed up, but then deleted(highly unlikly, but not impossible).
If there's anything else you need to know, feel free to ask ^^
Best regards

I've "rephrased" your original code a bit. I don't have an AIX machine to test anything, so DO NOT cut and paste this. Using this code, you should be able to address your issues. To wit:
It make a record of what files it intends to operate on ($BFILES).
This record can be used to check for empty tar files.
This record can be used to see why your find is producing "funny" output. It wouldn't surprise me to find that xargs hit a space character.
This record can be used to delete exactly the files archived.
As a child, I had a serious accident with xargs and have avoided it ever since. Maybe there is a safe version out there.
#!/bin/bash
# I don't have an AIX machine to test this, so exit immediately until
# someone can proof this code.
exit 1
currentDate=$(date '+%Y%m%d%T' | sed -e 's/://g')
BFILES=/tmp/Backup$currentDate.files
find /Directory1 -type f -mtime +90 -print > $BFILES
# Here is the time to proofread the file list, $BFILES
# The AIX page I read lists the '-L' option to take filenames from an
# input file. I've found xargs to be sketchy unless you are very
# careful about quoting.
#tar -c -v -L $BFILES -f - | gzip -9 > /Directory2/Backup$currentDate.tar.gz
# I've found xargs to be sketchy unless you are very careful about
# quoting. I would rather loop over the input file one well quoted
# line at a time rather than use the faster, less safe xargs. But
# here it is.
#xargs rm < $BFILES

How to use mv command to rename multiple files in unix?

I am trying to rename multiple files with extension xyz[n] to extension xyz
example :
mv *.xyz[1] to *.xyz
but the error is coming as - " *.xyz No such file or directory"

Don't know if mv can directly work using * but this would work
find ./ -name "*.xyz\[*\]" | while read line
do
mv "$line" ${line%.*}.xyz
done

Let's say we have some files as shown below.Now i want remove the part -(ab...) from those files.
> ls -1 foo*
foo-bar-(ab-4529111094).txt
foo-bar-foo-bar-(ab-189534).txt
foo-bar-foo-bar-bar-(ab-24937932201).txt
So the expected file names would be :
> ls -1 foo*
foo-bar-foo-bar-bar.txt
foo-bar-foo-bar.txt
foo-bar.txt
>
Below is a simple way to do it.
> ls -1 | nawk '/foo-bar-/{old=$0;gsub(/-\(.*\)/,"",$0);system("mv \""old"\" "$0)}'
for detailed explanation check here

Here is another way using the automated tools of StringSolver. Let us say your first file is named abc.xyz[1] a second named def.xyz[1] and a third named ghi.jpg (not the same extension as the previous two).
First, filter the files you want by giving examples (ok and notok are any words such that the first describes the accepted files):
filter abc.xyz[1] ok def.xyz[1] ok ghi.jpg notok
Then perform the move with the filter it created:
mv abc.xyz[1] abc.xyz
mv --filter --all
The second line generalizes the first transformation on all files ending with .xyz[1].
The last two lines can also be abbreviated in just one, which performs the moves and immediately generalizes it:
mv --filter --all abc.xyz[1] abc.xyz
DISCLAIMER: I am a co-author of this work for academic purposes. Other examples are available on youtube.

I think mv can't operate on multiple files directly without loop.
Use rename command instead. it uses regular expressions but easy to use once mastered and more powerful.
rename 's/^text-to-replace/new-text-you-want/' text-to-replace*
e.g to rename all .jar files in a directory to .jar_bak
rename 's/^jar/jar_bak/' jar*

Efficient way of getting listing of files in large filesystem

What is the most efficient way to get a "ls"-like output of the most recently created files in a very large unix file system (100 thousand files +)?
Have tried ls -a and some other varients.

You can also use less to search and scroll it easily.
ls -la | less

If I'm understanding your question correctly try
ls -a | tail
More information here

If the files are in a single directory, then you can use:
ls -lt | less
the -t option to ls will sort the files by modification time and less will let you scroll through them
If the want recent files across an entire file system --- i.e., in different directories, then you can use the find command:
find dir -mtime 1 -print | xargs ls -ld
Substitute the directory where you want to start the search for "dir". The find command will print the names of all of the files that have been modified in the last day (-mtime 1 means modified in the last one day) and the xargs command will take that list of files and feed it to ls, giving you the ls-like output you want

Diff files present in two different directories

I have two directories with the same list of files. I need to compare all the files present in both the directories using the diff command. Is there a simple command line option to do it, or do I have to write a shell script to get the file listing and then iterate through them?

You can use the diff command for that:
diff -bur folder1/ folder2/
This will output a recursive diff that ignore spaces, with a unified context:
b flag means ignoring whitespace
u flag means a unified context (3 lines before and after)
r flag means recursive

If you are only interested to see the files that differ, you may use:
diff -qr dir_one dir_two | sort
Option "q" will only show the files that differ but not the content that differ, and "sort" will arrange the output alphabetically.

Diff has an option -r which is meant to do just that.
diff -r dir1 dir2

diff can not only compare two files, it can, by using the -r option, walk entire directory trees, recursively checking differences between subdirectories and files that occur at comparable points in each tree.
$ man diff
...
-r --recursive
Recursively compare any subdirectories found.
...
Another nice option is the über-diff-tool diffoscope:
$ diffoscope a b
It can also emit diffs as JSON, html, markdown, ...

If you specifically don't want to compare contents of files and only check which one are not present in both of the directories, you can compare lists of files, generated by another command.
diff <(find DIR1 -printf '%P\n' | sort) <(find DIR2 -printf '%P\n' | sort) | grep '^[<>]'
-printf '%P\n' tells find to not prefix output paths with the root directory.
I've also added sort to make sure the order of files will be the same in both calls of find.
The grep at the end removes information about identical input lines.

If it's GNU diff then you should just be able to point it at the two directories and use the -r option.
Otherwise, try using
for i in $(\ls -d ./dir1/*); do diff ${i} dir2; done
N.B. As pointed out by Dennis in the comments section, you don't actually need to do the command substitution on the ls. I've been doing this for so long that I'm pretty much doing this on autopilot and substituting the command I need to get my list of files for comparison.
Also I forgot to add that I do '\ls' to temporarily disable my alias of ls to GNU ls so that I lose the colour formatting info from the listing returned by GNU ls.

When working with git/svn or multiple git/svn instances on disk this has been one of the most useful things for me over the past 5-10 years, that somebody might find useful:
diff -burN /path/to/directory1 /path/to/directory2 | grep +++
or:
git diff /path/to/directory1 | grep +++
It gives you a snapshot of the different files that were touched without having to "less" or "more" the output. Then you just diff on the individual files.

In practice the question often arises together with some constraints. In that case following solution template may come in handy.
cd dir1
find . \( -name '*.txt' -o -iname '*.md' \) | xargs -i diff -u '{}' 'dir2/{}'

Here is a script to show differences between files in two folders. It works recursively. Change dir1 and dir2.
(search() { for i in $1/*; do [ -f "$i" ] && (diff "$1/${i##*/}" "$2/${i##*/}" || echo "files: $1/${i##*/} $2/${i##*/}"); [ -d "$i" ] && search "$1/${i##*/}" "$2/${i##*/}"; done }; search "dir1" "dir2" )

Try this:
diff -rq /path/to/folder1 /path/to/folder2

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex