I have some .txt files under a particular path (for example /path/doc.txt), and I wish to gzip all of them into a single archive and move that archive to another path. How can I achieve that in one line of code?
Maybe something like:
find /path/doc/ -type f -name \*.txt | xargs tar -z -c -f save.tar.gz && mv save.tar.gz other/path
Use
tar -vtf save.tar.gz
to check the archive contents.
I have a directory (dir) (with files and subdirectories):
ls -1 dir
plot.pdf
subdir.1
subdir.2
obj.RDS
And then ls -1 for either subdir.1 or subdir.2:
plot.pdf
PC.pdf
results.csv
de.pdf
de.csv
de.RDS
I would like to tar and gzip dir (in Unix) and exclude all RDS files (both the ones at the level right below dir and the ones in its subdirectories).
What's the easiest way to achieve that? Perhaps in a one-liner?
Something like:
find dir -type f -not -name '*.RDS' -print0 |
tar --null -T- -czf TARGET.tgz
should do it.
First, find finds the files, and then tar accepts the list via -T- (= --files-from /dev/stdin).
-print0 on find combined with --null on tar protects against weird filenames.
-czf == Create gZipped File
You can add v to get verbose output.
To later inspect the contents, you can do:
tar tf TARGET.tgz
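To convince yourself that the exclusion works before running it on real data, here is a minimal sketch that wraps the pipeline in a function. It assumes GNU tar (for --null and -T -), and the function name tar_without_rds is made up for illustration. Because only regular files are listed, empty directories will not end up in the archive.

```shell
# Sketch: tar+gzip DIR into OUT while skipping every *.RDS file.
# Assumes GNU tar; tar_without_rds is a hypothetical name.
tar_without_rds() {
    dir=$1 out=$2
    # NUL-separated names survive spaces and other odd characters.
    find "$dir" -type f ! -name '*.RDS' -print0 |
        tar --null -T - -czf "$out"
}
```

Usage would be something like tar_without_rds dir TARGET.tgz, followed by tar tzf TARGET.tgz to confirm that no RDS file is listed.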
tar --exclude='*.RDS' -czf outputball.tar.gz dir_to_compress
This will ignore *.RDS files in dir_to_compress and in any of its subdirectories.
Decompress using
tar -xvf outputball.tar.gz
I've got a little problem with my bash script. I'm a newbie in the Unix world, so I'm finding this exercise difficult. What I have to do is find files on a Solaris server with a specific name, modified within a specific time, and archive them in one .tar file. The first two points are easy, but I'm having a nightmare trying to archive them. The thing is, I keep archiving the whole directory tree (with the file at the end) into the .tar file, but I need just the file. My code looks like this:
find ~ -name "$maska" -mtime -$dni | xargs -t -L 1 tar -cvf $3 -C
where $maska is the name of the file, $dni refers to the modification time, and $3 is just the archive name. I found out about the -C switch, which lets me jump into the folder where the desired file is, but when I use it with xargs, it seems to just jump there and do nothing else.
So my question is:
1) is there any possibility of achieving my goal this way?
Please remember, I don't work with GNU tar. And I HAVE TO use the commands tar and find.
Edit: I'd like to describe my problem more precisely. When I run the script for, say, file a, it should search for it starting from the point given in the script (that's ~), and everything it finds should end up in one tar file.
What I got right now is (I'm in /home/me/Scripts):
-bash-3.2$ ./Script.sh a 1000 backup
a /home/me/Program/Test/a/ 0K
a /home/me/Program/Test/a/a.c 1K
a /home/me/Program/Test/a/a.out 8K
So script has done some packing. Next I want to see my packed file, so:
-bash-3.2$ tar -tf backup
/home/me/Program/Test/a/
/home/me/Program/Test/a/a.c
/home/me/Program/Test/a/a.out
And that's the problem. The tar file has all the paths in it, so if I untar it, instead of getting just the files I wanted to archive, I overwrite them in their old places. To illustrate:
-bash-3.2$ ls
Script.sh* Script.sh~* backup
-bash-3.2$ tar -xvf backup
x /home/me/Program/Test/a, 0 bytes, 0 tape blocks
x /home/me/Program/Test/a/a.c, 39 bytes, 1 tape blocks
x /home/me/Program/Test/a/a.out, 7928 bytes, 16 tape blocks
-bash-3.2$ ls
Script.sh* Script.sh~* backup
That's the problem.
So all I want is to pack all the desired files (a in the example above) into one tar file without those paths, so that it simply untars into the directory I run Script.sh from.
I'm not sure I understand what you want, but this might be it:
find ~ -name "$maska" -mtime -$dni -exec tar cvf $3 {} +
Edit: second attempt, after you wrote that the main issue is the absolute path:
( cd ~; find . -name "$maska" -type f -mtime -$dni -exec tar cvf $3 {} + )
Edit: third attempt, after you wrote that you want no path at all in the archive, that maska is a directory name, and that $3 needs to be in the current directory:
mkdir ~/foo && \
find ~ -name "$maska" -type d -mtime -$dni -exec sh -c 'ln -s $1/* ~/foo/' sh {} \; && \
( cd ~/foo ; tar chf - * ) > $3 && \
rm -rf ~/foo
Replace ~/foo with ~/somethingElse if ~/foo already exists for some reason.
Maybe you can do something like this:
#!/bin/bash
find ~ -name "$maska" -mtime -"$dni" -print0 | while IFS= read -r -d '' file; do
    d=$(dirname "$file")
    f=$(basename "$file")
    echo "$d: $f" # Show directory and file for debug purposes
    tar -rvf tarball.tar -C "$d" "$f"
done
I don't have a Solaris box at hand for testing :-)
First of all, my assumptions:
1. "one tar file", like you said, and
2. no absolute paths, i.e. if you back up ~/dir/file, you should be able to test extracting it in /tmp, obtaining /tmp/dir/file.
If the problem is the full paths, you should replace
find ~ # etc
with
cd ~ || exit
find . # etc
If instead the tar archive name isn't an absolute path, it should be something like
(
cd ~ || exit
find . etc etc | xargs tar cf - etc etc
) > $3
Explanation
"(...)" runs a subshell, meaning some of the things you change in there have no effect outside of the parens; the current directory is one of them, so "(cd whatever; foo)" means you run another shell, change its current directory, run foo from there, and then you're back in your script, which never changed directory.
"cd ~ || exit" is paranoia, it means "cd ~; if that fails, exit".
"." is an alias meaning "the current directory, whatever that is"; play with "find ." vs "find ~" if you don't know what it means, you'll understand it better than if I explained it here.
"tar cf -" means that you create the tar archive on standard output; I think the syntax is portable enough, you may have to replace "-" with "/dev/stdout" or whatever works on solaris (the simplest solution is simply "tar", without the "c" command, but it's ugly to read).
The final "> $3", outside of the parens, is output redirection: rather than writing the output to the terminal, you save it into a file.
So the whole script reads like this:
- open a subshell
- change the subshell's current directory to ~
- in the subshell, find the files newer than requested, archive them, and write the contents of the resulting tar archive to standard output
- the subshell's stdout is saved to $3; because the redirection is outside the parens, relative paths are resolved relative to your script's $PWD, meaning that e.g. if you run the script from the /tmp directory you'll get a tar archive in the /tmp directory (it would be in ~ if the redirection happened in the subshell).
If I misunderstood your question, the solution doesn't work or the explanation isn't clear let me know (the answer is too long, but I already know that :).
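Putting the pieces described above together, a minimal sketch could look like the following. The function name backup_relative and its three arguments are made up for illustration, and I have only reasoned about it against a tar that accepts "cf -", so treat it as a starting point rather than something verified on Solaris. Note that with a very long file list find may run tar more than once, which would not give a clean single archive.

```shell
# Sketch: archive files matching PATTERN under ROOT with relative member names.
# backup_relative ROOT PATTERN OUT   (hypothetical helper)
backup_relative() {
    root=$1 pattern=$2 out=$3
    (
        cd "$root" || exit      # only the subshell changes directory
        # "find ." yields ./relative/paths, so no absolute names get stored
        find . -type f -name "$pattern" -exec tar cf - {} +
    ) > "$out"                  # resolved against the caller's directory
}
```

Calling backup_relative ~ "$maska" "$3" from /tmp would leave the archive in /tmp, with member names such as ./Program/Test/a/a.c.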
The pax command will output tar-compatible archives and has the flexibility you need to rewrite pathnames.
find ~ -name "$maska" -mtime -$dni | pax -w -x ustar -f "$3" -s '!.*/!!'
Here are what the options mean, paraphrasing from the man page:
-w write the contents of the file operands to the standard output (or to the pathname specified by the -f option) in an archive format.
-x ustar the output archive format is the extended tar interchange format specified in the IEEE POSIX standard.
-s '!.*/!!' Modifies file operands according to the substitution expression, using regular expression syntax. Here, it deletes all characters in each file name from the beginning to the final /.
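If you want to see what the substitution does to your file list before writing an archive, you can run the same expression through sed, whose s command uses the same basic-regex substitution syntax. This is only a preview sketch, and strip_dirs is a name I made up.

```shell
# Preview the member names that -s '!.*/!!' would produce.
# strip_dirs is a hypothetical helper; it reads pathnames on stdin.
strip_dirs() {
    # Delete everything from the start of the line through the final "/".
    sed 's!.*/!!'
}
```

For example, find ~ -name "$maska" -mtime -"$dni" | strip_dirs lists the names the archive members would get. Keep in mind that two files with the same base name in different directories end up with identical member names.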
In Unix, is it possible to use one command ONLY to list the directory if a sub-directory exists?
For example, I would like to list the directory name if it contains a sub-directory called "division_A"
/data/data_file/form_100/division_A
/data/data_file/form_101/division_A
/data/data_file/form_102/division_A
The desired result would be
form_100
form_101
form_102
I can only achieve this with 2 commands:
cd /data/data_file
echo `ls -d */division_A 2> /dev/null | sed 's,/division_A,,g'`
So I would like to ask whether anyone can do it with one command.
Many Thanks!
Using find:
find /data/data_file -type d -name division_A -exec sh -c 'basename "$(dirname "$1")"' sh {} \; 2> /dev/null
If you don't mind the weird .., you can just do:
$ ls -d /data/data_file/*/division_A/..
It will output something like /data/data_file/form_100/division_A/.. and you can access it like normal folders.
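If you would rather get just the parent names without the trailing /division_A/.., a plain shell glob can do it too. A minimal sketch follows; list_parents is a made-up name, and it only looks one level below the base directory, as in the question.

```shell
# Sketch: print each directory directly under BASE that contains a subdirectory SUB.
# list_parents BASE SUB   (hypothetical helper)
list_parents() {
    base=$1 sub=$2
    for d in "$base"/*/"$sub"/; do
        [ -d "$d" ] || continue        # the glob matched nothing
        basename "$(dirname "$d")"     # .../form_100/division_A/ -> form_100
    done
}
```

With the layout from the question, list_parents /data/data_file division_A should print form_100, form_101 and form_102, one per line.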
I have a tar.gz file about 13GB in size. It contains about 1.2 million documents. When I untar it, all these files sit in one single directory, and any reads from this directory take ages. Is there any way I can split the files from the tar into multiple new folders?
e.g.: I would like to create new folders named [1,2,...] each having 1000 files.
This is a quick and dirty solution but it does the job in Bash without using any temporary files.
i=0 # file counter
dir=0 # folder name counter
mkdir "$dir"
tar -tzvf YOURFILE.tar.gz |
cut -d ' ' -f12 | # get the filenames contained in the archive
while read -r filename
do
    i=$((i+1))
    if [ "$i" -eq 1000 ] # new folder for every 1000 files
    then
        i=0 # reset the file counter
        dir=$((dir+1))
        mkdir "$dir"
    fi
    tar -C "$dir" -xvzf YOURFILE.tar.gz "$filename"
done
The same as a one-liner:
i=0; dir=0; mkdir "$dir"; tar -tzvf YOURFILE.tar.gz | cut -d ' ' -f12 | while read -r filename; do i=$((i+1)); if [ "$i" -eq 1000 ]; then i=0; dir=$((dir+1)); mkdir "$dir"; fi; tar -C "$dir" -xvzf YOURFILE.tar.gz "$filename"; done
Depending on your shell settings the "cut -d ' ' -f12" part for retrieving the last column (filename) of tar's content output could cause a problem and you would have to modify that.
It worked with 1000 files but if you have 1.2 million documents in the archive, consider testing this with something smaller first.
1. Obtain the filename list with --list
2. Make files containing filenames with grep
3. Untar only these files using --files-from
Thus:
tar --list -f archive.tar > allfiles.txt
grep '^1' allfiles.txt > files1.txt
tar -xvf archive.tar --files-from=files1.txt
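The same three steps can be automated for fixed-size batches by replacing grep with split. Here is a rough sketch, assuming GNU tar (for -T) and member names without embedded newlines; extract_in_chunks and its arguments are names I made up. It reads the archive once per batch rather than once per file.

```shell
# Sketch: extract a .tar.gz in batches, one numbered directory per batch.
# extract_in_chunks ARCHIVE CHUNK_SIZE OUTDIR   (hypothetical helper)
extract_in_chunks() {
    archive=$1 chunk=$2 outdir=$3
    lists=$(mktemp -d) || return 1
    mkdir -p "$outdir" || return 1
    # List member names once, drop directory entries, and write
    # CHUNK_SIZE names per list file (part.aaaa, part.aaab, ...).
    tar -tzf "$archive" | grep -v '/$' | split -a 4 -l "$chunk" - "$lists/part."
    n=1
    for list in "$lists"/part.*; do
        [ -e "$list" ] || continue     # archive had no regular members
        mkdir -p "$outdir/$n"
        tar -xzf "$archive" -C "$outdir/$n" -T "$list"
        n=$((n + 1))
    done
    rm -rf "$lists"
}
```

Something like extract_in_chunks /full/path/archive.tar.gz 1000 /full/path/out would then give out/1, out/2, and so on. Absolute paths are the safer choice here because of the -C option.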
If you have GNU tar you might be able to make use of the --checkpoint and --checkpoint-action options. I have not tested this, but I'm thinking something like:
# UNTESTED
cd /base/dir
mkdir $(printf "dir%04d\n" {1..1500}) # probably more than you need
ln -s dest0 linkname
tar -C linkname ... --checkpoint=1000 \
--checkpoint-action='sleep=1' \
--checkpoint-action='exec=ln -snf dest%u linkname ...
You can look at the man page and see if there are options like that. If worst comes to worst, just extract the files you need (maybe using --exclude) and put them into your folders.
tar doesn't provide that capability directly. It only restores its files into the same structure from which it was originally generated.
Can you modify the source directory to create the desired structure there and then tar the tree? If not, you could untar the files as they are in the file and then post-process that directory using a script to move the files into the desired arrangement. Given the number of files, this will take some time but at least it can be done in the background.
The unzip command doesn't have an option for recursively unzipping archives.
If I have the following directory structure and archives:
/Mother/Loving.zip
/Scurvy/Sea Dogs.zip
/Scurvy/Cures/Limes.zip
And I want to unzip all of the archives into directories with the same name as each archive:
/Mother/Loving/1.txt
/Mother/Loving.zip
/Scurvy/Sea Dogs/2.txt
/Scurvy/Sea Dogs.zip
/Scurvy/Cures/Limes/3.txt
/Scurvy/Cures/Limes.zip
What command or commands would I issue?
It's important that this doesn't choke on filenames that have spaces in them.
If you want to extract the files to the respective folder, you can try this:
find . -name "*.zip" | while IFS= read -r filename; do unzip -o -d "`dirname "$filename"`" "$filename"; done;
A multi-processed version for systems that can handle high I/O:
find . -name "*.zip" | xargs -P 5 -I fileName sh -c 'unzip -o -d "$(dirname "fileName")/$(basename -s .zip "fileName")" "fileName"'
A solution that correctly handles all file names (including newlines) and extracts into a directory that is at the same location as the file, just with the extension removed:
find . -iname '*.zip' -exec sh -c 'unzip -o -d "${0%.*}" "$0"' '{}' ';'
Note that you can easily make it handle more file types (such as .jar) by adding them using -o, e.g.:
find . '(' -iname '*.zip' -o -iname '*.jar' ')' -exec ...
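If you want to check where each archive would be unpacked before touching anything, the same find invocation can be turned into a dry run. This is just a sketch and plan_unzip is a made-up name. Inside sh -c, $0 is the path find passed in, and "${0%.*}" strips the shortest trailing ".something", i.e. the extension.

```shell
# Dry run: print each archive and the directory "${0%.*}" would give it.
# plan_unzip DIR   (hypothetical helper; extracts nothing)
plan_unzip() {
    find "$1" -iname '*.zip' -exec sh -c 'printf "%s -> %s\n" "$0" "${0%.*}"' '{}' ';'
}
```

Running plan_unzip . on the tree from the question should print lines such as "./Scurvy/Sea Dogs.zip -> ./Scurvy/Sea Dogs".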
Here's one solution that extracts all zip files to the working directory and involves the find command and a while loop:
find . -name "*.zip" | while IFS= read -r filename; do unzip -o -d "`basename -s .zip "$filename"`" "$filename"; done;
You could use find along with the -exec flag in a single command line to do the job
find . -name "*.zip" -exec unzip {} \;
This works perfectly as we want:
Unzip files:
find . -name "*.zip" | xargs -P 5 -I FILENAME sh -c 'unzip -o -d "$(dirname "FILENAME")" "FILENAME"'
Above command does not create duplicate directories.
Remove all zip files:
find . -depth -name '*.zip' -exec rm {} \;
Something like gunzip using the -r flag?....
Travel the directory structure recursively. If any of the file names specified on the command line are directories, gzip will descend into the directory and compress all the files it finds there (or decompress them in the case of gunzip ).
http://www.computerhope.com/unix/gzip.htm
If you're using Cygwin, the syntax is slightly different for the basename command.
find . -name "*.zip" | while IFS= read -r filename; do unzip -o -d "`basename "$filename" .zip`" "$filename"; done;
I realise this is very old, but it was among the first hits on Google when I was looking for a solution to something similar, so I'll post what I did here. My scenario is slightly different as I basically just wanted to fully explode a jar, along with all jars contained within it, so I wrote the following bash functions:
function explode {
    local target="$1"
    echo "Exploding $target."
    if [ -f "$target" ] ; then
        explodeFile "$target"
    elif [ -d "$target" ] ; then
        # Repeat until no archives are left, since each pass may unpack nested ones
        while [ "$(find "$target" -type f -regextype posix-egrep -iregex ".*\.(zip|jar|ear|war|sar)")" != "" ] ; do
            find "$target" -type f -regextype posix-egrep -iregex ".*\.(zip|jar|ear|war|sar)" -exec bash -c 'source "<file-where-this-function-is-stored>" ; explode "$1"' bash {} \;
        done
    else
        echo "Could not find $target."
    fi
}
function explodeFile {
    local target="$1"
    echo "Exploding file $target."
    # Move the archive aside so a directory with its name can take its place
    mv "$target" "$target.tmp"
    unzip -q "$target.tmp" -d "$target"
    rm "$target.tmp"
}
Note the <file-where-this-function-is-stored> which is needed if you're storing this in a file that is not read for a non-interactive shell as I happened to be. If you're storing the functions in a file loaded on non-interactive shells (e.g., .bashrc I believe) you can drop the whole source statement. Hopefully this will help someone.
A little warning: explodeFile also deletes the zipped file. You can of course change that by commenting out the last line.
Another interesting solution would be:
DESTINY=[Give the output that you intend]
# Don't forget to change from .ZIP to .zip.
# In my case the files were in .ZIP.
# The echo were for debug purpose.
find . -name "*.ZIP" | while IFS= read -r filename; do
    ADDRESS=$filename
    #echo "Address: $ADDRESS"
    BASENAME=$(basename "$filename" .ZIP)
    #echo "Basename: $BASENAME"
    unzip -d "$DESTINY$BASENAME" "$ADDRESS"
done
You can also loop through each zip file creating each folder and unzip the zip file.
for zipfile in *.zip; do
mkdir "${zipfile%.*}"
unzip "$zipfile" -d "${zipfile%.*}"
done
This works for me:
import os
from zipfile import ZipFile, is_zipfile

def unzip(zip_file, path_to_extract):
    """
    Decompress zip archives recursively
    Args:
        zip_file: name of zip archive
        path_to_extract: folder where the files will be extracted
    """
    try:
        if is_zipfile(zip_file):
            with ZipFile(zip_file) as parent_file:
                parent_file.extractall(path_to_extract)
                names = parent_file.namelist()
            for file_inside in names:
                # Extracted members live under path_to_extract, not the cwd
                nested = os.path.join(path_to_extract, file_inside)
                if is_zipfile(nested):
                    unzip(nested, path_to_extract)
            # Note: the archive is deleted once it has been extracted
            os.remove(zip_file)
    except Exception as e:
        print(e)