Unzip only a limited number of files in Linux

I have a zipped file containing 10,000 compressed files. Is there a Linux command or bash script to unzip only 1,000 of them? Note that all the compressed files have the same extension.

unzip -Z1 test.zip | head -1000 | sed 's| |\\ |g' | xargs unzip test.zip
-Z1 prints a raw list of member names, one per line
the sed expression backslash-escapes spaces so xargs splits the names correctly (works everywhere, including macOS)
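If your tools support NUL delimiters (GNU tr and xargs do), a variant can skip the escaping step entirely; a sketch:
unzip -Z1 test.zip | head -1000 | tr '\n' '\0' | xargs -0 unzip test.zip
Here tr turns each newline into a NUL byte, and xargs -0 splits only on NULs, so spaces in filenames are harmless.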

You can use wildcards to select a subset of files, e.g.:
Extract all contained files beginning with b:
unzip some.zip 'b*'
Extract all contained files whose names end with y:
unzip some.zip '*y.extension'
The quotes keep the shell from expanding the pattern against files in the current directory, so unzip receives the wildcard itself. You can either select a wildcard pattern that is close enough, or examine the output of unzip -l some.zip closely to determine a pattern or set of patterns that will get you exactly the right number.
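For instance, if the member names happened to carry a numeric prefix (1_... through 10000_..., hypothetical names here), a handful of quoted patterns could select exactly the first 1,000:
unzip some.zip '[1-9]_*' '[1-9][0-9]_*' '[1-9][0-9][0-9]_*' '1000_*'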

I did this:
unzip -l zipped_files.zip | tail -n +4 | head -1000 | cut -b 29-100 > list_of_1000_files_to_unzip.txt
tail -n +4 skips the three header lines of the listing, so head really takes 1000 filenames; cut keeps only the name column, since the first three columns are size, date, and time (the exact byte positions can vary between unzip versions, so check your listing first).
Now loop over the filenames:
while IFS= read -r f; do unzip zipped_files.zip "$f"; done < list_of_1000_files_to_unzip.txt
The quoted "$f" keeps filenames containing spaces intact, which a plain for loop over cat output would split apart.

Some advice:
Run unzip once just to list the files, redirecting the output to a file
Truncate that file to keep only the first 1000 rows
Feed the file back to unzip to extract only the listed files (a sketch follows below)
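A minimal sketch of those three steps, assuming Info-ZIP's tools (zipinfo -1, equivalent to unzip -Z1, prints bare member names that are easy to feed back in):
zipinfo -1 zipped_files.zip > all_files.txt   # 1. list the archive members, one per line
head -1000 all_files.txt > first_1000.txt     # 2. keep only the first 1000 names
while IFS= read -r f; do                      # 3. extract exactly those members
    unzip zipped_files.zip "$f"
done < first_1000.txt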

Related

How to move files based on a list (which contains the filename and destination path) in terminal?

I have a folder that contains a lot of files. In this case images.
I need to organise these images into a directory structure.
I have a spreadsheet that contains the filenames and the corresponding path where the file should be copied to. I've saved this file as a text document named files.txt
+--------------+-----------------------+
| image01.jpg | path/to/destination |
+--------------+-----------------------+
| image02.jpg | path/to/destination |
+--------------+-----------------------+
I'm trying to use rsync with the --files-from flag but can't get it to work.
According to man rsync:
--include-from=FILE
This option is related to the --include option, but it specifies a FILE that contains include patterns (one per line). Blank lines in the file and lines starting with ';' or '#' are ignored. If FILE is -, the list will be read from standard input
Here's the command I'm using: rsync -a --files-from=/path/to/files.txt path/to/destinationFolder
And here's the rsync error: syntax or usage error (code 1) at /BuildRoot/Library/Caches/com.apple.xbs/Sources/rsync/rsync-52.200.1/rsync/options.c(1436) [client=2.6.9]
It's still pretty unclear to me how the files.txt document should be formatted/structured and why my command is failing.
Any help is appreciated.
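For what it's worth, --files-from expects bare source paths (one per line) plus a single source and a single destination argument on the command line; it cannot take a per-file destination, so a two-column list like the one above needs a loop instead. A minimal sketch, assuming files.txt holds tab-separated filename/destination pairs:
while IFS=$'\t' read -r file dest; do   # read "filename<TAB>destination" pairs
    mkdir -p "$dest"                    # create the target directory if needed
    rsync -a "$file" "$dest"/           # copy; use mv instead to move
done < files.txt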

Delete files from a list in a text file

I have a text file containing around 500 lines. Each line is an absolute path to a file. I want to delete these files using a script.
There's a suggestion here, but my filenames have spaces in them. The spaces have been escaped with \, but it still doesn't work. There is discussion on that thread about problems with whitespace, but no solutions.
I can't simply use the find command as that won't give me the precise result, I need to use the list (which was created by running find and editing out the discrepancies).
Edit: some context. I noticed that iTunes has re-downloaded and copied multiple songs and put them in the same directory as the original songs, e.g., inside a particular album directory is '01 This Song.aac' and '01 This Song 1.aac'.
I ran a find to produce a text file with all songs matching "* 1.*" to get songs ending in 1 but of any file type. I ran this in my iTunes Media/Music directory.
Some of the songs included in the file had the number 1 in them but weren't actually duplicates (victims of circumstance), so I manually removed those lines.
The file I am left with is around 500 lines with songs all including spaces in the filenames. Because it's an iTunes issue, there are just a few songs in one directory, then more in another, then another, and so on -- I can't just run a script on a single directory, it has to work recursively and run only on the files named in my list.txt
As you would expect, the trick is to get the quoting right:
while IFS= read -r line; do rm "$line"; done < filename
IFS= and -r make read keep leading whitespace and backslashes in the paths intact; since the quotes around "$line" already handle the spaces, remove any \-escaping from the list first.
To remove a file whose name has spaces, you can just wrap the whole path in quotes.
And to delete the list of files, I would recommend changing each line of your file so that it looks like an rm call. The fastest way is to use sed. So if your file is in the following format:
/home/path/file name.asd
/opt/some/string/another name.wasd
...
The one-liner for that would be something like this:
sed -e 's/^/rm -f "/' -e 's/$/" ;/' file.txt > newfile.sh
The first expression replaces the beginning of each line with rm -f ", the second replaces the end of the line with " ;.
It would produce a file with the following content:
rm -rf "/home/path/file name.asd" ;
rm -rf "/opt/some/string/another name.wasd" ;
...
So you can just execute this file as a bash script.
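A variant that skips generating a script entirely (GNU xargs; assumes one raw, unescaped path per line in the list):
xargs -d '\n' rm -f < list.txt   # -d '\n' splits on newlines only, so spaces in paths are safe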

Is there any way to extract only one file (or a regular expression) from tar file

I have a tar.gz file.
Because of space constraints, and because a full extract takes too long, I need to extract only selected files.
I have tried the following to identify the files I need:
grep -l '<text>' *
file1
file2
Only file1 and file2 should be extracted.
You can do this with the --extract option, like this:
tar --extract --file=test.tar.gz main.c
With --file you specify the archive filename, and at the end you name the file you want to extract.
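To pull out just the files that grep identified, you can name several members at once, or, with GNU tar, match them by pattern; a sketch with hypothetical member names:
tar -xzf test.tar.gz file1 file2          # extract exactly the named members
tar -xzf test.tar.gz --wildcards 'file*'  # GNU tar: extract members matching a glob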

What should I do to save all the tail -f data to a file swa3?

I have swa1.out, which has a list of online data inputs. swa2 is a file of keywords that should be skipped from swa1, and swa3 is the file where the remaining data should be written. Can anyone help with this?
I have tried the command below, but I'm not able to get it to work:
tail -f swa1.out | grep -vf swa2 >> swa3
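If nothing seems to arrive in swa3, it may just be buffering: when its output is not a terminal, grep buffers in large blocks. A sketch of the usual fix, using GNU grep's --line-buffered flag:
tail -f swa1.out | grep --line-buffered -v -f swa2 >> swa3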

Number of lines differ in text and zipped file

I zipped a few files in Unix and later found that the zipped files have a different number of lines than the raw files.
$ wc -l /location/filename.txt
70308 /location/filename.txt
$ wc -l /location/filename.zip
2931 /location/filename.zip
How's this possible?
Zip files are binary files; the wc command is designed for text files.
The zip-compressed version of a text file will generally contain a different number of newline bytes, because compression does not operate line by line. (If the compressed file gave the same answers as the original to every command, there would be no point in keeping it in a different format.)
From wc man page:
-l, --lines
print the newline counts
To get the matching output, you should try
$ unzip -c filename.zip | wc -l # decompress on stdout and count the lines
This will report a few extra lines (about 3) for unzip's per-file headers, if there is no directory structure involved. If you compressed a directory containing the text file, rather than just the file, you may see a few more lines of file/directory information.
In a compression algorithm, words and characters are replaced by binary sequences.
Suppose \n is encoded as 0011100,
and some other character 'x' is encoded as 0001010 (the bit pattern of \n).
wc then counts every byte in the zip file that happens to look like \n, so the count can be anything.
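A quick way to see the whole effect end to end (filenames hypothetical):
zip filename.zip filename.txt    # compress the text file
wc -l filename.txt filename.zip  # counts differ: the .zip is compressed bytes, not lines
unzip -c filename.zip | wc -l    # roughly matches, plus unzip's header lines
unzip -p filename.zip | wc -l    # -p pipes the raw contents only, so this matches exactly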

Extracting a list of files and creating a new file containing this list

I am a researcher and my skill with Unix commands is limited. I am currently dealing with a folder containing about 1000 files, and I have to extract some filenames from this folder and create another file (a configuration file) containing these filenames.
Basically, the folder has filenames in the following format:
1_Apple_A_someword.txt
1_Apple_B_someword.txt
2_Apple_A_someword.txt
2_Apple_B_someword.txt
3_Apple_A_someword.txt
3_Apple_B_someword.txt
and so on up until
1000_Apple_A_someword.txt
1000_Apple_B_someword.txt
I just want to pick out all files which have "Apple_A" in them. Also, I want to create another file which has 'labels' (Unix variables) for each of these "Apple_A" files, whose values are the names of the files. The 'labels' are part of the filenames (everything up until the word "Apple"). For example,
1_Apple=1_Apple_A_someword.txt
2_Apple=2_Apple_A_someword.txt
3_Apple=3_Apple_A_someword.txt
and so on, up to
1000_Apple=1000_Apple_A_someword.txt
Could you tell me a one-line Unix command that does this? Maybe using awk or sed.
Try
ls *Apple_A* | sed 's/\(\(.*Apple\).*\)$/\2=\1/'
This might work for you (GNU sed):
ls -1 | sed '/^\([0-9]\+_Apple\)_A/!d;s//\1=&/'
An awk version, FYI:
ls -1 | awk -vFS='_' '/Apple_A/ {print $1"_"$2"="$0}'
ls -1 | perl -F_ -ane 'if($_=~m/Apple_A/){print $F[0]."_".$F[1]."=".$_}'
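A pure-shell variant, for comparison (config.txt is a hypothetical output name). Note that names like 1_Apple are not legal shell variable names, since they start with a digit, so treat the result as a key=value configuration file rather than something to source directly:
for f in *_Apple_A_*.txt; do
    printf '%s=%s\n' "${f%%_A_*}" "$f"  # strip everything from "_A_" onward to get the label
done > config.txt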
