Extract exactly one file (any) from each 7zip archive, in bulk (Unix) - unix

I have 1,500 7zip archives, each archive contains 2 to 10 files, with no subdirectories.
Each file has the same extension, but the filenames vary.
I only want one file out of each archive, but I'd like to perform this in bulk. I do not care which file is taken out, as long as only one file is taken out. It can be the first file, the newest, the biggest, the smallest, it doesn't matter.
Here's an example:
aa.7z {blah 56.smc, blah 57.smc, 1 blah 58.smc}
ab.7z {xx.smc, xx 1.smc, xx_2.smc}
ac.7z {1.smc}
I want to run something equivalent to:
7z e *.7z # But somehow only extract one file
Thank you!

Ultimately my solution was to extract all files and run the following in the directory:
for n in *; do echo "$n"; done > files.txt
I then imported that list into Excel and split each filename on the special character that separates the title from the qualifying data inside the name (for example: Some Title (V1) [X2].smc); specifically, I used an opening bracket as the delimiter.
Then I removed all duplicates, leaving only one edition of each title from the archives. I re-merged the columns (the bracket was deleted during the split, so I wrote a function to add it back whenever the next column had content), resaved files.txt and, after reviewing Stack Overflow for answers, deleted files based on that input file (files.txt). A word of warning on this: spaces in filenames cause problems with rm and xargs, so I had to wrap the variable in quotes.
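For example, a quoting-safe way to delete the files listed in files.txt could look like this (a sketch; it assumes one filename per line with no embedded newlines, run from the extraction directory):

while IFS= read -r f; do
    rm -- "$f"    # the quotes keep names with spaces in one piece
done < files.txt

With GNU xargs, xargs -d '\n' rm -- < files.txt does the same thing without a loop.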
Ultimately this still didn't serve me well enough, so I just used a different resource entirely.
Posting this answer so others who find themselves in a similar predicament find an alternative resolution.
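For completeness, the direct route the question asks about should also be doable with a short loop. Here is a rough, untested sketch; it assumes the p7zip 7z binary, and the -slt listing layout (archive header first, then a ---------- separator before the per-member "Path = " lines) is an assumption worth double-checking on your build:

mkdir -p extracted
for a in *.7z; do
    # take the first member name from the technical listing; the first
    # "Path =" line is the archive itself, member entries follow the
    # "----------" separator line
    first=$(7z l -slt "$a" | awk -F' = ' '
        /^----------$/ { body = 1; next }
        body && /^Path = / { print $2; exit }')
    # extract just that one member; identically named members from
    # different archives would collide in ./extracted
    7z e "$a" -oextracted "$first"
done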

Related

How to search by key word list in file content for a drive or directory

I have a text file containing a list of key words. I need to search a drive (and recursively its subfolders) containing millions of files in different formats by those key words in the filename AND their content.
The desired result would be a list of the files containing those keywords and their location.
Is there a way to achieve that with bash or an R script? I was told that bash would be way too slow to execute for that quantity of files, hence I thought about R.
Any suggestions on functions that could achieve that would be appreciated.
I need to search a drive (and recursively its subfolders) containing millions of files in different formats by those key words in the filename AND their content.
The desired result would be a list of the files containing those keywords and their location.
Using bash is perfect for this situation.
for item in $(cat file) ; do grep -rni "$item" /search_path ; done
The search might take a while, though.
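If filename matches are needed as well as content matches, a single-pass sketch along these lines may help (it assumes GNU grep and find; keywords.txt stands in for your keyword list, one keyword per line). Reading the patterns from a file with -f also avoids running grep once per keyword, and sidesteps the word-splitting that the for loop above does on multi-word keywords:

# content matches: -f reads patterns from keywords.txt, -r recurses,
# -l lists only the matching file names, -i ignores case
grep -rlif keywords.txt /search_path

# filename matches: list every file and filter the paths against the same list
find /search_path -type f | grep -if keywords.txt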

list of files with space in the name

I would like to get the list of files with a specific extension in a folder. However, these files have spaces in their names. So, for example, imagine I have files named file test1.txt, file test2.txt, file test3.txt, file test4.txt, if I do
list.files(pattern="file test*.txt")
I got
character(0)
Note: apparently, simply using pattern="file test*" works fine, but I need the file extension as well.
Try:
list.files(pattern="file test.*.txt")
Actually, what this says is:
list.files(pattern="file test(.*).txt")
(which also works). Here, . matches any single character and * means that character may appear 0 or more times (see ?regex).
In your last example you said that using pattern="file test*" works, but you need a way to match the extension as well.
All you have to do is change your code to pattern="file test.*.txt". This makes it match any filename of the form "file testX.txt", where X is any sequence of characters (possibly empty).

gzgrep help in multiple large archives - solaris

On Solaris, I need to perform a gzgrep across archives, but I need to filter so I'm not searching ALL the archives - maybe just files with '09.30-12' in the name - and then search IN that particular file or files for a particular expression. I have this close, but it takes WAY too long, since it searches unnecessary files first and matches on those before moving on to the October archives and finding what I need in them. Basically, I need to search any file whose name contains 'x', then look in those files for text 'y', and output to > fileoutput. Perhaps just change the *.gz to match only a set of files? I cannot figure out how, though. Any help is MUCH appreciated.
Something like this works, but I get way too much output and it takes way too long.
gzgrep 'firstexpression' *.gz > /fileoutput.file
maybe just files with '09.30-12' in the name..
You could say:
gzgrep 'firstexpression' *09.30-12*.gz > fileoutput.file
or
gzgrep pattern_to_search *filename_pattern*.gz > outfile
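If the archives were spread over subdirectories rather than sitting in one place, a find-based variant of the same idea might look like this (a sketch; /path/to/archives is a placeholder, and depending on the gzgrep implementation you may need to print each file name yourself, e.g. with an extra echo per file):

find /path/to/archives -type f -name '*09.30-12*.gz' \
    -exec gzgrep 'firstexpression' {} \; > /fileoutput.file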

Reading a file into R with partly unknown filename

Is there a way to read a file into R when I do not know the complete file name? Something like:
read.csv("abc_*")
In this case I do not know the complete file name after abc_
If you have exactly one file matching your criteria, you can do it like this:
read.csv(dir(pattern='^abc_')[1])
If there is more than one file, this approach would just use the first hit. In a more elaborate version you could loop over all matches and append them to one data frame, or something like that.
Note that the pattern uses regular expressions and thus is a bit different from what you expected (and what I wrongly assumed in my first attempt at answering the question). Details can be found under ?regex.
If you have a directory you want to submit, you have to modify the dir command accordingly:
read.csv(dir('path/to/your/file', full.names=T, pattern="^abc"))
The submitted path in your case may be c:\\users\\user\\desktop, and then the pattern as above. full.names=T forces dir() to output a whole path and not only the file name. Try running dir(...) without the read.csv to understand what is happening there.
If you want to give your path as a complete string, it again gets a bit more complicated:
filepath <- 'path/to/your/file/abc_'
read.csv(dir(dirname(filepath), full.names=T, pattern=paste("^", basename(filepath), sep='')))
That process will fail if your filename contains any regular-expression metacharacters. You would have to substitute them with their corresponding escape sequences upfront. But that, again, is another topic.

Are there any symbols that come after all letters in Unix's alphabetical sorting?

When I organize my directories I often want certain directories to stand out in ls. For example, I will sometimes have a directory called #backup# and this will end up in the top of the list of directories, rather than in between all directories starting in "b". Sometimes, though, I want a directory to be at the bottom of the list, but I haven't found any symbol that achieves this. (The closest I've come is z#name#z, but this doesn't quite cut it.) So: Are there any symbols that come after all letters in Unix's alphabetical sorting?
You can use any character (ASCII or Unicode; it depends on your encoding and localization) except NUL (which terminates the file path) and / (which separates directories in a file path). See path_resolution(7). You might consider using ~, because several utilities (see indent(1), mv(1), ...) adopt the convention of backing up the file /home/nag/foo as /home/nag/foo~. AFAIK, #foo# may be used by emacs to temporarily back up an edited (but unsaved) file foo.
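Whether a character really sorts after the letters depends on LC_COLLATE, so it's worth a quick test in your own locale. A small sketch with made-up names, using the C (plain byte-order) locale where ~ (0x7E) follows z (0x7A):

printf '%s\n' '#backup#' 'backup' 'zebra' '~archive' | LC_ALL=C sort
# prints, in this order:
#   #backup#
#   backup
#   zebra
#   ~archive

ls respects the same collation setting, so in many UTF-8 locales punctuation is weighted differently and the ordering can change.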
