Extracting a list of files and creating a new file containing this list - unix

I am a researcher and my skill in Unix commands is limited. I am currently dealing with a folder containing about 1000 files and I have to extract some filenames from this folder and create another file (configuration file) containing these filenames.
Basically, the folder has filenames in the following format:
1_Apple_A_someword.txt
1_Apple_B_someword.txt
2_Apple_A_someword.txt
2_Apple_B_someword.txt
3_Apple_A_someword.txt
3_Apple_B_someword.txt
and so on up until
1000_Apple_A_someword.txt
1000_Apple_B_someword.txt
I just want to extract all files which have "Apple_A" in them. I also want to create another file which has 'labels' (Unix variables) for each of these "Apple_A" files, whose values are the names of the files. The 'labels' are part of the filenames (everything up to and including the word "Apple"). For example,
1_Apple=1_Apple_A_someword.txt
2_Apple=2_Apple_A_someword.txt
3_Apple=3_Apple_A_someword.txt
and so on...till
1000_Apple=1000_Apple_A_someword.txt
Could you tell me a one-line Unix command that does this? Maybe using "awk" and "sed".

Try
ls *Apple_A* | sed 's/\(\(.*Apple\).*\)$/\2=\1/'

This might work for you (GNU sed):
ls -1 | sed '/^\([0-9]\+_Apple\)_A/!d;s//\1=&/'

An awk version, FYI:
ls -1 | awk -vFS='_' '/Apple_A/ {print $1"_"$2"="$0}'

ls -1|perl -F_ -ane 'if($_=~m/Apple_A/){print $F[0]."_".$F[1]."=".$_}'
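If you'd rather avoid parsing ls output altogether, a plain bash loop over the glob does the same job. This is just a sketch; config.txt is a placeholder name for the output file:

```shell
# Build label=filename lines for every Apple_A file.
# ${f%_A_*} strips the shortest suffix matching "_A_*",
# leaving everything up to and including "Apple".
for f in *Apple_A*.txt; do
    printf '%s=%s\n' "${f%_A_*}" "$f"
done > config.txt
```

The advantage over the ls-based one-liners is that it handles filenames with spaces or other unusual characters correctly.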

Related

How to search for a pattern in a file from the end of a directory using grep?

I need to search for files containing a pattern in a directory (to search from the end of the directory to the start).
This is the command I use now,
grep -rl 'pattern'
Is there any command to search for a pattern from the last file of a directory to the first file?
If you want to grep to search in some order, you need to pass it a list of file names in the order you want. If you want the files in the current directory in reverse order of name, ls -r would do the job. How about something like this?
ls -1br | xargs grep 'pattern'
Note the -b, which is needed to mitigate problems with spaces and metacharacters in file names.
Note also that this won't cope well with sub-directories. But the principle is sound - generate a list of files in the order you want and pass it to grep using xargs.
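If the file names may contain spaces or newlines, a NUL-delimited variant of the same idea is safer. A sketch, assuming GNU sort and xargs; 'pattern' is a placeholder:

```shell
# List files in reverse name order, NUL-separated, then grep them.
# grep -l prints matching file names in the order they are given.
printf '%s\0' * | sort -zr | xargs -0 grep -l 'pattern'
```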

Changing multiple filenames that have a number already in the file name in Unix

so I want to batch-rename files with these types of names (about 400 files):
L1_Mviridis.fasta
L2_Mviridis.fasta
L3_Mviridis.fasta...
to this:
L1_1_Mviridis.fasta
L2_2_Mviridis.fasta
L3_3_Mviridis.fasta
I do not have the "rename" command available either.
Thanks for any suggestion!
You have two choices, I suggest. You can write a Python script that renames each file: use the split() function to split on the underscore and extract the number (the question is not entirely clear on this). There is already an answer here:
Rename multiple files in a directory in Python
Or you can write a bash script that renames each file with mv <old name> <new name>, using sed or awk to build the new names. You can chain commands (for example ls -la | awk) and use a for loop to iterate; below is a guide to shell scripting:
http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-7.html
There is already an answer here as well:
BASH: Rename multiple files. But only rename part of the filename
I hope this is a good starting point for you.
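A concrete version of the mv-loop idea, as a sketch, assuming every name follows the L<number>_Mviridis.fasta pattern shown in the question:

```shell
# Rename L1_Mviridis.fasta -> L1_1_Mviridis.fasta, etc.
for f in L*_Mviridis.fasta; do
    n=${f%%_*}    # e.g. "L1"
    n=${n#L}      # strip the leading "L" -> "1"
    mv "$f" "L${n}_${n}_Mviridis.fasta"
done
```

Note that rerunning the loop would rename the files again, so run it only once on the original names.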

approximate matching using grep

I need to create a script that loops through files in a directory, checks if each filename is in "list.txt", and then processes it. My problem is that the filenames are dynamic since they contain a timestamp.
Is there a way to grep for an approximate match in Unix?
Sample.
list.txt
SAMPLE_REPORT_1
SAMPLE_REPORT_2
Report Filenames
SAMPLE_REPORT_1_20180416121345.csv
SAMPLE_REPORT_2_20180416121645.csv
I need to check if the filenames are in list.txt
bash + grep solution:
for f in *.csv; do
    if grep -qx "${f%_*.csv}" list.txt; then
        echo "processing file $f"    # replace with the actual processing
    fi
done
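The key piece is the ${f%_*.csv} expansion, which strips the shortest suffix matching _*.csv (i.e. the timestamp and extension) before the lookup in list.txt:

```shell
# Parameter expansion demo: remove the trailing "_<timestamp>.csv".
f=SAMPLE_REPORT_1_20180416121345.csv
echo "${f%_*.csv}"    # prints SAMPLE_REPORT_1
```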

Unix - Using ls with grep

How can I use ls (or other commands) and grep together to search from specific files for a certain word inside that file?
Example I have a file - 201503003_315_file.txt and I have other files in my dir.
I only want to search files that have a file name that contains _315_ and inside that file, search for the word "SAMPLE".
Hope this is clear and thanks in advance for any help.
You can do:
ls *_315_* | xargs grep "SAMPLE"
The first part, ls *_315_*, will list only files that have _315_ as part of the file name; this list of files is piped to grep, which will scan each one of them and look for "SAMPLE".
UPDATE
A somewhat easier (and actually safer) approach was mentioned by David in the comments below:
grep "SAMPLE" *_315_*.txt
The reason it's safer is that ls doesn't handle special characters in file names well.
Another option, as mentioned by Charles Duffy in the comments below:
printf '%s\0' *_315_* | xargs -0 grep "SAMPLE"
Change to that directory (using cd dir) and try:
grep SAMPLE *_315_*
If you really MUST use ls AND grep try this:
ls *_315_* | xargs grep SAMPLE
The first example, however, requires less typing...

Unzip only limited number of files in linux

I have a zip archive containing 10,000 compressed files. Is there a Linux command/bash script to unzip only 1,000 of them? Note that all compressed files have the same extension.
unzip -Z1 test.zip | head -1000 | sed 's| |\\ |g' | xargs unzip test.zip
-Z1 provides a raw list of files
sed expression escapes spaces (works everywhere, including macOS)
You can use wildcards to select a subset of files. E.g.
Extract all contained files beginning with b:
unzip some.zip b*
Extract all contained files whose name ends with y:
unzip some.zip *y.extension
You can either select a wildcard pattern that is close enough, or examine the output of unzip -l some.zip closely to determine a pattern or set of patterns that will get you exactly the right number.
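One caveat worth noting: quote the wildcard so that unzip, not the shell, matches it against the archive's member names (otherwise the shell may expand it against files in the current directory first). A sketch; some.zip is a placeholder:

```shell
unzip -l some.zip      # inspect the archive's contents first
unzip some.zip 'b*'    # quoted: unzip matches b* against member names
```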
I did this:
unzip -l zipped_files.zip | head -1000 | cut -b 29-100 > list_of_1000_files_to_unzip.txt
I used cut to get only the filenames; the first columns are size, date, and time.
Now loop over the filenames:
for f in $(cat list_of_1000_files_to_unzip.txt); do unzip zipped_files.zip "$f"; done
Some advice:
Execute unzip to only list the files, redirecting the output to a file
Truncate this file to keep only the top 1000 rows
Pass the file to unzip to extract only the specified files
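The steps above can be sketched as follows (file names are placeholders; -Z1 prints one member name per line):

```shell
# Steps 1 and 2: list member names and keep the first 1000.
unzip -Z1 zipped_files.zip | head -1000 > first1000.txt
# Step 3: extract each listed member.
while IFS= read -r name; do
    unzip -o zipped_files.zip "$name"
done < first1000.txt
```

The read loop passes each name as a single argument, so members with spaces in their names are handled correctly (unlike the backtick loop above).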
