How to merge multiple files from multiple directories/folders

I have 300 directories/folders, and each directory contains a single two-column file (xxx.gz). I want to merge all of these files into one file. In every file the first column is an identifier (ID), and the IDs are the same across all files.
How do I merge all the files into a single file?
I also want a header for each column: the name of the file in the respective directory.
The directory names look like 68a7eb0a-123, b5694957-764, etc., and the file names look like a5c403c2, 292c4a2f, etc.
A directory's name and its file's name are not the same; I want the file name as the header.
All directories:
ls
6809b1c3-75a5
68e9b641-0cc9
71ae07b8-8bde
b7815cd2-1e69
..
..
Each directory contains a single file:
cd 6809b1c3-75a5
ls bd21dc2e.txt.gz

Try this:
for i in * ; do for j in "$i"/*.gz ; do echo "$j" >> ../final.txt ; gunzip -c "$j" >> ../final.txt ; done ; done
Annotated version:
for i in *                             # for each entry (assumed to be a directory) under the current working directory
do
    for j in "$i"/*.gz                 # for each gzipped file under that directory
    do
        echo "$j" >> ../final.txt      # write path/file as a marker line into the final file
        gunzip -c "$j" >> ../final.txt # append the decompressed contents to the final file
    done
done
Result:
$ head -8 ../final.txt
6809b1c3-75a5/bd21dc2e.txt.gz
blabla
whatever
you
have
in
those
files
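The loop above only concatenates the files. If the goal is the wide table the question describes, one column per file with the file name as its header, joined on the shared ID column, here is a minimal sketch. It assumes the files are whitespace-separated two-column tables with no header lines of their own, and that tmp is a scratch directory name you are free to use:
# decompress each file into a scratch directory, named after the original file
mkdir -p tmp
for f in */*.gz ; do
    gunzip -c "$f" > "tmp/$(basename "$f" .txt.gz).txt"
done
# build one wide, tab-separated table: ID first, one column per file,
# with the file name as that column's header
awk -v OFS='\t' '
FNR==1 {                              # first line of each input file
    fn = FILENAME
    sub(/^tmp\//, "", fn)             # strip the scratch-directory prefix
    sub(/\.txt$/, "", fn)             # strip the extension, leaving the file name
    hdr[++nf] = fn
}
{
    if (!($1 in seen)) { seen[$1] = 1; ids[++ni] = $1 }
    val[$1, nf] = $2                  # remember this file'"'"'s value for this ID
}
END {
    printf "ID"
    for (j = 1; j <= nf; j++) printf "%s%s", OFS, hdr[j]
    print ""
    for (i = 1; i <= ni; i++) {
        printf "%s", ids[i]
        for (j = 1; j <= nf; j++) printf "%s%s", OFS, val[ids[i], j]
        print ""
    }
}' tmp/*.txt > final.txt
With 300 files this stays within typical argument-list limits; IDs missing from a given file simply produce an empty cell in that column.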

Related

paste text to filenames in unix

I have a list of strings in String.txt that I want to prepend to the names of the target files in the folder. All files are in order.
String.txt:
ID1Somestring_
IDISomeOtherString_
IDISomeThirdString_
Targets:
example1.fastq
example2.fastq
example3.fastq
output:
ID1Somestring_example1.fastq
IDISomeOtherString_example2.fastq
IDISomeThirdString_example3.fastq
First, read the file into an array
mapfile -t strings < String.txt
Then, iterate over the files and access each array element in turn (the echo makes this a dry run; remove it to actually rename):
n=0; for file in *fastq; do echo mv "$file" "${strings[n++]}$file"; done
mv example1.fastq ID1Somestring_example1.fastq
mv example2.fastq IDISomeOtherString_example2.fastq
mv example3.fastq IDISomeThirdString_example3.fastq
Or, assuming your file names do not contain newlines:
paste String.txt <(printf "%s\n" *fastq) |
while read -r string file; do echo mv "$file" "$string$file"; done
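Both variants assume the glob expands to exactly as many files as String.txt has lines, in matching (lexical) order. A minimal defensive sketch of the same rename, written as a bash script that checks that assumption first:
mapfile -t strings < String.txt
files=(*fastq)
# refuse to rename anything if the counts do not line up
if (( ${#strings[@]} != ${#files[@]} )); then
    echo "count mismatch: ${#strings[@]} strings vs ${#files[@]} files" >&2
    exit 1
fi
for i in "${!files[@]}"; do
    mv -- "${files[i]}" "${strings[i]}${files[i]}"
done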

Script to Tar file in a specific Directory

I'm trying to write a Unix script that will go to a specific directory (/tmp/Sandbox/logs), tar.gz all the log files in that directory that are older than 2 days, and delete the originals. So far I have this, but it doesn't work. Any help would be appreciated.
#!/bin/bash
AGE_TO_COMPRESS="172800" # 172800 seconds = 2 days
LOG_FILES="export/home/H0166015/Sandbox/logs"
# Any file older than EDGE_DATE must be compressed
NOW=$( date +%s )
EDGE_DATE=$(( NOW - AGE_TO_COMPRESS ))
for file in $LOG_FILES ; do
    # check if file exists
    if [ -e "$file" ] ; then
        # compare "modified date" of file to EDGE_DATE
        if [ $( stat -c %Y "$file" ) -gt ${EDGE_DATE} ] ; then
            # create tar file of a single file
            tar -cvzf $file.tar.gz $file --remove-files
        fi
    fi
done
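No answer is recorded here, but two bugs stand out: LOG_FILES holds a single directory path (missing its leading slash) rather than a glob of files, so the loop iterates once over a name that doesn't exist, and the -gt comparison selects files newer than the cutoff instead of older. A hedged fix, assuming GNU stat and tar are available and that the logs live in /tmp/Sandbox/logs with a .log suffix:
#!/bin/bash
AGE_TO_COMPRESS="172800"                # 172800 seconds = 2 days
LOG_DIR="/tmp/Sandbox/logs"
NOW=$( date +%s )
EDGE_DATE=$(( NOW - AGE_TO_COMPRESS ))  # anything last modified before this is "old"
for file in "$LOG_DIR"/*.log ; do
    [ -e "$file" ] || continue
    # compress files OLDER than the cutoff: modification time BELOW EDGE_DATE
    if [ "$( stat -c %Y "$file" )" -lt "$EDGE_DATE" ] ; then
        # archive the single file, then remove the original only on success
        tar -czf "$file.tar.gz" -C "$LOG_DIR" "$(basename "$file")" && rm -f -- "$file"
    fi
done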

How to use awk for multiple file search in two directories, print records only from files with matching string in second directory

I've remade a previous question to make it clearer. I'm trying to search files in two directories and print matching character strings (plus the line immediately following) into a new file, taking them from the second directory only when they match a record in the first directory. I have found similar examples but nothing quite the same. I don't know how to use awk across multiple files from different directories, and I've tortured myself trying to figure it out.
Directory 1, 28,000 files, formatted as follows:
>ABC
KLSDFIOUWERMSDFLKSJDFKLSJDSFKGHGJSNDKMVMFHKSDJFS
>GHI
OOILKJSDFKJSDFLMOPIWERIOUEWIRWIOEHKJTSDGHLKSJDHGUIYIUSDVNSDG
Directory 2, 15 files, formatted as follows:
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>DEF
12341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
Desired output:
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
Directories 1 and 2 are located in my home directory (./Test1 and ./Test2).
If anyone could advise a command to specify the two directories, I'd be immensely grateful! Currently, when I include a file path (e.g., /Test1/*.fa) I get the following error:
awk: can't open file /Test1/*.fa
(That error means the shell found nothing matching the pattern and passed it to awk literally: /Test1 is an absolute path at the filesystem root, while the directories live under the home directory, so the glob should be ./Test1/*.fa.)
You'll want something like this (untested):
awk '
FNR==1 {
    dirname = FILENAME
    sub("/.*", "", dirname)            # keep only the directory part of the path
    if (NR==1) {
        dirname1 = dirname             # remember the first directory on the command line
    }
}
dirname == dirname1 {                  # still reading Test1: collect its headers
    if (FNR % 2) {
        keys[$0]
    }
    next
}
FNR % 2 {                              # Test2 header line
    key = $0
    next
}
(key in keys) && !seen[key, $0]++ {    # Test2 data line whose header also occurs in Test1
    print key ORS $0
}
' Test1/* Test2/*
Given that you're getting the error message /usr/bin/awk: Argument list too long, which means you're exceeding your shell's maximum argument length for a command, and that 28,000 of your files are in the Test1 directory, try this:
find Test1 -type f -exec cat {} \; |
awk '
NR == FNR {                            # first input: the concatenated Test1 files on stdin
    if (FNR % 2) {
        keys[$0]                       # collect Test1 headers
    }
    next
}
FNR % 2 {                              # Test2 header line
    key = $0
    next
}
(key in keys) && !seen[key, $0]++ {    # Test2 data line whose header also occurs in Test1
    print key ORS $0
}
' - Test2/*
Solution in TXR:
Data:
$ ls dir*
dir1:
file1 file2
dir2:
file1 file2
$ cat dir1/file1
>ABC
KLSDFIOUWERMSDFLKSJDFKLSJDSFKGHGJSNDKMVMFHKSDJFS
>GHI
OOILKJSDFKJSDFLMOPIWERIOUEWIRWIOEHKJTSDGHLKSJDHGUIYIUSDVNSDG
$ cat dir1/file2
>XYZ
SDOIWEUROIUOIWUEROIWUEROIWUEROIWUEROUIEIDIDIIDFIFI
>MNO
OOIWEPOIUWERHJSDHSDFJSHDF
$ cat dir2/file1
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>DEF
12341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
$ cat dir2/file2
>STP
12341234123412341234123412341234123412341234123412341234123412341234123412341234
>MNO
123412341234123412341234123412341234123412341234123412341234123412341234
$
Run:
$ txr filter.txr dir1/* dir2/*
>ABC
12341234123412341234123412341234123412341234123412341234123412341234
>GHI
12341234123412341234123412341234123412341234123412341234123412341234123412341234
>MNO
123412341234123412341234123412341234123412341234123412341234123412341234
Code in filter.txr:
#(bind want #(hash :equal-based))
#(next :args)
#(all)
#dir/#(skip)
#(and)
# (repeat :gap 0)
#dir/#file
# (next `#dir/#file`)
# (repeat)
>#key
# (do (set [want key] t))
# (end)
# (end)
#(end)
#(repeat)
#path
# (next path)
# (repeat)
>#key
#datum
# (require [want key])
# (output)
>#key
#datum
# (end)
# (end)
#(end)
To separate the dir1 paths from the rest, we use an #(all) match (try multiple pattern branches, which must all match) with two branches. The first branch matches one #dir/#(skip) pattern, binding the variable dir to the text before the slash, and ignores the rest. The second branch matches a whole consecutive sequence of #dir/#file patterns via #(repeat :gap 0). Because the same dir variable appears, and it already has a binding from the first branch of the #(all), the matches are constrained to the same directory name.

Inside this repeat we recurse into each file via next and gather the >-delimited keys into the want hash.

After that, we process the remaining arguments as path names of files to scan; they don't all have to be in the same directory. We scan through each one for the >#key pattern followed by a line of #datum. The #(require ...) directive fails the match if key is not in the want hash; otherwise we fall through to the #(output).

Search for a pattern in files and log the result into another file

I have a file One.lst that has the following content:
a
b
c
Now, I have to search in:
a_man.lst
b_man.lst
c_man.lst
for the pattern <inc>true</inc>.
Suppose b_man.lst has the above pattern.
Then I have to log b in another file, say Two.lst.
You can, for example, do:
while IFS= read -r line
do
    grep -q "<inc>true</inc>" "${line}_man.lst" && echo "$line" >> Two.lst
done < One.lst
The while read loop reads each prefix from One.lst and stores it in $line.
grep -q checks for the existence of <inc>true</inc> in ${line}_man.lst. In case of a match, the prefix ($line) is appended to Two.lst.
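If every *_man.lst file corresponds to a line in One.lst, the loop can also be collapsed into a single pipeline; this is a sketch under that assumption:
# list the files containing the pattern, then strip the _man.lst suffix
grep -l '<inc>true</inc>' *_man.lst | sed 's/_man\.lst$//' > Two.lst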

how to copy the dynamic file name and append some string while copying into other directory in unix

I have many files like ABC_Timestamp.txt and RAM_Timestamp.txt, where the timestamp is different every time. I want to copy these files into another directory, but while copying I want to append a string to each file name, so the results will be ABC_Timestamp.txt.OK and RAM_Timestamp.txt.OK. How do I append the string to these dynamic file names? Please suggest.
My 2 pence:
(cat file.txt; echo "append a line"; date +"perhaps with a timestamp: %T") > file.txt.OK
Or more complete for your filenames:
while sleep 3;
do
for a in ABC RAM
do
(echo "appending one string at the end of the file" | cat ${a}_Timestamp.txt -) > ${a}_Timestamp.txt.OK
done
done
Execute this on the command line:
ls -1 | awk '/ABC_.*\.txt/ || /RAM_.*\.txt/ { old=$0; new="/new_dir/" old ".OK"; system("cp " old " " new) }'
(As originally written, the newline between the pattern and the opening brace split this into two separate rules, so the copy ran for every line; a pattern and its action must share a line. Also note that system("cp " old " " new) breaks on file names containing spaces.)
You can say:
for i in *.txt; do cp "${i}" targetdirectory/"${i}".OK ; done
or
for i in ABC_*.txt RAM_*.txt; do cp "${i}" targetdirectory/"${i}".OK ; done
How about first dumping the file names into another file and then copying the files one by one?
find . -name "*.txt" > fileNames
while IFS= read -r line
do
    newName="${line}.OK"
    echo "$newName"
    cp "$line" "$newName"
done < fileNames
