Formatting text outputs in unix

Hi, I have a list here:
list_1.txt
Alpha
Bravo
Charlie
I also have files in a directory with the following names and contents:
Alpha_123.log
This is a sample line in the file
error_log "This is error1 in file"
This is another sample line in the file
This is another sample line in the file
This is another sample line in the file
error_log "This is error2 in file"
This is another sample line in the file
This is another sample line in the file
error_log "This is error3 in file"
This is another sample line in the file
This is another sample line in the file
Alpha_123.sh
This is a sample line in the file
This is another sample line in the file
This is another sample line in the file
error_log "This is errorA in file"
This is another sample line in the file
This is another sample line in the file
This is another sample line in the file
This is another sample line in the file
error_log "This is errorB in file"
This is another sample line in the file
This is another sample line in the file
error_log "This is errorC in file"
This is another sample line in the file
Bravo.log
Charlie.log
The contents of Bravo.log and Charlie.log are similar to those of Alpha_123.log.
I would like to have an output like this:
Alpha|"This is error1 in file"
Alpha|"This is error2 in file"
Alpha|"This is error3 in file"
Alpha|"This is errorA in file"
Alpha|"This is errorB in file"
Alpha|"This is errorC in file"
Any input is greatly appreciated. Thank you!
So basically, I would like to first find the files whose names contain the strings listed in list_1.txt, then extract their error messages and output each one prefixed with the matching string and a |.

awk to the rescue!
awk '/^error_log /{f=FILENAME; sub(/\.log$/,"",f); sub(/^error_log /,f"|"); print}' $(awk '{print $0".log"}' list_1.txt)
UPDATE
Based on the updated info, I think this is what you're looking for:
awk '/^error_log/ {split(FILENAME,f,"_");
gsub(/^error_log /,f[1]"|"); print}' $(awk '{print $0"_*"}' list_1.txt)
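For reference, here is a minimal sketch of the whole task as a POSIX shell function. It assumes the files sit in the current directory and are named either <name>_<anything> or <name>.<ext>; the function name extract_errors and the demo data are hypothetical. Instead of parsing the name back out of the filename, it uses the entry from list_1.txt itself as the prefix, which also covers files like Bravo.log that have no underscore:

```shell
#!/bin/sh
# Hypothetical helper: for every name in the list file, scan files named
# "<name>_*" or "<name>.*" and print "<name>|<message>" for each error_log line.
extract_errors() {
    while IFS= read -r name; do
        [ -n "$name" ] || continue              # ignore blank lines in the list
        for f in "$name"_* "$name".*; do
            [ -f "$f" ] || continue             # an unmatched glob stays literal; skip it
            # sub() returns the number of replacements made, so the action
            # runs only on lines that started with "error_log ".
            awk -v p="$name" 'sub(/^error_log /, "") { print p "|" $0 }' "$f"
        done
    done < "$1"
}

# Demo data modelled on the question (sample lines shortened).
demo=$(mktemp -d) && cd "$demo" || exit 1
printf '%s\n' Alpha Bravo > list_1.txt
printf '%s\n' 'a sample line' 'error_log "This is error1 in file"' > Alpha_123.log
printf '%s\n' 'error_log "This is errorA in file"' 'a sample line' > Alpha_123.sh
printf '%s\n' 'error_log "This is errorX in file"' > Bravo.log

extract_errors list_1.txt
# Alpha|"This is error1 in file"
# Alpha|"This is errorA in file"
# Bravo|"This is errorX in file"
```

Note that a name like Alpha would also pick up unrelated files such as Alpha_old.bak, so tighten the globs if your directory contains other files.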

If I understood correctly, this should do it:
awk -vOFS=\| 'FNR==1{file=FILENAME;sub("[.]log$","",file)} sub("^error_log ",""){print file,$0}' *.log
explanation:
-vOFS=\| sets the output field separator to |. The \ escapes the | from the shell, which would otherwise treat it as a pipe; you could write -vOFS='|' instead.
FNR==1{...} makes sure this code runs only once per input file: FNR is the number of records (i.e. lines) awk has read so far from the current file, so it equals 1 only while processing the first line of each file.
file=FILENAME just stores the name of the input file currently being processed in a variable for later editing.
sub("[.]log$","",file) removes .log from the end of the filename (that's what the $ stands for). The brackets in [.] stop the dot from being interpreted as "any character" in the regular expression; you could use \\. instead.
sub("^error_log ",""){...}: sub() removes "error_log " (note the trailing space!) from the beginning (that's what the ^ stands for) of each line ("record") of the input and returns the number of substitutions it made. Used as a pattern, it is therefore true only for the error_log lines, so the {...} action runs only for them.
print file,$0 prints the remainder of each error line (i.e. the quoted message) prefixed by the corresponding filename. Note that the comma (,) will be replaced by the output field separator specified earlier. You could use print file "|" $0 instead without setting OFS.
*.log will make every file ending in .log in the current directory an input file for the awk command. You could list Alpha.log Bravo.log Charlie.log explicitly instead.
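To see why FNR==1 fires once per file, it helps to print NR and FNR side by side. A small sketch with two hypothetical files (a.log with two lines, b.log with one):

```shell
#!/bin/sh
# Two hypothetical input files: a.log has two lines, b.log has one.
demo=$(mktemp -d) && cd "$demo" || exit 1
printf 'x\ny\n' > a.log
printf 'z\n' > b.log

# NR keeps counting across all input files, while FNR restarts at 1
# for each new file, so FNR==1 is true exactly once per file.
show_counters() {
    awk '{ print FILENAME, NR, FNR }' a.log b.log
}
show_counters
# a.log 1 1
# a.log 2 2
# b.log 3 1
```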
Here is an alternative that uses your list_1.txt to construct the filenames:
awk -vOFS=\| '{file=$0;logfile=file ".log";while((getline < logfile) > 0){if(sub("^error_log ",""))print file,$0};close(logfile)}' list_1.txt
explanation:
file=$0 saves the current line (record) from list_1.txt in a variable.
logfile=file ".log" appends .log to it to get the corresponding log filename.
while((getline < logfile) > 0){...} runs the code for each line/record of the current log file; the > 0 comparison ends the loop at end of file and also if the file cannot be opened.
close(logfile) closes the log file once it has been read, so awk doesn't run out of file handles when the list is long.
The rest should be clear from the above example.
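A note on the getline loop: getline returns 1 when it reads a line, 0 at end of file and -1 when the file cannot be opened. A bare while(getline < file) treats -1 as true, so a missing log file would loop forever. This sketch (hypothetical file names present.log and missing.log) prints both cases:

```shell
#!/bin/sh
# getline's return value: 1 = a line was read, 0 = end of file,
# -1 = the file could not be opened.
demo=$(mktemp -d) && cd "$demo" || exit 1
printf 'one\ntwo\n' > present.log

check_getline() {
    awk 'BEGIN {
        n = 0
        while ((getline line < "present.log") > 0) n++   # reads 2 lines, then hits EOF
        close("present.log")
        r = (getline line < "missing.log")              # this file does not exist
        print n, r
    }'
}
check_getline
# 2 -1
```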

Related

unix search a pattern on one column and remove those lines

I am trying to search one column for a particular pattern, eliminate the rows that contain it, and create a new file without those rows.
Sample Data:
col1|col2|col3|col4
abc|test123|demo|test
def|test345|exam|write
ghf|456|test|account
ijk|789|travel|destination
Expected Output:
col1|col2|col3|col4
ghf|456|test|account
ijk|789|travel|destination
I want to search for the pattern "test" in the 2nd column, remove those rows from the source file, and create a new file as shown in the expected output.
The file is delimited by "|".
awk -F"|" '{if(index($2,"test")==0) printf "%s\n", $0}' test > test_out
test is the original file.
test_out is the final expected file.
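A runnable sketch of the same idea using the sample data from the question. The function name filter_col2 is hypothetical; the pattern-only form (no action) prints a line whenever the pattern is true, which is equivalent to the explicit printf above:

```shell
#!/bin/sh
# Hypothetical helper: print only the rows whose 2nd "|"-separated field does
# not contain "test". index() returns 0 when the substring is absent.
filter_col2() {
    awk -F'|' 'index($2, "test") == 0' "$1"
}

demo=$(mktemp -d) && cd "$demo" || exit 1
printf '%s\n' 'col1|col2|col3|col4' 'abc|test123|demo|test' \
    'def|test345|exam|write' 'ghf|456|test|account' \
    'ijk|789|travel|destination' > test
filter_col2 test > test_out
cat test_out
# col1|col2|col3|col4
# ghf|456|test|account
# ijk|789|travel|destination
```

The header survives because its 2nd field, col2, does not contain "test".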

Adding text to the beginning of a text file without having to copy the entire file in R

I have many large text files, and I would like to add a line at the very beginning of each. I saw that someone had already asked this here. However, that solution involves reading the entire text file and appending it to the single line. Is there a better (faster) way?
I tested this on Windows 7 and it works. Essentially, you use the shell function and do everything in the Windows cmd shell, which is quite fast.
write_beginning <- function(text, file){
#write your text to a temp file i.e. temp.txt
write(text, file='temp.txt')
#print temp.txt to a new file
shell(paste('type temp.txt >' , 'new.txt'))
#append your file to new.txt
shell(paste('type', file, '>> new.txt'))
#remove temp.txt - I use capture output to get rid of the
#annoying TRUE printed by file.remove
dump <- capture.output(file.remove('temp.txt'))
#uncomment the last line below to rename new.txt with the name of your file
#and essentially modify your old file
#dump <- capture.output(file.rename('new.txt', file))
}
#assuming your file is test.txt and you want to add 'hello' at the beginning just do:
write_beginning('hello', 'test.txt')
On Linux you just need the corresponding command for sending one file's contents into another. I believe type has to be replaced by cat there (and shell() by system(), since shell() is only available in R on Windows), but I cannot test that right now.
You'd use the system() function on a Linux distro; replace the " " with the line you want to add:
system('cp file.txt temp.txt; echo " " > file.txt; cat temp.txt >> file.txt; rm temp.txt')
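Common filesystems offer no way to insert bytes at the start of a file, so every approach ends up rewriting the whole file; the shell can at least stream it instead of loading it into memory. A minimal sketch of a command you could run through system() (the helper name prepend_line is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: prepend one line to a file by streaming
# "new line + old contents" into a temp file, then replacing the original.
prepend_line() {
    text=$1 file=$2
    # Temp file in the same directory, so the final mv is a cheap rename.
    tmp=$(mktemp "$file.XXXXXX") || return 1
    { printf '%s\n' "$text"; cat "$file"; } > "$tmp" && mv "$tmp" "$file"
}

demo=$(mktemp -d) && cd "$demo" || exit 1
printf 'line 1\nline 2\n' > test.txt
prepend_line hello test.txt
cat test.txt
# hello
# line 1
# line 2
```

Because the temp file is created by mktemp, the rewritten file ends up with mktemp's restrictive permissions (owner read/write only); chmod it afterwards if other users need access.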

how to split a large csv file in unix command line

I am trying to split a very large CSV file into parts. Whenever I run the following command, it doesn't split the whole file but returns the following error instead. How can I avoid this and split the whole file?
awk -F, '{print > $2}' test1.csv
awk: YY1 makes too many open files
input record number 31608, file test1.csv
source line number 1
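Just close each file after writing to it, and append with >> (with a plain >, every reopen after close() truncates the file, so each file would keep only its last line):
awk -F, '{print >> $2; close($2)}' test1.csv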
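A small sketch to check the behaviour on hypothetical data. Run it in an empty directory, since >> also appends to files left over from a previous run:

```shell
#!/bin/sh
# Hypothetical helper: write each line to a file named after its 2nd
# comma-separated field. ">>" appends, so reopening a file after close()
# keeps the earlier lines; a plain ">" would truncate it on every reopen.
split_by_col2() {
    awk -F, '{ print >> $2; close($2) }' "$1"
}

demo=$(mktemp -d) && cd "$demo" || exit 1
printf '%s\n' '1,A,x' '2,B,y' '3,A,z' > test1.csv
split_by_col2 test1.csv
cat A    # 1,A,x and 3,A,z
cat B    # 2,B,y
```

Opening and closing a file for every record can be slow on a huge input; if that becomes a problem, a common alternative is to sort by the second column first and close each file only when the key changes.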
You must have a lot of distinct values in that column. Are you sure the values in the second column repeat often enough to justify putting those records into individual files? Anyway, awk holds the files open until the end, so you'll need a process that closes the file handles when they're not in use.
Perl to the rescue. Again.
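#!/usr/bin/perl
use strict;
use warnings;

while ( my $line = <> ) {
    my @content = split /,/, $line;
    # append the whole line to a file named after the 2nd column
    open( my $out, '>>', $content[1] ) or die "whoops: $!";
    print $out $line;
    close $out;
}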
usage: script.pl your_monster_file.csv
This appends each entire line to a file in the current directory named after the value of its second CSV column. It assumes simple CSV (no quoted fields, no embedded commas, etc.).

extracting first line from file using awk command

I've been going through an online UNIX course and have come across this question which I'm stuck on. Would appreciate any help!
You are provided with a set of files each one of which contains personal details about an individual. Each file is laid out in the following format, with one file per individual:
name:Niko Tanaka
age:41
occupation:Doctor
I know the answer has to be in the form:
n=$(awk -F: ' / /{print }' filename)
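n=$(awk -F: '/^name:/{print $2}' infile)
Whatever is inside / / is a regular expression. In this case you just want to match the line that starts with 'name:' (the ^ anchors the match to the start of the line, so other lines that merely contain "name" are not picked up).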
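A quick way to try it with the sample record from the question (the filename niko.txt is hypothetical):

```shell
#!/bin/sh
demo=$(mktemp -d) && cd "$demo" || exit 1
printf '%s\n' 'name:Niko Tanaka' 'age:41' 'occupation:Doctor' > niko.txt

# -F: splits "name:Niko Tanaka" into $1 = "name" and $2 = "Niko Tanaka";
# /^name:/ selects only the line that starts with "name:".
n=$(awk -F: '/^name:/ { print $2 }' niko.txt)
echo "$n"    # Niko Tanaka
```

If a value could itself contain a colon, $2 would stop at it; using sub(/^name:/, "") followed by print keeps the whole rest of the line instead.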

To replace the first character of the last line of a unix file with the file name

We need a shell script that retrieves all txt files in the current directory and, for each file, checks whether it is empty or contains data (which I believe can be done with the wc command).
If it is empty, ignore it. Otherwise: in our case, every txt file in this directory is either empty or contains a large amount of data, and its last line looks like this:
Z|11|21||||||||||
That is, the last line has the character Z, then |, an integer, |, another integer, and then some number of | symbols.
If the file is not empty, we can assume it has this format. The data before the last line is garbled and not needed, but a non-empty file is guaranteed to have at least two lines.
We need code that, for each non-empty file, replaces the 'Z' in the last line with the file's name ('filename.txt') and writes the new data into another file, say tempfile. The last line thus becomes:
filename.txt|11|21|||||||
The rest of the line remains the same. From tempfile, the last line (filename.txt|int|int|||||) is taken out and appended to a finalfile. The contents of tempfile are then cleared to receive the data from the next .txt file in the same directory. In the end, finalfile holds the edited last lines of all non-empty txt files in that directory.
E.g. file1.txt contains:
....
....
....
Z|1|1|||||
and file2.txt contains:
....
....
....
Z|2|34|||||
After running the script, the new data of file1.txt becomes:
.....
.....
.....
file1.txt|1|1||||||
This is written into a new, initially empty file, say temp.txt. From there, the last line is appended to final.txt, so final.txt contains:
file1.txt|1|1||||||
After this, the data in temp.txt is cleared.
The new data of file2.txt becomes:
...
...
...
file2.txt|2|34||||||
This is written into the same temp.txt, and its last line is appended to the same final.txt.
So final.txt now contains:
file1.txt|1|1||||||
file2.txt|2|34||||||
After processing all N non-empty txt files in the same directory, final.txt contains:
file1.txt|1|1||||||
file2.txt|2|34||||||
file3.txt|8|3||||||
.......
.......
.......
fileN.txt|22|3|||||
For some of the steps I already know the commands:
To find files of type txt in a directory:
find <directory> -type f -name "*.txt"
To take the last line and append it to another file:
tail -n 1 file.txt >> destination.txt
You can use sed to replace the "Z" character. Since you'll be in a loop, you can use the filename you already have in the loop variable. This replaces the Z in the last line with the filename and echoes the result.
Good luck.
#!/bin/bash
filename=test.txt
# take the last line and replace its leading Z with the file name
line=$(tail -n 1 "$filename" | sed "s/^Z/$filename/")
echo "$line"
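The question's tempfile step (the whole file with only the last line edited) can be done in a single sed call by restricting the substitution to the last line with the $ address. A sketch with a hypothetical helper name and demo file:

```shell
#!/bin/sh
# Hypothetical helper: print the file with a leading "Z" on its LAST line
# replaced by the file's own name. The "$" address limits the substitution
# to the last line; "^Z" leaves other Z characters alone.
# Assumes the name contains no "/", "&" or "\" (they are special to sed).
edit_last_line() {
    sed "\$s/^Z/$1/" "$1"
}

demo=$(mktemp -d) && cd "$demo" || exit 1
printf '%s\n' 'Zebra' 'Z|1|1|||||' > file1.txt
edit_last_line file1.txt > temp.txt
cat temp.txt
# Zebra
# file1.txt|1|1|||||
```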
Edit:
Did you run your find command first and look at its output? Each line of course starts with the directory you passed (./ for the current directory). That will break the sed command, since sed uses / as its delimiter, and it also doesn't match your problem statement, which has no extra "/" before the filename. Also, you said current directory, but the command you gave traverses ALL subdirectories. Try keeping it simple and using ls.
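# `2>/dev/null` discards ls's "No such file or directory" message when there
# are no .txt files, so nothing is printed and the loop body simply doesn't run.
# Note: splitting ls output breaks on filenames that contain spaces.
for filename in $(ls *.txt 2>/dev/null) ; do
    : # ... stuff ... (`:` is a no-op placeholder for your processing of "$filename")
done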
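Putting the pieces together, here is a minimal end-to-end sketch (hypothetical function name build_final). It pipes each last line straight into the result instead of going through a temp.txt per file, skips empty files with test -s, and skips final.txt itself, since that name also matches *.txt:

```shell
#!/bin/sh
# Hypothetical end-to-end sketch: for every non-empty *.txt file in the
# current directory, take the last line, replace its leading Z with the
# file name, and collect the results in final.txt.
# Assumes file names contain no "&" or "\" (special in a sed replacement).
build_final() {
    out=$(mktemp) || return 1
    for f in *.txt; do
        [ -s "$f" ] || continue             # skip empty files (and a literal "*.txt" if nothing matched)
        [ "$f" = final.txt ] && continue    # never read our own output
        tail -n 1 "$f" | sed "s/^Z/$f/" >> "$out"
    done
    mv "$out" final.txt
}

demo=$(mktemp -d) && cd "$demo" || exit 1
printf '%s\n' '....' 'Z|1|1|||||' > file1.txt
printf '%s\n' '....' 'Z|2|34|||||' > file2.txt
: > empty.txt
build_final
cat final.txt
# file1.txt|1|1|||||
# file2.txt|2|34|||||
```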
