File1 + (File2 - first line) > File3 - unix

I have two csv/text files that I'd like to join. Both contain the same first line. I'm trying to figure out how to use sed and cat to produce a merged file with only one copy of the first line, and I'm having a hard time with the syntax. Any help would be greatly appreciated :-D!
Thanks,
Andrew

Another option with awk:
awk 'NR==FNR || FNR>1' file1.txt file2.txt .. fileN.txt
This prints all lines of the first file, or any line past the first in each subsequent file; the two conditions mirror the || in the pattern.
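For example, with two small headered files (the names and contents here are just an illustration):
$ cat a.csv
id,name
1,foo
$ cat b.csv
id,name
2,bar
$ awk 'NR==FNR || FNR>1' a.csv b.csv
id,name
1,foo
2,bar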

This will combine files data1.txt and data2.txt into file merged.txt, skipping the first line of data2.txt. It uses awk, if you are OK with that:
(cat data1.txt; awk 'NR>1' data2.txt) > merged.txt
awk prints all lines with line number > 1 from data2.txt; together with the output of cat, this is redirected to merged.txt.
NR is a built-in awk variable that holds the number of records (lines) read so far; with a single input file it is simply the current line number. If the Boolean expression NR > 1 is true, awk performs its default action, which is to print the line.
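A quick illustration of that implicit print (the sample data here is made up):
$ printf 'id\n10\n20\n' | awk 'NR>1'
10
20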
If you didn't care about keeping data1.txt intact, you could just append your 2nd file (minus its first line) and reduce to just this:
awk 'NR>1' data2.txt >> data1.txt

I'd say the most straightforward solution is:
( cat file1.txt ; tail -n +2 file2.txt ) > file3.txt
It has the advantage of stating clearly just what you're doing: print the entire first file, then print all but the first line of the second file, writing the output to the third file.
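tail -n +2 means "start output at line 2", so the header of file2.txt is dropped. For instance:
$ printf 'header\nrow1\nrow2\n' | tail -n +2
row1
row2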

Solved with one line.
'1 d' means delete the first line of file2;
the following command appends the result to file1:
sed '1 d' file2 >> file1
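If you'd rather keep file1 untouched and write a third file instead, the same sed output can be combined with cat, where - names stdin:
sed '1 d' file2 | cat file1 - > file3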

Awk command to perform action on lines excluding 1st and last

I have multiple MS excel files in csv format in a particular directory.
I want to update the value of one particular column in all the rows of the csv files.
Also, the action should not be applied to the 1st and last lines.
So far I have come up with the below code for one file:
awk -F, 'NR>2{$2=300;}1' OFS=, test.csv
But I am facing difficulty in excluding the last line.
Also, I need to perform the same for all the files in the directory.
So far I have tried a couple of approaches but have not been able to replace that string value using awk.
This may do:
awk -F, 't{print t} {a=t=$0} NR>1{$2=300;t=$0} END {print a}' OFS=, test.csv
$ cat file
1,a,b
2,c,d
3,e,f
$ awk 'BEGIN{FS=OFS=","} NR>1{print (NR>2 ? chgd : orig)} {orig=$0; $2=300; chgd=$0} END{print orig}' file
1,a,b
2,300,d
3,e,f
You could simplify the script a bit by reading the file twice:
awk 'BEGIN{FS=OFS=","} NR==FNR {c=NR;next} !(FNR==1||FNR==c){$2=200} 1' file file
This uses the NR==FNR section merely to count lines, giving you a simple expression for determining whether to update the field in question.
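With the sample file from above (this variant writes 200 into the field), that looks like:
$ awk 'BEGIN{FS=OFS=","} NR==FNR {c=NR;next} !(FNR==1||FNR==c){$2=200} 1' file file
1,a,b
2,200,d
3,e,f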
And if you have GNU awk available, you might save a few CPU cycles by not reassigning the c variable for every line, using something like this:
gawk 'BEGIN{FS=OFS=","} ENDFILE {c=FNR} NR==FNR{next} !(FNR==1||FNR==c){$2=200} 1' file file
This still reads the file twice, but assigns c only after each file is read.
If you want, you can emulate the ENDFILE condition in non-GNU awk using NR>FNR && FNR==1 if you only have two files, then set c=NR-1. It won't perform as well.
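A minimal sketch of that emulation, assuming POSIX awk and the same three-line sample file:
awk 'BEGIN{FS=OFS=","} NR>FNR && FNR==1{c=NR-1} NR==FNR{next} !(FNR==1||FNR==c){$2=200} 1' file file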
I haven't tested the speed difference between these two, but I suspect it would be negligible except in cases of truly obscenely large files.
Thanks all,
I got it to work. Below is the command:
awk -v sq="'" -F, 't{print t} {a=t=$0} NR>2{$3=sq"ops_data"sq;t=$0} END {print a}' OFS=, test1.csv

One liner required, pref UNIX-based, for variation of JOIN command

I need a one-liner (that I can put in a DOS batch file), preferably using a unix command like AWK or JOIN. The function I need is essentially a more elaborate version of the following JOIN command:
join -j 1 -a 1 file1.txt file2.txt -t "^" > output.txt
[Walkthrough: field separators are "^", the join key is the 1st field of both files, and I'm not exactly sure what the "-a 1" does, but it is sticking the bit-to-be-joined on the end of the row of the other file, which is what I want.]
Now, this one-liner works fine where both files are sorted and there is only one matching line in the 2nd file ... but I need it to try to match up to 4 lines in the 2nd file.
E.g.
file1:
12^blahblah
13^blahblahblahblah
14^blahblahblahblahblahblahblahblah
file2:
12^banana
12^orange
12^apple
13^potato
14^tomato
So I want the output like this:
12^blahblah^banana,orange,apple
13^blahblahblahblah^potato
14^blahblahblahblahblahblahblahblah^tomato
[Doesn't have to be a comma separating the new items]
You can try this awk command:
awk -F'^' 'NR==FNR{if($1 in a){a[$1]=a[$1]","$2} else {a[$1]=$2}} NR>FNR{print $0 "^" a[$1]}' file2 file1
The script fills an array a with the content of file2, then appends the matching array entry to each line while parsing file1.
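Run against the sample files from the question, that gives:
$ awk -F'^' 'NR==FNR{if($1 in a){a[$1]=a[$1]","$2} else {a[$1]=$2}} NR>FNR{print $0 "^" a[$1]}' file2 file1
12^blahblah^banana,orange,apple
13^blahblahblahblah^potato
14^blahblahblahblahblahblahblahblah^tomato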
$ awk -F'^' 'NR==FNR{a[$1]=$0 FS;next} {a[$1] = a[$1] s[$1] $2; s[$1]=","} END{for (i in a) print a[i]}' file1 file2
12^blahblah^banana,orange,apple
13^blahblahblahblah^potato
14^blahblahblahblahblahblahblahblah^tomato

commandline output lines that are specified in another file

I am searching for a command line that takes a text file and a file with line numbers (one per line; alternatively from stdin) and outputs only those lines from the first file.
The text file may be several hundred MB large and the line list may contain several thousand entries (but they are sorted ascending).
in short:
one file contains data
another file contains indexes
a command should extract only indexed lines
first file:
many lines
of course they are all very different
and contain very important data
...
more lines
...
even more lines
second file:
1
5
7
expected output:
many lines
more lines
even more lines
The second (line number) file does not necessarily have to exist. Its data may also come from stdin (indeed, this would be the optimum). Also, the format of that data may vary from what is shown if this would make the task easier.
This can be an approach:
$ awk 'FNR==NR {a[$1]; next} FNR in a' file_with_line_numbers file_with_data
many lines
more lines
even more lines
It reads file_with_line_numbers and stores each line number as a key in the array a[]. Then it reads the other file and keeps checking whether the current line number (FNR) is in the array, in which case the line is printed.
The trick used is the following:
awk 'FNR==NR {something; next} {other things}' file1 file2
that performs actions related to file1 in the {something} block and then actions related to file2 in the {other things} block.
What if the line numbers are given through stdin?
For this you can use awk '...' - file, so that stdin is referred to with -. This is called Naming Standard Input. So you can do:
your_commands | awk 'FNR==NR {a[$1]; next} FNR in a' - file_with_data
Test
$ echo "1
5
7" | awk 'FNR==NR {a[$1]; next} FNR in a' - file_with_data
many lines
more lines
even more lines
With sed, convert the line numbers to a sed program, and use that generated program to print out the wanted lines:
$ sed -n "$( sed 's/$/p/' second_file )" first_file
many lines
more lines
even more lines
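To see what's happening, run the inner command on its own; with the second file above it generates one print command per line number:
$ sed 's/$/p/' second_file
1p
5p
7p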
This works too.
foreach line ( `cat file2` )
foreach? sed -n "$line p" file1
foreach? end
many lines
more lines
even more lines
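For those without csh, a minimal POSIX sh equivalent (note that both versions rescan file1 once per line number, so they're slow on large files):
while read -r line; do sed -n "${line}p" file1; done < file2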

How to append one file to the other, with first file being edited, hence can't use usual cat command

Suppose that I have two files, each of which has a header in the first line and records in the remaining lines. I want to concatenate the two files into one, but without including the header twice.
I tried the following commands while googling for the answer (hence I may not be going about this in an optimal way).
cat awk 'NR!=1 {printf "%s\n", $1}' file2.csv >| file.csv
However, I got the following error.
cat: awk: No such file or directory
cat: NR!=1 {printf "%s\n",$1}: No such file or directory
It looks like cat treated the awk program and its arguments as file names, not a command to run. I want the result of awk to be part of the file's content, so I also tried to pipe it to the argument of cat.
awk 'NR!=1 {printf "%s\n", $1}' file2.csv > cat file.csv
However, this way I got a file named cat, containing the result of awk...
So how can I solve it?
Thanks.
You need some grouping:
{
cat file1
sed '1d' file2
} > file.csv
As one line
{ cat file1; sed '1d' file2; } > file.csv
The semicolon before the ending brace is required.
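A subshell, as used in other answers here, works just as well and doesn't need the trailing semicolon:
( cat file1; sed '1d' file2 ) > file.csv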
{ cat file1; tail -n +2 file2; } > out
Print the first line of the first file, then lines 2 through the end of every file:
awk 'NR==1||FNR>1' file1 file2 (file3 file4 ..) > outfile

How to save both matching and non-matching from grep

I use grep very often and am familiar with its ability to return matching lines (by default) and non-matching lines (using the -v parameter). However, I want to be able to grep a file once and separate matching and non-matching lines.
If this is not possible, please let me know. I realize I could do this easily in perl or awk, but am curious if it is possible with grep.
Thanks!
If it does NOT have to be grep: this is a single-pass split based on a pattern (pattern found > file1, pattern not found > file2):
awk '/pattern/ {print $0 > "file1"; next}{print $0 > "file2"}' inputfile
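For comparison, sticking with grep itself means two passes, i.e. the input is read twice:
grep 'pattern' inputfile > file1
grep -v 'pattern' inputfile > file2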
I had the exact same problem and I wrote a small Perl script for that [1]. It only accepts one argument: the regex to grep input on.
[1] https://gist.github.com/tonejito/c9c0bffd75d8c81483f9107c609439e1
It reads STDIN by line and checks against the given regex, matched lines go to STDOUT and not matched go to STDERR.
I made it this way because this tool sits in the middle of a pipeline and I use shell redirection to save the files on their final location.
Step 1 : Read the file
Step 2 : Replace spaces with a new line and save the result in a temporary file
Step 3 : Get only the lines containing '_' from the temporary file and save them into multiwords.txt
Step 4 : Exclude the lines that contain '_' from the temporary file, then save the result into singlewords.txt
Step 5 : Delete the temporary file
cat file | tr ' ' '\n' > tmp.txt; grep '_' tmp.txt > multiwords.txt; grep -v '_' tmp.txt > singlewords.txt; rm tmp.txt
