Change FS in AWK for multiple files

I'm trying to read multiple files in an AWK script, but when I switch between files, the field separator (FS) needs to change as well. At this point I have:
FILENAME=="A.txt"{
FS=";"
# DoSomething
}
FILENAME=="B.txt"{
FS=" - "
# DoSomething
}
But as you might know, the FS will not get set correctly for the first line of the file. How can I solve this?

You can specify the field separators at the command line:
awk -f a.awk FS=";" A.txt FS=" - " B.txt
In this way, the field separator will change for each file.
From http://www.delorie.com/gnu/docs/gawk/gawk_82.html :
Any awk variable can be set by including a variable assignment among
the arguments on the command line when awk is invoked
and
With it, a variable is set either at the beginning of the awk run or
in between input files.

You can do it as @HakonHaegland suggests, by setting FS between file names in the argument list, if you are listing the files individually. That is the typical way to do this.
Alternatively, if you can't do that (e.g. because you need to use * or similar for the file list), you can use BEGINFILE if you are using GNU awk. Otherwise, you can keep your current approach and assign $0 to itself after changing FS, which forces awk to re-split the record. For example:
$ cat file
a-b-c
d e f
$ awk '{print NF, $1}' file
1 a-b-c
3 d
$ awk '{FS="-"; $0=$0; print NF, $1}' file
3 a
1 d e f
If you are going to do it that way it's best done just once at the start of each file (when FNR==1).
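A minimal sketch of the FNR==1 approach, assuming the two file names from the question and made-up one-line sample contents (the temporary directory and the sample data are illustrative only):

```shell
# Hypothetical sample inputs in a scratch directory
dir=$(mktemp -d)
printf 'a;b;c\n' > "$dir/A.txt"
printf 'x - y - z\n' > "$dir/B.txt"

# On the first record of each file, pick the FS for that file and
# re-split the record with $0=$0. Later records use the new FS as usual.
out=$(awk '
FNR == 1 {
    if (FILENAME ~ /A\.txt$/) FS = ";"
    else if (FILENAME ~ /B\.txt$/) FS = " - "
    $0 = $0
}
{ print NF, $2 }
' "$dir/A.txt" "$dir/B.txt")
printf '%s\n' "$out"
```

With these inputs, each file should report three fields, with b and y as the second field.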

Linux - Get Substring from 1st occurence of character

FILE1.TXT
0020220101
or
01 20220101
Need to extract the date part from the file, where the text starts from 2
Options tried:
t_FILE_DT1='awk -F"2" '{PRINT $NF}' FILE1.TXT'
t_FILE_DT2='cut -d'2' -f2- FILE1.TXT'
echo "$t_FILE_DT1"
echo "$t_FILE_DT2"
1st output : 0101
2nd output : 0220101
Expected Output: 20220101
I'm new to Linux scripting. Could someone help point out where I'm going wrong?

Use grep like so:
printf '0020220101\n01 20220101\n' | grep -P -o '\d{8}\b'
20220101
20220101
Here, GNU grep uses the following options:
-P : Use Perl regexes.
-o : Print the matches only (1 match per line), not the entire lines.
SEE ALSO:
grep manual
perlre - Perl regular expressions

Using any awk:
$ awk '{print substr($0,length()-7)}' file
20220101
20220101
The above was run on this input file:
$ cat file
0020220101
01 20220101
Regarding PRINT $NF in your question - PRINT != print. Get out of the habit of using all-caps unless you're writing Cobol. See correct-bash-and-shell-script-variable-capitalization for some reasons.
The 2 in your scripts is telling awk and cut to use the character 2 as the field separator, so each will carve up the input into substrings everywhere a 2 occurs.
The 's in your question are single quotes, which make strings literal. You were intending to use backticks, `cmd`, but those are deprecated in favor of $(cmd) anyway.
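As a rough sketch, the variable assignment from the question could be written with $(cmd) like this. The variable and file names come from the question; the temporary directory is only there to make the example self-contained:

```shell
# Recreate the sample input from the question in a scratch directory
dir=$(mktemp -d)
printf '0020220101\n01 20220101\n' > "$dir/FILE1.TXT"

# $(...) captures the command output; the awk program prints the last 8 characters
t_FILE_DT1=$(awk '{ print substr($0, length($0) - 7) }' "$dir/FILE1.TXT")
echo "$t_FILE_DT1"
```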

Instead of looking for what comes "after" the 2 (and having to worry about whether a space is involved as well),
think about extracting the last 8 characters, which you know for a fact are your date:
input="/path/to/txt/file/FILE1.TXT"
while IFS= read -r line
do
# read in the last 8 characters of $line .. You KNOW this is the date ..
# No need to worry about exact matching at that point, or spaces ..
myDate=${line: -8}
echo "$myDate"
done < "$input"

About the cut and awk commands that you tried:
Using awk -F"2" '{PRINT $NF}' file will set the field separator to 2, and $NF is the last field, so printing the value of the last field is 0101
Using cut -d'2' -f2- file uses a delimiter of 2 as well, and then prints all fields starting at the second field, which is 0220101
If you want to match the 2 followed by 7 digits until the end of the string:
awk '
match ($0, /2[0-9]{7}$/) {
print substr($0, RSTART, RLENGTH)
}
' file
Output
20220101

The accepted answer shows how to extract the first eight digits, but that's not what you asked.
grep -o '2.*' file
will extract from the first occurrence of 2, and
grep -o '2[0-9]*' file
will extract all the digits after every occurrence of 2. If you specifically want eight digits, try
grep -Eo '2[0-9]{7}'
maybe also with a -w option if you want to only accept a match between two word boundaries. If you specifically want only digits after the first occurrence of 2, maybe try
sed -n 's/[^2]*\(2[0-9]*\).*/\1/p' file

Awk command to perform action on lines excluding 1st and last

I have multiple MS excel files in csv format in a particular directory.
I want to update the value of one particular column in all the rows of the csv files.
Also, the action should not be operated on 1st and last line.
So far I have come up with below code for one row:
awk -F, 'NR>2{$2=300;}1' OFS=, test.csv
But i am facing difficulty in excluding the last line.
Also, i need to perform the same for all the files in the directory.
So far tried the below but not able to succeed to replace that string value using awk.
1)
2)

This may do:
awk -F, 't{print t} {a=t=$0} NR>1{$2=300;t=$0} END {print a}' OFS=, test.csv

$ cat file
1,a,b
2,c,d
3,e,f
$ awk 'BEGIN{FS=OFS=","} NR>1{print (NR>2 ? chgd : orig)} {orig=$0; $2=300; chgd=$0} END{print orig}' file
1,a,b
2,300,d
3,e,f

You could simplify the script a bit by reading the file twice:
awk 'BEGIN{FS=OFS=","} NR==FNR {c=NR;next} !(FNR==1||FNR==c){$2=200} 1' file file
This uses the NR==FNR section merely to count lines, giving you a simple expression for determining whether to update the field in question.
And if you have GNU awk available, you might save a few CPU cycles by not reassigning the c variable for every line, using something like this:
gawk 'BEGIN{FS=OFS=","} ENDFILE {c=FNR} NR==FNR{next} !(FNR==1||FNR==c){$2=200} 1' file file
This still reads the file twice, but assigns c only after each file is read.
If you want, you can emulate the ENDFILE condition in non-GNU awk using NR>FNR && FNR==1 if you only have two files, then set c=NR-1. It won't perform as well.
I haven't tested the speed difference between these two, but I suspect it would be negligible except in cases of truly obscenely large files.
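The non-GNU emulation described above might look like the following sketch. It assumes the same file is passed exactly twice, and the sample data and temporary directory are illustrative:

```shell
dir=$(mktemp -d)
printf '1,a,b\n2,c,d\n3,e,f\n' > "$dir/file"

out=$(awk 'BEGIN{FS=OFS=","}
NR>FNR && FNR==1 {c=NR-1}   # first record of the second pass: the first pass had NR-1 lines
NR==FNR {next}              # first pass: only count, print nothing
!(FNR==1||FNR==c){$2=200}   # second pass: change all but the first and last line
1' "$dir/file" "$dir/file")
printf '%s\n' "$out"
```

Only the middle line should end up with 200 in its second column.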

Thanks all,
I got to make it work. Below is the command:
awk -v sq="" -F, 't{print t} {a=t=$0} NR>2{$3=sq"ops_data"sq;t=$0} END {print a}' OFS=, test1.csv

Median Calculation in Unix

I need to calculate median value for the below input file. It is working fine for odd occurrences but not for even occurrences. Below is the input file and the script used. Could you please check what is wrong with this command and correct the same.
Input file:
col1,col2
AR,2.52
AR,3.57
AR,1.29
AR,6.66
AR,3.05
AR,5.52
Desired Output:
AR,3.31
Unix command:
cat test.txt | sort -t"," -k2n,2 | awk '{arr[NR]=$1} END { if (NR%2==1) print arr[(NR+1)/2]; else print (arr[NR/2]+arr[NR/2+1])/2}'

Don't forget that your input file has an additional line, containing the header. You need to take an additional step in your awk script to skip the first line.
Also, because you're using the default field separator, $1 will contain the whole line, so your code (arr[NR/2]+arr[NR/2+1])/2 is never going to work. I would suggest that you change it so that awk splits the input on a comma, then use the second field $2.
sort -t, -k2n,2 file | awk -F, 'NR>1{a[++i]=$2}END{if(i%2==1)print a[(i+1)/2];else print (a[i/2]+a[i/2+1])/2}'
I also removed your useless use of cat. Most tools, including sort and awk, are capable of reading in files directly, so you don't need to use cat with them.
Testing it out:
$ cat file
col1,col2
AR,2.52
AR,3.57
AR,1.29
AR,6.66
AR,3.05
AR,5.52
$ sort -t, -k2n,2 file | awk -F, 'NR>1{a[++i]=$2}END{if(i%2==1)print a[(i+1)/2];else print (a[i/2]+a[i/2+1])/2}'
3.31
It shouldn't be too difficult to modify the script slightly to change the output to whatever you want.
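For instance, one possible modification that prints the desired AR,3.31 is sketched below. It assumes the file contains a single key in column 1, and it drops the header with tail before sorting instead of relying on the header sorting first:

```shell
dir=$(mktemp -d)
printf 'col1,col2\nAR,2.52\nAR,3.57\nAR,1.29\nAR,6.66\nAR,3.05\nAR,5.52\n' > "$dir/file"

out=$(tail -n +2 "$dir/file" | sort -t, -k2,2n | awk -F, '
{ key = $1; a[++i] = $2 }          # remember the key and collect the sorted values
END {
    if (i == 0) exit               # no data lines
    m = (i % 2) ? a[(i + 1) / 2] : (a[i / 2] + a[i / 2 + 1]) / 2
    printf "%s,%.2f\n", key, m
}')
printf '%s\n' "$out"
```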

How to save both matching and non-matching from grep

I use grep very often and am familiar with its ability to return matching lines (by default) and non-matching lines (using the -v parameter). However, I want to be able to grep a file once to separate matching and non-matching lines.
If this is not possible, please let me know. I realize I could do this easily in perl or awk, but am curious if it is possible with grep.
Thanks!

If it does NOT have to be grep - this is a single pass split based on a pattern -- pattern found > file1 pattern not found > file2
awk '/pattern/ {print $0 > "file1"; next}{print $0 > "file2"}' inputfile

I had the exact same problem and I wrote a small Perl script for that [1]. It only accepts one argument: the regex to grep input on.
[1] https://gist.github.com/tonejito/c9c0bffd75d8c81483f9107c609439e1
It reads STDIN by line and checks against the given regex, matched lines go to STDOUT and not matched go to STDERR.
I made it this way because this tool sits in the middle of a pipeline and I use shell redirection to save the files on their final location.
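The same stdout/stderr idea can be sketched in awk instead of Perl. This is not the linked script; split_lines is a hypothetical helper name, and note that awk processes backslash escapes in the value passed with -v:

```shell
# Matching lines go to stdout, non-matching lines go to stderr.
split_lines() {
    # $1: extended regular expression to test each input line against
    awk -v re="$1" '$0 ~ re { print; next } { print > "/dev/stderr" }'
}

dir=$(mktemp -d)
printf 'foo_bar\nbaz\nqux_1\n' | split_lines '_' > "$dir/matched" 2> "$dir/unmatched"
```

Shell redirection then decides where each stream is saved, as in the pipeline use case described above.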

Step 1 : Read the file
Step 2 : Replace spaces with a new line and save the result in a temporary file
Step 3 : Get only lines contains '_' from the temporary file and save it into multiwords.txt
Step 4 : Exclude the lines that contain '_' from the temporary file, then save the result into singlewords.txt
Step 5 : Delete the temporary file
tr ' ' '\n' < file > tmp.txt; grep '_' tmp.txt > multiwords.txt; grep -v '_' tmp.txt > singlewords.txt; rm -f tmp.txt

Unix command to prepend text to a file

Is there a Unix command to prepend some string data to a text file?
Something like:
prepend "to be prepended" text.txt

printf '%s\n%s\n' "to be prepended" "$(cat text.txt)" >text.txt

sed -i.old '1s;^;to be prepended;' inFile
-i writes the change in place and takes a backup if an extension is given (in this case, .old).
1s;^;to be prepended; substitutes the beginning of the first line by the given replacement string, using ; as a command delimiter.

Process Substitution
I'm surprised no one mentioned this.
cat <(echo "before") text.txt > newfile.txt
which is arguably more natural than the accepted answer (printing something and piping it into a substitution command is lexicographically counter-intuitive).
...and hijacking what ryan said above, with sponge you don't need a temporary file:
sudo apt-get install moreutils
<<(echo "to be prepended") < text.txt | sponge text.txt
EDIT: Looks like this doesn't work in Bourne Shell /bin/sh
Here String (zsh only)
Using a here-string - <<<, you can do:
<<< "to be prepended" < text.txt | sponge text.txt

This is one possibility:
(echo "to be prepended"; cat text.txt) > newfile.txt
You probably won't easily get around an intermediate file.
Alternatives (can be cumbersome with shell escaping):
sed -i '0,/^/s//to be prepended/' text.txt

If it's acceptable to replace the input file:
Note:
Doing so may have unexpected side effects, notably potentially replacing a symlink with a regular file, ending up with different permissions on the file, and changing the file's creation (birth) date.
sed -i, as in Prince John Wesley's answer, tries to at least restore the original permissions, but the other limitations apply as well.
Here's a simple alternative that uses a temporary file (it avoids reading the whole input file into memory the way that shime's solution does):
{ printf 'to be prepended'; cat text.txt; } > tmp.txt && mv tmp.txt text.txt
Using a group command ({ ...; ...; }) is slightly more efficient than using a subshell ((...; ...)), as in 0xC0000022L's solution.
The advantages are:
It's easy to control whether the new text should be directly prepended to the first line or whether it should be inserted as new line(s) (simply append \n to the printf argument).
Unlike the sed solution, it works if the input file is empty (0 bytes).
The sed solution can be simplified if the intent is to prepend one or more whole lines to the existing content (assuming the input file is non-empty):
sed's i function inserts whole lines:
With GNU sed:
# Prepends 'to be prepended' *followed by a newline*, i.e. inserts a new line.
# To prepend multiple lines, use '\n' as part of the text.
# -i.old creates a backup of the input file with extension '.old'
sed -i.old '1 i\to be prepended' inFile
A portable variant that also works with macOS / BSD sed:
# Prepends 'to be prepended' *followed by a newline*
# To prepend multiple lines, escape the ends of intermediate
# lines with '\'
sed -i.old -e '1 i\
to be prepended' inFile
Note that the literal newline after the \ is required.
If the input file must be edited in place (preserving its inode with all its attributes):
Using the venerable ed POSIX utility:
Note:
ed invariably reads the input file as a whole into memory first.
To prepend directly to the first line (as with sed, this won't work if the input file is completely empty (0 bytes)):
ed -s text.txt <<EOF
1 s/^/to be prepended/
w
EOF
-s suppresses ed's status messages.
Note how the commands are provided to ed as a multi-line here-document (<<EOF\n...\nEOF), i.e., via stdin; by default string expansion is performed in such documents (shell variables are interpolated); quote the opening delimiter to suppress that (e.g., <<'EOF').
1 makes the 1st line the current line
function s performs a regex-based string substitution on the current line, as in sed; you may include literal newlines in the substitution text, but they must be \-escaped.
w writes the result back to the input file (for testing, replace w with ,p to only print the result, without modifying the input file).
To prepend one or more whole lines:
As with sed, the i function invariably adds a trailing newline to the text to be inserted.
ed -s text.txt <<EOF
0 i
line 1
line 2
.
w
EOF
0 i makes 0 (the beginning of the file) the current line and starts insert mode (i); note that line numbers are otherwise 1-based.
The following lines are the text to insert before the current line, terminated with . on its own line.

This will work to form the output. The - means standard input, which is provided via the pipe from echo.
echo -e "to be prepended \n another line" | cat - text.txt
To rewrite the file, a temporary file is required, because you cannot pipe back into the input file.
echo "to be prepended" | cat - text.txt > text.txt.tmp
mv text.txt.tmp text.txt

Prefer Adam's answer
We can make it easier by using sponge. Then we don't need to create a temporary file and rename it:
echo -e "to be prepended \n another line" | cat - text.txt | sponge text.txt

Probably nothing built-in, but you could write your own pretty easily, like this:
#!/bin/bash
echo -n "$1" > /tmp/tmpfile.$$
cat "$2" >> /tmp/tmpfile.$$
mv /tmp/tmpfile.$$ "$2"
Something like that at least...

Editor's note:
This command will result in data loss if the input file happens to be larger than your system's pipeline buffer size, which is typically 64 KB nowadays. See the comments for details.
In some circumstances the text to prepend may be available only from stdin.
In that case this combination should work.
echo "to be prepended" | cat - text.txt | tee text.txt
If you want to omit tee output, then append > /dev/null.

Another way using sed:
sed -i.old '1 {i to be prepended
}' inFile
If the line to be prepended is multiline:
sed -i.old '1 {i\
to be prepended\
multiline
}' inFile

Solution:
printf '%s\n%s' 'text to prepend' "$(cat file.txt)" > file.txt
Note that this is safe on all kinds of input, because there are no expansions. For example, if you want to prepend !##$%^&*()ugly text\n\t\n, it will just work:
printf '%s\n%s' '!##$%^&*()ugly text\n\t\n' "$(cat file.txt)" > file.txt
The last part left for consideration is whitespace removal at end of file during command substitution "$(cat file.txt)". All work-arounds for this are relatively complex. If you want to preserve newlines at end of file.txt, see this: https://stackoverflow.com/a/22607352/1091436
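One common workaround, which is not necessarily the one used in the linked answer, is to append a sentinel character inside the command substitution and strip it afterwards. A minimal sketch, using a scratch copy of file.txt with made-up contents:

```shell
dir=$(mktemp -d)
printf 'line 1\n\n\n' > "$dir/file.txt"    # sample file ending in two blank lines

# The trailing x means the output no longer ends in newlines, so none are stripped.
content=$(cat "$dir/file.txt"; printf x)
content=${content%x}                        # remove the sentinel again

printf '%s\n%s' 'text to prepend' "$content" > "$dir/file.txt"
```

The file should then keep its trailing blank lines after the prepended line.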

As tested in Bash (in Ubuntu), if starting with a test file created via:
echo "Original Line" > test_file.txt
you can execute:
echo "$(echo "New Line"; cat test_file.txt)" > test_file.txt
or, if the version of bash is too old for $(), you can use backticks:
echo "`echo "New Line"; cat test_file.txt`" > test_file.txt
and receive the following contents of "test_file.txt":
New Line
Original Line
No intermediary file, just bash/echo.

Another fairly straightforward solution is:
$ echo -e "string\n$(cat file)"

% echo blaha > blaha
% echo fizz > fizz
% cat blaha fizz > buzz
% cat buzz
blaha
fizz

You can do that easily with awk
cat text.txt|awk '{print "to be prepended"$0}'
It seems the question is about prepending a string to the file, not to each line of the file. In that case, as suggested by Tom Ekberg, the following command should be used instead.
awk 'BEGIN{print "to be prepended"} {print $0}' text.txt

If you like vi/vim, this may be more your style.
printf '0i\n%s\n.\nwq\n' prepend-text | ed file

For future readers who want to prepend one or more lines of text (with variables or even subshell code) and keep it readable and formatted, you may enjoy this:
echo "Lonely string" > my-file.txt
Then run
cat <<EOF > my-file.txt
Hello, there!
$(cat my-file.txt)
EOF
Results of cat my-file.txt:
Hello, there!
Lonely string
This works because the read of my-file.txt happens first and in a subshell. I use this trick all the time to prepend important rules to config files in Docker containers rather than copying over entire config files.

you can use variables
Even though a bunch of answers here work pretty well, I want to contribute this one-liner, just for completeness. At least it is easy to keep in mind, and it may contribute to some general understanding of bash for some people.
PREPEND="new line 1"; FILE="text.txt"; printf "${PREPEND}\n`cat $FILE`" > $FILE
In this snippet, just replace text.txt with the text file you want to prepend to, and new line 1 with the text to prepend.
example
$ printf "old line 1\nold line 2" > text.txt
$ cat text.txt; echo ""
old line 1
old line 2
$ PREPEND="new line 1"; FILE="text.txt"; printf "${PREPEND}\n`cat $FILE`" > $FILE
$ cat text.txt; echo ""
new line 1
old line 1
old line 2
$

# create a file with content..
echo foo > /tmp/foo
# prepend a line containing "jim" to the file
sed -i "1s/^/jim\n/" /tmp/foo
# verify the content of the file has the new line prepended to it
cat /tmp/foo

I'd recommend defining a function and then importing and using that where needed.
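prepend_to_file() {
file=$1
text=$2
# create the file first if it does not exist yet
if ! [[ -f $file ]]; then
touch "$file"
fi
# write the new text followed by the old content, then replace the original
echo "$text" | cat - "$file" > "$file.new"
mv -f "$file.new" "$file"
}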
Then use it like so:
prepend_to_file test.txt "This is first"
prepend_to_file test.txt "This is second"
Your file contents will then be:
This is second
This is first
I'm about to use this approach for implementing a change log updater.

With ex,
ex - $file << PREPEND
1
i
prepended text
.
wq
PREPEND
The ex commands are
1 Go to the first line of the file
i Begin insert mode
. End insert mode
wq Save (write) and quit
