Unix - Find patterns in a file, copy them into another file

I have spent some time considering how to tackle this, but I'm not sure how, and my experience with Unix is fairly limited so far.
I have a text file, let's call it "Text.txt", which contains lots of information. Let's say it contains:
SomethingA: aValue
SomethingB: bValue
SomethingC: cValue
SomethingD: dValue
SomethingD: anotherDValueThisTime
SomethingA: aValueToIgnore
I want to search through "Text.txt", and find some values, then put the values in a new file, output.txt.
This gets a little trickier, though, as what I am trying to do is get the first SomethingA value, and then every SomethingD value which occurs.
So the output in "output.txt" should be:
aValue
dValue
anotherDValueThisTime
The second "SomethingA" value should be ignored, as it is not the first "SomethingA" value.
I imagine the logic to be something like:
Find SomethingA > output.txt
Find ALL SomethingD's >> output.txt
But I just can't quite get it.
Any help is much appreciated!

awk is ideal for this:
awk '/^SomethingA/ && !a++ || /^SomethingD/ { print $2 }' FS=': ' Text.txt > output.txt
This is a little sloppy, but you can be more precise with:
awk '$1 == "SomethingA" && !a++ || $1 == "SomethingD" { print $2 }' FS=': ' Text.txt > output.txt
Unfortunately, that requires a fixed string for the keys. If you want a regex, you can do:
awk 'match($1, "pattern") && ...
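For instance, a complete regex-based variant might look like this (just a sketch; the anchored patterns here simply mirror the literal keys from the example and would be swapped for whatever regexes you need):
awk 'match($1, /^SomethingA$/) && !a++ || match($1, /^SomethingD$/) { print $2 }' FS=': ' Text.txt > output.txt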

grep -m 1 SomethingA inputfile.txt >outputfile.txt
grep SomethingD inputfile.txt >>outputfile.txt
grep's -m option sets the maximum number of matches you want to get.
>> appends to a file rather than overwriting it like > does.
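Note that grep prints whole matching lines ("SomethingA: aValue"), not just the values. If you only want the values, one option (a sketch, assuming the "Key: value" layout from the question) is to strip the key with cut:
grep -m 1 '^SomethingA' Text.txt | cut -d' ' -f2 > output.txt
grep '^SomethingD' Text.txt | cut -d' ' -f2 >> output.txt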

I think what you need is the grep command with the -o option. It works like this:
grep -o -e 'pattern1' -e 'pattern2' /tmp/1 >> /tmp/2

Related

Unix command to replace first column of a .csv file

I want a Unix command (that I will call in a ControlM job) that changes the value of the first column of my .csv file (not the header line) to the date of the previous day (expected format: YYYY-MM-DD).
I tried many commands but none of them does what I want:
tmp=$(mktemp) && awk -F\| -v val=`date -d yesterday +%F` 'NR>1 {gsub($1,val)}' file.csv > "$tmp" && mv "$tmp" file.csv
or:
awk -F\| -v val=`date -d yesterday +%F` '{gsub($1, val)}1' file.csv
I even tried gensub, but it did not work.
Example of what I want:
Input :
VALUE_DATE;TRADE_DATE;DESCR1;DESCR2
2019-03-05;2017-11-15;BRIDGE;HELLO
2019-03-05;2018-03-17;WORK;DATA
Output I want (as today is 2019-03-07):
VALUE_DATE;TRADE_DATE;DESCR1;DESCR2
2019-03-06;2017-11-15;BRIDGE;HELLO
2019-03-06;2018-03-17;WORK;DATA
Can you help, please, and give me examples of commands that should work? I'm not finding a solution.
Thanks a lot
Could you please try the following first? (It does not save the output into file.csv itself; it prints to the terminal. Once you are happy with the result, use the command provided at the end of this post.)
awk -v val=$(date -d yesterday +%F) 'BEGIN{FS=OFS=";"}FNR>1{$1=val} 1' file.csv
Problems identified in the OP's code (and fixed in my suggestion):
1- The use of backticks to capture a command's output into a shell variable is deprecated now, so use val=$(date ...) instead when declaring awk's variable named val.
2- With -F\| you set your field separator to | (pipe), but looking carefully at your provided sample Input_file, it is delimited with ; (semicolon), NOT |, so that is also one of the reasons the change is not reflected in the output.
3- gsub($1, val) replaces the whole line with only the value of the variable val, because the syntax of gsub is gsub(regex_or_value_to_be_replaced, new_value, optional_target). Since you defined the wrong field separator, the whole line is treated as $1, so when you print it by doing awk -F\| -v val=$(date -d yesterday +%F) 'NR>1 {gsub($1,val)} 1' file.csv it prints only the previous dates. (See the short gsub illustration after this list.)
4- The fourth and main issue: you have NOT printed anything, so whatever other mistakes you made, you will NOT see any output, either on the terminal or in an output file.
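As a quick illustration of gsub's behavior for point 3 (a hypothetical one-liner, not part of the OP's code):
echo 'old;old;keep' | awk '{gsub(/old/,"new")} 1'
# prints: new;new;keep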
If you are happy with it, you could then run the following to make the changes in the Input_file itself. (I am assuming that your tmp variable holds a proper value here.)
tmp=$(mktemp) && awk -v val=$(date -d yesterday +%F) 'BEGIN{FS=OFS=";"}FNR>1{$1=val} 1' file.csv > "$tmp" && mv "$tmp" file.csv
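As a sanity check, running that awk against the sample input from the question should reproduce the desired output (assuming GNU date for -d yesterday, run on 2019-03-07):
printf '%s\n' 'VALUE_DATE;TRADE_DATE;DESCR1;DESCR2' \
'2019-03-05;2017-11-15;BRIDGE;HELLO' \
'2019-03-05;2018-03-17;WORK;DATA' > file.csv
awk -v val="$(date -d yesterday +%F)" 'BEGIN{FS=OFS=";"}FNR>1{$1=val} 1' file.csv
# VALUE_DATE;TRADE_DATE;DESCR1;DESCR2
# 2019-03-06;2017-11-15;BRIDGE;HELLO
# 2019-03-06;2018-03-17;WORK;DATA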

Check if a string in one file exists in another in unix

I have a file that contains the version name and version number. The contents of the first file looks as-
File1-
<Line contains the name of product1>
package_name0_9_8 >= 1.2.3x-4.5.6
package_name0_9_8-32bit >= 3.6.1g-3.5.1
package_name0_9_8-xx >= 6.3.2v-3.0.4
<Line contains the name of product2>
anotherpackage_name0_9_8 >= 3.5.6u-3.6.5
And,
File2.xml-
<package name="package_name0_9_8" version="1.2.3x-4.4.4"/>
<package name="package_name0_9_8-32bit" version="3.6.1g-3.4.0"/>
.
.
Is there a way to check whether each package_name present in File1 exists in File2, and whether the corresponding version of that package_name in File1 matches the version in File2?
Frankly, I am pretty weak at combining the 'grep' and 'awk' commands and the options to be used here. Please help out.
for a in $(sed -n '/>=/p' File1.txt | grep -o '^[^ ]*'); do for b in $(sed -n "/^$a /{s/.*>=\(.*\)$/\1/p}" File1.txt); do ((! $(grep -c "$a.*$b" File2.txt))) && (echo "$a $b" >> missing_pkgs.txt); done; done;
this is a quick one-liner - you could format its output a bit more nicely
the way this works: the nested for loop grabs both pieces into separate variables (you could do that with read and put them in one loop if you want), then simply counts the occurrences in the second file with grep; whenever the count is zero, the ! reverses the value, making the (( )) test true and echoing the missing package to the file missing_pkgs.txt
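A minimal illustration of that (( )) arithmetic test on its own (hypothetical counts):
count=0; ((! count)) && echo "zero matches, so the package is reported missing"
count=3; ((! count)) || echo "non-zero matches, so the package is skipped"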
here is another quick one-liner that does the same thing, except more efficiently, with one loop and the variables loaded via read
while read each; do read a b < <(echo $each) && ((! $(grep -c "$a.*$b" File2.txt))) && (echo "$a $b" >> missing_pkgs.txt); done < <(awk '/>=/{ print $1" "$3 }' File1.txt)
more simplified:
while read a b; do ((! $(grep -c "$a.*$b" File2.txt))) && (echo "$a $b" >> missing_pkgs.txt); done < <(awk '/>=/{ print $1" "$3 }' File1.txt)
sed -n 's².*²s#<package name="\(&"/>#\1 Present#p²;s/ *>= */\)" *version="/p' File1 > /tmp/File1.sed
sed -n -f /tmp/File1.sed File2
rm /tmp/File1.sed
It is not done in one instruction as awk could do it, but it does the job (this is the POSIX version, so use --posix on GNU sed).
You could change the output message, i.e. the \1 Present text, where \1 will be the package name (with a few modifications, the version could also be used).
It looks like you already got a much shorter solution in a format closer to what you desired. However, since I asked whether a Python solution would work and you said yes, check out the code here:
http://pastebin.com/F5LYrmea
(I haven't debugged it more than a little, but it seems to work on at least a bit more than your example files. I released the code into the public domain. CC-BY-SA isn't a software license, according to the makers of CC, which is why I didn't post the code here; posting it here would give it that license. Plus, you get Python-specific syntax highlighting at the link provided.)
Basically, it's a lot of complicated text parsing; there isn't much of an algorithm to explain. It reads the contents of both files, strips out the packages, their versions and the operands (putting them all in a dictionary for later use), then loops through the lines of the other file and compares versions; finally it tells you which ones match and which ones don't.

awk getline not accepting external variable from a file

I have a file test.sh from which I am executing the following awk command.
awk -f x.awk < result/output.txt >>difference.txt
x.awk
while (getline < result/$bld/$DeviceType)
The variables DeviceType and bld are available in test.sh.
I have declared them as exported:
export DeviceType=$line
Even then, while executing the test.sh file, the script stops at the following line
awk -f x.awk < result/output.txt >>difference.txt
and I am getting the following error:
awk: x.awk:4: (FILENAME=- FNR=116) fatal: division by zero attempted
The awk script is read by awk, not touched by the shell. Inside an awk script, $bld means 'the field designated by the number in the variable bld' (that's the awk variable bld). So result/$bld/$DeviceType is parsed as two divisions, and since those variables are unset (hence zero) in awk, you get the 'division by zero attempted' error.
You can set awk variables on the command line (officially with the -v option):
awk -v bld="$bld" -v dev="$DeviceType" -f x.awk < result/output.txt >> difference.txt
Whether that does what you want is still debatable. Most likely you need x.awk to contain something like:
BEGIN { file = sprintf("result/%s/%s", bld, dev); }
{ while ((getline < file) > 0) print }
awk is not shell, just like C is not shell. You should not expect to be able to access shell variables within an awk program any more than you can access shell variables within a C program.
To pass the VALUE of shell variables to an awk script, see http://cfajohnson.com/shell/cus-faq-2.html#Q24 for details but essentially:
awk -v awkvar="$shellvar" '{ ... use awkvar ...}'
is usually the right approach.
Having said that, whatever you're trying to do, it looks like the wrong approach. If you are considering using getline, make sure to read http://awk.freeshell.org/AllAboutGetline first and understand all of its caveats; but if you tell us what you're trying to do, with sample input and expected output, we can almost certainly help you come up with a better approach that has nothing to do with getline.

Unix - Need to cut a file which has multiple blanks as delimiter - awk or cut?

I need to get the records from a text file in Unix. The delimiter is multiple blanks. For example:
2U2133   1239
1290fsdsf   3234
From this, I need to extract
1239
3234
The delimiter for all records will always be 3 blanks.
I need to do this in a Unix script (.scr) and write the output to another file, or use it as input to a do-while loop. I tried the below:
while read readline
do
read_int=`echo "$readline"`
cnt_exc=`grep "$read_int" ${Directory path}/file1.txt| wc -l`
if [ $cnt_exc -gt 0 ]
then
int_1=0
else
int_2=0
fi
done < awk -F' ' '{ print $2 }' ${Directoty path}/test_file.txt
test_file.txt is the input file and file1.txt is a lookup file. But the above approach is not working and gives me syntax errors near awk -F.
I tried writing the output to a file. The following worked on the command line:
more test_file.txt | awk -F' ' '{ print $2 }' > output.txt
This works and writes the records to output.txt when run on the command line. But the same command does not work in the Unix script (it is a .scr file).
Please let me know where I am going wrong and how I can resolve this.
Thanks,
Visakh
The job of replacing multiple delimiters with just one is left to tr:
cat <file_name> | tr -s ' ' | cut -d ' ' -f 2
tr translates or deletes characters, and is perfectly suited to preparing your data for cut to work properly.
The manual states:
-s, --squeeze-repeats
replace each sequence of a repeated character that is
listed in the last specified SET, with a single occurrence
of that character
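For instance, with the sample lines from the question (three blanks between fields):
printf '2U2133   1239\n1290fsdsf   3234\n' | tr -s ' ' | cut -d ' ' -f 2
# 1239
# 3234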
It depends on the version or implementation of cut on your machine. Some versions support an option, usually -i, that means 'ignore blank fields' or, equivalently, allow multiple separators between fields. If that's supported, use:
cut -i -d' ' -f 2 data.file
If not (and it is not universal — and maybe not even widespread, since neither GNU nor MacOS X have the option), then using awk is better and more portable.
You need to pipe the output of awk into your loop, though:
awk -F' ' '{print $2}' ${Directory_path}/test_file.txt |
while read readline
do
read_int=`echo "$readline"`
cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt| wc -l`
if [ $cnt_exc -gt 0 ]
then int_1=0
else int_2=0
fi
done
The only residual issue is whether the while loop runs in a sub-shell and is therefore not modifying your main shell's variables, just its own copy of those variables.
With bash, you can use process substitution:
while read readline
do
read_int=`echo "$readline"`
cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt| wc -l`
if [ $cnt_exc -gt 0 ]
then int_1=0
else int_2=0
fi
done < <(awk -F' ' '{print $2}' ${Directory_path}/test_file.txt)
This leaves the while loop in the current shell, but arranges for the output of the command to appear as if from a file.
The blank in ${Directory path} is not normally legal — unless it is another Bash feature I've missed out on; you also had a typo (Directoty) in one place.
Other ways of doing the same thing aside, the error in your program is this: You cannot redirect from (<) the output of another program. Turn your script around and use a pipe like this:
awk -F' ' '{ print $2 }' ${Directory path}/test_file.txt | while read readline
etc.
Besides, the use of "readline" as a variable name may or may not get you into trouble.
In this particular case, you can use the following line
sed 's/  */\t/g' <file_name> | cut -f 2
to get your second column (the pattern matches a run of one or more blanks, so the three delimiter blanks collapse into a single tab; note that \t is a GNU sed extension).
In bash you can start from something like this:
for n in $(tr -s ' ' < "${Directory_path}/test_file.txt" | cut -d ' ' -f 2)
do
grep -c "$n" ${Directory_path}/file*.txt
done
This should have been a comment, but since I cannot comment yet, I am adding this here.
This is from an excellent answer here: https://stackoverflow.com/a/4483833/3138875
tr -s ' ' <text.txt | cut -d ' ' -f4
tr -s '<character>' squeezes multiple repeated instances of <character> into one.
It's not working in the script because of the typo ("Directoty" instead of "Directory") in the last line of your script.
Cut isn't flexible enough. I usually use Perl for that:
cat file.txt | perl -F'   ' -ane 'print $F[1]."\n"'
Instead of the triple space after -F you can put any Perl regular expression. You access fields as $F[n], where n is the field number (counting starts at zero). This way there is no need for sed or tr.
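For instance, with the question's data, splitting on runs of whitespace instead (a sketch; -ane is spelled out because on older Perls -F alone does not imply -a and -n):
printf '2U2133   1239\n' | perl -F'\s+' -ane 'print $F[1]."\n"'
# 1239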

Interpret as fixed string/literal and not regex using sed

For grep there's a fixed string option, -F (fgrep) to turn off regex interpretation of the search string.
Is there a similar facility for sed? I couldn't find anything in the man page. A recommendation of another GNU/Linux tool would also be fine.
I'm using sed for the find and replace functionality: sed -i "s/abc/def/g"
Do you have to use sed? If you're writing a bash script, you can do
#!/bin/bash
pattern='abc'
replace='def'
file=/path/to/file
tmpfile="${TMPDIR:-/tmp}/$( basename "$file" ).$$"
while read -r line
do
echo "${line//$pattern/$replace}"
done < "$file" > "$tmpfile" && mv "$tmpfile" "$file"
With an older Bourne shell (such as ksh88 or POSIX sh), you may not have that cool ${var/pattern/replace} structure, but you do have ${var#pattern} and ${var%pattern}, which can be used to split the string up and then reassemble it. If you need to do that, you're in for a lot more code - but it's really not too bad.
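For example, a minimal sketch of replacing the first occurrence per line using only those operators (POSIX sh; same pattern/replace variables as above; loop if you need every occurrence on the line replaced):
case $line in
*"$pattern"*) line=${line%%"$pattern"*}$replace${line#*"$pattern"} ;;
esac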
If you're not in a shell script already, you could pretty easily make the pattern, replace, and filename parameters and just call this. :)
PS: The ${TMPDIR:-/tmp} structure uses $TMPDIR if that's set in your environment, or uses /tmp if the variable isn't set. I like to stick the PID of the current process on the end of the filename in the hopes that it'll be slightly more unique. You should probably use mktemp or similar in the "real world", but this is ok for a quick example, and the mktemp binary isn't always available.
Option 1) Escape regexp characters. E.g. sed 's/\$0\.0/0/g' will replace all occurrences of $0.0 with 0.
Option 2) Use perl -p -e in conjunction with quotemeta. E.g. perl -p -e 's/\./,/gi' will replace all occurrences of . with ,.
You can use option 2 in scripts like this:
SEARCH="C++"
REPLACE="C#"
cat $FILELIST | perl -p -e "s/\\Q$SEARCH\\E/$REPLACE/g" > $NEWLIST
If you're not opposed to Ruby or long lines, you could use this:
alias replace='ruby -e "File.write(ARGV[0], File.read(ARGV[0]).gsub(ARGV[1]) { ARGV[2] })"'
replace test3.txt abc def
This loads the whole file into memory, performs the replacements and saves it back to disk. Should probably not be used for massive files.
If you don't want to escape your string, you can reach your goal in 2 steps:
fgrep the line (getting the line number) you want to replace, and
afterwards use sed for replacing this line.
E.g.
#!/bin/sh
FILE='/path/to/file' # assumed: the file being edited
PATTERN='foo*[)*abc' # we need it literal
LINENUMBER="$( fgrep -n "$PATTERN" "$FILE" | cut -d':' -f1 )"
NEWSTRING='my new string'
sed -i "${LINENUMBER}s/.*/$NEWSTRING/" "$FILE"
You can do this in two lines of bash code if you're OK with reading the whole file into memory (here pat holds the fixed-string pattern and rep the replacement). This is quite flexible -- the pattern and replacement can contain newlines to match across lines if needed. It also preserves any trailing newline or lack thereof, which a simple loop with read does not.
mapfile -d '' < file
printf '%s' "${MAPFILE//"$pat"/"$rep"}" > file
For completeness, if the file can contain null bytes (\0), we need to extend the above, and it becomes
mapfile -d '' < <(cat file; printf '\0')
last=${MAPFILE[-1]}; unset "MAPFILE[-1]"
printf '%s\0' "${MAPFILE[@]//"$pat"/"$rep"}" > file
printf '%s' "${last//"$pat"/"$rep"}" >> file
perl -i.orig -pse 'while (($i = index($_,$s)) >= 0) { substr($_,$i,length($s), $r)}' -- \
-s='$_REQUEST['\'old\'']' -r='$_REQUEST['\'new\'']' sample.txt
-i.orig in-place modification with backup.
-p print lines from the input file by default
-s enable rudimentary parsing of command line arguments
-e run this script
index($_,$s) search for the $s string
substr($_,$i,length($s), $r) replace the string
while (($i = index($_,$s)) >= 0) repeat until no further occurrence of $s is found
-- end of perl parameters
-s='$_REQUEST['\'old\'']', -r='$_REQUEST['\'new\'']' - set $s,$r
You still need to "escape" ' characters, but the rest should be straightforward.
Note: this started as an answer to How to pass special character string to sed hence the $_REQUEST['old'] strings, however this question is a bit more appropriately formulated.
You should be using replace instead of sed.
From the man page:
The replace utility program changes strings in place in files or on the
standard input.
Invoke replace in one of the following ways:
shell> replace from to [from to] ... -- file_name [file_name] ...
shell> replace from to [from to] ... < file_name
from represents a string to look for and to represents its replacement.
There can be one or more pairs of strings.
