pattern matching and delete all the lines except the last occurrence - unix

I have a txt file with 100+ lines. I want to search for a pattern and delete all lines except the last occurrence.
Here are the lines from the txt file.
My pattern searches are "string1=", "string2=", "string3=", "string4=" and "string5="
string1=hi
string2=hello
string3=welcome
string3=welcome1
string3=
string4=hi
string5=hello
I want to go through each line and keep only the last occurrence of "string3=" (the empty one) in the file, removing "string3=welcome" and "string3=welcome1".
Please help me.

For a single pattern, you can start with something like this:
grep "string3" input | tail -1

#!/usr/bin/perl
use strict;
use warnings;

my %h;
while (<STDIN>) {
    chomp;
    # Split on the first '=' only, so values containing '=' survive
    my ($k, $v) = split /=/, $_, 2;
    $h{$k} = $v // '';
}
# Later occurrences overwrite earlier ones, so only the last survives
foreach my $k ( sort keys %h ) {
    print "$k=$h{$k}\n";
}
This Perl script takes your list on stdin and produces the output you describe. It assumes you want the keys (string*) in sorted order.
If you only want the lines that start with string1-5, you can put a match at the beginning of the while loop, like so:
next if ! /^string[1-5]=/;
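The same idea also fits in a single awk command; here is a sketch (assuming keys never themselves contain '='): remember the last line seen for each key, then print them in the order the keys first appeared.

```shell
awk -F'=' '!($1 in last) { order[++n] = $1 }   # record first-seen order of each key
           { last[$1] = $0 }                   # later lines overwrite earlier ones
           END { for (i = 1; i <= n; i++) print last[order[i]] }' input
```

Unlike the sorted Perl output, this keeps the keys in their original order of appearance.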


How to get the variable's name from a file using source command in UNIX?

I have a file named param1.txt which contains certain variables. I have another file, source1.txt, which contains placeholders. I want to replace the placeholders with the values of the variables from the parameter file.
So far I have hard-coded the script, with the variable names in param1.txt known beforehand. I want a dynamic solution where the variable names are not known in advance. In other words, is there any way to find out the variable names in a file using the source command in UNIX?
Here is my script and the files.
Script:
#!/bin/bash
source /root/parameters/param1.txt
sed "s/{DB_NAME}/$DB_NAME/gI;
     s/{PLANT_NAME}/$PLANT_NAME/gI" \
    /root/sources/source1.txt > /root/parameters/Output.txt
param1.txt:
PLANT_NAME=abc
DB_NAME=gef
source1.txt:
kdashkdhkasdkj {PLANT_NAME}
jhdbjhasdjdhas kashdkahdk asdkhakdshk
hfkahfkajdfk ljsadjalsdj {PLANT_NAME}
{DB_NAME}
I cannot comment since I don't have enough points, but is this what you're looking for:
How to reference a file for variables using Bash?
Your problem statement isn't very clear to me; perhaps you can simplify your problem and desired state. I don't understand why you try to source param1.txt.
You can try with this awk :
awk '
    NR == FNR {
        a[$1] = $2
        next
    }
    {
        for ( i = 1 ; i <= NF ; i++ ) {
            b = $i
            gsub ( "^{|}$" , "" , b )
            if ( b in a )
                sub ( "{" b "}" , a[b] , $i )
        }
    } 1' FS='=' param1.txt FS=" " source1.txt
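If you'd rather stay in plain bash, one sketch is to build the sed script from the parameter file itself, so the variable names never need to be known in advance (this assumes every line of param1.txt is a simple KEY=VALUE pair and values contain no '/'):

```shell
#!/bin/bash
# Build a sed script from whatever KEY=VALUE pairs param1.txt contains,
# then apply it to the template in one pass.
script=""
while IFS='=' read -r key value; do
    [ -n "$key" ] || continue          # skip blank lines
    script+="s/{$key}/$value/g;"
done < param1.txt
sed "$script" source1.txt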

unix shell scripting to find and remove unwanted string in a pipe delimited file in a particular column

I have a requirement where the file is pipe "|" delimited.
The first row contains the headers, and the count of columns is 5.
I have to clean up only the 3rd column, keeping the strings that match a pattern.
Also note the 3rd column can contain strings with commas ,, semicolons ; or colons :, but it will never contain a pipe | (which is why we chose a pipe delimiter).
Input File:
COL1|COL2|COL3|COL4|COL5
1|CRIC|IPL|CRIC1:IPL_M1;IPL_M2;TEST_M1,CRIC2:ODI_M1;IPL_M3|C1|D1
2|CRIC|TEST|CRIC1:TEST_M2,CRIC2:ODI_M1;IPL_M1;TEST_M2;IPL_M3;T20_M1|C2|D2
The output should change only in COL3; no other columns should be altered. In COL3, only the strings that match the pattern 'IPL_' should be kept.
Any other strings like "TEST_M1" or "ODI_M1" should be made null,
and any leftover semicolons should be removed.
eg
Question - CRIC1:IPL_M1;IPL_M2;TEST_M1,CRIC2:ODI_M1;IPL_M3
result - CRIC1:IPL_M1;IPL_M2,CRIC2:IPL_M3
Another scenario where if only strings that do not match "IPL_" are present then
Question - CRIC1:TEST_M1,CRIC2:ODI_M1
Result - CRIC1:,CRIC2:
Output File:
COL1|COL2|COL3|COL4|COL5
1|CRIC|IPL|CRIC1:IPL_M1;IPL_M2,CRIC2:IPL_M3|C1|D1
2|CRIC|TEST|CRIC1:,CRIC2:IPL_M1;IPL_M3|C2|D2
Basic requirement is to find and replace the string,
INPUT
COL1|COL2|COL3|COL4|COL5
1|A1|A12|A13|A14|A15
Replace A13 with B13 in column 3 (A13 can change, I mean we have to find any pattern like A13)
OUTPUT
COL1|COL2|COL3|COL4|COL5
1|A1|A12|B13|A14|A15
Thanks in advance.
Reformatting the scenario in simpler terms, by taking only 2 columns: I need to search for "IPL_" and keep only those strings; any other string like "ODI_M3;TEST_M5" should be deleted.
I/P:
COL1|COL2
CRIC1|IPL_M1;IPL_M2;TEST_M1
CRIC2|ODI_M1;IPL_M3
CRIC3|ODI_M3;TEST_M5
CRIC4|IPL_M5;ODI_M5;IPL_M6
O/P:
COL1|COL2
CRIC1|IPL_M1;IPL_M2
CRIC2|IPL_M3
CRIC3|
CRIC4|IPL_M5;IPL_M6
Awaiting your suggestions; please help, I'm new to this platform.
Thanks,
Saquib
If I'm reading this correctly (and I'm not entirely sure I am; I'm going mostly by the provided examples), then this could be done relatively sanely with Perl:
#!/usr/bin/perl
while (<>) {
    if ($. > 1) {
        my @F = split /\|/;
        $F[3] = join(",", map {
            my @H = split /:/;
            $H[1] = join(";", grep(/IPL_/, split(";", $H[1])));
            join ":", @H;
        } split(/,/, $F[3]));
        $_ = join "|", @F;
    }
    print;
}
Put this code into a file, say foo.pl; then, if your data is in a file data.txt, you can run
perl foo.pl data.txt
This works as follows:
#!/usr/bin/perl
# Read lines from input (in our case: data.txt)
while (<>) {
    # In all except the first line (the header line):
    if ($. > 1) {
        # Apply the transformation. To do this, first split the line into fields
        my @F = split /\|/;
        # Then edit the fourth field (index 3). This has to be read right-to-left
        # at the top level, which is to say: first the field is split along
        # commas, then the tokens are mapped according to the code in the inner
        # block, then they are joined with commas between them again.
        $F[3] = join(",", map {
            # The map block does a similar thing. The inner tokens (e.g.,
            # "CRIC1:IPL_M1;IPL_M2") are split at the colon into the CRIC# part
            # (which is to be unchanged) and the value list we want to edit.
            my @H = split /:/;
            # This value list is again split along semicolons, filtered so that
            # only those elements that match /IPL_/ remain, and then joined with
            # semicolons again.
            $H[1] = join(";", grep(/IPL_/, split(";", $H[1])));
            # The map result is the CRIC# part joined to the edited list with a colon.
            join ":", @H;
        } split(/,/, $F[3]));
        # When all is done, rejoin the outermost fields with pipe characters
        $_ = join "|", @F;
    }
    # and print the result.
    print;
}
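For the simplified two-column scenario posted above, an awk sketch would also work (assuming COL2 entries are separated only by semicolons):

```shell
awk 'BEGIN { FS = OFS = "|" }
     NR == 1 { print; next }            # pass the header through untouched
     {
         n = split($2, parts, ";"); out = ""
         for (i = 1; i <= n; i++)       # keep only the IPL_ entries
             if (parts[i] ~ /IPL_/)
                 out = out (out == "" ? "" : ";") parts[i]
         $2 = out
         print
     }' file
```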

How to format text using UNIX commands?

I'm trying to display, in a specific way, all the files in a directory that have the same contents. If a file is unique, it does not need to be displayed. Files that are identical to others need to be displayed on the same line, separated by commas.
For example,
c176ada8afd5e7c6810816e9dd786c36 2group1
c176ada8afd5e7c6810816e9dd786c36 2group2
e5e6648a85171a4af39bbf878926bef3 4group1
e5e6648a85171a4af39bbf878926bef3 4group2
e5e6648a85171a4af39bbf878926bef3 4group3
e5e6648a85171a4af39bbf878926bef3 4group4
2d43383ddb23f30f955083a429a99452 unique
3925e798b16f51a6e37b714af0d09ceb unique2
should be displayed as,
2group1, 2group2
4group1, 4group2, 4group3, 4group4
I know which files are considered unique in a directory from using md5sum, but I do not know how to do the formatting part. I think the solution involves awk or sed, but I am not sure. Any suggestions?
Awk solution (for your current input):
awk '{ a[$1]=a[$1]? a[$1]", "$2:$2 }END{ for(i in a) if(a[i]~/,/) print a[i] }' file
a[$1]=a[$1]? a[$1]", "$2:$2 - accumulates group names (from field $2) for each unique hash in the 1st field $1. The array a is indexed by hashes, with the concatenated group names as values (separated by a comma ,).
for(i in a) - iterates through the array items.
if(a[i]~/,/) print a[i] - means: if the hash is associated with more than one group (the value contains a comma ,), print the item.
The output:
2group1, 2group2
4group1, 4group2, 4group3, 4group4
Given the input you provided, you essentially want to collect all the second columns where the first column is the same. So the first step is use awk to hash the second columns by the first. I leverage the solution posted here: Concatenate lines by first column by awk or sed
awk '{table[$1]=table[$1] $2 ",";} END {for (key in table) print key " => " table[key];}' file
c176ada8afd5e7c6810816e9dd786c36 => 2group1,2group2,
e5e6648a85171a4af39bbf878926bef3 => 4group1,4group2,4group3,4group4,
3925e798b16f51a6e37b714af0d09ceb => unique2,
2d43383ddb23f30f955083a429a99452 => unique,
And if you really want to filter out the unique ones, just keep the lines with at least two group names. Telling awk to use ',' as the separator, the trailing comma means such lines have more than two fields (NF > 2):
awk '{table[$1]=table[$1] $2 ",";} END {for (key in table) print key " => " table[key];}' file | awk -F ',' 'NF > 2'
c176ada8afd5e7c6810816e9dd786c36 => 2group1,2group2,
e5e6648a85171a4af39bbf878926bef3 => 4group1,4group2,4group3,4group4,
perl:
perl -lane '
    push @{$groups{$F[0]}}, $F[1]
} END {
    for $g (keys %groups) {
        print join ", ", @{$groups{$g}} if @{$groups{$g}} > 1
    }
' file
The order of the output is indeterminate.
This might work for you (GNU sed):
sed -r 'H;x;s/((\S+)\s+\S+)((\n[^\n]+)*)\n\2\s+(\S+)/\1,\5\3/;x;$!d;x;s/.//;s/^\S+\s*//Mg;s/\n[^,]+$//Mg;s/,/, /g' file
Gather up all the lines of the file and use pattern matching to collapse the lines. At the end of the file, remove the keys and any unique lines and then print the remainder.

To list files based on unique part of the filename in Unix

I've a directory with below files in it -
111-xxx-typec_2015-10-13.csv.gz
111-xxx-typec_2015-10-14.csv.gz
222-yyy-typec_2015-10-13.csv.gz
222-yyy-typec_2015-10-14.csv.gz
333-zzz-typec_2015-10-13.csv.gz
333-zzz-typec_2015-10-14.csv.gz
444-ppp-typec_2015-10-13.csv.gz
444-ppp-typec_2015-10-14.csv.gz
444-ppp-typec_2015-10-15.csv.gz
I want to see the oldest file of each type (xxx, yyy, etc) only, i.e. the output should be,
111-xxx-typec_2015-10-13.csv.gz
222-yyy-typec_2015-10-13.csv.gz
333-zzz-typec_2015-10-13.csv.gz
444-ppp-typec_2015-10-13.csv.gz
Is there a way to do this?
What you could do is an ls, piped through an awk script where you match the 'type' and check it against a dictionary: if it is in the list, ignore it; otherwise print it and add it to the list.
Something like this gawk script (the three-argument form of match() is a GNU awk extension, so this needs gawk rather than nawk):
{
    match($0, /(.*)-typec/, m);
    if (matches[m[1]] == "")
    {
        print;
        matches[m[1]] = m[1];
    }
}
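Since the date is embedded in the filename, a sorted listing already puts the oldest file of each type first, so a one-liner sketch (assuming every name contains "-typec") could be:

```shell
# Split each name at "-typec"; $1 is then the type prefix (e.g. "111-xxx").
# !seen[$1]++ prints only the first (= oldest, thanks to sort) line per prefix.
ls | sort | awk -F'-typec' '!seen[$1]++'
```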

How to condense a file: uniq occurences and sum another field

I have a very large file that looks something like this:
1,22,A
2,10,A
3,4,B
4,3,B
5,20,B
The second column tells me how many instances of the third column there are. So I want to collapse the third column (so that it is effectively uniqued), but add up the second column values. Desired output would be something like:
32,A
27,B
I can come up with some rather complicated ways to do this, but it seems like it ought to be rather simple...
I'm not sure what kind of "math" answer you would expect...
Given you have a file input.txt with the following content:
1,22,A
2,10,A
3,4,B
4,3,B
5,20,B
Create a new file with the following script in Ruby, put it in the same directory as your input.txt, and run ruby script.rb from the console:
File.open('output.txt', 'w+') do |file|
  result = {}
  File.readlines("input.txt").each do |line|
    values = line.split(',')
    letter = values[2].strip   # strip the trailing newline so "A\n" and "A" count together
    letter_value = values[1].to_i
    result[letter] ||= 0
    result[letter] += letter_value
  end
  result.each do |letter, value|
    file.puts [value, letter].join(',')
  end
end
Then, look for your result in output.txt in the same directory.
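If you'd rather stay on the command line, the same aggregation is a one-line awk sketch (output order across groups is not guaranteed):

```shell
# Sum field 2 per distinct field 3, then print "total,group" for each group.
awk -F',' '{ sum[$3] += $2 } END { for (k in sum) print sum[k] "," k }' input.txt
```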
