Unix Script to remove last seven characters from a variable - unix

Need to remove the last seven characters from a variable.
For example if my variable string is
COLUMN_NAME||','||
then it should output COLUMN_NAME
I have tried the following, but only the last pipe symbol gets removed:
var=${lastline%|}
var=${lastline%|*}
Result : COLUMN_NAME||','|

To remove the last 7 characters:
$ var="COLUMN_NAME||','||"
$ echo "${var%???????}"
COLUMN_NAME
To remove everything from the first pipe onward:
$ echo "${var%%|*}"
COLUMN_NAME
See https://www.gnu.org/software/bash/manual/bashref.html#Shell-Parameter-Expansion
and https://www.gnu.org/software/bash/manual/bashref.html#Pattern-Matching

The old-school way:
echo "COLUMN_NAME||','||" | rev | cut -c 8- | rev
This reverses the string, deletes the first 7 characters, and reverses the string again.
Use echo "$variable" in place of the literal string to do the same with a variable.
You can also use awk as below, which starts a single external process instead of three:
awk '{print substr($0, 1, length($0)-7)}'
Example:
$ variable1="COLUMN_NAME||','||"
$ echo "$variable1" | rev | cut -c 8- | rev
COLUMN_NAME
$ echo "$variable1" | awk '{print substr($0, 1, length($0)-7)}'
COLUMN_NAME

You need to use two % signs to strip the longest match:
$ r="COLUMN_NAME||','||"
$ echo "${r%%|*}"
COLUMN_NAME
As BashFAQ says in Removing part of a string:
% means "remove the shortest possible match from the end of the
variable's contents".
%% means "remove the longest possible match from the end of the
variable's contents".
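To see the two operators side by side, here is a small sketch using the string from the question. The variable names shortest and longest are only illustrative.

```shell
#!/usr/bin/env bash
# Compare % (shortest match) with %% (longest match) on the question's string.
var="COLUMN_NAME||','||"

shortest=${var%|*}     # the pattern "|*" matches only the final "|"
longest=${var%%|*}     # the pattern "|*" matches from the first "|" to the end

printf 'shortest: %s\n' "$shortest"
printf 'longest:  %s\n' "$longest"
```

The first result still contains everything except the last pipe, which is exactly what the asker observed; the second is the bare column name.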

You could find the length of your string and keep everything up to 7 characters before the end:
str1="1234567890foobar"
strlen=${#str1}
str2=${str1:0:$strlen-7}
echo "$str2"
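The length-based approach can be wrapped in a small function. This is only a sketch: strip_last is a hypothetical helper name, and it assumes the count is a non-negative integer. It adds a guard for strings shorter than the count, which the snippet above does not handle.

```shell
#!/usr/bin/env bash
# strip_last N STRING: print STRING without its last N characters.
# (strip_last is a hypothetical helper, not part of the answers above.)
strip_last() {
    local n=$1 s=$2
    local len=${#s}
    if (( n >= len )); then
        printf '\n'                     # nothing left to keep
    else
        printf '%s\n' "${s:0:len-n}"    # keep the first len-n characters
    fi
}

strip_last 7 "COLUMN_NAME||','||"
strip_last 7 "1234567890foobar"
```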

Related

Replace last 9 delimiters "," with "|" in Unix

I want to replace the last 9 "," delimiters with "|" in a file.
For example, from:
abcd,3,5,5,7,7,1,2,3,4
"ashu,pant,something",3,5,5,7,7,8,7,8,8,8
to:
abcd|3|5|5|7|7|1|2|3|4
"ashu,pant,something"|3|5|5|7|7|8|7|8|8|8
Help would be really appreciated.
Not exactly the same, but this replaces every comma from the second occurrence onward with GNU sed:
$ echo \"ashu,pant\",3,5,5,7,7,87,8,8,8 |
sed 's/,/|/2g'
"ashu,pant"|3|5|5|7|7|87|8|8|8
Edit to match your changed requirements:
Hackish, but first reverse the lines and replace all commas with pipes, then turn the pipes back into commas from the 10th occurrence onward, and reverse the lines again:
$ echo -e \"ashu,pant\",3,5,5,7,7,87,8,8,8\\nabcd,3,5,5,7,7,1,2,3,4 |
rev |
sed 's/,/|/g; s/|/,/10g' |
rev
"ashu,pant"|3|5|5|7|7|87|8|8|8
abcd|3|5|5|7|7|1|2|3|4
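If GNU sed is not available, the same "count from the right" idea can be sketched in plain awk by walking each line backwards. The function name replace_last_commas is hypothetical, and the count is passed as its first argument.

```shell
#!/usr/bin/env bash
# replace_last_commas N: read lines on stdin and turn the last N commas
# of each line into pipes. (Hypothetical helper name.)
replace_last_commas() {
    awk -v n="$1" '{
        out = ""
        count = 0
        # Walk the line from right to left, swapping commas until n are done.
        for (i = length($0); i >= 1; i--) {
            c = substr($0, i, 1)
            if (c == "," && count < n) { c = "|"; count++ }
            out = c out
        }
        print out
    }'
}

printf '%s\n' '"ashu,pant",3,5,5,7,7,87,8,8,8' 'abcd,3,5,5,7,7,1,2,3,4' |
    replace_last_commas 9
```

Because it only counts commas, it does not need to know anything about the quotes, as long as the quoted text is always to the left of the last N commas.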
You could also use GNU awk and FPAT to replace every comma outside of quotes:
$ echo -e \"ashu,pant\",3,5,5,7,7,87,8,8,8\\nabcd,3,5,5,7,7,1,2,3,4 |
awk 'BEGIN{FPAT="([^,]+)|(\"[^\"]+\")";OFS="|"}{$1=$1}1'
"ashu,pant"|3|5|5|7|7|87|8|8|8
abcd|3|5|5|7|7|1|2|3|4
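FPAT is specific to GNU awk. As a rough portable sketch of the same "commas outside quotes" idea, you can scan each line character by character and track whether you are inside double quotes. The function name commas_outside_quotes is hypothetical, and the sketch assumes quotes are never escaped or nested.

```shell
#!/usr/bin/env bash
# commas_outside_quotes: replace every comma that is not inside a
# double-quoted section with a pipe. (Hypothetical helper name;
# assumes simple, unescaped double quotes.)
commas_outside_quotes() {
    awk '{
        out = ""
        inq = 0
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "\"")
                inq = !inq              # toggle on every double quote
            else if (c == "," && !inq)
                c = "|"                 # only touch commas outside quotes
            out = out c
        }
        print out
    }'
}

printf '%s\n' '"ashu,pant,something",3,5,5,7,7,8,7,8,8,8' | commas_outside_quotes
```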
Another awk option, which assumes every numeric field is a single digit:
awk '{gsub(/[[:digit:]]/," |&"); gsub(/, /,"")}1' file
Output:
abcd|3|5|5|7|7|1|2|3|4
"ashu,pant,something"|3|5|5|7|7|8|7|8|8|8

Define specific output count in EXPR command

I have a scenario where I want expr to produce a 9-character result.
My sample code is:
var1=012345678 #this is 9 characters
sum=`expr $var1 + 1`
echo "$sum"
Here is the result:
./sample.sh : 12345679 #this is only 8 characters
My expected output:
./sample.sh : 012345679
Any help on this?
The leading zero is removed when doing the math.
You can force a 9-digit output using printf "%09d" 123.
When you try to use the syntax ((sum=${var1} + 1)) you hit another problem: bash treats a number with a leading 0 as octal, and 8 is not a valid octal digit, so 012345678 is rejected.
You can remove the first 0 with
var1=012345678
echo "${var1#0}"
This only helps with your input, not with 00012.
Forcing base 10 with the 10# prefix handles any number of leading zeroes, e.g. echo $((10#$var1)). The sum can then be padded with printf:
var1=00012345678
sum=$((10#$var1 + 1))
printf "%09d\n" $sum
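If the output should always be as wide as the input rather than fixed at 9, the two steps can be combined in a small function. This is a sketch only: increment_padded is a hypothetical name, and it assumes the argument consists of decimal digits.

```shell
#!/usr/bin/env bash
# increment_padded VALUE: add 1 to a zero-padded decimal number and keep
# the original width. (Hypothetical helper; assumes VALUE is all digits.)
increment_padded() {
    local value=$1
    local width=${#value}
    # 10# forces base 10 so the leading zeroes are not read as octal;
    # %0*d pads with zeroes to the width given as the first argument.
    printf '%0*d\n' "$width" "$((10#$value + 1))"
}

increment_padded 012345678
increment_padded 00012
```

If the sum needs more digits than the input had (for example 999), the result simply grows by one digit.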
This can be solved more easily with awk:
var1=00012345678
echo "${var1} 1" | awk '{ printf("%09d\n", $1 + $2) }'
You can avoid the echo with
awk -v var1="$var1" 'BEGIN { printf("%09d\n", var1 + 1) }'
The BEGIN block runs before any input is read, so no input file is needed.
The option -v is a clean way to use a shell variable inside an awk script.
Do not splice the shell variable into the script by breaking out of the quotes; one day it will shoot you in the foot:
# Don't do this
awk 'BEGIN { printf("%09d\n", '${var1}' + 1) }' # Just do not do it

How to split and replace strings in columns using awk

I have a tab-delimited text file with only 4 columns as shown below:
GT:CN:CNL:CNP:CNQ:FT .:2:a:b:c:PASS .:2:c:b:a:PASS .:2:d:c:a:FAIL
If the string "FAIL" is found in any column from column 2 to column N (the elements within a column are separated by ":"), then the second element in that column should be replaced with "-1". Sample output is shown below:
GT:CN:CNL:CNP:CNQ:FT .:2:a:b:c:PASS .:2:c:b:a:PASS .:-1:d:c:a:FAIL
Any help using awk?
With any awk:
$ awk 'BEGIN{FS=OFS="\t"} {for (i=2;i<=NF;i++) if ($i~/:FAIL$/) sub(/:[^:]+/,":-1",$i)} 1' file
GT:CN:CNL:CNP:CNQ:FT .:2:a:b:c:PASS .:2:c:b:a:PASS .:-1:d:c:a:FAIL
To split a string in awk you can use split().
Its general form is:
split(string, array, separator);
string is the string you want to split,
array is the array that receives the pieces,
and separator is the character you want to split on.
For example:
string="hello:world"
result=$(echo "$string" | awk '{ split($1, ARR, ":"); printf("%s", ARR[1]) }')
In this case result is equal to hello, because the string is split at the ":" character and the first element of ARR is printed. Printing the second element instead (printf("%s", ARR[2])) would set result to world.
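Applied to the question, split() can be used on each tab-separated column in turn. The following is a sketch under the question's stated layout (tab-delimited columns, ":" between elements, FAIL as the last element); fix_fail_fields is a hypothetical function name.

```shell
#!/usr/bin/env bash
# fix_fail_fields: for every column from the second onward whose last
# ":"-separated element is FAIL, set its second element to -1.
# (Hypothetical helper name; reads stdin, writes stdout.)
fix_fail_fields() {
    awk 'BEGIN { FS = OFS = "\t" }
    {
        for (i = 2; i <= NF; i++) {
            n = split($i, parts, ":")       # n is the number of elements
            if (parts[n] == "FAIL") {
                parts[2] = "-1"
                joined = parts[1]
                for (j = 2; j <= n; j++)
                    joined = joined ":" parts[j]
                $i = joined                 # rebuilds the line using OFS
            }
        }
        print
    }'
}

printf 'GT:CN:CNL:CNP:CNQ:FT\t.:2:a:b:c:PASS\t.:2:c:b:a:PASS\t.:2:d:c:a:FAIL\n' |
    fix_fail_fields
```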
With gawk:
awk '{$0=gensub(/[^:]*(:[^:]*:[^:]*:[^:]*:FAIL)/,"-1\\1", "g" , $0)};1' File
With sed:
sed 's/[^:]*\(:[^:]*:[^:]*:[^:]*:FAIL\)/-1\1/g' File
If you are using GNU awk, you can take advantage of the RT feature [1] and split the records at tabs and newlines. OFS must be set to ":" as well, because assigning to $2 rebuilds the record using OFS:
awk '$NF == "FAIL" { $2 = "-1" } { printf "%s", $0 RT }' RS='[\t\n]' FS=':' OFS=':' infile
Output:
GT:CN:CNL:CNP:CNQ:FT .:2:a:b:c:PASS .:2:c:b:a:PASS .:-1:d:c:a:FAIL
[1] RT holds the text of the record separator that terminated the current record.
Your requirements are somewhat vague, but I'm pretty sure this does what you want with bog standard awk (no gnu-awk extensions):
awk '/FAIL/{$2=-1}1' ORS=\\t RS=\\t FS=: OFS=: input

Getting the last x digits from output with grep command

I need help getting the last 16 digits from the output I get with this command:
cat q5data.txt | grep -o '[0-9]*[0-9]\{16\}'
The output I get is:
6420029454020029
26787889786973463
92272417810036027222591368318424
1147142436072964
And I'd want only the last 16 digits of each of the numbers above, so it would look something like this:
6420029454020029
6787889786973463
7222591368318424
1147142436072964
So the question is: how would I get the last 16 digits?
q5data contains this:
0111102.82575525572371251FriThuSat32169716436971243.1415 foo100001$$$3.14153
foo`3.1415Green100010blah2.8
2.85720948213811501Purple`WedTueBLACK1869228491762178BLACK$$3.14100001Feb010000
taoblahfoopiGreen010111
VOIDchiOrangeSatNILLVOIDBLACK$$$Sat3.14155378825854705118Mar$WHITEAug`Tue
4421929582063064
2.8$$$$BLACKSun$"blah$ThublahJun2057411253659033Orange$$Sun$$fubar'
BLACKSun8061215743158569Jul'010101`2.8MayFri$$'blah
100001$3.141533.14153taoBLACKWHITE3.141532.8'foo"chi`BLACK$$$3300209361826966
5976364681345632YellowFri"JanWHITEWedWHITE3652470302503667WHITE
1237496282374608WHITEpiNILLVOID110111WHITEApr'$$$2.83536505910579946111010
54891762211716313.14$$RedWedtaoMonFri110010$$3068508931421361$PurpleNILLWHITE9242959892278294Sep
000110BlueOct2582940799974379
phifoo$
Purple3.1415Green '
3.14BLACKTuepiYellowWHITEchi35798399298233973.14153.1415WHITEpitao$SunBlue010110
NULLBLACKTue1650665049652872`2.8$'$$$NULL3.14SatGreen$$3.141533.14153GreenVOIDJul"
chichifubarWedpiBLACK3.14153BLACKpiWHITEThu$ BLACK
blah2.8fubar4411479881441554$$`BLACKWHITE1101113.14SepWHITEJanThuGreen
$$WHITE'"3675572769992033fooBlueNULL100000'
BLACK 3.14WHITEDecfubarOrangeMay NILLWHITE2570850288634750 101011$$$Mon
Tue" 3.143.14phiSat7665425103246257MayphiTue'0010110101112.8BLACK$fubar"
0358649831711525100010'FriJunThu"3.14SunGreenfubarMonWHITEVOID$$$VOID1877369637528056Jan$010010
GreenTue000111ThuBLACKApr011010
Jun6244216458497289`PurpleAug$$$2685357800265115''2.8taopi101100$$chiFeb
9471418620899225VOID8617331495319240NULLWHITEblah5461478451014026
6352741666667105
WHITEfooOct011010pi$$$110100BLACKBLACKTuePurpleWHITE9093492271343727SepNovchi
Orange3.144596443153024361`"'$$78253311502390510101103.14153Friphi $Mon
1385825179552755YellowBLACK001011Sep$$RedFebfubarMon010010000010fubar"Jul0110117544560082562350
3.141653642540032022chi'Orange
1253542283769081tao4876457038962098MonSunMayWHITEYellow3.14153$Orange000101blah
RedSatNILLphiVOIDWedfubarGreen chi$$piphiJul$$$111001`9540185369262601NILLVOID
7006440921851679Wed3.14152.8chiGreenThu$$Tuefoofooblahpi$$$taopi$ May 'Feb
MayNILLblah8007182476768737JantaophiThutao$'Jul AprNILLBLACK'3.14153Feb3.1415
57067714600406493.141537231229468300261Mon$`SunNILL `NULL3.14153foochi1000109494160741986074
6577869219715310JulJanBLACKfubarBLACK2.8phiGreen0091496849086433
SunBlue2355648762601053 3.1415NULL$$$BLACK100011 ThuDecJun2.83.1415phiFeb"
9173525733960126BLACK 3.14153`110001PurpleRedFebfubarVOIDfoo$$$blah9330024102534139
Jun$$VOIDVOID4099554992034342Julpi9976331355660412taoWHITEGreen$$100010NILLVOID
3.14153phiSatphi43658305924319679197159994746838phipiApr
3.1415RedblahMayfooJul100011NovtaoMon3.141533.14JanGreen$$ OctNILLfooWHITE3.1415
96027197435535111011013.14VOID3583462878046156NULL3.1415blahOrangefoo 100101taofoo3.14153"3.1415
$$Red3.14Marblah'
3797758515388131tao $$$101010NULL2268984774582096BlueBlue3.14153Oct`
74321533961822933.14153994759453326425$$Jul001111PurpleGreenTueNovJan2742714540787707Blue$$$
0010003.14blah3.14ThuWHITE$$$$blah
3997313793176662 3.141463510697622121Yellow 3.1415'Jul`3.14153NILL2.8Thuphi
3134920264311067fooNov`NULL1111119335359393623483Tue$$$GreenVOIDtaoRedTueAug$$3.141532.8Sat'
3.14153Oct100010FebJan$$3.1415pi$$'chiRed$$$NILL8614261680268364
fubarBLACKpi110001110101pichi0126011887834143GreenNILLYellow NILLfoo101000 $$$
RedTueNULLThu2.814091424413091162.8 WHITE$WHITE60620358244865230211111773156587'pi
Yellow3.1415$$$$$
"Aug3.1415VOIDBLACK0810996065354809$$$NULLfoo$$Orange6850772642048628WedBLACK
BLACKBluepi 70173555329860651869981769139132phi$$$$$$3.14Feb2.86083883638401362
6420029454020029WHITE26787889786973463.14 3.14 Mon`92272417810036027222591368318424$$$tao
fooTue"1147142436072964AprPurpleSep
Okay, so at the beginning of q5data we see 0111102. and right after that we see 82575525572371251 (17 digits).
I'd like it to output the last 16 digits (2575525572371251).
Thank you :)
To match the end of a digit run at a word boundary, use \b:
grep -o '[0-9]\{16\}\b' q5data.txt
This matches 16 digits that end at a word boundary, so it misses runs that are followed directly by a letter or an underscore.
If you want to capture digit runs terminated by any non-digit character, you need a negative lookahead (this requires the -P option of GNU grep, which is not available in standard grep):
$ grep -Po '[0-9]{16}(?![0-9])' q5data.txt
e.g.
$ echo "12345678901234567890aaa" | grep -Po '[0-9]{16}(?![0-9])'
5678901234567890
If you want the last 16 digits from every run of 16 or more digits, then you could filter through grep twice:
grep -Eo '[0-9]{16,}' <q5data.txt | grep -Eo '.{16}$'
The first selects all runs of 16 or more digits, and the second selects the last 16 characters from each run.
Testing this on the first line of your input file gives:
$ grep -Eo '[0-9]{16,}' <<<'0111102.82575525572371251FriThuSat32169716436971243.1415 foo100001$$$3.14153' | grep -Eo '.{16}$'
2575525572371251
2169716436971243
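The same two-step idea (find each run of 16 or more digits, then keep its tail) can also be sketched in pure bash without grep. The function name last16 is hypothetical; it relies on bash's [[ =~ ]] operator returning the leftmost, longest match.

```shell
#!/usr/bin/env bash
# last16: read lines on stdin and print the last 16 digits of every run
# of 16 or more digits. (Hypothetical helper name.)
last16() {
    local re='[0-9]{16,}' line rest run
    while IFS= read -r line || [[ -n $line ]]; do
        rest=$line
        # Take the leftmost qualifying run, print its tail, continue after it.
        while [[ $rest =~ $re ]]; do
            run=${BASH_REMATCH[0]}
            printf '%s\n' "${run: -16}"
            rest=${rest#*"$run"}
        done
    done
}

last16 <<'EOF'
0111102.82575525572371251FriThuSat32169716436971243.1415 foo100001$$$3.14153
EOF
```

For real use on the whole file it would be called as last16 <q5data.txt.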
Another option, which only finds digit runs that end a line:
grep -Eo '([0-9]{16})$' q5data.txt

Remove all lines from file with duplicate value in field, including the first occurrence

I would like to remove all the lines in my data file that contain a value in column 2 that is repeated in column 2 in other lines.
I've sorted by the value in column 2, but can't figure out how to use uniq for just the values in one field as the values are not necessarily of the same length.
Alternately, I can remove lines with the duplicate using an awk one-liner like
awk -F"[,]" '!_[$2]++'
but this retains the line with the first incidence of the repeated value in col 2.
As an example, if my data is
a,b,c
c,b,a
d,e,f
h,i,j
j,b,h
I would like to remove ALL lines (including the first) where b occurs in the second column.
Like this:
d,e,f
h,i,j
Thanks for any advice!!
If the order is not important then the following should work:
awk -F, '
!seen[$2]++ {
    line[$2] = $0
}
END {
    for (val in seen)
        if (seen[val] == 1)
            print line[val]
}' file
Output
h,i,j
d,e,f
Solution with grep, if you already know the repeated value (here b):
grep -v -E '\b,b,\b' text.txt
Content of the file:
$ cat text.txt
a,b,c
c,b,a
d,e,f
h,i,j
j,b,h
a,n,b
b,c,f
$ grep -v -E '\b,b,\b' text.txt
d,e,f
h,i,j
a,n,b
b,c,f
Hope it helps
Some different awk:
awk -F, '
BEGIN { f = 0 }
FNR == NR { _[$2]++; next }
f == 0 {
    f = 1
    for (j in _) if (_[j] > 1) delete _[j]
}
$2 in _
' file file
Explanation
awk passes through the file twice, which is why the file name appears twice at the end. On the first pass (when FNR==NR) I count the number of times each column 2 value appears, in the array _[]. At the start of the second pass, I delete all elements of _[] that have been seen more than once. Then, for every line of the second pass, I print the line if its second field is still in _[].
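When the data comes from a pipe, the file cannot be read twice. A single-pass sketch that still keeps the original line order is to buffer the lines and decide in the END block. This holds the whole input in memory, and drop_repeated is a hypothetical function name.

```shell
#!/usr/bin/env bash
# drop_repeated: print only the lines whose second comma-separated field
# occurs exactly once in the input, in their original order.
# (Hypothetical helper name; buffers all input in memory.)
drop_repeated() {
    awk -F, '
    { count[$2]++; key[NR] = $2; line[NR] = $0 }
    END {
        for (i = 1; i <= NR; i++)
            if (count[key[i]] == 1)
                print line[i]
    }'
}

printf '%s\n' a,b,c c,b,a d,e,f h,i,j j,b,h | drop_repeated
```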
