Creating string of repeated characters in shell script [duplicate] - unix

I need to generate a string of dots (. characters) as a variable.
I.e., in my Bash script, for input 15 I need to generate this string of length 15: ...............
I need to do so variably. I tried using this as a base (from Unix.com):
for i in {1..100};do printf "%s" "#";done;printf "\n"
But how do I get the 100 to be a variable?

You can get as many NUL bytes as you want from /dev/zero. You can then turn these into other characters. The following prints 16 lowercase a's:
head -c 16 < /dev/zero | tr '\0' '\141'

len=100 ch='#'
printf '%*s' "$len" | tr ' ' "$ch"

Easiest and shortest way without a loop
VAR=15
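Prints as many dots as VAR says (change the first dot to any other character if you like). Bash performs brace expansion before variable expansion, so {1..$VAR} is not expanded into a sequence; seq is used instead:
printf '.%.0s' $(seq 1 "$VAR")
Saves the dotted line in a variable to be used later:
line=$(printf '.%.0s' $(seq 1 "$VAR"))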
echo "Sign here $line"
-Blatantly stolen from dogbane's answer https://stackoverflow.com/a/5349842/3319298
Edit: Since I have now switched to fish shell, here is a function defined in config.fish that does this with convenience in that shell:
function line -a char -a length
printf '%*s\n' $length "" | tr ' ' $char
end
Usage: line = 8 produces ========, line \" 8 produces """""""".

On most systems, you could get away with a simple
N=100
myvar=`perl -e "print '.' x $N;"`

I demonstrated a way to accomplish this task with a single command in another question, assuming it's a fixed number of characters to be produced.
I added an addendum to the end about producing a variable number of repeated characters, which is what you asked for, so my previous answer is relevant here:
https://stackoverflow.com/a/17030976/2284005
I provided a full explanation of how it works there. Here I'll just add the code to accomplish what you're asking for:
n=20 # This is the number of characters you want to produce
variable=$(printf "%0.s." $(seq 1 $n)) # Fill $variable with $n periods
echo "$variable" # Output content of $variable to terminal
Outputs:
....................

You can use C-style for loops in Bash:
num=100
string=$(for ((i=1; i<=$num; i++));do printf "%s" "#";done;printf "\n")
Or without a loop, using printf without using any externals such as sed or tr:
num=100
printf -v string "%*s" $num ' ' '' $'\n'
string=${string// /#}
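The loop-free approach above can be wrapped in a small function. This is only a sketch: the name repeat_char is hypothetical (it does not come from any of the answers), and it assumes Bash, since printf -v and the ${var// /x} substitution are not POSIX sh features.

```shell
#!/usr/bin/env bash
# repeat_char CHAR COUNT: print CHAR repeated COUNT times.
# (Hypothetical helper name; assumes Bash.)
repeat_char() {
    local ch=$1 count=$2 pad out
    # Pad an empty string to COUNT spaces and store the result in $pad.
    printf -v pad '%*s' "$count" ''
    # Replace every space with the requested character.
    out=${pad// /"$ch"}
    printf '%s\n' "$out"
}

dots=$(repeat_char . 15)
echo "Sign here $dots"
```

Because the count is an ordinary argument, it can come from a variable, which is what the question asked for.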

The solution without loops:
N=100
myvar=`seq 1 $N | sed 's/.*/./' | tr -d '\n'`

num=100
myvar=$(jot -b . -s '' $num)
echo $myvar

When I have to create a string that contains $x repetitions of a known character with $x below a constant value, I use this idiom:
base='....................'
# 0 <= $x <= ${#base}
x=5
expr "x$base" : "x\(.\{$x\}\)" # Will output '\n' too
Output:
.....
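If the script is known to run under Bash, the same bounded idiom can probably be written with substring expansion instead of expr. A minimal sketch, reusing the base and x names from the answer above and introducing a new variable, prefix, purely for illustration:

```shell
#!/usr/bin/env bash
base='....................'   # 20 dots: the largest length this sketch supports
x=5                           # must satisfy 0 <= x <= ${#base}
# ${base:0:x} takes x characters starting at offset 0, with no external process.
prefix=${base:0:x}
echo "$prefix"
```

Unlike the expr version, this starts no child process, but it is not portable to plain POSIX sh.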

Related

Adding previous lines to current line unless pattern is found in unix shell script

I am facing an issue while adding previous lines to current line for a pattern. I have a 43 MB file in unix. The snippet is shown below:
AAA7034 new value and a old value
A
78698 new line and old value
BCA0987 old value and new value
new value
What I want is :
AAA7034 new value and a old value A 78698 new line and old value
BCA0987 old value and new value new value
That means I have to append all the lines until the next pattern is found (the first pattern is AAA and the next pattern is BCA).
Because of the large file size, I am not sure whether awk/sed will work. Any bash script is appreciated.
You can combine all patterns and perform a regex match. Try something like this (it is just a scratch, you should trim the output if you need):
#!/bin/bash
patterns="^(AAA|BCS|BABA|BCA)"
file="$1"
while IFS= read -r line; do
if [[ "$line" =~ $patterns ]] ; then
echo # prints new line
fi
printf '%s ' "$line" # prints the line itself and a space as a separator
done < "$file"
You can redirect the output to a file, of course.
It's not really clear precisely what you want. You've stated that you want to match the patterns 'AAA' and 'BCA', and later expanded that to "patter shall be like: AAA, BCS, BABA, BCA". I don't know if that means that you only want to match 'AAA', 'BCS', 'BABA', and 'BCA', or if you want to match 3- or 4-character strings containing only 'A', 'B', 'C', and 'S', but it sounds like you are just looking for:
awk '/[A-Z]{3,4}/{printf "\n"} { printf "%s ", $0} END {printf "\n"}' input-file
Change the pattern as needed when your requirements are made more precise.
Based on the comment, it is trivial to convert any awk program to perl. Here is (basically) the output of a2p on the above awk script, with changes to reflect the stated pattern:
#!/usr/bin/env perl
while (<>) {
chomp;
if (/AAA|BCA|BCS|BABA/) {
printf "\n";
}
printf '%s ', $_;
}
printf "\n";
You can simplify that a bit:
perl -pe 'chomp; printf "\n" if /AAA|BCA|BCS|BABA/; printf "%s ", $_' input-file; echo
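The line-joining idea from these answers can also be sketched with the pattern anchored to the start of the line, so that a prefix appearing in the middle of a line does not start a new record. The function name join_records is hypothetical, the prefix list is the one quoted in the answers, and the sketch assumes the first input line begins a record.

```shell
#!/usr/bin/env bash
# join_records [FILE...]: start a new output line at every input line that
# begins with one of the known prefixes; append all other lines to it.
# (Hypothetical helper; adjust the prefix list to the real data.)
join_records() {
    awk '
        /^(AAA|BCA|BCS|BABA)/ {        # a new record starts here
            if (NR > 1) printf "\n"    # close the previous record, if any
            printf "%s", $0
            next
        }
        { printf " %s", $0 }           # continuation line: append after a space
        END { if (NR > 0) printf "\n" }
    ' "$@"
}

printf '%s\n' 'AAA7034 new value and a old value' 'A' \
    '78698 new line and old value' \
    'BCA0987 old value and new value' 'new value' | join_records
```

awk reads the input one line at a time, so a 43 MB file should not be a problem for this approach.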

Re-order fields from nth to NF-1 with awk

My problem :
I have a pipe-delimited input file and I need to put the last column first, drop the 2nd, and print from the third to the next-to-last.
Currently, this works with my 7-field file:
awk 'BEGIN { FS="|"; OFS="|"; } {print $NF,$2,$3,$4,$5,$6}'
But I am looking for something more automatic that works with any number of columns.
I have tried a loop:
awk 'BEGIN { FS="|"; OFS="|"; } {for(i=2;i<=NF-1;++i)print $i}'
But this prints each field on a separate line, and the first field is not printed.
I have tried many other solutions, but no luck so far...
Is there an option I'm missing?
Input :
"PRILYYYTVENIZKEB#XXXX"|2017-09-08T09:46:40.000|"AUDIOTEL"|"Virement +"|25|"50747071"|6440bc7a8f41a96f89ee123159b7eb819a99767c9107b24e9d346eb3835f74a7
"CSRBQDVXJEFPACTKOO#AAA"|2020-02-11T10:02:20.000|"WEB"|"Virement +"|25|"51254683"|cd558b1319595aa63929d8cf3d8213ccc004aac089e6dd3bbad1d595ad010335
"WOGMKZLBHDFPACTKHG#ZZZZ"|2019-07-03T12:00:00.000|"WEB"|"Virement +"|195|"51080106"|f128a559267df0f9a6352fb40f65594aa8f5d01d5c3b90f471ffa0be07739c4d
Expected :
6440bc7a8f41a96f89ee123159b7eb819a99767c9107b24e9d346eb3835f74a7|2017-09-08T09:46:40.000|"AUDIOTEL"|"Virement +"|25|"50747071"
cd558b1319595aa63929d8cf3d8213ccc004aac089e6dd3bbad1d595ad010335|2020-02-11T10:02:20.000|"WEB"|"Virement +"|25|"51254683"
f128a559267df0f9a6352fb40f65594aa8f5d01d5c3b90f471ffa0be07739c4d|2019-07-03T12:00:00.000|"WEB"|"Virement +"|195|"51080106"
(email on 2nd is deleted, and hash on last is put on first).
Global context (maybe another solution more direct is possible) :
My goal is to replace the first field with a hash-calculated value of this field.
I use a temporary file to add my calculated field at the end of my file :
while read line
do
echo -n "$line|"
echo -n $line | cut -d'|' -f1 | sed "s/\"//g" | tr -d '\n' | sha256sum | cut -d' ' -f1
done < $f_x_file_name.$f_x_file_extension > $f_x_file_name.hash.$f_x_file_extension ;
Thanks !
Regards
If I understand correctly what you mean by:
put the last column at first, drop the 2nd, and print from the third
to the last-1
then a more concise way of saying that would be:
move the first column to the 2nd and move the last column to the first
which would be:
awk 'BEGIN{FS=OFS="|"} {$2=$1; $1=$NF; NF--} 1' file
for example:
$ echo 'a|b|c|d' | awk 'BEGIN{FS=OFS="|"} {$2=$1; $1=$NF; NF--} 1'
d|a|c
Using NF-- to delete the last column is undefined behavior per POSIX. If your awk doesn't support it, just change NF-- to sub(/\|[^|]*$/,"").
If I misunderstood what you're trying to do then edit your question to provide concise, testable sample input and expected output.
Based on the script, not your description, you want
awk 'BEGIN{FS=OFS="|"} {$1=$NF; NF--}1' file
example:
$ seq 5 | paste -sd'|' | awk 'BEGIN{FS=OFS="|"} {$1=$NF; NF--}1'
5|2|3|4
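For an awk that does not rebuild the record when NF is decremented, the same rearrangement can be done by building the output line explicitly. A minimal sketch (the function name last_to_first is hypothetical):

```shell
#!/usr/bin/env bash
# last_to_first [FILE...]: replace field 1 with the last field and drop the
# last field, without assigning to NF. (Hypothetical helper name.)
last_to_first() {
    awk 'BEGIN { FS = OFS = "|" }
    {
        out = $NF                      # the last field goes first
        for (i = 2; i < NF; i++)       # then fields 2 .. NF-1, in order
            out = out OFS $i
        print out
    }' "$@"
}

echo 'a|b|c|d' | last_to_first
```

This works for any number of columns, since the loop bound is taken from NF on each line.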
Modify the script where you calculate the hash.
while read -r line
do
# hash from your command:
# hash=$(echo -n $line | cut -d'|' -f1 | sed "s/\"//g" | tr -d '\n' |
# sha256sum | cut -d' ' -f1)
# Slightly changed
hash=$(cut -d'|' -f1 <<<"${line}"| tr -d '\n"' | sha256sum | cut -d' ' -f1)
echo "${hash}|$(cut -d '|' -f2- <<< "${line}")"
done < "$f_x_file_name"."$f_x_file_extension" > "$f_x_file_name".hash."$f_x_file_extension"
or even easier:
while IFS='|' read -r firstfield otherfields
do
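# printf avoids the trailing newline a here-string would add, and the
# substitution strips the double quotes, as the original command did
hash=$(printf '%s' "${firstfield//\"/}" | sha256sum | cut -d' ' -f1)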
echo "${hash}|${otherfields}"
done < "$f_x_file_name"."$f_x_file_extension" > "$f_x_file_name".hash."$f_x_file_extension"
While in the current situation, this is easily implemented, I'm always wondering why there is no concat function which does the reverse operation of split:
split(s, a[, fs ]): Split the string s into array elements a[1], a[2], ..., a[n], and return n. All elements of the array shall be deleted before the split is performed. The separation shall be done with the ERE fs or with the field separator FS if fs is not given. Each array element shall have a string value when created and, if appropriate, the array element shall be considered a numeric string (see Expressions in awk). The effect of a null string as the value of fs is unspecified.
concat(a[, ofs ]): Concatenate the array elements a[1], a[2], ..., a[n] with ofs as field separator, or OFS if ofs is not given. Numeric string values are converted to strings using CONVFMT. The first n array elements are concatenated, where n is the smallest value such that (n+1 in a) returns 0.
The implementation of concat would read:
function concat(a, ofs, s,i) {
ofs=(ofs=="" && ofs==0 ? OFS : ofs)
i=1; while(i in a) { s = s (i==1?"":ofs) a[i]; i++ }
return s
}
Using this function, you could then easily create an array with elements and assemble it as a string of fields:
BEGIN{FS=OFS="|"}
{ n=split($0,a) }
{ a[2]=a[1]; a[1]=a[n]; delete a[n] }
{ print concat(a) }
See comments below for more information about this.
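Put together as one runnable command, the concat idea might look like the sketch below. Note that concat is the hypothetical function proposed in this answer, not an awk built-in, and the wrapper name rearrange is likewise made up for illustration.

```shell
#!/usr/bin/env bash
# rearrange [FILE...]: copy field 1 into field 2, move the last field to the
# front, and drop the last field, using split() plus a user-defined concat().
rearrange() {
    awk '
    function concat(a, ofs,    s, i) {
        # Fall back to OFS when no separator argument was passed.
        ofs = (ofs == "" && ofs == 0 ? OFS : ofs)
        # Join a[1], a[2], ... and stop at the first missing index.
        i = 1
        while (i in a) { s = s (i == 1 ? "" : ofs) a[i]; i++ }
        return s
    }
    BEGIN { FS = OFS = "|" }
    {
        n = split($0, a)                        # a[1..n] hold the fields
        a[2] = a[1]; a[1] = a[n]; delete a[n]   # same rearrangement as above
        print concat(a)
    }' "$@"
}

echo 'a|b|c|d' | rearrange
```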

Use sed to replace all occurrences of strings which start with 'xy' and of length 5 or more

I am running AIX 6.1
I have a file which contains strings/words starting with some specific characters, say 'xy' or 'Xy' or 'xY' or 'XY' (case insensitive), and I need to mask the entire word/string with asterisks '*' if the word is, say, 5 or more characters long.
e.g. I need a sed command which when run against a file containing the below line...
This is a test line xy12345 xy12 Xy123 Xy11111 which I need to replace specific strings
should give below as the output
This is a test line ******* xy12 ***** ******* which I need to replace specific strings
I tried the below commands (did not yet come to the stage where I restrict to word lengths) but it does not work and displays the full line without any substitutions.
I tried using \< and \> as well as \b for word identification.
sed 's/\<xy\(.*\)\>/******/g' result2.csv
sed 's/\bxy\(.*\)\b/******/g' result2.csv
You can try with awk (note that this drops the matching words instead of masking them):
echo 'This is a test line xy12345 xy12 Xy123 Xy11111 which I need to replace specific strings' | awk 'BEGIN{RS=ORS=" "} !(/^[xX][yY]/ && length($0)>=5)'
The awk record separator is set to a space in order to be able to get the length of each word.
This works with GNU awk in --posix and --traditional modes.
With sed, for the mental exercise:
sed -E '
s/(^|[[:blank:]])([xyXY])([xyXY].{2}[^[:space:]]*)([^[:space:]])/\1#\3#/g
:A
s/(#[^#[:blank:]]*)[^#[:blank:]](#[#]*)/\1#\2/g
tA
s/#/*/g'
This requires that the text contain no # characters.
A simple POSIX awk version :
awk '{for(i=1;i<=NF;++i) if ($i ~ /^[xX][yY]/ && length($i)>=5) gsub(/./,"*",$i)}1'
This, however, does not keep the spacing intact (multiple spaces are converted to a single one). The following does:
awk 'BEGIN{RS=ORS=" "}(/^[xX][yY]/ && length($0)>=5){gsub(/./,"*")}1'
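Another way to keep the original spacing, sketched here under the assumption that words are separated only by spaces and tabs, is to leave the record separator alone and walk through each line with match(). The function name mask_xy_words is hypothetical.

```shell
#!/usr/bin/env bash
# mask_xy_words [FILE...]: replace every word that starts with xy (any case)
# and is at least 5 characters long by asterisks, keeping all whitespace.
mask_xy_words() {
    awk '{
        out = ""; rest = $0
        # Peel off one run of non-blank characters at a time and copy the
        # whitespace in front of it unchanged.
        while (match(rest, /[^ \t]+/)) {
            word = substr(rest, RSTART, RLENGTH)
            if (length(word) >= 5 && word ~ /^[xX][yY]/)
                gsub(/./, "*", word)
            out = out substr(rest, 1, RSTART - 1) word
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest                 # rest holds any trailing whitespace
    }' "$@"
}

echo 'This is a test line xy12345 xy12 Xy123 Xy11111 which I need to replace specific strings' | mask_xy_words
```

Because lines are processed whole, the final word is not affected by the trailing newline, which can be an issue when RS is set to a space.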
You may use awk:
s='This is a test line xy12345 xy12 Xy123 Xy11111 which I need to replace specific strings xy123 xy1234 xy12345 xy123456 xy1234567'
echo "$s" | awk 'BEGIN {
ORS=RS=" "
}
{
for(i=1;i<=NF;i++) {
if(length($i) >= 5 && $i~/^[Xx][Yy][a-zA-Z0-9]+$/)
gsub(/./,"*", $i);
print $i;
}
}'
A one liner:
awk 'BEGIN {ORS=RS=" "} { for(i=1;i<=NF;i++) {if(length($i) >= 5 && $i~/^[Xx][Yy][a-zA-Z0-9]+$/) gsub(/./,"*", $i); print $i; } }'
# => This is a test line ******* xy12 ***** ******* which I need to replace specific strings ***** ****** ******* ******** *********
See the online demo.
Details
BEGIN {ORS=RS=" "} - start of the awk script: set both the record separator and the output record separator to a space
{ for(i=1;i<=NF;i++) {if(length($i) >= 5 && $i~/^[Xx][Yy][a-zA-Z0-9]+$/) gsub(/./,"*", $i); print $i; } } - iterate over each field (with for(i=1;i<=NF;i++)) and, if the current field ($i) has a length of 5 or more (length($i) >= 5) and (&&) it consists of x and y in either case followed by 1 or more alphanumeric chars ($i~/^[Xx][Yy][a-zA-Z0-9]+$/), replace each char with * (with gsub(/./,"*", $i)); then print the current field value.
This might work for you (GNU sed):
sed -r ':a;/\bxy\S{5,}\b/I!b;s//\n&\n/;h;s/[^\n]/*/g;H;g;s/\n.*\n(.*)\n.*\n(.*)\n.*/\2\1/;ta' file
If the current line does not contain a string which begins with xy case insensitive and 5 or more following characters, then there is no work to be done.
Otherwise:
Surround the string by newlines
Copy the pattern space (PS) to the hold space (HS)
Replace all characters other than newlines with *'s
Append the PS to the HS
Replace the PS with the HS
Swap the strings between the newlines retaining the remainder of the first line
Repeat

Maximum number of characters in a field of a csv file using unix shell commands?

I have a csv file. In one of the fields, say the second field, I need to know maximum number of characters in that field. For example, given the file below:
adf,jlkjl,lkjlk
jf,j,lkjljk
jlkj,lkejflkj,adfafef,
jfje,jj,lkjlkj
jjee,eeee,ereq
the answer would be 8 because row 3 has 8 characters in the second field. I would like to integrate this into a bash script, so common unix command line programs are preferred. Imaginary bonus points for explaining what the command is doing.
EDIT: Here is what I have so far
cut --delimiter=, -f 2 test.csv | wc -m
This gives me the character count for all of the fields, not just one, so I still have progress to make.
I would use awk for the task. It uses a comma to split each line into fields and, for each line, checks whether the length of the second field is bigger than the value already saved.
awk '
BEGIN {
FS = ","
}
{ c = length( $2 ) > c ? length( $2 ) : c }
END {
print c
}
' infile
Use it as a one-liner and assign the return value to a variable, like:
num=$(awk 'BEGIN { FS = "," } { c = length( $2 ) > c ? length( $2 ) : c } END { print c }' infile)
Well @oob, you basically provided the answer with your last edit, and it's the simplest of all the answers given. However, I also like @Birei's answer just because I enjoy AWK. :-)
I too had to find the longest possible value for a given field inside a text file today. Tested with your sample and got the expected 8.
cut -d, -f2 test.csv | wc -L
As you can see, it is just a matter of using the right option for wc (which I hope you have already figured out by now). Note that -L is supported by GNU wc but is not required by POSIX.
My solution is to loop over the lines. Then I replace the commas with newlines to loop over the words, check which word is the longest, and save the data.
#!/bin/bash
lineno=1
matchline=0
matchlen=0
for line in $(cat input.txt); do
words=`echo $line | sed -e 's/,/\n/g'`
for word in $words; do
# echo "line: $lineno; length: ${#word}; input: $word"
if [ $matchlen -lt ${#word} ]; then
matchlen=${#word}
matchline=$lineno
fi
done;
lineno=$(($lineno + 1))
done;
echo max length is $matchlen in line $matchline
Bash and Coreutils Solution
There are a number of ways to solve this, but I vote for simplicity. Here's a solution that uses Bash parameter expansion and a few standard shell utilities to measure each line:
cut -d, -f2 /tmp/foo |
while read -r; do
echo "${#REPLY}"
done | sort -n | tail -n1
The idea here is to split the CSV file, and then use the parameter length expansion of the implicit REPLY variable to measure the characters on each line. When we sort the measurements numerically (sort -n, so that 10 sorts after 8), the last line of the sorted output holds the length of the longest line found.
cut out the desired column
print each line length
sort the line lengths
grab the max line length
cut -d, -f2 test.csv | awk '{print length($0);}' | sort -n | tail -n 1
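The same four steps can be folded into a single Bash loop that needs no external commands. This is a sketch for simple CSV only (no quoted fields containing commas); the function name max_field2_len is hypothetical, and the sample data is the file from the question.

```shell
#!/usr/bin/env bash
# max_field2_len FILE: print the length of the longest second field.
# (Hypothetical helper; assumes Bash and comma-separated lines.)
max_field2_len() {
    local max=0 first second rest
    # IFS=, splits each line on commas; "rest" absorbs field 3 and beyond.
    # The || test keeps a final line that lacks a trailing newline.
    while IFS=, read -r first second rest || [ -n "$first$second$rest" ]; do
        if (( ${#second} > max )); then
            max=${#second}
        fi
    done < "$1"
    echo "$max"
}

tmp=$(mktemp)
printf '%s\n' 'adf,jlkjl,lkjlk' 'jf,j,lkjljk' 'jlkj,lkejflkj,adfafef,' \
    'jfje,jj,lkjlkj' 'jjee,eeee,ereq' > "$tmp"
max_field2_len "$tmp"
rm -f "$tmp"
```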

How do I delete a matching line and the previous one?

I need to delete a matching line and the one before it.
E.g., in the file below I need to remove lines 1 & 2.
I tried grep -v -B 1 "page.of." 1.txt
and I expected it not to print the matching lines and the context.
I looked at "How do I delete a matching line, the line above and the one below it, using sed?" but could not understand the sed usage.
---1.txt--
**document 1** -> 1
**page 1 of 2** -> 2
testoing
testing
super crap blah
**document 1**
**page 2 of 2**
You want to do something very similar to the answer given
sed -n '
/page . of ./ { #when pattern matches
n #read the next line into the pattern space
x #exchange the pattern and hold space
d #skip the current contents of the pattern space (previous line)
}
x #for each line, exchange the pattern and hold space
1d #skip the first line
p #and print the contents of pattern space (previous line)
$ { #on the last line
x #exchange pattern and hold, pattern now contains last line read
p #and print that
}'
And as a single line
sed -n '/page . of ./{n;x;d;};x;1d;p;${x;p;}' 1.txt
grep -v -B1 doesn't work because it will skip those lines but will include them later on (due to the -B1). To check this out, try the command on:
**document 1** -> 1
**page 1 of 2** -> 2
**document 1**
**page 2 of 2**
**page 3 of 2**
You will notice that the page 2 line is skipped, because neither that line nor the line after it is selected by -v, so it is not pulled in as context either.
There's a simple awk solution:
awk '!/page.*of.*/ { if (m) print buf; buf=$0; m=1} /page.*of.*/ {m=0} END { if (m) print buf }' 1.txt
The awk command says the following:
If the current line contains "page ... of", the buffered previous line is marked as invalid (m=0) and is never printed. Otherwise, the previous line (stored in buf) is printed if it is still valid, and the buffer is reset to the current line, so the output lags the input by one line. The END block prints the last buffered line when the file does not end with a matching line.
grep -vf <(grep -B1 "page.*of" file | sed '/^--$/d') file
Not too familiar with sed, but here's a perl expression to do the trick:
cat FILE | perl -e '@a = <STDIN>;
for( $i=0 ; $i <= $#a ; $i++ ) {
if($i > 0 && $a[$i] =~ /xxxx/) {
$a[$i] = "";
$a[$i-1] = "";
}
} print @a;'
edit:
where "xxxx" is what you are trying to match.
Thanks, I was trying to use the awk command given by Foo Bah
to delete the matching line and the previous one. I have to use it multiple times, so for the matching part I use a variable. The given awk command works, but when using a variable it does not work (i.e. it does not delete the matching & prev. line). I tried:
awk -vvar="page.*of.*" '!/$var/ { if (m) print buf; buf=$0; m=1} /$var/ {m=0}' 1.txt
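A likely reason this fails is that awk does not expand variables inside /.../ regex literals, so /$var/ looks for literal text rather than the contents of var. A dynamic pattern is normally matched with the ~ operator instead. The sketch below assumes that is the problem; the function name delete_match_and_prev and the variable names pat, buf and have are made up for illustration. Note also that awk processes backslash escapes in -v assignments, so a pattern containing backslashes needs them doubled.

```shell
#!/usr/bin/env bash
# delete_match_and_prev PATTERN [FILE...]: drop every line matching the ERE
# PATTERN together with the line immediately before it.
delete_match_and_prev() {
    local pattern=$1
    shift
    awk -v pat="$pattern" '
        $0 ~ pat { have = 0; next }     # match: discard it and the held line
        {
            if (have) print buf         # the previous line is safe to print
            buf = $0; have = 1          # hold the current line back by one
        }
        END { if (have) print buf }     # flush the final held line
    ' "$@"
}

printf '%s\n' '**document 1**' '**page 1 of 2**' 'testoing' 'testing' \
    'super crap blah' '**document 1**' '**page 2 of 2**' |
    delete_match_and_prev 'page.*of.*'
```

The pattern is passed as the first argument, so the function can be reused with a different pattern on each call.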
