jq string interpolation with line breaks - jq

I have this expression:
echo '{"foo":"bar","boo":"moo"}' | jq -r '"\(.foo)|\(.boo)"'
bar|moo
Now, imagine that the filter is long, so I would like to break it into separate lines for readability while still producing the same output, conceptually like this:
jq '"
\(.foo)|
\(.boo)
"'
This, however, is not valid. How can I do this?

Outside of strings, your jq filter may have line breaks as you see fit. As for the strings, you can piece the single parts together using +:
echo '{"foo":"bar","boo":"moo"}' | jq -r '
"\(.foo)|" +
"\(.boo)"
'
bar|moo
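Because jq also accepts # comments outside of strings, each piece of a long string can be annotated once it is split with +. A minimal sketch, assuming a POSIX shell and the sample input from the question; the variable names filter and result are only illustrative:

```shell
# Keep the multi-line jq program in a shell variable (names are illustrative).
filter='
  "\(.foo)|" +   # first column, followed by the separator
  "\(.boo)"      # second column
'
result=$(echo '{"foo":"bar","boo":"moo"}' | jq -r "$filter")
echo "$result"
```

The same program text could also be saved to a file and run with jq -r -f followed by the file name.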

You might also wish to consider programmatic approaches, e.g.
map_values( "\(.)" )
| join("|")
or if you need to be picky about which keys to include, you could begin with:
[with_entries(.value |= "\(.)")[]]
| join("|")
or
. as $in
| reduce keys_unsorted[] as $k (""; . + "\($in[$k])|" )
| sub("[|]$"; "")
or:
. as $in
| keys_unsorted as $keys
| [ "\(.[$keys[0]])",
($keys[1:][] | "|\($in[.])" )]
| add

Related

remove a substring from a string

I want to remove , from a string in jq. Take the following example, how to remove , when outputting 1,2?
$ jq -r .x <<< '{"x":"1,2"}'
1,2
To remove specific positions from a string, use the indices you want to keep:
jq -r '.x | .[:1] + .[2:]' <<< '{"x":"1,2"}'
12
To remove one occurrence at any position, use sub to replace with the empty string
jq -r '.x | sub(","; "")' <<< '{"x":"1,2,3"}'
12,3
To remove all occurrences, use gsub the same way
jq -r '.x | gsub(","; "")' <<< '{"x":"1,2,3"}'
123
You didn't make clear what output you wanted. A literal reading suggests you want 12, but I find it more likely that you want each of the comma-separated items to be output on separate lines. The following achieves this:
jq -r '.x | split(",")[]'
For the provided input, this outputs
1
2
You can use sub.
Filter
.x | sub(","; " ")
Input
{"x":"1,2"}
Output
1 2
Demo
https://jqplay.org/s/IaViogZTsI

Re-order fields from nth to NF-1 with awk

My problem:
I have a pipe-delimited input file and I need to put the last column first, drop the 2nd, and print from the third to the last-1.
Currently, this works with my 7-field file:
awk 'BEGIN { FS="|"; OFS="|"; } {print $NF,$2,$3,$4,$5,$6}'
But I am looking for something more automatic that works with any number of columns.
I have tried a loop:
awk 'BEGIN { FS="|"; OFS="|"; } {for(i=2;i<=NF-1;++i)print $i}'
But this prints each field on a separate line, and the first field is not printed.
I have tried many other solutions, but no luck so far.
Is there an option I'm missing?
Input :
"PRILYYYTVENIZKEB#XXXX"|2017-09-08T09:46:40.000|"AUDIOTEL"|"Virement +"|25|"50747071"|6440bc7a8f41a96f89ee123159b7eb819a99767c9107b24e9d346eb3835f74a7
"CSRBQDVXJEFPACTKOO#AAA"|2020-02-11T10:02:20.000|"WEB"|"Virement +"|25|"51254683"|cd558b1319595aa63929d8cf3d8213ccc004aac089e6dd3bbad1d595ad010335
"WOGMKZLBHDFPACTKHG#ZZZZ"|2019-07-03T12:00:00.000|"WEB"|"Virement +"|195|"51080106"|f128a559267df0f9a6352fb40f65594aa8f5d01d5c3b90f471ffa0be07739c4d
Expected :
6440bc7a8f41a96f89ee123159b7eb819a99767c9107b24e9d346eb3835f74a7|2017-09-08T09:46:40.000|"AUDIOTEL"|"Virement +"|25|"50747071"
cd558b1319595aa63929d8cf3d8213ccc004aac089e6dd3bbad1d595ad010335|2020-02-11T10:02:20.000|"WEB"|"Virement +"|25|"51254683"
f128a559267df0f9a6352fb40f65594aa8f5d01d5c3b90f471ffa0be07739c4d|2019-07-03T12:00:00.000|"WEB"|"Virement +"|195|"51080106"
(email on 2nd is deleted, and hash on last is put on first).
Wider context (maybe a more direct solution is possible):
My goal is to replace the first field with a hash of that field.
I use a temporary file to append the calculated hash at the end of each line:
while read line
do
echo -n "$line|"
echo -n $line | cut -d'|' -f1 | sed "s/\"//g" | tr -d '\n' | sha256sum | cut -d' ' -f1
done < $f_x_file_name.$f_x_file_extension > $f_x_file_name.hash.$f_x_file_extension ;
Thanks !
Regards
If I understand correctly what you mean by:
put the last column at first, drop the 2nd, and print from the third
to the last-1
then a more concise way of saying that would be:
move the first column to the 2nd and move the last column to the first
which would be:
awk 'BEGIN{FS=OFS="|"} {$2=$1; $1=$NF; NF--} 1' file
for example:
$ echo 'a|b|c|d' | awk 'BEGIN{FS=OFS="|"} {$2=$1; $1=$NF; NF--} 1'
d|a|c
Using NF-- to delete the last column is undefined behavior per POSIX; if your awk doesn't support it, just change NF-- to sub(/\|[^|]*$/,"").
If I misunderstood what you're trying to do then edit your question to provide concise, testable sample input and expected output.
Based on your script rather than your description, you want:
awk 'BEGIN{FS=OFS="|"} {$1=$NF; NF--}1' file
example:
$ seq 5 | paste -sd'|' | awk 'BEGIN{FS=OFS="|"} {$1=$NF; NF--}1'
5|2|3|4
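The loop from the question can also be made to work in any awk by building the output line in a variable instead of printing each field separately. This is only a sketch; the function name reorder is made up for the example:

```shell
# Print the last field first, then fields 2 .. NF-1, for any number of columns.
reorder() {
  awk 'BEGIN { FS = OFS = "|" }
  {
    out = $NF                    # the last field goes first
    for (i = 2; i < NF; i++)     # then every field from the 2nd to the last-1
      out = out OFS $i
    print out                    # one print per input line, not per field
  }'
}

result=$(printf '%s\n' 'a|b|c|d|e' | reorder)
echo "$result"
```

Since nothing is assigned to NF here, this variant does not depend on how a particular awk handles NF--.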
Modify the script where you calculate the hash.
while read -r line
do
# hash from your command:
# hash=$(echo -n $line | cut -d'|' -f1 | sed "s/\"//g" | tr -d '\n' |
# sha256sum | cut -d' ' -f1)
# Slightly changed
hash=$(cut -d'|' -f1 <<<"${line}"| tr -d '\n"' | sha256sum | cut -d' ' -f1)
echo "${hash}|$(cut -d '|' -f2- <<< "${line}")"
done < "$f_x_file_name"."$f_x_file_extension" > "$f_x_file_name".hash."$f_x_file_extension"
or even easier:
while IFS='|' read -r firstfield otherfields
do
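# strip the quotes and avoid the trailing newline that <<< would add, as in the original command
hash=$(printf '%s' "${firstfield//\"/}" | sha256sum | cut -d' ' -f1)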
echo "${hash}|${otherfields}"
done < "$f_x_file_name"."$f_x_file_extension" > "$f_x_file_name".hash."$f_x_file_extension"
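When computing the hash, it matters whether a trailing newline reaches sha256sum: a here-string (<<<) appends one, just as printf '%s\n' does, whereas printf '%s' does not. A small sketch with the illustrative value abc, assuming sha256sum from GNU coreutils is available:

```shell
# The same text hashed without and with a trailing newline gives different digests.
value='abc'                                   # illustrative first field
h_plain=$(printf '%s' "$value" | sha256sum | cut -d' ' -f1)
h_newline=$(printf '%s\n' "$value" | sha256sum | cut -d' ' -f1)
echo "without newline: $h_plain"
echo "with newline:    $h_newline"
```

Whichever form is chosen, it should match the way the hash is computed everywhere else the values are compared.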
While this is easy to implement in the current situation, I have always wondered why awk has no concat function that does the reverse of split:
split(s, a[, fs ]): Split the string s into array elements a[1], a[2], ..., a[n], and return n. All elements of the array shall be deleted before the split is performed. The separation shall be done with the ERE fs or with the field separator FS if fs is not given. Each array element shall have a string value when created and, if appropriate, the array element shall be considered a numeric string (see Expressions in awk). The effect of a null string as the value of fs is unspecified.
concat(a[, ofs ]): Concatenate the array elements a[1], a[2], ..., a[n] with ofs as field separator or OFS if ofs is not given. Numeric string values are converted to strings using CONVFMT. The first n array elements are concatenated, where n is the smallest index such that (n+1 in a) returns 0.
The implementation of concat would read:
function concat(a, ofs, s,i) {
ofs=(ofs=="" && ofs==0 ? OFS : ofs)
i=1; while(i in a) { s = s (i==1?"":ofs) a[i]; i++ }
return s
}
Using this function, you could then easily create an array with elements and assemble it as a string of fields:
BEGIN{FS=OFS="|"}
{ n=split($0,a) }
{ a[2]=a[1]; a[1]=a[n]; delete a[n] }
{ print concat(a) }
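Put together, the concat function and the rules above can be run as one awk program. This is a sketch of the proposal rather than a built-in feature; the shell function name swap_with_concat is made up for the example:

```shell
# Wrap the proposed concat() and the field-swapping rules into one runnable awk program.
swap_with_concat() {
  awk '
  function concat(a, ofs,    s, i) {
    ofs = (ofs == "" && ofs == 0 ? OFS : ofs)   # an unset ofs falls back to OFS
    i = 1
    while (i in a) { s = s (i == 1 ? "" : ofs) a[i]; i++ }
    return s
  }
  BEGIN { FS = OFS = "|" }
  {
    n = split($0, a)                        # fields into a[1..n]
    a[2] = a[1]; a[1] = a[n]; delete a[n]   # reorder, then drop the last element
    print concat(a)
  }'
}

result=$(printf '%s\n' 'a|b|c|d' | swap_with_concat)
echo "$result"
```

For the input a|b|c|d this prints d|a|c, the same result as the NF-- version shown earlier.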

unix sort on column without separator

I'd like to sort the contents of a file with a Unix script, based on a particular character column.
For example, sorting the following file on the 3rd column:
ax5aa
aa3ya
fg7ds
pp0dd
aa1bb
would result in:
pp0dd
aa1bb
aa3ya
ax5aa
fg7ds
I have tried sort -k 3,3, but it just sorts on the 3rd word (with whitespace as the separator).
Is there any way to make Unix sort behave the way I want, or should I use another tool?
$ sort --key=1.3,1.3 inputfile
pp0dd
aa1bb
aa3ya
ax5aa
fg7ds
man page of sort:
[...]
-k, --key=POS1[,POS2]
start a key at POS1 (origin 1), end it at POS2 (default end of line)
[...]
POS is F[.C][OPTS], where F is the field number and C the character position in the field; both are origin 1. If neither -t nor -b is in effect, characters in a field are counted from the beginning of the preceding whitespace. OPTS is one or more single-letter ordering options, which override global ordering options for that key. If no key is given, use the entire line as the key.
With --key=1.3,1.3, the key starts and ends at the third character of the first field. Since the lines contain no whitespace, the whole line is a single field, so only its third character is compared.
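The same key specification can be tried directly on the sample data. A minimal sketch; LC_ALL=C is an added assumption to keep the comparison byte-wise regardless of the current locale:

```shell
# Sort on the third character only, using the short form of --key.
result=$(printf '%s\n' ax5aa aa3ya fg7ds pp0dd aa1bb | LC_ALL=C sort -k1.3,1.3)
echo "$result"
```

Changing the key to -k1.3,1.4 would compare the third and fourth characters together.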
Use sed to put a space after every character, so that each character becomes its own column, before sorting:
$ echo "ax5aa
aa3ya
fg7ds
pp0dd
aa1bb" | sed 's/\(.\)/\1 /g' | sort -t ' ' -k3,3 | tr -d ' '
pp0dd
aa1bb
aa3ya
ax5aa
fg7ds
cat inputfile | perl -npe 's/(.)/ $1/g' | sort -k 3,3 | perl -npe 's/ //g'
I would stick with perl and define a comparator (substr offsets are zero-based, so the third character is at offset 2):
echo "$content" | perl -e 'print sort {substr($a,2,1) cmp substr($b,2,1)} <>;'
I had the same problem with lines that have one or more spaces before the line segment used as key.
A field separator which is never present in the text to be sorted makes the whole line one field so you can use e.g.:
sort -n -t\| -k1.3,1.3 inputfile

Maximum number of characters in a field of a csv file using unix shell commands?

I have a csv file. In one of the fields, say the second field, I need to know maximum number of characters in that field. For example, given the file below:
adf,jlkjl,lkjlk
jf,j,lkjljk
jlkj,lkejflkj,adfafef,
jfje,jj,lkjlkj
jjee,eeee,ereq
the answer would be 8 because row 3 has 8 characters in the second field. I would like to integrate this into a bash script, so common unix command line programs are preferred. Imaginary bonus points for explaining what the command is doing.
EDIT: Here is what I have so far
cut --delimiter=, -f 2 test.csv | wc -m
This gives me the character count for all of the fields, not just one, so I still have progress to make.
I would use awk for the task. It splits each line into fields on commas and, for each line, checks whether the length of the second field is greater than the value saved so far.
awk '
BEGIN {
FS = ","
}
{ c = length( $2 ) > c ? length( $2 ) : c }
END {
print c
}
' infile
Use it as a one-liner and assign the return value to a variable, like:
num=$(awk 'BEGIN { FS = "," } { c = length( $2 ) > c ? length( $2 ) : c } END { print c }' infile)
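If it is also useful to know which row holds the longest value, the same idea can be extended slightly. A sketch only; the function name max_field_len and the variables col, max and row are made up for the example:

```shell
# Print the maximum length of one CSV field and the row where it occurs.
max_field_len() {   # usage: max_field_len FIELD_NUMBER < file.csv
  awk -F, -v col="$1" '
    length($col) > max { max = length($col); row = NR }   # remember the longest value so far
    END { print max + 0, row + 0 }                        # adding 0 prints 0 for empty input
  '
}

result=$(printf '%s\n' 'adf,jlkjl,lkjlk' 'jf,j,lkjljk' 'jlkj,lkejflkj,adfafef,' \
  'jfje,jj,lkjlkj' 'jjee,eeee,ereq' | max_field_len 2)
echo "$result"
```

For the sample file this prints 8 3: the longest second field has 8 characters and is on row 3.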
Well @oob, you basically provided the answer with your last edit, and it's the most simple of all answers given. However, I also like @Birei's answer just because I enjoy AWK. :-)
I too had to find the longest possible value for a given field inside a text file today. Tested with your sample and got the expected 8.
cut -d, -f2 test.csv | wc -L
As you can see, it is just a matter of using the right option for wc: -L prints the length of the longest line. It is not part of POSIX, but GNU wc supports it.
My solution is to loop over the lines. Then I replace the commas with newlines to loop over the words, check which word is the longest, and save its length and line number.
#!/bin/bash
lineno=1
matchline=0
matchlen=0
for line in $(cat input.txt); do
words=`echo $line | sed -e 's/,/\n/g'`
for word in $words; do
# echo "line: $lineno; length: ${#word}; input: $word"
if [ $matchlen -lt ${#word} ]; then
matchlen=${#word}
matchline=$lineno
fi
done;
lineno=$(($lineno + 1))
done;
echo max length is $matchlen in line $matchline
Bash and Coreutils Solution
There are a number of ways to solve this, but I vote for simplicity. Here's a solution that uses Bash parameter expansion and a few standard shell utilities to measure each line:
cut -d, -f2 /tmp/foo |
while read -r; do
echo ${#REPLY}
done | sort -n | tail -n1
The idea here is to split the CSV file, and then use the parameter length expansion of the implicit REPLY variable to measure the characters on each line. When we sort the measurements numerically (sort -n; a plain sort would compare them as text and rank 8 above 10), the last line of the sorted output will hold the length of the longest line found.
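Whether the lengths are sorted as text or as numbers changes the result as soon as a length has two digits. A small sketch with made-up lengths 8, 10 and 5:

```shell
# Compare a plain (textual) sort with a numeric sort on the same lengths.
lengths=$(printf '%s\n' 8 10 5)
lexical=$(printf '%s\n' "$lengths" | LC_ALL=C sort | tail -n1)
numeric=$(printf '%s\n' "$lengths" | sort -n | tail -n1)
echo "sort: $lexical, sort -n: $numeric"
```

The plain sort ends with 8 because the text "10" orders before "8", while sort -n ends with 10.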
cut out the desired column
print each line length
sort the line lengths
grab the max line length
cut -d, -f2 test.csv | awk '{print length($0);}' | sort -n | tail -n 1

Counting no. of Delimiter in a row in a File in Unix

I have a file 'records.txt' which contains over 200,000 records.
Each record is on a separate line and has multiple fields separated by a delimiter '|'.
Each row should have 35 fields, but the problem is that one of these rows does not have 35 fields, i.e. it has the wrong number of '|' characters.
Can someone please suggest a way in Unix to identify that row (for example, by counting the '|' characters in each row of the file)?
Try this:
awk -F '|' 'NF != 35 {print NR, $0} ' records.txt
This small perl script should do it:
cat records.txt | perl -ne '$t = $_; $t =~ s/[^\|]//g; print unless length($t) == 35;'
This works by removing all the characters except the |, then counting what is left.
Greg's way with bash stuff, for the bash friends out there :)
while IFS= read -r n; do [ $(printf '%s' "$n" | tr -cd '|' | wc -c) -ne 35 ] && printf '%s\n' "$n"; done < records.txt
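The delimiter count itself can also be obtained in awk, since gsub returns the number of replacements it made. Note that a row with 35 fields contains 34 delimiters, so the expected count depends on which of the two is meant. A sketch under that caveat; the function name rows_with_wrong_delims and the variable want are made up for the example:

```shell
# Report rows whose number of "|" characters differs from the expected count.
rows_with_wrong_delims() {   # usage: rows_with_wrong_delims EXPECTED < file
  awk -v want="$1" '
    { line = $0; n = gsub(/[|]/, "", line) }      # gsub returns how many "|" it replaced
    n != want { print NR ": " n " delimiters" }
  '
}

result=$(printf '%s\n' 'a|b|c' 'a|b' 'a|b|c|d' | rows_with_wrong_delims 2)
echo "$result"
```

For the real file, the call would be rows_with_wrong_delims 34 < records.txt if 35 fields are expected.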

Resources