I want to translate the word "abcd" into upper case "ABCD" using the tr command, then translate "ABCD" into digits, e.g. 1234.
I want to chain two translations together (lowercase to upper case, then upper case to 1234) using pipes and also pipe the final output into more.
I'm not able to chain the second part.
echo "abcd" | tr '[:lower:]' '[:upper:]' > file1
Here I'm not sure how to add the second translation in the same command.
You can't chain two translations inside a single tr command, but you can do it in a single pipeline:
echo "abcd" | tr '[:lower:]' '[:upper:]' | tr 'ABCD' '1234'
Note that your [:lower:] and [:upper:] notation will translate more than abcd to ABCD. If you want to extend the mapping of digits so A-I map to 1-9, that's doable; what maps to 0?
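For example, if you decided that J should map to 0 (one arbitrary choice; the question leaves it open), the range notation extends naturally:
$ echo 'ABCDEFGHIJ' | tr 'A-J' '1-90'
1234567890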
If you want to do it in a single tr command instead, you can combine both mappings into one:
echo "abcdABCD" | tr 'abcdABCD' '12341234'
Or, abbreviated slightly:
$ echo 'abecedenarian-DIABOLICALISM' | tr 'a-dA-D' '1-41-4'
12e3e4en1ri1n-4I12OLI31LISM
$
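To also page the result, as the question asks, just add more as the final stage of the same pipeline:
echo "abcd" | tr '[:lower:]' '[:upper:]' | tr 'ABCD' '1234' | more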
I'm trying to convert filenames to remove unacceptable characters, but tr doesn't always treat its input arguments exactly as they're given.
For example:
$ echo "(hello) - {world}" | tr '()-{}' '_'
_______ _ _______
I only intended to replace (, ), -, { and }, but all the characters between ) and { in ASCII collation order were replaced as well -- so every letter in the input also became a _!
Is there a way to make tr replace only the exact characters given in its argument?
tr's syntax is surprisingly complicated. It supports ranges, character classes, collation-based equivalence matching, etc.
To avoid surprises (when a string matches any of that syntax unexpectedly), we can convert our literal characters to a string of \### octal specifiers of those characters' ordinals:
trExpressionFor() {
  printf %s "$1" | od -v -A n -b | tr ' ' '\\'
}

trL() { # name short for "tr-literal"
  tr "$(trExpressionFor "$1")" "$(trExpressionFor "$2")"
}
...used as:
$ trExpressionFor '()-{}'
\050\051\055\173\175
$ echo "(hello) - {world}" | trL '()-{}' '_'
_hello_ _ _world_
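As an aside, for this particular character set there is a simpler (if less general) fix: tr treats - literally when it is the last character of the set, so moving it to the end avoids the accidental range:
$ echo "(hello) - {world}" | tr '(){}-' '_'
_hello_ _ _world_
The octal-escaping helpers above remain the safer choice when the set might match any of tr's other special syntax (character classes, equivalence classes, repeats).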
Using raw output, I need some values in the output to be wrapped in quotes.
echo [{"a" : "b"}] | jq-win64.exe --raw-output ".[] | \"Result is: \" + .a + \".\""
generates
Result is: b.
but how can I generate
Result is: "b".
Unfortunately it has to run on Windows, called from inside a CMD file.
You need to escape the backslashes as well as the inner double quotes -- a literal " inside the filter has to be written as \\\" on the CMD command line:
$ echo [{"a" : "b"}] | jq-win64.exe --raw-output ".[] | \"Result is: \\\"\" + .a + \"\\\".\""
Result is: "b".
A hacky workaround with less backslashing could be:
jq -r ".[] | \"Result is: \" + (.a|tojson)"
Since you're trying to output double quotes in a double quoted string, you need to escape the inner quotes. And to escape the inner quotes, you need to also escape the escaping backslashes. So a literal double quote would have to be entered as \\\". You can do this a little cleaner by using string interpolation instead of regular string concatenation.
jq -r ".[] | \"Result is: \\\"\(.a)\\\".\""
I have a string, let's say:
k=CHECK_${SOMETHING}_CUSTOM_executable.acs
Now I want to fetch only CUSTOM_executable from the above string. This is what I have tried so far in Unix:
echo $k|awk -F '_' '{print $2}'
Can you explain how I can do this?
Try this:
$ echo "$k"
CHECK_111_CUSTOM_executable.acs
code:
echo "$k" | awk 'BEGIN{FS=OFS="_"}{sub(/.acs/, "");print $3, $4}'
Assume the variable ${SOMETHING} has the value SOMETHING just for simplicity.
The following assignment, therefore,
k=CHECK_${SOMETHING}_CUSTOM_executable.acs
sets the value of k to CHECK_SOMETHING_CUSTOM_executable.acs.
When the value is split into fields on _ by awk -F _ (the single quotes around the separator aren't necessary here), you get the following fields:
$ echo "$k" | awk -F _ '{for (i=0; i<=NF; i++) {print i"="$i}}'
0=CHECK_SOMETHING_CUSTOM_executable.acs
1=CHECK
2=SOMETHING
3=CUSTOM
4=executable.acs
So to get the output you want simply use
echo "$k" | awk -F _ -v OFS=_ '{print $3,$4}'
Suppose the SOMETHING variable contains 111_222_333 (or 111_222_333_444).
Use this:
$ k=CHECK_${SOMETHING}_CUSTOM_executable.acs
$ echo $k | awk 'BEGIN{FS=OFS="_"}{ print $(NF-1),$NF }'
Or:
echo $k | awk -F_ '{ print $(NF-1), $NF }' OFS=_
Explanation:
NF - The number of fields in the current input record.
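For example, assuming SOMETHING=111_222_333:
$ k=CHECK_111_222_333_CUSTOM_executable.acs
$ echo $k | awk 'BEGIN{FS=OFS="_"}{ print $(NF-1),$NF }'
CUSTOM_executable.acs
As with the other field-based approaches, the .acs extension stays attached to the last field unless you strip it separately.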
Try this simple awk:
awk -F[._] '{print $3"_"$4}' <<<"$k"
CUSTOM_executable
The -F[._] defines both the dot and the underscore as field separators. Then awk prints fields 3 and 4 of $k.
If k contains the literal string CHECK_${111_111}_CUSTOM_executable.acs, then use fields $4 and $5:
awk -F[._] '{print $4"_"$5}' <<<"$k"
CHECK_${111_111}_CUSTOM_executable.acs
|$1 | |$2 | |$3| | $4 | |   $5   | |$6|
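One small robustness note: quoting the bracket expression (-F'[._]') is safer, since an unquoted [._] could be expanded by the shell if a file with a matching single-character name happens to exist in the current directory:
awk -F'[._]' '{print $3"_"$4}' <<<"$k"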
You do not need awk; it can be done easily in bash. I assume that $SOMETHING does not contain _ characters (the CUSTOM and executable parts are just some text that also contains no _). Then:
k=CHECK_${SOMETHING}_CUSTOM_executable.acs
l=${k#*_}; l=${l#*_}; l=${l%.*};
This cuts everything from the beginning up to the 2nd _ character, and chomps off everything from the last . character onward. The result is put into the variable l.
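A quick check with a concrete value (assuming SOMETHING=111):
$ k=CHECK_111_CUSTOM_executable.acs
$ l=${k#*_}; l=${l#*_}; l=${l%.*}
$ echo "$l"
CUSTOM_executable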
If $SOMETHING may contain _, then a little more work is needed (I still assume the CUSTOM and executable parts contain no _):
k=CHECK_${SOMETHING}_CUSTOM_executable.acs
l=${k%_*}; l=${l%_*}; l=${k#${l}_*}; l=${l%.*};
This chomps off everything from the last-but-one _ character onward, then cuts that prefix off the front of the original string. The last statement chomps the extension off. The result is in the variable l.
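The same check with an underscore inside SOMETHING (assuming SOMETHING=111_222):
$ k=CHECK_111_222_CUSTOM_executable.acs
$ l=${k%_*}; l=${l%_*}; l=${k#${l}_*}; l=${l%.*}
$ echo "$l"
CUSTOM_executable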
Or it can be done using regex:
[[ $k =~ ([^_]+_[^_]+)\.[^.]+$ ]] && l=${BASH_REMATCH[1]}
This matches the last two _-separated words followed by .<extension> at the end of the string. The extension part is excluded from the capture group, so the result in l is just the two words.
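And the same check for the regex variant:
$ k=CHECK_111_222_CUSTOM_executable.acs
$ [[ $k =~ ([^_]+_[^_]+)\.[^.]+$ ]] && echo "${BASH_REMATCH[1]}"
CUSTOM_executable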
I hope this helps!
1.txt
1|2|3
4|5|6
7|3|6
2.txt (double pipe)
1||2||3
4||5||6
Expected output:
7|3|6
I want to compare 1.txt and 2.txt and print the difference. Note that the number of columns can vary each time.
awk -F"|" 'NR==FNR{a[$0]++;next} !(a[$0])' 2.txt 1.txt
How can I modify the code to handle the delimiters in each file?
The code below works for the first field alone, but I am not sure how it handles the fields separated by the double pipe:
awk -F"|" 'NR==FNR{a[$1]++;next} !(a[$1])' 2.txt 1.txt
One simple workaround would be to squeeze the double delimiters in the second file before feeding to awk:
awk -F"|" 'NR==FNR{a[$0]++;next} !(a[$0])' <(tr -s '|' < 2.txt) 1.txt
For your sample input, it'd produce:
7|3|6
EDIT: You assert that
awk -F"|" 'NR==FNR{a[$1]++;next} !(a[$1])' 2.txt 1.txt
works. It doesn't do what you expect. It compares only the first field and not the entire line.
You can use this awk:
awk -F"|" 'NR==FNR{gsub(/\|\|/,"|",$0);a[$0]++;next} !(a[$0])' 2.txt 1.txt
I typically use bash features to accomplish this:
diff 1.txt <(sed 's/||/|/g' < 2.txt)
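Note that diff prints its own markers rather than just the extra line; for the sample files the output would look something like:
3d2
< 7|3|6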
You can use a regexp as the field separator in gawk. If you don't mind the output being unsorted (awk arrays are unordered), you can do it with a single command:
gawk 'BEGIN {FS="\\|\\|*"} {gsub(FS,"|") ; a[$0]++} END {for (k in a) {if ( a[k] == 1 ) { print k } } }' 1.txt 2.txt
BEGIN {FS="\\|\\|*"} ==> The field separator is one or more |
{gsub(FS,"|") ; a[$0]++} ==> On every line normalize the number of separator |s to one and store the line in an array, or if it's already in the array, increment the value related to it
END {for (k in a) {if ( a[k] > 0 ) { print k } } } finally print every array element where it found more than once.
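For the two sample files this should print just the line that appears in only one of them:
7|3|6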
I have a ~20GB csv file.
Sample file:
1,a#a.com,M
2,b#b.com,M
1,c#c.com,F
3,d#d.com,F
The primary key in this file is the first column.
I need to write two files, uniq.csv and duplicates.csv.
uniq.csv should contain all non-duplicate records, and duplicates.csv will contain all duplicate records with the current timestamp.
uniq.csv
1,a#a.com,M
2,b#b.com,M
3,d#d.com,F
duplicates.csv
2012-06-29 01:53:31 PM, 1,c#c.com,F
I am using Unix sort so that I can take advantage of its external R-way merge sort algorithm.
To identify unique records:
tail -n+2 data.txt | sort -t, -k1 -un > uniq.csv
To identify duplicate records:
awk 'x[$1]++' FS="," data.txt | awk '{print d,$1}' "d=$(date +'%F %r')," > duplicates.csv
I was wondering if there is any way to find both the duplicates and the uniques with a single scan of this large file?
Your awk script is nearly there. To find the unique lines, you merely need to use the in operator to test whether the entry is in the associative array or not. This allows you to collect the data in one pass through the data file and to avoid having to call sort.
tail -n +2 data.txt | \
awk '
  BEGIN { OFS = FS = "," }
  {
    if (!($1 in x)) {
      # first occurrence of this key goes to uniq.csv (file descriptor 3)
      print $0 > "/dev/fd/3"
    } else {
      # repeated key: prefix the record with the timestamp for duplicates.csv (stdout)
      print d, $0
    }
    x[$1]++
  }' d="$(date +'%F %r')" 3> uniq.csv > duplicates.csv
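For the sample data.txt this should leave something like the following (the timestamp reflects the time of the run):
$ cat uniq.csv
1,a#a.com,M
2,b#b.com,M
3,d#d.com,F
$ cat duplicates.csv
2012-06-29 01:53:31 PM,1,c#c.com,F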
I got this question in an interview, a couple jobs ago.
One answer is to use uniq with the "-c" (count) option. An entry with a count of "1" is unique, and otherwise not unique.
sort foo | uniq -c | awk '{ if ($1 == 1) { print > "unique.txt" } else { print > "duplicate.txt" } }'
If you want to write a special-purpose program and/or avoid the delay caused by sorting, I would use Python.
Read the input file, hashing each entry and incrementing an integer count for each key that you encounter. Remember that hash values can collide even when the two items are not equal, so keep each key individually along with its count.
At EOF on the input, traverse the hash structure and spit each entry into one of two files.
You seem not to need sorted output, only categorized output, so hashing should be faster: a hash insertion is O(1) per record, while sorting is O(N log N).
Here is some Perl code which will do the processing in one scan:
#!/usr/bin/perl
# Sort the file by its first comma-separated field, then scan it once,
# comparing each record's key with the previous record's key.
open(FI, "sort -t, -k1 < file.txt |");
open(FD, ">duplicates.txt");
open(FU, ">uniques.txt");
my @prev;
while (<FI>)
{
    my (@cur) = split(',');
    if ($prev[0] && $prev[0] == $cur[0])
    {
        # same key as the previous record: a duplicate, written with a timestamp
        print FD localtime() . " $_";
    }
    else
    {
        print FU $_;
    }
    @prev = @cur;
}
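Saved as, say, onescan.pl (the name is arbitrary), it is run as perl onescan.pl in the directory containing file.txt and produces uniques.txt and duplicates.txt. Note that it still leans on the external sort to group records by key, so the output comes out in key order rather than input order.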