Create a CSV file with a specific format from a String obtained after WS invocation

The title is self-explanatory. I am calling a web service which returns a String like this:
First Name="Kunal";Middle Name="";Last Name="Bhowmick";Address 1="HGB";Address 2="cvf";Address 3="tfg";City="DF";State="KL";Country="MN";Postal Code="0012";Telephone="(+98)6589745623"
Now I have to write a shell script to create a CSV file named CSV_Output.csv, formatted with the String's content.
The format must be something like this:
Field Name (in yellow color)    Value (in yellow color)
First Name Kunal
Middle Name
Last Name Bhowmick
Address 1 HGB
Address 2 cvf
Address 3 tfg
City DF
State KL
Country MN
Postal Code 0012
Telephone (+98)6589745623
Now I can easily generate a CSV file using redirection (>>), but how can I create and format a CSV file in the format shown above?
Sorry to be blunt, but I have no code to show, as I don't understand what to use here.
Kindly provide some suggestions (sample code). Any help is greatly appreciated.

An awk one-liner can convert the format:
awk -v RS="\\n|;" -v OFS="\t" -F= '{gsub(/"/,"");$1=$1}7' file
If you want the output to look better, you can pass it to column and change the OFS, like:
awk -v RS="\\n|;" -v OFS="#" -F= '{gsub(/"/,"");$1=$1}7' file|column -s"#" -t
The output is:
kent$ awk -v RS="\\n|;" -v OFS="#" -F= '{gsub(/"/,"");$1=$1}7' f|column -s"#" -t
First Name Kunal
Middle Name
Last Name Bhowmick
Address 1 HGB
Address 2 cvf
Address 3 tfg
City DF
State KL
Country MN
Postal Code 0012
Telephone     (+98)6589745623
Short explanation:
awk                 # the awk command
-v RS="\\n|;"       # set the record separator to \n (newline) or ; (semicolon)
-v OFS="\t"         # set the output field separator to <tab>
-F=                 # set "=" as the input field separator
'{gsub(/"/,"");     # remove all double quotes
$1=$1}              # reassign $1 so awk rebuilds the line with the given OFS
7'                  # any non-zero value makes awk print the whole line
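Note that a multi-character, regex-valued RS is a GNU awk (gawk) extension; POSIX awk only honours the first character of RS. And since the question asks for a file named CSV_Output.csv, simply redirect the output (a sketch, assuming the web-service response has been saved to a file named response.txt):
awk -v RS="\\n|;" -v OFS="\t" -F= '{gsub(/"/,"");$1=$1}7' response.txt > CSV_Output.csv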

Can be achieved using tr and column:
$ cat input
First Name="Kunal";Middle Name="";Last Name="Bhowmick";Address 1="HGB";Address 2="cvf";Address 3="tfg";City="DF";State="KL";Country="MN";Postal Code="0012";Telephone="(+98)6589745623"
$ cat input | tr ";" "\n" | column -s= -t | tr -d \"
First Name Kunal
Middle Name
Last Name Bhowmick
Address 1 HGB
Address 2 cvf
Address 3 tfg
City DF
State KL
Country MN
Postal Code 0012
Telephone (+98)6589745623
Split the input on ;, pipe the output to column specifying = as the delimiter, and get rid of the quotes!
EDIT: Didn't realize that you want a CSV. In that event, use:
$ cat input | tr ";" "\n" | tr "=" "\t" | tr -d \"
which will result in TAB-delimited output.
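If a comma-separated CSV_Output.csv is literally what is needed, one more tr swap does it (a sketch; safe here only because none of the values contain a comma):
tr ";" "\n" < input | tr "=" "," | tr -d \" > CSV_Output.csv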

Related

Comparison of two CSV files

I'm trying to use Awk to compare the content of two large CSV files (more than 5000 rows each) but I can't get what I want.
Here is my problem:
I have a first file (File1) with a list of names ($1) and cities ($2) whose structure looks like this:
john;london
marc;paris
karen;new york
ben;london
vic;dublin
I have a second file (File2) with other information where we find some names ($3) of File1:
45456;windows;john;454646
47764;mac;zack;470093
41225;mac;ben;622101
12634;windows;ben;218996
7856;windows;karen;637294
12;mac;finn;878317
2315;windows;beverly;221167
445;windows;lilly;12316
3232;mac;john;601316
4546;mac;fish;305035
487;windows;vic;447421
46464;mac;karen;468154
I would like to extract from File2 all the lines whose names appear in File1, appending the city associated with each name in File1. Here is an example of the result I am looking for:
45456;windows;john;454646;london
3232;mac;john;601316;london
7856;windows;karen;637294;new york
46464;mac;karen;468154;new york
41225;mac;ben;622101;london
12634;windows;ben;218996;london
487;windows;vic;447421;dublin
Could you help me?
Build an associative array from the first file, making the name the index and the city the value. For the second file, check if the name features in the array; if yes, print the line and append the city.
awk -F';' 'NR==FNR{a[$1]=$2}$3 in a{print $0";"a[$3]}' File1 File2
45456;windows;john;454646;london
41225;mac;ben;622101;london
12634;windows;ben;218996;london
7856;windows;karen;637294;new york
3232;mac;john;601316;london
487;windows;vic;447421;dublin
46464;mac;karen;468154;new york
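The same one-liner, spread out with comments (the added next merely skips File1's lines a bit earlier; the behaviour is unchanged):
awk -F';' '
NR==FNR { a[$1] = $2; next }      # File1: map each name to its city
$3 in a { print $0 ";" a[$3] }    # File2: append the city when the name is known
' File1 File2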
With bash, GNU sort and GNU join (join needs both inputs sorted on the join field, hence the process substitutions; -o selects which fields of which file to output):
join -t ';' -1 1 -2 3 <(sort File1) <(sort -t ';' -k 3 File2) -o 2.1,2.2,1.1,2.4,1.2
Output:
12634;windows;ben;218996;london
41225;mac;ben;622101;london
45456;windows;john;454646;london
3232;mac;john;601316;london
46464;mac;karen;468154;new york
7856;windows;karen;637294;new york
487;windows;vic;447421;dublin
Using csvkit (-H tells csvjoin the inputs have no header row; csvkit then generates placeholder headers, which tail -n +2 strips from the output):
csvjoin -d ';' -H -c 3,1 File2 File1 | csvformat -D ';' | tail -n +2

Convert a specific column of a file into upper case in Unix (without using awk or sed)

My file is as below
file name = test
1 abc
2 xyz
3 pqr
How can I convert the second column of the file to upper case without using awk or sed?
You can use tr to translate lowercase to uppercase, cut to extract the single columns, and paste to combine the separated columns again.
Assumption: Columns are delimited by tabs.
paste <(cut -f1 file) <(cut -f2 file | tr '[:lower:]' '[:upper:]')
Replace file with your file name (that is test in your case).
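With the sample file, this should print the tab-separated result:
1 ABC
2 XYZ
3 PQR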
In pure bash (the ${col2^^} uppercasing expansion requires bash 4.0 or later):
#!/bin/bash
# Read both columns, uppercase the second, and print them realigned.
while read -r col1 col2
do
    printf "%s%7s\n" "$col1" "${col2^^}"
done < file > output-file
Input-file
$ cat file
1 abc
2 xyz
3 pqr
Output-file
$ cat output-file
1 ABC
2 XYZ
3 PQR

What is an alternative way to count the occurrences of each word without using the 'uniq -c' command?

Is it possible to count the occurrences of each word, like uniq -c does, but with the count after the word rather than before it?
Example scenario
An input file named test1.txt contains the following data:
Renault:cilo:84563
Renault:cilo:84565
M&M:Thar:84566
Tata:nano:84567
M&M:quanto:84568
M&M:quanto:84569
The fields used in the above data are car_company:car_model:customerID
Desired result
cilo 2
Thar 1
nano 1
quanto 2
(car_model and number of cars sold grouped by car_model)
My code
cat test1.txt | cut -d: -f2 | uniq -c
Actual Result
2 cilo
1 Thar
1 nano
2 quanto
Is it possible to do the above process without using uniq -c, so that I can swap the order of the fields (columns)?
You can use uniq, and simply post-process its output to swap the columns:
cut -d: -f2 test1.txt | uniq -c | awk '{print $2 "\t" $1 "\n" }'
EDIT: Added \n, as noted in a comment.
Save your command's output into a file named "badresult":
cat test1.txt | cut -d: -f2 | uniq -c > badresult
Then cut the seventh field and save it into a file named "counts" (using a space, " ", as the separator):
cut -d" " -f7 badresult > counts
Then cut the eighth field and save it into a file named "models" (again with a space as the separator):
cut -d" " -f8 badresult > models
(The count lands in field 7 and the model in field 8 only because uniq -c right-aligns single-digit counts with six leading spaces; for counts of 10 or more the field numbers shift.)
Now you have your counts and models in separate files. All you have to do is show these two files side by side with the "pr" command (-m: one file per column, -T: no page headers):
pr -m -T models counts
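The two single-column files can also be avoided with process substitution (a sketch; the same caveat about shifting field positions applies):
pr -m -T <(cut -d" " -f8 badresult) <(cut -d" " -f7 badresult)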
Using awk:
cat test1.txt | cut -d: -f2 | uniq -c | awk '{ t = $1; $1 = $2; $2 = t; print }'
The little awk code exchanges fields 1 and 2 using a temporary.
You just need awk for this:
$ awk -F: '{a[$2]++} END {for (i in a) print i, a[i]}' file
cilo 2
quanto 2
nano 1
Thar 1
This goes through every line, keeping track of how many times the second field has appeared. Since everything is stored in the array a, it is just a matter of looping through it and printing its contents.
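One caveat: the traversal order of for (i in a) is unspecified in awk, so pipe through sort if a predictable order matters (a sketch):
awk -F: '{a[$2]++} END {for (i in a) print i, a[i]}' file | sort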

Finding common elements from one file in a column of another file and outputting the entire row of the latter

I need to extract into a third file (output.txt) all rows of Data.txt that contain, in one of their columns, a hit from a list (List.txt).
Data.txt (tab delimited)
some_data more_data other_data here yet_more_data etc
A B 2 Gee;Whiz;Hello 13 12
A B 2 Gee;Whizz;Hi 56 32
E 4 Btm;Lol 16 2
T 3 Whizz 13 3
List.txt
Gee
Whiz
Lol
Ideally output.txt looks like
some_data more_data other_data here yet_more_data etc
A B 2 Gee;Whiz;Hello 13 12
A B 2 Gee;Whizz;Hi 56 32
E 4 Btm;Lol 16 2
So I tried a shell script
for ids in List.txt
do
grep $ids Data.txt >> output.txt
done
except I typed out everything (cut and paste, actually) from List.txt in said script.
Unfortunately it gave me an output.txt that included the last line, I assume because 'Whizz' contains 'Whiz'.
I also tried cat Data.txt | egrep -F "List.txt" and that resulted in grep: conflicting matchers specified -- I suppose that was too naive of me. The actual files: List.txt contains a sorted list of 985 words, Data.txt has 115576 rows with 17 columns.
Some help/guidance would be much appreciated, thanks.
Try something like this:
for ids in $(cat List.txt)
do
grep "[TAB;]$ids[TAB;]" Data.txt >> output.txt
done
But it has two drawbacks:
"Data.txt" is scanned multiple times
You can get the same line multiple times.
If that is a problem, try a two-step version:
cat List.txt | sed -e "s/.*/[TAB;]&[TAB;]/" > List_mod.txt
grep -f List_mod.txt Data.txt > output.txt
Note:
A literal TAB character can be typed on the command line with Ctrl-V followed by the Tab key, or as a plain Tab in an editor; check that your editor does not convert tabs to a series of spaces. Also note that grep -f treats every line of List_mod.txt as a pattern, and [TAB;] (with a real tab) is a bracket expression matching either a tab or a semicolon.
The UNIX tool for general text processing is "awk":
awk '
NR==FNR { list[$0]; next }                 # List.txt: collect the words
{
    for (word in list) {
        if ($0 ~ "[\t;]" word "[\t;]") {   # word delimited by a tab or semicolon
            print
            next
        }
    }
}
' List.txt Data.txt > output.txt
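Note that this copies the header row of Data.txt only if one of the list words happens to match it; to keep the header unconditionally, a small variant (a sketch):
awk '
NR==FNR { list[$0]; next }   # List.txt: collect the words
FNR==1  { print; next }      # Data.txt: always keep the header row
{
    for (word in list)
        if ($0 ~ "[\t;]" word "[\t;]") { print; next }
}
' List.txt Data.txt > output.txt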

Unix command for counting the number of words which contain a letter combination (with repeats and letters in between)

How would you count the number of words in a text file which contain all of the letters a, b, and c? These letters may occur more than once in the word, and the word may contain other letters as well. (For example, "cabby" should be counted.)
Using sample input which should return 2:
abc abb cabby
I tried both:
grep -E "[abc]" test.txt | wc -l
grep 'abcdef' testCount.txt | wc -l
both of which return 1 instead of 2.
Thanks in advance!
You can use awk with the return value of the sub function: when a substitution is made, sub returns the number of substitutions done (for sub that is 1, otherwise 0).
$ echo "abc abb cabby" |
awk '{
for(i=1;i<=NF;i++)
if(sub(/a/,"",$i)>0 && sub(/b/,"",$i)>0 && sub(/c/,"",$i)>0) {
count+=1
}
}
END{print count}'
2
We require the return value to be greater than 0 for all three letters. The for loop iterates over every word of every line, incrementing the counter when all three letters are found in the word.
I don't think you can get around using multiple invocations of grep. Thus I would go with (GNU grep):
<file grep -oE '\w+' | grep a | grep b | grep c
Output:
abc
cabby
The first grep puts each word on a line of its own.
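Since the question ultimately asks for a count, append wc -l (a sketch):
<file grep -oE '\w+' | grep a | grep b | grep c | wc -l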
Try this, it will work:
sed 's/ /\n/g' test.txt |grep a |grep b|grep c
$ cat test.txt
abc abb cabby
$ sed 's/ /\n/g' test.txt |grep a |grep b|grep c
abc
cabby
Hope this helps.
