Array values being overwritten in gawk - multidimensional-array

Sample of File I'm reading in
011084,31.0581,-87.0547, 25.9 AL BREWTON 3 SSE
012813,30.5467,-87.8808, 7.0 AL FAIRHOPE 2 NE
013160,32.8347,-88.1342, 38.1 AL GAINESVILLE LOCK
013511,32.7017,-87.5808, 67.1 AL GREENSBORO
013816,31.8700,-86.2542, 132.0 AL HIGHLAND HOME
015749,34.7442,-87.5997, 164.6 AL MUSCLE SHOALS AP
017157,34.1736,-86.8133, 243.8 AL SAINT BERNARD
017304,34.6736,-86.0536, 187.5 AL SCOTTSBORO
GAWK Code
#!/bin/gawk
BEGIN{
FS=",";
OFS=",";
}
{
print $1,$2,$3,$4
station=""$1 #Forces to be string
#Save latitude
stationInfo[station][lat]=$2
print "lat",stationInfo[station][lat]
#Save longitude
stationInfo[station][lon]=$3
print "lon",stationInfo[station][lon]
#Now try printing the latitude again
#It will return the value of the longitude instead
print "lat",stationInfo[station][lat]
print "---------------"
}
Sample output
011084,31.0581,-87.0547, 25.9 AL BREWTON 3 SSE
lat,31.0581
lon,-87.0547
lat,-87.0547
---------------
012813,30.5467,-87.8808, 7.0 AL FAIRHOPE 2 NE
lat,30.5467
lon,-87.8808
lat,-87.8808
---------------
For some reason the value stored in stationInfo[station][lat] is being overwritten by the longitude. I'm at a loss for what in the world is going on.
I'm using GAWK 4.1.1 on Fedora 22

The problem is that lat and lon are unquoted, so awk treats them as variables. They are uninitialized and evaluate to the empty string, which means that stationInfo[station][lat]=$2 and stationInfo[station][lon]=$3 both assign to stationInfo[station][""].
You need to quote lat and lon in those (and the other) lines so that they are string constants instead of variables.
#!/bin/gawk -f
BEGIN{
FS=",";
OFS=",";
}
{
print $1,$2,$3,$4
station=""$1 #Forces to be string
#Save latitude
stationInfo[station]["lat"]=$2
print "lat",stationInfo[station]["lat"]
#Save longitude
stationInfo[station]["lon"]=$3
print "lon",stationInfo[station]["lon"]
#Now try printing the latitude again
#It now returns the latitude, as expected
print "lat",stationInfo[station]["lat"]
print "---------------"
}
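The collision can be reproduced without the data file. The following is a minimal sketch, not part of the original script: the function name demo_subscripts and the arrays a and b are made up for illustration, and it uses one-dimensional arrays so that it runs in any POSIX awk rather than only in gawk.

```shell
#!/bin/sh
# Hypothetical demo: unquoted lat and lon are uninitialized awk variables,
# so both evaluate to "" and index the same array element.
demo_subscripts() {
    awk 'BEGIN {
        a[lat] = "31.0581"      # same as a[""] = "31.0581"
        a[lon] = "-87.0547"     # overwrites a[""]
        n = 0
        for (k in a) n++        # count the distinct keys in a
        print "unquoted:", n, a[lat]

        b["lat"] = "31.0581"    # string constants are distinct keys
        b["lon"] = "-87.0547"
        m = 0
        for (k in b) m++        # count the distinct keys in b
        print "quoted:", m, b["lat"]
    }'
}

demo_subscripts
```

The unquoted version ends up with a single key holding the longitude, while the quoted version keeps two keys and the latitude stays intact.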

Related

Extract date from a text document in R

I am again here with an interesting problem.
I have a document like shown below:
"""UDAYA FILLING STATION ps\na MATTUPATTY ROAD oe\noe 4 MUNNAR Be:\nSeat 4 04865230318 Rat\nBree 4 ORIGINAL bepas e\n\noe: Han Die MC DE ER DC I se ek OO UO a Be ten\" % aot\n: ag 29-MAY-2019 14:02:23 [i\n— INVOICE NO: 292 hee fos\nae VEHICLE NO: NOT ENTERED Bea\nss NOZZLE NO : 1 ome\n- PRODUCT: PETROL ae\ne RATE : 75.01 INR/Ltr yee\n“| VOLUME: 1.33 Ltr ae\n~ 9 =6AMOUNT: 100.00 INR mae wae\nage, Ee pel Di EE I EE oe NE BE DO DC DE a De ee De ae Cate\notome S.1T. No : 27430268741C =. ver\nnes M.S.T. No: 27430268741V ae\n\nThank You! Visit Again\n""""
From the above document, I need to extract the date (29-MAY-2019).
I tried the strptime function but did not get the desired results.
Any help will be greatly appreciated.
Thanks in advance.
Assuming you only want to capture a single date, you may use sub here:
text <- "UDAYA FILLING STATION ps\na MATTUPATTY ROAD oe\noe 4 MUNNAR Be:\nSeat 4 04865230318 Rat\nBree 4 ORIGINAL bepas e\n\noe: Han Die MC DE ER DC I se ek OO UO a Be ten\" % aot\n: ag 29-MAY-2019 14:02:23 [i\n— INVOICE NO: 292 hee fos\nae VEHICLE NO: NOT ENTERED Bea\nss NOZZLE NO : 1 ome\n- PRODUCT: PETROL ae\ne RATE : 75.01 INR/Ltr yee\n“| VOLUME: 1.33 Ltr ae\n~ 9 =6AMOUNT: 100.00 INR mae wae\nage, Ee pel Di EE I EE oe NE BE DO DC DE a De ee De ae Cate\notome S.1T. No : 27430268741C =. ver\nnes M.S.T. No: 27430268741V ae\n\nThank You! Visit Again\n"
date <- sub("^.*\\b(\\d{2}-[A-Z]+-\\d{4})\\b.*", "\\1", text)
date
[1] "29-MAY-2019"
If you need to match multiple such dates in your text, you can use regmatches along with gregexpr (regexec would return only the first match together with its capture group):
text <- "Hello World 29-MAY-2019 Goodbye World 01-JAN-2018"
regmatches(text, gregexpr("\\b\\d{2}-[A-Z]+-\\d{4}\\b", text, perl = TRUE))[[1]]
[1] "29-MAY-2019" "01-JAN-2018"

SAS plot SGPLOT

I have 3 columns A, B, C. I tried to make an overlay plot that shows one line for B and one line for C (A is the x axis). However, when I use the code below, the output looks very ugly. What is a better way to do it? Thank you.
proc plot data=djia;
plot A*B='*'
A*C='o' / overlay box;
title 'Plot of Highs and Lows';
title2 'for the Dow Jones Industrial Average';
run;
http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a002473570.htm
In SGPLOT the plotting statements, by default, plot onto the same graphing 'canvas', and thus overlay. The first statements are drawn first, so you can produce any desired 'z-effect' for the overlaying.
Example plotting djia data.
proc sgplot data=djia;
band x=year lower=low upper=high / fillattrs=(color=vlig);
series x=year y=high / markers;
series x=year y=low / markers;
run;
The SAS knowledge base article http://support.sas.com/kb/51/821.html shows how to band (fill) the region between low and high.
Data for example
* from http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000075748.htm#a000075747 ;
data djia;
input Year @7 HighDate date7. High @24 LowDate date7. Low;
format highdate lowdate date7.;
datalines;
1954 31DEC54 404.39 11JAN54 279.87
1955 30DEC55 488.40 17JAN55 388.20
1956 06APR56 521.05 23JAN56 462.35
1957 12JUL57 520.77 22OCT57 419.79
1958 31DEC58 583.65 25FEB58 436.89
1959 31DEC59 679.36 09FEB59 574.46
1960 05JAN60 685.47 25OCT60 568.05
1961 13DEC61 734.91 03JAN61 610.25
1962 03JAN62 726.01 26JUN62 535.76
1963 18DEC63 767.21 02JAN63 646.79
1964 18NOV64 891.71 02JAN64 768.08
1965 31DEC65 969.26 28JUN65 840.59
1966 09FEB66 995.15 07OCT66 744.32
1967 25SEP67 943.08 03JAN67 786.41
1968 03DEC68 985.21 21MAR68 825.13
1969 14MAY69 968.85 17DEC69 769.93
1970 29DEC70 842.00 06MAY70 631.16
1971 28APR71 950.82 23NOV71 797.97
1972 11DEC72 1036.27 26JAN72 889.15
1973 11JAN73 1051.70 05DEC73 788.31
1974 13MAR74 891.66 06DEC74 577.60
1975 15JUL75 881.81 02JAN75 632.04
1976 21SEP76 1014.79 02JAN76 858.71
1977 03JAN77 999.75 02NOV77 800.85
1978 08SEP78 907.74 28FEB78 742.12
1979 05OCT79 897.61 07NOV79 796.67
1980 20NOV80 1000.17 21APR80 759.13
1981 27APR81 1024.05 25SEP81 824.01
1982 27DEC82 1070.55 12AUG82 776.92
1983 29NOV83 1287.20 03JAN83 1027.04
1984 06JAN84 1286.64 24JUL84 1086.57
1985 16DEC85 1553.10 04JAN85 1184.96
1986 02DEC86 1955.57 22JAN86 1502.29
1987 25AUG87 2722.42 19OCT87 1738.74
1988 21OCT88 2183.50 20JAN88 1879.14
1989 09OCT89 2791.41 03JAN89 2144.64
1990 16JUL90 2999.75 11OCT90 2365.10
1991 31DEC91 3168.83 09JAN91 2470.30
1992 01JUN92 3413.21 09OCT92 3136.58
1993 29DEC93 3794.33 20JAN93 3241.95
1994 31JAN94 3978.36 04APR94 3593.35
;
In general in SGxxx procs you just add more statements to get more things to appear on the graph. For example you might want to show regression lines for AGE * WEIGHT and AGE * HEIGHT on the same graph.
proc sort data=sashelp.class out=class ;
by age;
run;
proc sgplot data=class;
reg x=age y=weight / legendlabel='Weight';
reg x=age y=height / legendlabel='Height' y2axis;
run;

parsing text file in tcl and creating dictionary of key value pair where values are in list format

How can I separate the following text file and keep only the required data for each subject?
For example, the text file has this format:
Name Roll_number Subject Experiment_name Marks Result
Joy 23 Science Exp related to magnet 45 pass
Adi 12 Science Exp electronics 48 pass
kumar 18 Maths prime numbers 49 pass
Piya 19 Maths number roots 47 pass
Ron 28 Maths decimal numbers 12 fail
After parsing the above information, I want to store it in a dictionary where the key is the subject (unique) and the value for each subject is the list of names of the students who passed.
set studentInfo [dict create]; # Creating empty dictionary
set fp [open input.txt r]
set line_no 0
while {[gets $fp line]!=-1} {
incr line_no
# Skipping line number 1 alone, as it has the column headers
# You can alter this logic, if you want to
if {$line_no==1} {
continue
}
if {[regexp {(\S+)\s+\S+\s+(\S+).*\s(\S+)} $line match name subject result]} {
if {$result eq "pass"} {
# Appending the student's name with key value as 'subject'
dict lappend studentInfo $subject $name
}
}
}
close $fp
puts [dict get $studentInfo]
Output :
Science {Joy Adi} Maths {kumar Piya}

Row count for a column

In my subreport I want to display, for example:
Number of clients born in 1972: 34
In the database I have a list of their birth years.
How can I display this number in a field?
Here is a Sample of the data:
<Born> <Name> <BleBle>
1981 Mnr EH Van Niekerk 9517
1982 MEV A BELL 9520
1972 Mnr GI van der Westhuize 9517
1987 Mnr A Juyn 9517
1983 Mev MJC Prinsloo 9513
1972 Mnr WA Van Rensburg 9517
1989 Kmdt EL Van Der Colff 9514
1972 Mnr JS Jansen Van Vuuren 9517
So if this were all the data, the output would have to be
Number of clients born in 1972: 3
Create a variable BORN_IN_1972.
Set its "Variable class" to java.lang.Integer.
Set "Calculation" to "Count".
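Set "Variable Expression" to an expression that is non-null only for the rows to be counted, for example $F{Born}.equals(1972) ? $F{Born} : null (assuming Born is an Integer field; compare against "1972" if it is a String). Using $F{Born} alone would count every row with a non-null birth year.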
Set "Initial Value Expression" to 0.
Then add a "Summary" band to your report and put the static text "Number of clients born in 1972:" and the text field "$V{BORN_IN_1972}" into it.
Assuming birth year is a string:
SELECT COUNT(*)
FROM MyClients
WHERE birth_year = '1972'
And if birth year is being used as an input control:
SELECT COUNT(*)
FROM MyClients
WHERE birth_year = $P{birth_year}
To count non-zero records in jasper use the expression below -
( $F{test} == 0.0 ? null : $F{test} )

Merging two files horizontally and formatting

I have two files as follows:
File_1
Austin
Los Angeles
York
San Ramon
File_2
Texas
California
New York
California
I want to merge them horizontally as follows:
Austin Texas
Los Angeles California
York New York
San Ramon California
I am able to merge horizontally by using paste command, but the formatting is going haywire.
Austin Texas
Los Angeles California
York New York
San Ramon California
I realize that paste is working as it is supposed to, but can someone point me in the right direction to get the formatting right.
Thanks.
paste uses a tab when 'merging' the files, so you may have to post-process the output and replace the tab with spaces:
paste File_1 File_2 | awk 'BEGIN { FS = "\t" } ; {printf("%-20s%s\n",$1,$2) }'
result:
Austin              Texas
Los Angeles         California
York                New York
San Ramon           California
First, check the number of characters in the longest line of the first file. Then pad each line of the first file to at least that length (printf can do this) and finish with paste.
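The longest-line approach can be sketched as follows. This is one possible reading of it, using awk's printf for the padding; the function name merge_padded and the two-space gap between columns are choices made for this example, not something given in the question.

```shell
#!/bin/sh
# Hypothetical helper: merge two files side by side, padding the first
# column to the length of its longest line plus a two-space gap.
merge_padded() {
    # First pass: length of the longest line in the first file.
    width=$(awk 'length($0) > max { max = length($0) } END { print max + 0 }' "$1")
    # Second pass: paste joins the lines with a tab; awk pads the first field.
    paste "$1" "$2" |
        awk -F '\t' -v w="$width" '{
            fmt = "%-" w "s  %s\n"   # e.g. "%-11s  %s\n" when w is 11
            printf fmt, $1, $2
        }'
}

# Usage with the sample data from the question, written to a temp directory.
tmp=$(mktemp -d)
printf '%s\n' 'Austin' 'Los Angeles' 'York' 'San Ramon' > "$tmp/File_1"
printf '%s\n' 'Texas' 'California' 'New York' 'California' > "$tmp/File_2"
merge_padded "$tmp/File_1" "$tmp/File_2"
rm -r "$tmp"
```

Because the width is computed from the data, nothing has to be guessed in advance; the cost is that the first file is read twice.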
If you have an idea about the field width, you could do something like this:
IFS_BAK="$IFS"
IFS=$'\t'
paste File_1 File_2 \
| while read -r city state; do
printf "%-15s %-15s\n" "$city" "$state"
done
IFS="$IFS_BAK"
Or this shorter version:
paste File_1 File_2 | while IFS=$'\t' read -r city state; do
printf "%-15s %-15s\n" "$city" "$state"
done
Or use the column tool from bsdmainutils:
paste File_1 File_2 | column -s $'\t' -t
