Output R dataframe to SAS format Issue - r

I have a dataset that looks like this:
df_dummy = data.frame(
Company=c("0001","0002","0003","0004","0005"),
Measure=c("A","B","C","D","E"),
Num=c(10,10,10,10,10),
Den=c(20,20,20,20,20),
Rate=c(50.0,50.0,50.0,50.0,50.0)
)
df_dummy$Company <- as.character(df_dummy$Company)
df_dummy$Measure <- as.character(df_dummy$Measure)
I am using this to export to an .xpt file
write.xport(df_dummy, file = "data/tmp.xpt")
lookup.xport("data/tmp.xpt")
In SAS, I use this code to import:
libname sasfile 'PATH\data';
libname xptfile xport 'PATH\data\tmp.xpt' access=readonly;
proc copy inlib=xptfile outlib=sasfile;
run;
The table looks fine, but the rate doesn't show the decimal point.
In my actual dataset, there are a lot more rows but it's the same format essentially and if I run a lookup.xport I get this:
Variables in data set `MEASURES':
dataset name type format flength fdigits iformat iflength ifdigits label nobs
MEASURES ID character 0 0 0 0 29064
MEASURES MEASURE character 0 0 0 0 29064
MEASURES NUM numeric 0 0 0 0 29064
MEASURES DEN numeric 0 0 0 0 29064
MEASURES RATE numeric 0 0 0 0 29064
However, if I use the same SAS code to import this, I get something that looks completely off and I can't figure out what's causing it.

I cannot replicate your issue using R (3.4.1) and SAS (9.4 TS1M4) on Mac OS X with both being 64 bit versions. The 32/64 bit versions can cause issues sometimes.
I used R Studio and SAS UE, both freely available for education usage.
Full R code:
install.packages("SASxport")
library("SASxport")
df_dummy = data.frame(
Company=c("0001","0002","0003","0004","0005"),
Measure=c("A","B","C","D","E"),
Num=c(10,10,10,10,10),
Den=c(20,20,20,20,20),
Rate=c(50.0,50.0,50.0,50.0,50.0)
)
df_dummy$Company <- as.character(df_dummy$Company)
df_dummy$Measure <- as.character(df_dummy$Measure)
write.xport(df_dummy, file = "tmp.xpt")
Full SAS Code:
libname sasfile '/folders/myfolders/';
libname xptfile xport '/folders/myfolders/tmp.xpt' access=readonly;
proc copy inlib=xptfile outlib=sasfile;
run;

Your example works. Even with older version or R. Make sure your transport file had not been corrupted by transferring between machines. A transport file is binary data with fixed length 80 byte records, but much of data looks like ASCII codes.
SAS transport files follow the SAS V5 rules for names. Make sure that your member name and variable names are valid SAS names and are not longer than 8 characters. Character variables cannot be longer than 200 characters.
You can quickly look at the file using a simple data step. Especially for your small example. So if you see that the length is not exactly a multiple of 80 or you see that the header records do not start at the beginning of an 80 byte record then something has corrupted the file.
56 data _null_;
57 infile '/test/tmp.xpt' lrecl=80 recfm=f ;
58 input;
59 list;
60 run;
NOTE: The infile '/test/tmp.xpt' is:
Filename=/test/tmp.xpt,
Owner Name=xxxxx,Group Name=xxxxx,
Access Permission=-rw-r--r--,
Last Modified=29Sep2017:09:16:16,
File Size (bytes)=1680
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1 HEADER RECORD*******LIBRARY HEADER RECORD!!!!!!!000000000000000000000000000000
2 CHAR SAS SAS SASLIB 7.00 R 3.0.2. 29SEP17:09:16:16
ZONE 54522222545222225454442232332222523232302222222222222222222222223354533333333333
NUMR 3130000031300000313C92007E000000203E0E200000000000000000000000002935017A09A16A16
3 29SEP17:09:16:16
4 HEADER RECORD*******MEMBER HEADER RECORD!!!!!!!000000000000000001600000000140
5 HEADER RECORD*******DSCRPTR HEADER RECORD!!!!!!!000000000000000000000000000000
6 CHAR SAS DF_DUMMYSASDATA 7.00 R 3.0.2. 29SEP17:09:16:16
ZONE 54522222445454455454454232332222523232302222222222222222222222223354533333333333
NUMR 3130000046F45DD9313414107E000000203E0E200000000000000000000000002935017A09A16A16
7 29SEP17:09:16:16
8 HEADER RECORD*******NAMESTR HEADER RECORD!!!!!!!000000000500000000000000000000
9 CHAR ........COMPANY ........
ZONE 00000000444544522222222222222222222222222222222222222222222222220000000022222222
NUMR 020008013FD01E900000000000000000000000000000000000000000000000000000000000000000
10 CHAR ....................................................................MEASURE
ZONE 00000000000000000000000000000000000000000000000000000000000000000000444555422222
NUMR 00000000000000000000000000000000000000000000000000000000000002000802D51352500000
11 CHAR ........ ....................
ZONE 22222222222222222222222222222222222222222222000000002222222200000000000000000000
NUMR 00000000000000000000000000000000000000000000000000000000000000000008000000000000
12 CHAR ................................................NUM
ZONE 00000000000000000000000000000000000000000000000045422222222222222222222222222222
NUMR 000000000000000000000000000000000000000001000803E5D00000000000000000000000000000
13 CHAR ........ ........................................
ZONE 22222222222222222222222200000000222222220000000100000000000000000000000000000000
NUMR 00000000000000000000000000000000000000000000000000000000000000000000000000000000
14 CHAR ............................DEN
ZONE 00000000000000000000000000004442222222222222222222222222222222222222222222222222
NUMR 000000000000000000000100080445E0000000000000000000000000000000000000000000000000
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
15 CHAR ........ ............................................................
ZONE 22220000000022222222000000010000000000000000000000000000000000000000000000000000
NUMR 00000000000000000000000000080000000000000000000000000000000000000000000000000000
16 CHAR ........RATE ........
ZONE 00000000545422222222222222222222222222222222222222222222222222220000000022222222
NUMR 01000805214500000000000000000000000000000000000000000000000000000000000000000000
17 CHAR ....... ....................................................
ZONE 00000002000000000000000000000000000000000000000000000000000022222222222222222222
NUMR 00000000000000000000000000000000000000000000000000000000000000000000000000000000
18 HEADER RECORD*******OBS HEADER RECORD!!!!!!!000000000000000000000000000000
19 CHAR 0001 A A ......B.......B2......0002 B A ......B.......B2......
ZONE 33332222422222224A000000410000004300000033332222422222224A0000004100000043000000
NUMR 00010000100000001000000024000000220000000002000020000000100000002400000022000000
20 CHAR 0003 C A ......B.......B2......0004 D A ......B.......B2......
ZONE 33332222422222224A000000410000004300000033332222422222224A0000004100000043000000
NUMR 00030000300000001000000024000000220000000004000040000000100000002400000022000000
21 CHAR 0005 E A ......B.......B2......
ZONE 33332222422222224A00000041000000430000002222222222222222222222222222222222222222
NUMR 00050000500000001000000024000000220000000000000000000000000000000000000000000000
NOTE: 21 records were read from the infile '/test/tmp.xpt'.

Related

Extracting record from big endian data

I have the following code for network protocol implementation. As the protocol is big endian, I wanted to use the Bit_Order attribute and High_Order_First value but it seems I made a mistake.
With Ada.Unchecked_Conversion;
with Ada.Text_IO; use Ada.Text_IO;
with System; use System;
procedure Bit_Extraction is
type Byte is range 0 .. (2**8)-1 with Size => 8;
type Command is (Read_Coils,
Read_Discrete_Inputs
) with Size => 7;
for Command use (Read_Coils => 1,
Read_Discrete_Inputs => 4);
type has_exception is new Boolean with Size => 1;
type Frame is record
Function_Code : Command;
Is_Exception : has_exception := False;
end record
with Pack => True,
Size => 8;
for Frame use
record
Function_Code at 0 range 0 .. 6;
Is_Exception at 0 range 7 .. 7;
end record;
for Frame'Bit_Order use High_Order_First;
for Frame'Scalar_Storage_Order use High_Order_First;
function To_Frame is new Ada.Unchecked_Conversion (Byte, Frame);
my_frame : Frame;
begin
my_frame := To_Frame (Byte'(16#32#)); -- Big endian version of 16#4#
Put_Line (Command'Image (my_frame.Function_Code)
& " "
& has_exception'Image (my_frame.Is_Exception));
end Bit_Extraction;
Compilation is ok but the result is
raised CONSTRAINT_ERROR : bit_extraction.adb:39 invalid data
What did I forget or misunderstand ?
UPDATE
The real record in fact is
type Frame is record
Transaction_Id : Transaction_Identifier;
Protocol_Id : Word := 0;
Frame_Length : Length;
Unit_Id : Unit_Identifier;
Function_Code : Command;
Is_Exception : Boolean := False;
end record with Size => 8 * 8, Pack => True;
for Frame use
record
Transaction_Id at 0 range 0 .. 15;
Protocol_Id at 2 range 0 .. 15;
Frame_Length at 4 range 0 .. 15;
Unit_id at 6 range 0 .. 7;
Function_Code at 7 range 0 .. 6;
Is_Exception at 7 range 7 .. 7;
end record;
Where Transaction_Identifier, Word and Length are 16-bit wide.
These ones are displayed correctly if I remove the Is_Exception field and extend Function_Code to 8 bits.
The dump of the frame to decode is as following:
00000000 00 01 00 00 00 09 11 03 06 02 2b 00 64 00 7f
So my only problem is really to extract the 8th bit of the last byte.
So,
for Frame use
record
Transaction_Id at 0 range 0 .. 15;
Protocol_Id at 2 range 0 .. 15;
Frame_Length at 4 range 0 .. 15;
Unit_id at 6 range 0 .. 7;
Function_Code at 7 range 0 .. 6;
Is_Exception at 7 range 7 .. 7;
end record;
It seems you want Is_Exception to be the the LSB of the last byte?
With for Frame'Bit_Order use System.High_Order_First; the LSB will be bit 7,
(also, 16#32# will never be -- Big endian version of 16#4#, the bit pattern just doesn't match)
It may be more intuitive and clear to specify all of your fields relative to the word they're in, rather than the byte:
Unit_ID at 6 range 0..7;
Function_Code at 6 range 8 .. 14;
Is_Exception at 6 range 15 .. 15;
Given the definition of Command above, the legal values for the last byte will then be:
2 -> READ_COILS FALSE
3 -> READ_COILS TRUE
8 -> READ_DISCRETE_INPUTS FALSE
9 -> READ_DISCRETE_INPUTS TRUE
BTW,
by applying your update to your original program, and adding/changing the following, you program works for me
add
with Interfaces;
add
type Byte_Array is array(1..8) of Byte with Pack;
change, since we don't know the definition
Transaction_ID : Interfaces.Unsigned_16;
Protocol_ID : Interfaces.Unsigned_16;
Frame_Length : Interfaces.Unsigned_16;
Unit_ID : Interfaces.Unsigned_8;
change
function To_Frame is new Ada.Unchecked_Conversion (Byte_Array, Frame);
change
my_frame := To_Frame (Byte_Array'(00, 01, 00, 00, 00, 09, 16#11#, 16#9#));
I finally found what was wrong.
In fact, the Modbus Ethernet Frame definition mentioned that, in case of exception, the returned code should be the function code plus 128 (0x80) (see explanation on Wikipedia). That's the reason why I wanted to represent it through a Boolean value but my representation clauses were wrong.
The correct clauses are these ones :
for Frame use
record
Transaction_Id at 0 range 0 .. 15;
Protocol_Id at 2 range 0 .. 15;
Frame_Length at 4 range 0 .. 15;
Unit_id at 6 range 0 .. 7;
Is_Exception at 6 range 8 .. 8;
Function_Code at 6 range 9 .. 15;
end record;
This way, the Modbus network protocol is correctly modelled (or not but at least, my code is working).
I really thank egilhh and simonwright for making me find what was wrong and explain the semantics behind the aspects.
Obviously, I don't know who reward :)
Your original record declaration works fine (GNAT complains about the Pack, warning: pragma Pack has no effect, no unplaced components). The problem is with working out the little-endian Byte.
---------------------------------
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | BE bit numbers
---------------------------------
| c c c c c c c | e |
---------------------------------
| 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | LE bit numbers
---------------------------------
so if you want the Command to be Read_Discrete_Inputs, the Byte needs to have BE bit 4 (LE bit 3) set i.e. LE 16#8#.
Take a look at this AdaCore post on bit order and byte order to see how they handle it. After reading that, you will probably find that the bit order of your frame value is really 16#08#, which probably is not what you are expecting.
Big Endian / Little Endian typically refers to Byte order rather than bit order, so when you see that Network protocols are Big Endian, they mean Byte order. Avoid setting Bit_Order for your records. In modern systems, you will almost never need that.
Your record is only one byte in size, so byte order won't matter for it by itself. Byte order comes into play when you have larger field values (>8 bits long).
The bit_order pragma doesn't reverse the order that the bits appear in memory. It simply defines whether the most significant bit (left most) will be logically referred to as zero (High_Order_First) or the least significant bit will be referred to as zero (Low_Order_First) when interpreting the First_Bit and Last_Bit offsets from the byte position in the representation clause. Keep in mind that these offsets are taken from the MSB or LSB of the scalar the record component belongs to AS A VALUE. So in order for the byte positions to carry the same meaning on a little endian CPU as they do on a big endian CPU (as well as the in memory representation of multibyte machine scalars, which exist when one or more record components with the same byte position have a last_bit value which exceeds the capacity of a single byte) then 'Scalar_Storage_Order must also be specified.

R data.table fread fails on special character

I can only give you picture of data I'm working with or the character that creates my problems in .csv file. I don't know how to get that character.
This pillar character is stopping fread working. Is there away to escape it? readr read_csv works through them with no problem. I have tried to drop, make it character column, use comment.char = "", but nothing seems to work.
Here what I'm hoping to get out (what I get out with read_csv)
# A tibble: 5 x 4
X1 trade date trade_condition
<dbl> <dbl> <date> <chr>
1 2902 28.3 2019-01-14 -12------P----
2 2903 28.0 2019-01-14 P
3 2904 28.0 2019-01-14 P
4 2905 28.0 2019-01-14 P
5 2906 28.1 2019-01-14 P
I'm using data.table_1.12.0
Here is Verbose = T
omp_get_max_threads() = 8
omp_get_thread_limit() = 2147483647
DTthreads = 0
RestoreAfterFork = true
Input contains no \n. Taking this to be a filename to open
[01] Check arguments
Using 8 threads (omp_get_max_threads()=8, nth=8)
NAstrings = [<<NA>>]
None of the NAstrings look like numbers.
show progress = 1
0/1 column will be read as integer
[02] Opening the file
Opening file C:/Users/Markku/Desktop/KONECRANES_2019.01.14/trades.csv
File opened, size = 592KB (606768 bytes).
Memory mapped ok
[03] Detect and skip BOM
[04] Arrange mmap to be \0 terminated
\n has been found in the input and different lines can end with different line endings (e.g. mixed \n and \r\n in one file). This is common and ideal.
[05] Skipping initial rows if needed
Positioned on line 1 starting: <<,trade,date,trade_condition,sy>>
[06] Detect separator, quoting rule, and ncolumns
Detecting sep automatically ...
sep=',' with 100 lines of 9 fields using quote rule 0
Detected 9 columns on line 1. This line is either column names or first data row. Line starts as: <<,trade,date,trade_condition,sy>>
Quote rule picked = 0
fill=false and the most number of columns found is 9
[07] Detect column types, good nrow estimate and whether first row is column names
Number of sampling jump points = 10 because (606767 bytes from row 1 to eof) / (2 * 27623 jump0size) == 10
Type codes (jump 000) : 57AAAA5AA Quote rule 0
A line with too-few fields (4/9) was found on line 4 of sample jump 7. Most likely this jump landed awkwardly so type bumps here will be skipped.
A line with too-few fields (4/9) was found on line 13 of sample jump 9. Most likely this jump landed awkwardly so type bumps here will be skipped.
Type codes (jump 010) : 57AAAA5AA Quote rule 0
'header' determined to be true due to column 2 containing a string on row 1 and a lower type (float64) in the rest of the 858 sample rows
=====
Sampled 858 rows (handled \n inside quoted fields) at 11 jump points
Bytes from first data row on line 2 to the end of last row: 606683
Line length: mean=213.01 sd=86.78 min=59 max=372
Estimated number of rows: 606683 / 213.01 = 2849
Initial alloc = 5698 rows (2849 + 100%) using bytes/max(mean-2*sd,min) clamped between [1.1*estn, 2.0*estn]
=====
[08] Assign column names
[09] Apply user overrides on column types
After 0 type and 0 drop user overrides : 57AAAA5AA
[10] Allocate memory for the datatable
Allocating 9 column slots (9 - 0 dropped) with 5698 rows
[11] Read the data
jumps=[0..1), chunk_size=606683, total_size=606683
Restarting team from jump 0. nSwept==0 quoteRule==1
jumps=[0..1), chunk_size=606683, total_size=606683
Restarting team from jump 0. nSwept==0 quoteRule==2
jumps=[0..1), chunk_size=606683, total_size=606683
Restarting team from jump 0. nSwept==0 quoteRule==3
jumps=[0..1), chunk_size=606683, total_size=606683
Read 2903 rows x 9 columns from 592KB (606768 bytes) file in 00:00.014 wall clock time
[12] Finalizing the datatable
Type counts:
2 : int32 '5'
1 : float64 '7'
6 : string 'A'
=============================
0.003s ( 21%) Memory map 0.001GB file
0.007s ( 50%) sep=',' ncol=9 and header detection
0.000s ( 0%) Column type detection using 858 sample rows
0.000s ( 0%) Allocation of 5698 rows x 9 cols (0.000GB) of which 2903 ( 51%) rows used
0.004s ( 29%) Reading 1 chunks (0 swept) of 0.579MB (each chunk 2903 rows) using 1 threads
+ 0.000s ( 0%) Parse to row-major thread buffers (grown 0 times)
+ 0.002s ( 14%) Transpose
+ 0.002s ( 14%) Waiting
0.000s ( 0%) Rereading 0 columns due to out-of-sample type exceptions
0.014s Total
Warning message:
In fread(trades_file, verbose = T) :
Stopped early on line 2905. Expected 9 fields but found 4. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<2903,28.04,2019-01-14,"P>>

Obtaining pixel value from 16-bit tiff images using magick package in R

Does anyone know how to obtain the pixel value for each channel (RGB) from 16-bit tiff images using the magick package in R? Currently I am using Mathematica to perform this operation, because I could not find an equivalent way to doing it in mathematica.
I have tried to read the pixel value from the image-magick package and the results is a raw type (e.g. "ff"). I used the function rawToNum (package "pack") to convert the raw type to numeric and the results is close to what I obtain using ImageDate function in Mathematica, but not exactly the same.
You can also access the pixels as a numeric array with the magick package. The example is based on this vignette from the package.
library(magick)
tiger <- image_read('http://jeroen.github.io/images/tiger.svg')
tiger_tiff <- image_convert(tiger, "tiff")
# Access data in raw format and convert to integer
tiger_array <- as.integer(tiger_tiff[[1]])
Then if check the dimension and type you get:
dim(tiger_array)
[1] 900 900 4
is.numeric(tiger_array)
[1] TRUE
I don't know too much about R at all, but I guess you can "shell out" and execute an external command, using system() or somesuch.
If, so, maybe you can use this. First, let's make a 16-bit TIFF file that is a gradient from red-blue just 10 pixels wide and 1 pixel tall:
convert -size 10x1 gradient:red-blue image.tiff
Now we can dump the pixels to a file using ImageMagick:
convert image.tiff rgb:image.rgb
# Now check its length - yes, 60 bytes = 10 pixels with 2 bytes each for RG &B
ls -l image.rgb
-rw-r--r-- 1 mark staff 60 11 Jul 10:32 image.rgb
We can also write the data to stdout like this:
convert image.tiff rgb:-
and also look at it with 1 pixel per line (6 bytes)
convert image.tiff rgb:- | xxd -g 3 -c 6
00000000: ffff00 000000 ...... # Full Red, no Green, no Blue
00000006: 8de300 00721c ....r. # Lots of Red, no Green, a little Blue
0000000c: 1cc700 00e338 .....8
00000012: aaaa00 005555 ....UU
00000018: 388e00 00c771 8....q
0000001e: c77100 00388e .q..8.
00000024: 555500 00aaaa UU....
0000002a: e33800 001cc7 .8....
00000030: 721c00 008de3 r.....
00000036: 000000 00ffff ...... # No Red, no Green, full Blue
I'm hoping you can do something like that in R, with:
system("convert image.tif rgb:-")
Another way of dumping the pixels might be with Perl to slurp the entire file and then unpack the contained unsigned shorts and print them one per line:
convert image.tiff rgb: | perl -e 'my $str=do{local $/; <STDIN>}; print join("\n",unpack("v*",$str)),"\n";'
Sample Output
65535 # Full Red
0 # No Green
0 # No Blue
58253 # Lots of Red
0 # No Green
7282 # A little Blue
50972 # Moderate Red
0
14563
43690
0
21845
36408
0
29127
29127
0
36408
21845
0
43690
14563
0
50972
7282
0 # No Green
58253 # Lots of Blue
0 # No Red
0 # No Green
65535 # Full Blue
Another way of seeing the data may be using od and awk like this:
convert image.tiff rgb: | od -An -tuS | awk '{for(i=1;i<=NF;i++){print $i}}'
65535
0
0
58253
0
7282
50972
0
14563
43690
0
21845
36408
0
29127
29127
0
36408
21845
0
43690
14563
0
50972
7282
0
58253
0
0
65535
where the -An suppresses printing of the address, and the -tuS says the type of the data is unsigned short.
Perhaps a slightly simpler way in ImageMagick would be to use the txt: output format.
Using Mark Setchell's image:
convert -size 10x1 gradient:red-blue image.tiff
Using TXT: as
convert image.tiff txt: | sed -n 's/^.*[(]\(.*\)[)].*[#].*$/\1/p'
Produces:
65535,0,0
58253,0,7282
50972,0,14563
43690,0,21845
36408,0,29127
29127,0,36408
21845,0,43690
14563,0,50972
7282,0,58253
0,0,65535
or Using TXT: to include the pixel coordinates
convert image.tiff txt: | sed -n 's/^\(.*[)]\).*[#].*$/\1/p'
Produces:
0,0: (65535,0,0)
1,0: (58253,0,7282)
2,0: (50972,0,14563)
3,0: (43690,0,21845)
4,0: (36408,0,29127)
5,0: (29127,0,36408)
6,0: (21845,0,43690)
7,0: (14563,0,50972)
8,0: (7282,0,58253)
9,0: (0,0,65535)
Thank you all. The best answer I found was given by a student of mine using the package raster:
library(raster)
img <- stack(filename)
x <- as.matrix(raster(img, 1)) # here we specify the layer 1, 2 or 3
The only issue is that the function as.matrix of the package raster may be confused with the one from the base package, so it may be necessary to specify raster::as.matrix.
#Nate_A gave the answer, but three cents short of a dollar.
After
dim(tiger_array)
[1] 900 900 4
You get each channel color of first pixel by
tiger_array[1,1,1] # red
tiger_array[1,1,2] # green
tiger_array[1,1,3] # blue
Or if you prefer between 0 - 255
tiger_array[1,1,1]*255
tiger_array[1,1,2]*255
tiger_array[1,1,3]*255

Counting observations using multiple BY groups SAS

I am examining prescription patterns within a large EHR dataset. The data is structured so that we are given several key bits of information, such as patient_num, encounter_num, ordering_date, medication, age_event (age at event) etc. Example below:
Patient_num enc_num ordering_date medication age_event
1111 888888 07NOV2008 Wellbutrin 48
1111 876578 11MAY2011 Bupropion 50
2222 999999 08DEC2009 Amitriptyline 32
2222 999999 08DEC2009 Escitalopram 32
3333 656463 12APR2007 Imipramine 44
3333 643211 21DEC2008 Zoloft 45
3333 543213 02FEB2009 Fluoxetine 45
Currently I have the dataset sorted by patient_id then by ordering_date so that I can see what each individual was prescribed during their encounters in a longitudinal fashion. For now, I am most concerned with the prescription(s) that were made during their first visit. I wrote some code to count the number of prescriptions and had originally restricted later analyses to RX = 1, but as we can see, that doesn't work for people with multiple scripts on the same encounter (Patient 2222).
data pt_meds_;
set pt_meds;
by patient_num;
if first.patient_num then RX = 1;
else RX + 1;
run;
Patient_num enc_num ordering_date medication age_event RX
1111 888888 07NOV2008 Wellbutrin 48 1
1111 876578 11MAY2011 Bupropion 50 2
2222 999999 08DEC2009 Amitriptyline 32 1
2222 999999 08DEC2009 Escitalopram 32 2
3333 656463 12APR2007 Imipramine 44 1
3333 643211 21DEC2008 Zoloft 45 2
3333 543213 02FEB2009 Fluoxetine 45 3
I think it would be more appropriate to recode the encounter numbers into a new variable so that they reflect a style similar to the RX variable. Where each encounter is listed 1-n, and the number will repeat if multiple scripts are made in the same encounter. Such as below:
Patient_num enc_num ordering_date medication age_event RX Enc_
1111 888888 07NOV2008 Wellbutrin 48 1 1
1111 876578 11MAY2011 Bupropion 50 2 2
2222 999999 08DEC2009 Amitriptyline 32 1 1
2222 999999 08DEC2009 Escitalopram 32 2 1
3333 656463 12APR2007 Imipramine 44 1 1
3333 643211 21DEC2008 Zoloft 45 2 2
3333 543213 02FEB2009 Fluoxetine 45 3 3
From what I have seen, this could be possible with a variant of the above code using 2 BY groups (patient_num & enc_num), but I can't seem to get it. I think the first. / last. codes require sorting, but if I am to sort by enc_num, they won't be in chronological order because the encounter numbers are generated by the system and depend on all other encounters going in at that time.
I tried to do the following code (using ordering_date instead because its already sorted properly) but everything under Enc_ is printed as a 1. I'm sure my logic is all wrong. Any thoughts?
data pt_meds_test;
set pt_meds_;
by patient_num ordering_date;
if first.patient_num;
if first.ordering_date then enc_ = 1;
else enc_ + 1;
run;
First
.First/.Last flags doesn't require sorting if data is properly ordered or you use NOTSORTED in your BY statement. If your variable in BY statement is not properly ordered then BY statment will throw error and stop executing when encounter deviations. Like this:
data class;
set sashelp.class;
by age;
first = first.age;
last = last.age;
run;
ERROR: BY variables are not properly sorted on data set SASHELP.CLASS.
Name=Alfred Sex=M Age=14 Height=69 Weight=112.5 FIRST.Age=1 LAST.Age=1 first=. last=. _ERROR_=1 _N_=1
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 2 observations read from the data set SASHELP.CLASS.
Try this code to see how exacly .first/.last flags works:
data pt_meds_test;
set pt_meds_;
by patient_num ordering_date;
fp = first.patient_num;
lp = last.patient_num;
fo = first.ordering_date;
lo = last.ordering_date;
run;
Second
Those condidions works differently than you think:
if expression;
If expression is true then continue with next instructions after if.
Otherwise return to begining of data step (no implicit output). This also implies your observation is not retained in the output.
In most cases if without then is equivalent to where. However
whereworks faster but it is limited to variables that comes from data set you are reading
if can be used with any type of expression including calculated fields
More info:: IF
Statement, Subsetting
Third
I think lag() function can be your answear.
data pt_meds_test;
set pt_meds_;
by patient_num;
retain enc_;
prev_patient_num = lag(patient_num);
prev_ordering_date = lag(ordering_date);
if first.patient_num then enc_ = 1;
else if patient_num = prev_patient_num and ordering_date ne prev_ordering_date then enc_ + 1;
end;
run;
With lag() function you can look what was the value of vairalbe on the previos observation and compare it with current one later.
But be carefull. lag() doesn't look for variable value from previous observation. It takes vale of variable and stores it in a FIFO queue with size of 1. On next call it retrives stored value from queue and put new value there.
More info: LAG Function
I'm not sure if this hurts the rest of your analysis, but what about just
proc freq data=pt_meds noprint;
tables patient_num ordering_date / out=pt_meds_freq;
run;
data pt_meds_freq2;
set pt_meds_freq;
by patient_num ordering_date;
if first.patient_num;
run;

parsing text file in tcl and creating dictionary of key value pair where values are in list format

How to seperate following text file and Keep only require data for corresponding :
for example text file have Format:
Name Roll_number Subject Experiment_name Marks Result
Joy 23 Science Exp related to magnet 45 pass
Adi 12 Science Exp electronics 48 pass
kumar 18 Maths prime numbers 49 pass
Piya 19 Maths number roots 47 pass
Ron 28 Maths decimal numbers 12 fail
after parsing above Information and storing in dictionary where key is subject(unique) and values corresponding to subject is list of pass Student name
set studentInfo [dict create]; # Creating empty dictionary
set fp [open input.txt r]
set line_no 0
while {[gets $fp line]!=-1} {
incr line_no
# Skipping line number 1 alone, as it has the column headers
# You can alter this logic, if you want to
if {$line_no==1} {
continue
}
if {[regexp {(\S+)\s+\S+\s+(\S+).*\s(\S+)} $line match name subject result]} {
if {$result eq "pass"} {
# Appending the student's name with key value as 'subject'
dict lappend studentInfo $subject $name
}
}
}
close $fp
puts [dict get $studentInfo]
Output :
Science {Joy Adi} Maths {kumar Piya}

Resources