I run a CREATE TABLE through ODBC in SAS, and the table is saved to my permanent library. The query searches through millions of rows of data; after filtering, the table has 664 distinct sys_id rows.
I now need to take these 664 distinct sys_ids and pull every matching sys_id from the ODBC source. Specifically, I want any sys_id that has a medical claim during a certain period of time. I know how to write the query itself, but not how to join a table in my local library with an ODBC table at the same time. I have tried many things, such as joining test.sys_id with a left outer join to galaxy.sys_id, but nothing works. I have also tried to find out whether this is even possible, and I suspect it might not be. Oddly, I can do it in Access by linking the table I create to a table on the server, so I would think it is possible in SAS. I cannot run this program in Access because there is not enough memory. Any advice?
Below is the code I have tried so far:
/***the table is successfully created and saved to my libname readm*****/
proc sql;
connect to odbc (dsn=server user=user password=password);
create table readm.test as
select * from connection to odbc
(select distinct server.sys_id, server.clm_aud_nbr,
server.fst_srvc_dt, server.proc_cd
from server.table
where server.proc_cd in ('27130', '27132', '27447')
and server.fst_srvc_dt between (&startdt) and (&enddt))
order by server.sys_id, server.fst_srvc_dt;
disconnect from odbc;
quit;
proc sql;
connect to odbc (dsn=server user=user password=password);
create table readm.test2 as
select * from connection to odbc
(select libname readm.test,
server.mem_sys_id, server.clm_aud_nbr, server.fst_srvc_dt,
server.proc_cd
from libname readm.test
left outer join server.table on
readm.test_sys_id = server.table_sys_id
where server.fst_srvc_dt
between (&startdt) ad (&enddt))
disconnect from odbc;
quit;
Excellent question. We use a macro here to get around this issue, since we don't have the ability to upload files to the ODBC server or to create temp tables. A simple example of using the macro is:
proc sql noprint;
create table xx as
select *
from sashelp.class
where name in ( %ds2list(iDs=sashelp.class, iField=name, iQuote=1, iDelimiter=%str(,)) )
;
quit;
Although the example above doesn't use ODBC pass-through, it works fine with it. With OPTIONS MPRINT turned on, the log shows something like this:
121 proc sql noprint;
122 create table xx as
123 select *
124 from sashelp.class
125 where name in (%ds2list(iDs=sashelp.class,iField=name,iQuote=1, iDelimiter=%str(,)))
MPRINT(DS2LIST): 'Alfred'
MPRINT(DS2LIST): ,'Alice'
MPRINT(DS2LIST): ,'Barbara'
MPRINT(DS2LIST): ,'Carol'
MPRINT(DS2LIST): ,'Henry'
MPRINT(DS2LIST): ,'James'
MPRINT(DS2LIST): ,'Jane'
MPRINT(DS2LIST): ,'Janet'
MPRINT(DS2LIST): ,'Jeffrey'
MPRINT(DS2LIST): ,'John'
MPRINT(DS2LIST): ,'Joyce'
MPRINT(DS2LIST): ,'Judy'
MPRINT(DS2LIST): ,'Louise'
MPRINT(DS2LIST): ,'Mary'
MPRINT(DS2LIST): ,'Philip'
MPRINT(DS2LIST): ,'Robert'
MPRINT(DS2LIST): ,'Ronald'
MPRINT(DS2LIST): ,'Thomas'
MPRINT(DS2LIST): ,'William'
126 ;
127 quit;
NOTE: Table WORK.XX created, with 19 rows and 5 columns.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.15 seconds
cpu time 0.06 seconds
As you can see, it produced a comma-separated list of quoted names. You can change the delimiter as well as the quote character. There is no limit on the number of items in the list (we've used it on lists with over 100k items), because the macro streams the list instead of storing it in a macro variable. The only size limit is the maximum query length enforced by the ODBC server. The macro code is a little scary, but you can place it in your macro autocall folder and forget about it.
The macro code is below:
/***************************************************************************
** PROGRAM: MACRO.DS2LIST.SAS
**
** UTILITY PROGRAM THAT RETURNS A LIST OF FIELD VALUES FROM A
** DATASET IN DELIMITED FORMAT.
**
** PARAMETERS:
** iDs : THE LIBNAME.DATASET NAME THAT YOU WANT TO CHECK.
** iField : THE FIELD THAT CONTAINS THE VALUES YOU WANT RETURNED IN A
** DELIMITED FORMAT.
** iDelimiter: DEFAULT IS A COMMA. THE DELIMITER TO USE FOR THE RETURNED LIST.
** iDsOptions: ANY STANDARD DATASET OPTIONS THAT YOU WOULD LIKE TO APPLY SUCH
** AS A WHERE STATEMENT.
** iQuote : (0=NO,1=YES). DEFAULT=0/NO. DETERMINES WHETHER THE RETURNED
** LIST IS QUOTED OR NOT.
** iQuoteChar: (SINGLE,DOUBLE) DEFAULT=SINGLE. SPECIFIES WHETHER SINGLE
** OR DOUBLE QUOTES ARE USED WHEN QUOTING THE RETURNED LIST
**
*****************************************************************************
** VERSION:
**
** 1.0 ON: 05-FEB-2007 BY: ROBERT PENRIDGE
** CREATED.
** 1.1 ON: 29-APR-2008 BY: ROBERT PENRIDGE
** PUT IN ERROR CHECKING.
** ADDED AUTOMATIC TYPE DETECTION
** FIXED OUTPUT.
** 1.2 ON: 23-APR-2010 BY: ROBERT PENRIDGE
** CHANGED SO THAT OUTPUT SPOOLED. ALLOWS MACRO TO RETURN OUTPUT > 64KB.
** 1.3 ON: 12-MAY-2010 BY: ROBERT PENRIDGE
** ADDED PARAMETER CHECK AFTER I SPENT 10 MINUTES TRYING TO FIGURE OUT
** WHY MY CODE WAS RETURNING AN ERROR. DUH!
** 1.4 ON: 26-MAY-2010 BY: KN
** ADDED IQUOTE.
** 1.5 ON: 08-JUN-2010 BY: RP
** FIXED DCLOSE SO DATASET WOULD CLOSE PROPERLY AND RELEASE LOCK.
** 1.6 ON: 16-JUN-2010 BY: RP
** ADDED IQUOTECHAR PARAMETER
** 1.7 ON: 20-JUL-2010 BY: RP
** UNQUOTED RETURNED VALUES
** 1.8 ON: 11-OCT-2010 BY: KN
** MODIFIED TO ALLOW BLANK CHARACTER VALUES AND ALSO REMOVED TRAILING
** MODIFIED TO ALLOW PARENTHESES IN CHARACTER VALUES
*****************************************************************************/
%macro ds2list(iDs=, iField=, iDsOptions=, iDelimiter=%str(,), iQuote=0, iQuoteChar=single);
%local dsid pos rc result cnt quotechar value;
%let result=;
%let cnt=0;
%if &iQuote %then %do;
%if "%upcase(&iQuoteChar)" eq "DOUBLE" %then %do;
%let quotechar = %nrstr(%");
%end;
%else %if "%upcase(&iQuoteChar)" eq "SINGLE" %then %do;
%let quotechar = %nrstr(%');
%end;
%else %do;
%let quotechar = %nrstr(%");
%put WARNING: MACRO.DS2LIST.SAS: PARAMETER IQUOTECHAR INCORRECT. DEFAULTED TO DOUBLE;
%end;
%end;
%else %do;
%let quotechar = ;
%end;
/*
** ENSURE ALL THE REQUIRED PARAMETERS WERE PASSED IN.
*/
%if "&iDs" ne "" and "&iField" ne "" %then %do;
%let dsid=%sysfunc(open(&iDs(&iDsOptions),i));
%if &dsid %then %do;
%let pos=%sysfunc(varnum(&dsid,&iField));
%if &pos %then %do;
%let rc=%sysfunc(fetch(&dsid));
%do %while (&rc eq 0);
%if "%sysfunc(vartype(&dsid,&pos))" = "C" %then %do;
%let value = %qsysfunc(getvarc(&dsid,&pos));
%if "%trim(&value)" ne "" %then %do;
%let value = %qsysfunc(cats(%nrstr(&value)));
%end;
%end;
%else %do;
%let value = %sysfunc(getvarn(&dsid,&pos));
%end;
/* WHITESPACE/CARRIAGE RETURNS REMOVED IN THE BELOW LINE */
/* TO ENSURE NO WHITESPACE IS RETURNED IN THE OUTPUT. */
%if &cnt ne 0 %then %do;%unquote(&iDelimiter)%end;%unquote(&quotechar&value&quotechar.)
%let cnt = %eval(&cnt + 1);
%let rc = %sysfunc(fetch(&dsid));
%end;
%if &rc ne -1 %then %do;
%put WARNING: MACRO.DS2LIST.SAS: %sysfunc(sysmsg());
%end;
%end;
%else %do;
%put ERROR: MACRO.DS2LIST.SAS: FIELD &iField NOT FOUND IN DATASET %upcase(&iDs).;
%end;
%end;
%else %do;
%put ERROR: MACRO.DS2LIST.SAS: DATASET %upcase(&iDs) COULD NOT BE OPENED.;
%end;
%let rc=%sysfunc(close(&dsid));
%end;
%else %do;
%put ERROR: MACRO.DS2LIST.SAS: YOU MUST SPECIFY BOTH THE IDS AND IFIELD PARAMETERS TO CALL THIS MACRO.;
%end;
%mend;
Since you can do everything you want EXCEPT join a table from your local machine with the ODBC data, it would seem that a subquery would work.
Once your subquery gets the 664 sys_ids, that small subset is joined with the ODBC data to return only the desired records, which should not be an unreasonable number of records.
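A minimal sketch of the subquery pattern, runnable on SASHELP data. Assumptions: work.ids is a hypothetical stand-in for readm.test, sashelp.class stands in for the claims table, and the height filter stands in for the service-date range. In the real case the outer table would come through a LIBNAME ODBC libref (for example libname galaxy odbc dsn=server user=user password=password;, with galaxy taken from the question). Because the subquery reads a SAS table, SAS generally cannot pass the whole query to the database and may fetch the database rows to filter them itself; options sastrace=',,,d' sastraceloc=saslog nostsuffix; shows in the log what is actually sent.

```sas
/* Hypothetical stand-ins: work.ids plays readm.test (the short ID list)
   and sashelp.class plays the large claims table */
proc sql;
  create table work.ids as
  select distinct name
  from sashelp.class
  where age >= 15;

  create table work.matched as
  select c.*
  from sashelp.class as c
  where c.name in (select name from work.ids)  /* subquery on the local ID table */
    and c.height > 60;                          /* stands in for the service-date filter */
quit;
```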
If you are familiar with the HASH object in SAS, this is similar. Alternatively, you could use PROC SQL to create a macro variable with all the sys_ids separated by commas and use it with an IN operator (similar to the list that @Rob Penridge's macro produces).
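A sketch of the macro-variable approach, again on SASHELP data so it runs anywhere (work.ids is a hypothetical stand-in for readm.test). It wraps each value in single quotes, which most databases expect for string literals; if sys_id is numeric, select the bare column instead. A single macro variable holds at most 65,534 characters, which is plenty for 664 short IDs but is why a streaming macro helps for very long lists.

```sas
/* Hypothetical stand-in for readm.test: the small table of IDs already pulled */
data work.ids;
  set sashelp.class(keep=name age);
  where age >= 15;
run;

/* Build one quoted, comma-separated list in a macro variable */
proc sql noprint;
  select distinct "'" || strip(name) || "'"
    into :id_list separated by ','
    from work.ids;
quit;
%put NOTE: &sqlobs IDs: &id_list;

/* The list works in any WHERE ... IN clause */
proc sql;
  create table work.matches as
  select *
  from sashelp.class
  where name in (&id_list);
quit;

/* Inside explicit pass-through (needs your DSN, so shown as a comment),
   the database evaluates the same list:
   select * from connection to odbc
     (select sys_id, clm_aud_nbr, fst_srvc_dt, proc_cd
        from server.table
       where sys_id in (&id_list)
         and fst_srvc_dt between &startdt and &enddt);
*/
```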
How can I check in SAS whether a path/folder is empty?
If an Excel file exists in that path, I want to import it into a SAS dataset.
Thank you!
Try this
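%macro isemptyfolder(folder);
%local filrf rc did memcount;
%let filrf=mydir;
/* Assign fileref MYDIR to the folder (no quotes needed inside %sysfunc) */
%let rc=%sysfunc(filename(filrf, &folder));
%let did=%sysfunc(dopen(&filrf));
%if &did = 0 %then %do;
%put WARNING: Folder &folder could not be opened.;
%return;
%end;
/* DNUM returns the number of members; 0 means the folder is empty */
%let memcount=%sysfunc(dnum(&did));
%let rc=%sysfunc(dclose(&did));
%put NOTE: Number of members in folder &folder = &memcount;
%mend isemptyfolder;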
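The macro above only counts members. For the second part of the question (import any Excel file found), here is a sketch: a DATA step lists the .xlsx files with DOPEN/DNUM/DREAD, and CALL EXECUTE generates one PROC IMPORT per file, so an empty folder simply imports nothing. Assumptions: the folder is set to the WORK folder only so the code runs anywhere (replace it with your path), the output names work.excel1, work.excel2, ... are hypothetical, and DBMS=XLSX requires SAS/ACCESS Interface to PC Files.

```sas
/* Hypothetical folder to scan; replace with your own path */
%let folder = %sysfunc(pathname(work));

/* List the .xlsx files in the folder (empty or unreadable folder gives 0 rows) */
data work.xlsx_files(keep=fname);
  length fname $256;
  rc = filename('xldir', "&folder");
  did = dopen('xldir');
  if did = 0 then do;
    put "WARNING: Folder &folder could not be opened.";
    stop;
  end;
  do i = 1 to dnum(did);
    fname = dread(did, i);
    /* keep only files whose extension is xlsx, in any letter case */
    if upcase(scan(fname, -1, '.')) = 'XLSX' then output;
  end;
  rc = dclose(did);
  rc = filename('xldir', '');   /* deassign the fileref */
run;

/* Generate one PROC IMPORT per file; nothing runs if the list is empty */
data _null_;
  set work.xlsx_files;
  call execute('proc import datafile="' || "&folder/" || strip(fname)
               || '" out=work.excel' || strip(put(_n_, 8.))
               || ' dbms=xlsx replace; run;');
run;
```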
The code below produces many errors, and I want to check whether my syntax is correct. The package should get all the cases for a particular review and spool them to a file using a shell script. For now, I am concentrating on the package.
create or replace PACKAGE BODY PK_FCP_EXTRACT is
PROCEDURE sp_fcp_extract is
cursor Rev_cur is select * from t_uar_reviews where CREATED_DATE=trunc(sysdate) ;
r Rev_cur%rowtype;
cursor case_cur( c_revid IN t_uar_reviews.review_id%type )
is select *
from t_uar_cases where review_id= c_revid ;
c case_cur%rowtype;
begin
open Rev_cur;
loop
fetch Rev_cur into r;
exit when Rev_cur%notfound;
open case_cur( r.review_id );
loop
fetch case_cur into c;
exit when case_cur%notfound;
dbms_output.put_line(c.UAR_CASE_ID||','||c.UAR_REVIEW_ID||','||c.CASE_TYPE||','||c.CASE_NMBR||','||c.ACTIVE_FLAG||','|| c.CREATED_DATE);
end loop;
close case_cur;
end loop;
close Rev_cur;
end;
END PK_FCP_EXTRACT;
Your syntax is correct.
The following test code works correctly in an empty schema:
create table t_uar_reviews(
review_id number,
created_date date
)
/
create table t_uar_cases(
review_id number,
UAR_CASE_ID number,
UAR_REVIEW_ID number,
CASE_TYPE varchar2(10),
CASE_NMBR number,
active_flag varchar2(1),
created_date date)
/
create package PK_FCP_EXTRACT is
PROCEDURE sp_fcp_extract;
end;
/
create or replace PACKAGE BODY PK_FCP_EXTRACT is
PROCEDURE sp_fcp_extract is
cursor Rev_cur is select * from t_uar_reviews where CREATED_DATE=trunc(sysdate) ;
r Rev_cur%rowtype;
cursor case_cur( c_revid IN t_uar_reviews.review_id%type )
is select *
from t_uar_cases where review_id= c_revid ;
c case_cur%rowtype;
begin
open Rev_cur;
loop
fetch Rev_cur into r;
exit when Rev_cur%notfound;
open case_cur( r.review_id );
loop
fetch case_cur into c;
exit when case_cur%notfound;
dbms_output.put_line(c.UAR_CASE_ID||','||c.UAR_REVIEW_ID||','||c.CASE_TYPE||','||c.CASE_NMBR||','||c.ACTIVE_FLAG||','|| c.CREATED_DATE);
end loop;
close case_cur;
end loop;
close Rev_cur;
end;
END PK_FCP_EXTRACT;
Possible causes of trouble:
Make sure that the package spec has been created first (CREATE PACKAGE)
Ensure that the package is created in the schema that owns t_uar_reviews and t_uar_cases, or in a schema that has a direct SELECT grant on the tables (not via a role).
Make sure all of the columns you reference in the package exist in the tables.
If those are all done, it should work.
To simplify things, try using this alternate syntax for your cursor loops:
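for r in Rev_cur loop
  for c in case_cur(r.review_id) loop
    dbms_output.put_line(c.UAR_CASE_ID||','||c.UAR_REVIEW_ID||','||c.CASE_TYPE||','||c.CASE_NMBR||','||c.ACTIVE_FLAG||','||c.CREATED_DATE);
  end loop;
end loop;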
With this method you do not need to declare the records r and c (they are declared implicitly), and you do not need to open, fetch, check, or close the cursors.
I need to list all column names of all tables in a directory and its subdirectories.
However, when I use dictionary.columns, it doesn't show the datasets in the subdirectories.
Here is the code
create table Kiwi.summarytablecolumns as select * from
dictionary.columns where libname="Kiwi";
;
quit;
what I need is something like this:
Table Name | Column Name | Path
Modification of @Python R SAS's solution:
Replace the macro loop with a single DATA step that assigns a libref to each subdirectory.
Still query the dictionary table.
%let basedirectory = C:\users;
/* Get a list of all subdirectories (Windows DIR command) */
x "dir &basedirectory /s /b /o:n /ad > &basedirectory\list.txt";
filename dirs "&basedirectory\list.txt";
/* Assign a libref DIR1, DIR2, ... to each subdirectory */
data _null_;
infile dirs;
input;
ii + 1;
rc = libname(cats("DIR", ii), _infile_);
run;
Then your original query gets modified slightly:
proc sql;
create table Kiwi.summarytablecolumns as
select *
from dictionary.columns
where libname like 'DIR%';
;
quit;
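dictionary.columns stores the libref but not the folder, while dictionary.libnames stores each libref's path, so joining the two on LIBNAME gives the Table Name | Column Name | Path layout the question asks for. This sketch assumes the DIRn librefs from the DATA step above are assigned; memtype = 'DATA' leaves out views.

```sas
proc sql;
  create table work.summarytablecolumns as
  select c.memname as TableName,
         c.name    as ColumnName,
         l.path    as Path
  from dictionary.columns as c
       inner join dictionary.libnames as l
       on c.libname = l.libname          /* librefs are stored in upper case */
  where c.libname like 'DIR%'
    and c.memtype = 'DATA'
  order by Path, TableName, ColumnName;
quit;
```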
%let basedirectory = C:\users;
/* Get a list of all subdirectories */
x "dir &basedirectory /s /b /o:n /ad > &basedirectory\list.txt";
filename dirs "&basedirectory\list.txt";
/* Parse each subdirectory into a macro variable */
data _null_;
retain ii 0;
infile dirs end=last;
input;
call symput("dir" || strip(put(ii,8.)),_infile_);
if last then call symputx("dirnum", ii);
ii + 1;
run;
/* Process each macro variable and get contents in corresponding library. Append results to grand summary dataset */
%macro loopthrough;
%do ii = 0 %to &dirnum;
libname thislib "&&dir&ii";
proc contents data=thislib._all_ out=contents noprint;
run;
data contents;
set contents;
length path $200;
where missing(typemem);
TableName = memname;
ColumnName = name;
Path = "&&dir&ii";
keep TableName ColumnName Path;
run;
%if &ii = 0 %then %do;
data summary;
set contents;
run;
%end;
%else %do;
proc append base=summary data=contents;
run;
%end;
%end;
%mend;
%loopthrough;
I have a list of latitudes and longitudes with locations, and another list with latitudes and longitudes only. I need to map the second set to the approximate locations in the first list. I tried geosphere in R, but the data is too big, and I ended up getting the error message "Cannot allocate a vector of Size 718.5 GB"! Any ideas? The data we need to map is huge: close to 100M rows, divided into 48 segments, that need to be mapped to a list of lats and longs about 80k records long.
Going off of Roman Luštrik's idea, dividing this up into the smallest possible chunks is your best option. Let's start by finding the closest point one row at a time, rather than trying to load everything into memory at once. This example is a SAS-based solution.
This could be done much more efficiently by traversing hash tables, which can also be parallelized, but that is more complex to explain here. The approach below is moderately efficient but easier to follow. Let's use two example datasets:
1. Mobile_Activity_3months_scrambled.csv - http://js.cit.datalens.api.here.com/datasets/starter_pack/Mobile_activity_3months_scrambled.csv
500k rows. Let's consider this your big dataset.
2. sashelp.zipcode
41k rows. Let's consider this your small dataset.
Goal: Map each data point to the closest city.
To keep this as simple as possible, let's read just one row and match it to the nearest city. First, read in your data:
proc import
file='CHANGE DIRECTORY HERE\Mobile_activity_3months_scrambled.csv'
out=bigdata
dbms=csv
replace;
run;
Next, we'll read in one row and calculate its geographic distance with all other lat/long pairs. We will do a cartesian product with this data using SQL.
proc sql noprint;
create table nearest_point as
select geodist(t1.lat, t1.lon, t2.y, t2.x) as Distance
, t2.city as Nearest_City
from bigdata(obs=1 firstobs=1) as t1
CROSS JOIN
sashelp.zipcode as t2
where NOT missing(t2.x)
order by Distance
;
quit;
The first observation in the output dataset is your closest distance.
Let's generalize this for multiple observations. Let's do it for 10 of them, but increase the efficiency a little bit. We don't need to output all 41k observations. We just need to output the observation with the smallest distance and append it to a master table. Add the outobs=1 option to SQL.
%macro nearest_distance;
%do i = 1 %to 10;
proc sql outobs=1 noprint;
create table nearest_point as
select geodist(t1.lat, t1.lon, t2.y, t2.x) as Distance
, t2.city as Nearest_City
from bigdata(obs=&i. firstobs=&i.) as t1
CROSS JOIN
sashelp.zipcode as t2
where NOT missing(t2.x)
order by Distance
;
quit;
proc append base=all_nearest_points
data=nearest_point
force;
run;
%end;
%mend;
%nearest_distance;
Let's generalize it even more, and remove writing to the log to make it faster. Let's even pre-load our zip code data into memory, and do this for all observations. For the sake of testing an example, we will first force bigdata to be a maximum of 100 obs.
data bigdata;
set bigdata(obs=100);
run;
%macro nearest_distance;
%let dsid = %sysfunc(open(bigdata) );
%let n = %sysfunc(attrn(&dsid., nlobs) );
%let rc = %sysfunc(close(&dsid.) );
proc printto log="%sysfunc(getoption(work) )\_tmp_.txt";
run;
%do i = 1 %to &n.;
proc sql outobs=1 noprint;
create table nearest_point as
select geodist(t1.lat, t1.lon, t2.y, t2.x) as Distance
, t2.city as Nearest_City
from bigdata(obs=&i. firstobs=&i.) as t1
CROSS JOIN
sashelp.zipcode as t2
where NOT missing(t2.x)
order by Distance
;
quit;
proc append base=all_nearest_points
data=nearest_point
force;
run;
%end;
proc printto log=log;
run;
%mend;
%nearest_distance;
Next, let's parallelize it, and finish it all up. You can change the number of parallel sessions you would like to use with the threads option.
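A quick check of the row-splitting arithmetic in the %sysevalf lines of the macro (n = number of rows in bigdata, T = &threads, worker t = 1, ..., T), ignoring floating-point rounding:

```latex
\begin{aligned}
\text{firstobs}_t &= \left\lfloor n - \tfrac{n}{T}\,(T - t + 1) + 1 \right\rfloor,
\qquad
\text{obs}_t = \left\lfloor n - \tfrac{n}{T}\,(T - t) \right\rfloor \\
\text{firstobs}_{t+1} &= \left\lfloor n - \tfrac{n}{T}\,(T - t) + 1 \right\rfloor = \text{obs}_t + 1 \\
\text{firstobs}_1 &= \lfloor 1 \rfloor = 1,
\qquad
\text{obs}_T = \lfloor n \rfloor = n
\end{aligned}
```

So the T ranges are contiguous and cover rows 1 to n exactly once. For example, n = 100 and T = 3 give rows 1 to 33, 34 to 66, and 67 to 100.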
%macro nearest_distance(threads=5);
/* Parallel submit options */
options
autosignon=yes
sascmd='!sascmd'
;
/* Current session work directory */
%let workdir = %sysfunc(getoption(work) );
/* Total obs in big data */
%let dsid = %sysfunc(open(bigdata) );
%let n = %sysfunc(attrn(&dsid., nlobs) );
%let rc = %sysfunc(close(&dsid.) );
/* Load lookup table to memory */
sasfile sashelp.zipcode load;
/* Prevent writing to session log */
proc printto log="%sysfunc(getoption(work) )\_tmp_.txt";
run;
/* Run in &threads parallel sessions */
%do t = 1 %to &threads.;
/* Divide up observations for each thread */
%let firstobs = %sysevalf(&n-(&n/&threads.)*(&threads.-&t+1)+1, floor);
%let obs = %sysevalf(&n-(&n/&threads.)*(&threads.-&t.), floor);
/* Transfer primary session macro variables to each worker session */
%syslput _USER_ / remote=worker&t.;
/* Parallel calculations for data in memory */
rsubmit wait=no remote=worker&t.;
/* We are in a specific session, and must define this as a macro within the session */
%macro thread_loop;
%do i = &firstobs. %to &obs.;
/* Primary session library */
libname workdir "&workdir.";
proc sql outobs=1 noprint;
create table nearest_point as
select geodist(t1.lat, t1.lon, t2.y, t2.x) as Distance
, t2.city as Nearest_City
from workdir.bigdata(obs=&i. firstobs=&i.) as t1
CROSS JOIN
sashelp.zipcode as t2
where NOT missing(t2.x)
order by Distance
;
quit;
/* Save to primary session library */
proc append base=workdir._all_nearest_points_&t.
data=nearest_point
force;
run;
%end;
%mend;
%thread_loop;
endrsubmit;
%end;
/* Wait for all workers to end */
waitfor _ALL_;
/* Unload zipcode data from memory */
sasfile sashelp.zipcode close;
/* Append all data to the master file */
proc datasets nolist;
/* Delete final appended output data if it already exists */
delete work.all_nearest_points;
%do t = 1 %to &threads.;
append base = all_nearest_points
data = _all_nearest_points_&t.
force
;
%end;
/* Remove tmp files */
delete _all_nearest_points_:;
quit;
/* Restore log */
proc printto log=log;
run;
%mend;
%nearest_distance;
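The answer mentions that traversing a hash table is more efficient. A minimal single-pass sketch of that idea: sashelp.zipcode is loaded into a hash once, and a hash iterator scans every ZIP centroid for each input row, keeping the smallest GEODIST. It is still brute force (every point against every ZIP), but it avoids running a PROC SQL step, a sort, and a PROC APPEND per row. Assumptions: work.points is a hypothetical two-row stand-in for bigdata with the same numeric lat/lon columns, and ZIP is used as the hash key. For 100M rows you would still split the input into chunks or threads as shown above.

```sas
/* Hypothetical stand-in for bigdata: two points (lat, lon in degrees) */
data work.points;
  input lat lon;
  datalines;
40.7128 -74.0060
34.0522 -118.2437
;

data work.nearest_hash(keep=lat lon Nearest_City Distance);
  length Nearest_City $ 64;
  /* Define ZIP, X, Y and CITY in the PDV without reading any rows */
  if 0 then set sashelp.zipcode(keep=zip x y city);
  if _n_ = 1 then do;
    /* Load the lookup table into memory once, skipping rows without coordinates */
    declare hash zips(dataset: "sashelp.zipcode(keep=zip x y city where=(not missing(x) and not missing(y)))");
    zips.defineKey("zip");
    zips.defineData("x", "y", "city");
    zips.defineDone();
    declare hiter it("zips");
  end;
  set work.points;
  /* Scan every ZIP centroid and keep the closest one */
  Distance = .;
  rc = it.first();
  do while (rc = 0);
    d = geodist(lat, lon, y, x);   /* kilometres by default */
    if missing(Distance) or d < Distance then do;
      Distance = d;
      Nearest_City = city;
    end;
    rc = it.next();
  end;
run;
```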
I am using Oracle 10g with SQL Developer. When I execute the following code, it says:
"FUNCTION wafadar compiled
Warning: execution completed with warning"
create or replace function wafadar
return varchar2(10)
is
cursor c1 is
SELECT employee_id,first_name FROM employees where department_id=50 ;
begin
for i in c1
loop
dbms_output.put_line(i.first_name);
end loop;
return 'hello';
end;
SHOW ERRORS at the end does not show the warnings either. Why are there warnings?
Errors!
First, take care of errors: I bet you have one in the RETURN clause of your function (you can't specify a size for VARCHAR2 there; write RETURN VARCHAR2 without a length).
Warnings
Did you look for "warning" in the manuals?
http://download.oracle.com/docs/cd/E11882_01/appdev.112/e17126/errors.htm#LNPLS00711
How to see warnings (enable the categories you need):
alter function wafadar compile plsql_warnings='ENABLE:ALL' reuse settings;
Check:
select plsql_warnings
from user_plsql_object_settings ps
where ps.name = 'WAFADAR';
Your warnings:
In client tools such as SQL*Plus or SQL Developer (if supported):
show errors
or
select *
from user_errors ur
where ur.name = 'WAFADAR';
NAME TYPE SEQUENCE LINE POSITION TEXT ATTRIBUTE MESSAGE_NUMBER
------------------------------ ------------ ---------- ---------- ---------- -------------------------------------------------------------------------------- --------- --------------
WAFADAR FUNCTION 1 1 1 PLW-05018: unit WAFADAR omitted optional AUTHID clause; default value DEFINER us WARNING 5018
Finally, I suggest reading a little of:
How to ask: http://www.catb.org/~esr/faqs/smart-questions.html#before
http://download.oracle.com/docs/cd/E11882_01/appdev.112/e17126/toc.htm
All the "oracles" are here: http://tahiti.oracle.com/
Sql developer oracles(if you want to use it!): http://download.oracle.com/docs/cd/E11882_01/doc.112/e12152/toc.htm