How can you check in SAS whether a path/folder is empty? If an Excel file exists in that path, I then want to import it into a SAS dataset.
Thank you!
Try this:
%macro isemptyfolder(folder);
  %local filrf rc did memcount;
  %let filrf=mydir;
  %let rc=%sysfunc(filename(filrf, "&folder"));
  %let did=%sysfunc(dopen(&filrf));
  %let memcount=%sysfunc(dnum(&did));
  %let rc=%sysfunc(dclose(&did));
  /* clear the fileref so it can be reused */
  %let rc=%sysfunc(filename(filrf));
  %put number of members in folder &folder = &memcount;
%mend isemptyfolder;
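The macro only reports the member count; acting on it still needs the import step. For illustration, here is the same check-then-collect logic sketched in Python (the function name is hypothetical; in SAS each returned file would go to PROC IMPORT):

```python
from pathlib import Path

def import_if_excel(folder):
    """Count the members of `folder`, then return any Excel files found.

    Mirrors the macro's logic: report the member count, and only act
    when spreadsheet files are present. Assumes `folder` exists.
    """
    members = list(Path(folder).iterdir())
    print(f"number of members in folder {folder} = {len(members)}")
    # Only these would be handed to PROC IMPORT in the SAS version.
    return [p for p in members if p.suffix in (".xls", ".xlsx")]
```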
I need to list all column names of all tables in a directory and its subdirectories.
However, I tried dictionary.columns, but it doesn't show the datasets in the subdirectories.
Here is the code:
proc sql;
create table Kiwi.summarytablecolumns as
select * from dictionary.columns
where libname="KIWI";
quit;
What I need is something like this:
Table Name | Column Name | Path
A modification of @Python R SAS's solution: changed from a macro to a full data step solution, still querying the dictionary tables.
/* Get a list of all subdirectories */
x "dir &basedirectory /s /b /o:n /ad > &basedirectory\list.txt";
filename dirs "&basedirectory\list.txt";
/* Assign a libref DIRn to each subdirectory */
data _null_;
retain ii 0;
infile dirs end=last;
input;
ii + 1;
rc = libname(catt("DIR", put(ii,8. -l)), _infile_);
run;
Then your original query gets modified slightly:
proc sql;
create table Kiwi.summarytablecolumns as
select *
from dictionary.columns
where libname like 'DIR%';
quit;
%let basedirectory = C:\users;
/* Get a list of all subdirectories */
x "dir &basedirectory /s /b /o:n /ad > &basedirectory\list.txt";
filename dirs "&basedirectory\list.txt";
/* Parse each subdirectory into a macro variable */
data _null_;
retain ii 0;
infile dirs end=last;
input;
call symput("dir" || strip(put(ii,8.)),_infile_);
if last then call symput("dirnum", ii);
ii + 1;
run;
/* Process each macro variable and get contents in corresponding library. Append results to grand summary dataset */
%macro loopthrough;
%do ii = 0 %to &dirnum;
libname thislib "&&&dir&ii";
proc contents data=thislib._all_ out=contents noprint;
run;
data contents;
set contents;
length path $200;
where missing(typemem);
TableName = memname;
ColumnName = name;
Path = "&&&dir&ii";
keep TableName ColumnName Path;
run;
%if &ii. = 0 %then %do;
data summary;
set contents;
run;
%end;
%else %do;
proc append base=summary data=contents;
run;
%end;
%end;
%mend;
%loopthrough;
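For intuition, the same traversal can be sketched outside SAS. This hypothetical Python version walks the subdirectories and collects (table name, path) pairs; reading the actual column names would still require a reader for the dataset format:

```python
from pathlib import Path

def summarize_tables(basedirectory, suffix=".sas7bdat"):
    """Walk basedirectory and all subdirectories, returning
    (table_name, path) pairs for every dataset file found.

    This plays the role of `dir /s /b` plus the libname loop above;
    the suffix and function name are illustrative.
    """
    rows = []
    for f in Path(basedirectory).rglob(f"*{suffix}"):
        rows.append((f.stem, str(f.parent)))
    return rows
```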
I have a list of latitudes and longitudes with locations, and another list with lats and longs only. I need to map this second set to an approximation of locations from the first list. I tried geosphere in R, but the data is too big, and I ended up with the error "Cannot allocate a vector of Size 718.5 GB"! Any ideas? The data to map is huge: close to 100M rows, divided into 48 segments, that need to be matched against a list of lats and longs roughly 80k records long.
Going off of Roman Luštrik's idea, dividing this up into chunks as small as possible is going to be your best bet. Let's start by finding the closest point on a per-row basis, rather than trying to load everything into memory at once. This will be a SAS-based solution.
This could also be done much more efficiently with hash tables, and that approach can be parallelized as well, but it would be more complex to explain here. The per-row method has medium efficiency but is easier to follow. Let's use two example datasets:
1. Mobile_Activity_3months_scrambled.csv - http://js.cit.datalens.api.here.com/datasets/starter_pack/Mobile_activity_3months_scrambled.csv
500k rows. Let's consider this your big dataset.
2. sashelp.zipcode
41k rows. Let's consider this your small dataset.
Goal: Map each data point to the closest city.
To keep this as simple as possible, let's read just one row and match it to the nearest city. First, read in your data:
proc import
file='CHANGE DIRECTORY HERE\Mobile_activity_3months_scrambled.csv'
out=bigdata
dbms=csv
replace;
run;
Next, we'll read in one row and calculate its geographic distance from all other lat/long pairs, using a Cartesian product in SQL.
proc sql noprint;
create table nearest_point as
select geodist(t1.lat, t1.lon, t2.y, t2.x) as Distance
, t2.city as Nearest_City
from bigdata(obs=1 firstobs=1) as t1
CROSS JOIN
sashelp.zipcode as t2
where NOT missing(t2.x)
order by Distance
;
quit;
The first observation in the output dataset is your closest distance.
Let's generalize this for multiple observations. We'll do it for 10 of them, while increasing the efficiency a little: we don't need to output all 41k observations, only the one with the smallest distance, which we append to a master table. Add the outobs=1 option to PROC SQL.
%macro nearest_distance;
%do i = 1 %to 10;
proc sql outobs=1 noprint;
create table nearest_point as
select geodist(t1.lat, t1.lon, t2.y, t2.x) as Distance
, t2.city as Nearest_City
from bigdata(obs=&i. firstobs=&i.) as t1
CROSS JOIN
sashelp.zipcode as t2
where NOT missing(t2.x)
order by Distance
;
quit;
proc append base=all_nearest_points
data=nearest_point
force;
run;
%end;
%mend;
%nearest_distance;
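The per-row minimum that the SQL computes can be sketched in a few lines of Python, assuming a haversine distance roughly equivalent to SAS's GEODIST (function names here are illustrative):

```python
import math

def geodist_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in km, approximating GEODIST."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_city(lat, lon, lookup):
    """lookup: iterable of (city, lat, lon). Returns (city, distance_km)
    for the smallest distance -- what ORDER BY Distance + outobs=1 does."""
    return min(((c, geodist_km(lat, lon, y, x)) for c, y, x in lookup),
               key=lambda t: t[1])
```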
Let's generalize it even more: suppress log output to make it faster, pre-load the zip code data into memory, and run it for all observations. For the sake of testing, we will first cap bigdata at 100 obs.
data bigdata;
set bigdata(obs=100);
run;
%macro nearest_distance;
%let dsid = %sysfunc(open(bigdata) );
%let n = %sysfunc(attrn(&dsid., nlobs) );
%let rc = %sysfunc(close(&dsid.) );
proc printto log="%sysfunc(getoption(work) )\_tmp_.txt";
run;
%do i = 1 %to &n.;
proc sql outobs=1 noprint;
create table nearest_point as
select geodist(t1.lat, t1.lon, t2.y, t2.x) as Distance
, t2.city as Nearest_City
from bigdata(obs=&i. firstobs=&i.) as t1
CROSS JOIN
sashelp.zipcode as t2
where NOT missing(t2.x)
order by Distance
;
quit;
proc append base=all_nearest_points
data=nearest_point
force;
run;
%end;
proc printto log=log;
run;
%mend;
%nearest_distance;
Next, let's parallelize it and finish it all up. You can change the number of parallel sessions with the threads parameter.
%macro nearest_distance(threads=5);
/* Parallel submit options */
options
autosignon=yes
sascmd='!sascmd'
;
/* Current session work directory */
%let workdir = %sysfunc(getoption(work) );
/* Total obs in big data */
%let dsid = %sysfunc(open(bigdata) );
%let n = %sysfunc(attrn(&dsid., nlobs) );
%let rc = %sysfunc(close(&dsid.) );
/* Load lookup table to memory */
sasfile sashelp.zipcode load;
/* Prevent writing to session log */
proc printto log="%sysfunc(getoption(work) )\_tmp_.txt";
run;
/* Run in &threads parallel sessions */
%do t = 1 %to &threads.;
/* Divide up observations for each thread */
%let firstobs = %sysevalf(&n-(&n/&threads.)*(&threads.-&t+1)+1, floor);
%let obs = %sysevalf(&n-(&n/&threads.)*(&threads.-&t.), floor);
/* Transfer primary session macro variables to each worker session */
%syslput _USER_ / remote=worker&t.;
/* Parallel calculations for data in memory */
rsubmit wait=no remote=worker&t.;
/* We are in a specific session, and must define this as a macro within the session */
%macro thread_loop;
%do i = &firstobs. %to &obs.;
/* Primary session library */
libname workdir "&workdir.";
proc sql outobs=1 noprint;
create table nearest_point as
select geodist(t1.lat, t1.lon, t2.y, t2.x) as Distance
, t2.city as Nearest_City
from workdir.bigdata(obs=&i. firstobs=&i.) as t1
CROSS JOIN
sashelp.zipcode as t2
where NOT missing(t2.x)
order by Distance
;
quit;
/* Save to primary session library */
proc append base=workdir._all_nearest_points_&t.
data=nearest_point
force;
run;
%end;
%mend;
%thread_loop;
endrsubmit;
%end;
/* Wait for all workers to end */
waitfor _ALL_;
/* Unload zipcode data from memory */
sasfile sashelp.zipcode close;
/* Append all data to the master file */
proc datasets nolist;
/* Delete final appended output data if it already exists */
delete work.all_nearest_points;
%do t = 1 %to &threads.;
append base = all_nearest_points
data = _all_nearest_points_&t.
force
;
%end;
/* Remove tmp files */
delete _all_nearest_points_:;
quit;
/* Restore log */
proc printto log=log;
run;
%mend;
%nearest_distance;
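The observation-splitting arithmetic used for firstobs and obs above can be checked with a small sketch (Python used purely for illustration; it mirrors the two %sysevalf expressions):

```python
import math

def thread_ranges(n, threads):
    """Reproduce the macro's per-thread split:
       firstobs = floor(n - (n/threads)*(threads - t + 1) + 1)
       obs      = floor(n - (n/threads)*(threads - t))
    Returns a list of (firstobs, obs) tuples, one per thread.
    Note: as in %sysevalf, float rounding can shift a boundary by one
    when n/threads is not exact."""
    ranges = []
    for t in range(1, threads + 1):
        firstobs = math.floor(n - (n / threads) * (threads - t + 1) + 1)
        obs = math.floor(n - (n / threads) * (threads - t))
        ranges.append((firstobs, obs))
    return ranges
```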
I'm aware that we can assign a macro variable in SAS, for example:
%let date1=Mar16;
%put &date1;
Proc sql;
Create table temp as
Select *
From Dd_base.my_table_&date1 ;
Quit;
How can I do this in Teradata?
Select *
From Dd_base.my_table_&date1 ;
I need to pass a date value into the table name. Is there a way to do this in Teradata?
I am working on a project with a database. This database is very simple. There is only one table with 2 columns : id (int) and text (string).
To fill this base I want to create a .sql script file.
(this database isn't created inside an android project because I want an already filled database to insert in my android project)
I want my script to create the table and then read a .txt file with a string value (for text column) on each row.
For each row, it should insert the string value into the table.
I am not very familiar with SQLite and SQL in general.
I already found a way to auto-increment the id using an iterator (but I didn't test it yet), but I couldn't find how to read a .txt file line by line.
So my question is : Is it possible to read a .txt file line by line in a SQLite script ?
And if it is, could you please tell me how to do it.
Here's a solution in pure sqlite
CREATE TEMP TABLE input (value STRING);
INSERT INTO input VALUES (TRIM(readfile('input.txt'), char(10)));
CREATE TABLE lines (s STRING);
WITH RECURSIVE
nn (s, rest)
AS (
SELECT
(SELECT SUBSTR(input.value, 0, INSTR(input.value, char(10))) FROM input),
(SELECT SUBSTR(input.value, INSTR(input.value, char(10)) + 1) FROM input)
UNION ALL
SELECT
CASE INSTR(nn.rest, char(10))
WHEN 0 THEN nn.rest
ELSE SUBSTR(nn.rest, 0, INSTR(nn.rest, char(10)))
END,
CASE INSTR(nn.rest, char(10))
WHEN 0 THEN ''
ELSE SUBSTR(nn.rest, INSTR(nn.rest, char(10)) + 1)
END
FROM nn
WHERE LENGTH(nn.rest) > 0
)
INSERT INTO lines (s)
SELECT nn.s FROM nn;
DROP TABLE input;
A few subtleties here:
SQLite does not have a \n escape, so you have to use char(10)
readfile() is only available in the sqlite3 command-line shell (or via the fileio extension), not in plain SQL
this doesn't work well for mixed newlines or \r\n newlines (though you can adjust some + 1s to + 2s and char(10) to char(13) || char(10))
most of the magic is in the recursive union in the middle, which nibbles off a line at a time
note that I'm using this approach to solve advent of code -- https://github.com/anthonywritescode/aoc2020
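Since readfile() is shell-only, the CTE itself can be exercised from any host language by inserting the file contents manually. A Python sketch, reusing the SQL above verbatim apart from that one substitution:

```python
import sqlite3

def split_lines_in_sqlite(text):
    """Run the recursive line-splitting CTE against in-memory SQLite.
    `text` stands in for readfile('input.txt')."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TEMP TABLE input (value TEXT)")
    con.execute("INSERT INTO input VALUES (TRIM(?, char(10)))", (text,))
    con.execute("CREATE TABLE lines (s TEXT)")
    con.execute("""
        WITH RECURSIVE nn (s, rest) AS (
            SELECT
              (SELECT SUBSTR(input.value, 0, INSTR(input.value, char(10))) FROM input),
              (SELECT SUBSTR(input.value, INSTR(input.value, char(10)) + 1) FROM input)
            UNION ALL
            SELECT
              CASE INSTR(nn.rest, char(10))
                WHEN 0 THEN nn.rest
                ELSE SUBSTR(nn.rest, 0, INSTR(nn.rest, char(10)))
              END,
              CASE INSTR(nn.rest, char(10))
                WHEN 0 THEN ''
                ELSE SUBSTR(nn.rest, INSTR(nn.rest, char(10)) + 1)
              END
            FROM nn
            WHERE LENGTH(nn.rest) > 0
        )
        INSERT INTO lines (s) SELECT nn.s FROM nn
    """)
    return [row[0] for row in con.execute("SELECT s FROM lines ORDER BY rowid")]
```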
SQLite is an embedded database; it is designed to be used together with some 'real' programming language.
There are no functions to access and parse text files.
You have to write your own script in whatever language you like, or use some existing tool.
If there is a character that is guaranteed not to occur in the text file, you can use the sqlite3 command-line shell and a temporary, one-column table for importing:
CREATE TEMP TABLE i(txt);
.separator ~
.import MyFile.txt i
INSERT INTO TheRealTable(text) SELECT txt FROM i; -- assumes id is autoincrementing
DROP TABLE i;
I think the simplest way is to convert the txt file to a csv file. Then you can import it directly in sqlite3 or via a programming language.
sqlite> .mode csv
sqlite> .import file_name.csv table_name
You can use a BufferedReader for that. The code could look like:
InputStream in = context.getResources().openRawResource( R.raw.your_txt_file );
BufferedReader reader = new BufferedReader( new InputStreamReader( in ) );
String line = null;
while( null != ( line = reader.readLine() ) ){
doStuffWithLine( line );
}
reader.close();
Yes, reading a .txt file line by line in a SQLite script is possible. But you'll need to use an extension. Specifically, sqlean-fileio can do the job.
Its fileio_scan(path) function reads the file specified by path line by line without loading the whole file into memory.
For example:
$ echo 'one' > data.txt
$ echo 'two' >> data.txt
$ echo 'three' >> data.txt
create table data(id integer primary key, txt text);
insert into data(txt)
select value from fileio_scan('data.txt');
select * from data;
┌────┬───────┐
│ id │ txt │
├────┼───────┤
│ 1 │ one │
│ 2 │ two │
│ 3 │ three │
└────┴───────┘
That's it!
So my question is : Is it possible to read a .txt file line by line in a SQLite script ?
Yes.
And if it is, could you please tell me how to do it.
There we go:
Pseudo-code algorithm:
Open the file.
Read line by line and insert new row in the database.
Close resources and commit transactions.
1) Open the file
InputStream instream = new FileInputStream("myfilename.txt");
InputStreamReader inputreader = new InputStreamReader(instream);
BufferedReader buffreader = new BufferedReader(inputreader);
2) Read line by line and insert new row in database
List<String> nameList = new ArrayList<>();
String line;
do {
line = buffreader.readLine();
if (line != null){
nameList.add(line);
}
} while (line != null);
Now you should insert all names in database:
storeNamesInDB(nameList);
Where
private void storeNamesInDB(List<String> nameList){
    String sql = "INSERT INTO table (col1) VALUES (?)";
    db.beginTransaction();
    SQLiteStatement stmt = db.compileStatement(sql);
    for (int i = 0; i < nameList.size(); i++) {
        stmt.bindString(1, nameList.get(i));
        stmt.execute();
        stmt.clearBindings();
    }
    db.setTransactionSuccessful();
    db.endTransaction();
}
3) Close resources
Don't forget to close resources:
instream.close();
inputreader.close();
DISCLAIMER!
You shouldn't copy-paste this code as-is. Replace the variable names and some of the instructions with ones that make sense in your project. This is just an idea.
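The same read-and-insert pattern, sketched with Python's built-in sqlite3 module for comparison (table and column names are illustrative; the transaction plays the role of beginTransaction/endTransaction):

```python
import sqlite3

def store_lines_in_db(db_path, txt_path):
    """Read txt_path line by line and insert each line as a row,
    wrapped in a single transaction like the Java version."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS t (id INTEGER PRIMARY KEY, text TEXT)")
    with con:  # commits on success, rolls back on error
        with open(txt_path, encoding="utf-8") as f:
            con.executemany(
                "INSERT INTO t (text) VALUES (?)",
                ((line.rstrip("\n"),) for line in f),
            )
    return con.execute("SELECT text FROM t ORDER BY id").fetchall()
```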
What if I run a CREATE TABLE using ODBC in SAS? The table is now saved to my permanent SAS library. It searched through millions of rows of data, and after filtering I have a table with 664 distinct sys_id rows.
I now need to pull all sys_ids from the ODBC source that match these 664, looking for any sys_id that has a medical claim during a certain period of time. I know how to do the query part, but I'm not sure how to join a table in my local library with an ODBC table at the same time. I have tried lots of different things, like a left outer join from test.sys_id to galaxy.sys_id, but nothing works, and from what I've read I'm starting to think it might not be possible. The odd thing is I can do it in Access by taking the table I create and linking it to a table on the server, so I would think it would be possible in SAS. I cannot run this program in Access (not enough memory). Any advice?
Below is the code I have tried so far:
/***the table is successfully created and saved to my libname readm*****/
proc sql;
connect to odbc (dsn=server user=user password=password);
create table readm.test as
select * from connection to odbc
(select distinct server.sys_id, server.clm_aud_nbr,
server.fst_srvc_dt, server.proc_cd
from server.table
where server.proc_cd in ('27130', '27132', '27447')
and server.fst_srvc_dt between (&startdt) and (&enddt))
order by server.sys_id, server.fst_srvc_dt;
disconnect from odbc;
quit;
proc sql;
connect to odbc (dsn=server user=user password=password);
create table readm.test2 as
select * from connection to odbc
(select libname readm.test,
server.mem_sys_id, server.clm_aud_nbr, server.fst_srvc_dt,
server.proc_cd
from libname readm.test
left outer join server.table on
readm.test_sys_id = server.table_sys_id
where server.fst_srvc_dt
between (&startdt) ad (&enddt))
disconnect from odbc;
quit;
Excellent question... We have a macro that we use here to get around that issue, since we don't have the ability to upload files to the ODBC server or create temp tables, etc. A simple example of using the macro is:
proc sql noprint;
create table xx as
select *
from sashelp.class
where name in ( %ds2list(iDs=sashelp.class, iField=name, iQuote=1, iDelimiter=%str(,)) )
;
quit;
Although the example above doesn't use ODBC pass-through, it will work fine with it. If OPTION MPRINT is on, the log shows something like this:
121 proc sql noprint;
122 create table xx as
123 select *
124 from sashelp.class
125 where name in (%ds2list(iDs=sashelp.class,iField=name,iQuote=1, iDelimiter=%str(,)))
MPRINT(DS2LIST): 'Alfred'
MPRINT(DS2LIST): ,'Alice'
MPRINT(DS2LIST): ,'Barbara'
MPRINT(DS2LIST): ,'Carol'
MPRINT(DS2LIST): ,'Henry'
MPRINT(DS2LIST): ,'James'
MPRINT(DS2LIST): ,'Jane'
MPRINT(DS2LIST): ,'Janet'
MPRINT(DS2LIST): ,'Jeffrey'
MPRINT(DS2LIST): ,'John'
MPRINT(DS2LIST): ,'Joyce'
MPRINT(DS2LIST): ,'Judy'
MPRINT(DS2LIST): ,'Louise'
MPRINT(DS2LIST): ,'Mary'
MPRINT(DS2LIST): ,'Philip'
MPRINT(DS2LIST): ,'Robert'
MPRINT(DS2LIST): ,'Ronald'
MPRINT(DS2LIST): ,'Thomas'
MPRINT(DS2LIST): ,'William'
126 ;
127 quit;
NOTE: Table WORK.XX created, with 19 rows and 5 columns.
NOTE: PROCEDURE SQL used (Total process time):
real time 0.15 seconds
cpu time 0.06 seconds
As you can see, it produces a comma-separated list of quoted names. You can change the delimiter as well as the quote characters. There is no limit to the number of items in the list (we've used it on lists with over 100k items) because the list is 'streamed' by the macro rather than stored in a macro variable; the only size limit is the one enforced by the ODBC server's maximum query size. The macro's code is a little scary, but place it in your macro autocall folder and forget about it.
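For intuition, the core of what the macro emits can be mimicked in a few lines (a simplified sketch: unlike the real macro it builds the list in memory and does not handle embedded quotes):

```python
def ds2list(values, quote=True, quotechar="'", delimiter=","):
    """Build a delimited, optionally quoted list like the macro emits,
    e.g. for an IN (...) clause pushed through ODBC pass-through."""
    q = quotechar if quote else ""
    return delimiter.join(f"{q}{v}{q}" for v in values)
```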
The macro code is below:
/***************************************************************************
** PROGRAM: MACRO.DS2LIST.SAS
**
** UTILITY PROGRAM THAT DETECTS RETURNS A LIST OF FIELD VALUES FROM A
** DATASET IN DELIMITED FORMAT.
**
** PARAMETERS:
** iDs : THE LIBNAME.DATASET NAME THAT YOU WANT TO CHECK.
** iField : THE FIELD THAT CONTAINS THE VALUES YOU WANT RETURNED IN A
** DELIMITED FORMAT.
** iDelimiter: DEFAULT IS A COMMA. THE DELIMITER TO USE FOR THE RETURNED LIST.
** iDsOptions: ANY STANDARD DATASET OPTIONS THAT YOU WOULD LIKE TO APPLY SUCH
** AS A WHERE STATEMENT.
** iQuote : (0=NO,1=YES). DEFAULT=0/NO. DETERMINES WHETHER THE RETURNED
** LIST IS QUOTED OR NOT.
** iQuoteChar: (SINGLE,DOUBLE) DEFAULT=SINGLE. SPECIFIES WHETHER SINGLE
**             OR DOUBLE QUOTES ARE USED WHEN QUOTING THE RETURNED LIST.
**
*****************************************************************************
** VERSION:
**
** 1.0 ON: 05-FEB-2007 BY: ROBERT PENRIDGE
** CREATED.
** 1.1 ON: 29-APR-2008 BY: ROBERT PENRIDGE
** PUT IN ERROR CHECKING.
** ADDED AUTOMATIC TYPE DETECTION
** FIXED OUTPUT.
** 1.2 ON: 23-APR-2010 BY: ROBERT PENRIDGE
** CHANGED SO THAT OUTPUT SPOOLED. ALLOWS MACRO TO RETURN OUTPUT > 64KB.
** 1.3 ON: 12-MAY-2010 BY: ROBERT PENRIDGE
** ADDED PARAMETER CHECK AFTER I SPENT 10 MINUTES TRYING TO FIGURE OUT
** WHY MY CODE WAS RETURNING AN ERROR. DUH!
** 1.4 ON: 26-MAY-2010 BY: KN
** ADDED IQUOTE.
** 1.5 ON: 08-JUN-2010 BY: RP
** FIXED DCLOSE SO DATASET WOULD CLOSE PROPERLY AND RELEASE LOCK.
** 1.6 ON: 16-JUN-2010 BY: RP
** ADDED IQUOTECHAR PARAMETER
** 1.7 ON: 20-JUL-2010 BY: RP
** UNQUOTED RETURNED VALUES
** 1.8 ON: 11-OCT-2010 BY: KN
** MODIFIED TO ALLOW BLANK CHARACTER VALUES AND ALSO REMOVED TRAILING
** MODIFIED TO ALLOW PARENTHESES IN CHARACTER VALUES
*****************************************************************************/
%macro ds2list(iDs=, iField=, iDsOptions=, iDelimiter=%str(,), iQuote=0, iQuoteChar=single);
%local dsid pos rc result cnt quotechar value;
%let result=;
%let cnt=0;
%if &iQuote %then %do;
%if "%upcase(&iQuoteChar)" eq "DOUBLE" %then %do;
%let quotechar = %nrstr(%");
%end;
%else %if "%upcase(&iQuoteChar)" eq "SINGLE" %then %do;
%let quotechar = %nrstr(%');
%end;
%else %do;
%let quotechar = %nrstr(%");
%put WARNING: MACRO.DS2LIST.SAS: PARAMETER IQUOTECHAR INCORRECT. DEFAULTED TO DOUBLE;
%end;
%end;
%else %do;
%let quotechar = ;
%end;
/*
** ENSURE ALL THE REQUIRED PARAMETERS WERE PASSED IN.
*/
%if "&iDs" ne "" and "&iField" ne "" %then %do;
%let dsid=%sysfunc(open(&iDs(&iDsOptions),i));
%if &dsid %then %do;
%let pos=%sysfunc(varnum(&dsid,&iField));
%if &pos %then %do;
%let rc=%sysfunc(fetch(&dsid));
%do %while (&rc eq 0);
%if "%sysfunc(vartype(&dsid,&pos))" = "C" %then %do;
%let value = %qsysfunc(getvarc(&dsid,&pos));
%if "%trim(&value)" ne "" %then %do;
%let value = %qsysfunc(cats(%nrstr(&value)));
%end;
%end;
%else %do;
%let value = %sysfunc(getvarn(&dsid,&pos));
%end;
/* WHITESPACE/CARRIAGE RETURNS REMOVED IN THE BELOW LINE */
/* TO ENSURE NO WHITESPACE IS RETURNED IN THE OUTPUT. */
%if &cnt ne 0 %then %do;%unquote(&iDelimiter)%end;%unquote(&quotechar&value&quotechar.)
%let cnt = %eval(&cnt + 1);
%let rc = %sysfunc(fetch(&dsid));
%end;
%if &rc ne -1 %then %do;
%put WARNING: MACRO.DS2LIST.SAS: %sysfunc(sysmsg());
%end;
%end;
%else %do;
%put ERROR: MACRO.DS2LIST.SAS: FIELD &iField NOT FOUND IN DATASET %upcase(&iDs).;
%end;
%end;
%else %do;
%put ERROR: MACRO.DS2LIST.SAS: DATASET %upcase(&iDs) COULD NOT BE OPENED.;
%end;
%let rc=%sysfunc(close(&dsid));
%end;
%else %do;
%put ERROR: MACRO.DS2LIST.SAS: YOU MUST SPECIFY BOTH THE IDS AND IFIELD PARAMETERS TO CALL THIS MACRO.;
%end;
%mend;
Since you are able to do everything that you want EXCEPT join a table from your local machine with the ODBC data, it would seem that a subquery would work.
Once your subquery gets the 664 sys_ids, that small subset is joined with the ODBC data to return only the desired records... which should not be an unreasonable number of records.
See the PROC SQL documentation on subqueries.
If you are familiar with the HASH object in SAS, this is similar. Another prior approach: use PROC SQL to create a macro variable with all the sys_ids separated by commas, then use it with an IN operator in a DATA step (like what @Rob Penridge does in his macro).