SAS: how to get summary counts within the same dataset

I have a dataset that looks like:
id,colour
12,blue
12,green
12,yellow
13,blue
14,black
15,blue
15,green
In the same dataset I would like to have the count of rows for each id.
Ultimately, what I want to do is eliminate the ids that have more than one row.
In SQL I would use a SUM OVER() windowing function, or self join the table with the counts of each id.
What's the best way to do this in SAS? With the counts added, the dataset would look like this:
id,colour,num
12,blue,3
12,green,3
12,yellow,3
13,blue,1
14,black,1
15,blue,2
15,green,2
My end result, after excluding the duplicate ids, is going to look like this:
id,colour
13,blue
14,black

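Use PROC SORT in SAS 9.3+ to split the data by key variables: UNIQUEOUT= receives the observations whose key is unique (want), while NOUNIQUEKEY leaves the observations with duplicate keys in OUT= (duprec).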
proc sort data=have out=duprec nouniquekey uniqueout=want;
by id;
run;
In SAS SQL you could remerge directly, something that most other SQL dialects do not support. You could further limit the query with a HAVING clause to get your final output directly.
proc sql;
create table want3 as
select *
from have
group by id
having count(*)=1;
quit;
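For readers who want to check the logic outside SAS, the count-then-filter idea from the question (a self join against the per-id counts) can be sketched in Python with the standard sqlite3 module. This is only an illustration under assumptions, not the SAS method above: the table name have comes from the answer, while the function names are made up for this example.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [(12, "blue"), (12, "green"), (12, "yellow"),
        (13, "blue"), (14, "black"), (15, "blue"), (15, "green")]


def build_have():
    """Create an in-memory table named 'have' holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE have (id INTEGER, colour TEXT)")
    conn.executemany("INSERT INTO have VALUES (?, ?)", ROWS)
    return conn


def with_counts(conn):
    """Attach the per-id row count to every row via a self join."""
    return conn.execute("""
        SELECT h.id, h.colour, c.num
        FROM have AS h
        JOIN (SELECT id, COUNT(*) AS num FROM have GROUP BY id) AS c
          ON c.id = h.id
        ORDER BY h.id, h.colour
    """).fetchall()


def unique_ids_only(conn):
    """Keep only the rows whose id occurs exactly once."""
    return [(i, colour) for i, colour, num in with_counts(conn) if num == 1]
```

Calling unique_ids_only(build_have()) leaves only the rows for ids 13 and 14, matching the end result shown in the question.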

Related

Search similar value in another table using IN (SQLite)

I'm using SQLite to deal with a large amount of data (around 100 GB).
I need to search for the values of one column in another table in the fastest way possible.
For example, I need to find the following values of Table 1
[COD]
C62
K801
And then find them in Table 2:
[COD_2]
C60-C63
K80-K81
My desired result is something like:
[COD_1] [COD_2]
C62 C60-C63
K801 K80-K81
Since I have a lot of data, it is inefficient to do something like:
SELECT *
FROM TABLE_1, TABLE_2
WHERE COD_1 LIKE '%' || COD_2 || '%';
Instead, I was trying to do this:
SELECT *
FROM TABLE_1
WHERE COD_1 IN (SELECT COD_2 FROM TABLE_2);
Of course, this returns nothing because the codes are not exactly the same. Is there a way to search for similar values of one column (something like the LIKE operator) in another table by using IN? Or another way that doesn't cross join TABLE_1 and TABLE_2?
Thank you!!!
Based on the small data set shown, and my presumed answer to @Shawn's question (K801 is a typo and is meant to be K80 or K81), I assume the following problem description:
Find a row in COD_2 such that the value in COD_1 lies between {value1} and {value2} of the {value1}-{value2} entry in COD_2, the "-" being significant and dependable.
I cannot speak to speed, but I would approach it this way:
SELECT value1, value2
from COD_1,COD_2
where value1 between substr(value2,1,instr(value2,'-')-1) and substr(value2,instr(value2,'-')+1)
The thought being: split the value from COD_2 into a "start" and an "end" value, then compare the COD_1 value against that range.
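The range-splitting query can be tried out with Python's standard sqlite3 module. This is a minimal sketch under assumptions: the table and column names (table_1.cod_1, table_2.cod_2) follow the question rather than the answer, the function name match_codes is hypothetical, and nothing here says anything about speed on 100 GB of data.

```python
import sqlite3


def match_codes(codes, ranges):
    """Pair each code with every 'start-end' range that contains it.

    The comparison is a plain string comparison, exactly as in the
    BETWEEN/substr/instr query from the answer.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_1 (cod_1 TEXT)")
    conn.execute("CREATE TABLE table_2 (cod_2 TEXT)")
    conn.executemany("INSERT INTO table_1 VALUES (?)", [(c,) for c in codes])
    conn.executemany("INSERT INTO table_2 VALUES (?)", [(r,) for r in ranges])
    rows = conn.execute("""
        SELECT cod_1, cod_2
        FROM table_1, table_2
        WHERE cod_1 BETWEEN substr(cod_2, 1, instr(cod_2, '-') - 1)
                        AND substr(cod_2, instr(cod_2, '-') + 1)
        ORDER BY cod_1
    """).fetchall()
    conn.close()
    return rows
```

With the sample data, match_codes(["C62", "K801"], ["C60-C63", "K80-K81"]) pairs C62 with C60-C63 and K801 with K80-K81. K801 matches even as written, because as a string it sorts between K80 and K81.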

SQLite3 split date while creating index

I'm using a SQLite3 database, and I have a table that looks like this:
The database is quite big and running queries is very slow. I'm trying to speed up the process by indexing some of the columns. One of the columns that I want to index is the QUOTE_DATETIME column.
Problem: I want to index by date (YYYY-MM-DD) only, not by date and time (YYYY-MM-DD HH:MM:SS), which is the data I currently have in QUOTE_DATETIME.
Question: How can I use CREATE INDEX to create an index that uses only dates in the format YYYY-MM-DD? Should I split QUOTE_DATETIME into 2 columns: QUOTE_DATE and QUOTE_TIME? If so, how can I do that? Is there an easier solution?
Thanks for helping! :D
Attempt 1: I tried running CREATE INDEX id ON DATA (date(QUOTE_DATETIME)) but I got the error Error: non-deterministic functions prohibited in index expressions.
Attempt 2: I ran ALTER TABLE data ADD COLUMN QUOTE_DATE TEXT to create a new column to hold the date only. And then INSERT INTO data(QUOTE_DATE) SELECT date(QUOTE_DATETIME) FROM data. The date(QUOTE_DATETIME) should convert the date + time to only date, and the INSERT INTO should add the new values to QUOTE_DATE. However, it doesn't work and I don't know why. The new column ends up not having anything added to it.
Expression indexes must not use functions that might change their return value based on data not mentioned in the function call itself. The date() function is such a function because it might use the current time or the current time zone setting (for example with 'now' or 'localtime').
However, in SQLite 3.20 or later, you can use date() in indexes as long as you do not use 'now' or a time zone modifier such as 'localtime' or 'utc'.
INSERT adds new rows. To modify existing rows, use UPDATE:
UPDATE Data SET Quote_Date = date(Quote_DateTime);
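The add-column, UPDATE, then index sequence can be sketched with Python's standard sqlite3 module. The table name data and the columns QUOTE_DATETIME and QUOTE_DATE come from the question; the PRICE column, the sample rows, and the index name idx_quote_date are assumptions made up for this illustration.

```python
import sqlite3


def add_date_column(conn):
    """Add QUOTE_DATE, fill it from QUOTE_DATETIME, and index it."""
    conn.execute("ALTER TABLE data ADD COLUMN QUOTE_DATE TEXT")
    # UPDATE modifies the existing rows; INSERT would append new ones.
    conn.execute("UPDATE data SET QUOTE_DATE = date(QUOTE_DATETIME)")
    conn.execute("CREATE INDEX idx_quote_date ON data (QUOTE_DATE)")
    conn.commit()


def demo():
    """Build a tiny table, run the migration, and return the result."""
    conn = sqlite3.connect(":memory:")
    # PRICE is a hypothetical extra column, only here to make the table realistic.
    conn.execute("CREATE TABLE data (QUOTE_DATETIME TEXT, PRICE REAL)")
    conn.executemany(
        "INSERT INTO data VALUES (?, ?)",
        [("2014-12-29 09:30:00", 10.5), ("2014-12-30 16:00:00", 11.0)],
    )
    add_date_column(conn)
    return conn.execute(
        "SELECT QUOTE_DATETIME, QUOTE_DATE FROM data ORDER BY QUOTE_DATETIME"
    ).fetchall()
```

After the migration, queries that filter on QUOTE_DATE can use the index, and the original QUOTE_DATETIME column stays untouched.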

Query a manual list of data items

I would like to run a query involving joining a table to a manually generated list but am stuck trying to generate the manual list. There is an example of what I am attempting to do below:
SELECT
*
FROM
('29/12/2014', '30/12/2014', '30/12/2014') dates
;
Ideally I would want my output to look like:
29/12/2014
30/12/2014
31/12/2014
What's your Teradata release?
In TD14 there's STRTOK_SPLIT_TO_TABLE:
SELECT *
FROM TABLE (STRTOK_SPLIT_TO_TABLE(1 -- any dummy value
,'29/12/2014,30/12/2014,30/12/2014' -- any delimited string
,',' -- delimiter
)
RETURNS (outkey INTEGER
,tokennum INTEGER
,token VARCHAR(20) CHARACTER SET UNICODE) -- modify to match the actual size
) AS d
You can easily put this in a Derived Table and then join to it.
inkey (here the dummy value 1) is a numeric or string column, usually a key. Can be used for joining back to the original row.
outkey is the same as inkey.
tokennum is the ordinal position of the token in the input string.
token is the extracted substring.
Try this:
select '29/12/2014'
union
select '30/12/2014'
union
...
It should work in Teradata as well as in MySQL.
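The UNION approach can be placed in a derived table and joined like any other table. The sketch below runs it in SQLite through Python's standard sqlite3 module rather than in Teradata, so treat it as an illustration of the pattern only; the events table, its columns, and the sample rows are hypothetical.

```python
import sqlite3


def rows_for_dates(conn):
    """Join a hand-written list of dates (built with UNION) to a table."""
    return conn.execute("""
        SELECT d.dt, e.label
        FROM (SELECT '29/12/2014' AS dt
              UNION SELECT '30/12/2014'
              UNION SELECT '31/12/2014') AS d
        JOIN events AS e ON e.event_date = d.dt
        ORDER BY d.dt
    """).fetchall()


def demo():
    """Create a hypothetical events table and join the manual list to it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_date TEXT, label TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("29/12/2014", "a"), ("31/12/2014", "b"), ("01/01/2015", "c")],
    )
    return rows_for_dates(conn)
```

Note that UNION removes duplicate values from the list; UNION ALL would keep them.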

Utility to insert row in any sas dataset

I want to create a UNIX utility to insert one row into a SAS dataset. When run, this script will ask the user to enter a value for each variable in the dataset (preferably telling them the type and length of the variable). It will then pass these values to SAS using the export command; SAS will create macro variables from them and insert the values into the dataset using 'proc sql; insert into'.
data raw_str;
/* init PDV */
if 0 then
set tracking_data;
/* programmatic structure to enable addressing of vars */
array a_c(*) _character_;
array a_n(*) _numeric_;
run;
Now raw_str will have variables whose types and lengths are the same as those in tracking_data.
proc sql noprint;
select distinct name
into : varlist separated by ' '
from dictionary.columns
where libname='WORK'
and memname='RAW_STR'; /* dictionary.columns stores memname in upper case */
quit;
Then I want to pass this list to UNIX, where I will ask the user to enter a value for each variable, and then append those values to tracking_data.
The problem is passing the values from UNIX to SAS and creating macro variables for them.
I can also pass the length and type of each variable to the front end, telling the user to enter values that match the types and lengths in the raw_str dataset. The insert would then be:
proc sql;
insert into raw_str
values (&val1, &val2, &val3...);
quit;
Finally, I can use PROC APPEND to append it to the original data.
Here's one possible approach for getting user-entered values from UNIX into SAS:
1. Have your UNIX shell script write out the user-entered values into a (consistently formatted) temporary text file, e.g. CSV.
2. Write a SAS data step that can read the text file and import the values into the required formats. You can run proc import and look at the log to get an idea of the sort of code to use.
3. Have the script call SAS via the command line, telling it to run a program containing the data step you wrote.
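The first step, writing the user-entered values to a consistently formatted CSV file, might look like the sketch below. It uses Python instead of a shell script, and the variable names (id, status) and sample values are hypothetical because the question does not list the variables in tracking_data. Prompting the user and invoking SAS are deliberately left out.

```python
import csv
import os
import tempfile


def write_row_csv(path, names, values):
    """Write one header line and one data line for SAS to read later."""
    if len(names) != len(values):
        raise ValueError("each variable needs exactly one value")
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(names)
        writer.writerow(values)


def read_rows_csv(path):
    """Read the file back, mainly to check what SAS would see."""
    with open(path, newline="") as fh:
        return list(csv.reader(fh))


def demo():
    """Round-trip one hypothetical row through a temporary CSV file."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        write_row_csv(path, ["id", "status"], [42, "open"])
        return read_rows_csv(path)
    finally:
        os.remove(path)
```

Writing a header line keeps the file self-describing, so the SAS data step can verify that the columns arrive in the order it expects.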

Oracle PL/SQL: how can I use INSERT INTO but select data from different tables?

Here's what I want to do:
Insert some data being passed into a procedure into a new table, but also include some data from another table too.
Example:
INSERT INTO my_new_table(name, age, whatever)
VALUES (names.name, ages.age, passed_in_data);
But that won't work.
I did check another question on here, but it was a bit much for my feeble brain.
Something like this should do it:
insert into my_new_table (
name,
age,
whatever
)
select
names.name,
ages.age,
passed_in_data
from
names inner join
ages on ....
where
....
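To see the INSERT INTO ... SELECT pattern run end to end, here is a minimal sketch using Python's standard sqlite3 module. It targets SQLite rather than Oracle, so it only illustrates the shape of the statement. The join column person_id and the sample rows are assumptions, since the original leaves the join condition and the WHERE clause elided; the passed-in value is supplied as a bound parameter.

```python
import sqlite3


def insert_from_tables(conn, passed_in_data):
    """Insert joined rows plus one passed-in value into my_new_table."""
    conn.execute("""
        INSERT INTO my_new_table (name, age, whatever)
        SELECT names.name, ages.age, ?
        FROM names
        INNER JOIN ages ON ages.person_id = names.person_id
    """, (passed_in_data,))
    conn.commit()


def demo():
    """Set up hypothetical source tables and run the insert."""
    conn = sqlite3.connect(":memory:")
    # person_id is a made-up join key for this illustration.
    conn.execute("CREATE TABLE names (person_id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE ages (person_id INTEGER, age INTEGER)")
    conn.execute("CREATE TABLE my_new_table (name TEXT, age INTEGER, whatever TEXT)")
    conn.executemany("INSERT INTO names VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
    conn.executemany("INSERT INTO ages VALUES (?, ?)", [(1, 30), (2, 41)])
    insert_from_tables(conn, "extra")
    return conn.execute(
        "SELECT name, age, whatever FROM my_new_table ORDER BY name"
    ).fetchall()
```

The key point is that the SELECT list may mix columns from the joined tables with a value passed into the procedure, which a VALUES clause cannot do.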
