I have some SAS code that I need to convert to R.
My SAS code is something like this -
proc sql;
create table data as
select a.*,b.qty from Sales as a inner join Units as b
on a.id=b.id and put(a.date,yymmn6.)=put(b.date,yymmn6.);
quit;
I believe put(a.date,yymmn6.) converts the date into a SAS date value, but what does a.date become after this function is applied? If date=01jan2012, does put(a.date,yymmn6.) produce a SAS value that represents 201201 or 20120101? That is, does the resulting value stand for the whole date or just the year and month of the date?
Currently, I am writing the R code for this as -
data <- sqldf("select a.*,b.qty from Sales as a inner join Units as b
on a.id=b.id and a.date=b.date")
Should I be doing it as -
Sales$date <- as.yearmon(Sales$date)
Units$date <- as.yearmon(Units$date)
data <- sqldf("select a.*,b.qty from Sales as a inner join Units as b
on a.id=b.id and a.date=b.date")
I don't have access to SAS and hence, I cannot try this out on a sample data. Any help would be great. Thanks!
put(a.date,yymmn6.) converts a numeric date value to a character value stored as yyyymm (e.g. 201201). Therefore the join condition is matching all dates where the month and year are the same, but not necessarily the day.
I'm not sure of the best way of achieving this in R, but you seem to have some ideas on this.
Hope this helps.
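As a language-neutral illustration of what the join condition compares, here is a small Python sketch (neither SAS nor R; the helper name `yymmn6` is made up for this example). It builds the same yyyymm key and shows that two dates in the same month match even when the days differ.

```python
from datetime import date

def yymmn6(d: date) -> str:
    """Mimic SAS put(d, yymmn6.): a 6-character yyyymm string."""
    return d.strftime("%Y%m")

a = date(2012, 1, 1)
b = date(2012, 1, 31)
c = date(2012, 2, 1)

# Same year and month, different day: the keys are equal.
same_month = yymmn6(a) == yymmn6(b)
# Different month: the keys differ.
next_month = yymmn6(a) == yymmn6(c)

print(yymmn6(a), same_month, next_month)
```

Any R translation would need the same property: the comparison key must drop the day.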
When you use put(a.date,yymmn6.), the output of that function is a character value. put takes a numeric input and a format, and returns the formatted value as a character string. The input function does the opposite: it reads a character string using an informat and returns a numeric value.
data mydata;
sas_numeric_date = "01jan2012"d;
sas_yyyymm_char_date = put(sas_numeric_date, yymmn6.);
sas_yyyymm_numeric_date = input(sas_yyyymm_char_date, yymmn6.);
output;
sas_numeric_date = "29Feb2012"d;
sas_yyyymm_char_date = put(sas_numeric_date, yymmn6.);
sas_yyyymm_numeric_date = input(sas_yyyymm_char_date, yymmn6.);
output;
format sas_numeric_date sas_yyyymm_numeric_date date9.;
run;
sas_numeric_date  sas_yyyymm_char_date  sas_yyyymm_numeric_date
01Jan2012         201201                01Jan2012
29Feb2012         201202                01Feb2012
So when yymmn6. is applied as an informat to sas_yyyymm_char_date, which is itself in yyyymm form, the resulting value is numeric and the day part of the date defaults to the first day of the month, as shown above.
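The same round trip can be sketched outside SAS. The Python below is only an analogy (the function name `parse_yyyymm` is hypothetical): formatting a date as yyyymm discards the day, and parsing that string back yields the first day of the month.

```python
from datetime import datetime

def parse_yyyymm(text: str):
    """Parse a yyyymm string; the missing day defaults to 1."""
    return datetime.strptime(text, "%Y%m").date()

original = datetime(2012, 2, 29).date()
as_text = original.strftime("%Y%m")   # day information is dropped here
round_tripped = parse_yyyymm(as_text)

print(original, as_text, round_tripped)
```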
I have a table with a date column and an hour column, which is an integer column ranging from 0 to 24. I need to combine these two fields into an hourly composite datetime field.
I was able to create that kind of variable using || and cast, but I am unable to translate this code to Hive syntax. Can you help me with this problem?
SQL Code:
CAST(CAST(CAST(DATE_OF_TRANSACTION AS FORMAT 'yyyy-mm-dd') AS VARCHAR(11))||' '||CAST(CAST( BasketHour AS FORMAT '99:') AS VARCHAR(10))||'00:00' AS TIMESTAMP(0)) Date_Time
Thank you very much
For example like this:
cast(concat(DATE_OF_TRANSACTION, ' ', lpad(BasketHour, 2, '0'), ':00:00.0') as timestamp)
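To make the intended result concrete, here is a minimal Python sketch of the same composition (not Hive; the function name `hourly_timestamp` is invented for illustration). How an hour value of 24 should be treated is an assumption: this sketch rolls it over to midnight of the next day, which may or may not match what the query engine does with a '24:00:00' string.

```python
from datetime import date, datetime, time, timedelta

def hourly_timestamp(day: date, hour: int) -> datetime:
    """Combine a calendar date and an hour (0-24) into a datetime."""
    if not 0 <= hour <= 24:
        raise ValueError("hour must be between 0 and 24")
    # Adding a timedelta lets hour 24 roll over to the next day (assumption).
    return datetime.combine(day, time.min) + timedelta(hours=hour)

print(hourly_timestamp(date(2018, 3, 31), 9))
print(hourly_timestamp(date(2018, 3, 31), 24))
```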
How to convert date to financial quarters:
3/31/2018 to 2018q1
I pulled a dataset from the FDIC website. Their date format is currently mm/dd/yyyy.
I am interested in creating a Scatter Plot/Bubble Chart using Gapminder.
However, Gapminder needs each date to be converted to financial quarters, i.e. yyyyq1, yyyyq2, yyyyq3, or yyyyq4 (for example 2017q1, 2017q2, 2017q3, or 2017q4).
This query needs to convert the date to financial quarters, but doesn't do so yet. What needs to be added to convert the "repdte" output from mm/dd/yyyy to yyyyq1?
SELECT
PCR.name,
PCR.repdte as Quarter,
PCR.idlncorr as NetLoansAndLeasesToCoreDeposits,
CAST(LD.IDdeplam as int) as DepositAccounts$GreaterThan$250k
from All_Reports_20180630_Performance_and_Condition_Ratios as PCR
join
'All_Reports_20180630_Deposits_Based_on_the_$250,000_Reporting_Threshold'
as LD on PCR.cert = LD.cert
UNION ALL
SELECT
PCR.name,
PCR.repdte as Quarter,
PCR.idlncorr as NetLoansAndLeasesToCoreDeposits,
CAST(LD.IDdeplam as int) as DepositAccounts$GreaterThan$250k
FROM All_Reports_20180331_Performance_and_Condition_Ratios as PCR
JOIN
'All_Reports_20180331_Deposits_Based_on_the_$250,000_Reporting_Threshold'
as LD on PCR.cert = LD.cert
What I currently have
Quarter
03/31/2018
The format that Gapminder needs to render the Bubble Chart:
ReportDate
2009q1
I believe that using
substr(PCR.repdte,7,4)||'q'||CAST(1+((substr(PCR.repdte,1,2)-1) / 3) AS INTEGER)
will convert the date for you.
For example, consider the following :-
DROP TABLE IF EXISTS PCR;
CREATE TABLE IF NOT EXISTS PCR (repdte);
INSERT INTO PCR VALUES('01/31/2009'),('02/31/2009'),('03/31/2009'),('04/31/2009'),('05/31/2009'),('06/31/2009'),('07/31/2009'),('08/31/2009'),('09/31/2009'),('10/31/2009'),('11/31/2009'),('12/31/2009');
SELECT PCR.repdte,
substr(PCR.repdte,7,4)||'q'||CAST(1+((substr(PCR.repdte,1,2)-1) / 3) AS INTEGER) FROM PCR;
Which results in :-
Additional
Re comment :-
It works. However, I'm getting an output of '018q2' instead of
'2018q2'. What would I change to add a '2' to '018q2'?
This would appear to be due to the date having a variable-length day part; that is, if the day is less than 10 it is written as a single digit rather than being zero-padded to two digits.
The following could be used :-
replace(substr(PCR.repdte,6),'/','')||'q'||CAST(1+((substr(PCR.repdte,1,2)-1) / 3) AS INTEGER)
This works by taking the text from the 6th character onward and removing the / if one is present. Consider the following :-
DROP TABLE IF EXISTS PCR;
CREATE TABLE IF NOT EXISTS PCR (repdte);
INSERT INTO PCR VALUES('01/31/2009'),('02/1/2009'),('03/31/2009'),('04/31/2009'),('05/1/2009'),('06/31/2009'),('07/31/2009'),('08/1/2009'),('09/31/2009'),('10/31/2009'),('11/31/2009'),('12/31/2009');
SELECT PCR.repdte,
substr(PCR.repdte,7,4)||'q'||CAST(1+((substr(PCR.repdte,1,2)-1) / 3) AS INTEGER), -- OLD
replace(substr(PCR.repdte,6),'/','')||'q'||CAST(1+((substr(PCR.repdte,1,2)-1) / 3) AS INTEGER) -- MODIFIED
FROM PCR;
Which results in :-
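The result listing is not reproduced here. As a rough way to check both expressions, they can be run through Python's built-in sqlite3 module. This assumes the query engine is SQLite, as the `||` and `substr` usage suggests; the three sample dates and the names `OLD` and `NEW` are chosen for this sketch only.

```python
import sqlite3

# The two expressions from the answer, without the table alias.
OLD = "substr(repdte,7,4)||'q'||CAST(1+((substr(repdte,1,2)-1) / 3) AS INTEGER)"
NEW = ("replace(substr(repdte,6),'/','')||'q'||"
       "CAST(1+((substr(repdte,1,2)-1) / 3) AS INTEGER)")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PCR (repdte)")
conn.executemany(
    "INSERT INTO PCR VALUES (?)",
    [("03/31/2018",), ("06/1/2018",), ("12/31/2018",)],
)

# One row per date: (repdte, old expression, modified expression).
rows = conn.execute(f"SELECT repdte, {OLD}, {NEW} FROM PCR").fetchall()
for row in rows:
    print(row)
```

With a single-digit day such as 06/1/2018, the old expression loses the first digit of the year, while the modified one keeps all four digits.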
I have two large data sets.
The first data set includes ID, Starting time and Ending time.
The second data set includes ID, Starting time and Ending time.
I want to merge these two data sets on ID and starting time, allowing each date in the first data set to match any date within 5 days before or after it. For example, 23/4/2012 in the first data set can merge with any starting date between 18/4/2012 and 28/4/2012.
Input data:
# Dates are quoted so R keeps them as strings instead of evaluating 24/5/1980 as a division
x <- c(1,2,3,4,5,6,6,7,7,8,8,9,10)
StartTime <- c("24/5/1980","2/6/1932","24/6/1945","25/9/1954","12/11/1970","14/3/1984","15/5/1999","20/5/1990","25/9/1981","28/2/1980","29/1/1984","24/4/1987","30/6/1988")
Endtime <- c("24/6/1980","2/8/1932","24/9/1945","25/10/1954","14/11/1970","14/12/1984","15/10/1999","26/5/1990","29/9/1981","28/3/1980","29/1/1984","24/6/1987","30/7/1988")
df1 <- data.frame(x, StartTime, Endtime)
x <- c(1,1,1,2,2,3,3,4,5,5,6,6,7)
StartTime <- c("29/5/1980","20/5/1980","23/5/1945","5/6/1932","7/6/1932","27/6/1945","20/6/1945","20/5/1990","25/9/1981","28/2/1980","29/3/1984","24/5/1987","30/7/1988")
Endtime2 <- c("24/6/1980","2/8/1990","24/9/1945","25/10/1954","14/11/1970","14/12/1984","15/10/1999","26/5/1990","29/9/1981","28/3/1980","29/1/1984","24/6/1987","30/7/1988")
df2 <- data.frame(x, StartTime, Endtime2)
Convert your date strings to Dates using as.POSIXct() https://stat.ethz.ch/R-manual/R-devel/library/base/html/as.POSIXlt.html
library(sqldf)
df3 <- sqldf("SELECT df1.*, df2.* FROM df1 INNER JOIN df2 ON julianday(df1.StartTime) - julianday(df2.StartTime) BETWEEN -5 AND 5 AND df1.x = df2.x")
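The matching rule itself (same ID, start dates at most 5 days apart) can be sketched in plain Python. This is an illustration only, not the sqldf approach above: the function names are invented, the rows are a subset of the question's sample data, and the nested loop is quadratic, so it would be slow on genuinely large data sets.

```python
from datetime import datetime

def parse_dmy(text):
    """Parse a d/m/yyyy string into a date."""
    return datetime.strptime(text, "%d/%m/%Y").date()

def window_join(left, right, days=5):
    """Pair rows with equal IDs whose start dates differ by at most `days`.

    `left` and `right` are lists of (id, start_date_string) tuples.
    """
    matches = []
    for left_id, left_start in left:
        for right_id, right_start in right:
            gap = abs((parse_dmy(left_start) - parse_dmy(right_start)).days)
            if left_id == right_id and gap <= days:
                matches.append((left_id, left_start, right_start))
    return matches

df1 = [(1, "24/5/1980"), (2, "2/6/1932"), (3, "24/6/1945")]
df2 = [(1, "29/5/1980"), (1, "20/5/1980"), (1, "23/5/1945"),
       (2, "5/6/1932"), (2, "7/6/1932"), (3, "27/6/1945"), (3, "20/6/1945")]

for match in window_join(df1, df2):
    print(match)
```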
Hello, I am trying to change the date 01/02/2015 to 01/09/2015. I am using an IF statement, but I think it's not working because the column has a format of mmddyy10. and an informat of datetime20.
The column looks like this
01/09/2015
01/02/2015
05/23/2015
So I want it to look like this
01/09/2015
01/09/2015
05/23/2015
Is there a function in SAS that allows me to do this? Or will I have to convert the column to a Char and do my If statement that way.
As long as it's a date value (so the unformatted value is a number somewhere around 20000), you can use a date constant.
data want;
set have;
if datevar = '02JAN2015'd then datevar='09JAN2015'd;
run;
This is irrespective of the format applied to the column. Format is not variable type; it's something like "When I look at this variable, please print it nicely using this format." The underlying value is the same for all dates: some number of days since 1/1/1960.
Date constants are always represented using DATE9. format (DDMONYYYY).
If it's actually a datetime, then the format would not be MMDDYY10. but something else. Then you use a datetime constant:
if datetimevar = '02JAN2015:00:00:00'dt then ...
or convert it to date using datepart first.
if datepart(datetimevar) = '02JAN2015'd then ...
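To see why the display format does not matter, it can help to compute the underlying numbers. The Python sketch below is an analogy, not SAS; the helper names are made up, and it relies on the convention that SAS datetimes count seconds from the same 1 January 1960 starting point that dates count days from.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)
SECONDS_PER_DAY = 86400

def to_sas_date(d: date) -> int:
    """Number of days since 1 January 1960."""
    return (d - SAS_EPOCH).days

def from_sas_date(n: int) -> date:
    """Calendar date for a days-since-1960 number."""
    return SAS_EPOCH + timedelta(days=n)

def sas_datepart(datetime_seconds: float) -> int:
    """Date number for a seconds-since-1960 datetime value."""
    return int(datetime_seconds // SECONDS_PER_DAY)

old_value = to_sas_date(date(2015, 1, 2))
new_value = to_sas_date(date(2015, 1, 9))
print(old_value, new_value, from_sas_date(new_value))
```

The comparison in the IF statement is between these plain numbers, whichever format is attached to the column.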
I have two columns in my table: one holds a date in yyyymm format and the other holds integer values between 1 and 50. How can I add these two fields and get a date value?
For example: 201402 + 12 should give me 201502 as an answer!
I assume you don't really have a DATE column but a varchar column that stores a month specification in the format yyyymm.
If you want to make use of Oracle's date arithmetic you first need to convert this "month" into a real date.
Something like this:
select to_char(add_months(to_date('201402', 'yyyymm'), 12), 'yyyymm')
from dual;
You will need to replace the character literal '201402' with a reference to your column.
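For checking expected results outside the database, the month arithmetic can be sketched in Python. The function name `add_months_yyyymm` is hypothetical, and this only mirrors the yyyymm case; it does not reproduce Oracle's handling of day-of-month values.

```python
def add_months_yyyymm(yyyymm: str, months: int) -> str:
    """Add a number of months to a yyyymm string and return yyyymm."""
    year, month = int(yyyymm[:4]), int(yyyymm[4:6])
    # Count months from year 0 so that carrying into the year is a divmod.
    total = year * 12 + (month - 1) + months
    new_year, new_month_index = divmod(total, 12)
    return f"{new_year:04d}{new_month_index + 1:02d}"

print(add_months_yyyymm("201402", 12))
```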