Formatting year month variable as date - datetime

In Stata I have a variable yearmonth which is formatted as 201201, 201202 etc. for the years 2012 - 2019, monthly with no gaps. When I format the variable as
format yearmonth %tm
The results look like: 2.0e+05 for all periods, with the exact same number each time. A Dickey-Fuller test tells me I have gaps in my data (I don't) and a tsfill command generates dozens of empty observations between each period.
How do I properly format my yearmonth variable so I can set it as a monthly date?

You do have gaps — between 201212 and 201301, for example. Consider a statement like
gen wanted = ym(floor(yearmonth/100), mod(yearmonth, 100))
which parses your integers like 201201 into year and month components. So floor(201201/100) is floor(2012.01) and so 2012 while mod(201201, 100) is 1. The two components are then the arguments of ym() which expects a year and a month argument.
Then and only then will your format statement do what you want. The format command only changes how values are displayed; it won't create date variables.
See help datetime in Stata for more information and Problem with displaying reformatted string into a four-digit year in Stata 17 for an explanation of the difference between a date value and a date display format.
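As a language-neutral check of the arithmetic, here is a minimal Python sketch of what ym() is doing with the two components. The function name yyyymm_to_stata_month is made up for illustration; the only assumption is that a Stata monthly date counts months elapsed since January 1960.

```python
def yyyymm_to_stata_month(yearmonth: int) -> int:
    """Months elapsed since January 1960, mirroring Stata's ym(year, month)."""
    year, month = divmod(yearmonth, 100)  # 201201 -> (2012, 1)
    if not 1 <= month <= 12:
        raise ValueError(f"not a YYYYMM value: {yearmonth}")
    return (year - 1960) * 12 + (month - 1)


# As raw integers, December to January jumps by 89: that is the "gap"
print(201301 - 201212)  # 89
# As monthly dates, the same two periods are consecutive
print(yyyymm_to_stata_month(201212), yyyymm_to_stata_month(201301))  # 635 636
```

The raw integers jump by 89 at each year boundary, which is why tsfill inserts dozens of empty observations when the integers themselves are treated as monthly dates.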

Related

Trouble obtaining quarterly values from a date variable in stata

I am starting with a date_of_survey variable that is a string formatted as YYYY-MM-DD. I then run the following commands to convert it to a date variable, and display that variable in a useful format:
gen date = date(date_of_survey, "YMD")
gen date_clean = date
format date_clean %dM_d,_CY
drop date_of_survey
That leaves me with a "date_clean" variable displayed as "September 3, 2020" and a corresponding "date" variable displayed as "22161" (equal to days since January 1, 1960).
I now need to create a variable that indicates the year and quarter of each observation, preferably in YYYY-QQ format. I assumed this shouldn't be difficult, but no matter how I have coded it, I wind up with years in the 7000s and inaccurate quarters. I must be misunderstanding how the dates are stored. My first instinct was to try a simple format date %tq command, but I'm still not getting the output I need. Any help is much appreciated. I read over the help files, and can't find the discrepancy that's causing this little problem.
ANSWER: I needed to convert the daily date into quarters since the first quarter of 1960. A qofd() function call before the format %tq did the trick!
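To see where the years in the 7000s come from, here is a rough Python sketch of the arithmetic. The helper names are invented for illustration; it assumes only that daily dates count days since January 1, 1960 and quarterly dates count quarters since 1960q1.

```python
from datetime import date, timedelta

STATA_EPOCH = date(1960, 1, 1)


def daily_to_quarterly(days: int) -> int:
    """Days since 1 Jan 1960 -> quarters since 1960q1, in the spirit of qofd()."""
    d = STATA_EPOCH + timedelta(days=days)
    return (d.year - 1960) * 4 + (d.month - 1) // 3


def show_quarter(q: int) -> str:
    """Render quarters since 1960q1 as text such as '2020q3'."""
    years, quarter_index = divmod(q, 4)
    return f"{1960 + years}q{quarter_index + 1}"


# Reading the daily value 22161 as if it were a count of quarters
print(show_quarter(22161))                      # 7500q2
# Converting days to quarters first, then displaying
print(daily_to_quarterly(22161))                # 242
print(show_quarter(daily_to_quarterly(22161)))  # 2020q3
```

Applying a quarterly display format to a daily value treats 22161 days as 22161 quarters, hence the year 7500.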

controlM variable for YYYYMM?

I'm using Control-M and, in a command, I would like a variable that gives me the date in this format: YYYYMM
I found there is a %%$DATE variable, but it gives YYYYMMDD
Thanks for your help
It is possible to define and concatenate a variable that will represent the date in such a format.
These are available:
Day DD, %%DAY,
Month MM, %%MONTH,
Year YY, %%YEAR,
Year YYYY, %%$YEAR
Prefer %%$OYEAR AND %%OMONTH over %%$YEAR and %%MONTH
I suggest using the variables %%$OYEAR and %%OMONTH over %%$YEAR and %%MONTH. The reason is that date variables beginning with O refer to processing dates and do not necessarily coincide with the system date. For this case you could use any of the following options:
1. YYYYMM = %%$OYEAR.%%OMONTH
2. YYYYMM = %%SUBSTR %%$ODATE 1 6
The $ symbol preceding the variable %%$OYEAR or %%$ODATE indicates that the year is returned in 4-digit format, instead of OYEAR or ODATE which print the year with only 2 digits.
The dot (.) character is used to concatenate variables.
For example, for the order date May 29, 2020:
1. %%$ODATE would print 20200529
2. %%ODATE would print 200529
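This is not Control-M syntax, just a small Python sketch of what the two options compute for the example order date. It assumes the substring function takes a 1-based start position followed by a length; the variable names are invented.

```python
# Hypothetical stand-in for the value of %%$ODATE on the example order date
odate_four_digit_year = "20200529"

# Option 1: join the year part and the month part, as the dot does
oyear = odate_four_digit_year[:4]    # "2020", the role of %%$OYEAR
omonth = odate_four_digit_year[4:6]  # "05", the role of %%OMONTH
option_1 = oyear + omonth

# Option 2: six characters starting at position 1
# (assumed 1-based in Control-M, so index 0 in Python)
option_2 = odate_four_digit_year[0:6]

print(option_1, option_2)  # 202005 202005
```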

Issue with reading csv file in R software with year variable

I am reading some csv files for each year, and every table has a year (two-digit year), day and month column; instead I need one column with just the date. I was doing fine using my R code until one of the tables had a four-digit year variable (e.g. 2000). In this case my code converts this year to 2020.
Any thoughts?
dt_00$date=as.Date(with(dt_00,paste(MONTH,DAY,YEAR,sep='-')),'%m-%d-%y')
Because lubridate accommodates quite a few date format varieties, this might work:
library(lubridate)
dt_00$date <- mdy(with(dt_00, paste(MONTH, DAY, YEAR, sep = "-")))
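The underlying idea is to choose the year directive from the number of digits instead of always assuming %y. Here is a minimal Python sketch of that idea, with a made-up helper name parse_mdy. Python's strptime is stricter than the behaviour reported in the question: it raises an error on leftover digits rather than quietly reading 2000 as 20.

```python
from datetime import date, datetime


def parse_mdy(month, day, year) -> date:
    """Build a date from separate columns, accepting 2- or 4-digit years."""
    # Pad so that a year column read as the number 0 or 5 becomes "00" or "05"
    year_text = str(year).strip().zfill(2)
    # Pick the directive from the digit count rather than assuming %y
    year_directive = "%Y" if len(year_text) == 4 else "%y"
    text = f"{month}-{day}-{year_text}"
    return datetime.strptime(text, f"%m-%d-{year_directive}").date()


print(parse_mdy(3, 15, "99"))  # 1999-03-15
print(parse_mdy(3, 15, 2000))  # 2000-03-15
```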

time series in R with sales prediction with only date values

I have data for 2015 with dates in mm/dd/yy format and sales figures. I need to predict sales for 2016 from this data. I only know that I need to use time series forecasting, but I have no idea how. Many examples have only years (1960, 1970, ...), whereas my data covers a single year with several months. I don't know how to plot it either. Can you give me a clear structure for how to proceed?
Assuming that the date is a string in the format mm/dd/yy,
convert the string into a date using this code:
a <- "07/23/15"
b <- as.Date(a, format = "%m/%d/%y")
fullYear <- format(b, "%Y")  # to get 2015 as year
halfYear <- format(b, "%y")  # to get 15 as year
After this you can work on
I have found the solution. I converted the sales figures into time series format,
plotted the data, and checked whether there was any trend or seasonality.
Since the data has only a trend, I applied Holt's exponential smoothing from the forecast package. Sales for 2016 have been forecast and plotted.
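For readers who want to see what Holt's method does with a trending series, here is a bare-bones Python sketch of the level and trend updates. This is not the forecast package's implementation: the smoothing weights alpha and beta are fixed by hand here rather than estimated, the initialisation is the simplest possible one, and the sales numbers are made up.

```python
def holt_forecast(series, alpha=0.5, beta=0.5, horizon=3):
    """Holt's linear-trend exponential smoothing; returns `horizon` forecasts."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        previous_level = level
        # Move the level towards the new observation
        level = alpha * y + (1 - alpha) * (previous_level + trend)
        # Move the trend towards the latest change in level
        trend = beta * (level - previous_level) + (1 - beta) * trend
    # Forecasts extend the last level along the last trend
    return [level + h * trend for h in range(1, horizon + 1)]


monthly_sales = [100, 110, 120, 130, 140, 150]  # made-up numbers
print(holt_forecast(monthly_sales))  # [160.0, 170.0, 180.0]
```

On a perfectly linear series the forecasts simply continue the line; on real data the weights control how quickly the level and trend react to recent observations.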

Post-Process a Stata %tw date in R

The %tw format in Stata has the form: 1960w1 which has no equivalent in R.
Therefore %tw dates must be post-processed.
Importing a .dta file into R, the date is an integer like 1304 (instead of 1985w5) or 1426 (instead of 1987w23). If it was a simple time series you could set a starting date as follows:
ts(df, start= c(1985,5), frequency=52)
Another possibility would be:
as.Date(Camp$date, format= "%Yw%W" , origin = "1985w5")
But if each row is not a single date, then you must convert it.
The package ISOweek is based on ISO-8601 with the form "1985-W05" and does not process the Stata %tw.
The lubridate package does not work with this format. Its week() function returns the number of complete seven-day periods that have occurred between the date and January 1st, plus one (see the week function documentation).
In Stata, week 1 of any year starts on 1 January, whatever day of the week that is (see the Stata documentation on dates).
In R's Date format %W, weeks start with Monday as the first day of the week.
From strptime %V is
the Week of the year as decimal number (01--53) as defined in ISO
8601. If the week (starting on Monday) containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise,
it is the last week of the previous year, and the next week is week 1.
(Accepted but ignored on input.) Strptime
Larmarange noted on GitHub that haven doesn't interpret these dates properly:
months, week, quarter and halfyear are specific format from Stata,
respectively %tm, %tw, %tq and %th. I'm not sure that there are
corresponding formats available in R. So far they are imported as
integers.
Is there a way to convert Stata %tw to a date format R understands?
Here is a Stata file with dates
This won't be an answer in terms of R code, but it is commentary on Stata weeks that can't be fitted into a comment.
Strictly, dates in Stata are not defined by the display formats that make them intelligible to people. A date in Stata is always a numeric variable or scalar or macro defined with origin the first instance in 1960. Thus it is at best a shorthand to talk about %tw dates, etc. We can use display to see the effects of different date display formats:
. di %td 0
01jan1960
. di %tw 0
1960w1
. di %tq 0
1960q1
. di %td 42
12feb1960
. di %tw 42
1960w43
. di %tq 42
1970q3
A subtle point made explicit above is that changing the display format will not change what is stored, i.e. the numeric value.
Otherwise put, dates in Stata are not distinct data types; they are just integers made intelligible as dates by a pertinent display format.
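The same point can be sketched outside Stata. In this rough Python illustration, the three helper functions (names invented here) play the role of the %td, %tw and %tq display formats; the stored integer never changes, only how it is read.

```python
from datetime import date, timedelta

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def as_td(n: int) -> str:
    """Read n as days since 1 Jan 1960."""
    d = date(1960, 1, 1) + timedelta(days=n)
    return f"{d.day:02d}{MONTHS[d.month - 1]}{d.year}"


def as_tw(n: int) -> str:
    """Read n as Stata weeks since 1960w1 (always 52 weeks per year)."""
    years, week_index = divmod(n, 52)
    return f"{1960 + years}w{week_index + 1}"


def as_tq(n: int) -> str:
    """Read n as quarters since 1960q1."""
    years, quarter_index = divmod(n, 4)
    return f"{1960 + years}q{quarter_index + 1}"


for n in (0, 42):
    print(n, as_td(n), as_tw(n), as_tq(n))
# 0 01jan1960 1960w1 1960q1
# 42 12feb1960 1960w43 1970q3
```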
The question presupposes that it was correct to describe some weekly dates in terms of Stata weeks. This seems unlikely, as I know no instance in which a body outside StataCorp uses the week rules of Stata, not only that week 1 always starts on 1 January, but also that week 52 always includes either 8 or 9 days and hence that there is never a week 53 in a calendar year.
So, you need to go upstream and find out what the data should have been. Failing some explanation, my best advice is to map the 52 weeks of each year to the days that start them, namely days 1(7)358 of each calendar year.
Stata weeks won't map one-to-one to any other scheme for defining weeks.
More in this article on Stata weeks
It's not completely clear what the question is but the year and week corresponding to 1304 are:
wk <- 1304
1960 + wk %/% 52
## [1] 1985
wk %% 52 + 1
## [1] 5
so assuming that the first week of the year is week 1 and starts on Jan 1st, the beginning of the above week is this date:
as.Date(paste(1960 + wk %/% 52, 1, 1, sep = "-")) + 7 * (wk %% 52)
## [1] "1985-01-29"
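The same calculation, as a minimal Python sketch for anyone post-processing the integers outside R. It rests on the same assumption as above: every year has exactly 52 Stata weeks and week 1 starts on 1 January. The function name is invented for illustration.

```python
from datetime import date, timedelta


def stata_week_to_date(wk: int) -> date:
    """First day of the Stata week stored as `wk` weeks since 1960w1."""
    years, week_index = divmod(wk, 52)  # 1304 -> (25, 4), i.e. 1985w5
    return date(1960 + years, 1, 1) + timedelta(days=7 * week_index)


print(stata_week_to_date(1304))  # 1985-01-29
print(stata_week_to_date(1426))  # 1987-06-04
```

The result is the day that starts each Stata week, which is the mapping to days 1(7)358 suggested in the other answer.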
