Extracting month and year from an object of the POSIX date and time classes in R

I tried to extract month and year from an object of POSIXlt class, however, the returned month seemed to be one month earlier and the year did not seem correct either. My R code is as follows:

POSIXlt stores the month of the year as 0 to 11 (0 for January, 11 for December), and the year as years since 1900. Add 1 to the month and 1900 to the year to get the calendar values.
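A minimal sketch of those offsets, using a made-up date (the variable names are only for illustration):

```r
# A hypothetical date, parsed into the list-based POSIXlt class
x <- as.POSIXlt("2014-03-15", tz = "UTC")

x$mon    # 2   (months since January, so March is 2)
x$year   # 114 (years since 1900)

# Correct the offsets to get calendar values
cal_month <- x$mon + 1      # 3
cal_year  <- x$year + 1900  # 2014

# format() avoids the offsets altogether
format(x, "%Y-%m")          # "2014-03"
```

If only the formatted values are needed, format() is usually the less error-prone route, since it never exposes the raw components.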

Related

Convert from character to date in a "YYYY-WW" format in R

I have a hard time converting character to date in R.
I have a file where the dates are given as "2014-01", where the first is the year and the second is the week of the year. I want to convert this to a date type.
I have tried the following
z <- as.Date('2014-01', '%Y-%W')
print(z)
Output: "2014-12-05"
This is not what I want. I want to get the same format out, i.e. the output should be "2014-01", but as a date type.
It sounds like you are dealing with some version of year week, which exists in three forms in lubridate:
week() returns the number of complete seven day periods that have
occurred between the date and January 1st, plus one.
isoweek() returns the week as it would appear in the ISO 8601 system,
which uses a reoccurring leap week.
epiweek() is the US CDC version of epidemiological week. It follows
same rules as isoweek() but starts on Sunday. In other parts of the
world the convention is to start epidemiological weeks on Monday,
which is the same as isoweek.
Lubridate has functions to extract these from a date, but I don't know of a built-in way to go the other direction, from a week to one representative day (out of 7 possible). If you're dealing with the first version, one simple way is to add 7 * (Week - 1) days to January 1 of the year.
library(dplyr)
data.frame(yearweek = c('2014-01', '2014-03')) %>%
  tidyr::separate(yearweek, c("Year", "Week"), convert = TRUE) %>%
  mutate(Date = as.Date(paste0(Year, "-01-01")) + 7 * (Week - 1))
  Year Week       Date
1 2014    1 2014-01-01
2 2014    3 2014-01-15
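The same idea can be sketched in base R, with a round trip back to the "YYYY-WW" label. The helper names yearweek_to_date and date_to_yearweek are hypothetical, and the sketch assumes the lubridate week() convention described above (week 1 starts on January 1). A Date always carries a day, so the "2014-01" form can only be recovered as formatted text:

```r
# Hypothetical helper: "YYYY-WW" -> first day of that week,
# assuming week 1 starts on January 1 (the week() convention)
yearweek_to_date <- function(yearweek) {
  parts <- strsplit(yearweek, "-", fixed = TRUE)
  year  <- as.integer(vapply(parts, `[`, character(1), 1))
  week  <- as.integer(vapply(parts, `[`, character(1), 2))
  as.Date(paste0(year, "-01-01")) + 7 * (week - 1)
}

# Hypothetical helper: Date -> "YYYY-WW" label under the same convention
date_to_yearweek <- function(d) {
  day_of_year <- as.integer(format(d, "%j"))
  week <- as.integer((day_of_year - 1) %/% 7 + 1)
  sprintf("%s-%02d", format(d, "%Y"), week)
}

d <- yearweek_to_date(c("2014-01", "2014-03"))
d                    # "2014-01-01" "2014-01-15"
date_to_yearweek(d)  # "2014-01" "2014-03"
```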

Formatting year month variable as date

In Stata I have a variable yearmonth which is formatted as 201201, 201202 etc. for the years 2012 - 2019, monthly with no gaps. When I format the variable as
format yearmonth %tm
The results look like: 2.0e+05 for all periods, with the exact same number each time. A Dickey-Fuller test tells me I have gaps in my data (I don't) and a tsfill command generates dozens of empty observations between each period.
How do I properly format my yearmonth variable so I can set it as a monthly date?
You do have gaps: between 201212 and 201301, for example, the value jumps by 89 rather than by 1. Consider a statement like
gen wanted = ym(floor(yearmonth/100), mod(yearmonth, 100))
which parses your integers like 201201 into year and month components. So floor(201201/100) is floor(2012.01) and so 2012 while mod(201201, 100) is 1. The two components are then the arguments of ym() which expects a year and a month argument.
Then and only then will your format statement do what you want. The format command only changes how values are displayed; it won't create date variables.
See help datetime in Stata for more information, and "Problem with displaying reformatted string into a four-digit year in Stata 17" for an explanation of the difference between a date value and a date display format.
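The arithmetic behind that answer can be checked outside Stata. The sketch below is in R rather than Stata and assumes only that a Stata monthly date counts months since 1960m1 (so 1960m1 is 0); the variable names are illustrative:

```r
# Raw integers as stored in the yearmonth variable
yearmonth <- c(201211, 201212, 201301)
diff(yearmonth)            # 1 89: the year boundary looks like a gap

# Split into components, as floor(yearmonth/100) and mod(yearmonth, 100) do
year  <- yearmonth %/% 100 # 2012 2012 2013
month <- yearmonth %% 100  # 11 12 1

# Months elapsed since January 1960, mirroring what ym() returns
months_since_1960 <- (year - 1960) * 12 + (month - 1)
months_since_1960          # 634 635 636
diff(months_since_1960)    # 1 1: no gaps
```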

Year column to time series [duplicate]

OK, this should be really simple but I'm not 'getting it.' I have a data frame with a column "Year" that I want to convert to a time series, but the format is tripping me up. How do I convert the "Year" value to a date, with the actual date being the end of each respective year (e.g. 2015 -> December 31st 2015)?
  Year Production
1 1900   38400000
2 1901   43400000
3 1902   49000000
4 1903   44100000
5 1904   49800000
Goal is to get this to a time series data frame. (e.g. xts)
It is not quite the same as the previous question "Convert four digit year values to date type", which converted a vector of years to dates. The goal here is to index the data by date, converting it to an xts or similar object.
Edited:
This was the final solution:
df <- xts(x = df_original, order.by = as.Date(paste0(df_original[,1], "-12-31")))
whereby the "[,1]" indicates the first column of the original data frame.
If you want each full date to be 31 December, you could use paste along with as.Date to cast to a date:
df$date <- as.Date(paste0(df$Year, "-12-31"))
In addition to Tim Biegeleisen's answer, I will just add another way
df$final_date <- as.Date(ISOdate(df$Year, 12, 31))
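Both suggestions can be checked against the sample rows using base R only. This sketch rebuilds the five rows shown in the question as a data frame named df (an assumption about how the data are stored) and confirms that the two conversions agree:

```r
# The five sample rows from the question
df <- data.frame(
  Year = 1900:1904,
  Production = c(38400000, 43400000, 49000000, 44100000, 49800000)
)

# Approach 1: paste a fixed month and day onto the year
date_paste <- as.Date(paste0(df$Year, "-12-31"))

# Approach 2: build a date-time with ISOdate(), then drop the time part
date_iso <- as.Date(ISOdate(df$Year, 12, 31))

all(date_paste == date_iso)  # TRUE
format(date_paste[1])        # "1900-12-31"
```

Either vector can then be passed as the order.by argument when building the xts object.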

Determine week number from date over several years

I'm looking for a way to determine the week number (week beginning on Monday) over several years. That means I don't want to have 0-53 but if, let's say I have 2 years of dates, I want them to be numbered with 0-106 in R.
I tried strftime(Datum, format ="%W") but then I only get the annual week number and not as a whole.
Given that you did not provide any data, I took the liberty of creating some:
#create data
Datum<-c("2013-03-01", "2014-06-02", "2013-06-01")
# format data to year-month-day with strptime
Datum<-strptime(Datum, "%Y-%m-%d")
You now need to identify the origin year. As I'm sure you are aware, not all years have the same number of weeks (about 52.29 in a leap year vs. 52.14 in a standard calendar year), but as this is unlikely to matter for only 2 years, we can use the number of weeks returned by the strftime function.
origin.year=as.numeric(min(substring(Datum,1,4)))
# number of weeks in first year (offset for second year)
n.weeks<-52
Now we can create a vector containing the number of weeks to offset each week in Datum (X).
X<-as.numeric(substring(Datum,1,4)!=origin.year)*n.weeks
We can then simply add this vector to the number of weeks returned by strftime when it is applied to Datum
week.vec<-as.numeric(strftime(Datum, "%W"))+X
This will work for 2 years, but if you have more years than this, you will need to modify the offsets to account for this.
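For more than two years, one alternative that avoids per-year offsets is to count whole weeks between Mondays. Note that this is a different technique from the one above: week 0 is the Monday-based week containing the earliest date, not the start of the origin year. A minimal sketch, reusing the example dates:

```r
Datum <- as.Date(c("2013-03-01", "2014-06-02", "2013-06-01"))

# %u gives the weekday as 1 (Monday) to 7 (Sunday),
# so this steps each date back to the Monday of its week
monday <- Datum - (as.integer(format(Datum, "%u")) - 1)

# Whole weeks elapsed since the earliest Monday
week_index <- as.integer(monday - min(monday)) %/% 7L
week_index  # 0 66 13
```

Because every value in monday is a Monday, the day differences are always multiples of 7, so the index does not drift across year boundaries.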

Post-Process a Stata %tw date in R

The %tw format in Stata has the form: 1960w1 which has no equivalent in R.
Therefore %tw dates must be post-processed.
Importing a .dta file into R, the date is an integer like 1304 (instead of 1985w5) or 1426 (instead of 1987w23). If it was a simple time series you could set a starting date as follows:
ts(df, start= c(1985,5), frequency=52)
Another possibility would be:
as.Date(Camp$date, format= "%Yw%W" , origin = "1985w5")
But if each row is not a single date, then you must convert it.
The package ISOweek is based on ISO-8601 with the form "1985-W05" and does not process the Stata %tw.
The lubridate package does not work with this format either. Its week() function returns the number of complete seven day periods that have occurred between the date and January 1st, plus one (see the week function documentation).
In Stata, week 1 of any year starts on 1 January, whatever day of the week that is (see the Stata documentation on dates).
In R, the %W format counts weeks of the year with Monday as the first day of the week.
From the strptime documentation, %V is
the Week of the year as decimal number (00--53) as defined in ISO
8601. If the week (starting on Monday) containing 1 January has four or more days in the new year, then it is considered week 1. Otherwise,
it is the last week of the previous year, and the next week is week 1.
(Accepted but ignored on input.)
Larmarange noted on GitHub that haven doesn't interpret these dates properly:
months, week, quarter and halfyear are specific format from Stata,
respectively %tm, %tw, %tq and %th. I'm not sure that there are
corresponding formats available in R. So far they are imported as
integers.
Is there a way to convert Stata %tw to a date format R understands?
Here is a Stata file with dates
This won't be an answer in terms of R code, but it is commentary on Stata weeks that can't be fitted into a comment.
Strictly, dates in Stata are not defined by the display formats that make them intelligible to people. A date in Stata is always a numeric variable or scalar or macro defined with origin the first instance in 1960. Thus it is at best a shorthand to talk about %tw dates, etc. We can use display to see the effects of different date display formats:
. di %td 0
01jan1960
. di %tw 0
1960w1
. di %tq 0
1960q1
. di %td 42
12feb1960
. di %tw 42
1960w43
. di %tq 42
1970q3
A subtle point made explicit above is that changing the display format will not change what is stored, i.e. the numeric value.
Otherwise put, dates in Stata are not distinct data types; they are just integers made intelligible as dates by a pertinent display format.
The question presupposes that it was correct to describe some weekly dates in terms of Stata weeks. This seems unlikely, as I know no instance in which a body outside StataCorp uses the week rules of Stata, not only that week 1 always starts on 1 January, but also that week 52 always includes either 8 or 9 days and hence that there is never a week 53 in a calendar year.
So, you need to go upstream and find out what the data should have been. Failing some explanation, my best advice is to map the 52 weeks of each year to the days that start them, namely days 1(7)358 of each calendar year.
Stata weeks won't map one-to-one to any other scheme for defining weeks.
More in this article on Stata weeks
It's not completely clear what the question is but the year and week corresponding to 1304 are:
wk <- 1304
1960 + wk %/% 52
## [1] 1985
wk %% 52 + 1
## [1] 5
so assuming that the first week of the year is week 1 and starts on Jan 1st, the beginning of the above week is this date:
as.Date(paste(1960 + wk %/% 52, 1, 1, sep = "-")) + 7 * (wk %% 52)
## [1] "1985-01-29"
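The same arithmetic can be wrapped in a vectorized helper. The function name stata_tw_to_date is hypothetical, and the sketch assumes the Stata rules described above: the stored integer counts weeks since 1960w1, every year has exactly 52 weeks, and week 1 starts on 1 January.

```r
# Hypothetical helper: Stata weekly date integer -> first day of that week
stata_tw_to_date <- function(tw) {
  year       <- 1960 + tw %/% 52  # 52 Stata weeks per calendar year
  week_index <- tw %% 52          # 0 for week 1, 51 for week 52
  as.Date(paste0(year, "-01-01")) + 7 * week_index
}

tw <- c(0, 1304, 1426)  # 1960w1, 1985w5, 1987w23
stata_tw_to_date(tw)
# "1960-01-01" "1985-01-29" "1987-06-04"

# The Stata-style label can be rebuilt from the same pieces
sprintf("%dw%d", 1960 + tw %/% 52, tw %% 52 + 1)
# "1960w1" "1985w5" "1987w23"
```

The returned dates mark the start of each Stata week; as the commentary above points out, they will not line up with ISO or Monday-based weeks.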

Resources