POSIXlt Changing Weekday In March 2014

I have an interesting problem. I have minute-by-minute data for 2014 for certain stocks, and I want to analyze Fridays only, using the code below. It works great until it gets to March. All of a sudden, Thursdays are being given a weekday value of 5 by the fourth line of the code below.
For example, 3/14/14 was this past Friday; however, the code below marks 3/13/14 as a Friday even though it was a Thursday.
My guess is this has something to do with leap years, but it is only a guess. Any idea what is causing this and how to fix it?
LNKD.csv, https://drive.google.com/file/d/0B4xAKSwsHiEBNVpEbHJGMU9QYXc/edit?usp=sharing
LNKD Clean.csv, https://drive.google.com/file/d/0B4xAKSwsHiEBVjBKcTM1VVg3aU0/edit?usp=sharing
data <- read.csv("LNKD.csv", stringsAsFactors=FALSE)
data$Up <- NULL
data$Down <- NULL
data$weekday <- as.POSIXlt(data$Date, format="%m/%d/%y")$wday
data <- subset(data, data$weekday==5)
write.csv(data, file="LNKD Clean.csv", row.names=FALSE)
Thank you.

It's because your date format uses '%y' and not '%Y'.
'%y' is the two-digit year ('14') but '%Y' is the 4-digit year, and your years have 4 digits.
e.g.
as.POSIXlt('03/13/2014', format="%m/%d/%y")
# "2020-03-13"
as.POSIXlt('03/13/2014', format="%m/%d/%Y")
# "2014-03-13"
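All your dates are being interpreted as being in 2020: '%y' reads only the first two digits of '2014' ('20'), treats them as the year 2020, and silently ignores the trailing '14'. This also explains why the problem starts in March. 1 January fell on a Wednesday in both 2014 and 2020, so the weekdays agree through February. However, 2020 is a leap year, so from 1 March onward every 2020 date falls one weekday later than the same date in 2014 (13 March 2014 was a Thursday; 13 March 2020 was a Friday).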
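A minimal sketch that reproduces the shift, assuming dates stored as MM/DD/YYYY strings like the ones above (the vector x is made-up sample data). Parsing with %y puts every date in 2020, and the weekdays only diverge from March on:

```r
# Made-up sample strings in the question's MM/DD/YYYY format
x <- c("02/14/2014", "03/13/2014", "03/14/2014")

# Wrong: %y consumes only "20" and the trailing "14" is ignored
wrong <- as.POSIXlt(x, format = "%m/%d/%y", tz = "UTC")
# Right: %Y consumes the full four-digit year
right <- as.POSIXlt(x, format = "%m/%d/%Y", tz = "UTC")

data.frame(
  input  = x,
  year_y = wrong$year + 1900,  # POSIXlt years count from 1900
  wday_y = wrong$wday,         # 0 = Sunday, ..., 5 = Friday
  year_Y = right$year + 1900,
  wday_Y = right$wday
)
```

In the question's script, changing format="%m/%d/%y" to format="%m/%d/%Y" on the weekday line should be the only change needed.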

Related

Convert from character to date in a "YYYY-WW" format in R

I'm having a hard time converting a character string to a date in R.
I have a file where the dates are given as "2014-01", where the first part is the year and the second is the week of the year. I want to convert this to a date type.
I have tried the following
z <- as.Date('2014-01', '%Y-%W')
print(z)
Output: "2014-12-05"
That is not what I want. I want to get the same format out, i.e. the output should be "2014-01", but as a date type.
It sounds like you are dealing with some version of year week, which exists in three forms in lubridate:
week() returns the number of complete seven day periods that have occurred between the date and January 1st, plus one.
isoweek() returns the week as it would appear in the ISO 8601 system, which uses a reoccurring leap week.
epiweek() is the US CDC version of epidemiological week. It follows same rules as isoweek() but starts on Sunday. In other parts of the world the convention is to start epidemiological weeks on Monday, which is the same as isoweek.
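For a concrete comparison without extra packages, base R's format() can show two of these schemes: "%G-W%V" gives the ISO 8601 year and week (the system isoweek() follows), and the first definition can be computed from the day of the year. This is a sketch; the week()-style formula is derived from the description above, not taken from lubridate's source.

```r
d <- as.Date(c("2016-01-01", "2016-01-03", "2016-01-04"))  # Fri, Sun, Mon

# week()-style: complete seven-day periods since 1 January, plus one
week_since_jan1 <- (as.integer(format(d, "%j")) - 1) %/% 7 + 1

# ISO 8601 year and week: weeks start on Monday, week 1 contains 4 January
iso <- format(d, "%G-W%V")

# Sunday-based week of the year, for contrast (week 00 lasts until the first Sunday)
sunday_week <- format(d, "%U")

data.frame(date = d, week_since_jan1, iso, sunday_week)
```

The ISO column shows that the year a week belongs to can differ from the calendar year: 1 and 3 January 2016 fall in 2015-W53.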
The lubridate package has functions to extract these from a date, but I don't know of a built-in way to go the other direction, from a week to one representative day (out of 7 possible). If you're dealing with the first version, one simple way is to add 7 * (Week - 1) days to January 1 of the year.
library(dplyr)
data.frame(yearweek = c('2014-01', '2014-03')) %>%
tidyr::separate(yearweek, c("Year", "Week"), convert = TRUE) %>%
mutate(Date = as.Date(paste0(Year, "-01-01")) + 7 * (Week-1))
Year Week Date
1 2014 1 2014-01-01
2 2014 3 2014-01-15
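The same "January 1 plus 7 * (Week - 1) days" idea works in base R without dplyr or tidyr. A sketch, assuming every string has the "YYYY-WW" shape from the question; the round trip recomputes the week with the week() definition quoted above:

```r
yearweek <- c("2014-01", "2014-03")

# Split "YYYY-WW" into integer year and week vectors
parts <- do.call(rbind, strsplit(yearweek, "-", fixed = TRUE))
year <- as.integer(parts[, 1])
week <- as.integer(parts[, 2])

# January 1 of each year plus 7 * (week - 1) days
date <- as.Date(paste0(year, "-01-01")) + 7 * (week - 1)
date

# Round trip: complete seven-day periods since January 1, plus one
week_back <- (as.integer(format(date, "%j")) - 1) %/% 7 + 1
week_back
```

A Date object always stores a full day, so the "2014-01" label itself cannot live inside the Date; keep the original string in another column if you need to display it.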

as.Date produces unexpected result in a sequence of week-based dates

I am working on transforming week-based dates to month-based dates.
When checking my work, I found the following problem in my data, which is the result of a simple call to as.Date():
as.Date("2016-50-4", format = "%Y-%U-%u")
as.Date("2016-50-5", format = "%Y-%U-%u")
as.Date("2016-50-6", format = "%Y-%U-%u")
as.Date("2016-50-7", format = "%Y-%U-%u") # this is the problem
The previous code yields the correct dates for the first 3 lines:
"2016-12-15"
"2016-12-16"
"2016-12-17"
The last line of code, however, goes back 1 week:
"2016-12-11"
Can anybody explain what is happening here?
Working with weeks of the year can become very tricky. You may try converting the dates with the ISOweek package:
# create date strings in the format given by the OP
wd <- c("2016-50-4","2016-50-5","2016-50-6","2016-50-7", "2016-51-1", "2016-52-7")
# convert to "normal" dates
ISOweek::ISOweek2date(stringr::str_replace(wd, "-", "-W"))
The result
#[1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19" "2017-01-01"
is of class Date.
Note that the ISO week-based date format is yyyy-Www-d, with a capital W preceding the week number. This is required to distinguish it from the standard month-based date format yyyy-mm-dd.
So, to convert the date strings provided by the OP with ISOweek2date(), a W has to be inserted after the first hyphen, which is done by replacing the first "-" with "-W" in each string.
Also note that ISO weeks start on Monday and the days of the week are numbered 1 to 7. The year which belongs to an ISO week may differ from the calendar year. This can be seen from the sample dates above where the week-based date 2016-W52-7 is converted to 2017-01-01.
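If installing ISOweek is not an option, the ISO rules (weeks start on Monday, and week 1 is the week containing 4 January) can be sketched in base R. iso_week_to_date() is a hypothetical helper, not part of ISOweek, and it takes the OP's strings as they are, without the W:

```r
# Hypothetical helper: converts "YYYY-WW-D" strings, read as ISO year,
# ISO week and ISO weekday (1 = Monday ... 7 = Sunday), to Date
iso_week_to_date <- function(x) {
  parts <- do.call(rbind, strsplit(x, "-", fixed = TRUE))
  year <- as.integer(parts[, 1])
  week <- as.integer(parts[, 2])
  day  <- as.integer(parts[, 3])

  # ISO week 1 is the week that contains 4 January
  jan4 <- as.Date(paste0(year, "-01-04"))
  jan4_wday <- as.POSIXlt(jan4)$wday  # 0 = Sunday ... 6 = Saturday
  jan4_wday[jan4_wday == 0] <- 7L     # ISO numbering: Sunday is 7
  week1_monday <- jan4 - (jan4_wday - 1)

  week1_monday + 7 * (week - 1) + (day - 1)
}

wd <- c("2016-50-4", "2016-50-5", "2016-50-6", "2016-50-7", "2016-51-1", "2016-52-7")
iso_week_to_date(wd)
```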
About the ISOweek package
Back in 2011, the %G, %g, %u, and %V format specifications weren't available to strptime() in the Windows version of R. This was annoying, as I had to prepare weekly reports including week-on-week comparisons. I spent hours trying to find a solution for dealing with ISO weeks, ISO weekdays, and ISO years. Finally, I ended up creating the ISOweek package and publishing it on CRAN. Today, the package still has its merits, because %G, %g, and %V are ignored on input (see ?strptime for details).
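As @lmo said in the comments, %u stands for the weekday as a decimal number (1–7, with Monday as 1) and %U stands for the week of the year as a decimal number (00–53) using Sunday as the first day. Because a %U week runs from Sunday to Saturday, the Sunday (%u = 7) of week 50 is the first day of that week, 11 December 2016, four days before its Thursday (15 December). Thus, as.Date("2016-50-7", format = "%Y-%U-%u") results in "2016-12-11".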
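Formatting the dates of that week in the other direction makes the numbering visible. A short sketch covering Sunday 11 to Sunday 18 December 2016:

```r
days <- seq(as.Date("2016-12-11"), as.Date("2016-12-18"), by = "day")

# %U: Sunday-based week of the year; %u: weekday with Monday = 1, Sunday = 7
labels <- format(days, "%Y-%U-%u")
data.frame(date = days, weekday = weekdays(days), label = labels)

# Parsing the labels back returns the same dates
as.Date(labels, format = "%Y-%U-%u")
```

Sunday 11 December opens week 50, so "2016-50-7" points back to it, while the following Sunday is labelled "2016-51-7".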
However, if that should give "2016-12-18", then you need a week format that also uses Monday as the starting day. From the documentation in ?strptime, you would expect the format "%Y-%V-%u" to give the correct output, since %V stands for the week of the year as a decimal number (01–53) as defined in ISO 8601, with Monday as the first day.
Unfortunately, it doesn't:
> as.Date("2016-50-7", format = "%Y-%V-%u")
[1] "2016-01-18"
However, the end of the description of %V says "Accepted but ignored on input", meaning that it has no effect when parsing.
You can circumvent this behavior as follows to get the correct dates:
# create a vector of dates
d <- c("2016-50-4","2016-50-5","2016-50-6","2016-50-7", "2016-51-1")
# convert to the correct dates
as.Date(paste0(substr(d,1,8), as.integer(substring(d,9))-1), "%Y-%U-%w") + 1
which gives:
[1] "2016-12-15" "2016-12-16" "2016-12-17" "2016-12-18" "2016-12-19"
The issue arises because, for %u, 1 is Monday and 7 is Sunday. The problem is further complicated by the fact that %U assumes the week begins on Sunday.
Given those definitions of format = "%Y-%U-%u", the output of the fourth line is consistent with the output of the previous three lines.
That is, if you want to use format = "%Y-%U-%u", you should pre-process your input. In this case, the fourth line would have to be as.Date("2016-51-7", format = "%Y-%U-%u") as revealed by
format(as.Date("2016-12-18"), "%Y-%U-%u")
# "2016-51-7"
Instead, you are currently passing "2016-50-7".
A better way might be the approach suggested in Uwe Block's answer. Since you are happy with "2016-50-4" being transformed to "2016-12-15", I suspect that your raw data also counts Monday as day 1 of the week. You could also create a custom function that changes the value of %U so that the week number is counted as if the week began on Monday, which gives the output you expected.
# Function to change the value of %U so that the week begins on Monday
# Note: expects a single string, not a vector
pre_process = function(x, delim = "-"){
  y = unlist(strsplit(x, delim))
  # If the day is 7 (Sunday for %u) in week 53,
  # move it to week 00 of the next year
  # I think there might be a better solution for this
  if (y[2] == "53" && y[3] == "7"){
    x = paste(as.integer(y[1]) + 1, "00", y[3], sep = delim)
  } else if (y[3] == "7"){
    # If the day is 7 (Sunday for %u), add 1 to the week
    x = paste(y[1], as.integer(y[2]) + 1, y[3], sep = delim)
  }
  return(x)
}
And usage would be
as.Date(pre_process("2016-50-7"), format = "%Y-%U-%u")
# [1] "2016-12-18"
I'm not quite sure how to handle years that end on a Sunday.
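A vectorized sketch of the same idea, under the same assumption as pre_process() (the input counts weeks from Monday but uses %U week numbers, so a Sunday belongs to the following %U week): parse as usual, then move Sundays forward seven days. parse_monday_weeks() is a hypothetical name, and the year-end cases mentioned above have not been checked with it.

```r
# Hypothetical helper: vectorized alternative to pre_process() + as.Date()
parse_monday_weeks <- function(x) {
  parsed <- as.Date(x, format = "%Y-%U-%u")
  # The last "-" field is the %u weekday; 7 means Sunday
  wday <- as.integer(sub(".*-", "", x))
  # A Sunday closes the Monday-based week, i.e. it lies in the next %U week
  parsed + 7 * (wday == 7)
}

d <- c("2016-50-4", "2016-50-5", "2016-50-6", "2016-50-7", "2016-51-1")
parse_monday_weeks(d)
```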

R: Get workweek number, not seven day periods since Jan 1st

Hi, I am looking at commodity price data covering a period of a few years. I want to summarize prices by work weeks, not by weeks defined as seven-day periods since Jan 1st. When I tried:
data <- mutate(data, week = week(strptime(Date, "%m/%d/%Y")))
The lubridate week() function counts "1/13/10" (mdy) as week 2 and "1/14/10" as week 3. I want those to be in the same week: basically, any run of Mon-Fri in the same week. If the year starts on a Wednesday, I want week 1 to be Wed-Fri and week 2 to start the next Monday. I have no data on any weekends. Any thoughts? Thanks
This will give you the week number, assuming the Date column is of class Date (you can use as.Date() to convert it):
data <- mutate(data, week = format(Date, '%U'))
If you want week and year, you can use:
data <- mutate(data, week = format(Date, '%Y-%U'))
It will correctly number partial weeks.
Note: week numbers start at 00 (but that should not be a problem). format() returns character strings such as "02", so wrap it in as.integer() if you need numbers.
You can also do it WITHOUT dplyr and its mutate, like this:
data$week <- format(data$Date, '%U')
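A quick way to see the difference, using a few January 2010 weekdays as made-up sample data (1 January 2010 was a Friday). The week()-style numbers are computed from the definition in lubridate's documentation (complete seven-day periods since 1 January, plus one) rather than by calling lubridate:

```r
dates <- as.Date(c("2010-01-01", "2010-01-04", "2010-01-08",
                   "2010-01-11", "2010-01-13", "2010-01-14", "2010-01-15"))

# Sunday-based week of the year: Monday to Friday always share a number
work_week <- format(dates, "%U")

# Seven-day periods counted from 1 January, plus one
week_since_jan1 <- (as.integer(format(dates, "%j")) - 1) %/% 7 + 1

data.frame(date = dates, day = weekdays(dates), work_week, week_since_jan1)
```

With %U, Monday 4 January to Friday 8 January share week 01, while the seven-day count splits that run between weeks 1 and 2. %W (weeks starting on Monday) would also keep each Monday-to-Friday run together.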

How to convert decimal date format (e.g. 2011.580) to normal date format?

I'm trying to change from the decimal date format (return type of cpts.ts() from the changepoint package) to the normal date format %Y-%m-%d. Example:
cpts.ts(myTimeSeries.BinSeg)
[1] 2001.667 2004.083 2008.750 2011.583 2011.917
The actual dates are sometime around August 2001, January 2004, September 2008, June/July 2011 and December 2011 (I don't know them exactly, I'm reading them off a graph).
I can't seem to find a standard method of converting this format back to the usual date format.
Can anybody help me?
Thanks
Slightly different results with lubridate:
library(lubridate)
decimals <- c(2001.667, 2004.083, 2008.750, 2011.583, 2011.917)
format(date_decimal(decimals), "%Y-%m-%d")
# [1] "2001-09-01" "2004-01-31" "2008-10-01" "2011-08-01" "2011-12-01"
> foo <- c(2001.667,2004.083,2008.750,2011.583,2011.917)
> as.Date(paste(trunc(foo),round((foo-trunc(foo))*365,0)),"%Y %j")
[1] "2001-08-31" "2004-01-30" "2008-09-30" "2011-08-01" "2011-12-01"
Look at ?as.Date and its format parameter, which will direct you to ?strptime, from which I took the %j format specification.
You may need to adapt this for some corner cases, like January 1st: a fractional part that rounds to day 0 is not a valid %j value (001–366).
For those considering a base R solution to this issue, the core of lubridate's date_decimal is essentially:
start <- as.POSIXct(paste0(trunc(foo), "/01/01"), tz="UTC")
end <- as.POSIXct(paste0(trunc(foo)+1,"/01/01"), tz="UTC")
start + (difftime(end, start, units="secs") * (foo - trunc(foo)))
That is: set a start date at the beginning of the year in which the date occurs, set an end date at the beginning of the following year, multiply the difference between start and end by the fraction of the year elapsed, and add this difference back to the start. This takes leap years into account and works correctly for January 1st.
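Those steps can be wrapped in a small function and checked against the lubridate output shown earlier. decimal_to_date() is a hypothetical name; this is a sketch of the same idea, not lubridate's actual code:

```r
# Hypothetical helper: decimal years (e.g. 2011.583) to POSIXct in UTC
decimal_to_date <- function(x) {
  year  <- trunc(x)
  start <- as.POSIXct(paste0(year, "-01-01"), tz = "UTC")      # start of that year
  end   <- as.POSIXct(paste0(year + 1, "-01-01"), tz = "UTC")  # start of the next year
  # Scale the length of the year (in seconds) by the elapsed fraction
  start + difftime(end, start, units = "secs") * (x - year)
}

foo <- c(2001.667, 2004.083, 2008.750, 2011.583, 2011.917)
format(decimal_to_date(foo), "%Y-%m-%d")
format(decimal_to_date(2012), "%Y-%m-%d")  # exactly January 1st
```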

From MMDD to day of the year in R

I have this .txt file:
http://pastebin.com/raw.php?i=0fdswDxF
The first column (Date) shows the date as month and day (MMDD).
So 0601 is the 1st of June.
When I load this into R and display the data, the leading 0 is dropped.
So when loaded it looks like:
601
602
etc
For 1st of June, 2nd of June
For the months 10,11,12, it remains unchanged.
How do I change it back to 0601 etc.?
What I am trying to do is to change these days into the day of the year, for instance,
1st of January (0101) would be 1, and 31st of December would be 365.
There is no leap year to be considered.
I have code to do this conversion if my data were stored as 0601 etc., but not as 601 etc.
copperNew$Date = as.numeric(as.POSIXct(strptime(paste0("2013",copperNew$Date), format="%Y%m%d")) -
as.POSIXct("2012-12-31"), units = "days")
Where Date of course is from the file linked above.
Please ask if you do not consider the description to be good enough.
You can use colClasses in the read.table function, then convert to POSIXlt and extract the day of the year. You are overcomplicating the process.
copperNew <- read.table("http://pastebin.com/raw.php?i=0fdswDxF", header=TRUE,
colClasses=c("character", "integer", rep("numeric", 3)))
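tmp <- as.POSIXlt( copperNew$Date, format='%m%d' )
copperNew$Yday <- tmp$yday + 1  # yday is 0 for January 1st
The as.POSIXlt function is able to parse a string without a year (it assumes the current year, so in a leap year dates after February are shifted by one) and computes the day of the year for you; its yday element counts from 0, hence the + 1.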
d<-as.Date("0201", format = "%m%d")
strftime(d, format="%j")
#[1] "032"
First, parse your string to obtain a Date object (note that the current year will be added, so if you want to count days for a specific year, add it to your string: as.Date("1988-0201", format = "%Y-%m%d")).
strftime() converts your Date to a POSIXlt object and returns the day of the year as a string. If you want the result to be a numeric value, you can do it like this: as.numeric(strftime(d, format = "%j")) (thanks Gavin Simpson).
Convert it to POSIXlt using a year that is not a leap-year, then access the yday element and add 1 (because yday is 0 on January 1st).
strptime(paste0("2011","0201"),"%Y%m%d")$yday+1
# [1] 32
From start-to-finish:
x <- read.table("http://pastebin.com/raw.php?i=0fdswDxF",
colClasses=c("character",rep("numeric",5)), header=TRUE)
x$Date <- strptime(paste0("2011",x$Date),"%Y%m%d")$yday+1
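If the column has already been read as a number (601 rather than "0601"), here is a sketch of restoring the leading zero with sprintf() and then computing the day of the year. It fixes the year at 2013 (not a leap year), as the question's own code does; mmdd_int is made-up sample data:

```r
# Integer codes as they look after R dropped the leading zero (made-up sample)
mmdd_int <- c(101L, 601L, 1231L)

# Pad back to four digits: 601 -> "0601"
mmdd <- sprintf("%04d", mmdd_int)

# Day of the year in a fixed non-leap year (2013), with 1 January = 1
yday <- as.integer(format(as.Date(paste0("2013", mmdd), format = "%Y%m%d"), "%j"))
mmdd
yday
```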
In which language?
If it's something like C#, Java or Javascript, I'd follow these steps:
1-) parse a pair of integers from that column;
2-) create a datetime variable whose day and month are taken from the integers from step one. Set the year to some fixed value, or to the current year.
3-) create another datetime variable whose date is the 1st of January of the same year as the one in step 2.
The number of the day is the difference in days between the datetime variables, + 1 day.
This one worked for me:
copperNew <- read.table("http://pastebin.com/raw.php?i=0fdswDxF",
header=TRUE, sep=" ", colClasses=c("character",
"integer",
rep("numeric", 3)))
copperNew$diff = difftime(as.POSIXct(strptime(paste0("2013",copperNew$Date),
format="%Y%m%d", tz="GMT")),
as.POSIXct("2012-12-31", tz="GMT"), units="days")
I had to specify the timezone (tz argument in as.POSIXct), otherwise I got two different timezones for the vectors I am subtracting and therefore non-integer days.
