converting data.frames into time series - r

Here we go again; I am still banging my head against the wall on the above problem.
I have a data.frame that I upload via csv which looks like:
X SPY VTI
01.02.2002 0.0000 0.0000
04.02.2002 -2.4578 -2.4167
.....
31.12.2015 -1.003 -0.9685
where X is the date and SPY and VTI are stock returns.
I tried many things to convert this to a time series. First I tried
spyvti$X <- as.Date(as.character(spyvti$X),format="%d.%m.%Y.")
and what I get is:
X SPY VTI
NA 0.0000 0.0000
NA -2.4856 -2.4167
.....
NA -1.003 -0.9685
So it looks like it can't convert the first column, which is a factor, into an object of class Date.
I also tried splitting the data.frame into 3 separate vectors and first converting the date vector to character, which worked, then
date <- as.Date(date, format = "%d.%m.%Y.")
error in charToDate(x):
character string is not in a standard unambiguous format.
So I'd like to get some help with overcoming the Date problem, and I'd like to know if, when the date problem is over, creating a ts object as below is correct
tsobject <- xts(date,spy)
where spy is a numeric.
Thanks a lot
Paolo

Use the "lubridate" package. It makes conversion of dates super easy.
library(lubridate)
spyvti$X <- dmy(spyvti$X)  # the column is named X; R is case-sensitive

I am writing this from memory and have not tested it, but I hope it works. You can try the following:
library(xts)
Yourdataframe$X <- as.POSIXct(strptime(as.character(Yourdataframe$X), format = "%d.%m.%Y"))
Yourdataframe <- xts(Yourdataframe[, 2:3], order.by = Yourdataframe[, 1])

Assuming your example data frame is named df, you can convert it into an xts time series object like so:
library(xts)
xtsObject <- as.xts(df[,-1],order.by = as.Date(as.character(df[,1]), format = "%d.%m.%Y"))
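As a minimal base-R sketch of the date fix discussed in this thread: the format string has to match the text exactly, so the trailing dot in "%d.%m.%Y." yields NA, while "%d.%m.%Y" parses. The data frame below is a made-up stand-in for the CSV in the question, and no xts call is made.

```r
# Hypothetical stand-in for the CSV from the question
spyvti <- data.frame(
  X   = c("01.02.2002", "04.02.2002", "31.12.2015"),
  SPY = c(0.0000, -2.4578, -1.003),
  VTI = c(0.0000, -2.4167, -0.9685),
  stringsAsFactors = TRUE  # mimic the factor column described in the question
)

# The trailing "." in this format never occurs in the data, so nothing parses
bad_dates <- as.Date(as.character(spyvti$X), format = "%d.%m.%Y.")

# A format that matches the data exactly
good_dates <- as.Date(as.character(spyvti$X), format = "%d.%m.%Y")

# Store the parsed dates back in the data frame
spyvti$X <- good_dates
```

Once the column is of class Date, it can serve as the order.by index, with the numeric columns as the data, as in the xts answers in this thread.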

Related

How to convert a date with only a year to a date with the format "Year-Month-Day" in R

Sorry for the question, I started using RStudio a month ago and I keep running into things I've never learned. I have checked every website, help page and forum I could find over the past two days and this is driving me crazy.
I have a variable called Release giving the release date of a song. Some dates follow the format %Y-%m-%d whereas others give only a year.
I'd like them all to have the same format, but I'm struggling to modify only the observations that contain just a year.
Brief summary in word:
11/11/2011
01/06/2011
1974
1970
16/09/2003
I've imported the data with :
music<-read.csv("music2.csv", header=TRUE, sep = ",", encoding = "UTF-8",stringsAsFactors = F)
And this is how it looks in RStudio:
"2011-11-11" "2011-06-01" "1974" "1970" "2003-09-16"
This is just an example; I have 2200 observations.
The working code is
Modifdates<- ifelse(nchar(music$Release)==4,paste0("01-01-",music$Release),music$Release)
Modifdates
I obtain this :
"2011-11-11" "2011-06-01" "01-01-1974" "01-01-1970" "2003-09-16"
I just would like them to be all with the same format "%Y-%m-%d". How can I do that?
So I tried this
as.Date(music$Release,format="%Y-%m-%d")
But I got NA's where I modified my dates.
Could anyone help?
Update
Use sub() to find the entries that consist of a single year (the "(^[0-9]{4}$)" part), use a back-reference to append -01-01 to the end of the string (the "\\1-01-01" part), and finally convert the result to the Date class with as.Date() (by default as.Date() tries format = "%Y-%m-%d" first, so you don't need to specify it):
dat <- c("2011-11-11", "2011-06-01", "1974", "1970", "2003-09-16")
When dat is of class character:
as.Date(sub("(^[0-9]{4}$)", "\\1-01-01", dat))
# "2011-11-11" "2011-06-01" "1974-01-01" "1970-01-01" "2003-09-16"
When dat is a factor, sub() automatically coerces it to character for you:
# dat <- as.factor(dat); dat
# 2011-11-11 2011-06-01 1974 1970 2003-09-16
# Levels: 1970 1974 2003-09-16 2011-06-01 2011-11-11
as.Date(sub("(^[0-9]{4}$)", "\\1-01-01", dat))
# "2011-11-11" "2011-06-01" "1974-01-01" "1970-01-01" "2003-09-16"
Welcome to SO, please try to provide a reproducible example next time so that we can best help you.
I think here you could use:
testdates <- c("1974", "12-12-2012")
betterdates <- ifelse(nchar(testdates)==4,paste0("01-01-",testdates),testdates)
> betterdates
[1] "01-01-1974" "12-12-2012"
EDIT: if your vector is a factor, you should convert it with as.character first. If you then want to convert it back to a factor, you can use as.factor.
EDIT2: do not convert with as.Date before doing this. Only do it after this modification.
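A hedged sketch combining the two answers above: pad the year-only entries in year-month-day order (appending "-01-01" rather than prepending "01-01-"), so that every string matches "%Y-%m-%d" before as.Date() is called. The vector release is a made-up stand-in for music$Release, and choosing January 1st for year-only entries is an assumption carried over from the answers.

```r
# Hypothetical stand-in for music$Release
release <- c("2011-11-11", "2011-06-01", "1974", "1970", "2003-09-16")

# Entries with exactly four characters are treated as a bare year
year_only <- nchar(release) == 4

# Append month and day so the padded strings read year-month-day
padded <- ifelse(year_only, paste0(release, "-01-01"), release)

# Every entry now matches the same format
release_date <- as.Date(padded, format = "%Y-%m-%d")
```

Checking anyNA(release_date) afterwards is a quick way to spot entries that still do not fit the format.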

Converting my dates into a POSIXct class

I'm currently working my way through the adehabitatLT package.
I've put my date_time column into characters and named it da:
da<-as.character(dat$date_time)
head(da)
[1] "7/08/2015 0:22" "7/08/2015 0:52" "7/08/2015 1:22" "7/08/2015 1:52" "7/08/2015 2:56" "7/08/2015 3:26"
As you can see, my date_time input is a bit non-traditional and I think this is where the error occurs, because when I create the POSIXct class:
da<-as.POSIXct(strptime(as.character(dat$date_time),"%d/%m/%y% H:%M:%S"))
It creates the class but I get NA for all my values:
head(da)
[1] NA NA NA NA NA NA
My end objective here is to create an object of the class ltraj (but not only containing the date but the time as well).
Any ideas anyone?
Kind regards,
Sam
da<-as.POSIXct(strptime(as.character(locs$Date),"%y%m%d"))
The format should be modified to
as.POSIXct(strptime(da, "%d/%m/%Y %H:%M"))
Or if month is first followed by day, then change it to "%m/%d/%Y %H:%M"
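A small sketch of the corrected format applied to values copied from the question. The tz = "UTC" argument is an assumption added here so that the result does not depend on the local time zone; the original answer does not set it.

```r
# Sample values copied from the question
da <- c("7/08/2015 0:22", "7/08/2015 0:52", "7/08/2015 2:56")

# %Y is the four-digit year, and the input has no seconds field,
# so the format stops at %M
parsed <- as.POSIXct(strptime(da, format = "%d/%m/%Y %H:%M", tz = "UTC"))

# Print in ISO order to confirm day and month were read as intended
iso <- format(parsed, "%Y-%m-%d %H:%M")
```

Single-digit days and hours such as "7" and "0" are accepted by %d and %H without leading zeros.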
When parsing tricky date/time formats, it might be useful to use the lubridate package by Garrett Grolemund and Hadley Wickham.
In your case, simply do
require(lubridate)
a <- dmy_hm(da)
The separator and the number of digits for day or month or hours etc are automatically parsed.
Find more info here

converting hours:minutes with uneven column lengths - zeros

I am trying to convert a data.frame that holds amounts of time in the format hours:minutes.
I found this post useful and like the simple approach of using the POSIXlt class.
R: Convert hours:minutes:seconds
However, each column represents a month's worth of days, so the columns are uneven. When I try the code below, following several other SO posts, I get zeros in the one column with fewer values.
The code is below. Note that when it is run, you get all zeros for feb, which has fewer values in its rows.
rDF <- data.frame(jan=c("9:59","10:02","10:04"),
feb=c("9:59","10:02",""),
mar=c("9:59","10:02","10:04"),stringsAsFactors = FALSE)
for (i in 1:3) {
Res <- as.POSIXlt(paste(Sys.Date(), rDF[,i]))
rDF[,i] <- Res$hour + Res$min/60
}
Thank you for any suggestions to fix this issue. I'm open to a more efficient approach as well.
Best,
Leah
You could try using the package lubridate. Here we are converting your data column by column to hour-minute format (using hm), then extracting the hours, and adding the minutes divided by 60:
library(lubridate)
rDF[] <- lapply(rDF, function(x){hm(x)$hour + hm(x)$minute/60})
jan feb mar
1 9.983333 9.983333 9.983333
2 10.033333 10.033333 10.033333
3 10.066667 NA 10.066667
This could easily be achieved with package lubridate's hm:
library(lubridate)
temp<-lapply(rDF,hm)
NewDF<-data.frame(jan=temp[[1]],feb=temp[[2]],mar=temp[[3]])
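For comparison, a base-R sketch that avoids date parsing altogether by splitting each "h:mm" string on the colon. The helper name to_decimal_hours is hypothetical, and treating empty strings as NA is an assumption about the desired result. The zeros in the question presumably arise because as.POSIXlt picks one format for the whole vector, and the empty entry only fits a date-only format.

```r
# Data frame copied from the question
rDF <- data.frame(jan = c("9:59", "10:02", "10:04"),
                  feb = c("9:59", "10:02", ""),
                  mar = c("9:59", "10:02", "10:04"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: "h:mm" strings to decimal hours, blanks become NA
to_decimal_hours <- function(x) {
  x[x == ""] <- NA
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) {
    # Anything that is not exactly hours and minutes is returned as NA
    if (length(p) != 2 || anyNA(p)) return(NA_real_)
    as.numeric(p[1]) + as.numeric(p[2]) / 60
  }, numeric(1))
}

# rDF[] keeps the data frame structure while replacing every column
rDF[] <- lapply(rDF, to_decimal_hours)
```

Because each value is handled on its own, a blank in one row does not affect the other rows of that column.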

Creating XTS with correct time values

I'm trying to create a new XTS object for a set of intraday FX data. The initial dataframe is called "one_day_series" and looks like this:
pair id date_time bid ask mid_price
1 USDCAD 485194239 2009-08-03 08:00:00.451 1.07679 1.07699 1.07689
The command I use to create an XTS object is as follows:
my_xts <- xts(one_day_series[,6], as.POSIXct(strptime(one_day_series[,3], "%Y-%m-%d %H:%M:%S")))
I get an XTS object out of this; however, the time has been reset to 00:00:02. An example:
row.names V1
1 2009-08-03 00:00:02 1.07591
I just want the time to be created correctly from the dataframe, so I would be grateful if someone could help me understand what is going wrong here.

Operation with a date vector returns this message: Error in charToDate(x)

I have a problem with some date variables in my data. I already checked other similar questions here but I couldn't find the answer.
I have a very long dataset and some date vectors. The data was originally in stata format, I've tried to change them into R date format with:
as.Date(example$dstart)
which seemed to work when I checked the class of the vector. But then I realised that some cases are apparently not in the standard unambiguous format that R requires: when I tried to convert "." into NAs, I got this message:
Error in charToDate(x) :
character string is not in a standard unambiguous format
This is an example of the data that I have:
head(sample)
dstart dstart2 dleave Ind
2005-03-20 <NA> 2005-11-19 1
2005-10-27 2006-07-07 2005-11-15 2
2000-02-29 2008-04-16 2005-03-02 3
2003-09-10 2007-07-23 2005-04-05 4
2004-04-24 2006-02-28 2005-10-17 5
2005-08-16 <NA> 2005-08-20 6
I presume that there are a few cases in the wrong format, but I don't know how to identify those cases.
Could you please advise me how to change the format of the date vector into an R format? I've tried this but it doesn't solve my problem.
as.Date(example$dstart, format = "%Y/%m/%d")
This has caused some problems in my analysis when sorting by date: some dates are sorted earlier when they are obviously later.
A sample of the data
Your date format specification is using "/" instead of "-". If all your data is like your example, this should do it:
as.Date(example$dstart, format = "%Y-%m-%d")
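To identify the entries that do not fit the format, one possible approach is to parse with an explicit format and look at which non-missing inputs come back as NA. The vector below is made up for illustration: the "." placeholder and the slash-separated date are assumptions about what the bad cases might look like.

```r
# Hypothetical raw values, including a "." placeholder and a stray format
dstart <- c("2005-03-20", ".", "2000-02-29", "2003/09/10", NA)

# With an explicit format, unparseable entries become NA instead of an error
parsed <- as.Date(dstart, format = "%Y-%m-%d")

# Positions that were present in the input but failed to parse
bad_idx <- which(is.na(parsed) & !is.na(dstart))

# The offending raw values, for inspection
bad_values <- dstart[bad_idx]
```

Inspecting bad_values shows which alternative formats or placeholders need separate handling before the column is sorted.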
