R - converting dataframe to xts adds quotations (in xts) - r

I am using a dataframe to create an xts.
The xts is created, but every value except the index is wrapped in quotation marks. As a result I cannot use the data, because many functions, such as sum, do not work.
Any ideas how I can get an xts produced without the quotations?
Here is my code [updated after a comment about inconsistent names of dataframes/xts]:
# creates a dataframe with dates and currency
mydf3 <- data.frame(date = c("2013-12-10", "2015-04-01",
"2016-01-03"), plus = c(4, 3, 2), minus = c(-1, -2, -4))
# transforms the column date to date-format
mydf3 = transform(mydf3,date=as.Date(as.character(date),format='%Y-%m-%d'))
# creates the xts, based on the dataframe mydf3
myxts3 <- xts(mydf3, order.by = mydf3$date)
# removes the column date, since date is now stored as index in the xts
myxts3$date <- NULL

You need to realize that the underlying data structure that stores your data in the xts object is an R matrix object, which can only be of one R type (e.g. all numeric or all character). The timestamps are stored as a separate vector (your date column in this case) which is used for indexing/subsetting the data by time.
The cause of your problem is that the date column forces the data matrix in the xts object to be converted to a character matrix instead of a numeric one. A Date column is coerced to character when it is included in the matrix:
> as.matrix(mydf3)
date plus minus
[1,] "2013-12-10" "4" "-1"
[2,] "2015-04-01" "3" "-2"
[3,] "2016-01-03" "2" "-4"
Any time the data you pass to xts (the x argument) contains a non-numeric column, you'll get this kind of problem.
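The coercion can be reproduced in base R without xts. The following is a minimal sketch; the data frame df is a hypothetical copy of mydf3 from the question, and m_all and m_num are illustrative names:

```r
# Hypothetical data frame mirroring mydf3 from the question
df <- data.frame(
  date  = as.Date(c("2013-12-10", "2015-04-01", "2016-01-03")),
  plus  = c(4, 3, 2),
  minus = c(-1, -2, -4)
)

# With the Date column included, every cell is coerced to character
m_all <- as.matrix(df)
typeof(m_all)

# With only the numeric columns, the matrix stays numeric
m_num <- as.matrix(df[, c("plus", "minus")])
typeof(m_num)

# Numeric functions such as sum() work again on the numeric matrix
sum(m_num)
```

Since a matrix holds a single type, leaving the date out of x and passing it only through order.by keeps the xts data numeric.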
Your issue can be resolved as follows (which wici has shown in the comments)
myxts3 <- xts(x= mydf3[, c("plus", "minus")], order.by = mydf3[, "date"])
> coredata(myxts3)
plus minus
[1,] 4 -1
[2,] 3 -2
[3,] 2 -4
> class(coredata(myxts3))
[1] "matrix"

The date part should be in the index, not part of the data. read.zoo will do that for you producing a zoo object. You can then convert that to xts if you need it in that form.
library(xts)
as.xts(read.zoo(mydf3))
## plus minus
## 2013-12-10 4 -1
## 2015-04-01 3 -2
## 2016-01-03 2 -4

Related

How to extract date from time series and convert it to date in R

I have a dataset consisting of 4 variables, namely date, gold price, crude price and dollar price. I converted the data to a time series using the ts() function. After the conversion, the dates were changed into numeric values. I am able to retrieve the dates from the converted values using the as.Date() function. Now I want to replace the numeric values in the ts object's date variable with the dates themselves.
#CONVERTING DATA FRAME TO TIME SERIES OBJECT.
Gold.ts <- ts(Gold,start=Gold$DATE[1])
head(Gold.ts)
#OUTPUT.
DATE GOLD.PRICE CRUDE DOLLAR.INR
[1,] 13152 533.9 63.42 44.705
[2,] 13153 526.3 62.79 44.600
[3,] 13154 539.7 64.21 44.320
head(as.Date(index(Gold.ts)))
[1] "2006-01-04" "2006-01-05" "2006-01-06" "2006-01-07" "2006-01-08" "2006-01-09"
Gold.ts$DATE <- as.Date(index(Gold.ts)) # This won't work because $ cannot be used to extract variables from a time series object.
index(Gold.ts) <- as.Date(index(Gold.ts)) # This should work but gives an error.
How can I display dates instead of numeric values in the time series object, i.e. Gold.ts?
What is the right way to do it?

Convert a date vector to ranks

I have the following date vector:
d <- c("30/5/15", "6/6/15", "23/5/15")
I would like to convert it to 2, 3, 1, with the smallest rank for the oldest date and the largest rank for the newest.
I tried rank(d), but it seems to rank by day only and in reverse order; it returns 3, 1, 2.
Convert to Date class, then numeric, then rank:
d <- c("30/5/15", "6/6/15", "23/5/15")
rank(as.numeric(as.Date(d, "%d/%m/%y")))
#[1] 2 3 1
Suggestions from comments:
drop as.numeric, as rank can handle dates. Although it might be preferable to be explicit.
use lubridate package: library(lubridate); rank(dmy(d))
Convert the data to Date format and then rank it. Internally, dates are stored as numeric values, so rank can work on them.
d <- c("30/5/15", "6/6/15", "23/5/15")
rank(as.Date(d,'%d/%m/%y'))
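To illustrate the point about numeric storage, here is a small base R sketch. It reuses d from the question; dates, r_date and r_num are illustrative names:

```r
d <- c("30/5/15", "6/6/15", "23/5/15")
dates <- as.Date(d, format = "%d/%m/%y")

# A Date is stored as the number of days since 1970-01-01
unclass(as.Date("1970-01-02"))

# Ranking the Date vector and ranking its numeric form give the same result
r_date <- rank(dates)
r_num  <- rank(as.numeric(dates))
r_date
r_num

# order() returns the positions from oldest to newest
d[order(dates)]
```

Both rank calls return 2 3 1 here, as in the answers above, so dropping as.numeric is a matter of style rather than correctness.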

Converting numeric dates into Hours since format

I'm having problems converting dates into an "hours since" format. I don't think I need any of the more complex methods of date manipulation, but I might be wrong, and I was hoping someone might know a good solution.
The data I have is in a table format, which I read in from a text file. A 3-line example of the 5,000+ rows of data is:
date1 <- matrix(c(2007,2007,2007, 12,12,12,1,2,3,0.365,0.096,-0.416),nrow=3)
Which prints out as:
date1
[,1] [,2] [,3] [,4]
[1,] 2007 12 1 0.365
[2,] 2007 12 2 0.096
[3,] 2007 12 3 -0.416
The first column is the year, the second the month, and third the day. The value in the 4th column is an index value relevant to my study.
The data I would like to match the index value against is in a slightly odd format: hours since "1800-01-01".
ftime <- c(1822548, 1822572, 1822596)
ftime can be printed as just a date, via the following function.
as.Date(ftime/24,"1800-01-01")
[1] "2007-12-01" "2007-12-02" "2007-12-03"
All of my code uses the numeric values in ftime to match data, but I cannot work out how to convert the new data (date1) into the same format.
I have the feeling it should be a simple solution, but cannot seem to get it to work.
Help is always greatly appreciated!
You can use the difftime function, if I got what you want:
#setting the origin
myorigin<-as.Date("1800-01-01")
#converting date1 to Date objects
myDates<-as.Date(do.call(function(...) paste(...,sep="-"),as.data.frame(date1[,1:3])))
#get the results
difftime(myDates,myorigin,units="hour")
#Time differences in hours
#[1] 1822536 1822560 1822584
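Note that these values are 12 hours smaller than the ftime values in the question, which suggests that ftime is stamped at noon rather than midnight. Assuming that is the case (it is an inference from the numbers, not something the question states), a sketch using ISOdate, which defaults to hour 12 in GMT, reproduces ftime; stamps, origin and hours_since are illustrative names:

```r
date1 <- matrix(c(2007, 2007, 2007, 12, 12, 12, 1, 2, 3,
                  0.365, 0.096, -0.416), nrow = 3)
ftime <- c(1822548, 1822572, 1822596)

# ISOdate() defaults to hour = 12 and tz = "GMT", i.e. noon on each day
stamps <- ISOdate(date1[, 1], date1[, 2], date1[, 3])

# The origin is midnight on 1800-01-01, so the hour is set explicitly
origin <- ISOdate(1800, 1, 1, hour = 0)

# Hours elapsed since the origin, as plain numbers
hours_since <- as.numeric(difftime(stamps, origin, units = "hours"))
hours_since

# Position of each computed value in ftime
match(hours_since, ftime)
```

If the real ftime values turn out to be stamped at a different hour, only the hour passed to ISOdate for stamps needs to change.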

Extract numbers from string as numeric or dates in R

I am working with some hdf5 data sets. The dates are stored inside the files, and the file names give no hint of them. The attribute file consists of day of the year, month of the year, day of the month and year columns.
I would like to pull out this data to create a time series identifier for each of the files, i.e. a year-month-day format that can be used for time series.
A sample of the data can be downloaded here:
[ ftp://l5eil01.larc.nasa.gov/tesl1l2l3/TES/TL3COD.003/2007.08.31/TES-Aura_L3-CO_r0000006311_F01_09.he5 ]
There is an attribute group file and a data group file.
I use the R library "rhdf5" to explore the hdf5 files. E.g
CO1<-h5ls ("TES-Aura_L3-CO_r0000006311_F01_09.he5")
Attr<-h5read("TES-Aura_L3-CO_r0000006311_F01_09.he5","HDFEOS INFORMATION/coremetadata")
Data<-h5read("TES-Aura_L3-CO_r0000006311_F01_09.he5", "HDFEOS/SWATHS/ColumnAmountNO2/Data Fields/ColumnAmountNO2Trop")
When read, Attr consists of a long string in which the only required information is "2007-08-31", the date of acquisition. I have been able to extract this using the stringr library:
regexp <- "([[:digit:]]{4})([-])([[:digit:]]{2})([-])([[:digit:]]{2})"
Date<-str_extract(Attr,pattern=regexp)
which returns the Date as:
"2007-08-31"
The only problem left is that Date isn't recognised as numeric or as a date. How do I change this? I need to bind the dates to the data for all days to create a time series (more like an identifier, as the data sets are irregular). A sample of how it looks after extracting the dates from the strings and binding them with the CO values for each date is below:
Dates CO3b
[1,] "2011-03-01" 1.625811e+18
[2,] "2011-03-04" 1.655504e+18
[3,] "2011-03-11" 1.690428e+18
[4,] "2011-03-15" 1.679871e+18
[5,] "2011-03-17" 1.705987e+18
[6,] "2011-03-17" 1.661198e+18
[7,] "2011-03-17" 1.662694e+18
[8,] "2011-03-20" 1.520328e+18
[9,] "2011-03-21" 1.510642e+18
[10,] "2011-03-21" 1.556637e+18
However, R recognises these dates as character and not as date. I need to convert them to a time series I can work with.
Seems like you've already done all the hard work! Based on your comment, here's how you could take it across the finish line.
From your comment, it seems you have the strings in a good format. Given that your variable is named Date, simply run
dateObjects<-as.Date(Date) #where Date is your variable
and either the single value or vector of character strings (as the format you gave in the comment) will now be date objects, which you could use with a library like zoo to create time series.
If your strings are not necessarily in the format you've described, then refer to the following link to see how to format other string forms as dates.
http://www.statmethods.net/input/dates.html
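As a brief illustration of that case, here is a base R sketch that parses strings in a different layout by passing an explicit format to as.Date. The raw vector is made-up example data, not taken from the question:

```r
# Hypothetical strings in a day/month/year layout
raw <- c("31/08/2007", "01/09/2007")

# The format argument tells as.Date how to read each field
dates <- as.Date(raw, format = "%d/%m/%Y")
class(dates)

# Once parsed, the values print in the standard year-month-day form
format(dates, "%Y-%m-%d")

# Date arithmetic now works, e.g. the gap between consecutive dates
diff(dates)
```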
Given your example data frame you can create a time series in the following way, using the package zoo.
library(zoo)
datavect<-as.zoo(df$CO3b)
index(datavect)<-as.Date(df$Date)
Here we take your CO data, convert it to a zoo object, then assign the appropriate date to each entry, converting it from a character string to a Date object. Now if you print datavect, you'll see each data entry attached to a date. This allows you to take advantage of zoo methods, such as merge and window.
Here is one approach not using string extraction. If you know how long your time series should be, which you should based on the length of your dataset and knowledge of its periodicity, you could just create a regular date series and then add that into a data.frame with other variables of interest. Assuming you have daily data the below would work. Obviously your length.out would be different.
d1 <- ISOdate(year=2007,month=8,day=31)
d2 <- as.Date(format(seq(from=d1,by="day",length.out=10),"%Y-%m-%d"))
[1] "2007-08-31" "2007-09-01" "2007-09-02" "2007-09-03" "2007-09-04" "2007-09-05" "2007-09-06" "2007-09-07" "2007-09-08" "2007-09-09"
class(d2)
[1] "Date"
Edit of Original:
Oh I see. After reading in your new data example, the code below worked for me. It was a pretty straightforward transform. Cheers.
library(magrittr) # Needed for the pipe operator %>% it makes it really easy to string steps together.
dateData
Dates CO3b
1 2011-03-01 1.63e+18
2 2011-03-04 1.66e+18
3 2011-03-11 1.69e+18
4 2011-03-15 1.68e+18
5 2011-03-17 1.71e+18
6 2011-03-17 1.66e+18
7 2011-03-17 1.66e+18
8 2011-03-20 1.52e+18
9 2011-03-21 1.51e+18
10 2011-03-21 1.56e+18
dateData %>% sapply(class) # classes before transforming (character,numeric)
dateData[,1] <- as.Date(dateData[,1]) # Transform to date
dateData %>% sapply(class) # classes after transforming (Date,numeric)
str(dateData) # one more check
'data.frame': 10 obs. of 2 variables:
$ Dates: Date, format: "2011-03-01" "2011-03-04" "2011-03-11" "2011-03-15" ...
$ CO3b : num 1.63e+18 1.66e+18 1.69e+18 1.68e+18 1.71e+18 ...

Take the start date of a data frame

With this we can create a data frame of days beginning at a given start day:
dataframe = seq(as.Date("2000-09-01"), by="days", length=11)
If we already have a dataframe that contains values, can we get its start date in the same way?
That actually creates a Date vector, not a data frame.
x <- seq(as.Date("2000-09-01"), by="days", length=11)
class(x)
# [1] "Date"
If you define the starting date as the earliest date (i.e., the minimum date), then use min:
min(x)
# [1] "2000-09-01"
But if you're looking for the first element in the vector (which is also the earliest in your case, but of course it doesn't have to be), then grab it by its index:
x[1]
# [1] "2000-09-01"
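The same two approaches carry over to a date column in a data frame. The sketch below uses a hypothetical data frame df whose dates are deliberately out of order, so that the first element and the earliest date differ:

```r
# Hypothetical data frame with an unsorted date column
df <- data.frame(
  date  = as.Date(c("2000-09-03", "2000-09-01", "2000-09-02")),
  value = c(10, 20, 30)
)

# First row's date, regardless of ordering
df$date[1]

# Earliest date in the column
min(df$date)

# Full row that holds the earliest date
df[which.min(df$date), ]
```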
