Format a date from CSV into a date R can use.
I have a time series data file. I want to load it into R, and cast the Date column to be something usable by R. Can I specify the input format to as.Date(), or use another function that will correctly cast a Date such as 1/1/14?
In other languages I'm used to passing in a format string that tells the caster exactly how to format it, e.g. toDate('1/1/1', '%d/%m/%y'). I haven't found this function yet for R.
my_time_series.csv
Date Value
1/1/14 123
1/2/14 128.56
1/3/14 129.14
1/4/14 130.13
1/5/14 137.97
1/6/14 141.05
1/7/14 141.35
1/8/14 142.14
1/9/14 142.14
1/10/14 149.89
Now I can import it into R:
$ R
> dat = read.csv("my_time_series.csv", header = TRUE)
> dat
Date Value
1 1/1/14 123.0000
2 1/2/14 128.5693
3 1/3/14 129.1474
4 1/4/14 130.1361
5 1/5/14 137.9758
6 1/6/14 141.0548
7 1/7/14 141.3517
8 1/8/14 142.1449
9 1/9/14 142.1479
10 1/10/14 149.8912
Ok now I need to format those dates as actual dates. The as.Date() casting function looks promising, but returns an incorrect date:
> as.Date('1/10/14')
[1] "0001-10-14"
So I searched for whether I can specify an input format for as.Date(), but it only has a second parameter format for the output format.
I tried to work around this in Excel before saving the CSV, but it doesn't have any formats that seem to work by default with as.Date().
I'm an idiot, was just typing in the wrong formatter. This works fine.
> as.Date('1/10/14', '%m/%d/%y')
[1] "2014-01-10"
So if I have any arbitrary date format in the future, just rearrange the format string.
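For anyone landing here later, a minimal sketch of the whole round trip: read the file, then convert the column in place. It assumes a comma-separated file with month/day/two-digit-year dates; the inline `csv_text` stands in for the real file and is only there so the snippet runs on its own.

```r
# Inline stand-in for the CSV file (assumption: comma-separated, m/d/yy dates)
csv_text <- "Date,Value
1/1/14,123
1/2/14,128.56
1/10/14,149.89"

# Read the dates as plain character strings, not factors
dat <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# format describes how the INPUT strings are laid out
dat$Date <- as.Date(dat$Date, format = "%m/%d/%y")

str(dat)  # Date column is now of class Date
```

With a real file you would pass the file name to read.csv instead of `text =`.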
Related
I am trying to convert a data frame with dates in the human readable format "MM/dd/yy" to epoch days.
Here is an example of the data frame:
human.date
1 7/25/11
2 7/25/11
3 7/25/11
4 7/25/11
5 7/25/11
6 7/25/11
I want to utilize this function to convert each date in the data frame to epoch days:
# Convert from human readable date to days since epoch and round to nearest day
round(as.numeric(as.POSIXct("yyyy/MM/dd HH:mm:ss", origin="1970-01-01"))/86400)
However, the function above only works if the human readable date is in the format of "yyyy/MM/dd".
round(as.numeric(as.POSIXlt("2000/02/20", origin="1970-01-01"))/86400)
[1] 11007
Is there an easier way to do this conversion or another function I can use that will take dates in the format of "MM/dd/yy"?
You should use the format argument of the as.POSIXlt function:
> human.date <- "7/25/11"
> round(as.numeric(as.POSIXlt(human.date, format = "%m/%d/%y", origin = "1970-01-01"))/86400)
[1] 15180
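A possibly simpler route, sketched here as an alternative rather than the answerer's method: a Date is already stored as days since 1970-01-01, so as.Date plus as.numeric gives epoch days directly, with no division by 86400 and no time zone rounding. The data frame name `human` is made up for the example.

```r
# Hypothetical data frame mirroring the one in the question
human <- data.frame(human.date = rep("7/25/11", 6), stringsAsFactors = FALSE)

# Parse as month/day/two-digit-year, then take the underlying day count
human$epoch.days <- as.numeric(as.Date(human$human.date, format = "%m/%d/%y"))

human$epoch.days
```

Note the lowercase %m for month; uppercase %M means minutes.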
I am trying to change my date within my dataframe into the correct format within R. (m/d/y to the correct yyyy-mm-dd).
I have data that looks like this
Date Time pH
1 1/4/1981 9:00 3.9
2 1/8/1981 8:30 3.9
etc
The name of my data frame I am working in is data.cat.AC
I tried
data.cat.AC[,1]$Date <- as.Date(data.cat.AC[,1]$Date, "%Y/%m/%d")
...but this did not work.
I am getting the error,
$ operator is invalid for atomic vectors
Any tips or pointers on what I am doing wrong?
When you use as.Date, you should not enter the format that you want as output. Instead enter the format as it is in the data.
x2 <- as.Date("1/4/1981", format="%m/%d/%Y")
x2
[1] "1981-01-04"
We got lucky in this case in that your desired output happens to be the default output. But for learning purposes, let's say you wanted the format "mm:dd:YYYY". After converting to Date as we did above, we would use:
format(x2, "%m:%d:%Y")
[1] "01:04:1981"
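As for the "$ operator is invalid for atomic vectors" error: data.cat.AC[,1] already returns the column as a plain vector, so there is nothing left for $Date to select from. A small sketch of the assignment that should work, using a two-row stand-in for the real data frame:

```r
# Stand-in for the asker's data frame (only the two rows shown in the question)
data.cat.AC <- data.frame(
  Date = c("1/4/1981", "1/8/1981"),
  Time = c("9:00", "8:30"),
  pH   = c(3.9, 3.9),
  stringsAsFactors = FALSE
)

# Select the column by name on the data frame itself,
# and describe the input layout (month/day/four-digit year)
data.cat.AC$Date <- as.Date(data.cat.AC$Date, format = "%m/%d/%Y")

data.cat.AC
```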
I'm facing a very minor issue, but somehow can't resolve it.
When I import a csv file that has a date column, the date comes in "%Y-%m-%d" format, but I want it in "%d-%m-%Y" format. I tried as.Date to transform it, but it's not working.
The data structure looks like this after importing:
Date Share_Val
21/01/2015 20
22/01/2015 19
23/01/2015 21
24/01/2015 23
25/01/2015 26
But when I import the file with read.csv, the data looks like the following:
Date Share_Val
01/21/2015 20
01/22/2015 19
01/23/2015 21
01/24/2015 23
01/25/2015 26
I tried lubridate. But it didn't help.
Sam's result comes out exactly the way I wanted. But when I try the following, it doesn't work:
data$date<-format(as.Date(data$date,"%m/%d/%Y"))
Can anybody please give me any suggestions?
See if this helps. Note the stringsAsFactors. If your Date field is a factor, you will need data$Date <- as.character(data$Date) first
data <- data.frame(Date = c("21/01/2015", "22/01/2015", "23/01/2015",
"24/01/2015", "25/01/2015"), Share_Val=c(20, 19, 21, 23, 26),
stringsAsFactors=F)
format(as.Date(data$Date, "%d/%m/%Y"), "%d-%m-%Y")
[1] "21-01-2015" "22-01-2015" "23-01-2015" "24-01-2015" "25-01-2015"
Too long for a comment.
I think you may be misunderstanding how Dates work in R. A variable (or column) of class Date is stored internally as the number of days since 1970-01-01. When you print a Date variable, it is displayed using the %Y-%m-%d format. The as.Date(...) function converts character to Date. The format=... argument controls how the character string is interpreted, not how the result is displayed, as in:
as.Date("02/05/2015", format="%m/%d/%Y")
# [1] "2015-02-05"
as.Date("02/05/2015", format="%d/%m/%Y")
# [1] "2015-05-02"
So in the first case the string is interpreted as 05 Feb, in the second 02 May. Note that in both cases the result is displayed (printed) in %Y-%m-%d format.
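To make the storage point concrete, here is a short sketch in base R. It shows that the Date is a day count underneath, and that format() is what produces a differently laid out string (at which point it is character again, not a Date).

```r
d <- as.Date("02/05/2015", format = "%m/%d/%Y")

class(d)                             # "Date"
as.numeric(d)                        # days since 1970-01-01
as.numeric(as.Date("1970-01-02"))    # 1, one day after the origin

# Display in another layout: the result is a character string
shown <- format(d, "%d-%m-%Y")
shown                                # "05-02-2015"
class(shown)                         # "character"
```

So if you need both sorting and a custom display, keep the Date column for the work and only call format() when printing.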
I have a problem with some date variables in my data. I already checked other similar questions here but I couldn't find the answer.
I have a very long dataset and some date vectors. The data was originally in stata format, I've tried to change them into R date format with:
as.Date(example$dstart)
which seems to work, judging by the class of the vector. But then I realised that some cases are apparently not in the standard unambiguous format that R requires: when I tried to convert "." into NAs, I got this message
Error in charToDate(x) :
character string is not in a standard unambiguous format
This is an example of the data that I have:
head(sample)
dstart dstart2 dleave Ind
2005-03-20 <NA> 2005-11-19 1
2005-10-27 2006-07-07 2005-11-15 2
2000-02-29 2008-04-16 2005-03-02 3
2003-09-10 2007-07-23 2005-04-05 4
2004-04-24 2006-02-28 2005-10-17 5
2005-08-16 <NA> 2005-08-20 6
I presume that there are a few cases in the wrong format, but I don't know how to identify those cases.
Could you please advise me on how to change the format of the date vector into an R format? I've tried this but it doesn't solve my problem.
as.Date(example$dstart, format = "%Y/%m/%d")
This has caused problems in my analysis: when I sort by date, some dates are sorted earlier even though they are obviously later.
A sample of the data
Your date format specification is using "/" instead of "-". If all your data is like your example, this should do it:
as.Date(example$dstart, format = "%Y-%m-%d")
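On the "how do I identify those cases" part: when an explicit format is given, as.Date returns NA for strings that don't match instead of raising an error, so the misfits can be located by comparing against the raw input. A rough sketch, with a made-up vector `raw` standing in for example$dstart read as character:

```r
# Made-up stand-in for the raw column, including a "." and a slash-separated date
raw <- c("2005-03-20", "2005/10/27", ".", NA, "2000-02-29")

# Strings that don't match the format become NA
parsed <- as.Date(raw, format = "%Y-%m-%d")

# Positions that were not missing in the input but failed to parse
bad <- which(is.na(parsed) & !is.na(raw))

bad        # indices of the problem entries
raw[bad]   # the offending strings themselves
```

Once you can see the offending strings, you can decide whether they need a different format or should simply become NA.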
I am working with some hdf5 data sets. However, the dates are stored inside the file and the file name gives no hint of them. The attribute file consists of day of the year, month of the year, day of the month and year columns.
I would like to pull out data to create time series identity for each of the files i.e.year month date format that can be used for time series.
A sample of the data can be downloaded here:
[ ftp://l5eil01.larc.nasa.gov/tesl1l2l3/TES/TL3COD.003/2007.08.31/TES-Aura_L3-CO_r0000006311_F01_09.he5 ]
There is an attribute group file and a data group file.
I use the R library "rhdf5" to explore the hdf5 files. E.g
CO1<-h5ls ("TES-Aura_L3-CO_r0000006311_F01_09.he5")
Attr<-h5read("TES-Aura_L3-CO_r0000006311_F01_09.he5","HDFEOS INFORMATION/coremetadata")
Data<-h5read("TES-Aura_L3-CO_r0000006311_F01_09.he5", "HDFEOS/SWATHS/ColumnAmountNO2/Data Fields/ColumnAmountNO2Trop")
The Attr, when read, consists of a long string in which the only required information is "2007-08-31", the date of acquisition. I have been able to extract this using the stringr library:
regexp <- "([[:digit:]]{4})([-])([[:digit:]]{2})([-])([[:digit:]]{2})"
Date<-str_extract(Attr,pattern=regexp)
which returns the Date as:
"2007-08-31"
The only problem left now is that the Date isn't recognised as numeric or date. How do I change this? I need to bind the Date with the data for all days to create a time series (more like an identifier, as the data sets are irregular). A sample of how it looks after extracting the dates from the string and binding them with the CO values for each date is below:
Dates CO3b
[1,] "2011-03-01" 1.625811e+18
[2,] "2011-03-04" 1.655504e+18
[3,] "2011-03-11" 1.690428e+18
[4,] "2011-03-15" 1.679871e+18
[5,] "2011-03-17" 1.705987e+18
[6,] "2011-03-17" 1.661198e+18
[7,] "2011-03-17" 1.662694e+18
[8,] "2011-03-20" 1.520328e+18
[9,] "2011-03-21" 1.510642e+18
[10,] "2011-03-21" 1.556637e+18
However, R recognises these dates as character and not as date. I need to convert them to a time series I can work with.
Seems like you've already done all the hard work! Based off your comment, here's how you could take it across the finish line.
From your comment, it seems like you have the strings in a good format. Given that your variable is named Date, simply run
dateObjects<-as.Date(Date) #where Date is your variable
and either the single value or vector of character strings (as the format you gave in the comment) will now be date objects, which you could use with a library like zoo to create time series.
If your strings are not necessarily in the format you've described, then refer to the following link to see how to format other string forms as dates.
http://www.statmethods.net/input/dates.html
Given your example data frame you can create a time series in the following way, using the package zoo.
library(zoo)
datavect<-as.zoo(df$CO3b)
index(datavect)<-as.Date(df$Dates)
Here we take your CO data, convert it to a zoo object, then assign the appropriate date to each entry, converting it from a character to a date object. Now if you print datavect, you'll see each data entry attached to a date. This allows you to take advantage of zoo methods, such as merge and window.
Here is one approach not using string extraction. If you know how long your time series should be, which you should based on the length of your dataset and knowledge of its periodicity, you could just create a regular date series and then add that into a data.frame with other variables of interest. Assuming you have daily data the below would work. Obviously your length.out would be different.
d1 <- ISOdate(year=2007,month=8,day=31)
d2 <- as.Date(format(seq(from=d1,by="day",length.out=10),"%Y-%m-%d"))
[1] "2007-08-31" "2007-09-01" "2007-09-02" "2007-09-03" "2007-09-04" "2007-09-05" "2007-09-06" "2007-09-07" "2007-09-08" "2007-09-09"
class(d2)
[1] "Date"
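The same regular series can probably be built without the ISOdate/format detour, since seq() has a method for Date objects. A minimal sketch, assuming daily data and the same start date and length as above:

```r
# Start directly from a Date and step one day at a time
d3 <- seq(from = as.Date("2007-08-31"), by = "day", length.out = 10)

d3          # ten consecutive dates starting 2007-08-31
class(d3)   # "Date"
```

Because the result never passes through POSIXct, there is no time-of-day or time zone component to worry about.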
Edit of Original:
Oh, I see. Well, after reading in your new data example, the below worked for me. It was a pretty straightforward transform. Cheers.
library(magrittr) # Needed for the pipe operator %>% it makes it really easy to string steps together.
dateData
Dates CO3b
1 2011-03-01 1.63e+18
2 2011-03-04 1.66e+18
3 2011-03-11 1.69e+18
4 2011-03-15 1.68e+18
5 2011-03-17 1.71e+18
6 2011-03-17 1.66e+18
7 2011-03-17 1.66e+18
8 2011-03-20 1.52e+18
9 2011-03-21 1.51e+18
10 2011-03-21 1.56e+18
dateData %>% sapply(class) # classes before transforming (character,numeric)
dateData[,1] <- as.Date(dateData[,1]) # Transform to date
dateData %>% sapply(class) # classes after transforming (Date,numeric)
str(dateData) # one more check
'data.frame': 10 obs. of 2 variables:
$ Dates: Date, format: "2011-03-01" "2011-03-04" "2011-03-11" "2011-03-15" ...
$ CO3b : num 1.63e+18 1.66e+18 1.69e+18 1.68e+18 1.71e+18 ...