I have data in YYMMDDHH format and want to get the weekday, so I need to convert it to a date, but I can't figure out how.
Here's a dput of the relevant data:
structure(list(id = c(7927751403363142656, 18236986451472797696,
5654946373641778176, 14195690822403907584, 1693303484298446848,
1.1362181921561e+19, 11694645532962195456, 1221431312630614784,
1987127670789791488, 379819848497418688), hour = c(14102118L,
14102217L, 14102812L, 14102912L, 14102820L, 14102401L, 14102117L,
14102312L, 14102301L, 14102414L)), .Names = c("id", "hour"), row.names = c(3620479L,
8510796L, 29632625L, 34450879L, 31874113L, 13420799L, 3332671L,
11543560L, 9602012L, 15574701L), class = "data.frame")
When I use:
dat2$dow <- as.Date(substr(as.character(dat2$hour), 1,6), format = '%Y%m%d')
I just get NAs. Any suggestions?
"%Y" is for 4-digit years; "%y" is for 2-digit years. Also, you don't need substr(): as.Date() ignores any characters after the end of the specified format.
dat2$dow <- as.Date(as.character(dat2$hour), format='%y%m%d')
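Note that the line above stores a Date in dat2$dow, not the weekday itself. A short sketch of the last step, using a few hour values from the dput: weekdays() returns day names (which depend on your locale), while format(d, "%u") gives a number from 1 (Monday) to 7 (Sunday).

```r
# A few values from the 'hour' column in the question (YYMMDDHH integers)
hour <- c(14102118L, 14102217L, 14102812L, 14102912L, 14102820L)

# "%y" parses the two-digit year; the trailing HH digits are ignored
d <- as.Date(as.character(hour), format = "%y%m%d")

# Day of week: names depend on the locale, "%u" is 1 = Monday ... 7 = Sunday
dow_name <- weekdays(d)
dow_num  <- as.integer(format(d, "%u"))
```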
I'm playing around with functions in R and want to create a function that takes a character variable and converts it to a POSIXct.
The time variable currently looks like this:
"2020-01-01T05:00:00.283236Z"
I've successfully converted the time variable in my janviews dataset with the following code:
janviews$time <- gsub('T',' ',janviews$time)
janviews$time <- as.POSIXct(janviews$time, format = "%Y-%m-%d %H:%M:%S", tz = Sys.timezone())
Since I have to do this on multiple datasets, I want to write a function for it. I created the following function, but it doesn't seem to work and I'm not sure why:
set.time <- function(dat, variable.name){
  dat$variable.name <- gsub('T', ' ', dat$variable.name)
  dat$variable.name <- as.POSIXct(dat$variable.name, format = "%Y-%m-%d %H:%M:%S", tz = Sys.timezone())
}
Here's the first four rows of the janviews dataset:
structure(list(customer_id = c("S4PpjV8AgTBx", "p5bpA9itlILN",
"nujcp24ULuxD", "cFV46KwexXoE"), product_id = c("kq4dNGB9NzwbwmiE",
"FQjLaJ4B76h0l1dM", "pCl1B4XF0iRBUuGt", "e5DN2VOdpiH1Cqg3"),
time = c("2020-01-01T05:00:00.283236Z", "2020-01-01T05:00:00.895876Z",
"2020-01-01T05:00:01.362329Z", "2020-01-01T05:00:01.873054Z"
)), row.names = c(NA, -4L), class = c("data.table", "data.frame"))
Also, if there is a better way to convert my time variable, I am open to changing my method!
I would use the lubridate package and the as_datetime() function.
lubridate::as_datetime("2020-01-01T05:00:00.283236Z")
This returns:
"2020-01-01 05:00:00 UTC"
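The answer above doesn't explain why set.time() fails. Inside the function, dat$variable.name looks for a column literally named variable.name rather than the column whose name you pass in, and the function never returns the modified dat, so the change is lost unless you return it and assign the result. A minimal base-R sketch that fixes both, assuming the column name is passed as a string. It parses the ISO 8601 string directly (literal T, fractional seconds via %OS) and uses tz = "UTC" because the trailing Z denotes UTC; lubridate::as_datetime() could be used in its place.

```r
# Corrected version of the question's set.time(); 'variable.name' must be a string
set.time <- function(dat, variable.name) {
  # dat[[...]] looks up the column named by the string, unlike dat$variable.name
  dat[[variable.name]] <- as.POSIXct(dat[[variable.name]],
                                     format = "%Y-%m-%dT%H:%M:%OS", # literal T, fractional seconds
                                     tz = "UTC")                    # trailing "Z" means UTC
  dat  # return the modified data so the caller can keep it
}

# Usage with two timestamps from the janviews sample
janviews <- data.frame(time = c("2020-01-01T05:00:00.283236Z",
                                "2020-01-01T05:00:00.895876Z"),
                       stringsAsFactors = FALSE)
janviews <- set.time(janviews, "time")
```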
I am reading in a .csv of dates and GPS positions. I need to convert the date column to a date-time class.
I am using:
data = data.frame(rbind(c('2016/07/19 17:52:00',3674.64416424279,354.266660979476),
c('2016/07/19 17:54:00',3674.65121597935,354.246972537617),
c('2016/07/19 17:55:00',3674.65474186293,354.237128326737),
c('2016/07/19 17:56:00',3674.65826775671,354.227284122559)))
colnames(data) = (c('GMT_DateTime','northing','easting'))
data$GMT_DateTime<-as.POSIXct(data$GMT_DateTime, tz="GMT", format = "%Y/%m/%d %H:%M:%S")
Sometimes the dates in the .csv are formatted as "%Y/%m/%d %H:%M:%S" and sometimes as "%m/%d/%Y %H:%M".
Is there a way to feed in two possible formats to as.POSIXct() to try both possible formats? I imagine something like this:
data$GMT_DateTime<-as.POSIXct(data$GMT_DateTime, tz="GMT", format = "%m/%d/%Y %H:%M" or "%Y/%m/%d %H:%M:%S")
Thank you!
In what follows I will use the lubridate package.
I have added two extra rows to the example dataset, with date/time values in the "%m/%d/%Y %H:%M" format. Note that the column is of class character; if it were of class factor, the code would probably throw an error.
As for the warnings, don't worry: each parser warns about the values it failed to parse, which is expected because each format matches only some of the rows.
library(lubridate)
tmp <- data$GMT_DateTime # work on a copy
na <- is.na(ymd_hms(tmp))
data$GMT_DateTime[!na] <- ymd_hms(tmp)[!na]
data$GMT_DateTime[na] <- mdy_hm(tmp)[na]
# the assignments above store seconds since 1970 as character strings,
# so convert them back to POSIXct
data$GMT_DateTime <- as.POSIXct(as.numeric(data$GMT_DateTime),
                                origin = "1970-01-01", tz = "GMT")
rm(tmp) # final clean up
Data in dput() format.
data <-
structure(list(GMT_DateTime = c("2016/07/19 17:52:00", "2016/07/19 17:54:00",
"2016/07/19 17:55:00", "2016/07/19 17:56:00", "07/22/2016 17:02",
"07/23/2016 17:15"), northing = c(3674.64416424279, 3674.65121597935,
3674.65474186293, 3674.65826775671, 3674.662, 3674.665), easting = c(354.266660979476,
354.246972537617, 354.237128326737, 354.227284122559, 354.2702,
354.3123)), row.names = c(NA, -6L), class = "data.frame")
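For comparison, the same parse-with-one-format-then-fall-back idea also works in base R, because as.POSIXct() with an explicit format returns NA for strings that don't match it. A sketch, assuming these two formats are the only ones in the file; parse_mixed() is a hypothetical helper name, not part of any package.

```r
# Parse a character vector that mixes "%Y/%m/%d %H:%M:%S" and "%m/%d/%Y %H:%M"
parse_mixed <- function(x, tz = "GMT") {
  x <- as.character(x)  # guard against factor input
  out <- as.POSIXct(x, tz = tz, format = "%Y/%m/%d %H:%M:%S")
  alt <- as.POSIXct(x, tz = tz, format = "%m/%d/%Y %H:%M")
  # Wherever the first format failed, fall back to the second
  miss <- is.na(out)
  out[miss] <- alt[miss]
  out
}

# Two values from the example data, one in each format
res <- parse_mixed(c("2016/07/19 17:52:00", "07/22/2016 17:02"))
```

Any value matching neither format stays NA, which makes unexpected formats easy to spot with anyNA().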
I have some water quality sample data.
> dput(GrowingArealog90s[1:10,])
structure(list(SampleDate = structure(c(6948, 6949, 6950, 7516,
7517, 7782, 7783, 7784, 8092, 8106), class = "Date"), Flog90 = c(1.51851393987789,
1.48970743802793, 1.81243963000062, 0.273575501327576, 0.874218895695207,
1.89762709129044, 1.44012088794774, 0.301029995663981, 1.23603370361931,
0.301029995663981)), .Names = c("SampleDate", "Flog90"), class = c("tbl_df",
"data.frame"), row.names = c(NA, -10L))
This data is collected monthly, although some months are missing over the 25-year period.
I know there is a lot of help out there on converting dates to different formats, but I have not been able to figure this out. I want to create a time series with just a month/year format, so that I can do things like decompose the data by month and run seasonal Kendall tests. I have tried so many ways of converting my dates to the desired format that I have completely confused myself. I don't care about the exact format, as long as it is recognized as month/year.
I also need to fill in the missing months with NAs.
I tried importing the "SampleDate" column in a numeric "yyyymm" format, so that I could merge that data frame with another one containing all the dates I need.
GA90 <- merge(Dates, GrowingArealog90s, by.x = "Date", by.y = "Date", all.x = TRUE)
However, when I converted the resulting data frame to a time series it would not recognize the 12 month frequency.
GA90ts <- as.ts(GA90, frequency(12))
> GA90ts
Time Series:
Start = 1
End = 324
Frequency = 1
Any help with this is appreciated.
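Here's how to do it with zoo. You'll get a warning that the index entries are not unique (several samples fall in the same month), but it's OK for now. The resulting index prints as month and year (e.g. "Jan 1989").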
series <-structure(list(SampleDate = structure(c(6948, 6949, 6950, 7516,
7517, 7782, 7783, 7784, 8092, 8106), class = "Date"), Flog90 = c(1.51851393987789,
1.48970743802793, 1.81243963000062, 0.273575501327576, 0.874218895695207,
1.89762709129044, 1.44012088794774, 0.301029995663981, 1.23603370361931,
0.301029995663981)), .Names = c("SampleDate", "Flog90"), class = c("tbl_df",
"data.frame"), row.names = c(NA, -10L))
library(zoo)
series <-as.data.frame(series) #to drop dplyr class
series.zoo <-zoo(series[,-1,drop=FALSE],as.yearmon(series[,1]))
Best practice would be to keep your series with actual dates and use as.yearmon or as.yearqtr only when you actually need to make calculations or aggregate.zoo by month and year.
The following is a matter of taste, but I've dealt with a lot of time series and I think zoo is superior to ts and xts. Much more flexible.
Now, to fill in missing values, you have to create a vector of dates. Here, I'm using a zoo object with actual dates. I then use na.locf, which is "last observation carry forward". You could also look at na.approx.
series.zoo <-zoo(series[,-1,drop=FALSE],series[,1])
# min()/max() return the first and last sample dates as Date objects, as seq.Date() requires
my.seq <-seq.Date(min(series[,1]), max(series[,1]),by="month")
merged <-merge.zoo(series.zoo,zoo(,my.seq))
na.locf(merged)
UPDATE
With aggregate.
GrowingArealog90s <-structure(list(SampleDate = structure(c(6948, 6949, 6950, 7516,
7517, 7782, 7783, 7784, 8092, 8106), class = "Date"), Flog90 = c(1.51851393987789,
1.48970743802793, 1.81243963000062, 0.273575501327576, 0.874218895695207,
1.89762709129044, 1.44012088794774, 0.301029995663981, 1.23603370361931,
0.301029995663981)), .Names = c("SampleDate", "Flog90"), class = c("tbl_df",
"data.frame"), row.names = c(NA, -10L))
library(zoo);library(xts)
GrowingArealog90s <-as.data.frame(GrowingArealog90s) #to remove dplyr format
GrowingArealog90s.zoo <-zoo(GrowingArealog90s[,-1,drop=FALSE],as.Date(GrowingArealog90s[,1]))
#First aggregate by month. I chose to get the mean per month
GrowingArealog90s.agg <-aggregate(GrowingArealog90s.zoo, as.yearmon, mean) #replace mean with last to get last reading of the month
#Then create a sequence of months and merge it
my.seq <-seq.Date(first(GrowingArealog90s[,1]), last(GrowingArealog90s[,1]),by="month")
merged <-merge.zoo(GrowingArealog90s.agg ,zoo(,as.yearmon(my.seq)))
na.locf(merged)
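On the original ts question: as.ts(GA90, frequency(12)) still reports Frequency = 1 because frequency(12) simply evaluates to 1, and as.ts() has no frequency argument, so that value is ignored; the frequency belongs in ts(). A base-R sketch of the whole pipeline, assuming a monthly mean is an acceptable summary for months with several samples (the aggregate version above makes the same choice):

```r
# Sample dates (days since 1970-01-01) and values from the dput in the question
dates  <- as.Date(c(6948, 6949, 6950, 7516, 7517, 7782, 7783, 7784, 8092, 8106),
                  origin = "1970-01-01")
flog90 <- c(1.51851393987789, 1.48970743802793, 1.81243963000062,
            0.273575501327576, 0.874218895695207, 1.89762709129044,
            1.44012088794774, 0.301029995663981, 1.23603370361931,
            0.301029995663981)

# Monthly mean, keyed by "YYYY-MM" (some months have several samples)
month_key <- format(dates, "%Y-%m")
monthly   <- tapply(flog90, month_key, mean)

# Every month from the first to the last sample; unsampled months get NA
first_month <- as.Date(paste0(min(month_key), "-01"))
last_month  <- as.Date(paste0(max(month_key), "-01"))
all_keys    <- format(seq(first_month, last_month, by = "month"), "%Y-%m")
values      <- as.numeric(monthly)[match(all_keys, names(monthly))]

# Regular monthly ts: frequency = 12, starting at the first sampled month
flog_ts <- ts(values,
              start = c(as.integer(format(first_month, "%Y")),
                        as.integer(format(first_month, "%m"))),
              frequency = 12)
```

Months without samples stay NA in flog_ts, so they still need filling or imputing (for example with zoo::na.approx) before functions that don't accept missing values.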
I have data with one date column and 10 other columns.
The date column has values like 199010, i.e. yyyymm.
It seems that zoo/xts requires the date to include a day.
Is there any way to address this issue?
Here is my data:
structure(list(Date = 198901:198905, NoDur = c(5.66, -1.44, 5.51,
5.68, 5.32)), .Names = c("Date", "NoDur"), class = "data.frame", row.names = c(NA,
5L))
data<-read.zoo("C:/***/data_port.csv",sep=",",format="%Y%m",header=TRUE,index.column=1,colClasses=c("character",rep("numeric",1)))
The code has these problems:
the data is space separated but the code specifies that it is comma separated
the data does not describe dates since there is no day but the code is using the default of dates
the data is not provided in reproducible form. Note how one can simply copy the data and code below and paste it into R without any additional work.
Try this:
Lines <- "Date NoDur
198901 5.66
198902 -1.44
198903 5.51
198904 5.68
198905 5.32
"
library(zoo)
read.zoo(text = Lines, format = "%Y%m", FUN = as.yearmon, header = TRUE,
colClasses = c("character", NA))
The above converts the index to "yearmon" class, which probably makes the most sense here, but it would alternatively be possible to convert it to "Date" class by using FUN = function(x, format) as.Date(as.yearmon(x, format)) in place of the FUN argument above.
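If you need a "Date" index without going through yearmon, a base-R sketch is to append a day to each yyyymm value before parsing. Pinning every month to the 1st is an arbitrary convention, not something present in the data.

```r
# yyyymm integers like the Date column in the question
yyyymm <- 198901:198905

# A Date needs a day, so pin each month to its 1st before parsing
d <- as.Date(paste0(yyyymm, "01"), format = "%Y%m%d")
```

The resulting Date vector can then be used as the index of a zoo or xts object.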
I have a dataset whose first column holds dates in yyyymmdd format. I tried to convert these dates to R dates using:
yield.usd$Fecha <- as.Date(yield.usd$Fecha, format='%Y%m%d')
and the output is:
Error in as.Date.numeric(yield.usd$Fecha, format = "%Y%m%d") : 'origin' must be supplied
Then I tried:
yield.dates <- as.Date(yield.usd[1], format='%Y%m%d')
the output is:
Error in as.Date.default(yield.usd[1], format = "%Y%m%d") : do not know how to convert 'yield.usd[1]' to class “Date”
How can I make R read those dates?
Here is dput(head(yield.usd)):
structure(list(Fecha = c(20120815L, 20120815L, 20120815L, 20120815L,
20120815L, 20120815L), Plazo = 1:6, Soberana = c(0.001529738,
0.001558628, 0.001587518, 0.001616408, 0.001645299, 0.001674189
), AAA = c(0.009642716, 0.009671607, 0.009700497, 0.009729387,
0.009758277, 0.009787168), AA. = c(0.017483959, 0.01751285, 0.01754174,
0.01757063, 0.01759952, 0.017628411), AA = c(0.017762383, 0.017791273,
0.017820163, 0.017849053, 0.017877944, 0.017906834), AA..1 = c(0.018207843,
0.018236733, 0.018265624, 0.018294514, 0.018323404, 0.018352294
), A = c(0.036340293, 0.036369183, 0.036398073, 0.036426964,
0.036455854, 0.036484744)), .Names = c("Fecha", "Plazo", "Soberana",
"AAA", "AA.", "AA", "AA..1", "A"), row.names = c(NA, 6L), class = "data.frame")
Your dates are stored as numbers, so as.Date() treats them as a count of days since an origin date (which it needs you to supply, usually 1 January 1970). What you want is to parse them as strings, so convert the numbers to character first:
as.Date(as.character(yield.usd$Fecha), format='%Y%m%d')
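The second error has a different cause: yield.usd[1] is a one-column data frame, not a vector, so as.Date() does not know how to convert it. Extract the column with yield.usd[[1]] or yield.usd$Fecha instead. A small sketch using a stand-in data frame built from the Fecha and Plazo values in the dput:

```r
# Stand-in for yield.usd with the integer Fecha column from the dput
yield.usd <- data.frame(Fecha = rep(20120815L, 3), Plazo = 1:3)

class(yield.usd[1])    # "data.frame": single brackets keep the data frame
class(yield.usd[[1]])  # "integer": double brackets extract the column

# Convert the extracted column, going through character as in the answer
yield.usd$Fecha <- as.Date(as.character(yield.usd[[1]]), format = "%Y%m%d")
```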