I have a series of dates that appear to be defined in seconds since Jan 1, 1960.
'data.frame': 5 obs. of 1 variable:
$ original: int 1624086000 1624086000 1508137200 1508137200 1508137200
(For reproduction:)
library(data.table)  # for setnames()
data <- as.data.frame(c(1624086000,1624086000,1508137200,1508137200,1508137200))
setnames(data, c("original"))
I would like to convert these to dates in the format %Y-%m-%d.
I wrote the following code for this:
uniqueDates <- as.data.frame(unique(data))
uniqueDates$converted <- sapply(uniqueDates$original, function(x) as.Date(as.POSIXct(x, origin="1960-01-01", tz = "GMT"), "GMT", "%Y-%m-%d"))
The results are dates in a five-digit numeric format:
> str(uniqueDates$converted)
num [1:2] 15144 13802
If I just run
as.Date(as.POSIXct(1624086000, origin="1960-01-01", tz = "GMT"), "GMT", "%Y-%m-%d")
I get the desired result:
[1] "2011-06-19"
What am I doing wrong that results in five-digit numeric values instead of Date objects?
as.Date(as.POSIXct(data[,1], origin="1960-01-01", tz = "GMT"), "GMT", "%Y-%m-%d")
[1] "2011-06-19" "2011-06-19" "2007-10-16" "2007-10-16" "2007-10-16"
The function is already vectorized, so there is no need for sapply. Use the apply family if you have multiple columns of dates. If you want to avoid the long anonymous function, you can define the function first and use it in whichever way suits your case:
as.ymd <- function(x) {
as.Date(as.POSIXct(x, origin="1960-01-01", tz = "GMT"), "GMT", "%Y-%m-%d")
}
Now, with either a single vector or a data frame with multiple columns, you can convert the dates:
data2 <- data.frame(original = c(1624086000,1624086000,1508137200,1508137200,1508137200),
                    second = c(1624086000,1624086000,1508137200,1508137200,1508137200))
as.ymd(data2[,1])
[1] "2011-06-19" "2011-06-19" "2007-10-16" "2007-10-16" "2007-10-16"
data2[] <- lapply(data2, as.ymd)
data2
original second
1 2011-06-19 2011-06-19
2 2011-06-19 2011-06-19
3 2007-10-16 2007-10-16
4 2007-10-16 2007-10-16
5 2007-10-16 2007-10-16
The five-digit numeric output comes from sapply's simplification step: it combines the results with unlist(), which drops the Date class and leaves the underlying numbers (days since 1970-01-01). For comparison, try adding the argument simplify = FALSE to the first function that you tried; you will get a list of Date objects instead.
You can work around it with strftime, since it returns character vectors, which sapply can simplify without any problem; but then you're left with character strings instead of the chosen date classes (POSIXct, POSIXlt, Date, zoo, xts, ...).
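To see the simplification at work, here is a small sketch. It restates the as.ymd() helper so the block runs on its own and uses two of the sample values; the last step restores the dates with as.Date() and an explicit origin of 1970-01-01, the day the Date class counts from.

```r
# Convert seconds since 1960-01-01 (GMT) to Date
as.ymd <- function(x) {
  as.Date(as.POSIXct(x, origin = "1960-01-01", tz = "GMT"), tz = "GMT")
}

secs <- c(1624086000, 1508137200)

# sapply() simplifies the list of Date results with unlist(),
# which drops the "Date" class and leaves the day counts
simplified <- sapply(secs, as.ymd)
class(simplified)     # "numeric"

# With simplify = FALSE the class survives, but the result is a list
as_list <- sapply(secs, as.ymd, simplify = FALSE)
class(as_list[[1]])   # "Date"

# Restore the class: the numbers are days since 1970-01-01
restored <- as.Date(simplified, origin = "1970-01-01")
restored              # "2011-06-19" "2007-10-16"
```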
I have another problem. I have a date-time column in a data frame which comes in as a factor when I load the data, and I want it to be POSIXct:
str(ida$DATA_TRAMA)
Factor w/ 1122932 levels "1-1-2010 00:00:51",..: 629101 629120 629128 629132 629139 629149
I want it in POSIXct (%Y-%m-%d %H:%M:%S) format. I have already tried all of the following methods, but none of them seem to work; whichever one I apply, I get NA values.
ida$DATA_TRAMA<- as.POSIXct(ida$DATA_TRAMA,format='%d/%m/%Y %H:%M:%S')
ida$DATA_TRAMA<- as.POSIXct(as.character(ida$DATA_TRAMA), format = "%d/%m/%Y %H:%M")
ida$DATA_TRAMA <-format(ida$DATA_TRAMA, "%Y-%m-%d")
ida$DATA_TRAMA <- as.POSIXct(ida$DATA_TRAMA, format = '%Y-%m-%d:%H:%M:%S')
ida$DATA_TRAMA <- as.POSIXlt(as.character(ida$DATA_TRAMA), format="%m/%d/%Y %H:%M:%S")
ida$DATA_TRAMA <- strptime(ida$DATA_TRAMA,"%Y-%m-%d %H:%M:%S")
Do you know how to do it?
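With a "factor" argument, as.POSIXct invokes as.POSIXct.default, which uses as.POSIXlt, and as.POSIXlt has a "factor" method, so there is no need to convert to character first. What matters is that the format string matches the data exactly, including the dashes (the attempts above use slashes, which is why they give NA). So just do: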
DF <- data.frame(d = "1-1-2010 00:00:51", stringsAsFactors = TRUE) # test data; d has class factor
transform(DF, d = as.POSIXct(d, format = "%m-%d-%Y %T"))
giving:
d
1 2010-01-01 00:00:51
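A minimal sketch of why the attempts in the question return NA, assuming the values are day-month-year as those attempts suggest (for 1-1-2010 either order gives the same date). The tz = "UTC" argument is an added assumption, used only to make the result independent of the local time zone.

```r
# Test data: a factor column, as read.csv() gives with stringsAsFactors = TRUE
DF <- data.frame(d = "1-1-2010 00:00:51", stringsAsFactors = TRUE)

# The format must describe the input exactly, separators included.
# The data uses dashes, so a slash-based format cannot match and gives NA
bad <- as.POSIXct(DF$d, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
is.na(bad)   # TRUE

# Matching the dashes works; %T is shorthand for %H:%M:%S
good <- as.POSIXct(DF$d, format = "%d-%m-%Y %T", tz = "UTC")
format(good, "%Y-%m-%d %H:%M:%S")   # "2010-01-01 00:00:51"
```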
I'm trying to set up a new variable that incorporates the difference (in number of days) between a known date and the end of a given year. Dummy data below:
> Date.event <- as.POSIXct(c("12/2/2000","8/2/2001"), format = "%d/%m/%Y", tz = "Europe/London")
> Year = c(2000,2001)
> Dates.test <- data.frame(Date.event,Year)
> Dates.test
Date.event Year
1 2000-02-12 2000
2 2001-02-08 2001
I've tried applying a function to achieve this, but it returns an error:
> Time.dif.fun <- function(x) {
+ as.numeric(as.POSIXct(sprintf('31/12/%s', s= x['Year']),format = "%d/%m/%Y", tz = "Europe/London") - x['Date.event'])
+ }
> Dates.test$Time.dif <- apply(
+ Dates.test, 1, Time.dif.fun
+ )
Error in unclass(e1) - e2 : non-numeric argument to binary operator
It seems that apply() does not like as.POSIXct(): when I test a version of the function that only derives the end-of-year date, it is returned as a number in the form '978220800' (e.g. for the end of 2000). Is there any way around this? For the real data the function is a bit more complex, including conditional instances using different variables and sometimes referring to previous rows, which would be very hard to do without apply.
Here are some alternatives:
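1) Your code works with these changes. We factored out s, not because it is necessary, but only because the following line becomes very hard to read otherwise, due to its length. Note that if x is a data frame then so is x["Year"], but x[["Year"]] is a vector, as is x$Year. The original error arises because apply() first converts the data frame to a matrix, which here is a character matrix, so x['Date.event'] is a string rather than a date. Since the operations are all vectorized, we do not need apply at all.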
Although we have not made this change, it would be a bit easier to define s as s <- paste0(x$Year, "-12-31") in which case we could omit the format argument in the following line owing to the use of the default format.
Time.dif.fun <- function(x) {
s <- sprintf('31/12/%s', x[['Year']])
as.numeric(as.POSIXct(s, format = "%d/%m/%Y", tz = "Europe/London") - x[['Date.event']])
}
Time.dif.fun(Dates.test)
## [1] 323 326
2) Convert to POSIXlt, set the year, month and day to the end of the year and subtract. Note that the year component uses years since 1900 and the mon component uses Jan = 0, Feb = 1, ..., Dec = 11. See ?as.POSIXlt for details on these and other components:
lt <- as.POSIXlt(Dates.test$Date.event)
lt$year <- Dates.test$Year - 1900
lt$mon <- 11
lt$mday <- 31
as.numeric(lt - Dates.test$Date.event)
## [1] 323 326
3) Another possibility is:
with(Dates.test, as.numeric(as.Date(paste0(Year, "-12-31")) - as.Date(Date.event)))
## [1] 323 326
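The error in the question can be traced to apply() itself. This sketch rebuilds the question's Dates.test and shows that each row reaches the function as a character vector, because apply() converts the data frame with as.matrix() first, and a data frame holding a POSIXct column becomes a character matrix.

```r
Date.event <- as.POSIXct(c("12/2/2000", "8/2/2001"),
                         format = "%d/%m/%Y", tz = "Europe/London")
Dates.test <- data.frame(Date.event, Year = c(2000, 2001))

# apply() calls as.matrix() on the data frame first; because the
# POSIXct column is not numeric, every column is turned into text
m <- as.matrix(Dates.test)
typeof(m)   # "character"

# So this is what the function in the question actually received
row_classes <- apply(Dates.test, 1, function(x) class(x[["Date.event"]]))
row_classes   # "character" "character"
```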
You could use the difftime function:
Dates.test$diff_days <- difftime(as.POSIXct(paste0(Dates.test[,2], "-12-31"), format = "%Y-%m-%d", tz = "Europe/London"), Dates.test[,1], units = "days")
You can use ISOdate to build the end-of-year date, and difftime(..., units = 'days') to get the number of days until the end of the year.
From ?difftime:
Limited arithmetic is available on "difftime" objects: they can be
added or subtracted, and multiplied or divided by a numeric vector.
If you want to do more than this limited arithmetic, just coerce with as.numeric(); the number is in whatever units the object carries (the ones you specified), unless you also pass a units argument to as.numeric() to convert.
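For example, the same difference can be read out in several units by passing units to as.numeric(); the start and end times below are illustrative values in UTC.

```r
start <- as.POSIXct("2000-02-12 00:00", tz = "UTC")
end   <- as.POSIXct("2000-12-31 00:00", tz = "UTC")

d <- difftime(end, start, units = "days")
as.numeric(d)                   # 323, in the object's own units (days)
as.numeric(d, units = "hours")  # 7752
as.numeric(d, units = "weeks")  # 46.14286
```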
By convention, you may wish to use the beginning of the next year (the midnight at the end of New Year's Eve) as the endpoint for that year. For example:
Dates.test <- data.frame(
Date.event = as.POSIXct(c("12/2/2000","8/2/2001"),
format = "%d/%m/%Y", tz = "Europe/London")
)
# a base R equivalent of data.table::year(): extract the year of a date
year <- function(x) as.POSIXlt(x)$year + 1900L
Dates.test$Date.end <- ISOdate(year(Dates.test$Date.event)+1,1,1)
# if you don't want class 'difftime', wrap it in as.numeric(), as in:
Dates.test$Date.diff <- as.numeric(
difftime(Dates.test$Date.end,
Dates.test$Date.event,
units='days')
)
Dates.test
# Date.event Date.end Date.diff
# 1 2000-02-12 2001-01-01 12:00:00 324.5
# 2 2001-02-08 2002-01-01 12:00:00 327.5
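One detail in the output above: ISOdate() defaults to hour = 12 and tz = "GMT", which is why Date.end shows 12:00:00 and the differences end in .5. If you want the endpoint to be midnight, a sketch that passes hour = 0 (same data, same year() helper) gives whole days:

```r
Dates.test <- data.frame(
  Date.event = as.POSIXct(c("12/2/2000", "8/2/2001"),
                          format = "%d/%m/%Y", tz = "Europe/London")
)
year <- function(x) as.POSIXlt(x)$year + 1900L

# hour = 0 overrides ISOdate()'s default of noon
Dates.test$Date.end <- ISOdate(year(Dates.test$Date.event) + 1, 1, 1, hour = 0)

Dates.test$Date.diff <- as.numeric(
  difftime(Dates.test$Date.end, Dates.test$Date.event, units = "days")
)
Dates.test$Date.diff   # 324 327
```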
The apply() family is basically a cleaner way of writing for loops; where you can, prefer more efficient, vectorized solutions.
If I have a vector of dates and hours such as...
c("2016-03-15 13","2016-03-16 23","2016-03-17 06","2016-03-18 15","2016-03-19 08","2016-03-20 21")
Can I find the number of hours that pass between consecutive timestamps? I looked into difftime, but it requires 2 vectors.
We can do this by converting to a date-time class with lubridate, and then using difftime() to get the difference in hours between adjacent elements, passing it two vectors: one with the first observation removed and one with the last observation removed.
library(lubridate)
v2 <- ymd_h(v1)
Or a base R option is as.POSIXct
v2 <- as.POSIXct(v1, format = "%Y-%m-%d %H")
and then do the difftime
difftime(v2[-1], v2[-length(v2)], units = "hours")
data
v1 <- c("2016-03-15 13","2016-03-16 23","2016-03-17 06",
"2016-03-18 15","2016-03-19 08","2016-03-20 21")
You can do this by using the strptime() function.
Try something like this.
data <- c("2016-03-15 13","2016-03-16 23","2016-03-17 06","2016-03-18 15","2016-03-19 08","2016-03-20 21")
datevec <- strptime(data,"%Y-%m-%d %H")
difftime(datevec[-length(datevec)],datevec[-1],units="hours")
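Here is the output. The values are negative because difftime() computes the first argument minus the second, and the earlier times are passed first; swap the two arguments to get positive values.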
> difftime(datevec[-length(datevec)],datevec[-1],units="hours")
Time differences in hours
[1] -34 -7 -33 -17 -37
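In base R, diff() on a POSIXct vector does the adjacent subtraction in one step (later minus earlier) and returns a difftime; as.numeric() with units = "hours" then gives plain numbers. A sketch, where tz = "UTC" is an added assumption that keeps daylight-saving changes out of the result:

```r
v1 <- c("2016-03-15 13", "2016-03-16 23", "2016-03-17 06",
        "2016-03-18 15", "2016-03-19 08", "2016-03-20 21")

v2 <- as.POSIXct(v1, format = "%Y-%m-%d %H", tz = "UTC")

# diff() subtracts each element from the next one
gaps <- diff(v2)
as.numeric(gaps, units = "hours")   # 34 7 33 17 37
```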
I have a data frame containing dates as characters in dd.mm.yyyy format, and I want to convert them to the Date class in yyyy-m-d format. as.Date() is not working; it returns the error: do not know how to convert 'dates' to class “Date”
dates <- data.frame(cbind(c("5.1.2015", "6.1.2014", "17.2.2014", "28.10.2014")))
colnames(dates) <- c("dates")
as.Date(dates, format = "%Y-%m-%d")
new_format_dates <- cbind(gsub("[[:punct:]]", "", dates[1:nrow(dates),1]))
as.Date(new_format_dates, format = "%Y-%m-%d")
So I tried removing the dots and reformatting those dates as new_format_dates, but that returns [1] NA NA NA NA.
Firstly, make your data.frames properly; don't use cbind inside data.frame. Next, pass the column (dates$dates), not the whole data frame, to as.Date, and set its format argument to the format you've actually got, including separators. If you don't know the symbol you need, check out ?strptime.
dates <- data.frame(dates = c("5.1.2015", "6.1.2014", "17.2.2014", "28.10.2014"))
dates$dates_new <- as.Date(dates$dates, format = "%d.%m.%Y")
dates
# dates dates_new
# 1 5.1.2015 2015-01-05
# 2 6.1.2014 2014-01-06
# 3 17.2.2014 2014-02-17
# 4 28.10.2014 2014-10-28
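The format argument describes the input, not the output you want, which is why both attempts in the question produce NA. A small sketch contrasting them with the working call; the yyyy-mm-dd output is then a separate format() step (%m and %d give zero-padded numbers).

```r
x <- c("5.1.2015", "6.1.2014", "17.2.2014", "28.10.2014")

# "%Y-%m-%d" does not describe "5.1.2015", so every value is NA
as.Date(x, format = "%Y-%m-%d")

# Stripping the dots, as in the question, still leaves a mismatch
as.Date(gsub("[[:punct:]]", "", x), format = "%Y-%m-%d")

# Describing the input, dots included, works
d <- as.Date(x, format = "%d.%m.%Y")

# The output layout is chosen separately, with format()
format(d, "%Y-%m-%d")   # "2015-01-05" "2014-01-06" "2014-02-17" "2014-10-28"
```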
dates <- data.frame(cbind(c("5.1.2015", "6.1.2014", "17.2.2014", "28.10.2014")))
colnames(dates) <- c("dates")
dates$new_Dates <- gsub("[.]","-",dates$dates)
dates$dates <- NULL
dates_new <- as.Date(dates$new_Dates, format = "%d-%m-%Y")
dates_new <- data.frame(dates_new)
print(dates_new)
I’m attempting to convert two columns in my data frame to the ‘right’ date and time class, and so far haven’t had much success. I’ve tried various classes (timeDate, Date, timeSeries, POSIXct, POSIXlt) without success. Perhaps I’m just overlooking the obvious, and because I’ve tried so many approaches I no longer know what’s what. I hope some of you can shed some light on where I’m going wrong.
Goal:
I want to calculate the difference between two dates using the earliest and latest date. I got this working with head() and tail(), but because those values aren’t necessarily the earliest and latest dates in my data, I need another way. (I can’t get sorting to work, because it sorts the data only on the day of the date.)
Second goal: I want to convert the dates from daily format (i.e. 8-12-2010) to weekly, monthly, and yearly levels (i.e. '49-2010', 'december-10', and just '2010'). This can be done with the format settings (like %d-%m-%y). Can this be done by converting the data.frame to a time class, then formatting the time class in the right way (8-12-2010 -> format("%B-%y") -> 'december-10'), and then turning the result into a factor with a level for each month?
For both goals I need to convert the data frame to a time class in some way, and this is where I ran into difficulties.
My dataframe looks like this:
> tradesList[c(1,10,11,20),14:15] -> tmpTimes4
> tmpTimes4
EntryTime ExitTime
1 01-03-07 10-04-07
10 29-10-07 02-11-07
11 13-04-07 14-05-07
20 18-12-07 20-02-08
Here’s a summary of what I’ve tried:
> class(tmpTimes4)
[1] "data.frame"
> as.Date(head(tmpTimes4$EntryTimes, n=1), format="%d-%m-%y")
Error in as.Date.default(head(tmpTimes4$EntryTimes, n = 1), format = "%d-%m-%y") :
do not know how to convert 'head(tmpTimes4$EntryTimes, n = 1)' to class "Date"
> as.timeDate(tmpTimes4, format="%d-%m-%y")
Error in as.timeDate(tmpTimes4, format = "%d-%m-%y") :
unused argument(s) (format = "%d-%m-%y")
> timeSeries(tmpTimes4, format="%d-%m-%y")
Error in midnightStandard2(charvec, format) :
'charvec' has non-NA entries of different number of characters
> tmpEntryTimes4 <- timeSeries(tmpTimes4$EntryTime, format="%d-%m-%y")
> tmpExitTimes4 <- timeSeries(tmpTimes4$ExitTime, format="%d-%m-%y")
> tmpTimes5 <- cbind(tmpEntryTimes4,tmpExitTimes4)
> colnames(tmpTimes5) <- c("Entry","Exit")
> tmpTimes5
Entry Exit
[1,] 01-03-07 10-04-07
[2,] 29-10-07 02-11-07
[3,] 13-04-07 14-05-07
[4,] 18-12-07 20-02-08
> class(tmpTimes5)
[1] "timeSeries"
attr(,"package")
[1] "timeSeries"
> as.timeDate(tmpTimes5, format="%d-%m-%y")
Error in as.timeDate(tmpTimes5, format = "%d-%m-%y") :
unused argument(s) (format = "%d-%m-%y")
> as.Date(tmpTimes5, format="%d-%m-%y")
Error in as.Date.default(tmpTimes5, format = "%d-%m-%y") :
do not know how to convert 'tmpTimes5' to class "Date"
> format.POSIXlt(tmpTimes5, format="%d-%m-%y", usetz=FALSE)
Error in format.POSIXlt(tmpTimes5, format = "%d-%m-%y", usetz = FALSE) :
wrong class
> as.POSIXlt(tmpTimes5, format="%d-%m-%y", usetz=FALSE)
Error in as.POSIXlt.default(tmpTimes5, format = "%d-%m-%y", usetz = FALSE) :
do not know how to convert 'tmpTimes5' to class "POSIXlt"
> as.POSIXct(tmpTimes5, format="%d-%m-%y", usetz=FALSE)
Error in as.POSIXlt.default(x, tz, ...) :
do not know how to convert 'x' to class "POSIXlt"
The timeDate package has a function for ‘range’. However, converting to the Date class works for an individual value but, for some reason, not for a data frame:
> as.Date(tmpTimes4[1,1], format="%d-%m-%y")
[1] "2007-03-01"
> as.Date(tmpTimes4, format="%d-%m-%y")
Error in as.Date.default(tmpTimes4, format = "%d-%m-%y") :
do not know how to convert 'tmpTimes4' to class "Date"
At this point I almost believe it’s impossible to do, so any thoughts would be highly appreciated!
Start with some dummy data:
start <- as.Date("2010/01/01")
end <- as.Date("2010/12/31")
set.seed(1)
datewant <- seq(start, end, by = "days")[sample(15)]
tmpTimes <- data.frame(EntryTime = datewant,
ExitTime = datewant + sample(100, 15))
## shuffle the rows so that EntryTime is in random order
tmpTimes <- tmpTimes[sample(NROW(tmpTimes)), ]
head(tmpTimes)
so we have something like this:
> head(tmpTimes)
EntryTime ExitTime
8 2010-01-14 2010-03-16
9 2010-01-05 2010-01-17
7 2010-01-10 2010-01-30
3 2010-01-08 2010-04-16
10 2010-01-01 2010-01-26
13 2010-01-12 2010-02-15
Using the above, look at Goal 1: computing the difference between the earliest and latest date. You can treat dates as if they were numbers (that is how they are stored internally anyway), so functions like min() and max() will work. You can use the difftime() function:
> with(tmpTimes, difftime(max(EntryTime), min(EntryTime)))
Time difference of 14 days
or use standard subtraction
> with(tmpTimes, max(EntryTime) - min(EntryTime))
Time difference of 14 days
to get the difference in days. head() and tail() only work if the dates are sorted, as they take the first and last values in a vector, not the lowest and highest values.
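range() also works on Date vectors and returns the earliest and latest date together, so diff(range(...)) gives the span without any sorting. A sketch with a few made-up, unsorted dates:

```r
# Hypothetical example dates, deliberately unsorted
EntryTime <- as.Date(c("2010-01-14", "2010-01-05", "2010-01-01", "2010-01-12"))

range(EntryTime)         # "2010-01-01" "2010-01-14"
diff(range(EntryTime))   # Time difference of 13 days
```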
Goal 2: You seem to be trying to convert a data frame to a Date. You can't do this. What you can do is reformat the data in the components of the data frame. Here I add columns to tmpTimes by reformatting the EntryTime column into several different summaries of the date.
tmpTimes2 <- within(tmpTimes, weekOfYear <- format(EntryTime, format = "%W-%Y"))
tmpTimes2 <- within(tmpTimes2, monthYear <- format(EntryTime, format = "%B-%Y"))
tmpTimes2 <- within(tmpTimes2, Year <- format(EntryTime, format = "%Y"))
Giving:
> head(tmpTimes2)
EntryTime ExitTime weekOfYear monthYear Year
8 2010-01-14 2010-03-16 02-2010 January-2010 2010
9 2010-01-05 2010-01-17 01-2010 January-2010 2010
7 2010-01-10 2010-01-30 01-2010 January-2010 2010
3 2010-01-08 2010-04-16 01-2010 January-2010 2010
10 2010-01-01 2010-01-26 00-2010 January-2010 2010
13 2010-01-12 2010-02-15 02-2010 January-2010 2010
If you are American or want to use the US convention for the start of the week (%W starts the week on a Monday; the US convention is to start on a Sunday), change %W to %U. ?strftime has more details of what %W and %U represent.
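To see the difference between %W and %U, take a Sunday such as 2010-01-10 (an arbitrary example date): the two conventions place it in different weeks.

```r
sunday <- as.Date("2010-01-10")
format(sunday, "%u")      # "7": ISO weekday number, Sunday = 7
format(sunday, "%W-%Y")   # "01-2010": weeks start on Monday
format(sunday, "%U-%Y")   # "02-2010": weeks start on Sunday
```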
A final point on data format: above, I have worked with dates in standard R format. You have your data stored in a data frame in a non-standard format, presumably as characters or factors. So you have something like:
tmpTimes3 <- within(tmpTimes,
EntryTime <- format(EntryTime, format = "%d-%m-%y"))
tmpTimes3 <- within(tmpTimes3,
ExitTime <- format(ExitTime, format = "%d-%m-%y"))
> head(tmpTimes3)
EntryTime ExitTime
8 14-01-10 16-03-10
9 05-01-10 17-01-10
7 10-01-10 30-01-10
3 08-01-10 16-04-10
10 01-01-10 26-01-10
13 12-01-10 15-02-10
You need to convert those characters or factors to something R understands as a date. My preference would be the "Date" class. Before you try the above answers with your data, convert your data to the correct format:
tmpTimes3 <-
within(tmpTimes3, {
EntryTime <- as.Date(as.character(EntryTime), format = "%d-%m-%y")
ExitTime <- as.Date(as.character(ExitTime), format = "%d-%m-%y")
})
so that your data looks like this:
> head(tmpTimes3)
EntryTime ExitTime
8 2010-01-14 2010-03-16
9 2010-01-05 2010-01-17
7 2010-01-10 2010-01-30
3 2010-01-08 2010-04-16
10 2010-01-01 2010-01-26
13 2010-01-12 2010-02-15
> str(tmpTimes3)
'data.frame': 15 obs. of 2 variables:
$ EntryTime:Class 'Date' num [1:15] 14623 14614 14619 14617 14610 ...
$ ExitTime :Class 'Date' num [1:15] 14684 14626 14639 14715 14635 ...
Short answer: convert to Date if not already done, then use min() and max() on the vector of dates.
date_list = structure(c(15401, 15405, 15405), class = "Date")
date_list
#[1] "2012-03-02" "2012-03-06" "2012-03-06"
min(date_list)
#[1] "2012-03-02"
max(date_list)
#[1] "2012-03-06"
Even easier: use summary() on the date column directly, which gives the Min and Max and more. Example: summary(df$date)