I have data that looks like
Dates another column
2015-05-13 23:53:00 some values
2015-05-13 23:53:00 ....
2015-05-13 23:33:00
2015-05-13 23:30:00
...
2003-01-06 00:01:00
2003-01-06 00:01:00
The code I then used is
trainDF<-read.csv("train.csv")
diff<-as.POSIXct(trainDF[1,1])-as.POSIXct(trainDF[,1])
head(diff)
Time differences in hours
[1] 23.88333 23.88333 23.88333 23.88333 23.88333 23.88333
However, this doesn't make sense: subtracting the first two entries should give 0, since they are the exact same time, and subtracting the 3rd entry from the 1st should give a difference of 20 minutes, not 23.88333 hours. I get similarly nonsensical values when I try as.duration(diff) and as.numeric(diff). Why is this?
If you just have a series of POSIXct dates, you can use the diff function to calculate the difference between successive dates. Here's an example:
> BD <- as.POSIXct("2015-01-01 12:00:00", tz = "UTC") # Making a begin date.
> ED <- as.POSIXct("2015-01-01 13:00:00", tz = "UTC") # Making an end date.
> timeSeq <- seq(BD, ED, "min") # Creating a time series in between the dates by minute.
>
> head(timeSeq) # To see what it looks like.
[1] "2015-01-01 12:00:00 UTC" "2015-01-01 12:01:00 UTC" "2015-01-01 12:02:00 UTC" "2015-01-01 12:03:00 UTC" "2015-01-01 12:04:00 UTC"
[6] "2015-01-01 12:05:00 UTC"
>
> diffTime <- diff(timeSeq) # Takes the difference between each adjacent time in the time series.
> print(diffTime) # Printing out the result.
Time differences in mins
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>
> # For the sake of example, let's make a hole in the data.
>
> limBD <- as.POSIXct("2015-01-01 12:15:00", tz = "UTC") # Start of the hole we want to create.
> limED <- as.POSIXct("2015-01-01 12:45:00", tz = "UTC") # End of the hole we want to create.
>
> timeSeqLim <- timeSeq[timeSeq <= limBD | timeSeq >= limED] # Make a hole of 1/2 hour in the sequence.
>
> diffTimeLim <- diff(timeSeqLim) # Taking the diff.
> print(diffTimeLim) # There is now a large gap, which is reflected in the print out.
Time differences in mins
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 30 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
However, reading through your post again, it seems you want to subtract every entry from the entry in the first row. I used the same sample as above to do this:
> timeSeq[1] - timeSeq[2:length(timeSeq)]
Time differences in mins
[1] -1 -2 -3 -4 -5 -6 -7 -8 -9 -10 -11 -12 -13 -14 -15 -16 -17 -18 -19 -20 -21 -22 -23 -24 -25 -26 -27 -28 -29 -30 -31 -32 -33 -34 -35 -36
[37] -37 -38 -39 -40 -41 -42 -43 -44 -45 -46 -47 -48 -49 -50 -51 -52 -53 -54 -55 -56 -57 -58 -59 -60
Which gives me what I'd expect. Trying a data.frame method:
> timeDF <- data.frame(time = timeSeq)
> timeDF[1,1] - timeDF[, 1]
Time differences in secs
[1] 0 -60 -120 -180 -240 -300 -360 -420 -480 -540 -600 -660 -720 -780 -840 -900 -960 -1020 -1080 -1140 -1200 -1260 -1320 -1380
[25] -1440 -1500 -1560 -1620 -1680 -1740 -1800 -1860 -1920 -1980 -2040 -2100 -2160 -2220 -2280 -2340 -2400 -2460 -2520 -2580 -2640 -2700 -2760 -2820
[49] -2880 -2940 -3000 -3060 -3120 -3180 -3240 -3300 -3360 -3420 -3480 -3540 -3600
It seems I'm not encountering the same problem as you. Perhaps coerce everything to POSIXct first and then do your subtraction? Check the class of your data to make sure it really is POSIXct, and look at the actual values you are subtracting; that may give you some insight.
EDIT:
After downloading the file, here's what I ran. The file is trainDF:
trainDF$Dates <- as.POSIXct(trainDF$Dates, tz = "UTC") # Coercing to POSIXct.
datesDiff <- trainDF[1, 1] - trainDF[, 1] # Taking the difference of each date with the first date.
head(datesDiff) # Printing out the head.
With results:
Time differences in secs
[1] 0 0 1200 1380 1380 1380
The only thing I did differently was to use the UTC time zone, which does not shift with daylight saving time, so there should be no effect from that.
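As a hypothetical illustration of that point (not from the original data), subtracting local times across a daylight-saving boundary gives elapsed time, which UTC arithmetic on the same clock times would not match:

```r
# US spring-forward 2015: 02:00 local time jumped straight to 03:00.
d <- as.POSIXct(c("2015-03-08 00:30:00", "2015-03-08 03:30:00"),
                tz = "America/New_York")
difftime(d[2], d[1], units = "hours")  # 2 hours elapsed
# Parsed with tz = "UTC", the same clock times differ by 3 hours.
```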
HOWEVER, when I ran your exact method I got the same results you did:
> diff<-as.POSIXct(trainDF[1,1])-as.POSIXct(trainDF[,1])
> head(diff)
Time differences in hours
[1] 23.88333 23.88333 23.88333 23.88333 23.88333 23.88333
So there is something off in your method, but I can't say what. I do find it is typically safer to coerce first and then do the mathematical operation, rather than doing both in one line.
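As a minimal sketch of that coerce-first habit (using made-up dates shaped like yours), difftime with an explicit units argument also protects you from difftime picking its own unit:

```r
dates <- c("2015-05-13 23:53:00", "2015-05-13 23:53:00", "2015-05-13 23:33:00")
d <- as.POSIXct(dates, tz = "UTC")   # coerce once, up front
stopifnot(inherits(d, "POSIXct"))    # confirm the class before doing any math
difftime(d[1], d, units = "mins")    # differences of 0, 0 and 20 minutes
```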
Related
I want to calculate Cluster with the 'Openair' package. I read the trajectory data with:
traj_2010 <- read.csv("C:/Users/stieger/Trajektorien/traj_10.csv", header=TRUE)
traj_2010$date <- as.POSIXct(strptime(traj_2010$date, format="%Y-%m-%d %H:%M", "GMT"))
traj_2010$date2 <- as.POSIXct(strptime(traj_2010$date2, format="%Y-%m-%d %H:%M", "GMT"))
Thus, I have trajectory data in form of:
head(traj_2010)
receptor year month day hour hour.inc lat lon height pressure date2 date
1 1 10 1 1 0 0 51.330 12.420 500.0 912.2 2010-01-01 00:00:00 2010-01-01
2 1 9 12 31 23 -1 51.350 12.523 464.1 915.2 2009-12-31 23:00:00 2010-01-01
3 1 9 12 31 22 -2 51.353 12.668 422.2 919.4 2009-12-31 22:00:00 2010-01-01
4 1 9 12 31 21 -3 51.329 12.840 380.9 922.0 2009-12-31 21:00:00 2010-01-01
5 1 9 12 31 20 -4 51.279 13.029 351.5 923.6 2009-12-31 20:00:00 2010-01-01
6 1 9 12 31 19 -5 51.218 13.240 335.1 924.2 2009-12-31 19:00:00 2010-01-01
I now use the command trajCluster:
cluster_2010<-trajCluster(traj_2010, method = "Angle", n.cluster = 5, col = c("red2", "blue", "green3", "purple", "black"),
map.fill= FALSE, key=FALSE, main="2010", xlab="latitude", ylab="longitude", xlim=range(-35:35), ylim=range(35:70),
par.settings=list(axis.line=list(lwd=1.5), strip.border=list(lwd=2),
fontsize=list(text=32)))
It runs for some minutes but ends with:
|==============================================================================|100% ~0 s remaining Error in summarise_impl(.data, dots) :
Evaluation error: argument "x" is missing, with no default.
I am a little confused, because in the past this worked without problems. Could somebody help me?
I am using the following packages:
library(openair)
library(plyr)
library(reshape2)
library(mapdata)
library(rworldmap)
library(dplyr)
Here is subset of my original data I am working with:
dput(datumi)
structure(c("21:26", "21:33", "21:38", "23:02", "23:03", "21:27",
"21:34", "21:39", "23:03", "23:04", "21:26", "21:33", "21:38",
"23:02", "23:04", "21:26", "21:34", "21:38", "23:02", "23:04",
"21:27", "21:34", "21:39", "23:02", "23:04"), .Dim = c(5L, 5L
), .Dimnames = list(c("2", "3", "4", "5", "6"), c("Datum_1",
"Datum_2", "Datum_3", "Datum_4", "Datum_5")))
So I am working with time, where e.g., 21:26 means time of the day.
Now I would like to subtract the second column from the first, the third from the second, and so on; that is, subtract column Datum_2 from Datum_1, Datum_3 from Datum_2, and Datum_4 from Datum_3. My output will be new columns with the differences in seconds.
I've already created a function/loop that does this for numeric data; for example, with numeric data I would do this and get the desired output:
dat <- data.frame(
column1 = round(runif(n = 10, min=0, max=5),0),
column2 = round(runif(n = 10, min=0, max=5),0),
column3 = round(runif(n = 10, min=0, max=5),0),
column4 = round(runif(n = 10, min=0, max=5),0)
)
results <- list()
for(i in 1:length(dat)) {
if (i==length(dat)){
results[[i]] <-dat[,i]
} else {results[[i]] <-dat[,i+1] - dat[,i]}
}
results <- t(do.call(rbind,results))
results <- data.frame(results)
But I cannot figure it out for the time format. I have tried strptime and as.POSIXct:
x1 <- strptime(datumi, "%H:%M")
as.numeric(x1,units="secs")
and
as.POSIXct(datumi,format="%H:%M")
I also looked at these questions:
Subtract time in r
Subtracting Two Columns Consisting of Both Date and Time in R
convert character to time in R
Here is one solution based on the answer given in R: Convert hours:minutes:seconds.
datumi
# Datum_1 Datum_2 Datum_3 Datum_4 Datum_5
# 2 "21:26" "21:27" "21:26" "21:26" "21:27"
# 3 "21:33" "21:34" "21:33" "21:34" "21:34"
# 4 "21:38" "21:39" "21:38" "21:38" "21:39"
# 5 "23:02" "23:03" "23:02" "23:02" "23:02"
# 6 "23:03" "23:04" "23:04" "23:04" "23:04"
makeTime <- function(x) as.POSIXct(paste(Sys.Date(), x))
dat <- apply(datumi, 2, makeTime)
mapply(x = 2:ncol(dat),
y = 1:(ncol(dat) -1),
function(x, y) dat[ , x] - dat[ , y])
# [,1] [,2] [,3] [,4]
# [1,] 60 -60 0 60
# [2,] 60 -60 60 0
# [3,] 60 -60 0 60
# [4,] 60 -60 0 0
# [5,] 60 0 0 0
You can also use as.POSIXct without pasting in the current date, via the 'format' argument:
makeTime <- function(x) as.POSIXct(x, format = "%H:%M")
Note, the result is the same because as.POSIXct assumes the current date when none is given.
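For instance (the date and time zone in the output depend on when and where you run it):

```r
as.POSIXct("21:26", format = "%H:%M")
# "<current date> 21:26:00" in your local time zone
```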
Another way, if you want named difference columns appended alongside your original data:
df<-as.data.frame(lapply(dat,strptime,format="%H:%M"))
lapply(1:4, function(i) df[,paste0("diff",i,"_",i+1)] <<- difftime(df[,i],df[,i+1],units=c("secs")))
df
Datum_1 Datum_2 Datum_3 Datum_4 Datum_5 diff1_2 diff2_3 diff3_4
2 2016-07-22 21:26:00 2016-07-22 21:27:00 2016-07-22 21:26:00 2016-07-22 21:26:00 2016-07-22 21:27:00 -60 secs 60 secs 0 secs
3 2016-07-22 21:33:00 2016-07-22 21:34:00 2016-07-22 21:33:00 2016-07-22 21:34:00 2016-07-22 21:34:00 -60 secs 60 secs -60 secs
4 2016-07-22 21:38:00 2016-07-22 21:39:00 2016-07-22 21:38:00 2016-07-22 21:38:00 2016-07-22 21:39:00 -60 secs 60 secs 0 secs
5 2016-07-22 23:02:00 2016-07-22 23:03:00 2016-07-22 23:02:00 2016-07-22 23:02:00 2016-07-22 23:02:00 -60 secs 60 secs 0 secs
6 2016-07-22 23:03:00 2016-07-22 23:04:00 2016-07-22 23:04:00 2016-07-22 23:04:00 2016-07-22 23:04:00 -60 secs 0 secs 0 secs
diff4_5
2 -60 secs
3 0 secs
4 -60 secs
5 0 secs
6 0 secs
I've found a solution to my problem using the function/loop I had created for numeric data. I just needed to include difftime(strptime(datumi[,i+1], format = "%H:%M"), strptime(datumi[,i], format = "%H:%M"), units = "secs") in my for loop, so the code looks like this:
datumi <- as.data.frame(datumi)
results <- list()
for(i in 1:(ncol(datumi) - 1)) {
  results[[i]] <- difftime(strptime(datumi[, i + 1], format = "%H:%M"),
                           strptime(datumi[, i], format = "%H:%M"),
                           units = "secs")
}
results <- t(do.call(rbind, results))
results <- data.frame(results)
#And output
  X1  X2 X3 X4
2 60 -60  0 60
3 60 -60 60  0
4 60 -60  0 60
5 60 -60  0  0
6 60   0  0  0
But because the mapply approach used by @dayne is more convenient for me (it applies a function over multiple list arguments and is more readable to me), I used his solution.
I have the following data as a list of POSIXct times spanning one month. Each represents a bike delivery. My aim is to find the average number of bike deliveries per ten-minute interval over a 24-hour period (producing 144 rows in total). First all of the trips need to be summed and binned into intervals, then divided by the number of days. So far, I've managed to write code that sums trips per 10-minute interval, but it produces incorrect values and I am not sure where it went wrong.
The data looks like this:
head(start_times)
[1] "2014-10-21 16:58:13 EST" "2014-10-07 10:14:22 EST" "2014-10-20 01:45:11 EST"
[4] "2014-10-17 08:16:17 EST" "2014-10-07 17:46:36 EST" "2014-10-28 17:32:34 EST"
length(start_times)
[1] 1747
The code looks like this:
library(lubridate)
library(dplyr)
tripduration <- floor(runif(1747) * 1000)
time_bucket <- start_times - minutes(minute(start_times) %% 10) - seconds(second(start_times))
df <- data.frame(tripduration, start_times, time_bucket)
summarized <- df %>%
group_by(time_bucket) %>%
summarize(trip_count = n())
summarized <- as.data.frame(summarized)
out_buckets <- data.frame(out_buckets = seq(as.POSIXlt("2014-10-01 00:00:00"), as.POSIXct("2014-10-31 23:0:00"), by = 600))
out <- left_join(out_buckets, summarized, by = c("out_buckets" = "time_bucket"))
out$trip_count[is.na(out$trip_count)] <- 0
head(out)
out_buckets trip_count
1 2014-10-01 00:00:00 0
2 2014-10-01 00:10:00 0
3 2014-10-01 00:20:00 0
4 2014-10-01 00:30:00 0
5 2014-10-01 00:40:00 0
6 2014-10-01 00:50:00 0
dim(out)
[1] 4459 2
test <- format(out$out_buckets,"%H:%M:%S")
test2 <- out$trip_count
test <- cbind(test, test2)
colnames(test)[1] <- "interval"
colnames(test)[2] <- "count"
test <- as.data.frame(test)
test$count <- as.numeric(test$count)
test <- aggregate(count~interval, test, sum)
head(test, n = 20)
interval count
1 00:00:00 32
2 00:10:00 33
3 00:20:00 32
4 00:30:00 31
5 00:40:00 34
6 00:50:00 34
7 01:00:00 31
8 01:10:00 33
9 01:20:00 39
10 01:30:00 41
11 01:40:00 36
12 01:50:00 31
13 02:00:00 33
14 02:10:00 34
15 02:20:00 32
16 02:30:00 32
17 02:40:00 36
18 02:50:00 32
19 03:00:00 34
20 03:10:00 39
But this is impossible, because when I sum the counts
sum(test$count)
[1] 7494
I get 7494, whereas the total should be 1747. I'm not sure where I went wrong, or how this code could be simplified to get the right result.
I've done what I can, but I can't reproduce your issue without your data.
library(dplyr)
I created the full sequence of 10 minute blocks:
blocks.of.10mins <- data.frame(out_buckets=seq(as.POSIXct("2014/10/01 00:00"), by="10 mins", length.out=30*24*6))
Then split the start_times into the same bins. Note: I created a baseline time of midnight to force the blocks to align to 10 minute intervals. Removing this later is an exercise for the reader. I also changed one of your data points so that there was at least one example of multiple records in the same bin.
start_times <- as.POSIXct(c("2014-10-01 00:00:00", ## added
"2014-10-21 16:58:13",
"2014-10-07 10:14:22",
"2014-10-20 01:45:11",
"2014-10-17 08:16:17",
"2014-10-07 10:16:36", ## modified
"2014-10-28 17:32:34"))
trip_times <- data.frame(start_times) %>%
mutate(out_buckets = as.POSIXct(cut(start_times, breaks="10 mins")))
The start_times and all the 10 minute intervals can then be merged
trips_merged <- merge(trip_times, blocks.of.10mins, by="out_buckets", all=TRUE)
These can then be grouped by 10 minute block and counted
trips_merged %>% filter(!is.na(start_times)) %>%
group_by(out_buckets) %>%
summarise(trip_count=n())
Source: local data frame [6 x 2]
out_buckets trip_count
(time) (int)
1 2014-10-01 00:00:00 1
2 2014-10-07 10:10:00 2
3 2014-10-17 08:10:00 1
4 2014-10-20 01:40:00 1
5 2014-10-21 16:50:00 1
6 2014-10-28 17:30:00 1
Instead, if we consider only the time of day and ignore the date:
trips_merged2 <- trips_merged
trips_merged2$out_buckets <- format(trips_merged2$out_buckets, "%H:%M:%S")
trips_merged2 %>% filter(!is.na(start_times)) %>%
group_by(out_buckets) %>%
summarise(trip_count=n())
Source: local data frame [6 x 2]
out_buckets trip_count
(chr) (int)
1 00:00:00 1
2 01:40:00 1
3 08:10:00 1
4 10:10:00 2
5 16:50:00 1
6 17:30:00 1
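To go from these per-time-of-day counts to the average the question asks for, one sketch (assuming trips_merged2 from above and a 31-day month; buckets with no trips on any day are absent rather than zero) is:

```r
trips_merged2 %>%
  filter(!is.na(start_times)) %>%
  group_by(out_buckets) %>%
  summarise(trip_count = n()) %>%
  mutate(avg_per_day = trip_count / 31)  # 31 days in October 2014
```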
I have a time variable: "00:00:29", "00:06:39", "20:43:15", ...
and I want to recode it into a new vector of time-based work shifts:
07:00:00 - 13:00:00 - 1
13:00:00 - 20:00:00 - 2
23:00:00 - 7:00:00 - 3
Thanks for any ideas :)
Assuming the time variables are strings as shown, this seems to work:
secNr <- function(x){ sum(as.numeric(unlist(strsplit(x,":",fixed=TRUE))) * c(3600,60,1)) }
workShift <- function(x)
{
n <- which.max(secNr(x) >= c(secNr("23:00:00"),secNr("20:00:00"),secNr("13:00:00"),secNr("07:00:00"),secNr("00:00:00")))
c(3,NA,2,1,3)[n]
}
workShift computes the shift for a single time string. If you have a vector of time strings, use sapply. Example:
> Time <- sprintf("%i:%02i:00", 0:23, sample(0:59,24))
> Shift <- sapply(Time,"workShift")
> Shift
0:37:00 1:17:00 2:35:00 3:09:00 4:08:00 5:28:00 6:03:00 7:43:00 8:27:00 9:38:00 10:48:00 11:50:00 12:58:00 13:32:00 14:05:00 15:39:00 16:56:00
3 3 3 3 3 3 3 1 1 1 1 1 1 2 2 2 2
17:00:00 18:22:00 19:02:00 20:42:00 21:11:00 22:15:00 23:01:00
2 2 2 NA NA NA 3
I am quite new to R and have been struggling to convert my data; I could use some much-needed help.
I have a dataframe which is approx. 70,000*2. This data covers a whole year (52 weeks/365 days). A portion of it looks like this:
Create.Date.Time Ticket.ID
1 2013-06-01 12:59:00 INCIDENT684790
2 2013-06-02 07:56:00 SERVICE684793
3 2013-06-02 09:39:00 SERVICE684794
4 2013-06-02 14:14:00 SERVICE684796
5 2013-06-02 17:20:00 SERVICE684797
6 2013-06-03 07:20:00 SERVICE684799
7 2013-06-03 08:02:00 SERVICE684839
8 2013-06-03 08:04:00 SERVICE684841
9 2013-06-03 08:04:00 SERVICE684842
10 2013-06-03 08:08:00 SERVICE684843
I am trying to get the number of tickets in every hour of the week (that is, hours 1 to 168) for each week. Hour 1 would start on Monday at 00:00, and hour 168 would be Sunday 23:00-23:59. This would be repeated for each week. I want to use the Create.Date.Time data to calculate the hour of the week each ticket falls in, for example:
2013-06-01 12:59:00 INCIDENT684790 - hour 133,
2013-06-03 08:08:00 SERVICE684843 - hour 9
I am then going to take averages for each hour and plot them. I am completely at a loss as to where to start. Could someone please point me in the right direction?
Before addressing the plotting aspect of your question: is this the format of data you are trying to get? This uses the package lubridate, which you might have to install (install.packages("lubridate", dependencies = TRUE)).
library(lubridate)
##
Events <- paste(
sample(c("INCIDENT","SERVICE"),20000,replace=TRUE),
sample(600000:900000,20000)
)
t0 <- as.POSIXct(
"2013-01-01 00:00:00",
format="%Y-%m-%d %H:%M:%S",
tz="America/New_York")
Dates <- sort(t0 + sample(0:(3600*24*365-1),20000))
Weeks <- week(Dates)
wDay <- wday(Dates,label=TRUE)
Hour <- hour(Dates)
##
hourShift <- function(time,wday){
hShift <- sapply(wday, function(X){
if(X=="Mon"){
0
} else if(X=="Tues"){
24*1
} else if(X=="Wed"){
24*2
} else if(X=="Thurs"){
24*3
} else if(X=="Fri"){
24*4
} else if(X=="Sat"){
24*5
} else {
24*6
}
})
##
tOut <- hour(time) + hShift + 1
return(tOut)
}
##
weekHour <- hourShift(time=Dates,wday=wDay)
##
Data <- data.frame(
Event=Events,
Timestamp=Dates,
Week=Weeks,
wDay=wDay,
dayHour=Hour,
weekHour=weekHour,
stringsAsFactors=FALSE)
##
This gives you:
> head(Data)
Event Timestamp Week wDay dayHour weekHour
1 SERVICE 783405 2013-01-01 00:13:55 1 Tues 0 25
2 INCIDENT 860015 2013-01-01 01:06:41 1 Tues 1 26
3 INCIDENT 808309 2013-01-01 01:10:05 1 Tues 1 26
4 INCIDENT 835509 2013-01-01 01:21:44 1 Tues 1 26
5 SERVICE 769239 2013-01-01 02:04:59 1 Tues 2 27
6 SERVICE 762269 2013-01-01 02:07:41 1 Tues 2 27
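From here, the averaging-and-plotting step the question mentions could be sketched as follows (assuming the Data frame built above; dplyr is an extra dependency, and note that mean(n) averages only over weeks that had at least one ticket in that hour):

```r
library(dplyr)
hourly_avg <- Data %>%
  count(Week, weekHour) %>%           # tickets per (week, hour-of-week) pair
  group_by(weekHour) %>%
  summarise(avg_tickets = mean(n))    # average each hour-of-week across weeks
plot(hourly_avg$weekHour, hourly_avg$avg_tickets, type = "l",
     xlab = "Hour of week (1-168)", ylab = "Average tickets")
```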