I am using the R programming language. I am trying to take the difference between two date columns. Both dates are in the following format : 2010-01-01 12:01
When I bring my file into R, the dates are in "Factor" format. Here is my attempt to recreate the file in R:
#how my file looks like when I import it into R
date_1 = c("2010-01-01 13:01 ", "2010-01-01 14:01" )
date_2 = c("2010-01-01 15:01 ", "2010-01-01 16:01" )
file = data.frame(date_1, date_2)
file$date_1 = as.factor(file$date_1)
file$date_2 = as.factor(file$date_2)
Now, I am trying to create a new column which takes the difference between these dates (in minutes)
I first tried to convert both date variables into the appropriate "Date" formats:
#convert to date formats:
file$date_a = as.POSIXlt(file$date_1,format="%Y-%m-%dT%H:%M")
file$date_b = as.POSIXlt(file$date_2,format="%Y-%m-%dT%H:%M")
Then, I tried to take the difference :
file$diff = difftime(file$date_a, file$date_b, units="mins")
But this results in "NA's":
> file
date_1 date_2 date_a date_b diff
1 2010-01-01 13:01 2010-01-01 13:01 <NA> <NA> NA mins
2 2010-01-01 13:01 2010-01-01 13:01 <NA> <NA> NA mins
Can someone please show me what I am doing wrong?
Thanks
Reference: How to get difference (in minutes) between two date strings?
There is no T in the string. So, we need the format as
difftime(as.POSIXct(file$date_1, format = '%Y-%m-%d %H:%M'),
as.POSIXct(file$date_2, format = '%Y-%m-%d %H:%M'), units = 'mins')
#Time differences in mins
#[1] -120 -120
Related
I'm having trouble converting character values into date (hour + minutes), I have the following codes:
start <- c("2022-01-10 9:35PM","2022-01-10 10:35PM")
end <- c("2022-01-11 7:00AM","2022-01-11 8:00AM")
dat <- data.frame(start,end)
These are all in character form. I would like to:
Convert all the datetimes into date format and into 24hr format like: "2022-01-10 9:35PM" into "2022-01-10 21:35",
and "2022-01-11 7:00AM" into "2022-01-11 7:00" because I would like to calculate the difference between the dates in hrs.
Also I would like to add an ID column with a specific ID, the desired data would like this:
ID <- c(101,101)
start <- c("2022-01-10 21:35","2022-01-10 22:35")
end <- c("2022-01-11 7:00","2022-01-11 8:00")
diff <- c(9,10) # I'm not sure how the calculations would turn out to be
dat <- data.frame(ID,start,end,diff)
I would appreciate all the help there is! Thanks!!!
You can use lubridate::ymd_hm. Don't use floor if you want the exact value.
library(dplyr)
library(lubridate)
dat %>%
mutate(ID = 101,
across(c(start, end), ymd_hm),
diff = floor(end - start))
start end ID diff
1 2022-01-10 21:35:00 2022-01-11 07:00:00 101 9 hours
2 2022-01-10 22:35:00 2022-01-11 08:00:00 101 9 hours
The base R approach with strptime is:
strptime(dat$start, "%Y-%m-%d %H:%M %p")
[1] "2022-01-10 09:35:00 CET" "2022-01-10 10:35:00 CET"
I have a csv of real-time data inputs with timestamps and I am looking to group these data in a time series of 30 mins for analysis.
A sample of the real-time data is
Date:
2019-06-01 08:03:04
2019-06-01 08:20:04
2019-06-01 08:33:04
2019-06-01 08:54:04
...
I am looking to group them in a table with a step increment of 30 mins (i.e. 08:30, 09:00, etc..) to seek out the number of occurences during each period. I created a new csv file to access through R. This is so that I will not corrupt the formatting of the orginal dataset.
Date:
2019-06-01 08:00
2019-06-01 08:30
2019-06-01 09:00
2019-06-01 09:30
I have firstly constructed a list of 30 mins intervals by:
sheet_csv$Date <- as.POSIXct(paste(sheet_csv$Date), format = "%Y-%m-%d %H:%M", tz = "GMT") #to change to POSIXct
sheet_csv$Date <- timeDate::timeSequence(from = "2019-06-01 08:00", to = "2019-12-03 09:30", by = 1800,
format = "%Y-%m-%d %H:%M", zone = "GMT")
I encountered an error "Error in x[[idx]][[1]] : this S4 class is not subsettable" for this interval.
I am relatively new to R. Please do help out where you can. Greatly Appreciated.
You probably don't need the timeDate package for something like this. One package that is very helpful to manipulate dates and times is lubridate - you may want to consider going forward.
I used your example and added another date/time for illustration.
To create your 30 minute intervals, you could use cut and seq.POSIXt to create a sequence of date/times with 30 minute breaks. I used your minimum date/time to start with (rounding down to nearest hour) but you can also specify another date/time here.
Use of table will give you frequencies after cut.
sheet_csv <- data.frame(
Date = c("2019-06-01 08:03:04",
"2019-06-01 08:20:04",
"2019-06-01 08:33:04",
"2019-06-01 08:54:04",
"2019-06-01 10:21:04")
)
sheet_csv$Date <- as.POSIXct(sheet_csv$Date, format = "%Y-%m-%d %H:%M:%S", tz = "GMT")
as.data.frame(table(cut(sheet_csv$Date,
breaks = seq.POSIXt(from = round(min(sheet_csv$Date), "hours"),
to = max(sheet_csv$Date) + .5 * 60 * 60,
by = "30 min"))))
Output
Var1 Freq
1 2019-06-01 08:00:00 2
2 2019-06-01 08:30:00 2
3 2019-06-01 09:00:00 0
4 2019-06-01 09:30:00 0
5 2019-06-01 10:00:00 1
I want to make a time series with the frequency a date and time is observed. The raw data looked something like this:
dd-mm-yyyy hh:mm
28-2-2018 0:12
28-2-2018 11:16
28-2-2018 12:12
28-2-2018 13:22
28-2-2018 14:23
28-2-2018 14:14
28-2-2018 16:24
The date and time format is in the wrong way for R, so I had to adjust it:
extracted_times <- as.POSIXct(bedrijf.CSV$viewed_at, format = "%d-%m-%Y %H:%M")
I ordered the data with frequency in a table using the following code:
timeserieswithoutzeros <- table(extracted_times)
The data looks something like this now:
2018-02-28 00:11:00 2018-02-28 01:52:00 2018-02-28 03:38:00
1 2 5
2018-02-28 04:10:00 2018-02-28 04:40:00 2018-02-28 04:45:00
2 1 1
As you may see there are a lot of unobserved dates and times.
I want to add these unobserved dates and times with the frequency of 0.
I tried the complete function, but the error states that it can't best used, because I use as.POSIXct().
Any ideas?
As already mentinoned in the comments by #eric-lecoutre, you can combine your observations with a sequence begining at the earliest ending at the last date using seq and subtract 1 of the frequency table.
timeseriesWithzeros <- table(c(extracted_times, seq(min(extracted_times), max(extracted_times), "1 min")))-1
Maybe the following is what you want.
First, coerce the data to class "POSIXt" and create the sequence of all date/time between min and max by steps of 1 minute.
bedrijf.CSV$viewed_at <- as.POSIXct(bedrijf.CSV$viewed_at, format = "%d-%m-%Y %H:%M")
new <- seq(min(bedrijf.CSV$viewed_at),
max(bedrijf.CSV$viewed_at),
by = "1 mins")
tmp <- data.frame(viewed_at = new)
Now see if these values are in the original data.
tmp$viewed <- tmp$viewed_at %in% bedrijf.CSV$viewed_at
tbl <- xtabs(viewed ~ viewed_at, tmp)
sum(tbl != 0)
#[1] 7
Final clean up.
rm(new, tmp)
I'm new to R, so this may very well be a simple problem, but it's causing me a lot of difficulty.
I am trying to subset between two values found across data frames, and I am having difficulty when trying to subset between these two values. I will first describe what I've done, what is working, and then what is not working.
I have two data frames. One has a series of storm data, including dates of storm events, and the other has a series of data corresponding to discharge for many thousands of monitoring events. I am trying to see if any of the discharge data corresponds within the storm event start and end dates/times.
What I have done thus far is as follows:
Example discharge data:
X. DateTime Depth DateTime1 newcol
1 3 8/2/2013 13:15 0.038 2013-08-02 13:15:00 1375463700
2 4 8/2/2013 13:30 0.038 2013-08-02 13:30:00 1375464600
3 5 8/2/2013 13:45 0.039 2013-08-02 13:45:00 1375465500
4 6 8/2/2013 14:00 0.039 2013-08-02 14:00:00 1375466400
Example storm data:
Storm newStart newEnd
1 1 1382125500 1382130000
2 2 1385768100 1385794200
#Make a value to which the csv files are attached
CA_Storms <- read.csv(file = "CA_Storms.csv", header = TRUE, stringsAsFactors = FALSE)
CA_adj <- read.csv(file = "CA_Adj.csv", header = TRUE, stringsAsFactors = FALSE)
#strptime function (do this for all data sets)
CA_adj$DateTime1 <- strptime(CA_adj$DateTime, format = "%m/%d/%Y %H:%M")
CA_Storms$Start.time1 <- strptime(CA_Storms$Start.time, format = "%m/%d/%Y %H:%M")
CA_Storms$End.time1 <- strptime(CA_Storms$End.time, format = "%m/%d/%Y %H:%M")
#Make dates and times continuous
CA_adj$newcol <- as.numeric(CA_adj$DateTime1)
CA_Storms$newStart <- as.numeric(CA_Storms$Start.time1)
CA_Storms$newEnd <- as.numeric(CA_Storms$End.time1)
This allows me to do the following subsets successfully:
CA_adj[CA_adj$newcol == "1375463700", ]
Example output:
X. DateTime Depth DateTime1 newcol
1 3 8/2/2013 13:15 0.038 2013-08-02 13:15:00 1375463700
CA_adj[CA_adj$newcol == CA_Storms[1,19], ]
X. DateTime Depth DateTime1 newcol
7403 7408 10/18/2013 15:45 0.058 2013-10-18 15:45:00 1382125500
CA_adj[CA_adj$newcol <= CA_Storms[1,20], ]
However, whenever I try to have it move between two values, such as in:
CA_adj[CA_adj$newcol >= CA_Storms[1,19] & CA_adj$newol <= CA_Storms[1,20], ]
it responds with this:
[1] X. DateTime Depth DateTime1 newcol
<0 rows> (or 0-length row.names)
I know this output is incorrect, as, through a cursory look through my large data set, there is at least one value that falls within these criteria.
What gives?
discharge<-data.frame( x=c(3,4,5,6),
DateTime=c("8/2/2013 13:15","8/2/2013 13:30",
"8/2/2013 13:45","8/2/2013 14:00"),
Depth=c(0.038, 0.038, 0.039, 0.039)
)
discharge$DateTime1<- as.POSIXct(discharge$DateTime, format = "%m/%d/%Y %H:%M")
storm<-data.frame( storm=c(1,2),
start=c("8/2/2013 13:15","8/2/2013 16:30"),
end=c("8/2/2013 13:45","8/2/2013 16:45")
)
storm$start<- as.POSIXct(storm$start, format = "%m/%d/%Y %H:%M")
storm$end<- as.POSIXct(storm$end, format = "%m/%d/%Y %H:%M")
discharge[(discharge$DateTime1>=storm[1,2] & discharge$DateTime1<=storm[1,3]),]
We have a csv file with Dates in Excel format and Nav for Manager A and Manager B as follows:
Date,Manager A,Date,Manager B
41346.6666666667,100,40932.6666666667,100
41347.6666666667,100,40942.6666666667,99.9999936329992
41348.6666666667,100,40945.6666666667,99.9999936397787
41351.6666666667,100,40946.6666666667,99.9999936714362
41352.6666666667,100,40947.6666666667,100.051441180137
41353.6666666667,100,40948.6666666667,100.04877283951
41354.6666666667,100.000077579585,40949.6666666667,100.068400298752
41355.6666666667,100.00007861475,40952.6666666667,100.070263374822
41358.6666666667,100.000047950872,40953.6666666667,99.9661095940006
41359.6666666667,99.9945012295984,40954.6666666667,99.8578245935173
41360.6666666667,99.9944609274138,40955.6666666667,99.7798031949116
41361.6666666667,99.9944817907402,40956.6666666667,100.029523604978
41366.6666666667,100,40960.6666666667,100.14859511024
41367.6666666667,99.4729804387476,40961.6666666667,99.7956029017769
41368.6666666667,99.4729804387476,40962.6666666667,99.7023420799123
41369.6666666667,99.185046151864,40963.6666666667,99.6124531927299
41372.6666666667,99.1766469096966,40966.6666666667,99.5689030038018
41373.6666666667,98.920738006398,40967.6666666667,99.5701493637685
,,40968.6666666667,99.4543885041996
,,40969.6666666667,99.3424528379521
We want to create a zoo object with the following structure [Dates, Manager A Nav, Manager B Nav].
After reading the csv file with:
data = read.csv("...", header=TRUE, sep=",")
we set an index for splitting the object and use lapply to split
INDEX <- seq(1, by = 2, length = ncol(data) / 2)
data.zoo <- lapply(INDEX, function(i, data) data[i:(i+1)], data = zoo(data))
I'm stuck with the fact that Dates are in Excel format and don't know how to fix that stuff. Is the problem set in a correct way?
If all you want to do is to convert the dates to proper dates you can do this easily enough. The thing you need to know is the origin date. Your numbers represent the integer and fractional number of days that have passed since the origin date. Usually this is Jan 0 1990!!! Go figure, but be careful as I don't think this is always the case. You can try this...
# Excel origin is day 0 on Jan 0 1900, but treats 1900 as leap year so...
data$Date <- as.Date( data$Date , origin = "1899/12/30")
data$Date.1 <- as.Date( data$Date.1 , origin = "1899/12/30")
# For more info see ?as.Date
If you are interested in keeping the times as well, you can use as.POSIXct, but you must also specify the timezone (UTC by default);
data$Date <- as.POSIXct(data$Date, origin = "1899/12/30" )
head(data)
# Date Manager.A Date.1 Manager.B
# 1 2013-03-13 16:00:00 100 2012-01-24 100.00000
# 2 2013-03-14 16:00:00 100 2012-02-03 99.99999
# 3 2013-03-15 16:00:00 100 2012-02-06 99.99999
# 4 2013-03-18 16:00:00 100 2012-02-07 99.99999
# 5 2013-03-19 16:00:00 100 2012-02-08 100.05144
# 6 2013-03-20 16:00:00 100 2012-02-09 100.04877