I have a data file that needs to be averaged.
data <- data.frame(
  Data = seq(
    from = as.POSIXct("2014-04-01 00:00:00"),
    to   = as.POSIXct("2014-04-03 00:00:00"),
    by   = "5 min"
  ),
  value = rnorm(577, 0, 1)
)
I need to find the average of "value" from 05:00:00 to 17:00:00 and then 17:00:00 to 05:00:00 (of the following day).
e.g. from 2014-04-01 05:00:00 to 2014-04-01 17:00:00 and from 2014-04-01 17:00:00 to 2014-04-02 05:00:00
The real data is not continuous and is missing several intervals. I can do it for the same day, but I don't know how to include the time from the following day.
Here's one strategy. You can use cut.POSIXt and seq.POSIXt to create an interval factor, and then use tapply to take the means over the different intervals.
intervals <- cut(
  data$Data,
  breaks = seq(
    as.POSIXct("2014-03-31 17:00:00"),
    as.POSIXct("2014-04-03 05:00:00"),
    by = "12 hours"
  )
)
means<-tapply(data$value, intervals, mean)
as.data.frame(means)
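If you also want to label each 12-hour bin as day or night, a small follow-up sketch (the period column name is just illustrative; bins whose start hour is 17 span 17:00 to 05:00 and are the night intervals; a seed is added so the sketch is reproducible):

```r
# Reconstruct the pieces from above
set.seed(1)
data <- data.frame(
  Data  = seq(as.POSIXct("2014-04-01 00:00:00"),
              as.POSIXct("2014-04-03 00:00:00"), by = "5 min"),
  value = rnorm(577, 0, 1)
)
intervals <- cut(data$Data,
                 breaks = seq(as.POSIXct("2014-03-31 17:00:00"),
                              as.POSIXct("2014-04-03 05:00:00"),
                              by = "12 hours"))
means <- tapply(data$value, intervals, mean)

# Bins starting at hour 17 run 17:00 -> 05:00 (night); the rest are day
means_df <- as.data.frame(means)
means_df$period <- ifelse(format(as.POSIXct(rownames(means_df)), "%H") == "17",
                          "night", "day")
means_df
```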
Here is a way:
day <- data[as.numeric(strftime(data$Data, "%H")) >= 5 &
            as.numeric(strftime(data$Data, "%H")) < 17, ]
night <- data[as.numeric(strftime(data$Data, "%H")) < 5 |
              as.numeric(strftime(data$Data, "%H")) >= 17, ]
strftime returns a character vector, which is why it is nested inside as.numeric here. From there it is just indexing.
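From there, the two averages are single mean calls. A self-contained sketch (using inclusive >= boundaries so the 05:00 hour counts as day; note this split is by clock hour only, so every date's daytime observations are pooled, with tapply shown for per-date averages):

```r
set.seed(1)
data <- data.frame(
  Data  = seq(as.POSIXct("2014-04-01 00:00:00"),
              as.POSIXct("2014-04-03 00:00:00"), by = "5 min"),
  value = rnorm(577, 0, 1)
)
hrs   <- as.numeric(strftime(data$Data, "%H"))
day   <- data[hrs >= 5 & hrs < 17, ]
night <- data[hrs < 5 | hrs >= 17, ]

mean(day$value)    # overall daytime average
mean(night$value)  # overall night-time average

# Per-date daytime averages, if one value per day is wanted:
tapply(day$value, as.Date(day$Data), mean)
```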
I'm having trouble converting character values into date (hour + minutes), I have the following codes:
start <- c("2022-01-10 9:35PM","2022-01-10 10:35PM")
end <- c("2022-01-11 7:00AM","2022-01-11 8:00AM")
dat <- data.frame(start,end)
These are all in character form. I would like to:
Convert all the datetimes into date format using the 24-hour clock, e.g. "2022-01-10 9:35PM" into "2022-01-10 21:35"
and "2022-01-11 7:00AM" into "2022-01-11 07:00", because I would like to calculate the difference between the dates in hours.
Also I would like to add an ID column with a specific ID, the desired data would like this:
ID <- c(101,101)
start <- c("2022-01-10 21:35","2022-01-10 22:35")
end <- c("2022-01-11 7:00","2022-01-11 8:00")
diff <- c(9,10) # I'm not sure how the calculations would turn out to be
dat <- data.frame(ID,start,end,diff)
I would appreciate all the help there is! Thanks!!!
You can use lubridate::ymd_hm, which also understands the AM/PM suffix. Drop the floor if you want the exact value.
library(dplyr)
library(lubridate)
dat %>%
  mutate(ID = 101,
         across(c(start, end), ymd_hm),
         diff = floor(end - start))
start end ID diff
1 2022-01-10 21:35:00 2022-01-11 07:00:00 101 9 hours
2 2022-01-10 22:35:00 2022-01-11 08:00:00 101 9 hours
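Without the floor, the exact gap can be taken with base difftime. A sketch starting from already-parsed POSIXct columns (both rows are 9 h 25 min):

```r
start <- as.POSIXct(c("2022-01-10 21:35", "2022-01-10 22:35"), tz = "UTC")
end   <- as.POSIXct(c("2022-01-11 07:00", "2022-01-11 08:00"), tz = "UTC")
# Exact difference in hours: 9 + 25/60 for both rows
as.numeric(difftime(end, start, units = "hours"))
```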
The base R approach uses strptime; note that the AM/PM marker %p must be paired with the 12-hour %I (not %H), and its parsing is locale-dependent:
strptime(dat$start, "%Y-%m-%d %I:%M%p")
[1] "2022-01-10 21:35:00 CET" "2022-01-10 22:35:00 CET"
I have a column called "time" with some observations in "hours:minutes:seconds" format and others with only "hours:minutes". I would like to remove the seconds and be left with only hours and minutes.
So far I have loaded the lubridate package and tried:
format(data$time ,format = "%H:%M")
but no change occurs.
And with:
data$time <- hm(data$time)
all the observations with h:m:s become NAs
What should I do?
You can use parse_date_time from lubridate to bring time into POSIXct format and then use format to keep the information that you need.
data <- data.frame(time = c('10:04:00', '14:00', '15:00', '12:34:56'))
data$time1 <- format(lubridate::parse_date_time(data$time, c('HMS', 'HM')), '%H:%M')
data
# time time1
#1 10:04:00 10:04
#2 14:00 14:00
#3 15:00 15:00
#4 12:34:56 12:34
I have a csv of real-time data inputs with timestamps and I am looking to group these data in a time series of 30 mins for analysis.
A sample of the real-time data is
Date:
2019-06-01 08:03:04
2019-06-01 08:20:04
2019-06-01 08:33:04
2019-06-01 08:54:04
...
I am looking to group them in a table with a step increment of 30 mins (i.e. 08:30, 09:00, etc.) to count the number of occurrences during each period. I created a new csv file to access through R, so that I will not corrupt the formatting of the original dataset.
Date:
2019-06-01 08:00
2019-06-01 08:30
2019-06-01 09:00
2019-06-01 09:30
I have firstly constructed a list of 30 mins intervals by:
sheet_csv$Date <- as.POSIXct(paste(sheet_csv$Date), format = "%Y-%m-%d %H:%M", tz = "GMT") #to change to POSIXct
sheet_csv$Date <- timeDate::timeSequence(from = "2019-06-01 08:00", to = "2019-12-03 09:30", by = 1800,
format = "%Y-%m-%d %H:%M", zone = "GMT")
I encountered an error "Error in x[[idx]][[1]] : this S4 class is not subsettable" for this interval.
I am relatively new to R. Please do help out where you can. Greatly Appreciated.
You probably don't need the timeDate package for something like this. One package that is very helpful for manipulating dates and times is lubridate; you may want to consider it going forward.
I used your example and added another date/time for illustration.
To create your 30-minute intervals, you could use cut and seq.POSIXt to create a sequence of date/times with 30-minute breaks. I started from your minimum date/time (rounded to the nearest hour with round; use trunc if you always want to round down), but you can also specify another start date/time here.
Use of table will give you frequencies after cut.
sheet_csv <- data.frame(
Date = c("2019-06-01 08:03:04",
"2019-06-01 08:20:04",
"2019-06-01 08:33:04",
"2019-06-01 08:54:04",
"2019-06-01 10:21:04")
)
sheet_csv$Date <- as.POSIXct(sheet_csv$Date, format = "%Y-%m-%d %H:%M:%S", tz = "GMT")
as.data.frame(table(cut(sheet_csv$Date,
breaks = seq.POSIXt(from = round(min(sheet_csv$Date), "hours"),
to = max(sheet_csv$Date) + .5 * 60 * 60,
by = "30 min"))))
Output
Var1 Freq
1 2019-06-01 08:00:00 2
2 2019-06-01 08:30:00 2
3 2019-06-01 09:00:00 0
4 2019-06-01 09:30:00 0
5 2019-06-01 10:00:00 1
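If you don't need the empty bins padded with zeros, a base-R alternative sketch floors each timestamp to the start of its half hour through the epoch seconds (1800 s per bin) and tabulates only the occupied bins:

```r
sheet_csv <- data.frame(Date = as.POSIXct(
  c("2019-06-01 08:03:04", "2019-06-01 08:20:04",
    "2019-06-01 08:33:04", "2019-06-01 08:54:04",
    "2019-06-01 10:21:04"), tz = "GMT"))

# Round each POSIXct down to the start of its 30-minute bin
bins <- as.POSIXct(floor(as.numeric(sheet_csv$Date) / 1800) * 1800,
                   origin = "1970-01-01", tz = "GMT")
table(format(bins, "%Y-%m-%d %H:%M"))
```

Unlike the cut approach, bins with zero observations (09:00, 09:30) simply don't appear in this table.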
I want to make a time series with the frequency a date and time is observed. The raw data looked something like this:
dd-mm-yyyy hh:mm
28-2-2018 0:12
28-2-2018 11:16
28-2-2018 12:12
28-2-2018 13:22
28-2-2018 14:23
28-2-2018 14:14
28-2-2018 16:24
The date and time are not in the format R expects by default, so I had to adjust them:
extracted_times <- as.POSIXct(bedrijf.CSV$viewed_at, format = "%d-%m-%Y %H:%M")
I tabulated the data with frequencies using the following code:
timeserieswithoutzeros <- table(extracted_times)
The data looks something like this now:
2018-02-28 00:11:00 2018-02-28 01:52:00 2018-02-28 03:38:00
1 2 5
2018-02-28 04:10:00 2018-02-28 04:40:00 2018-02-28 04:45:00
2 1 1
As you may see there are a lot of unobserved dates and times.
I want to add these unobserved dates and times with the frequency of 0.
I tried the complete function, but the error states that it can't be used, because I use as.POSIXct().
Any ideas?
As already mentioned in the comments by @eric-lecoutre, you can combine your observations with a one-minute sequence running from the earliest to the latest date using seq, and then subtract 1 from the frequency table.
timeseriesWithzeros <- table(c(extracted_times, seq(min(extracted_times), max(extracted_times), "1 min")))-1
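A tiny worked example of that trick (hypothetical values; the padding sequence contributes exactly one count to every minute in the range, hence the - 1):

```r
extracted_times <- as.POSIXct(c("2018-02-28 00:12", "2018-02-28 00:12",
                                "2018-02-28 00:15"), tz = "UTC")
# Padded table: 00:12 -> 2, 00:13 -> 0, 00:14 -> 0, 00:15 -> 1
timeseriesWithzeros <- table(c(extracted_times,
                               seq(min(extracted_times),
                                   max(extracted_times), "1 min"))) - 1
timeseriesWithzeros
```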
Maybe the following is what you want.
First, coerce the data to class "POSIXct" and create the sequence of all date/times between min and max in steps of 1 minute.
bedrijf.CSV$viewed_at <- as.POSIXct(bedrijf.CSV$viewed_at, format = "%d-%m-%Y %H:%M")
new <- seq(min(bedrijf.CSV$viewed_at),
max(bedrijf.CSV$viewed_at),
by = "1 mins")
tmp <- data.frame(viewed_at = new)
Now see if these values are in the original data.
tmp$viewed <- tmp$viewed_at %in% bedrijf.CSV$viewed_at
tbl <- xtabs(viewed ~ viewed_at, tmp)
sum(tbl != 0)
#[1] 7
Final clean up.
rm(new, tmp)
I have a data frame with a series of times in the following format:
08:09:23.079
> class(timer3)
[1] "factor"
I would like to round/convert them to this format:
08:09
The end goal is to use them as values for the x-axis of a plot so I assume they would need to go to some type of time format (zoo, as.Date, etc.).
Any suggestions?
Suppose we have this input data:
DF <- data.frame(times = c("08:09:23.079", "08:30:13.062"), values = 1:2)
To keep it simple, let's assume that there is at most one time point per minute (an alternative without this restriction, slightly longer, is shown afterwards):
library(zoo)
library(chron)
# this assumes we want to store times to the second
tt <- times(as.character(DF$times))
z <- zoo(DF$values, tt)
plot(z, xaxt = "n")
# custom axis - assumes sufficiently many points to get reasonable graph
# round tick mark locations to the minute and remove the seconds from label
axt <- trunc(times(axTicks(1)), "min")
axis(1, at = axt, lab = sub(":..$", "", axt))
The above method of creating z could alternatively be replaced with this. It works whether or not there is more than one point per minute, as it aggregates them to the minute:
# with this z we will be store times to the minute
z <- read.zoo(DF, FUN = function(x) trunc(times(as.character(x)), "min"),
aggregate = mean)
EDIT: plotting and truncation.
At the risk of being called a necromancer, I will answer this question, as I think this situation arises quite often.
Here is how to do it if you convert your time-series data to xts format. The function to use here is align.time:
> head(GBPJPY)
GBPJPY.Open GBPJPY.High GBPJPY.Low GBPJPY.Close
2009-05-01 00:14:59 146.387 146.882 146.321 146.620
2009-05-01 00:29:54 146.623 146.641 146.434 146.579
2009-05-01 00:44:59 146.579 146.908 146.570 146.810
2009-05-01 00:59:59 146.810 146.842 146.030 146.130
2009-05-01 01:14:59 146.130 146.330 146.100 146.315
2009-05-01 01:29:57 146.315 146.382 146.159 146.201
> head(align.time(GBPJPY, 15*60))
GBPJPY.Open GBPJPY.High GBPJPY.Low GBPJPY.Close
2009-05-01 00:15:00 146.387 146.882 146.321 146.620
2009-05-01 00:30:00 146.623 146.641 146.434 146.579
2009-05-01 00:45:00 146.579 146.908 146.570 146.810
2009-05-01 01:00:00 146.810 146.842 146.030 146.130
2009-05-01 01:15:00 146.130 146.330 146.100 146.315
2009-05-01 01:30:00 146.315 146.382 146.159 146.201
as.zoo(sapply(timer3, substring, 1, 5))
or as.xts?
Maybe looking at a bigger sample of your data would help.
Two steps: 1) factor to character: as.character(); 2) character to date-time: strptime() (note it returns POSIXlt; wrap in as.POSIXct() if you need POSIXct).
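A sketch of those two steps on values shaped like the question's (%OS handles the fractional seconds; the date part defaults to today):

```r
timer3 <- factor(c("08:09:23.079", "08:30:13.062"))
# 1) factor -> character, 2) character -> date-time (POSIXlt)
tm <- strptime(as.character(timer3), format = "%H:%M:%OS")
format(tm, "%H:%M")
# "08:09" "08:30"
```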