R: aggregate by date - (every 30min mean) [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 6 years ago.
I have been struggling with this for a while now:
I have a data frame that contains 5-minute measurements (for around 6 months) of different parameters. I want to aggregate them and get the mean of every parameter every 30 min. Here is a short example:
TIMESTAMP <- c("2015-12-31 0:30", "2015-12-31 0:35","2015-12-31 0:40", "2015-12-31 0:45", "2015-12-31 0:50", "2015-12-31 0:55", "2015-12-31 1:00", "2015-12-31 1:05", "2015-12-31 1:10", "2015-12-31 1:15", "2015-12-31 1:20", "2015-12-31 1:25", "2015-12-31 1:30")
value1 <- c(45, 50, 68, 78, 99, 100, 5, 9, 344, 10, 45, 68, 33)
mymet <- as.data.frame(TIMESTAMP, value1)
mymet$TIMESTAMP <- as.POSIXct(mymet$TIMESTAMP, format = "%Y-%m-%d %H:%M")
halfhour <- aggregate(mymet, list(TIME = cut(mymet$TIMESTAMP, breaks = "30 mins")),
                      mean, na.rm = TRUE)
What I want is the average of the values from 00:35 to 01:00, labelled DATE 1:00 AM. What I get instead is the average of the values from 00:30 to 00:55, labelled DATE 12:30 AM.
How can I change the function to give me the values that I want?

The trick (I think) is to look at when your first observation starts. If the first observation is at 00:35 and you cut into 30-minute intervals, the intervals follow the logic you want. As for the names of the breaks, each label is the start of its interval, so adding 25 minutes to it gives the end time you want (00:35 becomes 01:00). Here is an example covering January through 1 June 2015:
library(lubridate)
library(dplyr)

TIMESTAMP <- seq(ymd_hm('2015-01-01 00:00'), ymd_hm('2015-06-01 23:55'), by = '5 min')
TIMESTAMP <- data.frame(obs = seq_along(TIMESTAMP), TS = TIMESTAMP)
TIMESTAMP <- TIMESTAMP[-(1:7), ]  # drop 00:00 to 00:30 so the series starts at 00:35
TIMESTAMP$Breaks <- cut(TIMESTAMP$TS, breaks = "30 mins")
# relabel each bin from its first observation (e.g. 00:35) to its end (01:00)
TIMESTAMP$Breaks <- ymd_hms(as.character(TIMESTAMP$Breaks)) + (25*60)
Averages <- TIMESTAMP %>%
  group_by(Breaks) %>%
  summarise(MeanObs = mean(obs, na.rm = TRUE))
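A different technique from the cut() approach above, sketched here as an alternative that does not depend on where the first observation falls: round each timestamp up to the next multiple of 1800 seconds, so every bin is labelled by its end time. It assumes the timestamps are in UTC (or any zone whose offset is a whole multiple of 30 minutes); half_hour_end is a made-up column name, and the data are the question's values rebuilt with seq().

```r
# The question's 13 five-minute observations, 00:30 to 01:30 (UTC assumed)
mymet <- data.frame(
  TIMESTAMP = seq(as.POSIXct("2015-12-31 00:30", tz = "UTC"),
                  by = "5 min", length.out = 13),
  value1 = c(45, 50, 68, 78, 99, 100, 5, 9, 344, 10, 45, 68, 33)
)

# Round each timestamp UP to the next half-hour boundary (1800 seconds);
# a timestamp already on a boundary stays where it is
bin_end <- ceiling(as.numeric(mymet$TIMESTAMP) / 1800) * 1800

# Label each bin by its end time (hypothetical column name)
mymet$half_hour_end <- format(
  as.POSIXct(bin_end, origin = "1970-01-01", tz = "UTC"),
  "%Y-%m-%d %H:%M"
)

halfhour <- aggregate(value1 ~ half_hour_end, data = mymet, FUN = mean)
halfhour
```

With these data, the 01:00 row averages the six values from 00:35 to 01:00, and the lone 00:30 observation falls into a bin of its own.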

If you get mymet constructed properly, you can cut TIMESTAMP into bins (which you can do with cut.POSIXt) so you can aggregate:
mymet$half_hour <- cut(mymet$TIMESTAMP, breaks = "30 min")
aggregate(value1 ~ half_hour, mymet, mean)
## half_hour value1
## 1 2015-12-31 00:30:00 73.33333
## 2 2015-12-31 01:00:00 80.16667
## 3 2015-12-31 01:30:00 33.00000
Data
mymet <- structure(list(TIMESTAMP = structure(c(1451539800, 1451540100,
1451540400, 1451540700, 1451541000, 1451541300, 1451541600, 1451541900,
1451542200, 1451542500, 1451542800, 1451543100, 1451543400), class = c("POSIXct",
"POSIXt"), tzone = ""), value1 = c(45, 50, 68, 78, 99, 100, 5,
9, 344, 10, 45, 68, 33)), .Names = c("TIMESTAMP", "value1"), row.names = c(NA,
-13L), class = "data.frame")
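For reference, a minimal sketch of one way to build mymet from the question's vectors so that the steps above run. The question calls as.data.frame(TIMESTAMP, value1), but the second argument of as.data.frame() is row.names, so value1 never becomes a column; data.frame() creates both columns. The tz = "UTC" is an assumption added here for reproducibility.

```r
TIMESTAMP <- c("2015-12-31 0:30", "2015-12-31 0:35", "2015-12-31 0:40",
               "2015-12-31 0:45", "2015-12-31 0:50", "2015-12-31 0:55",
               "2015-12-31 1:00", "2015-12-31 1:05", "2015-12-31 1:10",
               "2015-12-31 1:15", "2015-12-31 1:20", "2015-12-31 1:25",
               "2015-12-31 1:30")
value1 <- c(45, 50, 68, 78, 99, 100, 5, 9, 344, 10, 45, 68, 33)

# data.frame() keeps both vectors as columns
mymet <- data.frame(TIMESTAMP = TIMESTAMP, value1 = value1,
                    stringsAsFactors = FALSE)
mymet$TIMESTAMP <- as.POSIXct(mymet$TIMESTAMP,
                              format = "%Y-%m-%d %H:%M", tz = "UTC")

# Same binning and aggregation as in the answer
mymet$half_hour <- cut(mymet$TIMESTAMP, breaks = "30 min")
res <- aggregate(value1 ~ half_hour, mymet, mean)
res
```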

Related

Grouping rows of large dataset in R

I am trying to calculate driver activity using GPS data. I've written a loop that calculates the difference in time between two consecutive points in a dataframe over the range of values, summing it as it goes.
Here is an example of my data:
DriveNo Date.and.Time Latitude Longitude
1 156 2014-01-31 23:00:00 41.88367 12.48778
2 187 2014-01-31 23:00:01 41.92854 12.46904
3 297 2014-01-31 23:00:01 41.89107 12.49270
4 89 2014-01-31 23:00:01 41.79318 12.43212
5 79 2014-01-31 23:00:01 41.90028 12.46275
6 191 2014-01-31 23:00:02 41.85231 12.57741
Reprex:
taxi_noOutlier <- structure(list(DriveNo = c(156, 187, 297, 89, 79, 191),
Date.and.Time = structure(c(1391209200.73917, 1391209201.14846,
1391209201.22007, 1391209201.47085, 1391209201.63114, 1391209202.04855),
class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Latitude = c(41.883670807, 41.928543091, 41.891067505, 41.793178558,
41.900276184, 41.852306366),
Longitude = c(12.48777771, 12.469037056, 12.492704391, 12.432122231,
12.46274662, 12.577406883)),
row.names = c(NA, 6L), class = "data.frame")
And the loop:
taxi_156 <- filter(taxi_noOutlier, DriveNo == 156)
datelist = taxi_156$Date.and.Time
dlstandard = as.POSIXlt(datelist)
diffsum <- as.numeric(sum(Filter(function(x) x <= 60, difftime(tail(dlstandard, -1), head(dlstandard, -1), units = 'secs'))))
print(paste("The total activity time for driver #156 is ", diffsum))
Which gives an output of:
[1] "The total activity time for driver #264 is 705655.37272048"
My question is, how can I expand this code to find the activity for each other driver? (There are 374 unique drivers, each with thousands of points.) I have tried to replicate the above code using a loop that would calculate the time difference for each DriveNo, but I am new to R and my understanding of loop syntax isn't great.
Can I filter into separate dataframes using a method like this? (This gives an error to do with unexpected bracketing).
for (i in seq_along(taxi_noOutlier$DriveNo))
{
taxi_[[i]] <- filter(taxi_noOutlier, DriveNo == [[i]])
}
and then use my original code on each one? Or is there a more efficient way? Thanks

You can group_by each DriveNo, get the difference between consecutive Date.and.Time values, keep only the differences of at most a minute (60 seconds), and sum them.
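library(dplyr)

result <- taxi_noOutlier %>%
  group_by(DriveNo) %>%
  # seconds elapsed since the same driver's previous point
  mutate(difftime = difftime(Date.and.Time, lag(Date.and.Time), units = 'secs')) %>%
  # keep gaps of at most 60 seconds (this also drops the NA on each driver's first row)
  filter(difftime <= 60) %>%
  summarise(diffsum = sum(as.numeric(difftime), na.rm = TRUE))
result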
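The reprex in the question has only one point per driver, so it contains no consecutive pairs to difference. A small sketch on made-up data (the toy data frame and its values are invented for illustration) shows the same grouping logic in base R: sort by driver and time, take the consecutive gaps within each driver, keep gaps of at most 60 seconds, and sum them.

```r
start <- as.POSIXct("2014-01-31 23:00:00", tz = "UTC")

# Invented data: two drivers, timestamps given as seconds after `start`
toy <- data.frame(
  DriveNo = c(1, 1, 1, 1, 2, 2, 2),
  Date.and.Time = start + c(0, 10, 30, 200, 5, 50, 55)
)

# Make sure points are in time order within each driver
toy <- toy[order(toy$DriveNo, toy$Date.and.Time), ]

# For each driver: consecutive gaps in seconds, keeping only gaps <= 60
activity <- tapply(
  as.numeric(toy$Date.and.Time), toy$DriveNo,
  function(secs) {
    gaps <- diff(secs)
    sum(gaps[gaps <= 60])
  }
)
activity
```

Driver 1 has gaps of 10, 20 and 170 seconds, so only the first two count (30 seconds in total); driver 2 has gaps of 45 and 5 seconds (50 in total).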

Is this what you need?
The separate dataframes are stored in the list taxi.list.
library(dplyr)

taxi.list <- list()
# loop over each driver once, not over every row
for (i in unique(taxi_noOutlier$DriveNo)) {
  name <- paste0("taxi_", i)
  taxi.list[[name]] <- filter(taxi_noOutlier, DriveNo == i)
  # same as:
  # taxi.list[[name]] <- taxi_noOutlier %>% filter(DriveNo == i)
}
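Base R's split() builds the same kind of named list in one call, without a loop. A minimal sketch, using an invented stand-in for taxi_noOutlier (the taxi data frame and its values below are made up):

```r
# Invented stand-in for taxi_noOutlier
taxi <- data.frame(
  DriveNo = c(156, 187, 156, 89, 187, 156),
  Latitude = c(41.88, 41.93, 41.89, 41.79, 41.90, 41.85)
)

# One data frame per driver, named by driver number
taxi.list <- split(taxi, taxi$DriveNo)
names(taxi.list) <- paste0("taxi_", names(taxi.list))

names(taxi.list)
nrow(taxi.list[["taxi_156"]])
```

The list elements can then be processed with lapply() or sapply() instead of being handled as separately named data frames.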

How to generate specific periods during several days in R

I want to generate the same period on several days, e.g. from 09:30:00 to 16:00:00 every day, and I know that
dates<- seq(as.POSIXct("2000-01-01 9:00",tz='UTC'), as.POSIXct("2000-04-9 16:00",tz='UTC'), by=300)
can help me obtain the time series observed every 5 minutes during 24 hours in 100 days. But what I want is the 09:30:00 to 16:00:00 over 100 days.
Thanks in advance

Here is one way. We can create a sequence with one date per day, then build a list holding the five-minute sequence for each day, and finally combine the list into a single vector. final_seq is the final output.
date_seq <- seq(as.Date("2000-01-01"), as.Date("2000-04-09"), by = 1)
hour_seq <- lapply(date_seq, function(x){
  temp_date <- as.character(x)
  temp_seq <- seq(as.POSIXct(paste(temp_date, "09:30"), tz = "UTC"),
                  as.POSIXct(paste(temp_date, "16:00"), tz = "UTC"),
                  by = 300)
})
final_seq <- do.call("c", hour_seq)
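A vectorised variant of the same idea, as a sketch rather than a replacement: because UTC has no daylight-saving shifts, every day is exactly 86400 seconds long, so the full series is the start time plus every combination of a day offset and a within-day offset. The names day_offsets and minute_offsets are made up for this example.

```r
start <- as.POSIXct("2000-01-01 09:30", tz = "UTC")

day_offsets    <- (0:99) * 86400              # 100 days, in seconds
minute_offsets <- seq(0, 390 * 60, by = 300)  # 09:30 to 16:00 in 5-minute steps

# Repeat each day offset once per time of day, and cycle the times within each day
final_seq <- start +
  rep(day_offsets, each = length(minute_offsets)) +
  rep(minute_offsets, times = length(day_offsets))

length(final_seq)
range(final_seq)
```

This only holds for time zones without clock changes; for a zone with daylight saving time, building each day separately as in the answer above is the safer route.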

An option using tidyr::crossing() (which I love) and the lubridate package:
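library(tidyr)
library(dplyr)
library(lubridate)
crossing(c1 = paste(dmy("01/01/2000") + 0:99, "09:30"),  # 100 days, starting on 2000-01-01
         c2 = seq(0, 390, 5)) %>%                           # minutes after 09:30, up to 16:00
  mutate(time_series = ymd_hm(c1) + minutes(c2)) %>%
  pull(time_series)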

How to format dates in data.frame and ggplot? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I have the following data frame:
df <- data.frame(A=c("2019-01", "2019-02", "2019-03", "2019-04", "2019-05"),
B=c(12.5, 24.5, 23.4, 45.0, 12.0))
## > df
## A B
## 1 2019-01 12.5
## 2 2019-02 24.5
## 3 2019-03 23.4
## 4 2019-04 45.0
## 5 2019-05 12.0
Where column A contains dates (YYYY-MM) and column B the observations corresponding to those dates.
I want to plot the graph using ggplot2 and I need the dates to display along the X axis in the format Mon-YY, e.g. Jan-19.

Please consider giving us a reproducible example next time and stating your problem in a much more precise way.
Nonetheless, with the information you provide, you could do something as follows:
First, let's build a vector of complete dates to avoid problems later. I'm assuming your observations were taken on the first day of each month, so your vector A would be something like:
A <- paste0(c("2019-01", "2019-02", "2019-03", "2019-04", "2019-05"), "-01")
If we create the data.frame directly, it would be something like:
df <- data.frame(A=paste0(c("2019-01", "2019-02", "2019-03", "2019-04", "2019-05"), "-01"),
B=c(12.5, 24.5, 23.4, 45.0, 12.0), stringsAsFactors = FALSE)
Or using magrittr:
library(magrittr)
df <- paste0(c("2019-01", "2019-02", "2019-03", "2019-04", "2019-05"), "-01") %>%
data.frame(A=.,B=c(12.5, 24.5, 23.4, 45.0, 12.0), stringsAsFactors = FALSE)
Then we format A as date:
df$A <- as.Date(df$A, format="%Y-%m-%d")
To plot it you should do something like:
library(ggplot2)
ggplot(data = df, aes(x=A, y=B)) +
  geom_line() +
  scale_x_date(date_labels = "%b-%y", date_breaks = "1 month") +
  theme_light() +
  labs(x="time") +
  theme(legend.position = "bottom")
Hope it helps
PS: check out this post regarding date formats
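As a quick check before plotting, the conversion and the label format can be tried in base R alone. This is only a sketch: ym and dates are names made up here, and the month abbreviations produced by %b depend on the session locale.

```r
ym <- c("2019-01", "2019-02", "2019-03", "2019-04", "2019-05")

# Append a day so that each string is a complete year-month-day date
dates <- as.Date(paste0(ym, "-01"), format = "%Y-%m-%d")
class(dates)

# Preview the axis labels: the same strftime-style codes are passed to
# scale_x_date(date_labels = ...)
format(dates, "%b-%y")
```

In an English locale the last line gives labels such as Jan-19, the format asked for in the question.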

Multiple coordinates with dates in the suncalc R package -how to find sunrise/sunset times

I've got thousands of location points (latitude and longitude) with timestamps (YYYY-MM-DD HH:MM:SS), and I need the sunrise and sunset times for each position.
I tried doing this with the R package "suncalc", but the examples given in the vignette accompanying the package are not practical real-world examples and give no obvious solution for my specific need.
First I tried the following code, which works great for just one date and one location:
> getSunlightTimes(date = date("2019-05-12"), lat = 24, lon = 28, keep = c("sunrise", "sunset"), tz = "CET")
date lat lon sunrise sunset
1 2019-05-12 24 28 2019-05-12 05:28:29 2019-05-12 18:42:55
Then I tried to run it with a few more dates and coordinates:
data <- data.frame(date = c("2019-05-12", "2019-05-13", "2019-05-14"),
lat = c(-24, -25, -26),
lon = c(28, 29, 20))
getSunlightTimes(data = data,
keep = c("sunrise", "sunset"), tz = "CET")
I would expect to get a result with the sunrise and sunset times for each of the three locations (e.g. one result for -24, 28 on 2019-05-12, another for -25, 29 on the 2019-05-13 etc), alas instead I get:
Error in getSunlightTimes(data = data, keep = c("sunrise", "sunset"), : date must to be a Date object (class Date)
Anyone?

The date column has to be of class Date rather than character, so wrap the vector in as.Date():
data <- data.frame(date = as.Date(c("2019-05-12", "2019-05-13", "2019-05-14")),
lat = c(-24, -25, -26), lon = c(28, 29, 20))
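If the dates are already in a data frame as "YYYY-MM-DD HH:MM:SS" strings, as described in the question, the column can be converted in place. The sketch below uses invented timestamps and assumes the date column is character; the getSunlightTimes() call is the one from the question and only runs if suncalc is installed.

```r
# Invented example rows; the question's real data are not shown
data <- data.frame(
  date = c("2019-05-12 06:15:00", "2019-05-13 18:40:10", "2019-05-14 12:00:00"),
  lat = c(-24, -25, -26),
  lon = c(28, 29, 20),
  stringsAsFactors = FALSE
)

# as.Date() on these strings keeps the date part and drops the time of day
data$date <- as.Date(data$date)
class(data$date)

# Same call as in the question, guarded so the sketch runs without suncalc
if (requireNamespace("suncalc", quietly = TRUE)) {
  print(suncalc::getSunlightTimes(data = data,
                                  keep = c("sunrise", "sunset"), tz = "CET"))
}
```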

Converting Excel times to POSIXct in r

I'm using an Excel dataset where the time values, MM:SS, come in numeric values that I need to convert to POSIXct in r and then make calculations.
Below is sample data showing what I have and what I need to get:
dfOrig <- data.frame(StandarTime = c(615,735,615 ),
AchievedTime = c(794,423,544 ))
This is what I'm looking for:
dfCleaned <- data.frame(StandarTime = c("2017-08-25 10:15",
"2017-08-25 12:15",
"2017-08-25 10:15" ),
AchievedTime = c("2017-08-25 13:14 PDT",
"2017-08-25 7:03 PDT",
"2017-08-25 9:04 PDT" ))
I'm not sure how to best approach this problem.

Not sure what the values are but in case these are seconds you can use:
> dfOrig$StandarTime <- ISOdate(2017, 8, 25, hour = 0) + dfOrig$StandarTime
> dfOrig$AchievedTime <- ISOdate(2017, 8, 25, hour = 0) + dfOrig$AchievedTime
> dfOrig
StandarTime AchievedTime
1 2017-08-25 00:10:15 2017-08-25 00:13:14
2 2017-08-25 00:12:15 2017-08-25 00:07:03
3 2017-08-25 00:10:15 2017-08-25 00:09:04
ISOdate(2017, 8, 25, hour = 0) sets the start time (midnight), and you can then add a value in seconds to it. You can also specify a time zone with the tz argument: ISOdate() defaults to "GMT", and tz = "" uses the current time zone.
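The desired output in the question suggests the values are minutes since midnight instead (615 = 10 × 60 + 15, i.e. 10:15). Under that assumption, a sketch of the same approach multiplies by 60 before adding; midnight is a name introduced here, and UTC is used only to keep the example reproducible.

```r
dfOrig <- data.frame(StandarTime  = c(615, 735, 615),
                     AchievedTime = c(794, 423, 544))

midnight <- ISOdate(2017, 8, 25, hour = 0, tz = "UTC")

# POSIXct arithmetic is in seconds, so convert minutes to seconds first
dfOrig$StandarTime  <- midnight + dfOrig$StandarTime * 60
dfOrig$AchievedTime <- midnight + dfOrig$AchievedTime * 60

format(dfOrig$StandarTime, "%H:%M")   # "10:15" "12:15" "10:15"
format(dfOrig$AchievedTime, "%H:%M")  # "13:14" "07:03" "09:04"
```

These times of day match the dfCleaned values given in the question; the PDT zone shown there would need to be set through the tz argument.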
