I am new to R and couldn't find a good solution to my problem. I have a time series in the following format:
>dates temperature depth salinity
>12/03/2012 11:26 9.7533 0.48073 37.607
>12/03/2012 11:56 9.6673 0.33281 37.662
>12/03/2012 12:26 9.6673 0.33281 37.672
My measurements were taken at an irregular frequency, every 15 or every 30 minutes depending on the period. I would like to calculate annual, monthly and daily averages for each of my variables, regardless of how many observations fall in a given day, month or year. I have read a lot about the packages zoo, timeseries, xts, etc., but I can't get a clear picture of what I need (maybe because I'm not skilled enough with R...).
I hope my post is clear, don't hesitate to tell me if it's not.
Convert your data to an xts object, then use apply.daily et al to calculate whatever values you want.
library(xts)
d <- structure(list(dates = c("12/03/2012 11:26", "12/03/2012 11:56",
"12/03/2012 12:26"), temperature = c(9.7533, 9.6673, 9.6673),
depth = c(0.48073, 0.33281, 0.33281), salinity = c(37.607,
37.662, 37.672)), .Names = c("dates", "temperature", "depth",
"salinity"), row.names = c(NA, -3L), class = "data.frame")
x <- xts(d[,-1], as.POSIXct(d[,1], format="%m/%d/%Y %H:%M"))
apply.daily(x, colMeans)
# temperature depth salinity
# 2012-12-03 12:26:00 9.695967 0.3821167 37.647
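The question also asks for monthly and annual averages. A minimal sketch of how the same pattern might extend, using xts's apply.monthly and apply.yearly; the timestamps and values below are made up for illustration and are not the asker's data:

```r
library(xts)

# Toy data spanning two months and two years (made-up values)
stamps <- as.POSIXct(c("2012-12-03 11:26", "2012-12-03 11:56",
                       "2012-12-04 12:26", "2013-01-05 09:00"),
                     format = "%Y-%m-%d %H:%M", tz = "UTC")
vals <- cbind(temperature = c(10, 12, 14, 20),
              salinity    = c(37, 37, 38, 36))
x2 <- xts(vals, order.by = stamps)

# One row per day, month and year; each row is stamped with the
# last observation time inside its period
daily   <- apply.daily(x2, colMeans)
monthly <- apply.monthly(x2, colMeans)
yearly  <- apply.yearly(x2, colMeans)
```

Because colMeans works on whatever rows fall inside each period, the irregular 15/30 minute spacing should not matter here.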
I'd add the day, month and year into the data frame and then use aggregate().
First convert your date column into a POSIXct object:
d$timestamp <- as.POSIXct(d$dates,format = "%m/%d/%Y %H:%M",tz ="GMT")
Then put the calendar date (e.g. 2012-12-03) into a column called Date; a four-digit year (%Y) keeps the result unambiguous and sortable:
d$Date <- format(d$timestamp,"%Y-%m-%d",tz = "GMT")
Next, aggregate by the date:
aggregate(cbind("temperature.mean" = temperature,
"salinity.mean" = salinity) ~ Date,
data = d,
FUN = mean)
Similarly, you can get the month into a column (let's call it M for month), and then...
d$M <- format(d$timestamp,"%B",tz = "GMT")
aggregate(cbind("temperature.mean" = temperature,
"salinity.mean" = salinity) ~ M,
data = d,
FUN = mean)
or if you want year-month
d$YM <- format(d$timestamp,"%y-%B",tz = "GMT")
aggregate(cbind("temperature.mean" = temperature,
"salinity.mean" = salinity) ~ YM,
data = d,
FUN = mean)
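If you have any NA values in your data, you may need to account for those. Note that the formula method of aggregate() drops every row containing an NA by default (na.action = na.omit); add na.action = na.pass if you want na.rm = TRUE to handle the missing values column by column: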
aggregate(cbind("temperature.mean" = temperature,
"salinity.mean" = salinity) ~ YM,
data = d,
function(x) mean(x,na.rm = TRUE))
Finally, if you want to average by week, you can do that as well. First generate the week number, and then use aggregate() again.
d$W <- format(d$timestamp,"%W",tz = "GMT")
aggregate(cbind("temperature.mean" = temperature,
"salinity.mean" = salinity) ~ W,
data = d,
function(x) mean(x,na.rm = TRUE))
This version of week number defines week 1 as being the week with the first Monday of the year. The weeks are from Monday to Sunday.
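The daily, monthly and yearly variants above differ only in the format string, so they could be wrapped in one function. A rough sketch, assuming the question's column names; mean_by_period is a hypothetical helper name and the data values are made up:

```r
# Toy data in the question's layout (made-up values; one salinity is missing)
d2 <- data.frame(
  dates = c("12/03/2012 11:26", "12/03/2012 11:56",
            "12/04/2012 12:26", "01/05/2013 09:00"),
  temperature = c(10, 12, 14, 20),
  salinity    = c(37, NA, 38, 36)
)
d2$timestamp <- as.POSIXct(d2$dates, format = "%m/%d/%Y %H:%M", tz = "GMT")

# Hypothetical helper: average the measurement columns per period,
# where the period is defined by a strftime-style format string
mean_by_period <- function(data, fmt) {
  data$period <- format(data$timestamp, fmt, tz = "GMT")
  aggregate(cbind(temperature, salinity) ~ period,
            data = data,
            FUN = function(x) mean(x, na.rm = TRUE),
            na.action = na.pass)  # keep rows with NA so na.rm does the work
}

daily   <- mean_by_period(d2, "%Y-%m-%d")
monthly <- mean_by_period(d2, "%Y-%m")
yearly  <- mean_by_period(d2, "%Y")
```

With na.action = na.pass, the row with the missing salinity still contributes its temperature to the average instead of being dropped entirely.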
Yet another method, using plyr:
df <- structure(list(dates = c("12/03/2012 11:26", "12/03/2012 11:56",
"12/03/2012 12:26"), temperature = c(9.7533, 9.6673, 9.6673),
depth = c(0.48073, 0.33281, 0.33281), salinity = c(37.607,
37.662, 37.672)), .Names = c("dates", "temperature", "depth",
"salinity"), row.names = c(NA, -3L), class = "data.frame")
library(plyr)
# Change date to POSIXct
df$dates <- with(df,as.POSIXct(dates,format="%m/%d/%Y %H:%M"))
# Make new variables, year and month
df <- transform(df,month=as.numeric(format(dates,"%m")),year=as.numeric(format(dates,"%Y")))
## According to year
ddply(df,.(year),summarize,meantemp=mean(temperature),meandepth=mean(depth),meansalinity=mean(salinity))
year meantemp meandepth meansalinity
1 2012 9.695967 0.3821167 37.647
## According to month
ddply(df,.(month),summarize,meantemp=mean(temperature),meandepth=mean(depth),meansalinity=mean(salinity))
month meantemp meandepth meansalinity
1 12 9.695967 0.3821167 37.647
The package hydroTSM provides multiple functions to create annual and other summaries:
daily2annual(x, ...)
subdaily2annual(x, ...)
monthly2annual(x, ...)
annualfunction(x, FUN, na.rm = TRUE, ...)
Related
I am attempting to limit my dataframe to the days of each month between the 20th and the 25th. I have a big dataset with many dates ranging over many years. It looks something like this:
Event Date
Football 20.12.2016
Work 15.10.2019
Holiday 30.11.2018
Running 24.01.2020
I would then like to restrict my results to:
Event Date
Football 20.12.2016
Running 24.01.2020
Any tips on how to do this?
This is a solution using dplyr/lubridate although I have converted your Date column using as.Date
library(dplyr)      # filter(), %>%
library(lubridate)  # day()
df <-
data.frame(
Event = c("Football", "Work", "Holiday", "Running"),
Date = c("20.12.2016", "15.10.2019", "30.11.2018", "24.01.2020")
)
df$Date <- as.Date(df$Date, format = "%d.%m.%Y")
df %>% filter(day(Date) >= 20 & day(Date) <= 25)
Output
Event Date
1 Football 2016-12-20
2 Running 2020-01-24
Doing a literal string-match, keeping your Date column as strings (not real dates):
# base R
subset(quux, between(as.integer(sub("\\..*", "", Date)), 20, 25))
# Event Date
# 1 Football 20.12.2016
# 4 Running 24.01.2020
# dplyr
quux %>%
filter(between(as.integer(sub("\\..*", "", Date)), 20, 25))
between can be from dplyr::, data.table::, or we can easily craft our own with:
between <- function(x, y, z) y <= x & x <= z
... though the two package versions are more robust to NAs and other issues.
Data
quux <- structure(list(Event = c("Football", "Work", "Holiday", "Running"), Date = c("20.12.2016", "15.10.2019", "30.11.2018", "24.01.2020")), class = "data.frame", row.names = c(NA, -4L))
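For completeness, the same filter can be written without any package. A small sketch in base R, assuming the Date strings are always in dd.mm.yyyy form as in the question; the object names events, parsed, dom and kept are illustrative only:

```r
events <- data.frame(
  Event = c("Football", "Work", "Holiday", "Running"),
  Date  = c("20.12.2016", "15.10.2019", "30.11.2018", "24.01.2020")
)

# Parse the strings, then pull out the day of the month as an integer
parsed <- as.Date(events$Date, format = "%d.%m.%Y")
dom <- as.integer(format(parsed, "%d"))

# Keep rows whose day of month lies between 20 and 25 (inclusive)
kept <- events[dom >= 20 & dom <= 25, ]
```

Parsing to a real Date first means a malformed string becomes NA rather than silently matching.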
I've got a GPS dataset with about 5600 rows of coordinates from 5 GPS devices ('nodes') over several days and I want to reduce the number of GPS points to just one point per hour. Because the number of points per hour fluctuates, a simple for-loop is not possible.
A simplified structure of the table would be this:
ID node easting northing year month day hour minute time
The column 'time' is class "POSIXlt" "POSIXt".
Trying my first approach, a multiple nested for-loop, I learned about the Second circle of Inferno.
Does anyone have an idea how to reduce multiple rows (per hour) to one (per hour), separately for each device, in R?
Assuming that the year, month, day, and hour columns contain information related to the time column, the solution could be as follows:
# Generate data
md <- data.frame(
node = rep(1:5, each = 4)
, easting = sample(1:10, size = 20, replace = TRUE)
, northing = sample(1:10, size = 20, replace = TRUE)
, year = 2017
, month = "June "
, day = 6
, hour = rep(1:2, each = 2, times = 5)
, minute = NA
, time = NA
)
# Solution
library(dplyr)
md %>%
group_by(node, year, month, day, hour) %>%
summarize(
easting = mean(easting),
northing = mean(northing)
)
You can create a new column "Unix_hour": the UNIX timestamp divided by 3600 and rounded down.
So, you will have a unique id for each hour.
To do this, use as.numeric to convert a POSIXct date into a Unix timestamp (in seconds):
floor(as.numeric(POSIXct_variable) / 3600)
It will return the number of whole hours since 1970-01-01 00:00 UTC.
Then, you just group by this new column "Unix_hour":
aggregate(. ~ Unix_hour, df, mean)
(Change the aggregation function "mean" if you want to aggregate other variables in another way.)
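Since the question asks for one point per hour for each device, the hour id has to be combined with the node column. A minimal sketch of that idea in base R; the data are made up, only node, easting and time follow the question's column names, and a POSIXlt time column would first be converted with as.POSIXct:

```r
# Made-up fixes from two devices
gps <- data.frame(
  node    = c(1, 1, 1, 2, 2),
  easting = c(100, 101, 102, 200, 201),
  time    = as.POSIXct(c("2017-06-06 01:05:00", "2017-06-06 01:40:00",
                         "2017-06-06 02:10:00", "2017-06-06 01:15:00",
                         "2017-06-06 01:50:00"), tz = "UTC")
)

# Whole hours since the Unix epoch: identical for every fix in the same hour
gps$unix_hour <- as.numeric(gps$time) %/% 3600

# Option 1: keep the first fix per device and hour
gps <- gps[order(gps$node, gps$time), ]
first_fix <- gps[!duplicated(gps[c("node", "unix_hour")]), ]

# Option 2: average the coordinates per device and hour
hourly_mean <- aggregate(easting ~ node + unix_hour, data = gps, FUN = mean)
```

Option 1 keeps a real recorded position, while option 2 produces a position that may never have been measured; which one is appropriate depends on the analysis.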
You could convert your multiple date-time columns into one, e.g.:
df$DateTimeUTCmin5 <- ISOdate(year = df$Year,
                              month = df$Month,
                              day = df$Day,
                              hour = df$Hour,
                              min = df$Min,
                              sec = df$Sec,
                              tz = "America/New_York")
Add an hour floor using floor_date from lubridate:
df$HourFloor = floor_date(df$DateTimeUTCmin5, unit = "hour")
Then decide how you want to summarise the data within each hour (mean, first, max?):
Hourstats <- df %>% group_by(HourFloor) %>%
  summarise(meanEast = mean(easting, na.rm = TRUE),
            firstNorth = first(northing)) %>%
  ungroup()
I have a data frame that consists of a date column and the temperatures of 34 different systems, each system in a separate column. I need to calculate the average hourly temperature for every system. I use the code below to calculate the average for one system, but to get the averages for the other 33 systems I would have to repeat the code again and again. Is there a better way to find the hourly average of all columns at once?
dat$ut_ms <- dat$ut_ms/1000
dat[ ,1]<- as.POSIXct(dat[,1], origin="1970-01-01")
dat$ut_ms <- strptime(dat$ut_ms, "%Y-%m-%d %H:%M")
dat$ut_ms <- cut(dat$ut_ms, breaks = 'hour')
meanNPWD2401<- aggregate(NPWD2401 ~ ut_ms, dat, mean)
I added a picture of the data to better illustrate what I want.
You can split your data per hour and iterate:
list1 <- split(dat, cut(strptime(dat$ut_ms, format = '%Y-%m-%d %H:%M'), 'hour'))
# average the numeric columns only (the first column holds the timestamp)
lapply(list1, function(x) colMeans(x[, -1]))
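The original aggregate() call can also cover every column at once by putting a dot on the left-hand side of the formula. A small sketch under assumed names: the data are random, three systems stand in for the 34, and sysA, sysB and sysC are invented column names:

```r
set.seed(1)  # made-up data: 3 systems, readings every 10 minutes for 2 hours
readings <- data.frame(
  ut_ms = as.POSIXct("2016-01-01 00:00:00", tz = "UTC") +
          seq(0, by = 600, length.out = 12),
  sysA  = rnorm(12),
  sysB  = rnorm(12),
  sysC  = rnorm(12)
)

# Bucket timestamps by hour, then average every remaining column per bucket
readings$hour <- cut(readings$ut_ms, breaks = "hour")
hourly <- aggregate(. ~ hour,
                    data = readings[, names(readings) != "ut_ms"],
                    FUN = mean)
```

The raw timestamp column is dropped before aggregating so that the dot expands to the system columns only.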
When you rearrange the data into a long format, things get much easier:
n.system <- 34
n.time <- 100
temp <- rnorm(n.time * n.system)
temp <- matrix(temp, ncol = n.system)
seconds <- runif(n.time, max = 3 * 3600)
time <- as.POSIXct(seconds, origin = "1970-01-01")
dataset <- data.frame(time, temp)
library(dplyr)
library(tidyr)
dataset %>%
gather(key = "system", value = "temperature", -time) %>%
mutate(hour = cut(time, "hour")) %>%
group_by(system, hour) %>%
summarise(average = mean(temperature))
I am using the NAPM ISM data set from the FRED database. The data has a monthly frequency. I would like to create another data frame with daily frequency, where the value for each business day is the latest monthly release. So if the last release was 49.5 on 02/01/16, then every day in February has a value of 49.5.
Code Sample
library(quantmod)  # provides getSymbols()
start_date <- as.Date("1970-01-01")
end_date <- Sys.Date()
US_PMI <- getSymbols("NAPM", auto.assign = FALSE, src ="FRED", from = start_date, to = end_date)
test <- data.frame(date=index(US_PMI), coredata(US_PMI))
I do not know which packages and data I need to reproduce your example but you can use a sequence of daily dates, the merge function and the NA filler in the zoo package to create the daily data frame:
library(zoo)
# Date range
sd = as.Date("1970-01-01")
ed = Sys.Date()
# Create daily and monthly data frame
daily.df = data.frame(date = seq(sd, ed, "days"))
monthly.df = data.frame(date = seq(sd, ed, "months"))
# Add some variable to the monthly data frame
monthly.df$v = rnorm(nrow(monthly.df))
# Merge
df = merge(daily.df, monthly.df, by = "date", all = TRUE)
# Fill up NA's
df = transform(df, v = na.locf(v))
This might not be the fastest way to obtain the data frame but it should work.
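The question asks for business days specifically. One possible variation in base R carries the last release forward with findInterval() and keeps Monday to Friday only. In this sketch only the 49.5 release on 2016-02-01 comes from the question; the other values are invented, and public holidays are not handled:

```r
# Monthly releases (only the February value is taken from the question)
monthly <- data.frame(
  date  = as.Date(c("2016-01-01", "2016-02-01", "2016-03-01")),
  value = c(48.2, 49.5, 51.8)
)

# Every calendar day in the range, then keep Monday to Friday only
days <- seq(as.Date("2016-01-01"), as.Date("2016-03-10"), by = "day")
days <- days[as.integer(format(days, "%u")) <= 5]  # %u: 1 = Monday ... 7 = Sunday

# For each day, findInterval() returns the index of the latest
# release dated on or before that day
idx <- findInterval(as.numeric(days), as.numeric(monthly$date))
daily <- data.frame(date = days, value = monthly$value[idx])
```

This assumes the release dates are sorted in increasing order and that the first day is not earlier than the first release.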
I don't often have to work with dates in R, but I imagine this is fairly easy. I have a column that represents a date in a dataframe. I simply want to create a new dataframe that summarizes a 2nd column by Month/Year using the date. What is the best approach?
I want a second dataframe so I can feed it to a plot.
Any help you can provide will be greatly appreciated!
EDIT: For reference:
> str(temp)
'data.frame': 215746 obs. of 2 variables:
$ date : POSIXct, format: "2011-02-01" "2011-02-01" "2011-02-01" ...
$ amount: num 1.67 83.55 24.4 21.99 98.88 ...
> head(temp)
date amount
1 2011-02-01 1.670
2 2011-02-01 83.550
3 2011-02-01 24.400
4 2011-02-01 21.990
5 2011-02-03 98.882
6 2011-02-03 24.900
I'd do it with lubridate and plyr, rounding dates down to the nearest month to make them easier to plot:
library(lubridate)
df <- data.frame(
date = today() + days(1:300),
x = runif(300)
)
df$my <- floor_date(df$date, "month")
library(plyr)
ddply(df, "my", summarise, x = mean(x))
There is probably a more elegant solution, but splitting into months and years with strftime() and then aggregate()ing should do it. Then reassemble the date for plotting.
x <- as.POSIXct(c("2011-02-01", "2011-02-01", "2011-02-01"))
mo <- strftime(x, "%m")
yr <- strftime(x, "%Y")
amt <- runif(3)
dd <- data.frame(mo, yr, amt)
dd.agg <- aggregate(amt ~ mo + yr, dd, FUN = sum)
dd.agg$date <- as.POSIXct(paste(dd.agg$yr, dd.agg$mo, "01", sep = "-"))
A bit late to the game, but another option would be using data.table:
library(data.table)
setDT(temp)[, .(mn_amt = mean(amount)), by = .(yr = year(date), mon = month(date))]
# or if you want to apply the 'mean' function to several columns:
# setDT(temp)[, lapply(.SD, mean), by=.(year(date), month(date))]
this gives:
yr mon mn_amt
1: 2011 2 42.610
2: 2011 3 23.195
3: 2011 4 61.891
If you want names instead of numbers for the months, you can use:
setDT(temp)[, date := as.IDate(date)
][, .(mn_amt = mean(amount)), by = .(yr = year(date), mon = months(date))]
this gives:
yr mon mn_amt
1: 2011 februari 42.610
2: 2011 maart 23.195
3: 2011 april 61.891
As you see this will give the month names in your system language (which is Dutch in my case).
Or using a combination of lubridate and dplyr:
library(dplyr)
library(lubridate)
temp %>%
group_by(yr = year(date), mon = month(date)) %>%
summarise(mn_amt = mean(amount))
Used data:
# example data (modified the OP's data a bit)
temp <- structure(list(date = structure(1:6, .Label = c("2011-02-01", "2011-02-02", "2011-03-03", "2011-03-04", "2011-04-05", "2011-04-06"), class = "factor"),
amount = c(1.67, 83.55, 24.4, 21.99, 98.882, 24.9)),
.Names = c("date", "amount"), class = c("data.frame"), row.names = c(NA, -6L))
You can do it as:
short.date = strftime(temp$date, "%Y/%m")
aggr.stat = aggregate(temp$amount ~ short.date, FUN = sum)
Just use xts package for this.
library(xts)
ts <- xts(temp$amount, as.Date(temp$date, "%Y-%m-%d"))
# convert daily data
ts_m = apply.monthly(ts, FUN)
ts_y = apply.yearly(ts, FUN)
ts_q = apply.quarterly(ts, FUN)
where FUN is a function which you aggregate data with (for example sum)
Here's a dplyr option:
library(dplyr)
df %>%
mutate(date = as.Date(date)) %>%
mutate(ym = format(date, '%Y-%m')) %>%
group_by(ym) %>%
summarize(ym_mean = mean(x))
I have a function monyr that I use for this kind of stuff:
monyr <- function(x)
{
x <- as.POSIXlt(x)
x$mday <- 1
as.Date(x)
}
n <- as.Date(1:500, "1970-01-01")
nn <- monyr(n)
You can change the as.Date at the end to as.POSIXct to match the date format in your data. Summarising by month is then just a matter of using aggregate/by/etc.
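As an illustration of that last step, here is a short sketch that feeds monyr() into aggregate(). The function is repeated so the snippet runs on its own, and temp2 with its values is made up to mimic the layout of the question's data:

```r
# monyr() as defined above: snap every date to the first of its month
monyr <- function(x) {
  x <- as.POSIXlt(x)
  x$mday <- 1
  as.Date(x)
}

# Made-up amounts in the layout of the question's data frame
temp2 <- data.frame(
  date   = as.Date(c("2011-02-01", "2011-02-03", "2011-03-15", "2011-03-20")),
  amount = c(1, 3, 10, 20)
)

# Sum the amounts per month
temp2$month <- monyr(temp2$date)
by_month <- aggregate(amount ~ month, data = temp2, FUN = sum)
```

The result has one row per month, so something like plot(by_month$month, by_month$amount, type = "b") should give a monthly series directly.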
One more solution:
rowsum(temp$amount, format(temp$date,"%Y-%m"))
For plot you could use barplot:
barplot(t(rowsum(temp$amount, format(temp$date,"%Y-%m"))), las=2)
Also, given that your time series seem to be in xts format, you can aggregate your daily time series to a monthly time series using the mean function like this:
d2m <- function(x) {
aggregate(x, format(as.Date(zoo::index(x)), "%Y-%m"), FUN=mean)
}