R: converting xts or zoo object to a data frame - r

What is an easy way of coercing time series data to a data frame, in a format where the resulting data is a summary of the original?
This could be some example data, stored in xts or zoo object:
t, V1
"2010-12-03 12:00", 10.0
"2010-11-04 12:00", 10.0
"2010-10-05 12:00", 10.0
"2010-09-06 12:00", 10.0
...and so on, monthly data for many years.
and I would like to transform it to a data frame like:
year, month, V1
2010, 12, a descriptive statistic calculated of that month's data
2010, 11, ...
2010, 10, ...
2010, 9, ...
I'm asking because I want to plot monthly summaries of the data in the same plot. I can do this quite easily for data in the latter format, but haven't found a plotting method for the time series format.
For example, I could have temperature data measured at daily intervals over several years, and I would like to plot the monthly mean temperature curve for each year in the same plot. I couldn't figure out how to do this with xts-formatted data, or whether this even suits the purpose of the xts/zoo format, which always seems to carry the year information along with it.

Please provide a sample of data to work with and I will try to provide a less general answer. Basically you can use apply.monthly to calculate summary statistics on your xts object. Then you can convert the index to yearmon and convert the xts object to a data.frame.
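library(xts)  # also attaches zoo, which provides as.yearmon() and index()
x <- xts(rnorm(50), Sys.Date()+1:50)
mthlySumm <- apply.monthly(x, mean)  # one mean per calendar month
index(mthlySumm) <- as.yearmon(index(mthlySumm))  # replace dates with year-month labels
Data <- as.data.frame(mthlySumm)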
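To get the year, month, V1 layout asked for in the question, the index can be split into separate columns after aggregating. The following is a minimal sketch, assuming a made-up daily series (the dates and values are hypothetical and not from the question):

```r
library(xts)  # also attaches zoo, which provides index() and coredata()

# Hypothetical daily series from 1 Jan 2010 to 31 Dec 2011
dates <- seq(as.Date("2010-01-01"), as.Date("2011-12-31"), by = "day")
x <- xts(seq_along(dates), order.by = dates)

# One mean per calendar month
monthly <- apply.monthly(x, mean)

# Split the index into numeric year and month columns
monthly_df <- data.frame(
  year  = as.integer(format(index(monthly), "%Y")),
  month = as.integer(format(index(monthly), "%m")),
  V1    = as.numeric(coredata(monthly))
)
head(monthly_df)
```

Because year is now an ordinary column, one line per year can be drawn against month with any plotting function that accepts a grouping variable.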

Here's a solution using the tidyquant package, which includes functions as_xts() for coercing data frames to xts objects and as_tibble() for coercing xts objects to tibbles ("tidy" data frames).
Recreating your data:
> data_xts
V1
2010-09-06 10
2010-10-05 10
2010-11-04 10
2010-12-03 10
Use as_tibble() to convert to a tibble. The argument preserve_row_names = TRUE adds a column called "row.names" holding the xts index as character strings. A rename and a mutate then clean up the dates. The output is a tibble with dates and values.
> data_df <- data_xts %>%
as_tibble(preserve_row_names = TRUE) %>%
rename(date = row.names) %>%
mutate(date = as_date(date))
> data_df
# A tibble: 4 × 2
date V1
<date> <dbl>
1 2010-09-06 10
2 2010-10-05 10
3 2010-11-04 10
4 2010-12-03 10
You can go a step further and add other fields such as day, month, and year using the mutate function.
> data_df %>%
mutate(day = day(date),
month = month(date),
year = year(date))
# A tibble: 4 × 5
date V1 day month year
<date> <dbl> <int> <dbl> <dbl>
1 2010-09-06 10 6 9 2010
2 2010-10-05 10 5 10 2010
3 2010-11-04 10 4 11 2010
4 2010-12-03 10 3 12 2010
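Once year and month are columns, the monthly summary from the question is a group_by() and summarise() away. This is a sketch only: it builds the tibble directly instead of converting from xts, and it adds one hypothetical extra September observation so that the mean has something to average.

```r
library(dplyr)
library(lubridate)

# Hypothetical data: the four dates from the question plus one extra row
data_df <- tibble(
  date = as.Date(c("2010-09-06", "2010-09-20", "2010-10-05",
                   "2010-11-04", "2010-12-03")),
  V1   = c(10, 20, 10, 10, 10)
)

# One row per year/month with the mean of V1
monthly <- data_df %>%
  mutate(year = year(date), month = month(date)) %>%
  group_by(year, month) %>%
  summarise(V1 = mean(V1), .groups = "drop")

monthly
```

Any other descriptive statistic (median, sd, max) can be swapped in for mean().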

Related

Sum table values per day

I have a table, shown in the image, where each comment has a publication date with year, month, day and time. I would like to sum the sentiment values by day.
This is how the table is composed:
serie <- data.frame(comments$created_time,sentiment2$positive-sentiment2$negative)
Using dplyr you can do:
library(dplyr)
df %>%
group_by(as.Date(comments.created_time)) %>%
summarize(total = sum(sentiment))
Here is some sample data that will help others to troubleshoot and understand the data:
df <- tibble(comments.created_time = c("2015-01-26 22:43:00",
"2015-01-26 22:44:00",
"2015-01-27 22:43:00",
"2015-01-27 22:44:00",
"2015-01-28 22:43:00",
"2015-01-28 22:44:00"),
sentiment = c(1,3,5,1,9,1))
Using the sample data will yield:
# A tibble: 3 × 2
`as.Date(comments.created_time)` total
<date> <dbl>
1 2015-01-26 4
2 2015-01-27 6
3 2015-01-28 10
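The grouping column above ends up with the awkward name `as.Date(comments.created_time)`. A small variation, sketched here on the same sample data, names it inside group_by(); the column name day is my own choice:

```r
library(dplyr)

df <- tibble(
  comments.created_time = c("2015-01-26 22:43:00", "2015-01-26 22:44:00",
                            "2015-01-27 22:43:00", "2015-01-27 22:44:00",
                            "2015-01-28 22:43:00", "2015-01-28 22:44:00"),
  sentiment = c(1, 3, 5, 1, 9, 1)
)

# Name the grouping expression so the result has a clean column name
daily <- df %>%
  group_by(day = as.Date(comments.created_time)) %>%
  summarise(total = sum(sentiment), .groups = "drop")

daily
```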

Converting daily data to summed/averaged monthly data with a specific layout format

I've looked through similar past questions but have yet to find something specific to what I'm looking for.
I have daily data that I would like to convert to monthly averages/sums. The final product should be a data frame with months in the rows and years in the columns.
I've managed to get the monthly average of my dataset using:
library(xts)
ts <- xts(data$tmax, as.Date(data$date, "%Y-%m-%d"))
ts_m = apply.monthly(ts, mean)
data$Date data$tmax
1 1951-01-01 3.22777778
2 1951-01-02 6.48888889
3 1951-01-03 10.52777778
4 1951-01-04 1.92777778
5 1951-01-05 1.30000000
6 1951-01-06 0.10000000
7 1951-01-07 -6.72777778
8 1951-01-08 -4.48888889
9 1951-01-09 -0.83888889
10 1951-02-01 -9.92777778
11 1951-02-02 -11.60000000
12 1951-02-03 -8.61111111
13 1951-02-04 -1.40000000
... ... ...
The code above gives me an xts with the averages:
Y-M-D Tmax_avg
1951-01-09 1.279630
1951-02-12 -3.548611
But I can't figure out how to convert the layout of the xts (or whether I have to convert the xts at all) so that it looks like this (months running down, and years running across):
1951 1952 1953
01 1.27 ...
02 -3.54 ...
...
12 ... ...
Thanks in advance!
We can extract the 'Year' and the 'Month' from the index and then use xtabs
Year <- format(as.Date(index(ts_m)), '%Y')
Month <- format(as.Date(index(ts_m)), '%m')
df1 <- data.frame(Year, Month, tmax = ts_m[,1])
xtabs(tmax ~ Month + Year, df1)
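As an alternative sketch, the same months-by-years layout can be built straight from the daily data with base R's tapply(), skipping the xts step. The data frame below is simulated and only mimics the column names in the question. One difference worth knowing: xtabs() fills empty month/year cells with 0, while tapply() leaves them as NA.

```r
# Hypothetical daily data with the same column names as in the question
set.seed(42)
data <- data.frame(date = seq(as.Date("1951-01-01"), as.Date("1953-12-31"),
                              by = "day"))
data$tmax <- rnorm(nrow(data), mean = 10, sd = 8)

# Month and year labels taken from the dates
month <- format(data$date, "%m")
year  <- format(data$date, "%Y")

# Rows are months, columns are years, each cell is the mean of that month
monthly_wide <- tapply(data$tmax, list(Month = month, Year = year), mean)
round(monthly_wide, 2)
```

Wrapping the result in as.data.frame.matrix() turns the matrix into a data frame if one is needed.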

R Calculate change in Weekly values Year on Year (with additional complication)

I have a data set of daily values. It spans from Dec-1 2018 to April-1 2020.
The columns are "date" and "value". As shown here:
date <- c("2018-12-01","2018-12-02", "2018-12-03",
...
"2020-03-30","2020-03-31","2020-04-01")
value <- c(1592,1825,1769,1909,2022, .... 2287,2169,2366,2001,2087,2099,2258)
df <- data.frame(date,value)
What I would like to do is sum the values by week and then calculate the week over week change from the previous year to the current one.
I know that I can sum by week using the following function:
Data_week <- df %>% group_by(week = cut(as.Date(date), "week")) %>% summarise(summed = sum(value))
My questions are twofold:
1) How do I sum by week and then manipulate the dataframe so that I can calculate week over week change (e.g. week of Dec. 1 2019 / week of Dec. 1 2018)?
2) How can I do the above using a "customized" week? Let's say I want to define a week by moving back 7 days at a time from the latest date I have data for. E.g. the latest week would be the one starting on March 26th (the 7 days ending April 1st).
We can use lag from dplyr to help and also some convenience functions from lubridate.
library(dplyr)
library(lubridate)
df %>%
mutate(year = year(date)) %>%
group_by(week = week(date),year) %>%
summarize(summed = sum(value)) %>%
arrange(year, week) %>%
ungroup %>%
mutate(change = summed - lag(summed))
# week year summed change
# <dbl> <dbl> <dbl> <dbl>
# 1 48 2018 3638. NA
# 2 49 2018 15316. 11678.
# 3 50 2018 13283. -2033.
# 4 51 2018 15166. 1883.
# 5 52 2018 12885. -2281.
# 6 53 2018 1982. -10903.
# 7 1 2019 14177. 12195.
# 8 2 2019 14969. 791.
# 9 3 2019 14554. -415.
#10 4 2019 12850. -1704.
#11 5 2019 1907. -10943.
If you would like to define "weeks" in different ways, there is also isoweek and epiweek. See this answer for a great explanation of your options.
Data
set.seed(1)
df <- data.frame(date = seq.Date(from = as.Date("2018-12-01"), to = as.Date("2019-01-29"), "days"), value = runif(60,1500,2500))
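For the "customized" week in the second part of the question, one possible approach is to count whole weeks backwards from the latest date. The sketch below uses simulated data over the date range given in the question. Treating a lag of 52 custom weeks (364 days) as "the same week last year" is my own assumption, not something the question specifies.

```r
library(dplyr)

# Hypothetical daily data over the range described in the question
set.seed(1)
df <- data.frame(date = seq(as.Date("2018-12-01"), as.Date("2020-04-01"),
                            by = "day"))
df$value <- runif(nrow(df), 1500, 2500)

last_day <- max(df$date)

weekly <- df %>%
  # weeks_back = 0 is the 7 days ending on last_day, 1 the 7 days before, ...
  mutate(weeks_back = as.integer(last_day - date) %/% 7) %>%
  group_by(weeks_back) %>%
  summarise(week_start = min(date), n_days = n(),
            summed = sum(value), .groups = "drop")

# Ratio of each week to the week 52 custom weeks earlier (assumed "last year")
yoy <- weekly %>%
  arrange(week_start) %>%
  mutate(ratio = summed / lag(summed, 52))

tail(yoy)
```

The oldest group may cover fewer than 7 days; the n_days column makes it easy to drop such partial weeks.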

How to calculate aggregate statistics on a dataframe in R by applying conditions on time values?

I am working on climate data analysis. After loading file in R, my interest is to subset data based upon hours in a day.
For time analysis we can use $hour on the variable in which the time vector is stored, if our interest is in the hours.
I want to subset my data for each hour of the day across 365 days and then take the average of the data at a particular hour throughout the year. Say I am interested in the values of irradiation, wind speed etc. at 12:00 PM over a year, and then take the mean of these values to get the desired result.
I know how to subset a data frame based on conditions. If, for example, my data is in a matrix called data that contains 2 columns, say time and wind speed, and I'm interested in subsetting the rows in which irradiation isn't zero, we can do this using the following code
my_data <- subset(data, data[,1]>0)
but now in order to deal with hours values in time column which is a variable stored in data, how can I subset values?
My data look like this:
I hope I made sense in this question.
Thanks in advance!
Here is a possible solution. You can create an hourly grouping with format(df$time,'%H'), which keeps only the hour of each timestamp. We can then simply group by this new column and calculate the mean for each group.
df = data.frame(time=seq(Sys.time(),Sys.time()+2*60*60*24,by='hour'),val=sample(seq(5),49,replace=T))
library(dplyr)
df %>% mutate(hour=format(time,'%H')) %>%
group_by(hour) %>%
summarize(mean_val = mean(val))
To subset the non-zero values first, you can do either:
df = subset(df,val!=0)
or start the dplyr chain with:
df %>% filter(val!=0)
Hope this helps!
df looks as follows:
time val
1 2018-01-31 12:43:33 4
2 2018-01-31 13:43:33 2
3 2018-01-31 14:43:33 2
4 2018-01-31 15:43:33 3
5 2018-01-31 16:43:33 3
6 2018-01-31 17:43:33 1
7 2018-01-31 18:43:33 2
8 2018-01-31 19:43:33 4
... ... ... ...
And the output:
# A tibble: 24 x 2
hour mean_val
<chr> <dbl>
1 00 3.50
2 01 3.50
3 02 4.00
4 03 2.50
5 04 3.00
6 05 2.00
.... ....
This assumes your time column is already of class POSIXct, otherwise you'd first have to convert it using for example as.POSIXct(x,format='%Y-%m-%d %H:%M:%S')
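To tie the pieces together for the 12:00 case from the question, here is a minimal sketch that starts from character timestamps, converts them to POSIXct, and averages only the noon values. The data are simulated, the time zone is fixed to UTC as an assumption, and val is a placeholder equal to the hour so the result is easy to check.

```r
library(dplyr)

# Hypothetical hourly timestamps for 2018, stored as character strings
stamps <- seq(as.POSIXct("2018-01-01 00:00:00", tz = "UTC"),
              as.POSIXct("2018-12-31 23:00:00", tz = "UTC"),
              by = "hour")
df <- data.frame(time_chr = format(stamps, "%Y-%m-%d %H:%M:%S"),
                 stringsAsFactors = FALSE)

# Convert the character column to POSIXct before extracting hours
df$time <- as.POSIXct(df$time_chr, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
df$val  <- as.integer(format(df$time, "%H"))  # placeholder measurement

# Keep only the 12:00 rows and average them over the year
noon_mean <- df %>%
  filter(format(time, "%H") == "12") %>%
  summarise(n = n(), mean_val = mean(val))

noon_mean
```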

reshaping daily time series data

I have daily time series data starting in 1980 and ending in 2013, in the following format: https://www.dropbox.com/s/i6qu6epxzdksvg7/a.xlsx?dl=0. My code thus far is
# trying to reshape my data
require(reshape)
data <- melt(data1, id.vars=c("year","month"))
However, this did not give me my desired output. I would like to have my data in 4 columns (year, month, day and data) or in 2 columns (date and data) as a time series (starting on 1st Jan 1980 and ending on 31st Dec 2013).
I would be grateful for some guidance on how to get this done.
With kind regards
Extending Jason's / Dominic's solution this gives you an example of how to plot your data as a xts time series as you asked for:
library(xts)
dat<-read.csv('~/Downloads/stack_a.csv')
dat.m <-reshape(dat,direction='long',idvar=c('year','month'),varying=list(3:33),v.names='value')
dat.m <- dat.m[order(dat.m[,1],dat.m[,2],dat.m[,3]),] # order by year, month, day(time)
dat.m$date <-paste0(dat.m$year,'-',dat.m$month,'-',dat.m$time) # concatenate these 3 columns
dat.m <- na.omit(dat.m) # remove the NAs introduced in the original data
dat.xts <- as.xts(dat.m$value,order.by = as.Date(dat.m$date))
names(dat.xts) <- 'value'
plot(dat.xts)
I used the data you uploaded, so reading and reshaping it looked as follows for me:
dat<-read.csv('a.csv')
library(reshape)
newDF<-reshape(dat,direction='long',idvar=c('year','month'),varying=list(3:33),v.names='X')
newDF<-as.ts(newDF)
Is that what you wanted?
Same results as Jason's, but using tidyr::gather instead of reshape
library(tidyr)
new.df <- gather(dat, key = "variable", value = "value", -year, -month, na.rm = FALSE, convert = TRUE)
new.df$variable <- as.numeric(sub("X", "", new.df$variable))  # "X1" -> 1
names(new.df)[3] <- "day"
new.df.ts <- as.ts(new.df)
head(new.df.ts)
year month day value
[1,] 1980 1 1 2.3
[2,] 1980 2 1 1.0
[3,] 1980 3 1 0.0
[4,] 1980 4 1 1.8
[5,] 1980 5 1 3.8
[6,] 1980 6 1 10.4
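For the two-column (date and data) variant, a possible sketch with tidyr::pivot_longer() follows. The wide data frame is fabricated to mimic the described layout (one row per year/month, day columns named X1 to X31, as the sub("X", ...) step above implies). Impossible dates such as 30 February parse to NA and are dropped.

```r
library(tidyr)

# Hypothetical wide data: one row per year/month, one column per day
dat <- expand.grid(month = 1:12, year = 1980:1981)[, c("year", "month")]
for (d in 1:31) dat[[paste0("X", d)]] <- d  # placeholder values

# Stack the day columns into long format, stripping the "X" prefix
long <- pivot_longer(dat, cols = starts_with("X"),
                     names_to = "day", names_prefix = "X",
                     values_to = "value")
long$day <- as.integer(long$day)

# Build real dates; invalid day/month combinations become NA
long$date <- as.Date(sprintf("%d-%02d-%02d", long$year, long$month, long$day),
                     format = "%Y-%m-%d")
long <- long[!is.na(long$date), ]
long <- long[order(long$date), c("date", "value")]

head(long)
```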
