I'm trying to get the standard deviation of a stock price by year, but I'm getting the same value for every year.
I tried with dplyr (group_by, summarise) and also with a function, but had no luck with either; both return the same value of 67.0.
It is probably passing the whole data frame without subsetting it. How can this be fixed?
library(quantmod)
library(tidyr)
library(dplyr)
#initial parameters
initialDate = as.Date('2010-01-01')
finalDate = Sys.Date()
ybeg = format(initialDate,"%Y")
yend = format(finalDate,"%Y")
ticker = "AAPL"
#getting stock prices
stock = getSymbols.yahoo(ticker, from=initialDate, auto.assign = FALSE)
stock = stock[,4] #working only with closing prices
With dplyr:
#Attempt 1 with dplyr - not working, all values by year return the same
stock = stock %>% zoo::fortify.zoo()
stock$Date = stock$Index
separate(stock, Date, c("year","month","day"), sep="-") %>%
group_by(year) %>%
summarise(stdev= sd(stock[,2]))
# A tibble: 11 x 2
# year stdev
# <chr> <dbl>
# 1 2010 67.0
# 2 2011 67.0
#....
#10 2019 67.0
#11 2020 67.0
And with a function:
#Attempt 2 with function - not working - returns only one value instead of multiple
#getting stock prices
stock = getSymbols.yahoo(ticker, from=initialDate, auto.assign = FALSE)
stock = stock[,4] #working only with closing prices
#subsetting
years = as.character(seq(ybeg,yend,by=1))
years
calculate_stdev = function(series,years) {
series[years] #subsetting by years, to be equivalent as stock["2010"], stock["2011"] e.g.
sd(series[years][,1]) #calculate stdev on closing prices of the current subset
}
yearly.stdev = calculate_stdev(stock,years)
> yearly.stdev
[1] 67.04185
Use apply.yearly() (a convenience wrapper around the more general period.apply()) to call a function on yearly subsets of the xts object returned by getSymbols().
You can use the Cl() function to extract the close column from objects returned by getSymbols().
library(quantmod)  # getSymbols() and Cl(); also attaches xts, which provides apply.yearly()
stock = getSymbols("AAPL", from = "2010-01-01", auto.assign = FALSE)
apply.yearly(Cl(stock), sd)
## AAPL.Close
## 2010-12-31 5.365208
## 2011-12-30 3.703407
## 2012-12-31 9.568127
## 2013-12-31 6.412542
## 2014-12-31 13.371293
## 2015-12-31 7.683550
## 2016-12-30 7.640743
## 2017-12-29 14.621191
## 2018-12-31 20.593861
## 2019-12-31 34.538978
## 2020-06-19 29.577157
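To see what apply.yearly() does without downloading any data, here is a minimal base-R sketch of the same idea. It assumes a plain numeric price vector with a parallel, ascending Date vector; the simulated `prices` and `dates` are hypothetical stand-ins for the AAPL closes. It finds the last row of each year (similar in spirit to xts::endpoints()) and then applies sd() to each slice between consecutive endpoints, as period.apply() does.

```r
# Hypothetical stand-in data: one simulated "close" per calendar day, 2010-2012
set.seed(42)
dates  <- seq(as.Date("2010-01-01"), as.Date("2012-12-31"), by = "day")
prices <- 100 + cumsum(rnorm(length(dates)))

# Last row of each year, similar in spirit to xts::endpoints(x, "years");
# assumes the dates are sorted ascending
year   <- as.integer(format(dates, "%Y"))
ends   <- c(which(diff(year) != 0), length(year))
starts <- c(1L, head(ends, -1) + 1L)

# Apply sd() to each [start, end] slice, as period.apply() does
yearly_sd <- mapply(function(s, e) sd(prices[s:e]), starts, ends)
names(yearly_sd) <- as.character(dates[ends])
print(yearly_sd)
```

Each value is computed only from that year's rows, which is exactly what the failing attempts in the question were missing.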
I don't know dplyr, but here's how to do it with data.table:
library(data.table)
# convert data.frame to data.table
setDT(stock)
# convert your Date column with content like "2020-06-17" from character to Date type
stock[,Date:=as.Date(Date)]
# calculate sd(price) grouped by year, assuming here your price column is named "price"
stock[, .(stdev = sd(price)), by = .(year = year(Date))]
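Don't refer to the data frame again inside summarise(). stock[,2] pulls the entire column from the original, ungrouped data frame, so every year gets the standard deviation of all prices. Use the bare column name instead; dplyr then evaluates it within each group.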
separate(stock, Date, c("year","month","day"), sep="-") %>%
group_by(year) %>%
summarise(stdev = sd(AAPL.Close)) # <-- here
# A tibble: 11 x 2
# year stdev
# <chr> <dbl>
# 1 2010 5.37
# 2 2011 3.70
# 3 2012 9.57
# 4 2013 6.41
# 5 2014 13.4
# 6 2015 7.68
# 7 2016 7.64
# 8 2017 14.6
# 9 2018 20.6
#10 2019 34.5
#11 2020 28.7
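The root cause can be reproduced in base R with a small hypothetical data frame (`toy`, with made-up `year` and `close` columns). Calling sd(toy[, 2]) always sees the whole column no matter how the rows are grouped, whereas a grouping function such as tapply() hands sd() only the rows of each group, which is what the bare column name achieves inside summarise().

```r
# Hypothetical toy data: two years of closing prices
toy <- data.frame(year  = c("2010", "2010", "2010", "2011", "2011", "2011"),
                  close = c(10, 11, 12, 20, 24, 28))

# Wrong: the function ignores its group g and uses the whole column,
# so every year gets the same number
wrong <- sapply(split(toy, toy$year), function(g) sd(toy[, 2]))

# Right: only the rows belonging to each year are passed to sd()
right <- tapply(toy$close, toy$year, sd)

print(wrong)
print(right)  # 2010: 1, 2011: 4
```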
I am working with a large time series of oceanographic data which needs a lot of manipulation.
Several days of data are missing and I would like to interpolate them, specifically date, depth and temperature.
Here is an example of my df:
> tibble(df)
# A tibble: 351,685 x 9
date time depthR SV temp salinity conduct density calcSV
<date> <times> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2021-11-17 07:50:18 0.5 1524. 19.7 37.8 51.0 27 1524.
2 2021-11-17 07:50:22 0.5 1524. 19.9 37.6 50.9 26.8 1524.
3 2021-11-17 07:50:23 1.1 1524. 19.9 37.6 50.9 26.8 1524.
4 2021-11-17 07:50:24 1.5 1524. 19.9 37.6 50.9 26.8 1524.
5 2021-11-17 07:50:25 2 1524. 19.9 37.6 50.9 26.8 1524.
Each date contains over 1000 rows of data, so my idea was to find the maximum depth of each day and then interpolate reasonable maximum depths for the missing days in between.
So far, I have found the max depth per date:
group <- df %>% group_by(date) %>% summarise(max =max(depthR, na.rm=TRUE))
> tibble(group)
# A tibble: 40 x 2
date max
<date> <dbl>
1 2021-11-17 685.
2 2021-11-18 695.
3 2021-11-19 136.
4 2021-11-20 138.
5 2021-11-21 142.
6 2021-11-22 26
7 2021-11-23 136.
8 2021-11-24 297.
9 2021-11-25 613.
10 2021-11-26 81.1
# ... with 30 more rows
And then I filled in the missing dates with:
> group <- seq(min(group$date), max(group$date), by = "1 day")
> group <- data.frame(date=group)
> tibble(group)
# A tibble: 69 x 1
date
<date>
1 2021-11-17
2 2021-11-18
3 2021-11-19
4 2021-11-20
5 2021-11-21
6 2021-11-22
7 2021-11-23
8 2021-11-24
9 2021-11-25
10 2021-11-26
# ... with 59 more rows
As you can see, the previous result was overwritten.
So I created a new data frame for the interpolated dates and tried to merge the two, but got this error:
> library(stringr)
> group$combined <- str_c(group$date, '', dateinterp$date)
Error: Assigned data `str_c(group$date, "", dateinterp$date)` must be compatible with existing data.
x Existing data has 40 rows.
x Assigned data has 69 rows.
i Only vectors of size 1 are recycled.
How can I combine these two data frames of different lengths in chronological order without overwriting the original data?
After that, I'm not sure how to interpolate the depths and temperatures for each date.
Perhaps starting with something like the following:
depth = seq(1, 200, length.out = 100)
Eventually the date variable will be exchanged for geo coords.
Any advice greatly appreciated.
EDIT: As requested by @AndreaM, here is an example of my data:
> dput(head(df))
structure(list(date = structure(c(18948, 18948, 18948, 18948,
18948, 18948), class = "Date"), time = structure(c(0.326597222222222,
0.326643518518519, 0.326655092592593, 0.326666666666667, 0.326678240740741,
0.326712962962963), format = "h:m:s", class = "times"), depth = c(0.5,
0.5, 1.1, 1.5, 2, 2.5), SV = c(1524.024, 1524.026, 1524.025,
1524.008, 1524.016, 1524.084), temp = c(19.697, 19.864, 19.852,
19.854, 19.856, 19.847), salinity = c(37.823, 37.561, 37.557,
37.568, 37.573, 37.704), conduct = c(51.012, 50.878, 50.86, 50.876,
50.884, 51.032), density = c(27, 26.755, 26.758, 26.768, 26.773,
26.877), calcSV = c(1523.811, 1523.978, 1523.949, 1523.975, 1523.993,
1524.124)), row.names = 100838:100843, class = "data.frame")
One approach; adapt it to your case as appropriate:
library(dplyr)
library(lubridate) ## facilitates date-time manipulations
## example data:
patchy_data <- data.frame(date = as.Date('2021-11-01') + sample(1:10, 6),
value = rnorm(12)) %>%
arrange(date)
## create vector of -only!- missing dates:
missing_dates <-
setdiff(
seq.Date(from = min(patchy_data$date),
to = max(patchy_data$date),
by = '1 day'
),
patchy_data$date
) %>% as.Date(origin = '1970-01-01')
## extend initial dataframe with rows per missing date:
full_data <-
patchy_data %>%
bind_rows(data.frame(date = missing_dates,
value = NA)
) %>%
arrange(date)
## group by month and impute missing data from monthwise statistic:
full_data %>%
mutate(month = lubridate::month(date)) %>%
group_by(month) %>%
## coalesce conveniently replaces ifelse-constructs to replace NAs
mutate(imputed = coalesce(value, mean(value, na.rm = TRUE)))  # bare column names are evaluated per month; .$value would use the whole data frame
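For comparison, the same month-wise mean imputation can be sketched in base R with ave(), which applies a function within groups and returns a vector aligned with the original rows. The small `full_data` below is a hypothetical stand-in spanning two months, so that the per-month means actually differ.

```r
# Hypothetical stand-in for full_data: two months with some NA values
full_data <- data.frame(
  date  = as.Date(c("2021-11-01", "2021-11-02", "2021-11-03",
                    "2021-12-01", "2021-12-02", "2021-12-03")),
  value = c(1, NA, 3, 10, 20, NA)
)

month <- format(full_data$date, "%Y-%m")

# Month-wise mean of the observed values, repeated for every row of that month
month_mean <- ave(full_data$value, month,
                  FUN = function(v) mean(v, na.rm = TRUE))

# Replace only the NAs, as coalesce() does in the dplyr version above
full_data$imputed <- ifelse(is.na(full_data$value), month_mean, full_data$value)
print(full_data)
```

November's gap is filled with 2 (the mean of 1 and 3) and December's with 15, because each mean is computed from that month's rows only.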
EDIT:
One way to expand the generated data (missing dates) by additional parameters (e.g. measurement depths) is expand.grid(), which creates one row for every combination of date and depth. Using the object names from the previous code:
## depths of daily measurements:
observation_depths <- c(0.5, 1.1, 1.5) ## example
## generate dataframe with missing dates x depths:
missing_dates_and_depths <-
setNames(expand.grid(missing_dates, observation_depths),
c('date','depthR')
)
## stack both dataframes as above:
full_data <-
patchy_data %>%
bind_rows(missing_dates_and_depths) %>%
arrange(date)
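If linear interpolation between neighbouring days (the idea in the question) is preferred over a month-wise mean, base R's approx() can fill the daily maximum depth for missing days. The daily maxima below are a hypothetical excerpt with two days deliberately removed to create a gap; in practice they would come from the per-date max summary in the question.

```r
# Hypothetical daily maxima with a gap (2021-11-19 and 2021-11-20 missing)
daily_max <- data.frame(
  date = as.Date(c("2021-11-17", "2021-11-18", "2021-11-21", "2021-11-22")),
  max  = c(685, 695, 142, 26)
)

# Complete daily sequence between the first and last observed day
all_days <- seq(min(daily_max$date), max(daily_max$date), by = "1 day")

# Linear interpolation on the numeric day index; observed days keep their values
interp <- approx(x = as.numeric(daily_max$date), y = daily_max$max,
                 xout = as.numeric(all_days))

filled <- data.frame(date = all_days,
                     max = interp$y,
                     interpolated = !all_days %in% daily_max$date)
print(filled)
```

The same call can be repeated per depth (or for temperature) once the data are arranged with one value per day; note that linear interpolation across long gaps may not be physically meaningful for oceanographic profiles.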
I am trying to fit linear models to a time series where each regression begins at midnight and uses all data until 06:00 the following morning (covering a total of 30 hours). I want to do this for every day in the time series, and it also needs to be applied by a grouping factor. Ultimately, I need the regression coefficients added to the data frame for the day on which the regression started. I am familiar with rolling and window regressions and with applying functions across groups using dplyr. What I am struggling with is how to make each regression start at midnight. If I used a window function, after the first day the window would be shifted six hours ahead of midnight, and I am not sure how to shift it back. It seems I need to specify a window size and a lag/lead at each iteration, but I can't see how to implement that. Any insight is appreciated.
Here is some sample data. I would like to model dv ~ datetime by grp.
df <- dplyr::arrange(data.frame(datetime = seq(as.POSIXct("2020-09-19 00:00:00"), as.POSIXct("2020-09-30 00:00:00"),"hour"),
grp = rep(c('a', 'b', 'c'), 265),
dv = rnorm(795)),grp, datetime)
We assume that we want each regression to cover 30 rows (except for any stub at the end) and that we should move forward by 24 hours for each regression so that there is one regression per date within grp.
library(dplyr)
library(zoo)  # rollapplyr()
ans <- df %>%
group_by(grp) %>%
group_modify(~ {
r <- rollapplyr(1:nrow(.), 30, by = 24,
function(ix) coef(lm(dv ~ datetime, ., subset = ix)),
align = "left", partial = TRUE)
data.frame(date = head(unique(as.Date(.$datetime)), nrow(r)),
coef1 = r[, 1], coef2 = r[, 2])
}) %>%
ungroup
giving:
> ans
# A tibble: 36 x 4
grp date coef1 coef2
<chr> <date> <dbl> <dbl>
1 a 2020-09-19 -7698. 0.00000481
2 a 2020-09-20 -2048. 0.00000128
3 a 2020-09-21 -82.0 0.0000000514
4 a 2020-09-22 963. -0.000000602
5 a 2020-09-23 2323. -0.00000145
6 a 2020-09-24 5886. -0.00000368
7 a 2020-09-25 7212. -0.00000450
8 a 2020-09-26 -17448. 0.0000109
9 a 2020-09-27 1704. -0.00000106
10 a 2020-09-28 15731. -0.00000982
# ... with 26 more rows
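If the zoo dependency is unwanted, the same windowing rule (a 30-row window starting at every 24th row, i.e. at each midnight for hourly data, with a shorter stub at the end) can be sketched in base R by computing the window start indices explicitly. This assumes hourly rows sorted by datetime within each group, as in the sample data; `fit_windows` is a hypothetical helper name, and tz = "UTC" is an added assumption to avoid daylight-saving gaps.

```r
# Sample data as in the question, with tz = "UTC" assumed
set.seed(123)
df <- data.frame(
  datetime = seq(as.POSIXct("2020-09-19 00:00:00", tz = "UTC"),
                 as.POSIXct("2020-09-30 00:00:00", tz = "UTC"), "hour"),
  grp = rep(c("a", "b", "c"), 265),
  dv  = rnorm(795)
)
df <- df[order(df$grp, df$datetime), ]

# Hypothetical helper: one regression per 24-row step, each spanning up to 30 rows
fit_windows <- function(d, width = 30, step = 24) {
  starts <- seq(1, nrow(d), by = step)           # row 1 = midnight, then every 24 h
  rows <- lapply(starts, function(s) {
    ix <- s:min(s + width - 1, nrow(d))           # shorter stub at the end
    cf <- coef(lm(dv ~ datetime, data = d[ix, ]))
    data.frame(grp = d$grp[s], date = as.Date(d$datetime[s], tz = "UTC"),
               coef1 = unname(cf[1]), coef2 = unname(cf[2]))
  })
  do.call(rbind, rows)
}

ans <- do.call(rbind, lapply(split(df, df$grp), fit_windows))
rownames(ans) <- NULL
head(ans)
```

This gives one row per date and group (36 rows here). The final window of each group holds a single midnight observation, so its slope is NA.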
Old answer
After re-reading the question I replaced this with the answer above.
Within each group, create g, which numbers the blocks of rows that start at each 6 am row, and let width be the number of rows since the most recent 6 am row. Then run rollapplyr() using the width vector to define the widths to regress over.
library(dplyr)
library(zoo)
ans <- df %>%
group_by(grp) %>%
group_modify(~ {
g <- cumsum(format(.$datetime, "%H") == "06")
width = 1:nrow(.) - match(g, g) + 1
r <- rollapplyr(1:nrow(.), width,
function(ix) coef(lm(dv ~ datetime, ., subset = ix)),
partial = TRUE, fill = NA)
mutate(., coef1 = r[, 1], coef2 = r[, 2])
}) %>%
ungroup
giving:
> ans
# A tibble: 795 x 5
grp datetime dv coef1 coef2
<chr> <dttm> <dbl> <dbl> <dbl>
1 a 2020-09-19 00:00:00 -0.560 -0.560 NA
2 a 2020-09-19 01:00:00 -0.506 -24071. 0.0000150
3 a 2020-09-19 02:00:00 -1.76 265870. -0.000166
4 a 2020-09-19 03:00:00 0.0705 -28577. 0.0000179
5 a 2020-09-19 04:00:00 1.95 -248499. 0.000155
6 a 2020-09-19 05:00:00 0.845 -205918. 0.000129
7 a 2020-09-19 06:00:00 0.461 0.461 NA
8 a 2020-09-19 07:00:00 0.359 45375. -0.0000284
9 a 2020-09-19 08:00:00 -1.40 412619. -0.000258
10 a 2020-09-19 09:00:00 -0.446 198902. -0.000124
# ... with 785 more rows
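The width vector in this old answer is the least obvious step. A tiny base-R illustration on a hypothetical 10-hour sequence starting at 03:00 shows what the cumsum() and match(g, g) idiom produces: g increments at each 06:00 row, match(g, g) returns the first row of each g block, so width counts the rows back to (and including) the most recent 06:00 row, or back to the first row before any 06:00 row.

```r
# Hypothetical hourly timestamps from 03:00 to 12:00
times <- seq(as.POSIXct("2020-09-19 03:00:00", tz = "UTC"), by = "hour", length.out = 10)

g <- cumsum(format(times, "%H") == "06")     # block id: 0 before 06:00, 1 from 06:00
first_row <- match(g, g)                     # index of the first row of each block
width <- seq_along(times) - first_row + 1    # rows since block start, inclusive

print(data.frame(hour = format(times, "%H"), g, first_row, width))
```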
Note
Input used
set.seed(123)
df <- dplyr::arrange(data.frame(datetime = seq(as.POSIXct("2020-09-19 00:00:00"), as.POSIXct("2020-09-30 00:00:00"),"hour"),
grp = rep(c('a', 'b', 'c'), 265),
dv = rnorm(795)),grp, datetime)
I am trying to replicate a trading strategy and backtest in R. However, I am having a slight problem with the tq_transmute() function. Any help would be appreciated.
So, here is the code I have written so far:
library(quantmod)  # getSymbols()
library(dplyr)     # %>% and mutate()
#Importing the etfs data
symbols<- c("SPY","XLF","XLE")
start<-as.Date("2000-01-01")
end<- as.Date("2018-12-31")
price_data<- lapply(symbols, function(symbol){
etfs<-as.data.frame(getSymbols(symbol,src="yahoo", from=start, to= end,
auto.assign = FALSE))
colnames(etfs)<- c("Open", "High","Low","Close","volume","Adjusted")
etfs$Symbol<- symbol
etfs$Date<- rownames(etfs)
etfs
})
# Next, I used do.call() with rbind() to combine the data into a single data frame
etfs_df<- do.call(rbind, price_data)
#This because of POSIXct error
daily_price<- etfs_df %>%
mutate(Date=as.Date(Date, frac=1))
# I have deleted some columns of the table as my work only concerned the "Adjusted" column.
#So, until now we have:
head(daily_price)
Adjusted Symbol Date
1 98.14607 SPY 2000-01-03
2 94.30798 SPY 2000-01-04
3 94.47669 SPY 2000-01-05
4 92.95834 SPY 2000-01-06
5 98.35699 SPY 2000-01-07
6 98.69440 SPY 2000-01-10
#Converting the daily adjusted price to monthly adjusted price
monthly_price<-
tq_transmute(daily_price,select = Adjusted, mutate_fun = to.monthly, indexAt = "lastof")
head(monthly_price)
# And now, I get the following table:
# A tibble: 6 x 2
Date Adjusted
<date> <dbl>
1 2000-01-31 16.6
2 2000-02-29 15.9
3 2000-03-31 17.9
4 2000-04-30 17.7
5 2000-05-31 19.7
6 2000-06-30 18.6
So, as you can see, the Date and Adjusted prices have been successfully converted to monthly figures, but my Symbol column has disappeared. Could anyone please tell me why that happened and how I can get it back?
Thank you.
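Group the data by Symbol and then apply tq_transmute(). Like dplyr::transmute(), tq_transmute() keeps only the date and the newly created columns, plus any grouping columns, which is why Symbol was dropped.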
library(dplyr)
library(quantmod)
library(tidyquant)
monthly_price <- daily_price %>%
group_by(Symbol) %>%
tq_transmute(select = Adjusted,  # the data comes from the pipe; don't pass daily_price again
mutate_fun = to.monthly, indexAt = "lastof")
# Symbol Date Adjusted
# <chr> <date> <dbl>
# 1 SPY 2000-01-31 94.2
# 2 SPY 2000-02-29 92.7
# 3 SPY 2000-03-31 102.
# 4 SPY 2000-04-30 98.2
# 5 SPY 2000-05-31 96.6
# 6 SPY 2000-06-30 98.5
# 7 SPY 2000-07-31 97.0
# 8 SPY 2000-08-31 103.
# 9 SPY 2000-09-30 97.6
#10 SPY 2000-10-31 97.2
# … with 674 more rows
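As a dependency-free cross-check of what the grouped call keeps (per symbol, the last adjusted price of each month), here is a base-R sketch on a small hypothetical daily_price with made-up prices for two symbols. It keeps the last trading date of each month rather than relabelling it to the calendar month end as indexAt = "lastof" does.

```r
# Hypothetical daily prices for two symbols over two months
daily_price <- data.frame(
  Adjusted = c(10, 11, 12, 13, 50, 51, 52, 53),
  Symbol   = rep(c("SPY", "XLF"), each = 4),
  Date     = as.Date(rep(c("2000-01-28", "2000-01-31", "2000-02-28", "2000-02-29"), 2))
)

# Sort so that the last row of each (Symbol, month) block is the month-end price
daily_price <- daily_price[order(daily_price$Symbol, daily_price$Date), ]
daily_price$Month <- format(daily_price$Date, "%Y-%m")

# Keep the last row of every Symbol x Month block; Symbol is preserved
is_last <- !duplicated(daily_price[, c("Symbol", "Month")], fromLast = TRUE)
monthly_price <- daily_price[is_last, c("Symbol", "Date", "Adjusted")]
print(monthly_price)
```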
I would do it like this:
library(quantmod)  # getSymbols(), Ad(); also attaches xts (apply.monthly) and zoo (fortify.zoo)
symbols <- c("SPY", "XLF", "XLE")
start <- as.Date("2000-01-01")
end <- as.Date("2018-12-31")
# Environment to hold data
my_data <- new.env()
# Tell getSymbols() to load the data into 'my_data'
getSymbols(symbols, from = start, to = end, env = my_data)
# Combine all the adjusted close prices into one xts object
price_data <- Reduce(merge, lapply(my_data, Ad))
# Remove "Adjusted" from column names
colnames(price_data) <- sub(".Adjusted", "", colnames(price_data), fixed = TRUE)
# Get the last price for each month
monthly_data <- apply.monthly(price_data, last)
# Convert to a long data.frame
long_data <- fortify.zoo(monthly_data,
names = c("Date", "Symbol", "Adjusted"), melt = TRUE)
Suppose I have a daily rain data.frame like this:
df.meteoro = data.frame(Dates = seq(as.Date("2017/1/19"), as.Date("2018/1/18"), "days"),
rain = rnorm(length(seq(as.Date("2017/1/19"), as.Date("2018/1/18"), "days"))))
I'm trying to sum the accumulated rain over 14-day intervals with this code:
library(tidyverse)
library(lubridate)
df.rain <- df.meteoro %>%
mutate(TwoWeeks = round_date(Dates, "14 days")) %>%
group_by(TwoWeeks) %>%
summarise(sum_rain = sum(rain))
The problem is that the periods don't start on 2017-01-19 but on 2017-01-15. I was expecting my output dates to be:
"2017-02-02" "2017-02-16" "2017-03-02" "2017-03-16" "2017-03-30" "2017-04-13"
"2017-04-27" "2017-05-11" "2017-05-25" "2017-06-08" "2017-06-22" "2017-07-06" "2017-07-20"
"2017-08-03" "2017-08-17" "2017-08-31" "2017-09-14" "2017-09-28" "2017-10-12" "2017-10-26"
"2017-11-09" "2017-11-23" "2017-12-07" "2017-12-21" "2018-01-04" "2018-01-18"
TL;DR: I have a year-long daily rain data.frame and want to sum the accumulated rain for the dates above.
Please help.
Using round_date() the way you have shown will not give you the 14-day periods you expect. This solution takes a different approach: it generates a sequence of dates between your first and last dates, groups them into 14-day periods, and then joins the dates to your observations.
startdate = min(df.meteoro$Dates)
enddate = max(df.meteoro$Dates)
dateseq =
data.frame(Dates = seq.Date(startdate, enddate, by = 1)) %>%
mutate(group = as.numeric(Dates - startdate) %/% 14) %>%
group_by(group) %>%
mutate(starts = min(Dates))
df.rain <- df.meteoro %>%
right_join(dateseq) %>%
group_by(starts) %>%
summarise(sum_rain = sum(rain))
head(df.rain)
> head(df.rain)
# A tibble: 6 x 2
starts sum_rain
<date> <dbl>
1 2017-01-19 6.09
2 2017-02-02 5.55
3 2017-02-16 -3.40
4 2017-03-02 2.55
5 2017-03-16 -0.12
6 2017-03-30 8.95
The right join to the date sequence ensures that, if missing observation days span a complete period, that period is still listed in the result (though in your case you have a complete year of dates anyway).
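The core of this answer is the integer division as.numeric(Dates - startdate) %/% 14, which numbers consecutive 14-day blocks counted from the first date. A base-R sketch of the same grouping, using a hypothetical constant rain of 1 per day so the sums are easy to check:

```r
# Hypothetical daily series: one year, constant rain of 1 per day
Dates <- seq(as.Date("2017-01-19"), as.Date("2018-01-18"), by = "day")
rain  <- rep(1, length(Dates))

startdate <- min(Dates)
block  <- as.numeric(Dates - startdate) %/% 14   # 0 for days 1-14, 1 for days 15-28, ...
starts <- startdate + 14 * block                  # first date of each block

sum_rain <- tapply(rain, starts, sum)
head(sum_rain)
```

Because the series has 365 days (26 full blocks plus one day), the last block, starting on 2018-01-18, contains a single day.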
round_date rounds to the nearest multiple of unit (here, 14 days) since some epoch (probably the Unix epoch of 1970-01-01 00:00:00), which doesn't line up with your purpose.
To get what you want, you can do the following:
df.rain = df.meteoro %>%
mutate(days_since_start = as.numeric(Dates - as.Date("2017/1/18")),
TwoWeeks = as.Date("2017/1/18") + 14*ceiling(days_since_start/14)) %>%
group_by(TwoWeeks) %>%
summarise(sum_rain = sum(rain))
This computes days_since_start as the days since 2017/1/18 and then manually rounds to the next multiple of two weeks.
Assuming you want to round each date to the closest of the dates you specified, the following should work:
library(lubridate)  # ymd()
targetDates <- seq(ymd("2017-02-02"), ymd("2018-01-18"), by = '14 days')
# For each date, take the nearest target date (distance measured in days)
df.meteoro$Dates <- targetDates[sapply(df.meteoro$Dates, function(x) which.min(abs(as.numeric(targetDates - x))))]
sum_rain <- plyr::ddply(df.meteoro, "Dates", plyr::summarize, sum_rain = sum(rain, na.rm = TRUE))
As you can see, not all dates have the same number of observations. For instance, "2017-02-02" gets all the records from "2017-01-19" to "2017-02-09", which is 22 records. From "2017-02-10" onward, dates are rounded to "2017-02-16", and so on.
This may be a cheat, but assuming each row is a separate day, why not just group every 14 rows and sum?
# Assign interval groups, each 14 rows
df.meteoro$my_group <-rep(1:100, each=14, length.out=nrow(df.meteoro))
# Grab Interval Names
my_interval_names <- df.meteoro %>%
select(-rain) %>%
group_by(my_group) %>%
slice(1)
# Summarise
df.meteoro %>%
group_by(my_group) %>%
summarise(rain = sum(rain)) %>%
left_join(., my_interval_names)
#> Joining, by = "my_group"
#> # A tibble: 27 x 3
#> my_group rain Dates
#> <int> <dbl> <date>
#> 1 1 3.86 2017-01-19
#> 2 2 -0.581 2017-02-02
#> 3 3 -0.876 2017-02-16
#> 4 4 1.80 2017-03-02
#> 5 5 3.79 2017-03-16
#> 6 6 -3.50 2017-03-30
#> 7 7 5.31 2017-04-13
#> 8 8 2.57 2017-04-27
#> 9 9 -1.33 2017-05-11
#> 10 10 5.41 2017-05-25
#> # ... with 17 more rows
Created on 2018-03-01 by the reprex package (v0.2.0).
I have some data that I need to analyse. I want to create a graph of the average usage per day of the week. The data is in a data.table with the following structure:
time value
2014-10-22 23:59:54 7433033.0
2014-10-23 00:00:12 7433034.0
2014-10-23 00:00:31 7433035.0
2014-10-23 00:00:49 7433036.0
...
2014-10-23 23:59:21 7443032.0
2014-10-23 23:59:40 7443033.0
2014-10-23 23:59:59 7443034.0
2014-10-24 00:00:19 7443035.0
Since the value is cumulative, I need the maximum value of each day minus the minimum value of that day, and then the average of those differences over all days with the same weekday.
I already know how to get the day of the week (using as.POSIXlt and $wday). So how can I get the daily difference? Once I have the data in a structure like:
dayOfWeek value
0 10
1 20
2 50
I should be able to find the mean myself using some functions.
Here is a sample:
library(data.table)
data <- fread("http://pastebin.com/raw.php?i=GXGiCAiu", header=T)
#get the difference per day
#create average per day of week
There are many ways to do this in R. You can use ave() from base R, or the data.table or dplyr packages. These solutions all add the summaries as columns of your data.
data
df <- data.frame(dayOfWeek = c(0L, 0L, 1L, 1L, 2L),
value = c(10L, 5L, 20L, 60L, 50L))
base R
df$min <- ave(df$value, df$dayOfWeek, FUN = min)
df$max <- ave(df$value, df$dayOfWeek, FUN = max)
data.table
require(data.table)
setDT(df)[, ":="(min = min(value), max = max(value)), by = dayOfWeek][]
dplyr
require(dplyr)
df %>% group_by(dayOfWeek) %>% mutate(min = min(value), max = max(value))
If you just want the summaries, you can also use the following:
# base
aggregate(value~dayOfWeek, df, FUN = min)
aggregate(value~dayOfWeek, df, FUN = max)
# data.table
setDT(df)[, list(min = min(value), max = max(value)), by = dayOfWeek]
# dplyr
df %>% group_by(dayOfWeek) %>% summarise(min(value), max(value))
This is actually a trickier problem than it seemed at first glance. I think you need two separate aggregations: one to aggregate the cumulative usage values within each calendar day by taking the difference of the range, and a second to aggregate the per-calendar-day usage values by weekday. You can extract the weekday with weekdays(), calculate the daily difference with diff() on the range(), calculate the mean with mean(), and aggregate with aggregate():
set.seed(1);
N <- as.integer(60*60*24/19*14);
df <- data.frame(time=seq(as.POSIXct('2014-10-23 00:00:12',tz='UTC'),by=19,length.out=N)+rnorm(N,0,0.5), value=seq(7433034,by=1,length.out=N)+rnorm(N,0,0.5) );
head(df);
## time value
## 1 2014-10-23 00:00:11 7433034
## 2 2014-10-23 00:00:31 7433035
## 3 2014-10-23 00:00:49 7433036
## 4 2014-10-23 00:01:09 7433037
## 5 2014-10-23 00:01:28 7433039
## 6 2014-10-23 00:01:46 7433039
tail(df);
## time value
## 63658 2014-11-05 23:58:14 7496691
## 63659 2014-11-05 23:58:33 7496692
## 63660 2014-11-05 23:58:51 7496693
## 63661 2014-11-05 23:59:11 7496694
## 63662 2014-11-05 23:59:31 7496695
## 63663 2014-11-05 23:59:49 7496697
df2 <- aggregate(value~date,cbind(df,date=as.Date(df$time)),function(x) diff(range(x)));
df2;
## date value
## 1 2014-10-23 4547.581
## 2 2014-10-24 4546.679
## 3 2014-10-25 4546.410
## 4 2014-10-26 4545.726
## 5 2014-10-27 4546.602
## 6 2014-10-28 4545.194
## 7 2014-10-29 4546.136
## 8 2014-10-30 4546.454
## 9 2014-10-31 4545.712
## 10 2014-11-01 4546.901
## 11 2014-11-02 4544.684
## 12 2014-11-03 4546.378
## 13 2014-11-04 4547.061
## 14 2014-11-05 4547.082
df3 <- aggregate(value~dayOfWeek,cbind(df2,dayOfWeek=weekdays(df2$date)),mean);
df3;
## dayOfWeek value
## 1 Friday 4546.196
## 2 Monday 4546.490
## 3 Saturday 4546.656
## 4 Sunday 4545.205
## 5 Thursday 4547.018
## 6 Tuesday 4546.128
## 7 Wednesday 4546.609
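The two-stage logic (daily range first, then the mean per weekday) is easy to verify on a tiny hypothetical series whose answer is known in advance. This sketch uses tapply() and the as.POSIXlt()$wday weekday number mentioned in the question (0 = Sunday); the meter readings are made up so that each day's true usage is known, and the series starts on Monday 2014-10-20 so that Monday appears twice.

```r
# Hypothetical true daily usage, Mon 2014-10-20 .. Mon 2014-10-27
usage <- c(100, 200, 300, 400, 500, 600, 700, 300)
day0  <- as.POSIXct("2014-10-20 00:00:00", tz = "UTC")
start <- c(0, cumsum(usage)[-length(usage)])     # cumulative reading at each midnight

# Three readings per day: 00:00:00, 12:00:00 and 23:59:59
offsets <- c(0, 12 * 3600, 86399)
df <- data.frame(
  time  = day0 + rep(0:7, each = 3) * 86400 + rep(offsets, times = 8),
  value = rep(start, each = 3) + rep(usage, each = 3) * rep(c(0, 0.5, 1), times = 8)
)

# Stage 1: usage per calendar day = max - min of the cumulative reading
day   <- as.Date(df$time, tz = "UTC")
daily <- tapply(df$value, day, function(v) diff(range(v)))

# Stage 2: mean daily usage per weekday number (0 = Sunday)
wday    <- as.POSIXlt(as.Date(names(daily)))$wday
by_wday <- tapply(as.vector(daily), wday, mean)
print(by_wday)
```

Stage 1 recovers the usage vector exactly, and Monday (1) averages its two days, 100 and 300, to 200.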
I came across this while looking for something else. I think you were looking for the difference and the mean per Monday, Tuesday, etc. Sticking with data.table allows a quick all-in-one call to get the mean and the difference per day of the week.
library(data.table)
data <- fread("http://pastebin.com/raw.php?i=GXGiCAiu", header=T)
data_summary <- data[,list(mean = mean(value),
diff = max(value)-min(value)),
by = list(date = format(as.POSIXct(time), format = "%A"))]
This gives an output of 7 rows and three columns.
date mean diff
1: Thursday 7470107 166966
2: Friday 7445945 6119
3: Saturday 7550000 100000
4: Sunday 7550000 100000
5: Monday 7550000 100000
6: Tuesday 7550000 100000
7: Wednesday 7550000 100000