Ignore Max Value in Mean Calculation in R

I have the following sample df from 20 m sprint testing of athletes, with split times. Each athlete does 3 trials. I want to create new columns for each split that average their two fastest trials (i.e. drop the slowest trial).
Here is a sample of the df:
Athlete 0_10m_1 10_20m_1 0_20m_1 0_10m_2 10_20m_2 0_20m_2 0_10m_3 10_20m_3 0_20m_3
1 Athlete 1 2.005 1.320 3.325 1.904 1.306 3.210 1.993 1.316 3.309
2 Athlete 2 1.967 1.383 3.350 1.931 1.391 3.322 2.005 1.399 3.404
3 Athlete 3 2.008 1.381 3.389 2.074 1.365 3.439 2.047 1.408 3.455
4 Athlete 4 1.817 1.286 3.103 1.924 1.285 3.209 NA NA NA
The end result would be 3 new columns with the mean values of the 2 fastest trials (based on the 0_20m time): "Avg_0_10m", "Avg_10_20m", "Avg_0_20m". Ideally the solution is robust enough to handle NA values, as there will be some within the dataset.
Any suggestions on how to approach this? I'm not sure how to be able to filter out the slowest 0_20m trial with the related split times and average the other trials.

One tidyverse approach: reshape to long, split by athlete and split distance, then average the two fastest times in each group:
library(tidyverse)
x <- read.table(text=" Athlete 0_10m_1 10_20m_1 0_20m_1 0_10m_2 10_20m_2 0_20m_2 0_10m_3 10_20m_3 0_20m_3
'Athlete 1' 2.005 1.320 3.325 1.904 1.306 3.210 1.993 1.316 3.309
'Athlete 2' 1.967 1.383 3.350 1.931 1.391 3.322 2.005 1.399 3.404
'Athlete 3' 2.008 1.381 3.389 2.074 1.365 3.439 2.047 1.408 3.455
'Athlete 4' 1.817 1.286 3.103 1.924 1.285 3.209 NA NA NA", header=TRUE, check.names=FALSE)
x %>%
  gather(trial, time, -Athlete) %>%
  separate(trial, sep = "(?<=m)_", into = c("trial_time", "trial_try")) %>%
  group_by(Athlete, trial_time) %>%
  group_split() %>%
  purrr::map(function(x) {
    x %>%
      arrange(time) %>%
      group_by(Athlete, trial_time) %>%
      # NAs sort last, so time[1:2] picks the two fastest recorded times
      summarise(time_avg = mean(time[1:2], na.rm = TRUE))
  }) %>%
  bind_rows() %>%
  spread(trial_time, time_avg)

First, create the data.frame.
x <- read.table(text="x Athlete 0_10m_1 10_20m_1 0_20m_1 0_10m_2 10_20m_2 0_20m_2 0_10m_3 10_20m_3 0_20m_3
1 Athlete 1 2.005 1.320 3.325 1.904 1.306 3.210 1.993 1.316 3.309
2 Athlete 2 1.967 1.383 3.350 1.931 1.391 3.322 2.005 1.399 3.404
3 Athlete 3 2.008 1.381 3.389 2.074 1.365 3.439 2.047 1.408 3.455
4 Athlete 4 1.817 1.286 3.103 1.924 1.285 3.209 NA NA NA", header=T, check.names=F)
x %>%
  select(-x) %>%
  gather("split", "time", -Athlete) %>%
  mutate(split = gsub("_\\d$", "", split)) %>%
  group_by(Athlete, split) %>%
  arrange(time) %>%  # NAs sort last, so slice(1:2) keeps the two fastest times
  slice(1:2) %>%
  summarize(Avg = mean(time, na.rm = TRUE))
# A tibble: 12 x 3
# Groups: Athlete [4]
# Athlete split Avg
# <int> <chr> <dbl>
# 1 1 0_10m 1.95
# 2 1 0_20m 3.26
# 3 1 10_20m 1.31
# 4 2 0_10m 1.95
# 5 2 0_20m 3.34
# 6 2 10_20m 1.39
# 7 3 0_10m 2.03
# 8 3 0_20m 3.41
# 9 3 10_20m 1.37
#10 4 0_10m 1.87
#11 4 0_20m 3.16
#12 4 10_20m 1.29
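Note that both answers above rank each split column independently. If you instead want to drop the entire slowest trial as ranked by its 0_20m time, so that the kept splits always come from the same two trials, here is a sketch (assuming the column naming above and dplyr >= 1.0 for slice_min()):
library(tidyverse)
x %>%
  pivot_longer(-Athlete, names_to = c("split", "trial"),
               names_pattern = "(.*m)_(\\d)") %>%
  pivot_wider(names_from = split, values_from = value) %>%
  group_by(Athlete) %>%
  # keep the two trials with the fastest 0_20m; trials with NA 0_20m drop out
  slice_min(`0_20m`, n = 2, na_rm = TRUE) %>%
  summarise(across(c(`0_10m`, `10_20m`, `0_20m`),
                   ~ mean(.x, na.rm = TRUE), .names = "Avg_{.col}"))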

How do I use dplyr to correlate each column in a for loop?

I have a dataframe of 19 stocks, including the S&P500 (SPX), throughout time. I want to correlate each one of these stocks with the S&P for each month (Jan-Dec), making 18 x 12 = 216 different correlations, and store these in a list called stockList.
> tokens
# A tibble: 366 x 21
Month Date SPX TZERO .....(16 more columns of stocks)...... MPS
<dbl> <dttm> <dbl> <dbl> <dbl>
1 2020-01-02 3245.50 0.95 176.72
...
12 2020-12-31 3733.42 2.90 .....(16 more columns of stocks)..... 360.73
Here's where my error pops up: using the index [i], or [[i]], in the cor() function:
stockList <- list()
for(i in 1:18) {
  stockList[[i]] <- tokens %>%
    group_by(Month) %>%
    summarize(correlation = cor(SPX, tokens[i+3], use = 'complete.obs'))
}
Error in summarise_impl(.data, dots) :
Evaluation error: incompatible dimensions.
How do I use column indexing in the cor() function when trying to summarize? Is there an alternative way?
First, to recreate data like yours:
library(tidyquant)
# Get gamestop, apple, and S&P 500 index prices
prices <- tq_get(c("GME", "AAPL", "^GSPC"),
                 get = "stock.prices",
                 from = "2020-01-01",
                 to = "2020-12-31")
library(tidyverse)
prices_wide <- prices %>%
  select(date, close, symbol) %>%
  pivot_wider(names_from = symbol, values_from = close) %>%
  mutate(Month = lubridate::month(date)) %>%
  select(Month, Date = date, GME, AAPL, SPX = `^GSPC`)
This should look like your data:
> prices_wide
# A tibble: 252 x 5
Month Date GME AAPL SPX
<dbl> <date> <dbl> <dbl> <dbl>
1 1 2020-01-02 6.31 75.1 3258.
2 1 2020-01-03 5.88 74.4 3235.
3 1 2020-01-06 5.85 74.9 3246.
4 1 2020-01-07 5.52 74.6 3237.
5 1 2020-01-08 5.72 75.8 3253.
6 1 2020-01-09 5.55 77.4 3275.
7 1 2020-01-10 5.43 77.6 3265.
8 1 2020-01-13 5.43 79.2 3288.
9 1 2020-01-14 4.71 78.2 3283.
10 1 2020-01-15 4.61 77.8 3289.
# … with 242 more rows
Then I put that data in longer "tidy" format where each row has the stock value and the SPX value so I can compare them:
prices_wide %>%
  # I want every row to have Month, Date, and SPX
  pivot_longer(cols = -c(Month, Date, SPX),
               names_to = "symbol",
               values_to = "price") %>%
  group_by(Month, symbol) %>%
  summarize(correlation = cor(price, SPX)) %>%
  ungroup()
# A tibble: 24 x 3
Month symbol correlation
<dbl> <chr> <dbl>
1 1 AAPL 0.709
2 1 GME -0.324
3 2 AAPL 0.980
4 2 GME 0.874
5 3 AAPL 0.985
6 3 GME -0.177
7 4 AAPL 0.956
8 4 GME 0.873
9 5 AAPL 0.792
10 5 GME -0.435
# … with 14 more rows
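If you specifically want the indexed loop from the question to work, the problem is that tokens[i+3] pulls the whole ungrouped column, whose length doesn't match each month's group. One possible fix (assuming the 18 stock columns sit in positions 4 through 21, as the question implies) is to look each column up by name with the .data pronoun:
stockList <- list()
stock_names <- names(tokens)[4:21]  # assumed positions of the 18 stocks
for (i in seq_along(stock_names)) {
  stockList[[i]] <- tokens %>%
    group_by(Month) %>%
    # .data[[...]] resolves the column name inside each group
    summarize(correlation = cor(SPX, .data[[stock_names[i]]],
                                use = "complete.obs"))
}
names(stockList) <- stock_names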

shift observation of one group into next group by id in R

Suppose I have a dataframe like so:
contracts
Dates Last.Price Last.Price.1 id carry
1 1998-11-30 94.50 98.50 QS -0.040609137
2 1998-11-30 31.32 32.13 HO -0.025210084
3 1998-12-31 95.50 98.00 QS -0.025510204
4 1998-12-31 34.00 34.28 HO -0.008168028
5 1999-01-29 100.00 100.50 QS -0.004975124
6 1999-01-29 33.16 33.42 HO -0.007779773
7 1999-02-26 100.25 100.25 QS 0.000000000
8 1999-02-26 32.29 32.37 HO -0.002471424
9 1999-02-26 10.88 11.00 CO -0.010909091
10 1999-03-31 131.50 130.75 QS 0.005736138
11 1999-03-31 44.68 44.00 HO 0.015454545
12 1999-03-31 15.24 15.16 CO 0.005277045
I want to calculate the weights of each id in each month. I have a function, weight() (given below), that does this. I use dplyr to achieve this:
library(dplyr)
library(lubridate)
contracts <- contracts %>%
  mutate(Dates = ymd(Dates)) %>%
  group_by(Dates) %>%
  mutate(weights = weight(carry))
which gives:
contracts
Dates Last.Price Last.Price.1 id carry weights
1 1998-11-30 94.50 98.50 QS -0.040609137 0.616979910
2 1998-11-30 31.32 32.13 HO -0.025210084 0.383020090
3 1998-12-31 95.50 98.00 QS -0.025510204 0.757468623
4 1998-12-31 34.00 34.28 HO -0.008168028 0.242531377
5 1999-01-29 100.00 100.50 QS -0.004975124 0.390056023
6 1999-01-29 33.16 33.42 HO -0.007779773 0.609943977
7 1999-02-26 100.25 100.25 QS 0.000000000 NA
8 1999-02-26 32.29 32.37 HO -0.002471424 0.184703218
9 1999-02-26 10.88 11.00 CO -0.010909091 0.815296782
10 1999-03-31 131.50 130.75 QS 0.057361377 0.057361377
11 1999-03-31 44.68 44.00 HO 0.015454545 0.015454545
12 1999-03-31 15.24 15.16 CO 0.005277045 0.005277045
Now I want to lag the weights, such that the weights calculated in November are applied in December. So I essentially want to shift the weights column by group, the group being the dates, so the values from November end up on the December rows, and so on.
Now I also want the shift to match by id, such that if a new id is included, the group where the id first appears will have an NA in the lagged column.
The desired output is given below:
desired
Dates Last.Price Last.Price.1 id carry weights w
1 1998-11-30 94.50 98.50 QS -0.040609137 0.616979910 NA
2 1998-11-30 31.32 32.13 HO -0.025210084 0.383020090 NA
3 1998-12-31 95.50 98.00 QS -0.025510204 0.757468623 0.61697991
4 1998-12-31 34.00 34.28 HO -0.008168028 0.242531377 0.38302009
5 1999-01-29 100.00 100.50 QS -0.004975124 0.390056023 0.75746862
6 1999-01-29 33.16 33.42 HO -0.007779773 0.609943977 0.24253138
7 1999-02-26 100.25 100.25 QS 0.000000000 NA 0.39005602
8 1999-02-26 32.29 32.37 HO -0.002471424 0.184703218 0.60994398
9 1999-02-26 10.88 11.00 CO -0.010909091 0.815296782 NA
10 1999-03-31 131.50 130.75 QS 0.057361377 0.057361377 NA
11 1999-03-31 44.68 44.00 HO 0.015454545 0.015454545 0.18470322
12 1999-03-31 15.24 15.16 CO 0.005277045 0.005277045 0.81529678
Take note of February 1999. CO has an NA because it first appears in February.
Now look at March 1999: CO has the value from February, and QS has an NA only because the February value was NA (due to division by 0).
Can this be done?
Data:
contracts <- read.table(text = "Dates, Last.Price, Last.Price.1, id,carry
1998-11-30, 94.500, 98.500, QS, -0.0406091371
1998-11-30, 31.320, 32.130, HO, -0.0252100840
1998-12-31, 95.500, 98.000, QS, -0.0255102041
1998-12-31, 34.000, 34.280, HO, -0.0081680280
1999-01-29, 100.000, 100.500, QS, -0.0049751244
1999-01-29, 33.160, 33.420, HO, -0.0077797726
1999-02-26, 100.250, 100.250, QS, 0.0000000000
1999-02-26, 32.290, 32.370, HO, -0.0024714242
1999-02-26, 10.880, 11.000, CO, -0.0109090909
1999-03-31, 131.500, 130.750, QS, 0.0057361377
1999-03-31, 44.680, 44.000, HO, 0.0154545455
1999-03-31, 15.240, 15.160, CO, 0.0052770449", sep = ",", header = T)
desired <- read.table(text = "Dates,Last.Price,Last.Price.1,id,carry,weights,w
1998-11-30,94.5,98.5, QS,-0.0406091371,0.616979909839741,NA
1998-11-30,31.32,32.13, HO,-0.025210084,0.383020090160259,NA
1998-12-31,95.5,98, QS,-0.0255102041,0.757468623182272,0.616979909839741
1998-12-31,34,34.28, HO,-0.008168028,0.242531376817728,0.383020090160259
1999-01-29,100,100.5, QS,-0.0049751244,0.390056023188584,0.757468623182272
1999-01-29,33.16,33.42, HO,-0.0077797726,0.609943976811416,0.242531376817728
1999-02-26,100.25,100.25, QS,0,NA,0.390056023188584
1999-02-26,32.29,32.37, HO,-0.0024714242,0.184703218189261,0.609943976811416
1999-02-26,10.88,11, CO,-0.0109090909,0.815296781810739,NA
1999-03-31,131.5,130.75, QS,0.057361377,0.057361377,NA
1999-03-31,44.68,44, HO,0.0154545455,0.0154545455,0.184703218189261
1999-03-31,15.24,15.16, CO,0.0052770449,0.0052770449,0.815296782", sep = ",", header = TRUE)
weight function:
weight <- function(vec) {
  neg <- which(vec < 0)
  w <- vec
  w[neg] <- vec[vec < 0] / sum(vec[vec < 0])
  w[-neg] <- vec[vec >= 0] / sum(vec[vec >= 0])
  w
}
contracts %>%
  group_by(Dates) %>%
  mutate(weights = weight(carry)) %>%
  arrange(Dates) %>%
  group_by(id) %>%
  mutate(w = dplyr::lag(weights)) %>%
  ungroup()
# # A tibble: 12 x 7
# Dates Last.Price Last.Price.1 id carry weights w
# <chr> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
# 1 1998-11-30 94.5 98.5 " QS" -0.0406 0.617 NA
# 2 1998-11-30 31.3 32.1 " HO" -0.0252 0.383 NA
# 3 1998-12-31 95.5 98 " QS" -0.0255 0.757 0.617
# 4 1998-12-31 34 34.3 " HO" -0.00817 0.243 0.383
# 5 1999-01-29 100 100. " QS" -0.00498 0.390 0.757
# 6 1999-01-29 33.2 33.4 " HO" -0.00778 0.610 0.243
# 7 1999-02-26 100. 100. " QS" 0 NaN 0.390
# 8 1999-02-26 32.3 32.4 " HO" -0.00247 0.185 0.610
# 9 1999-02-26 10.9 11 " CO" -0.0109 0.815 NA
# 10 1999-03-31 132. 131. " QS" 0.00574 0.00574 NaN
# 11 1999-03-31 44.7 44 " HO" 0.0155 0.0155 0.185
# 12 1999-03-31 15.2 15.2 " CO" 0.00528 0.00528 0.815
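Two quirks in this output come from the weight() function itself: in February, QS is NaN because its carry of 0 is divided by a zero sum, and in March (all carries positive) the weights equal the raw carry values, because neg is integer(0) and w[-integer(0)] assigns to nothing. If you want the all-positive (or all-negative) months normalized too, here is a sketch of a more defensive variant (the February 0/0 NaN would still need its own rule):
weight2 <- function(vec) {
  w <- vec
  neg <- vec < 0
  if (any(neg))  w[neg]  <- vec[neg]  / sum(vec[neg])    # negatives sum to 1
  if (any(!neg)) w[!neg] <- vec[!neg] / sum(vec[!neg])   # non-negatives sum to 1
  w
}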
Notes:
I used dplyr::lag instead of just lag because of the possibility of confusion with stats::lag, which behaves significantly differently from dplyr::lag. While most of the time it'll work just fine, it works until it doesn't ... and it usually doesn't warn you :-)
This lags by row within id regardless of month. I'll assume that you are certain the Dates are always perfectly regular. If you think there's the possibility of a gap (where lagging by row is not correct), then you'll need to break the year/month out into a new field and join the table on itself instead of doing a lag, as sketched below.
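A sketch of that self-join, in case it's needed: build a year/month key, shift it forward one month on a copy, and join back by id (this assumes at most one row per id per month):
library(dplyr)
library(lubridate)
wts <- contracts %>%
  mutate(Dates = ymd(Dates)) %>%
  group_by(Dates) %>%
  mutate(weights = weight(carry)) %>%
  ungroup() %>%
  mutate(ym = floor_date(Dates, "month"))
wts %>%
  # a November weight relabelled ym = December matches the December rows on join
  left_join(wts %>% transmute(id, ym = ym %m+% months(1), w = weights),
            by = c("id", "ym")) %>%
  select(-ym)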

How to pull last nth trading day of month in XTS in R?

This question is closely related to Pull nth Day of Month in XTS in R, in which I got a good answer:
library(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)
do.call(rbind, lapply(split(x, "months"), function(x) x[10]))
# Open High Low Close
# 2007-01-11 49.88529 50.23910 49.88529 50.23910
# 2007-02-10 50.68923 50.72696 50.60707 50.69562
# 2007-03-10 49.79370 49.88984 49.70385 49.88698
# 2007-04-10 49.55704 49.78776 49.55704 49.76984
# 2007-05-10 48.83479 48.84549 48.38001 48.38001
# 2007-06-10 47.74899 47.74899 47.28685 47.28685
However, I want to pull the last nth trading day of each month. For example, I have a dataframe that looks like this, but the time span is several years.
date change open high low close volume
1 1990-01-02 1.780 353.40 359.69 351.98 359.69 162070000
2 1990-01-03 -0.259 359.69 360.59 357.89 358.76 192330000
3 1990-01-04 -0.861 358.76 358.76 352.89 355.67 177000000
4 1990-01-05 -0.976 355.67 355.67 351.35 352.20 158530000
5 1990-01-08 0.451 352.20 354.24 350.54 353.79 140110000
6 1990-01-09 -1.179 353.83 354.17 349.61 349.62 155210000
7 1990-01-10 -0.661 349.62 349.62 344.32 347.31 175990000
8 1990-01-11 0.351 347.31 350.14 347.31 348.53 154390000
9 1990-01-12 -2.468 348.53 348.53 339.49 339.93 183880000
10 1990-01-15 -0.862 339.93 339.94 336.57 337.00 140590000
11 1990-01-16 1.113 337.00 340.75 333.37 340.75 186070000
12 1990-01-17 -0.983 340.77 342.01 336.26 337.40 170470000
13 1990-01-18 0.234 337.40 338.38 333.98 338.19 178590000
14 1990-01-19 0.284 338.19 340.48 338.19 339.15 185590000
15 1990-01-22 -2.586 339.14 339.96 330.28 330.38 148380000
16 1990-01-23 0.372 330.38 332.76 328.67 331.61 179300000
17 1990-01-24 -0.407 331.61 331.71 324.17 330.26 207830000
I want a function that extracts the last nth trading day of every month and forms a new dataframe. For example, say I want to extract the 2nd-to-last trading day of every month; the output should look like the table shown below.
date change open high low close volume
1990-01-30 -0.683 325.20 325.73 319.83 322.98 186030000
1990-02-27 0.484 328.68 331.94 328.47 330.26 152590000
1990-03-29 -0.354 342.00 342.07 339.77 340.79 132190000
1990-04-27 -1.144 332.92 333.57 328.71 329.11 130630000
Note that I want to extract the last nth data point of each month, rather than the last nth calendar date.
You could use tail.
n <- 2
do.call(rbind, lapply(split(x, "months"), function(x) tail(x, n)))
# Open High Low Close
# 2007-01-30 49.85477 50.02180 49.77242 50.02180
# 2007-01-31 50.07049 50.22578 50.07049 50.22578
# 2007-02-27 50.74333 50.78909 50.61874 50.69206
# 2007-02-28 50.69435 50.77091 50.59881 50.77091
# 2007-03-30 48.74562 49.00218 48.74562 48.93546
# 2007-03-31 48.95616 49.09728 48.95616 48.97490
# 2007-04-29 49.30289 49.30289 49.05676 49.13529
# 2007-04-30 49.13825 49.33974 49.11500 49.33974
# 2007-05-30 47.78866 47.93267 47.78866 47.83291
# 2007-05-31 47.82845 47.84044 47.73780 47.73780
# 2007-06-29 47.63629 47.77563 47.61733 47.66471
# 2007-06-30 47.67468 47.94127 47.67468 47.76719
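Note that tail(x, n) keeps the last n rows of each month, while the desired output has only the single nth-from-last row per month. To keep just that row (assuming every month has at least n rows):
n <- 2
do.call(rbind, lapply(split(x, "months"), function(m) m[nrow(m) - n + 1]))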
Data
library(xts)
data(sample_matrix)
x <- as.xts(sample_matrix)

Reshaping data for panel regression from Datastream

I have downloaded data from Datastream in the form of one variable per sheet.
[Screenshot: current data view, one variable (Price) per sheet]
What I want to do is to convert each sheet (each variable) into panel format so that I can use plm() or export the data to Stata (I am kind of new to R), so that it looks like:
[Screenshot: the expected panel layout]
One conundrum is that I have >500 companies, and manually writing the names (or codes) in the R code is very burdensome.
I would really appreciate it if you could sketch basic code and not just refer me to the reshape function in R.
P.S. Sorry for posting this question if it was already answered.
Your current data set is in wide format and you need it in long format; the melt function from the reshape2 package will do this very well.
The primary key for melt is the date, since it is the same for all companies.
I have assumed a test dataset for the demo below:
# Save Price, volume, market value, shares, etc. into individual CSV files
# Rename the first column to "date" and remove rows 2 and 3 since you do not need them
# Demo for price data
price_data = read.csv("path_to_price_csv_file", header=TRUE, stringsAsFactors=FALSE, na.strings="NA")
test_DF = price_data
require(reshape2)
require(PerformanceAnalytics)
data(managers)
test_DF = data.frame(date=as.Date(index(managers), format="%Y-%m-%d"), managers,
                     row.names=NULL, stringsAsFactors=FALSE)
#This data is similar in format as your price data
head(test_DF)
# date HAM1 HAM2 HAM3 HAM4 HAM5 HAM6 EDHEC.LS.EQ SP500.TR US.10Y.TR US.3m.TR
# 1 1996-01-31 0.0074 NA 0.0349 0.0222 NA NA NA 0.0340 0.00380 0.00456
# 2 1996-02-29 0.0193 NA 0.0351 0.0195 NA NA NA 0.0093 -0.03532 0.00398
# 3 1996-03-31 0.0155 NA 0.0258 -0.0098 NA NA NA 0.0096 -0.01057 0.00371
# 4 1996-04-30 -0.0091 NA 0.0449 0.0236 NA NA NA 0.0147 -0.01739 0.00428
# 5 1996-05-31 0.0076 NA 0.0353 0.0028 NA NA NA 0.0258 -0.00543 0.00443
# 6 1996-06-30 -0.0039 NA -0.0303 -0.0019 NA NA NA 0.0038 0.01507 0.00412
#test_data = test_DF #replace price, volume , shares dataset here
#dateColumnName = "date" #name of your date column
#columnOfInterest1 = "manager" #for you this will be "Name"
#columnOfInterest2 = "return" #this will vary according to your input data, price, volume, shares etc.
Custom_Melt_DataFrame = function(test_data = test_DF, dateColumnName = "date",
                                 columnOfInterest1 = "manager", columnOfInterest2 = "return") {
  molten_DF = melt(test_data, id.vars = dateColumnName)
  colnames(molten_DF) = c(dateColumnName, columnOfInterest1, columnOfInterest2)
  # format as character
  molten_DF[, columnOfInterest1] = as.character(molten_DF[, columnOfInterest1])
  # assign index
  molten_DF$index = rep(1:(ncol(test_data) - 1), each = nrow(test_data))
  # reorder columns
  molten_DF = molten_DF[, c("index", columnOfInterest1, dateColumnName, columnOfInterest2)]
  return(molten_DF)
}
custom_data = Custom_Melt_DataFrame(test_data = test_DF, dateColumnName = "date",
                                    columnOfInterest1 = "manager", columnOfInterest2 = "return")
head(custom_data,10)
# index manager date return
# 1 1 HAM1 1996-01-31 0.0074
# 2 1 HAM1 1996-02-29 0.0193
# 3 1 HAM1 1996-03-31 0.0155
# 4 1 HAM1 1996-04-30 -0.0091
# 5 1 HAM1 1996-05-31 0.0076
# 6 1 HAM1 1996-06-30 -0.0039
# 7 1 HAM1 1996-07-31 -0.0231
# 8 1 HAM1 1996-08-31 0.0395
# 9 1 HAM1 1996-09-30 0.0147
# 10 1 HAM1 1996-10-31 0.0288
tail(custom_data,10)
# index manager date return
# 1311 10 US.3m.TR 2006-03-31 0.00385
# 1312 10 US.3m.TR 2006-04-30 0.00366
# 1313 10 US.3m.TR 2006-05-31 0.00404
# 1314 10 US.3m.TR 2006-06-30 0.00384
# 1315 10 US.3m.TR 2006-07-31 0.00423
# 1316 10 US.3m.TR 2006-08-31 0.00441
# 1317 10 US.3m.TR 2006-09-30 0.00456
# 1318 10 US.3m.TR 2006-10-31 0.00381
# 1319 10 US.3m.TR 2006-11-30 0.00430
# 1320 10 US.3m.TR 2006-12-31 0.00441
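For comparison, the same wide-to-long reshape can be sketched with tidyr's pivot_longer (assuming price_data from the read.csv step above, with a date column followed by one column per company):
library(dplyr)
library(tidyr)
long_prices <- price_data %>%
  pivot_longer(-date, names_to = "Name", values_to = "price") %>%
  arrange(Name, date)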

How to convert daywise (daily) data to monthly data using R? [duplicate]

This question already has answers here:
Aggregate Daily Data to Month/Year intervals
(9 answers)
Closed 7 years ago.
I have 15 years of day-wise interest rate data, from 01-01-2000 to 01-01-2015.
I want to convert this to monthly data, keyed only by month and year.
I want to take the mean of the values of all the days in a month and make that the single value for the month.
How can I do this in R?
> str(mibid)
'data.frame': 4263 obs. of 6 variables:
$ Days: int 1 2 3 4 5 6 7 8 9 10 ...
$ Date: Date, format: "2000-01-03" "2000-01-04" "2000-01-05" "2000-01-06" ...
$ BID : num 8.82 8.82 8.88 8.79 8.78 8.8 8.81 8.82 8.86 8.78 ...
$ I.S : num 0.092 0.0819 0.0779 0.0801 0.074 0.0766 0.0628 0.0887 0.0759 0.073 ...
$ BOR : num 9.46 9.5 9.52 9.36 9.33 9.37 9.42 9.39 9.4 9.33 ...
$ R.S : num 0.0822 0.0817 0.0828 0.0732 0.084 0.0919 0.0757 0.0725 0.0719 0.0564 ...
> head(mibid)
Days Date BID I.S BOR R.S
1 1 2000-01-03 8.82 0.0920 9.46 0.0822
2 2 2000-01-04 8.82 0.0819 9.50 0.0817
3 3 2000-01-05 8.88 0.0779 9.52 0.0828
4 4 2000-01-06 8.79 0.0801 9.36 0.0732
5 5 2000-01-07 8.78 0.0740 9.33 0.0840
6 6 2000-01-08 8.80 0.0766 9.37 0.0919
>
I'd do this with xts:
set.seed(21)
mibid <- data.frame(Date=Sys.Date()-100:1,
                    BID=rnorm(100, 8, 0.1), I.S=rnorm(100, 0.08, 0.01),
                    BOR=rnorm(100, 9, 0.1), R.S=rnorm(100, 0.08, 0.01))
require(xts)
# convert to xts
xmibid <- xts(mibid[,-1], mibid[,1])
# aggregate
agg_xmibid <- apply.monthly(xmibid, colMeans)
# convert back to data.frame
agg_mibid <- data.frame(Date=index(agg_xmibid), agg_xmibid, row.names=NULL)
head(agg_mibid)
# Date BID I.S BOR R.S
# 1 2015-04-30 8.079301 0.07189111 9.074807 0.06819096
# 2 2015-05-31 7.987479 0.07888328 8.999055 0.08090253
# 3 2015-06-30 8.043845 0.07885779 9.018338 0.07847999
# 4 2015-07-31 7.990822 0.07799489 8.980492 0.08162038
# 5 2015-08-07 8.000414 0.08535749 9.044867 0.07755017
A small example of how this might be done using dplyr and lubridate:
set.seed(321)
dat <- data.frame(day=seq.Date(as.Date("2010-01-01"), length.out=200, by="day"),
                  x = rnorm(200),
                  y = rexp(200))
head(dat)
day x y
1 2010-01-01 1.7049032 2.6286754
2 2010-01-02 -0.7120386 0.3916089
3 2010-01-03 -0.2779849 0.1815379
4 2010-01-04 -0.1196490 0.1234461
5 2010-01-05 -0.1239606 2.2237404
6 2010-01-06 0.2681838 0.3217511
require(dplyr)
require(lubridate)
dat %>%
  mutate(year = year(day),
         monthnum = month(day),
         month = month(day, label=TRUE)) %>%
  group_by(year, month) %>%
  arrange(year, monthnum) %>%
  select(-monthnum) %>%
  summarise(x = mean(x),
            y = mean(y))
Source: local data frame [7 x 4]
Groups: year
year month x y
1 2010 Jan 0.02958633 0.9387509
2 2010 Feb 0.07711820 1.0985411
3 2010 Mar -0.06429982 1.2395438
4 2010 Apr -0.01787658 1.3627864
5 2010 May 0.19131861 1.1802712
6 2010 Jun -0.04894075 0.8224855
7 2010 Jul -0.22410057 1.1749863
Another option is data.table, which has several very convenient datetime functions. Using the data of #SamThomas:
library(data.table)
setDT(dat)[, lapply(.SD, mean), by=.(year(day), month(day))]
this gives:
year month x y
1: 2010 1 0.02958633 0.9387509
2: 2010 2 0.07711820 1.0985411
3: 2010 3 -0.06429982 1.2395438
4: 2010 4 -0.01787658 1.3627864
5: 2010 5 0.19131861 1.1802712
6: 2010 6 -0.04894075 0.8224855
7: 2010 7 -0.22410057 1.1749863
On the data of #JoshuaUlrich:
setDT(mibid)[, lapply(.SD, mean), by=.(year(Date), month(Date))]
gives:
year month BID I.S BOR R.S
1: 2015 5 7.997178 0.07794925 8.999625 0.08062426
2: 2015 6 8.034805 0.07940600 9.019823 0.07823314
3: 2015 7 7.989371 0.07822263 8.996015 0.08195401
4: 2015 8 8.010541 0.08364351 8.982793 0.07748399
If you want the names of the months instead of numbers, you will have to convert the date column with as.IDate() after the setDT() part and use months instead of month, e.g. on the mibid data:
setDT(mibid)[, Date:=as.IDate(Date)][, lapply(.SD, mean), by=.(year(Date), months(Date))]
Note: especially on larger datasets, data.table will probably be (a lot) faster than the other two solutions.
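For completeness, the same monthly mean can also be sketched in base R with aggregate(), no extra packages needed (assuming the mibid data above):
agg <- aggregate(mibid[c("BID", "I.S", "BOR", "R.S")],
                 by = list(month = format(mibid$Date, "%Y-%m")),
                 FUN = mean)
head(agg)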
