Related to this question here, but I decided to ask another question for the sake of clarity, as the 'new' question is not directly related to the original. Briefly, I am using ddply to cumulatively sum a value for each of three years. My code takes data from the first year and repeats it in the second- and third-year rows of the column. My guess is that each 1-year chunk is being copied to the whole of the column, but I don't understand why.
Q. How can I get a cumulatively summed value for each year, in the right rows of the designated column?
[Edit: the for loop - or something similar - is important, as ultimately I want to automagically calculate new columns based on a list of column names, rather than calculating each new column by hand. The loop iterates over the list of column names.]
I use the ddply and cumsum combination frequently so it is rather vexing to suddenly be having problems with it.
[Edit: this code has been updated to the solution I settled on, which is based on @Chase's answer below]
require(lubridate)
require(plyr)
require(xts)
require(reshape)
require(reshape2)
set.seed(12345)
# create dummy time series data
monthsback <- 24
startdate <- as.Date(paste(year(now()),month(now()),"1",sep = "-")) - months(monthsback)
mydf <- data.frame(mydate = seq(as.Date(startdate), by = "month", length.out = monthsback),
myvalue1 = runif(monthsback, min = 600, max = 800),
myvalue2 = runif(monthsback, min = 1900, max = 2400),
myvalue3 = runif(monthsback, min = 50, max = 80),
myvalue4 = runif(monthsback, min = 200, max = 300))
mydf$year <- as.numeric(format(as.Date(mydf$mydate), format="%Y"))
mydf$month <- as.numeric(format(as.Date(mydf$mydate), format="%m"))
# Select columns to process
newcolnames <- c('myvalue1','myvalue4','myvalue2')
# melt n' cast
mydf.m <- mydf[,c('mydate','year',newcolnames)]
mydf.m <- melt(mydf.m, measure.vars = newcolnames)
mydf.m <- ddply(mydf.m, c("year", "variable"), transform, newcol = cumsum(value))
mydf.m <- dcast(mydate ~ variable, data = mydf.m, value.var = "newcol")
colnames(mydf.m) <- c('mydate',paste(newcolnames, "_cum", sep = ""))
mydf <- merge(mydf, mydf.m, by = 'mydate', all = FALSE)
mydf
I don't really follow your for loop there, but are you overcomplicating things? Can't you just directly use transform and ddply?
#Make sure it's ordered properly
mydf <- mydf[order(mydf$year, mydf$month),]
#Use ddply to calculate the cumsum by year:
ddply(mydf, "year", transform,
cumsum1 = cumsum(myvalue1),
cumsum2 = cumsum(myvalue2))
#----------
mydate myvalue1 myvalue2 year month cumsum1 cumsum2
1 2010-05-01 744.1808 264.4543 2010 5 744.1808 264.4543
2 2010-06-01 775.1546 238.9828 2010 6 1519.3354 503.4371
3 2010-07-01 752.1965 269.8544 2010 7 2271.5319 773.2915
....
9 2011-01-01 745.5411 218.7712 2011 1 745.5411 218.7712
10 2011-02-01 797.9474 268.1834 2011 2 1543.4884 486.9546
11 2011-03-01 606.9071 237.0104 2011 3 2150.3955 723.9650
...
21 2012-01-01 690.7456 225.9681 2012 1 690.7456 225.9681
22 2012-02-01 665.3505 232.1225 2012 2 1356.0961 458.0906
23 2012-03-01 793.0831 206.0195 2012 3 2149.1792 664.1101
EDIT - this is untested as I don't have R on this machine, but this is what I had in mind:
require(reshape2)
mydf.m <- melt(mydf, measure.vars = newcolnames)
mydf.m <- ddply(mydf.m, c("year", "variable"), transform, newcol = cumsum(value))
dcast(mydate + year + month ~ variable, data = mydf.m, value.var = "newcol")
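If a loop over a list of column names is still wanted, here is a minimal base R sketch of the same per-year cumulative sum. The toy data frame and the "_cum" suffix are assumptions for illustration only; ave() applies cumsum within each year and returns the values in the original row order.

```r
# Hypothetical toy data: two years, three months each
toy <- data.frame(year = rep(c(2010, 2011), each = 3),
                  myvalue1 = c(1, 2, 3, 10, 20, 30),
                  myvalue2 = c(5, 5, 5, 1, 1, 1))

# Columns to process (assumed names)
newcolnames <- c("myvalue1", "myvalue2")

# For each column, add a "<name>_cum" column holding the cumulative sum within each year
for (col in newcolnames) {
  toy[[paste0(col, "_cum")]] <- ave(toy[[col]], toy$year, FUN = cumsum)
}
toy
```

This assumes the rows are already ordered by date within each year, as in the ordering step above.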
I am new to data tables in R and have managed to get 80% of the way through my analysis. The background is that I want to get the returns of a stock 5 days (before and after), and then 25 and 45 days after they report. I have successfully managed to do it for one set of dates (effectively hardcoding) but when I try and automate the process it falls apart.
I will start with my current formulas and then explain the data.
This formula successfully looks at the data tables and returns the sum that I need. The issue is that datem5 and V1 need to go through a loop (or mapply) to automate the process.
CQR_Date[CQR_DF[CQR_Date, sum(CQR), on = .(unit, date >= date1, date <= datem5),
by = .EACHI], newvar := V1, on = .(unit, date1=date)]
I tried this (along with many other variants). Please note the newvar needs to be addressed as well.
for (i in 1:4) {
CQR_Date[CQR_DF[CQR_Date, sum(CQ), on = .(unit, date >= date1, date <= cols[,..i]),
by = .EACHI], newvar := v, on = .(unit, date1=date)]
}
but get this error
Error: argument specifying columns specify non existing column(s): cols[3]='cols[, ..i]'
Interestingly, when I try
for (i in 1:2) {
y <- cols[,..i]}
There is no issue.
Now in terms of data:
cols just contains the column headings that I need from CQ_Date
cols <- data.table("datem5", "datep5", "datep20" , "datep45")
CQ_Date has the reporting dates for the stock CQ, such as the following
CQ_Date <- data.frame("date1" = anydate(c("2016-02-17", "2016-06-12", "2016-08-17")))
CQ_Date$datem5 <- CQ_Date$date1 - 5 # minus five days
CQ_Date$datep5 <- CQ_Date$date1 + 5 # plus five days
CQ_Date$datep20 <- CQ_Date$date1 + 20
CQ_Date$datep45 <- CQ_Date$date1 + 45
CQ_Date$unit <- 1 # I guess I need this for some sort of indexing
Then CQ_DF (it is the log returns for the stock) is formed by:
CQ_DF <- data.frame("unit" = rep(1,300))
CQ_DF$CQ <- rnorm(10)
CQ_DF$date <- seq(as.Date("2015-12-25"), by = "day", length.out = 300)
CQ_DF$unit <- 1
Before setting them as DT
setDT(CQ_DF)
setDT(CQ_Date)
Any help would be greatly appreciated. Note this uses
library(data.table)
library(anytime)
A simplified version is:
CQ_Date <- data.frame("date1" = c(10, 20))
CQ_Date$datep5 <- CQ_Date$date1 + 5 # plus five days
CQ_Date$datep20 <- CQ_Date$date1 + 10
CQ_Date$unit <- 1
CQ_DF <- data.frame("unit" = rep(1,100))
CQ_DF$CQ <- seq(1, by = 1, length.out = 100)
CQ_DF$date <- seq(1, by = 1, length.out = 100)
CQ_DF$unit <- 1
setDT(CQ_DF)
setDT(CQ_Date)
cols <- c("datep5", "datep20" )
tmp <- melt(CQ_Date, measure.vars = cols)
setDT(tmp)
tmp[CQ_DF[tmp, sum(CQ), on = .( unit, date >= date1, date <= value), by =
.EACHI],newvar := V1, on = .(unit, date1=date )]
The issue is now that the sum does not appear to work correctly. It may have something to do with "variable" variable.
Instead of using mapply or a for loop, try reshaping the dataset to long format using melt, create a sequence between the two numbers, perform the join and calculate the sum.
library(data.table)
cols <- c("datep5", "datep20" )
tmp <- melt(CQ_Date, measure.vars = cols)
tmp <- tmp[, list(date = seq(date1, value)), .(unit, variable, date1, value)]
tmp <- merge(tmp, CQ_DF, by = c('unit', 'date'))
tmp[, .(newvar = sum(CQ)), .(unit, variable, date1)]
# unit variable date1 newvar
#1: 1 datep5 10 75
#2: 1 datep20 10 165
#3: 1 datep5 20 135
#4: 1 datep20 20 275
If you need the data back in wide format you can use dcast.
An equivalent tidyverse option is:
library(tidyverse)
CQ_Date %>%
pivot_longer(cols = all_of(cols)) %>%
mutate(date = map2(date1, value, seq)) %>%
unnest(date) %>%
left_join(CQ_DF, by = c('unit', 'date')) %>%
group_by(unit, name, date1) %>%
summarise(newvar = sum(CQ))
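As a cross-check of the numbers above, here is a minimal base R sketch that loops over the column names directly. It rebuilds the simplified data from the question; window_sum and the "sum_" prefix are hypothetical names, and this is only a sanity check on small data, not a replacement for the join.

```r
# Simplified data from the question, rebuilt as plain data frames
CQ_Date <- data.frame(unit = 1, date1 = c(10, 20))
CQ_Date$datep5 <- CQ_Date$date1 + 5
CQ_Date$datep20 <- CQ_Date$date1 + 10
CQ_DF <- data.frame(unit = 1, date = 1:100, CQ = 1:100)
cols <- c("datep5", "datep20")

# Hypothetical helper: sum CQ for one unit between two dates (inclusive)
window_sum <- function(unit, from, to) {
  sum(CQ_DF$CQ[CQ_DF$unit == unit & CQ_DF$date >= from & CQ_DF$date <= to])
}

# One new column per end-date column
for (col in cols) {
  CQ_Date[[paste0("sum_", col)]] <-
    mapply(window_sum, CQ_Date$unit, CQ_Date$date1, CQ_Date[[col]])
}
CQ_Date
```

The sums agree with the melt/merge result (75 and 165 for date1 = 10, 135 and 275 for date1 = 20), but the result stays in wide format.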
I have a data frame with 6000 locations. For each location, I have 36 years of daily rainfall data in wide format.
A sample data:
library(data.table)
set.seed(123)
mat <- matrix(round(rnorm(6000*36*365), digits = 2),nrow = 6000*36, ncol = 365)
dat <- data.table(mat)
names(dat) <- rep(paste0("d_",1:365))
dat$loc.id <- rep(1:6000, each = 36)
dat$year <- rep(1980:2015, times = 6000)
What I want to do is, for each location, generate the long-term average rainfall for each month. For example, for loc.id = 1, the mean rainfall in Jan, Feb, March, ..., Dec.
Let's say this data is called dat, which is a data.table.
library(dplyr)
library(tidyr) # for gather()
Here's what I did:
loc.list <- unique(dat$loc.id)
my.list <- list() # a list to store results
ptm <- proc.time()
for(i in seq_along(loc.list)){
n <- loc.list[i]
df1 <- dat[dat$loc.id == n,]
df2 <- gather(df1, day, rain, -year) # this melts the data in long format
df3 <- df2 %>% mutate(day = gsub("d_","", day)) %>% # the day column is in "d_1" format, so strip the prefix to get 1, 2, 3, ..., 365
mutate(day = as.numeric(as.character(day))) %>% # ensure that the day column is numeric. For some reason, some NAs appear.
arrange(year,day) %>% # ensure that they are arranged in order
mutate(month = strptime(paste(year, day), format = "%Y %j")$mon + 1) %>% # assign each day to a month
group_by(year,month) %>% # group by year and month
summarise(month.rain = sum(rain)) %>% # calculate for each location, year and month, total rainfall
group_by(month) %>% # group by month
summarise(month.mean = round(mean(month.rain), digits = 2)) # calculate for each month, the long term mean
my.list[[i]] <- df3
}
proc.time() - ptm
user system elapsed
1036.17 0.20 1040.68
I wanted to ask if there is a more efficient and faster way to achieve this task.
Another data.table alternative:
# change column names to month, grabbed from 365 dates of a non-leap year
setnames(dat, c(format(as.Date("2017-01-01") + 0:364, "%b"),
"loc.id", "year"))
# melt to long format
d <- melt(dat, id.vars = c("loc.id", "year"),
variable.name = "month", value.name = "rain")
# calculate mean rain by location and month
d2 <- d[ , .(mean_rain = mean(rain)), by = .(loc.id, month)]
This seems ~7 times faster than the answer by caw5cs. The result by Martin Morgan is in a different format though, which prevents a direct comparison of timings.
If you would rather have unique column names in 'dat', you may use %b_%d (month-day) instead of %b only. Then use substr in by to grab the month part:
# change column names to month_day, using 365 dates of a non-leap year
setnames(dat, c(format(as.Date("2017-01-01") + 0:364, "%b_%d"),
"loc.id", "year"))
# melt to long format
d <- melt(dat, id.vars = c("loc.id", "year"),
variable.name = "month_day", value.name = "rain")
# calculate mean rain by location and month
d2 <- d[ , .(mean_rain = mean(rain)), by = .(loc.id, month = substr(month_day, 1, 3))]
Use the cryptically named rowsum() to sum daily rainfall at each site, over all years
loc.id = rep(1:6000, each = 36)
daily.by.loc = rowsum(mat, loc.id)
and use the same trick on the transposed matrix to sum by month (since there are 365 columns leap years must be ignored).
month = factor(
months(as.Date(0:364, origin="1970-01-01")),
levels = month.name
)
loc.by.month = rowsum(t(daily.by.loc), month)
Calculate the average by dividing by number of observations; R's column-major matrix representation and recycling rules apply. Transpose so the orientation is the same as the data.
days.per.month = tabulate(month)
ans = t(loc.by.month / (36 * days.per.month))
The result is a 6000 x 12 matrix
> dim(ans)
[1] 6000 12
> head(ans, 3)
January February March April May June
1 0.01554659 0.002043651 -0.02950717 -0.02700926 0.003521505 -0.011268519
2 0.04953405 0.032926587 -0.04959677 0.02808333 0.022051971 0.009768519
3 -0.01125448 -0.023343254 -0.02672939 0.04012963 0.018530466 0.035583333
July August September October November December
1 0.009874552 -0.030824373 -0.04958333 -0.03366487 -0.07390741 -0.07899642
2 -0.011630824 -0.003369176 -0.00100000 -0.00594086 -0.02817593 -0.01161290
3 0.031810036 0.059641577 -0.01109259 0.04646953 -0.01601852 0.03103943
in less than a second.
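To see what the two rowsum() calls and the final division are doing, here is a small sketch on a hypothetical 4 x 4 matrix: 2 locations with 2 years each (rows), and 4 "days" split into two made-up periods "a" and "b" (columns). The names and sizes are assumptions chosen so the result can be checked by hand.

```r
# Hypothetical toy input: rows are location-years, columns are days
mat <- matrix(1:16, nrow = 4, ncol = 4)
loc.id <- rep(1:2, each = 2)               # rows 1-2: location 1, rows 3-4: location 2
period <- factor(c("a", "a", "b", "b"))    # columns 1-2: period a, columns 3-4: period b
n.years <- 2

# Step 1: sum over years within each location (2 x 4)
by.loc <- rowsum(mat, loc.id)

# Step 2: transpose and sum over days within each period (2 periods x 2 locations)
by.period <- rowsum(t(by.loc), period)

# Step 3: divide each period's row by its number of observations, then transpose back
ans <- t(by.period / (n.years * tabulate(period)))
ans
```

Each cell of ans is the mean of the corresponding block of mat; for example, the location 1 / period a entry equals mean(mat[1:2, 1:2]), which is 3.5.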
Grossly misread the question the first time, oops! Seems to be working as intended this time.
library(data.table)
set.seed(123)
mat <- matrix(round(rnorm(6000*36*365), digits = 2),nrow = 6000*36, ncol = 365)
dat <- data.table(mat)
names(dat) <- rep(paste0("d_",1:365))
dat$loc.id <- rep(1:6000, each = 36)
dat$year <- rep(1980:2015, times = 6000)
system.time({
# convert to long format with month # as column name
date_cols <- colnames(dat)[1:365]
setnames(dat, date_cols, as.character(1:365))
dat.long <- melt(dat, measure.vars=as.character(1:365), variable.name="day", value.name="rainfall")
# R date starts at 0 for Jan 1, so we offset the day by 1
dat.long[, day := as.numeric(day) - 1]
setkey(dat.long, year, day)
# Make table for merging year/day/month
months <- CJ(year=1980:2015, day=0:365)
months[, date := as.Date(day, origin=paste(year, "-01-01", sep=""))]
months[, month := tstrsplit(date, "-")[2]]
setkey(months, year, day)
# Merge tables to get month column
dat.merge <- merge(dat.long, months)
# aggregate by location and month
dat.ag <- dat.merge[, list(mean_rainfall = mean(rainfall)), by=list(loc.id, month)]
})
Yielding
user system elapsed
14.420 4.205 18.626
> dat.ag
loc.id month mean_rainfall
1: 1 01 0.015546595
2: 2 01 0.049534050
3: 3 01 -0.011254480
4: 4 01 -0.019453405
5: 5 01 0.005860215
---
71996: 5996 12 0.027407407
71997: 5997 12 0.020334237
71998: 5998 12 0.043360434
71999: 5999 12 -0.006856369
72000: 6000 12 0.040542005
My current data frame looks like this:
# Create sample data
my_df <- data.frame(seq(1, 100), rep(c("ind_1", "", "", ""), times = 25), rep(c("", "ind_2", "", ""), times = 25), rep(c("", "", "ind_3", ""), times = 25), rep(c("", "", "", "ind_4"), times = 25))
# Rename columns
names(my_df)[names(my_df)=="seq.1..100."] <- "value"
names(my_df)[names(my_df)=="rep.c..ind_1................times...25."] <- "ind_1"
names(my_df)[names(my_df)=="rep.c......ind_2............times...25."] <- "ind_2"
names(my_df)[names(my_df)=="rep.c..........ind_3........times...25."] <- "ind_3"
names(my_df)[names(my_df)=="rep.c..............ind_4....times...25."] <- "ind_4"
# Replace empty elements with NA
my_df[my_df==''] = NA
What I want to script is a rather simple for loop that calculates the sum of the value column for each of the four ind_* columns and prints the result.
So far my very meagre attempt has been:
# Create a vector with all individuals
individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")
# Calculate aggregates for each individual
for (i in individuals){
ind <- 1
sum_i <- aggregate(value~ind_1, data = my_df, sum)
print(paste("Individual", i, "possesses an aggregated value of", sum_i$value))
ind <- ind + 1
}
As you can see, I currently struggle to include the correct command to calculate the sum based on one column after another, as the current output, naturally, only calculates the result for ind_1. What needs to be changed in the aggregate command to achieve the desired result? (I'm a total beginner, but I thought of using indices to proceed from one column to the next.)
Assuming you'd want to calculate the sum where an ind column matches an entry in your individuals vector:
individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")
for (i in 1:(ncol(my_df)-1)){
print(sum(my_df$value[which(my_df[,individuals[i]] == individuals[i])]))
}
Why do you want to use print() instead of storing the results in a separate vector?
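For what it's worth, a minimal sketch of storing the results rather than printing them could look like this. It rebuilds the sample data with NA directly instead of empty strings (an assumption to keep the example short); sums is a hypothetical name.

```r
# Sample data as in the question, with NA where an individual is absent
my_df <- data.frame(value = 1:100,
                    ind_1 = rep(c("ind_1", NA, NA, NA), times = 25),
                    ind_2 = rep(c(NA, "ind_2", NA, NA), times = 25),
                    ind_3 = rep(c(NA, NA, "ind_3", NA), times = 25),
                    ind_4 = rep(c(NA, NA, NA, "ind_4"), times = 25))

individuals <- c("ind_1", "ind_2", "ind_3", "ind_4")

# Named vector: for each ind_* column, sum value over the rows where that column is not NA
sums <- sapply(individuals, function(ind) sum(my_df$value[!is.na(my_df[[ind]])]))
sums
```

The named vector can then be printed, indexed by name (sums["ind_3"]), or turned into a data frame.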
You can try tidyverse as well:
library(tidyverse)
my_df %>%
gather(key, Inds, -value) %>%
filter(!is.na(Inds)) %>%
group_by(key) %>%
summarise(Sum=sum(value))
# A tibble: 4 x 2
key Sum
<chr> <int>
1 ind_1 1225
2 ind_2 1250
3 ind_3 1275
4 ind_4 1300
The idea is to make the data long using gather, filter the NAs out, then group by key and summarize the values.
A more base R solution would be:
library(reshape2)
my_df_long <- melt(my_df, id.vars = "value",value.name = "ID")
aggregate(value ~ ID, my_df_long, sum, na.rm = TRUE)
ID value
1 ind_1 1225
2 ind_2 1250
3 ind_3 1275
4 ind_4 1300
I need to aggregate over multiple months in a data frame in R, e.g. a data frame whose dates span 2017 and 2018:
date category amt
1 2017-08-05 A 0.1900707
2 2017-08-06 B 0.2661277
3 2017-08-07 c 0.4763196
4 2017-08-08 A 0.5183718
5 2017-08-09 B 0.3021019
6 2017-08-10 c 0.3393616
What I want is to sum based on 6 month period and category:
period category sum
1 2017_secondPeriod A 25.00972
2 2018_firstPeriod A 25.59850
3 2017_secondPeriod B 24.96924
4 2018_firstPeriod B 24.79649
5 2017_secondPeriod c 20.17096
6 2018_firstPeriod c 27.01794
What I did:
1. select the last 6 months of 2017, like wise 2018
2. add a new column for each subset to indicate the period
3. Combine 2 subset again
4. aggregate
as following:
library(lubridate)
df <- data.frame(
date = today() + days(1:300),
category = c("A","B","c"),
amt = runif(300)
)
df2017_secondHalf <- subset(df, month(df$date) %in% c(7,8,9,10,11,12) & year(df$date) == 2017)
df2018_firstHalf <- subset(df, month(df$date) %in% c(1,2,3,4,5,6) & year(df$date) == 2018)
sum1 <- aggregate(df2017_secondHalf$amt, by=list(Category=df2017_secondHalf$category), FUN=sum)
sum2 <- aggregate(df2018_firstHalf$amt, by=list(Category=df2018_firstHalf$category), FUN=sum)
df2017_secondHalf$period <- '2017_secondPeriod'
df2018_firstHalf$period <- '2018_firstPeriod'
# combine the two subsets again before aggregating
df <- rbind(df2017_secondHalf, df2018_firstHalf)
aggregate(x = df$amt, by = df[c("period", "category")], FUN = sum)
I tried to figure it out but did not know how to aggregate over multiple months, e.g. 3 months or 6 months.
Thanks in advance
Any suggestions?
With lubridate and tidyverse (dplyr & magrittr)
First, let's create groups with Semesters, Quarter, and "Trimonthly".
library(tidyverse)
library(lubridate)
df <- df %>% mutate(Semester = semester(date, with_year = TRUE),
Quarter = quarter(date, with_year = TRUE),
Trimonthly = round_date(date, unit = "3 months" ))
Lubridate's semester() breaks dates into semesters and gives you a 1 (Jan-Jun) or 2 (Jul-Dec), prefixed with the year when with_year = TRUE (e.g. 2017.2); quarter() does a similar thing with quarters.
I add a third, the more basic round_date function, where you can specify your time frame in the form of size and time units. It rounds each date to the nearest boundary of such a time frame (floor_date() would give the first date of the period instead). I deliberately name it "Trimonthly" so you can see how it compares to quarter().
Pivot.Semester <- df %>%
group_by(Semester, category) %>%
summarise(Semester.sum = sum(amt))
Pivot.Quarter <- df %>%
group_by(Quarter, category) %>%
summarise(Quarter.sum = sum(amt))
Pivot.Trimonthly <- df %>%
group_by(Trimonthly, category) %>%
summarise(Trimonthly.sum = sum(amt))
Pivot.Semester
Pivot.Quarter
Pivot.Trimonthly
Optional: If you want to join the summarised data to the original DF.
df <- df %>% left_join(Pivot.Semester, by = c("category", "Semester")) %>%
left_join(Pivot.Quarter, by = c("category", "Quarter")) %>%
left_join(Pivot.Trimonthly, by = c("category", "Trimonthly"))
df
Here is a 3-line solution that uses no packages. Let k be the number of months in a period. For half-year periods k is 6. For quarter-year periods k would be 3, etc. Replace 02 in the sprintf format with 1 if you want one-digit suffixes (but not for monthly, since those must be two digits). Further modify the sprintf format if you want it to exactly match the question.
k <- 6
period <- with(as.POSIXlt(DF$date), sprintf("%d-%02d", year + 1900, (mon %/% k) + 1))
aggregate(amt ~ category + period, DF, sum)
giving:
category period amt
1 A 2017-02 0.7084425
2 B 2017-02 0.5682296
3 c 2017-02 0.8156812
At the expense of using one package we can simplify the quarterly and monthly calculations by replacing the formula for period with one of these:
library(zoo)
# quarterly
period <- as.yearqtr(DF$date)
# monthly
period <- as.yearmon(DF$date)
Note: The input in reproducible form is:
Lines <- "date category amt
1 2017-08-05 A 0.1900707
2 2017-08-06 B 0.2661277
3 2017-08-07 c 0.4763196
4 2017-08-08 A 0.5183718
5 2017-08-09 B 0.3021019
6 2017-08-10 c 0.3393616"
DF <- read.table(text = Lines)
DF$date <- as.Date(DF$date)
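As one possible way to make the labels match the question's "2017_secondPeriod" style for k = 6, here is a small sketch. The dates and the half variable are made up for illustration; only the label construction differs from the code above.

```r
# Hypothetical dates covering both halves of a year
dates <- as.Date(c("2017-08-05", "2018-01-15", "2018-06-30", "2018-07-01"))

k <- 6
lt <- as.POSIXlt(dates)

# mon is 0-based (0 = January), so mon %/% 6 is 0 for Jan-Jun and 1 for Jul-Dec
half <- c("firstPeriod", "secondPeriod")[lt$mon %/% k + 1]
period <- sprintf("%d_%s", lt$year + 1900, half)
period
```

The resulting period vector can be used in aggregate(amt ~ category + period, DF, sum) exactly as before.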
I don't often have to work with dates in R, but I imagine this is fairly easy. I have a column that represents a date in a dataframe. I simply want to create a new dataframe that summarizes a 2nd column by Month/Year using the date. What is the best approach?
I want a second dataframe so I can feed it to a plot.
Any help you can provide will be greatly appreciated!
EDIT: For reference:
> str(temp)
'data.frame': 215746 obs. of 2 variables:
$ date : POSIXct, format: "2011-02-01" "2011-02-01" "2011-02-01" ...
$ amount: num 1.67 83.55 24.4 21.99 98.88 ...
> head(temp)
date amount
1 2011-02-01 1.670
2 2011-02-01 83.550
3 2011-02-01 24.400
4 2011-02-01 21.990
5 2011-02-03 98.882
6 2011-02-03 24.900
I'd do it with lubridate and plyr, rounding dates down to the nearest month to make them easier to plot:
library(lubridate)
df <- data.frame(
date = today() + days(1:300),
x = runif(300)
)
df$my <- floor_date(df$date, "month")
library(plyr)
ddply(df, "my", summarise, x = mean(x))
There is probably a more elegant solution, but splitting into months and years with strftime() and then aggregate()ing should do it. Then reassemble the date for plotting.
x <- as.POSIXct(c("2011-02-01", "2011-02-01", "2011-02-01"))
mo <- strftime(x, "%m")
yr <- strftime(x, "%Y")
amt <- runif(3)
dd <- data.frame(mo, yr, amt)
dd.agg <- aggregate(amt ~ mo + yr, dd, FUN = sum)
dd.agg$date <- as.POSIXct(paste(dd.agg$yr, dd.agg$mo, "01", sep = "-"))
A bit late to the game, but another option would be using data.table:
library(data.table)
setDT(temp)[, .(mn_amt = mean(amount)), by = .(yr = year(date), mon = month(date))]
# or if you want to apply the 'mean' function to several columns:
# setDT(temp)[, lapply(.SD, mean), by=.(year(date), month(date))]
this gives:
yr mon mn_amt
1: 2011 2 42.610
2: 2011 3 23.195
3: 2011 4 61.891
If you want names instead of numbers for the months, you can use:
setDT(temp)[, date := as.IDate(date)
][, .(mn_amt = mean(amount)), by = .(yr = year(date), mon = months(date))]
this gives:
yr mon mn_amt
1: 2011 februari 42.610
2: 2011 maart 23.195
3: 2011 april 61.891
As you see this will give the month names in your system language (which is Dutch in my case).
Or using a combination of lubridate and dplyr:
library(dplyr)
library(lubridate)
temp %>%
group_by(yr = year(date), mon = month(date)) %>%
summarise(mn_amt = mean(amount))
Used data:
# example data (modified the OP's data a bit)
temp <- structure(list(date = structure(1:6, .Label = c("2011-02-01", "2011-02-02", "2011-03-03", "2011-03-04", "2011-04-05", "2011-04-06"), class = "factor"),
amount = c(1.67, 83.55, 24.4, 21.99, 98.882, 24.9)),
.Names = c("date", "amount"), class = c("data.frame"), row.names = c(NA, -6L))
You can do it as:
short.date = strftime(temp$date, "%Y/%m")
aggr.stat = aggregate(temp$amount ~ short.date, FUN = sum)
Just use xts package for this.
library(xts)
ts <- xts(temp$amount, as.Date(temp$date, "%Y-%m-%d"))
# convert daily data
ts_m = apply.monthly(ts, FUN)
ts_y = apply.yearly(ts, FUN)
ts_q = apply.quarterly(ts, FUN)
where FUN is a function which you aggregate data with (for example sum)
Here's a dplyr option:
library(dplyr)
df %>%
mutate(date = as.Date(date)) %>%
mutate(ym = format(date, '%Y-%m')) %>%
group_by(ym) %>%
summarize(ym_mean = mean(x))
I have a function monyr that I use for this kind of stuff:
monyr <- function(x)
{
x <- as.POSIXlt(x)
x$mday <- 1
as.Date(x)
}
n <- as.Date(1:500, "1970-01-01")
nn <- monyr(n)
You can change the as.Date at the end to as.POSIXct to match the date format in your data. Summarising by month is then just a matter of using aggregate/by/etc.
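A minimal sketch of that last step, using a small made-up temp data frame (the values are assumptions, not the OP's data) and aggregate() with sum:

```r
# Same helper as above: move every date to the first day of its month
monyr <- function(x)
{
  x <- as.POSIXlt(x)
  x$mday <- 1
  as.Date(x)
}

# Hypothetical data: two observations in February, two in March
temp <- data.frame(date = as.Date(c("2011-02-01", "2011-02-03",
                                    "2011-03-04", "2011-03-20")),
                   amount = c(1, 2, 10, 20))

# Group by the month-start date and sum the amounts
temp$month <- monyr(temp$date)
monthly <- aggregate(amount ~ month, data = temp, FUN = sum)
monthly
```

Because month is still a date, the summary can be passed straight to a plotting function with a date axis.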
One more solution:
rowsum(temp$amount, format(temp$date,"%Y-%m"))
For plot you could use barplot:
barplot(t(rowsum(temp$amount, format(temp$date,"%Y-%m"))), las=2)
Also, given that your time series seem to be in xts format, you can aggregate your daily time series to a monthly time series using the mean function like this:
d2m <- function(x) {
aggregate(x, format(as.Date(zoo::index(x)), "%Y-%m"), FUN=mean)
}