How to calculate the average year in R

I have a 20-year monthly xts time series:
Jan 1990 12.3
Feb 1990 45.6
Mar 1990 78.9
..
Jan 1991 34.5
..
Dec 2009 89.0
I would like to get the average (12-month) year, or
Jan xx
Feb yy
...
Dec kk
where xx is the average of every January, yy of every February, and so on.
I have tried apply.yearly and lapply, but these return a single value: the overall 20-year average.
Would you have any suggestions? I appreciate it.

The lubridate package could be useful for you. I would use the functions year() and month() in conjunction with aggregate():
library(xts)
library(lubridate)
#set up some sample data
dates = seq(as.Date('2000/01/01'), as.Date('2004/12/01'), by="month")
df = data.frame(rand1 = runif(length(dates)), rand2 = runif(length(dates)))
my_xts = xts(df, dates)
#get the mean by year
aggregate(my_xts$rand1, by=year(index(my_xts)), FUN=mean)
This outputs something like:
2000 0.5947939
2001 0.4968154
2002 0.4941752
2003 0.5291211
2004 0.6631564
To find the mean for each month you can do:
#get the mean by month
aggregate(my_xts$rand1, by=month(index(my_xts)), FUN=mean)
which will output something like
1 0.5560279
2 0.6352220
3 0.3308571
4 0.6709439
5 0.6698147
6 0.7483192
7 0.5147294
8 0.3724472
9 0.3266859
10 0.5331233
11 0.5490693
12 0.4642588
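The month-by-month average can also be computed without lubridate. Below is a minimal base R sketch that uses made-up values 1 to 24 for two years of monthly dates as a stand-in for the question's series. `format(dates, "%m")` extracts the calendar month, and `tapply()` averages every observation that shares that month. For an xts object, you would presumably pass `as.numeric(my_xts$rand1)` and `index(my_xts)` instead. That adaptation is an assumption and has not been tested here.

```r
# Hypothetical sample: two years of monthly dates with values 1..24
dates  <- seq(as.Date("2000-01-01"), as.Date("2001-12-01"), by = "month")
values <- seq_along(dates)

# Calendar month number (1-12) of each observation
month_num <- as.integer(format(dates, "%m"))

# Mean of all Januaries, all Februaries, ..., all Decembers
monthly_mean <- tapply(values, month_num, mean)
print(monthly_mean)
```

Because the grouping key is only the month number, observations from different years are pooled. That pooling is exactly the "average year" the question asks for.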

Related

How to find out how many trading days in each month in R?

I have a data frame like this, covering 10 years. Because it is Chinese market data and China observes lunar-calendar holidays, the holidays fall on different dates each year in terms of the Western calendar.
When it is a holiday, the stock market does not open, so it is a non-trading day. Weekends are non-trading days too.
I want to find out which month of which year has the least number of trading days, and most importantly, what number is that.
There are no repeated days.
date change open high low close volume
1 1995-01-03 -1.233 637.72 647.71 630.53 639.88 234518
2 1995-01-04 2.177 641.90 655.51 638.86 653.81 422220
3 1995-01-05 -1.058 656.20 657.45 645.81 646.89 430123
4 1995-01-06 -0.948 642.75 643.89 636.33 640.76 487482
5 1995-01-09 -2.308 637.52 637.55 625.04 625.97 509851
6 1995-01-10 -2.503 616.16 617.60 607.06 610.30 606925
If there are no repeated days, you can count days per month and year with:
library(data.table)
library(lubridate)
dt <- as.data.table(dt)
dt_days <- dt[, .(count_day=.N), by=.(year(date), month(date))]
Then you only need to do this to get the min:
dt_days[count_day==min(count_day)]
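The same count can be done without data.table by applying `table()` to a year-month label. This base R sketch uses hypothetical data: every weekday in the first quarter of 1995 stands in for a trading day, whereas real data would also be missing the Lunar New Year holidays. The data frame name `df` and its Date column `date` are assumptions.

```r
# Hypothetical stand-in for the trading calendar: all weekdays of Q1 1995
all_days <- seq(as.Date("1995-01-01"), as.Date("1995-03-31"), by = "day")
wday <- as.POSIXlt(all_days)$wday          # 0 = Sunday, 6 = Saturday
df <- data.frame(date = all_days[!(wday %in% c(0, 6))])

# Trading days per year-month (assumes no repeated dates)
counts <- table(format(df$date, "%Y-%m"))
print(counts)

# Month(s) with the fewest trading days, together with that number
fewest <- counts[counts == min(counts)]
print(fewest)
```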
The chron and bizdays packages deal with business days, but neither actually contains a usable calendar of holidays, which limits their usefulness.
We will use chron below, assuming you have defined the .Holidays vector of dates that are holidays. (If you run the code below without doing that, only weekdays will be regarded as business days, because the default .Holidays vector supplied by chron has very few dates in it.) DF has 120 rows (one row for each year/month), and the last line subsets it to just the month in each year with the fewest business days.
library(chron)
library(zoo)
st <- as.yearmon("2001-01")
en <- as.yearmon("2010-12")
ym <- seq(st, en, 1/12) # sequence of year/months of interest
# no of business days in each yearmonth
busdays <- sapply(ym, function(x) {
s <- seq(as.Date(x), as.Date(x, frac = 1), "day")
sum(!is.weekend(s) & !is.holiday(s))
})
# data frame with one row per year/month
yr <- as.integer(ym)
DF <- data.frame(year = yr, month = cycle(ym), yearmon = ym, busdays)
# flag the month with the fewest business days in each year (first one on ties)
wx.min <- ave(busdays, yr, FUN = function(x) which.min(x) == seq_along(x))
DF[wx.min == 1, ]
giving:
year month yearmon busdays
2 2001 2 Feb 2001 20
14 2002 2 Feb 2002 20
26 2003 2 Feb 2003 20
38 2004 2 Feb 2004 20
50 2005 2 Feb 2005 20
62 2006 2 Feb 2006 20
74 2007 2 Feb 2007 20
95 2008 11 Nov 2008 20
98 2009 2 Feb 2009 20
110 2010 2 Feb 2010 20
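If you prefer not to rely on chron's `.Holidays`, you can pass the holiday calendar explicitly. This is a minimal base R sketch: the helper `business_days()` and the two-date `holidays` vector are hypothetical, and a real exchange calendar would have to be supplied. It counts weekdays that are not holidays for each month of 2001.

```r
# Business days in the month starting at `first_of_month`:
# weekdays that are not listed in `holidays` (a user-supplied Date vector)
business_days <- function(first_of_month, holidays = as.Date(character(0))) {
  # first day of the following month; the month's last day is one day earlier
  next_month <- seq(first_of_month, by = "month", length.out = 2)[2]
  days <- seq(first_of_month, next_month - 1, by = "day")
  wday <- as.POSIXlt(days)$wday  # 0 = Sunday, 6 = Saturday
  sum(!(wday %in% c(0, 6)) & !(days %in% holidays))
}

# Hypothetical holiday calendar; replace with the real exchange holidays
holidays <- as.Date(c("2001-01-01", "2001-12-25"))

months_2001 <- seq(as.Date("2001-01-01"), as.Date("2001-12-01"), by = "month")
busdays <- vapply(months_2001, business_days, integer(1), holidays = holidays)

DF <- data.frame(yearmon = format(months_2001, "%Y-%m"), busdays)
DF[which.min(DF$busdays), ]
```

As with the chron version, `which.min()` keeps only the first month when several months tie. In this sample, February, September and December 2001 all have 20 business days.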

R Studio: look up a value in a table (both directions, V & H), then use it as a variable in a loop

I am dealing with a dataset ("IndexTable") that has 3 million+ observations. Here are the first 6 observations:
Identity gender type amount Year Month
1 65 F W 31.88 1987 Jan
2 23 M P 29.21 1985 Mar
3 45 F W 44.70 1987 Jan
4 47 F W 72.64 1987 Jan
5 56 M P 28.92 1986 Jul
6 09 F W 34.32 1990 Jan
and the index table ("index") in which the value will be looked up (part of the table):
year average Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1 1950 32.84210 33.19118 33.10321 33.01572 32.89977 32.81334 32.98665 32.98665 33.10321 32.89977 32.55677 32.41595 32.24857
2 1951 30.09866 31.94615 31.64936 31.43694 30.94371 30.19568 30.09866 29.64623 29.50617 29.29854 29.09382 28.98131 28.78098
3 1952 27.56470 28.28139 28.25313 28.11271 27.67259 27.67259 27.21981 27.24604 27.40444 27.45766 27.21981 27.24604 27.06353
4 1953 26.73099 27.08945 27.01183 26.83243 26.58025 26.68055 26.53038 26.53038 26.70575 26.75628 26.75628 26.68055 26.78162
5 1954 26.25941 26.73099 26.78162 26.53038 26.43120 26.50552 26.35730 25.92244 26.08984 26.13807 26.01783 25.89871 25.75718
6 1955 25.11668 25.66369 25.66369 25.66369 25.52472 25.57087 25.04994 24.96151 25.13901 24.98356 24.72149 24.33854 24.33854
For each observation in "IndexTable", I would like to find the value in "index" that matches the Year and Month, then multiply its amount by that value to get the adjusted amount.
Thanks in advance J
Using the dplyr and tidyr packages:
library(dplyr)
library(tidyr)
# reshape the Jan..Dec columns into one Month column, values in multiplier
index_long <- index %>%
  gather(Month, multiplier, Jan:Dec) %>%
  select(-average)
left_join(IndexTable, index_long, by = c("Year" = "year", "Month" = "Month")) %>%
  mutate(adjusted_amount = amount * multiplier)
First, I gather the month columns into a single Month column, with the values in a column named multiplier.
I drop the average column because it doesn't need to be joined to the other table. The left join then attaches a multiplier to each IndexTable row that has a matching year/month combination. Rows without a match get NA.
Finally, I use the multiplier to create the new column adjusted_amount.
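Another option avoids reshaping and joining altogether by using base R matrix indexing. `match()` gives the row from the year and the column from the month name, and a two-column index matrix picks one cell per observation. This is a sketch on a small hypothetical excerpt: only three month columns of `index` are kept, and a 1949 row is added to show that an unmatched year yields NA, as a left join would. Whether this approach is faster than the join on 3 million rows is something you would need to measure.

```r
# Hypothetical excerpt of the two tables (only a few month columns of `index`)
IndexTable <- data.frame(
  amount = c(31.88, 29.21, 28.92, 10),
  Year   = c(1950, 1951, 1951, 1949),
  Month  = c("Jan", "Mar", "Jul", "Jan"),
  stringsAsFactors = FALSE
)
index <- data.frame(
  year    = c(1950, 1951),
  average = c(32.84210, 30.09866),
  Jan     = c(33.19118, 31.94615),
  Mar     = c(33.01572, 31.43694),
  Jul     = c(32.98665, 29.64623)
)

# Month columns of the lookup table (everything except year and average)
month_cols <- setdiff(names(index), c("year", "average"))
lookup <- as.matrix(index[month_cols])

# Row position from the year, column position from the month name;
# an index row containing NA returns NA, like an unmatched left join
pos <- cbind(match(IndexTable$Year, index$year),
             match(IndexTable$Month, month_cols))
IndexTable$multiplier <- lookup[pos]
IndexTable$adjusted_amount <- IndexTable$amount * IndexTable$multiplier
print(IndexTable)
```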

Weekends in a Month in R

I am trying to prepare an xreg series for my ARIMA model, and I will use the number of weekends in each month for it. I can get results for a single year, but when the span is longer than a year, as it usually is, I couldn't find a way. Here is what I have done so far:
dates <- seq(from=as.Date("2001-01-01"), to=as.Date("2010-12-31"), by = "day")
wd <- weekdays(dates)
aylar <- months(dates[which(wd == "Sunday" | wd == "Saturday")])
table(aylar)
What I want is to count the weekends per month by year as well as by month, so that the result is a series of the same length as my original forecast series.
Here is my solution:
library(chron)
library(dplyr)
library(lubridate)
month <- months(dates[chron::is.weekend(dates)])
day <- dates[chron::is.weekend(dates)]
# create data.frame
df <- data.frame(date = day, month = month, year = chron::years(day))
df %>% group_by(year, month) %>% summarize(weekends = floor(n()/2))
# year month weekends
# <dbl> <fctr> <dbl>
#1 2001 April 4
#2 2001 August 4
#3 2001 December 5
#4 2001 February 4
#5 2001 January 4
#6 2001 July 4
#7 2001 June 4
#8 2001 March 4
#9 2001 May 4
#10 2001 November 4
## ... with 110 more rows
I hope this is a starting point for your work.
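Because the answer groups by month name, the rows come out in alphabetical order (April, August, ...) rather than calendar order. An xreg series, however, has to line up with the forecast series month by month. This base R sketch counts weekend days per year-month in chronological order and avoids locale-dependent day names by using the numeric weekday. Counting weekend *days*, rather than `floor(n()/2)` weekends, is a design choice made here and not the answer's method. It avoids splitting a weekend that straddles two months.

```r
# Every day in the period from the question
dates <- seq(from = as.Date("2001-01-01"), to = as.Date("2010-12-31"), by = "day")

# Weekend flag; POSIXlt wday is 0 for Sunday and 6 for Saturday,
# which avoids locale-dependent names such as "Saturday"
wday <- as.POSIXlt(dates)$wday
is_weekend <- wday %in% c(0, 6)

# Year-month label per day; "%Y-%m" strings sort chronologically
ym <- format(dates, "%Y-%m")

# Number of weekend days in each year-month, in calendar order
weekend_days <- tapply(is_weekend, ym, sum)

# Monthly ts aligned with a series that starts in January 2001
weekend_ts <- ts(as.numeric(weekend_days), start = c(2001, 1), frequency = 12)
print(head(weekend_days, 12))
```

The resulting `weekend_ts` has one value per month (120 in total) and could presumably be passed as the `xreg` argument alongside a monthly series that covers the same span.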

Sum daily values into monthly values

I am trying to sum daily rainfall values into monthly totals for a record over 100 years in length. My data takes the form:
Year Month Day Rain
1890 1 1 0
1890 1 2 3.1
1890 1 3 2.5
1890 1 4 15.2
In the example above, I want R to sum all the days of rainfall in January 1890, then February 1890, March 1890, ... through to December 2010. I guess what I'm trying to do is create a loop to sum values. My output file should look like:
Year Month Rain
1890 1 80.5
1890 2 72.4
1890 3 66.8
1890 4 77.2
Is there an easy way to do this?
Many thanks.
You can use dplyr for some pleasing syntax:
library(dplyr)
df %>%
group_by(Year, Month) %>%
summarise(Rain = sum(Rain))
In some cases it can be beneficial to convert the data to a time-series class such as xts. You can then use functions like apply.monthly().
Data:
df <- data.frame(
Year = rep(1890,5),
Month = c(1,1,1,2,2),
Day = 1:5,
rain = rexp(5)
)
> head(df)
Year Month Day rain
1 1890 1 1 0.1528641
2 1890 1 2 0.1603080
3 1890 1 3 0.5363315
4 1890 2 4 0.6368029
5 1890 2 5 0.5632891
Convert it to xts and use apply.monthly():
library(xts)
dates <- with(df, as.Date(paste(Year, Month, Day), format = "%Y %m %d"))
myXts <- xts(df$rain, dates)
> head(apply.monthly(myXts, sum))
[,1]
1890-01-03 0.8495036
1890-02-05 1.2000919
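For completeness, base R's formula interface to `aggregate()` does the same grouping without any packages. The sketch uses hypothetical rows in the question's layout: the four January rows from the question plus two made-up February rows. `aggregate()` lists groups with the first grouping variable varying fastest, so the result is re-sorted explicitly to give chronological order over many years.

```r
# Hypothetical daily records in the question's layout (Year, Month, Day, Rain)
df <- data.frame(
  Year  = 1890,
  Month = c(1, 1, 1, 1, 2, 2),
  Day   = c(1, 2, 3, 4, 1, 2),
  Rain  = c(0, 3.1, 2.5, 15.2, 1.0, 4.0)
)

# Sum Rain within each Year/Month combination
monthly <- aggregate(Rain ~ Year + Month, data = df, FUN = sum)

# Sort into chronological order (Year first, then Month)
monthly <- monthly[order(monthly$Year, monthly$Month), ]
print(monthly)
```

The formula method drops rows with missing values by default (`na.action = na.omit`). Days with NA rainfall are therefore simply left out of the monthly totals.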

Aggregate count of timeseries values which exceed threshold, by year-month

I am learning R and using the seas package to help me with some calculations; my data is in the format the seas package expects. It is a time series:
require(seas)
data(mscdata)
dat.int <- (mksub(mscdata, id=1108447))
The data has the following columns and covers 20 years:
year yday date t_max t_min t_mean rain snow precip
However, I now need to calculate the number of days in each month on which rainfall is >= 1.0 mm. In the end I would have two columns: each month of each year, and the total number of days in that month with rainfall >= 1.0 mm.
I'm not certain how to write this code and any help would be appreciated
Thank you
Lam
1) dat.int$date is a Date object. First, create a new column dat.int$yearmon holding the year-month, e.g. using zoo::yearmon:
require(zoo)
# a format string is only needed when parsing character input, not Dates
dat.int$yearmon <- as.yearmon(dat.int$date)
2) Second, do a summarize operation (I recommend plyr or the newer dplyr) on rain >= 1.0, aggregated by yearmon. Let's name the resulting column rainy_days.
If you want to store the rainy_days column back in the dat.int data frame, use transform instead of summarize, and assign the result:
require(plyr)
dat.int <- ddply(dat.int, .(yearmon), transform, rainy_days = sum(rain >= 1.0))
or, if you just want a new summary data frame:
rainydays_by_yearmon <- ddply(dat.int, .(yearmon), summarize, rainy_days = sum(rain >= 1.0))
print.data.frame(rainydays_by_yearmon)
yearmon rainy_days
1 Jan 1975 14
2 Feb 1975 12
3 Mar 1975 13
4 Apr 1975 6
5 May 1975 6
6 Jun 1975 5
...
355 Jul 2004 3
356 Aug 2004 7
357 Oct 2004 14
358 Nov 2004 16
359 Dec 2004 19
Note: you can do all of the above in base R, without the zoo or plyr/dplyr packages, but these idioms are nicer, more scalable, and easier to maintain.
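As a base R sketch of the same count: `format()` builds the year-month label, and `tapply()` sums the logical `rain >= 1.0` per month. The six rows are hypothetical. Real data would come from `mksub(mscdata, ...)`, and the assumption that its rain column may contain NA is handled with `na.rm = TRUE`.

```r
# Hypothetical daily records; real data would come from mksub(mscdata, ...)
dat <- data.frame(
  date = as.Date(c("1975-01-01", "1975-01-02", "1975-01-03",
                   "1975-02-01", "1975-02-02", "1975-02-03")),
  rain = c(0.5, 1.0, 3.2, 0, 1.4, NA)
)

# Year-month label for each day ("%Y-%m" sorts chronologically)
ym <- format(dat$date, "%Y-%m")

# Days per month with rain >= 1.0 mm; missing values are ignored
rainy <- tapply(dat$rain >= 1.0, ym, sum, na.rm = TRUE)

rainydays_by_month <- data.frame(yearmon = names(rainy),
                                 rainy_days = as.vector(rainy),
                                 stringsAsFactors = FALSE)
print(rainydays_by_month)
```

Months that have no rows at all in the data do not appear in the result. If the series has gaps, those months would need to be added back with a count of zero (or NA).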
