How to switch rows in R?

I have an array with the following content:
> head(MEAN)
1901DJF 1901JJA 1901MAM 1901SON 1902DJF 1902JJA
-0.45451556 -0.72922229 -0.17669396 -1.12095590 -0.86523850 -0.04031273
This should be a time series of seasonal mean values from 1901 to 2009. The problem is that the generated column headers are ordered strictly alphabetically. In terms of seasons, however, this doesn't make much sense: e.g. JJA (June, July, August) comes before MAM (March, April, May).
How can I swap each MAM and JJA entry in the array?
PS: MEAN is generated by applying tapply to the data.frame pdsi:
> head(pdsi)
date scPDSI month seas seasyear
1 1901-01-01 -0.10881074 Jan DJF 1901DJF
2 1901-02-01 -0.22287750 Feb DJF 1901DJF
3 1901-03-01 -0.12233192 Mär MAM 1901MAM
4 1901-04-01 -0.04440915 Apr MAM 1901MAM
5 1901-05-01 -0.36334082 Mai MAM 1901MAM
6 1901-06-01 -0.52079030 Jun JJA 1901JJA
>
> MEAN <- tapply(pdsi$scPDSI, pdsi$seasyear, mean, na.rm = TRUE)
Maybe there is also a more elegant way to calculate seasonal means...

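You can change the order of the factor levels; tapply() returns its groups in the order of the levels:
seas_levels <- paste0(rep(1901:2009, each = 4), c("DJF", "MAM", "JJA", "SON"))
pdsi[["seasyear"]] <- factor(pdsi[["seasyear"]], levels = seas_levels)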
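A minimal sketch of the effect, using a hypothetical two-year toy data frame (`toy`, `MEAN_toy` and the values are made up for illustration): a character grouping variable is turned into a factor with alphabetical levels, while an explicit factor keeps the level order you give it.

```r
# Hypothetical toy data: two years of seasonal labels ("YYYY" + season code)
toy <- data.frame(
  seasyear = c("1901DJF", "1901MAM", "1901JJA", "1901SON",
               "1902DJF", "1902MAM", "1902JJA", "1902SON"),
  value    = 1:8
)

# Character grouping: tapply() converts it to a factor with alphabetical levels,
# so JJA ends up before MAM
print(names(tapply(toy$value, toy$seasyear, mean)))

# Explicit levels in calendar order: for every year, seasons DJF, MAM, JJA, SON
seas_levels <- paste0(rep(1901:1902, each = 4), c("DJF", "MAM", "JJA", "SON"))
toy$seasyear <- factor(toy$seasyear, levels = seas_levels)

# Now the result follows the level order
MEAN_toy <- tapply(toy$value, toy$seasyear, mean)
print(MEAN_toy)
```

With the real data, any level that has no observations would show up as NA in the tapply result.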

I think this is a fairly simple way of reordering your means. However, it assumes that your data is already ordered chronologically in the data set; if that holds, this should work.
I also created some random data rather than copying yours, but the results should be the same.
seasons = c("1901DJF", "1901MAM", "1901JJA")
seasons = rep(seasons, c(2, 3, 1))
data = data.frame(runif(6), seasons)
MEAN = tapply(data[,1], data[,2], mean)
# 1901DJF 1901JJA 1901MAM
# 0.5799779 0.3724785 0.6514327
season_order = unique(seasons)
MEAN[season_order]
# 1901DJF 1901MAM 1901JJA
# 0.5799779 0.6514327 0.3724785
This takes the order in which seasyear appears in the data set and reorders MEAN to match it. Again, it assumes your data is ordered chronologically in the raw file, but I think this is a safe assumption. Apologies if that is not the case.
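If the raw file might not be in chronological order, a sketch that sorts by the labels themselves is possible, assuming every name has the form YYYY followed by a three-letter season code. The values below are the first six means from the question; `yr`, `seas` and `seas_rank` are illustrative helper names:

```r
# First six means from the question, in the alphabetical order tapply() produced
MEAN <- c("1901DJF" = -0.45451556, "1901JJA" = -0.72922229, "1901MAM" = -0.17669396,
          "1901SON" = -1.12095590, "1902DJF" = -0.86523850, "1902JJA" = -0.04031273)

# Split each name into the year (characters 1-4) and the season code (characters 5-7)
yr   <- as.integer(substr(names(MEAN), 1, 4))
seas <- substr(names(MEAN), 5, 7)

# Rank seasons in calendar order instead of alphabetically
seas_rank <- match(seas, c("DJF", "MAM", "JJA", "SON"))

# Sort by year first, then by season rank
MEAN_sorted <- MEAN[order(yr, seas_rank)]
print(MEAN_sorted)
```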

Related

XTS:: Help me on the usage & differences between period.apply() & to.period()

I am learning time series analysis with R and came across these two functions. I understand that both return periodic data defined by the frequency of the period, and the only difference I can see is the OHLC output option of to.period().
Apart from the OHLC output, when should each of these functions be used?
to.period and all the to.minutes, to.weekly, to.quarterly are indeed meant for OHLC data.
to.period takes the open of the first observation in the period, the close of the last observation, and the highest high / lowest low within the period. These functions work very well together with the quantmod / tidyquant / quantstrat packages. See code example 1.
If you give to.period non-OHLC data, e.g. a time series with a single data column, you still get a sort of OHLC back. See code example 2.
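To see the OHLC logic on a single column, here is a minimal sketch with a hypothetical daily series whose values count up 1, 2, 3, ..., so the expected Open/High/Low/Close of each quarter can be checked by hand (Q1 2007 has 90 days, Q2 has 91):

```r
library(xts)

# Hypothetical daily series for the first half of 2007; values count up 1, 2, 3, ...
dates <- seq(as.Date("2007-01-01"), as.Date("2007-06-30"), by = "day")
x <- xts(as.numeric(seq_along(dates)), order.by = dates)

# One row per quarter: Open = first value, High = max, Low = min, Close = last value
q <- to.period(x, period = "quarters")
print(q)
```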
period.apply is more interesting: here you can supply your own function to be applied to the data. Especially in combination with endpoints, it is a powerful tool for aggregating time series data to different time periods. The index is usually specified with endpoints, since endpoints returns the row positions that mark the end of each higher-level period (e.g. going from days to weeks). See code examples 3 and 4.
Remember to use matrix functions (such as colMeans) with period.apply if you have more than one column of data, since an xts object is basically a matrix plus an index. See code example 5.
More info on this data.camp course.
library(xts)
data(sample_matrix)
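# 231 daily dates but only 31 random values: zoo() recycles the values, which is why every quarter below shows the same high and low
zoo.data <- zoo(rnorm(31) + 10, as.Date(13514:13744, origin = "1970-01-01"))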
# code example 1
to.quarterly(sample_matrix)
sample_matrix.Open sample_matrix.High sample_matrix.Low sample_matrix.Close
2007 Q1 50.03978 51.32342 48.23648 48.97490
2007 Q2 48.94407 50.33781 47.09144 47.76719
# same as to.quarterly
to.period(sample_matrix, period = "quarters")
sample_matrix.Open sample_matrix.High sample_matrix.Low sample_matrix.Close
2007 Q1 50.03978 51.32342 48.23648 48.97490
2007 Q2 48.94407 50.33781 47.09144 47.76719
# code example 2
to.period(zoo.data, period = "quarters")
zoo.data.Open zoo.data.High zoo.data.Low zoo.data.Close
2007-03-31 9.039875 11.31391 7.451139 10.35057
2007-06-30 10.834614 11.31391 7.451139 11.28427
2007-08-19 11.004465 11.31391 7.451139 11.30360
# code example 3 using base standard deviation in the chosen period
period.apply(zoo.data, endpoints(zoo.data, on = "quarters"), sd)
2007-03-31 2007-06-30 2007-08-19
1.026825 1.052786 1.071758
# code example 4: self-defined function summing x + x over the period
period.apply(zoo.data, endpoints(zoo.data, on = "quarters"), function(x) sum(x + x) )
2007-03-31 2007-06-30 2007-08-19
1798.7240 1812.4736 993.5729
# code example 5
period.apply(sample_matrix, endpoints(sample_matrix, on = "quarters"), colMeans)
Open High Low Close
2007-03-31 50.15493 50.24838 50.05231 50.14677
2007-06-30 48.47278 48.56691 48.36606 48.45318
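A sketch of what period.apply does with the endpoints index, assuming a hypothetical daily series over the same date range as zoo.data (with a fixed seed): endpoints() gives 0 followed by the last row of each quarter, and period.apply() applies FUN to the rows between consecutive endpoints. The hand-written sapply() loop is only there for comparison.

```r
library(xts)

# Hypothetical daily series covering 2007-01-01 to 2007-08-19 (fixed seed for reproducibility)
set.seed(1)
dates <- seq(as.Date("2007-01-01"), as.Date("2007-08-19"), by = "day")
x <- xts(rnorm(length(dates)) + 10, order.by = dates)

# endpoints(): 0 followed by the last row number of each quarter
ep <- endpoints(x, on = "quarters")
print(ep)

# period.apply(): FUN is applied to rows (ep[i] + 1):ep[i + 1] for each period
by_period <- period.apply(x, ep, sd)

# The same computation written out by hand, for comparison
by_hand <- sapply(seq_len(length(ep) - 1),
                  function(i) sd(as.numeric(x[(ep[i] + 1):ep[i + 1]])))
print(cbind(as.numeric(by_period), by_hand))
```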

Differencing with respect to specific value of a column

I have a variable called Depression with 40 quarterly observations from 2004 to 2013 (2004 Q1, 2004 Q2, etc.). I would like to create a new column containing the difference from the 27th observation, which corresponds to 2010 Q3, so that this observation becomes 0. Any help is greatly appreciated!
If I understand your question correctly, this would do it:
# generate sample data
dat <- data.frame(id=paste0("Obs.",1:40),depression=as.integer(runif(40,0,20)))
# Create new var that calculates difference with 27th observation on depression score
dat$diff <- dat$depression - dat$depression[27]
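If the data carries quarter labels, a sketch that looks up the reference row by label rather than hard-coding 27 (the `quarter` column and the random scores are hypothetical):

```r
# Hypothetical quarterly scores from 2004 Q1 to 2013 Q4 (40 observations)
set.seed(42)
dat <- data.frame(
  quarter    = paste0(rep(2004:2013, each = 4), " Q", 1:4),
  depression = sample(0:19, 40, replace = TRUE)
)

# Find the reference quarter by its label instead of hard-coding row 27
ref_row <- match("2010 Q3", dat$quarter)

# Difference from the reference quarter; that row becomes 0
dat$diff <- dat$depression - dat$depression[ref_row]
print(dat[ref_row, ])
```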

Having trouble with R's time series objects

I have a column of 84 monthly expenditures from 1/2004 - 12/2010, which in Excel looks like...
12247815.55
11812697.14
13741176.13
21372260.37
27412419.28
42447077.96
55563235.3
45130678.8
54579583.53
43406197.32
34318334.64
25321371.4
...(74 more entries)
I am trying to run an stl() from the forecast package on this series, and so I load the data:
d <- ts(read.csv("deseason_vVectForTS.csv",
header = TRUE),
start=c(2004,1),
end=c(2010,12),
frequency = 12)
(If I do header=FALSE it will absorb the first entry - 122...- as the header for the second column, and name the first column's header 'X')
But instead of my environment being populated with a Time Series Object from 2004 to 2011 (as it has said before) it simply says ts[1:84, 1].
Probably related is the fact that,
fit <- stl(d)
throws
Error in stl(d) : only univariate series are allowed.
despite the fact that
head(d)
[1] 12247816 11812697 13741176 21372260 27412419 42447078
and
d
Jan Feb Mar Apr May Jun Jul Aug Sep Oct
2004 12247816 11812697 13741176 21372260 27412419 42447078 55563235 45130679 54579584 43406197
(Years 2005-2010 look exactly the same, and every row has columns for Jan-Dec; it just doesn't fit here neatly. I'm only trying to show that the object has taken on the ts labeling structure.)
What am I doing wrong? As far as I know this is the same way I have been building my time series objects in the past...
read.csv returns a data frame, which ts() turns into a matrix. Even with only one column it is still a matrix, so stl() sees a multivariate series. To make it a vector, select the column:
d <- ts(read.csv("deseason_vVectForTS.csv",
header = TRUE)[,1],
start=c(2004,1),
end=c(2010,12),
frequency = 12)
Also, please check your facts. stl is in the stats package, not the forecast package. This is easily checked by using help(stl).
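A self-contained sketch of the difference, using simulated monthly values in place of the CSV (the column name `expenditure` is made up). Note that stl() also needs an s.window argument; "periodic" is used here only as an example choice:

```r
# Hypothetical stand-in for read.csv(): a one-column data frame of 84 monthly values
set.seed(1)
raw <- data.frame(expenditure = 3e7 + 1e7 * sin(2 * pi * (1:84) / 12) + rnorm(84, sd = 1e6))

# Passing the whole data frame: ts() converts it to a one-column matrix
d_matrix <- ts(raw, start = c(2004, 1), frequency = 12)
# Passing the first column: a plain numeric vector, i.e. a univariate series
d_vector <- ts(raw[, 1], start = c(2004, 1), frequency = 12)

print(c(is.matrix(d_matrix), is.matrix(d_vector)))

# stl() rejects the matrix version but accepts the vector version
res_matrix <- try(stl(d_matrix, s.window = "periodic"), silent = TRUE)
fit <- stl(d_vector, s.window = "periodic")
```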

Eliminating Existing Observations in a Zoo Merge

I'm trying to do a zoo merge between stock prices from selected trading days and observations about those same stocks (we call these "Nx observations") made on the same days. Sometimes we do not have Nx observations on stock trading days, and sometimes we have Nx observations on non-trading days. We want to place an "NA" where we have no Nx observation on a trading day, but eliminate Nx observations made on non-trading days, since without trading data for the same day, Nx observations are useless.
The following SO question is close to mine, but I would characterize that question as REPLACING missing data, whereas my objective is to truly eliminate observations made on non-trading days (if necessary, we can change the process by which Nx observations are taken, but it would be a much less expensive solution to leave it alone).
merge data frames to eliminate missing observations
The script I have prepared to illustrate follows (I'm new to R and SO; all suggestions welcome):
# create Stk_data data.frame for use in the Stack Overflow question
Date_Stk <- c("1/2/13", "1/3/13", "1/4/13", "1/7/13", "1/8/13") # dates for stock prices used in the example
ABC_Stk <- c(65.73, 66.85, 66.92, 66.60, 66.07) # stock prices for tkr ABC for Jan 1 2013 through Jan 8 2013
DEF_Stk <- c(42.98, 42.92, 43.47, 43.16, 43.71) # stock prices for tkr DEF for Jan 1 2013 through Jan 8 2013
GHI_Stk <- c(32.18, 31.73, 32.43, 32.13, 32.18) # stock prices for tkr GHI for Jan 1 2013 through Jan 8 2013
Stk_data <- data.frame(Date_Stk, ABC_Stk, DEF_Stk, GHI_Stk) # create the stock price data.frame
# create Nx_data data.frame for use in the Stack Overflow question
Date_Nx <- c("1/2/13", "1/4/13", "1/5/13", "1/6/13", "1/7/13", "1/8/13") # dates for Nx Observations used in the example
ABC_Nx <- c(51.42857, 51.67565, 57.61905, 57.78349, 58.57143, 58.99564) # Nx scores for stock ABC for Jan 1 2013 through Jan 8 2013
DEF_Nx <- c(35.23809, 36.66667, 28.57142, 28.51778, 27.23150, 26.94331) # Nx scores for stock DEF for Jan 1 2013 through Jan 8 2013
GHI_Nx <- c(7.14256, 8.44573, 6.25344, 6.00423, 5.99239, 6.10034) # Nx scores for stock GHI for Jan 1 2013 through Jan 8 2013
Nx_data <- data.frame(Date_Nx, ABC_Nx, DEF_Nx, GHI_Nx) # create the Nx scores data.frame
# create zoo objects & merge
z.Stk_data <- zoo(Stk_data, as.Date(as.character(Stk_data[, 1]), format = "%m/%d/%Y"))
z.Nx_data <- zoo(Nx_data, as.Date(as.character(Nx_data[, 1]), format = "%m/%d/%Y"))
z.data.outer <- merge(z.Stk_data, z.Nx_data)
The NAs on Jan 3 2013 for the Nx observations are fine (we'll use na.locf), but we need to eliminate the Nx observations that appear on Jan 5 and 6 as well as the associated NAs in the stock price section of the zoo object.
I've read the R documentation for merge.zoo regarding the use of "all": that its use "allows intersection, union and left and right joins to be expressed". But trying all combinations of the following use of "all" yielded the same results (why that is would be a secondary question).
z.data.outer <- zoo(merge(x = Stk_data, y = Nx_data, all.x = FALSE)) # try using "all"
While I would appreciate comments on the secondary question, I'm primarily interested in learning how to eliminate the extraneous Nx observations on days when there is no trading of stocks. Thanks. (And thanks in general to the community for all the great explanations of R!)
The all argument of merge.zoo must be (quoting from the help file):
logical vector having the same length as the number of "zoo" objects to be merged (otherwise expanded)
and you want to keep all rows from the first argument but not the second, so its value should be c(TRUE, FALSE).
merge(z.Stk_data, z.Nx_data, all = c(TRUE, FALSE))
The all syntax of merge.zoo differs from that of merge.data.frame because merge.zoo can merge any number of arguments, whereas merge.data.frame only handles two, so the syntax had to be extended to handle that.
Also note that %Y should have been %y in the question's code.
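A runnable sketch combining both points, using the question's data. It additionally drops the date column before calling zoo(), an extra step not in the answer above: otherwise the character dates are carried into the data and the whole zoo core becomes a character matrix.

```r
library(zoo)

# Example data from the question; dates are "m/d/yy", so the year needs %y
Stk_data <- data.frame(
  Date_Stk = c("1/2/13", "1/3/13", "1/4/13", "1/7/13", "1/8/13"),
  ABC_Stk  = c(65.73, 66.85, 66.92, 66.60, 66.07),
  DEF_Stk  = c(42.98, 42.92, 43.47, 43.16, 43.71),
  GHI_Stk  = c(32.18, 31.73, 32.43, 32.13, 32.18)
)
Nx_data <- data.frame(
  Date_Nx = c("1/2/13", "1/4/13", "1/5/13", "1/6/13", "1/7/13", "1/8/13"),
  ABC_Nx  = c(51.42857, 51.67565, 57.61905, 57.78349, 58.57143, 58.99564),
  DEF_Nx  = c(35.23809, 36.66667, 28.57142, 28.51778, 27.23150, 26.94331),
  GHI_Nx  = c(7.14256, 8.44573, 6.25344, 6.00423, 5.99239, 6.10034)
)

# Use the date column only as the index so the zoo core stays numeric
z.Stk_data <- zoo(Stk_data[, -1], as.Date(Stk_data$Date_Stk, format = "%m/%d/%y"))
z.Nx_data  <- zoo(Nx_data[, -1],  as.Date(Nx_data$Date_Nx,  format = "%m/%d/%y"))

# Left join: keep every trading day (TRUE), drop Nx-only days (FALSE)
z.data.left <- merge(z.Stk_data, z.Nx_data, all = c(TRUE, FALSE))
print(z.data.left)
```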
I hope I have understood your desired output correctly ("NAs on Jan 3 2013 for the Nx observations are fine"; "eliminate [...] observations that appear on Jan 5 and 6"). I don't quite see the need for zoo in the merging step.
merge(Stk_data, Nx_data, by.x = "Date_Stk", by.y = "Date_Nx", all.x = TRUE)
# Date_Stk ABC_Stk DEF_Stk GHI_Stk ABC_Nx DEF_Nx GHI_Nx
# 1 1/2/13 65.73 42.98 32.18 51.42857 35.23809 7.14256
# 2 1/3/13 66.85 42.92 31.73 NA NA NA
# 3 1/4/13 66.92 43.47 32.43 51.67565 36.66667 8.44573
# 4 1/7/13 66.60 43.16 32.13 58.57143 27.23150 5.99239
# 5 1/8/13 66.07 43.71 32.18 58.99564 26.94331 6.10034

R: left sided moving average for periods (months)

I have a question which might be trivial for most of you. I tried a lot but didn't find a solution, so I would be glad if somebody could give me a hint. The starting point is a weekly xts time series.
Month Week Value Goal
Dec 2011 W50 a a
Dec 2011 W51 b mean(a,b)
Dec 2011 W52 c mean(a,b,c)
Dec 2011 W53 d mean(a,b,c,d)
Jan 2012 W01 e e
Jan 2012 W02 f mean(e,f)
Jan 2012 W03 g mean(e,f,g)
Jan 2012 W04 h mean(e,f,g,h)
Feb 2012 W05 i i
Feb 2012 W06 j mean(i,j)
Please excuse the Excel notation, but I think it makes clear what I want to do: I want to calculate a left-sided moving average of the column "Value", restricted to the respective month, as displayed in the column Goal. I experimented with apply.monthly() and period.apply(), but they didn't give me what I want. Can somebody give me a hint on how to solve this? Just a hint about which function to use would already be enough!
Thank you very much!
Best regards,
Andreas
apply.monthly will not work because it only assigns one value to the endpoint of the period, whereas you want to assign many values to each monthly period.
You can do this pretty easily by splitting your xts data by month, applying a cumulative mean function to each, and rbind'ing the list back together.
library(quantmod)
# Sample data
getSymbols("SPY")
spy <- to.weekly(SPY)
# Cumulative mean function
cummean <- function(x) cumsum(x)/seq_along(x)
# Expanding average calculation
spy$EA <- do.call(rbind, lapply(split(Cl(spy),'months'), cummean))
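The same idea without downloading data, as a sketch with a hypothetical weekly series whose values are 1, 2, 3, ... (five weeks in December 2011, four in January 2012, one in February 2012), so the restart of the running mean at each month boundary is easy to see. Here each monthly piece is rebuilt as an xts object from its own dates, and the result is attached with cbind() and named explicitly:

```r
library(xts)

# Hypothetical weekly series: 5 weeks in Dec 2011, 4 in Jan 2012, 1 in Feb 2012
weeks <- seq(as.Date("2011-12-02"), by = "week", length.out = 10)
x <- xts(as.numeric(seq_along(weeks)), order.by = weeks)

# Expanding (cumulative) mean of a numeric vector: running sum / running count
cummean <- function(v) cumsum(v) / seq_along(v)

# Split by calendar month, compute the expanding mean within each month,
# and rebuild each piece as xts with its original dates
pieces <- lapply(split(x, "months"),
                 function(m) xts(cummean(as.numeric(m)), order.by = index(m)))
ea <- do.call(rbind, pieces)

out <- cbind(x, ea)
colnames(out) <- c("Value", "EA")
print(out)
```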
I hope I got your question right, but is this what you are looking for?
require(plyr)
require(PerformanceAnalytics)
ddply(data, .(Week), summarize, Goal = apply.fromstart(Value, FUN = "mean"))
This should work, though a reproducible example would have been nice.
Here's what it does:
df <- data.frame(Week=rep(1:5, each=5), Value=c(1:25)*runif(25)) #sample data
require(plyr)
require(PerformanceAnalytics)
df$Goal <- ddply(df, .(Week), summarize, Goal=apply.fromstart(Value,FUN="mean"))[,2]
outcome:
Week Value Goal
1 1 0.7528037 0.7528037
2 1 1.9622622 1.3575330
3 1 0.3367802 1.0172820
4 1 2.5177284 1.3923936
Of course, you can get further info from the help: ?ddply or ?apply.fromstart.
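Because the Goal column in the question restarts at every month, the grouping variable is the month rather than the week. A base-R sketch with ave(), using hypothetical values laid out like the question's table (`wk` is an illustrative name):

```r
# Hypothetical data laid out like the table in the question
wk <- data.frame(
  Month = rep(c("Dec 2011", "Jan 2012", "Feb 2012"), times = c(4, 4, 2)),
  Week  = c("W50", "W51", "W52", "W53", "W01", "W02", "W03", "W04", "W05", "W06"),
  Value = c(2, 4, 6, 8, 1, 3, 5, 7, 10, 20)
)

# ave() applies FUN within each Month group and returns results in the original row order
wk$Goal <- ave(wk$Value, wk$Month, FUN = function(v) cumsum(v) / seq_along(v))
print(wk)
```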
