I would like to use rollapply or rollapplyr to apply the modwt function to my time series data.
I'm familiar with how rollapply/r works but I need some help setting up the output so that I can correctly store my results when using rollapply.
The modwt function in the waveslim package takes a time series and decomposes it into J levels. For my particular problem J = 4, which means I get 4 sets of detail coefficients (plus the scaling coefficients s4) from my single time series, stored in a list of 5. Of this list I am only concerned with d1, d2, d3 and d4.
The output of the modwt function looks as follows
> str(ar1.modwt)
List of 5
$ d1: num [1:200] -0.223 -0.12 0.438 -0.275 0.21 ...
$ d2: num [1:200] 0.1848 -0.4699 -1.183 -0.9698 -0.0937 ...
$ d3: num [1:200] 0.5912 0.6997 0.5416 0.0742 -0.4989 ...
$ d4: num [1:200] 1.78 1.86 1.85 1.78 1.65 ...
$ s4: num [1:200] 4.64 4.42 4.19 3.94 3.71 ...
- attr(*, "class")= chr "modwt"
- attr(*, "wavelet")= chr "la8"
- attr(*, "boundary")= chr "periodic"
In the example above I have applied the modwt function to the full length time series of length 200 but I wish to apply it to a small rolling window of 30 using rollapply.
I have already tried the following but the output is a large matrix and I cannot easily identify which values belong to d1,d2,d3 or d4
roller <- rollapplyr(ar1, 30,FUN=modwt,wf="la8",n.levels=4,boundary="periodic")
The output of this is a large matrix with the following structure:
> str(roller)
List of 855
$ : num [1:30] 0.117 -0.138 0.199 -1.267 1.872 ...
$ : num [1:30] -0.171 0.453 -0.504 -0.189 0.849 ...
$ : num [1:30] 0.438 -0.3868 0.1618 -0.0973 -0.0247 ...
$ : num [1:30] -0.418 0.407 0.639 -2.013 1.349 ...
...lots of rows omitted...
$ : num [1:30] 0.307 -0.658 -0.105 1.128 -0.978 ...
[list output truncated]
- attr(*, "dim")= int [1:2] 171 5
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] "d1" "d2" "d3" "d4" ...
How can I set up a variable such that it will store the (200-30)+1 lists with lists within this for each of the scales d1,d2,d3 and d4?
For a reproducible example please use the following:
library(waveslim)
data(ar1)
ar1.modwt <- modwt(ar1, "la8", 4)
1) Define modwt2, which invokes modwt, takes the first 4 components and strings them out into a numeric vector. Then use rollapplyr with that, giving rollr, where each row of rollr is the result of one call to modwt2. Finally, reshape each row of rollr into a separate 30 x 4 matrix and create a list, L, of those matrices:
library(zoo) # provides rollapplyr
modwt2 <- function(...) unlist(head(modwt(...), 4))
rollr <- rollapplyr(ar1, 30, FUN = modwt2, wf = "la8", n.levels = 4, boundary = "periodic")
L <- lapply(seq_len(nrow(rollr)), function(i) matrix(rollr[i, ], ncol = 4))
If a 30 x 4 x 171 array is desired then the following will simplify it into a 3d array:
simplify2array(L)
or as a list of lists:
lapply(L, function(x) as.list(as.data.frame(x)))
2) This is an alternate solution that just uses lapply directly and returns a list each of whose components is the list consisting of d1, d2, d3 and d4.
lapply(1:(200-30+1), function(i, ...) head(modwt(ar1[seq(i, length = 30)], ...), 4),
wf = "la8", n.levels = 4, boundary = "periodic")
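To see the shape of the result without waveslim installed, here is a minimal base-R sketch of the same windowing idea. Note that toy_decomp is a hypothetical stand-in I made up for modwt (it is not a wavelet transform), and x is made-up data of length 200; only the bookkeeping is the point.

```r
# Hypothetical stand-in for waveslim::modwt(): returns a named list of five
# equal-length vectors (d1..d4 and s4). The arithmetic is NOT a wavelet transform.
toy_decomp <- function(x) {
  list(d1 = x - mean(x), d2 = 2 * x, d3 = x^2, d4 = cumsum(x),
       s4 = rep(mean(x), length(x)))
}

x <- sin(seq_len(200) / 10)               # made-up series of length 200
width <- 30
starts <- seq_len(length(x) - width + 1)  # 171 window start positions

# One list element per window; each element is itself a list d1..d4
windows <- lapply(starts, function(i) {
  toy_decomp(x[i:(i + width - 1)])[c("d1", "d2", "d3", "d4")]
})

# Optional: bind each window into a 30 x 4 matrix and stack into a 3d array
arr <- simplify2array(lapply(windows, function(w) do.call(cbind, w)))
dim(arr)
```

With this layout, windows[[k]]$d2 gives the d2 values for the k-th window, and arr[, 2, k] gives the same values from the array.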
Updates: Code improvements, expand (1) and add (2).
I have 4 columns and 34 rows of data. I tried to export it to Excel in xlsx format using write.xlsx, but the resulting Excel file shows only one value.
library(openxlsx)
data = scale(DATA2)
write.xlsx(data, "outpu2t.xlsx");
This is my data
and this is the output
The key consideration here is that the output of the scale() function is an object of type matrix() when write.xlsx() requires an input of type data.frame(). The following code creates a data frame, uses scale() to scale it, and prints the structure to show that the data frame has been converted to a matrix().
df <- data.frame(matrix(runif(4 * 34),ncol=4))
str(df)
> df <- data.frame(matrix(runif(4 * 34),ncol=4))
> str(df)
'data.frame': 34 obs. of 4 variables:
$ X1: num 0.438 0.134 0.671 0.392 0.613 ...
$ X2: num 0.9 0.793 0.668 0.351 0.275 ...
$ X3: num 0.201 0.892 0.74 0.788 0.14 ...
$ X4: num 0.996 0.619 0.492 0.904 0.615 ...
scaledData <- scale(df)
str(scaledData)
> scaledData <- scale(df)
> str(scaledData)
num [1:34, 1:4] -0.174 -1.386 0.752 -0.36 0.521 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "X1" "X2" "X3" "X4"
- attr(*, "scaled:center")= Named num [1:4] 0.482 0.591 0.508 0.471
..- attr(*, "names")= chr [1:4] "X1" "X2" "X3" "X4"
- attr(*, "scaled:scale")= Named num [1:4] 0.251 0.206 0.306 0.29
..- attr(*, "names")= chr [1:4] "X1" "X2" "X3" "X4"
We can solve the problem by casting the output of scale() with data.frame().
The following code generates a 34 x 4 matrix, scales it, and casts to a data.frame() as part of write.xlsx().
aMatrix <- matrix(runif(4 * 34),ncol=4)
library(openxlsx)
write.xlsx(data.frame(scale(aMatrix)),"./data/aSpreadsheet.xlsx")
The resulting spreadsheet looks like this when viewed in Microsoft Excel.
Note that writexl::write_xlsx() will also fail when passed an input of type matrix(), so this is not a tidyverse vs. openxlsx problem.
library(writexl)
b <- scale(aMatrix)
write_xlsx(b,"./data/aSpreadsheetWritexl.xlsx")
...generates the following error:
> write_xlsx(b,"./data/aSpreadsheetWritexl.xlsx")
Error in write_xlsx(b, "./data/aSpreadsheetWritexl.xlsx") :
Argument x must be a data frame or list of data frames
I am cognisant that you asked this question pointing to the package {openxlsx}. I used this a while back as well and ran into multiple problems. Being biased and leaning towards the {tidyverse} family, there is a cool package that comes from that part of the R/RStudio ecosystem: {writexl}.
If not yet installed: install.packages("writexl")
Then run the following without pain; it does not require installing any other dependencies:
library(writexl)
# create a reproducible data set of 34 rows
my_data <- iris[1:34,]
# write-out my_data to the data subfolder in the project - configure as appropriate for your environment
write_xlsx(x = my_data, path = "./data/my_data.xlsx")
This gets you without problems:
The solution is to convert matrix data into a data frame.
data <- as.data.frame(data)
Then,
write.xlsx(data, "outpu2t.xlsx")
will work as expected.
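A quick way to confirm what is going on before writing any file is to check the class at each step. This is only a sketch on made-up data (raw is hypothetical, not the asker's DATA2), and it uses base R only, so it does not call write.xlsx itself.

```r
set.seed(1)
# Made-up 34 x 4 data frame standing in for the original data
raw <- data.frame(matrix(runif(4 * 34), ncol = 4))

scaled <- scale(raw)           # scale() returns a numeric matrix
is.matrix(scaled)
is.data.frame(scaled)

out <- as.data.frame(scaled)   # convert back before exporting
is.data.frame(out)
dim(out)
names(out)
```

The column names survive the round trip, so out can be passed to the export function in place of the matrix.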
I would like to generate multivariate random data manipulating the sample size and variance using MASS::mvrnorm (or, as the case may prove to be, rnorm). This is fairly straightforward, however, the trick is that I intend to simulate a statistic with this generated data that compares two different sample sizes (of different lengths). Essentially, this creates a 3X3 design where I have three levels of 2 different sample sizes (e.g. [450,150], [300,300], [150,450]) crossed with three levels of variance (e.g. [1,1], [1,3], [1,10]).
n <- c(450,150,300,300,150,450) # sample sizes
sig <- matrix(c(1,1,1,3,1,10), nrow=2, byrow=FALSE) # variance
mu <- c(5, 5, 5) # mean is constant across all conditions
mvrnorm(n, mu, sig)
I'm sure that I have to iterate through my vector of sample sizes, just as I would if I were only generating one sample size per condition. But since I'm generating two sample sizes for each condition, I am unsure how to do this.
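If I understand you correctly, you want six samples: 450 draws from N(5, 1), 150 draws from N(5, 1), 300 draws from N(5, 1), 300 draws from N(5, 3), 150 draws from N(5, 1), and 450 draws from N(5, 10). You can get that via the call below. Note that, as written, it leaves mean at rnorm's default of 0 and treats the second parameter as a standard deviation; add mean = 5 to centre the draws at 5.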
samples <- mapply(rnorm, n = c(450,150,300,300,150,450), sd = c(1,1,1,3,1,10))
Obviously I'll omit the output due to size, but you can see what I mean:
str(samples)
List of 6
$ : num [1:450] 0.785 -0.21 0.192 -0.265 -0.501 ...
$ : num [1:150] 1.224 -0.315 -0.131 -0.923 0.407 ...
$ : num [1:300] -0.413 -1.081 0.469 1.332 0.244 ...
$ : num [1:300] -0.748 -0.628 0.753 1.4 3.883 ...
$ : num [1:150] 0.376 -1.193 1.133 1.839 1.528 ...
$ : num [1:450] 2.19 -3.17 2.45 0.75 -8.4 ...
Then you want to run some sort of test between samples[[1]] and samples[[2]], then between samples[[3]] and samples[[4]], and finally between samples[[5]] and samples[[6]]. I don't know what test you intend to run, but that should be straightforward if you have a function for the test: Just feed in the proper list elements.
Update
Based on the comment, what you need to get all the sample combinations you want is
f <- function(sample_size_pairs, sd_pairs) {
  return(sapply(1:nrow(sample_size_pairs), function(i) {
    # use row i of both matrices so each group gets its own sd
    mapply(rnorm, n = sample_size_pairs[i, ], sd = sd_pairs[i, ])
  }))
}
sample_sizes <- matrix(c(rep(c(450, 150), 3), rep(c(150, 450), 3),
rep(c(300, 300), 3)), ncol = 2, byrow = TRUE)
sds <- matrix(rep(c(1, 1, 1, 3, 1, 10), 3), ncol = 2, byrow = TRUE)
g <- f(sample_sizes, sds)
str(g)
List of 9
$ :List of 2
..$ : num [1:450] 1.4243 1.733 0.5004 -0.8036 -0.0101 ...
..$ : num [1:150] -0.0607 0.1797 0.3787 -0.6676 -1.4352 ...
$ :List of 2
..$ : num [1:450] -0.0766 -0.1407 -0.4893 0.2251 1.0174 ...
..$ : num [1:150] -1.8814 -1.3532 -1.2888 -0.0542 0.2637 ...
$ :List of 2
..$ : num [1:450] 1.945 -1.375 -1.258 0.292 -0.208 ...
..$ : num [1:150] -1.291 -0.557 -1.199 1.385 -2.062 ...
$ :List of 2
..$ : num [1:150] -2.461 -0.345 -1.454 -0.286 0.942 ...
..$ : num [1:450] -0.75 -0.636 -0.488 1.818 -0.585 ...
$ :List of 2
..$ : num [1:150] -1.238 -0.765 -1.447 -1.153 -1.466 ...
..$ : num [1:450] 2.5461 0.9368 -0.0503 -0.9727 -1.4101 ...
$ :List of 2
..$ : num [1:150] 0.7209 2.4342 -0.7617 0.0285 -1.3297 ...
..$ : num [1:450] -0.6882 0.0927 -0.8981 -0.4088 1.3421 ...
$ : num [1:300, 1:2] 2.217 -0.161 -0.976 0.26 -0.362 ...
$ : num [1:300, 1:2] 0.456 -0.112 -0.541 3.759 0.32 ...
$ : num [1:300, 1:2] 0.165 0.247 -0.187 -0.624 -1.335 ...
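If a uniform structure is easier to work with (every cell a list of two vectors, even when both sizes are 300), something along these lines may help. It is a sketch under my own assumptions: the names size_pairs, var_pairs, design and sims are made up, the mean is fixed at 5, and the values 1, 3 and 10 are treated as variances as in the question, hence sqrt() when passing them to rnorm.

```r
set.seed(42)

# The three sample-size pairs and three variance pairs from the question
size_pairs <- list(c(450, 150), c(300, 300), c(150, 450))
var_pairs  <- list(c(1, 1), c(1, 3), c(1, 10))

# All 9 combinations of (size pair, variance pair); first column varies fastest
design <- expand.grid(size = seq_along(size_pairs), var = seq_along(var_pairs))

sims <- lapply(seq_len(nrow(design)), function(k) {
  n <- size_pairs[[design$size[k]]]
  v <- var_pairs[[design$var[k]]]
  list(g1 = rnorm(n[1], mean = 5, sd = sqrt(v[1])),
       g2 = rnorm(n[2], mean = 5, sd = sqrt(v[2])))
})

length(sims)              # one element per design cell
lengths(sims[[1]])        # sizes of the two groups in the first cell
```

Each cell can then be fed to the test of interest, for example by looping over sims and passing g1 and g2 to the test function.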
I need a faster way of doing linear regression than the lm() method. I found that lm.fit() is quite a bit faster but I'm wondering how to use the results. For example using this code:
x = 1:5
y = 5:1
regr = lm.fit(as.matrix(x), y)
str(regr)
Outputs:
List of 8
$ coefficients : Named num 0.636
..- attr(*, "names")= chr "x1"
$ residuals : num [1:5] 4.364 2.727 1.091 -0.545 -2.182
$ effects : Named num [1:5] -4.719 1.69 -0.465 -2.619 -4.774
..- attr(*, "names")= chr [1:5] "x1" "" "" "" ...
$ rank : int 1
$ fitted.values: num [1:5] 0.636 1.273 1.909 2.545 3.182
$ assign : NULL
$ qr :List of 5
..$ qr : num [1:5, 1] -7.416 0.27 0.405 0.539 0.674
..$ qraux: num 1.13
..$ pivot: int 1
..$ tol : num 1e-07
..$ rank : int 1
..- attr(*, "class")= chr "qr"
$ df.residual : int 4
I'm expecting intercept = 6 and slope = -1 but the result above doesn't contain anything near that. Also, does lm.fit() output r squared?
lm.fit lets you do things much more manually, so, as @MrFlick commented, we must include the intercept manually as well, using cbind(1, x) as the design matrix. The R^2 is not provided but we may easily compute it:
x <- 1:5
y <- 5:1 + rnorm(5)
regr <- lm.fit(cbind(1, x), y)
regr$coef
# x
# 5.2044349 -0.5535963
1 - var(regr$residuals) / var(y) # R^2
# [1] 0.3557227
1 - var(regr$residuals) / var(y) * (length(y) - 1) / regr$df.residual # Adj. R^2
# [1] 0.1409636
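As a sanity check, the hand-computed values can be compared against what lm() and summary() report. The data below are made up (with a fixed seed), and the names X, rss, tss and ref are just for this sketch.

```r
set.seed(123)
x <- 1:20
y <- 6 - x + rnorm(20)                    # made-up data: intercept 6, slope -1, plus noise

X <- cbind(`(Intercept)` = 1, x = x)      # design matrix with an explicit intercept column
fit <- lm.fit(X, y)

rss <- sum(fit$residuals^2)               # residual sum of squares
tss <- sum((y - mean(y))^2)               # total sum of squares
r2 <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (length(y) - 1) / fit$df.residual

ref <- summary(lm(y ~ x))                 # reference values from lm()
c(r2 = r2, lm_r2 = ref$r.squared)
c(adj_r2 = adj_r2, lm_adj_r2 = ref$adj.r.squared)
```

The R^2 formula here assumes the model contains an intercept; without one, summary.lm uses the uncentered total sum of squares instead.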
I am trying to use the penalizedLDA package to run a penalized linear discriminant analysis in order to select the "most meaningful" variables. I have searched here and on other sites for help in accessing the output from the penalized model, to no avail.
My data comprises 400 variables and 44 groups. The code I used and the results I have so far:
yy.m<-as.matrix(yy) #Factors/groups
xx.m<-as.matrix(xx) #Variables
cv.out<-PenalizedLDA.cv(xx.m,yy.m,type="standard")
## apply the penalty
out <- PenalizedLDA(xx.m,yy.m,lambda=cv.out$bestlambda,K=cv.out$bestK)
To get the structure of the output from the analysis:
> str(out)
List of 10
$ discrim: num [1:401, 1:4] -0.0234 -0.0219 -0.0189 -0.0143 -0.0102 ...
$ xproj : num [1:100, 1:4] -8.31 -14.68 -11.07 -13.46 -26.2 ...
$ K : int 4
$ crits :List of 4
..$ : num [1:4] 2827 2827 2827 2827
..$ : num [1:4] 914 914 914 914
..$ : num [1:4] 162 162 162 162
..$ : num [1:4] 48.6 48.6 48.6 48.6
$ type : chr "standard"
$ lambda : num 0
$ lambda2: NULL
$ wcsd.x : Named num [1:401] 0.0379 0.0335 0.0292 0.0261 0.0217 ...
..- attr(*, "names")= chr [1:401] "R400" "R405" "R410" "R415" ...
$ x : num [1:100, 1:401] 0.147 0.144 0.145 0.141 0.129 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:401] "R400" "R405" "R410" "R415" ...
$ y : num [1:100, 1] 2 2 2 2 2 1 1 1 1 1 ...
- attr(*, "class")= chr "penlda"
I am interested in obtaining a list or matrix of the top 20 variables for feature selection, more than likely based on the coefficients of the Linear discrimination.
I realized I would have to sort the coefficients in descending order, and get the variable names matched to it. So the output I would expect is something like this imaginary example
V1 V2
R400 0.34
R1535 0.22...
Can anyone provide any pointers (not necessarily the R code)? Thanks in advance.
Your out$K is 4, and that means you have 4 discriminant vectors. If you want the top 20 variables according to, say, the 2nd vector, try this:
# get the data frame of variable names and coefficients
var.coef = data.frame(colnames(xx.m), out$discrim[,2])
# sort the 2nd column (the coefficients) in decreasing order, and only keep the top 20
var.coef.top = var.coef[order(var.coef[,2], decreasing = TRUE)[1:20], ]
var.coef.top is what you want.
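If the sign of a coefficient does not matter for your purposes, you may prefer to rank by absolute value instead, since large negative coefficients would otherwise end up at the bottom. That is a variation on the code above, not the same ordering. The sketch below uses a made-up matrix in place of out$discrim, and top_vars is a hypothetical helper name.

```r
set.seed(7)
# Made-up stand-in for out$discrim: 401 variables x 4 discriminant vectors
var_names <- paste0("R", seq(400, by = 5, length.out = 401))
discrim <- matrix(rnorm(401 * 4), nrow = 401, dimnames = list(var_names, NULL))

# Hypothetical helper: top n variables for discriminant vector k,
# ranked by the absolute value of the coefficient
top_vars <- function(discrim, k, n = 20) {
  coefs <- discrim[, k]
  idx <- order(abs(coefs), decreasing = TRUE)[seq_len(n)]
  data.frame(variable = rownames(discrim)[idx],
             coefficient = unname(coefs[idx]))
}

top2 <- top_vars(discrim, k = 2)
head(top2)
```

With the real model you would pass out$discrim after setting its row names to colnames(xx.m), assuming the rows of discrim are in the same order as the columns of xx.m.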
My lack of understanding of R is causing me to grind to a halt in my work and seek your help. I'm looking to build a neural network from some time series data and then build a prediction using separate data and the model returned by the trained neural network.
I created an xts containing the dependent variable nxtCl (a one-day forward closing stock price) and the independent variables (a set of corresponding prices and technical indicators).
I split the xts in two, one set being training data and the other set for testing/prediction, these are miData.train and miData.test respectively. Subsequently I altered these two xts to be scaled data frames.
miData.train <- scale(as.data.frame(miData.train))
miDate.test <- scale(as.data.frame(miData.test))
Using the package nnet I am able to build a neural network from the training data:
nn <- nnet(nxtCl ~ .,data=miData.train,linout=T,size=10,decay=0.001,maxit=10000)
The str() output for this returned formula object is:
> str(nn)
List of 18
$ n : num [1:3] 11 10 1
$ nunits : int 23
$ nconn : num [1:24] 0 0 0 0 0 0 0 0 0 0 ...
$ conn : num [1:131] 0 1 2 3 4 5 6 7 8 9 ...
$ nsunits : num 22
$ decay : num 0.001
$ entropy : logi FALSE
$ softmax : logi FALSE
$ censored : logi FALSE
$ value : num 4.64
$ wts : num [1:131] 2.73 -1.64 1.1 2.41 1.36 ...
$ convergence : int 0
$ fitted.values: num [1:901, 1] -0.465 -0.501 -0.46 -0.431 -0.485 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:901] "2005-07-15" "2005-07-18" "2005-07-19" "2005-07-20" ...
.. ..$ : NULL
$ residuals : num [1:901, 1] -0.0265 0.0487 0.0326 -0.0384 0.0632 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:901] "2005-07-15" "2005-07-18" "2005-07-19" "2005-07-20" ...
.. ..$ : NULL
$ call : language nnet.formula(formula = nxtCl ~ ., data = miData.train, inout = T, size = 10, decay = 0.001, maxit = 10000)
$ terms : language nxtCl ~ Op + Hi + Lo + Cl + vul + smaten + smafif + smath + vol + rsi + dvi
$ coefnames : chr [1:11] "Op" "Hi" "Lo" "Cl" ...
$ xlevels : Named list()
- attr(*, "class")= chr [1:2] "nnet.formula" "nnet"
I then try to run the prediction function using this model nn and the data I kept separate miData.test using the following function:
preds <- predict(object=nn, miData.test)
and I get the following error:
Error in terms.default(object, data = data) :
no terms component nor attribute
Running terms.default on miData.test I see that my data frame does not have any attributes:
terms.default(miData.test)
Error in terms.default(miData.test) : no terms component nor attribute
but is this why the prediction will not run?
miData.test has names that match the terms of nn:
> nn$terms
nxtCl ~ Op + Hi + Lo + Cl + vul + smaten + smafif + smath + vol +
rsi + dvi
> names(miData.test)
[1] "Op" "Hi" "Lo" "Cl" "vul" "smaten" "smafif" "smath" "vol" "rsi" "dvi" "nxtCl"
And, in terms of structure, the data is exactly the same as that which was used to build nn in the first place. I tried adding my own named attributes to miData.test, matching the terms of nn but that did not work. The str() of miData.test returns:
> str(miData.test)
'data.frame': 400 obs. of 12 variables:
$ Op : num 82.2 83.5 80.2 79.8 79.8 ...
$ Hi : num 83.8 84.2 83 79.9 80.2 ...
$ Lo : num 81 82.7 79.2 78.3 78 ...
$ Cl : num 83.7 82.8 79.2 79 78.2 ...
$ vul : num 4.69e+08 2.94e+08 4.79e+08 3.63e+08 3.17e+08 ...
$ smaten: num 84.1 84.1 83.8 83.3 82.8 ...
$ smafif: num 86.9 86.8 86.7 86.6 86.4 ...
$ smath : num 111 111 111 110 110 ...
$ vol : num 0.335 0.341 0.401 0.402 0.382 ...
$ rsi : num 45.7 43.6 36.6 36.3 34.7 ...
$ dvi : num 0.00968 0.00306 -0.01575 -0.01189 -0.00623 ...
$ nxtCl : num 82.8 79.2 79 78.2 77.4 ...
Any help or insight in getting predict() to work in this instance would be greatly appreciated. Thanks.
Here's some reproducible code. In putting this together, I have 'removed' the error. Unfortunately, although it now works, I am none the wiser as to what was causing the problem before:
require(quantstrat)
require(PerformanceAnalytics)
require(nnet)
initDate <- "2004-09-30"
endDate <- "2010-09-30"
symbols <- c("SPY")
getSymbols(symbols, from=initDate, to=endDate, index.class=c("POSIXt","POSIXct"))
rsi <- RSI(Cl(SPY))
smaTen <- SMA(Cl(SPY))
smaFif <- SMA(Cl(SPY),n=50)
nxtCl <- lag(Cl(SPY),-1)
tmp <- SPY[,-5]
tmp <- tmp[,-5]
miData <- merge(tmp,rsi,smaTen,smaFif,nxtCl)
names(miData) <- c("Op","Hi","Lo","Cl","rsi","smaTen","smaFif","nxtCl")
miData <- miData[50:1512]
scaled.miData <- scale(miData)
miData.train <- as.data.frame(scaled.miData[1:1000])
miData.test <- as.data.frame(scaled.miData[1001:1463])
nn <- nnet(nxtCl ~ .,data=miData.train,linout=T,size=10,decay=0.001,maxit=10000)
preds <- predict(object=nn, miData.test)
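One thing worth checking in situations like this is what class the scaled objects actually have, because scale() returns a matrix even when given a data frame, so the order of scale() and as.data.frame() matters. The sketch below is not a diagnosis of the original error; it only shows, on made-up data (dat, train and test are hypothetical names), how to scale a test set with the training set's centre and spread and end up with data frames on both sides.

```r
set.seed(1)
# Made-up data standing in for the price/indicator columns
dat <- data.frame(a = rnorm(100, mean = 50, sd = 10),
                  b = rnorm(100, mean = 5,  sd = 2))
train <- dat[1:70, ]
test  <- dat[71:100, ]

train_sc <- scale(train)                      # a matrix, not a data frame
ctr <- attr(train_sc, "scaled:center")        # column means of the training set
scl <- attr(train_sc, "scaled:scale")         # column sds of the training set

# Reuse the training centre and scale for the test set
test_sc <- scale(test, center = ctr, scale = scl)

# Convert to data frames last, so formula-based fitting and predict() see named columns
train_df <- as.data.frame(train_sc)
test_df  <- as.data.frame(test_sc)
class(train_sc)
class(train_df)
```

Whether to reuse the training statistics or scale each set separately is a modelling choice; the point here is only the order of the conversions.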