Incredibly basic question. I'm brand new to R. I feel bad for asking, but also like someone will crush it:
I'm trying to generate a number of vectors with a for loop, each with a unique name, numbered by iteration. The code I'm attaching throws an error, but I think it explains fairly well what I'm trying to do in principle.
Thanks in advance.
vectorBuilder <- function(num){
for (x in num){
paste0("vec",x) <- rnorm(10000, mean = 0, sd = 1)}
}
numSeries <- 1:10
vectorBuilder(numSeries)
You can write the function to return a named list:
create_vector <- function(n) {
setNames(replicate(n, rnorm(10000), simplify = FALSE),
paste0('vec', seq_len(n)))
}
and call it as:
data <- create_vector(10)
data will be a list of length 10, with each element being a vector of length 10000. It is better to keep the data in this list than to create a lot of separate vectors in the global environment. However, if you still want separate vectors, you can use list2env:
list2env(data, .GlobalEnv)
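A minimal sketch of working with the list directly, and of unpacking it into a dedicated environment instead of the global one. The `size` argument, the small dimensions, and the names `vecs`, `vec_means` and `env` are assumptions made for illustration only.

```r
# Assumed for illustration: 3 vectors of length 5 instead of 10 of length 10000
create_vector <- function(n, size = 5) {
  setNames(replicate(n, rnorm(size), simplify = FALSE),
           paste0("vec", seq_len(n)))
}

vecs <- create_vector(3)

names(vecs)                # "vec1" "vec2" "vec3"
vecs[["vec2"]]             # access one vector by name
vecs[[paste0("vec", 3)]]   # or build the name programmatically

# Apply a function to every vector in one call
vec_means <- sapply(vecs, mean)

# If separate objects are really needed, unpack into a new environment
# rather than cluttering the global one
env <- new.env()
list2env(vecs, envir = env)
exists("vec1", envir = env, inherits = FALSE)
```

Keeping the vectors in a list means that loops and apply-style functions can reach all of them without constructing variable names from strings.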
In R one can use the <<- symbol within the lapply() function to assign a value to a variable outside lapply().
Let's consider a matrix full of 1:
m<-matrix(data=1, nrow=5, ncol=5)
Let's say I want to replace each row with the values 1, 2, 3, 4 and 5 using the assignment operator <<-. I can use the lapply function (it is not designed for that kind of operation; this is only an example):
lapply(X = seq(nrow(m)), FUN = function(r){
m[r,]<<-seq(5)
})
This will work.
But if I now use mclapply like this:
mclapply(X = seq(nrow(m)), FUN = function(r){
m[r,]<<-seq(5)
})
The matrix m will remain full of 1s.
The idea is to apply changes to the rows of a matrix without creating a new one, assigning them in the existing one instead. The only constraint is to use a function from the parallel package (e.g. mclapply(), but maybe another function would be a better fit).
Also, using the <<- operator is not mandatory.
How can I do that?
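You can't assign in parallel this way: mclapply() forks child processes, so each <<- only modifies that child's own copy of the matrix, and the m in the parent session is never touched.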
Two solutions:
Use shared memory (e.g. matrices on disk using package {bigstatsr}; disclaimer: I'm the author)
Don't assign in the first place. Just run lapply(), collect the resulting rows in a list, and combine them with do.call("rbind", list).
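The second option can be sketched as follows. This is only an illustration under assumptions: the core count and the names `n_cores` and `rows` are made up, and the per-row computation is simply the `seq(5)` from the question.

```r
library(parallel)

m <- matrix(data = 1, nrow = 5, ncol = 5)

# mclapply() relies on forking, which is unavailable on Windows,
# so fall back to one core there (assumption: 2 cores elsewhere)
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each worker returns its new row instead of assigning into m
rows <- mclapply(seq_len(nrow(m)), function(r) seq(5), mc.cores = n_cores)

# Combine the rows in the parent process and replace m in a single step
m <- do.call(rbind, rows)
```

The workers never write to shared state here; the only assignment happens once, in the parent session.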
How about this, using the future package?
library(future)
# plan(multiprocess) is deprecated in recent releases of future; multisession is the portable replacement
plan(multisession)
m <- matrix(data = 1, nrow = 5, ncol = 5)
# we create a set of futures, so the values are calculated in parallel and
# not sent back to the main environment
fs <- lapply(seq(nrow(m)), function(x) future(seq(5) + x))
# we then pull the values one by one and assign them where they belong
for (i in seq(nrow(m))) {
m[i, ] <- value(fs[[i]])
}
# or the same way you did it:
lapply(X = seq(nrow(m)), FUN = function(r){
m[r,] <<- value(fs[[r]])
})
The drawback here is that the values are assigned sequentially, but at least they are calculated in parallel. I don't think you intend to use the matrix before all the calculations are done anyway.
I'm sure there is an easy answer. I have a loop, where for each iteration, I create a new vector to store the results. I do this by pasting a name together and then assigning that name to an empty vector.
for (i in seq(1, 50)) {
current_iteration = i
x = paste0("resultsVec", current_iteration)
assign(x, rep(NA, 43))
paste0("resultsVec", i)
for (j in seq(1, 100))
{
resultsVeci[j] = j * j # <- problem here
}
}
However, you obviously can't refer to 'resultsVeci', so how do I refer to the iteration-specific vector in each loop?
If you do paste0("resultsVec", i), where i=2 for example, it returns a string "resultsVec2", rather than the object resultsVec2. How do I refer to the object rather than the string?
Thanks.
It really isn't a good idea to use get() and assign() with most R code. (Why is using assign bad?). Better to just use a list. A simple lapply would work here.
resultsVec<-lapply(1:50, function(i) (1:100)*(1:100))
and then you can get the values with resultsVec[[1]], resultsVec[[2]], etc.
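If the results really do have to be filled in element by element inside nested loops, the same list idea still applies. The following is a rough sketch, not the asker's actual computation: the sizes are reduced and the names `n_vectors`, `results` and `current` are assumptions.

```r
n_vectors <- 3   # assumed small size for illustration (the question uses 50)

# One named slot per iteration, created up front
results <- vector("list", n_vectors)
names(results) <- paste0("resultsVec", seq_len(n_vectors))

for (i in seq_len(n_vectors)) {
  current <- rep(NA_real_, 10)
  for (j in seq_along(current)) {
    current[j] <- j * j
  }
  # Store the finished vector under its iteration-specific name
  results[[paste0("resultsVec", i)]] <- current
}

results[["resultsVec2"]]   # look up by a string, no get() needed
results$resultsVec2[3]     # 9
```

Indexing a list with `[[` accepts a string, which is exactly what the question was trying to do with a pasted variable name.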
I am new to R, so I'm sorry if this is not a good question.
I have several data frames called matrix1, matrix2, etc.
I want to use these 2 commands in a loop for all of them:
A1=as.matrix(matrix1)
B1=graph.adjacency(A1,mode="directed",weighted=NULL,diag=FALSE)
but I cannot figure out how to get the loop to change the names of the matrices.
Thank you in advance!
You can use get to get a variable by its name.
e.g.
for (i in 1:n) {
A1 = as.matrix(get(paste0('matrix', i)))
B1 = graph.adjacency(A1,mode="directed",weighted=NULL,diag=FALSE)
}
If you want to store the B1s, you could do so using (for example) a list:
Bs <- lapply(1:n, function (i) {
A1 = ...
B1 = ...
return(B1)
})
Then Bs[[i]] will contain the B1 of matrix i.
And then, a further improvement - rather than manually naming all your matrices matrix1, matrix2, ... , matrix10000 (particularly if you have a lot of them!), it would be better to store them in a list, e.g. As[[i]] is matrixi. (I can't give you specific code on how to do this, as it depends on where your matrices come from/how they are populated. e.g. you might lapply(list_of_filenames, read.csv) to read all the matrices from a list of file names).
Then you can:
Bs <- lapply(As, graph.adjacency, mode="directed", weighted=NULL, diag=FALSE)
without resorting to get.
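If the numbered objects already exist in the workspace, base R's mget() can gather them into a list in one call. This is a sketch under assumptions: the two small data frames are made-up stand-ins for the asker's data, and the graph construction step is left out so that no extra package is needed.

```r
# Stand-ins for the asker's numbered data frames (made up for illustration)
matrix1 <- data.frame(a = c(0, 1), b = c(1, 0))
matrix2 <- data.frame(a = c(0, 0), b = c(1, 0))
n <- 2

# mget() looks up several objects by name and returns them as a named list
dfs <- mget(paste0("matrix", seq_len(n)))

# Convert every data frame to a matrix in one pass
As <- lapply(dfs, as.matrix)

names(As)   # "matrix1" "matrix2"
```

From here, As can be passed to lapply() together with the graph-building function, as in the answer above.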
Use assign() to create matrices/data.frames in loops. Use get() when calling a numbered matrix/data.frame in your loop.
for (i in 1:n) {
assign(paste0("A", i), unname(as.matrix(get(paste0("matrix", i)))))
assign(paste0("B", i), graph.adjacency(get(paste0("A", i)),
mode = "directed",
weighted = NULL,
diag = FALSE))
}
I have a large data set, and I want to apply several functions at once and extract one parameter from each result.
The test dataset:
testdf <- data.frame(vy = rnorm(60), vx = rnorm(60) , gvar = rep(c("a","b"), each=30))
I first defined a list of functions:
require(fBasics)
normfuns <- list(jarqueberaTest=jarqueberaTest, shapiroTest=shapiroTest, lillieTest=lillieTest)
Then a function to perform the tests by the grouping variable
mynormtest <- function(d) {
norm_test <- res_reg <- list()
for (i in c("a","b")){
res_reg[[i]] <- residuals(lm(vy~vx, data=d[d$gvar==i,]))
norm_test[[i]] <- lapply(normfuns, function(f) f(res_reg[[i]]))
}
return(norm_test)
}
mynormtest(testdf)
I obtain a list of test summaries for each grouping variable.
However, I am interested in getting only the parameter "STATISTIC" and I did not manage to find out how to extract it.
You can obtain the value stored as "STATISTIC" in the output of the various tests with
res_list <- mynormtest(testdf)
res_list$a$shapiroTest@test$statistic
res_list$a$jarqueberaTest@test$statistic
res_list$a$lillieTest@test$statistic
And correspondingly for set b:
res_list$b$shapiroTest@test$statistic
res_list$b$jarqueberaTest@test$statistic
res_list$b$lillieTest@test$statistic
Hope this helps.
Concerning your function fgetparam I think that it is a nice starting point. Here's my suggestion with a few minor modifications:
getparams2 <- function(myp) {
m <- matrix(NA, nrow=length(myp), ncol=3)
for (i in (1:length(myp))){
m[i,] <- sapply(1:3,function(x) myp[[i]][[x]]@test$statistic)}
return(m)
}
This function is a minor generalization in the sense that it allows for an arbitrary number of groups, while in your case this was fixed to two, a and b. The code can certainly be shortened further, but it might then also become somewhat more cryptic. I believe that when developing code it helps to strike a compromise between efficiency and compactness on the one hand and readability on the other.
Edit
As pointed out by @akrun and @Roland, the function getparams2() can be written in a much shorter and more elegant form. One possibility is
getparams2 <- function(myp) {
matrix(unname(rapply(myp, function(x) x@test$statistic)),ncol=3)}
Another great alternative is
getparams2 <- function(myp){t(sapply(myp, sapply, function(x) x@test$statistic))}
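To see how the nested sapply() and the mix of @ and $ fit together without installing fBasics, here is a sketch with a hypothetical stand-in class. `mockTest`, `mock()` and all the numbers are invented for illustration; only the shape (an S4 object whose `test` slot is a list with a `statistic` element) mirrors what the answer above relies on.

```r
# Hypothetical stand-in for the S4 objects returned by the tests:
# a class with a 'test' slot holding a list that has a 'statistic' element
setClass("mockTest", slots = c(test = "list"))
mock <- function(s) new("mockTest", test = list(statistic = s))

# Same nesting as the output of mynormtest(): group -> test -> result
res_list <- list(
  a = list(jb = mock(1.1), sw = mock(0.97), lf = mock(0.08)),
  b = list(jb = mock(2.3), sw = mock(0.95), lf = mock(0.11))
)

# '@' reaches into the S4 slot, '$' into the list stored in that slot
stats <- t(sapply(res_list, sapply, function(x) x@test$statistic))
stats   # one row per group, one column per test
```

The inner sapply() runs over the tests within one group, the outer one over the groups, and t() puts the groups in rows.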
I am having trouble optimising a piece of R code. The following example code should illustrate my optimisation problem:
Some initialisations and a function definition:
a <- c(10,20,30,40,50,60,70,80)
b <- c("a","b","c","d","z","g","h","r")
c <- c(1,2,3,4,5,6,7,8)
myframe <- data.frame(a,b,c)
values <- vector(length=columns)
solution <- matrix(nrow=nrow(myframe),ncol=columns+3)
myfunction <- function(frame,columns){
athing = 0
if(columns == 5){
athing = 100
}
else{
athing = 1000
}
values[columns+1] = athing
return(values)}
The problematic for-loop looks like this:
columns = 6
for(i in 1:nrow(myframe)){
values <- myfunction(as.matrix(myframe[i,]), columns)
values[columns+2] = i
values[columns+3] = myframe[i,3]
#more columns added with simple operations (i.e. sum)
solution <- rbind(solution,values)
#solution is a large matrix from outside the for-loop
}
The problem seems to be the rbind function. I frequently get error messages regarding the size of solution, which seems to be too large after a while (more than 50 MB).
I want to replace this loop and the rbind with a list and lapply and/or foreach. I have started by converting myframe to a list.
myframe_list <- lapply(seq_len(nrow(myframe)), function(i) myframe[i,])
I have not really come further than this, although I tried applying this very good introduction to parallel processing.
How do I have to reconstruct the for-loop without having to change myfunction? Obviously I am open to different solutions...
Edit: This problem seems to be straight from the 2nd circle of hell from the R Inferno. Any suggestions?
The reason that using rbind in a loop like this is bad practice is that in each iteration you enlarge your solution object and then copy it to a new object, which is a very slow process and can also lead to memory problems. One way around this is to create a list whose ith component stores the output of the ith loop iteration. The final step is to call rbind on that list (just once, at the end). This will look something like
my.list <- vector("list", nrow(myframe))
for(i in 1:nrow(myframe)){
# Call all necessary commands to create values
my.list[[i]] <- values
}
solution <- rbind(solution, do.call(rbind, my.list))
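A self-contained sketch of the same pattern, for illustration only: `row_values()` is a hypothetical stand-in for whatever the loop body computes (it is not the asker's myfunction), and the data frame is shortened to three rows.

```r
myframe <- data.frame(a = c(10, 20, 30), b = c("a", "b", "c"), c = c(1, 2, 3))

# Hypothetical per-row computation standing in for the real loop body
row_values <- function(i) {
  c(index = i, third = myframe[i, 3], total = myframe[i, 1] + myframe[i, 3])
}

# Preallocate the list, then fill one slot per iteration
my.list <- vector("list", nrow(myframe))
for (i in seq_len(nrow(myframe))) {
  my.list[[i]] <- row_values(i)
}

# A single rbind at the end instead of one per iteration
solution <- do.call(rbind, my.list)
```

The loop could equally be written as lapply(seq_len(nrow(myframe)), row_values); either way the matrix is only assembled once.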
A bit too long for a comment, so I put it here:
If columns is known in advance:
myfunction <- function(frame){
athing = 0
if(columns == 5){
athing = 100
}
else{
athing = 1000
}
values[columns+1] = athing
return(values)}
apply(myframe, 1, myfunction)
If columns is not available from the enclosing environment, you can use:
apply(myframe, 1, myfunction, columns) with your original myfunction definition.