iteratively create variable in for loop - r

I want to run a for loop that does a vector-matrix operation and returns a vector suffixed by the iteration number.
E.g:
If I have a 5 by 5 matrix , I want to take each column of the matrix at a time (at each iteration of the for loop) and work on a bunch of operations and at the end of it get a vector that is labelled as v_i where i refers to the column index and also the iteration number of the loop. I understand that this can be achieved in a for loop but I am not sure how to label the variable at each iteration.
For instance if I had to do this in SAS, I would have used v&i and put in a macro and run it. But not sure what is the R equivalent of this iterative labelling of variables.
Would really appreciate any help on this. I have a homework due next week and am in real crunch time.
Thanks!

It's not good programming practice to create objects out of a loop like that. It's better to put them in one object (say, a list) and populate the list from inside the loop. You can even create the list before running the loop.
I'm not sure if I understood correctly, but here's a very simple example that calculates the mean per column. To me, your question didn't actually reveal what you try to accomplish.
mat <- matrix(1:25, ncol=5)
lst <- as.list(numeric(ncol(mat)))
names(lst) <- sapply(1:ncol(mat), function(x) paste("v_",x,sep=""))
myfun <- mean
for(i in 1:ncol(mat)){
lst[[i]] <- myfun(mat[,i])
}
Cheers!

You'll be better served if you spend a bit of time learning how R works and how it is different from SAS. Lucky for you, there are resources dedicated to such a learning curve - see here.
In this case, you don't need to use a loop at all. I also doubt that you actually want a list output as SimonG suggests, rather a simple vector. Here's an example:
mat <- matrix(1:25, ncol=5)
#Give the matrix some names
colnames(mat) <- paste0("col_", 1:5)
#compute the column means
colMeans(mat)
---
col_1 col_2 col_3 col_4 col_5
3 8 13 18 23

Related

apply fisher test in a large dataset that join all contingency tables

I have a dataset like this:
contingency_table<-tibble::tibble(
x1_not_happy = c(1,4),
x1_happy = c(19,31),
x2_not_happy = c(1,4),
x2_happy= c(19,28),
x3_not_happy=c(14,21),
X3_happy=c(0,9),
x4_not_happy=c(3,13),
X4_happy=c(17,22)
)
in fact, there are many other variables that come from a poll aplied in two different years.
Then, I apply a Fisher test in each 2X2 contingency matrix, using this code:
matrix1_prueba <- contingency_table[1:2,1:2]
matrix2_prueba<- contingency_table[1:2,3:4]
fisher1<-fisher.test(matrix1_prueba,alternative="two.sided",conf.level=0.9)
fisher2<-fisher.test(matrix2_prueba,alternative="two.sided",conf.level=0.9)
I would like to run this task using a short code by mean of a function or a loop. The output must be a vector with the p_values of each questions.
Thanks,
Frederick
So this was a bit of fun to do. The main thing that you need to recognize is that you want combinations of your data. There are a number of functions in R that can do that for you. The main workhorse is combn() Link
So in the language of the problem, we want all combinations of your tibble taken 2 at a time link2
From there, you just need to do some looping structure to get your tests to work, and extract the p-values from the object.
list_tables <- lapply(combn(contingency_table,2,simplify=F), fisher.test)
unlist(lapply(list_tables, `[`, 'p.value'))
This should produce your answer.
EDIT
Given the updated requirements for just adjacement data.frame columns, the following modifications should work.
full_list <- combn(contingency_table,2,simplify=F)
full_list <- full_list[sapply(
full_list, function(x) all(startsWith(names(x), substr(names(x)[1], 1,2))))]
full_list <- lapply(full_list, fisher.test)
unlist(lapply(full_list, `[`, 'p.value'))
This is approximately the same code as before, but now we have to find the subsets of the data that have the same question prefix name. This only works if the prefixes are exactly the same (X3 != x3). I think this is a better solution than trying to work with column indexes, and without the guarantee of always being next to one another. The sapply code does just that. The final output should be what you need for the problem.

Reassign variables to elements of list

I have a list of objects in R on which I perform different actions using lapply. However in the next step I would like to apply functions only to certain elements of the list. Therefore I would like to split the list into the original variables again. Is there a command in R for this? Is it even possible, or do I have to create new variables each time I want to do this?
See the following example to make clear what I mean:
# 3 vectors:
test1 <- 1:3
test2 <- 2:6
test3 <- 8:9
# list l:
l <- list(test1,test2,test3)
# add 3 to each element of the list:
l <- lapply(l, function(x) x+3)
# In effect, list l contains modified versions of the three test vectors
Question: How can I assign those modifications to the original variables again? I do not want to do:
test1 <- l[[1]]
test2 <- l[[2]]
test3 <- l[[3]]
# etc.
Is there a better way to do that?
A more intuitive approach, assuming you're new to R, might be to use a for loop. I do think that Richard Scriven's approach is better. It is at least more concise.
for(i in seq(1, length(l))){
name <- paste0("test",i)
assign(name, l[[i]] + 3)
}
That all being said, your ultimate goal is a bit dubious. I would recommend that you save the results in a list or matrix, especially if you are new to R. By including all the results in a list or matrix you can continue to use functions like lapply and sapply to manipulate your results.
Loosely speaking Richard Scriven's approach in the comments is converting each element of your list into an object and then passing these objects to the enclosing environment which is the global environment in this case. He could have passed the objects to any environment. For example try,
e <- new.env()
list2env(lapply(mget(ls(pattern = "test[1-3]")), "+", 3), e)
Notice that test1, test2 and test3 are now in environment e. Try e$test1 or ls(e). Going deeper in to the parenthesis, the call to ls uses simple regular expressions to tell mget the names of the objects to look for. For more, take a look at http://adv-r.had.co.nz/Environments.html.

Making matrix in R

I want to make matrices without using loops such as for , while.
So I tried assigned k and put k in function which makes matrices.
powlist= function(base,startnum,endnum) (base)^(startnum:endnum)
m_maker= function(base) matrix(c(powlist(base,0,19)),4,5)
k= 2:10
a= m_maker((k-1)/k)
But function returns only one matrix.
I think function should return 9 matrices.
Please let me know how should I change this code.
I want to make each matrices that first one is matrix m_maker(1/2) and
second one m_maker(2/3) so on.
When I put k=2 and k=3 each time, it returns what I want.
What I want is way to return 9 matrices at one to go.
You're looking for lapply, like
res <- lapply((k-1)/k, m_maker)
However, you really should use an array for something like this.
ares <- abind(res, along=3)

How to assign submatrices in elements of a list

For example:
Let M be some matrix mXn matrix where n is large enough to make manual entry impossible.
tmp_list[1] <- M[,1:10]
tmp_list[2] <- M[,11:20]
.
.
.
tmp_list[last] <- M[end - 9,end]
The problem I'm working on is sort of monte carlo, repeating an experiment involving a random mXn matrix 100K times. I'm still pretty new to R, I've done it using a for loop, but it obviously took a very long time. So I'm hoping to assign each "experiment" to an element of a list and use lapply.
let's take the easy case, and you can expand it from there
say n=100, develop your start indeces
n<-100
byParam<-10
starts<-seq(1, n-(byParam-1), by=byParam)
then lapply
tmp_list<-lapply(starts, function(startIndex) M[, startIndex:(startIndex+(byParam-1)])
just one way to do it, becomes a bit more complicated if n is not a nice multiple of 10 (or whatever you set the "byParam" equal to). If that is the case then you can develop your start and end indeces, and then use mapply instead
#given start and end indeces
tmp_list<-mapply(function(startInd, endInd){
M[, startInd:endInd},
startInd=starts, endInd=ends)
Now lapply and mapply are still iterative, so I wouldn't expect massive improvement on time efficiency
EDIT
After discussion in the comments, here is a solution for the entire set up, not just the above question
tmp_list<-lapply(1:1000, function(i){
vect<-sample(c(0,1), 10*1000, replace=TRUE)
dim(vect)<-c(10, 1000)
vect
})
Let's break this down, it makes everything very simple.
We first create a random sample of 1's and 0's, of the length 10*1000 (the number of elements in each sub-matrix). We can then neatly convert that vector to a matrix by assigning it's dim attribute to be c(10, 1000), which changes its form to have 10 rows and 1000 columns. Then we return that into a list at the index i. We lapply over 1:1000, or iterate 1000 times.

is.na() in R for loop not quite understood

I am confused by the behavior of is.na() in a for loop in R.
I am trying to make a function that will create a sequence of numbers, do something to a matrix, summarize the resulting matrix based on the sequence of numbers, then modify the sequence of numbers based on the summary and repeat. I made a simple version of my function because I think it still gets at my problem.
library(plyr)
test <- function(desired.iterations, max.iterations)
{
rich.seq <- 4:34 ##make a sequence of numbers
details.table <- matrix(nrow=length(rich.seq), ncol=1, dimnames=list(rich.seq))
##generate a table where the row names are those numbers
print(details.table) ##that's what it looks like
temp.results <- matrix(nrow=10, ncol=2, dimnames=list(1:10))
##generate some sample data to summarize and fill into details.table
temp.results[,1] <- rep(5:6, 5)
temp.results[,2] <- rnorm(10)
print(temp.results) ##that's what it looks like
details.table[,1][row.names(details.table) %in% count(temp.results[,1])$x] <-
count(temp.results[,1])$freq
##summarize, subset to the appropriate rows in details.table, and fill in the summary
print(details.table)
for (i in 1:max.iterations)
{
rich.seq <- rich.seq[details.table < desired.iterations | is.na(details.table)]
## the idea would be to keep cutting this sequence of numbers down with
## successive iterations until the desired number of iterations per row in
## details.table was reached. in other words, in the real code i'd do
## something to details.table in the next line
print(rich.seq)
}
}
##call the function
test(desired.iterations=4, max.iterations=2)
On the first run through the for loop the rich.seq looks like I'd expect it to, where 5 & 6 are no longer in the sequence because both ended up with more than 4 iterations. However, on the second run, it spits out something unexpected.
UPDATE
Thanks for your help and also my apologies. After re-reading my original post it is not only less than clear, but I hadn't realized count was part of the plyr package, which I call in my full function but wasn't calling here. I'll try and explain better.
What I have working at the moment is a function that takes a matrix, randomizes it (in any of a number of different ways), then calculates some statistics on it. These stats are temporarily stored in a table--temp.results--where temp.results[,1] is the sum of the non zero elements in each column, and temp.results[,2] is a different summary statistic for that column. I save these results to a csv file (and append them to the same file at subsequent iterations), because looping through it and rbinding hogs a lot of memory.
The problem is that certain column sums (temp.results[,1]) are sampled very infrequently. In order to sample those sufficiently requires many many iterations, and the resulting .csv files would stretch into the hundreds of gigabytes.
What I want to do is create and then update a table (details.table) at each iteration that keeps track of how many times each column sum actually got sampled. When a given element in the table reaches the desired.iterations, I want it to be excluded from the vector rich.seq, so that only columns that haven't received the desired.iterations are actually saved to the csv file. The max.iterations argument will be used in a break() statement in case things are taking too long.
So, what I was expecting in the example case is the exact same line for rich.seq for both iterations, since I didn't actually do anything to change it. I believe that flodel is definitely right that my problem lies in comparing a matrix (details.table) of length longer than rich.seq, leading to unexpected results. However, I don't want the dimensions of details.table to change. Perhaps I can solve the problem implementing %in% somehow when I redefine rich.seq in the for loop?
I agree you should improve your question. However, I think I can spot what is going wrong.
You compute details.table before the for loop. It is a matrix with same length as rich.seq when it was first initialized (length(4:34), i.e. 31).
Inside the for loop, details.table < desired.iterations | is.na(details.table) is then a logical vector of length 31. On the first loop iteration,
rich.seq <- rich.seq[details.table < desired.iterations | is.na(details.table)]
will result in reducing the length of rich.seq. But on the second loop iteration, unless details.table is redefined (not the case), you are trying to subset rich.seq by a logical vector of longer length than rich.seq. This will certainly lead to unexpected results.
You probably meant to redefine details.table as part of your for loop.
(Also I am surprised to see you never used temp.results[,2].)
Thanks to flodel for setting me off on the right track. It had nothing to do with is.na but rather the lengths of vectors I was comparing.
That said, I set the initial values of the details.table to zero to avoid the added complexity of the is.na statement.
This code works, and can be modified to do what I described above.
library(plyr)
test <- function(desired.iterations, max.iterations)
{
rich.seq <- 4:34 ##make a sequence of numbers
details.table <- matrix(nrow=length(rich.seq), ncol=1, dimnames=list(rich.seq)) ##generate a table where the row names are those numbers
details.table[,1] <- 0
print(details.table) ##that's what it looks like
temp.results <- matrix(nrow=10, ncol=2, dimnames=list(1:10)) ##generate some sample data to summarize and fill into details.table
temp.results[,1] <- rep(5:6, 5)
temp.results[,2] <- rnorm(10)
print(temp.results) ##that's what it looks like
details.table[,1][row.names(details.table) %in% count(temp.results[,1])$x] <- count(temp.results[,1])$freq ##summarize, subset to the appropriate rows in details.table, and fill in the summary
print(details.table)
for (i in 1:max.iterations)
{
rich.seq <- row.names(details.table)[details.table[,1] < desired.iterations]
print(rich.seq)
}
}
Rather than trying to cut down the rich.seq I just redefine it every iteration based on whatever happens with details.table during the previous iteration.

Resources