R create a vector with loop structure - r

I have a list, listDFs, where each element is a data frame. Each data frame has a different number of rows and the same number of columns.
I need to build a single vector from listDFs[[i]]$Name, collecting the Name column of every element i of the list.
I thought of using a loop such as:
vComposti <- c()
for(j in 1:10){vComposti <- c(listDFs[[j]]$Name)}
But the result is a vector containing only the first level (listDFs[[1]]$Name) of the list.
Where am I going wrong? Do you have any suggestions?

The problem you have is in this line:
vComposti <- c(listDFs[[j]]$Name)
Each time through your loop, you are re-assigning a new value to vComposti and overwriting the previous value.
In general it is preferable to pre-allocate the vector and fill it element by element:
vComposti <- rep(NA, 10)
for(j in 1:10){
vComposti[j] <- c(listDFs[[j]]$Name)
}
But it's also not clear to me exactly what you're expecting the result to be. You create a vector, but it looks like you are trying to store an entire data frame column in each element of the vector. If that's the case you may actually be looking for a result that's a list:
vComposti <- vector("list",10)
for(j in 1:10){
vComposti[[j]] <- c(listDFs[[j]]$Name)
}
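Another, somewhat more sophisticated, option may be to use lapply. Note that "[" keeps each result as a one-column data frame, while "[[" extracts the column as a plain vector:
lapply(listDFs, FUN = "[", "Name")   # list of one-column data frames
lapply(listDFs, FUN = "[[", "Name")  # list of plain vectors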
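If the goal is one flat vector holding every Name value, the lapply result can be passed through unlist(). The following is only a sketch: the small listDFs built here is a made-up stand-in for the asker's data, and the column contents are invented for illustration.

```r
# Hypothetical stand-in for listDFs: data frames with differing row counts
listDFs <- list(
  data.frame(Name = c("a", "b"), Value = 1:2, stringsAsFactors = FALSE),
  data.frame(Name = "c", Value = 3L, stringsAsFactors = FALSE),
  data.frame(Name = c("d", "e", "f"), Value = 4:6, stringsAsFactors = FALSE)
)

# "[[" pulls each Name column out as a vector; unlist() concatenates them
vComposti <- unlist(lapply(listDFs, `[[`, "Name"))
```

Because lapply iterates over whatever the list contains, there is no need to hard-code the upper bound 10.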

Related

How to apply a for-loop and t-test to a dataset?

I'm trying to apply a for-loop to a dataframe in R, using it to take the row number, which will be used in a t-test, along with specified column indices.
When I run the code I currently have, it only takes the last value specified in the for-loop. How do I fix this? (sorry I'm a complete novice)
This is my code:
x represents the dataset
for(i in 1:nrow(x)){
test<- t.test(x[i, 1:5], x[i, 6:10])
return(test$p.value)
}
I want it to run a t-test on every row, using i (as the row number) and the specified column indices as the input, to provide me with the p value from each test
This happens because you overwrite test on every iteration. If you really want to use a for loop for this purpose and extract the p-values afterwards, this would work better:
set.seed(1)
x <- matrix(sample(1:100,100), nrow = 10)
test = list()
a = 0
for(i in 1:nrow(x)){
a <- a + 1
test[[a]] <- t.test(x[i, 1:5], x[i, 6:10])
}
lapply(test, "[[", "p.value")
However, using apply, as nadizan proposed, is preferable in this case.
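The apply-based answer referred to here is not reproduced on this page, so the following is only a guess at what such a version might look like, reusing the toy matrix from the loop example. The name p_values is an assumption made for illustration.

```r
set.seed(1)
x <- matrix(sample(1:100, 100), nrow = 10)

# MARGIN = 1 applies the function to each row; each call returns one p-value
p_values <- apply(x, 1, function(row) t.test(row[1:5], row[6:10])$p.value)
```

The result is a plain numeric vector with one p-value per row, so no separate extraction step is needed afterwards.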
I think that in order to use return you have to define a function (I am actually surprised you don't get an error). What happens is that the loop performs all the tests as you want but it overwrites them on the same variable test, so at the end you have only the last result.
Edit: In fact, I checked and the return should make you exit at the first iteration, thus getting only the result of the first test.
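One simple way to fix this is to create, for example, a vector and then store each new p-value at the position of the corresponding row. Note that t.test() returns a whole list of results, so you need to extract $p.value before assigning it to a single vector element:
test <- c()
for(i in 1:nrow(x)){
test[i] <- t.test(x[i, 1:5], x[i, 6:10])$p.value
}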
Note that growing an initially empty vector/list gets expensive as its final length increases, so you may want to initialize it with NAs, one per row of the data frame:
test <- rep(NA, nrow(x))
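Putting the pre-allocation together with the loop, a minimal sketch might look like the following. It reuses the toy matrix from the earlier answer; the name p_values is chosen here for illustration.

```r
set.seed(1)
x <- matrix(sample(1:100, 100), nrow = 10)

# Pre-allocate one slot per row, then fill each slot in place
p_values <- rep(NA_real_, nrow(x))
for (i in seq_len(nrow(x))) {
  p_values[i] <- t.test(x[i, 1:5], x[i, 6:10])$p.value
}
```

Using seq_len(nrow(x)) instead of 1:nrow(x) also keeps the loop from running when x has zero rows.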

R; Populating a matrix with a for loop iterating over a vector

I am not very experienced with R, and have been struggling for days to repeat a string of code to fill a data matrix. My instinct is to create a for loop.
I am a biology student working on colour differences between sets of images, making use of the R package colordistance. The relevant data has been loaded in R as a list of 8x4 matrices (each matrix describes the colours in one image). Five images make up one set and there are 100 sets in total. Each set is identified by a number (not 1-100, it's an interrupted sequence, but I have stored the sequence of numbers in a vector called 'numberlist'). I have written the code to extract the desired data in the right format for the first set, and it is as follows;
## extract the list of matrices belonging to the first set (A3) from the full list
A3<-histlist[grep('^3',names(histlist))]
## create a colour distance matrix (cdm), ie a pairwise comparison of "similarity" between the five matrices stored in A3
cdm3<-colordistance::getColorDistanceMatrix(A3, method="emd", plotting=FALSE)
## convert to data frame to fix row names
cdm3df<-as.data.frame(cdm3)
## remove column names
names(cdm3df)<-NULL
## return elements in the first row and column 2-5 only (retains row names).
cdm3filtered<-cdm3df[1,2:5]
Now I want to replace "3" in the code above with each number in 'numberlist' (not sure whether they should be as.factor or as.numeric). I've had many attempts starting with for (i in numberlist) {...} but with no successful output. To me it makes sense to store the output from the loop in a storage matrix; matrix(nrow=100,ncol=4) but I am very much stuck, and unable to populate my storage matrix row by row by iterating the code above...
Any help would be greatly appreciated!
Updates
What I want the outputs of the loop to look like (and then appended to the storage matrix):
> cdm17filtered
17clr 0.09246918 0.1176651 0.1220622 0.1323586
This is my attempt:
for (i in numberlist$X) {
A[i] <- histlist[grep(paste0('^',i),names(histlist))]
cdm[i] <- colordistance::getColorDistanceMatrix(A[i], method="emd", plotting=FALSE)
cdm[i]df <- as.data.frame(cdm[i])
cdm[i]filtered <- cdm[i]df[1,2:5]
print(A[i]) # *insert in n'th column of storage matrix
}
The above is not working, and I'm missing the last bit needed to store the outputs of the loop in the storage matrix. (I was advised against using rbind to populate the storage matrix because it is slow..)
In your attempt, cdm[i]df and cdm[i]filtered are not valid R names, because square brackets cannot appear in an unquoted name. It seems you intend to index into a larger container, such as a list of objects.
To properly generalize your process for all items in numberlist, adjust your ^3 setup. Specifically, build empty lists and, inside the loop, assign into them by position with [[k]] (indexing by the set number i itself would leave gaps, since the sequence is interrupted):
# INITIALIZE LISTS (ONE SLOT PER SET NUMBER IN numberlist$X)
n <- length(numberlist$X)
A <- vector(mode="list", length = n)
cdm_matrices <- vector(mode="list", length = n)
cdm_dfs <- vector(mode="list", length = n)
cdm_filtered_dfs <- vector(mode="list", length = n)
# POPULATE LISTS (k IS THE POSITION, i IS THE SET NUMBER)
for (k in seq_along(numberlist$X)) {
i <- numberlist$X[k]
## extract the list of matrices belonging to set i
A[[k]] <- histlist[grep(paste0('^', i), names(histlist))]
## create a colour distance matrix (cdm)
cdm_matrices[[k]] <- colordistance::getColorDistanceMatrix(A[[k]], method="emd", plotting=FALSE)
## convert to data frame to fix row names and remove column names
cdm_dfs[[k]] <- setNames(as.data.frame(cdm_matrices[[k]]), NULL)
## return elements in the first row and column 2-5 only (retains row names).
cdm_filtered_dfs[[k]] <- cdm_dfs[[k]][1,2:5]
}
Alternatively, if you only need the final object, cdm_filtered_df, use lapply. You then do not need to create or index the intermediate lists, and all objects are local to the function (i.e., never saved in the global environment):
cdm_build <- function(i) {
A <- histlist[grep(paste0('^', i), names(histlist))]
cdm <- colordistance::getColorDistanceMatrix(A, method="emd", plotting=FALSE)
cdm_df <- setNames(as.data.frame(cdm), NULL)
cdm_filtered_df <- cdm_df[1,2:5]
return(cdm_filtered_df) # REDUNDANT AS LAST LINE IS RETURNED BY DEFAULT
}
# LIST OF FILTERED CDM DATA FRAMES (ITERATE OVER THE SET NUMBERS, AS IN THE LOOP)
cdm_filtered_dfs <- lapply(numberlist$X, cdm_build)
Finally, with either solution above, should you want to build a single data frame, run rbind in a do.call():
cdm_final_df <- do.call(rbind, cdm_filtered_dfs)
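For the pre-allocated storage matrix the question asks about, a rough sketch of the row-by-row fill is shown below. Everything in it is a stand-in so that it runs without the colordistance package: set_ids plays the role of the set numbers, the "_img" naming scheme is invented, and toy_distance_matrix() is a hypothetical Euclidean-distance substitute for colordistance::getColorDistanceMatrix(), not that function's actual behaviour.

```r
# Toy stand-in data: 3 sets of 5 random 8x4 matrices (names are invented)
set.seed(42)
set_ids <- c(3, 17, 25)   # plays the role of the set numbers
histlist <- list()
for (id in set_ids) {
  for (img in 1:5) {
    histlist[[paste0(id, "_img", img)]] <- matrix(runif(32), nrow = 8, ncol = 4)
  }
}

# Hypothetical substitute for getColorDistanceMatrix(): pairwise Euclidean distance
toy_distance_matrix <- function(mats) {
  flat <- t(sapply(mats, as.vector))   # one row per image
  as.matrix(dist(flat))
}

# Pre-allocate the storage matrix, one row per set, then fill it by position
storage <- matrix(NA_real_, nrow = length(set_ids), ncol = 4)
rownames(storage) <- set_ids
for (k in seq_along(set_ids)) {
  set_k <- histlist[grep(paste0("^", set_ids[k], "_"), names(histlist))]
  cdm <- toy_distance_matrix(set_k)
  storage[k, ] <- cdm[1, 2:5]   # first row, columns 2-5
}
```

The separator in the pattern ("^3_") is there so that set 3 does not also match names starting with 30 or 31; with real file names, the pattern would need a similar anchor suited to the actual naming scheme.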

Saving rows into variables in R

I have a 18-by-48 matrix.
Is there a way to save each of the 18 rows automatically in a separate variable (e.g., from r1 to r18) ?
I'd definitely advise against splitting a data.frame or matrix into its constituent rows. If I absolutely had to split the rows up, I'd put them in a list and then operate from there.
If you desperately had to split it up, you could do something like this:
toy <- matrix(1:(18*48),18,48)
variables <- list()
for(i in 1:nrow(toy)){
variables[[paste0("variable", i)]] <- toy[i,]
}
list2env(variables, envir = .GlobalEnv)
I'd be inclined to stop after the for loop and avoid the list2env. But I think this should give you your result.
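As a sketch of the list-only approach recommended above, the rows can be collected without an explicit for loop and then accessed by name. The r1 to r18 names follow the question; the toy matrix is the one from this answer.

```r
toy <- matrix(1:(18 * 48), nrow = 18, ncol = 48)

# One list element per row, named r1 ... r18
rows <- lapply(seq_len(nrow(toy)), function(i) toy[i, ])
names(rows) <- paste0("r", seq_len(nrow(toy)))

# Access a single row by name instead of through a global variable
first_row <- rows[["r1"]]
```

Keeping the rows in one list means later steps can iterate over rows directly rather than reconstructing 18 variable names.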
I believe you can select a row r from your data frame d by indexing without a column specified:
var <- d[r,]
Thus you can extract all of the rows into a single variable by using
var <- d[1:nrow(d),]
where var[1,] is the first row, var[2,] the second, etc. I'm not sure if this is exactly what you are looking for. Why would you want 18 different variables, one for each row?
Assuming your matrix is called mat:
result <- data.frame(t(mat))
colnames(result) <- paste("r", 1:18, sep="")
attach(result)

Loop used to create multiple vectors from data frame columns

I would like to create a vector from each column of the mtcars data frame. I need two solutions: the first should use a loop and, if possible, the other should work without one.
The desired output should look like this:
vec_1 <- mtcars[,1]
vec_2 <- mtcars[,2]
etc...
I tried to create a loop but failed. Can you tell me what is wrong with this loop?
vec <- c()
for (i in 1:2){
assign(paste("vec",i,sep="_" <- mtcars[,i][!is.na(mtcars[,i])]
}
I need to remove possible NAs from my data, which is why I included that in the example.
Your loop is missing a few brackets and you should assign the vector to the global environment of your R session like so:
for (i in 1:2) {
assign(sprintf("vec_%d", i), mtcars[!is.na(mtcars[[i]]), i], envir = .GlobalEnv)
}
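Strictly speaking, some form of iteration is always needed, but it does not have to be an explicit for loop: lapply() combined with list2env() gives the same result.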
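A sketch of a version without an explicit for loop, using lapply() and list2env() (which still iterate internally). The environment name vec_env is an assumption made here; passing envir = .GlobalEnv instead would create vec_1, vec_2, and so on directly in the workspace.

```r
# lapply() on a data frame visits each column; drop NAs from every one
vecs <- lapply(mtcars, function(col) col[!is.na(col)])
names(vecs) <- paste0("vec_", seq_along(vecs))

# Export the named list into a dedicated environment (hypothetical name)
vec_env <- list2env(vecs, envir = new.env())
```

Individual vectors are then available as vec_env$vec_1, vec_env$vec_2, and so on, or simply as elements of the list vecs.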

need to assign variables some values in a loop in R

I need to assign variables some values in a loop
Eg:
abc_1<-
abc_2<-
abc_3<-
.....
something like:
for(i in 1:20)
{
paste("abc",i,sep="_")<-some calculated value
}
I have tried to use paste as above but it doesn't work.
How could this be done? Thanks.
assign() and paste0() should help you.
For example:
object_names <- paste0("abc_", 1:20)
for (i in 1:20){
assign(object_names[i], runif(40))
}
assign() takes the string in object_names[i] and binds the value of its second argument (here, the result of runif(40)) to that name. When you pass a numeric vector to paste0(), it gives back a character vector with one concatenated string per value in the numeric vector.
edit:
As Gregor says below, this is much better to do in a list because:
It will be faster.
When making a large number of things you probably want to do the same thing to each of them. lapply() is very good at this.
For example:
N <- 20
# create random numbers in list
abcs <- lapply(1:N,function(i) runif(40))
# multiply each vector in list by 10
abc.mult <- lapply(abcs, function(v) v * 10)
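If the abc_1, abc_2, ... names themselves matter, the list elements can carry them, which keeps name-based access without creating 20 separate variables. This is a small sketch; abc_means is a name introduced here for illustration.

```r
N <- 20
abcs <- lapply(seq_len(N), function(i) runif(40))

# Give each element the name the question asked for
names(abcs) <- paste("abc", seq_len(N), sep = "_")

# Look up one element by name, or summarise all of them at once
abc_3 <- abcs[["abc_3"]]
abc_means <- vapply(abcs, mean, numeric(1))
```

The names carry over to the output of vapply(), so abc_means["abc_3"] gives the mean of the third vector.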
