If statement to fill Matrix, incorrect number of subscripts - r

I am trying to write a for loop where if the cell of one matrix matches a letter it then fills a blank matrix with the entire row that matched. Here is my code
mets<-data.frame(read.csv(file="Metabolite_data.csv",header=TRUE))
full<-length(mets[,6])
A=matrix(,nrow=4930,ncol=8, byrow=T)
for (i in 1:full){
if (mets[i,6]=="A") (A[i,]=(mets[i,]))
}
If I replace the i in the if statement with a single number, it works and fills that row of matrix A; however, it will not fill more than one row. TIA

You might be getting problems going from data frame to matrix. It could be that just using "mets" as a matrix instead of a data frame could solve your problem, or you could use as.matrix within your for loop. An example of the latter with made-up data since I don't have your "metabolite_data.csv":
mets <- matrix(sample(LETTERS[1:4], 80, replace = TRUE), nrow = 10, ncol = 8)
mets <- as.data.frame(mets)
A <- matrix(nrow = nrow(mets), ncol = ncol(mets), byrow = TRUE)
for(i in 1:nrow(mets)){
if(mets[i,6] == "A"){
A[i,] = as.matrix(mets[i,])
}
}
print(A)
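The same fill can probably be done without a loop by using a logical index. This is only a sketch on made-up data (the real "Metabolite_data.csv" is not available, and the names hits and A_only are introduced here for illustration), assuming column 6 holds character values:

```r
# Made-up data standing in for Metabolite_data.csv (an assumption).
set.seed(1)
mets <- as.data.frame(
  matrix(sample(LETTERS[1:4], 80, replace = TRUE), nrow = 10, ncol = 8),
  stringsAsFactors = FALSE
)

# Logical vector marking the rows whose sixth column equals "A".
hits <- mets[, 6] == "A"

# Blank character matrix with the same shape as the data.
A <- matrix(NA_character_, nrow = nrow(mets), ncol = ncol(mets))

# Copy every matching row in one assignment; other rows stay NA.
A[hits, ] <- as.matrix(mets[hits, ])

# If the blank rows are not needed, keep only the matching rows.
A_only <- as.matrix(mets[hits, , drop = FALSE])
```

Sizing A from nrow(mets) and ncol(mets) instead of hard-coded numbers keeps the two objects the same shape.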

You may want to specify ncol=dim(mets)[2] so that the matrix has the same number of columns as the rows you are using to fill it.

Related

Creating multiple dataframes in a loop in R

I am new to R and I don't know how to create multiple data frames in a loop. For example:
I have a data frame "Data" with 20 rows and 4 columns:
Data <- data.frame(matrix(NA, nrow = 20, ncol = 4))
names(Data) <- c("A","B","C","D")
I want to choose the rows of Data whose values in column T are closest to the elements of the vector X.
X = c(X1,X2,X3,X4,X5)
Finally, I want to assign them to separate data frames, each named after its associated X:
for(i in 1:length(X)){
data_X[i] <- data.frame(matrix(NA))
data_X[i] <- subset(data2, 0 <= A-X[i] | A-X[i]< 0.000001 )
}
Thank you!
Since you didn't give us any numbers, it is difficult to say exactly what the for loop should look for, so you will need to sort that part out yourself. Here is a basic example of what you could do. The part I think you are missing is assign, which sends each created data frame to your global environment (or wherever else you want it to go). paste0 is a handy way to give each one its own name. Note that some of the data frames may be empty; it may be worthwhile to add an if statement that skips the assignment when nrow(data3) == 0.
Data <- data.frame(matrix(sample(1:10,80,replace = T), nrow = 20, ncol = 4))
names(Data) <- c("A","B","C","D")
X = c(1:10)
for(i in 1:length(X)){
data2 <- Data
data3 <- subset(data2, A == X[i])
assign(paste0("SubsetData",i), data3, envir = .GlobalEnv)
}
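As an alternative to assign, the subsets could be collected in a named list, which also makes it easy to drop the empty ones. This is a sketch on the same kind of made-up data; the list name subsets is hypothetical, and the exact-match condition is only a placeholder for whatever "closest" rule is actually needed:

```r
# Made-up data, as in the example above.
set.seed(42)
Data <- data.frame(matrix(sample(1:10, 80, replace = TRUE), nrow = 20, ncol = 4))
names(Data) <- c("A", "B", "C", "D")
X <- 1:10

# One data frame per element of X, kept together in a list.
subsets <- lapply(X, function(x) Data[Data$A == x, , drop = FALSE])
names(subsets) <- paste0("SubsetData", X)

# Drop the data frames that matched no rows.
subsets <- Filter(function(d) nrow(d) > 0, subsets)

# Individual results are then available by name, e.g. subsets[["SubsetData3"]].
```

A list keeps the results in one place, so they can be looped over later without get or assign.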

Sampling for a row from a matrix yields empty data

I have a matrix named "fida", from which I have randomly sampled a certain number of rows. On these rows I run a set of commands, at the end of which there is a condition: if it is true, I want to randomly sample another row from the same matrix that is not one of the rows sampled earlier.
The problem shows up even before that condition matters: when I use the same command to sample from the matrix again, it gives me empty data.
reps=5 #number of samples
randreps=sample(nrow(fida), size = reps, replace = F)
for (loop in randreps)
{calculate a}
if(a==0)
{loop=sample(nrow(fida), size = 1, replace = F)
calculate a}
But when I run this, the second sample always gives empty data and a cannot be calculated. When I go back and check my dataframe "fida" for the row that has been selected, there is data in that row. I do not know what is wrong and any help will be much appreciated.
You could approach this problem in the following manner.
set.seed(357)
xy <- matrix(1:30, nrow = 10)
original.rows <- sample(10, size = 3, replace = FALSE)
original <- xy[original.rows, ]
# Your calculations.
# Sample from the original matrix again, but without the already sampled
# samples.
middle <- xy[-original.rows, ]
output.row <- sample(nrow(middle), size = 3, replace = FALSE)
output <- middle[output.row, ]
In other words, you have a matrix that holds only the unsampled rows, which serves as a source of new rows for your calculations.
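If only one replacement row is needed at a time, it may be simpler to track the row indices that have already been used. This is a sketch under that assumption; the names used, remaining and new.row are introduced here for illustration:

```r
set.seed(357)
xy <- matrix(1:30, nrow = 10)

# Row indices drawn in the first round.
used <- sample(nrow(xy), size = 3, replace = FALSE)

# Row indices that have not been drawn yet.
remaining <- setdiff(seq_len(nrow(xy)), used)

# Draw a position within `remaining` rather than calling sample(remaining, 1):
# when `remaining` has length one, sample() would treat it as 1:remaining.
new.row <- remaining[sample(length(remaining), size = 1)]

# Record the new row so that it is not drawn again.
used <- c(used, new.row)
replacement <- xy[new.row, ]
```

Because new.row is an index into the original matrix, it can be used directly with xy.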

How store for loop results to data frame?

I have a list of files on which I need to perform an analysis. I would like to store the result of each iteration in a data frame as a new row. Here is what I tried, but I got this error:
Error in `$<-.data.frame`(`*tmp*`, "c1", value = c(0, 64010, 0, 64010, : replacement has 2 rows, data has 65
Here is my code (this part of the code only counts number of records in each file)
h <- data.frame(matrix(0, ncol = 2, nrow = 65))
colnames(h) <- c("c1","c2")
my_files <- list.files("C:/Users/....")
for (i in 1:length(my_files))
{
k <- length(readLines(my_files[i], skipNul=TRUE))
h$c1 <- rbind(h$c1, k)
}
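length will give you a single number. rbind(h$c1, k) stacks that number under the 65-element column, which produces a two-row matrix, and a two-row object cannot be assigned back into a column of a 65-row data frame; hence the error. One solution would be to append a whole row instead, with an NA in column c2, like so: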
h <- rbind(h, c(k,NA))
Try to stay away from for loops. Consider using one of the apply functions, like lapply.
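A sketch of the apply-style approach, using vapply to count the lines of each file and building the data frame in one step. The two files written here are hypothetical stand-ins for the real ones, and the column name file is an assumption:

```r
# Hypothetical stand-in files, written to a temporary directory.
tmp <- tempfile("demo_")
dir.create(tmp)
writeLines(c("a", "b", "c"), file.path(tmp, "one.txt"))
writeLines(c("x", "y"), file.path(tmp, "two.txt"))

# full.names = TRUE returns paths that can be read from any working directory.
my_files <- list.files(tmp, full.names = TRUE)

# One line count per file; vapply checks that each result is a single integer.
counts <- vapply(
  my_files,
  function(f) length(readLines(f, skipNul = TRUE)),
  integer(1)
)

# One row per file, created once instead of grown inside a loop.
h <- data.frame(file = basename(my_files), c1 = unname(counts))
```

Building the data frame after the counts are known avoids pre-allocating 65 placeholder rows.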

Assign value to one column of a matrix in a loop

I wonder if there is a simple way to produce a list of matrices with sequential names using a "for" loop, and then give one of their columns values.
for(i in 1:3)
{
assign(paste0("matrix",i), matrix(NA, nrow = 4, ncol = 6))
assign(get(paste0("matrix",i))[,1], rep(i, 4))
}
In the above code, I tried to create 3 matrices matrix1, matrix2, and matrix3, whose first columns were aimed to assign the values of rep(1, 4), rep(2, 4), rep(3, 4). However, R gives an error message.
Error in assign(get(paste0("matrix", i))[, 1], rep(i, 4)) :
invalid first argument
Thanks for your help.
If your goal is to make a list of matrices, I would recommend using list. Putting them in a real list and not in the main env as similarly named objects creates a lot more cohesion and makes your code easier to understand.
matrix_list = lapply(1:3, function(x) matrix(NA, nrow = 4, ncol = 6))
names(matrix_list) = paste0('matrix', 1:3)
The error you see is probably because assign requires a character as input. Carefully read the docs for assign and get (and never use them again ;) ).
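To also fill the first column, as the question asks, the value can be set inside the function passed to lapply. A minimal sketch, assuming the names matrix1 to matrix3 from the question:

```r
# Build each matrix and fill its first column with the index i.
matrix_list <- lapply(1:3, function(i) {
  m <- matrix(NA, nrow = 4, ncol = 6)
  m[, 1] <- i  # i is recycled down the four rows
  m
})
names(matrix_list) <- paste0("matrix", 1:3)

# Access one matrix by name; its first column holds the value 2.
matrix_list$matrix2[, 1]
```

Each matrix is modified as an ordinary local variable before it is returned, so neither assign nor get is needed.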

Crantastic way of expand a vector into repeated measurements

Let us say I have a data frame indicating the factor level for each individual:
I.df = data.frame(variant = sample(x=c(0,1,2), size=30, replace = TRUE), tissue = sample(x=as.factor(c('cereb','hipo','arc')), size=30, replace = TRUE))
And I also have a vector with the means for each factor:
means.tissues = c(1.2, 3, 0.5)
names(means.tissues) = c('cereb', 'hipo', 'arc')
Then I want to create a vector of length equal to the number of rows of I.df, where each value is the mean of the respective tissue for that row, i.e.,
ind.tissues = rep(NA, nrow(I.df))
for(i in 1:nrow(I.df))
{
ind.tissues[i] = means.tissues[names(means.tissues) == I.df$tissue[i]]
}
I think the for loop is a rather inefficient way to do this, especially for data with very large n. Is there a better, more efficient way to do this using vectorized code in R?
You can use match:
ind.tissues = means.tissues[match(I.df$tissue, names(means.tissues))]
The match function returns the position in argument 2 of each element in argument 1. We then use those indices to grab the correct elements in means.tissues.
Edit: As mentioned by @Joran in the comment, since means.tissues is a named vector, you can look it up by name instead of using match:
ind.tissues <- means.tissues[as.character(I.df$tissue)]
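The following sketch checks that both vectorized lookups agree with the original loop on made-up data. The names by.match, by.name and by.loop are introduced here for illustration. The as.character call matters because indexing with a factor directly would use its integer codes rather than its labels:

```r
set.seed(1)
I.df <- data.frame(
  variant = sample(c(0, 1, 2), size = 30, replace = TRUE),
  tissue = factor(sample(c("cereb", "hipo", "arc"), size = 30, replace = TRUE))
)
means.tissues <- c(cereb = 1.2, hipo = 3, arc = 0.5)

# Lookup by position, using match.
by.match <- means.tissues[match(I.df$tissue, names(means.tissues))]

# Lookup by name, after converting the factor to character.
by.name <- means.tissues[as.character(I.df$tissue)]

# The loop from the question, for comparison.
by.loop <- rep(NA_real_, nrow(I.df))
for (i in seq_len(nrow(I.df))) {
  by.loop[i] <- means.tissues[names(means.tissues) == I.df$tissue[i]]
}
```

Both vectorized results keep the tissue names as element names; unname removes them if a bare numeric vector is wanted.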

Resources