How to write nested for loop so that the inner loop does not overwrite the first loop values - r

New poster on Stack Overflow but long-time viewer. I could not find any previous posts that get at my specific question.
Basically, I am struggling with how to make use of a nested for loop for my problem. The issue is that the number of variables and outcomes will change with the use case, so I want a solution that is flexible for various permutations. I am not sure that apply would help me because I don't know in advance how many variables and outcomes will exist in any given use case.
The goal is to classify whether the outcome is correctly predicted by the variable (tp = true positive, etc).
The problem is that the inner loop causes the outer loop values to be overwritten, but what I want is for each outcome to be evaluated against each variable once, independently. I am not sure what the best way to do this is, and any advice is appreciated.
#Reprex code
#Generate variable
variable <- c(1,2,3)
df <- as.data.frame(matrix(0, ncol = 0, nrow = 30))
for(i in 1:length(variable)){
df[,c(paste0("variable",variable[i]))]<-as.vector(sample(c(0,1), replace=TRUE, size=30))
}
df
#Generate outcome
outcome <- c(1,2,3)
df2 <- as.data.frame(matrix(0, ncol = 0, nrow = 30))
for(i in 1:length(outcome)){
df2[,c(paste0("outcome",outcome[i]))]<-as.vector(sample(c(0,1), replace=TRUE, size=30))
}
df2
#Generate performance metrics of outcome and predictor
for (i in variable){
for(j in 1:length(df2)){
df[, c(paste0("tp.",variable[i]))] <- as.vector(ifelse(df[, c(paste0("variable",variable[i]))]==1 & df2[j]==1,1,0))
df[, c(paste0("tn.",variable[i]))] <- as.vector(ifelse(df[, c(paste0("variable",variable[i]))]==0 & df2[j]==0,1,0))
df[, c(paste0("fp.",variable[i]))] <- as.vector(ifelse(df[, c(paste0("variable",variable[i]))]==1 & df2[j]==0,1,0))
df[, c(paste0("fn.",variable[i]))] <- as.vector(ifelse(df[, c(paste0("variable",variable[i]))]==0 & df2[j]==1,1,0))
}
}
df
#bind the data for comparison and spot checking
df3 <- cbind(df2,df)
#here we see that only the final inner loop data are correct
df3

The problem is that you have 3 different variables that you want to compare to 3 different outcomes, so you are making 9 comparisons. However, since you are labelling your columns only according to the variable, you only have three unique numeric suffixes (one for each value of i) pasted on to each statistic (tp, tn, fp and fn). You therefore only have 12 distinct column names.
At no point are you labelling the columns according to both the variable and the outcome. That means that every time your inner loop increments to the next outcome variable, you are over-writing the column in df that you wrote in the previous iteration of the loop.
In any case, how would you intend to keep track of which comparison you are making unless you use both the variable number and the outcome number to label your columns?
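As a minimal sketch of the name collision described above (grid, names_by_variable and names_by_pair are made-up names used only for illustration), you can count how many distinct column names each labelling scheme produces:

```r
variable <- 1:3
outcome  <- 1:3
stats    <- c("tp", "tn", "fp", "fn")

# Every (statistic, outcome, variable) combination the nested loop visits: 36 in total.
grid <- expand.grid(stat = stats, j = outcome, i = variable)

# Labelling by variable only: the outcome index j never appears in the name,
# so each pass of the inner loop writes to the same columns again.
names_by_variable <- paste0(grid$stat, ".", grid$i)
length(unique(names_by_variable))

# Labelling by both indices gives one name per comparison.
names_by_pair <- paste0(grid$stat, ".", grid$i, ".vs.", grid$j)
length(unique(names_by_pair))
```

The first count is 12 and the second is 36, which is the gap between the columns you got and the columns you need.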
So you could do it this way:
for (i in variable)
{
V <- c(paste0("variable", i))
for(j in seq_along(df2))
{
comp <- paste0(i, ".vs.", j)
df[paste0("tp.", comp)] <- as.numeric(df[V] == 1 & df2[j] == 1)
df[paste0("tn.", comp)] <- as.numeric(df[V] == 0 & df2[j] == 0)
df[paste0("fp.", comp)] <- as.numeric(df[V] == 1 & df2[j] == 0)
df[paste0("fn.", comp)] <- as.numeric(df[V] == 0 & df2[j] == 1)
}
}
df3 <- cbind(df2, df)
Which will give you the structure you were looking for. It's a large data frame, so we'll just peek at it with str:
str(df3)
#> 'data.frame': 30 obs. of 42 variables:
#> $ outcome1 : num 0 1 1 1 1 0 1 0 1 1 ...
#> $ outcome2 : num 0 0 0 0 1 0 0 1 1 1 ...
#> $ outcome3 : num 1 1 0 0 0 0 0 0 1 0 ...
#> $ variable1: num 0 1 0 0 1 1 0 1 1 1 ...
#> $ variable2: num 1 1 0 0 0 0 0 0 0 0 ...
#> $ variable3: num 1 0 0 0 1 0 0 1 0 0 ...
#> $ tp.1.vs.1: num 0 1 0 0 1 0 0 0 1 1 ...
#> $ tn.1.vs.1: num 1 0 0 0 0 0 0 0 0 0 ...
#> $ fp.1.vs.1: num 0 0 0 0 0 1 0 1 0 0 ...
#> $ fn.1.vs.1: num 0 0 1 1 0 0 1 0 0 0 ...
#> $ tp.1.vs.2: num 0 0 0 0 1 0 0 1 1 1 ...
#> $ tn.1.vs.2: num 1 0 1 1 0 0 1 0 0 0 ...
#> $ fp.1.vs.2: num 0 1 0 0 0 1 0 0 0 0 ...
#> $ fn.1.vs.2: num 0 0 0 0 0 0 0 0 0 0 ...
#> $ tp.1.vs.3: num 0 1 0 0 0 0 0 0 1 0 ...
#> $ tn.1.vs.3: num 0 0 1 1 0 0 1 0 0 0 ...
#> $ fp.1.vs.3: num 0 0 0 0 1 1 0 1 0 1 ...
#> $ fn.1.vs.3: num 1 0 0 0 0 0 0 0 0 0 ...
#> $ tp.2.vs.1: num 0 1 0 0 0 0 0 0 0 0 ...
#> $ tn.2.vs.1: num 0 0 0 0 0 1 0 1 0 0 ...
#> $ fp.2.vs.1: num 1 0 0 0 0 0 0 0 0 0 ...
#> $ fn.2.vs.1: num 0 0 1 1 1 0 1 0 1 1 ...
#> $ tp.2.vs.2: num 0 0 0 0 0 0 0 0 0 0 ...
#> $ tn.2.vs.2: num 0 0 1 1 0 1 1 0 0 0 ...
#> $ fp.2.vs.2: num 1 1 0 0 0 0 0 0 0 0 ...
#> $ fn.2.vs.2: num 0 0 0 0 1 0 0 1 1 1 ...
#> $ tp.2.vs.3: num 1 1 0 0 0 0 0 0 0 0 ...
#> $ tn.2.vs.3: num 0 0 1 1 1 1 1 1 0 1 ...
#> $ fp.2.vs.3: num 0 0 0 0 0 0 0 0 0 0 ...
#> $ fn.2.vs.3: num 0 0 0 0 0 0 0 0 1 0 ...
#> $ tp.3.vs.1: num 0 0 0 0 1 0 0 0 0 0 ...
#> $ tn.3.vs.1: num 0 0 0 0 0 1 0 0 0 0 ...
#> $ fp.3.vs.1: num 1 0 0 0 0 0 0 1 0 0 ...
#> $ fn.3.vs.1: num 0 1 1 1 0 0 1 0 1 1 ...
#> $ tp.3.vs.2: num 0 0 0 0 1 0 0 1 0 0 ...
#> $ tn.3.vs.2: num 0 1 1 1 0 1 1 0 0 0 ...
#> $ fp.3.vs.2: num 1 0 0 0 0 0 0 0 0 0 ...
#> $ fn.3.vs.2: num 0 0 0 0 0 0 0 0 1 1 ...
#> $ tp.3.vs.3: num 1 0 0 0 0 0 0 0 0 0 ...
#> $ tn.3.vs.3: num 0 0 1 1 0 1 1 0 0 1 ...
#> $ fp.3.vs.3: num 0 0 0 0 1 0 0 1 0 0 ...
#> $ fn.3.vs.3: num 0 1 0 0 0 0 0 0 1 0 ...
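The other (and perhaps more sensible) way to do it is to have 3 data frames, one for each variable, each with twelve columns (three sets of tp, tn, fp, fn). You can do this easily using lapply, starting from the original df that holds only the three variable columns (that is, before the loop above added any columns to it):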
df_list <- lapply(df, function(x)
{
dfs <- list()
for(j in seq_along(df2))
{
dfs[[j]] <- data.frame(ifelse(x == 1 & df2[j] == 1, 1, 0),
ifelse(x == 0 & df2[j] == 0, 1, 0),
ifelse(x == 1 & df2[j] == 0, 1, 0),
ifelse(x == 0 & df2[j] == 1, 1, 0))
}
setNames(do.call("cbind", dfs),
paste0(c("tp.", "tn.", "fp.", "fn."), rep(seq_along(df2), each = 4)))
})
Which gives you:
df_list
#> $variable1
#> tp.1 tn.1 fp.1 fn.1 tp.2 tn.2 fp.2 fn.2 tp.3 tn.3 fp.3 fn.3
#> 1 0 1 0 0 0 1 0 0 0 0 0 1
#> 2 1 0 0 0 0 0 1 0 1 0 0 0
#> 3 0 0 0 1 0 1 0 0 0 1 0 0
#> 4 0 0 0 1 0 1 0 0 0 1 0 0
#> 5 1 0 0 0 1 0 0 0 0 0 1 0
#> 6 0 0 1 0 0 0 1 0 0 0 1 0
#> 7 0 0 0 1 0 1 0 0 0 1 0 0
#> 8 0 0 1 0 1 0 0 0 0 0 1 0
#> 9 1 0 0 0 1 0 0 0 1 0 0 0
#> 10 1 0 0 0 1 0 0 0 0 0 1 0
#> 11 0 1 0 0 0 0 0 1 0 0 0 1
#> 12 0 0 1 0 1 0 0 0 1 0 0 0
#> 13 0 0 0 1 0 1 0 0 0 1 0 0
#> 14 0 1 0 0 0 0 0 1 0 0 0 1
#> 15 0 0 0 1 0 1 0 0 0 0 0 1
#> 16 0 1 0 0 0 1 0 0 0 0 0 1
#> 17 0 0 1 0 1 0 0 0 1 0 0 0
#> 18 0 0 1 0 0 0 1 0 0 0 1 0
#> 19 0 0 1 0 0 0 1 0 0 0 1 0
#> 20 1 0 0 0 0 0 1 0 1 0 0 0
#> 21 0 1 0 0 0 1 0 0 0 1 0 0
#> 22 1 0 0 0 0 0 1 0 1 0 0 0
#> 23 0 0 0 1 0 1 0 0 0 0 0 1
#> 24 0 0 0 1 0 1 0 0 0 0 0 1
#> 25 0 0 1 0 0 0 1 0 0 0 1 0
#> 26 0 0 1 0 0 0 1 0 0 0 1 0
#> 27 1 0 0 0 1 0 0 0 0 0 1 0
#> 28 0 0 1 0 0 0 1 0 1 0 0 0
#> 29 0 0 0 1 0 1 0 0 0 1 0 0
#> 30 0 0 1 0 0 0 1 0 0 0 1 0
#>
#> $variable2
#> tp.1 tn.1 fp.1 fn.1 tp.2 tn.2 fp.2 fn.2 tp.3 tn.3 fp.3 fn.3
#> 1 0 0 1 0 0 0 1 0 1 0 0 0
#> 2 1 0 0 0 0 0 1 0 1 0 0 0
#> 3 0 0 0 1 0 1 0 0 0 1 0 0
#> 4 0 0 0 1 0 1 0 0 0 1 0 0
#> 5 0 0 0 1 0 0 0 1 0 1 0 0
#> 6 0 1 0 0 0 1 0 0 0 1 0 0
#> 7 0 0 0 1 0 1 0 0 0 1 0 0
#> 8 0 1 0 0 0 0 0 1 0 1 0 0
#> 9 0 0 0 1 0 0 0 1 0 0 0 1
#> 10 0 0 0 1 0 0 0 1 0 1 0 0
#> 11 0 1 0 0 0 0 0 1 0 0 0 1
#> 12 0 0 1 0 1 0 0 0 1 0 0 0
#> 13 0 0 0 1 0 1 0 0 0 1 0 0
#> 14 0 0 1 0 1 0 0 0 1 0 0 0
#> 15 1 0 0 0 0 0 1 0 1 0 0 0
#> 16 0 0 1 0 0 0 1 0 1 0 0 0
#> 17 0 1 0 0 0 0 0 1 0 0 0 1
#> 18 0 0 1 0 0 0 1 0 0 0 1 0
#> 19 0 0 1 0 0 0 1 0 0 0 1 0
#> 20 1 0 0 0 0 0 1 0 1 0 0 0
#> 21 0 1 0 0 0 1 0 0 0 1 0 0
#> 22 0 0 0 1 0 1 0 0 0 0 0 1
#> 23 0 0 0 1 0 1 0 0 0 0 0 1
#> 24 1 0 0 0 0 0 1 0 1 0 0 0
#> 25 0 0 1 0 0 0 1 0 0 0 1 0
#> 26 0 0 1 0 0 0 1 0 0 0 1 0
#> 27 1 0 0 0 1 0 0 0 0 0 1 0
#> 28 0 0 1 0 0 0 1 0 1 0 0 0
#> 29 0 0 0 1 0 1 0 0 0 1 0 0
#> 30 0 0 1 0 0 0 1 0 0 0 1 0
#>
#> $variable3
#> tp.1 tn.1 fp.1 fn.1 tp.2 tn.2 fp.2 fn.2 tp.3 tn.3 fp.3 fn.3
#> 1 0 0 1 0 0 0 1 0 1 0 0 0
#> 2 0 0 0 1 0 1 0 0 0 0 0 1
#> 3 0 0 0 1 0 1 0 0 0 1 0 0
#> 4 0 0 0 1 0 1 0 0 0 1 0 0
#> 5 1 0 0 0 1 0 0 0 0 0 1 0
#> 6 0 1 0 0 0 1 0 0 0 1 0 0
#> 7 0 0 0 1 0 1 0 0 0 1 0 0
#> 8 0 0 1 0 1 0 0 0 0 0 1 0
#> 9 0 0 0 1 0 0 0 1 0 0 0 1
#> 10 0 0 0 1 0 0 0 1 0 1 0 0
#> 11 0 1 0 0 0 0 0 1 0 0 0 1
#> 12 0 1 0 0 0 0 0 1 0 0 0 1
#> 13 1 0 0 0 0 0 1 0 0 0 1 0
#> 14 0 1 0 0 0 0 0 1 0 0 0 1
#> 15 0 0 0 1 0 1 0 0 0 0 0 1
#> 16 0 0 1 0 0 0 1 0 1 0 0 0
#> 17 0 0 1 0 1 0 0 0 1 0 0 0
#> 18 0 1 0 0 0 1 0 0 0 1 0 0
#> 19 0 1 0 0 0 1 0 0 0 1 0 0
#> 20 1 0 0 0 0 0 1 0 1 0 0 0
#> 21 0 0 1 0 0 0 1 0 0 0 1 0
#> 22 0 0 0 1 0 1 0 0 0 0 0 1
#> 23 1 0 0 0 0 0 1 0 1 0 0 0
#> 24 0 0 0 1 0 1 0 0 0 0 0 1
#> 25 0 1 0 0 0 1 0 0 0 1 0 0
#> 26 0 1 0 0 0 1 0 0 0 1 0 0
#> 27 1 0 0 0 1 0 0 0 0 0 1 0
#> 28 0 0 1 0 0 0 1 0 1 0 0 0
#> 29 1 0 0 0 0 0 1 0 0 0 1 0
#> 30 0 0 1 0 0 0 1 0 0 0 1 0
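If what you eventually need is counts rather than row-level flags, the same nesting idea can be collapsed with sum. This is only a sketch on tiny made-up inputs (vars and outs are hypothetical stand-ins for df and df2), not part of the approach above:

```r
# Tiny fixed inputs (hypothetical values) so the result can be checked by hand.
vars <- data.frame(variable1 = c(1, 1, 0, 0))
outs <- data.frame(outcome1 = c(1, 0, 1, 0),
                   outcome2 = c(1, 1, 1, 1))

# Outer lapply walks the variables, inner sapply walks the outcomes.
# Each variable yields a 4-row matrix (tp, tn, fp, fn) with one column per outcome.
counts <- lapply(vars, function(x) {
  sapply(outs, function(y) c(tp = sum(x == 1 & y == 1),
                             tn = sum(x == 0 & y == 0),
                             fp = sum(x == 1 & y == 0),
                             fn = sum(x == 0 & y == 1)))
})

counts$variable1
```

Because each variable gets its own list element and each outcome its own column, nothing is overwritten however many variables or outcomes there are.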

Related

Add a new column generated from predict() to a list of dataframes

I have a logistic regression model. I would like to predict the morphology of items in multiple dataframes that have been put into a list.
I have lots of dataframes (most say working with a list of dataframes is better).
I need help with:
1. Applying the predict function to a list of dataframes.
2. Adding these predictions to their corresponding dataframe inside the list.
I am not sure whether it is better to have the 1000 dataframes separately and predict using loops etc, or to continue having them inside a list.
Prior to this code I have split my data into train and test sets. I then trained the model using:
library(nnet)
#Training the multinomial model
multinom_model <- multinom(Morphology ~ ., data=morph, maxit=500)
#Checking the model
summary(multinom_model)
This was then followed by validation etc.
My new dataset, consisting of multiple dataframes stored in a list, called rose.list was formatted by the following:
filesrose <- list.files(pattern = "_rose.csv")
#Rename all files of rose dataset 'rose.i'
for (i in seq_along(filesrose)) {
assign(paste("rose", i, sep = "."), read.csv(filesrose[i]))
}
#Make a list of the dataframes
rose.list <- lapply(ls(pattern="rose."), function(x) get(x))
I have been using this to predict on a single new dataframe:
# Predicting the classification for individual datasets
rose.1$Morph <- predict(multinom_model, newdata=rose.1, "class")
Which gives me the dataframe, with the new prediction column 'Morph'
But how would I do this for multiple dataframes in my rose.list? I have tried:
lapply(rose.list, predict(multinom_model, "class"))
Error in eval(predvars, data, env) : object 'Area' not found
and this, which also gives an error:
lapply(rose.list, predict(multinom_model, newdata = rose.list, "class"))
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows:
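lapply needs a function as its second argument, but predict(multinom_model, "class") is a call that R evaluates straight away, before lapply has passed it any data frame. Wrap the call in an anonymous function (written function(x), or abbreviated \(x) in R 4.1 and later) so that each list element is passed as newdata. The example uses the birthwt data from MASS as a stand-in; the data setup is given at the end.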
library(nnet)
multinom_model <- multinom(low ~ ., birthwt)
lapply(df_list, \(x) predict(multinom_model, newdata=x, type='class'))
# $rose_1
# [1] 1 0 1 1 0 0 0 1 0 1 1 1 0 0 1 1 0 0 1 0 0 1 0 0 0 1 0 0 0 0 1 1 1 0 0 1 0 1 0
# [40] 1 0 0 0 0 0 1 1 1 0 1 1 0 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 1 1 1 1 1 0 0 1
# [79] 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 0
# [118] 1 0 0 1 1 0 1 0 0 0 1 1 0 1 1 1 0 1 0 1 1 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 1
# [157] 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 0 1 0 0 0 0 1 0 1 1 1 1 0 0 1
# Levels: 0 1
#
# $rose_2
# [1] 0 1 0 1 1 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 1 1 0 1 1 1 1 0 0 1 0 0 1 0 1 1 0 1
# [40] 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 1 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1
# [79] 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0
# [118] 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 1 1 0 0 0 1 0 0 1 0 0 0 1 0
# [157] 0 0 0 1 1 1 1 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0
# Levels: 0 1
#
# $rose_3
# [1] 0 0 0 0 1 1 0 1 1 0 0 1 0 0 0 0 1 1 1 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1
# [40] 0 0 0 1 1 0 0 0 1 1 0 0 0 1 0 1 1 1 1 0 0 0 1 0 1 0 1 1 0 1 0 0 1 0 0 0 0 1 1
# [79] 0 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 1
# [118] 0 0 0 0 1 0 1 0 1 1 1 1 0 0 0 1 0 0 1 1 1 1 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0
# [157] 0 1 0 0 1 1 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 1 0 1 1 0 0 0 0
# Levels: 0 1
Update
To add the predictions as a new column to each data frame in the list, modify the code like so:
res <- lapply(df_list, \(x) cbind(x, pred=predict(multinom_model, newdata=x, type="class")))
lapply(res, head)
# $rose_1
# low age lwt race smoke ptl ht ui ftv bwt pred
# 136 0 24 115 1 0 0 0 0 2 3090 0
# 154 0 26 133 3 1 2 0 0 0 3260 0
# 34 1 19 112 1 1 0 0 1 0 2084 1
# 166 0 16 112 2 0 0 0 0 0 3374 0
# 27 1 20 150 1 1 0 0 0 2 1928 1
# 218 0 26 160 3 0 0 0 0 0 4054 0
#
# $rose_2
# low age lwt race smoke ptl ht ui ftv bwt pred
# 167 0 16 135 1 1 0 0 0 0 3374 0
# 26 1 25 92 1 1 0 0 0 0 1928 1
# 149 0 23 119 3 0 0 0 0 2 3232 0
# 98 0 22 95 3 0 0 1 0 0 2751 0
# 222 0 31 120 1 0 0 0 0 2 4167 0
# 220 0 22 129 1 0 0 0 0 0 4111 0
#
# $rose_3
# low age lwt race smoke ptl ht ui ftv bwt pred
# 183 0 36 175 1 0 0 0 0 0 3600 0
# 86 0 33 155 3 0 0 0 0 3 2551 0
# 51 1 20 121 1 1 1 0 1 0 2296 1
# 17 1 23 97 3 0 0 0 1 1 1588 1
# 78 1 14 101 3 1 1 0 0 0 2466 1
# 167 0 16 135 1 1 0 0 0 0 3374 0
Data:
data('birthwt', package='MASS')
set.seed(42)
df_list <- replicate(3, birthwt[sample(nrow(birthwt), replace=TRUE), ], simplify=FALSE) |>
setNames(paste0('rose_', 1:3))
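
The same pattern should carry over to any model that has a predict method. Here is a sketch without package dependencies that uses lm on the built-in mtcars data as a stand-in for multinom_model, and a split of mtcars (cars_list, a made-up name) as a stand-in for rose.list:

```r
# Stand-in model: any fitted object with a predict() method would do here.
fit <- lm(mpg ~ wt, data = mtcars)

# Stand-in for rose.list: a named list of data frames with the same columns.
cars_list <- split(mtcars, mtcars$cyl)

# The anonymous function receives one data frame at a time,
# adds the prediction column, and returns the modified data frame.
cars_list <- lapply(cars_list, function(d) {
  d$pred <- predict(fit, newdata = d)
  d
})

lapply(cars_list, head, 3)
```

Assigning the result back to cars_list matters: lapply returns new data frames and does not modify the list in place.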

Dummy variable if it is in certain states

I am trying to create a dummy variable that is equal to 1 if it is in certain states.
My code is not working and I don't see the dummy variable being generated. Any help would be appreciated.
as.integer(df2$physicaladdress.stateorprovincecode %in%
c("NJ", "NC", "PA", "RI", "WA", "DE", "GA", "HI",
"ID", "MD", "MT", "NM", "SC", "TX", "UT", "LA", "OH"))
In the console I receive the output below, but no dummy variable is added to the data frame.
[1] 1 1 0 1 1 0 1 1 0 1 1 1 0 1 1 1 1 1 0 0 0 0 0 0
[25] 0 0 0 0 1 1 1 1 0 1 0 1 1 0 0 1 1 1 1 1 0 1 0 0
[49] 1 1 1 1 1 1 0 0 1 0 1 1 1 1 1 0 1 0 1 0 0 1 0 0
[73] 1 0 1 0 0 1 0 1 1 1 1 0 0 1 1 1 0 0 1 0 0 1 0 0
[97] 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[121] 0 0 0 0 0 1 0 1 0 0 0 0 1 1 0 1 1 1 1 1 0 0 0 0
[145] 0 0 0 1 1 1 1 1 0 1 0 0 0 1 0 0 0 1 0 1 1 1 0 1
[169] 0 1 0 1 0 1 0 1 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0
[193] 0 1 0 1 0 0 1 0 1 1 1 0 1 0 0 0 0 1 0 0 0 1 1 1
[217] 1 0 1 0 0 0 0 1 0 0 1 1 0 1 0 0 1 0 1 0 0 0 0 0
[241] 0 0 1 0 0 0 0 0 0 1 0 0 0 1 1 1 1 0 0 0 0 0 0 1
[265] 1 0 0 0 0 0 1 0 0 1 1 1 1 0 0 0 1 0 1 0 1 0 1 0
[289] 1 0 1 1 1 0 0 1 0 0 0 1 1 1 1 0 1 0 0 0 0 0 0 1
[313] 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 1 0 0 0 1 1 0 0 0
[337] 0 0 1 1 0 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 1 1
[361] 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 1 0 0
[385] 1 0 0 0 1 1 1 0 1 1 0 0 0 1 1 0 1 0 1 1 0 1 1 0
[409] 0 0 1 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 1 1 0 0
[433] 0 1 0 0 0 0 0 0 0 0 1 0 1 0 1 0 0 0 1 1 0 0 1 1
[457] 0 0 1 1 1 0 0 0 0 1 1 1 0 0 0 1 1 0 1 0 1 0 1 1
[481] 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 1 1 0 1 0 0 0 1 1
[505] 0 1 0 0 0 1 0 0 0 0 1 1 0 1 0 1 0 1 0 1 0 1 1 0
[529] 0 1 1 0 1 1 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0
[553] 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 1 1 0 0 0 0 0 1 0
[577] 0 0 0 0 1 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1 1
[601] 0 1 0 1 1 0 1 0 1 1 1 1 0 1 0 0 0 0 1 1 0 0 1 0
[625] 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 1 0
[649] 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
[673] 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 1 1 1 0 0
[697] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 1 0 0
[721] 1 1 0 0 0 0 1 0 0 0 0 0 1 1 1 1 1 1 0 0 1 1 1 1
[745] 0 1 1 0 0 1 0 1 1 0 0 0 1 1 0 1 0 0 0 0 0 1 0 1
[769] 0 1 1 1 1 0 1 0 0 0 0 1 0 1 0 1 1 1 1 0 1 0 0 0
[793] 0 0 1 0 0 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 1 0
[817] 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0
[841] 1 1 0 0 0 0 1 0 0 1 1 1 1 0 1 0 0 0 0 1 0 1 0 0
[865] 0 1 1 1 0 1 1 0 1 0 0 0 1 0 0 1 0 1 0 0 0 1 1 0
[889] 1 0 1 0 1 1 1 1 0 1 0 1 0 1 0 0 0 0 1 0 1 0 0 0
[913] 0 0 0 0 0 1 1 1 0 1 0 1 1 1 0 0 1 1 1 0 0 0 0 0
[937] 1 0 1 1 0 0 0 0 0 0 0 1 0 1 0 1 1 1 1 0 0 1 0 0
[961] 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 1 0
[985] 1 0 0 0 0 0 0 0 1 1 1 0 0 1 1 0
[ reached getOption("max.print") -- omitted 547604 entries ]
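The output above is what R prints when an expression is evaluated without being assigned to anything. A minimal sketch, assuming the goal is a new column in df2 (state_dummy and target_states are made-up names, and the four-row df2 is a hypothetical stand-in for the real data):

```r
# Hypothetical stand-in for the real data frame.
df2 <- data.frame(physicaladdress.stateorprovincecode = c("NJ", "CA", "TX", "NY"))

target_states <- c("NJ", "NC", "PA", "RI", "WA", "DE", "GA", "HI",
                   "ID", "MD", "MT", "NM", "SC", "TX", "UT", "LA", "OH")

# Assign the result to a column; without the assignment R only prints the vector.
df2$state_dummy <- as.integer(df2$physicaladdress.stateorprovincecode %in% target_states)

df2
```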

r row names selection using columns

Suppose I have this matrix:
0 1 2 3 4 5 6 98 183 385 419 420 422 423 469 470 35698 35709 35729 37415
0 0 1 1 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 0 1
1 1 0 1 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0
2 1 1 0 1 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0
3 1 0 1 0 1 1 0 1 1 0 1 1 1 1 0 0 1 0 0 1
4 0 0 1 1 0 1 1 1 0 0 1 1 1 0 0 1 0 1 1 0
5 0 1 0 1 1 0 1 1 0 0 0 1 0 0 0 1 0 0 1 0
6 1 1 0 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0
98 0 0 0 1 1 1 1 0 0 0 0 1 0 0 0 1 0 0 1 0
183 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 1
385 1 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
419 0 0 1 1 1 0 0 0 0 0 0 1 1 0 0 0 1 1 0 0
420 0 0 0 1 1 1 0 1 0 0 1 0 0 0 0 0 1 1 0 0
422 0 0 1 1 1 0 0 0 1 0 1 0 0 1 1 0 0 0 0 1
423 0 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1
469 1 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1
470 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0
35698 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
35709 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
35729 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 1 0 0 0 0
37415 1 0 0 1 0 0 0 0 1 0 0 0 1 1 1 0 0 0 0 0
I am getting a value from another program, let us say x = 3.
I want to get the names of the rows that have a 1 in the column named by x, i.e. the rows where column 3 has the value 1.
The output should be: 0,2,4,5,98,183,419,420,422,423,35698,37415.
I don't want to pass "3" directly into the command. I want to pass the variable x, so that if this number varies I get the corresponding output.
Can anyone help me, please? Thanks in advance.
x=matrix(c(1,1,2,5,6,6,5,7,7,8,3,3,1,9,20,20,4,7,9,5),4,5,dimnames = list(c(letters[1:4]),c(LETTERS[1:5])))
Since your requirement is the row names:
rownames(x)[x[,"D"]==20]
Here 20 is your input value and D is the column you are searching.
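To drive this from a variable, as the question asks, one option is to convert the value to a character string so that it is matched against the column names rather than used as a position. A small sketch on a made-up matrix m (the values are hypothetical, not the matrix from the question):

```r
# Hypothetical matrix whose row and column names are numeric-looking labels.
m <- matrix(c(0, 1, 1,
              1, 0, 0,
              1, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("0", "3", "98"), c("0", "3", "98")))

x <- 3

# as.character(x) selects the column *named* "3";
# a bare m[, x] would instead pick the third column by position (here "98").
selected <- rownames(m)[m[, as.character(x)] == 1]
selected
```

On this matrix the result is "0" and "98". Changing x to another label that exists among the column names changes the selection accordingly.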

how to convert a matrix of values into a binary matrix

I'd like to convert a matrix of values into a matrix of 'bits'.
I have been looking for solutions and found this, which seems to be part of a solution.
I'll try to explain what I am looking for.
I have a matrix like
> x<-matrix(1:20,5,4)
> x
[,1] [,2] [,3] [,4]
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 13 18
[4,] 4 9 14 19
[5,] 5 10 15 20
which I would like to convert into
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0
2 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0
3 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0
4 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0
5 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
so for each value in the row a "1" in the corresponding column.
If I use
> table(sequence(length(x)),t(x))
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
5 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
9 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
10 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
12 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
13 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
14 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
15 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
17 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
18 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
19 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
this is close to what I am looking for, but returns a line for each value.
I would only need to consolidate all values from one row into one row.
Because a
> table(x)
x
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
gives all values of the whole table, so what do I need to do to get the values per row?
Here is another option using table() function:
table(row(x), x)
# x
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
# 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0
# 2 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0
# 3 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0
# 4 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0
# 5 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
bit_x = matrix(0, nrow = nrow(x), ncol = max(x))
for (i in 1:nrow(x)) {bit_x[i,x[i,]] = 1}
Let
(x <- matrix(c(1, 3), 2, 2))
[,1] [,2]
[1,] 1 1
[2,] 3 3
One approach would be
M <- matrix(0, nrow(x), max(x))
M[cbind(c(row(x)), c(x))] <- 1
M
# [,1] [,2] [,3]
# [1,] 1 0 0
# [2,] 0 0 1
In one line:
replace(matrix(0, nrow(x), max(x)), cbind(c(row(x)), c(x)), 1)
Following your approach, and similarly to @Psidom's suggestion:
table(rep(1:nrow(x), ncol(x)), x)
# x
# 1 3
# 1 2 0
# 2 0 2
We can use the reshape2 package.
library(reshape2)
# At first we make the matrix you provided
x <- matrix(1:20, 5, 4)
# then melt it based on first column
da <- melt(x, id.var = 1)
# then cast it
dat <- dcast(da, Var1 ~ value, fill = 0, fun.aggregate = length)
which gives us this
Var1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 1 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0
2 2 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0
3 3 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0
4 4 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0
5 5 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1 0 0 0 0 1
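
As a quick sanity check, the loop and the matrix-indexing approaches above can be compared on the matrix from the question. This is just a sketch (bit_loop and bit_idx are made-up names):

```r
x <- matrix(1:20, 5, 4)

# Loop version: for each row, set the columns given by that row's values to 1.
bit_loop <- matrix(0, nrow = nrow(x), ncol = max(x))
for (i in seq_len(nrow(x))) bit_loop[i, x[i, ]] <- 1

# Vectorised version: index the result by (row number, value) pairs.
bit_idx <- matrix(0, nrow(x), max(x))
bit_idx[cbind(c(row(x)), c(x))] <- 1

identical(bit_loop, bit_idx)
```

Both versions record only whether a value is present in a row. If a row can contain the same value more than once, the table-based answers return the count instead (as in the 2 x 2 example above, where the entries are 2), so pick whichever behaviour you actually want.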

Knitr echo function call with evaluated parameter names

If I have previously defined
n <- 200
maf <- .2
Is there any way to echo
snp <- rbinom(n,2,maf)
such that it displays as
snp <- rbinom(200,2,.2)
in the resulting knitr document.
You can use substitute like this:
```{r ,echo=FALSE}
substitute(snp <- rbinom(n,2,maf),list(n=n,maf=maf))
rbinom(n,2,maf)
```
## snp <- rbinom(200, 2, 0.2)
## [1] 0 0 1 1 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 2 0 1 1 2 1 0 0 0 0
## [36] 1 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1
## [71] 0 1 0 0 0 0 0 1 0 1 0 1 1 0 0 1 0 1 0 0 1 0 1 1 1 2 0 1 0 1 1 0 1 0 0
## [106] 1 1 0 1 1 1 0 0 1 0 0 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## [141] 1 0 0 1 0 0 1 0 2 0 0 0 1 0 0 0 1 1 0 0 0 0 0 2 1 1 2 0 1 0 0 0 0 0 1
## [176] 0 0 2 0 1 1 0 0 0 0 0 1 0 1 2 0 1 1 2 0 0 1 1 1 0
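
Outside of a knitr chunk, the same idea can be tried at the console. A small sketch (call_shown is a made-up name): substitute builds the call with the values filled in, deparse turns it into the string to display, and eval runs that call so that snp is actually created, which the chunk above does not do.

```r
n <- 200
maf <- 0.2

# Build the call with n and maf replaced by their current values.
call_shown <- substitute(snp <- rbinom(n, 2, maf), list(n = n, maf = maf))

# The text that would be displayed in the document.
deparse(call_shown)

# Evaluate the substituted call so that snp is assigned.
eval(call_shown)
length(snp)
```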
