Add a new column generated from predict() to a list of dataframes - r

I have a logistic regression model. I would like to predict the morphology of items in multiple dataframes that have been put into a list.
I have lots of dataframes (most say working with a list of dataframes is better).
I need help with:
1. Applying the predict function to a list of dataframes.
2. Adding these predictions to their corresponding dataframe inside the list.
I am not sure whether it is better to keep the 1000 dataframes separate and predict using loops, or to keep them inside a list.
Prior to this code I have split my data into train and test sets. I then trained the model using:
library(nnet)
#Training the multinomial model
multinom_model <- multinom(Morphology ~ ., data=morph, maxit=500)
#Checking the model
summary(multinom_model)
This was then followed by validation etc.
My new dataset, consisting of multiple dataframes stored in a list called rose.list, was created as follows:
filesrose <- list.files(pattern = "_rose.csv")
#Rename all files of rose dataset 'rose.i'
for (i in seq_along(filesrose)) {
assign(paste("rose", i, sep = "."), read.csv(filesrose[i]))
}
#Make a list of the dataframes
rose.list <- lapply(ls(pattern="rose."), function(x) get(x))
I have been using this to predict on a single new dataframe:
# Predicting the classification for individual datasets
rose.1$Morph <- predict(multinom_model, newdata=rose.1, "class")
This gives me the dataframe with the new prediction column, Morph.
But how would I do this for multiple dataframes in my rose.list? I have tried:
lapply(rose.list, predict(multinom_model, "class"))
Error in eval(predvars, data, env) : object 'Area' not found
I also tried the following, which gives a different error:
lapply(rose.list, predict(multinom_model, newdata = rose.list, "class"))
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, :
arguments imply differing number of rows:

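You can use an anonymous function (written as function(x), or with the shorthand \(x) available since R 4.1.0). The second argument of lapply() must be a function, not the result of calling predict().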
library(nnet)
multinom_model <- multinom(low ~ ., birthwt)
lapply(df_list, \(x) predict(multinom_model, newdata=x, type='class'))
# $rose_1
# [1] 1 0 1 1 0 0 0 1 0 1 1 1 0 0 1 1 0 0 1 0 0 1 0 0 0 1 0 0 0 0 1 1 1 0 0 1 0 1 0
# [40] 1 0 0 0 0 0 1 1 1 0 1 1 0 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 1 1 1 1 1 0 0 1
# [79] 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 0 0 0 0 1 1 1 0 0 0 0 0 1 1 0
# [118] 1 0 0 1 1 0 1 0 0 0 1 1 0 1 1 1 0 1 0 1 1 0 0 0 0 1 0 0 0 0 0 1 0 0 1 0 0 0 1
# [157] 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 1 0 1 0 0 0 0 1 0 1 1 1 1 0 0 1
# Levels: 0 1
#
# $rose_2
# [1] 0 1 0 1 1 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 1 1 0 1 1 1 1 0 0 1 0 0 1 0 1 1 0 1
# [40] 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 1 1 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1
# [79] 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 1 0 0 0 1 0 0 0 0
# [118] 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 0 1 1 0 0 0 1 0 0 1 0 0 0 1 0
# [157] 0 0 0 1 1 1 1 1 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0
# Levels: 0 1
#
# $rose_3
# [1] 0 0 0 0 1 1 0 1 1 0 0 1 0 0 0 0 1 1 1 1 0 1 0 0 0 0 0 0 1 0 0 0 0 1 1 1 0 0 1
# [40] 0 0 0 1 1 0 0 0 1 1 0 0 0 1 0 1 1 1 1 0 0 0 1 0 1 0 1 1 0 1 0 0 1 0 0 0 0 1 1
# [79] 0 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0 1 0 1
# [118] 0 0 0 0 1 0 1 0 1 1 1 1 0 0 0 1 0 0 1 1 1 1 0 1 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0
# [157] 0 1 0 0 1 1 1 0 0 1 0 0 1 0 0 1 0 1 0 0 0 0 1 0 0 1 0 1 1 0 0 0 0
# Levels: 0 1
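lapply() also accepts predict itself as the function, with the model and type passed through lapply()'s ... argument. A sketch on hypothetical data (iris split into three data frames; the names model and df_list are only for illustration):

```r
library(nnet)

# Hypothetical stand-in data: a model fitted on iris, and iris split into a list.
set.seed(1)
model <- multinom(Species ~ ., data = iris, trace = FALSE)
df_list <- split(iris[, -5], rep(1:3, length.out = nrow(iris)))

# lapply() calls predict(df_list[[i]], object = model, type = "class").
# S3 dispatch happens on `object`, and the data frame then matches `newdata`.
preds <- lapply(df_list, predict, object = model, type = "class")
str(preds)
```

This is equivalent to the anonymous-function form; which one to use is mostly a matter of taste.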
Update
To add the predictions as a new column to each data frame in the list, modify the code like so:
res <- lapply(df_list, \(x) cbind(x, pred=predict(multinom_model, newdata=x, type="class")))
lapply(res, head)
# $rose_1
# low age lwt race smoke ptl ht ui ftv bwt pred
# 136 0 24 115 1 0 0 0 0 2 3090 0
# 154 0 26 133 3 1 2 0 0 0 3260 0
# 34 1 19 112 1 1 0 0 1 0 2084 1
# 166 0 16 112 2 0 0 0 0 0 3374 0
# 27 1 20 150 1 1 0 0 0 2 1928 1
# 218 0 26 160 3 0 0 0 0 0 4054 0
#
# $rose_2
# low age lwt race smoke ptl ht ui ftv bwt pred
# 167 0 16 135 1 1 0 0 0 0 3374 0
# 26 1 25 92 1 1 0 0 0 0 1928 1
# 149 0 23 119 3 0 0 0 0 2 3232 0
# 98 0 22 95 3 0 0 1 0 0 2751 0
# 222 0 31 120 1 0 0 0 0 2 4167 0
# 220 0 22 129 1 0 0 0 0 0 4111 0
#
# $rose_3
# low age lwt race smoke ptl ht ui ftv bwt pred
# 183 0 36 175 1 0 0 0 0 0 3600 0
# 86 0 33 155 3 0 0 0 0 3 2551 0
# 51 1 20 121 1 1 1 0 1 0 2296 1
# 17 1 23 97 3 0 0 0 1 1 1588 1
# 78 1 14 101 3 1 1 0 0 0 2466 1
# 167 0 16 135 1 1 0 0 0 0 3374 0
Data:
data('birthwt', package='MASS')
set.seed(42)
df_list <- replicate(3, birthwt[sample(nrow(birthwt), replace=TRUE), ], simplify=FALSE) |>
setNames(paste0('rose_', 1:3))
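The assign()/get() step in the question can be skipped entirely by reading the files straight into a named list, then adding the column inside lapply(). The sketch below is self-contained under assumptions: iris stands in for the real data (Species playing the role of Morphology), and three hypothetical sample*_rose.csv files are written to a temporary folder only so there is something to read.

```r
library(nnet)

# Hypothetical model: Species stands in for Morphology.
set.seed(1)
train <- iris[sample(nrow(iris), 100), ]
multinom_model <- multinom(Species ~ ., data = train, maxit = 500, trace = FALSE)

# Write three example files matching the "_rose.csv" pattern to a temp folder.
dir <- tempfile("rose_")
dir.create(dir)
for (k in 1:3) {
  write.csv(iris[sample(nrow(iris), 20), -5],
            file.path(dir, paste0("sample", k, "_rose.csv")), row.names = FALSE)
}

# Read every file directly into a named list (no assign()/get() needed).
filesrose <- list.files(dir, pattern = "_rose\\.csv$", full.names = TRUE)
rose.list <- lapply(filesrose, read.csv)
names(rose.list) <- basename(filesrose)

# Add the predicted class as a new column to each data frame in the list.
rose.list <- lapply(rose.list, function(d) {
  d$Morph <- predict(multinom_model, newdata = d, type = "class")
  d
})

str(rose.list[[1]])
```

Because lapply() returns a new list, the result is assigned back to rose.list. Keeping the data frames in one list like this also avoids managing 1000 separate objects in the workspace.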

Related

How to keep ID in dummyVars()

I would like to transform Gender and Country using one-hot encoding.
With the code below, I cannot create a new dataset that includes the ID:
library(caret)
ID<-1:10
Gender<-c("F","F","F","M","M","F","M","M","F","M")
Country<-c("Mali","France","France","Guinea","Senegal",
"Mali","France","Mali","Senegal","France")
data<-data.frame(ID,Gender,Country)
#One hot encoding
dmy <- dummyVars(" ~ Gender + Country", data = data, fullRank = TRUE)
dat_transformed <- data.frame(predict(dmy, newdata = data))
dat_transformed
Gender.M Country.Guinea Country.Mali Country.Senegal
1 0 0 1 0
2 0 0 0 0
3 0 0 0 0
4 1 1 0 0
5 1 0 0 1
6 0 0 1 0
7 1 0 0 0
8 1 0 1 0
9 0 0 0 1
10 1 0 0 0
I want to get a dataset that includes the ID without encoding it:
ID Gender.M Country.Guinea Country.Mali Country.Senegal
1 1 0 0 1 0
2 2 0 0 0 0
3 3 0 0 0 0
4 4 1 1 0 0
5 5 1 0 0 1
6 6 0 0 1 0
7 7 1 0 0 0
8 8 1 0 1 0
9 9 0 0 0 1
10 10 1 0 0 0
You can bind the ID vector back onto the encoded data with cbind():
dat_transformed <- cbind(ID, dat_transformed)
dat_transformed
ID Gender.M Country.Guinea Country.Mali Country.Senegal
1 1 0 0 1 0
2 2 0 0 0 0
3 3 0 0 0 0
4 4 1 1 0 0
5 5 1 0 0 1
6 6 0 0 1 0
7 7 1 0 0 0
8 8 1 0 1 0
9 9 0 0 0 1
10 10 1 0 0 0
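If caret is not otherwise needed, base R's model.matrix() can produce the same full-rank encoding while keeping ID. A sketch under that assumption; note that the column names come out as GenderM, CountryGuinea and so on, without the dot that caret inserts.

```r
ID <- 1:10
Gender <- c("F","F","F","M","M","F","M","M","F","M")
Country <- c("Mali","France","France","Guinea","Senegal",
             "Mali","France","Mali","Senegal","France")
data <- data.frame(ID, Gender, Country)

# model.matrix() uses treatment contrasts, so the first level of each factor
# (F and France) is the baseline, as with fullRank = TRUE.
# [, -1] drops the intercept column; ID is never passed in, so it is not encoded.
dummies <- model.matrix(~ Gender + Country, data = data)[, -1]
dat_transformed <- data.frame(ID = data$ID, dummies)
dat_transformed
```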

How to write nested for loop so that the inner loop does not overwrite the first loop values

New poster on Stack Overflow but long-time viewer. I could not find any previous posts that address my specific question.
Basically, I am struggling with how to make use of a nested for loop for my problem. The issue is that the number of variables and outcomes will change with the use case, so I want a solution that is flexible for various permutations. I am not sure that apply would help me because I don't know in advance how many variables and outcomes will exist in any given use case.
The goal is to classify whether the outcome is correctly predicted by the variable (tp = true positive, etc).
The problem is that the inner loop overwrites the values written in earlier iterations, but what I want is for each outcome to be evaluated against each variable once, independently. I am not sure what the best way to do this is; any advice is appreciated.
# Reprex code
#Generate variable
variable <- c(1,2,3)
df <- as.data.frame(matrix(0, ncol = 0, nrow = 30))
for(i in 1:length(variable)){
df[,c(paste0("variable",variable[i]))]<-as.vector(sample(c(0,1), replace=TRUE, size=30))
}
df
#Generate outcome
outcome <- c(1,2,3)
df2 <- as.data.frame(matrix(0, ncol = 0, nrow = 30))
for(i in 1:length(outcome)){
df2[,c(paste0("outcome",outcome[i]))]<-as.vector(sample(c(0,1), replace=TRUE, size=30))
}
df2
#Generate performance metrics of outcome and predictor
for (i in variable){
for(j in 1:length(df2)){
df[, c(paste0("tp.",variable[i]))] <- as.vector(ifelse(df[, c(paste0("variable",variable[i]))]==1 & df2[j]==1,1,0))
df[, c(paste0("tn.",variable[i]))] <- as.vector(ifelse(df[, c(paste0("variable",variable[i]))]==0 & df2[j]==0,1,0))
df[, c(paste0("fp.",variable[i]))] <- as.vector(ifelse(df[, c(paste0("variable",variable[i]))]==1 & df2[j]==0,1,0))
df[, c(paste0("fn.",variable[i]))] <- as.vector(ifelse(df[, c(paste0("variable",variable[i]))]==0 & df2[j]==1,1,0))
}
}
df
#bind the data for comparison and spot checking
df3 <- cbind(df2,df)
#here we see that only the final inner loop data are correct
df3
The problem is that you have 3 different variables that you want to compare to 3 different outcomes, so you are making 9 comparisons. However, since you are labelling your columns only according to the variable, you only have three unique numeric suffixes (one for each value of i) pasted onto each statistic (tp, tn, fp and fn). You therefore only have 12 distinct column names.
At no point are you labelling the columns according to both the variable and the outcome. That means that every time your inner loop increments to the next outcome variable, you are over-writing the column in df that you wrote in the previous iteration of the loop.
In any case, how would you intend to keep track of which comparison you are making unless you use both the variable number and the outcome number to label your columns?
So you could do it this way:
for (i in variable)
{
V <- paste0("variable", i)
for(j in seq_along(df2))
{
comp <- paste0(i, ".vs.", j)
df[paste0("tp.", comp)] <- as.numeric(df[V] == 1 & df2[j] == 1)
df[paste0("tn.", comp)] <- as.numeric(df[V] == 0 & df2[j] == 0)
df[paste0("fp.", comp)] <- as.numeric(df[V] == 1 & df2[j] == 0)
df[paste0("fn.", comp)] <- as.numeric(df[V] == 0 & df2[j] == 1)
}
}
df3 <- cbind(df2, df)
This gives you the structure you were looking for. It's a large data frame, so we'll just peek at it with str():
str(df3)
#> 'data.frame': 30 obs. of 42 variables:
#> $ outcome1 : num 0 1 1 1 1 0 1 0 1 1 ...
#> $ outcome2 : num 0 0 0 0 1 0 0 1 1 1 ...
#> $ outcome3 : num 1 1 0 0 0 0 0 0 1 0 ...
#> $ variable1: num 0 1 0 0 1 1 0 1 1 1 ...
#> $ variable2: num 1 1 0 0 0 0 0 0 0 0 ...
#> $ variable3: num 1 0 0 0 1 0 0 1 0 0 ...
#> $ tp.1.vs.1: num 0 1 0 0 1 0 0 0 1 1 ...
#> $ tn.1.vs.1: num 1 0 0 0 0 0 0 0 0 0 ...
#> $ fp.1.vs.1: num 0 0 0 0 0 1 0 1 0 0 ...
#> $ fn.1.vs.1: num 0 0 1 1 0 0 1 0 0 0 ...
#> $ tp.1.vs.2: num 0 0 0 0 1 0 0 1 1 1 ...
#> $ tn.1.vs.2: num 1 0 1 1 0 0 1 0 0 0 ...
#> $ fp.1.vs.2: num 0 1 0 0 0 1 0 0 0 0 ...
#> $ fn.1.vs.2: num 0 0 0 0 0 0 0 0 0 0 ...
#> $ tp.1.vs.3: num 0 1 0 0 0 0 0 0 1 0 ...
#> $ tn.1.vs.3: num 0 0 1 1 0 0 1 0 0 0 ...
#> $ fp.1.vs.3: num 0 0 0 0 1 1 0 1 0 1 ...
#> $ fn.1.vs.3: num 1 0 0 0 0 0 0 0 0 0 ...
#> $ tp.2.vs.1: num 0 1 0 0 0 0 0 0 0 0 ...
#> $ tn.2.vs.1: num 0 0 0 0 0 1 0 1 0 0 ...
#> $ fp.2.vs.1: num 1 0 0 0 0 0 0 0 0 0 ...
#> $ fn.2.vs.1: num 0 0 1 1 1 0 1 0 1 1 ...
#> $ tp.2.vs.2: num 0 0 0 0 0 0 0 0 0 0 ...
#> $ tn.2.vs.2: num 0 0 1 1 0 1 1 0 0 0 ...
#> $ fp.2.vs.2: num 1 1 0 0 0 0 0 0 0 0 ...
#> $ fn.2.vs.2: num 0 0 0 0 1 0 0 1 1 1 ...
#> $ tp.2.vs.3: num 1 1 0 0 0 0 0 0 0 0 ...
#> $ tn.2.vs.3: num 0 0 1 1 1 1 1 1 0 1 ...
#> $ fp.2.vs.3: num 0 0 0 0 0 0 0 0 0 0 ...
#> $ fn.2.vs.3: num 0 0 0 0 0 0 0 0 1 0 ...
#> $ tp.3.vs.1: num 0 0 0 0 1 0 0 0 0 0 ...
#> $ tn.3.vs.1: num 0 0 0 0 0 1 0 0 0 0 ...
#> $ fp.3.vs.1: num 1 0 0 0 0 0 0 1 0 0 ...
#> $ fn.3.vs.1: num 0 1 1 1 0 0 1 0 1 1 ...
#> $ tp.3.vs.2: num 0 0 0 0 1 0 0 1 0 0 ...
#> $ tn.3.vs.2: num 0 1 1 1 0 1 1 0 0 0 ...
#> $ fp.3.vs.2: num 1 0 0 0 0 0 0 0 0 0 ...
#> $ fn.3.vs.2: num 0 0 0 0 0 0 0 0 1 1 ...
#> $ tp.3.vs.3: num 1 0 0 0 0 0 0 0 0 0 ...
#> $ tn.3.vs.3: num 0 0 1 1 0 1 1 0 0 1 ...
#> $ fp.3.vs.3: num 0 0 0 0 1 0 0 1 0 0 ...
#> $ fn.3.vs.3: num 0 1 0 0 0 0 0 0 1 0 ...
The other (and perhaps more sensible) way is to have 3 data frames, one for each variable, each with twelve columns (three sets of tp, tn, fp, fn). You can do this easily with lapply, applied to the original df that holds only the three variable columns (not the one extended by the loop above):
df_list <- lapply(df, function(x)
{
dfs <- list()
for(j in seq_along(df2))
{
dfs[[j]] <- data.frame(ifelse(x == 1 & df2[j] == 1, 1, 0),
ifelse(x == 0 & df2[j] == 0, 1, 0),
ifelse(x == 1 & df2[j] == 0, 1, 0),
ifelse(x == 0 & df2[j] == 1, 1, 0))
}
setNames(do.call("cbind", dfs),
paste0(c("tp.", "tn.", "fp.", "fn."), rep(seq_along(df2), each = 4)))
})
Which gives you:
df_list
#> $variable1
#> tp.1 tn.1 fp.1 fn.1 tp.2 tn.2 fp.2 fn.2 tp.3 tn.3 fp.3 fn.3
#> 1 0 1 0 0 0 1 0 0 0 0 0 1
#> 2 1 0 0 0 0 0 1 0 1 0 0 0
#> 3 0 0 0 1 0 1 0 0 0 1 0 0
#> 4 0 0 0 1 0 1 0 0 0 1 0 0
#> 5 1 0 0 0 1 0 0 0 0 0 1 0
#> 6 0 0 1 0 0 0 1 0 0 0 1 0
#> 7 0 0 0 1 0 1 0 0 0 1 0 0
#> 8 0 0 1 0 1 0 0 0 0 0 1 0
#> 9 1 0 0 0 1 0 0 0 1 0 0 0
#> 10 1 0 0 0 1 0 0 0 0 0 1 0
#> 11 0 1 0 0 0 0 0 1 0 0 0 1
#> 12 0 0 1 0 1 0 0 0 1 0 0 0
#> 13 0 0 0 1 0 1 0 0 0 1 0 0
#> 14 0 1 0 0 0 0 0 1 0 0 0 1
#> 15 0 0 0 1 0 1 0 0 0 0 0 1
#> 16 0 1 0 0 0 1 0 0 0 0 0 1
#> 17 0 0 1 0 1 0 0 0 1 0 0 0
#> 18 0 0 1 0 0 0 1 0 0 0 1 0
#> 19 0 0 1 0 0 0 1 0 0 0 1 0
#> 20 1 0 0 0 0 0 1 0 1 0 0 0
#> 21 0 1 0 0 0 1 0 0 0 1 0 0
#> 22 1 0 0 0 0 0 1 0 1 0 0 0
#> 23 0 0 0 1 0 1 0 0 0 0 0 1
#> 24 0 0 0 1 0 1 0 0 0 0 0 1
#> 25 0 0 1 0 0 0 1 0 0 0 1 0
#> 26 0 0 1 0 0 0 1 0 0 0 1 0
#> 27 1 0 0 0 1 0 0 0 0 0 1 0
#> 28 0 0 1 0 0 0 1 0 1 0 0 0
#> 29 0 0 0 1 0 1 0 0 0 1 0 0
#> 30 0 0 1 0 0 0 1 0 0 0 1 0
#>
#> $variable2
#> tp.1 tn.1 fp.1 fn.1 tp.2 tn.2 fp.2 fn.2 tp.3 tn.3 fp.3 fn.3
#> 1 0 0 1 0 0 0 1 0 1 0 0 0
#> 2 1 0 0 0 0 0 1 0 1 0 0 0
#> 3 0 0 0 1 0 1 0 0 0 1 0 0
#> 4 0 0 0 1 0 1 0 0 0 1 0 0
#> 5 0 0 0 1 0 0 0 1 0 1 0 0
#> 6 0 1 0 0 0 1 0 0 0 1 0 0
#> 7 0 0 0 1 0 1 0 0 0 1 0 0
#> 8 0 1 0 0 0 0 0 1 0 1 0 0
#> 9 0 0 0 1 0 0 0 1 0 0 0 1
#> 10 0 0 0 1 0 0 0 1 0 1 0 0
#> 11 0 1 0 0 0 0 0 1 0 0 0 1
#> 12 0 0 1 0 1 0 0 0 1 0 0 0
#> 13 0 0 0 1 0 1 0 0 0 1 0 0
#> 14 0 0 1 0 1 0 0 0 1 0 0 0
#> 15 1 0 0 0 0 0 1 0 1 0 0 0
#> 16 0 0 1 0 0 0 1 0 1 0 0 0
#> 17 0 1 0 0 0 0 0 1 0 0 0 1
#> 18 0 0 1 0 0 0 1 0 0 0 1 0
#> 19 0 0 1 0 0 0 1 0 0 0 1 0
#> 20 1 0 0 0 0 0 1 0 1 0 0 0
#> 21 0 1 0 0 0 1 0 0 0 1 0 0
#> 22 0 0 0 1 0 1 0 0 0 0 0 1
#> 23 0 0 0 1 0 1 0 0 0 0 0 1
#> 24 1 0 0 0 0 0 1 0 1 0 0 0
#> 25 0 0 1 0 0 0 1 0 0 0 1 0
#> 26 0 0 1 0 0 0 1 0 0 0 1 0
#> 27 1 0 0 0 1 0 0 0 0 0 1 0
#> 28 0 0 1 0 0 0 1 0 1 0 0 0
#> 29 0 0 0 1 0 1 0 0 0 1 0 0
#> 30 0 0 1 0 0 0 1 0 0 0 1 0
#>
#> $variable3
#> tp.1 tn.1 fp.1 fn.1 tp.2 tn.2 fp.2 fn.2 tp.3 tn.3 fp.3 fn.3
#> 1 0 0 1 0 0 0 1 0 1 0 0 0
#> 2 0 0 0 1 0 1 0 0 0 0 0 1
#> 3 0 0 0 1 0 1 0 0 0 1 0 0
#> 4 0 0 0 1 0 1 0 0 0 1 0 0
#> 5 1 0 0 0 1 0 0 0 0 0 1 0
#> 6 0 1 0 0 0 1 0 0 0 1 0 0
#> 7 0 0 0 1 0 1 0 0 0 1 0 0
#> 8 0 0 1 0 1 0 0 0 0 0 1 0
#> 9 0 0 0 1 0 0 0 1 0 0 0 1
#> 10 0 0 0 1 0 0 0 1 0 1 0 0
#> 11 0 1 0 0 0 0 0 1 0 0 0 1
#> 12 0 1 0 0 0 0 0 1 0 0 0 1
#> 13 1 0 0 0 0 0 1 0 0 0 1 0
#> 14 0 1 0 0 0 0 0 1 0 0 0 1
#> 15 0 0 0 1 0 1 0 0 0 0 0 1
#> 16 0 0 1 0 0 0 1 0 1 0 0 0
#> 17 0 0 1 0 1 0 0 0 1 0 0 0
#> 18 0 1 0 0 0 1 0 0 0 1 0 0
#> 19 0 1 0 0 0 1 0 0 0 1 0 0
#> 20 1 0 0 0 0 0 1 0 1 0 0 0
#> 21 0 0 1 0 0 0 1 0 0 0 1 0
#> 22 0 0 0 1 0 1 0 0 0 0 0 1
#> 23 1 0 0 0 0 0 1 0 1 0 0 0
#> 24 0 0 0 1 0 1 0 0 0 0 0 1
#> 25 0 1 0 0 0 1 0 0 0 1 0 0
#> 26 0 1 0 0 0 1 0 0 0 1 0 0
#> 27 1 0 0 0 1 0 0 0 0 0 1 0
#> 28 0 0 1 0 0 0 1 0 1 0 0 0
#> 29 1 0 0 0 0 0 1 0 0 0 1 0
#> 30 0 0 1 0 0 0 1 0 0 0 1 0
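If what is ultimately needed is the number of tp/tn/fp/fn for each pairing rather than row-level indicator columns, one possible sketch (not part of either approach above) enumerates every variable/outcome pair with expand.grid() and counts with mapply(). It works for any number of variable and outcome columns, which addresses the flexibility concern in the question. The data are regenerated here so the block runs on its own.

```r
set.seed(42)
n <- 30
df <- as.data.frame(replicate(3, sample(c(0, 1), n, replace = TRUE)))
names(df) <- paste0("variable", 1:3)
df2 <- as.data.frame(replicate(3, sample(c(0, 1), n, replace = TRUE)))
names(df2) <- paste0("outcome", 1:3)

# One row per variable/outcome pairing, however many columns each has.
pairs <- expand.grid(variable = names(df), outcome = names(df2),
                     stringsAsFactors = FALSE)

# For each pairing, count true/false positives and negatives.
counts <- t(mapply(function(v, o) {
  p <- df[[v]]
  a <- df2[[o]]
  c(tp = sum(p == 1 & a == 1), tn = sum(p == 0 & a == 0),
    fp = sum(p == 1 & a == 0), fn = sum(p == 0 & a == 1))
}, pairs$variable, pairs$outcome, USE.NAMES = FALSE))

summary_tab <- cbind(pairs, counts)
summary_tab
```

Each row names its own comparison in the variable and outcome columns, so nothing can be overwritten.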

recursively write out model matrix in R

In the analysis I am running, there are many predictor variables for which I would like to build a model matrix. However, model.matrix() requires a formula in a format such as
t<-model.matrix(f[,1]~f[,2]+f[,3]+....)
If my data frame is called f, is there a quick way, with paste or something, to write out this formula automatically? Otherwise I would need to type everything.
Why not use ~ . , where the dot stands for every column in data that is not already in the formula:
f <- data.frame(z = 1:10, b= 1:10, d=factor(1:10))
model.matrix(~. , data=f[-1])
#-------------
(Intercept) b d2 d3 d4 d5 d6 d7 d8 d9 d10
1 1 1 0 0 0 0 0 0 0 0 0
2 1 2 1 0 0 0 0 0 0 0 0
3 1 3 0 1 0 0 0 0 0 0 0
4 1 4 0 0 1 0 0 0 0 0 0
5 1 5 0 0 0 1 0 0 0 0 0
6 1 6 0 0 0 0 1 0 0 0 0
7 1 7 0 0 0 0 0 1 0 0 0
8 1 8 0 0 0 0 0 0 1 0 0
9 1 9 0 0 0 0 0 0 0 1 0
10 1 10 0 0 0 0 0 0 0 0 1
attr(,"assign")
[1] 0 1 2 2 2 2 2 2 2 2 2
attr(,"contrasts")
attr(,"contrasts")$d
[1] "contr.treatment"
Compare to what you get with:
> model.matrix(z~., f)
(Intercept) b d2 d3 d4 d5 d6 d7 d8 d9 d10
1 1 1 0 0 0 0 0 0 0 0 0
2 1 2 1 0 0 0 0 0 0 0 0
3 1 3 0 1 0 0 0 0 0 0 0
4 1 4 0 0 1 0 0 0 0 0 0
5 1 5 0 0 0 1 0 0 0 0 0
6 1 6 0 0 0 0 1 0 0 0 0
7 1 7 0 0 0 0 0 1 0 0 0
8 1 8 0 0 0 0 0 0 1 0 0
9 1 9 0 0 0 0 0 0 0 1 0
10 1 10 0 0 0 0 0 0 0 0 1
attr(,"assign")
[1] 0 1 2 2 2 2 2 2 2 2 2
attr(,"contrasts")
attr(,"contrasts")$d
[1] "contr.treatment"
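If an explicit formula is still wanted (for example to print or store it), it can be built from the column names, which is what the paste() idea in the question amounts to. A sketch using base R's reformulate(), with the paste()/as.formula() route shown for comparison:

```r
f <- data.frame(z = 1:10, b = 1:10, d = factor(1:10))

# First column as the response, all remaining columns as predictors.
fml <- reformulate(names(f)[-1], response = names(f)[1])
fml  # z ~ b + d

# The same formula built with paste() and as.formula().
fml2 <- as.formula(paste(names(f)[1], "~", paste(names(f)[-1], collapse = " + ")))

mm <- model.matrix(fml, data = f)
```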

Copy a row but with some modifications

I have a large data set like this:
SUB SMOKE AMT MDV ADDL II EVID
1 0 0 0 0 0 0
1 0 20 0 16 24 1
1 0 0 0 0 0 0
1 0 0 0 0 0 0
2 1 0 0 0 0 0
2 1 50 0 24 12 1
2 1 0 0 0 0 0
2 1 0 0 0 0 0
...
I want to copy each row where EVID = 1 and insert the copy directly below it. In the copied row, AMT, ADDL, II and EVID should all equal 0, while SUB, SMOKE and MDV remain the same. The expected output should look like this:
SUB SMOKE AMT MDV ADDL II EVID
1 0 0 0 0 0 0
1 0 20 0 16 24 1
1 0 0 0 0 0 0
1 0 0 0 0 0 0
1 0 0 0 0 0 0
2 1 0 0 0 0 0
2 1 50 0 24 12 1
2 1 0 0 0 0 0
2 1 0 0 0 0 0
2 1 0 0 0 0 0
...
Does anyone have an idea how to do this?
# repeat EVID=0 rows 1 time and EVID=1 rows 2 times
r <- rep(1:nrow(DF), DF$EVID + 1)
DF2 <- DF[r, ]
# insert zeros
DF2[duplicated(r), c("AMT", "ADDL", "II", "EVID")] <- 0
giving:
> DF2
SUB SMOKE AMT MDV ADDL II EVID
1 1 0 0 0 0 0 0
2 1 0 20 0 16 24 1
2.1 1 0 0 0 0 0 0
3 1 0 0 0 0 0 0
4 1 0 0 0 0 0 0
5 2 1 0 0 0 0 0
6 2 1 50 0 24 12 1
6.1 2 1 0 0 0 0 0
7 2 1 0 0 0 0 0
8 2 1 0 0 0 0 0
Maybe this (note that it appends the copies at the end rather than directly below the original rows):
> t2 <- t[t$EVID==1,] # t is your data.frame
> t2[c("AMT","ADDL","II","EVID")] <- 0
> t2
SUB SMOKE AMT MDV ADDL II EVID
2 1 0 0 0 0 0 0
6 2 1 0 0 0 0 0
> rbind(t,t2)
SUB SMOKE AMT MDV ADDL II EVID
1 1 0 0 0 0 0 0
2 1 0 20 0 16 24 1
3 1 0 0 0 0 0 0
4 1 0 0 0 0 0 0
5 2 1 0 0 0 0 0
6 2 1 50 0 24 12 1
7 2 1 0 0 0 0 0
8 2 1 0 0 0 0 0
21 1 0 0 0 0 0 0 # this row
61 2 1 0 0 0 0 0 # and this one are new
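To place the copies directly below their originals, as in the expected output, the rbind() approach can be combined with an ordering key. The DF below is a hypothetical reconstruction of the eight rows shown in the question.

```r
DF <- data.frame(SUB   = rep(1:2, each = 4),
                 SMOKE = rep(0:1, each = 4),
                 AMT   = c(0, 20, 0, 0, 0, 50, 0, 0),
                 MDV   = 0,
                 ADDL  = c(0, 16, 0, 0, 0, 24, 0, 0),
                 II    = c(0, 24, 0, 0, 0, 12, 0, 0),
                 EVID  = c(0, 1, 0, 0, 0, 1, 0, 0))

idx <- which(DF$EVID == 1)
copies <- DF[idx, ]
copies[c("AMT", "ADDL", "II", "EVID")] <- 0

out <- rbind(DF, copies)
# Original rows keep positions 1..n; each copy gets its source row + 0.5,
# so sorting on this key puts every copy right after its original.
out <- out[order(c(seq_len(nrow(DF)), idx + 0.5)), ]
rownames(out) <- NULL
out
```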

Knitr echo function call with evaluated parameter names

If I have previously defined
n <- 200
maf <- .2
Is there any way to echo
snp <- rbinom(n,2,maf)
such that it displays as
snp <- rbinom(200,2,.2)
in the resulting knitr document.
You can use substitute() like this:
```{r, echo=FALSE}
substitute(snp <- rbinom(n,2,maf),list(n=n,maf=maf))
rbinom(n,2,maf)
```
## snp <- rbinom(200, 2, 0.2)
## [1] 0 0 1 1 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 2 0 1 1 2 1 0 0 0 0
## [36] 1 0 0 1 0 0 0 0 0 1 0 0 1 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0 0 1
## [71] 0 1 0 0 0 0 0 1 0 1 0 1 1 0 0 1 0 1 0 0 1 0 1 1 1 2 0 1 0 1 1 0 1 0 0
## [106] 1 1 0 1 1 1 0 0 1 0 0 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
## [141] 1 0 0 1 0 0 1 0 2 0 0 0 1 0 0 0 1 1 0 0 0 0 0 2 1 1 2 0 1 0 0 0 0 0 1
## [176] 0 0 2 0 1 1 0 0 0 0 0 1 0 1 2 0 1 1 2 0 0 1 1 1 0
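Note that the chunk above prints the substituted call but then runs rbinom() separately, so snp itself is never created. A sketch that prints the call and also evaluates it, with bquote() shown as an equivalent way to build the same call (it can run as plain R or inside an echo=FALSE chunk):

```r
n <- 200
maf <- .2

# Insert the current values of n and maf into the call without running it.
expr <- substitute(snp <- rbinom(n, 2, maf), list(n = n, maf = maf))
# bquote() builds the same call; .() marks the parts to evaluate now.
expr2 <- bquote(snp <- rbinom(.(n), 2, .(maf)))

print(expr)  # snp <- rbinom(200, 2, 0.2)
eval(expr)   # performs the assignment, so snp now exists
table(snp)
```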
