Storing output for nested loop - r

I'm trying to run a nested loop of logistic regressions: an outer loop over the discretization values and an inner loop over each class.
Here's the code so far, but I can't get the output stored for each iteration.
class <- c(1, 2, 3, 4, 5)
discretization_value <- seq(0.25, 0.75, by = 0.05)
output <- data.frame(matrix(nrow = 500, ncol = 5))
names(output) = c("discretization_value", "class", "var1_coef", "var2_coef", "var3_coef")
for (i in discretization_value) {
  for (j in class) {
    df$discretization_value <- ifelse(df$score >= i, 1, 0)
    result <- glm(discretization_value ~ var1 + var2 + var3,
                  data = df[df$class == j, ], family = "binomial")
    output[i, 1] <- i
    output[i, 2] <- j
    output[i, 3] <- coef(summary(result))["var1", "Estimate"]
    output[i, 4] <- coef(summary(result))["var2", "Estimate"]
    output[i, 5] <- coef(summary(result))["var3", "Estimate"]
  }
}
A snippet of my df:
class score var1 var2 var3
1 0.3 0.18 0.33 356
1 0.5 0.22 0.55 33
1 0.6 0.77 0.44 35
2 0.9 0.99 0.55 2
3 0 0 0 0
3 0.4 0.5 0.11 5
4 0 0.6 0 7
4 0 0.6 0 9
4 0.6 0.2 0.1 6

Could this be the problem?
data = df[df$class == j,], family= "binomial"))
I would try removing the comma before the closing square bracket.
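A more likely cause, offered as a hedged sketch rather than a definitive diagnosis: output[i, 1] <- i uses the discretization value itself (0.25, 0.30, ..., 0.75) as a row index. R truncates non-integer numeric indices toward zero, so these assignments select no row and nothing is stored. The row subset df[df$class == j, ] with the comma is the usual way to select rows. The sketch below keeps a separate integer row counter and sizes output from the loop lengths. It runs on simulated data, a hypothetical stand-in for the real df. It also reads the estimates with coef(result), which keeps NA for any term that cannot be estimated, whereas the coef(summary(result)) table drops such rows.

```r
# Hypothetical simulated data standing in for the real df
set.seed(1)
n <- 500
df <- data.frame(class = sample(1:5, n, replace = TRUE),
                 score = runif(n),
                 var1 = rnorm(n), var2 = rnorm(n), var3 = rnorm(n))

class <- c(1, 2, 3, 4, 5)
discretization_value <- seq(0.25, 0.75, by = 0.05)

# One row per (threshold, class) pair
output <- data.frame(matrix(nrow = length(discretization_value) * length(class), ncol = 5))
names(output) <- c("discretization_value", "class", "var1_coef", "var2_coef", "var3_coef")

row_id <- 0  # integer row counter; i itself (e.g. 0.25) is not a usable row index
for (i in discretization_value) {
  # Binarise the score once per threshold
  df$discretization_value <- ifelse(df$score >= i, 1, 0)
  for (j in class) {
    row_id <- row_id + 1
    result <- glm(discretization_value ~ var1 + var2 + var3,
                  data = df[df$class == j, ], family = "binomial")
    est <- coef(result)  # named vector; non-estimable terms would be NA
    output[row_id, ] <- c(i, j, est[c("var1", "var2", "var3")])
  }
}
head(output)
```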

Related

Computation with different combinations of parameters using for loop

I am trying to implement a for loop in R to fill a data frame with combinations of learning rates and decays used in machine learning. The idea is to try several learning rates and decays, calculate error metrics for each combination, and save them in a data frame so I can identify which combination is best.
Below is the code and my result. I don't understand why I get this result.
learning_rate = c(0.01, 0.02)
decay = c(0, 1e-1)
combinations = length(learning_rate) * length(decay)
df <- data.frame(Combination = character(combinations),
                 lr = double(combinations),
                 decay = double(combinations),
                 result = character(combinations),
                 stringsAsFactors = FALSE)
for (i in 1:combinations) {
  for (lr in learning_rate) {
    for (dc in decay) {
      df[i, 1] = i
      df[i, 2] = lr
      df[i, 3] = dc
      df[i, 4] = 10*lr + dc*4  # Here I'd do some machine learning; this simple equation is just an example
    }
  }
}
This is the result I get. It seems that only the combination loop worked. What did I do wrong?
Combination lr decay result
1 0.02 0.1 0.6
2 0.02 0.1 0.6
3 0.02 0.1 0.6
4 0.02 0.1 0.6
I expected this result:
Combination lr decay result
1 0.01 0 0.1
2 0.01 1e-1 0.5
3 0.02 0 0.2
4 0.02 1e-1 0.6
Tuning with for-loop:
df <- data.frame()
for (lr in learning_rate) {
  for (dc in decay) {
    df <- rbind(df, data.frame(
      lr = lr,
      decay = dc,
      result = 10*lr + dc*4
    ))
  }
}
df
# lr decay result
# 1 0.01 0.0 0.1
# 2 0.01 0.1 0.5
# 3 0.02 0.0 0.2
# 4 0.02 0.1 0.6
Tuning with mapply():
df <- expand.grid(lr = learning_rate, decay = decay)
ML.fun <- function(lr, dc) 10*lr + dc*4
df$result <- mapply(ML.fun, lr = df$lr, dc = df$decay)
df
# lr decay result
# 1 0.01 0.0 0.1
# 2 0.02 0.0 0.2
# 3 0.01 0.1 0.5
# 4 0.02 0.1 0.6
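The original loop gives four identical rows because, for each i, the two inner loops run through all four (lr, dc) pairs and keep overwriting row i. Every row therefore ends up holding the last pair (0.02, 0.1). If you prefer to keep the preallocated data frame, one alternative sketch drops the outer i loop and advances a row counter inside the inner loop instead. Here result is stored as a double rather than a character, on the assumption that it holds a numeric metric.

```r
learning_rate <- c(0.01, 0.02)
decay <- c(0, 1e-1)
combinations <- length(learning_rate) * length(decay)

# Preallocate one row per (lr, decay) combination
df <- data.frame(Combination = integer(combinations),
                 lr = double(combinations),
                 decay = double(combinations),
                 result = double(combinations))

i <- 0  # row counter, advanced once per (lr, dc) pair
for (lr in learning_rate) {
  for (dc in decay) {
    i <- i + 1
    # Placeholder for the real model fit and error metric
    df[i, ] <- c(i, lr, dc, 10 * lr + dc * 4)
  }
}
df
```

This keeps the row order of the expected result (lr varies slowest), unlike expand.grid(), which varies its first argument fastest.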

How to use an IF function to update columns in a data frame?

predict <- read.table(header=TRUE, text="
0 1
0.44 0.55
0.76 0.24
0.71 0.29
0.75 0.24
0.25 0.75
")
I have attached a sample data frame with 2 columns titled '0' and '1'. I want to use an IF function so that if the value in the '0' column is greater than 0.7, the cell is updated to 0. Likewise, if the value in the '1' column is greater than 0.7, the cell is updated to 1. Finally, if neither the '0' nor the '1' value is greater than 0.7, I would like both cells to be set to -99. Below is what my sample should look like after this IF function is applied:
predict <- read.table(header=TRUE, text="
0 1
-99 -99
0 0.24
0 0.29
0 0.24
0.25 1
")
The code I have attempted is:
if(predict[,1] > 0.7 ){predict[,1] == '0' }
if(predict[,1] > 0.7 ){predict[,2] == '1' }
If you could advise me on the best way to update this IF function that would be really appreciated.
Update
Based on AniGoyal's input (many thanks for this!), I updated the answer to produce exactly the output the OP wants by combining the two answers into one pipeline:
Code:
library(dplyr)
predict %>%
  as_tibble() %>%
  mutate(a = case_when(X0 > 0.7 ~ 0,
                       TRUE ~ ifelse(X0 < 0.7 & X1 < 0.7, -99, X0)),
         b = case_when(X1 > 0.7 ~ 1,
                       TRUE ~ ifelse(X1 < 0.7 & X0 < 0.7, -99, X1))) %>%
  select(X0 = a, X1 = b)
Output:
X0 X1
<dbl> <dbl>
1 -99 -99
2 0 0.24
3 0 0.290
4 0 0.24
5 0.25 1
We could use case_when from the dplyr package. mutate() changes columns X0 and X1 depending on the case_when condition.
library(dplyr)
predict %>%
mutate(X0 = case_when(X0 > 0.7 ~ 0,
TRUE ~ -99),
X1 = case_when(X1 > 0.7 ~ 1,
TRUE ~ -99)
)
Output:
X0 X1
1 -99 -99
2 0 -99
3 0 -99
4 0 -99
5 -99 1
ifelse
Or we could use ifelse https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/ifelse
predict$X0 <- ifelse(predict$X0 > 0.7, 0, -99)
predict$X1 <- ifelse(predict$X1 > 0.7, 1, -99)
predict
Note - numeric names for columns are less desirable ("0" and "1"). Here they are renamed to "X0" and "X1".
One approach with base R is to subset your data for your 3 circumstances, first checking to see if neither are greater than .7 (and set both to -99), then checking the 0 column (set to 0), then checking the 1 column (set to 1):
predict[!(predict$X0 > .7 | predict$X1 > .7), c("X0", "X1")] <- -99
predict[predict$X0 > .7, "X0"] <- 0
predict[predict$X1 > .7, "X1"] <- 1
predict
Output
X0 X1
1 -99.00 -99.00
2 0.00 0.24
3 0.00 0.29
4 0.00 0.24
5 0.25 1.00
This is just another way using dplyr:
library(dplyr)
predict %>%
as_tibble() %>%
mutate(X0 = ifelse(X0 > 0.7, 0, X0),
X1 = ifelse(X1 > 0.7, 1, X1)) %>%
  mutate(across(X0:X1, ~ ifelse(X0 < 0.7 & X0 != 0 & X1 < 0.7, -99, .)))
X0 X1
<dbl> <dbl>
1 -99 -99
2 0 0.24
3 0 0.290
4 0 0.24
5 0.25 1
There are two errors in the code you are trying:
1. Base R's if/else is not vectorised: if() only looks at a single TRUE/FALSE value. To check every element of a vector, you have to use it inside a loop (or use the vectorised ifelse()).
2. == is used for assignment. == is for comparisons and conditionals, not assignment; use <- or = for assignment.
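A small sketch of both points uses the question's '0' column as a plain vector x, an illustrative stand-in for predict[, 1]. == only returns TRUE/FALSE values and leaves x unchanged, while the vectorised ifelse() tests every element.

```r
x <- c(0.44, 0.76, 0.71, 0.75, 0.25)  # values of the '0' column from the question

# == compares: it returns one TRUE/FALSE per element and does not modify x
cmp <- x == 0
print(cmp)

# ifelse() checks each element: 0 where x > 0.7, otherwise keep x
x_new <- ifelse(x > 0.7, 0, x)
print(x_new)
```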
If you still want to do it in base R's if/else style:
for (i in 1:nrow(predict)) {
  if (predict[i, 1] > 0.7) {
    predict[i, 1] = 0
  }
  if (predict[i, 2] > 0.7) {
    predict[i, 2] = 1
  }
  if (predict[i, 1] < 0.7 && predict[i, 2] < 0.7 && predict[i, 1] > 0) {
    predict[i, 1] = -99
    predict[i, 2] = -99
  }
}
> predict
X0 X1
1 -99.00 -99.00
2 0.00 0.24
3 0.00 0.29
4 0.00 0.24
5 0.25 1.00
You may also consider using replace(), like this. Compute the "neither above 0.7" condition once, before column 1 is overwritten. Otherwise the -99 written into column 1 makes the second check fail, and row 1 keeps 0.55 in column 2:
predict[, 1] <- replace(predict[, 1], predict[, 1] > 0.7, 0)
predict[, 2] <- replace(predict[, 2], predict[, 2] > 0.7, 1)
neither <- predict[, 1] < 0.7 & predict[, 2] < 0.7 & predict[, 1] > 0
predict[, 1] <- replace(predict[, 1], neither, -99)
predict[, 2] <- replace(predict[, 2], neither, -99)
> predict
X0 X1
1 -99.00 -99.00
2 0.00 0.24
3 0.00 0.29
4 0.00 0.24
5 0.25 1.00

How can I set a dummy variable in a regression in R?

The following is my data:
y r1 r2 r3
1 0.1 0.2 -0.3
2 0.7 -0.9 0.03
3 -0.93 -0.32 -0.22
1. The first question is how I can get output like this:
y r1 r2 r3 dummy_r1 dummy_r2 dummy_r3
1 0.1 0.2 -0.3 0 0 1
2 0.7 -0.9 0.03 0 1 0
3 -0.93 -0.32 -0.22 1 1 1
Note: I want negative values to be coded as 1 and positive values as 0.
2. The second question: if I want to run a regression like lm(y ~ r1 + r2 + r3 + dummy_r1 + dummy_r2 + dummy_r3), what should I do if I don't want to build the dummy columns (dummy_r1, dummy_r2, dummy_r3) above, because that is not convenient?
Using DF, shown reproducibly in the Note at the end, define DF2 to also have the sign.* columns and then run the regression on that. Of course, the question doesn't show enough data to actually estimate coefficients for so many predictors, but if your real problem has more data it should be OK.
DF2 <- cbind(DF, sign = +(DF[-1] < 0))
lm(y ~., DF2)
giving:
Call:
lm(formula = y ~ ., data = DF2)
Coefficients:
(Intercept) r1 r2 r3 sign.r1
1.425 -1.163 -1.543 NA NA
sign.r2 sign.r3
NA NA
Note
Lines <- "y r1 r2 r3
1 0.1 0.2 -0.3
2 0.7 -0.9 0.03
3 -0.93 -0.32 -0.22"
DF <- read.table(text = Lines, header = TRUE)
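If creating the dummy columns is the inconvenient part, one possible alternative is to build the dummies inside the formula with I(), so nothing extra is stored in DF. This is a sketch and is not taken from the answer above. +(r1 < 0) turns the comparison into 1 for negative values and 0 otherwise. model.matrix() shows the design matrix that lm() would use. With only the three rows from the Note, most coefficients will still be NA.

```r
# Data from the Note above
Lines <- "y r1 r2 r3
1 0.1 0.2 -0.3
2 0.7 -0.9 0.03
3 -0.93 -0.32 -0.22"
DF <- read.table(text = Lines, header = TRUE)

# Sign dummies built on the fly: 1 if negative, 0 otherwise
fml <- y ~ r1 + r2 + r3 + I(+(r1 < 0)) + I(+(r2 < 0)) + I(+(r3 < 0))

# Design matrix lm() would use (intercept, r1..r3, then the three dummies)
mm <- model.matrix(fml, data = DF)
print(mm)

fit <- lm(fml, data = DF)
coef(fit)
```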

Defining a function that includes for loops

I have 2 data frames.
One (df1) has columns for slopes and intercepts, and the other (df2) has an index column (i.e., row numbers).
I wish to apply a function based on parameters from df1 to the entire index column in df2. I don't want the function to mix and match slopes and intercepts (i.e., I want to make sure that the function always uses a slope and an intercept from the same row of df1).
I tried to do this
my_function <- function(x) {for (i in df1$slope) for (j in df1$intercept) {((i*x)+j)}}
df3 <- for (k in df2$Index) {my_function(k)}
df3
but it didn't work.
Here are sample data:
> df1
thermocouple slope intercept
1 1 0.01 0.5
2 2 -0.01 0.4
3 3 0.03 0.2
> df2
index t_1 t_2 t_3
1 1 0.3 0.2 0.2
2 2 0.5 0.2 0.3
3 3 0.3 0.9 0.1
4 4 1.2 1.8 0.4
5 5 2.3 3.1 1.2
Here would be the output I need:
index baseline_t_1 baseline_t_2 baseline_t_3
1 0.51 0.39 0.23
2 0.52 0.38 0.26
3 0.53 0.37 0.29
4 0.54 0.36 0.32
5 0.55 0.35 0.35
What am I doing wrong?
Thanks!
Try this, which passes three arguments at the same time to an anonymous function defined in Map:
Map( function(index, slope, intercept) (index * slope ) + intercept,
index = df2$Index, slope = df1$slope, intercept = df1$intercept)
Or maybe this; I am not sure which one you prefer, given that there is no data or expected output in the question.
lapply( df2$index, function(index){
unlist( Map( function(slope, intercept) (index * slope ) + intercept,
slope = df1$slope, intercept = df1$intercept) )
})
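Two things go wrong in the original attempt. A for loop always returns NULL, so df3 ends up NULL. Nesting a loop over intercepts inside a loop over slopes also pairs every slope with every intercept. A vectorised sketch that keeps each slope with its own intercept works as follows: outer() multiplies every index by every slope (rows = index, columns = thermocouple), and sweep() then adds intercept k to column k. df1 and df2 are rebuilt from the sample data in the question, and the baseline_t_ column names follow the expected output.

```r
df1 <- data.frame(thermocouple = 1:3,
                  slope = c(0.01, -0.01, 0.03),
                  intercept = c(0.5, 0.4, 0.2))
df2 <- data.frame(index = 1:5,
                  t_1 = c(0.3, 0.5, 0.3, 1.2, 2.3),
                  t_2 = c(0.2, 0.2, 0.9, 1.8, 3.1),
                  t_3 = c(0.2, 0.3, 0.1, 0.4, 1.2))

# index * slope for every (index, thermocouple) pair: a 5 x 3 matrix
baseline <- outer(df2$index, df1$slope)
# Add each thermocouple's own intercept to its column (MARGIN = 2)
baseline <- sweep(baseline, 2, df1$intercept, "+")
colnames(baseline) <- paste0("baseline_t_", df1$thermocouple)

result <- data.frame(index = df2$index, baseline)
result
```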

Data frame column-wise subtraction and division

I need help with column-wise subtraction and division over N columns. Below are the columns in the input data frame.
input dataframe:
> df
A B C D
1 1 3 6 2
2 3 3 3 4
3 1 2 2 2
4 4 4 4 4
5 5 2 3 2
Formula: keep A as-is; every following column b becomes (b - a) / (1 - a), where a is the column before it.
My code:
ABC <- cbind.data.frame(DF[1], (DF[-1] - DF[-ncol(DF)])/(1 - DF[-ncol(DF)]))
Expected output:
A B C D
1 Inf -1.5 0.8
3 0.00 0.0 -0.5
1 Inf 0.0 0.0
4 0.00 0.0 0.0
5 0.75 -1.0 0.5
But I don't want to use ncol here, because there is another column after column D in the actual data frame. I want to apply this formula only to the first 4 columns; if I use ncol, it will go all the way to the last column of the data frame.
Please help, thanks.
What about trying:
df <- matrix(c(1,3,6,2,3,3,3,4,1,2,2,2,4,4,4,4,5,2,3,2), nrow = 5, byrow = TRUE)
df_2 <- matrix((df[,2]-df[,1])/(1-df[,1]),5,1)
df_3 <- matrix((df[,3]-df[,2])/(1-df[,2]),5,1)
df_4 <- matrix((df[,4]-df[,3])/(1-df[,3]),5,1)
cbind(df[,1],df_2,df_3,df_4)
edit: a loop version
df <- matrix(c(1,3,6,2,3,3,3,4,1,2,2,2,4,4,4,4,5,2,3,2), nrow = 5, byrow = TRUE)
test_bind <- c()
test_bind <- cbind(test_bind, df[,1])
for (i in 1:3) {
  df_1 <- matrix((df[, i+1] - df[, i]) / (1 - df[, i]), 5, 1)
  test_bind <- cbind(test_bind, df_1)
}
test_bind
Here is one option with the tidyverse:
library(dplyr)
library(purrr)
map2_df(DF[2:4], DF[1:3], ~ (.x - .y)/(1- .y)) %>%
bind_cols(DF[1], .)
# A B C D
#1 1 Inf -1.5 0.8
#2 3 0.00 0.0 -0.5
#3 1 Inf 0.0 0.0
#4 4 0.00 0.0 0.0
#5 5 0.75 -1.0 0.5
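To avoid ncol(), one base R option is to name the columns the formula should cover explicitly. Any column outside that set is carried over unchanged. In this sketch, E is a hypothetical extra column standing in for the real last column, and the output keeps it at the end.

```r
df <- data.frame(A = c(1, 3, 1, 4, 5),
                 B = c(3, 3, 2, 4, 2),
                 C = c(6, 3, 2, 4, 3),
                 D = c(2, 4, 2, 4, 2),
                 E = c(10, 20, 30, 40, 50))  # hypothetical extra column

cols <- c("A", "B", "C", "D")    # only these columns take part in the formula
prev <- df[cols[-length(cols)]]  # a: A, B, C
curr <- df[cols[-1]]             # b: B, C, D

ratio <- (curr - prev) / (1 - prev)  # element-wise (b - a) / (1 - a)
names(ratio) <- cols[-1]

# First column as-is, transformed columns, then everything else untouched
res <- cbind(df[cols[1]], ratio, df[setdiff(names(df), cols)])
res
```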
