How can I set a dummy variable in a regression in R?

The following is my data
y r1 r2 r3
1 0.1 0.2 -0.3
2 0.7 -0.9 0.03
3 -0.93 -0.32 -0.22
1. The first question: how can I get output like this?
y r1 r2 r3 dummy_r1 dummy_r2 dummy_r3
1 0.1 0.2 -0.3 0 0 1
2 0.7 -0.9 0.03 0 1 0
3 -0.93 -0.32 -0.22 1 1 1
Note: I want negative values to become 1 and positive values to become 0.
2. The second question: if I want to run a regression like lm(y ~ r1 + r2 + r3 + dummy_r1 + dummy_r2 + dummy_r3), what should I do if I don't want to use the output data above (dummy_r1, dummy_r2, dummy_r3), because creating it is not convenient?

Using DF, shown reproducibly in the Note at the end, define DF2 to also have the sign.* columns and then run the regression on that. Of course, the data shown in the question is not enough to estimate coefficients for so many predictors, but if your real problem has more data it should be OK.
DF2 <- cbind(DF, sign = +(DF[-1] < 0))
lm(y ~., DF2)
giving:
Call:
lm(formula = y ~ ., data = DF2)
Coefficients:
(Intercept) r1 r2 r3 sign.r1
1.425 -1.163 -1.543 NA NA
sign.r2 sign.r3
NA NA
Note
Lines <- "y r1 r2 r3
1 0.1 0.2 -0.3
2 0.7 -0.9 0.03
3 -0.93 -0.32 -0.22"
DF <- read.table(text = Lines, header = TRUE)
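For the second question, one possible alternative (a sketch, not part of the answer above) is to build the indicators inside the formula with I(), so no extra columns need to be stored. The data frame `sim` and the coefficients used to generate it are made up for illustration, since three rows cannot identify seven coefficients.

```r
# Simulated data (an assumption for illustration; not the asker's data)
set.seed(42)
n <- 50
sim <- data.frame(r1 = rnorm(n), r2 = rnorm(n), r3 = rnorm(n))
sim$y <- 1 + 0.5 * sim$r1 - 0.3 * sim$r2 + 2 * (sim$r3 < 0) + rnorm(n, sd = 0.1)

# I(r1 < 0) is TRUE where r1 is negative; lm() turns it into a 0/1 indicator
fit <- lm(y ~ r1 + r2 + r3 + I(r1 < 0) + I(r2 < 0) + I(r3 < 0), data = sim)
coef(fit)
```

The indicator terms are recomputed from r1, r2 and r3 each time the model is fitted, so the data frame itself stays unchanged.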

Related

Computation with different combinations of parameters using for loop

I am trying to implement a for loop in R to fill a df with combinations of learning rates and decays used in machine learning. The idea is to try several learning rates and decays, calculate error metrics for each combination, and save them in a dataset, so that I can point out which combination is best.
Below is the code and my result. I don't understand why I get this result.
learning_rate = c(0.01, 0.02)
decay = c(0, 1e-1)
combinations = length(learning_rate) * length(decay)
df <- data.frame(Combination=character(combinations),
lr=double(combinations),
decay=double(combinations),
result=character(combinations),
stringsAsFactors=FALSE)
for (i in 1:combinations) {
for (lr in learning_rate) {
for (dc in decay) {
df[i, 1] = i
df[i, 2] = lr
df[i, 3] = dc
df[i, 4] = 10*lr + dc*4 # Here I'd do some machine learning; this simple equation is just an example
}
}
}
This is the result I get. It seems that only the combination loop worked well. What did I do wrong?
Combination lr decay result
1 0.02 0.1 0.6
2 0.02 0.1 0.6
3 0.02 0.1 0.6
4 0.02 0.1 0.6
I expected this result
Combination lr decay result
1 0.01 0 0.1
2 0.01 1e-1 0.5
3 0.02 0 0.2
4 0.02 1e-1 0.6
Tuning with for-loop:
df <- data.frame()
for (lr in learning_rate) {
for (dc in decay) {
df <- rbind(df, data.frame(
lr = lr,
decay = dc,
result = 10*lr + dc*4
))
}
}
df
# lr decay result
# 1 0.01 0.0 0.1
# 2 0.01 0.1 0.5
# 3 0.02 0.0 0.2
# 4 0.02 0.1 0.6
Tuning with mapply():
df <- expand.grid(lr = learning_rate, decay = decay)
ML.fun <- function(lr, dc) 10*lr + dc*4
df$result <- mapply(ML.fun, lr = df$lr, dc = df$decay)
df
# lr decay result
# 1 0.01 0.0 0.1
# 2 0.02 0.0 0.2
# 3 0.01 0.1 0.5
# 4 0.02 0.1 0.6
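As for why the original code fills every row with the same values: the outer loop over i reruns both inner loops in full for each i, so row i is overwritten until only the last combination (0.02, 0.1) remains. A minimal sketch of the preallocated version, assuming a single row counter replaces the outer loop, could look like this:

```r
learning_rate <- c(0.01, 0.02)
decay <- c(0, 1e-1)
combinations <- length(learning_rate) * length(decay)

# Preallocate one row per combination
df <- data.frame(Combination = double(combinations),
                 lr = double(combinations),
                 decay = double(combinations),
                 result = double(combinations))

i <- 0  # row counter, advanced once per (lr, dc) pair
for (lr in learning_rate) {
  for (dc in decay) {
    i <- i + 1
    df[i, ] <- list(i, lr, dc, 10 * lr + dc * 4)
  }
}
df
```

Preallocating avoids growing the data frame with rbind() on every iteration, which can matter when there are many combinations.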

Storing output for nested loop

I'm trying to do a nested loop for logistic regression.
I'm trying to run a loop for the discretization value and for each class.
Here's the code so far... I'm unable to get an output for each different iteration.
class <- c(1,2,3,4,5)
discretization_value <- seq(0.25, 0.75, by =0.05)
output<-data.frame(matrix(nrow=500, ncol=5))
names(output)=c("discretization_value", "class", "var1_coef", "var2_coef", "var3_coef")
for (i in discretization_value){
for (j in class) {
df$discretization_value <- ifelse(df$score >= i,1,0)
result <- (glm(discretization_value ~
var1 + var2 + var3,
data = df[df$class == j,], family= "binomial"))
output[i,1] <- i
output[i,2] <- j
output[i,3] <- coef(summary(result))[c("var1"),c("Estimate")]
output[i,4] <- coef(summary(result))[c("var2"),c("Estimate")]
output[i,5] <- coef(summary(result))[c("var3"),c("Estimate")]
}
}
A snippet of my df:
class score var1 var2 var3
1 0.3 0.18 0.33 356
1 0.5 0.22 0.55 33
1 0.6 0.77 0.44 35
2 0.9 0.99 0.55 2
3 0 0 0 0
3 0.4 0.5 0.11 5
4 0 0.6 0 7
4 0 0.6 0 9
4 0.6 0.2 0.1 6
Could this be the problem?
data = df[df$class == j,], family= "binomial"))
I would try removing the comma before the square bracket.
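Note that in df[df$class == j, ] the comma is what selects rows, so it probably needs to stay. Another possible cause is that output is indexed by i, which takes values such as 0.25 and is not a valid row position, so rows are not filled as intended. The sketch below uses a separate row counter; the simulated df and the shorter grids of classes and cutoffs are assumptions for illustration, not the asker's data.

```r
# Simulated data (assumption): 3 classes, 100 rows each
set.seed(1)
n <- 300
df <- data.frame(class = rep(1:3, each = 100),
                 score = runif(n),
                 var1 = rnorm(n), var2 = rnorm(n), var3 = rnorm(n))

classes <- sort(unique(df$class))
cutoffs <- seq(0.25, 0.75, by = 0.25)

output <- data.frame(matrix(NA_real_, nrow = length(cutoffs) * length(classes), ncol = 5))
names(output) <- c("discretization_value", "class", "var1_coef", "var2_coef", "var3_coef")

row <- 0  # integer row position, independent of the loop values
for (i in cutoffs) {
  for (j in classes) {
    row <- row + 1
    sub <- df[df$class == j, ]               # the comma selects rows
    sub$y <- as.integer(sub$score >= i)      # binary response at cutoff i
    fit <- glm(y ~ var1 + var2 + var3, data = sub, family = binomial)
    output[row, ] <- unname(c(i, j, coef(fit)[c("var1", "var2", "var3")]))
  }
}
output
```

Each (cutoff, class) pair gets its own row, so later iterations no longer overwrite earlier ones.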

Defining a function that includes for loops

I have 2 data frames.
One (df1) has columns for slopes and intercepts, and the other (df2) has an index column (i.e., row numbers).
I wish to apply a function based on parameters from df1 to the entire index column in df2. I don't want the function to mix and match slopes and intercepts (i.e., I want to make sure that the function always uses slopes and intercepts from the same columns in df1).
I tried to do this
my_function <- function(x) {for (i in df1$slope) for (j in df1$intercept) {((i*x)+j)}}
df3 <- for (k in df2$Index) {my_function(k)}
df3
but it didn't work.
Here are sample data:
> df1
thermocouple slope intercept
1 1 0.01 0.5
2 2 -0.01 0.4
3 3 0.03 0.2
> df2
index t_1 t_2 t_3
1 1 0.3 0.2 0.2
2 2 0.5 0.2 0.3
3 3 0.3 0.9 0.1
4 4 1.2 1.8 0.4
5 5 2.3 3.1 1.2
Here would be the output I need:
index baseline_t_1 baseline_t_2 baseline_t_3
1 0.51 0.39 0.23
2 0.52 0.38 0.26
3 0.53 0.37 0.29
4 0.54 0.36 0.32
5 0.55 0.35 0.35
What am I doing wrong?
Thanks!
Try this:
Pass the three arguments, element by element, to an anonymous function defined in Map.
Map( function(index, slope, intercept) (index * slope ) + intercept,
index = df2$Index, slope = df1$slope, intercept = df1$intercept)
Or maybe this. I am not sure which one you prefer, given that there is no data or expected output in the question.
lapply( df2$index, function(index){
unlist( Map( function(slope, intercept) (index * slope ) + intercept,
slope = df1$slope, intercept = df1$intercept) )
})
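Given the sample data and expected output in the question, another way to get the table directly (a sketch, not taken from the answers above) is to compute every index-slope product with outer() and then add each thermocouple's intercept to its own column. Only the index column of df2 is recreated here, since the t_* columns are not used.

```r
df1 <- data.frame(thermocouple = 1:3,
                  slope = c(0.01, -0.01, 0.03),
                  intercept = c(0.5, 0.4, 0.2))
df2 <- data.frame(index = 1:5)

# outer() gives index[i] * slope[j]; sweep() adds intercept[j] to column j,
# so each slope is always paired with the intercept from the same row of df1
baseline <- sweep(outer(df2$index, df1$slope), 2, df1$intercept, "+")
colnames(baseline) <- paste0("baseline_t_", df1$thermocouple)

df3 <- data.frame(index = df2$index, baseline)
df3
```

The first row works out to 0.51, 0.39 and 0.23, matching the expected output in the question.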

R: Filling Missing Values (NA) by Multiplying Two Separate Vectors

I'm having a brain-freeze.
This is what I have:
C <- c(C1, C2, C3) # A constant for every row in the data frame
r <- c(r1, r2, r3, r4) # A ratio for every column in the data frame
My data frame looks like this:
1 2 3 4
a 0.7 0.4 NA NA
b NA NA 0.3 NA
c NA 0.6 NA 0.4
I need to fill in the NA's with a multiplication of C and r so that it looks like this:
1 2 3 4
a 0.7 0.4 C1*r3 C1*r4
b C2*r1 C2*r2 0.3 C2*r4
c C3*r1 0.6 C3*r3 0.4
Notice that the multiplication is only done for the NA's and not for numbers that already exist. I know is.na is used to pick out the NA's, and it's probably just linear algebra, but my brain has quit for the day. Any help would be great.
Thanks.
If mm is your matrix, you can fill the missing values like this:
mm[is.na(mm)] <- outer(C,r)[is.na(mm)]
Example with data:
mm <- read.table(text=' 1 2 3 4
a 0.7 0.4 NA NA
b NA NA 0.3 NA
c NA 0.6 NA 0.4')
C <- c(1, 1, 1) # A constant for every row in the data frame
r <- c(2, 2, 2, 2)
mm[is.na(mm)] <- outer(C,r)[is.na(mm)]
# X1 X2 X3 X4
# a 0.7 0.4 2.0 2.0
# b 2.0 2.0 0.3 2.0
# c 2.0 0.6 2.0 0.4
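Because every entry of C and r is the same in the example above, it does not show that each NA receives the product for its own row and column. A small sketch with distinct values (chosen arbitrarily for illustration) makes that visible:

```r
mm <- matrix(c(0.7, NA,  NA,    # column 1
               0.4, NA,  0.6,   # column 2
               NA,  0.3, NA,    # column 3
               NA,  NA,  0.4),  # column 4
             nrow = 3,
             dimnames = list(c("a", "b", "c"), as.character(1:4)))

C <- c(1, 10, 100)   # one constant per row (illustrative values)
r <- c(2, 3, 5, 7)   # one ratio per column (illustrative values)

fill <- outer(C, r)                 # fill[i, j] == C[i] * r[j]
mm[is.na(mm)] <- fill[is.na(mm)]    # same logical mask on both sides
mm
```

Since the same logical mask indexes both mm and fill, each NA is replaced by the entry at the same position, and existing numbers are left alone.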

Tab Delimited to Square Matrix

I have a tab delimited file like
A B 0.5
A C 0.75
B D 0.2
And I want to convert it to a square matrix, like
A B C D
A 0 0.5 0.75 0
B 0 0 0.2
C 0 0
D 0
How can I go about it in R?
Thanks,
If you have the data in a data frame with the following column names:
Var1 Var2 value
you can use
xtabs(value ~ Var1 + Var2, data = df)
See the plyr package for some more general data reshaping functions also.
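One caveat: xtabs() only creates rows and columns for the levels it sees in each variable, so with this data the result would be 2 x 3 rather than square. A sketch that assumes both columns are first converted to factors with a shared set of levels:

```r
df <- data.frame(Var1 = c("A", "A", "B"),
                 Var2 = c("B", "C", "D"),
                 value = c(0.5, 0.75, 0.2),
                 stringsAsFactors = FALSE)

# Give both columns the same levels so rows and columns line up
lev <- sort(unique(c(df$Var1, df$Var2)))
df$Var1 <- factor(df$Var1, levels = lev)
df$Var2 <- factor(df$Var2, levels = lev)

# Unused levels are kept by default, which yields a square 4 x 4 table
m <- xtabs(value ~ Var1 + Var2, data = df)
m
```

Cells with no entry in the file come out as 0.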
Another approach (not as elegant as JoFrhwld's)
df <- read.table(text = "
Var1 Var2 value
A B 0.5
A C 0.75
B D 0.2
", header = TRUE)
# as.character() works whether the columns were read as factors or as strings
lev <- unique(c(as.character(df$Var1), as.character(df$Var2)))
A <- matrix(0, nrow = length(lev), ncol = length(lev), dimnames = list(lev, lev))
# fill the listed (row, column) cells using a two-column character index
A[cbind(as.character(df$Var1), as.character(df$Var2))] <- df$value
> A
A B C D
A 0 0.5 0.75 0.0
B 0 0.0 0.00 0.2
C 0 0.0 0.00 0.0
D 0 0.0 0.00 0.0
I'm guessing this is a weighted adjacency matrix for a graph. If so, you might be interested in the igraph package, to read the data as a weighted edge list.
