I have 2 data frames.
One (df1) has columns for slopes and intercepts, and the other (df2) has an index column (i.e., row numbers).
I wish to apply a function based on parameters from df1 to the entire index column in df2. I don't want the function to mix and match slopes and intercepts (i.e., I want to make sure that the function always uses the slope and intercept from the same row of df1).
I tried to do this
my_function <- function(x) {for (i in df1$slope) for (j in df1$intercept) {((i*x)+j)}}
df3 <- for (k in df2$Index) {my_function(k)}
df3
but it didn't work.
Here are sample data:
> df1
thermocouple slope intercept
1 1 0.01 0.5
2 2 -0.01 0.4
3 3 0.03 0.2
> df2
index t_1 t_2 t_3
1 1 0.3 0.2 0.2
2 2 0.5 0.2 0.3
3 3 0.3 0.9 0.1
4 4 1.2 1.8 0.4
5 5 2.3 3.1 1.2
Here would be the output I need:
index baseline_t_1 baseline_t_2 baseline_t_3
1 0.51 0.39 0.23
2 0.52 0.38 0.26
3 0.53 0.37 0.29
4 0.54 0.36 0.32
5 0.55 0.35 0.35
What am I doing wrong?
Thanks!
Try this:
Pass the three vectors together to an anonymous function defined in Map, so that the arguments are consumed in parallel:
Map( function(index, slope, intercept) (index * slope ) + intercept,
index = df2$Index, slope = df1$slope, intercept = df1$intercept)
Or maybe this; I am not sure which one you prefer, given that there is no data or expected output in the question.
lapply( df2$index, function(index){
unlist( Map( function(slope, intercept) (index * slope ) + intercept,
slope = df1$slope, intercept = df1$intercept) )
})
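For the sample data in the question, a minimal sketch that reproduces the expected table could look like the following. It assumes the column is named index (lowercase, as in the sample data) and builds the baseline_t_* column names from df1$thermocouple; df3 is only an illustrative name. Each slope is paired with the intercept from the same row of df1, and the whole index vector is transformed at once.

```r
# Sample data from the question
df1 <- data.frame(thermocouple = 1:3,
                  slope = c(0.01, -0.01, 0.03),
                  intercept = c(0.5, 0.4, 0.2))
df2 <- data.frame(index = 1:5)

# mapply() walks slope and intercept in parallel (row 1 with row 1, etc.),
# and each call transforms the full index vector, giving a 5 x 3 matrix
baseline <- mapply(function(slope, intercept) slope * df2$index + intercept,
                   df1$slope, df1$intercept)
colnames(baseline) <- paste0("baseline_t_", df1$thermocouple)

df3 <- data.frame(index = df2$index, baseline)
df3
```

Because the function vectorises over the index, no explicit loop over df2$index is needed.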
Related
I am trying to implement a for loop in R to fill a df with combinations of learning rates and decays used in machine learning. The idea is to try several learning rates and decays, calculate error metrics for each combination, and save them in a dataset, so that I can identify which combination is best.
Below is the code and my result. I don't understand why I get this result.
learning_rate = c(0.01, 0.02)
decay = c(0, 1e-1)
combinations = length(learning_rate) * length(decay)
df <- data.frame(Combination=character(combinations),
lr=double(combinations),
decay=double(combinations),
result=character(combinations),
stringsAsFactors=FALSE)
for (i in 1:combinations) {
for (lr in learning_rate) {
for (dc in decay) {
df[i, 1] = i
df[i, 2] = lr
df[i, 3] = dc
df[i, 4] = 10*lr + dc*4 # Here I'd do some machine learning; this simple equation is just a placeholder
}
}
}
This is the result I get. It seems that only the combination loop worked. What did I do wrong?
Combination lr decay result
1 0.02 0.1 0.6
2 0.02 0.1 0.6
3 0.02 0.1 0.6
4 0.02 0.1 0.6
I expected this result
Combination lr decay result
1 0.01 0 0.1
2 0.01 1e-1 0.5
3 0.02 0 0.2
4 0.02 1e-1 0.6
Tuning with for-loop:
df <- data.frame()
for (lr in learning_rate) {
for (dc in decay) {
df <- rbind(df, data.frame(
lr = lr,
decay = dc,
result = 10*lr + dc*4
))
}
}
df
# lr decay result
# 1 0.01 0.0 0.1
# 2 0.01 0.1 0.5
# 3 0.02 0.0 0.2
# 4 0.02 0.1 0.6
Tuning with mapply():
df <- expand.grid(lr = learning_rate, decay = decay)
ML.fun <- function(lr, dc) 10*lr + dc*4
df$result <- mapply(ML.fun, lr = df$lr, dc = df$decay)
df
# lr decay result
# 1 0.01 0.0 0.1
# 2 0.02 0.0 0.2
# 3 0.01 0.1 0.5
# 4 0.02 0.1 0.6
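As for why the original loop misbehaves: for every value of i, the two inner loops run to completion and keep overwriting row i, so each row ends up holding the last combination (0.02, 0.1). A minimal sketch that keeps the preallocated data frame but advances a row counter inside the innermost loop is shown below; the counter name row is an assumption made for illustration.

```r
learning_rate <- c(0.01, 0.02)
decay <- c(0, 1e-1)
combinations <- length(learning_rate) * length(decay)

# Preallocate one row per combination
df <- data.frame(Combination = integer(combinations),
                 lr = double(combinations),
                 decay = double(combinations),
                 result = double(combinations))

row <- 0L  # hypothetical counter: one step per (lr, dc) pair
for (lr in learning_rate) {
  for (dc in decay) {
    row <- row + 1L
    df$Combination[row] <- row
    df$lr[row] <- lr
    df$decay[row] <- dc
    df$result[row] <- 10 * lr + dc * 4  # placeholder for the real model
  }
}
df
```

This fills the rows in the order given in the question's expected result, with lr varying slowest.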
I'm trying to do a nested loop for logistic regression.
I'm trying to run a loop for the discretization value and for each class.
Here's the code so far... I'm unable to get an output for each different iteration.
class <- c(1,2,3,4,5)
discretization_value <- seq(0.25, 0.75, by =0.05)
output<-data.frame(matrix(nrow=500, ncol=5))
names(output)=c("discretization_value", "class", "var1_coef", "var2_coef", "var3_coef")
for (i in discretization_value){
for (j in class) {
df$discretization_value <- ifelse(df$score >= i,1,0)
result <- (glm(discretization_value ~
var1 + var2 + var3,
data = df[df$class == j,], family= "binomial"))
output[i,1] <- i
output[i,2] <- j
output[i,3] <- coef(summary(result))[c("var1"),c("Estimate")]
output[i,4] <- coef(summary(result))[c("var2"),c("Estimate")]
output[i,5] <- coef(summary(result))[c("var3"),c("Estimate")]
}
}
a snippet of my df
class score var1 var2 var3
1 0.3 0.18 0.33 356
1 0.5 0.22 0.55 33
1 0.6 0.77 0.44 35
2 0.9 0.99 0.55 2
3 0 0 0 0
3 0.4 0.5 0.11 5
4 0 0.6 0 7
4 0 0.6 0 9
4 0.6 0.2 0.1 6
Could this be the problem?
data = df[df$class == j,], family= "binomial"))
I would try removing the comma before the closing square bracket.
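Another possibility worth checking: in df[df$class == j, ] the comma is what selects rows, so it looks necessary, while output[i, 1] uses i as a row index even though i takes values such as 0.25 and every class j writes to the same row. A minimal sketch that writes one row per (threshold, class) pair follows. The data are simulated purely for illustration (the real df is not available), and the names grid, sub, fit and y are assumptions.

```r
set.seed(1)
n <- 500
# Simulated stand-in for the real data
df <- data.frame(class = sample(1:5, n, replace = TRUE),
                 score = runif(n),
                 var1 = rnorm(n), var2 = rnorm(n), var3 = rnorm(n))

# One row per (class, threshold) pair
grid <- expand.grid(class = 1:5,
                    discretization_value = seq(0.25, 0.75, by = 0.05))
output <- data.frame(grid, var1_coef = NA_real_,
                     var2_coef = NA_real_, var3_coef = NA_real_)

for (k in seq_len(nrow(grid))) {
  sub <- df[df$class == grid$class[k], ]   # row subset: the comma is required
  sub$y <- as.integer(sub$score >= grid$discretization_value[k])
  fit <- glm(y ~ var1 + var2 + var3, data = sub, family = binomial)
  # Store the three slope estimates in row k (an integer index)
  output[k, c("var1_coef", "var2_coef", "var3_coef")] <-
    unname(coef(fit)[c("var1", "var2", "var3")])
}
head(output)
```

Indexing by the integer k rather than by the threshold value gives each iteration its own row.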
The following is my data
y r1 r2 r3
1 0.1 0.2 -0.3
2 0.7 -0.9 0.03
3 -0.93 -0.32 -0.22
1. The first question is: how can I get output like this?
y r1 r2 r3 dummy_r1 dummy_r2 dummy_r3
1 0.1 0.2 -0.3 0 0 1
2 0.7 -0.9 0.03 0 1 0
3 -0.93 -0.32 -0.22 1 1 1
Note: I want negative values to be coded as 1 and positive values as 0.
2. The second question: if I want to run a regression like lm(y~r1+r2+r3+dummy_r1+dummy_r2+dummy_r3), what should I do if I don't want to create the dummy columns (dummy_r1, dummy_r2, dummy_r3) above first, because that is inconvenient?
Using DF, shown reproducibly in the Note at the end, define DF2 to also have the sign.* columns and then run the regression on that. Of course, the data shown in the question are not enough to estimate coefficients for so many predictors, but if your real problem has more data it should be fine.
DF2 <- cbind(DF, sign = +(DF[-1] < 0))
lm(y ~., DF2)
giving:
Call:
lm(formula = y ~ ., data = DF2)
Coefficients:
(Intercept) r1 r2 r3 sign.r1
1.425 -1.163 -1.543 NA NA
sign.r2 sign.r3
NA NA
Note
Lines <- "y r1 r2 r3
1 0.1 0.2 -0.3
2 0.7 -0.9 0.03
3 -0.93 -0.32 -0.22"
DF <- read.table(text = Lines, header = TRUE)
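For the second question, one option that avoids creating the dummy columns is to compute them inside the formula with I(). This is a minimal sketch on the same three-row DF, so most coefficients are again NA; the object name fit is an assumption.

```r
Lines <- "y r1 r2 r3
1 0.1 0.2 -0.3
2 0.7 -0.9 0.03
3 -0.93 -0.32 -0.22"
DF <- read.table(text = Lines, header = TRUE)

# I(r1 < 0) is evaluated when the model frame is built, so no extra
# columns have to be added to DF beforehand
fit <- lm(y ~ r1 + r2 + r3 + I(r1 < 0) + I(r2 < 0) + I(r3 < 0), data = DF)
coef(fit)
```

The logical terms are treated as two-level factors, so their coefficients are labelled with a TRUE suffix.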
I am trying to subset a data frame so that it only includes those columns whose value in the first row is among the n largest in that row.
For example, let's assume I only want to include the columns whose row 1 value is one of the 2 largest.
dat1 = data.frame(a = c(0.1,0.2,0.3,0.4,0.5), b = c(0.6,0.7,0.8,0.9,0.10), c = c(0.12,0.13,0.14,0.15,0.16), d = c(NA, NA, NA, NA, 0.5))
a b c d
1 0.1 0.6 0.12 NA
2 0.2 0.7 0.13 NA
3 0.3 0.8 0.14 NA
4 0.4 0.9 0.15 NA
5 0.5 0.1 0.16 0.5
such that a and d are removed, because 0.1 and NA are not among the 2 largest values in
row 1. Here 0.6 and 0.12 (columns b and c) are larger than 0.1 and NA (columns a and d).
b c
1 0.6 0.12
2 0.7 0.13
3 0.8 0.14
4 0.9 0.15
5 0.1 0.16
Is there a simple way to subset this? I do not want to order it, because that will create problems with other data frames I have that are related.
Complementing pieca's answer, you can encapsulate that into a function.
Also, this way, the returned data.frame keeps its original column order.
get_nth <- function(df, n) {
df[] <- lapply(df, as.numeric) # edit
cols <- names(sort(df[1, ], na.last = NA, decreasing = TRUE))
cols <- cols[seq(n)]
df <- df[names(df) %in% cols]
return(df)
}
Hope this works for you.
Sort the first row of your data.frame, and then subset by names:
cols <- names(sort(dat1[1,], na.last = NA, decreasing = TRUE))
> dat1[,cols[1:2]]
b c
1 0.6 0.12
2 0.7 0.13
3 0.8 0.14
4 0.9 0.15
5 0.1 0.16
You can rank the first row in descending order and keep the columns ranked in the top n:
> r <- rank(-dat1[1,], na.last=T)
> r <- r <= 2
> dat1[,r]
b c
1 0.6 0.12
2 0.7 0.13
3 0.8 0.14
4 0.9 0.15
5 0.1 0.16
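A variant of the ranking approach that first converts the first row to a plain named vector with unlist() is sketched below, on the assumption that all columns are numeric. The function name top_n_cols is hypothetical.

```r
dat1 <- data.frame(a = c(0.1, 0.2, 0.3, 0.4, 0.5),
                   b = c(0.6, 0.7, 0.8, 0.9, 0.10),
                   c = c(0.12, 0.13, 0.14, 0.15, 0.16),
                   d = c(NA, NA, NA, NA, 0.5))

top_n_cols <- function(df, n) {
  first <- unlist(df[1, ])                      # named numeric vector
  keep <- rank(-first, na.last = "keep") <= n   # NA stays NA
  df[, which(keep), drop = FALSE]               # which() drops the NA entries
}

res <- top_n_cols(dat1, 2)
res
```

Since the columns are selected by position rather than by sorting, their original order is preserved.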
I need to do some calculation as per the below formula:
B1 = A1 + (1-A1) * B1
example:
B1 = 0.2 + (1 - 0.2) * 0.4
= 0.52
C1 = 0.52 + (1 - 0.52) * 0.8
= 0.904
D1 = 0.904 + (1 - 0.904) * 0.5
= 0.952
The same logic applies to the other rows and columns; there are 11 columns in total.
dataframe:
df
A B C D
0.2 0.4 0.8 0.5
0.4 0.5 0.6 0.2
0.8 0.1 0.5 0.4
0.3 0.4 0.1 0.8
Expected output:
A B C D
0.2 0.52 0.904 0.952
0.4 0.7 0.88 0.904
0.8 0.82 0.91 0.946
0.3 0.58 0.622 0.9244
I tried it for one step with the code below:
Df <- df[-ncol(df)] + ( 1 - df[-ncol(df)]) * df[-1]
I was able to get column B as in the expected output, but it does not work for the rest of the columns.
Please help, thanks. BM.
You can do this recursively as follows:
do.call(cbind, Reduce(f = function(A1, B1) A1+(1-A1)*B1,
x = df,
accumulate = TRUE))
Explanation:
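Since df is a data.frame, which is a list of vectors, Reduce takes each column in turn and applies your function to the running result and the next column. With accumulate = TRUE it keeps every intermediate result, and do.call(cbind, ...) then combines them into a matrix, which can be wrapped in as.data.frame() if a data.frame is needed.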
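Run on the sample data from the question, this approach can be sketched as follows. The column names are restored by hand because the accumulated list does not carry them; acc and out are illustrative names.

```r
df <- data.frame(A = c(0.2, 0.4, 0.8, 0.3),
                 B = c(0.4, 0.5, 0.1, 0.4),
                 C = c(0.8, 0.6, 0.5, 0.1),
                 D = c(0.5, 0.2, 0.4, 0.8))

# prev is the running result (starting with column A), cur is the next column
acc <- Reduce(function(prev, cur) prev + (1 - prev) * cur,
              df, accumulate = TRUE)

out <- as.data.frame(do.call(cbind, acc))
names(out) <- names(df)
out
```

The same call works unchanged for 11 columns, since Reduce simply walks the columns from left to right.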