I am printing a table to PDF with much success: a standard hierarchical regression with three steps.
However, my question is twofold: 1) How do I add the asterisk to mark sig p values on the covariates, and 2) how do I remove rows like AIC etc.?
At this point, I am just opening the PDF in Word to edit the table, but I thought someone might have a solution.
H_regression <- apa_print(list(Step1 = model1,
Step2 = model2,
Step3 = model3),
boot_samples = 0)
I'll use the example from the papaja documentation.
mod1 <- lm(Sepal.Length ~ Sepal.Width, data = iris)
mod2 <- update(mod1, formula = . ~ . + Petal.Length)
mod3 <- update(mod2, formula = . ~ . + Petal.Width)
moi <- list(Baseline = mod1, Length = mod2, Both = mod3)
h_reg <- apa_print(moi, boot_samples = 0)
h_reg_table <- h_reg$table
2) how do I remove rows like AIC
The table returned by apa_print() is a data.frame with some additional information. Hence, you can index and subset it as you would any other table. You can select rows by name (see below) or by row number.
# Remove rows
rows_to_remove <- c("$\\mathrm{AIC}$", "$\\mathrm{BIC}$")
h_reg_table <- h_reg_table[!rownames(h_reg_table) %in% rows_to_remove, ]
1) How do I add the asterisk to mark sig p values on the covariates
There is currently no way to highlight significant predictors (I'm not a fan of this practice). But here's some code that will let you add highlighting after the fact. The following function takes the formatted table, the list of the compared models and a character symbol to highlight significant predictors as input.
# Define custom function
highlight_sig_predictors <- function(x, models, symbol) {
n_coefs <- sapply(models, function(y) length(coef(y)))
for(i in seq_along(models)) {
sig_stars <- rep(FALSE, max(n_coefs))
sig_stars[1:n_coefs[i]] <- apply(confint(models[[i]]), 1, function(y) all(y > 0) || all(y < 0))
x[1:max(n_coefs), i] <- paste0(x[1:max(n_coefs), i], ifelse(sig_stars, symbol, paste0("\\phantom{", symbol, "}")))
}
x
}
Now this function can be used to customize the table returned by apa_print().
# Add significance symbols to predictors
h_reg_table <- highlight_sig_predictors(h_reg_table, moi, symbol = "*")
# Print table
apa_table(h_reg_table, escape = FALSE, align = c("lrrr"))
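The helper above flags a predictor when its 95% confidence interval excludes zero. For an lm fit this should be the same decision as p < .05 in the coefficient table, so the flags can also be read from summary(). A minimal sketch, assuming the third iris model from above (refit here so the block runs on its own); sig_by_p and sig_by_ci are illustrative names, not part of papaja:

```r
# Refit the third model from the example so this block is self-contained
mod3 <- lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris)

# Flag coefficients by p value, taken from the coefficient table
p_values <- summary(mod3)$coefficients[, "Pr(>|t|)"]
sig_by_p <- p_values < 0.05

# Flag coefficients by whether the 95% CI excludes zero (what the helper does)
ci <- confint(mod3)
sig_by_ci <- ci[, 1] > 0 | ci[, 2] < 0

# Side-by-side comparison of the two rules (intercept included)
data.frame(p_value = p_values, sig_by_p, sig_by_ci)
```

If you need a different threshold, the p value version is easier to adapt; with confint() you would change the level argument instead.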
Related
I would like to create a graphing function in R which takes a formula as an argument, e.g.:
my.plot(sqrt(Sepal.Width) ~ Petal.Width + log(Petal.Length) + Species + Petal.Width * Petal.Length, .data = iris)
And then:
1. Perform a model fit with the first predictor term exchanged for another vector created within the function.
2. Use the outcome term and the first predictor term for an overlying plot.
3. Allow interaction and crossing terms, and use of the . symbol denoting all other variables in the data frame.
4. Handle the case where only 1 predictor term is provided - e.g. Sepal.Width ~ Petal.Width.
R pseudocode for a highly simplified example:
library("formula.tools")
my.plot <- function(.formula, .data) {
outcome.term <- lhs(.formula)
first.predictor.term <- rhs(.formula)[1]
new.formula <- outcome.term ~ 1:nrow(.data) + rhs(.formula)[-1]
my.fit <- lm(new.formula, data = .data)
my.predict <- predict(my.fit)
plot(first.predictor.term, outcome.term, data = .data)
lines(first.predictor.term, my.predict, data = .data)
}
You could accomplish the same using Base R:
my.plot <- function(.formula, .data) {
outcome.term <- deparse(.formula[[2]])
first.predictor.term <- .formula[[3]]
len <- length(first.predictor.term) > 1
if (len) first.predictor.term <- .formula[[3]][[2]]
if (len) .formula[[3]][[2]] <- substitute(new_variable)
else .formula[[3]] <- substitute(new_variable)
.data['new_variable'] <- 1:nrow(.data)
my.fit <- lm(.formula, data = .data)
my.predict <- predict(my.fit)
f <- reformulate(deparse(first.predictor.term), outcome.term)
plot(f, data = .data, ty = "p")
}
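One caveat with the base R function above: .formula[[3]][[2]] is the first predictor only when the right-hand side is a sum of exactly two terms. For a + b + c that element is the call a + b, and for a single term like log(a) it is a. A possible alternative is to let terms() split the formula into term labels, which also expands . and *. This is only a sketch; my_plot_terms is a made-up name, and it returns the fit invisibly instead of drawing the fitted line:

```r
my_plot_terms <- function(.formula, .data) {
  # terms() expands `.`, `*` and similar shorthand into individual term labels
  term_labels <- attr(terms(.formula, data = .data), "term.labels")
  outcome <- .formula[[2]]   # left-hand side, kept as a symbol or call
  first_term <- term_labels[1]

  # Exchange the first term for a vector created inside the function
  .data$new_variable <- seq_len(nrow(.data))
  fit_formula <- reformulate(c("new_variable", term_labels[-1]), response = outcome)
  fit <- lm(fit_formula, data = .data)

  # Plot the outcome against the original first predictor term
  plot(reformulate(first_term, response = outcome), data = .data)
  invisible(fit)
}
```

A call such as my_plot_terms(sqrt(Sepal.Width) ~ Petal.Width + log(Petal.Length) + Species, iris) should then fit the model with new_variable in place of Petal.Width. Note that terms() puts main effects before interactions, so the "first" term is the first main effect, and an interaction that involves the first term is left in the model as written.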
I have a list of model summaries (let's say it is a linear model; but this could apply to any model summary).
Currently, I am extracting a certain coefficient from this list of summaries using the following:
coef <- sapply(modelsummaries, function(x) x[[4]][[4,1]])
How could I do this by calling the variable name instead of relying on row position?
For each of the model summaries within the list, only one variable differs, which is named as V_01, V_02, V_03 etc. This is the variable coefficient I would like to extract.
I was thinking of using the grep function, something like:
coef <- sapply(modelsummaries, function(x) x[[4]][[grep("^[V]"),1]])
...but haven't got it working. Any suggestions?
Here's a reproducible example (only the last line needs to be tweaked):
newdata <- as.data.frame(seq(from = 0.1, to = 0.9, by = 0.1))
newdata <- as.data.frame(t(newdata))
colnames(newdata) = newdata[1, ]
colnames(newdata) <- paste("V", colnames(newdata), sep = "_")
mtcars <- mtcars
mtcarsmodel <- data.frame(mtcars, newdata)
mtcarsmodel[c(12:20)] <- sample(1:100, 288, replace=TRUE)
xnam <- paste(colnames(mtcarsmodel)[c(4:5)], sep="")
xnam2 <- paste(colnames(mtcarsmodel)[c(12:20)], sep="")
fmla <- paste(xnam, collapse= "+")
fmla2 <- paste(paste(fmla), "+")
fla <- paste("mpg ~", paste(fmla2))
models <- lapply(setNames(xnam2, xnam2), function(var) {
form = paste(fla, var)
lm(form, data=mtcarsmodel)
})
modelsummaries <-lapply(models, summary)
coef <- sapply(modelsummaries, function(x) x[[4]][[4,1]])
You were quite close; you just needed to tell grep what to search, namely the row names of the coefficient matrix returned by coef() (which is a better way to get it than [[4]]). Also, to avoid masking the coef() function, I suggest saving the result under a different name, like coefs.
coefs <- sapply(modelsummaries, function(x) {
coef(x)[grep("^V", rownames(coef(x))),"Estimate"]
})
V_0.1 V_0.2 V_0.3 V_0.4 V_0.5 V_0.6 V_0.7 V_0.8
0.030927774 -0.053437459 0.009335911 -0.011009187 -0.010303494 -0.001705420 -0.036297492 0.021838044
V_0.9
0.005457086
Also, check out the new broom package which can make it easier to extract certain information from models in a tidy way.
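As a rough sketch of the broom route: broom::tidy() turns a fitted lm into a data frame with one row per coefficient and a term column, so the V_ coefficient can be picked out by name. This assumes broom is installed and works on the fitted models rather than on their summaries. The data and the two models below are hypothetical stand-ins for the models list in the question:

```r
# Stand-in data with two hypothetical V_ columns, similar to the question's setup
set.seed(42)
d <- data.frame(mtcars, V_01 = rnorm(nrow(mtcars)), V_02 = rnorm(nrow(mtcars)))
v_names <- c("V_01", "V_02")

# One model per V_ column, each with the same two fixed predictors
models <- lapply(setNames(v_names, v_names), function(v) {
  lm(reformulate(c("hp", "drat", v), response = "mpg"), data = d)
})

if (requireNamespace("broom", quietly = TRUE)) {
  # tidy() gives one row per coefficient; keep the term whose name starts with "V_"
  v_coefs <- do.call(rbind, lapply(models, function(m) {
    td <- as.data.frame(broom::tidy(m))
    td[grepl("^V_", td$term), c("term", "estimate", "p.value")]
  }))
  print(v_coefs)
}
```

Because the result is an ordinary data frame, the estimate and its p value come out together without indexing by row position.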
After struggling with a grep solution, I committed blasphemy and used an SQL solution instead:
library('sqldf')
library('data.table')  # provides setDT()
new <- lapply(modelsummaries, function(x) setDT(data.frame(x[[4]]), keep.rownames = TRUE)[])
values <- sapply(new, function(x) sqldf("SELECT x.estimate, x.'Pr...t..' FROM x WHERE rn like '%V_%'"))
data <- as.data.frame(t(rbind(values)))
I've also come up with a (somewhat ugly) grep based solution:
coef <- sapply(modelsummaries, function(x) as.numeric(unlist(strsplit(grep("^V_", capture.output(x), value = TRUE), "\\s+"))[[2]]))
In my dataset I have 6 variables (x1, x2, x3, x4, x5, x6). I wish to create a function that lets me input one variable and fits the formula with the rest of the variables in the data set.
For instance,
fitRegression <- function(data, dependentVariable) {
fit = lm(formula = x1 ~., data = data1)
return(fit)
}
fitRegression(x2)
However, this function only returns results for x1. My desired result is to input any variable and have the function automatically build the formula with the rest of the variables.
For Example:
fitRegression(x2)
should remove x2 from the predictor list, so that x2 is regressed on x1, x3, x4, x5, x6.
And:
fitRegression(x3)
should remove x3 from the predictor list, so that x3 is regressed on x1, x2, x4, x5, x6.
Is there any way to express this in my function, or is there a better function altogether?
You can do it like this:
# sample data
sampleData <- data.frame(matrix(rnorm(500),100,5))
colnames(sampleData) <- c("A","B","C","D","E")
# function
fitRegression <- function(mydata, dependentVariable) {
# select your independent and dependent variables
dependentVariableIndex<-which(colnames(mydata)==dependentVariable)
independentVariableIndices<-which(colnames(mydata)!=dependentVariable)
fit = lm(formula = as.formula(paste(colnames(mydata)[dependentVariableIndex], "~", paste(colnames(mydata)[independentVariableIndices], collapse = "+"), sep = "" )), data = mydata)
return(fit)
}
# ground truth
lm(formula = A~B+C+D+E, data = sampleData)
# reconcile results
fitRegression(sampleData, "A")
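The same formula can also be built with reformulate(), which avoids pasting the pieces together by hand. A minimal sketch under the same assumption as the function above (the dependent variable is passed as a quoted column name); fit_regression2 is just an illustrative name:

```r
fit_regression2 <- function(mydata, dependentVariable) {
  # Fail early if the requested column does not exist
  stopifnot(dependentVariable %in% colnames(mydata))
  # Every other column becomes a predictor
  predictors <- setdiff(colnames(mydata), dependentVariable)
  lm(reformulate(predictors, response = dependentVariable), data = mydata)
}

# Same kind of sample data as above
set.seed(1)
sampleData <- data.frame(matrix(rnorm(500), 100, 5))
colnames(sampleData) <- c("A", "B", "C", "D", "E")

# Regress B on A, C, D and E
fit_b <- fit_regression2(sampleData, "B")
coef(fit_b)
```

The early check is optional, but it turns a misspelled column name into an immediate error instead of a confusing one from lm().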
You want to select the Y variable in your argument. The main difficulty is to pass this argument without any quotes in your function (it is apparently the expected result in your code). Therefore you can use this method, using the combination deparse(substitute(...)):
fitRegression <- function(data, dependentVariable) {
formula <- as.formula(paste0(deparse(substitute(dependentVariable)), "~."))
return(lm(formula, data) )
}
fitRegression(mtcars, disp)
That will return the model.
The code below uses "purrr" and "caret" to produce a list of models, one for each column of the data frame used as the outcome.
df <-mtcars
library(purrr);library(caret)
#create training set
vect <- createDataPartition(1:nrow(df), p=0.8, list = FALSE)
#build model list
ModList <- 1:length(df) %>%
map(function(col) train(y= df[vect,col], x= df[vect,-col], method="lm"))
I'm using lm in R to do simple multilinear regression. Here's an example model:
m <- lm(formula = t ~ a + b + 0, data = df1)
where t, a and b are columns in df1. This model calculates 2 coefficients, let's call them a.coef and b.coef. If I then use this model to predict some other data, say in df2, I can get the predicted values like so:
predict(m, df2)
if I have the columns a and b in df2 as well. It essentially returns
df2$a * a.coef + df2$b * b.coef
What I'd like, however, are the columns df2$a * a.coef and df2$b * b.coef. R sums them and gives me the answer, but I'd like to see how the scaling affects these values.
Is there a convenient way to do this in R (esp in lm or predict.lm), or will I have to manually code this myself? I played with the terms argument in predict.lm, but I couldn't get anywhere.
Thanks for the help!
EDIT
I wrote this function:
scaled.fn <- function(dt, x, y, i) {
# dt is data.table
# x is dependent column (col name as str)
# y are predictor columns (col names as vector of str)
# i is name of column to multiply, as str
dep = paste(y, collapse = " + ")
my.formula = paste(x, " ~ ", dep, sep = "")
m = lm(formula = my.formula, data = dt)
# column names in dt are named in y
return(dt[, get(i) * coef(m)[i]])
}
Try this:
sweep(df2, MARGIN = 2, coef(m), '*')
EDIT: more specific solution:
sweep(df2[,c("a","b")], MARGIN = 2, coef(m), '*')
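sweep() multiplies column by column in the order of coef(m), so it is a bit safer to select the columns of df2 by coefficient name than to rely on their position. A small sketch with made-up data (df1 and df2 here are stand-ins for yours) that also checks the row sums against predict():

```r
# Hypothetical data with the same column names as in the question
set.seed(1)
df1 <- data.frame(a = rnorm(30), b = rnorm(30))
df1$t <- 2 * df1$a - df1$b + rnorm(30, sd = 0.1)
df2 <- data.frame(a = rnorm(5), b = rnorm(5), other = 1:5)

m <- lm(t ~ a + b + 0, data = df1)

# Pick the columns by coefficient name so their order always matches coef(m)
contrib <- sweep(as.matrix(df2[, names(coef(m))]), MARGIN = 2,
                 STATS = coef(m), FUN = "*")
contrib

# For this no-intercept model the row sums should reproduce predict()
all.equal(unname(rowSums(contrib)), unname(predict(m, df2)))
```

Selecting by names(coef(m)) only works this directly when every coefficient corresponds to a plain column; factors or transformed terms would need model.matrix() instead.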
I'm new to loops and I have a problem referencing a variable from the i-th data frame.
I'm able to reference each data frame correctly, but problems arise when I try to reference a specific variable inside each data frame:
Example:
for (i in 1:15) {
assign(
paste("model", i, sep = ""),
(lm(response ~ variable, data = eval(parse(text = paste("data", i, sep = "")))))
)
plot(data[i]$response, predict.lm(eval(parse(text = paste("model", i, sep = ""))))) #plot obs vs preds
}
Here I'm doing a simple one variable linear model 15 times, which works just fine. Problems come when I try to plot the results. How should I call data[i] response?
Let's say there are multiple data frames named data1 ... data15 and no other objects whose names begin with "data". Let's also assume that each of those data frames has columns named 'response' and 'variable'. Then this would gather the data frames into a list and draw a separate plot with the linear regression line for each.
dlist <- lapply(ls(pattern = '^data'), get)
lapply(dlist, function(df) {
  # empty plot with the right axis ranges, then the fitted line
  plot(NA, xlim = range(df$variable), ylim = range(df$response))
  abline(coef(lm(response ~ variable, data = df)))
})
If you want to name the data frames in that list, reuse the names that ls() returned, since they are in the same order as the list (ls() sorts alphabetically, so data10 comes before data2):
names(dlist) <- ls(pattern = '^data')
There are many other assignments you could make in the context of this loop, but you would need to describe the desired results better than with failed efforts.
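For the original goal (plotting observed against predicted values for each data frame), it may be easier to keep both the data and the models in named lists instead of using assign() and eval(parse()). A minimal sketch with three made-up data frames standing in for data1 ... data15; make_df is a hypothetical helper that only generates example data:

```r
# Hypothetical stand-ins for data1, data2, data3
set.seed(1)
make_df <- function() {
  x <- rnorm(20)
  data.frame(variable = x, response = x + rnorm(20))
}
data1 <- make_df()
data2 <- make_df()
data3 <- make_df()

# Build the names explicitly so the list keeps numeric order
dnames <- paste0("data", 1:3)
dlist <- mget(dnames, envir = environment())

# One model per data frame, stored under the same name
models <- lapply(dlist, function(df) lm(response ~ variable, data = df))

pdf(NULL)  # null device so the sketch runs non-interactively; omit in an interactive session
for (nm in names(dlist)) {
  plot(dlist[[nm]]$response, predict(models[[nm]]),
       xlab = "observed", ylab = "predicted", main = nm)
}
invisible(dev.off())
```

Here dlist[[nm]]$response is the piece the question was missing: double brackets pull the data frame out of the list, and $ then selects the column.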
Here's modified code that should work. It fits a one-variable lm model for each data set, calculates the correlation between predicted and observed values, stores it in a results matrix, and plots these values.
Thanks Thomas for help.
par(mfrow=c(4,5))
# one row per data set
results.matrix <- matrix(NA, nrow = length(datalist), ncol = 2)
colnames(results.matrix) <- c("Subset","Correlation")
for (i in seq_along(datalist)) {
  model <- lm(response ~ variable, data = datalist[[i]])
  pred <- predict.lm(model)
  # named ct rather than cor to avoid masking stats::cor()
  ct <- cor.test(pred, datalist[[i]]$response)
  plot(pred, datalist[[i]]$response, xlab="pred", ylab="obs")
  results.matrix[i, 1] <- i
  results.matrix[i, 2] <- ct$estimate
}