How to apply a chunk of code (not only a single function) to all columns in dataset - r

I would like to apply this chunk of code to each column in a dataset. I can run all columns individually, but it is tedious to make repeated code for 75 different columns and change all of the names in the code to match each column name. Is there a way that I can run all columns individually at once without making code for each column individually?
max.Width =lmer(mergeCowpeaTEST$max.Width ~ (1|Genotype) + (1|Year) + (1|Genotype:Year) + (1|Rep:Year), data=mergeCowpeaTEST,na.action = na.omit)
model.a_max.Width <-lmer(max.Width~ (1|Genotype) + (1|Year) + (1|Genotype:Year) + (1|Rep:Year), data=mergeCowpeaTEST)
alt.est.a_max.Width <- influence(model.a_max.Width, obs=TRUE)
cooks<-cooks.distance(alt.est.a_max.Width)
plot(alt.est.a_max.Width, which="cook", sort=FALSE,main="cook's distance plot of max.Width")
which(residuals(max.Width)>0.10)
which(residuals(max.Width)<(-0.10))
boxplot(residuals(max.Width))
myboxplot<-boxplot(residuals(max.Width))
myboxplot$out
hist(residuals(max.Width))
qqnorm(residuals(max.Width))
pdf("Widiv_max.Width_residual_graphs.pdf",height=8,width=10)
plot(fitted(max.Width),residuals(max.Width), xlab="Predicted values", ylab="Residuals", main="Residual Plot of widiv max.Width")
abline(h=0, col="red")
hist(resid(max.Width),main="histogram of max.Width residuals")
qqnorm(residuals(max.Width), main="Residuals Q-Q Plot");qqline(resid(max.Width))
qqnorm(ranef(max.Width)$Genotype$"(Intercept)", main="Genotypes Q-Q Plot"); qqline(ranef(max.Width)$Genotype$"(Intercept)")
qqnorm(ranef(max.Width)$"Genotype:Year"$"(Intercept)", main="Genotype by Year Q-Q Plot"); qqline(ranef(max.Width)$"Genotype:Year"$"(Intercept)")
plot(alt.est.a_max.Width, which="cook", sort=FALSE,main="cook's distance plot of max.Width")
dev.off()

The key to this is your describing it as "only" a single function. A single function can run an arbitrary amount of things. You can have it print something, then do something, then output something. Or do lots of things. Or play Global Geothermonuclear War. All in a single function.
apply( ChickWeight, 2, function(clmn) {
cat("Hi")
cat("Low")
cat("The only way to win is not to play at all")
} )
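As a rough sketch of how that idea applies to the question, the whole chunk can be wrapped in one function that takes a column name, and that function can then be run over every column name. The sketch below is an illustration under assumptions, not the asker's actual model: it uses `lm()` on the built-in `mtcars` data as a stand-in for `lmer()` on `mergeCowpeaTEST`, and the names `diagnose_column`, `response_cols`, and `out_dir` are hypothetical.

```r
# Hypothetical stand-in: lm() on the built-in mtcars data replaces the
# lmer() model so that this sketch runs with base R only.
diagnose_column <- function(col, data, out_dir = tempdir()) {
  # Build the model formula from the column name, e.g. mpg ~ wt
  form <- reformulate("wt", response = col)
  fit <- lm(form, data = data, na.action = na.omit)

  # One PDF per column, with the column name in the file name
  pdf_path <- file.path(out_dir, paste0(col, "_residual_graphs.pdf"))
  pdf(pdf_path, height = 8, width = 10)
  plot(fitted(fit), residuals(fit),
       xlab = "Predicted values", ylab = "Residuals",
       main = paste("Residual plot of", col))
  abline(h = 0, col = "red")
  hist(residuals(fit), main = paste("Histogram of", col, "residuals"))
  qqnorm(residuals(fit), main = "Residuals Q-Q Plot")
  qqline(residuals(fit))
  dev.off()

  # Return whatever needs inspecting afterwards
  list(model = fit, pdf = pdf_path)
}

# Run the whole chunk once per response column
response_cols <- c("mpg", "hp", "qsec")
results <- lapply(response_cols, diagnose_column, data = mtcars)
names(results) <- response_cols
```

For the mixed model in the question, the formula would presumably be built the same way, for example with `as.formula(paste(col, "~ (1|Genotype) + (1|Year) + (1|Genotype:Year) + (1|Rep:Year)"))`, and `response_cols` would be the names of the 75 trait columns.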

Related

Is there a ggplot2 analogue to the avPlots function in R?

When undertaking regression modelling it is useful to produce added variable plots for the explanatory variables in the model, to check whether the posited relationships to the response variable are appropriate to the data. The avPlots function in the car package in R takes a model input, and produces a grid of added-variable plots using the base graphics system. This function is extremely user-friendly, insofar as all you need to do is put in the model object as an argument, and it automatically produces all the added variable plots for each explanatory variable. This matrix of plots contains all the desired information, but unfortunately the plots look poor, owing to the fact that it uses the base graphics system rather than the ggplot2 package. For example, using data found here (downloaded as the file Trucking.csv) here is the output of the avPlots function.
#Load required libraries
library(car);
#Import data, fit model, and show AV plots
DATA <- read.csv('Trucking.csv');
MODEL <- lm(log(PRICPTM) ~ DISTANCE + PCTLOAD + ORIGIN + MARKET + DEREG + PRODUCT,
data = DATA);
avPlots(MODEL);
Question: Is there an equivalent function in ggplot2 that produces a matrix of each of the added-variable plots for a model, but with "prettier" plots? Is it possible to produce these plots, but then customise them using standard ggplot syntax?
I am not aware of any automated function that produces the added variable plots using ggplot. However, as well as giving a plot output as a side-effect of the function call, the avPlots function produces an object that is a list containing the data values used in each of the added variable plots. It is relatively simple to extract data frames of these variables and use these to generate added variable plots using ggplot. This can be done for a general model object using the following functions.
avPlots.invis <- function(MODEL, ...) {
ff <- tempfile()
png(filename = ff)
OUT <- car::avPlots(MODEL, ...)
dev.off()
unlink(ff)
OUT }
ggAVPLOTS <- function(MODEL, YLAB = NULL) {
#Extract the information for AV plots
AVPLOTS <- avPlots.invis(MODEL)
K <- length(AVPLOTS)
#Create the added variable plots using ggplot
GGPLOTS <- vector('list', K)
for (i in 1:K) {
DATA <- data.frame(AVPLOTS[[i]])
GGPLOTS[[i]] <- ggplot2::ggplot(aes_string(x = colnames(DATA)[1],
y = colnames(DATA)[2]),
data = DATA) +
geom_point(colour = 'blue') +
geom_smooth(method = 'lm', se = FALSE,
color = 'red', formula = y ~ x, linetype = 'dashed') +
xlab(paste0('Predictor Residual \n (',
names(DATA)[1], ' | others)')) +
ylab(paste0('Response Residual \n (',
ifelse(is.null(YLAB),
paste0(names(DATA)[2], ' | others'), YLAB), ')')) }
#Return output object
GGPLOTS }
The function ggAVPLOTS will take an input model and produce a list of ggplot objects for each of the added variable plots. These have been constructed to give "pretty" plots with blue points and a dashed red regression line through each plot. If you want all the added variable plots to show up in a single plot, it is relatively simple to do this using the grid.arrange function in the gridExtra package. Below we apply this to your model and show the resulting plot.
#Produce matrix of added variable plots
library(ggplot2)
library(gridExtra)
PLOTS <- ggAVPLOTS(MODEL)
K <- length(PLOTS)
NCOL <- ceiling(sqrt(K))
AVPLOTS <- do.call("arrangeGrob", c(PLOTS, ncol = NCOL, top = 'Added Variable Plots'))
#Pass the arranged grob explicitly; otherwise ggsave() saves the last plot displayed
ggsave('AV Plots - Trucking.jpg', plot = AVPLOTS, width = 10, height = 10)
It is possible to make whatever alterations you want to these plots in the ggplot code above, so if a user prefers to change the colours, font sizes, etc., this is done using standard syntax in ggplot. This method works by importing the data for the added variable plots from the avPlots function, but once you have done that, you can use this data to produce any kind of plot.

How to use an if/else statement to plot different colored lines within a plotting for-loop (in R)

I created a for loop to plot 38 lines (which are the rows of my matrix, results.summary.evap, and correspond to 38 total samples). I'd like to make these lines different colors, based on a characteristic pertaining to each sample: age. I can access the age in my input matrix: surp.data$Age_ka.
However, the matrix I am looping over (results.summary.evap) does not have sample age or sample name, though each sample should be located in the same rows for both surp.data and results.summary.evap.
Here is the for loop I created to plot 38 lines, one corresponding to each sample. In this case, results.summary.evap is what I am plotting from, and this matrix is derived from information in the surp.data input file.
```
par(mfrow=c(3,1))
par(mar=c(3, 4, 2, 0.5))
plot(NA,NA,xlim=c(0,10),ylim=c(0,2500), ylab = "Evaporation (mm/yr)", xlab = "Relative Humidity")
for(i in 1:range){
lines(rh.sens,results.summary.evap[i,])
}
```
I'd like to plot lines in different colors based on the age associated with each sample. I tried to incorporate an if/else statement into the loop that would plot in a different color if the corresponding sample age was greater than 20 in the input file.
```
for(i in 1:range){
if surp.data$Age_ka[i]>20 {
lines(rh.sens,results.summary.evap[i,], col = 'red')
} else {
lines(rh.sens,results.summary.evap[i,], col = 'black')
}
}
```
This for loop won't run (due to issues with parentheses). I'm not sure if what I am doing is fundamentally wrong, or if I've just missed a parenthesis somewhere. I'm also not sure how to make this a bit more robust; for example, by plotting in 6-8 different colors based on age ranges, rather than just two.
Thank you!
You're missing the parentheses around the condition of your if statement:
for(i in 1:range){
if(surp.data$Age_ka[i]>20){
lines(rh.sens,results.summary.evap[i,], col = 'red')
} else {
lines(rh.sens,results.summary.evap[i,], col = 'black')
}
}
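For the second part of the question (several colors based on age ranges), one possible approach is to bin the ages with `cut()` and look the color up by bin, which avoids a long if/else chain. The sketch below uses made-up data in place of `surp.data$Age_ka` and `results.summary.evap`; the break points and the color names are arbitrary illustrative choices.

```r
# Made-up data standing in for surp.data$Age_ka and results.summary.evap
set.seed(1)
age_ka <- c(3, 12, 18, 25, 40, 75)
rh_sens <- seq(0, 10, length.out = 20)
evap <- matrix(runif(length(age_ka) * 20, 0, 2500), nrow = length(age_ka))

# Bin the ages into ranges; these break points are illustrative only
breaks <- c(0, 10, 20, 30, 50, Inf)
age_bin <- cut(age_ka, breaks = breaks, right = FALSE)

# One color per bin, looked up by the integer code of each sample's bin
palette_cols <- c("black", "blue", "darkgreen", "orange", "red")
line_cols <- palette_cols[as.integer(age_bin)]

plot(NA, NA, xlim = c(0, 10), ylim = c(0, 2500),
     ylab = "Evaporation (mm/yr)", xlab = "Relative Humidity")
for (i in seq_along(age_ka)) {
  lines(rh_sens, evap[i, ], col = line_cols[i])
}
legend("topright", legend = levels(age_bin), col = palette_cols, lty = 1)
```

Because the color vector is computed once outside the loop, changing the number of age ranges only means editing `breaks` and `palette_cols`.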

R statistical programming

I am trying to write R code that plots a histogram for each column and saves each histogram to a separate file.
I have a data set "Dummy" and I want to plot one histogram per column name, so there will be 100 histogram plots in total.
I have the following R code to draw each histogram:
library(ggplot2)
i<-1
for(i in 1:100)
{
jpeg(file="d:/R Data/hist.jpeg", sep=",")
hist(Dummy$colnames<-1, ylab= "Score",ylim=c(0,3),col=c("blue"));
dev.off()
i++
if(i>100)
break()
}
As a start, let's tidy up your for loop by taking out the lines that try to change i; the for loop does that for you.
We'll also include a file= value that changes with each loop run.
for(i in 1:100)
{
jpeg(file = paste0("d:/R Data/hist", i, ".jpeg"))
hist(Dummy[[i]], ylab = "Score", ylim = c(0, 3), col = "blue")
dev.off()
}
Now we just need to decide what you want to plot. Will each plot be different? How will each plot extract the data it needs?
EDIT: I've taken a stab at what you're trying to do. Are you trying to take each of 100 columns from the Dummy dataset? If so, Dummy[[i]] should achieve that (or Dummy[,i] if Dummy is a matrix).

Looping over attributes vector to produce combined graphs

Here is some code that tries to compute the marginal effects of each of the predictors in a model (using the effects package) and then plot the results. To do this, I am looping over the "term.labels" attribute of the glm terms object.
library(DAAG)
library(effects)
formula = pres.abs ~ altitude + distance + NoOfPools + NoOfSites + avrain + meanmin + meanmax
summary(logitFrogs <- glm(formula = formula, data = frogs, family = binomial(link = "logit")))
par(mfrow = c(4, 2))
for (predictorName in attr(logitFrogs$terms, "term.labels")) {
print(predictorName)
effLogitFrogs <- effect(predictorName, logitFrogs)
plot(effLogitFrogs)
}
This produces no picture at all. On the other hand, explicitly stating the predictor names does work:
effLogitFrogs <- effect("distance", logitFrogs)
plot(effLogitFrogs)
What am I doing wrong?
Although you call plot(), it actually dispatches to plot.eff(), which produces a lattice plot, so the par() settings are ignored. One solution is to use allEffects() and then plot(), which calls plot.efflist(). With this approach you do not need a for loop because all the plots are made automatically.
effLogitFrogs <- allEffects(logitFrogs)
plot(effLogitFrogs)
EDIT - solution with for loop
There is an "ugly" solution that works with a for() loop. It also needs the grid package. First, store the number of rows and columns in variables (as written, this works only with 1 or 2 columns). Then grid.newpage() and pushViewport() set up the graphics window.
The predictor names are stored in a vector outside the loop. Using pushViewport() and popViewport(), all the plots are placed in the same graphics window.
library(lattice)
library(grid)
n.col=2
n.row= 4
grid.newpage()
pushViewport(viewport(layout = grid.layout(n.row,n.col)))
predictorName <- attr(logitFrogs$terms, "term.labels")
for (i in 1:length(predictorName)) {
print(predictorName[i])
effLogitFrogs <- effect(predictorName[i], logitFrogs)
pushViewport(viewport(layout.pos.col=ceiling(i/n.row), layout.pos.row=ifelse(i-n.row<=0,i,i-n.row)))
p<-plot(effLogitFrogs)
print(p,newpage=FALSE)
popViewport(1)
}
Adding print() to your loop resolves the problem:
print(plot(effLogitFrogs))
plot() calls plot.eff(), which creates the plot without printing it; inside a for loop nothing is auto-printed, so the plot never appears.
allEffects() generates an object of class efflist. Plotting this object calls plot.efflist(), which prints the plot itself, so there is no need to call print() as with plot.eff().
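The same behaviour can be reproduced without the effects package. The sketch below is only an illustration of the general lattice rule, assuming the recommended lattice package is installed and using the built-in `mtcars` data as a stand-in for the frogs model: a lattice plot created inside a for loop is drawn only when it is printed explicitly.

```r
library(lattice)

# Stand-in predictors from the built-in mtcars data
predictors <- c("wt", "hp", "disp")

for (v in predictors) {
  # xyplot() returns a "trellis" object; it does not draw anything itself
  p <- xyplot(reformulate(v, response = "mpg"), data = mtcars,
              main = paste("mpg vs", v))
  # At top level the object would be auto-printed; inside a loop it must
  # be printed explicitly to appear on the graphics device
  print(p)
}
```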

Can I tell ggpairs to use log scales?

Can I provide a parameter to the ggpairs function in the GGally package to use log scales for some, not all, variables?
You can't provide the parameter as such (a reason is that the function creating the scatter plots is predefined without scale, see ggally_points), but you can change the scale afterward using getPlot and putPlot. For instance:
custom_scale <- ggpairs(data.frame(x=exp(rnorm(1000)), y=rnorm(1000)),
upper=list(continuous='points'), lower=list(continuous='points'))
subplot <- getPlot(custom_scale, 1, 2) # retrieve the chart in row 1, column 2
subplotNew <- subplot + scale_y_log10() # change the scale to log
subplotNew$type <- 'logcontinuous' # otherwise ggpairs comes back to a fixed scale
subplotNew$subType <- 'logpoints'
custom_scale <- putPlot(custom_scale, subplotNew, 1, 2)
This is essentially the same answer as Jean-Robert's but looks much simpler (more approachable). I don't know if it is a new feature, but it doesn't look like you need to use getPlot or putPlot anymore.
custom_scale[1,2]<-custom_scale[1,2] + scale_y_log10() + scale_x_log10()
Here is a function to apply it across a big matrix. Supply the number of rows in the plot and the name of the plot.
scalelog2<-function(x=2,g){ #for below diagonal
for (i in 2:x){
for (j in 1:(i-1)) {
g[i,(j)]<-g[i,(j)] + scale_x_continuous(trans='log2') +
scale_y_continuous(trans='log2')
} }
for (i in 1:x){ #for the bottom row
g[(x+1),i]<-g[(x+1),i] + scale_y_continuous(trans='log2')
}
for (i in 1:x){ #for the diagonal
g[i,i]<-g[i,i]+ scale_x_continuous(trans='log2') }
return(g) }
It's probably better to use a linear scale and log-transform variables as appropriate before supplying them to ggpairs, because this avoids ambiguity in how the correlation coefficients have been computed (before or after the log transform).
This can be easily achieved e.g. like this:
library(tidyverse)
log10_vars <- vars(ends_with(".Length")) # define variables to be transformed
iris %>% # use standard R example dataframe
mutate_at(log10_vars, log10) %>% # log10 transform selected columns
rename_at(log10_vars, sprintf, fmt="log10 %s") %>% # rename variables accordingly
GGally::ggpairs(aes(color=Species))
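For readers who prefer not to depend on the tidyverse helpers, the same "transform first, then plot" step can be sketched in base R. This is an illustrative alternative rather than part of the answer above; `len_cols` and `iris_log` are hypothetical names, and the final `ggpairs()` call is left commented out because it requires GGally to be installed.

```r
# Columns to transform: those whose names end in ".Length"
len_cols <- grep("\\.Length$", names(iris), value = TRUE)

# Log10-transform the selected columns in a copy of the data
iris_log <- iris
iris_log[len_cols] <- lapply(iris[len_cols], log10)

# Rename the transformed columns so the axis labels show the transform
names(iris_log)[match(len_cols, names(iris_log))] <- paste("log10", len_cols)

# With GGally installed, the transformed data can then be plotted as before:
# GGally::ggpairs(iris_log, ggplot2::aes(color = Species))
```

Any correlation coefficients shown in the resulting plot would then be computed on the log-transformed values, which is the ambiguity the answer above is trying to avoid.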
