Write function to plot data, requires passing data.frame column names - r

I would like to write a function to create plots (in order to create multiple plots without listing the design settings every time). The pirateplot function that I use requires columnnames and a dataframe as input, which causes problems.
My not-working code is:
pirateplot_default <- function(DV,IV,Dataset) {
plot <- pirateplot(formula = DV ~ IV,
data = Dataset,
xlab = "Solution")
return(plot)
}
I have tried "as.name" (saw that here) but it did not work.
using data[DV] is no option because the pirateplot function requires a different notation
I know that there are similar questions here,here,here, and this probably qualifies as duplicate for more skilled programmers, but I did not manage to apply the solutions at other questions to my problem, so hoping for help.

Here is an example
pirateplot_default <- function(DV,IV,Dataset) {
tmp=as.formula(paste0(DV,"~",paste0(IV,collapse="+")))
plot <- pirateplot(formula = tmp,
data = Dataset,
xlab = "Solution")
return(plot)
}
pirateplot_default("mpg",c("disp","cyl","hp"),mtcars)

Related

Advise a Chemist: Automate/Streamline his Voltammetry Data Graphing Code

I am a chemist dealing with a significant amount of voltammetry data recently. Let me be very clear and give some research information. I run scans from a starting voltage to an ending voltage on solid state conductive films. These scans are saved as .txt files (name scheme: run#.txt) in a single folder. I am looking at how conductance changes as temperature changes. The LINEST line plotting current v. voltage at a given temperature gives me a line with slope = conductance. Once I have the conductances (slopes) for each scan, I plot conductance v. temperature to see the temperature dependent conductance characteristics. I had been doing this in Excel, but have found quicker ways to get the job done using R. I am brand new to R (Rstudio) and recognize that my coding is not the best. Without doubt, this process can be streamlined and sped up which would help immensely. This is how I am performing the process currently:
# Set working directory with folder containing all .txt files for inspection
# Add all .txt files to the global environment
allruns<-list.files(pattern=".txt")
for(i in 1:length(allruns))assign(allruns[i],read.table(allruns[i]))
Since the voltage column (a 1x1000 matrix) is the same for all runs and is in column V1 of each .txt file, I assign a x to be the voltage column from the first folder
x<-run1.txt$V1
All currents (these change as voltage changes) are found in the V2 column of all the .txt files, so I assign y# to each. These are entered one at a time..
y1<-run1.txt$V2
y2<-run2.txt$V2
y3<-run3.txt$V2
# ...
yn<-runn.txt$V2
So that I can get the eqn for each LINEST (one LINEST for each scan and plotted with abline later). Again entered one at a time:
run1<-lm(y1~x)
run2<-lm(y2~x)
run3<-lm(y3~x)
# ...
runn<-lm(yn~x)
To obtain a single graph with all LINEST (one for each scan ) on the same plot, without the data points showing up, I have been using this pattern of coding to first get all data points on a single plot in separate series:
plot(x,y1,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y3,yn)))
par(new=TRUE)
plot(x,y2,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y3,yn)))
par(new=TRUE)
plot(x,y3,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y1,yn)))
# ...
par(new=TRUE)
plot(x,yn,col="transparent",main="LSV Solid Film", xlab = "potential(V)",ylab="current(A)", xlim=rev(range(x)),ylim=range(c(y1,yn)))
#To obtain all LINEST lines (one for each scan, on the single graph):
abline(run1,col=””, lwd=1)
abline(run2,col=””,lwd=1)
abline(run3,col=””,lwd=1)
# ...
abline(runn,col=””,lwd=1)
# Then to get each LINEST equation:
summary(run1)
summary(run2)
summary(run3)
# ...
summary(runn)
Each time I use summary(), I copy the slope and paste it into an Excel sheet- along with corresponding scan temp which I have recorded separately. I then graph the conductance v temp points for the film as X-Y scatter with smooth lines to give the temperature dependent conductance curve. Giving me a single LINEST lines plot in R and the conductance v temp in Excel.
This technique is actually MUCH quicker than doing it all in Excel, but it can be done much quicker and efficiently!!! Also, if I need to change something, this entire process needs to be reexecuted with whatever change is necessary. This process takes me maybe 5 hours in Excel and 1.5 hours in R (maybe I am too slow). Nonetheless, any tips to help automate/streamline this further are greatly appreciated.
There are plenty of questions about operating on data in lists; storing a list of matrix or a list of data.frame is fast, and code that operates cleanly on one can be applied to the remaining n-1 very easily.
(Note: the way I'm showing it here is one technique: maintaining everything in well-compartmentalized lists. Other will suggest -- very justifiably -- that combing things into a single data.frame and adding a group variable (to identify from which file/experiment the data originated) will help with more advanced multi-experiment regression or combined plotting, such as with ggplot2. I'm not going to go into this latter technique here, not yet.)
It is long decried not to do for(...) assign(..., read.csv(...)); you have the important part done, so this is relatively easy:
allruns <- sapply(list.files(pattern = "*.txt"), read.table, simplify = FALSE)
(The use of sapply(..., simplify=FALSE) is similar to lapply(...), but it has a nice side-effect of naming the individual list-ified elements with, in this case, each filename. It may not be critical here but is quite handy elsewhere.)
Extracting your invariant and variable data is simple enough:
allLMs <- lapply(allruns, function(mdl) lm(V2 ~ V1, data = mdl))
I'm using each table's V1 here instead of a once-extracted x ... though you might wonder why, I argue keeping it like for two reasons: (1) JUST IN CASE the V1 variable is ever even one-row-different, this will save you; (2) it is very easy to construct the model like this.
At this point, each object within allLMs is an lm object, meaning we might do:
summary(allLMs[[1]])
Plotting: I think I understand why you are using par=NEW, and I have to laugh ... I had been deep in R for a while before I started using that technique. What I think you need is actually much simpler:
xlim <- rev(range(allruns[[1]]$V1))
ylim <- range(sapply(allruns, `[`, "V2"))
# this next plot just sets the box and axes, no points
plot(NA, type = "na", xlim = xlim, ylim = ylim)
# no need to plot points with "transparent" ...
ign <- sapply(allLMs, abline, col = "") # and other abline options ...
Copying all models into Excel, again, using lists:
out <- do.call(rbind, sapply(allLMs, function(m) summary(m)$coefficients[,1]))
This will now be a single data.frame with all coefficients in two columns. (Feel free to use similar techniques to extract the other model summary attributes, including std err, t.value, or Pr(>|t|) (in the $coefficients); or $r.squared, $adj.r.squared, etc.)
write.csv(out, file="clipboard", sep="\t")
and paste into Excel. (Or, better yet, save it to a CSV file and import that, since you might want to keep it around.)
One of the tricks to using lists for this is to persevere: keep things in lists as long as you can, so that you don't have deal with models individually. One mantra is that if you do it once, you shouldn't have to type it again, just loop/apply/map/whatever. Don't extract too much from the lists before you have to.
Note: r2evans' answer provides good general advice and doesn't require heavy package dependencies. But it probably doesn't hurt to see alternative strategies.
The tidyverse can be quite handy for this sort of thing, here's a dummy example for illustration,
library(tidyverse)
# creating dummy data files
dummy <- function(T) {
V <- seq(-5, 5, length=20)
I <- jitter(T*V + T, factor = 1)
write.table(data.frame(V=V, I = I),
file = paste0(T,".txt"),
row.names = FALSE)
}
purrr::walk(300:320, dummy)
# reading
lf <- list.files(pattern = "\\.txt")
read_one <- function(f, ...) {cbind(T = as.numeric(gsub("\\.txt", "", f)), read.table(f, ...))}
m <- purrr::map_df(lf, read_one, header = TRUE, .id="id")
head(m)
ggplot(m, aes(V, I, group = T)) +
facet_wrap( ~ T) +
geom_point() +
geom_smooth(se = FALSE)
models <- m %>%
split(.$T) %>%
map(~lm(I ~ V, data = .))
coefs <- models %>% map_df(broom::tidy, .id = "T")
ggplot(coefs, aes(as.numeric(T), estimate)) +
geom_line() +
facet_wrap(~term, scales = "free")

Pass function argument to ggplot label [duplicate]

I need to wrap ggplot2 into another function, and want to be able to parse variables in the same manner that they are accepted, can someone steer me in the correct direction.
Lets say for example, we consider the below MWE.
#Load Required libraries.
library(ggplot2)
##My Wrapper Function.
mywrapper <- function(data,xcol,ycol,colorVar){
writeLines("This is my wrapper")
plot <- ggplot(data=data,aes(x=xcol,y=ycol,color=colorVar)) + geom_point()
print(plot)
return(plot)
}
Dummy Data:
##Demo Data
myData <- data.frame(x=0,y=0,c="Color Series")
Existing Usage which executes without hassle:
##Example of Original Function Usage, which executes as expected
plot <- ggplot(data=myData,aes(x=x,y=y,color=c)) + geom_point()
print(plot)
Objective usage syntax:
##Example of Intended Usage, which Throws Error ----- "object 'xcol' not found"
mywrapper(data=myData,xcol=x,ycol=y,colorVar=c)
The above gives an example of the 'original' usage by the ggplot2 package, and, how I would like to wrap it up in another function. The wrapper however, throws an error.
I am sure this applies to many other applications, and it has probably been answered a thousand times, however, I am not sure what this subject is 'called' within R.
The problem here is that ggplot looks for a column named xcol in the data object. I would recommend to switch to using aes_string and passing the column names you want to map using a string, e.g.:
mywrapper(data = myData, xcol = "x", ycol = "y", colorVar = "c")
And modify your wrapper accordingly:
mywrapper <- function(data, xcol, ycol, colorVar) {
writeLines("This is my wrapper")
plot <- ggplot(data = data, aes_string(x = xcol, y = ycol, color = colorVar)) + geom_point()
print(plot)
return(plot)
}
Some remarks:
Personal preference, I use a lot of spaces around e.g. x = 1, for me this greatly improves the readability. Without spaces the code looks like a big block.
If you return the plot to outside the function, I would not print it inside the function, but just outside the function.
This is just an addition to the original answer, and I do know that this is quite an old post, but just as an addition:
The original answer provides the following code to execute the wrapper:
mywrapper(data = "myData", xcol = "x", ycol = "y", colorVar = "c")
Here, data is provided as a character string. To my knowledge this will not execute correctly. Only the variables within the aes_string are provided as character strings, while the data object is passed to the wrapper as an object.

How to control plot layout for lmerTest output results?

I am using lme4 and lmerTest to run a mixed model and then use backward variable elimination (step) for my model. This seems to work well. After running the 'step' function in lmerTest, I plot the final model. The 'plot' results appear similar to ggplot2 output.
I would like to change the layout of the plot. The obvious answer is to do it manually myself creating an original plot(s) with ggplot2. If possible, I would like to simply change the layout of of the output, so that each plot (i.e. plotted dependent variable in the final model) are in their own rows.
See below code and plot to see my results. Note plot has three columns and I would like three rows. Further, I have not provided sample data (let me know if I need too!).
library(lme4)
library(lmerTest)
# Full model
Female.Survival.model.1 <- lmer(Survival.Female ~ Location + Substrate + Location:Substrate + (1|Replicate), data = Transplant.Survival, REML = TRUE)
# lmerTest - backward stepwise elimination of dependent variables
Female.Survival.model.ST <- step(Female.Survival.model.1, reduce.fixed = TRUE, reduce.random = FALSE, ddf = "Kenward-Roger" )
Female.Survival.model.ST
plot(Female.Survival.model.ST)
The function that creates these plots is called plotLSMEANS. You can look at the code for the function via lmerTest:::plotLSMEANS. The reason to look at the code is 1) to verify that, indeed, the plots are based on ggplot2 code and 2) to see if you can figure out what needs to be changed to get what you want.
In this case, it sounds like you'd want facet_wrap to have one column instead of three. I tested with the example from the **lmerTest* function step help page, and it looks like you can simply add a new facet_wrap layer to the plot.
library(ggplot2)
plot(Female.Survival.model.ST) +
facet_wrap(~namesforplots, scales = "free", ncol = 1)
Try this: plot(difflsmeans(Female.Survival.model.ST$model, test.effs = "Location "))

Adding points after the fact with ggplot2; user defined function

I believe the answer to this is that I cannot, but rather than give in utterly to depraved desperation, I will turn to this lovely community.
How can I add points (or any additional layer) to a ggplot after already plotting it? Generally I would save the plot to a variable and then just tack on + geom_point(...), but I am trying to include this in a function I am writing. I would like the function to make a new plot if plot=T, and add points to the existing plot if plot=F. I can do this with the basic plotting package:
fun <- function(df,plot=TRUE,...) {
...
if (!plot) { points(dYdX~Time.Dec,data=df2,col=col) }
else { plot(dYdX~Time.Dec,data=df2,...) }}
I would like to run this function numerous times with different dataframes, resulting in a plot with multiple series plotted.
For example,
fun(df.a,plot=T)
fun(df.b,plot=F)
fun(df.c,plot=F)
fun(df.d,plot=F)
The problem is that because functions in R don't have side-effects, I cannot access the plot made in the first command. I cannot save the plot to -> p, and then recall p in the later functions. At least, I don't think I can.
have a ggplot plot object be returned from your function that you can feed to your next function call like this:
ggfun = function(df, oldplot, plot=T){
...
if(plot){
outplot = ggplot(df, ...) + geom_point(df, ...)
}else{
outplot = oldplot + geom_point(data=df, ...)
}
print(outplot)
return(outplot)
}
remember to assign the plot object returned to a variable:
cur.plot = ggfun(...)

Trouble getting started on my user defined function in R that creates 3 plots.

I am new to R. I've been trying to work on my code but I am having trouble understanding which 'parts' go where. This is the code I was given:
df=10
boxplot(rt(1000,df),rnorm(1000),names=c(paste("t,df=",df),"Standard Normal"))
x=seq(0,1,length=150)
plot(qt(x,df),qnorm(x),xlab=paste("t, df=",df),ylab="standard
Normal",main="qq-plot")
abline(0,1)
curve(dnorm(x),-3.5,3.5,main="Density Comparison")
curve(dt(x,df),lty=2,add=TRUE)
legend("topright",legend=c("standard
normal",paste("t,df=",df)),lty=c(1,2))
I am supposed to create a user defined function that takes df as the input and output 3 types of plots. I need to use: df=5,10,25, and 50.
This is what I have so far. Please dumb it down for me since I'm not very familiar with R terminology and I am not sure I am placing things where they are supposed to go..:
my.plot = function(n, df) {
a = rt(n,df)
b=rnorm(1000)
x= seq(0,1,length=150)
qt=qt(x,df)
qn=qnorm(x)
dn=dnorm(x)
ledt=dt(x,df)
n=1000
}
thebox= boxplot(a,b,names=c(paste("t,df=",df),"Stand rd Normal")) #1boxplot.
theplot= plot(qt,qn,xlab=paste("t, df=",df),ylab="standard Normal",main="qq-plot")
abline(0,1)
onecurve= curve(dn,-3.5,3.5,main="Density Comparison") #density curve
twocurve= curve(ledt, lty=2,add=TRUE)
legend("topright",legend=c("standard normal",paste("t,df=",df)),lty=c(1,2)
}
return(thebox)
return(theplot)
return(oneplot)
return(twocurve)
}
par(mfrow=c(1,3))
my.plot(1000,5)
my.plot(1000,10)
my.plot(1000,25)
my.plot(1000,50)
It works like this:
1. Your only input parameter is df. So your function will have only one input variable.
2. Since you already have the remaining code functional (i.e. after defining df) that code can be used as it is.
Below is a simple implementation from your demo code. You can modify it to suite your needs.
my.plot <- function(df) {
par(mfrow=c(1,3))
boxplot(rt(1000,df),rnorm(1000),names=c(paste("t,df=",df),"Standard Normal"))
x=seq(0,1,length=150)
plot(qt(x,df),qnorm(x),xlab=paste("t, df=",df),ylab="standard
Normal",main="qq-plot")
abline(0,1)
curve(dnorm(x),-3.5,3.5,main="Density Comparison")
curve(dt(x,df),lty=2,add=TRUE)
legend("topright",legend=c("standard
normal",paste("t,df=",df)),lty=c(1,2))
par(mfrow=c(1,1))
}
my.plot(5)
for(df in c(5,10,15,25)) my.plot(df)

Resources