How to plot several violin plots in one panel as a function of time in R using lattice? - r

Here's a minimal example of my data and the plot I was able to adapt from this tutorial:
require(lattice)
t <- c(0.88,3.52,7.04,10.56,18.48,29.92,29.6,52.8,70.4)
n <- 1000
mu.A <- c(0.4014165,0.2444396,0.2200015,0.1829841,0.2087899,0.1385284,0.2150571,0.2272082,0.1643309 )
mu.C <- c(0.4670488,0.3561108,0.1957407,0.1564677,0.1199911,0.1883665,0.1678103,0.1194251,0.1274065 )
C <- A <- numeric(0)
for (i in seq_along(mu.C)) {C <- c(C, rnorm(n, mean=mu.C[i], sd=0.031))} # n draws per time point
for (i in seq_along(mu.A)) {A <- c(A, rnorm(n, mean=mu.A[i], sd=0.021))}
data.f <- data.frame(C,A,rep(t,each=n))
colnames(data.f) <- c("C","A","Time")
bwplot(C + A ~ factor(Time),
data=data.f,
xlab="Time",
ylab="P. Estimate",
outer=T, # This parameter makes sure that the right hand side variable gets an own panel
as.table=T,
panel = function(...,box.ratio) {
panel.violin(...,col="lightblue",
box.ratio=box.ratio)
panel.bwplot(...,box.ratio=.1,pch="|")
},
par.settings = list(box.rectangle=list(col="black"),
plot.symbol=list(pch=".",cex=.001),
strip=strip.custom(factor.levels=c("C","A"))
)
)
Here's my problem: this plot doesn't have a proper time axis. It treats each element of t as a category of its own and not as a point on a continuous scale. In the experiment, time was measured, and the values in t are mean response times over all participants.
My approach was to use xyplot() with panel.violin as the panel function. However, this does not work: in the output, the violins are oriented horizontally and are huge. R also takes a very long time to draw the plot, and eventually I have to kill the RStudio session:
xyplot(C + A ~ Time,
data=data.f,
xlab="Time",
ylab="P. Estimate",
panel = function(...) {
panel.violin(...,col="lightblue")
},
par.settings = list(box.rectangle=list(col="black"),
plot.symbol=list(pch=".",cex=.001),
strip=strip.custom(factor.levels=c("C","A"))
)
)
I'm not so much looking for someone who just solves the problem for me but rather tells me where I'm making the mistake.
Disclaimer: This is not my real data. It's just a more convenient way to reproduce the data without having to upload it somewhere.
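One possible explanation, offered as a sketch rather than a confirmed diagnosis: panel.violin() has a horizontal argument that defaults to TRUE. bwplot() sets it for the panel function, but xyplot() does not, so the panel splits x by every distinct value of y and tries to draw one horizontal density per value, which would account for both the orientation and the long drawing time. Passing horizontal = FALSE makes it split y by the distinct values of Time and draw each violin at its numeric x position. The data below are simulated for illustration only, and box.width = 2 (the violin width in x-axis units) is an arbitrary choice.

```r
library(lattice)

set.seed(1)
times <- c(0.88, 3.52, 7.04, 10.56, 18.48)   # illustrative time points
mu    <- c(0.47, 0.36, 0.20, 0.16, 0.12)     # illustrative means
n     <- 200                                 # draws per time point

d <- data.frame(
  Time = rep(times, each = n),
  C    = rnorm(n * length(times), mean = rep(mu, each = n), sd = 0.031)
)

# Time stays numeric, so the x-axis is continuous; each violin is drawn
# at its actual time value because horizontal = FALSE splits y by x.
p <- xyplot(C ~ Time, data = d,
            xlab = "Time", ylab = "P. Estimate",
            panel = function(...) {
              panel.violin(..., horizontal = FALSE,
                           box.width = 2, col = "lightblue")
            })
print(p)
```

Because box.width is in the units of the time axis, closely spaced time points may need a smaller value to avoid overlapping violins.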

Related

Errors in R Histogram

Can anyone understand why this block of code isn't producing a histogram? Here is the code:
incremental <- c()
for (i in 1:1000) {
set.seed(42)
avg_2 = mean(runif(100))
incremental <- rbind(incremental, c(avg_2))
}
incremental <- as.numeric(incremental)
hist(incremental, main = "Histogram of Averages From For Loop",
xlab = "Averages")
Don't worry about the set.seed, it is part of the exercise. All the data points will be the same, but nothing shows up on the histogram. Why is this so? Here is a screenshot of the histogram:
Actually, you are just looking at a plot with one big bar. It's very hard for R (or anyone) to guess where to create breaks if you only observe one value. Maybe you want something like this:
hist(incremental, main = "Histogram of Averages From For Loop",
xlab = "Averages",
breaks=seq(0,1, length.out=10))
This tells hist() to use 10 equally spaced break points between 0 and 1, which gives 9 bins.
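To check this without looking at the picture, a small sketch (the variable h is introduced here for illustration) can confirm that the loop produces a single distinct value and inspect the bins that hist() builds from the explicit breaks:

```r
# Re-seeding inside the loop makes every iteration draw the same numbers,
# so all 1000 averages are identical.
incremental <- numeric(1000)
for (i in 1:1000) {
  set.seed(42)
  incremental[i] <- mean(runif(100))
}

length(unique(incremental))   # number of distinct averages

# Compute the histogram without drawing it and inspect its structure.
h <- hist(incremental, breaks = seq(0, 1, length.out = 10), plot = FALSE)
length(h$breaks)              # number of break points
length(h$counts)              # number of bins
which(h$counts > 0)           # the single bin that holds all values
```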

Extracting the exact coordinates of a mouse click in an interactive plot

In short: I'm looking for a way to get the exact coordinates of a series of mouse positions (on-clicks) in an interactive x/y scatter plot rendered by ggplot2 and ggplotly.
I'm aware that plotly (and several other interactive plotting packages for R) can be combined with Shiny, where a box or lasso select can return a list of all data points within the selected subspace. However, this list will be HUGE in most of the datasets I'm analysing, and I need to be able to do the analysis reproducibly in an R Markdown format (writing out a few point coordinates, mostly fewer than 5-6, is much more readable). Furthermore, I have to know the exact positions of the clicks to be able to extract points within the same polygon in a different dataset, so a list of points within the selection in one dataset is not useful.
The grid.locator() function from the grid package does almost what I'm looking for (it is the one wrapped in, e.g., gglocator). However, I hope there is a way to do the same within an interactive plot rendered by plotly (or maybe something else that I don't know of), as the datasets are often HUGE (see the plot below), so being able to zoom in and out interactively would be very much appreciated during several iterations of analysis.
Normally I have to rescale the axes several times to simulate zooming in and out which is exhausting when doing it MANY times. As you can see in the plot above, there is a LOT of information in the plots to explore (the plot is about 300MB in memory).
Below is a small reprex of how I'm currently doing it using grid.locator on a static plot:
library(ggplot2)
library(grid)
library(magrittr) # provides the %>% pipe used below
p <- ggplot(mtcars, aes(wt, mpg)) +
geom_point()
locator <- function(p) {
# Build ggplot object
ggobj <- ggplot_build(p)
# Extract coordinates
xr <- ggobj$layout$panel_ranges[[1]]$x.range
yr <- ggobj$layout$panel_ranges[[1]]$y.range
# Variable for selected points
selection <- data.frame(x = as.numeric(), y = as.numeric())
colnames(selection) <- c(ggobj$plot$mapping$x, ggobj$plot$mapping$y)
# Detect and move to plot area viewport
suppressWarnings(print(ggobj$plot))
panels <- unlist(current.vpTree()) %>%
grep("panel", ., fixed = TRUE, value = TRUE)
p_n <- length(panels)
seekViewport(panels, recording=TRUE)
pushViewport(viewport(width=1, height=1))
# Select point, plot, store and repeat
for (i in 1:10){
tmp <- grid.locator('native')
if (is.null(tmp)) break
grid.points(tmp$x,tmp$y, pch = 16, gp=gpar(cex=0.5, col="darkred"))
selection[i, ] <- as.numeric(tmp)
}
grid.polygon(x= unit(selection[,1], "native"), y= unit(selection[,2], "native"), gp=gpar(fill=NA))
#return a data frame with the coordinates of the selection
return(selection)
}
locator(p)
and from here use the point.in.polygon function to subset the data based on the selection.
A possible solution could be to add, say 100x100, invisible points to the plot and then use the plotly_click feature of event_data() in a Shiny app, but this is not at all ideal.
Thanks in advance for your ideas or solutions, I hope my question was clear enough.
-- Kasper
I used ggplot2. Besides the materials at https://shiny.rstudio.com/articles/plot-interaction.html, I'd like to mention the following:
Firstly, when you create the plot, don't use "print( )" within "renderPlot( )", or the coordinates would be wrong. For instance, if you have the following in UI:
plotOutput("myplot", click = "myclick")
The following in the Server would work:
output$myplot <- renderPlot({
p = ggplot(data = mtcars, aes(x=mpg, y=hp)) + geom_point()
p
})
But the clicking coordinates would be wrong if you do:
output$myplot <- renderPlot({
p = ggplot(data = mtcars, aes(x=mpg, y=hp)) + geom_point()
print(p)
})
Then, you could store the coordinates by adding to the Server:
mydata = reactiveValues(x_values = c(), y_values = c())
observeEvent(input$myclick, {
mydata$x_values = c(mydata$x_values, input$myclick$x)
mydata$y_values = c(mydata$y_values, input$myclick$y)
})
In addition to the x-y coordinates, when you use facets with ggplot2, you can identify the clicked facet panel with
input$myclick$panelvar1
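Put together, the pieces above could look like the following minimal app. This is a sketch under assumptions: the output id "clicks" and the reactiveValues name clicks are introduced here for illustration, and the table is just one way to read the stored coordinates back out.

```r
library(shiny)
library(ggplot2)

ui <- fluidPage(
  plotOutput("myplot", click = "myclick"),  # clicks arrive as input$myclick
  tableOutput("clicks")                     # hypothetical output id
)

server <- function(input, output, session) {
  # Accumulates the coordinates of every click in data units.
  clicks <- reactiveValues(x = numeric(0), y = numeric(0))

  output$myplot <- renderPlot({
    # Return the plot object instead of calling print() on it.
    ggplot(mtcars, aes(x = mpg, y = hp)) + geom_point()
  })

  observeEvent(input$myclick, {
    clicks$x <- c(clicks$x, input$myclick$x)
    clicks$y <- c(clicks$y, input$myclick$y)
  })

  output$clicks <- renderTable({
    data.frame(mpg = clicks$x, hp = clicks$y)
  })
}

app <- shinyApp(ui, server)
# Launch only when run interactively.
if (interactive()) runApp(app)
```

The coordinates shown in the table can then be copied into an R Markdown document as a short, readable polygon definition.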

Save plots as R objects and displaying in grid

In the following reproducible example I try to create a function that builds a ggplot distribution plot and saves it as an R object, with the intention of displaying two plots in a grid.
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
output<-list(distribution,var1,dat)
return(output)
}
Call to function:
set.seed(100)
df <- data.frame(x = rnorm(100, mean=10),y =rep(1,100))
output1 <- ggplothist(dat=df,var1='x')
output1[1]
All fine until now.
Then I want to make a second plot (note mean=100 instead of the previous 10):
df2 <- data.frame(x = rep(1,1000),y = rnorm(1000, mean=100))
output2 <- ggplothist(dat=df2,var1='y')
output2[1]
Then I try to replot the first distribution with mean 10.
output1[1]
I get the same distribution as in the second plot?
If, however, I take the information returned by the function and reset it as global variables, it works.
var1=as.numeric(output1[2]);dat=as.data.frame(output1[3]);p1 <- output1[1]
p1
If anyone can explain why this happens, I would like to know. It seems that in order to draw the intended distribution I have to reset the data.frame and variable to what was used to draw the plot. Is there a way to save the plot as an object without having to do this? Luckily I can replot the first distribution,
but I can't plot them both at the same time:
var1=as.numeric(output2[2]);dat=as.data.frame(output2[3]);p2 <- output2[1]
grid.arrange(p1,p2)
ERROR: Error in gList(list(list(data = list(x = c(9.66707664902549, 11.3631137069225, :
only 'grobs' allowed in "gList"
In the answer to "Grid of multiple ggplot2 plots which have been made in a for loop", it is suggested to use a list for containing the plots:
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
pltlist <- list()
pltlist[["plot"]] <- distribution
output<-list(pltlist,var1,dat)
return(output)
}
output1 <- ggplothist(dat=df,var1='x')
p1<-output1[1]
output2 <- ggplothist(dat=df2,var1='y')
p2<-output2[1]
output1[1]
Will produce the distribution with mean=100 again instead of mean=10
and:
grid.arrange(p1,p2)
will produce the same Error
Error in gList(list(list(plot = list(data = list(x = c(9.66707664902549, :
only 'grobs' allowed in "gList"
As a last attempt I tried to use recordPlot() to record everything about the plot into an object. The following is now inside the function.
ggplothist<- function(dat,var1)
{
if (is.character(var1)) {
var1 <- which(names(dat) == var1)
}
distribution <- ggplot(data=dat, aes(dat[,var1]))
distribution <- distribution + geom_histogram(aes(y=..density..),binwidth=0.1,colour="black", fill="white")
plot(distribution)
distribution<-recordPlot()
output<-list(distribution,var1,dat)
return(output)
}
This function produces the same errors as before: it depends on resetting the dat and var1 variables to what is needed for drawing the distribution, and similarly the result can't be put inside a grid.
I've tried similar things like arrangeGrob() from the question "R saving multiple ggplot2 plots as R-object in list and re-displaying in grid", but with no luck.
I would really like a solution that creates an R object containing the plot, that can be redrawn by itself and can be used inside a grid without having to reset the variables used to draw the plot each time. I would also like to understand why this is happening, as I don't consider it intuitive at all.
The only solution I can think of is to draw the plot as a png file, save it somewhere, and then have the function return the path so that it can be reused. Is that what other people are doing?
Thanks for reading, and sorry for the long question.
Found a solution in the question "How can I reference the local environment within a function, in R?":
by inserting
localenv <- environment()
And referencing that in the ggplot
distribution <- ggplot(data=dat, aes(dat[,var1]),environment = localenv)
made it all work! even with grid arrange!
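As an alternative sketch, not the poster's method: the plot can refer to the column by name, so it no longer depends on variables that exist only inside the function. This assumes a ggplot2 version recent enough to provide the .data pronoun and after_stat(). It also returns the plot itself and not a list wrapped around it. The "only 'grobs' allowed" error above is consistent with passing output1[1] (a one-element list) to grid.arrange() where output1[[1]] (the plot object) was needed.

```r
library(ggplot2)
library(gridExtra)

ggplothist <- function(dat, var1) {
  # Accept a column index or a column name; use the name from here on.
  if (is.numeric(var1)) var1 <- names(dat)[var1]
  # .data[[var1]] looks the column up in the plot's own data,
  # so the plot stays correct after the function has returned.
  ggplot(dat, aes(x = .data[[var1]])) +
    geom_histogram(aes(y = after_stat(density)), binwidth = 0.1,
                   colour = "black", fill = "white")
}

set.seed(100)
df  <- data.frame(x = rnorm(100, mean = 10), y = rep(1, 100))
df2 <- data.frame(x = rep(1, 1000), y = rnorm(1000, mean = 100))

p1 <- ggplothist(df,  "x")   # distribution centred near 10
p2 <- ggplothist(df2, "y")   # distribution centred near 100

grid.arrange(p1, p2, ncol = 1)
```

Each plot object carries its own data, so p1 can be redrawn at any time without resetting dat or var1.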

Setting equal xlim and ylim in plot function

Is there a way to get the plot function to generate equal xlim and ylim automatically?
I do not want to define a fixed range beforehand; I want the plot function to decide on the range itself. However, I expect it to pick the same range for x and y.
A possible solution is to define a wrapper to the plot function:
plot.Custom <- function(x, y, ...) {
.limits <- range(x, y)
plot(x, y, xlim = .limits, ylim = .limits, ...)
}
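A usage sketch of the same idea on the built-in cars dataset. The function name plot_equal_limits and the invisible return value are introduced here for illustration; the name also avoids looking like an S3 plot method for a class called "Custom".

```r
# Same approach as the wrapper above, returning the limits it used.
plot_equal_limits <- function(x, y, ...) {
  lims <- range(x, y, finite = TRUE)  # common range, ignoring NA/Inf
  plot(x, y, xlim = lims, ylim = lims, ...)
  invisible(lims)
}

lims <- plot_equal_limits(cars$speed, cars$dist,
                          xlab = "speed", ylab = "dist")
lims  # the shared axis limits
```

Adding asp = 1 to the call would additionally force one unit on x to take the same physical length as one unit on y.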
One way is to adjust the limits interactively and then choose the right ones. Sliders will appear once you run the following code.
library(manipulate)
manipulate(
plot(cars, xlim=c(x.min,x.max)),
x.min=slider(0,15),
x.max=slider(15,30))
I'm not aware of any way to do this using plot (which doesn't mean there isn't one). ggplot might be the way to go; it lends itself more to being changed retroactively since it is designed around a layer system.
library(ggplot2)
#Creating our ggplot object
loop_plot <- ggplot(cars, aes(x = speed, y = dist)) +
geom_point()
#pulling out the 'auto' x & y axis limits
rangepull <- t(cbind(
ggplot_build(loop_plot)$panel$ranges[[1]]$x.range,
ggplot_build(loop_plot)$panel$ranges[[1]]$y.range))
#taking the max and min(so we don't cut out data points)
newrange <- list(cor.min = min(rangepull[,1]), cor.max = max(rangepull[,2]))
#changing our plot size to be nice and symmetric
loop_plot <- loop_plot +
xlim(newrange$cor.min, newrange$cor.max) +
ylim(newrange$cor.min, newrange$cor.max)
Note that the loop_plot object is of class ggplot and won't actually be drawn until it's printed.
I used the cars dataset in the code above to show what's going on, but just substitute your own dataset(s) and then do whatever post-processing your end goal requires.
You'll also be able to add titles and the like based on the dataset name, et cetera, which will likely produce a clearer visualization from your loop.
Hopefully this works for your needs.

lattice or latticeExtra combine multiple plots different yscaling (log10 and non-transformed)

I have a multivariate time series where some of the variables have rather large ranges. I wish to make a single-page plot with stacked plots of each variable, where some of the variables have a log10 y-axis scaling. I am relatively new to lattice and have not been able to figure out how to effectively mix the log10 scaling with non-transformed axes and get a publication-quality plot. If print.trellis is used, the plots are not aligned and the padding needs some work; if c.trellis is used, the layout is good, but the y-scaling from only one plot is used. Any suggestions for an efficient solution where I can replicate the output of c.trellis using the different y-scaling of each (original) object?
Example below:
require(lattice)
require(latticeExtra)
# make data.frame
d.date <- as.POSIXct(c("2009-12-15", "2010-01-15", "2010-02-15", "2010-03-15", "2010-04-15"))
CO2dat <- c(100,200,1000,9000,2000)
pHdat <- c(10,9,7,6,7)
tmp <- data.frame(date=d.date ,CO2dat=CO2dat ,pHdat=pHdat)
# make plots
plot1 <- xyplot(pHdat ~ date, data=tmp
, ylim=c(5,11)
, ylab="pHdat"
, xlab="Date"
, origin = 0, border = 0
, scales=list(y=list(alternating=1))
, panel = function(...){
panel.xyarea(...)
panel.xyplot(...)
}
)
# make plot with log y scale
plot2 <- xyplot(CO2dat ~ date, data=tmp
, ylim=c(10,10^4)
, ylab="CO2dat"
, xlab="Date"
, origin = 0, border = 0
, scales=list(y=list(alternating=1,log=10))
, yscale.components = yscale.components.log10ticks
, panel = function(...){
panel.xyarea(...)
panel.xyplot(...)
# plot CO2air uatm
panel.abline(h=log10(390),col="blue",type="l",...)
}
)
# plot individual figures using split
print(plot2, split=c(1,1,1,2), more=TRUE)
print(plot1, split=c(1,2,1,2), more=F)
# combine plots (more convenient)
comb <- c(plot1, plot2, x.same=F, y.same=F, layout = c(1, 2))
# plot combined figure
update(comb, ylab = c("pHdat","log10 CO2dat"))
Using @joran's idea, I can get the axes to be closer but not exactly aligned; also, reducing the padding gets the plots closer together but changes the aspect ratio. In the picture below I've reduced the padding, perhaps by too much, to show the inexactness; if spacing this close were desired, you'd clearly want to remove the x-axis labels on the top plot as well.
I looked into the code that sets up the layout, and the margin on the left side is calculated from the width of the labels, so @joran's idea is probably the only thing that will work with printing using split, unless one were to rewrite the plot.trellis command. Perhaps the c method could work, but I haven't yet found a way to set the scale components separately depending on the panel. That does seem more promising, though.
mtheme <- standard.theme("pdf")
mtheme$layout.heights$bottom.padding <- -10
plot1b <- update(plot1, scales=list(y=list(alternating=1, at=5:10, labels=paste(" ",c(5:10)))))
plot2b <- update(plot2, par.settings=mtheme)
pdf(file="temp.pdf")
print(plot2b, split=c(1,1,1,2), more=TRUE)
print(plot1b, split=c(1,2,1,2), more=F)
dev.off() # close the device so the PDF is written to disk
