I have the following data
[1] 0.09733344 0.17540020 0.14168188 0.54093074 0.78151039 0.28068527
[7] 1.96164429 0.33743328 0.05200734 0.09103039 0.28842044 0.09240131
[13] 0.09143535 0.38142022 0.11700952
from which I did Bayesian inference and made a plot with the following code:
f_theta<-function(theta,Data){
(theta^length(Data) )*exp(-theta*sum(Data))}
theta<-seq(1,20,length=100)
a=b=0.001
plot(theta,dgamma(theta,a,b),type="l",col="red",
ylim=c(0,2),tck=-0.01,cex.lab=0.8,cex.axis=0.8)
lines(theta,dgamma(theta,length(Data)+a,sum(Data)+b),col="green",lty=1)
lines(theta,f_theta(theta,Data=Data),lty=1,col="blue")
legend('topright',legend=c("Prior","Post","Likelihood")
,col=c("red","green","blue"),lty=1,bty="n",cex=0.8)
But I've seen the following graph
which has code
# ggplot2 examples
library(ggplot2)
# create factors with value labels
mtcars$gear <- factor(mtcars$gear,levels=c(3,4,5),
labels=c("3gears","4gears","5gears"))
mtcars$am <- factor(mtcars$am,levels=c(0,1),
labels=c("Automatic","Manual"))
mtcars$cyl <- factor(mtcars$cyl,levels=c(4,6,8),
labels=c("4cyl","6cyl","8cyl"))
# Kernel density plots for mpg
# grouped by number of gears (indicated by color)
qplot(mpg, data=mtcars, geom="density", fill=gear, alpha=I(.5),
main="Distribution of Gas Milage", xlab="Miles Per Gallon",
ylab="Density")
but I'm not very familiar with the ggplot2 library and its graphs, and I would like some help adapting my code to make a graph similar to the last one.
ggplot() assumes that your data are in a particular format (sometimes called "long", but the author of ggplot() dislikes that description), so let's start by putting them into that format:
Data2 = data.frame(
theta = rep(theta, 3),
WhichDistribution = c(rep("Prior",length(theta)), rep("Post",length(theta)), rep("Likelihood",length(theta))),
Density = c(dgamma(theta,a,b), dgamma(theta,length(Data)+a,sum(Data)+b), f_theta(theta,Data=Data))
)
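The long layout can be checked in isolation before plotting. The following is a minimal self-contained sketch: it assumes `Data` holds the fifteen values printed in the question, and it uses `rep(..., each = )` as a shorter way to build the label column. It is an illustration of the layout, not a change to the approach.

```r
# The fifteen observations printed in the question
Data <- c(0.09733344, 0.17540020, 0.14168188, 0.54093074, 0.78151039,
          0.28068527, 1.96164429, 0.33743328, 0.05200734, 0.09103039,
          0.28842044, 0.09240131, 0.09143535, 0.38142022, 0.11700952)

# Unnormalised exponential likelihood, as defined in the question
f_theta <- function(theta, Data) {
  theta^length(Data) * exp(-theta * sum(Data))
}

theta <- seq(1, 20, length = 100)
a <- b <- 0.001

# One row per (theta, distribution) pair: 3 curves x 100 grid points
Data2 <- data.frame(
  theta = rep(theta, times = 3),
  WhichDistribution = rep(c("Prior", "Post", "Likelihood"),
                          each = length(theta)),
  Density = c(dgamma(theta, a, b),
              dgamma(theta, length(Data) + a, sum(Data) + b),
              f_theta(theta, Data = Data))
)

str(Data2)                       # inspect the three columns
table(Data2$WhichDistribution)   # rows per curve
```

Each curve contributes one block of rows, and the `WhichDistribution` column is what ggplot() later uses to tell the curves apart.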
Then we can construct a ggplot() command. ggplot() needs data, aesthetics, and a geometry. Your data will be the data frame just constructed. The aesthetics refer generally to how the qualities of the data will impact the graph (what is on axes, what determines groups, etc.), and the geometry is the kind of plot (not a great wording, sorry).
ggplot(Data2, aes(x=theta, y=Density, group=WhichDistribution, color=WhichDistribution, fill=WhichDistribution))+
# position="identity" in order to not stack the densities
geom_area(alpha=.2, position="identity") +
# gets rid of the title on the legend
theme(legend.title = element_blank())+
# make the horizontal axis label pretty
scale_x_continuous(expression(theta))
You can change alpha to adjust transparency. If you want the horizontal axis to not go all the way to 20, change it in scale_x_continuous():
ggplot(Data2, aes(x=theta, y=Density, group=WhichDistribution, color=WhichDistribution, fill=WhichDistribution))+
# position="identity" in order to not stack the densities
geom_area(alpha=.2, position="identity") +
# gets rid of the title on the legend
theme(legend.title = element_blank())+
# make the horizontal axis label pretty
scale_x_continuous(expression(theta), limits=c(0,7))
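One thing to be aware of: limits set on the scale drop the rows that fall outside the range before anything is drawn (ggplot2 warns about the removed rows). If you only want to zoom in, coord_cartesian() is an alternative that keeps all the data. A minimal sketch, using a small made-up data frame (`demo`, with hypothetical curves "A" and "B") in place of Data2:

```r
library(ggplot2)

# Hypothetical stand-in for Data2: two gamma densities on a grid
grid <- seq(0.05, 20, length.out = 200)
demo <- data.frame(
  theta = rep(grid, times = 2),
  WhichDistribution = rep(c("A", "B"), each = length(grid)),
  Density = c(dgamma(grid, 2, 1), dgamma(grid, 5, 1))
)

p <- ggplot(demo, aes(x = theta, y = Density,
                      color = WhichDistribution, fill = WhichDistribution)) +
  geom_area(alpha = .2, position = "identity") +
  theme(legend.title = element_blank()) +
  scale_x_continuous(expression(theta)) +
  # zoom to 0..7 without discarding the rows beyond 7
  coord_cartesian(xlim = c(0, 7))
p
```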
qplot() is a quick plotting function that seems to mostly get in the way for people trying to learn the ggplot() language, so you might want to avoid it.
I built everything in ggplot, and it was all working well. Now I need the chart to show data when I hover over a data point: in this example, the model (to identify the point) and disp and wt (the data on the axes).
To do this I mapped shape to the model (using the same shape for every model, since I do not actually want different shapes) and asked ggplot not to show shape in the legend. Then I converted the chart to plotly. The data now shows when I hover over the circles, but the legend shows colors and shapes separated by a comma...
I did not want to rebuild it from scratch in plotly, as I have no experience with plotly and this is part of a much larger shiny project, where the chart automatically adjusts the axis scales and adds trend lines, among other things (not included here for simplicity) that I do not know how to do in plotly.
Many thanks in advance. I have tried a million approaches over a couple of days and have not succeeded.
# choose mtcars data and add rowname as column as I want to link it to shapes in ggplot
data1 <- mtcars
data1$model <- rownames(mtcars)
# I turn cyl data to character as when charting it showed (Error: Continuous value supplied to discrete scale)
data1$cyl <- as.character(data1$cyl)
# linking colors with cylinders and shapes with models
ccolor <- c("#E57373","purple","green")
cylin <- c(6,4,8)
# I actually do not want shapes to be different, only want to show data of model when I point the data point.
models <- data1$model
sshapes <- rep(16,length(models))
# I am going to chart, do not want legend to show shape
graff <- ggplot(data1,aes(x=disp, y=wt,shape=model,col=cyl)) +
geom_point(size = 1) +
ylab ("eje y") + xlab('eje x') +
scale_color_manual(values= ccolor, breaks= cylin)+
scale_shape_manual(values = sshapes, breaks = models)+
guides(shape='none') # do not want shapes to show in legend
graff
The chart is fine, but after converting it with ggplotly() the legend is wrong:
# chart is fine, but when converting to ggplotly, I am having trouble with the legend
graffPP <- ggplotly(graff)
graffPP
The legend is not the same as it was in ggplot.
I succeeded in showing the model and the axis data when I hover over a data point in the chart, but now I am having problems with the legend.
To the best of my knowledge there is no easy out-of-the-box solution to achieve your desired result.
Using pure plotly you could achieve your result by assigning legendgroups, which to the best of my knowledge is not available through ggplotly. However, you could assign the legend groups manually by manipulating the plotly object returned by ggplotly.
Adapting my answer on this post to your case, you could achieve your desired result like so:
library(plotly)
p <- ggplot(data1, aes(x = disp, y = wt, shape = model, col = cyl)) +
geom_point(size = 1) +
ylab("eje y") +
xlab("eje x") +
scale_color_manual(values = ccolor, breaks = cylin) +
scale_shape_manual(values = sshapes, breaks = models) +
guides(shape = "none")
gp <- ggplotly(p = p)
# Get the names of the legend entries
df <- data.frame(id = seq_along(gp$x$data), legend_entries = unlist(lapply(gp$x$data, `[[`, "name")))
# Extract the group identifier, i.e. the number of cylinders from the legend entries
df$legend_group <- gsub("^\\((\\d+).*?\\)", "\\1", df$legend_entries)
# Add an indicator for the first entry per group
df$is_first <- !duplicated(df$legend_group)
for (i in df$id) {
# Is the layer the first entry of the group?
is_first <- df$is_first[[i]]
# Assign the group identifier to the name and legendgroup arguments
gp$x$data[[i]]$name <- df$legend_group[[i]]
gp$x$data[[i]]$legendgroup <- gp$x$data[[i]]$name
# Show the legend only for the first layer of the group
if (!is_first) gp$x$data[[i]]$showlegend <- FALSE
}
gp
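The grouping step can be tried on its own, without plotly. The sketch below assumes the trace names come back in the form "(colour level,shape level)"; the four names are made up for illustration and are not read from a real ggplotly object.

```r
# Hypothetical trace names in the assumed "(cyl,model)" form
legend_entries <- c("(6,Mazda RX4)", "(6,Mazda RX4 Wag)",
                    "(4,Datsun 710)", "(8,Hornet Sportabout)")

# Keep only the leading number, i.e. the number of cylinders
legend_group <- gsub("^\\((\\d+).*?\\)", "\\1", legend_entries)

# TRUE for the first trace of each group, FALSE for the rest
is_first <- !duplicated(legend_group)

data.frame(legend_entries, legend_group, is_first)
```

Only the traces flagged as first keep their legend entry, so each cylinder count appears once in the legend while every point keeps its hover text.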
I'm making very slow progress in R, but I'm now able to do some things.
Right now I'm plotting the effects of 4 treatments on plant growth in one graph. As you can see, the error bars overlap, which is why I gave them different colors. To make the graph clearer, I think it would be better to use the lower error bars as "half whiskers" for the lower two lines and the upper error bars for the top two lines (as I have now); see the attached image for reference.
Is that doable with the way my script is set up now?
Here is the part of my script where I specify the plot itself (leaving out the aesthetics and so on). Thanks in advance.
"soda1" is my altered data frame, set up in a clear way; "sdtv" holds the standard deviations for each timepoint/treatment; "oppervlak" is my y variable and "Measuring Date" is my x variable. "Tray ID" is the treatment, so it is my grouping variable.
p <- ggplot(soda1, aes(x=reorder(`Measuring Date`, oppervlak), y=`oppervlak`, group=`Tray ID`, fill=`Tray ID`, colour = `Tray ID` )) +
scale_fill_brewer(palette = "Spectral") +
geom_errorbar(data=soda1, mapping=aes(ymin=oppervlak, ymax=oppervlak+sdtv, group=`Tray ID`), width=0.1) +
geom_line(aes(linetype=`Tray ID`)) +
geom_point(mapping=aes(x=`Measuring Date`, y=oppervlak, shape=`Tray ID`))
print(p)
Showing only one side of errorbars can hide an overlap in the uncertainty between the distribution of two or more variables or measurements.
Instead of hiding this overlap, you could adjust the position of your errorbars horizontally very easily by adding position=position_dodge(width=) to your call to geom_errorbar().
For example:
library(ggplot2)
# some random data with two factors
df <- data.frame(a=rep(1:10, times=2),
b=runif(20),
treat=as.factor(rep(c(0,1), each=10)),
errormax=runif(20),
errormin=runif(20))
# plotting both sides of the errorbars, but dodging them horizontally
p <- ggplot(data=df, aes(x=a, y=b, colour=treat)) +
geom_line() +
geom_errorbar(data=df, aes(ymin=b-errormin, ymax=b+errormax),
position=position_dodge(width=0.25))
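In the example above only the error bars are dodged, so they sit slightly to the side of the line vertices. If that bothers you, one option is to pass the same position object to every layer. This is a sketch of that idea on the same kind of random data; the added geom_point() layer and the `width = 0.1` cap width are my own choices for illustration.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(a = rep(1:10, times = 2),
                 b = runif(20),
                 treat = as.factor(rep(c(0, 1), each = 10)),
                 errormax = runif(20),
                 errormin = runif(20))

# One shared dodge, so lines, points and error bars move together
pd <- position_dodge(width = 0.25)

p <- ggplot(df, aes(x = a, y = b, colour = treat, group = treat)) +
  geom_line(position = pd) +
  geom_point(position = pd) +
  geom_errorbar(aes(ymin = b - errormin, ymax = b + errormax),
                width = 0.1, position = pd)
p
```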
Does anyone know how to use ggplot() to redraw a ggplot2 example given in Quick-R? The example link is http://www.statmethods.net/advgraphs/ggplot2.html
I want to redraw the second (or fifth) graph on that webpage using ggplot() rather than qplot(). In particular, how do I get the same plot structure (3 by 3 grid of panels, the same labels, the same organization...)?
Specifically, the example is:
# ggplot2 examples
library(ggplot2)
# create factors with value labels
mtcars$gear <- factor(mtcars$gear,levels=c(3,4,5),
labels=c("3gears","4gears","5gears"))
mtcars$am <- factor(mtcars$am,levels=c(0,1),
labels=c("Automatic","Manual"))
mtcars$cyl <- factor(mtcars$cyl,levels=c(4,6,8),
labels=c("4cyl","6cyl","8cyl"))
# Scatterplot of mpg vs. hp for each combination of gears and cylinders
# in each facet, transmission type is represented by shape and color
qplot(hp, mpg, data=mtcars, shape=am, color=am,
facets=gear~cyl, size=I(3),
xlab="Horsepower", ylab="Miles per Gallon")
How can I draw the same picture without changing the data.frame structure? That is, I only want to replace the qplot() call with ggplot().
This is a way to reproduce the plot with ggplot:
library(ggplot2)
ggplot(aes(x=hp, y=mpg, shape=as.factor(am), color=as.factor(am)), data=mtcars)+
facet_grid(gear~cyl) +
geom_point(size=I(3)) +
xlab("Horsepower") +
ylab("Miles per Gallon")
Note that I replaced am with as.factor(am) since a continuous value cannot be mapped to a shape scale. If you want to change the legend title to am, like in the original plot, you have to add the following command to the plot:
guides(shape = guide_legend(title="am"),
color = guide_legend(title="am"))
What I really want to do is plot a histogram with the y-axis on a log scale. Obviously this is a problem with the ggplot2 geom_histogram, since the bottom of the bar is at zero, and the log of that gives you trouble.
My workaround is to use the freqpoly geom, which more or less does the job. The following code works just fine:
ggplot(zcoorddist) +
geom_freqpoly(aes(x=zcoord,y=..density..),binwidth = 0.001) +
scale_y_continuous(trans = 'log10')
The issue is that at the edges of my data I get a couple of garish vertical lines that really throw you off visually when combining a bunch of these freqpoly curves in one plot. What I'd like to be able to do is use points at every vertex of the freqpoly curve, with no lines connecting them. Is there a way to do this easily?
The easiest way to get the desired plot is to just recast your data. Then you can use geom_point. Since you don't provide an example, I used the standard example for geom_histogram to show this:
# load packages
require(ggplot2)
require(reshape)
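# get data (in recent ggplot2 versions the movies data set lives in the
# separate ggplot2movies package, so library(ggplot2movies) may be needed)
data(movies)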
movies <- movies[, c("title", "rating")]
# here's the equivalent of your plot
ggplot(movies) + geom_freqpoly(aes(x=rating, y=..density..), binwidth=.001) +
scale_y_continuous(trans = 'log10')
# recast the data
df1 <- recast(movies, value~., measure.var="rating")
names(df1) <- c("rating", "number")
# alternative way to recast data
df2 <- as.data.frame(table(movies$rating))
names(df2) <- c("rating", "number")
df2$rating <- as.numeric(as.character(df2$rating))
# plot
p <- ggplot(df1, aes(x=rating)) + scale_y_continuous(trans="log10", name="density")
# with lines
p + geom_linerange(aes(ymax=number, ymin=.9))
# only points
p + geom_point(aes(y=number))
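If you would rather keep the binning of the original freqpoly, another way to recast the data is to let base R's hist() compute the bins and then plot the bin midpoints as points. The sketch below uses simulated values in place of zcoorddist$zcoord (an assumption, since the real data was not posted) and an arbitrary number of breaks. Empty bins are dropped because their density of zero cannot be shown on a log scale.

```r
library(ggplot2)

set.seed(42)
z <- rnorm(5000)   # hypothetical stand-in for zcoorddist$zcoord

# Bin without plotting, then keep midpoints and densities
h <- hist(z, breaks = 60, plot = FALSE)
bins <- data.frame(zcoord = h$mids, density = h$density)

# log10(0) is -Inf, so remove empty bins before using a log scale
bins <- bins[bins$density > 0, ]

p <- ggplot(bins, aes(x = zcoord, y = density)) +
  geom_point() +
  scale_y_continuous(trans = "log10")
p
```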
What's the ggplot2 equivalent of "dotplot" histograms? With stacked points instead of bars? Similar to this solution in R:
Plot Histogram with Points Instead of Bars
Is it possible to do this in ggplot2? Ideally with the points shown as stacks and a faint line showing the smoothed line "fit" to these points (which would make a histogram shape.)
ggplot2 does dotplots with geom_dotplot(); see the manual for details.
Here is an example:
library(ggplot2)
set.seed(789); x <- data.frame(y = sample(1:20, 100, replace = TRUE))
ggplot(x, aes(y)) + geom_dotplot()
In order to make it behave like a simple dotplot, we should do this:
ggplot(x, aes(y)) + geom_dotplot(binwidth=1, method='histodot')
You should get this:
To address the density issue, you'll have to add another term, ylim(), so that your plot call will have the form ggplot() + geom_dotplot() + ylim()
More specifically, you'll write ylim(0, A), where A will be the number of stacked dots necessary to count 1.00 density. In the example above, the best you can do is see that 7.5 dots reach the 0.50 density mark. From there, you can infer that 15 dots will reach 1.00.
So your new call looks like this:
ggplot(x, aes(y)) + geom_dotplot(binwidth=1, method='histodot') + ylim(0, 15)
Which will give you this:
Usually, this kind of eyeball estimate will work for dotplots, but of course you can try other values to fine-tune your scale.
Notice how changing the ylim values doesn't affect how the data is displayed, it just changes the labels in the y-axis.
As @joran pointed out, we can use geom_dotplot:
require(ggplot2)
ggplot(mtcars, aes(x = mpg)) + geom_dotplot()
Edit: (moved useful comments into the post):
The label "count" is misleading because this is actually a density estimate; maybe you could suggest changing this label to "density" by default. The ggplot implementation of the dotplot follows the original one by Leland Wilkinson, so if you want to understand clearly how it works, take a look at this paper.
On whether there is an easy transformation to make the y axis actually show counts, i.e. "number of observations", the help page says:
When binning along the x axis and stacking along the y axis, the numbers on y axis are not meaningful, due to technical limitations of ggplot2. You can hide the y axis, as in one of the examples, or manually scale it to match the number of dots.
So you can use this code to hide y axis:
ggplot(mtcars, aes(x = mpg)) +
geom_dotplot(binwidth = 1.5) +
scale_y_continuous(name = "", breaks = NULL)
Here is an exact approach based on @Waldir Leoncio's latter method.
library(ggplot2); library(grid)
set.seed(789)
x <- data.frame(y = sample(1:20, 100, replace = TRUE))
g <- ggplot(x, aes(y)) + geom_dotplot(binwidth=0.8)
g # output to read parameter
### calculation of width and height of panel
grid.ls(view=TRUE, grob=FALSE)
real_width <- convertWidth(unit(1,'npc'), 'inch', TRUE)
real_height <- convertHeight(unit(1,'npc'), 'inch', TRUE)
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$x.range)
real_binwidth <- real_width / width_coordinate_range * 0.8 # 0.8 is the argument binwidth
num_balls <- real_height / 1.1 / real_binwidth # the number of stacked balls. 1.1 is expanding value.
# num_balls is the value of A
g + ylim(0, num_balls)
Apologies: I don't have enough reputation to comment.
I like cuttlefish44's "exact approach", but to make it work (with ggplot2 [2.2.1]) I had to change the following line from:
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$panel$ranges[[1]]$x.range)
to
### calculation of other values
width_coordinate_range <- diff(ggplot_build(g)$layout$panel_ranges[[1]]$x.range)
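Since the location of the panel ranges has moved between ggplot2 releases, a small helper that tries each known location in turn may save some trouble. This is only a sketch: `get_x_range` is a hypothetical name, and the `panel_params` location is what I understand ggplot2 3.0.0 and later to use, so check it against your installed version.

```r
library(ggplot2)

# Hypothetical helper: return the x range of the first panel,
# trying the locations used by different ggplot2 versions
get_x_range <- function(g) {
  b <- ggplot_build(g)
  candidates <- list(
    function() b$layout$panel_params[[1]]$x.range,  # assumed: ggplot2 >= 3.0.0
    function() b$layout$panel_ranges[[1]]$x.range,  # ggplot2 2.2.x
    function() b$panel$ranges[[1]]$x.range          # older versions
  )
  for (f in candidates) {
    r <- tryCatch(f(), error = function(e) NULL)
    if (!is.null(r)) return(r)
  }
  stop("could not find the panel x range in this ggplot2 version")
}

set.seed(789)
x <- data.frame(y = sample(1:20, 100, replace = TRUE))
g <- ggplot(x, aes(y)) + geom_dotplot(binwidth = 0.8)

width_coordinate_range <- diff(get_x_range(g))
width_coordinate_range
```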