Is it possible to plot a boxplot and a stripchart next to each other in the same figure? If I run this code, the stripchart is drawn over the boxplots. What I actually want is for them to sit next to each other, so that the figure has 10 columns on the x-axis. Is that possible?
boxplot(doubles[1:5,])
stripchart(doubles[6:10,],add=TRUE,vertical=TRUE, pch=19)
Some example of your data would be good, but the easiest option is probably:
#random data corresponding to your 5 columns
x <- data.frame(V = rnorm(100), W = rnorm(100), X = rnorm(100), Y = rnorm(100),
Z = rnorm(100))
#remove axes with 'axes=FALSE', define wider x-limits with 'xlim'
stripchart(x[1:5,],vertical=TRUE, pch=19,xlim=c(1,6),axes=FALSE)
#add boxplots next to stripchart, decrease width with 'boxwex'
boxplot(x[1:5,],add=TRUE,at=1.5:5.5,boxwex=0.25,axes=FALSE)
#add custom x axis
axis(1,at=1.25:5.25,labels=names(x))
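If the goal is literally ten positions on the x-axis (five boxplots followed by five stripcharts), a minimal base-graphics sketch could look like the following. It assumes that the groups are the columns of doubles, which the question does not confirm, and it uses a made-up matrix as a stand-in for the real data.

```r
# Hypothetical stand-in for 'doubles': 20 observations in each of 10 columns
set.seed(1)
doubles <- matrix(rnorm(200), ncol = 10,
                  dimnames = list(NULL, paste0("V", 1:10)))

# Boxplots for columns 1-5 at x = 1..5; xlim leaves room for positions 6..10
b <- boxplot(doubles[, 1:5], at = 1:5, xlim = c(0.5, 10.5),
             ylim = range(doubles), xaxt = "n")

# Stripcharts for columns 6-10 at x = 6..10, added to the same plot
stripchart(as.data.frame(doubles[, 6:10]), at = 6:10, add = TRUE,
           vertical = TRUE, pch = 19)

# One axis label per column
axis(1, at = 1:10, labels = colnames(doubles))
```

The at argument fixes where each group is drawn, so the two calls never overlap as long as their positions differ.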
Use ggplot2
library(ggplot2)
qplot(treatment, decrease, data = OrchardSprays) +
scale_y_log10() +
geom_boxplot() +
geom_point(colour = 'blue', alpha = 0.5)
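Note that qplot() has been deprecated since ggplot2 3.4.0. A roughly equivalent sketch with ggplot() is shown here; it draws a single point layer on top of the boxplots, whereas the qplot() call above also adds its own default point layer.

```r
library(ggplot2)

# OrchardSprays ships with base R (datasets package)
p <- ggplot(OrchardSprays, aes(x = treatment, y = decrease)) +
  geom_boxplot() +
  geom_point(colour = "blue", alpha = 0.5) +
  scale_y_log10()
print(p)
```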
I would like to extract the breaks and the colour values associated with a ggplot continuous colour scale. There are multiple answers to finding the colour associated with each data point (like this), which can also be used to get discrete scale values, but I haven't seen an approach for a continuous colour scale. I don't want to force the scales, just retrieve the values that ggplot generates.
example:
library(ggplot2)
df <- data.frame(x = 1:10, y = 1:10, col = 11:20)
ggplot(df) +
geom_point(aes(x = x, y = y, colour = col))
I would like to get a data frame showing breaks (12.5, 15, 17.5, 20) and the colour values associated with them.
Many thanks!
There are two ways of doing this: one that builds the plot and one that does not.
If we build the plot:
library(ggplot2)
df <- data.frame(x = 1:10, y = 1:10, col = 11:20)
ggplot(df) +
geom_point(aes(x = x, y = y, colour = col))
We can extract the scale and use it to retrieve the relevant information.
# Using build plot
build <- ggplot_build(last_plot())
scale <- build$plot$scales$get_scales("colour")
breaks <- scale$get_breaks()
colours <- scale$map(breaks)
data.frame(breaks = breaks, colours = colours)
#> breaks colours
#> 1 NA grey50
#> 2 12.5 #1D3F5E
#> 3 15.0 #2F638E
#> 4 17.5 #4289C1
#> 5 20.0 #56B1F7
Alternatively, we can skip building the plot and use the scale itself directly, provided we 'train' the scale by showing it the limits of the data.
scale <- scale_colour_continuous()
scale$train(range(df$col))
breaks <- scale$get_breaks()
colours <- scale$map(breaks)
data.frame(breaks = breaks, colours = colours)
As you can see, the default breaks algorithm produces an out-of-bounds break. If you want to use the information later on, it might be good to filter those out.
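A minimal sketch of that filtering step, assuming the out-of-bounds break shows up as NA as in the output above (the names keep and result are only illustrative):

```r
library(ggplot2)

df <- data.frame(x = 1:10, y = 1:10, col = 11:20)

# Train a default continuous colour scale on the data range
scale <- scale_colour_continuous()
scale$train(range(df$col))

# Drop breaks that fall outside the limits (reported as NA)
breaks <- scale$get_breaks()
keep <- !is.na(breaks)
result <- data.frame(breaks = breaks[keep],
                     colours = scale$map(breaks[keep]))
result
```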
I'm generating violin plots in ggplot2 for a time series, year_1 to year_32. The years in my df are stored as numerical values. From the examples I've seen, it seems that I must convert these numerical year values to factors to plot one violin per year; and in fact, if I run the code without as.factor, I get one big fat violin. I would like to understand why geom_violin can't have numeric values on the x axis, or, if I'm wrong about that, how to use them.
So:
my_data$year <- as.factor(my_data$year)
p <- ggplot(data = my_data, aes(x = year, y = continuous_var)+
geom_violin(fill = "#FF0000", color = "#000000")+
ylim(0,500)+
labs(x = "x_label", y = "y_label")
p +my_theme()
works fine, but if I skip
my_data$year <- as.factor(my_data$year)
it doesn't work, I get one big fat violin for all years. Why?
TIA
You are missing a ) at the end of this line: p <- ggplot(data = my_data, aes(x = year, y = continuous_var)
I have constructed a reproducible example with the ToothGrowth dataset.
This should work now:
library(ggplot2)
my_data <- ToothGrowth
my_data$dose <- as.factor(my_data$dose)
p <- ggplot(data = my_data, aes(x = dose, y = len))+
geom_violin(fill = "#FF0000", color = "#000000")+
ylim(0,500)+
labs(x = "x_label", y = "y_label") +
theme_bw()
p
PS: this discussion would better fit Cross Validated, as it's more of a statistics question than a coding question.
I'm not 100% sure, but here's my explanation: a violin plot shows the density of a set of data, and you can divide your data into groups to plot one violin per group. But if the variable used to define the groups (the x axis) is continuous, there are infinitely many possible groupings (one for the values at 0, one for 0.1, one for 0.01, etc.), so the data cannot actually be divided, and ggplot probably ignores the x variable for grouping and makes one violin for all your data.
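One way to keep the x variable numeric, offered here as a sketch rather than as part of the answers above, is to state the grouping explicitly with the group aesthetic. The example uses ToothGrowth, where dose is numeric with three distinct values.

```r
library(ggplot2)

my_data <- ToothGrowth  # dose stays numeric: 0.5, 1, 2

# group = dose tells ggplot which rows belong to the same violin,
# while x keeps its continuous scale
p <- ggplot(my_data, aes(x = dose, y = len, group = dose)) +
  geom_violin(fill = "#FF0000", color = "#000000")
print(p)
```

With a numeric x the violins are placed at their true positions, so unevenly spaced years or doses appear unevenly spaced on the axis.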
I am trying to combine a line plot and horizontal barplot on the same plot. The difficult part is that the barplot is actually counts of the y values of the line plot.
Can someone show me how this can be done using the example below ?
library(ggplot2)
library(plyr)
x <- c(1:100)
dff <- data.frame(x = x,y1 = sample(-500:500,size=length(x),replace=T), y2 = sample(3:20,size=length(x),replace=T))
counts <- ddply(dff, ~ y1, summarize, y2 = sum(y2))
# line plot
ggplot(data=dff) + geom_line(aes(x=x,y=y1))
# bar plot
ggplot() + geom_bar(data=counts,aes(x=y1,y=y2),stat="identity")
I believe what I need is presented in the pseudocode below but I do not know how to write it out in R.
Apologies. I actually meant the secondary x axis representing the value of counts for the barplot, while primary y-axis is the y1.
ggplot(data=dff) + geom_line(aes(x=x,y=y1)) + geom_bar(data=counts , aes(primary y axis = y1,secondary x axis =y2),stat="identity")
I just want the bars to be plotted horizontally, so I tried the code below, which flips both the line chart and the barplot; that is also not what I wanted.
ggplot(data=dff) +
geom_line(aes(x=x,y=y1)) +
geom_bar(data=counts,aes(x=y2,y=y1),stat="identity") + coord_flip()
You can combine two plots in ggplot like you want by specifying different data = arguments in each geom_ layer (and none in the original ggplot() call).
ggplot() +
geom_line(data=dff, aes(x=x,y=y1)) +
geom_bar(data=counts,aes(x=y1,y=y2),stat="identity")
The following plot is the result. However, since x and y1 have different ranges, are you sure this is what you want?
Perhaps you want y1 on the vertical axis for both plots. Something like this works:
ggplot() +
geom_line(data=dff, aes(x=y1 ,y = x)) +
geom_bar(data=counts,aes(x=y1,y=y2),stat="identity", color = "red") +
coord_flip()
Maybe you are looking for this. Based on your last code, you want a double axis. Using dplyr you can store the counts in the same data frame and then plot all variables. Here is the code:
library(ggplot2)
library(dplyr)
#Data
x <- c(1:100)
dff <- data.frame(x = x,y1 = sample(-500:500,size=length(x),replace=T), y2 = sample(3:20,size=length(x),replace=T))
#Code
dff %>% group_by(y1) %>% mutate(Counts=sum(y2)) -> dff2
#Scale factor
sf <- max(dff2$y1)/max(dff2$Counts)
# Plot
ggplot(data=dff2)+
geom_line(aes(x=x,y=y1),color='blue',size=1)+
geom_bar(stat='identity',aes(x=x,y=Counts*sf),fill='tomato',color='black')+
scale_y_continuous(name="y1", sec.axis = sec_axis(~./sf, name="Counts"))
Output:
I'm trying to plot a line graph (data points between 0 and 2.5, with interval of 0.5). I want to plot some bars in the same chart on the right-hand axis (between 0 and 60 with interval of 10). I am making some mistake in my code such that the bars get plotted in the left hand axis.
Here's some sample data and code:
Month <- c("J","F","M","A")
Line <- c(2.5,2,0.5,3.4)
Bar <- c(30,33,21,40)
df <- data.frame(Month,Line,Bar)
ggplot(df, aes(x=Month)) +
geom_line(aes(y = Line,group = 1)) +
geom_col(aes(y=Bar))+
scale_y_continuous("Line",
sec.axis = sec_axis(trans= ~. /50, name = "Bar"))
Here's the output
Thanks in advance.
Try this approach with a scaling factor. It is better to compute a scaling factor between your two variables and then use it for the second y-axis. I have made slight changes to your code:
library(tidyverse)
#Data
Month <- c("J","F","M","A")
Line <- c(2.5,2,0.5,3.4)
Bar <- c(30,33,21,40)
df <- data.frame(Month,Line,Bar)
#Scale factor
sfactor <- max(df$Line)/max(df$Bar)
#Plot
ggplot(df, aes(x=Month)) +
geom_line(aes(y = Line,group = 1)) +
geom_col(aes(y=Bar*sfactor))+
scale_y_continuous("Line",
sec.axis = sec_axis(trans= ~. /sfactor, name = "Bar"))
Output:
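To see why the scaling factor works, the arithmetic can be checked on its own with the same sample data. This is only a sketch; the names scaled and recovered are illustrative. The bars are multiplied by sfactor so they fit the left axis, and the secondary axis divides by sfactor to label them in the original units.

```r
Line <- c(2.5, 2, 0.5, 3.4)
Bar <- c(30, 33, 21, 40)

# Ratio of the two maxima: 3.4 / 40 = 0.085
sfactor <- max(Line) / max(Bar)

# Bar heights as drawn on the primary (Line) axis
scaled <- Bar * sfactor

# What the secondary axis shows after applying ~ . / sfactor
recovered <- scaled / sfactor

# The tallest bar now reaches the top of the line data,
# and the inverse transform returns the original values
all.equal(max(scaled), max(Line))
all.equal(recovered, Bar)
```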
I am trying to create a graph where, because there are so many points, the edges of the green region fade to black while the center stays green. The code I am currently using to create this graph is:
plot(snb$px,snb$pz,col=snb$event_type,xlim=c(-2,2),ylim=c(1,6))
I looked into contour plotting but that did not work for this. The coloring variable is a factor variable.
Thanks!
This is a great problem for ggplot2.
First, read the data in:
snb <- read.csv('MLB.csv')
With your data frame you could try plotting points that are partly transparent, and setting them to be colored according to the factor event_type:
require(ggplot2)
p1 <- ggplot(data = snb, aes(x = px, y = pz, color = event_type)) +
geom_point(alpha = 0.5)
print(p1)
and then you get this:
Or, you might want to think about plotting this as a heatmap using geom_bin2d(), and plotting facets (subplots) for each different event_type, like this:
p2 <- ggplot(data = snb, aes(x = px, y = pz)) +
geom_bin2d(binwidth = c(0.25, 0.25)) +
facet_wrap(~ event_type)
print(p2)
which makes a plot for each level of the factor, where the color shows the number of data points in each bin, and each bin is 0.25 units on each side. But if you have more than about 5 or 6 levels, this might look pretty bad. From the small data sample you supplied, I got this
If the levels of the factors don't matter, there are some nice examples here of plots with too many points. You could also try looking at some of the examples on the ggplot website or the R cookbook.
Transparency could help, which is easily achieved, as @BenBolker points out, with adjustcolor:
colvec <- adjustcolor(c("black", "green"), alpha.f = 0.2)
plot(snb$px, snb$pz,
col = colvec[snb$event_type],
xlim = c(-2,2),
ylim = c(1,6))
It's built in to ggplot:
require(ggplot2)
p <- ggplot(data = snb, aes(x = px, y = pz, color = event_type)) +
geom_point(alpha = 0.2)
print(p)
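For the base-graphics version, it may help to see what adjustcolor returns and how indexing by a factor picks one colour per point. The event_type vector here is a made-up stand-in, since the real data file is not shown.

```r
# Semi-transparent versions of the two colours (alpha channel = 20%)
colvec <- adjustcolor(c("black", "green"), alpha.f = 0.2)

# Hypothetical factor standing in for snb$event_type
event_type <- factor(c("ball", "strike", "strike", "ball"))

# Indexing by a factor uses its integer codes:
# level 1 ("ball") gets colvec[1], level 2 ("strike") gets colvec[2]
point_cols <- colvec[event_type]
point_cols
```

Because overlapping semi-transparent points accumulate, dense regions appear more saturated than sparse ones.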