I am trying to draw a line on top of a boxplot whose x values come from a factor.
This code works well:
x <- c(91,92,93,125,123,140)
y <- c(200,260,220,300,350,360)
d1 <- data.frame(x=x,y=y)
d1$f1 = factor(round(d1$x/10))
qplot(f1,y,data=d1,geom="boxplot")
d2<-data.frame(x2=c(90,140),y2=c(210,320))
qplot(x2,y2,data=d2,geom="line")
But when I try to add the line to the graph...
qplot(f1,y,data=d1,geom="boxplot") + geom_line(data = d2, aes(x = x2, y=y2))
To see my results: http://jeb-files.s3.amazonaws.com/Clipboard01.jpg
How do I manage to have my line align with my boxplot?
Thanks!
A boxplot requires the x-values to be factors, whereas a geom_line requires the x-values to be numeric. You can get what you want by modifying the geom_line call so that the x value is defined as the numeric version of the ordered factor obtained from round(x2/10):
qplot( f1,y,data=d1,geom="boxplot") +
geom_line(data = d2, aes(x = as.numeric(ordered(round(x2/10))), y=y2))
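One caveat, sketched with the question's own data: ordered(round(x2/10)) numbers the line's x values using only the values present in d2 (here 1 and 2), while f1 has three levels (9, 12 and 14), so the point at 140 would land on the second box rather than the third. Mapping the rounded values onto the levels of d1$f1 should keep the two aligned. The column name pos is my own choice, and a line value whose rounded form is not a level of f1 would come out as NA.

```r
library(ggplot2)

x <- c(91, 92, 93, 125, 123, 140)
y <- c(200, 260, 220, 300, 350, 360)
d1 <- data.frame(x = x, y = y)
d1$f1 <- factor(round(d1$x / 10))   # levels "9", "12", "14"
d2 <- data.frame(x2 = c(90, 140), y2 = c(210, 320))

# Position of each line point among the boxplot's factor levels
# (pos is a hypothetical helper column, not part of the original code)
d2$pos <- as.integer(factor(round(d2$x2 / 10), levels = levels(d1$f1)))

p <- ggplot(d1, aes(f1, y)) +
  geom_boxplot() +
  geom_line(data = d2, aes(x = pos, y = y2), inherit.aes = FALSE)
print(p)
```

With this data pos is 1 and 3, so the line runs from the first box to the last one.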
I'm generating violin plots in ggplot2 for a time series, year_1 to year_32. The years in my df are stored as numerical values. From the examples I've seen, it seems that I must convert these numerical year values to factors to plot one violin per year; and in fact, if I run the code without as.factor, I get one big fat violin. I would like to understand why geom_violin can't have numeric values on the x axis; or, if I'm wrong about that, how to use them.
So:
my_data$year <- as.factor(my_data$year)
p <- ggplot(data = my_data, aes(x = year, y = continuous_var)+
geom_violin(fill = "#FF0000", color = "#000000")+
ylim(0,500)+
labs(x = "x_label", y = "y_label")
p +my_theme()
works fine, but if I skip
my_data$year <- as.factor(my_data$year)
it doesn't work, I get one big fat violin for all years. Why?
TIA
You are missing a ) at the end of this line: p <- ggplot(data = my_data, aes(x = year, y = continuous_var)
I have constructed a reproducible example with the ToothGrowth dataset.
This should work now:
library(ggplot2)
my_data <- ToothGrowth
my_data$dose <- as.factor(my_data$dose)
p <- ggplot(data = my_data, aes(x = dose, y = len))+
geom_violin(fill = "#FF0000", color = "#000000")+
ylim(0,500)+
labs(x = "x_label", y = "y_label") +
theme_bw()
p
PS: this discussion would better fit Cross Validated, as it's more of a statistics question than a coding question.
I'm not 100% sure, but here's my explanation: a violin plot shows the density of a set of data, and you can split your data into groups to get one violin per group. But if the variable you use to define the groups (the x axis) is continuous, there is no finite set of groups to split on (one group for the values at 0, one for 0.1, one for 0.01, etc.), so in the end the data can't be divided, and ggplot probably ignores the x variable for grouping and makes one violin for all your data.
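For what it's worth, the ggplot2 documentation on grouping says the default group is built from the discrete variables in the plot, so a numeric x leaves every row in a single group. A minimal sketch, assuming the built-in ToothGrowth data with dose left numeric, of setting the group aesthetic explicitly instead of converting to a factor (n_violins is just a name I picked for the check):

```r
library(ggplot2)

my_data <- ToothGrowth   # dose stays numeric: 0.5, 1, 2

# group = dose tells ggplot2 which rows belong together,
# while x = dose keeps the axis continuous
p <- ggplot(my_data, aes(x = dose, y = len, group = dose)) +
  geom_violin(fill = "#FF0000", color = "#000000") +
  labs(x = "x_label", y = "y_label")

# Count the groups the violin layer actually computed
n_violins <- length(unique(layer_data(p)$group))
print(p)
```

The violins then sit at their true numeric positions (0.5, 1 and 2), so unevenly spaced years would be spaced unevenly on the axis too.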
I am trying to combine a line plot and horizontal barplot on the same plot. The difficult part is that the barplot is actually counts of the y values of the line plot.
Can someone show me how this can be done using the example below?
library(ggplot2)
library(plyr)
x <- c(1:100)
dff <- data.frame(x = x,y1 = sample(-500:500,size=length(x),replace=T), y2 = sample(3:20,size=length(x),replace=T))
counts <- ddply(dff, ~ y1, summarize, y2 = sum(y2))
# line plot
ggplot(data=dff) + geom_line(aes(x=x,y=y1))
# bar plot
ggplot() + geom_bar(data=counts,aes(x=y1,y=y2),stat="identity")
I believe what I need is presented in the pseudocode below but I do not know how to write it out in R.
Apologies. I actually meant the secondary x axis representing the value of counts for the barplot, while primary y-axis is the y1.
ggplot(data=dff) + geom_line(aes(x=x,y=y1)) + geom_bar(data=counts , aes(primary y axis = y1,secondary x axis =y2),stat="identity")
I just want the bars to be plotted horizontally, so I tried the code below, but it flips both the line chart and the barplot, which is not what I want either.
ggplot(data=dff) +
geom_line(aes(x=x,y=y1)) +
geom_bar(data=counts,aes(x=y2,y=y1),stat="identity") + coord_flip()
You can combine two plots in ggplot like you want by specifying different data = arguments in each geom_ layer (and none in the original ggplot() call).
ggplot() +
geom_line(data=dff, aes(x=x,y=y1)) +
geom_bar(data=counts,aes(x=y1,y=y2),stat="identity")
The following plot is the result. However, since x and y1 have different ranges, are you sure this is what you want?
Perhaps you want y1 on the vertical axis for both plots. Something like this works:
ggplot() +
geom_line(data=dff, aes(x=y1 ,y = x)) +
geom_bar(data=counts,aes(x=y1,y=y2),stat="identity", color = "red") +
coord_flip()
Maybe this is what you are looking for. Based on your last code, you want a double axis. Using dplyr, you can store the counts in the same data frame and then plot all the variables. Here is the code:
library(ggplot2)
library(dplyr)
#Data
x <- c(1:100)
dff <- data.frame(x = x,y1 = sample(-500:500,size=length(x),replace=T), y2 = sample(3:20,size=length(x),replace=T))
#Code
dff %>% group_by(y1) %>% mutate(Counts=sum(y2)) -> dff2
#Scale factor
sf <- max(dff2$y1)/max(dff2$Counts)
# Plot
ggplot(data=dff2)+
geom_line(aes(x=x,y=y1),color='blue',size=1)+
geom_bar(stat='identity',aes(x=x,y=Counts*sf),fill='tomato',color='black')+
scale_y_continuous(name="y1", sec.axis = sec_axis(~./sf, name="Counts"))
Output:
I'm trying to plot a line graph (data points between 0 and 2.5, with an interval of 0.5). I want to plot some bars in the same chart against the right-hand axis (between 0 and 60, with an interval of 10). I am making some mistake in my code, such that the bars get plotted against the left-hand axis.
Here's some sample data and code:
Month <- c("J","F","M","A")
Line <- c(2.5,2,0.5,3.4)
Bar <- c(30,33,21,40)
df <- data.frame(Month,Line,Bar)
ggplot(df, aes(x=Month)) +
geom_line(aes(y = Line,group = 1)) +
geom_col(aes(y=Bar))+
scale_y_continuous("Line",
sec.axis = sec_axis(trans= ~. /50, name = "Bar"))
Here's the output
Thanks in advance.
Try this approach with a scaling factor. It is better to compute a scaling factor between your two variables, multiply the bar values by it, and then use the same factor to define the second y-axis. I have made slight changes to your code:
library(tidyverse)
#Data
Month <- c("J","F","M","A")
Line <- c(2.5,2,0.5,3.4)
Bar <- c(30,33,21,40)
df <- data.frame(Month,Line,Bar)
#Scale factor
sfactor <- max(df$Line)/max(df$Bar)
#Plot
ggplot(df, aes(x=Month)) +
geom_line(aes(y = Line,group = 1)) +
geom_col(aes(y=Bar*sfactor))+
scale_y_continuous("Line",
sec.axis = sec_axis(trans= ~. /sfactor, name = "Bar"))
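The reason this works can be checked with plain arithmetic: sec_axis only relabels the primary axis, it never moves the data, so the bars have to be rescaled into Line units before drawing and the secondary axis has to apply the inverse. A small sketch with the same numbers (bar_drawn and bar_labels are names I made up for illustration):

```r
Line <- c(2.5, 2, 0.5, 3.4)
Bar  <- c(30, 33, 21, 40)

# Scale factor that maps the largest bar onto the largest line value
sfactor <- max(Line) / max(Bar)     # 3.4 / 40 = 0.085

# Heights actually drawn, expressed in Line units
bar_drawn <- Bar * sfactor

# What the secondary axis prints at those heights (~ . / sfactor)
bar_labels <- bar_drawn / sfactor

print(data.frame(Bar, bar_drawn, bar_labels))
```

In the original code the bars were drawn at their raw values (21 to 40) on the primary axis, and ~ . / 50 only changed the numbers printed on the right. As far as I know, recent ggplot2 releases call this sec_axis argument transform rather than trans; passing the formula as the first unnamed argument sidesteps the difference.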
Output:
I want to make a line chart in plotly whose color is not the same along its whole length. The color should follow a continuous scale. This is easy in ggplot2, but when I translate the plot to plotly using the ggplotly function, the variable determining the color behaves like a categorical variable.
require(dplyr)
require(ggplot2)
require(plotly)
df <- tibble(  # data_frame() is deprecated; tibble() is re-exported by dplyr
x = 1:15,
group = rep(c(1,2,1), each = 5),
y = 1:15 + group
)
gg <- ggplot(df) +
aes(x, y, col = group) +
geom_line()
gg # ggplot2
ggplotly(gg) # plotly
ggplot2 (desired):
plotly:
I found one work-around that, on the other hand, behaves oddly in ggplot2.
df2 <- df %>%
tidyr::crossing(col = unique(.$group)) %>%
mutate(y = ifelse(group == col, y, NA)) %>%
arrange(col)
gg2 <- ggplot(df2) +
aes(x, y, col = col) +
geom_line()
gg2
ggplotly(gg2)
I also did not find a way to do this in plotly directly. Maybe there is no solution at all. Any ideas?
It looks like ggplotly is treating group as a factor, even though it's numeric. You could use geom_segment as a workaround to ensure that segments are drawn between each pair of points:
gg2 = ggplot(df, aes(x,y,colour=group)) +
geom_segment(aes(x=x, xend=lead(x), y=y, yend=lead(y)))
gg2
ggplotly(gg2)
Regarding @rawr's (now deleted) comment, I think it would make sense to have group be continuous if you want to map line color to a continuous variable. Below is an extension of the OP's example to a group column that's continuous, rather than having just two discrete categories.
set.seed(49)
df3 <- tibble(  # data_frame() is deprecated; tibble() is re-exported by dplyr
x = 1:50,
group = cumsum(rnorm(50)),
y = 1:50 + group
)
Plot gg3 below uses geom_line, but I've also included geom_point. You can see that ggplotly is plotting the points. However, there are no lines, because no two points have the same value of group. If we hadn't included geom_point, the graph would be blank.
gg3 <- ggplot(df3, aes(x, y, colour = group)) +
geom_point() + geom_line() +
scale_colour_gradient2(low="red",mid="yellow",high="blue")
gg3
ggplotly(gg3)
Switching to geom_segment gives us the lines we want with ggplotly. Note, however, that line color will be based on the value of group at the first point in the segment (whether using geom_line or geom_segment), so there might be cases where you want to interpolate the value of group between each (x,y) pair in order to get smoother color gradations:
gg4 <- ggplot(df3, aes(x, y, colour = group)) +
geom_segment(aes(x=x, xend=lead(x), y=y, yend=lead(y))) +
scale_colour_gradient2(low="red",mid="yellow",high="blue")
ggplotly(gg4)
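The interpolation mentioned above could look roughly like this. It is only a sketch under my own assumptions: linear interpolation with base R's approx(), a grid ten times finer than the data, and helper names (fine, segs, gg5) that are not part of the original example. Building xend and yend with head() and tail() also avoids the NA that lead() leaves in the last row.

```r
library(ggplot2)

set.seed(49)
n <- 50
df3 <- data.frame(x = 1:n, group = cumsum(rnorm(n)))
df3$y <- df3$x + df3$group

# Grid ten times finer than the data (the factor 10 is an arbitrary choice)
fine_x <- seq(min(df3$x), max(df3$x), length.out = (n - 1) * 10 + 1)

# Linearly interpolate both y and the colour variable onto the grid
fine <- data.frame(
  x     = fine_x,
  y     = approx(df3$x, df3$y, xout = fine_x)$y,
  group = approx(df3$x, df3$group, xout = fine_x)$y
)

# One short segment per pair of neighbouring grid points,
# coloured by the interpolated value at its starting point
segs <- data.frame(
  x = head(fine$x, -1), xend = tail(fine$x, -1),
  y = head(fine$y, -1), yend = tail(fine$y, -1),
  group = head(fine$group, -1)
)

gg5 <- ggplot(segs, aes(x = x, y = y, xend = xend, yend = yend,
                        colour = group)) +
  geom_segment() +
  scale_colour_gradient2(low = "red", mid = "yellow", high = "blue")
print(gg5)
```

Passing gg5 to ggplotly() should then behave like gg4, only with ten colour steps per original segment instead of one.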
I am plotting a categorical variable (species, on Y) against a continuous variable (on X) with ggplot, and I would like the y-axis labels to appear in alphabetical order from top to bottom. Instead, the species that comes last alphabetically is at the top of the Y axis and the species that comes first is at the bottom.
Since I am new I cannot show images, but the plot is a list of species on the y axis, with one point per species at the corresponding x value (the mean) and its standard error bars. The species are shown with Wood Duck at the top and Alpine Swift at the bottom (with the ones in between in alphabetical order).
I would like to have the opposite (Alpine Swift at the top and Wood Duck at the bottom).
The command I used to plot the graph is the following:
# getting data for the error bars
limits<-aes(xmax=mydata$Xvalues+mydata$Xvalues_SD,
xmin=(mydata$Xvalues-mydata$Xvalues_SD))
# plot graph
graph<-ggplot(data=mydata,aes(x = Xvalues, y = species)) +
scale_y_discrete("Species") +
scale_x_continuous(" ")+geom_point()+theme_bw()+geom_errorbarh(limits)
I have tried sorting my data set before loading the data and drawing the graph.
I have also tried to reorder the species factor using the following command:
mydata$species <- ordered(mydata$species, levels=c("Alpine Swift","Azure-winged Magpie","Barn Swallow","Black-browed Albatross","Blue Tit1","Blue Tit2","Blue-footed Booby","Collared Flycatcher","Common Barn Owl","Common Buzzard","Eurasian Sparrowhawk","European Oystercatcher","Florida Scrub-Jay","Goshawk","Great Tit","Green Woodhoopoe","Grey-headed Albatross","House Sparrow","Indigo Bunting","Lesser Snow Goose","Long-tailed Tit","Meadow Pipit","Merlin","Mute Swan","Osprey","Pied Flycatcher","Pinyon Jay","Sheychelles Warbler","Short-tailed Shearwater","Siberian Jay","Tawny Owl","Ural Owl","Wandering Albatross","Western Gull1","Western Gull2","Wood Duck"))
But I am getting the same graph.
How can I change the order of my Y axis?
library(ggplot2)
df <- data.frame(x=rnorm(10),Species=LETTERS[1:10])
# default: the first level (A) is drawn at the bottom of the y axis
ggplot(df)+geom_point(aes(x=x,y=Species),size=3,color="red")
# reverse the level order so that A ends up at the top
df$Species <- factor(df$Species,levels=rev(unique(df$Species)))
ggplot(df)+geom_point(aes(x=x,y=Species),size=3,color="red")
If you want to put y in some other order, say order of decreasing x, do this:
df$Species <- factor(df$Species, levels=df[order(df$x,decreasing=T),]$Species)
ggplot(df)+geom_point(aes(x=x,y=Species),size=3,color="red")
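Why reordering the levels changes the axis can be seen without plotting anything: a discrete axis places each value at the integer position of its factor level, and on a y axis position 1 is at the bottom. A base R sketch with three made-up species names (f_default and f_reversed are illustrative names of mine):

```r
species <- c("Alpine Swift", "Merlin", "Wood Duck")

# Default: levels are sorted alphabetically
f_default <- factor(species)

# Same values, with the level order reversed
f_reversed <- factor(species, levels = rev(levels(f_default)))

# Integer codes are the axis positions, counted from the bottom
pos_default  <- setNames(as.integer(f_default), species)
pos_reversed <- setNames(as.integer(f_reversed), species)

print(pos_default)
print(pos_reversed)
```

With the default levels Alpine Swift gets position 1 (bottom); with the reversed levels it gets position 3 (top). This is also why ordered() with the levels listed alphabetically, as in the question, gives the same graph as before: the level order is unchanged.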
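Try changing +scale_y_discrete("Species") to +scale_y_discrete("Species", limits = rev). Discrete scales do not accept trans = 'reverse'; passing rev as limits reverses the order of the levels instead (this needs a ggplot2 version that accepts a function for limits).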
Using fct_rev() from package forcats and following jlhoward's example:
library(ggplot2)
library(forcats)
df <- data.frame(x = rnorm(10), Species = LETTERS[1:10])
# original plot
ggplot(df) +
geom_point(aes(x = x, y = Species), size = 3, color = "red")
# solution
ggplot(df) +
geom_point(aes(x = x, y = fct_rev(Species)), size = 3, color = "red")