I am currently trying to plot some data with dots and lines. My dataframe has its own column (FarbDots) in which I specify the colours I want. When I try to plot the data, geom_point() takes the colours in the wanted order, while geom_line() creates a total mess (see image).
I was not able to recreate the same effect in a sample data set. Any idea on how to get my colours in order while still specifying them within the geom_line()/ geom_point()?
This is the code I used for plotting: (with b specifying the dataset, x, y, and groups)
b +
geom_line(colour=Data_Biol_long$FarbDots)+
geom_point(colour=Data_Biol_long$FarbDots)+
scale_y_log10()+
facet_grid(Analysis~., scale='free')
dots and lines should receive colour from same vector?!
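A likely cause is that geom_line() sorts the rows by x within each group before drawing, so a colour vector passed outside aes() no longer lines up with the rows it was meant for. Below is a minimal sketch of one workaround, assuming FarbDots holds valid colour names: map the column inside aes() and add scale_colour_identity(). The toy data frame and all its column names other than FarbDots are made up for illustration and stand in for Data_Biol_long.

```r
library(ggplot2)

# Hypothetical stand-in for Data_Biol_long; only FarbDots is from the question
toy <- data.frame(
  x = rep(1:4, times = 2),
  y = c(1, 10, 100, 50, 2, 20, 200, 80),
  grp = rep(c("a", "b"), each = 4),
  FarbDots = rep(c("red", "blue"), each = 4),
  stringsAsFactors = FALSE
)

# Mapping the colour column inside aes() keeps each colour attached to its row,
# even after a geom reorders the data; scale_colour_identity() uses the values
# in the column as literal colours instead of assigning a palette.
p <- ggplot(toy, aes(x = x, y = y, group = grp, colour = FarbDots)) +
  geom_line() +
  geom_point() +
  scale_colour_identity() +
  scale_y_log10()
```

Printing p should draw lines and points in the same colours; if a line group contains more than one colour, each segment takes the colour of its starting row.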
I am trying to plot a distribution in R using the package vioplot; my plot consists of a scatterplot of points with violin plots (representing 'bins' of these points) plotted over the top of the scatterplots.
However, different methods of plotting my data result in slightly different characteristics in the plot. If all the violin plots are plotted using a loop, the violin plot tails will stretch down to the lowest points, but if plotted individually, the violin plot tails won't reach to the outliers. Additionally, resizing the plot window (and then re-plotting) also changes how the tails of the violin plots appear.
Because I'm getting these differing plots, I'm wondering how to tell which plot is the correct representation of the data, and how to produce a consistent result. I've used the 'range' and 'coef' arguments in vioplot to make the plots more consistent, but this hasn't worked.
Thanks!
Maybe I am wrong, but I think the violin itself is not that strictly defined and is mainly an easy-to-read representation of the data. The boxplot, on the other hand (which vioplot also draws inside the violin), is much more important, as its box shows the 25th, 50th and 75th percentiles (though in vioplot the 50th is a white dot for some reason). The whiskers depend on how you plot, but in the case of vioplot I think they are the 95th and 5th percentiles.
If you want higher customizability, use ggplot:
library(reshape2) #for melt()
library(ggplot2)
uniform<-runif(200,-4,4)
normal<-rnorm(200,0,3)
df <- melt(data.frame(x=normal, y=uniform)) #no pipe, so no extra package is needed
ggplot(df, aes(x=variable, y=value)) +
geom_violin() +
geom_boxplot()
and if you want to plot all the data points instead of just the outliers, you can use ggbeeswarm for that, while not showing the outliers from geom_boxplot:
library(ggbeeswarm)
ggplot(df, aes(x=variable, y=value)) +
geom_violin() +
geom_boxplot(outlier.alpha = 0) +
geom_beeswarm()
I want to be able to create a bar graph which also shows the mean value for the bars in each group, and which shows the mean line in the legend.
I have been able to get this graph (image: Bar chart with means) using the code below, which is fine, but I would like to be able to see the mean lines in the legend.
##The data to be graphed is the proportion of persons receiving a treatment
## (num=numerator) in each population (denom=denominator). The population is
##grouped by two age groups (Age) and further divided by a categorical
##variable V1
###SET UP DATAFRAME###
require(ggplot2)
df <- data.frame(V1 = c(rep(c("S1","S2","S3","S4","S5"),2)),
Age= c(rep(70,5),rep(80,5)),
num=c(5280,6570,5307,4894,4119,3377,4244,2999,2971,2322),
denom=c(9984,12600,9425,8206,7227,7290,8808,6386,6206,5227))
df$prop<-df$num/df$denom*100
PopMean<-sum(df$num)/sum(df$denom)*100
df70<-df[df$Age==70,]
group70mean<-sum(df70$num)/sum(df70$denom)*100
df80<-df[df$Age==80,]
group80mean<-sum(df80$num)/sum(df80$denom)*100
df$PopMean<-c(rep(PopMean,10))
df$groupmeans<-c(rep(group70mean,5),rep(group80mean,5))
I want the plot to look like this, but want the lines in the legend too, to be labelled as 'mean of group' or similar.
#basic plot
P<-ggplot(df, aes(x=factor(Age), y=prop, fill=factor(V1))) +
geom_bar(position=position_dodge(), colour='black',stat="identity")
P
####add mean lines
P+geom_errorbar(aes(y=df$groupmeans, ymax=df$groupmeans,
ymin=df$groupmeans), col="red", lwd=2)
Adding show.legend=TRUE overlays the error bars onto the factor legend, rather than separately. If there is a way of showing geom_errorbar separately in the legend this is probably the simplest solution.
I have also tried various things with geom_line.
The syntax below produces a line for the population mean value, but it runs from the centre of each point rather than covering the width of the bars. It does produce a legend, but one showing a bar of colour rather than a line.
P+geom_line(aes(y=df$PopMean, group=df$PopMean, color=df$PopMean),lwd=1)
If I try to draw lines for the group means, the lines are not visible (because they are only single points).
P+geom_line(aes(y=df$groupmeans, group=df$groupmeans, color=df$groupmeans))
I also tried to get round this with facet plot, although this requires me to pretend my categorical variable is numeric to get it to work.
###set up new df
df2<-df
df2$V1<-c(rep(c(1,2,3,4,5),2))
P<-ggplot(df2, aes(x=factor(V1), y=prop, fill=factor(V1))) +
geom_bar(position=position_dodge(),
colour='black',stat="identity",width=1)
P+facet_grid(.~factor(df2$Age))
P+facet_grid(.~factor(df2$Age))+geom_line(aes(y=df$groupmeans,
group=df$groupmeans, color=df$groupmeans))
(image: Facetplot)
This allows me to show the mean lines using geom_line, so a legend does appear (although it doesn't look right, showing a colour gradient rather than coloured lines!). However, the lines still do not go the full width of the bars. Also, my x-axis now needs relabelling to show S1, S2 etc. rather than the numeric 1, 2, 3.
To sum up - is there a way of showing error bar lines separately in the legend?
If not, then, if I use faceting, how do I correct the legend appearance and relabel the axes with my categorical variables, and is it possible to get the line to go the full width of the plot?
Or is there an alternate solution that I am missing!?
Thanks
To get the legend for geom_errorbar you need to pass the colour argument inside aes().
As you want only one category (here red), I've created a dummy variable first:
df$mean <- "Mean"
ggplot(df, aes(x=factor(Age), y=prop, fill=factor(V1))) +
geom_bar(position=position_dodge(), colour='black',stat="identity") +
geom_errorbar(aes (ymax=groupmeans,
ymin=groupmeans, colour=mean), lwd=2) +
scale_colour_manual(name="",values = "#ff0000")
dateVec <- as.Date(c("08-01-2015","08-02-2015","08-03-2015","08-04-2015","08-05-2015"),format="%m-%d-%Y")
myData <- data.frame(dat=c(.1,.2,-.1,1,.1),
dates=dateVec,
indicator=c(0,0,0,1,0))
ggplot(myData,aes(x=dates,y=dat)) + geom_point()
I manually altered the plot here to shade the area around the datapoint with the highest value, where 'indicator' = 1.
How could I create this shading in ggplot automatically? Ideally I'd like the shaded area to have width, even though the x value is categorical. I've played with coloring the geom_point objects themselves according to the indicator, and while that works it doesn't really pop visually the way I would like it to.
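One possible approach, sketched under assumptions rather than offered as the definitive answer: build a small data frame with one row per flagged date and draw it with geom_rect() behind the points. The half-day padding on each side, the grey fill and the names shade and p_shade are arbitrary choices made for illustration.

```r
library(ggplot2)

dateVec <- as.Date(c("08-01-2015", "08-02-2015", "08-03-2015",
                     "08-04-2015", "08-05-2015"), format = "%m-%d-%Y")
myData <- data.frame(dat = c(.1, .2, -.1, 1, .1),
                     dates = dateVec,
                     indicator = c(0, 0, 0, 1, 0))

# One row per date where indicator == 1; the +/- 0.5 day padding is an assumption
flagged <- myData$dates[myData$indicator == 1]
shade <- data.frame(xmin = flagged - 0.5, xmax = flagged + 0.5)

# Draw the rectangle first so the points sit on top of it;
# -Inf/Inf make the band span the full height of the panel
p_shade <- ggplot(myData, aes(x = dates, y = dat)) +
  geom_rect(data = shade,
            aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf),
            inherit.aes = FALSE, fill = "grey70", alpha = 0.5) +
  geom_point()
```

Because shade is derived from the indicator column, the band should follow the data if a different row (or several rows) is flagged.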
I have a dataset that looks pretty much like this one from diamonds:
diamonds2 = subset(diamonds, cut!='Good' & cut!='Very Good', -c(table, x, y, z, clarity, depth, price))
I want to make a boxplot like this one:
ggplot(diamonds2, aes(x=color, y=carat, col=cut))+geom_boxplot()
And the hard question comes here. My idea is to perform pairwise wilcox.test for each distribution of the variable y (carat) by group (cut) and for each of the columns (color).
library(plyr)
ddply(diamonds2,"color",
function(x) {
w <- wilcox.test(carat~cut,data=diamonds2)
with(w,data.frame(statistic,p.value))
})
The code fails because it is asking for 2 levels (obviously). I can make a subset before applying the function (to remove one of the "cut" levels), but it's not giving me what I want and I can't understand why.
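Two things seem worth checking here. First, the function passed to ddply() tests the whole of diamonds2 rather than its argument x, so every colour would get the same result. Second, wilcox.test() with a formula needs exactly two groups, whereas cut still has three. A minimal sketch of one way around both, using base R's split() and pairwise.wilcox.test() in place of ddply(); the Holm adjustment and the name res are assumptions for illustration, not something the question specifies.

```r
library(ggplot2)  # only needed for the diamonds data

diamonds2 <- subset(diamonds, cut != "Good" & cut != "Very Good",
                    select = -c(table, x, y, z, clarity, depth, price))
# Drop the now-empty "Good" and "Very Good" levels
diamonds2$cut <- droplevels(diamonds2$cut)

# For each colour, test every pair of cuts on that colour's rows only.
# Each list element is a matrix of adjusted p-values (one cell per pair).
res <- lapply(split(diamonds2, diamonds2$color), function(d) {
  pairwise.wilcox.test(d$carat, d$cut, p.adjust.method = "holm")$p.value
})

res[["D"]]  # p-values for the three cut pairs within colour D
```

The resulting p-values could then be reshaped into a data frame and passed to geom_text() to place the asterisks.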
Additionally I would like to plot the results as asterisks of the color between the two distributions I'm comparing.
In the first boxplot (D), I would like to plot 3 asterisks: a purple one (red and blue are significantly different), a yellow one and a cyan one.
About the asterisk color plotting I've been playing a bit with the function geom_text from ggplot2 but I can't figure out how to plot below the X axis or plot text in different colors.