I'm trying to build some kind of profile diagram with ggplot2. I therefore want a line which connects the means in the plot. As you see, geom_line doesn't work here because it only connects the points within each factor level but not the means between factor levels.
Here's a small example:
library(ggplot2)
df <- data.frame(variable=rep(1:3,each=10),value=rnorm(30))
p <- ggplot(df,aes(factor(variable),value))
# fun.y was renamed to fun in ggplot2 3.3.0
p + stat_summary(fun=mean, geom="point")+coord_flip()+geom_line()
Does anyone have an idea how to achieve that?
Thank you in advance!
It is often easier to summarize the data before you plot. Something like
library(plyr)  # provides ddply
summarydf <- ddply(df,.(variable),summarize, value = mean(value))
The next trick is to use group within the call to geom_line to override the default grouping by factor(variable)
p <- ggplot(summarydf,aes(factor(variable),value)) +
geom_point() + geom_line(aes(group=1)) + coord_flip()
p
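If you would rather skip the separate summary step, stat_summary can draw the connecting line itself when it is given the same group=1 override. A minimal sketch, assuming ggplot2 3.3.0 or later (where the argument is fun rather than fun.y); the set.seed call is only there to make the example reproducible:

```r
library(ggplot2)

set.seed(1)  # reproducible example data
df <- data.frame(variable = rep(1:3, each = 10), value = rnorm(30))

p <- ggplot(df, aes(factor(variable), value)) +
  # one point per factor level, placed at the mean
  stat_summary(fun = mean, geom = "point") +
  # group = 1 puts all means in a single group so one line connects them
  stat_summary(fun = mean, geom = "line", aes(group = 1)) +
  coord_flip()
p
```

Both layers compute the means from the raw data, so the original df can be plotted directly.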
I want to compare two histograms in one graph in R, but I couldn't work out how to implement it.
My histograms are based on two sub-data frames, which are split from the full dataset according to a type (Action, Adventure Family)
My first histogram is:
split_action <- split(df, df$type)
dataset_action <- split_action$Action
hist(dataset_action$year)
split_adventure <- split(df, df$type)
dataset_adventure <- split_adventure$Adventure
hist(dataset_adventure$year)
I want to see how much the two overlap, comparing them by year in the same histogram. Thank you in advance.
Using the iris dataset, suppose you want to make a histogram of sepal length for each species. First, you can make one data frame per species by subsetting.
irissetosa<-subset(iris,Species=='setosa',select=c('Sepal.Length','Species'))
irisversi<-subset(iris,Species=='versicolor',select=c('Sepal.Length','Species'))
irisvirgin<-subset(iris,Species=='virginica',select=c('Sepal.Length','Species'))
and then, make the histogram for these 3 data frames. Don't forget to set the argument "add" as TRUE (for the second and third histogram), because you want to combine the histograms.
hist(irissetosa$Sepal.Length,col='red')
hist(irisversi$Sepal.Length,col='blue',add=TRUE)
hist(irisvirgin$Sepal.Length,col='green',add=TRUE)
you will have something like this
Then you can see which part is overlapping...
But, I know, it's not so good.
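Part of the reason the overlay is hard to read is that each hist() call chooses its own breaks, the axis limits come from the first histogram only, and opaque fills hide the bars drawn earlier. A minimal base-graphics sketch that addresses those three points; the 0.25 bin width and the semi-transparent colours are arbitrary choices for illustration:

```r
sl <- iris$Sepal.Length

# Shared breaks that cover the whole range, so bars line up across species
breaks <- seq(floor(min(sl)), ceiling(max(sl)), by = 0.25)

# Compute the three histograms without drawing them yet
h <- lapply(split(sl, iris$Species), hist, breaks = breaks, plot = FALSE)

# Tallest bar over all species, so nothing is clipped
ymax <- max(sapply(h, function(x) max(x$counts)))

# Semi-transparent fills (4th argument of rgb is alpha)
cols <- c(rgb(1, 0, 0, 0.4), rgb(0, 0, 1, 0.4), rgb(0, 1, 0, 0.4))

plot(h[[1]], col = cols[1], xlim = range(breaks), ylim = c(0, ymax),
     main = "Sepal length by species", xlab = "Sepal.Length")
plot(h[[2]], col = cols[2], add = TRUE)
plot(h[[3]], col = cols[3], add = TRUE)
legend("topright", legend = names(h), fill = cols)
```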
Another way to see which part is overlapping is by using density function.
plot(density(irissetosa$Sepal.Length),col='red')
lines(density(irisversi$Sepal.Length),col='blue')
lines(density(irisvirgin$Sepal.Length),col='green')
Then you will have something like this
Hope it helps!!
You don't need to split the data if using ggplot. The key is to use transparency ("alpha") and change the value of the "position" argument to "identity" since the default is "stack".
Using the iris dataset:
library(ggplot2)
ggplot(data=iris, aes(x=Sepal.Length, fill=Species)) +
geom_histogram(binwidth=0.2, alpha=0.5, position="identity") +
theme_minimal()
It's not easy to see the overlap, so a density plot may be a better choice if that's the main objective. Again, use transparency to avoid obscuring overlapping plots.
ggplot(data=iris, aes(x=Sepal.Length, fill=Species)) +
geom_density(alpha=0.5) +
xlim(3.9,8.5) +
theme_minimal()
So for your data, the command would be something like this:
ggplot(data=df, aes(x=year, fill=type)) +
geom_histogram(alpha=0.5, position="identity")
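Since the question's data are not shown, here is a self-contained sketch of the same call on made-up data. The data frame below is purely hypothetical (random years for three types); only the column names year and type come from the question. It also shows how to restrict the plot to the two types being compared without calling split():

```r
library(ggplot2)

set.seed(42)
# Hypothetical stand-in for the asker's df: one row per film
df <- data.frame(
  type = rep(c("Action", "Adventure", "Family"), each = 100),
  year = c(round(rnorm(100, 2000, 8)),
           round(rnorm(100, 1995, 10)),
           round(rnorm(100, 1990, 12)))
)

# Keep only the two types to compare
two <- subset(df, type %in% c("Action", "Adventure"))

p <- ggplot(two, aes(x = year, fill = type)) +
  geom_histogram(binwidth = 2, alpha = 0.5, position = "identity")
p
```

Setting binwidth explicitly avoids the default of 30 bins, which is rarely a good fit for year data.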
I am wondering whether I can graph separate lines for 2 variables without using the grid function. I would prefer 4 lines on one graph to 2 lines in each of 2 grids. It's OK if I can't, but I thought I would ask.
My data is as follows:
nd<-data.frame(Machine = c(2,2,3,3,2,2,3,3),
Source = c("tube", "machine","tube", "machine","tube", "machine","tube", "machine"),
Time=c(0,0,0,0,2,2,2,2),
Count=c(224000, 107000, 850000, 940000, 610000,116000, 1160000, 1100000))
and this code gives me what I want with a facet...
ggplot(data=nd, aes(x=Time, y=Count, group=Machine, color=Machine)) +
geom_line(aes(group=Machine))+ geom_point()+facet_grid(~Source)
Is there an alternative to this?
P.S. Even though Machine is a factor variable, why is my legend showing it as continuous?
One quick way is to use the interaction function, which pastes your two variables together with a "."
ggplot(data=nd, aes(x=Time, y=Count, color=interaction(Machine,Source))) +
geom_line() + geom_point() +
scale_color_manual("groups",
values=c("#61d4b3","#fdd365","#fb8d62","#fd2eb3"))
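On the P.S.: in the data frame as posted, Machine is built from c(2,2,3,3,...), so it is numeric rather than a factor, which is why the legend is continuous. A sketch of one alternative, assuming you are happy to distinguish Source by line type: convert Machine to a factor, map it to colour, and use interaction only for the grouping so that four separate lines are drawn.

```r
library(ggplot2)

nd <- data.frame(Machine = c(2, 2, 3, 3, 2, 2, 3, 3),
                 Source = c("tube", "machine", "tube", "machine",
                            "tube", "machine", "tube", "machine"),
                 Time = c(0, 0, 0, 0, 2, 2, 2, 2),
                 Count = c(224000, 107000, 850000, 940000,
                           610000, 116000, 1160000, 1100000))

# Numeric -> factor, so the colour legend becomes discrete
nd$Machine <- factor(nd$Machine)

p <- ggplot(nd, aes(x = Time, y = Count,
                    color = Machine, linetype = Source,
                    group = interaction(Machine, Source))) +
  geom_line() +
  geom_point()
p
```

This keeps two separate legends (Machine and Source) instead of one combined "2.tube"-style legend.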
I want to make a plot similar to the one attached by Lindfield et al. 2016. I'm familiar with the ggplot command in R with the format:
ggplot(dataframe, aes(x, y)) + geom_bar(stat = 'identity')
However, I don't know how to draw a cumulative standard error (s.e.) on a stacked barplot; I only know how to do it with a position_dodge command.
I know that there are disadvantages to using stacked bars with s.e. bars, but for my data set it is more presentable than unstacked barplots.
Thanks.
I don't know how you get the cumulative standard errors in an appropriate way (I guess it depends on how your values are generated), but I think you need to calculate them and store them in a second DF. For example, if you have an initial data.frame created like this:
DF <- data.frame( x=c("a","a","b","b"),
sp=c("shark","cod","shark","cod"),
y=c(10,5,15,7),
stringsAsFactors=FALSE )
where y is the value associated with each species at each x point, then you'd create a second DF containing the lower and upper limits of your s.e. for each x value, e.g.
seDF <- data.frame( x=c('a','b'),
yl=c(12,18),
yu=c(17,24),
stringsAsFactors=FALSE )
Then you can create your plot with:
ggplot() +
geom_bar( data=DF, mapping=aes(x=x,y=y,fill=sp),
position="stack", stat="identity") +
geom_linerange( data=seDF, mapping=aes(x=x, ymin=yl, ymax=yu) )
I used geom_linerange rather than geom_errorbar as it doesn't create crossbars at either end.
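How the limits should be computed depends on the data. As one possibility only: if the per-species standard errors can be treated as independent, the s.e. of the stacked total is the square root of the sum of the squared component errors. The sketch below assumes that independence, and the se column holds hypothetical values added for illustration:

```r
library(ggplot2)

DF <- data.frame(x  = c("a", "a", "b", "b"),
                 sp = c("shark", "cod", "shark", "cod"),
                 y  = c(10, 5, 15, 7),
                 se = c(1.0, 0.5, 1.5, 0.8))  # hypothetical standard errors

# Sum the values and the variances within each x
# (summing variances is only valid if the components are independent)
DF$var <- DF$se^2
tot <- aggregate(cbind(y, var) ~ x, data = DF, FUN = sum)

# Limits of the error bar around the top of each stack
seDF <- data.frame(x  = tot$x,
                   yl = tot$y - sqrt(tot$var),
                   yu = tot$y + sqrt(tot$var))

p <- ggplot() +
  geom_bar(data = DF, mapping = aes(x = x, y = y, fill = sp),
           position = "stack", stat = "identity") +
  geom_linerange(data = seDF, mapping = aes(x = x, ymin = yl, ymax = yu))
p
```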
I want to group the bars in a stacked barplot according to the values in another factor-variable. However, I want to do this without using facets.
my data in long format
I want to group the stacked bars according to the afk variable. The normal stacked bar plot can be made with:
ggplot(nl.melt, aes(x=naam, y=perc, fill=stemmen)) +
geom_bar(stat="identity", width=.7) +
scale_x_discrete(expand=c(0,0)) +
scale_y_continuous(expand=c(0,0)) +
coord_flip() +
theme_bw()
which gives an alphabetically ordered barplot:
I tried to group them by using x=reorder(naam,afk) in the aes. But that didn't work. Also using group=afk does not have the desired effect.
Any ideas how to do this?
reorder should work but the problem is you're trying to re-order by a factor. You need to be explicit on how you want to use that information. You can either use
nl.melt$naam <- reorder(nl.melt$naam, as.numeric(nl.melt$afk))
or
nl.melt$naam <- reorder(nl.melt$naam, as.character(nl.melt$afk), FUN=min)
depending on whether you want to sort by the existing levels of afk or if you want to sort alphabetically by the levels of afk.
After running that and re-running the ggplot code, I get
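Since nl.melt itself is not shown, here is a tiny illustration of what the first reorder call does to the factor levels. The toy data frame is hypothetical; only the column names naam and afk are taken from the question:

```r
# Hypothetical stand-in for nl.melt: four names in two afk groups
toy <- data.frame(
  naam = factor(c("A", "B", "C", "D")),
  afk  = factor(c("y", "x", "y", "x"))
)

levels(toy$naam)  # alphabetical to start with

# Order the levels of naam by the integer code of their afk level
toy$naam <- reorder(toy$naam, as.numeric(toy$afk))

levels(toy$naam)  # names belonging to afk level "x" now come first
```

Because ggplot2 places discrete values on the axis in level order, the bars then appear grouped by afk.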
An alternative to @MrFlick's approach (based on the answer @CarlosCinelli linked to) is:
ggplot(nl.melt, aes(x=interaction(naam,afk), y=perc, fill=stemmen)) +
geom_bar(stat="identity", width=.7) +
scale_x_discrete(expand=c(0,0)) +
scale_y_continuous(expand=c(0,0)) +
coord_flip() +
theme_bw()
which gives:
R tends to see the order of levels as a property of the data rather than a property of the graph. Try reordering the data itself before calling the plotting commands. Try running:
nl.melt$naam <- reorder(nl.melt$naam, nl.melt$afk)
Then run your ggplot code. Or use other ways of reordering your factor levels in naam.
I am trying to port a plot that I wrote with base graphics in R to ggplot2.
The graph I obtained using the basic graphics package is as follows:
I was wondering whether this type of graph can be created in ggplot2. I think it could be built from panels, but is it possible to use faceting for this kind of plot? The major difficulty I encountered is that the maximum and minimum series have the same length, whereas the observed data are not continuous and are sampled at quite different intervals.
Any thoughts on arranging the data for this type of plot would be very helpful. Thank you so much.
Jdbaba,
From your comments, you mentioned that you'd like the legend entry for geom_point to show just the point. This is a feature that is yet to be implemented for direct use in ggplot2 (if I am right). However, there's a fix/work-around given by @Aniko in this post. It's a bit tricky but brilliant! And it works great. Here's a version that I tried out. Hope it is what you expected.
# bind both your data.frames
df <- rbind(tempcal, tempobs)
p <- ggplot(data = df, aes(x = time, y = data, colour = group1,
linetype = group1, shape = group1))
p <- p + geom_line() + geom_point()
p <- p + scale_shape_manual("", values=c(NA, NA, 19))
p <- p + scale_linetype_manual("", values=c(1,1,0))
p <- p + scale_colour_manual("", values=c("#F0E442", "#0072B2", "#D55E00"))
p <- p + facet_wrap(~ id, ncol = 1)
p
The idea is to first create a plot with all the necessary attributes set in the aesthetics section, plot what you want, and then change the settings manually afterwards using scale_*_manual. For example, you can unset lines with a 0 in scale_linetype_manual. Similarly, you can unset points with NA in scale_shape_manual. Here, the first two values are for group1 = maximum and minimum, and the last is for observed. So we set the shape to NA for maximum and minimum, and the linetype to 0 for observed.
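For anyone without the CSV files, the same technique can be tried on made-up data. Everything in the data frame below is hypothetical (values chosen only for illustration); the column names time, data and group1 follow the code above, and the facet on id is left out to keep the sketch short:

```r
library(ggplot2)

# Hypothetical data: two continuous series plus a few observations
df <- data.frame(
  time   = rep(1:5, times = 3),
  data   = c(20, 22, 25, 23, 21,    # maximum
             10, 11, 13, 12, 10,    # minimum
             15, 18, 17, 19, 14),   # observed
  group1 = rep(c("maximum", "minimum", "observed"), each = 5)
)

p <- ggplot(df, aes(x = time, y = data, colour = group1,
                    linetype = group1, shape = group1)) +
  geom_line() +
  geom_point() +
  # NA shape: no points for maximum/minimum; 19 is a filled circle
  scale_shape_manual("", values = c(NA, NA, 19)) +
  # linetype 1 is solid, 0 is blank: no line for observed
  scale_linetype_manual("", values = c(1, 1, 0)) +
  scale_colour_manual("", values = c("#F0E442", "#0072B2", "#D55E00"))
p
```

Printing the plot warns that rows with missing values were removed; those are the points deliberately suppressed with NA. Giving all three scales the same (empty) title is what lets ggplot2 merge them into a single legend.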
And this is the plot:
Solution found:
Thanks to Arun and Andrie
Just in case somebody needs the solution of this sort of problem.
The code I used was as follows:
library(ggplot2)
tempcal <- read.csv("temp data ggplot.csv",header=TRUE, sep=",")
tempobs <- read.csv("temp data observed ggplot.csv",header=TRUE, sep=",")
p <- ggplot(tempcal,aes(x=time,y=data)) +
geom_line(aes(x=time,y=data,color=group1)) +
geom_point(data=tempobs,aes(x=time,y=data,colour=group1)) +
facet_wrap(~id)
p
The dataset used were https://www.dropbox.com/s/95sdo0n3gvk71o7/temp%20data%20observed%20ggplot.csv
https://www.dropbox.com/s/4opftofvvsueh5c/temp%20data%20ggplot.csv
The plot obtained was as follows:
Jdbaba