Using ggplot2 I have made facetted histograms using the following code.
library(ggplot2)
library(plyr)
df1 <- data.frame(monthNo = rep(month.abb[1:5],20),
classifier = c(rep("a",50),rep("b",50)),
values = c(seq(1,10,length.out=50),seq(11,20,length.out=50))
)
means <- ddply (df1,
c(.(monthNo),.(classifier)),
summarize,
Mean=mean(values)
)
ggplot(df1,
aes(x=values, colour=as.factor(classifier))) +
geom_histogram() +
facet_wrap(~monthNo,ncol=1) +
geom_vline(data=means, aes(xintercept=Mean, colour=as.factor(classifier)),
linetype="dashed", size=1)
The vertical line showing means per month is to stay.
But I want to also add text over these vertical lines displaying the mean values for each month. These means are from the 'means' data frame.
I have looked at geom_text and I can add text to plots. But it appears my circumstance is a little different and not so easy. It's a lot simpler to add text in some cases where you just add values of the plotted data points. But cases like this when you want to add the mean and not the value of the histograms I just can't find the solution.
Please help. Thanks.
Having noted the possible duplicate (another answer of mine), the solution here might not be as (initially/intuitively) obvious. You can do what you need if you split the geom_text call into two (for each classifier):
ggplot(df1, aes(x=values, fill=as.factor(classifier))) +
geom_histogram() +
facet_wrap(~monthNo, ncol=1) +
geom_vline(data=means, aes(xintercept=Mean, colour=as.factor(classifier)),
linetype="dashed", size=1) +
geom_text(y=0.5, aes(x=Mean, label=Mean),
data=means[means$classifier=="a",]) +
geom_text(y=0.5, aes(x=Mean, label=Mean),
data=means[means$classifier=="b",])
I'm assuming you can format the numbers to the appropriate precision and place them on the y-axis where you need to with this code.
Related
I want to add an average line to the existing plot.
library(ggplot2)
A <- c(1:10)
B <- c(1,1,2,2,3,3,4,4,5,5)
donnees <- data.frame(A,B)
datetime<-donnees[,2]
Indcatotvalue<-donnees[,1]
df<-donnees
mn<-tapply(donnees[,1],donnees[,2],mean)
moyenne <- data.frame(template=names(mn),mean=mn)
ggplot(data=df,
aes_q(x=datetime,
y=Indcatotvalue)) + geom_line()
I have tried to add :
geom_line(aes(y = moyenne[,2], colour = "blue"))
or :
lines(moyenne[,1],moyenne[,2],col="blue")
but nothing happens, I don't understand especially for the function "lines".
When you say average line I'm assuming you want to plot a line that represents the average value of Y (Indcatotvalue). For that you want to use geom_hline() which plots horizontal lines on your graph:
ggplot(data=df,aes_q(x=datetime,y=Indcatotvalue)) +
geom_line() +
geom_hline(yintercept = mean(Indcatotvalue), color="blue")
Which, with the example numbers you gave, will give you a plot that looks like this:
The function stat_summary is perfect here.
I have found the answer in this page groups.google from Brian Diggs:
p + stat_summary(aes(group=bucket), fun.y=mean, geom="line", colour="green")
You need to set the group to the faceting variable explicitly since
otherwise it will be type and bucket (which looks like type since type
is nested in bucket).
I want to create 3 graphs in ggplot2 as follows:
ggplot(observbest,aes(x=factor(iteration),y=bottles,colour=Team ,group=Team)) + geom_line() + scale_colour_gradientn(colours=rainbow(16))
ggplot(observmedium,aes(x=factor(iteration),y=bottles,colour=Team ,group=Team)) + geom_line() + scale_colour_gradientn(colours=rainbow(16))
ggplot(observweak,aes(x=factor(iteration),y=bottles,colour=Team ,group=Team)) + geom_line() + scale_colour_gradientn(colours=rainbow(16))
That is, three graphs displaying the same thing but for difference dataset each time. I want to compare between them, therefore I want their y axis to be fixed to the same scale with the same margins on all graphs, something the currently doesn't happen automatically.
Any suggestion?
Thanks
It sounds like a facet_wrap on all the observations, combined into a single dataframe, might be what you're looking for. E.g.
library(plyr)
library(ggplot2)
observ <- rbind(
mutate(observbest, category = "best"),
mutate(observmedium, category = "medium"),
mutate(observweak, category = "weak")
)
qplot(iteration, bottles, data = observ, geom = "line") + facet_wrap(~category)
Add + ylim(min_value,max_value) to each graph.
Another option would be to merge the three datasets with an id variable identifying which value is in which dataset, and then plot the three of them together, differentiating them by linetype for instance.
Use scale_y_continuous to define the y axis for each graph and make them all easily comparable.
In the data that I am attempting to plot, each sample belongs in one of several groups, that will be plotted on their own grids. I am plotting stacked bar plots for each sample that will be ordered in increasing number of sequences, which is an id attribute of each sample.
Currently, the plot (with some random data) looks like this:
(Since I don't have the required 10 rep for images, I am linking it here)
There are couple things I need to accomplish. And I don't know where to start.
I would like the bars not to be placed at its corresponding nseqs value, rather placed next to each other in ascending nseqs order.
I don't want each grid to have the same scale. Everything needs to fit snugly.
I have tried to set scales and size to for facet_grid to free_x, but this results in an unused argument error. I think this is related to the fact that I have not been able to get the scales library loaded properly (it keeps saying not available).
Code that deals with plotting:
ggfdata <- melt(fdata, id.var=c('group','nseqs','sample'))
p <- ggplot(ggfdata, aes(x=nseqs, y=value, fill = variable)) +
geom_bar(stat='identity') +
facet_grid(~group) +
scale_y_continuous() +
opts(title=paste('Taxonomic Distribution - grouped by',colnames(meta.frame)[i]))
Try this:
update.packages()
## I'm assuming your ggplot2 is out of date because you use opts()
## If the scales library is unavailable, you might need to update R
ggfdata <- melt(fdata, id.var=c('group','nseqs','sample'))
ggfdata$nseqs <- factor(ggfdata$nseqs)
## Making nseqs a factor will stop ggplot from treating it as a numeric,
## which sounds like what you want
p <- ggplot(ggfdata, aes(x=nseqs, y=value, fill = variable)) +
geom_bar(stat='identity') +
facet_wrap(~group, scales="free_x") + ## No need for facet_grid with only one variable
labs(title = paste('Taxonomic Distribution - grouped by',colnames(meta.frame)[i]))
The following code
library(ggplot2)
library(reshape2)
m=melt(iris[,1:4])
ggplot(m, aes(value)) +
facet_wrap(~variable,ncol=2,scales="free_x") +
geom_histogram()
produces 4 graphs with fixed y axis (which is what I want). However, by default, the y axis is only displayed on the left side of the faceted graph (i.e. on the side of 1st and 3rd graph).
What do I do to make the y axis show itself on all 4 graphs? Thanks!
EDIT: As suggested by #Roland, one could set scales="free" and use ylim(c(0,30)), but I would prefer not to have to set the limits everytime manually.
#Roland also suggested to use hist and ddply outside of ggplot to get the maximum count. Isn't there any ggplot2 based solution?
EDIT: There is a very elegant solution from #babptiste. However, when changing binwidth, it starts to behave oddly (at least for me). Check this example with default binwidth (range/30). The values on the y axis are between 0 and 30,000.
library(ggplot2)
library(reshape2)
m=melt(data=diamonds[,c("x","y","z")])
ggplot(m,aes(x=value)) +
facet_wrap(~variable,ncol=2,scales="free") +
geom_histogram() +
geom_blank(aes(y=max(..count..)), stat="bin")
And now this one.
ggplot(m,aes(x=value)) +
facet_wrap(~variable,scales="free") +
geom_histogram(binwidth=0.5) +
geom_blank(aes(y=max(..count..)), stat="bin")
The binwidth is now set to 0.5 so the highest frequency should change (decrease in fact, as in tighter bins there will be less observations). However, nothing happened with the y axis, it still covers the same amount of values, creating a huge empty space in each graph.
[The problem is solved... see #baptiste's edited answer.]
Is this what you're after?
ggplot(m, aes(value)) +
facet_wrap(~variable,scales="free") +
geom_histogram(binwidth=0.5) +
geom_blank(aes(y=max(..count..)), stat="bin", binwidth=0.5)
ggplot(m, aes(value)) +
facet_wrap(~variable,scales="free") +
ylim(c(0,30)) +
geom_histogram()
Didzis Elferts in https://stackoverflow.com/a/14584567/2416535 suggested using ggplot_build() to get the values of the bins used in geom_histogram (ggplot_build() provides data used by ggplot2 to plot the graph). Once you have your graph stored in an object, you can find the values for all the bins in the column count:
library(ggplot2)
library(reshape2)
m=melt(iris[,1:4])
plot = ggplot(m) +
facet_wrap(~variable,scales="free") +
geom_histogram(aes(x=value))
ggplot_build(plot)$data[[1]]$count
Therefore, I tried to replace the max y limit by this:
max(ggplot_build(plot)$data[[1]]$count)
and managed to get a working example:
m=melt(data=diamonds[,c("x","y","z")])
bin=0.5 # you can use this to try out different bin widths to see the results
plot=
ggplot(m) +
facet_wrap(~variable,scales="free") +
geom_histogram(aes(x=value),binwidth=bin)
ggplot(m) +
facet_wrap(~variable,ncol=2,scales="free") +
geom_histogram(aes(x=value),binwidth=bin) +
ylim(c(0,max(ggplot_build(plot)$data[[1]]$count)))
It does the job, albeit clumsily. It would be nice if someone improved upon that to eliminate the need to create 2 graphs, or rather the same graph twice.
Summary: I want to choose the colors for a ggplot2() density distribution plot without losing the automatically generated legend.
Details: I have a dataframe created with the following code (I realize it is not elegant but I am only learning R):
cands<-scan("human.i.cands.degnums")
non<-scan("human.i.non.degnums")
df<-data.frame(grp=factor(c(rep("1. Candidates", each=length(cands)),
rep("2. NonCands",each=length(non)))), val=c(cands,non))
I then plot their density distribution like so:
library(ggplot2)
ggplot(df, aes(x=val,color=grp)) + geom_density()
This produces the following output:
I would like to choose the colors the lines appear in and cannot for the life of me figure out how. I have read various other posts on the site but to no avail. The most relevant are:
Changing color of density plots in ggplot2
Overlapped density plots in ggplot2
After searching around for a while I have tried:
## This one gives an error
ggplot(df, aes(x=val,colour=c("red","blue"))) + geom_density()
Error: Aesthetics must either be length one, or the same length as the dataProblems:c("red", "blue")
## This one produces a single, black line
ggplot(df, aes(x=val),colour=c("red","green")) + geom_density()
The best I've come up with is this:
ggplot() + geom_density(aes(x=cands),colour="blue") + geom_density(aes(x=non),colour="red")
As you can see in the image above, that last command correctly changes the colors of the lines but it removes the legend. I like ggplot2's legend system. It is nice and simple, I don't want to have to fiddle about with recreating something that ggplot is clearly capable of doing. On top of which, the syntax is very very ugly. My actual data frame consists of 7 different groups of data. I cannot believe that writing + geom_density(aes(x=FOO),colour="BAR") 7 times is the most elegant way of coding this.
So, if all else fails I will accept with an answer that tells me how to get the legend back on to the 2nd plot. However, if someone can tell me how to do it properly I will be very happy.
set.seed(45)
df <- data.frame(x=c(rnorm(100), rnorm(100, mean=2, sd=2)), grp=rep(1:2, each=100))
ggplot(data = df, aes(x=x, color=factor(grp))) + geom_density() +
scale_color_brewer(palette = "Set1")
ggplot(data = df, aes(x=x, color=factor(grp))) + geom_density() +
scale_color_brewer(palette = "Set3")
gives me same plots with different sets of colors.
Provide vector containing colours for the "values" argument to map discrete values to manually chosen visual ones:
ggplot(df, aes(x=val,color=grp)) +
geom_density() +
scale_color_manual(values=c("red", "blue"))
To choose any colour you wish, enter the hex code for it instead:
ggplot(df, aes(x=val,color=grp)) +
geom_density() +
scale_color_manual(values=c("#f5d142", "#2bd63f")) # yellow/green