I'm fairly new to R and need some help manipulating my graph. I'm trying to compare actual and forecast figures, but cannot get the colouring of the legend right. The data looks like this:
hierarchy Actual Forecast
<fctr> <dbl> <dbl>
1 E 9313 5455
2 K 6257 3632
3 O 7183 8684
4 A 1579 6418
5 S 8755 0149
6 D 5897 7812
7 F 1400 8810
8 G 4960 5710
9 R 3032 0412
And the code looks like this:
ggplot(sam4, aes(hierarchy))+ theme_bw() +
geom_bar(aes(y = Actual, colour="Actual"),fill="#66FF33", stat="identity",position="dodge", width=0.40) +
geom_bar(aes(y = Forecast, colour="Forecast"), fill="#FF3300", stat="identity",position="dodge", width=0.2)
The graph ends up looking like this:
I believe your problem is that your data is not in a shape that ggplot handles well. You want to tidy up your data frame first, so that each row holds one value and a column says whether it is an actual or a forecast. Check out http://tidyr.tidyverse.org/ to get familiar with the concept of tidy data.
Using the tidyverse (ggplot2 is part of it), I tidied up your data and I believe I got the plot you want.
library(tidyverse) #includes ggplot
newdata <- gather(sam4, actualorforecast, value, -hierarchy)
ggplot(newdata, aes(x = hierarchy)) +
theme_bw() +
geom_bar(aes(y = value, fill = actualorforecast),
stat = "identity",
width = ifelse(newdata$actualorforecast == "Actual", .4, .2),
position = "dodge") +
scale_fill_manual(values= c(Actual ="#66FF33", Forecast="#FF3300"))
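In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). A minimal sketch of the same reshaping step follows, assuming a recent tidyr is installed; sam4 is rebuilt here from the first three rows of the question's data purely for illustration.

```r
library(tidyr)

# Three rows copied from the question's data; `sam4` is rebuilt here for illustration only.
sam4 <- data.frame(
  hierarchy = factor(c("E", "K", "O")),
  Actual    = c(9313, 6257, 7183),
  Forecast  = c(5455, 3632, 8684)
)

# One row per (hierarchy, series) pair: the "long" layout that lets
# ggplot map a single column to the fill aesthetic.
newdata <- pivot_longer(sam4, cols = -hierarchy,
                        names_to = "actualorforecast", values_to = "value")
print(newdata)
```

The resulting newdata has the same three columns that the gather() call produces, so the plotting code above should work with it unchanged.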
I have about 90 years of daily data and I want to plot the long term mean, plus the individual lines for each year of my survey period (2014-2018). The data looks like this:
> head(dischg)
date ddmm year cfs daymo
1 1-Jan-27 01-Jan 1927 715 2018-01-01
2 2-Jan-27 02-Jan 1927 697 2018-01-02
3 3-Jan-27 03-Jan 1927 715 2018-01-03
4 4-Jan-27 04-Jan 1927 796 2018-01-04
5 5-Jan-27 05-Jan 1927 825 2018-01-05
6 6-Jan-27 06-Jan 1927 865 2018-01-06
I have been able to plot the long term mean easily enough:
p1 <- ggplot(dischg, aes(x=daymo, y=cfs)) +
stat_summary(fun.data = "mean_cl_boot", geom = "smooth", colour = "blue")
... but I need some help plotting the subset of years. I tried using "subset"
p2 <- p1 +
ggplot (subset(dischg, year %in% c(2014:2018)), aes(x=daymo, y=cfs, linetype=year)) +
geom_line() +
scale_colour_brewer(palette="Set1")
but I received this error:
Error: Don't know how to add o to a plot
Would it be smarter to just add one year at a time? That seems a bit cumbersome when there are five years of data to plot.
Thank you for providing sample data. Unfortunately, I cannot get the ggplot code to run with it, so I will use a built-in R dataset instead. The concepts are the same.
The issue is that you are trying to add ggplot to an object that is already of class ggplot. Once you have initialized your object as a ggplot object, you don't need to call ggplot each time you want to add a layer. For example, I get the same error you do if I try:
p1 <- ggplot(mtcars, aes(x=hp,y=cyl)) + geom_point()
p2 <- p1 + ggplot(mtcars[mtcars$am == 1, ], aes(x = hp, y = cyl)) + geom_line()
As mentioned in my comment, if you want to add another layer with separate data (in your case the geom_line) you can do this by putting the data directly into the geom_ call. In your case you would do something like:
p1 <- ggplot(mtcars, aes(x=hp,y=cyl)) + geom_point()
p2 <- p1 + geom_line(data = mtcars[mtcars$am == 1, ])
p2
With thanks to feedback from @MikeH., I figured it out:
p1 <- ggplot(dischg, aes(x=daymo, y=cfs)) +
stat_summary(fun.data = "mean_cl_boot", geom = "smooth", colour = "blue") +
geom_line(data=subset(dischg, year %in% c(2014:2018)),
aes(colour=year)) +
scale_colour_brewer(palette="Set1")
(Also, I had to make sure the 'year' was a factor rather than a continuous variable.)
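The factor conversion mentioned above matters because scale_colour_brewer() is a discrete scale and rejects a numeric year. A minimal sketch of that step follows; the small dischg data frame here is a hypothetical stand-in, not the real 90-year record.

```r
library(ggplot2)

# Hypothetical stand-in for `dischg`: two years, three days each.
dischg <- data.frame(
  daymo = rep(as.Date(c("2018-01-01", "2018-01-02", "2018-01-03")), times = 2),
  year  = rep(c(2014, 2015), each = 3),
  cfs   = c(715, 697, 715, 796, 825, 865)
)

# A numeric year is treated as continuous; converting it to a factor
# gives one discrete colour per year.
dischg$year <- factor(dischg$year)

p <- ggplot(dischg, aes(x = daymo, y = cfs, colour = year)) +
  geom_line() +
  scale_colour_brewer(palette = "Set1")
```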
I would like to make a bar plot with percent format.
here is my data set:
https://drive.google.com/file/d/1xpRqQwzKFuirpKYKcoi1qVYSaiA-D5WX/view?usp=sharing
load('test.Robj')
Here is what part of my data looks like:
res.1.2 branch
AAACCTGCACCAGGCT 0 1
AAACCTGGTCATATGC 7 4
AAACCTGGTTAGTGGG 15 NA
AAACCTGTCCACGCAG 1 NA
AAACCTGTCCACGTTC 17 2
AAACGGGCACCGAATT 0 1
I tried to use this code to plot:
ggplot(test,aes(x = branch, y =factor(1),fill = res.1.2)) +
geom_bar(position = "fill",stat = "identity")+
scale_y_discrete(labels =scales::percent)
I want the y axis to show the count of each res.1.2 value as a percentage of the total (a stacked bar chart, similar to a pie chart),
quite similar to this issue
but I got this:
Any suggestion?
If I understand correctly, the OP wants to plot the values of res.1.2 which are of type character. So, res.1.2 needs to be coerced to integer for plotting:
# load OP's data
load('test.Robj')
# create plot
library(ggplot2)
ggplot(test,aes(x = branch, y = as.integer(res.1.2), fill = res.1.2)) +
geom_bar(position = "fill",stat = "identity") +
scale_y_continuous(labels = scales::percent)
However, if the OP intends to show the number of occurrences of each value of res.1.2 as share of total within each branch, the code is as follows:
# load OP's data
load('test.Robj')
# create plot
library(ggplot2)
ggplot(test, aes(x = branch, fill = res.1.2)) +
geom_bar(position = "fill") +
scale_y_continuous(labels = scales::percent)
The chart shows the counts of res.1.2 as a percentage for each branch.
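To see the numbers behind position = "fill", the same shares can be computed by hand. This is a base R sketch using only the six rows shown in the question's preview (the full data set lives in test.Robj), so the values are illustrative.

```r
# Six rows from the question's preview, typed in by hand.
test <- data.frame(
  res.1.2 = c("0", "7", "15", "1", "17", "0"),
  branch  = c(1, 4, NA, NA, 2, 1)
)

# Counts of each res.1.2 value per branch; table() drops rows where branch is NA.
counts <- table(test$res.1.2, test$branch)

# Column-wise proportions: every branch sums to 1, like the full-height stacked bars.
shares <- prop.table(counts, margin = 2)
print(shares)
```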
I want to plot the results of a benchmark of several bioinformatics tools using ggplot. I would like to have all the bars on the same graph instead of having one graph for each tool. I already have an output from LibreOffice (see image below), but I want to redo it with ggplot.
For now I have this kind of code for each tool (example with the first one) :
data_reduced <- read.table("benchmark_groups_4sps", sep="\t", header=TRUE)
p<-ggplot(data=data_reduced, aes(x=Nb_sps, y=OrthoFinder)) +
geom_bar(stat="identity", color="black", fill="red") +
xlab("Number of species per group") + ylab("Number of groups") +
geom_text(aes(label=OrthoFinder), vjust=1.6, color="black", size=3.5)
I have found out how to paste all the graphs together, but not how to merge them into a single one.
My input data :
Nb_species OrthoFinder FastOrtho POGS (no_para) POGS (soft_para) proteinOrtho
4 125 142 152 202 114
5 61 65 42 79 44
6 37 29 15 21 8
7 19 17 4 7 5
8 15 10 1 0 0
9 10 2 0 0 0
Thanks !
Maybe this can help you in the right direction:
# sample data
df = data.frame(Orthofinder=c(1,2,3), FastOrtho=c(2,3,4), POGs_no_para=c(1,2,2))
library(reshape2)
library(dplyr)
library(ggplot2)
# first let's convert the dataset: Convert to long format and aggregate.
df = melt(df, id.vars=NULL)
df = df %>% group_by(variable,value) %>% count()
# Then, we create a plot.
ggplot(df, aes(factor(value), n, fill = variable)) +
geom_bar(stat="identity", position = "dodge") +
scale_fill_brewer(palette = "Set1")
There is enough documentation around on formatting a plot, so I'll leave that to you ;) Hope this helps!
EDIT: Since the question was changed to work with a different dataset as origin while I was typing my answer, here is the modified code to work with that:
df = data.frame(Nb_species = c(4,5,6,7), OrthoFinder=c(125,142,100,110), FastOrtho=c(100,120,130,140))
library(reshape2)
library(dplyr)
library(ggplot2)
df = melt(df, id.vars="Nb_species")
ggplot(df, aes(factor(Nb_species), value, fill = variable)) +
geom_bar(stat="identity", position = "dodge") +
scale_fill_brewer(palette = "Set1")
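The same approach can be sketched with the full table from the question. The column names below are simplified (POGS_no_para, POGS_soft_para) and the data frame name bench is made up for illustration; the values are typed in from the question rather than read from the original file.

```r
library(reshape2)
library(ggplot2)

# The question's table, typed in by hand; column names are simplified here (an assumption).
bench <- data.frame(
  Nb_species     = 4:9,
  OrthoFinder    = c(125, 61, 37, 19, 15, 10),
  FastOrtho      = c(142, 65, 29, 17, 10, 2),
  POGS_no_para   = c(152, 42, 15, 4, 1, 0),
  POGS_soft_para = c(202, 79, 21, 7, 0, 0),
  proteinOrtho   = c(114, 44, 8, 5, 0, 0)
)

# Long format: one row per (Nb_species, tool) pair.
long <- melt(bench, id.vars = "Nb_species",
             variable.name = "tool", value.name = "groups")

# One dodged bar per tool within each species count.
p <- ggplot(long, aes(factor(Nb_species), groups, fill = tool)) +
  geom_col(position = "dodge") +
  labs(x = "Number of species per group", y = "Number of groups")
```

geom_col() is used here as shorthand for geom_bar(stat = "identity").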
I'm using ggplot2 to create a simple dot plot of -1 to +1 correlation values using the following R code:
ggplot(dataframe, aes(x = exit)) +
geom_point(aes(y= row.names(dataframe))) +
geom_text(aes(y=exit, label=samplesize))
The y-axis has text labels, and I believe those text labels may be the reason that my geom_text() data point labels are squished down into the bottom of the plot as pictured here:
How can I change my plotting so that the data point labels appear on the dots themselves?
I understand that you would like to have the samplesize appear above each data point in the plot. Here is a sample plot with a sample data frame that does this:
EDIT: Per note by Gregor, changed the geom_text() call to utilize aes() when referencing the data. Thanks for the heads up!
library(ggplot2)
top10_rank <- read.table(header = TRUE, text = "
   String Number
4       h      0
1       a      1
11      w      1
3       z      3
7       z      3
2       b      4
8       q      5
6       k      6
9       r      9
5       x     10
10      l     11")
x<-ggplot(data=top10_rank, aes(x = Number,
y = String)) + geom_point(size=3) + scale_y_discrete(limits=top10_rank$String)
x + geom_text(data=top10_rank, size=5, color = 'blue',
aes(x = Number,label = Number), hjust=0, vjust=0)
Not sure if this is what you wanted though.
Your problem is simply that you switched the y variables:
# your code
ggplot(dataframe, aes(x = exit)) +
geom_point(aes(y = row.names(dataframe))) + # here y is the row names
geom_text(aes(y =exit, label = samplesize)) # here y is the exit column
Since you want the same y-values for both, you can define this in the initial ggplot() call and not worry about repeating it later:
# working version
ggplot(dataframe, aes(x = exit, y = row.names(dataframe))) +
geom_point() +
geom_text(aes(label = samplesize))
Using row names is a little fragile; it is safer and more robust to create a data column holding the y values you want:
# nicer code
dataframe$y = row.names(dataframe)
ggplot(dataframe, aes(x = exit, y = y)) +
geom_point() +
geom_text(aes(label = samplesize))
Having done this, you probably don't want the labels right on top of the points. A small offset may look better:
# best of all?
ggplot(dataframe, aes(x = exit, y = y)) +
geom_point() +
geom_text(aes(x = exit + .05, label = samplesize), vjust = 0)
In the last case, you'll have to play with the adjustment to the x aesthetic; what looks right will depend on the dimensions of your final plot.
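As an alternative to editing the x aesthetic, geom_text() also accepts a nudge_x argument that shifts the labels by a fixed amount in data units. A small sketch follows; the data frame is hypothetical and the 0.05 offset is only a starting point.

```r
library(ggplot2)

# Hypothetical data: correlation values in `exit`, labels taken from the row names.
dataframe <- data.frame(
  exit       = c(-0.4, 0.1, 0.7),
  samplesize = c(12, 30, 25),
  row.names  = c("a", "b", "c")
)
dataframe$y <- row.names(dataframe)

# nudge_x moves each label 0.05 units to the right of its point;
# hjust = 0 left-aligns the text at that position.
p <- ggplot(dataframe, aes(x = exit, y = y)) +
  geom_point() +
  geom_text(aes(label = samplesize), nudge_x = 0.05, hjust = 0)
```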
I am trying to present my data using ggplot2. My data frame is built up like this:
type count
1 exon 4
2 intron 3
3 intron 1
4 exon 10
.. ... ..
I am trying to present the data by plotting as histograms and boxplots, but I encounter some problems.
For the histograms I used the following code:
ggplot(hisdat, aes(x=count, fill=type)) +
geom_histogram(binwidth=.5, position="dodge")
and that gives me this plot:
As you can see, the counts at the bottom of the plot are arranged so that 10 follows 1 and 100 follows 10. They are ordered by the first digit of the count rather than by its value. How do I get the axis to go from 1 to 148?
For the boxplot I have the same trouble and on top of that my plot is not looking like a boxplot at all. Is my code wrong?
ggplot(hisdat, aes(x=type, y=count, fill=type)) + geom_boxplot()
It gives me this result:
Since the other part of your question has already been answered in the comments, here is the answer to this part:
How do I get it to go from 1-148?
df <- read.table(header = TRUE, text=
" type count
1 exon 4
2 intron 3
3 intron 1
4 exon 10")
library(ggplot2)
ggplot(df, aes(x = reorder(type, count), y = count, fill = type)) + geom_bar(stat = "identity", position = "dodge")
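An axis where 10 follows 1 usually means the column is being treated as text. The question does not show the column type, so this is an assumption: if count was read in as a factor or character, converting it to numeric restores the numeric order. A minimal sketch with made-up rows:

```r
# Hypothetical: `count` was read in as a factor, so "10" sorts before "3".
hisdat <- data.frame(
  type  = c("exon", "intron", "intron", "exon"),
  count = factor(c("4", "3", "1", "10"))
)
levels(hisdat$count)   # label order is alphabetical, not numeric

# Go through character first; as.numeric() on a factor alone returns the level codes.
hisdat$count <- as.numeric(as.character(hisdat$count))
sort(hisdat$count)
```

With a numeric count, the histogram and boxplot code from the question should place values on a continuous axis.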