Using the following R code, I combined several plots into one figure:
library(ggplot2)
library(gridExtra)  # provides grid.arrange()
# Human and Arabidopsis rows, one subset per nucleotide
Adata=nt.df[( nt.df$Species=='Human' | nt.df$Species=='Arabidopsis' )& nt.df$Nucleotide=='A',]
Cdata=nt.df[( nt.df$Species=='Human' | nt.df$Species=='Arabidopsis' )& nt.df$Nucleotide=='C',]
Gdata=nt.df[( nt.df$Species=='Human' | nt.df$Species=='Arabidopsis' )& nt.df$Nucleotide=='G',]
Udata=nt.df[( nt.df$Species=='Human' | nt.df$Species=='Arabidopsis' )& nt.df$Nucleotide=='U',]
# Grouped bar chart for each nucleotide (each plot uses its own subset)
Aplot <- ggplot(Adata, aes(fill=Species, y=Percent, x=Position)) +
geom_bar(position="dodge", stat="identity")
Cplot <- ggplot(Cdata, aes(fill=Species, y=Percent, x=Position)) +
geom_bar(position="dodge", stat="identity")
Gplot <- ggplot(Gdata, aes(fill=Species, y=Percent, x=Position)) +
geom_bar(position="dodge", stat="identity")
Uplot <- ggplot(Udata, aes(fill=Species, y=Percent, x=Position)) +
geom_bar(position="dodge", stat="identity")
grid.arrange(Aplot,Cplot,Gplot,Uplot,ncol=1)
How can I merge the x-axis labels, the y-axis labels, and the Species legends into a single one for the entire figure?
Also, would it make more sense to label the Position tick marks once for the entire figure or separately on each plot?
Try this:
require(dplyr)
require(ggplot2)
nt.df %>%
filter(Species %in% c('Human', 'Arabidopsis')) %>%
ggplot(aes(fill = Species, y = Percent, x = Position)) +
geom_bar(position="dodge", stat="identity") +
facet_wrap(~ Nucleotide, ncol = 1)  # one panel per nucleotide, stacked vertically
Since the data wasn't posted, I couldn't test it, but this should work; let me know if you get an error. If you've never used piping (%>%) before, it's a popular way to make code more readable and concise: it passes whatever is on its left as the first argument to the function on its right. Here, data is the first argument of ggplot(), so the filtered data set goes into ggplot(). Because the four nucleotides are now panels of a single plot, they share one x-axis title, one y-axis title and one Species legend.
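To check that the %in% filter selects the same rows as the chained == / | conditions in the question, here is a small base-R sketch. The nt.df below is a hypothetical stand-in with made-up values, since the real data was not posted.

```r
# Hypothetical stand-in for nt.df (made-up values; the real data was not posted)
nt.df <- data.frame(
  Species    = rep(c("Human", "Arabidopsis", "Yeast"), each = 4),
  Nucleotide = rep(c("A", "C", "G", "U"), times = 3),
  Position   = 1,
  Percent    = c(30, 20, 20, 30, 25, 25, 25, 25, 40, 10, 10, 40)
)

# Condition used in the question: one comparison per species, joined with |
keep_or <- nt.df$Species == "Human" | nt.df$Species == "Arabidopsis"

# Condition used in the answer's filter() call
keep_in <- nt.df$Species %in% c("Human", "Arabidopsis")

# Both keep the same rows; faceting then splits them by nucleotide
subset_in <- nt.df[keep_in, ]
print(table(subset_in$Nucleotide))
```

With %in%, adding a third species only means extending the vector instead of writing another == comparison.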
I want to plot multiple categories on a single graph, with the percentages of each category adding up to 100%. For example, if I were plotting male versus female, each group (male or female) would add up to 100%. With the following code, however, the percentages appear to be computed over all groups on both graphs: if you added up all the bars on the left- and right-hand graphs, they would total 100%, rather than the yellow bars on the left-hand graph totalling 100%, the purple bars on the left-hand graph totalling 100%, and so on.
I know this can be done by computing the percentages first and using stat = 'identity', but is there a way to do this in ggplot2 without wrangling the data frame before plotting?
library(ggplot2)
library(dplyr)  # for %>%, filter() and select()
tmp <- diamonds %>% filter(color %in% c("E","I")) %>% select(color, cut, clarity)
ggplot(data=tmp,
aes(x=clarity,
fill=cut)) +
geom_bar(aes(y = (..count..)/sum(..count..)), position="dodge") +
scale_y_continuous(labels = scales::percent) + facet_wrap(vars(color))
When computing the percentages inside ggplot2, you have to group the data just as you would when summarizing it before passing it to ggplot. In your case, the PANEL column that ggplot2 adds internally to the data can be used for the grouping.
Using after_stat() and tapply(), this can be achieved like so:
library(ggplot2)
library(dplyr)
tmp <- diamonds %>% filter(color %in% c("E","I")) %>% select(color, cut, clarity)
ggplot(data=tmp,
aes(x=clarity,
fill=cut)) +
geom_bar(aes(y = after_stat(count/tapply(count, PANEL, sum)[PANEL])), position="dodge") +
scale_y_continuous(labels = scales::percent) + facet_wrap(vars(color))
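The expression count / tapply(count, PANEL, sum)[PANEL] divides each bar's count by the total of its own panel. The sketch below runs the same expression on a small hand-made data frame (the counts are invented, and PANEL is built by hand here rather than by ggplot2) to show that the shares then add up to 1 within each panel.

```r
# Made-up stat output: one row per bar; PANEL is a factor, as in ggplot2's layer data
stat_out <- data.frame(
  PANEL = factor(c(1, 1, 1, 2, 2)),
  count = c(10, 30, 60, 5, 15)
)

# Total count per panel, ordered by panel level
panel_totals <- tapply(stat_out$count, stat_out$PANEL, sum)

# Indexing with a factor uses its integer codes, which match the order of
# panel_totals, so each row is divided by its own panel's total
stat_out$pct <- stat_out$count / panel_totals[stat_out$PANEL]
print(stat_out)
```

The [PANEL] indexing is what expands the per-panel totals back to one value per bar, so the division lines up row by row.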
Or using the older .. notation (deprecated since ggplot2 3.4.0 in favor of after_stat()):
ggplot(data=tmp,
aes(x=clarity,
fill=cut)) +
geom_bar(aes(y = ..count../tapply(..count.., ..PANEL.., sum)[..PANEL..]), position="dodge") +
scale_y_continuous(labels = scales::percent) + facet_wrap(vars(color))
EDIT: If you need to group by more than one variable, I would suggest a helper function that uses dplyr for the computations:
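# Share of each bar's count within its (PANEL, fill) group; the third argument receives the fill values
comp_pct <- function(count, PANEL, cut) {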
data.frame(count, PANEL, cut) %>%
group_by(PANEL, cut) %>%
mutate(pct = count / sum(count)) %>%
pull(pct)
}
ggplot(data=tmp,
aes(x=clarity,
fill=cut)) +
geom_bar(aes(y = after_stat(comp_pct(count, PANEL, fill))), position="dodge") +
scale_y_continuous(labels = scales::percent) + facet_wrap(vars(color))
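If you would rather avoid dplyr inside the helper, the same per-(PANEL, fill) normalisation can be sketched in base R with ave(), which returns each group's sum aligned with the original rows. The function name comp_pct_base is hypothetical, and the data below is invented just to check the arithmetic.

```r
# Hypothetical base-R variant of comp_pct(): share of each count within its
# (PANEL, fill) group, returned in the original row order
comp_pct_base <- function(count, PANEL, fill) {
  count / ave(count, PANEL, fill, FUN = sum)
}

# Invented stat output: two panels, two fill levels, two x positions each
count <- c(2, 6, 1, 3, 4, 4, 5, 15)
PANEL <- factor(c(1, 1, 1, 1, 2, 2, 2, 2))
fill  <- c("a", "a", "b", "b", "a", "a", "b", "b")

pct <- comp_pct_base(count, PANEL, fill)
print(pct)
```

Inside the plot, it would presumably be called the same way as comp_pct, i.e. after_stat(comp_pct_base(count, PANEL, fill)).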
I am trying to create a bar plot with the ggplot2 library. My data is semicolon separated, so I read it with read.csv2.
# Library
library(ggplot2)
library(tidyverse) # function "%>%"
# 1. Read data (semicolon separated, as read.csv2 expects)
data = read.csv2(text = "Age;Frequency
0 - 10;1
11 - 20;5
21 - 30;20
31 - 40;13
41 - 49;1")
# 2. Print table
df <- as.data.frame(data)
df
# 3. Plot bar chart
ggplot(df, aes(x = Age)) +
geom_bar() +
theme_classic()
The code runs fine, but in the resulting graph every bar has the same height, as if all values were at the maximum.
You need to specify your y axis as well:
ggplot(df, aes(x = Age, y = Frequency)) +
geom_bar(stat = "identity") +
theme_classic()
By default, geom_bar() counts the number of rows for each x value, and each Age value appears exactly once here (check table(df$Age)), so every bar has height 1. Use geom_bar with stat = 'identity' to plot the Frequency values instead:
library(ggplot2)
ggplot(df, aes(Age, Frequency)) +
geom_bar(stat = 'identity') +
theme_classic()
Or use geom_col(), which uses stat = 'identity' by default:
ggplot(df, aes(Age, Frequency)) +
geom_col() +
theme_classic()
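The difference between the two plots can be seen without ggplot2 at all. This base-R sketch reads the same text as the question and prints what each approach would put on the y axis.

```r
# Same data as in the question, read with read.csv2 (";" separator, "," decimal)
df <- read.csv2(text = "Age;Frequency
0 - 10;1
11 - 20;5
21 - 30;20
31 - 40;13
41 - 49;1")

# What geom_bar() plots by default: the number of rows per Age value
print(table(df$Age))

# What geom_col() / stat = "identity" plots: the stored Frequency values
print(setNames(df$Frequency, df$Age))
```

Every Age value occurs in exactly one row, which is why the default bar chart is flat.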
I am missing some basics in R.
How do I make a plot for each column in a data frame?
I have tried making a plot for each column separately. Is there an easier way?
library(dplyr)
library(ggplot2)
data(economics)
#scatter plots
ggplot(economics,aes(x=pop,y=pce))+
geom_point()
ggplot(economics,aes(x=pop,y=psavert))+
geom_point()
ggplot(economics,aes(x=pop,y=uempmed))+
geom_point()
ggplot(economics,aes(x=pop,y=unemploy))+
geom_point()
#boxplots
ggplot(economics,aes(y=pce))+
geom_boxplot()
ggplot(economics,aes(y=pop))+
geom_boxplot()
ggplot(economics,aes(y=psavert))+
geom_boxplot()
ggplot(economics,aes(y=uempmed))+
geom_boxplot()
ggplot(economics,aes(y=unemploy))+
geom_boxplot()
All I'm looking for is one 2 x 2 grid of box plots and one 2 x 2 grid of scatter plots with ggplot2. I understand there is facet_grid, but I have failed to understand how to use it. (I believe this can be achieved easily with par(mfrow = ...) and base R plots.) I also saw elsewhere something about widening the data, which I didn't understand.
In cases like this, the solution is almost always to reshape the data from wide to long format, so that the variable names become a column you can facet by:
economics %>%
select(-date) %>%
tidyr::gather(variable, value, -pop) %>%
ggplot(aes(x = pop, y = value)) +
geom_point(size = 0.5) +
facet_wrap(~ variable, scales = "free_y")
economics %>%
tidyr::gather(variable, value, -date) %>%
ggplot(aes(y = value)) +
geom_boxplot() +
facet_wrap(~ variable, scales = "free_y")
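The reshaping that tidyr::gather() performs can be sketched in base R with stack(), using a tiny hypothetical wide table in place of economics. Each original column becomes a block of rows tagged with its name, which is what facet_wrap(~ variable) needs.

```r
# Tiny hypothetical wide table: one row per observation, one column per variable
wide <- data.frame(
  pop     = c(100, 110, 120),
  pce     = c(10, 12, 15),
  psavert = c(5.0, 4.5, 4.0)
)

# Long format: stack() turns the selected columns into (values, ind) pairs,
# and pop is repeated once per stacked column so it stays aligned
measure_cols <- c("pce", "psavert")
long <- data.frame(
  pop = rep(wide$pop, times = length(measure_cols)),
  stack(wide[measure_cols])
)
names(long)[names(long) == "values"] <- "value"
names(long)[names(long) == "ind"] <- "variable"
print(long)
```

The result has the same pop / value / variable layout as gather(variable, value, -pop), with one row per (observation, variable) pair.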
I have a problem creating an animated pie chart with gganimate and ggplot2.
I want a complete pie for each year, but my output looks totally different.
Here is an example using mtcars:
library(ggplot2)
library(gganimate)
#Some Data
df<-aggregate(mtcars$mpg, list(mtcars$cyl,mtcars$carb), sum)
colnames(df)<-c("X","Y","Z")
bp<- ggplot(df, aes(x="", y=Z, fill=X, frame=Y))+
geom_bar(width = 1, stat = "identity") + coord_polar("y", start=0)
gganimate(bp, "output.gif")  # old (pre-1.0) gganimate API; the plot object is bp, not pie
And this is the output:
It works well when the frame has only one level:
The ggplot code creates a single stacked bar chart with a section for every row in df. With coord_polar, this becomes a single pie chart with a wedge for each row in the data frame. Then, when you use gganimate, each frame includes only the wedges that correspond to a given level of Y. That's why you're getting only a section of the full pie chart each time.
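A rough way to see how much of the circle each frame can fill: assuming, as described above, that the angle scale is shared across frames and spans the stacked total of all rows, a frame showing only one Y level covers that level's share of the total. This base-R sketch computes those shares for the mtcars aggregation from the question.

```r
# Same aggregation as in the question
df <- aggregate(mtcars$mpg, list(mtcars$cyl, mtcars$carb), sum)
colnames(df) <- c("X", "Y", "Z")

# Share of the full circle covered by the wedges of each frame (level of Y),
# assuming the angle scale spans the sum of Z over all rows
frame_share <- tapply(df$Z, df$Y, sum) / sum(df$Z)
print(round(frame_share, 3))
```

Every share is below 1, so no single frame can show a complete pie; only when Y has a single level does that level cover the whole circle.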
If instead you want a full pie for each level of Y, then one option would be to create a separate pie chart for each level of Y and then combine those pies into a GIF. Here's an example with some fake data that (I hope) is similar to your real data:
library(animation)
# Fake data
set.seed(40)
df = data.frame(Year = rep(2010:2015, 3),
disease = rep(c("Cardiovascular","Neoplasms","Others"), each=6),
count=c(sapply(c(1,1.5,2), function(i) cumsum(c(1000*i, sample((-200*i):(200*i),5))))))
saveGIF({
for (i in unique(df$Year)) {
p = ggplot(df[df$Year==i,], aes(x="", y=count, fill=disease, frame=Year))+
geom_bar(width = 1, stat = "identity") +
facet_grid(~Year) +
coord_polar("y", start=0)
print(p)
}
}, movie.name="test1.gif")
The pies in the GIF above are all the same size. But you can also change the size of the pies based on the sum of count for each level of Year (code adapted from this SO answer):
library(dplyr)
df = df %>% group_by(Year) %>%
mutate(cp1 = c(0, head(cumsum(count), -1)),
cp2 = cumsum(count))
saveGIF({
for (i in unique(df$Year)) {
p = ggplot(df %>% filter(Year==i), aes(fill=disease)) +
geom_rect(aes(xmin=0, xmax=max(cp2), ymin=cp1, ymax=cp2)) +
facet_grid(~Year) +
coord_polar("y", start=0) +
scale_x_continuous(limits=c(0,max(df$cp2)))
print(p)
}
}, movie.name="test2.gif")
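The cp1/cp2 columns are the start and end of each wedge along the running count within a year, and max(cp2) for a year is that year's total, which sets the pie's radius. Here is a base-R sketch of the same computation with ave(), on a hypothetical two-year data frame (column names mirror the answer's; the values are invented).

```r
# Invented counts for two years, three diseases each (hypothetical data)
pies <- data.frame(
  Year    = rep(c(2010, 2011), each = 3),
  disease = rep(c("Cardiovascular", "Neoplasms", "Others"), times = 2),
  count   = c(1000, 1500, 2000, 900, 1700, 2100)
)

# cp2: running total within each year (end of each wedge)
pies$cp2 <- ave(pies$count, pies$Year, FUN = cumsum)
# cp1: previous running total within each year (start of each wedge, 0 for the first)
pies$cp1 <- pies$cp2 - pies$count
print(pies)
```

cp2 - count gives the same values as c(0, head(cumsum(count), -1)) within each year.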
If I can editorialize for a moment, although animation is cool (but pie charts are uncool, so maybe animating a bunch of pie charts just adds insult to injury), the data will probably be easier to comprehend with a plain old static line plot. For example:
ggplot(df, aes(x=Year, y=count, colour=disease)) +
geom_line() + geom_point() +
scale_y_continuous(limits=c(0, max(df$count)))
Or maybe this:
ggplot(df, aes(x=Year, y=count, colour=disease)) +
geom_line() + geom_point(show.legend=FALSE) +
geom_line(data=df %>% group_by(Year) %>% mutate(count=sum(count)),
aes(x=Year, y=count, colour="All"), lwd=1) +
scale_y_continuous(limits=c(0, df %>% group_by(Year) %>%
summarise(count=sum(count)) %>% pull(count) %>% max())) +  # upper limit = largest yearly total
scale_colour_manual(values=c("black", hcl(seq(15,275,length=4)[1:3],100,65)))
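The "All" line and the upper y limit in the last plot both come from the yearly totals. A base-R sketch of those two quantities, using hypothetical data with the same column names:

```r
# Invented counts for two years, three diseases each (hypothetical data)
pies <- data.frame(
  Year    = rep(c(2010, 2011), each = 3),
  disease = rep(c("Cardiovascular", "Neoplasms", "Others"), times = 2),
  count   = c(1000, 1500, 2000, 900, 1700, 2100)
)

# Yearly totals: the values drawn as the "All" line
totals <- aggregate(count ~ Year, data = pies, FUN = sum)
print(totals)

# Upper y limit: the largest yearly total, taken from the count column only
y_max <- max(totals$count)
```

Taking the maximum of the count column alone keeps other columns, such as Year, out of the computation.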
I'd like to show data values on a stacked bar chart in ggplot2. After many attempts, the only way I found to show the total amount for each bin is the following code:
library(ggplot2)
set.seed(1234)
df <- data.frame(
sex=factor(rep(c("F", "M"), each=200)),
weight=round(c(rnorm(200, mean=55, sd=5), rnorm(200, mean=65, sd=5)))
)
p<-ggplot(df, aes(x=weight, fill=sex, color=sex))
p<-p + geom_histogram(position="stack", alpha=0.5, binwidth=5)
tbl <- (ggplot_build(p)$data[[1]])[, c("x", "count")]
agg <- aggregate(tbl["count"], by=tbl["x"], FUN=sum)
for(i in 1:length(agg$x))
if(agg$count[i])
p <- p + geom_text(x=agg$x[i], y=agg$count[i] + 1.5, label=agg$count[i], colour="black" )
which generates the following plot:
Is there a better (and more efficient) way to get the same result using ggplot2?
Thanks a lot in advance
You can use stat_bin to count up the values and add text labels.
p <- ggplot(df, aes(x=weight)) +
geom_histogram(aes(fill=sex, color=sex),
position="stack", alpha=0.5, binwidth=5) +
stat_bin(aes(y=..count.. + 2, label=..count..), geom="text", binwidth=5)
I moved the fill and color aesthetics to geom_histogram so that they apply only to that layer and not globally to the whole plot, because we want stat_bin to generate an overall count for each bin, rather than separate counts for each level of sex. ..count.. is an internal variable returned by stat_bin that stores the counts.
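The relationship between the overall and per-sex counts can be checked in base R. This sketch bins the question's simulated data into 5-unit intervals; the bin edges are chosen here for illustration and need not match stat_bin's own defaults.

```r
# Same simulated data as in the question
set.seed(1234)
df <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = round(c(rnorm(200, mean = 55, sd = 5), rnorm(200, mean = 65, sd = 5)))
)

# Illustrative 5-unit bins (edges chosen here, not necessarily stat_bin's)
bins <- cut(df$weight, breaks = seq(30, 100, 5), right = FALSE)

by_sex  <- table(df$sex, bins)  # separate counts per level of sex
overall <- table(bins)          # one count per bin, summed over both sexes

print(overall)
```

The overall count in each bin equals the sum of the per-sex counts, which is the single number a label above a stacked bar should show.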
In this case, it was straightforward to add the counts directly. However, in more complicated situations, you might sometimes want to summarise the data outside of ggplot and then feed the summary data to ggplot. Here's how you would do that in this case:
library(dplyr)
counts = df %>% group_by(weight = cut(weight, seq(30,100,5), right=FALSE)) %>%
summarise(n = n())
countsByGroup = df %>% group_by(sex, weight = cut(weight, seq(30,100,5), right=FALSE)) %>%
summarise(n = n())
ggplot(countsByGroup, aes(x=weight, y=n, fill=sex, color=sex)) +
geom_bar(stat="identity", alpha=0.5, width=1) +
geom_text(data=counts, aes(label=n, y=n+2), colour="black")
Or, you can just create countsByGroup and then create the equivalent of counts on the fly inside ggplot:
ggplot(countsByGroup, aes(x=weight, y=n, fill=sex, color=sex)) +
geom_bar(stat="identity", alpha=0.5, width=1) +
geom_text(data=countsByGroup %>% group_by(weight) %>% mutate(n=sum(n)),
aes(label=n, y=n+2), colour="black")