How to make a pie chart with ggplot2? - r

I am trying to plot a pie chart using the following dataset
dt <- data.frame(name= c("A", "B", "C"),
one = sample(1:10, 3),
two= sample(1:10, 3),
three =sample(1:10, 3))
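Of course the data are untidy, so I rearrange the dataset into long format using
library(dplyr)
library(tidyr)  # gather() comes from tidyr, not dplyr
colnames(dt)[1] <- "letter"
# keep letter as the id column; stack one/two/three into number/value
dt <- dt %>% gather(key = "number", value = "value", -letter)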
And I am perfectly able to plot a barchart
library(ggplot2)
ggplot(dt, aes(x=letter, y=value, fill=number)) +
geom_bar(stat="identity")
But when I apply the coord_polar() transformation, I can't make the slices look even, nor can I make the pie chart sum to 100%
ggplot(dt, aes(x=letter, y=value, fill=number)) +
geom_bar(stat="identity") +
coord_polar(theta = "x")
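One possible approach, sketched here and not taken from the thread: draw a single stacked bar per letter, let position = "fill" normalise each bar to 100%, and bend the y axis (not the x axis) into the angle with coord_polar(theta = "y"). The data values below are made up for illustration, and the share column is only there to make the normalisation explicit.

```r
library(ggplot2)

# Illustrative data in the long format used above (values are made up).
dt <- data.frame(
  letter = rep(c("A", "B", "C"), times = 3),
  number = rep(c("one", "two", "three"), each = 3),
  value  = c(3, 7, 2, 5, 1, 8, 9, 4, 6)
)

# Share of each slice within its letter. position = "fill" computes the same
# thing inside ggplot2; this column only makes the idea explicit.
dt$share <- dt$value / ave(dt$value, dt$letter, FUN = sum)

pie <- ggplot(dt, aes(x = "", y = value, fill = number)) +
  geom_col(position = "fill", width = 1) +  # one full-height stacked bar per panel
  coord_polar(theta = "y") +                # bend the y axis into the angle
  facet_wrap(~letter) +                     # one pie per letter
  labs(x = NULL, y = NULL)

print(pie)
```

With theta = "x", as in the question, each bar becomes its own wedge of equal angle and the values set the radius, which is why the slices never add up to a full circle.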

Related

For-looping the x-axis label in ggplot

I would like to create multiple histograms (ggplot) using a for loop. The problem is that the x-axis label of every plot stays the same, namely "value". Do you know how to change the x-axis label on each iteration?
My dataframe for example:
df <- data.frame(variable = c("A", "A", "B", "B", "C", "C"), value = c(1, 2, 4, 5, 2, 3))
So that means I want three plots with the x-axis labels "A", "B" and "C"
My code:
for (i in unique(df$variable)){
d <- subset(df, df$variable == i)
print(ggplot(d, aes(x = value)) + geom_histogram())
}
You can use imap to give each plot its own x-axis label after splitting the data by variable.
library(ggplot2)
library(dplyr)  # provides %>%
list_plot <- df %>%
  split(.$variable) %>%
  # .x is the data subset, .y is its name ("A", "B" or "C")
  purrr::imap(~ggplot(.x, aes(x = value)) +
    geom_histogram() + xlab(.y))
Also, have you considered using facets? The x-axis is then shared and A, B, C appear as facet titles.
ggplot(df, aes(x = value)) + geom_histogram() + facet_wrap(~variable)
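If you would rather keep the original for loop, a minimal sketch is to add xlab(i) inside the loop. The plots list and bins = 5 are my additions for illustration; they are not part of the question.

```r
library(ggplot2)

df <- data.frame(variable = c("A", "A", "B", "B", "C", "C"),
                 value = c(1, 2, 4, 5, 2, 3))

plots <- list()
for (i in unique(as.character(df$variable))) {
  d <- df[df$variable == i, ]
  # xlab(i) replaces the default label "value" with the current group name
  plots[[i]] <- ggplot(d, aes(x = value)) +
    geom_histogram(bins = 5) +
    xlab(i)
  print(plots[[i]])
}
```

Storing the plots in a named list keeps them available after the loop, for example for saving each one with ggsave().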

Add multi-stack axes label to plot

I have a dataset, named “data”:
library(plyr)
df <- ddply(data, c("Treatment", "Concentration"), summarise,
  mean = mean(Inhibition), sd = sd(Inhibition), n = length(Inhibition), se = sd / sqrt(n))
# the summary has no Inhibition column, so plot the computed mean instead
p <- ggplot(df, aes(x = Treatment, y = mean))
p1 <- p + geom_bar(stat = "identity", position = "dodge") +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), position = "dodge", width = 0.2)
and I got the following graph:
I want x-axis to be like the picture below:
How would I do this?
This is best achieved using a facet within ggplot. As you haven’t included a reusable dataset, I have made one here:
df <- data.frame(Group = c("A", "A", "A", "A", "B"),
SubGroup = c(letters[1:5]),
value = 1:5
)
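See the facet_grid line below, which sets a few additional options: scales = "free_x" lets each panel show only its own x values, space = "free" makes the panel widths proportional to the number of bars, and switch = "x" moves the strip labels to the bottom. You can read more about these arguments in the facet_grid documentation.
library(ggplot2)
ggplot(df, aes(x = SubGroup, y = value)) +
  geom_bar(stat = "identity", position = "dodge") +
  facet_grid(. ~ Group, scales = "free_x", space = "free", switch = "x") +
  theme(strip.placement = "outside")  # draw the strips below the axis labels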
For your data, you will need to split the drug and dose into two separate columns first, like my example.
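The split could look roughly like the sketch below. The combined labels ("DrugA_10"), the underscore separator and the Drug/Dose column names are all assumptions, since the real data was not posted; adjust them to whatever your Treatment column actually contains.

```r
library(ggplot2)

# Hypothetical combined labels; the real column contents and separator may differ.
df <- data.frame(
  Treatment = c("DrugA_10", "DrugA_20", "DrugB_10", "DrugB_20"),
  mean = c(12, 30, 8, 25),
  stringsAsFactors = FALSE
)

# Split "DrugA_10" into "DrugA" and "10" at the underscore.
parts <- do.call(rbind, strsplit(df$Treatment, "_", fixed = TRUE))
df$Drug <- parts[, 1]
df$Dose <- as.numeric(parts[, 2])

# Dose on the x axis, Drug as the outer (second-level) label.
p <- ggplot(df, aes(x = factor(Dose), y = mean)) +
  geom_col() +
  facet_grid(. ~ Drug, scales = "free_x", space = "free", switch = "x") +
  theme(strip.placement = "outside") +
  xlab("Dose")

print(p)
```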

proportional line width ggplot2, in Gantt chart

I aim to plot line widths proportional to a variable in a data.frame, a topic which has, for example, been discussed here.
My application is (although the issue is probably not related to that) within a Gantt chart adapted from here as in:
library(reshape2)
library(ggplot2)
MA <- c("A", "B", "C")
dfr <- data.frame(
name = factor(MA, levels = MA),
start.date = as.Date(c("2012-09-01", "2013-01-01","2014-01-01")),
end.date = as.Date(c("2019-01-01", "2017-12-31","2019-06-30")),
prozent = c(1,0.5,0.75)*100
)
mdfr <- melt(dfr, measure.vars = c("start.date", "end.date"))
ggplot(mdfr, aes(value, name)) + geom_line(aes(size = prozent))
This yields
where I do get different line widths, which however do not look proportional to prozent.
Is there a way to make the line widths proportional to prozent?
Just add + scale_size_area():
ggplot(mdfr, aes(value, name)) +
  geom_line(aes(size = prozent)) +
  scale_size_area()
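A caveat, as far as I understand the scale: scale_size_area() maps zero to zero but makes the size grow with the square root of the value, so the widths are ordered correctly without being strictly linear in prozent. If a strictly linear mapping is wanted, one option (my assumption, not part of the answer above) is scale_radius() with limits starting at zero. The sketch below rebuilds the question's data directly in long form. On ggplot2 3.4.0 or later, lines use the linewidth aesthetic and scale_linewidth() instead, and size only works with a deprecation warning.

```r
library(ggplot2)

MA <- c("A", "B", "C")
# Same data as in the question, written directly in long form
# (one start row and one end row per task).
mdfr <- data.frame(
  name    = factor(rep(MA, times = 2), levels = MA),
  value   = as.Date(c("2012-09-01", "2013-01-01", "2014-01-01",
                      "2019-01-01", "2017-12-31", "2019-06-30")),
  prozent = rep(c(1, 0.5, 0.75) * 100, times = 2)
)

# What "proportional" means here: width = max_width * prozent / max(prozent)
max_width <- 6
expected_width <- max_width * mdfr$prozent / max(mdfr$prozent)

p_linear <- ggplot(mdfr, aes(value, name)) +
  geom_line(aes(size = prozent)) +
  # linear mapping from [0, max(prozent)] to [0, max_width]
  scale_radius(range = c(0, max_width), limits = c(0, NA))

print(p_linear)
```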

2 stacked histograms with a common x-axis

I want to plot two stacked histograms that share a common x-axis. I want the second histogram to be plotted as the inverse (pointing downward) of the first. I found this post that shows how to plot the stacked histograms (How to plot multiple stacked histograms together in R?). For the sake of simplicity, let's say I just want to plot that same histogram, on the same x-axis but facing in the negative y-axis direction.
You could count up the cases and then multiply the count by -1 for one category. An example with data.table and ggplot2:
library(data.table)
library(ggplot2)
# fake data
set.seed(123)
dat <- data.table(value = factor(sample(1:5, 200, replace = TRUE)),
                  category = sample(c('a', 'b'), 200, replace = TRUE))
# count by val/category; cat b as negative
plot_dat <-
dat[, .(N = .N * ifelse(category=='a', 1, -1)),
by=.(value, category)]
# plot
ggplot(plot_dat, aes(x=value, y=N, fill=category)) +
geom_bar(stat='identity', position='identity') +
theme_classic()
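You can try something like this, negating the computed count in the second layer:
library(ggplot2)
# ..count.. is the per-bin count computed by stat_bin;
# newer ggplot2 versions spell it after_stat(count)
ggplot() +
  stat_bin(data = diamonds, aes(x = depth)) +
  stat_bin(data = diamonds, aes(x = depth, y = -..count..))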
Responding to the additional comment:
library(dplyr)
library(tidyr)
d1 <- diamonds %>%
select(depth,table) %>%
gather(key = grp,value = val,depth,table)
ggplot() +
stat_bin(data = d1,aes(x = val,fill = grp)) +
stat_bin(data = diamonds,aes(x = price,y = -..count..))
Visually, that's a bad example because the scales of the variables are all off, but that's the general idea.

How to plot a (sophisticated) stacked barplot in ggplot2, without complicated manual data aggregation

I want to plot a (faceted) stacked barplot where the y-axis is in percent and the frequency labels are displayed within the bars.
After quite some work and viewing many different questions on stackoverflow, I found a solution on how to solve this with ggplot2. However, I don't do it directly with ggplot2, I manually aggregate my data with a table call. And I do this manual aggregation in a complicated way and also calculate the percent values manually with temp variables (see source code comment "manually aggregate data").
How can I do the same plot, but in a nicer way without the manual and complicated data aggregation?
library(ggplot2)
library(scales)
library(gridExtra)
library(plyr)
##
## Random Data
##
fact1 <- factor(floor(runif(1000, 1,6)),
labels = c("A","B", "C", "D", "E"))
fact2 <- factor(floor(runif(1000, 1,6)),
labels = c("g1","g2", "g3", "g4", "g5"))
##
## STACKED BAR PLOT that scales the y-axis to 100%
##
## manually aggregate data
##
mytable <- as.data.frame(table(fact1, fact2))
colnames(mytable) <- c("caseStudyID", "Group", "Freq")
mytable$total <- sapply(mytable$caseStudyID,
function(caseID) sum(subset(mytable, caseStudyID == caseID)$Freq))
mytable$percent <- round((mytable$Freq/mytable$total)*100,2)
mytable2 <- ddply(mytable, .(caseStudyID), transform, pos = cumsum(percent) - 0.5*percent)
## all case studies in one plot (SCALED TO 100%)
p1 <- ggplot(mytable2, aes(x=caseStudyID, y=percent, fill=Group)) +
geom_bar(stat="identity") +
theme(legend.key.size = unit(0.4, "cm")) +
theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
geom_text(aes(label = sapply(Freq, function(x) ifelse(x>0, x, NA)), y = pos), size = 3) # the ifelse guards against printing labels with "0" within a bar
print(p1)
After you make the data:
fact1 <- factor(floor(runif(1000, 1,6)),
labels = c("A","B", "C", "D", "E"))
fact2 <- factor(floor(runif(1000, 1,6)),
labels = c("g1","g2", "g3", "g4", "g5"))
dat = data.frame(caseStudyID=fact1, Group=fact2)
You can automate making an unlabeled graph of the kind that you want with position_fill:
ggplot(dat, aes(caseStudyID, fill=Group)) + geom_bar(position="fill")
I don't know if there's a way to generate the text labels automatically. The positions and counts from the stacked graph are accessible with ggplot_build, if you want to use what ggplot calculates instead of doing it separately.
p = ggplot(dat, aes(caseStudyID, fill=Group)) + geom_bar(position="fill")
ggplot_build(p)$data[[1]]
That will return a dataframe with (among other things), count, x, y, ymin, and ymax variables that can be used to create positioned labels.
If you want the labels vertically centered in each category, first make a column with values halfway between ymin and ymax.
freq = ggplot_build(p)$data[[1]]
freq$y_pos = (freq$ymin + freq$ymax) / 2
Then add the labels to the graph with annotate.
p + annotate(x=freq$x, y=freq$y_pos, label=freq$count, geom="text", size=3)
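The labels can also be generated in one step. As a minimal sketch, assuming ggplot2 3.3.0 or later (for after_stat()): a geom_text layer with stat = "count" reuses the counts that geom_bar computes, and position_fill(vjust = 0.5) centres each label in its segment. The data here is simulated with sample() for reproducibility, and the percent axis labels via the scales package are my addition.

```r
library(ggplot2)

set.seed(42)
dat <- data.frame(
  caseStudyID = factor(sample(c("A", "B", "C", "D", "E"), 1000, replace = TRUE)),
  Group       = factor(sample(paste0("g", 1:5), 1000, replace = TRUE))
)

p_labeled <- ggplot(dat, aes(caseStudyID, fill = Group)) +
  geom_bar(position = "fill") +
  # stat = "count" computes the same per-segment counts as geom_bar;
  # position_fill(vjust = 0.5) places each label mid-segment
  geom_text(stat = "count", aes(label = after_stat(count)),
            position = position_fill(vjust = 0.5), size = 3) +
  scale_y_continuous(labels = scales::percent)

print(p_labeled)
```

This avoids both the manual table() aggregation and the round trip through ggplot_build().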
If you have the distribution of case study IDs in each group as a single vector, you could use the sjp.stackfrq function from the sjPlot package.
library(sjPlot)
A <- floor(runif(1000, 1,6))
B <- floor(runif(1000, 1,6))
C <- floor(runif(1000, 1,6))
D <- floor(runif(1000, 1,6))
E <- floor(runif(1000, 1,6))
mydf <- data.frame(A,B,C,D,E)
sjp.stackfrq(mydf, legendLabels = c("g1","g2", "g3", "g4", "g5"))
The function offers many parameters to easily customize plot appearance (labelling, size and colors etc.).
