Stacked barplot histogram in R

I would like to make a histogram for my data but I would also like to visualize it in such a way that each category is coloured differently but stacked together.
This is what I'm trying to achieve: Stacked histogram from already summarized counts using ggplot2, but I'm unsure how to do it for my data set, and my R skills are very much on the rusty side.
My data is formatted like this:
Name Category Age Year
1    A        3   2017
2    B        6   2016
3    B        12  2017
4    B        8   2017
I'm only interested in Category B, so I made a subset called catB. I would like the histogram to show the frequency of the different ages, and I would like to colour the stacks by year (in my data there are 5 possible years).
I would appreciate any help! Thank you!

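library(ggplot2)
# factor(Year): a numeric fill is continuous and won't split the bars into stacks; binwidth = 1 gives one bar per age
ggplot(catB, aes(x = Age, fill = factor(Year))) +
  geom_histogram(binwidth = 1)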
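To check this on the sample rows shown in the question, here is a small self-contained sketch. It assumes the full data frame is called df and that catB was made with subset(). layer_data() (from ggplot2) returns the bins ggplot2 actually computed, so you can confirm there is one set of bins, i.e. one fill colour, per year.

```r
library(ggplot2)

# The four sample rows from the question (df is an assumed name for the full data)
df <- data.frame(
  Name     = 1:4,
  Category = c("A", "B", "B", "B"),
  Age      = c(3, 6, 12, 8),
  Year     = c(2017, 2016, 2017, 2017)
)
catB <- subset(df, Category == "B")  # keep Category B only

p <- ggplot(catB, aes(x = Age, fill = factor(Year))) +
  geom_histogram(binwidth = 1) +
  labs(fill = "Year")

# The computed bins: one group (and one fill colour) per year
ld <- layer_data(p)
```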

One more nice graphical option: add a frequency (count) column. In the example each row is one observation, so count = 1, but check what the count value should be in your real data:
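library(ggplot2)
catB <- cbind(catB, count = 1)  # each row counts as one observation
# geom_col() stacks the y values at each Age; factor(Year) gives one colour per year
ggplot(catB, aes(x = Age, y = count, fill = factor(Year))) + geom_col()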
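If you would rather summarise first (as in the linked question about already summarized counts), here is a sketch using base R's aggregate() on the Category B rows from the question. The column names Age, Year and count follow the answer above; the data values are just the three sample rows.

```r
library(ggplot2)

# Category B sample rows from the question, one observation per row
catB <- data.frame(Age = c(6, 12, 8), Year = c(2016, 2017, 2017))
catB$count <- 1

# Collapse to one row per Age/Year pair, summing the counts
counts <- aggregate(count ~ Age + Year, data = catB, FUN = sum)

# width = 1: each bar covers exactly one year of age
p <- ggplot(counts, aes(x = Age, y = count, fill = factor(Year))) +
  geom_col(width = 1) +
  labs(fill = "Year")
```

With five years and many rows, collapsing first means geom_col() draws one rectangle per Age/Year pair instead of one per observation.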

Related

Plot large panel data in R by category

I have a dataset (df) that looks like this:
  EIN Year Cat      Fund
1  16 2005   A  9784.490
2  16 2006   A 10020.720
3  16 2007   A  9232.796
4  15 2008   B  8567.893
5  15 2009   B 10292.670
6  17 2010   C  9274.589
The data is relatively large (around 300k observations), which can make plotting slow. I would like to plot the variable Fund for each year, by the identifier EIN. Based on this post I have tried the following code:
library(ggplot2)
ggplot(df, mapping = aes(x = Year, y = Fund)) +
  geom_line(aes(linetype = as.factor(EIN)))
Here are my questions:
(1) This code becomes pretty slow given the large number of observations I have. Can you suggest any alternatives that could speed up the process?
(2) Since I have a huge number of EINs, the legend ends up taking all the space available for the graph, so I would like to get rid of it, but so far without success. I tried adding + guides(fill=FALSE) at the end, but it did not work. Any advice?
(3) If I wanted to either subset or color-code my plot by Cat, what would be the best way to do it?
Thanks a lot for your help!
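You can get rid of the legend using:
+ theme(legend.position = 'none')
(guides(fill=FALSE) had no effect because the legend in your plot comes from the linetype aesthetic, not from fill; + guides(linetype = "none") should also remove it.)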
To subset (facet) your plot, especially if there aren't too many categories, use facet_wrap:
+ facet_wrap(~Cat)
To colour by category instead, put colour = Cat inside your aes() call.
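Here is a sketch that addresses all three questions on the six sample rows (a toy stand-in for the real 300k-row df). Mapping group = EIN instead of linetype still draws one line per EIN, but without a legend entry for each one. The linetype palette only has a handful of distinct styles anyway, so it cannot tell thousands of EINs apart. For speed, plotting a per-Cat summary draws far fewer points than plotting every EIN; the yearly mean used here is only an example summary.

```r
library(ggplot2)

# The six sample rows from the question
df <- data.frame(
  EIN  = c(16, 16, 16, 15, 15, 17),
  Year = 2005:2010,
  Cat  = c("A", "A", "A", "B", "B", "C"),
  Fund = c(9784.490, 10020.720, 9232.796, 8567.893, 10292.670, 9274.589)
)

# group = EIN: one line per EIN, no per-EIN legend; colour = Cat: a small legend instead
p <- ggplot(df, aes(x = Year, y = Fund, group = EIN, colour = Cat)) +
  geom_line(alpha = 0.5)

# Lighter alternative for large data: one summary line per category
cat_means <- aggregate(Fund ~ Year + Cat, data = df, FUN = mean)
p_summary <- ggplot(cat_means, aes(x = Year, y = Fund, colour = Cat)) +
  geom_line()
```

Add + facet_wrap(~Cat) to either plot if you prefer panels to colours. alpha = 0.5 is an arbitrary choice that helps when many lines overlap.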

Is something wrong with my ggplot2 or R code to plot an ordered ancestry stacked barplot?

I want to create an organized stacked barplot where bars with similar proportions appear together. I have a data frame of 10,000 individuals and each individual comes from three populations. Here is my data.
library(MCMCpack)
library(ggplot2)
library(tidyr) # for gather()
n = 10000
alpha = c(0.1, 0.1, 0.1)
q <- as.data.frame(rdirichlet(n,alpha))
head(q)
individuals <- c(1:nrow(q))
q <- cbind(q, individuals)
head(q)
V1 V2 V3 individuals
1 0.0032720232 3.381345e-08 0.996727943 1
2 0.3354060035 4.433923e-01 0.221201688 2
3 0.0004121665 9.661220e-01 0.033465842 3
4 0.9966997182 3.234048e-03 0.000066234 4
5 0.7789280208 2.090134e-01 0.012058562 5
6 0.0005048727 9.408364e-02 0.905411485 6
# long format for ggplot2 plotting
qm <- gather(q, key, value, -individuals)
colnames(qm) <- c("individuals", "ancestry", "proportions")
head(qm)
individuals ancestry proportions
1 1 V1 0.0032720232
2 2 V1 0.3354060035
3 3 V1 0.0004121665
4 4 V1 0.9966997182
5 5 V1 0.7789280208
6 6 V1 0.0005048727
Without any kind of ordering of data, I plotted the stacked barplot as:
ggplot(qm) + geom_bar(aes(x = individuals, y = proportions, fill= ancestry), stat="identity")
I have three questions:
(1) I don't know how to make these individuals with similar proportions cluster together, and I have tried many solutions on stack exchange already but can't get them to work on my dataset!
(2) For some reason, it seems like when I implement the code to order individuals by decreasing/increasing proportions in one ancestry, the code sometimes works on toy datasets of lower dimensions I create, but when I try to plot 10,000 individuals, the code doesn't work anymore! Is this a problem in ggplot2 or am I doing something wrong? I would appreciate any answer to this thread to also plot n = 10,000 stacked barplots.
(3) Not sure if I'm imagining this, but in my stacked barplot, it seems like R is clustering the stacked bar plots in some order unknown to me -- because I can see regular gaps between the stacked plots. In reality, there should be no gaps and I'm not sure why this is happening.
I would appreciate any help since I have already worked on this code for an embarrassingly long amount of time!!
Because the proportions within each ancestry vary a lot, the bars look as if they are clustered with other ancestries. The plot itself is correct; we just can't distinguish the differences because the number of individuals is so high.
If you think the proportions in your data set would keep their meaning, and could be interpreted the same way, after an exponential or log transformation, you can try that.
The stacked bar with the exponential of the proportions:
ggplot(qm) +
  geom_bar(aes(x = individuals, y = exp(proportions), fill = ancestry), stat = "identity")
If you don't want gaps between the bars, set width to 1:
ggplot(qm) +
  geom_bar(aes(x = individuals, y = exp(proportions), fill = ancestry),
           stat = "identity", width = 1)
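The answer above doesn't cover the ordering itself. Here is a minimal sketch under these assumptions: MCMCpack::rdirichlet is replaced by normalised Gamma draws (an equivalent way to sample a Dirichlet), n is 1000 to keep it quick, and "similar proportions" is taken to mean "same dominant ancestry, sorted by how dominant it is". One likely cause of problem (2) is that individuals is numeric. Sorting the rows of the data frame does not move the bars, because each bar is drawn at its numeric x value. On a discrete axis the bar order comes from the factor levels, so the sketch sets those explicitly.

```r
library(ggplot2)
library(tidyr)

set.seed(42)
n <- 1000  # 10000 works the same way
# Dirichlet(0.1, 0.1, 0.1) draws via normalised Gamma variables (stand-in for rdirichlet)
g <- matrix(rgamma(n * 3, shape = 0.1), ncol = 3)
q <- as.data.frame(g / rowSums(g))  # columns V1, V2, V3
q$id <- seq_len(n)

props <- as.matrix(q[, c("V1", "V2", "V3")])
main <- max.col(props, ties.method = "first")  # dominant ancestry of each individual
main_prop <- props[cbind(seq_len(n), main)]    # proportion of that dominant ancestry
ord <- order(main, -main_prop)                 # group by dominant ancestry, strongest first

# Factor levels in the desired order are what place the bars on a discrete axis
q$individuals <- factor(q$id, levels = q$id[ord])

qm <- gather(q[, c("individuals", "V1", "V2", "V3")],
             key = "ancestry", value = "proportions", -individuals)

p <- ggplot(qm, aes(x = individuals, y = proportions, fill = ancestry)) +
  geom_col(width = 1) +
  theme(axis.text.x = element_blank(), axis.ticks.x = element_blank())
```

width = 1 also addresses (3): with thousands of bars, the default width of 0.9 leaves thin slivers of background that can render as regular gaps.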

Transform a ggplot stacked bar into pie chart or alternative

I am having trouble deciding how to graph the data I have.
It consists of overlapping quantities that represent a population, hence my decision to use a stacked bar.
These represent six population divisions ("groups"), wherein group 1 and group 2 form the main division. Groups 4 to 6 are subgroups of two, and these are subgroups of each other. A simple diagram is below:
Note: groups 1 and 2 together make up the entire population, i.e. group 1 + group 2 = 100%.
I want all of this information in one chart, but I don't know what kind of chart to use or how to implement it.
So far I have the one below, which is wrong because Group 1 is included in the main bar.
require(ggplot2)
require(reshape)
tab <- data.frame(
  set=c("XXX","XXX","XXX","XXX","XXX","XXX"),
  group=c("1","6","5","4","3","2"),
  rate=as.numeric(c(10000,20000,50000,55000,75000,100000))
)
dat <- melt(tab)
dat$time <- factor(dat$group,levels=dat$group)
ggplot(dat,aes(x=set)) +
  geom_bar(aes(weight=value,fill=group),position="fill",color="#7F7F7F") +
  scale_fill_brewer("Groups", palette="OrRd")
What do you guys suggest to visualize it? I want to use R and ggplot for consistency and uniformity with the other graphs I have made already.
Using facets you can divide your plot into two:
# changed value of set for group 1
tab <- data.frame(
  set=c("UUU","XXX","XXX","XXX","XXX","XXX"),
  group=c("1","6","5","4","3","2"),
  rate=as.numeric(c(10000,20000,50000,55000,75000,100000))
)
# explicitly defined id.vars
dat <- melt(tab, id.vars=c('set','group'))
dat$time <- factor(dat$group,levels=dat$group)
# added facet_wrap, in geom_bar aes changed weight to y,
# added stat="identity", changed position="stack"
ggplot(dat,aes(x=set)) +
  geom_bar(aes(y=value,fill=group),position="stack", stat="identity", color="#7F7F7F") +
  scale_fill_brewer("Groups", palette="OrRd") +
  facet_wrap(~set, scales="free_x")
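Since the title asks about pie charts: a single stacked bar per panel becomes a pie when the y axis is mapped to the angle with coord_polar(theta = "y"). Here is a sketch using the facet answer's values, with the rate column renamed to value for brevity. Keep in mind that nested subgroups are harder to compare as pie slices than as bar segments, so the facet or treemap approaches may read better.

```r
library(ggplot2)

# Values from the facet answer; group 1 sits in its own panel ("UUU")
tab <- data.frame(
  set   = c("UUU", "XXX", "XXX", "XXX", "XXX", "XXX"),
  group = c("1", "6", "5", "4", "3", "2"),
  value = c(10000, 20000, 50000, 55000, 75000, 100000)
)

# One stacked bar per panel (x = ""); coord_polar turns the stack into pie slices
p <- ggplot(tab, aes(x = "", y = value, fill = group)) +
  geom_col(colour = "#7F7F7F") +
  coord_polar(theta = "y") +
  facet_wrap(~set) +
  scale_fill_brewer("Groups", palette = "OrRd") +
  theme_void()
```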
My guess is that what you need is a treemap. Please correct me if I misunderstood your question.
Here is a link on Treemapping.
If a treemap is what you need, you can use either the portfolio package or googleVis.

Plot multiple histograms in one using ggplot2 in R

I am fairly new to R and ggplot2 and am having some trouble plotting multiple variables in the same histogram plot.
My data is already grouped and just needs to be plotted. The data is by week and I need to plot the number for each category (A, B, C and D).
Date        A  B  C  D
01-01-2011 11  0 11  1
08-01-2011 12  0  3  3
15-01-2011  9  0  2  6
I want the Dates as the x axis and the counts plotted as different colors according to a generic y axis.
I am able to plot just one of the categories at a time, but am not able to find an example like mine.
This is what I use to plot one category. I am pretty sure I need to use position="dodge" to plot multiple as I don't want it to be stacked.
ggplot(df, aes(x=Date, y=A)) + geom_histogram(stat="identity") +
  labs(title = "Number in Category A") +
  ylab("Number") +
  xlab("Date") +
  theme(axis.text.x = element_text(angle = 90))
Also, this gives me a histogram with spaces in between the bars. Is there any way to remove this? I tried spaces=0 as you would do when plotting bar graphs, but it didn't seem to work.
I read some previous questions similar to mine, but the data was in a different format and I couldn't adapt it to fit my data.
This is some of the help I looked at:
Creating a histogram with multiple data series using multhist in R
http://www.cookbook-r.com/Graphs/Plotting_distributions_%28ggplot2%29/
I'm also not quite sure what the bin width is. I think it is how the data should be spaced or grouped, which doesn't apply to my question since it is already grouped. Please advise me if I am wrong about this.
Any help would be appreciated.
Thanks in advance!
You're not really plotting histograms; you're plotting a bar chart that looks a bit like a histogram. I personally think this is a good case for faceting:
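library(ggplot2)
library(reshape2) # for melt()
melt_df <- melt(df, id.vars = "Date")
head(melt_df) # so you can see it
# the values are already counts, so use geom_col() (geom_bar() would try to count rows)
ggplot(melt_df, aes(Date, value, fill = Date)) +
  geom_col() +
  facet_wrap(~ variable)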
However, I think changes over time are generally much better represented by a line chart:
ggplot(melt_df,aes(Date,value,group=variable,color=variable)) + geom_line()
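For the side-by-side version the question asks about (position = "dodge" instead of stacking), here is a sketch on the three sample weeks. It assumes the dates are day-month-year and converts them with as.Date() after melting, so the x axis is a true time axis. id.vars is given explicitly rather than relying on melt() to guess it.

```r
library(ggplot2)
library(reshape2)

# The three sample weeks from the question
df <- data.frame(
  Date = c("01-01-2011", "08-01-2011", "15-01-2011"),
  A = c(11, 12, 9), B = c(0, 0, 0), C = c(11, 3, 2), D = c(1, 3, 6)
)

melt_df <- melt(df, id.vars = "Date")                         # columns: Date, variable, value
melt_df$Date <- as.Date(melt_df$Date, format = "%d-%m-%Y")    # day-month-year strings to Date

# One bar per category, placed side by side within each week
p <- ggplot(melt_df, aes(x = Date, y = value, fill = variable)) +
  geom_col(position = "dodge") +
  labs(title = "Number per category", x = "Date", y = "Number", fill = "Category")
```

binwidth only matters for geom_histogram(), which bins raw observations; geom_col() plots the values as given, so with already-grouped data there is nothing to bin.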

How can I use ggplot2 to make a stacked barplot of pretabulated data with each column being a factor on each bar? [duplicate]

This question already has answers here: Making a stacked bar plot for multiple variables - ggplot2 in R (3 answers)
I have data that has the following format:
revision added removed changed confirmed
       1    20       0       0         0
       2    18       3       8        10
       3    12       8      14        10
       4     6       5      11         8
       5     0       1       7        11
Each row represents a revision of a document. The first column is the revision number, and the remaining columns represent elements added, removed, changed, and confirmed (ready) in the respective revision. (In reality there are more rows and columns; this is just an example.) Each number is the count of recorded additions, removals, changes, and confirmations in that revision.
What I need is a stacked barplot that looks something like this:
I would like to do this in ggplot2. The exact visual look is not important (fonts, colours, and placement of the legend) as long as I can tweak it later. At the moment, it's the general idea I'm looking for.
I've looked at several questions and answers, e.g.
How do I do a Barplot of already tabled data?,
Making a stacked bar plot for multiple variables - ggplot2 in R,
barplot with 3 variables (continous X and Y and third stacked variable), and
Stacked barplot, but they all seem to make assumptions that don't match my data. I've also experimented with something like this:
ggplot(data) +
  geom_bar(aes(x=revision, y=added), stat="identity", fill="white", colour="black") +
  geom_bar(aes(x=revision, y=removed), stat="identity", fill="red", colour="black")
But obviously this does not create a stacked barplot, because it just draws the second geom_bar over the first.
How can I make a stacked barplot of my data using ggplot2?
Try:
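library(ggplot2)
library(reshape2)
# one row per revision/variable pair; variable becomes the fill (stack) group
dat <- melt(data, id.vars = "revision")
ggplot(dat, aes(x = revision, y = value, fill = variable)) +
  geom_bar(stat = "identity")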
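reshape2 has been superseded by tidyr, where pivot_longer() does the same reshaping. Here is a sketch on the sample table from the question. It also fixes the stacking order to match the column order, which is an arbitrary choice; drop the factor() line to get the default alphabetical order.

```r
library(ggplot2)
library(tidyr)

# The sample table from the question
data <- data.frame(
  revision  = 1:5,
  added     = c(20, 18, 12, 6, 0),
  removed   = c(0, 3, 8, 5, 1),
  changed   = c(0, 8, 14, 11, 7),
  confirmed = c(0, 10, 10, 8, 11)
)

# Long format: one row per revision/column pair
dat <- pivot_longer(data, cols = -revision, names_to = "variable", values_to = "value")

# Stack in the same order as the original columns
dat$variable <- factor(dat$variable, levels = names(data)[-1])

p <- ggplot(dat, aes(x = revision, y = value, fill = variable)) +
  geom_col()
```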
