Displaying similar bars together in ggplot - r

I have a Count for each Site (which corresponds with a country), and each Site belongs to a Region. The data looks like this:
> summary_data
Site Count Region
1 Chad 5 Africa
2 Angola 1 Africa
3 France 10 Europe
4 USA 6 Americas
5 Bolivia 3 Americas
6 Chile 4 Americas
I would like to generate a bar graph that:
Has a bar per country
The bars for a region are all next to each other in the bar graph
Per region, the bars appear in descending order
The bars are all the same width, but the heights are all on the same scale
Can be generalized (in particular: arbitrary regions, arbitrary countries per region)
I do not want to use fill color to represent the region (I want to use color to represent another characteristic eventually)
I want to have some visual representation to group the columns, for instance a gray background behind all the columns for the Americas region, a blue background behind all the columns for the Africa region, etc. I would also be open to other approaches (perhaps a line at the top spanning all of Africa with "Africa" as a label or something).
Obviously each region can have a different number of country sites, and no country site spans two regions (I tried using facets but quickly realized that was not the right route). I also tried looping through all the regions to generate separate graphs per region and then put them together but that didn't quite seem the right approach either.
I have generated a graph like this (Closest I have gotten):
Using this code:
library("dplyr")
library(ggplot2)
sorted <- arrange(summary_data,Region,-Count)
sorted$Site <- factor(sorted$Site, levels = sorted$Site)
bar <- ggplot(sorted, aes(x = Site, y = Count, fill = Region)) +
  geom_col()
print(bar)
But this does not meet the last two requirements I set above (I specifically do not want to use fill to represent region). I started down the path of geom_rect() but did not understand the coordinate system for discrete x values rather than continuous (I did find Stackoverflow questions / answers on continuous but didn't see how to translate to this). I think having shaded rectangles behind the columns is probably the best approach, but I would appreciate any input in general approach as well as how to pull it off.

You could consider defining a new panel for each region to separate them using facet_grid. If you want the colors to be the same, just remove the aes(fill = Site) argument inside geom_bar.
The argument space = "free_x" makes each panel's width proportional to the number of bars it holds, so all bars have the same width, and scales = "free_x" shows only the x-axis values that belong to each region. The y-axis stays shared across the panels, so the heights remain comparable.
ggplot(sorted, aes(x = Site, y = Count)) +
  geom_bar(position = "dodge", stat = "identity", aes(fill = Site)) +
  facet_grid(. ~ Region, scales = "free_x", space = "free_x")
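If facets are not wanted, the shaded-rectangle idea from the question can be sketched as below. This is only one possible approach, not something taken from the answer above. It relies on the fact that ggplot2 draws discrete x levels at the integer positions 1, 2, ..., so a region spans from its first position minus 0.5 to its last position plus 0.5. The names spans, shaded and xmid are made up for this illustration, and summary_data is rebuilt from the table in the question.

```r
library(ggplot2)

summary_data <- data.frame(
  Site   = c("Chad", "Angola", "France", "USA", "Bolivia", "Chile"),
  Count  = c(5, 1, 10, 6, 3, 4),
  Region = c("Africa", "Africa", "Europe", "Americas", "Americas", "Americas")
)

# Sort by region, then by descending count, and freeze that order in the factor
sorted <- summary_data[order(summary_data$Region, -summary_data$Count), ]
sorted$Site <- factor(sorted$Site, levels = sorted$Site)

# Discrete x levels are drawn at integer positions 1, 2, ..., so a region
# spans from (first position - 0.5) to (last position + 0.5)
pos     <- as.integer(sorted$Site)
regions <- unique(sorted$Region)
spans <- data.frame(
  Region = regions,
  xmin   = as.numeric(tapply(pos, sorted$Region, min)[regions]) - 0.5,
  xmax   = as.numeric(tapply(pos, sorted$Region, max)[regions]) + 0.5
)
spans$xmid <- (spans$xmin + spans$xmax) / 2

# Shade every other region with a constant grey, so the fill aesthetic
# stays free for another variable
shaded <- spans[seq(1, nrow(spans), by = 2), ]

bar <- ggplot(sorted, aes(x = Site, y = Count)) +
  scale_x_discrete() +  # declare x as discrete before the numeric rectangles
  geom_rect(data = shaded,
            aes(xmin = xmin, xmax = xmax, ymin = -Inf, ymax = Inf),
            inherit.aes = FALSE, fill = "grey85") +
  geom_col() +
  geom_text(data = spans, aes(x = xmid, y = Inf, label = Region),
            inherit.aes = FALSE, vjust = 1.5)
```

Calling print(bar) draws the plot. The explicit scale_x_discrete() is there because the rectangle layer comes first and carries numeric x values; without it the x scale may be set up as continuous and then reject the discrete Site values.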

Related

Adding values to ggplot points

I am using ggplot in R on a Mac, doing a line graph using the group option. I want to add the values that correspond to the end points for each of the lines. This is part of the data I am using:
Year Foundation Type No. of Houses Percent Shares
1 2000 Crawl Space 209529 16.84583
2 2001 Crawl Space 206431 16.58441
3 2002 Crawl Space 204327 15.58577
4 2003 Crawl Space 213328 15.39025
5 2004 Crawl Space 224195 14.63272
6 2005 Crawl Space 258254 15.91873
I run the following code:
ggplot(USbyFoundType, aes(x=Year, y=`Percent Shares`,
group=`Foundation Type`, color=`Foundation Type`)) +
geom_line()
I get this chart. I want to place the value at the end of each of the lines.
Thanks for any help
It would be nice to have a reproducible example, but something like:
library(dplyr)
endpts <- USbyFoundType %>%
  group_by(`Foundation Type`) %>%
  filter(Year == max(Year))
Then add
+ geom_text(data = endpts, aes(x = Year, y = `Percent Shares`,
                               colour = `Foundation Type`,
                               label = `Percent Shares`))
You'll probably have to play with horizontal justification (hjust), spacing (nudge_x), and margins (e.g. + expand_limits(x = 2030), since Year is on the x-axis).
This question is about plotting labels (not values) at the end of the lines, but contains lots of useful information about adjusting positioning, margins, clipping etc.
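Put together, the approach above might look like the following minimal sketch. The data frame here is hypothetical: the "Slab" rows and the rounded percentages are invented for illustration, and only the column names follow the question.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data; check.names = FALSE keeps the spaces in the column names
USbyFoundType <- data.frame(
  Year = rep(2000:2002, times = 2),
  `Foundation Type` = rep(c("Crawl Space", "Slab"), each = 3),
  `Percent Shares` = c(16.8, 16.6, 15.6, 50.1, 51.3, 52.0),
  check.names = FALSE
)

# One row per line: the observation at the last year of each group
endpts <- USbyFoundType %>%
  group_by(`Foundation Type`) %>%
  filter(Year == max(Year)) %>%
  ungroup()

p <- ggplot(USbyFoundType,
            aes(x = Year, y = `Percent Shares`, colour = `Foundation Type`)) +
  geom_line() +
  # Left-align the labels just to the right of each end point
  geom_text(data = endpts, aes(label = round(`Percent Shares`, 1)),
            hjust = 0, nudge_x = 0.05, show.legend = FALSE) +
  # Widen the x range so the labels are not cut off
  expand_limits(x = 2002.5)
```

Calling print(p) draws the chart. The values for hjust, nudge_x and expand_limits() are guesses that will need tuning for real data.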

R: draw unique set of factors for each facet in ggplot2 barchart

Suppose we have a set of commodities (apples, bananas, potatoes etc) distributed over different continents. We visualize their distribution on continents via faceted barcharts in the ggplot2 package, and these commodities (stored in the "stuff" field below) act as factors to be displayed on the x axis. Each continent has its own set of stuff, as shown in the data, although certain commodities (bananas) can be common to two or more continents. Here is a data example in wide format. The fields "average" and "giant" additionally subdivide the market, separating things into average and giant sizes (to be plotted with different colours).
data<-read.csv(text="continent,stuff,average,giant
North America,apples,20,30
North America,bananas,25,32
Europe,bananas,15,25
Europe,potatoes,10,20
Europe,mosquitoes,13,17
Asia,snakes,26,35
Asia,snails,7,15
Asia,pandas,10,20")
First we reduce the data to long format, and next plot it via geom_col() and faceting technique:
library(dplyr)
library(tidyr)
library(ggplot2)
data.tidied<-data %>%
gather(key=size, value=val,-continent,-stuff)
ggplot(data.tidied,aes(x=stuff,y=val,fill=size))+
geom_col(position="dodge")+
facet_grid(~continent)+coord_flip()
All factors in the stuff are aligned across all continents, although most of them are not needed, so there are many gaps. But we don't need any snails in North America and Europe, it is natural to have this field only for the Asia facet and so on. (To make things clearer, you may think of apples/bananas/potatoes as some geographical localities, unique for a continent: we do not have any California in Europe). So: how to display this situation using nevertheless faceting technique of ggplot (or any alternative)? That is: how to draw a unique set of factors for each facet?
You can use facet_wrap instead of facet_grid and specify scales = "free_y" (has to be free_y as you flipped the axes). But it makes the charts look a little odd, in my opinion.
data %>%
gather(size, val, -continent, -stuff) %>%
ggplot(aes(stuff, val)) +
geom_col(aes(fill = size), position = "dodge") +
facet_wrap(~continent, scales = "free_y") +
coord_flip()

ggplot scale_fill_discrete(breaks = user_countries) creates a second, undesired legend

I am trying to change the factor level ordering of a data frame column to control the legend ordering and ggplot coloring of factor levels specified by country name. Here is my dataframe country_hours:
countries hours
1 Brazil 17
2 Mexico 13
3 Poland 20
4 Indonesia 2
5 Norway 20
6 Poland 20
Here is how I try to plot subsets of the data frame depending on a list of selected countries, user_countries:
make_country_plot<-function(user_countries, country_hours_pre)
{
country_hours = country_hours_pre[which(country_hours_pre$countries %in% user_countries) ,]
country_hours$countries = factor(country_hours$countries, levels = c(user_countries))
p = ggplot(data=country_hours, aes(x=hours, color=countries))
for(name in user_countries){
p = p + geom_bar( data=subset(country_hours, countries==name), aes(y = (..count..)/sum(..count..), fill=countries), binwidth = 1, alpha = .3)
}
p = p + scale_y_continuous(labels = percent) + geom_density(size = 1, aes(color=countries), adjust=1) +
ggtitle("Baltic countries") + theme(plot.title = element_text(lineheight=.8, face="bold")) + scale_fill_discrete(breaks = user_countries)
}
This works great in that the coloring goes according to my desired order as does the top legend, but a second legend appears and shows a different order. Without scale_fill_discrete(breaks = user_countries) I do not get my desired order, but I also do not get two legends. In the plot shown below, the desired order, given by user_countries was
user_countries = c("Lithuania", "Latvia", "Estonia")
I'd like to get rid of this second legend. How can I do it?
I also have another problem, which is that the plotting/coloring is inconsistent between different plots. I'd like the "first" country to always be blue, but it's not always blue. Also the 'real' legend (darker/solid colors) is not always in the same position - sometimes it's below the incorrect/black legend. Why does this happen and how can I make this consistent across plots?
Also, different plots have different numbers of factor groups, sometimes more than 9, so I'd rather stick with standard ggplot coloring as most of the solutions for defining your own colors seem limited in the number of colors you can do (How to assign colors to categorical variables in ggplot2 that have stable mapping?)
You are mapping to two different aesthetics (color and fill) but you changed the scale specifications for only one of them. Doing this will always split a previously combined legend. There is a nice example of this on this page
To keep your legends combined, you'll want to add scale_color_discrete(breaks = user_countries) in addition to scale_fill_discrete(breaks = user_countries).
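A minimal sketch of that advice, using made-up data (the hours values are random draws and not from the question): both the color and the fill scale receive the same breaks, so ggplot2 can keep them in one legend.

```r
library(ggplot2)

# Hypothetical data for illustration only
set.seed(1)
country_hours <- data.frame(
  countries = rep(c("Estonia", "Latvia", "Lithuania"), each = 20),
  hours     = c(rpois(20, 8), rpois(20, 12), rpois(20, 16))
)

user_countries <- c("Lithuania", "Latvia", "Estonia")
country_hours$countries <- factor(country_hours$countries,
                                  levels = user_countries)

p <- ggplot(country_hours,
            aes(x = hours, colour = countries, fill = countries)) +
  geom_density(alpha = 0.3) +
  # Same breaks on both scales, so the two legends can be merged
  scale_fill_discrete(breaks = user_countries) +
  scale_colour_discrete(breaks = user_countries)
```

Because the factor levels follow user_countries, the default colors are assigned in that order, which should also help keep the first country's color consistent between plots.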
I don't have enough reputation to comment, but this previous question has a comprehensive answer.
Short answer is to change geom_density so that it doesn't map countries to color. That means moving color out of aes() and giving it a constant value, because a column name such as countries is not visible outside aes().
geom_density(size = 1, color = "black", adjust = 1)
(This should work, but I don't have an example to confirm.)

Ordering bars in a stacked bar plot using ggplot

The following is a simplified version of my dataframe (without too much loss in generality)
sales<-data.frame(ItemID=c(1,3,7,9,10,12),
Salesman=c("Bob","Sue","Jane","Bob","Sue","Jane"),
ProfitLoss=c(10.00,9.00,9.50,-7.50,-11.00,-1.00))
which produces
ItemID Salesman ProfitLoss
1 1 Bob 10.0
2 3 Sue 9.0
3 7 Jane 9.5
4 9 Bob -7.5
5 10 Sue -11.0
6 12 Jane -1.0
The following produces a stacked bar plot of each salesman's sales, ordered by the overall profit for each salesman.
sales$Salesman<-reorder(sales$Salesman,-sales$ProfitLoss,FUN="sum") #to order the bars
profits<-sales[which(sales$ProfitLoss>0),]
losses<-sales[which(sales$ProfitLoss<0),]
ggplot()+
geom_bar(data=losses,aes(x=Salesman, y=ProfitLoss),stat="identity", color="white")+
geom_bar(data=profits,aes(x=Salesman, y=ProfitLoss),stat="identity", color="white")
This works exactly as I desire. My issue arises when one of the salesmen has a profit but no loss, or a loss but no profit. For instance, changing sales to
sales<-data.frame(ItemID=c(1,3,7,9,10),
Salesman=c("Bob","Sue","Jane","Bob","Sue"),
ProfitLoss=c(10.00,9.00,9.50,-7.50,-11.00))
and reapplying the previous steps produces
So, the salesmen are clearly out of order. For this example I can cheat and plot my profits before losses like
ggplot()+
geom_bar(data=profits,aes(x=Salesman, y=ProfitLoss),stat="identity", color="white")+
geom_bar(data=losses,aes(x=Salesman, y=ProfitLoss),stat="identity", color="white")
but that won't work for my real dataset.
Edit: In my real dataset, each salesman has more than two sales, and for each salesman I've stacked the bars so that the smallest bars in magnitude are closest to the x axis and the largest bars (i.e. biggest profit, biggest loss) are farthest from the x axis. For this reason, I need to call geom_bar() on both the profits dataframe and the losses dataframe. (I originally left this information out to try to avoid making my question too complex.)
The problem is that the first geom_bar() call (on the losses data) only contains two of the salesmen, so the order on the x-axis changes. That's why calling profits first still works (all salesmen are present there). Your reordering works if you change the plot call to use the full data:
sales<-data.frame(ItemID=c(1,3,7,9,10),
Salesman=c("Bob","Sue","Jane","Bob","Sue"),
ProfitLoss=c(10.00,9.00,9.50,-7.50,-11.00))
#to order the bars
sales$Salesman<-reorder(sales$Salesman,-sales$ProfitLoss,FUN="sum")
# Changed plot call
ggplot(sales, aes(x = factor(Salesman), y = ProfitLoss)) +
geom_bar(stat = "identity",position="dodge",color="white")
-------------------------------------------------------------------------------
Following your edit: do you want the longest bars, i.e. the largest (profit + abs(losses)), furthest from the y-axis, rather than ordered by descending revenue? You can do this by changing the function passed to reorder. Apologies if I misunderstand.
I changed Jane's data so that it is the longest overall bar
sales<-data.frame(ItemID=c(1,3,7,9,10),
Salesman=c("Bob","Sue","Jane","Bob","Sue"),
ProfitLoss=c(10.00,9.00,29.50,-7.50,-11.00))
sales$Salesman<-reorder(sales$Salesman,-sales$ProfitLoss,function(z) sum(abs(z)))
ggplot(sales, aes(x = factor(Salesman), y = ProfitLoss)) +
geom_bar(stat = "identity",position="dodge",color="white")
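If both geom_bar() calls have to stay, as the edit to the question suggests, one possible alternative is to fix the axis order explicitly instead of letting the first layer decide it. This is a sketch under that assumption and is not taken from the answer above; it passes the reordered factor levels to scale_x_discrete(limits = ...).

```r
library(ggplot2)

sales <- data.frame(
  ItemID     = c(1, 3, 7, 9, 10),
  Salesman   = c("Bob", "Sue", "Jane", "Bob", "Sue"),
  ProfitLoss = c(10.00, 9.00, 9.50, -7.50, -11.00)
)

# Order salesmen by descending total profit
sales$Salesman <- reorder(sales$Salesman, -sales$ProfitLoss, FUN = sum)

profits <- sales[sales$ProfitLoss > 0, ]
losses  <- sales[sales$ProfitLoss < 0, ]

p <- ggplot() +
  geom_col(data = losses,  aes(x = Salesman, y = ProfitLoss), colour = "white") +
  geom_col(data = profits, aes(x = Salesman, y = ProfitLoss), colour = "white") +
  # Pin the x-axis order to the reordered levels, whichever layer comes first
  scale_x_discrete(limits = levels(sales$Salesman))
```

With this data the levels come out as Jane, Bob, Sue, and print(p) should show the bars in that order even though Jane has no row in losses.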

varying axis values in facet_wrap

I am working with a Danish dataset on immigrants by country of origin and age group. I transformed the data so I can see the top countries of origin for each age group.
I am plotting it using facet_wrap. What I would like to do is, since different age groups come from quite different areas, to show a different set of values for one axis in each facet. For example, those that are between 0 and 10 years old come from countries x,y and z, while those 10-20 years of age come from countries q, r, z and so on.
In my current version, it shows the entire set of values, including countries that are not in the top 10. I would like to show just the top ten countries of origin for each facet, in effect having different axis labels for each. (And, if it is possible, sorting by high to low for each facet).
Here is what I have so far:
library(ggplot2)
library(reshape)
###load and inspect data
load(url('http://dl.dropbox.com/u/7446674/dk_census.rda'))
head(dk_census)
###reshape for plotting--keep just a few age groups
dk_census.m <- melt(dk_census[dk_census$Age %in% c('0-9 år', '10-19 år','20-29 år','30-39 år'),c(1,2,4)])
###get top 10 observations for each age group, store in data frame
top10 <- by(dk_census.m[order(dk_census.m$Age,-dk_census.m$value),], dk_census.m$Age, head, n=10)
top10.df<-do.call("rbind", as.list(top10))
top10.df
###plot
ggplot(data=top10.df, aes(x=as.factor(Country), y=value)) +
geom_bar(stat="identity")+
coord_flip() +
facet_wrap(~Age)+
labs(title="Immigrants By Country by Age",x="Country of Origin",y="Population")
One option (that I actually strongly suspect you won't be happy with) is this:
p <- ggplot(data=top10.df, aes(x=Country, y=value)) +
geom_bar(stat="identity")+
coord_flip() +
facet_wrap(~Age)+
labs(title="Immigrants By Country by Age",x="Country of Origin",y="Population")
library(plyr) # provides dlply()
pp <- dlply(.data=top10.df,.(Age),function(x) {x$Country <- reorder(x$Country,x$value); p %+% x})
library(gridExtra)
do.call(grid.arrange,pp)
(Edited to sort each graph.)
Keep in mind that the only reason faceting exists is to plot multiple panels that share a common scale. So when you start asking to facet on some variable, but have the scales be different (oh, and also sort them separately on each panel as well) what you're doing is really no longer faceting. It's just making four different plots and arranging them together.
Using lattice (here I use latticeExtra for the ggplot2 theme), you can set relation = 'free' between panels. Here I am using abbreviate = TRUE to shorten long labels.
library(latticeExtra)
barchart(value~ Country|Age,data=top10.df,layout=c(2,2),
horizontal=T,
par.strip.text =list(cex=2),
scales=list(y=list(relation='free',cex=1.5,abbreviate=T,
labels=levels(factor(top10.df$Country)))),
# ,cex=1.5,abbreviate=F),
par.settings = ggplot2like(),axis=axis.grid,
main="Immigrants By Country by Age",
ylab="Country of Origin",
xlab="Population")
