Varying axis values in facet_wrap (R)

I am working with a Danish dataset on immigrants by country of origin and age group. I transformed the data so I can see the top countries of origin for each age group.
I am plotting it using facet_wrap. Because different age groups come from quite different areas, I would like each facet to show its own set of values on one axis. For example, those between 0 and 10 years old come from countries x, y and z, while those 10-20 years of age come from countries q, r, z, and so on.
In my current version, each facet shows the entire set of values, including countries that are not in that facet's top 10. I would like to show just the top ten countries of origin for each facet, in effect giving each facet different axis labels (and, if possible, sorting them from high to low within each facet).
Here is what I have so far:
library(ggplot2)
library(reshape)  # provides melt()
### load and inspect data
load(url('http://dl.dropbox.com/u/7446674/dk_census.rda'))
head(dk_census)
### reshape for plotting: keep just a few age groups
dk_census.m <- melt(dk_census[dk_census$Age %in% c('0-9 år', '10-19 år', '20-29 år', '30-39 år'), c(1, 2, 4)])
### get the top 10 observations for each age group and store them in a data frame
# split the *sorted* rows by their own Age column so the grouping lines up with the data
dk_sorted <- dk_census.m[order(dk_census.m$Age, -dk_census.m$value), ]
top10 <- by(dk_sorted, dk_sorted$Age, head, n = 10)
top10.df <- do.call("rbind", as.list(top10))
top10.df
### plot
ggplot(data = top10.df, aes(x = as.factor(Country), y = value)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  facet_wrap(~Age) +
  labs(title = "Immigrants By Country by Age", x = "Country of Origin", y = "Population")

One option (which I strongly suspect you won't be happy with) is this:
library(plyr)       # dlply() and .()
library(gridExtra)  # grid.arrange()
p <- ggplot(data = top10.df, aes(x = Country, y = value)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  facet_wrap(~Age) +
  labs(title = "Immigrants By Country by Age", x = "Country of Origin", y = "Population")
# one plot per age group: sort countries by value, then swap that subset into p with %+%
pp <- dlply(.data = top10.df, .(Age), function(x) {
  x$Country <- reorder(x$Country, x$value)
  p %+% x
})
do.call(grid.arrange, pp)
Keep in mind that faceting exists to plot multiple panels that share a common scale. So when you ask to facet on some variable but want the scales to differ (and also want each panel sorted separately), what you're doing is really no longer faceting. It's just making four different plots and arranging them together.
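If you do want to stay within a single faceted ggplot, here is a minimal sketch of a different route (not from this answer) that frees the country axis per panel and sorts each panel separately. It assumes ggplot2 3.3.0 or later, where geom_col() draws horizontal bars when y is discrete, so coord_flip() is not needed. The data frame is a hypothetical stand-in for top10.df using the placeholder country names from the question with made-up values, and the `___` separator is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical stand-in for top10.df; the values are made up for illustration
top10.df <- data.frame(
  Age     = rep(c("0-9", "10-19"), each = 3),
  Country = c("x", "y", "z", "q", "r", "z"),
  value   = c(300, 200, 100, 250, 150, 50)
)

# Sort within each age group (ascending, so the largest bar ends up on top)
top10.df <- top10.df[order(top10.df$Age, top10.df$value), ]

# One factor level per Age/Country pair, so a country shared by two age groups
# can sit at a different position in each panel ("___" is an arbitrary separator)
keys <- paste(top10.df$Country, top10.df$Age, sep = "___")
top10.df$key <- factor(keys, levels = keys)

p <- ggplot(top10.df, aes(x = value, y = key)) +
  geom_col() +
  # free_y gives each panel only the countries that appear in it
  facet_wrap(~Age, scales = "free_y") +
  # strip the "___Age" suffix so only the country name is shown
  scale_y_discrete(labels = function(k) sub("___.*$", "", k)) +
  labs(title = "Immigrants By Country by Age",
       y = "Country of Origin", x = "Population")
print(p)
```

The composite key is what makes per-panel sorting possible: a plain Country factor has a single global level order, so a country appearing in two panels would be forced into the same relative position in both.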

Using lattice (here I use `latticeExtra` for a ggplot2-like theme), you can set `relation='free'` between panels. Here I am using `abbreviate = TRUE` to shorten long labels.
library(latticeExtra)  # also attaches lattice; provides ggplot2like() and axis.grid
barchart(value ~ Country | Age, data = top10.df, layout = c(2, 2),
         horizontal = TRUE,
         par.strip.text = list(cex = 2),
         scales = list(y = list(relation = 'free', cex = 1.5, abbreviate = TRUE,
                                labels = levels(factor(top10.df$Country)))),
         par.settings = ggplot2like(), axis = axis.grid,
         main = "Immigrants By Country by Age",
         ylab = "Country of Origin",
         xlab = "Population")

Related

Not all Counts Appearing with geom_text

I have a data set of several features of several organisms. I'm displaying each feature by several different categories (e.g. species, location, population), both individually and in combination, as raw counts, as a percentage of the total sample size, and as a percentage within a given group.
My problem comes when I try to display a stacked bar chart with ggplot for the percentage of individuals within a group. Since the groups do not have the same number of individuals, I'd like to display the raw number (count) of individuals with that feature on their respective bars for context. I've managed to display the stacked percentage bar chart properly and to get the counts for the most populous groups to display, but I'm having trouble displaying the rest of the groups.
ggplot(data=All.k6,aes(x=Second.Dorsal))+
geom_bar(aes(fill=Species),position="fill")+
scale_y_continuous(labels=scales::percent)+
labs(x="Number of Second Dorsal Spines",y="Percentage of Individuals within Species",title="Second Dorsal Spines")+
geom_text(aes(label=..count..),stat='count',position=position_fill(vjust=0.5))
You need to include a group= aesthetic so that position_fill knows how to position things. In geom_bar you set the fill= aesthetic, so ggplot also groups by that aesthetic. In geom_text there is no fill=, so the only grouping comes from your x= aesthetic, and stat='count' returns one total per x value instead of one count per species. In your case, just add group=Species after your label= aesthetic. Here's an example:
library(ggplot2)
# sample dataset
set.seed(1234)
types <- data.frame(
  x = c('A','A','A','B','B','B','C','C','C'),
  x1 = rep(c('aa','bb','cc'), 3)
)
df <- types[sample(1:9, 50, replace = TRUE), ]
Plot without grouping:
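ggplot(df, aes(x = x)) +
  geom_bar(aes(fill = x1), position = 'fill') +
  scale_y_continuous(labels = scales::percent) +
  # after_stat(count) is the current spelling of ..count.. (ggplot2 >= 3.3.0)
  geom_text(aes(label = after_stat(count)), stat = 'count',
            position = position_fill(vjust = 0.5))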
Plot with group= aesthetic:
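ggplot(df, aes(x = x)) +
  geom_bar(aes(fill = x1), position = 'fill') +
  scale_y_continuous(labels = scales::percent) +
  # group = x1 makes stat_count and position_fill split each bar the same way geom_bar does
  geom_text(aes(label = after_stat(count), group = x1), stat = 'count',
            position = position_fill(vjust = 0.5))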
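Another way to see why the grouping matters, sketched here as an alternative rather than taken from the answer: count the combinations up front with dplyr::count() and map fill in the top-level aes(), so geom_col() and geom_text() inherit the same groups and position_fill() stacks bars and labels identically.

```r
library(ggplot2)
library(dplyr)

# Same sample data as above
set.seed(1234)
types <- data.frame(
  x  = c('A','A','A','B','B','B','C','C','C'),
  x1 = rep(c('aa','bb','cc'), 3)
)
df <- types[sample(1:9, 50, replace = TRUE), ]

# One row per x/x1 combination, with the count in column n
counts <- count(df, x, x1)

# fill is set in the top-level aes(), so both layers are grouped by x1
p <- ggplot(counts, aes(x = x, y = n, fill = x1)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(label = n), position = position_fill(vjust = 0.5))
print(p)
```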

R: draw unique set of factors for each facet in ggplot2 barchart

Suppose we have a set of commodities (apples, bananas, potatoes, etc.) distributed over different continents. We visualize their distribution across continents with faceted bar charts from the ggplot2 package, and these commodities (called the "stuff" field in what follows) act as the factor displayed on the x axis. Each continent has its own set of stuff, as shown in the data, although certain commodities (bananas) can be common to two or more continents. Here is a short example of the data. The fields "average" and "giant" further subdivide the market into medium and big sizes (to be plotted in different colours).
data<-read.csv(text="continent,stuff,average,giant
North America,apples,20,30
North America,bananas,25,32
Europe,bananas,15,25
Europe,potatoes,10,20
Europe,mosquitoes,13,17
Asia,snakes,26,35
Asia,snails,7,15
Asia,pandas,10,20")
First we reshape the data into long format, then plot it with geom_col() and faceting:
library(dplyr)
library(tidyr)
library(ggplot2)
data.tidied<-data %>%
gather(key=size, value=val,-continent,-stuff)
ggplot(data.tidied,aes(x=stuff,y=val,fill=size))+
geom_col(position="dodge")+
facet_grid(~continent)+coord_flip()
All levels of stuff are aligned across all continents, although most of them are not needed in any given facet, so there are many gaps. We don't need any snails in North America or Europe; it is natural to have this field only in the Asia facet, and so on. (To make things clearer, you may think of apples/bananas/potatoes as geographical localities unique to a continent: we do not have any California in Europe.) So how can we display this situation while still using ggplot's faceting (or some alternative)? That is, how do we draw a unique set of factor levels for each facet?
You can use facet_wrap instead of facet_grid and specify scales = "free_y" (has to be free_y as you flipped the axes). But it makes the charts look a little odd, in my opinion.
data %>%
gather(size, val, -continent, -stuff) %>%
ggplot(aes(stuff, val)) +
geom_col(aes(fill = size), position = "dodge") +
facet_wrap(~continent, scales = "free_y") +
coord_flip()
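A variation on this answer, sketched under the assumption of ggplot2 3.3.0 or later and tidyr 1.0.0 or later (for pivot_longer(), which supersedes gather()): facet_grid() accepts space = "free_y", which sizes each panel by how many items it holds, so the panels look less odd. Mapping stuff to y directly replaces coord_flip().

```r
library(dplyr)
library(tidyr)
library(ggplot2)

data <- read.csv(text = "continent,stuff,average,giant
North America,apples,20,30
North America,bananas,25,32
Europe,bananas,15,25
Europe,potatoes,10,20
Europe,mosquitoes,13,17
Asia,snakes,26,35
Asia,snails,7,15
Asia,pandas,10,20")

# Long format: one row per continent/stuff/size
data.tidied <- data %>%
  pivot_longer(c(average, giant), names_to = "size", values_to = "val")

# Horizontal bars via a discrete y; free_y drops unused stuff from each row of
# panels, and space = "free_y" makes panel heights follow the number of items
p <- ggplot(data.tidied, aes(x = val, y = stuff, fill = size)) +
  geom_col(position = "dodge") +
  facet_grid(continent ~ ., scales = "free_y", space = "free_y")
print(p)
```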

Creating density plots from two different data-frames using ggplot2

My goal is to compare the distribution of various socioeconomic factors, such as income, over multiple years to see how the population of a particular region has evolved over, say, 5 years. The primary data for this comes from the Public Use Microdata Sample. I am using R + ggplot2 as my preferred tool.
When comparing two years' worth of data (2005 and 2010), I have two data frames, hh2005 and hh2010, with the household data for the two years. The income data for both years are stored in the variable hincp in both data frames. Using ggplot2, I create the density plot for an individual year as follows (example for 2010):
p1 <- ggplot(data = hh2010, aes(x=hincp))+
geom_density()+
labs(title = "Distribution of income for 2010")+
labs(y="Density")+
labs(x="Household Income")
p1
How do I overlay the 2005 density on this plot? Having read the data in as hh2010, I am not sure how to proceed. Should I be processing the data in a fundamentally different way from the very beginning?
You can pass data arguments to individual geoms, so you should be able to add the second density as a new geom like this:
p1 <- ggplot(data = hh2010, aes(x=hincp))+
geom_density() +
# Change the fill colour to differentiate it
geom_density(data=hh2005, fill="purple") +
labs(title = "Distribution of income for 2010")+
labs(y="Density")+
labs(x="Household Income")
This is how I would approach the problem:
Tag each data frame with the variable of interest (in this case, the year)
Merge the two data sets
Update the 'fill' aesthetic in the ggplot function
For example:
# tag each data frame with the year^
hh2005$year <- as.factor(2005)
hh2010$year <- as.factor(2010)
# merge the two data sets
d <- rbind(hh2005, hh2010)
d$year <- as.factor(d$year)
# update the aesthetic
p1 <- ggplot(data = d, aes(x=hincp, fill=year)) +
geom_density(alpha=.5) +
labs(title = "Distribution of income for 2005 and 2010") +
labs(y="Density") +
labs(x="Household Income")
p1
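^ Note: 'fill' should be mapped to a factor. A numeric year would be treated as a continuous variable, which does not split the data into groups, so you would get a single density rather than one per year. I also set the transparency of the overlapping density plots with the 'alpha' parameter.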
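rbind() only works when hh2005 and hh2010 have exactly the same columns, which may not hold for extracts from different years. A minimal sketch that keeps just the income column, with simulated incomes standing in for the real PUMS data (the rlnorm() parameters are arbitrary):

```r
library(ggplot2)

# Hypothetical stand-ins for the household data: only hincp is simulated
set.seed(42)
hh2005 <- data.frame(hincp = rlnorm(500, meanlog = 10.8, sdlog = 0.7))
hh2010 <- data.frame(hincp = rlnorm(500, meanlog = 10.9, sdlog = 0.8))

# Keep only the column needed for the plot, so rbind() does not depend on
# the two data frames having identical columns
d <- rbind(
  data.frame(hincp = hh2005$hincp, year = "2005"),
  data.frame(hincp = hh2010$hincp, year = "2010")
)
d$year <- factor(d$year)

p1 <- ggplot(d, aes(x = hincp, fill = year)) +
  geom_density(alpha = 0.5) +
  labs(title = "Distribution of income for 2005 and 2010",
       x = "Household Income", y = "Density")
print(p1)
```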

Transform a ggplot stacked bar into pie chart or alternative

I am having trouble deciding how to graph the data I have.
It consists of overlapping quantities that represent a population, hence my decision to use a stacked bar.
These represent six population divisions ("groups"), wherein group 1 and group 2 are the main division. Groups 4 to 6 are subgroups of two, and these are subgroups of each other. A simple diagram is below:
Note: groups 1 and 2 complete the entire population or group 1 + group 2 = 100%.
I want all of this information in one chart, but I do not know what kind of chart to use or how to implement it.
So far I have the one below, which is wrong because Group 1 is included in the main bar.
require(ggplot2)
require(reshape)
tab <- data.frame(
set=c("XXX","XXX","XXX","XXX","XXX","XXX"),
group=c("1","6","5","4","3","2"),
rate=as.numeric(c(10000,20000,50000,55000,75000,100000))
)
dat <- melt(tab)
dat$time <- factor(dat$group,levels=dat$group)
ggplot(dat,aes(x=set)) +
geom_bar(aes(weight=value,fill=group),position="fill",color="#7F7F7F") +
scale_fill_brewer("Groups", palette="OrRd")
What do you guys suggest to visualize it? I want to use R and ggplot for consistency and uniformity with the other graphs I have made already.
Using facets you can divide your plot into two:
# changed value of set for group 1
tab <- data.frame(
set=c("UUU","XXX","XXX","XXX","XXX","XXX"),
group=c("1","6","5","4","3","2"),
rate=as.numeric(c(10000,20000,50000,55000,75000,100000))
)
# explicitly defined id.vars
dat <- melt(tab, id.vars=c('set','group'))
dat$time <- factor(dat$group,levels=dat$group)
# added facet_wrap, in geom_bar aes changed weight to y,
# added stat="identity", changed position="stack"
ggplot(dat, aes(x = set)) +
  geom_bar(aes(y = value, fill = group), position = "stack", stat = "identity", color = "#7F7F7F") +
  scale_fill_brewer("Groups", palette = "OrRd") +
  facet_wrap(~set, scales = "free_x")
My guess is that what you need is a treemap. Please correct me if I misunderstood your question.
If a treemap is what you need, you can use either the portfolio package or googleVis.

ggplot boxplots with scatterplot overlay (same variables)

I'm an undergrad researcher and I've been teaching myself R over the past few months. I just started trying ggplot, and have run into some trouble. I've made a series of boxplots looking at the depth of fish at different acoustic receiver stations. I'd like to add a scatterplot that shows the depths of the receiver stations. This is what I have so far:
data <- read.csv(".....MPS.csv", header=TRUE)
df <- data.frame(f1=factor(data$Tagging.location),
                 f2=factor(data$Station), data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), data$depth)
df$f1f2 <- interaction(df$f1, df$f2)
plot1 <- ggplot(aes(y = data$Detection.depth, x = f2, fill = f1), data = df) +
  geom_boxplot() + stat_summary(fun.data = give.n, geom = "text",
    position = position_dodge(height = 0, width = 0.75), size = 3)
plot1 + xlab("MPS Station") + ylab("Depth(m)") +
  theme(legend.title=element_blank()) + scale_y_reverse() +
  coord_cartesian(ylim=c(150, -10))
plot2 <- ggplot(aes(y=data$depth, x=f2), data=df2) + geom_point()
plot2 + scale_y_reverse() + coord_cartesian(ylim=c(150,-10)) +
  xlab("MPS Station") + ylab("Depth (m)")
Unfortunately, since I'm a new user in this forum, I'm not allowed to upload images of these two plots. My x-axis is "Stations" (which has 12 options) and my y-axis is "Depth" (0-150 m). The boxplots are colour-coded by tagging site (which has 2 options). The depths are coming from two different columns in my spreadsheet, and they cannot be combined into one.
My goal is to combine those two plots by adding "plot2" (the station depth scatterplot) to the "plot1" boxplots (detection depths). They both look at the same variables (depth and station) and must share the same y-axis scale.
I think I could figure out a messy workaround if I were using the R base program, but I would like to learn ggplot properly, if possible. Any help is greatly appreciated!
Update: I was confused by the language used in the original post, and wrote a slightly more complicated answer than necessary. Here is the cleaned up version.
Step 1: Setting up. Here, we make sure the depth values in both data frames have the same variable name (for readability).
df <- data.frame(f1=factor(data$Tagging.location), f2=factor(data$Station), depth=data$Detection.depth)
df2 <- data.frame(f2=factor(data$Station), depth=data$depth)
Step 2: Now you can plot this with the 'ggplot' function and split the data by using the `col=f1` aesthetic. We'll plot the detection data separately, since that requires a boxplot, and then we'll plot the depths of the stations with colored points (assuming each station only has one depth). We specify the two different plots by passing the data to the individual 'geom' functions instead of to the main 'ggplot' function. It should look something like this:
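library(ggplot2)
ggplot() +
  geom_boxplot(data = df, aes(x = f2, y = depth, col = f1)) +       # detection depths, coloured by tagging site
  geom_point(data = df2, aes(x = f2, y = depth), colour = "blue") +  # station depths
  scale_y_reverse()                                                  # depth increases downwards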
In this plot example, we use boxplots to represent the detection data and color those boxplots by the site label. The stations, however, we plot separately using a specific color of points, so we will be able to see them clearly in relation to the boxplots.
You should be able to adjust the plot from here to suit your needs.
I've created some dummy data and loaded it into the chart to show you what it would look like. Keep in mind that this is purely random data and doesn't really make sense.
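The dummy data itself is not shown. A comparable sketch with purely random values runs Steps 1 and 2 end to end; the station names, site labels, and station depths below are made up, and only the column names follow the question.

```r
library(ggplot2)

# Purely random dummy data (hypothetical values); column names follow the question
set.seed(1)
stations <- paste0("S", 1:4)
data <- data.frame(
  Tagging.location = sample(c("Site A", "Site B"), 200, replace = TRUE),
  Station          = sample(stations, 200, replace = TRUE),
  Detection.depth  = runif(200, 0, 150)
)
# One (made-up) depth per station
station.depths <- data.frame(Station = stations, depth = c(40, 75, 110, 140))

# Step 1: same variable name for depth in both data frames
df  <- data.frame(f1 = factor(data$Tagging.location),
                  f2 = factor(data$Station),
                  depth = data$Detection.depth)
df2 <- data.frame(f2 = factor(station.depths$Station),
                  depth = station.depths$depth)

# Step 2: each geom gets its own data frame
p <- ggplot() +
  geom_boxplot(data = df, aes(x = f2, y = depth, col = f1)) +
  geom_point(data = df2, aes(x = f2, y = depth), colour = "blue", size = 3) +
  scale_y_reverse() +
  labs(x = "MPS Station", y = "Depth (m)")
print(p)
```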

Resources