How to drop unused factors in faceted R ggplot boxplot?

Below is some example code I use to make some boxplots:
stest <- read.table(text=" site year conc
south 2001 5.3
south 2001 4.67
south 2001 4.98
south 2002 5.76
south 2002 5.93
north 2001 4.64
north 2001 6.32
north 2003 11.5
north 2003 6.3
north 2004 9.6
north 2004 56.11
north 2004 63.55
north 2004 61.35
north 2005 67.11
north 2006 39.17
north 2006 43.51
north 2006 76.21
north 2006 158.89
north 2006 122.27
", header=TRUE)
require(ggplot2)
ggplot(stest, aes(x=year, y=conc)) +
geom_boxplot(horizontal=TRUE) +
facet_wrap(~site, ncol=1) +
coord_flip() +
scale_y_log10()
Which results in this:
I tried everything I could think of but cannot make a plot where the south facet only contains years where data is displayed (2001 and 2002). Is what I am trying to do possible?

Use the scales='free_x' argument to facet_wrap. But I suspect you'll need to do more than that to get the plot you're looking for.
Specifically, use aes(x=factor(year), y=conc) in your initial ggplot call, so that year is treated as a discrete variable.
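To make that concrete, here is a minimal sketch of one way to get per-facet year axes. It is a variation rather than the exact call above: it assumes ggplot2 3.3.0 or later, where geom_boxplot() can draw horizontal boxes when the discrete variable is mapped to y, so coord_flip() is left out and the free axis is simply the y axis.

```r
library(ggplot2)

# Same data as in the question.
stest <- read.table(text = "site year conc
south 2001 5.3
south 2001 4.67
south 2001 4.98
south 2002 5.76
south 2002 5.93
north 2001 4.64
north 2001 6.32
north 2003 11.5
north 2003 6.3
north 2004 9.6
north 2004 56.11
north 2004 63.55
north 2004 61.35
north 2005 67.11
north 2006 39.17
north 2006 43.51
north 2006 76.21
north 2006 158.89
north 2006 122.27
", header = TRUE)

# Assumption: ggplot2 >= 3.3.0, which picks the horizontal orientation
# automatically when y is discrete and x is continuous.
p <- ggplot(stest, aes(x = conc, y = factor(year))) +
  geom_boxplot() +
  # free_y lets each facet keep only the years that occur in it
  facet_wrap(~site, ncol = 1, scales = "free_y") +
  scale_x_log10() +
  labs(x = "conc", y = "year")
p
```

With facet_wrap both panels get the same height, so the two south boxes come out taller than the north ones. If that matters, facet_grid(site ~ ., scales = "free_y", space = "free_y") is worth trying, since it sizes each panel by the number of years it holds.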

A simple way to work around the problem (with a fairly good result) is to build the two boxplots separately and then stack them with grid.arrange() from the gridExtra package.
library(ggplot2)
library(gridExtra)
# One plot per site, so each keeps only its own years
p1 <- ggplot(subset(stest, site == "north"), aes(x = factor(year), y = conc)) +
  geom_boxplot() + coord_flip() + scale_y_log10(name = "")
p2 <- ggplot(subset(stest, site == "south"), aes(x = factor(year), y = conc)) +
  geom_boxplot() + coord_flip() +
  scale_y_log10(name = "X Title", breaks = seq(4, 6, by = 0.5))
# Stack the two plots in a single column
grid.arrange(p1, p2, ncol = 1)

Related

Plotting in ggplot2 with geom_line() with label

I'm trying to plot this dataset with ggplot2, drawing one geom_line() per country and labelling each line with the country's name, with Year on the x axis and each country's values on the y axis.
This is what I have so far. I wanted to include the name of the country on each line. The problem is that each country has its data in a separate column.
If you want to use ggplot2, you should bring your data into a "longer" format first. Using the tidyr package:
library(tidyr)
# Stack every column except Year into Country/Value pairs
df <- pivot_longer(df, cols = -Year,
  names_to = "Country",
  values_to = "Value")
gives you
# A tibble: 108 x 3
Year Country Value
<dbl> <chr> <dbl>
1 1995 Argentina 4122262
2 1995 Bolivia 3409890
3 1995 Brazil 36276255
4 1995 Chile 2222563
5 1995 Colombia 10279222
6 1995 Costa_Rica 1611055
7 1997 Argentina 4100563
8 1997 Bolivia 3391943
9 1997 Brazil 35718095
10 1997 Chile 2208382
Based on this it is easy to plot a line for each country using ggplot2:
ggplot(df, aes(x=Year, y=Value, color=Country)) +
geom_line()
You have more or less answered your own question: you need the reshape2 package to bring all countries into a single column.
library(reshape2)
library(ggplot2)
# Example data in the same wide layout (one column per country)
Year <- c(1991, 1992, 1993, 1994, 1995, 1996)
Argentina <- c(235, 531, 3251, 3153, 13851, 16513)
Mexico <- c(16503, 16035, 3516, 3155, 30351, 16513)
Japan <- c(1651, 868416, 68165, 35135, 3, 136816)
df <- data.frame(Year, Argentina, Mexico, Japan)
# measure.vars names the columns to stack into variable/value pairs
df2 <- melt(data = df, id.vars = "Year", measure.vars = c("Argentina", "Mexico", "Japan"))
ggplot(df2, aes(x = Year, y = value, group = variable, color = variable)) +
  geom_line()
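Neither answer puts the country name on the line itself, which is what the question asks for. One possible sketch, using the example data from the answer above: reshape to long format, then add a geom_text() layer that only gets the last year's rows, so each name sits at the right-hand end of its line. The object names long and ends are made up for this illustration.

```r
library(ggplot2)
library(tidyr)

# Example data from the answer above (wide: one column per country).
df <- data.frame(
  Year      = 1991:1996,
  Argentina = c(235, 531, 3251, 3153, 13851, 16513),
  Mexico    = c(16503, 16035, 3516, 3155, 30351, 16513),
  Japan     = c(1651, 868416, 68165, 35135, 3, 136816)
)

# Long format: one row per Year/Country pair.
long <- pivot_longer(df, cols = -Year, names_to = "Country", values_to = "Value")

# Rows for the final year only; these give the label positions.
ends <- long[long$Year == max(long$Year), ]

p <- ggplot(long, aes(x = Year, y = Value, colour = Country)) +
  geom_line() +
  # Write each country name just to the right of the end of its line
  geom_text(data = ends, aes(label = Country),
            hjust = 0, nudge_x = 0.1, show.legend = FALSE) +
  # Leave room on the right so the labels are not clipped
  expand_limits(x = max(long$Year) + 1)
p
```

When two lines end at the same value, as Argentina and Mexico do here, the labels overlap; nudging them by hand or using a label-repelling package are the usual ways around that.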

R ggplot barplot with people name over it

I have a data frame like the one below, covering 4 years:
State Sex Year Name Percent
Arizona M 1962 John 0.3
Arizona F 1962 Mary 0.6
Arizona M 1963 Peter 0.4
Arizona F 1963 Jane 0.9
Arizona M 1964 Dave 0.7
Arizona F 1964 Lara 0.3
Arizona M 1965 Den 0.7
Arizona F 1965 Kate 0.2
I need a bar plot with each person's name over the bars for every year, using only two colours, such as green and red.
So in my case:
the x-axis is Year
the y-axis is Percent
the labels over the bars are the people's names, and the bars should be red and green instead of blue.
You can do it all in ggplot with stat_summary to place the text as well. The key is to use the cumsum to get the y-positions.
ggplot(df, aes(x=Year, y=Percent, fill=Sex)) +
geom_bar(stat='identity') +
stat_summary(aes(label=Name, order=desc(Sex)), fun.y=cumsum,
position='stack', geom='text', vjust=1)
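As far as I know, the order aesthetic was dropped in ggplot2 2.0.0 and fun.y was deprecated in favour of fun in 3.3.0, so the call above may not run unchanged on a current installation. A sketch of a similar result on recent versions, swapping in position_stack() to place the labels instead of cumsum, and adding the red/green fill the question asks for:

```r
library(ggplot2)

# Data from the question.
df <- read.table(text = "State Sex Year Name Percent
Arizona M 1962 John 0.3
Arizona F 1962 Mary 0.6
Arizona M 1963 Peter 0.4
Arizona F 1963 Jane 0.9
Arizona M 1964 Dave 0.7
Arizona F 1964 Lara 0.3
Arizona M 1965 Den 0.7
Arizona F 1965 Kate 0.2
", header = TRUE, stringsAsFactors = FALSE)

p <- ggplot(df, aes(x = Year, y = Percent, fill = Sex)) +
  # geom_col() is geom_bar(stat = "identity")
  geom_col() +
  # position_stack() stacks the labels the same way as the bars;
  # vjust = 0.5 centres each name inside its own segment
  geom_text(aes(label = Name), position = position_stack(vjust = 0.5)) +
  # Two fixed colours instead of the default palette
  scale_fill_manual(values = c(F = "red", M = "green"))
p
```

Using position_stack(vjust = 1) instead moves each name to the top edge of its segment.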
Here is a solution. The only problem is the position of the text labels: you have to compute them beforehand. My solution assumes there are only two observations per year and that they are ordered M first, F second.
txt <- readLines(n=9)
State Sex Year Name Percent
Arizona M 1962 John 0.3
Arizona F 1962 Mary 0.6
Arizona M 1963 Peter 0.4
Arizona F 1963 Jane 0.9
Arizona M 1964 Dave 0.7
Arizona F 1964 Lara 0.3
Arizona M 1965 Den 0.7
Arizona F 1965 Kate 0.2
df <- read.table(text=txt,head=TRUE,stringsAsFactors = FALSE)
library(ggplot2)
library(dplyr)
df <- group_by(df,Year) %>%
mutate(pos=ifelse(Sex=="M",Percent,Percent+lag(Percent)))
ggplot(df,aes(x=Year,label=Name,fill=Sex)) +
geom_bar(aes(y=Percent),stat="identity",position="stack") +
geom_text(aes(y=pos),vjust=1)

How to reverse coordinates on a line graph ggplot2 R

I'm working on a data visualization project and am making some line graphs. This is my data set:
groupA <- read.csv("afcongroupA.csv", header=T, row.names=NULL)
groupA
Date Team Position
1 1/12 South Africa 56
2 1/12 Angola 85
3 1/12 Morocco 61
4 1/12 Cape Verde Islands 58
5 4/12 South Africa 71
6 4/12 Angola 78
7 4/12 Morocco 62
8 4/12 Cape Verde Islands 76
9 8/12 South Africa 67
10 8/12 Angola 85
11 8/12 Morocco 68
12 8/12 Cape Verde Islands 78
13 12/12 South Africa 87
14 12/12 Angola 84
15 12/12 Morocco 72
16 12/12 Cape Verde Islands 69
I then plotted them on a line graph to show the rise or decline in position standings:
groupA$Date <- factor(groupA$Date, levels=groupA$Date[!duplicated(groupA$Date)])
ggplot(groupA, aes(x=Date, y=Position, colour=Team, group=Team)) + geom_line()
What I want to do is reverse the y-axis so that the largest number is at the bottom. I tried this bit of code:
groupA <- coord_flip() + scale_x_reverse()
But I get this error message:
Error in coord_flip() + scale_x_reverse() :
non-numeric argument to binary operator
I'm using R 2.15.2 on a Mac running OS X.
Because your Date column is a factor, scale_x_reverse() won't work: it applies only to continuous scales. One solution is to reverse the order of the factor levels in the data frame:
groupA$Date <- factor(groupA$Date, levels=rev(unique(groupA$Date)))
Then use your code to make the plot and flip the axes.
ggplot(groupA, aes(x=Date, y=Position, colour=Team, group=Team)) +
geom_line()+coord_flip()
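If the goal is only what the question states, the largest Position at the bottom of the y-axis, a shorter route may be scale_y_reverse(), since Position is numeric. Note that scales and coords have to be added to a plot object; the error in the question comes from adding coord_flip() and scale_x_reverse() to each other with no plot on the left. A sketch, with the data from the question typed in by hand instead of read from the CSV:

```r
library(ggplot2)

# Data from the question, entered directly.
groupA <- data.frame(
  Date     = rep(c("1/12", "4/12", "8/12", "12/12"), each = 4),
  Team     = rep(c("South Africa", "Angola", "Morocco", "Cape Verde Islands"),
                 times = 4),
  Position = c(56, 85, 61, 58, 71, 78, 62, 76, 67, 85, 68, 78, 87, 84, 72, 69)
)

# Keep the dates in order of appearance rather than alphabetical order.
groupA$Date <- factor(groupA$Date, levels = unique(groupA$Date))

p <- ggplot(groupA, aes(x = Date, y = Position, colour = Team, group = Team)) +
  geom_line() +
  # Reverse the continuous y axis: large values are drawn at the bottom
  scale_y_reverse()
p
```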

Change colour scheme for ggplot geom_polygon in R

I'm creating a map using the maps library and ggplot's geom_polygon. I'd simply like to change the default blue, red, purple colour scheme to something else. I'm extremely new to ggplot so please forgive if I'm just not using the right data types. Here's what the data I'm using looks like:
> head(m)
region long lat group order subregion Group.1 debt.to.income.ratio.mean ratio total
17 alabama -87.46201 30.38968 1 1 <NA> alabama 12.4059 20.51282 39
18 alabama -87.48493 30.37249 1 2 <NA> alabama 12.4059 20.51282 39
19 alabama -87.52503 30.37249 1 3 <NA> alabama 12.4059 20.51282 39
20 alabama -87.53076 30.33239 1 4 <NA> alabama 12.4059 20.51282 39
21 alabama -87.57087 30.32665 1 5 <NA> alabama 12.4059 20.51282 39
22 alabama -87.58806 30.32665 1 6 <NA> alabama 12.4059 20.51282 39
> head(v)
Group.1 debt.to.income.ratio.mean ratio region total
alabama alabama 12.40590 20.51282 alabama 39
alaska alaska 11.05333 33.33333 alaska 6
arizona arizona 11.62867 25.55556 arizona 90
arkansas arkansas 11.90300 5.00000 arkansas 20
california california 11.00183 32.59587 california 678
colorado colorado 11.55424 30.43478 colorado 92
Here's the code:
library(ggplot2)
library(maps)
states <- map_data("state")
m <- merge(states, v, by="region")
m <- m[order(m$order),]
p<-qplot(long, lat, data=m, group=group, fill=ratio, geom="polygon")
I've tried the below and more:
cols <- c("8" = "red","4" = "blue","6" = "darkgreen", "10" = "orange")
p + scale_colour_manual(values = cols)
p + scale_colour_brewer(palette="Set1")
p + scale_color_manual(values=c("#CC6666", "#9999CC"))
The problem is that you are using a color scale but are using the fill aesthetic in the plot. You can use scale_fill_gradient() for two colors and scale_fill_gradient2() for three colors:
p + scale_fill_gradient(low = "pink", high = "green") #UGLY COLORS!!!
I was getting issues with scale_fill_brewer() complaining about a continuous variable supplied when a discrete variable was expected. One easy fix is to create discrete bins with cut() and then use that as the fill aesthetic:
m$breaks <- cut(m$ratio, 5) #Change to number of bins you want
p <- qplot(long, lat, data = m, group = group, fill = breaks, geom = "polygon")
p + scale_fill_brewer(palette = "Blues")
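For anyone who wants to try both approaches without the maps data, here is a small self-contained sketch. The three squares and their ratio values are made up purely for illustration, make_square() is a hypothetical helper, and ggplot() is used in place of qplot().

```r
library(ggplot2)

# Toy data (invented for this example): three unit squares,
# each carrying a single 'ratio' value, in the same long/lat/group layout.
make_square <- function(id, ratio) {
  data.frame(group = id,
             long  = id + c(0, 1, 1, 0),
             lat   = c(0, 0, 1, 1),
             ratio = ratio)
}
m <- rbind(make_square(1, 5), make_square(2, 20), make_square(3, 33))

# Continuous fill: a two-colour gradient on the fill aesthetic.
p_cont <- ggplot(m, aes(long, lat, group = group, fill = ratio)) +
  geom_polygon() +
  scale_fill_gradient(low = "white", high = "darkblue")

# Discrete fill: bin the values first, then use a ColorBrewer palette.
m$bin <- cut(m$ratio, 3)
p_bin <- ggplot(m, aes(long, lat, group = group, fill = bin)) +
  geom_polygon() +
  scale_fill_brewer(palette = "Blues")

p_cont
p_bin
```

The point in both cases is that the scale function has to match the aesthetic (fill, not colour) and the type of variable (continuous or discrete).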

Time Series in R with ggplot2

I'm a ggplot2 newbie and have a rather simple question regarding time-series plots.
I have a data set in which the data is structured as follows.
Area 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007
MIDWEST 10 6 13 14 12 8 10 10 6 9
How do I generate a time series when the data is structured in this format?
With the reshape package, I could just alter the data to look like:
totmidc <- melt(totmidb, id="Area")
totmidc
Area variable value
1 MIDWEST 1998 10
2 MIDWEST 1999 6
3 MIDWEST 2000 13
4 MIDWEST 2001 14
5 MIDWEST 2002 12
6 MIDWEST 2003 8
7 MIDWEST 2004 10
8 MIDWEST 2005 10
9 MIDWEST 2006 6
10 MIDWEST 2007 9
Then run the following code to get the desired plot.
ggplot(totmidc, aes(variable, value)) + geom_line() + xlab("") + ylab("")
However, is it possible to generate a time series plot from the first object, in which the columns represent the years?
What is the error that ggplot2 gives you? The following seems to work on my machine:
Area <- as.numeric(unlist(strsplit("1998 1999 2000 2001 2002 2003 2004 2005 2006 2007", "\\s+")))
MIDWEST <-as.numeric(unlist(strsplit("10 6 13 14 12 8 10 10 6 9", "\\s+")))
qplot(Area, MIDWEST, geom = "line") + xlab("") + ylab("")
#Or in a dataframe
df <- data.frame(Area, MIDWEST)
qplot(Area, MIDWEST, data = df, geom = "line") + xlab("") + ylab("")
You may also want to check out the ggplot2 website for details on scale_x_date() and related scales.
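As for plotting straight from the wide object: ggplot2 maps one column to each aesthetic, so the year columns need to be stacked into a single column at some point. A possible sketch with tidyr, using the values from the question; the names wide and long are made up here, and the year names are converted to integers so that geom_line() sees a continuous x axis:

```r
library(ggplot2)
library(tidyr)

# The wide data from the question; check.names = FALSE keeps "1998" etc.
# as column names instead of turning them into X1998.
wide <- data.frame(
  Area = "MIDWEST",
  `1998` = 10, `1999` = 6, `2000` = 13, `2001` = 14, `2002` = 12,
  `2003` = 8, `2004` = 10, `2005` = 10, `2006` = 6, `2007` = 9,
  check.names = FALSE
)

# Stack the year columns into year/value pairs.
long <- pivot_longer(wide, cols = -Area, names_to = "year", values_to = "value")

# Column names are strings, so convert the years to numbers.
long$year <- as.integer(long$year)

p <- ggplot(long, aes(x = year, y = value)) +
  geom_line() +
  xlab("") + ylab("")
p
```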
I am guessing that with "time series plot" you mean you want to get a bar chart instead of a line chart?
In that case, you only have to modify your code slightly to pass the correct parameters to geom_bar(). The default stat of geom_bar() counts the rows in each category on the x-scale (stat_bin in older ggplot2 releases, stat_count since 2.0.0). With your data you want to override this behaviour and use stat_identity.
library(ggplot2)
# Recreate data
totmidc <- data.frame(
Area = rep("MIDWEST", 10),
variable = 1998:2007,
value = round(runif(10)*10+1)
)
# Line plot
ggplot(totmidc, aes(variable, value)) + geom_line() + xlab("") + ylab("")
# Bar plot
# Note that the parameter stat="identity" passed to geom_bar()
ggplot(totmidc, aes(x=variable, y=value)) + geom_bar(stat="identity") + xlab("") + ylab("")
This produces the following bar plot:
