ggplot: order of factors with duplicate levels - r

ggplot changes the order of an axis variable, which I do not want. I know I can change the variable to a factor and specify the levels to get around this, but what if the levels contain duplicate values?
An example is below. The only alternative I can think of is to use reorder(), but I can't get that to preserve the original order of the variable.
library(ggplot2)
season <- c('Sp1', 'Su1', 'Au1', 'Wi1', 'Sp2', 'Su2', 'Au2', 'Wi2', 'Sp3', 'Su3', 'Au3', 'Wi3') # this is the order I want the seasons to appear in
tempa <- rnorm(12, 15)
tempb <- rnorm(12, 20)
df <- data.frame(season=rep(season, 2), temp=c(tempa, tempb), type=c(rep('A',12), rep('B',12)))
# X-axis order wrong (seasons sorted alphabetically):
ggplot(df, aes(x=season, y=temp, colour=type, group=type)) + geom_point() + geom_line()
# X-axis order correct, but duplicate levels in the factor trigger a warning (current R versions stop with an error instead)
df$season2 <- factor(df$season, levels=df$season)
ggplot(df, aes(x=season2, y=temp, colour=type, group=type)) + geom_point() + geom_line()

Just so this has an answer: passing the unique values as the levels works just fine, because unique() keeps the order in which values first appear:
df$season2 <- factor(df$season, levels=unique(df$season))
ggplot(df, aes(x=season2, y=temp, colour=type, group=type)) +
geom_point() +
geom_line()
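The mechanism can be checked in base R without plotting. A small sketch using the season vector from the question: factor() without levels sorts alphabetically, the repeated vector cannot serve as levels because it contains duplicates, and levels = unique(x) keeps first-appearance order. (If you use the tidyverse, forcats::fct_inorder() also orders levels by first appearance.)

```r
# Seasons in the order they should appear on the x-axis (from the question)
season <- c('Sp1', 'Su1', 'Au1', 'Wi1', 'Sp2', 'Su2', 'Au2', 'Wi2',
            'Sp3', 'Su3', 'Au3', 'Wi3')
x <- rep(season, 2)   # each season appears twice (one per series), like df$season

# Default factor(): levels sorted alphabetically ("Au1", "Au2", "Au3", "Sp1", ...)
alpha_levels <- levels(factor(x))

# x itself contains duplicates, so it cannot be used directly as the levels
has_dups <- anyDuplicated(x) > 0

# unique() drops the repeats and keeps the order of first appearance
f <- factor(x, levels = unique(x))
print(levels(f))
```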

Related

Ordering alphanumeric variables for plotting

How do I order a set of variable names that contain letters and numbers along the x-axis? These come from a survey where the variables are formatted like var1 below, but when plotted they appear as out_1, out_10, out_11...
What I would like is for them to be plotted as out_1, out_2...
library(tidyverse)
var1<-rep(paste0('out','_', seq(1,12,1)), 100)
var2<-rnorm(n=length(var1) ,mean=2)
df<-data.frame(var1, var2)
ggplot(df, aes(x=var1, y=var2))+geom_boxplot()
I tried this:
df %>%
separate(var1, into=c('A', 'B'), sep='_') %>%
arrange(B) %>%
ggplot(., aes(x=B, y=var2))+geom_boxplot()
You can set the order of the levels of var1 before plotting:
df$var1 <- factor(df$var1, levels = unique(df$var1))
ggplot(df, aes(var1,var2)) + geom_boxplot()
Or you can specify the order in the ggplot scale options, using limits:
ggplot(df, aes(var1,var2)) +
geom_boxplot() +
scale_x_discrete(limits = unique(df$var1))
Both cases will give the same result:
You can also use it to give personalized labels; there's no need to create a new variable:
ggplot(df, aes(var1, var2)) +
geom_boxplot() +
scale_x_discrete('output', limits = unique(df$var1), labels = gsub('out_', '', unique(df$var1)))
Check ?discrete_scale for details. You can use limits, breaks and labels in different combinations, including labels that come from outside your data.frame:
pers.labels <- paste('Output', 1:12)
ggplot(df, aes(var1, var2)) +
geom_boxplot() +
scale_x_discrete(NULL, limits = unique(df$var1), labels = pers.labels)
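The difference between rebuilding the factor and assigning to levels() is easy to check in base R. A minimal sketch with the out_1 ... out_12 names from the question: factor(..., levels = ...) reorders the levels, whereas levels<- renames them by position and so silently relabels the data. The numeric sort at the end is an extra for data that are not already in the desired order; it assumes every value looks like "out_<integer>".

```r
var1 <- rep(paste0('out', '_', seq(1, 12, 1)), 2)   # smaller version of the question's data

# Plain character sort is lexicographic: out_1, out_10, out_11, out_12, out_2, ...
lex <- sort(unique(var1))

# Correct reordering: rebuild the factor with levels in first-appearance order
good <- factor(var1, levels = unique(var1))

# Pitfall: levels<- renames levels positionally instead of reordering them
bad <- factor(var1)            # levels in lexicographic order
levels(bad) <- unique(var1)    # the value "out_10" is now labelled "out_2"

# If the data are not already in the desired order, sort on the numeric suffix
# (assumes every value is "out_" followed by an integer)
u <- unique(var1)
natural <- u[order(as.integer(sub("^out_", "", u)))]
good2 <- factor(var1, levels = natural)
```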

scatterplot with no x variable

My data set has a response variable and a 2-level factor explanatory variable. Is there a function for creating a scatter plot with no x axis variable? I'd like the points to be spread out randomly along the x axis so they are easier to see, with the 2 groups distinguished by color. I'm able to create a plot by creating an "ID" variable, but I'm wondering if it's possible to do it without one. The "ID" variable is causing problems when I try to add + facet_grid(. ~ other.var) to view the same plot broken out by another factor variable.
#Create dummy data set
response <- runif(500)
group <- c(rep('group1',250), rep('group2',250))
ID <- c(seq(from=1, to=499, by=2), seq(from=2, to=500, by=2))
data <- data.frame(ID, group, response)
#plot results
ggplot() +
geom_point(data=data, aes(x=ID, y=response, color=group))
How about using geom_jitter, setting the x axis to some fixed value?
ggplot() +
geom_jitter(data=data, aes(x=1, y=response, color=group), height=0)
You could plot x as the row number?
ggplot() +
geom_point(data=data, aes(x=1:nrow(data), y=response, color=group))
Or randomly order it first?
RandomOrder <- sample(1:nrow(data), nrow(data))
ggplot() +
geom_point(data=data, aes(x= RandomOrder, y=response, color=group))
Here's how you can scatter plot a variable against the row index without an intermediate variable:
ggplot(data = data, aes(y = response, x = seq_along(response), color = group)) +
geom_point()
To shuffle the row index, just wrap it in sample(), like this:
ggplot(data = data, aes(y = response, x = sample(seq_along(response)), color = group)) +
geom_point()
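The x positions used in these answers can be inspected on their own. A small sketch, assuming a fixed seed only for reproducibility: sample(seq_along(response)) is a permutation, so every point still gets its own x position, and base R's jitter() with amount = 0.4 spreads points horizontally around x = 1 (as far as I know, this is also how geom_jitter() moves points). The ggplot part only runs if ggplot2 is installed and builds the plot object without printing it.

```r
set.seed(42)   # assumption: fixed seed only so the example is reproducible
response <- runif(500)
group <- c(rep('group1', 250), rep('group2', 250))

# Row index as x, and a shuffled version of it
x_index <- seq_along(response)
x_shuffled <- sample(x_index)

# Horizontal spread around a fixed x = 1; y values are untouched
x_jitter <- jitter(rep(1, length(response)), amount = 0.4)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  data <- data.frame(group, response)
  # height = 0 keeps the y values exactly as measured
  p <- ggplot(data, aes(x = 1, y = response, color = group)) +
    geom_jitter(width = 0.4, height = 0)
}
```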

combining geom_ribbon when x is a factor

Following on from this question.
I cannot generate the shaded area when my x is a factor.
Here is some sample data
time <- as.factor(c('A','B','C','D'))
x <- c(1.00,1.03,1.03,1.06)
x.upper <- c(1.11,1.13,1.17,1.13)
x.lower <- c(0.91,0.92,0.95,0.90)
df <- data.frame(time, x, x.upper, x.lower)
ggplot(data = df,aes(time,x))+
geom_ribbon(aes(x=time, ymax=x.upper, ymin=x.lower), fill="pink", alpha=.5) +
geom_point()
When I substitute factor(time) into aes(), I still cannot get the shaded region. The same happens if I try this:
ggplot()+
geom_ribbon(data = df, aes(x=time, ymax=x.upper, ymin=x.lower), fill="pink", alpha=.5) +
geom_point(data = df, aes(time,x))
I still cannot get the shading. Any ideas how to overcome this?
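I think aosmith was exactly right: you simply need to convert your factor variable to numeric. With a discrete x, ggplot treats each level as its own group, so the ribbon has no neighbouring points to connect. The following code should be what you're looking for: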
ggplot(data = df,aes(as.numeric(time),x))+
geom_ribbon(aes(x=as.numeric(time), ymax=x.upper, ymin=x.lower),
fill="pink", alpha=.5) +
geom_point()
Which produces this plot:
EDIT: To change the x-axis labels back to their original values (taken from @aosmith in the comments below):
ggplot(data = df,aes(as.numeric(time),x))+
geom_ribbon(aes(x=as.numeric(time), ymax=x.upper, ymin=x.lower),
fill="pink", alpha=.5) +
geom_point() + labs(title="My ribbon plot",x="Time",y="Value") +
scale_x_continuous(breaks = 1:4, labels = levels(df$time))
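A sketch of the same idea with the breaks computed from the factor instead of hard-coding 1:4, so it keeps working if the number of levels changes. Note that as.numeric() on a factor returns the level codes (1, 2, 3, ...), not the labels, which matters if the labels look like numbers. The second plot is an alternative not taken from the answers above: keeping the discrete axis and putting all levels into one group with group = 1. The ggplot part only runs if ggplot2 is installed; the bounds are arranged so that x.lower <= x <= x.upper.

```r
time <- as.factor(c('A', 'B', 'C', 'D'))
x <- c(1.00, 1.03, 1.03, 1.06)
x.lower <- c(0.91, 0.92, 0.95, 0.90)
x.upper <- c(1.11, 1.13, 1.17, 1.13)
df <- data.frame(time, x, x.upper, x.lower)

pos    <- as.numeric(df$time)           # level codes: 1 2 3 4
labs_x <- levels(df$time)               # "A" "B" "C" "D"
brks   <- seq_along(levels(df$time))    # one break per level

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Numeric x, as in the answer, with breaks/labels derived from the levels
  p1 <- ggplot(df, aes(as.numeric(time), x)) +
    geom_ribbon(aes(ymax = x.upper, ymin = x.lower), fill = "pink", alpha = .5) +
    geom_point() +
    scale_x_continuous("Time", breaks = brks, labels = labs_x)
  # Alternative: discrete x, all levels in a single group so the ribbon connects
  p2 <- ggplot(df, aes(time, x)) +
    geom_ribbon(aes(ymax = x.upper, ymin = x.lower, group = 1),
                fill = "pink", alpha = .5) +
    geom_point()
}
```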

Not able to order fill colors as desired

I need to produce a plot with two separate geom_area commands, in order to draw some time series above and some below the zero line. Here is a simple example:
library("reshape2")
library("ggplot2")
d <- as.data.frame(t(read.table(sep=";",header=F,row.names=1,text="
year;1999;2000;2001;2002;2003;2004;2005;2006;2007;2008;2009;2010;2011;2012
primary balance;-5.63;-11.88;-15.37;-18.3;-20.09;-21.45;-21.87;-23.25;-26.98;-29.56;-28.92;-28.46;-29.64;-32.61
snow-ball effect;1.61;0.81;2.67;4.99;7.23;8.02;9.45;9.6;11.01;14.06;22.81;24.41;25.76;26.89
adjustment;2.83;5.38;6.52;3.93;2.28;2.45;3.94;5.28;4.5;6.73;6.94;7.59;7.73;8.07")))
dd <- melt(d,id.vars="year")
dd1 <- subset(dd,variable=="primary balance")
dd2 <- subset(dd,variable!="primary balance")
ggplot()+
geom_area(data=dd1,aes(x=year,y=value,fill=variable,order=variable),alpha=.5)+
geom_area(data=dd2,aes(x=year,y=value,fill=variable,order=variable),alpha=.5)
The plot is:
Although the order of levels for variable is (1) "primary balance", (2) "snow-ball effect" and (3) "adjustment", there is no way I can tell ggplot to assign colors and put items in the legend in the correct order.
You can change the order using scale_fill_manual(values=..., breaks=...):
ggplot() +
geom_area(data=dd1,
aes(x=year, y=value, fill=variable),
alpha=.5) +
geom_area(data=dd2,
aes(x=year, y=value, fill=variable),
alpha=.5) +
scale_fill_manual(values=scales::hue_pal()(3),
breaks=c("adjustment",
"snow-ball effect",
"primary balance"))
You can use scale_fill_manual to manually choose colours and decide the ordering in the legend.
ggplot() +
geom_area(data=dd1,aes(x=year,y=value,fill=variable),alpha=.5) +
geom_area(data=dd2,aes(x=year,y=value,fill=variable),alpha=.5) +
scale_fill_manual(values=c("primary balance"="yellow","snow-ball effect" = "violet","adjustment" = "green"),breaks = levels(dd$variable))
dd <- melt(d,id.vars="year")
o <- ordered(c("primary balance","snow-ball effect","adjustment"), c("primary balance","snow-ball effect","adjustment"))
dd$variable <- ordered(dd$variable, o)
dd1 <- subset(dd,variable=="primary balance")
dd2 <- subset(dd,variable!="primary balance")
p <- ggplot()+ geom_area(data=dd1,aes(x=year,y=value,fill=variable), alpha=.5)
p <- p + geom_area(data=dd2,aes(x=year,y=value,fill=variable), alpha=.5)
p <- p + scale_fill_manual(values = setNames(scales::hue_pal()(length(levels(dd$variable))), o),
breaks = o)
p
Do you want it that way? The problem is that ggplot sorts the factor levels alphabetically here, so the colors might change. With ordered() you generate a factor and specify how its levels are sorted.
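The pieces these answers rely on can be checked separately. A minimal sketch: factor() without levels sorts the names alphabetically, explicit levels fix the order, and a named values vector ties each colour to a level by name, so it does not depend on which layer a level comes from; breaks then sets the legend order. The small long-format data frame reuses the first three years from the question and is built in base R instead of reshape2::melt(); the ggplot part only runs if ggplot2 is installed and builds the plot without printing it.

```r
# Desired colour / legend order (from the question)
o <- c("primary balance", "snow-ball effect", "adjustment")

vals <- c("snow-ball effect", "adjustment", "primary balance")
alpha_lv <- levels(factor(vals))              # alphabetical order
fixed_lv <- levels(factor(vals, levels = o))  # order given by o

# Named colours: each level keeps its colour regardless of layer order
fill_cols <- setNames(c("yellow", "violet", "green"), o)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  dd <- data.frame(year = rep(1999:2001, 3),
                   variable = factor(rep(o, each = 3), levels = o),
                   value = c(-5.63, -11.88, -15.37,   # primary balance
                             1.61, 0.81, 2.67,        # snow-ball effect
                             2.83, 5.38, 6.52))       # adjustment
  p <- ggplot() +
    geom_area(data = subset(dd, variable == "primary balance"),
              aes(x = year, y = value, fill = variable), alpha = .5) +
    geom_area(data = subset(dd, variable != "primary balance"),
              aes(x = year, y = value, fill = variable), alpha = .5) +
    scale_fill_manual(values = fill_cols, breaks = o)
}
```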

Aesthetics must either be length one, or the same length as the dataProblems

I would like to make a plot with X values as a subset of the measurement and Y-values as another subset of the measured data.
In the example below, I have 4 products p1, p2, p3 and p4. Each is priced according to its skew, color and version.
I would like to create a multi-facet plot that depicts the P3 products (Y-axis) vs P1 products (X-axis).
My attempt as below has failed miserably with the following error:
Error: Aesthetics must either be length one, or the same length as the dataProblems:subset(price, product == "p1"), subset(price, product == "p3")
library(ggplot2)
product=c("p1","p1","p1","p1","p1","p1","p1","p1","p2","p2","p2","p2","p2","p2","p2","p2","p3","p3","p3","p3","p3","p3","p3","p3","p4","p4","p4","p4","p4","p4","p4","p4")
skew=c("b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a","b","b","b","b","a","a","a","a")
version=c(0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2,0.1,0.1,0.2,0.2)
color=c("C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2","C1","C2")
price=c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32)
df = data.frame(product, skew, version, color, price)
# First plot all the data
p1 <- ggplot(df, aes(x=price, y=price, colour=factor(skew))) + geom_point(size=2, shape=19)
p1 <- p1 + facet_grid(version ~ color)
p1 # This gave a very good plot. So far so good
# Now plot P3 vs P1
p1 <- ggplot(df, aes(x=subset(price, product=='p1'), y=subset(price, product=='p3'), colour=factor(skew))) + geom_point(size=2, shape=19)
p1
# failed with: Error: Aesthetics must either be length one, or the same length as the dataProblems:subset(price, product == "p1"), subset(price, product == "p3")
This is the result I am expecting:
It is better not to subset the variables inside aes(); instead, transform your data:
df1 <- unstack(df,form = price~product)
df1$skew <- rep(letters[2:1],each = 4)
p1 <- ggplot(df1, aes(x=p1, y=p3, colour=factor(skew))) +
geom_point(size=2, shape=19)
p1
The problem is that skew isn't being subsetted in colour=factor(skew), so it's the wrong length. Since subset(skew, product == 'p1') is the same as subset(skew, product == 'p3'), in this case it doesn't matter which subset is used. So you can solve your problem with:
p1 <- ggplot(df, aes(x=subset(price, product=='p1'),
y=subset(price, product=='p3'),
colour=factor(subset(skew, product == 'p1')))) +
geom_point(size=2, shape=19)
Note that most R users would write this more concisely as:
p1 <- ggplot(df, aes(x=price[product=='p1'],
y=price[product=='p3'],
colour=factor(skew[product == 'p1']))) +
geom_point(size=2, shape=19)
Similar to @joran's answer: reshape df so that the prices for each product are in different columns:
xx <- reshape(df, idvar=c("skew","version","color"),
v.names="price", timevar="product", direction="wide")
xx will have columns price.p1, ... price.p4, so:
ggp <- ggplot(xx,aes(x=price.p1, y=price.p3, color=factor(skew))) +
geom_point(shape=19, size=5)
ggp + facet_grid(color~version)
gives the result from your image.
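Both reshaping approaches can be checked without plotting. A sketch that rebuilds the question's data with rep() (same values, written more compactly): the subset passed to aes() has 8 values while df has 32 rows, and the wide layouts from unstack() and reshape() have one row per skew/version/color combination, so p1 and p3 line up row by row. The ggplot part only runs if ggplot2 is installed.

```r
product <- rep(c("p1", "p2", "p3", "p4"), each = 8)
skew    <- rep(rep(c("b", "a"), each = 4), 4)
version <- rep(c(0.1, 0.1, 0.2, 0.2), 8)
color   <- rep(c("C1", "C2"), 16)
price   <- 1:32
df <- data.frame(product, skew, version, color, price)

# x/y subsets have 8 values, but df (and colour = skew) has 32: the length mismatch
n_subset <- length(df$price[df$product == "p1"])
n_rows   <- nrow(df)

# Wide layouts: one column per product
df1 <- unstack(df, form = price ~ product)   # columns p1, p2, p3, p4
xx  <- reshape(df, idvar = c("skew", "version", "color"),
               v.names = "price", timevar = "product", direction = "wide")

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  ggp <- ggplot(xx, aes(x = price.p1, y = price.p3, color = factor(skew))) +
    geom_point(shape = 19, size = 5) +
    facet_grid(color ~ version)
}
```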
I hit this error because I was specifying a label attribute in my geom (geom_text) but was specifying a color in the top level aes:
df <- read.table('match-stats.tsv', sep='\t')
library(ggplot2)
# don't do this!
ggplot(df, aes(x=V6, y=V1, color=V1)) +
geom_text(angle=45, label=df$V1, size=2)
To fix this, I just moved the label attribute out of the geom and into the top level aes:
df <- read.table('match-stats.tsv', sep='\t')
library(ggplot2)
# do this!
ggplot(df, aes(x=V6, y=V1, color=V1, label=V1)) +
geom_text(angle=45, size=2)
I encountered this problem because the dataset was filtered wrongly and the resulting data frame was empty. Even the following caused the error to show:
ggplot(df, aes(x="", y = y, fill=grp))
because df was empty.
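One way to catch this earlier is to check the filtered data before handing it to ggplot. A minimal sketch; assert_has_rows() is a hypothetical helper written for this example, not part of ggplot2:

```r
# Hypothetical helper (not part of ggplot2): stop with a clear message when a
# filtering step has removed every row, instead of failing later inside ggplot
assert_has_rows <- function(d, what = deparse(substitute(d))) {
  if (nrow(d) == 0L) {
    stop(sprintf("'%s' has no rows after filtering", what), call. = FALSE)
  }
  invisible(d)
}

df <- data.frame(grp = c("a", "b"), y = c(1, 2))
filtered <- df[df$grp == "z", ]   # a filter that matches nothing

# Capture the error message instead of stopping the script
msg <- tryCatch(assert_has_rows(filtered), error = conditionMessage)
print(msg)
```

In a pipeline you would call assert_has_rows() on the filtered data right before ggplot(); it returns the data unchanged when there are rows.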
