How to order breaks with ggplot / geom_bar - r

I have a data.frame with entries like:
variable importance order
1 foo 0.06977263 1
2 bar 0.05532474 2
3 baz 0.03589902 3
4 alpha 0.03552195 4
5 beta 0.03489081 5
...
When plotting the above with breaks = variable, I would like the order to be preserved rather than sorted alphabetically.
I am rendering with:
library(ggplot2)
ggplot(data, aes(x=variable, weight=importance, fill=variable)) +
geom_bar() +
coord_flip() + theme(legend.position='none')
However, the variable names are ordered alphabetically, not in the order they appear in the data frame. I had seen a post about using "order" in aes, but it appears to have no effect.
I want the breaks to be ordered according to the "order" column.
There is a similar question, How to change the order of discrete x scale in ggplot, but frankly, I did not understand the answer in this context.

Try:
data$variable <- factor(data$variable, levels=data$variable[order(-data$order)])
From: ggplot2 sorting a plot Part II

Even shorter and easier to understand:
data$variable <- reorder(data$variable, data$order)
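A minimal sketch of the reorder() route, built from the five rows shown in the question (the elided rows are left out). One detail worth knowing: after coord_flip() the first factor level is drawn at the bottom, so ordering by -order (the same idea as order(-data$order) in the factor() answer) puts the row with order 1 at the top.

```r
library(ggplot2)

# The five rows shown in the question (the "..." rows are left out).
data <- data.frame(
  variable   = c("foo", "bar", "baz", "alpha", "beta"),
  importance = c(0.06977263, 0.05532474, 0.03589902, 0.03552195, 0.03489081),
  order      = c(1, 2, 3, 4, 5)
)

# coord_flip() draws the first level at the bottom, so sort the levels
# by -order to put order == 1 ("foo") at the top of the chart.
data$variable <- reorder(data$variable, -data$order)
levels(data$variable)
# levels now run: beta, alpha, baz, bar, foo

p <- ggplot(data, aes(x = variable, weight = importance, fill = variable)) +
  geom_bar() +
  coord_flip() +
  theme(legend.position = "none")
```

Use reorder(data$variable, data$order) instead if you want "foo" at the bottom.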

Another solution is to plot the order and then change the labels after the fact:
library(ggplot2)
df <- data.frame(variable=letters[c(3,3,2,5,1)], importance=rnorm(5), order=1:5)
p <- ggplot(df, aes(x=order, weight=importance, fill=variable)) + geom_bar() +
scale_x_continuous("", breaks=1:5, labels=as.character(df$variable)) +
coord_flip() + theme(legend.position='none')

A shot in the dark, but maybe something like this:
data$variable <- factor(data$variable, levels=data$variable)
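That one-liner works only if the rows are already sorted by order and each name appears once (recent R versions stop with an error when factor() is given duplicated levels). A sketch that sorts first and keeps the first occurrence of each name with unique(); the shuffled row order is made up for illustration, while the values come from the question.

```r
# Rows deliberately out of order to show why the sort step matters.
data <- data.frame(
  variable   = c("baz", "foo", "beta", "bar", "alpha"),
  importance = c(0.03589902, 0.06977263, 0.03489081, 0.05532474, 0.03552195),
  order      = c(3, 1, 5, 2, 4)
)

# Sort rows by the "order" column, then use the (de-duplicated) names
# in that sequence as the factor levels.
data <- data[order(data$order), ]
data$variable <- factor(data$variable, levels = unique(data$variable))
levels(data$variable)
# levels now run: foo, bar, baz, alpha, beta
```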

Related

How to rename the categories of a plot in R?

I would like to rename the categories of "income" from 1, 2, 3, 4, 5 to the real income values in the plot. I tried this code, but it does not work. Can somebody please explain to me why?
ggplot(data=subset(trips_renamed,income!="99")) +
geom_bar(mapping = aes(x = income,fill="income"))+
scale_x_discrete(labels=c("<=4000","4001-8000","8001-12000","12001-16000",">16000",position="bottom"))+
labs(y= "Total number of trips", x="Income Classes")+
theme(legend.position = "none")
It would be much easier to find and test an answer if you provided a minimal reproducible example. However, below I show how to change the scale for a plot similar to the one in your question.
Since the values for x are numeric, we need to use the (somewhat counterintuitive) scale_x_continuous() to change the labels on the fly:
library(ggplot2)
ggplot(data=mtcars) +
geom_bar(aes(x = gear))+
scale_x_continuous(breaks = 3:5, labels=c("<4", "4-4.9",">4"))
It seems your issue has to do with trips_renamed$income being of class "integer" or "numeric". In that case, scale_x_discrete() should be replaced with scale_x_continuous(). You can either use scale_x_continuous() or convert income to a discrete value (a factor) and then use scale_x_discrete(). Here are two examples using the following dummy dataset.
set.seed(8675309)
df <- data.frame(income=sample(1:5, 1000, replace=TRUE))
Option 1: Relabel your continuous axis
If class(trips_renamed$income) is "numeric" or "integer", then you will need to use scale_x_continuous(). Relabeling requires you to specify both breaks= and labels= arguments, and they have to be the same length. This should work:
ggplot(df, aes(x=income)) + geom_bar() +
scale_x_continuous(breaks=1:5, labels=c("<=4000","4001-8000","8001-12000","12001-16000",">16000"),position="bottom")
Option 2: Convert to Factor and use Discrete Scale
The other option is to convert to a factor first, then use scale_x_discrete(). Here, you don't need the breaks= argument (the levels of the factor are used):
df$income <- factor(df$income)
ggplot(df, aes(x=income)) + geom_bar() +
scale_x_discrete(labels=c("<=4000","4001-8000","8001-12000","12001-16000",">16000"),position="bottom")
You get the same plot as above.
Option 2a: Factor and define labels together
If you want to get really crafty, you can define the labels at the same time as the factor, and they will be used as the axis labels instead of the level names:
df2 <- df
df2$income <- factor(df2$income, labels=c("<=4000","4001-8000","8001-12000","12001-16000",">16000"))
ggplot(df2, aes(x=income)) + geom_bar()
Together, these should give you a good idea of how ggplot2 chooses the labels for the axes.
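For Option 2a, a quick sketch confirming how factor(labels=) maps values to labels: factor() first sorts the distinct values (1 to 5) into levels and then renames those levels, position by position, with labels. income_labels is an illustrative name.

```r
library(ggplot2)

set.seed(8675309)
df <- data.frame(income = sample(1:5, 1000, replace = TRUE))

income_labels <- c("<=4000", "4001-8000", "8001-12000", "12001-16000", ">16000")

# Levels "1".."5" are renamed in order: 1 -> "<=4000", ..., 5 -> ">16000".
df2 <- df
df2$income <- factor(df2$income, labels = income_labels)

# Cross-tabulate to see that each original class maps to exactly one label.
table(original = df$income, labelled = df2$income)

p <- ggplot(df2, aes(x = income)) + geom_bar()
```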

Can someone explain why my first ggplot2 box plot was just one big box and how the solution worked?

My first ggplot2 box plot came out as one big, stretched-out box, while the second one was correct, but I don't understand what changed or why the second one worked. I'm new to R and ggplot2, so any explanation would help. Thanks.
#----------------------------------------------------------
# This is the original ggplot that didn't work:
#----------------------------------------------------------
zSepalFrame <- data.frame(zSepalLength, zSepalWdth)
zPetalFrame <- data.frame(zPetalLength, zPetalWdth)
p1 <- ggplot(data = zSepalFrame, mapping = aes(x=zSepalWdth, y=zSepalLength, group = 4)) + #fill = zSepalLength
geom_boxplot(notch=TRUE) +
stat_boxplot(geom = 'errorbar', width = 0.2) +
theme_classic() +
labs(title = "Iris Data Box Plot") +
labs(subtitle ="Z Values of Sepals From Iris.R")
p1
#----------------------------------------------------------
# This is the new ggplot box plot line that worked:
#----------------------------------------------------------
bp = ggplot(zSepalFrame, aes(x=factor(zSepalWdth), y=zSepalLength, color = zSepalWdth)) + geom_boxplot() + theme(legend.position = "none")
bp
I don't have your precise dataset, OP, but the problem seems to stem from assigning a continuous variable to your x axis, when a boxplot needs a discrete variable there to split the data into groups.
A continuous variable is something like a numeric column in a data frame, for example:
x <- c(4,4,4,8,8,8,8)
Even though x only contains 4s and 8s, R stores it as a numeric vector, which ggplot treats as continuous. This means that if you plot it on the x axis, ggplot allows for values anywhere between 4 and 8 and positions them accordingly.
The other type of variable is called discrete, which would be something like this:
y <- c("Green", "Green", "Flags", "Flags", "Cars")
The variable y contains only character strings. It must be discrete, since there is no such thing as something between "Green" and "Cars". If plotted on an x axis, ggplot will group things as either being "Green", "Flags", or "Cars".
The cool thing is that you can change a continuous variable into a discrete one. One way to do that is to convert it to a factor, which forces R to treat it as categorical. If you type factor(x), you get this:
[1] 4 4 4 8 8 8 8
Levels: 4 8
The values in x are the same, but when x is a factor there is no such thing as a number between 4 and 8; a new value would simply add another level.
That, in short, is why your box plot changed. Let's demonstrate with the iris dataset. First, an example like yours. Notice that I'm assigning x=Sepal.Length. In the iris dataset, Sepal.Length is numeric, so continuous.
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width)) +
geom_boxplot()
This is similar to yours. The reason is that a boxplot is drawn by grouping the data according to x and then calculating statistics for each group. If x is continuous, there are no "groups", even if values are repeated (as in x above). One way to make groups is to force the data to be discrete, as with factor(Sepal.Length). Here's what it looks like when you do that:
ggplot(iris, aes(x=factor(Sepal.Length), y=Sepal.Width)) +
geom_boxplot()
The other way to have this same effect would be to use the group= aesthetic, which does what you might think: it groups according to that column in the dataset.
ggplot(iris, aes(x=Sepal.Length, y=Sepal.Width, group=Sepal.Length)) +
geom_boxplot()
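One way to see the grouping at work is to count the boxes ggplot2 actually computes. This sketch uses layer_data(), which returns the computed statistics of a layer, one row per box. The plot names are illustrative; p_const mimics the question's group = 4, which puts every row into the same single group. Expect ggplot2 to warn about the continuous x aesthetic for p_cont.

```r
library(ggplot2)

# Continuous x, no grouping: stat_boxplot sees one group, so one box.
p_cont <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_boxplot()
# A constant group (like group = 4 in the question) is also a single group.
p_const <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, group = 4)) +
  geom_boxplot()
# Discrete x: one group, and one box, per distinct Sepal.Length value.
p_fac <- ggplot(iris, aes(x = factor(Sepal.Length), y = Sepal.Width)) +
  geom_boxplot()
# Continuous x with an explicit group aesthetic: also one box per value.
p_grp <- ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, group = Sepal.Length)) +
  geom_boxplot()

# Number of boxes actually computed for each plot.
n_cont  <- nrow(layer_data(p_cont))
n_const <- nrow(layer_data(p_const))
n_fac   <- nrow(layer_data(p_fac))
n_grp   <- nrow(layer_data(p_grp))
c(n_cont, n_const, n_fac, n_grp)
```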

colour single ggplot axis item

I have created a chart and want to colour one of the x-axis items based on a variable. I have seen this post (How to get axis ticks labels with different colors within a single axis for a ggplot graph?), but am struggling to apply it to my dataset.
df1 <- data.frame(var=c("a","b","c","a","b","c","a","b","c"),
val=c(99,120,79,22,43,53,12,27,31),
type=c("alpha","alpha","alpha","bravo","bravo","bravo","charlie","charlie","charlie"))
myvar="a"
ggplot(df1,aes(x=reorder(var,-val), y=val,fill=type)) + geom_bar(stat="identity")
Any tips on how to make the x-axis value red when it is equal to myvar?
Update: Thanks to @ddiez for some guidance. I finally realized that I would have to reorder prior to plotting. I also should have made my original example with data.table, so I am not sure whether that would have influenced the original responses. I modified my original dataset to be a data.table and used the following code to achieve success.
library(data.table)
library(ggplot2)
df1 <- data.table(var=c("a","b","c","a","b","c","a","b","c"),
val=c(99,120,79,22,43,53,12,27,31),
type=c("alpha","alpha","alpha","bravo","bravo","bravo","charlie","charlie","charlie"))
myvar="a"
df1[,axisColour := ifelse(var==myvar,"red","black")]
df1$var <- reorder(df1$var,-df1$val,sum)
setkey(df1,var,type)
ggplot(df1,aes(x=var, y=val,fill=type)) + geom_bar(stat="identity") +
theme(axis.text.x = element_text(colour=df1[,axisColour[1],by=var][,V1]))
There may be a more elegant solution, but a quick hack (which requires you to know the final order) would be:
ggplot(df1,aes(x=reorder(var,-val), y=val,fill=type)) +
geom_bar(stat="identity") +
theme(axis.text.x = element_text(colour=c("black","black","red")))
A solution using the variable myvar (though there may be better ways):
# reorder the factors in the data.frame (instead of in situ).
df1$var=reorder(df1$var, -df1$val)
# create a vector of colors for each level.
mycol=rep("black", nlevels(df1$var))
names(mycol)=levels(df1$var)
# assign the desired ones a different color.
mycol[myvar]="red"
ggplot(df1,aes(x=var, y=val,fill=type)) +
geom_bar(stat="identity") +
theme(axis.text.x = element_text(colour=mycol))
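A variant of the answer above that derives the colours from the factor levels with ifelse() and %in%, so myvar may hold one or several values. It uses the question's data as a plain data.frame; axis_cols is an illustrative name. It also shows why the quick hack's c("black","black","red") works: reorder() by -val with its default mean puts the levels in the order b, c, a. Recent ggplot2 releases warn that vectorized input to element_text() is not officially supported, so treat this as a convenient hack rather than a guaranteed API.

```r
library(ggplot2)

df1 <- data.frame(var  = c("a","b","c","a","b","c","a","b","c"),
                  val  = c(99,120,79,22,43,53,12,27,31),
                  type = rep(c("alpha","bravo","charlie"), each = 3))
myvar <- "a"   # one or more levels to highlight

# Fix the level order first; axis colours are matched to levels by position.
df1$var <- reorder(df1$var, -df1$val)

# One colour per level, in level order.
axis_cols <- ifelse(levels(df1$var) %in% myvar, "red", "black")
axis_cols
# "black" "black" "red"

p <- ggplot(df1, aes(x = var, y = val, fill = type)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(colour = axis_cols))
```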

how to display factor id on plot axis and factor id and value in legend?

I have a plot where request is a factor with long values, so they don't display on the chart axis.
plot( time_taken ~ request )
The data in this case looks like:
time_taken request
1 7 /servlet1/endpoint2/
2 2 /session/
3 10 /servlet1/endpoint3/
4 2 /servlet1/endpoint2/
5 8 /servlet4/endpoint2/
6 5 /session/
...
Question: Is there a way to plot something like the factor level id on the x axis, and the factor level id + factor full string in the legend?
The code in your question generates a boxplot, so I assume that's what you want. Here are four ways to go about it.
This will generate a boxplot with the x-axis numbered, and the full names in the legend.
library(ggplot2)
ggplot(df) +
geom_boxplot(aes(x=as.integer(request),y=time_taken, color=request))+
labs(x="request")
With ggplot, though, the full labels are legible as they are (at least with the example data):
ggp <- ggplot(df) + geom_boxplot(aes(x=request,y=time_taken))
ggp
In a situation like this I'd be inclined to rotate the plot.
ggp + coord_flip()
Finally, here's a way in base R, although IMO it's the least appealing option.
plot(time_taken~factor(as.integer(request)),df, xlab="request")
labs <- with(df,paste(as.integer(sort(unique(request))),sort(unique(request)),sep=" - "))
legend("topright",legend=labs)
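The first ggplot approach shows ids on the axis and full names in the legend. A sketch that puts both the id and the full string in the legend, as the question asks, by passing labels to the colour scale. The six rows come from the question's sample; id_labels is an illustrative name.

```r
library(ggplot2)

df <- data.frame(
  time_taken = c(7, 2, 10, 2, 8, 5),
  request = factor(c("/servlet1/endpoint2/", "/session/", "/servlet1/endpoint3/",
                     "/servlet1/endpoint2/", "/servlet4/endpoint2/", "/session/"))
)

# Legend entries of the form "<level id> - <full level string>", in level order,
# so they line up with as.integer(request) on the x axis.
id_labels <- paste(seq_along(levels(df$request)), levels(df$request), sep = " - ")

p <- ggplot(df) +
  geom_boxplot(aes(x = as.integer(request), y = time_taken, colour = request)) +
  scale_x_continuous(breaks = seq_along(levels(df$request))) +
  scale_colour_discrete(labels = id_labels) +
  labs(x = "request id")
```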
A possible solution uses ggplot2. Here is an example with some sample data:
df <- data.frame(factor = c("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
"bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb",
"ccccccccccccccccccccccccccccccccccccc"),
time = c(5, 7, 9))
library(ggplot2)
qplot(data = df, factor, time) + scale_x_discrete(labels = abbreviate)
You can also apply abbreviate() directly to the factor levels in your data frame, so that you work with abbreviated labels throughout; this also lets you avoid ggplot2 if you're not familiar with it.
See ?abbreviate.
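A sketch of applying abbreviate() to the factor levels themselves, using the sample data from this answer. minlength = 4 is an arbitrary choice, and base barplot() stands in for any plotting function that reads the labels.

```r
long_names <- c("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
                "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb",
                "ccccccccccccccccccccccccccccccccccccc")
df <- data.frame(factor = factor(long_names), time = c(5, 7, 9))

# Replace each level with its abbreviation; unname() drops the names
# that abbreviate() attaches to its result.
levels(df$factor) <- unname(abbreviate(levels(df$factor), minlength = 4))

# Base graphics, no ggplot2 needed: bars labelled by the abbreviations.
barplot(df$time, names.arg = as.character(df$factor))
```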

How to add vertical lines to ggplot boxplots in R

I am plotting boxplots from this data:
MY_LABEL MY_REAL MY_CATEGORY
1 [POS] .56 POS
1 [POS] .57 POS
1 [POS] .37 POS
2 [POS] .51 POS
1 [sim v] .65 sim v
...
I'm using ggplot2:
ggplot( data=myDF, aes( x=MY_LABEL, y=MY_REAL, fill=MY_CATEGORY ) ) +
scale_colour_manual( values=palette ) +
coord_flip() +
geom_boxplot( outlier.size = 0 )
This works fine and groups the boxplots by the field MY_CATEGORY.
I'd like to do 2 things:
1) To improve the clarity of this plot, I'd like to add separators between the various blocks, i.e. between POS and sim v, between sim v and C, etc. (see the ugly red lines in the plot).
I've been struggling with geom_vline with no luck.
Alternatively, I'd like to add blank space between the blocks.
2) If I print this plot in grayscale, I can't distinguish the different blocks. I'm trying to force a different palette with:
scale_colour_manual( values=c("black","darkgray","gray","white") )
Again, no luck, the plot doesn't change at all.
What would you suggest to do?
Would this work for you?
library(ggplot2)
mtcars$cyl2 <- ifelse(mtcars$cyl > 4, 'A', 'B')
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p + geom_boxplot() + facet_grid(. ~ cyl2, scales = "free", space = "free")
No one covered the horizontal-line route (vertical lines become horizontal after coord_flip()), so I thought I'd add it. Not sure why geom_vline() wasn't working for you. Here's what I did (building on Eric Fail's approach):
require(ggplot2)
p <- ggplot(mtcars, aes(factor(cyl), mpg))
p <- p + geom_boxplot(aes(fill=factor(cyl))) + coord_flip()
p <- p + geom_vline(xintercept=c(1.5,2.5))
p
There are only three boxplots here, but ggplot places discrete categories at integer positions (1, 2, 3, ...). Just figure out which box you want a line after (the nth) and set the xintercept argument to n + 0.5. You can change the thickness and color to your liking by adding size=width and colour="name" after the xintercept argument (in ggplot2 3.4.0 and later, use linewidth= instead of size=).
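By the way, geom_vline() seems to work for me regardless of whether it's before or after coord_flip(). I find that counter-intuitive, although it makes sense once you consider that a coordinate system applies to the whole plot, not just to the layers added before it.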
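The n + 0.5 rule generalizes to the question's blocks: when each block holds several boxes, the separators sit just after the cumulative box counts. The helper separator_positions() below is hypothetical, and the block sizes in the first call are invented for illustration rather than taken from the question's data.

```r
library(ggplot2)

# Hypothetical helper: given the number of boxes in each consecutive block
# (in axis order), return the positions halfway between adjacent blocks.
separator_positions <- function(block_sizes) {
  unname(head(cumsum(block_sizes), -1) + 0.5)
}

separator_positions(c(2, 3, 2, 1))
# [1] 2.5 5.5 7.5

# mtcars example: every cylinder count is its own one-box block,
# which reproduces xintercept = c(1.5, 2.5).
p <- ggplot(mtcars, aes(factor(cyl), mpg)) +
  geom_boxplot(aes(fill = factor(cyl))) +
  coord_flip() +
  geom_vline(xintercept = separator_positions(rep(1, 3)), colour = "red")
```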
I'm not sure bdemarest is correct that you need the names to match the category names. I think the issue is that you used scale_colour_manual(), which applies to aes(..., colour=var), whereas you used fill=var. Thus, you need scale_fill_manual(). Building on the above, we can add:
p <- p + scale_fill_manual(values=c("black","gray","white"))
p
Note that I've not defined any factor names for the colors to match. I think the colors are simply applied to your factor levels according to their order, but I could be wrong.
To change the fill colors, you need a named vector of values. The names need to exactly match the y-axis category names.
scale_fill_manual(values=c("POS"="black", "sim v"="gray50",
"C"="gray80", "sim t"="white"))
To separate the y-axis categories, try facet_grid().
facet_grid(factor(MY_CATEGORY) ~ ., drop=TRUE)
I'm not sure that this will work because I don't have your data to test it.
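Since the data wasn't available, here is a sketch that exercises both suggestions on a small mock data set. The myDF rows below are invented stand-ins for the question's data (only the column names and category names come from the question), so treat it as a test harness, not a result.

```r
library(ggplot2)

# Hypothetical mock data: 10 random values per label.
set.seed(1)
myDF <- data.frame(
  MY_LABEL    = rep(c("1 [POS]", "2 [POS]", "1 [sim v]", "1 [C]", "1 [sim t]"), each = 10),
  MY_CATEGORY = rep(c("POS", "POS", "sim v", "C", "sim t"), each = 10),
  MY_REAL     = runif(50)
)

p <- ggplot(myDF, aes(x = MY_LABEL, y = MY_REAL, fill = MY_CATEGORY)) +
  geom_boxplot(outlier.size = 0) +
  # Named values: each name must match a MY_CATEGORY value exactly.
  scale_fill_manual(values = c("POS" = "black", "sim v" = "gray50",
                               "C" = "gray80", "sim t" = "white")) +
  coord_flip() +
  facet_grid(MY_CATEGORY ~ .)

# Fill colours actually assigned to the boxes.
fills <- unique(layer_data(p)$fill)
fills
```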
