I'm making a boxplot in which x and fill are mapped to different variables, a bit like this:
ggplot(mpg, aes(x=as.factor(cyl), y=cty, fill=as.factor(drv))) +
geom_boxplot()
As in the example above, the widths of my boxes come out differently at different x values, because I do not have all possible combinations of x and fill values.
I would like for all the boxes to be the same width. Can this be done (ideally without manipulating the underlying data frame, because I fear that adding fake data will cause me confusion during further analysis)?
My first thought was
+ geom_boxplot(width=0.5)
but this doesn't help; it adjusts the width of the full set of boxplots for a given x factor level.
This post almost seems relevant, but I don't quite see how to apply it to my situation. Using + scale_fill_discrete(drop=FALSE) doesn't seem to change the widths of the bars.
The problem is that some combinations of factor levels are not present in the data. The number of data points for each combination of the levels of cyl and drv can be checked via xtabs:
tab <- xtabs( ~ drv + cyl, mpg)
tab
# cyl
# drv 4 5 6 8
# 4 23 0 32 48
# f 58 4 43 1
# r 0 0 4 21
There are three empty cells. I will add fake data to override the visualization problems.
Check the range of the dependent variable (y-axis). The fake data needs to be out of this range.
range(mpg$cty)
# [1] 9 35
Create a subset of mpg with the data needed for the plot:
tmp <- mpg[c("cyl", "drv", "cty")]
Create an index for the empty cells:
idx <- which(tab == 0, arr.ind = TRUE)
idx
# row col
# r 3 1
# 4 1 2
# r 3 2
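As a side note, the empty cells can also be listed by name rather than by index. This is only a sketch using base R's as.data.frame() on the table; the name `empty` is introduced here for illustration and is not part of the original answer.

```r
library(ggplot2)  # needed here only for the mpg data set

# Cross-tabulate drv against cyl, as in the answer above
tab <- xtabs(~ drv + cyl, mpg)

# as.data.frame() turns the table into long format with a Freq column;
# keeping the rows with Freq == 0 lists the missing combinations by name
empty <- subset(as.data.frame(tab), Freq == 0)
empty
```

Each row of `empty` corresponds to one drv/cyl combination that has no observations.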
Create three fake lines (with -1 as value for cty):
fakeLines <- apply(idx, 1,
function(x)
setNames(data.frame(as.integer(dimnames(tab)[[2]][x[2]]),
dimnames(tab)[[1]][x[1]],
-1),
names(tmp)))
fakeLines
# $r
# cyl drv cty
# 1 4 r -1
#
# $`4`
# cyl drv cty
# 1 5 4 -1
#
# $r
# cyl drv cty
# 1 5 r -1
Add the rows to the existing data:
tmp2 <- rbind(tmp, do.call(rbind, fakeLines))
Plot:
library(ggplot2)
ggplot(tmp2, aes(x = as.factor(cyl), y = cty, fill = as.factor(drv))) +
geom_boxplot() +
coord_cartesian(ylim = c(min(tmp$cty) - 3, max(tmp$cty) + 3))
# The axis limits have to be changed to suppress displaying the fake data.
You can now use the position_dodge() function with preserve = "single", which keeps every box the same width:
ggplot(mpg, aes(x=as.factor(cyl), y=cty, fill=as.factor(drv))) +
geom_boxplot(position = position_dodge(preserve = "single"))
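A possible variation, assuming a ggplot2 version that provides position_dodge2(): it accepts the same preserve = "single" argument and, as far as I know, centres the boxes within each x position instead of aligning them to one side. The object name `p` is only for illustration.

```r
library(ggplot2)

# Same plot as above, but dodged with position_dodge2();
# preserve = "single" keeps each box at the width of a single element
p <- ggplot(mpg, aes(x = as.factor(cyl), y = cty, fill = as.factor(drv))) +
  geom_boxplot(position = position_dodge2(preserve = "single"))
p
```

Neither variant requires adding rows to the data frame.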
Just use the facet_grid() function; it makes things a lot easier to visualize:
ggplot(mpg, aes(x=as.factor(drv), y=cty, fill=as.factor(drv))) +
geom_boxplot() +
facet_grid(.~cyl)
See how I switch from x=as.factor(cyl) to x=as.factor(drv).
Once you have done this you can always change the way you want the strips to be displayed and remove margins between the panels... it can easily look like your expected display.
By the way, you don't even need to use as.factor() on the columns passed to ggplot() here. Dropping it again improves the readability of your code.
This question is a follow-up to: Annotating text on individual facet in ggplot2
I was trying out the code provided in the accepted answer and got something that was strangely different than the result provided. Granted the post is older and I'm using R 3.5.3 and ggplot2 3.1.0, but what I'm getting doesn't seem to make sense.
library(ggplot2)
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p <- p + facet_grid(. ~ cyl)
#below is how the original post created a dataframe for the text annotation
#this will produce an extra facet in the plot for reasons I don't know
ann_text <- data.frame(mpg = 15,wt = 5,lab = "Text",cyl = factor(8,levels = c("4","6","8")))
p+geom_text(data = ann_text,label = "Text")
This is the code from the accepted answer in the linked question. For me it produces the following graph with an extra facet (i.e., an additional category, 3, seems to have been added to cyl).
#below is an alternative version that produces the correct plot, that is,
#without any extra facets.
ann_text_alternate <- data.frame(mpg = 15,wt = 5,lab = "Text",cyl = 8)
p+geom_text(data = ann_text_alternate,label = "Text")
This gives me the correct graph:
Anybody have any explanations?
What is going on is a factors issue.
First, you facet by cyl, a column in dataset mtcars. This is an object of class "numeric" taking 3 different values.
unique(mtcars$cyl)
#[1] 6 4 8
Then, you create a new dataset, the dataframe ann_text. But you define cyl as an object of class "factor". And what is in this column can be seen with str.
str(ann_text)
#'data.frame': 1 obs. of 4 variables:
# $ mpg: num 15
# $ wt : num 5
# $ lab: Factor w/ 1 level "Text": 1
# $ cyl: Factor w/ 3 levels "4","6","8": 3
R codes factors as integers starting at 1, so level "8" is level number 3.
So when you combine both datasets, there are 4 values for cyl: the original numbers 4, 6 and 8 plus the new number, 3. Hence the extra facet.
This is also the reason why the alternative solution works: in dataframe ann_text_alternate, column cyl is a numeric variable taking one of the already existing values.
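The integer coding described above can be checked directly in base R. This is a small illustration; the name `f` is introduced here only for the example.

```r
# The same factor that ann_text uses for cyl
f <- factor(8, levels = c("4", "6", "8"))

# The underlying integer code is the position of the level, not the label
as.integer(f)                # 3

# Going through character recovers the number shown in the label
as.numeric(as.character(f))  # 8
```

The first conversion yields the stray value 3 that shows up as the extra facet.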
Another way of making it work would be to coerce cyl to factor when faceting. Note that
levels(factor(mtcars$cyl))
#[1] "4" "6" "8"
And with this faceting, the new dataframe ann_text no longer introduces a 4th level. Start plotting the graph as in the question
p <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p <- p + facet_grid(. ~ factor(cyl))
and add the text.
p + geom_text(data = ann_text, label = "Text")
I have two data frames z (1 million observations) and b (500k observations).
z= Tracer time treatment
15 0 S
20 0 S
25 0 X
04 0 X
55 15 S
16 15 S
15 15 X
20 15 X
b= Tracer time treatment
2 0 S
35 0 S
10 0 X
04 0 X
20 15 S
11 15 S
12 15 X
25 15 X
I'd like to create grouped boxplots using time as a factor and treatment as colour. Essentially I need to bind them together and then differentiate between them but not sure how. One way I tried was using:
zz<-factor(rep("Z", nrow(z)))
bb<-factor(rep("B",nrow(b)))
dumZ<-merge(z,zz) #this won't work because it says it's too big
dumB<-merge(b,bb)
total<-rbind(dumB,dumZ)
But z and zz merge won't work because it says it's 10G in size (which can't be right)
The end plot might be similar to this example: Boxplot with two levels and multiple data.frames
Any thoughts?
Cheers,
EDIT: Added boxplot
I would approach it as follows:
# create a list of your data.frames
l <- list(z,b)
# assign names to the dataframes in the list
names(l) <- c("z","b")
# bind the dataframes together with rbindlist from data.table
# the idcol parameter will create a variable with the names of the dataframes
# you could also use 'bind_rows(l, .id="id")' from 'dplyr' for this
library(data.table)
library(ggplot2)
zb <- rbindlist(l, idcol="id")
# create the plot
ggplot(zb, aes(x=factor(time), y=Tracer, color=treatment)) +
geom_boxplot() +
facet_wrap(~id) +
theme_bw()
which gives:
Other alternatives for creating your plot:
# facet by 'time'
ggplot(zb, aes(x=id, y=Tracer, color=treatment)) +
geom_boxplot() +
facet_wrap(~time) +
theme_bw()
# facet by 'time' & color by 'id' instead of 'treatment'
ggplot(zb, aes(x=treatment, y=Tracer, color=id)) +
geom_boxplot() +
facet_wrap(~time) +
theme_bw()
In response to your last comment: to get everything in one plot, you can use interaction to distinguish between the different groupings as follows:
ggplot(zb, aes(x=treatment, y=Tracer, color=interaction(id, time))) +
geom_boxplot(width = 0.7, position = position_dodge(width = 0.7)) +
theme_bw()
which gives:
The key is that you do not need to perform a merge, which is computationally expensive on large tables. Instead, assign a new variable (source in my code below, with value "b" or "z") to each dataframe and then rbind. Then it becomes straightforward. My solution is very similar to @Jaap's, just with different faceting.
library(ggplot2)
#Create some mock data
t<-seq(1,55,by=2)
z<-data.frame(tracer=sample(t,size = 10,replace = T), time=c(0,15), treatment=c("S","X"))
b<-data.frame(tracer=sample(t,size = 10,replace = T), time=c(0,15), treatment=c("S","X"))
#Add a variable to each table to id itself
b$source<-"b"
z$source<-"z"
#concatenate the tables together
all<-rbind(b,z)
ggplot(all, aes(source, tracer, group=interaction(treatment,source), fill=treatment)) +
geom_boxplot() + facet_grid(~time)
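A likely reason the merge in the question reports such a huge size: when the two inputs share no column names, merge() returns their Cartesian product, so a 1-million-row table merged with a 1-million-element vector would need on the order of 10^12 rows. A toy sketch with made-up data frames (`x`, `y` and `crossed` are illustrative names, not from the question):

```r
# Two small tables with no column names in common
x <- data.frame(a = 1:3)
y <- data.frame(b = c("p", "q", "r", "s"))

# With nothing to join on, merge() pairs every row of x with every row of y
crossed <- merge(x, y)
nrow(crossed)  # 3 * 4 = 12 rows

# Adding a constant column instead leaves the row count unchanged
x$source <- "x"
nrow(x)        # still 3
```

Tagging each table with a constant column and stacking with rbind keeps the total at nrow(z) + nrow(b) rows.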
My question may be very simple, but I couldn't find the answer!
I have a matrix with 12 entries and I made a stacked bar plot with the barplot function in R.
With this code:
mydata <- matrix(nrow=2,ncol=6, rbind(sample(1:12, replace=T)))
barplot(mydata, xlim=c(0,25),horiz=T,
legend.text = c("A","B","C","D","E","F"),
col=c("blue","green"),axisnames = T, main="Stack barplot")
Here is the image from the code:
What I want to do is give each of the groups (A:F, only the blue part) a different color, but I couldn't add more than two colors.
I would also like to know how I can start the plot from x=2 instead of 0.
I know it's possible to choose the range of x by using xlim=c(2,25), but when I do that, part of my bars fall out of range and I get a picture like this:
What I want is to ignore the part of the bars below 2, start the x-axis from 2, and show the rest of the bars instead of letting them run out of range.
Thank you in advance,
As already mentioned in the other answer, your desired output is not entirely clear. Here is another option using ggplot2. I think the difficulty here is reshaping the data; after that, the plot step is straightforward.
library(reshape2)
library(ggplot2)
## Set a seed to make your data reproducible
set.seed(1)
mydata <- matrix(nrow=2,ncol=6, rbind(sample(1:12, replace=T)))
## transform your matrix to a named data.frame
myData <- setNames(as.data.frame(mydata),LETTERS[1:6])
## put the data in the long format
dd <- melt(t(myData))
## transform the fill variable to the desired behavior.
## I used cumsum to be sure to have a unique value for all Var2==1.
## maybe you should change this step if you want an alternate behavior
## ( see other solution)
dd <- transform(dd,Var2 =ifelse(Var2==1,cumsum(Var2)+2,Var2))
## a simple bar plot
ggplot(dd) +
## use stat identity since you want to set the y aes
geom_bar(aes(x=Var1,fill=factor(Var2),y=value),stat='identity') +
## horizontal rotation and zooming
coord_flip(ylim = c(2, max(dd$value)*2)) +
theme_bw()
Another option using lattice package
I like the formula notation in lattice and its flexibility for flipping coordinates for example:
library(lattice)
barchart(Var1~value,groups=Var2,data=dd,stack=TRUE,
auto.key = list(space = "right"),
prepanel = function(x,y, ...) {
list(xlim = c(2, 2*max(x, na.rm = TRUE)))
})
You do this by using the "add" and "offset" arguments to barplot(), along with setting axes and axisnames to FALSE to avoid double-plotting. (I'm throwing in my color-blind-friendly palette, as I'm red-green color-blind.)
# Conservative 8-color palette adapted for color blindness, with first color = "black".
# Wong, Bang. "Points of view: Color blindness." Nature Methods 8.6 (2011): 441-441.
colorBlind.8 <- c(black="#000000", orange="#E69F00", skyblue="#56B4E9", bluegreen="#009E73",
yellow="#F0E442", blue="#0072B2", reddish="#D55E00", purplish="#CC79A7")
mydata <- matrix(nrow=2,ncol=6, rbind(sample(1:12, replace=T)))
cols <- colorBlind.8[1:ncol(mydata)]
bar2col <- colorBlind.8[8]
barplot(mydata[1,], xlim=c(0,25), horiz=T, col=cols, axisnames=T,
legend.text=c("A","B","C","D","E","F"), main="Stack barplot")
barplot(mydata[2,], offset=mydata[1,], add=T, axes=F, axisnames=F, horiz=T, col=bar2col)
For the second part of your question, the "offset" argument is used for the first set of bars also, and you change xlim and use xaxp to adjust the x-axis numbering, and of course you must also adjust the height of the first row of bars to remove the excess offset:
offset <- 2
h <- mydata[1,] - offset
h[h < 0] <- 0
barplot(h, offset=offset, xlim=c(offset,25), xaxp=c(offset,24,11), horiz=T,
legend.text=c("A","B","C","D","E","F"),
col=cols, axisnames=T, main="Stack barplot")
barplot(mydata[2,], offset=offset+h, add=T, axes=F, axisnames=F, horiz=T, col=bar2col)
I'm not entirely sure if this is what you're looking for: 'A' has two values (x1 and x2), but your legend seems to hint otherwise.
Here is a way to approach what you want with ggplot. First we set up the data.frame (required for ggplot):
set.seed(1)
df <- data.frame(
name = letters[1:6],
x1=sample(1:6, replace=T),
x2=sample(1:6, replace=T))
name x1 x2
1 a 5 3
2 b 3 5
3 c 5 6
4 d 3 2
5 e 5 4
6 f 6 1
Next, ggplot requires it to be in a long format:
# Make it into ggplot format
require(dplyr); require(reshape2)
df <- df %>%
melt(id.vars="name")
name variable value
1 a x1 5
2 b x1 3
3 c x1 5
4 d x1 3
5 e x1 5
6 f x1 6
...
Now, as you want some bars to be a different colour, we need to give them an alternate name so that we can assign their colour manually.
df <- df %>%
mutate(variable=ifelse(
name %in% c("b", "d", "f") & variable == "x1",
"highlight_x1",
as.character(variable)))
name variable value
1 a x1 2
2 b highlight_x1 3
3 c x1 4
4 d highlight_x1 6
5 e x1 2
6 f highlight_x1 6
7 a x2 6
8 b x2 4
...
Next, we build the plot. This uses the standard colours:
require(ggplot2)
p <- ggplot(data=df, aes(y=value, x=name, fill=factor(variable))) +
geom_bar(stat="identity", colour="black") +
theme_bw() +
coord_flip(ylim=c(1,10)) # Zooms in on y = c(1,10)
Note that I use coord_flip (which in turn calls coord_cartesian) with the ylim=c(1,10) parameter to 'zoom in' on the data. It doesn't remove the data, it just ignores it (unlike setting the limits in the scale). Now, if you manually specify the colours:
p + scale_fill_manual(values = c(
"x1"="coral3",
"x2"="chartreuse3",
"highlight_x1"="cornflowerblue"))
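The difference between zooming with coord_cartesian() (which coord_flip builds on) and setting limits on the scale, mentioned above, can be checked on a boxplot, where dropped data changes the computed statistics. This is only a sketch on mtcars; `base_plot`, `zoomed` and `dropped` are illustrative names, and layer_data() is used to read the computed values.

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(x = factor(1), y = mpg)) + geom_boxplot()

# Zooming: all data are kept, only the visible window changes
zoomed <- layer_data(base_plot + coord_cartesian(ylim = c(10, 20)))

# Scale limits: values outside 10..20 are discarded before the statistics
# are computed (ggplot2 warns about the removed rows)
dropped <- suppressWarnings(layer_data(base_plot + ylim(10, 20)))

zoomed$middle   # median of all mpg values
dropped$middle  # median of only the values inside the limits
```

The two medians differ, which is why zooming through the coordinate system is the safer choice when the bars or boxes should still reflect all of the data.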
I would like to simplify the solution proposed by @tedtoal, which was the best one for me.
I wanted to create a barplot with different colors for each bar, without the need to use ggplot or lattice.
color_range<- c(black="#000000", orange="#E69F00", skyblue="#56B4E9", bluegreen="#009E73",yellow="#F0E442", blue="#0072B2", reddish="#D55E00", purplish="#CC79A7")
barplot(c(1,6,2,6,1), col= color_range[1:length(c(1,6,2,6,1))])
I want to make a 2x4 array of plots that show distributions changing over time. The default ggplot arrangement with facet_wrap is that the top row has series 1&2, the second row has series 3&4, etc. I would like to change this so that the first column has series in order (1->2->3->4) and then the second column has the next 4 series. This way your eye can compare immediately adjacent distributions in time vertically (as I think they should be).
Use the dir (direction) parameter of facet_wrap(). The default is horizontal, and this can be switched to vertical:
# Horizontally
ggplot(mtcars, aes(x=hp, y=mpg)) + geom_point() + facet_wrap(~ cyl, ncol=2)
# Vertically
ggplot(mtcars, aes(x=hp, y=mpg)) + geom_point() + facet_wrap(~ cyl, ncol=2, dir="v")
Looks like you need to do this by ordering the factor prior to the facet_wrap call:
fac <- factor( fac, levels=as.character(c(1, 10, 2, 20, 3, 30, 4, 40) ) )
The default for as.table in facet_wrap is TRUE, which is going to put the lowest value ("1" in this case) at the upper left and the highest value ("40" in the example above) at the lower right corner. So:
pl + facet_wrap(~fac, ncol=2, nrow=4)
Your comments suggest you are working with numeric class variables. (Your comments still do not provide a working example and you seem to think this is our responsibility and not yours. Where does one acquire such notions of entitlement?) This should create a factor that might be "column major" ordered with either numeric or factor input:
> ss <- 1:8; factor(ss, levels=ss[matrix(ss, ncol=2, byrow=TRUE)])
[1] 1 2 3 4 5 6 7 8
Levels: 1 3 5 7 2 4 6 8
On the other hand I can think of situations where this might be the effective approach:
> ss <- 1:8; factor(ss, levels=ss[matrix(ss, nrow=2, byrow=TRUE)])
[1] 1 2 3 4 5 6 7 8
Levels: 1 5 2 6 3 7 4 8
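To see which panel each level would land in, the level order can be laid out the way a two-column facet_wrap fills its panels (row by row, with the default as.table = TRUE). This is only a base R sketch; `lev` and `panel_layout` are names made up for the illustration.

```r
ss <- 1:8

# Level order from the second variant above: 1 5 2 6 3 7 4 8
lev <- ss[matrix(ss, nrow = 2, byrow = TRUE)]

# Fill a 4 x 2 grid row by row, as a two-column facet layout would
panel_layout <- matrix(lev, ncol = 2, byrow = TRUE)
panel_layout
```

With this ordering the first column reads 1 to 4 from top to bottom and the second column reads 5 to 8, which is the arrangement asked for in the question.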
I want to plot multiple parallel coordinate plots from one dataset. Currently I have a working solution with split and l_ply which produces 4 ggplot2 objects. I would like to solve this with facet_wrap or facet_grid to have a more compact layout and a single legend. Is this possible?
With a normal ggplot2 object (boxplot) facet_wrap works perfectly. With the GGally function ggparcoord() I get the error:
Error in layout_base(data, vars, drop = drop) :
At least one layer must contain all variables used for facetting
What am I doing wrong?
require(GGally)
require(ggplot2)
# Example Data
x <- data.frame(var1=rnorm(40,0,1),
var2=rnorm(40,0,1),
var3=rnorm(40,0,1),
type=factor(rep(c("x", "y"), length.out=40)),
set=factor(rep(c("A","B","C","D"), each=10))
)
# this works
p1 <- ggplot(x, aes(x=type, y=var1, group=type)) + geom_boxplot()
p1 <- p1 + facet_wrap(~ set)
p1
# this does not work
p2 <- ggparcoord(x, columns=1:3, groupColumn=4)
p2 <- p2 + facet_wrap(~ set)
p2
Any suggestions are appreciated! Thank you!
You can't use facet_wrap() directly with ggparcoord() because this function uses as data only those columns that are specified in the call to it. This can be seen by looking at the data element of p2: there is no column named set.
p2 <- ggparcoord(x, columns=1:3, groupColumn=4)
head(p2$data)
type .ID anyMissing variable value
1 x 1 FALSE var1 0.95473093
2 y 2 FALSE var1 -0.05566205
3 x 3 FALSE var1 2.57548872
4 y 4 FALSE var1 0.14508261
5 x 5 FALSE var1 -0.92022584
6 y 6 FALSE var1 -0.05594902
To get the same type of plot with faceting, you first need to add a new column (containing just case numbers) to the existing data frame and then reshape this data frame with melt() from the reshape2 package.
library(reshape2)
x$ID<-1:40
df.x<-melt(x,id.vars=c("set","ID","type"))
Then use function ggplot() and geom_line() to plot data.
ggplot(df.x,aes(x=variable,y=value,colour=type,group=ID))+
geom_line()+facet_wrap(~set)