I want to plot multiple parallel coordinate plots from one dataset. Currently I have a working solution with split and l_ply which produces 4 ggplot2 objects. I would like to solve this with facet_wrap or facet_grid to have a more compact layout and a single legend. Is this possible?
With a normal ggplot2 object (boxplot), facet_wrap works perfectly. With the GGally function ggparcoord() I get the error:
Error in layout_base(data, vars, drop = drop) :
At least one layer must contain all variables used for facetting
What am I doing wrong?
require(GGally)
require(ggplot2)
# Example Data
x <- data.frame(var1=rnorm(40,0,1),
var2=rnorm(40,0,1),
var3=rnorm(40,0,1),
type=factor(rep(c("x", "y"), length.out=40)),
set=factor(rep(c("A","B","C","D"), each=10))
)
# this works
p1 <- ggplot(x, aes(x=type, y=var1, group=type)) + geom_boxplot()
p1 <- p1 + facet_wrap(~ set)
p1
# this does not work
p2 <- ggparcoord(x, columns=1:3, groupColumn=4)
p2 <- p2 + facet_wrap(~ set)
p2
Any suggestions are appreciated! Thank you!
You can't use facet_wrap() directly with ggparcoord(), because that function keeps as plot data only the columns specified in its call. You can see this by looking at the data element of p2: there is no column named set.
p2 <- ggparcoord(x, columns=1:3, groupColumn=4)
head(p2$data)
type .ID anyMissing variable value
1 x 1 FALSE var1 0.95473093
2 y 2 FALSE var1 -0.05566205
3 x 3 FALSE var1 2.57548872
4 y 4 FALSE var1 0.14508261
5 x 5 FALSE var1 -0.92022584
6 y 6 FALSE var1 -0.05594902
To get the same type of plot with faceting, first add a new column to the existing data frame (just the row numbers, one per case), then reshape the data frame to long format with melt() from the reshape2 package.
library(reshape2)
x$ID <- 1:40
df.x <- melt(x, id.vars=c("set","ID","type"))
Then use function ggplot() and geom_line() to plot data.
ggplot(df.x,aes(x=variable,y=value,colour=type,group=ID))+
geom_line()+facet_wrap(~set)
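One difference from ggparcoord() is worth keeping in mind: as far as I know, its default scale = "std" standardises each variable (subtract the mean, divide by the standard deviation), whereas the melt() approach plots the raw values. Here is a minimal sketch that standardises the columns first, assuming the same layout of x as in the question (the set.seed() value and the vars helper are only illustrative):

```r
library(reshape2)
library(ggplot2)

set.seed(42)  # arbitrary seed, only for reproducibility
x <- data.frame(var1 = rnorm(40), var2 = rnorm(40), var3 = rnorm(40),
                type = factor(rep(c("x", "y"), length.out = 40)),
                set  = factor(rep(c("A", "B", "C", "D"), each = 10)))
x$ID <- seq_len(nrow(x))

# Standardise each measured variable to mean 0 and sd 1
vars <- c("var1", "var2", "var3")
x[vars] <- lapply(x[vars], function(v) as.numeric(scale(v)))

# Long format: one row per (case, variable)
df.x <- melt(x, id.vars = c("set", "ID", "type"))

p <- ggplot(df.x, aes(x = variable, y = value, colour = type, group = ID)) +
  geom_line() +
  facet_wrap(~ set)
p
```

Drop the scale() step if you want the facets on the original measurement scale.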
Related
I'm currently trying to develop a surface plot that examines the results of the below data frame. I want to plot the increasing values of noise on the x-axis and the increasing values of mu on the y-axis, with the point estimate values on the z-axis. After looking at ggplot2 and ggplotly, it's not clear how I would plot each of these columns in surface or 3D plot.
df <- "mu noise0 noise1 noise2 noise3 noise4 noise5
1 1 0.000000 0.9549526 0.8908646 0.919630 1.034607
2 2 1.952901 1.9622004 2.0317115 1.919011 1.645479
3 3 2.997467 0.5292921 2.8592976 3.034377 3.014647
4 4 3.998339 4.0042379 3.9938346 4.013196 3.977212
5 5 5.001337 4.9939060 4.9917115 4.997186 5.009082
6 6 6.001987 5.9929932 5.9882173 6.015318 6.007156
7 7 6.997924 6.9962483 7.0118066 6.182577 7.009172
8 8 8.000022 7.9981131 8.0010066 8.005220 8.024569
9 9 9.004437 9.0066182 8.9667536 8.978415 8.988935
10 10 10.006595 9.9987245 9.9949733 9.993018 10.000646"
Thanks in advance.
Here's one way using geom_tile(). First, you will want to get your data frame into more of a Tidy format, where the goal is to have columns:
mu: nothing changes here
noise: need to combine your "noise0", "noise1", ... columns together, and
z: serves as the value of the noise and we will apply the fill= aesthetic using this column.
To do that, I'm using dplyr and tidyr's gather(), but there are other ways (melt() or pivot_longer() get you there too). I'm also adding some code to pull out just the number portion of the "noise" column names and convert it to an integer, so that both the x and y axes are numeric:
# assumes that df is your data as a data.frame
library(dplyr)
library(tidyr)
df <- df %>% gather(key="noise", value="z", -mu)
# split after the 5th character: "noise0" -> "noise" (dropped) and "0"
df <- df %>% separate(col = "noise", into=c('x', "noise"), sep=5) %>% select(-x)
df$noise <- as.integer(df$noise)
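For comparison, the same reshaping can be sketched with pivot_longer(), which strips the "noise" prefix in one step. The small df_wide table below is made-up illustration data with the same column layout as the question, not the original values:

```r
library(tidyr)

# Small illustrative wide table (made-up values, same layout as the question)
df_wide <- data.frame(mu     = 1:3,
                      noise0 = c(1, 2, 3),
                      noise1 = c(0.95, 1.96, 3.00))

# One row per (mu, noise) pair; names_prefix removes the leading "noise"
df_long <- pivot_longer(df_wide,
                        cols = -mu,
                        names_to = "noise",
                        names_prefix = "noise",
                        values_to = "z")
df_long$noise <- as.integer(df_long$noise)
df_long
```

The result has the columns mu, noise and z, so it can be passed to the plotting code unchanged.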
Here's an example of how you could plot it, but aesthetics are up to you. I decided to also include geom_text() to show the actual values of df$z so that we can see better what's going on. Also, I'm using rainbow because "it's pretty" - you may want to choose a more appropriate quantitative comparison scale from the RColorBrewer package.
ggplot(df, aes(x=noise, y=mu, fill=z)) + theme_bw() +
geom_tile() +
geom_text(aes(label=round(z, 2))) +
scale_fill_gradientn(colors = rainbow(5))
EDIT: To answer OP's follow up, yes, you can also showcase this via plotly. Here's a direct transition:
library(plotly)
p <- plot_ly(
df, x= ~noise, y= ~mu, z= ~z,
type='mesh3d', intensity = ~z,
colors= colorRamp(rainbow(5))
)
p
Static image here:
A much more informative way to show this particular set of information is to see the variation of df$z as it relates to df$mu by creating df$delta_z and then using that to plot. (you can also plot via ggplot() + geom_tile() as above):
df$delta_z <- df$z - df$mu
p1 <- plot_ly(
df, x= ~noise, y= ~mu, z= ~delta_z,
type='mesh3d', intensity = ~delta_z,
colors= colorRamp(rainbow(5))
)
Giving you this (static image here):
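If you want a regular gridded surface rather than a mesh, plotly's "surface" trace type takes z as a matrix instead of long-format columns. The following is only a sketch on made-up values in the question's wide layout; to my understanding the rows of z are matched to y and the columns to x, so check the orientation against your own data:

```r
library(plotly)

# Made-up wide table in the same layout as the question
df_wide <- data.frame(mu     = 1:3,
                      noise0 = c(1, 2, 3),
                      noise1 = c(0.95, 1.96, 3.00))

# Rows of z_mat correspond to mu, columns to the noise level
z_mat <- as.matrix(df_wide[, -1])
noise_levels <- as.integer(sub("noise", "", colnames(z_mat)))

p_surface <- plot_ly(x = noise_levels, y = df_wide$mu, z = z_mat,
                     type = "surface")
p_surface
```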
ggplot accepts data in the long format, which means that you need to melt your dataset using, for example, a function from the reshape2 package:
dfLong = melt(df,
id.vars = "mu",
variable.name = "noise",
value.name = "meas")
The resulting column noise contains entries such as noise0, noise1, etc. You can extract the numbers and convert to a numeric column:
dfLong$noise = with(dfLong, as.numeric(gsub("noise", "", noise)))
This converts your data to:
mu noise meas
1 1 0 1.0000000
2 2 0 2.0000000
3 3 0 3.0000000
...
As per ggplot documentation:
ggplot2 can not draw true 3D surfaces, but you can use geom_contour(), geom_contour_filled(), and geom_tile() to visualise 3D surfaces in 2D.
So, for example:
ggplot(dfLong,
aes(x = noise,
y = mu,
fill = meas)) +
geom_tile() +
scale_fill_gradientn(colours = terrain.colors(10))
Produces:
I had a data frame with 750 observations and 250 columns, and I would like to plot two density plots on top of each other. In one case, a particular factor is present, in the other it isn't (commercial activities against non-commercial activities).
I created a subset of the data
CommercialActivityData <- subset(MbadSurvey, Q2== 1)
NonCommercialActivityData <- subset(MbadSurvey, Q2== 2)
I then tried to plot this as follows
p1 <- ggplot(CommercialActivityData, aes(x = water_use_PP)) + geom_density()
p1
However, when I do, I get the following error message
Error: Aesthetics must be either length 1 or the same as the data (51): x
I have 51 data values where there is commercial, and 699 where there isn't.
EDIT: new code!!
I don't have access to your data set so I have simulated your data:
# Creating the data frame
MbadSurvey <- data.frame("water_use_PP"=runif(1000,1,100),
"Q2"=as.factor(round(runif(1000,1,2),0)))
# Requiring the package
require(ggplot2)
# Creating overlaid density plots, one per level of Q2
p1 <- ggplot(MbadSurvey, aes(x = water_use_PP,colour = Q2)) + geom_density()
p1
NOTE: The variable Q2 must be a factor!
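If Q2 is stored as a number in the real survey data (as Q2 == 1 and Q2 == 2 in the question suggest), it can be converted with readable labels before plotting. This is a sketch on simulated data; the data frame name survey, the seed and the alpha value are my own choices:

```r
library(ggplot2)

set.seed(1)  # arbitrary seed, only for reproducibility
survey <- data.frame(water_use_PP = runif(200, 1, 100),
                     Q2 = sample(1:2, 200, replace = TRUE))

# 1 = commercial, 2 = non-commercial (as coded in the question)
survey$Q2 <- factor(survey$Q2, levels = 1:2,
                    labels = c("commercial", "non-commercial"))

# Semi-transparent fills keep both curves visible where they overlap
p <- ggplot(survey, aes(x = water_use_PP, fill = Q2)) +
  geom_density(alpha = 0.4)
p
```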
I have a data frame (100 x 4). The first column is a set of "bins" 0-100, the remaining columns are the counts for each variable of events within each bin (0 to the maximum number of events).
What I'm trying to do is to plot each of the three columns of data (2:4), alongside each other. Because the counts in each of the bins for each of the data sets is close to identical, the data are overlapped in the histogram/barplots I've created, despite my use of beside=true, and position = dodge.
I've set the first column as both numeric and character, but the results are identical- the bars are overlayed on top of each other. (semi-transparent density plots don't work because I want counts not the distribution densities).
The attached code, based on both R and other documentation produced the attached chart.
barplot(BinCntDF$preT,main=NewMain_Trigger, plot=TRUE,
xlab="sample frequency interval counts (0-100 msec bins)",
names.arg=BinCntDF$dT, las=0,
ylab="bin counts", axes=TRUE, xlim=c(0,100),
ylim=c(0,1000), col="red")
geom_bar(position="dodge")
barplot(BinCntDF$postT, beside=TRUE, add=TRUE)
geom_bar()
The goal is to be able to compare the two (or more) data sets side by side on the same axes, without either overlapping the other(s).
I think you have confused barplot with ggplot2. geom_bar comes from the ggplot2 package and isn't compatible with barplot, which is part of base R graphics.
Simply compare ?barplot and ?geom_bar, and you will see that geom_bar is from the ggplot2 package. To achieve what you're after, I have used ggplot2 and reshape2.
Step 1
Based on your description, I have assumed that your data looks roughly like this:
df <- data.frame(x = 1:10,
c1 = sample(0:100, replace=TRUE, size=10),
c2 = sample(0:50, replace=TRUE, size=10),
c3 = sample(0:70, replace=TRUE, size=10))
To plot it using ggplot2, you first have to transform the data from wide format to long format. You can do this using the melt function from reshape2.
library(reshape2)
a <- melt(df, id=c("x"))
The output would look something like this
> head(a)
x variable value
1 1 c1 62
2 2 c1 47
3 3 c1 20
4 4 c1 64
5 5 c1 4
6 6 c1 52
Step 2
There are plenty of tutorials online on what ggplot2 does and what its arguments mean. I would recommend you Google, or search through the many threads on SO, to understand them.
ggplot(a, aes(x=x, y=value, group=variable, fill=variable)) +
geom_bar(stat='identity', position='dodge')
Which gives you the output:
In a nutshell:
group groups the variables of interest
stat=identity ensures that no additional aggregations are made on your data
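As a side note, geom_col() is shorthand for geom_bar(stat = "identity"), so the same dodged chart can be written a little more compactly. The data frame below is a made-up miniature of the melted data a, used only for illustration:

```r
library(ggplot2)

# Made-up long-format data: 3 bins x 2 series
a <- data.frame(x        = rep(1:3, times = 2),
                variable = rep(c("c1", "c2"), each = 3),
                value    = c(5, 3, 8, 2, 7, 4))

# geom_col() plots the values as given; position = "dodge" puts the
# series side by side instead of stacking them
p <- ggplot(a, aes(x = x, y = value, fill = variable)) +
  geom_col(position = "dodge")
p
```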
With that many bins (100) and groups (3) the plot will look messy, but try this:
library(reshape2)
library(ggplot2)
set.seed(123)
myDF <- data.frame(bins=1:100, x=sample(1:100, replace=T), y=sample(1:100, replace=T), z=sample(1:100, replace=T))
myDF.m <- melt(myDF, id.vars='bins')
ggplot(myDF.m, aes(x=bins, y=value, fill=variable)) + geom_bar(stat='identity', position='dodge')
You could also try plotting w/ facets:
ggplot(myDF.m, aes(x=bins, y=value, fill=variable)) + geom_bar(stat='identity') + facet_wrap(~ variable)
My question may be very simple, but I couldn't find the answer!
I have a matrix with 12 entries and I made a stack barplot with barplot function in R.
With this code:
mydata <- matrix(nrow=2,ncol=6, rbind(sample(1:12, replace=T)))
barplot(mydata, xlim=c(0,25),horiz=T,
legend.text = c("A","B","C","D","E","F"),
col=c("blue","green"),axisnames = T, main="Stack barplot")
Here is the image from the code:
What I want to do is give each of the groups (A:F, only the blue part) a different color, but I couldn't add more than two colors.
I would also like to know how I can start the plot from x=2 instead of 0.
I know it's possible to choose the range of x by using xlim=c(2,25), but when I do that, part of my bars fall out of range and I get a picture like this:
What I want is to ignore the part of the bars that is smaller than 2, start the x-axis from two, and show the rest of the bars instead of putting them out of range.
Thank you in advance,
As already mentioned in the other post, your desired output is not entirely clear. Here is another option using ggplot2. I think the difficulty here is reshaping the data; after that, the plotting step is straightforward.
library(reshape2)
library(ggplot2)
## Set a seed to make your data reproducible
set.seed(1)
mydata <- matrix(nrow=2,ncol=6, rbind(sample(1:12, replace=T)))
## transform your matrix to a named data.frame
myData <- setNames(as.data.frame(mydata),LETTERS[1:6])
## put the data in the long format
dd <- melt(t(myData))
## transform the fill variable to the desired behavior.
## I used cumsum to be sure to have a unique value for all VAR2==2.
## maybe you should change this step if you want an alternate behavior
## ( see other solution)
dd <- transform(dd,Var2 =ifelse(Var2==1,cumsum(Var2)+2,Var2))
## a simple bar plot
ggplot(dd) +
## use stat identity since you want to set the y aes
geom_bar(aes(x=Var1,fill=factor(Var2),y=value),stat='identity') +
## horizontal rotation and zooming
coord_flip(ylim = c(2, max(dd$value)*2)) +
theme_bw()
Another option using lattice package
I like the formula notation in lattice and its flexibility for flipping coordinates for example:
library(lattice)
barchart(Var1~value,groups=Var2,data=dd,stack=TRUE,
auto.key = list(space = "right"),
prepanel = function(x,y, ...) {
list(xlim = c(2, 2*max(x, na.rm = TRUE)))
})
You do this by using the "add" and "offset" arguments to barplot(), along with setting axes and axisnames FALSE to avoid double-plotting: (I'm throwing in my color-blind color palette, as I'm red-green color-blind)
# Conservative 8-color palette adapted for color blindness, with first color = "black".
# Wong, Bang. "Points of view: Color blindness." Nature Methods 8.6 (2011): 441-441.
colorBlind.8 <- c(black="#000000", orange="#E69F00", skyblue="#56B4E9", bluegreen="#009E73",
yellow="#F0E442", blue="#0072B2", reddish="#D55E00", purplish="#CC79A7")
mydata <- matrix(nrow=2,ncol=6, rbind(sample(1:12, replace=T)))
cols <- colorBlind.8[1:ncol(mydata)]
bar2col <- colorBlind.8[8]
barplot(mydata[1,], xlim=c(0,25), horiz=T, col=cols, axisnames=T,
legend.text=c("A","B","C","D","E","F"), main="Stack barplot")
barplot(mydata[2,], offset=mydata[1,], add=T, axes=F, axisnames=F, horiz=T, col=bar2col)
For the second part of your question, the "offset" argument is used for the first set of bars also, and you change xlim and use xaxp to adjust the x-axis numbering, and of course you must also adjust the height of the first row of bars to remove the excess offset:
offset <- 2
h <- mydata[1,] - offset
h[h < 0] <- 0
barplot(h, offset=offset, xlim=c(offset,25), xaxp=c(offset,24,11), horiz=T,
legend.text=c("A","B","C","D","E","F"),
col=cols, axisnames=T, main="Stack barplot")
barplot(mydata[2,], offset=offset+h, add=T, axes=F, axisnames=F, horiz=T, col=bar2col)
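One caveat with the code above: when a first-row value is below the offset, the second segment is still drawn at full length, so that bar ends further right than its true total. A possible refinement is to clip both segments against the offset. This is my own sketch, not part of the answer above, and row1, row2 and the colours are made-up illustration values:

```r
row1   <- c(1, 3, 5)   # first (bottom) segment of each bar
row2   <- c(4, 2, 1)   # second (stacked) segment
offset <- 2            # where the x-axis should start

# Start and end of the visible part of each segment
bottom <- pmax(row1, offset)          # where segment 1 ends / segment 2 starts
top    <- pmax(row1 + row2, offset)   # where segment 2 ends

h1 <- bottom - offset   # visible length of segment 1
h2 <- top - bottom      # visible length of segment 2

barplot(h1, offset = offset, xlim = c(offset, 8), horiz = TRUE,
        col = "#0072B2", main = "Stack barplot")
barplot(h2, offset = bottom, add = TRUE, axes = FALSE, horiz = TRUE,
        col = "#CC79A7")
```

With these values the first bar shows nothing of segment 1 and only 3 of the 4 units of segment 2, so every bar still ends at its true total.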
I'm not entirely sure if this is what you're looking for: 'A' has two values (x1 and x2), but your legend seems to hint otherwise.
Here is a way to approach what you want with ggplot. First we set up the data.frame (required for ggplot):
set.seed(1)
df <- data.frame(
name = letters[1:6],
x1=sample(1:6, replace=T),
x2=sample(1:6, replace=T))
name x1 x2
1 a 5 3
2 b 3 5
3 c 5 6
4 d 3 2
5 e 5 4
6 f 6 1
Next, ggplot requires it to be in a long format:
# Make it into ggplot format
require(dplyr); require(reshape2)
df <- df %>%
melt(id.vars="name")
name variable value
1 a x1 5
2 b x1 3
3 c x1 5
4 d x1 3
5 e x1 5
6 f x1 6
...
Now, as you want some bars to be a different colour, we need to give them an alternate name so that we can assign their colour manually.
df <- df %>%
mutate(variable=ifelse(
name %in% c("b", "d", "f") & variable == "x1",
"highlight_x1",
as.character(variable)))
name variable value
1 a x1 5
2 b highlight_x1 3
3 c x1 5
4 d highlight_x1 3
5 e x1 5
6 f highlight_x1 6
7 a x2 3
8 b x2 5
...
Next, we build the plot. This uses the standard colours:
require(ggplot2)
p <- ggplot(data=df, aes(y=value, x=name, fill=factor(variable))) +
geom_bar(stat="identity", colour="black") +
theme_bw() +
coord_flip(ylim=c(1,10)) # Zooms in on y = c(1,10)
Note that I use coord_flip (which in turn calls coord_cartesian) with the ylim=c(1,10) parameter to 'zoom in' on the data. It doesn't remove the data, it just ignores it (unlike setting the limits in the scale). Now, if you manually specify the colours:
p + scale_fill_manual(values = c(
"x1"="coral3",
"x2"="chartreuse3",
"highlight_x1"="cornflowerblue"))
I would like to simplify the solution proposed by @tedtoal, which was the best one for me.
I wanted to create a barplot with different colors for each bar, without the need to use ggplot or lattice.
color_range<- c(black="#000000", orange="#E69F00", skyblue="#56B4E9", bluegreen="#009E73",yellow="#F0E442", blue="#0072B2", reddish="#D55E00", purplish="#CC79A7")
barplot(c(1,6,2,6,1), col= color_range[1:length(c(1,6,2,6,1))])
I'm making a boxplot in which x and fill are mapped to different variables, a bit like this:
ggplot(mpg, aes(x=as.factor(cyl), y=cty, fill=as.factor(drv))) +
geom_boxplot()
As in the example above, the widths of my boxes come out differently at different x values, because I do not have all possible combinations of x and fill values.
I would like for all the boxes to be the same width. Can this be done (ideally without manipulating the underlying data frame, because I fear that adding fake data will cause me confusion during further analysis)?
My first thought was
+ geom_boxplot(width=0.5)
but this doesn't help; it adjusts the width of the full set of boxplots for a given x factor level.
This post almost seems relevant, but I don't quite see how to apply it to my situation. Using + scale_fill_discrete(drop=FALSE) doesn't seem to change the widths of the bars.
The problem is due to some cells of factor combinations being not present. The number of data points for all combinations of the levels of cyl and drv can be checked via xtabs:
tab <- xtabs( ~ drv + cyl, mpg)
tab
# cyl
# drv 4 5 6 8
# 4 23 0 32 48
# f 58 4 43 1
# r 0 0 4 21
There are three empty cells. I will add fake data to override the visualization problems.
Check the range of the dependent variable (y-axis). The fake data needs to be out of this range.
range(mpg$cty)
# [1] 9 35
Create a subset of mpg with the data needed for the plot:
tmp <- mpg[c("cyl", "drv", "cty")]
Create an index for the empty cells:
idx <- which(tab == 0, arr.ind = TRUE)
idx
# row col
# r 3 1
# 4 1 2
# r 3 2
Create three fake lines (with -1 as value for cty):
fakeLines <- apply(idx, 1,
function(x)
setNames(data.frame(as.integer(dimnames(tab)[[2]][x[2]]),
dimnames(tab)[[1]][x[1]],
-1),
names(tmp)))
fakeLines
# $r
# cyl drv cty
# 1 4 r -1
#
# $`4`
# cyl drv cty
# 1 5 4 -1
#
# $r
# cyl drv cty
# 1 5 r -1
Add the rows to the existing data:
tmp2 <- rbind(tmp, do.call(rbind, fakeLines))
Plot:
library(ggplot2)
ggplot(tmp2, aes(x = as.factor(cyl), y = cty, fill = as.factor(drv))) +
geom_boxplot() +
coord_cartesian(ylim = c(min(tmp$cty) - 3, max(tmp$cty) + 3))
# The axis limits have to be changed to suppress displaying the fake data.
You can now use the position_dodge() function with preserve = "single":
ggplot(mpg, aes(x=as.factor(cyl), y=cty, fill=as.factor(drv))) +
geom_boxplot(position = position_dodge(preserve = "single"))
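A closely related variant is position_dodge2(), which also accepts preserve = "single". As far as I can tell the visible difference is in how the remaining boxes are placed within each x group, so it is worth trying both. A minimal sketch on the same mpg data:

```r
library(ggplot2)

# Equal-width boxes even where a cyl/drv combination has no data
p <- ggplot(mpg, aes(x = factor(cyl), y = cty, fill = factor(drv))) +
  geom_boxplot(position = position_dodge2(preserve = "single"))
p
```

No rows are added to the data frame, so later analysis of mpg is unaffected.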
Just use the facet_grid() function; it makes things a lot easier to visualize:
ggplot(mpg, aes(x=as.factor(drv), y=cty, fill=as.factor(drv))) +
geom_boxplot() +
facet_grid(.~cyl)
See how I switch from x=as.factor(cyl) to x=as.factor(drv).
Once you have done this you can always change the way you want the strips to be displayed and remove margins between the panels... it can easily look like your expected display.
By the way, you don't even need as.factor() around the columns used by ggplot() here. Leaving it out further improves the readability of your code.