Plotting matrices of different sizes in one window (in R) - r

I’m trying to create color matrices to illustrate the change in the standardized values of several variables over 25 years. I’ve divided the variables into a few subcategories and want to show the results for each subcategory in a separate plot, all in one window with a single color key and title. I tried to do this with reshape and ggplot2, using the following code. Because each category has a different number of variables, however, this produces a lot of empty space in the plots.
library(reshape)
library(ggplot2)
v1 <- replicate(7,rnorm(25))
v2 <- replicate(15, rnorm(25))
v3 <- replicate(11, rnorm(25))
v4 <- replicate(9, rnorm(25))
v5 <- replicate(9, rnorm(25))
v <- list(v1,v2,v3, v4, v5)
ggplot(melt(v), aes(x=X1, y=X2)) + facet_wrap(~ L1, ncol=1) +
geom_tile(aes(fill=value)) + ggtitle("Title") +
theme(plot.title = element_text(lineheight=2, face="bold"))
What is a better way of producing the plots I need in one window without all the unnecessary blank space? Note that I originally tried to do this using the levelplot function in the lattice package. However, the only way I could figure out was to print each individual levelplot, which produced a color key and title for each plot (not what I wanted).

Is this what you are looking for?
You can get rid of the blank space using scales="free_y" in the call to facet_wrap(...). This gives each facet its own y-axis, but does not force the display of a separate x-axis on each facet. I also added a different color scale (take it out if you prefer the default).
library(ggplot2)
library(reshape2)
library(RColorBrewer)
# reshape2::melt() names the matrix index columns Var1 and Var2 (reshape used X1 and X2)
ggplot(melt(v), aes(x=Var1, y=Var2)) +
facet_wrap(~ L1, ncol=1,scales="free_y") +
geom_tile(aes(fill=value)) + ggtitle("Title") +
scale_fill_gradientn(colours=rev(brewer.pal(9,"Spectral")))+
theme(plot.title = element_text(lineheight=2, face="bold"))
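If the panels should also differ in height according to the number of variables they contain, one option is facet_grid with space="free_y". The following is a minimal sketch, not part of the original answer. It builds the long-format data by hand, so it does not depend on how melt names its columns; the column names year, variable and group are illustrative assumptions.

```r
library(ggplot2)

set.seed(1)
sizes <- c(7, 15, 11, 9, 9)  # number of variables in each subcategory

# Build one long data frame: 25 years x n variables for each subcategory
long <- do.call(rbind, lapply(seq_along(sizes), function(i) {
  m <- replicate(sizes[i], rnorm(25))          # 25 x n matrix
  data.frame(
    year     = rep(seq_len(25), times = sizes[i]),
    variable = rep(seq_len(sizes[i]), each = 25),
    value    = as.vector(m),                   # column-major, matches the two reps above
    group    = i
  )
}))
long$variable <- factor(long$variable)

# scales = "free_y" drops unused rows per panel;
# space = "free_y" makes panel height proportional to the number of rows
p_space <- ggplot(long, aes(x = year, y = variable, fill = value)) +
  geom_tile() +
  facet_grid(group ~ ., scales = "free_y", space = "free_y") +
  ggtitle("Title")

print(p_space)
```

Note that space is an argument of facet_grid, not facet_wrap, which is why the sketch switches facetting functions.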

Related

compare boxplots with a single value

I want to compare the distribution of several variables (here X1 and X2) with a single value (here bm). The issue is that these variables are too many (about a dozen) to use a single boxplot.
Additionally, the levels are too different to use one plot. I need to use facets to make things more organised:
However with this plot my benchmark category (bm), which is a single value in X1 and X2, does not appear in X1 and seems to have several values in X2. I want it to be only this green line, which it is in the first plot. Any ideas why it changes? Is there any good workaround? I tried the options of facet_wrap/facet_grid, but nothing there delivered the right result.
I also tried combining a bar plot with bm and three empty categories with the boxplot. But firstly it looked terrible and secondly it got similarly screwed up in the facetting. Basically any work around would help.
Below is the code to create the minimal example displayed here:
# Creating some sample data & loading libraries
library(ggplot2)
library(RColorBrewer)
set.seed(10111)
x=matrix(rnorm(40),20,2)
y=rep(c(-1,1),c(10,10))
x[y==1,]=x[y==1,]+1
x[,2]=x[,2]+20
df=data.frame(x,y)
# creating a benchmark point
benchmark=data.frame(y=rep("bm",2),key=c("X1","X2"),value=c(-0.216936,20.526312))
# melting the data frame, rbinding it with the benchmark
test_dat=rbind(tidyr::gather(df,key,value,-y),benchmark)
# Creating a plot
p_box <- ggplot(data = test_dat, aes(x=key, y=value,color=as.factor(test_dat$y))) +
geom_boxplot() + scale_color_manual(name="Cluster",values=brewer.pal(8,"Set1"))
# The first line delivers the first plot, the second line the second plot
p_box
p_box + facet_wrap(~key,scales = "free",drop = FALSE) + theme(legend.position = "bottom")
The problem lies only in the use of test_dat$y inside the color aes. Never use $ in aes: the extracted vector is not split along with the data when facetting, so the values end up mismatched.
Anyway, I think your plot would improve if you used a geom_hline for the benchmark, instead of hacking in a single-value boxplot:
library(ggplot2)
library(RColorBrewer)
ggplot(tidyr::gather(df,key,value,-y)) +
geom_boxplot(aes(x=key, y=value, color=as.factor(y))) +
geom_hline(data = benchmark, aes(yintercept = value), color = '#4DAF4A', size = 1) +
scale_color_manual(name="Cluster",values=brewer.pal(8,"Set1")) +
facet_wrap(~key,scales = "free",drop = FALSE) +
theme(legend.position = "bottom")

Multiple plots using ggplot2

I'm trying to recreate a graph I made in excel, using ggplot2 in R and I'm having some trouble.
My variable 1 is a continuous variable (prices) and my variable two is a discrete one (cashflows)- both plotted against the same time steps.
As you noticed one variable is plotted using bars and the other using a line.
Is there any way someone could give me some help using random values? I was only able to plot them as lines.
In the sample code below v2 is the prices, v3 is the cashflows, and time is a seq(1:270)
gdata = data.frame(num = time, prices=v2, cashflows = v3)
test_data <- melt(gdata, id="num")
ggplot(data=test_data, aes(x=num, y=value, colour=variable)) +
geom_line() +
ggtitle("Prices") +
labs(x="Time",y="Prices") + theme_grey(base_size = 14) + theme(legend.title=element_blank())
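One way to draw one series as bars and the other as a line is to give each its own layer instead of melting the data first. A minimal sketch with random values follows; the simulated prices and cashflows are placeholders rather than the asker's data, and both series share a single y-axis here.

```r
library(ggplot2)

set.seed(42)
time <- 1:270

# Placeholder data: a random-walk price and a cashflow every 30th time step
gdata <- data.frame(
  num       = time,
  prices    = 100 + cumsum(rnorm(270)),
  cashflows = ifelse(time %% 30 == 0, runif(270, 5, 20), 0)
)

# One layer per series: bars for cashflows, a line for prices.
# Mapping a constant string to fill/colour creates the legend entries.
p_mixed <- ggplot(gdata, aes(x = num)) +
  geom_col(aes(y = cashflows, fill = "cashflows")) +
  geom_line(aes(y = prices, colour = "prices")) +
  labs(x = "Time", y = "Value") +
  theme_grey(base_size = 14) +
  theme(legend.title = element_blank())

print(p_mixed)
```

If the two series live on very different scales, the bars may be hard to read on a shared axis; rescaling one series or using separate facets are possible alternatives.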

ggplot2 stacked barplots, formatting, and grids

In the data that I am attempting to plot, each sample belongs in one of several groups, that will be plotted on their own grids. I am plotting stacked bar plots for each sample that will be ordered in increasing number of sequences, which is an id attribute of each sample.
Currently, the plot (with some random data) looks like this:
(Since I don't have the required 10 rep for images, I am linking it here)
There are a couple of things I need to accomplish, and I don't know where to start.
I would like the bars not to be placed at their corresponding nseqs values, but rather next to each other in ascending nseqs order.
I don't want each grid to have the same scale. Everything needs to fit snugly.
I have tried to set scales and size for facet_grid to free_x, but this results in an unused argument error. I think this is related to the fact that I have not been able to get the scales library loaded properly (it keeps saying not available).
Code that deals with plotting:
ggfdata <- melt(fdata, id.var=c('group','nseqs','sample'))
p <- ggplot(ggfdata, aes(x=nseqs, y=value, fill = variable)) +
geom_bar(stat='identity') +
facet_grid(~group) +
scale_y_continuous() +
opts(title=paste('Taxonomic Distribution - grouped by',colnames(meta.frame)[i]))
Try this:
update.packages()
## I'm assuming your ggplot2 is out of date because you use opts()
## If the scales library is unavailable, you might need to update R
ggfdata <- melt(fdata, id.var=c('group','nseqs','sample'))
ggfdata$nseqs <- factor(ggfdata$nseqs)
## Making nseqs a factor will stop ggplot from treating it as a numeric,
## which sounds like what you want
p <- ggplot(ggfdata, aes(x=nseqs, y=value, fill = variable)) +
geom_bar(stat='identity') +
facet_wrap(~group, scales="free_x") + ## No need for facet_grid with only one variable
labs(title = paste('Taxonomic Distribution - grouped by',colnames(meta.frame)[i]))
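The effect of the factor conversion can be checked on a small made-up data set. This is only a sketch: the samples, groups and taxa below are hypothetical stand-ins for fdata, and the long format is built directly rather than with melt.

```r
library(ggplot2)

set.seed(7)
# Hypothetical samples with very uneven nseqs values
samples <- data.frame(
  sample = paste0("s", 1:5),
  group  = c("A", "A", "B", "B", "B"),
  nseqs  = c(1200, 150, 980, 40, 3300)
)

# Cross join: merge() with no shared columns returns every sample x taxon pair
long <- merge(samples, data.frame(variable = c("taxon1", "taxon2", "taxon3")))
long$value <- runif(nrow(long))

# factor() sorts numeric values before turning them into levels,
# so the bars end up side by side in ascending nseqs order
long$nseqs <- factor(long$nseqs)
print(levels(long$nseqs))

p_bars <- ggplot(long, aes(x = nseqs, y = value, fill = variable)) +
  geom_col() +
  facet_wrap(~group, scales = "free_x")

print(p_bars)
```

With scales="free_x", each panel shows only the nseqs levels that occur in its own group.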

R: ggplot, legend control using scale_shape_manual and one data frame

Using scale shape manual in ggplot, I created different values for three different types of factories (squares, triangles, and circles), which correspond to North, South, and West respectively. Is it possible to have the North/South/West labels in the legend without creating three different data frames for each region? Can I add these labels to the original data frame?
I have one data frame for a plot (as recommended by the ggplot2 book), and with my code below, the default legend lists every row in my data frame, which is repetitive and not what I want.
Basically, I would like to know the best way to label these regions in the plot. The only reason I would like to maintain one data frame is because the code will be easy to use over and over again by just switching the data frame (the benefit of one df mentioned in the ggplot2 book).
I think part of the problem is that I am using scale shape manual to assign values to each point individually. Should I put the North/South/West labels in my data frame and alter my scale shape manual? If so, what is the best way to accomplish this?
Please let me know if my question is unclear. My code is below, and it replicates my plot as it stands. Thanks.
#Data frame
points <- c(3,5,4,7,12)
bars <- c(.8,1.2,1.4,2.1,4)
points_df<-data.frame(points)
row.names(points_df) <- c( "Factory 1","Factory 2","Factory 3","Factory 4","Factory 5" )
df<-data.frame(Output=points,Errors=bars,lev.names= rownames(points_df))
df$lev.names<-factor(df$lev.names,levels=df$lev.names[order(df$Output)])
# GGPLOT #
library(ggplot2)
library(scales)
p2 <- ggplot(df,aes(lev.names,Output,shape=lev.names))
p2 <- p2 +geom_errorbar(aes(ymin=Output-Errors, ymax=Output+Errors), width=0,color="gray40", lty=1, size=0)
p2 <- p2 + geom_point(aes(size=2))
p2 <- p2 + scale_shape_manual(values=c(6,7,6,1,1))
p2 <- p2 + theme_bw() + xlab(" ") + ylab("Output")
p2 <- p2 + opts(title = expression("Production"))
p2 <- p2+ coord_flip()
print(p2)
Yes, put the location in your data.frame and use it in the aes mapping:
df$location <- c("North","South","North","West","West")
p2 <- ggplot(df,aes(lev.names,Output,shape=location)) +
geom_errorbar(aes(ymin=Output-Errors, ymax=Output+Errors),
width=0,color="gray40", lty=1, size=0) +
geom_point(size=3) +
theme_bw() + xlab(" ") + ylab("Output") +
ggtitle(expression("Production")) +
coord_flip()
print(p2)
I've also fixed some other stuff (e.g., opts is deprecated and you don't want to map size, but to set it).
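If specific shapes per region are still wanted, scale_shape_manual accepts a named vector keyed by the values of the mapped column. A minimal sketch, assuming shapes 15, 17 and 16 (filled square, triangle, circle) for North, South and West; these shape codes and the legend name "Region" are illustrative choices, not taken from the question.

```r
library(ggplot2)

# Same data as in the question, with the region stored as a column
df <- data.frame(
  Output    = c(3, 5, 4, 7, 12),
  Errors    = c(.8, 1.2, 1.4, 2.1, 4),
  lev.names = paste("Factory", 1:5),
  location  = c("North", "South", "North", "West", "West")
)
df$lev.names <- factor(df$lev.names, levels = df$lev.names[order(df$Output)])

# Named values tie each shape to a region, independent of row order
region_shapes <- c(North = 15, South = 17, West = 16)

p_named <- ggplot(df, aes(lev.names, Output, shape = location)) +
  geom_errorbar(aes(ymin = Output - Errors, ymax = Output + Errors),
                width = 0, color = "gray40") +
  geom_point(size = 3) +
  scale_shape_manual(name = "Region", values = region_shapes) +
  theme_bw() + xlab(" ") + ylab("Output") +
  ggtitle("Production") +
  coord_flip()

print(p_named)
```

Because the mapping lives in the data frame and the named vector, the same plotting code can be reused by swapping in another data frame with a location column.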

How to control ylim for a faceted plot with different scales in ggplot2?

In the following example, how do I set separate ylims for each of my facets?
qplot(x, value, data=df, geom=c("smooth")) + facet_grid(variable ~ ., scale="free_y")
In each of the facets, the y-axis takes a different range of values and I would like to set different ylims for each of the facets.
The default ylims are too wide for the trend that I want to see.
This was brought up on the ggplot2 mailing list a short while ago. What you are asking for is currently not possible but I think it is in progress.
As far as I know this has not been implemented in ggplot2 yet. However, a workaround that will give you ylims that exceed what ggplot provides automatically is to add "artificial data". To reduce the ylims, simply remove the data you don't want to plot (see the end for an example).
Here is an example:
Let's just set up some dummy data that you want to plot
df <- data.frame(x=rep(seq(1,2,.1),4),f1=factor(rep(c("a","b"),each=22)),f2=factor(rep(c("x","y"),22)))
df <- within(df,y <- x^2)
Which we could plot using line graphs
p <- ggplot(df,aes(x,y))+geom_line()+facet_grid(f1~f2,scales="free_y")
print(p)
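Assume we want to let y start at -10 in the first row and at 0 in the second row, so we add a point at (0,-10) to the upper left plot and one at (0,0) to the lower right plot (with facet_grid the free y-scale is shared across each row, so one extra point per row is enough):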
ylim <- data.frame(x=rep(0,2),y=c(-10,0),f1=factor(c("a","b")),f2=factor(c("x","y")))
dfy <- rbind(df,ylim)
Now by limiting the x-scale between 1 and 2 those added points are not plotted (a warning is given):
p <- ggplot(dfy,aes(x,y))+geom_line()+facet_grid(f1~f2,scales="free_y")+xlim(c(1,2))
print(p)
Same would work for extending the margin above by adding points with higher y values at x values that lie outside the range of xlim.
This will not work if you want to reduce the ylim, in which case subsetting your data would be a solution, for example to limit the upper row between -10 and 1.5 you could use:
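# The subset= layer argument was removed from ggplot2; pass the subsetted data to the layer instead
p <- ggplot(dfy,aes(x,y))+geom_line(data=subset(dfy, y < 1.5 | f1 != "a"))+facet_grid(f1~f2,scales="free_y")+xlim(c(1,2))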
print(p)
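The "artificial data" trick for extending the limits can also be written with geom_blank, which takes part in scale training but draws nothing. This is a sketch of a variant, not the original answerer's code; the helper data frame limits_df is a name introduced here for illustration.

```r
library(ggplot2)

# Same dummy data as above
df <- data.frame(
  x  = rep(seq(1, 2, .1), 4),
  f1 = factor(rep(c("a", "b"), each = 22)),
  f2 = factor(rep(c("x", "y"), 22))
)
df$y <- df$x^2

# One invisible point per facet row carrying the desired lower limit.
# x = 1 lies inside the data range, so the x-axis is not affected.
limits_df <- data.frame(
  x  = 1,
  y  = c(-10, 0),
  f1 = factor(c("a", "b")),
  f2 = factor(c("x", "x"))
)

p_blank <- ggplot(df, aes(x, y)) +
  geom_line() +
  geom_blank(data = limits_df) +
  facet_grid(f1 ~ f2, scales = "free_y")

print(p_blank)
```

Since the extra points sit inside the x range, no xlim call is needed and no warning about removed rows is produced.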
There are actually two packages that solve that problem now:
https://github.com/zeehio/facetscales, and https://cran.r-project.org/package=ggh4x.
I would recommend using ggh4x because it has very useful tools, such as facet grid multiple layers (having 2 variables defining the rows or columns), scaling the x and y-axis as you wish in each facet, and also having multiple fill and colour scales.
For your problem, the solution would look like this:
library(ggh4x)
scales <- list(
# Here you have to specify all the scales, one for each facet row in your case
scale_y_continuous(limits = c(2,10)),
scale_y_continuous(breaks = c(3, 4))
)
qplot(x, value, data=df, geom=c("smooth")) +
facet_grid(variable ~ ., scale="free_y") +
facetted_pos_scales(y = scales)
Here is an example using facet_wrap:
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
facet_wrap(vars(class), scales = "free",
nrow=2,ncol=4)
The above code lays the panels out in two rows and four columns, one panel per class, each with its own x and y scales.
