I don't know if this question is trivial, but...
I'm trying to plot a group of variables in a similar form as a PAIRS plot.
But instead of using the same variables in the rows and columns of the graphic, I would like to use different variables. For example, I have one dataset with X1,...,X7 and another dataset with Y1,...,Y7.
I've tried layout and par(mfrow), but as I want to cross 7 variables x 7 variables, it gave me an overflow error.
Is there any way to do this plot matrix 7x7?
Thank you
I'm not aware of a way to do this using pairs(...) in base R, but here's a ggplot solution, assuming your x- and y-values are in dataframes named df.x and df.y.
# create a sample dataset - you have this already...
set.seed(1) # for reproducible example
df.x <- data.frame(matrix(sample(1:50,350,replace=T),nc=7))
df.y <- 2*df.x + rnorm(350,sd=5)
colnames(df.y) <- paste0("Y",1:7)
# this makes the plot - you start here.
library(ggplot2)
library(data.table)
library(reshape2) # for melt(...)
xDT <- data.table(melt(cbind(id=1:nrow(df.x),df.x),id="id",value.name="xval",variable.name="H"),key="id")
yDT <- data.table(melt(cbind(id=1:nrow(df.y),df.y),id="id",value.name="yval",variable.name="V"),key="id")
xy <- xDT[yDT,allow.cartesian=T]
# simulates pairs() in base R
ggp = ggplot(xy,aes(x=xval,y=yval))
ggp = ggp + geom_point()
ggp = ggp + facet_grid(V~H, scales="free")
ggp = ggp + labs(x="",y="")
print(ggp)
This assumes, but does not test, that df.x and df.y have the same number of rows.
You do not necessarily need data.tables to do this, but it's likely to be faster if your datasets are large, and the syntax is cleaner.
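For comparison, here is a minimal sketch of the same idea without data.table, using merge() from base R for the join. The small df.x and df.y built here are toy stand-ins (3 columns, 10 rows) rather than the real data, and the names x.long and y.long are only illustrative.

```r
library(reshape2)  # for melt()
library(ggplot2)

# Toy stand-ins for df.x and df.y (assumption: both have the same number of rows)
set.seed(1)
df.x <- data.frame(matrix(sample(1:50, 30, replace = TRUE), ncol = 3))
df.y <- data.frame(matrix(rnorm(30), ncol = 3))
colnames(df.y) <- paste0("Y", 1:3)
stopifnot(nrow(df.x) == nrow(df.y))

# Long format: one row per (id, column) combination
x.long <- melt(cbind(id = seq_len(nrow(df.x)), df.x), id.vars = "id",
               variable.name = "H", value.name = "xval")
y.long <- melt(cbind(id = seq_len(nrow(df.y)), df.y), id.vars = "id",
               variable.name = "V", value.name = "yval")

# Joining on id pairs every X column with every Y column for each row
xy <- merge(x.long, y.long, by = "id")

p <- ggplot(xy, aes(x = xval, y = yval)) +
  geom_point() +
  facet_grid(V ~ H, scales = "free") +
  labs(x = "", y = "")
print(p)
```

For each id, merge() produces every combination of an X column and a Y column, which is what the data.table join with allow.cartesian does above.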
The whole dataset describes a module (or cluster if you prefer).
In order to reproduce the example, the dataset is available at:
https://www.dropbox.com/s/y1905suwnlib510/example_dataset.txt?dl=0
(54kb file)
You can read as:
test_example <- read.table(file='example_dataset.txt')
What I would like to have in my plot is this
On the plot, the x-axis is my Timepoints column, and the y-axis shows the columns of the dataset, except for the last 3 columns. Then I used facet_wrap() to group by the ConditionID column.
This is exactly what I want, but the way I achieved this was with the following code:
plot <- ggplot(dataset, aes(x=Timepoints))
plot <- plot + geom_line(aes(y=dataset[,1],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,2],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,3],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,4],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,5],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,6],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,7],colour = dataset$InModule))
plot <- plot + geom_line(aes(y=dataset[,8],colour = dataset$InModule))
...
As you can see it is not very automated. I thought about putting in a loop, like
columns <- dim(dataset)[2] - 3
for (i in seq(1:columns))
{
plot <- plot + geom_line(aes(y=dataset[,i],colour = dataset$InModule))
}
(plot <- plot + facet_wrap( ~ ConditionID, ncol=6) )
That doesn't work.
I found this topic, "Use for loop to plot multiple lines in single plot with ggplot2", which corresponds to my problem.
I tried the solution given with the melt() function.
The problem is that when I use melt on my dataset, I lose information of the Timepoints column to plot as my x-axis. This is how I did:
data_melted <- dataset
as.character(data_melted$Timepoints)
dataset_melted <- melt(data_melted)
I tried using aggregate
aggdata <-aggregate(dataset, by=list(dataset$ConditionID), FUN=length)
Now with aggdata at least I have the information on how many Timepoints for each ConditionID I have, but I don't know how to proceed from here and combine this on ggplot.
Can anyone suggest an approach?
I know I could use the ugly solution of creating new datasets on a loop with rbind(also given in that link), but I don't wanna do that, as it sounds really inefficient. I want to learn the right way.
Thanks
You have to specify id.vars in your call to melt.data.frame to keep all information you need. In the call to ggplot you then need to specify the correct grouping variable to get the same result as before. Here's a possible solution:
library(reshape2) # for melt(...)
library(ggplot2)
melted <- melt(dataset, id.vars=c("Timepoints", "InModule", "ConditionID"))
p <- ggplot(melted, aes(Timepoints, value, color = InModule)) +
geom_line(aes(group=paste0(variable, InModule)))
p
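To try this without the downloaded file, here is a small self-contained sketch. The data frame below is a hypothetical stand-in for the real dataset (three measurement columns named gene1 to gene3, followed by the three id columns), and the facet_wrap() call is taken from the question rather than from the answer above.

```r
library(reshape2)  # for melt()
library(ggplot2)

# Hypothetical stand-in: 2 conditions x 2 InModule values x 4 timepoints
set.seed(1)
dataset <- data.frame(
  gene1 = rnorm(16), gene2 = rnorm(16), gene3 = rnorm(16),
  Timepoints  = rep(1:4, times = 4),
  InModule    = rep(rep(c("yes", "no"), each = 4), times = 2),
  ConditionID = rep(c("A", "B"), each = 8)
)

# id.vars are repeated alongside every measurement, so Timepoints is kept
melted <- melt(dataset, id.vars = c("Timepoints", "InModule", "ConditionID"))

p <- ggplot(melted, aes(Timepoints, value, colour = InModule)) +
  geom_line(aes(group = paste0(variable, InModule))) +
  facet_wrap(~ ConditionID, ncol = 6)
print(p)
```

Each original measurement column becomes one level of the variable column, so a single geom_line() layer replaces the repeated calls.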
My dataset looks similar to the one described here (I have more variables, i.e. columns, and more observations):
dat=cbind(var1=c(100,20,33,400),var2=c(1,0,1,1),var3=c(0,1,0,0))
Now I want to create a bar graph with R where the x-axis shows the names of all the variables and the y-axis shows the mean of each variable.
As a second task, it would be great to show not only the mean but also the standard deviation within the same plot.
It would be nice to solve this with ggplot or qplot.
Thanks
Using base R:
dat <- cbind(var1=c(1,0.20,0.33,4),var2=c(1,0,1,1),var3=c(0,1,0,0))
dat <- as.data.frame(dat) # get this into a data frame as early as possible
barplot(sapply(dat,mean))
Using ggplot
library(ggplot2)
library(reshape2) # for melt(...)
df <- melt(dat)
ggplot(df, aes(x=variable,y=value)) +
stat_summary(fun=mean,geom="bar",color="grey20",fill="lightgreen")+ # fun replaces the deprecated fun.y
stat_summary(fun.data="mean_sdl",fun.args=list(mult=1)) # mean_sdl needs the Hmisc package
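If you would rather not depend on Hmisc, one alternative is to compute the mean and standard deviation yourself and draw them with geom_col() and geom_errorbar(). This is only a sketch of that alternative; the names stats, avg, and sdev are made up for illustration.

```r
library(ggplot2)

dat <- data.frame(var1 = c(1, 0.20, 0.33, 4),
                  var2 = c(1, 0, 1, 1),
                  var3 = c(0, 1, 0, 0))

# One row per variable with its mean and standard deviation
stats <- data.frame(
  variable = names(dat),
  avg  = sapply(dat, mean),
  sdev = sapply(dat, sd)
)

# Bars show the mean; error bars span mean +/- 1 standard deviation
p <- ggplot(stats, aes(x = variable, y = avg)) +
  geom_col(fill = "lightgreen", colour = "grey20") +
  geom_errorbar(aes(ymin = avg - sdev, ymax = avg + sdev), width = 0.2)
print(p)
```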
I have three vectors, and I would like to make side-by-side boxplots of them in ggplot2. Each vector contains observations from a separate sample, so ideally I would like to label each boxplot. I know how to do that with the simple boxplot command, but in ggplot2 it seems to be more complicated, at least for a newbie such as myself.
Could you please tell me whether there is a painless way to proceed here?
Thank you.
library(ggplot2)
library(reshape2)
# re-create your samples via runif
set.seed(1) # for a reproducible example
obs_1 <- runif(100)
obs_2 <- runif(100)
obs_3 <- runif(100)
# you need a data frame, but you can do it on the fly
# this makes 3 columns from each of your samples
# then uses melt to go from wide to long (which is what geom_boxplot needs)
gg <- ggplot(melt(data.frame(obs_1, obs_2, obs_3)), aes(x=variable, y=value))
gg <- gg + geom_boxplot()
gg
You should really make a proper data frame, do the melt, and rename the columns as needed. This was just to show a quick example.
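A slightly more careful version might look like the sketch below. The column names sample and observation are just illustrative choices, and the data are random stand-ins for the real samples.

```r
library(ggplot2)
library(reshape2)  # for melt()

set.seed(42)  # reproducible stand-in samples
samples <- data.frame(obs_1 = runif(100),
                      obs_2 = runif(100),
                      obs_3 = runif(100))

# Wide to long, naming the resulting columns explicitly
long <- melt(samples, measure.vars = names(samples),
             variable.name = "sample", value.name = "observation")

p <- ggplot(long, aes(x = sample, y = observation)) +
  geom_boxplot()
print(p)
```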
In geom_text(...), the default dataset is only sometimes subsetted based on facet variables. Easiest to explain with an example.
This example attempts to simulate pairs(...) with ggplot (and yes, I know about lattice, and plotmatrix, and ggpairs – the point is to understand how ggplot works).
require(data.table)
require(reshape2) # for melt(…)
require(plyr) # for .(…)
require(ggplot2)
Extract mpg, hp, disp, and wt from mtcars, and use cyl as the grouping factor.
xx <- data.table(mtcars)
xx <- data.table(id=rownames(mtcars),xx[,list(group=cyl, mpg, hp, disp, wt)])
Reshape so we can use ggplot facets.
yy <- melt(xx,id=1:2, variable.name="H", value.name="xval")
yy <- data.table(yy,key="id,group")
ww <- yy[,list(V=H,yval=xval), key="id,group"]
zz <- yy[ww,allow.cartesian=T]
In zz,
H: facet variable for horizontal direction
V: facet variable for vertical direction
xval: x-value for a given facet (given value of H and V)
yval: y-value for a given facet
Now, the following generates something close to pairs(…),
ggp <- ggplot(zz, aes(x=xval, y=yval))
ggp <- ggp + geom_point(subset =.(H!=V), size=3, shape=1)
ggp <- ggp + facet_grid(V~H, scales="free")
ggp <- ggp + labs(x="",y="")
ggp
In other words, the values of xval and yval used in geom_point are appropriate for each facet; they have been subsetted based on the values of H and V. However, adding the following to center the variable names in the diagonal facets:
ggp + geom_text(subset = .(H==V),aes(label=factor(H),
x=min(xval)+0.5*diff(range(xval)),
y=min(yval)+0.5*diff(range(yval))),
size=10)
gives this:
It appears that H has been subsetted properly for each facet (e.g. the labels are correct), but xval and yval seem to apply to the whole dataset zz, not to the subset corresponding to H and V for each facet.
My question is: in the above, why are xval and yval treated differently than H in aes? Is there a way around this? (Note: I am much more interested in understanding why this is happening than in a workaround.)
One observation is that actually the labels are overplotted:
ggp + geom_text(subset = .(H==V), aes(label=factor(H),
x=min(xval)+0.5*diff(range(xval))
+ runif(length(xval), max=10),
y=min(yval)+0.5*diff(range(yval))
+ runif(length(yval), max=20)), size=10)
adds some noise to the position of the labels, and you can see that for each observation in zz one text is added.
To your original question: From the perspective of ggplot it might be faster to evaluate all aesthetics at once and split later for faceting, which leads to the observed behavior. I'm not sure if doing the evaluation separately for each facet will ever be implemented in ggplot -- the only application I can think of is to aggregate facet-wise, and there are workarounds to achieve this easily. Also, to avoid the overplotting shown above, you'll have to build a table with four observations (one per text) anyway. Makes your code simpler, too.
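As a rough sketch of that last suggestion, the labels can be computed up front as a four-row table and passed to geom_text() through its data argument. The long-format table is rebuilt here with base R so the example stands alone; the names pairs.long, labels, and mid are illustrative and not from the question.

```r
library(ggplot2)

vars <- c("mpg", "hp", "disp", "wt")

# Long format: every (H, V) pair of columns stacked, like zz in the question
pairs.long <- do.call(rbind, lapply(vars, function(h) {
  do.call(rbind, lapply(vars, function(v) {
    data.frame(H = h, V = v, xval = mtcars[[h]], yval = mtcars[[v]])
  }))
}))
pairs.long$H <- factor(pairs.long$H, levels = vars)
pairs.long$V <- factor(pairs.long$V, levels = vars)

# One row per diagonal facet; mid is the midpoint of that variable's range
labels <- data.frame(
  H = factor(vars, levels = vars),
  V = factor(vars, levels = vars),
  mid = sapply(vars, function(v) mean(range(mtcars[[v]]))),
  label = vars
)

off.diag <- pairs.long[as.character(pairs.long$H) != as.character(pairs.long$V), ]

p <- ggplot(off.diag, aes(x = xval, y = yval)) +
  geom_point(size = 3, shape = 1) +
  geom_text(data = labels, aes(x = mid, y = mid, label = label), size = 10) +
  facet_grid(V ~ H, scales = "free") +
  labs(x = "", y = "")
print(p)
```

Because labels has exactly one row per diagonal facet, each name is drawn once, and the midpoints are computed per variable instead of over the whole dataset.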
I would like to plot an INDIVIDUAL box plot for each unrelated column in a data frame. I thought I was on the right track with boxplot.matrix from the sfsmisc package, but it seems to do the same as boxplot(as.matrix(plotdata)), which is to plot everything in a shared boxplot with a shared scale on the axis. I want (say) 5 individual plots.
I could do this by hand like:
par(mfrow=c(2,2))
boxplot(data$var1)
boxplot(data$var2)
boxplot(data$var3)
boxplot(data$var4)
But there must be a way to use the data frame columns?
EDIT: I used iterations, see my answer.
You could use the reshape package to simplify things
data <- data.frame(v1=rnorm(100),v2=rnorm(100),v3=rnorm(100), v4=rnorm(100))
library(reshape)
meltData <- melt(data)
boxplot(data=meltData, value~variable)
or even then use ggplot2 package to make things nicer
library(ggplot2)
p <- ggplot(meltData, aes(factor(variable), value))
p + geom_boxplot() + facet_wrap(~variable, scales="free")
From ?boxplot we see that we have the option to pass multiple vectors of data as elements of a list, and we will get multiple boxplots, one for each vector in our list.
So all we need to do is convert the columns of our matrix to a list:
m <- matrix(1:25,5,5)
boxplot(x = as.list(as.data.frame(m)))
If you really want separate panels each with a single boxplot (although, frankly, I don't see why you would want to do that), I would instead turn to ggplot and faceting:
library(reshape2) # for melt(...)
library(ggplot2)
m1 <- melt(as.data.frame(m))
ggplot(m1,aes(x = variable,y = value)) + facet_wrap(~variable) + geom_boxplot()
I used iteration to do this. I think perhaps I wasn't clear in the original question. Thanks for the responses nonetheless.
par(mfrow=c(2,5))
for (i in 1:length(plotdata)) {
boxplot(plotdata[,i], main=names(plotdata[i]), type="l")
}