Draw a lot of plots on the same canvas (clean way) - r

I want to draw a number of similar plots with a loop.
What I do is:
plot(0, 0, type="l", col="white", xlim=range(1,N), ylim=range(0.5, 2.5)) # provide axes, frame, ...
for(col in colors)
{
X <- generate_X() # vector of random numbers
lines(1:N, X, type="l", col=col)
}
The problem is that the random numbers sometimes fall outside range(0.5, 2.5), and I want to extend the ylim range accordingly. At the moment I'm going to do it by computing min and max before plotting, but there must be a much, much cleaner way, which I unfortunately can't find anywhere.
I think I'm missing something basic about plotting, but I couldn't find the solution.
Thanks

I think there are two quick answers to the OP's question:
calculate the plot range before initializing the plot (implied by OP), or
use a "cleaner" plotting wrapper function.
Setup: First we need to define the variables and functions the OP implies and then generate some data to work with.
# Initialize our N number of X points and
# colors vector.
N <- 20
colors <- c("yellow", "red", "blue", "green")
# Create function 'generate_X' to perform
# as implied by the OP.
generate_X <- function(.N){
rnorm(n=.N, mean=0, sd=1)
}
# Generate the entire data frame
# using the 'matrix' function to shape
# the data quickly.
data <- data.frame(
id=1:N,
matrix(
generate_X(N*length(colors)),
ncol=length(colors)
)
)
The above code simply initializes the variables, function, and data needed for the OP's example.
Method 1: Calculate the plot range and initialize the plot. This is pretty easy using the 'range' function. In the data frame we created, there is an "id" column for our x values, so we use the range of 'data$id' for our x. Then, we find the range of all the data across every column EXCEPT the first column (data[,-1]) to find the overall y range. We initialize with the color white, since our background is also white. Otherwise, we would have a point in the lower-left and upper-right corners. I added x and y labels just for looks.
plot(
range(data$id),
range(data[,-1]),
col="white",
xlab="x",
ylab="y")
Next we just loop through and plot the lines.
for(i in seq_along(colors)){
lines(data$id, data[, i + 1], col=colors[i])
}
This is essentially the same thing the OP demonstrated, but it's adapted slightly to accept a data frame as input. It's far easier to reference columns using an integer counter (i in this case) rather than the list of colors.
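Two base-R variants of Method 1 are sketched below. They assume the `data` and `colors` objects from the setup above (recreated here with rnorm() standing in for generate_X, so the block runs on its own). Passing type = "n" sizes the axes from the ranges without drawing any points, so the white-point trick is not needed. Alternatively, matplot() draws one line per column and chooses ylim from all columns at once.

```r
# Recreate the setup data (rnorm stands in for generate_X)
set.seed(42)
N <- 20
colors <- c("yellow", "red", "blue", "green")
data <- data.frame(id = 1:N, matrix(rnorm(N * length(colors)), ncol = length(colors)))

# Variant 1: empty frame sized to all of the data, then one lines() call per series
plot(range(data$id), range(data[, -1]), type = "n", xlab = "x", ylab = "y")
for (i in seq_along(colors)) {
  lines(data$id, data[, i + 1], col = colors[i])
}

# Variant 2: matplot() plots every column of y and computes ylim across all of them
matplot(data$id, data[, -1], type = "l", lty = 1, col = colors,
        xlab = "x", ylab = "y")
```

Either way there is no hard-coded ylim, so values outside range(0.5, 2.5) are simply included in the axis range.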
Method 2: There are a lot of plot wrapper packages out there, and one of the most popular is the 'ggplot2' package, and for good reason. You can avoid a lot of the looping hassle with plots by feeding shaped data into a 'ggplot' function. The code here is much "cleaner" from a reading perspective.
# Load packages for shaping data and plotting.
library(reshape2)
library(ggplot2)
First, we need the 'reshape2' package, because we want to use "melted" data in our plot. This just makes the 'ggplot' code WAY cleaner. Then, we load up the 'ggplot2' package for the plotting.
For our plot, we initialize a plot without any instructions, so we can specify them in the geometry layer. If we were creating multiple layers from the same data, we would specify the options in the base plot layer, but for this, we are only creating a single geometry layer with lines. The + allows us to add plot layers.
Next, we choose a geometry layer ('geom_line' in this case) and specify the data as melt(data, id.vars="id"). This shapes our data for the 'ggplot' function to use with minimal code. We use the "id" column as the ID variable, since that contains our x values. The shaped data now looks more like this:
# id variable value
# 1 1 X1 -0.280035386
# 2 2 X1 -0.371020958
# 3 3 X1 -0.239889784
# 4 4 X1 0.450357442
# 5 5 X1 -0.801697283
# 6 6 X1 -0.453057841
# 7 7 X1 -0.451321958
# 8 8 X1 0.948124835
# 9 9 X1 2.724205279
# 10 10 X1 -0.725622824
# 11 11 X1 0.475545293
# 12 12 X1 0.533060822
# 13 13 X1 -1.928335572
# 14 14 X1 -0.466790259
# 15 15 X1 -1.606005895
# 16 16 X1 0.005678344
# 17 17 X1 -1.719827853
# 18 18 X1 0.601011314
# 19 19 X1 -2.056315661
# 20 20 X1 1.006169713
# 21 1 X2 -1.591227194
# ...
# 80 20 X4 -1.045224561
You don't need to get too hung up on the shaping. Just understand that "melted" data works better with the 'ggplot' functions. We specify our melted data as the data for our geometry layer, and then we use the 'aes' function to tell the geometry layer how to deal with our data. Our x values are in the "id" column, and our y values are in the "value" column. The next part is what removes the loops: we specify the color to be differentiated based on the "variable" column. In our melted data, the "variable" column contains the name of the column that the data originally came from, and using it to specify the color will tell 'ggplot' to automatically change the color for each new "variable" value.
ggplot() +
geom_line(
data=melt(data, id.vars="id"),
aes(
x=id,
y=value,
col=variable
),
lwd=1,
alpha=0.7)
I specified the line width ("lwd") and alpha values just to make the graph a little more readable.
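If reshape2 is not available, the same long ("melted") layout can be sketched in base R with stack(). This is a swapped-in technique, not what the answer above uses; it assumes a data frame shaped like `data` from the setup (an id column followed by X1..X4), recreated here so the block runs on its own.

```r
set.seed(42)
N <- 20
data <- data.frame(id = 1:N, matrix(rnorm(N * 4), ncol = 4))

# stack() turns the value columns into two columns:
# values, and ind (the name of the column each value came from)
s <- stack(data[, -1])
long <- data.frame(
  id       = rep(data$id, times = ncol(data) - 1),  # x values repeat once per series
  variable = s$ind,
  value    = s$values
)
head(long, 3)
```

The result has the same id/variable/value columns as melt(data, id.vars="id"), so it could be passed as the data argument of geom_line() in the same way.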

Related

Cumulative plot in R

I'm trying to make a cumulative plot for a particular column (for instance the first) of my data (example):
1 3
2 5
4 9
8 11
12 17
14 20
16 34
20 40
Then I want to overlay this plot with another cumulative plot of other data (for example the second column) and save it as a PNG or JPG image.
I want to do this without typing the vectors in "by hand" as in Cumulative Plot with Given X-Axis, because with a very large dataset I wouldn't be able to do that.
I tried the following simple commands:
A <- read.table("cumul.dat", header=TRUE)
This reads the file, but now I want the cumulative plot to be made from a particular column of this file.
The command is:
cdat1<-cumsum(dat1)
but this is for a particular vector dat1 that I need to take from the data array (cumul.dat).
Thanks
I couldn't follow your question so this is a shot in the dark answer based on key words I did get:
m <- read.table(text=" 1 3
2 5
4 9
8 11
12 17
14 20
16 34
20 40")
library(ggplot2)
m2 <- stack(m)
qplot(rep(1:nrow(m), 2), values, colour=ind, data=m2, geom="step")
EDIT: I decided I like this approach better:
library(ggplot2)
library(reshape2)
m$x <- seq_len(nrow(m))
m2 <- melt(m, id='x')
qplot(x, value, colour=variable, data=m2, geom="step")
I wasn't quite sure when the events were happening and what the observations were. I'm assuming the events are just at 1, 2, 3, 4, ... and the columns represent counts of the different groups. If that's the case, using lattice I would do:
require(lattice)
A<-data.frame(dat1=c(1,2,4,8,12,14,16,20), dat2=c(3,5,9,11,17,20,34,40))
dd<-do.call(make.groups, lapply(A, function(x) {data.frame(x=seq_along(x), y=cumsum(x))}))
xyplot(y~x, dd, groups=which, type="s", auto.key=TRUE)
With base graphics, this can be done by specifying type='s' in the plot call:
matplot(apply(A, 2, cumsum), type='s', xlab='x', ylab='y', las=1)
Note I've used matplot here, but you could also plot the series one at a time, the first with plot and the second with points or lines.
We could also add a legend with, for example:
# matplot's defaults are lty=1:5 and col=1:6, so the first two series use lty 1:2 and col 1:2
legend('topleft', c('Series 1', 'Series 2'), bty='n', lty=1:2, col=1:2)
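Picking a column without typing its values by hand is just indexing: A$dat1 by name or A[[1]] by position (a headerless file read with header=FALSE gets columns named V1, V2, ...). The sketch below plots the series one at a time, as described above, and writes the result to a PNG file, which the question also asked for. The output path is a hypothetical temporary file; jpeg() can be used the same way.

```r
A <- data.frame(dat1 = c(1, 2, 4, 8, 12, 14, 16, 20),
                dat2 = c(3, 5, 9, 11, 17, 20, 34, 40))

cum1 <- cumsum(A[[1]])     # first column, selected by position
cum2 <- cumsum(A$dat2)     # second column, selected by name

out_file <- file.path(tempdir(), "cumulative.png")   # hypothetical output path
png(out_file, width = 600, height = 400)
# First series with plot(); ylim covers both series so the second one fits
plot(cum1, type = "s", ylim = range(cum1, cum2), xlab = "x", ylab = "y", las = 1)
# Second series drawn on top with lines()
lines(cum2, type = "s", lty = 2, col = 2)
legend("topleft", c("Series 1", "Series 2"), bty = "n", lty = 1:2, col = 1:2)
invisible(dev.off())       # closes the device and writes the file
```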

Color Dependent Bar Graph in R

I'm a bit out of my depth with this one here. I have the following code that generates two equally sized matrices:
MAX<-100
m<-5
n<-40
success<-matrix(runif(m*n,0,1),m,n)
samples<-floor(MAX*matrix(runif(m*n),m))+1
The success matrix is the probability of success, and the samples matrix is the corresponding number of samples that were observed in each case. I'd like to make a bar graph that groups each column together, with the bar heights determined by the success matrix. The color of each bar needs to be a color (scaled from 1 to MAX) that corresponds to the number of observations (e.g., small sample counts would be more red, whereas large ones would be green).
Any ideas?
Here is an example with ggplot. First, get data into long format with melt:
library(reshape2)
data.long <- cbind(melt(success), melt(samples)[3])
names(data.long) <- c("group", "x", "success", "count")
head(data.long)
# group x success count
# 1 1 1 0.48513473 8
# 2 2 1 0.56583802 58
# 3 3 1 0.34541582 40
# 4 4 1 0.55829073 64
# 5 5 1 0.06455401 37
# 6 1 2 0.88928606 78
Note melt will iterate through the row/column combinations of both matrices the same way, so we can just cbind the resulting molten data frames. The [3] after the second melt is so we don't end up with repeated group and x values (we only need the counts from the second melt). Now let ggplot do its thing:
library(ggplot2)
ggplot(data.long, aes(x=x, y=success, group=group, fill=count)) +
geom_bar(position="stack", stat="identity") +
scale_fill_gradient2(
low="red", mid="yellow", high="green",
midpoint=mean(data.long$count)
)
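Because R stores matrices column by column, the long format can also be sketched in base R without reshape2, using row() and col() to recover each cell's indices. This is a swapped-in alternative to melt(); the matrices are regenerated with set.seed() so the block runs on its own.

```r
set.seed(1)
MAX <- 100
m <- 5
n <- 40
success <- matrix(runif(m * n, 0, 1), m, n)
samples <- floor(MAX * matrix(runif(m * n), m)) + 1

# as.vector() reads a matrix column by column, the same order melt() uses,
# so the four vectors line up cell for cell
data.long <- data.frame(
  group   = as.vector(row(success)),   # row index of each cell
  x       = as.vector(col(success)),   # column index of each cell
  success = as.vector(success),
  count   = as.vector(samples)
)
head(data.long)
```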
Using @BrodieG's data.long, this plot might be a little easier to interpret:
library(ggplot2)
library(RColorBrewer) # for brewer.pal(...)
ggplot(data.long) +
geom_bar(aes(x=x, y=success, fill=count),colour="grey70",stat="identity")+
scale_fill_gradientn(colours=brewer.pal(9,"RdYlGn")) +
facet_grid(group~.)
Note that actual values are probably different because you use random numbers in your sample. In future, consider using set.seed(n) to generate reproducible random samples.
Edit [Response to OP's comment]
You get numbers for x-axis and facet labels because you start with matrices instead of data.frames. So convert success and samples to data.frames, set the column names to whatever your test names are, and prepend a group column with the "list of factors". Converting to long format is a little different now because the first column has the group names.
library(reshape2)
set.seed(1)
success <- data.frame(matrix(runif(m*n,0,1),m,n))
success <- cbind(group=paste("Factor", seq_len(nrow(success)), sep="."), success)
samples <- data.frame(floor(MAX*matrix(runif(m*n),m))+1)
samples <- cbind(group=success$group,samples)
data.long <- cbind(melt(success,id=1), melt(samples, id=1)[3])
names(data.long) <- c("group", "x", "success", "count")
One way to set a threshold color is to add a column to data.long and use that for fill:
threshold <- 25
data.long$fill <- with(data.long,ifelse(count>threshold,max(count),count))
Putting it all together:
library(ggplot2)
library(RColorBrewer)
ggplot(data.long) +
geom_bar(aes(x=x, y=success, fill=fill),colour="grey70",stat="identity")+
scale_fill_gradientn(colours=brewer.pal(9,"RdYlGn")) +
facet_grid(group~.)+
theme(axis.text.x=element_text(angle=-90,hjust=0,vjust=0.4))
Finally, when you have names for the x-axis labels they tend to get jammed together, so I rotated the names -90°.

Plotting a filled line chart with 4 variables against a 5th variable with ggplot2

I am trying to create a position="fill" plot that represents an allocation on the y axis (always summing to 100) and another variable on the x axis. Variables 1-4 are integers, and variable 5 is a continuous numeric variable. All five variables are on the same row.
Y axis: variable 1 + variable 2 + variable 3 + variable 4 = 100
X axis: variable 5
Is there a way to do this without melting my data table?
Sample code, caution: runs a bit slow due to how I set up variables 1-4...
library(combinat)
combinations <- combn(100, 4)
permutations <- combinations[, colSums(combinations) == 100]
rm(combinations)
data <- t(rbind(permutations,
replicate(ncol(permutations), cumprod(1+rnorm(20, 0.05, 0.30))[20])
))
One way to generate a reproducible example would be
set.seed(1)
data_ex <- data.frame(t(rmultinom(1000,prob=rep(0.25,4),size=100)),
v5=runif(1000,0.8,1))
and then
library(ggplot2)
library(reshape2)
ggplot(melt(data_ex,id.var="v5")) +
geom_area(aes(x=v5,y=value,fill=variable))
draws the plot.
If you really want to do things the hard way you can avoid using melt, but melt is much (much much) easier!
cumvals <- t(apply(data_ex[,1:4],1,cumsum))
data2 <- data.frame(cumvals,v5=data_ex$v5)
ggplot(data2,aes(x=v5)) +
## these must go in reverse order
geom_area(aes(y=X4),fill="green")+
geom_area(aes(y=X3),fill="purple")+
geom_area(aes(y=X2),fill="red")+
geom_area(aes(y=X1),fill="blue")
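Here rmultinom() guarantees that every row sums to 100, so the stacked areas top out at 100 by construction. If real rows do not sum to a constant, a sketch of rescaling them to percentages first (similar in spirit to position="fill", which rescales each stack to a height of 1) is below; the raw matrix is hypothetical illustration data.

```r
set.seed(1)
# Hypothetical unnormalized allocations: 10 rows, 4 variables
raw <- matrix(sample(1:50, 4 * 10, replace = TRUE), ncol = 4)

# Divide each row by its total so that every row sums to 100
pct <- sweep(raw, 1, rowSums(raw), "/") * 100

# Row-wise cumulative sums give the stacked boundaries, as cumvals does above
cumvals <- t(apply(pct, 1, cumsum))
cumvals[1:3, ]
```

data.frame(cumvals, v5 = ...) would then name the columns X1..X4, matching the geom_area() layers above.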

plotting aggregate data with ggplot

I have data like this:
subject<-1:208
ev<-runif(208, min=1, max=2)
seeds<-gl(6,40,labels=c('seed1', 'seed2','seed3','seed4','seed5','seed6'),length=208)
ngambles<-gl(2,1, labels=c('4','32'), length=208) # alternates 4, 32, 4, 32, ...
trial<-rep(1:20, each= 2, length=208)
data<-data.frame(subject,ev,seeds,ngambles,trial)
the data looks like this
subject ev seeds ngambles trial
1 1.996717 seed1 4 1
2 1.280977 seed1 32 1
3 1.571648 seed1 4 2
4 1.153311 seed1 32 2
5 1.502559 seed1 4 3
6 1.644001 seed1 32 3
I plot a graph with trial on the x axis and expected value (ev) on the y axis, for each seed and ngambles, with this command:
qplot(trial,ev,data=data,
facets=ngambles~seeds,xlab="Trial", ylab="Expected Value", geom="line")+
ggtitle("Expected Value for Each Seed") # opts(title=...) no longer exists in current ggplot2
Now I want to draw a new graph by aggregating ev over trials 1-5, 6-10, 11-15, and 16-20. I also want to draw error bars.
I have no clue how to do this in R.
Maybe somebody can help me?
Thanks in advance.
Assuming that your data frame is called df: first, add a new column ag that shows which interval each original trial value belongs to, using the function cut().
df$ag<-cut(df$trial,c(1,6,11,16,21),right=FALSE)
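A quick check of how these breaks bin the trials: with right=FALSE each interval is closed on the left and open on the right, so [1,6) holds trials 1-5. Supplying labels= (an optional variation, not used in the answer's code) gives readable levels directly; the trial values here are made up for illustration.

```r
trials <- c(1, 5, 6, 10, 11, 15, 16, 20)   # hypothetical trial numbers

# Default labels show the half-open intervals, e.g. "[1,6)"
as.character(cut(trials, c(1, 6, 11, 16, 21), right = FALSE))

# Custom labels name the same bins
binned <- cut(trials, c(1, 6, 11, 16, 21), right = FALSE,
              labels = c("1-5", "6-10", "11-15", "16-20"))
table(binned)
```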
Now there are two possibilities. The first is to aggregate your data with the stat_ functions of ggplot2. The stat_summary() function is already defined, and you should also define a stat_sum_df() function (taken from the stat_summary() help file) to calculate more than one summary value.
stat_sum_df <- function(fun, geom="crossbar", ...) {
stat_summary(fun.data=fun, colour="red", geom=geom, width=0.2, ...)
}
Use stat_sum_df() with the argument "mean_cl_normal" to calculate confidence intervals for geom="errorbar", and stat_summary() to calculate the mean for geom="line" (mean_cl_normal wraps Hmisc::smean.cl.normal, so the Hmisc package must be installed). Use the new column ag as the x value. With scale_x_discrete() you get the right labels for the x axis.
ggplot(df, aes(ag,ev,group=seeds))+stat_sum_df("mean_cl_normal",geom="errorbar")+
stat_summary(fun="mean",geom="line",color="red")+
facet_grid(ngambles~seeds)+
scale_x_discrete(labels=c("1-5","6-10","11-15","16-20"))
The second approach is to summarize the data before plotting, for example with the function ddply() from the plyr package. In this case you also need the column ag made in the first example. Then use the new data for plotting.
library(plyr)
df.new<-ddply(df,.(ag,seeds,ngambles),summarise,ev.m=mean(ev),
ev.lim=qt(0.975,length(ev)-1)*sd(ev)/sqrt(length(ev)))
ggplot(df.new,aes(ag,group=seeds))+
geom_errorbar(aes(y=ev.m,ymin=ev.m-ev.lim,ymax=ev.m+ev.lim))+
geom_line(aes(y=ev.m))+
facet_grid(ngambles~seeds)+
scale_x_discrete(labels=c("1-5","6-10","11-15","16-20"))
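Without plyr, the same summary (mean and 95% t-interval half-width per ag/seeds/ngambles cell) can be sketched with base R's aggregate(). This is a swapped-in alternative to ddply(), not the answer's method; the data is rebuilt from the question's code so the block runs on its own. df.new ends up with the same ev.m and ev.lim columns, so the ggplot call above should work with it unchanged.

```r
set.seed(1)
subject  <- 1:208
ev       <- runif(208, min = 1, max = 2)
seeds    <- gl(6, 40, labels = paste0("seed", 1:6), length = 208)
ngambles <- gl(2, 1, labels = c("4", "32"), length = 208)
trial    <- rep(1:20, each = 2, length.out = 208)
df <- data.frame(subject, ev, seeds, ngambles, trial)
df$ag <- cut(df$trial, c(1, 6, 11, 16, 21), right = FALSE)

# Half-width of a 95% t confidence interval for the mean
ci_half <- function(x) qt(0.975, length(x) - 1) * sd(x) / sqrt(length(x))

# One row per ag/seeds/ngambles combination; FUN returns two numbers,
# which aggregate() stores as a two-column matrix named ev
df.new <- aggregate(ev ~ ag + seeds + ngambles, data = df,
                    FUN = function(x) c(m = mean(x), lim = ci_half(x)))
# Flatten the matrix column into ordinary columns ev.m and ev.lim
df.new <- do.call(data.frame, df.new)
head(df.new)
```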

How do you apply the pch parameter in R to individual points in a scatter plot?

I am interested in changing the symbol used to represent the two most influential points in my scatter plot. In this case, they are rows 19 and 20 in the data frame. The code I have is as follows:
data1<-read.csv("data1.csv")
plot(h~w,data=data1,xlab="Weight",ylab="Height",
main="Scatterplot of H vs W",pch=c(17,19)[data1[c(19,20),]])
I cannot get this to work despite several suggestions and hours of trying to figure it out. Any suggestions would be appreciated.
The pch value is used for each data point and gets recycled to the number of points you are plotting.
Consider this example
x <- 1:10 + rnorm(10)
y <- 1:10
plot( y ~ x )
The default is pch = 1 and it gets recycled to be used for each point.
Contrast that with:
plot( y ~ x , pch = rep(c(1,2),each=5))
You get the first five points with one symbol and the next five with another, because you have made a vector of pch values that specifies the plotting symbol for each of the 10 points being plotted:
rep(c(1,2),each=5)
#[1] 1 1 1 1 1 2 2 2 2 2
In your case, assuming data1 has exactly 20 rows (so rows 19 and 20 are the last two), all you need to do is
plot(h~w,data=data1,xlab="Weight",ylab="Height",
main="Scatterplot of H vs W",pch=c(rep(1,times=18),17,19) )
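Hard-coding c(rep(1, times=18), 17, 19) only works when the data has exactly 20 rows. A sketch that works for any number of rows builds a default pch vector and overwrites just the chosen rows; data1 here is hypothetical random data standing in for data1.csv.

```r
set.seed(1)
# Hypothetical stand-in for read.csv("data1.csv")
data1 <- data.frame(w = runif(30, 50, 100), h = runif(30, 150, 200))

special <- c(19, 20)   # rows to highlight
# Alternatively, pick the two most influential rows by Cook's distance:
# special <- order(cooks.distance(lm(h ~ w, data = data1)), decreasing = TRUE)[1:2]

pch_vec <- rep(1, nrow(data1))   # default symbol for every point
pch_vec[special] <- c(17, 19)    # filled triangle and filled circle for the two rows

plot(h ~ w, data = data1, xlab = "Weight", ylab = "Height",
     main = "Scatterplot of H vs W", pch = pch_vec)
```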
