My code produces a lattice of plots, but there are 77 of them, so each panel is squished and unreadable. I have tried adjusting the width and height and using facet_wrap_paginate, but none of these gives the output I want. Each panel is a line plot for one subject and EVTEST, and there are 77 subjects.
I want to break the plots into multiple panels of 4x3 or 4x4 spread across several pages. I am currently outputting to RTF, but I am open to PDF if that works. How do I do it?
Below is fictitious data in the same format as my input. I am trying to draw line plots for this longitudinal data, with VISITDY_D on the X axis and EVSTRESN on the Y axis. I group the plots by a concatenated handle (SUBJID_EVTEST_SITE).
SUBJID SITE EVTEST EVSTRESN VISITDY_D SIDE
1 AB ABC 1.1 D00 Left
1 AB ABC 2.1 D28 Right
1 AB ABC 2.2 D56 Left
1 AB ABC 2.3 D84 Left
2 AB ABC 1.5 D00
2 AB ABC 1.6 D28 Right
#read the data (csv file)
donnees <- read.csv(paste0(path_data,"Sample.csv"), sep = ";",header =
T,stringsAsFactors = FALSE)
Params = c("Phenotype1","Phenotype2")
#PLOTTING FUNCTION
pf1<-function(subD,tit1){ # subD:Input data, tit1: Title for the plot
subD$SUBJID1 <-
as.factor(paste0(subD$RANDOID,'_',subD$EVTEST,'_',subD$SITE))
p1 <- ggplot(subD,aes(x = VISITDY_D,y = EVSTRESN, color=SIDE,group=SIDE)) +
geom_line(position=position_dodge(width=0.7))+
geom_point() + facet_wrap_paginate(~ SUBJID1, nrow=3,ncol=3,page=1) +
theme()
print(p1)
p1 <- ggplot(subD,aes(x = VISITDY_D,y = EVSTRESN,
color=SIDE,group=SIDE)) +
geom_line(position=position_dodge(width=0.7))+
geom_point() + facet_wrap(~ SUBJID1) + theme()
print(p1)
}
# OUTPUT; Calling the plotting function in RTF
oP <- "/output_directory"
setwd(oP)
rtf <- RTF(file = paste0("TEST_","individual profiles.rtf"))
addTOC(rtf)
addPageBreak(rtf)
for(s in 1:length(Params)){
dat = subset(donnees,donnees$EVTEST %in% Params[s])
SUBJID1 <-as.factor(paste0(dat$RANDOID,'_',dat$EVTEST,'_',dat$SITE))
tit1<-paste0(Params[s])
addHeader(rtf,tit1,font.size = 4, TOC.level = 1)
addPlot(rtf, plot.fun=pf1, subD= dat,tit1= tit1, width= 7, height=5.2,
res=250)
}
done(rtf)
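One possible way to get readable pages, sketched here with made-up data: split the panel labels into chunks of 12 and draw each chunk with plain facet_wrap() on its own PDF page. This swaps facet_wrap_paginate() for manual chunking. The toy data frame, the per_page value and the output file name are assumptions for illustration, not part of the original code.

```r
library(ggplot2)

# Made-up data in the same shape as the sample above (30 subjects, 4 visits, 2 sides)
set.seed(1)
toy <- expand.grid(SUBJID = 1:30,
                   VISITDY_D = c("D00", "D28", "D56", "D84"),
                   SIDE = c("Left", "Right"),
                   stringsAsFactors = FALSE)
toy$EVSTRESN <- runif(nrow(toy), 1, 3)
toy$SUBJID1 <- factor(paste0(toy$SUBJID, "_ABC_AB"))

# Split the panel labels into chunks of 12 (one 4 x 3 grid per page)
per_page <- 12
ids <- levels(toy$SUBJID1)
subject_pages <- split(ids, ceiling(seq_along(ids) / per_page))

# Each print() call on an open pdf() device starts a new page
out_file <- file.path(tempdir(), "individual_profiles.pdf")
pdf(out_file, width = 8.5, height = 11)
for (page_ids in subject_pages) {
  page_data <- droplevels(toy[toy$SUBJID1 %in% page_ids, ])
  p <- ggplot(page_data,
              aes(x = VISITDY_D, y = EVSTRESN, colour = SIDE, group = SIDE)) +
    geom_line() +
    geom_point() +
    facet_wrap(~ SUBJID1, nrow = 4, ncol = 3)
  print(p)
}
dev.off()
```

For RTF output, the same chunking could probably be applied inside the existing for loop, calling addPlot() once per chunk instead of once per parameter.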
I'm working on trying to represent an office building in R. Later, I'll need to represent multiple floors, but for now I need to start with one floor. There are clusters of cubes all in a regular structure. There are four small cubes for junior staff (4x4), and two larger cubes for a senior engineer and a manager (4x6). Once these are mapped out, I need to be able to show if they are occupied or free for new hires -- by color (like red for occupied, green for available). These are all laid out the same way, with the big ones on one end. For example,
+----+--+--+
| S |J1|J2|
+----+--+--+
<-hallway-->
+----+--+--+
| M |J3|J4|
+----+--+--+
I first thought I could use ggplot and just scatter plot everybody out, but I can't figure out how to capture the different size cubes with geom_point. I spent some time looking at maps, but it seems like I can't really take advantage of the regular structure of my floorplan -- maybe that really is the way to go and I take advantage of my regular structure in building out a map? Does R have a concept I should Google for this kind of structure?
In the end, I'll get a long data file, with the type of cubicle, the x and y coordinates of the cluster, and a "R" or "G" (4 columns).
You could also write a low-level graphic function; it's sometimes easier to tune than removing more and more components from a complex plot:
library(grid)
library(gridExtra)
floorGrob <- function(S = c(TRUE, FALSE), J = c(TRUE, FALSE, TRUE, TRUE),
draw=TRUE, newpage=is.null(vp), vp=NULL){
m <- rbind(c(1,3,4), # S1 J1 J2
c(7,7,7), # hall
c(2,5,6)) # S2 J3 J4
fills <- c(c("#FBB4AE","#CCEBC5")[c(S, J)+1], "grey90")
cellGrob <- function(f) rectGrob(gp=gpar(fill=f, col="white", lwd=2))
grobs <- mapply(cellGrob, f=fills, SIMPLIFY = FALSE)
g <- arrangeGrob(grobs = grobs, layout_matrix = m, vp = vp, as.table = FALSE,
heights = unit(c(4/14, 1/14, 4/14), "null"),
widths = unit(c(6/14, 4/14, 4/14), "null"), respect=TRUE)
if(draw) {
if(newpage) grid.newpage()
grid.draw(g)
}
invisible(g)
}
floorGrob()
How about?
df <- expand.grid(x = 0:5, y = 0:5)
df$color <- factor(sample(c("green", "red"), 36, replace = T))
head(df)
# x y color
# 1 0 0 green
# 2 1 0 green
# 3 2 0 green
# 4 3 0 red
# 5 4 0 green
# 6 5 0 red
library(ggplot2)
ggplot(df, aes(x, y, fill = color)) +
geom_tile() +
scale_fill_manual(name = "Is it open?",
values = c("lightgreen", "#FF3333"),
labels = c("open", "not open"))
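geom_tile draws equally sized cells, so it does not capture the 4x6 versus 4x4 cubes. A minimal sketch of one alternative is geom_rect with explicit corners per cube. The helper cluster_rects(), its coordinates (big cubes 6 wide, small cubes 4 wide, a hallway 1 unit deep) and the status values are hypothetical and only follow the layout described in the question.

```r
library(ggplot2)

# Hypothetical helper: rectangles for one cluster whose lower-left corner is (x0, y0).
# Bottom row is M, J3, J4; top row is S, J1, J2; the hallway (y 4 to 5) stays empty.
cluster_rects <- function(x0, y0, status) {
  data.frame(
    cube   = c("M", "J3", "J4", "S", "J1", "J2"),
    xmin   = x0 + c(0, 6, 10, 0, 6, 10),
    xmax   = x0 + c(6, 10, 14, 6, 10, 14),
    ymin   = y0 + c(0, 0, 0, 5, 5, 5),
    ymax   = y0 + c(4, 4, 4, 9, 9, 9),
    status = status
  )
}

# Two clusters side by side with made-up occupancy ("R" occupied, "G" free)
floor_plan <- rbind(
  cluster_rects(0,  0, c("R", "G", "G", "R", "R", "G")),
  cluster_rects(16, 0, c("G", "G", "R", "R", "G", "G"))
)

p <- ggplot(floor_plan) +
  geom_rect(aes(xmin = xmin, xmax = xmax, ymin = ymin, ymax = ymax, fill = status),
            colour = "white") +
  geom_text(aes(x = (xmin + xmax) / 2, y = (ymin + ymax) / 2, label = cube)) +
  scale_fill_manual(name = "Status",
                    values = c(G = "lightgreen", R = "#FF3333"),
                    labels = c(G = "open", R = "occupied")) +
  coord_equal()
print(p)
```

Because every cluster has the same internal layout, the long data file only needs the cluster origin and the status of each cube; the helper fills in the rest.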
I find coplot {graphics} very useful for my plots. However, I would like each panel to show not just one line but a second one as well. In base graphics I would just set add = TRUE, or use plot(..) followed by lines(..). With {lattice} I can save my plots as objects
a<-xyplot(..)
b<-xyplot(..)
and display them simply with a + as.layer(b). Neither approach works for coplot(), apparently because a <- coplot() does not produce a trellis object but NULL.
How can I add a second data line in coplot()? I really like its graphics, so I would like to keep using it. Thank you!
My example data are here: http://ulozto.cz/xPfS1uRH/repr-exemple-csv
My code:
sub.tab<-read.csv("repr_exemple.csv", header = T, sep = "")
attach(sub.tab)
cells.f<-factor(cells, levels=c(2, 25, 100, 250, 500), # unique(cells.in.cluster)???
labels=c("size2", "size25", "size100", "size250", "size500"))
perc.f<-factor(perc, levels=c(5, 10), # unique(cells.in.cluster)???
labels=c("perc5", "perc10"))
# how to put these plots together?
a<- coplot(max_dist ~ time |cells.f + perc.f, data = sub.tab,
xlab = "ticks", type = "l", col = "black", lwd = 1)
b<- coplot(mean_dist ~ time |cells.f * perc.f, data = sub.tab,
xlab = "ticks", type = "l", col = "grey", lwd = 1)
a + as.layer(b) # this doesn't work
How can I merge these two plots (grey and black lines)? I couldn't figure it out. Thank you!
Linking to sample data isn't as helpful as including it in the question. Here's a randomly created sample data set:
set.seed(15)
dd <- do.call("rbind",
do.call("Map", c(list(function(a,b) {
cbind.data.frame(a,b, x=1:5,
y1=cumsum(rpois(5,7)),
y2=cumsum(rpois(5,9)))
}),
expand.grid(a=letters[1:5], b=letters[20:22])))
)
head(dd)
# a b x y1 y2
# 1 a t 1 8 16
# 2 a t 2 13 28
# 3 a t 3 25 35
# 4 a t 4 33 45
# 5 a t 5 39 57
# 6 b t 1 4 12
Note that coplot is a base graphics function, not lattice. But it does have a panel= parameter, and you can have coplot() take care of subsetting your data for you (well, calculating the indexes at least). Like other base graphics functions, though, plotting different groups isn't exactly trivial. You can do it in this case with
coplot(y~x|a+b,
# make a fake y col to cover range of all y1 and y2 values
cbind(dd, y=seq(min(dd$y1, dd$y2), max(dd$y1, dd$y2), length.out=nrow(dd))),
#request subscripts to be sent to panel function
subscripts=TRUE,
panel=function(x,y,subscripts, ...) {
# draw group 1
lines(x, dd$y1[subscripts])
# draw group 2
lines(x, dd$y2[subscripts], col="red")
})
This draws both series in every panel: y1 in black and y2 in red.
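If switching away from coplot() is acceptable, lattice's extended formula interface can draw two response columns per panel directly. The following is only a sketch on made-up data (the toy data frame is an assumption), not a replacement for the coplot() approach:

```r
library(lattice)

# Made-up data: two cumulative series (y1, y2) for each combination of a and b
set.seed(15)
toy <- expand.grid(x = 1:5, a = letters[1:3], b = letters[20:21])
toy$y1 <- ave(rpois(nrow(toy), 7), toy$a, toy$b, FUN = cumsum)
toy$y2 <- ave(rpois(nrow(toy), 9), toy$a, toy$b, FUN = cumsum)

# "y1 + y2 ~ x" treats the two columns as groups within each panel
p <- xyplot(y1 + y2 ~ x | a + b, data = toy, type = "l",
            par.settings = simpleTheme(col = c("black", "red")),
            auto.key = list(lines = TRUE, points = FALSE))
print(p)
```

Since the result is a trellis object, it can also be stored and combined with other layers, which coplot() does not allow.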
Using the following code:
library("ggplot2")
require(zoo)
args <- commandArgs(TRUE)
input <- read.csv(args[1], header=F, col.names=c("POS","ATT"))
id <- args[2]
prot_len <- nrow(input)
manual <- prot_len/100 # 4.3
att_name <- "Entropy"
att_zoo <- zoo(input$ATT)
att_avg <- rollapply(att_zoo, width = manual, by = manual, FUN = mean, align = "left")
autoplot(att_avg, col="att1") + labs(x = "Positions", y = att_name, title="")
With data:
> str(input)
'data.frame': 431 obs. of 2 variables:
$ POS: int 1 2 3 4 5 6 7 8 9 10 ...
$ ATT: num 0.652 0.733 0.815 1.079 0.885 ...
I would like to load a second file, input2, which has a different length (and therefore a different x-axis), and overlay the two curves in the same plot. By overlay I mean drawing both curves in the same plot area, ignoring the overlapping axis labels and titles; I want to compare their shapes regardless of the length of the input.
First I tried generating a toy input2 by changing the manual value, so that att_avg2 uses manual = 7. Between the original autoplot call and the second one I added par(new=TRUE), but this does not give the expected output. Any hint on how to do this? Maybe it is better to convert att_avg from a zoo series to a data.frame and not use autoplot? Thanks
UPDATE, response to G. Grothendieck:
If I do:
[...]
att_zoo <- zoo(input$ATT)
att_avg <- rollapply(att_zoo, width = manual, by = manual, FUN = mean, align = "left") #manual=4.3
att_avg2 <- rollapply(att_zoo, width = 7, by = 7, FUN = mean, align = "left")
autoplot(cbind(att_avg, att_avg2), facet=NULL) +
labs(x = "Positions", y = att_name, title="")
I get
and a warning message:
Removed 1 rows containing missing values (geom_path).
par is used with classic graphics, not for ggplot2. If you have two zoo series just cbind or merge the series together and autoplot them using facet=NULL:
library(zoo)
library(ggplot2)
z1 <- zoo(1:3) # length 3
z2 <- zoo(5:1) # length 5
autoplot(cbind(z1, z2), facet = NULL)
Note: The question omitted input2 so there could be some additional considerations from aspects not shown.
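To compare shapes regardless of length, one option is to index each series by relative position (0 to 1) before merging. Merging series with different indexes leaves NAs where only one series has a value, which is the likely source of the "Removed ... rows containing missing values" warning; na.approx() can fill those gaps by linear interpolation. This is a sketch on toy series, and rescaling to 0..1 is an assumption about what "comparing the shape" should mean:

```r
library(zoo)
library(ggplot2)

# Two toy series of different length, indexed by relative position in [0, 1]
z1 <- zoo(c(1, 3, 2),       order.by = seq(0, 1, length.out = 3))
z2 <- zoo(c(5, 4, 3, 2, 1), order.by = seq(0, 1, length.out = 5))

# merge() aligns on the union of the indexes and pads with NA
both <- merge(z1, z2)

# Fill interior gaps by linear interpolation along the index
filled <- na.approx(both, na.rm = FALSE)

# Both curves in one panel
p <- autoplot(filled, facets = NULL) +
  labs(x = "Relative position", y = "Value")
print(p)
```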
I have 2 dataframes, Tg and Pf, each of 127 columns. All columns have at least one row and can have up to thousands of them. All the values are between 0 and 1 and there are some missing values (empty cells). Here is a little subset:
Tg
Tg1 Tg2 Tg3 ... Tg127
0.9 0.5 0.4 0
0.9 0.3 0.6 0
0.4 0.6 0.6 0.3
0.1 0.7 0.6 0.4
0.1 0.8
0.3 0.9
0.9
0.6
0.1
Pf
Pf1 Pf2 Pf3 ...Pf127
0.9 0.5 0.4 1
0.9 0.3 0.6 0.8
0.6 0.6 0.6 0.7
0.4 0.7 0.6 0.5
0.1 0.6 0.5
0.3
0.3
0.3
Note that some cells are empty, and the vectors for the same column index (i.e. 1 to 127) can have very different lengths and are rarely exactly the same length.
I want to generate 127 graphs, one for each pair of vectors (i.e. graph 1 is for column 1 from each dataframe, graph 2 is for column 2 from each dataframe, etc.).
Hope that makes sense. I'm looking forward to your assistance as I don't want to make those graphs one by one...
Thanks!
Here is an example to get you started (data at https://gist.github.com/1349300). For further tweaking, check out the excellent ggplot2 documentation that is all over the web.
library(ggplot2)
library(reshape2) # melt()
library(plyr)     # ddply(), join(), summarize()
# Load data
Tg = read.table('Tg.txt', header=T, fill=T, sep=' ')
Pf = read.table('Pf.txt', header=T, fill=T, sep=' ')
# Format data
Tg$x = as.numeric(rownames(Tg))
Tg = melt(Tg, id.vars='x')
Tg$source = 'Tg'
Tg$variable = factor(as.numeric(gsub('Tg(.+)', '\\1', Tg$variable)))
Pf$x = as.numeric(rownames(Pf))
Pf = melt(Pf, id.vars='x')
Pf$source = 'Pf'
Pf$variable = factor(as.numeric(gsub('Pf(.+)', '\\1', Pf$variable)))
# Stack data
data = rbind(Tg, Pf)
# Plot
dev.new(width=5, height=4)
p = ggplot(data=data, aes(x=x)) + geom_line(aes(y=value, group=source, color=source)) + facet_wrap(~variable)
p
Highlighting the area between the lines
First, interpolate the data onto a finer grid. This way the ribbon will follow the actual envelope of the lines, rather than just where the original data points were located.
data = ddply(data, c('variable', 'source'), function(x) data.frame(approx(x$x, x$value, xout=seq(min(x$x), max(x$x), length.out=100))))
names(data)[4] = 'value'
Next, calculate the data needed for geom_ribbon - namely ymax and ymin.
ribbon.data = ddply(data, c('variable', 'x'), summarize, ymin=min(value), ymax=max(value))
Now it is time to plot. Notice how we've added a new ribbon layer, for which we've substituted our new ribbon.data frame.
dev.new(width=5, height=4)
p + geom_ribbon(aes(ymin=ymin, ymax=ymax), alpha=0.3, data=ribbon.data)
Dynamic coloring between the lines
The trickiest variation is if you want the coloring to vary based on the data. For that, you currently must create a new grouping variable to identify the different segments. Here, for example, we might use a function that indicates when the "Tg" group is on top:
GetSegs <- function(x) {
segs = x[x$source=='Tg', ]$value > x[x$source=='Pf', ]$value
segs.rle = rle(segs)
on.top = ifelse(segs, 'Tg', 'Pf')
on.top[is.na(on.top)] = 'Tg'
group = rep.int(1:length(segs.rle$lengths), times=segs.rle$lengths)
group[is.na(segs)] = NA
data.frame(x=unique(x$x), group, on.top)
}
Now we apply it and merge the results back with our original ribbon data.
groups = ddply(data, 'variable', GetSegs)
ribbon.data = join(ribbon.data, groups)
For the plot, the key is that we now specify a grouping aesthetic to the ribbon geom.
dev.new(width=5, height=4)
p + geom_ribbon(aes(ymin=ymin, ymax=ymax, group=group, fill=on.top), alpha=0.3, data=ribbon.data)
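The run-length step inside GetSegs can be seen in isolation on two short made-up vectors. Each run of consecutive TRUE or FALSE values in the comparison becomes one segment id, which is what the group aesthetic needs:

```r
# Made-up values for one facet: Tg starts on top, drops below Pf, then recovers
tg <- c(0.9, 0.8, 0.4, 0.2, 0.6)
pf <- c(0.5, 0.6, 0.6, 0.5, 0.3)

on_top_is_tg <- tg > pf                 # which series is on top at each x
runs <- rle(on_top_is_tg)               # lengths of consecutive identical values

# One integer id per run, repeated for every point in that run
group <- rep(seq_along(runs$lengths), times = runs$lengths)
on_top <- ifelse(on_top_is_tg, "Tg", "Pf")

data.frame(x = seq_along(tg), group, on_top)
```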
Code is available together at: https://gist.github.com/1349300
Here is a three-liner to do the same :-). We first use reshape from base R to convert the data into long form. Then, it is melted to suit ggplot2. Finally, we generate the plot!
mydf <- reshape(cbind(Tg, Pf), varying = 1:8, direction = 'long', sep = "")
mydf_m <- melt(mydf, id.var = c(1, 4), variable = 'source')
qplot(id, value, colour = source, data = mydf_m, geom = 'line') +
facet_wrap(~ time, ncol = 2)
NOTE. The reshape function in base R is extremely powerful, albeit very confusing to use. It is used to transform data between long and wide formats.
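Since reshape() is easy to get wrong, here is a small sketch with the arguments spelled out instead of relying on name guessing. The wide data frame and the names "column", "Tg" and "Pf" for the long columns are made up for illustration:

```r
# Toy wide data: two Tg columns and two Pf columns, one row per observation
wide <- data.frame(id  = 1:3,
                   Tg1 = c(0.9, 0.9, 0.4), Tg2 = c(0.5, 0.3, 0.6),
                   Pf1 = c(0.9, 0.9, 0.6), Pf2 = c(0.5, 0.3, 0.6))

# varying: one vector of column names per long variable
# v.names: names of the resulting long variables
# timevar/times: the column that records which original column a value came from
long <- reshape(wide, direction = "long",
                varying = list(c("Tg1", "Tg2"), c("Pf1", "Pf2")),
                v.names = c("Tg", "Pf"),
                timevar = "column", times = 1:2,
                idvar = "id")
long
```

The result has one row per id and column index, with Tg and Pf side by side, which can then be melted or plotted directly.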
Kudos for automating something you used to do in Excel using R! That's exactly how I got started with R and a common path to R enlightenment :)
All you really need is a little looping. Here's an example, most of which is creating example data that represents your data structure:
## create some example data
Tg <- data.frame(Tg1 = rnorm(10))
for (i in 2:10) {
vec <- rep(NA, 8)
vec <- c(rnorm(sample(5:10,1)), vec)
Tg[paste("Tg", i, sep="")] <- vec[1:10]
}
Pf <- data.frame(Pf1 = rnorm(10))
for (i in 2:10) {
vec <- rep(NA, 8)
vec <- c(rnorm(sample(5:10,1)), vec)
Pf[paste("Pf", i, sep="")] <- vec[1:10]
}
## ok, sample data created
## now lets loop through all the columns
## if you didn't know how many columns there are you could
## use ncol(Tg) to figure out
for (i in 1:10) {
plot(1:10, Tg[,i], type = "l", col="blue", lwd=5, ylim=c(-3,3),
xlim=c(1, max(length(na.omit(Tg[,i])), length(na.omit(Pf[,i])))))
lines(1:10, Pf[,i], col="red", lwd=5)
dev.copy(png, paste('rplot', i, '.png', sep=""))
dev.off()
}
This will result in 10 PNG files (rplot1.png to rplot10.png) in your working directory.
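With 127 columns, a single multi-page PDF may be easier to browse than 127 PNG files. A minimal sketch of that variation, using a three-column toy version of the data (the values, the 0 to 1 y-range and the file name are assumptions):

```r
# Toy data with ragged columns padded by NA, as read.table(fill = TRUE) would give
Tg <- data.frame(Tg1 = c(0.9, 0.9, 0.4, 0.1, 0.1),
                 Tg2 = c(0.5, 0.3, 0.6, 0.7, NA),
                 Tg3 = c(0.4, 0.6, 0.6, NA, NA))
Pf <- data.frame(Pf1 = c(0.9, 0.9, 0.6, 0.4, 0.1),
                 Pf2 = c(0.5, 0.3, 0.6, 0.7, 0.6),
                 Pf3 = c(0.4, 0.6, NA, NA, NA))

out_file <- file.path(tempdir(), "tg_vs_pf.pdf")
pdf(out_file, width = 6, height = 4)   # every new plot becomes a new page
for (i in seq_len(ncol(Tg))) {
  # matplot draws one line per column of the matrix; NA values are skipped
  matplot(cbind(Tg[[i]], Pf[[i]]), type = "l", lty = 1, lwd = 2,
          col = c("blue", "red"), ylim = c(0, 1),
          xlab = "Row", ylab = "Value", main = paste("Column", i))
  legend("topright", legend = c("Tg", "Pf"),
         col = c("blue", "red"), lty = 1, lwd = 2)
}
dev.off()
```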
I have data that looks like this:
> head(data)
groupname ob_time dist.mean dist.sd dur.mean dur.sd ct.mean ct.sd
1 rowA 0.3 61.67500 39.76515 43.67500 26.35027 8.666667 11.29226
2 rowA 60.0 45.49167 38.30301 37.58333 27.98207 8.750000 12.46176
3 rowA 120.0 50.22500 35.89708 40.40000 24.93399 8.000000 10.23363
4 rowA 180.0 54.05000 41.43919 37.98333 28.03562 8.750000 11.97061
5 rowA 240.0 51.97500 41.75498 35.60000 25.68243 28.583333 46.14692
6 rowA 300.0 45.50833 43.10160 32.20833 27.37990 12.833333 14.21800
Each groupname is a data series. Since I want to plot each series separately, I've separated them like this:
> A <- zoo(data[which(groupname=='rowA'),3:8],data[which(groupname=='rowA'),2])
> B <- zoo(data[which(groupname=='rowB'),3:8],data[which(groupname=='rowB'),2])
> C <- zoo(data[which(groupname=='rowC'),3:8],data[which(groupname=='rowC'),2])
ETA:
Thanks to gd047: Now I'm using this:
z <- dlply(data,.(groupname),function(x) zoo(x[,3:8],x[,2]))
The resulting zoo objects look like this:
> head(z$rowA)
dist.mean dist.sd dur.mean dur.sd ct.mean ct.sd
0.3 61.67500 39.76515 43.67500 26.35027 8.666667 11.29226
60 45.49167 38.30301 37.58333 27.98207 8.750000 12.46176
120 50.22500 35.89708 40.40000 24.93399 8.000000 10.23363
180 54.05000 41.43919 37.98333 28.03562 8.750000 11.97061
240 51.97500 41.75498 35.60000 25.68243 28.583333 46.14692
300 45.50833 43.10160 32.20833 27.37990 12.833333 14.21800
So if I want to plot dist.mean against time and include error bars equal to +/- dist.sd for each series:
how do I combine A,B,C dist.mean and dist.sd?
how do I make a bar plot, or perhaps better, a line graph of the resulting object?
I don't see the point of breaking up the data into three pieces only to have to combine it together for a plot. Here is a plot using the ggplot2 library:
library(ggplot2)
qplot(ob_time, dist.mean, data=data, colour=groupname, geom=c("line","point")) +
geom_errorbar(aes(ymin=dist.mean-dist.sd, ymax=dist.mean+dist.sd))
This spaces the time values along the natural scale, you can use scale_x_continuous to define the tickmarks at the actual time values. Having them equally spaced is trickier: you can convert ob_time to a factor, but then qplot refuses to connect the points with a line.
Solution 1 - bar graph:
qplot(factor(ob_time), dist.mean, data=data, geom=c("bar"), fill=groupname,
colour=groupname, position="dodge") +
geom_errorbar(aes(ymin=dist.mean-dist.sd, ymax=dist.mean+dist.sd), position="dodge")
Solution 2 - add lines manually using the 1,2,... recoding of the factor:
qplot(factor(ob_time), dist.mean, data=data, geom=c("line","point"), colour=groupname) +
geom_errorbar(aes(ymin=dist.mean-dist.sd, ymax=dist.mean+dist.sd)) +
geom_line(aes(x=as.numeric(factor(ob_time))))
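The line-connection problem with a factor x-axis can usually be handled by setting the group aesthetic explicitly, which tells ggplot2 which points belong to the same line. A sketch with ggplot() rather than qplot(), on made-up values for two groups:

```r
library(ggplot2)

# Made-up summary data for two series at three observation times
toy <- data.frame(
  groupname = rep(c("rowA", "rowB"), each = 3),
  ob_time   = rep(c(0.3, 60, 120), times = 2),
  dist.mean = c(61.7, 45.5, 50.2, 40.0, 42.0, 44.0),
  dist.sd   = c(10.0, 8.0, 9.0, 5.0, 6.0, 7.0)
)

# factor(ob_time) spaces the time points equally; group = groupname keeps the lines
p <- ggplot(toy, aes(x = factor(ob_time), y = dist.mean,
                     colour = groupname, group = groupname)) +
  geom_line() +
  geom_point() +
  geom_errorbar(aes(ymin = dist.mean - dist.sd, ymax = dist.mean + dist.sd),
                width = 0.2) +
  labs(x = "ob_time")
print(p)
```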
This is a hint of the way I would try to do it. I have ignored grouping, so you'll have to modify it to include more than one series. Also, I haven't used zoo because I don't know it well.
g <- (nrow(data)-1)/(3*nrow(data))
plot(data[,"dist.mean"],col=2, type='o',lwd=2,cex=1.5, main="This is the title of the graph",
xlab="x-Label", ylab="y-Label", xaxt="n",
ylim=c(0,max(data[,"dist.mean"])+max(data[,"dist.sd"])),
xlim=c(1-g,nrow(data)+g))
axis(side=1,at=c(1:nrow(data)),labels=data[,"ob_time"])
for (i in 1:nrow(data)) {
lines(c(i,i),c(data[i,"dist.mean"]+data[i,"dist.sd"],data[i,"dist.mean"]-data[i,"dist.sd"]))
lines(c(i-g,i+g),c(data[i,"dist.mean"]+data[i,"dist.sd"], data[i,"dist.mean"]+data[i,"dist.sd"]))
lines(c(i-g,i+g),c(data[i,"dist.mean"]-data[i,"dist.sd"], data[i,"dist.mean"]-data[i,"dist.sd"]))
}
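The three lines() calls per point can be condensed with base R's arrows(), which draws flat caps when angle = 90 and code = 3. This sketch reuses the first four rowA values from the question and, like the code above, ignores grouping:

```r
# First four rows of the sample data (single series)
toy <- data.frame(ob_time   = c(0.3, 60, 120, 180),
                  dist.mean = c(61.675, 45.49167, 50.225, 54.05),
                  dist.sd   = c(39.76515, 38.30301, 35.89708, 41.43919))

lower <- toy$dist.mean - toy$dist.sd
upper <- toy$dist.mean + toy$dist.sd
x <- seq_len(nrow(toy))          # equally spaced positions 1, 2, ...

plot(x, toy$dist.mean, type = "o", xaxt = "n",
     ylim = range(lower, upper), xlab = "ob_time", ylab = "dist.mean")
axis(side = 1, at = x, labels = toy$ob_time)

# code = 3 puts a head at both ends; angle = 90 makes the heads flat caps
arrows(x, lower, x, upper, angle = 90, code = 3, length = 0.05)
```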
Read the data in using read.zoo with the split= argument to split it by groupname. Then bind together the dist, lower and upper lines. Finally plot them.
Lines <- "groupname ob_time dist.mean dist.sd dur.mean dur.sd ct.mean ct.sd
rowA 0.3 61.67500 39.76515 43.67500 26.35027 8.666667 11.29226
rowA 60.0 45.49167 38.30301 37.58333 27.98207 8.750000 12.46176
rowA 120.0 50.22500 35.89708 40.40000 24.93399 8.000000 10.23363
rowA 180.0 54.05000 41.43919 37.98333 28.03562 8.750000 11.97061
rowB 240.0 51.97500 41.75498 35.60000 25.68243 28.583333 46.14692
rowB 300.0 45.50833 43.10160 32.20833 27.37990 12.833333 14.21800"
library(zoo)
# next line is only needed until next version of zoo is released
source("http://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/zoo/R/read.zoo.R?revision=719&root=zoo")
z <- read.zoo(textConnection(Lines), header = TRUE, split = 1, index = 2)
# pick out the dist and sd columns binding dist with lower & upper
z.dist <- z[, grep("dist.mean", colnames(z))]
z.sd <- z[, grep("dist.sd", colnames(z))]
zz <- cbind(z = z.dist, lower = z.dist - z.sd, upper = z.dist + z.sd)
# plot using N panels
N <- ncol(z.dist)
ylab <- sub("dist.mean.", "", colnames(z.dist))
plot(zz, screen = 1:N, type = "l", lty = rep(1:2, N*1:2), ylab = ylab)
I don't think you need to create zoo objects for this type of plot; I would do it directly from the data frame. Of course, there may be other reasons to use zoo objects, such as smart merging, aggregation, etc.
One option is the segplot function from latticeExtra
library(latticeExtra)
segplot(ob_time ~ (dist.mean + dist.sd) + (dist.mean - dist.sd) | groupname,
data = data, centers = dist.mean, horizontal = FALSE)
## and with the latest version of latticeExtra (from R-forge):
trellis.last.object(segments.fun = panel.arrows, ends = "both", angle = 90, length = .1) +
xyplot(dist.mean ~ ob_time | groupname, data, col = "black", type = "l")
Using Gabor's nicely-reproducible dataset, this produces one panel per groupname, with the error bars drawn around the dist.mean line.