How can I plot 7 different graphs on one PDF page in R?
I currently use matplot, which doesn't seem to have this option. I need to plot columns of data against columns of data.
I initially tried to do this with the lattice library, but I can't seem to figure out how to plot the columns of data. It seems to want a function.
To create a PDF of plots, open a PDF graphics device with the pdf function, giving it a file name, and then draw the plots. Calling dev.off() at the end closes the graphics device and completes the file. Afterwards, you should see a new document in the working directory ('plots.pdf' in this example).
d <- data.frame(matrix(sample(1:700, 200, TRUE), 10, 20)) ## 10 rows, 20 columns
pdf('plots.pdf')
par(mfrow = c(3, 3)) ## set the layout to be 3 by 3
for (i in 1:9) plot(d[,i]) ## one plot per column
dev.off()
This produces a PDF with the plots arranged in a 3 by 3 grid.
If you want to do this with base graphics, I strongly recommend using the layout() function. It takes a matrix which defines how to split up the window. It will make a row for every row in your matrix and a column for every column. It draws the plots in order of the number of the cells. So if you pass the matrix
#layout(matrix(c(1:7,7), ncol=2, byrow=TRUE))
#     [,1] [,2]
#[1,]    1    2
#[2,]    3    4
#[3,]    5    6
#[4,]    7    7
the first plot will go in the upper left, the second in the upper right, and so on, until the 7th plot, which spans the whole bottom row. You could have it take up only the bottom left instead by putting a different number in the bottom right (for example 0, which leaves that cell empty).
To reset the layout back to "normal" just call
layout(1)
Once the layout is set, you could use a for loop to draw each plot.
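As a minimal sketch of how the pieces above fit together, the following draws seven column-against-column scatterplots on one PDF page. The data frame, the choice of plotting column i against column i + 1, and the output path are all assumptions made for illustration.

```r
# Hypothetical data: 8 numeric columns, so column i can be plotted against column i + 1
set.seed(42)
d <- data.frame(matrix(rnorm(10 * 8), nrow = 10, ncol = 8))

out_file <- file.path(tempdir(), "seven_plots.pdf")  # hypothetical output path
pdf(out_file, width = 7, height = 10)

# 4 x 2 grid; plot 7 spans the whole bottom row
layout(matrix(c(1:7, 7), ncol = 2, byrow = TRUE))
par(mar = c(4, 4, 2, 1))  # smaller margins so seven panels fit on one page

for (i in 1:7) {
  plot(d[[i]], d[[i + 1]],
       xlab = names(d)[i], ylab = names(d)[i + 1],
       main = paste("Plot", i))
}

layout(1)   # reset the layout
dev.off()   # close the device and finish writing the PDF
```

The tall page (height = 10) is only a guess at what keeps four rows of panels readable; adjust it to taste.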
If you want one plot to do all pairwise comparisons, the pairs() plotting function might be what you want
dd<-matrix(runif(5*5), ncol=5)
pairs(dd)
or the lattice equivalent is splom()
Related
I am trying to use the R barplot function to plot the following array on the same graph:
ID 1 2 3 4 5 6 7 8
HeL 0 2 1 4 2 3 2 4
CaC 2 0 0 2 1 5 7 8
NIH 1 2 5 6 3 5 7 9
I would need to have the barplot of each row having its own y-axis, but the x-axis should be common for all rows. What I have achieved so far, is to read the matrix from the file "rna.tab" and then plot each row separately:
dat <- read.table ("rna.tab", row.names=1, header=TRUE)
barplot (as.matrix (dat[,1]))
barplot (as.matrix (dat[,2]))
barplot (as.matrix (dat[,3]))
but I didn't succeed in plotting them all together.
Thanks in advance-
Arturo
Is this what you are looking for? If it isn't could you please make a manual example of what you want and post the image?
par(mfrow = c(ncol(dat),1), mar = c(2.5,4,1,1))
apply(dat, 2, barplot, beside = TRUE)
par(mfrow = c(1,1))
The first par says you want a grid of plots with as many rows as there are columns of dat and 1 column, and changes the margins of the plot to be appropriate. The apply function makes a barplot for each column of dat, and beside = TRUE puts the columns next to each other. The next par resets the plotting grid to a single graph, so the next time you need to plot something you aren't just making a bunch of tiny plots.
Thanks Barker for the fix and sorry for taking so long to get back to you, but I was sick for almost one week.
Your code works great, the only thing is that, since I need to plot the rows and not the columns, it should be:
apply(dat, 1, barplot, beside = TRUE)
Sorry for not being clear about this point.
I have just one last question, if you don't mind. Usually my real life matrix is 6000*30. This means that I have to plot 30 rows.
Usually I save the image to disk:
png ("plot.png")
par(mfrow = c(ncol(dat),1), mar = c(2.5,4,1,1))
apply(dat, 1, barplot, beside = TRUE)
dev.off ()
When I do this, I get only the plot of the last 4 rows in the file "plot.png", instead of the plot of all rows. Also, since the x-axis is the same for all plots, would it be possible to draw it only at the end?
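One possible explanation, offered as a guess: the grid above is sized with ncol(dat) although the rows are being plotted, and once a page is full a single-file png device starts a new page that overwrites the previous one, so only the last page survives. A minimal sketch that sizes the grid with nrow(dat), makes the image taller as the number of rows grows, and labels the x-axis only on the bottom panel could look like this. The inline table stands in for "rna.tab", and the output path and pixel sizes are assumptions.

```r
# Stand-in for read.table("rna.tab", row.names = 1, header = TRUE)
dat <- read.table(text = "
ID  1 2 3 4 5 6 7 8
HeL 0 2 1 4 2 3 2 4
CaC 2 0 0 2 1 5 7 8
NIH 1 2 5 6 3 5 7 9
", row.names = 1, header = TRUE, check.names = FALSE)

n <- nrow(dat)
out_png <- file.path(tempdir(), "plot.png")  # hypothetical output path

# Scale the image height with the number of rows so every panel fits on one page
png(out_png, width = 800, height = 150 * n)
par(mfrow = c(n, 1), mar = c(2.5, 4, 1, 1))

for (i in seq_len(n)) {
  last <- i == n
  barplot(as.matrix(dat)[i, ],
          # bar labels only under the bottom panel
          names.arg = if (last) colnames(dat) else rep("", ncol(dat)),
          ylab = rownames(dat)[i])
}
dev.off()
```

With 30 rows this gives a 4500-pixel-tall image; the 150 pixels per row is an arbitrary starting point.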
I have an R dataframe and some scatterplots and barplots created from them.
df <- data.frame(var1 = c(2,3,8,2,5,6,2,7,4,4),var2 = runif(n = 10),var3 = runif(n=10,min = 10,max=50),var4 = c(rep("A",5),rep("B",5)))
plot(df$var1,df$var2)
plot(df$var2,df$var3)
barplot(df$var3,names.arg=df$var4)
If I am interested in a point on the first plot, I would like to identify that point on the second, third or multiple other plots. I would like to be able to do this interactively (for example using mouse-over hover effects) in a shareable rmarkdown document.
How can one go about doing this in R either using base graphics, ggplot or even something like shiny/rCharts? Any examples/links would be appreciated. Thanks.
You can use the identify function to locate the points in a scatterplot interactively in base R.
For example, to identify points in the second plot (var2 against var3), call
identify(df$var2,df$var3)
Once you have clicked on the point of interest, hit the Esc key. The row number corresponding to the point on which you clicked will be displayed in the console and on the graph.
In this case I have clicked on a point near var2=0.5 and var3=30. The result shows that this is point number 2 in the dataset.
> identify(df$var2,df$var3) # Hit Esc key once you have selected the point.
[1] 2 # <- this is the result: the index (row) number of the selected point
#> df[2,]
#  var1     var2     var3 var4
#2    3 0.481937 29.54026    A
For more information see ?identify
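identify() only works in an interactive session, so it does not by itself give hover effects in a rendered document. To carry a selected row over to the other plots, one option is to redraw all of them with that row highlighted. The helper below, highlight_point, is a hypothetical name introduced for this sketch, not an existing function.

```r
set.seed(1)
df <- data.frame(var1 = c(2, 3, 8, 2, 5, 6, 2, 7, 4, 4),
                 var2 = runif(n = 10),
                 var3 = runif(n = 10, min = 10, max = 50),
                 var4 = c(rep("A", 5), rep("B", 5)))

# Hypothetical helper: redraw all three plots with row(s) `idx` in red
highlight_point <- function(df, idx) {
  cols <- rep("grey40", nrow(df))
  cols[idx] <- "red"
  old <- par(mfrow = c(1, 3))
  on.exit(par(old))                      # restore the previous layout
  plot(df$var1, df$var2, col = cols, pch = 19)
  plot(df$var2, df$var3, col = cols, pch = 19)
  barplot(df$var3, names.arg = df$var4, col = cols)
  invisible(cols)                        # colours used, one per row
}

# Interactive use: idx <- identify(df$var2, df$var3); highlight_point(df, idx)
highlight_point(df, 2)
```

Because the bars in the barplot are drawn in row order, the same colour vector marks the selected row there as well.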
I'm completely new to R, and I have been tasked with writing a script that plots the protocols used by a simulated network of users as histograms by a) identifying the protocols they use and b) splitting everything into 5-second intervals and generating a graph for each different protocol used.
Currently we have
data$bucket <- cut(as.numeric(format(data$DateTime, "%H%M")),
c(0,600, 2000, 2359),
labels=c("00:00-06:00", "06:00-20:00", "20:00-23:59")) #Split date into dates that are needed to be
to split the codes into 3-zones for another function.
What should the code be changed to for 5 second intervals?
Sorry if the question isn't very clear, and thank you
The histogram function hist() can aggregate and/or plot all by itself, so you really don't need cut().
Let's create 1,000 random time stamps across one hour:
set.seed(1)
foo <- as.POSIXct("2014-12-17 00:00:00")+runif(1000)*60*60
(Look at ?POSIXct on how R treats POSIX time objects. In particular, note that "+" assumes you want to add seconds, which is why I am multiplying by 60^2.)
Next, define the breakpoints in 5 second intervals:
breaks <- seq(as.POSIXct("2014-12-17 00:00:00"),
as.POSIXct("2014-12-17 01:00:00"),by="5 sec")
(This time, look at ?seq.POSIXt.)
Now we can plot the histogram. Note how we assign the output of hist() to an object bar:
bar <- hist(foo,breaks)
(If you don't want the plot, but only the bucket counts, use plot=FALSE.)
?hist tells you that hist() (invisibly) returns an object that contains, among other things, the counts per bucket. We can look at these by accessing the counts component of bar:
bar$counts
[1] 1 2 0 1 0 1 1 2 3 3 0 ...
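To get one histogram per protocol, the same breaks can be applied to each protocol's time stamps separately. This is only a sketch: the data frame and its column names DateTime and Protocol, as well as the protocol names, are assumptions, since the question does not show the actual data.

```r
set.seed(1)
# Hypothetical log: the column names DateTime and Protocol are assumptions
data <- data.frame(
  DateTime = as.POSIXct("2014-12-17 00:00:00", tz = "UTC") + runif(300) * 60,
  Protocol = sample(c("HTTP", "DNS", "SSH"), 300, replace = TRUE)
)

# 5-second breakpoints covering the one minute of simulated traffic
breaks <- seq(as.POSIXct("2014-12-17 00:00:00", tz = "UTC"),
              as.POSIXct("2014-12-17 00:01:00", tz = "UTC"), by = "5 sec")

# Bucket counts per protocol, without plotting
counts <- lapply(split(data$DateTime, data$Protocol),
                 function(t) hist(t, breaks, plot = FALSE)$counts)

# One histogram per protocol, stacked vertically
par(mfrow = c(length(counts), 1))
for (p in names(counts)) {
  hist(data$DateTime[data$Protocol == p], breaks,
       freq = TRUE, main = p, xlab = "Time")
}
```

counts is a named list with one vector of bucket counts per protocol, which can be handy if the numbers are needed in addition to the plots.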
I have data in .csv format as shown below. I am new to ggplot2 and have not been able to plot it the way I want.
T L
141.5453333 1
148.7116667 1
154.7373333 1
228.2396667 1
148.4423333 1
131.3893333 1
139.2673333 1
140.5556667 2
143.719 2
214.3326667 2
134.4513333 3
169.309 8
161.1313333 4
I tried to plot a line graph using the following code
data<-read.csv("sample.csv",head=TRUE,sep=",")
ggplot(data,aes(T,L))+geom_line()
but I got the following image, which is not what I want.
I want an image like the following instead.
Can anybody help me?
You want to use a variable for the x-axis that has lots of duplicated values and expect the software to guess that the order you want those points plotted is given by the order they appear in the data set. This also means the values of the variable for the x-axis no longer correspond to the actual coordinates in the coordinate system you're plotting in, i.e., you want to map a value of "L=1" to different locations on the x-axis depending on where it appears in your data.
This type of fairly nonsensical thing does not work in ggplot2 out of the box. You have to define a separate variable that has a proper mapping to values on the x-axis ("id" in the code below) and then overwrite the labels with the values for "L".
The code below shows you how to do this, but it seems like a different graphical display would probably be better suited for this kind of data.
library(ggplot2)
data <- as.data.frame(matrix(scan(text="
141.5453333 1
148.7116667 1
154.7373333 1
228.2396667 1
148.4423333 1
131.3893333 1
139.2673333 1
140.5556667 2
143.719 2
214.3326667 2
134.4513333 3
169.309 8
161.1313333 4
"), ncol=2, byrow=TRUE))
names(data) <- c("T", "L")
data$id <- 1:nrow(data)
ggplot(data,aes(x=id, y=T))+geom_line() + xlab("L") +
scale_x_continuous(breaks=data$id, labels=data$L)
You have an error in your code, try this:
ggplot(data,aes(x=L, y=T))+geom_line()
Default arguments for aes are:
aes(x, y, ...)
I would like to write a function with graphical output of the regression on the original data and one on the modified data. Plotting the regression on the original data should be optional. Moreover, there should be legends in the graphs. And here is my problem:
If I choose the option orig.plot=FALSE, everything works OK.
But when I choose the other option, orig.plot=TRUE, the position of my legends is not very satisfying.
# Generation of the data set
set.seed(444)
nr.outlier<- 10
x<-seq(0,60,length=150);
y<-rnorm(150,0,10);
yy<-x+y;
d<-cbind(x,yy)
# Manipulation of data:
ss1<-sample(1:nr.outlier,1) # sample size 1
sri1<-sample(c(1:round(0.2*length(x))),ss1) # sample row index 1
sb1<-c(yy[quantile(yy,0.95)<yy])# sample base 1
d[sri1,2]<-sample(sb1,ss1,replace=T) # manipulation of part 1
ss2<-nr.outlier-ss1 # sample size 2
sri2<-sample(c(round(0.8*length(x)+1):length(x)),ss2) # sample row index 2
sb2<-c(yy[quantile(yy,0.05)>yy])# sample base 2
d[sri2,2]<-sample(sb2,ss2,replace=T) # manipulation of part 2
tlm2<-function(x,y,alpha=0.95,orig.plot=FALSE,orig.ret=FALSE){
m1<-lm(y~x)
res<-abs(m1$res)
topres<-sort(res,decreasing=TRUE)[1:round((1-alpha)*length(x))] # top (1-alpha)*n residuals
topind<-rownames(as.data.frame(topres)) # indices of the top residuals
x2<-x[-as.numeric(topind)] #
y2<-y[-as.numeric(topind)] # removal of the identified observations
m2<-lm(y2~x2)
r2_m1<-summary(m1)$'r.squared'
r2_m2<-summary(m2)$'r.squared'
if(orig.plot==TRUE){
par(mfrow=c(2,1))
plot(x,y,xlim=range(x),ylim=c(min(d[,2])-30,max(d[,2]+30)),main="Model based on original data")
abline(m1$coef);legend("topleft",legend=bquote(italic(R)^2==.(r2_m1)),bty="n")
}
plot(x2,y2,xlim=range(x),ylim=c(min(d[,2])-30,max(d[,2]+30)),main="Model based on trimmed data")
abline(m2$coef);
legend("topleft",legend=bquote(atop(italic(R)^2==.(r2_m2),alpha==.(alpha))),bty="n")
return(if(orig.ret==TRUE){list(m1=m1,m2=m2)} else{m2})
}
tlm2(d[,1],d[,2])
tlm2(d[,1],d[,2],orig.plot=T)
Can anyone give me a hint?
Thank You in advance!
The problem is that atop is essentially a division (i.e. x/y) without plotting the division bar. It also centers the denominator. The solution is to use expression instead of bquote. However, to mix expressions and variables, you need to use substitute and eval:
Here's what your legend call should look like:
legend("topleft",
       legend=c(eval(substitute(expression(italic(R)^2==my.r2), list(my.r2 = r2_m2))),
                eval(substitute(expression(alpha==my.alpha), list(my.alpha = alpha)))),
       bty="n")
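To try this outside the function, here is a small self-contained sketch of the same idea. The simulated data and the rounding of R^2 are additions made for illustration and are not part of the original function.

```r
set.seed(444)
x <- seq(0, 60, length.out = 150)
y <- x + rnorm(150, 0, 10)
m <- lm(y ~ x)
r2 <- round(summary(m)$r.squared, 3)  # rounding is an addition for readability
alpha <- 0.95

# Each element of the expression vector becomes its own legend line
leg <- c(
  eval(substitute(expression(italic(R)^2 == v), list(v = r2))),
  eval(substitute(expression(alpha == v), list(v = alpha)))
)

plot(x, y)
abline(m)
legend("topleft", legend = leg, bty = "n")
```

Because the legend receives a vector with two entries rather than a single atop() expression, the two lines are left-aligned and spaced like ordinary legend entries.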