I currently have a dataset which has a format of: (x, y, type)
I've used the code that is found on the example of plotting with Postgres through R.
My question is: How would I get R to generate multiple graphs for each unique "type" column?
I'm new to R, so my apologies if this is something that is extremely easy and I just lack an understanding of loops in R.
So let's say we have this data:
(1,1,T), (1,2,T), (1,3,T), (1,4,T), (1,5,T), (1,6,T),
(1,1,A), (1,2,B), (1,3,B), (1,4,B), (1,5,A), (1,6,A),
(1,1,B), (1,2,B), (1,3,C), (1,4,C), (1,5,C), (1,6,C),
It would plot 4 individual graphs on the page, one for each of the types T, A, B, and C. [Plotting x, y]
How would I do that with R when the data coming in may look like the data above?
While the other post has some good info, there's a faster way to do all that. So assuming your data frame or matrix is called DF and is in the form above (where each (1,2,B) or whatever is a row), then:
by(DF, DF[,3], function(x) plot(x[,1], x[,2], main=unique(x[,3])))
And that's it.
If you'd like all four plots to be on the same page, you can first change the graphical parameter option:
par(mfrow=c(2,2))
And set it back to the default with par(mfrow=c(1,1)) when you're done.
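Putting the two steps together, here is a minimal sketch. It assumes the data frame is called DF and has columns named x, y and type; the construction of DF below just reproduces the sample data from the question for illustration.

```r
# Example data in the (x, y, type) form from the question; DF and its
# column names are assumptions made for this sketch.
DF <- data.frame(
  x    = rep(1, 18),
  y    = rep(1:6, 3),
  type = c(rep("T", 6), "A", "B", "B", "B", "A", "A", "B", "B", "C", "C", "C", "C")
)

old_par <- par(mfrow = c(2, 2))   # 2 x 2 grid; par() returns the previous settings
# One scatter plot per type; invisible() suppresses the printed by() result.
invisible(by(DF, DF$type, function(d) {
  plot(d$x, d$y, main = as.character(d$type[1]), xlab = "x", ylab = "y")
}))
par(old_par)                      # restore the previous layout
```

Saving the value returned by par() and passing it back restores whatever layout was active before, which is a little safer than hard-coding c(1,1).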
I'm quite fond of the ggplot2 package, which does the same thing that user1717913 suggests, but with slightly different syntax (it does a lot of other things very nicely, which is why I like it.)
test <- data.frame(x=rep(1,18),y=rep(1:6,3),type=c("T","T","T","T","T","T","A","B","B","B","A","A","B","B","C","C","C","C"))
require(ggplot2)
ggplot(test, aes(x=x, y=y)) + #define the data that the plot will use, and which variables go where
geom_point() + #plot it with points
facet_wrap(~type) #facet it by the type variable
R is really cool in that there's a bazillion (that's a technical term) different ways to do most things. The way I would do it is to split the data along the groups, and then plot by group.
To do that, the split command is what you want (I'll assume your data is in an object called data):
data.splitted <- split(data, data$type)
Now the data will have this form (let's assume you have 3 types, A, B, and C):
data.splitted
$A
  x y type
  1 4 A
  3 6 A
$B
  x y type
  3 3 B
  2 1 B
$C
  x y type
  4 5 C
  5 2 C
and so on. You would reference the "4" in the y column of group A like so:
data.splitted$A$y[1] or data.splitted[[1]][[2]][1] Hopefully seeing them both together makes enough sense.
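As a small check of the two indexing styles, here is a sketch that rebuilds the example values shown above (the data frame itself is made up for illustration) and confirms both forms pick out the same element.

```r
# Small example in the shape shown above; the values are for illustration only.
data <- data.frame(
  x    = c(1, 3, 3, 2, 4, 5),
  y    = c(4, 6, 3, 1, 5, 2),
  type = c("A", "A", "B", "B", "C", "C")
)
data.splitted <- split(data, data$type)

names(data.splitted)                    # the group labels
by_name  <- data.splitted$A$y[1]        # first y value in group A, by name
by_index <- data.splitted[[1]][[2]][1]  # same element, by position
identical(by_name, by_index)            # both refer to the same value
```

Indexing by name is usually easier to read and keeps working if the column order changes.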
Now that we have the data split, we're getting closer.
We still need to tell R that we want to plot a bunch of graphs to the same window. Now, this is just one way to go about it. You could also tell it to write each graph to a image file, or a pdf, or whatever you want.
groups <- names(data.splitted) puts your different types into a variable for reference later.
par(mfcol=c(length(groups),1))
Using mfcol fills the grid of graphs column by column; the mfrow option fills it row by row. The c() just combines input. The length(groups) returns the total number of groups.
Now we can work on the for-loop.
for(i in 1:length(data.splitted)){ # This tells it what i is iterating from and to.
                                   # It can start and stop wherever, or be a
                                   # sequence, ascending or descending,
                                   # the sky is the limit.
  tempx <- data.splitted[[i]][["x"]] # This just saves us
  tempy <- data.splitted[[i]][["y"]] # a bunch of typing.
  plot(tempx, tempy, main=groups[i]) # Plot it and make the title the type.
  rm(tempx, tempy) # Remove our temporary variables for the next run through.
}
So you see, it's not too bad when you break it down into its components. You can do pretty much anything this way. I have a project I'm working on right now, where I'm doing this for 18 lidar metrics that I calculated using another for loop.
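The option mentioned earlier of writing the graphs to a file instead of a window could look roughly like this. It is only a sketch: the data is made up, and the output path (out_file) is a hypothetical name chosen for the example.

```r
# Made-up data; 'data' and its column names follow the answer above.
data <- data.frame(x = 1:12, y = (1:12)^2, type = rep(c("A", "B", "C"), each = 4))
data.splitted <- split(data, data$type)

out_file <- file.path(tempdir(), "plots_by_type.pdf")  # hypothetical output path
pdf(out_file)                       # each plot() call starts a new page
for (g in names(data.splitted)) {
  d <- data.splitted[[g]]
  plot(d$x, d$y, main = g, xlab = "x", ylab = "y")
}
dev.off()                           # close the file so it is written to disk
```

Looping over the names rather than over 1:length() gives the plot title for free and avoids a separate groups variable.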
Commands to read up on:
split, plot, data.frame, "[",
par(mfrow=___) and par(mfcol=___)
Here are a few helpful links to get you started. The most helpful one of all is built right into R, though: a ? followed by a command (for example ?split) will bring up the HTML help for that command in your browser.
Good luck!
I'm very new to R - but have been developing SAS-programs (and VBA) for some years. Well, the thing is that I have 4 lines of R-code (scripts?) that I would like to repeat 44 times. Two times for each of 22 different train stations, indicating whether the train is in- or out-going. The four lines of code are:
dataGL_FLIin <- subset( dataGL_all, select = c(Tidsinterval, Dag, M.ned, Ugenr.,Kode, Ugedag, FLIin))
names(dataGL_FLIin)[names(dataGL_FLIin)=='FLIin'] <- 'GL_Antal'
dataGL_FLIin$DIR<-"IN"
dataGL_FLIin$STATION<-"FLI"
To avoid repeating the 4 lines 44 times I need 2 "macro variables" (yes, I'm aware, that this is a SAS-thing only, sorry). One "macro variable" indicating the train station and one indicating the direction. In the example above the train station is FLI and the direction is in. Below the same 4 lines are demonstrated for the train station FBE, this time in out-going direction.
dataGL_FBEout <- subset( dataGL_all, select = c(Tidsinterval, Dag, M.ned, Ugenr.,Kode, Ugedag, FBEout))
names(dataGL_FBEout)[names(dataGL_FBEout)=='FBEout'] <- 'GL_Antal'
dataGL_FBEout$DIR<-"OUT"
dataGL_FBEout$STATION<-"FBE"
I have looked in many places and tried many combinations of R functions and R lists, but I can't make it work. Quite possibly I'm getting it all wrong. I apologize in advance if the question is (too) stupid, but I will be very grateful for any help on the matter.
Please note that, in the end, I want 44 different data frames created:
1) dataGL_FLIin
2) dataGL_FBEout
3) Etc. ...
ADDED: 2 STATION 2 DIRECTIONS EXAMPLE OF MY PROBLEM
'The one data frame I have'
Date<-c("01-01-15 04:00","01-01-15 04:20","01-01-15 04:40")
FLIin<-c(96,39,72)
FLIout<-c(173,147,103)
FBEin<-c(96,116,166)
FBEout<-c(32,53,120)
dataGL_all<-data.frame(Date, FLIin, FLIout, FBEin, FBEout)
'The four data frames I would like'
GL_antal<-c(96,39,72)
Station<-("FLI")
Dir<-("IN")
dataGL_FLIin<-data.frame(Date, Station, Dir, GL_antal)
GL_antal<-c(173,147,103)
Station<-("FLI")
Dir<-("OUT")
dataGL_FLIout<-data.frame(Date, Station, Dir, GL_antal)
GL_antal<-c(96,116,166)
Station<-("FBE")
Dir<-("IN")
dataGL_FBEin<-data.frame(Date, Station, Dir, GL_antal)
GL_antal<-c(32,53,120)
Station<-("FBE")
Dir<-("OUT")
dataGL_FBEout<-data.frame(Date, Station, Dir, GL_antal)
Thanks,
lars
With your example, it is now clearer what you want, so I'll give it a second try. I use dataGL_all as defined in your question and then define
stations <- rep(c("FLI","FBE"),each=2)
directions <- rep(c("in","out"),times=length(stations)/2)
You could also extract the stations and directions from your data frame. Using your example, the following would work
stations <- substr(names(dataGL_all)[-1],1,3)
directions <- substr(names(dataGL_all)[-1],4,6)
Then, I define the function that will work on the data:
dataGLfun <- function(station,direction) {
name <- paste0(station,direction)
dataGL <- dataGL_all[,c("Date", name)]
names(dataGL)[names(dataGL)==name] <- 'GL_Antal'
dataGL$DIR<-direction
dataGL$STATION<-station
dataGL
}
And now I apply this function to all stations with both directions:
dataGL <- mapply(dataGLfun,stations,directions,SIMPLIFY=FALSE)
names(dataGL) <- paste0(stations,directions)
Now, you can get the data frames for each combination of station and direction. For instance, the two examples in your question, you get with dataGL$FLIin and dataGL$FBEout. The reason that there is a $ instead of a _ is that I did not actually create a separate variable for each data frame. Instead, I created a list, where each element of the list is one of the data frames. This has the advantage that it will be easier to do something to all the data frames later. With your solution, you would have to type all the various variable names, but if the data frames are in a list, you can work with them using functions like lapply.
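To illustrate why the list is convenient, here is a sketch that rebuilds the list from the example data in the question and then works on all data frames at once. The names totals and dataGL_long are introduced here for illustration only.

```r
# Data frame from the question.
Date <- c("01-01-15 04:00", "01-01-15 04:20", "01-01-15 04:40")
dataGL_all <- data.frame(Date,
                         FLIin = c(96, 39, 72),   FLIout = c(173, 147, 103),
                         FBEin = c(96, 116, 166), FBEout = c(32, 53, 120))

stations   <- substr(names(dataGL_all)[-1], 1, 3)
directions <- substr(names(dataGL_all)[-1], 4, 6)

dataGLfun <- function(station, direction) {
  name <- paste0(station, direction)
  out <- dataGL_all[, c("Date", name)]
  names(out)[names(out) == name] <- "GL_Antal"
  out$DIR <- direction
  out$STATION <- station
  out
}
dataGL <- mapply(dataGLfun, stations, directions, SIMPLIFY = FALSE)
names(dataGL) <- paste0(stations, directions)

# One call summarises every data frame in the list.
totals <- sapply(dataGL, function(d) sum(d$GL_Antal))

# Stack all of them into a single long data frame.
dataGL_long <- do.call(rbind, dataGL)
```

With 44 separate variables, each of these two steps would need every variable name typed out by hand.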
If you prefer to have many different variables, you could do the following
for (i in seq_along(stations)) {
assign(paste0("dataGL_",stations[i],directions[i]), dataGLfun(stations[i],directions[i]))
}
However, in my opinion, this is not how you should solve this problem in R.
I have a data matrix with approximately one hundred variables and I want to do box plots of these variables. Doing them one by one is possible, but tedious. The code I use for my box plots is:
boxplot(myVar ~ Group*Trt*Time,data=exp,col=c('red','blue'),frame.plot=T,las=2, ylab='Counts', at=c(1,2,3,4,6,7,8,9,11,12,13,14,16,17,18,19))
I started doing them one by one, but realized there must be better options. The boxplot call will take only one variable at a time (I may be wrong), so I am looking for a way to get it done in one go. A for loop? Next, I would like to print the name of the current variable (= the colName) on the plot in order to tell them apart.
Appreciate suggestions.
Thank you.
jd
Why not try the following:
data(something)
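panel.bxp <- function(x, ...)
{
  # Save the current user coordinates and restore them when the panel is done.
  usr <- par("usr"); on.exit(par(usr = usr))
  # Rescale the x range so the single boxplot is centred in the panel.
  par(usr = c(0, 2, usr[3:4]))
  boxplot(x, add=TRUE)
}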
Then, to run the function, you can try something like the following:
pairs(something, diag.panel = panel.bxp, text.panel = function(...){})
EDIT: There is also a nice link to an article here on R-bloggers which you might want to have a look at.
Being very new to R, I've tried to follow my 'old' thinking - making a for-loop. Here is what I came up with. Probably very primitive, and therefore, I'd appreciate comments/suggestions. Anyway: the loop:
for (i in 1:ncol(final)) {
#print(i)
c <- colnames(final)[i]
#print(c)
b <- final[,i]
#b <- t(b)
#dim(b)
#print(b)
exp <- data.frame(Group,Trt,Time,b)
#dim(exp)
#print(exp)
boxplot(b ~ Group*Trt*Time,data=exp,col=c('red','blue'),frame.plot=T, las=2, ylab='Counts',main=c, at=c(1,2,3,4,6,7,8,9,11,12,13,14,16,17,18,19))
}
The loop runs through the data matrix 'final', (48rows x 67cols). Picks up the column header, c, which is used in the boxplot call as main title. Picks up the data column, b. Sets up the experiment using the Group, Trt, and Time factors established outside the loop, and calls the boxplot.
This seems to do what I want. Oddly, RStudio does not allow more than about 25 plots to be stored in the plots pane, so I have to run this loop in a couple of rounds.
Anyway, sorry for answering my own question. Better solutions are greatly appreciated since my way is pretty amateurish, I suspect.
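One way around the limit on stored plots might be to send every boxplot to a multi-page PDF instead of the plots pane. The sketch below uses made-up stand-ins for final, Group, Trt and Time (here collected in a data frame called design), and out_file is a hypothetical path; the at= positions from the loop above are left out because they depend on the real design.

```r
# Made-up stand-ins for the objects used in the loop above.
set.seed(1)
design <- expand.grid(Group = c("g1", "g2"), Trt = c("t1", "t2"), Time = c("d1", "d2"))
design <- design[rep(seq_len(nrow(design)), each = 6), ]   # 48 rows
final  <- matrix(rpois(48 * 5, lambda = 20), nrow = 48,
                 dimnames = list(NULL, paste0("var", 1:5)))

out_file <- file.path(tempdir(), "boxplots.pdf")  # hypothetical output path
pdf(out_file)                                     # one page per variable
for (v in colnames(final)) {
  d <- data.frame(design, value = final[, v])
  boxplot(value ~ Group * Trt * Time, data = d, col = c("red", "blue"),
          las = 2, ylab = "Counts", main = v)
}
dev.off()                                         # finish writing the file
```

Looping over the column names directly also gives the plot title without a separate lookup, and avoids reusing c and exp as variable names.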
I have a data frame that looks like this:
bin_with_regard_to_strand CLONE3
31 0.14750872
33 0.52735917
28 0.48559060
. .
. .
I want to use this data frame to generate violin plots in such a way that all of the values in CLONE3 corresponding to a given value of bin_with_regard_to_strand will generate one plot.
Further, I want all of the plots to appear in the same graphic device (I'm using R-studio, and I want all of the plots to appear in one plot window).
Theoretically I could do this with:
vioplot(df$CLONE3[which(df$bin_with_regard_to_strand==1)],
df$CLONE3[which(df$bin_with_regard_to_strand==2)]...)
but since bin_with_regard_to_strand has 60 different values, this seems a bit ridiculous.
I tried using tapply:
tapply(df$CLONE3, df$bin_with_regard_to_strand,vioplot)
But that would open 60 different windows (one for each plot).
Or, if I used the add parameter:
tapply(df$CLONE3, df$bin_with_regard_to_strand,vioplot(add=TRUE))
this generated a single plot with the data from all values of bin_with_regard_to_strand (separated by lines).
Is there a way to do this?
You could use par(mfrow=c(rows, columns)) (see ?par for details).
(see also ?layout for more complex arrangements)
d <- lapply(1:6, function(x)runif(100)) # generate some example data
library("vioplot")
par(mfrow=c(3, 2)) # use a 3x2 (rows x columns) layout
lapply(d, vioplot) # call plot for each list element
par(mfrow=c(1, 1)) # reset layout
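When the number of groups is not known in advance, the grid size can be computed instead of hard-coded. This is a sketch using n2mfrow() from grDevices; boxplot() is used only as a stand-in so the example runs without the vioplot package, and the data is made up.

```r
# Made-up data: one numeric vector per group, as in the example above.
set.seed(1)
d <- lapply(1:6, function(i) runif(100))

dims <- n2mfrow(length(d))        # grDevices helper: rows and columns for n plots
old_par <- par(mfrow = dims)      # par() returns the previous settings
for (i in seq_along(d)) {
  boxplot(d[[i]], main = paste("Group", i))  # stand-in for vioplot()
}
par(old_par)                      # restore the previous layout
```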
Another alternative to mfrow is to use layout. It is very handy for organizing your plots: you just create a matrix of plot indices. Here is what you can do. Sixty plots is a lot for one page, so maybe you should organize them in 2 pages.
The code below is parameterized by N (the number of plots):
library(vioplot)
N <- 60
par(mar=rep(2,4))
layout(matrix(c(1:N),
nrow=10,byrow=T))
dat <- data.frame(bin_with_regard_to_strand=gl(N,10),CLONE3=rnorm(10*N))
with(dat ,
tapply(CLONE3,bin_with_regard_to_strand ,vioplot))
This is an old question, but I thought I would put out a different solution for getting vioplot to make multiple violin plots on the same graph (i.e. same axes), rather than on different graphics objects like the above answers.
Basically, use do.call to apply vioplot to a list of data. Ultimately, vioplot is not very well written (you can't even set the title, axis names, etc.). I usually prefer base R, but this is a case where ggplot2 is probably the way to go.
library(vioplot)
x<-rnorm(1000)
fac<-rep(c(1:10),each=100)
listOfData<-tapply(x,fac,function(x){x},simplify=FALSE)
names(listOfData)[[1]]<-"x" #because vioplot requires a 'x' argument
do.call(vioplot,listOfData)
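For comparison, a ggplot2 version might look like the following. This is a sketch on simulated data that borrows the column names from the question; geom_violin() draws one violin per factor level on shared axes, and labs() sets the title and axis names.

```r
library(ggplot2)

# Simulated data with the same column names as in the question (values are made up).
set.seed(1)
df <- data.frame(
  bin_with_regard_to_strand = factor(rep(1:10, each = 100)),
  CLONE3 = rnorm(1000)
)

# One violin per bin on shared axes, with a title and axis labels.
p <- ggplot(df, aes(x = bin_with_regard_to_strand, y = CLONE3)) +
  geom_violin() +
  labs(title = "CLONE3 by bin", x = "bin_with_regard_to_strand", y = "CLONE3")
print(p)
```

Converting the grouping column to a factor matters here: with a plain numeric x, ggplot2 would treat it as continuous instead of drawing one violin per bin.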
Let me preface this question by saying that I know very little about R. I'm importing a text file into R using read.table("file.txt", T). The text file is in the general format:
header1 header2
a 1
a 4
b 3
b 2
Each a is an observation from a sample and similarly each b is an observation from a different sample. I want to calculate various statistics of the sets of a and b which I'm doing with tapply(header2, header1, mean). That works fine.
Now I need to do some qqnorm plots of a and b and draw a line on each with qqline. I can use tapply(header2, header1, qqnorm) to make quantile plots of each BUT using tapply(header2, header1, qqline) draws both best fit lines on the last quantile plot. Programmatically that makes sense but it doesn't help me.
So my question is, how can I convert the data frame to two vectors (one for all a and one for all b)? Does that make sense? Basically, in the above example, I'd want to end up with two vectors: a=(1,4) and b=(3,2).
Thanks!
Create a function that does both. You won't be able (easily at least) to revert to an old graphics device.
e.g.
with(dd, tapply(header2,header1, function(x) {qqnorm(x); qqline(x)}))
You could use data.table here for coding elegance (and speed)
You can pass the equivalent of a body of a function that is evaluated within the scope of the data.table e.g.
library(data.table)
DT <- data.table(dd)
DT[, {qqnorm(header2)   # header2 is evaluated within each header1 group
      qqline(header2)}, by=header1]
You don't really want to pollute your global environment with lots of objects (that will be inefficient).
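If the separate vectors are still wanted, split() returns them as elements of one named list rather than as loose objects. A minimal sketch, assuming the table from the question is stored in a data frame called dd (the name samples is introduced here for illustration):

```r
# The example table from the question; 'dd' is the name used in the answer above.
dd <- data.frame(header1 = c("a", "a", "b", "b"), header2 = c(1, 4, 3, 2))

samples <- split(dd$header2, dd$header1)   # named list of numeric vectors
samples$a   # the observations labelled "a"
samples$b   # the observations labelled "b"

# One Q-Q plot with its own reference line per sample.
for (s in names(samples)) {
  qqnorm(samples[[s]], main = paste("Normal Q-Q plot:", s))
  qqline(samples[[s]])
}
```

Calling qqline() right after the matching qqnorm() is what keeps each line on its own plot.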
So I have some lidar data that I want to calculate some metrics for (I'll attach a link to the data in a comment).
I also have ground plots that I have extracted the lidar points around, so that I have a couple hundred points per plot (19 plots). Each point has X, Y, Z, height above ground, and the associated plot.
I need to calculate a bunch of metrics on the plot level, so I created plotsgrouped with split(plotpts, plotpts$AssocPlot).
So now I have a data frame with a "page" for each plot, so I can calculate all my metrics by the "plot page". This works just dandy for individual plots, but I want to automate it. (yes, I know there's only 19 plots, but it's the principle of it, darn it! :-P)
So far, I've got a for loop going that calculates the metrics and puts the results in a data frame called Results. I pulled the names of the groups into a list called groups as well.
for(i in 1:length(groups)){
Results$Plot[i] <- groups[i]
Results$Mean[i] <- mean(plotsgrouped$PLT01$Z)
Results$Std.Dev.[i] <- sd(plotsgrouped$PLT01$Z)
Results$Max[i] <- max(plotsgrouped$PLT01$Z)
Results$`75%Avg.`[i] <- mean(plotsgrouped$PLT01$Z[plotsgrouped$PLT01$Z <= quantile(plotsgrouped$PLT01$Z, .75)])
Results$`50%Avg.`[i] <- mean(plotsgrouped$PLT01$Z[plotsgrouped$PLT01$Z <= quantile(plotsgrouped$PLT01$Z, .50)])
...
and so on.
The problem arises when I try to do something like:
Results$mean[i] <- mean(paste("plotsgrouped", groups[i],"Z", sep="$")). mean() doesn't recognize the paste as a reference to the vector plotsgrouped$PLT27$Z, and instead fails. I've deduced that it's because it sees the quotes and thinks, "Oh, you're just some text, I can't get the mean of you." or something to that effect.
Btw, groups is a list of the 19 plot names: PLT01-PLT27 (non-consecutive sometimes) and FTWR, so I can't simply put a sequence for the numeric part of the name.
Anyone have an easier way to iterate across my test plots and get arbitrary metrics?
I feel like I have all the right pieces, but just don't know how they go together to give me what I want.
Also, if anyone can come up with a better title for the question, feel free to post it or change it or whatever.
Try with:
for(i in seq_along(groups)) {
Results$Plot[i] <- groups[i] # character names of the groups
tempZ = plotsgrouped[[groups[i]]][["Z"]]
Results$Mean[i] <- mean(tempZ)
Results$Std.Dev.[i] <- sd(tempZ)
Results$Max[i] <- max(tempZ)
Results$`75%Avg.`[i] <- mean(tempZ[tempZ <= quantile(tempZ, .75)]) # backticks: the name starts with a digit
Results$`50%Avg.`[i] <- mean(tempZ[tempZ <= quantile(tempZ, .50)])
}
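The same idea can be written without filling Results element by element. The sketch below is one possible arrangement, not the only one: the lidar points are simulated, the helper metrics() is a hypothetical name, and the percentile columns are named Avg75 and Avg50 to avoid names that need backticks.

```r
# Made-up stand-in for the real lidar points; column names follow the question.
set.seed(42)
plotpts <- data.frame(
  AssocPlot = rep(c("PLT01", "PLT27", "FTWR"), each = 50),
  Z = runif(150, min = 0, max = 30)
)
plotsgrouped <- split(plotpts, plotpts$AssocPlot)

# Compute all metrics for one group's Z values; returns a one-row data frame.
metrics <- function(z) {
  data.frame(
    Mean     = mean(z),
    Std.Dev. = sd(z),
    Max      = max(z),
    Avg75    = mean(z[z <= quantile(z, 0.75)]),  # mean of values at or below the 75th percentile
    Avg50    = mean(z[z <= quantile(z, 0.50)])   # mean of values at or below the median
  )
}

rows <- lapply(plotsgrouped, function(d) metrics(d$Z))
Results <- cbind(Plot = names(rows), do.call(rbind, rows))
rownames(Results) <- NULL
```

Adding another metric then only means adding one line to metrics(), and the group names never have to be pasted into an expression.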