subset dataframe and plot all the subsets with a loop [R]

I'm working with a data frame with 8 useful variables. The idea of the code is to plot 4 of them: 3 on the y axis against a common x axis.
It has about 6500 rows.
I want to subset the data frame by the file column, and then plot LogP on the x axis and Temperature, RH and Ozone on the y axis.
I tried using subset() inside the plot() call, but that did not go well. I used the code below to plot one of the original files, but I have no idea how to include the subset.
plot(DataOzono$LogP, DataOzono$Temperature, axes= F,type="l",col="red", ylab = NULL, xlab = 'LogP',xaxt="n",yaxt="n" )
axis(2,ylim(c(min(DataOzono$Temperature),max(DataOzono$Temperature)), layout.widths(2)))
mtext(text = 'T',line = 2,side = 2)
par(new=TRUE)
plot(DataOzono$LogP, DataOzono$RH,type="l",col="blue",xaxt="n",yaxt="n",xlab="",ylab="")
axis(4)
mtext("RH",side=4,line=2)
par(new=TRUE)
plot(DataOzono$LogP, DataOzono$Ozone,type="l",col="green",xaxt="n",yaxt="n",xlab="",ylab="")
mtext("O3",side=5,line=3)
axis(2, line = 4)
Any advice will be very helpful.

Here's how to plot the charts in a loop. The example you gave contains only one file number, but the loop creates a chart for every distinct value in the file column. On Windows, you can use savePlot() to save each chart to your drive. I simplified your example because I was getting errors.
DataOzono <- read.table(text="pressure height Temperature RH Ozone file LogP
753.6 2541 16.8 76 0 80131 0.3475673
748.0 2604 17.7 32 0 80131 0.347959
743.5 2656 15.9 38 0 80131 0.3482766
739.8 2697 15.4 39 0 80131 0.3485396
736.6 2734 15.0 41 0 80131 0.3487685
731.8 2790 14.5 42 0 80131 0.3491142", header=TRUE, stringsAsFactors=FALSE)
original_par <- par(no.readonly = TRUE) # save only the parameters that can be restored
par(mar=c(5.1, 8.1, 4.1, 3.1))
for (i in unique(DataOzono$file)){
  DataOzono_subset <- DataOzono[DataOzono$file==i,] # keep only rows for that file number
  plot(DataOzono_subset$LogP, DataOzono_subset$Temperature, axes=FALSE, type="l", col="red", ylab="", xlab="LogP", xaxt="n", yaxt="n")
  axis(2, col="red", col.axis="red")
  mtext(text="T", line=2, side=2, col="red")
  par(new=TRUE) # draw the next plot on top of the current one
  plot(DataOzono_subset$LogP, DataOzono_subset$RH, type="l", col="blue", xaxt="n", yaxt="n", xlab="", ylab="")
  axis(4, col="blue", col.axis="blue")
  mtext("RH", side=4, line=2, col="blue")
  par(new=TRUE)
  plot(DataOzono_subset$LogP, DataOzono_subset$Ozone, type="l", col="darkgreen", xaxt="n", yaxt="n", xlab="", ylab="")
  mtext("O3", side=2, line=6, col="darkgreen")
  axis(2, line=4, col="darkgreen", col.axis="darkgreen")
  savePlot(filename=paste0("c:/temp/",i,".png"), type="png") # one PNG per file number
}
par(original_par) # restore par to its initial values
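savePlot() is tied to particular screen devices, so a more portable variant of the same loop can open a file device for each subset and close it with dev.off(). The following is only a sketch under assumptions that are not in the original answer: the two-file toy data frame, the output directory (tempdir()) and the helper name plot_one_file are made up for illustration.

```r
# Toy data with two file ids (hypothetical values, for illustration only)
DataOzono <- data.frame(
  file        = rep(c(80131, 80132), each = 4),
  LogP        = rep(c(0.3476, 0.3480, 0.3483, 0.3485), times = 2),
  Temperature = c(16.8, 17.7, 15.9, 15.4, 15.0, 14.5, 14.1, 13.8),
  RH          = c(76, 32, 38, 39, 41, 42, 45, 47),
  Ozone       = c(0, 0, 1, 1, 2, 2, 3, 3)
)

out_dir <- tempdir()  # assumption: change to wherever the files should go

# Draw the three overlaid curves for one subset and return the file name
plot_one_file <- function(d, id) {
  out_file <- file.path(out_dir, paste0("file_", id, ".pdf"))
  pdf(out_file, width = 8, height = 6)
  on.exit(dev.off())                      # always close the device
  par(mar = c(5.1, 8.1, 4.1, 4.1))        # set margins on the new device
  plot(d$LogP, d$Temperature, type = "l", col = "red",
       xlab = "LogP", ylab = "", yaxt = "n")
  axis(2, col = "red", col.axis = "red")
  mtext("T", side = 2, line = 2, col = "red")
  par(new = TRUE)
  plot(d$LogP, d$RH, type = "l", col = "blue",
       axes = FALSE, xlab = "", ylab = "")
  axis(4, col = "blue", col.axis = "blue")
  mtext("RH", side = 4, line = 2, col = "blue")
  par(new = TRUE)
  plot(d$LogP, d$Ozone, type = "l", col = "darkgreen",
       axes = FALSE, xlab = "", ylab = "")
  axis(2, line = 4, col = "darkgreen", col.axis = "darkgreen")
  mtext("O3", side = 2, line = 6, col = "darkgreen")
  out_file
}

# split() gives one data frame per file id; Map() plots each of them
subsets <- split(DataOzono, DataOzono$file)
plot_files <- unlist(Map(plot_one_file, subsets, names(subsets)))
```

Swapping pdf() for png() (with width and height in pixels) should give bitmap output in the same way, provided the R build supports PNG.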

Related

How to create multiple plots (plot means) on the same graph?

TL;DR: Trying to create multiple plots in one graph (image attached), using loop function. Currently manually creating codes for each boxplot, then using par() function to plot them together. It works, but looking for a less repetitive way.
I was wondering if it's possible to create multiple plots; specifically to plot "plot means". You can find the exact output in image form here (the second example on plot means): How to create multiple ggboxplots on the same graph using the loop function?
My data looks something like this:
# A tibble: 62 x 4
offer payoff partner_transfer round_type
<dbl> <dbl> <dbl> <chr>
1 40 126 66 actual
2 100 273 273 actual
3 0 100 0 actual
4 100 6 6 actual
5 25 99 24 actual
6 80 29 9 practice
7 100 45 45 practice
8 0 100 0 practice
9 25 99 24 practice
10 100 183 183 practice
# ... with 52 more rows
I'm trying to get it to look like this:
[image: sample plot means]
Currently, my code to get this output is:
par(mfrow = c(2,2))
plot_offer <- plotmeans( offer ~ round_type, data = tg_proposer_split,
xlab = "Round Type", ylab = "Offer (by A)",
main="Mean Plot with 95% CI")
plot_partner_transfer <- plotmeans( partner_transfer ~ round_type, data = tg_proposer_split,
xlab = "Round Type", ylab = "Amount Transferred by Partner (Bot)",
main="Mean Plot with 95% CI")
plot_payoff <- plotmeans( payoff ~ round_type, data = tg_proposer_split,
xlab = "Round Type", ylab = "Payoff (for A)",
main="Mean Plot with 95% CI")
Is there a way I can shorten this code?
Biggest apologies, for some reason I'm unable to attach images because I haven't collated enough reputation points so I have no choice but to try it this way. Hope it is still clear.
Many thanks!

Here is a way to simplify the code with Map.
Define a general purpose function to take care of the plot, fun_plot;
Get the column names of the y axis variables;
Create a vector of y axis labels;
Plot in a Map loop.
The code becomes
library(gplots) # provides plotmeans()
fun_plot <- function(ycol, ylab){
  fmla <- paste(ycol, "round_type", sep = "~")
  fmla <- as.formula(fmla)
  plotmeans(fmla, data = tg_proposer_split,
            xlab = "Round Type", ylab = ylab,
            main = "Mean Plot with 95% CI")
}
y_cols <- setdiff(names(tg_proposer_split), "round_type")
# labels are named by column and listed in column order, so each plot gets its own label
y_lab <- c(offer = "Offer (by A)", payoff = "Payoff (for A)",
           partner_transfer = "Amount Transferred by Partner (Bot)")
old_par <- par(mfrow = c(2,2))
Map(fun_plot, y_cols, y_lab[y_cols])
par(old_par)
Edit.
Following the error reported in comment, here is a more general function, allowing for xcol and the data set to take any values, not just "round_type" and tg_proposer_split, respectively. This solution now uses mapply, not Map, in order for those two arguments to be passed in a MoreArgs list.
fun_plot2 <- function(ycol, ylab, xcol, data){
fmla <- paste(ycol, xcol, sep = "~")
fmla <- as.formula(fmla)
plotmeans(fmla, data = data,
xlab = "Round Type", ylab = ylab,
main = "Mean Plot with 95% CI")
}
old_par <- par(mfrow = c(2,2))
mapply(fun_plot2, y_cols, y_lab,
MoreArgs = list(
xcol = "round_type",
data = tg_proposer_split
)
)
par(old_par)
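As a side note, the paste() plus as.formula() step can also be written with reformulate() from the stats package, which builds a formula directly from column names. A small sketch using the column names from the question (the object names fmla_paste, fmla_reform and fmla_list are illustrative only):

```r
# Two ways to build the formula offer ~ round_type from strings
fmla_paste  <- as.formula(paste("offer", "round_type", sep = "~"))
fmla_reform <- reformulate("round_type", response = "offer")

# One formula per y column, ready to be passed to a plotting function
y_cols <- c("offer", "payoff", "partner_transfer")
fmla_list <- lapply(y_cols, function(y) reformulate("round_type", response = y))

print(fmla_reform)
```

Either form can be passed to plotmeans(); reformulate() mainly avoids assembling the formula text by hand.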
Data
tg_proposer_split <- read.table(text = "
offer payoff partner_transfer round_type
1 40 126 66 actual
2 100 273 273 actual
3 0 100 0 actual
4 100 6 6 actual
5 25 99 24 actual
6 80 29 9 practice
7 100 45 45 practice
8 0 100 0 practice
9 25 99 24 practice
10 100 183 183 practice
", header = TRUE)

Generate multiple plots in base R with loop function then concatenate by matching group variables

I have a data frame (below, my apologies for the verbose code, this is my first attempt at generating reproducible random data) that I'd like to loop through and generate individual plots in base R (specifically, ethograms) for each subject's day and video clip (e.g. subj-1/day1/clipB). After generating n graphs, I'd like to concatenate a PDF for each subj that includes all days + clips, and have each row correspond to a single day. I haven't been able to get past the generating individual graphs, however, so any help would be greatly appreciated!
Data frame
n <- 20000
library(stringi)
test <- as.data.frame(sprintf("%s", stri_rand_strings(n, 2, '[A-Z]')))
colnames(test)<-c("Subj")
test$Day <- sample(1:3, size=length(test$Subj), replace=TRUE)
test$Time <- sample(0:600, size=length(test$Subj), replace=TRUE)
test$Behavior <- as.factor(sample(c("peck", "eat", "drink", "fly", "sleep"), size = length(test$Time), replace=TRUE))
test$Vid_Clip <- sample(c("Clip_A", "Clip_B", "Clip_C"), size = length(test$Time), replace=TRUE)
Sample data from data frame:
> head(test)
Subj Day Time Behavior Vid_Clip
1 BX 1 257 drink Clip_B
2 NP 2 206 sleep Clip_B
3 ZF 1 278 peck Clip_B
4 MF 2 391 sleep Clip_A
5 VE 1 253 fly Clip_C
6 ID 2 359 eat Clip_C
After adapting this code, I am able to successfully generate a single plot (one at a time):
Subset single subj/day/clip:
single_subj_day_clip <- test[test$Vid_Clip == "Clip_B" & test$Subj == "AA" & test$Day == 1,]
After which, I can generate the graph I'm after by running the following lines:
beh_numb <- nlevels(single_subj_day_clip$Behavior)
mar.default <- c(5,4,4,2) + 0.1
par(mar = mar.default + c(0, 4, 0, 0))
plot(single_subj_day_clip$Time,
xlim=c(0,max(single_subj_day_clip$Time)), ylim=c(0, beh_numb), type="n",
ann=F, yaxt="n", frame.plot=F)
for (i in 1:length(single_subj_day_clip$Behavior)) {
ytop <- as.numeric(single_subj_day_clip$Behavior[i])
ybottom <- ytop - 0.5
rect(xleft=single_subj_day_clip$Time[i], xright=single_subj_day_clip$Time[i+1],
ybottom=ybottom, ytop=ytop, col = ybottom)}
axis(side=2, at = (1:beh_numb -0.25), labels=levels(single_subj_day_clip$Behavior), las = 1)
mtext(text="Time (sec)", side=1, line=3, las=1)
Example graph from randomly generated data (sorry for the link; I'm a new SO user, so until I reach 10 reputation points I can't embed an image directly)
Example graph from actual data
Ideal per subject graph
Thank you all in advance for your input.
Cheers,
Dan

New and hopefully correct answer
The code is too long to post here, so there is a link to the Dropbox folder with the data and code. You can check this html document or run this .Rmd file on your machine. Please check that all required packages are installed. The output of the script is there as well.
There is an additional problem in the analysis: some events are registered only once, at a single time point between other events, so such bars have no "width". I assigned these events a width of 1000 ms, so some of them (around 100 per 20000 observations) fall outside the scale if they occur at the beginning or at the end of the experiment (and if the width for such events is equal to zero). You can play with the code to fix this behavior.
Another problem is that the same factor levels get different colors on different plots. I need some fresh air before I fix that as well.
Looking at the graphs, you may notice that some observations with a very short duration seem to overlap other observations. If you zoom the pdf to the maximum, you will see that they do not, and that there are 'holes' in the underlying intervals where they are supposed to be.
The lines connecting the intervals for different kinds of behavior help to follow the time course of the experiment. You can uncomment the corresponding parts of the code if you wish.
Please let me know if it works.
Old answer
I am not sure this is the best way to do it, but you can probably use split() and then lapply() over the resulting tables:
Split your data.frame by Subj, Day, and Vid_Clip:
testl <- split(test, test[, c("Subj", "Day", "Vid_Clip")], drop = TRUE)
testl[[1123]]
# Subj Day Time Behavior Vid_Clip
#8220 ST 2 303 fly Clip_A
#9466 ST 2 463 fly Clip_A
#9604 ST 2 32 peck Clip_A
#10659 ST 2 136 peck Clip_A
#13126 ST 2 47 fly Clip_A
#14458 ST 2 544 peck Clip_A
Loop through the list and plot each piece to its own page of a .pdf:
nbeh <- nlevels(test$Behavior)
mar.default <- c(5,4,4,2) + 0.1
pdf("plots.pdf")
par(mar = mar.default + c(0, 4, 0, 0)) # set the margins after the pdf device is open
invisible(
  lapply(names(testl), function(nm){
    l <- testl[[nm]]
    l <- l[order(l$Time), ] # draw the events in time order
    plot(x = l$Time, xlim = c(0, max(l$Time)), ylim = c(0, nbeh),
         type = "n", ann = FALSE, yaxt = "n", frame.plot = FALSE)
    # one rectangle per event, from its time to the time of the next event
    lapply(seq_len(nrow(l) - 1), function(i){
      ytop <- as.numeric(l$Behavior[i]); ybot <- ytop - .5
      rect(l$Time[i], ybot, l$Time[i + 1], ytop, col = ytop)
    })
    axis(side = 2, at = 1:nbeh - .25, labels = levels(l$Behavior), las = 1)
    mtext(text = "Time (sec)", side = 1, line = 3, las = 1)
    title(main = nm) # Subj.Day.Vid_Clip label of this page
  })
)
dev.off()
You should probably check the output here before you run the code on your PC. I didn't change your plotting code much, so please double-check it.
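The question also asks for one PDF per subject, with each row of the page corresponding to a single day. A rough sketch of that layout follows. It is not part of the answer above and rests on several assumptions: the toy data (two subjects only), the choice of one column per video clip, the fixed 0 to 600 second x axis, the output directory (tempdir()) and the helper name draw_ethogram are all illustrative.

```r
set.seed(1)  # toy data in the same shape as `test` from the question
n <- 300
test <- data.frame(
  Subj     = sample(c("AA", "BB"), n, replace = TRUE),
  Day      = sample(1:3, n, replace = TRUE),
  Time     = sample(0:600, n, replace = TRUE),
  Behavior = factor(sample(c("peck", "eat", "drink", "fly", "sleep"),
                           n, replace = TRUE)),
  Vid_Clip = sample(c("Clip_A", "Clip_B", "Clip_C"), n, replace = TRUE)
)
beh   <- levels(test$Behavior)
days  <- sort(unique(test$Day))
clips <- sort(unique(test$Vid_Clip))

# Draw one ethogram: each behavior lasts until the next recorded time
draw_ethogram <- function(d, title) {
  d <- d[order(d$Time), ]
  plot(0, 0, type = "n", xlim = c(0, 600), ylim = c(0, length(beh)),
       yaxt = "n", xlab = "Time (sec)", ylab = "", main = title,
       frame.plot = FALSE)
  axis(2, at = seq_along(beh) - 0.25, labels = beh, las = 1)
  for (i in seq_len(max(nrow(d) - 1, 0))) {
    k <- as.integer(d$Behavior[i])        # same color for the same behavior
    rect(d$Time[i], k - 0.5, d$Time[i + 1], k, col = k, border = NA)
  }
}

out_dir <- tempdir()  # assumption: where the per-subject PDFs are written
pdf_files <- character(0)
for (s in sort(unique(test$Subj))) {
  f <- file.path(out_dir, paste0("ethogram_", s, ".pdf"))
  pdf(f, width = 4 * length(clips), height = 3 * length(days))
  # one row per day, one column per clip; mfrow fills the grid row by row
  par(mfrow = c(length(days), length(clips)), mar = c(4, 6, 3, 1))
  for (dy in days) for (cl in clips) {
    d <- test[test$Subj == s & test$Day == dy & test$Vid_Clip == cl, ]
    draw_ethogram(d, paste(s, "day", dy, cl))
  }
  dev.off()
  pdf_files <- c(pdf_files, f)
}
```

Using the integer factor code as the fill color keeps each behavior the same color in every panel, which is one way to address the color mismatch mentioned above.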

join axes in barplot

I would like to eliminate the gap between the x and y axes in barplot and extend the predicted line back to intersect the y axis, preferably in base R. Is this possible? Thank you for any advice or suggestions.
my.data <- read.table(text = '
band mid.point count
1 0.5 74
2 1.5 73
3 2.5 79
4 3.5 70
5 4.5 78
6 5.5 63
7 6.5 59
8 7.5 60
', header = TRUE)
my.data
x <- my.data$mid.point^2
my.model <- lm(count ~ x, data = my.data)
my.plot <- barplot(my.data$count, ylim=c(0,100), space=0, col=NA)
axis(1, at=my.plot+0.5, labels=my.data$band)
lines(predict(my.model, data.frame(x=x), type="resp"), col="black", lwd = 1.5)

EDIT November 26, 2014
I just realized the two plots are not the same (the plot in the original post and the plot in my answer below). Compare the two curved lines closely, particularly at the right-side of the plot. Clearly the two curved lines intersect the top of the 8th bar in different locations. However, I have not yet had time to figure out why the plots differ.
Here is one way to extrapolate the predicted line back to the y axis. I incorporate rawr's suggestion regarding eliminating the gap between the y axis and the x axis.
setwd('c:/users/markm/simple R programs/')
jpeg(filename = "barplot_and_line.jpeg")
my.data <- read.table(text = '
band mid.point count
1 0.5 74
2 1.5 73
3 2.5 79
4 3.5 70
5 4.5 78
6 5.5 63
7 6.5 59
8 7.5 60
', header = TRUE)
x <- my.data$mid.point^2
my.model <- lm(count ~ x, data = my.data)
z <- seq(0,8,0.01)
y <- my.model$coef[1] + my.model$coef[2] * z^2
barplot(my.data$count, ylim=c(0,100), space=0, col=NA, xaxs = 'i')
points(z, y, type='l', col=1)
dev.off()
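One possible explanation for the difference noted in the edit above, offered as a reading of the two listings rather than something the answer states: lines() called with a single vector plots the fitted values at x = 1, 2, ..., 8, whereas with space = 0 the bars are centred at 0.5, 1.5, ..., 7.5, which is also the scale used by z in the answer. The curve in the question would therefore sit half a bar to the right. A quick check, using the midpoints that barplot() returns:

```r
my.data <- data.frame(
  band      = 1:8,
  mid.point = seq(0.5, 7.5, by = 1),
  count     = c(74, 73, 79, 70, 78, 63, 59, 60)
)
x <- my.data$mid.point^2
my.model <- lm(count ~ x, data = my.data)

pdf(NULL)  # null device so the check also runs without a display
bar.mids <- as.numeric(barplot(my.data$count, ylim = c(0, 100), space = 0,
                               col = NA, xaxs = "i"))
# Plot the fitted values at the bar midpoints instead of at 1:8
lines(bar.mids, fitted(my.model), lwd = 1.5)
dev.off()

print(bar.mids)             # bar centres returned by barplot()
print(seq_along(bar.mids))  # x positions used by lines(y) on its own
```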

I would like to read data from an output file and create stacked bar graphs?

This is the code I have used. Because I am new to R, I adapted something I found online to create the graphs, but I hard-coded most of it, and now I want it to read the information from the file and create the graphs. The code I have now is:
ylabels <- c( "sampleXYZ",
"sampleXY",
"sampleG",
"sampleF",
"sampleE",
"sampleD",
"sampleC",
"sampleB",
"sampleA"
)
tablex = read.table(file="testGraphData.txt", header=FALSE, sep="\t")
dataframe <- as.data.frame.matrix(tablex)
test <- matrix(c(dataframe[1,2],dataframe[2,2],dataframe[3,2],dataframe[4,2],dataframe[5,2],dataframe[6,2],dataframe[7,2],dataframe[8,2],dataframe[9,2],5,10,20,25,30,35,40,45,50),
nrow =2 ,
ncol=9,
byrow=TRUE,
dimnames = list(c("Calculated", "Normal"),
ylabels))
par(mar=c(5.1, max(4.1,max(nchar(ylabels))/1.8) ,4.1 ,2.1))
plot_colors <- c("#458B74","#8B0000")
barplot(test,
col = plot_colors,
las=2,
beside = TRUE,
legend = T,
args.legend = list(x="topright", cex=.75),
xlim = c(0,110),
horiz=T)
abline (v=seq(0,80, 20))
This produces the graph I want, but I want to create it directly from the output file, which is tab-delimited and looks like this (it has NO headers):
sampleA 7
sampleB 0
sampleC 53
sampleD 0
sampleE 28
sampleF 0
sampleG 0
sampleXY 0
sampleXYZ 12

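Something like this might work:
dat <- read.table(file="testGraphData.txt", header=FALSE, sep="\t")
sample_names <- dat[,1] # first column: sample names
values <- dat[,2] # second column: values to plot
barplot(values, names.arg=sample_names)
or, more compactly:
dat <- read.table(file="testGraphData.txt", header=FALSE, sep="\t")
barplot(dat[,2], names.arg=dat[,1])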
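To get closer to the grouped horizontal chart in the question, the values read from the file can be combined with the hard-coded "Normal" row into a two-row matrix. This is a sketch only: it reads the sample rows from an inline string instead of testGraphData.txt, and the names sample_text, dat, normal and bars are illustrative.

```r
# Stand-in for testGraphData.txt: two tab-separated columns, no header
sample_text <- paste(
  "sampleA\t7", "sampleB\t0", "sampleC\t53", "sampleD\t0", "sampleE\t28",
  "sampleF\t0", "sampleG\t0", "sampleXY\t0", "sampleXYZ\t12",
  sep = "\n"
)
dat <- read.table(text = sample_text, header = FALSE, sep = "\t",
                  col.names = c("sample", "calculated"),
                  stringsAsFactors = FALSE)
# With the real file, replace `text = sample_text` by `file = "testGraphData.txt"`

normal <- c(5, 10, 20, 25, 30, 35, 40, 45, 50)  # "Normal" row from the question

# One row per series, one column per sample, labelled from the file itself
bars <- rbind(Calculated = dat$calculated, Normal = normal)
colnames(bars) <- dat$sample

pdf(NULL)  # null device; drop this line and dev.off() for interactive use
par(mar = c(5.1, max(4.1, max(nchar(dat$sample)) / 1.8), 4.1, 2.1))
barplot(bars, beside = TRUE, horiz = TRUE, las = 2,
        col = c("#458B74", "#8B0000"), xlim = c(0, 110),
        legend.text = TRUE, args.legend = list(x = "topright", cex = 0.75))
abline(v = seq(0, 80, 20))
dev.off()
```

Because the column names come from the file, the labels stay attached to their values even if the order of the rows in the file changes.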

Multiple data points in one R ggplot2 plot

I have two sets of data points that both relate to the same primary axis but differ in their secondary axis. Is there some way to plot them on top of each other in R using ggplot2?
What I am looking for is basically something that looks like this:
4+ |
| x . + 220
3+ . . |
| x |
2+ . + 210
| x |
1+ . x x |
| + 200
0+-+-+-+-+-+-+
time
. temperature
x car sale
(This is just an example of possible data)

Shane's answer, "you can't in ggplot2," is correct, if incomplete. Arguably, it's not something you want to do. How do you decide how to scale the Y axis? Do you want the means of the lines to be the same? The range? There's no principled way of doing it, and it's too easy to make the results look like anything you want them to look like. Instead, what you might want to do, especially in a time-series like that, is to norm the two lines of data so that at a particular value of t, often min(t), Y1 = Y2 = 100. Here's an example I pulled off of the Bonddad Blog (not using ggplot2, which is why it's ugly!) But you can cleanly tell the relative increase and decrease of the two lines, which have completely different underlying scales.
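The normalisation described above can be sketched in a few lines of base R. The numbers are made up for illustration, and the helper name index_to_100 is hypothetical: each series is divided by its value at the earliest time point and multiplied by 100, so both start at 100 and can share one y axis.

```r
# Made-up series on very different scales (illustrative values only)
d <- data.frame(
  time        = 1:6,
  temperature = c(1.0, 2.0, 3.0, 2.5, 3.5, 4.0),
  car_sales   = c(200, 204, 210, 208, 215, 220)
)

# Rescale a series so that its value at the earliest time equals 100
index_to_100 <- function(y, t) 100 * y / y[which.min(t)]

d$temperature_idx <- index_to_100(d$temperature, d$time)
d$car_sales_idx   <- index_to_100(d$car_sales, d$time)

pdf(NULL)  # null device so the sketch runs without a display
matplot(d$time, as.matrix(d[, c("temperature_idx", "car_sales_idx")]),
        type = "l", lty = 1, col = c("red", "blue"),
        xlab = "time", ylab = "index (first observation = 100)")
legend("topleft", legend = c("temperature", "car sales"),
       col = c("red", "blue"), lty = 1)
dev.off()
```

The plot then shows relative change only; the original units are lost, which is the trade-off this approach makes.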

I'm not an expert on this, but it's my understanding that this is possible with lattice, but not with ggplot2. See this leanr blog post for an example of a secondary axis plot. Also see Hadley's response to this question.
Here's an example of how to do it in lattice (from Gabor Grothendieck):
library(lattice)
library(grid) # needed for grid.text
# data
Lines.raw <- "Date Fo Co
6/27/2007 57.1 13.9
6/28/2007 57.7 14.3
6/29/2007 57.8 14.3
6/30/2007 57 13.9
7/1/2007 57.1 13.9
7/2/2007 57.2 14.0
7/3/2007 57.3 14.1
7/4/2007 57.6 14.2
7/5/2007 58 14.4
7/6/2007 58.1 14.5
7/7/2007 58.2 14.6
7/8/2007 58.4 14.7
7/9/2007 58.7 14.8
"
# in reality next stmt would be DF <- read.table("myfile.dat", header = TRUE)
DF <- read.table(textConnection(Lines.raw), header = TRUE)
DF$Date <- as.Date(DF$Date, "%m/%d/%Y")
par.settings <- list(
layout.widths = list(left.padding = 10, right.padding = 10),
layout.heights = list(bottom.padding = 10, top.padding = 10)
)
xyplot(Co ~ Date, DF, default.scales = list(y = list(relation = "free")),
ylab = "C", par.settings = par.settings)
trellis.focus("panel", 1, 1, clip.off = TRUE)
pr <- pretty(DF$Fo)
at <- 5/9 * (pr - 32)
panel.axis("right", at = at, lab = pr, outside = TRUE)
grid.text("F", x = 1.1, rot = 90) # right y axis label
trellis.unfocus()
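The same idea, a second axis that is a fixed transformation of the first, can also be sketched in base graphics. This is not part of the quoted lattice answer; it reuses the first four rows of the data above and the same Fahrenheit-to-Celsius conversion to place the right-hand tick marks.

```r
# First four rows of the data from the lattice example
DF <- data.frame(
  Date = as.Date(c("6/27/2007", "6/28/2007", "6/29/2007", "6/30/2007"),
                 "%m/%d/%Y"),
  Fo   = c(57.1, 57.7, 57.8, 57.0),
  Co   = c(13.9, 14.3, 14.3, 13.9)
)

pdf(NULL)  # null device so the sketch runs without a display
old_par <- par(mar = c(5.1, 4.1, 4.1, 4.1))  # leave room for the right axis
plot(Co ~ Date, data = DF, type = "b", ylab = "C")
pr <- pretty(DF$Fo)                         # round Fahrenheit tick values
axis(4, at = 5/9 * (pr - 32), labels = pr)  # placed at their Celsius positions
mtext("F", side = 4, line = 3)
par(old_par)
dev.off()
```

This only works because Fahrenheit is a one-to-one transformation of Celsius; it does not make two unrelated series comparable.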
