TL;DR: I'm trying to create multiple plots in one figure using a loop. Currently I write the code for each plot by hand and combine them with par(); this works, but I'm looking for a less repetitive way.
I was wondering whether it's possible to create multiple plots in a loop; specifically, "plot means" plots. The exact output I'm after is shown (the second example, on plot means) in this question: How to create multiple ggboxplots on the same graph using the loop function?
My data looks something like this:
# A tibble: 62 x 4
offer payoff partner_transfer round_type
<dbl> <dbl> <dbl> <chr>
1 40 126 66 actual
2 100 273 273 actual
3 0 100 0 actual
4 100 6 6 actual
5 25 99 24 actual
6 80 29 9 practice
7 100 45 45 practice
8 0 100 0 practice
9 25 99 24 practice
10 100 183 183 practice
# ... with 52 more rows
I'm trying to get it to look like this:
(image: sample plot means)
Currently, my code to get this output is:
library(gplots)  # plotmeans() comes from the gplots package

par(mfrow = c(2, 2))
plot_offer <- plotmeans(offer ~ round_type, data = tg_proposer_split,
                        xlab = "Round Type", ylab = "Offer (by A)",
                        main = "Mean Plot with 95% CI")
plot_partner_transfer <- plotmeans(partner_transfer ~ round_type, data = tg_proposer_split,
                                   xlab = "Round Type", ylab = "Amount Transferred by Partner (Bot)",
                                   main = "Mean Plot with 95% CI")
plot_payoff <- plotmeans(payoff ~ round_type, data = tg_proposer_split,
                         xlab = "Round Type", ylab = "Payoff (for A)",
                         main = "Mean Plot with 95% CI")
Is there a way I can shorten this code?
Apologies: for some reason I'm unable to attach images because I haven't accumulated enough reputation points, so I have no choice but to describe it this way. I hope it is still clear.
Many thanks!
Here is a way to simplify the code with Map.
Define a general-purpose function, fun_plot, to take care of the plot;
Get the column names of the y axis variables;
Create a vector of y axis labels;
Plot in a Map loop.
The code becomes
fun_plot <- function(ycol, ylab){
  # build the formula "<ycol> ~ round_type" from the column name
  fmla <- paste(ycol, "round_type", sep = "~")
  fmla <- as.formula(fmla)
  plotmeans(fmla, data = tg_proposer_split,
            xlab = "Round Type", ylab = ylab,
            main = "Mean Plot with 95% CI")
}
y_cols <- names(tg_proposer_split)[which(names(tg_proposer_split) != "round_type")]
# labels in the same order as y_cols (offer, payoff, partner_transfer)
y_lab <- c("Offer (by A)", "Payoff (for A)", "Amount Transferred by Partner (Bot)")
old_par <- par(mfrow = c(2,2))
Map(fun_plot, y_cols, y_lab)
par(old_par)
Edit.
Following the error reported in a comment, here is a more general function, allowing xcol and the data set to take any values, not just "round_type" and tg_proposer_split, respectively. This solution now uses mapply rather than Map, so that those two arguments can be passed in a MoreArgs list.
fun_plot2 <- function(ycol, ylab, xcol, data){
  fmla <- paste(ycol, xcol, sep = "~")
  fmla <- as.formula(fmla)
  plotmeans(fmla, data = data,
            xlab = "Round Type", ylab = ylab,
            main = "Mean Plot with 95% CI")
}
old_par <- par(mfrow = c(2,2))
mapply(fun_plot2, y_cols, y_lab,
       MoreArgs = list(
         xcol = "round_type",
         data = tg_proposer_split
       ))
par(old_par)
Data
tg_proposer_split <- read.table(text = "
offer payoff partner_transfer round_type
1 40 126 66 actual
2 100 273 273 actual
3 0 100 0 actual
4 100 6 6 actual
5 25 99 24 actual
6 80 29 9 practice
7 100 45 45 practice
8 0 100 0 practice
9 25 99 24 practice
10 100 183 183 practice
", header = TRUE)
Related
I am visiting a bird sanctuary that has many different species of birds; some species are more numerous than others. I have come back to the sanctuary 9 times, and after every visit I calculate the cumulative number of species I have observed so far. Unsurprisingly, there are diminishing returns across visits, since I see the most numerous species on every visit, which does not increase the count of observed species. What is the best function in R to predict how many species I will have observed by my 20th visit?
Here is the data.frame
d <- structure(list(visit = 1:9,
totalNumSpeciesObserved = c(200.903, 296.329, 370.018, 431.59, 485.14, 533.233, 576.595, 616.536, 654)),
class = "data.frame", row.names = c(NA, 9L))
I expect to see a model that fits the data well and behaves in a "log-like" fashion, predicting diminishing returns.
To help with asking a good question, Stack Overflow has some useful guidance: https://stackoverflow.com/help/how-to-ask
If you're trying to model this, based on the data I might take the approach of a regression on the square root of the independent variable. It is a bit strange to think about it as a function of visits, though; it would make more sense if the visits were evenly spaced time periods.
d <- structure(list(visit = 1:9,
totalNumSpeciesObserved = c(200.903, 296.329, 370.018, 431.59, 485.14, 533.233, 576.595, 616.536, 654)),
class = "data.frame", row.names = c(NA, 9L))
mod <- lm(totalNumSpeciesObserved ~ I(sqrt(visit)), d)  # regress the running total on sqrt(visit)
new.df <- data.frame(visit = 1:13)
out <- predict(mod, newdata = new.df)                   # predicted totals for visits 1 to 13
plot(d, type = 'o', pch = 16, xlim = c(1, 13), ylim = c(200, 800), lwd = 2, cex = 2)
points(out, type = 'o', pch = 21, col = "blue", cex = 2)
The I() wrapper allows you to transform the independent variable on the fly, hence the use of sqrt() without needing to save a new variable.
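Since the question asks specifically about the 20th visit, the same fitted model can be pushed further out. A minimal sketch (my addition), reusing the mod object from above:
# Sketch: predicted cumulative species count at the 20th visit,
# using the sqrt-of-visit model fitted above.
predict(mod, newdata = data.frame(visit = 20))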
I also don't know if this helps, but you could build a simulator to test for asymptotic behaviour. For example, you could build a population:
population <- sample(size = 1e6, LETTERS[1:20],
                     replace = TRUE, prob = 1/(2:21)^2)
This would say there are 20 species and decreasing probability in your population (expand as you wish).
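As a side note (my addition, not part of the original answer): sample() rescales prob to sum to 1, so you can inspect the implied relative frequencies of the 20 species directly:
# implied relative frequencies of the 20 species used above
round(prop.table(1/(2:21)^2), 3)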
Then you could simulate visits and the information about each visit, for example how large the sample on a visit is (say, during a visit you only see 1% of the area, etc.):
sim_visits <- function(visits, percent_obs, population){
  species_viewed <- vector()
  unique_views <- vector()
  for(i in 1:visits){
    # sample a fixed fraction of the population on each visit
    my_samp <- sample(x = population, size = round(percent_obs * length(population), 0),
                      replace = FALSE)
    species_viewed <- c(species_viewed, my_samp)
    # running count of distinct species seen up to visit i
    unique_views[i] <- length(unique(species_viewed))
  }
  # species newly observed on each visit
  new_observed <- unique_views - dplyr::lag(unique_views, 1, 0)
  df <- data.frame(unique_views = unique_views, new_observed)
  df$cummulative <- cumsum(unique_views)
  df
}
And then you could draw from the simulation many times and see what distribution of values you get.
sim_visits(9, percent_obs = .001, population = population)
unique_views new_observed cummulative
1 13 13 13
2 15 2 28
3 15 0 43
4 17 2 60
5 17 0 77
6 17 0 94
7 17 0 111
8 17 0 128
9 17 0 145
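To actually draw from the simulation many times, as suggested above, one can wrap the call in replicate(). A small sketch (my addition):
# Sketch: repeat the 9-visit simulation 100 times and look at the
# distribution of the number of distinct species seen after visit 9.
n_after_9 <- replicate(100, sim_visits(9, percent_obs = .001, population = population)$unique_views[9])
summary(n_after_9)
hist(n_after_9)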
And I don't know if this is helpful, but I find simulation a good way to conceptualise problems like these.
I have the following data:
Treatment Dose Value
FeSo4 200 104.17
TQ1 6 98.17
TQ2 9 92
TQ3 12 86.67
TQ4 15 77.33
TQ5 18 71.33
TQ6 21 74.83
TQ7 24 82.17
How can I do a broken-line regression analysis of this data in R to get a graph like the one below?
The best way to fit linear models by segments in R is to use the CRAN package segmented.
In what follows, I create a new column that coerces column Treatment from class factor to its integer codes.
library(segmented)
df1$Num <- as.integer(df1$Treatment)
fit <- lm(Value ~ Num, df1)
summary(fit)
seg <- segmented(fit, seg.Z = ~Num, psi = 6)  # psi is a starting guess for the breakpoint
plot(Value ~ Num, df1) # plot the points
plot(seg, add = TRUE) # plot the broken line
abline(v = seg$psi[2]) # plot the vertical at the breakpoint
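If you also need the numbers behind this plot, the segmented package provides accessors for the breakpoint and the per-segment slopes. A small sketch (my addition), using the seg object fitted above:
summary(seg)  # breakpoint estimate and standard error, plus model coefficients
slope(seg)    # slopes of the two segments with confidence intervals
confint(seg)  # confidence interval for the breakpoint location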
Data.
df1 <- read.table(text = "
Treatment Dose Value
FeSo4 200 104.17
TQ1 6 98.17
TQ2 9 92
TQ3 12 86.67
TQ4 15 77.33
TQ5 18 71.33
TQ6 21 74.83
TQ7 24 82.17
", header = TRUE)
A different approach is to first find the threshold and then fit a regular lm() model:
library(SiZer)
df <- read.table(text = "
Treatment Dose Value
FeSo4 200 104.17
TQ1 6 98.17
TQ2 9 92
TQ3 12 86.67
TQ4 15 77.33
TQ5 18 71.33
TQ6 21 74.83
TQ7 24 82.17
", header = TRUE)
df$Num <- as.integer(df$Treatment)
thr.pwl <- piecewise.linear(df$Num, df$Value,
                            middle = 1, CI = FALSE,
                            bootstrap.samples = 1000, sig.level = 0.001)
thr.pwl
[1] "Threshold alpha: 6.30159931424453" #This is the threshold you need
[1] ""
[1] "Model coefficients: Beta[0], Beta[1], Beta[2]" #The estimates here are the same as in model.pwl, however, with lm() you can include also other independent variables
(Intercept) x w
111.48333 -6.63000 13.97001
model.pwl <- lm(Value ~ Num*(Num >= 6.30) + Num*(Num < 6.30),
data = df)
summary(model.pwl)
And you can plot it as:
plot(thr.pwl)
abline(v = thr.pwl$change.point)
However, with piecewise.linear() you can only use one threshold, while segmented() allows more than one.
# The Old Faithful geyser data
d <- density(faithful$eruptions, bw = "sj")
> head(d$x)
[1] 1.179869 1.188363 1.196857 1.205350 1.213844 1.222338
I'm using the density() function from {stats}, and I'm wondering if it's possible to see the density at specific values in the output. For example, I currently have density estimates at eruption values [1] 1.179869 1.188363 ..., but what if I want to know the density estimates at eruption values 1, 2, 5, 10, ...? Is there a way to extract these from the density object, d?
If I understand you correctly, you want the estimated density where the x value equals some number (3 or 4 as in my solution)?
d <- density(faithful$eruptions, bw = "sj")
densityDF <- data.frame(xVals = d$x, prob = d$y)
densityDF$xVals <- round(densityDF$xVals)  # round the grid so whole-number x values can be matched
densitySearch <- densityDF[densityDF$xVals %in% c(3,4),]
Result:
xVals prob
157 3 0.11229482
158 3 0.10721410
159 3 0.10230912
160 3 0.09765156
161 3 0.09318662
162 3 0.08891621
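If you need the density at exact, arbitrary points (e.g. 1, 2, 5, 10) rather than at rounded grid values, one alternative (a sketch of my own, not part of the original answer) is to interpolate the density grid with base R's approx():
# Sketch: linearly interpolate the kernel density estimate at exact x values.
# Points outside the range of d$x (here 1 and 10) come back as NA.
approx(d$x, d$y, xout = c(1, 2, 5, 10))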
I find coplot {graphics} very useful for my plots. However, I would like to include not just one line there, but to add another one. With base graphics I just need add = TRUE to add another line, or I can use plot(..) and lines(..). With {lattice} I can save my plots as objects
a<-xyplot(..)
b<-xyplot(..)
and display them simply with a + as.layer(b). Neither of these approaches works for coplot(), apparently because creating objects such as a <- coplot() doesn't produce a trellis graphic but a NULL object.
Please, is there any way to add a data line in coplot()? I really like its graphics, so I wish to keep it. Thank you!
My example data are here: http://ulozto.cz/xPfS1uRH/repr-exemple-csv
My code:
sub.tab <- read.csv("repr_exemple.csv", header = TRUE, sep = "")
attach(sub.tab)
cells.f <- factor(cells, levels = c(2, 25, 100, 250, 500),  # unique(cells.in.cluster)???
                  labels = c("size2", "size25", "size100", "size250", "size500"))
perc.f <- factor(perc, levels = c(5, 10),  # unique(cells.in.cluster)???
                 labels = c("perc5", "perc10"))
# how to put these plots together?
a <- coplot(max_dist ~ time | cells.f + perc.f, data = sub.tab,
            xlab = "ticks", type = "l", col = "black", lwd = 1)
b <- coplot(mean_dist ~ time | cells.f * perc.f, data = sub.tab,
            xlab = "ticks", type = "l", col = "grey", lwd = 1)
a + as.layer(b) # this doesn't work
Please, how do I merge these two plots (grey and black lines)? I couldn't figure it out... Thank you!
Linking to sample data isn't really that helpful. Here's a randomly created sample data set:
set.seed(15)
dd <- do.call("rbind",
              do.call("Map", c(list(function(a, b) {
                cbind.data.frame(a, b, x = 1:5,
                                 y1 = cumsum(rpois(5, 7)),
                                 y2 = cumsum(rpois(5, 9)))
              }),
              expand.grid(a = letters[1:5], b = letters[20:22])))
)
head(dd)
# a b x y1 y2
# 1 a t 1 8 16
# 2 a t 2 13 28
# 3 a t 3 25 35
# 4 a t 4 33 45
# 5 a t 5 39 57
# 6 b t 1 4 12
I will note that coplot is a base graphics function, not lattice. But it does have a panel= parameter, and you can have coplot() take care of subsetting your data for you (well, calculating the indexes at least). Still, as with other base graphics functions, plotting different groups isn't exactly trivial. You can do it in this case with
coplot(y ~ x | a + b,
       # make a fake y col to cover the range of all y1 and y2 values
       cbind(dd, y = seq(min(dd$y1, dd$y2), max(dd$y1, dd$y2), length.out = nrow(dd))),
       # request subscripts to be sent to the panel function
       subscripts = TRUE,
       panel = function(x, y, subscripts, ...) {
         # draw group 1
         lines(x, dd$y1[subscripts])
         # draw group 2
         lines(x, dd$y2[subscripts], col = "red")
       })
This gives a coplot in which each panel shows both lines, y1 in black and y2 in red.
I would like to eliminate the gap between the x and y axes in barplot and extend the predicted line back to intersect the y axis, preferably in base R. Is this possible? Thank you for any advice or suggestions.
my.data <- read.table(text = '
band mid.point count
1 0.5 74
2 1.5 73
3 2.5 79
4 3.5 70
5 4.5 78
6 5.5 63
7 6.5 59
8 7.5 60
', header = TRUE)
my.data
x <- my.data$mid.point^2
my.model <- lm(count ~ x, data = my.data)
my.plot <- barplot(my.data$count, ylim=c(0,100), space=0, col=NA)
axis(1, at=my.plot+0.5, labels=my.data$band)
lines(predict(my.model, data.frame(x=x), type="resp"), col="black", lwd = 1.5)
EDIT November 26, 2014
I just realized the two plots are not the same (the plot in the original post and the plot in my answer below). Compare the two curved lines closely, particularly at the right side of the plot. Clearly the two curved lines intersect the top of the 8th bar in different locations; however, I have not yet had time to figure out why the plots differ.
Here is one way to extrapolate the predicted line back to the y axis. I incorporate rawr's suggestion regarding eliminating the gap between the y axis and the x axis.
setwd('c:/users/markm/simple R programs/')
jpeg(filename = "barplot_and_line.jpeg")
my.data <- read.table(text = '
band mid.point count
1 0.5 74
2 1.5 73
3 2.5 79
4 3.5 70
5 4.5 78
6 5.5 63
7 6.5 59
8 7.5 60
', header = TRUE)
x <- my.data$mid.point^2
my.model <- lm(count ~ x, data = my.data)
z <- seq(0,8,0.01)
y <- my.model$coef[1] + my.model$coef[2] * z^2
barplot(my.data$count, ylim=c(0,100), space=0, col=NA, xaxs = 'i')
points(z, y, type='l', col=1)
dev.off()
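Regarding the November 26 edit above: one way to compare the two curves directly (a sketch of my own, not part of the original answer) is to overlay the original lines() call on the extrapolated curve. Note that lines(predict(...)) draws the eight fitted values (computed at the band mid-points 0.5, ..., 7.5) at x positions 1, ..., 8, whereas with space = 0 the bar mid-points sit at 0.5, ..., 7.5 and the z-based curve is drawn at those positions; this half-bar offset appears to be why the two curves meet the 8th bar in different places.
# Sketch: overlay the original curve (red) on the extrapolated curve (black)
# to see where and how the two differ.
barplot(my.data$count, ylim = c(0, 100), space = 0, col = NA, xaxs = 'i')
points(z, y, type = 'l', col = 1)
lines(predict(my.model, data.frame(x = x), type = "resp"), col = "red", lwd = 1.5)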