join axes in barplot - r

I would like to eliminate the gap between the x and y axes in barplot and extend the predicted line back to intersect the y axis, preferably in base R. Is this possible? Thank you for any advice or suggestions.
my.data <- read.table(text = '
band mid.point count
1 0.5 74
2 1.5 73
3 2.5 79
4 3.5 70
5 4.5 78
6 5.5 63
7 6.5 59
8 7.5 60
', header = TRUE)
my.data
x <- my.data$mid.point^2
my.model <- lm(count ~ x, data = my.data)
my.plot <- barplot(my.data$count, ylim=c(0,100), space=0, col=NA)
axis(1, at=my.plot+0.5, labels=my.data$band)
lines(predict(my.model, data.frame(x=x), type="resp"), col="black", lwd = 1.5)

EDIT November 26, 2014
I just realized the two plots are not the same (the plot in the original post and the plot in my answer below). Compare the two curved lines closely, particularly at the right-side of the plot. Clearly the two curved lines intersect the top of the 8th bar in different locations. However, I have not yet had time to figure out why the plots differ.
Here is one way to extrapolate the predicted line back to the y axis. I incorporate rawr's suggestion regarding eliminating the gap between the y axis and the x axis.
setwd('c:/users/markm/simple R programs/')
jpeg(filename = "barplot_and_line.jpeg")
my.data <- read.table(text = '
band mid.point count
1 0.5 74
2 1.5 73
3 2.5 79
4 3.5 70
5 4.5 78
6 5.5 63
7 6.5 59
8 7.5 60
', header = TRUE)
x <- my.data$mid.point^2
my.model <- lm(count ~ x, data = my.data)
z <- seq(0,8,0.01)
y <- coef(my.model)[1] + coef(my.model)[2] * z^2  # fitted curve on the bar scale, starting at z = 0
barplot(my.data$count, ylim=c(0,100), space=0, col=NA, xaxs = 'i')
lines(z, y, col=1)
dev.off()
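A likely explanation for the difference noted in the edit above, offered as a sketch rather than a confirmed diagnosis: lines() called with a single vector plots it against 1:8, whereas with space = 0 the bars are centred at 0.5, 1.5, ..., 7.5 (the values barplot() returns). The curve in the question is therefore shifted half a bar to the right relative to the curve drawn from z and y. The names mids, fit and pred are illustrative, and pdf(NULL) is used only so the example runs without opening a window.

```r
# Off-screen device so the sketch runs non-interactively (assumption: no window wanted)
pdf(NULL)
my.data <- data.frame(mid.point = seq(0.5, 7.5, by = 1),
                      count = c(74, 73, 79, 70, 78, 63, 59, 60))
fit <- lm(count ~ I(mid.point^2), data = my.data)
pred <- unname(fitted(fit))
# barplot() returns the x positions of the bar centres
mids <- as.numeric(barplot(my.data$count, ylim = c(0, 100), space = 0,
                           col = NA, xaxs = "i"))
lines(pred, lty = 2)   # x defaults to 1:8, half a bar to the right
lines(mids, pred)      # x is 0.5, 1.5, ..., 7.5, the bar centres
invisible(dev.off())
```

Passing the bar centres explicitly as the x argument keeps the fitted values aligned with the bars they belong to.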

Related

How to create multiple plots (plot means) on the same graph?

TL;DR: Trying to create multiple plots in one graph (image attached), using loop function. Currently manually creating codes for each boxplot, then using par() function to plot them together. It works, but looking for a less repetitive way.
I was wondering if it's possible to create multiple plots; specifically to plot "plot means". You can find the exact output in image form here (the second example on plot means): How to create multiple ggboxplots on the same graph using the loop function?
My data looks something like this:
# A tibble: 62 x 4
offer payoff partner_transfer round_type
<dbl> <dbl> <dbl> <chr>
1 40 126 66 actual
2 100 273 273 actual
3 0 100 0 actual
4 100 6 6 actual
5 25 99 24 actual
6 80 29 9 practice
7 100 45 45 practice
8 0 100 0 practice
9 25 99 24 practice
10 100 183 183 practice
# ... with 52 more rows
I'm trying to get it to look like this:
![sample plot means][2]
Currently, my code to get this output is:
par(mfrow = c(2,2))
plot_offer <- plotmeans( offer ~ round_type, data = tg_proposer_split,
xlab = "Round Type", ylab = "Offer (by A)",
main="Mean Plot with 95% CI")
plot_partner_transfer <- plotmeans( partner_transfer ~ round_type, data = tg_proposer_split,
xlab = "Round Type", ylab = "Amount Transferred by Partner (Bot)",
main="Mean Plot with 95% CI")
plot_payoff <- plotmeans( payoff ~ round_type, data = tg_proposer_split,
xlab = "Round Type", ylab = "Payoff (for A)",
main="Mean Plot with 95% CI")
Is there a way I can shorten this code?
Biggest apologies, for some reason I'm unable to attach images because I haven't collated enough reputation points so I have no choice but to try it this way. Hope it is still clear.
Many thanks!
Here is a way to simplify the code with Map:
1. Define a general-purpose function, fun_plot, to take care of the plot;
2. Get the column names of the y axis variables;
3. Create a vector of y axis labels;
4. Plot in a Map loop.
The code becomes
library(gplots)  # provides plotmeans()
fun_plot <- function(ycol, ylab){
  fmla <- paste(ycol, "round_type", sep = "~")
  fmla <- as.formula(fmla)
  plotmeans(fmla, data = tg_proposer_split,
            xlab = "Round Type", ylab = ylab,
            main = "Mean Plot with 95% CI")
}
y_cols <- names(tg_proposer_split)[which(names(tg_proposer_split) != "round_type")]
# Labels must follow the column order: offer, payoff, partner_transfer
y_lab <- c("Offer (by A)", "Payoff (for A)", "Amount Transferred by Partner (Bot)")
old_par <- par(mfrow = c(2,2))
Map(fun_plot, y_cols, y_lab)
par(old_par)
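The formula-building step can also be written with stats::reformulate(), which avoids pasting strings by hand. The sketch below only builds the formulas and computes group means with aggregate(), so it runs without gplots. The small data frame tg and the helper name make_fmla are illustrative, not part of the original answer.

```r
# Illustrative data with the same column names as in the question
tg <- data.frame(offer = c(40, 100, 0, 80, 100, 0),
                 payoff = c(126, 273, 100, 29, 45, 100),
                 round_type = rep(c("actual", "practice"), each = 3))

# Hypothetical helper: build `ycol ~ round_type` from a column name
make_fmla <- function(ycol) reformulate("round_type", response = ycol)

y_cols <- setdiff(names(tg), "round_type")
fmlas <- lapply(y_cols, make_fmla)

# Group means per round type, one data frame per response column
means <- lapply(fmlas, function(f) aggregate(f, data = tg, FUN = mean))
```

Each element of fmlas is an ordinary formula object, so it could be handed to plotmeans() in place of the as.formula() result.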
Edit.
Following the error reported in comment, here is a more general function, allowing for xcol and the data set to take any values, not just "round_type" and tg_proposer_split, respectively. This solution now uses mapply, not Map, in order for those two arguments to be passed in a MoreArgs list.
fun_plot2 <- function(ycol, ylab, xcol, data){
fmla <- paste(ycol, xcol, sep = "~")
fmla <- as.formula(fmla)
plotmeans(fmla, data = data,
xlab = "Round Type", ylab = ylab,
main = "Mean Plot with 95% CI")
}
old_par <- par(mfrow = c(2,2))
mapply(fun_plot2, y_cols, y_lab,
MoreArgs = list(
xcol = "round_type",
data = tg_proposer_split
)
)
par(old_par)
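To illustrate how MoreArgs behaves: the vectorised arguments are walked element by element, while everything in MoreArgs is passed unchanged to every call. Map is a thin wrapper around mapply, so it forwards MoreArgs as well and either function should work here. The toy function scale_sum is hypothetical.

```r
# Hypothetical toy function: a and b vary per call, k is fixed
scale_sum <- function(a, b, k) (a + b) * k

# mapply simplifies the result to a vector
res_mapply <- mapply(scale_sum, 1:3, 4:6, MoreArgs = list(k = 10))

# Map returns a list and passes MoreArgs on to mapply
res_map <- Map(scale_sum, 1:3, 4:6, MoreArgs = list(k = 10))
```

The main practical difference is the return type: mapply simplifies where it can, while Map always returns a list.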
Data
tg_proposer_split <- read.table(text = "
offer payoff partner_transfer round_type
1 40 126 66 actual
2 100 273 273 actual
3 0 100 0 actual
4 100 6 6 actual
5 25 99 24 actual
6 80 29 9 practice
7 100 45 45 practice
8 0 100 0 practice
9 25 99 24 practice
10 100 183 183 practice
", header = TRUE)

subset dataframe and plot all the subsets with a loop [R]

I'm working with a data frame that has 8 useful variables. The idea of the code is to plot 4 of them (3 on the y axis against a common x axis). The data frame looks like this:
It has about 6500 rows.
I want to subset the data frame by the file column, and then plot LogP on the x axis and Temperature, RH and Ozone on the y axes.
I tried using subset inside the plot function, but it didn't go well. I used this code to plot one of the original files, but I have no idea how to include the subset:
plot(DataOzono$LogP, DataOzono$Temperature, axes= F,type="l",col="red", ylab = NULL, xlab = 'LogP',xaxt="n",yaxt="n" )
axis(2,ylim(c(min(DataOzono$Temperature),max(DataOzono$Temperature)), layout.widths(2)))
mtext(text = 'T',line = 2,side = 2)
par(new=TRUE)
plot(DataOzono$LogP, DataOzono$RH,type="l",col="blue",xaxt="n",yaxt="n",xlab="",ylab="")
axis(4)
mtext("RH",side=4,line=2)
par(new=TRUE)
plot(DataOzono$LogP, DataOzono$Ozone,type="l",col="green",xaxt="n",yaxt="n",xlab="",ylab="")
mtext("O3",side=5,line=3)
axis(2, line = 4)
Any advice would be very helpful.
Here's how to plot the charts in a loop. In the example you gave, we only have one file number. However, it should create a chart for every number in the file column. On Windows, you can use savePlot to save to your drive. I simplified your example because I was getting errors.
DataOzono <- read.table(text="pressure height Temperature RH Ozone file LogP
753.6 2541 16.8 76 0 80131 0.3475673
748.0 2604 17.7 32 0 80131 0.347959
743.5 2656 15.9 38 0 80131 0.3482766
739.8 2697 15.4 39 0 80131 0.3485396
736.6 2734 15.0 41 0 80131 0.3487685
731.8 2790 14.5 42 0 80131 0.3491142", header=TRUE, stringsAsFactors=FALSE)
original_par <- par(no.readonly = TRUE) #save only the settable parameters so they can be restored later
par(mar=c(5.1, 8.1, 4.1, 3.1))
for (i in unique(DataOzono$file)){
DataOzono_subset <- DataOzono[DataOzono$file==i,] #keep only rows for that file number
plot(DataOzono_subset$LogP, DataOzono_subset$Temperature, axes= F,type="l",col="red", ylab = "", xlab = 'LogP',xaxt="n",yaxt="n" )
axis(2,col="red",col.axis="red")
mtext(text = 'T',line = 2,side = 2,col="red",col.lab="red")
par(new=TRUE)
plot(DataOzono_subset$LogP, DataOzono_subset$RH,type="l",col="blue",xaxt="n",yaxt="n",xlab="",ylab="")
axis(4,col="blue",col.axis="blue")
mtext("RH",side=4,line=2,col="blue",col.lab="blue" )
par(new=TRUE)
plot(DataOzono_subset$LogP, DataOzono_subset$Ozone,type="l",col="darkgreen",xaxt="n",yaxt="n",xlab="",ylab="")
mtext("O3",side=2,line=6,col="darkgreen",col.lab="darkgreen")
axis(2, line = 4,col="darkgreen",col.axis="darkgreen")
savePlot(filename=paste0("c:/temp/",i,".png"),type="png")
}
par(original_par) #restore par to initial value.
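A platform-independent variant of the same loop, as a minimal sketch: split() breaks the data frame into one piece per file number, and a file device such as pdf() is opened and closed around each plot instead of calling savePlot. The second file number (80132), the output folder (tempdir()) and the names pieces and out_files are assumptions made for illustration.

```r
# Small illustrative data set; the second file id (80132) is made up
DataOzono <- data.frame(
  LogP = c(0.3476, 0.3480, 0.3483, 0.3485, 0.3488, 0.3491),
  Temperature = c(16.8, 17.7, 15.9, 15.4, 15.0, 14.5),
  file = c(80131, 80131, 80131, 80132, 80132, 80132)
)

out_dir <- tempdir()                        # assumption: write to a temporary folder
pieces <- split(DataOzono, DataOzono$file)  # one data frame per file number

for (id in names(pieces)) {
  pdf(file.path(out_dir, paste0(id, ".pdf")))  # open one file per subset
  plot(pieces[[id]]$LogP, pieces[[id]]$Temperature, type = "l", col = "red",
       xlab = "LogP", ylab = "T", main = id)
  dev.off()                                    # close the file before the next subset
}

out_files <- file.path(out_dir, paste0(names(pieces), ".pdf"))
```

The extra axes for RH and Ozone from the loop above can be added inside the same pdf() and dev.off() pair.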

group and average a large numeric vector to plot

I have an R matrix which is very data dense. It has 500,000 rows. If I plot 1:500000 (x axis) to the third column of the matrix mat[, 3] it takes too long to plot, and sometimes even crashes. I've tried plot, matplot, and ggplot and all of them take very long.
I am looking to group the data in blocks of 10 or 20, i.e., take the first 10 elements of the vector, average them, and use that average as a single data point.
Is there a fast and efficient way to do this?
We can use cut and aggregate to reduce the number of points plotted:
generate some data
set.seed(123)
xmat <- data.frame(x = 1:5e5, y = runif(5e5))
use cut and aggregate
xmat$cutx <- as.numeric(cut(xmat$x, breaks = 5e5/10))
xmat.agg <- aggregate(y ~ cutx, data = xmat, mean)
make plot
plot(xmat.agg, pch = ".")
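For a single numeric vector, the block averaging described in the question can also be sketched with a matrix reshape: fill a matrix with one block per column and take colMeans(). The helper name block_mean is hypothetical, and this version simply drops a trailing incomplete block.

```r
# Hypothetical helper: mean of consecutive blocks of `size` elements.
# Elements left over after the last complete block are dropped.
block_mean <- function(v, size = 10) {
  n_full <- length(v) %/% size                     # number of complete blocks
  m <- matrix(v[seq_len(n_full * size)], nrow = size)
  colMeans(m)                                      # one mean per block (column)
}

set.seed(123)
y <- runif(5e5)
y_avg <- block_mean(y, size = 10)   # 50,000 points instead of 500,000
```

The result can then be plotted directly, for example with plot(y_avg, pch = ".").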
more than 1 column solution:
Here, we use the data.table package to group and summarize:
generate some more data
set.seed(123)
xmat <- data.frame(x = 1:5e5,
u = runif(5e5),
z = rnorm(5e5),
p = rpois(5e5, lambda = 5),
g = rbinom(n = 5e5, size = 1, prob = 0.5))
use data.table
library(data.table)
xmat$cutx <- as.numeric(cut(xmat$x, breaks = 5e5/10))
setDT(xmat) #convert to data.table
#for each level of cutx, take the mean of each column
xmat.agg <- xmat[, lapply(.SD, mean), by = cutx]
# xmat.agg
# cutx x u z p g
# 1: 1 5.5 0.5782475 0.372984058 4.5 0.6
# 2: 2 15.5 0.5233693 0.032501186 4.6 0.8
# 3: 3 25.5 0.6155837 -0.258803746 4.6 0.4
# 4: 4 35.5 0.5378580 0.269690334 4.4 0.8
# 5: 5 45.5 0.3453964 0.312308395 4.8 0.4
# ---
# 49996: 49996 499955.5 0.4872596 0.006631221 5.6 0.4
# 49997: 49997 499965.5 0.5974486 0.022103345 4.6 0.6
# 49998: 49998 499975.5 0.5056578 -0.104263093 4.7 0.6
# 49999: 49999 499985.5 0.3083803 0.386846148 6.8 0.6
# 50000: 50000 499995.5 0.4377497 0.109197095 5.7 0.6
plot it all
par(mfrow = c(2,2))
for(i in 3:6) plot(xmat.agg[, c(1, i), with = FALSE], pch = ".")

Link segments matched by column value in R

Hello
I am attempting to plot segmented lines and connect them by matching values.
I have already plotted segments using the "Start" and "End" values as x coordinates and the "Group" as the y coordinate in R. I would like to connect these segments with a line if they share the same "ID", as shown in my sample dataset data:
Name Start End Group ID
TP1 363248 366670 7 98
TP2 365869 369291 11 98
TP3 366459 369881 1 98
AB1 478324 481599 11 134
AB2 478855 482130 1 134
AB3 480681 483956 10 134
JD1 166771 169764 6 214
JD2 386419 389244 7 214
JD2 389025 391850 11 214
What I have so far using data is:
x <- seq(0, 4100000, length = 200)
y <- seq(0, 15, length = 200)
plot(x,y,type="n");
start.x <- (data[,2])
end.x <- (data[,3])
end.y <- start.y <- (data[,4]) # from and to y coords the same
segments(x0 = start.x, y0 = start.y, x1 = end.x, y1 = end.y)
lines(data[,1], data[,5])
My segments are plotted just fine, but my connecting lines do not appear. Any suggestions as to how I can draw connecting lines? Thank you very much.
In my code below I zoomed in the plot using the xlim and ylim parameters so we can get a better look at the plotted data.
As you can see, I'm using a for loop to iterate over each unique ID value. For each value, I get the combinations of all pairs of records in the group using combn(). I then iterate over each combination using apply(). For each combination I call segments() to draw a segment between the centers of the two (original) segments. I use a different color for each group so they can easily be distinguished.
df <- data.frame(
  Name  = c('TP1','TP2','TP3','AB1','AB2','AB3','JD1','JD2','JD2'),
  Start = c(363248,365869,366459,478324,478855,480681,166771,386419,389025),
  End   = c(366670,369291,369881,481599,482130,483956,169764,389244,391850),
  Group = c(7,11,1,11,1,10,6,7,11),
  ID    = c(98,98,98,134,134,134,214,214,214)
)
# zoom in on the range covered by the data
xlim <- c(min(df$Start), max(df$End))
ylim <- c(min(df$Group), max(df$Group))
plot(NA, xlim=xlim, ylim=ylim, xlab='x', ylab='y')
# draw the original horizontal segments
start.x <- df[,'Start']
end.x <- df[,'End']
end.y <- start.y <- df[,'Group']
segments(start.x, start.y, end.x, end.y)
# connect the centers of every pair of segments that share an ID
uid <- unique(df$ID)
cols <- rainbow(length(uid))
for (i in seq_along(uid)) {
  df.sub <- subset(df, ID==uid[i])
  col <- cols[i]
  apply(combn(nrow(df.sub),2), 2, function(ris) {
    r1 <- df.sub[ris[1],]
    r2 <- df.sub[ris[2],]
    segments(mean(c(r1$Start,r1$End)), r1$Group, mean(c(r2$Start,r2$End)), r2$Group, col=col)
  })
}
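To make the pairing step concrete, here is a small sketch of what combn() produces for a group of three rows, using the Start and End values of the ID 98 records from the sample data. The variable names pairs, centre, from and to are illustrative. Note that combn() needs at least two items, so a group with a single row would have to be skipped.

```r
# All index pairs for a group of 3 rows: columns are (1,2), (1,3), (2,3)
pairs <- combn(3, 2)

# Centres of the three ID 98 segments from the sample data
start <- c(363248, 365869, 366459)
end   <- c(366670, 369291, 369881)
centre <- (start + end) / 2

# x coordinates of the connecting lines, one entry per pair
from <- centre[pairs[1, ]]
to   <- centre[pairs[2, ]]
```

Each column of pairs corresponds to one connecting line, which is why three segments in a group yield three connectors.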

Multiple data points in one R ggplot2 plot

I have two sets of data points that both relate to the same primary axis but differ in their secondary axis. Is there some way to plot them on top of each other in R using ggplot2?
What I am looking for is basically something that looks like this:
4+ |
| x . + 220
3+ . . |
| x |
2+ . + 210
| x |
1+ . x x |
| + 200
0+-+-+-+-+-+-+
time
. temperature
x car sale
(This is just an example of possible data)
Shane's answer, "you can't in ggplot2," is correct, if incomplete. Arguably, it's not something you want to do. How do you decide how to scale the Y axis? Do you want the means of the lines to be the same? The range? There's no principled way of doing it, and it's too easy to make the results look like anything you want them to look like. Instead, what you might want to do, especially in a time series like that, is to normalize the two series so that at a particular value of t, often min(t), Y1 = Y2 = 100. Here's an example I pulled off of the Bonddad Blog (not using ggplot2, which is why it's ugly!). You can clearly see the relative increase and decrease of the two lines, which have completely different underlying scales.
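The indexing idea can be sketched in base R as follows. The two series are made up purely for illustration, index100 is a hypothetical helper name, and the first value of each series is assumed to be non-zero. pdf(NULL) only keeps the example from opening a window.

```r
# Made-up series on very different scales (illustrative only)
time <- 1:6
temperature <- c(1.0, 1.5, 2.0, 3.0, 2.5, 4.0)
car_sales   <- c(200, 204, 210, 206, 215, 220)

# Hypothetical helper: rescale so the first observation equals 100
# (assumes the first value is not zero)
index100 <- function(v) 100 * v / v[1]

indexed <- cbind(temperature = index100(temperature),
                 car_sales   = index100(car_sales))

pdf(NULL)  # off-screen device so the sketch runs non-interactively
matplot(time, indexed, type = "l", lty = 1, col = c("red", "blue"),
        xlab = "time", ylab = "Index (first value = 100)")
legend("topleft", legend = colnames(indexed), lty = 1, col = c("red", "blue"))
invisible(dev.off())
```

Both lines now start at 100 on a single shared axis, so only their relative changes are compared.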
I'm not an expert on this, but it's my understanding that this is possible with lattice, but not with ggplot2. See this leanr blog post for an example of a secondary axis plot. Also see Hadley's response to this question.
Here's an example of how to do it in lattice (from Gabor Grothendieck):
library(lattice)
library(grid) # needed for grid.text
# data
Lines.raw <- "Date Fo Co
6/27/2007 57.1 13.9
6/28/2007 57.7 14.3
6/29/2007 57.8 14.3
6/30/2007 57 13.9
7/1/2007 57.1 13.9
7/2/2007 57.2 14.0
7/3/2007 57.3 14.1
7/4/2007 57.6 14.2
7/5/2007 58 14.4
7/6/2007 58.1 14.5
7/7/2007 58.2 14.6
7/8/2007 58.4 14.7
7/9/2007 58.7 14.8
"
# in reality next stmt would be DF <- read.table("myfile.dat", header = TRUE)
DF <- read.table(textConnection(Lines.raw), header = TRUE)
DF$Date <- as.Date(DF$Date, "%m/%d/%Y")
par.settings <- list(
layout.widths = list(left.padding = 10, right.padding = 10),
layout.heights = list(bottom.padding = 10, top.padding = 10)
)
xyplot(Co ~ Date, DF, default.scales = list(y = list(relation = "free")),
ylab = "C", par.settings = par.settings)
trellis.focus("panel", 1, 1, clip.off = TRUE)
pr <- pretty(DF$Fo)
at <- 5/9 * (pr - 32)
panel.axis("right", at = at, lab = pr, outside = TRUE)
grid.text("F", x = 1.1, rot = 90) # right y axis label
trellis.unfocus()
