How do I superimpose a regression line in a barplot in R? - r

So I want to superimpose a regression line in a barplot in R. Similar to the attached image by Rosindell et al. 2011. However, when I try to do this with my data the line does not stretch the entire length of the barplot.
For a reproducible example, I made a dummy code:
x = 20:1
y = 1:20
barplot(x, y, space = 0)
lines(x, y, col = 'red')
How do I get the line to traverse the entire stretch of the barplot bins?
PS: the line does not need to be non-linear. I just want to superimpose a straight line on the barplot
Thank you.

A more general solution could be to rely on the x-values that are generated by barplot(). This way, you can deal with scenarios where you only have counts (rather than x and y values). I am referring to a variable like this one, where your "x" is categorical (precisely, x-axis values correspond to the names of y).
p.x <- c(8,12,14,9,5,3,2)
x <- sample(c("A","B","C","D","E","F","G"),
prob = p.x/sum(p.x),
replace = TRUE,
size = 200)
y <- table(x)
y
# A B C D E F G
# 27 52 46 36 21 11 7
When you use barplot(), you can collect the x-positions of the bars in a variable (plot.dim in this case) and use them to position your line.
plot.dim <- barplot(y)
lines(plot.dim, y, col = "red", lwd = 2)
The result
Now, back to your data. Even if you have both x and y, in a barplot you are displaying only your y variable, while x is used for the labels of y.
x <- 20:1
y <- as.integer(22 - 1 * sample(seq(0.7, 1.3, length.out = length(x))) * x)
names(y) <- x
y <- y[order(as.numeric(names(y)))]
Let's plot your y values again. Collect the barplot positions in the xpos variable.
xpos <- barplot(y, las = 2)
Note that the first bar (x=1) is not positioned at 1. Similarly, the last bar is positioned at 23.5 (and not 20).
xpos[1]
# x=1 is indeed at 0.7
xpos[length(xpos)]
# x=20 is indeed at 23.5
Do your regression (for example, use lm()). Compute the predicted y values at the first and the last x (y labels).
lm.fit <- lm(y~as.numeric(names(y)))
y.init <- lm.fit$coefficients[2] * as.numeric(names(y))[1] + lm.fit$coefficients[1]
y.end <- lm.fit$coefficients[2] * as.numeric(names(y))[(length(y))] + lm.fit$coefficients[1]
You can now overlay a line using segments(), but remember to set your x-values according to what is stored in xpos.
segments(xpos[1], y.init, xpos[length(xpos)], y.end, lwd = 2, col = "red")
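A possible shortcut, offered as a sketch rather than as part of the answer above: when all bars share the same width and spacing, the midpoints returned by barplot() are a linear function of the labels. One can therefore regress y on the midpoints themselves and let abline() draw the fitted line across the whole plot region. The data and the names fit.lab and fit.pos below are made up for illustration.

```r
# Hypothetical data: 20 bars labelled 1..20 with a decreasing trend
set.seed(1)
x <- 1:20
y <- round(22 - x + rnorm(20))
names(y) <- x

# barplot() returns the midpoint of each bar on the plot's x-axis
xpos <- as.numeric(barplot(y, las = 2))

# Fit once against the labels and once against the bar positions
fit.lab <- lm(y ~ x)
fit.pos <- lm(y ~ xpos)

# abline() extends the fitted line over the full width of the plot
abline(fit.pos, col = "red", lwd = 2)
```

Both fits give the same fitted values at each bar; only the units of the slope differ. The slope of fit.pos is per unit of plot position, so any coefficients you report should come from fit.lab.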

Check out the help page ?barplot: the second argument is width - optional vector of bar widths, not the y coordinate. The following code does what you want, but I don't believe it's a general purpose solution.
barplot(y[x], space = 0)
lines(x, y, col = 'red')
Edit:
A probably better way would be to use the return value of barplot.
bp <- barplot(y[x], space = 0)
lines(c(bp), y[x], col = 'red')

Related

Plot multiple line graphs on the same window with auto-assigned different colors

I want to create a vector of functions with two parameters where one parameter is over a continuous range of values and the other runs over a fixed number of numerical values saved in the column vector dat[,2].
# Example functions
icc <- function(year, x) {
z = exp(year - x)
inf = z / (1 + z)
return (inf)
}
# Example data
year <- seq(-4, 4, 0.1)
x1 <- dat[1, 2]
x2 <- dat[2, 2]
# Plots
plot(year, icc(year, x1), type = "l")
plot(year, icc(year, x2), type = "l")
The issues are
dat[,2] has more than just 2 values and I want to be able to plot all the corresponding functions on the same plot but with different colors
manually assigning colors to each line is difficult as there are a large number of lines
dat[,1] stores the corresponding label to each plot; would it be possible to add them over each line?
I have something like this in mind-
UPDATE: dat is simply a 40 x 2 table storing strings in the first column and numerical values in the second. By 'a vector of functions', I mean an array containing functions with parameter values unique to each row. For example- if t^i is the function then, element 1 of the array is the function t^1, element 2 is t^2 and so on where t is a 'range'. (Label and color are extras and not too important. If unanswered, I'll post another question for them).
The function to use is matplot, not plot. There is also matlines but if the data to be plotted is in a matrix, matplot can plot all columns in one call.
Create a matrix of y coordinates, yy, with one column per x value. This is done in a sapply loop. In the code below I have called the x values xx since there is no dat[,2] to work with.
Plot the resulting matrix in one matplot function call, which takes care of the colors automatically.
The lines labels problem is not addressed, only the lines plotting problem. With so many lines their labels would make the plot more difficult to read.
icc <- function(year, x) {
z = exp(year - x)
inf = z / (1 + z)
return (inf)
}
# Example data
year <- seq(-4, 4, 0.1)
xx <- seq(-1, 1, by = 0.2)
yy <- sapply(xx, \(x) icc(year, x))
matplot(year, yy, type = "l", lty = "solid")
Created on 2022-07-26 by the reprex package (v2.0.1)
Note
Function icc is the logistic distribution CDF with location x and scale 1. The base R plogis function can substitute for it, the results are equal within floating-point precision.
icc2 <- function(year, x) plogis(year, location = x, scale = 1)
yy2 <- sapply(xx, \(x) icc2(year, x))
identical(yy, yy2)
#> [1] FALSE
all.equal(yy, yy2)
#> [1] TRUE
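The labelling part of the question is left open above. One way it might be approached, as a sketch only: each logistic curve passes through 0.5 exactly where year equals its location, so a label can be placed there. The data frame dat below is a hypothetical stand-in for the asker's table (labels in the first column, numeric values in the second), and rainbow() is just one of several ways to generate distinct colours.

```r
icc <- function(year, x) {
  z <- exp(year - x)
  z / (1 + z)
}

# Hypothetical stand-in for dat: labels in column 1, locations in column 2
dat <- data.frame(label = paste0("item", 1:5),
                  loc   = seq(-2, 2, by = 1))

year <- seq(-4, 4, 0.1)
yy   <- sapply(dat$loc, function(x) icc(year, x))  # one column per row of dat
cols <- rainbow(nrow(dat))                         # one colour per curve

matplot(year, yy, type = "l", lty = "solid", col = cols, ylab = "icc")

# Each curve passes through 0.5 where year equals its location
text(x = dat$loc, y = 0.5, labels = dat$label, col = cols, pos = 4, cex = 0.8)
```

With 40 curves the labels will crowd each other, so a legend() or labelling only a subset may read better.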

How to draw a graph with both x-axis and y-axis are functions in R?

I have a function,
x= (z-z^2.5)/(1+2*z-z^2)
y = z-z^2.5
where z is the only variable. How to draw a graph where x-axis shows value of function x, and y-axis shows value of function y as z range from 0 to 5?
You can get a very basic plot by simply following your own instructions.
## z ranges from 0 to 5
z = seq(0,5,0.01)
## x and y are functions of z
x = (z-z^2.5)/(1+2*z-z^2)
y = z-z^2.5
##plot
plot(x,y, pch=20, cex=0.5)
If you want a smooth curve it is a little trickier. There is a discontinuity in the curve at z = 1 + sqrt(2) ~ 2.414, where the denominator of x is zero. If you just draw the curve as one piece, you get an unwanted line connecting across the discontinuity. So, in two pieces,
plot(x[1:242],y[1:242], type='l', xlab='x', ylab='y',
xlim=range(x), ylim=range(y))
lines(x[243:501],y[243:501])
But be careful about interpreting this. There is something tricky going on from z=0 to z=1.
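As an alternative to hard-coding the split indices, here is a minimal sketch that locates the break from the sign change of the denominator and inserts an NA there, since lines are not drawn through missing values. The names den, brk, x.gap and y.gap are introduced here for illustration only.

```r
z   <- seq(0, 5, 0.01)
den <- 1 + 2 * z - z^2          # denominator of x; changes sign at 1 + sqrt(2)
x   <- (z - z^2.5) / den
y   <- z - z^2.5

# Index of the last point before the denominator changes sign
brk <- which(diff(sign(den)) != 0)

# Insert an NA after that point so no segment is drawn across the gap
x.gap <- append(x, NA, after = brk)
y.gap <- append(y, NA, after = brk)

plot(x.gap, y.gap, type = "l", xlab = "x", ylab = "y")
```

For this grid brk comes out as 242, which matches the 1:242 and 243:501 split used above.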
Using ggplot2
# z ranges from -1000 to 1000 (the range is arbitrary, but z^2.5 is NaN for negative z)
z = seq(-1000,1000,.25)
# x as a function of z
x = (z-z^2.5) / ((1+2*z)-z^2)
# y as a function of z
y = z-z^2.5
# make a dataframe of x,y and z
df <- data.frame(x=x, y=y, z=z)
# subset the df where z is between 0 and 5
df_5 <- subset(df, (df$z>=0 & df$z<=5))
# plot the graph
library(ggplot2)
ggplot(df_5, aes(x,y))+ geom_point(color="red")
The only additions to @G5W's answer are the use of subset() to keep the values of z between 0 and 5, and the use of ggplot2.

How to plot an indexed set of (x,y) pairs such that the index is parallel to the x axis, but on the top of the frame

For example, let's say:
x <- rnorm(20)
y <- rnorm(20) + 1
n <- seq(1,20,1)
data <- data.frame(n, x, y)
Is it possible to plot y~x with the indexed value of each pair at the top of the plot?
Can it be done with the base graphics, not ggplot?
It may be simple, but I am struggling to find help via Google. My guess is I'm using a poor selection of words.
Any help is much appreciated!
plot(x,y)
text(x = x, y = y, n, pos = 3)
#Adds text 'n' at co-ordinate (x,y)
# "pos = 3" means the text will be just above the co-ordinates
#See ?text for more
If you want all the indices on one line above the plot boundary, specify an appropriate y value when calling text(). You will first have to set par(xpd=TRUE) to be able to draw outside the plot boundary.
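A minimal sketch of that suggestion, assuming the top edge of the plot region is a reasonable baseline for the labels. The seed and the cex value are arbitrary choices made for this example.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
x <- rnorm(20)
y <- rnorm(20) + 1
n <- seq_along(x)

plot(x, y)

usr <- par("usr")                 # c(xmin, xmax, ymin, ymax) of the plot region
op  <- par(xpd = TRUE)            # allow drawing outside the plot region
text(x = x, y = usr[4], labels = n, pos = 3, cex = 0.7)
par(op)                           # restore the previous clipping setting
```

Each index sits directly above its point's x position, just outside the frame. Points with nearly equal x values will have overlapping labels.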
Yes, we can add labels. Try this code:
x <- rnorm(20)
y <- rnorm(20) + 1
n <- seq(1,20,1)
data <- data.frame(n, x, y)
plot(y~x)
with(data, text(y~x, labels = row.names(data)))

perspective plot in R (tick marks and understanding what is going on)

I plot a 3d plot in R using persp. I have two questions with respect to this:
I want to verify that I understand the docs correctly. persp will take the values from x and y, then, for each pair of indices (i,j) into those vectors, it will pluck out zfit[i,j] and plot the point (x[i],y[j],zfit[i,j]). Is this correct?
This does not produce the numbers on the actual axis but arrows in the increasing direction. How do I make numbers appear?
Example:
set.seed(1)
x = 1:10
y = rnorm(10)
z = x + y^2
g = expand.grid(list(x=seq(from=min(x), to=max(x), length.out=100),y=seq(from=min(y), to=max(y), length.out=100)))
mdl = loess(z ~ x+ y)
zfit = predict(mdl, newdata=g)
persp(x = seq(from=min(x), to=max(x), length.out=100), y = seq(from=min(y), to=max(y), length.out=100), z= zfit)
1 - Your understanding is correct.
2 - Add ticktype = "detailed" to show numbers on axis.
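Both points can be seen in a small self-contained sketch. The grid and the viewing angles below are arbitrary choices for illustration, not taken from the question.

```r
x <- seq(-2, 2, length.out = 30)
y <- seq(-1, 1, length.out = 20)

# z[i, j] holds the height at (x[i], y[j]): rows follow x, columns follow y
z <- outer(x, y, function(x, y) x + y^2)

persp(x, y, z,
      ticktype = "detailed",      # numeric tick labels instead of arrows
      theta = 30, phi = 25,       # viewing angles, chosen arbitrarily
      xlab = "x", ylab = "y", zlab = "z")
```

Note that z must have length(x) rows and length(y) columns, which is what outer() produces here.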

plot with overlapping points

I have data in R with overlapping points.
x = c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y = c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)
plot(x,y)
How can I plot these points so that the points that are overlapped are proportionally larger than the points that are not. For example, if 3 points lie at (4,5), then the dot at position (4,5) should be three times as large as a dot with only one point.
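Here's one way using ggplot2:
library(ggplot2)
x = c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y = c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)
df <- data.frame(x = x,y = y)
ggplot(data = df,aes(x = x,y = y)) + stat_sum()
stat_sum computes both the count (n) and the proportion (prop) of observations at each position. Older ggplot2 releases sized the points by proportion by default. You can ask for raw counts explicitly by doing something like:
# on ggplot2 older than 3.3.0, write ..n.. instead of after_stat(n)
ggplot(data = df,aes(x = x,y = y)) + stat_sum(aes(size = after_stat(n)))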
Here's a simpler (I think) solution:
x <- c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y <- c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)
size <- sapply(1:length(x), function(i) { sum(x==x[i] & y==y[i]) })
plot(x,y, cex=size)
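## Tabulate the number of occurrences of each coordinate
df <- data.frame(x, y)
## aggregate() keeps each count on the same row as its (x, y) pair;
## pairing unique(df) with tapply() output would misalign them,
## because the two are ordered differently
df2 <- aggregate(value ~ x + y, data = transform(df, value = 1), FUN = sum)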
## Use cex to set point size to some function of coordinate count
## (By using sqrt(value), the _area_ of each point will be proportional
## to the number of observations it represents)
plot(y ~ x, cex = sqrt(value), data = df2, pch = 16)
You didn't really ask for this approach but alpha may be another way to address this:
library(ggplot2)
ggplot(data.frame(x=x, y=y), aes(x, y)) + geom_point(alpha=.3, size = 3)
You need to add the parameter cex to your plot function. First what I would do is use the function as.data.frame and table to reduce your data to unique (x,y) pairs and their frequencies:
new.data = as.data.frame(table(x,y))
new.data = new.data[new.data$Freq != 0,] # Remove points with zero frequency
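The only downside to this is that table() converts the numeric data to factors. Convert back to numeric by way of as.character (calling as.numeric directly on a factor returns the level codes, not the original values), and plot!
plot(as.numeric(as.character(new.data$x)), as.numeric(as.character(new.data$y)), cex = new.data$Freq)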
You may also want to try sunflowerplot.
sunflowerplot(x,y)
Let me propose alternatives to adjusting the size of the points. One of the drawbacks of using size (radius? area?) is that the reader's evaluation of spot size vs. the underlying numeric value is subjective.
So, option 1: plot each point with transparency --- ninja'd by Tyler!
option 2: use jitter to push your data around slightly so the plotted points don't overlap.
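A minimal sketch of option 2 using base R's jitter() on the question's data. The seed and the amount of 0.15 are arbitrary choices for this example; pick a displacement small enough that points are not confused with neighbouring coordinates.

```r
set.seed(1)                       # arbitrary seed so the jitter is repeatable
x <- c(4,4,4,7,3,7,3,8,6,8,9,1,1,1,8)
y <- c(5,5,5,2,1,2,5,2,2,2,3,5,5,5,2)

amt <- 0.15                       # maximum displacement; an arbitrary choice
xj  <- jitter(x, amount = amt)    # adds uniform noise in [-amt, amt]
yj  <- jitter(y, amount = amt)

plot(xj, yj, pch = 16, xlab = "x", ylab = "y")
```

Repeated observations now show up as small clusters rather than as a single dot.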
A solution using lattice and table (similar to @R_User's, but there is no need to remove the zero counts, since points drawn with cex = 0 are invisible):
library(lattice)
dt <- as.data.frame(table(x,y))
xyplot(dt$y~dt$x, cex = dt$Freq^2, col =dt$Freq)

Resources