I am trying to draw a least squares regression line using abline(lm(...)) that is also forced to pass through a particular point. I see this question is related, but not quite what I want. Here's an example:
test <- structure(list(x = c(0, 9, 27, 40, 52, 59, 76), y = c(50, 68,
79, 186, 175, 271, 281)), .Names = c("x", "y"))
# set up an example plot
plot(test,pch=19,ylim=c(0,300),
panel.first=abline(h=c(0,50),v=c(0,10),lty=3,col="gray"))
# standard line of best fit - black line
abline(lm(y ~ x, data=test))
# force through [0,0] - blue line
abline(lm(y ~ x + 0, data=test), col="blue")
This looks like:
Now how would I go about forcing a line through the marked arbitrary point of (x=10,y=50) while still minimising the distance to the other points?
# force through [10,50] - red line
??
A rough solution would be to shift the origin for your model to that point and create a model with no intercept
nmod <- (lm(I(y-50)~I(x-10) +0, test))
abline(predict(nmod, newdata = list(x=0))+50, coef(nmod), col='red')
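You can modify the formula for lm() and offset the data. Note that abline() draws a no-intercept model as a line through the origin, so compute the intercept from the fitted slope instead of passing the model directly:
p=10
q=50
b <- coef(lm(I(y-q) ~ I(x-p) + 0, data=test))
abline(a = q - b*p, b = b, col="red")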
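As a check on the shifted-origin idea, the constrained slope also has a closed form: minimising sum((y - q - b*(x - p))^2) over b gives b = sum((x - p)*(y - q)) / sum((x - p)^2). A minimal sketch with the question's data follows; the names a, b and fit are only illustrative.

```r
test <- list(x = c(0, 9, 27, 40, 52, 59, 76),
             y = c(50, 68, 79, 186, 175, 271, 281))
p <- 10; q <- 50   # point the line must pass through

# Closed-form least-squares slope for a line constrained through (p, q)
b <- with(test, sum((x - p) * (y - q)) / sum((x - p)^2))
a <- q - b * p     # intercept implied by the constraint

# The no-intercept fit on the shifted data should give the same slope
fit <- lm(I(y - q) ~ I(x - p) + 0, data = test)
all.equal(unname(coef(fit)), b)

# To draw it on an existing plot:
# abline(a = a, b = b, col = "red")
```

Passing the intercept and slope explicitly avoids relying on how abline() interprets a model with a single coefficient.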
How do I change hexbin plots scales?
I currently have this:
Instead of the scale jumping from 1 to 718, I would like it to go from 1 to 2, 3, 5, 10, 20, 40, 80, 160, 320, 640, 1280, 2560, 5120, 10240, 15935.
Here is the code I used to plot it:
hex <- hexbin(trial$pickup_longitude, trial$pickup_latitude, xbins=600)
plot(hex, colramp = colorRampPalette(LinOCS(12)))
Here's a ggplot method, where you can specify whatever breaks you want.
library(ggplot2)
library(RColorBrewer)
##
# made up sample
#
set.seed(42)
X <- rgamma(10000, shape=1000, scale=1)
Y <- rgamma(10000, shape=10, scale=100)
dt <- data.frame(X, Y)  # a plain data frame is enough; data.table was never loaded
##
# define breaks and labels for the legend
#
brks <- c(0, 1, 2, 5, 10, 20, 50, 100, Inf)
n.br <- length(brks)
labs <- c(paste('<', brks[2:(n.br-1)]), paste('>', brks[n.br-1]))
##
#
ggplot(dt, aes(X, Y))+geom_hex(aes(fill=cut(..count.., breaks=brks)), color='grey80')+
scale_fill_manual(name='Count', values = rev(brewer.pal(8, 'Spectral')), labels=labs)
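The legend classes in this approach come entirely from cut(). The base-R sketch below uses the breaks requested in the question (with a leading 0 added so that a count of 1 gets its own class) and a few made-up counts, to show how each count is assigned to a class before it reaches the fill scale.

```r
# Breaks taken from the question; the counts are invented for illustration
brks <- c(0, 1, 2, 3, 5, 10, 20, 40, 80, 160, 320, 640,
          1280, 2560, 5120, 10240, 15935)
counts <- c(1, 2, 4, 7, 718, 15935)

# cut() uses right-closed intervals by default: (0,1], (1,2], (2,3], ...
bins <- cut(counts, breaks = brks, dig.lab = 5)
data.frame(count = counts, bin = bins)

# scale_fill_manual() needs one colour per class
nlevels(bins)
```

With this many classes a single brewer.pal('Spectral') call is not enough, since that palette has at most 11 colours; interpolating it with colorRampPalette() is one option.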
You cannot control the boundaries of the scale as closely as you want, but you can adjust it somewhat. First we need a reproducible example:
library(hexbin)
set.seed(42)
X <- rnorm(10000, 10, 3)
Y <- rnorm(10000, 10, 3)
XY.hex <- hexbin(X, Y)
To change the scale we need to specify a function to use on the counts and an inverse function to reverse the transformation. Now, three different scalings:
plot(XY.hex) # Linear, default
plot(XY.hex, trans=sqrt, inv=function(x) x^2) # Square root
plot(XY.hex, trans=log, inv=function(x) exp(x)) # Log
The top plot is the original scaling. The bottom left is the square root transform and the bottom right is the log transform. There are probably too many levels to read these plots clearly. Adding the argument colorcut=6 to the plot command would reduce the number of levels to 5.
A few weeks ago I discovered ggforce, which has great features for plotting ellipses. But I can't manage to use it in log plots. Here is an example:
I would like to use the ellipse to circle this group
library(ggforce)
library(ggplot2)
ggplot(mtcars)+
geom_point(aes(hp,disp))+
geom_ellipse(aes(x0 = 230, y0 = 450, a = 80, b = 30, angle = -10))
But I would like to do this in a log plot. If I naively do
ggplot(mtcars)+
geom_point(aes(hp,disp))+
geom_ellipse(aes(x0 = 230, y0 = 450, a = 80, b = 30, angle = -10))+
scale_y_log10()
I obtain a giant ellipse:
It looks like the ellipse parameters are not log transformed. I could try shrinking the axis parameter to get the right size on the log axis, something like:
ggplot(mtcars)+
geom_point(aes(hp,disp))+
scale_y_log10()+
geom_ellipse(aes(x0 = 230, y0 = 450, a = 80, b = 0.05, angle =0))
which works:
But only if the angle is 0. If not, the two axes are mixed and I can't get the ellipse I want:
ggplot(mtcars)+
geom_point(aes(hp,disp))+
scale_y_log10()+
geom_ellipse(aes(x0 = 230, y0 = 450, a = 80, b = 0.05, angle = -10))
How can I plot an ellipse in a log or log-log plot in ggplot? Is there any feasible workaround with ggforce? Is there any other "simple" solution (other than coding the ellipse in semi-log coordinates)?
What actually works for me is to transform the coordinate system instead of the y scale.
ggplot(mtcars) +
geom_point(aes(hp,disp)) +
geom_ellipse(aes(x0 = 230, y0 = 450, a = 80, b = 30, angle = -10)) +
coord_trans(y = "log10")
To be honest it intuitively makes sense to me to use the coord transformation - it resembles coord_map where you're also transforming the coordinates when plotting polygons in different shapes - but I don't know enough internals to explain why scale transformation does not work.
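One likely explanation: the ggplot2 documentation for coord_trans notes that scale transformations are applied before statistics are computed, while coordinate transformations are applied afterwards. The base-R sketch below imitates both orders for the ellipse in the question. The helper ellipse_xy is hypothetical, and the "scale route" is an assumption about what happens when only the centre is log-transformed and a and b are left untouched.

```r
# Outline of an ellipse: centre (x0, y0), semi-axes a and b, rotation in radians
ellipse_xy <- function(x0, y0, a, b, angle, n = 200) {
  t <- seq(0, 2 * pi, length.out = n)
  data.frame(
    x = x0 + a * cos(t) * cos(angle) - b * sin(t) * sin(angle),
    y = y0 + a * cos(t) * sin(angle) + b * sin(t) * cos(angle)
  )
}

# Coordinate route: build the outline in data units, then log-transform it
coord_route <- ellipse_xy(230, 450, 80, 30, -10)
coord_route$y <- log10(coord_route$y)

# Scale route (assumed): the centre is log-transformed first, and a and b
# are then added to it as if they were already in log10 units
scale_route <- ellipse_xy(230, log10(450), 80, 30, -10)

diff(range(coord_route$y))  # well under one decade
diff(range(scale_route$y))  # on the order of a hundred decades
```

That difference in vertical extent would account for the giant ellipse seen with scale_y_log10().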
I am new to R, and I am trying to create a conditional probability plot, with pre-test probability on the x axis and post-test probability on the y axis. Similar to the one in the link conditional probability plot. I need to plot points for a positive test and join them together with a line, and plot points for a negative test and join the points together with a line, on the same graph.
I have the data:
Pre-test prob for negative test <- c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
Post-test prob for negative test <- c(0, 3, 7, 11, 17, 22, 30, 40, 53, 72, 100)
Pre-test prob for positive test <- c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100)
Post-test prob for positive test <- c(0, 38, 57, 69, 77, 83, 88, 94, 95, 98, 100)
However I am unsure how best to organise the data or of the code to produce the graph that I need! I have searched for "conditional probability plots" but haven't found anything helpful.
Any guidance would be much appreciated.
Thanks, Laura
The best way to organise the data is inside a data.frame:
test = data.frame(Pos.pre = a, Pos.post = b, Neg.pre = c, Neg.post = d)
(Assuming your individual data was called a, b, c, d.)
Now you can plot, e.g. positive post vs pre:
plot(Pos.post ~ Pos.pre, data = test, type = 'l')
(type = 'l' makes this a line plot.)
And you can add the negative results using the lines function, which adds data to an existing plot:
lines(Neg.post ~ Neg.pre, test, col = 'red')
Here, I’ve taken the liberty of making the second line red. Take a look at the documentation of plot, lines and par for many more options.
Once you have the time, I strongly urge you to learn using the ggplot2 library, which makes these kinds of plots more flexible. Case in point, with ggplot2 we could create the above plot in a single, extensible command:
ggplot(test) +
geom_line(aes(x = Pos.pre, y = Pos.post)) +
geom_line(aes(x = Neg.pre, y = Neg.post), color = 'red')
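For completeness, here is a sketch of the base-graphics version with the numbers from the question filled in. The vector names pre, post_pos and post_neg are made up for this example, and the dotted diagonal is an optional reference line that was not part of the original request.

```r
pre      <- seq(0, 100, by = 10)   # pre-test probability (%), same for both tests
post_neg <- c(0, 3, 7, 11, 17, 22, 30, 40, 53, 72, 100)
post_pos <- c(0, 38, 57, 69, 77, 83, 88, 94, 95, 98, 100)

test <- data.frame(Pos.pre = pre, Pos.post = post_pos,
                   Neg.pre = pre, Neg.post = post_neg)

# type = "b" draws both the points and the connecting line
plot(Pos.post ~ Pos.pre, data = test, type = "b", pch = 19,
     xlab = "Pre-test probability (%)", ylab = "Post-test probability (%)")
lines(Neg.post ~ Neg.pre, data = test, type = "b", pch = 19, col = "red")
abline(0, 1, lty = 3)   # reference: a test result that changes nothing
legend("bottomright", legend = c("Positive test", "Negative test"),
       col = c("black", "red"), lty = 1, pch = 19)
```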
Can anyone think of a way to add, to a 2D scatterplot, a third dimension that houses distinct distributions for Y|X=120, Y|X=140, and Y|X=160? I'm trying to include theoretical standard normals for starters (but would eventually like to include the empirical distributions).
For reference, here's a ggplot2 depiction of the 2D scatterplot
df <- data.frame(x = c(replicate(5, 120), replicate(7, 140), replicate(6, 160)),
y = c(c(79, 84, 90, 94, 98), c(80, 93, 95, 103, 108, 113, 115),
c(102, 107, 110, 116, 118, 125)))
library(dplyr)
df <- df %>% group_by(x) %>% mutate(gp.mn = mean(y))
library(ggplot2)
( ggplot(df, aes(x = x)) + geom_point(aes(y = y)) + geom_line(aes(y = gp.mn)))
I'm essentially trying to replicate an image I created in .tpx:
I'm not tied to any particular 3D package, but plot3Drgl can be used to generate a 2D plot similar to the one above:
library(plot3Drgl)
scatter2Drgl(df$x, df$y, xlab = "x", ylab = "y")
scatter2Drgl(df$x, df$gp.mn, type = "l", add = TRUE, lwd = 4)
My hope was to use the 2D plot as a building block for a pseudo-3D rgl plot, however, incorporating the distributions into a third dimension (rgl or otherwise) is eluding me. Any thoughts?
Maybe this will help. (I've never been very happy with the ggplot paradigm, so I'm showing a base graphics version that someone can translate.) I also thought adding the group means to the df object confused things, so I'm only using the original df.
aggregate(y~x,df, FUN=function(y) c(mn=mean(y),sd=sd(y)) )
#--------
    x       y.mn       y.sd
1 120  89.000000   7.615773
2 140 101.000000  12.476645
3 160 113.000000   8.294577
#----------
png(); plot(df, xlim=c(110,170) )
lines( x= 120 - 100*dnorm(seq(89-2*7.6,89+2*7.6,length=25), 89, 7.6),
y= seq(89-2*7.6,89+2*7.6,length=25) )
lines( x=140 - 100*dnorm(seq(101-2*12.5,101+2*12.5,length=25), 101, 12.5),
y= seq(101-2*12.5,101+2*12.5,length=25) );dev.off()
The basic strategy is to reverse the argument order (and expand the distribution value by multiplying by a factor on the scale of the plotted points) and then "translate" the distributions so they are adjacent to the points they are derived from.
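The same strategy can be wrapped in a small helper so that every group gets its curve without hard-coding means and standard deviations. This is only a sketch: side_density and its scale and k arguments are hypothetical names, and the defaults (100 and 2) simply mirror the numbers used above.

```r
# Coordinates of a normal density drawn sideways, to the left of x = at.
# `scale` stretches the density into x-axis units; `k` is the half-width
# of the curve in standard deviations.
side_density <- function(y, at, scale = 100, k = 2, n = 25) {
  m <- mean(y); s <- sd(y)
  yy <- seq(m - k * s, m + k * s, length.out = n)
  data.frame(x = at - scale * dnorm(yy, m, s), y = yy)
}

df <- data.frame(x = rep(c(120, 140, 160), times = c(5, 7, 6)),
                 y = c(79, 84, 90, 94, 98,
                       80, 93, 95, 103, 108, 113, 115,
                       102, 107, 110, 116, 118, 125))

# One curve per distinct x value, named "120", "140" and "160"
curves <- Map(side_density, split(df$y, df$x), sort(unique(df$x)))

# To draw them:
# plot(df, xlim = c(110, 170))
# for (cv in curves) lines(cv)
```

Replacing dnorm() with the output of density() on each group would be one way to move from theoretical to empirical distributions.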
I have the following kind of data: on a rectangular piece of land (120x50 yards), there are 6 (also rectangular) smaller areas, each with a different kind of plant. The idea is to study the attractiveness of the various kinds of plant to birds. Each time a bird sits down somewhere on the land, I have the exact coordinates of where the bird sits down.
I don't care exactly where the bird sits down, but only care which of the six areas it is. To show the relative preference of birds for the various plants, I want to make a heatmap that makes the areas that are frequented most the darkest.
So, I need to convert the coordinates to code which area the bird visits, and then create a heatmap that shows the differential preference for each land area.
(the research is a bit more involved than this, but this is the general idea.)
How would I do this in R? Is there an R function that takes a vector of coordinates and turns it into such a heatmap? If not, do you have some hints on how to do this?
Not meant to be the answer you are looking for, but might give you some inspiration.
# Simulate some data
birdieLandingSimulator <- data.frame(t(sapply(1:100, function(x) c(runif(1, -10,10), runif(1, -10,10)))))
# Assign each point to a quadrant; this ended up being used only for the point colors
assignCoord <- function(x)
{
# Number the four quadrants clockwise from the upper right: 1, 2, 3, 4
ifelse(all(x>0), 1, ifelse(!sum(x>0), 3, ifelse(x[1]>0, 2, 4)))
}
birdieLandingSimulator <- cbind(birdieLandingSimulator, Q = apply(birdieLandingSimulator, 1, assignCoord))
# Plot
require(ggplot2)
ggplot(birdieLandingSimulator, aes(x = X1, y = X2)) +
stat_density2d(geom="tile", aes(fill = 1/..density..), contour = FALSE) +
geom_point(aes(color = factor(Q))) + theme_classic() +
theme(axis.title = element_blank(),
axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank()) +
scale_color_discrete(guide = "none", h=c(180, 270)) +
scale_fill_continuous(name = "Birdie Landing Location")
Use ggplot2. Take a look at the examples for geom_bin2d. It's pretty simple to get 2d bins. Notice that you pass in binwidth for both x and y:
> df = data.frame(x=c(1,2,4,6,3,2,4,2,1,7,4,4),y=c(2,1,4,2,4,4,1,4,2,3,1,1))
> ggplot(df,aes(x=x, y=y,alpha=0.5)) + geom_bin2d(binwidth=c(2,2))
If you don't want to use ggplot, you can use the cut function to separate your data into bins.
# Test data.
x <- sample(1:120, 100, replace=T)
y <- sample(1:50, 100, replace=T)
# Separate the data into bins.
x <- cut(x, c(0, 40, 80, 120))
y <- cut(y, c(0, 25, 50))
# Now plot it, suppressing reordering.
heatmap(table(y, x), Colv=NA, Rowv=NA)
Alternatively, to actually plot the regions in their true geographic location, you could draw the boxes yourself with rect. You would have to count the number of points in each region.
# Test data.
x <- sample(1:120, 100, replace=T)
y <- sample(1:50, 100, replace=T)
regions <- data.frame(xleft=c(0, 40, 40, 80, 0, 80),
ybottom=c(0, 0, 15, 15, 30, 40),
xright=c(40, 120, 80, 120, 80, 120),
ytop=c(30, 15, 30, 40, 50, 50))
# Color gradient.
col <- colorRampPalette(c("white", "red"))(30)
# Make the plot.
plot(NULL, xlim=c(0, 120), ylim=c(0, 50), xlab="x", ylab="y")
apply(regions, 1, function (r) {
count <- sum(x >= r["xleft"] & x < r["xright"] & y >= r["ybottom"] & y < r["ytop"])
rect(r["xleft"], r["ybottom"], r["xright"], r["ytop"], col=col[count])
text( (r["xright"]+r["xleft"])/2, (r["ytop"]+r["ybottom"])/2, count)
})
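The question also asks how to turn coordinates into an area code. A minimal sketch of that step follows, reusing the region rectangles defined above. The helper region_of and the simulated landings are made up for illustration, and the colour index is rescaled so that zero counts and counts above 30 still map to a valid palette entry.

```r
# Region rectangles from the answer above; region ids 1..6 are the row numbers
regions <- data.frame(xleft   = c(0, 40, 40, 80, 0, 80),
                      ybottom = c(0, 0, 15, 15, 30, 40),
                      xright  = c(40, 120, 80, 120, 80, 120),
                      ytop    = c(30, 15, 30, 40, 50, 50))

# Simulated landings (made up); runif() does not return the interval endpoints
set.seed(1)
x <- runif(100, 0, 120)
y <- runif(100, 0, 50)

# Index of the first region containing a point, or NA if none does
region_of <- function(px, py) {
  hit <- which(px >= regions$xleft & px < regions$xright &
               py >= regions$ybottom & py < regions$ytop)
  if (length(hit)) hit[1] else NA_integer_
}
area <- mapply(region_of, x, y)

# Visits per region, keeping regions with zero visits
counts <- tabulate(area, nbins = nrow(regions))

# Rescale counts onto the palette so every region gets a colour
pal <- colorRampPalette(c("white", "red"))(30)
shade <- pal[1 + floor((length(pal) - 1) * counts / max(counts))]
```

The vector shade can then be passed as col to rect(), one entry per row of regions.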