I am new to R, and I am trying to create a conditional probability plot, with pre-test probability on the x axis and post-test probability on the y axis, similar to the one in the linked conditional probability plot. I need to plot points for a positive test and join them together with a line, and plot points for a negative test and join them together with a line, on the same graph.
I have the data:
pre_neg <- c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100) # pre-test prob for negative test
post_neg <- c(0, 3, 7, 11, 17, 22, 30, 40, 53, 72, 100) # post-test prob for negative test
pre_pos <- c(0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100) # pre-test prob for positive test
post_pos <- c(0, 38, 57, 69, 77, 83, 88, 94, 95, 98, 100) # post-test prob for positive test
However, I am unsure how best to organise the data, or what code would produce the graph that I need. I have searched for "conditional probability plots" but haven't found anything helpful.
Any guidance would be much appreciated.
Thanks, Laura
The best way to organise the data is inside a data.frame:
test = data.frame(Pos.pre = a, Pos.post = b, Neg.pre = c, Neg.post = d)
(Assuming your individual data was called a, b, c, d.)
Now you can plot, e.g. positive post vs pre:
plot(Pos.post ~ Pos.pre, data = test, type = 'l')
(type = 'l' makes this a line plot.)
And you can add the negative results using the lines function, which adds data to an existing plot:
lines(Neg.post ~ Neg.pre, test, col = 'red')
Here, I’ve taken the liberty of making the second line red. Take a look at the documentation of plot, lines and par for many more options.
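Putting the pieces together with the data from the question, a minimal base-graphics sketch might look like the following. The axis labels, the dashed diagonal and the legend are additions for illustration only and are not part of the answer above.

```r
# Data from the question (percentages)
pre      <- seq(0, 100, by = 10)
pos_post <- c(0, 38, 57, 69, 77, 83, 88, 94, 95, 98, 100)
neg_post <- c(0, 3, 7, 11, 17, 22, 30, 40, 53, 72, 100)

test <- data.frame(Pos.pre = pre, Pos.post = pos_post,
                   Neg.pre = pre, Neg.post = neg_post)

# type = "b" draws both the points and the connecting line
plot(Pos.post ~ Pos.pre, data = test, type = "b",
     xlab = "Pre-test probability (%)", ylab = "Post-test probability (%)")
lines(Neg.post ~ Neg.pre, data = test, type = "b", col = "red")

# Assumption: a dashed reference line where the test result changes nothing
abline(0, 1, lty = 2)
legend("bottomright", legend = c("Positive test", "Negative test"),
       col = c("black", "red"), lty = 1, pch = 1)
```

Using type = "b" instead of type = "l" keeps the individual points visible, which is what the question asked for.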
Once you have the time, I strongly urge you to learn using the ggplot2 library, which makes these kinds of plots more flexible. Case in point, with ggplot2 we could create the above plot in a single, extensible command:
library(ggplot2)
ggplot(test) +
geom_line(aes(x = Pos.pre, y = Pos.post)) +
geom_line(aes(x = Neg.pre, y = Neg.post), color = 'red')
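One possible variation, sketched here under the assumption that a legend is wanted: if the data is stored in "long" form (one row per point, plus a column saying which test result it belongs to), ggplot2 can colour the lines and build the legend itself. The column names pre, post and result are made up for this example.

```r
library(ggplot2)

pre <- seq(0, 100, by = 10)

# One row per point; `result` says which curve the point belongs to
long <- data.frame(
  pre    = rep(pre, times = 2),
  post   = c(0, 38, 57, 69, 77, 83, 88, 94, 95, 98, 100,
             0, 3, 7, 11, 17, 22, 30, 40, 53, 72, 100),
  result = rep(c("positive", "negative"), each = length(pre))
)

p <- ggplot(long, aes(x = pre, y = post, colour = result)) +
  geom_line() +
  geom_point() +
  labs(x = "Pre-test probability (%)", y = "Post-test probability (%)")
print(p)
```

With this layout, adding a third curve only means adding rows, not another geom_line call.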
Related
I am calculating the most common flight path of birds over a given area (airport). I know their position (distance from me) and their flight angle. I am situated at a particular point and birds are flying around me. I make the assumption that all the birds are flying in a straight line.
How can I know what is the most common flight path over the area?
Example of flight angles:
direction <- c(35, 70, 300, 260, 340, 130, 240, 40, 190, 190, 150, 20)
I plotted their position given the distance and angle from me. Then I added their flight angle and a made up distance of their flight just to see the flight-path (1.5 km).
As you might see it is a bit chaotic but I would like to know roughly if birds are flying more frequently in some range of angles (20-30° range) or if it is all random.
Would a simple count of data points per category be enough information? You can use "cut" to do this based on the categories that you define. E.g.
library(dplyr)
direction <- c(35, 70, 300, 260, 340, 130, 240, 40, 190, 190, 150, 20)
categoryBreaks <- seq(0, 360, by = 20)
catDirection <- data.frame(direction) %>%
arrange(direction) %>%
mutate(category = cut(direction, categoryBreaks))
And plotting this:
ggplot(catDirection) +
geom_bar(aes(category)) +
xlab("Angle of Flight") +
ylab("Count of birds") +
theme_light() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Do you need something more complex?
There are many ways. For example, you can compare the frequencies: flights within 20-30 range vs other ranges. Or maybe you can also express all flights as distance from that range and plot it or look for correlation.
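As a rough sketch of the "compare the frequencies" idea, one could count the flights inside a candidate sector and compare that count with what uniformly random headings would give, using base R's binom.test. The sector limits below are hypothetical, and with only 12 flights the test has very little power, so treat it as an illustration rather than an analysis.

```r
direction <- c(35, 70, 300, 260, 340, 130, 240, 40, 190, 190, 150, 20)

# Hypothetical sector of interest, in degrees (both ends included)
sector <- c(20, 40)

in_sector <- sum(direction >= sector[1] & direction <= sector[2])
n <- length(direction)

# Under uniformly random headings, the expected share of flights
# is the sector's width divided by 360
p_uniform <- diff(sector) / 360

# One-sided test: are there more flights in the sector than chance suggests?
result <- binom.test(in_sector, n, p = p_uniform, alternative = "greater")
print(result)
```

Note that picking the sector after looking at the plot makes the p-value optimistic; ideally the sector is chosen beforehand.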
I'm trying to animate a stacked line chart in ggplot2.
Here's the plot I'd like to animate:
Here's the code to generate a similar plot:
#Data
mydata <- data.frame(year=rep(1:6, times=4),
activity=as.factor(rep(c("research","coursework","clinical work","teaching"), each=6)),
time=c(40, 35, 40, 60, 85, 90,
50, 40, 10, 0, 5, 0,
5, 20, 20, 40, 10, 10,
5, 5, 30, 0, 0, 0))
mydata$activity <- ordered(mydata$activity, levels = c("research","clinical work","coursework","teaching"))
labels <- data.frame(activity=c("research","coursework","clinical work","teaching"),
xaxis=c(5, 1.8, 2.5, 2.97),
yaxis=c(25, 70, 48, 90))
#Plot
ggplot(mydata, aes(x=year, y=time, fill=activity)) +
geom_area(stat="smooth", span=.35, color="black") +
theme(legend.position = "none") +
geom_text(data=labels, aes(x=xaxis, y=yaxis, label=activity)) +
ggtitle("Time in Different Activities by Year in Program") +
ylab("Percentage of Time") +
xlab("Year in Program")
I'm looking for the first image to display all axes and text. The second iteration, I'd like to gradually reveal over time, from left to right, the "Research" stacked line (including color and border). The third iteration, I'd like to gradually reveal, from left to right, the "Clinical Work" stacked line. Fourth, the "Coursework" stacked line. And finally, the "Teaching" stacked line.
Ideally, the output format would be very smooth (no jagged jumps) and would be compatible with PowerPoint.
Here is an R-based solution. It saves individual figures (.png) that can be iterated through within a presentation.
Alternatively, you could create an animation (for example by converting to .gif) using ImageMagick http://www.imagemagick.org/
library(ggplot2)
#Data
mydata <- data.frame(year=rep(1:6, times=4),
activity=as.factor(rep(c("research","coursework","clinical work","teaching"), each=6)),
time=c(40, 35, 40, 60, 85, 90,
50, 40, 10, 0, 5, 0,
5, 20, 20, 40, 10, 10,
5, 5, 30, 0, 0, 0))
#order the activities and then the dataframe
mydata$activity <- ordered(mydata$activity, levels = c("research","clinical work","coursework","teaching"))
mydata <- mydata[order(mydata$activity),]
#labels
labels <- data.frame(activity=c("research","coursework","clinical work","teaching"),
xaxis=c(5, 1.8, 2.5, 2.97),
yaxis=c(25, 70, 48, 90))
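#create a function that draws the plot for the first `leg` activities (leg = 0 draws only axes and labels)
draw.stacks<-function(leg){
int <- leg*6
#seq_len(0) selects no rows, whereas 1:0 would select row 1
a<-ggplot(data=mydata[seq_len(int),], aes(x=year, y=time, fill=activity))+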
geom_area(stat="smooth", span=.35, color="black") +
theme_bw()+
scale_fill_discrete(limits = c("research","clinical work","coursework","teaching"), guide="none")+
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank()) +
coord_cartesian(xlim=c(1,6),ylim=c(0,100))+
geom_text(data=labels, aes(x=xaxis, y=yaxis, label=activity)) +
ggtitle("Time in Different Activities by Year in Program") +
ylab("Percentage of Time") +
xlab("Year in Program")
print(a)
}
# save individual png figures
for (i in 0:4) {
png(paste("activity", i, "png", sep="."))
draw.stacks(i)
dev.off()
}
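For the .gif route mentioned above, a hedged sketch of calling ImageMagick from R could look like this. It assumes that ImageMagick's convert command is installed and on the PATH, and the output name activity.gif is made up for the example.

```r
# File names produced by the loop above: activity.0.png ... activity.4.png
frames <- sprintf("activity.%d.png", 0:4)

# -delay is given in hundredths of a second; -loop 0 repeats forever
args <- c("-delay", "100", "-loop", "0", frames, "activity.gif")

# Assumption: `convert` is available and the PNG files already exist;
# otherwise nothing is run
if (nzchar(Sys.which("convert")) && all(file.exists(frames))) {
  system2("convert", args)
}
```

This only steps through the five saved frames; it does not produce the smooth left-to-right reveal asked for in the question.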
Sorry for bringing in a non-programmer solution, but I would simply generate plots for each iteration separately, put them in power point (one plot on one slide), and use some fancy slide transition effects (I tried the Random Bars effect on your example, and it looked nice).
If you are determined to find an R-based solution, you can take a look at the animate package (see a Strategic Zombie Simulation example here).
Can anyone think of a way to add, to a 2D scatterplot, a third dimension that houses distinct distributions for Y|X=120, Y|X=140, and Y|X=160? I'm trying to include theoretical standard normals for starters (but would eventually like to include the empirical distributions).
For reference, here's a ggplot2 depiction of the 2D scatterplot
df <- data.frame(x = c(replicate(5, 120), replicate(7, 140), replicate(6, 160)),
y = c(c(79, 84, 90, 94, 98), c(80, 93, 95, 103, 108, 113, 115),
c(102, 107, 110, 116, 118, 125)))
library(dplyr)
df <- df %>% group_by(x) %>% mutate(gp.mn = mean(y))
library(ggplot2)
( ggplot(df, aes(x = x)) + geom_point(aes(y = y)) + geom_line(aes(y = gp.mn)))
I'm essentially trying to replicate an image I created in .tpx:
I'm not tied to any particular 3D package, but plot3Drgl can be used to generate a 2D plot similar to the one above:
library(plot3Drgl)
scatter2Drgl(df$x, df$y, xlab = "x", ylab = "y")
scatter2Drgl(df$x, df$gp.mn, type = "l", add = TRUE, lwd = 4)
My hope was to use the 2D plot as a building block for a pseudo-3D rgl plot, however, incorporating the distributions into a third dimension (rgl or otherwise) is eluding me. Any thoughts?
Maybe this will help. (I've never been very happy with the ggplot paradigm, so I'm showing a base graphics version that someone can translate.) I also thought adding the group means to the df object confused things, so I'm only using the original df.
aggregate(y~x,df, FUN=function(y) c(mn=mean(y),sd=sd(y)) )
#--------
x y.mn y.sd
1 120 89.000000 7.615773
2 140 101.000000 12.476645
3 160 113.000000 8.294577
#----------
png(); plot(df, xlim=c(110,170) )
lines( x= 120 - 100*dnorm(seq(89-2*7.6,89+2*7.6,length=25), 89, 7.6),
y= seq(89-2*7.6,89+2*7.6,length=25) )
lines( x=140 - 100*dnorm(seq(101-2*12.5,101+2*12.5,length=25), 101, 12.5),
y= seq(101-2*12.5,101+2*12.5,length=25) );dev.off()
The basic strategy is to reverse the argument order (and expand the distribution value by multiplying by a factor on the scale of the plotted points) and then "translate" the distributions so they are adjacent to the points they are derived from.
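The same strategy can be written as a loop so that every group gets its sideways normal curve, including X = 160. This is only a sketch: the scale factor of 100 and the range of two standard deviations follow the code above, while the variable names and the axis limits are my own choices.

```r
df <- data.frame(x = rep(c(120, 140, 160), times = c(5, 7, 6)),
                 y = c(79, 84, 90, 94, 98,
                       80, 93, 95, 103, 108, 113, 115,
                       102, 107, 110, 116, 118, 125))

groups <- split(df$y, df$x)
mn <- sapply(groups, mean)   # group means, named "120", "140", "160"
s  <- sapply(groups, sd)     # group standard deviations
stretch <- 100               # assumption: scale factor chosen by eye

plot(df, xlim = c(110, 170), ylim = c(70, 135))
for (g in names(mn)) {
  # y values covering the group mean plus/minus two standard deviations
  yy <- seq(mn[[g]] - 2 * s[[g]], mn[[g]] + 2 * s[[g]], length.out = 25)
  # The density becomes a horizontal offset to the left of the group's x value
  lines(x = as.numeric(g) - stretch * dnorm(yy, mn[[g]], s[[g]]), y = yy)
}
```

Swapping dnorm for a kernel density estimate of each group would be one way to move towards the empirical distributions mentioned in the question.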
I have the following kind of data: on a rectangular piece of land (120x50 yards), there are 6 smaller (also rectangular) areas, each with a different kind of plant. The idea is to study the attractiveness of the various kinds of plant to birds. Each time a bird sits down somewhere on the land, I have the exact coordinates of where the bird sits down.
I don't care exactly where the bird sits down, but only care which of the six areas it is. To show the relative preference of birds for the various plants, I want to make a heatmap that makes the areas that are frequented most the darkest.
So, I need to convert the coordinates to code which area the bird visits, and then create a heatmap that shows the differential preference for each land area.
(the research is a bit more involved than this, but this is the general idea.)
How would I do this in R? Is there an R function that takes a vector of coordinates and turns it into such a heatmap? If not, do you have some hints on how to do this?
Not meant to be the answer you are looking for, but might give you some inspiration.
# Simulate some data
birdieLandingSimulator <- data.frame(t(sapply(1:100, function(x) c(runif(1, -10,10), runif(1, -10,10)))))
# Assign some coordinates, which ended up not really being used much at all, except for the point colors
assignCoord <- function(x)
{
# Assign the four coordinates clockwise: 1, 2, 3, 4
ifelse(all(x>0), 1, ifelse(!sum(x>0), 3, ifelse(x[1]>0, 2, 4)))
}
birdieLandingSimulator <- cbind(birdieLandingSimulator, Q = apply(birdieLandingSimulator, 1, assignCoord))
# Plot
require(ggplot2)
ggplot(birdieLandingSimulator, aes(x = X1, y = X2)) +
stat_density2d(geom="tile", aes(fill = 1/after_stat(density)), contour = FALSE) +
geom_point(aes(color = factor(Q))) + theme_classic() +
theme(axis.title = element_blank(),
axis.line = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank()) +
scale_color_discrete(guide = "none", h=c(180, 270)) +
scale_fill_continuous(name = "Birdie Landing Location")
Use ggplot2. Take a look at the examples for geom_bin2d. It's pretty simple to get 2d bins. Notice that you pass in binwidth for both x and y:
df = data.frame(x=c(1,2,4,6,3,2,4,2,1,7,4,4),y=c(2,1,4,2,4,4,1,4,2,3,1,1))
ggplot(df,aes(x=x, y=y,alpha=0.5)) + geom_bin2d(binwidth=c(2,2))
If you don't want to use ggplot, you can use the cut function to separate your data into bins.
# Test data.
x <- sample(1:120, 100, replace=TRUE)
y <- sample(1:50, 100, replace=TRUE)
# Separate the data into bins.
x <- cut(x, c(0, 40, 80, 120))
y <- cut(y, c(0, 25, 50))
# Now plot it, suppressing reordering.
heatmap(table(y, x), Colv=NA, Rowv=NA)
Alternatively, to actually plot the regions in their true geographic location, you could draw the boxes yourself with rect. You would have to count the number of points in each region.
# Test data.
x <- sample(1:120, 100, replace=TRUE)
y <- sample(1:50, 100, replace=TRUE)
regions <- data.frame(xleft=c(0, 40, 40, 80, 0, 80),
ybottom=c(0, 0, 15, 15, 30, 40),
xright=c(40, 120, 80, 120, 80, 120),
ytop=c(30, 15, 30, 40, 50, 50))
# Color gradient.
col <- colorRampPalette(c("white", "red"))(30)
# Make the plot.
plot(NULL, xlim=c(0, 120), ylim=c(0, 50), xlab="x", ylab="y")
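invisible(apply(regions, 1, function (r) {
count <- sum(x >= r["xleft"] & x < r["xright"] & y >= r["ybottom"] & y < r["ytop"])
# Clamp the colour index to 1..length(col): a count of 0 or above 30 would otherwise select no colour.
shade <- col[max(1, min(count, length(col)))]
rect(r["xleft"], r["ybottom"], r["xright"], r["ytop"], col=shade)
text( (r["xright"]+r["xleft"])/2, (r["ytop"]+r["ybottom"])/2, count)
}))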
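If the region code itself is needed for each landing (for example to feed into a later analysis), the same rectangle test can be turned into a small helper. This is a sketch only: region_of is a made-up name, and the three landings are invented to show the idea.

```r
regions <- data.frame(xleft   = c(0, 40, 40, 80, 0, 80),
                      ybottom = c(0, 0, 15, 15, 30, 40),
                      xright  = c(40, 120, 80, 120, 80, 120),
                      ytop    = c(30, 15, 30, 40, 50, 50))

# Return the row index of the region containing each point (NA if none)
region_of <- function(x, y, regions) {
  idx <- rep(NA_integer_, length(x))
  for (i in seq_len(nrow(regions))) {
    inside <- x >= regions$xleft[i] & x < regions$xright[i] &
              y >= regions$ybottom[i] & y < regions$ytop[i]
    idx[inside] <- i
  }
  idx
}

# Three invented landings
landing <- region_of(x = c(10, 100, 60), y = c(10, 5, 45), regions)

# Visits per region, keeping regions that were never visited
visits <- table(factor(landing, levels = seq_len(nrow(regions))))
print(visits)
```

Because the upper bounds are exclusive, a point exactly on the right or top edge of the land (x = 120 or y = 50) is returned as NA and would need separate handling.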
I am trying to draw a least squares regression line using abline(lm(...)) that is also forced to pass through a particular point. I see this question is related, but not quite what I want. Here's an example:
test <- structure(list(x = c(0, 9, 27, 40, 52, 59, 76), y = c(50, 68,
79, 186, 175, 271, 281)), .Names = c("x", "y"))
# set up an example plot
plot(test,pch=19,ylim=c(0,300),
panel.first=abline(h=c(0,50),v=c(0,10),lty=3,col="gray"))
# standard line of best fit - black line
abline(lm(y ~ x, data=test))
# force through [0,0] - blue line
abline(lm(y ~ x + 0, data=test), col="blue")
This looks like:
Now how would I go about forcing a line through the marked arbitrary point of (x=10,y=50) while still minimising the distance to the other points?
# force through [10,50] - red line
??
A rough solution would be to shift the origin for your model to that point and create a model with no intercept
nmod <- (lm(I(y-50)~I(x-10) +0, test))
abline(predict(nmod, newdata = list(x=0))+50, coef(nmod), col='red')
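You can modify the formula for lm() and offset the data. Note that abline() treats a model with a single coefficient as a line through the origin, so the intercept has to be shifted back explicitly:
p=10
q=50
b <- coef(lm(I(y-q) ~ I(x-p) + 0, data=test))
abline(a = q - b*p, b = b, col="red")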
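To see why shifting the origin works, the constrained slope can also be computed by hand: for a line forced through (p, q), least squares gives slope b = sum((x - p) * (y - q)) / sum((x - p)^2), and the intercept on the original axes is q - b * p. The sketch below checks this against the lm() fit; the variable names a, b and b_lm are only for illustration.

```r
test <- list(x = c(0, 9, 27, 40, 52, 59, 76),
             y = c(50, 68, 79, 186, 175, 271, 281))
p <- 10
q <- 50

# Least-squares slope for a line constrained to pass through (p, q)
b <- sum((test$x - p) * (test$y - q)) / sum((test$x - p)^2)
a <- q - b * p   # intercept on the original axes

# Same slope from the shifted, no-intercept model
b_lm <- unname(coef(lm(I(y - q) ~ I(x - p) + 0, data = test)))

stopifnot(isTRUE(all.equal(b, b_lm)))       # the two slopes agree
stopifnot(isTRUE(all.equal(a + b * p, q)))  # the line passes through (p, q)
```

The line can then be drawn on the existing plot with abline(a, b, col = "red").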