I am trying to visualize the trajectory of multiple participants in a virtual room using R. I have a participant entering from the right (black square) and moving toward the left, where there is an exit door (red square). Sometimes there is an obstacle right in the middle of the room (circle), and the participant goes around it.
To visualize multiple participants’ trajectories on the same graph (i.e., multiple lines), I have used the function plot to set up the plot itself (and the first line) and then I have used the function lines to add other trajectories after that.
Below you can see an example with two lines; in the experiment, I have many more (as now I have collected data from about 20 participants.)
library(shape)
# black line
pos_x <- c(5.04,4.68,4.39,4.09,3.73,3.37,3.07,2.77,2.47,2.11)
pos_z <- c(0.74,0.69,0.64,0.60,0.56,0.52,0.50,0.50,0.50,0.51)
df1 <- cbind.data.frame(pos_x,pos_z)
x.2 <- df1$pos_x
z.2 <- df1$pos_z
plot(x.2,z.2,type="l", xlim=range(x.2), ylim=c(-1,3.5), xlab="x", ylab="z", main = "Two trajectories")
filledrectangle(wx = 0.2, wy = 0.2,col = "black", mid = c(5.16, 1), angle = 0)
filledrectangle(wx = 0.2, wy = 0.2,col = "red", mid = c(2, 1), angle = 0)
plotcircle(mid = c(3.4, 1), r = 0.05)
# red line
pos_x <- c(5.14,4.84,4.24,3.64,3.34,2.74,2.15)
pos_z <- c(0.17,0.13,0.01,-0.2,0.01,0.10,0.17)
df2 <- cbind.data.frame(pos_x,pos_z)
x.3 <- df2$pos_x
z.3 <- df2$pos_z
lines(x.3, z.3, col="red") # lines() adds to the existing plot; axis limits are set by plot()
What I would like to do now is to find the average between these two lines. Ideally, I would like to be able to average multiple lines and add an interval for the standard deviation.
The first thing I have tried is to build an interpolation; the problem is that the start and end point are different, so I cannot average the points:
plot(x.2, z.2, xlim=range(x.2), ylim=c(-1,3.5), xlab="x", ylab="z", main = "Interpolation")
points(approx(x.2, z.2), col = 2, pch = "*")
points(x.3, z.3)
points(approx(x.3, z.3), col = 2, pch = "*")
I have then found a suggestion here: use the R library dtw.
I have looked up the library and the companion paper.
This is a typical example from the paper, in which "two non-overlapping windows" are extracted from a reference electrocardiogram. The dataset "aami3a" is a time series object.
library("dtw")
data("aami3a")
ref <- window(aami3a, start = 0, end = 2)
test <- window(aami3a, start = 2.7, end = 5)
alignment <- dtw(test, ref)
alignment$distance
The problem is that in all these examples the data is either structured as a time series object or the two lines are functions of a common matrix (see also the R quickstart example in the documentation and this other tutorial.)
How can I reorganize my data to make the function work? Or do you know of any other way to create an average?
You could map equivalent points from the start to the end of each path, i.e. find the midpoint between the two lines at the start of each path, after a quarter of each path is complete, after a half, at the end, and so on.
The way to do that is to use interpolation (via approx):
pos_x_a <- c(5.04,4.68,4.39,4.09,3.73,3.37,3.07,2.77,2.47,2.11)
pos_z_a <- c(0.74,0.69,0.64,0.60,0.56,0.52,0.50,0.50,0.50,0.51)
pos_x_b <- c(5.14,4.84,4.24,3.64,3.34,2.74,2.15)
pos_z_b <- c(0.17,0.13,0.01,-0.2,0.01,0.10,0.17)
pos_t_a <- seq(0, 1, length.out = length(pos_x_a))
pos_t_b <- seq(0, 1, length.out = length(pos_x_b))
a_x <- approx(pos_t_a, pos_x_a, seq(0, 1, 0.01))$y
a_y <- approx(pos_t_a, pos_z_a, seq(0, 1, 0.01))$y
b_x <- approx(pos_t_b, pos_x_b, seq(0, 1, 0.01))$y
b_y <- approx(pos_t_b, pos_z_b, seq(0, 1, 0.01))$y
plot(a_x, a_y, type = "l", ylim = c(-1, 3))
lines(b_x, b_y, col = "red")
lines((a_x + b_x)/2, (a_y + b_y)/2, col = "blue", lty = 2)
We get a better idea of how this averaging has occurred by joining the points on each line that were used to get the average:
for(i in seq_along(a_x)) segments(a_x[i], a_y[i], b_x[i], b_y[i], col = "gray")
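The same idea extends to more than two paths, which is what the question ultimately asks for. Below is a minimal sketch, not a definitive method: resample_path() is a hypothetical helper, the third path is made-up data added only for illustration, and the standard deviation is taken in the z direction only, which is an assumption about what the interval should represent.

```r
# Hypothetical helper: resample one path to n points along a 0-1 "progress" axis
resample_path <- function(x, z, n = 101) {
  t_in  <- seq(0, 1, length.out = length(x))
  t_out <- seq(0, 1, length.out = n)
  cbind(x = approx(t_in, x, t_out)$y,
        z = approx(t_in, z, t_out)$y)
}

paths <- list(
  list(x = c(5.04,4.68,4.39,4.09,3.73,3.37,3.07,2.77,2.47,2.11),
       z = c(0.74,0.69,0.64,0.60,0.56,0.52,0.50,0.50,0.50,0.51)),
  list(x = c(5.14,4.84,4.24,3.64,3.34,2.74,2.15),
       z = c(0.17,0.13,0.01,-0.2,0.01,0.10,0.17)),
  # made-up third participant, for illustration only
  list(x = c(5.10,4.50,3.90,3.30,2.70,2.10),
       z = c(0.45,0.40,0.55,0.60,0.40,0.35))
)

res <- lapply(paths, function(p) resample_path(p$x, p$z))
xs  <- sapply(res, function(m) m[, "x"])  # one column per participant
zs  <- sapply(res, function(m) m[, "z"])

mean_x <- rowMeans(xs)
mean_z <- rowMeans(zs)
sd_z   <- apply(zs, 1, sd)                # spread across participants, z only

plot(NA, xlim = range(xs), ylim = c(-1, 3.5), xlab = "x", ylab = "z",
     main = "Mean trajectory with 1 SD band")
# shaded band: mean z plus/minus one standard deviation
polygon(c(mean_x, rev(mean_x)),
        c(mean_z + sd_z, rev(mean_z - sd_z)),
        col = adjustcolor("blue", alpha.f = 0.2), border = NA)
for (j in seq_along(res)) lines(xs[, j], zs[, j], col = "gray40")
lines(mean_x, mean_z, col = "blue", lwd = 2)
```

Because every path is stretched onto the same 0-1 axis, this treats "fraction of recorded samples" as equivalent progress for all participants; if sampling rates or walking speeds differ a lot, parameterising by cumulative path length may be a better choice.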
I have a raster dataset that I created with idw. I have plotted it with filledContour, but I want to reverse the x- and y-axes so that the numbers are decreasing, eliminate the white space, and vertically exaggerate the y-axis. Setting xlim and ylim as you would in ggplot or plot has not worked.
If there is no way to reverse the x- and y-axis of a raster dataset, how do I maintain the resolution of my s4 class dataset after converting to s3? For example, if I use filled.contour instead of filledContour.
Here is my code and plot, which is pretty basic because what I have tried has not produced any results:
idw.out <- gstat::idw(Z ~ 1, core2, grd, idp = 1.5)
r <- raster(idw.out[1])
r.contour <- filledContour(r)
r.contour
An example of the scale that I am looking for is below:
Cheers
I resolved my problem with the code listed below. Because I had S4 class data, I had to manually set the axis extent to coincide with the xlim and ylim. This step is not required for 'filled.contour' plots, but is for 'filledContour' plots.
library(raster) # raster(), filledContour()
library(sp)     # bpy.colors()
# interpolated grain-size data from core2
idw.out <- gstat::idw(Z ~ 1, core2, grd, idp = 1.5)
# conversion to raster
r=raster(idw.out[1], layer = 1, values=TRUE)
# add colour scheme and plot
b = c(0,5,10,15,20,30,40,50)
col = rev(bpy.colors(length(b)-1))
r.contour = filledContour(r, zlim=c(0,50), xlim=c(9,-4), ylim=c(5.3,0),
asp = NA, xaxs = "i", yaxs = "i", las = 1,
col=col, levels=b,
xlab="grain-size (phi)", ylab="core depth (m)", main="Core 2")
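For comparison, base R's filled.contour() accepts decreasing xlim and ylim directly. The following is a minimal sketch using the built-in volcano matrix as stand-in data (it is not the core data from the question), so the axis values and labels are placeholders.

```r
# volcano is a built-in 87 x 61 elevation matrix, used here only as stand-in data
x <- seq_len(nrow(volcano))
y <- seq_len(ncol(volcano))

filled.contour(x, y, volcano,
               xlim = rev(range(x)),  # decreasing x-axis
               ylim = rev(range(y)),  # decreasing y-axis
               asp = NA,              # let the plot fill the device
               xaxs = "i", yaxs = "i", las = 1,
               xlab = "x", ylab = "y", main = "Reversed axes")
```

With asp = NA, the vertical exaggeration is then controlled by the height and width of the graphics device rather than by the data.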
I'm trying to add color to specific points in my circular data based on group membership (I have two groups: one with individuals with a certain medical condition and another group of just healthy controls). I've converted their data from degrees to radians and put it on the plot, but I haven't managed to selectively change the color of the points based on the factor variable I have.
Note that I'm using the circular package, which doesn't allow me to use ggplot. Here's the syntax I've been working with:
plot(bcirc, stack=FALSE, bins=60, shrink= 1, col=w$dx, axes=FALSE, xlab ="Basal sCORT", ylab = "Basal sAA")
If you've noticed, I specified the factor variable (which has two levels) in the color section, but it just keeps putting everything in one color. Any suggestions?
Seems plot.circular does not like to assign multiple colours. Here's one potential work-around:
library(circular)
## simulate circular data
bcirc1 <- rvonmises(100, circular(90, units="degrees"), 10, control.circular=list(units="degrees")) # circular() defaults to radians
bcirc2 <- rvonmises(100, circular(0, units="degrees"), 10, control.circular=list(units="degrees"))
bcirc <- c(bcirc1, bcirc2)
dx <- c(rep(1,100),rep(2,100))
## start with blank plot, then add group-specific points
plot(bcirc, stack=FALSE, bins=60, shrink= 1, col=NA,
axes=FALSE, xlab ="Basal sCORT", ylab = "Basal sAA")
points(bcirc[dx==1], col=rgb(1,0,0,0.1), cex=2) # note: a loop would be cleaner if dealing with >2 levels
points(bcirc[dx==2], col=rgb(0,0,1,0.1), cex=2)
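The loop mentioned in the comment could look something like the sketch below. It assumes a three-level grouping factor and a named colour vector, both made up for illustration, and it uses wrapped normal draws as stand-in angles rather than von Mises samples.

```r
library(circular)

set.seed(42)
## stand-in angles in degrees for three hypothetical groups
deg    <- c(rnorm(50, 90, 15), rnorm(50, 0, 15), rnorm(50, 200, 15)) %% 360
angles <- circular(deg, units = "degrees")
grp    <- factor(rep(c("g1", "g2", "g3"), each = 50))

## one semi-transparent colour per factor level (names must match the levels)
pal <- c(g1 = rgb(1, 0, 0, 0.3),
         g2 = rgb(0, 0, 1, 0.3),
         g3 = rgb(0, 0.6, 0, 0.3))

## blank plot first, then one points() call per level
plot(angles, stack = FALSE, bins = 60, shrink = 1, col = NA, axes = FALSE)
for (g in levels(grp)) {
  points(angles[grp == g], col = pal[[g]], cex = 2)
}
```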
Inspired by Paul Regular's example, here is a version using the same data where one condition is plotted stacking inwards and the other is plotted stacking outwards.
library(circular)
## simulate circular data
bcirc1 <- rvonmises(100, circular(90, units = 'degrees'), 10, control.circular=list(units="degrees"))
bcirc2 <- rvonmises(100, circular(0, units = 'degrees'), 10, control.circular=list(units="degrees"))
bcirc <- data.frame(condition = c(
rep(1,length(bcirc1)),
rep(2,length(bcirc2)) ),
angles = c(bcirc1,
bcirc2) )
## start with blank plot, then add group-specific points
dev.new(); par(mai = c(1, 1, 0.1,0.1))
plot(circular(subset(bcirc, condition == 1)$angles, units = 'degrees'), stack=T, bins=60, shrink= 1, col=1,sep = 0.005, tcl.text = -0.073,#text outside
axes=T, xlab ="Basal sCORT", ylab = "Basal sAA")
par(new = T)
plot(circular(subset(bcirc, condition == 2)$angles, units = 'degrees'), stack=T, bins=60, shrink= 1.05, col=2,
sep = -0.005, axes=F)#inner circle, no axes, stacks inwards
I tried this question in stats.stackexchange and somebody suggested I try it over here, so here goes:
I've completed PCA analysis, in R with VEGAN package, of some ecological data on tree health. There are 80 trees total (so, 80 'sites') divided into four treatment categories. I've got the data plotted with color coded points--colors according to the treatment groups. Rather than plotting individual sites/trees on PCA biplot, I'd like to make something like a box-and-whisker plot that has four 'crosses' that show the centroid for each group and the SE in both PCA dimensions. I've seen figures like this in papers, but I can't seem to find an R script for plotting this way. Any suggestions? (I'd like to post an example image here of what I'm looking for, but the ones I can find are all paywalled, sorry).
I guess an alternative would be to just take the site scores and manually find the means and SE's and create my own plot, but I'd rather find a script for it, if possible.
The code I've been running is really straightforward:
p1<-princomp(scale(health, scale=T))
summary(p1)
scores(p1)
plot(p1)
loadings(p1)
biplot(p1, xlab = "PC 1 (38%)", ylab = "PC 2 (22%)",cex=0.6)
plot(p1$scores[,1],p1$scores[,2])
names(p1)
plot(p1$scores[,1],p1$scores[,2], type='n', xlab="PC I", ylab="PC II")
text(p1$scores[,1],p1$scores[,2], labels=Can$tree)
I would probably start with ordiellipse and see if that suits your needs.
### Reproducible example
require("vegan")
data(varespec)
data(varechem)
pca <- rda(varespec, scale = TRUE)
grp <- with(varechem, cut(Baresoil, 4, labels = 1:4))
cols <- c("red","orange","blue","forestgreen")
scl <- 1 ## scaling
plot(pca, display = "sites", scaling = scl, type = "n")
points(pca, display = "sites", scaling = scl, col = cols[grp], pch = 16)
lev <- levels(grp)
for (i in seq_along(lev)) { ## draw ellipse per group
ordiellipse(pca, display = "sites", kind = "se", scaling = scl,
groups = grp, col = cols[i], show.groups = lev[i])
}
## centroids
scrs <- as.data.frame(scores(pca, display = "sites", scaling = scl,
choices = 1:2))
cent <- do.call(rbind, lapply(split(scrs, grp), colMeans))
points(cent, col = cols, pch = 3, cex = 1.1)
This produces
You can remove the points() line from the code above to stop it drawing the actual samples, but I thought it instructive here in terms of understanding what ordiellipse is doing.
In the plot, the centroid is marked as the mean of the site scores on each axis by grouping grp. The ellipse is a continuous region that is (given the setting I chose in the ordiellipse() call) 1 standard error about this centroid. Your suggestion for error bars in each direction is a specific case of the ellipse drawn by ordiellipse --- if you were to compute the standard errors of the centroids, they should extend as far as the extremal points of the ellipse in the horizontal and vertical directions.
However, this would fail to take into account the covariances of the scores on the two axes. Note in the example below how, in those ellipses that are oriented at angles to the axes, the standard error bars do not intersect the ellipse at their extremal points. If you were to draw a box containing the region defined by the error bars, it would contain the ellipse, but it gives a very different impression of the uncertainty in the centroid.
serrFun <- function(df) {
apply(df, 2, function(x) sd(x) / sqrt(length(x)))
}
serr <- do.call(rbind, lapply(split(scrs, grp), serrFun))
for (i in seq_along(lev)) {
arrows(cent[i, 1] - serr[i, 1], cent[i, 2],
cent[i, 1] + serr[i, 1], cent[i, 2],
col = cols[i], code = 3, angle = 90, length = 0.05)
arrows(cent[i, 1], cent[i, 2] - serr[i, 2],
cent[i, 1], cent[i, 2] + serr[i, 2],
col = cols[i], code = 3, angle = 90, length = 0.05)
}
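The relationship between the error bars and the ellipse can be checked numerically. Here is a minimal base-R sketch with simulated, correlated scores (made-up data, not varespec). It assumes the 1 SE ellipse is built from the covariance matrix of the mean, cov(xy) / n; that mirrors the idea behind kind = "se" but is not vegan's own code.

```r
set.seed(1)
n  <- 50
x  <- rnorm(n)
y  <- 0.8 * x + rnorm(n, sd = 0.5)   # correlated with x on purpose
xy <- cbind(x, y)

centre   <- colMeans(xy)             # centroid
cov_mean <- cov(xy) / n              # covariance matrix of the centroid
se       <- sqrt(diag(cov_mean))     # marginal standard errors

## 1 SE ellipse: map the unit circle through the Cholesky factor
theta  <- seq(0, 2 * pi, length.out = 721)
circle <- cbind(cos(theta), sin(theta))
ell    <- sweep(circle %*% chol(cov_mean), 2, centre, "+")

plot(ell, type = "l", asp = 1, xlab = "axis 1", ylab = "axis 2")
points(centre[1], centre[2], pch = 3)
## error bars reach as far as the ellipse does in each direction ...
arrows(centre[1] - se[1], centre[2], centre[1] + se[1], centre[2],
       code = 3, angle = 90, length = 0.05)
arrows(centre[1], centre[2] - se[2], centre[1], centre[2] + se[2],
       code = 3, angle = 90, length = 0.05)
## ... so the box they define just encloses the tilted ellipse
rect(centre[1] - se[1], centre[2] - se[2],
     centre[1] + se[1], centre[2] + se[2], border = "gray", lty = 2)
```

The bars and the ellipse agree on the horizontal and vertical extent, but only the ellipse shows that the uncertainty is concentrated along a diagonal.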
Given such data:
#Cutpoint SN (1-PPV)
5 0.56 0.01
7 0.78 0.19
9 0.91 0.58
How can I plot a ROC curve in R that produces a result similar to the attached image?
I know ROCR package but it doesn't take such input.
If you just want to create the plot (without that silly interpolation spline between points) then just plot the data you give in the standard way, prepending a point at (0,0) and appending one at (1,1) to give the end points of the curve.
## your data with different labels
dat <- data.frame(cutpoint = c(5, 7, 9),
TPR = c(0.56, 0.78, 0.91),
FPR = c(0.01, 0.19, 0.58))
## plot version 1
op <- par(xaxs = "i", yaxs = "i")
plot(TPR ~ FPR, data = dat, xlim = c(0,1), ylim = c(0,1), type = "n")
with(dat, lines(c(0, FPR, 1), c(0, TPR, 1), type = "o", pch = 25, bg = "black"))
text(TPR ~ FPR, data = dat, pos = 3, labels = dat$cutpoint)
abline(0, 1)
par(op)
To explain the code: The first plot() call sets up the plotting region without doing any plotting at all. Note that I force the plot to cover the range (0,1) on both axes. The par() call tells R to plot axes that cover exactly the range of the data; the default extends them by 4 percent of the range at each end.
The next line, with(dat, lines(....)), draws the ROC curve, and here we prepend and append the points at (0,0) and (1,1) to give the full curve. I use type = "o" to give both points and lines overplotted. The points are drawn with plotting character 25, which can be filled with a colour, here black.
Then I add labels to the points using text(....); the pos argument is used to position the label away from the actual plotting coordinates. I take the labels from the cutpoint object in the data frame.
The abline() call draws the 1:1 line (here the 0 and 1 mean an intercept of 0 and a slope of 1, respectively).
The final line resets the plotting parameters to the defaults we saved in op prior to plotting (in the first line).
The resulting plot looks like this:
It isn't an exact facsimile and I prefer the plot using the default for the axis ranges (adding 4 percent):
plot(TPR ~ FPR, data = dat, xlim = c(0,1), ylim = c(0,1), type = "n")
with(dat, lines(c(0, FPR, 1), c(0, TPR, 1), type = "o", pch = 25, bg = "black"))
text(TPR ~ FPR, data = dat, pos = 3, labels = dat$cutpoint)
abline(0, 1)
Again, not a true facsimile but close.
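The difference between the two axis styles can be confirmed by inspecting par("usr"), which holds the actual extent of the plotting region. This is a small sketch that draws to a null PDF device so nothing is written to disk; the variable names are only for illustration.

```r
pdf(NULL)  # null device: nothing is written to disk

## default style ("r"): limits are extended by 4 percent at each end
plot(0:1, 0:1, xlim = c(0, 1), ylim = c(0, 1), type = "n")
usr_regular <- par("usr")

## internal style ("i"): limits are used exactly as given
op <- par(xaxs = "i", yaxs = "i")
plot(0:1, 0:1, xlim = c(0, 1), ylim = c(0, 1), type = "n")
usr_internal <- par("usr")
par(op)

dev.off()

print(usr_regular)   # -0.04  1.04 -0.04  1.04
print(usr_internal)  #  0  1  0  1
```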