I am trying to visualize the trajectory of multiple participants in a virtual room using R. I have a participant entering from the right (black square) and moving toward the left, where there is an exit door (red square). Sometimes there is an obstacle right in the middle of the room (circle), and the participant goes around it.
To visualize multiple participants’ trajectories on the same graph (i.e., multiple lines), I have used the function plot to set up the plot itself (and the first line) and then I have used the function lines to add other trajectories after that.
Below is an example with two lines; in the experiment I have many more, as I have now collected data from about 20 participants.
library(shape)
# black line
pos_x <- c(5.04,4.68,4.39,4.09,3.73,3.37,3.07,2.77,2.47,2.11)
pos_z <- c(0.74,0.69,0.64,0.60,0.56,0.52,0.50,0.50,0.50,0.51)
df1 <- cbind.data.frame(pos_x,pos_z)
x.2 <- df1$pos_x
z.2 <- df1$pos_z
plot(x.2,z.2,type="l", xlim=range(x.2), ylim=c(-1,3.5), xlab="x", ylab="z", main = "Two trajectories")
filledrectangle(wx = 0.2, wy = 0.2,col = "black", mid = c(5.16, 1), angle = 0)
filledrectangle(wx = 0.2, wy = 0.2,col = "red", mid = c(2, 1), angle = 0)
plotcircle(mid = c(3.4, 1), r = 0.05)
# red line
pos_x <- c(5.14,4.84,4.24,3.64,3.34,2.74,2.15)
pos_z <- c(0.17,0.13,0.01,-0.2,0.01,0.10,0.17)
df2 <- cbind.data.frame(pos_x,pos_z)
x.3 <- df2$pos_x
z.3 <- df2$pos_z
# lines() adds to the existing plot; xlim/ylim/pch are not needed here and only trigger warnings
lines(x.3, z.3, col="red")
What I would like to do now is to find the average between these two lines. Ideally, I would like to be able to average multiple lines and add an interval for the standard deviation.
The first thing I have tried is to build an interpolation; the problem is that the start and end point are different, so I cannot average the points:
plot(x.2, z.2, xlim=range(x.2), ylim=c(-1,3.5), xlab="x", ylab="z", main = "Interpolation")
points(approx(x.2, z.2), col = 2, pch = "*")
points(x.3, z.3)
points(approx(x.3, z.3), col = 2, pch = "*")
I have then found a suggestion here: use the R library dtw.
I have looked up the library and the companion paper.
This is a typical example from the paper, in which "two non-overlapping windows" are extracted from a reference electrocardiogram. The dataset "aami3a" is a time series object.
library("dtw")
data("aami3a")
ref <- window(aami3a, start = 0, end = 2)
test <- window(aami3a, start = 2.7, end = 5)
alignment <- dtw(test, ref)
alignment$distance
The problem is that in all these examples the data is either structured as a time series object or the two lines are functions of a common matrix (see also the R quickstart example in the documentation and this other tutorial.)
How can I reorganize my data to make the function work? Or do you know of any other way to create an average?
You could map equivalent points from the start to the end of each path, i.e. find the midpoint between the two lines at the start of each path, the midpoint after a quarter of each path is complete, after a half, at the end, and so on.
The way to do that is to use interpolation (via approx):
pos_x_a <- c(5.04,4.68,4.39,4.09,3.73,3.37,3.07,2.77,2.47,2.11)
pos_z_a <- c(0.74,0.69,0.64,0.60,0.56,0.52,0.50,0.50,0.50,0.51)
pos_x_b <- c(5.14,4.84,4.24,3.64,3.34,2.74,2.15)
pos_z_b <- c(0.17,0.13,0.01,-0.2,0.01,0.10,0.17)
pos_t_a <- seq(0, 1, length.out = length(pos_x_a))
pos_t_b <- seq(0, 1, length.out = length(pos_x_b))
a_x <- approx(pos_t_a, pos_x_a, seq(0, 1, 0.01))$y
a_y <- approx(pos_t_a, pos_z_a, seq(0, 1, 0.01))$y
b_x <- approx(pos_t_b, pos_x_b, seq(0, 1, 0.01))$y
b_y <- approx(pos_t_b, pos_z_b, seq(0, 1, 0.01))$y
plot(a_x, a_y, type = "l", ylim = c(-1, 3))
lines(b_x, b_y, col = "red")
lines((a_x + b_x)/2, (a_y + b_y)/2, col = "blue", lty = 2)
We get a better idea of how this averaging has occurred by joining the points on each line that were used to get the average:
for(i in seq_along(a_x)) segments(a_x[i], a_y[i], b_x[i], b_y[i], col = "gray")
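The same idea extends to more than two paths and to a standard-deviation band. The following is only a sketch under stated assumptions: `resample_path` and the `paths` list are hypothetical names introduced here, each path is resampled at equal steps of its point index (as above, not by distance travelled), and the band shows the spread in z only, which seems reasonable when movement is mainly along x.

```r
# Resample one path to n points equally spaced in normalised progress t in [0, 1]
# (resample_path is a hypothetical helper, not part of any package)
resample_path <- function(x, z, n = 101) {
  t_in  <- seq(0, 1, length.out = length(x))
  t_out <- seq(0, 1, length.out = n)
  cbind(x = approx(t_in, x, xout = t_out)$y,
        z = approx(t_in, z, xout = t_out)$y)
}

# One list element per participant; add more elements for more participants
paths <- list(
  a = list(x = c(5.04,4.68,4.39,4.09,3.73,3.37,3.07,2.77,2.47,2.11),
           z = c(0.74,0.69,0.64,0.60,0.56,0.52,0.50,0.50,0.50,0.51)),
  b = list(x = c(5.14,4.84,4.24,3.64,3.34,2.74,2.15),
           z = c(0.17,0.13,0.01,-0.2,0.01,0.10,0.17))
)

res <- lapply(paths, function(p) resample_path(p$x, p$z))
xs  <- sapply(res, function(m) m[, "x"])  # one column per participant
zs  <- sapply(res, function(m) m[, "z"])

mean_x <- rowMeans(xs)
mean_z <- rowMeans(zs)
sd_z   <- apply(zs, 1, sd)                # spread in z at each progress step

plot(NA, xlim = range(xs), ylim = c(-1, 3), xlab = "x", ylab = "z")
# Shaded band: mean +/- 1 SD in the z direction
polygon(c(mean_x, rev(mean_x)),
        c(mean_z - sd_z, rev(mean_z + sd_z)),
        col = "grey90", border = NA)
matlines(xs, zs, lty = 1, col = "grey40") # individual paths
lines(mean_x, mean_z, col = "blue", lwd = 2)
```

With about 20 participants the only change should be a longer `paths` list; the matrices `xs` and `zs` then simply gain columns.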
I made this image in PowerPoint to illustrate what I am trying to do:
I am trying to make a series of circles (each of which is the same size) that "move" along the x-axis in consistent intervals; for instance, the center of each consecutive circle would be 2 points away from the previous circle.
I have tried several things, including the DrawCircle function from the DescTools package, but can't produce this. For example, here I am trying to draw 20 circles, where the center of each circle is 2 points away from the previous one and each circle has a radius of 2 (which doesn't work):
library(DescTools)
plotdat <- data.frame(xcords = seq(1,50, by = 2.5), ycords = rep(4,20))
Canvas()
DrawCircle(x=plotdat$xcords, y=plotdat$ycords, radius = 2)
How can this be done in R?
This is basically @Peter's answer, with modifications. Your approach was fine, but there is no radius= argument in DrawCircle; the radius is set with r.out. See the manual page ?DrawCircle for the arguments:
dev.new(width=12, height=4)
Canvas(xlim = c(0,50), ylim=c(2, 6), asp=1, xpd=TRUE)
DrawCircle(x=plotdat$xcords, y=plotdat$ycords, r.out = 2)
But your example has axes:
plot(NA, xlim = c(0,50), ylim=c(2, 6), xlab="", ylab="", yaxt="n", asp=1, xpd=TRUE)
DrawCircle(x=plotdat$xcords, y=plotdat$ycords, r.out = 2)
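If an extra package is not wanted, base R's `symbols()` can draw the same picture. This is a minimal sketch, not a replacement for the DescTools approach above; the names `x_centres` and `y_centres` are introduced here for illustration, and the spacing of 2 units follows the wording of the question rather than the 2.5 used in its code.

```r
# 20 circle centres, each 2 units to the right of the previous one
x_centres <- seq(1, by = 2, length.out = 20)
y_centres <- rep(4, 20)

# inches = FALSE makes the radii use x-axis units; asp = 1 keeps circles round
symbols(x_centres, y_centres, circles = rep(2, 20),
        inches = FALSE, asp = 1,
        xlab = "x", ylab = "", yaxt = "n")
```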
My solution requires the creation of some auxiliary functions:
library(tidyverse)
##First function: create circle with a predefined radius, and a x-shift and y-shift
create_circle <- function(radius,x_shift, y_shift){
p <- tibble(
x = radius*cos(seq(0,2*pi, length.out = 1000)) + x_shift ,
y = radius*sin(seq(0,2*pi, length.out = 1000))+ y_shift
)
return(p)
}
##Use lapply to create circles with multiple x shifts:
##Group is only necessary for plotting
l <- lapply(seq(0,40, by = 2), function(i){
create_circle(2,i,0) %>%
mutate(group = i)
})
##Bind rows and plot
bind_rows(l) %>%
ggplot(aes(x = x, y = y, group =group)) +
geom_path()
Does this do the trick?
library(DescTools)
plotdat <- data.frame(xcords = seq(1, 5, length.out = 20), ycords = rep(4,20))
Canvas(xlim = c(0, 5), xpd=TRUE)
DrawCircle(x=plotdat$xcords, y=plotdat$ycords, r.out = 2)
I've assumed when you say circle centres are 2 points apart you mean 0.2 units apart.
You may have to experiment with the values to get what you need.
I have two vectors of 1000 values (a and b), from which I created density plots and histograms. I would like to retrieve the coordinates (or just the y value) where the two plots cross (it does not matter if it detects several crossings, I can discriminate them afterwards). Please find the data in the following link. Sample Data
xlim = c(min(c(a,b)), max(c(a,b)))
hist(a, breaks = 100,
freq = F,
xlim = xlim,
xlab = 'Test Subject',
main = 'Difference plots',
col = rgb(0.443137, 0.776471, 0.443137, 0.5),
border = rgb(0.443137, 0.776471, 0.443137, 0.5))
lines(density(a))
hist(b, breaks = 100,
freq = F,
col = rgb(0.529412, 0.807843, 0.921569, 0.5),
border = rgb(0.529412, 0.807843, 0.921569, 0.5),
add = T)
lines(density(b))
Using locator() is not optimal, since I need to retrieve this from several plots (but I will use that approach if nothing else is viable). Thanks for your help.
We calculate the density curves for both series, taking care to use the same range. Then, we compare whether the y-value for a is greater than b at each x-value. When the outcome of this comparison flips, we know the lines have crossed.
df <- merge(
as.data.frame(density(a, from = xlim[1], to = xlim[2])[c("x", "y")]),
as.data.frame(density(b, from = xlim[1], to = xlim[2])[c("x", "y")]),
by = "x", suffixes = c(".a", ".b")
)
df$comp <- as.numeric(df$y.a > df$y.b)
df$cross <- c(NA, diff(df$comp))
points(df[which(df$cross != 0), c("x", "y.a")])
This marks the crossing points on the existing plot.
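To get the crossing coordinates as numbers rather than plotted points, the sign change can be refined by linear interpolation between the two neighbouring grid points. This is a self-contained sketch on simulated data: the vectors `a` and `b` below are stand-ins generated with `rnorm()`, not the data from the question.

```r
set.seed(1)
a <- rnorm(1000, mean = 0)   # stand-in data, not the original vectors
b <- rnorm(1000, mean = 2)

# Evaluate both densities on the same x grid
rng <- range(c(a, b))
da <- density(a, from = rng[1], to = rng[2])
db <- density(b, from = rng[1], to = rng[2])

d   <- da$y - db$y                 # difference between the two curves
idx <- which(diff(sign(d)) != 0)   # a crossing lies between idx and idx + 1

# Linear interpolation of the zero of d between neighbouring grid points
w       <- d[idx] / (d[idx] - d[idx + 1])
cross_x <- da$x[idx] + w * (da$x[idx + 1] - da$x[idx])
cross_y <- da$y[idx] + w * (da$y[idx + 1] - da$y[idx])

crossings <- data.frame(x = cross_x, y = cross_y)
crossings
```

Extra crossings can appear in the tails, where both densities are close to zero; those can be filtered afterwards, for example by requiring `y` to exceed some small threshold.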
I've successfully produced NMDS plots (monoMDS, bray-curtis, 3 dimensions, local model). Each point represents an animal and their diet composition.
I have two questions:
(1) how do I change the symbology of points to show 2 levels (a or j) within 1 column (Life stage) on the NMDS plot?!
(2) How should I show a 3D NMDS? I can't get the 3D orgl- functions to work on the 3D plot. Should I just make a few plots showing different dimensions in 2D? Looking for thoughtful ideas.
The code used:
plot((BC.NMDS.length.corr), choices = c(1, 2), type = "points",
xlim = c(-2.0, 2.0),las = 1, ylim = c(-1, 1),
xlab = "NMDS Axis 1", ylab = "NMDS Axis 2",mgp = c(3.25, 1, 0),
cex.lab = 1.35, cex.axis = 1.25)
with(DATA,
points(BC.NMDS.length.corr, Class, draw = "points",col = "gray0",
show.groups = "Adult",label = TRUE, lty = 1, lwd = 2))
Here is an example of what you want, using one of the package's built-in data sets:
# Load library
library(vegan)
# Load data
data(dune)
# Compute the distance
dis <- vegdist(dune)
For a 3D plot, ask for three dimensions with k = 3:
# Run monoMDS
m <- monoMDS(dis, model = "loc", k=3)
# Default plot (shows the first two dimensions only)
plot(m)
# Load library for 3D representation
library(scatterplot3d)
The coordinates are in m$points, with one column per dimension.
# Graphical representation
scatterplot3d(x=m$points[,1], y=m$points[,2], z=m$points[,3])
Additionally, if you want to colour the points by a factor, you can specify color=A, where A is a numeric vector in which the groups are coded.
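For the 2D alternative raised in the question, all pairs of axes can be shown at once with base R's `pairs()`, using the plotting symbol for life stage. The sketch below uses made-up data: `scores` is a random matrix standing in for `m$points`, and `stage` is a hypothetical factor with levels a and j standing in for the Life stage column.

```r
set.seed(42)
# Stand-in for m$points: 20 animals, 3 NMDS axes (random numbers, not real scores)
scores <- matrix(rnorm(60), ncol = 3,
                 dimnames = list(NULL, c("NMDS1", "NMDS2", "NMDS3")))
# Stand-in for the Life stage column
stage <- factor(sample(c("a", "j"), 20, replace = TRUE), levels = c("a", "j"))

# Look up one symbol and one colour per animal from its life stage
pch_by_stage <- c(a = 16, j = 1)[as.character(stage)]
col_by_stage <- c(a = "black", j = "red")[as.character(stage)]

# One panel per pair of axes
pairs(scores, pch = pch_by_stage, col = col_by_stage)
```

The same `pch_by_stage` and `col_by_stage` vectors could also be passed as `pch` and `color` to `scatterplot3d()`.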
I tried this question in stats.stackexchange and somebody suggested I try it over here, so here goes:
I've completed PCA analysis, in R with VEGAN package, of some ecological data on tree health. There are 80 trees total (so, 80 'sites') divided into four treatment categories. I've got the data plotted with color coded points--colors according to the treatment groups. Rather than plotting individual sites/trees on PCA biplot, I'd like to make something like a box-and-whisker plot that has four 'crosses' that show the centroid for each group and the SE in both PCA dimensions. I've seen figures like this in papers, but I can't seem to find an R script for plotting this way. Any suggestions? (I'd like to post an example image here of what I'm looking for, but the ones I can find are all paywalled, sorry).
I guess an alternative would be to just take the site scores and manually find the means and SE's and create my own plot, but I'd rather find a script for it, if possible.
The code I've been running is really straightforward:
p1<-princomp(scale(health, scale=T))
summary(p1)
scores(p1)
plot(p1)
loadings(p1)
biplot(p1, xlab = "PC 1 (38%)", ylab = "PC 2 (22%)",cex=0.6)
plot(p1$scores[,1],p1$scores[,2])
names(p1)
plot(p1$scores[,1],p1$scores[,2], type='n', xlab="PC I", ylab="PC II")
text(p1$scores[,1],p1$scores[,2], labels=Can$tree)
I would probably start with ordiellipse and see if that suits your needs.
### Reproducible example
require("vegan")
data(varespec)
data(varechem)
pca <- rda(varespec, scale = TRUE)
grp <- with(varechem, cut(Baresoil, 4, labels = 1:4))
cols <- c("red","orange","blue","forestgreen")
scl <- 1 ## scaling
plot(pca, display = "sites", scaling = scl, type = "n")
points(pca, display = "sites", scaling = scl, col = cols[grp], pch = 16)
lev <- levels(grp)
for (i in seq_along(lev)) { ## draw ellipse per group
ordiellipse(pca, display = "sites", kind = "se", scaling = scl,
groups = grp, col = cols[i], show.groups = lev[i])
}
## centroids
scrs <- as.data.frame(scores(pca, display = "sites", scaling = scl,
choices = 1:2))
cent <- do.call(rbind, lapply(split(scrs, grp), colMeans))
points(cent, col = cols, pch = 3, cex = 1.1)
This produces a plot of the site scores with a standard-error ellipse and a centroid marker for each group.
You can remove the points() line from the code above to stop it drawing the actual samples, but I thought it instructive here in terms of understanding what ordiellipse is doing.
In the plot, the centroid is marked as the mean of the site scores on each axis by grouping grp. The ellipse is a continuous region that is (given the setting I chose in the ordiellipse() call) 1 standard error about this centroid. Your suggestion for error bars in each direction is a specific case of the ellipse drawn by ordiellipse --- if you were to compute the standard errors of the centroids, they should extend as far as the extremal points of the ellipse in the horizontal and vertical directions.
However, this would fail to take into account the covariances of the scores on the two axes. Note in the example below how, in those ellipses that are oriented at an angle to the axes, the standard error bars do not intersect the ellipse at their extremal points. If you were to draw a box containing the region defined by the error bars, it would contain the ellipse, but it gives a very different impression of the uncertainty in the centroid.
serrFun <- function(df) {
apply(df, 2, function(x) sd(x) / sqrt(length(x)))
}
serr <- do.call(rbind, lapply(split(scrs, grp), serrFun))
for (i in seq_along(lev)) {
arrows(cent[i, 1] - serr[i, 1], cent[i, 2],
cent[i, 1] + serr[i, 1], cent[i, 2],
col = cols[i], code = 3, angle = 90, length = 0.05)
arrows(cent[i, 1], cent[i, 2] - serr[i, 2],
cent[i, 1], cent[i, 2] + serr[i, 2],
col = cols[i], code = 3, angle = 90, length = 0.05)
}
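The point about covariance can be checked numerically. In this sketch the scores are simulated (the names `pc1`, `pc2` and `grp_scores` are made up for illustration and stand in for one group's site scores): the per-axis standard errors are just the square roots of the diagonal of the centroid's covariance matrix, while the off-diagonal term, which tilts the ellipse, is ignored by the bars.

```r
set.seed(1)
n   <- 20
pc1 <- rnorm(n)
pc2 <- 0.8 * pc1 + rnorm(n, sd = 0.5)   # deliberately correlated with pc1
grp_scores <- cbind(PC1 = pc1, PC2 = pc2)

# Covariance matrix of the centroid (mean) of the group
V <- cov(grp_scores) / n

# The error bars use only the diagonal: same as sd(x) / sqrt(n) per axis
se_bars <- sqrt(diag(V))
se_bars

# The off-diagonal covariance is what the bars leave out
V["PC1", "PC2"]
```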
I am working on a bit of code that creates a plot consisting of multiple individual graphs, one per ID, showing longitudinal data. For visual purposes, I am limiting the number of graphs to 20 per plot using par, but there are more than 20 IDs in the dataset, and therefore I need multiple plots.
Current problem: how to avoid overwriting an earlier plot with a new plot once the code moves beyond the 20th (or N*20th) ID. I think I need to use plot.new(), but it is not clear to me how to work this in, and I could not find a previous post that exactly addressed this.
My code:
# Create sample data by sampling
Start <- as.Date("2012-01-01")
End <- as.Date("2013-01-01")
data1 <- data.frame(ID = sort(rep(seq(64),3)), VisitDate = sort((Start + sample.int(End-Start, 192))), Count = rnorm(192, mean = 300, sd = 12), Treat = sample(0:1, 192, replace = TRUE))
# calculate days elapsed since start date, by ID
data1$VisitDate <- with(data1,as.Date(VisitDate,format="%y-%b-%d"))
data1$Days <- unlist(with(data1,tapply(VisitDate,ID,function(x){x-x[1]})))
#Define plot function
plot_one <- function(d){
with(d, plot(Days, Count, t="n", tck=1, main=unique(d$ID), cex.main = 0.8, ylab = "", yaxt = 'n', xlab = "", xaxt="n", xlim=c(0,8), ylim=c(0,500))) # set limits
grid(lwd = 0.3, lty = 7)
with(d[d$Treat == 0,], points(Days, Count, col = 1))
with(d[d$Treat == 1,], points(Days, Count, col = 2))
}
#Create multiple plot figure
par(mfrow=c(4,5), oma = c(0.5,0.5,0.5,0.5), mar = c(0.5,0.5,0.5,0.5))
plyr::d_ply(data1, "ID", plot_one)
If you are using Windows, you call windows(). If you are using a Mac, you call quartz(). These will open a new device so that your next call to (e.g.) plot() will not overwrite your existing plots. On any platform, dev.new() opens a new device of the default type.
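Another option is to split the IDs into pages of 20 and start a fresh 4 x 5 grid for each page. The sketch below writes to a multi-page PDF so nothing is overwritten; the data frame `dat`, the variable names and the temporary file are made up for illustration and are simpler than the data in the question.

```r
set.seed(1)
# Simplified stand-in data: 64 IDs with 3 visits each
dat <- data.frame(ID    = rep(1:64, each = 3),
                  Days  = rep(c(0, 3, 7), 64),
                  Count = rnorm(192, mean = 300, sd = 12))

# Split the IDs into pages of at most 20
ids   <- unique(dat$ID)
pages <- split(ids, ceiling(seq_along(ids) / 20))

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 10, height = 8)    # a multi-page file device
for (page_ids in pages) {
  # Resetting mfrow starts a new page with an empty 4 x 5 grid
  par(mfrow = c(4, 5), mar = c(2, 2, 2, 1))
  for (id in page_ids) {
    d <- dat[dat$ID == id, ]
    plot(d$Days, d$Count, main = id, xlim = c(0, 8), ylim = c(0, 500))
  }
}
dev.off()
```

For on-screen viewing, the `pdf()` and `dev.off()` calls could be dropped and `dev.new()` called at the top of the outer loop instead, so that each page gets its own window.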