Related
I am trying to visualize the trajectory of multiple participants in a virtual room using R. I have a participant entering from the right (black square) and moving toward the left, where there is an exit door (red square). Sometimes there is an obstacle right in the middle of the room (circle), and the participant goes around it.
To visualize multiple participants’ trajectories on the same graph (i.e., multiple lines), I have used the function plot to set up the plot itself (and the first line) and then I have used the function lines to add other trajectories after that.
Below you can see an example with two lines; in the experiment, I have many more (as now I have collected data from about 20 participants.)
library(shape)
# black line
pos_x <- c(5.04,4.68,4.39,4.09,3.73,3.37,3.07,2.77,2.47,2.11)
pos_z <- c(0.74,0.69,0.64,0.60,0.56,0.52,0.50,0.50,0.50,0.51)
df1 <- cbind.data.frame(pos_x,pos_z)
x.2 <- df1$pos_x
z.2 <- df1$pos_z
plot(x.2,z.2,type="l", xlim=range(x.2), ylim=c(-1,3.5), xlab="x", ylab="z", main = "Two trajectories")
filledrectangle(wx = 0.2, wy = 0.2,col = "black", mid = c(5.16, 1), angle = 0)
filledrectangle(wx = 0.2, wy = 0.2,col = "red", mid = c(2, 1), angle = 0)
plotcircle(mid = c(3.4, 1), r = 0.05)
# red line
pos_x <- c(5.14,4.84,4.24,3.64,3.34,2.74,2.15)
pos_z <- c(0.17,0.13,0.01,-0.2,0.01,0.10,0.17)
df2 <- cbind.data.frame(pos_x,pos_z)
x.3 <- df2$pos_x
z.3 <- df2$pos_z
lines(x.3, z.3, xlim=range(x.3), ylim=c(-1,3.5), pch=16, col="red")
What I would like to do now is to find the average between these two lines. Ideally, I would like to be able to average multiple lines and add an interval for the standard deviation.
The first thing I have tried is to build an interpolation; the problem is that the start and end point are different, so I cannot average the points:
plot(x.2, z.2, xlim=range(x.2), ylim=c(-1,3.5), xlab="x", ylab="z", main = "Interpolation")
points(approx(x.2, z.2), col = 2, pch = "*")
points(x.3, z.3)
points(approx(x.3, z.3), col = 2, pch = "*")
I have then found a suggestion here: use the R library dtw.
I have looked up the library and the companion paper.
This is a typical example from the paper, in which "two non-overlapping windows" are extracted from a reference electrocardiogram. The dataset "aami3a" is a time series object.
library("dtw")
data("aami3a")
ref <- window(aami3a, start = 0, end = 2)
test <- window(aami3a, start = 2.7, end = 5)
alignment <- dtw(test, ref)
alignment$distance
The problem is that in all these examples the data is either structured as a time series object or the two lines are functions of a common matrix (see also the R quickstart example in the documentation and this other tutorial.)
How can I reorganize my data to make the function work? Or do you know of any other way to create an average?
You could map equivalent points from the start to the end of each path (i.e. find the midpoint between the two lines at the start of each path, the midpoint between the two lines after a quarter of each path is complete, after a half, at the end, etc.
The way to do that is to use interpolation (via approx):
pos_x_a <- c(5.04,4.68,4.39,4.09,3.73,3.37,3.07,2.77,2.47,2.11)
pos_z_a <- c(0.74,0.69,0.64,0.60,0.56,0.52,0.50,0.50,0.50,0.51)
pos_x_b <- c(5.14,4.84,4.24,3.64,3.34,2.74,2.15)
pos_z_b <- c(0.17,0.13,0.01,-0.2,0.01,0.10,0.17)
pos_t_a <- seq(0, 1, length.out = length(pos_x_a))
pos_t_b <- seq(0, 1, length.out = length(pos_x_b))
a_x <- approx(pos_t_a, pos_x_a, seq(0, 1, 0.01))$y
a_y <- approx(pos_t_a, pos_z_a, seq(0, 1, 0.01))$y
b_x <- approx(pos_t_b, pos_x_b, seq(0, 1, 0.01))$y
b_y <- approx(pos_t_b, pos_z_b, seq(0, 1, 0.01))$y
plot(a_x, a_y, type = "l", ylim = c(-1, 3))
lines(b_x, b_y, col = "red")
lines((a_x + b_x)/2, (a_y + b_y)/2, col = "blue", lty = 2)
We get a better idea of how this averaging has occurred by joining the points on each line that were used to get the average:
for(i in seq_along(a_x)) segments(a_x[i], a_y[i], b_x[i], b_y[i], col = "gray")
I need to overlay a normal distribution curve based on a dataset on a histogram of the same dataset.
I get the histogram and the normal curve right individually. But the curve just stays a flat line when combined to the histogram using the add = TRUE attribute in the curve function.
I did try adjusting the xlim and ylim to check if it works but am not getting the intended results, I am confused about how to set the (x and y) limits to suit both the histogram and the curve.
Any suggestions? My dataset is a set of values for 100 individuals daily walk distances ranging from min = 0.4km to max = 10km
bd.m <- read_excel('walking.xlsx')
hist(bd.m, ylim = c(0,10))
curve(dnorm(x, mean = mean(bd.m), sd = sd(bd.m)), add = TRUE, col = 'red')
You need to set freq = FALSEin the call to hist. For example:
dt <- rnorm(1000, 2)
hist(dt, freq = F)
curve(dnorm(x, mean = mean(dt), sd = sd(dt)), add = TRUE, col = 'red')
I have two vectors of 1000 values (a and b), from which I created density plots and histograms. I would like to retrieve the coordinates (or just the y value) where the two plots cross (it does not matter if it detects several crossings, I can discriminate them afterwards). Please find the data in the following link. Sample Data
xlim = c(min(c(a,b)), max(c(a,b)))
hist(a, breaks = 100,
freq = F,
xlim = xlim,
xlab = 'Test Subject',
main = 'Difference plots',
col = rgb(0.443137, 0.776471, 0.443137, 0.5),
border = rgb(0.443137, 0.776471, 0.443137, 0.5))
lines(density(a))
hist(b, breaks = 100,
freq = F,
col = rgb(0.529412, 0.807843, 0.921569, 0.5),
border = rgb(0.529412, 0.807843, 0.921569, 0.5),
add = T)
lines(density(b))
Using locate() is not optimal, since I need to retrieve this from several plots (but will use that approach if nothing else is viable). Thanks for your help.
We calculate the density curves for both series, taking care to use the same range. Then, we compare whether the y-value for a is greater than b at each x-value. When the outcome of this comparison flips, we know the lines have crossed.
df <- merge(
as.data.frame(density(a, from = xlim[1], to = xlim[2])[c("x", "y")]),
as.data.frame(density(b, from = xlim[1], to = xlim[2])[c("x", "y")]),
by = "x", suffixes = c(".a", ".b")
)
df$comp <- as.numeric(df$y.a > df$y.b)
df$cross <- c(NA, diff(df$comp))
points(df[which(df$cross != 0), c("x", "y.a")])
which gives you
My GAM curves are being shifted downwards. Is there something wrong with the intercept? I'm using the same code as Introduction to statistical learning... Any help's appreciated..
Here's the code. I simulated some data (a straight line with noise), and fit GAM multiple times using bootstrap.
(It took me a while to figure out how to plot multiple GAM fits in one graph. Thanks to this post Sam's answer, and this post)
library(gam)
N = 1e2
set.seed(123)
dat = data.frame(x = 1:N,
y = seq(0, 5, length = N) + rnorm(N, mean = 0, sd = 2))
plot(dat$x, dat$y, xlim = c(1,100), ylim = c(-5,10))
gamFit = vector('list', 5)
for (ii in 1:5){
ind = sample(1:N, N, replace = T) #bootstrap
gamFit[[ii]] = gam(y ~ s(x, 10), data = dat, subset = ind)
par(new=T)
plot(gamFit[[ii]], col = 'blue',
xlim = c(1,100), ylim = c(-5,10),
axes = F, xlab='', ylab='')
}
The issue is with plot.gam. If you take a look at the help page (?plot.gam), there is a parameter called scale, which states:
a lower limit for the number of units covered by the limits on the ‘y’ for each plot. The default is scale=0, in which case each plot uses the range of the functions being plotted to create their ylim. By setting scale to be the maximum value of diff(ylim) for all the plots, then all subsequent plots will produced in the same vertical units. This is essential for comparing the importance of fitted terms in additive models.
This is an issue, since you are not using range of the function being plotted (i.e. the range of y is not -5 to 10). So what you need to do is change
plot(gamFit[[ii]], col = 'blue',
xlim = c(1,100), ylim = c(-5,10),
axes = F, xlab='', ylab='')
to
plot(gamFit[[ii]], col = 'blue',
scale = 15,
axes = F, xlab='', ylab='')
And you get:
Or you can just remove the xlim and ylim parameters from both calls to plot, and the automatic setting of plot to use the full range of the data will make everything work.
I am trying to plot rose diagrams/ circular histograms on specific coordinates on a map analogous to drawing pie charts on a map as in the package mapplots.
Below is an example generated with mapplots (see below for code), I'd like to replace the pie charts with rose diagrams
The package circular lets me plot the rose diagrams, but I am unable to integrate it with the mapplots package. Any suggestions for alternative packages or code to achieve this?
In response to the question for the code to make the map. It's all based on the mapplots package. I downloaded a shapefile for the map (I think from http://www.freegisdata.org/)
library(mapplots)
library(shapefiles)
xlim = c(-180, 180)
ylim = c(-90, 90)
#load shapefile
wmap = read.shapefile ("xxx")
# define x,y,z for pies
x <- c(-100, 100)
y <- c(50, -50)
z1 <- c(0.25, 0.25, 0.5)
z2 <- c(0.5, 0.2, 0.3)
z <- rbind(z1,z2)
# define radii of the pies
r <- c(5, 10)
# it's easier to have all data in a single df
plot(NA, xlim = xlim, ylim = ylim, cex = 0.75, xlab = NA, ylab = NA)
draw.shape(wmap, col = "grey", border = "NA")
draw.pie(x,y,z,radius = r, col=c("blue", "yellow", "red"))
legend.pie (x = -160, y = -70, labels = c("0", "1", "2"), radius = 5,
bty = "n", cex = 0.5, label.dist=1.5, col = c("blue", "yellow", "red"))
the legend for the pie size can then be added using legend.bubble
Have a look at this example, you can use the map as background an plot your rose diagrams withPlotrix or ggplot2. In either case you would want to overlay multiple of these diagrams on top of your map which is easy to do in ggplot, just have a look at the example.
I discovered subplot() in the package Hmisc, which seems to do exactly what I wanted. Below is my solution (without the map in the background, which can be plotted using mapplots). I am open to suggestions on how to improve this though...
library(Hmisc)
library (circular)
dat <- data.frame(replicate(2,sample(0:360,10,rep=TRUE)))
lat <- c(50, -40)
lon <- c(-100, 20)
# convert to class circular
cir.dat <- as.circular (dat, type ='angles', units = 'degrees', template = 'geographic', modulo = 'asis', zero = 'pi/2', rotation = 'clock')
# function for subplot, plots relative frequencies, see rose.diag for how to adjust the plot
sub.rose <- function(x){
nu <- sum(!is.na(x))
de <- max(hist(x, breaks = (seq(0, 360, 30)), plot = FALSE)$counts)
prop <- nu/de
rose.diag(x, bins = 12, ticks = FALSE, axes = FALSE,
radii.scale = 'linear',
border = NA,
prop = prop,
col = 'black'
)
}
plot(NA, xlim = xlim, ylim = ylim)
for(i in 1:length(lat)){
subplot(sub.rose(cir.dat[,i]), x = lon[i], y = lat[i], size = c(1, 1))
}