Mean distance of the points from a fixed target point

I have some points in my data and I am trying to find the mean (arithmetic mean) distance to a target point.
I am taking two routes here:
First, using the distance formula to compute the distance from each point to the target, and then taking the mean of those distances.
Second, finding the mean point of all points and then computing the distance between this mean point and the target point.
The two approaches give different answers, so I am not sure which one is correct if I want the average distance of all points to a target point.
My target point is in green and mean point is in red.
Following is my R code:
# three data points
a <- c(1.6, 2.3, 3.4)
b <- c(3.1, 4.1, 0.5)
# target point
t_x <- 1.1
t_y <- 0.1
df <- data.frame("x" = a, "y" = b)
# mean of the distances
df$distance <- sqrt(((df$x - t_x)^2) + ((df$y - t_y)^2))
print(mean(df$distance))
# distance from the mean point to the target
mean_x <- mean(df$x)
mean_y <- mean(df$y)
print(sqrt((mean_x - t_x)^2 + (mean_y - t_y)^2))
# plotting all
library(ggplot2)
ggplot(df, aes(x = x, y = y)) +
  geom_point() +
  coord_cartesian(xlim = c(-5, 5), ylim = c(-5, 5)) +
  geom_point(aes(x = mean_x, y = mean_y), color = "red") +
  geom_point(aes(x = t_x, y = t_y), color = "green")

Just consider the case with one target point and two other points
Target: (0,0)
Point1: (-1,0)
Point2: (1, 0)
If you take the average of Points 1 and 2, you get (0,0), so the distance from the average point to the target is 0. But the distance from each of the points to the target is 1, so the mean distance is 1. In general these two calculations are quite different.
It really just comes down to how you define the problem. In this example, do you expect the answer to be 0 or 1?
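To make the difference concrete, here is a minimal sketch of both calculations for this toy example, using the same formulas as in your code:
# two points and the target
pts <- data.frame(x = c(-1, 1), y = c(0, 0))
t_x <- 0
t_y <- 0
# mean of the distances: both points are 1 away, so the mean is 1
mean(sqrt((pts$x - t_x)^2 + (pts$y - t_y)^2))
#> [1] 1
# distance from the mean point: the mean point is (0, 0), so the distance is 0
sqrt((mean(pts$x) - t_x)^2 + (mean(pts$y) - t_y)^2)
#> [1] 0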

Here's an intuitive explanation of why your first method is the correct one. Imagine your target is at (0, 0):
t_x <- 0
t_y <- 0
Now suppose we draw some points around it that are all the same distance from it - in fact, they all lie on the unit circle and by definition are a distance of 1 from the target:
library(ggplot2)
t_x <- 0
t_y <- 0
rads <- seq(0, 2 * pi, length.out = 17)[-17]
df <- data.frame(x = cos(rads), y = sin(rads), xend = 0, yend = 0)
ggplot(df, aes(x, y)) +
  geom_point(col = "red") +
  geom_segment(aes(xend = xend, yend = yend), linetype = 2) +
  coord_equal() +
  geom_point(x = t_x, y = t_y, size = 5, colour = "red")
Now, not surprisingly, since all points are distance 1 from the target, the average distance by Pythagoras will also be 1:
# mean of the distances
df$distance <- sqrt(((df$x - t_x)^2) + ((df$y - t_y)^2))
print(mean(df$distance))
#> [1] 1
But now consider what happens if we take the average of all our x values - they cancel to 0. So do the y values, so the average of all points is (0, 0). When you measure the distance from (0, 0) to (0, 0), the answer, of course, is 0:
mean_x <- mean(df$x)
mean_y <- mean(df$y)
print(sqrt((mean_x - t_x)^2 + (mean_y - t_y)^2))
#> [1] 0
Created on 2020-08-22 by the reprex package (v0.3.0)

Related

How to plot distance biplot and correlation biplot results of SVD/PCA in R?

I searched for a long time for a straightforward explanation of distance vs correlation biplots, as well as an explanation of how to transform the standard outputs of PCA to achieve the two biplots. All the Stack Overflow explanations I saw went way over my head with math terms. How can I create both a distance biplot and a correlation biplot using the outputs of R's prcomp?
The best explanation I found is some lecture slides from Pierre Legendre, Département de sciences biologiques, Université de Montréal (http://biol09.biol.umontreal.ca/PLcourses/Ordination_section_1.1_PCA_Eng.pdf). However, while these slides did show the way to plot a distance and correlation biplot manually, they didn't show how to plot the distance and correlation biplots from the results of prcomp.
So I worked through an example that shows how one can use the outputs of prcomp so that they are equivalent to the example walked through in the pdf above. I am leaving this here for future people like myself who are wondering how to plot a distance vs correlation biplot and when to use each (according to Pierre Legendre).
set.seed(1)
#Run standard PCA
pca_res <- prcomp(mtcars[, 1:7], center = TRUE, scale = TRUE, retx = TRUE)
#To print a distance biplot, simply plot pca_res$x as points and pca_res$rotation
#as vectors
library(ggplot2)
arrow_len <- 3 #arbitrary scaling of arrows so they're same mag as PC scores
ggplot(data = as.data.frame(pca_res$x), aes(x = PC1, y = PC2)) +
geom_point() +
geom_segment(data = as.data.frame(pca_res$rotation),
aes(x = 0, y = 0, yend = arrow_len*PC1, xend = arrow_len*PC2),
arrow = arrow(length = unit(0.02, "npc"))) +
geom_text(data = as.data.frame(pca_res$rotation),
mapping = aes(y = arrow_len*PC1, x = arrow_len*PC2,
label = row.names(pca_res$rotation)))
#This is equivalent to the following steps:
Y_centered <- scale(mtcars[, 1:7], center = TRUE, scale = TRUE)
Y_eig <- eigen(cov(Y_centered))
#Note that Y_eig$vectors == pca_res$rotation ("rotations" or "loadings")
# and Y_eig$values (eigenvalues) == pca_res$sdev**2
#For a distance biplot
U_frame <- Y_eig$vectors
#F is your PC scores, achieved by multiplying your original data by the rotations
F_frame <- Y_centered %*% U_frame
#flipping constants if needed bc PC axis direction is arbitrary
x_flip = -1
y_flip = -1
ggplot(data = as.data.frame(F_frame), aes(x = x_flip*V1, y = y_flip*V2)) +
geom_point() +
geom_segment(data = as.data.frame(U_frame),
aes(x = 0, y = 0, yend = y_flip*arrow_len*V1, xend = x_flip*arrow_len*V2),
arrow = arrow(length = unit(0.02, "npc"))) +
geom_text(data = as.data.frame(U_frame),
mapping = aes(y = y_flip*arrow_len*V1, x = x_flip*arrow_len*V2,
label = colnames(Y_centered)))
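# (Not part of the original answer: a quick sanity check, assuming the eigenvalues
#  are well separated, that the manual eigen() route matches prcomp(). Eigenvector
#  signs are arbitrary, so compare absolute values.)
all.equal(abs(unname(Y_eig$vectors)), abs(unname(pca_res$rotation)))
all.equal(Y_eig$values, pca_res$sdev^2)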
#To print a correlation biplot, matrix-multiply your rotations/loadings
# by a diagonal matrix of your PCA standard deviations
# (equivalent to the square roots of the eigenvalues)
U_frame_scaling2 <- U_frame %*% diag(Y_eig$values^(0.5))
#And divide your PC scores by your PCA standard deviations
# (equivalent to multiplying by diag(1/sqrt(eigenvalues)))
F_frame_scaling2 <- F_frame %*% diag(Y_eig$values^(-0.5))
#Plot
arrow_len <- 1.5 #arbitrary scaling of arrows so they're same mag as PC scores
ggplot(data = as.data.frame(pca_res$x %*% diag(1/pca_res$sdev)),
aes(x = V1, y = V2)) +
geom_point() +
geom_segment(data = as.data.frame(pca_res$rotation %*% diag(pca_res$sdev)),
aes(x = 0, y = 0, yend = arrow_len*V1, xend = arrow_len*V2),
arrow = arrow(length = unit(0.02, "npc"))) +
geom_text(data = as.data.frame(pca_res$rotation %*% diag(pca_res$sdev)),
mapping = aes(y = arrow_len*V1, x = arrow_len*V2,
label = row.names(pca_res$rotation)))
ggplot(data = as.data.frame(F_frame_scaling2), aes(x = x_flip*V1, y = y_flip*V2)) +
geom_point() +
geom_segment(data = as.data.frame(U_frame_scaling2),
aes(x = 0, y = 0, yend = y_flip*arrow_len*V1, xend = x_flip*arrow_len*V2),
arrow = arrow(length = unit(0.02, "npc"))) +
geom_text(data = as.data.frame(U_frame_scaling2),
mapping = aes(y = y_flip*arrow_len*V1, x = x_flip*arrow_len*V2,
label = colnames(Y_centered)))
As for the differences between the two (in case the pdf above becomes unavailable at some point):
Scaling type 1: distance biplot, used when the interest is on the positions of the objects with respect to one another. Plot matrices F to represent the objects and U for the variables.
Scaling type 2: correlation biplot, used when the angular relationships among the variables are of primary interest. Plot matrices G to represent the objects and Usc2 for the variables, where G = FΛ^(−1/2) and Usc2 = UΛ^(1/2).
In scaling 1 (distance biplot),
the sites have variances, along each axis (or principal
component), equal to the axis eigenvalue (column of F);
the eigenvectors (columns of U) are normed to lengths = 1;
the length (norm) of each species vector in the p-dimensional ordination space (rows of U) is 1.
In scaling 2 (correlation biplot),
the sites have unit variance along each axis (columns of G);
the eigenvectors (columns of Usc2) are normed to
lengths = sqrt(eigenvalues);
the norm of each species vector in the p-dimensional
ordination space (rows of Usc2) is its standard deviation.
In scaling 1 (distance biplot),
Distances among objects approximate their Euclidean distances in
full multidimensional space.
Projecting an object at right angle on a descriptor approximates the
position of the object along that descriptor.
Since descriptors have equal lengths of 1 in the full-dimensional
space, the length of the projection of a descriptor in reduced space
indicates how much it contributes to the formation of that space.
A scaling 1 biplot thus shows which variables contribute the most
to the ordination in a few dimensions (see also section: Equilibrium
contribution of variables).
The descriptor-axes are orthogonal (90°) to one another in
multidimensional space. These right angles, projected in reduced
space, do not reflect the variables’ correlations.
In scaling 2 (correlation biplot),
Distances among objects approximate their Mahalanobis distances
in full multidimensional space.
Projecting an object at right angle on a descriptor approximates the
position of the object along that descriptor.
Since descriptors have lengths sj in full-dimensional space, the length of the projection of a descriptor j in reduced space is an approximation of its standard deviation sj. (Note: sj is 1 when the variables have been standardized.)
The angles between descriptors in the biplot reflect their
correlations.
When the distance relationships among objects are important for
interpretation, this type of biplot is inadequate; a distance biplot
should be used.

How to calculate x-values of the convolution of two distributions?

(This question may be better suited for https://stats.stackexchange.com/, but I think my question is really about how to calculate what I want in R.)
I'm trying to add multiple distributions together, and then look at the resulting distribution. I'll illustrate my problem with a simple example using normally distributed random variables, p1 and p2.
set.seed(21)
N <- 1000
p1 <- rnorm(N, mean = 0, sd = 1)
p2 <- rnorm(N, mean = 10, sd = 1)
Which we can plot:
library(tidyverse)
data.frame(p1, p2) %>%
  gather(key = "dist", value = "value") %>%
  ggplot(aes(value, color = dist)) + geom_density()
I can add these distributions together using convolve. Okay, so that's fine. But what I can't figure out is how to plot the sum of the distributions with the appropriate x-values. In the examples I've seen, it looks like the x-values are added manually in a way that doesn't seem "accurate", for lack of a better word. See this example.
I can "add" them together and plot:
pdf.c <- convolve(pdf1.y, pdf2.y, type = "open")
plot(pdf.c, type="l")
My question is how to get the corresponding x-values of the new distribution. I'm sure I'm missing something from a foundational statistics point of view.
Appendix for pdf1 and pdf2:
set.seed(21)
N <- 1000
p1 <- rnorm(N, mean = 0, sd = 1)
p2 <- rnorm(N, mean = 10, sd = 1)
pdf1.x <- density(p1)$x
pdf2.x <- density(p2)$x
pdf1.y <- density(p1)$y / sum(density(p1)$y)
pdf2.y <- density(p2)$y / sum(density(p2)$y)
df1 <- data.frame(pdf.x = pdf1.x, pdf.y = pdf1.y, dist = "1", stringsAsFactors = FALSE)
df2 <- data.frame(pdf.x = pdf2.x, pdf.y = pdf2.y, dist = "2", stringsAsFactors = FALSE)
df <- bind_rows(df1, df2)
Assuming that p1 and p2 are discretized uniformly, with the same interval dx between successive x values. (I see that you have discretized p1 and p2 at random points -- that's not the same, and, without thinking about it some more, I don't have an answer for that.) Let x1 = x1_0 + (k - 1) times dx, k = 1, 2, 3, ..., n1 be the points at which p1 is discretized, and x2 = x2_0 + (k - 1) times dx, k = 1, 2, 3, ..., n2 be the points at which p2 is discretized.
Each point xi_k = xi_0 + (k - 1) times dx represents the center point of a bar which has width dx and height pi(xi_k), i = 1, 2. Thus the mass of the bar is dx times pi(xi_k), and the total mass for all bars approaches 1 as dx approaches 0. These masses are the values which are convolved. If the discretized masses are normalized to 1, then their convolution will also be normalized to 1.
To be very careful, the range over which each distribution is discretized runs from xi_0 - dx/2 to xi_0 + (ni - 1) times dx + dx/2. After computing the convolution, the range of the result likewise extends dx/2 below its first point and dx/2 above its last point.
The convolution has n = n1 + n2 - 1 points, namely x1_0 + x2_0 + (k - 1) times dx, k = 1, 2, 3, ..., n1 + n2 - 1. The first point is x1_0 + x2_0 (i.e. first point for p1 plus first point for p2) and the last point is x1_0 + x2_0 + (n1 + n2 - 2) times dx = (x1_0 + (n1 - 1) times dx) + (x2_0 + (n2 - 1) times dx) (i.e. last point for p1 plus last point for p2). From this you can construct x values corresponding to the convolution via the seq function or something like that.
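Here is a minimal sketch of that recipe. It deliberately discretizes the two normal densities with dnorm on grids that share a common spacing dx (rather than using the density() output from the question, whose two grids generally have different spacings), so the bookkeeping above applies directly:
# discretize both densities on grids with the same spacing dx
dx <- 0.01
x1 <- seq(-5, 5, by = dx)    # grid for p1 ~ N(0, 1)
x2 <- seq(5, 15, by = dx)    # grid for p2 ~ N(10, 1)
m1 <- dnorm(x1, mean = 0, sd = 1) * dx     # bar masses, each set sums to ~1
m2 <- dnorm(x2, mean = 10, sd = 1) * dx
# ?convolve notes that the usual convolution is convolve(x, rev(y), type = "open")
mc <- convolve(m1, rev(m2), type = "open")  # length(m1) + length(m2) - 1 masses
# x values for the convolution: start at x1[1] + x2[1] and step by dx
xc <- x1[1] + x2[1] + (seq_along(mc) - 1) * dx
# divide the masses by dx to get back to the density scale; the peak is near 10
plot(xc, mc / dx, type = "l")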

Add labels to the center of a geom_curve line (ggplot)

Is there any way to add a label on or near the center of a geom_curve line? Currently, I can only do so by labeling either the start or end point of the curve.
library(tidyverse)
library(ggrepel)
df <- data.frame(x1 = 1, y1 = 1, x2 = 2, y2 = 3, details = "Object Name")
ggplot(df, aes(x = x1, y = y1, label = details)) +
geom_point(size = 4) +
geom_point(aes(x = x2, y = y2),
pch = 17, size = 4) +
geom_curve(aes(x = x1, y = y1, xend = x2, yend = y2)) +
geom_label(nudge_y = 0.05) +
geom_label_repel(box.padding = 2)
I would love some way to automatically label the curve near coordinates x=1.75, y=1.5. Is there a solution out there I haven't seen yet? My intended graph is quite busy, and labeling the origin points makes it harder to see what's happening, while labeling the arcs would make a cleaner output.
I've come to a solution for this problem. It's large and clunky, but effective.
The core problem is that geom_curve() does not draw a set path, but it moves and scales with the aspect ratio of the plot window. So short of locking the aspect ratio with coord_fixed(ratio=1) there is no way I can easily find to predict where the midpoint of a geom_curve() segment will be.
So instead I set about finding midpoint for a curve, and then forcing the curve to go through that point which I would later label. To find the midpoint I had to copy two functions from the grid package:
library(grid)
library(tidyverse)
library(ggrepel)
# Find origin of rotation
# Rotate around that origin
calcControlPoints <- function(x1, y1, x2, y2, curvature, angle, ncp,
debug=FALSE) {
# Negative curvature means curve to the left
# Positive curvature means curve to the right
# Special case curvature = 0 (straight line) has been handled
xm <- (x1 + x2)/2
ym <- (y1 + y2)/2
dx <- x2 - x1
dy <- y2 - y1
slope <- dy/dx
# Calculate "corner" of region to produce control points in
# (depends on 'angle', which MUST lie between 0 and 180)
# Find by rotating start point by angle around mid point
if (is.null(angle)) {
# Calculate angle automatically
angle <- ifelse(slope < 0,
2*atan(abs(slope)),
2*atan(1/slope))
} else {
angle <- angle/180*pi
}
sina <- sin(angle)
cosa <- cos(angle)
# FIXME: special case of vertical or horizontal line ?
cornerx <- xm + (x1 - xm)*cosa - (y1 - ym)*sina
cornery <- ym + (y1 - ym)*cosa + (x1 - xm)*sina
# Debugging
if (debug) {
grid.points(cornerx, cornery, default.units="inches",
pch=16, size=unit(3, "mm"),
gp=gpar(col="grey"))
}
# Calculate angle to rotate region by to align it with x/y axes
beta <- -atan((cornery - y1)/(cornerx - x1))
sinb <- sin(beta)
cosb <- cos(beta)
# Rotate end point about start point to align region with x/y axes
newx2 <- x1 + dx*cosb - dy*sinb
newy2 <- y1 + dy*cosb + dx*sinb
# Calculate x-scale factor to make region "square"
# FIXME: special case of vertical or horizontal line ?
scalex <- (newy2 - y1)/(newx2 - x1)
# Scale end points to make region "square"
newx1 <- x1*scalex
newx2 <- newx2*scalex
# Calculate the origin in the "square" region
# (for rotating start point to produce control points)
# (depends on 'curvature')
# 'origin' calculated from 'curvature'
ratio <- 2*(sin(atan(curvature))^2)
origin <- curvature - curvature/ratio
# 'hand' also calculated from 'curvature'
if (curvature > 0)
hand <- "right"
else
hand <- "left"
oxy <- calcOrigin(newx1, y1, newx2, newy2, origin, hand)
ox <- oxy$x
oy <- oxy$y
# Calculate control points
# Direction of rotation depends on 'hand'
dir <- switch(hand,
left=-1,
right=1)
# Angle of rotation depends on location of origin
maxtheta <- pi + sign(origin*dir)*2*atan(abs(origin))
theta <- seq(0, dir*maxtheta,
dir*maxtheta/(ncp + 1))[c(-1, -(ncp + 2))]
costheta <- cos(theta)
sintheta <- sin(theta)
# May have BOTH multiple end points AND multiple
# control points to generate (per set of end points)
# Generate consecutive sets of control points by performing
# matrix multiplication
cpx <- ox + ((newx1 - ox) %*% t(costheta)) -
((y1 - oy) %*% t(sintheta))
cpy <- oy + ((y1 - oy) %*% t(costheta)) +
((newx1 - ox) %*% t(sintheta))
# Reverse transformations (scaling and rotation) to
# produce control points in the original space
cpx <- cpx/scalex
sinnb <- sin(-beta)
cosnb <- cos(-beta)
finalcpx <- x1 + (cpx - x1)*cosnb - (cpy - y1)*sinnb
finalcpy <- y1 + (cpy - y1)*cosnb + (cpx - x1)*sinnb
# Debugging
if (debug) {
ox <- ox/scalex
fox <- x1 + (ox - x1)*cosnb - (oy - y1)*sinnb
foy <- y1 + (oy - y1)*cosnb + (ox - x1)*sinnb
grid.points(fox, foy, default.units="inches",
pch=16, size=unit(1, "mm"),
gp=gpar(col="grey"))
grid.circle(fox, foy, sqrt((ox - x1)^2 + (oy - y1)^2),
default.units="inches",
gp=gpar(col="grey"))
}
list(x=as.numeric(t(finalcpx)), y=as.numeric(t(finalcpy)))
}
calcOrigin <- function(x1, y1, x2, y2, origin, hand) {
# Positive origin means origin to the "right"
# Negative origin means origin to the "left"
xm <- (x1 + x2)/2
ym <- (y1 + y2)/2
dx <- x2 - x1
dy <- y2 - y1
slope <- dy/dx
oslope <- -1/slope
# The origin is a point somewhere along the line between
# the end points, rotated by 90 (or -90) degrees
# Two special cases:
# If slope is non-finite then the end points lie on a vertical line, so
# the origin lies along a horizontal line (oslope = 0)
# If oslope is non-finite then the end points lie on a horizontal line,
# so the origin lies along a vertical line (oslope = Inf)
tmpox <- ifelse(!is.finite(slope),
xm,
ifelse(!is.finite(oslope),
xm + origin*(x2 - x1)/2,
xm + origin*(x2 - x1)/2))
tmpoy <- ifelse(!is.finite(slope),
ym + origin*(y2 - y1)/2,
ifelse(!is.finite(oslope),
ym,
ym + origin*(y2 - y1)/2))
# ALWAYS rotate by -90 about midpoint between end points
# Actually no need for "hand" because "origin" also
# encodes direction
# sintheta <- switch(hand, left=-1, right=1)
sintheta <- -1
ox <- xm - (tmpoy - ym)*sintheta
oy <- ym + (tmpox - xm)*sintheta
list(x=ox, y=oy)
}
With that in place, I calculated a midpoint for each record
df <- data.frame(x1 = 1, y1 = 1, x2 = 10, y2 = 10, details = "Object Name")
df_mid <- df %>%
mutate(midx = calcControlPoints(x1, y1, x2, y2,
angle = 130,
curvature = 0.5,
ncp = 1)$x) %>%
mutate(midy = calcControlPoints(x1, y1, x2, y2,
angle = 130,
curvature = 0.5,
ncp = 1)$y)
I then make the graph, but draw two separate curves: one from the origin to the calculated midpoint, and another from the midpoint to the destination. The angle and curvature settings, both for finding the midpoint and for drawing these curves, are tricky to tune so that the result doesn't obviously look like two different curves.
ggplot(df_mid, aes(x = x1, y = y1)) +
geom_point(size = 4) +
geom_point(aes(x = x2, y = y2),
pch = 17, size = 4) +
geom_curve(aes(x = x1, y = y1, xend = midx, yend = midy),
curvature = 0.25, angle = 135) +
geom_curve(aes(x = midx, y = midy, xend = x2, yend = y2),
curvature = 0.25, angle = 45) +
geom_label_repel(aes(x = midx, y = midy, label = details),
box.padding = 4,
nudge_x = 0.5,
nudge_y = -2)
Though the answer isn't ideal or elegant, it scales with a large number of records.
Maybe annotations would help here (see: http://ggplot2.tidyverse.org/reference/annotate.html)
library(tidyverse)
library(ggrepel)
df <- data.frame(x1 = 1, y1 = 1, x2 = 2, y2 = 3, details = "Object Name")
ggplot(df, aes(x = x1, y = y1, label = details)) +
geom_point(size = 4) +
geom_point(aes(x = x2, y = y2),
pch = 17, size = 4) +
geom_curve(aes(x = x1, y = y1, xend = x2, yend = y2)) +
geom_label(nudge_y = 0.05) +
geom_label_repel(box.padding = 2) +
annotate("label", x=1.75, y=1.5, label=df$details)

Simulated sphere shape data based on normal distribution

I'm clueless about the question below. Any help is appreciated.
"Simulate data with n=1000 observations and p=3 covariates -- all random variables from standard normal distribution. Create two category class variable assigning all observations within a sphere with radius of 1.5 centered at 3D zero to one class category and all others -- to the second".
Here's a 2D example to get you going...
library(ggplot2)
library(grid)
Sample x & y coords from normal distribution (default mean = 0, sd = 1)
df <- data.frame(x = rnorm(100), y = rnorm(100))
Calculate distance from centre (0,0)
df$r = sqrt(df$x^2 + df$y^2)
Assign to category
df$category <- ifelse(df$r < 1, "in", "out")
Plot
ggplot(df, aes(x = x, y = y, color = category)) +
geom_point() +
coord_equal() +
annotation_custom(grob=circleGrob(r=unit(1,"npc"), gp = gpar(fill = NA)), xmin=-0.5, xmax=0.5, ymin=-0.5, ymax=0.5)
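And a sketch of the full 3D version the assignment asks for (n = 1000, p = 3 standard-normal covariates, sphere of radius 1.5 centred at the origin), following the same pattern:
set.seed(1)
n <- 1000
# p = 3 covariates, all from the standard normal distribution
df3 <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
# Euclidean distance from the 3D origin
df3$r <- sqrt(df3$x1^2 + df3$x2^2 + df3$x3^2)
# two-category class variable: inside vs outside the sphere of radius 1.5
df3$class <- factor(ifelse(df3$r < 1.5, "in", "out"))
table(df3$class)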

topoplot in ggplot2 – 2D visualisation of e.g. EEG data

Can ggplot2 be used to produce a so-called topoplot (often used in neuroscience)?
Sample data:
label x y signal
1 R3 0.64924459 0.91228430 2.0261520
2 R4 0.78789621 0.78234410 1.7880972
3 R5 0.93169511 0.72980685 0.9170998
4 R6 0.48406513 0.82383895 3.1933129
Full sample data.
Rows represent individual electrodes. Columns x and y represent the projection into 2D space and the column signal is essentially the z-axis representing voltage measured at a given electrode.
stat_contour doesn't work, apparently due to unequal grid.
geom_density_2d only provides a density estimation of x and y.
geom_raster is either not suited for this task or I must be using it incorrectly, since it quickly runs out of memory.
Smoothing (like in the image on the right) and head contours (nose, ears) aren't necessary.
I want to avoid Matlab and transforming the data so that it fits this or that toolbox… Many thanks!
Update (26 January 2016)
The closest I've been able to get to my objective is via
library(colorRamps)
ggplot(channels, aes(x, y, z = signal)) + stat_summary_2d() + scale_fill_gradientn(colours=matlab.like(20))
which produces an image like this:
Update 2 (27 January 2016)
I've tried #alexforrence's approach with full data and this is the result:
It's a great start, but there are a couple of issues:
The last call (ggplot()) takes about 40 seconds on an Intel i7 4790K while Matlab toolboxes manage to generate these almost instantly; my ‘emergency solution’ above takes about a second.
As you can see, the upper and lower border of the central part appear to be ‘sliced’ – I'm not sure what causes this but it could be the third issue.
I'm getting these warnings:
1: Removed 170235 rows containing non-finite values (stat_contour).
2: Removed 170235 rows containing non-finite values (stat_contour).
Update 3 (27 January 2016)
Comparison between two plots produced with different interp(xo, yo) and stat_contour(binwidth) values:
Ragged edges if one chooses low interp(xo, yo), in this case xo/yo = seq(0, 1, length = 100):
Here's a potential start:
First, we'll attach some packages. I'm using akima to do linear interpolation, though it looks like EEGLAB uses some sort of spherical interpolation here? (the data was a little sparse to try it).
library(ggplot2)
library(akima)
library(reshape2)
Next, reading in the data:
dat <- read.table(text = " label x y signal
1 R3 0.64924459 0.91228430 2.0261520
2 R4 0.78789621 0.78234410 1.7880972
3 R5 0.93169511 0.72980685 0.9170998
4 R6 0.48406513 0.82383895 3.1933129")
We'll interpolate the data, and stick that in a data frame.
datmat <- interp(dat$x, dat$y, dat$signal,
xo = seq(0, 1, length = 1000),
yo = seq(0, 1, length = 1000))
datmat2 <- melt(datmat$z)
names(datmat2) <- c('x', 'y', 'value')
datmat2[,1:2] <- datmat2[,1:2]/1000 # scale it back
I'm going to borrow from some previous answers. The circleFun below is from Draw a circle with ggplot2.
circleFun <- function(center = c(0,0),diameter = 1, npoints = 100){
r = diameter / 2
tt <- seq(0,2*pi,length.out = npoints)
xx <- center[1] + r * cos(tt)
yy <- center[2] + r * sin(tt)
return(data.frame(x = xx, y = yy))
}
circledat <- circleFun(c(.5, .5), 1, npoints = 100) # center on [.5, .5]
# ignore anything outside the circle
datmat2$incircle <- (datmat2$x - .5)^2 + (datmat2$y - .5)^2 < .5^2 # mark
datmat2 <- datmat2[datmat2$incircle,]
And I really liked the look of the contour plot in R plot filled.contour() output in ggpplot2, so we'll borrow that one.
ggplot(datmat2, aes(x, y, z = value)) +
geom_tile(aes(fill = value)) +
stat_contour(aes(fill = ..level..), geom = 'polygon', binwidth = 0.01) +
geom_contour(colour = 'white', alpha = 0.5) +
scale_fill_distiller(palette = "Spectral", na.value = NA) +
geom_path(data = circledat, aes(x, y, z = NULL)) +
# draw the nose (haven't drawn ears yet)
geom_line(data = data.frame(x = c(0.45, 0.5, .55), y = c(1, 1.05, 1)),
aes(x, y, z = NULL)) +
# add points for the electrodes
geom_point(data = dat, aes(x, y, z = NULL, fill = NULL),
shape = 21, colour = 'black', fill = 'white', size = 2) +
theme_bw()
With improvements mentioned in the comments (setting extrap = TRUE and linear = FALSE in the interp call to fill in gaps and do a spline smoothing, respectively, and removing NAs before plotting), we get:
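For reference, a minimal sketch of that modified interp() call (same grid as above, with the two flags added and the NA rows dropped before plotting):
datmat <- interp(dat$x, dat$y, dat$signal,
                 xo = seq(0, 1, length = 1000),
                 yo = seq(0, 1, length = 1000),
                 linear = FALSE, extrap = TRUE)
datmat2 <- melt(datmat$z)
names(datmat2) <- c('x', 'y', 'value')
datmat2[, 1:2] <- datmat2[, 1:2] / 1000
datmat2 <- datmat2[!is.na(datmat2$value), ]    # remove NAs before plotting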
mgcv can do spherical splines. This replaces akima (the chunk containing interp() isn't necessary).
library(mgcv)
spl1 <- gam(signal ~ s(x, y, bs = 'sos'), data = dat)
# fine grid, coarser is faster
datmat2 <- data.frame(expand.grid(x = seq(0, 1, 0.001), y = seq(0, 1, 0.001)))
resp <- predict(spl1, datmat2, type = "response")
datmat2$value <- resp
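From there the plotting is the same as before; a brief sketch reusing dat, circledat, and the circle mask from the akima-based code above:
# keep only points inside the head circle, then plot as before
datmat2$incircle <- (datmat2$x - .5)^2 + (datmat2$y - .5)^2 < .5^2
datmat2 <- datmat2[datmat2$incircle, ]
ggplot(datmat2, aes(x, y, z = value)) +
  geom_tile(aes(fill = value)) +
  geom_contour(colour = 'white', alpha = 0.5) +
  scale_fill_distiller(palette = "Spectral") +
  geom_path(data = circledat, aes(x, y, z = NULL)) +
  geom_point(data = dat, aes(x, y, z = NULL, fill = NULL),
             shape = 21, colour = 'black', fill = 'white', size = 2) +
  theme_bw()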
