Plot a thin plate spline using scatterplot3d - r

Splines are still fairly new to me.
I am trying to figure out how to create a three-dimensional plot of a thin plate spline, similar to the visualizations that appear on pages 24-25 of Introduction to Statistical Learning (http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Sixth%20Printing.pdf). I'm working with scatterplot3d, and for the sake of easily reproducible data, let's use the 'trees' dataset in lieu of my actual data.
Setting up the initial plot is trivial:
library(scatterplot3d)
data(trees)
attach(trees)
s3d <- scatterplot3d(Girth, Height, Volume,
                     type = "n", grid = FALSE, angle = 70,
                     zlab = 'volume',
                     xlab = 'girth',
                     ylab = 'height',
                     main = "TREES")  # blank 3d plot
I use the Tps function from the fields library to create the spline:
library(fields)
my.spline <- Tps(cbind(Girth, Height), Volume)
And I can begin to represent the spline visually:
for (i in nrow(my.spline$x):1)                                     # for every girth . . .
  s3d$points3d(my.spline$x[, 1],
               rep(my.spline$x[i, 2], times = nrow(my.spline$x)),  # repeat every height . . .
               my.spline$y, type = 'l')                            # and match these values to a predicted volume
But when I try to complete the spline by cross-hatching lines along the height axis, the results become problematic:
for (i in nrow(my.spline$x):1)                                     # for every height . . .
  s3d$points3d(rep(my.spline$x[i, 1], times = nrow(my.spline$x)),  # repeat every girth . . .
               my.spline$x[, 2],
               my.spline$y, type = 'l')                            # and match these values to a predicted volume
And the more that I look at the resulting plot, the less certain I am that I'm even using the right data from my.spline.
Please note that this project uses scatterplot3d for other visualizations, so I am wedded to this package as the result of preexisting team choices. Any help will be greatly appreciated.

I don't think you are getting the predicted Tps surface. That requires using predict.Tps:
require(fields)
require(scatterplot3d)
data(trees)
attach(trees) # this worries me. I generally use data in dataframe form.
s3d <- scatterplot3d(Girth, Height, Volume,
                     type = "n", grid = FALSE, angle = 70,
                     zlab = 'volume',
                     xlab = 'girth',
                     ylab = 'height',
                     main = "TREES")  # blank 3d plot
my.spline <- Tps(cbind(Girth, Height), Volume)  # the fit from the question
grid <- make.surface.grid(list(girth = seq(8, 22), height = seq(60, 90)))
surf <- predict(my.spline, grid)
str(surf)
# num [1:465, 1] 5.07 8.67 12.16 15.6 19.1 ...
str(grid)
#------------
# int [1:465, 1:2] 8 9 10 11 12 13 14 15 16 17 ...
# - attr(*, "dimnames")=List of 2
#  ..$ : NULL
#  ..$ : chr [1:2] "girth" "height"
# - attr(*, "grid.list")=List of 2
#  ..$ girth : int [1:15] 8 9 10 11 12 13 14 15 16 17 ...
#  ..$ height: int [1:31] 60 61 62 63 64 65 66 67 68 69 ...
#-------------
s3d$points3d(grid[, 1], grid[, 2], surf, cex = .2, col = "blue")
You can add back the predicted points. This gives a better idea of x-y regions where there is "support" for the estimated surface:
s3d$points3d(my.spline$x[, 1], my.spline$x[, 2],
             predict(my.spline), col = "red")
There is no surface3d function in the scatterplot3d package. (I just searched the R-help archives to see whether I was missing something, but the graphics experts have always said that you would need to use lattice::wireframe, graphics::persp, or the 'rgl' package functions.) Since you have made a commitment to scatterplot3d, I think the easiest transition would be not to those but to the much more capable base-graphics package named plot3D. It is capable of many variations and makes quite beautiful surfaces with its surf3D function:
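As a minimal sketch of that route (my addition, not part of the original answer: it reuses the Tps fit to 'trees' from the question, assumes plot3D is installed, and uses persp3D, a close relative of surf3D, because the predictions lie on a regular grid):
library(fields)
library(plot3D)
data(trees)
my.spline <- Tps(cbind(trees$Girth, trees$Height), trees$Volume)
girth <- seq(8, 22, length.out = 40)
height <- seq(60, 90, length.out = 40)
grid <- make.surface.grid(list(girth = girth, height = height))
# make.surface.grid varies the first coordinate fastest, so fill by column
z <- matrix(predict(my.spline, grid), nrow = length(girth))
persp3D(girth, height, z, theta = 70,
        xlab = "girth", ylab = "height", zlab = "volume")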

Related

Manage Circles size in plot using symbols

I am using the symbols function in R to draw circles on a map, which has been imported as a plot.
According to the documentation, circle radii are scaled based on the max value in the data set.
I am plotting the same map for different time periods (different data sets) and I want the maps to be comparable, meaning that the circle radii refer to the same values in all the different maps. Is there a way that I can manage circle scaling?
Thanks
This is my code:
# for the first map, 2010
plot(my_map)
symbols(data2010$Lon, data2010$Lat, circles = data2010$number, inches = 0.25, add = TRUE)
# then the map for 2011
plot(my_map)
symbols(data2011$Lon, data2011$Lat, circles = data2011$number, inches = 0.25, add = TRUE)
The manual page suggests that setting inches=FALSE will accomplish what you want. Since you did not provide a sample of your data, we have to use data already available. This data set is used in the Examples on the manual page for the symbols() function:
data(trees)
str(trees)
# 'data.frame': 31 obs. of 3 variables:
# $ Girth : num 8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
# $ Height: num 70 65 63 72 81 83 66 75 80 75 ...
# $ Volume: num 10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...
Since we only have one data set, we can plot the symbols with and without the 31st row, which has the largest girth.
with(trees, symbols(Height, Volume, circles = Girth/24, inches = FALSE))
Now add the data without row 31:
with(trees[-31, ], symbols(Height, Volume, circles = Girth/24, fg = "red", inches = FALSE, add = TRUE))
We can tell that the scaling is the same because the red circles match the black circles even though the largest girth is missing from the second plot. For this to work you will have to specify the same values for xlim= and ylim= in each plot.
Run this code again replacing inches=FALSE with inches=.5 to see the difference.
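Applied back to the maps in the question, a sketch might look like this (hedged: my_map, data2010, and data2011 are the asker's objects, and the divisor of 1000 is an arbitrary assumption converting counts to radii in map units, to be tuned):
# shared axis limits so both maps use the same coordinate scale
xr <- range(c(data2010$Lon, data2011$Lon))
yr <- range(c(data2010$Lat, data2011$Lat))
# first map, 2010
plot(my_map, xlim = xr, ylim = yr)
symbols(data2010$Lon, data2010$Lat, circles = data2010$number/1000,
        inches = FALSE, add = TRUE)
# second map, 2011; the same divisor keeps the circles comparable
plot(my_map, xlim = xr, ylim = yr)
symbols(data2011$Lon, data2011$Lat, circles = data2011$number/1000,
        inches = FALSE, add = TRUE)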

How to plot a 3D surface from data frame columns if I do not know the polynomial function

I have some data of longitudes and latitudes. My third variable is the penetration of electric vehicles in each municipality. Hence, I have sparse data and I do not know the mapping f(long, lat) -> MS_Year. I have the following data:
long lat MS_Year
<dbl> <dbl> <dbl>
1 -66.0436169857389 50.3417726256247 0.0122
2 -66.1704063635085 48.168838536499 0.0115
3 -67.1376617834163 48.9202603958534 0.0136
4 -67.474931686395 48.8025438021711 0.0108
5 -67.5756670981796 48.5194066352801 0.0111
6 -67.6273066949175 48.429540936994 0.0167
I have been able to do the 3D scatter plot without any problems.
scatterplot3d(Plot_Me_Tot_2019_grouped)
[3D scatter plot of the data][1]
However, I've spent the whole day trying to understand how to do a surface. To my understanding, it is particularly hard, because I need to use a nonparametric estimator to show how complex the topology is. (The idea is to justify a nonparametric regression, which I've just learned about and never used; it might explain my total struggle).
Hence, I need to create a polynomial function f(long,lat) that has output MS_Year.
I tried to apply it as follows:
library(predict3d)
library(rgl)
fit5 <- lm(MS_Year ~ polym(long, lat, degree = 5, raw = TRUE), data = Plot_Me_Tot_2019_grouped)
predict3d(fit5,radius=0.05)
I did that because it combines this [polynomial regression][2] with this [3D plotting][3].
It's a total failure.
Has anyone ever faced similar issues?
I feel my problem is to create the linking function, i.e. f(long, lat), and then use expand.grid(long, lat) to create a surface and plot it.
I should add that I do not possess a good understanding of the translation from the data frame to the matrix format required for the 3D surface.
Thanks a lot for your time
[1]: https://i.stack.imgur.com/6ceJj.png
[2]: Polynomial regression with two variables with R
[3]: https://cran.r-project.org/web/packages/predict3d/vignettes/predict3d.html
I think you don't want a polynomial for the whole surface: that's likely to be very unstable, with huge amounts of variation between the points.
However, you might want a low degree local polynomial fit, or some low degree interpolation.
You haven't posted your real data, so I'll demonstrate with fake data. First, we do interpolation between the points:
set.seed(123)
df <- data.frame(long = rnorm(100, -66, 1),
                 lat = rnorm(100, 49, 1))
df$MS_Year <- 0.015 + df$long/1000 + df$lat/1000 + rnorm(100, 0.01, 0.0005)
head(df)
#> long lat MS_Year
#> 1 -66.56048 48.28959 0.007828523
#> 2 -66.23018 49.25688 0.008682913
#> 3 -64.44129 48.75331 0.009179444
#> 4 -65.92949 48.65246 0.007994563
#> 5 -65.87071 48.04838 0.006970499
#> 6 -64.28494 48.95497 0.009431914
library(interp)
surf <- interp(df$long, df$lat, df$MS_Year,
               xo = sort(df$long), yo = sort(df$lat))
library(rgl)
plot3d(df, type = "s", size = 0.5)
persp3d(surf, col = "gray", add = TRUE)
This did bilinear interpolation between the points; it ends up very rough. You'll probably prefer to fit some sort of surface to the points rather than interpolate them. This fits a local smooth:
library(mgcv)
#> Loading required package: nlme
#> This is mgcv 1.8-38. For overview type 'help("mgcv-package")'.
fit <- gam(MS_Year ~ s(long, lat), data = df)
xo <- sort(df$long)
yo <- sort(df$lat)
grid <- expand.grid(long = xo, lat = yo)
pred <- predict(fit, newdata = grid)
plot3d(df, type = "s", size = 0.5)
persp3d(xo, yo, matrix(pred, 100,100), col = "gray", add = TRUE)
Created on 2022-01-23 by the reprex package (v2.0.1)
That's the same dataset, but the smoother managed to see that it's more or less linear in both long and lat. Your data probably won't end up with such a simple shape.

Grouping a Data Column in R

I have a data frame of 48 samples of zinc concentration readings. I am trying to group the data as normal, high, and low (0-30 low, 31-80 normal, above 80 high) and then plot a scatter plot with a different pch for each group.
Here are the first 5 entries:
sample concentration
1 1 71.1
2 2 131.7
3 3 13.9
4 4 31.7
5 5 6.4
THANKS
In general, please try to include sample data by using dput(head(data)) and pasting the output into your question.
What you want is called binning (grouping is a very different operation in R terms).
The standard function here is cut:
numericVector <- runif(100, min = 1, max = 100)  # sample vector
# right = TRUE gives the intervals (0,30], (30,80], and (80,Inf),
# matching "0-30 low, 31-80 normal, above 80 high"
cut(numericVector, breaks = c(0, 30, 80, Inf), right = TRUE,
    include.lowest = FALSE, labels = c("0-30", "31-80", "above 80"))
Please check the function documentation to adapt the parameters to your specific case.
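To connect this back to the plotting goal in the question, here is a hedged sketch (using the five sample rows shown above) that maps each bin to a different pch:
df <- data.frame(sample = 1:5,
                 concentration = c(71.1, 131.7, 13.9, 31.7, 6.4))
df$group <- cut(df$concentration, breaks = c(0, 30, 80, Inf),
                labels = c("low", "normal", "high"))
# one plotting symbol per level, indexed by the factor codes
pch_map <- c(1, 16, 17)
plot(df$sample, df$concentration, pch = pch_map[as.integer(df$group)],
     xlab = "sample", ylab = "zinc concentration")
legend("topright", legend = levels(df$group), pch = pch_map)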

Adding two kernel density objects in R?

Suppose we have two objects created using the density() function. Is there a way to add these two objects to get another density (or similar) object?
For example:
A = rnorm(100)
B = rnorm(1000)
dA = density(A)
dB = density(B)
dC = density(c(A, B))
Is there a way to get the dC object from the dA and dB objects? Some kind of sum operation?
A return from density is a list with these parts:
> str(dA)
List of 7
$ x : num [1:512] -3.67 -3.66 -3.65 -3.64 -3.63 ...
$ y : num [1:512] 0.00209 0.00222 0.00237 0.00252 0.00268 ...
$ bw : num 0.536
$ n : int 100
$ call : language density.default(x = A)
$ data.name: chr "A"
$ has.na : logi FALSE
- attr(*, "class")= chr "density"
Note that the original data isn't in there, so we can't get it back and simply do something like dAB = density(c(dA$data, dB$data)).
The x and y components form the curve of the density, which you can plot with plot(dA$x, dA$y). You might think all you need to do is add the y values from two density objects but there's no guarantee they'll be at the same x points.
So maybe you think you can interpolate one to the same x points and then add the y values. But that won't integrate to 1 like a proper density ought to, so what you should do is scale dA$y and dB$y according to the fraction of points in each component density - which you can get from the dA$n component.
If you don't understand that last point, consider the following two densities, one from 1000 points and one from 500:
dA = density(runif(1000))
dB = density(runif(500)+10)
the first is a uniform between 0 and 1, the second a uniform between 10 and 11. The height of both uniforms is 1, and their ranges don't overlap, so if you added them you'd get two steps of equal height. But the density of their union:
dAB = density(c(runif(1000), runif(500)+10))
is a density with twice as much mass between 0 and 1 as between 10 and 11. When adding densities taken from samples you need to weight by the sample size.
So if you can interpolate them to the same x values, and then sum the y values scaled according to the n values as weights, you can get something that would approximate density(c(A,B)).
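A minimal sketch of that recipe (my own helper, not an existing API; approx() does the interpolation and the n components supply the weights):
combine_densities <- function(dA, dB, n.out = 512) {
  # common x grid spanning both densities
  x <- seq(min(dA$x, dB$x), max(dA$x, dB$x), length.out = n.out)
  # interpolate each density onto it, treating values outside its range as 0
  yA <- approx(dA$x, dA$y, xout = x, yleft = 0, yright = 0)$y
  yB <- approx(dB$x, dB$y, xout = x, yleft = 0, yright = 0)$y
  wA <- dA$n / (dA$n + dB$n)  # weight by sample size
  list(x = x, y = wA * yA + (1 - wA) * yB)
}
A <- rnorm(100); B <- rnorm(1000)
plot(density(c(A, B)), main = "pooled vs combined")  # the target density
lines(combine_densities(density(A), density(B)), col = "red", lty = 2)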

For a given location, identify minimum kernel density isopleth

I am undertaking research looking at the interactions of individual rats with a grid of traps distributed across the landscape (I have x, y coordinates for all trap locations). For each rat, I have generated a kernel utilisation density "home range" estimate using the R package adehabitatHR. What I'd like to do next is the following:
1- For each rat, calculate fine-scale home range contours from 1 - 99%
2- For each trap, calculate the minimum isopleth on which it is located: for example, trap 1 might "first" be on the 20% isopleth, trap 2 might "first" be on the 71% isopleth
My ultimate goal is to use the minimum isopleths calculated in a logistic regression to estimate the probability that a particular rat will "encounter" a particular trap within a specified time period.
Step 1 is easy enough but I'm having trouble imagining a way to accomplish step 2 short of plotting it all out manually (possible but I think there must be a better way). I suspect that part of my problem is that I'm new to both R and analysis of spatial data and I'm probably not searching with the right key words. Of what I've managed to find, the discussion that most closely resembles what I want to do is this.
How can I get the value of a kernel density estimate at specific points?
The above succeeds in calculating the probability value at specific points within a kernel utilisation distribution. However, what I'm trying to do is more to assign specific locations to a "category" - i.e. 5% category, 22% category etc.
Here is a small sample of my rat location data (coordinate system NZTM)
RatID Easting Northing
18 1732782.018 5926656.26
18 1732746.074 5926624.161
18 1732775.206 5926617.687
18 1732750.443 5926653.985
18 1732759.188 5926645.705
18 1732765.358 5926624.287
18 1732762.588 5926667.765
18 1732707.336 5926638.793
18 1732759.54 5926693.451
18 1732743.532 5926645.08
18 1732724.905 5926637.952
18 1732729.757 5926594.709
18 1732743.725 5926603.689
18 1732754.217 5926591.804
18 1732733.287 5926619.997
18 1732813.398 5926632.372
18 1732764.513 5926609.795
18 1732756.472 5926607.948
18 1732771.352 5926609.855
18 1732789.088 5926598.158
18 1732768.952 5926620.593
18 1732742.667 5926630.391
18 1732751.399 5926595.63
18 1732749.846 5926624.015
18 1732756.466 5926661.141
18 1732748.507 5926597.018
18 1732782.934 5926620.3
18 1732779.814 5926633.227
18 1732773.356 5926613.596
18 1732755.782 5926627.243
18 1732786.594 5926619.327
18 1732758.493 5926610.918
18 1732760.756 5926617.973
18 1732748.722 5926621.693
18 1732767.133 5926655.643
18 1732774.129 5926646.358
18 1732766.18 5926659.081
18 1732747.999 5926630.82
18 1732755.94 5926606.326
18 1732757.592 5926586.467
And here are the location data for my grid of traps:
TrapNum Easting Northing
HA1 1732789.055 5926589.589
HA2 1732814.738 5926605.615
HA3 1732826.837 5926614.635
HA4 1732853.275 5926621.766
HA5 1732877.903 5926638.804
HA6 1732893.335 5926649.771
HA7 1732917.186 5926651.287
HA8 1732944.25 5926669.952
HA9 1732963.233 5926679.758
HB1 1732778.721 5926613.718
HB2 1732798.169 5926624.735
HB3 1732818.44 5926631.303
HB4 1732844.132 5926647.878
HB5 1732862.387 5926662.465
HB6 1732884.118 5926671.112
HB7 1732903.641 5926681.234
HB8 1732931.883 5926695.332
HB9 1732947.286 5926698.757
HC1 1732766.385 5926629.555
HC2 1732785.31 5926647.128
HC3 1732801.985 5926657.742
HC4 1732835.289 5926664.553
HC5 1732843.434 5926694.72
HC6 1732862.648 5926702.187
HC7 1732878.385 5926709.82
HC8 1732916.886 5926712.215
HC9 1732935.947 5926715.582
HD1 1732755.253 5926654.033
HD2 1732774.911 5926672.812
HD3 1732794.617 5926671.724
HD4 1732820.064 5926689.754
HD5 1732816.794 5926714.769
HD6 1732841.166 5926732.481
HD7 1732865.646 5926734.21
HD8 1732906.592 5926738.893
HD9 1732930.1 5926752.73
Below is the code I used to calculate 1-99% home range contours using package adehabitatHR (Step 1). In addition, the code to plot selected home range isopleths over the grid of traps.
### First, load adehabitatHR and dependents
library(adehabitatHR)
## specifying which variables are coordinates converts the dataframe into class SpatialPointsDataFrame
coordinates(RatLocs) = c("Easting", "Northing")
# create and store in object kudH KUDs using the default bivariate normal kernel function and least-squares cross-validation as the smoothing bandwidth
kudH = kernelUD(RatLocs[,1], h = "LSCV")
kudH
kudH
## estimating home range from the KUD - mode VECTOR
homerange = getverticeshr(kudH)
## calculate home-range area for ALL probability levels (every 1%)
hr1to100 = kernel.area(kudH, percent = seq(1,100, by =1))
# generates error - for 100% kernel. rerun kernel UD with larger extent parameter.
## tried a range of values for other extents. Couldn't get one that worked for a 100% isopleth, 99% works
hr1to99 = kernel.area(kudH, percent = seq(1,99, by =1))
## An example of calculating and plotting selected home range isopleths over the grid of traps
## plot the trap grid
plot(Grid[,2], Grid[,3], xlab="Easting", ylab="Northing", pch=3, cex = 0.6, col="black", bty = "n", xlim=c(1742650,1743100), ylim=c(5912900,5913200), main = "KUD Home Range rat 33")
text(Grid[,2], Grid[,3], Grid[,1], cex=0.6, pos=2)
# Calculate and plot 95%, 75% and 50% contours for rat ID 33 (rat 2 in dataset)
HR95pc = getverticeshr(kudH)
plot(HR95pc[2,], col = rgb(1, 0, 0, alpha = 0.1), border = "red1", add = TRUE)
HR75pc = getverticeshr(kudH, percent = 75)
plot(HR75pc[2,], col = rgb(0, 0, 1, alpha = 0.3), border = "purple", add = TRUE)
HR50pc = getverticeshr(kudH, percent = 50)
plot(HR50pc[2,], col = rgb(0, 1, 1, alpha = 0.3), border = "blue2", add = TRUE)
# Add individual location points for rat ID 33
rat33L = subset(RatLocs, RatID =="33")
plot(rat33L[,1], pch = 16, col = "blue", add=TRUE)
Can anyone help me get started on Step 2? I'd be grateful for any ideas.
Thanks.
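One possible starting point for Step 2, sketched under assumptions (untested against these data; getvolumeUD() from adehabitatHR rescales the UD so that each pixel holds the smallest isopleth percentage containing it, and over() from sp reads that value off under each trap):
vud <- getvolumeUD(kudH)   # estUDm; pixel values are percent-volume (0-100)
vud1 <- vud[[1]]           # the UD for one rat
# make the traps spatial, then look up the pixel value under each trap
traps <- Grid
coordinates(traps) <- c("Easting", "Northing")
traps$min_isopleth <- over(traps, as(vud1, "SpatialPixelsDataFrame"))[, 1]
# NA means the trap falls outside the estimated UD grid
head(traps@data)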
