R Surface Plot from List of X,Y,Z points

I am trying to make a surface plot for data that is in a very long list of x,y,z points. To do this, I am dividing the data into a grid of 10k squares and finding the max value of z within each square. From my understanding, each z value should be stored in a matrix where each element of the matrix corresponds to a square on the grid. Is there an easier way to do this than the code below? That last line is already pretty long and it is only one square.
x<-(sequence(101)-1)*max(eff$CFaR)/100
y<-(sequence(101)-1)*max(eff$EaR)/100
effmap<-matrix(ncol=length(x)-1, nrow=length(y)-1)
someMatrix <- max(eff$Cost[which(eff$EaR[which(eff$CFaR >= x[50] & eff$CFaR <x[51], arr.ind=TRUE)]>=y[20] & eff$EaR[which(eff$CFaR >= x[50] & eff$CFaR <x[51], arr.ind=TRUE)]< y[91])])

So this is my interpretation of what you are trying to accomplish...
df <- read.csv("effSample.csv") # downloaded from your link
df <- df[c("CFaR","EaR","Cost")] # remove unnecessary columns
df$x <- cut(df$CFaR,breaks=100,labels=FALSE) # establish bins: CFaR
df$y <- cut(df$EaR,breaks=100,labels=FALSE) # establish bins: EaR
df.max <- expand.grid(x=1:100,y=1:100) # template; 10,000 grid cells
# maximum cost in each grid cell - NOTE: most of the cells are *empty*
df.max <- merge(df.max,aggregate(Cost~x+y,df,max),all.x=TRUE)
z <- matrix(df.max$Cost,nr=100,nc=100) # Cost vector -> matrix
# colors based on z-value
palette <- rev(rainbow(20)) # palette of 20 colors
zlim <- range(z[!is.na(z)])
colors <- palette[19*(z-zlim[1])/diff(zlim) + 1]
# create the plot
library(rgl)
open3d(scale=c(1,1,10)) # CFaR and EaR range ~ 10 X Cost range
x.values <- min(df$CFaR)+(0:99)*diff(range(df$CFaR))/100
y.values <- min(df$EaR)+(0:99)*diff(range(df$EaR))/100
surface3d(x.values,y.values,z,col=colors)
axes3d()
title3d(xlab="CFaR",ylab="EaR",zlab="Cost")
The code above generates a rotatable 3D plot, so the image is just a screen shot. Notice how there are lots of "holes". This is (partially) because you provided only part of your data. However, it is important to realize that just because you imagine 10,000 grid cells (e.g., a 100 X 100 grid), does not mean that there will be data in every cell.
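The binning step can also be sketched in base R with tapply(). This is only an illustration on synthetic data: the data frame pts, the bin count nbins, and the names xbin, ybin and zmax are assumptions made for the example and do not come from the original data set.

```r
set.seed(42)

# Synthetic stand-in for the eff data (hypothetical values)
n <- 5000
pts <- data.frame(CFaR = runif(n, 0, 10), EaR = runif(n, 0, 10))
pts$Cost <- sin(pts$CFaR) + cos(pts$EaR) + rnorm(n, sd = 0.1)

nbins <- 20

# cut() assigns each point to a bin; factor() with explicit levels
# keeps bins that happen to be empty in the final table
xbin <- factor(cut(pts$CFaR, breaks = nbins, labels = FALSE), levels = 1:nbins)
ybin <- factor(cut(pts$EaR,  breaks = nbins, labels = FALSE), levels = 1:nbins)

# zmax[i, j] is the maximum Cost in x-bin i and y-bin j; empty cells are NA
zmax <- tapply(pts$Cost, list(xbin, ybin), max)
dim(zmax)
```

Because zmax is indexed as [x-bin, y-bin], it should be usable directly as the z matrix of a surface-plotting function that expects rows to follow x and columns to follow y.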

Related

How to visualize a distance matrix on a map by the thickness or color of the lines connecting adjacent localities in R?

Suppose I have two datasets: (1) a data frame: coordinates of localities, each with ID; and (2) a linguistic distance matrix which reflects the linguistic distance between these localities.
# My data are similar to this structure
# dataframe
id <- c("A","B","C","D","E")
x_coor <- c(0.5,1,1,1.5,2)
y_coor <- c(5.5,3,7,6.5,5)
my.data <- data.frame(id = id, x_coor = x_coor, y_coor = y_coor)
# linguistic distance matrix
           A          B          C          D
B 308.298557
C 592.555483 284.256926
D 141.421356 449.719913 733.976839
E 591.141269 282.842712   1.414214 732.562625
Now, I want to visualize the linguistic distance between every two sites on a map, using the thickness or color of the lines connecting adjacent localities in R.
Something like this (image not preserved).
My idea is to generate the delaunay triangulation by deldir or tripack package in R.
# generate delaunay triangulation
library(deldir)
de=deldir(my.data$x_coor,my.data$y_coor)
plot.deldir(de,wlines="triang",col='blue',wpoints = "real",cex = 0.1)
text(my.data$x_coor,my.data$y_coor,my.data$id)
This is the resulting plot (image not preserved).
My question is how to reflect the linguistic distance by the thickness or color of the edges of triangles? Is there any other better method?
Thank you very much!
What you want to do in respect of the line widths can be done "fairly
easily" by the deldir package. You simply call plot.deldir() with the
appropriate value of "lw" (line width).
At the bottom of this answer is a demonstration script "demo.txt" which shows how to do this in the case of your example. In particular this script shows
how to obtain the appropriate value of lw from the "linguistic distance
matrix". I had to make some adjustments in the way this matrix was
presented. I.e. I had to convert it into a proper matrix.
I have rescaled the distances to lie between 0 and 10 to obtain the
corresponding values of the line widths. You might wish to rescale in a different manner.
In respect of colours, there are two issues:
(1) It is not at all clear how you would like to map the "linguistic
distances" to colours.
(2) Unfortunately the code for plot.deldir() is written in a very
kludgy way, whence the "col" argument to segments() cannot be
appropriately passed on in the same manner that the "lw" argument can.
(I wrote the plot.deldir() code a long while ago, when I knew far less about
R programming than I know now! :-))
I will adjust this code and submit a new version of deldir to CRAN
fairly soon.
#
# Demo script
#
# Present the linguistic distances in a useable way.
vldm <- c(308.298557,592.555483,284.256926,141.421356,449.719913,
733.976839,591.141269,282.842712,1.414214,732.562625)
# vldm lists the lower triangle row by row. R fills matrices column by
# column, so fill the upper triangle (same order) and then mirror it.
ldm <- matrix(0,nrow=5,ncol=5)
ldm[row(ldm) < col(ldm)] <- vldm
ldm <- ldm + t(ldm)
rownames(ldm) <- LETTERS[1:5]
colnames(ldm) <- LETTERS[1:5]
# Set up the example data. It makes life much simpler if
# you denote the "x" and "y" coordinates by "x" and "y"!!!
id <- c("A","B","C","D","E")
x_coor <- c(0.5,1,1,1.5,2)
y_coor <- c(5.5,3,7,6.5,5)
# Eschew nomenclature like "my.data". Such nomenclature
# is Micro$oft-ese and is an abomination!!!
demoDat <- data.frame(id = id, x = x_coor, y = y_coor)
# Form the triangulation/tessellation.
library(deldir)
dxy <- deldir(demoDat)
# Plot the triangulation with line widths proportional
# to "linguistic distances". Note that plot.deldir() is
# a *method* for plot, so you do not have to (and shouldn't)
# type the ".deldir" in the plotting command.
plot(dxy,col=0) # This, and plotting with "add=TRUE" below, is
# a kludge to dodge around spurious warnings.
ind <- as.matrix(dxy$delsgs[,c("ind1","ind2")])
lwv <- ldm[ind]
lwv <- 10*lwv/max(lwv) # rescale the line widths so the largest is 10
plot(dxy,wlines="triang",col='grey',wpoints="none",
     lw=lwv,add=TRUE)
with(demoDat,text(x,y,id,col="red",cex=1.5))
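For colours, one possible workaround is to skip plot.deldir() and draw the edges with base segments(), which accepts one colour per segment. The sketch below is only an illustration: the edge list edges is hypothetical (in practice it could be taken from dxy$delsgs[, c("ind1", "ind2")]), and the light-blue-to-dark-red palette is an arbitrary choice for mapping distance to colour.

```r
# Coordinates from the example
pts <- data.frame(id = LETTERS[1:5],
                  x = c(0.5, 1, 1, 1.5, 2),
                  y = c(5.5, 3, 7, 6.5, 5))

# Symmetric distance matrix built from the values in the question
ldm <- matrix(0, 5, 5, dimnames = list(LETTERS[1:5], LETTERS[1:5]))
ldm[upper.tri(ldm)] <- c(308.298557, 592.555483, 284.256926, 141.421356,
                         449.719913, 733.976839, 591.141269, 282.842712,
                         1.414214, 732.562625)
ldm <- ldm + t(ldm)

# Hypothetical edge list (pairs of point indices), for illustration only;
# these are not necessarily the Delaunay edges
edges <- rbind(c(1, 2), c(1, 3), c(1, 4), c(1, 5),
               c(2, 5), c(3, 4), c(4, 5))

# Distance of each edge, mapped onto a 100-step colour ramp
d <- ldm[edges]
pal <- colorRampPalette(c("lightblue", "darkred"))(100)
idx <- 1 + floor(99 * (d - min(d)) / (max(d) - min(d)))
edge_col <- pal[idx]

# Draw the edges one colour each, then label the points
plot(pts$x, pts$y, type = "n", xlab = "x", ylab = "y", asp = 1)
segments(pts$x[edges[, 1]], pts$y[edges[, 1]],
         pts$x[edges[, 2]], pts$y[edges[, 2]],
         col = edge_col, lwd = 3)
text(pts$x, pts$y, pts$id, cex = 1.5)
```

The same d vector could be rescaled and passed as lwd instead of, or in addition to, col.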

R: Sample a matrix for cells close to a specified position

I'm trying to find sites to collect snails by using a semi-random selection method. I have set a 10km2 grid around the region I want to collect snails from, which is broken into 10,000 10m2 cells. I want to randomly sample this grid in R to select 200 field sites.
Randomly sampling a matrix in R is easy enough;
dat <- matrix(1:10000, nrow = 100)
sample(dat, size = 200)
However, I want to bias the sampling to pick cells closer to a single position (representing sites closer to the research station). It's easier to explain this with an image;
The yellow cell with a cross represents the position I want to sample around. The grey shading is the probability of picking a cell in the sample function, with darker cells being more likely to be sampled.
I know I can specify sampling probabilities using the prob argument in sample, but I don't know how to create a 2D probability matrix. Any help would be appreciated, I don't want to do this by hand.
I'm going to do this for a 9 x 6 grid (54 cells), just so it's easier to see what's going on, and sample only 5 of these 54 cells. You can modify this to a 100 x 100 grid where you sample 200 from 10,000 cells.
# Number of rows and columns of the grid (modify these as required)
nx <- 9 # rows
ny <- 6 # columns
# Create coordinate matrix
x <- rep(1:nx, each=ny);x
y <- rep(1:ny, nx);y
xy <- cbind(x, y); xy
# Where is the station? (edit: not snails nest)
Station <- rbind(c(x=3, y=2)) # Change as required
# Determine distance from each grid location to the station
library(SpatialTools)
D <- dist2(xy, Station)
From the help page of dist2
dist2 takes the matrices of coordinates coords1 and coords2 and
returns the inter-Euclidean distances between coordinates.
We can visualize this using the image function.
XY <- (matrix(D, nr=nx, byrow=TRUE))
image(XY) # axes are scaled to 0-1
# Create a scaling function - scales x to lie in [0-1)
scale_prop <- function(x, m=0)
(x - min(x)) / (m + max(x) - min(x))
# Add the coordinates to the grid
text(x=scale_prop(xy[,1]), y=scale_prop(xy[,2]), labels=paste(xy[,1],xy[,2],sep=","))
Lighter tones indicate grids closer to the station at (3,2).
# Sampling probabilities decrease linearly with distance from the station.
# Distances are scaled to lie in [0, 1); using m=1 keeps the scaled maximum
# distance below 1, so the farthest cell does not get probability 0.
prob <- 1 - scale_prop(D, m=1); range (prob)
# Sample from the grid using given probabilities
sam <- sample(1:nrow(xy), size = 5, prob=prob) # Change size as required.
xy[sam,] # These are my 5 samples; yours will differ
     x y
[1,] 4 4
[2,] 7 1
[3,] 3 2
[4,] 5 1
[5,] 5 3
To confirm the sample probabilities are correct, you can simulate many samples and see which coordinates were sampled the most.
snail.sam <- function(nsamples) {
  sam <- sample(1:nrow(xy), size = nsamples, prob=prob)
  apply(xy[sam,], 1, function(x) paste(x[1], x[2], sep=","))
}
SAMPLES <- replicate(10000, snail.sam(5))
tab <- table(SAMPLES)
cols <- colorRampPalette(c("lightblue", "darkblue"))(max(tab))
barplot(table(SAMPLES), horiz=TRUE, las=1, cex.names=0.5,
col=cols[tab])
If using a 100 x 100 grid and the station is located at coordinates (60,70), then the image would look like this, with the sampled grids shown as black dots:
There is a tendency for the points to be located close to the station, although the sampling variability may make this difficult to see. If you want to give even more weight to grids near the station, then you can rescale the probabilities, which I think is ok to do, to save costs on travelling, but these weights need to be incorporated into the analysis when estimating the number of snails in the whole region. Here I've cubed the probabilities just so you can see what happens.
sam <- sample(1:nrow(xy), size = 200, prob=prob^3)
The tendency for the points to be located near the station is now more obvious.
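The whole procedure can also be sketched in base R without SpatialTools, which makes the 2D probability matrix explicit. This is an illustration under assumptions: the station position (60, 70) follows the example above, while the names grid, w, prob_mat and sites are made up for the sketch.

```r
set.seed(1)

nx <- 100  # number of grid rows
ny <- 100  # number of grid columns
grid <- expand.grid(x = 1:nx, y = 1:ny)  # one row per cell, x varies fastest

# Station position (assumed, as in the example above)
sx <- 60
sy <- 70

# Euclidean distance from every cell to the station
d <- sqrt((grid$x - sx)^2 + (grid$y - sy)^2)

# Weight falls linearly with distance; the +1 keeps the farthest cell above 0
w <- 1 - d / (max(d) + 1)

# 2D probability matrix: prob_mat[x, y] is the chance of drawing cell (x, y)
prob_mat <- matrix(w / sum(w), nrow = nx, ncol = ny)

# Draw 200 distinct cells; cubing the weights strengthens the bias
picked <- sample(nrow(grid), size = 200, prob = w^3)
sites <- grid[picked, ]
head(sites)
```

sample() rescales prob internally, so the weights do not need to sum to 1; prob_mat is normalised only so it can be read as a probability surface, for example with image(prob_mat).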
There may be a better way than this but a quick way to do it is to randomly sample on both x and y axis using a distribution (I used the normal - bell shaped distribution, but you can really use any). The trick is to make the mean of the distribution the position of the research station. You can change the bias towards the research station by changing the standard deviation of the distribution.
Then use the randomly selected positions as your x and y coordinates to select the positions.
dat <- matrix(1:10000, nrow = 100)
#randomly selected a position for the research station
rs <- c(80,30)
# you can change the sd to change the bias
x <- round(rnorm(400,mean = rs[1], sd = 10))
y <- round(rnorm(400, mean = rs[2], sd = 10))
position <- rep(NA, 200)
j <- 1
i <- 1
# Some sampled coordinates can fall outside the grid, so I oversampled
# and keep only the first 200 that lie inside the area of interest.
while (j <= 200) {
  if (x[i] >= 1 && x[i] <= 100 && y[i] >= 1 && y[i] <= 100) {
    position[j] <- dat[x[i], y[i]]
    j <- j + 1
  }
  i <- i + 1
}
Plot the results:
plot(x,y, pch = 19)
points(x =80,y = 30, col = "red", pch = 19) # position of the station
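The rejection loop can also be written in vectorised form. The sketch below is a variation rather than the code above: it additionally drops repeated cells with unique(), which the loop does not do, and the names n_target, inside and cells are assumptions made for the example.

```r
set.seed(2)

dat <- matrix(1:10000, nrow = 100)
rs <- c(80, 30)   # position of the research station
n_target <- 200   # number of field sites wanted

# Oversample, since some draws fall outside the grid or repeat a cell
x <- round(rnorm(4 * n_target, mean = rs[1], sd = 10))
y <- round(rnorm(4 * n_target, mean = rs[2], sd = 10))

# Keep only coordinates that are valid matrix indices
inside <- x >= 1 & x <= 100 & y >= 1 & y <= 100

# A two-column index matrix picks dat[x, y] pair by pair
cells <- unique(dat[cbind(x[inside], y[inside])])

# First n_target distinct cells, in the order they were drawn
position <- head(cells, n_target)
length(position)
```

If fewer than n_target distinct cells survive, which can happen with a small standard deviation, the oversampling factor would need to be raised.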

Subset 3D matrix using polygon coordinates

I'm working on some bioacoustical analysis and got stuck on an issue that I believe can be worked out mathematically. I'll use a sound sample from the seewave package:
library(seewave)
library(tuneR)
data(tico)
By storing a spectrogram (i.e. graphic representation of the sound wave tico) in an R object, we can now deal with the wave file computationally.
s <- spectro(tico, plot=F)
class(s)
>[1] "list"
length(s)
>[1] 3
The resulting object s consists of two numeric vectors, x = s$time and y = s$freq, representing the X and Y axes, respectively, and a matrix z = s$amp of amplitude values with dimensions matching the lengths of x and y. z is effectively a 3D surface that can be plotted using persp3D (plot3D), plot_ly (plotly) or plot3d (rgl). Alternatively, the wave file can be plotted in 3D using seewave if one wishes to visualize it as an interactive rgl plot.
spectro3D(tico)
That being said, the analysis I'm conducting aims to calculate contours of relative amplitude:
con <- contourLines(x=s$time, y=s$freq, z=t(s$amp), levels=seq(-25, -25, 1))
Select the longest contour:
n.con <- numeric(length(con))
for(i in 1:length(con)) n.con[i] <- length(con[[i]]$x)
n.max <- which.max(n.con)
con.max <- con[[n.max]]
And then plot the selected contour against the spectrogram of tico:
spectro(tico, grid=F, osc=F, scale=F)
polygon(x=con.max$x, y=con.max$y, lwd=2)
Now comes the tricky part. I must find a way to "subset" the matrix of amplitude values s$amp using the coordinates of the longest contour con.max. What I aim to achieve is a new matrix containing only the amplitude values inside the polygon. The remaining parts of the spectrogram should then appear as blank spaces.
One approach I thought could work would be to create a loop that replaces every value outside the polygon with a given amplitude value (e.g. -25 dB). I once used a similar approach to remove the values below -30 dB and it worked out perfectly:
for(i in 1:length(s$amp)){if(s$amp[i] == -Inf |s$amp[i] <= -30)
{s$amp[i] <- -30}}
Another thought would be to create a new matrix with the same dimensions as s$amp, subset s$amp using the coordinates of the contour, then place the subset into the new matrix. Roughly:
mt <- matrix(-30, nrow=nrow(s$amp), ncol = ncol(s$amp))
sb <- s$amp[con.max$y, con.max$x]
new.mt <- c(mt, sb)
s$amp <- new.mt
I'll appreciate any help.
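One way to express "keep only the values inside the polygon" is a point-in-polygon test applied to the coordinates of every matrix cell. The sketch below is an illustration in base R on synthetic data, not a tested solution for seewave objects: in_polygon is a hand-written ray-casting helper, the vectors time and freq and the matrix amp are stand-ins for s$time, s$freq and s$amp, and the rectangular polygon is a hypothetical replacement for con.max. It assumes rows of amp follow frequency and columns follow time, which matches the use of t(s$amp) in contourLines().

```r
# Ray-casting test: TRUE where the point (px, py) lies inside the
# polygon with vertices (vx, vy); vectorised over the points
in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- rep(FALSE, length(px))
  j <- n
  for (i in seq_len(n)) {
    # Does a horizontal ray from the point cross edge (j -> i)?
    crosses <- ((vy[i] > py) != (vy[j] > py)) &
      (px < (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i])
    inside <- xor(inside, crosses)
    j <- i
  }
  inside
}

# Synthetic stand-ins for s$time, s$freq and s$amp
time <- seq(0, 1, length.out = 50)
freq <- seq(0, 10, length.out = 40)
amp <- matrix(-10, nrow = length(freq), ncol = length(time))

# Hypothetical contour: a rectangle in (time, freq) coordinates
poly_x <- c(0.2, 0.8, 0.8, 0.2)
poly_y <- c(2, 2, 8, 8)

# Coordinates of every cell, in the column-major order of amp
cell_t <- rep(time, each = length(freq))
cell_f <- rep(freq, times = length(time))
keep <- in_polygon(cell_t, cell_f, poly_x, poly_y)

# Blank out everything outside the polygon
masked <- amp
masked[!keep] <- NA
```

Setting the outside cells to NA leaves them blank in most plotting functions; assigning a floor value such as -30 instead would mimic the loop shown earlier.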

R: How to extract values from contiguous raster cells that are not touched by SpatialLines?

I've been trying to extract values from a single attribute raster (area, in m2) that overlaps with lines (that is, a .shp SpatialLines).
The problem is that, along these lines, my raster sometimes goes from one to several contiguous cells in all directions. Using the extract function only values from cells that are touched by the lines are extracted. Thus, when I add up the extracted values from all lines a significant amount of area (m2) is lost due to cells that were not touched by the line and therefore values were not extracted.
I tried to work it around by:
Step 1 - first aggregating my raster to a lower resolution (i.e. increasing the fact argument) and then
Step 2 - rasterizing the lines using this aggregated raster (created in step 1) as a mold to make sure the rasterized lines would get thick enough to cover the horizontal spread of cells in my original resolution raster.
Step 3 - Then I resample the rasterized lines (created in step 2) back to the original resolution I started with.
Step 4 - Finally, extracted the values from the resampled rasterized lines (created in step 3).
However, it didn't quite work as now the total area (m2) varies according to the fact="" value I use when first aggregating the raster (in step 1).
I would really appreciate it if anyone who has dealt with a similar problem could help me out here. Here is the code I've been running to try to get it to work:
# input raster file
g.025 <- raster("ras.asc")
g.1 <- aggregate(g.025, fact=2, fun=sum)
# input SpatialLines
Spline1 <- readOGR("/Users/xxxxx.shp")
Spline2 <- readOGR("/Users/xxxxx.shp")
Spline3 <- readOGR("/Users/xxxxx.shp")
# rasterizing using low resolution raster (aggregated)
c1 <- rasterize(Spline1, g.1, field=Spline1$type, fun=sum)
c2 <- rasterize(Spline2, g.1, field=Spline2$type, fun=sum)
c3 <- rasterize(Spline3, g.1, field=Spline3$type, fun=sum)
# resampling back to higher resolution
c1 <- resample(c1, g.025)
c2 <- resample(c2, g.025)
c3 <- resample(c3, g.025)
# preparing to extract area (m2) values from raster “g.025”
c1tab <- as.data.frame(c1, xy=T)
c2tab <- as.data.frame(c2, xy=T)
c3tab <- as.data.frame(c3, xy=T)
c1tab <- c1tab[which(is.na(c1tab$layer)!=T),]
c2tab <- c2tab[which(is.na(c2tab$layer)!=T),]
c3tab <- c3tab[which(is.na(c3tab$layer)!=T),]
# extracting area (m2) values from raster “g.025”
c1tab[,4] <- extract(g.025, c1tab[,1:2])
c2tab[,4] <- extract(g.025, c2tab[,1:2])
c3tab[,4] <- extract(g.025, c3tab[,1:2])
names(c1tab)[4] <- "area_m2"
names(c2tab)[4] <- "area_m2"
names(c3tab)[4] <- "area_m2"
# sum total area (m2)
c1_area <- sum(c1tab$area_m2)
c2_area <- sum(c2tab$area_m2)
c3_area <- sum(c3tab$area_m2)
tot_area <- sum(c1_area, c2_area, c3_area)
Thanks!
Andre

Finding the best matching pairwise points from 2 vectors

I have 2 lists with X,Y coordinates of points.
List 1 contains more points than list 2.
The task is to find pairs of points in a way that the overall euclidean distance is minimized.
I have working code, but I don't know if this is the best way, and I would like hints on what I can improve, either for the result (a better algorithm to find the minimum) or for speed, because the lists have about 2000 elements each.
The rounding in the sample vectors is there so that some points end up with identical distances.
With the "rdist" function, all distances are generated in "distances". Then the minimum in the matrix is used to link 2 points ("dist_min"). All distances involving these 2 points are then replaced by NA, and the loop continues by searching for the next minimum until every point of list 2 has a point from list 1.
At the end I have added a plot for visualization.
require(fields)
set.seed(1)
x1y1.data <- matrix(round(runif(200*2),2), ncol = 2) # generate 1st set of points
x2y2.data <- matrix(round(runif(100*2),2), ncol = 2) # generate 2nd set of points
distances <- rdist(x1y1.data, x2y2.data)
dist_min <- matrix(data=NA,nrow=ncol(distances),ncol=7) # prepare resulting vector with 7 columns
for(i in 1:ncol(distances))
{
  inds <- which(distances == min(distances,na.rm = TRUE), arr.ind=TRUE)
  dist_min[i,1] <- inds[1,1] # row of point (use 1st element of inds if points have same distance)
  dist_min[i,2] <- inds[1,2] # column of point (use 1st element of inds if points have same distance)
  dist_min[i,3] <- distances[inds[1,1],inds[1,2]] # distance of point
  dist_min[i,4] <- x1y1.data[inds[1,1],1] # X1 coordinate of 1st point
  dist_min[i,5] <- x1y1.data[inds[1,1],2] # Y1 coordinate of 1st point
  dist_min[i,6] <- x2y2.data[inds[1,2],1] # X2 coordinate of 2nd point
  dist_min[i,7] <- x2y2.data[inds[1,2],2] # Y2 coordinate of 2nd point
  distances[inds[1,1],] <- NA # remove row (fill with NA), where minimum was found
  distances[,inds[1,2]] <- NA # remove column (fill with NA), where minimum was found
}
# plot 1st set of points
# print mean distance as measure for optimization
plot(x1y1.data,col="blue",main="mean of min_distances",sub=mean(dist_min[,3],na.rm=TRUE))
points(x2y2.data,col="red") # plot 2nd set of points
segments(dist_min[,4],dist_min[,5],dist_min[,6],dist_min[,7]) # connect pairwise according found minimal distance
This is a fundamental problem in combinatorial optimization known as the assignment problem. One approach to solving the assignment problem is the Hungarian algorithm which is implemented in the R package clue:
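require(clue)
# The loop in the question overwrites "distances" with NA, so recompute it first
distances <- rdist(x1y1.data, x2y2.data)
sol <- solve_LSAP(t(distances))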
We can verify that it outperforms the naive solution:
mean(dist_min[,3])
# [1] 0.05696033
mean(sqrt(
(x2y2.data[,1] - x1y1.data[sol, 1])^2 +
(x2y2.data[,2] - x1y1.data[sol, 2])^2))
#[1] 0.05194625
And we can construct a similar plot to the one in your question:
plot(x1y1.data,col="blue")
points(x2y2.data,col="red")
segments(x2y2.data[,1], x2y2.data[,2], x1y1.data[sol, 1], x1y1.data[sol, 2])
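A tiny example shows why repeatedly taking the global minimum can miss the best overall pairing. The cost matrix below is made up for illustration, and greedy_total is a hypothetical helper that mirrors the loop from the question; the optimum is found by brute force, which is only feasible because the example is so small.

```r
# Rows = points of the smaller set, columns = candidate partners
# (made-up distances, for illustration only)
cost <- matrix(c(1, 2, 9,
                 2, 4, 9), nrow = 2, byrow = TRUE)

# Greedy matching: take the smallest remaining distance, then block
# its row and column, as in the loop from the question
greedy_total <- function(cost) {
  total <- 0
  for (k in seq_len(nrow(cost))) {
    idx <- which(cost == min(cost, na.rm = TRUE), arr.ind = TRUE)[1, ]
    total <- total + cost[idx[1], idx[2]]
    cost[idx[1], ] <- NA
    cost[, idx[2]] <- NA
  }
  total
}

# Exhaustive search over all pairs of distinct columns
best_total <- Inf
for (a in 1:3) {
  for (b in setdiff(1:3, a)) {
    best_total <- min(best_total, cost[1, a] + cost[2, b])
  }
}

greedy_total(cost)  # 5: pairs (1,1) and (2,2)
best_total          # 4: pairs (1,2) and (2,1)
```

The greedy choice grabs the cheapest single pair and forces the second row into an expensive partner, whereas an assignment solver such as solve_LSAP() minimises the sum over all pairs.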
