Analyzing octopus catches with the linearK function in R

I hope you can help me with a problem I can't find a way to overcome. Sorry if I make some mistakes while writing this post; my English is a bit rusty right now.
Here is the situation. I have .shp data that I want to analyze in R. Each .shp is either a set of lines representing the lines of traps we set to catch octopuses, or a set of points located directly on those lines, representing where we captured one.
The question I'm trying to answer is: are the octopuses statistically clustered or not?
After a bit of investigation, it seems I need to use R and its linearK function to answer that question, using the libraries spatstat, maptools and sp.
Here is the code I'm using in RStudio:
Loading the libraries
library(spatstat)
library(maptools)
library(sp)
Creating a linnet object with the track
t1 <- as.linnet(readShapeSpatial("./20170518/t1.shp"))
I get the following warnings, but it seems to work:
Warning messages:
1: use rgdal::readOGR or sf::st_read
2: use rgdal::readOGR or sf::st_read
Plotting it to be sure everything is ok
plot(t1)
Creating a ppp object with the points
p1 <- as.ppp(readShapeSpatial("./20170518/p1.shp"))
I get the same warning here, but the real problems start when I try to plot it:
> plot(p1)
Error in if (!is.vector(xrange) || length(xrange) != 2 || xrange[2L] < :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: Interpretation of arguments maxsize and markscale has changed (in spatstat version 1.37-0 and later). Size of a circle is now measured by its diameter.
2: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
3: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
4: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
5: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
6: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
7: In plot.ppp(x, ..., multiplot = FALSE, do.plot = FALSE) :
All mark values are NA; plotting locations only.
Now what is left is to join the objects into an lpp object and analyze it with the linearK function:
> pt1 <- lpp(p1,t1)
> linearK(pt1)
Function value object (class ‘fv’)
for the function r -> K[L](r)
......................................
Math.label Description
r r distance argument r
est {hat(K)[L]}(r) estimated K[L](r)
......................................
Default plot formula: .~r
where “.” stands for ‘est’
Recommended range of argument r: [0, 815.64]
Available range of argument r: [0, 815.64]
This is my situation right now. What I don't know is why the plot function is not working with my ppp object, and how to understand the return value of the linearK function. help(linearK) didn't provide any clue. Since I have a lot of tracks, each with its own set of points, my desired outcome would be some kind of summary like: x tracks analyzed, a grouped, b dispersed, c unknown.
Thank you for your time; I'll greatly appreciate it if you can help me solve this problem.
Edit: Here is a link to a zip file containing all the .shp files of one day, both tracks and points, and a txt file with my code. https://drive.google.com/open?id=0B0uvwT-2l4A5ODJpOTdCekIxWUU

First two pieces of general advice: (1) each time you create a complicated object, print it at the terminal, to see if it is what you expected. (2) When you get an error, immediately type traceback() and copy the output. This will reveal exactly where the error is detected.
A ppp object must include a specification of the study region (window). In your code, p1 is created from data of class SpatialPointsDataFrame, which do not include a specification of the study region; the conversion function as.ppp.SpatialPointsDataFrame therefore guesses the window by taking the bounding box of the coordinates. Unfortunately, in your example there is only one data point in p1, so the default bounding box is a rectangle of width 0 and height 0. [This would have been revealed by printing p1.] Such objects can usually be handled by spatstat, but this particular object triggers a bug in the function plot.solist, which expects windows to have non-zero size. I will fix the bug, but...
In your case, I suggest you do
Window(p1) <- Window(t1)
immediately after creating p1. This will ensure that p1 has the window that you probably intended.
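For instance, a minimal sketch of the corrected sequence (paths as in the question; reading via sf::st_read, as the deprecation warnings suggest, is an assumption on my part):
library(spatstat)
library(maptools)  # provides the as.ppp/as.linnet coercions for sp objects
library(sf)
# read with sf, coerce to sp classes, then to spatstat classes
t1 <- as.linnet(as(st_read("./20170518/t1.shp"), "Spatial"))
p1 <- as.ppp(as(st_read("./20170518/p1.shp"), "Spatial"))
print(p1)                 # inspect the guessed window before going further
Window(p1) <- Window(t1)  # give p1 the window you probably intended
plot(p1)
pt1 <- lpp(p1, t1)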
If all else fails, read the spatstat vignette on shapefiles...

I have managed to find a solution. As Adrian Baddeley noticed, there was a problem with the owin object. That problem seems to be bypassed (not really solved) if I create the ppp object manually instead of converting my set of points.
I have also replaced the readShapeSpatial function with rgdal::readOGR, since the former was deprecated; that was the reason for the warnings I was getting.
This is the R script I'm using right now, commented to clarify:
#first install spatstat, maptools and sp
#load them
library(spatstat)
library(maptools)
library(sp)
#create an array of folders; will add more when everything works fine
folders = c("20170518")
for (f in folders) {
  #read all shp from that folder, both points and tracks
  pointfiles <- list.files(paste("./", f, "/points", sep = ""), pattern = "*.shp$")
  trackfiles <- list.files(paste("./", f, "/tracks", sep = ""), pattern = "*.shp$")
  #for each point and track couple
  for (i in 1:length(pointfiles)) {
    #create a linnet object with the track
    t <- as.linnet(rgdal::readOGR(paste("./", f, "/tracks/", trackfiles[i], sep = "")))
    #plot(t)
    #create a ppp object for each set of points
    pre_p <- rgdal::readOGR(paste("./", f, "/points/", pointfiles[i], sep = ""))
    #plot(pre_p)
    #obtain the coordinates of the current set of points
    c <- coordinates(pre_p)
    #create vectors of x and y coords
    xc = c()
    yc = c()
    #not a very good way to fill my vectors, but it works for my study area
    for (v in c) {
      if (v > 4000000) { yc <- c(yc, v) }
      else if (v < 4000000 && v > 700000) { xc <- c(xc, v) }
    }
    print(xc)
    print(yc)
    #create a ppp object using the vectors of x and y coords, and a window
    #object extracted from my set of points
    p = ppp(xc, yc, window = Window(as.ppp(pre_p)))
    #join them into an lpp object
    pt <- lpp(p, t)
    #plot(pt)
    #analyze it with the linearK function, nsim = 9 for testing purposes
    #envelope.lpp is the method for analyzing linear point patterns
    assign(paste("results", f, i, sep = "_"), envelope.lpp(pt, nsim = 9, fun = linearK))
  } #end for each points & track set
} #end for each day of study
So, as you can see, this script tests each couple of points and track for CSR, for each day, and is working fine right now. Unfortunately I have not managed to create a report or report-like summary of the results yet (or even to fully understand them); I'll keep working on that. Of course I can use any advice you have, since this is my first try with R and many newbie mistakes will happen.
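As a starting point for that summary, here is a rough sketch (my own idea, with an illustrative decision rule, not a tested recipe): each envelope result is an fv object with obs, lo and hi columns, so the results saved by assign() above can be gathered and tallied.
# classify one envelope result; the decision rule here is only illustrative
classify <- function(env) {
  if (any(env$obs > env$hi, na.rm = TRUE)) "grouped"          # above the envelope
  else if (any(env$obs < env$lo, na.rm = TRUE)) "dispersed"   # below the envelope
  else "unknown"                                              # inside the envelope
}
results <- mget(ls(pattern = "^results_"))     # gather the assigned objects
verdicts <- vapply(results, classify, character(1))
table(verdicts)  # e.g. x tracks analyzed: a grouped, b dispersed, c unknown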
The script and the shp files with the updated folder structure can be found here (113 KB).


Drawing GRTS points within polygons with spsurvey

In previous versions of the spsurvey package, it was possible to draw random points within polygons in a shapefile using a somewhat complicated design specification. (See here for an example).
The newly updated version of spsurvey (5.0.1) appears very user-friendly, except I cannot figure out how to perform a GRTS draw of more than one point within polygons. Below is an example:
Suppose I want to draw 10 random points using GRTS within the states of Montana and Wyoming. The grts() call requires an sf object, so we can get an sf object first.
library(raster)
library(sf)
library(spsurvey)
## Get state outlines
US <- raster::getData(country = "United States", level = 1)
States.outline <- US[US$NAME_1 %in% c("Montana","Wyoming"),]
# convert to sf object
states.out <- st_as_sf(States.outline)
Then, if we want to stratify by state, and we want ten points from each, we need:
# Define the number of points to draw from each state
strata_n <- c(Montana = 10, Wyoming = 10)
The strata_n object then gets fed into the grts() call, with the NAME_1 variable being the state name.
# Attempt to make grts draw
grts(sframe = states.out,
     stratum_var = "NAME_1",
     n_base = strata_n)
This returns an error message:
During the check of the input to grtspts, one or more errors were identified. Enter the following command to view all input error messages: stopprnt(). To view a subset of the errors (e.g., errors 1 and 5) enter stopprnt(m=c(1,5))
Running stopprnt() gives the following message:
Input Error Message: n_base: Each stratum must have a sample size no larger than the number of rows in 'sframe' representing that stratum.
This is a wonderfully clear message -- we can't draw more than one point from each polygon because the sf object only has a single row per state.
So: with the new and improved spsurvey package, how does one draw multiple points from within a polygon? Any tips or direction would be appreciated.
This is a bug. I have updated the development version, which can be installed (after installing the remotes package) by running
remotes::install_github("USEPA/spsurvey", ref = "develop")
It will likely be a few weeks before the changes in spsurvey are reflected on CRAN. Thanks for finding this.
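After installing the patched build, the original call from the question should work unchanged; a hedged sketch of the expected usage (sites_base as the element holding the drawn sites follows my reading of the spsurvey 5.x docs):
library(spsurvey)
# rerun the stratified draw against the patched package
draw <- grts(sframe = states.out, stratum_var = "NAME_1", n_base = strata_n)
draw$sites_base  # an sf object; expect 10 base sites per state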

Normalizing an R stars object by grid area?

First post :)
I've been transitioning my R code from sp to sf/stars, and one thing I'm still trying to grasp is how to account for grid-cell area.
Here's some example code to explain what I mean.
library(stars)
library(tidyverse)
# Reading in an example tif file, from stars() vignette
tif = system.file("tif/L7_ETMs.tif", package = "stars")
x = read_stars(tif)
x
# Get areas for each grid of the x object. Returns stars object with "area" in units of [m^2]
x_area <- st_area(x)
x_area
I tried loosely adapting code from this vignette (https://github.com/r-spatial/stars/blob/master/vignettes/stars5.Rmd) to divide each value in x by its grid area, and it's not working as expected (perhaps because my objects are stars and not sf?).
x$test1 = x$L7_ETMs.tif / x_area # Some computationally intensive calculation seems to happen, but doesn't produce the results I expect?
x$test1 = x$L7_ETMs.tif / x_area$area # Throws error, "non-conformable arrays"
What does seem to work is the following.
x %>%
  mutate(test1 = L7_ETMs.tif / units::set_units(as.numeric(x_area$area), m^2))
Here are the concerns I have with this code.
I worry that as I turn x_area$area (a matrix of areas indexed by lat/lon) into a numeric vector, I may mess up the matching between each grid cell and its area. I did some rough testing to see if the areas match up the way I expect them to, but I can't escape the worry that this could lead to errors that are difficult to catch (see the check sketched below).
It just doesn't seem clean that I start with x_area in the correct units, only to remove and then set the units again during the computation.
Can someone suggest a "cleaner" implementation for what I'm trying to do, i.e., multiplying or dividing grids by their areas while maintaining units throughout? Or convince me that the code I have is fine?
Thanks!
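Regarding worry (1), a cheap sanity check before flattening the area matrix might look like this (a sketch; it only confirms the two objects share grid dimensions and bounding box):
# the area grid should line up cell-for-cell with the image grid
stopifnot(identical(dim(x_area), dim(x)[c("x", "y")]))
stopifnot(isTRUE(all.equal(st_bbox(x_area), st_bbox(x))))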
I do not know how to improve the stars code, but you can compare the results you get with this:
tif <- system.file("tif/L7_ETMs.tif", package = "stars")
library(terra)
r <- rast(tif)
a <- cellSize(r, sum=FALSE)
x <- r / a
With planar data you could do the following, when it is safe to assume there is no distortion (generally not the case, but it can be):
y <- r / prod(res(r))
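If in doubt, the shortcut can be guarded with terra's lon/lat check (a sketch; is.lonlat() returns TRUE for unprojected data):
# prod(res(r)) is a constant cell area, which only makes sense for planar rasters
if (!is.lonlat(r)) {
  y <- r / prod(res(r))
} else {
  y <- r / cellSize(r, sum = FALSE)  # fall back to true cell areas
}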

Is there a way to apply PCA to two raster stacks (with the same variables)

I am trying to run PCA on some bioclimatic variables in R, specifically, current and future projections of bioclimatic variables from worldclim.org.
The problem lies in that prcomp can only work with one rasterstack. I wanted prcomp to work on two rasterstacks at the same time: rasterstacks which have the exact same set of variables (names and extent), but different cell values.
I do have a roundabout method to solve this, which is to shift the extent of the future raster layers and merge them with the current ones into one extensive set of raster layers. But this gives me a lot of projection issues with other data that I wish to use alongside this.
I understand this is not entirely clear, but basically: PCA on two rasterstacks with the same variables and the same coordinates, without having to shift extents.
Thank you!
Edits:
I could not figure out how to get the data or create example data, thus I did not initially include example code. I will try to be clearer here.
filesC # location of bioclimate files for current
[1] "bio01.asc" "bio02.asc" "bio03.asc" "bio04.asc" "bio05.asc"
filesF # location of bioclimate files for the future
[1] "bio01.asc" "bio02.asc" "bio03.asc" "bio04.asc" "bio05.asc"
# note they have the exact same variables.
rasC <- stack(filesC)
rasF <- stack(filesF)
rasC@extent
#class : Extent
#xmin : 116.95
#xmax : 126.6
#ymin : 4.65
#ymax : 21.11667
rasF@extent
#class : Extent
#xmin : 116.95
#xmax : 126.6
#ymin : 4.65
#ymax : 21.11667
# and exact same extent
So currently I am doing this.
pcaC <- prcomp(rasC, scale = T)
FutPCs <- predict(rasF, pcaC)
# creating PCs of rasF based on the pca from rasC
However, I would like to apply the PCA to both rasterstacks simultaneously, building a "PCA formula" based on the variables from both the current and future bioclimatic variables,
like so...
pca <- prcomp(rasC, rasF, scale = T)
CurPCs <- predict(rasC, pca)
FutPCs <- predict(rasF, pca)
Hope this is clearer!
To people who come across this post with similar issues: I have found a simple answer to this problem.
prcomp does not work directly on the rasters themselves, but rather on a matrix of (usually random) sample points from the raster.
To merge the PCA of the two rasterstacks, I merely sampled equal numbers of points from each raster stack and bound them together.
# rasC and rasF are the RasterStacks created above
srC <- sampleRandom(rasC, 10000)
srF <- sampleRandom(rasF, 10000)
srCF <- rbind(srC, srF)
pcaCF <- prcomp(srCF, scale = T)
From there, I can predict the new PCs based on the combined PCA of the two datasets.
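For completeness, a sketch of that prediction step, following the predict() pattern from the question (the index argument and the choice of three components are mine, for illustration):
# PCs for both time slices from the single combined PCA
CurPCs <- predict(rasC, pcaCF, index = 1:3)
FutPCs <- predict(rasF, pcaCF, index = 1:3)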
Did not realise there was such a step to take, but at least I think it is solved!

R: How do I loop through spatial points with a specific buffer?

My problem is quite difficult to describe, so I hope I can make my question as clear as possible.
I use the rLiDAR package to load a .las file into R and afterwards convert it into a SpatialPointsDataFrame using the sp package.
So my SpatialPointsDataFrame is quite dense.
Now I want to define a buffer of 0.5 meters and iterate with it (the buffer) through the points, always choosing the point with the highest Z value within the buffer as the next point to jump to. This should be repeated until there isn't any point within the buffer with a higher Z value than the current one. All values (or perhaps just the X and Y values) of this "found" point should then be written into a list/dataframe, and the process should be repeated until all such highest points are found.
That's the code I've got so far:
library(rLiDAR)
library(sp)
rLAS <- readLAS("Test.las", short = FALSE)
PointCloud <- data.frame(rLAS)
coordinates(PointCloud) <- c("X", "Y")
Well, I googled extensively but could not find any clues on how to proceed further...
I don't even know which packages could be of help; I guess perhaps spatstat, as my question probably falls under spatial point pattern analysis.
Does anyone have some ideas how to achieve something like that in R? Or is something like that not possible? (Do I perhaps have to switch to Python to make something like this work?)
Help would be gladly appreciated.
If you want to get the set of points which are the local maxima within a 0.5m radius circle around each point, this should work. The gist of it is:
Convert the LAS points to a SpatialPointsDataFrame
Create a buffered polygon set with overlapping polygons
Loop through all buffered polygons and find the desired element within the buffer -- in your case, it's the one with the maximum height.
Code below:
library(rLiDAR)
library(sp)
library(rgeos)
rLAS <- readLAS("Test.las",short=FALSE)
PointCloud <- data.frame(rLAS)
coordinates(PointCloud) <- c("X", "Y")
Finish creating the SpatialPointsDataFrame from the LAS source; the coordinates() call above already promoted PointCloud to a SpatialPointsDataFrame, so it can be used directly. I'm assuming the field with the point height is PointCloud$value.
pointCloudSpdf <- PointCloud
Use the rgeos library for the intersection. It's important to have byid=TRUE, or the polygons will get merged where they intersect:
bufferedPoints <- gBuffer(pointCloudSpdf, width = 0.5, byid = TRUE)
# Save our local maxima state (this will be updated)
localMaxes <- rep(FALSE, nrow(pointCloudSpdf@data))
for (i in 1:nrow(bufferedPoints@data)) {
  bufPolygons <- bufferedPoints@polygons[[i]]
  bufSpPolygons <- SpatialPolygons(list(bufPolygons))
  bufSpPolygonDf <- SpatialPolygonsDataFrame(bufSpPolygons, bufferedPoints@data[i, ])
  # over() returns one row per point; NA means the point is outside the buffer
  ptsInBuffer <- which(!is.na(over(pointCloudSpdf, bufSpPolygonDf)[, 1]))
  # I'm assuming `value` is the field name containing the point height;
  # index back into the full point set, not just the buffer subset
  localMax <- ptsInBuffer[order(pointCloudSpdf@data$value[ptsInBuffer], decreasing = TRUE)[1]]
  localMaxes[localMax] <- TRUE
}
localMaxPointCloudDf <- pointCloudSpdf@data[localMaxes, ]
Now localMaxPointCloudDf should contain the data from the original points that are local maxima. Just a warning -- this isn't going to be super fast if you have a lot of points. If that ends up being a concern, you may want to be smarter about pre-filtering your points using a smaller grid and extract() from the raster package.
That would look something like this:
Make the cell size small enough that each 0.5m buffer will intersect at least 4 raster cells -- err on the smaller side, since we are comparing circles to squares.
library(raster)
numRows <- (extent(pointCloudSpdf)@ymax - extent(pointCloudSpdf)@ymin) / 0.2
numCols <- (extent(pointCloudSpdf)@xmax - extent(pointCloudSpdf)@xmin) / 0.2
# give the empty raster the point cloud's extent so rasterize() lines up
emptyRaster <- raster(extent(pointCloudSpdf), nrow = numRows, ncol = numCols)
rasterize will create a grid with the maximum value of the given field within a cell. Because of the square/circle mismatch this is only a starting point to filter out obvious non-maxima. After this we will have a raster in which all the local maxima are represented by cells. However, we won't know which cells are maxima in the 0.5m radius and we don't know which point in the original feature layer they came from.
r <- rasterize(pointCloudSpdf, emptyRaster, "value", fun = "max")
extract will give us raster values (i.e., the highest value for each cell) that each point intersects. Recall from above that all the local maxima will be in this set, although some values will not be 0.5m radius local maxima.
rasterMaxes <- extract(r,pointCloudSpdf)
To match up the original points with the raster maxima, just subtract the raster value at each point from that point's value. If the result is 0, the values are the same and we have a potential maximum. Note that at this point we are only matching the points back to the raster -- we will still have to throw some of these out because they are "under" a 0.5m radius with a higher local max even though they are the max in their 0.2m x 0.2m cell.
potentialMaxima <- which(pointCloudSpdf@data$value - rasterMaxes == 0)
Next, just subset the original SpatialPointsDataFrame and we'll do the more exhaustive and accurate iteration over this subset of points since we should have thrown out a bunch of points which could not have been maxima.
potentialMaximaCoords <- coordinates(pointCloudSpdf)[potentialMaxima, ]
# using the data.frame() constructor because my example has only one column
potentialMaximaDf <- data.frame(pointCloudSpdf@data[potentialMaxima, ])
potentialMaximaSpdf <- SpatialPointsDataFrame(potentialMaximaCoords, potentialMaximaDf)
The rest of the algorithm is the same but we are buffering the smaller dataset and iterating over it:
bufferedPoints <- gBuffer(potentialMaximaSpdf, width = 0.5, byid = TRUE)
# Save our local maxima state (this will be updated)
localMaxes <- rep(FALSE, nrow(pointCloudSpdf@data))
for (i in 1:nrow(bufferedPoints@data)) {
  bufPolygons <- bufferedPoints@polygons[[i]]
  bufSpPolygons <- SpatialPolygons(list(bufPolygons))
  bufSpPolygonDf <- SpatialPolygonsDataFrame(bufSpPolygons, bufferedPoints@data[i, ])
  ptsInBuffer <- which(!is.na(over(pointCloudSpdf, bufSpPolygonDf)[, 1]))
  # index back into the full point set, as above
  localMax <- ptsInBuffer[order(pointCloudSpdf@data$value[ptsInBuffer], decreasing = TRUE)[1]]
  localMaxes[localMax] <- TRUE
}
localMaxPointCloudDf <- pointCloudSpdf@data[localMaxes, ]

Use Rcartogram on a SpatialPolygonsDataFrame object

I'm trying to do the same thing asked in this question, Cartogram + choropleth map in R, but starting from a SpatialPolygonsDataFrame and hoping to end up with the same type of object.
I could save the object as a shapefile, use ScapeToad, reopen it and convert back, but I'd rather have it all within R so that the procedure is fully reproducible, and so that I can code dozens of variations automatically.
I've forked the Rcartogram code on github and added my efforts so far here.
Essentially what this demo does is create a SpatialGrid over the map, look up the population density at each point of the grid and convert this to a density matrix in the format required for cartogram() to work on. So far so good.
But, how to interpolate the original map points based on the output of cartogram()?
There are two problems here. The first is to get the map and grid into the same units to allow interpolation. The second is to access every point of every polygon, interpolate it, and keep them all in right order.
The grid is in grid units and the map is in map units (long/lat in the example). Either the grid must be projected into long/lat, or the map into grid units. My thought is to make a fake CRS and use this along with the spTransform() function in package rgdal, since this handles every point in the object with minimal fuss.
Accessing every point is difficult because they are several layers down in the SpPDF object: object@polygons[[i]]@Polygons[[j]]@coords, I think. Any ideas how to access these while keeping the structure of the overall map intact?
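For reference, a bare-bones sketch of walking every vertex in place (slot names as in sp's class documentation; transform_coords is a hypothetical helper of mine, and cached slots such as bbox and labpt are not updated here and would need recomputing):
library(sp)
# apply a coordinate transform f to every vertex of a SpatialPolygonsDataFrame,
# keeping ring and polygon order intact
transform_coords <- function(spdf, f) {
  for (i in seq_along(spdf@polygons)) {
    rings <- spdf@polygons[[i]]@Polygons
    for (j in seq_along(rings)) {
      rings[[j]]@coords <- f(rings[[j]]@coords)
    }
    spdf@polygons[[i]]@Polygons <- rings
  }
  spdf
}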
This problem can be solved with the getcartr package, available on Chris Brunsdon's GitHub, as beautifully explicated in this blog post.
The quick.carto function does exactly what you want -- takes a SpatialPolygonsDataFrame as input and has a SpatialPolygonsDataFrame as output.
Reproducing the essence of the example in the blog post here in case the link goes dead, with my own style mixed in & typos fixed:
(Shapefile; World Bank population data)
library(getcartr)
library(maptools)
library(data.table)
world <- readShapePoly("TM_WORLD_BORDERS-0.3.shp")
#I use data.table, see blog post if you want a base approach;
# data.table wonks may be struck by the following step as seeming odd;
# see here: http://stackoverflow.com/questions/32380338
# and here: https://github.com/Rdatatable/data.table/issues/1310
# for some background on what's going on.
world@data <- setDT(world@data)
world.pop <- fread("sp.pop.totl_Indicator_en_csv_v2.csv",
                   select = c("Country Code", "2013"),
                   col.names = c("ISO3", "pop"))
world@data[world.pop, Population := as.numeric(i.pop), on = "ISO3"]
#calling quick.carto has internal calls to the
# necessary functions from Rcartogram
world.carto <- quick.carto(world, world$Population, blur = 0)
#plotting with a color scale
x <- world@data[!is.na(Population), log10(Population)]
ramp <- colorRampPalette(c("navy", "deepskyblue"))(21L)
xseq <- seq(from = min(x), to = max(x), length.out = 21L)
#annoying to deal with NAs...
cols <- ramp[sapply(x, function(y)
  if (length(z <- which.min(abs(xseq - y)))) z else NA)]
plot(world.carto, col = cols,
     main = paste0("Cartogram of the World's",
                   " Population by Country (2013)"))
