Correlating rasters with divisible resolutions in R

I am using a multibeam echosounder to create a raster stack in R with layers all in the same resolution, which I then convert to a data frame so I can create additive models to describe the distribution of fish around bathymetry variables (depth, aspect, slope, roughness etc.).
The issue I have is that I would like to keep my response variable (fish school volume) fine and my predictor variables (bathymetry) coarse, such that I have, say, 1 x 1 m cells representing the distribution of fish schools and 10 x 10 m cells representing bathymetry (so the coarse cell size is evenly divisible by the fine cell size).
I can easily create these rasters individually, but relating them is the problem. Since each coarse cell would contain 10 x 10 = 100 fine cells, I realise that each coarse cell value would need to be repeated 100 times in the data frame, but I am not sure how to program this in R so that the values end up in the right locations relative to the x and y columns (the cell addresses).
Any advice would be greatly appreciated! Thanks!
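A sketch of one possible approach (not from the original post; the layer names and dimensions below are placeholders): disaggregate the coarse bathymetry layer down to the fine resolution so each 10 x 10 m value is repeated across its 100 underlying 1 x 1 m cells, then stack it with the fine raster and convert to a data frame.
library(raster)

# Illustrative stand-ins: 'fish' is a 1 x 1 m layer, 'bathy' a 10 x 10 m layer
# covering the same 100 x 100 m extent
fish <- raster(nrows = 100, ncols = 100, xmn = 0, xmx = 100, ymn = 0, ymx = 100)
values(fish) <- runif(ncell(fish))
bathy <- aggregate(fish, fact = 10, fun = mean)

# disaggregate() splits each coarse cell into fact x fact fine cells,
# repeating the coarse value 100 times, so both layers share the same 1 m grid
bathy_fine <- disaggregate(bathy, fact = 10)

# stack and convert to a data frame with x/y cell addresses
df <- as.data.frame(stack(fish, bathy_fine), xy = TRUE)
head(df)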

Related

Extract values from geom_bin2d

I have taken photos of a bird nesting area and marked the position of each bird on the photo. The resulting data is a list of X and Y positions, which I have converted from pixels to metres.
I want to calculate how many counts fall in each 1 m² square. I was able to get what I wanted graphically with geom_bin2d, but I would like to extract the value of each square.
Are there any functions that would do this, or methods to extract the data from geom_bin2d?
Thank you very much!
I have found a few functions (density, bkde2D), but they compute kernel density estimates, which do not give the same values as geom_bin2d.
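One possible route (a sketch; df and its columns are placeholder names, not from the original post) is to build the plot and pull the binned counts out of ggplot_build(), which returns the data ggplot2 computed for each layer. The same counts can also be obtained without ggplot2 by binning with cut() and tabulating with table().
library(ggplot2)

# Placeholder data: bird positions in metres
df <- data.frame(x = runif(200, 0, 10), y = runif(200, 0, 10))

p <- ggplot(df, aes(x, y)) + geom_bin2d(binwidth = c(1, 1))
built <- ggplot_build(p)
bin_counts <- built$data[[1]][, c("x", "y", "count")]  # bin centres and counts
head(bin_counts)

# Equivalent counts without ggplot2: bin into 1 m squares and tabulate
counts <- table(cut(df$x, breaks = 0:10), cut(df$y, breaks = 0:10))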

Pick locations with high density in a vector

I want to find the expected value of y given x, based on some data. I want really good estimates of the mean y-value at a few particular x-values, but I don't want/need to fit something parametric or do a regression.
Instead, I want to take my observations, bin a bunch of them where I have many x-values in a small range of x, and compute the mean of y.
Is there a clever way to select, say, 6 non-overlapping regions of high density from my vector of x observations?
If so, I'll take the center of each region, grab a bunch of the closest x's (maybe 100 in my real data), and compute the associated mean(y).
Here's some example data:
# pick points for high-density regions
# xobs <- runif(900)
clustery_obs <- function(x) rnorm(40, mean = x, sd = 0.2)  # 40 points clustered around x
under_x <- runif(11)
xobs <- sapply(under_x, clustery_obs)
xobs <- xobs[0 < xobs & xobs < 1]                          # keep observations in (0, 1)
yfun <- function(x) rnorm(1, mean = (10 * x)^2 - (30 * x) + 3, sd = 6)
yobs <- sapply(xobs, yfun)
plot(xobs, yobs)
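One possible approach (a sketch, assuming a kernel density estimate of x is acceptable; it does not strictly enforce non-overlapping regions): take local maxima of density(xobs) as the high-density centers, then average y over the k nearest observations to each center.
# Kernel density of the x observations
d <- density(xobs)

# Indices of local maxima of the density curve, ordered from densest down
peak_idx <- which(diff(sign(diff(d$y))) == -2) + 1
peak_idx <- peak_idx[order(d$y[peak_idx], decreasing = TRUE)]
centers <- d$x[head(peak_idx, 6)]   # up to 6 densest peaks

# Mean of y over the k nearest x observations to each center
k <- 100
local_means <- sapply(centers, function(cx) {
  nearest <- order(abs(xobs - cx))[seq_len(min(k, length(xobs)))]
  mean(yobs[nearest])
})
cbind(center = centers, mean_y = local_means)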

R understanding raster's corLocal neighborhood size parameter

I am calculating the Pearson correlation between two rasters (identical in dimensions and cell size) in a moving window with corLocal from the raster package. It is not clear to me from the manual what the neighborhood size parameter (ngb) actually means. For example, does ngb = 5 mean that the correlation is calculated for the focal cell plus the top, bottom, left, and right cells?
I looked at the code and corLocal calls getValuesFocal():
getValuesFocal(x, 1, nrow(x), ngb=ngb)
but I couldn't understand what getValuesFocal actually does.
Thanks,
Ilik
The ngb parameter defines the neighborhood size. For example, I believe ngb = 5 defines a 5 x 5 neighborhood; this should be equivalent to ngb = c(5, 5), a vector of two integers giving the number of rows and columns in the neighborhood (focal window). In this example, each cell in the output raster would hold the correlation calculated from a 5 x 5 cell neighborhood in the two input rasters.
The raster library documentation on p. 118 might help too.
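A minimal sketch of calling corLocal this way (the rasters below are random stand-ins, not from the question):
library(raster)

# Two small rasters with identical dimensions and cell size
r1 <- raster(matrix(rnorm(400), nrow = 20, ncol = 20))
r2 <- raster(matrix(rnorm(400), nrow = 20, ncol = 20))

# Pearson correlation in a 5 x 5 moving window; ngb = 5 is shorthand for c(5, 5)
r_cor <- corLocal(r1, r2, ngb = 5, method = "pearson")
plot(r_cor)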

How to assign values to a matrix based on the values of another matrix in R?

My question may be poorly worded, but hopefully I can explain it better. In R, I created a matrix and assigned it values like this:
sample <- matrix(data = rbinom(10000, 1, 0.3), nrow = 100, ncol = 100, byrow = TRUE)
Now I am trying to figure out how to assign values drawn from a Poisson distribution to each cell in the matrix that equals 1.
Here is the assignment I was given:
You have a 100m x 100m grid with 1m x 1m cells. You want to select about 30% of the cells to sample. Simulate grid cells to sample. How many cells did you sample?
You count snails in each sampled grid cell. A paper you found reports an average snail density of 15/m2. Simulate the number of snails you count in each grid cell you sample based on an average snail density of 15/m2. On average, how many snails did you count per grid cell? How many snails did you count in total across all sampled grid cells?
Not looking for anyone to do my work for me, just a point in the right direction would help me out a lot. Thank you.
You can find the number of nonzero cells with:
num_cells <- sum(sample)
At that point, you can reassign the nonzero values using rpois:
sample[sample == 1] <- rpois(num_cells, 15)
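Putting the two pieces together for the assignment (a sketch; grid is used instead of sample to avoid masking the base function, and because rpois() can return 0 it is safest to track which cells were sampled rather than relying on nonzero values):
set.seed(1)

# 100 x 100 grid; each cell sampled with probability 0.3
grid <- matrix(rbinom(10000, 1, 0.3), nrow = 100, ncol = 100)
sampled <- grid == 1
sum(sampled)                       # how many cells were sampled

# Snail counts in the sampled cells, mean density 15 per square metre
grid[sampled] <- rpois(sum(sampled), 15)
mean(grid[sampled])                # average snails per sampled cell (about 15)
sum(grid[sampled])                 # total snails across all sampled cells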

How to read a coplot() graph

I cannot wrap my mind around reading the plots generated by coplot().
For example from the help(coplot)
## Tonga Trench Earthquakes
coplot(lat ~ long | depth, data = quakes)
What do the gray bars above represent? Why are there 2 rows of lat/long boxes?
How do I read this graph?
I can shed some more light on the second chart's interpretation. The gray bars for both mag and depth represent intervals of their respective variables. Andy gave a nice description of how they are created above.
When you are reading them keep in mind that they are meant to show you the range of the observations for the respective conditioning variable (mag or depth) represented in each column or row. Therefore, in Andy's example the largest mag bar is just showing that the topmost row contains observations for earthquakes of approx. 4.6 to 7. It makes sense that this bar is the largest, since as Andy mentioned, they are created to have roughly similar numbers of observations and stronger earthquakes are not as common as weaker ones. The same logic holds true for depth where a larger range of depths was required to get a roughly proportional number of observations.
Regarding reading the chart, you would read the columns as representing the three depth groups (left to right) and the rows as representing the four mag groups (bottom to top). Thus, as you read up the chart you're progressively slicing the data into groups of observations with increasing magnitudes. So, for example, the bottom row represents earthquakes with magnitudes of 4 to 4.5 with each column representing a different range of depths. Similarly, you read the columns as holding depth constant while allowing you to see various ranges of magnitudes.
Putting it all together, as mentioned by Andy, we can see that as we read up the rows (progressing up in magnitude) the distribution of earthquakes remains relatively unchanged. However, when reading across the columns (progressing up in depth) we see that the distribution does slightly change. Specifically, the grouping of quakes on the right, between longitudes 180 and 185, grows tighter and more clustered towards the top of the cell.
This is a method for visualizing interactions in your dataset. More specifically, it lets you see how some set of variables are conditional on some other set of variables.
In the example given, you're asking to visualize how lat and long vary with depth. Because you didn't specify number, and the formula indicates you're interested in only one conditioning variable, the function assumes you want number = 6 depth cuts (passed to co.intervals, which tries to make the number of data points approximately equal within each interval) and simply maximizes the data-to-ink ratio by stacking individual plot frames. The value of depth increases from left to right and from the bottom row up, so the top-right frame represents the largest depth interval. You can set rows or columns to change this behavior, e.g.:
coplot(lat ~ long | depth, data = quakes, columns=6)
but I think the power of this tool becomes more apparent when you inspect two or more conditioning variables. For example:
coplot(lat ~ long | depth * mag, data = quakes, number=c(3,4))
gives a rich view of how earthquakes vary in space, and demonstrates that there is some interaction with depth (the pattern changes from left to right), and little-to-no interaction with magnitude (the pattern does not change from top to bottom).
Finally, I would highly recommend reading Cleveland's Visualizing Data -- a classic text.
