I have a large raster dataset with values between 0 and 9. I want to find out whether the surrounding points have the same value as the center? - associations

Is there any statistical index that allows us to quantify whether a pixel is surrounded by pixels of similar value or whether they are completely random? For example, if I have one pixel with a value of 8, I want to know whether the surrounding values are more likely to be 8 or whether there is no association between those points. Can such an index be applied to raster data?
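For instance, a minimal sketch of the kind of per-pixel summary I have in mind, assuming my data are in a RasterLayer r holding the 0-9 classes (the names are just placeholders):

library(raster)
# For every cell, the fraction of its 8 neighbours that share the centre value
# (in a 3x3 window the centre is the 5th value passed to fun)
same_as_centre <- focal(r, w = matrix(1, 3, 3), pad = TRUE, padValue = NA,
                        fun = function(v, ...) mean(v[-5] == v[5], na.rm = TRUE))
plot(same_as_centre)   # values near 1: neighbours mostly match the centre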

Related

Analyse Pixel distribution of a Rasterlayer

I really need some advice. I have a raster with many pixels, and each pixel has one value. Now I want to do a spatial analysis of these pixels: I want to see in which regions there are the most pixels and where there are not. Sounds simple, but it's not.
I had the idea to do this with kernel density, but that does not work with a RasterLayer. It doesn't work with ppp either, because you can't transform a raster into that data type. I'm really lost and don't know what could work, so I would be very grateful for some help.
My pixels look like this:
There must be a way to show the regions with the most pixels and so on, but I don't know how to do that.
Short answer: convert your raster object to a pixel image of class im in the spatstat package. Then use Smooth.im. Example:
library(spatstat)
Z <- as.im(my_raster_data)   # convert the raster to a pixel image of class 'im'
S <- Smooth(Z)               # kernel smoothing of the pixel values
plot(S)
Long answer: you're using the term "pixel" in a nonstandard sense. The pixels are the small squares which make up the image. Your illustration shows a pixel image in which the majority of the pixels have the value 0 (represented by white colour), but a substantial number of individual pixels have values greater than 0 (ranging from 0 to 0.3).
If I understand correctly, you would like to generate a colour image or heat map which has a brighter/warmer colour in those places where more of the pixels have positive values.
The simplest way is to use Gaussian smoothing of the pixel values in the image. This will calculate a spatially-varying average of the values of the nearby pixels, including the zero pixels. To do this, convert the raster to a pixel image of class im in the spatstat package
Z <- as.im(my_raster_object)
then apply Smooth.im
S <- Smooth(Z)
plot(S)
Look at the help for Smooth.im for options to control the degree of smoothing.
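For example (the sigma values here are only illustrative; the bandwidth is in the coordinate units of the image):

S_wide   <- Smooth(Z, sigma = 5)   # heavier smoothing
S_narrow <- Smooth(Z, sigma = 1)   # lighter smoothing
plot(S_wide)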
If you wanted to ignore the actual colours (pixel values) in the input data, you could just transform them to binary values before smoothing:
B <- (Z > 0)
SB <- Smooth(B)
plot(SB)

Compress grayscale image using its histogram

I have a background in mathematics and machine learning, but I'm quite new to image compression. The other day I was thinking about the optimal way to compress an image using just a lookup table. This means: given an original image with N unique values, change it to a new image with M unique values, where M < N. Given a fixed value of M, my question is how to pick those values. I realized that if we take the total error (MSE) over all pixels as the figure of merit, all the information has to be in the histogram of the pixel intensities. Somehow, the most common values should be mapped to a closer value than the uncommon values, making the higher regions of the histogram more "dense" in new values than the low regions. Hence I was wondering whether there exists a mathematical formula that:
- given the histogram h(x) of all the pixel intensities, and
- given the number of unique new values M,
defines the set of M new values {X_new} that minimizes the total error.
I tried to define the loss function and take the derivative, but some argmax operations appeared that I don't know how to differentiate. However, my intuition tells me that a closed formula should exist...
Example:
Say we have an image with just 10 pixels, with values {1,1,1,1,2,2,2,2,3,3}. We initially have N=3, and we are asked to select the M=2 unique values that minimize the error. It is clear that we have to pick the 2 most common ones, so {X_new}={1,2}, and the new image will be "compressed" as {1,1,1,1,2,2,2,2,2,2}. If we are asked to pick M=1, we will pick {X_new}={2} to minimize the error.
Thanks!
This is called color quantization or palettization. It is essentially a clustering problem, usually in the 3D RGB space. Each cluster becomes a single color in the downsampled image. The GIF and PNG image formats both support palettes.
There are many clustering algorithms out there, with a lot of research behind them. For this, I would first try k-means and DBSCAN.
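As a rough sketch of the k-means route in your 1-D grayscale setting (the vector pixels and the value of M are placeholders):

# Quantize grayscale intensities to M levels with k-means (stats::kmeans);
# 'pixels' is assumed to hold all pixel intensities as a numeric vector
M  <- 2
km <- kmeans(pixels, centers = M, nstart = 10)
new_values  <- km$centers[km$cluster, 1]   # each pixel replaced by its cluster centre
total_error <- km$tot.withinss             # sum of squared errors of the mapping

Note that the centres come out as cluster means rather than a subset of the original intensities, since the mean minimises the squared error for a given grouping.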
Note that palettization would only be one part of an effective image compression approach. You would also want to exploit the spatial correlation of pixels (often done with a 2-D spatial frequency analysis such as a discrete cosine transform or wavelet transform), as well as the lower resolution of the human eye in color discrimination compared with its grayscale acuity.
Unless you want to embark on a few years of research to improve the state of the art, I recommend that you use existing image compression algorithms and formats.

Merge neighbouring areas together, if tm_fill for one area is too small

I have made a map with the zip codes of a town. The fill is the ratio of cases to non-cases. But some zip codes have very few observations in total, so outliers distort the map.
Is there a way to merge the polygons and data of two neighbouring areas automatically, based on their n?
And if that is not possible, how can I merge rows of my sf/df without losing the ID?
And I guess the simplest option would be just to set those zip codes to NA.
Depends on what you mean by "automatically". Here's a simple algorithm.
repeat:
  Find the region with the smallest population.
  If that population is above your threshold, stop.
  Find that region's neighbours and pick one (at random, or the one with the smallest population).
  Merge that neighbour with the region.
Finding neighbours and merging can all be done with either the sf package or the sp package and friends (like spdep and rgeos).
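A very rough sketch of that loop with sf and spdep (the object name zips, the count column n, the threshold min_n and the function name are all placeholders):

library(sf)
library(spdep)

merge_small <- function(zips, min_n) {
  repeat {
    i <- which.min(zips$n)
    if (zips$n[i] >= min_n) break                 # smallest region is large enough: done
    nb   <- poly2nb(zips)                         # queen-contiguity neighbours
    nbrs <- nb[[i]]
    if (length(nbrs) == 0 || nbrs[1] == 0) break  # isolated polygon: nothing to merge
    j <- nbrs[which.min(zips$n[nbrs])]            # neighbour with the smallest count
    geom    <- st_geometry(zips)
    geom[j] <- st_union(geom[i], geom[j])         # dissolve the two polygons
    st_geometry(zips) <- geom
    zips$n[j] <- zips$n[j] + zips$n[i]            # pool the counts
    zips <- zips[-i, ]                            # drop the absorbed region
  }
  zips
}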
Equally, this can be considered a clustering algorithm using a distance metric based on adjacency. You could do a full hierarchical clustering and then cut the tree at a point such that all clusters had N>threshold.
Whether this is a good idea statistically is another question, and depends on what your goal is. If you are worried about whether an underlying risk is, say, greater than 0.5, and you are getting positives "by chance" because you have a population of 3 with 2 positives in that small sample, then you need to model your data and work out the probability of exceeding 0.5 given the data. Then map that probability, which will take the small sample size into account.
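For instance, one rough sketch of that idea (a flat Beta(1, 1) prior and the column names cases and n are my assumptions, not something taken from your data):

# Posterior probability that the underlying risk exceeds 0.5, given 'cases'
# positives out of 'n' observations per zip code, under a flat Beta(1, 1) prior
zips$p_gt_half <- 1 - pbeta(0.5, 1 + zips$cases, 1 + zips$n - zips$cases)
# Map p_gt_half instead of the raw ratio; zip codes with tiny n are pulled
# towards the prior instead of showing extreme ratios purely by chance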

R: Number of nodes in rasterToPolygons

The raster package in R contains the function rasterToPolygons(). It allows you to set the number of nodes per polygon to 4, 8 or 16.
What is the restriction to those three values based on? Why is it not possible to choose any number in between?
The minimum number of nodes to create a polygon from a raster cell is four.
A higher number can be useful when you need to transform the coordinate reference system (crs) of the polygons, because a straight line in one crs is likely to be curved in another. The function allows adding nodes in the middle of each side (8) or adding three nodes on each side (16).
The number of nodes would normally be a multiple of 4 (at least for square cells), but there is no fundamental reason to restrict it to 8 and 16; that was simply a choice by the programmer, who either did not consider a more flexible function when developing it or did not have the time to do so. You can of course implement it yourself, or write to the maintainer of the raster package and request it.
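For example, a quick sketch (r and the target CRS are placeholders):

library(raster)   # also attaches sp, which provides spTransform and CRS
p4  <- rasterToPolygons(r, n = 4)    # plain cell corners
p16 <- rasterToPolygons(r, n = 16)   # three extra nodes per side
# after reprojection, the denser outline follows the curvature of the new CRS better
p16_ll <- spTransform(p16, CRS("+proj=longlat +datum=WGS84"))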

Create stage height raster using least cost path and R

I have a point shapefile of Station IDs and stageheights. I would like to create a raster where each cell has the stage height value (in meters) of the closest in situ station to that cell.
I want this raster to match up with another raster. So I would like it if I could input both a raster I have created (dataset 3 described below) and my point shapefile (1).
Datasets:
1) Point Shapefile with stage heights of a river delta
2) Shapefile of the river delta extent
3) Raster of the delta where NAs represent land (they could also be zeros if need be) and 1s are water. There are two versions: 10-meter and 30-meter resolution.
One conceptual issue I am having is with the number of small streams I have.
For example (pictured in the image below), station 1 (circled in blue) is technically closer to the black x region than station 2 (circled in red), but the stage height value in red is more representative of point x. There are NAs in between the two streams; does that mean that the value will not jump across streams?
How can I reassign the values in my raster (all the 1s) to the stage height of the nearest station and make sure that these values do not jump from stream to stream? Do I need to use a least cost path? What is the best way to do this?
I would like to use R, but can use ArcMap if I must.
So I'm not sure what tools you have available to you but I think this answer may be useful:
Calculating attribute for network distance between multiple points in ArcGIS Desktop?
Here the questioner was looking to calculate distances along roads to some points, but your problem seems similar. The main point I would make is that you should do your network distance classification before worrying about the raster layer. You may have to convert from polygons to lines or use some other workaround to get your data into a format that works, but this is the kind of job the tool is designed to do.
After you have reclassified your river shapefile based on network distance to a given station, convert the polygons to raster and use this to classify your original raster. You could do this in R or ArcMap; ArcMap will probably be faster.
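If you do want to stay in R, here is a very rough sketch of the least-cost-distance idea from your question using the gdistance package (water is the 1/NA raster from dataset 3, stations is assumed to be a SpatialPointsDataFrame with a stage_height column; all names are placeholders):

library(raster)
library(gdistance)

# Conductance only through water cells (NA land cells carry no conductance),
# so accumulated costs follow the channels instead of jumping between streams
tr <- geoCorrection(transition(water, transitionFunction = mean, directions = 8))

# Accumulated cost surface from each station
costs <- stack(lapply(seq_len(nrow(stations)), function(i) accCost(tr, stations[i, ])))

nearest <- which.min(costs)   # per cell: index of the station cheapest to reach
stage   <- subs(nearest, data.frame(id = seq_len(nrow(stations)),
                                    stage = stations$stage_height))
plot(stage)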
