GRASS GIS: Error while executing r.resamp.filter - raster

I want to resample a raster from 15m to 460m using a Gaussian filter.
The goal
I have a coarse image that I want to downscale, and a fine-resolution band to assist the downscaling. The downscaling method I am using is called geographically weighted area-to-point regression kriging (GWATPRK). The method consists of two steps:
GWR, and
ATPK on the GWR residuals.
To perform GWR on raster data, the rasters need to have the same pixel size. This means that my fine-resolution image needs to be upscaled to match the spatial resolution of the coarse band. This upscaling of the fine band needs to be done using a Gaussian kernel (i.e., the PSF). I have found that GRASS GIS has a tool called r.resamp.filter.
I am trying to run the function but I am getting the following error(s):
ERROR: Differing number of values for filter= and [xy_]radius=
This error occurs when I use two filter kernels (e.g., gauss + box, or gauss + bartlett). I am using two kernels because according to the Manual:
Kernels with infinite extent (Gauss, normal, sinc, Hann, Hamming,
Blackman) must be used in conjunction with a finite windowing function
(box, Bartlett, Hermite, Lanczos).
It doesn't matter what numbers I put in Filter radius, or in Filter radius (horizontal) and Filter radius (vertical) (see image below); I tested A LOT of numbers.
ERROR: At least one filter must be finite
This error occurs when I use one filter kernel (I am interested in applying a Gaussian filter, because I want to model the point spread function during downscaling satellite imagery).
The steps I followed were:
r.external to import the raster
g.region, where I set the region from my original fine-resolution image BUT changed the 2D resolution to 460 in the Resolution tab
r.resamp.filter, which produced the errors I mentioned
Ultimately, I want to apply a Gaussian filter with sigma (std) = 0.5 to my image.
Here is the image I am using.

I had to select two filter kernels, box and Gaussian. For the filter radius I had to enter 250,250 (one value per kernel). The output was an image with a pixel size of ~460 m.
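To illustrate why the manual asks for a finite window alongside the Gaussian, here is a minimal Python sketch of a Gaussian kernel multiplied by a box window and used to aggregate one row of 15 m cells into 460 m cells. It is not GRASS's implementation: the function names, the sigma of 230 m (0.5 of a coarse pixel) and the direct use of sigma in map units are assumptions made for illustration only.

```python
import math


def windowed_gaussian_weights(offsets, sigma, box_radius):
    """Gaussian weights multiplied by a box window, normalised to sum to 1.

    offsets are distances (map units) from the output cell centre.
    The box window sets every weight beyond box_radius to zero, which
    makes the otherwise infinite Gaussian kernel finite.
    """
    raw = [
        math.exp(-0.5 * (d / sigma) ** 2) if abs(d) <= box_radius else 0.0
        for d in offsets
    ]
    total = sum(raw)
    return [w / total for w in raw]


def downsample_row(values, src_res, dst_res, sigma, box_radius):
    """Aggregate a 1-D row of cells from src_res to dst_res (same units)."""
    n_out = int(len(values) * src_res // dst_res)
    out = []
    for j in range(n_out):
        centre = (j + 0.5) * dst_res
        # Distance of every source cell centre from this output cell centre
        offsets = [(i + 0.5) * src_res - centre for i in range(len(values))]
        weights = windowed_gaussian_weights(offsets, sigma, box_radius)
        out.append(sum(w * v for w, v in zip(weights, values)))
    return out


# Hypothetical example: 92 cells of 15 m aggregated to 3 cells of 460 m
row = [5.0] * 92
coarse = downsample_row(row, src_res=15, dst_res=460, sigma=230.0, box_radius=250.0)
```

Because the weights are normalised after the window is applied, a constant input row stays constant in the output.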

Related

Compress grayscale image using its histogram

I have a background in mathematics and machine learning, but I'm quite new to image compression. The other day I was thinking about the optimal way to compress an image using just a lookup table. That is, given an original image with N unique values, convert it to a new image with M unique values, where M<N. For a fixed value of M, my question is how to pick those values. I realized that if we take the total error (MSE) over all pixels as the figure of merit, all the necessary information is in the histogram of the pixel intensities. Intuitively, the most common values should be mapped to a closer value than the uncommon ones, so that the new values are more "dense" in the high regions of the histogram than in the low regions. Hence I was wondering whether there is a mathematical formula that:
-Given the histogram h(x) of all the pixel intensities
-Given the number of unique new values M
defines the set of M new values {X_new} that minimizes the total error.
I tried to define the loss function and take its derivative, but some argmax operations appeared that I don't know how to differentiate. However, my intuition tells me that a closed-form formula should exist.
Example:
Say we have an image with just 10 pixels, with values {1,1,1,1,2,2,2,2,3,3}. We initially have N=3 and we are asked to select the M=2 unique values that minimize the error. It is clear that we have to pick the 2 most common ones, so {X_new}={1,2} and the new image will be "compressed" as {1,1,1,1,2,2,2,2,2,2}. If we are asked to pick M=1, we will pick {X_new}=2 to minimize the error.
Thanks!
This is called color quantization or palettization. It is essentially a clustering problem, usually in the 3D RGB space. Each cluster becomes a single color in the downsampled image. The GIF and PNG image formats both support palettes.
There are many clustering algorithms out there, with a lot of research behind them. For this, I would first try k-means and DBSCAN.
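As a minimal sketch of the k-means suggestion applied directly to a grayscale histogram (1-D rather than RGB), the following Python function runs Lloyd's algorithm with each intensity weighted by its pixel count. The function name and the quantile-based initialisation are my own assumptions, and k-means only guarantees a local minimum of the squared error, not the global one.

```python
def quantize_histogram(values, counts, m, max_iter=100):
    """Weighted 1-D k-means on a histogram.

    values: sorted distinct intensities; counts: number of pixels per value.
    Returns (centres, total_squared_error).
    """
    total = sum(counts)

    # Initialise each centre at an evenly spaced quantile of the pixel distribution
    centres = []
    for k in range(m):
        target = (k + 0.5) / m * total
        running = 0
        for v, c in zip(values, counts):
            running += c
            if running >= target:
                centres.append(float(v))
                break

    for _ in range(max_iter):
        sums = [0.0] * m
        weights = [0.0] * m
        # Assignment step: each intensity goes to its nearest centre
        for v, c in zip(values, counts):
            k = min(range(m), key=lambda i: abs(v - centres[i]))
            sums[k] += c * v
            weights[k] += c
        # Update step: each centre becomes the weighted mean of its intensities
        new_centres = [
            sums[k] / weights[k] if weights[k] > 0 else centres[k]
            for k in range(m)
        ]
        if new_centres == centres:
            break
        centres = new_centres

    error = sum(
        c * min((v - x) ** 2 for x in centres) for v, c in zip(values, counts)
    )
    return centres, error


# Histogram of the 10-pixel example: values 1, 2, 3 with counts 4, 4, 2
centres, error = quantize_histogram([1, 2, 3], [4, 4, 2], m=2)
```

For this small histogram and M=2 the sketch converges to centres 1 and about 2.33, with a total squared error of about 1.33. The centres are cluster means, so they need not be values that occur in the original image.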
Note that palettization would only be one part of an effective image compression approach. You would also want to take advantage of both the spatial correlation of pixels (often done with a 2-D spatial frequency analysis such as a discrete cosine transform or wavelet transform), as well as taking advantage of the lower resolution of the human eye in color discrimination as opposed to grayscale acuity.
Unless you want to embark on a few years of research to improve the state of the art, I recommend that you use existing image compression algorithms and formats.

How can I select the optimal radius value in order to obtain the best normal estimation results?

I'm running a model-scene match between a set of point clouds in order to test the matching results.
The match is based on 3D features such as normals and point feature histograms.
I'm using the normal estimation of the Point Cloud Library (PCL) to compute the histogram after resampling the point clouds of both model and scene.
My question is: how can I test the accuracy obtained with different radius values in the nearest-neighbor estimation step?
I need to use those values for normal estimation, resampling and histogram computation on objects such as a cup, knife, hammer, etc.
I tried visualizing those objects in the PCL visualizer with different radius values and choosing the one that gives correct normals (in terms of how perpendicular the normals are to the surfaces).
But I think this visual testing is not enough, and I would like to know whether there are empirical ways to estimate the optimal radius value.
I would appreciate any suggestions or help. Share your thoughts :)
Thank you.
I think you should start from a ground-truth test: create a point cloud from a mesh using the mesh normals (using CloudCompare, for example), then load it twice: once with the full data (including normals) and once without normals.
Rebuild the normals using the search radius to be tested; then you can directly compare the obtained normals with the ones extracted from the mesh...
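The comparison step could be sketched in Python as below. This assumes the estimated and reference normals are already available as lists of (x, y, z) tuples in the same point order; the function names are hypothetical, and the absolute value of the dot product is used on the assumption that an estimated normal may point in the opposite direction to the mesh normal.

```python
import math


def angular_errors_deg(estimated, reference):
    """Angle in degrees between each estimated normal and its reference normal."""
    errors = []
    for (ex, ey, ez), (rx, ry, rz) in zip(estimated, reference):
        dot = ex * rx + ey * ry + ez * rz
        norm = math.sqrt(ex * ex + ey * ey + ez * ez) * math.sqrt(
            rx * rx + ry * ry + rz * rz
        )
        # abs() ignores the sign of the normal; min() guards against rounding above 1
        cos_angle = min(1.0, abs(dot) / norm)
        errors.append(math.degrees(math.acos(cos_angle)))
    return errors


def mean_angular_error_deg(estimated, reference):
    """Single score for one search radius: the mean angular error."""
    errors = angular_errors_deg(estimated, reference)
    return sum(errors) / len(errors)


# Made-up normals: two agree with the reference (one flipped), one is off by 90 degrees
reference = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
estimated = [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (1.0, 0.0, 0.0)]
score = mean_angular_error_deg(estimated, reference)
```

Repeating this for each candidate radius and keeping the radius with the lowest score gives a numerical alternative to visual inspection.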

Finding the image boundary

While I use R quite a bit, I have just started an image analysis project, for which I am using the EBImage package. I need to collect a lot of data from circular/elliptical images. The built-in function computeFeatures gives the maximum and minimum radius, but I need all of the radii it computes.
Here is the code. I have read the image, thresholded and filled.
actual.image = readImage("xxxx")
image = actual.image[,2070:4000]
image1 = thresh(image)
image1 = fillHull(image1)
As there are several objects in the image, I used the following to label
image1 = bwlabel(image1)
I generated features using the built in function
features = data.frame(computeFeatures(image1,image))
Now, computeFeatures gives the max radius and min radius. I need all the radii of all the objects it has computed for my analysis. If I could at least get the boundary coordinates of all objects, I could compute the radii with some other code.
I know images are stored as matrices, and I could come up with a convoluted way to find the boundaries and then compute the radii. But I was wondering whether there is a more elegant method?
You could try extracting each object plus some padding, and plotting the x- and y-axis intensity profiles for each object. An intensity profile is simply the sum of the rows or columns, which can be computed using rowSums and colSums in R.
Then you could find where it drops by splitting each intensity profile in half and finding the nearest minimum value.
Maybe an example would help clear things up:
Hopefully this makes sense
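A rough sketch of the profile idea, written in Python rather than R and using a small made-up binary mask for a single object, might look like this. The function names are hypothetical, and it only recovers the extent along the two image axes, not every radius of the object.

```python
def axis_extents(mask):
    """Return ((row_start, row_end), (col_start, col_end)) of the nonzero region.

    mask is a list of rows holding 0/1 values for one extracted object.
    """
    row_profile = [sum(row) for row in mask]         # analogous to rowSums
    col_profile = [sum(col) for col in zip(*mask)]   # analogous to colSums

    def extent(profile):
        # First and last positions where the profile rises above zero
        nonzero = [i for i, s in enumerate(profile) if s > 0]
        return nonzero[0], nonzero[-1]

    return extent(row_profile), extent(col_profile)


def half_widths(mask):
    """Half of the object's extent along each axis, in pixels."""
    (r0, r1), (c0, c1) = axis_extents(mask)
    return (r1 - r0 + 1) / 2, (c1 - c0 + 1) / 2


# Made-up 5x6 mask with one object occupying rows 1-3 and columns 2-4
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
extents = axis_extents(mask)
radii = half_widths(mask)
```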

How to resample/rebin a spectrum?

In Matlab, I frequently compute power spectra using Welch's method (pwelch), which I then display on a log-log plot. The frequencies estimated by pwelch are equally spaced, yet logarithmically spaced points would be more appropriate for the log-log plot. In particular, when saving the plot to a PDF file, this results in a huge file size because of the excess of points at high frequency.
What is an effective scheme to resample (rebin) the spectrum from linearly spaced frequencies to log-spaced frequencies? Or, what is a way to include high-resolution spectra in PDF files without generating excessively large file sizes?
The obvious thing to do is to simply use interp1:
rate = 16384; %# sample rate (samples/sec)
nfft = 16384; %# number of points in the fft
[Pxx, f] = pwelch(detrend(data), hanning(nfft), nfft/2, nfft, rate);
f2 = logspace(log10(f(2)), log10(f(end)), 300);
Pxx2 = interp1(f, Pxx, f2);
loglog(f2, sqrt(Pxx2));
However, this is undesirable because it does not conserve power in the spectrum. For example, if there is a big spectral line between two of the new frequency bins, it will simply be excluded from the resulting log-sampled spectrum.
To fix this, we can instead interpolate the integral of the power spectrum:
df = f(2) - f(1);
intPxx = cumsum(Pxx) * df; % integrate
intPxx2 = interp1(f, intPxx, f2); % interpolate
Pxx2 = diff([0 intPxx2]) ./ diff([0 f2]); % difference
This is cute and mostly works, but the bin centers aren't quite right, and it doesn't intelligently handle the low-frequency region, where the frequency grid may become more finely sampled.
Other ideas:
Write a function that determines the new frequency binning and then uses accumarray to do the rebinning.
Apply a smoothing filter to the spectrum before doing interpolation. Problem: the smoothing kernel size would have to be adaptive to the desired logarithmic smoothing.
The pwelch function accepts a frequency-vector argument f, in which case it computes the PSD at the desired frequencies using the Goertzel algorithm. Maybe just calling pwelch with a log-spaced frequency vector in the first place would be adequate. (Is this more or less efficient?)
For the PDF file-size problem: include a bitmap image of the spectrum (seems kludgy--I want nice vector graphics!);
or perhaps display a region (polygon/confidence interval) instead of simply a segmented line to indicate the spectrum.
I would let pwelch do the work for me and give it the frequencies from the start. The doc states that the frequencies you specify will be rounded to the nearest DFT bin. That shouldn't be a problem, since you are using the results to plot. If you are concerned about the runtime, I'd just try it and time it.
If you want to rebin it yourself, I think you're better off just writing your own function to do the integration over each of your new bins. If you want to make your life easier, you can do what they do and make sure your log bins share boundaries with your linear ones.
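A minimal Python sketch of that kind of rebinning is below. It assumes strictly positive, increasing frequencies (drop the DC bin first), assigns each linear bin to the log-spaced bin that contains its centre frequency, and averages the PSD within each new bin. The name log_rebin and its return format are hypothetical choices for illustration.

```python
import math


def log_rebin(freqs, psd, n_bins):
    """Average a linearly sampled PSD inside logarithmically spaced bins.

    Returns a list of (centre_frequency, mean_psd, n_points) tuples.
    New bins that contain no original points are skipped.
    """
    f_lo, f_hi = freqs[0], freqs[-1]
    log_span = math.log(f_hi / f_lo)
    # Bin edges equally spaced in log-frequency
    edges = [f_lo * math.exp(log_span * k / n_bins) for k in range(n_bins + 1)]

    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for f, p in zip(freqs, psd):
        k = min(int(n_bins * math.log(f / f_lo) / log_span), n_bins - 1)
        sums[k] += p
        counts[k] += 1

    result = []
    for k in range(n_bins):
        if counts[k] > 0:
            centre = math.sqrt(edges[k] * edges[k + 1])  # geometric bin centre
            result.append((centre, sums[k] / counts[k], counts[k]))
    return result


# Made-up spectrum: flat PSD with one strong line at 500 Hz
freqs = [float(i) for i in range(1, 1001)]
psd = [1.0] * 1000
psd[499] = 1000.0
bins = log_rebin(freqs, psd, 30)
```

Because every original bin lands in exactly one new bin, the mean times the point count, summed over the new bins, equals the sum of the original PSD, so a narrow spectral line is not dropped. Skipping empty bins also avoids inventing points at low frequencies, where the log grid is finer than the linear one.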
Solution found: https://dsp.stackexchange.com/a/2098/64
Briefly, one solution to this problem is to perform Welch's method with a frequency-dependent transform length. The above link is to a dsp.SE answer containing a paper citation and sample implementation. A disadvantage of this technique is that you can't use the FFT, but because the number of DFT points being computed is greatly reduced, this is not a severe problem.
If you want to resample an FFT at a variable rate (logarithmically), then the smoothing or low-pass filter kernel will need to be variable-width as well to avoid aliasing (loss of sample points). Just use a different-width sinc interpolation kernel for each plot point (sinc width approximately the reciprocal of the local sampling rate).

Minimising interpolation error between two data sets

In the top of the diagrams below we can see some value (y-axis) changing over time (x-axis).
As this happens, we are sampling the value at different and unpredictable times. We are also alternating the sampling between two data sets, indicated by red and blue.
When computing the value at any time, we expect that both red and blue data sets will return similar values. However as shown in the three smaller boxes this is not the case. Viewed over time the values from each data set (red and blue) will appear to diverge and then converge about the original value.
Initially I used linear interpolation to obtain a value; next I tried Catmull-Rom interpolation. The former results in values that come close together and then drift apart between each data point; the latter results in values which remain closer, but where the average error is greater.
Can anyone suggest another strategy or interpolation method which will provide greater smoothing (perhaps by using a greater number of sample points from each data set)?
I believe your question does not have a straight answer without further knowledge of the underlying sampled process. By its nature, the value of the function between samples can be almost anything, so I think there is no way to ensure that the interpolations of two sample arrays converge.
That said, if you have prior knowledge of the underlying process, then you can choose among several interpolation methods to minimize the errors. For example, if you measure the drag force as a function of the wing velocity, you know the relation is quadratic (a*V^2). Then you can choose a second-order polynomial fit and get a pretty good match between the interpolations of the two series.
Try B-splines: Catmull-Rom interpolates (goes through the data points), B-spline does smoothing.
For example, for uniformly-spaced data (not your case)
Bspline(t) = (data(t-1) + 4*data(t) + data(t+1)) / 6
Of course the interpolated red / blue curves depend on the spacing of the red / blue data points,
so cannot match perfectly.
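As a rough Python sketch of the uniform-spacing formula above (the function name is made up, and the two end points are simply dropped rather than handled with a boundary condition):

```python
def bspline_smooth(data):
    """Apply the (1, 4, 1) / 6 uniform cubic B-spline weights to interior samples."""
    return [
        (data[t - 1] + 4 * data[t] + data[t + 1]) / 6
        for t in range(1, len(data) - 1)
    ]


# A single spike is spread over its neighbours instead of being passed through
spike = bspline_smooth([0, 0, 6, 0, 0])

# Samples that lie on a straight line are left unchanged
line = bspline_smooth([0, 1, 2, 3, 4])
```

The spike example shows the difference from Catmull-Rom: the smoothed value at the spike is 4 rather than 6, so the curve no longer passes through that data point.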
I'd like to quote Introduction to Catmull-Rom Splines to suggest not using Catmull-Rom for this interpolation task.
One of the features of the Catmull-Rom spline is that the specified curve will pass through all of the control points - this is not true of all types of splines.
By definition your red interpolated curve will pass through all red data points and your blue interpolated curve will pass through all blue points. Therefore you won't get a best fit for both data sets.
You might change your boundary conditions and use data points from both data sets for a piecewise approximation as shown in these slides.
I agree with ysap that this question cannot be answered as you may be expecting. There may be better interpolation methods, depending on your model dynamics - as with ysap, I recommend methods that utilize the underlying dynamics, if known.
Regarding the red/blue samples, I think you have made a good observation about sampled and interpolated data sets and I would challenge your original expectation that:
When computing the value at any time, we expect that both red and blue data sets will return similar values.
I do not expect this. If you assume that you cannot perfectly interpolate - and particularly if the interpolation error is large compared to the errors in the samples - then you are certain to have a continuous error function that exhibits its largest errors farthest (in time) from your sample points. Therefore two data sets that have differing sample points should exhibit the behaviour you see, because points that are far (in time) from red sample points may be near (in time) to blue sample points and vice versa - if staggered as your points are, this is sure to be true. Thus I would expect what you show, that:
Viewed over time the values from each data set (red and blue) will appear to diverge and then converge about the original value.
(If you do not have information about underlying dynamics (except frequency content), then Giacomo's points on sampling are key - however, you need not interpolate if looking at info below Nyquist.)
When sampling the original continuous function, the sampling frequency should comply with the Nyquist-Shannon sampling theorem; otherwise the sampling process introduces an error (known as aliasing). The error, being different in the two data sets, results in a different value when you interpolate.
Therefore, you need to know the highest frequency B of the original function and then collect samples at a rate of at least 2B. If your function has very high frequencies and you cannot sample that fast, you should at least try to filter them away before sampling.
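To make the effect concrete, here is a small Python sketch that computes the apparent frequency of a pure tone after uniform sampling. The function name is hypothetical, and it assumes evenly spaced samples, which is a simplification of the irregular sampling described in the question.

```python
def alias_frequency(f, fs):
    """Apparent frequency of a tone at f when sampled uniformly at rate fs."""
    # The sampled tone is indistinguishable from one shifted by a multiple of fs
    return abs(f - fs * round(f / fs))


# A 3 Hz tone sampled at 10 Hz satisfies fs >= 2B and keeps its frequency
ok = alias_frequency(3.0, 10.0)

# A 7 Hz tone sampled at 10 Hz violates it and shows up at 3 Hz instead
aliased = alias_frequency(7.0, 10.0)
```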

Resources