Finding the image boundary - r

While I use R quite a bit, I have just started an image analysis project and am using the EBImage package. I need to collect a lot of data from circular/elliptical images. The built-in function computeFeatures gives the maximum and minimum radius, but I need all of the radii it computes.
Here is the code so far. I have read the image, then thresholded and filled it.
actual.image = readImage("xxxx")
image = actual.image[,2070:4000]
image1 = thresh(image)
image1 = fillHull(image1)
As there are several objects in the image, I used the following to label them:
image1 = bwlabel(image1)
I generated features using the built-in function:
features = data.frame(computeFeatures(image1,image))
Now, computeFeatures gives the max and min radius, but for my analysis I need all of the radii it has computed for every object. Failing that, if I can get the coordinates of the boundaries of all the objects, I can compute the radii with some other code.
I know images are stored as matrices, and I could come up with a convoluted way to find the boundaries and then compute the radii, but I was wondering if there is a more elegant method?

You could try extracting each object plus some padding and plotting the x and y intensity profiles for each object. An intensity profile is simply the sum of the rows/columns, which can be computed with rowSums and colSums in R.
Then you could find where the profile drops off by splitting each intensity profile in half and finding the nearest minimum value on each side.
Maybe an example would help clear things up:
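A rough sketch of the idea (untested against your data; the object id and the padding are arbitrary, and image/image1 are the intensity image and bwlabel()ed mask from your code above):
obj.id  <- 1                                   # pick one labelled object
pad     <- 10                                  # padding around its bounding box
img.mat <- imageData(image)                    # plain matrix of pixel intensities
idx  <- which(imageData(image1) == obj.id, arr.ind = TRUE)
xr   <- range(idx[, 1])                        # object extent along x
yr   <- range(idx[, 2])                        # object extent along y
crop <- img.mat[max(1, xr[1] - pad):min(nrow(img.mat), xr[2] + pad),
                max(1, yr[1] - pad):min(ncol(img.mat), yr[2] + pad)]
xprof <- rowSums(crop)                         # intensity profile along x
yprof <- colSums(crop)                         # intensity profile along y
par(mfrow = c(1, 2))
plot(xprof, type = "l", main = "x profile")
plot(yprof, type = "l", main = "y profile")
The extent of the object along each axis is roughly the distance between the minima on either side of each profile's peak; looping over every label in image1 gives you a pair of measurements per object.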
Hopefully this makes sense

Related

Analyse Pixel distribution of a Rasterlayer

I really really need some advice. I have a raster with many pixels, and each pixel has one value. Now I want to do a spatial analysis of these pixels: I want to see which regions have the most pixels and which do not. It sounds simple, but it's not.
My idea was to do this with a kernel density estimate, but that does not work on a RasterLayer. It doesn't work with ppp either, because you can't transform a raster into that data type. I'm really lost and don't know what could work, so I would be very grateful for some help.
My pixels look like this:
There must be a way to show the regions with the most pixels and so on, but I don't know how to do it.
Short answer: convert your raster object to a pixel image of class im in the spatstat package. Then use Smooth.im. Example:
library(spatstat)
Z <- as.im(my_raster_data)
S <- Smooth(Z)
plot(S)
Long answer: you're using the term "pixel" in a nonstandard sense. The pixels are the small squares which make up the image. Your illustration shows a pixel image in which the majority of the pixels have the value 0 (represented by white colour), but a substantial number of individual pixels have values greater than 0 (up to about 0.3).
If I understand correctly, you would like to generate a colour image or heat map which has a brighter/warmer colour in those places where more of the pixels have positive values.
The simplest way is to use Gaussian smoothing of the pixel values in the image. This will calculate a spatially-varying average of the values of the nearby pixels, including the zero pixels. To do this, convert the raster to a pixel image of class im in the spatstat package:
Z <- as.im(my_raster_object)
then apply Smooth.im
S <- Smooth(Z)
plot(S)
Look at the help for Smooth.im for options to control the degree of smoothing.
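For example, to control the bandwidth explicitly (the value here is arbitrary, just to show the argument):
S <- Smooth(Z, sigma = 2)   # larger sigma gives a smoother, more diffuse surface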
If you wanted to ignore the actual colours (pixel values) in the input data, you could just transform them to binary values before smoothing:
B <- (Z > 0)
SB <- Smooth(B)
plot(SB)

Compress grayscale image using its histogram

I have a background in mathematics and machine learning, but I'm quite new to image compression. The other day I was thinking about the optimal way to compress an image using just a lookup table. That is, given an original image with N unique values, replace it with a new image with M unique values, where M < N. Given a fixed value of M, my question is how to pick those values. I realized that if we take the total error (MSE) over all pixels as the figure of merit, all the information has to be in the histogram of the pixel intensities. Intuitively, the most common values should be mapped to a closer value than the uncommon ones, making the denser regions of the histogram more finely represented in the new values than the sparse regions. Hence I was wondering whether there exists a mathematical formula that:
- Given the histogram h(x) of all the pixel intensities
- Given the number M of new unique values
defines the set of M new values {X_new} that minimizes the total error.
I tried to define the loss function and take its derivative, but some argmax operations appeared that I don't know how to differentiate. Still, my intuition tells me that a closed-form solution should exist.
Example:
Say we have an image with just 10 pixels, with values {1,1,1,1,2,2,2,2,3,3}. Initially N=3,
and we are asked to select the M=2 unique values that minimize the error. It is clear that we have to pick the 2 most common ones, so {X_new}={1,2} and the new image will be "compressed" as {1,1,1,1,2,2,2,2,2,2}. If we are asked to pick M=1, we would pick {X_new}={2} to minimize the error.
Thanks!
This is called color quantization or palettization. It is essentially a clustering problem, usually in the 3D RGB space. Each cluster becomes a single color in the downsampled image. The GIF and PNG image formats both support palettes.
There are many clustering algorithms out there, with a lot of research behind them. For this, I would first try k-means and DBSCAN.
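For the 1-D grayscale case in your question, a minimal sketch with base R's kmeans (using the toy values from your example; the starting centres are chosen by hand just to keep them distinct):
px <- c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3)   # pixel intensities
km <- kmeans(px, centers = c(1, 3))     # M = 2 clusters, hand-picked distinct starting centres
X_new      <- as.vector(km$centers)     # the M representative values (cluster means)
compressed <- X_new[km$cluster]         # each pixel replaced by its cluster's value
Note that k-means centres are cluster means, which is what minimises the MSE within each cluster; they need not be values that already occur in the image (for this example the optimal pair is 1 and roughly 2.33 rather than {1, 2}).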
Note that palettization would only be one part of an effective image compression approach. You would also want to take advantage of the spatial correlation between pixels (often handled with a 2-D spatial-frequency analysis such as a discrete cosine transform or wavelet transform), as well as of the human eye's lower resolution for colour discrimination compared with its grayscale acuity.
Unless you want to embark on a few years of research to improve the state of the art, I recommend that you use existing image compression algorithms and formats.

Matlab contourf() to plot data on a global map

I have been using Matlab 2011b and contourf/contourfm to plot 2D data on a map of North America. I started from the help page for contourfm on the mathworks website, and it works great if you use their default data called "geoid" and reference vector "geoidrefvec."
Here is some simple code that works with the preset data:
figure
axesm('MapProjection','lambert','maplo',[-175 -45],'mapla',[10 75]);
framem; gridm; axis off; tightmap
load geoid
%geoidrefvec=[1 90 0];
load 'TECvars.mat'
%contourfm(ITEC, geoidrefvec, -120:20:100, 'LineStyle', 'none');
contourfm(geoid, geoidrefvec, -120:20:100, 'LineStyle', 'none');
coast = load('coast');
geoshow(coast.lat, coast.long, 'Color', 'black')
whitebg('w')
title(sprintf('Total Electron Content Units x 10^1^6 m^-^2'),'Fontsize',14,'Color','black')
%axis([-3 -1 0 1.0]);
contourcbar
The problem arises when I try to use my own data. I am quite sure the reference vector determines where the data is plotted on the globe, but I was not able to find any documentation about how this vector works or how to create one for different data.
Here is a .mat file with my data. ITEC is the matrix of values to be plotted. Information about the position of the grid relative to the earth can be found in the cell array called RT, but the basic idea is: ITEC(1,1) refers to Lat = 11, Long = -180 and ITEC(58,39) refers to Lat = 72.5, Long = -53, with evenly spaced data in between.
Does anyone know how the reference vector defines where the data is placed on the map? Or perhaps there is another way to accomplish this? Thanks in advance!
OK, so I figured it out. I realized that, given that there are only three elements in the vector, the spacing between latitude data points must be the same as the spacing between longitude data points; that is, the spacing between each horizontal data point must equal the spacing between each vertical point, for instance 1 degree.
The first value in the reference vector is the number of grid cells per degree (with 1-degree spacing this equals the distance in degrees between data points, which is why it worked in my case), and the remaining two values are the northernmost latitude and westernmost longitude of the grid, i.e. the latitude and longitude of its northwest corner.
In my case the data was equally spaced in each direction, but the vertical and horizontal spacings were not the same, so I simply interpolated the data onto a 1-degree-by-1-degree grid and set the first value in the vector to 1.
Hopefully this will help someone with the same problem.
Quick question though: since I answered my own question, do I get the bounty? I'd hate to lose 50 'valuable' reputation points haha

Disperse points in a 2D visualisation

I have a set of points like this (that I have clustered using R):
180.06576696, 192.64378568
180.11529253999998, 192.62311824
180.12106092, 191.78020965999997
180.15299478, 192.56909828000002
180.2260287, 192.55455869999997
These points are dispersed around a center point or centroid.
The problem is that the points are very close together and are, thus, difficult to see.
So, how do I move the points apart so that I can distinguish each point more clearly?
Thanks,
s
Maybe I'm overlooking some intricacy here, but...multiply by 10?
EDIT
Assuming the data you listed above are Cartesian (x,y) coordinate pairs, you can visualize them as a scatter plot using Google Charts. I've rounded your data to 3 decimal places, because Google Charts doesn't appear to handle higher precision than that.
I don't know the coordinates for your central point. In the above chart, I'm assuming it is somewhere nearby and not at (0,0). If it is at (0,0), then I imagine it will be difficult to visualize all of the data at once without some kind of "zoom-in" feature, scaling the data, or a very large screen.
slotishtype, without going into code, I think you first need to add in the following tweaking parameters to be used by the visualization code.
Given an x-by-y display box, the goal is to fill the entire box, controlled by input parameters in the range [0.0, 1.0]:
overlap: the allowance for points to be placed on top of each other
completeness: how important is it to display all of your data points
centroid_display: how important is it to see the centroid in the same output
These produce the dependent parameter
scale: the ratio of display distances to numerical distances
You will need code to
calculate the distance(s) to the centroid like you said,
and also the distances between data points, adjusting the output according to the chosen input parameters (a rough sketch of the basic scaling step follows below).
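As a minimal illustration of that scaling step in R (the expansion factor k is arbitrary and purely for display; the coordinates are the ones listed in the question, rounded):
pts <- matrix(c(180.066, 192.644,
                180.115, 192.623,
                180.121, 191.780,
                180.153, 192.569,
                180.226, 192.555), ncol = 2, byrow = TRUE)
centroid <- colMeans(pts)
k <- 10                                          # how strongly to push points apart
spread <- sweep(sweep(pts, 2, centroid) * k, 2, centroid, FUN = "+")
plot(spread, xlab = "x", ylab = "y", pch = 19)   # dispersed points
points(centroid[1], centroid[2], pch = 3)        # the centroid stays fixed
Each point keeps its direction from the centroid; only its displayed distance is exaggerated, which preserves the relative arrangement of the cluster.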
I take inspiration from the fundamentals in the GraphViz dot manual. Look at the "Drawing Orientation, Size and Spacing" on p12.

How to resample/rebin a spectrum?

In Matlab, I frequently compute power spectra using Welch's method (pwelch), which I then display on a log-log plot. The frequencies estimated by pwelch are equally spaced, yet logarithmically spaced points would be more appropriate for the log-log plot. In particular, when saving the plot to a PDF file, this results in a huge file size because of the excess of points at high frequency.
What is an effective scheme to resample (rebin) the spectrum, from linearly spaced frequencies to log-spaced frequencies? Or, what is a way to include high-resolution spectra in PDF files without generating excessively large file sizes?
The obvious thing to do is to simply use interp1:
rate = 16384; %# sample rate (samples/sec)
nfft = 16384; %# number of points in the fft
[Pxx, f] = pwelch(detrend(data), hanning(nfft), nfft/2, nfft, rate);
f2 = logspace(log10(f(2)), log10(f(end)), 300);
Pxx2 = interp1(f, Pxx, f2);
loglog(f2, sqrt(Pxx2));
However, this is undesirable because it does not conserve power in the spectrum. For example, if there is a big spectral line between two of the new frequency bins, it will simply be excluded from the resulting log-sampled spectrum.
To fix this, we can instead interpolate the integral of the power spectrum:
df = f(2) - f(1);
intPxx = cumsum(Pxx) * df; % integrate
intPxx2 = interp1(f, intPxx, f2); % interpolate
Pxx2 = diff([0 intPxx2]) ./ diff([0 f2]); % difference to recover the rebinned PSD
This is cute and mostly works, but the bin centers aren't quite right, and it doesn't intelligently handle the low-frequency region, where the frequency grid may become more finely sampled.
Other ideas:
write a function that determines the new frequency binning and then uses accumarray to do the rebinning.
Apply a smoothing filter to the spectrum before doing interpolation. Problem: the smoothing kernel size would have to be adaptive to the desired logarithmic smoothing.
The pwelch function accepts a frequency-vector argument f, in which case it computes the PSD at the desired frequencies using the Goertzel algorithm. Maybe just calling pwelch with a log-spaced frequency vector in the first place would be adequate. (Is this more or less efficient?)
For the PDF file-size problem: include a bitmap image of the spectrum (seems kludgy--I want nice vector graphics!);
or perhaps display a region (polygon/confidence interval) instead of simply a segmented line to indicate the spectrum.
I would let pwelch do the work for me and give it the log-spaced frequencies from the start. The doc states that the frequencies you specify will be rounded to the nearest DFT bin. That shouldn't be a problem since you are using the results for plotting. If you are concerned about the runtime, I'd just try it and time it.
If you want to rebin it yourself, I think you're better off just writing your own function to do the integration over each of your new bins. If you want to make your life easier, you can do what they do and make sure your log bins share boundaries with your linear ones.
Solution found: https://dsp.stackexchange.com/a/2098/64
Briefly, one solution to this problem is to perform Welch's method with a frequency-dependent transform length. The above link is to a dsp.SE answer containing a paper citation and sample implementation. A disadvantage of this technique is that you can't use the FFT, but because the number of DFT points being computed is greatly reduced, this is not a severe problem.
If you want to resample an FFT at a variable rate (logarithmically), then the smoothing or low-pass filter kernel will need to have a variable width as well to avoid aliasing (loss of sample points). Just use a different-width sinc interpolation kernel for each plot point (sinc width approximately the reciprocal of the local sampling rate).
