Scipy - data interpolation from one irregular grid to another irregular spaced grid - grid

I am struggling with the interpolation between two grids, and I couldn't find an appropriate solution for my problem.
I have 2 different 2D grids, of which the node points are defined by their X and Y coordinates. The grid itself is not rectangular, but forms more or less a parallelogram (so the X-coordinate for (i,j) is not the same as (i,j+1), and the Y coordinate of (i,j) is different from the Y coordinate of (i+1,j).
Both grids have a 37*5 shape and they overlap almost entirely.
For the first grid I have for each point the X-coordinate, the Y-coordinate and a pressure value. Now I would like to interpolate this pressure distribution of the first grid on the second grid (of which also X and Y are known for each point.
I tried different interpolation methods, but my end result was never correct due to the irregular distribution of my grid points.
Functions as interp2d or griddata require as input a 1D array, but if I do this, the interpolated solution is wrong (even if I interpolate the pressure values from the original grid again on the original grid, the new pressure values are miles away from the original values.
For 1D interpolation on different irregular grids I use:
def interpolate(X, Y, xNew):
if xNew<X[0]:
print 'Interp Warning :', xNew,'is under the interval [',X[0],',',X[-1],']'
yNew = Y[0]
elif xNew>X[-1]:
print 'Interp Warning :', xNew,'is above the interval [',X[0],',',X[-1],']'
yNew = Y[-1]
elif xNew == X[-1] : yNew = Y[-1]
else:
ind = numpy.argmax(numpy.bitwise_and(X[:-1]<=xNew,X[1:]>xNew))
yNew = Y[ind] + ((xNew-X[ind])/(X[ind+1]-X[ind]))*(Y[ind+1]-Y[ind])
return yNew
but for 2D I thought griddata would be easier to use. Does anyone have experience with an interpolation where my input is a 2D array for the mesh and for the data?

Have another look at interp2d. http://docs.scipy.org/scipy/docs/scipy.interpolate.interpolate.interp2d/#scipy-interpolate-interp2d
Note the second example in the 'x,y' section under 'Parameters'. 'x' and 'y' are 1-D in a loose sense but they can be flattened arrays.
Should be something like this:
f = scipy.interpolate.interp2d([0.25, 0.5, 0.27, 0.58], [0.4, 0.8, 0.42,0.83], [3, 4, 5, 6])
znew = f(.25,.4)
print znew
[ 3.]
znew = f(.26,.41) # midway between (0.25,0.4,3) and (0.27,0.42,5)
print znew
[ 4.01945345] # Should be 4 - close enough?
I would have thought you could pass flattened 'xnew' and 'ynew' arrays to 'f()' but I couldn't get that to work. The 'f()' function would accept the row, column syntax though, which isn't useful to you. Because of this limitation with 'f()' you will have to evaluate 'znew' as part of a loop - might should look at nditer for that. Make sure also that it does what you want when '(xnew,ynew)' is outside of the '(x,y)' domain.

Related

3D Ploting in Scilab: Weird plot behaviour

I want to plot a function in scilab in order to find the maximum over a range of numbers:
function y=pr(a,b)
m=1/(1/270000+1/a);
n=1/(1/150000+1/a);
y=5*(b/(n+b)-b/(m+b))
endfunction
x=linspace(10,80000,50)
y=linspace(10,200000,50)
z=feval(x,y,pr)
surf(x,y,z);
disp( max(z))
For these values this is the plot:
It's obvious that increasing the X axis will not increase the maximum but Y axis will.
However from my tests it seems the two axis are mixed up. Increasing the X axis will actually double the max Z value.
For example, this is what happens when I increase the Y axis by a factor of ten (which intuitively should increase the function value):
It seems to increase the other axis (in the sense that z vector is calculated for y,x pair of numbers instead of x,y)!
What am I doing wrong here?
With Scilab's surf you have to use transposed z if comming from feval. It is easy so realize if you use a different number of points in X and Y directions, as surf will complain about the size of the third argument. So in your case, use:
surf(x,y,z')
For more information see the help page of surf.
Stephane's answer is correct, but I thought I'd try to explain better why / what is happening.
From the help surf page (emphasis mine):
X,Y:
two vectors of real numbers, of lengths nx and ny ; or two real matrices of sizes ny x nx: They define the data grid (horizontal coordinates of the grid nodes). All grid cells are quadrangular but not necessarily rectangular. By default, X = 1:size(Z,2) and Y = 1:size(Z,1) are used.
Z:
a real matrix explicitly defining the heights of nodes, of sizes ny x nx.
In other words, think of surf as surf( Col, Row, Z )
From the help feval page (changed notation for convenience):
z=feval(u,v,f):
returns the matrix z such as z(i,j)=f(u(i),v(j))
In other words, in your z output, the i become rows (and therefore u should represent your rows), and j becomes your columns (and therefore v should represent your columns).
Therefore, you can see that you've called feval with the x, y arguments the other way round. In a sense, you should have designed pr so that it should have expected to be called as pr(y,x) instead, so that when passed to feval as feval(y,x,pr), you would end up with an output whose rows increase with y, and columns increase with x.
Then you could have called surf(x, y, z) normally, knowing that x corresponds to columns, and y corresponds to rows.
However, if you don't want to change your whole function just for this, which presumably you don't want to, then you simply have to transpose z in the call to surf, to ensure that you match x to the columns or z' (i.e, the rows of z), and y to the rows of z' (i.e. the columns of z).
Having said all that, it would probably be much better to make your function vectorized, and just use the surf(x, y, pr) syntax directly.

How to animate 3D scatter plot by adding each point at a time in R or MATLAB

I have a set of 3D coordinates here. The data has 52170 rows and 4 columns. Each row represent one point. The first column is point index number, increasing from 1 to 52170. The second to fourth columns are coordinates for x, y, and z axis, respectively. The first 10 lines are as follow:
seq x y z
1 7.126616 -102.927567 19.692112
2 -10.546907 -143.824966 50.77417
3 7.189214 -107.792068 18.758278
4 7.148852 -101.784027 19.905006
5 -14.65788 -146.294952 49.899158
6 -37.315742 -116.941185 12.316169
7 8.023512 -103.477882 19.081482
8 -14.641933 -145.100098 50.182739
9 -14.571636 -141.386322 50.547684
10 -15.691803 -145.66481 49.946281
I want to create a 3D scatter plot in which each point is added sequentially to this plot using R or MATLAB. The point represented by the first line is added first, then the point represented by the second line, ..., all the way to the last point.
In addition, I wish to control the speed at which points are added.
For 2D scatter plot, I could use the following code:
library(gganimate)
x <- rnorm(50, 5, 1)
y <- 7*x +rnorm(50, 4, 4)
ind <- 1:50
data <- data.frame(x, y, ind)
ggplot(data, aes(x, y)) + geom_point(aes(group = seq_along(x))) + transition_reveal(ind)
But I cannnot find information on how to do this for 3D scatter plot. Can anyone show me how this could be done? Thank you.
This is an answer for MATLAB
In a general fashion, animating a plot (or 3d plot, or scatter plot, or surface, or other graphic objects) can be done following the same approach:
Do the first plot/plot3/scatter/surf, and retrieve its handle. The first plot can incorporate the first "initial" sets of points or even be empty (use NaN value to create a plot with invisible data point).
Set axis limits and all other visualisation options which are going to be fixed (view point, camera angle, lightning...). No need to set the options which are going to evolove during the animation.
In a loop, update the minimum set of plot object properties: XData, YData ( ZData if 3D plot, CData if the plot object has some and you want to animate the color).
The code below is an implementation of the approach above adapted to your case:
%% Read data and place coordinates in named variables
csvfile = '3D scatter plot.csv' ;
data = csvread(csvfile,2) ;
% [optional], just to simplify notations further down
x = data(:,2) ;
y = data(:,3) ;
z = data(:,4) ;
%% Generate empty [plot3] objects
figure
% create an "axes" object, and retrieve the handle "hax"
hax = axes ;
% create 2 empty 3D point plots:
% [hp_new] will contains only one point (the new point added to the graph)
% [hp_trail] will contains all the points displayed so far
hp_trail = plot3(NaN,NaN,NaN,'.b','Parent',hax,'MarkerSize',2) ;
hold on
hp_new = plot3(NaN,NaN,NaN,'or','Parent',hax,'MarkerSize',6,'MarkerEdgeColor','r','MarkerFaceColor','g','LineWidth',2) ;
hold off
%% Set axes limits (to limit "wobbling" during animation)
xl = [min(x) max(x)] ;
yl = [min(y) max(y)] ;
zl = [min(z) max(z)] ;
set(hax, 'XLim',xl,'YLim',yl,'ZLim',zl)
view(145,72) % set a view perspective (optional)
%% Animate
np = size(data,1) ;
for ip=1:np
% update the "new point" graphic object
set( hp_new , 'XData',x(ip), 'YData',y(ip), 'ZData',z(ip) )
% update the "point history" graphic object
% we will display points from index 1 up to the current index ip
% (minus one) because the current index point is already displayed in
% the other plot object
indices2display = 1:ip-1 ;
set(hp_trail ,...
'XData',x(indices2display), ...
'YData',y(indices2display), ...
'ZData',z(indices2display) )
% force graphic refresh
drawnow
% Set the "speed"
% actually the max speed is given by your harware, so we'll just set a
% short pause in case you want to slow it down
pause(0.01) % <= comment this line if you want max speed
end
This will produce:

Find correct 2D translation of a subset of coordinates

I have a problem I wish to solve in R with example data below. I know this must have been solved many times but I have not been able to find a solution that works for me in R.
The core of what I want to do is to find how to translate a set of 2D coordinates to best fit into an other, larger, set of 2D coordinates. Imagine for example having a Polaroid photo of a small piece of the starry sky with you out at night, and you want to hold it up in a position so they match the stars' current positions.
Here is how to generate data similar to my real problem:
# create reference points (the "starry sky")
set.seed(99)
ref_coords = data.frame(x = runif(50,0,100), y = runif(50,0,100))
# generate points take subset of coordinates to serve as points we
# are looking for ("the Polaroid")
my_coords_final = ref_coords[c(5,12,15,24,31,34,48,49),]
# add a little bit of variation as compared to reference points
# (data should very similar, but have a little bit of noise)
set.seed(100)
my_coords_final$x = my_coords_final$x+rnorm(8,0,.1)
set.seed(101)
my_coords_final$y = my_coords_final$y+rnorm(8,0,.1)
# create "start values" by, e.g., translating the points we are
# looking for to start at (0,0)
my_coords_start =apply(my_coords_final,2,function(x) x-min(x))
# Plot of example data, goal is to find the dotted vector that
# corresponds to the translation needed
plot(ref_coords, cex = 1.2) # "Starry sky"
points(my_coords_start,pch=20, col = "red") # start position of "Polaroid"
points(my_coords_final,pch=20, col = "blue") # corrected position of "Polaroid"
segments(my_coords_start[1,1],my_coords_start[1,2],
my_coords_final[1,1],my_coords_final[1,2],lty="dotted")
Plotting the data as above should yield:
The result I want is basically what the dotted line in the plot above represents, i.e. a delta in x and y that I could apply to the start coordinates to move them to their correct position in the reference grid.
Details about the real data
There should be close to no rotational or scaling difference between my points and the reference points.
My real data is around 1000 reference points and up to a few hundred points to search (could use less if more efficient)
I expect to have to search about 10 to 20 sets of reference points to find my match, as many of the reference sets will not contain my points.
Thank you for your time, I'd really appreciate any input!
EDIT: To clarify, the right plot represent the reference data. The left plot represents the points that I want to translate across the reference data in order to find a position where they best match the reference. That position, in this case, is represented by the blue dots in the previous figure.
Finally, any working strategy must not use the data in my_coords_final, but rather reproduce that set of coordinates starting from my_coords_start using ref_coords.
So, the previous approach I posted (see edit history) using optim() to minimize the sum of distances between points will only work in the limited circumstance where the point distribution used as reference data is in the middle of the point field. The solution that satisfies the question and seems to still be workable for a few thousand points, would be a brute-force delta and comparison algorithm that calculates the differences between each point in the field against a single point of the reference data and then determines how many of the rest of the reference data are within a minimum threshold (which is needed to account for the noise in the data):
## A brute-force approach where min_dist can be used to
## ameliorate some random noise:
min_dist <- 5
win_thresh <- 0
win_thresh_old <- 0
for(i in 1:nrow(ref_coords)) {
x2 <- my_coords_start[,1]
y2 <- my_coords_start[,2]
x1 <- ref_coords[,1] + (x2[1] - ref_coords[i,1])
y1 <- ref_coords[,2] + (y2[1] - ref_coords[i,2])
## Calculate all pairwise distances between reference and field data:
dists <- dist( cbind( c(x1, x2), c(y1, y2) ), "euclidean")
## Only take distances for the sampled data:
dists <- as.matrix(dists)[-1*1:length(x1),]
## Calculate the number of distances within the minimum
## distance threshold minus the diagonal portion:
win_thresh <- sum(rowSums(dists < min_dist) > 1)
## If we have more "matches" than our best then calculate a new
## dx and dy:
if (win_thresh > win_thresh_old) {
win_thresh_old <- win_thresh
dx <- (x2[1] - ref_coords[i,1])
dy <- (y2[1] - ref_coords[i,2])
}
}
## Plot estimated correction (your delta x and delta y) calculated
## from the brute force calculation of shifts:
points(
x=ref_coords[,1] + dx,
y=ref_coords[,2] + dy,
cex=1.5, col = "red"
)
I'm very interested to know if there's anyone that solves this in a more efficient manner for the number of points in the test data, possibly using a statistical or optimization algorithm.

All points on Line

If I draw a line from let's say: (2,3) to (42,28), how can I get all points on the line in a Point list? I tried using the slope, but I can't seem to get the hang of it.
To be clear: I would like all the pixels that the line covers. So I can make the line 'clickable'.
This is a math question. The equation of a line is:
y = mx + c
So you need to figure out the gradient (m) and the intercept (c) and then plug in values for x to get values for y.
But what do you mean by "all the points on a line"? There is an infinite number of points if x and y are real numbers.
You can use the formula (x-x1)/(x1-x2) = (y-y1)/(y1-y2). And you know the points with x values ranging from 2 to 42 are on the line and their associated y values have to be found. If any of the resulting y value is not an integer then it should be approximated rightly. And if two consecutive y values differ by more than 1 then the missing y values should be mapped to the last x value.
Here is the pseudo code (tried to capture the crux of the algorithm)
prevY = y1
for(i=x1+1;i<=x2;++i)
{
y = computeY(i);
if(diff(y,prevY)>1) dump_points(prevY,y,i);
prevY = y;
dump_point(i,y);
}
dump_points(prevY,y2,x2);
I am probably not covering all the cases here (esp. not the corner ones). But the idea is that for one value of x there would could be many values of y and vice versa depending on the slope of the line. The algorithm should consider this and generate all the points.

Remove redundant points for line plot

I am trying to plot large amounts of points using some library. The points are ordered by time and their values can be considered unpredictable.
My problem at the moment is that the sheer number of points makes the library take too long to render. Many of the points are redundant (that is - they are "on" the same line as defined by a function y = ax + b). Is there a way to detect and remove redundant points in order to speed rendering ?
Thank you for your time.
The following is a variation on the Ramer-Douglas-Peucker algorithm for 1.5d graphs:
Compute the line equation between first and last point
Check all other points to find what is the most distant from the line
If the worst point is below the tolerance you want then output a single segment
Otherwise call recursively considering two sub-arrays, using the worst point as splitter
In python this could be
def simplify(pts, eps):
if len(pts) < 3:
return pts
x0, y0 = pts[0]
x1, y1 = pts[-1]
m = float(y1 - y0) / float(x1 - x0)
q = y0 - m*x0
worst_err = -1
worst_index = -1
for i in xrange(1, len(pts) - 1):
x, y = pts[i]
err = abs(m*x + q - y)
if err > worst_err:
worst_err = err
worst_index = i
if worst_err < eps:
return [(x0, y0), (x1, y1)]
else:
first = simplify(pts[:worst_index+1], eps)
second = simplify(pts[worst_index:], eps)
return first + second[1:]
print simplify([(0,0), (10,10), (20,20), (30,30), (50,0)], 0.1)
The output is [(0, 0), (30, 30), (50, 0)].
About python syntax for arrays that may be non obvious:
x[a:b] is the part of array from index a up to index b (excluded)
x[n:] is the array made using elements of x from index n to the end
x[:n] is the array made using first n elements of x
a+b when a and b are arrays means concatenation
x[-1] is the last element of an array
An example of the results of running this implementation on a graph with 100,000 points with increasing values of eps can be seen here.
I came across this question after I had this very idea. Skip redundant points on plots. I believe I came up with a far better and simpler solution and I'm happy to share as my first proposed solution on SO. I've coded it and it works well for me. It also takes into account the screen scale. There may be 100 points in value between those plot points, but if the user has a chart sized small, they won't see them.
So, iterating through your data/plot loop, before you draw/add your next data point, look at the next value ahead and calculate the change in screen scale (or value, but I think screen scale for the above-mentioned reason is better). Now do the same for the next value ahead (getting these values is just a matter of peeking ahead in your array/collection/list/etc adding the for next step increment (probably 1/2) to the current for value whilst in the loop). If the 2 values are the same (or perhaps very minor change, per your own preference), you can skip this one point in your chart by simply adding 'continue' in the loop, skipping adding the data point as the point lies exactly on the slope between the point before and after it.
Using this method, I reduce a chart from 963 points to 427 for example, with absolutely zero visual change.
I think you might need to perhaps read this a couple of times to understand, but it's far simpler than the other best solution mentioned here, much lighter weight, and has zero visual effect on your plot.
I would probably apply a "least squares" algorithm to obtain a line of best fit. You can then go through your points and downfilter consecutive points that lie close to the line. You only need to plot the outliers, and the points that take the curve back to the line of best fit.
Edit: You may not need to employ "least squares"; if your input is expected to hover around "y=ax+b" as you say, then that's already your line of best fit and you can just use that. :)

Resources