Using heatmaply for a 2D histogram / density in R

I'm rather new to programming and the site so let me know if I screw up on this explanation.
I have a rather long series of x, y coordinates representing a character in 2d space. Let's say that space is 200 x 400. I want to represent the amount of time the character spends at each x, y coordinate by means of a pretty choropleth.
I want to use heatmaply for this because I think the output is pretty and I want my audience to be able to zoom in on the data. It isn't really meant to do density estimation (I think?) so I'm trying to work around it.
I suppose the way to do this is to fill a 200x400 data frame with counts of the number of occurrences of each x, y coordinate in the data, placed at the matching x, y coordinate in the frame. Essentially, to build a 2d map out of the data frame and impose the counts upon it.
So, I suppose my questions are:
1) How do I get the count of each unique x, y coordinate in my set?
2) How might I pass those counts easily to the matching x, y cell in my 200x400 data frame full of zeroes?
This seems like it should be easy but I can't seem to figure it out! I'm a novice to R and can't see the shape of what I need to do.

You can use the table function to get your matrix of counts:
table(X, Y)
Here X and Y are your vectors of x and y coordinates.
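For example, here is a minimal sketch on made-up sample data (the sample coordinates and the dendrogram = "none" choice are illustrative, not from the original post). Wrapping each vector in factor() with explicit levels makes table() return the full 200 x 400 grid with zeroes for unvisited coordinates, which answers the second question too:
library(heatmaply)
# made-up sample data: 1000 observed positions in a 200 x 400 space
x <- sample(1:200, 1000, replace = TRUE)
y <- sample(1:400, 1000, replace = TRUE)
# table() counts each (x, y) pair; explicit factor levels keep
# zero-count coordinates, so the result is a full 200 x 400 matrix
counts <- table(factor(x, levels = 1:200), factor(y, levels = 1:400))
# heatmaply accepts a numeric matrix; turn off clustering so the
# axes keep their spatial order
heatmaply(as.matrix(counts), dendrogram = "none")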

Related

How can I push to a specific series in a Julia plot?

I'm trying to create an animation in Julia where a satellite orbits Earth. Earth in this case is represented by a static circle and the satellite's trajectory is a path extending from the launch point to the satellite's current position.
If I understand the process correctly, to create a gif in Julia, I need to use the @gif macro with a loop and create the next gif frame on each iteration of the loop. I've been attempting to plot Earth, then plot the launch point, then push the next position in the satellite's trajectory on each loop iteration, but it's pushing data to the Earth dataset.
I also have other plots that I would like to animate, but the animation examples that use multiple data series don't specify any x values. I need to specify x and y values for each datapoint in each series.
How can I specify the series to push a new point to?
Well, while trying to put together a small example script, I figured it out.
To begin, the conditions under which you can use push! with a plot are fairly specific. You can't use an Int64 (or any other type of integer) as an x value, or push! will try to access the plot like an array at the "index" specified by your x data. This means you have to ensure every input is a Float (I didn't try this with more exotic plotting data types like Bools, but I assume that wouldn't go well either).
Also, the x and y (and z) data in a plot can't be something that push! doesn't work on normally, like a StepRangeLen or other range type (e.g. t = 0:10). Unfortunately this introduces an extra layer of complexity; if you need to use ranges in your plots, you'll have to convert them to Arrays: t = Array{Float64}(0:10).
Finally, it's probably good practice to pass in as many x and y values on each call to push! as you have series (if this wording is awkward, see the example below). Some of the examples for the Plots package add complexity in specifying a single x value for multiple y values, which is fine if your x values are the same for both series, but becomes a problem if they're different.
Putting all of this together, here's a minimal example of pushing to different series:
using Plots
# Let x and z be two different-valued, different-length vectors
x = Array{Float64}(range(0, stop=π, length=30))
z = Array{Float64}(range(0, stop=-π, length=20))
p = plot(x,sin.(x))
plot!(p, z, cos.(z))
# Pushing a single x,y pair goes to the first series:
push!(p, 0.0, -0.5)
# Pushing a single x value and a 2x1 Array sends the x value to
# both series, the first y value to the first series, and the
# second y value to the second series.
push!(p, -0.2, [-0.75, 0.2])
# Note: the comma before the y vector is important
# Pushing two x values and two y values sends the first x value to
# the first series and the second x value to the second series.
# Same for the y values, which is the same as the previous example
push!(p, [-π/4, π/4], [0.1, 0.2])
# If you want to push only to one series, send a NaN to the others:
push!(p, [NaN, -3π/2], [NaN, 1.0])
display(p)
The plot is pretty incoherent if you run this as-is. I recommend commenting out all of the push! statements and then uncommenting them one at a time to see each one's effect on the plot.

3D Plotting in Scilab: Weird plot behaviour

I want to plot a function in Scilab in order to find the maximum over a range of numbers:
function y=pr(a,b)
m=1/(1/270000+1/a);
n=1/(1/150000+1/a);
y=5*(b/(n+b)-b/(m+b))
endfunction
x=linspace(10,80000,50)
y=linspace(10,200000,50)
z=feval(x,y,pr)
surf(x,y,z);
disp( max(z))
For these values this is the plot:
It's obvious that increasing the X range will not increase the maximum, but increasing the Y range will.
However, from my tests it seems the two axes are mixed up. Increasing the X range will actually double the max Z value.
For example, this is what happens when I increase the Y axis by a factor of ten (which intuitively should increase the function value):
It seems to affect the other axis instead (in the sense that the z matrix is calculated for (y, x) pairs of numbers instead of (x, y))!
What am I doing wrong here?
With Scilab's surf you have to use the transposed z if it comes from feval. This is easy to see if you use a different number of points in the X and Y directions, as surf will then complain about the size of the third argument. So in your case, use:
surf(x,y,z')
For more information see the help page of surf.
Stephane's answer is correct, but I thought I'd try to explain better why / what is happening.
From the help surf page (emphasis mine):
X,Y:
two vectors of real numbers, of lengths nx and ny ; or two real matrices of sizes ny x nx: They define the data grid (horizontal coordinates of the grid nodes). All grid cells are quadrangular but not necessarily rectangular. By default, X = 1:size(Z,2) and Y = 1:size(Z,1) are used.
Z:
a real matrix explicitly defining the heights of nodes, of sizes ny x nx.
In other words, think of surf as surf(Col, Row, Z)
From the help feval page (changed notation for convenience):
z=feval(u,v,f):
returns the matrix z such that z(i,j)=f(u(i),v(j))
In other words, in your z output the index i runs over rows (and therefore u should represent your rows), and j runs over columns (and therefore v should represent your columns).
Therefore, you can see that you've called feval with the x, y arguments the other way round. In a sense, you should have designed pr so that it expected to be called as pr(y,x) instead; then, when passed to feval as feval(y,x,pr), you would end up with an output whose rows increase with y and whose columns increase with x.
Then you could have called surf(x, y, z) normally, knowing that x corresponds to columns, and y corresponds to rows.
However, if you don't want to change your whole function just for this, which presumably you don't, then you simply have to transpose z in the call to surf, to ensure that you match x to the columns of z' (i.e., the rows of z), and y to the rows of z' (i.e., the columns of z).
Having said all that, it would probably be much better to make your function vectorized, and just use the surf(x, y, pr) syntax directly.

How to cleanly use interpolation between points to generate a mean in R

I am having issues trying to write code that will cleanly produce a mean (specifically a weighted average) based on a simple plot of points, using interpolation.
For Example;
ex=c(1,2,3,4,5)
why=c(2,5,9,15,24)
This shows the kind of information I am working with.
plot(ex, why, type="o")
At this point, I want to actually have each point "binned" so the lines between them are straight. To do this, I have been adding points to the x values manually in Excel as (x+0.01).
This is the new output:
why=c(2,2,5,5,9,9,15,15,24,24)
ex=c(1,2,2.01,3,3.01,4,4.01,5,5.01,6)
plot(ex, why, type="o")
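For reference, the same doubled vectors can be built directly in R instead of Excel; a minimal sketch that reproduces the vectors above (the trailing 6 mirrors the example):
ex <- c(1, 2, 3, 4, 5)
why <- c(2, 5, 9, 15, 24)
why2 <- rep(why, each = 2)                       # repeat each y twice
ex2 <- c(1, rep(2:5, each = 2) + c(0, 0.01), 6)  # offset every second x by 0.01
plot(ex2, why2, type = "o")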
So this is where my question comes into play. I have to do this many times and do not want to generate a ton of new vectors and objects. To get a weighted average, I have been interpolating y values into a new object at increments of 0.01 in x. I can then go into this new object and get a mean when a point falls between the actual ex values, i.e.
mean(newy[1:245])
Because I made new y values for 100 increments of x that (basically) follow a straight line, I am getting a weighted average here for x = 1 to 2.45.
Is there an easier and more elegant way to embed the interpolation code into the mean code, so I could just ask for the average of interpolated y from one non-integer x to another?
It doesn't do exactly what you want, but you should consider the stepfun function -- this creates a step function out of two series.
plot(stepfun(ex[-1], why))
stepfun is handy because it gives you a function defined over that interval, so you can easily interpolate just by evaluating anywhere. The downside to it is that it is not strictly defined on the range given (which is why we have to cut off the first value in ex).
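As a minimal sketch of that idea (reusing the ex and why vectors from the question, and picking x = 1 to 2.45 as the example interval), you can build the step function once and then average its values over any interval without creating new vectors by hand:
f <- stepfun(ex[-1], why)       # y jumps at each interior ex value
xs <- seq(1, 2.45, by = 0.01)   # fine grid over the interval of interest
mean(f(xs))                     # weighted average of y for x = 1 to 2.45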
Based on your second plotting example, I think you are probably looking for this:
library(ggplot2)
qplot(ex, why, geom="step")
this gives a step plot (horizontal-then-vertical at each point, the default).
Or if you want the line to go vertical first, you can use:
qplot(ex, why, geom="step", direction = "vh")
which gives the same step plot with the vertical segment drawn first at each point.

Inverse interpolation of multidimensional grids

I am working on a project interpolating sample data {(x_i,y_i)} where the inputs x_i lie in 4D space and the outputs y_i lie in 3D space. I need to generate lookup tables for both directions. I managed to generate the 4D -> 3D table, but the 3D -> 4D one is tricky. The sample data are not on regular grid points, and the mapping is not one-to-one. Is there any known method for treating this situation? I did some searching online, but what I found is only for 3D -> 3D mappings, which are not suitable for this case. Thank you!
To answer the questions of Spektre:
X(3D) -> Y(4D) is the case 1X -> nY
I want to generate a table so that for any given X, we can find the value for Y. The sample data do not occupy the whole domain of X, but that's fine; we only need accuracy for points inside the domain of the sample data. For example, we have sample data like {(x1,x2,x3) -> (y1,y2,y3,y4)}. It is possible that the same (x1,x2,x3) also maps to another sample (y1_1,y2_1,y3_1,y4_1), but that is OK. We need a table where any (a,b,c) in space X corresponds to ONE (e,f,g,h) in space Y. There might be more than one choice, but we only need one. (Sorry for any confusing symbols.)
One possible way to deal with this: since I have already established a smooth mapping from Y -> X, I can use Newton's method or any other method to reverse-search for the point y corresponding to any given x. But it is not accurate enough and is time-consuming, because I need to do a search for each point in the table, and the error is the sum of the model error and the search error.
So I want to know whether it is possible to find a mapping that interpolates the sample data directly, instead of doing that kind of search.
You are looking for projections/mappings
As you mentioned, you have a projection X(3D) -> Y(4D) which is not one-to-one in your case, so which case is it: (1 X -> n Y), (n X -> 1 Y), or (n X -> m Y)?
you want to use a look-up table
I assume you just want to generate all X for a given Y. The problem with non-one-to-one mappings is that you can use a lookup table only if it has:
all valid points
or the mapping has some geometric or mathematical symmetry (for example, the distance between points in X and Y space is similar, and the mapping is continuous)
You cannot interpolate between generic mapped points, so the question is what kind of mapping/projection you have in mind.
First the 1->1 projections/mappings interpolation
if your X->Y projection mapping is suitable for interpolation
then for 3D -> 4D use tri-linear interpolation. Find the closest 8 points (one along each axis direction, forming a grid cube) and interpolate between them in all 4 dimensions.
if your X<-Y projection mapping is suitable for interpolation
then for 4D -> 3D use quatro-linear interpolation. Find the closest 16 points (one along each axis direction, forming a grid hypercube) and interpolate between them in all 3 dimensions.
Now what about 1->n or n->m projections/mappings
That solely depends on the projection/mapping properties, of which I know nothing. Try to provide an example of your datasets; adding an image would be best.
[edit1] 1 X <- n Y
I would still use quatro-linear interpolation. You will still need to search your Y table, but if you organize it like a 4D grid then it should be easy enough.
find the 16 closest points in the Y-table to your input Y point
These points should be the closest points to your Y in the +/- direction along each axis. In 3D it looks like this:
red point is your input Y point
blue points are the found closest points (the grid); they do not need to be as symmetric as in the image
Please do not ask me to draw a 4D example that makes sense :) (at least not to a sober mind)
interpolation
find the corresponding X points. If there is more than one per point, choose the one closest to the others ... Now you should have 16 X points and 16+1 Y points. Then from the Y points you just need to calculate the distance along the lines from your input Y point. These distances are used as the parameters for the linear interpolations. Normalize them to <0,1>, where
0 means 'left' and 1 means 'right' point
0.5 means exact middle
You will need this scalar distance in each Y-domain dimension. Now just compute all the X points along the linear interpolations until you get the corresponding red point in the X-domain.
With tri-linear interpolation (3D) there are 4+2+1 = 7 linear interpolations (as in the image). For quatro-linear interpolation (4D) there are 8+4+2+1 = 15 linear interpolations.
linear interpolation
X = X0 + (X1-X0)*t
X is interpolated point
X0,X1 are the 'left','right' points
t is the distance parameter <0,1>
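To make the nesting concrete, here is a small R sketch (the function names are illustrative) showing how that single lerp composes; a 2D cell takes 2+1 = 3 lerps, which extends to the 4+2+1 = 7 (3D) and 8+4+2+1 = 15 (4D) counts above:
lerp <- function(x0, x1, t) x0 + (x1 - x0) * t
# bilinear (2D) case: x00, x10, x01, x11 are the four cell corners,
# tx and ty are the normalized distances in <0,1> along each axis
bilerp <- function(x00, x10, x01, x11, tx, ty) {
  a <- lerp(x00, x10, tx)   # interpolate the 'bottom' edge along axis 1
  b <- lerp(x01, x11, tx)   # interpolate the 'top' edge along axis 1
  lerp(a, b, ty)            # interpolate the two results along axis 2
}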

Is there a way of forcing the image function in R not to normalize coordinates?

When using the image function in R, it normalizes the lengths of the dimensions of the input matrix so the X and Y axes go from 0 to 1.
Is there a way of telling the image function not to normalize these numbers?
I need to do so in order to overlay different kinds of data and normalizing all these coordinates into the [0,1] space is very tedious.
EDIT: The answer provided by Greg explains the situation.
A reproducible example would be very helpful here. Generally if you only give image a z matrix then the function chooses default x and y values that work, I think this is what you are seeing. On the other hand if you give image an x vector and a y vector then it uses that information to construct the graph. If the x/y vectors have a length equal to the corresponding dimension of z then those values represent the centers of the rectangles, if x/y is 1 longer than the corresponding dimension of z then they represent the corners of the rectangles. This gives you a lot of control over the things that you mention.
If this does not answer the question then give us a self contained reproducible example to work with.
I am going to answer my question based on the answer Greg Snow provided in order to follow the best practice of this site as anything that provides information should be an answer.
If you do not provide the x nor y parameters to the image() function, then the range of the axes is from 0 to 1 as in the next example.
> image(volcano)
Then, if you want to locate a point of interest in the matrix, say the element with [x, y] coordinates [10, 40], you need to do something like:
> points(x=10/length(volcano[,1]),y=40/length(volcano[1,]))
If the x and y parameters are specified, and (as Greg mentioned) they fit the dimensions of the matrix, then the axes will range within the specified x and y vectors.
> dim(volcano)
[1] 87 61
> image(x=1:87, y=1:61, z=volcano)
> points(10,40)
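For completeness, a small sketch of the other case Greg described, where x and y are one element longer than the corresponding dimensions of z and therefore mark the rectangle corners; the 0.5 offsets are an illustrative choice that keeps the cell centers at 1:87 and 1:61, so the same point call still lands correctly:
> image(x = seq(0.5, 87.5, by = 1), y = seq(0.5, 61.5, by = 1), z = volcano)
> points(10, 40)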
