Is there an efficient way to count dots in cells? - graph

I have graphs of sets of points like:-
There are up to 1 million points on each graph. You can see that the points are scattered over a grid of cells, each sized 200 x 100 units. So there are 35 cells shown.
Is there an efficient way to count how many points there are in each cell? The brute force approach seems to be to parse the data 35 times with a whole load of combined less-than and greater-than comparisons.

Some of the steps below could be optimized, in the sense that you could perform them as you build up the data set. However, I'll assume you are just given a series of points and have to find which cells they fall into. If you can inject your own code into the step that builds up the graph, you could do the work described below alongside building the graph instead of after the fact.
You're stuck with brute force if you are just given the data: you have to visit each point at least once to figure out what cell it is in, so we are stuck with O(n). If you have some other knowledge you could exploit, that would be up to you to utilize, but since it wasn't mentioned in the OP I will assume we're stuck with brute force.
The high level strategy would be as follows:
// 1) Set rectangle bounds to have minX/Y at +inf, and maxX/Y to be -inf
// or initialize it with the first point
// 2) For each point:
// Set the min with min(point.x, bounds.min.x)
// Same for the max as well
// 3) Now that you have your bounds, work out how many cells fit onto each axis.
// Use floor(...) + 1 rather than ceil(...): a point lying exactly on the far
// edge (e.g. a range equal to CELL_WIDTH) would otherwise index one past the end
int cols = int(floor(float(bounds.max.x - bounds.min.x) / CELL_WIDTH)) + 1;
int rows = int(floor(float(bounds.max.y - bounds.min.y) / CELL_HEIGHT)) + 1;
// 4) You have the # of cells for the width and height, so make a 2D array of
// some sort that is w * h cells (each cell contains 32-bit int at least) and
// initialize to zero if this is C or C++
// 5) Figure out the cell number by subtracting the bottom left corner of our
// bounds (which should be the min point on the x/y axis that we found from (1))
for (Point p : points) {
    int col = int((p.x - bounds.min.x) / CELL_WIDTH);
    int row = int((p.y - bounds.min.y) / CELL_HEIGHT);
    data[row][col]++;
}
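For anyone who wants something runnable, here is a minimal Python sketch of the steps above. The function name and the default cell size are my own choices for illustration (the 200 x 100 comes from the question), and it anchors the cells at the minimum point found in the data, exactly as in step (5), which may not coincide with the grid lines drawn on the graph.

```python
def count_points_in_cells(points, cell_width=200, cell_height=100):
    """Count points per cell; returns a rows x cols list of lists.

    Hypothetical helper: cells are anchored at the smallest x and y
    found in the data, not at any particular grid origin.
    """
    # Steps 1-2: one pass to find the bounding rectangle.
    min_x = min(x for x, _ in points)
    max_x = max(x for x, _ in points)
    min_y = min(y for _, y in points)
    max_y = max(y for _, y in points)

    # Step 3: floor + 1 so a point on the far edge still gets a cell.
    cols = int((max_x - min_x) // cell_width) + 1
    rows = int((max_y - min_y) // cell_height) + 1

    # Step 4: zero-initialised 2D table.
    counts = [[0] * cols for _ in range(rows)]

    # Step 5: second pass, translate to the origin and bucket.
    for x, y in points:
        col = int((x - min_x) // cell_width)
        row = int((y - min_y) // cell_height)
        counts[row][col] += 1
    return counts


sample = [(0, 0), (199, 99), (200, 0), (450, 250)]
grid = count_points_in_cells(sample)
```

Both passes are linear in the number of points, so a million points should not be a problem.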
Optimizations:
There are some ways we might be able to speed this up off the top of my head:
If the cell width and height are powers of two, you could replace the divisions with bit shifts (for integer coordinates). If it's a multiple of ten, this might possibly speed things up if you aren't using C or C++, but I haven't profiled this, so HotSpot in Java and the like may already do this for you anyway (and I have no idea about Python). Then again, 1 million points should be pretty fast.
We don't need to go over the whole range at the beginning, we could just keep resizing our table and adding new rows and columns if we find a bigger value. This way we'd only do one iteration over all the points instead of two.
If you don't care about the extra space usage and your numbers are positive only, you could avoid the "translate to origin" subtraction step by just assuming everything is already relative to the origin and not subtract at all. You could get away with this by modifying step (1) of the code to have the min start at 0 instead of inf (or the first point if you chose that). This might be bad however if your points are really far out on the axis and you end up creating a ton of empty slots. You'd know your data and whether this is possible or not.
There's probably a few more things that can be done but this would get you on the right track to being efficient with it. You'd be able to work back to which cell it is as well.
EDIT: This assumes you won't have some really small cell width compared to the grid size (like your width being 100 units, but your graph could span by 2 million units). If so then you'd need to look into possibly sparse matrices.


Perlin noise different implementations

I have read some articles on perlin noise but each seems to have their own way of implementation:
In this article, the gradient function returns a single double value.
In this article, the gradient is generated as a 3D vector.
In this article a static 256 array of random gradient vectors is generated and a random one is picked using the permutation table and then more complex details of spherical gradients are discussed.
And these are just a few of the articles I saw. With all these variations of the same algorithm which one do I use or which one is suitable for what purpose?
I have generated terrains and height maps with each of these techniques and their respective outputs widely differ in their own ways and I can't tell if I am doing it right cause I don't know what to look for in the output (cause it's just random values at the end)
I am just looking for some context on when to use what, so any insight would be very useful.
There are multiple ways to implement the same algorithm, some are faster or slower than others, some are easier or harder to understand. The original implementation by Ken Perlin is difficult to understand by just looking at it. So some of the articles you linked (including #2, which I wrote, yay!), try to simplify the implementation to make it easier to understand.
But in the end, it's exactly the same algorithm:
Take the input, calculate the coordinates of the 4 corners of the square (for 2D Perlin noise, or cube if using the 3D version) containing the input point
Calculate a random value for all 4 of them (by first assigning a random gradient vector to each one (there are 4 possibilities in 2D: (+1, +1), (-1, +1), (-1, -1) and (+1, -1)), then calculating the dot product between this random gradient vector and the vector from the corner of the square to the input point)
Finally, smoothly interpolate between those 4 random values to get a final value
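The three steps above can be sketched in Python for the 2D case. This is only an illustration under my own assumptions: the function names, the seeded permutation table, and the way the table is hashed into one of the 4 gradients are choices I made here, not the exact code from any of the linked articles. The fade curve is the usual 6t^5 - 15t^4 + 10t^3 from Perlin's improved noise.

```python
import math
import random

# The 4 possible gradient vectors in 2D, as listed in the steps above.
GRADIENTS = [(1, 1), (-1, 1), (-1, -1), (1, -1)]


def make_permutation(seed=0):
    """Shuffled 0..255 table, doubled so nested lookups never go out of range."""
    rng = random.Random(seed)
    table = list(range(256))
    rng.shuffle(table)
    return table + table


def fade(t):
    """Smoothing curve 6t^5 - 15t^4 + 10t^3."""
    return t * t * t * (t * (t * 6 - 15) + 10)


def lerp(a, b, t):
    return a + t * (b - a)


def corner_value(perm, ix, iy, dx, dy):
    """Dot product of the corner's pseudo-random gradient with (dx, dy)."""
    gx, gy = GRADIENTS[perm[perm[ix & 255] + (iy & 255)] & 3]
    return gx * dx + gy * dy


def perlin2d(perm, x, y):
    # Step 1: corners of the unit square containing (x, y).
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    # Step 2: one value per corner (gradient dot offset-from-corner).
    n00 = corner_value(perm, x0, y0, fx, fy)
    n10 = corner_value(perm, x0 + 1, y0, fx - 1, fy)
    n01 = corner_value(perm, x0, y0 + 1, fx, fy - 1)
    n11 = corner_value(perm, x0 + 1, y0 + 1, fx - 1, fy - 1)
    # Step 3: smooth interpolation between the 4 values.
    u, v = fade(fx), fade(fy)
    return lerp(lerp(n00, n10, u), lerp(n01, n11, u), v)


perm = make_permutation(seed=1)
value = perlin2d(perm, 1.5, 2.5)
```

Two implementations built like this give the same output only if they share the same permutation table and the same mapping from table entries to gradients.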
In article #1, the grad function returns the dot product directly, whereas in article #2, vector objects are created and a dot product function is called to make it explicit what is being done (this will probably be a bit slower than the other implementations since a lot of vector objects are created and used briefly each time you want to run the algorithm).
Whether 2 implementations will produce the same terrain / height maps depends on whether they generate the same random values for each corner of the square/cube (the results of the dot products). If 2 algorithms generate the same random values for every single integer point on the grid (all the corners of all the possible squares/cubes), then they will produce the same results. Ken Perlin's original implementation and the 3 articles all use an array of integers to generate a random gradient vector for each corner (out of 4 possible choices) to calculate the dot product. So in theory, if the arrays are identical, they should produce the same results. (Unless maybe some implementation uses another method to generate the random vectors.)
I'm not really sure if that answers your question, so don't hesitate to ask something else :)
Edit:
Generally, you would not use Perlin noise alone. So for every final value you want (for example a single pixel in a height map texture), you would call the noise function multiple times (octaves). For example:
float finalValue = 0.0f;
float amplitude = 1.0f;
float frequency = 1.0f;
int octaveCount = 8;
for (int octave = 0; octave < octaveCount; ++octave) {
    finalValue += amplitude * noise(x * frequency, y * frequency, z * frequency);
    amplitude *= 0.5f;
    frequency *= 2.0f;
}
// Do something fun with 'finalValue'
Frequency, amplitude and the number of octaves are the most common parameters you can play with to produce different values.
If, say, you are generating a terrain, you would want many octaves. The first one produces the rough shape of the mountains, so you want a high amplitude (1.0 in the example code) and a low frequency (also 1.0 in the example code). On its own, this octave would give really smooth terrain with no detail. For the small details you add more octaves with higher frequencies (so that over the same range of inputs (x, y, z) the Perlin noise value goes up and down many more times) and lower amplitudes. The amplitudes must shrink because you want small details: if every octave kept the amplitude of the first one (1.0 in the example code), you would get many tall ups and downs very close together, which makes for really rough mountains (imagine 100-meter, 80-degree drops and slopes every few meters you walk).
You can play with those parameters to get different results. There is also something called "domain warping" or "warped noise" that you can look up. Basically, you call a noise function as the input of a noise function. Like instead of calling:
float result = noise(x, y, z);
You would call something like:
// The numbers used are arbitrary values, you can just play around until you get something cool
float result = noise(noise(x * 1.7), 0.5 * noise(y * 4.1), noise(z * 2.3));
This can produce really interesting results
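Here is a small Python sketch of one common way to write domain warping, where the noise is used to offset the input coordinates instead of replacing them. Everything in it is an assumption for illustration: the `noise` function is just a smooth stand-in so the snippet runs on its own (swap in your real Perlin noise), and the offsets and `strength` value are arbitrary.

```python
import math


def noise(x, y, z):
    """Stand-in for a real noise function: smooth, deterministic, in [-1, 1]."""
    return (math.sin(1.3 * x + 0.7 * math.sin(2.1 * y))
            * math.cos(0.9 * z + 0.5 * math.sin(1.7 * x)))


def warped_noise(x, y, z, strength=4.0):
    """Sample the noise at a position that was itself displaced by noise."""
    # Arbitrary offsets so each axis gets a different displacement.
    dx = noise(x + 5.2, y + 1.3, z + 2.8)
    dy = noise(x + 9.1, y + 7.7, z + 3.4)
    dz = noise(x + 2.6, y + 8.3, z + 6.1)
    return noise(x + strength * dx, y + strength * dy, z + strength * dz)


plain = noise(0.4, 1.2, 3.0)
warped = warped_noise(0.4, 1.2, 3.0)
```

With `strength` set to 0 you get the plain noise back, so it is an easy parameter to play with.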

Handle "Division by Zero" in Image Processing (or PRNU estimation)

I have the following equation, which I try to implement. The upcoming question is not necessarily about this equation, but more generally, on how to deal with divisions by zero in image processing:
Here, I is an image, W is the difference between the image and its denoised version (so W expresses the noise in the image), and K is an estimated fingerprint, obtained from d images of the same camera. All calculations are done pixel-wise, so the equation does not involve a matrix multiplication. For more on the idea of estimating digital fingerprints, consult the corresponding literature, such as the general Wikipedia article or scientific papers.
However my problem arises when an Image has a pixel with value Zero, e.g. perfect black (let's say we only have one image, k=1, so the Zero gets not overwritten by the pixel value of the next image by chance, if the next pixelvalue is unequal Zero). Then I have a division by zero, which apparently is not defined.
How can I overcome this problem? One option I came up with was adding +1 to all pixels right before I even start the calculations. However this shifts the range of pixel values from [0|255] to [1|256], which then makes it impossible to work with data type uint8.
Other authors in papers I read on this topic often do not consider values close to the range borders. For example, they only calculate the equation for pixel values in [5|250]. Their reasoning is not the numerical problem; rather, they say that if an image area is totally saturated or totally black, the fingerprint cannot be estimated properly in that area anyway.
But again, my main concern is not about how this algorithm performs best, but rather in general: How to deal with divisions by 0 in image processing?
One solution is to use subtraction instead of division; however, subtraction is not scale invariant; it is translation invariant.
[E.g. the ratio will always be a normalized value between 0 and 1, and if it exceeds 1 you can invert it. You can get the same normalization with subtraction, but you need to find the maximum values attained by the variables.]
Eventually you will have to deal with division. Dividing a black image by itself is a legitimate case: you can translate the values to some other range and then transform back.
However, 5/8 is not the same as 55/58, so you can only take this in a relative sense. If you want to know the exact ratios, you had better stick with the original interval and handle those pixels as special cases, e.g. if denom==0, do something with it; if num==0 and denom==0, you have 0/0, which means we have an identity: it is exactly as if we had 1/1.
In PRNU and fingerprint estimation, if you check the MATLAB implementation on Jessica Fridrich's webpage, they basically create a mask to get rid of saturated and low-intensity pixels, as you mentioned. Then they convert the image matrix with single(I), which makes the image 32-bit floating point, add 1 to the image, and divide.
To your general question: in image processing, I like to create a mask and add one only to the zero-valued pixels.
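img = imread('my gray img');
a_mat = rand(size(img));
mask = uint8(img == 0);             % 1 where the pixel is zero, 0 elsewhere
div = a_mat ./ double(img + mask);  % element-wise division; zeros were bumped to 1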
This will prevent division by zero error. (Not tested but it should work)

Programming Dot Probe for Psychopy in Builder

I am new to using PsychoPy and I have programmed a few simple tasks. I am currently really struggling to program a word dot probe. I do not want to use coder, simply because the rest of my research team need to be able to easily edit the program, and work and use it.
In case anyone is wondering what my specific problem is, I cannot seem to get the pictures to load at the same time correctly and do not know how to get a probe to appear behind one of the pictures once the pictures have disappeared.
Timing
The timing issue can be solved by inserting an ISI period at the beginning of the trial, e.g. during a fixation cross. This allows PsychoPy to load the images in the background so that they are ready for presentation.
Truly random dot position
In your case, you want the dot position to be random, independently of image. This is one of the cases that TrialHandler does not handle and I suspect you need to insert a code component to make this work. For true randomness but only 50% probability in the limit of infinite trials, simply put this in a code component under "begin routine":
x = (np.random.binomial(1, prob) - 0.5) * xdist
y = 0
dot.pos = [x, y]
and change dot to the name of your dot stimulus, y is the vertical offset, x is the horizontal offset (here varying between trials), xdist is the distance between the dot positions, and prob is the chance of the dot appearing to the right. You probably want to set this to 0.5, i.e. 50 %.
Balanced dot position
If you want the dot to appear at each side exactly the same number of times, you can do the following in the code component:
Under "begin experiment", make a list with the exact length of the number of trials:
dotPos = [0, 1] * ((numberOfTrials + 1) // 2) # create enough left/right entries (coded as 0 and 1), rounding up so an odd trial count never runs out. [0,1] yields 50%. [0,0,0,1] with a divisor of 4 would yield 25 % etc.
np.random.shuffle(dotPos) # randomize order
Then under "begin routine" do something akin to what we did above:
x = (dotPos.pop() - 0.5) * xdist # dotPos.pop() returns the last element while removing it from the list.
y = 0
dot.pos = [x, y]
Naturally, if the number of trials is uneven, one position will be occupied one more time than the other.
Two dot positions for each condition
For the record, if the dot is to be shown at each position for each image combination, simply count each of these situations as a condition, i.e. give each one a separate row in the conditions file.

Calculus? Need help solving for a time-dependent variable given some other variables

Long story short, I'm making a platform game. I'm not old enough to have taken Calculus yet, so I know not of derivatives or integrals, but I know of them. The desired behavior is for my character to automagically jump when there is a block to either side of him that is above the one he's standing on; for instance, stairs. This way the player can just hold left / right to climb stairs, instead of having to spam the jump key too.
The issue is with the way I've implemented jumping; I've decided to go mario-style, and allow the player to hold 'jump' longer to jump higher. To do so, I have a 'jump' variable which is added to the player's Y velocity. The jump variable increases to a set value when the 'jump' key is pressed, and decreases very quickly once the 'jump' key is released, but decreases less quickly so long as you hold the 'jump' key down, thus providing continuous acceleration up as long as you hold 'jump.' This also makes for a nice, flowing jump, rather than a visually jarring, abrupt acceleration.
So, in order to account for variable stair height, I want to be able to calculate exactly what value the 'jump' variable should get in order to jump exactly to the height of the stair; preferably no more, no less, though slightly more is permissible. This way the character can jump up steep or shallow flights of stairs without it looking weird or being slow.
There are essentially 5 variables in play:
h - the height the character needs to jump to reach the stair top
j - the jump acceleration variable
v - the vertical velocity of the character
p - the vertical position of the character
d - initial vertical position of the player minus final position
Each timestep:
j -= 1.5; //the jump variable's deceleration
v -= j; //the jump value's influence on vertical speed
v *= 0.95; //friction on the vertical speed
v += 1; //gravity
p += v; //add the vertical speed to the vertical position
v-initial is known to be zero
v-final is known to be zero
p-initial is known
p-final is known
d is known to be p-initial minus p-final
j-final is known to be zero
j-initial is unknown
Given all of these facts, how can I make an equation that will solve for j?
tl;dr How do I Calculus?
Much thanks to anyone who's made it this far and decides to plow through this problem.
Edit: Here's a graph I made of an example in Excel.
I want an equation that will let me find a value for A given a desired value for B.
Since the jump variable decreases over time, the position value isn't just a simple parabola.
There are two difficulties in play here. The first is that you don't actually have j -= 1.5, you have j = max(0, j - 1.5). That throws somewhat of a wrench into calculations. Also, your friction term v *= 0.95 makes direct solution difficult.
I would suggest using a lookup table for this. You can precalculate the desired a for each possible b, by trial and error (e.g. binary search on the values of a that give you the required b). Store the results in a table and just do a simple table lookup during the game.
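As a rough sketch of the lookup-table idea, here is how the precalculation could look in Python. It simulates the timestep from the question (with j clamped at zero) and binary-searches the initial jump value for each target height. The function names, the search tolerance, and the list of stair heights are made up for illustration, and it assumes y grows downward, so the height reached is the most negative position.

```python
def peak_height(j0, decel=1.5, friction=0.95, gravity=1.0, max_steps=10_000):
    """Simulate one jump and return the greatest height reached above the start."""
    j, v, p = j0, 0.0, 0.0
    best = 0.0
    for _ in range(max_steps):
        j = max(0.0, j - decel)   # jump variable decays, clamped at zero
        v -= j                    # jump pushes the character up (negative y)
        v *= friction
        v += gravity
        p += v
        best = max(best, -p)
        if j == 0.0 and v >= 0.0:
            break                 # no thrust left and falling: the peak has passed
    return best


def jump_for_height(h, tol=1e-6):
    """Smallest initial jump value (within tol) whose peak reaches height h."""
    lo, hi = 0.0, 1.0
    while peak_height(hi) < h:    # grow the upper bound until it is high enough
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if peak_height(mid) >= h:
            hi = mid
        else:
            lo = mid
    return hi


# Precompute once (hypothetical stair heights), then just look up during the game.
JUMP_TABLE = {h: jump_for_height(h) for h in (16, 32, 48, 64)}
```

The binary search works because a larger initial jump value never produces a lower peak in this update rule.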
After extensive use of Excel 2010 and its Goal Seek function, I was able to make a table of values, and Excel gave me an approximate trendline and equation for it, which I tweaked until it worked out. The equation is j = 3.35 * h ^ 0.196, where j is the initial jump force and h is the height required to jump. Thanks for your help.
If I neglect the friction term, and assume that j reaches zero before v reaches zero, I get after a page of calculations that:
b = 1/(8*(deceleration^2)*gravity)*j0^4 - 1/(6*deceleration^2)*j0^3
the solution to this is quite long, but equal approximately (for 10 < b < 400) to:
j0 = (10*(deceleration^2)*gravity*b)^0.25

How to randomly but evenly distribute nodes on a plane

I need to place 1 to 100 nodes (actually 25px dots) on a html5 canvas. I need to make them look randomly distributed so using some kind of grid is out. I also need to ensure these dots are not touching or overlapping. I would also like to not have big blank areas. Can someone tell me what this kind of algorithm is called? A reference to an open source project that does this would also be appreciated.
Thanks all
Guido
What you are looking for is called a Poisson-disc distribution. It occurs in nature in the distribution of photoreceptor cells on your retina. There is a great article about this by Mike Bostock (StackOverflow profile) called Visualizing Algorithms. It has JavaScript demos and a lot of code to look at.
In the interest of doing more than dropping a link into the answer, I will try to give a brief summary of the article:
Mitchell's best-candidate algorithm
A simple approximation is known as Mitchell’s best-candidate algorithm. It is easy to implement, but it both crowds some areas and leaves gaps in others. The algorithm adds new points one at a time. For each new sample, it generates a fixed number of random candidates, say 10. The candidate farthest from its nearest existing point is added to the set, and the process is repeated until the desired density is achieved.
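A minimal Python sketch of the best-candidate idea described above, assuming a rectangular area and a brute-force distance check. The function name and parameters are mine, not from the article, and note that nothing here guarantees a minimum spacing, so dots can still end up touching.

```python
import math
import random


def best_candidate_points(n, width, height, candidates=10, seed=None):
    """Place n points (n >= 1), each time keeping the best of a few random candidates."""
    rng = random.Random(seed)
    points = [(rng.random() * width, rng.random() * height)]
    while len(points) < n:
        best, best_dist = None, -1.0
        for _ in range(candidates):
            cand = (rng.random() * width, rng.random() * height)
            # Distance from this candidate to its nearest existing point.
            nearest = min(math.hypot(cand[0] - px, cand[1] - py)
                          for px, py in points)
            if nearest > best_dist:
                best, best_dist = cand, nearest
        points.append(best)
    return points


dots = best_candidate_points(100, 800, 600, seed=3)
```

For 100 points the brute-force nearest-point search is cheap; for many more points you would want a spatial index.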
Bridson's Algorithm
Bridson’s algorithm for Poisson-disc sampling (original paper pdf) scales linearly and is easy to implement as well. This algorithm grows from an initial point and (IMHO) is quite fun to watch (again see Mike Bostock's article). All points in the set are either active or inactive, and every point is added as active. One point is chosen from the active set and some number of candidate points are generated in the annulus (a.k.a. ring) around it, with the inner circle having a radius r and the outer circle having a radius 2r. Candidates less than r away from any point in the FinalSet are rejected. Once a candidate is found that is not rejected, it is added to the FinalSet. If all the candidates are rejected, the original point is marked as inactive, on the assumption that it has so many neighboring points that no more can be added around it. When all samples are inactive, the algorithm terminates.
A grid with cell size r/√2 can be used to greatly increase the speed of checking candidate points. At most one point can ever be in a grid square, and only a limited number of adjacent squares need to be checked.
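To make the description above more concrete, here is a Python sketch of Bridson-style sampling with the r/√2 grid. Treat it as an illustration under my own assumptions rather than a faithful copy of the paper or the article: the names are mine, `k=30` candidates per active point is just a commonly used default, and the candidate distance is drawn uniformly between r and 2r, which is a simplification rather than a uniform pick over the ring's area.

```python
import math
import random


def poisson_disc(width, height, r, k=30, seed=None):
    """Return points in [0, width) x [0, height) that are all at least r apart."""
    rng = random.Random(seed)
    cell = r / math.sqrt(2)                 # at most one point fits per cell
    cols = int(math.ceil(width / cell))
    rows = int(math.ceil(height / cell))
    grid = [[None] * cols for _ in range(rows)]
    samples, active = [], []

    def cell_of(p):
        return (min(int(p[1] / cell), rows - 1), min(int(p[0] / cell), cols - 1))

    def fits(p):
        if not (0 <= p[0] < width and 0 <= p[1] < height):
            return False
        gr, gc = cell_of(p)
        # Only cells within 2 steps can hold a point closer than r.
        for rr in range(max(gr - 2, 0), min(gr + 3, rows)):
            for cc in range(max(gc - 2, 0), min(gc + 3, cols)):
                q = grid[rr][cc]
                if q is not None and math.hypot(p[0] - q[0], p[1] - q[1]) < r:
                    return False
        return True

    def add(p):
        gr, gc = cell_of(p)
        grid[gr][gc] = p
        samples.append(p)
        active.append(p)

    add((rng.random() * width, rng.random() * height))
    while active:
        i = rng.randrange(len(active))
        base = active[i]
        for _ in range(k):
            angle = rng.uniform(0, 2 * math.pi)
            dist = rng.uniform(r, 2 * r)    # somewhere in the ring r..2r
            cand = (base[0] + dist * math.cos(angle),
                    base[1] + dist * math.sin(angle))
            if fits(cand):
                add(cand)
                break
        else:
            active.pop(i)                   # every candidate was rejected
    return samples


nodes = poisson_disc(400, 300, 25, seed=1)
```

For 25px dots that must not touch, r should be at least the dot diameter, and you would simply stop early (or subsample) once you have the number of nodes you need.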
The easiest way would be to just generate random (x, y) coordinates for each one, repeating if they are touching or overlapping.
Pseudocode:
do N times
{
start:
    x = rand(0, width)
    y = rand(0, height)
    for each other point, p
        if distance(p.x, p.y, x, y) < radius * 2
            goto start
    add_point(x, y);
}
This is O(n^2), but if n is only going to be 100 then that's fine.
I don't know if this is a named algorithm, but it sounds like you could assign each node a position on a “grid”, then pick a random offset. That would give the appearance of some chaos while still guaranteeing that there are no big empty spaces.
For example:
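// COLS (dots per grid row) and CELL_SIZE (grid spacing in px) are illustrative names
node.x = (node.number % COLS) * CELL_SIZE + (Math.random() - 0.5) * SOME_SCALE;
node.y = Math.floor(node.number / COLS) * CELL_SIZE + (Math.random() - 0.5) * SOME_SCALE;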
Maybe you could use a grid of circles and place one 25px-dot in every circle? Wouldn't really be random, but look good.
Or you could place dots randomly and then make empty areas attract dots and give dots a limited-range-repulsion, but that is maybe too complicated and takes too much CPU time for this simple task.
