Is there a kd-tree with a find that returns the nearest point in two dimensions, subject to a range limit on a third dimension? - recursion

Given a kd-tree defined over 3 dimensions, how do you find the Pythagorean (Euclidean) nearest point in just 2 of those dimensions, subject to the constraint that dimension 3 lies within a certain range?
I actually wrote it and it works. I'm asking here to see whether other implementations exist that might be better.
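For illustration, here is a minimal sketch of the kind of search described (my own, not the asker's implementation; all names are hypothetical). It assumes a hand-rolled 3-D kd-tree: distance is measured only in x and y, while z must fall inside [z_lo, z_hi], and the z bounds are also used to prune the subtrees of z-split nodes.
```python
import math

class Node:
    def __init__(self, point, axis, left=None, right=None):
        self.point = point   # (x, y, z)
        self.axis = axis     # splitting axis: 0, 1 or 2
        self.left = left
        self.right = right

def build(points, depth=0):
    """Build a 3-D kd-tree, cycling the splitting axis with depth."""
    if not points:
        return None
    axis = depth % 3
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))

def nearest_xy(node, qx, qy, z_lo, z_hi, best=None):
    """Nearest point to (qx, qy) in x/y distance, with z inside [z_lo, z_hi]."""
    if node is None:
        return best
    px, py, pz = node.point
    if z_lo <= pz <= z_hi:                       # candidate must satisfy the z range
        d = math.hypot(px - qx, py - qy)
        if best is None or d < best[1]:
            best = (node.point, d)
    if node.axis == 2:
        # Splitting on z: visit a child only if its half-space can still
        # intersect the allowed z range (no x/y pruning is possible here).
        if z_lo <= pz:
            best = nearest_xy(node.left, qx, qy, z_lo, z_hi, best)
        if z_hi >= pz:
            best = nearest_xy(node.right, qx, qy, z_lo, z_hi, best)
        return best
    # Splitting on x or y: usual nearest-neighbour descent with plane pruning.
    q = qx if node.axis == 0 else qy
    diff = q - node.point[node.axis]
    near, far = (node.left, node.right) if diff <= 0 else (node.right, node.left)
    best = nearest_xy(near, qx, qy, z_lo, z_hi, best)
    if best is None or abs(diff) < best[1]:      # cross the plane only if it may hide a closer point
        best = nearest_xy(far, qx, qy, z_lo, z_hi, best)
    return best

pts = [(2, 3, 10), (5, 4, 2), (9, 6, 7), (4, 7, 3), (8, 1, 5), (7, 2, 8)]
tree = build(pts)
print(nearest_xy(tree, 6, 3, z_lo=4, z_hi=9))    # ((7, 2, 8), 1.414...)
```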

Related

Calculating a list of valid combinations

So I came up with a problem that I can’t logically solve. I’ve fleshed out an example:
You have a list of numbers that you need to output (e.g. 3x number 1 & 2x number 2) - we’ll call them ‘target numbers’.
You’re also given a range of source numbers (e.g. 2, 3, 4 and 5).
The task is to return all valid combinations of the source numbers that would allow you to produce the target numbers. You can use any combination and quantity of source numbers.
The constraints are that you can break a source number down to get to target numbers (e.g. you could break down a 5 into a 2 and a 3) but you cannot add source numbers together to get to a target number (for example you can’t add a 1 to a 1 to get to a 2).
Remainders are perfectly acceptable (e.g. using a source 3 to get to a target 2 and the remaining 1 is part of the combination but not ‘used’ in getting to the target).
In the interests of limiting results, you'd also want a constraint that an acceptable combination does not contain any ‘totally unused’ source numbers (i.e. numbers that are neither split nor counted as a target number in the result).
So for the example target & source numbers given, the following results would be valid:
[1,1,1,2,2],[3,4],[3,3,1],[4,4]
But a [1,1,1,1,1,2] would not be valid, because you cannot join two source 1’s together to make a target 2.
I’ve thought about this logic and largely think the solution involves some level of recursion, but the only examples I’ve seen are ones where the constraints are reversed (i.e. you can add source numbers together to reach a target number, but cannot split them to reach one).
What kind of logic would you use to generate all valid permutations in code?
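For what it's worth, here is one possible recursive sketch (my own, not from the thread; the function and variable names are hypothetical). It treats every chosen source number as a bin whose capacity is that source: each target is placed in some bin (the bin's contents are the pieces the source is broken into, and leftover capacity is the unused remainder), and since a bin only ever exists because a target was placed in it, no source in a result is totally unused.
```python
def valid_combinations(targets, sources):
    targets = sorted(targets, reverse=True)   # place large targets first
    results = set()

    def place(i, bins):
        # bins is a list of [source_value, remaining_capacity]
        if i == len(targets):
            results.add(tuple(sorted(b[0] for b in bins)))
            return
        t = targets[i]
        # Option 1: break an already-chosen source further to cover t.
        for b in bins:
            if t <= b[1]:
                b[1] -= t
                place(i + 1, bins)
                b[1] += t
        # Option 2: bring in a new source big enough to cover t.
        for s in set(sources):
            if s >= t:
                bins.append([s, s - t])
                place(i + 1, bins)
                bins.pop()

    place(0, [])
    return sorted(results)

# All source multisets that can be split to cover the targets 1,1,1,2,2:
print(valid_combinations([1, 1, 1, 2, 2], [2, 3, 4, 5]))
```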

How is RSME calculated between point clouds?

RSME measures how close a predicted value is to the actual value, but with a point cloud there are 2 things I am confused about:
How do we know which point corresponds to which point, so that they can be subtracted from each other?
Point clouds are 3-dimensional since they have xyz values, but how do people turn those 3 values into one RSME value?
First of all, it's RMSE, not RSME. It stands for Root Mean Square Error:
https://en.wikipedia.org/wiki/Root-mean-square_deviation
With 3D coordinates you can compare component-wise, or use whatever other distance measure you choose to define. You then plug this into the RMSE formula. Essentially this means comparing an expected value to your observed value.
As for the point correspondence - this depends on the algorithm of choice. Probably one of the most famous examples is ICP:
https://de.wikipedia.org/wiki/Iterative_Closest_Point_Algorithm
In a nutshell: for every point in one cloud, the closest point in the other cloud is determined. An error measure is then calculated, and finally the points are transformed. This is repeated as many times as needed, depending on the desired precision.
Since I strongly suspect that you are indeed looking for ICP, here is a description of how these steps fit together:
https://en.wikipedia.org/wiki/Iterative_closest_point
Other than that you will have to do some reading yourself.
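As a concrete sketch of how the three coordinates collapse into a single RMSE value (my own illustration, not from the answer; the function name is hypothetical), assuming correspondences are taken as nearest neighbours, i.e. the matching step ICP performs on each iteration:
```python
import numpy as np

def rmse_between_clouds(cloud_a, cloud_b):
    """cloud_a, cloud_b: (N, 3) and (M, 3) arrays of xyz points."""
    # For every point in cloud_a, find its nearest neighbour in cloud_b.
    diffs = cloud_a[:, None, :] - cloud_b[None, :, :]      # (N, M, 3)
    dists = np.linalg.norm(diffs, axis=2)                  # (N, M) Euclidean distances
    nearest = dists.min(axis=1)                            # (N,) per-point error
    # Each per-point error is a scalar, so the usual RMSE formula applies.
    return np.sqrt(np.mean(nearest ** 2))

a = np.random.rand(100, 3)
b = a + np.random.normal(scale=0.01, size=a.shape)   # noisy copy of a
print(rmse_between_clouds(a, b))
```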

Setting the "tpow" and "expcost" arguments in TraMineR::seqdist

I'm currently working on the pathways of inpatients during their hospital stay. These pathways are represented as state sequences (the current medical unit at each time unit) and I'm trying to find typical pathways through clustering algorithms.
I create the distance matrix by using the seqdist function from the R package TraMineR, with the method "OMspell". I've already read the R documentation and the related articles, but I can't find how to set the arguments tpow and expcost.
As the time unit is an hour, I don't want small differences in duration to have a big impact on the clustering result (unlike a transfer between medical units, for example). But I don't want duration to have no impact at all either...
Also, is there a proper way to choose their values, or do I just keep feeling my way toward a good configuration? (I'm using the Dunn, Davies-Bouldin and Silhouette criteria to compare the results of hierarchical clustering, in addition to medical opinion on the resulting clusters.)
The parameter tpow is an exponential coefficient applied to transform the actual spell lengths (durations). The default value is 1, for which the spell lengths are taken as they are. With tpow=0 you would simply ignore spell durations, and with tpow=0.5 you would consider the square root of the spell lengths.
The expcost parameter is the expansion cost, i.e. the cost for expanding a (transformed) spell length by one unit. In other words, when in the editing of one sequence into the other a spell of length t1 has to be expanded to length t2, it costs expcost * |t2^tpow - t1^tpow|. With expcost=0, spells in the same state (e.g. AA and AAAAA) would be equivalent whatever their lengths.
With tpow=.5, for example, increasing the spell length from 1 to 2 costs more than increasing a spell length from 3 to 4. If you do not want to give too much importance to small differences in spell lengths, use a low expcost. However, note that expcost applies to the transformed spell lengths, so you may want to adjust it when you change the tpow value.
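As a tiny stand-alone illustration of that formula (plain Python, not TraMineR; the helper name and the example values are just for illustration):
```python
# Expanding a spell from length t1 to t2 costs expcost * |t2**tpow - t1**tpow|.
def expansion_cost(t1, t2, tpow=1.0, expcost=0.5):
    return expcost * abs(t2 ** tpow - t1 ** tpow)

# With tpow = 0.5 the same one-hour difference matters less for long spells:
print(expansion_cost(1, 2, tpow=0.5, expcost=0.5))   # ~0.207
print(expansion_cost(3, 4, tpow=0.5, expcost=0.5))   # ~0.134
```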

Is it possible to represent 'average value' in programming?

Had a tough time thinking of an appropriate title, but I'm just trying to code something that can auto compute the following simple math problem:
The average value of a,b,c is 25. The average value of b,c is 23. What is the value of 'a'?
For us humans we can easily compute that the value of 'a' is 29, without the need to know b and c. But I'm not sure if this is possible in programming, where we code a function that takes in the average values of 'a,b,c' and 'b,c' and outputs 'a' automatically.
Yes, it is possible to do this. The reason for this is that you can model the sort of problem being described here as a system of linear equations. For example, when you say that the average of a, b, and c is 25, then you're saying that
a / 3 + b / 3 + c / 3 = 25.
Adding in the constraint that the average of b and c is 23 gives the equation
b / 2 + c / 2 = 23.
More generally, any constraint of the form "the average of the variables x1, x2, ..., xn is M" can be written as
x1 / n + x2 / n + ... + xn / n = M.
Once you have all of these constraints written out, solving for the value of a particular variable - or determining that many solutions exist - reduces to solving a system of linear equations. There are a number of techniques for doing this, with Gaussian elimination with back-substitution being a particularly common one (though often you'd just hand the system to MATLAB or a linear algebra package and have it do the work for you).
There's no guarantee in general that, given an arbitrary collection of equations, a computer can determine whether they have a solution or deduce the value of a variable, but this happens to be one of the nice cases where the shape of the constraints (they are all linear) makes the problem amenable to exact solutions.
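As a minimal sketch of that linear-system view (my own illustration, simply handing the system to numpy's least-squares solver rather than writing out the elimination):
```python
import numpy as np

# a/3 + b/3 + c/3 = 25   and   b/2 + c/2 = 23
A = np.array([[1/3, 1/3, 1/3],
              [0.0, 1/2, 1/2]])
m = np.array([25.0, 23.0])

# lstsq copes with the system being underdetermined:
# b and c are not pinned down individually, but a is.
solution, *_ = np.linalg.lstsq(A, m, rcond=None)
a, b, c = solution
print(round(a, 6))   # 29.0 -- uniquely determined by the two constraints
```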
Alright, I have figured some things out. To answer the title question directly: yes, it is possible to represent an average value in programming. One possible way is to create a list of map data structures which store the variable set as the key (e.g. "a,b,c") and the average value of that set as the value (e.g. 25).
Extract the key, split its string by comma and store the parts in a list, then multiply the average value by the size of the list to get the total (e.g. 25x3 and 23x2). With this, no semantic information is lost.
As for the context in which I asked this question, a more precise description of the problem is "Given a set of average values of different combinations of variables, is it possible to find the value of each variable?" The answer to this is open. I can't figure it out, but below is an attempt at describing the logic flow if one were to code it out:
Match the lists (from Paragraph 2) against one another in all possible combinations to check whether one list contains all elements of another list. If so, subtract the lists (e.g. abc-bc) as well as the values (e.g. 75-46). If after subtracting only one variable is left in the collection, then we have found the value of that variable.
If there is still more than one variable left, such as abcd - bc = ad, then store the values as a map data structure and repeat the process, until the subtraction count in a full iteration is 0 for all possible combinations (e.g. ac can't subtract bc). This is unfortunately not where it ends.
Further solutions may be found by combining the lists (e.g. ac + bd = abcd) to get more possible ways to subtract and arrive at the answer. When this is the case, you just don't know when to stop trying, and the list of combinations grows exponentially. Maybe someone with strong related mathematical theory can prove that after a certain number of iterations further additions are useless and hence the search should stop. Heck, it may even be possible that negative values are also helpful, which would contradict what I said earlier about 'ac' not being able to subtract 'bd' (to get a,c,-b,-d). This would give even more combinations to compute.
People with stronger computing science foundations may try what templatetypedef has suggested.
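A rough sketch of the subtraction procedure described above (hypothetical code, not from the post; it only implements the subtraction step, not the later list-combining step):
```python
# Averages are stored as {frozenset of variable names: total}; whenever one
# set properly contains another, subtract to try to isolate single variables.
def solve_by_subtraction(averages):
    # averages: dict like {"a,b,c": 25, "b,c": 23}
    totals = {frozenset(k.split(",")): v * len(k.split(","))
              for k, v in averages.items()}
    known = {}
    changed = True
    while changed:
        changed = False
        items = list(totals.items())
        for vars_a, tot_a in items:
            for vars_b, tot_b in items:
                if vars_b < vars_a:                   # proper subset, e.g. abc - bc
                    diff_vars = vars_a - vars_b
                    diff_tot = tot_a - tot_b
                    if diff_vars not in totals:
                        totals[diff_vars] = diff_tot
                        changed = True
                    if len(diff_vars) == 1:
                        known[next(iter(diff_vars))] = diff_tot
    return known

print(solve_by_subtraction({"a,b,c": 25, "b,c": 23}))   # {'a': 29}
```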

Path finding - Merging different cost functions

In my path finding school project, the user is given 3 options to navigate between two points:
Shortest path (kilometers). I've defined the cost function between each pair of points to be the length of the road that connects them.
Fastest path (each road has a speed limit). I've defined the cost function between each pair of points to be 1/(SpeedLimit).
Simplest path (minimizes turns; a turn occurs when the road changes direction by more than alpha degrees). I've defined a state to be a tuple of a point and a direction, and the cost function to be 1 if the change of direction is larger than alpha and 0 otherwise.
The user then supplies 3 real numbers between 0 and 1 to specify the importance of each navigating option.
So basically the cost function should be the sum of the three cost functions described above, each multiplied by the number supplied. My problem is that the cost functions have different units; for example, the first cost function is in kilometers and the third is boolean (0 or 1).
How can I convert them so that it makes sense?
Define a cost function for each criteria that maps from a path to a real number.
f1(path) = cost associated with the distance of the path
f2(path) = cost of the time taken to traverse the path
f3(path) = cost of the complexity of the route
Defining f1 and f2 should be fairly straightforward. f3 is more complex and subjective, but I suspect it really shouldn't be a boolean unless there's some very specific reason why you need it to be. Perhaps the function for path complexity could be something like the sum of the angles (in degrees or radians) of every turn taken along the trip. There are certainly quite a few other choices for such a function that immediately come to mind (for example, the length of the representation required to describe the path). For f3 you will have to choose whichever one suits your purposes best.
Once you have defined the individual cost functions you could get an overall cost for the path by taking a linear combination of those 3 functions:
cost(path) = a1*f1(path) + a2*f2(path) + a3*f3(path)
Finding sensible values for a1, a2, a3 is most of the challenge. There are a few statistical methods you might want to use to do this.
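As a small sketch of such a linear combination, computed per road segment so that a path's cost is the sum over its segments (my own illustration; the reference scales ref_km and ref_hours are assumed values whose only job is to make each term dimensionless before the weights a1..a3 apply, f2 uses travel time, i.e. length divided by speed limit, as suggested above, and f3 keeps the question's original boolean turn definition):
```python
from dataclasses import dataclass

@dataclass
class Edge:
    length_km: float        # length of the road segment
    speed_limit: float      # km/h
    turn_angle_deg: float   # change of direction when entering this segment

def edge_cost(edge, a1, a2, a3, alpha=30.0, ref_km=1.0, ref_hours=1.0 / 60.0):
    f1 = edge.length_km / ref_km                          # distance term
    f2 = (edge.length_km / edge.speed_limit) / ref_hours  # travel-time term
    f3 = 1.0 if edge.turn_angle_deg > alpha else 0.0      # turn-penalty term
    return a1 * f1 + a2 * f2 + a3 * f3

# Example: a 0.5 km segment at 50 km/h with a 45-degree turn,
# with the three user-supplied weights all in [0, 1].
e = Edge(length_km=0.5, speed_limit=50.0, turn_angle_deg=45.0)
print(edge_cost(e, a1=0.6, a2=0.3, a3=0.8))
```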
