ℝ³ -> ℕ mapping for a finite number of values - math

I am looking for an algorithm that is capable of mapping a finite but large number of 3-dimensional positions (about 10^11) to indices (so a mapping ℝ³ -> ℕ).
I know that it's possible and fairly simple to make an ℕ -> ℝ³ mapping, and that's essentially what I want to do, but ℕ -> ℝ³ would be an impractical way of figuring out which indices of ℕ are near a certain position.
Ideally I would also like to ensure that the mapping assigns no index twice, so that my finite set of positions maps to distinct values of ℕ.
Some background on how this would be used, to give a better idea of the constraints and of the problems with some naive solutions:
I'm trying to think of a way to map stars in a galaxy to a unique ID that I can then use as a "seed" for a random number generator. An ℕ -> ℝ³ mapping would require me to iterate over all of ℕ to find the values of ℝ³ that are near a given location, which is obviously not a practical approach.
I've already found some information about the Cantor pairing function and dovetailing, but those cause problems because they mainly apply to ℕⁿ and not ℝⁿ.
It's not guaranteed that my ℝ³ values lie on a grid. If they did, I could map ℝ³ -> ℕ³ by figuring out which "box" each value is in, and then use Cantor's pairing function to figure out which ℕ belongs to that box; but in my situation a box might contain multiple values, or none.
Thanks in advance for any help

You could use a k-d tree to spatially partition your set of points. To map onto a natural number, treat the path through the tree to each point as a string of binary digits, where 0 is the left branch and 1 is the right branch. This might not get you exactly what you're looking for, since some points which are spatially close to each other may lie on different branches, and are therefore numerically distant from each other. However, if two points are close to each other numerically, they will be close to each other spatially.
Alternatively, you could use an octree, in which case you get three bits at a time for each level you descend into the tree. You can completely partition the space so that each region contains at most one point of interest.
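The octree-path idea above has a standard closed form: the Morton (Z-order) code, which interleaves the bits of the three grid coordinates so that the resulting integer is exactly the root-to-leaf path of the octree. A minimal sketch in Python, assuming the positions have already been quantized to non-negative integer grid coordinates (the function names and the 21-bit width are illustrative):

```python
def morton_encode(x, y, z, bits=21):
    """Interleave the bits of three grid coordinates into one integer.

    Each octree level contributes one bit per axis, so the integer
    is the root-to-leaf path of the octree cell containing the point.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (3 * i)
        code |= ((y >> i) & 1) << (3 * i + 1)
        code |= ((z >> i) & 1) << (3 * i + 2)
    return code

def morton_decode(code, bits=21):
    """Invert morton_encode: recover the three grid coordinates."""
    x = y = z = 0
    for i in range(bits):
        x |= ((code >> (3 * i)) & 1) << i
        y |= ((code >> (3 * i + 1)) & 1) << i
        z |= ((code >> (3 * i + 2)) & 1) << i
    return x, y, z
```

Points whose codes share a long prefix lie in the same octree cell, which is the locality property described above.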

Related

What is an active node in a graph and how can it be identified?

I don't understand exactly what an active node is or how I could identify one; there is an example in the image below.
https://i.stack.imgur.com/zJ12z.png
I don't necessarily need a solution, but more of a definition or a good example.
The Push-Relabel Maximum Flow Algorithm has the definition:
For a fixed flow f1, a vertex v ∉ {s, t} is called active if it has positive excess with respect to f1, i.e., xf1(v) > 0.
If that is the algorithm you are using to analyse the graph, then you can use that definition; if you are using a different algorithm, then you should consult the definitions for that algorithm and they will tell you how an "active" vertex is defined.
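To make the definition concrete, here is a small Python sketch of the excess/active test, assuming flows are stored as a dict from directed edges to flow values (this representation is illustrative, not tied to any particular library):

```python
def excess(flow, v):
    """Excess of vertex v under a flow: total inflow minus total outflow.

    `flow` maps directed edges (u, w) to non-negative flow values.
    """
    inflow = sum(f for (u, w), f in flow.items() if w == v)
    outflow = sum(f for (u, w), f in flow.items() if u == v)
    return inflow - outflow

def is_active(flow, v, s, t):
    """A vertex other than source s and sink t is active iff its excess is positive."""
    return v not in (s, t) and excess(flow, v) > 0
```

For example, a vertex that has received 3 units but pushed out only 1 has excess 2 and is therefore active.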

Trying to use ConcatLayer with different shape inputs

I am trying to work with nolearn and use the ConcatLayer to combine multiple inputs. It works great as long as every input has the same type and shape. I have three different types of inputs that will eventually produce a single scalar output value.
The first input is an image of dimensions (288,1001)
The second input is a vector of length 87
The third is a single scalar value
I am using Conv2DLayer(s) on the first input.
The second input utilizes Conv1DLayer or DenseLayer (not sure which would be better since I can't get it far enough to see what happens)
I'm not even sure how the third input should be set up since it is only a single value I want to feed into the network.
The code blows up at the ConcatLayer with:
'Mismatch: input shapes must be the same except in the concatenation axis'
I would be forever grateful if someone could write out a super simple network structure that can take these types of inputs and output a single scalar value. I have been googling all day and simply cannot figure this one out.
The fit function looks like this, if it is helpful to know; as you can see, I am inputting a dictionary with an item for each type of input:
X = {'base_input': X_base, 'header_input': X_headers, 'time_input':X_time}
net.fit(X, y)
It is hard to answer the question properly, because it depends.
Without information on what you are trying to do and what data you are working on, we are playing a guessing game here, and thus I have to fall back to giving general tips.
First, it is totally reasonable that ConcatLayer complains. It just does not make a lot of sense to append a scalar to the pixel values of an image. So you should think about what you actually want, which is most likely to combine the information of the three sources.
You are right to suggest processing the image with 2D convolutions and the sequence data with 1D convolutions. If you want to generate a scalar value, you probably want to use dense layers later on, to condense the information.
So it is natural to leave the low-level processing of the three branches independent and then concatenate them later on.
Something along the lines of:
Image -> conv -> ... -> conv -> dense -> ... -> dense -> imValues
Timeseries -> conv -> ... -> conv -> dense -> ... -> dense -> seriesValues
concatLayer([imValues, seriesValues, scalar]) -> dense -> ... -> dense with num_units=1
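To illustrate why the shapes must agree, here is a plain NumPy sketch (the layer widths are made up); it shows that the dense outputs of the branches can be concatenated, while the raw image and the scalar cannot:

```python
import numpy as np

batch = 4
im_values = np.zeros((batch, 128))     # pretend output of the image branch's last dense layer
series_values = np.zeros((batch, 32))  # pretend output of the time-series branch
scalar = np.zeros((batch, 1))          # the single scalar, shaped as a (batch, 1) column

# All three agree on every axis except the concatenation axis (axis=1),
# so concatenation is well-defined:
merged = np.concatenate([im_values, series_values, scalar], axis=1)
assert merged.shape == (batch, 161)

# Concatenating the raw image with the scalar fails, because the
# non-concatenation axes (height and width) do not match:
image = np.zeros((batch, 1, 288, 1001))
scalar_4d = scalar.reshape(batch, 1, 1, 1)
try:
    np.concatenate([image, scalar_4d], axis=1)
except ValueError:
    print("shapes must match except on the concatenation axis")
```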
Another, less often reasonable, option would be to add the information during the low-level processing of the image. This might make sense if the local processing is much easier given knowledge of the scalar/timeseries.
This architecture might look like:
concatLayer([seriesValues, scalar]) -> dense -> ... -> reshape((-1, N, 1, 1))
-> Upscale2DLayer(Image.shape[2:4]) -> globalInformation
concatLayer([globalInformation, Image]) -> 2D conv with filter size 1 -> conv -> ... -> conv
Note that you will almost certainly want to go with the first option.
One unrelated thing I noticed is the huge size of your input image. You should reduce it (resizing/patches). Unless you have a gigantic load of data and tons of memory and computing power, you will otherwise either overfit or waste hardware.

Why do we need absolute values of V (number of vertices) and E (number of edges) in the upper bounds of various graph algorithms

I have been reading through graph algorithms recently and saw that the notation for the upper bounds of various graph algorithms is of the form O(|V| + |E|), especially in DFS/BFS search algorithms, where linear time is achieved with the above upper bound.
I have seen both notations used interchangeably, i.e. O(V+E) as well. As far as I understand, the "|" bar notation is used for absolute values in the math world. If V = # of vertices and E = # of edges, how can they be negative numbers, such that we need to take the absolute values before computing the linear function? Please help.
|X| refers to the cardinality (size) of X when X is a set.
O(V+E) is technically incorrect, assuming that V and E refer to sets of vertices and edges. This is because the value inside O( ) should be quantitative, rather than abstract sets of objects that have an ambiguous operator applied to them. |V| + |E| is well-defined to be one number plus another, whereas V + E could mean a lot of things.
However, in informal scenarios (e.g. conversing over the internet and in person), many people (including me) still say O(V+E), because the cardinality of the sets is implied. I like to type fast and adding in 4 pipe characters just to be technically correct is unnecessary.
But if you need to be technically correct, i.e. you're in a formal environment, or e.g. you're writing your computer science dissertation, it's best to go with O(|V|+|E|).
In this case, the vertical bars || denote the cardinality or number of elements of a set (i.e. |E| represents the count of elements in the set E).
http://en.wikipedia.org/wiki/Cardinality
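To see where the |V| + |E| bound comes from, here is a small Python sketch of BFS that counts its basic operations: each vertex is dequeued at most once and each adjacency-list entry is scanned at most once (the operation counting is only for illustration):

```python
from collections import deque

def bfs_with_counter(adj, start):
    """BFS over an adjacency-list graph, counting basic operations.

    `adj` maps each vertex to a list of neighbours.  Every vertex is
    dequeued at most once (|V| steps) and every adjacency-list entry is
    scanned at most once (|E| steps for a directed graph), so `ops` is
    bounded by |V| + |E|.
    """
    seen = {start}
    queue = deque([start])
    ops = 0
    while queue:
        v = queue.popleft()
        ops += 1                  # one step per dequeued vertex
        for w in adj[v]:
            ops += 1              # one step per adjacency-list entry scanned
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen, ops
```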

pattern matching

Suppose I have a set of tuples like this (each tuple will have 1,2 or 3 items):
Master Set:
{(A) (A,C) (B,C,E)}
and suppose I have another set of tuples like this:
Real Set: {(BOB) (TOM) (ERIC,SALLY,CHARLIE) (TOM,SALLY) (DANNY) (DANNY,TOM) (SALLY) (SALLY,TOM,ERIC) (BOB,SALLY) }
What I want to do is to extract all subsets of Tuples from the Real Set where the tuple members can be substituted to become the same as the Master Set.
In the example above, two sets would be returned:
{(BOB) (BOB,SALLY) (ERIC,SALLY,CHARLIE)}
(let BOB=A,ERIC=B,SALLY=C,CHARLIE=E)
and
{(DANNY) (DANNY,TOM) (SALLY,TOM,ERIC)}
(let DANNY=A,SALLY=B,TOM=C,ERIC=E)
It's sort of pattern matching, sort of combinatorics, I guess. I really don't know how to classify this problem and what common plans of attack there are for it. What would the stackoverflow experts suggest?
Separate your tuples into sets by size. Within each set, create a data structure that allows you to efficiently query for tuples containing a given element. The first part of this structure is your tuples as an array (so that each tuple has a canonical index). The second part is a map from each element to the set of indices of the tuples containing it (in Haskell notation, Map String (Set Int)). This is somewhat space-intensive, but hopefully not prohibitive.
Then you, essentially, brute-force it. For each assignment of the first master tuple, restrict the possible assignments of the other master tuples. For each remaining assignment of the second, restrict the assignments of the third and beyond, etc. The algorithm is basically inductive.
I should add that I don't think the problem is NP-complete so much as just flat worst-case exponential. It's not a decision problem, but an enumeration problem. And it's fairly easy to imagine inputs that blow up exponentially.
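Here is a minimal Python sketch of the backtracking idea described above (the tuple representation and function names are my own; it omits the indexing structure and simply scans candidates of matching length):

```python
def match(master, real):
    """Enumerate substitutions mapping the master tuples onto real tuples.

    Yields dicts from master symbols to real names.  Brute force with
    backtracking: assign the first master tuple, then recurse on the
    rest under the partial (injective) substitution.
    """
    def extend(sub, pattern, candidate):
        # Try to unify one master tuple with one real tuple of the same length.
        new = dict(sub)
        used = set(new.values())
        for sym, name in zip(pattern, candidate):
            if sym in new:
                if new[sym] != name:
                    return None       # symbol already bound to a different name
            elif name in used:
                return None           # keep the substitution injective
            else:
                new[sym] = name
                used.add(name)
        return new

    def solve(remaining, sub):
        if not remaining:
            yield sub
            return
        pattern, rest = remaining[0], remaining[1:]
        for candidate in real:
            if len(candidate) == len(pattern):
                new = extend(sub, pattern, candidate)
                if new is not None:
                    yield from solve(rest, new)

    yield from solve(list(master), {})
```

Running it on the example above yields, among others, the two substitutions given in the question (BOB=A, ERIC=B, SALLY=C, CHARLIE=E and DANNY=A, SALLY=B, TOM=C, ERIC=E).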
It will be difficult to do efficiently since your problem is probably NP-complete (it includes subgraph isomorphism as a special case). That assumes the patterns and database both vary in size, though. How much data are you searching? How complicated will your patterns be? I would recommend the brute force solution first, then test if that is too slow and you need something fancier.

Find number range intersection

What is the best way to find out whether two number ranges intersect?
My number range is 3023-7430, now I want to test which of the following number ranges intersect with it: <3000, 3000-6000, 6000-8000, 8000-10000, >10000. The answer should be 3000-6000 and 6000-8000.
What's the nice, efficient mathematical way to do this in any programming language?
Just a pseudo-code guess:
Set<Range> determineIntersectedRanges(Range range, Set<Range> setOfRangesToTest)
{
    Set<Range> results;
    foreach (rangeToTest in setOfRangesToTest)
    do
        if (rangeToTest.end < range.start) continue;   // skip this one, it's below our range
        if (rangeToTest.start > range.end) continue;   // skip this one, it's above our range
        results.add(rangeToTest);
    done
    return results;
}
I would make a Range class and give it a method boolean intersects(Range). Then you can do a
foreach(Range r : rangeset) { if (range.intersects(r)) res.add(r) }
or, if you use some Java 8 style functional programming for clarity:
rangeset.stream().filter(range::intersects).collect(Collectors.toSet())
The intersection itself is something like
this.start <= other.end && this.end >= other.start
This heavily depends on your ranges. A range can be big or small, and clustered or not clustered. If you have large, clustered ranges (think of "all positive 32-bit integers that can be divided by 2"), the simple approach with Range(lower, upper) will not succeed.
I guess I can say the following:
if you have small ranges (clustering or not clustering does not matter here), consider bitvectors. These little critters are blazing fast with respect to union, intersection and membership testing, even though iteration over all elements might take a while, depending on the size. Furthermore, because they just use a single bit per element, they are pretty small, unless you throw huge ranges at them.
if you have fewer, larger ranges, then a class Range as described by others will suffice. This class has the attributes lower and upper, and two ranges a and b are disjoint exactly when b.upper < a.lower or a.upper < b.lower. Union and intersection can be implemented in constant time for single ranges; for composite ranges, the time grows with the number of sub-ranges (thus you do not want too many little ranges).
If you have a huge space where your numbers can lie, and the ranges are distributed in a nasty fashion, you should take a look at binary decision diagrams (BDDs). These nifty diagrams have two terminal nodes, True and False, and decision nodes for each bit of the input. A decision node looks at one bit and has two successor nodes -- one for "bit is one" and one for "bit is zero". Given these conditions, you can encode large ranges in tiny space. For example, the set of all even numbers, for arbitrarily large integers, can be encoded in 3 nodes: a single decision node for the least significant bit, which goes to False on 1 and to True on 0.
Intersection and union are pretty elegant recursive algorithms. For example, the intersection basically takes two corresponding nodes in each BDD, traverses the 1-edge until some result pops up, and checks: if one of the results is the False terminal, create a 1-branch to the False terminal in the result BDD. If both are the True terminal, create a 1-branch to the True terminal in the result BDD. If it is something else, create a 1-branch to this something else in the result BDD. After that, some minimization kicks in (if the 0-branch and the 1-branch of a node go to the same following BDD/terminal, remove it and pull the incoming transitions to the target) and you are golden. We even went further than that: we worked on simulating addition of sets of integers on BDDs in order to enhance value prediction and optimize conditions.
These considerations imply that your operations are bounded by the number of bits in your number range, that is, by log_2(MAX_NUMBER). Just think of it: you can intersect arbitrary sets of 64-bit integers in almost constant time.
More information can be for example in the Wikipedia and the referenced papers.
Further, if false positives are bearable and you only need an existence check, you can look at Bloom filters. Bloom filters use a bit array together with several hash functions to check whether an element is contained in the represented set. Intersection and union are constant time. The major problem here is that the false-positive rate increases as you fill up the Bloom filter.
Information, again, in the Wikipedia, for example.
Hach, set representation is a fun field. :)
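As a tiny illustration of the bitvector option above: Python integers can serve as arbitrary-size bitvectors, so union, intersection and membership testing each become a single bitwise operation (the helper function is illustrative):

```python
def bitvector(elements):
    """Represent a set of small non-negative integers as one Python int,
    with bit i set iff i is in the set."""
    v = 0
    for e in elements:
        v |= 1 << e
    return v

a = bitvector({1, 2, 3, 10})
b = bitvector({2, 10, 11})

assert a & b == bitvector({2, 10})             # intersection is a single AND
assert a | b == bitvector({1, 2, 3, 10, 11})   # union is a single OR
assert (a >> 10) & 1 == 1                      # membership test for element 10
```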
In Python:
class nrange(object):
    def __init__(self, lower=None, upper=None):
        self.lower = lower
        self.upper = upper

    def intersection(self, aRange):
        # Disjoint if one range ends before the other starts.
        if self.upper < aRange.lower or aRange.upper < self.lower:
            return None
        return nrange(max(self.lower, aRange.lower),
                      min(self.upper, aRange.upper))
If you're using Java, Commons Lang's Range has an overlapsRange(Range range) method.
