ArangoDB pedigree graph traversal - graph

I have an example pedigree with a structure as shown here.
My ultimate goal is to extract the ancestry of certain people in the so-called trio format, which is a table with the columns id, mom and dad.
In my example, the result for the pedigree of the two most recent persons G and H would be
+-----+-----+-----+
| id | mom | dad |
+-----+-----+-----+
| D | A | B |
| E | C | B |
| G | D | E |
| H | F | E |
+-----+-----+-----+
The closest thing I could come up with in AQL is the following query.
LET last_generation = ['people/G', 'people/H']
FOR person IN last_generation
  FOR v, e, p IN 1..10 OUTBOUND person is_mom, is_dad
    // CONTAINS(text, search): the edge _id (e.g. 'is_mom/...') is the text to search in
    LET role = CONTAINS(e._id, 'mom') ? 'mom' : 'dad'
    SORT e._from DESC
    RETURN DISTINCT {
      'id': DOCUMENT('people', e._from)._key,
      'parent': DOCUMENT('people', e._to)._key,
      'role': role
    }
Although the result is not yet in the right format, post-processing is easy.
Now my questions are:
I am forced to use the DISTINCT keyword to ensure uniqueness of rows. However, I would like to avoid unnecessary traversal in the first place rather than filtering afterwards. Ideally, I think I need the option uniqueEdges: "global", which is sadly not available any more. For instance, after having processed the pedigree of person G, I don't want to traverse the part of the pedigree shared between G and H (i.e., person E and its parents) again. Using uniqueVertices: "global" is not an option, because I would then miss the edge H --> E.
Is there some kind of option to find out the edge collection during a traversal, rather than the rather cumbersome check I use? Please note that it is not an option for me to store the sex as a property of the person (which is reasonable for most humans), because in reality I am dealing with plants, which can (usually) be mother and father at the same time.
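The post-processing step mentioned above could look like the following Python sketch. It assumes the query returns rows shaped like {id, parent, role}; the sample rows are written by hand from the expected table rather than produced by running the query, and to_trios is a hypothetical helper name.

```python
from collections import defaultdict


def to_trios(rows):
    """Pivot {'id', 'parent', 'role'} rows into (id, mom, dad) tuples."""
    parents = defaultdict(dict)
    for row in rows:
        # Duplicate rows simply overwrite the same slot, so DISTINCT is not required here.
        parents[row["id"]][row["role"]] = row["parent"]
    return [
        (person, roles.get("mom"), roles.get("dad"))
        for person, roles in sorted(parents.items())
    ]


# Hand-written sample rows (assumed shape of the query result).
rows = [
    {"id": "G", "parent": "D", "role": "mom"},
    {"id": "G", "parent": "E", "role": "dad"},
    {"id": "H", "parent": "F", "role": "mom"},
    {"id": "H", "parent": "E", "role": "dad"},
    {"id": "D", "parent": "A", "role": "mom"},
    {"id": "D", "parent": "B", "role": "dad"},
    {"id": "E", "parent": "C", "role": "mom"},
    {"id": "E", "parent": "B", "role": "dad"},
    {"id": "E", "parent": "B", "role": "dad"},  # duplicate from the shared part of the pedigree
]

trios = to_trios(rows)
for trio in trios:
    print(*trio)
```

A person with only one known parent would get None in the missing column.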

Related

Query DynamoDB for values between X and Y

I have a DynamoDB table like depicted in the attached image and I'm looking for ways to query the table based on lon and lat fields. More specifically, I'm looking for all the results with Lon between X and Y and Lat between A and B.
Is there any way to do that? I created indexes for both Lon and Lat, but the values are stored as strings.
Thanks a lot for your help!
You can come up with a good hash function (let's call it f) and use the schema below for DynamoDB:
| pk         | sk                | lat   | lon  | name       |
| hashvalue1 | 48.80#2.35#Fac    | 48.80 | 2.35 | Fac du     |
| hashvalue1 | 48.83#2.36#Groupe | 48.83 | 2.36 | Groupe Hos |
Here f(48.80, 2.35) = hashvalue1
f(48.83, 2.36) = hashvalue1
Whenever you have to query for lat1 and lon1, calculate f(lat1, lon1) and query the table with that value as the partition key.
The difficulty with this solution is coming up with a good hash function: if too many items land on the same hash value, that partition becomes a hot key, and in the worst case changing the function means recalculating the hash of every item already in the table. This approach is well documented here and here.
I would suggest going with Elasticsearch, which will give you much more flexibility for future use cases.
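As a rough illustration of what f could be, the sketch below buckets coordinates into fixed grid cells and lists the cells that a Lon/Lat range touches. The cell size, the key format and both function names are assumptions made for this example; they are not part of DynamoDB or of the schema above.

```python
import math

CELLS_PER_DEGREE = 1  # assumed grid resolution: one cell per whole degree


def cell_key(lat, lon, cells_per_degree=CELLS_PER_DEGREE):
    """Hypothetical f(lat, lon): map a coordinate to the id of its grid cell."""
    row = math.floor(lat * cells_per_degree)
    col = math.floor(lon * cells_per_degree)
    return f"{row}:{col}"


def cells_for_box(lat_min, lat_max, lon_min, lon_max, cells_per_degree=CELLS_PER_DEGREE):
    """List every cell key that overlaps the requested bounding box."""
    rows = range(math.floor(lat_min * cells_per_degree),
                 math.floor(lat_max * cells_per_degree) + 1)
    cols = range(math.floor(lon_min * cells_per_degree),
                 math.floor(lon_max * cells_per_degree) + 1)
    return [f"{r}:{c}" for r in rows for c in cols]


# Both sample items from the table above fall into the same cell.
print(cell_key(48.80, 2.35), cell_key(48.83, 2.36))
print(cells_for_box(48.5, 49.2, 2.0, 3.1))
```

A range query would then issue one query per returned key and filter the items on the exact lat/lon bounds. Smaller cells mean more queries per range but fewer items per partition, which is the hot-key trade-off mentioned above.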

Eliminate Left Recursion in a Context Free Grammar

I understand left-recursive grammars (LRG) and how to remove the left recursion.
But I don't know how to remove recursion from a grammar that combines both left and right recursion:
A -> aAb | c
The full question is to construct the LL(1) parsing table for this grammar:
S -> aABb
A -> aAb | e (epsilon)
B -> bB | c
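Note that in A -> aAb the leftmost symbol is the terminal a, so the rule is not left-recursive and nothing has to be eliminated before building the table. The Python sketch below applies the standard FIRST/FOLLOW construction to the grammar above; the dictionary encoding of the grammar and all function names are choices made for this example.

```python
EPS = "ε"   # marks that a sequence can derive the empty string
END = "$"   # end-of-input marker

# Each nonterminal maps to its alternatives; [] is the epsilon production.
GRAMMAR = {
    "S": [["a", "A", "B", "b"]],
    "A": [["a", "A", "b"], []],
    "B": [["b", "B"], ["c"]],
}


def first_of(symbols, first):
    """FIRST set of a sequence of grammar symbols."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in GRAMMAR else {sym}
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)  # every symbol (or none at all) can vanish
    return result


def compute_first():
    first = {nt: set() for nt in GRAMMAR}
    changed = True
    while changed:  # iterate until no set grows any more
        changed = False
        for nt, prods in GRAMMAR.items():
            for prod in prods:
                new = first_of(prod, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(first, start="S"):
    follow = {nt: set() for nt in GRAMMAR}
    follow[start].add(END)
    changed = True
    while changed:
        changed = False
        for nt, prods in GRAMMAR.items():
            for prod in prods:
                for i, sym in enumerate(prod):
                    if sym not in GRAMMAR:
                        continue
                    tail = first_of(prod[i + 1:], first)
                    new = tail - {EPS}
                    if EPS in tail:  # the rest can vanish, so inherit FOLLOW of the head
                        new |= follow[nt]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


def build_table(first, follow):
    table = {}
    for nt, prods in GRAMMAR.items():
        for prod in prods:
            head = first_of(prod, first)
            lookahead = head - {EPS}
            if EPS in head:
                lookahead |= follow[nt]
            for terminal in lookahead:
                if (nt, terminal) in table:
                    raise ValueError(f"LL(1) conflict at ({nt}, {terminal})")
                table[(nt, terminal)] = prod
    return table


first = compute_first()
follow = compute_follow(first)
table = build_table(first, follow)
for (nt, terminal), prod in sorted(table.items()):
    print(f"M[{nt}, {terminal}] = {nt} -> {' '.join(prod) or EPS}")
```

The table is built without raising a conflict, which means the grammar is LL(1) as given.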

How to get average of last N numbers in a stream with static memory

I have a stream of numbers and in every cycle I need to compute the average of the last N of them. This can, of course, be solved using an array where I store the last N numbers; in every cycle I shift it, add the new one and compute the average.
N = 3
+---+-----+
| a | avg |
+---+-----+
| 1 | |
| 2 | |
| 3 | 2.0 |
| 4 | 3.0 |
| 3 | 3.3 |
| 3 | 3.3 |
| 5 | 3.7 |
| 4 | 4.0 |
| 5 | 4.7 |
+---+-----+
The first N numbers (where there "isn't enough data for computing the average") don't interest me much, so the results there may be anything/undefined.
My question is, can this be done without using an array, that is, with a static amount of memory? If so, then how?
I'll do the coding myself - I just need to know the theory.
Thanks
Think of this as a black box containing some state. If you control the input stream, you can draw conclusions about the state. In your sliding-window, array-based approach, it is fairly obvious that if you feed a bunch of zeros into the algorithm after the original input, you get a bunch of averages with a decreasing number of non-zero values taken into account. The last one has just one original non-zero value, so if you multiply that by N you get the last input back. Using that and the second-to-last output, which accounts for two non-zero inputs, you can reconstruct the second-to-last input, and so on.
So essentially your algorithm needs to maintain sufficient state to reconstruct the last N elements of input, at least if you formulate it as an on-line algorithm. I don't think an off-line algorithm can do any better, unless you allow it to read the input multiple times, but I don't have as strong an argument for that.
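The reconstruction argument can be checked with a small Python sketch. It treats the averaging algorithm as a black box (here a plain array-based reference, an assumption for the demo), feeds N - 1 zeros after the real input, and recovers the last N inputs from the outputs alone. Fractions are used so that no rounding gets in the way; the function names are made up for this example.

```python
from fractions import Fraction


def window_averages(stream, n):
    """Black box: average over the last n values after each input (missing values count as 0)."""
    out = []
    for i in range(len(stream)):
        window = stream[max(0, i - n + 1): i + 1]
        out.append(Fraction(sum(window), n))
    return out


def recover_last_inputs(stream, n):
    """Recover the last n inputs using only the black box's outputs."""
    padded = list(stream) + [0] * (n - 1)
    tail = window_averages(padded, n)[-n:]  # outputs after 0, 1, ..., n-1 zeros were fed
    recovered = []
    later_sum = 0
    for avg in reversed(tail):
        total = avg * n                      # sum of the original values still in the window
        recovered.append(total - later_sum)  # subtract the values recovered so far
        later_sum = total
    return recovered[::-1]


stream = [1, 2, 3, 4, 3, 3, 5, 4, 5]
print([int(x) for x in recover_last_inputs(stream, 3)])
```

Because the last N inputs can be read back out of the box, the box must have been storing at least that much information.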
Of course, in some theoretical models you can avoid the array and e.g. encode all the state into a single arbitrary length integer, but that's just cheating the theory, and doesn't make any difference in practice.
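In practice the usual compromise is a fixed-size circular buffer plus a running sum: the memory is still N values, but it is allocated once and each update costs O(1) with no shifting. A minimal Python sketch, with a hypothetical class name:

```python
class SlidingAverage:
    """Average of the last n values, using a ring buffer allocated once."""

    def __init__(self, n):
        self.n = n
        self.buf = [0] * n   # fixed-size storage for the last n values
        self.pos = 0         # slot that holds the oldest value
        self.count = 0       # how many values have been seen, capped at n
        self.total = 0       # running sum of the values in the buffer

    def push(self, x):
        # Replace the oldest value and adjust the running sum accordingly.
        self.total += x - self.buf[self.pos]
        self.buf[self.pos] = x
        self.pos = (self.pos + 1) % self.n
        self.count = min(self.count + 1, self.n)
        # Undefined (None) until n values have arrived.
        return self.total / self.n if self.count == self.n else None


avg = SlidingAverage(3)
results = [avg.push(x) for x in [1, 2, 3, 4, 3, 3, 5, 4, 5]]
print([None if r is None else round(r, 1) for r in results])
```

With floating-point input the running sum can accumulate rounding error over a very long stream; recomputing the sum from the buffer every so often is one way to bound that.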

sum and distinct-count measures (star schema design koan)

I am quite a beginner in data warehouse design. I have read some theory, but recently ran into a practical problem with the design of an OLAP cube. I use a star schema.
Let's say I have 2 dimension tables and 1 fact table:
Dimension Gazetteer:
dimension_id
country_name
province_name
district_name
Dimension Device:
dimension_id
device_category
device_subcategory
Fact table:
gazetteer_id
device_dimension_id
hazard_id (measure column)
area_m2 (measure column)
A "business object" (which is actually a minefield) can have multiple devices, is located in a single location (Gazetteer) and occupies X square meters.
So in order to know which device categories there are, I created one fact per device in a hazard, like this:
+--------------+---------------------+-----------------------+-----------+
| gazetteer_id | device_dimension_id | hazard_id | area_m2 |
+--------------+---------------------+-----------------------+-----------+
| 123 | 321 | 0a0a-502c-11aa1331e98 | 6000 |
+--------------+---------------------+-----------------------+-----------+
| 123 | 654 | 0a0a-502c-11aa1331e98 | 6000 |
+--------------+---------------------+-----------------------+-----------+
| 123 | 987 | 0a0a-502c-11aa1331e98 | 6000 |
+--------------+---------------------+-----------------------+-----------+
I defined a measure "number of hazards" as distinct-count of hazard_id.
I also defined a "total area occupied" measure as a sum of area_m2.
Now I can use the dimension gazetteer and device and know how many hazards there are with given dimension members.
But the problem is area_m2: because it is defined as a sum, it gives a value n times higher than the actual area, where n is the number of devices of the hazard object. For example, the data above would give 18000 m2 instead of 6000 m2.
How would you solve this problem?
I am using the Pentaho stack.
Thanks in advance
[moved from comment]
If a hazard_id is a minefield, and you're looking at mines by region (Gazetteer) and size of minefields by Gazetteer, maybe you could make a Hazard dimension, which holds the area of the hazard. Alternatively, you could make a null-device entry in the Device dimension table, so that only the null-device row gets area_m2 set and the real devices get area_m2 = 0.
If you need to answer queries like "total area of minefields containing device 321", the second approach isn't going to answer them easily, which suggests that making a Hazard dimension might be the better approach.
I would also consider adding a device-count fact, which could hold the number of devices of each type per hazard.
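To make the Hazard dimension idea concrete, here is a small Python sketch using the three fact rows from the question. It models the tables as plain dictionaries, which is only an illustration of the aggregation logic and not how Pentaho defines measures; hazard_dim and both function names are assumptions.

```python
# Fact rows as in the question: one row per device, area repeated on each row.
facts = [
    {"gazetteer_id": 123, "device_dimension_id": 321,
     "hazard_id": "0a0a-502c-11aa1331e98", "area_m2": 6000},
    {"gazetteer_id": 123, "device_dimension_id": 654,
     "hazard_id": "0a0a-502c-11aa1331e98", "area_m2": 6000},
    {"gazetteer_id": 123, "device_dimension_id": 987,
     "hazard_id": "0a0a-502c-11aa1331e98", "area_m2": 6000},
]

# Hypothetical Hazard dimension: the area is stored once per hazard.
hazard_dim = {"0a0a-502c-11aa1331e98": {"area_m2": 6000}}


def naive_total_area(rows):
    """Plain SUM(area_m2) over fact rows: counts each hazard once per device."""
    return sum(row["area_m2"] for row in rows)


def total_area(rows, device_id=None):
    """Sum the area once per distinct hazard, optionally restricted to one device."""
    hazards = {
        row["hazard_id"]
        for row in rows
        if device_id is None or row["device_dimension_id"] == device_id
    }
    return sum(hazard_dim[h]["area_m2"] for h in hazards)


print(naive_total_area(facts))           # inflated by the number of devices
print(total_area(facts))                 # area counted once per hazard
print(total_area(facts, device_id=321))  # minefields containing device 321
```

The same idea carries over to the cube: the distinct hazards are selected through the fact table, and the area is read from the dimension rather than summed over fact rows.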

How to create a matrix with dynamic rows and columns in ASP.NET?

I have to make a control in ASP.NET that allows me to create a matrix. I have a list of strings (obtained from a method) that will be the rows (each string is one row), and I have another list of strings (obtained from another method) that will be the columns (each string is one column). After that, depending on the row-column intersection, I have to put an image in that position, something like this:
  | x   | y   | z   |
--+-----+-----+-----+
a | OK  | OK  | BAD |
--+-----+-----+-----+
b | OK  | BAD | OK  |
--+-----+-----+-----+
c | BAD | BAD | BAD |
How can I achieve this? Thanks a lot in advance!
You can use nested Repeaters.
The outer repeater for rows, the inner one for columns/cells.

Resources