Sum graph from bottom to top - graph

Given the following graph modeled in Neo4j:
Goal:
Calculate the sum of all node values, multiplied by the edge percentages, from the bottom up.
For example:
((30 * 0.6) + (50 * 0.1) + 100) * 0.5 + 10 = 71.5
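The goal can be stated as a recursion: a node's total is its own value plus, for each child, the edge percentage times that child's total. Because a node's total depends on its children's totals, this is a post-order (bottom-up) traversal of the tree. A minimal Python sketch, assuming the graph is a tree held in two hypothetical dicts (`value` and `children`), with node names made up to match the example formula:

```python
def bottom_up(node, value, children):
    """node's own value plus, for each child, edge weight * child's bottom-up total."""
    return value[node] + sum(
        weight * bottom_up(child, value, children)  # children are resolved first
        for child, weight in children.get(node, [])  # leaves have no entry
    )

# Graph implied by the example: top node 10 has one child 100 (edge 0.5),
# which has children 30 (edge 0.6) and 50 (edge 0.1). Names are hypothetical.
value = {"top": 10, "mid": 100, "c1": 30, "c2": 50}
children = {"top": [("mid", 0.5)], "mid": [("c1", 0.6), ("c2", 0.1)]}
```

This is plain recursion, not anything Neo4j-specific; it only shows the arithmetic a query has to reproduce.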
Status:
I found the REDUCE function (http://neo4j.com/docs/stable/query-functions-collection.html#functions-reduce),
but in my opinion it sums from the top to the bottom instead of bottom up.
Is this a common problem with a well-known name that I don't know?
Is there any solution in Neo4j or in another (graph) database/language?

This was a really interesting one.
I assumed two things: first, all nodes have the :A label; second, the value property on both nodes and relationships has the key p.
Here is a working query:
MATCH p=(:A)-[r]->(pike)
WITH pike, collect(p) as paths
OPTIONAL MATCH (pike)-[r]->()
WITH
CASE r WHEN null THEN 1 ELSE r.p END as multiplier,
CASE r WHEN null THEN last(nodes(paths[0])).p
ELSE reduce(x=0, path in paths | x + (head(nodes(path)).p * head(rels(path)).p)) + last(nodes(paths[0])).p END as total
RETURN sum(total*multiplier) as total
The logic behind it:
Find the paths of depth one and aggregate the children by their parent, the pike (first WITH).
If the OPTIONAL MATCH finds no outgoing relationship from the pike, the multiplier is 1 instead of the float value of the relationship property.
In the CASE for total: if the pike is the top (here node A), total is just the value of the top node; otherwise it is the pike's own value plus the sum of each child's value multiplied by the value on that child's relationship.
Finally, it sums total * multiplier over all rows.
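To make the per-row arithmetic of the query concrete, its rows can be mirrored in plain Python. This is a sketch under stated assumptions: the hypothetical `out_edge` dict maps each node to its single outgoing (parent, weight) edge, the node names are made up, and the values reproduce the example from the question. As in the query's first CASE branch, the top pike (no outgoing edge) contributes only its own value, with multiplier 1.

```python
def query_rows_total(value, out_edge):
    """Mirror of the query's rows; out_edge maps node -> (parent, weight)."""
    pikes = {parent for parent, _ in out_edge.values()}  # nodes that have children
    total = 0.0
    for pike in pikes:
        if pike not in out_edge:
            # OPTIONAL MATCH found no outgoing edge: this is the top node
            row, multiplier = value[pike], 1
        else:
            # pike's own value plus each child's value times its edge weight
            row = value[pike] + sum(
                value[c] * w for c, (p, w) in out_edge.items() if p == pike
            )
            multiplier = out_edge[pike][1]
        total += row * multiplier  # RETURN sum(total * multiplier)
    return total

# Hypothetical data matching the question's example
value = {"top": 10, "mid": 100, "c1": 30, "c2": 50}
out_edge = {"mid": ("top", 0.5), "c1": ("mid", 0.6), "c2": ("mid", 0.1)}
```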
You can test it here : http://console.neo4j.org/r/ih8obf


Graph - Algorithm to position nodes

I am trying to create a dynamic graph, using ELK.js, where users can add new nodes.
The graph is a tree with one root node. I am trying to set the position of the nodes in each row using an (x, y) position (y is not important for now).
Assumptions:
Lower x value brings the node to the left.
Two nodes in one row can't have the same x value.
When we add a new node, it should appear to the right of its parent's other children, if there are any (for example, the green boxes in rows 2 and 3 are new nodes).
New nodes can be added to any row at any moment (green boxes in rows 2 and 3).
The maximum x value is 16 digits long: 9999999999999999.
A simple example of how the positions behave can be found here (see the positions of nodes n2, n3 and n4, and change them in the JSON).
I am trying to avoid recalculating the position of every node in a row. I have tried a lot of different numbering schemes, but I'm stuck and need fresh ideas.
Any help would be appreciated. Thank you
You could approach this as follows:
When a new node is the first one on its level, give it 0 as its x-value.
When it is not the first, find its two immediate siblings on that level (nodes on the same level, one to its left and one to its right). In some cases you'll need to traverse from the node via one or more of its ancestors to find such a sibling.
Take the average of these two siblings' x-values as the new node's x-value.
It might be that there is a sibling on only one side. If there is no sibling on the right, take the average of the left sibling's x-value and 10^16. If the left sibling is missing, take the average of the right sibling's x-value and -10^16.
In practice this means you start with the range -10^16 ... 10^16 and keep cutting segments in half whenever a new node must be placed within a segment.
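A minimal Python sketch of this halving scheme, under a few assumptions: x values are kept as integers (floor division) so they stay exact, the tree is a hypothetical `children` dict whose lists are ordered left to right, and a new node is always appended as the rightmost child of its parent. Instead of walking up through ancestors, the sketch simply rebuilds the node's row level by level to find its two neighbours, which is simpler but costs O(number of nodes) per insert. `midpoint`, `row_of` and `add_node` are illustrative names.

```python
LIMIT = 10**16  # the bound used above; values stay strictly inside (-LIMIT, LIMIT)

def midpoint(left, right):
    """Integer strictly between two neighbours' x values (None = no neighbour)."""
    lo = -LIMIT if left is None else left
    hi = LIMIT if right is None else right
    if hi - lo < 2:
        raise ValueError("no free integer between neighbours; row needs renumbering")
    return (lo + hi) // 2

def row_of(children, root, target):
    """All nodes on target's level, left to right (children lists are ordered)."""
    level = [root]
    while level:
        if target in level:
            return level
        level = [c for n in level for c in children.get(n, [])]
    raise KeyError(target)

def add_node(children, x, root, parent, new):
    """Append `new` as the rightmost child of `parent` and assign its x value."""
    children.setdefault(parent, []).append(new)
    row = row_of(children, root, new)
    i = row.index(new)
    left = x[row[i - 1]] if i > 0 else None
    right = x[row[i + 1]] if i + 1 < len(row) else None
    x[new] = midpoint(left, right)
    return x[new]
```

For example, with `children = {}` and `x = {"root": midpoint(None, None)}` (which is 0), adding "a" and then "b" under "root" gives them 0 and 5000000000000000. The ValueError marks the situation the scheme cannot absorb: two neighbours with adjacent integer values leave no room, so that row would have to be renumbered.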

Shortest distance to cover all the (N+1) points. N of the points lie on the x-axis. The remaining point lies anywhere in the coordinate plane

We are given N+1 points. N of them lie on the x-axis; the remaining one (the HEAD point) can lie anywhere in the coordinate plane.
We are also given a START point on the x-axis.
Find the shortest distance needed to cover all the points, starting from the START point. A point may be visited multiple times.
Example with N+1 = 4:
Points on the x-axis:
(1,0), (2,0), (3,0)
HEAD point:
(1,1) (only the HEAD point may lie off the x-axis)
START point:
(1,0)
I am looking for a way to approach this problem, for example whether we should visit the HEAD point first or somewhere in between.
I tried to find a way to use graph theory to simplify this problem and reduce the number of paths that need to be considered. If there is an elegant way to represent this problem as a graph and identify a solution, I was not able to find it. This approach becomes very inefficient as n increases: the time and memory are O(2^n).
Looking at this as a tree graph, the root node would be the START point, then each of its child nodes would be the points it is connected to.
Since the START point and the rest of the points aside from HEAD all lie on the x-axis, each non-HEAD point only needs to be connected to its adjacent points on the x-axis. This is because the length of the path between any two such points is the sum of the distances between the adjacent points along the way (the subset of nodes representing points on the x-axis does not need to form a complete graph). This prunes the brute-force search somewhat.
Here's a simple example:
The upper left shows the original problem: points on the x-axis along with the START and HEAD points.
In the upper right, this has been transformed into a graph with each node representing a point from the original problem. The edges represent the paths that can be taken between points. This assumes that the START point only represents the first point in the path. Unlike the other nodes, it is only included in the path once. If that is not the case and the path can return to the START point, this would approximately double the possible paths, but the same approach can be followed.
In the bottom left, the START point, a, is the root of a tree graph, and each node connected to the START point is a child node. This process is repeated for each child node until either:
A path that is obviously not optimal is identified, in which case that node can just be excluded from the graph. See the nodes in red boxes; going back and forth between the same nodes is unnecessary.
All points are included when traversing the tree from the root to that node, producing a potential solution.
Note that when creating the tree graph, each time a node is repeated, its "potential" child nodes are the same as the first time the node was included. By "potential", I mean cases above still need to be checked, because the result might include a nonsensical path, in which case that node would not be included. It is also possible a potential solution results from the path after its child nodes are included.
The last step is to add up the distances for each of the potential solutions to determine which path is shortest.
This requires a careful examination of the different cases.
Assume for now that START (S) is at the far left and HEAD (H) is somewhere in the middle; the path might then look something like this:
                     H
                   /   \
S ---- * ----*----*     * --- * ----*
Or it might be shorter to go from one of the other nodes to H and straight back:
                     H
                   //
S ---- * --- * -- *----------*---*
If S is not at one end, you might have something like this:
                     H
                   /   \
* ---- * ----*----*     * --- * ----*
--------S
Or even go directly from S to H on the first step:
                   H
                  / |
* ---- * ----*----* |
                    S
A full analysis of cases would be quite extensive.
Actually solving the problem might depend on the number of nodes you have. If the number is small (< 10), complete enumeration might be possible: work out every possible path, eliminate the illegal ones, and choose the shortest. The number of paths is, I think, on the order of n!, so it's computable for small n.
For large n you can break the problem into small segments. I think it's enough to consider just a small patch of nodes on either side of H and a small patch of nodes on either side of S.
This is not really a solution, but a possible way to think about tackling the problem.
(To be pedantic, stackoverflow.com is not the right site in the Stack Exchange network for this question; Computational Science might be a better place for an algorithms question like this.)
This is a fun problem. First, let's try to find a brute-force solution, as Poosh did.
Observations about the Shortest Path
No repeated points
You are in a Euclidean geometry, so the triangle inequality holds: for all points a, b, c, d(a,c) <= d(a,b) + d(b,c). Thus, whenever a path visits a point more than once, you can remove one of the repeated visits (joining its two neighbours directly) without making the path longer. Repeating this shows that there is an optimal path that contains each point exactly once.
Permutations
Our problem is thus to find the permutation $M$ of the numbers $1, \dots, n$ (with $M_0 = 0$) for the points $P_0, \dots, P_n$, where $P_0$ is the fixed start point, $P_n$ the head point, and $P_1, \dots, P_{n-1}$ are ordered by increasing x value, that minimizes $\sum_{i=1}^{n} \lvert P_{M_i} - P_{M_{i-1}} \rvert$, where $\lvert v \rvert = \sqrt{v_x^2 + v_y^2}$ is the vector length.
The number of permutations of a set of size n is n!. Here we have n+1 points, but P0 is fixed, so a brute-force approach testing every order of the remaining n points has complexity n!, which grows faster than even 2^n and is definitely not practical. We need further observations to improve on this.
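For small inputs the brute force is easy to write down, and it is a useful reference for checking smarter methods. A sketch in Python, assuming points are (x, y) tuples; `brute_force` and `path_length` are illustrative names, and `math.dist` needs Python 3.8+:

```python
import math
from itertools import permutations

def path_length(path):
    """Sum of Euclidean distances between consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))

def brute_force(start, others):
    """Shortest path over every visiting order of `others`; START stays first.

    Tries len(others)! orders, so it is only usable for a handful of points.
    """
    best = min(permutations(others), key=lambda order: path_length([start, *order]))
    return path_length([start, *best]), best
```

For instance, `brute_force((1, 0), [(2, 0), (3, 0), (1, 1)])` returns a length of 2 + √2 with the order (1, 1), (2, 0), (3, 0), i.e. going up to HEAD first.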
Next Steps
My next step would now be to see if there are any other sequences that can be proven to be not optimal, leading to a reduction in the number of candidates to be tested.
Paths of non-head points
Let's look at all paths (sequences of point indices that don't contain the head point) that are part of the optimal path. If we don't change the start and end point of such a path, then transpositions inside it have no effect on the rest of the path, and we can perform purely local optimizations. We can prove that these sequences must have monotonic (increasing or decreasing) x coordinate values and thus monotonic indices (as the points are ordered by ascending x coordinate between indices 0 and n-1):
We are in a purely one dimensional subspace and the total distance of the path is thus equal to the sum of the absolute values of the differences in x coordinates between one such point and the next. It is clear that this sum is minimized by ordering by x coordinate in either ascending or descending order and thus ordering the indices in the same way. Note that this is true for maximal such paths as well as for all continuous "subpaths" of them.
Wrapping it up
The only choices we have left are:
where do we place the head node in the optimal path?
which way do we order the two paths to the left and right?
This means we have n values for the index of the head node (1...n, 0 is fixed as the start node) and 2x2 values for sort order. So we have 4n choices which we can all calculate and pick the shortest one. One of the sort orders probably determines the other but I leave that to you.
Anyway, the complexity of this algorithm is O(4n) = O(n), provided each candidate's length is computed in constant time (a monotone run along the x-axis has length |x_last - x_first|, so only the endpoints of each run matter). Because reading the input of the problem is O(n), and writing the output is as well, I believe this algorithm has optimal complexity. However, if we could reformulate the problem so that the input and output were read and written in some compressed form, containing only the parameters we actually need to solve the problem, it is possible that we could do better.
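A sketch of this reduced search in Python: START, the first k axis points, HEAD, then the remaining axis points, trying both sort orders for each of the two runs. It assumes, as the indexing above does (indices 0 to n-1 in ascending x), that START is the leftmost axis point. For clarity it evaluates every candidate path naively, so it runs in O(n^2) rather than the O(n) described above. `split_search` and `path_length` are illustrative names.

```python
import math

def path_length(path):
    """Sum of Euclidean distances between consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(path, path[1:]))

def split_search(start, axis_xs, head):
    """Shortest START -> run -> HEAD -> run path over all splits and sort orders.

    Assumes START is the leftmost axis point; axis_xs holds the x coordinates
    of the other points on the x-axis.
    """
    xs = sorted(axis_xs)
    best = math.inf
    for k in range(len(xs) + 1):            # HEAD is visited after k axis points
        before, after = xs[:k], xs[k:]
        for b in (before, before[::-1]):    # 2 x 2 sort orders
            for a in (after, after[::-1]):
                path = [start, *[(x, 0) for x in b], head, *[(x, 0) for x in a]]
                best = min(best, path_length(path))
    return best
```

With START (0, 0), axis points at x = 2 and x = 5 and HEAD (2, 1), the best candidate visits 2, then HEAD, then 5, for a length of 3 + √10.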
P.S.: I'm not a mathematician so I probably used wrong words for some concepts and missed the usual notation for the variables and functions. I would be glad for some expert to check this for any obvious errors.

Add or update an edge in Gremlin

I have a graph database in Azure Cosmos DB that stores how similar vertices are; each edge holds a numeric similarity value.
The complication is that I want to add an edge, or, if it already exists, update it by incrementing its similarity value. This is my current code for adding:
g.V('A').addE('similar').to(g.V('B')).property('x', 10)
I need something that will increase x if the edge exists, or create it if not. Pseudocode illustrating it:
g.V('A').updateE('similar').to(g.V('B')).property('x', currentValue+2).ifNull({g.V('A').addE('similar').to(g.V('B')).property('x', 10)})
Is there an easy way to achieve this?
There is a known recipe for checking existence using the pattern fold().coalesce(unfold(), ...) for vertices, and a similar pattern for edges. In your case, you also want to update a property value.
Assuming there are vertices with IDs A and B, and the edge goes from A to B, the query might look like this:
g.withSack(0).V('B').as('toV').V('A').coalesce(
outE('similar').where(inV().as('toV')),
addE('similar').to('toV').property('similarity', 0)
)
.sack(assign).by('similarity').sack(sum).by(constant(2))
.property('similarity', sack())
Explanation:
Get the B (target) vertex and save a reference to it, then get the A (source) vertex.
Using coalesce, check whether an edge connects them; otherwise, create one and set similarity to 0.
Assign the 'similarity' value to the sack and add the increment (2).
Finally, store the new value in the edge property.
Another query without using sack:
g.V('B').as('toV').V('A').coalesce(
outE('similar').where(inV().as('toV')),
addE('similar').to('toV').property('similarity', 0)
).property('similarity', values('similarity').fold(10, sum))

Programming Dot Probe for Psychopy in Builder

I am new to PsychoPy and have programmed a few simple tasks. I am currently really struggling to program a word dot probe. I do not want to use Coder, because the rest of my research team needs to be able to edit and run the program easily.
My specific problem is that I cannot get the pictures to load correctly at the same time, and I do not know how to make a probe appear in the position of one of the pictures once the pictures have disappeared.
Timing
The timing issue can be solved by inserting an ISI period at the beginning of the trial, e.g. during a fixation cross. This allows PsychoPy to load the images in the background so that they are ready for presentation.
Truly random dot position
In your case, you want the dot position to be random, independent of the image. This is one of the cases that TrialHandler does not handle, and I suspect you need to insert a code component to make it work. For fully random placement, which only approaches 50% on each side in the limit of infinitely many trials, put this in a code component under "Begin Routine":
x = (np.random.binomial(1, prob) - 0.5) * xdist
y = 0
dot.pos = [x, y]
Change dot to the name of your dot stimulus. y is the vertical offset and x the horizontal offset (here varying between trials), xdist is the distance between the two dot positions, and prob is the probability of the dot appearing on the right. You probably want to set prob to 0.5, i.e. 50%.
Balanced dot position
If you want the dot to appear at each side exactly the same number of times, you can do the following in the code component:
Under "Begin Experiment", make a list with one entry per trial (plus one spare if the number of trials is odd):
dotPos = [0, 1] * ((numberOfTrials + 1) // 2)  # left/right codes (0 and 1), one spare if numberOfTrials is odd. [0, 1] yields 50%; [0, 0, 0, 1] repeated numberOfTrials/4 times would yield 25%, etc.
np.random.shuffle(dotPos) # randomize order
Then under "begin routine" do something akin to what we did above:
x = (dotPos.pop() - 0.5) * xdist  # dotPos.pop() returns the last element and removes it from the list
y = 0
dot.pos = [x, y]
Naturally, if the number of trials is odd, one position will be used one more time than the other.
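The balanced approach can be checked outside PsychoPy with a few lines of plain Python. This sketch uses the standard library's random module instead of numpy (an assumption, so it runs without Builder), and `balanced_dot_positions` is an illustrative name; inside Builder you would keep the two code-component snippets above.

```python
import random

def balanced_dot_positions(n_trials, xdist, seed=None):
    """One x offset per trial: half at -xdist/2 (left), half at +xdist/2 (right)."""
    codes = [0, 1] * ((n_trials + 1) // 2)  # enough codes, one spare if n_trials is odd
    rng = random.Random(seed)
    rng.shuffle(codes)                       # randomize the order of left/right
    return [(codes.pop() - 0.5) * xdist for _ in range(n_trials)]
```

With 11 trials and xdist = 10, the list holds six codes of each kind, eleven are popped, and the two sides end up used 6 and 5 times.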
Two dot positions for each condition
For the record, if the dot is to be shown at each position for each image combination, simply count each of these situations as a condition, i.e. give each one a separate row in the conditions file.

How is this Huffman Table created?

I have a table that shows the probability of each event happening.
I'm fine with part 1, but part 2 is not clicking with me. I'm trying to understand how the binary numbers in part 2 are derived.
I understand that 0 is assigned to the largest probability and we work back from there, but how do we work out the next set of binary numbers? And what do the circles around the numbers mean, and what do the two shades of grey distinguish?
It's just not clicking. Maybe someone can explain it in a way that will make me understand?
To build Huffman codes, one approach is to build a binary tree using a priority queue into which the symbols to be coded are inserted, ordered by frequency.
To start with, you have a queue with only leaf nodes, representing each of your data.
At each step you take the two nodes with the lowest frequency from the queue, make a new node whose frequency is the sum of the two removed nodes, and attach those two nodes as its left and right children. This new node is reinserted into the queue according to its frequency.
You repeat this until you only have one node in the queue, which will be the root.
Now you can traverse the tree from the root to any leaf node, and the path you take (whether you go left or right) at each level gives you either a 0 or a 1, and the length of the path (how far down the tree the node is) gives you the length of the code.
In practice you can build the codes as you build the tree, by putting a 0 or 1 in front of the code of every leaf in a subtree, according to whether that subtree becomes the left or the right child of a new parent.
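A minimal Python sketch of this priority-queue construction using the standard `heapq` module. Each heap entry carries the partial codes of its subtree, and when two subtrees are merged a 0 or 1 is put in front of all their codes. Which child gets 0 is a convention; here it is the lower-frequency one, so the bits (and, when frequencies tie, even the code lengths) may differ from your diagram while still forming a valid Huffman code. `huffman_codes` is an illustrative name.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build Huffman codes from a {symbol: frequency} mapping."""
    if not freqs:
        return {}
    if len(freqs) == 1:                       # a lone symbol still needs one bit
        return {sym: "0" for sym in freqs}
    tiebreak = count()                        # keeps entries comparable on equal frequencies
    # Heap entry: (frequency, tiebreak, {symbol: code built so far})
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)     # lowest frequency
        f2, _, right = heapq.heappop(heap)    # second lowest
        # Both subtrees hang under a new parent: prefix 0 (left) or 1 (right)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]
```

With the frequencies {a: 45, b: 13, c: 12, d: 16, e: 9, f: 5}, "a" gets a 1-bit code, "b", "c" and "d" get 3 bits, and "e" and "f" get 4 bits.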
In your diagram, the numbers in the circles indicate the sum of the frequencies of the two nodes combined at each stage of building the tree.
You should also see that the two being combined have been assigned different bits (one a 0, the other a 1).
A diagram may help. Apologies for my hand-writing:
