On Neo4j, running gds.beta.graphSage.mutate yields inconsistent results: for the same model, a given node n gets different embeddings between runs.
This happens despite the graph being constant (nodes, arcs, initial features, they are all the same across runs, as well as the model).
The weird thing is that, in this scenario, node n in some runs obtains the "original embeddings" (the ones obtained during the training run).
I assume this behaviour is dictated by some randomness in assigning initial parameters, however I cannot find at which point this happens.
Is there any way to achieve consistency between runs in Neo4j GraphSage?
Related
I would like to perform some optimizations by minimizing the maximum of a specific path variable within Dymos. or the maximum of the absolute of such a variable.
In linear programming methods, this can be done by introducing slack variables.
Do you know if this has been attempted before with Dymos, or if there was a reason not to include it?
I understand gradient based methods are not entirely suitable for these problems, though I think some "functions" can be introduced to mitigate this.
For example,
The space shuttle reentry problem from [Betts][1] used as a [test example][2] in dymos, the original source contains an example where the maximum heat flux is minimized. Such functionality could be implemented with the "loc" argument as:
phase.add_objective('q_c', loc='max')
[1]: J. Betts. Practical Methods for Optimal Control and Estimation Using Nonlinear Programming. Society for Industrial and Applied Mathematics, second edition, 2010. URL: https://epubs.siam.org/doi/abs/10.1137/1.9780898718577, arXiv:https://epubs.siam.org/doi/pdf/10.1137/1.9780898718577, doi:10.1137/1.9780898718577.
[2]: https://openmdao.github.io/dymos/examples/reentry/reentry.html
This has been done with pseudospectral methods before. Dymos currently doesn't have any direct way of implementing this, for a few reasons:
As you said, doing this naively can introduce discontinuous gradients that confuse the optimizer. When the node at which the maximum occurs switches, this tends to cause a sharp edge discontinuity in the gradient.
Since the pseudospectral methods are discrete, you cannot guarantee that the maximum will occur at a node. It's often fine to assume it does, but sometimes your requirements might demand more precision.
There are two possible ways to get around this.
The KSComp in OpenMDAO can be used as a "differentiable maximum". Add one after the trajectory, feed it the timeseries data for the output of interest, and set it up such that it returns a smooth approximation to the maximum. The KS function is a bit conservative, so it won't pick out the precise maximum, but depending on the value of the rho option it can be tuned to get pretty close.
When a more precise value of a maximum is needed, it's pretty common to set up a trajectory such that a phase ends when the maximum or minimum is reached.
If the variable whose maximum is being sought is a state, this can be done by adding a boundary constraint on the rate source for that state.
This ensures that the maximum occurs at the first or last node in the phase (depending on if its an initial or final boundary constraint). That lets you more accurately capture its value.
If the variable being sought is not a state, its possible to use the polynomials that are used for fitting states and controls in a phase to interpolate the variable of interest. By then taking the time derivative of that polynomial we can get a reasonably good approximation for its rate. The master branch of dymos has a method add_timeseries_rate_output that does this. And soon, within a few weeks hopefully, we'll add add_boundary_rate_constraint so that these interpolated rates can be easily used as boundary constraints.
In the meantime, you should be able to achieve this by adding the timeseries rate output and then manually applying the OpenMDAO method 'add_constraint' to the resulting timeseries output, using either indices=[0] or indices=[-1] to treat it as an initial or final constraint.
This is a common enough request that we'll add some documentation on how to achieve this behavior using both the KSComp approach and the boundary constraint approach.
Personally I'm not as much of a fan of KSComp because I've had trouble getting problems getting those types of objectives to converge in the past. I've used the slack variable and that has worked well. In the following example, we take a guess at the Rotor power in static analysis, and then we run a trajectory and get the actual rotor power during the mission. The objective was to minimize aircraft weight, so if you have a large amount of power in statics, that costs more weight. The constraint shown below prevents us from decreasing our updated guess of rotor power in statics below the maximum power required during the trajectory.
p.model.add_subsystem(
'static_power_check',
om.ExecComp('Power_check = Power_ODE - Power_statics',
Power_check = {'value':np.ones(nn_timeseries_main_tx), 'units':'kW'},
Power_ODE = {'value':np.ones(nn_timeseries_main_tx), 'units':'kW'},
Power_statics = {'value':0.0, 'units':'kW'}),
promotes_inputs=[
('Power_ODE','hop0.main_phase.timeseries.Power_R'), ('Power_statics','Power_{rotor,slack}')],
promotes_outputs=['Power_check'])
p.model.add_constraint('Power_check', upper=0, ref=1)
The constraint on the slack variable effectively helped us ensure that our slack rotor power matched the maximum rotor power during the mission. This allowed us to get the right sizes for the rotor parts (i.e. motors).
When visualizing the structure of the circuit tutorial via an N² diagram, I noticed that implicit components with indexed inputs/outputs labelled using the pattern x:y (e.g. I_out:0 of n1) do not display connections into the output of the block (in this case V of n1).
I understand that it is computing the residuals with the inputs and some initial "guess" to provide the output, so is this by design for ImplicitComponent because the connections are implicit? I tend to use the diagrams for debugging, and seeing no connections to the output makes it look unclear if it's connected, even though the inputs are fed into it and the code processes it via the residual equation correctly.
This is a known bug in OpenMDAO 2.9.1, but has been fixed already on OpenMDAO master. So the next release, due out before the end of Feb 2020 (2.10) should have the issue fixed.
I am totally new to NN and want to classify the almost 6000 images that belong to different games (gathered by IR). I used the steps introduced in the following link, but I get the same training accuracy in each round.
some info about NN architecture: 2 convloutional, activation, and pooling layers. Activation type: relu, Number of filters in first and second layers are 30 and 70 respectively.
2 fully connected layer with 500 and 2 hidden layers respectively.
http://firsttimeprogrammer.blogspot.de/2016/07/image-recognition-in-r-using.html
I had a similar problem, but for regression. After trying several things (different optimizers, varying layers and nodes, learning rates, iterations etc), I found that the way initial values are given helps a lot. For instance I used a -random initializer with variance of 0.2 (initializer = mx.init.normal(0.2)).
I came upon this value from this blog. I recommend you read it. [EDIT]An excerpt from the same,
Weight initialization. Worry about the random initialization of the weights at the start of learning.
If you are lazy, it is usually enough to do something like 0.02 * randn(num_params). A value at this scale tends to work surprisingly well over many different problems. Of course, smaller (or larger) values are also worth trying.
If it doesn’t work well (say your neural network architecture is unusual and/or very deep), then you should initialize each weight matrix with the init_scale / sqrt(layer_width) * randn. In this case init_scale should be set to 0.1 or 1, or something like that.
Random initialization is super important for deep and recurrent nets. If you don’t get it right, then it’ll look like the network doesn’t learn anything at all. But we know that neural networks learn once the conditions are set.
Fun story: researchers believed, for many years, that SGD cannot train deep neural networks from random initializations. Every time they would try it, it wouldn’t work. Embarrassingly, they did not succeed because they used the “small random weights” for the initialization, which works great for shallow nets but simply doesn’t work for deep nets at all. When the nets are deep, the many weight matrices all multiply each other, so the effect of a suboptimal scale is amplified.
But if your net is shallow, you can afford to be less careful with the random initialization, since SGD will just find a way to fix it.
You’re now informed. Worry and care about your initialization. Try many different kinds of initialization. This effort will pay off. If the net doesn’t work at all (i.e., never “gets off the ground”), keep applying pressure to the random initialization. It’s the right thing to do.
http://yyue.blogspot.in/2015/01/a-brief-overview-of-deep-learning.html
My model failed with the following error:
Compiling rjags model...
Error: The following error occured when compiling and adapting the model using rjags:
Error in rjags::jags.model(model, data = dataenv, inits = inits, n.chains = length(runjags.object$end.state), :
Error in node Y[34,10]
Observed node inconsistent with unobserved parents at initialization.
Try setting appropriate initial values.
I have done some diagnosis and found that there was a problem with initial values in chain 3. However, this can happen from time to time. Is there any way to tell run.jags or JAGS itself to re-try and re-run the model in such cases? For example, to tell him to make another N attempts to initialize the model properly. That would be very logical thing to do instead of just failing. Or do I have to do it manually with some tryCatch thing?
P.S.: note that I am currently using run.jags to run JAGS from R.
There is no facility for that provided within runjags, but it would be fairly simple to write yourself like so:
success <- FALSE
while(!success){
s <- try(results <- run.jags(...))
success <- class(s)!='try-error'
}
results
[Note that if this model NEVER works, the loop will never stop!]
A better idea might be to specify an initial values function/list that provides initial values that are guaranteed to work (if possible).
In runjags version 2, it will be possible to recover successful simulations when some simulations have crashed, so if you ran (say) 5 chains in parallel then if 1 or 2 crashed you would still have 3 or 4. That should be released within the next couple of weeks, and contains a large number of other improvements.
Usually when this error occurs it is an indication of a serious underlying problem. I don't think a strategy of "try again" is useful in general (and especially because default initial values are deterministic).
The default initial values generated by JAGS are given by a "typical" value from the prior distribution (e.g. mean, median, or mode). If it turns out that this is inconsistent with the data then there are typically two possible causes:
A posteriori constraints that need to be taken into account, such as
when modelling censored survival data with the dinterval
distribution
Prior-data conflict, e.g. the prior mean is so far
away from the value supported by the data that it has zero
likelihood.
These problems remain the same when you are supplying your own initial values.
If you think you can generate good initial values most of the time, with occasional failures, then it might be worth repeated attempts inside a call to try() but I think this is an unusual case.
I want to carry out Graph Clustering in a huge undirected graph with millions of edges and nodes. Graph is almost clustered with different clusters joined together only by some nodes(kind of ambiguous nodes which can relate to multiple clusters). There will be very few or almost no edges between two clusters. This problem is almost similar to finding vertex cut set of a graph, with one exception that graph needs to be partitioned into many components(their number being unknown).(Refer this picture https://docs.google.com/file/d/0B7_3zLD0XdtAd3ZwMFAwWDZuU00/edit?pli=1)
Its almost like different strongly connected components sharing a couple of nodes between them and i am supposed to remove those nodes to separate those strongly connected components. Edges are weigthed but this problem is more like finding structures in a graph, so edge weigths won't be of relevance. (Another way to think about the problem would be to visualize Solid Spheres touching each other at some points with Spheres being those strongly connected components and touching points being those ambiguous nodes)
I am prototyping something, so am quiet short of time to pick up Graph Clustering Algorithms by myself and to select the best possible. Plus i need a solution that would cut nodes and not edges since different clusters share nodes and not edges in my case.
Is there any research paper, blog that addresses this or somewhat related problem? Or can anyone come up with a solution to this problem howsoever dirty.
Since millions of nodes and edges are involved, i would need a MapReduce implementation of the solution. Any inputs, links for that too?
Is there any current open source implementation in MapReduce that can i directly use?
I think this problem is analogous to Finding Communities in online social networks by removing vertices.
Your problem is not so simple. I am afraid that it is related to the clique problem, which is NP complete, so unless you quantify somehow the statement "there are almost no edges between the clusters", your problem might be still very difficult. But what I would do in your shoes, would be to try one dirty, greedy approach, namely regarding the nodes as the following kind of quasi-neural net:
Each vertex I would consider to have inputs, outputs and a sigmoid activation function which convert the input value (sum of inputs) into the output value. The output value, and I consider this important, would not be cloned and sent to all the neighbors, but rather divided evenly between the neighbors. In addition to this, I would define a logarithmic decay of activity in a neuron (self-suppression, suppressive connection to itself), defined by a decay parameter global for the net.
Now, I would start simulation with all the neurons starting from activity 0.5 (activity range is 0 to 1) with very high decay parameter, which would lead to all the neuronst quickly stabilizing in 0 state. I would then gradually decrease the decay parameter until the steady state result would yield the first clique with non-zero stable activity.
The question is what to do next. One possibility is to subtract the found clique from the graph and run the same process again until we find all the cliques. This greedy approach might succeed if your graph is indeed as well behaved (really almost clustered) as you say, but might lead to unexpected results otherwise. Another possibility is to give the found clique a unique clique smell that would be repulsive (mutual suppresion) to other cliques an rerun the algorithm until the second clique is found, give it a different clique smell repulsive to all others etc., until each node has its own assigned smell.
I think this would be as many big ideas as i have about this.
The key is, that since it is probably not possible to solve this problem in the general case (likely NP complete), you need to take use of whatever special properties your graph has. That means you need to play with parameters for a while until the algorithm solves 99% of the cases that you encounter. I don't think that it is possible to give the numerically precise answer to your question without long experimentation with the actual datasets that you encounter.
Since millions of nodes and edges are involved, i would need a MapReduce implementation of the solution. Any inputs, links for that too?
In my experience I doubt if using Map/Reduce over here would be truly advantageous. First 10^6 order of nodes isn't really that large [that too in a non hyper-connected graph, since you are considering clustering], and the over head of using Map/Reduce [unless you already have setup your hardware/software for it] for your problem will not be worth it.
Map/Reduce will work much better, where once you have solved the clustering issue, and then want to process each cluster with similar analysis. Basically when you can break your task into relatively isolated sub-tasks, which can be performed in parallel. This of course can be cascaded to several layers.
In a relatively similar situation, I personally first modelled my graph into a graph database (I used Neo4J, and would recommend it highly) and then ran my analytic and queries on it. You will be surprised as to how white board friendly this solution is, and even massively joined and connected queries will be executed near instantaneously especially at the scale of only a few million nodes. For example, you can do a filtered analysis, based on degrees of separation, followed by listing of commons.