Multi-neural network cascade: how to optimize the loss? - networking

Debugged the learning rate and some hyperparameters, but it didn't work for my task.
My network is a multi-stage cascade. Only the output of the last neural network is supervised by a loss. The output of the first stage of the cascade also needs to be available for external use, but it is not supervised. How can I make sure that, after optimization, the output of the first stage is accurate enough?

Related

Deep Reinforcement Learning (A3C) for Pong diverging (Tensorflow)

I'm trying to implement my own version of the Asynchronous Advantage Actor-Critic method, but it fails to learn the Pong game. My code was mostly inspired by Arthur Juliani's and OpenAI Gym's A3C versions. The method works well for a simple Doom environment (the one used in Arthur Juliani's code), but when I try the Pong game, the method diverges to a policy where it always executes the same action (always move down, or always move up, or always executes the no-op action). My code is located in my GitHub repository.
I have already adapted my network to resemble the architecture used by OpenAI Gym's A3C version, which is:
4 convolutional layers with the same specs: 32 filters, 3x3 kernels, 2x2 strides, with padding (padding='same'). The output of the last convolutional layer is then flattened and fed to an LSTM layer with an output of size 256. The initial states C and H of the LSTM layer are given as an input. The output of the LSTM layer is then separated into two streams: a fully connected layer with an output size equal to the number of actions (policy) and another fully connected layer with only one output (value function) (more details in Network.py of my code);
The loss function used is just as is informed in the original A3C paper. Basically, the policy loss is the log_softmax of the linear policy times the advantage function. The value loss is the square of the difference between the value function and the discounted rewards. The total loss accounts for the value loss, policy loss, and the entropy. The gradients are clipped to 40 (more details in Network.py of my code);
There is only one global network and several worker networks (one network for each worker). Only the global network is updated. This update is done with respect to the local gradients of each worker network. Each worker therefore simulates the environment for BATCH_SIZE iterations, saving the state, value function, chosen action, reward received, and the LSTM state. After BATCH_SIZE iterations (I used BATCH_SIZE = 20), each worker passes that data into the network and calculates the discounted rewards, the advantage function, the total loss, and the local gradients. It then updates the global network with those gradients. Finally, the worker's local network is synchronized with the global network (local_net = global_net). All workers do this asynchronously (for more details on this step, check the work and train methods of the Worker class inside Worker.py);
The LSTM states C and H are reset between episodes. It is also important to note that the current states C and H are kept locally by each worker;
To apply the gradients to the global network, I used the AdamOptimizer with learning rate = 1e-4.
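The loss described in the list above can be sketched in plain NumPy. This is only an illustration of the formulas, not the code from the repository (which uses TensorFlow). The function names, the discount factor, and the coefficients value_coef and entropy_coef are assumptions chosen for the example. The policy term carries a minus sign because the total is meant to be minimized.

```python
import numpy as np


def discounted_returns(rewards, bootstrap_value, gamma=0.99):
    """R_t = r_t + gamma * R_{t+1}, seeded with the value estimate of the state after the batch."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def log_softmax(logits):
    """Numerically stable log-softmax over the action axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def a3c_loss(logits, values, actions, returns, value_coef=0.5, entropy_coef=0.01):
    """Return (total, policy_loss, value_loss, entropy) for one batch.

    value_coef and entropy_coef are illustrative assumptions.
    """
    log_probs = log_softmax(logits)
    probs = np.exp(log_probs)
    # In an autodiff framework the advantage is treated as a constant
    # in the policy term (no gradient flows through it).
    advantages = returns - values
    chosen_log_probs = log_probs[np.arange(len(actions)), actions]
    policy_loss = -np.sum(chosen_log_probs * advantages)
    value_loss = np.sum((returns - values) ** 2)
    entropy = -np.sum(probs * log_probs)
    total = policy_loss + value_coef * value_loss - entropy_coef * entropy
    return total, policy_loss, value_loss, entropy
```

A policy that collapses to a single action shows up here as an entropy term that falls towards zero, so logging the entropy per update is a cheap way to watch for the divergence described in the question.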
I have already tried different configurations for the network (several different convolutional layer configurations, including different activation functions), other optimizers (RMSPropOptimizer and AdadeltaOptimizer) with different parameter configurations, and different values for BATCH_SIZE. But it almost always ends up diverging to a policy where it executes only one action. I say almost always because there are certain configurations where the agent maintains a policy similar to a random policy for several episodes, with no apparent improvement (I waited until 62k episodes before giving up in those cases).
Therefore, I would like to know if anyone has succeeded in training an agent on the Pong game using A3C with an LSTM layer. If so, what parameters were used? Any help would be appreciated!
[EDIT] As I said in the comments, I managed to partially solve the problem by feeding the correct LSTM state before calculating the gradients (instead of feeding an initialized LSTM state). This made the method learn reasonably well for the PongDeterministic environment. But the problem persists when I try Breakout-v0: the agent reaches a mean score of 40 in about 65k episodes, but it seems to stop learning after this (it maintained this score for some time). I have checked the OpenAI starter agent several times and I can't find any significant differences between my implementation and theirs. Any help would be greatly appreciated!

How to define topology in Castalia-3.2 for WBAN

How can I define a topology in Castalia-3.2 for WBAN?
How can I import a topology from OMNeT++ into Castalia?
Where is the topology defined in the default WBAN scenario in Castalia?
Regards, and thanks.
The topology of a network is an abstraction that shows the structure of the communication links in the network. It's an abstraction because the notion of a link is an abstraction itself. There are no "real" links in a wireless network. The communication happens in a broadcast medium and there are many parameters that dictate whether a packet is received or not, such as the transmission power, the path loss between transmitter and receiver, noise and interference, and also just luck. Still, the notion of a link can be useful in some circumstances, and some simulators use it to define simulation scenarios. You might be used to simulators in which you draw nodes and then simply draw lines between them to define their links. This is not how Castalia models a network.
Castalia does not model links between the nodes. It models the channel and the radios to get a more realistic communication behaviour.
Topology is often confused with deployment (I confuse them myself sometimes). Deployment is just the placement of nodes on the field. There are multiple ways to define deployment in Castalia, if you wish, but it is not needed in all scenarios (more on this later). People can confuse deployment with topology, because under very simplistic assumptions certain deployments lead to certain topologies. Castalia does not make these assumptions. Study the manual (especially chapter 4) to get a better understanding of Castalia's modeling.
If, after you have understood the modeling in Castalia, you still want a specific/custom topology for some reason, then you could play with some parameters to achieve your topology, at least in a statistical sense. Assuming all nodes use the same radios and the same transmission power, the path loss between nodes becomes a defining factor of the "quality" of the link between the nodes. In Castalia, you can define the path losses for each and every pair of nodes, using a pathloss map file.
SN.wirelessChannel.pathLossMapFile = "../Parameters/WirelessChannel/BANmodels/pathLossMap.txt"
This tells Castalia to use the specific path losses found in the file instead of computing path losses based on a wireless channel model. The deployment does not matter in this case. At least it does not matter for communication purposes (it might matter for other aspects of the simulation, for example if we are sampling a physical process that depends on location).
In our own simulations with BAN, we have defined a pathloss map based on experimental data, because other available models are not very accurate for BAN. For example, the lognormal shadowing model, which is Castalia's default, is not a good fit for BAN simulations. We did not want to enforce a specific topology, we just wanted a realistic channel model, and defining a pathloss map based on experimental data was the best way.
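To make the role of path loss concrete, here is a minimal sketch of the generic lognormal shadowing formula, PL(d) = PL(d0) + 10 * n * log10(d / d0) + X, where X is a zero-mean Gaussian term in dB. It is not Castalia code. The function names and all parameter values (reference loss, exponent, sigma) are illustrative assumptions, not values read from a Castalia configuration.

```python
import math
import random


def log_distance_path_loss(distance_m, pl_d0_db=55.0, d0_m=1.0, exponent=2.4):
    """Deterministic part: PL(d) = PL(d0) + 10 * n * log10(d / d0), in dB."""
    return pl_d0_db + 10.0 * exponent * math.log10(distance_m / d0_m)


def lognormal_shadowing_path_loss(distance_m, sigma_db=4.0, rng=random, **model):
    """Add a zero-mean Gaussian term (in dB) to model shadowing."""
    return log_distance_path_loss(distance_m, **model) + rng.gauss(0.0, sigma_db)


def received_power_dbm(tx_power_dbm, path_loss_db):
    """Received power is the transmit power minus the path loss."""
    return tx_power_dbm - path_loss_db


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so repeated runs give the same draws
    for d in (0.5, 1.0, 2.0):
        loss = lognormal_shadowing_path_loss(d, rng=rng)
        print(f"d={d} m  path loss={loss:.1f} dB  rx={received_power_dbm(-10.0, loss):.1f} dBm")
```

With identical radios and transmit power, whether a pair of nodes behaves like a "link" comes down to whether the received power stays above the receiver's sensitivity often enough, which is why a table of per-pair path losses can stand in for a topology.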
I have the impression though that when you say topology, you are not only referring to which nodes could communicate with which nodes, but which nodes do communicate with which nodes. This is also a matter of the layers above the radio (MAC and routing). For example it's the MAC and Routing that allow for relay nodes or not.
Note that in Castalia's current implementations of 802.15.6MAC and 802.15.4MAC, relay nodes are not allowed. So you cannot create a mesh topology with these default implementations. Only a star topology is supported. If you want something more, you'll have to implement it yourself.

How does OpenMPI's gather work?

I'm new to MPI and I'm trying to understand how MPI (and specifically OpenMPI) works in order to reason about the performance of my system.
I've tried to find resources online to help me understand things a little better, but haven't had much luck. I thought I'd come here.
Right now my question is simple: if I have 3 nodes (1 master, 2 clients) and I issue an MPI_Gather, does the root process handle incoming data sequentially or concurrently? In other words, if process 1 is the first to make a connection with process 0, will process 2 have to wait until process 1 is done sending its data before it can start to send its data?
Thanks!
There are multiple components in Open MPI that implement collective operations and some of them provide multiple algorithms for the implementation of each operation.
What you are most likely interested in is the tuned component of the coll framework as that is what Open MPI uses by default. tuned implements all collectives using point-to-point operations and provides several algorithms for gather:
linear with synchronisation - used when messages are large to mid-size
binomial - used when the number of processes is large or the message size is small
basic linear - used in all other cases
The performance of each algorithm depends strongly on the particular combination of message size and number of ranks, therefore the library comes with a set of heuristics that tries to determine the best algorithm based on the data size and the size of the communicator (as indicated above). There are several mechanisms to override the heuristics and either force a certain algorithm or provide a list of custom algorithm selection rules.
The basic linear algorithm simply has the root loop over all other ranks receiving their messages in sequence. In that case, rank 2 won't be able to send its chunk before rank 1 since the root will first receive the message from rank 1 and only then move on to rank 2.
The linear with synchronisation algorithm splits the chunks into two pieces each. The first pieces are collected in sequence just like in the basic linear algorithm. The second pieces are collected asynchronously using non-blocking receives.
The binomial algorithm arranges the ranks as a binomial tree. The processes at the nodes of the tree receive the chunks from the lower levels and aggregate them into larger chunks that then get passed to the upper levels until they reach the root rank.
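The difference between the basic linear and the binomial patterns can be illustrated with a small pure-Python model. It only simulates who sends to whom and in which step. It does not use MPI and is not taken from the Open MPI sources. The function names and the (step, sender, receiver) schedule format are assumptions made for this sketch.

```python
def linear_gather(chunks, root=0):
    """Basic linear pattern: the root receives from every other rank, one after another."""
    size = len(chunks)
    gathered = {root: chunks[root]}
    schedule = []  # (step, sender, receiver)
    step = 0
    for rank in range(size):
        if rank == root:
            continue
        schedule.append((step, rank, root))
        gathered[rank] = chunks[rank]
        step += 1
    return [gathered[r] for r in range(size)], schedule


def binomial_gather(chunks, root=0):
    """Binomial tree pattern: in each round, half of the remaining holders
    forward everything they have collected to a partner closer to the root."""
    size = len(chunks)
    held = [{rank: chunks[rank]} for rank in range(size)]  # data held per rank
    schedule = []  # (round, sender, receiver)
    mask, rnd = 1, 0
    while mask < size:
        # Virtual ranks are shifted so that the root is virtual rank 0.
        for vrank in range(mask, size, 2 * mask):
            sender = (vrank + root) % size
            receiver = (vrank - mask + root) % size
            held[receiver].update(held[sender])  # aggregate into a larger chunk
            held[sender] = {}
            schedule.append((rnd, sender, receiver))
        mask <<= 1
        rnd += 1
    return [held[root][r] for r in range(size)], schedule


if __name__ == "__main__":
    data = ["a", "b", "c", "d", "e", "f", "g", "h"]
    print(linear_gather(data)[1])    # 7 sequential receives at the root
    print(binomial_gather(data)[1])  # 3 rounds, transfers within a round can overlap
```

For the three-process case in the question, the linear schedule is simply rank 1 followed by rank 2, which matches the sequential behaviour described above for the basic linear algorithm.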
You can find the source code of the tuned module in the ompi/mca/coll/tuned folder of the Open MPI source tree. In the development branch, part of the tuned component got promoted to the base implementation of the collective framework and the code for the gather is to be found in ompi/mca/coll/base instead.
Hristo's answer is of course excellent, but I would like to offer a different point of view.
Contrary to your expectation, the question is not simple. It isn't even possible to specifically answer it without knowing more system specifics, as Hristo pointed out. That doesn't mean the question is invalid, but you should start to reason about performance on a different level.
First, consider the complexity of the gather operation: the total network transfer to the root, as well as the memory requirements, grow linearly with the number of processes in the communicator. This naturally limits scalability.
Second, you may assume that your MPI implementation does implement MPI_Gather in the most efficient way possible - better than you could do it by hand. This assumption may very well be wrong, but it is the best starting point to write your program.
Once you have your program, you should measure and see where time is spent - or wasted. For that you should use an MPI performance analysis tool. If you have identified that your Gather has a significant impact on performance, you can go ahead and try to optimize it. But to do so, first consider whether you can structure your communication conceptually better, e.g. by somehow removing the computation altogether or using a clever reduction instead. If you still need to stick to the gather, go ahead and tune your MPI implementation. Afterwards, verify that your optimization did indeed improve performance on your specific system.

Real-time anomaly detection

I would like to do anomaly detection in R on a real-time stream of sensor data. I would like to explore the use of either the Twitter AnomalyDetection package or the anomalous package.
I am trying to think of the most efficient way to do this, as some online sources suggest R is not suitable for real-time anomaly detection. See https://anomaly.io/anomaly-detection-twitter-r. Should I use the stream package to implement my own data stream source? If I do so, is there any "rule-of-thumb" as to how much data I should stream in order to have a sufficient amount of data (perhaps that is what I need to experiment with)? Is there any way of doing the anomaly detection in-database rather than in-application to speed things up?
My experience is that if you want real-time anomaly detection, you need to apply an online learning algorithm (rather than a batch one), ideally running on each sample as it is collected/generated. To do that, you would need to modify the existing open-source packages to run in online mode and adapt the model parameters for each sample that is processed.
I'm not aware of an open source package that does it though.
For example, if you're computing a very simple anomaly detector, using the normal distribution, all you need to do is update the mean and variance of each metric with each sample that arrives. If you want the model to be adaptive, you'll need to add a forgetting factor (e.g., exponential forgetting), and control the "memory" of the mean and variance.
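A minimal sketch of such a detector is shown below, written in Python rather than R purely for illustration. The class name, the forgetting factor, the z-score threshold, and the warm-up length are all assumptions for the example. The variance update is the standard exponentially weighted recursion.

```python
import math


class OnlineGaussianDetector:
    """Exponentially weighted mean/variance with a z-score test per sample."""

    def __init__(self, forgetting=0.05, threshold=3.0, warmup=10):
        self.forgetting = forgetting  # larger value = shorter memory
        self.threshold = threshold    # flag samples with |z| above this
        self.warmup = warmup          # samples to observe before flagging
        self.mean = 0.0
        self.var = 0.0
        self.count = 0

    def update(self, x):
        """Score x against the current model, then fold it into the model.

        Returns (is_anomaly, z_score).
        """
        if self.count == 0:
            self.mean = x
            self.count = 1
            return False, 0.0
        std = math.sqrt(self.var)
        z = (x - self.mean) / std if std > 0 else 0.0
        is_anomaly = self.count >= self.warmup and abs(z) > self.threshold
        # Exponential forgetting: recent samples weigh more than old ones.
        delta = x - self.mean
        self.mean += self.forgetting * delta
        self.var = (1 - self.forgetting) * (self.var + self.forgetting * delta * delta)
        self.count += 1
        return is_anomaly, z


if __name__ == "__main__":
    detector = OnlineGaussianDetector()
    stream = [9.0, 11.0] * 25 + [30.0]
    for value in stream:
        flagged, z = detector.update(value)
        if flagged:
            print(f"anomaly: value={value} z={z:.1f}")
```

Each sample costs a constant amount of work and memory, which is what makes this usable on a live stream. Note that this sketch also folds flagged samples into the model. Whether to do that is a design choice that depends on how quickly the model should adapt to a genuine level shift.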
Another algorithm which lends itself to online learning is Holt-Winters. There are several R implementations of it, though you still have to make it run in online mode to be real time.
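As a rough illustration of what "online mode" means for Holt-Winters, the sketch below applies the additive update equations one sample at a time and returns the one-step-ahead residual, which can then be thresholded. It is written in Python for illustration and is not taken from any R package. The class name, the smoothing constants, and the crude initialisation from a single season are assumptions.

```python
class OnlineHoltWinters:
    """Additive Holt-Winters updated one sample at a time."""

    def __init__(self, first_season, alpha=0.3, beta=0.05, gamma=0.2):
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.period = len(first_season)
        # Crude initialisation: level = mean of the first season, no trend.
        self.level = sum(first_season) / self.period
        self.trend = 0.0
        self.seasonal = [y - self.level for y in first_season]
        self.t = 0  # position of the next sample in the seasonal cycle

    def forecast(self):
        """One-step-ahead forecast for the next sample."""
        return self.level + self.trend + self.seasonal[self.t % self.period]

    def update(self, y):
        """Return the forecast residual for y, then update level, trend and season."""
        idx = self.t % self.period
        residual = y - self.forecast()
        prev_level = self.level
        self.level = (self.alpha * (y - self.seasonal[idx])
                      + (1 - self.alpha) * (prev_level + self.trend))
        self.trend = self.beta * (self.level - prev_level) + (1 - self.beta) * self.trend
        self.seasonal[idx] = (self.gamma * (y - self.level)
                              + (1 - self.gamma) * self.seasonal[idx])
        self.t += 1
        return residual


if __name__ == "__main__":
    model = OnlineHoltWinters([10.0, 12.0, 14.0, 12.0])
    for y in [10.0, 12.0, 14.0, 12.0, 30.0]:
        print(f"value={y} residual={model.update(y):.2f}")
```

On a perfectly periodic series the residuals stay at zero, so a sample that breaks the seasonal pattern stands out immediately in the residual.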
I gave a talk on this topic at the Big Data, Analytics & Applied Machine Learning - Israeli Innovation Conference last May. The video is at:
https://www.youtube.com/watch?v=SrOM2z6h_RQ
(DISCLAIMER: I am the chief data scientist for Anodot, a commercial company doing real time anomaly detection).

Simulation of LTE networks using Mininet

I am planning to do a project on WIFI offloading using Software Defined Networking. Basically, the idea is to switch the signals from WIFI to LTE and vice versa based on the signal strength. Could anybody let me know how I could simulate this and carry out certain experimental tests? I know there is a software called Mininet, and I am not sure if we can create base stations to simulate the experiments. Is it possible to simulate this using Mininet?
Thanks!
You can set up some link parameters such as bandwidth and latency in Mininet to "emulate" a wireless base station (LTE/WIFI). However, I'm not quite sure how you should emulate signal strength. You could of course write a program with a "node" that moves around on a 3-axis graph, where the x,y,z values could give some signal strength value based on the output power multiplied with the vector found in the graph, and when it reaches a threshold have it change to a node (Mininet link) that is closer, i.e. change the forwarding tables so your "stream"/tunnel uses another link.
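The idea of a moving node with a threshold-based switch can be sketched as follows. This is a toy model, not Mininet code. The function names, the log-distance signal model, the transmit power, and the threshold are all assumptions, and the step that would actually change the forwarding tables is left out.

```python
import math


def signal_strength_dbm(node_pos, station_pos, tx_power_dbm=20.0, exponent=3.0):
    """Toy model: transmit power minus a log-distance loss over the 3-D distance."""
    distance = max(math.dist(node_pos, station_pos), 1.0)  # clamp to avoid log10(0)
    return tx_power_dbm - 10.0 * exponent * math.log10(distance)


def choose_station(node_pos, stations, current, threshold_dbm=-40.0):
    """Stay on the current station until its signal drops below the threshold,
    then switch to whichever station is strongest at this position."""
    if signal_strength_dbm(node_pos, stations[current]) >= threshold_dbm:
        return current
    return max(stations, key=lambda name: signal_strength_dbm(node_pos, stations[name]))


if __name__ == "__main__":
    stations = {"wifi": (0.0, 0.0, 0.0), "lte": (120.0, 0.0, 0.0)}
    attached = "wifi"
    for x in (10.0, 60.0, 110.0):
        attached = choose_station((x, 0.0, 0.0), stations, attached)
        print(f"x={x}: attached to {attached}")
```

In an emulation, the point where choose_station returns a different name is where the controller would install new forwarding rules so that the traffic uses the other link.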
I did an undergraduate thesis on wireless network emulation in Mininet and used Mininet and ns-3 together using this project. We primarily did validity testing on this platform to determine its accuracy and limitations (especially at scale). The wireless model is very good until a very clear performance degradation when the CPU usage reaches (100/n)%, where n is the number of available cores on the machine (for a single-threaded implementation).
It also has the functionality to set up multiple base stations, although I didn't delve deeply into this beyond initial testing that it works. The main benefit of this tool is the accurate modelling of performance degradation due to distance from the source and interference.
A lot of time has passed and it seems better ways are available:
ns-3 has LTE features (it seems it got it through LENA)
for mininet-wifi, here, here and here
