Is it possible to update a plot every 2 seconds, for example?
Or, even better, to just call a function that will update the plot given the new x, y values?
Additional Information -
I am developing a neural network, and would like to update a line chart showing the output vs the targets after each iteration.
Many thanks
How are you creating the neural network? It may be possible to insert code into what you are already doing that would update your plot.
There are functions in the tcltk2 package that will run code after specified waiting times and will allow other functions to run while waiting, but these can be very dangerous, creating race conditions or changing objects that other code depends on. You will still need a way to access the network information as it is being created (and this is very difficult if it is inside another function); it will probably also slow the fitting code down a bit, as it needs to keep checking the time and doing the other calculations.
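For reference, the timer approach looks roughly like this; here latest_targets and latest_outputs are assumed to be global objects that your fitting code keeps updated:

    library(tcltk2)

    # Re-run the plotting expression every 2000 ms until the task is deleted.
    tclTaskSchedule(2000, {
      plot(latest_targets, type = "l")
      lines(latest_outputs, col = "red")
    }, id = "plotUpdater", redo = TRUE)

    # Stop the timer once fitting is finished:
    tclTaskDelete("plotUpdater")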
It is probably best to insert the update code into the fitting code rather than depending on timing. If you show us more about how you are fitting the network (a reproducible example), we may be able to give a more detailed answer.
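As a rough illustration, assuming a hand-rolled training loop (the net-related function names below are hypothetical placeholders for whatever your fitting code does):

    # Redraw the targets-vs-outputs line chart; call once per iteration.
    update_plot <- function(targets, outputs) {
      plot(targets, type = "l",
           ylim = range(c(targets, outputs)),
           xlab = "case", ylab = "value")
      lines(outputs, col = "red")
      legend("topleft", c("target", "output"), col = c("black", "red"), lty = 1)
      Sys.sleep(0)  # yield briefly so the graphics device actually redraws
    }

    for (i in seq_len(n_iterations)) {
      net     <- train_one_iteration(net, x, y)  # hypothetical fitting step
      outputs <- predict_net(net, x)             # hypothetical prediction step
      update_plot(y, outputs)
    }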
I am looking for a way to automate an iterative data-comparison process until all data packages are consistent. My first thought was to use something like Apache Airflow, but the iterative nature seems to make this a cyclic graph, and Airflow only allows DAGs (directed acyclic graphs). Since I do not have much knowledge of Airflow, I am a bit lost and would appreciate some expert knowledge here.
Current status: I am in a position where I regularly need to compare data packages for consistency and manually communicate errors to and between two different parties.
On the one hand there is a design dataset, and on the other hand there are measured datasets. Both involve many manual steps by different parties, so if an inconsistency occurs, I contact one party or the other and the error is removed manually. There are also regular changes to both datasets that can introduce new errors into already-checked datasets.
I suspect this process has not been automated yet because the datasets are not directly comparable; some transformations need to happen in between. I automated this transformation process over the last few weeks, so all that needs to be done now on my side is to run the script and communicate the errors.
What I need now is a tool that orchestrates my script against the correct datasets and contacts the relevant people as long as errors exist. If something changes or is added, the script needs to be run again.
My first guess was to create a workflow in Apache Airflow, but this iterative process looks to me like a cyclic graph, which is not allowed in Airflow. Do you have any suggestions, or is this a common problem for which Airflow solutions also exist?
I think one way to solve your problem would be to have a DAG workflow for the main task of comparing the datasets and sending notifications, and then a periodic task in cron, Quartz, etc. that triggers that DAG. You are correct that Airflow does not allow cyclic workflows.
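Sketched in R, with hypothetical compare_datasets() and notify_parties() functions standing in for your script, each scheduled run stays acyclic and the loop lives in the scheduler:

    # One acyclic run: compare once, notify once.
    run_once <- function() {
      errors <- compare_datasets(design_data, measured_data)  # hypothetical
      if (length(errors) > 0) notify_parties(errors)          # hypothetical
      invisible(length(errors) == 0)  # TRUE once everything is consistent
    }

    # The cycle lives outside the workflow: cron (or an Airflow schedule)
    # simply reruns run_once() until it keeps reporting consistency.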
I worked on Cylc, a cyclic-graph workflow tool. Cyclic workflows (or workflows with loops) are very common in areas such as Numerical Weather Prediction, or NWP (the reason Cylc was created), and also in other fields such as optimization.
In NWP workflows, some steps may be waiting for datasets, and the workflow may stall and send notifications if the data is not as expected (e.g. some satellite imaging data is missing, and the tides model output file is missing as well).
Also, in production, NWP models run multiple times a day, either because you have new observation data, or new input data, or maybe because you want to run ensemble models, etc. So you end up with multiple runs of the workflow in parallel, where the workflow manager is responsible for managing dependencies, optimizing the use of resources, sending notifications, and more.
Cyclic workflows are complicated; that's probably why most implementations opt to support only DAGs.
If you'd like to try Cylc, the team has been trying to make it more generic so that it's not specific to NWP only. It has a new GUI, and the input format and documentation were improved with ease of use in mind.
There are other tools that support cyclic workflows too, such as StackStorm and Prefect, and I am currently checking whether Autosubmit supports them as well. Take a look at these tools if you'd like.
If you are in the life sciences, or are interested in reproducible workflows, the CWL standard also has some ongoing discussion about adding support for loops, which I reckon could let you achieve something akin to what you described.
I'm setting up an algorithm in R that performs tasks on data streamed from a websocket. I have successfully implemented a connection to a websocket (using the package https://github.com/rstudio/websocket), but my algorithm does not perform optimally because the data received from the websocket is processed linearly (some important tasks get delayed because less important ones are triggered before them). As the tasks could easily be divided, I am wondering:
1) whether it would be possible to run two websocket connections simultaneously, provided that there is a single data frame (as a global variable) that gets updated in one instance and is accessible in the other?
2) is it possible to check the queue from the websocket and prioritize certain tables?
3) I am also considering a solution with two separate R sessions, but I am not sure whether there is a way to access data that gets updated in real time in another R session. Is there a workaround that does not involve saving a table in one session and loading it in the other?
I have already tried this with async (https://github.com/r-lib/async) without much success, and I have also tried the 'Jobs' panel in newer versions of RStudio.
I would be happy to provide some code, but at this point I think it might be irrelevant, as my question is more about extending the code than fixing it. I am also aware that Python probably offers an easier solution, but I would still like to exhaust every option that R offers.
Any help would be greatly appreciated!
In the end I was able to accomplish this by running two separate R instances and connecting them using R sockets.
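A rough sketch of that setup with base R's socketConnection(), assuming a simple one-JSON-line-per-message protocol (the port number and the protocol are my own choices):

    ## Session 1 (server): receives data frame updates as they arrive
    server <- socketConnection(host = "localhost", port = 6011,
                               server = TRUE, blocking = TRUE, open = "r+")
    repeat {
      line <- readLines(server, n = 1)
      if (length(line) == 0) break          # peer closed the connection
      df <- jsonlite::fromJSON(line)        # rebuild the data frame
      # ... prioritized processing of df goes here ...
    }
    close(server)

    ## Session 2 (client): streams updates to the server session
    client <- socketConnection(host = "localhost", port = 6011,
                               server = FALSE, blocking = TRUE, open = "r+")
    writeLines(as.character(jsonlite::toJSON(head(iris))), client)
    close(client)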
I would like to offend the elder gods and use parallel::mcfork (or something like it), with minimal starting knowledge, understanding that there are hidden dangers that may fall on my head. Maybe the behavior I'm hoping for is foolish or impossible, but I didn't think it could hurt too much to ask.
What I want to do is load some data into the workspace and then work with it from two separate interactive sessions, with no intent for those sessions to communicate with each other. I can call parallel::mcfork(estranged = TRUE) and see that there is another R session with a distinct PID. What I haven't been able to do is figure out how to connect to it from an interactive session. I tried using reptyr, but only got a message saying that both PIDs have a sub-process and I can't attach to them.
Is it possible to accomplish this aim? If so, how?
For what purpose? I have a largish dataset that takes a while to load. Now that I'm using Ubuntu, I've noticed that I can do parallel processing on this large dataset at a much lower cost in RAM and time than when I was using a Windows machine (i.e. mclapply vs parLapply). Now I have this large dataset... but I don't know quite what I might want to do with it in terms of analysis. What I do know is that a measurable amount of time is going to pass between my issuing a command and the result. Having loaded the data, I'd like to analyze it in two separate interactive sessions so that I can pursue the lines of reasoning that seem most fruitful, without having laid out a plan in advance or being stuck waiting and manually monitoring the results of mcparallel. Incidentally, mcparallel provides some hopeful-looking options, e.g. mc.interactive, but yields similar errors with reptyr as before.
I need to know whether the training data passed in the neuralnet call is randomized inside the routine, or whether the routine uses the data in the order given. I really need this information for a project I am working on, and I have not been able to figure it out by looking at the source.
Thanks!
Look into the code - that's one of the most important advantages of FOSS: you can actually check what it is doing (neuralnet is pure R, so you don't even need to fear digging into FORTRAN or C code, and you can use debug to step through the code with example data to get an overview).
Moreover, you can even introduce e.g. a new parameter that allows you to switch off the randomization if needed.
Possibly maintainer("neuralnet") would be willing to help you as well (and able to answer much faster than almost anyone else here on SE).
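A quick sketch of that inspection workflow, using the package's standard infert example:

    library(neuralnet)

    # Step through the fitting code interactively to see whether and
    # where the training data get reordered.
    debug(neuralnet)
    data(infert)
    fit <- neuralnet(case ~ age + parity + induced + spontaneous,
                     data = infert, hidden = 2)
    undebug(neuralnet)

    # Address of the person most likely to know the answer offhand.
    maintainer("neuralnet")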
I wonder about the idea of representing and executing programs using graphs: some kind of stackless model where each node in the graph represents a function and the edges represent arguments to the functions. In this way a function doesn't return the result to its caller, but passes the result as an argument to another function node. Total nonsense? Or maybe it is just a state machine in disguise? Any actual implementations of this anywhere?
This sounds a lot like a state machine.
I think Dybvig's dissertation Three Implementation Models for Scheme does this with Scheme.
I'm pretty sure the first model is graph-based in the way you mean. I don't remember whether the third model is or not. I don't think I got all the way through the dissertation.
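To make the model concrete, here is a tiny sketch in R (the language most of this thread uses), where each node passes its result forward to the next node instead of returning it to a caller:

    # Each "node" takes a value plus the next node (a continuation)
    # and never returns a result to its caller.
    square_node <- function(x, next_node) next_node(x * x)
    add1_node   <- function(x, next_node) next_node(x + 1)
    print_node  <- function(x) cat("result:", x, "\n")  # terminal node

    # Wire the graph: square -> add1 -> print
    square_node(3, function(v) add1_node(v, print_node))
    # result: 10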
For JavaScript you might want to check out node-red (visual) or jsonflow (JSON).