Is it possible for a step or transformation in Pentaho Data Integration to call itself, passing the results of the previous call as parameters/variables?
My first thought was to create a loop in a transformation, but loops don't seem to be allowed...
If you want to do this at the transformation level, take a look at the sub-mapping step.
Of course, there are no loops as you describe, but you can always simulate one with a Generate Rows step.
Alternatively, you'd be best off asking this question on the Kettle forums (forums.pentaho.org), as there are very few Kettle dev folk on here.
First, I want to say that I'm a beginner in OCaml. I made a simple app that takes data from a JSON file, does some calculations or replaces some of the values with arguments from the command line, then writes another JSON file with the new data; it also substitutes those values into an HTML template and writes that out too. You can see my project here: https://github.com/ralcr/invoice-cmd/blob/master/invoice.ml
The question is how to deal with that number of variables. In the languages I know I would probably repeat myself twice, but here it's more like six times. Thanks for any advice.
First of all, I would like to note that Code Review on Stack Exchange is probably a better place to post such questions, as this is more a question about design than about the language.
I have two suggestions for improving your code. The first is to use string maps (or hash tables) to store your variables. The second, much more radical, is to rewrite the code in a more functional way.
Use maps
In your code, you're doing a lot of pouring the same water from one bucket into another without doing any actual work. The first thing that comes to mind is whether this is necessary at all. When you parse JSON definitions into a set of variables, you do not actually reduce complexity or enforce any particular invariants. Basically, you're confusing data with code: these variables are data that you're processing, not part of the logic of your application. So the first step would be to store them in a string map. Then you can easily process a big set of variables with fold and map.
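For instance, a minimal sketch with the standard library's Map (the field names here are invented, not taken from your project):

    (* Keep the fields in a string map instead of one variable each.
       Build with: ocamlfind ocamlopt -package str -linkpkg sketch.ml *)
    module Vars = Map.Make (String)

    let vars =
      Vars.empty
      |> Vars.add "client" "ACME Inc."
      |> Vars.add "amount" "1200"
      |> Vars.add "currency" "EUR"

    (* Substitute every {key} placeholder from the map into a template. *)
    let render template vars =
      Vars.fold
        (fun key value acc ->
          Str.global_replace (Str.regexp_string ("{" ^ key ^ "}")) value acc)
        vars template

    let () =
      print_endline (render "Invoice for {client}: {amount} {currency}" vars)

Adding a field is then one more entry in the map, not a new variable threaded through every function.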
Use functions
Another approach is not to store the variables at all, and to express everything as stateless transformations on the JSON data. Your application looks like a JSON processor, so I don't really see any reason why you should first read everything into memory and only later produce the result. It is more natural to process the data on the fly and express your logic as a set of small transformations. Try to split everything into small functions, so that each individual transformation can be easily understood, then compose your transformation from the smaller parts. This is the functional style, where the flow of data is explicit.
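A toy sketch of what that style looks like (the record, its fields, and the rate are all made up):

    (* Each step is a small pure function; the program is their composition. *)
    type invoice = { client : string; net : float; vat : float }

    let apply_vat rate inv = { inv with vat = inv.net *. rate }

    let override_client name inv =
      match name with Some n -> { inv with client = n } | None -> inv

    let to_line inv =
      Printf.sprintf "%s: net %.2f, VAT %.2f, total %.2f"
        inv.client inv.net inv.vat (inv.net +. inv.vat)

    let () =
      { client = "ACME Inc."; net = 1200.0; vat = 0.0 }
      |> apply_vat 0.19
      |> override_client (Some "ACME GmbH")
      |> to_line
      |> print_endline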
I know that it is possible to export/import an h2o model that was previously trained.
My question is: is there a way to transform an h2o model into a non-h2o one (one that works in plain R)?
I mean that I don't want to launch the h2o environment (JVM), since I know that predicting with a trained model is simply multiplying matrices, applying an activation function, etc.
Of course it would be possible to extract the weights manually, but I want to know if there is a better way to do it.
I do not see any previous posts on SO about this problem.
No.
Remember that R is just the client, sending API calls: the algorithms (those matrix multiplications, etc.) are all implemented in Java.
What they do offer is a POJO, which is what you are asking for, but in Java. (POJO stands for Plain Old Java Object.) If you call h2o.download_pojo() on one of your models you will see it is quite straightforward. It may even be possible to write a script to convert it to R code? (Though it might be better, if you were going to go to that trouble, to convert it to C++ code, and then use Rcpp!)
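For example (where model is one of your trained models; the path is just an illustration, and if I remember correctly, omitting it prints the Java source to the console):

    library(h2o)
    # Writes <ModelName>.java: a dependency-free Java class that
    # reimplements this model's scoring logic.
    h2o.download_pojo(model, path = "/tmp")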
Your other option, in the case of deep learning, is to export the weights and biases, implement your own activation function, and use them directly.
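A rough sketch of that route, assuming a deep learning model with a single tanh hidden layer and a linear (regression) output; h2o.weights()/h2o.biases() take the layer index, and you may need to transpose depending on how the matrices come back:

    # Pull the parameters out of the fitted model (one call per layer).
    W1 <- as.matrix(h2o.weights(model, matrix_id = 1))
    b1 <- as.matrix(h2o.biases(model, vector_id = 1))
    W2 <- as.matrix(h2o.weights(model, matrix_id = 2))
    b2 <- as.matrix(h2o.biases(model, vector_id = 2))

    # Plain-R forward pass: x is a numeric vector of (already scaled) inputs.
    predict_plain <- function(x) {
      h <- tanh(W1 %*% x + b1)   # hidden layer (activation is an assumption)
      W2 %*% h + b2              # linear output
    }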
But, personally, I've never found the Java side to be a bottleneck, either from the point of view of dev ops (install is easy) or computation (the Java code is well optimized).
I need to know whether the training data passed in the neuralnet call is randomized by the routine, or whether the routine uses the data in the order given. I really need this information for a project I am working on, and I have not been able to figure it out by looking at the source.
Thanks!
Look into the code - that's one of the most important advantages of FOSS: you can actually check what it is doing (neuralnet is pure R, so you don't even need to fear having to dig into FORTRAN or C code, and you can use debug to step through the code with example data to get an overview).
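For example, using the infert data from the package's own documentation:

    library(neuralnet)
    print(neuralnet)   # pure R: the whole routine is readable source
    debug(neuralnet)   # the next call drops into the step-through debugger
    nn <- neuralnet(case ~ age + parity + induced + spontaneous,
                    data = infert, hidden = 2)
    undebug(neuralnet)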
Moreover, you can even introduce, for example, a new parameter that lets you switch off randomization if needed.
Possibly maintainer("neuralnet") would be willing to help you as well (and able to answer much faster than almost anyone else here on SE).
Is it possible to update a plot every 2 seconds, for example?
Or, even better, to just call a function that will update the plot given new x, y values?
Additional Information -
I am developing a neural network, and would like to update a line chart showing the output vs the targets after each iteration.
Many thanks
How are you creating the neural network? It may be possible to insert code into what you are already doing that would update your plot.
There are functions in the tcltk2 package that will run code after specified waiting times and will allow other functions to run while waiting, but these can be very dangerous in creating race conditions or changing objects that other code depends on. You will still need a way to access the network information as it is being created (which is very difficult if it is inside another function), and this will probably also slow the fitting code down a bit, as it needs to keep checking the time and doing the other calculations.
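A sketch of that timer route, assuming tcltk2's tclTaskSchedule() API (a wait time in milliseconds, an expression, and a redo flag); the x and y it redraws stand for whatever your fitting code exposes:

    library(tcltk2)

    # Redraw every 2 seconds from globals the fitting code updates.
    # Beware the race conditions mentioned above.
    update_plot <- function() plot(x, y, type = "l")

    tclTaskSchedule(2000, update_plot(), id = "plotTask", redo = TRUE)
    # ...and when you are done:
    tclTaskDelete("plotTask")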
It is probably best to insert the update code into the fitting code rather than depending on timing. If you show us more about how you are fitting the network (a reproducible example), then we may be able to give a more detailed answer.
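For instance, with a hand-rolled training loop it could look like this (the "training step" below is faked so the sketch runs on its own; substitute your real weight update):

    # Redraw an output-vs-target line chart once per iteration.
    targets <- sin(seq(0, 2 * pi, length.out = 50))
    outputs <- rnorm(50)

    for (i in 1:100) {
      outputs <- outputs + 0.1 * (targets - outputs)  # stand-in for a real step
      plot(targets, type = "l", ylab = "value", main = paste("Iteration", i))
      lines(outputs, col = "red")
      Sys.sleep(0.1)                                  # let the device redraw
    }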
I wonder about the idea of representing and executing programs as graphs: some kind of stackless model where each node in the graph represents a function and the edges represent arguments to the functions. In this way a function doesn't return its result to its caller, but passes the result as an argument to another function node. Total nonsense? Or is it just a state machine in disguise? Are there any actual implementations of this anywhere?
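To make the idea concrete, here is a tiny sketch (OCaml; all names are mine) where each node hands its result to the next node instead of returning it:

    (* Each node takes its argument plus its outgoing edge (a continuation)
       and never returns to its caller; it forwards the result instead. *)
    let double x next = next (x * 2)
    let add_one x next = next (x + 1)
    let sink x = Printf.printf "result: %d\n" x  (* terminal node *)

    (* Wire the edges and push a value through: prints "result: 11". *)
    let () = double 5 (fun r -> add_one r sink)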
This sounds a lot like a State machine.
I think Dybvig's dissertation Three Implementation Models for Scheme does this with Scheme.
I'm pretty sure the first model is graph-based in the way you mean. I don't remember whether the third model is or not. I don't think I got all the way through the dissertation.
For JavaScript you might want to check out Node-RED (visual) or jsonflow (JSON).