I am trying to run a job in Airflow that executes a Dataflow job. I realized there are two operators, BeamRunPythonPipelineOperator and DataFlowPythonOperator. Both can submit jobs to Dataflow, but I am unsure how they differ.
Is there any difference between them? Any help would be highly appreciated.
DataFlowPythonOperator was deprecated and replaced by DataflowCreatePythonJobOperator which was then deprecated and replaced by BeamRunPythonPipelineOperator.
TL;DR: use BeamRunPythonPipelineOperator as of July 2022.
Xterm is used when running Corda locally on one computer using Gradle.
Is there a way to specify your own terminal emulator when running, as suggested in the following issue?
https://github.com/corda/corda/issues/2605
I completely share your pain on this. The way runnodes has its tooling baked in makes it impossible to customize how the cordform plugin runs the nodes without digging into the internals.
Some other ideas for you:
One thing you could do is stop using cordform altogether and run your Corda network using dockerform (example here: https://github.com/corda/samples-java/blob/master/Features/dockerform-yocordapp/build.gradle#L93) so that the plugin doesn't need to create new terminals at all.
The much harder way would be to download the corda-gradle-plugins (https://github.com/corda/corda-gradle-plugins#installing-locally) and install them locally with your edits to the cordform task so that it opens the terminal of your choice. You may even be able to PR your changes, as the cordform task that generates the runnodes script comes from that repository, as far as I know.
As a separate note, I saw your GitHub issue and I was disappointed by how it was handled. I'm sorry you had that experience, and I'm going to dig into that issue internally to find out what's happening with it.
Feel free to reach out to me (David Awad) on slack.corda.net and I can let you know what's going on there.
Thanks as always
I am using unshare(CLONE_FILES) on Linux to separate file descriptor tables. Is there a similar system call on FreeBSD?
(Edit: as mentioned in a comment, it seems rfork_thread won't work in this case.) I have tried rfork_thread(RFFDG|RFTHREAD, malloc(8000000), &myRoutine, arg), but it returns 0 and no thread is created. As stated in the manual, rfork_thread has been deprecated in favor of pthread_create, and I didn't find sample code that uses this system call.
Thanks in advance for providing any clues on how to achieve this on FreeBSD.
Is it possible to run an R script as an Airflow DAG? I have tried looking online for documentation on this and am unable to find any. Thanks
There doesn't seem to be an R operator right now.
You could either write your own and contribute to the community, or simply run your task as a BashOperator calling Rscript.
Another option is to containerize your R script and run it using the DockerOperator, which is included in the standard distribution. This removes the need to have your worker nodes configured with the correct version of R and any needed R libraries.
Use the BashOperator to execute R scripts.
For example:
opr_hello = BashOperator(task_id='xyz', bash_command='Rscript Pathtofile/file.r')
There is a pull request open for an R operator; it is still waiting to be merged.
https://github.com/apache/incubator-airflow/pull/3115/files
I want to find the rc.boot script file on an AIX system and modify something in it.
How do I find it? Thanks
I would suggest not modifying the rc.boot script on AIX:
Very few services are started at that point in the boot process, so it is very easy to introduce something that results in the system not fully booting.
It may also be replaced by updates from IBM without warning, wiping out your changes.
Follow the method from comp.unix.aix. This set-up, or something similar to it, has been used at all the AIX shops I have worked at over the last 20 years. I currently use it on 50+ servers (except it is called rc.server instead of rc.local). Placing it in /etc/inittab as illustrated (after rc.nfs) ensures that NFS services are up and running when your script(s) are run.
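For reference, a sketch of the classic inittab entry, using mkitab rather than editing /etc/inittab by hand; the identifier and script path here are the conventional ones from that set-up and may differ at your site:

```
# Insert an rc.local entry after the rcnfs entry so NFS is up first
mkitab -i rcnfs "rclocal:2:wait:/etc/rc.local > /dev/console 2>&1"
```

Your local changes then live in /etc/rc.local (or rc.server), which IBM updates will not touch.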
Cheers
I Googled rc.boot and the first result was documentation from IBM.
It is located at /sbin/rc.boot.
I am about to implement a scheduled kind of task and am not sure whether to implement it as a Windows service or with the Windows Task Scheduler.
The use case: one executable is deployed on the machine attached to the scanner. Every five minutes, the exe reads the scanned files from a specified folder and uploads them to the server.
What would be the best solution for this use case?
Thanks
Use a scheduled task. A Windows service would have to be specially written, and this is perfectly suited for a simple job which runs at 5-minute intervals.
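Creating such a task from an elevated command prompt might look like this; the task name and exe path are hypothetical placeholders:

```
schtasks /Create /SC MINUTE /MO 5 /TN "ScannerUpload" /TR "C:\Tools\ScannerUpload.exe"
```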
You'll find some good comparisons here:
windows service vs scheduled task
Personally, I would use the Windows service because it is easier to troubleshoot, locate the logs, and restart if necessary.