I am brand new to Airflow. I need to run a raw PostgreSQL script to create a file, and then make changes to that file using Python code, in a single workflow. How can I make this happen?
Thanks!
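For what it's worth, a minimal sketch of one way to chain the two steps, assuming Airflow 2.x with the Postgres provider package installed (the DAG id, connection id, file paths, and task bodies are illustrative assumptions, not a definitive implementation):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.providers.postgres.operators.postgres import PostgresOperator

    def edit_file():
        # Placeholder for the Python step that edits the file the SQL step produced.
        with open("/tmp/report.csv", "a") as f:
            f.write("edited by the python task\n")

    with DAG(
        dag_id="sql_then_python",
        start_date=datetime(2023, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        run_sql = PostgresOperator(
            task_id="run_sql",
            postgres_conn_id="my_postgres",  # assumed connection id
            sql="sql/create_file.sql",       # assumed path to the raw SQL script
        )
        modify_file = PythonOperator(task_id="modify_file", python_callable=edit_file)

        run_sql >> modify_file  # run the SQL first, then the Python step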
I have a Python script that I need to execute and automate via IICS (Informatica Intelligent Cloud Services). The output of the script is a CSV file, and this output should be loaded to the Target. How can I achieve this via Informatica Cloud? Please help with some info and documentation on the same.
Thanks
There are two ways to do this.
You can create an executable (using py2exe or a similar tool) from your .py script, then put that file on the Informatica Cloud agent server and call it using a shell command. Note that with this approach you do not need to install Python or any packages on the agent server.
You can also put the .py file on the agent server and run it from the shell, e.g. $PYTHON_HOME/python your_script.py. You need to make sure the Python version is compatible and that all required packages are installed on the agent server.
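For reference, a minimal sketch of the kind of .py script such a shell command would invoke (the output path and columns are assumptions; the real script would produce whatever CSV your Target load expects):

    import csv

    rows = [
        ("id", "name"),  # header row
        (1, "alice"),
        (2, "bob"),
    ]

    # Write the CSV that the downstream Informatica task will pick up.
    with open("/opt/agent/output/result.csv", "w", newline="") as f:
        csv.writer(f).writerows(rows)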
You can refer to the screenshot below for how to set up the shell command. Then you can run it as part of a workflow, and schedule it if needed.
https://i.stack.imgur.com/wnDOV.png
I'm new to Databricks. I am trying to access a .R file that is present in DBFS storage, but I cannot figure out how to do so. Any help is really appreciated.
I can read data from the storage using the /dbfs file path, and I can also source the script, but I want to make edits to the script.
You need an editor to do that. For example, you can set up RStudio on your cluster and connect to it via the RStudio UI; in that case you can edit R files directly on DBFS.
But really, the simplest option would be to use the Databricks CLI fs command to copy the file to your local machine, make changes in the editor of your choice, and upload the file back.
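A hedged sketch of that round trip, driven from Python with the legacy Databricks CLI (the DBFS path and file name are assumptions; you can equally run the two databricks fs cp commands directly in a shell):

    import subprocess

    # Download the script from DBFS to the local machine.
    subprocess.run(
        ["databricks", "fs", "cp", "dbfs:/scripts/analysis.R", "analysis.R"],
        check=True,
    )

    # ... edit analysis.R in the editor of your choice ...

    # Upload the edited script back, overwriting the original on DBFS.
    subprocess.run(
        ["databricks", "fs", "cp", "--overwrite", "analysis.R", "dbfs:/scripts/analysis.R"],
        check=True,
    )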
Is it possible to run an R script with Pentaho, but instead of exporting the result as a CSV file, insert the result directly into a table in a database?
Using the Community Edition of Pentaho, you could use a script executor step to execute a shell script in your OS that does all the work, including inserting into the database. This is not very Pentaho-related: all the work is done by the script, and you just use Pentaho to call its execution (a sketch of the idea follows below).
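As a hedged sketch of what such a script could look like (Python here; the R script name, the two-column CSV it prints, the table, and the use of SQLite are all assumptions, and the same idea works with any database driver): it runs the R computation, captures its CSV output, and inserts the rows directly into a table.

    import csv
    import io
    import sqlite3
    import subprocess

    # Run the R script and capture the CSV it prints to stdout.
    result = subprocess.run(
        ["Rscript", "compute.R"],
        capture_output=True, text=True, check=True,
    )
    rows = list(csv.reader(io.StringIO(result.stdout)))

    # Insert the rows (skipping the header) straight into the target table.
    conn = sqlite3.connect("warehouse.db")
    conn.execute("CREATE TABLE IF NOT EXISTS results (metric TEXT, value REAL)")
    conn.executemany("INSERT INTO results (metric, value) VALUES (?, ?)", rows[1:])
    conn.commit()
    conn.close()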
There's also a very old plugin available on GitHub, which I don't know whether it works with modern versions of Pentaho and R, that executes R code within Pentaho and then continues the stream of data to "normal" steps, such as a Table output step that inserts the data into a table.
These are the details to configure that plugin from the developers:
http://dekarlab.de/wp/?p=5
I just started looking into Flyway via the command-line route, and was wondering whether it is possible to reuse a .sql file.
Example:
I have a file called V1__Create_user.sql which has
CREATE USER ${user_name} WITH PASSWORD '${pass}';
GRANT readaccess TO ${user_name};
It looks like I can only use this SQL file once, that is, when I run the command below:
flyway -placeholders.user_name=test_user -placeholders.pass=test migrate
When I run the above command again with a different user name and password for the placeholders, no changes were made.
So, I was wondering if there's a way to reuse that SQL file instead of generating new SQL files containing the same query over and over?
The way Flyway works is that it runs the scripts you provide in the order in which you provide them. It puts a marker in the database showing which scripts it has already run. Then, when you rerun it, it will only run the scripts that it has not yet run: V2__Whatever, V2.2__Something, etc.
You can't go back and modify an existing script and expect the tool to pick up that it has changed and then rerun it, even if that script uses placeholders.
That doesn't apply to repeatable scripts (stuff like view definitions), which run every time you deploy using Flyway. If you had to, you could make this a repeatable script and run it that way. But that only works as long as every deployment is incremental: 1 to 2, 2 to 3, 3 to 4. As soon as you need to deploy from version 1 straight to 4, you can't pass multiple sets of placeholder values in.
Since every dialect of SQL in relational data stores I'm aware of is a declarative language, the best way to deal with it is to use it as such. Yes, that means statements like CREATE USER get repeated over and over. However, since each user created is unique, that's just how it works.
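If you do end up stamping out one versioned script per user, a small hedged sketch of automating that (Python here; the sql/ directory and version numbering are assumptions, while the statements come from the question's V1__Create_user.sql):

    from pathlib import Path

    TEMPLATE = (
        "CREATE USER {user} WITH PASSWORD '{password}';\n"
        "GRANT readaccess TO {user};\n"
    )

    def write_user_migration(version: int, user: str, password: str) -> None:
        # Each user gets its own versioned migration, which Flyway runs exactly once.
        path = Path("sql") / f"V{version}__Create_user_{user}.sql"
        path.parent.mkdir(exist_ok=True)
        path.write_text(TEMPLATE.format(user=user, password=password))

    write_user_migration(2, "test_user2", "test2")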
I want to execute R code from an SSIS package. How can I add a data control step that executes R code? SSIS scripting supports only VB.NET and C#.
SSIS has many data transformations available, but R is very friendly when it comes to data manipulation.
I want to run R code from SSIS scripts or in some other way. Basically, I'm trying to integrate R into the ETL process.
I want to extract (E) data from a CSV file, transform (T) it in R, and load (L) it into a Microsoft database.
Is it possible to get this workflow done in an SSIS package by executing an R script using SSIS data control items? Thanks!
Here are a couple of ways you could integrate R into your ETL process.
Crude, fast and dirty - an Execute Process Task in the Control Flow. This is similar to calling RScript from the command line. You would likely run your transformation, save the result to a file on disk, and get that filename from your Execute Process Task so you can feed it into a Data Flow Task. The upside is that you keep your R clean and separate from your C#/VB.
Integrated via RDotNet - you could use the RDotNet library (I believe; I haven't tried to integrate it). You would need to register the DLLs in the GAC, and then you can either work with .NET objects in your SSIS scripts or call R scripts directly.
Integrated in SQL Server 2016 - Microsoft has added R support via extended stored procedures. You call the R script via a stored procedure, use a SQL query for the input data, and can store the output. In SSIS this would mean using an Execute SQL Task.
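For illustration, a hedged sketch of the T-SQL involved, wrapped in Python via pyodbc (the DSN and the toy R script are assumptions; an Execute SQL Task in SSIS would run the same EXEC statement):

    import pyodbc

    conn = pyodbc.connect("DSN=mydb")  # assumed ODBC DSN for a SQL Server 2016+ instance
    cursor = conn.cursor()

    # sp_execute_external_script runs the R code server-side, feeding it the
    # result of the input query as InputDataSet and returning OutputDataSet.
    cursor.execute("""
        EXEC sp_execute_external_script
            @language = N'R',
            @script = N'OutputDataSet <- InputDataSet * 2',
            @input_data_1 = N'SELECT 1 AS x';
    """)
    print(cursor.fetchall())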
I hope this helps you or someone else. Since you want data processing, you could bring your dataset into a CSV file (through a Data Flow Task), then execute your R file with "Rscript yourfile.R" (it can be run as a command with the Execute Process Task). Inside the file you load the dataset into a data frame (with read.csv(), for example), do all the math/calculations you need, and write the data or calculation results to a CSV file, which you then read again from SSIS.
It is not an elegant solution, but it works :), at least until Microsoft integrates R as a control/data flow process.
CYA
PS: here is how to execute files from the command line: Run R script from command line