I used a Unix script to convert Oracle data to a CSV file and then export the CSV data to a Teradata database. It takes a long time to load. How can I use the piping concept in this Unix script?
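The key idea behind piping here is to connect the Oracle export directly to the Teradata load step, so loading starts while rows are still being exported and no intermediate CSV file is written to disk. A rough sketch of that idea using Python's subprocess module (the two script names are hypothetical stand-ins for whatever commands your existing script runs to spool Oracle rows and to feed the Teradata load utility):

import subprocess

# Start the Oracle export; its CSV rows go to stdout instead of a file.
export = subprocess.Popen(
    ["./export_oracle.sh"],      # hypothetical: writes CSV rows to stdout
    stdout=subprocess.PIPE,
)

# Feed that stream straight into the loader's stdin, so the Teradata load
# begins while the export is still producing rows.
load = subprocess.Popen(
    ["./load_teradata.sh"],      # hypothetical: reads CSV rows from stdin
    stdin=export.stdout,
)

export.stdout.close()   # let the loader see EOF when the export finishes
load.wait()

The same chaining can of course be written as a plain shell pipeline; the point is that the export and the load overlap instead of running one after the other.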
I want to build a Jenkins cron job that fetches data from Firebase in JSON format, converts that data into a CSV or XLSX file, and puts it into OneDrive.
I have managed to fetch the data from Firebase, but I am stuck on converting it to CSV and then saving it to OneDrive.
There are multiple ways to convert a JSON file to CSV, but since you are working with Jenkins and probably have a shell build step, you can use the jq utility. There are several existing answers that can help you:
Use jq to Convert json File to csv
How to convert json into csv file using jq?
If you are open to other options for the build step, you can also use Python, which makes it much easier.
You can use Python's pandas module to convert a JSON file to CSV with the code below.
import pandas as pd

# Read the JSON file into a DataFrame, then write it back out as CSV.
with open('data.json', encoding='utf-8') as stream:
    df = pd.read_json(stream)

df.to_csv('data.csv', encoding='utf-8', index=False)
Is there any way to unload data from Snowflake to XLS?
We are using Airflow to load data from Snowflake into an XLS file, or to convert from CSV to XLS.
If you are using Airflow, you could use Snowflake's Python Connector to load data from Snowflake into a pandas DataFrame and then use pandas' to_excel() function to write that data out to Excel.
https://docs.snowflake.com/en/user-guide/python-connector.html
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html
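A minimal sketch of that approach, assuming the snowflake-connector-python package is installed with its pandas extras and an Excel writer engine such as openpyxl is available; the connection values and query below are placeholders:

import pandas as pd
import snowflake.connector

# Placeholder credentials -- in Airflow these would come from a Connection.
conn = snowflake.connector.connect(
    user="MY_USER",
    password="MY_PASSWORD",
    account="MY_ACCOUNT",
    warehouse="MY_WH",
    database="MY_DB",
    schema="MY_SCHEMA",
)

try:
    cur = conn.cursor()
    cur.execute("SELECT * FROM MY_TABLE")    # placeholder query
    df = cur.fetch_pandas_all()              # result set as a pandas DataFrame
finally:
    conn.close()

# Write the DataFrame out to Excel.
df.to_excel("snowflake_export.xlsx", index=False)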
I'm working with h2o (latest version 3.26.0.10) on a Hadoop cluster. I've read in a parquet file from HDFS and have performed some manipulation on it, built a model, etc.
I've stored some important results in an H2OFrame that I wish to export to local storage, instead of HDFS. Is there a way to export this file as a parquet?
I tried using h2o.exportFile, documented here: http://docs.h2o.ai/h2o/latest-stable/h2o-r/docs/reference/h2o.exportFile.html, but all the examples write .csv. I tried a file path with .parquet as the extension and that didn't work: it wrote a file, but I think it was basically a .csv, since it was identical in size to the .csv output.
example: h2o.exportFile(iris_hf, path = "/path/on/h2o/server/filesystem/iris.parquet")
On a related note, if I were to export my H2OFrame to HDFS instead of local storage, would it be possible to write that in parquet format? I could at least then move that to local storage.
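One workaround sketch, if the frame fits in client memory: use h2o's Python API (rather than R) to pull the H2OFrame into pandas and write the parquet file locally. The frame name below is a stand-in for your results frame, and this assumes pyarrow (or fastparquet) is installed:

import h2o

h2o.init()

# 'results_hf' stands in for the H2OFrame holding the results to export.
results_hf = h2o.H2OFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Pull the frame into client memory as a pandas DataFrame.
# This only works if the frame is small enough to fit locally.
df = results_hf.as_data_frame(use_pandas=True)

# Write parquet on the local filesystem (pandas delegates to pyarrow/fastparquet).
df.to_parquet("results.parquet", index=False)

This does not settle whether exportFile itself can emit parquet, but it avoids the round trip through HDFS.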
I need to be able to merge RDS files so I do not have to run a SQL statement that takes 1 hour to run every day.
I have only been able to save and read separate RDS files. I am using R version 3.5.3 and I have not been able to find anything on updating files.
I tried merge(datafile1, datafile2) and this returned no data.
I have an Excel file (.xlsx) with multiple worksheets on a SharePoint server, which I need to read in Informatica and load into different tables.
Informatica is hosted on a Unix server.
Currently I am thinking of the workaround below, but I have some challenges:
1. Copy the Excel file to Unix. (Once I copy the file from the SharePoint server to Unix using the curl command, the file format changes to HTML. How can I retain the original Excel format? I can't install any Excel utility on our server.)
2. Convert the worksheets into multiple CSV files using some script. (How can I do this? As I mentioned earlier, I don't have any utilities like xls2csv or unoconv.)
3. Read the CSV files and load them into tables.
Please let me know if there is any better approach than this.
You can try using wget to download the Excel file (or set of files) from SharePoint to the Informatica file server location. It lets you specify the target directory and file name as well.
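For step 2, if Python happens to be available on the Unix server, a rough sketch with pandas could split the downloaded workbook into one CSV per worksheet without any Excel utility; this assumes the openpyxl package is installed, and the file name below is a placeholder:

import pandas as pd

# Placeholder path -- the workbook downloaded from SharePoint in step 1.
workbook = "downloaded_file.xlsx"

# sheet_name=None loads every worksheet into a dict of DataFrames keyed by sheet name.
sheets = pd.read_excel(workbook, sheet_name=None, engine="openpyxl")

# Write one CSV per worksheet for Informatica to pick up.
for sheet_name, df in sheets.items():
    df.to_csv(f"{sheet_name}.csv", index=False)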