How to unzip a '.zip' file uploaded to Azure Data Lake Store?

I have a few zipped files uploaded to my ADLS folder and I want to unzip them. I do not have access to download those files, and without unzipping I cannot view the contents of the zipped files. How do I unzip the files? I tried using ADF but it fails with 'unauthorized access'. I think I will have to use some custom code, but I am unable to figure it out.

There are 3 options to do it:
Use Data Factory to unzip the files with the copy activity (it has native support for zip files).
Create your own batch activity on Azure Batch that accesses the Data Lake and unzips the file (a minimal Python sketch of this approach is shown below).
Create a custom extractor that unzips the file and reads and writes it line by line.
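
For option 2 (or any standalone script run from an Azure Batch custom activity), a minimal Python sketch could look like the following. It assumes the azure-datalake-store package (ADLS Gen1) and a service principal with access to the store; the tenant/client credentials, store name and paths are placeholders to replace with your own.

# Sketch: unzip a .zip stored in ADLS without first downloading it to a local disk.
# Assumes the azure-datalake-store package and a service principal with access to the store.
import zipfile
from azure.datalake.store import core, lib

# Placeholder credentials and store name - replace with your own values.
token = lib.auth(tenant_id='<tenant-id>', client_id='<client-id>', client_secret='<client-secret>')
adl = core.AzureDLFileSystem(token, store_name='<adls-store-name>')

# Open the zipped file directly from the lake and write each member back into the lake.
with adl.open('/input/archive.zip', 'rb') as remote_zip:
    with zipfile.ZipFile(remote_zip) as archive:
        for name in archive.namelist():
            if name.endswith('/'):  # skip directory entries
                continue
            with archive.open(name) as member, adl.open('/output/' + name, 'wb') as target:
                target.write(member.read())  # reads each member fully into memory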

Related

WinSCP command line for uploading file from folder named with current date

Our bank just changed the way in which we upload and download files to them. Previously we could log in to a secured website, choose a directory, and upload/download manually. Everything now has to be done through SFTP, using FileZilla or a similar program.
I want to automate SFTP upload process by using WinSCP.
I realize I will need to use the put command to upload. The file I want to upload is generated every day and the file name is exactly the same, but the folder it is uploaded from changes. The directory structure is as follows:
C:\Finance\FY 2021\YYYYMMDD\file.txt
My question is: what would the upload command look like to upload this file on a daily basis? The upload will always take place on the same day, so the folder name will always be the current date in the above format.
Can these commands be contained within and run from a batch file rather than creating a batch file that merely points to a scripted txt file to run? Thanks for your help!
A follow-up question for handling of the FY YYYY part:
Use WinSCP to upload from a folder with a fiscal year in its name to an SFTP server
WinSCP has the %TIMESTAMP% syntax, which you can use to refer to the folder with today's date in its name.
And yes, you can specify WinSCP commands directly in the batch file using the /command parameter:
winscp.com /ini=nul /command ^
"open sftp://username:password#ftp.example.com/ -hostkey=""...""" ^
"put ""C:\Finance\FY 2021\%%TIMESTAMP#yyyymmdd%%\file.txt"" ""/remote/path/""" ^
"exit"

Where do locally downloaded files go on Cloud Composer?

I have a DAG that downloads a file from Cloud Storage and saves it to the following path: /home/airflow/gcs/data/FILENAME.txt
This file then appears in the Cloud Composer storage bucket under /data.
However, when I originally wrote the DAG I didn't specify the download location to be: /home/airflow/gcs/data/ and simply had it downloading the file in place. I would like to go delete those files but I don't know where to find them.
Where do downloaded files in Cloud Composer reside when you don't specify the folder?
It looks like you don't need to worry about cleanup from when you first wrote the DAG - if you're using the gcs_download_operator, then according to its source code, if you did not specify a value for the filename parameter, the downloaded file won't be stored on the local file system.
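
For reference, here is a minimal sketch of a DAG that pins the download location so the file lands under the Composer bucket's /data folder. It assumes the older contrib GoogleCloudStorageDownloadOperator that gcs_download_operator exposes (newer provider releases call it GCSToLocalFilesystemOperator and use an object_name parameter); the bucket and object names are placeholders.

# Sketch: download a GCS object into the folder Composer syncs with gs://<composer-bucket>/data/.
from datetime import datetime
from airflow import DAG
from airflow.contrib.operators.gcs_download_operator import GoogleCloudStorageDownloadOperator

with DAG('download_example', start_date=datetime(2021, 1, 1), schedule_interval=None) as dag:
    download_file = GoogleCloudStorageDownloadOperator(
        task_id='download_file',
        bucket='my-source-bucket',                       # placeholder bucket
        object='path/to/FILENAME.txt',                   # placeholder object
        filename='/home/airflow/gcs/data/FILENAME.txt',  # appears under /data in the Composer bucket
    )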

How can I read an Excel file (xlsx) that is on a SharePoint server directly from Informatica PowerCenter?

I have an Excel file (xlsx) with multiple worksheets on a SharePoint server, which I need to read in Informatica and load the data into different tables.
Informatica is hosted on a Unix server.
Currently I am thinking of the workaround below, but I have challenges here:
1. Copy the Excel file onto the Unix server. (Once I copy the file from the SharePoint server to Unix using the "curl" command, the format of the file changes to HTML. How can I retain the original Excel format? I can't install any Excel utility on our server.)
2. Convert it into multiple CSV files using some script. (How can I do this? As I mentioned earlier, I don't have any utilities like xls2csv or unoconv.)
3. Read the CSV files and load them into tables.
Please let me know if there is any better approach than this.
You can try using wget to download the Excel file (or set of files) from SharePoint to the Informatica file server location. It allows you to specify the target directory and file name as well.
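
As for step 2 of the workaround: if a Python interpreter with the openpyxl package happens to be available on the Unix server (an assumption, since the question notes that extra utilities cannot be installed), splitting the workbook into one CSV per worksheet could be sketched roughly like this; the file paths are placeholders.

# Sketch: write each worksheet of an .xlsx workbook to its own CSV file.
# Assumes Python with the openpyxl package; paths are placeholders.
import csv
from openpyxl import load_workbook

wb = load_workbook('/path/to/file.xlsx', read_only=True, data_only=True)
for sheet_name in wb.sheetnames:
    ws = wb[sheet_name]
    with open(f'/path/to/output/{sheet_name}.csv', 'w', newline='') as out:
        writer = csv.writer(out)
        for row in ws.iter_rows(values_only=True):
            writer.writerow(row)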

How to upload CSV files to GitHub repo and use them as data for my R scripts

I'm currently doing a project that uses R to process some large csv files that are saved in my local directory linked to my repo.
So far, I managed to create the R project and commit and push R scripts into the repo with no problem.
However, the scripts read in the data from the csv files saved in my local directory, so the code takes the form
df <- read.csv("mylocaldirectorylink")
However, this is not helpful because my partner and I, who are working on the same project, have to change that URL to our own local directories every time we pull from the repo. So I was thinking that maybe we can upload the csv files to the GitHub repo and let the R script refer directly to the csv files online.
So my questions are:
Why can't I upload csv files onto GitHub? They keep saying that my file is too large.
If I can upload the csv files, how do I read the data from these csv files?
Firstly, it's generally a bad idea to store data on GitHub, especially if it's large. If you want to save it somewhere on the Internet, you can use, say, Dataverse, and then access your data via a URL (through the API), or Google Drive, as Jake Kaupp suggested.
Now back to your question. If your data doesn't change, I would use relative paths to the CSV rather than absolute ones. In other words, instead of
df<-read.csv("C:/folder/subfolder/data.csv")
I would use
df <- read.csv("../data.csv")
If you are working with an R project, then the initial working directory is the project folder. You can check it with getwd(). This working directory moves with the R project. Just agree with your colleague that the data file should sit in the same folder that contains the R project folder.
This is for a Python script.
You can make Git track csv files by editing your .gitignore file (remove any pattern that excludes them).
OR
You can add the csv files to your GitHub repo so that they can be used by others.
I did so by following these steps:
Check out the branch on github.com.
Go to the folder where you want to keep the csv files.
There you will see an "Add file" option in the top-right area.
Use it to upload the csv files and commit the changes to the same branch or by creating a new branch.

Symfony2 - Upload, zip & encrypt a file once uploaded to the server

I have been implementing an entity in Symfony 2.2 in order to upload files to my server. I successfully followed the steps listed in
http://symfony.com/doc/current/cookbook/doctrine/file_uploads.html
However, I need to implement an additional feature: saving the file along with the entity, but not the original one, rather the zipped & encrypted one, the same as if I had zipped it on the Linux command line and then uploaded the generated zip file. That is, when the form asks me to select the file, I choose it as normal, but on the server a zip containing that file would be stored instead of the file itself. Of course, when downloading I want the zip as well, so the name stored in the table has to be that of the zip file.
I guess it could be accomplished using system calls, allowing PHP to execute a zip command on the file, but I cannot figure out exactly how. Any help?
