Pentaho Kettle Download File from URL - HTTP

I want to download a file from a URL (e.g. http://www.webadress.com/service/servicedata?ID=xxxxxx).
I found the HTTP step for jobs, but it forces me to define a target file name instead of just accepting the file name the web download offers (e.g. ServiceData20200101.PDF).
Another problem is that it creates a file even when the web call doesn't actually supply one.
Is the REST Client or HTTP Client step in transformations able to download a file over a URL call and accept the file as is?

The HTTP steps in Pentaho are somewhat limited. In similar use cases in the past, I've handled this with an external shell script that takes arguments, calls wget or curl, and saves the result. Pentaho then picks up the file from the temp dir and processes it from there.
The Shell job step allows you to specify a script file and pass fields from the stream as arguments.
Note that if you paste shell commands directly into the step's second tab, they execute in the embedded shell, with older versions of curl and wget, and without your environment config and certificates/keys.
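As a minimal sketch of such a wrapper (assuming curl is on the PATH and that the server sends a Content-Disposition header carrying the file name), something like the following addresses both complaints: --fail makes curl exit non-zero without writing a file on an HTTP error, and -O -J saves the download under the server-supplied name.
#!/bin/sh
# download.sh URL TARGET_DIR -- hypothetical wrapper for the Shell job step
# --fail : write no output file and exit non-zero on HTTP errors
# -L     : follow redirects
# -O -J  : save under the name from the Content-Disposition header
cd "$2" && curl --fail -L -O -J "$1"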

Related

Download only new files with WinSCP

I am currently writing a WinSCP script whose aim is to retrieve all the files from an SFTP server and then put them in a specified location on a destination server (on which the script is located, FYI).
Is there any way to check whether a file has already been transferred to the destination server? Is it overwritten when it has? Is that really a bad thing? If the file already exists on the destination server, I would like nothing to happen; if it doesn't exist, I'd like to proceed with the transfer.
You will find the code written so far enclosed below:
# Automatically abort script on errors
option batch abort
# Disable overwrite confirmations that conflict with the previous
option confirm off
# Connect using a password
open sftp://SERVER#IP_ADDRESS:PORT -privatekey="PRIVATE_KEY" -hostkey="HOSTKEY" -passive=off
# Change remote directory
cd in
cd DIRECTORY
# Force binary mode transfer
option transfer binary
# Get ALL files from the directory specified
get /*.csv \\DIRECTORY
# Remove all .csv files
rm /*.csv
# Exit WinSCP
bye
Thank you very much in advance for your help. I hope this was clear enough; if not, please let me know and I can provide further information.
The easiest solution is to add the -neweronly switch to your get command:
get -neweronly /*.csv \\DIRECTORY
For very similar results, you can also use the synchronize command:
synchronize local \\DIRECTORY / -filemask=*.csv
See also the WinSCP article on Downloading the most recent file.

nginx - run shell/python script on wget request

I want to run a shell/Python script on each wget request to my nginx server.
For example, if I do wget http://<server>/text.txt?example=100
I want to call a script that generates a new file called text100.txt and returns it.
In other words, can I pass the GET params to the script and have it return arbitrary files so the wget client will download them?
Thank you!
What you're asking about is called CGI (traditionally served out of a cgi-bin directory). However, I wouldn't recommend using it: I'd advise against shell scripts for web serving altogether, and for Python I'd suggest one of the web micro-frameworks such as Flask.

Auto triggering a UNIX shell script

I have a main script called main.ksh (in /home/pkawar/folder), and its input file inputfile.xls (in /home/pkawar/folder/ipfile).
When I run main.ksh, it uses inputfile.xls and delivers its output to a mail address.
The inputfile.xls is loaded into /home/pkawar/folder/ipfile via ftp commands.
Is it possible to run main.ksh automatically, so that its output is sent via mail as soon as inputfile.xls has been loaded successfully?
The first option would be to use cron, but from your question it doesn't seem that you want to go that route.
My question would be: what is creating the *.xls file? Could whatever creates that file signal when it is finished and then call the shell script, or better yet, stream the file to the shell script on the fly?
The first thing you should do is write a script that does whatever it is you want done. If your script performs correctly, you can use cron via a crontab file to have the script executed on whatever schedule you desire.
See man crontab for details.
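As a minimal sketch (the schedule, the .processed suffix, and the wrapper name are assumptions), cron could run a small check-and-run script every few minutes that fires main.ksh only once the file has arrived:
#!/bin/sh
# check_and_run.sh -- hypothetical wrapper; runs main.ksh once per delivery
in=/home/pkawar/folder/ipfile/inputfile.xls
if [ -f "$in" ]; then
    /home/pkawar/folder/main.ksh    # mails its output, per the question
    mv "$in" "$in.processed"        # avoid processing the same file twice
fi
Installed with a crontab entry such as:
*/10 * * * * /home/pkawar/folder/check_and_run.sh
One caveat: the wrapper may spot the file while the FTP upload is still in progress; having the sender upload to a temporary name and rename on completion avoids that (see the scp question below).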

How do I scp a file to a Unix host so that a file polling service won't see it before the copy is complete?

I am trying to transfer a file to a remote Unix server using scp. On that server, there is a service which polls the target directory to detect incoming files for processing. I would like to ensure that the polling service does not pick up new files before the copy is complete. Is there a way of doing that?
My file transfer process is a simple scp command embedded in a larger Java program. Ideally, a solution which did not involve changing the Java would be best (for reasons involving change control processes).
You can scp the file to a different directory (e.g. /tmp) and move the file via ssh after the transfer is complete; see the sketch below. The temporary directory needs to be on the same partition as the final destination directory, otherwise the move becomes a copy operation and you'll face a similar problem. Another service on the destination machine can do this move operation.
You can transfer the file under a hidden name (prefix the filename with a .), then rename it once the copy is complete.
If you can modify the polling service, you can check for active scp processes and ignore files matching their arguments.
You can check for open files with lsof +d $directory and ignore them in the polling service.
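A minimal sketch combining the first two options (the host, paths, and filenames are assumptions); a rename within one partition is atomic, so the poller sees either nothing or the complete file:
# Hypothetical two-step transfer: upload to a hidden staging name on the
# same partition, then an atomic rename makes the file visible to the poller.
scp datafile.csv user@remotehost:/data/incoming/.datafile.csv.part
ssh user@remotehost 'mv /data/incoming/.datafile.csv.part /data/incoming/datafile.csv'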
I suggest copying the file using rsync instead of scp. rsync already copies new files to temporary filenames, and has many other useful features for file synchronization as well.
$ rsync -a source/path/ remotehost:/target/path/
Of course, you can also copy file-by-file if that's your preference.
If rsync's temporary filenames are sufficient to avoid being picked up by your polling service, then you could simply replace your scp command with a shell script that acts as a wrapper for rsync, eliminating the need to change your Java program.
You would need to know the precise format that your Java program uses to call the scp command, to make sure that the options you feed to rsync do what you expect.
You would also need to figure out how your Java program calls scp. If it does so by full pathname (i.e. /usr/bin/scp), then this solution might put other things at risk on your system that depend on scp (like you, for example, expecting scp to behave as it usually does instead of as a wrapper). Changing a package-installed binary like /usr/bin/scp may also "break" your package registration, making it difficult to install future security updates because a binary has changed to a shell script. And of course, there might be security implications to any change you make.
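A minimal sketch of such a wrapper, assuming the Java program invokes it as scp <localfile> user@host:/target/ (an argument form rsync also understands):
#!/bin/sh
# Hypothetical scp stand-in, placed where the Java program expects scp.
# rsync writes to a temporary dot-name and renames on completion, so the
# polling service never sees a partial file.
exec rsync -a -e ssh "$@"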
All in all, I suspect you're better off changing your Java program to make it do precisely what you want, even if that is to launch a shell script to handle aspects of automation that you want to be able to change in the future without modifying your Java.
Good luck!

Unix invoke script when file is moved

I have tons of files dumped into a few different folders. I've tried organizing them several times; unfortunately, there is no organizational structure that consistently makes sense for all of them.
I finally decided to write myself an application that I can use to add tags to files; the organization can then be tailored to the actual organizational structure.
I want to prevent orphaned data. If I move or rename a file, my tag application should be told about it so it can update the name in its database. I don't want it tagging files that no longer exist, or having to re-add tags for files that used to exist.
Is there a way to write a callback that hooks into the mv command, so that if I rename or move my files, it will invoke a script that notifies my app, which can then update its database?
My app is written in Ruby, but I am willing to play with C if necessary.
If you use Linux, you can use inotify (manpage) to monitor directories for file events. There is also a Ruby interface for inotify.
From Wikipedia, some of the events that can be monitored for are:
IN_ACCESS - read of the file
IN_MODIFY - last modification
IN_ATTRIB - attributes of file change
IN_OPEN and IN_CLOSE - open or close of file
IN_MOVED_FROM and IN_MOVED_TO - when the file is moved or renamed
IN_DELETE - a file/directory deleted
IN_CREATE - a file in a watched directory is created
IN_DELETE_SELF - file monitored is deleted
This does not work on Windows (and, I think, not on other Unices besides Linux either), as inotify does not exist there.
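As a minimal sketch using the inotifywait command from the inotify-tools package (an assumption that it is installed; the watched path and the handling are placeholders), a watcher that reports every move or rename under a tree could look like:
# Hypothetical watcher built on inotifywait (inotify-tools).
# -m monitors forever, -r recurses, and the two -e flags select rename events.
inotifywait -m -r -e moved_from -e moved_to --format '%e %w%f' /path/to/folders |
while read event path; do
    # notify the tagging app here; echoing is just a placeholder
    echo "$event $path"
done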
Can you control the PATH of your users? Place a script or executable in a directory that appears in the PATH before the standard mv command. Have this script do what you require and then call the real mv to perform the move.
Alternatively, add an alias to each user's profile and have the alias call your replacement mv command.
Or rename the existing mv command, place a replacement in the same directory, call it mv, and have it call the renamed original after doing what you want. Any of these variants boils down to a wrapper like the sketch below.
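A minimal sketch of such a wrapper (notify_tag_db stands in for whatever hook your tagging app exposes; it is an assumption, as is the real mv living at /bin/mv):
#!/bin/sh
# Hypothetical mv wrapper placed ahead of the real mv in the PATH.
notify_tag_db "$@"     # assumed hook that tells the tagging app about the move
exec /bin/mv "$@"      # hand off to the real mv with the original arguments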
