How to run Python files with Airflow using Samba connection - airflow

I have the following problem. I have a data pipeline at work that transforms raw data and loads it into a cloud database for various projects. There are Python scripts for the project-based transformations, but everything must be done manually (defining the transformer's project-based inputs, running the transformer, loading the data).
I want to automate this process with Airflow. I created the above steps as tasks in Python. The Airflow instance is running on some computer, which must reach a network drive, where the raw data and the transformer scripts are located. The required connection type is Samba.
I managed to connect to the drive and create a SambaHook object:
samba_file_share: Final[object] = SambaHook(connection_id, file_share_name)
In one task, I need to call and run the transformer script. With a former solution (without Samba) I used Popen, which worked fine. However, I must use Samba now, and I face the following problem.
I have the path of the transformer script by reading out the root folder of the file share from the Samba object, and join the path of the transformer to it:
samba_file_share._join_path(transformer_path)
If I print this out, the path is correct, and the network is available. If I feed it to Popen as a string (or as a byte string or path-like object), I get the error "No such file or directory".
Can anyone help with this? How can I feed it to Popen to run the script, or should I use something other than Popen to run it? The Samba documentation is very incomplete; I could not find anything there so far.
Thanks,
Marci
This automated Airflow solution works perfectly if I connect from a machine that can easily access the network drive.
However, that is only for development; in production it must run on another machine that has no direct access to the drive. I must use Samba to connect to it, and that breaks everything.
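One possible direction, sketched under assumptions: Popen resolves its argument against the local filesystem, so a path that only exists on the share will not be found unless the share is mounted on the worker. A workaround is to copy the script to a local temporary file first and run that copy. In the sketch below, `run_remote_script` and its parameters are hypothetical names; `open_remote` stands for any callable that opens a file on the share, for example the hook's `open_file` method, assuming the installed Samba provider version exposes it.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import BinaryIO, Callable, Sequence


def run_remote_script(
    open_remote: Callable[[str, str], BinaryIO],
    remote_path: str,
    args: Sequence[str] = (),
) -> subprocess.CompletedProcess:
    """Copy a script from a remote share to a local temp dir, then run it.

    open_remote is an assumption: any callable taking (path, mode) and
    returning a binary file object, e.g. a Samba hook's file opener.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_copy = Path(tmp_dir) / Path(remote_path).name
        # Stream the remote file to local disk so the OS can execute it.
        with open_remote(remote_path, "rb") as src, open(local_copy, "wb") as dst:
            shutil.copyfileobj(src, dst)
        # Run the local copy with the current interpreter and capture output.
        return subprocess.run(
            [sys.executable, str(local_copy), *args],
            capture_output=True,
            text=True,
            check=False,
        )
```

Inside a task this might be called as `run_remote_script(samba_file_share.open_file, transformer_path)`. If the transformer imports sibling modules or reads the raw data by relative path, those files would need to be copied as well, or the script adapted to read its inputs through the hook.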

Related

How to encrypt a lua script and have it be able to run with a LuaJIT executor

I want to make a protected Lua script [for a game] that can be run via an external program. This means I don't want anyone else to see the source code. The external program is a Lua wrapper
Seraph is a ROBLOX Lua script execution exploit. It uses a wrapper to emulate a real ROBLOX scripting environment. It can run scripts in an elevated level 7 thread, allowing you to access API functions and change properties that are normally not accessible. It also contains re-implementations of missing functions such as loadstring and GetObjects, and bypasses various security checks, such as the URL trust check in the HttpGet/HttpPost functions.
They recently implemented LuaJIT and I thought this might help. If it can only be run by LuaJIT wrappers that would be awesome!
--PS I know basic lua but can't write anything cool.
--PPS It needs to be able to have a password pulled from an online database
Since I'm not familiar with ROBLOX, I'm just going to focus on this part of your question:
This means I don't want anyone else to see the source code.
For this, you will want to use Lua's bytecode dumping facilities. When executing a script, Lua first compiles it to bytecode, then executes said bytecode in the VM. LuaJIT does the same thing, except that it has a completely different VM & execution model. The important thing is that LuaJIT is compatible with standard Lua 5.1 at the language and API level and still offers the same bytecode dumping facility (string.dump). Its bytecode format differs from standard Lua's, though, so the dump must be produced by the same implementation that will load it.
So, the best 'protection' you can have for your code is to compile it on your end, then send and execute only the compiled, binary version of it on the external machine.
Here's how you can do it. First, you use this code on your machine to produce a compiled binary that contains your game's bytecode:
local file = io.open('myGame.bin', 'wb')
file:write(string.dump(loadfile('myGame.lua')))
file:close()
You now have a compiled version of your code in 'myGame.bin'. This is essentially as 'safe' as you're going to get.
Now, on your remote environment where you want to run the game, you transfer 'myGame.bin' to it, and run the compiled binary like so:
local file = io.open('myGame.bin', 'rb')
local bytecode = file:read('*all')
file:close()
loadstring(bytecode)()
That will effectively run whatever was in 'myGame.lua' to begin with.
Forget about passwords / encryption. Luke Park's comment was poignant. When you don't want someone to have your source, you give them compiled code :)

CFileFind::FindFile and network paths

I have a dll that opens a file for processing. It attempts to find the file with the FindFile() function. I also have a service that calls the dll, and here is the problem: when the path to the file is a network path, FindFile() fails to find it, but only when called from the service. If I call it directly from my application, it finds the file. I'm sure the FindFile() function gets the same parameters in both cases, as I write them to a log file. The parameter looks like this:
"\\SERVER\SERVER_USERS\USERX\TEST.TXT"
I know this is 6 months after the question, but I figured I'd answer it anyway ... Usually, it is a permissions thing. If the service does not have access to the network folder, then it won't find anything. Many services run as a local system account by default, and that account doesn't have built-in access to network files. So try making sure the service is running as an account that has access to the network folder in question.

QSQLDatabase (using SQLite) takes long time to open a database

I have developed an application with Qt which uses an SQLite database. A copy of the database is located at each site.
At one site, let's say site 'BOB1', it works perfectly without any problem. But when we try to use it at another site, let's say 'BOB2', it takes a long time to open a database connection (approx. 2000 milliseconds).
I thought that perhaps there was a network problem, so they tried to use the server of site 'BOB1' as their server, which works fine. But when I tried to use the server of site 'BOB2' from site 'BOB1', I had the same problem. So I thought it might not be a network issue.
Another thing that came to my mind was that perhaps there is a DNS resolution problem. But when I tried to ping the server using its IP and its hostname, the response time was the same.
Any idea or pointer as to what the problem could be?
PS: The server + database file path is specified in the setDatabasePath() function using environment variables.
Consider copying the database to the local machine (e.g. a temp folder if transient, or another suitable location if permanent). You can safely use either a file copy, or consider using the SQLite backup API to ensure that the transfer happens successfully (plus you get the option of progress feedback):
https://sqlite.org/backup.html
You could even "backup" the file from the remote server to in-memory if the file is small and you say you're reading only?
You can see some sample code here on how to import an sqlite DB into a Qt QSqlDatabase. Note that when you do this, you want to make sure the version of sqlite native API that you're using is the same as that compiled into Qt, or you may get error messages from sqlite or Qt.
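As a rough illustration of the "backup to in-memory" idea, here is a minimal sketch in Python rather than Qt/C++, since Python's standard `sqlite3` module exposes the same SQLite backup API through `Connection.backup`. The function name `load_into_memory` is hypothetical, and the sketch assumes the database is small enough to fit in memory and is only read afterwards.

```python
import sqlite3
from pathlib import Path


def load_into_memory(db_path: str) -> sqlite3.Connection:
    """Copy an on-disk (possibly remote) SQLite file into an in-memory database."""
    # Open the source read-only via a file: URI so a missing file raises
    # an error instead of silently creating an empty database.
    source_uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    source = sqlite3.connect(source_uri, uri=True)
    try:
        target = sqlite3.connect(":memory:")
        # Connection.backup wraps the SQLite online backup API.
        source.backup(target)
    finally:
        source.close()
    return target
```

After the copy, all queries run against the in-memory connection, so the slow open and any per-query network round trips are paid only once. Changes made to the in-memory copy are not written back to the original file.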

How to easily execute R commands on remote server?

I use Excel + R on Windows on a rather slow desktop. I have a full admin access to very fast Ubuntu-based server. I am wondering: how to remotely execute commands on the server?
What I can do is save the needed variables with saveRDS, load them on the server with readRDS, execute the commands on the server, and then save the results and load them on Windows.
But it is all very interactive and manual, and can hardly be done on regular basis.
Is there any way to do the stuff directly from R, like
Connect with the server via e.g. ssh,
Transfer the needed objects (which can be specified manually)
Execute given code on the server and wait for the result
Get the result.
I could run the whole of R remotely, but then it would cause network-related problems. Most R commands I run from within Excel are very fast and data-hungry. I just need to remotely execute some specific commands, not all of them.
Here is my setup.
Copy your code and data over using scp. (I used github, so I clone my code from github. This has the benefit of making sure that my work is reproducible)
(optional) Use sshfs to mount the remote folder on your local machine. This allows you to edit the remote files using your local text editor instead of ssh command line.
Put all things you want to run in an R script (on the remote server), then run it via ssh in R batch mode.
There are a few options. The simplest is to exchange secure keys to avoid entering SSH/SCP passwords manually all the time. After this, you can write a simple R script that will:
Save necessary variables into a data file,
Use scp to upload the data file to ubuntu server
Use ssh to run remote script that will process the data (which you have just uploaded) and store the result in another data file
Again, use scp command to transfer the results back to your workstation.
You can use R's system command to run scp and ssh with necessary options.
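The upload, run, download sequence above can be sketched as follows. This is written in Python instead of R purely for illustration; the same three commands could be passed to R's system command. The host name, directory layout, file names, and the helper names `build_round_trip` and `run_round_trip` are all hypothetical, and key-based SSH login is assumed to be set up already.

```python
import shlex
import subprocess
from typing import List


def build_round_trip(
    host: str,
    local_input: str,
    remote_dir: str,
    remote_script: str,
    local_output: str,
) -> List[List[str]]:
    """Build the scp/ssh/scp commands for one remote R batch run."""
    remote_input = f"{remote_dir}/input.rds"    # hypothetical file name
    remote_output = f"{remote_dir}/output.rds"  # hypothetical file name
    # The remote command is a single string interpreted by the remote shell,
    # so each part is quoted defensively.
    remote_command = " ".join(
        shlex.quote(part)
        for part in ("Rscript", remote_script, remote_input, remote_output)
    )
    return [
        ["scp", local_input, f"{host}:{remote_input}"],    # upload the data
        ["ssh", host, remote_command],                     # run the R script
        ["scp", f"{host}:{remote_output}", local_output],  # fetch the result
    ]


def run_round_trip(commands: List[List[str]]) -> None:
    """Run the commands in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

The remote R script is assumed to read its input and output paths from its command-line arguments (for example via commandArgs), load the input with readRDS, and write the result with saveRDS.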
Another option is to set up a cluster worker on the remote machine; then you can export the data using clusterExport and evaluate expressions using clusterEvalQ and clusterApply.
There are a few more options:
1) You can do the stuff directly from R by using Rserve. See: https://rforge.net/
Keep in mind that Rserve can accept connections from R clients, see for example how to connect to Rserve with an R client.
2) You can set up a cluster on your Linux machine and then use the cluster facilities from your Windows client. The simplest is to use snow, https://cran.r-project.org/package=snow; also see foreach and many other cluster libraries.

Is it better to execute a file over the network or copy it locally first?

My winforms app needs to run an executable that's sitting on a share. The exe is about 50MB (it's a setup.exe type of file). My app will run on many different machines/networks with varying speeds (some fast, but some awfully slow, like barely 10baseT speeds).
Is it better to execute the file straight from the share or is it more efficient to copy it locally and then execute it? I am talking in terms of annoying the user the least.
Locally is better. A copy will read each byte of the file a single time, no more, no less. As you execute, you may revisit code that is out of cache, etc and gets pulled again.
As a setup program, I would assume that the engine will want to do some kind of CRC or other integrity check too, which means it's reading the entire file anyway.
It is always better to execute it locally than running it over the network.
If your application is small and does not need to load many different resources at runtime, then it is OK to run it over the network. It might even be preferable, because if you run it over the network the code is read (downloaded and loaded into memory) once, as opposed to manually downloading the file and then running it, which takes two reads of the code. For example, you can run a clock widget application over the network.
On the other hand, if your application does read a lot of resources at runtime, then it is absolutely a bad idea to run it over the network, because each read of a resource will go over the network, which is very slow. For example, you probably don't want to be running Eclipse over the network.
Another factor to take into consideration is how many concurrent users will be accessing the application at the same time. If there are many, you should copy the application to the local machine and run it from there.
I believe the OS always copies the file to a local temp folder before it is actually executed. There are no round trips from/to the network after it gets a copy; it only happens once. This is sort of like how a browser works: it first retrieves the file, saves it locally, then runs it off of the local temp location where it saved it. In other words, there is no need to copy it manually unless you want to keep a copy for yourself.

Resources