My script takes 29 hours to run on my current machine.
I have access to a remote Ubuntu machine with a more powerful CPU which might speed up the calculations.
Is there an easy way to transfer the results of an Rscript run on the remote machine into my local R session?
I can only think of saving the results to a csv file and then importing that csv file again locally.
Related
I have the following problem. At work I have a data pipeline that transforms raw data and loads it into a cloud database for various projects. There are Python scripts for the project-specific transformations, but every step must be done manually (defining the transformer's project-specific inputs, running the transformer, loading the data).
I want to automate this process with Airflow, so I implemented the steps above as Python tasks. The Airflow instance runs on a machine that must reach a network drive where the raw data and the transformer scripts are located. The required connection type is Samba.
I managed to connect to the drive and create a SambaHook object:
from typing import Final
from airflow.providers.samba.hooks.samba import SambaHook
samba_file_share: Final[SambaHook] = SambaHook(connection_id, file_share_name)
In one task, I need to run the transformer script. In an earlier solution (without Samba) I used Popen, which worked fine. Now that I must use Samba, I face the following problem.
I build the path to the transformer script by taking the root folder of the file share from the Samba object and joining the transformer's path to it:
samba_file_share._join_path(transformer_path)
When I print it, the path is correct and the network drive is available. But when I pass it to Popen (as a string, a byte string, or a path-like object), I get the error "No such file or directory".
Can anyone help with this? How can I pass the path to Popen to run the script, or should I use something other than Popen? The Samba documentation is very incomplete, and I have not found anything there so far.
Thanks,
Marci
This automated Airflow solution works perfectly on a machine that can access the network drive directly.
However, that setup is only for development; in production the pipeline must run on another machine that has no direct access to the drive. There I must use Samba to connect to it, and that breaks everything.
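The _join_path result appears to be an SMB path of the form //host/share/... that only the smbclient library behind SambaHook understands; the local operating system treats it as an ordinary, nonexistent local path, which would explain the "No such file or directory" error from Popen. One possible workaround is to copy the script through the hook into a local temporary directory and run that copy. This is a minimal sketch, assuming your version of the Airflow Samba provider exposes SambaHook.open_file(path, mode=...) (which resolves path relative to the share itself, so you would pass the relative transformer_path rather than the _join_path result) and that the transformers are Python scripts; run_script_from_share is a hypothetical helper name.

```python
import subprocess
import sys
import tempfile
from pathlib import Path, PurePosixPath


def run_script_from_share(hook, relative_path: str, *args: str) -> subprocess.CompletedProcess:
    """Copy a Python script from an SMB share to a local temp dir and run it.

    `hook` is assumed to be an Airflow SambaHook, or any object with an
    open_file(path, mode=...) method that resolves `relative_path` on the share.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        # Keep only the file name; accept both "/" and "\" as separators.
        file_name = PurePosixPath(relative_path.replace("\\", "/")).name
        local_script = Path(tmp_dir) / file_name
        # Read the script through the SMB client, because the local OS
        # cannot open //host/share/... paths directly.
        with hook.open_file(relative_path, mode="rb") as remote_file:
            local_script.write_bytes(remote_file.read())
        # Run the local copy with the same interpreter as the Airflow worker.
        # check=True raises CalledProcessError if the script exits non-zero.
        return subprocess.run(
            [sys.executable, str(local_script), *args],
            capture_output=True,
            text=True,
            check=True,
        )
```

If the transformer imports other modules from the share or reads files relative to its own location, copying a single file is not enough; those files would have to be copied into the temporary directory the same way.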
I have a remote MonetDB server running, and I want to bulk-upload a CSV file because it is much faster.
Based on the parameters in MonetDB.R, there is a csvdump=TRUE option, but I don't think it works against a remote server; the server has to be local.
https://rdrr.io/github/MonetDB/monetdb-r/man/dbWriteTable.html
First, am I correct that this is not possible, and if so, is there a workaround? I have a data frame with more than 5M rows, so loading it with INSERT statements takes much longer than using COPY INTO.
When I try csvdump=TRUE against the remote server, the server can't find the CSV file because the file is local to the computer that called dbWriteTable.
I think you are right. As a workaround, either use explicit COPY INTO ... ON CLIENT SQL statements, or first copy the file to the remote server with a file transfer tool before calling dbWriteTable.
From MonetDB's documentation on COPY INTO:
FROM files ON SERVER
With ON SERVER, which is the default, the file name must be an absolute path on the system on which the database server (mserver5) is running. ...
Interestingly, pymonetdb, the Python driver for MonetDB, supports ON CLIENT for bulk loads. From pymonetdb's documentation:
File Uploads and Downloads
Classes related to file transfer requests as used by COPY INTO ON CLIENT.
You might want to file an issue with the MonetDB R driver project asking for the same behavior as pymonetdb.
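Until the R driver supports this, the ON CLIENT route can be driven from Python. This is a minimal sketch, assuming a recent pymonetdb release that includes the file transfer classes quoted above (connect(), Connection.set_uploader() and SafeDirectoryHandler), a target table that already exists, and a CSV file without a header row, using comma separators, double-quoted strings and empty fields for NULLs (from R, for example, write.table(df, "data.csv", sep = ",", row.names = FALSE, col.names = FALSE, na = "")). The helper names are hypothetical, and the delimiter clause should be checked against the COPY INTO documentation for your server version.

```python
from pathlib import Path


def copy_into_on_client_sql(table: str, filename: str) -> str:
    """Build a COPY INTO ... ON CLIENT statement for a headerless, comma-separated file."""
    quoted_table = '"' + table.replace('"', '""') + '"'    # SQL identifier quoting
    quoted_file = "'" + filename.replace("'", "''") + "'"  # SQL string-literal quoting
    return (
        f"COPY INTO {quoted_table} FROM {quoted_file} ON CLIENT "
        "USING DELIMITERS ',', E'\\n', '\"' NULL AS ''"
    )


def bulk_load_on_client(csv_path, table: str, **connect_kwargs) -> None:
    """Let the remote server pull a local CSV file through the client connection."""
    import pymonetdb  # imported lazily so the SQL helper above works without the driver

    csv_path = Path(csv_path).resolve()
    conn = pymonetdb.connect(**connect_kwargs)
    try:
        # Only files inside the CSV's own directory may be requested by the server.
        conn.set_uploader(pymonetdb.SafeDirectoryHandler(str(csv_path.parent)))
        cursor = conn.cursor()
        # The file name is resolved relative to the handler's directory.
        cursor.execute(copy_into_on_client_sql(table, csv_path.name))
        conn.commit()
    finally:
        conn.close()
```

A call might look like bulk_load_on_client("data.csv", "mytable", hostname="db.example", database="demo", username="monetdb", password="..."), where the host and credentials are placeholders.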
I have an R script saved on my local computer. It is a computationally intensive script that downloads data from the Internet and saves it as CSV files on the local disk.
I was wondering whether I can run this script on Kaggle and save the data as CSV files in Google Drive. I also intend to run the script every day at a scheduled time.
Is this a good idea? How can I deploy this script on Kaggle and have it run on a schedule?
Any pointers would be highly appreciated.
I am working in a remote Jupyter notebook hosted on an internal server. I want to save my findings on my local computer for further analysis.
Example:
Suppose this is the final data I have after the analysis.
I want to write it to my local disk,
but the code below writes the data to the server's D:\ drive:
data.to_csv(r'D:\Team.csv')
This is the sample code for saving a PNG image of the model I have trained:
# plot the model and save the image to a file
from tensorflow.keras.utils import plot_model
plot_model(model, to_file="model.png", show_shapes=True, show_layer_names=True)
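Code in a remote notebook always runs on the server, so any path passed to to_csv or plot_model refers to the server's file system. One way to get a file onto the local machine is to let the browser download it: the sketch below embeds the bytes in a base64 data: URI inside an HTML link. It is a minimal sketch, assuming the notebook front end renders HTML output (for example via IPython.display.HTML); download_link_html is a hypothetical helper name.

```python
import base64
import html


def download_link_html(payload: bytes, filename: str, mime_type: str) -> str:
    """Return an HTML <a> tag that makes the browser save `payload` as `filename`."""
    encoded = base64.b64encode(payload).decode("ascii")
    safe_name = html.escape(filename, quote=True)
    return (
        f'<a download="{safe_name}" href="data:{mime_type};base64,{encoded}">'
        f"Download {safe_name}</a>"
    )


# In a notebook cell (requires IPython):
# from IPython.display import HTML, display
# display(HTML(download_link_html(data.to_csv(index=False).encode("utf-8"), "Team.csv", "text/csv")))
# with open("model.png", "rb") as f:
#     display(HTML(download_link_html(f.read(), "model.png", "image/png")))
```

Because the file is embedded in the notebook output, this suits small results; for large files it is usually simpler to write them on the server and download them through the Jupyter file browser.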
I use Excel + R on Windows on a rather slow desktop. I have full admin access to a very fast Ubuntu-based server. How can I execute commands on the server remotely?
What I can do is save the needed variables with saveRDS, load them on the server with readRDS, execute the commands there, and then save the results and load them back on Windows.
But this is all very interactive and manual, and can hardly be done on a regular basis.
Is there any way to do all of this directly from R, along these lines:
Connect to the server via, e.g., SSH,
Transfer the needed objects (which can be specified manually)
Execute given code on the server and wait for the result
Get the result.
I could run the whole R session remotely, but that would create network-related problems: most R commands I run from within Excel are very fast and data-hungry. I only need to execute some specific commands remotely, not all of them.
Here is my setup.
Copy your code and data over using scp. (I use GitHub, so I clone my code from GitHub; this has the benefit of making my work reproducible.)
(optional) Use sshfs to mount the remote folder on your local machine. This lets you edit the remote files with your local text editor instead of on the SSH command line.
Put everything you want to run in an R script on the remote server, then run it via ssh in R batch mode.
There are a few options. The simplest is to set up SSH key-based authentication (exchange keys) so you don't have to enter SSH/SCP passwords manually every time. After that, you can write a simple R script that will:
Save necessary variables into a data file,
Use scp to upload the data file to the Ubuntu server,
Use ssh to run remote script that will process the data (which you have just uploaded) and store the result in another data file
Use scp again to transfer the results back to your workstation.
You can use R's system() function to run scp and ssh with the necessary options.
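The upload, run and download steps above can be sketched as follows. The code is Python rather than R so that it is self-contained and testable, but it issues the same scp and ssh command lines you would pass to R's system() or system2(). It assumes key-based login is already set up (the -o BatchMode=yes option makes ssh and scp fail instead of prompting for a password), that Rscript is on the server's PATH, and that the remote script, which is hypothetical here, takes its input and output file paths as command-line arguments (e.g. via commandArgs(trailingOnly = TRUE)). Saving the variables with saveRDS before, and reading the result with readRDS after, would still happen in your local R session.

```python
import shlex
import subprocess
from typing import List

# Fail instead of prompting when key-based login is not set up.
SSH_OPTS = ["-o", "BatchMode=yes"]


def remote_round_trip(
    host: str,
    local_input: str,
    remote_input: str,
    remote_script: str,
    remote_output: str,
    local_output: str,
    dry_run: bool = False,
) -> List[List[str]]:
    """Upload a data file, run an R script on `host`, and download its result.

    Returns the three commands; with dry_run=True they are only built, not run.
    """
    commands = [
        # Upload the saved variables (e.g. an .rds file written with saveRDS).
        ["scp", *SSH_OPTS, local_input, f"{host}:{remote_input}"],
        # Run the script remotely; ssh receives one shell-quoted command string.
        ["ssh", *SSH_OPTS, host,
         shlex.join(["Rscript", remote_script, remote_input, remote_output])],
        # Download the result file back to the workstation.
        ["scp", *SSH_OPTS, f"{host}:{remote_output}", local_output],
    ]
    if not dry_run:
        for command in commands:
            # check=True stops the sequence as soon as one step fails.
            subprocess.run(command, check=True)
    return commands
```

With dry_run=True the function only returns the command lines, which is a convenient way to check host names and paths before anything is copied or executed.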
Another option is to set up a cluster worker on the remote machine; then you can export the data using clusterExport and evaluate expressions using clusterEvalQ and clusterApply.
There are a few more options:
1) You can do this directly from R by using Rserve. See: https://rforge.net/
Keep in mind that Rserve can accept connections from R clients; see, for example, how to connect to Rserve with an R client.
2) You can set up a cluster on your Linux machine and then use it from your Windows client. The simplest option is snow (https://cran.r-project.org/package=snow); also see foreach and the many other cluster packages.