Write a CSV file to a folder on a remote computer - r

I would like to write a CSV file to a folder on a remote computer identified by user@ip-address and running a different OS (e.g. my current OS is iOS and the remote OS is Ubuntu 19).
The straightforward code below saves a file in a local folder:
write.csv(1:10, 'Save.csv')
But I want something like
write.csv(1:10, "root@xx.xxx.x.x:/folder/Save.csv")
Any pointers would be highly appreciated.

This seems to be possible with the cp.remote function of the ssh.utils package:
https://cran.r-project.org/web/packages/ssh.utils/ssh.utils.pdf
The documentation describes it as a wrapper around the scp shell command that handles local/remote files and allows copying between remote hosts via the local machine. Its signature is:
cp.remote(remote.src, path.src, remote.dest, path.dest, verbose = FALSE,
via.local = FALSE, local.temp.dir = tempdir())

Could you separate it into two steps?
1. Save the file locally with write.csv().
2. Call iOS's equivalent of scp (wrapped in R's system()) to copy the file to the remote machine.
It would look like this:
write.csv(1:10, 'Save.csv')
system("scp Save.csv root@xx.xxx.x.x:/folder/Save.csv")

Related

Reading files present in a directory in a remote folder through SFTP

TL;DR: I need to convert the sftp command "get Inbox/*", which I currently run from bash, to C++ or Python. We do not have execute permission on the Inbox directory.
I am trying to read the files in a directory on a remote server through SFTP. The catch is that I only have read and write permissions on the directory, not execute. This means any method that requires opening (cd-ing into) the folder would fail. I need to read the file names because they are variable. From what I understand, ls does not require execute privileges. If I can get a list of the files in the directory, then reading them would be fine. Here is the directory structure:
Inbox
--file-a.txt
--file_b.txt
...
I have tried libssh, but sftp_readdir requires a handle to the open directory. I also looked at paramiko for Python, but that too requires opening the directory to read the file names.
I am able to do this in bash using send "get Inbox/* ${destination_dir}". Is there any way I can use a similar pattern match in C++ or Python?
Also, I cannot execute bash commands from my binary. Does anyone know of a library in Python or C++ (preferred) that would support this?
I have not posted here in a while so please excuse me if I am not following the formatting. I will learn from your suggestions. Thank you!

pyspark: how to show current directory?

Hi, I'm using pyspark interactively. I think I'm failing to load a local file correctly.
How do I check the current directory, so that I can go to a file browser and look at the actual file?
Or is the default directory the one where pyspark is installed? Thanks
You can't load a local file unless the same file exists on all workers under the same path. For example, if you want to read a data.csv file in Spark, copy it to all workers under the same path (say /tmp/data.csv). You can then use sc.textFile("file:///tmp/data.csv") to create an RDD.
The current working directory is the folder from which you started pyspark. You can start pyspark using ipython and run the pwd command to check the working directory.
[Set PYSPARK_DRIVER_PYTHON=/path/to/ipython in spark-env.sh to use ipython]
import os
cwd = os.getcwd()
print(cwd)
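As a minimal sketch of how the two answers fit together, the snippet below prints the driver's working directory and turns a relative file name into the absolute file:// URI form used with sc.textFile above. The helper name to_file_uri and the file name data.csv are illustrative assumptions, not part of Spark, and this only describes the driver side: the file still has to exist under the same path on every worker.

```python
import os
from pathlib import Path


def to_file_uri(path, base=None):
    """Return an absolute file:// URI for `path`.

    Relative paths are interpreted against `base`, which defaults to the
    current working directory. `to_file_uri` is a hypothetical helper
    written for this example; it is not part of PySpark.
    """
    base_dir = base if base is not None else os.getcwd()
    absolute = os.path.abspath(os.path.join(base_dir, path))
    return Path(absolute).as_uri()


# Where was the interpreter (and therefore pyspark) started?
print("Working directory:", os.getcwd())

# The URI you could pass to sc.textFile() for a file next to the session.
print("URI for data.csv:", to_file_uri("data.csv"))
```

Printing the URI before calling sc.textFile makes it easy to open the same path in a file browser and confirm that the file is really there.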

SparkR: how to access files passed with --files in yarn-cluster mode

I am submitting a SparkR job to a YARN cluster in cluster mode with the ./bin/spark-submit script. I need to upload a file (an external dataset) with the --files option. This uploads the file to a temporary HDFS directory, but I need the path where the file was downloaded so that I can use it directly in my SparkR code.
For Java and PySpark, files distributed using --files can be accessed via the SparkFiles.get(filename) method, which returns the absolute path of filename. Is there an equivalent in SparkR?
I know we can work around the problem in different ways:
- Put the files manually on HDFS
- Deploy the files on the worker nodes
But I want to use this option for convenience.

How to automate working directory change

I usually switch between Windows and Mac while accessing my R code from Google Drive. One of the repetitive tasks I need to do whenever I switch between my desktop and laptop is to comment or uncomment the file paths to the respective directories where my Google Drive is located. Can anyone share code to automate this? I am already doing this in Stata.
Usually, for each project or analysis that I start I use a "config-like" R file which looks more or less like this:
.job <- list ()
## rootDir in my laptop
.job$base_data_dir <- file.path ("", "home", "dmontaner", "datos")
## rootDir in my server
##.job$base_data_dir <- file.path ("", "scratch", "datos")
In this "config" file I set the root directory where I keep the data on each machine. I keep a different "config" file on each machine and do not synchronize them via Dropbox.
Then I start my R scripts with this line:
try (source (".job.r"))
and when I have to address any file or folder I do:
setwd (file.path (.job$base_data_dir, "raw_data"))
...
setwd (file.path (.job$base_data_dir, "results"))
This way, if you keep the same internal structure of the data directory on both machines, you only need to set the base (root) directory where it is located, and you can reach the data on either machine.
The file.path function also takes care of the differences between operating systems.
In the R session I give the config variable a name starting with a dot so that it is hidden and does not show up when I run ls() or similar.
That's my solution:
setwd(ifelse(.Platform$OS.type=="unix", "/Users/.../Google Drive", "C:/Users/.../Google Drive/"))

Set working directory to mapped network drive in BATCH mode

I'm having issues on Windows with R failing when changing the working directory to a mapped network drive (e.g. \\Share\Folder mapped to Z:) in batch mode. If I run the same script in an interactive console I don't have any issues. I run R.exe with the script specified inside a Windows batch (.bat) file. The .bat file contains the following.
"C:\RRO\R-3.2.1\bin\R.exe" CMD BATCH "C:/Scripts/Rscript.R"
The error is simply...
> setwd( 'Z:/' )
Error in setwd("Z:/") : cannot change working directory
I'd be open to a different approach entirely for scheduling these scripts via the Windows Task Scheduler if that helps avoid the issue. The reason for mapping the drive is that I need to supply credentials in order to access it, which happens automatically when it is mapped. If anyone knows how to check from within R whether that is really the case, I can test it.
I hope this can help with your question.
I reproduced your setup and got no errors when I used the Rscript command instead of R CMD BATCH.
This is my R code, which I saved as a script (test1.R):
library(openxlsx)
setwd("P:/Records/Indexing Operations/Indexing Data Analysis/Daily Reports")
my.data = read.xlsx("FSI Daily Project Status Report - 18 Mar 2016.xlsx", sheet = 1)
setwd("C:/Users/golieth/Documents/")
png(filename = "test.png", width = 500, height = 350 )
plot(my.data$Total.Images, my.data$Completed.Images.A,
main = Sys.time())
dev.off()
Note that I change the directory twice in this file: once to access data on a mapped network drive, and a second time to save the image to the local computer. I put a timestamp of the current time as the main plot title so you can run the batch file repeatedly and verify that it works.
This is my batch file:
cd C:\Program Files\R\R-3.2.3\bin\i386
Rscript C:\Users\golieth\Documents\test1.R
Note: if your code relies on 32-bit R, the cd in the batch file must point to the directory of the 32-bit R program; likewise for 64-bit R. The Rscript line should then reference the location where you saved your .R file.
Finally, and this might be stating the obvious but make sure you are connected to your VPN before running the batch file.
Imagine a batch file with
cd Z:\<Destination>
Z:
RScript "C:/Scripts/Rscript.R"
This lets Windows change to the directory with all credentials in place and then start R from it, so the working directory is the location from which R was started. This requires that "C:\RRO\R-3.2.1\bin\" is part of your PATH variable.
Good luck!
When writing a .bat file, remember that cd is not used to change drive letters. To change drive letters you simply enter the name of the drive letter, which should be done prior to issuing the final cd to the working directory.
Like this:
sample.bat
z:
cd z:\your\working\directory\
C:\RRO\R-3.2.1\bin\Rscript.exe C:/Scripts/Rscript.R
Alternatively, you can save the files locally in your code and use file.copy to copy them over to your network drive. Also try replacing the network drive letter in the file.copy path with the full network address, e.g. \\....\.....\
