Azure Databricks: How do we access R Scripts present on DBFS?

I'm new to Databricks. I am trying to access a .R file that is present in DBFS storage, but I cannot figure out how to do so. Any help is really appreciated.
I can read data from the storage using the /dbfs file path, and I can also source the script, but I want to make edits to the script.

You need an editor to do that. For example, you can set up RStudio on your cluster and connect to it via the RStudio UI; in that case you can edit R files directly on DBFS.
But really, the simplest option would be to use the Databricks CLI fs command to copy the file to your local machine, make changes in the editor of your choice, and upload the file back.
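For example, something along these lines with the Databricks CLI (the DBFS path below is just a placeholder):
databricks fs cp dbfs:/scripts/my_script.R ./my_script.R
# ... edit my_script.R locally in the editor of your choice ...
databricks fs cp ./my_script.R dbfs:/scripts/my_script.R --overwrite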

Related

How to execute Python Script through Informatica Cloud

I have a Python script that I need to execute and automate via IICS. The output of the script is a csv file, and this output should be loaded to the target. How can I achieve this via Informatica Cloud? Please point me to some information and documentation on this.
Thanks
There are two ways to do this.
You can create an executable (using py2exe or a similar tool) from your .py script, then put that file on the Informatica Cloud agent server and call it using a shell command. Please note that in this case you do not need to install Python or any packages.
You can also put the .py file on the agent server and run it via shell, e.g. $PYTHON_HOME/python your_script.py. You need to make sure the Python version is compatible and that all required packages are installed on the agent server.
You can refer to the screenshot below for how to set up the shell command. Then you can run it as part of a workflow and schedule it if needed.
https://i.stack.imgur.com/wnDOV.png
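For the second option, the shell command would look roughly like this (the script path is a placeholder for wherever the file lives on your agent server):
$PYTHON_HOME/python /home/infaagent/scripts/your_script.py
# the script writes its csv output, which the downstream mapping/task then reads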

Open a OneDrive file with R

I'm trying to create a Shiny app that reads an online OneDrive xlsx file and shows some things, but for the moment I'm unable to read the OneDrive xlsx file. I have already explored Microsoft365R and I can connect to my OneDrive and even open the file, but what it does is open a tab in Chrome with the Excel file.
I need the file in the local environment of R, because the Shiny app must be deployed on a web server and every time the app runs it should read the updated file.
library(Microsoft365R)
odb <- get_business_onedrive()
odb$open_file("lcursos.xlsx")
Also, this is a business account, so I have to provide the username and key to access each file; that is why using the simple URL doesn't work, it returns Error 403 FORBIDDEN.
Any ideas?
Thank you so much!
Use the download_file() method to download the file to your local machine:
odb$download_file("lcursos.xlsx")
You can set the download location with the dest argument. Once it's downloaded, open it with the Excel reader package of your choice; I suggest either openxlsx or readxl.
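For example, a minimal sketch (the destination path and variable names are just placeholders):
dest_path <- tempfile(fileext = ".xlsx")
odb$download_file("lcursos.xlsx", dest = dest_path)
datos <- readxl::read_excel(dest_path)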
Note that if your file is password protected, your options are limited. See this question for possible solutions.

How to connect RStudio to OneDrive Excel file?

Currently trying to connect to a OneDrive file that contains a spreadsheet. Trying to pull some IDs from the file. What packages might achieve this?
Can the DBI package connect to OneDrive?
As suggested by @chthonicdaemon, you could install the OneDrive client locally, sync the file, and use it exactly as a local file.
But there are other options, the most obvious being the Microsoft365R package:
https://cran.r-project.org/web/packages/Microsoft365R/vignettes/od_sp.html
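A minimal sketch of that approach, assuming a OneDrive for Business account and a hypothetical file path and ID column:
library(Microsoft365R)
library(readxl)

od <- get_business_onedrive()                               # authenticates to OneDrive for Business
local_copy <- tempfile(fileext = ".xlsx")
od$download_file("Documents/ids.xlsx", dest = local_copy)   # hypothetical OneDrive path
ids <- read_excel(local_copy)$ID                            # hypothetical column name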

Using sparklyr in RStudio, can I upload a LOCAL csv file to a Spark cluster?

I'm pretty new to cluster computing, so not sure if this is even possible.
I am successfully creating a spark_context in RStudio (using sparklyr) to connect to our local Spark cluster. Using copy_to I can upload data frames from R to Spark, but I am trying to upload a locally stored CSV file directly to the Spark cluster using spark_read_csv without importing it into the R environment first (it's a big 5 GB file). It's not working (even when prefixing the location with file:///), and it seems that it can only load files that are ALREADY stored on the cluster.
How do I upload a local file directly to Spark without loading it into R first?
Any tips appreciated.
You cannot. The file has to be reachable from each machine in your cluster, either as a local copy or placed on a distributed file system / object storage.
You can load a local file into Spark by using the spark_read_csv() method; just make sure to pass the path correctly.
Note: it is not necessary to load the data into the R environment first.
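A minimal sketch, assuming the file is reachable from the Spark nodes (the path and table name below are placeholders):
library(sparklyr)

sc <- spark_connect(master = "local")    # or your cluster's master URL
big_csv <- spark_read_csv(
  sc,
  name = "big_table",                    # name of the resulting Spark table
  path = "file:///data/big_file.csv",    # must be readable from the Spark nodes
  header = TRUE
)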

R shiny concurrent file access

I am using the R shiny package to build a web interface for my executable program. The web interface provides user input and shows output.
On the server side, the R script formats the user inputs and saves them to a local input file. Then R calls a system command to run the executable program.
My concern is that if multiple users run the web app at the same time, it is possible that the input file generated by the first user will be overwritten by the second user's input before it is read by the executable program.
One way to resolve the conflict is to have R create a temporary folder for each user and generate/run the input file under that folder. But I'd like to know whether there is a better or automatic way to resolve this potential conflict with Shiny. For example, if I use Shiny fileInput, the uploaded files are automatically stored in a temporary folder.
Update
Thanks for the advice, @Symbolix and @Mike Wise.
I had read the persistent data storage article before, but I don't think it is exactly what I wanted (maybe my understanding is not correct). I ended up creating a temporary folder and running my executable from there, roughly as sketched below.
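A minimal sketch of that per-session temporary-folder approach (the executable name and input format are placeholders):
library(shiny)

ui <- fluidPage(
  textInput("param", "Parameter"),
  actionButton("run", "Run"),
  verbatimTextOutput("result")
)

server <- function(input, output, session) {
  run_dir <- tempfile(pattern = "run_")             # unique folder for this session
  dir.create(run_dir)
  session$onSessionEnded(function() unlink(run_dir, recursive = TRUE))

  result <- eventReactive(input$run, {
    input_file  <- file.path(run_dir, "input.txt")
    output_file <- file.path(run_dir, "output.txt")
    writeLines(input$param, input_file)             # this user's input, isolated from other sessions
    system2("./my_program", args = input_file, stdout = output_file)   # placeholder executable
    paste(readLines(output_file), collapse = "\n")
  })

  output$result <- renderText(result())
}

shinyApp(ui, server)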
