I'm new to DataBricks. I am trying to access a .R file that is present in the DBFS storage but I cannot figure out how to do so. Any help is really appreciated.
I can read data from the storage using the file path /dbfs and also source code from the script but I want to make edits to the script.
You need some editor to do that - for example, you can setup RStudio on your cluster and connect to it via RStudio UI - in this case you can edit R files directly on DBFS.
But really, the simplest for you would be to use Databricks CLI fs command to copy the file to your local machine, make changes in the editor of your choice, and upload file back.
I'm attempting to make a public shinyapps.io website, and I'm trying to use image_write to create a file into a local directory.
The following code works on my local R studio code:
image_write(im.resized, path = paste0(output_file_directory, file_name), format = "jpg")
When I run the code on the shinyapps.io website, the code runs without error, but I'm not sure where it downloads the file to. I know that the output_file_directory part isn't the issue, so I'm a little lost. Any help would be much appreciated!
On shinyapps.io it is not possibly to store permanently data, due to:
"Shinyapps.io is a popular server for hosting Shiny apps. It is designed to distribute your Shiny app across different servers, which means that if a file is saved during one session on some server, then loading the app again later will probably direct you to a different server where the previously saved file doesn’t exist."
See here:
https://shiny.rstudio.com/articles/persistent-data-storage.html
I have an iPad Pro as my primary computer, and am running R Studio from a cloud server for a class. I am trying to figure out how to import data sets, since my local working directory is on the server. I have been trying to download the package repmis, since I have been reading that that package allows for data set import from Dropbox. However, when I try to download the package, I get "Error:configuration failed for openssl" and a similar one for curl. I tried to install openssl but instead it says I need to install "deb" for ubuntu operating systems, but I can't find that in R Studio in the package database. (And I can't install curl without openssl either) Any suggestions?
If it's a relatively straightforward data set like a CSV, XML, JSON or even an .RData file you can use a Dropbox sharing URL to read it. Here's an example (it's a live URL) for reading in a CSV directly from a shared Dropbox link:
read.csv("https://www.dropbox.com/s/7xg5u0z1gtjcuol/mtcars.csv?dl=1")
The dl=1 won't be the default from a "share this link" copy (it'll probably be dl=0.
urla<- 'https://www.dropbox.com/s/fakecode/xyz.txt?raw=1'
bpdata<-read.table(urla, header=TRUE)
scode<-'https://www.dropbox.com/s/lp6fleehcsb3wi9/protoBP.R?raw=1'
source(scode)
readAndPlotZoomForYear(urla, 2019)
Basically I wanted to source code from a file in Dropbox and to use the read.table() function on a tab delineated data file. I found this could be done by replacing the string after the question mark with raw=1 in the Dropbox file links.
am trying to write an R script that will access an Excel file that is stored on my company's Sharepoint page so that I can make a few calculations and plot the results. I've tried various ways to do this (download.file, RCurl getURL(), gdata), but I can't seem to figure out how to do this. The url is HTTPS and there should be a username and password required. I've gotten the closest with this code:
require(RCurl)
URL<-"https://companyname.sharepoint.com/sites/folder/_layouts/15/WopiFrame.aspx?sourcedoc={2DCC2ED7-1C13-4910-AFAD-4A9ACFF1C797}&file=myfile.xlsx&action=default'
f<-getURL(URL,verbose=T,ssl.verifyhost=F,ssl.verifypeer=F,userpwd="mylogin:mypw")
This seems to connect (although the username and password don't seem to matter) and returns
> f
[1] "<html><head><title>Object moved</title></head><body>\r\n<h2>Object moved to here.</h2>\r\n</body></html>\r\n"`
However, I'm not sure what to do at this point, or even if I'm on the right track. Any help will be greatly appreciated.
I use
library(readxl)
read_excel('//companySharepointSite/project/.../ExcelFilename.xlsx', 'Sheet1', skip=1)
Note, no https:, and sometimes I have to open the file first (i.e., cut and paste //companySharepointSite/project/.../ExcelFilename.xlsx into my browser's address bar)
I found that other answers did not work for me, perhaps because I am on a Mac, which obviously does not play as well with Microsoft products such as Sharepoint.
Ended up having to split it into two pieces: first download the Excel file to disk and then separately read that Excel file.
library(httr)
library(readxl)
# the URL of your sharepoint file
file_url <- "https://yoursharepointsite/Documents/yourfile.xlsx"
# save the excel file to disk
GET(file_url,
authenticate(active_directory_username, active_directory_password, "ntlm"),
write_disk("tempfile.xlsx", overwrite = TRUE))
# save to dataframe
df <- read_excel("tempfile.xlsx")
df
# remove excel file from disk
file.remove("tempfile.xlsx")
This gets the job done, though would be interested if anyone knows how to avoid the interim step of writing to disk.
N.B. Depending on your specific machine/network/Sharepoint configuration, you may also be able to just use authenticate(":",":","ntlm") per this answer.
I was unable to accomplish this using hints from answers above in R (I tried many approaches found on this site). However, just to highlight the response by #RyanBradley above and especially the response by #ZS27:
I instead had to use the OneDrive Desktop client (Windows) to allow me to sync the folder to my computer. Newer versions of SharePoint (like that found in MS Teams) have a sync button or feature in the document libraries / folders that interfaces with OneDrive.
This is the functional equivalent of mounting the folder as a network drive, so R interfaces with the file as if it was a part of your file system. Works for me.
You may need to map a network drive to the SharePoint library so that you can connect to it directly. Or if you don't want to map a network drive you could also place a shortcut to the folder in your startup folder.
Example file path:
\company_sharepoint_site\ssp\site_name\sub_site_name\library_name
Example start up folder location (Windows 10):
C:\Users\USER_NAME\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup
Note direction of the slashes ("\" rather than "/") is important so that your file path is interpreted as a file location, not an internet browser location. By placing such a path in a network drive or as a shortcut in your startup folder your PC should connect to it when it boots.
# Load or install readxl
if(require(readxl) == FALSE){
install.packages("readxl")
if(require(readxl)== FALSE){stop("Unable to install and load readxl")}
}
# Define path to data
data_path <- "\\\\company_sharepoint_site\\ssp\\site_name\\sub_site_name\\library_name\\Example.xlsx"
# Pull data
df_employees <- read_xlsx(data_path)
I had a situation exactly like you. I want to access an excel file, available on an sharepoint site using R programming language.
I have also surfed many stuff in Internet and I didn't find anything relevant to my requirement.
Then, I have attempted the following thing:
I have made the sharepoint folder as a network drive folder, in my local system.
Then, I have accessed that excel file(in sharepoint site) from my machine without accessing web browser.
Hence, I have copied the network path, present in my system (it will be same as your sharepoint site, however it will not have https/http.
The site will start with "\" like the following: "\sharepoint.test.com\folder\path").
Launch RStudio and select Import Dataset option, under Environment section.
Choose 'From Excel'. 'Import Excel Data' form will be opened.
Under File/URL field: Paste the network path of sharepoint (copied from your machine).
Click Import, the excel file in Sharepoint will be imported in R successfully.
Ensure that the file should not have html language as input (lie %20 and all) and Backslash should be used as separator in the URL.
While importing the file, provide the input of the folder name exactly, as you see.
For example:
Sharepoint.microsoft.com - Sharepoint's Domain
Department name - name of the Folder
Project name - name of the folder
Sample.xlsx - name of the file
So, your URL to import dataset should be:
"\Sharepoint.microsoft.com\Department name\Project name\Sample.xlsx".
Thank you!
Try using the link in this format:
http://site/_layouts/download.aspx?SourceUrl=url-of-document-in-library
If above doesn't work try this syntax [note slash directions]:
"\\gov.sharepoint.com#SSL/DavWWWRoot/sites/SomePath/SomePath/SomePath/SomeFile"
See this for more info about syntax and what's going on:
Connect to a site via SSL/DavWWWRoot not usual URL? Why does this make a difference?
If I put a very small csv file in my GitHub directory so that it gets copied to /ocpu/github/username/projectname/www/ , will I be able to access the contents of the csv for use in a R function? I tried to ajax the file, but I get a 404 error even though I can see the csv file sitting in the www directory of my local server. I need to have the csv on the server as a static file rather than being uploaded by a function. Thanks
You should be able to access them like any other file. Can you post an example that shows what you are doing and what error you are getting?
That said, if you just want to use this data in your R functions, it is better to include it in the R package as an actual data file. Also see section 1.1.6 of Writing R Extensions. An example is the mapapp package, which includes a dataset called countryExData. Also see the live app.