Access sharepoint folders in R

I'm currently trying to access SharePoint folders in R. I have read multiple articles addressing this issue, but none of the proposed solutions seem to work in my case.
I first tried to load a single .txt file using the httr package, as follows:
URL <- "<domain>/<file>/<subfile>/document.txt"
r <- httr::GET(URL, httr::authenticate("username","password",type="any"))
I get the following error:
Error in curl::curl_fetch_memory(url, handle = handle) :
URL using bad/illegal format or missing URL
I then tried another package that uses a similar syntax (RCurl):
URL <- "<domain>/<file>/<subfile>/document.txt"
r <- RCurl::getURL(URL, userpwd = "username:password")
I get the following error:
Error in function (type, msg, asError = TRUE) :
I tried many other ways of linking R to SharePoint, but these two seemed the most straightforward. (Also, my URL doesn't seem to be the problem, since it works when I run it in my web browser.)
Ultimately, I want to be able to load a whole SharePoint folder into R (not only a single document). Something that would really help is to set my SharePoint folder as my working directory and use the base::list.files() function to list the files in my folder, but I doubt that's possible.
Does anyone have a clue how I can do that?

I created an R library called sharepointr for doing just that.
What I basically did was:
Create App Registration
Add permissions
Get credentials
Make REST calls
The Readme.md for the repository has a full description, and here is an example:
# Install
install.packages("devtools")
devtools::install_github("esbeneickhardt/sharepointr")
# Parameters
client_id <- "insert_from_first_step"
client_secret <- "insert_from_first_step"
tenant_id <- "insert_from_fourth_step"
resource_id <- "insert_from_fourth_step"
site_domain <- "yourorganisation.sharepoint.com"
sharepoint_url <- "https://yourorganisation.sharepoint.com/sites/MyTestSite"
# Get Token
sharepoint_token <- get_sharepoint_token(client_id, client_secret, tenant_id, resource_id, site_domain)
# Get digest value
sharepoint_digest_value <- get_sharepoint_digest_value(sharepoint_token, sharepoint_url)
# List folders
sharepoint_path <- "Shared Documents/test"
get_sharepoint_folder_names(sharepoint_token, sharepoint_url, sharepoint_digest_value, sharepoint_path)
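These helpers wrap SharePoint's REST interface, so once you have a token you can also make the calls yourself with httr. A minimal sketch, not part of the package, assuming sharepoint_token is the bearer token string returned by get_sharepoint_token and using SharePoint's standard GetFolderByServerRelativeUrl endpoint:
library(httr)
library(jsonlite)
# List the files in the folder via SharePoint's REST API (verbose OData JSON)
endpoint <- paste0(sharepoint_url,
                   "/_api/web/GetFolderByServerRelativeUrl('/sites/MyTestSite/Shared Documents/test')/Files")
res <- GET(endpoint,
           add_headers(Authorization = paste("Bearer", sharepoint_token),
                       Accept = "application/json;odata=verbose"))
stop_for_status(res)
fromJSON(content(res, "text"))$d$results$Name  # file names in the folder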

Related

Reading excel files from sharepoint folder in R

Currently I am building an automated process in R to clean and transform Excel data from SharePoint, and I am having trouble reading the Excel files from SharePoint in R. I read a couple of posts (Accessing Excel file from Sharepoint with R, for instance) and tried a couple of suggestions, but none worked for me. All of the error messages say "Path" does not exist. Could someone shed some light on this?
I ran GET() and the link works:
r <- GET(url, authenticate("window_username","window_password",type="any"))
I run into the same issue when using the following code to get the data from an Excel file on this SharePoint site, with the same error as the one in the original question:
data <- read_excel(url)
Any feedback would be greatly appreciated.
To make access to SharePoint files easy, you should sync the sites from the web app to File Explorer. Addresses for these synced cloud resources are commonly of the form C:\Users\username\My Org\My Teams Group - General\Project\My Excel.xlsx, which can create a problem when the code is run by multiple users. While https addresses for cloud locations may work in File Explorer, they do not work directly within R packages. If relative addresses don't work, you can make the code user agnostic by setting the username as a variable or by returning the home path with the Sys.getenv() function.
library(openxlsx)
# Build the path to the locally synced copy for whichever user runs the code
username <- Sys.getenv("USERNAME")
sharepoint_address <- "/My Org/My Teams Group - General/Project/My Excel.xlsx"
df <- read.xlsx(xlsxFile = paste0("C:/Users/", username, sharepoint_address), sheet = "Raw Data")
# More elegantly
df <- read.xlsx(xlsxFile = paste0(Sys.getenv("HOMEPATH"), sharepoint_address), sheet = "Raw Data")
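Once the library is synced, the SharePoint folder behaves like any local directory, so list.files() works on it. A minimal sketch, reusing the folder and sheet names from the example above (adjust them to your own synced location):
library(openxlsx)
# The synced SharePoint library is just a local folder once it has been synced
synced_folder <- file.path("C:/Users", Sys.getenv("USERNAME"),
                           "My Org/My Teams Group - General/Project")
# Read every workbook in the folder into a named list of data frames
xlsx_files <- list.files(synced_folder, pattern = "\\.xlsx$", full.names = TRUE)
sheets <- lapply(xlsx_files, read.xlsx, sheet = "Raw Data")
names(sheets) <- basename(xlsx_files)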

How to resolve issue with path with here package in R?

I had the following piece of code, which obtains 4 csv files from a directory called RawData and combines the rows using rbind. It works fine:
library(data.table)
setwd("C:/Users/Gunathilakel/Desktop/Vera Wrap up Analysis/Vera_Wrapup_Analysis/RawData")
myMergedData <-
  do.call(rbind,
          lapply(list.files(path = getwd()), fread))
However, I want to ensure that this code is reproducible on another computer, so I decided to get rid of setwd() and use the here package to implement the same procedure:
library(here)
myMergedData <-
  do.call(rbind,
          lapply(list.files(path = here("RawData")), fread))
When I run the above script, it gives the following message:
Taking input= as a system command ('Issued and Referral Charge-2019untildec25th.csv') and a variable has been used in the expression passed to `input=`. Please use fread(cmd=...). There is a security concern if you are creating an app, and the app could have a malicious user, and the app is not running in a secure environment; e.g. the app is running as root. Please read item 5 in the NEWS file for v1.11.6 for more information and for the option to suppress this message.
'Issued' is not recognized as an internal or external command,
operable program or batch file.
The list.files call will return the filename Issued and Referral Charge-2019untildec25th.csv without its path. You need
list.files(path = here("RawData"), full.names = TRUE)
so that you get the path as well, and fread will be able to find the file.
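Putting the fix back into the original pipeline, a minimal sketch assuming RawData sits in the project root that here() resolves to:
library(data.table)
library(here)
# full.names = TRUE returns complete paths, so fread reads the files
# instead of treating the bare file names as shell commands
csv_files <- list.files(path = here("RawData"), full.names = TRUE)
myMergedData <- do.call(rbind, lapply(csv_files, fread))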

Cannot access EIA API in R

I'm having trouble accessing the Energy Information Administration's API through R (https://www.eia.gov/opendata/).
On my office computer, if I try the link in a browser it works, and the data shows up (the full url: https://api.eia.gov/series/?series_id=PET.MCREXUS1.M&api_key=e122a1411ca0ac941eb192ede51feebe&out=json).
I am also successfully connected to Bloomberg's API through R, so R is able to access the network.
Since the API is working and not blocked by my company's firewall, and R is in fact able to connect to the Internet, I have no clue what's going wrong.
The script works fine on my home computer, but on my office computer it is unsuccessful. So I gather it is a network issue, but if somebody could point me in any direction as to what the problem might be, I would be grateful (my IT department couldn't help).
library(XML)
api.key = "e122a1411ca0ac941eb192ede51feebe"
series.id = "PET.MCREXUS1.M"
my.url = paste("http://api.eia.gov/series?series_id=", series.id,"&api_key=", api.key, "&out=json", sep="")
doc = xmlParse(file=my.url, isURL=TRUE) # yields error
Error msg:
No such file or directoryfailed to load external entity "http://api.eia.gov/series?series_id=PET.MCREXUS1.M&api_key=e122a1411ca0ac941eb192ede51feebe&out=json"
Error: 1: No such file or directory2: failed to load external entity "http://api.eia.gov/series?series_id=PET.MCREXUS1.M&api_key=e122a1411ca0ac941eb192ede51feebe&out=json"
I tried some other methods like read_xml() from the xml2 package, but this gives a "could not resolve host" error.
To get XML, you need to change your url to XML:
my.url = paste("http://api.eia.gov/series?series_id=", series.id,"&api_key=",
api.key, "&out=xml", sep="")
res <- httr::GET(my.url)
xml2::read_xml(res)
Or:
res <- httr::GET(my.url)
XML::xmlParse(res)
Otherwise, with the url as is (i.e. &out=json):
res <- httr::GET(my.url)
jsonlite::fromJSON(httr::content(res,"text"))
or this:
xml2::read_xml(httr::content(res,"text"))
Please note that this answer simply provides a way to get the data, whether it is in the desired form is opinion based and up to whoever is processing the data.
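For instance, to pull the observations out of the JSON variant into a data frame, a sketch that assumes the v1 response follows the usual layout of a series element whose data field holds period/value pairs:
library(httr)
library(jsonlite)
# Build the &out=json variant of the url from the question
my.url <- paste0("http://api.eia.gov/series?series_id=", series.id,
                 "&api_key=", api.key, "&out=json")
res <- GET(my.url)
parsed <- fromJSON(content(res, "text"))
# Each series carries its observations as a two-column matrix of period/value pairs
obs <- parsed$series$data[[1]]
data.frame(period = obs[, 1], value = as.numeric(obs[, 2]))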
If it does not have to be XML output, you can also use the new eia package. (Disclaimer: I'm the author.)
Using your example:
remotes::install_github("leonawicz/eia")
library(eia)
x <- eia_series("PET.MCREXUS1.M")
This assumes your key is set globally (e.g., in .Renviron or previously in your R session with eia_set_key). But you can also pass it directly to the function call above by adding key = "yourkeyhere".
The result returned is a tidyverse-style data frame, one row per series ID and including a data list column that contains the data frame for each time series (can be unnested with tidyr::unnest if desired).
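A minimal sketch of that unnesting step, assuming the list column is named data as in current versions of the package:
library(eia)
library(tidyr)
x <- eia_series("PET.MCREXUS1.M")
# One row per observation instead of one row per series
unnest(x, cols = data)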
Alternatively, if you set the argument tidy = FALSE, it will return the list result of jsonlite::fromJSON without the "tidy" processing.
Finally, if you set tidy = NA, no processing is done at all and you get the original JSON string output for those who intend to pass the raw output to other canned code or software. The package does not provide XML output, however.
There are more comprehensive examples and vignettes at the eia package website I created.

Moving data from local directory to AWS

I'm very new to R so be gentle. I've been tasked to make some amendments to a pre-existing project.
I have some code:
#SHINY_ROOT <- getwd()
#ARCHIVE_FILEPATH <- file.path(SHINY_ROOT, 'Data', 'archived_pqs.csv')
I want to move 'archived_pqs.csv' into S3 (Amazon Web Services), preferably while making as few changes to the rest of the code as possible.
My first thought was that I could do this:
ARCHIVE_FILEPATH <- s3tools::s3_path_to_full_df("alpha-pq-tool-data/Data/archived_pqs.csv")
Where 'alpha-pq-tool-data' is the S3 bucket.
I've tested this and it does indeed pull in the dataframe:
df <-s3tools::s3_path_to_full_df("alpha-pq-tool-data/Data/archived_pqs.csv")
The issue is that when I run other functions that go as follows:
if (file.exists(ARCHIVE_FILEPATH)) {
  date <- last_answer_date()
}
I get this error:
Error in file.exists(ARCHIVE_FILEPATH) : invalid 'file' argument
Called from: file.exists(ARCHIVE_FILEPATH)
Is there an easy way of doing this while making minimal changes? Can I no longer use the file.exists() function because the data is in S3?
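One possible direction, sketched under the assumption that the aws.s3 package fits your setup: file.exists() only understands local paths, so an object stored in S3 has to be checked with an S3-aware helper such as aws.s3::object_exists():
library(aws.s3)
# The bucket and key are the ones from the question
ARCHIVE_BUCKET <- "alpha-pq-tool-data"
ARCHIVE_KEY <- "Data/archived_pqs.csv"
# object_exists() plays the role file.exists() played for the local copy
if (object_exists(ARCHIVE_KEY, bucket = ARCHIVE_BUCKET)) {
  date <- last_answer_date()
}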

How to download an .xlsx file from a dropbox (https:) location

I'm trying to adopt the Reproducible Research paradigm while meeting people who prefer looking at Excel rather than text data files half way, by using Dropbox to host Excel files which I can then access with the xlsx package.
Rather like downloading and unpacking a zipped file, I assumed something like the following would work:
# Prerequisites
require("xlsx")
require("ggplot2")
require("repmis")
require("devtools")
require("RCurl")
# Downloading data from Dropbox location
link <- paste0(
"https://www.dropbox.com/s/",
"{THE SHA-1 KEY}",
"{THE FILE NAME}"
)
url <- getURL(link)
temp <- tempfile()
download.file(url, temp)
However, I get Error in download.file(url, temp) : unsupported URL scheme
Is there an alternative to download.file that will accept this URL scheme?
Thanks,
Jon
You have the wrong URL - the one you are using just goes to the landing page. I think the actual download URL is different; I managed to get it sort of working using the code below.
I actually don't think you need to use RCurl or the getURL() function, and I think you were leaving out some relatively important /'s in your previous formulation.
Try the following:
link <- paste("https://dl.dropboxusercontent.com/s",
              "{THE SHA-1 KEY}",
              "{THE FILE NAME}",
              sep="/")
# mode = "wb" keeps binary files such as .xlsx intact on Windows
download.file(url=link, destfile="your.destination.xlsx", mode="wb")
closeAllConnections()
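Once the download succeeds, the file is an ordinary local workbook, so the xlsx package loaded in the question can read it (sheetIndex = 1 is an assumption; point it at whichever sheet you need):
library(xlsx)
df <- read.xlsx("your.destination.xlsx", sheetIndex = 1)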
UPDATE:
I just realised there is a source_XlsxData function in the repmis package, which in theory should do the job perfectly.
Also the function below works some of the time but not others, and appears to get stuck at the GET line. So, a better solution would be very welcome.
I decided to try taking a step back and figure out how to download a raw file from a secure (https) url. I adapted (butchered?) the source_url function in devtools to produce the following:
download_file_url <- function(url, outfile, ..., sha1 = NULL) {
  require(RCurl)
  require(devtools)
  require(repmis)
  require(httr)
  require(digest)
  stopifnot(is.character(url), length(url) == 1)
  # Open the destination in binary mode, fetch the url, and write the raw body
  filetag <- file(outfile, "wb")
  request <- GET(url)
  stop_for_status(request)
  writeBin(content(request, type = "raw"), filetag)
  close(filetag)
}
This seems to work for producing local versions of binary files - Excel included. Nicer, neater, smarter improvements in this gratefully received.
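A usage sketch, keeping the question's Dropbox placeholders (they still need to be replaced with a real key and file name):
link <- paste("https://dl.dropboxusercontent.com/s",
              "{THE SHA-1 KEY}",
              "{THE FILE NAME}",
              sep = "/")
download_file_url(link, outfile = "local_copy.xlsx")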
