Reading Excel files from a SharePoint folder in R

I am currently building an automated process to clean and transform Excel data from SharePoint using R, and I am having trouble reading Excel files from SharePoint in R. I read a couple of posts (Accessing Excel file from Sharepoint with R, for instance) and tried several of the suggestions, but none worked for me. Every attempt fails with the same error message: "Path" does not exist. Could someone shed some light on this?

I ran GET() and the link works:
r <- GET(url, authenticate("window_username","window_password",type="any"))
However, reading an Excel file on this SharePoint site with the following code fails with the same error as the one in the original question:
data <- read_excel(url)
Any feedback would be greatly appreciated.

To make access to SharePoint files easy, sync the sites from the web app to File Explorer. Synced cloud locations usually have a local path of the form C:\Users\username\My Org\My Teams Group - General\Project\My Excel.xlsx. This can cause a problem when the code is run by multiple users, because the username differs on each machine. While https addresses for cloud locations may work in File Explorer, they do not work directly within R packages. If relative paths don't work, you can make the code user-agnostic by storing the username in a variable or by retrieving the home path with the Sys.getenv() function.
library(openxlsx)
# Windows username of whoever runs the script
username <- Sys.getenv("USERNAME")
sharepoint_address <- "/My Org/My Teams Group - General/Project/My Excel.xlsx"
df <- read.xlsx(xlsxFile = paste0("C:/Users/", username, sharepoint_address), sheet = "Raw Data")
# More elegantly (HOMEPATH has no drive letter, so this resolves against the current drive)
df <- read.xlsx(xlsxFile = paste0(Sys.getenv("HOMEPATH"), sharepoint_address), sheet = "Raw Data")
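A slightly more defensive variant of the same idea, as a sketch only: it builds the path with file.path(), reads the home directory from USERPROFILE (which on Windows includes the drive letter), and falls back to path.expand("~") when that variable is unset, for example on macOS or Linux. The helper name synced_path() and the folder names are placeholders for illustration, not part of openxlsx.

```r
# Hypothetical helper: build the local path of a file in a synced SharePoint library.
synced_path <- function(..., home = Sys.getenv("USERPROFILE", unset = NA)) {
  # USERPROFILE is set on Windows (e.g. C:\Users\username); elsewhere fall back to "~"
  if (is.na(home) || !nzchar(home)) {
    home <- path.expand("~")
  }
  file.path(home, ...)
}

xlsx_path <- synced_path("My Org", "My Teams Group - General", "Project", "My Excel.xlsx")

# Only try to read the workbook if the synced file is actually present
if (file.exists(xlsx_path)) {
  df <- openxlsx::read.xlsx(xlsxFile = xlsx_path, sheet = "Raw Data")
} else {
  message("Not found (is the library synced on this machine?): ", xlsx_path)
}
```

Checking file.exists() first gives a clearer message than the reader's own error when the library has not been synced on the machine running the script.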

Related

Access sharepoint folders in R

I'm currently trying to access SharePoint folders in R. I read multiple articles addressing that issue, but none of the proposed solutions seem to work in my case.
I first tried to load a single .txt file into R using the httr package, as follows:
URL <- "<domain>/<file>/<subfile>/document.txt"
r <- httr::GET(URL, httr::authenticate("username","password",type="any"))
I get the following error:
Error in curl::curl_fetch_memory(url, handle = handle) :
URL using bad/illegal format or missing URL
I then tried another package that uses a similar syntax (RCurl):
URL <- "<domain>/<file>/<subfile>/document.txt"
r <- getURL(URL, userpwd = "username:password")
I get the following error:
Error in function (type, msg, asError = TRUE) :
I tried many other ways of linking R to SharePoint, but these two seemed the most straightforward. (Also, my URL doesn't seem to be the problem, since it works when I open it in my web browser.)
Ultimately, I want to be able to load a whole SharePoint folder into R (not only a single document). Something that would really help is to set my SharePoint folder as my working directory and use the base::list.files() function to list the files in it, but I doubt that's possible.
Does anyone have a clue how I can do that?
I created an R library called sharepointr for doing just that.
What I basically did was:
1. Create App Registration
2. Add permissions
3. Get credentials
4. Make REST calls
The Readme.md for the repository has a full description, and here is an example:
# Install
install.packages("devtools")
devtools::install_github("esbeneickhardt/sharepointr")
library(sharepointr)
# Parameters
client_id <- "insert_from_first_step"
client_secret <- "insert_from_first_step"
tenant_id <- "insert_from_fourth_step"
resource_id <- "insert_from_fourth_step"
site_domain <- "yourorganisation.sharepoint.com"
sharepoint_url <- "https://yourorganisation.sharepoint.com/sites/MyTestSite"
# Get Token
sharepoint_token <- get_sharepoint_token(client_id, client_secret, tenant_id, resource_id, site_domain)
# Get digest value
sharepoint_digest_value <- get_sharepoint_digest_value(sharepoint_token, sharepoint_url)
# List folders
sharepoint_path <- "Shared Documents/test"
get_sharepoint_folder_names(sharepoint_token, sharepoint_url, sharepoint_digest_value, sharepoint_path)
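Separately from authentication, the "URL using bad/illegal format or missing URL" error in the question is what curl reports when the URL string itself is malformed. Two plausible causes (an assumption here, since the real URL is not shown) are a missing https:// scheme and unencoded spaces, as in "Shared Documents". A base-R sketch for checking and encoding a URL before passing it to httr::GET() follows; the helper name clean_sharepoint_url() and the example URL are illustrative only.

```r
# Hypothetical helper: make sure a SharePoint URL has a scheme and no raw spaces.
clean_sharepoint_url <- function(url) {
  if (!grepl("^https?://", url)) {
    stop("URL must start with http:// or https://: ", url)
  }
  # URLencode() leaves reserved characters such as "/" and ":" alone
  # and percent-encodes spaces as %20
  utils::URLencode(url)
}

url <- clean_sharepoint_url(
  "https://yourorganisation.sharepoint.com/sites/MyTestSite/Shared Documents/test/document.txt"
)
print(url)
```

This does not solve authentication, but it rules out one class of failure before any credentials are involved.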

Equivalent R function to slackr_upload for microsoft teams

We have recently moved from Slack to Microsoft Teams. There was a useful package (slackr) that allowed files to be uploaded to Slack from R (example below), so I am wondering if there is an equivalent for Microsoft Teams.
library(slackr)
slackrSetup(incoming_webhook_url = "webhook-url",
            api_token = "api-token")
d1 <- data.frame(col1 = "a", col2 = "b")
write.table(d1, file = "my-location/export.csv")
slackr_upload("my-location/export.csv", channel = "my-channel")
I have found the teamr package, which is useful for messages but doesn't allow uploading files. I have attempted to at least format the contents of the data frame as a markdown table in the message sent from teamr, but as the tables can be quite large (500 rows, 20-30 columns) this isn't convenient for the Microsoft Teams users who need to extract the data.
Alternatively, I can create and send an email with an attachment from R, but I am hoping there is an approach I have missed that keeps everything in Teams.
Like @Gakku said, I think that could be achieved with the Microsoft365R package.
Something along these lines should put the file in a specific team, even a specific channel, creating the upload folder along the way:
library(Microsoft365R)
team <- get_team("NAME OF YOUR TEAM")
channel <- team$get_channel("NAME OF YOUR CHANNEL")
channel$get_folder()$create_folder("UPLOAD LOCATION")
channel$get_folder()$get_item("UPLOAD LOCATION")$upload("UPLOAD_FILE.CSV")
I know this is old, but in case someone comes across this, look at Microsoft365R, which lets you upload files and much more in MS Teams.
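To connect this back to the slackr example in the question, here is a rough sketch that writes the data frame to a temporary CSV and then uploads it with the same get_folder()$upload() calls used above. The helper names write_export_csv() and upload_to_channel() are made up for illustration, and the upload itself needs an authenticated Microsoft365R session, so that part is left commented out.

```r
# Hypothetical helper: write a data frame to a temporary CSV and return its path.
write_export_csv <- function(df, file_name = "export.csv") {
  path <- file.path(tempdir(), file_name)
  utils::write.csv(df, path, row.names = FALSE)
  path
}

# Hypothetical helper: upload a local file to the channel's folder.
# `channel` is assumed to be the object returned by team$get_channel().
upload_to_channel <- function(channel, local_path) {
  channel$get_folder()$upload(local_path)
}

d1 <- data.frame(col1 = "a", col2 = "b")
csv_path <- write_export_csv(d1)

# Requires an authenticated session, so it is left commented out here:
# team <- Microsoft365R::get_team("NAME OF YOUR TEAM")
# upload_to_channel(team$get_channel("NAME OF YOUR CHANNEL"), csv_path)
```

Writing to tempdir() avoids leaving export files behind in the project directory; the file is uploaded under its own name.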

Cannot access EIA API in R

I'm having trouble accessing the Energy Information Administration's API through R (https://www.eia.gov/opendata/).
On my office computer, if I try the link in a browser it works, and the data shows up (the full url: https://api.eia.gov/series/?series_id=PET.MCREXUS1.M&api_key=e122a1411ca0ac941eb192ede51feebe&out=json).
I am also successfully connected to Bloomberg's API through R, so R is able to access the network.
Since the API is working and not blocked by my company's firewall, and R is in fact able to connect to the Internet, I have no clue what's going wrong.
The script works fine on my home computer, but at my office computer it is unsuccessful. So I gather it is a network issue, but if somebody could point me in any direction as to what the problem might be I would be grateful (my IT department couldn't help).
library(XML)
api.key = "e122a1411ca0ac941eb192ede51feebe"
series.id = "PET.MCREXUS1.M"
my.url = paste("http://api.eia.gov/series?series_id=", series.id,"&api_key=", api.key, "&out=xml", sep="")
doc = xmlParse(file=my.url, isURL=TRUE) # yields error
Error msg:
No such file or directory
failed to load external entity "http://api.eia.gov/series?series_id=PET.MCREXUS1.M&api_key=e122a1411ca0ac941eb192ede51feebe&out=json"
Error: 1: No such file or directory
2: failed to load external entity "http://api.eia.gov/series?series_id=PET.MCREXUS1.M&api_key=e122a1411ca0ac941eb192ede51feebe&out=json"
I tried some other methods like read_xml() from the xml2 package, but this gives a "could not resolve host" error.
To get XML, you need to change the out parameter in your URL to xml:
my.url = paste("http://api.eia.gov/series?series_id=", series.id,"&api_key=",
api.key, "&out=xml", sep="")
res <- httr::GET(my.url)
xml2::read_xml(res)
Or:
res <- httr::GET(my.url)
XML::xmlParse(httr::content(res, "text"), asText = TRUE)
Otherwise, with the URL as posted (i.e. &out=json):
res <- httr::GET(my.url)
jsonlite::fromJSON(httr::content(res,"text"))
or this:
xml2::read_xml(httr::content(res,"text"))
Please note that this answer simply provides a way to get the data, whether it is in the desired form is opinion based and up to whoever is processing the data.
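Since the output format is controlled entirely by the out parameter, it can help to build the URL in one place. The following is a sketch under assumptions: build_eia_url() and fetch_eia_series() are hypothetical helpers, not part of any package, and the endpoint is simply the one quoted in the question. The fetch function also checks the HTTP status before parsing, which makes a network problem easier to tell apart from a parsing problem.

```r
# Hypothetical helper: build the series URL for a given output format.
build_eia_url <- function(series_id, api_key, out = c("json", "xml")) {
  out <- match.arg(out)
  paste0(
    "https://api.eia.gov/series/",
    "?series_id=", utils::URLencode(series_id, reserved = TRUE),
    "&api_key=", utils::URLencode(api_key, reserved = TRUE),
    "&out=", out
  )
}

# Hypothetical helper: fetch and parse JSON; stops with the HTTP status on failure.
fetch_eia_series <- function(series_id, api_key) {
  res <- httr::GET(build_eia_url(series_id, api_key, out = "json"))
  httr::stop_for_status(res)
  jsonlite::fromJSON(httr::content(res, as = "text", encoding = "UTF-8"))
}

# Building the URL needs no network access; "yourkeyhere" is a placeholder
my.url <- build_eia_url("PET.MCREXUS1.M", "yourkeyhere", out = "xml")
```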
If it does not have to be XML output, you can also use the new eia package. (Disclaimer: I'm the author.)
Using your example:
remotes::install_github("leonawicz/eia")
library(eia)
x <- eia_series("PET.MCREXUS1.M")
This assumes your key is set globally (e.g., in .Renviron or previously in your R session with eia_set_key). But you can also pass it directly to the function call above by adding key = "yourkeyhere".
The result returned is a tidyverse-style data frame, one row per series ID and including a data list column that contains the data frame for each time series (can be unnested with tidyr::unnest if desired).
Alternatively, if you set the argument tidy = FALSE, it will return the list result of jsonlite::fromJSON without the "tidy" processing.
Finally, if you set tidy = NA, no processing is done at all and you get the original JSON string output for those who intend to pass the raw output to other canned code or software. The package does not provide XML output, however.
There are more comprehensive examples and vignettes at the eia package website I created.

Moving data from local directory to AWS

I'm very new to R so be gentle. I've been tasked to make some amendments to a pre-existing project.
I have some code:
#SHINY_ROOT <- getwd()
#ARCHIVE_FILEPATH <- file.path(SHINY_ROOT, 'Data', 'archived_pqs.csv')
I want to move 'archived_pqs.csv' into S3 (Amazon Web Services), preferably while making as few changes to the rest of the code as possible.
My first thought was that I could do this:
ARCHIVE_FILEPATH <- s3tools::s3_path_to_full_df("alpha-pq-tool-data/Data/archived_pqs.csv")
Where 'alpha-pq-tool-data' is the S3 bucket.
I've tested this and it does indeed pull in the dataframe:
df <-s3tools::s3_path_to_full_df("alpha-pq-tool-data/Data/archived_pqs.csv")
The issue is that when I run other functions that go as follows:
if (file.exists(ARCHIVE_FILEPATH)) {
  date <- last_answer_date()
}
I get this error:
Error in file.exists(ARCHIVE_FILEPATH) : invalid 'file' argument
Called from: file.exists(ARCHIVE_FILEPATH)
Is there an easy way of doing this while making minimal changes? Can I no longer use the file.exists function because the data is in S3?
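On the last point: file.exists() expects a character vector of paths, and after the change ARCHIVE_FILEPATH holds a data frame rather than a path, which is why the argument is rejected. One low-change option, sketched below under assumptions, is to write the downloaded data frame to a local temporary file so that ARCHIVE_FILEPATH stays a path. The data frame here is a stand-in for the result of s3tools::s3_path_to_full_df(), and its columns are made up.

```r
# Stand-in for: s3tools::s3_path_to_full_df("alpha-pq-tool-data/Data/archived_pqs.csv")
archived_pqs <- data.frame(id = 1:2, answer_date = c("2017-01-01", "2017-02-01"))

# file.exists() needs a character path; passing a data frame raises an error
res <- try(file.exists(archived_pqs), silent = TRUE)
inherits(res, "try-error")

# Cache the data locally so that ARCHIVE_FILEPATH is a path again
ARCHIVE_FILEPATH <- file.path(tempdir(), "archived_pqs.csv")
utils::write.csv(archived_pqs, ARCHIVE_FILEPATH, row.names = FALSE)

# Existing code that tests or reads the path can then stay unchanged
if (file.exists(ARCHIVE_FILEPATH)) {
  archive <- utils::read.csv(ARCHIVE_FILEPATH)
}
```

The trade-off is that the local copy is only as fresh as the last download, so any code that writes to the archive would also need to push the file back to S3.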

Harvesting data with rvest retrieves no value from data-widget

I'm trying to harvest data using rvest (also tried using XML and selectr) but I am having difficulties with the following problem:
In my browser's web inspector the html looks like
<span data-widget="turboBinary_tradologic1_rate" class="widgetPlaceholder widgetRate rate-down">1226.45</span>
(Note: rate-down and 1226.45 are updated periodically.) I want to harvest the 1226.45, but when I run my code (below) it says there is no information stored there. Does this have something to do with the fact that it's a widget? Any suggestions on how to proceed would be appreciated.
library(rvest);library(selectr);library(XML)
zoom.turbo.url <- "https://www.zoomtrader.com/trade-now?game=turbo"
zoom.turbo <- read_html(zoom.turbo.url)
# Navigate to node
zoom.turbo <- zoom.turbo %>% html_nodes("span") %>% `[[`(90)
# No value
as.character(zoom.turbo)
html_text(zoom.turbo)
# Using XML and Selectr
doc <- htmlParse(zoom.turbo, asText = TRUE)
xmlValue(querySelector(doc, 'span'))
For websites that are difficult to scrape, for example where the content is dynamic, you can use RSelenium. With this package and a browser running in a Docker container, you can navigate websites with R commands.
I have used this method to scrape a website that had a dynamic login script, that I could not get to work with other methods.
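A minimal sketch of that approach for the widget in the question, under several assumptions: a Selenium server is already listening on localhost:4445 (for example a standalone Firefox container), a fixed pause is enough for the page's JavaScript to fill in the value, and the function name scrape_rate() is made up for illustration. The call at the end is commented out because it needs the running server.

```r
# Sketch: read the text of the rate widget after the browser has rendered the page.
# Assumes a Selenium server is already listening on host:port.
scrape_rate <- function(url,
                        selector = "span[data-widget='turboBinary_tradologic1_rate']",
                        host = "localhost", port = 4445L, wait_seconds = 5) {
  remDr <- RSelenium::remoteDriver(
    remoteServerAddr = host, port = port, browserName = "firefox"
  )
  remDr$open(silent = TRUE)
  on.exit(remDr$close(), add = TRUE)  # always close the browser session

  remDr$navigate(url)
  Sys.sleep(wait_seconds)  # crude wait for the JavaScript to fill in the value

  element <- remDr$findElement(using = "css selector", value = selector)
  element$getElementText()[[1]]
}

# rate <- scrape_rate("https://www.zoomtrader.com/trade-now?game=turbo")
```

Selecting by the data-widget attribute is less brittle than picking the 90th span by position, since the number of span elements on the page can change.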

Resources