How to download files from a SharePoint folder with R

I have some files in my company's SharePoint folder, and I need to use R to download all files in that folder. How can I do that with R?
I googled the question and found this link about uploading a file to a SharePoint folder:
Uploading files to SharePoint from R
The code from that link is as follows:
saveToSharePoint <- function(fileName)
{
  cmd <- paste("curl --max-time 7200 --connect-timeout 7200 --ntlm --user", "username:password",
               "--upload-file /home/username/FolderNameWhereTheFileToTransferExists/", fileName,
               "teamsites.OrganizationName.com/sites/PageTitle/Documents/UserDocumentation/FolderNameWhereTheFileNeedsToBeCopied/", fileName, sep = " ")
  system(cmd)
}
saveToSharePoint("SomeFileName.Ext")
It's about uploading one specific file, but what I need is to download files from a SharePoint folder.
So, I modified the code to copy from the SharePoint folder, changing --upload-file to --download-file:
copyFromSharePoint <- function(fileName)
{
  cmd <- paste("curl --max-time 7200 --connect-timeout 7200 --ntlm --user", "username:password",
               "--download-file teamsites.OrganizationName.com/sites/PageTitle/Documents/FolderNameWhereTheFileToTransferExists/", fileName,
               "home/username/UserDocumentation/FolderNameWhereTheFileNeedsToBeCopied/", fileName, sep = " ")
  system(cmd)
}
copyFromSharePoint("SomeFileName.Ext")
However, that does not seem to work. Error message below:
curl: option --download-file: is unknown
Does anyone know how I can do it with R?
In addition, what if I need to download all files in that folder, not just one specific file? How can I do that with R?
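(As an aside on the error itself: curl has no --download-file option; a download is just a transfer written to disk with -o/--output. A minimal corrected sketch of the attempt above, still using the placeholder credentials and paths, would be:)
copyFromSharePoint <- function(fileName)
{
  # -o writes the response body to the given local path
  cmd <- paste0("curl --max-time 7200 --connect-timeout 7200 --ntlm --user username:password",
                " -o /home/username/UserDocumentation/FolderNameWhereTheFileNeedsToBeCopied/", fileName,
                " teamsites.OrganizationName.com/sites/PageTitle/Documents/FolderNameWhereTheFileToTransferExists/", fileName)
  system(cmd)
}
copyFromSharePoint("SomeFileName.Ext")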

Solution
I made a simple R Package called sharepointr to upload and download files from SharePoint.
The way I made the whole download/upload to SharePoint work from R was to:
Make a SharePoint App Registration
Add permissions to the App Registration
Get "tenant_id" and "resource_id" using cURL (sketched below)
Make GET/POST calls to the API
It is all described in the package Readme.md
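The cURL step for "tenant_id" and "resource_id" can also be reproduced from R. A hedged sketch, assuming the add-in/ACS flow the Readme describes: an unauthenticated request to /_vti_bin/client.svc with an empty Bearer header returns a 401 whose WWW-Authenticate header carries the realm (tenant_id) and a client_id (resource_id).
# Sketch only; the authoritative commands are in the package Readme
library(httr)
resp <- GET("https://yourorganisation.sharepoint.com/_vti_bin/client.svc",
            add_headers(Authorization = "Bearer"))
headers(resp)[["www-authenticate"]]
# e.g. Bearer realm="<tenant_id>", client_id="<resource_id>", ...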
Example
# Installing the package
install.packages("devtools")
devtools::install_github("esbeneickhardt/sharepointr")
# Setting parameters
client_id <- "insert_from_first_step"
client_secret <- "insert_from_first_step"
tenant_id <- "insert_from_fourth_step"
resource_id <- "insert_from_fourth_step"
site_domain <- "yourorganisation.sharepoint.com"
sharepoint_url <- "https://yourorganisation.sharepoint.com/sites/MyTestSite"
# Getting a SharePoint Token
sharepoint_token <- get_sharepoint_token(client_id, client_secret, tenant_id, resource_id, site_domain)
# Getting sharepoint digest value
sharepoint_digest_value <- get_sharepoint_digest_value(sharepoint_token, sharepoint_url)
# Downloading a file
sharepoint_path <- "Shared Documents/test"
sharepoint_file_name <- "Mappe.xlsx"
out_path <- "C:/Users/User/Desktop/"
download_sharepoint_file(sharepoint_token, sharepoint_url, sharepoint_digest_value, sharepoint_path, sharepoint_file_name, out_path)
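For the "download all files in the folder" part of the question, the example above fetches one named file. A hedged sketch, assuming the token is also accepted as a Bearer token by SharePoint's REST API: list the folder's files via /_api/web/GetFolderByServerRelativeUrl and then call download_sharepoint_file() for each name.
library(httr)
files_url <- paste0(sharepoint_url,
                    "/_api/web/GetFolderByServerRelativeUrl('Shared Documents/test')/Files")
resp <- GET(files_url,
            add_headers(Authorization = paste("Bearer", sharepoint_token),
                        Accept = "application/json;odata=verbose"))
file_names <- vapply(content(resp)$d$results, function(x) x$Name, character(1))
for (f in file_names) {
  download_sharepoint_file(sharepoint_token, sharepoint_url, sharepoint_digest_value,
                           sharepoint_path, f, out_path)
}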

Related

Read open excel file in R

Is there a way to read an open Excel file into R?
When an Excel file is open in Excel, Excel puts a lock on it, such that the reading method in R cannot access the file.
Can you circumvent this lock?
Thanks
Edit: this occurs under Windows with the original Excel.
I too do not have a problem opening .xlsx files that are already open in Excel, but if you do, I have a workaround that might work:
path_to_xlsx <- "C:/Some/Path/to/test.xlsx"
temp <- tempdir()
file.copy(path_to_xlsx, to = paste0(temp, "/test.xlsx"))
df <- openxlsx::read.xlsx(paste0(temp, "/test.xlsx"))
This copies the file (which should not be blocked) to a temporary directory and then loads it from there. Again, I'm not sure if this is needed, as I do not have the problem you have.
You could try something like this using the ps package. I've used it on Windows and Mac to read from files that I had downloaded from some web resource and opened in Excel with openxlsx2, but it should work with other packages or programs too.
# get the path to the open file via the ps package
library(ps)
p <- ps()
# get the pid for the current program, in my case Excel on Mac
ppid <- p$pid[grepl("Excel", p$name)]
# get the list of open files for that program
pfiles <- ps_open_files(ps_handle(ppid))
pfile <- pfiles[grepl(".xlsx", pfiles$path),]
# filter out Excel's temporary ~$ lock files and return the path to the workbook
sel <- grepl("^(.|[^~].*)\\.xlsx", basename(pfile$path))
path <- pfile$path[sel]
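From there, reading the located workbook is one more line (a hypothetical follow-up; I used openxlsx2, but any Excel reader should work):
df <- openxlsx2::read_xlsx(path)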
What do you mean by "the reading method in R", and by "cannot access the file" (i.e. what code are you using and what error message do you get exactly)? I'm successfully importing Excel files that are currently open, with something like:
dat <- readxl::read_excel("PATH/TO/FILE.xlsx")
If the file is being edited in Excel, R imports the last saved version.
EDIT: I've now tried it on both Linux and Windows and it still works, at least with version 1.3.1 of 'readxl'.

Download a large zipped CSV file, unzip and read into R on Linux

I wish to read a large CSV (~8 GB) into my environment, but I am having issues.
My data is a publicly available dataset:
# CREATE A TEMP FILE TO STORE THE DOWNLOADED DATA
temp <- tempfile()
# DOWNLOAD THE FILE FROM THE CMS
download.file("https://download.cms.gov/nppes/NPPES_Data_Dissemination_February_2022.zip",
destfile = temp)
This is where I'm running into difficulty: I am unfamiliar with Linux working directories and where temp folders are created.
When I use list.dirs() or list.files(), I don't see any reference to this temp file.
I am working in an R project and my working directory is as follows:
getwd()
[1] "/home/myName/myProjectName"
I'm able to read in the first part of the file, but my system crashes after about 4 GB.
# UNZIP THE NPI FILE
npi <- unz(temp, "npidata_pfile_20050523-20220213.csv")
I then came across this post, which has a function for decompressing large zip files using the system2 unzip functionality. However, due to my limited R knowledge and Linux experience, I couldn't get the function to point to the downloaded file in the temp folder.
Checking the path of temp above, I get the following:
temp
[1] "/tmp/Rtmpl6SHIJ/file7e5e6c1fc693"
Using the system2 function from the link above I tried the following:
x <- decompress_file(directory = temp,
file = "NPPES_Data_Dissemination_February_2022.zip")
But I get the following error about setting the working directory:
Any pointers on how I can get this file unzipped, given its size, and read into memory would be much appreciated.
It might be a file permission issue. To get around it, work in a directory you're already in or know you have access to.
# DOWNLOAD THE FILE
# to a directory you can access, and name the file. No need to overcomplicate this.
# (the download is a .zip archive, so give it a .zip name)
download.file("https://download.cms.gov/nppes/NPPES_Data_Dissemination_February_2022.zip",
              destfile = "/home/myName/myProjectName/npi.zip")
# use the decompress function if you need to, though unzip might work
x <- decompress_file(directory = "/home/myName/myProjectName/",
                     file = "npi.zip")
# remove .zip file if you need the space back
file.remove("/home/myName/myProjectName/npi.zip")
temp is the path to the file, not just the directory. By default, tempfile() does not add a file extension; one can be added with tempfile(fileext = ".zip").
Consequently, decompress_file cannot set its working directory to a file. Try this:
x <- decompress_file(directory = dirname(temp), file = basename(temp))
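Once the archive is extracted, reading an ~8 GB CSV with read.csv will likely exhaust memory again; a hedged sketch using data.table::fread (CSV name taken from the question, path assumed to be wherever you unzipped) is usually the more practical route:
library(data.table)
npi <- fread("/home/myName/myProjectName/npidata_pfile_20050523-20220213.csv")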

Getting a zip file with R httr package: file is empty

I am downloading a zip file from a site that uses basic authentication. The download seems to go OK, but when I try to unzip the file, it turns out to be empty. When I download the file by hand, it has several folders and files in it.
Here's what I am doing:
library(httr)
dest <- paste0(getwd(), "/data/weekly_2017-02-18.zip")
GET("https://www.example.com/weeklydata/weekly_2017-02-18.zip",
authenticate("myemail", "mypassword", "basic"),
write_disk(dest, overwrite = TRUE))
unzip(dest) # <-- THIS FAILS
What am I doing wrong?
unzip(dest)) <-- there is an extra closing bracket here
use:
unzipped <- unzip(dest)
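If the extracted result is still empty, a hedged diagnostic sketch is to check that the request really returned a zip payload rather than, say, an HTML login page:
resp <- GET("https://www.example.com/weeklydata/weekly_2017-02-18.zip",
            authenticate("myemail", "mypassword", "basic"),
            write_disk(dest, overwrite = TRUE))
http_status(resp)$message
headers(resp)[["content-type"]]  # should look like application/zip or application/octet-stream
file.size(dest)                  # should roughly match the size of the manually downloaded file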

Copying file to sharepoint library in R

I am trying to copy files from a network drive location to a SharePoint library in R. The SharePoint library location requires user authentication, and I was wondering how I can copy these files while passing authentication in code. A simple file.copy() does not work. I attempted to use the getURL() function from the RCurl library, but that hasn't worked either. How can I accomplish this task of copying files while passing authentication?
Here are some code snippets that I have tried so far:
library(RCurl)
from <- "filename"
to <- "\\\\sharepoint.company.com\\Directory" #First attempt with just sharepoint location
to <- "file://sharepoint.company.com/Directory" #Another attempt with different format
h = getCurlHandle(header = TRUE, userpwd = "username:password")
getURL(to, verbose = TRUE, curl = h)
status <- file.copy(from, to)
Thank you!
Not the most elegant solution but if you're looking to save into a single library on SharePoint, you can first map that library as a drive on your local machine.
Simply use setwd() to point to whatever drive letter you mapped the library to. You can then treat that SharePoint library as if it were any other shared drive location, reading and writing files from/to it.
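For example (a hypothetical sketch; the drive letter and source path are placeholders, assuming the library was mapped to S: on Windows):
setwd("S:/Directory")
file.copy(from = "\\\\networkdrive\\share\\filename.ext", to = "filename.ext", overwrite = TRUE)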
I just use the following function to copy files to SharePoint. The only issue is that the transferred file will remain checked out until it is manually checked in for others to use.
saveToSharePoint <- function(fileName)
{
  cmd <- paste("curl --max-time 7200 --connect-timeout 7200 --ntlm --user", "username:password",
               "--upload-file /home/username/FolderNameWhereTheFileToTransferExists/", fileName,
               "teamsites.OrganizationName.com/sites/PageTitle/Documents/UserDocumentation/FolderNameWhereTheFileNeedsToBeCopied/", fileName, sep = " ")
  system(cmd)
}
saveToSharePoint("SomeFileName.Ext")
If you have SharePoint Online, you can navigate to that library and click on the "Sync to Computer" button (it has an icon with arrows and a computer). Then you have it as a OneDrive folder and can write directly to it.

How to open Excel 2007 File from Password Protected Sharepoint 2007 site in R using RODBC or RCurl?

I am interested in opening an Excel 2007 file in R 2.11.1 using RODBC. The Excel file resides in the shared documents page of a MOSS2007 website. I currently download the .xlsx file to my hard drive and then import to R using the following code:
library(RODBC)
con <- odbcConnectExcel2007("C:/file location/file.xlsx")
data <- sqlFetch(con, "worksheet name")
close(con)
When I type the web URL for the document into the odbcConnectExcel2007 connection, an error message pops up:
ODBC Excel Driver Login Failed: Invalid internet Address.
followed by the following message in my R console:
ERROR: Could not SQLDriverConnect
Any insights you can provide would be greatly appreciated.
Thanks!
**UPDATE**
The site I am attempting to download from is password protected. I tried another method using the getURL() function from the RCurl package:
x = getURL("http://website.com/file.xlsx", userpwd = "uname:pw")
The error that I receive is:
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
embedded nul in string: 'PK\003\004\024\0\006\0\b\0\0\0!\0dA»ï\001\0\0O\n\0\0\023\0Ò\001[Content_Types].xml ¢Î\001( \0\002\0\0\0 ... [long binary preview truncated]'
I have no idea what this means. Any help would be appreciated. Thanks!
Two solutions worked for me.
If you do not need to automate the script that pulls the data, you can map a network drive pointing to the sharepoint folder from which you want to extract the Excel document.
If you need to automate a script to pull the Excel file every couple of minutes, I recommend sending your authentication credentials in a request that automatically saves the file to a local drive. From there you can read it into R for further data wrangling.
library("httr")
library("openxlsx")
user <- "<USERNAME>"
password <- "<PASSWORD>"
url <- "https://sharepoint.company/file_to_obtain.xlsx"
httr::GET(url,
          authenticate(user, password, type = "ntlm"),
          write_disk("C:/tempfile.xlsx", overwrite = TRUE))
df <- openxlsx::read.xlsx("C:/tempfile.xlsx")
You can obtain the correct URL to the file by clicking on the SharePoint location and removing "?Web=1" after the file ending (xlsx, xlsb, xls, ...). USERNAME and PASSWORD are usually Windows credentials. It helps to store them in a key manager, for example with the keyring package:
keyring::key_set_with_value(service = "Windows", username = "Key", password = "<PASSWORD>")
and then authenticating via
authenticate(user, keyring::key_get("Windows", "Key"), type = "ntlm")
In some instances, it may be sufficient to pass
authenticate(":", ":", type = "ntlm")
if only your Windows credentials are required and the code is running from your machine.
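As an aside on the embedded nul error in the question: 'PK\003\004' is the zip magic number, so getURL did reach the workbook but tried to return the binary .xlsx payload as a character string. A hedged sketch that stays with RCurl fetches the raw bytes instead and writes them to disk:
library(RCurl)
# fetch the response body as a raw vector rather than a character string
bin <- getBinaryURL("http://website.com/file.xlsx", userpwd = "uname:pw")
writeBin(bin, "file.xlsx")
df <- openxlsx::read.xlsx("file.xlsx")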
