I need to download an Excel file shared within my company on OneDrive, but I haven't been able to manage it. The link looks like this:
https://company.sharepoint.com/:x:/p/user_name/XXXXXXXXXXXXXXXXXX
After adding the parameter download=1 to the URL, the browser downloads the file automatically, but I can't write R code that downloads it.
I tried to download the file with this function
httr::GET(url, httr::authenticate("username", "password", type = "any"))
I also tried to get a list of files using Microsoft365R, but even after IT granted me access, list_sharepoint_sites() returns an empty list.
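A minimal sketch of the download=1 route with httr, assuming the share link is reachable without an interactive login, which is often not true on corporate SharePoint; if the saved file turns out to be an HTML login page, authenticated access (e.g. through Microsoft365R and the Graph API) is needed rather than basic auth:

library(httr)

# Placeholder share link; ?download=1 asks SharePoint for the raw file
url <- "https://company.sharepoint.com/:x:/p/user_name/XXXXXXXXXXXXXXXXXX?download=1"

resp <- GET(url, write_disk("report.xlsx", overwrite = TRUE))
stop_for_status(resp)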
Related
I am trying to download files (they could be png, csv, or pdf) from an internal URL which seems to have redirects. I have tried downloading using download.file() with method = "curl" and extra = "-L", the download() function from the downloader package, and the httr package.
However, in all cases I get a file of 768 B. Saving this file as a .txt shows that there is another URL inside it. I have tried using that URL, but without success. I see the following message (along with other information) in the downloaded file:
Since your browser does not support JavaScript, you must press the Resume button once to proceed.
What does work: if I pass that URL to the browseURL() function, I get a prompt to save the desired file.
I need to run the script in batch mode for reproducibility purposes. Is there any way to run browseURL() in batch mode, or is there any other tool that would be useful here? (I have tried read.csv(), fread(), etc. without success.) Unfortunately, I can't share the URL as it is internal to my organization.
Thanks
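One batch-mode avenue worth trying, sketched below with an entirely placeholder URL: the "press the Resume button once to proceed" page is usually a plain HTML form (often a SAML hand-off) that the browser submits via JavaScript, and it can be parsed and POSTed without a browser. The field handling here is an assumption about what that intermediate page contains.

library(httr)
library(rvest)

# The first request returns the small HTML page instead of the file
resp <- GET("https://internal.example.com/report.csv")
doc  <- read_html(content(resp, "text"))

# Collect the hidden form and its input fields
form   <- html_element(doc, "form")
action <- html_attr(form, "action")
inputs <- html_elements(form, "input")
body   <- setNames(as.list(html_attr(inputs, "value")),
                   html_attr(inputs, "name"))

# Submit the form ourselves, the way the Resume button would,
# and stream the real file to disk
final <- POST(action, body = body, encode = "form",
              write_disk("report.csv", overwrite = TRUE))
stop_for_status(final)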
I have a Google Form that accepts file uploads, all of them in zip format.
That form generates a spreadsheet where each row has a link with a unique URI that represents a zip. I want to download all of these zip files to disk.
Using gspread it is easy to get all the URIs. However, these do not have .zip extensions, and they seem to be Google Drive paths.
I've tried extracting the ids from the URI and using the requests package to get:
https://drive.google.com/uc?export=download&id=DRIVE_FILE_ID
https://drive.google.com/u/2/uc?id=DRIVE_FILE_ID&export=download
but neither of these approaches seemed to work.
It seems like the URI links to a preview of the inside of the zip, but I can't figure out how to simply download it programmatically. Clicking on hundreds of links and downloading each by hand isn't really an option.
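In R, which the rest of this page uses, the googledrive package avoids those uc?export=download preview pages entirely because it downloads through the authenticated Drive API. A sketch assuming ids is a character vector of the file IDs extracted from the spreadsheet:

library(googledrive)

# Hypothetical vector of Drive file IDs pulled from the form's spreadsheet
ids <- c("DRIVE_FILE_ID_1", "DRIVE_FILE_ID_2")

for (id in ids) {
  # Downloads each zip under its original file name via the Drive API,
  # so no HTML preview or confirmation page gets in the way
  drive_download(as_id(id), overwrite = TRUE)
}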
I am trying to download a .tif file from my Google Drive folder (which is exported to it via Google Earth Engine), using the googledrive library. However, when calling the map function, I get the following error:
Error: 'file' identifies more than one Drive file.
I have already managed to download other .tif files with this code, which worked without any error. Why do I get this error, and how do I resolve it? As you can see in the Drive folder (it's public), the folder contains only one file, so why does 'file' identify more than one Drive file?
Code:
library(googledrive)
library(purrr)
## Store the URL to the folder
folder_url <- "https://drive.google.com/drive/folders/1Qdp0GN7_BZoU70OrpbEL-vIBBxBa1_Db"
## Identify this folder on Google Drive
## let googledrive know this is a file ID or URL, as opposed to file name
folder <- drive_get(as_id(folder_url))
## Identify files in the folder
files <- drive_ls(folder, pattern = "*.tif")
# Download all files in folder
map(files$name, overwrite = T, drive_download)
The Google Drive API's Files: list method returns an array by default.
Even if the results contain only one file, or no files at all, it will still be an array.
All you need to do is retrieve the first element of this array. You can verify this by testing with the Try this API feature.
In R, which is 1-indexed, that would be files$name[1] to retrieve the name of the first (even if it is the only) file.
Also, you should add a condition to verify that the list of files is not empty before you retrieve the file name, as sketched below.
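A minimal sketch of that guard in R, reusing the files dribble from the question:

# Guard against an empty listing before touching the first element
if (nrow(files) == 0) {
  stop("No .tif files found in the folder")
}
# R is 1-indexed: the name of the first (possibly only) file
first_name <- files$name[1]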
Google Drive allows multiple files with exactly the same name in the same folder. The googledrive library does not accept this and will throw an error. However, even after deleting the duplicate files, the error wasn't solved. It seems that Google Drive also keeps some kind of hidden record/cache of the files, even after they are deleted. Only by deleting the entire folder and recreating it was I able to resolve the error.
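A way to sidestep name collisions entirely is to download by file ID rather than by name, since IDs are unique. A sketch reusing the files dribble from the question:

library(googledrive)
library(purrr)

# IDs are unique, so duplicate (or cached) names can no longer make
# drive_download() ambiguous
walk2(files$id, files$name,
      ~ drive_download(as_id(.x), path = .y, overwrite = TRUE))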
I would like to download the excel file for a given date from this page:
http://www.mcxindia.com/SitePages/BhavCopy.aspx
The way it works is:
Request the Excel file by posting a date
This returns a page with a snapshot of the file and a function that generates the Excel URL
Invoke the function to generate the URL of the Excel file
Download the Excel file
Usually, I post code while asking questions, but in this case I have simply no idea where to start.
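A possible starting point, though every field name below is an assumption to verify in the browser's developer tools: ASP.NET pages like BhavCopy.aspx usually require POSTing the hidden __VIEWSTATE and __EVENTVALIDATION values back to the page together with the chosen date.

library(httr)
library(rvest)

base <- "http://www.mcxindia.com/SitePages/BhavCopy.aspx"

# Fetch the page once to harvest the hidden ASP.NET state fields
page <- GET(base)
doc  <- read_html(content(page, "text"))
viewstate  <- html_attr(html_element(doc, "input[name='__VIEWSTATE']"), "value")
validation <- html_attr(html_element(doc, "input[name='__EVENTVALIDATION']"), "value")

# Post the date back to the same page (the date field name is a guess)
resp <- POST(base,
             body = list(`__VIEWSTATE`       = viewstate,
                         `__EVENTVALIDATION` = validation,
                         mTbdate             = "04/01/2016"),
             encode = "form")

# The Excel link can then be extracted from content(resp, "text") with
# rvest and fetched with download.file(..., mode = "wb")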
I'm trying to get data for RAIS (a Brazilian employee registry dataset) that is shared using a Google Drive public folder. This is the address:
https://drive.google.com/folderview?id=0ByKsqUnItyBhZmNwaXpnNXBHMzQ&usp=sharing&tid=0ByKsqUnItyBhU2RmdUloTnJGRGM#list
Data is divided into one folder per year, and within each folder there is one file per state to download. I would like to automate the downloading process in R for all years, or failing that, at least within each year's folder. Downloaded file names should match the names produced when downloading manually.
I know a little R, but no web programming or web scraping. This is what I've got so far:
By manually downloading the first of the 2012 file, I could see the URL my browser used to download:
https://drive.google.com/uc?id=0ByKsqUnItyBhS2RQdFJ2Q0RrN0k&export=download
Thus, I suppose the file id is: 0ByKsqUnItyBhS2RQdFJ2Q0RrN0k
Searching the HTML code of the 2012 page, I was able to find that ID and the file name associated with it: AC2012.7z.
All the other IDs and file names are in that section of the HTML code. So, assuming I can download one file correctly, I suppose I could generalize to the other files.
In R, I tried the following code to download the file:
url <- "https://drive.google.com/uc?id=0ByKsqUnItyBhS2RQdFJ2Q0RrN0k&export=download"
download.file(url,"AC2012.7z")
unzip("AC2012.7z")
It does download, but I get an error when trying to uncompress the file (both within R and manually with 7-Zip). There must be something wrong with the file downloaded in R, as the file size (3.412Kb) does not match what I get from manually downloading the file (3.399Kb).
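One likely culprit, though it is a guess without access to the original session: on Windows, download.file() defaults to text mode, which corrupts binary downloads, and base unzip() only reads .zip archives, not .7z. A sketch of both fixes:

url <- "https://drive.google.com/uc?id=0ByKsqUnItyBhS2RQdFJ2Q0RrN0k&export=download"

# mode = "wb" forces a binary transfer; without it, Windows builds of R
# translate line endings and corrupt the archive
download.file(url, "AC2012.7z", mode = "wb")

# Base unzip() cannot extract .7z; call an external 7-Zip instead
# (a "7z" executable on the PATH is an assumption about your system)
system2("7z", c("x", "AC2012.7z"))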
For anyone trying to solve this problem today, you can use the googledrive package.
library(googledrive)
# List every file in the shared folder; this returns a tibble (dribble)
# with one row per file, including its name and unique ID
ls_tibble <- googledrive::drive_ls(GOOGLE_DRIVE_URL_FOR_THE_TARGET_FOLDER)
# Download each file by its unique ID, keeping its original name
for (file_id in ls_tibble$id) {
  googledrive::drive_download(as_id(file_id))
}
This will (1) trigger an authentication page to open in your browser to authorise the Tidyverse libraries using gargle to access Google Drive on behalf of your account and (2) download all the files in the folder at that URL to your current working directory for the current R session.