downloading from a url with redirects - r

I am trying to download files (they could be PNG, CSV or PDF) from an internal URL that appears to redirect. I have tried download.file with extra = "-L", the download function from the downloader package, and the httr package.
However, in every case I get a file of 768 B. Opening this file as text shows that it contains another URL. I have tried using that URL, but without success. Among other information, the downloaded file contains the following message:
Since your browser does not support JavaScript, you must press the Resume button once to proceed.
What does work is passing that URL to the browseURL function: I then get a prompt to save the desired file.
I need to run the script in batch mode for reproducibility, so is there any way to run browseURL in batch mode, or is there another tool that would help here? (I have tried read.csv, fread, etc. without success.) Unfortunately, I can't share the URL, as it is internal to my organization.
Thanks
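The "press the Resume button" message is typical of a single sign-on page that relies on JavaScript to auto-submit a hidden form. A minimal sketch, assuming that is what the 768 B file is: parse the form's action URL and hidden fields with base R regexes, then POST them back with httr, which reuses a handle (and therefore its cookies) per host within a session. `parse_resume_form()` and `resume_download()` are hypothetical helpers; the regexes only handle simple, double-quoted attributes, and a relative action URL or a chain of several such pages would need extra handling.

```r
# Hypothetical helpers; assumes the saved page is an SSO "Resume" form.
parse_resume_form <- function(html) {
  form_tag <- regmatches(html, regexpr("<form[^>]*>", html, ignore.case = TRUE))
  if (length(form_tag) == 0L) stop("no <form> found in page")
  # Target of the form; HTML-escaped ampersands are turned back into '&'
  action <- sub('^<form[^>]*action="([^"]*)"[^>]*>$', "\\1", form_tag,
                ignore.case = TRUE)
  action <- gsub("&amp;", "&", action, fixed = TRUE)
  # Keep only hidden <input> tags that carry a name
  inputs <- regmatches(html, gregexpr("<input[^>]*>", html, ignore.case = TRUE))[[1]]
  hidden <- inputs[grepl('type="hidden"', inputs, ignore.case = TRUE) &
                     grepl('name="', inputs, fixed = TRUE)]
  field_names <- sub('^<input[^>]*name="([^"]*)"[^>]*>$', "\\1", hidden)
  field_values <- ifelse(grepl('value="', hidden, fixed = TRUE),
                         sub('^<input[^>]*value="([^"]*)"[^>]*>$', "\\1", hidden),
                         "")
  list(action = action, fields = as.list(setNames(field_values, field_names)))
}

# Fetch the page, then submit its hidden form the way the browser's JavaScript would
resume_download <- function(url, dest) {
  first <- httr::GET(url)
  html <- httr::content(first, as = "text", encoding = "UTF-8")
  form <- parse_resume_form(html)
  res <- httr::POST(form$action, body = form$fields, encode = "form",
                    httr::write_disk(dest, overwrite = TRUE))
  httr::stop_for_status(res)
  invisible(dest)
}
```

In batch mode this might be called as `resume_download(url, "report.pdf")`. If the saved file is still a small HTML page, the sign-in flow probably involves more steps, or real credentials, than this sketch handles.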

Related

Download Onedrive shared file in R

I need to download an Excel file shared within my company via OneDrive, but I can't get it to work. The link looks like this:
https://company.sharepoint.com/:x:/p/user_name/XXXXXXXXXXXXXXXXXX
After adding the parameter download=1 to the URL, the browser downloads the file automatically, but I haven't managed to write R code that downloads it.
I tried to download the file with this call:
httr::GET(url, httr::authenticate("username", "password", type = "any"))
I also tried to list the files using Microsoft365R, but even after IT granted access, list_sharepoint_sites() returns an empty list.
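Adding `download=1` by hand is easy to automate. A minimal sketch with a hypothetical helper, `add_download_param()`, that appends the parameter with `?` or `&` as appropriate. For links restricted to your organization, a plain request without the browser's sign-in cookies will probably receive a login page rather than the workbook, so this step alone may not be enough.

```r
# Hypothetical helper: append download=1 to a OneDrive/SharePoint share link,
# using '?' or '&' depending on whether the URL already has a query string.
add_download_param <- function(url) {
  if (grepl("[?&]download=1(&|$)", url)) return(url)  # already present
  sep <- if (grepl("?", url, fixed = TRUE)) "&" else "?"
  paste0(url, sep, "download=1")
}
```

If the link is reachable without sign-in, usage could look like `download.file(add_download_param(url), "file.xlsx", mode = "wb")` followed by `readxl::read_excel("file.xlsx")`.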

readRDS from url

I saved a dataset in an RDS file on OneDrive and created a shareable link. When I use that link to read the file, I get an error.
readRDS(url("https://1drv.ms/u/s!Am3aUTxhPMS8iM4pqe5fUZbiA4m9rw"))
#> Error in readRDS(url("https://1drv.ms/u/s!Am3aUTxhPMS8iM4pqe5fUZbiA4m9rw")): unknown input format
On the other hand, if I start the download in the browser, copy the download's link address and use that one, it works.
Unfortunately, a link obtained this way is only valid for a limited time; it doesn't work permanently.
I know about googledrive and rdrop2, so I have some workarounds,
but I still don't understand the logic behind this.
Any help?
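One likely explanation is that the 1drv.ms share link answers with a web page (or a redirect to one) rather than the RDS bytes, so readRDS() sees HTML and reports "unknown input format", while the temporary link copied from the browser's download points at the file itself. A minimal sketch, assuming you have a URL that serves the raw file: download it in binary mode to a temporary file and read that. `read_rds_url()` is a hypothetical helper name.

```r
# Hypothetical helper: download first (binary mode), then deserialize the local copy.
read_rds_url <- function(url) {
  tmp <- tempfile(fileext = ".rds")
  on.exit(unlink(tmp), add = TRUE)
  utils::download.file(url, destfile = tmp, mode = "wb", quiet = TRUE)
  # If the URL served an HTML page instead, readRDS() is expected to fail here
  readRDS(tmp)
}
```

This does not make a share-page link work by itself; it only separates "what did the server send?" from "can R read it?", which makes the failure easier to inspect.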

Reading OneDrive files to R

When I read in csv files from Dropbox into R, I right-click the file and click share Dropbox link. I then have a URL something like:
https://www.dropbox.com/blahhhhhhhhhh.csv?dl=0
So I change it to:
read.csv("http://dl.dropbox.com/blahhhhhhhhhh.csv?dl=0", ...) and it works without the need to use any packages etc.
Is there a way to read files from OneDrive in a similar manner?
https://onedrive.live.com/blahhhhhhhhhhhhhhhhccsv
When I try to read it into R, I don't get the data frame I expect from the file.
I tested this with public OneDrive link:
download the file
get the URL from the Downloads page (Ctrl+J in Chrome):
paste the URL into read.csv("url...")
This works for me even when the public link changes.
OneDrive gave me "embed" in the URL, so I changed it to "download" and got the result.
None of those answers worked for me, but I found an alternative solution that does.
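Both tricks in this thread amount to rewriting a share link into a direct-download link: Dropbox serves a preview page for `dl=0` and the file itself for `dl=1`, and the answer above reports that swapping `embed` for `download` works for OneDrive. A sketch of that rewrite as a hypothetical helper, `to_direct_link()`; the OneDrive rule only covers `onedrive.live.com/embed` links and rests solely on that report.

```r
# Hypothetical helper: turn a share/preview link into a direct-download link.
to_direct_link <- function(url) {
  if (grepl("dropbox\\.com", url)) {
    # Dropbox: dl=0 shows a preview page, dl=1 asks for the file itself
    if (grepl("[?&]dl=0", url)) return(sub("([?&])dl=0", "\\1dl=1", url))
    sep <- if (grepl("?", url, fixed = TRUE)) "&" else "?"
    return(paste0(url, sep, "dl=1"))
  }
  if (grepl("onedrive\\.live\\.com/embed", url)) {
    # OneDrive embed links: swap "embed" for "download", as reported above
    return(sub("onedrive.live.com/embed", "onedrive.live.com/download", url,
               fixed = TRUE))
  }
  url  # anything else is returned unchanged
}
```

Usage would be along the lines of `read.csv(to_direct_link(shared_url))`, with `shared_url` being the link copied from the sharing dialog.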
Find the file in OneDrive online using Chrome.
Share it to anyone with the link.
Right click the … and select “download”.
Let the file download.
Press Ctrl + J to open the Downloads page in Chrome.
Find the link to the file on that page.
Use the following code to download and read the file.
read_url_csv <- function(url, ...) {
  # Download to a temporary file (binary mode, so Windows leaves the bytes alone), then parse it
  tmpFile <- tempfile(fileext = ".csv")
  download.file(url, destfile = tmpFile, mode = "wb")
  readr::read_csv(tmpFile, ...)
}
onedrive_url <- "INSERT LINK COPIED ABOVE"
csv <- read_url_csv(onedrive_url)
Reddit users recommend syncing OneDrive files locally and reading the local copies.
That worked for me.
Have the person(s) who manage the shared location share the folder
with you. (Sharing a single file might also work, but I
haven't explored that. Sharing the folder worked when I tested it,
so that's what I recommend.)
Open the shared folder in OneDrive
Click "Sync"
You should now have a local copy of those files stored on your PC and kept in sync online (just like Dropbox). Your files will probably be in a subfolder of this location:
C:\Users\[Your user name]\
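Once the folder is synced, the files are ordinary local files, so base R can read them without any download step. A sketch with a hypothetical helper, `read_synced_csvs()`, that reads every CSV under the synced folder into a named list; the folder path is an assumption you need to adjust.

```r
# Hypothetical helper: read every CSV below a locally synced OneDrive folder.
# Returns a named list of data frames, one per file; `...` goes to read.csv().
read_synced_csvs <- function(sync_dir, ...) {
  files <- list.files(sync_dir, pattern = "\\.csv$", full.names = TRUE,
                      recursive = TRUE, ignore.case = TRUE)
  setNames(lapply(files, utils::read.csv, ...), basename(files))
}
```

On Windows, `Sys.getenv("USERPROFILE")` returns the `C:\Users\[Your user name]` part, so a call might look like `read_synced_csvs(file.path(Sys.getenv("USERPROFILE"), "OneDrive - Your Org", "Shared Folder"))`, where both folder names are hypothetical.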

R: download all files in a Google Drive public folder

I'm trying to get data for RAIS (a Brazilian employee registry dataset) that is shared using a Google Drive public folder. This is the address:
https://drive.google.com/folderview?id=0ByKsqUnItyBhZmNwaXpnNXBHMzQ&usp=sharing&tid=0ByKsqUnItyBhU2RmdUloTnJGRGM#list
Data is divided into one folder per year, and within each folder there is one file per state to download. I would like to automate the download in R for all years or, failing that, at least within each year's folder. Downloaded file names should match the names used when downloading manually.
I know a little R, but no web programming or web scraping. This is what I have so far:
By manually downloading the first 2012 file, I could see the URL my browser used to download it:
https://drive.google.com/uc?id=0ByKsqUnItyBhS2RQdFJ2Q0RrN0k&export=download
So I suppose the file ID is: 0ByKsqUnItyBhS2RQdFJ2Q0RrN0k
Searching the HTML of the 2012 page, I found that ID and the file name associated with it: AC2012.7z.
All the other IDs and file names are in the same section of the HTML. So, assuming I can download this file correctly, I suppose I could generalize to the other files.
In R, I tried the following code to download the file:
url <- "https://drive.google.com/uc?id=0ByKsqUnItyBhS2RQdFJ2Q0RrN0k&export=download"
download.file(url,"AC2012.7z")
unzip("AC2012.7z")
It downloads, but I get an error when trying to uncompress the file (both within R and manually with 7-Zip). Something must be wrong with the file downloaded in R, as its size (3,412 KB) does not match that of the manually downloaded file (3,399 KB).
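Two details of the code above could plausibly explain this. On Windows, download.file() may use a text-mode transfer unless `mode = "wb"` is given (it has historically switched to binary automatically only for URLs ending in extensions such as .zip or .gz, which this URL does not), and a text-mode transfer rewrites line endings inside a binary file, making it larger, which is consistent with the sizes above. Also, unzip() only extracts zip archives, not .7z. A sketch of the download step with hypothetical helpers that build the URL from a file ID and force binary mode:

```r
# Hypothetical helpers for the uc?id=...&export=download URL pattern seen above.
drive_download_url <- function(id) {
  paste0("https://drive.google.com/uc?id=", utils::URLencode(id, reserved = TRUE),
         "&export=download")
}

download_drive_file <- function(id, dest) {
  # mode = "wb" forces a binary transfer, so Windows does not alter line endings
  utils::download.file(drive_download_url(id), destfile = dest, mode = "wb")
  invisible(dest)
}
```

A call such as `download_drive_file("0ByKsqUnItyBhS2RQdFJ2Q0RrN0k", "AC2012.7z")` should then give an untouched file; extracting it needs a 7z-capable tool, for example `archive::archive_extract("AC2012.7z")` if the archive package is installed. For large files Google may return a confirmation page instead of the file, so it is worth checking the result before extracting.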
For anyone trying to solve this problem today, you can use the googledrive package.
library(googledrive)
ls_tibble <- googledrive::drive_ls(as_id("GOOGLE_DRIVE_URL_FOR_THE_TARGET_FOLDER"))
for (file_id in ls_tibble$id) {
  # drive_download() saves each file under its Drive name in the working directory
  googledrive::drive_download(as_id(file_id))
}
This will (1) trigger an authentication page to open in your browser to authorise the Tidyverse libraries using gargle to access Google Drive on behalf of your account and (2) download all the files in the folder at that URL to your current working directory for the current R session.
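The loop above downloads what sits directly in one folder. For the RAIS layout (one subfolder per year), a variant can ask drive_ls() to recurse and keep only the .7z files, which also avoids trying to download the year folders themselves. A sketch, assuming the folder is public so that googledrive::drive_deauth() should let it skip the browser sign-in; `download_rais()` is a hypothetical name and `folder_url` a placeholder.

```r
# Hypothetical helper: download every .7z file below a public Drive folder,
# including the per-year subfolders, keeping the names shown on Drive.
download_rais <- function(folder_url, dest_dir = ".") {
  googledrive::drive_deauth()  # public folder: no OAuth browser flow
  files <- googledrive::drive_ls(googledrive::as_id(folder_url),
                                 pattern = "\\.7z$", recursive = TRUE)
  for (i in seq_len(nrow(files))) {
    googledrive::drive_download(files[i, ],
                                path = file.path(dest_dir, files$name[i]),
                                overwrite = TRUE)
  }
  invisible(files)
}
```

Since the file names already include the state and year (e.g. AC2012.7z), saving everything into one directory should not cause name clashes.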

ZIP file download tries to download page

Just putting this out there to see if anyone has any good off-the-cuff suggestions.
I have a web page with a button that triggers the download of a PDF file. When I run this page in development from within VS, the file comes back for download as expected. However, since moving my web site to a staging environment, I get a very different result: when I click the download button, I get an error and a message that seems to indicate the call attempted to download the raw ASPX page rather than any ZIP file.
As this works so painlessly in my development environment, I'm assuming this must be down to environmental/configurational differences. Has anybody come across this before and if so could you inform me of the error of my ways?
Many thanks in advance
Ian
Could the ASPX file be the actual ZIP file? Have you tried downloading it and opening it as a ZIP?
Does the server allow ASPX pages to execute (e.g. MIME settings)?
Maybe this helps: Filename and mime problems - ASP.NET Download file (C#)
Or look here: How to retrieve and download server files (File.Exists and URL)
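Following up on the first suggestion, a quick way to tell what a download actually contains is to look at its first bytes: ZIP, PDF, PNG, 7z and gzip files start with fixed signatures, while an HTML page starts with "<". A minimal sketch in R (the language used elsewhere on this page), with `sniff_file_type()` as a hypothetical helper that only recognizes these few formats.

```r
# Hypothetical helper: guess a file's real type from its leading "magic" bytes.
sniff_file_type <- function(path) {
  magic <- readBin(path, what = "raw", n = 8L)
  starts_with <- function(sig) {
    length(magic) >= length(sig) && identical(magic[seq_along(sig)], sig)
  }
  if (starts_with(as.raw(c(0x50, 0x4B, 0x03, 0x04)))) return("zip")   # "PK\3\4"
  if (starts_with(as.raw(c(0x25, 0x50, 0x44, 0x46)))) return("pdf")   # "%PDF"
  if (starts_with(as.raw(c(0x89, 0x50, 0x4E, 0x47)))) return("png")
  if (starts_with(as.raw(c(0x37, 0x7A, 0xBC, 0xAF, 0x27, 0x1C)))) return("7z")
  if (starts_with(as.raw(c(0x1F, 0x8B)))) return("gzip")
  # Drop NUL bytes so rawToChar() cannot fail, then look for a leading '<'
  txt <- rawToChar(magic[magic != as.raw(0)])
  if (grepl("^[[:space:]]*<", txt, useBytes = TRUE)) return("html/xml")
  "unknown"
}
```

If a file that should be a ZIP comes back as "html/xml", the server most likely returned the page itself rather than the archive.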
