source_data from a private repository in R

I am trying to read an .Rdata file from my private GitHub repository "data" in R:
library(repmis)
source_data("https://github.com/**********.Rdata?raw=true")
This is my output
Error in download_data_intern(url = url, sha1 = sha1, temp_file = temp_file) :
Not Found (HTTP 404).
Another way I tried:
library(httr)
library(magrittr) # for %>%
script <-
  GET(
    url = "https://api.github.com/repos/***/data/contents/01-wrangle-data-covid-ssa-mx-county.R",
    authenticate(Sys.getenv("GITHUB_PAT"), ""), # Instead of PAT, could use a password
    accept("application/vnd.github.v3.raw")
  ) %>%
  content(as = "text")
# Evaluate and parse to global environment
eval(parse(text = script))
Does anyone know how I can read this data from my private repo in R?

I was able to solve this.
1. Generate your personal access token on GitHub
1.1 Go to GitHub.
1.2 In the top-right corner, go to "Settings".
1.3 Then, in the left panel, go to "Developer settings".
1.4 Select "Personal access tokens".
1.5 Select "Generate new token".
1.6 Copy your personal token.
2. In your home directory, follow these steps
2.1 Create the file .Renviron:
macbook@user:~$ touch .Renviron
2.2 In this file, write your personal token like this:
macbook@user:~$ nano .Renviron
GITHUB_PAT=your_personal_token
Now, in R, you can check that your personal token has been saved:
Sys.getenv("GITHUB_PAT")
You can also edit your token from R with:
usethis::edit_r_environ()
Don't forget to restart R for your changes to take effect.
3. Finally, in R, these are the lines of code that will load your data from a private repo:
library(httr)
# Request the file's metadata from the GitHub contents API
req <- content(GET(
  "https://api.github.com/repos/you_group/your_repository/contents/your_path_to_your_doc/df_test.Rdata",
  add_headers(Authorization = "token YOUR_TOKEN")
), as = "parsed")
# Download the file itself to a temporary location and load it
tmp <- tempfile()
r1 <- GET(req$download_url, write_disk(tmp))
load(tmp)
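Since the token is already stored in .Renviron as GITHUB_PAT, you can also avoid pasting it into the script. A minimal sketch of the same request that reads the token from the environment (the repo path and file name are placeholders):
library(httr)
# Build the Authorization header from the GITHUB_PAT stored in .Renviron
req <- content(GET(
  "https://api.github.com/repos/you_group/your_repository/contents/your_path_to_your_doc/df_test.Rdata",
  add_headers(Authorization = paste("token", Sys.getenv("GITHUB_PAT")))
), as = "parsed")
tmp <- tempfile()
GET(req$download_url, write_disk(tmp))
load(tmp)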

Related

How can I use httr to bulk upload files to documentcloud

I want to use DocumentCloud's API to bulk upload a folder of pdfs via R's httr package. I also want to receive a dataframe of the URLs of the uploaded files.
I figured out how to generate a token, but I can't get anything to upload successfully. Here is my attempt to upload a single pdf:
library(httr)
library(jsonlite)
url <- "https://api.www.documentcloud.org/api/documents/"
# Generate a token
user <- "username"
pw <- "password"
response <- POST("https://accounts.muckrock.com/api/token/",
                 body = list(username = user,
                             password = pw))
token <- content(response)
access_token <- unlist(token$access)
auth_header <- paste("Bearer", access_token)  # avoid masking base::paste()
# Initiate upload for single pdf
POST(url,
     add_headers(Authorization = auth_header),
     body = upload_file("filename.PDF", type = "application/pdf"),
     verbose()
)
I get a 415 "Unsupported Media Type" error when attempting to initiate the upload for a single pdf. I'm not sure why this happens, and also, once this is resolved, how can I bulk-upload many pdfs?
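A hedged guess at the 415: the endpoint may expect a JSON document-creation request first, with the PDF bytes then sent to an upload URL it returns, rather than a raw PDF body POSTed directly. A minimal sketch under that assumption, reusing url and auth_header from the snippet above (the presigned_url field, the /process/ step, and the local folder name are assumptions, not verified against the API):
library(httr)
pdfs <- list.files("pdf_folder", pattern = "\\.pdf$", full.names = TRUE)  # assumed local folder
uploaded <- lapply(pdfs, function(path) {
  # 1. Create the document record as JSON; the response is assumed to contain an upload URL
  doc <- content(POST(url,
                      add_headers(Authorization = auth_header),
                      body = list(title = basename(path)),
                      encode = "json"))
  # 2. Send the PDF bytes to the returned URL (field name assumed)
  PUT(doc$presigned_url, body = upload_file(path, type = "application/pdf"))
  # 3. Ask the server to process the uploaded file (endpoint assumed)
  POST(paste0(url, doc$id, "/process/"), add_headers(Authorization = auth_header))
  doc
})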

Best way to upload a large data frame from R to Big Query?

In my case, bq_table_upload() does not work since the file is 5G. Exporting to CSV and uploading through the BQ web UI also fails because of size. I think the code below used to be how I did this, but authentication through gar_auth() via the browser no longer works for me:
library(googleCloudStorageR)
library(bigrquery)
library(bigQueryR)  # for schema_fields() and bqr_upload_data()
library(googleAuthR)
gcs_global_bucket("XXXXXXXXX")
## custom upload function to ignore quotes and column headers
f <- function(input, output) {
  write.table(input, sep = ",", col.names = FALSE, row.names = FALSE,
              quote = FALSE, file = output, qmethod = "double")
}
## upload files to Google Cloud Storage
gcs_upload(mtcars, name = "mtcars_test1.csv", object_function = f)
## create the schema of the files you just uploaded
user_schema <- schema_fields(mtcars)
## load files from Google Cloud Storage into BigQuery
bqr_upload_data(projectId = "your-project",
                datasetId = "test",
                tableId = "from_gcs_mtcars",
                upload_data = c("gs://XXXXX/mtcars_test1.csv"),
                schema = user_schema)
Is there any workaround?
This is the error this produces:
> gcs_upload(mtcars, name = "mtcars_test1.csv", object_function = f)
2020-06-30 11:49:37 -- File size detected as 1.2 Kb
2020-06-30 11:49:37> No authorization yet in this session!
2020-06-30 11:49:37> NOTE: a .httr-oauth file exists in current working directory.
Run authentication function to use the credentials cached for this session.
Error: Invalid token
Then I tried to authenticate with
gar_auth()
which launches a Chrome browser window where I was usually able to authenticate by picking the right Google profile, but now get "Error 400: invalid_request Missing required parameter: client_id".
Use gcs_auth() to authenticate your session for the upload, or see the package website on setting up authentication at library startup.
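A minimal sketch of the startup route, assuming a service-account JSON key downloaded from the Google Cloud console (the key path and bucket name are placeholders):
# Point googleCloudStorageR at a service-account key before loading it,
# so authentication happens at library startup with no browser involved
Sys.setenv(GCS_AUTH_FILE = "/path/to/service-account-key.json")  # placeholder path
library(googleCloudStorageR)
gcs_global_bucket("XXXXXXXXX")
gcs_upload(mtcars, name = "mtcars_test1.csv", object_function = f)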

Download CSV from a password protected website

If you go to the website https://www.myfxbook.com/members/iseasa_public1/rush/2531687 then click that dropdown box Export, then choose CSV, you will be taken to https://www.myfxbook.com/statements/2531687/statement.csv and the download (from the browser) will proceed automatically. The thing is, you need to be logged in to https://www.myfxbook.com in order to receive the information; otherwise, the file downloaded will contain the text "Please login to Myfxbook.com to use this feature".
I tried using read.csv to get the csv file in R, but only got that "Please login" message. I believe R has to simulate an html session (whatever that is, I am not sure about this) so that access will be granted. Then I tried some scraping tools to log in first, but to no avail.
library(rvest)
login <- "https://www.myfxbook.com"
pgsession <- html_session(login)
pgform <- html_form(pgsession)[[1]]
filled_form <- set_values(pgform, loginEmail = "*****", loginPassword = "*****") # loginEmail and loginPassword are the names of the html elements
submit_form(pgsession, filled_form)
url <- "https://www.myfxbook.com/statements/2531687/statement.csv"
page <- jump_to(pgsession, url) # page will contain 48 bytes of data (in the 'content' element), which is the size of that warning message, though I could not access this content.
From the try above, I got that page has an element called cookies which in turns contains JSESSIONID. From my research, it seems this JSESSIONID is what "proves" I am logged in to that website. Nonetheless, downloading the CSV does not work.
Then I tried:
library(RCurl)
h <- getCurlHandle(cookiefile = "")
ans <- getForm("https://www.myfxbook.com", loginEmail = "*****", loginPassword = "*****", curl = h)
data <- getURL("https://www.myfxbook.com/statements/2531687/statement.csv", curl = h)
data <- getURLContent("https://www.myfxbook.com/statements/2531687/statement.csv", curl = h)
It seems these libraries were built to scrape html pages and do not deal with files in other formats.
I would pretty much appreciate any help as I've been trying to make this work for quite some time now.
Thanks.
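A possible fix for the rvest attempt above: submit_form() returns a new, logged-in session, so the CSV request needs to go through that returned session rather than the original pgsession. A minimal sketch under that assumption (the way the CSV bytes are pulled out of the session's response is also an assumption):
library(rvest)
library(httr)
pgsession <- html_session("https://www.myfxbook.com")
pgform <- html_form(pgsession)[[1]]
filled_form <- set_values(pgform, loginEmail = "*****", loginPassword = "*****")
logged_in <- submit_form(pgsession, filled_form)  # keep the returned, logged-in session
csv_page <- jump_to(logged_in, "https://www.myfxbook.com/statements/2531687/statement.csv")
# Write the raw response body to disk, then read it back as a data frame
writeBin(content(csv_page$response, as = "raw"), "statement.csv")
data <- read.csv("statement.csv")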

source R file from private gitlab with basic auth

I would like to source an .R file from a private GitLab server. I need to use basic authentication with a user/password.
I tried this kind of instruction, without success:
httr::GET("http://vpsxxxx.ovh.net/root/project/raw/9f8a404b5b33c216d366d80b7d48e34577598069/R/script.R",
          httr::authenticate("user", "password", type = "basic"))
Any idea?
Regards
Edit: I found this way, but it downloads the whole project:
bundle <- tempfile()
git2r::clone("http://vpsxxx.ovh.net/root/projet.git",
             bundle, credentials = git2r::cred_user_pass("user", "password"))
source(file.path(bundle, "R", "script.R"))
You can use the GitLab API to get a file from a repository. gitlabr can help you do that. The current version, 0.9, is compatible with API v3 and v4.
This should work (it works on my end, on a private GitLab with API v3):
library(gitlabr)
my_gitlab <- gl_connection("https://private-gitlab.com",
                           login = "username",
                           password = "password",
                           api_version = "v4")  # the default; put "v3" here if needed
my_file <- my_gitlab(gl_get_file, project = "project_name", file_path = "path/to/file")
This will get you a character version of your file. You can also get back a raw version to deal with it in another way.
raw <- gl_get_file(project = "project_name",
                   file_path = "file/to/path",
                   gitlab_con = my_gitlab,
                   to_char = FALSE)
temp_file <- tempfile()
writeBin(raw, temp_file)
You can now source the code
source(temp_file)
It is one solution among others. I did not manage to source the file without using the API.
Know that:
* You can use an access token instead of a username and password.
* You can use gitlabr in several ways; they are documented in the vignette. I used two different ways here.
* Version 1.0 will not be compatible with the v3 API, but I think you use v4.
Feel free to get back to me so that I can update this post if you need a clearer answer.
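For completeness, a minimal httr-only sketch against the GitLab v4 repository-files endpoint, in the spirit of the original attempt; the project ID, branch name, and token are placeholders, and the file path must be URL-encoded:
library(httr)
# GET /projects/:id/repository/files/:file_path/raw?ref=:branch
resp <- GET(
  paste0("https://private-gitlab.com/api/v4/projects/1234/repository/files/",
         URLencode("R/script.R", reserved = TRUE), "/raw"),
  query = list(ref = "master"),
  add_headers("PRIVATE-TOKEN" = "your_access_token")
)
# Parse and evaluate the script text in the current session
eval(parse(text = content(resp, as = "text")))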

Downloading large files with R/RCurl efficiently

I see that many examples for downloading binary files with RCurl look like this:
library("RCurl")
curl = getCurlHandle()
bfile = getBinaryURL(
  "http://www.example.com/bfile.zip",
  curl = curl,
  progressfunction = function(down, up) { print(down) },
  noprogress = FALSE
)
writeBin(bfile, "bfile.zip")
rm(curl, bfile)
If the download is very large, I suppose it would be better to write it to the storage medium as it arrives, instead of fetching it all into memory.
In the RCurl documentation there are some examples that fetch files in chunks and manipulate them as they are downloaded, but they all seem to refer to text chunks.
Can you give a working example?
UPDATE
A user suggests using R's native download.file() with mode = 'wb' for binary files.
In many cases the native function is a viable alternative, but there are a number of use cases where it does not fit (https, cookies, forms, etc.), and this is the reason why RCurl exists.
This is the working example:
library(RCurl)
f = CFILE("bfile.zip", mode = "wb")
curlPerform(url = "http://www.example.com/bfile.zip", writedata = f@ref)
close(f)
It will download straight to the file. The returned value will be the status of the request (0 if no errors occur), rather than the downloaded data.
The mention of CFILE in the RCurl manual is a bit terse; hopefully future versions will include more details and examples.
For your convenience the same code is packaged as a function (and with a progress bar):
bdown = function(url, file) {
  library('RCurl')
  f = CFILE(file, mode = "wb")
  a = curlPerform(url = url, writedata = f@ref, noprogress = FALSE)
  close(f)
  return(a)
}
## ...and now just give remote and local paths
ret = bdown("http://www.example.com/bfile.zip", "path/to/bfile.zip")
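If RCurl is not a hard requirement, the same stream-to-disk idea can be sketched with httr (the URL and local path are placeholders):
library(httr)
# write_disk() streams the response body straight to a file instead of holding it in memory
GET("http://www.example.com/bfile.zip",
    write_disk("path/to/bfile.zip", overwrite = TRUE),
    progress())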
Um... use mode = 'wb' :) Run this and follow along with my comments.
# create a temporary file and a temporary directory on your local disk
tf <- tempfile()
td <- tempdir()
# run the download file function, download as binary.. save the result to the temporary file
download.file(
  "http://sourceforge.net/projects/peazip/files/4.8/peazip_portable-4.8.WINDOWS.zip/download",
  tf,
  mode = 'wb'
)
# unzip the files to the temporary directory
files <- unzip(tf, exdir = td)
# here are your files
files
