I came across this service that formats my local Markdown files rather nicely. Per its example, it is easy to get a formatted response with a sample curl command.
What I would like to do is use some of the available options, namely the version and "name" parameters. How would I go about structuring the curl command? Below are the code samples I have used within R.
This code works nicely, but lacks the specified options:
doc.up <- "curl -X POST --data-urlencode content#test-markdown.md \ http://documentup.com/compiled > index.html"
system(doc.up)
I tried to specify the name option, but no dice:
doc.up <- "curl -X POST --data-urlencode name#mynamevar content#test-markdown.md \ http://documentup.com/compiled > index.html"
system(doc.up)
Any help will be greatly appreciated!
EDIT: Per some of the suggestions below, I have attempted a few ways to use RCurl and httr. I am using the default Markdown template within RStudio, but for completeness' sake, I saved it as test-markdown.Rmd and compiled it to test-markdown.md.
Using RCurl, I attempted:
## attempt 1
f <- paste(readLines('test-markdown.md'), collapse = "\n")
h <- dynCurlReader()
wp <- curlPerform(url = "http://documentup.com/compiled",
                  postfields = c(content = f))
## attempt 2
postForm("http://documentup.com/compiled",
         "content" = fileUpload('test-markdown.md'))
Using httr, I tried:
## attempt 3
tmp <- POST("http://documentup.com/compiled", body = list(content= upload_file(f)))
content(tmp)
## attempt 4
tmp <- POST("http://documentup.com/compiled", body = list(content= upload_file("test-markdown.md")))
The following code works for me:
library(httr)
url <- "http://documentup.com/compiled"
contents <- paste(readLines("README.md"), collapse = "\n")  # collapse to a single string
resp <- POST(url, body = list(content = contents, name = "plyr"))
content(resp)
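As a side note on the original curl approach: each field generally needs its own --data-urlencode flag, and curl's name@file form reads that field's value from a file (= passes a literal value). A rough, untested sketch in the same system() style, reusing the file and name from the question:
## untested sketch: one --data-urlencode flag per field
## note the separators: = for a literal value, @ to read the value from a file
doc.up <- paste(
  "curl -X POST",
  "--data-urlencode name=mynamevar",
  "--data-urlencode content@test-markdown.md",
  "http://documentup.com/compiled > index.html"
)
system(doc.up)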
@Btibert3 Actually, you can tie it to RStudio yourself using the rstudio.markdownToHTML option. See http://www.rstudio.com/ide/docs/authoring/markdown_custom_rendering
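For reference, that page describes setting the rstudio.markdownToHTML option to a function that receives the input and output file paths. A rough, untested sketch (signature per the linked docs, the curl call mirroring the question's command):
## sketch: override RStudio's markdown-to-HTML step, e.g. in .Rprofile
options(rstudio.markdownToHTML =
  function(inputFile, outputFile) {
    system(paste0("curl -X POST --data-urlencode content@", inputFile,
                  " http://documentup.com/compiled > ", outputFile))
  }
)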
I am trying to automate a process in R which involves downloading a zipped folder from an API* which contains a few .csv/.xml files, accessing its contents, and then extracting the .csv/.xml that I actually care about into a dataframe (or something else that is workable). However, I am having some problems accessing the contents of the API pull. From what I gather, the proper process for pulling from an API is to use GET() from the httr package to access the API's files, then the jsonlite package to process it. The second step in this process is failing me. The code I have been trying to use is roughly as follows:
library(httr)
library(jsonlite)
req <- "http://request.path.com/thisisanapi/SingleZip?option1=yes&option2=no"
res <- GET(url = req)
#this works as expected, with res$status_code == 200
#OPTION 1:
api_char <- rawToChar(res$content)
api_call <- fromJSON(api_char, flatten=T)
#OPTION 2:
api_char2 <- content(res, "text")
api_call2 <- fromJSON(api_char2, flatten=T)
In option 1, the first line fails with an "embedded nul in string" error. In option 2, the second line fails with a "lexical error: invalid char in json text" error.
I did some reading and found a few related threads. First, this person looks to be doing a very similar thing to me but did not experience this error (which suggests that maybe the files are zipped/stored differently between the APIs the two of us are using, and that I have set up the GET() incorrectly?). Second, this person seems to be experiencing a similar problem with converting the raw data from the API. I attempted the fix from that thread, but it did not work: in option 1, the first line now ran, but the second line gave the same "lexical error: invalid char in json text" that option 2 gave before; in option 2, the second line gave a "if (is.character(txt) && length(txt) == 1 && nchar(txt, type = "bytes") < : missing value where TRUE/FALSE needed" error, which I am not quite sure how to interpret. This may be because the content_type differs between our API pulls: mine is application/x-zip-compressed and theirs is text/tab-separated-values; charset=utf-16le, so maybe removing the null characters is altogether inappropriate here.
There is some documentation on usage of the API I am using*, but a lot of it is a few years old now and seems to focus more on manual usage than on integration with large automated downloads like the one I am working on (my end goal is a loop that executes the process described many times over slightly varying URLs). I am most certainly a beginner to using APIs like this, and would really appreciate some insight!
* = specifically, I am pulling from CAISO's OASIS API. If you want to follow along with some real files, replace "http://request.path.com/thisisanapi/SingleZip?option1=yes&option2=no" with "http://oasis.caiso.com/oasisapi/SingleZip?resultformat=6&queryname=PRC_INTVL_LMP&version=3&startdatetime=20201225T09:00-0000&enddatetime=20201226T9:10-0000&market_run_id=RTM&grp_type=ALL"
I think the main issue here is that you don't have a JSON return from the API. You have a .zip file being returned, as binary (I think?) data. Your challenge is to process that data. I don't think fromJSON() will help you, as the data from the API isn't in JSON format.
Here's how I would do it. I prefer to use the httr2 package. The process below makes it nice and clear what the parameters of the query are.
library(httr2)
req <- httr2::request("http://oasis.caiso.com/oasisapi")
query <- req %>%
  httr2::req_url_path_append("SingleZip") %>%
  httr2::req_url_query(resultformat = 6) %>%
  httr2::req_url_query(queryname = "PRC_INTVL_LMP") %>%
  httr2::req_url_query(version = 3) %>%
  httr2::req_url_query(startdatetime = "20201225T09:00-0000") %>%
  httr2::req_url_query(enddatetime = "20201226T9:10-0000") %>%
  httr2::req_url_query(market_run_id = "RTM") %>%
  httr2::req_url_query(grp_type = "ALL")
# Check what our query looks like
query
#> <httr2_request>
#> GET
#> http://oasis.caiso.com/oasisapi/SingleZip?resultformat=6&queryname=PRC_INTVL_LMP&version=3&startdatetime=20201225T09%3A00-0000&enddatetime=20201226T9%3A10-0000&market_run_id=RTM&grp_type=ALL
#> Body: empty
resp <- query %>%
  httr2::req_perform()
# Check what content type and encoding we have
# All looks good
resp %>%
  httr2::resp_content_type()
#> [1] "application/x-zip-compressed"
resp %>%
  httr2::resp_encoding()
#> [1] "UTF-8"
Created on 2022-08-30 with reprex v2.0.2
Then you have a choice of what to do if you want to write the data to a zip file.
I discovered that the brio package writes raw data to a file nicely. Alternatively, you can use download.file to download the .zip from the URL directly (skipping all the httr2 steps above); in that case you need mode = "wb".
resp %>%
  httr2::resp_body_raw() %>%
  brio::write_file_raw(path = "out.zip")
# alternative using your original URL or query$url
download.file(query$url, "out.zip", mode = "wb")
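Since your end goal is a data frame, the last step is to pull the file you care about out of the zip. A rough sketch with base R (the file names inside the archive depend on your query, so adjust the pattern as needed):
## list the archive contents, then extract and read the first .csv found
files <- unzip("out.zip", list = TRUE)           # data frame of names inside the zip
csv_name <- files$Name[grepl("\\.csv$", files$Name)][1]
unzip("out.zip", files = csv_name, exdir = tempdir())
dat <- read.csv(file.path(tempdir(), csv_name))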
I need to bring data from this site (image below) into R,
but I don't know how to pass the Key and Client parameters via code. Can anyone help me?
Here is my code:
dados_get <- GET('http://api.climatempo.com.br/api/v1/forecast/72hours/temperature?idlocale=6873')
Try the following code:
require(httr)
require(jsonlite)
#list of parameters
a = list("key"="01234","type"="json","client"="abc","localeid"="6873")
dados_post <- POST(url = 'http://api.climatempo.com.br/api/v1/monitoring/weather',
                   body = toJSON(a))
Let me know if it works.
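If the API expects a JSON body, it may also be worth letting httr do the encoding (which sets the Content-Type header for you); and if Key and Client are meant to be query-string parameters instead, GET with a query list is the other pattern to try. Both sketches below reuse the parameter values from the snippet above and are untested against this API:
# variant 1: pass the list directly and let httr encode it as JSON
dados_post <- POST(url = 'http://api.climatempo.com.br/api/v1/monitoring/weather',
                   body = a, encode = "json")
# variant 2: send key/client as query-string parameters
dados_get <- GET('http://api.climatempo.com.br/api/v1/forecast/72hours/temperature',
                 query = list(idlocale = "6873", key = "01234", client = "abc"))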
I’ve recently gotten into scraping (and programming in general) for my internship, and I came across PDF scraping. Every time I try to read a scanned pdf with R, I can never get it to work. I’ve tried using the file.choose() function to no avail. Do I need to change my directory, or how can I get the pdf from my files into R?
The code looks something like this:
> library(pdftools)
> text=pdf_text("C:/Users/myname/Documents/renewalscan.pdf")
> text
[1] ""
Also, using pdftables leads me here:
> library(pdftables)
> convert_pdf("C:/Users/myname/Documents/renewalscan.pdf","my.csv")
Error in get_content(input_file, format, api_key) :
Bad Request (HTTP 400).
You should use the packages pdftools and pdftables.
If you are trying to read the text inside the pdf, then use the pdf_text() function. Its argument is the path (on your computer or on the web) to the pdf. For example
tt = pdf_text("C:/Users/Smith/Documents/my_file.pdf")
It would be nice if you were more specific and also gave us a reproducible example.
To use the PDFTables R package, you need to run the following command:
convert_pdf('test/index.pdf', output_file = NULL, format = "xlsx-single", message = TRUE, api_key = "insert_API_key")
If you are looking to get tabular data, you might try tabulizer. Here is a full code tutorial: https://www.business-science.io/code-tools/2019/09/23/tabulizer-pdf-scraping.html
Basically, you can use this code from the tutorial:
library(tabulizer)
extract_tables(
  file = "2019-09-23-tabulizer/endangered_species.pdf",
  method = "decide",
  output = "data.frame")
I am trying to use an R script hosted on GitHub, plugin-draw.R. How should I use this plugin?
You can simply use source_url from the devtools package:
library(devtools)
source_url("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/bingSearchXScraper/bingSearchXScraper.R")
Based on @Matifou's reply, but using the "new" method of appending ?raw=TRUE at the end of your URL:
devtools::source_url("https://github.com/tonybreyal/Blog-Reference-Functions/blob/master/R/bingSearchXScraper/bingSearchXScraper.R?raw=TRUE")
You can use the solution offered on R-Bloggers:
source_github <- function(u) {
  # load package
  require(RCurl)
  # read script lines from website
  script <- getURL(u, ssl.verifypeer = FALSE)
  # parse lines and evaluate (inside this function's environment)
  eval(parse(text = script))
}
source_github("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/bingSearchXScraper/bingSearchXScraper.R")
For the code to be evaluated in the global environment (I'm guessing that you will prefer this solution), you can use:
source_https <- function(u, unlink.tmp.certs = FALSE) {
  # load package
  require(RCurl)
  # read script lines from website using a security certificate
  if (!file.exists("cacert.pem")) download.file(url = "http://curl.haxx.se/ca/cacert.pem", destfile = "cacert.pem")
  script <- getURL(u, followlocation = TRUE, cainfo = "cacert.pem")
  if (unlink.tmp.certs) unlink("cacert.pem")
  # parse lines and evaluate in the global environment
  eval(parse(text = script), envir = .GlobalEnv)
}
source_https("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/bingSearchXScraper/bingSearchXScraper.R")
source_https("https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/htmlToText/htmlToText.R", unlink.tmp.certs = TRUE)
As mentioned in the original article by Tony Breyal, this discussion on SO should also be credited, as it is relevant to the discussed question.
If it is a link on GitHub where you can click on Raw (next to Blame), you can actually just use the ordinary base::source. Go to the R script of your choice and find the Raw button.
The link will now contain raw.githubusercontent.com, and the page shows nothing but the R script itself. Then, for this example,
source(
  paste0(
    "https://raw.githubusercontent.com/betanalpha/knitr_case_studies/master/",
    "stan_intro/stan_utility.R"
  )
)
(paste0 was used just to fit the URL into a narrower screen.)
I downloaded some JSON files from Twitter with this command:
library(RCurl)
getURL("http://search.twitter.com/search.json?since=2012-12-06&until=2012-12-07&q=royalbaby&result_type=recent&rpp=10&page=1")
But now all double quotes are transformed to \". In some special cases this destroys the JSON format. I think getURL or curl makes this change. Is there any way to suppress it?
Thanks
Markus
Your page contains the \" sequences; it is not RCurl behavior (try opening the page in a browser).
library(RJSONIO)
library(RCurl)
raw_data <- getURL(your.url)  # your.url: the Twitter search URL from the question
data <- fromJSON(raw_data)
The data is well formatted.
Use cat() to avoid the \ representation when printing.
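A quick way to convince yourself that the backslashes are only part of R's printed representation (not characters in the data) is to compare print() and cat() on a small string:
x <- "{\"text\": \"royalbaby\"}"
print(x)  # shows the escaped form: "{\"text\": \"royalbaby\"}"
cat(x)    # shows the actual characters: {"text": "royalbaby"}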