How to get a table from an HTML form using rvest or httr? - r

I am using R, version 3.3.1. I am trying to scrape data from the following website:
http://plovila.pomorstvo.hr/
As you can see, it is an HTML form. I would like to choose "Tip objekta" (object type), for example "Jahta" (yacht), and enter a "NIB" (an integer, e.g. 93567). You can try it yourself: just choose "Jahta" and type 93567 in the NIB field.
The method is POST, type application/x-www-form-urlencoded. I have tried three different approaches: rvest, POST (httr package), and postForm (RCurl). My rvest code is:
session <- html_session("http://plovila.pomorstvo.hr")
form <- html_form(session)[[1]]
form <- set_values(form, `ctl00$Content_FormContent$uiTipObjektaDropDown` = 2,
`ctl00$Content_FormContent$uiOznakaTextBox` = "",
`ctl00$Content_FormContent$uiNibTextBox` = 93567)
x <- submit_form(session, form)
If I run this code I get a 200 status, but I don't understand how I can get the table from the response.
An additional step would be to click the "Detalji" (details) button to get further information, but I can't see any of that information in the output of submit_form (x).

I used the curlconverter package to take the "Copy as cURL" data from the XHR POST request and turn it automagically into:
httr::VERB(verb = "POST", url = "http://plovila.pomorstvo.hr/",
httr::add_headers(Origin = "http://plovila.pomorstvo.hr",
`Accept-Encoding` = "gzip, deflate",
`Accept-Language` = "en-US,en;q=0.8",
`X-Requested-With` = "XMLHttpRequest",
Connection = "keep-alive",
`X-MicrosoftAjax` = "Delta=true",
Pragma = "no-cache", `User-Agent` = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.34 Safari/537.36",
Accept = "*/*", `Cache-Control` = "no-cache",
Referer = "http://plovila.pomorstvo.hr/",
DNT = "1"), httr::set_cookies(ASP.NET_SessionId = "b4b123vyqxnt4ygzcykwwvwr"),
body = list(`ctl00$uiScriptManager` = "ctl00$Content_FormContent$ctl00|ctl00$Content_FormContent$uiPretraziButton",
ctl00_uiStyleSheetManager_TSSM = ";|635908784800000000:d29ba49:3cef4978:9768dbb9",
`ctl00$Content_FormContent$uiTipObjektaDropDown` = "2",
`ctl00$Content_FormContent$uiImeTextBox` = "",
`ctl00$Content_FormContent$uiNibTextBox` = "93567",
`__EVENTTARGET` = "", `__EVENTARGUMENT` = "",
`__LASTFOCUS` = "", `__VIEWSTATE` = "/wEPDwUKMTY2OTIzNTI1MA9kFgJmD2QWAgIDD2QWAgIBD2QWAgICD2QWAgIDD2QWAmYPZBYIAgEPZBYCZg9kFgZmD2QWAgIBDxAPFgYeDURhdGFUZXh0RmllbGQFD05heml2VGlwT2JqZWt0YR4ORGF0YVZhbHVlRmllbGQFDElkVGlwT2JqZWt0YR4LXyFEYXRhQm91bmRnZBAVBAAHQnJvZGljYQVKYWh0YQbEjGFtYWMVBAEwATEBMgEzFCsDBGdnZ2cWAQICZAIBDw8WAh4HVmlzaWJsZWdkFgICAQ8PFgIfA2dkZAICDw8WAh8DaGQWAgIBDw8WBB4EVGV4dGUfA2hkZAIHDzwrAA4CABQrAAJkFwEFCFBhZ2VTaXplAgoBFgIWCw8CCBQrAAhkZGRkZDwrAAUBBAUHSWRVcGlzYTwrAAUBBAUISWRVbG9za2E8KwAFAQQFBlNlbGVjdGRlFCsAAAspelRlbGVyaWsuV2ViLlVJLkdyaWRDaGlsZExvYWRNb2RlLCBUZWxlcmlrLldlYi5VSSwgVmVyc2lvbj0yMDEzLjMuMTExNC40MCwgQ3VsdHVyZT1uZXV0cmFsLCBQdWJsaWNLZXlUb2tlbj0xMjFmYWU3ODE2NWJhM2Q0ATwrAAcACyl1VGVsZXJpay5XZWIuVUkuR3JpZEVkaXRNb2RlLCBUZWxlcmlrLldlYi5VSSwgVmVyc2lvbj0yMDEzLjMuMTExNC40MCwgQ3VsdHVyZT1uZXV0cmFsLCBQdWJsaWNLZXlUb2tlbj0xMjFmYWU3ODE2NWJhM2Q0ARYCHgRfZWZzZGQWBB4KRGF0YU1lbWJlcmUeBF9obG0LKwQBZGZkAgkPZBYCZg9kFgJmD2QWIAIBD2QWBAIDDzwrAAgAZAIFDzwrAAgAZAIDD2QWBAIDDzwrAAgAZAIFDzwrAAgAZAIFD2QWAgIDDzwrAAgAZAIHD2QWBAIDDzwrAAgAZAIFDzwrAAgAZAIJD2QWBAIDDzwrAAgAZAIFDzwrAAgAZAILD2QWBgIDDxQrAAI8KwAIAGRkAgUPFCsAAjwrAAgAZGQCBw8UKwACPCsACABkZAIND2QWBgIDDxQrAAI8KwAIAGRkAgUPFCsAAjwrAAgAZGQCBw8UKwACPCsACABkZAIPD2QWAgIDDxQrAAI8KwAIAGRkAhEPZBYGAgMPPCsACABkAgUPPCsACABkAgcPPCsACABkAhMPZBYGAgMPPCsACABkAgUPPCsACABkAgcPPCsACABkAhUPZBYCAgMPPCsACABkAhcPZBYGAgMPPCsACABkAgUPPCsACABkAgcPPCsACABkAhkPPCsADgIAFCsAAmQXAQUIUGFnZVNpemUCBQEWAhYLZGRlFCsAAAsrBAE8KwAHAAsrBQEWAh8FZGQWBB8GZR8HCysEAWRmZAIbDzwrAA4CABQrAAJkFwEFCFBhZ2VTaXplAgUBFgIWC2RkZRQrAAALKwQBPCsABwALKwUBFgIfBWRkFgQfBmUfBwsrBAFkZmQCHQ88KwAOAgAUKwACZBcBBQhQYWdlU2l6ZQIFARYCFgtkZGUUKwAACysEATwrAAcACysFARYCHwVkZBYEHwZlHwcLKwQBZGZkAiMPPCsADgIAFCsAAmQXAQUIUGFnZVNpemUCBQEWAhYLZGRlFCsAAAsrBAE8KwAHAAsrBQEWAh8FZGQWBB8GZR8HCysEAWRmZAILD2QWAmYPZBYCZg9kFgICAQ88KwAOAgAUKwACZBcBBQhQYWdlU2l6ZQIFARYCFgtkZGUUKwAACysEATwrAAcACysFARYCHwVkZBYEHwZlHwcLKwQBZGZkZIULy2JISPTzELAGqWDdBkCVyvvKIjo/wm/iG9PT1dlU",
`__VIEWSTATEGENERATOR` = "CA0B0334",
`__PREVIOUSPAGE` = "jGgYHmJ3-6da6PzGl9Py8IDr-Zzb75YxIFpHMz4WQ6iQEyTbjWaujGRHZU-1fqkJcMyvpGRkWGStWuj7Uf3NYv8Wi0KSCVwn435kijCN2fM1",
`__ASYNCPOST` = "true",
`ctl00$Content_FormContent$uiPretraziButton` = "Pretraži"),
encode = "form") -> res
You can see the result of that via:
content(res, as="text") # returns raw HTML
or
content(res, as="parsed") # returns something you can use with `rvest` / `xml2`
Unfortunately, this is yet another ASP.NET WebForms site of the kind that "eGov" operations around the world have bought into as a good thing to do. That means you have to use trial and error to figure out which of those parameters are actually necessary, since it's different on virtually every site. I tried a minimal set to no avail.
You may even have to issue a GET request to the main site first to establish a session.
But this should get you going in the right direction.
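For reference, once the right parameter set is found, one way to try pulling the results table straight out of res is with rvest. This is only a minimal sketch and assumes the partial-postback response contains the results grid as ordinary HTML:
library(httr)
library(rvest)
# parse the response body and look for tables
# (the ASP.NET AJAX "Delta=true" response is pipe-delimited, but read_html()
# is usually lenient enough to find the embedded HTML fragments)
doc <- content(res, as = "text", encoding = "UTF-8") %>% read_html()
tables <- doc %>% html_nodes("table") %>% html_table(fill = TRUE)
tables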

Related

R: search_fullarchive() and Twitter Academic research API track

I was wondering whether anyone has found a way to use search_fullarchive() from the "rtweet" package in R with the new Twitter Academic Research project track?
The problem is whenever I try to run the following code:
search_fullarchive(q = "sunset", n = 500, env_name = "AcademicProject", fromDate = "202010200000", toDate = "202010220000", safedir = NULL, parse = TRUE, token = bearer_token)
I get the following error: "Error: Not a valid access token". Is that because search_fullarchive() is only for paid premium accounts, and that doesn't include the new academic track (even though you get full-archive access)?
Also, can you retrieve more than 500 tweets (e.g., n = 6000) when using search_fullarchive()?
Thanks in advance!
I've got the same problem with the Twitter Academic Research API. I think if you set n = 100 or just skip the argument, the command will return 100 tweets. Also, the rtweet package does not (yet) support the Academic Research API.
Change your code to this:
search_fullarchive(q = "sunset", n = 500, env_name = "Your Environment Name attained in the Dev Dashboard", fromDate = "202010200000", toDate = "202010220000", safedir = NULL, parse = TRUE, token = t)
Also, the token must be created like this:
t <- create_token(
  app = "App Name",
  consumer_key = "Key",       # API key from the developer dashboard
  consumer_secret = "Secret", # API secret key
  access_token = "",
  access_secret = "",
  set_renv = TRUE
)

How to Call Amazon Product Advertising API 5 from R?

I want to call the Amazon Product Advertising API from R. Below is the quick-start guide for PA API 5:
https://webservices.amazon.com/paapi5/documentation/quick-start/using-curl.html
I tried to do it the way described here https://webservices.amazon.com/paapi5/documentation/sending-request.html using 'httr', but got thrown off by the Signature Version 4 signing process.
I tried using the 'aws.signature' package to sign the request prior to calling POST, but the final output I am getting is status code 500.
Here is the code I have used:
library(httr)
library(jsonlite)
library(aws.signature)
request_body=data.frame("Keywords"="Harry",
"Marketplace"= "www.amazon.com",
"PartnerTag"= "mytag-20",
"PartnerType"= "Associates",
"Access Key"="my_accesskey",
"Secret Key"="my_secret_key",
"service"="ProductAdvertisingAPIv1",
"Region"="us-east-1"
"Resources"="Offers.Listings.Price",
"SearchIndex"= "All")
request_body_json=toJSON(request_body,auto_unbox=T)
request_body_json=gsub("\\[|\\]","",request_body_json)
t=signature_v4_auth(
datetime = format(Sys.time(), "%Y%m%dT%H%M%SZ", tz = "UTC"),
region = NULL,
service="ProductAdvertisingAPIv1",
verb="POST",
"com.amazon.paapi5.v1.ProductAdvertisingAPIv1.SearchItems",
query_args = list(),
canonical_headers=c("Host: webservices.amazon.com",
"Content-Type: application/json; charset=UTF-8",
"X-Amz-Target: com.amazon.paapi5.v1.ProductAdvertisingAPIv1.SearchItems",
"Content-Encoding: amz-1.0",
"User-Agent: paapi-docs-curl/1.0.0"),
request_body=request_body_json,
signed_body = TRUE,
key = "access_key",
secret = "secret-key",
session_token = NULL,
query = FALSE,
algorithm = "AWS4-HMAC-SHA256",
force_credentials = FALSE,
verbose = getOption("verbose", FALSE)
)
result=POST("https://webservices.amazon.com/paapi5/searchitems",body=request_body_json,
add_headers(.headers=c("Host: webservices.amazon.com",
"Content-Type: application/json; charset=UTF-8",
paste("X-Amz-Date:",format(Sys.time(), "%Y%m%dT%H%M%SZ", tz = "UTC")),
"X-Amz-Target: com.amazon.paapi5.v1.ProductAdvertisingAPIv1.SearchItems",
"Content-Encoding: amz-1.0",
"User-Agent: paapi-docs-curl/1.0.0",
paste0("Authorization: AWS4-HMAC-SHA256 Credential=",t[["Credential"]],"SignedHeaders=content-encoding;host;x-amz-date;x-amz-target Signature=",t[["Signature"]])
)))
I'd appreciate it if anyone can help with this. Thanks.
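One side note (an observation, not a verified fix): httr::add_headers() expects a named character vector, where the names are the header names, rather than "Header: value" strings. A minimal sketch of that form, with placeholder values:
library(httr)
# header names go on the left-hand side; "Header: value" strings are not parsed by add_headers()
hdrs <- c(
  `Host` = "webservices.amazon.com",
  `Content-Type` = "application/json; charset=UTF-8",
  `X-Amz-Target` = "com.amazon.paapi5.v1.ProductAdvertisingAPIv1.SearchItems",
  `Content-Encoding` = "amz-1.0",
  `X-Amz-Date` = format(Sys.time(), "%Y%m%dT%H%M%SZ", tz = "UTC")
  # the Authorization header built from the SigV4 signature would be added here as well
)
# result <- POST("https://webservices.amazon.com/paapi5/searchitems",
#                body = request_body_json, add_headers(.headers = hdrs))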

RDash - getting errors while refreshing the page and uploading a CSV file

This is Dash for R code for uploading a CSV file.
I get the following error when I refresh the page:
error: non-character argument
request: 127.0.0.1 - ID_127.0.0.1 [15/Jul/2020:22:22:38 +0530] "POST /_dash-update-component HTTP/1.1" 500 0 "http://127.0.0.1:8050/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"
While I am uploading the CSV file, I am getting the following error:
error: could not find function "base64_dec"
request: 127.0.0.1 - ID_127.0.0.1 [15/Jul/2020:22:23:08 +0530] "POST /_dash-update-component HTTP/1.1" 500 0 "http://127.0.0.1:8050/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36"
library(dash)
library(dashCoreComponents)
library(dashHtmlComponents)
library(dashTable)
app <- Dash$new()
app$layout(htmlDiv(list(
dccUpload(
id='upload-data',
children=htmlDiv(list(
'Drag and Drop or ',
htmlA('Select Files')
)),
style=list(
'width'= '100%',
'height'= '60px',
'lineHeight'= '60px',
'borderWidth'= '1px',
'borderStyle'= 'dashed',
'borderRadius'= '5px',
'textAlign'= 'center',
'margin'= '10px'
),
# Allow multiple files to be uploaded
multiple=TRUE
),
htmlDiv(id='output-data-upload')
)))
parse_contents = function(contents, filename, date){
content_type = strsplit(contents, ",")
content_string = strsplit(contents, ",")
decoded = base64_dec(content_string)
if('csv' %in% filename){
df = read.csv(utf8::as_utf8(decoded))
} else if('xls' %in% filename){
df = read.table(decoded, encoding = 'bytes')
} else{
return(htmlDiv(list(
'There was an error processing this file.'
)))
}
return(htmlDiv(list(
htmlH5(filename),
htmlH6(anytime(date)),
dashDataTable(df_to_list('records'),columns = lapply(colnames(df), function(x){list('name' = x, 'id' = x)})),
htmlHr(),
htmlDiv('Raw Content'),
htmlPre(paste(substr(toJSON(contents), 1, 100), "..."), style=list(
'whiteSpace'= 'pre-wrap',
'wordBreak'= 'break-all'
))
)))
}
app$callback(
output = list(id='output-data-upload', property = 'children'),
params = list(input(id = 'upload-data', property = 'contents'),
state(id = 'upload-data', property = 'filename'),
state(id = 'upload-data', property = 'last_modified')),
function(list_of_contents, list_of_names, list_of_dates){
if(is.null(list_of_contents) == FALSE){
children = lapply(1:length(list_of_contents), function(x){
parse_contents(list_of_contents[[x]], list_of_names[[x]], list_of_dates[[x]])
})
}
return(children)
})
app$run_server()
I am having the same problem following the dashr example that uses the upload component to read a CSV file. I detected a few lines that are not working properly, but I am still unable to get a data frame from the file in a straightforward way.
Regarding the error "could not find function base64_dec": the jsonlite package has a base64_dec() function that seems to do what is intended. You can qualify the package when calling it:
decoded = jsonlite::base64_dec(content_string)
Regarding the error "non-character argument": it is generated by this line when the app first loads, because contents is still empty:
# This gives an error if run before any data has been uploaded
content_type = strsplit(contents, ",")
# In any case it should look like the line below, because the contents come in as a list and you want the first element
content_type = strsplit(contents, ",")[[1]][1]
Dash runs the callback once when the app starts, but here we need the function to execute only after a file is selected. The condition in the if statement is not doing its job:
# Will execute the code in the if even before any data is selected:
if(is.null(list_of_contents) == FALSE)
# Will execute only when data is selected:
if(length(list_of_contents[[1]])>0)
The main issue is that once you have the decoded binary data, read.csv() can't read it (at least not as in the example code given, because its input is a file name). Something that partially worked for me is this readBin() example, but you need to know the size of the table in advance, which is not practical in my case because it will be different every time.
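For what it's worth, one approach that may avoid the file-name problem (a sketch only, assuming the upload is a plain-text UTF-8 CSV) is to convert the decoded raw vector back into a string and hand it to read.csv() via its text argument:
# decoded is the raw vector returned by jsonlite::base64_dec()
csv_text <- rawToChar(decoded)                             # raw bytes -> character string
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)  # parse the CSV from the string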
This is the complete code I modified to address the issues described above, though the core part of reading the CSV is still not functional. I also modified the conditions that check whether the selected file is a CSV or Excel file (because they were not working properly):
library(dashCoreComponents)
library(dashHtmlComponents)
library(dash)
library(dashTable)  # for dashDataTable() and df_to_list()
library(jsonlite)   # for toJSON() and base64_dec()
library(anytime)
app <- Dash$new()
app$layout(htmlDiv(list(
dccUpload(
id='upload-data',
children=htmlDiv(list(
'Drag and Drop or ',
htmlA('Select Files')
)),
style=list(
'width'= '100%',
'height'= '60px',
'lineHeight'= '60px',
'borderWidth'= '1px',
'borderStyle'= 'dashed',
'borderRadius'= '5px',
'textAlign'= 'center',
'margin'= '10px'
),
# Allow multiple files to be uploaded
multiple=TRUE
),
htmlDiv(id='output-data-upload')
)))
parse_contents = function(contents, filename, date){
print("Inside function parse")
content_type = strsplit(contents, ",")[[1]][1]
content_string = strsplit(contents, ",")[[1]][2]
#print(content_string)
decoded = jsonlite::base64_dec(content_string)
#print(decoded)
if(grepl(".csv", filename, fixed=TRUE)){
print("csv file selected")
## Here a function to read a csv file from the binary data is needed
## Because read.csv asks for the file NAME.
## readBin() can read it, but you need to know the size of the table to parse properly
#as.data.frame(readBin(decoded, character()))
#df = read.csv(utf8::as_utf8(decoded))
} else if(grepl(".xlsx", filename, fixed=TRUE)){
##Also to read the Excel
df = read.table(decoded, encoding = 'bytes')
} else{
return(htmlDiv(list(
'There was an error processing this file.'
)))
}
return(htmlDiv(list(
htmlH5(filename),
htmlH6(anytime(date)),
dashDataTable(df_to_list('records'),columns = lapply(colnames(df), function(x){list('name' = x, 'id' = x)})),
htmlHr(),
htmlDiv('Raw Content'),
htmlPre(paste(substr(toJSON(contents), 1, 100), "..."), style=list(
'whiteSpace'= 'pre-wrap',
'wordBreak'= 'break-all'
))
)))
}
app$callback(
output = list(id='output-data-upload', property = 'children'),
params = list(input(id = 'upload-data', property = 'contents'),
state(id = 'upload-data', property = 'filename'),
state(id = 'upload-data', property = 'last_modified')),
function(list_of_contents, list_of_names, list_of_dates){
if(length(list_of_contents[[1]])>0){
print("Inside if")
children = lapply(1:length(list_of_contents), function(x){
parse_contents(list_of_contents[[x]], list_of_names[[x]], list_of_dates[[x]])
})
return(children)
}
})
app$run_server()
I hope they revise this example to make it work.

JSON URL From NBA Website Not Working Anymore

I've been working on a project that scrapes data from the nba.com stats website using R. A couple of months ago I was able to use it easily, but now the URL does not seem to work and I can't figure out why. Looking at the website, it doesn't seem like the URL has changed at all, but I can't access it via my browser.
library(rjson)
url <- "https://stats.nba.com/stats/scoreboardV2?DayOffset=0&LeagueID=00&gameDate=02%2F07%2F2020"
data_json <- fromJSON(file = url)
Is anyone else experiencing this problem?
It was a header-related issue. The following fixed it:
library(httr)      # GET(), add_headers()
library(magrittr)  # the %>% pipe
library(rjson)     # fromJSON(), as loaded in the question
url <- "https://stats.nba.com/stats/scoreboardV2?DayOffset=0&LeagueID=00&gameDate=02%2F07%2F2020"
headers = c(
`Connection` = 'keep-alive',
`Accept` = 'application/json, text/plain, */*',
`x-nba-stats-token` = 'true',
`User-Agent` = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36',
`x-nba-stats-origin` = 'stats',
`Sec-Fetch-Site` = 'same-origin',
`Sec-Fetch-Mode` = 'cors',
`Referer` = 'http://stats.nba.com/%referer%/',
`Accept-Encoding` = 'gzip, deflate, br',
`Accept-Language` = 'en-US,en;q=0.9'
)
res <- GET(url, add_headers(.headers = headers))
data_json <- res$content %>%
rawToChar() %>%
fromJSON()

Search website for phrase in R

I'd like to understand what applications of machine learning are being developed by the US federal government. The federal government maintains the website FedBizOps that contains contracts. The web site can be searched for a phrase, e.g. "machine learning", and a date range, e.g. "last 365 days" to find relevant contracts. The resulting search produces links that contain a contract summary.
I'd like to be able to pull the contract summaries, given a search term and a date range, from this site.
Is there any way I can scrape the browser-rendered data into R? A similar question exists on web scraping, but I don't know how to change the date range.
Once the information is pulled into R, I'd like to organize the summaries with a bubble chart of key phrases.
This may look like a site that uses XHR via JavaScript to retrieve the URL contents, but it's not. It's just a plain website that can easily be scraped via standard rvest & xml2 calls like html_session() and read_html(). It does keep the Location: URL the same, so it kinda looks like XHR even though it's not.
However, this is a <form>-based site, which means you could be generous to the community and write an R wrapper for the "hidden" API and possibly donate it to rOpenSci.
To that end, I used the curlconverter package on the "Copy as cURL" content from the POST request and it provided all the form fields (which seem to map to most — if not all — of the fields on the advanced search page):
library(curlconverter)
make_req(straighten())[[1]] -> req
httr::VERB(verb = "POST", url = "https://www.fbo.gov/index?s=opportunity&mode=list&tab=list",
httr::add_headers(Pragma = "no-cache",
Origin = "https://www.fbo.gov",
`Accept-Encoding` = "gzip, deflate, br",
`Accept-Language` = "en-US,en;q=0.8",
`Upgrade-Insecure-Requests` = "1",
`User-Agent` = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.41 Safari/537.36",
Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
`Cache-Control` = "no-cache",
Referer = "https://www.fbo.gov/index?s=opportunity&mode=list&tab=list",
Connection = "keep-alive",
DNT = "1"), httr::set_cookies(PHPSESSID = "32efd3be67d43758adcc891c6f6814c4",
sympcsm_cookies_enabled = "1",
BALANCEID = "balancer.172.16.121.7"),
body = list(`dnf_class_values[procurement_notice][keywords]` = "machine+learning",
`dnf_class_values[procurement_notice][_posted_date]` = "365",
search_filters = "search",
`_____dummy` = "dnf_",
so_form_prefix = "dnf_",
dnf_opt_action = "search",
dnf_opt_template = "VVY2VDwtojnPpnGoobtUdzXxVYcDLoQW1MDkvvEnorFrm5k54q2OU09aaqzsSe6m",
dnf_opt_template_dir = "Pje8OihulaLVPaQ+C+xSxrG6WrxuiBuGRpBBjyvqt1KAkN/anUTlMWIUZ8ga9kY+",
dnf_opt_subform_template = "qNIkz4cr9hY8zJ01/MDSEGF719zd85B9",
dnf_opt_finalize = "0",
dnf_opt_mode = "update",
dnf_opt_target = "", dnf_opt_validate = "1",
`dnf_class_values[procurement_notice][dnf_class_name]` = "procurement_notice",
`dnf_class_values[procurement_notice][notice_id]` = "63ae1a97e9a5a9618fd541d900762e32",
`dnf_class_values[procurement_notice][posted]` = "",
`autocomplete_input_dnf_class_values[procurement_notice][agency]` = "",
`dnf_class_values[procurement_notice][agency]` = "",
`dnf_class_values[procurement_notice][zipstate]` = "",
`dnf_class_values[procurement_notice][procurement_type][]` = "",
`dnf_class_values[procurement_notice][set_aside][]` = "",
mode = "list"), encode = "form")
curlconverter adds the httr:: prefixes to the various functions since you can actually use req() to make the request. It's a bona-fide R function.
However, most of the data being passed in is browser "cruft" and can be trimmed down a bit and moved into a POST request:
library(httr)
library(rvest)
POST(url = "https://www.fbo.gov/index?s=opportunity&mode=list&tab=list",
add_headers(Origin = "https://www.fbo.gov",
Referer = "https://www.fbo.gov/index?s=opportunity&mode=list&tab=list"),
set_cookies(PHPSESSID = "32efd3be67d43758adcc891c6f6814c4",
sympcsm_cookies_enabled = "1",
BALANCEID = "balancer.172.16.121.7"),
body = list(`dnf_class_values[procurement_notice][keywords]` = "machine+learning",
`dnf_class_values[procurement_notice][_posted_date]` = "365",
search_filters = "search",
`_____dummy` = "dnf_",
so_form_prefix = "dnf_",
dnf_opt_action = "search",
dnf_opt_template = "VVY2VDwtojnPpnGoobtUdzXxVYcDLoQW1MDkvvEnorFrm5k54q2OU09aaqzsSe6m",
dnf_opt_template_dir = "Pje8OihulaLVPaQ+C+xSxrG6WrxuiBuGRpBBjyvqt1KAkN/anUTlMWIUZ8ga9kY+",
dnf_opt_subform_template = "qNIkz4cr9hY8zJ01/MDSEGF719zd85B9",
dnf_opt_finalize = "0",
dnf_opt_mode = "update",
dnf_opt_target = "", dnf_opt_validate = "1",
`dnf_class_values[procurement_notice][dnf_class_name]` = "procurement_notice",
`dnf_class_values[procurement_notice][notice_id]` = "63ae1a97e9a5a9618fd541d900762e32",
`dnf_class_values[procurement_notice][posted]` = "",
`autocomplete_input_dnf_class_values[procurement_notice][agency]` = "",
`dnf_class_values[procurement_notice][agency]` = "",
`dnf_class_values[procurement_notice][zipstate]` = "",
`dnf_class_values[procurement_notice][procurement_type][]` = "",
`dnf_class_values[procurement_notice][set_aside][]` = "",
mode="list"),
encode = "form") -> res
This portion:
set_cookies(PHPSESSID = "32efd3be67d43758adcc891c6f6814c4",
sympcsm_cookies_enabled = "1",
BALANCEID = "balancer.172.16.121.7")
makes me think you should use html_session or GET at least once on the main URL to establish those cookies in the cached curl handler (which will be created & maintained automagically for you).
The add_headers() bit may also not be necessary but that's an exercise left for the reader.
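If you do end up needing to establish the session first, a minimal sketch (assuming the cookies are set when the search page itself is requested) would be to hit the main URL once before issuing the POST:
library(httr)
library(rvest)
# one GET so the cached curl handle picks up PHPSESSID and friends
invisible(GET("https://www.fbo.gov/index?s=opportunity&mode=list&tab=list"))
# or, equivalently, start an rvest session and submit the form from there:
# s <- html_session("https://www.fbo.gov/index?s=opportunity&mode=list&tab=list")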
You can find the table you're looking for via:
content(res, as="text", encoding="UTF-8") %>%
read_html() %>%
html_nodes("table.list") %>%
html_table() %>%
dplyr::glimpse()
## Observations: 20
## Variables: 4
## $ Opportunity <chr> "NSN: 1650-01-074-1054; FILTER ELEMENT, FLUID; WSIC: L SP...
## $ Agency/Office/Location <chr> "Defense Logistics Agency DLA Acquisition LocationsDLA Av...
## $ Type / Set-aside <chr> "Presolicitation", "Presolicitation", "Award", "Award", "...
## $ Posted On <chr> "Sep 28, 2016", "Sep 28, 2016", "Sep 28, 2016", "Sep 28, ...
There's an indicator on the page saying these are results "1 - 20 of 2008". You need to scrape that as well and deal with the paginated results. This is also left as an exercise to the reader.
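As a rough starting point for the pagination (a sketch only; the regex assumes the counter renders literally as "1 - 20 of NNNN" in the page text):
pg_txt <- content(res, as = "text", encoding = "UTF-8") %>%
  read_html() %>%
  html_text()
# pull the total out of the "1 - 20 of NNNN" counter
total <- as.integer(gsub(",", "", sub(".*1 - 20 of ([0-9,]+).*", "\\1", pg_txt)))
n_pages <- ceiling(total / 20)  # 20 results shown per page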
