I am learning how to fill in and submit forms with rvest in R, and I got stuck when trying to search for the ggplot tag on Stack Overflow. This is my code:
url<-"https://stackoverflow.com/questions"
(session<-html_session("https://stackoverflow.com/questions"))
(form<-html_form(session)[[2]])
(filled_form<-set_values(form, tagQuery = "ggplot"))
searched<-submit_form(session, filled_form)
I've got the error:
Submitting with '<unnamed>'
Error in parse_url(url) : length(url) == 1 is not TRUE
Following this question (rvest error on form submission), I tried several things to solve this, but none of them worked:
filled_form$fields[[13]]$name<-"submit"
filled_form$fields[[14]]$name<-"submit"
filled_form$fields[[13]]$type<-"button"
filled_form$fields[[14]]$type<-"button"
Any help would be appreciated.
The search field is in html_form(session)[[1]], not in [[2]].
There is no submit button in this form:
<form> 'search' (GET /search)
<input text> 'q':
so adding a fake submit button as a workaround seems to work:
<form> 'search' (GET /search)
<input text> 'q':
<input submit> '':
This gives the following code sequence:
library(rvest)
url<-"https://stackoverflow.com/questions"
(session<-html_session("https://stackoverflow.com/questions"))
(form<-html_form(session)[[1]])
fake_submit_button <- list(name = NULL,
type = "submit",
value = NULL,
checked = NULL,
disabled = NULL,
readonly = NULL,
required = FALSE)
attr(fake_submit_button, "class") <- "input"
form[["fields"]][["submit"]] <- fake_submit_button
(filled_form<-set_values(form, q = "ggplot"))
searched<-submit_form(session, filled_form)
The problem is that the reply is a captcha page:
searched$url
[1] "https://stackoverflow.com/nocaptcha?s=7291e7e6-9b8b-4b5f-bd1c-0f6890c23573"
You won't be able to handle this with rvest, but after clicking manually on the captcha you get the query you're looking for :
https://stackoverflow.com/search?q=ggplot
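Since the captcha cannot be solved from rvest, a script can at least notice the redirect and stop before parsing the wrong page. This is a minimal sketch, assuming the captcha page can be recognised by "/nocaptcha" in the final URL, as in the output above; is_captcha_url is a hypothetical helper name, not part of rvest:

```r
# Hypothetical helper: TRUE when a URL points at the captcha page.
# Assumption: the captcha redirect always contains "/nocaptcha" in its path.
is_captcha_url <- function(u) {
  grepl("/nocaptcha", u, fixed = TRUE)
}

captcha_url <- "https://stackoverflow.com/nocaptcha?s=7291e7e6-9b8b-4b5f-bd1c-0f6890c23573"
result_url  <- "https://stackoverflow.com/search?q=ggplot"

is_captcha_url(captcha_url)
is_captcha_url(result_url)
```

In the session above this would be called as is_captcha_url(searched$url) right after submit_form.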
It is probably much easier to request the search URL directly, as in my other answer, where search holds the query string:
read_html(paste0('https://stackoverflow.com/search?tab=newest&q=',search))
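The direct-URL approach can be sketched as follows, assuming the search page accepts the tab and q query parameters shown above. build_search_url is a hypothetical helper; URLencode comes from base R's utils package and escapes spaces and other reserved characters in the query:

```r
# Hypothetical helper: build a Stack Overflow search URL for a query string.
# Assumption: the endpoint and the tab/q parameters are as in the line above.
build_search_url <- function(search, tab = "newest") {
  paste0("https://stackoverflow.com/search?tab=", tab,
         "&q=", utils::URLencode(search, reserved = TRUE))
}

simple_url <- build_search_url("ggplot")
spaced_url <- build_search_url("ggplot2 facet")

print(simple_url)
print(spaced_url)

# With rvest loaded, the page could then be fetched with:
# page <- read_html(simple_url)
```

Encoding the query matters as soon as it contains spaces or characters such as & or #, which would otherwise change the meaning of the URL.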
I have a problem with the parameters of an action. Can someone help me? I'll explain:
I have an action "Etiquette(string nom_pharm, int numBon, int nbColis)" in the "Etiquetage" controller. In the view I have two text inputs for numBon and nbColis. I try to read their values and pass them as parameters, but the second one always arrives as null. I use this line:
<input type="button" value="Ok" onclick="window.location = '@Url.Action("Etiquette", "Etiquetage",new { nom_pharm = @Model.PharmacieNom, numBon = " ", nbColis = " " })'+parseInt(document.getElementById('in').value), +parseInt(document.getElementById('bon').value)" />
Do you have an idea of the correct syntax?
The relative URL you need is /Etiquetage/Etiquette?nom_pharm={value}&numBon={value of bon}&nbColis={value of in}
but your code is generating /Etiquetage/Etiquette?nom_pharm={value}&numBon= &nbColis= {value of in}{value of bon}.
Your approach is a bit unusual. A more conventional approach would be to use a form with a POST rather than a GET. Assuming you have a good reason for using JavaScript to send a GET request here, this code should work.
<input type="button"
value="Ok"
onclick="window.location = '@Url.Action("Etiquette", "Etiquetage")?nom_pharm=@Model.PharmacieNom&numBon='
+ document.getElementById('bon').value + '&nbColis='
+ document.getElementById('in').value" />
Note that I have assumed that 'bon' goes with numBon and 'in' goes with nbColis.
Because you are building a string, the parseInt calls are not needed.
I need to read the source code of a page for research. I can read the full text in a browser, but in R part of it is hidden: the content is replaced by a message saying that it is only available to browsers that use cookies.
Based on the question
How to properly set cookies to get URL content using httr
I am using the following code:
library(httr)
url<-"https://www.ogol.com.br/player_results.php?id=5637"
r <- GET(url, query = list(a = 1))
cookies(r)
response<-GET(url,
set_cookies(`__cfduid` = "dde27d084f28a84488910bf48f22f5fa01530024956",
`FORCE_SITE_VERSION` = "desktop",
`FORCE_MODALIDADE` = "1",
`PHPSESSID` = "uou4jukkosdaafidp26857k8t3"))
player_code<-content(x = response,as = "text", encoding = "ISO-8859-1")
But this also hides part of the source and returns the message:
"Este conteúdo apenas está disponível para browsers que aceitam cookies" (I include the message so you can check whether your solution gives the same result :) )
It means: the content is only available to browsers that accept cookies.
Am I using the wrong cookie values, or is there something else I am missing? Thanks in advance.
I am trying to scrape a web page with the code below, but I get the following warning message:
In request_POST(session, url = url, body = request$values, encode = request$encode, :
Internal Server Error (HTTP 500)
What am I doing wrong?
library(rvest)
sisben <-html_session("https://wssisbenconsulta.sisben.gov.co/dnp_sisbenconsulta/dnp_sisben_consulta.aspx")
form <- html_form(sisben)[[1]]
fillform <- set_values(form,"ddlTipoDocumento" = "Cédula de Ciudadanía", "tboxNumeroDocumento" = "1234")
sis <- submit_form(session=sisben, form=fillform)
What data exactly do you want to scrape? To me the code looks as if you only interact with a page (fill in a form and submit it); I do not see any rvest code that actually scrapes data.
Regarding your error:
Looking at the HTML source, it seems that you only submit the label of the "Tipo de Documento" option, not its internal value (which is a number):
<option value="-1">Seleccione...</option>
<option value="1">Cédula de Ciudadanía</option>
<option value="3">Cédula de Extranjería</option>
<option selected="selected" value="4">Registro Civil</option>
<option value="2">Tarjeta de Identidad</option>
I didn't receive an error when using the option value as input:
fillform <- set_values(form,"ddlTipoDocumento" = "1", "tboxNumeroDocumento" = "1234")
sis <- submit_form(session=sisben, form=fillform)
leads to the output:
Submitting with 'tboxNumeroDocumento'
Maybe this is already what you are looking for?
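The label-to-value mapping can also be looked up in code rather than read off the page source. This is a minimal sketch, assuming the option list quoted above sits inside a select element (the wrapper and the name ddlTipoDocumento on it are assumptions here); it parses an inline string so that it runs offline:

```r
library(rvest)

# Inline copy of the options quoted above; the <select> wrapper is assumed.
html <- paste0(
  '<select name="ddlTipoDocumento">',
  '<option value="-1">Seleccione...</option>',
  '<option value="1">Cédula de Ciudadanía</option>',
  '<option value="3">Cédula de Extranjería</option>',
  '<option selected="selected" value="4">Registro Civil</option>',
  '<option value="2">Tarjeta de Identidad</option>',
  '</select>'
)

option_nodes <- html_nodes(read_html(enc2utf8(html), encoding = "UTF-8"), "option")

# Named character vector: visible label -> internal value
doc_types <- setNames(html_attr(option_nodes, "value"), html_text(option_nodes))

doc_types[["Registro Civil"]]
doc_types[["Tarjeta de Identidad"]]
```

The looked-up value can then be passed to set_values, for example set_values(form, "ddlTipoDocumento" = doc_types[["Registro Civil"]], "tboxNumeroDocumento" = "1234").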
I am trying to access all the comments on the posts of a particular Facebook page using the getPost function, but I get the error below. How can I resolve this issue? Thanks.
library(Rfacebook)
load("fbauthentication")
date1<-Sys.Date()-7
date2<-Sys.Date()
MaybellineUS<-getPage(page="Maybelline",token=authentication,n=100,since=date1,until=date2,feed=TRUE)
df <- data.frame(from_id=character(),from_name=character(),message=character(),created_time=character(),
likes_count=numeric(),comments_count=numeric(),id=character(),stringsAsFactors = FALSE)
i <- 1
while(i<=length(MaybellineUS)){
x<- getPost(post=MaybellineUS$id[i],n=500,token =authentication )
df<-rbind(df,x[['comments']])
i<-i+1
}
Error in callAPI(url = url, token = token) :
(#100) Tried accessing nonexisting field (from) on node type (Page)
I had the same problem with the Rfacebook package. It turns out that this is because the first call (getPage) returns some NA fields from the API. Consequently, your second call (getPost) is incorrectly formed. To prevent the error, wrap your second call in an if statement like this:
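i <- 1
while (i <= nrow(MaybellineUS)) {
  # Skip rows where getPage returned no post id
  if (!is.na(MaybellineUS$id[i])) {
    x <- getPost(post = MaybellineUS$id[i], n = 500, token = authentication)
    df <- rbind(df, x[['comments']])
  }
  # Advance on every iteration, including skipped rows, to avoid an infinite loop
  i <- i + 1
}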
EDIT:
Also, I think in your example your token should be "fbauthentication", not "authentication".
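The same NA guard can be written without a manual counter by dropping the missing ids first. This is only a sketch: collect_comments and fake_fetch are hypothetical names, and fake_fetch stands in for a real call such as getPost(post = id, n = 500, token = authentication) so that the example runs without a Facebook token:

```r
# Hypothetical helper: fetch comments for every non-missing post id.
# `fetch` is any function taking one id and returning a list with a
# `comments` data frame (the shape the loop above reads with x[['comments']]).
collect_comments <- function(ids, fetch) {
  ids <- ids[!is.na(ids)]                      # skip ids that came back as NA
  pieces <- lapply(ids, function(id) fetch(id)[["comments"]])
  do.call(rbind, pieces)                       # stack all comment tables
}

# Stand-in for getPost, returning one fake comment per post (illustration only)
fake_fetch <- function(id) {
  list(comments = data.frame(id = id, message = paste("comment on", id),
                             stringsAsFactors = FALSE))
}

all_comments <- collect_comments(c("p1", NA, "p2"), fake_fetch)
print(all_comments)
```

Collecting the pieces in a list and binding them once also avoids growing df with rbind inside the loop.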
How do I enclose a field name to postForm with RCurl when the form has fields like those below?
<input id="form:checkEstrato" type="checkbox" name="form:checkEstrato" checked="checked" />
<input id="form:checkArea" type="checkbox" name="form:checkArea" checked="checked" />
if I try something like
if(url.exists(url))
results <- postForm(url,
form:evento="35",
form:area = "10")
I get
> if(url.exists(url))
+ results <- postForm(url,
+ form:evento="35",
Error: unexpected '=' in:
" results <- postForm(url,
form:evento="
> form:area = "10")
Error: unexpected ')' in " form:area = "10")"
In fact it was simple, although now I have to work out why RCurl doesn't get what I want.
To avoid the above error, it was just a matter of enclosing the parameter names in quotes:
if(url.exists(url))
results <- postForm(url,
'form:evento'="35",
'form:area' = "10")
Now let's move ahead and try to understand what is being sent to the server and why it's not working the way I expected.
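The quoting rule is general R syntax rather than something specific to RCurl: a name containing a colon is not a syntactic name, so it has to be quoted (or wrapped in backticks) wherever it is used as an argument name. A small sketch that runs without a network connection; the call in the final comment relies on postForm's .params argument and is left commented out:

```r
# Quoted or backticked names are accepted wherever an argument name is
# expected, including names containing ":".
params <- list("form:evento" = "35", `form:area` = "10")

names(params)

# Elements with non-syntactic names are accessed with [[ ]] or backticks
params[["form:evento"]]
params$`form:area`

# The list could then be passed in one go (not run here):
# results <- RCurl::postForm(url, .params = params)
```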