Downloading a data table from a website that requires login in R

I am trying to automate downloading data from a website that requires login first. After logging in, I need to pick the particular table I want from a dropdown, select the dates for which I want the data, and then download it in text format. Can this be done in R? The page I'm trying to scrape is a JSP page, and when I run the following command it gives me an error like "JSP processing error". Can anybody help me with a solution? Also, since I need to select the dates, I need to get at those forms as well.
Any help would be really appreciated.
library(httr)

handle <- handle("http://xyzwebsite.jsp")

# Field names must match the name attributes of the login form's inputs.
login <- list(
  username = "username",
  password = "password"
)

response <- POST(handle = handle, body = login)
response
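For reference, a minimal sketch of how this kind of flow often looks with rvest sessions; every URL, form index, and field name below is a placeholder rather than something taken from the real site:

library(rvest)
library(httr)

# Placeholder URL and field names -- match them against the actual page's HTML.
s <- html_session("http://xyzwebsite.jsp")

# Fill and submit the login form.
login_form <- html_form(s)[[1]]
login_form <- set_values(login_form, username = "username", password = "password")
s <- submit_form(s, login_form)

# After login, fill the table/date selection form the same way.
data_form <- html_form(s)[[1]]
data_form <- set_values(data_form,
                        table     = "my_table",
                        from_date = "2015-01-01",
                        to_date   = "2015-01-31")
result <- submit_form(s, data_form)

# Save the returned text.
writeLines(content(result$response, "text"), "data.txt")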

Related

Web Scraping - Using Functions on a Secure Site (rvest)

I'm attempting to scrape a site that requires a form to be submitted before getting results. I'm struggling to understand how the process works, let alone the syntax.
I have been looking at code posted by other people, and many of them use rvest or RSelenium. I can't seem to get my form to submit properly, and I'm not sure how to extract the results into R once it does.
Now, I can't share the specific site that I'm working from, but I've found an analog:
https://gapines.org/eg/opac/advanced
For example, I might need to select "Books" under "Item Type," and "Braille" under "Item Form." Once the form is submitted, I would need to capture that results page.
Copying from other people's code, I have the following:
library(rvest)

url <- "https://gapines.org/eg/opac/advanced"
my_session <- html_session(url)        # create a persistent session
unfilled_forms <- html_form(my_session)
search_form <- unfilled_forms[[2]]     # select the form you need to fill
filled_form <- set_values(search_form,
                          `fi:item_type` = "Books",
                          `fi:item_form` = "Braille")
results_session <- submit_form(my_session, filled_form)
When I run submit_form(), it says "Submitting with 'NULL'."
Once it is submitted, I also want to extract the results, but I'm not sure how to begin.
Thanks in advance.
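For what it's worth, a sketch of one way this often goes once the form submits; the note about the submit button is how rvest behaves, but the CSS selector below is an assumption to verify against the live results page:

library(rvest)

s <- html_session("https://gapines.org/eg/opac/advanced")
search_form <- html_form(s)[[2]]
search_form <- set_values(search_form,
                          `fi:item_type` = "Books",
                          `fi:item_form` = "Braille")

# "Submitting with 'NULL'" is rvest saying it picked a submit button with no
# name attribute; if the form has several buttons, name one via `submit =`.
results <- submit_form(s, search_form)

# Pull something out of the results page; this selector is an assumption.
titles <- results %>%
  html_nodes(".record_title") %>%
  html_text(trim = TRUE)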

Scraping a login-protected website with a challenge form?

I'm trying to do some web scraping from steamspy.com, specifically the total playtime hours for a certain game. That info is behind the site's login wall, so I've been trying to figure out how to get R past it for HTML mining.
I tried this method for passing login credentials via POST(), but it doesn't seem to work. I noticed that the login handler in that example used POST, whereas looking at the source code for steamspy it seems to use a challenge form, and I wasn't sure how to proceed in R.
My attempt thus far looks like this:
library(httr)

handle <- handle("http://steamspy.com")
path <- "/login/"

# Values truncated; both were copied from the page source after logging in.
login <- list(
  jschl_vc = "bc4e...",
  pass = "148..."
)

response <- POST(handle = handle, path = path, body = login)
I found the values for jschl_vc and pass by inspecting the source code after I logged in. The code above doesn't work and gives me:
Error in curl::curl_fetch_memory(url, handle = handle) : Failure
when receiving data from the peer
probably since I'm trying to POST to a challenge form. Is there a way to proceed that I'm missing?
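The jschl_vc/pass pair looks like Cloudflare's JavaScript challenge rather than the site's own login form; those values are computed by JavaScript for each request, so replaying captured ones with a plain POST() generally fails. One workaround is to drive a real browser so the challenge runs normally, e.g. with RSelenium. A sketch, assuming a working Selenium setup; the element name below is a guess:

library(RSelenium)

# Start a browser; assumes the driver can be downloaded and launched locally.
rd <- rsDriver(browser = "firefox", verbose = FALSE)
remote <- rd$client

remote$navigate("http://steamspy.com/login/")
# The browser solves the JavaScript challenge itself; then log in normally.
user_box <- remote$findElement(using = "name", value = "username")  # assumed field name
user_box$sendKeysToElement(list("myuser"))
# ... fill the password field and click the submit button the same way ...

# Once logged in, hand the rendered page to rvest/xml2 for mining.
page_html <- remote$getPageSource()[[1]]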

How can I allow new R users to send information to a Google Form?

How can I allow new R users to send information to a Google Form? (RSelenium requires a bit of setup, at least for headless browsing, so IMO it's not the best candidate, but I may be missing something that makes it the best choice.)
I have some new R users I want to get responses from interactively and send to a secure location. I have chosen Google Forms to pass the information to, as it allows one-way sends of the info and doesn't give the user access to the spreadsheet created from the form.
Here's a url of this form:
url <- "https://docs.google.com/forms/d/1tz2RPftOLRCQrGSvgJTRELrd9sdIrSZ_kxfoFdHiqD4/viewform"
To give context here's how I'm using R to interact with the user:
question <- function(message, opts = c("Yes", "No")) {
  message(message)
  ans <- menu(opts)
  if (ans == 2) FALSE else TRUE   # menu() returns the index of the chosen option
}

question("Was this information helpful?")
I want to then send that TRUE/FALSE to the Google Form above. How can I send a response to it from within R, in a way that I can embed in code the user will interact with and that doesn't require difficult setup by the user?
Add-on R packages are fine if they accomplish the task.
You can send a POST query. Here is an example using the httr package:
library(httr)

send_response <- function(response) {
  form_url <- "https://docs.google.com/forms/d/1tz2RPftOLRCQrGSvgJTRELrd9sdIrSZ_kxfoFdHiqD4/formResponse"
  POST(form_url,
       query = list(`entry.1651773982` = response))
}
Then you can call it:
send_response(question("Was this information helpful?"))
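The entry.1651773982 key is the name attribute of the form's input field; you can find it by inspecting the form page's HTML, and it will be different for every form and every question.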

CRAN R 'httr'/'RCurl' packages - use cookies to load a page

I am a beginner in R coding (and in coding in general), and I am trying to load a bunch of prices from a website in Brazil.
http://www.muffatosupermercados.com.br/Home.aspx
When I open the page I am prompted with a form to choose a city; I want "CURITIBA".
Opening the cookies in Chrome, I see this:
Name: CidadeSelecionada
Content: CidadeId=55298&NomeCidade=CURITIBA&FilialId=53
My code is to get the prices from this link:
"http://www.muffatosupermercados.com.br/CategoriaProduto.aspx?Page=1&c=2"
library(httr)

a1 <- "http://www.muffatosupermercados.com.br/CategoriaProduto.aspx?Page=1&c=2"
# The cookie value is a single string, exactly as Chrome displays it.
b2 <- GET(a1, set_cookies(
  CidadeSelecionada = "CidadeId=55298&NomeCidade=CURITIBA&FilialId=53"
))
cookies(b2)
From this, the only cookie I get back is the session id cookie:
$ASP.NET_SessionId
[1] "o5wlycpnjbfraislczix1dj4"
and when I try to load the page I only get the page behind the form, which is empty:
html <- content(b2, "text")
writeLines(html, "myfile.txt")
Does anyone have an idea how to solve this? I also tried using RCurl and posting the form data, with no luck...
Here is a link to another thread of mine, trying to do this in a different way:
RCurl - submit a form and load a page
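One thing worth trying, as a sketch: make both requests through the same httr handle, so the ASP.NET_SessionId cookie set by the first request is sent along with the second. This assumes the site only needs the city cookie plus a live session:

library(httr)

h <- handle("http://www.muffatosupermercados.com.br")

# First request lets the server set ASP.NET_SessionId on the handle.
GET(handle = h, path = "/Home.aspx")

# Second request reuses that session and adds the city cookie Chrome showed.
b2 <- GET(handle = h,
          path  = "/CategoriaProduto.aspx",
          query = list(Page = 1, c = 2),
          set_cookies(CidadeSelecionada = "CidadeId=55298&NomeCidade=CURITIBA&FilialId=53"))

html <- content(b2, "text")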

Reading HTML tables in R if login and other previous actions are required

I am using the XML package to read HTML tables from web sites.
Actually I'm trying to read a table from a local address, something like http://10.35.0.9:8080/....
To get this table I usually have to log in first by typing a login and password.
Therefore, when I run:
library(XML)

acsi.url <- "http://10.35.0.9:8080/..."
acsi.df <- readHTMLTable(acsi.url, header = TRUE, stringsAsFactors = FALSE)
acsi.df
I see acsi.df isn't my table but the login page.
How can I tell R to enter the login and password and log in before reading the table?
There is no general solution; you have to analyze the details of your login procedure. The RCurl package and the following link should help:
Login to WordPress using RCurl
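For instance, with the httr package (a sketch that assumes a plain username/password POST form; the login path and field names are guesses to check against the actual login page):

library(httr)
library(XML)

# Log in first; httr reuses one handle per host, so the session cookies
# from this POST are sent automatically with the next request.
POST("http://10.35.0.9:8080/login",   # assumed login endpoint
     body = list(login = "mylogin", password = "mypassword"))

# Now fetch the protected page and parse the table out of its HTML text.
page <- GET("http://10.35.0.9:8080/...")
doc  <- htmlParse(content(page, "text"), asText = TRUE)
acsi.df <- readHTMLTable(doc, header = TRUE, stringsAsFactors = FALSE)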
