Scraping login protected website with a challenge form? - r

I'm trying to do some web scraping from steamspy.com, specifically the total playtime hours for a certain game. That info is behind the login wall for the site, so I've been trying to figure out how to get R past it for html mining.
I tried this method for passing login credentials via POST() but it doesn't seem to work. I noticed that the login handler for that example used POST, whereas looking at the source code for steamspy it seems to use a challenge form and I wasn't sure how to proceed with R.
My attempt thus far looks like this:
library(httr)
handle <- handle("http://steamspy.com")
path <- "/login/"
login <- list(
  jschl_vc = "bc4e...",
  pass = "148..."
)
response <- POST(handle = handle, path = path, body = login)
I found the values for the jschl_vc and pass from inspecting the source code after I logged in. The code above doesn't work and gives me:
Error in curl::curl_fetch_memory(url, handle = handle) : Failure when receiving data from the peer
probably because I'm trying to POST to a challenge form. Is there a way to proceed that I'm missing?
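The field names jschl_vc and pass look like part of a JavaScript anti-bot challenge rather than a login form, and such values appear to be regenerated on each visit, so hard-coding them is unlikely to work. One approach sometimes used in this situation is to log in with a normal browser, copy the Cookie request header from the developer tools, and replay those cookies from R. The following is only a sketch under those assumptions: the cookie names, the path and the user agent string are placeholders that do not come from steamspy.com, and whether the site accepts replayed cookies (or permits scraping at all) is not confirmed here.

```r
# Parse a "Cookie" header copied from the browser's developer tools
# into a named character vector (cookie name -> cookie value).
parse_cookie_header <- function(header) {
  pairs <- trimws(strsplit(header, ";", fixed = TRUE)[[1]])
  pairs <- pairs[nzchar(pairs)]
  eq <- regexpr("=", pairs, fixed = TRUE)  # position of the first "=" in each pair
  values <- substring(pairs, eq + 1)
  names(values) <- substring(pairs, 1, eq - 1)
  values
}

# Request a page with the browser's cookies attached (requires the httr package).
# base_url, path, cookie_header and user_agent are all supplied by the caller;
# sending the browser's own user agent string is an assumption, not a known requirement.
fetch_with_cookies <- function(base_url, path, cookie_header, user_agent) {
  httr::GET(
    base_url,
    path = path,
    httr::set_cookies(.cookies = parse_cookie_header(cookie_header)),
    httr::user_agent(user_agent)
  )
}

# Hypothetical cookie string; real names and values come from your own browser session.
cookies <- parse_cookie_header("session_id=abc123; remember=1")
```

A call would then look like fetch_with_cookies("http://steamspy.com", "/some/page", "<copied Cookie header>", "<browser user agent>"), with the response body passed on to an HTML parser.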

Related

Downloading data table from a website requires login in R

I am trying to automate downloading data from a website which requires login first. After logging in, I need to select the table I want from a dropdown, select the dates for which I want the data, and then download it in text format. Can this be done in R? The page I'm trying to scrape is a JSP page, and when I run the following command it gives me an error like "JSP processing error". Can anybody help me with a solution? Also, since I need to select the dates, I need to fill in those form fields as well.
Any help would be really appreciated.
handle <- handle("http://xyzwebsite.jsp")
# fields found in the login form.
login <- list(
  ussername = "username",
  password = "password"
)
response <- POST(handle = handle, body = login)
response

Log in to website using Jsoup

I'm trying to scrape a webpage for data but came across the problem of needing to log in.
Connection.Response loginForm = Jsoup.connect("http://www.rapidnyc.net/users/google_login")
        .method(Connection.Method.GET)
        .execute();
Document document = Jsoup.connect("http://www.rapidnyc.net/users/google_login")
        .data("Email", "testEmail")
        .data("Passwd", "testPass")
        .... //other form data
        .cookies(loginForm.cookies())
        .post();
This gives me the org.jsoup.HttpStatusException: HTTP error fetching URL. Status=400
I used chrome developer tool to look at the Form data being posted but nothing I post works.
1. Have you submitted ALL input fields? Including HIDDEN ones.
2. I see the website requires "captcha-box" authentication, which is to prevent web crawlers from logging in. I highly doubt you will be able to log in with your program.
I suspect the 400 status comes from your program not being able to provide a value for the captcha.
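To illustrate the first point, here is a rough sketch of collecting every input field of a form, hidden ones included, before posting. It is written in R with the xml2 package to match the other questions on this page rather than in Java, and the HTML snippet and the csrf_token field are made up for the example; the real page will have its own fields.

```r
library(xml2)

# Collect every named <input> of the first <form> on a page,
# including type="hidden" fields, as a named list (name -> value).
form_fields <- function(html) {
  doc <- read_html(html)
  inputs <- xml_find_all(doc, "(//form)[1]//input[@name]")
  values <- xml_attr(inputs, "value", default = "")  # inputs without a value become ""
  as.list(setNames(values, xml_attr(inputs, "name")))
}

# Hypothetical login page; the field names are illustrative only.
page <- '<html><body><form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="tok123">
  <input type="text" name="Email">
  <input type="password" name="Passwd">
</form></body></html>'

fields <- form_fields(page)
fields$Email  <- "testEmail"  # overwrite the visible fields with your own values
fields$Passwd <- "testPass"
```

The resulting list keeps the hidden token alongside the credentials and can be sent as a form-encoded body. This does not help with a captcha, whose value cannot be read from the page.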

How can I allow new R users to send information to a Google Form?

How can I allow new R users to send information to a Google Form? (RSelenium requires a bit of set up, at least for headless browsing, so it's not the best candidate IMO but I may be missing something that makes it the best choice).
I have some new R users I want to get responses from interactively and send to a secure location. I have chosen Google Forms to pass the information to, as it allows one way sends of the info and doesn't allow the user access to the spreadsheet that is created from the form.
Here's a url of this form:
url <- "https://docs.google.com/forms/d/1tz2RPftOLRCQrGSvgJTRELrd9sdIrSZ_kxfoFdHiqD4/viewform"
To give context here's how I'm using R to interact with the user:
question <- function(message, opts = c("Yes", "No")){
  message(message)
  ans <- menu(opts)
  if (ans == "2") FALSE else TRUE
}
question("Was this information helpful?")
I want to then send that TRUE/FALSE to the Google form above. How can I send a response to the Google Form above from within R in a way that I can embed in code the user will interact with and doesn't require difficult set up by the user?
Add on R packages are fine if they accomplish the task.
You can send a POST request. Here is an example using the httr package:
library(httr)
send_response <- function(response) {
  form_url <- "https://docs.google.com/forms/d/1tz2RPftOLRCQrGSvgJTRELrd9sdIrSZ_kxfoFdHiqD4/formResponse"
  POST(form_url,
       query = list(`entry.1651773982` = response))
}
Then you can call it:
send_response(question("Was this information helpful?"))

How to Mine Data from a Facebook Group using R?

I am using the Rfacebook package to do data mining.
For example, to get a Facebook page we do
fb_page <- getPage(page="facebook", token=fb_oauth)
In my case it is a private group and the URL is something like this. So how do I get the page info for a group? I tried the following but got this error:
Error in callAPI(url = url, token = token) : Unknown path components: /posts
my_page <- getPage(page="group/222568978569", token=my_oauth)
This isn't a reproducible example because the page you mention doesn't exist. But I suggest that you check that this group has authorized your app. Note that after the introduction of version 2.0 of the Graph API, only friends/groups who are using the application that you used to generate the token to query the API will be returned.

login in to page in R httr moviepilot

I'm trying to begin working myself into web-scraping. Now my target is to get my personal rated movies from the moviepilot.de page.
For this I need to access following page: http://www.moviepilot.de/users/schlusie/rated/movies. But without authentication it is not possible.
I've read that the httr package can do something like this: create a handle with handle(), log in through it, and then request the desired page with the same handle. It should look like this:
library(httr)
mp = handle("http://moviepilot.de")
# authentication step
GET(handle=mp, path="/users/schlusie/rated/movies")
This is the login-page: http://www.moviepilot.de/login
Can someone please give me any pointers?
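A minimal sketch of what the missing authentication step could look like, assuming the site uses a plain form login: post the credentials on the handle, then request the protected page on the same handle so the session cookie set by the login response is reused. The login path and the form field names passed in credentials are assumptions that need to be read from the login page's HTML, and the form may also require hidden fields (for example a token), which this sketch does not handle.

```r
# Log in once, then fetch a protected page on the same handle so that
# cookies set by the login response are sent with the second request.
# Requires the httr package. All arguments are supplied by the caller:
# credentials is a named list of form fields (names are site-specific).
fetch_after_login <- function(base_url, login_path, page_path, credentials) {
  h <- httr::handle(base_url)
  login <- httr::POST(handle = h, path = login_path,
                      body = credentials, encode = "form")
  httr::stop_for_status(login)  # stop early if the login request itself failed
  httr::GET(handle = h, path = page_path)
}
```

For the case above, a call might be fetch_after_login("http://www.moviepilot.de", "/login", "/users/schlusie/rated/movies", list(username = "...", password = "...")), where the field names username and password are placeholders.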

Resources