webscraping with rvest, select content in form - r

I am using rvest to scrape a particular website. On the website I need to select, for example, the date and the location that I want to display.
library(rvest)
session<-html_session(url)
form<-html_form(read_html(url))[[1]]
I can now set different values e.g. for the date.
set_values(form, from="08.02.2018")
and the console shows: <input text> 'from': 08.02.2018
as well as the other element, "location": <select> 'location': [0/3]
My question is how I can set the value for "location". I know its available options from the dropdown on the page. Still,
set_values(form, location="Location_A")
doesn't work. No selection is shown, and when I submit the form with
result<-submit_form(session,form)
a red message appears on the console (Submitting with 'Location'). The result is an object of class "session", which I wasn't able to read with
html_table(result)
It fails with:
Error in UseMethod("html_table") :
no applicable method for 'html_table' applied to an object of class "list"
I also tried
html_table(result[1])
but that did not work either and gave the same error.

Related

Form is submitting empty values though it's required in the datasource (Google App Maker)

Could someone please help me display an error message in Google App Maker when I submit a form, after validating whether all the required fields are filled in? I have tried using regular expressions, but it didn't work:

Shiny error message appears and disappears immediately

I have three selectInput controls in my app: the first is Class, the second is Select Student, and the third is Exams. When I choose class 1 in the 'Class' selectInput, the names of all the students in class 1 are generated in the 'Select Student' selectInput.
The error message pops up for a few seconds and then goes away whenever I change the choice in the 'Class' selectInput. This also happens when the page first loads.
The error message is: 'data' must be of a vector type, was 'NULL'.
This doesn't happen if I remove the bar plot and render just the table output.
The app is online here. You can see the error message yourself when you switch the Classes.
The code is here.
Any hints gratefully received.

Web Scraping in Ruby : Issues Scraping Data that comes from Database

I'm attempting to pull the Name, Address, Phone Number, and Email Address of resorts from the page(s)
http://www.exploreminnesota.com/places-to-stay/resorts/?keywords=&pageIndex=0&radius=0&mapTab=false&sortOrder=asc&sort=randomdaily&locationid=&startDate=false&class_id=7&lat=&lon=&city=&pageSize=20&type=reitlistings&attrFieldsOr=
I'm new to Ruby and have looked for some examples, but this seems to be too specific to find a suitable solution.
Focusing on the Email Address element, I used 'Inspect Element' and took its CSS path (#category-listings > li:nth-child(1) > div > div > ul > li:nth-child(2) > a).
I have written this Ruby script to try to pull the data:
require 'nokogiri'
require 'open-uri'
PAGE_URL = "http://www.exploreminnesota.com/places-to-stay/resorts/?keywords=&pageIndex=0&radius=0&mapTab=false&sortOrder=asc&sort=randomdaily&locationid=&startDate=false&class_id=7&lat=&lon=&city=&pageSize=20&type=reitlistings&attrFieldsOr="
page = Nokogiri::HTML(URI.open(PAGE_URL)) # URI.open: plain open no longer accepts URLs on Ruby 3.0+
site1 = page.css(' #category-listings li:nth-child(1) div div ul li:nth-child(2) a')
puts site1
The output:
<a href="mailto:%7B%7Br._source.database_fields.email%7D%7D" class="button gaTracker" title="{{r._source.database_fields.email}}" data-tracker-type="event" data-category="Email" data-label="{{r._source.location.split('/')[1]}}" data-action="{{url | analyticsAction}}">Email</a>
As you can see, where I want the email address, the title contains the unrendered template expression {{r._source.database_fields.email}} instead of the value.
When I inspect this element in the browser, the data shows as:
<a href="mailto:vacation@gull4seasons.com" class="button gaTracker" title="vacation@gull4seasons.com" data-tracker-type="event" data-category="Email" data-label="gull-four-seasons-resort" data-action="Places to Stay">Email</a>
I'm not sure how to access the data as shown in the browser console. Any help would be appreciated, and would also help me understand HTML / CSS and how data generally gets pulled into a page from a database.
Thank you!
The following code makes a GET request to exploreminnesota.com for only the JSON data you want -- i.e., you no longer need Nokogiri. It then parses the response into a Ruby object (nested hashes and arrays) and pretty-prints that object to the terminal.
require "open-uri"
require "json"
url = "http://www.exploreminnesota.com/getJsonData.ashx?id=61&keywords=&pageIndex=0&radius=0&mapTab=false&sortOrder=asc&sort=randomdaily&locationid=&startDate=false&class_id=7&lat=&lon=&city=&pageSize=20&type=reitlistings&attrFieldsOr="
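response_file = URI.open(url) # Make HTTP request; open-uri buffers the body in a StringIO or Tempfile (plain open no longer accepts URLs on Ruby 3.0+)
response_json = JSON.parse(response_file.read) # Parse the JSON body into Ruby hashes and arrays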
puts JSON.pretty_generate(response_json)
Notice getJsonData.ashx in the URL assigned to url -- this URL will retrieve JSON data rather than HTML.
I found url using the Chrome inspector. In the inspector, go to the Network tab, check "Disable cache," refresh the page, and filter the results for the "XHR" request type. Here you'll see the list of XHR requests and responses that the browser has made after the initial page load (I counted 6). Now filter those results for the word "city" and you should see the one network request that you care about. Click on it, then click the "Preview" tab -- now you can explore the JSON response right there in the inspector by expanding and collapsing the objects.
If you add the following line to my example above:
puts response_json["hits"]["hits"][0]["_source"]["database_fields"]["email"]
it will print the first resort's email address.
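To collect the email of every resort on the page rather than only the first, the same path can be walked for each hit. The following is a minimal sketch, not part of the original answer: it assumes every entry under ["hits"]["hits"] has the same nesting as the first one, the helper name extract_emails is hypothetical, and the sample data is made up so the snippet runs without a network request.

```ruby
require "json"

# Made-up sample that mimics only the nesting used above
# (hits -> hits -> _source -> database_fields -> email).
# The addresses are placeholders, not real data from the site.
SAMPLE_RESPONSE = <<~JSON
  {
    "hits": {
      "hits": [
        { "_source": { "database_fields": { "email": "first@example.com" } } },
        { "_source": { "database_fields": { "email": "second@example.com" } } },
        { "_source": { "database_fields": {} } }
      ]
    }
  }
JSON

# Hypothetical helper: returns the email of every hit,
# skipping hits that have no email field.
def extract_emails(response_json)
  hits = response_json.dig("hits", "hits") || []
  hits.map { |hit| hit.dig("_source", "database_fields", "email") }.compact
end

emails = extract_emails(JSON.parse(SAMPLE_RESPONSE))
puts emails
```

With the real response, you would pass response_json from the example above to extract_emails instead of the parsed sample. Hash#dig returns nil instead of raising when a key is missing, which is why hits without an email are simply dropped.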

How to increase the Fields of input in Firebase Leaderboard

Go to http://goo.gl/I4XLKF (link to my jsfiddle attempt).
Here I have tried to add more input fields to the Firebase leaderboard example (https://www.firebase.com/tutorial/#example/leaderboard), but it is not working.
I have defined 3 inputs/Firebase elements:
name - this element/child is for the name of the website
AlternateURL - this element/child is for the URL of the website
Score - this element/child sets the priority of the URL from a numerical value.
I am using this leaderboard script to let my users submit alternate URLs for various apps.
I think in this section, you only need to use the name field to reference the row and not your alternateurl.
// Store a reference to the table row so we can get it again later.
htmlForPath[scoreSnapshot.name()] = newScoreRow;
htmlForPath[scoreSnapshot.AlternateURL()] = newScoreRow;
Get rid of the last line.
Unless there is an error somewhere else, you've addressed the areas where code needs to be changed or added (the variable names are from my version):
<input type="text" id="urlInput" placeholder="URL">
newScoreRow.append($("<td/>").text(scoreSnapshot.val().url));
var url1 = $("#urlInput").val();
userScoreRef.setWithPriority({ name:name, score:newScore, url:url1}, newScore);

Selecting list values automatically results in StaleElementReferenceException

In my search page I have a list with an input box and 'Go' button. I get the list value in my console but what I want is to iterate the list and enter the text (text is static) in the input box. Every time the list is selected it automatically searches rather than having to press the 'Go' button.
WebElement search_list = driver.findElement(By.xpath("//*[@id='tl_tPlace']"));
List <WebElement> li = search_list.findElements(By.tagName("option"));
for(int i =0; i<li.size();i++){
System.out.println(li.get(i).click());
try{
driver.findElement(By.xpath("//*[@id='cmain_Place']")).sendKeys("text");
On the first iteration the list is on "select the state" and the loop passes; on the 2nd iteration it selects the list item. Once the item is selected, the page reloads. Then there's an error message:
Exception in thread "main" org.openqa.selenium.StaleElementReferenceException: Element not found in the cache - perhaps the page has changed since it was looked up
This is entirely expected behavior: the DOM is completely recreated when the page reloads, so any references you hold to elements become invalid after the refresh. Look the list and its options up again after each reload instead of reusing the old references.
