RSelenium nested file link download - web-scraping

I am trying to download the files on this page.
I am able to find some of the media using the following code. However, some of the links are nested, meaning some files only show up after you click another link:
all_media <- remDr$findElements(using = "class",
                                value = "nofocus")
I am able to click the first link and get to the next view, but I am not able to click the nested file link (screenshots 1-3 omitted). Here is the inspector view of that link (screenshot 4 omitted):
How can I locate the element with data-bind="html: doc.Name" shown in the inspector?
I tried the following, but it didn't work:
remDr$findElements(using = "css selector",
value = "input[data-bind ='html:doc.Name']")
Thanks
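
One thing worth checking (a sketch, not a verified fix): CSS attribute selectors must match the attribute value exactly, and the inspector shows html: doc.Name with a space, while the attempt above uses html:doc.Name and a stray space before the =. Assuming the binding sits on a link element (the tag name is a guess from the screenshots), something like this may work:

# Sketch under assumptions: the tag name ("a") and the exact attribute
# value are taken from the inspector screenshot; adjust to the real markup.
# Note the space after "html:" and no space around "=".
nested_link <- remDr$findElement(
  using = "css selector",
  value = "a[data-bind='html: doc.Name']"
)
nested_link$clickElement()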

Related

Fill input tag of an HTML page in Python

I have been trying to scrape a website that has a search field (a navigation bar of sorts).
Whenever I type something there, it creates a dropdown of suggestions related to what I wrote (things that contain what I wrote).
What I'm trying to do is essentially use requests.post from the requests library in Python to fill in a value, and then grab whatever the dropdown shows.
I've had a few problems while doing it:
The dropdown disappears whenever you click somewhere else on the website, so the HTML tags for the list only exist temporarily.
I couldn't find a way to actually post something into the navigation bar.
A great example I've found on the web is FUTWIZ, which does exactly what I described above. Whenever I inspect with F12 I see it creates some HTML; is there a way to grab the HTML after the value is put inside the actual navigation bar?
EDIT
This is the code I've tried:
import requests
from bs4 import BeautifulSoup
urls = "https://www.futwiz.com/en/"
requst = requests.get(urls)
bs4Out = BeautifulSoup(requst.text, "html.parser")
poster = requests.post(urls, data={"form-control": "Messi"})
print(poster.text)
Now, I know the data argument in requests.post only sends it as a form/query payload, but I can't really figure out how to target the search field itself.
This is the link to FUTWIZ; it has the navigation bar I'm trying to work with:
https://www.futwiz.com/en/
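
One common approach (a sketch, not FUTWIZ's documented API): dropdowns like this are usually filled by an XHR request fired as you type, so posting to the homepage won't return the suggestions. Open the browser's developer tools, watch the Network tab while typing in the box, and replicate that request with requests. The endpoint and parameter name below are placeholders you would read from the Network tab:

import requests

# Hypothetical sketch: "SEARCH_ENDPOINT" and "term" are placeholders --
# copy the real URL and parameters from the request the page fires in
# the developer tools' Network tab while you type.
resp = requests.get(
    "https://www.futwiz.com/en/SEARCH_ENDPOINT",
    params={"term": "Messi"},
)
print(resp.text)  # the raw payload the dropdown is built from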

How to do internal links in Google Colab

I would like to make a reference list with the sections of my Colab notebook. The notebook is saved in my Google Drive.
I am trying HTML and Markdown hyperlinks, which work fine if the link points to an HTTP URL, but not for internal sections of the notebook.
For example, in a text cell I set the outline:
1. [Section 1](#s1)
2. [Section 2](#s2)
and in the destination section:
<a id='s1'></a>
#Section 1
.....
<a id='s2'></a>
#Section 2
Each item in the outline is displayed as a hyperlink, but when I click on it, either nothing happens or a new tab opens in the browser with an error message.
Colab creates its own contents list from the markdown sections and subsections, but that does not make internal links from one section to another work.
With Colab you have to insert an <a name="id"></a> tag in the cell you want to link to.
The link works like normal:
[Link Text](#cell-id)
And the destination cell with the tag would be:
<a name="cell-id"></a>
# Heading
This is the cell I'm linking to
If you look closely at your current Colab URL:
https://colab.research.google.com/drive/BLAHBLAH?authuser=SOMENUMBER#scrollTo=BLOCKADDRESS
Every block has its own #scrollTo address.
Try clicking other blocks.
So I use it like:
1. [LINK TEXT](#scrollTo=BLOCKADDRESS)
This way you don't have to add id tags.
The following lines worked for me.
Origination: [text](#Link)
Destination: <a name="Link"></a>
A typical mistake is the use of quotes. If you look at the lines above carefully, the destination wraps Link in quotes, whereas the origination does not. Do not add quotes like (#"Link") in the origination, otherwise it will direct you to the "Sorry, the file you have requested does not exist." page.
I create the internal links in Colab as follows.
In the source text cell:
[text of the link](#id-link)
In the target cell:
<div class="markdown-google-sans"><a name="id-link"></a></div>
Note: I used 'name' instead of 'id' because 'id' doesn't work.

Using street map and street view side by side in a xaringan slide

I came across this example from Google that shows how to place a street map and a street view side by side. But I don't know how to create such a view in a xaringan slide.
I tried copying the code under the "All" tab, saving the contents as an HTML file, and importing that file into the xaringan slide, but this approach doesn't seem to work. Can somebody point me in the right direction?
(Cross posted from RStudio Community)
I finally figured it out! Here's what we need to do:
1. Go to this link to open the Street View Side-By-Side example.
2. Go to the All tab and click on the Copy code sample to clipboard menu.
3. Open any text editor (e.g., Notepad++), paste the code sample, and save the file with an .html extension.
4. Go to this link and follow the instructions given under the Get the API key section to get an API key.
5. Replace the text YOUR_API_KEY on line 7 of the code sample with the API key obtained from Google in Step 4:
src="https://maps.googleapis.com/maps/api/js?key=YOUR_API_KEY&callback=initialize&libraries=&v=weekly"
6. In the definition of the initialize() function, replace fenway with the desired location name and change the associated lat & lng.
7. Add the following code to xaringan's source Rmd file to create a slide showing the satellite view and street view side by side:
# Adjust width & height to suit your needs
htmltools::tags$iframe(
src = "path/to/street-view-side-by-side.html",
width = 800,
height = 450
)
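
Presumably the snippet has to live inside an R chunk on its own slide; a minimal sketch of the Rmd source, assuming the HTML file sits next to the Rmd (the chunk options and file name are assumptions):

---

```{r street-view, echo=FALSE}
# Adjust width & height to suit your needs; the path is relative to the Rmd
htmltools::tags$iframe(
  src = "street-view-side-by-side.html",
  width = 800,
  height = 450
)
```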

Web scraping in Java by hovering the mouse: dynamic data not displayed after scraping

I want to scrape data from a graph of a particular website.
The information in the graph is available only if you hover the mouse over the graph, but after I scrape, I am unable to see the data in the output even though it is visible under 'Inspect Element'.
I have tried to scrape using JSoup, but the data that changes on mouse hover is not in the result.
How can I do this?
Below is the information I have to scrape; I need the dynamically changing value '184'.
The value 184 changes dynamically when you hover the mouse over the graph, with RGB values displayed in the line above it. These RGB values also change as you hover over the graph.
After scraping, the document output from Jsoup does not include the number 184 or the RGB values. Why are these fields missing from the output? Is it because the data is generated dynamically on mouse hover?
I actually have to scrape the 'Carbon Intensity' value from the graph "Carbon Intensity in the last 24 hours", which is only displayed by hovering the mouse over it.
I have been stuck on this problem for two days and have not found any helpful solution. I am using Jsoup on Linux. Could someone suggest how I can do this?
Thanks in advance!
To do that you should use Selenium; add it to Maven if you are using it, or to whatever dependency manager you use. You also need to put the geckodriver executable (https://github.com/mozilla/geckodriver/releases) in your project folder to get Firefox support for Selenium; you can also use Google Chrome by following this tutorial (https://github.com/SeleniumHQ/selenium/wiki/ChromeDriver).
There are many tutorials on how to force the JS of a web page to run and get its content, but it could be something like this, to move the mouse over an item in the HTML:
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.firefox.FirefoxDriver;
import org.openqa.selenium.interactions.Actions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

WebDriver webDriver = new FirefoxDriver();
webDriver.get(URL); // place the URL you are crawling here
Actions action = new Actions(webDriver);
// with By you have many more options to select HTML content; here we pick
// the element the mouse should hover over, but you can change it
WebElement webElement = webDriver.findElement(By.id("country-emission-rect"));
action.moveToElement(webElement).perform();
WebDriverWait webDriverWait = new WebDriverWait(webDriver, 15); // wait at most 15 seconds
// wait until the element with class name "country-emission-intensity" is loaded
webDriverWait.until(ExpectedConditions.visibilityOfElementLocated(By.className("country-emission-intensity")));
// get the HTML generated after the mouse-over, which now contains the text you want
String fullHtml = webDriver.getPageSource();
webDriver.quit();
If you want to keep using Jsoup instead of Selenium for the scraping, you can now do:
Document document = Jsoup.parse(fullHtml);
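
From there, a small follow-up sketch to pull the hovered value out of the parsed HTML; the class name is the one waited on above, and the exact markup is an assumption:

// Sketch: select the tooltip element by the class name used in the wait
// above and read its text; adjust the selector if the real markup differs.
String intensity = document.select(".country-emission-intensity").text();
System.out.println(intensity); // e.g. the hovered value such as 184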
Remember to place the driver executable in your project folder and to install all the Selenium dependencies correctly (enabling auto-import if you are using Maven).
Hope this helps! If you need anything else, feel free to ask!

R - using RSelenium, how to navigate through drop-down directory

I am trying to use RSelenium to upload a CSV file to a website. My last two lines of code, shown below, click an 'upload' button on the website, which prompts the website to drop down a window with my directories, similar to Finder, that I can navigate to select which file I'd like to upload. The code that prompts the drop-down:
upload_CSV_link = remDr$findElement(using = "css selector", "button.dk-btn.dk-btn-success.dk-btn-icon.pull-right")
upload_CSV_link$clickElement()
For clarity, this is essentially the same dialog you get when clicking Save As in a Microsoft Word / Excel document. The difficult aspect of this dialog is that I cannot inspect its elements, making it harder to navigate to the correct directory and select the correct file. Has anybody experienced this issue before?
Thanks,
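
Not a confirmed answer, but the standard workaround: the file-picker is a native OS dialog, so Selenium cannot see or drive its contents. Instead of clicking the upload button, locate the underlying <input type="file"> element and send it the file's absolute path directly. A minimal sketch, assuming the page exposes such an input (the selector is a guess and should be confirmed in the inspector):

# Sketch under assumptions: the page is assumed to have an
# <input type="file"> behind the upload button; confirm the selector in
# the inspector. Sending the path skips the native dialog entirely.
file_input <- remDr$findElement(using = "css selector",
                                value = "input[type='file']")
file_input$sendKeysToElement(list("/absolute/path/to/file.csv"))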
