I am trying to scrape referee game data using rvest. See the code below:
library(rvest)
page_ref <- read_html("https://www.pro-football-reference.com/officials/HittMa0r.htm")
ref_tab <- page_ref %>%
  html_node("#games") %>%
  html_text()
  # html_table()
But rvest does not recognize any of the nodes for the "Games" table in the link. It can pull the data from the first table "Season Totals" just fine. So, am I missing something? In general, what does it mean if rvest doesn't recognize a node identified with SelectorGadget and is clearly identified in the developer tools?
This happens because the first table is part of the HTML you get from the server, while the other tables are filled in by JavaScript. rvest can only give you what is in the server's HTML response. If you want the data filled in by JS, you need a tool that drives a real browser, such as Selenium or Puppeteer.
Selenium
Puppeteer / gepetto
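One way to check whether a node is really in the server response is to list the ids found in the raw HTML. The following is a minimal Python sketch using only the standard library. The sample markup and the ids `season_totals` and `all_games` are made up for illustration and are not taken from the actual page.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collects every id attribute that appears on a real start tag."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def ids_in_static_html(html: str) -> set:
    """Return the ids present in the HTML text as delivered, before any script runs."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return collector.ids


# Hypothetical stand-in for a server response: one real table and one
# empty placeholder that a script would fill in later.
sample_html = """
<div id="content">
  <table id="season_totals"><tr><td>2021</td></tr></table>
  <div id="all_games"><!-- filled in later by a script --></div>
</div>
"""

found = ids_in_static_html(sample_html)
print("games" in found)          # the id the scraper is looking for
print("season_totals" in found)  # an id that is in the static markup
```

If the id you selected in the developer tools is missing from this set, the node is created after page load and a static HTML parser will not see it.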
I am very new to the rvest library and this is the first time I am trying to scrape something. I am trying to scrape the very first table on this web page https://mup.gov.hr/promet-na-granicnim-prijelazima-282198/282198, titled PUTNICI (translated: PASSENGERS), which sits within an iframe, but I am struggling to do that.
In the top left corner there is also a date option that lets you select a specific day, month and year.
Is there any chance that I can scrape that very first table for a specific time period, let's say the whole of January 2022, or if not, at least scrape the very first table?
This is my code at the moment:
"https://mup.gov.hr/promet-na-granicnim-prijelazima-282198/282198" %>%
read_html() %>%
html_nodes("iframe") %>%
extract(1) %>%
html_attr("src") %>%
read_html() %>%
html_node("#prometGranicniPrijelaz") %>%
html_text()
I would be really thankful if someone helped me on this subject!
If you open your browser's Developer Tools, go to the Network tab, filter by Fetch/XHR and then change the date on the website, you will see the request that happens in the backend and loads the data you are looking for. You can make your queries to that backend URL and loop through the dates:
https://granica.mup.hr/default.inc.aspx?ajaxq=PrometPoDatumu&odDat=06.02.2022
https://granica.mup.hr/default.inc.aspx?ajaxq=PrometPoDatumu&odDat=07.02.2022
etc
I don't believe it's in an iframe but rather an HTML table with a class = "desktop promet" and you can parse the data you are looking for from there.
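Looping through the dates can be sketched as follows. This is a minimal Python sketch that only builds the request URLs for January 2022 in the `dd.mm.yyyy` format shown above. The helper name `daily_urls` is hypothetical, and fetching and parsing each page is left out.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Backend endpoint taken from the URLs above.
BASE_URL = "https://granica.mup.hr/default.inc.aspx"


def daily_urls(start: date, end: date) -> list:
    """Build one backend URL per day from start to end, inclusive."""
    urls = []
    day = start
    while day <= end:
        query = urlencode({
            "ajaxq": "PrometPoDatumu",
            "odDat": day.strftime("%d.%m.%Y"),  # e.g. 06.02.2022
        })
        urls.append(f"{BASE_URL}?{query}")
        day += timedelta(days=1)
    return urls


january_urls = daily_urls(date(2022, 1, 1), date(2022, 1, 31))
for url in january_urls[:3]:
    print(url)
```

Each URL can then be requested in turn and the table with class "desktop promet" parsed from the response.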
My goal is to scrape whether a product is available or not.
At present I am using the following:
=importxml (B2,//*[@id="product-controls"]/div/div[1]/div[1])
Unfortunately, I am receiving an error. Here is the link to the file: https://docs.google.com/spreadsheets/d/11OJvxRRIXJolpi2UttmNIOArAdwh1qeZhjqczlVI8oc/edit#gid=1531110146
As an example, I want to get the data from the url https://radiodetal.com.ua/mikroshema-5m0365r-dip8
and the XPath should come from that page.
I got it working with this formula:
=importxml (B2,"//div[@class='stock']")
I want to import data from https://www.investing.com/equities/boc-hong-kong-historical-data with the IMPORTXML formula in Google Sheets. It can be done with IMPORTHTML, but I would like to import it by XPath because that would not have issues when the scraped page updates.
I used IMPORTXML("https://www.investing.com/equities/boc-hong-kong-historical-data","//*[@id='curr_table']") and it scraped the data, but in bad shape: the result is not split into rows and columns and is not comma-delimited.
How can I extract data by xPath in Google Sheets?
I believe your goal is as follows.
You want to retrieve the table in the URL of =IMPORTHTML("https://www.investing.com/equities/boc-hong-kong-historical-data","table",2) using the xpath on Google Spreadsheet.
Modified formula:
In order to retrieve the values using the xpath, please use the following xpath.
=IMPORTXML("https://www.investing.com/equities/boc-hong-kong-historical-data","//table[@id='curr_table']//tr")
In this case, the xpath is //table[@id='curr_table']//tr.
Also, you can use the xpath of //*[@id='curr_table']//tr.
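To illustrate why selecting the `tr` elements gives one row per table row instead of a single blob of text, here is a small Python sketch using the standard library's `xml.etree.ElementTree`, which supports only a limited subset of XPath. The sample markup and its cell values are made up for illustration and are not the real page content.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page; the values are placeholders.
sample = """
<html><body>
  <table id="other"><tr><td>ignored</td></tr></table>
  <table id="curr_table">
    <tr><th>Date</th><th>Price</th></tr>
    <tr><td>Feb 07, 2022</td><td>1.00</td></tr>
  </table>
</body></html>
""".strip()

root = ET.fromstring(sample)

# Same idea as //table[@id='curr_table']//tr: every row of that one table.
rows = root.findall(".//table[@id='curr_table']//tr")

# Each matched row becomes one list of cell texts, i.e. one spreadsheet row.
table = [[cell.text for cell in row] for row in rows]
for line in table:
    print(line)
```

Selecting the table element itself returns one node, so all of its text ends up together. Selecting its rows returns one node per row, which is what lets the result be laid out in rows and columns.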
Note:
As another method, I think that IMPORTHTML can also be used as shown below. It gives the same result as the formula above.
=IMPORTHTML("https://www.investing.com/equities/boc-hong-kong-historical-data","table",2)
References:
IMPORTXML
IMPORTHTML
I am trying to scrape a web forum using Scrapy for the href link info and when I do so, I get the href link with many letters and numbers where the question mark should be.
This is a sample of the html document that I am scraping:
I am scraping the html data for the href link using the following code:
response.xpath('.//*[contains(@id, "thread_title")]/@href').extract()
When I run this, I get the following results:
[u'showthread.php?s=f969fe6ed424b22d8fddf605a9effe90&t=2676278']
What should be returned is:
[u'showthread.php?t=2676278']
I have run other tests scraping href data with question marks elsewhere in the document, and I also get "s=f969fe6ed424b22d8fddf605a9effe90&" returned.
Why am I getting this data returned with the "s=f969fe6ed424b22d8fddf605a9effe90&" instead of just the question mark?
Thanks!
It seems that the site I am scraping uses a unique identifier in order to update the number of views per thread more accurately. I was not able to get the scraped data without the unique ID, and the ID changes over time. So I scraped a different HTML tag for the thread ID and then joined it to the web address (showthread.php?t=) to create the link I was looking for.
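The joining step can be sketched in Python. Note that this sketch takes a slightly different route from the one described above: it assumes the thread ID can be read from the `t` query parameter of the scraped href rather than from a separate HTML tag. The helper name `thread_link` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def thread_link(href: str) -> str:
    """Rebuild the link from the thread ID only, dropping the changing s= value."""
    query = parse_qs(urlsplit(href).query)
    thread_id = query["t"][0]  # raises KeyError if the href has no t parameter
    return f"showthread.php?t={thread_id}"


scraped = ["showthread.php?s=f969fe6ed424b22d8fddf605a9effe90&t=2676278"]
links = [thread_link(href) for href in scraped]
print(links)
```

Because the link is rebuilt from the thread ID alone, it stays the same even when the `s` value changes between requests.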
I have inserted a dataset to my asp.net webpage, using Add New Item >> Dataset. I have bound it with my table in DB and also previewed data by right clicking Preview Data BUT the problem is previewing data through code.
I am quite familiar with using the datasets in vb.net but I wonder how to use them in asp.net.
I am simply trying to use it by filling the TableAdapter, as in
Me.TblSQsTableAdapter.Fill(Me.DsSQs.tblSQs, vrExamIDSetPaper)
but dsSQs (my dataset name) does not show the table name when I press . as it does in vb.net WinForms.
I want to get the number of rows that match a given parameter, e.g. vrExamIDSetPaper.
Please help.
Thanks
You need to inspect the Tables collection and then rows.
dsSQs.Tables(0)
dsSQs.Tables(0).Rows.Count
You can also do it with the code below, using the name vrExamIDSetPaper:
dsSQs.Tables(vrExamIDSetPaper).Rows.Count ' number of rows