I want to scrape the historical prices for a specific stock from this link (154.94).
I will generate the stock ID (5115167) and the custom dates for the URL from other cells in the sheet.
I tried scraping with =IMPORTXML(A1,...) and =IMPORTHTML(A1,"table",...), but neither is working.
I tried the following XPath expressions:
//*[#id="printThisElement"]/div[2]
//*[#id="wrapper"]/div[9]/div/div[2]/div/div/div/div/div[1]
//*[#id="printThisElement"]/div[2]/div[3]
None of those is working.
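For what it's worth, the URL itself can be assembled from cells before it reaches the import function. A minimal sketch, assuming the stock ID sits in B1, the dates in B2 and B3, and a purely hypothetical URL pattern:

=IMPORTHTML("https://example.com/stock/"&B1&"/history?from="&TEXT(B2,"yyyy-mm-dd")&"&to="&TEXT(B3,"yyyy-mm-dd"), "table", 1)

Note that if the page builds the price table with JavaScript, neither IMPORTXML nor IMPORTHTML will ever see it, since both only read the static HTML the server returns.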
My goal is to scrape some data showing whether the product is available or not.
At present I am using the following:
=IMPORTXML(B2,"//*[@id='product-controls']/div/div[1]/div[1]")
Unfortunately, I am receiving an error. Here is the link to the file: https://docs.google.com/spreadsheets/d/11OJvxRRIXJolpi2UttmNIOArAdwh1qeZhjqczlVI8oc/edit#gid=1531110146
As an example, I want to get the data from the url https://radiodetal.com.ua/mikroshema-5m0365r-dip8
and the XPath should be taken from here.
I got the formula:
=importxml (B2,"//div[#class='stock']")
I'm trying to scrape any Amazon search to get the products and their prices, so I'm working with the rvest library in R.
For example, for this search:
Amazon Search
I want to extract all product names and their prices. I tried the following:
library(rvest)

# Amazon search results page
link <- 'https://www.amazon.com.mx/s?k=gtx+1650+super&__mk_es_MX=%C3%85M%C3%85%C5%BD%C3%95%C3%91&ref=nb_sb_noss_2'
simple <- read_html(link)

# Every product title on the page
simple %>% html_nodes("[class='a-size-base-plus a-color-base a-text-normal']") %>% html_text()
Inspecting with Chrome, the class 'a-size-base-plus a-color-base a-text-normal' is where the product name is stored.
That code works fine and I get all the product names. So I tried to get their prices with this:
simple %>% html_nodes("[class='a-offscreen']") %>% html_text()
Inspecting with Chrome, the class 'a-offscreen' is where the price is stored.
That code returns every price in the search, but if you look at the search results, not all products have a price. So the code only returns the products that do have a price, and I can't match products with their prices.
Is there a way to make this work? Maybe it's possible to filter only those products that have the class 'a-offscreen' and get their prices?
Thanks.
You need to scrape the item nodes first, and then within each node scrape the product name and the price. Similar to this question: RVEST package seems to collect data in random order
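A minimal sketch of that per-node approach, assuming div.s-result-item is the container for a single search result (an assumption taken from inspecting Amazon's markup, which changes often):

library(rvest)

link <- 'https://www.amazon.com.mx/s?k=gtx+1650+super&__mk_es_MX=%C3%85M%C3%85%C5%BD%C3%95%C3%91&ref=nb_sb_noss_2'
page <- read_html(link)

# One node per search result (container class is an assumption)
items <- page %>% html_nodes("div.s-result-item")

# Look up name and price inside each item node only; html_node()
# (singular) returns NA where an item has no match, so both vectors
# keep one entry per item and stay aligned
names  <- items %>% html_node(".a-size-base-plus.a-color-base.a-text-normal") %>% html_text()
prices <- items %>% html_node(".a-offscreen") %>% html_text()

data.frame(name = names, price = prices)

Products without a price then show NA instead of silently disappearing, which is what makes the matching possible.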
I'm trying to scrape Google with a specific search term, from a specific site, in a specific date range using R.
Example
Search term: "Miroslava Breach Velducea"
Site: www.jornada.com.mx
Dates: 1/1/2011 - 1/1/2012
The link for that specific search is: https://www.google.com/search?q=Miroslava+Breach+Velducea+site:www.jornada.com.mx&tbas=0&tbs=cdr:1,cd_min:1/1/2011,cd_max:1/1/2012&ei=UqCzW6LZC8OK5wKg97vYDA&start=10&sa=N&biw=1137&bih=474
When I code that in R, I can scrape Google for that search term and on that site, but not for those dates.
library(rvest)

# Google search restricted to the site and to a custom date range (tbs=cdr)
web_address <- 'https://www.google.com/search?q=miroslava+breach+velducea+site%3Awww.jornada.com.mx&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2010%2Ccd_max%3A12%2F31%2F2011'
webpage_code <- read_html(web_address)

# '.r a' holds the result titles in Google's HTML
Título <- html_text(html_nodes(webpage_code, '.r a'))
Título
Does anyone know how to scrape Google general search for specific dates?
I am trying to scrape a website (please refer to the URLs in the code).
From the website, I am trying to scrape all the information and transfer the data to a JSON file.
scrapy shell http://www.narakkalkuries.com/intimation.html
To extract the information from the website:
response.xpath('//table[@class="MsoTableGrid"]/tr/td[1]//text()').re(r'[0-9,-/]+|[0-9]+')
I am able to retrieve most of the information from the website.
Concern:
I am able to scrape the data under "Intimation", except for 'Intimation For September 2017'; I am not able to scrape the information under that tab.
Finding:
For 'Intimation For September 2017', the value is stored in a span tag:
/html/body/div[4]/div[2]/div/table/tbody/tr[32]/td[1]/table/tbody/tr[1]/td[1]/p/b/span
For the remaining months, the values are stored in a font tag:
/html/body/div[4]/div[2]/div/table/tbody/tr[35]/td[1]/table/tbody/tr[2]/td[1]/p/b/span/font
How can I extract the information for "Intimation For September 2017"?
Your tables use different @class values (MsoTableGrid and MsoNormalTable), so you need some way to process all of them:
# select on width rather than class, so both table classes are matched
for table in response.xpath('//table[@width="519"]'):
    for row in table.xpath('./tr[position() > 1]'):
        for cell in row.xpath('./td'):
            # stringify the cell so text nested in <span> or <font> is included
            cell_value = cell.xpath('string(.)').extract_first()
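To get from there to the JSON file the question asks for, that loop can be wrapped in a minimal spider and exported with Scrapy's -o option. A sketch (the spider name and the "cells" field are my own placeholders):

import scrapy

class IntimationSpider(scrapy.Spider):
    name = "intimation"
    start_urls = ["http://www.narakkalkuries.com/intimation.html"]

    def parse(self, response):
        # width-based selector from above: matches both MsoTableGrid
        # and MsoNormalTable tables
        for table in response.xpath('//table[@width="519"]'):
            for row in table.xpath('./tr[position() > 1]'):
                # string(.) flattens each cell, so text inside <span>
                # (September 2017) and <font> (other months) is captured alike
                cells = [cell.xpath('string(.)').extract_first() for cell in row.xpath('./td')]
                yield {"cells": cells}

Run it with: scrapy runspider intimation_spider.py -o intimation.json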
The Google Search Appliance works out the date of each article when it crawls (the last-modified date is the default).
However, it doesn't turn up articles when you query by date code.
Is there any way to get the GSA to do this?
(We have a daily broadcast which people often search for by date code. Right now we have to manually put the 4 most common date codes into the meta keywords in order for them to be pulled up through a query.)
Have you tried using inmeta:date as described in the Search Protocol Reference documentation?
Alternatively, if the date code is in the document content or the URL you could use entity recognition to extract it.
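For the inmeta:date route, the query term would look roughly like this (an untested sketch, with "broadcast" standing in for the user's search term; the exact operator syntax is in the Search Protocol Reference):

q=broadcast+inmeta:date:daterange:2011-01-01..2011-12-31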
One way to make sure the GSA is collecting the document date is to check the search results in XML format and see whether the date tag has a value. You can see the results in XML format by removing any proxystylesheet parameter from the URL.
If the value of that tag is empty, then the GSA is not getting the document dates.
You can configure the document dates under Crawl and Index > Document Dates (at least in GSA version 7). We are using a meta tag approach: we put a date meta tag on each document/page (sketched after the list below) and tell the GSA to use this meta tag to sort the documents. The full list of options is:
URL
Meta Tag
Title
Body
Last Modified
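For the Meta Tag option, each document/page would carry a tag along these lines (the name "date" is only an assumption; it has to match whatever you configure in the admin console):

<meta name="date" content="2011-01-01">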
Here are some links that helped me to find answers when dealing with a similar problem:
https://support.google.com/gsa/answer/2675414?hl=en
https://developers.google.com/search-appliance/documentation/64/xml_reference#request_sort_by_date
https://groups.google.com/forum/#!searchin/google-search-appliance-help/sort$20by$20date$20not$20working