How to scrape Google with specific search criteria using R? - r

I'm trying to scrape Google with a specific search term, from a specific site, in a specific date range using R.
Example
Search term: "Miroslava Breach Velducea"
Site: www.jornada.com.mx
Dates: 1/1/2011 - 1/1/2012
The link for that specific search is: https://www.google.com/search?q=Miroslava+Breach+Velducea+site:www.jornada.com.mx&tbas=0&tbs=cdr:1,cd_min:1/1/2011,cd_max:1/1/2012&ei=UqCzW6LZC8OK5wKg97vYDA&start=10&sa=N&biw=1137&bih=474
When I code that in R, the results are restricted to that search term and that site, but not to those dates.
library(rvest)
web_address = 'https://www.google.com/search?q=miroslava+breach+velducea+site%3Awww.jornada.com.mx&tbs=cdr%3A1%2Ccd_min%3A1%2F1%2F2010%2Ccd_max%3A12%2F31%2F2011'
webpage_code = read_html(web_address)
Título = html_text(html_nodes(webpage_code,'.r a'))
Título
Does anyone know how to scrape Google general search for specific dates?
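The date restriction in the link above is carried by the tbs parameter (cdr:1 together with cd_min and cd_max). As a minimal sketch, the URL can be assembled from its parts in R, so the dates in the request are the ones intended; the posted code asks for 1/1/2010 to 12/31/2011, not 1/1/2011 to 1/1/2012. The helper name build_search_url is hypothetical, and whether Google honours the date filter for a scripted request is an assumption this sketch does not verify.

```r
# Hypothetical helper: assemble a Google search URL with a custom date range.
# Dates are month/day/year strings, as in the link from the question.
build_search_url <- function(term, site, date_min, date_max) {
  query <- paste0(term, " site:", site)
  tbs   <- paste0("cdr:1,cd_min:", date_min, ",cd_max:", date_max)
  paste0(
    "https://www.google.com/search",
    "?q=",   URLencode(query, reserved = TRUE),  # percent-encode spaces and ':'
    "&tbs=", URLencode(tbs, reserved = TRUE)     # percent-encode ':', ',' and '/'
  )
}

web_address <- build_search_url(
  term     = "Miroslava Breach Velducea",
  site     = "www.jornada.com.mx",
  date_min = "1/1/2011",
  date_max = "1/1/2012"
)
print(web_address)
```

The resulting web_address can then be passed to read_html() in the same way as the hand-written URL.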

Related

Scraping availability of the product

My goal is to scrape whether a product is available or not.
At present I am using the following:
=importxml (B2,//*[@id="product-controls"]/div/div1/div1)
Unfortunately, I am receiving an error. Here is the link to the file: https://docs.google.com/spreadsheets/d/11OJvxRRIXJolpi2UttmNIOArAdwh1qeZhjqczlVI8oc/edit#gid=1531110146
As an example, I want to get the data from the url https://radiodetal.com.ua/mikroshema-5m0365r-dip8
and the XPath should be taken from there.
I got the formula:
=importxml (B2,"//div[@class='stock']")
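In XPath, an attribute test is written with @, as in //div[@class='stock']. The expression can also be checked outside the spreadsheet. The sketch below runs it in R with the xml2 package against a made-up HTML fragment; the markup is an assumption for illustration, not the actual source of the radiodetal.com.ua page.

```r
library(xml2)

# Hypothetical fragment standing in for the product page.
page <- read_html('<div id="product-controls"><div class="stock">In stock</div></div>')

# Select every <div> whose class attribute is exactly "stock".
stock_nodes <- xml_find_all(page, "//div[@class='stock']")
stock_text  <- xml_text(stock_nodes)
print(stock_text)
```

If the expression returns an empty node set here, it will not match in IMPORTXML either, which makes this a quick way to test candidate paths.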

scraping historical stock prices in google sheets

I want to scrape the historical prices for a specific stock from this link (154.94).
The stock ID (5115167) and the custom dates I will generate from other cells in the sheet for the URL.
I tried scraping with =IMPORTXML(A1,...) and =IMPORTHTML(A1,"table",...) but it's not working.
I tried the following XPath expressions:
//*[@id="printThisElement"]/div[2]
//*[@id="wrapper"]/div[9]/div/div[2]/div/div/div/div/div[1]
//*[@id="printThisElement"]/div[2]/div[3]
None of these work.

What's the meaning of "ceid" in Google News url

I have recently been working on scraping Google News and dealing with its URL format. I have the same question as this post about the meaning of ceid.
For example the link from the post: https://news.google.com/rss/search?q=studie&hl=de&gl=DE&ceid=DE:de
I know that hl is the host language and that gl restricts results to those whose country of origin matches the parameter value, but I can't find what ceid is for. If I cut that part from the search link, no results are shown, so it is obviously a required part.
So what's the meaning and the purpose of the "ceid"?
Many thanks.
It might specify the country, with the results shown in a common language of that country.
For example, 'ceid=IN:en' sets the country to "India", and hence the results should show up in English.
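Following that reading, hl, gl and ceid can all be derived from one country code and one language code. A minimal sketch in R, assuming ceid is simply COUNTRY:language as in the two examples (this is the answer's guess, not documented behaviour); build_news_rss_url is a hypothetical helper name.

```r
# Hypothetical helper: build a Google News RSS search URL.
# Assumes ceid has the form "<COUNTRY>:<language>", e.g. "DE:de" or "IN:en".
build_news_rss_url <- function(query, language, country) {
  paste0(
    "https://news.google.com/rss/search",
    "?q=",    URLencode(query, reserved = TRUE),
    "&hl=",   language,
    "&gl=",   country,
    "&ceid=", country, ":", language
  )
}

rss_url <- build_news_rss_url("studie", language = "de", country = "DE")
print(rss_url)
```

With these arguments the helper reproduces the link from the question, so the three parameters stay consistent when the country or language is changed.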

Google Analytics Query for R, content drilldown

I am trying to export Google Analytics data into R in order to build a report and do some other data mining related tasks with the data. I am using the RGoogleAnalytics package to do so. I have the connection working, but am having trouble specifying the correct query in order to obtain the right information.
I am trying to obtain information for a specific page, one that I would normally reach by going to the content drilldown section in Google Analytics and searching for it. I would also like to use a filtered view that filters out ISPs from my workplace. There are several websites under a particular view. I am trying to build a query that pulls this information automatically, and have tried the following.
ValidateToken(token1)
query.list1 <- Init(start.date = "2016-10-28", end.date = "2016-12-05",
dimensions = "ga:date", metrics = "ga:uniquepageviews",
filters = "ga:pagePathLevel1
==/ed######.edu/;ga:pagePathLevel2==/content/",
table.id = "ga:##### ")
sort = "ga:date"
ga.query <- QueryBuilder(query.list1)
ga.data <- GetReportData(ga.query, token1)
This does not throw an error in R, but it does not seem to return any metrics: it returns all zeros for unique pageviews, even though there are results, as shown below.
      date uniquepageviews
1 20161028               0
2 20161029               0
3 20161030               0
4 20161031               0
In the above, I tried to use the filter to get the correct page. Is this correct? If so, what should I put into the filter so that it only returns metrics for a specific page in a given view? Also, is there a way to select a given prebuilt view? Any help is appreciated, thanks.
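In the Core Reporting API that RGoogleAnalytics queries, filter expressions are joined with ; for AND and , for OR, and == is an exact match. As posted, the filter string contains a line break, and sort = "ga:date" sits outside the Init() call. A minimal sketch of assembling the filter as a single string in R follows; ga_filter_and is a hypothetical helper and the paths are placeholders, not values known to match this view.

```r
# Hypothetical helper: combine Core Reporting filter expressions with AND (";").
ga_filter_and <- function(...) {
  paste(c(...), collapse = ";")
}

# Placeholder paths; substitute the values shown in the content drilldown report.
page_filter <- ga_filter_and(
  "ga:pagePathLevel1==/example-site/",
  "ga:pagePathLevel2==/content/"
)
print(page_filter)
```

The resulting string can be passed as filters = page_filter inside Init(), next to sort = "ga:date". Whether the path values match what Google Analytics records for the page still has to be checked against the report itself.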

Can Bing Search API V5 search for non english news articles

I attempted the following search on Bing Search API, limiting to News articles only:
$ms_api_url = "https://api.cognitive.microsoft.com/bing/v5.0/news/search?q=حج";
and
$ms_api_url = "https://api.cognitive.microsoft.com/bing/v5.0/news/search?q=%27%D8%AD%D8%AC%27";
and the results were very limited (55 articles) and nearly all English.
Can this API return non-English results?
mkt is an optional parameter to specify where the results are coming from. From the documentation:
Typically, this is the country where the user is making the request
from; however, it could be a different country if the user is not
located in a country where Bing delivers results. The market must be
in the form {language code}-{country code}. For example, en-US.
Full list of supported markets:
es-AR,en-AU,de-AT,nl-BE,fr-BE,pt-BR,en-CA,fr-CA,es-CL,da-DK,fi-FI,fr-FR,de-DE,zh-HK,en-IN,en-ID,en-IE,it-IT,ja-JP,ko-KR,en-MY,es-MX,nl-NL,en-NZ,no-NO,zh-CN,pl-PL,pt-PT,en-PH,ru-RU,ar-SA,en-ZA,es-ES,sv-SE,fr-CH,de-CH,zh-TW,tr-TR,en-GB,en-US,es-US
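As a sketch of this suggestion, the request URL can carry mkt next to the encoded query. The example is written in R rather than the PHP of the question; build_bing_news_url is a hypothetical helper, and ar-SA is picked from the market list above as a plausible choice for an Arabic query, not a value confirmed to change the results.

```r
# Hypothetical helper: build a Bing News Search v5 URL with an explicit market.
build_bing_news_url <- function(query, mkt) {
  # The market must look like {language code}-{country code}, e.g. "en-US".
  if (!grepl("^[a-z]{2}-[A-Z]{2}$", mkt)) {
    stop("mkt must have the form {language code}-{country code}, e.g. en-US")
  }
  paste0(
    "https://api.cognitive.microsoft.com/bing/v5.0/news/search",
    "?q=",   URLencode(enc2utf8(query), reserved = TRUE),
    "&mkt=", mkt
  )
}

# "\u062d\u062c" is the Arabic search term from the question, written with escapes.
ms_api_url <- build_bing_news_url("\u062d\u062c", mkt = "ar-SA")
print(ms_api_url)
```

The check on mkt only validates the shape of the code; it does not confirm that the market is in the supported list.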
