Google Plus public search activities with keyword using R

I'm trying to fetch data from the Google Plus API, but I only know how to search when I know the user_id.
Here's how I get the JSON using the RCurl library:
data <- getURL(paste0("https://www.googleapis.com/plus/v1/people/", user_id,
                      "/activities/public?maxResults=100&key=", api_key),
               ssl.verifypeer = FALSE)
I have tried formatting the URL the way the Google documentation shows, like so:
data <- getURL(paste0("https://www.googleapis.com/plus/v1/activities/", keyword,
                      "?key=", api_key),
               ssl.verifypeer = FALSE)
but it doesn't work.
Is it even possible to search using a keyword from R, given that R isn't among the supported programming languages for the API according to this link?

I figured out how to make it work.
The GET request should be formatted as:
data <- getURL(paste0("https://www.googleapis.com/plus/v1/activities?key=", api_key,
                      "&query=", search_string),
               ssl.verifypeer = FALSE)
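If it helps, the returned JSON string can then be parsed into an R list. A minimal sketch, assuming the usual Google API list layout with an items array (field names not verified here):

library(RJSONIO)
parsed <- fromJSON(data)
str(parsed, max.level = 1)   # inspect the top-level structure first
length(parsed$items)         # assuming each element of items is one matching public activity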

Related

How can I extract various tables from Wikipedia using R?

I have the following link:
https://en.wikipedia.org/wiki/List_of_prime_ministers_of_Spain
I'm trying to extract the information about the Prime Ministers, but it gives me a table of data without any apparent order.
This is the code I am currently using:
library(XML)
library(httr)
url <- "https://en.wikipedia.org/wiki/List_of_prime_ministers_of_Spain"
url <- GET(url)
datos <- readHTMLTable(rawToChar(url$content), header = TRUE, stringsAsFactors = FALSE)
tabla2 <- datos[[2]]
I would suggest using Selenium. Through the Selenium API you can access all of the functionality of the DOM. I had previously used the urllib library in Python, but it didn't help when a page relies on a lot of dynamic functionality, because the DOM keeps changing.
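As a sketch of that approach in R, assuming the RSelenium package and a locally available browser driver (argument values are illustrative):

library(RSelenium)
library(XML)

# Start a browser session (assumes a compatible driver is installed locally)
rD <- rsDriver(browser = "firefox", verbose = FALSE)
remDr <- rD$client

remDr$navigate("https://en.wikipedia.org/wiki/List_of_prime_ministers_of_Spain")
page_source <- remDr$getPageSource()[[1]]   # fully rendered HTML

# Parse the rendered HTML with the same readHTMLTable() call as above
datos <- readHTMLTable(page_source, header = TRUE, stringsAsFactors = FALSE)
tabla2 <- datos[[2]]

remDr$close()
rD$server$stop()

For a static page like this one, parsing the HTML directly with readHTMLTable() or rvest::html_table() is usually enough; Selenium mainly pays off when the page is built dynamically.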

How to extract analytics from Adobe at URL filtered by channel using R

I'm trying to extract URL-level data for a filtered channel (organic search) using the Adobe Analytics API, and it's failing. How do I make it work?
I've tried using the RSiteCatalyst package in R with the following code:
QueueDataWarehouse(my_rsid,
                   date.from = "2019-06-01",
                   date.to = "2019-06-30",
                   segment.id = "s300007365_5b366bb3f30aae698e8b0488",
                   elements = c("entrypage"),
                   metrics = c("uniquevisitors", "averagetimespentonpage", "event123", "event38"),
                   enqueueOnly = FALSE)
I expected a data frame as defined in the code above, but instead I get NULL.
Please help.
PS: While I'm here, is there another R package that works with the newer version of the API (2.0)?
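One hedged way to narrow this down is to submit the same request asynchronously and inspect whatever comes back. This is a diagnostic sketch only; it assumes enqueueOnly = TRUE submits the Data Warehouse request and returns its report ID rather than the data (check the RSiteCatalyst documentation for your version):

library(RSiteCatalyst)

# Same request, but only enqueue it and inspect the result
report_id <- QueueDataWarehouse(my_rsid,
                                date.from = "2019-06-01",
                                date.to = "2019-06-30",
                                segment.id = "s300007365_5b366bb3f30aae698e8b0488",
                                elements = c("entrypage"),
                                metrics = c("uniquevisitors", "averagetimespentonpage",
                                            "event123", "event38"),
                                enqueueOnly = TRUE)
str(report_id)  # a non-empty ID suggests the request itself is being accepted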

Cannot access EIA API in R

I'm having trouble accessing the Energy Information Administration's API through R (https://www.eia.gov/opendata/).
On my office computer, if I try the link in a browser it works, and the data shows up (the full url: https://api.eia.gov/series/?series_id=PET.MCREXUS1.M&api_key=e122a1411ca0ac941eb192ede51feebe&out=json).
I am also successfully connected to Bloomberg's API through R, so R is able to access the network.
Since the API is working and not blocked by my company's firewall, and R is in fact able to connect to the Internet, I have no clue what's going wrong.
The script works fine on my home computer, but on my office computer it fails. So I gather it is a network issue; if somebody could point me in any direction as to what the problem might be, I would be grateful (my IT department couldn't help).
library(XML)
api.key <- "e122a1411ca0ac941eb192ede51feebe"
series.id <- "PET.MCREXUS1.M"
my.url <- paste("http://api.eia.gov/series?series_id=", series.id,
                "&api_key=", api.key, "&out=xml", sep = "")
doc <- xmlParse(file = my.url, isURL = TRUE)  # yields error
Error msg:
Error: 1: No such file or directory
2: failed to load external entity "http://api.eia.gov/series?series_id=PET.MCREXUS1.M&api_key=e122a1411ca0ac941eb192ede51feebe&out=json"
I tried some other methods like read_xml() from the xml2 package, but this gives a "could not resolve host" error.
To get XML, you need to request XML output in the URL (&out=xml):
my.url <- paste("http://api.eia.gov/series?series_id=", series.id,
                "&api_key=", api.key, "&out=xml", sep = "")
res <- httr::GET(my.url)
xml2::read_xml(res)
Or:
res <- httr::GET(my.url)
XML::xmlParse(res)
Otherwise, with the URL as posted (i.e. &out=json):
res <- httr::GET(my.url)
jsonlite::fromJSON(httr::content(res,"text"))
or this:
xml2::read_xml(httr::content(res,"text"))
Please note that this answer simply provides a way to get the data; whether it is in the desired form is up to whoever is processing it.
If it does not have to be XML output, you can also use the new eia package. (Disclaimer: I'm the author.)
Using your example:
remotes::install_github("leonawicz/eia")
library(eia)
x <- eia_series("PET.MCREXUS1.M")
This assumes your key is set globally (e.g., in .Renviron or earlier in your R session with eia_set_key), but you can also pass it directly to the function call above by adding key = "yourkeyhere".
The result is a tidyverse-style data frame with one row per series ID and a data list column containing the data frame for each time series (which can be unnested with tidyr::unnest if desired).
Alternatively, if you set the argument tidy = FALSE, it will return the list result of jsonlite::fromJSON without the "tidy" processing.
Finally, if you set tidy = NA, no processing is done at all and you get the original JSON string output for those who intend to pass the raw output to other canned code or software. The package does not provide XML output, however.
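For example, a short sketch of unnesting that nested series data (assuming the key is already set and the list column is named data, as described above):

library(eia)
library(tidyr)

x <- eia_series("PET.MCREXUS1.M")
monthly <- unnest(x, cols = data)  # one row per observation in the series
head(monthly)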
There are more comprehensive examples and vignettes at the eia package website I created.

R - Twitter - fromJSON - get list of tweets

I would like to retrieve a list of tweets from Twitter for a given hashtag using the RJSONIO package in R. I think I am pretty close to the solution, but I seem to be missing one step.
My code reads as follows (in this example, I use #NBA as a hashtag):
library(httr)
library(RJSONIO)
# 1. Find OAuth settings for twitter:
#    https://dev.twitter.com/docs/auth/oauth
oauth_endpoints("twitter")
# Replace key and secret below
myapp <- oauth_app("twitter",
                   key = "XXXXXXXXXXXXXXX",
                   secret = "YYYYYYYYYYYYYYYYY")
# 3. Get OAuth credentials
twitter_token <- oauth1.0_token(oauth_endpoints("twitter"), myapp)
# 4. Use API
req <- GET("https://api.twitter.com/1.1/search/tweets.json?q=%23NBA&src=typd",
           config(token = twitter_token))
req <- content(req, as = "text")
response <- fromJSON(req)
How can I get the list of tweets from object 'response'?
Eventually, I would like to get something like:
searchTwitter("#NBA", n=5000, lang="en")
Thanks a lot in advance!
The response object should be a list of length two: statuses and metadata. So, for example, to get the text of the first tweet, try:
response$statuses[[1]]$text
However, there are a couple of R packages designed to make just this kind of thing easier: Try streamR for the streaming API, and twitteR for the REST API. The latter has a searchTwitter function exactly as you describe.
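To pull the text of all returned tweets at once, a small follow-up sketch (it assumes the same response object and that each status carries a text field, as above):

tweets <- sapply(response$statuses, function(s) s$text)
head(tweets)     # first few tweet texts
length(tweets)   # number of tweets returned by this request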

Adding timeline while Extracting Twitter data in R

I am trying to extract Twitter data for a keyword using the following code:
library(ROAuth)    # provides OAuthFactory
library(twitteR)   # provides registerTwitterOAuth() and searchTwitter()
cred <- OAuthFactory$new(consumerKey = 'XXXX', consumerSecret = 'XXXX',
                         requestURL = 'https://api.twitter.com/oauth/request_token',
                         accessURL = 'https://api.twitter.com/oauth/access_token',
                         authURL = 'https://api.twitter.com/oauth/authorize')
cred$handshake(cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
To enable the connection, please direct your web browser to:
https://api.twitter.com/oauth/authorize?oauth_token=Cwr7GgWIdjh9pZCmaJcLq6CG1zIqk4JsID8Q7v1s
When complete, record the PIN given to you and provide it here: 8387466
registerTwitterOAuth(cred)
search <- searchTwitter('facebook', cainfo = "cacert.pem", n = 1000)
But even with n=1000, the function returns a list of only 99 tweets, when it should return more than that. I also tried the same function with a specific timeline:
search <- searchTwitter('facebook', cainfo = "cacert.pem", n = 1000, since = '2013-01-01', until = '2014-04-01')
But this function returns an empty list.
Can anyone help me out with the correct set of additional query parameters so that I can extract data for a specific timeline and without any restriction on the number of tweets? Does it have anything to do with the amount of data fetched by the API?
Thanks in advance
It looks like the Twitter API restricts the number of returned tweets. You should check this in the API documentation. Keeping the restriction in mind, you can use the since and sinceID arguments of searchTwitter() within a loop, something like:
for (i in 1:20) {
  if (i == 1) {
    search <- searchTwitter('facebook', cainfo = "cacert.pem", n = 2, since = '2014-04-15')
  } else {
    search <- searchTwitter('facebook', cainfo = "cacert.pem", n = 2,
                            since = '2014-04-15', sinceID = search[[1]]$id)
  }
  print(search)
  Sys.sleep(10)
}
You may need to adjust the Sys.sleep(10) portion if you hit API restrictions.
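If you want to keep the results rather than just print them, here is a hedged variation of the same loop; it assumes twitteR's twListToDF() helper for turning a list of status objects into a data frame:

library(twitteR)

all_tweets <- list()
for (i in 1:20) {
  if (i == 1) {
    search <- searchTwitter('facebook', cainfo = "cacert.pem", n = 100, since = '2014-04-15')
  } else {
    search <- searchTwitter('facebook', cainfo = "cacert.pem", n = 100,
                            since = '2014-04-15', sinceID = search[[1]]$id)
  }
  all_tweets <- c(all_tweets, search)   # accumulate each batch
  Sys.sleep(10)
}
tweets_df <- twListToDF(all_tweets)     # one row per tweet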
