I am trying to access the Moz API with R to get some data, but I cannot get the signed authentication right, so my requests always get a 401 response. I think something is wrong with the signature. Here is my code:
library(rjson)
library(digest)
library(RCurl)
# Construct the url to call the API
api <- 'http://lsapi.seomoz.com/linkscape/url-metrics/'
site <- 'facebook.com'
# that is the cols parameter that I need to get the data required
# there is no issue with it as it works when used with the provided sample call
metrics <- 'Cols=36029458443938976'
ampersand <- '&'
# this is my access id as given by Moz
access_id <- 'member-d8fc642751'
# this gets the current time and adds another 5 minutes as recommended (Unix timestamp format)
expires <- round(as.numeric(Sys.time() + 300))
# this concatenates the access id and the expires with a linefeed as explained in the API doc
hash_string <- paste(access_id, '\n', expires, sep="")
# this hashes the string from above with my secret key with sha1, don't worry this key is not valid anymore
hmac_hash <- hmac('f74fc2f2a8d5337aaa0550bfa3a9bdaf', hash_string, "sha1")
# Encoding with base64
base64_hash <- base64(hmac_hash)
# URL encoding the generated signature
encoded_signature <- URLencode(base64_hash, reserved = TRUE)
# constructing the url for the API call
url <- paste(api, site, '?', metrics, ampersand, 'AccessID=', access_id, ampersand, 'Expires=', expires, ampersand, 'Signature=', encoded_signature, sep="")
# Get data from API (json format)
Moz_json_data <- fromJSON(file=url, method='C')
Here is a comparison:
1. http://lsapi.seomoz.com/linkscape/url-metrics/facebook.com?Cols=36029458443938976&AccessID=member-d8fc642751&Expires=1415381495&Signature=YThmYTI1N2I4MDYzY2QxMGQzNDNjOWVlNmIyYTU1MzgzY2FlOWFiOA%3d%3d
2. http://lsapi.seomoz.com/linkscape/url-metrics/facebook.com?Cols=36029458443938976&AccessID=member-d8fc642751&Expires=1415465853&Signature=vyZmngnjiYy5Ns62LCLRHXgQQ6c%3D
The first one is generated by my code and does not work. The second one is provided as a sample request by Moz and works. As you can see, the Signature in the second one is much shorter, which makes me think I am generating the wrong signature, but I do follow the same steps as outlined in their API doc.
Useful links:
http://apiwiki.moz.com/signed-authentication
http://apiwiki.moz.com/anatomy-of-a-mozscape-api-call
Any help will be greatly appreciated!
If you follow the example PHP code they provide, you'll see that they set raw=TRUE when calling hash_hmac. Thus when they encode the data, they are encoding the raw bytes of the hash, not the hexadecimal character representation of those bytes. That is also why the sample signature is shorter: base64 of the 20 raw SHA-1 bytes is 28 characters, while base64 of the 40-character hex string is 56. You need to do the same in the R version. Compare
# INCORRECT
(dd <- hmac('f74fc2f2a8d5337aaa0550bfa3a9bdaf', hash_string, "sha1"))
# [1] "e521bd74fba9296920efb897a2bc7578d3e8b075"
base64(dd)
# [1] "ZTUyMWJkNzRmYmE5Mjk2OTIwZWZiODk3YTJiYzc1NzhkM2U4YjA3NQ=="
# attr(,"class")
# [1] "base64"
and
# CORRECT
(dd <- hmac('f74fc2f2a8d5337aaa0550bfa3a9bdaf', hash_string, "sha1", raw=TRUE))
# [1] e5 21 bd 74 fb a9 29 69 20 ef b8 97 a2 bc 75 78 d3 e8 b0 75
base64(dd)
# [1] "5SG9dPupKWkg77iXorx1eNPosHU="
# attr(,"class")
# [1] "base64"
I can retrieve the data with this query in Node-RED, but I need to retrieve it with R.
This is as far as I've gotten:
library(httr)
post.1 <- httr::POST(url = paste0("http://", influx.ip, ":8086/api/v2/signin"),
                     authenticate(influx.user, influx.passwd))
# Authentication seems to work.
influx.query <- 'from(bucket: "nr_meas")
|> range(start: -12h)'
post.2 <- httr::POST(url = paste0("http://", influx.ip, ":8086/api/v2/query"),
                     query = list(org = influx.org),
                     add_headers("Content-Type: application/json",
                                 'Accept: application/csv'),
                     body = list(q = influx.query))
content(post.2)
# $code
# [1] "invalid"
#
# $message
# [1] "failed to decode request body: invalid character '-' in numeric literal"
Saving the data from Node-RED isn't an option (it runs on a different computer).
What is the right way to get data from InfluxDB into R?
You could try the InfluxDB 2.0 R client:
library(influxdbclient)
client <- InfluxDBClient$new(
  url = paste0("http://", influx.ip, ":8086"),
  token = "my-token",
  org = influx.org)
data <- client$query('from(bucket: "nr_meas") |> range(start: -12h) |> drop(columns: ["_start", "_stop"])')
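If you'd rather stay with plain httr: the error in the question comes from body=list(q=influx.query), which httr form-encodes, while /api/v2/query expects the raw Flux script (or a JSON document). A minimal sketch, assuming a valid API token in influx.token:
library(httr)
post.2 <- POST(url = paste0("http://", influx.ip, ":8086/api/v2/query"),
               query = list(org = influx.org),
               add_headers(Authorization = paste("Token", influx.token),
                           `Content-Type` = "application/vnd.flux",
                           Accept = "application/csv"),
               body = influx.query)  # send the Flux script itself as the body
content(post.2, as = "text")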
I have code that uses "curl::curl_fetch_memory" to get data from a REST server. All works fine except when processing national characters. Here is the code that produces the bug:
# 1. Works
str_json <- "https://api.company.com/request?name=SEROUJ%20DEBUI"
# 2. NOT works
str_json <- "https://api.company.com/request?name=SÉROUJ%20DEBUI"
# 3. Create a new handle
h <- curl::new_handle()
curl::handle_setopt(
  handle = h, httpauth = 1,
  userpwd = "password")
# 4. Using curl
resp <- curl::curl_fetch_memory(str_json, handle = h)
rawToChar(resp$content)
jsonlite::fromJSON(rawToChar(resp$content))
The difference between the two requests is a single character: "E" (works) vs "É" (does not work). The failing one returns a "502 Bad Gateway" error.
Any ideas on how to fix this bug are welcome!
If you do this with a literal space in the string, it seems to work:
str_json <- "https://api.company.com/request?name=SÉROUJ DEBUI"
URLencode(str_json)
# "https://api.company.com/request?name=S%C3%89ROUJ%20DEBUI"
For some reason it does not pick it up if some values are already encoded. (That is URLencode's repeated argument, which defaults to FALSE: a URL that already contains percent-escapes is assumed to be fully encoded and is returned unchanged.)
It works in the other direction:
URLdecode("https://api.company.com/request?name=S%C3%89ROUJ%20DEBUI")
#[1] "https://api.company.com/request?name=SÉROUJ DEBUI"
I succeeded in getting the text of the posts and the share and like counts.
However, I am not able to get the likes of the comments associated with a post. If this information is not available, I would like to merge the like count of the post into each comment.
Example: a post gets 900 likes and 80 comments. I would like to associate the 900-likes value with each of the comments (in a new column called post_like, maybe).
I would like to use this information to perform a sentiment analysis using the number of likes (complex likes, i.e. haha, sad, ...) in a logistic regression, with the frequency of the most frequent words as the x variable.
Here is my script so far:
token<- "**ur token , get it at https://developers.facebook.com/tools/explorer/**"
# Function to download the comments
download.post <- function(i, refetch=FALSE, path=".") {
post <- getPost(post=fb_page$id[i], comments = TRUE, likes = TRUE, token=token)
post1<- as.data.frame(melt(post))
}
#----------------------- Request posts --- ALL
# Get post for ALL
fb_page<- getPage(page="**the page number u want**", token=token, since='2010/01/01', until='2016/01/01', n= 10000, reactions=TRUE)
fb_page$order <- 1:nrow(fb_page)
# Apply function to download comments
files<-data.frame(melt(lapply(fb_page$order, download.post)))
# Select only comments
files_c<-files[complete.cases(files$message),]
So basically I get the page with the post IDs and create a function to get the posts for those IDs on that page.
As you can see, I get all the information I need BESIDES the likes and share counts.
I hope I am clear; thanks a lot for your help.
It's all there:
library(Rfacebook)
token <- "#############" # https://developers.facebook.com/tools/explorer
fb_page <- getPage(page="europeanparliament", token=token, n = 3)
transform(
  fb_page[, c("message", "likes_count", "comments_count", "shares_count")],
  message = sapply(message, toString, width = 30)
)
# message likes_count comments_count shares_count
# 1 This week members called o.... 92 73 21
# 2 Today we're all Irish, bea.... 673 133 71
# 3 European citizens will mee.... 1280 479 71
packageVersion("Rfacebook")
# [1] ‘0.6.12’
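As for attaching a post's like count to each of its comments (the post_like column idea), a sketch along these lines should work, since getPost() returns a list holding both a post and a comments data frame:
post <- getPost(post = fb_page$id[1], comments = TRUE, token = token)
comments <- post$comments
comments$post_like <- post$post$likes_count  # recycle the post's like count across its comments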
I would like to create a data frame that scrapes the NYT and WSJ and has the number of articles on a given topic per year. That is:
     NYT WSJ
2011   2   3
2012  10   7
I found this tutorial for the NYT, but it is not working for me :_(. When I get to line 30, I get this error:
> cts <- as.data.frame(table(dat))
Error in provideDimnames(x) :
  length of 'dimnames' [1] not equal to array extent
Any help would be much appreciated.
Thanks!
PS: This is my code that is not working (an NYT API key is needed: http://developer.nytimes.com/apps/register)
# Need to install from source http://www.omegahat.org/RJSONIO/RJSONIO_0.2-3.tar.gz
# then load:
library(RJSONIO)
### set parameters ###
api <- "API key goes here" ###### <<<API key goes here!!
q <- "MOOCs" # Query string, use + instead of space
records <- 500 # total number of records to return, note limitations above
# calculate parameter for offset
os <- 0:(records/10-1)
# read first set of data in
uri <- paste ("http://api.nytimes.com/svc/search/v1/article?format=json&query=", q, "&offset=", os[1], "&fields=date&api-key=", api, sep="")
raw.data <- readLines(uri, warn="F") # get them
res <- fromJSON(raw.data) # tokenize
dat <- unlist(res$results) # convert the dates to a vector
# read in the rest via loop
for (i in 2:length(os)) {
  # concatenate URL for each offset
  uri <- paste("http://api.nytimes.com/svc/search/v1/article?format=json&query=", q, "&offset=", os[i], "&fields=date&api-key=", api, sep="")
  raw.data <- readLines(uri, warn = FALSE)
  res <- fromJSON(raw.data)
  dat <- append(dat, unlist(res$results)) # append
}
# aggregate counts for dates and coerce into a data frame
cts <- as.data.frame(table(dat))
# establish date range
dat.conv <- strptime(dat, format="%Y%m%d") # need to convert dat into POSIX format for this
daterange <- c(min(dat.conv), max(dat.conv))
dat.all <- seq(daterange[1], daterange[2], by="day") # all possible days
# compare dates from counts dataframe with the whole data range
# assign 0 where there is no count, otherwise take count
# (take out PSD at the end to make it comparable)
dat.all <- strptime(dat.all, format="%Y-%m-%d")
# can't seem to be able to compare POSIX objects with %in%, so coerce them to character for this:
freqs <- ifelse(as.character(dat.all) %in% as.character(strptime(cts$dat, format="%Y%m%d")), cts$Freq, 0)
plot(freqs, type="l", xaxt="n", main=paste("Search term(s):", q), ylab="# of articles", xlab="date")
axis(1, 1:length(freqs), dat.all)
lines(lowess(freqs, f=.2), col = 2)
UPDATE: the repo is now at https://github.com/rOpenGov/rtimes
There is an RNYTimes package created by Duncan Temple-Lang (https://github.com/omegahat/RNYTimes), but it is outdated because the NYTimes API is on v2 now. I've been working on one for political endpoints only, which is not relevant for you.
I'm rewriting RNYTimes right now... Install it from GitHub. You need to install devtools first to get install_github:
install.packages("devtools")
library(devtools)
install_github("rOpenGov/RNYTimes")
Then try your search with that, e.g.:
library(RNYTimes); library(plyr)
moocs <- searchArticles("MOOCs", key = "<yourkey>")
This gives you the number of articles found:
moocs$response$meta$hits
[1] 121
You could get word counts for each article with:
as.numeric(sapply(moocs$response$docs, "[[", 'word_count'))
[1] 157 362 1316 312 2936 2973 355 1364 16 880
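And since the original goal was counts per year, the publication dates can be tabulated directly (a sketch; it assumes each doc carries the v2 article search's pub_date field):
dates <- sapply(moocs$response$docs, "[[", "pub_date")
table(substr(dates, 1, 4))  # number of articles per year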
I am using the rDrop package available from https://github.com/karthikram/rDrop. After a bit of tweaking (as all the functions don't quite work as you would always expect them to), I have finally got it to work the way I would like, but it still requires authorisation verification to allow the use of the app each time you get a token, as I think tokens expire over time... (If that is not the case and I can hard-code my token, please tell me, as that would be a good solution too.)
Basically, I want a near-seamless way of downloading CSV files from my Dropbox folders from the command line in R, in one line of code, so that I don't need to click the Allow button after the token request.
Is this possible?
Here is the code I used to wrap up a Dropbox CSV download:
db.csv.download <- function(dropbox.path, ...){
  cKey <- getOption('DropboxKey')
  cSecret <- getOption('DropboxSecret')
  reqURL <- "https://api.dropbox.com/1/oauth/request_token"
  authURL <- "https://www.dropbox.com/1/oauth/authorize"
  accessURL <- "https://api.dropbox.com/1/oauth/access_token/"
  require(devtools)
  install_github("ROAuth", "ropensci")
  install_github("rDrop", "karthikram")
  require(rDrop)
  dropbox_oa <- oauth(cKey, cSecret, reqURL, authURL, accessURL,
                      obj = new("DropboxCredentials"))
  cred <- handshake(dropbox_oa, post = TRUE)
  raw.data <- dropbox_get(cred, dropbox.path)
  data <- read.csv(textConnection(raw.data), ...)
  data
}
Oh, and if it's not obvious, I have put my Dropbox key and secret in my .Rprofile file, which is what the getOption calls refer to.
Thanks in advance for any help. (For bonus points... if anybody knows how to get rid of all the loading messages, even for the installs, that would be great...)
library(rDrop)
# my keys are in my .Rprofile, otherwise specify inline
db_token <- dropbox_auth()
# Hit ok to authorize once through the browser and hit enter back at the R prompt.
save(db_token, file="my_dropbox_token.rdata")
Dropbox tokens are non-expiring and can be revoked at any time from the Dropbox web panel.
For future use:
library(rDrop)
load('~/Desktop/my_dropbox_token.rdata')
df <- data.frame(x=1:10, y=rnorm(10))
> df
    x          y
1   1 -0.6135835
2   2  0.3624928
3   3  0.5138807
4   4 -0.2824156
5   5  0.9230591
6   6  0.6759700
7   7 -1.9744624
8   8 -1.2061920
9   9  0.9481213
10 10 -0.5997218
dropbox_save(db_token, list(df), file="foo", ext=".rda")
rm(df)
df2 <- db.read.csv(db_token, file='foo.rda')
> df2
    x          y
1   1 -0.6135835
2   2  0.3624928
3   3  0.5138807
4   4 -0.2824156
5   5  0.9230591
6   6  0.6759700
7   7 -1.9744624
8   8 -1.2061920
9   9  0.9481213
10 10 -0.5997218
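For the bonus question about loading messages, a sketch of the usual approach (the quiet argument is an assumption here; devtools' install functions pass it through):
suppressPackageStartupMessages(library(rDrop))  # silence package startup messages
install_github("rDrop", "karthikram", quiet = TRUE)  # quieter install (assumed argument)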
If you have additional problems, please file an issue.