Number of tweets using academic Twitter - r

I'm using the Academic Twitter API and I'm having difficulty collecting a good quantity of tweets for my database.
My script is:
library(twitteR)

tweets <- searchTwitter('ICMS + gasolina',
                        n = 10000,
                        since = '2022-08-01',
                        lang = "pt")
and R returns:
Warning message:
In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, :
10000 tweets were requested but the API can only return 1547
I don't know if this is exactly what I should expect, or whether there is anything I can do about it.
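For context, searchTwitter() in the twitteR package goes through the standard v1.1 search endpoint, which only indexes roughly the last week of tweets, so 1547 may genuinely be everything it can see for that query. If your credentials have Academic Research access, one option is the full-archive v2 endpoint instead. A minimal sketch, assuming the academictwitteR package and a valid bearer token (the end date and the token placeholder are illustrative, not from the question):
library(academictwitteR)

tweets <- get_all_tweets(
  query = "ICMS gasolina lang:pt",          # v2 query; the lang: operator filters for Portuguese
  start_tweets = "2022-08-01T00:00:00Z",
  end_tweets = "2022-09-01T00:00:00Z",      # illustrative end date
  bearer_token = "YOUR-BEARER-TOKEN",
  n = 10000
)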

Related

Authorization error in Scopus Search in R (rscopus)

I did a Scopus search using the rscopus package with the following code:
library(rscopus)

author_search(au_id = "Smith",
              searcher = "affil(princeton) and authlast")
I got the following error:
Error in get_results(au_id, start = init_start, count = count, facets = facets, :
Unauthorized (HTTP 401).
However, this code works well:
scopus_search(query = "Vocabulary", max_count = 20,
              count = 10)
I have set the Scopus API key using options("elsevier_api_key" = "MY-API-KEY-HERE"), so I wonder what the problem is.
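One thing worth checking (a hedged suggestion, not a confirmed fix): a 401 from author_search() while scopus_search() works often points to the key lacking the entitlement for the Author Search API specifically, for example when the key was issued outside your institution's network. You can probe the endpoint directly with httr to see whether the key itself is accepted; the query string below is only an example:
library(httr)

resp <- GET("https://api.elsevier.com/content/search/author",
            query = list(query = "authlast(Smith) and affil(princeton)"),
            add_headers("X-ELS-APIKey" = getOption("elsevier_api_key")))
status_code(resp)   # a 401 here as well would point to key entitlement rather than rscopus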

Getting more than the number of friends allowed by Twitter API using rtweet

I have written the following script that fetches the friends of a Twitter user ("barackobama" in this example) in batches of 75,000 (5,000 friends per API call x 15 API calls) every 15 minutes using rtweet. However, after the script is done running, I find that the friend IDs repeat at a fixed interval. For instance, rows 1, 280001, and 560001 have the same ID; rows 2, 280002, and 560002 have the same ID; and so on. I'm wondering if I'm misunderstanding how next_cursor works in the API.
library(rtweet)

u <- "barackobama"
n_friends <- lookup_users(u)$friends_count
curr_page <- -1
fetched_friends <- 0
i <- 0
all_friends <- NULL
while (fetched_friends < n_friends) {
  # wait out the rate-limit window if no calls remain
  if (rate_limit("get_friends")$remaining == 0) {
    print(paste0("API limit reached. Resetting at ", rate_limit("get_friends")$reset_at))
    Sys.sleep(as.numeric((rate_limit("get_friends")$reset + 0.1) * 60))
  }
  # fetch the next page of up to 5000 friend IDs
  curr_friends <- get_friends(u, n = 5000, retryonratelimit = TRUE, page = curr_page)
  i <- i + 1
  all_friends <- rbind(all_friends, curr_friends)
  fetched_friends <- nrow(all_friends)
  print(paste0(i, ". ", fetched_friends, " out of ", n_friends, " fetched."))
  curr_page <- next_cursor(curr_friends)
}
Any help will be appreciated.
You are not doing anything wrong. From the documentation:
this ordering is subject to unannounced change and eventual consistency issues
For very large lists, the API simply won't return all the information you want.
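If what you ultimately need is the set of distinct friend IDs, a simple workaround (a sketch, not part of the original answer; it assumes the user_id column that older versions of rtweet's get_friends() return) is to deduplicate after the loop finishes:
# keep only the first occurrence of each friend ID
all_friends <- all_friends[!duplicated(all_friends$user_id), ]
nrow(all_friends)   # number of distinct friends actually retrieved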

Using httr to place orders through BitMex API

I'm trying to use the httr R package to place orders on BitMex through their API.
I found some guidance here, and after storing my API key and secret in the objects K and S respectively, I tried the following:
library(httr)
library(digest)

verb <- 'POST'
expires <- floor(as.numeric(Sys.time() + 10000))
path <- '/api/v1/order'
data <- '{"symbol":"XBTUSD","price":4500,"orderQty":10}'
body <- paste0(verb, path, expires, data)   # string to sign: verb + path + expires + request body
signature <- hmac(S, body, algo = 'sha256')
body_l <- list(verb = verb, expires = expires, path = path, data = data)
And then both:
msg <- POST('https://www.bitmex.com/api/v1/order',
            encode = 'json',
            body = body_l,
            add_headers('api-key' = K,
                        'api-signature' = signature,
                        'api-expires' = expires))
and:
msg <- POST('https://www.bitmex.com/api/v1/order',
            body = body,
            add_headers('api-key' = K,
                        'api-signature' = signature,
                        'api-expires' = expires))
give me the same error message when I inspect the response:
rawToChar(msg$content)
[1] "{\"error\":{\"message\":\"Signature not valid.\",\"name\":\"HTTPError\"}}"
I've tried to set everything up according to BitMEX's instructions for using their API, but I appear to be missing something. They list a couple of issues that might underlie an invalid signature, but none of them seem to help in my case. When I follow their signing example I get exactly the same hashes, so that part seems to be in order.
Bit late to the party here, but hopefully this helps!
Your POST call just needs some minor changes:
add content_type_json()
include .headers = c('the headers') in add_headers(). See the example below:
library(httr)
library(digest)

S <- "your api secret"
K <- "your api key"

verb <- 'POST'
expires <- floor(as.numeric(Sys.time() + 10))
path <- '/api/v1/order'
data <- '{"symbol":"XBTUSD","price":4500,"orderQty":10}'
body <- paste0(verb, path, expires, data)     # string to sign: verb + path + expires + request body
signature <- hmac(S, body, algo = 'sha256')

msg <- POST('https://www.bitmex.com/api/v1/order',
            encode = 'json',
            body = data,                      # send the same JSON string that was signed
            content_type_json(),
            add_headers(.headers = c('api-key' = K,
                                     'api-signature' = signature,
                                     'api-expires' = expires)))
content(msg, "text")
I have a package on CRAN - bitmexr - that provides a wrapper around the majority of BitMEX's API endpoints that you might be interested in. Still quite a "young" package so I would welcome any feedback!

TwitteR searchTwitter() call not working when in for loop

I'm currently trying to loop over a searchTwitter() call for each of 21 NBA players to get the 100 most recent tweets about each of them. Weirdly, the call was working inside my for loop over lastMVPs (a list of the player names), but it stopped after 10 iterations, which amounts to only 1,000 tweets requested. Now the call only works outside of the for loop. Does anyone have any idea why this is?
For example - this works:
searchTwitter("Lebron James", n = 2, lang = 'en')
But this does not:
for (name in lastMVPs) {
  newitem = searchTwitter(name, n = 100, lang = 'en')
  df = twListToDF(newitem)
  name = df$text
  tweetMatrix = cbind(tweetMatrix, name)
}
And I get the error
Error in twListToDF(newitem) : Empty list passed to twListToDF
In addition: Warning message:
In doRppAPICall("search/tweets", n, params = params, retryOnRateLimit = retryOnRateLimit, :
100 tweets were requested but the API can only return 0
This doesn't make sense to me: how can the API be maxed out when the same call still works outside of the loop?
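The usual cause of "can only return 0" partway through a loop is that the search endpoint has temporarily stopped returning results for your credentials (rate limiting), which you can check with getCurRateLimitInfo(). A hedged sketch of a more defensive version of the loop, pausing between calls and skipping empty results (the pause length and the list-based collection are assumptions, not from the original post):
library(twitteR)

tweetList <- list()
for (name in lastMVPs) {
  newitem <- searchTwitter(name, n = 100, lang = 'en')
  if (length(newitem) == 0) {    # avoid calling twListToDF() on an empty result
    message("No tweets returned for ", name)
    next
  }
  tweetList[[name]] <- twListToDF(newitem)$text
  Sys.sleep(5)                   # brief pause between calls to stay under the rate limit
}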

make concurrent RCurl GET requests for set of URLs

I wrote a function to use RCurl to obtain the effective URL for a list of shortened URL redirects (bit.ly, t.co, etc.) and handle errors when the effective URL locates a document (PDFs tend to throw "Error in curlPerform... embedded nul in string.")
I would like to make this function more efficient if possible (while keeping it in R). As written, the run time is prohibitively long for un-shortening a thousand or more URLs.
?getURI tells us that, by default, getURI/getURL go asynchronous when the length of the url vector is > 1. But my performance seems totally linear, presumably because sapply turns the whole thing into one big for loop and the concurrency is lost.
Is there any way I can speed up these requests? Extra credit for fixing the "embedded nul" issue.
require(RCurl)
options(RCurlOptions = list(verbose = F, followlocation = T,
                            timeout = 500, autoreferer = T, nosignal = T,
                            useragent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2)"))

# find successful location (or error msg) after any redirects
getEffectiveUrl <- function(url){
  c = getCurlHandle()
  h = basicHeaderGatherer()
  curlSetOpt(.opts = list(header = T, verbose = F), curl = c, .encoding = "CE_LATIN1")
  possibleError <- tryCatch(getURI(url, curl = c, followlocation = T,
                                   headerfunction = h$update, async = T),
                            error = function(e) e)
  if(inherits(possibleError, "error")){
    effectiveUrl <- "ERROR_IN_PAGE" # fails on linked documents (PDFs etc.)
  } else {
    headers <- h$value()
    names(headers) <- tolower(names(headers)) # sometimes cases change on header names?
    statusPrefix <- substr(headers[["status"]], 1, 1) # 1st digit of http status
    if(statusPrefix == "2"){ # status = success
      effectiveUrl <- getCurlInfo(c)[["effective.url"]]
    } else {
      effectiveUrl <- paste(headers[["status"]], headers[["statusmessage"]])
    }
  }
  effectiveUrl
}

testUrls <- c("http://t.co/eivRJJaV4j", "http://t.co/eFfVESXE2j", "http://t.co/dLI6Q0EMb0",
              "http://www.google.com", "http://1.uni.vi/01mvL", "http://t.co/05Mz00DHLD",
              "http://t.co/30aM6L4FhH", "http://www.amazon.com", "http://bit.ly/1fwWZLK",
              "http://t.co/cHglxQkz6Z") # 10th URL redirects to content w/ embedded nul

system.time(
  effectiveUrls <- sapply(X = testUrls, FUN = getEffectiveUrl, USE.NAMES = F)
) # takes 7-10 secs on my laptop

# does Vectorize help?
vGetEffectiveUrl <- Vectorize(getEffectiveUrl, vectorize.args = "url")
system.time(
  effectiveUrls2 <- vGetEffectiveUrl(testUrls)
) # nope, makes it worse
I had a bad experience with RCurl and async requests. R would completely freeze (with no error message; CPU and RAM did not spike) with only 20 concurrent requests.
I recommend switching to the curl package and using the curl_fetch_multi() function. In my case it could easily handle 50,000 JSON requests in one pool (with some division into sub-pools under the hood).
https://cran.r-project.org/web/packages/curl/vignettes/intro.html#async_requests
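To make that concrete, here is a rough sketch of the same effective-URL task using the curl package's multi interface (a sketch only: minimal error handling, and it assumes the input URLs are unique since they are used as names):
library(curl)

resolveEffectiveUrls <- function(urls) {
  results <- setNames(character(length(urls)), urls)
  pool <- new_pool()
  for (u in urls) {
    local({
      u_local <- u
      curl_fetch_multi(
        u_local,
        done = function(res) results[u_local] <<- res$url,            # final URL after redirects
        fail = function(err) results[u_local] <<- paste("ERROR:", err),
        pool = pool,
        handle = new_handle(followlocation = TRUE)
      )
    })
  }
  multi_run(pool = pool)   # performs all queued requests concurrently
  results
}

# effectiveUrls <- resolveEffectiveUrls(testUrls)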
