Twitter API v2, how can I set a maximum on the number of tweets I want to scrape - twitter-api-v2

I would like to scrape only 2,000 tweets each day related to a specific query (in this example it's tesla). Do you guys know a way to set a maximum on the number of tweets I can scrape?
This is my code below, without the access keys to my Academic Twitter API account. It works perfectly; however, it keeps scraping all the tweets that are out there, which means I hit the 10 million monthly tweet cap very quickly.
Thank you in advance!
import time

import tweepy

client = tweepy.Client(
    wait_on_rate_limit=True,
    consumer_key=consumer_key,
    consumer_secret=consumer_secret,
    access_token=access_token,
    access_token_secret=access_token_secret,
    bearer_token=my_bearer_token,
)
query="climate change lang:en -is:retweet"
start_time = "2011-01-01T00:00:00Z"
end_time = "2011-06-30T23:59:59Z"
response_tweets = []
for response in tweepy.Paginator(client.search_all_tweets,
query=query,
user_fields = ["username", 'public_metrics'],
tweet_fields=['created_at', 'text'],
expansions = ['author_id'],
start_time=start_time,
end_time=end_time,
max_results=500):
time.sleep(1)
response_tweets.append(response)

When using tweepy, you can cap the total number of tweets retrieved with the Paginator's flatten(limit=...) method. So, in your case, it will be something like:
paginator = tweepy.Paginator(client.search_all_tweets,
                             query=query,
                             user_fields=["username", "public_metrics"],
                             tweet_fields=["created_at", "text"],
                             expansions=["author_id"],
                             start_time=start_time,
                             end_time=end_time,
                             max_results=500)

for tweet in paginator.flatten(limit=2000):  # flatten() yields individual tweets, capped at 2,000
    response_tweets.append(tweet)
More information about tweepy and pagination can be found here: Pagination-tweepy

Related

rscopus scopus_search() only returns first author. Need full author list

I am performing a bibliometric analysis, and have chosen to use rscopus to automate my document searches. I performed a test search, and it worked; the documents returned by scopus_search() exactly matched a manual check that I performed. Here's my issue: rscopus returned only information on the first author (and their affiliation) of each article, but I need information on all authors/affiliations for each article pulled for my particular research questions. I've scoured the rscopus documentation, as well as Elsevier's Developer notes for API use, but can't figure this out. Any ideas on what I'm missing?
query1 <- 'TITLE-ABS-KEY ( ( recreation ) AND ( management ) AND ( challenge ) )'
run1 <- scopus_search(query = query1, api_key = apikey, count = 20,
                      view = c('STANDARD', 'COMPLETE'), start = 0, verbose = TRUE,
                      max_count = 20000, http = 'https://api.elsevier.com/content/search/scopus',
                      headers = NULL, wait_time = 0)
I wanted to post an update since I figured out what was going wrong. I was using the university VPN to access the Scopus API, but the IP address associated with that VPN was not within the range of addresses included in my institution's Scopus license. So, I did not have permission to get "COMPLETE" results. I reached out to Elsevier and very quickly got an institution key that I could add to the search. My working search looks as follows...
query1 <- 'TITLE-ABS-KEY ( ( recreation ) AND ( management ) AND ( challenge ) )'
run1 <- scopus_search(query = query1, api_key = apikey, count = 20,
                      view = c('COMPLETE'), start = 0, verbose = TRUE,
                      max_count = 20000, http = 'https://api.elsevier.com/content/search/scopus',
                      headers = inst_token_header(insttoken), wait_time = 0)
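As a follow-up, once COMPLETE results come back, rscopus can flatten the raw entries into data frames, including author-level rows. A minimal sketch, assuming gen_entries_to_df() returns its usual list of df, author, and affiliation data frames:

run1_dfs <- gen_entries_to_df(run1$entries)
head(run1_dfs$df)     # one row per document
head(run1_dfs$author) # author-level rows (assumes this element is present)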
Just wanted to reiterate Brenna's comment - I had the same issue using the VPN to access the API (which can be resolved by being on campus). Elsevier were very helpful and provided an institutional token very quickly - problem solved.
Otherwise, another workaround I found was to pull the metadata from CrossRef using library(rcrossref).
I used the doi column from scopusdata, the data frame from my original Scopus search:
library(rcrossref) # cr_works()
library(purrr)     # pmap(), map()
library(dplyr)     # %>%, bind_rows()

crossrefdata <- scopusdata %>%
  pmap(function(doi, ...) { # `...` absorbs the other columns of scopusdata
    cr_works(dois = doi)    # returns CrossRef metadata for each doi
  }) %>%
  map("data") %>%           # cr_works() returns a list; keep only the 'data' element
  bind_rows()
You can then manipulate the CrossRef metadata however you need, with the full author list included.
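For instance, the author information comes back as a list-column; assuming cr_works() returned its usual doi, title, and author columns, it can be flattened to one row per author:

library(tidyr) # unnest()

authors_long <- crossrefdata %>%
  select(doi, title, author) %>% # author is a list-column of data frames
  unnest(cols = author)          # one row per author per document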

API call from Call of Duty API in R - authentication problem

I am trying to pull the stats of a list of players from the Call of Duty API. This API first requires logging in on the website https://profile.callofduty.com/cod/login. Once logged in, the user can see a player's stats through the Call of Duty API. For example, the stats of the streamer savyultras90 for Warzone can be seen via the following link: https://my.callofduty.com/api/papi-client/stats/cod/v1/title/mw/platform/psn/gamer/savyultras90/profile/type/wz.
If I log in on the website, I can view a player's stats and the related JSON in the browser. However, this doesn't seem straightforward in R.
I try to log in using the GET function from the httr package as follows:
respo <- GET('https://profile.callofduty.com/cod/login', authenticate('USER', 'PWD'))
But when I then try to access the API and download the JSON using the fromJSON function from the jsonlite package as follows:
data <- fromJSON('https://my.callofduty.com/api/papi-client/stats/cod/v1/title/mw/platform/psn/gamer/savyultras90/profile/type/wz')
I get the error message "Not permitted: not authenticated".
How can I authenticate in one website and stay logged in to call from the API which relies on that authentication?
Seeing as I've recently had to develop a PHP API for Warzone, I might be able to guide you in the right direction on how to handle this. But first a few remarks:
You need to authenticate each user individually with the appropriate platform if you want to request that player's data.
There is a throttle limit on the number of API requests.
The Call of Duty API is under strict usage guidelines and should only be used by registered partners. Using the API outside those guidelines could result in claims and eventually lawsuits: link
There is no public documentation of the API, and the API has changed in the past, breaking several 3rd-party tools.
Nevertheless, the process involves several steps as described below:
Register the device making the call
https://profile.callofduty.com/cod/mapp/registerDevice
with a json body in the form of {"deviceId":"INSERT_ID_HERE"}
This will return a response with the authHeader, which we will use as the token in the next calls.
Login with Activision credentials
https://profile.callofduty.com/cod/mapp/login
Set the following headers:
Authorization: "INSERT_AUTHHEADER_HERE"
x_cod_device_id: "INSERT_PREVIOUSLY_USED_DEVICEID_HERE"
This in turn returns a dataset from which we save the following values:
rtkn, ACT_SSO_COOKIE and atkn.
Make the wanted API call for data
We now have all the data required to make the API call.
For each request we will submit 3 headers:
Authorization: "INSERT_AUTHHEADER_HERE"
x_cod_device_id: "INSERT_PREVIOUSLY_USED_DEVICEID_HERE"
Cookie: ACT_SSO_LOCALE=en_GB;country=GB;API_CSRF_TOKEN=**GENERATE_CSRF_TOKEN**;rtkn=**RTKN_HERE**;ACT_SSO_COOKIE=**ACT_SSO_COOKIE_HERE**;atkn=**ATKN_HERE**
For more reference, you can always look through a Python library or NodeJS library which successfully implemented the API.
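Putting the steps above together, a rough httr translation might look like the sketch below. This is untested code against an undocumented API: the login body field names (email/password), the token extraction paths, and the omission of API_CSRF_TOKEN are all assumptions, so verify them against one of the libraries linked above.

library(httr)
library(jsonlite)

device_id <- paste0(sample(c(letters, 0:9), 22, replace = TRUE), collapse = "")

# Step 1: register the device; the authHeader location in the response is an assumption
reg <- POST("https://profile.callofduty.com/cod/mapp/registerDevice",
            body = toJSON(list(deviceId = device_id), auto_unbox = TRUE),
            content_type_json())
auth_header <- content(reg)$data$authHeader # assumed response path

# Step 2: log in with Activision credentials (body field names are assumptions)
login <- POST("https://profile.callofduty.com/cod/mapp/login",
              add_headers(Authorization = auth_header,
                          x_cod_device_id = device_id),
              body = toJSON(list(email = "EMAIL", password = "PWD"), auto_unbox = TRUE),
              content_type_json())
tok <- content(login) # expected to contain rtkn, ACT_SSO_COOKIE and atkn

# Step 3: call the stats endpoint with the three headers described above
stats <- GET("https://my.callofduty.com/api/papi-client/stats/cod/v1/title/mw/platform/psn/gamer/savyultras90/profile/type/wz",
             add_headers(Authorization = auth_header,
                         x_cod_device_id = device_id,
                         Cookie = paste0("ACT_SSO_LOCALE=en_GB;country=GB;",
                                         "rtkn=", tok$rtkn, ";",
                                         "ACT_SSO_COOKIE=", tok$ACT_SSO_COOKIE, ";",
                                         "atkn=", tok$atkn)))
content(stats)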
I struggled with this yesterday but finally made some progress. The issue is that you have to obtain an authentication token. The steps can be followed here: https://documenter.getpostman.com/view/7896975/SW7aXSo5#a37a2e5b-84bb-441d-b978-0fd8d42ffd29 (though there is no R example there).
My code works at first, as long as you don't authenticate again (I'm still trying to figure out why). Basically what I did was translate the steps in the link and extract the content from the GET responses:
library(httr)

# Get token ---------------------------------------------------------------
resp <- GET('https://profile.callofduty.com/cod/login')
cookies <- c(
  'XSRF-TOKEN' = resp$cookies$value[1],
  'new_SiteId' = resp$cookies$value[2],
  'comid'      = resp$cookies$value[3],
  'bm_sz'      = resp$cookies$value[4],
  '_abck'      = resp$cookies$value[5]
  # ,'ACT_SSO_COOKIE'        = resp$cookies$value[6]
  # ,'ACT_SSO_COOKIE_EXPIRY' = resp$cookies$value[7]
  # ,'atkn'                  = resp$cookies$value[8]
  # ,'ACT_SSO_REMEMBER_ME'   = resp$cookies$value[9]
  # ,'ACT_SSO_EVENT'         = resp$cookies$value[10]
  # ,'pgacct'                = resp$cookies$value[11]
  # ,'CRM_BLOB'              = resp$cookies$value[12]
  # ,'tfa_enrollment_seen'   = resp$cookies$value[13]
)
headers <- c()
params <- list(
  `new_SiteId`  = 'cod',
  `username`    = 'USER',
  `password`    = 'PWD',
  `remember_me` = 'true',
  `_csrf`       = resp$cookies$value[1]
)

# Authenticate ------------------------------------------------------------
resp_post <- POST('https://profile.callofduty.com/do_login?new_SiteId=cod',
                  add_headers(.headers = headers),
                  query = params,
                  set_cookies(.cookies = cookies))
cookies <- c(
  'XSRF-TOKEN'            = resp_post$cookies$value[1],
  'new_SiteId'            = resp_post$cookies$value[2],
  'comid'                 = resp_post$cookies$value[3],
  'bm_sz'                 = resp_post$cookies$value[4],
  '_abck'                 = resp_post$cookies$value[5],
  'ACT_SSO_COOKIE'        = resp_post$cookies$value[6],
  'ACT_SSO_COOKIE_EXPIRY' = resp_post$cookies$value[7],
  'atkn'                  = resp_post$cookies$value[8],
  'ACT_SSO_REMEMBER_ME'   = resp_post$cookies$value[9],
  'ACT_SSO_EVENT'         = resp_post$cookies$value[10],
  'pgacct'                = resp_post$cookies$value[11],
  'CRM_BLOB'              = resp_post$cookies$value[12],
  'tfa_enrollment_seen'   = resp_post$cookies$value[13]
)
headers <- c()
params <- list(
  `new_SiteId`  = 'cod',
  `username`    = 'USER',
  `password`    = 'PWD',
  `remember_me` = 'true',
  `_csrf`       = resp_post$cookies$value[1]
)

# Get data ----------------------------------------------------------------
resp_psn <- GET(url = 'https://my.callofduty.com/api/papi-client/stats/cod/v1/title/mw/platform/psn/gamer/savyultras90/profile/type/wz',
                add_headers(.headers = headers),
                query = params,
                set_cookies(.cookies = cookies))
resp_psn_json <- content(resp_psn)
Let me know if you've already managed to resolve this!
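One caveat with the snippet above: it picks cookies out of resp$cookies by row position, and that ordering isn't guaranteed. Since httr returns resp$cookies as a data frame with name and value columns, a safer variant is to index by name, e.g.:

cookie_vals <- setNames(resp$cookies$value, resp$cookies$name)
cookies <- cookie_vals[c('XSRF-TOKEN', 'new_SiteId', 'comid', 'bm_sz', '_abck')]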

HTTR package is not returning full response from API query

I am trying to use httr to call an API from IGDB (documentation here). When I use the following query in Postman, I have no problem and get the full response, which has 100 entries:
fields rating, game; where game = 114283; limit 100; sort id desc;
Example of an entry:
{
"id": 442667,
"game": 114283,
"rating": 3.0
},
However, when I attempt to make this query in R using httr as follows:
string <- paste0("rating, game; where game = ", ids[1,1], "; limit 100; sort id desc;")
data <- POST("https://api-v3.igdb.com/private/rates/",
add_headers("user-key" = "XXXXXXXXXX"),
query = list(fields = string)
)
fromJSON(rawToChar(data$content))
it returns only a data frame of 23 rows:
id game rating
1 442667 114283 3
...
23 383956 114283 10
Other calls similarly return shortened data frames with varying length depending upon the query.
If anyone has any idea as to why this might be happening, I would love some insight.
Thanks.
I think you're merely passing your query string in wrongly: the query is supposed to be sent as the request body (the -d/data parameter of an HTTP request), not as a URL query parameter.
Try the following:
prepend the initial "fields " to string
pass this to the body param of your POST request
So:
string <- paste0("fields rating, game; where game = ", ids[1,1], "; limit 100; sort id desc;")
data <- POST(
  "https://api-v3.igdb.com/private/rates/",
  add_headers("user-key" = "XXXXXXXXXX"),
  body = string
)
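To double-check the fix, you can repeat the parsing step from the question and count the rows (assuming httr and jsonlite are loaded):

stopifnot(status_code(data) == 200) # fail early on auth or request errors
df <- fromJSON(rawToChar(data$content))
nrow(df) # should now match the 100 entries Postman returns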

failed to authenticate google translate in R

So I tried to use the gl_translate function on 500,000 characters in RStudio, which means I have to authenticate against the Google Translate API. The problem is that I set this up about two months ago with my old Google account, and now I'm using a new one.
So when I tried to authenticate a new client_id with my new Google account, I got an error message that my API hadn't been enabled yet, even though I had enabled it. I restarted RStudio and now I get this error message:
2020-01-22 19:01:24 -- Translating html: 147 characters -
2020-01-22 19:01:24> Request Status Code: 403
Error: API returned: Request had insufficient authentication scopes.
It is very frustrating: I then tried to re-enable the old Google account, which required me to enter my credit card number. I did that too, and now they have asked me to wait several days.
Can anyone figure out what the problem is?
Here is my R code for authentication:
install.packages("googleAnalyticsR", dependencies = TRUE)
library(googleAnalyticsR)
install.packages("googleLanguageR")
library(googleLanguageR)
install.packages("dplyr")
library(dplyr)
library(tidyverse)
install.packages("googleAuthR")
library(googleAuthR)
client_id <- "107033903887214478396"
private_key <- "-----BEGIN PRIVATE KEY-----\nMIIEvAIBADANBgkqhkiG9w0BAQEFAASCBKYwggSiAgEAAoIBAQChPmvib1v9/CFA\nX7fG8b8iXyS370ivkufMmX30C6/rUNOttA+zMhamS/EO0uaYtPw44B4QdNzRsSGq\nm+0fQ5Sp1SHJaVPkoImuUXZdMlLO73pvY48nMmEFg7deoOZI+CNWZYgIvPY8whMo\nk4vKE2iyuG+pl9MT7dT6dwWNmXDWr8kgxAfryfVEUIeqaf+57Z9g3FfVPLARz4iH\nCaTu55hhbmo/XknUx0hPsrwBMPaNLGl2+o5MU1ZuIkl47EJvdL8CdUuEmb9qJDtv\nUyKqANlwFa7cVW8ij2tjFpjJ7bigRVJsI8odbsEbwmx1b5SLckDOQ7t4l8zERtmo\nUv5fxyNNAgMBAAECggEAApeLyWyL2IXcjPnc7OxG68kGwJQuoW/lnQLcpPcpIUm/\n1Vt/IxzLg2nWGqxmO48xPMLRiOcwA4jq5yCxi56c/avo6qFwUU0JWY2CrxXXge8U\nk0TQ8MrdB2cqI/HHMeYXP1TLfoR3GtvtzemtRhbQyIqxdNL1eC0LDump47BTQYg0\nVPuCxU3zXVIj+Qt0FZa+Pa/nAOGHf5b4ye56L7vxL2BCeRncuHdDcE6Ilkpz79Gv\nkXP1K5j22uEVCmobe1qRlq3BLx2Qimj4h8MI8CKiNS40tGR/oewJ5uMgmeCePYKv\nqSFOwCDvRkw9V2KdGu40WQFEq21mczlv9gMWhp2/EQKBgQDRmBZZM7ugIpo64wx6\nDFYhZo05LmMiwymIfWs2CibzKDeXPuy3OSytvTPVFCkG+RlcYthxAIAn1Z/qJ4UI\n+8c8Zwfg+toYtEa2gTYM2185vmnqQwqmAsaK+4xKZzgfqxie/CBuPzUOZO41q6P8\ni7A2KqXHcDb4SMqnkdGGLk/7+QKBgQDE8dBesgx4DsHFYg1sJyIqKO4d2pnLPkDS\nAzx5xvQuUcVCNTbugOC7e0vGxWmQ/Eqal5b3nitH590m8WHnU9UiE4HciVLe+JDe\nDe5CWzBslnncBjpgiDudeeEubhO7MHv/qZyZXMh73H2NBdO8j0uiGTNbBFoOSTYq\nsFACiCZu9QKBgE2KjMoXn5SQ+KpMkbMdmUfmHt1G0hpsRZNfgyiM/Pf8qwRjnUPz\n/RmR4/ky6jLQOZe6YgT8gG08VVtVn5xBOeaY34tWgxWcrIScrRh4mHROg/TNNMVS\nRY3pnm9wXI0qyYMYGA9xhvl6Ub69b3/hViHUCV0NoOieVYtFIVUZETJRAoGAW/Y2\nQCGPpPfvD0Xr0parY1hdZ99NdRQKnIYaVRrLpl1UaMgEcHYJekHmblh8JNFJ3Mnw\nGovm1dq075xDBQumOBU3zEzrP2Z97tI+cQm3oNza5hyaYbz7aVsiBNYtrHjFTepb\nT1l93ChnD9SqvB+FR5nQ2y07B/SzsFdH5QbCO4kCgYBEdRFzRLvjdnUcxoXRcUpf\nfVMZ6fnRYeV1+apRSiaEDHCO5dyQP8vnW4ewISnAKdjKv/AtaMdzJ5L3asGRWDKU\n1kP/KDBlJkOsOvTkmJ4TxbIhgcSI62/wqDBi5Xqw1ljR2mh8njzRwqDRKs12EtQ0\n9VaUDm7LCNTAskn2SR/o4Q==\n-----END PRIVATE KEY-----\n"
options(googleAuthR.client_id = client_id)
options(googleAuthR.client_secret = private_key)
devtools::reload(pkg = devtools::inst("googleAnalyticsR"))
ga_auth()
In case you need to see what my translate code looks like:
translate <- function(tibble) {
  count <- data.frame(nchar = 0, cumsum = 0) # running count to stay within API limits
  for (i in 1:nrow(tibble)) {
    des <- pull(tibble[i, 2]) # extract description as a single character string
    if (count$cumsum[nrow(count)] >= 80000) { # API limit check
      print("nearing 100,000 characters per 100 seconds limit, pausing for 100 seconds")
      Sys.sleep(100)
      count <- count[1, ] # reset count file
    }
    if (grepl("^\\s*$", des) == TRUE) { # if description is only whitespace then skip
      trns <- tibble(translatedText = "", detectedSourceLanguage = "", text = "")
    } else { # else request translation from API
      trns <- gl_translate(des, target = 'en', format = 'html') # html format to anticipate html descriptions
    }
    tibble[i, 3:4] <- trns[, 1:2] # add to tibble
    nchar <- nchar(pull(tibble[i, 2])) # count number of characters
    req <- data.frame(nchar = nchar, cumsum = nchar + sum(count$nchar))
    count <- rbind(count, req) # add to count file
    if (nchar > 20000) { # additional API limit safeguard for large descriptions
      print("large description (>20,000 characters), pausing to manage API limit")
      Sys.sleep(100)
      count <- count[1, ] # reset count file
    }
  }
  return(tibble)
}
I figured it out after 24 hours.
Apparently it is really easy. I just followed the steps from this link.
My mistake yesterday was that the JSON file I had downloaded was the one for the OAuth client ID, while I actually needed the JSON file for the service account.
Then I installed the googleLanguageR package with this code:
remotes::install_github("ropensci/googleLanguageR")
library(googleLanguageR)
and then just passed the file location of my downloaded Google project JSON file to gl_auth() like this:
gl_auth("G:/My Drive/0. Thesis/R-Script/ZakiServiceAccou***************kjadjib****.json")
and now I'm happy :)
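As a side note, instead of calling gl_auth() in each session, googleLanguageR also reads the GL_AUTH environment variable when the package loads, so authentication can happen automatically (the path below is a placeholder):

Sys.setenv(GL_AUTH = "path/to/service-account.json") # placeholder path; set before loading the package
library(googleLanguageR) # authenticates on load using GL_AUTH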

R Scripts - Use Output Message as if condition

I am using the RGoogleAnalytics library to get all the data from my Google Analytics account into R. However, complex queries return 0 results.
My code looks like:
query.list <- Init(start.date = paste(c(lastmonth.startdate)),
                   end.date = paste(c(lastmonth.enddate)),
                   metrics = "ga:goalCompletionsAll",
                   dimensions = "ga:countryIsoCode,ga:yearMonth",
                   filters = "ga:goalCompletionsAll>0",
                   max.results = 10000,
                   table.id = sprintf("ga:%s", sites$profile.id[i]))
# Create the Query Builder object so that the query parameters are validated
ga.query <- QueryBuilder(query.list)
# Extract the data and store it in a data-frame
ga.countriesConversions1 <- GetReportData(ga.query, token)
Everything is inside a for loop, and the script stops if one of the queries ends in 0 results, because GetReportData(ga.query, token) cannot create a data frame if there is no data.
I would like to know if there is a way to capture the message the library prints to the console ("Your query matched 0 results. Please verify your query using the Query Feed Explorer and re-run it"), assign it to a variable, and use it as an if condition, so I could create a dummy data.frame before the next function runs.
Assuming GetReportData is throwing an error, you can try:
ga.countriesConversions1 <- try(GetReportData(ga.query, token), silent = TRUE)
if (inherits(ga.countriesConversions1, "try-error")) {
  warning(geterrmessage())
  ... error handling logic ...
}
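Alternatively, base R's tryCatch lets you fall back to a dummy data frame in one step. A sketch, assuming the fallback should mirror the queried dimensions and metric (the column names below are an assumption about what your downstream code expects):

ga.countriesConversions1 <- tryCatch(
  GetReportData(ga.query, token),
  error = function(e) {
    warning(conditionMessage(e)) # surface the 0-results message
    # empty placeholder so downstream code still sees a data frame
    data.frame(countryIsoCode = character(0),
               yearMonth = character(0),
               goalCompletionsAll = numeric(0))
  }
)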
