Scraping Tweets in R with httr, jsonlite, dplyr

This is my code:
library(httr)
library(jsonlite)
library(dplyr)
bearer_token <- Sys.getenv("BEARER_TOKEN")
headers <- c('Authorization' = sprintf('Bearer %s', bearer_token))
params <- list('expansions' = 'attachments.media_keys')
handle <- readline('BenDuBose')
url_handle <-
sprintf('https://api.twitter.com/2/users/by?username=%s', handle)
response <-
httr::GET(url = url_handle,
httr::add_headers(.headers = headers),
query = params)
obj <- httr::content(response, as = "text")
print(obj)
This is my error message:
[1] "{"errors":[{"parameters":{"ids":[""]},"message":"The number of values in the ids query parameter list [0] is not between 1 and 100"}],"title":"Invalid Request","detail":"One or more parameters to your request was invalid.","type":"https://api.twitter.com/2/problems/invalid-request"}"
My end goal is to scrape an image from a specific tweet ID/user. I already have a list of users and tweet IDs, along with attachments.media_keys. However, I don't know how to use httr. I am trying to copy the Twitter Developer example verbatim to learn, but it isn't working.
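One possible cause, offered as a hedged sketch rather than a confirmed diagnosis: readline() treats its first argument as the prompt text, so readline('BenDuBose') only displays "BenDuBose" and returns whatever is typed at the console (an empty string in a non-interactive run). The NYT example further down this page also calls the same endpoint with `usernames` (plural) rather than `username`. The helper name `build_user_lookup_url` is hypothetical and not part of httr.

```r
# Sketch only: build the user-lookup URL from a fixed handle.
# `build_user_lookup_url` is a hypothetical helper, not part of httr.
build_user_lookup_url <- function(handle) {
  handle <- trimws(handle)
  if (!nzchar(handle)) {
    stop("handle is empty; readline() returns \"\" in non-interactive sessions")
  }
  sprintf("https://api.twitter.com/2/users/by?usernames=%s",
          utils::URLencode(handle, reserved = TRUE))
}

# Assign the handle directly instead of passing it to readline() as a prompt.
handle <- "BenDuBose"
url_handle <- build_user_lookup_url(handle)
print(url_handle)
```

The resulting `url_handle` can be passed to httr::GET exactly as in the code above.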

Related

Get the text in the image description of a tweet using rtweet

Is there a way to get the text used as the image description of tweets? I'm using the {rtweet} package, which allows one to get several pieces of information about a tweet (text, links, hashtags, etc.), but I can't get this info.
{rtweet} allows one to post a tweet with rtweet::post_tweet and add the image description through the parameter media_alt_text, but I can't find this information when I download a tweet using the rtweet::get_timeline function.
reprex
library(rtweet)
# parsing the tweet data
last_tweet_parsed <- rtweet::get_timeline(user = 'esquinadobrasil',
n = 1,
parse = T)
head(last_tweet_parsed)
# not parsing the tweet data
last_tweet_unparsed <- rtweet::get_timeline(user = 'esquinadobrasil',
n = 1,
parse = F)
temp_df <- as.data.frame(last_tweet_unparsed)
head(temp_df)
Using the v2 API is more flexible and stays in sync with the documentation.
As a demo, I will show how to get the image alt text for one tweet from esquinadobrasil:
https://twitter.com/esquinadobrasil/status/1615009611186069504
The alt_text of the image in that tweet is "sort 893".
Demo
require(httr)
require(jsonlite)
require(dplyr)
bearer_token <- "***** your bearer_token *****"
headers <- c(`Authorization` = sprintf('Bearer %s', bearer_token))
params <- list(`expansions` = 'attachments.media_keys',
`media.fields` = 'public_metrics,url,alt_text')
tweet_id <- "1615009611186069504"
url_handle <-
sprintf('https://api.twitter.com/2/tweets/%s', tweet_id)
response <-
httr::GET(url = url_handle,
httr::add_headers(.headers = headers),
query = params)
obj <- httr::content(response, as = "text")
print(obj)
Run & Result
$ rscript get-image.R
[1] "{\"data\":{\"attachments\":{\"media_keys\":[\"3_1615009514297729024\"]},\"text\":\"Municipio: Santo Antônio Da
Platina - PR\\nSetor censitário: 412410305000028\\nPopulação: 718\\nÁrea (Km2): 1.31\\nDensidade (hab/Km2): 548.06\\
nZona: urbana\\n\\uD83D\\uDDFA https://xxx/KagyCLHLrM https://xxx/z1YDyTJArx\",\"id\":\"1615009611186069504\",\"ed
it_history_tweet_ids\":[\"1615009611186069504\"]},\"includes\":{\"media\":[{\"media_key\":\"3_1615009514297729024\",
\"url\":\"https://pbs.twimg.com/media/FmmqmLiXoAAdEmw.jpg\",\"alt_text\":\"sort 893\",\"type\":\"photo\"}]}}"
Main Idea
V2 Get Tweet by ID
GET /2/tweets/:id
According to the documentation, the media.fields query parameter can request the alt_text.
I tested the same API call in Postman:
https://api.twitter.com/2/tweets/1615009611186069504/?expansions=attachments.media_keys&media.fields=url,alt_text
It returns the same result.
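To go from the raw JSON string to the alt text itself, the response can be parsed with jsonlite. This is a minimal sketch that uses a trimmed copy of the response shown above instead of a live request. The variable names are illustrative.

```r
library(jsonlite)

# Trimmed copy of the response body shown above (tweet text omitted).
obj <- '{"data":{"attachments":{"media_keys":["3_1615009514297729024"]},"id":"1615009611186069504"},"includes":{"media":[{"media_key":"3_1615009514297729024","url":"https://pbs.twimg.com/media/FmmqmLiXoAAdEmw.jpg","alt_text":"sort 893","type":"photo"}]}}'

# fromJSON turns the array of media objects into a data frame,
# one row per attached media item.
parsed <- jsonlite::fromJSON(obj)
media <- parsed$includes$media

alt_texts <- media$alt_text
image_urls <- media$url
print(alt_texts)
```

With a live response, `obj` would be the string returned by httr::content(response, as = "text").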

Filtering retweets in search_30day() request

I am using the Sandbox account, which has a limited number of requests. I am trying to filter out retweets in my request with the following:
consumer_key <-
consumer_secret <-
access_token <-
access_secret <-
app <-
token = rtweet::create_token(app,consumer_key,consumer_secret,access_token,access_secret)
#period 2
dataBTC29 <- search_30day("Bitcoin analysis lang:en -is:retweet", n = 100, env_name = "Tweets30", fromDate = "202210130000", toDate = "202210130759", parse = FALSE)
One of the tweet attributes is "is:retweet". Including it returns retweets, and I read somewhere that -is:retweet excludes retweets from the search. However, when I add -is:retweet, I get the following error:
Error: Twitter API failed [422]
Check error message at https://developer.twitter.com/en/support/twitter-api/error-troubleshooting
When I look up the error, this is what I get:
Check that the data you are sending in your request is valid. For example, this data could be the JSON body of your request or an image.
And if I run the following in R:
consumer_key <-
consumer_secret <-
access_token <-
access_secret <-
app <-
token = rtweet::create_token(app,consumer_key,consumer_secret,access_token,access_secret)
#period 2
dataBTC29 <- search_30day("Bitcoin analysis lang:en", n = 100, env_name = "Tweets30", fromDate = "202210130000", toDate = "202210130759", parse = FALSE)
It does return a dataset, but all 100 rows are retweets. How can I write the request so that it works?
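If the endpoint does not accept the operator, one fallback is to drop retweets after download. This is a minimal sketch, assuming the results have been parsed into a data frame with a `text` column and that retweet text starts with "RT @". The data frame below is made up for illustration, and `drop_retweets` is a hypothetical helper.

```r
# Made-up stand-in for parsed search results.
tweets <- data.frame(
  id = c("1", "2", "3"),
  text = c("RT @someone: Bitcoin analysis thread",
           "My own Bitcoin analysis for today",
           "RT @other: charts"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only rows whose text does not start with "RT @".
drop_retweets <- function(df) {
  df[!grepl("^RT @", df$text), , drop = FALSE]
}

originals <- drop_retweets(tweets)
print(originals$id)
```

Filtering locally means a request for n = 100 can leave fewer than 100 rows, which matters when the number of requests is limited.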

R - Use Twitter API to get every tweet from an account

My goal is to get EVERY tweet ever for any twitter account. I picked the NYTimes for this example.
The code below works, but it only pulls the last 100 tweets. max_results does not allow you to put a value over 100.
The code below is almost fully copy-paste-able; you only need to supply your own bearer token.
How can I expand this to give me every tweet from an account?
One idea is that I can loop it for every day since the account was created, but that seems tedious if there is a faster way.
# NYT Example --------------------------------------------------------------------
library(httr)
library(jsonlite)
library(tidyverse)
bearer_token <- "insert your bearer token here"
headers <- c(`Authorization` = sprintf('Bearer %s', bearer_token))
params <- list(`user.fields` = 'description')
handle <- 'nytimes'
url_handle <- sprintf('https://api.twitter.com/2/users/by?usernames=%s', handle)
response <- httr::GET(url = url_handle,
httr::add_headers(.headers = headers),
query = params)
json_data <- fromJSON(httr::content(response, as = "text"), flatten = TRUE)
json_data %>%
as_tibble()
NYT_ID <- json_data$data$id
url_handle <- paste0("https://api.twitter.com/2/users/", NYT_ID, "/tweets")
params <- list(`tweet.fields` = 'id,text,author_id,created_at,attachments,public_metrics',
`max_results` = '100')
response <- httr::GET(url = url_handle,
httr::add_headers(.headers = headers),
query = params)
json_data <- fromJSON(httr::content(response, as = "text"), flatten = TRUE)
NYT_tweets <- json_data$data %>%
as_tibble() %>%
select(-id, -author_id, -9)
NYT_tweets
For anyone that finds this later on, I found a solution that works for me.
Using the start_time and end_time parameters, you can specify the date range the tweets must fall in. For example, I was able to pull all tweets from November and then rbind those to the ones from December, and so on. Sometimes I had to do two tweet pulls (first half of March, second half of March) to get all of them, but it worked for this.
params <- list(`tweet.fields` = 'id,text,author_id,created_at,attachments,public_metrics',
`max_results` = '100',
`start_time` = '2021-11-01T00:00:01.000Z',
`end_time` = '2021-11-30T23:58:21.000Z')
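An alternative to manual date windows is to follow the `next_token` that the v2 timeline endpoint returns in `meta` and send it back as `pagination_token`. The loop can be sketched as below. `collect_all_pages` and `fetch_page` are hypothetical names, and the fake fetcher stands in for the API so the sketch runs without a bearer token.

```r
# `fetch_page` is a hypothetical function: it takes a pagination token (or NULL)
# and returns one parsed page as a list with `data` and `meta`.
collect_all_pages <- function(fetch_page, max_pages = 100) {
  pages <- list()
  token <- NULL
  for (i in seq_len(max_pages)) {
    page <- fetch_page(token)
    pages[[length(pages) + 1]] <- page$data
    # The last page has no next_token, so this becomes NULL and the loop ends.
    token <- page$meta$next_token
    if (is.null(token)) break
  }
  do.call(rbind, pages)
}

# Offline stand-in for the API: two pages, the first one pointing to the second.
fake_pages <- list(
  list(data = data.frame(id = c("3", "2"), stringsAsFactors = FALSE),
       meta = list(next_token = "abc")),
  list(data = data.frame(id = "1", stringsAsFactors = FALSE),
       meta = list(result_count = 1))
)
fake_fetch <- function(token) {
  if (is.null(token)) fake_pages[[1]] else fake_pages[[2]]
}

all_tweets <- collect_all_pages(fake_fetch)
print(nrow(all_tweets))
```

In a real call, `fetch_page` would wrap the httr::GET request from the question and add `pagination_token` to `params` whenever a token is present. The `max_pages` cap is only a safety limit against runaway loops.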

R Binance API HMAC SHA256 signed message

I'm trying to send signed API messages using the Binance APIs, but I keep failing with a 404 error. Can someone help me out with the code below, please?
library(jsonlite)
library(httr)
library(dplyr)
library(digest)
timestamp <- 1516941586 #as.numeric(as.POSIXct(Sys.time()))
post_message <- paste0(timestamp, 'public.api' ) # data_client.id = client id # data_key = key
sha.message <- toupper(digest::hmac('private.api', object = post_message,
algo = 'sha256', serialize = F))
url <- 'https://api.binance.com/api/v3/account'
body = list('timestamp' = timestamp, 'signature' = sha.message)
body2 <- paste("?timestamp=",timestamp,"&signature=",sha.message, sep = "")
httr::POST(url, body2 = body, verbose())
Here is the documentation: https://github.com/binance-exchange/binance-official-api-docs/blob/master/rest-api.md
Based on example under section "SIGNED Endpoint Examples for POST /api/v1/order" in the website, you can follow something similar. You will need to replace with your own apiKey and secretKey.
library(httr)
library(openssl)
url <- 'https://api.binance.com/api/v3/account'
apiKey <- "vmPUZE6mv9SD5VNHk4HlWFsOr6aKE2zvsw0MuIgwCIPy6utIco14y7Ju91duEh8A"
secretKey <- "NhqPtmdSJYdKjVHjA7PZj4Mge3R5YNiP1e3UZjInClVN65XAbvqqM6A7H5fATj0j"
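timestamp <- 1516941586 # placeholder; Binance expects the current time in milliseconds
recvWindow <- 5000 # milliseconds; the API rejects values above 60000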
postmsg <- paste0("timestamp=", timestamp, "&recvWindow=", recvWindow)
signature <- openssl::sha256(postmsg, key=secretKey)
GET(url,
add_headers("X-MBX-APIKEY"=apiKey),
query=list(timestamp=timestamp, recvWindow=recvWindow, signature=signature),
verbose())
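The signing step can be isolated into a small helper. This is a sketch under assumptions, not Binance's reference implementation: `sign_query` is a hypothetical name, the secret is the example key from the answer above, and numbers are formatted without scientific notation so that values such as "1e+05" never end up in the signed string.

```r
library(openssl)

# Hypothetical helper: builds the query string and its HMAC SHA256 signature.
sign_query <- function(params, secret_key) {
  # Convert every value to text, avoiding scientific notation for numbers.
  values <- vapply(params, function(v) {
    if (is.numeric(v)) format(v, scientific = FALSE, trim = TRUE) else as.character(v)
  }, character(1))
  query <- paste0(names(params), "=", values, collapse = "&")
  # Passing `key` makes openssl::sha256 compute an HMAC instead of a plain hash.
  signature <- as.character(openssl::sha256(query, key = secret_key))
  list(query = query, signature = signature)
}

# Current time in milliseconds since the epoch.
timestamp_ms <- round(as.numeric(Sys.time()) * 1000)

signed <- sign_query(
  list(timestamp = timestamp_ms, recvWindow = 5000),
  secret_key = "NhqPtmdSJYdKjVHjA7PZj4Mge3R5YNiP1e3UZjInClVN65XAbvqqM6A7H5fATj0j"
)
print(nchar(signed$signature))
```

The signature must be computed over exactly the string that is sent, so `signed$query` and `signed$signature` should go into the request together and in the same parameter order.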

Error in trying to pull information off Instagram

I've been working on a project that aims to pull post and comment information from Instagram posts over the past year.
I am starting with some simple code that pulls information for a single user.
Here is the code:
require(httr)
full_url <- oauth_callback()
full_url <- gsub("(.*localhost:[0-9]{1,5}/).*", x=full_url, replacement="\\1")
print(full_url)
app_name <- "Cognitive Model of the Customer"
client_id <- "b03d4a910f0442b9bd1cd79fc06a086f"
client_secret <- "c35f785784fa45cd9eaf786742ae9b3f"
scope = "basic"
instagram <- oauth_endpoint(
authorize = "https://api.instagram.com/oauth/authorize",
access = "https://api.instagram.com/oauth/access_token")
myapp <- oauth_app(app_name, client_id, client_secret)
ig_oauth <- oauth2.0_token(instagram, myapp,scope="basic", type = "application/x-www-form-urlencoded",cache=FALSE)
tmp <- strsplit(toString(names(ig_oauth$credentials)), '"')
token <- tmp[[1]][4]
library(jsonlite)
library(RCurl)
user_info <- fromJSON(getURL(paste('https://api.instagram.com/v1/users/search? q=',"newbalance",'&access_token=',token,sep="")),unexpected.escape = "keep")
The error I am receiving is
Error in simplify(obj, simplifyVector = simplifyVector, simplifyDataFrame = simplifyDataFrame, :
unused argument (unexpected.escape = "keep")
I'm not sure I understand where this error comes from though.
The unexpected.escape argument belongs to rjson::fromJSON, not jsonlite::fromJSON, so the call fails when only jsonlite is loaded.
Load the rjson package first and then run your code:
library(rjson)
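When both jsonlite and rjson are attached, which fromJSON gets called depends on load order. A small sketch that sidesteps this by naming the package explicitly is below. The JSON payload is made up for illustration and is not a real Instagram response.

```r
# Made-up payload standing in for the API response.
json_text <- '{"data":[{"username":"newbalance","id":"123"}]}'

# Calling the function with its namespace avoids depending on load order.
user_info <- rjson::fromJSON(json_text, unexpected.escape = "keep")
print(user_info$data[[1]]$username)
```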
