Get the text in the image description of a tweet using rtweet

Is there a way to get the text used as the image description (alt text) of tweets? I'm using the {rtweet} package, which lets one retrieve several pieces of information about a tweet (text, links, hashtags, etc.), but I can't get this particular field.
{rtweet} allows one to post a tweet with rtweet::post_tweet and add an image description through the media_alt_text parameter, but I can't find this information when I download a tweet using rtweet::get_timeline.
reprex
library(rtweet)
# parsed tweet data
last_tweet_parsed <- rtweet::get_timeline(user = 'esquinadobrasil',
                                          n = 1,
                                          parse = TRUE)
head(last_tweet_parsed)
# unparsed tweet data
last_tweet_unparsed <- rtweet::get_timeline(user = 'esquinadobrasil',
                                            n = 1,
                                            parse = FALSE)
temp_df <- as.data.frame(last_tweet_unparsed)
head(temp_df)
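A quick way to confirm the field really is missing from what get_timeline returns (a heuristic check; column names vary across rtweet versions):
# look for any column that might carry the alt text
grep("alt", names(temp_df), ignore.case = TRUE, value = TRUE)
# character(0) -- nothing resembling an alt-text column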

Using the v2 API is much more flexible and consistent with the documentation.
Demo tweet (one from esquinadobrasil):
https://twitter.com/esquinadobrasil/status/1615009611186069504
I will show how to get the image's alt text, which for this tweet is "sort 893".
Demo
require(httr)
require(jsonlite)
require(dplyr)
bearer_token <- "***** your bearer_token *****"
headers <- c(`Authorization` = sprintf('Bearer %s', bearer_token))
params <- list(`expansions` = 'attachments.media_keys',
               `media.fields` = 'public_metrics,url,alt_text')
tweet_id <- "1615009611186069504"
url_handle <- sprintf('https://api.twitter.com/2/tweets/%s', tweet_id)
response <- httr::GET(url = url_handle,
                      httr::add_headers(.headers = headers),
                      query = params)
obj <- httr::content(response, as = "text")
print(obj)
Run & Result
$ Rscript get-image.R
[1] "{\"data\":{\"attachments\":{\"media_keys\":[\"3_1615009514297729024\"]},\"text\":\"Municipio: Santo Antônio Da Platina - PR\\nSetor censitário: 412410305000028\\nPopulação: 718\\nÁrea (Km2): 1.31\\nDensidade (hab/Km2): 548.06\\nZona: urbana\\n\\uD83D\\uDDFA https://xxx/KagyCLHLrM https://xxx/z1YDyTJArx\",\"id\":\"1615009611186069504\",\"edit_history_tweet_ids\":[\"1615009611186069504\"]},\"includes\":{\"media\":[{\"media_key\":\"3_1615009514297729024\",\"url\":\"https://pbs.twimg.com/media/FmmqmLiXoAAdEmw.jpg\",\"alt_text\":\"sort 893\",\"type\":\"photo\"}]}}"
Main Idea
The v2 "Get Tweet by ID" endpoint:
GET /2/tweets/:id
Per the documentation, its media.fields query parameter can return alt_text.
I tested the same API call in Postman:
https://api.twitter.com/2/tweets/1615009611186069504/?expansions=attachments.media_keys&media.fields=url,alt_text
and got the same result.
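To extract the alt text as an R object rather than scanning the raw string, the response above can be parsed with jsonlite (a minimal sketch that follows the structure of the printed output):
json_data <- jsonlite::fromJSON(obj, flatten = TRUE)
# the media expansion lands under includes$media; alt_text is one of its columns
json_data$includes$media$alt_text
# [1] "sort 893"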

Related

R - Use Twitter API to get every tweet from an account

My goal is to get EVERY tweet ever posted by a given Twitter account. I picked the NYTimes for this example.
The code below works, but it only pulls the last 100 tweets; max_results does not allow a value over 100.
The code below is almost fully copy-paste-able; you would just have to use your own bearer token.
How can I expand this to get every tweet from an account?
One idea is to loop over every day since the account was created, but that seems tedious if there is a faster way.
# NYT Example --------------------------------------------------------------------
library(httr)
library(jsonlite)
library(tidyverse)
bearer_token <- "insert your bearer token here"
headers <- c(`Authorization` = sprintf('Bearer %s', bearer_token))
params <- list(`user.fields` = 'description')
handle <- 'nytimes'
url_handle <- sprintf('https://api.twitter.com/2/users/by?usernames=%s', handle)
response <- httr::GET(url = url_handle,
                      httr::add_headers(.headers = headers),
                      query = params)
json_data <- fromJSON(httr::content(response, as = "text"), flatten = TRUE)
json_data %>%
  as_tibble()
NYT_ID <- json_data$data$id
url_handle <- paste0("https://api.twitter.com/2/users/", NYT_ID, "/tweets")
params <- list(`tweet.fields` = 'id,text,author_id,created_at,attachments,public_metrics',
               `max_results` = '100')
response <- httr::GET(url = url_handle,
                      httr::add_headers(.headers = headers),
                      query = params)
json_data <- fromJSON(httr::content(response, as = "text"), flatten = TRUE)
NYT_tweets <- json_data$data %>%
  as_tibble() %>%
  select(-id, -author_id, -9)
NYT_tweets
For anyone who finds this later: I found a solution that works for me.
Using the start_time and end_time parameters you can restrict the tweets to a date range. I was able to pull all tweets from November, for example, and then rbind those to the ones from December, and so on. Sometimes I had to do two pulls (first half of March, second half of March) to get all of them, but it worked, as sketched below.
params <- list(`tweet.fields` = 'id,text,author_id,created_at,attachments,public_metrics',
               `max_results` = '100',
               `start_time` = '2021-11-01T00:00:01.000Z',
               `end_time` = '2021-11-30T23:58:21.000Z')
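A minimal sketch of that windowed loop (reusing url_handle and headers from the question; the window boundaries are illustrative):
windows <- list(c('2021-11-01T00:00:01.000Z', '2021-11-30T23:59:59.000Z'),
                c('2021-12-01T00:00:01.000Z', '2021-12-31T23:59:59.000Z'))
all_tweets <- NULL
for (w in windows) {
  params <- list(`tweet.fields` = 'id,text,author_id,created_at,attachments,public_metrics',
                 `max_results` = '100',
                 `start_time` = w[1],
                 `end_time` = w[2])
  response <- httr::GET(url = url_handle,
                        httr::add_headers(.headers = headers),
                        query = params)
  json_data <- fromJSON(httr::content(response, as = "text"), flatten = TRUE)
  # if a window still holds more than 100 tweets, split it further, as noted above
  all_tweets <- bind_rows(all_tweets, as_tibble(json_data$data))
}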

Scraping Tweets in R with httr, jsonlite, dplyr

This is my code:
library(httr)
library(jsonlite)
library(dplyr)
bearer_token <- Sys.getenv("BEARER_TOKEN")
headers <- c('Authorization' = sprintf('Bearer %s', bearer_token))
params <- list('expansions' = 'attachments.media_keys')
handle <- readline('BenDuBose')
url_handle <- sprintf('https://api.twitter.com/2/users/by?username=%s', handle)
response <- httr::GET(url = url_handle,
                      httr::add_headers(.headers = headers),
                      query = params)
obj <- httr::content(response, as = "text")
print(obj)
This is my error message:
[1] "{"errors":[{"parameters":{"ids":[""]},"message":"The number of values in the ids query parameter list [0] is not between 1 and 100"}],"title":"Invalid Request","detail":"One or more parameters to your request was invalid.","type":"https://api.twitter.com/2/problems/invalid-request"}"
My end goal is to scrape an image from a specific tweet ID/user. I already have a list of users and tweet IDs, along with attachments.media_keys, but I don't know how to use httr. I am trying to copy the Twitter Developer example verbatim to learn, but it isn't working.
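One likely culprit, going by the empty ids list in the error: readline('BenDuBose') does not assign "BenDuBose" to handle; it only uses that string as an interactive prompt, and in a non-interactive session it returns "", so the request goes out with an empty username. A minimal fix sketch (also switching to the plural usernames parameter used in the NYT example above):
handle <- "BenDuBose"   # assign directly; readline() only prompts for input
url_handle <- sprintf('https://api.twitter.com/2/users/by?usernames=%s', handle)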

How to set up a POST request to add a song to a Spotify playlist with R

First, let me say that I'm really new to R. Yesterday I listened to my favorite radio station, and because there was so much advertising in between, I decided to scrape the music they play every day from their webpage, so I can listen to it without any ads.
I wrote a script in R that takes the title and artist of every song the radio station played from their website:
### Radio2 playlist scraper ###
#Loading packages#
install.packages("rvest")
library(rvest)
install.packages("dplyr")
library("dplyr")
install.packages("remotes")
remotes::install_github("charlie86/spotifyr")
library(spotifyr)
install.packages('knitr', dependencies = TRUE)
library(knitr)
#Get playlist url #
url <- "https://www.nporadio2.nl/playlist"
#Read HTML code from pagen#
webpage <- read_html(url)
#Get Artist and Title#
artist <- html_nodes(webpage, '.fn-artist')
title <- html_nodes(webpage, '.fn-song')
#Artist and Title to text#
artist_text <- html_text(artist)
title_text <- html_text(title)
#Artist and Title to dataframe#
artiest <- as.data.frame(artist_text)
titel_text <- as.data.frame(title_text)
#Make one dataframe#
radioplaylist <- cbind(artiest$artist_text, titel_text$title_text)
radioplaylist <- as.data.frame(radioplaylist)
radioplaylist
#Rename columns#
colnames(radioplaylist)[1] <- "Artiest"
colnames(radioplaylist)[2] <- "Titel"
radioplaylist
#Remove duplicate songs#
radioplaylistuniek <- radioplaylist %>% distinct(Artiest, Titel, .keep_all = TRUE)
#Write to csv#
date <- Sys.Date()
date
write.csv(radioplaylistuniek, paste0("C://Users//Kantoor//Radio2playlists//playlist - ", date, ".csv"))
#Set spotify API#
Sys.setenv(SPOTIFY_CLIENT_ID = 'caxxxxxxxxxxxxxxxxxx')
Sys.setenv(SPOTIFY_CLIENT_SECRET = '7exxxxxxxxxxxxx')
access_token <- get_spotify_access_token()
clientID <- "xxxxxxxxxxxxxxx"
secret <- "xxxxxxxxxxxxxx"
library(httr)
library(magrittr)
library(rvest)
library(ggplot2)
response <- POST(
  'https://accounts.spotify.com/api/token',
  accept_json(),
  authenticate(clientID, secret),
  body = list(grant_type = 'client_credentials'),
  encode = 'form',
  verbose()
)
token <- content(response)$access_token
authorization.header <- paste0("Bearer ", token)
#Get track info#
call1 <- GET(url = paste("https://api.spotify.com/v1/search?q=track:Ready%20To%20Go%20artist:Republica&type=track&limit=1"), config = add_headers(authorization = authorization.header))
call1
# JSON to TXT#
jsonResponseParsed <- content(call1, as="parsed") #JSON response structured into parsed data
jsonResponseParsed
# Extract track uri#
uri <- jsonResponseParsed$tracks$items[[1]]$uri
uri
# Add track to playlist #
POST(url= "https://api.spotify.com/v1/playlists/29fotSbWUGP1NmWbtGRaG6/tracks?uris=spotify%3Atrack%3A5Qt8U8Suu7MFH1VcJr17Td", config = add_headers(c('Accept="application/json"', 'Content-type= "application/JSON"', 'Authorization="Bearer BQDX9jbz99bCt6TXd7OSaaj12CgCh3s5F6KBwb-ATnv7AFkSnjuEASS9FOW0zx-xxxxxxxxxxxxxx"')))
What do I want?
I want to automatically add every song I picked up to my Spotify playlist.
What have I got so far?
I created an app via developer.spotify.com. For each song I can get the unique URI, which is needed to add the song to my playlist.
Where do I get stuck?
I am unable to add the song to my playlist with a POST request; I get the message "No token provided".
I have created a sample POST request via https://developer.spotify.com/console/post-playlist-tracks/?playlist_id=&position=&uris= which adds the song neatly to my playlist. The call is:
POST https://api.spotify.com/v1/playlists/{playlist_id}/tracks
curl -X "POST" "https://api.spotify.com/v1/playlists/29fotSbWUGP1NmWbtGRaG6/tracks?uris=spotify%3Atrack%3A5Qt8U8Suu7MFH1VcJr17Td" -H "Accept: application/json" -H "Content-Type: application/json" -H "Authorization: Bearer BQDX9jbz99bCt6TXd7OSaaj12CgCh3s5F6KBwb-ATxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
Can someone help me set up the correct POST request in R?
@webb, thank you. It is working now with the final code below. The key point is that a client-credentials token only identifies the app; modifying a playlist requires a user token with the playlist-modify-public scope, which get_spotify_authorization_code() provides.
# Get user authorization code #
code <- get_spotify_authorization_code(client_id = Sys.getenv("SPOTIFY_CLIENT_ID"),
                                       client_secret = Sys.getenv("SPOTIFY_CLIENT_SECRET"),
                                       scope = "playlist-modify-public")
# Save token #
code2 <- code[["credentials"]][["access_token"]]
usercode <- paste0("Bearer ", code2)
# Add track to playlist #
POST("https://api.spotify.com/v1/playlists/29fotSbWUGP1NmWbtGRaG6/tracks?uris=spotify%3Atrack%3A5Qt8U8Suu7MFH1VcJr17Td",
     encode = "json",
     add_headers(Authorization = usercode),
     body = "{\"texts\":[\"A simple string\"]}")
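As a side note, spotifyr wraps the same endpoint, so the manual POST could likely be replaced by a single call. A sketch assuming spotifyr's add_tracks_to_playlist() helper and the authorization object obtained above:
# hedged sketch: pass the user authorization from get_spotify_authorization_code()
spotifyr::add_tracks_to_playlist(playlist_id = "29fotSbWUGP1NmWbtGRaG6",
                                 uris = "spotify:track:5Qt8U8Suu7MFH1VcJr17Td",
                                 authorization = code)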

How to pull data from the NewsRiver API into R

I'm trying to pull data from the NewsRiver API into R. Specifically, I would like to convert the JSON it returns into a data frame for further analysis. I would also like to be able to supply my own search terms and the domain to search as variables.
https://newsriver.io/
library(httr)
library(jsonlite)
set_config(config(ssl_verifypeer = 0L))
search_1 <- "Amazon"
search_2 <- "World Domination"
website <- "bloomberg.com"
url <- sprintf('https://api.newsriver.io/v2/search?query=text%%3A%s%%20OR%%20text%%3A%s%%20OR%%20website.domainName%%3A%s%%20OR%%20language%%3AEN&sortBy=_score&sortOrder=DESC&limit=100',
               search_1, search_2, website)
api_key <- "mykey"
news <- GET(url, add_headers(Authorization = paste(api_key, sep = "")))
news_txt <- content(news, as = "text")
news_china_df <- fromJSON(news_txt)
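One likely snag (an observation, since no error output is shown): search_2 contains a space, which makes the sprintf-built URL invalid. A sketch that percent-encodes the user-supplied terms and flattens the parsed JSON into a data frame (the exact field layout depends on the NewsRiver response):
# percent-encode user-supplied pieces so spaces do not break the URL
search_1_enc <- URLencode(search_1, reserved = TRUE)
search_2_enc <- URLencode(search_2, reserved = TRUE)
url <- sprintf('https://api.newsriver.io/v2/search?query=text%%3A%s%%20OR%%20text%%3A%s%%20OR%%20website.domainName%%3A%s%%20OR%%20language%%3AEN&sortBy=_score&sortOrder=DESC&limit=100',
               search_1_enc, search_2_enc, website)
news <- GET(url, add_headers(Authorization = api_key))
# flatten = TRUE unnests nested JSON objects into ordinary columns
news_df <- fromJSON(content(news, as = "text"), flatten = TRUE)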

Identify the correct CSS selector of a URL for an R script

I am trying to obtain data from a website, and thanks to a helper I arrived at the following script:
require(httr)
require(rvest)
res <- httr::POST(url = "http://apps.kew.org/wcsp/advsearch.do",
                  body = list(page = "advancedSearch",
                              AttachmentExist = "",
                              family = "",
                              placeOfPub = "",
                              genus = "Arctodupontia",
                              yearPublished = "",
                              species = "scleroclada",
                              author = "",
                              infraRank = "",
                              infraEpithet = "",
                              selectedLevel = "cont"),
                  encode = "form")
pg <- content(res, as = "parsed")
lnks <- html_attr(html_node(pg, "td"), "href")
However, in some cases, like the example above, it does not retrieve the right link because, for some reason, html_attr does not find URLs ("href") within the node returned by html_node. So far I have tried different CSS selectors, like "td", "a.onwardnav", and ".plantname", but none of them yields an object that html_attr can handle correctly.
Any hint?
You are really close to the answer you were expecting. If you would like to pull the links off of the desired page, then:
lnks <- html_attr(html_nodes(pg, "a"), "href")
will return all of the links from the "a" tags that carry an "href" attribute. Notice the command is html_nodes, not html_node: there are multiple "a" tags, hence the plural.
If you are looking for the information from the table in the body of the page, then try this:
html_table(pg, fill=TRUE)
#or this
html_nodes(pg,"tr")
The second line returns the 9 rows of the table, which one can then parse to obtain the row names ("th") and/or row values ("td"), as in the sketch below.
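For example, a minimal sketch of that parsing (assuming each row pairs one th label with one td value; inspect the page first to confirm):
rows <- html_nodes(pg, "tr")
# pair each row's header cell with its value cell
info <- data.frame(field = html_text(html_nodes(rows, "th")),
                   value = html_text(html_nodes(rows, "td")))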
Hope this helps.
