I can't get the spotifyr package to return only the tracks that have a specific title when using the search_spotify function.
I have read the Spotify developer page and the spotifyr package readme.
Reproducible example below:
library(spotifyr)
Sys.setenv(SPOTIFY_CLIENT_ID = 'myID')
Sys.setenv(SPOTIFY_CLIENT_SECRET = 'myCLIENTSECRET')
access_token <- get_spotify_access_token()
searchresults <- search_spotify('Zooropa','track')
I would expect the results of this to be the Spotify tracks with the title “Zooropa”. This should be 7 results due to the presence of karaoke and tribute songs which include Zooropa in the track title. Instead the results are 16 observations, including every one of the 10 tracks on the Zooropa album, even those not called Zooropa (e.g. Lemon and Babyface).
Since I am searching in the ‘track’ field I don’t understand why I get the extra 9 results.
The Spotify API returns the track IDs for all of the search results. Zooropa is an album as well as a song, so if you search "Zooropa" inside Spotify there will be more than one result.
Having researched this, I have discovered that the spotifyr package does not allow the same search options as the underlying Spotify API. When using the search endpoint in the Spotify API, the "type" parameter allows the search to be restricted to tracks (see here). This type parameter cannot currently be used in spotifyr.
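For reference, this is how the type parameter is exposed when the search endpoint is called directly with httr. A minimal sketch, assuming the token string returned by get_spotify_access_token() is accepted as a Bearer Authorization header (endpoint and parameter names taken from the Spotify Web API docs):
library(httr)
library(jsonlite)
# direct call to the Spotify Web API search endpoint, restricted to type = "track"
resp <- GET("https://api.spotify.com/v1/search",
            query = list(q = "Zooropa", type = "track", limit = 50),
            add_headers(Authorization = paste("Bearer", access_token)))
tracks <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))$tracks$items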
However, I have created a workaround which might be useful to others, and the code for it is below. stringdist is used because, after I have extracted the data using spotifyr, I need to identify which track names are closest to "Zooropa". I use the jw (Jaro-Winkler) method, but others could be used.
library(spotifyr)
library(stringdist)
Sys.setenv(SPOTIFY_CLIENT_ID = 'yourID')
Sys.setenv(SPOTIFY_CLIENT_SECRET = 'yourclientsecret')
access_token <- get_spotify_access_token()
searchresults <- search_spotify('Zooropa','track')
# each element of searchresults$artists is a data frame; column 3 holds the artist name(s)
artists <- searchresults$artists
artists2 <- lapply(artists, function(x) x[, 3])
# collapse multi-artist tracks into a single comma-separated string
artists3 <- lapply(artists2, function(x) paste(x, collapse = ', '))
searchresults$realartist <- as.data.frame(unlist(artists3))
usefuloutput <- cbind(searchresults$id, searchresults$name, searchresults$realartist)
colnames(usefuloutput) <- c("Spotifyid", "Spotifyname", "Spotifyartist")
# Jaro-Winkler distance between each returned track name and "Zooropa"
usefuloutput$titlecomparison <- stringdist(usefuloutput[, 2], 'Zooropa', method = "jw")
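From there, exact and near matches to the target title can be pulled out by filtering on the distance (0 means an exact match; the 0.15 cutoff below is arbitrary and will need tuning for things like karaoke or tribute versions):
exactmatches <- usefuloutput[usefuloutput$titlecomparison == 0, ]
closematches <- usefuloutput[usefuloutput$titlecomparison < 0.15, ]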
I am trying to connect to the Qualtrics API using the httr package in RStudio Cloud to download mailing lists. After reviewing the API documentation, I was unable to download the data, getting the following error after running the code:
"{"meta":{"httpStatus":"400 - Bad Request","error":{"errorMessage":"Expected authorization in headers, but none provided.","errorCode":"ATP_2"},"requestId":"8fz33cca-f9ii-4bca-9288-5tc69acaea13"}}"
This does not make any sense to me, since I am using an "inherit auth from parent" token. Here is the code:
install.packages("httr")
library(httr)
directoryId<-"POOL_XXXXX"
mailingListId <- "CG_XXXXXX"
apiToken<-"XXXX"
url<- paste("https://iad1.qualtrics.com/API/v3/directories/",directoryId,
"/mailinglists/",mailingListId,"/optedOutContacts", sep = "")
response <- VERB("GET",url, add_headers('X_API-TOKEN' = apiToken),
content_type("application/octet-stream"))
content(response, "text")
Any help will be appreciated.
Thanks in advance.
Your call to httr::VERB breaks the API token and the content type into two arguments to the function, but they should be passed together in a vector to a single "config" argument. Also, content_type isn't a function here; it's just the name of an element in that header vector. This should work:
response <- VERB("GET", url, add_headers(c(
'X_API-TOKEN' = apiToken,
'content_type' = "application/octet-stream")))
Note that mailing lists will be returned by Qualtrics as lists that will include both a "meta" element and a "result" element, both of which will themselves be lists. If the list is long, only the first 100 contacts on the list will be returned; there will be an element response$result$nextpage that will provide the URL required to access the next 100 results. The qualtRics::fetch_mailinglist() function does not work with XM Directory contact lists (which is probably why you got a 500 error when using it), but the code for unpacking the list and looping over each "nextpage" element might be helpful.
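A rough sketch of such a pagination loop, assuming the URL built in the question, the X-API-TOKEN header name from the Qualtrics docs, and that the parsed JSON exposes the contacts under result$elements and the next page under result$nextPage (check the exact field names in your own response):
library(httr)
library(jsonlite)
all_contacts <- list()
next_url <- url  # the optedOutContacts URL built in the question
while (!is.null(next_url)) {
  resp <- GET(next_url, add_headers("X-API-TOKEN" = apiToken))
  parsed <- fromJSON(content(resp, as = "text", encoding = "UTF-8"),
                     simplifyVector = FALSE)
  all_contacts <- c(all_contacts, parsed$result$elements)  # accumulate this page
  next_url <- parsed$result$nextPage                       # NULL when there are no more pages
}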
I have a problem scraping filtered followers on Twitter using R.
To be specific, I'd like to scrape, from among A's followers (A is just an example account), the followers who:
1. have more than one tweet,
2. mention certain keywords in their tweets, and
3. retweet certain users' (people's) tweets.
Below are the packages I used. The blank spaces are where my private API keys would go.
install.packages("devtools")
library(devtools)
install_github("mkearney/rtweet")
install.packages("rtweet")
library(rtweet)
api_key <-
api_secret_key <-
access_token <-
access_token_secret <-
mytoken<-create_token(app = "", api_key,api_secret_key, access_token, access_token_secret, set_renv = FALSE)
Here is the code I used to scrape the followers of a given Twitter account
(FoxNews is used only to illustrate the example):
FoxNews.followers<-get_followers(user="FoxNews",n=75000)
FoxNews.followers.data<-lookup_users(users=FoxNews.followers$user_id)
df.FoxNews<-data.frame(FoxNews.followers)
What I need is the list of followers among FoxNews.followers who meet conditions 1, 2 and 3 above.
I think I can get them by adding some code to what I have, but I'm not sure what to add or where to add it.
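Something along these lines is roughly what I imagine (the rtweet columns statuses_count, screen_name, text, is_retweet and retweet_screen_name are assumptions on my part, and the keyword and retweeted account below are only placeholders):
# condition 1: keep followers with more than one tweet
active <- FoxNews.followers.data[FoxNews.followers.data$statuses_count > 1, ]
keyword <- "election"      # placeholder keyword for condition 2
target_user <- "FoxNews"   # placeholder retweeted account for condition 3
keep <- sapply(active$screen_name, function(u) {
  # conditions 2 and 3: inspect each follower's recent tweets
  tl <- tryCatch(get_timeline(u, n = 200), error = function(e) NULL)
  if (is.null(tl) || nrow(tl) == 0) return(FALSE)
  mentions_keyword <- any(grepl(keyword, tl$text, ignore.case = TRUE))
  retweets_target  <- any(tl$is_retweet & tl$retweet_screen_name == target_user, na.rm = TRUE)
  mentions_keyword && retweets_target
})
filtered.followers <- active[keep, ]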
I'd so appreciate any advice.
Thanks.
The answer to the question located here does not seem to work anymore: Obtaining twitter screen names from a twitter list
Does anyone know if the twitteR package has been changed since that answer was written in 2015? If so, is there a way to download the members of a public list in the current version?
Here's the previous answer's code, updated to use a current list. It requires a Twitter API authorization. It now returns a list of length 0 when it should return the 20 Premier League club names.
library(rjson)
library(httr)
library(twitteR)
twlist <- "premier-league-clubs"
twowner <- "TwitterUK"
api.url <- paste0("https://api.twitter.com/1.1/lists/members.json?slug=",
twlist, "&owner_screen_name=", twowner, "&count=5000")
response <- POST(api.url, config(token=twitteR:::get_oauth_sig()))
response.list <- fromJSON(content(response, as = "text", encoding = "UTF-8"))
users.names <- sapply(response.list$users, function(i) i$name)
users.screennames <- sapply(response.list$users, function(i) i$screen_name)
head(users.names)
The author of the package mentions in his GitHub account that the twitteR package is deprecated in favor of rtweet. You probably have to take a look at the documentation of the rtweet package.
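For example, a minimal sketch with rtweet (this assumes rtweet's lists_members() function with its slug/owner_user arguments, and that an rtweet token has already been set up):
library(rtweet)
members <- lists_members(slug = "premier-league-clubs", owner_user = "TwitterUK")
head(members$name)         # club names
head(members$screen_name)  # club handles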
Swapping from POST to GET when making the request seems to work for lists I am retrieving.
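That is, keeping everything else in the question's code the same and only changing the verb:
response <- GET(api.url, config(token = twitteR:::get_oauth_sig()))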
I'm stuck on this one after much searching....
I started with scraping the contents of a table from:
http://www.skatepress.com/skates-top-10000/artworks/
Which is easy:
library(XML)  # readHTMLTable() comes from the XML package
data <- data.frame()
for (i in 1:100) {
  print(paste("page", i, "of 100"))
  url <- paste("http://www.skatepress.com/skates-top-10000/artworks/", i, "/", sep = "")
  temp <- readHTMLTable(url, which = 1, stringsAsFactors = FALSE, encoding = "UTF-8")
  data <- rbind(data, temp)
} # end of scraping loop
However, I need to additionally scrape the detail that is contained in a pop-up box when you click on each name (and on the artwork title) in the list on the site.
I can't for the life of me figure out how to pass the breadcrumb (or artist-id or painting-id) through in order to make this happen. Since straight up using rvest to access the contents of the nodes doesn't work, I've tried the following:
I tried passing the painting id through in the url like this:
url <- ("http://www.skatepress.com/skates-top-10000/artworks/?painting_id=576")
site <- html(url)
But it still gives an empty result when scraping:
node1 <- "bread-crumb > ul > li.activebc"
site %>% html_nodes(node1) %>% html_text(trim = TRUE)
character(0)
I'm (clearly) not a scraping expert so any and all assistance would be greatly appreciated! I need a way to capture this additional information for each of the 10,000 items on the list...hence why I'm not interested in doing this manually!
Hoping this is an easy one and I'm just overlooking something simple.
This will be a more efficient base scraper and you can get progress bars for free with the pbapply package:
library(xml2)
library(httr)
library(rvest)
library(dplyr)
library(pbapply)
library(jsonlite)
base_url <- "http://www.skatepress.com/skates-top-10000/artworks/%d/"
n <- 100
bind_rows(pblapply(1:n, function(i) {
mutate(html_table(html_nodes(read_html(sprintf(base_url, i)), "table"))[[1]],
`Sale Date`=as.Date(`Sale Date`, format="%m.%d.%Y"),
`Premium Price USD`=as.numeric(gsub(",", "", `Premium Price USD`)))
})) -> skatepress
I added trivial date & numeric conversions.
I believe your main issue is that the site requires a login to get the additional data. You should give that (i.e. logging in) a shot using httr and grab the wordpress_logged_inXXXXXXX… cookie from that endeavour. I just grabbed it from inspecting the session with Developer Tools in Chrome and that will also work for you (but it's worth the time to learn how to do it via httr).
You'll need to scrape two additional <a … tags from each table row. The one for "artist" looks like:
Pablo Picasso
You can scrape the contents with:
POST("http://www.skatepress.com/wp-content/themes/skatepress/scripts/query_artist.php",
set_cookies(wordpress_logged_in_XXX="userid%XXXXXreallylongvalueXXXXX…"),
encode="form",
body=list(id="pab_pica_1881"),
verbose()) -> artist_response
fromJSON(content(artist_response, as="text"))
(The return value is too large to post here)
The one for "artwork" looks like:
Les femmes d′Alger (Version ′O′)
and you can get that in similar fashion:
POST("http://www.skatepress.com/wp-content/themes/skatepress/scripts/query_artwork.php",
set_cookies(wordpress_logged_in_XXX="userid%XXXXXreallylongvalueXXXXX…"),
encode="form",
body=list(id=576),
verbose()) -> artwork_response
fromJSON(content(artwork_response, as="text"))
That's not huge but I won't clutter the response with it.
NOTE that you can also use rvest's html_session to do the login (which will get you cookies for free) and then continue to use that session in the scraping (vs read_html), which means you don't have to do the httr GET/POST calls yourself.
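A rough sketch of that session-based approach (the wp-login.php URL and the log/pwd field names are assumptions based on a standard WordPress login; verify both against the actual page):
library(rvest)
sess <- html_session("http://www.skatepress.com/wp-login.php")  # assumed login URL
login_form <- set_values(html_form(sess)[[1]],                  # assumes the login form is the first on the page
                         log = "your_username", pwd = "your_password")
sess <- submit_form(sess, login_form)
# the session now carries the wordpress_logged_in_* cookie, so requests made
# through it are authenticated
page <- jump_to(sess, "http://www.skatepress.com/skates-top-10000/artworks/1/")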
You'll have to figure out how you want to incorporate that data into the data frame or associate it with it via various id's in the data frame (or some other strategy).
You can see the site call those two PHP scripts via Developer Tools, which also shows the data passed to them. I'm also genuinely surprised that the site doesn't have any anti-scraping clauses in its ToS.
I would like to retrieve a list of tweets from Twitter for a given hashtag using the RJSONIO package in R. I think I am pretty close to the solution, but I seem to be missing one step.
My code reads as follows (in this example, I use #NBA as a hashtag):
library(httr)
library(RJSONIO)
# 1. Find OAuth settings for twitter:
# https://dev.twitter.com/docs/auth/oauth
oauth_endpoints("twitter")
# Replace key and secret below
myapp <- oauth_app("twitter",
key = "XXXXXXXXXXXXXXX",
secret = "YYYYYYYYYYYYYYYYY"
)
# 3. Get OAuth credentials
twitter_token <- oauth1.0_token(oauth_endpoints("twitter"), myapp)
# 4. Use API
req=GET("https://api.twitter.com/1.1/search/tweets.json?q=%23NBA&src=typd",
config(token = twitter_token))
req <- content(req, as = "text")
response=fromJSON(req)
How can I get the list of tweets from object 'response'?
Eventually, I would like to get something like:
searchTwitter("#NBA", n=5000, lang="en")
Thanks a lot in advance!
The response object should be a list of length two: statuses and metadata. So, for example, to get the text of the first tweet, try:
response$statuses[[1]]$text
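And to collect the text of every returned tweet into a character vector, the same idea can be applied across the whole statuses list:
tweet_texts <- sapply(response$statuses, function(s) s$text)
head(tweet_texts)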
However, there are a couple of R packages designed to make just this kind of thing easier: Try streamR for the streaming API, and twitteR for the REST API. The latter has a searchTwitter function exactly as you describe.