How do I scrape all Spotify playlists ever? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 days ago.
I am trying to analyze all user-curated Spotify playlists and the tracks inside all of them, especially in the hip-hop genre.
I have tried the Search API and the Get Category's Playlists endpoint, but both stop at around 1000 results.
I am trying to work around the limit by issuing different queries, but I still have no idea which queries would give me all the data. I would appreciate any help!
I am expecting a list of all user-curated Spotify playlist IDs.
This is what I have tried with the Get Category's Playlists endpoint, using the Spotipy library in Google Colab:
import pandas as pd
import numpy as np
import spotipy
import spotipy.util as util
from spotipy.oauth2 import SpotifyClientCredentials
import spotipy.oauth2 as oauth2

# Replace Auth details with your Client ID, Secret
spotify_details = {
    'client_id': 'Client ID',
    'client_secret': 'Client Secret',
    'redirect_uri': 'Redirect_uri'}
scope = "user-library-read user-follow-read user-top-read playlist-read-private playlist-read-collaborative playlist-modify-public playlist-modify-private"
sp = spotipy.Spotify(
    auth_manager=spotipy.SpotifyOAuth(
        client_id=spotify_details['client_id'],
        client_secret=spotify_details['client_secret'],
        redirect_uri=spotify_details['redirect_uri'],
        scope=scope, open_browser=False))

# The first request is only used to read the reported total
results = sp.category_playlists(category_id="hiphop", limit=5, country="US", offset=0)
total = results["playlists"]["total"]
df = pd.DataFrame([], columns=['id', 'name', 'external_urls.spotify'])
# Page through the category 50 playlists at a time
for offset in range(0, total, 50):
    results = sp.category_playlists(category_id="hiphop", limit=50, country="US", offset=offset)
    playlists = pd.json_normalize(results['playlists']['items'])
    #print(playlists.keys)
    df = pd.concat([df, playlists])
df
I can only get around 104 playlists when I run:
print(len(df))
>>104
P.S. This number varies around 80-100+ depending on the location of your account.

The main idea is the same as @Nima Akbarzadeh's: page through the results with offset.
I am using axios to call the Spotify API from Node.js.
The code gets the playlists first, then loops over them to get the tracks of each playlist.
It collects the songs from the playlists in Spotify's hiphop category, not every hip-hop song on Spotify.
const axios = require('axios')

const API_KEY = '<your client ID>'
const API_KEY_SECRET = '<your client Secret>'

// Request an app-only access token (client credentials flow)
const getToken = async () => {
    try {
        const resp = await axios.post(
            'https://accounts.spotify.com/api/token',
            '',
            {
                params: {
                    'grant_type': 'client_credentials'
                },
                auth: {
                    username: API_KEY,
                    password: API_KEY_SECRET
                }
            }
        );
        return resp.data.access_token;
    } catch (err) {
        console.error(err)
        throw err
    }
};

// Page through the playlists of one browse category, 20 at a time
const getCategories = async (category_id, token) => {
    try {
        let offset = 0
        let next = 1
        const playlists = [];
        while (next != null) {
            const resp = await axios.get(
                `https://api.spotify.com/v1/browse/categories/${category_id}/playlists?country=US&offset=${offset}&limit=20`,
                {
                    headers: {
                        'Accept-Encoding': 'application/json',
                        'Authorization': `Bearer ${token}`,
                    }
                }
            );
            for (const item of resp.data.playlists.items) {
                if (item?.name != null) {
                    playlists.push({
                        name: item.name,
                        external_urls: item.external_urls.spotify,
                        type: item.type,
                        id: item.id
                    })
                }
            }
            offset = offset + 20
            // next is null on the last page, which ends the loop
            next = resp.data.playlists.next
        }
        return playlists
    } catch (err) {
        console.error(err)
        throw err
    }
}

// Fetch each playlist and collect the tracks included in its response
const getTracks = async (playlists, token) => {
    try {
        const tracks = [];
        for (const playlist of playlists) {
            const resp = await axios.get(
                `https://api.spotify.com/v1/playlists/${playlist.id}`,
                {
                    headers: {
                        'Accept-Encoding': 'application/json',
                        'Authorization': `Bearer ${token}`,
                    }
                }
            );
            for (const item of resp.data.tracks.items) {
                if (item.track?.name != null) {
                    tracks.push({
                        name: item.track.name,
                        external_urls: item.track.external_urls.spotify
                    })
                }
            }
        }
        return tracks
    } catch (err) {
        console.error(err)
        throw err
    }
};

getToken()
    .then(token => getCategories('hiphop', token)
        .then(playlists => getTracks(playlists, token)))
    .then(tracks => {
        for (const track of tracks) {
            console.log(track)
        }
    })
    .catch(error => {
        console.log(error.message);
    });
I got 6435 songs
$ node get-data.js
[
{
name: 'RapCaviar',
external_urls: 'https://open.spotify.com/playlist/37i9dQZF1DX0XUsuxWHRQd',
type: 'playlist',
id: '37i9dQZF1DX0XUsuxWHRQd'
},
{
name: "Feelin' Myself",
external_urls: 'https://open.spotify.com/playlist/37i9dQZF1DX6GwdWRQMQpq',
type: 'playlist',
id: '37i9dQZF1DX6GwdWRQMQpq'
},
{
name: 'Most Necessary',
external_urls: 'https://open.spotify.com/playlist/37i9dQZF1DX2RxBh64BHjQ',
type: 'playlist',
id: '37i9dQZF1DX2RxBh64BHjQ'
},
{
name: 'Gold School',
external_urls: 'https://open.spotify.com/playlist/37i9dQZF1DWVA1Gq4XHa6U',
type: 'playlist',
id: '37i9dQZF1DWVA1Gq4XHa6U'
},
{
name: 'Locked In',
external_urls: 'https://open.spotify.com/playlist/37i9dQZF1DWTl4y3vgJOXW',
type: 'playlist',
id: '37i9dQZF1DWTl4y3vgJOXW'
},
{
name: 'Taste',
external_urls: 'https://open.spotify.com/playlist/37i9dQZF1DWSUur0QPPsOn',
type: 'playlist',
id: '37i9dQZF1DWSUur0QPPsOn'
},
{
name: 'Get Turnt',
external_urls: 'https://open.spotify.com/playlist/37i9dQZF1DWY4xHQp97fN6',
type: 'playlist',
id: '37i9dQZF1DWY4xHQp97fN6'
},
...
{
name: 'BILLS PAID (feat. Latto & City Girls)',
external_urls: 'https://open.spotify.com/track/0JiLQRLOeWQdPC9rVpOqqo'
},
{
name: 'Persuasive (with SZA)',
external_urls: 'https://open.spotify.com/track/67v2UHujFruxWrDmjPYxD6'
},
{
name: 'Shirt',
external_urls: 'https://open.spotify.com/track/34ZAzO78a5DAVNrYIGWcPm'
},
{
name: 'Back 2 the Streets',
external_urls: 'https://open.spotify.com/track/3Z9aukqdW2HuzFF1x9lKUm'
},
{
name: 'FTCU (feat. GloRilla & Gangsta Boo)',
external_urls: 'https://open.spotify.com/track/4lxTmHPgoRWwM9QisWobJL'
},
{
name: 'My Way',
external_urls: 'https://open.spotify.com/track/5BcIBbBdkjSYnf5jNlLG7j'
},
{
name: 'Donk',
external_urls: 'https://open.spotify.com/track/58lmOL5ql1YIXrpRpoYi3i'
},
... 6335 more items
]
node get-data.js > result.json
Update with Python version
import spotipy
from spotipy.oauth2 import SpotifyOAuth
import json
import re

SCOPE = ['user-library-read',
         'user-follow-read',
         'user-top-read',
         'playlist-read-private',
         'playlist-read-collaborative',
         'playlist-modify-public',
         'playlist-modify-private']
USER_ID = '<your user id>'
REDIRECT_URI = '<your redirect uri>'
CLIENT_ID = '<your client id>'
CLIENT_SECRET = '<your client secret>'
auth_manager = SpotifyOAuth(
    scope=SCOPE,
    username=USER_ID,
    redirect_uri=REDIRECT_URI,
    client_id=CLIENT_ID,
    client_secret=CLIENT_SECRET)

def get_categories():
    """Page through the hiphop category and return its playlists."""
    try:
        sp = spotipy.Spotify(auth_manager=auth_manager)
        query_limit = 50
        categories = []
        new_offset = 0
        while True:
            results = sp.category_playlists(category_id='hiphop', limit=query_limit, country='US', offset=new_offset)
            for item in results['playlists']['items']:
                if (item is not None and item['name'] is not None):
                    # ['https:', '', 'api.spotify.com', 'v1', 'playlists', '37i9dQZF1DX0XUsuxWHRQd', 'tracks']
                    tokens = re.split(r"[\/]", item['tracks']['href'])
                    categories.append({
                        'id': item['id'],
                        'name': item['name'],
                        'url': item['external_urls']['spotify'],
                        'tracks': item['tracks']['href'],
                        'playlist_id': tokens[5],
                        'type': item['type']
                    })
            new_offset = new_offset + query_limit
            # 'next' is None on the last page
            next_page = results['playlists']['next']
            if next_page is None:
                break
        return categories
    except Exception as e:
        print('Failed to call get_categories: ' + str(e))

def get_songs(categories):
    """Fetch each playlist and collect the tracks included in its response."""
    try:
        sp = spotipy.Spotify(auth_manager=auth_manager)
        songs = []
        for category in categories:
            if category is None:
                break
            playlist_id = category['playlist_id']
            results = sp.playlist(playlist_id=playlist_id)
            for item in results['tracks']['items']:
                if (item is not None and item['track'] is not None and item['track']['id'] is not None and item['track']['name'] is not None and item['track']['external_urls']['spotify'] is not None):
                    songs.append({
                        'id': item['track']['id'],
                        'name': item['track']['name'],
                        'url': item['track']['external_urls']['spotify']
                    })
                else:
                    # Stop reading this playlist at the first incomplete entry
                    break
        return songs
    except Exception as e:
        print('Failed to call get_songs: ' + str(e))

categories = get_categories()
songs = get_songs(categories)
print(json.dumps(songs))
# print(len(songs)) -> 6021
Result by
$ python get-songs.py > all-songs.json

Currently, Spotify will not let you retrieve more than about 1,000 items; even its own application shows at most 1,000 tracks (based on this answer).
If the endpoint accepts an offset parameter, you can try setting it to 1000 so that it skips the first 1,000 items and returns the next chunk.
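The offset idea can be sketched roughly as follows. This is a minimal illustration, not Spotify-specific code: fetch_all, make_fake_api, and the max_offset cap are hypothetical names introduced here, and the fake API only imitates a paged response with items and next fields like the ones used in the answers above.

```python
from typing import Callable, Dict, List, Optional


def fetch_all(fetch_page: Callable[[int, int], Dict], page_size: int = 50,
              max_offset: Optional[int] = None) -> List[dict]:
    """Collect items page by page until the API reports no next page."""
    items: List[dict] = []
    offset = 0
    # max_offset models a server-side cap (assumed here, e.g. 1000)
    while max_offset is None or offset < max_offset:
        page = fetch_page(offset, page_size)
        # Skip null entries, which the answers above also guard against
        items.extend(item for item in page["items"] if item is not None)
        if page.get("next") is None:
            break
        offset += page_size
    return items


def make_fake_api(total: int) -> Callable[[int, int], Dict]:
    """Hypothetical stand-in for a paged endpoint; not a real Spotify call."""
    def fetch_page(offset: int, limit: int) -> Dict:
        ids = range(offset, min(offset + limit, total))
        has_more = offset + limit < total
        return {"items": [{"id": i} for i in ids],
                "next": "more" if has_more else None}
    return fetch_page


playlists = fetch_all(make_fake_api(total=104), page_size=50)
print(len(playlists))
```

With Spotipy, fetch_page could presumably wrap sp.category_playlists(...) and return its 'playlists' object, as the code in the question does.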

Related

How can I change all cached query data for a given user_id in RTK Query?

I have a problem.
I fetch data with two parameters: user_id and movie_channel.
A user can have multiple movie channels, such as 1, 2, or 3.
Suppose I run a query with these params:
user_id: 1, movie_channel: 1
The returned obj:
return {
user: {
user_id: 1,
username: 'assa',
is_follow: false
},
movie_channel: 1,
movies: []
}
This gives me the list of movies from that channel plus the user's information.
If someone now selects movie_channel 2, I fetch again and get the obj with different movies.
In the header, the viewer can follow that user (while currently in movie channel 2).
If they then switch back to movie_channel 1, I get the cached data. But the user is shown as not followed, because the follow happened in channel 2; the cache still holds the old obj.
How can I update all cached data that shares the same user_id, regardless of movie_channel?
useGetProfileData: builder.query<IProfilePageData, { user_id: number; movie_channel?: number; }>({
    query: (data) => ({
        url: '/profile_data',
        method: 'POST',
        body: data
    }),
}),
followUser: builder.mutation<void, { user_id: number; follower_id: number; movie_channel?: number; }>({
    query: (data) => ({
        url: '/follow_user',
        method: 'POST',
        body: data
    }),
    async onQueryStarted({ user_id, follower_id, movie_channel }, { dispatch, queryFulfilled }) {
        // Optimistic update: only patches the cache entry for this exact { user_id, movie_channel }
        const patchResult = dispatch(
            ProfileApi.util.updateQueryData('useGetProfileData', { user_id, movie_channel }, (draft) => {
                return {
                    ...draft,
                    user: {
                        ...draft.user,
                        is_follow: !draft.user.is_follow
                    }
                }
            })
        );
        try {
            await queryFulfilled;
        } catch {
            patchResult.undo();
        }
    }
}),
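The underlying problem is independent of the library: each combination of arguments gets its own cache entry, so a change that belongs to the user has to be applied to every entry with that user_id. The sketch below illustrates only that idea in plain Python. It does not use RTK Query's API, and the cache layout and the set_follow_for_user helper are assumptions made for the illustration.

```python
from typing import Dict, Tuple

# Cache keyed by (user_id, movie_channel), mirroring the query arguments
Cache = Dict[Tuple[int, int], dict]


def set_follow_for_user(cache: Cache, user_id: int, is_follow: bool) -> int:
    """Patch every cached entry for user_id, whatever the channel; return the count."""
    patched = 0
    for (cached_user_id, _channel), entry in cache.items():
        if cached_user_id == user_id:
            entry["user"]["is_follow"] = is_follow
            patched += 1
    return patched


cache: Cache = {
    (1, 1): {"user": {"user_id": 1, "is_follow": False}, "movie_channel": 1, "movies": []},
    (1, 2): {"user": {"user_id": 1, "is_follow": False}, "movie_channel": 2, "movies": []},
    (7, 1): {"user": {"user_id": 7, "is_follow": False}, "movie_channel": 1, "movies": []},
}
count = set_follow_for_user(cache, user_id=1, is_follow=True)
print(count, cache[(1, 1)]["user"]["is_follow"])
```

In RTK Query terms, this would probably mean applying the same patch once per cached argument combination for that user, rather than only for the channel that is currently selected.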

With Strapi 4, how can I get each user's music events?

I'm using Strapi 4 with Next.js.
In the app, Strapi holds music events for each user, and each user should be able to add and retrieve their own music events.
I am having trouble retrieving each user's music events from Strapi 4.
I have a custom route and a custom controller.
The custom route is in a file called custom-event.js and works OK. It is as follows:
module.exports = {
routes: [
{
method: 'GET',
path: '/events/me',
handler: 'custom-controller.me',
config: {
me: {
auth: true,
policies: [],
middlewares: [],
}
}
},
],
}
The controller is a file called custom-controller.js and is as follows:
module.exports = createCoreController(modelUid, ({strapi }) => ({
async me(ctx) {
try {
const user = ctx.state.user;
if (!user) {
return ctx.badRequest(null, [
{messages: [{ id: 'No authorization header was found'}]}
])
}
// The line below works ok
console.log('user', user);
// The problem seems to be the line below
const data = await strapi.services.events.find({ user: user.id})
// This line does not show at all
console.log('data', data);
if (!data) {
return ctx.notFound()
}
return sanitizeEntity(data, { model: strapi.models.events })
} catch(err) {
ctx.body = err
}
}
}))
Note there are two console.log calls. The first one works and outputs the user info.
The second one, which should output the data, does not show at all. The result I get back in Insomnia is a 200 status and an empty object {}.
The following line in custom-controller.js seems to be where the problem lies. It works in Strapi 3 but does not seem to work in Strapi 4:
const data = await strapi.services.events.find({ user: user.id})
After struggling for a long time, days in fact, I eventually got it working. Below is the code I came up with. I found I needed two queries to the database, because I could not get the events to populate the images with one query. So I first got the event IDs and then used them in an events query to get the events and images.
Here's the code:
const utils = require('@strapi/utils')
const { sanitize } = utils
const { createCoreController } = require("@strapi/strapi").factories;
const modelUid = "api::event.event"
module.exports = createCoreController(modelUid, ({strapi }) => ({
async me(ctx) {
try {
const user = ctx.state.user;
if (!user) {
return ctx.badRequest(null, [
{messages: [{ id: 'No authorization header was found'}]}
])
}
// Get event ids
const events = await strapi
.db
.query('plugin::users-permissions.user')
.findMany({
where: {
id: user.id
},
populate: {
events: { select: 'id'}
}
})
if (!events) {
return ctx.notFound()
}
// Get the events into a format for the query
const newEvents = events[0].events.map(evt => ({ id: { $eq: evt.id}}))
// use the newly formatted newEvents in a query to get the users
// events and images
const eventsAndMedia = await strapi.db.query(modelUid).findMany({
where: {
$or: newEvents
},
populate: {image: true}
})
return sanitize.contentAPI.output(eventsAndMedia,
strapi.getModel(modelUid))
} catch(err) {
return ctx.internalServerError(err.message)
}
}
}))

How to fix 'RealmObject cannot be called as a function' realm-js error?

In a React Native project using Realm JS, I've just created a clone of the app, integrated all libs, and copied over all src directories.
The app builds, installs, and runs on Android.
When I go through the authentication flow (which uses Realm to store auth data), I ultimately get an error:
[ Error: RealmObject cannot be called as a function ]
The login function:
async function login(username, password) {
try {
const result = await Api.login({
username: username,
pass: password,
});
const userAuthResult = await Db.updateAuth(result);
setUserAuth(userAuthResult);
} catch (err) {
console.log('[ ERROR ]:', err)
if (!err.message || err.message.includes('Network Error')) {
throw new Error('Connection error');
}
throw new Error('Wrong username or password');
}
}
I've narrowed the issue down to Db.updateAuth(...).
updateAuth:
export const updateAuth = (params) => {
console.log(' [ HERE 1 ]')
const auth = {
id: params.id,
token: params.token,
refreshToken: params.refresh_token,
tokenExpiresAt: Math.floor(Date.now() / 1000) + 600, //params.expires_at,
federatedToken: params.federatedToken ?? '',
federatedTokenExpiresAt: params.federatedTokenExpiresAt ?? 0,
username: params.username,
name: params.name,
roleName: params.role_name,
roleId: params.role_id,
lastLogin: Math.floor(Date.now() / 1000),
};
console.log(' [ HERE 2 ]')
realm.write(() => {
console.log(' [ HERE 3 ]')
realm.create('Authorizations', auth, 'modified'); // PROBLEM
});
return auth;
};
Inspecting the schema, I found there are no federatedToken properties, yet the auth update object has two. I'm not sure why this wouldn't throw an error in the original, non-cloned app.
The Authorizations schema:
AuthorizationsSchema.schema = {
name: 'Authorizations',
primaryKey: 'id',
properties: {
id: 'int',
token: 'string',
refreshToken: 'string',
tokenExpiresAt: 'int',
username: 'string',
name: 'string',
roleName: 'string',
roleId: 'int',
lastLogin: 'int',
},
};
Realm.js (class declaration) -> https://pastebin.pl/view/c903b2e2
from realm instantiation:
let realm = new Realm({
schema: [
schema.AccountSchema,
schema.AuthorizationsSchema,
schema.AvailableServiceSchema,
schema.FederatedTokensSchema,
schema.NoteSchema,
schema.PhotoSchema,
schema.PhotoUploadSchema,
schema.PrintQueueSchema,
schema.ProductSchema,
schema.ReportSchema,
schema.ServicesSchema,
schema.UploadQueueJobSchema,
schema.InvoicesSchema,
schema.TestSchema
],
schemaVersion: 60,
deleteRealmIfMigrationNeeded: true,
//path: './myrealm/data',
});
This logs the 1, 2, and 3 statements. The issue seems to come from the 'PROBLEM' line. I'm not sure what exactly this error means, as there doesn't seem to be anything in Realm's repo about it, and in the app this was cloned from there was no issue with this line. I can also see other lines throwing similar errors later in the user flows.
Does anyone know what this is about, or where I can learn more?
React-native: v64.2
realm-js: 10.6.0 (app cloned from was v10.2.0)
MacOS: 11.3 (M1 architecture)
To create an object, you first have to call realm.write and do the create inside its callback, like this:
const storeInDataBase = (res,selectedfile) => {
try{
realm.write(() => {
var ID =
realm.objects(DocumentConverstionHistory).sorted('HistoryID', true).length > 0
? realm.objects(DocumentConverstionHistory).sorted('HistoryID', true)[0]
.HistoryID + 1
: 1;
realm.create(DocumentConverstionHistory, {
HistoryID: ID,
Name:`${selectedfile.displayname}.pdf`,
Uri:`file://${res.path()}`,
Date: new Date() // the schema declares Date as 'date?', so pass a Date object rather than a string
});
})
}catch(err){
alert(err.message)
}
}
Here is the schema file
export const DATABASENAME = 'documentconverter.realm';
export const DocumentConverstionHistory = "DocumentConverstionHistory"
export const DocumentConverstionHistorySchema = {
name: "DocumentConverstionHistory",
primaryKey: 'HistoryID',
properties: {
HistoryID: {type: 'int'},
Name: {type: 'string'},
Uri: {type: 'string?'},
Type: {type: 'string?'},
Size: {type: 'string?'},
Date: {type: 'date?'}
}
};

How can I upload an image to firebase storage and add it to the database?

I'm new to Vuejs. I want to have a form using which you can add products. The product image goes to firebase storage but how do I associate that image with the exact product in the database?
I've already set up my form, and created two methods. saveProduct() to save the products to the database and onFilePicked() to listen for changes in the input field and target the image and upload that to storage.
import { fb, db } from '../firebaseinit'
export default {
name: 'addProduct',
data () {
return {
product_id: null,
name: null,
desc: null,
category: null,
brand: null,
image: null,
}
},
methods: {
saveProduct () {
db.collection('products').add({
product_id: this.product_id,
name: this.name,
desc: this.desc,
category: this.category,
brand: this.brand
})
.then(docRef => {
this.$router.push('/fsbo/produkten')
})
},
onFilePicked (event) {
let imageFile = event.target.files[0]
let storageRef = fb.storage().ref('products/' + imageFile.name)
storageRef.put(imageFile)
}
}
}
What about this: you can use the filename. Your images are going to be served as somefireurl.com/{your_file_name}, so each document in your products collection can have an image prop holding imageFile.name.
methods: {
saveProduct (image = null) {
let productRef = db.collection('products').doc(this.product_id)
const payload = {
product_id: this.product_id,
name: this.name,
desc: this.desc,
category: this.category,
brand: this.brand
}
if (image) payload['image'] = image
return productRef
.set(payload, {merge: true})
.then(docRef => {
this.$router.push('/fsbo/produkten')
})
},
onFilePicked (event) {
let imageFile = event.target.files[0]
let storageRef = fb.storage().ref('products/' + imageFile.name)
storageRef.put(imageFile)
return this.saveProduct(imageFile.name)
}
}
That should be enough to get you started. Maybe you want to try a different combination, or maybe you don't want to call saveProduct the way I set it up; it's up to your use case, but the idea is the same. Hope this helps.
I fixed it myself. Here's my solution. I don't know if it's technically correct but it works for my use case.
methods: {
saveProduct () {
let imageFile
let imageFileName
let ext
let imageUrl
let key
let task
db.collection('products').add({
product_id: this.product_id,
name: this.name,
desc: this.desc,
category: this.category,
brand: this.brand
})
.then(docRef => {
key = docRef.id
this.$router.push('/fsbo/produkten')
return key
})
.then(key => {
// Nothing to upload if no file was picked
if (this.image === null) return
imageFile = this.image
imageFileName = imageFile.name
// slice() from the last dot keeps the dot, e.g. '.png'
ext = imageFileName.slice(imageFileName.lastIndexOf('.'))
let storageRef = fb.storage().ref('products/' + key + ext)
let uploadTask = storageRef.put(imageFile)
uploadTask.on('state_changed', (snapshot) => {}, (error) => {
// Handle unsuccessful uploads
}, () => {
// Once the upload finishes, store the download URL on the product document
uploadTask.snapshot.ref.getDownloadURL().then( (downloadURL) => {
db.collection('products').doc(key).update({ imageUrl: downloadURL})
});
});
})
},
onFilePicked (event) {
return this.image = event.target.files[0]
}
}
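One detail worth checking when naming the stored file after the document key: slicing a file name from its last dot keeps the dot, so the path should not add a second one. A minimal sketch of that path logic, written in Python only for illustration (storage_path is a hypothetical helper, not part of the Firebase SDK):

```python
def storage_path(key: str, file_name: str, folder: str = "products") -> str:
    """Build '<folder>/<key><ext>', where ext already includes its leading dot."""
    dot = file_name.rfind(".")
    # No dot, or a dot only at position 0, is treated as "no extension"
    ext = file_name[dot:] if dot > 0 else ""
    return f"{folder}/{key}{ext}"


print(storage_path("abc123", "photo.png"))
```

Using the document key as the file name ties each image to exactly one product, which is the association the question asks about.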

How do I get JSON data from Firebase and use it in Angular 6? Firebase returns the data wrapped in a value tag

I am using firebase functions to get data from db, this is how I am doing it,
exports.getTopPlayers = (request, response) => {
    SavePlayers(function (data, err) {
        if (err) console.log(err);
        response.header('Access-Control-Allow-Origin', '*');
        response.header(
            'Access-Control-Allow-Headers',
            'Origin, X-Requested-With, Content-Type, Accept'
        );
        const dbRef = admin.database().ref().child('topplayers/-LISMykRqLrVcc7xrK60');
        // once() reads the value a single time, so the response is sent only once
        dbRef.once('value', snap => {
            var dbPlayer = snap.val();
            response.send(dbPlayer);
        });
    });
};
Then I am using it in my website built in angular 6
getTopPlayers() {
return this.http.get(this.topPlayerURL);
}
It returns the data in the format below:
{value: "[{"name":"WHYALWAYSME","tag":"9P08LYLL","rank":1,"…na":"League 8","arenaID":20,"trophyLimit":6100}}]"}
I want to get rid of this value tag. How can I? When I try to loop over this using ngFor (*ngFor="let tp of topPlayer$"), it returns an error: Cannot loop [object,object].
I want the data in the format below:
[
{
name: "Leslie",
tag: "RPP89PVY",
rank: 1,
previousRank: 3,
expLevel: 13,
trophies: 6361,
donationsDelta: null,
clan: {
tag: "9CU2PQ2J",
name: "不正经的养老院",
badge: {
name: "Cherry_Blossom_04",
category: "01_Symbol",
id: 16000131,
image: "https://royaleapi.github.io/cr-api-assets/badges/Cherry_Blossom_04.png"
}
},
arena: {
name: "Grand Champion",
arena: "League 8",
arenaID: 20,
trophyLimit: 6100
}
},
I found the solution.
In the component's init method in Angular, I call the service, read the response, and parse the string stored under value into an array:
topPlayer$: string[];
ngOnInit() {
this.topPlayerSrvice.getTopPlayers()
.subscribe(response => {
let topPlayer: string[];
topPlayer = response.json();
this.topPlayer$ = JSON.parse(topPlayer['value']);
});
}
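The reason a second parse is needed is that the payload is JSON inside JSON: the outer object has a value field whose content is itself a JSON-encoded string. A small sketch of the same two-step decode, shown in Python purely for illustration (unwrap_value and the sample payload are made up for this example):

```python
import json


def unwrap_value(body: str) -> list:
    """Decode the outer object, then decode the JSON string stored under 'value'."""
    outer = json.loads(body)
    return json.loads(outer["value"])


# Sample payload shaped like the response in the question: a list serialized into a string
sample = json.dumps({"value": json.dumps([{"name": "Leslie", "rank": 1}])})
players = unwrap_value(sample)
print(players[0]["name"])
```

After the second decode the result is a real list, which is what a loop such as ngFor needs.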
