How to set the AirBNB API "extensions" parameter? - web-scraping

I am trying to scrape AirBNB by plain HTTP requests and noticed something.
Let's say we use this search string: "New York, New York, United States".
The simplest working request (striped off from unnecessary headers and fields) I can use to get the desired results is this:
GET /api/v3/ExploreSections?operationName=ExploreSections&locale=en&currency=USD&variables=%7B%22isInitialLoad%22%3Atrue%2C%22hasLoggedIn%22%3Afalse%2C%22cdnCacheSafe%22%3Afalse%2C%22source%22%3A%22EXPLORE%22%2C%22exploreRequest%22%3A%7B%22metadataOnly%22%3Afalse%2C%22version%22%3A%221.8.3%22%2C%22itemsPerGrid%22%3A20%2C%22placeId%22%3A%22ChIJOwg_06VPwokRYv534QaPC8g%22%2C%22query%22%3A%22New%20York%2C%20New%20York%2C%20United%20States%22%2C%22cdnCacheSafe%22%3Afalse%2C%22screenSize%22%3A%22large%22%2C%22isInitialLoad%22%3Atrue%2C%22hasLoggedIn%22%3Afalse%7D%2C%22removeDuplicatedParams%22%3Atrue%7D&extensions=%7B%22persistedQuery%22%3A%7B%22version%22%3A1%2C%22sha256Hash%22%3A%2282cc0732fe2a6993a26859942d1342b6e42830704b1005aeb2d25f78732275e7%22%7D%7D HTTP/2
Host: www.airbnb.com
X-Airbnb-Api-Key: d306zoyjsyarp7ifhu67rjxn52tv0t20
Accept-Encoding: gzip, deflate
At this point, that API key is pretty much public, so not a concern.
The readable content of the "variables" parameter is this:
{
"isInitialLoad": true,
"hasLoggedIn": false,
"cdnCacheSafe": false,
"source": "EXPLORE",
"exploreRequest": {
"metadataOnly": false,
"version": "1.8.3",
"itemsPerGrid": 20,
"placeId": "ChIJOwg_06VPwokRYv534QaPC8g",
"query": "New York, New York, United States",
"cdnCacheSafe": false,
"screenSize": "large",
"isInitialLoad": true,
"hasLoggedIn": false
},
"removeDuplicatedParams": true
}
The readable content of the "extensions" parameter is this:
{
"persistedQuery": {
"version": 1,
"sha256Hash": "82cc0732fe2a6993a26859942d1342b6e42830704b1005aeb2d25f78732275e7"
}
}
I am trying to figure out where that hash comes from.
It seems it's calculated from a GraphQL query but I don't know anything else and there is no documentation about it.
Any help?

I had the same issue (wanted to get the prices) and after investigating in the HAR files that you can get with Chrome, I found out that you get this value from a Javascript file called PdpPlatformRoute.xxx.js
The steps to get this hash are simply to load the file PdpPlatformRoute.xxx.js, then to parse the file to get an "operationId".
If this helps, this is how I did this.
// contentPage is the HTML content of the listing page (e.g. https://www.airbnb.com/rooms/1234567)
function getPdpPlatformRouteUrl(contentPage) {
return 'https://a0.muscache.com/airbnb/static/packages/web/en/frontend/gp-stays-pdp-route/routes/' + `${contentPage}`.match(/(PdpPlatformRoute\.\w+\.\js)/)?.[1];
}
// textContent is the JS content that you get when you fetch the previously found URL
function getSha256(textContent) {
return `${textContent}`.match(/name:'StaysPdpSections',type:'query',operationId:'(.*)'/)?.[1];
}

Related

Linked In API article thumbnails

I'm trying to migrate existing code that fetches organization posts from ugcPosts API to new versioned call of the Posts API(version 202210) and I'm facing issues with getting thumbnails for articles. Response that I get from Posts API doesn't contain thumbnail URL but instead it contains thumbnail URN(old ugcPost API returned thumbnail URL as a part of the post). Here is an example of an article post that I get from API
{
"isReshareDisabledByAuthor": false,
"createdAt": 1666603988797,
"lifecycleState": "PUBLISHED",
"lastModifiedAt": 1666603988797,
"visibility": "PUBLIC",
"publishedAt": 1666603988797,
"author": "urn:li:organization:1111",
"id": "urn:li:share:2222",
"distribution": {
"feedDistribution": "MAIN_FEED",
"thirdPartyDistributionChannels": []
},
"content": {
"article": {
"description": "some description",
"thumbnail": "urn:li:image:3333",
"source": "https://example.com",
"title": "some title"
}
},
"commentary": "some comment",
"lifecycleStateInfo": {
"isEditedByAuthor": false
}
}
I tried to use Images API to fetch thumbnail URL using a call
GET https://api.linkedin.com/rest/images/urn:li:image:3333
Unfortunately Linked In API responds with code 400 and message Invalid asset owner urn type provided: urn:li:article:4444
I don't get why it happens. Token that I'm using has enough permissions to fetch organization posts(token's scope contains permissions w_member_social, r_liteprofile, r_1st_connections_size, w_organization_social, r_member_social, r_organization_social, rw_organization_admin). Article id that presents in error message isn't anyhow connected to post id. It's also not clear why that asset is referenced as urn:li:article while the post itself is described as urn:li:share. To me it looks like Linked In API bug or am I doing something wrong?
Turns out there was a bug in Linked In API which was confirmed by their support. At this moment same calls work fine with version 202210

Fetch company posts from linkedin API

I am trying to fetch the posts of the company from the api, I have already applied to the marketing development platform and it was approved. I already got the token with the scope: r_organization_social and I'm calling the /shares api:
https://api.linkedin.com/v2/shares?q=owners&owners=urn:li:organization:{company_ID}&sharesPerOwner=100&count=25&sharesPerOwner=10
But I'm getting the following response:
{
"paging": {
"start": 0,
"count": 25,
"links": [
{
"type": "application/json",
"rel": "next",
"href": "/v2/shares?count=25&owners=urn%3Ali%3Aorganization%3A{company_ID}&q=owners&sharesPerOwner=10&sharesPerOwner=100&start=0"
}
],
"total": 242
},
"elements": []
}
I tried to change the query params and it's still the same
This end-point worked for me:
https://api.linkedin.com/v2/ugcPosts?q=authors&authors=List(urn%3Ali%3Aorganization%3A<ID_ORGANIZATION>)
See documentation: https://learn.microsoft.com/en-us/linkedin/marketing/integrations/community-management/shares/ugc-post-api?tabs=http#sample-request-6
Disclaimer: I've no access to the linkedin API and couldn't test. But these are some things I noticed:
Your url contains two times the paramater sharesPerOwner, try removing one.
In the docs it's recommended to set the sharesPerOwner to 1000 and the count to 50. I'd also include the start paramater, just to make sure:
Maybe try something like this:
GET https://api.linkedin.com/v2/shares?q=owners&owners=urn:li:organization:{id}&sharesPerOwner=1000&count=50&start=0
From the api-docs(https://learn.microsoft.com/en-us/linkedin/marketing/integrations/community-management/shares/share-api?tabs=http#find-shares-by-owner): "Note that the pagination excludes UGC and Direct Sponsored Content (DSC) posts". Make sure that the owner you are testing contains posts.
If this doesn't work. Could you provide some information on how you are sending the request? Have you tried accessing other parts of the api?

iTunes Lookup API response is different for different clients

Right now it looks like a mystery. Please help me in solving it.
I use iTunes public API to fetch an album: "Metallica" by Metallica (see it in browser: US region, MV region). I construct the following URLs to fetch it via API:
US region https://itunes.apple.com/lookup?id=579372950&country=US&entity=album - works
MV region https://itunes.apple.com/lookup?id=579372950&country=MV&entity=album - doesn't work
Here's the actual behaviour I observe:
If I query GET https://itunes.apple.com/lookup?id=579372950&country=MV&entity=album in a Spring app (using RestTemplate + Jackson HttpMessageConverter) I get an empty response:
{
"resultCount":0,
"results": []
}
If I navigate to https://itunes.apple.com/lookup?id=579372950&country=MV&entity=album in a browser I get a file download prompt. The file contains an empty response:
{
"resultCount":0,
"results": []
}
If I query API using HttpPie http get https://itunes.apple.com/lookup?id=579372950&country=MV&entity=album I get a non-empty response !!!
{
"resultCount": 1,
"results": [
{
"amgArtistId": 4906,
"artistId": 3996865,
"artistName": "Metallica",
"artistViewUrl": "https://music.apple.com/us/artist/metallica/3996865?uo=4",
"artworkUrl100": "https://is1-ssl.mzstatic.com/image/thumb/Music/v4/0b/9c/d2/0b9cd2e7-6e76-8912-0357-14780cc2616a/source/100x100bb.jpg",
"artworkUrl60": "https://is1-ssl.mzstatic.com/image/thumb/Music/v4/0b/9c/d2/0b9cd2e7-6e76-8912-0357-14780cc2616a/source/60x60bb.jpg",
"collectionCensoredName": "Metallica",
"collectionExplicitness": "notExplicit",
"collectionId": 579372950,
"collectionName": "Metallica",
"collectionPrice": 9.99,
"collectionType": "Album",
"collectionViewUrl": "https://music.apple.com/us/album/metallica/579372950?uo=4",
"copyright": "℗ 1991 Blackened Recordings",
"country": "USA",
"currency": "USD",
"primaryGenreName": "Metal",
"releaseDate": "1991-08-12T07:00:00Z",
"trackCount": 13,
"wrapperType": "collection"
}
]
}
I tried it multiple times and the results seem to be consistent. I compared the requests and they seem to be identical.
Why does iTunes respond differently to different clients? I can't understand. What important detail am I missing?
Similar questions:
Spring RestTemplate getForObject URL not working for Apple iTunes - there's another problem (double encoding of the whitespace character).
This problem happens to the following regions (it's a complete list):
LI https://itunes.apple.com/lookup?id=579372950&country=LI&entity=album
MV https://itunes.apple.com/lookup?id=579372950&country=MV&entity=album
MM https://itunes.apple.com/lookup?id=579372950&country=MM&entity=album
ET https://itunes.apple.com/lookup?id=579372950&country=ET&entity=album
RS https://itunes.apple.com/lookup?id=579372950&country=RS&entity=album
I spotted a difference:
http get 'https://itunes.apple.com/lookup?id=579372950&country=MV&entity=album' -> empty response
curl 'https://itunes.apple.com/lookup?id=579372950&country=MV&entity=album' -> empty response
http get https://itunes.apple.com/lookup?id=579372950&country=MV&entity=album -> 1 album in response
curl https://itunes.apple.com/lookup?id=579372950&country=MV&entity=album -> 1 album in response
if I don't use quotes around URL, the request is interpreted as GET https://itunes.apple.com/lookup?id=579372950. the default country is US and therefore I see 1 US album in response.

How to get site's name from WP API

I'm trying to get WordPress website title using javascript and WP API plugin
I didn't find any example on how to get the site's name but I found the variable name under the entities section in the developer guide
function _updateTitle(documentTitle) {
document.querySelector('title').innerHTML =
documentTitle + ' | '+ $http.get('wp-json/name');
}
The output string of $http.get('wp-json/name') is [object Object]
Does anyone know how to use fix this?
You didn't get enough context. What's $http? What happens when you go to wp-json/name directly in your browser? Here's what I see:
[{
"code":"json_no_route",
"message":"No route was found matching the URL and request method"
}]
Here's a simple example to get you the title:
var siteName;
$.getJSON( "/wp-json", function( data ) {
siteName = data.name;
});
See more elegant solution here https://wordpress.stackexchange.com/a/314767/94636
response will not contain extra data like:
authentication: []
namespaces: ["oembed/1.0", "akismet/v1", "acf/v3", "wp/v2"]
routes: {/: {namespace: "", methods: ["GET"],…},…}
timezone_string: ""
...
_links: {help: [{href: "http://v2.wp-api.org/"}]}

Stop GA reporting from appending the domain to the path when previewing

I'm using Google Analytics to track data across multiple domains in a single profile.
By default, reporting only shows the path, not the full URL. This makes it quite confusing where multiple pages on our different domains have the same paths (e.g. '/index' or '/about').
To get round this, I've implemented the filter advised by Google to display the full URL in reporting:
Filter Type: Custom filter > Advanced
Field A: Hostname Extract A: (.*)
Field B: Request URI Extract: (.*)
Output To: Request URI Constructor: $A1$B1
This works just fine ; the only downside is that using the 'preview link' button in the reporting always appends the domain, resulting in a 404 error.
....clicking the 'link preview' icon results in......
Does anyone know a way around this ; either by preventing GA from appending the domain or a better way of displaying the full URLs in reporting?
Thanks Eike - I took your advice and wrote a small browser extension for Chrome. Obviously this isn't an essential, but I wanted to address it as our marketing team use the feature so frequently.
The manifest json :
{
"manifest_version": 2,
"name": "Analytics cross-domain link shortcut",
"version": "1.0",
"description": "Makes the links shortcuts in analytics work when using a 'full url' filter!",
"content_scripts":
[
{
"matches": ["*://*/*"],
"js": ["myscript.js"],
"run_at": "document_start"
}
]
}
And the script:
if (window.opener && document.referrer == "") {
var currentLocation = window.location.href;
if(currentLocation.indexOf("www.appendedurl.com") > -1) {
var newLocation = currentLocation.substr(30); // where '30' is the length of the appended URL
window.location.href = "http://"+newLocation;
}
}
So it's essentially just snipping off the appended URL (if present) on freshly opened popup windows.

Resources