Google Sheets > importxml > Instagram Post - web-scraping

I'm trying to pull the URL of the first post on an Instagram profile but keep getting either 'content is empty' or 'cannot be parsed'. The example is:
Profile: https://www.instagram.com/bbcradio1/
The link in the page source: https://www.instagram.com/p/BtwOOwogtvC/
I've tried:
=IMPORTXML("https://www.instagram.com/"&A2,"//article[@class='FyNDV']/a/@href")
but to no avail (A2 contains the Instagram username).
How can I get this to pull?

Related

Why am I not receiving any images from the HERE Places (Search) API?

I'm trying to get images for restaurants using the HERE Places (Search) API.
I'm using the "Browse" entry point, and then following the href it returns to get a restaurant's details. In the details response, I keep getting this:
media: {
  images: {
    available: 0,
    items: [ ]
  }
}
The same happens for reviews and ratings.
Other posts here leave me confused about the cause: one seems to say it's a bug, and another seems to say it's just how the API works.
First of all, the "HERE Places API" is deprecated; you should migrate to the "HERE Geocoding and Search API v7". See the migration guide: https://developer.here.com/documentation/geocoding-search-api/migration_guide/index.html
As already explained in the question "Include Review, Rating and Images in Places API", the API returns the place IDs of external suppliers (TripAdvisor, Yelp, etc.). This is also true for the newer "HERE Geocoding and Search API v7". With these IDs, you can retrieve other details (such as images and reviews) from the external suppliers' own APIs.
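To make the answer more concrete, here is a minimal sketch of collecting those supplier IDs from a place record. It assumes a v7 lookup/discover result that carries a `references` array of `{ supplier: { id }, id }` entries; verify these field names against the current HERE documentation before relying on them. Calling each supplier's own API with the IDs is not shown.

```javascript
// Group external supplier IDs from a HERE place result by supplier name.
// Assumed shape (verify against the HERE Geocoding and Search v7 docs):
//   { title: "...", references: [{ supplier: { id: "yelp" }, id: "abc123" }, ...] }
function externalIds(place) {
  const refs = Array.isArray(place.references) ? place.references : [];
  const bySupplier = {};
  for (const ref of refs) {
    const supplier = ref.supplier && ref.supplier.id;
    if (!supplier || !ref.id) continue; // skip incomplete entries
    if (!bySupplier[supplier]) bySupplier[supplier] = [];
    bySupplier[supplier].push(ref.id);
  }
  return bySupplier;
}

// Usage with a hand-written example object (not real API output):
const place = {
  title: "Example Restaurant",
  references: [
    { supplier: { id: "tripadvisor" }, id: "111" },
    { supplier: { id: "yelp" }, id: "abc" },
  ],
};
console.log(externalIds(place));
```

The result maps each supplier name to the list of IDs you would then pass to that supplier's API.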

Google Analytics duplicate URL with Unicode characters

When I checked Google Analytics > Acquisition > Search Console > Landing Pages, I found that I have two URLs for each blog post.
For example (URL #1 and URL #2):
blog/429/legal/اسقاط-کافه-خیارات-به-چه-معناست/
and
/blog/429/legal/%D8%A7%D8%B3%D9%82%D8%A7%D8%B7-%DA%A9%D8%A7%D9%81%D9%87-%D8%AE%DB%8C%D8%A7%D8%B1%D8%A7%D8%AA-%D8%A8%D9%87-%DA%86%D9%87-%D9%85%D8%B9%D9%86%D8%A7%D8%B3%D8%AA/
Both refer to the same blog post.
The main problem, though, is the statistics: URL #2 has 0 impressions, clicks and CTR but the correct position, while URL #1 has the correct impressions, clicks and CTR but an incorrect position.
My blog posts have a canonical tag, and I have checked all internal links; they all use the same form (for example: example.com/blog/429/legal/اسقاط-کافه-خیارات-به-چه-معناست/).
Now:
1- What is the source of the problem?
2- How can I fix it?
This is a Google Search Console issue. The URL is sent to Google Analytics in encoded form, and Analytics manages to decode it for display. When it retrieves the URL from Search Console, however, it shows it exactly as it receives it. I don't think there is an effective solution within the two tools, but you can export the data and process it in another tool. For example, with JavaScript (e.g. in a spreadsheet with Google Apps Script) you can decode a URI in a single operation, and then build a table in the spreadsheet that matches the two forms and compares their metrics.
<div id="get_url_encoded">/blog/429/legal/%D8%A7%D8%B3%D9%82%D8%A7%D8%B7-%DA%A9%D8%A7%D9%81%D9%87-%D8%AE%DB%8C%D8%A7%D8%B1%D8%A7%D8%AA-%D8%A8%D9%87-%DA%86%D9%87-%D9%85%D8%B9%D9%86%D8%A7%D8%B3%D8%AA/</div>
<br /><br />
<div id="set_url_decoded"></div>
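<script>
// Read the encoded path, decode it, and write the readable form back into the page.
// textContent avoids HTML-escaping on read and HTML injection on write.
var uri = document.getElementById("get_url_encoded").textContent;
var uri_dec = decodeURIComponent(uri);
document.getElementById("set_url_decoded").textContent = uri_dec;
</script>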
https://jsfiddle.net/michelepisani/de058c4o/5/
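A minimal sketch of the matching step described in the answer, assuming the exported report has been loaded as an array of rows with hypothetical fields `page`, `impressions` and `clicks`. It groups rows whose URLs are equal after decoding, so the encoded and decoded variants of a post end up side by side. Treating `blog/...` and `/blog/...` as the same page is also an assumption, based on the two example URLs in the question. The function uses only standard JavaScript, so it should also run in Google Apps Script (V8 runtime).

```javascript
// Group exported report rows by their decoded URL so that the encoded and
// decoded variants of the same page can be compared.
// Assumed row shape (hypothetical): { page, impressions, clicks }.
function groupByDecodedUrl(rows) {
  const groups = new Map();
  for (const row of rows) {
    let key;
    try {
      key = decodeURIComponent(row.page);
    } catch (e) {
      key = row.page; // malformed %-sequence (URIError): keep the raw string
    }
    // Assumption: a missing leading slash refers to the same page.
    if (!key.startsWith("/")) key = "/" + key;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }
  return groups;
}

// Usage with made-up numbers, only to show the shape of the result:
const rows = [
  { page: "blog/429/legal/اسقاط/", impressions: 120, clicks: 7 },
  { page: "/blog/429/legal/%D8%A7%D8%B3%D9%82%D8%A7%D8%B7/", impressions: 0, clicks: 0 },
];
for (const [url, variants] of groupByDecodedUrl(rows)) {
  console.log(url, variants.length); // both variants share one key
}
```

Each map entry then holds all variants of one post, so their metrics can be summed or compared in a single table row.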

How to use Instagram graphql to get profile posts urls?

I'm trying to get the post urls of an instagram profile.
With the following URL I can get a JSON of the profile's main page, which includes the first 12 post URLs; the rest are loaded as you scroll down the profile page.
https://www.instagram.com/pele/?__a=1
Also, with the following URL I can get a JSON of the users who follow the given profile, and get as many of them as I want by changing the value of "first=XXXX" (you need to be logged in to Instagram for that).
https://www.instagram.com/graphql/query/?query_id=17851374694183129&id=590087950&first=100
How could I get the list of this user's posts, including the URL (most important) and perhaps also information such as the likes or comments on each post, using Instagram GraphQL?
What I actually need is the ID of each post (the part after /p/), so that I can build a URL like this:
https://www.instagram.com/p/Bg1DhGmDAsU/
For the past few days, the ?__a=1 method of getting the latest posts has been blocked.
To get the latest 15 posts from an Instagram profile, you have to fetch the profile page and extract the _sharedData JSON.
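JavaScript (requires jQuery; `username` holds the profile name):
const url = "https://www.instagram.com/" + username;
$.ajax({
  type: 'GET',
  url: url,
  error: function () {
    // handle the request error here
  },
  success: function (html) {
    // The profile page embeds its data as `window._sharedData = {...};</script>`.
    const json = html.split("window._sharedData = ")[1].split(";</script>")[0];
    const data = JSON.parse(json).entry_data.ProfilePage[0].graphql;
    console.log(data);
  }
});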
You can do this client-side or server-side, without logging in.
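A sketch of the remaining step from the question: turning the `graphql` object from the answer into post URLs built from the ID after /p/. The field names (`user.edge_owner_to_timeline_media.edges[].node.shortcode`, `edge_liked_by`, `edge_media_to_comment`) are an assumption based on how `_sharedData` was structured at the time; Instagram has since changed and restricted this data, so treat them as placeholders to verify against a live response.

```javascript
// Build https://www.instagram.com/p/<shortcode>/ URLs from a profile's graphql object.
// Assumed (historical, unverified) shape:
//   graphql.user.edge_owner_to_timeline_media.edges[i].node =
//     { shortcode, edge_liked_by: { count }, edge_media_to_comment: { count } }
function postsFromGraphql(graphql) {
  const media = graphql && graphql.user && graphql.user.edge_owner_to_timeline_media;
  const edges = (media && media.edges) || [];
  return edges.map(({ node }) => ({
    url: "https://www.instagram.com/p/" + node.shortcode + "/",
    likes: node.edge_liked_by ? node.edge_liked_by.count : null,
    comments: node.edge_media_to_comment ? node.edge_media_to_comment.count : null,
  }));
}

// Usage with a hand-written object in the assumed shape:
const example = {
  user: { edge_owner_to_timeline_media: { edges: [{ node: { shortcode: "Bg1DhGmDAsU" } }] } },
};
console.log(postsFromGraphql(example)[0].url); // https://www.instagram.com/p/Bg1DhGmDAsU/
```

Missing like or comment counts come back as `null` rather than throwing, since those fields may be absent.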

Rfacebook package's getPage() command only retrieves a few posts from Facebook pages

I recently tried the Rfacebook package by pablobarbera, which works quite well, but I am having a slight issue. Here is my code:
install.packages("Rfacebook") # from CRAN
# or the development version from GitHub:
library(devtools)
install_github("pablobarbera/Rfacebook", subdir = "Rfacebook")
library(Rfacebook)
# token generated here: https://developers.facebook.com/tools/explorer
token <- "**********"
page <- getPage("DarazOnlineShopping", token, n = 1000)
The getPage command works, but it only retrieves 14 records from the Facebook page I used in the command. In the example pablobarbera used in the original post, he retrieved all the posts from "Humans of New York", but when I tried the same command, Facebook asked me to reduce the number of posts, and I barely managed to get 20 posts. This is the command pablobarbera used:
page <- getPage("humansofnewyork", token, n = 5000)
I thought I was using a temporary access token and that this was why Facebook was not giving me the required data, but I completed the whole Facebook OAuth process and got the same result.
Can somebody look into this and tell me why this is happening?
The getPage() command looks fine to me; I manually counted 14 posts (including photos) on the main page. It could be that Daraz Online Shopping has multiple pages and that the page name you are using only returns results from the main page, when (I assume) you want results from all of them.
getPage() also accepts page IDs. You might want to collect a list of IDs associated with Daraz Online Shopping, loop through them calling getPage() on each, and combine the outputs to get the results you need.
To find these IDs, you could write a scraper (or search for them manually) that views the page source and looks for the unique page ID. Searching for content="fb://page/?id= will highlight the location of the page ID in the source code.
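The search step of such a scraper could look like the sketch below, which scans a page's HTML source for the `content="fb://page/?id=<digits>"` marker the answer mentions. Fetching the HTML, and whether Facebook still serves that marker, are left out and should be checked; the sample snippet is hypothetical. The IDs it returns would then be passed to `getPage()` one by one in R.

```javascript
// Extract Facebook page IDs from raw HTML by looking for
// content="fb://page/?id=<digits>" (the marker described in the answer).
function extractPageIds(html) {
  const pattern = /content="fb:\/\/page\/\?id=(\d+)"/g;
  const ids = new Set(); // de-duplicate repeated occurrences
  for (const match of html.matchAll(pattern)) {
    ids.add(match[1]);
  }
  return [...ids];
}

// Usage with a hypothetical snippet of page source:
const sample = '<meta content="fb://page/?id=123456789" />';
console.log(extractPageIds(sample));
```

A `Set` is used because the same ID can appear several times in one page's source.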

Issue getting the tweets of people I follow using the Twitter API

I am using the Twitter API and Flex 4 to create a desktop app. I need to show tweets in two sections:
1) Tweets of user ABC, shown in one section.
2) Tweets of the people user ABC follows, shown in another section.
I achieved point 1 by using:
https://api.twitter.com/1/statuses/user_timeline.xml?&screen_name="+ usrObj.username + "&count=" + usrObj.count;
But I get a Bad Authentication error when trying point 2.
I am hitting the following URL using HTTPService:
https://api.twitter.com/1.1/statuses/home_timeline.json
I also tried:
https://api.twitter.com/1/statuses/home_timeline.xml?&screen_name="+ usrObj.username + "&count=" + usrObj.count;
where usrObj is an object.
I get the following error message:
Error #2032: Stream Error. URL: https://api.twitter.com/1.1/statuses/home_timeline.json" errorID=2032
Please let me know whether I am using the proper URL and query parameters. Can anyone suggest how to get these tweets?
I got it working by using the following URL to retrieve the home timeline:
https://api.twitter.com/1/statuses/following_timeline.xml
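As a side note on the URLs in the question, building the query string with `URL` and `URLSearchParams` avoids the stray `?&` and handles encoding of the username. This is a JavaScript sketch rather than Flex/ActionScript, and it only constructs the URL for the v1 endpoint from the question (Twitter has since retired API v1); it does not add the OAuth signing that authenticated timeline requests need. `usrObj` mirrors the question's object with `username` and `count`.

```javascript
// Build the user_timeline URL from an object like the question's usrObj
// ({ username, count }), letting URLSearchParams handle encoding.
function userTimelineUrl(usrObj) {
  const url = new URL("https://api.twitter.com/1/statuses/user_timeline.xml");
  url.searchParams.set("screen_name", usrObj.username);
  url.searchParams.set("count", String(usrObj.count));
  return url.toString();
}

console.log(userTimelineUrl({ username: "ABC", count: 20 }));
// https://api.twitter.com/1/statuses/user_timeline.xml?screen_name=ABC&count=20
```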
