Scraping web site using Google Chrome extension - web-scraping

I am trying to build a Chrome extension to be used by many users. The extension will always scrape data from the same website.
From online tutorials, I understand that the scraping logic has to go in a content script. Because the website to be scraped contains many pages and many links, I want to do this in a way where the user of the extension does not see the main window opening different links.
Below is the starting point of the content script:
chrome.runtime.onMessage.addListener(function(request, sender, sendResponse){
  if (request.todo=="extractData") {
    alert("before launching the request");
    const request = require('request');
    request('https://www.url_to_scrape.com', function(err, res, body) {
      alert("in the request");
      console.log(body);
    });
  }
});
I am getting the following error message: "Unchecked runtime.lastError: The message port closed before a response was received."
Any help would be very much appreciated :-)
Hugues

Found the answer: the code should go in the background page, not in the content script.
Also, the websites to be crawled must be listed in manifest.json.
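A minimal sketch of what the background-page version might look like, assuming the sender still posts the same { todo: "extractData" } message as in the question. The handleMessage name and the injectable fetchImpl parameter are hypothetical, added only so the handler can be exercised outside the browser. fetch stands in for the Node-only request module, which is not available inside an extension. Returning true from the listener keeps the message port open until sendResponse is called, which is the usual way to avoid the "message port closed" error when the reply is asynchronous.

```javascript
// Background script sketch (hypothetical names; not the asker's final code).
// fetchImpl defaults to the global fetch and can be swapped out for testing.
function handleMessage(request, sender, sendResponse, fetchImpl = globalThis.fetch) {
  if (request.todo !== "extractData") {
    return false; // not our message: no asynchronous response will follow
  }
  fetchImpl("https://www.url_to_scrape.com")
    .then((res) => res.text())
    .then((body) => sendResponse({ ok: true, body }))
    .catch((err) => sendResponse({ ok: false, error: String(err) }));
  return true; // keep the message port open until sendResponse runs
}

// Register the listener only when the extension API is present.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener((request, sender, sendResponse) =>
    handleMessage(request, sender, sendResponse)
  );
}
```

The scraped host also has to be declared in manifest.json (under permissions in Manifest V2 or host_permissions in Manifest V3); otherwise the request is subject to the usual cross-origin restrictions.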

Related

Web Scrape from Desktop Perspective

I have a static website where I use JavaScript's Fetch API to scrape public information about some extensions from the Chrome Web Store (I use Just CORS to "bypass" the Same Origin Policy issues).
The scraping works. The problem is that the fetch returns different content depending on whether I'm on desktop or mobile (which makes sense). This is an issue for me because the data I am fetching is not loaded on mobile: when I open my website on desktop everything is fine, but when I open it on mobile that data is missing.
Is there a way to put something in the headers to make sure the fetch is always made from a "desktop perspective"?
I am using this function to do the scraping:
const webScrape = async url => {
  // Fetch the page and parse the returned HTML into a Document
  const result = await fetch(url);
  const content = await result.text();
  return new DOMParser().parseFromString(content, "text/html");
};

How to correctly set up an OrchardCore website to be used in an external site Iframe?

My OrchardCore website is the target of an iframe on an external website. It works fine for "AllowAnonymous" pages. But when I hit a page that requires authentication, I am redirected to the login page, which is expected; when I then try to log in, I always get a "Your browser sent a request that this server could not understand." error.
I have tried the "SuppressXFrameOptionsHeader" and "options.Cookie.Expiration" options, but with no luck:
services.AddAntiforgery(options =>
{
    options.SuppressXFrameOptionsHeader = true;
    options.Cookie.Expiration = TimeSpan.Zero;
});
Any idea whether what I am trying to do is possible? I am using OrchardCore (1.2.2).

Nuxt SSR blog still calling API endpoints to get blog posts even though it's set up to be SSR

I've been playing around with a simple blog built with JSONPlaceholder and Nuxt.js.
Everything seems fine: I've got an archive and single blog posts working. But when deployed on Netlify, I can see that the browser is still making API calls to JSONPlaceholder, even though all the pages are built statically and I can see they already have the content within the HTML.
I used the routes method within generate in the nuxt config to create the 100 html files based upon the JSONPlaceholder /posts results.
Here's the Netlify link: REMOVED.
And a public repo: https://bitbucket.org/oneupstudio/api-test/src/master/
Anything I've missed?
Nuxt.js doesn't support 'full static generation' yet, check this RFC.
For now, you can use this module in order to make your JSON requests static.
Nuxt currently supports proper static generation of websites, although you have to be aware of the payload parameter in asyncData. If payload is present, the static generator is at work and no API calls should be made:
async asyncData ({ params, error, payload }) {
  if (payload) return { user: payload }
  else return { user: await backend.fetchUser(params.id) }
}
Read more on this here.
The RFC mentioned by @DreadMinder will improve on this further, but you can already build fully static websites with Nuxt.
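For payload to be present, the generator has to supply it. A rough sketch of a generate.routes entry for nuxt.config.js that fetches the posts once at build time and hands each one to its page as payload might look like this. The /posts/:id route shape, the buildRoutes helper, and the use of a global fetch (Node 18+) are assumptions, not taken from the repo in the question.

```javascript
// Hypothetical helper: turn a list of posts into Nuxt generate routes.
// Each entry carries the post itself as payload, so asyncData can skip the API call.
function buildRoutes(posts) {
  return posts.map((post) => ({
    route: `/posts/${post.id}`, // assumed route shape
    payload: post,
  }));
}

// Fragment for nuxt.config.js (Nuxt 2 style configuration).
const config = {
  generate: {
    // Runs once at build time; requires a global fetch (Node 18+) or a polyfill.
    async routes() {
      const res = await fetch("https://jsonplaceholder.typicode.com/posts");
      return buildRoutes(await res.json());
    },
  },
};

// In nuxt.config.js this object would be exported: module.exports = config;
```

As far as I understand, payload is only supplied while the generator runs, so client-side navigation between pages can still fall through to the else branch and call the API.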

Xamarin Forms - Xamarin.Social

I am currently using the Xamarin.Social component in Xamarin Forms, and when I try to post to Facebook, it gives me an error stating: "Share Error: The remote server returned an error: (403) Forbidden". Does anyone know why I am getting this error and how to fix it? Twitter posting works perfectly fine, so it's just Facebook.
Thank You
Some troubleshooting for your issue:
You probably already have your ClientID set up correctly. Check it again on https://developers.facebook.com/apps
It took me a while to figure out what to enter for the RedirectURL. At the moment I am using https://apps.facebook.com/yourappname/. If this is not working for you, go to your app on the Facebook Developers page > Settings > Add Platform > Facebook Canvas, and enter this URL as the "Canvas Page". Authentication should work fine now, and the 403 error should no longer occur.
My working example, for creating the Facebook service:
// Backing field for the lazily created service
private static FacebookService mFacebook;
public static FacebookService Facebook
{
    get
    {
        if (mFacebook == null)
        {
            mFacebook = new FacebookService() {
                ClientId = "<Your App ID from https://developers.facebook.com/apps>",
                RedirectUrl = new Uri ("https://apps.facebook.com/yourappname/")
            };
        }
        return mFacebook;
    }
}
Which platform are you developing for?
The rest of the share process is just like in the Xamarin.Social iOS Unified sample.
Don't hesitate to ask for clarification.

Error Message: redirect_uri is not owned by the application

::UPDATE:: LINKS DO NOT EXIST ANYMORE!
Very strange indeed, this is definitely a bug! I did a test with app_id from another application and it worked.
See for yourself:
https://apps.megalopes.com/megabraziltv/test.php (app_id correct)
https://apps.megalopes.com/megabraziltv/test2.php (app_id from another application)
---/---
I found several people with the same question, and all the answers are the same:
Site URL is not the same as REQUEST_URI (Redirecting URL)
My app settings are:
Secure Page Tab URL: apps.megalopes.com/megabraziltv/...
App Domain: megalopes.com
code:
<div id="fb-root"></div>
<script src="http://connect.facebook.net/pt_BR/all.js">
</script>
<script>
  FB.init({
    appId: '123456789', cookie: true,
    status: true, xfbml: true
  });
  FB.ui({ method: 'apprequests',
    message: 'Here is a new Requests dialog...'});
</script>
This simple code does not redirect to any other URL. I tested it in the JS console and got the same results. Sometimes it works and sometimes I get this error message:
API Error Code: 191 API Error Description: The specified URL is not
owned by the application Error Message: redirect_uri is not owned by
the application.
Regardless of whether it is a page tab or a canvas app, you must set the website's Site URL in https://developers.facebook.com/apps
How I fixed it:
App Domain: megalopes.com (domain)
Site URL: / Secure Canvas URL: / Secure Page Tab URL: https://www.megalopes.com (subdomain)
I think I have run into something similar before.
In the summary page of your app ensure both the Secure Canvas URL and Page Tab URL are populated.
The URL in my redirect_uri needed "http://" at the beginning. It was missing the protocol, so Facebook did not recognize my website and threw this annoying 191 error. I finally found out after an hour of pulling out the hair I (still) have left.
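A quick way to catch this before Facebook does is to check the value for a scheme. The helper below is a hypothetical sketch: the hasHttpScheme name is an assumption, and it only checks the prefix, not whether the URL matches the app settings.

```javascript
// Hypothetical helper: true only if the redirect URI starts with http:// or https://.
function hasHttpScheme(redirectUri) {
  return /^https?:\/\//i.test(redirectUri);
}

console.log(hasHttpScheme("apps.megalopes.com/megabraziltv/test.php"));         // false
console.log(hasHttpScheme("https://apps.megalopes.com/megabraziltv/test.php")); // true
```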
You have to create a channel page, which allows "cross domain communication in certain browsers"
This is an HTML page (say, /channel.html) on your server, which contains only:
<script src="//connect.facebook.net/en_US/all.js"></script>
Then make the JavaScript SDK aware of it:
FB.init({
  appId: 'xxxxxx',
  cookie: true,
  channelUrl: location.protocol + '//' + location.host + '/channel.html'
});
More about this:
https://developers.facebook.com/docs/reference/javascript/FB.init/
https://developers.facebook.com/docs/reference/javascript/
This is caused by the domain URL you entered in Facebook's settings. The domain URL should not look like www.site.com.
Update your domain URL to the form subdomain.site.com.
It should then work.

Resources