Web Scrape from Desktop Perspective - web-scraping

I have a static website where I use JavaScript's Fetch API to scrape public information on some extensions from the Chrome Web Store (I use Just CORS to "bypass" Same-Origin Policy issues).
The scraping works; the problem is that the fetch returns different content depending on whether I'm on desktop or mobile (which makes sense). This is an issue for me, though, because the data I'm fetching is not loaded on mobile: when I open my website on desktop everything is fine, but when I open it on mobile that data is missing.
Is there a way to put something in the headers to make sure the fetch is always made from a "desktop perspective"?
I am using this function to do the scraping:
const webScrape = async url => {
  const response = await fetch(url);
  const content = await response.text();
  return new DOMParser().parseFromString(content, "text/html");
};
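If the Chrome Web Store serves different markup based on the User-Agent, one thing worth trying (an untested sketch, not a guaranteed fix) is sending a desktop User-Agent header and letting the CORS proxy forward it. Browsers have historically treated User-Agent as a forbidden request header, so whether the header actually reaches the target depends on your browser and on Just CORS:

const webScrapeAsDesktop = async url => {
  const response = await fetch(url, {
    headers: {
      // Example desktop Chrome UA string; any desktop UA should do.
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"
    }
  });
  const content = await response.text();
  return new DOMParser().parseFromString(content, "text/html");
};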

Related

Next.JS - localhost is prepended when making external API call

I have a simple Next app where I'm making an external API call to fetch some data. This worked perfectly fine until a couple of days ago: when the app makes an API request, I can see in the network tab that the URL being called has the Next app's address (localhost:3000) prepended to the actual URL, e.g. instead of http://{serverAddress}/api/articles it calls http://localhost:3000/{serverAddress}/api/articles, and the request resolves to 404 Not Found.
To make the API call, I'm using fetch. Before making the request, I logged the URL that was passed into fetch, and it was the correct URL. I also confirmed my API works as expected by making the request to the expected URL with Postman.
I haven't tried another library like axios for this request, because the app was working perfectly fine with fetch alone; I'd rather understand why this is happening, for future reference.
I haven't made any code changes since the app was working. However, I was Dockerizing my services, so I installed Docker and WSL2 with Ubuntu. I was deploying those containers to another machine; now both the API I'm calling and the Next app run directly on my development machine, where the issue is happening.
I saw this post and confirmed I don't have any whitespace in the URL. As one comment mentions, I did install WSL2, but I'm not running the app via the WSL terminal. I've also tried executing wsl --shutdown to see if that helps; unfortunately the issue persists. If WSL2 is the cause of the issue, how can I fix it? Uninstall it? If not, what might be another possible cause?
Thanks in advance.
EDIT:
The code I'm using to call fetch:
fetcher.js
export const fetcher = (path, options) =>
  fetch(`${process.env.NEXT_PUBLIC_API_URL}${path}`, options)
    .then(res => res.json());
useArticles.js
import { useSWRInfinite } from 'swr';
import { fetcher } from '../../utils/fetcher';

const getKey = (pageIndex, previousPageData, pageSize) => {
  if (previousPageData && !previousPageData.length) return null;
  return `/api/articles?page=${pageIndex}&limit=${pageSize}`;
};

export default function useArticles(pageSize) {
  const { data, error, isValidating, size, setSize } = useSWRInfinite(
    (pageIndex, previousPageData) =>
      getKey(pageIndex, previousPageData, pageSize),
    fetcher
  );

  return {
    data,
    error,
    isValidating,
    size,
    setSize
  };
}
You might be missing the protocol (http/https) in your API call. Without a protocol, fetch treats the URL as a relative path and resolves it against the current origin, which is why the app's own address gets prepended.
Either put it in the env variable:
NEXT_PUBLIC_API_URL=http://server_address
Or prefix your fetch call with the protocol name:
fetch(`http://${process.env.NEXT_PUBLIC_API_URL}${path}`, options)
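To see why the host gets prepended, you can reproduce the resolution with the WHATWG URL constructor (the server name below is a made-up example):

// Without a protocol the string is treated as a relative path
// and resolved against the page's origin:
new URL("myserver.com/api/articles", "http://localhost:3000").href;
// => "http://localhost:3000/myserver.com/api/articles"

// With an explicit protocol the URL is absolute and the base is ignored:
new URL("http://myserver.com/api/articles", "http://localhost:3000").href;
// => "http://myserver.com/api/articles"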

Get history version from Firebase Hosting

I have a site deployed on Firebase Hosting.
On the site I display "Version 3", but instead I want to show the deployment/release version, like e925c5, that is written in the version history.
Can I get the version history with the REST API? Or can I define it before deploying?
Thank you.
Update
I tried reading the documentation and working with it, but I'm still stuck on the authentication method.
A simple function is below, but it always returns a 403 response. I've already tried getGlobalDefaultAccount.login(), getGlobalDefaultAccount.getAccessToken(), and getGlobalDefaultAccount(), and it's still not working.
const requireAuth = require('firebase-tools/lib/requireAuth');
const getGlobalDefaultAccount = require('firebase-tools/lib/auth');
const api = require('firebase-tools/lib/api');

const site = 'sarafe-testing';

requireAuth(getGlobalDefaultAccount.getAccessToken(), ['https://www.googleapis.com/auth/cloud-platform']).then(async () => {
  try {
    const response = await api.request('GET', `/v1beta1/sites/${site}/releases`, { auth: true, origin: api.hostingApiOrigin });
    console.log(response);
  } catch (e) {
    console.log(e.message);
  }
});
Full version
For now I just created a table that holds the version code (manually updated from the Firebase page) and made an API to get the version, but this is not a valid solution because it only shows the latest version; some people open the webpage without refreshing / clearing the cache, so they are still running an old version of the webpage with the latest version number displayed.
Yes, there is a REST API that allows you to manage versions and releases for Firebase Hosting, and it has a method to get a list of versions for the site. I recommend checking out its documentation and reporting back if you get stuck while implementing the code.
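As an illustration of what that can look like without going through firebase-tools internals, here is a minimal sketch using google-auth-library and a service account key (the key file path is hypothetical and the snippet is untested):

const { GoogleAuth } = require('google-auth-library');

async function listReleases(site) {
  // Authenticate with a service account key file (hypothetical path).
  const auth = new GoogleAuth({
    keyFile: './service-account.json',
    scopes: ['https://www.googleapis.com/auth/cloud-platform'],
  });
  const client = await auth.getClient();

  // List releases for the site; each release references a version and its hash.
  const res = await client.request({
    url: `https://firebasehosting.googleapis.com/v1beta1/sites/${site}/releases`,
  });
  return res.data;
}

listReleases('sarafe-testing').then(console.log).catch(console.error);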

How to transition away from inline editor on Actions on Google

In a previous Stack Overflow question, I shied away from using an external webhook on Actions on Google, so I went back to the inline editor. I got that worked out, but now I'm feeling brave again.
I've outgrown the inline editor and want the ability to develop my code on my laptop, test it in Firebase, and publish it to a site for my webhook, presumably where the inline code editor publishes to. In fact, I have already written the required functions and deployed them from Firebase. So the full functionality is ready to go; I just need to hook it up properly to Actions on Google.
What I have now in the Actions on Google inline editor is more of a stub. I want to merge that stub into the more full-blown logic that I have in Firebase. Here is what is in the inline editor:
const { conversation } = require('@assistant/conversation');
const functions = require('firebase-functions');

const app = conversation();

app.handle('intent_a_handler', conv => {
  // Implement your code here
  conv.add("Here I am in intent A");
});

app.handle('intent_b_handler', conv => {
  // Implement your code here
  conv.add("Here I am in intent B");
});

exports.ActionsOnGoogleFulfillment = functions.https.onRequest(app);
When I search on the Internet, I see discussion from the point of view of Dialogflow, but like I say, I'm in "Actions on Google". I want to transition away from the inline editor, taking what I already have as a basis. Can someone explain how I set that up? I'm happy to do this within the context of the Google ecosystem.
To test your own webhook locally on your own system, I would recommend incorporating a web app framework such as Express. With Express you can host code on your local machine and make it respond to requests from Actions on Google. In your case, this would replace all the code related to the Firebase Functions package. Here is an example of what a simple webhook for Actions on Google looks like:
const express = require('express');
const bodyParser = require('body-parser');
const { conversation } = require('@assistant/conversation');

const exprs = express();
exprs.use(bodyParser.json()); // allows Express to work with JSON requests

const app = conversation();

app.handle('example intent', () => {
  // Do something
});

// More app.handle() setups

exprs.post('/', app);
exprs.listen(3000);
With this setup you should be able to run your own application locally. The only thing you need to do is install the required dependencies and add your own intent handlers for your action. At this point you have a webhook running on your own machine, but that isn't enough to use it as a webhook in Actions on Google because it runs locally and isn't publicly available via the internet.
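Assuming the snippet above is saved as index.js (a filename chosen for this example), installing the dependencies and starting the webhook would look something like:

npm install express body-parser @assistant/conversation
node index.js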
For this reason we will be using a tool called ngrok. With ngrok you can create a public https address that tunnels all traffic to your local machine, so you can use the ngrok address as your webhook URL. Now you can make as many code changes as you want, and Actions on Google will automatically pick up the latest changes while you develop. No need to upload and wait for Firebase to deploy.
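Pointing ngrok at the port the Express app listens on (3000 in the snippet above) prints a public https forwarding URL that you can paste into the webhook field:

ngrok http 3000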
Just to be clear: ngrok should only be used for development. When you are done developing your action, you should upload your code to a cloud service, or host it on your own server if you have one. A (free plan) ngrok URL usually expires every 6 hours, so it's not a suitable solution for anything other than development.

Scraping web site using Google Chrome extension

I am trying to build a Chrome extension to be used by many users. This extension will always scrape data from the same website.
Following online trainings, I understand that I have to place the scraping logic in a content script. Now, as the website to be scraped contains many pages and many links, I am trying to do this in a way where the user of the extension does not see the main window opening different links.
You'll find below the starting point of the content script:
chrome.runtime.onMessage.addListener(function(request, sender, sendResponse) {
  if (request.todo == "extractData") {
    alert("before launching the request");
    const request = require('request');
    request('https://www.url_to_scrape.com', function(err, res, body) {
      alert("in the request");
      console.log(body);
    });
  }
});
I am getting the following error message : "Unchecked runtime.lastError: The message port closed before a response was received."
Any help would be very much appreciated :-)
Hugues
Found the answer: the code should be placed in the background page, not in the content script.
Also, the list of websites to be crawled should be added to the permissions in manifest.json.
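For illustration, a minimal Manifest V2 setup could look like this (the URL is the placeholder from the question; the fetch-based background script is a sketch, not the original code). Note that returning true from the listener keeps the message port open for the asynchronous sendResponse, which also avoids the "message port closed" error:

// manifest.json (relevant parts):
//   "permissions": ["https://www.url_to_scrape.com/*"],
//   "background": { "scripts": ["background.js"], "persistent": false }

// background.js
chrome.runtime.onMessage.addListener((request, sender, sendResponse) => {
  if (request.todo === "extractData") {
    fetch("https://www.url_to_scrape.com")
      .then(res => res.text())
      .then(body => sendResponse({ body }))
      .catch(err => sendResponse({ error: err.message }));
    return true; // keep the port open for the async response
  }
});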

Nuxt SSR blog still calling API endpoints to get blog posts even though it's set up to be SSR

I've been playing around with a simple blog built with JSONPlaceholder and Nuxt.js.
Everything seems fine: I've got an archive and single blog posts working, but when deployed on Netlify I can see that the browser still makes API calls to JSONPlaceholder, even though all the pages are built statically and I can see they already have the content in the HTML.
I used the routes method within generate in the Nuxt config to create the 100 HTML files based on the JSONPlaceholder /posts results.
Here's the Netlify link: REMOVED.
And a public repo: https://bitbucket.org/oneupstudio/api-test/src/master/
Anything I've missed?
Nuxt.js doesn't support 'full static generation' yet; check this RFC.
For now, you can use this module in order to make your JSON requests static.
Nuxt currently supports proper static generation of websites, although one has to be aware of the payload param in asyncData. If payload is present, that indicates the static generator is at work, and no API calls should be made in that case:
async asyncData({ params, error, payload }) {
  if (payload) return { user: payload };
  else return { user: await backend.fetchUser(params.id) };
}
Read more on this here.
The RFC mentioned by @DreadMinder will further improve on this, but you can already build full static websites with Nuxt.
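Tying this back to the question's setup, the payload comes from the generate.routes config in nuxt.config.js. A sketch using the JSONPlaceholder endpoint from the question (the route pattern is an assumption, and a fetch polyfill such as node-fetch may be needed at build time):

// nuxt.config.js
export default {
  generate: {
    async routes() {
      const posts = await fetch('https://jsonplaceholder.typicode.com/posts')
        .then(res => res.json());
      // Each payload is handed to that route's asyncData,
      // so no API call is needed at runtime.
      return posts.map(post => ({
        route: `/posts/${post.id}`,
        payload: post,
      }));
    },
  },
};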
