How do I scrape a Meteor webapp? - meteor

I have a Meteor webapp (e.g. http://www.merafi.com) that I want to scrape using Google Apps Script. I wrote a small script for this:
function myFunction() {
const url = 'http://www.merafi.com';
const response = UrlFetchApp.fetch(url, {muteHttpExceptions: true});
return response.getContentText();
}
The script is used inside a Google Spreadsheet as a macro.
=myFunction()
The problem with scraping a Meteor webapp is that I get an empty body with only script tags within it. How do I get the content inside the body tag?

A crawler such as PhantomJS or NightmareJS is required to run the Meteor JavaScript after the page has loaded. Unfortunately, the Google Apps Script environment does not allow loading external dependencies or packages, and the Apps Script API has no method that loads a page in a separate iframe or webview. So this is not possible using Google Apps Script.
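A rough way to confirm that symptom from inside a script is to check whether the fetched body holds anything besides script tags. The sketch below assumes a regex-based check is good enough for this purpose (it is not a full HTML parser), and isClientRenderedShell is a hypothetical helper name, not part of the Apps Script API.

```javascript
// Hypothetical helper: returns true when the <body> of an HTML string
// contains nothing but <script> elements and whitespace.
function isClientRenderedShell(html) {
  // Grab whatever sits between <body ...> and </body>.
  const match = /<body[^>]*>([\s\S]*?)<\/body>/i.exec(html);
  if (!match) {
    return false; // No body element found, so there is nothing to judge.
  }
  // Remove every <script>...</script> element, then see what is left.
  const withoutScripts = match[1].replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, '');
  return withoutScripts.trim() === '';
}
```

The string returned by response.getContentText() in the question's script could be passed to this helper, so the spreadsheet function can report that the page needs a JavaScript-capable crawler instead of returning an empty shell.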
Thanks to @Floo and @CodeChimp for answering the question in the comments.

Related

Host Google sign in client sources locally

We are in the process of implementing Sign In With Google functionality on our website. In the tutorial code snippet, an external script is loaded from Google's server:
<script src="https://accounts.google.com/gsi/client" async defer></script>
Is it possible to host this library locally? Where can I find all the files that I need to download?
EDIT:
I tried saving the JavaScript file content locally. However, it still tries to load the styles from the external URL (https://accounts.google.com/gsi/style). I guess I could modify the JavaScript source so that it loads this CSS from my server, but that seems like an ugly solution to me. Is there any other way besides modifying their source code?

SSR explained in SvelteKit

I recently started working with Svelte via SvelteKit and I have a few questions about this framework to which I haven't been able to find any direct answers in the source/documentation:
SvelteKit has SSR and in the documentation it says:
All server-side code, including endpoints, has access to fetch in case you need to request data from external APIs.
What code is server-side rendered besides endpoints, and how is that decided? Does all the code in the script blocks of Svelte pages run on the client, or does some of it run on the server?
To make use of SSR locally, do you need an adapter, or does Svelte start a server on its own?
How does SSR work in a production environment such as Netlify? Is the Netlify adapter used for SSR (running the endpoints in a Netlify function)? If a Netlify adapter is not provided, how and where would the endpoints run?
If I want to use custom Netlify functions in a SvelteKit project, what configuration is needed (besides netlify.toml and the Netlify adapter) for Netlify to recognize the functions inside the functions directory?
What is the difference here between SSR and prerendering? Is SSR used only for endpoints and other JS code, while prerendering generates the HTML that is sent to the client and then hydrated with the compiled JS code, which is also sent to the browser?
Thanks!
By default, pages are also server-side rendered when you first visit a site. When you navigate to subsequent pages, they are client-side rendered.
Adapters are only used in production. If you run npm run dev for local development you still get SSR. In production, how exactly SSR is run depends on the adapter you choose. An adapter is required for production. adapter-node runs SSR on a Node server, adapter-netlify runs SSR in Netlify functions, etc.
See here for discussion of custom Netlify functions: https://github.com/sveltejs/kit/issues/1249
SSR is used for pages as well, but prerendering means that rendering happens at build time instead of when a visitor visits the page. See this proposed documentation for more info: https://github.com/sveltejs/kit/pull/1525
Pages are server-side rendered when you first visit the site, including all the code in the script tag of your Svelte page. However, as you navigate to other pages, the code runs on the client and the page is rendered dynamically: SvelteKit uses the history API to make a single-page web app look like it has different pages.
You can decide which code runs on the server and which runs on the client. If you don't do anything special, SvelteKit and your deployment environment will decide that for you. If you want some code to run only in the browser (perhaps it needs the window object or authentication), you can use the browser variable.
import { browser } from '$app/environment';
if (browser) {
// Code that runs only in browser
}
You can also put the code in the onMount function, which will be run when the component first mounts, which only happens in browser.
import { onMount } from 'svelte';
onMount(() => {
// Do stuff
})
If you want SSR, you can put the code in the load function in a route's +page.js. One typical use case is a blog entry that grabs its content from the database, then populates and formats it. If you reach the page from a URL or refresh the page, the code in the load function is executed on the server. If you navigate to the page from elsewhere in your web app, the code runs on the client, but it will look like SSR because the UI refreshes only after the load function returns (you won't see a loading screen or a blank page). See the official docs for more.
import { error } from '@sveltejs/kit';

/** @type {import('./$types').PageLoad} */
export function load({ params }) {
  if (params.slug === 'hello-world') {
    return {
      title: 'Hello world!',
      content: 'Welcome to our blog. Lorem ipsum dolor sit amet...'
    };
  }
  throw error(404, 'Not found');
}
I am not very sure how to use Netlify functions. As Ben mentions, you can see the discussion at https://github.com/sveltejs/kit/issues/1249. That said, I think you might be able to implement the same functionality with +page.server.js and "Actions" to invoke them.

What is the difference between the two Google JS client CDN's?

A) <script src="https://apis.google.com/js/api:client.js"></script>
versus
B) <script src="https://apis.google.com/js/client.js"></script>
The only difference is the api: before client.js.
CDN A is used in the Google Sign-In for Websites docs in the Building a button with a custom graphic section.
CDN B is used almost everywhere in the Google API Client Library for JavaScript (Beta) docs.
They both appear to work interchangeably.
Short answer: there is no difference
Long answer:
The Google JS client CDN is a bit weird because the actual JS you get is dynamically created based on the file name you provide.
You can load multiple components of the library by constructing the URL as module1:module2:module3.js
api is the core part and is always loaded even if you don't add it to the list of modules, because it handles loading the other modules.
Theoretically you could just include api.js and then dynamically load extra modules by calling gapi.load("module", callback), which is exactly what happens when you load api:client.js or just client.js.
If, for example, you wanted to use the API Client Library together with the new sign-in methods, you could include api:client:auth2.js or client:auth2.js.
And for extra confusion you could even include https://apis.google.com/js/.js which is the same as https://apis.google.com/js/api.js
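The URL scheme described above can be sketched as a small function. This only illustrates the naming pattern from this answer: buildGapiUrl is a hypothetical helper, not something the Google library provides, and it does not check that the module names actually exist.

```javascript
// Hypothetical helper (not part of any Google library): joins module names
// with ':' to form the script URL pattern described in the answer.
function buildGapiUrl(modules) {
  const base = 'https://apis.google.com/js/';
  return base + modules.join(':') + '.js';
}

console.log(buildGapiUrl(['api', 'client']));   // https://apis.google.com/js/api:client.js
console.log(buildGapiUrl(['client', 'auth2'])); // https://apis.google.com/js/client:auth2.js
console.log(buildGapiUrl([]));                  // https://apis.google.com/js/.js
```

The empty list reproduces the odd-looking .js URL mentioned above, which shows that the file name is nothing more than the joined module list.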
Use links only from the documentation!
This is simple to check:
1) Add this script to the header of your page:
<script src="https://apis.google.com/js/client.js"></script>
Open DevTools -> Network
I see:
2) Change link to other script
<script src="https://apis.google.com/js/api.js"></script>
Open DevTools -> Network
I see:
api.js is the core, while client.js is the module.
This one has completely different content: https://apis.google.com/js/platform.js

Using Meteor with Iron-router to improve crawling with Google AJAX specifications

I have a Meteor site with Iron-Router. When I use Google's webmaster tools and "fetch as Google", it comes up with an empty body.
Reading Google's documentation on how to make the application crawlable, I believe I need to add a meta tag and return a plain html version of the page if the GET parameter ?_escaped_fragment= is sent.
Is there a simple way to do this with Iron-Router? I have tried diverting the browser to a different template if the GET parameter is present, eg:
Router.map(function () {
  this.route('home', {
    path: '/',
    template: 'home',
    onBeforeAction: function () {
      if (this.params['_escaped_fragment']=='') {
        this.route.options.template = 'another_page';
      }
    },
  });
});
However, this just substitutes another template using javascript, which Google won't see either. Is there a way to provide a plain html file if a specific GET parameter is provided?
Add the spiderable package to your project:
meteor add spiderable
This will automatically add the correct <meta> tag to your page, and spiders will be served a PhantomJS-generated version of your site.
Note that the Google Webmaster tools will still show the empty AJAX version of your page in the crawl results, but Google will crawl and index your page correctly. This appears to be a bug in the Webmaster tools. You can verify that the app was successfully crawled by going to your Webmaster tools homepage (where the list of your websites is). Your website should have a screenshot showing what the Google crawler actually saw.
There is a typo:
if (this.params['_escaped_fragment']=='') {
should be
if (this.params['_escaped_fragment_']=='') {
You missed an underscore.
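To make the parameter name easy to verify outside of Iron-Router, here is a minimal sketch that checks a request URL for the _escaped_fragment_ query parameter. It assumes an environment with the standard URL class (modern browsers or Node.js); hasEscapedFragment is a hypothetical helper and the localhost base is only a placeholder so that relative paths can be parsed.

```javascript
// Hypothetical helper: true when the request URL carries the
// _escaped_fragment_ query parameter (note the trailing underscore).
function hasEscapedFragment(requestUrl) {
  // The second argument is a placeholder base so paths like "/?x=1" parse.
  const url = new URL(requestUrl, 'http://localhost');
  return url.searchParams.has('_escaped_fragment_');
}

console.log(hasEscapedFragment('/?_escaped_fragment_=')); // true
console.log(hasEscapedFragment('/?_escaped_fragment='));  // false (missing underscore)
```

The second call shows why the original check never matched: without the trailing underscore it is a different parameter name.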

windows mobile & google analytics

I'm doing a project with the Windows Mobile .NET Framework and need to track every page with Google Analytics. To do that, I create a hidden web browser control on every page, which loads a local HTML page with Google Analytics embedded.
However, whether I use the traditional method or the async method to integrate GA, it throws a JS error ("unspecified error"), although the same HTML works if I put it on a web server.
So instead I built a GA tracking image URL to log the pageview and visitor count, but I found that the cookie cannot be saved, so every page generates a new visitor count.
Any advice?
If I understand your question correctly, you are simply trying to add GA tracking to your WP7 app. If that is the case, you do not have to use a web browser or a tracking image to accomplish this. There is a project over on GitHub that will allow you to do this right in your .xaml pages.
https://github.com/maartenba/GoogleAnalyticsTracker
Get the code, compile it and add a reference to the lib in your phone project, then in your xaml code behind:
Loaded += (s, e) => {
    using (var tracker = new Tracker("UA-XXXXXXXX-X", "appname")) {
        tracker.TrackPageView("MainPage", "Main");
    }
};
Put your GA account info in there for the Tracker object
And presto, you have GA tracking.
HTH
