Puppeteer with very large PDF not waiting until loaded - handlebars.js

Problem: Puppeteer generates a PDF when only about 5% of my data is there.
I'm using Puppeteer to pass about 3000 lines of text to a Handlebars HTML template, which I'm then trying to print to a PDF with Puppeteer. I had this working earlier today, but a Git fiasco made me roll back, and now I can't seem to generate a PDF longer than 3.5 pages (earlier this week it was up to about 90).
I'm thinking this has to do with the following:
const browser = await puppeteer.launch({
  args: ['--no-sandbox'],
  headless: true
});
const page = await browser.newPage();
await page.goto(`data:text/html;charset=UTF-8,${html}`, {
  waitUntil: 'load' // I've also tried networkidle0 and networkidle2
});
await page.pdf(options);
await browser.close();
Here's the template.html:
<!DOCTYPE html>
<html>
<head>
<title>PDF</title>
<style type="text/css">
</style>
<meta charset="utf-8">
</head>
<body>
<ul id="script">
{{#each this}}
<li class={{category}}>{{text}}</li>
{{/each}}
</ul>
</body>
</html>
My data is an array of 3300 objects, and I know it's getting where it needs to. Is there any way to set a static timeout in Puppeteer? I realize this is a lot of data, but am I doing something wrong here?

waitUntil: 'load' is the default for page.goto, so you don't need to set it. The networkidle0 and networkidle2 options wait for network connections to finish; since your page is plain HTML markup with no network requests, they won't help you wait until it is populated with your data either. If you want to use waitUntil, I would rather suggest domcontentloaded. You can check the exact differences between these options in the docs.
I.) Your problem can be solved with a static timeout: page.waitFor. If you are sure all the data will be in the PDF within a certain time, you can set a static timeout, e.g. 3000 milliseconds (3 seconds), before generating the PDF.
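A related detail in the question's goto call: the HTML is interpolated into the data: URL without any encoding, so a '#' anywhere in the text would be read as the start of a URL fragment and everything after it dropped, and a '%' followed by hex digits would be decoded. With 3000 lines of input this is an assumption worth ruling out, not a confirmed diagnosis. The sketch below percent-encodes the payload first; toHtmlDataUrl is a hypothetical helper name. Puppeteer's page.setContent(html) avoids building a URL altogether.

```javascript
// Hypothetical helper: build a data: URL whose HTML payload survives intact.
// encodeURIComponent escapes '#', '%', '?' and other characters that would
// otherwise be interpreted as URL syntax instead of document content.
function toHtmlDataUrl(html) {
  return `data:text/html;charset=UTF-8,${encodeURIComponent(html)}`;
}

// Usage inside the question's async function (requires a Puppeteer page):
//   await page.goto(toHtmlDataUrl(html), { waitUntil: 'load' });
// or skip the URL entirely:
//   await page.setContent(html, { waitUntil: 'load' });
```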
await page.waitFor(3000);
await page.pdf(options);
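page.waitFor has been deprecated in later Puppeteer releases. The same fixed delay can be written without any Puppeteer-specific call; sleep is a hypothetical helper name, and this is only a sketch of option I, not a replacement for waiting on a real condition.

```javascript
// Hypothetical helper: a promise that resolves after `ms` milliseconds.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Usage with a Puppeteer page:
//   await sleep(3000);
//   await page.pdf(options);
```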
II.) If you can access the text value of the very last object, you could also wait for that content to appear. This only works if each <li> element has unique content.
const veryLastItemText = data[data.length - 1].text; // assuming "data" is the array of objects with "category" and "text" properties passed to the template
await page.waitForXPath(`//li[contains(text(), "${veryLastItemText}")]`);
await page.pdf(options);
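The XPath in option II breaks if the last item's text contains a double quote, since the value is pasted between double quotes. XPath 1.0 string literals have no escape syntax, so the usual workaround is to pick whichever quote character the text lacks, or fall back to concat(). A minimal sketch; xpathLiteral is a hypothetical helper.

```javascript
// Hypothetical helper: turn any string into a valid XPath 1.0 string literal.
function xpathLiteral(s) {
  if (!s.includes('"')) return `"${s}"`; // no double quotes: wrap in them
  if (!s.includes("'")) return `'${s}'`; // no single quotes: wrap in those
  // Both quote types present: split on double quotes and rejoin with concat().
  const parts = s.split('"').map((p) => `"${p}"`);
  return `concat(${parts.join(`, '"', `)})`;
}

// Usage with the answer's variable:
//   await page.waitForXPath(`//li[contains(text(), ${xpathLiteral(veryLastItemText)})]`);
```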

Related

Using pick in array Nuxt 3 useFetch

I'm trying to replicate the nuxt 3 useFetch sample but no luck so far.
https://v3.nuxtjs.org/getting-started/data-fetching#usefetch
My post.vue page contains the below code:
<template>
<section>
<div v-for="mountain in mountains" :key="mountain">{{mountain}}</div>
</section>
</template>
<script setup>
const runtimeConfig = useRuntimeConfig()
const { data: mountains, pending, error, refresh } = await useFetch('/mountains',{
baseURL: runtimeConfig.public.apiBase,
pick: ['title']
})
console.log(mountains.value)
</script>
For some reason the data is not shown in template.
The console.log() shows Proxy {title: undefined}
I've realized that removing the pick option solves the issue, so I'm wondering if pick only works for objects and can't be used in arrays.
It's odd because the sample is using an array https://v3.nuxtjs.org/api/composables/use-fetch.
The pick option currently works only if you are fetching a single document (one JS object).
In the official docs you can see they are fetching a specific document in their API: https://nuxt.com/docs/getting-started/data-fetching.
One option you have is to make an API route that returns one object only.
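The {title: undefined} in the console is consistent with pick selecting keys from the top-level response value: an array has no title property. The sketch below roughly emulates that behaviour, then applies the same selection to each element instead; pickKeys and pickEach are hypothetical helpers, not Nuxt APIs. Assuming /mountains returns an array of objects, pickEach could be used through useFetch's transform option, e.g. transform: (mountains) => pickEach(mountains, ['title']).

```javascript
// Roughly what picking keys from a single object looks like.
function pickKeys(obj, keys) {
  const out = {};
  for (const key of keys) out[key] = obj[key];
  return out;
}

// Hypothetical helper: apply the same selection to every array element.
function pickEach(items, keys) {
  return items.map((item) => pickKeys(item, keys));
}
```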
Inside your <div> you are trying to render {{mountain}}, but when the page first loads there is no such variable yet, because mountains hasn't been fetched. What you need to do is add a v-if: <div v-if="mountains" v-for="mountain in mountains" :key="mountain">{{mountain}}</div>
For the brief moment before useFetch returns mountains, the <div> will not try to show {{mountain}}, because the v-if prevents the <div> from being rendered in the first place.

How do I make my firebase database public

I have a database in firebase and I want to make it public like https://publicdata-transit.firebaseio.com/sf-muni
What I see here is that they have a prefix "publicdata". How do I get it?
A publicly accessible read-only dashboard, like the one you're referring to, is only available for apps managed by Firebase themselves. You cannot enable it on your own applications.
This won't do any formatting (you can make it pretty if you want), but it will take your snapshot and put it on the screen for anyone to see, as long as your read rule is set to true.
<html>
<head>
<script src='https://cdn.firebase.com/js/client/2.2.1/firebase.js'></script>
<script src='https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js'></script>
</head>
<body>
<div id='displaySnapshotDiv'></div>
<script>
var myDataRef = new Firebase('https://MY-FIREBASE-NAME-GOES-HERE.firebaseio.com/');
myDataRef.on('value', function(snapshot) {
displaySnapshot(snapshot.val());
});
function displaySnapshot(snapshot) {
$('<div/>').text(JSON.stringify(snapshot)).appendTo($('#displaySnapshotDiv'));
$('#displaySnapshotDiv')[0].scrollTop = $('#displaySnapshotDiv')[0].scrollHeight;
};
</script>
</body>
</html>
If you want it to be a little more readable, you could do something like:
<html>
<head>
<script src='https://cdn.firebase.com/js/client/2.2.1/firebase.js'></script>
<script src='https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js'></script>
</head>
<body>
<div id='displaySnapshotDiv'></div>
<script>
var myDataRef = new Firebase('https://MY-FIREBASE-NAME-GOES-HERE.firebaseio.com/');
myDataRef.on('child_added', function(snapshot) {
displaySnapshotNeatly(snapshot.val());
});
function displaySnapshotNeatly(snapshot) {
$('<div/>').text(JSON.stringify(snapshot)).appendTo($('#displaySnapshotDiv'));
};
</script>
</body>
</html>
Here is the second one working in JSFiddle: https://jsfiddle.net/lukeschlangen/rzfn45pz/
And here is the second one with your firebase data (please tell me the security settings for writing are set to something other than true?): https://jsfiddle.net/lukeschlangen/rzfn45pz/2/
It seems like you might want to do some formatting, but this is displaying all of the data.
The data can be made public if you change your database read rules to true, or you can use an auth token for authentication. Since you do not want to authenticate access, all you need to do is make your access rules public.
For more information, check out: https://firebase.google.com/docs/reference/rest/database/
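With read rules set to true, the data can also be read without the client library through the REST API described at that link, which returns the JSON stored at a path when .json is appended to its URL. A minimal sketch for building such a URL; restUrl is a hypothetical helper, and the database name is a placeholder in the same style as MY-FIREBASE-NAME-GOES-HERE in the snippets above.

```javascript
// Hypothetical helper: REST URL for a Realtime Database path.
// An empty path yields "/.json", which addresses the database root.
function restUrl(databaseName, path = '') {
  const trimmed = path.replace(/^\/+|\/+$/g, ''); // drop leading/trailing slashes
  return `https://${databaseName}.firebaseio.com/${trimmed}.json`;
}

// Usage (browser or Node 18+, public read rules assumed):
//   const data = await (await fetch(restUrl('MY-FIREBASE-NAME-GOES-HERE'))).json();
```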

jsdom does not fetch scripts on local file system

This is how I construct it:
var fs = require("fs");
var jsdom = require("jsdom");
var htmlSource = fs.readFileSync("./test.html", "utf8");
var doc = jsdom.jsdom(htmlSource, {
features: {
FetchExternalResources : ['script'],
ProcessExternalResources : ['script'],
MutationEvents : '2.0'
},
parsingMode: "auto",
created: function (error, window) {
console.log(window.b); // always undefined
}
});
jsdom.jQueryify(doc.defaultView, 'https://code.jquery.com/jquery-2.1.3.min.js', function() {
console.log( doc.defaultView.b ); // undefined with local jquery in html
});
The HTML:
<!DOCTYPE HTML>
<html>
<head></head>
<body>
<script src="./js/lib/vendor/jquery.js"></script>
<!-- <script src="http://code.jquery.com/jquery.js"></script> -->
<script type="text/javascript">
var a = $("body"); // script crashes here
var b = "b";
</script>
</body>
</html>
As soon as I replace the jQuery path in the HTML with an http source, it works. The local path is correctly relative to the working directory of the shell / the actual Node script. To be honest, I don't even know why I need jQueryify, but without it the window never has jQuery, and even with it, it still needs the http source inside the HTML document.
You're not telling jsdom where the base of your website lies. It has no idea how to resolve the relative path you give it (it tries to resolve it against the default about:blank, which doesn't work). This is also the reason why it works with an absolute (http) URL: it doesn't need to know where to resolve from, since the URL is absolute.
You'll need to provide the url option in your initialization to give it the base url (which should look like file:///path/to/your/file).
jQueryify just inserts a script tag with the path you give it; once the reference in the HTML works, you don't need it.
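The base-URL point can be checked with Node's own URL handling, independent of jsdom. The sketch below builds a file:// base for a local HTML file with url.pathToFileURL and shows the question's relative script path resolving against it, and failing against about:blank. baseUrlFor and resolveScript are hypothetical helper names; passing the resulting string as jsdom's url option follows the answer's suggestion and is not verified here against a particular jsdom version.

```javascript
const path = require('path');
const { pathToFileURL } = require('url');

// Hypothetical helper: base URL for a local HTML file (file:///path/to/your/file).
function baseUrlFor(htmlPath) {
  return pathToFileURL(path.resolve(htmlPath)).href;
}

// Hypothetical helper: how a relative <script src> resolves against a base.
// Throws a TypeError when the base (e.g. about:blank) cannot serve as one.
function resolveScript(src, base) {
  return new URL(src, base).href;
}

// Usage with the question's code (assumption):
//   jsdom.jsdom(htmlSource, { url: baseUrlFor('./test.html'), features: { ... } });
```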
I found out. I'll mark Sebmaster's answer as accepted because it solved one of two problems. The other cause was that I didn't properly wait for the load event, so the code after the external scripts hadn't been parsed yet.
What I needed to do was add a load listener to doc.defaultView after the jsdom() call.
The reason it worked when using jQueryify was simply that it created a long enough delay for the embedded script to load.
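The load-listener fix can be wrapped in a promise. A minimal sketch, assuming the window object supports addEventListener (jsdom windows implement EventTarget); waitForLoad is a hypothetical helper. Attach it immediately after creating the document, because a listener added after load has already fired will never be called.

```javascript
// Hypothetical helper: resolve with the target once it fires "load".
function waitForLoad(target) {
  return new Promise((resolve) => {
    target.addEventListener('load', () => resolve(target), { once: true });
  });
}

// Usage with the question's document (old jsdom API assumed):
//   await waitForLoad(doc.defaultView);
//   console.log(doc.defaultView.b); // should now see the inline script's variable
```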
I had the same issue when passing a relative path to the jQuery library to the jQueryify function, and I solved it by providing the full path instead.
const jsdom = require('node-jsdom')
// Absolute path to jQuery, so nothing depends on resolving a relative URL
const jqueryPath = __dirname + '/node_modules/jquery/dist/jquery.js'
const window = jsdom.jsdom().parentWindow
jsdom.jQueryify(window, jqueryPath, function() {
window.$('body').append('<div class="testing">Hello World, It works</div>')
console.log(window.$('.testing').text())
})

Meteor Iron:Router Template not Rendering

I have a main page which lists a few text items ("Ideas"), which are clickable links. Clicking on them should take you to a page where you can edit them. Here's my html:
<head>
<title>Ideas</title>
</head>
<body>
</body>
<template name="Ideas">
<ul>
{{#each ideas}}
{{> idea}}
{{/each}}
</ul>
</template>
<template name="idea">
<li>{{text}}</li>
</template>
<template name="ShowIdea">'
<div class="editable" contentEditable="true">{{text}}</div>
</template>
I've added Iron:Router to my project to allow for moving between the pages. Here's the javascript:
Ideas = new Mongo.Collection("ideas");
if (Meteor.isClient) {
Router.route('/', function() {
this.render('Ideas');
});
Router.route('/idea/:_id', function() {
var idea = Ideas.findOne({_id: this.params._id});
this.render('ShowIdea', {text: idea.text});
});
Template.Ideas.helpers({
ideas: function () {
return Ideas.find({});
}
});
}
I inserted a single idea to my Mongo DB using the Meteor Mongo command line tool. That single item shows up properly on my main page. Here's what the HTML looks like in my debugger for the main page:
<html>
<head>...</head>
<body>
<ul>
<li>
The first idea ever
</li>
</ul>
</body>
</html>
Clicking on that link takes me to a new page with an address of:
http://localhost:3000/idea/ObjectID(%22550b7da0a68cb03381840feb%22)
But nothing shows up on the page. In the debugger console I see this error message + stack trace, but it means nothing to me since it all seems to be pertaining to iron-router and meteor, not code which I actually wrote:
Exception in callback of async function: http://localhost:3000/Idea.js?2fd83048a1b04d74305beae2ff40f2ea7741d40d:10:44
boundNext#http://localhost:3000/packages/iron_middleware-stack.js?0e0f6983a838a6516556b08e62894f89720e2c44:424:35
http://localhost:3000/packages/meteor.js?e53378596562e8922a6369c955bab1e047fa866b:978:27
onRerun#http://localhost:3000/packages/iron_router.js?a427868585af16bb88b7c9996b2449aebb8dbf51:520:13
boundNext#http://localhost:3000/packages/iron_middleware-stack.js?0e0f6983a838a6516556b08e62894f89720e2c44:424:35
http://localhost:3000/packages/meteor.js?e53378596562e8922a6369c955bab1e047fa866b:978:27
onRun#http://localhost:3000/packages/iron_router.js?a427868585af16bb88b7c9996b2449aebb8dbf51:505:15
boundNext#http://localhost:3000/packages/iron_middleware-stack.js?0e0f6983a838a6516556b08e62894f89720e2c44:424:35
http://localhost:3000/packages/meteor.js?e53378596562e8922a6369c955bab1e047fa866b:978:27
dispatch#http://localhost:3000/packages/iron_middleware-stack.js?0e0f6983a838a6516556b08e62894f89720e2c44:448:7
_runRoute#http://localhost:3000/packages/iron_router.js?a427868585af16bb88b7c9996b2449aebb8dbf51:543:17
dispatch#http://localhost:3000/packages/iron_router.js?a427868585af16bb88b7c9996b2449aebb8dbf51:844:27
route#http://localhost:3000/packages/iron_router.js?a427868585af16bb88b7c9996b2449aebb8dbf51:710:19
boundNext#http://localhost:3000/packages/iron_middleware-stack.js?0e0f6983a838a6516556b08e62894f89720e2c44:424:35
http://localhost:3000/packages/meteor.js?e53378596562e8922a6369c955bab1e047fa866b:978:27
boundNext#http://localhost:3000/packages/iron_middleware-stack.js?0e0f6983a838a6516556b08e62894f89720e2c44:371:18
http://localhost:3000/packages/meteor.js?e53378596562e8922a6369c955bab1e047fa866b:978:27
dispatch#http://localhost:3000/packages/iron_middleware-stack.js?0e0f6983a838a6516556b08e62894f89720e2c44:448:7
http://localhost:3000/packages/iron_router.js?a427868585af16bb88b7c9996b2449aebb8dbf51:390:21
_compute#http://localhost:3000/packages/tracker.js?21f0f4306879f57e10ad3a97efe9ea521c5b5775:308:36
Computation#http://localhost:3000/packages/tracker.js?21f0f4306879f57e10ad3a97efe9ea521c5b5775:224:18
autorun#http://localhost:3000/packages/tracker.js?21f0f4306879f57e10ad3a97efe9ea521c5b5775:499:34
http://localhost:3000/packages/iron_router.js?a427868585af16bb88b7c9996b2449aebb8dbf51:388:17
nonreactive#http://localhost:3000/packages/tracker.js?21f0f4306879f57e10ad3a97efe9ea521c5b5775:525:13
dispatch#http://localhost:3000/packages/iron_router.js?a427868585af16bb88b7c9996b2449aebb8dbf51:387:19
dispatch#http://localhost:3000/packages/iron_router.js?a427868585af16bb88b7c9996b2449aebb8dbf51:1688:22
onLocationChange#http://localhost:3000/packages/iron_router.js?a427868585af16bb88b7c9996b2449aebb8dbf51:1772:33
_compute#http://localhost:3000/packages/tracker.js?21f0f4306879f57e10ad3a97efe9ea521c5b5775:308:36
_recompute#http://localhost:3000/packages/tracker.js?21f0f4306879f57e10ad3a97efe9ea521c5b5775:322:22
flush#http://localhost:3000/packages/tracker.js?21f0f4306879f57e10ad3a97efe9ea521c5b5775:452:24
And then it ends with this warning message:
Route dispatch never rendered. Did you forget to call this.next() in an onBeforeAction?
I don't have an onBeforeAction (I'm not even sure what that is)... so I don't think that message pertains to me?
I just started using Meteor the other day and just added iron-router not 24 hours ago, so I'm a bit lost here. Any pointers on how I can debug and fix this would be great.
Two things need fixing:
When you insert documents from the shell they are assigned _id values which are mongo ObjectIDs, whereas meteor defaults to using strings. This explains the weird URL. To avoid this problem, it's generally best to initialize your data from the server. Here's an example:
if (Meteor.isServer) {
Meteor.startup(function() {
if (Ideas.find().count() === 0) {
Ideas.insert({text: 'feed the cat'});
}
});
}
Now, after running $ meteor reset, you will always start with one cat-related idea.
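For documents already inserted from the shell, the route parameter arrives as the text ObjectID("..."), as in the URL from the question. A sketch, assuming you keep those documents: hexFromRouteParam, a hypothetical helper, extracts the 24-character hex string, which Meteor can wrap as new Mongo.ObjectID(hex) for the lookup. Seeding from the server as shown avoids the need for this entirely.

```javascript
// Hypothetical helper: pull the hex id out of a param like ObjectID("550b...").
// Returns null when the param is not in that form (e.g. a normal string _id).
function hexFromRouteParam(param) {
  const decoded = decodeURIComponent(param); // %22 -> "
  const match = /^ObjectID\("([0-9a-f]{24})"\)$/i.exec(decoded);
  return match ? match[1] : null;
}

// Usage inside the route (Meteor client, assumption):
//   var hex = hexFromRouteParam(this.params._id);
//   var idea = hex ? Ideas.findOne(new Mongo.ObjectID(hex)) : Ideas.findOne(this.params._id);
```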
If you wish to pass a context to your template, you'll need to use the data attribute like so:
Router.route('/idea/:_id', function() {
this.render('ShowIdea', {
data: function () {return Ideas.findOne({_id: this.params._id})}
});
});
See this example from the docs. After making those changes, the code worked correctly for me.

parsing html and following a javascript link

An academic colleague has asked me to extract info from a website where I need to link the content of a webpage in a table (not too hard) with the contents of a text file, which is only reachable (as far as I can tell) by clicking on a JavaScript link, e.g.
<a id="tk1" href="javascript:__doPostBack('tk1$ContentPlaceHolder1$grid$tk$OpenFileButton','')">
The link is conveniently inside a table with id='tk1', which is nice... but how do I follow the link that pulls the text file?
Ideally I'd like to do this in R... I can grab the relevant table in text format by saying
u <- "..."  # the URL of interest
library(XML)
tables = readHTMLTable(u)
interestingTable <- tables[grep('tk1', names(tables))]
This gives the text in the table, but how do I grab the HTML for that particular table? And how do I "click" the button and get the text file behind it?
I note that there is a form with massive hidden values; the site appears to be ASP.NET-driven and uses impenetrable URLs.
Many thanks!
This is somewhat tricky, and not fully integrated in R, but some system()-fiddling will get you started.
Download and install PhantomJS: http://code.google.com/p/phantomjs/
Check the short page at http://menne-biomed.de/uni/JavaButton.html, which emulates your case. When you click the JavaScript anchor, it redirects to http://cran.at.r-project.org/ via doPostBack(inaccesibleJavascriptVar).
Save the following script locally as javabutton.js
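var page = require('webpage').create();
page.open('http://www.menne-biomed.de/uni/JavaButton.html', function (status) {
  if (status !== 'success') {
    console.log('Unable to access network');
  } else {
    var ua = page.evaluate(function () {
      // e.g. "javascript:doPostBack(inaccesibleJavascriptVar)"
      var t = document.getElementById('tk1').href;
      // Capture the argument between the outer parentheses (regex literal,
      // so the backslashes reach the regex engine)...
      var re = /\((.*)\)/;
      // ...and evaluate it in the page to obtain the hidden URL
      return eval(re.exec(t)[1]);
    });
    console.log(ua); // Outputs http://cran.at.r-project.org/
  }
  phantom.exit();
});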
With phantomjs on your PATH, call
phantomjs javabutton.js
The link will be displayed on the console. Use any method to get it into RCurl.
Not elegant, but maybe someone will wrap PhantomJS into R one day. In case the link to JavaButton.html is lost, here it is as code:
<!DOCTYPE html>
<html>
<head>
<script>
inaccesibleJavascriptVar = 'http://' + 'cran.at.r-project.org/';
function doPostBack(myref)
{
window.location.href= myref;
return false;
}
</script>
</head>
<body>
<a id="tk1" href="javascript:doPostBack(inaccesibleJavascriptVar)" >Click here</a>
</body>
</html>
Have a look at the RCurl package:
http://www.omegahat.org/RCurl/
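For the real ASP.NET link in the question, __doPostBack(target, argument) normally copies its two arguments into the hidden __EVENTTARGET and __EVENTARGUMENT form fields and submits the form, so replaying the click without a browser means POSTing those two values together with the page's other hidden fields (such as __VIEWSTATE), for example via RCurl. A minimal sketch of the first step parses the arguments out of the href; parsePostBack is a hypothetical helper that handles only single-quoted arguments.

```javascript
// Hypothetical helper: extract __EVENTTARGET / __EVENTARGUMENT values
// from an href like javascript:__doPostBack('target','argument').
function parsePostBack(href) {
  const match = /__doPostBack\('([^']*)'\s*,\s*'([^']*)'\)/.exec(href);
  return match ? { eventTarget: match[1], eventArgument: match[2] } : null;
}
```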
