Quickest way to download HTML source run through a JS engine? - web-scraping

I'm looking for a quick way to download HTML from a URL. The page has to be processed by a JS engine first so curl won't cut it. I can do:
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless --disable-gpu --dump-dom https://www.nytimes.com/
But that is very slow (5 seconds or more); I suspect Chrome has a lot of startup overhead.
I'd like something that works in under a second (assuming the web server is fast).
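Before comparing alternatives, it helps to measure the baseline from a script. Below is a minimal Python sketch that wraps the exact command above. It assumes the macOS Chrome path from the question, and other platforms need a different path. `chrome_dump_cmd` and `dump_dom` are hypothetical helper names, and the 30-second timeout is an arbitrary choice.

```python
import subprocess
import time

# macOS path from the question; other platforms need a different path (assumption).
CHROME = "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"


def chrome_dump_cmd(url, chrome=CHROME):
    """Build the argument list for Chrome's headless --dump-dom mode."""
    return [chrome, "--headless", "--disable-gpu", "--dump-dom", url]


def dump_dom(url, timeout=30.0):
    """Run headless Chrome once and return (serialized DOM, elapsed seconds)."""
    start = time.perf_counter()
    # Passing a list (no shell) avoids quoting the space in the app path.
    result = subprocess.run(
        chrome_dump_cmd(url),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return result.stdout, time.perf_counter() - start
```

Calling `dump_dom("https://www.nytimes.com/")` returns the HTML along with the wall-clock time. That time includes Chrome's startup, which is the overhead the question suspects.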

This little Python package works great: https://pypi.org/project/cloudscraper/


IE11 custom toolbar buttons - scripting

This is about IE11, which I know has been deprecated. But I'm on Win7 and it's still a tool I use.
Here's the issue:
I have a task that's boring and can be automated on certain external web pages.
I have created a button within the IE11 Toolbar using the approach described on this page. Unfortunately, that page doesn't provide any guidance about what language or file extension should be used for the actual script.
The button does appear in the toolbar, it finds the file to be executed, and the IE11 console says it has "navigated there" when I press the button. But the script does not actually execute.
I have tried the file extensions and languages for .bat, .vbs, .js, .wsf, and .htm, just trying to put up a "hello world" message, and nothing works from the browser button, even though the scripts execute properly from the command line or URL.
I have relaxed IE11's security settings so that it shouldn't be blocking anything. The only IE console messages are the informational codes HTML1300 and DOM7011.
So what scripting language/file format will actually work in this use case?
Well, after only a few hours I figured out the only thing that seems to work: the "script" has to be an HTML file, and the only language you can use is JavaScript. No, you can't execute a .js file from the button; it has to be .htm.

How to debug LESS in Chrome?

It looks as though LESS debugging has come a decent distance since even a year ago, and I was wondering how many people have experience with debugging using developer tools in Chrome/Canary.
I'm trying to ensure that when I'm debugging a file, the element's CSS shows up as coming from the LESS file rather than the CSS file. CSS line numbers are of little use when I need the corresponding line number in the LESS file. I can do this in Firefox with Firebug and FireLESS, but it's not working in Chrome.
I tried to follow the steps here, but it doesn't work for me even after following the instructions carefully.
I'm running OS X, have LESS installed via Node.js, and am using the Sublime Text 2 plugin Less2CSS to compile the LESS file on save. Running lessc --line-numbers=mediaquery style.less style.css works as expected and writes this to the top of my CSS file: @media -sass-debug-info{filename{font-family:file\:\/\/\/Applications\/XAMPP\/xamppfiles\/htdocs\/sandbox\/lessDebug\/style\.less}line{font-family:\000035}}. However, when I inspect an element, Chrome still only shows the CSS file, not the LESS file.
I have the required Chrome settings turned on (Support for SASS and Enable Source Maps).
Thoughts?
This now works fine with less.js 1.5b4 and Chrome 30.0.1599.69.
Basically, you need to make sure lessc writes a valid source map URL at the end of your CSS file:
/*# sourceMappingURL=/templates/lwks/css/template.css.map */
and that the browser actually loads the .css.map file. If it still doesn't work for you, check in chrome://flags that Enable Developer Tools experiments is on.
More details here: https://github.com/less/less.js/issues/1050
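The first condition in this answer can be checked with a small Python helper that extracts the sourceMappingURL comment from the end of the compiled CSS. This is a sketch and is not part of lessc. The name `source_map_url` is hypothetical, and accepting the older `/*@` prefix is an assumption about files built with earlier tools.

```python
import re

# Matches a trailing source map comment such as
#   /*# sourceMappingURL=/templates/lwks/css/template.css.map */
# The older "/*@" prefix is accepted too (assumption); the comment must end the file.
SOURCE_MAP_RE = re.compile(r"/\*[#@]\s*sourceMappingURL=(\S+?)\s*\*/\s*\Z")


def source_map_url(css_text):
    """Return the source map URL declared at the end of compiled CSS, or None."""
    match = SOURCE_MAP_RE.search(css_text)
    return match.group(1) if match else None
```

A `None` result means lessc did not emit the comment. A URL means the remaining step is confirming the browser can actually fetch that path.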
Blog post author here. I've gone back and updated my post so it now works with regular Chrome 26. I just checked in Canary and it doesn't seem to work anymore, so Chrome 24-26 are fine but Canary is broken.
I don't think the issues you refer to are related.
As I understand it, you compile your LESS file on the server side and just want to retrieve the new CSS file rather than the cached one. Am I right?
Have you tried disabling the cache in Google Chrome?

Print a file from Watir

I'm looking at Watir-Webdriver to manipulate a browser. In particular, I'd like to open a local file and print it to a PDF file.
Yes, wkhtmltopdf would be a good option, but it's not working for me on Debian Squeeze, for reasons that are difficult to ascertain. The page contains JavaScript, which rules out many HTML-to-PDF options. wkhtmltopdf works on OS X with the same version (0.9.9), so I know it's not a problem with how I'm using it (via PDFKit and Ruby). I'd just like to sidestep these issues and try a different approach. Opening the page in Chromium on Debian shows it perfectly rendered.
How does one "print" from Watir?
Edit: After more reading, I think there is no way to do this.
You could take a PNG screenshot, then use the prawn gem to convert it to a PDF:
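require 'prawn'
require 'watir-webdriver'
# Open the page and save what the browser currently renders as a PNG
b = Watir::Browser.start 'watirwebdriver.com'
b.driver.save_screenshot 'screenshot.png'
# Embed the screenshot in a new PDF, scaled to half size to fit the page
Prawn::Document.generate 'screenshot.pdf' do
  image 'screenshot.png', :scale => 0.5
end
b.close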
You'll need to use something that lets you do automation at the OS level, such as AutoIt or maybe RAutomation. I'm not sure what exists to do this on *nix operating systems.
Watir only drives the browser in terms of what is inside the browser window, it has very limited capability to work the menus of the browser itself.

Highcharts running off local directory

I am working with a js library named Highcharts and it is only working some of the time.
In all cases the links to the library are relative, and I have proven that the code is not at fault (at least, not the code I have written). I am also loading a separate library, which works fine on all platforms.
If I load the page in IE the address is:
C:\path\to\file
If I upload it, then:
http://www.foo-bar.com/path/to/file
These both work.
However, if I open it locally in FF or Chrome, then the path is:
file:///C:/path/to/file
This last one doesn't work. Does anyone know why this might be?
Thanks in advance.
The problem here was Access-Control-Allow-Origin. My solution was to load the data as inline JSON rather than linking to a CSV file containing the data.
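One way to automate the inline-JSON workaround is to convert the CSV ahead of time into a script tag that the page includes directly. The page then makes no data request under file:// at all. Below is a minimal Python sketch. `csv_to_inline_script` and the `chartData` global are hypothetical names, and converting every parseable cell to a number is an assumption about the data.

```python
import csv
import io
import json


def csv_to_inline_script(csv_text, var_name="chartData"):
    """Turn CSV text into a <script> tag that defines the rows inline as JSON.

    var_name is a hypothetical global for the chart code to read.
    Cells that parse as numbers become floats; everything else stays a string.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for key, value in record.items():
            try:
                row[key] = float(value)
            except (TypeError, ValueError):
                row[key] = value
        rows.append(row)
    # Escape "</" so data values can never close the surrounding script tag.
    payload = json.dumps(rows).replace("</", "<\\/")
    return f"<script>var {var_name} = {payload};</script>"
```

The Highcharts options can then build their series from `chartData` instead of fetching the CSV. This avoids the cross-origin request that failed when the page was opened from file:// in Firefox and Chrome.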

response.write only working IE for ASP.NET

I'm using Uploadify (http://www.uploadify.com/) to upload videos to my site, convert them to *.flv with ffmpeg, and play a preview. But it doesn't fully work in Firefox, Chrome, or Safari.
Uploadify provides an onComplete interface: in the script (.ashx, .php) your site uses to save uploaded files, you can call response.write("blabla") (or echo "blabla") to invoke the JavaScript function registered as onComplete.
I have tested with a few video files (avi, mpg, mp4) smaller than 50 MB, and they all worked in all 4 browsers. However, when I tried to upload a 75 MB mp4 file, it worked in IE but not in the other three. I can see that the .flv file has been created in the upload folder, and I can see the debug message output after response.write("blabla"), but the JavaScript function is not invoked, i.e. the preview doesn't play.
Does anyone know why? Is there a timeout or something on response.write, so that after a certain period it no longer works? For example, the 75 MB file took longer to convert than the smaller files I tried.
Thanks
Could be a server timeout or a caching issue. Or an incorrect Uploadify property, as stated here.
After looking deeper into the source code and googling around, I narrowed the problem down to DataEvent.UPLOAD_COMPLETE_DATA and Firefox issues.
Someone reported a bug:
http://bugs.adobe.com/jira/browse/FP-1419
