Download complex 3D JS/Canvas webpage for offline use - web-scraping

I'm attempting to download a truly offline version of this interactive 3D model of a home, as seen on this webpage: https://my.matterport.com/show/?m=bDbFD5mSEb5&play=1&lang=en
Because it loads dynamically, all the basic offline downloaders won't work.
I'm wondering if there's a way to extract a browser's cache, after all the page's content has downloaded? But all modern browsers seem to cache in strange encrypted DB files, and it's not easy to convert that into basic HTML/CSS/JS.
Is there a more sophisticated tool to extract a dynamic website? Ideally I'd end up with just pure HTML, CSS, and JS, as it's only an HTML5 webpage that loads images onto a canvas. Definitely possible in theory.
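One possible route, which is not from the original post, is to let the browser load everything and then export the traffic from the developer tools' network panel as a HAR file with content included. A HAR file is plain JSON, and each entry stores the response body as text or base64. The sketch below unpacks such a file into a directory tree; the function names and the output layout are assumptions made for illustration.

```python
import base64
import json
from pathlib import Path
from urllib.parse import urlsplit


def har_entry_to_path(url: str, out_dir: Path) -> Path:
    """Map a request URL to a local file path under out_dir (query string dropped)."""
    parts = urlsplit(url)
    rel = parts.path.lstrip("/")
    # Treat directory-style URLs as index.html so they get a file name.
    if not rel or rel.endswith("/"):
        rel += "index.html"
    return out_dir / parts.netloc / rel


def extract_har(har_path: Path, out_dir: Path) -> list[Path]:
    """Write every response body stored in the HAR file into out_dir."""
    har = json.loads(har_path.read_text(encoding="utf-8"))
    written = []
    for entry in har["log"]["entries"]:
        content = entry["response"].get("content", {})
        text = content.get("text")
        if text is None:
            continue  # the body was not captured for this request
        if content.get("encoding") == "base64":
            data = base64.b64decode(text)
        else:
            data = text.encode("utf-8")
        target = har_entry_to_path(entry["request"]["url"], out_dir)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
        written.append(target)
    return written
```

This only recovers the files. Requests that differ only in their query string overwrite each other here, and a viewer that fetches data at runtime would still need a local web server and possibly rewritten URLs before it works offline.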

Related

Convert VML graphic to bitmap on the ASP.NET server

I'm using RaphaelJS. The browser renders a chart for me (which is in VML, as all my users are on IE). I want the user to be able to save this image and share it normally, e.g. paste it into an email, PowerPoint, a document, etc.
The problem is that not many things can render VML. I can easily get the VML markup describing the image back to the server. All I want to do is convert it to a more universal format (e.g. PNG, BMP, GIF, whatever) which I can then allow the user to download.
I've seen lots of people struggling with this. I would have thought that, seeing as VML is Microsoft's proprietary SVG-like format, they might have at least provided facilities within their own languages (C#, VB.NET) to convert VML to bitmaps.
Anyone know how?
(Incidentally I can't use PHP - I've seen a lot of people attempting to solve this with a PHP based solution)
Thanks :)
You can use the Internet Explorer ActiveX control and screenshotting techniques to achieve this. See the open source tool IECapt as an example.
You could also evaluate whether a cloud service like Litmus API could be used.

Localize Images in ASP.NET

A couple of years ago, we had a graphic designer revamp our website. His results looked great, but he unfortunately introduced a new font that web browsers don't support.
At first I was like, "What!?!"... since most of our content is dynamic and there was no real way to pre-make all of the images. There was also the issue of multiple languages (since we knew Spanish was on the horizon).
Anyway, I decided to create some classes to auto-generate images via GDI+ and programmatically cache them as needed. This solved most of our initial problems. However, now that our load has increased dramatically, there has been a drain on our UI server.
Now to the question... I am looking to replace most of the dynamic GDI+ images with a standard web browser font. I am thinking of keeping some of the rendered GDI+ images and putting them in a resx file, but plan to replace most of them with Tahoma or Arial fonts via asp:Labels.
Which have you found to be a better localized image solution?
Embedding images into the resx
Only adding the image url into the resx
Some other solution
My main concern is to limit the processing on the UI server. If that is the case, would adding the image url to the resx be a better solution compared to actually embedding the image into the resx?
You should only need to generate each image once, and then save it on the hard disk. The load on your site shouldn't increase the amount of processing you have to do. That being said, it almost sounds like you are using images for things you shouldn't be. If there are so many different images that you can't keep up with generating them, it's time to abandon your fancy images for things that shouldn't be images, and go back to straight text. If the user doesn't have the specified font installed, it should just fall back to a similar looking font. CSS has good support for this.
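The "generate once, then save it on the hard disk" idea can be sketched as below. This is a language-neutral illustration in Python rather than the poster's GDI+ code: `cached_image` and the `render` callback are hypothetical names, and `render` stands in for whatever actually draws the text.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_image(text: str, font: str, cache_dir: Path,
                 render: Callable[[str, str], bytes]) -> Path:
    """Return the path of the rendered image, rendering it only on a cache miss."""
    # The cache key covers everything that changes the output (text and font here).
    key = hashlib.sha256(f"{font}\n{text}".encode("utf-8")).hexdigest()
    target = cache_dir / f"{key}.png"
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        target.write_bytes(render(text, font))
    return target
```

With a cache like this, the rendering cost grows with the number of distinct images rather than with the number of requests.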
see my response here
This can be done manually or using some sort of automated (CMS) system.
The basic method is to cache your images in a language-specific directory structure and then write an HTTP handler that effectively removes the additional directory layer, e.g.:
/images/
    /en/
        header1.gif
    /es/
        header1.gif
In your markup or CSS you would just reference /images/header1.gif. The HTTP handler then uses session (if language is user specific), or config (if site specific) to choose which directory to serve the image from.
This provides a clean line between code and content, and allows for client-side caching. Resx is great for small strings, but I much prefer a system like this for images and larger content, especially on the web where it is typically easy to switch images around.
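The path mapping such a handler performs can be sketched as follows. It is written in Python for brevity rather than as an ASP.NET handler, and the function name, the language set and the fallback to a default language are all assumptions added for illustration.

```python
from pathlib import Path

SUPPORTED_LANGUAGES = {"en", "es"}   # assumed set of language directories
DEFAULT_LANGUAGE = "en"              # assumed fallback language


def resolve_localized_image(requested: str, language: str, image_root: Path) -> Path:
    """Map /images/header1.gif to <image_root>/<language>/header1.gif."""
    prefix = "/images/"
    if not requested.startswith(prefix):
        raise ValueError(f"not an image request: {requested}")
    name = requested[len(prefix):]
    # Refuse absolute or parent-directory paths so requests stay inside image_root.
    if name.startswith("/") or ".." in Path(name).parts:
        raise ValueError(f"unsafe image path: {requested}")
    lang = language if language in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE
    candidate = image_root / lang / name
    # If an image has no translated version, serve the default-language file.
    if not candidate.is_file():
        candidate = image_root / DEFAULT_LANGUAGE / name
    return candidate
```

The `language` argument would come from the session or from site configuration, as described above.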
I had the same problem a few years back and our interface team pointed us to sIFR. http://wiki.novemberborn.net/sifr/
You embed your font into a Flash movie and then use the sIFR JavaScript to dynamically convert your text into your font. Because it's client-side, there is no server-side impact.
If the user doesn't have Flash or JavaScript installed, they get the closest web-friendly font.
As an added bonus: because your content is still Text -- Google can search and index the content -- a huge SEO optimization.
Because of caching, I'd rather add only the image URL into the resx. Caching works much better for static content (i.e. plain files) than for generated content.
I'd be very cautious about putting text in images at all. CSS with an appropriate font-family fallback is probably the correct response on accessibility and good MVC grounds.
Where generation really is required, I think Kiblee and JayArr outline good solutions.

Any way to build Google Docs like viewer for PDF files?

Does anyone think it is possible to build a Google Docs style PDF document viewer, which will convert a document to a format that doesn't require Adobe Reader on the client machine?
If so, any references to point to? Either a place that had done it, or an explanation of how to do it.
I've done a lot of research regarding this matter and I hope I can help.
Good old Macromedia used to market Flash Paper, which was supposed to be a PDF Adobe Reader killer as it allowed any webmaster to embed and display PDF docs online using Flash. But that was before they sold out to Adobe and Flash Paper was soon put on a shelf and forgotten in favor of Adobe's priorities.
However, today there are so many ground-breaking alternatives...
As a user has mentioned above you can use Scribd.com (the wanna-be YouTube for documents). But they're not the only service (and certainly not the ones most ahead of the curve).
Here are my two favorites:
Issuu (http://www.issuu.com)
Mygazines (http://www.mygazines.com/)
I enjoy Mygazines's Flash user interface the most (it's also faster), but it costs $99. It's pretty impressive. Depending on what you want to do, that price tag can be worth it.
Issuu however, has won me over recently with their Smartlook Platform: http://issuu.com/smartlook
Here's a sample of Smartlook setup on a website:
http://www.ismartlook.com/
Plus it's completely free, which is nice.
A third alternative, which I've considered using myself is this free and open source code made by this guy named samurajdata. He calls it psview (PostScript Viewer). Anyone can download the source code and see it in action here:
http://view.samurajdata.se/
The converted PDFs lose quality because they are converted to image files, but it's fast and easy to set up.
I hope this helps!
You may try Doconut.com, which looks much the same as the Google Docs viewer. It is available for ASP.NET 4.0, and apart from PDF it can also show all Office formats, TIFF, DWG, PSD, etc. However, it is a paid library.
If I understand you correctly you only want to view these files and not edit them.
Google already makes a best effort at providing PDF files found in its search results as HTML. This doesn't always work. You can try it out by setting up a Gmail account, mailing all your PDF files to it, and then using the "View attachment as HTML" links in the messages.
Your other options are to take the source material and make it into HTML as say LaTeX2HTML does for LaTeX documents, or to convert the PDF into one of: a raster image (tiff, DjVu, etc), or a vector image (PostScript, SVG, SWF).
If the input to this process starts with the PDF files, you have very limited options, especially if the contents of the PDFs are just raster images (say scanned pages).
Personally I'd advocate for creating the PDFs from their source and trying to use Flash Paper to create an SWF out of them too, as Flash Paper will pretend to be a printer and some 98% of browsers have Flash 9 or greater.
Have you seen Scribd?
You can just use the Google Docs Viewer which also supports PDF documents. It allows you to embed it in your web page and point to the URL where the PDF is located (which doesn't have to be on the Google servers).
Example:
http://docs.google.com/viewer?embedded=true&url=http%3A%2F%2Fwww.domain.com%2Fdocument.pdf
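The only non-obvious part of the example URL is that the document address has to be percent-encoded before it goes into the `url` parameter. A minimal sketch of building such a link, where `viewer_url` is a hypothetical helper name and the viewer address is taken from the example above:

```python
from urllib.parse import quote

VIEWER_BASE = "http://docs.google.com/viewer?embedded=true&url="


def viewer_url(pdf_url: str) -> str:
    """Build an embeddable viewer link for a publicly reachable PDF."""
    # safe="" also encodes ":" and "/", matching the example above.
    return VIEWER_BASE + quote(pdf_url, safe="")


print(viewer_url("http://www.domain.com/document.pdf"))
# http://docs.google.com/viewer?embedded=true&url=http%3A%2F%2Fwww.domain.com%2Fdocument.pdf
```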
There is the Internet Archive BookReader available. It's a nice book viewer implemented in JavaScript (jQuery), so the client needs neither a PDF reader nor Flash. Though it needs images for the book pages, you can easily connect it to your own image server, so you may try to convert a PDF to images via ASP.NET (or any other tool like XPDF). I found that this is simpler than implementing an image viewer from scratch.
Also, it seems to support search highlighting (try it here), but I haven't investigated exactly which metadata are needed and in what format.
The last release file contains a simple example on how to use it. More details and examples can be found in the first link.
Try converting them from PDF to TIFF. TIFF supports multiple pages and is widely supported.
If formatting isn't that important, and your PDFs are structured right (i.e. actually contain text, not images of text), an alternative could be to convert to HTML. The tools from Aspose are pretty good.
I'm wondering why you would want to do that. PDF is such a general and widely supported format that if you try to avoid it you're limited to:
A more obscure or less well supported format (dvi, svg until it gets better support)
Converting to text/HTML like Google does with less than perfect results
Converting to an image format like TIFF which bumps up file sizes and removes all the niceties of PDF like real, selectable text and hyperlinks
If you don't want your users to have to install Adobe Reader (understandable), there are many free lightweight PDF viewers available (Foxit Reader, for example). I'm sure many of these have browser embedding capabilities.
Am I missing something here? Google Docs DOES support PDF. Simply upload the PDF file.
Some other alternatives depending upon what you're looking to do:
RAD PDF - ASP.NET component for displaying PDF documents, forms, etc. Also allows PDF searching, bookmarks, text selection, and basic editing.
Atalasoft - ASP.NET component for image viewing, but also allows PDF use as an image. Doesn't support any PDF features beyond simple viewing.

Linking directly to a SWF, what are the downsides?

Usually Flash and Flex applications are embedded in HTML using either a combination of object and embed tags, or more commonly using JavaScript. However, if you link directly to a SWF file it will open in the browser window, and without looking in the address bar you can't tell that it wasn't embedded in HTML with the size set to 100% width and height.
Considering the overhead of the HTML, CSS and JavaScript needed to embed a Flash or Flex application filling 100% of the browser window, what are the downsides of linking directly to the SWF file instead? What are the upsides?
I can think of one upside and three downsides: you don't need the 100+ lines of HTML, JavaScript and CSS that are otherwise required, but you have no plugin detection, no version checking and you lose your best SEO option (progressive enhancement).
Update: don't get hung up on the 100+ lines. I simply mean that the amount of code needed to embed a SWF is quite a lot (and I mean including libraries like SWFObject), and it's just for displaying the SWF, which can be done without a single line by linking to it directly.
Upsides for linking directly to SWF file:
Faster access
You know it's a flash movie even before you click on the link
Skipping the html & js files (You won't use CSS to display 100% flash movie anyway)
Downsides:
You have little control on movie defaults.
You can't use custom background colors, transparency etc.
You can't use flashVars to send data to the movie from the HTML
Can't use fscommand from the movie to the page
Movie proportions are never the same as the user's window's aspect ratio
You can't compensate for browser incompatibility (The next new browser comes out and you're in trouble)
No SEO
No page title, bad if you want people to bookmark properly.
No plugin information, download links etc.
If your SWF connects to external data sources, you might have cross domain problems.
Renaming the SWF file will also rename the link. Bad for versioning.
In short, for a complicated application - always use the HTML. For a simple animation movie you can go either way.
You also lose external control of the SWF. When it's embedded in HTML you can use javascript to communicate with the SWF. If the SWF is loaded directly that may not be possible.
Your 100+ lines quote seems pretty high to me. The HTML that FlashDevelop generates for embedding a SWF is only around 35 lines, with an include of a single swfobject.js file. You shouldn't need to touch the js file, and at the most would only have to tweak the HTML in very minor ways to get it to do what you want.
In my experience not all browsers handle this properly. I'm not really sure why (or which browsers) but I've mistakenly sent links like this to clients on occasion and they've often come back confused. I suspect their browser prompts them to download the file instead of displaying it properly.
One upside I can think of is being able to specify GET parameters in the direct URL to the SWF, which will then be available in the Flash app (via Application.application.parameters in Flex, not sure how you'd access them in Flash CS3). This can of course be achieved by other means as well if you have an HTML wrapper but this way it's less work.
Why would you need 100+ lines of code? Using something like swfobject reduces this amount considerably (and generally you don't want to do plugin detection, etc. by hand anyway).
More advantages:
Lightweight look, because you can get rid of the header with all the toolbars that seem to accumulate there, and even the scroll bar is not needed. This enhances the impact when you are trying to show a lot of action in a short Flash movie.
The biggie: you get it in a window that you can drag larger or smaller and make the movie larger and smaller. The player will resize the movie to fill the window you have. This is great for things like group photos where everyone wants to enlarge to find themselves and their friends. I've done this for a one frame Flash production!
Downsides:
As with popups in general, if you are asking for multiple ones from the same site, and you want different size popups, the browsers tend to simply override the size you ask for in window.open and reuse whatever is up. You need to close any open popup so the window.open will do a fresh create. It gets complicated, and I have not been able to get it to work across pages in a website. Anyone who has done this successfully, pls post how!
Adobe should be ashamed of themselves for the standard embed, which defeats the purpose of convention over configuration. Check swfobject (as mentioned above) or swfin.

What is the best way to cache a menu system locally, in the browser?

I have a very large cascading menu system with over 300 items in it. (I know it's large but it's a requirement.)
Currently, it's written in javascript so the external file is cached by browsers.
To improve search engine results I need to convert this to a css menu system.
I realize the browsers will also cache external stylesheets but,
is there a way to cache the menu content (<ul> and <li> tags)?
If I use javascript (document.write) to write the content I could have this in an external javascript file, which would be cached locally, but,
would this be search engine friendly?
What is the best solution?
The best way to accomplish what you want is to use Sitemaps to inform Google about the URLs of your web site. Basically you will want to translate the hierarchical data for your menus into a Sitemap.
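As a rough sketch of that translation, assuming the menu is held as nested dictionaries with `url` and `children` keys (a made-up shape, not the asker's actual data), the tree can be flattened into a Sitemap in the sitemaps.org XML format:

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def collect_urls(items):
    """Yield every URL in the menu tree, parents before children."""
    for item in items:
        if item.get("url"):
            yield item["url"]
        yield from collect_urls(item.get("children", []))


def build_sitemap(menu) -> str:
    """Return a Sitemap XML string listing each menu URL once."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in collect_urls(menu):
        if url in seen:
            continue  # the same page may appear in several submenus
        seen.add(url)
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


menu = [
    {"url": "http://www.example.com/products/", "children": [
        {"url": "http://www.example.com/products/widgets/"},
    ]},
    {"url": "http://www.example.com/contact/"},
]
print(build_sitemap(menu))
```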
You could generate the menus beforehand into static HTML / JavaScript files, and have all the pages pull the menu from the same URL on your site. That way, the client-side browser will do the caching. You'll just have to have a step in your deployment that generates the HTML files for the menu.
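A deployment step of that kind could look roughly like this. The menu shape (`label`, `url`, `children`) and the function name are assumptions for illustration; the output is the nested ul/li markup that a CSS menu would style.

```python
from html import escape


def render_menu(items, indent=0) -> str:
    """Render a nested menu structure as indented <ul>/<li> markup."""
    pad = "  " * indent
    lines = [f"{pad}<ul>"]
    for item in items:
        label = escape(item["label"])
        href = escape(item["url"], quote=True)
        children = item.get("children")
        if children:
            # A submenu is nested inside the parent item's <li>.
            lines.append(f'{pad}  <li><a href="{href}">{label}</a>')
            lines.append(render_menu(children, indent + 2))
            lines.append(f"{pad}  </li>")
        else:
            lines.append(f'{pad}  <li><a href="{href}">{label}</a></li>')
    lines.append(f"{pad}</ul>")
    return "\n".join(lines)


menu = [
    {"label": "Products", "url": "/products/", "children": [
        {"label": "Widgets & Gadgets", "url": "/products/widgets/"},
    ]},
    {"label": "Contact", "url": "/contact/"},
]
print(render_menu(menu))
```

The resulting string would be written to a single file during deployment, and every page would reference that one file.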
Try to have it generate as much plain HTML (+JS +CSS) as possible, then whatever has to be dynamic can be adjusted with javascript.
You could do the whole thing in CSS and HTML only, and you don't need to use any JavaScript. See < http://www.netwiz.com.au/cssmenu.htmlvalue >. This page shows a tool to be used with specific documentation software, but the sample CSS and HTML show how to use ul li elements for a CSS/HTML-only menu in a large number of browsers.
You still have the problem of 300 items in the menu which will add to the loading time. If this is an issue I guess you could move this code to a separate iframe to increase the chance of it being cached at a proxy (or by the browser). At the risk of offending the purists even a frame might do the job, but you will have problems with the topic pages not being able to display the menu if they are linked to directly.
