Common Lisp package for parsing invalid HTML? [closed] - common-lisp

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
As a learning exercise, I'm writing a web scraper in Common Lisp. The (rough) plan is:
1. Use Quicklisp to manage dependencies
2. Use Drakma to load the pages
3. Parse the pages with xmls
I've just run into a sticking point: the website I'm scraping doesn't always produce valid XHTML. This means that step 3 (parse the pages with xmls) doesn't work. And I'm as loath to use regular expressions as this guy :-)
So, can anyone recommend a Common Lisp package for parsing invalid XHTML? I'm imagining something similar to the HTML Agility Pack for .NET ...

The "closure-html" project (available in Quicklisp) will recover from bogus HTML and produce something with which you can work. I use closure-html together with CXML to process arbitrary web pages, and it works nicely. http://common-lisp.net/project/closure/closure-html/

For next visitors: today we have Plump: https://shinmera.github.io/plump
Plump is a parser for HTML/XML like documents, focusing on being lenient towards invalid markup. It can handle things like invalid attributes, bad closing tag order, unencoded entities, inexistent tag types, self-closing tags and so on. It parses documents to a class representation and offers a small set of DOM functions to manipulate it. You are free to change it to parse to your own classes though.
And then we have other libraries to query the document, like lquery (jQuery-like) or CLSS (simple CSS selectors), by the same author.
We also now have a little tutorial on the Common Lisp Cookbook: https://lispcookbook.github.io/cl-cookbook/web-scraping.html
See also Common Lisp wiki: http://www.cliki.net/Web

Duncan, so far I've been successful using Clozure Common Lisp under both Ubuntu Linux and Windows (7 & XP), so if you're looking for an implementation that will work anywhere you might try this one.

Related

Analyze partial or corrupted QR codes [closed]

Closed 5 years ago.
How can I analyze broken/partial QR codes? Normally a QR decoder will just tell you that the data cannot be read. This is not very useful. Even though the code is not readable, some information can, presumably, be extracted!
Are the finder patterns found?
Is the timing pattern found?
What is the version?
What is the error level?
What is the mask?
Is the format intact?
What is the mode?
Is the stop pattern found after the correct length?
Is there any meaningful data?
How can I extract this information from broken/partial QR codes?
This is a question that comes up in many ways; some easier than others.
To answer your direct question: The tool you need: Your brain.
Software can help, but decoding partial or misprinted codes takes some work. It is like detective work. You need to take what you have, fill in what you know about the way the codes are created in the first place, and then make educated guesses for the win.
Here is a tour of the concept. By looking at these articles most of the items on your bullet-point list will be answered.
This article explains the overall format in good detail:
Wounded QR Codes
For instance, here is the first image in the article about formatting:
Here is a real-world example of the process of decoding a partial image:
Decoding a partial QR code
It begins with the challenge image
Then shows you the order of bits that are encoded:
Then through the process of detective work to produce the final image:
Here is a different problem. You have a full image but it won't scan properly so you have to decode it by hand:
Decoding small QR codes by hand
It starts out with a tattoo:
Which is in the wrong orientation, and also won't scan properly.
So you work through the decoding process:
Yielding the final result: Maci Clare Peltz
Have fun detecting!
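Two items on the question's list, the error correction level and the mask, live in the 15-bit format information next to the finder patterns, so they can often be recovered even when the data area is damaged. Here is a minimal Python sketch of that step, assuming you have already read the 15 format bits off the image by hand, most significant bit first. The function names are illustrative and not taken from any library; the XOR mask, the BCH generator and the level encoding are the ones from the QR code specification.

```python
# Constants from the QR code specification (format information).
FORMAT_MASK = 0b101010000010010      # XORed onto the 15 format bits when printed
BCH_GENERATOR = 0b10100110111        # generator polynomial of the BCH(15,5) code
EC_LEVELS = {0b01: "L", 0b00: "M", 0b11: "Q", 0b10: "H"}


def bch_remainder(value):
    """Remainder of a 15-bit value divided by the BCH generator (GF(2) division)."""
    for shift in range(4, -1, -1):
        if value & (1 << (shift + 10)):
            value ^= BCH_GENERATOR << shift
    return value


def decode_format_info(raw_bits):
    """Decode 15 format bits as read from the symbol (hypothetical helper)."""
    unmasked = raw_bits ^ FORMAT_MASK
    data = unmasked >> 10                # top 5 bits: 2 level bits + 3 mask bits
    return {
        "error_level": EC_LEVELS[data >> 3],
        "mask": data & 0b111,
        # A valid codeword leaves no remainder; a nonzero one means bit errors.
        "intact": bch_remainder(unmasked) == 0,
    }
```

For the bit string 111011111000100 this reports level L, mask pattern 0, and a passing check. If "intact" comes back False, the level and mask are still a first guess, but the format bits contain at least one error and the second copy of the format information in the symbol is worth reading as well.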
You can simply hack some open source code like zxing to print out its progress on a command line during decoding and in that way see how far it got. Just sprinkle in a few System.out.println() statements.
The problem is false positives. It will almost always find at least 3 regions that look like a QR code's finder patterns; it always takes the 3 most likely candidates. They usually are phantoms since you're usually not looking at a QR code. The next step would then fail, finding valid version info. (In a very unlikely case it would even find phantom version info.)
Some of these aspects you mention aren't necessarily detected by a library since they don't have to be, like timing pattern and stop pattern (which isn't required for short data).
Aside from those caveats, it should be easy.

How can I visualize Fortran (90 or later) source code, e.g. using Graphviz? [closed]

Closed 1 year ago.
I've been thrown into a large Fortran project with a large number of source files.
I need to contribute to this project and it would seem prudent that I first understand the source.
As a first step, I'd like to visualize the interdependences between the various source files, i.e. which source files need which modules. As far as I can tell, automated methods exist for other languages and result in a graph that can be built using Graphviz.
But is anyone aware of software out there that can do this for Fortran 90 code?
[Searching the interwebs for Fortran help is a real pain as you end up searching the inter-cobwebs thanks to the painfully ubiquitous FORTRAN 77.]
I would recommend doxygen, which automatically generates documentation from source code (and is free). Usually you add some markup to comments describing your functions and variables. However, you can just run doxygen on undocumented source files, provided you set EXTRACT_ALL to YES in the configuration file, and have it create relationship diagrams for all your functions (i.e. this function calls these functions and is called by these other functions).
You need GraphViz installed to get diagrams generated and have the HAVE_DOT option set to YES in the configuration file.
See the doxygen documentation for graphs and diagrams for more information and this example class documentation for an example of the output generated.
Edit: Of course for Fortran you should set the OPTIMIZE_FOR_FORTRAN option to YES in the configuration file.
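To keep the settings from this answer in one place, here is a small Python sketch that renders them as Doxyfile lines. EXTRACT_ALL, HAVE_DOT and OPTIMIZE_FOR_FORTRAN come from the answer; CALL_GRAPH and CALLER_GRAPH are added here on the assumption that you want the per-function call and caller diagrams the answer describes. The helper name is made up for illustration.

```python
# Doxygen options to switch on. The last two are an addition to the answer:
# they are the doxygen switches for call graphs and caller graphs.
DOXYGEN_OVERRIDES = {
    "EXTRACT_ALL": "YES",
    "HAVE_DOT": "YES",
    "OPTIMIZE_FOR_FORTRAN": "YES",
    "CALL_GRAPH": "YES",
    "CALLER_GRAPH": "YES",
}


def render_doxyfile_overrides(overrides=DOXYGEN_OVERRIDES):
    """Return the overrides as aligned 'NAME = VALUE' lines (hypothetical helper)."""
    width = max(len(name) for name in overrides)
    lines = [f"{name.ljust(width)} = {value}" for name, value in overrides.items()]
    return "\n".join(lines) + "\n"
```

Print the result of render_doxyfile_overrides() and set the same names in the configuration file that doxygen -g generates.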
If you have money then Understand for Fortran is worth looking at. If you don't have money but intend to work quickly then you might get by with a trial download of the software.
For a static call graph, I've never found a free tool as useful as Understand; it's hard to find any free tools let alone a useful one. I'd write one myself but the market would be tiny :-(
For a dynamic call graph investigate your compiler options. I use the Intel Fortran Compiler which can generate a mound of useful information about an executing program. The TotalView debugger can also visualise the call graph of an executing program. You should also look at gprof2dot which makes a DOT file out of a GPROF call 'graph'. This is free and OK.
And I should also add, though it's not something I've ever used, that Callgrind may be of use.
You can use callgrind from within Valgrind:
valgrind --tool=callgrind [your program]
This will produce a callgrind.out.[pid] file. This works best if you compile your program without optimisations, and with debug flags.
You then have a couple of options for viewing the data:
Convert the callgrind output to a .dot file with gprof2dot, and then view it with xdot, or convert it to a static graph with GraphViz.
View it directly with Kcachegrind (includes source analysis, and call graphs).
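For the file-level picture the question asks about (which source files need which modules), a rough Python sketch is below. It is not any of the tools named above, just a line-by-line scan for module definitions and use statements that emits Graphviz DOT. It assumes free-form source with one statement per line and ignores continuation lines, semicolons and fixed-form layout; all names are illustrative.

```python
import re

# "module foo", but not "module procedure/function/subroutine ..."
MODULE_DEF = re.compile(
    r"^\s*module\s+(?!(?:procedure|function|subroutine)\b)(\w+)", re.IGNORECASE
)
# "use foo", "use :: foo", "use, intrinsic :: foo"
USE_STMT = re.compile(
    r"^\s*use\b(?:\s*,\s*(?:non_)?intrinsic\s*)?(?:\s*::\s*|\s+)(\w+)", re.IGNORECASE
)


def module_dependencies(sources):
    """sources maps file name -> Fortran source text; returns sorted (user, provider) edges."""
    defined_in = {}                      # module name -> file that defines it
    uses = {}                            # file name -> set of module names it uses
    for name, text in sources.items():
        uses[name] = set()
        for line in text.splitlines():
            definition = MODULE_DEF.match(line)
            if definition:
                defined_in[definition.group(1).lower()] = name
                continue
            use = USE_STMT.match(line)
            if use:
                uses[name].add(use.group(1).lower())
    edges = set()
    for name, modules in uses.items():
        for module in modules:
            provider = defined_in.get(module)
            # Skip intrinsic/external modules and self-references.
            if provider and provider != name:
                edges.add((name, provider))
    return sorted(edges)


def to_dot(edges):
    """Render the edges as a Graphviz digraph."""
    lines = ["digraph fortran_deps {"]
    lines += [f'    "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)
```

Read each source file into the dictionary, write the output of to_dot to a file such as deps.dot, and render it with dot -Tsvg deps.dot -o deps.svg.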

CSS regression tool? [closed]

Closed 7 years ago.
I'm looking for a visual regression testing tool for CSS refactoring, to see whether there is any unintended cascading behavior in a website.
Ideally, the tool can crawl a website (even locally), grab snapshots of each page, and store them in a single repository.
When run for the second time, it will show the pages that are visually different since the last time it was run.
Even better:
if it can show an overlapped XOR view of the two versions of the page.
compare rendering results of different browsers (almost like an automated Microsoft Expression Web compare feature).
My current favorite is WebDriverCSS in combination with BrowserStack Automate API. This pair of tools allows for multi-platform, multi-browser regression testing across the very wide range of devices that BrowserStack supports. It requires writing code but is much more comprehensive than any solution bound to Phantom or Slimer.
If you are ok with an old WebKit being your only test UA, here's a great writeup on CSS regression testing using PhantomCSS. Their basic example provides exactly what the original question desired: visual diffs between two commits.
For a simpler tool that requires no coding (only YAML config), I point people towards Wraith more often than PhantomCSS. Give #ericcraio's answer a vote if you like Wraith and don't want to write Casper code.
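The XOR view mentioned in the question comes down to a per-pixel comparison of two snapshots. Here is a minimal Python sketch of the idea, assuming the screenshots have already been loaded into equally sized grids of (r, g, b) tuples. It is not how any of the tools above implement their diffs, and the function name is made up.

```python
def xor_diff(before, after):
    """XOR two same-sized pixel grids; return (diff grid, number of changed pixels).

    Each grid is a list of rows, each row a list of (r, g, b) tuples.
    Unchanged pixels come out black (0, 0, 0); anything else marks a difference.
    """
    if len(before) != len(after) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(before, after)
    ):
        raise ValueError("snapshots must have identical dimensions")
    diff = [
        [tuple(x ^ y for x, y in zip(p, q)) for p, q in zip(row_a, row_b)]
        for row_a, row_b in zip(before, after)
    ]
    changed = sum(1 for row in diff for pixel in row if any(pixel))
    return diff, changed
```

A page can then be flagged as visually different when the changed-pixel count exceeds some threshold you pick. An exact comparison like this is sensitive to anti-aliasing and font rendering, so a tolerance is usually needed in practice.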
I know this question was posted a while ago, but I wanted to mention a new CSS regression tool called Wraith, by bbc-news.
http://github.com/bbc-news/wraith
It utilizes tools such as phantomJS and imagemagick.
http://responsivenews.co.uk/post/56884056177/wraith
Check out Browser Shots. This is a free service.
There are some restrictions on how many tests you can run each day as a free user. But unlike Litmus, you can run tests on all supported browsers; Litmus only allows free users to test their websites on Internet Explorer 7 and Mozilla Firefox 2.
I am developing a CSS regression testing tool called SUCCSS. It is open source and installs as a global npm package: https://github.com/B2F/Succss. At the moment, you can read its full documentation here: http://succss.ifzenelse.net
Check out Litmus.
It'll crawl your site and take screen captures in damn near every browser you'd want.
In addition to the core functionality, Litmus also allows you to track bugs, log in to private sites, and publish compatibility reports from your tests.
What you've described is precisely what Mogotest does. We can log into your site, take screenshots for all the pages you've configured, and do automated comparison using the principles of Web Consistency Testing.
We also keep a full history so we can tell you exactly when something broke (and what your site looked like at that time) and, even cooler, we can detect when you've fixed something. And finally, we snapshot your code at each test run so we can show you exactly what changed for each issue.
Sorry for the self-promoting nature of this answer. I just wanted to be thorough in addressing what you're looking for.

How to document a ASP.net website? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 5 years ago.
I have released an ASP.NET website.
How do I document it?
Are there any tools available in VS2008?
How can it be achieved? Please help.
Any automatically-generated documentation is useless, in my opinion. Unless you are ready to take your time and describe high-level decisions, structure, code organization and other issues personally, you can safely omit this part.
As has been mentioned, good documentation can't be automated. So you can use MS Word. And for any kind of diagram I would use MS Visio.
I found this tool, they offer a free trial version. I never used it. Maybe it will help you.
http://www.innovasys.com/products/dx2008/overview.aspx
Quotes from the site:
"Document! X automates the process of creating and maintaining documentation for a wide range of solution elements."
"With Document! X, documentation can be automatically produced throughout design and development without requiring investment of developer resources, providing development teams with an accurate and up to date reference and allowing new developers to jump the learning curve of new components and schemas. Document! X makes producing documentation a natural and productive activity for developers and technical writers alike."
This is a quote from another site about the same tool:
"New features included in Document! X 2008 include compatibility with Microsoft Visual Studio 2008, documentation of ASP.NET Ajax Javascript and new templates to replicate the fresh look and feel of the Microsoft Visual Studio 2008 documentation."
What do you need to document?
The design? You can use Sandcastle to generate a help file from the XML comments in your source code. Providing a detailed description of design choices and architecture can't be automated and requires time to document. Provide workflows where necessary to explain processes. You might want to split this document into high level design and detailed design, providing an overview of functionality and then a detailed description of the design. Don't replicate or explain the actual code per se (i.e. "using an integer counter, loop through..."), that's what the source is for.
The application usage? Again, this is something that you will need to spend time writing. Hopefully you already have a functional specification and use cases for the application and can leverage these to write a user document.

Tool for generating railroad diagram used on json.org [closed]

Closed 7 years ago.
I love the syntax of railroad diagrams on json.org which are a graphical representation of the BNF language. I haven't found any tools that can produce results as eloquently.
Can anyone identify the tool used to generate these diagrams?
There is an Online Railroad Diagram Generator. It creates SVG syntax diagrams, also known as railroad diagrams, from context-free grammars specified in EBNF. You can copy the SVG code or take screen shots.
You have to type in the grammar and it'll make the diagram.
For example, to create the first railroad diagram you show, you would use the code:
object ::= '{' ((string ':' value ) ( ',' string ':' value )*)? '}'
Then you could go on to define string and value using string ::= ... and value ::= ... The references are all shown.
Check out some of the example diagrams on the page. They have XML and even EBNF itself.
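To make it concrete what the object rule above accepts, here is a small Python recognizer for it. This is only a sketch: because the definitions of string and value are left open in the answer, the code treats "string" and "value" as opaque terminal tokens, and the function name is invented for illustration.

```python
def matches_object(tokens):
    """Return True if the token list matches
    object ::= '{' ((string ':' value) (',' string ':' value)*)? '}'
    where 'string' and 'value' are treated as single placeholder tokens.
    """
    pos = 0

    def accept(expected):
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == expected:
            pos += 1
            return True
        return False

    def member():
        # string ':' value
        return accept("string") and accept(":") and accept("value")

    if not accept("{"):
        return False
    # The whole member list is optional: the '( ... )?' part of the rule.
    if pos < len(tokens) and tokens[pos] == "string":
        if not member():
            return False
        # Zero or more ', member' repetitions: the '( ... )*' part.
        while accept(","):
            if not member():
                return False
    return accept("}") and pos == len(tokens)
```

An empty object ({ followed by }) and a comma-separated list of members both pass, while a trailing comma before } is rejected. That is the same shape the railroad diagram draws: a bypass track for the empty case and a loop back through the comma.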
from Douglas Crockford
to Aleem B
date Tue, Apr 28, 2009 at 6:01 PM
subject Re: Railroad Diagrams on json.org
I drew them with Visio. Creative Docs.NET also works well.
--
Aleem B wrote:
Hello Douglas,
I thoroughly enjoy most things you put
out there and the railroad diagrams on
json.org are no different. I have been
trying to look around for a tool that
would generate diagrams nearly as
eloquent but have had no luck:
Tool for generating railroad diagram used on json.org
Is there some tool you used to convert
the BNF to these diagrams or were they
hand crafted?
-- Aleem
Tab Atkins Jr. created a JavaScript Railroad-diagram Generator using SVG specifically because he could not find one that had the visual appeal he wanted, i.e. "the JSON.org look".
There was a similar question a few days ago:
What is a good tool for creating railroad diagrams?
That question was about how railroad diagrams in the SQLite syntax diagrams were generated. The accepted answer found that the diagrams were generated using a DSL written in Tcl.
Another answer offered a suggestion to use a diagram generator which works off of an EBNF grammar.
I have also been looking for the tools used to generate these syntax diagrams, ideally as a JS library, so that a diagram can be edited and displayed without a tedious wait for the graphic to render.
I know there are tools out there, but I would say that the generator from bottlecaps.de produces nice graphics with a color option. Unfortunately, I could not get the source code of the tool itself there.
I also went through the related questions and answers here, but found only the following that are available as open source JS libraries with an online demo to try and play with:
railroad diagram generator from tabatkins, in js (Syntax exists as a Python library as well)
js-sequence-diagrams from bramp, in js but UML (Syntax is generated via bottlecaps.de)
umlClass from GoJS, in js but found only for UML
One of the things that IBM's railroad track generator handles well is default values. I have not seen another generator that does this.
An example is
┌─────◀────┐┌(──«defaults»─)─┐
▶▶─COMMAND┴«argument»┴┼────────────────┼──────────────────────▶◀
│ ┌────◀─────┐ │
└(┴┬«option»┬┴┬─┬┘
└Help────┘ └)┘
I found the J-algo tool. I think it makes drawing diagrams very easy, but I can't export to an image or any other format.
http://j-algo.binaervarianz.de/index.php
Take a look at http://code.google.com/p/html-railroad-diagram/ which generates HTML railroad diagrams. There is an example that shows the JSON railroad generated in an HTML page by JavaScript with links.
I seem to remember that IBM has a tool that builds such diagrams as part of their BookMaster SGML suite. Railroad diagrams are often used in mainframe documentation.
