Tool that would show all semantic data contained in a given web page [closed] - facebook-opengraph

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
I'm looking for a web service, browser extension, or anything else that directly extracts any and all semantic data contained in a given web page, as long as that semantic data is following any of the myriad of modern standards used for embedding semantic information inside web pages. Somehow I couldn't find anything that works. I could find many 'semantic crawlers' but no tool that just shows what semantic data you have at hand on a given web page.
I'd be very glad to get pointers to any such tool, if one exists out there.
I can't fathom how people debug or develop their semantic harvesters without it.
I listed some of the relevant standards as the tags for this question (see question's tags which usually show here below) but this list is not to be taken as exhaustive.
Thanks!

For some good starting points, you might consider:
Google Structured Data Testing Tool
RDFa Play (More of a testbed, but nice visuals)
GetSchema.org
Apache's Any23.org
Sindice
Sindice is perhaps the most general of these; most of the others focus on RDFa (my own bias, sorry). Your choice might depend a bit on what you consider semantic data (e.g. do you want HTML5 semantics like <title> to count?). For just RDFa, I have found Apache's Any23 best for my needs, with a nice API, flexible formats and accurate extraction.
Good question though, I'd be curious to see what tools others most recommend. W3C has a longer list that may be slightly dated.

Yandex has a tool for validating embedded semantic markup as well, and some documentation is available. It works with microdata, schema.org, Open Graph, RDFa and microformats, not just with microformats as you might conclude from the title :)
If you're looking for open-source tools, there is the mighty RDFLib library on GitHub. It does a lot, parsing in particular.
The library contains parsers and serializers for RDF/XML, N3, NTriples, N-Quads, Turtle, TriX, RDFa and Microdata.
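For a quick look at a single page without installing anything, the idea can be sketched with Python's standard library alone. This is not RDFLib and not a complete extractor: it only collects Open Graph <meta> tags and JSON-LD <script> blocks, and the names SemanticDataParser and extract_semantic_data are made up for this illustration.

```python
import json
from html.parser import HTMLParser


class SemanticDataParser(HTMLParser):
    """Collects Open Graph <meta> tags and JSON-LD <script> blocks."""

    def __init__(self):
        super().__init__()
        self.open_graph = {}   # e.g. {"og:title": "..."}
        self.json_ld = []      # parsed JSON-LD objects
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("property") or ""
        if tag == "meta" and prop.startswith("og:"):
            self.open_graph[prop] = attrs.get("content") or ""
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so buffer it.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than abort


def extract_semantic_data(html):
    """Return the Open Graph and JSON-LD data found in an HTML string."""
    parser = SemanticDataParser()
    parser.feed(html)
    parser.close()
    return {"open_graph": parser.open_graph, "json_ld": parser.json_ld}
```

RDFa, microdata and microformats are deliberately left out here; for those, one of the dedicated tools mentioned in this thread is the safer choice.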

For RDF data, there is Tim Berners-Lee’s Tabulator, a browser available as a web app (resp. FLOSS JavaScript) and as a Firefox add-on. However, it seems to be no longer maintained (?).
For RDFa, there is the Firefox add-on RDFa Developer.
For RDF files linked in the page’s head, there is the Firefox add-on Semantic Radar.
Another Firefox add-on is OpenLink Data Explorer.

Related

Report generator in Qt using webkit or QML [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 7 years ago.
In my Qt application I'd like to output PDF reports. The reports should be slightly modifiable for a particular customer. My idea was to create PDFs using QPrinter from a rendered QWebKit view. Rendered QML could also work. This looks to me like a very easy way to create PDF reports with very flexible layout customization (CSS/HTML or QML) and without the need to learn or introduce an additional software package into the project.
Is anyone aware of such an implementation already? It should be open source (free or commercial).
My wish-list is :-)
It should be able to display images, preferably provided from the client application as QImage/QPixmap.
It should be able either to accept all variables as a QMap or to query values from a Postgres DB.
It should be either embeddable as code or linked as a library.
EDIT
Already checked:
QtRPT - pretty experimental and immature. A lot of magic, comments in Russian in the code.
NCReport - The open-source code is too old (last update 2007) and doesn't compile with Qt 4.8.4. New versions are provided under commercial licenses. The commercial version looks very mature and has good documentation (ca. 100 pages). However, I'm looking for software which renders HTML/QML, so we could hire an HTML developer for creating/maintaining reports.
I don't think anything ready-made exists; otherwise it would be well known, because the problem itself is pretty common. Like the author of a previous answer, I also wrote my own generator. It's not open source, though.
The problem is not only in printing (as mentioned in the previous answer). That can be more or less solved as soon as you can split the whole report into pages yourself. Then you can render the report content with headers, footers, page numbers, etc. on a per-page basis and print the pages separately.
The main problem is that, in my experience (bearing in mind all the options Qt provides), it is easier to develop a nice report generator for a particular piece of software than to develop something very generic. In the generic case you have to either limit the features you can use in a report OR introduce a lot of 'magic': certain assumptions, conventions, etc.
You can write some generic code for the cases when your reports all have a similar structure (for example: header, first-page header, main table section, footer with page numbering, and all other pages the same but without the first-page header). Then it's fairly simple to write an algorithm which will nicely split your main table section into pages. But it all falls apart as soon as you start thinking about more complicated scenarios with graphs etc.
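The simple case described above (a table split across pages, with an extra header on the first page only) could look roughly like the following. This is a language-neutral sketch written in Python rather than Qt code, and the function name, parameters and units are assumptions made for illustration, not part of the engine described in this answer.

```python
def paginate_rows(row_heights, page_height, header, first_page_header, footer):
    """Split table rows into pages and return a list of row-index lists.

    All sizes use the same unit (for example millimetres). Every page
    carries `header` and `footer`; only the first page also carries
    `first_page_header`. A row taller than the free space on an empty
    page is still placed there, so it will overflow.
    """
    pages = []
    current = []
    used = 0
    # Space left for table rows on the first page.
    available = page_height - header - first_page_header - footer
    for index, height in enumerate(row_heights):
        if current and used + height > available:
            # Row does not fit: close this page and start a regular one.
            pages.append(current)
            current, used = [], 0
            available = page_height - header - footer
        current.append(index)
        used += height
    if current:
        pages.append(current)
    return pages
```

Each returned page can then be rendered and printed separately with its own header, footer and page number. Rows that may not be split, graphs, and similar elements need extra rules, which is exactly where the generic approach gets hard.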
The engine I wrote is based on JS and operates on basic report primitives (like table, graph, label) which have some layout properties; the JS code actually places them on the final report. Some report primitives can be split automatically between pages, some cannot.
I have made that kind of report generator using QWebKit (Qt version 5.1). It is not open source, though.
The biggest problem is that WebKit (or any browser's layout engine that I tried) does not work very well with printing. CSS standard covers printing, but the layout engines do not implement that stuff, or implement it only partially. So if you want features like headers and footers, page numbers, support for multiple paper sizes and support for both landscape and portrait, you have to do a lot of googling and testing. Almost nothing works as expected, so workarounds need to be invented and ugly hardcoding done.
So you can make a report generator using QWebKit. It's not going to be fun, and new versions of Qt and QWebKit will most probably break something. So I would recommend making the report generator a separate application, so that you can use different Qt versions for the main application and the report generator. At the very least, design the report generator so that you can separate it from the main application later if needed.

List of mso- attributes [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 4 years ago.
I'm searching for a list of Microsoft-specific CSS attributes with the mso- prefix.
Any link to an official or unofficial source would be awesome.
I just had the same difficulty finding a list of MS Office prefixed style properties. I found this link, but it was not even well organized.
So I decided to write down a list in a Gist for everyone who is having the same trouble. I still haven't found everything, but you may find this source helpful, at least as a reference on the web.
MS Office prefixed Style properties
https://gist.github.com/webtobesocial/ac9d052595b406d5a5c1
Microsoft’s information on the topic can be found via the page Microsoft® Office HTML and XML Reference. It links to an .exe file, which when executed (on Windows, of course) installs C:\ofhtml9\ofhtml9.chm. The material is rather extensive and not particularly well organized. But search for “mso-” in the Search box, and you’ll find a long table titled “Style Attributes” and containing both standard and nonstandard CSS properties.
The table “shows the style attributes used by Microsoft Office 2000”, but I’m not aware of similar information for other Office versions.
This MSO - Style Reference Sheet link has a list of 372 Microsoft-specific styles, including common ones used in email, e.g. style="mso-line-height-rule:exactly;", which prevents Outlook from overwriting your line-height style (line-height:20px; etc.). All you have to do with the link is type mso into the filter bar and it should give you the list of Microsoft-specific CSS attributes with the mso- prefix.
Best list I've seen of the attributes is located at Litmus.
The real issue isn't finding a list of mso- attributes though, it's the details as to how they work.
For anyone following in my footsteps: you will probably have issues opening the CHM file linked by others in this thread. There is a CHM reader for the Mac that works pretty well, though it's pointless, as there is hardly any useful information contained within. You can find that reader on the App Store.
The closest I can find is this list:
Link to list Archive of Link to list
I assume that a Ctrl+F on mso will quickly show you all you need? From there you can quickly gather your list.
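If the page or an Office-exported HTML file is saved locally, the Ctrl+F step can also be automated. The following is a minimal sketch, assuming the text contains declarations of the form mso-name: value; the function name find_mso_properties is made up for this example.

```python
import re

# Matches a property name starting with "mso-" that is followed by a colon,
# i.e. something that looks like a CSS declaration.
MSO_PROPERTY = re.compile(r"(?<![\w-])(mso-[a-z0-9-]+)\s*:", re.IGNORECASE)


def find_mso_properties(text):
    """Return the sorted, de-duplicated mso- property names found in text."""
    return sorted({name.lower() for name in MSO_PROPERTY.findall(text)})
```

Running it over a stylesheet or an HTML file saved from Word yields only the property names that actually occur there, so it complements the published lists rather than replacing them.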

Do you use microformats in your web projects? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 8 years ago.
Do you use microformats in your web projects?
If yes then why?
If no then why?
If yes then for which things do you use?
Is there any alternative to microformats in HTML5?
I haven't used microformats yet. Should I start using them now, or is there not much need?
I’ve used microformats fairly extensively. The benefits I see are
access to data for robots like Google Rich Snippets
access to data for users via µF-consuming tools like H2VX
(some) data validation
more meaningful markup, which makes me happy
(minor) disadvantages are
time; hand-coding these things can be a pita. Either add programmatically (e.g. generate from data in CMS) or make a bunch of snippets. If doing via a CMS then it’s (for me) a no-brainer
require extra attention to UI to do well (best if exposed, but that often involves custom icon etc)
Microformats work fine as-is in HTML5. There are new HTML5 elements that map well to some µF functions, notably <time>, but be warned that current µF tools generally can’t cope with HTML5’s new elements (“Tool support” slide).
Your other alternatives are HTML5’s microdata, and RDFa. Microdata is pretty nice, but quite new so doesn’t have many tools available. You can represent microformats in microdata, and the HTML5 spec has microdata versions of vCard and vCal. There’s also HTML5’s data- attribute, but that’s for private use and doesn’t encode visible data, so is probably not what you’re after.
I perceive these three as a continuum from easy but specific (microformats) to hard but capable of anything (RDFa), with microdata (for me) occupying a sweet spot in the middle. Google Rich Snippets can read data in any of these, but user tools are still playing catchup. The main benefit of any of these is making your content more usable by exposing more of the information, and for me that’s generally worth the time.
For completeness I’ve used
hCard
hCalendar
hEvent
hAtom
hReview (once? :)
XFN
plus some rels like rel-license
EDIT: I’ve written these articles on HTML5Doctor with everything you need to know ;)
Extending HTML — Microformats
Extending HTML — Microdata
HTH
You should ask Jeff Atwood.
HTML5 defines various semantic tags to mark your data:
<time>
<address>
<header>
<nav>
<footer>
<article>
<summary>
<details>
It also allows for custom data attributes starting with "data-" within elements.
There is support for microdata which is based on microformats to provide more semantic structure to individual and groups of elements.
And to answer your main question:
No, I don't use microformats, because I didn't see the advantages until I gave your question some serious thought. I am using the newer HTML5 elements such as time, and custom data attributes, but not microformats, because the data was already structured on the backend; for more structured and semantic access, I would have used RSS feeds with specific extensions and included a link to the feed within the document itself.
That said, here's why I still support microformats, believe they are awesome, and will most likely start using them in the very near future. For me, they serve a very specific purpose that has to do with programmatic access to the elements within my web applications. RSS and Atom feeds provide the same data in a very structured manner, but as an alternative view. Microformats, or any other homegrown standards, can be used effectively to enhance applications.
As long as the elements are structured in a standard manner, I can build upon a shared library of reusable code across all applications that deals with commonly occurring data items such as names, addresses, contact details, telephone numbers, etc. to enhance all applications. For example, automatically linking addresses to Google Maps, or linkifying telephone numbers to use a native protocol such as tel: on the fly for mobile devices and various other enhancements that I can do.
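As a rough illustration of the telephone example, the sketch below collects the text of elements carrying hCard's "tel" class and turns each into a tel: URI. It is written in Python for brevity, although the enhancement described above would run on the fly in the browser. The names TelCollector and to_tel_uri are invented here, and the parser assumes well-formed markup with no void elements such as <br> inside the tel element.

```python
import re
from html.parser import HTMLParser


class TelCollector(HTMLParser):
    """Collects the text of elements whose class list includes "tel"."""

    def __init__(self):
        super().__init__()
        self.numbers = []
        self._depth = 0    # > 0 while inside a class="tel" element
        self._parts = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1          # nested element inside the tel element
        elif "tel" in classes:
            self._depth = 1
            self._parts = []

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.numbers.append("".join(self._parts).strip())

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def to_tel_uri(number):
    """Strip everything except digits and '+' and prefix the tel: scheme."""
    return "tel:" + re.sub(r"[^\d+]", "", number)
```

Because the class names are standardized, the same few lines work on any page that uses hCard, which is the reuse argument made above. hCard also allows type and value sub-properties inside tel, which this sketch does not handle.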

Need recommendations for an ASP.Net compatible HTML->PDF library [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 9 years ago.
I am looking for a library to convert HTML to PDF, including styles. I would prefer it be able to accept a URL as an option, but if required, HTML can be passed in as a stream/byte array with all styles in a single file/buffer. I am using C# / ASP.Net 3.5. I have complete control of the server machine ( I can run as a service, etc... ) however, I cannot require the user to be logged in, so no application level or print driver type solutions please.
I know there exist free solutions like PDFSharp, but these still require you to format your output properly. I want the library to generate almost identical output to the source input, which this library doesn't provide ( or at least, I believe it doesn't, correct me if I am wrong ).
I also know there are a number of commercial products available. Some of them are a few thousand dollars, which is beyond my current budget for this feature. There is one library ABCPDF which seems to do exactly what I want to do, for a price I can afford to pay. That said, the lack of professionalism/polish on their website has me a bit wary.
If you have experience with ABCpdf, or can recommend a similar library ( price feature wise ), I would greatly appreciate it.
EDIT: Thanks all for the answers. Sounds like abcPDF or aspPDF are the way to go, and both are within budget.
I use and highly recommend ABCPDF for what you need. I use it to convert complicated HTML reports (nested tables, lots of CSS, and charts) to PDF. It works fantastically when pointed at a URL.
As a bonus it supports the page-break-inside: avoid css.
Back when I wrote classic asp, I used their ABCUpload product, which is also fantastic.
Not a recommendation but....
Stay away from iTextSharp for HTML -> PDF. It's the only "approved" PDF generation library my current employer allows. We've spent endless hours trying to convert HTML to PDF... it's just not what it's good for. iTextSharp is great for PDF versions of forms.
I have used ABCPDF in the past and it is a very good product for the price.
I have had to use their support as well and they were very quick at dealing with an extremely obscure issue. Good communication and turn around time. I think it was 2 days from the time I put in a ticket to the time a solution was found and resolved.
They have also been very easy to work with with respect to licensing as we have had to buy older licenses that were no longer available (so that an existing product did not need to be retested).
Before settling on ABCPDF we tried 2 or 3 open source products but none had the flexibility or level of support we were happy with.
Here is a link to a thread that I commented on that was looking for a similar product:
Generate HTML To PDF Control for the .NET application
We use ExpertPDF's HtmlToPdf converter, which is pretty nice. Our company has an old license, but many new features have been added. It is pretty straightforward using CSS.
http://www.html-to-pdf.net/
ASPPDF supports HTML to PDF including stylesheets.
http://www.asppdf.com/index.html
But, commercial.
I was using AspPDF from Persits Software, and it works very well for my project.
It is very impressive, and it is backed by a company that has been developing ASP components since the beginning of ASP.
You can check very good live demos at: http://www.asppdf.com/livedemo.html

Maintain CSS styling when converting HTML to PDF in ASP.NET [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 4 years ago.
I am using ITextSharp to convert an HTML page to PDF.
However, ITextSharp prints the CSS in the STYLE declaration straight out, ignores stylesheets even when added programmatically and only listens to some inline styles (e.g. font-size and color but not background-color).
Is there something I am missing with ITextSharp, or is there a better (and free) way of doing this conversion?
Thanks in advance,
HTML / CSS support in iText / iTextSharp is very basic. It's just not the right tool to convert html to pdf.
Take a look at these solutions instead:
Create screenshot of the page with Watin-like tool
http://blog.taiki.be/index.php/2008/07/generating-screenshots-of-webpages-using-net/
http://www.codegod.de/WebAppCodeGod/Screenshot-of-Webpage-with-ASP-NET-AID398.aspx
These render html to an image. Then you can insert them in your PDF with iTextSharp.
Otherwise you could try converting HTML -> XSL-FO -> PDF, but including CSS there is a whole other thing.
Have a look at WKHTMLTOPDF. It is open source, based on webkit and free.
We wrote a small tutorial here.
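Since wkhtmltopdf is a command-line program, it can be launched as an external process from any language, including a .NET server process. The sketch below uses Python only to show the shape of the call; the function names are made up, and it assumes the wkhtmltopdf binary is installed and on the PATH.

```python
import shutil
import subprocess


def build_wkhtmltopdf_command(source, output_pdf, page_size="A4", landscape=False):
    """Build the argument list: wkhtmltopdf [options] <input> <output>."""
    command = ["wkhtmltopdf", "--page-size", page_size]
    if landscape:
        command += ["--orientation", "Landscape"]
    # `source` may be a URL or a path to a local HTML file.
    command += [source, output_pdf]
    return command


def convert(source, output_pdf, **options):
    """Run wkhtmltopdf and raise if it is missing or exits with an error."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf is not installed or not on PATH")
    subprocess.run(build_wkhtmltopdf_command(source, output_pdf, **options), check=True)
```

Because the page is rendered by WebKit, stylesheets and inline styles are applied the way a browser would apply them, which is the part iTextSharp struggles with.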
Try PDF Duo .NET converter. You can apply for support if you need a special feature.
http://www.duodimension.com/html_pdf_asp.net/downloads/html_pdf_net.zip
ABCpdf provides two HTML rendering engines. One is based around the MSHTML version installed on the system. The other is based around the FireFox Gecko rendering engine.
So there's plenty of room for manoeuvre if you want a particular set of features. It's very real world.
In terms of quality I would just say that we do get a lot of people settling on ABCpdf after trying a lot of different alternatives.
I work on the ABCpdf .NET software component so my replies may feature concepts based around ABCpdf. It's just what I know. :-)
Why not use an online API? There are plenty of them available and they do the work well, which lets you worry about your core work, not about how to render a PDF correctly :)
You mention something "free". It depends on your usage, but most services offer some free conversions to start with, ranging from 50 to 250 (and even more). Maybe that would be enough for you?
All you'd have to do then is a basic POST request to the service with your HTML data (or link), and you'll get a PDF in response.
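Such a request might look like the sketch below. The endpoint URL, the "source" field and the bearer-token header are placeholders invented for illustration; each service defines its own URL, payload format and authentication, so check its documentation before using anything like this.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the URL documented by your service.
API_URL = "https://api.example.com/v1/convert"


def build_conversion_request(html, api_key):
    """Build a POST request carrying the HTML as a JSON payload."""
    # The "source" key is an assumption; real services name this field differently.
    body = json.dumps({"source": html}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,  # assumed auth scheme
        },
        method="POST",
    )


def convert_to_pdf(html, api_key):
    """Send the request and return the response body (the PDF bytes)."""
    with urllib.request.urlopen(build_conversion_request(html, api_key)) as response:
        return response.read()
```

The returned bytes can be written to a file or streamed straight back to the browser with a PDF content type.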
Here is a list of APIs that convert HTML to PDF (not exhaustive):
PDFShift
HTML2PdfRocket
Web2PDFConvert
PDFonline
... (and many more :) )
