Can the protocol be left off in og:image and og:url content attributes?

I've been using the double-slashed, protocol-relative version of URLs systematically whenever I can for a while now, in HTML (for href on anchors and src on images) and in JS (for XHR). Today I have been wondering if this would be possible in Open Graph tags.
Currently I have this:
<meta property="og:image" content="http://static.example.com/image.png">
<meta property="og:url" content="http://example.com">
I have been wondering what the effect of using the following would be:
<meta property="og:image" content="//static.example.com/image.png">
<meta property="og:url" content="//example.com">
Is this allowed by the spec?
Is this allowed by (major) implementations?
Are there any obvious issues I'm not seeing?
Am I thinking about this completely wrong?
Has anyone done this or tried this before?
Okay, that's a bit too many questions, but you can see where I'm going: should I (and other developers who might chance upon this) use explicit protocols or is it okay to leave them off for og:* properties?

By adding these meta elements with the property attribute to your page, you are using RDFa (which is a serialization format of RDF). So you are participating in the Semantic Web.
The fundamental idea of the Semantic Web is using URIs to describe things represented by URIs. Some URIs represent web pages (we all know these), while other URIs represent real world or abstract things (like the person J. R. R. Tolkien, the concept of love, or the Eiffel Tower). (See this answer on how you could distinguish these.)
For example, this URI represents the physical world building (not a web page about that building):
http://dbpedia.org/resource/Eiffel_Tower
The HTTPS variant (https://dbpedia.org/resource/Eiffel_Tower) would be a totally different URI, which could, in principle, be used for something unrelated, like a football. For RDF (in contrast to common practice on the Web), there is no relation between an HTTP URI and its HTTPS counterpart.
So when you provide RDF statements about both URIs, it’s (at first) not clear that both refer to the same thing. When you use the same object for both URIs, they could be mapped to mean the same thing. You can also explicitly state with OWL (→ owl:sameAs) that two URIs represent the same thing.
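Not part of the original answer, but to make the owl:sameAs idea concrete, here is a minimal sketch using the Apache Jena RDF library for Java; it assumes the modern org.apache.jena package layout (older releases used com.hp.hpl.jena.*) and simply declares the two Eiffel Tower URIs equivalent:

import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;
import org.apache.jena.vocabulary.OWL;

public class SameAsExample {
    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();

        // Two syntactically different URIs that are meant to denote the same thing.
        Resource httpTower  = model.createResource("http://dbpedia.org/resource/Eiffel_Tower");
        Resource httpsTower = model.createResource("https://dbpedia.org/resource/Eiffel_Tower");

        // Without this statement, RDF tooling treats the two URIs as unrelated.
        httpTower.addProperty(OWL.sameAs, httpsTower);

        model.write(System.out, "TURTLE");
    }
}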
So, it’s not forbidden or wrong, but I’d advise using only one of your "synonymous" URIs for a page/thing. Other people may want to make RDF statements with your URIs (→ things/pages), so it’s better that they all use the same ones.

No, it's wrong.
Why?
The Facebook parser doesn't know which protocol to use for your site.

Related

Why are some XML URIs recommended by the W3C?

I don't know if my question makes sense, but in the XML DTD some websites use URIs that point to w3.org sites. Why is this?
The DOCTYPE declaration at the top of every HTML and XHTML document defines the document type and the possible elements in the document. It does this by referring to the DTD (Document Type Definition), which is nominally a file on the W3C website, w3.org.
Now in most cases, browsers don't even load the DTD off the W3C website. Although they're able to load and parse it (DTDs are written in SGML after all, a language all browsers know), they actually don't. They already know what's in it!
If it helps, think of it as being cached and pre-parsed.
Or to look at it in another way, the exact URL of the DTD is enough for the browser to know what is being meant. And since HTML5, specifying the DTD is actually optional; the browsers are smart enough to know what you mean even without saying which one explicitly.
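To make the "cached and pre-parsed" picture concrete, here is a hedged sketch in Java using the standard SAX API: the parser is told to resolve the well-known DTD identifier locally instead of fetching it from w3.org, much as a browser effectively does. (A real setup would return a locally cached copy of the DTD rather than an empty stream; the markup is a toy example.)

import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LocalDtdExample {
    public static void main(String[] args) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();

        String xhtml = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">"
                + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>t</title></head>"
                + "<body><p>hi</p></body></html>";

        DefaultHandler handler = new DefaultHandler() {
            // Like a browser, recognise the DTD by its identifier and resolve it
            // locally instead of downloading it from w3.org on every parse.
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                if ("-//W3C//DTD XHTML 1.0 Strict//EN".equals(publicId)) {
                    return new InputSource(new StringReader("")); // stand-in for a cached copy
                }
                return null; // fall back to default resolution
            }
        };

        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        System.out.println("Parsed without contacting w3.org for the DTD.");
    }
}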
Hope this answers your question.

Using Java to write RDFa

I need to automatically generate (from a database) an XHTML document marked up with RDFa or some other microformat, it doesn't matter which one. How can I best do this using Java? I have been using Jena to output RDF/XML but it doesn't do RDFa unfortunately.
The reason that Jena doesn't provide an RDFa writer is that the whole point of RDFa is to be embedded in some other (human-readable) web page. I think your main option is to use something like Velocity or Freemarker to produce the pages with embedded calls out to Jena to get the appropriate RDF statements. You'll have to handle the RDFa encoding yourself. For testing, you could read your web pages back in using an RDFa reader to see if you get back the right set of triples, but really that's only half the story. You also need to test whether the page expresses the user-intent you want by enabling inline metadata, and that's much harder to test.
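To sketch what "handling the RDFa encoding yourself" can look like, here is a bare-bones variant that skips the template engine entirely and writes RDFa-annotated XHTML straight from a Jena model. It is an illustration only: writeRdfa is a hypothetical helper, the Dublin Core property and example URI are placeholders, no XML escaping is done, and it assumes the modern org.apache.jena package layout.

import java.io.PrintWriter;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Statement;
import org.apache.jena.rdf.model.StmtIterator;

public class RdfaSketch {

    // Hypothetical helper: emits one RDFa-annotated paragraph per statement.
    static void writeRdfa(Model model, PrintWriter out) {
        out.println("<div xmlns:dc=\"http://purl.org/dc/elements/1.1/\">");
        StmtIterator it = model.listStatements();
        while (it.hasNext()) {
            Statement s = it.nextStatement();
            // about = subject URI, property = predicate (assumed to be in the dc: namespace here)
            out.printf("  <p about=\"%s\" property=\"dc:%s\">%s</p>%n",
                    s.getSubject().getURI(),
                    s.getPredicate().getLocalName(),
                    s.getObject().toString());
        }
        out.println("</div>");
    }

    public static void main(String[] args) {
        Model model = ModelFactory.createDefaultModel();
        model.createResource("http://example.com/book/1")
             .addProperty(model.createProperty("http://purl.org/dc/elements/1.1/", "title"),
                          "A Sample Title");

        writeRdfa(model, new PrintWriter(System.out, true));
    }
}

In a real application the same loop would live inside a Velocity or FreeMarker template, as the answer suggests, with the page layout around it.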
If you are willing to take another step forward, there are also Grails plugins that provide easy methods to produce RDFa from domain classes in views:
http://grails.org/plugin/rdfa

Web accessibility

So we should make accessible web sites, providing the alt attribute for img elements and all the other stuff. But while this affects a comparatively small number of users, I could not find any information about issues that affect each and every user.
Let me explain. If we were to simplify matters by saying that web sites should provide the most relevant information in the least amount of time, would I be wrong? Given this axiom, if I were to:
1 - Want to download the offline version of Acrobat Reader X. There is nothing, and I mean nothing, on the site http://www.adobe.com/products/reader.html that provides a hint, a link, or anything pointing to it. I have to use Google to find ftp://ftp.adobe.com/pub/adobe/reader/
2 - Again trying to find the offline version of Google Chrome at http://www.google.com/chrome/ . Nothing there that may lead to http://www.google.com/chrome/eula.html?standalone=1
3 - So Internet Explorer has an addon called Web Developer Tool Bar. It is safe to assume I will find it at http://www.ieaddons.com/in/. No such luck. Have to google it again and find it at http://www.microsoft.com/downloads/en/details.aspx?FamilyID=95e06cbe-4940-4218-b75d-b8856fced535
4 - Trying to get the Firebug addon from https://addons.mozilla.org/en-US/firefox/extensions/web-development/. Successfully navigated to Web Development. You can use "view all recently added" or "view all top downloads" or "view all top rated". What if you want to view all for web development? Of course, you use the search!
These are just some of the situations. I guess my question would be that are these not accessibility issues?
If the issues you are describing apply equally to, say, sighted users and to blind users using a screen reader, then no, they are not considered accessibility issues; they are perhaps broader usability issues.
If, for example, the adobe web site had no link at all to the offline version, and all users, sighted or not, had to do extra work to find it, that's a usability issue.
But if the web site had a graphic image that sighted users could see was a link to the download, but users using a screenreader did not get this information (eg. because the graphic had no ALT text, or the image was not operable via keyboard), then it's an accessibility issue.
There's certainly overlap between these; and it's often the case that usability issues are harder for disabled users to work around; but generally accessibility refers to cases where the design of a site confronts a user with a disability with additional barriers or challenges beyond those that users without a disability have to deal with.
I think it depends on your definition. Some definitions describe accessibility assuming that the correct website is known and is concerned only with the accessibility of that website. Others do describe the ease of users finding the required resource on the Web, which would encapsulate your issues above.
There are two reasons why accessibility is a failure on the web, and HTML is to blame for both.
1) HTML is not self-validating. SGML does not have a direct self-validating subset, and all versions of HTML before 5 are applications of SGML. HTML5 is based upon a specification document not expressed in any computer language, so it is perhaps even more lost.
XML does have a direct self-validating subset called schema. There are three widely recognized schema languages for XML: Schematron, Relax NG, W3C XML Schema (official).
By self-validating I mean that the language itself can be called upon to validate its instances without external assistance from the local parser. Without a self-validating component there is no assurance of the integrity of a document's structure, and therefore no integrity of accessibility. In a world where web browsers will parse anything without regard for the well-formedness of its structure, everything in practice becomes acceptable, completely without regard for accessibility.
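As a hedged illustration of what "calling the language to validate its instances" looks like with XML, here is a minimal sketch using the standard javax.xml.validation API; the file names are placeholders and the schema itself is assumed to exist.

import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class ValidateInstance {
    public static void main(String[] args) throws Exception {
        // Placeholder file names, for illustration only.
        File schemaFile = new File("page.xsd");
        File instanceFile = new File("page.xml");

        SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(schemaFile);
        Validator validator = schema.newValidator();

        try {
            validator.validate(new StreamSource(instanceFile));
            System.out.println("Document structure is valid against the schema.");
        } catch (SAXException e) {
            // A structural violation is reported rather than silently accepted,
            // which is the guarantee the answer argues HTML lacks.
            System.out.println("Invalid: " + e.getMessage());
        }
    }
}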
2) Less obvious and more devastating is that HTML does not understand its own structure. There are two levels of structure as defined in the HTML specifications: block-level elements and inline elements. According to the specifications the difference between these two structure levels is vested primarily in the visual intention of the elements' presentation, which contradicts other language in the specifications in that HTML is a data structure and not a presentational language.
Furthermore, two levels of structure are insufficient, and the actual structural definition of HTML elements exceeds a two-level structure anyway without explicitly stating so. For example, in HTML many block-level elements may contain a 'p' element representing a paragraph, but such an element may not contain other block-level elements, although many other block-level elements may certainly contain block-level children.
At a minimum, a three-level structure is required to describe natural language in a manner equally consumable by a human audience without the need for further accessibility assistance. In accordance with the structure defined in Mail Markup Language, there would be:
Complex blocks
Simple blocks
Inline elements
Complex blocks are purely structural in that they may contain simple blocks, or in some cases other complex block elements, but will never contain inline elements or text nodes. Simple blocks will never contain complex block or simple block elements, but may contain inline elements or text nodes. Inline elements are either singletons containing nothing or contain text nodes, but will never contain other elements.
Such a structure is self-sufficient in properly arranging and structuring content so that accessibility requirements are met immediately, in a manner where violating them is more costly and complex than simply conforming to the given structure. Once a sufficient structure is in place, all that is missing is the metadata supplied via descriptive and well-known element names and, in some cases, additional content via attributes.
If either of these two items is missing, a minimum baseline for accessibility cannot be assured. When both are missing, as with the web, then accessibility is likely a lost cause and an immediate failure.
Web accessibility
A website is made up of different kinds of content, such as images, text, videos, and buttons, combined with different colors.
Web accessibility means that people with disabilities can use the Web.
Web accessibility means that people with disabilities can perceive, understand, navigate, and interact with the Web, and that they can contribute to the Web.
Web accessibility also benefits others, including older people with changing abilities due to aging.
The main theme of web accessibility is creating a website that is accessible to everyone. After designing a website it is essential to check it for ADA compliance: whether it is accessible and how user-friendly it is for disabled people.

What are cons if we do not care about validation of XHTML and CSS?

What are the cons if we do not care about validation of XHTML and CSS? (Errors other than CSS 3 and vendor-specific properties.)
In terms of development time (how does valid XHTML and CSS save time in finding problems?),
Code debugging (how can we track down problems quickly?),
Cross-browser compatibility (how does it help us achieve cross-browser compatibility?),
Website maintainability (how would it make the site easier for someone else to maintain and update?),
Future changes to the website (how would it help with any design changes the client may ask for in the future?),
SEO ranking (how can it affect our site's search engine ranking?),
Accessibility (does validity of the code increase the accessibility of the site?)
I have to explain to a client's secretary that code validation is not just fashion; it is beneficial for his site. I'm not advocating this just to make more money. It's not useful only for the developer; it mainly benefits his website.
There's the obvious point that if your markup is valid, the odds of it being rendered as you want it to be by a wide variety of browsers are improved.
But separate from that, sometimes you spend valuable development time tracking down bugs (usually ones that seem specific to a given browser) only to find that the reason for the bug is that your markup is invalid and different browsers are handling the invalid markup in different ways. Validating (whether it's XHTML or HTML) saves you time tracking down those sorts of problems. There was an example here just yesterday, in fact. The OP thought he was having a weird Firefox-specific jQuery problem. In fact, he just had invalid markup, and fixing the markup fixed his problem.
So I'm thinking that you tell the client that validation saves time, and therefore money.
Note that this is an argument for validating, not for proclaiming validity (via icons and such).
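As a rough sketch of the point about catching broken markup early: XHTML, being XML, can be checked mechanically, and a parser will point at the exact spot instead of each browser quietly recovering in its own way. The snippet below uses the standard JAXP API; the deliberately broken markup is just a toy example, and this checks well-formedness only, not full validity.

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class WellFormednessCheck {
    public static void main(String[] args) throws Exception {
        // Deliberately broken markup: the <p> element is never closed.
        String page = "<html><body><div><p>Hello<div>world</div></body></html>";

        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new InputSource(new StringReader(page)));
            System.out.println("Well-formed.");
        } catch (SAXParseException e) {
            // The parser reports the exact position, instead of the
            // "works in one browser, breaks in another" behaviour described above.
            System.out.printf("Error at line %d, column %d: %s%n",
                    e.getLineNumber(), e.getColumnNumber(), e.getMessage());
        }
    }
}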
I found some very good answers here
http://validator.w3.org/docs/why.html
http://ianpouncey.com/weblog/2010/01/web-accessibility-myths/
Using markup improperly -- not according to specification -- hinders accessibility. Misusing markup for a presentation effect (e.g., using a table for layout or a header to change the font size) makes it difficult for users with specialized software to understand the organization of the page or to navigate through it. Furthermore, using presentation markup rather than structural markup to convey structure (e.g., constructing what looks like a table of data with an HTML PRE element) makes it difficult to render a page intelligibly to other devices (refer to the description of the difference between content, structure, and presentation).
http://www.w3.org/TR/WAI-WEBCONTENT/#gl-structure-presentation

Best practice for preventing saving malicious client script in HTML

We have an ASP.NET custom control that lets users enter HTML (similar to a Rich text box). We noticed that a user can potentially inject malicious client scripts within the <script> tag in the HTML view. I can validate HTML code on save to ensure that I remove any <script> elements.
Is this all I need to do? Are all other tags other than the <script> tag safe? If you were an attacker, what else would you attempt to do?
Any best practices I need to follow?
EDIT - How is the MS anti Xss library different from the native HtmlEncode for my purpose?
XSS (Cross Site Scripting) is a big and difficult subject to tackle correctly.
Instead of black-listing some tags (and missing some of the ways you may be attacked), it is better to decide on a set of tags that are OK for your site and only allowing them.
This in itself will not be enough, as you will have to catch all possible encodings an attacker might try and there are other things an attacker might try. There are anti-xss libraries that help - here is one from Microsoft.
For more information and guidance, see this OWASP article.
Have a look at this page:
http://ha.ckers.org/xss.html
to get an idea of different XSS attacks that somebody may try.
There's a whole lot to do when it comes to filtering out JavaScript from HTML. Here's a short list of some of the bigger points:
Multiple passes over the input are required to make sure that what you removed before doesn't create a new injection. If you're doing a single pass, things like <scr<script></script>ipt>alert("XSS!");</scr<script></script>ipt> will get past you, since after you remove the <script> tags from the string, you'll have created a new one.
Strip the use of the javascript: protocol in href and src attributes.
Strip embedded event handler attributes like onmouseover/out, onclick, onkeypress, etc.
White lists are safer than black lists. Only allow tags and attributes that you know are safe.
Make sure you're dealing with all the same character encoding. If you treat the input like ASCII (single byte) and the input has Unicode (multibyte) characters, you're going to get a nasty surprise.
Here's a more complete cheat sheet. Also, Oli linked to a good article at ha.ckers.org with samples to test your filtration.
Removing only the <script> tags will not be sufficient as there are lots of methods for encoding / hiding them in input. Most languages now have anti-xss and anti-csrf libraries and functions for filtering input. You should use one of these generally agreed upon libraries to filter your user input.
I'm not sure what the best options are in ASP.NET, but this might shed some light:
http://msdn.microsoft.com/en-us/library/ms998274.aspx
This is called a Cross Site Scripting (XSS) attack. They can be very hard to prevent, as there are a lot of surprising ways of getting JavaScript code to execute (javascript: URLs, sometimes CSS, object and iframe tags, etc).
The best approach is to whitelist tags, attributes, and types of URLs (and keep the whitelist as small as possible to do what you need) instead of blacklisting. That means you only allow certain tags that you know are safe, rather than banning tags that you believe to be dangerous. This way there are fewer possible ways for people to get an attack into your system, because tags you didn't think about simply won't be allowed; with blacklisting, if you miss something, you still have a vulnerability. Here's an example of a whitelist approach to sanitization.
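All of the answers above converge on whitelisting. As a rough sketch of what that looks like in code, here is the idea using the jsoup library on the JVM (not ASP.NET-specific, purely to illustrate the approach; older jsoup versions call the Safelist class Whitelist, and the exact tag/attribute set is something you would tune to your site):

import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

public class SanitizeExample {
    public static void main(String[] args) {
        String userInput =
                "<p onclick=\"alert('XSS')\">Hello <b>world</b>"
                + "<scr<script></script>ipt>alert('XSS');</scr<script></script>ipt>"
                + "<a href=\"javascript:alert('XSS')\">link</a></p>";

        // Whitelist approach: only the tags and attributes allowed here survive;
        // event handlers, script fragments, and javascript: URLs are all dropped.
        Safelist safelist = Safelist.basic(); // p, b, i, a[href] restricted to safe protocols, etc.

        String safe = Jsoup.clean(userInput, safelist);
        System.out.println(safe);
    }
}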
