itemtype with http or better https? - http

I use like:
itemtype="http://schema.org/ImageObject"
but the request http://schema.org/ImageObject will be forwarded to https://schema.org/ImageObject.
If I change to itemtype="https://schema.org/ImageObject", the Google SDTT shows no problem, but nearly all examples about structured data from Google are with http.
What is best or recommended to use http://schema.org or https://schema.org for itemtype?

From Schema.org’s FAQs:
Q: Should we write https://schema.org or http://schema.org in our markup?
There is a general trend towards using https more widely, and you can already write https://schema.org in your structured data. Over time we will migrate the schema.org site itself towards using https: as the default version of the site and our preferred form in examples. However http://schema.org -based URLs in structured data markup will remain widely understood for the forseeable future and there should be no urgency about migrating existing data. This is a lengthy way of saying that both https://schema.org and http://schema.org are fine.
tl;dr: Both variants are possible.
The purpose of itemtype URIs
Note that the URIs used for itemtype are primarily identifiers, they typically don’t get dereferenced:
If a Microdata consumer doesn’t know what the URI in itemtype="http://schema.org/ImageObject" stands for, this consumer "must not automatically dereference" it.
If a Microdata consumer does know what the URI stands for, this consumer has no need to dereference this URI in the first place.
So, there is no technical reason to prefer the HTTPS variant. User agents won’t dereference this URI (in contrast to URIs specified in href/src attributes), and users can’t click on it. I think there is only one case where the HTTPS variant is useful: if a visitor looks into the source code and copy-pastes the URI to check what the type is about.
I would recommend to stick with the HTTP variant until Schema.org switched everything to HTTPS, most importantly the URI in RDF’a initial context.

The specification of Schema for the type ImageObject indicated:
Canonical URL: http://schema.org/ImageObject
It is probably useful to refer to the canonical URL because it is the “preferred” version of the web page.

Related

RESTful URLs and folders

On the Microformats spec for RESTful URLs:
GET /people/1
return the first record in HTML format
GET /people/1.html
return the first record in HTML format
and /people returns a list of people
So is /people.html the correct way to return a list of people in HTML format?
If you just refer to the URL path extension, then, yes, that scheme is the recommended behavior for content negotiation:
path without extension is a generic URL (e.g. /people for any accepted format)
path with extension is a specific URL (e.g. /people.json as a content-type-specific URL for the JSON data format)
With such a scheme the server can use content negotiation when the generic URL is requested and respond with a specific representation when a specific URL is requested.
Documents that recommend this scheme are among others:
Cool URIs don't change
Cool URIs for the Semantic Web
Content Negotiation: why it is useful, and how to make it work
You have the right idea. Both /people and /people.html would return HTML-formatted lists of people, and /people.json would return a JSON-formatted list of people.
There should be no confusion about this with regard to applying data-type extensions to "folders" in the URLs. In the list of examples, /people/1 is itself used as a folder for various other queries.
It says that GET /people/1.json should return the first record in JSON format. - Which makes sense.
URIs and how you design them have nothing to do with being RESTful or not.
It is a common practice to do what you ask, since that's how the Apache web server works. Let's say you have foo.txt and foo.html and foo.pdf, and ask to GET /foo with no preference (i.e. no Accept: header). A 300 MULTIPLE CHOICES would be returned with a listing of the three files so the user could pick. Because browsers do such marvelous content negotiation, it's hard to link to an example, but here goes: An example shows what it looks like, except for that the reason you see the page in the first place is the different case of the file name ("XSLT" vs "xslt").
But this Apache behaviour is echoed in conventions and different tools, but really it isn't important. You could have people_html or people?format=html or people.html or sandwiches or 123qweazrfvbnhyrewsxc6yhn8uk as the URI which returns people in HTML format. The client doesn't know any of these URIs up front, it's supposed to learn that from other resources. A human could see the result of All People (HTML format) and understand what happens, while ignoring the strange looking URI.
On a closing note, the microformats URL conventions page is absolutely not a spec for RESTful URLs, it's merely guidance on making URIs that apparently are easy to consume by various HTTP libraries for some reason or another, and has nothing to do with REST at all. The guidelines are all perfectly OK, and following them makes your URIs look sane to other people that happen to glance on the URIs (/sandwiches is admittedly odd). But even the cited AtomPub protocol doesn't require entries to live "within" the collection...

Is it old-fashioned use query string for id?

I am curious if is out-of-date to use query string for id. We have webapp running on Net 2.0. When we display detail of something (can be product) we use query string like this : http://www.somesite.com/Shop/Product/Detail.aspx?ProductId=100
We use query string for reason that user can save the link somewhere and come back any time later. I suppose that we use url rewriting soon or later but in mean time I would like to know your opinion. Thanks.Cheers, X.
A common strategy is to use an item ID in the URL, coupled with some keywords that describe the item. This is good from a user's perspective, because they can easily see what a URL refers to if they save it somewhere. More importantly, it's useful from a SEO (Search Engine Optimisation) point of view, as search engines will - it is said - rate a given URL more highly if it contains the keywords someone is searching for.
You can see this approach on this very site, where the ID after 'questions' is used for the database query and the text is purely for the benefit of users and search engines.
Whether you use a straightforward query string, or a more advanced approach that makes the ID look like part of the folder path, is up to you. It's largely a matter of personal taste.
Yes, it is old fashioned!
However, if you are thinking about changing it to a RESTful implementation as others have suggested, then you should continue to support the old URL and querystring addresses by implementing an HTTP 301 redirect to forward from the querystring URLs, to the new restful URLs. This will ensure that any users old links and bookmarks will continue to work while telling the search engine bots that the url has changed.
Since your post is tagged ASP.Net, there is a good write-up on how you can support both, using the new ASP.Net routing mechanism here: http://msdn.microsoft.com/en-us/magazine/dd347546.aspx
Nothing wrong with query string parameters. Simple to create and understand. A lot of sites are using fancy urls like 'www.somesite.com/Shop/Product/white_sox_t_shirt` which is cool and sort-of user friendly, but more work for us poor developers.
Using query strings is not outdated at all, it just has to be used in the right places. However, never place anything in the query string that could be a security issue and remember that anything you read from the query string could have been modified so you should be validating all input in your checks.
It's not outdated, but anothter alternative is a more RESTful approach:
yourwebsite.com/products/100/usb-coffee-maker
The reason is that a) search engines usually ignore any URL with a QueryString (so the product.aspx?id=100 page may never get indexed) and b) having the name in the url purely for display purposes supposedly helps SEO as well.
Permanent links are best for SEO and also , what if your product moved to another database , and the ID of the product needs to be changed ?
I don't think the chances of a product's name will be changed or the manufacturer.
E.g Apple/Iphone won't change :) Seems to me a good Permalink

Any justification for an IT policy that query parameters should not be used?

My company, which builds ad server, affiliate network, contact form and CRM software was acquired last year, and we are now in the process of reworking our technology to fit the IT policies and guidelines of the parent corporation.
One of these policies is a tremendous sticking point and causing all sorts of problems for us:
No query parameters are to be used in any URL visible to the end user
This includes content URLs, ad clickthrough targets, redirects, anything which will either show up in the address bar or in a mouseover status bar update. The effect would be no affiliate ID parameters, media source tracking IDs, session IDs, CMS content selection parameters, anything. Several fundamental functions of our software simply can't be accomplished without passing parameter data from one page to another. In our case, many of these links are from different sites or subdomains, it's not possible to pass data via cookies, either
The only justification I've been given is that query parameters prevent some proxy caches from working properly. This makes no sense to me--I've never heard of such a thing--and nobody is willing or interested in discussing it at length. I've not even been given an example of what specifically is broken or why the policy was created.
In any case, this being a global corporate IT policy, in the end the reasoning doesn't matter, only compliance. Although getting it changed is most likely out of the question, I would still like to understand what valid concerns may have prompted its institution. Understanding the mindset may be a first step towards finding a workaround.
My first thought for a workaround was to embed parameters within the path portion of the URL and extract them with an Apache mod_rewrite, but this is out of the question because:
Corollary: Every URL must present unique content available through no other URL
So making multiple URLs which actually refer to the same page but contain other parameter data in the URL is also unacceptable.
Questions:
Is there valid justification for not using query parameters?
Specifically what proxies or systems fail to work when query parameters are present?
Does it possibly have something to do with SEO? The corollary makes it appear so.
What workarounds might there be for passing data from one site to another under this restriction?
i only have answer for the "workaround" question: use PATH_INFO.
edit to be more specific
instead of /banner.php?what=ever&any=thing use /banner.php/what=ever/any=thing. apache will still serve the request through /banner.php, and /what=ever/any=thing will be present in $_SERVER['PATH_INFO']. you'll have to rawurldecode and explode the string yourself since the webserver won't do that for you, but that's no big deal.

So why should we use POST instead of GET for posting data? [duplicate]

This question already has answers here:
Closed 13 years ago.
Possible Duplicates:
How should I choose between GET and POST methods in HTML forms?
When do you use POST and when do you use GET?
Obviously, you should. But apart from doing so to fulfil the HTTP protocol, are there any reasons to do so? Less overhead? Some kind of security thing?
because GET must not alter the state of the server by definition.
see RFC2616 9.1.1 Safe Methods:
9.1.1 Safe Methods
Implementors should be aware that the
software represents the user in their
interactions over the Internet, and
should be careful to allow the user to
be aware of any actions they might
take which may have an unexpected
significance to themselves or others.
In particular, the convention has been
established that the GET and HEAD
methods SHOULD NOT have the
significance of taking an action other
than retrieval. These methods ought to
be considered "safe". This allows user
agents to represent other methods,
such as POST, PUT and DELETE, in a
special way, so that the user is made
aware of the fact that a possibly
unsafe action is being requested.
If you use GET to alter the state of the server then a search engine bot or some link prefetching extension in a web browser can wreak havoc on your site and (for example) delete all user data just by following links to your site.
There is a nice paper by the W3C about this: URIs, Addressability, and the use of HTTP GET and POST.
1.3 Quick Checklist for Choosing HTTP GET or POST
Use GET if:
The interaction is more like a question (i.e., it is a safe operation such as a query, read operation, or lookup).
Use POST if:
The interaction is more like an order, or
The interaction changes the state of the resource in a way that the user would perceive (e.g., a subscription to a service), or
The user be held accountable for the results of the interaction
Because, if you use GET to alter state, Google can delete your stuff.
When do you use POST and when do you use GET?
How should I choose between GET and POST methods in HTML forms?
If you accept GETs to perform write operations then a malicious hacker could inject somewhere links to perform an unauthorized operation. Your user clicks on a link - and something is deleted from a database. Or maybe some amount of money is transferred away from the user's account if he's still logged in to their online banking.
http://superbank.com/TransferMoney?amount=1000&recipient=2342524
Send a malicious email with an embedded image referencing this link, and as soon as the document is opened, something funny has happened behind the scenes.
GET is limited by the length of URL the browser/server can handle. This used to be as short as 256 characters.
There is atleast one situation where you want a GET to change data on the server. That is when a GET returns data, and you need to record which data was given to a user and when it was given.
If you use complex data types then it must be in a POST it cannot be in a GET. For example testing a WCF web service in a browser can only be done when the contract uses simple data types.
Using GET and POST where it is expected helps to keep your program understandable.
When you use POST, you can see the information being "posted" in the address-bar of the web browser. This is [apparently] not the case when you use the GET method.
This article was somewhere on http://www.w3schools.com/ Once I've found the exact page it was on, I'll repost. :-)

Do search engines respect the HTTP header field “Content-Location”?

I was wondering whether search engines respect the HTTP header field Content-Location.
This could be useful, for example, when you want to remove the session ID argument out of the URL:
GET /foo/bar?sid=0123456789 HTTP/1.1
Host: example.com
…
HTTP/1.1 200 OK
Content-Location: http://example.com/foo/bar
…
Clarification:
I don’t want to redirect the request, as removing the session ID would lead to a completely different request and thus probably also a different response. I just want to state that the enclosed response is also available under its “main URL”.
Maybe my example was not a good representation of the intent of my question. So please take a look at What is the purpose of the HTTP header field “Content-Location”?.
I think Google just announced the answer to my question: the canonical link relation for declaring the canonical URL.
Maile Ohye from Google wrote:
MickeyC said...
You should have used the Content-Location header instead, as per:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
"14.14 Content-Location"
#MikeyC: Yes, from a theoretical standpoint that makes sense and we certainly considered it. A few points, however, led us to choose :
Our data showed that the "Content-Location" header is configured improperly on many web sites. Sometimes webmasters provide long, ugly URLs that aren’t even duplicates -- it's probably unintentional. They're likely unaware that their webserver is even sending the Content-Location header.
It would've been extremely time consuming to contact site owners to clean up the Content-Location issues throughout the web. We realized that if we started with a clean slate, we could provide the functionality more quickly. With Microsoft and Yahoo! on-board to support this format, webmasters need to only learn one syntax.
Often webmasters have difficulty configuring their web server headers, but can more easily change their HTML. rel="canonical" seemed like a friendly attribute.
http://googlewebmastercentral.blogspot.com/2009/02/specify-your-canonical.html?showComment=1234714860000#c8376597054104610625
Most decent crawlers do follow Content-Location. So, yes, search engines respect the Content-Location header, although that is no guarantee that the URL having the sid parameter will not be on the results page.
In 2009 Google started looking at URIs qualified as rel=canonical in the response body.
Looks like since 2011, links formatted as per RFC5988 are also parsed from the header field Link:. It is also clearly mentioned in the Webmaster Tools FAQ as a valid option.
Guess this is the most up-to-date way of providing search engines some extra hypermedia breadcrumbs to follow - thus allow keeping you to keep them out of the response body when you don't actually need to serve it as content.
In addition to using 'Location' rather than 'Content-Location' use the proper HTTP status code in your response depending on your reason for redirect. Search engines tend to favor permanent redirect (301) status vs temporary (302) status.
Try the "Location:" header instead.

Resources