HTML & XHTML & XML - What is a URI in layman's terms? - uri

What is a URI in HTML, XHTML and XML?

A Uniform Resource Identifier (URI) is simply a uniform string that identifies some resource on the 'net.

Related

What standard specifies the inner structure of the query component in an HTTP URI?

According to RFC3986 (URI),
The query component is indicated by the first question
mark ("?") character and terminated by a number sign ("#") character
or by the end of the URI.
It also specifies which characters are allowed inside. That is the generic URI syntax.
In daily interaction with various HTTP/Web servers, URIs in the http scheme carry query components written as key=value pairs separated by the & sign. RFC7230 (HTTP/1.1) says nothing about this, only that the content of the query component follows the generic RFC3986 definition.
The only standard defining said key-value pairs is HTML 4.01 while talking about content type application/x-www-form-urlencoded. It's also the only standard saying + should be treated as Space character in the query component.
However, as far as I could dig up in the specs, the Content-Type header only applies to the message body, not to the URI. And when, as an experiment, I google for "asd zxc", Chrome sends the request /search?q=asd+zxc to Google without specifying the application/x-www-form-urlencoded content type at all.
Is it just conventional or am I missing something?
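To make the distinction in the question concrete, here is a minimal Python sketch (the language and the sample URL are my own choice, not part of the question). It shows that generic URI handling only delimits the query, while the key=value and "+"-as-space reading comes from a separate form-urlencoded parser.

```python
from urllib.parse import urlsplit, parse_qs, unquote

# Hypothetical URL modelled on the experiment described above.
url = "https://www.google.com/search?q=asd+zxc&hl=en#top"
parts = urlsplit(url)

# Generic syntax: the query runs from the first "?" to "#" or the end.
query = parts.query
print(query)                      # q=asd+zxc&hl=en

# Plain percent-decoding does not touch "+" or split on "&".
generic = unquote(query)
print(generic)                    # q=asd+zxc&hl=en

# The form-urlencoded convention splits on "&" and "=" and maps "+" to a space.
form_pairs = parse_qs(query)
print(form_pairs)                 # {'q': ['asd zxc'], 'hl': ['en']}
```

The key=value reading is applied by whichever parser the server chooses, independent of any Content-Type header on the request.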

How can I get the full url including characters after a # char

I have the following code:
<%@Language="VBSCRIPT"%>
<html>
<head>
<title>in/Test Page 2</title>
</head>
<body>
<%
response.write "Request.QueryString = """ & Request.QueryString & """<br />"
for each item in request.QueryString
response.write (item + " = " + Request.QueryString(item) + "<br>")
next
%>
</body>
</html>
When I call it with this url:
TestPage2.asp?id=1&turl=http://www.google.com?id=1&url=generic#index
it produces this output:
Request.QueryString = "id=1&turl=http://www.google.com?id=1&url=generic"
id = 1
turl = http://www.google.com?id=1url=generic
How do I get the bit after the # char in the original url? I've looked all through Request.ServerVariables.
Short answer: You can't
It's by design.
From RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax - Section 4.1 Fragment Identifier
When a URI reference is used to perform a retrieval action on the identified resource, the optional fragment identifier, separated from the URI by a crosshatch (“#”) character, consists of additional reference information to be interpreted by the user agent after the retrieval action has been successfully completed. As such, it is not part of a URI, but is often used in conjunction with a URI.
Basically, the fragment (the # and everything after it) is never sent to the server, so you can't retrieve it from Classic ASP. The client (the web browser) removes the fragment from the URI before the request is sent.
Having said that, it is possible to read it on the client side using JavaScript and then pass the value via a form or query string to the server, where Classic ASP can retrieve and use it.
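As an illustration only (Python rather than ASP, and with a made-up host name), this sketch splits a URL the way a client does before issuing the request. The fragment is kept apart and never becomes part of the request target.

```python
from urllib.parse import urlsplit

# example.com is a placeholder host; the path and query mimic the question.
url = "http://example.com/TestPage2.asp?id=1&url=generic#index"
parts = urlsplit(url)

# What a client would put on the HTTP request line: path plus query only.
request_target = parts.path + ("?" + parts.query if parts.query else "")
print(request_target)    # /TestPage2.asp?id=1&url=generic

# The fragment stays on the client side.
fragment = parts.fragment
print(fragment)          # index
```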

How to determine if a URI is escaped?

I am using Apache Commons HttpClient to download web resources. The URIs for these resources come from third parties; I do not generate them.
The commons httpclient requires a URI object to be given to the GetMethod object.
The URI constructor takes a string (for the uri) and a boolean specifying if it is escaped or not.
Currently, I am doing the following to determine if the original url I am given is already escaped...
boolean isEscaped = URIUtil.getPathQuery(originalUrl).contains("%");
m.setURI(new URI(originalUrl, isEscaped));
Is this the correct way to determine if a uri is already escaped?
Update...
According to Wikipedia (http://en.wikipedia.org/wiki/Percent-encoding), percent is a reserved character and should always be encoded. I am quoting verbatim here:
Percent-encoding the percent character: Because the percent ("%")
character serves as the indicator for percent-encoded octets, it must
be percent-encoded as "%25" for that octet to be used as data within a
URI.
Doesn't this mean that you can never have a naked '%' character in a valid URI?
Also, the uri(s) come from various sources so I cannot be sure if they are escaped or unescaped.
This wouldn't work. The unencoded string may already contain a % of its own.
For example:
https://www.google.com/#q=like%25&safe=off
is the URL for a Google search for like%. In unescaped form it would be https://www.google.com/#q=like%&safe=off
Your consumers should tell you whether the URI is escaped or not.
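To show why a "contains %" test is unreliable, here is a small Python sketch (not the Commons HttpClient API). The helper `looks_escaped` is a hypothetical name of my own, and it is only a heuristic: it can prove that a string is not validly escaped, but it cannot prove that it is.

```python
import re
from urllib.parse import unquote

def looks_escaped(s: str) -> bool:
    """Heuristic: True if s contains "%" and every "%" is followed by two hex digits."""
    return "%" in s and re.search(r"%(?![0-9A-Fa-f]{2})", s) is None

# A bare "%" cannot be part of a valid escape, so this is certainly unescaped.
bare = looks_escaped("q=like%&safe=off")
print(bare)          # False

# This passes the check ...
encoded = looks_escaped("q=like%25&safe=off")
print(encoded)       # True

# ... but so does raw text that merely happens to contain "%20".
ambiguous = looks_escaped("discount=%20off")
print(ambiguous)     # True

# Decoding changes the meaning, which is wrong if the sender meant the literal text.
decoded = unquote("discount=%20off")
print(decoded)       # discount= off
```

The third case is the reason the information has to come from whoever supplies the URI.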

How to Show & in xml constructed from string

I am creating a web API using ASP.NET MVC 4, and the response output is XML. Before outputting to the browser I modify the XML response so that one of the values between the start and closing tags contains a URL string, which may have '&' in it.
When output in the browser, this generates an error that the XML is not well formed.
I have read in "How to show & in a XML attribute That would be produced by XSLT" that one can use D-O-E (disable-output-escaping) to generate unescaped content using XSLT,
but I don't know how this could apply to XML generated from a string and displayed in the browser.
You should encode the & as
&amp;
which is understood by XML (see http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined%5Fentities%5Fin%5FXML)
Another alternative would be to wrap the value in a CDATA section (http://stackoverflow.com/questions/2784183/what-does-cdata-in-xml-mean)
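A rough sketch of the escaping approach, written in Python for illustration rather than ASP.NET; the element name `link` and the URL are assumptions. It shows escaping by hand when the XML is built from a string, and letting an XML library do it during serialization.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

# Hypothetical URL value containing an ampersand.
url = "http://foo.com/page.aspx?color=blue&size=15"

# Building XML from a string: escape the text before placing it between tags.
manual = "<link>" + escape(url) + "</link>"
print(manual)    # <link>http://foo.com/page.aspx?color=blue&amp;size=15</link>

# Building XML through a library: the serializer escapes the text itself.
elem = ET.Element("link")
elem.text = url
built = ET.tostring(elem, encoding="unicode")
print(built)     # <link>http://foo.com/page.aspx?color=blue&amp;size=15</link>

# A parser turns &amp; back into & when the document is read.
round_trip = ET.fromstring(manual).text
print(round_trip == url)   # True
```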

Query string: Can a query string contain a URL that also contains query strings?

Example:
http://foo.com/generatepdf.aspx?u=http://foo.com/somepage.aspx?color=blue&size=15
I added the iis tag because I am guessing it also depends on what server technology you use?
The server technology shouldn't make a difference.
When you pass a value in a query string you need to URL-encode the name/value pair. If the value contains a special character such as a question mark (?), encode that character as %3F. If you then need to nest the already encoded URL inside yet another query string, encode it again at each level, so the original ? becomes %253F, then %25253F, and so on.
You'll probably want to UrlEncode the URL that is in the query string.
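The encoding described above can be sketched in Python as follows (an illustration, not the ASP.NET UrlEncode call itself; the parameter name `u` comes from the question's example).

```python
from urllib.parse import quote, urlencode, urlsplit, parse_qs

inner = "http://foo.com/somepage.aspx?color=blue&size=15"

# Encode the inner URL as the value of the "u" parameter.
outer = "http://foo.com/generatepdf.aspx?" + urlencode({"u": inner})
print(outer)
# http://foo.com/generatepdf.aspx?u=http%3A%2F%2Ffoo.com%2Fsomepage.aspx%3Fcolor%3Dblue%26size%3D15

# Decoding on the receiving side gives the inner URL back intact.
recovered = parse_qs(urlsplit(outer).query)["u"][0]
print(recovered == inner)   # True

# Each extra level of nesting re-encodes the "%" itself.
once = quote("?", safe="")
twice = quote(once, safe="")
thrice = quote(twice, safe="")
print(once, twice, thrice)  # %3F %253F %25253F
```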
As reported in http://en.wikipedia.org/wiki/Query_string
W3C recommends that all web servers support semicolon separators in
addition to ampersand separators (link reported on that wiki page) to allow
application/x-www-form-urlencoded query strings in URLs within HTML
documents without having to entity escape ampersands.
So I suppose the answer to the question is yes, and you have to replace the "&" ampersand usually used as the key=value separator with a ";" semicolon.
Yes it can, as far as I can tell, according to RFC 3986: Uniform Resource Identifier (URI): Generic Syntax (from year 2005):
This is the BNF for the query string:
query = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
The spec says:
The characters slash ("/") and question mark ("?") may represent data within the query component.
as query components are often used to carry identifying information in the form of "key=value" pairs and one frequently used value is a reference to another URI, it is sometimes better for usability to avoid percent-encoding those characters
(But I suppose your server framework might or might not follow the specification exactly.)
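For what it's worth, here is how one particular parser (Python's urllib.parse, used purely as an example of "a server framework") treats the unencoded URL from the question. The "/" and "?" survive inside the query, but the unencoded "&" is still read as a pair separator.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://foo.com/generatepdf.aspx?u=http://foo.com/somepage.aspx?color=blue&size=15"
parts = urlsplit(url)

# Only the first "?" starts the query; the second one is ordinary data.
raw_query = parts.query
print(raw_query)   # u=http://foo.com/somepage.aspx?color=blue&size=15

# The "&" splits the pairs, so size=15 ends up on the outer URL.
pairs = parse_qs(raw_query)
print(pairs)       # {'u': ['http://foo.com/somepage.aspx?color=blue'], 'size': ['15']}
```

So the URL is syntactically valid, but the inner URL's second parameter is lost unless at least the "&" is percent-encoded.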
No, but you can encode the url and decode it later.
