What is a URI in HTML, XHTML and XML?
A Uniform Resource Identifier (URI) is simply a uniform string that identifies some resource on the 'net.
According to RFC3986 (URI),
The query component is indicated by the first question
mark ("?") character and terminated by a number sign ("#") character
or by the end of the URI.
The RFC also specifies which characters are allowed inside the query component. That covers the generic URI syntax.
In day-to-day interaction with HTTP servers, URIs in the http scheme usually carry query components as key=value pairs separated by the & sign. RFC 7230 (HTTP/1.1) says nothing about this convention; it only states that the content of the query component follows the generic RFC 3986 definition.
The only standard defining said key-value pairs is HTML 4.01 while talking about content type application/x-www-form-urlencoded. It's also the only standard saying + should be treated as Space character in the query component.
However, as far as I could dig up in the specs, the Content-Type header only applies to the message body, not to the URI. And when, as an experiment, I google for "asd zxc", Chrome sends the request /search?q=asd+zxc to Google without specifying the application/x-www-form-urlencoded content type at all.
Is it just conventional or am I missing something?
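To make the convention concrete, here is a minimal sketch using Python's standard urllib.parse module, which applies the application/x-www-form-urlencoded rules to a query string regardless of any Content-Type header. The request target is modeled on the experiment above; nothing here is specific to Google or Chrome.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# A request target like the one observed in the experiment (illustrative).
target = "/search?q=asd+zxc"

# Generic URI syntax (RFC 3986) only tells us where the query starts and ends.
query = urlsplit(target).query
print(query)  # q=asd+zxc

# parse_qs layers the form-urlencoded convention on top:
# "&" separates pairs, "=" separates key from value, "+" decodes to a space.
params = parse_qs(query)
print(params)  # {'q': ['asd zxc']}

# urlencode goes the other way and emits "+" for the space.
encoded = urlencode({"q": "asd zxc"})
print(encoded)  # q=asd+zxc
```

The split into key=value pairs happens entirely in the second step, which is why the generic URI specification does not need to mention it.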
I have the following code:
<%@Language="VBSCRIPT"%>
<html>
<head>
<title>in/Test Page 2</title>
</head>
<body>
<%
' Dump the raw query string, then each key/value pair the server received.
Response.Write "Request.QueryString = """ & Request.QueryString & """<br />"
For Each item In Request.QueryString
    Response.Write item & " = " & Request.QueryString(item) & "<br />"
Next
%>
</body>
</html>
When I call it with this URL:
TestPage2.asp?id=1&turl=http://www.google.com?id=1&url=generic#index
it produces this output:
Request.QueryString = "id=1&turl=http://www.google.com?id=1&url=generic"
id = 1
turl = http://www.google.com?id=1
url = generic
How do I get the part after the # character in the original URL? I've looked all through Request.ServerVariables.
Short answer: You can't
It's by design.
From RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax - Section 4.1 Fragment Identifier
When a URI reference is used to perform a retrieval action on the identified resource, the optional fragment identifier, separated from the URI by a crosshatch (“#”) character, consists of additional reference information to be interpreted by the user agent after the retrieval action has been successfully completed. As such, it is not part of a URI, but is often used in conjunction with a URI.
Basically, the # fragment is never sent to the server, so you can't retrieve it from Classic ASP. The client (the web browser) removes the fragment from the URI before sending the request.
Having said that, it is possible to read the fragment on the client side using JavaScript and then pass the value to the server via a form or query string, where Classic ASP can retrieve and use it.
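The split between what the server sees and what stays in the browser can be sketched with Python's urllib.parse. This is only an illustration of the URI syntax, not of how any particular browser is implemented; the host example.com is an assumption, since the URL in the question is relative.

```python
from urllib.parse import urlsplit

# Hypothetical absolute form of the URL from the question.
url = ("http://example.com/TestPage2.asp"
       "?id=1&turl=http://www.google.com?id=1&url=generic#index")

parts = urlsplit(url)

# What a client would put on the HTTP request line: path plus query only.
request_target = parts.path + ("?" + parts.query if parts.query else "")
print(request_target)
# /TestPage2.asp?id=1&turl=http://www.google.com?id=1&url=generic

# The fragment is kept by the client and never transmitted.
fragment = parts.fragment
print(fragment)  # index
```

To make the fragment available server-side, the client has to copy it into the query string or a form field explicitly.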
I am using Apache Commons HttpClient to download web resources. The URIs for these resources come from third parties; I do not generate them.
The commons httpclient requires a URI object to be given to the GetMethod object.
The URI constructor takes a string (for the uri) and a boolean specifying if it is escaped or not.
Currently, I am doing the following to determine if the original url I am given is already escaped...
boolean isEscaped = URIUtil.getPathQuery(originalUrl).contains("%");
m.setURI(new URI(originalUrl, isEscaped));
Is this the correct way to determine if a uri is already escaped?
Update...
Well, according to Wikipedia ( http://en.wikipedia.org/wiki/Percent-encoding ), percent is a reserved character and should always be encoded. I am quoting verbatim here:
Percent-encoding the percent character: Because the percent ("%")
character serves as the indicator for percent-encoded octets, it must
be percent-encoded as "%25" for that octet to be used as data within a
URI.
Doesn't this mean that you can never have a naked '%' character in a valid URI?
Also, the uri(s) come from various sources so I cannot be sure if they are escaped or unescaped.
This wouldn't work. It's possible the un-encoded string has a % in it already.
For example:
https://www.google.com/#q=like%25&safe=off
is the URL for a Google search for like%. In unescaped form it would be https://www.google.com/#q=like%&safe=off
Your consumers should let you know if the URI is escaped or not.
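The ambiguity can be shown with a short Python sketch. The looks_escaped helper below is hypothetical and only a heuristic: it checks that every "%" starts a valid two-hex-digit triplet, which is stricter than a plain contains("%") test but still cannot tell whether such a triplet was meant literally.

```python
import re
from urllib.parse import quote, unquote

# Matches one percent-encoded octet such as %25 or %3f.
PCT_TRIPLET = re.compile(r"%[0-9A-Fa-f]{2}")

def looks_escaped(s: str) -> bool:
    """Heuristic only: True if s contains '%' and every '%' begins a %XX triplet."""
    return "%" in s and len(PCT_TRIPLET.findall(s)) == s.count("%")

print(looks_escaped("q=like%25"))  # True: consistent with an escaped string
print(looks_escaped("q=like%"))    # False: a bare '%' cannot be escaped form

# Still ambiguous: escaped text "5 off", or the literal text "5%20off"?
print(looks_escaped("discount=5%20off"))  # True either way

# Round trip for the search term in the example above.
print(unquote("like%25"))  # like%
print(quote("like%"))      # like%25
```

A False result reliably means the string is not in escaped form, but a True result proves nothing, which is why the source of the URI has to state which form it is in.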
I am creating a web API using ASP.NET MVC 4, and the response output is XML. Before outputting to the browser I modify the XML response so that one of the values between the start and closing tags contains a URL string which may include '&'.
When the response is displayed in the browser, this generates an error that the XML is not well formed.
I have read in "How to show & in a XML attribute That would be produced by XSLT" that one can use D-O-E (disable-output-escaping) to generate unescaped content using XSLT,
but I don't know how this could apply to XML generated from a string and displayed in a browser.
You should encode the & as
&amp;
which is understood by XML (see http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined%5Fentities%5Fin%5FXML)
Another alternative would be to wrap the value in a CDATA section (http://stackoverflow.com/questions/2784183/what-does-cdata-in-xml-mean)
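Both options can be checked with a small Python sketch using the standard library. The element name link and the sample URL are assumptions made for illustration; the question's actual code is ASP.NET, where the same escaping rules apply.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Illustrative URL containing a raw ampersand.
url = "http://foo.com/somepage.aspx?color=blue&size=15"

# A raw "&" in element content makes the document not well formed.
try:
    ET.fromstring("<link>" + url + "</link>")
    raw_ok = True
except ET.ParseError:
    raw_ok = False
print(raw_ok)  # False

# Option 1: escape the value, turning "&" into "&amp;".
escaped = escape(url)
print(escaped)  # http://foo.com/somepage.aspx?color=blue&amp;size=15
doc = ET.fromstring("<link>" + escaped + "</link>")
print(doc.text == url)  # True: the parser hands back the original URL

# Option 2: wrap the raw value in a CDATA section.
cdata_doc = ET.fromstring("<link><![CDATA[" + url + "]]></link>")
print(cdata_doc.text == url)  # True
```

Escaping is usually the safer default, because a CDATA section breaks if the value itself ever contains the sequence "]]>".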
Example:
http://foo.com/generatepdf.aspx?u=http://foo.com/somepage.aspx?color=blue&size=15
I added the iis tag because I am guessing it also depends on what server technology you use?
The server technology shouldn't make a difference.
When you pass a value to a query string you need to url encode the name/value pair. If you want to pass in a value that contains a special character such as a question mark (?) you'll just need to encode that character as %3F. If you then needed to recursively pass another query string to the encoded url, you'll need to double/triple/etc encode the url resulting in the original ? turning into %253F, %25253F, etc.
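The repeated encoding described above can be reproduced with Python's urllib.parse.quote. This is a sketch of the encoding rule only; in ASP.NET the equivalent call would be a UrlEncode helper rather than these functions.

```python
from urllib.parse import parse_qs, quote, urlsplit

# Each extra level of nesting encodes the "%" introduced by the previous level.
once = quote("?", safe="")
twice = quote(once, safe="")
thrice = quote(twice, safe="")
print(once, twice, thrice)  # %3F %253F %25253F

# Embedding the inner URL from the example as a single query value.
inner = "http://foo.com/somepage.aspx?color=blue&size=15"
outer = "http://foo.com/generatepdf.aspx?u=" + quote(inner, safe="")
print(outer)

# The receiving side gets the inner URL back intact as the value of "u".
recovered = parse_qs(urlsplit(outer).query)["u"][0]
print(recovered == inner)  # True
```

Passing safe="" matters here: by default quote leaves "/" unencoded, which is fine for paths but not what you want for a value that must survive as one parameter.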
You'll probably want to UrlEncode the URL that is in the query string.
As reported in http://en.wikipedia.org/wiki/Query_string
W3C recommends that all web servers support semicolon separators in
addition to ampersand separators (link reported on that wiki page) to allow
application/x-www-form-urlencoded query strings in URLs within HTML
documents without having to entity escape ampersands.
So I suppose the answer to the question is yes, and you have to replace the "&" ampersand usually used as the key=value separator with a ";" semicolon.
Yes it can, as far as I can tell, according to RFC 3986: Uniform Resource Identifier (URI): Generic Syntax (from year 2005):
This is the BNF for the query string:
query = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
The spec says:
The characters slash ("/") and question mark ("?") may represent data within the query component.
as query components are often used to carry identifying information in the form of "key=value" pairs and one frequently used value is a reference to another URI, it is sometimes better for usability to avoid percent-encoding those characters
(But I suppose your server framework might or might not follow the specification exactly.)
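As a rough check of what the grammar permits, the sketch below parses the example URL with the inner URL left unencoded, using Python's urllib.parse. The "/" and "?" inside the query survive as data, as the RFC allows, but the inner "&" is still treated as a pair separator by the key=value convention.

```python
from urllib.parse import parse_qs, urlsplit

# Example URL with the inner URL left unencoded.
url = ("http://foo.com/generatepdf.aspx"
       "?u=http://foo.com/somepage.aspx?color=blue&size=15")

# Only the first "?" starts the query; later "?" and "/" are plain data.
query = urlsplit(url).query
print(query)  # u=http://foo.com/somepage.aspx?color=blue&size=15

# The "&" splits the query into two parameters of the outer URL.
params = parse_qs(query)
print(params)
# {'u': ['http://foo.com/somepage.aspx?color=blue'], 'size': ['15']}
```

So the unencoded form is syntactically valid, yet size=15 ends up belonging to the outer URL rather than the inner one, which is the practical reason to percent-encode the nested value.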
No, but you can encode the url and decode it later.