So I know you can separate the parameters in a query string using a couple of different characters
(e.g. www.example.com?foo=1&bar=2 or www.example.com?foo=1;bar=2)
Are there any characters other than ';' and '&' that can be used to separate query parameters? Is it just general coding practice to use ';' or '&', or are there regulations that list which characters I can use? I know that in RFC 2396 the reserved characters include
";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
So does this mean that any of these characters can be used to separate query parameters?
The format of the query string contents isn't part of RFC 3986 (URIs) or RFCs 7230-7235 (HTTP); it's a side effect of how HTML forms work.
So if your code needs to work with HTML forms, you are restricted to what browsers do. Otherwise, in theory, you can use any format you want, as long as it's consistent with RFC 3986's definition of the "query" component.
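To illustrate that the separators are a parser convention rather than something RFC 3986 fixes, here is a minimal sketch of a hypothetical `parse_query` helper (the name and parameters are made up for this example) whose pair and key/value separators are configurable. It skips percent-decoding to stay short.

```python
from __future__ import annotations


def parse_query(query: str, pair_sep: str = "&", kv_sep: str = "=") -> list[tuple[str, str]]:
    """Split a raw query string into (name, value) pairs.

    Hypothetical helper: the separators are parameters because, outside
    HTML forms, the query format is an application convention.
    Percent-decoding is deliberately left out.
    """
    pairs = []
    for chunk in query.split(pair_sep):
        if not chunk:
            continue  # tolerate empty chunks such as "a=1&&b=2"
        name, _, value = chunk.partition(kv_sep)
        pairs.append((name, value))
    return pairs


print(parse_query("foo=1&bar=2"))                             # [('foo', '1'), ('bar', '2')]
print(parse_query("foo=1;bar=2", pair_sep=";"))               # [('foo', '1'), ('bar', '2')]
print(parse_query("foo:1,bar:2", pair_sep=",", kv_sep=":"))   # [('foo', '1'), ('bar', '2')]
```

The last call uses "," and ":", both of which RFC 3986 allows inside the query component; whether a given server understands that format is entirely up to its own parser.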
I have an Asset entity with a field called symbol. This field can basically contain any human-readable string, including special symbols.
I'd like to generate a URL with this symbol as a parameter, but without it being escaped.
For instance, I have an Asset with symbol $, but the URL is generated as assets/%24
I need to be able to generate it in the Twig template without escaping these characters.
I'm using Symfony 5.
$ is a reserved character, as specified in RFC 2396:
2.2. Reserved Characters
Many URI include components consisting of or delimited by, certain
special characters. These characters are called "reserved", since
their usage within the URI component is limited to their reserved
purpose. If the data for a URI component would conflict with the
reserved purpose, then the conflicting data must be escaped before
forming the URI.
reserved = ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" |
"$" | ","
If you don't mind not following this recommendation, you could URL-decode the generated URL with a custom Twig filter and use it like this:
{{ asset(...)|urldecode }}
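The escaping and decoding themselves are easy to see outside Symfony. This is a Python stand-in using `urllib.parse`, not Twig or PHP code: it shows why `$` becomes `%24` by default, what a decode step undoes, and that the alternative is to treat `$` as safe when encoding.

```python
from urllib.parse import quote, unquote

symbol = "$"

# "$" is not in quote()'s always-safe set (letters, digits, "_.-~"), so it is escaped.
print("assets/" + quote(symbol))            # assets/%24

# Decoding the generated path afterwards, which is what a urldecode-style filter does.
print(unquote("assets/%24"))                # assets/$

# Alternatively, declare "$" safe up front so it is never escaped.
print("assets/" + quote(symbol, safe="$"))  # assets/$
```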
I have an application using struts2-ognl. I have an array of strings that gets populated from GET parameters in the request, as in the following example:
http://localhost:8080/myapp/myAction.action?myArray[1]=value1&myArray[2]=value2 etc etc
Is there any way to instruct OGNL that for arrays I don't want to use "[" and "]" but another pair of characters (let's say, for example, "(" and ")"), so the URL could look like:
http://localhost:8080/myapp/myAction.action?myArray(1)=value1&myArray(2)=value2 etc etc
I'm asking this because "[" and "]" are reserved characters in URLs. I have tried to URL-encode them, but then OGNL ignores those parameters.
What characters are allowed in a URL query string?
Do query strings have to follow a particular format?
Per https://www.rfc-editor.org/rfc/rfc3986
In section 2.2 Reserved Characters, the following characters are listed:
reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";"
/ "="
The spec then says:
If data for a URI component would conflict with a reserved character’s
purpose as a delimiter, then the conflicting data must be
percent-encoded before the URI is formed.
Next, in section 2.3 Unreserved Characters, the following are listed:
unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
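Putting sections 2.2 and 2.3 together: a data value is always safe inside a query if every octet outside the unreserved set is percent-encoded. A minimal hand-written sketch (the `percent_encode` name is just for illustration; it is deliberately stricter than necessary, since it also encodes sub-delims that the query component would tolerate):

```python
# RFC 3986 section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Percent-encode every octet that is not an unreserved character.

    Text is encoded as UTF-8 first; each remaining byte becomes %XX
    with uppercase hex digits.
    """
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)


print(percent_encode("a b&c=d"))  # a%20b%26c%3Dd
print(percent_encode("~café"))    # ~caf%C3%A9
```

In Python 3.7+ this matches `urllib.parse.quote(text, safe="")`, which uses the same unreserved set.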
Wikipedia has your answer: http://en.wikipedia.org/wiki/Query_string
"URL Encoding: Some characters cannot be part of a URL (for example, the space) and some other characters have a special meaning in a URL: for example, the character # can be used to further specify a subsection (or fragment) of a document; the character = is used to separate a name from a value. A query string may need to be converted to satisfy these constraints. This can be done using a schema known as URL encoding.
In particular, encoding the query string uses the following rules:
Letters (A-Z and a-z), numbers (0-9) and the characters '.','-','~' and '_' are left as-is
SPACE is encoded as '+' or %20
All other characters are encoded as %FF hex representation with any non-ASCII characters first encoded as UTF-8 (or other specified encoding)
The octet corresponding to the tilde ("~") character is often encoded as "%7E" by older URI processing implementations; the "%7E" can be replaced by "~" without changing its interpretation.
The encoding of SPACE as '+' and the selection of "as-is" characters distinguishes this encoding from RFC 1738."
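Python's `urllib.parse` happens to implement both variants described above, so it makes a handy way to compare them: `quote_plus` (used by `urlencode`) turns SPACE into "+", while `quote` produces "%20". In both, letters, digits and ".-~_" stay as-is and non-ASCII text is UTF-8 encoded first.

```python
from urllib.parse import quote, quote_plus, urlencode

value = "tea & café ~"

# Form-style encoding: SPACE becomes "+", other octets become %XX (UTF-8 first).
print(quote_plus(value))          # tea+%26+caf%C3%A9+~

# Plain percent-encoding: SPACE becomes "%20" instead.
print(quote(value, safe=""))      # tea%20%26%20caf%C3%A9%20~

# urlencode() applies quote_plus to each name and value.
print(urlencode({"q": value}))    # q=tea+%26+caf%C3%A9+~
```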
Regarding the format, query strings are name-value pairs. The ? separates the query string from the rest of the URL. Each name-value pair is separated by an ampersand (&), while the name (key) and value are separated by an equals sign (=), e.g. http://domain.com?key=value&secondkey=secondvalue
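As a quick sketch of that structure, Python's standard library builds exactly this shape: `urlencode` joins name=value pairs with "&", and `urlunsplit` puts the "?" between the URL and the query string.

```python
from urllib.parse import urlencode, urlunsplit

params = {"key": "value", "secondkey": "secondvalue"}

# name=value pairs joined by "&"; names and values are escaped as needed.
query = urlencode(params)
print(query)  # key=value&secondkey=secondvalue

# (scheme, netloc, path, query, fragment); the "?" is added by urlunsplit.
url = urlunsplit(("http", "domain.com", "", query, ""))
print(url)    # http://domain.com?key=value&secondkey=secondvalue
```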
Under Structure in the Wikipedia reference I provided:
The question mark is used as a separator and is not part of the query string.
The query string is composed of a series of field-value pairs
Within each pair, the field name and value are separated by an equals sign, '='.
The series of pairs is separated by the ampersand, '&' (or semicolon, ';' for URLs embedded in HTML and not generated by a ...; see below).
W3C recommends that all web servers support semicolon separators in addition to ampersand separators[6] to allow application/x-www-form-urlencoded query strings in URLs within HTML documents without having to entity escape ampersands.
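The structure rules above can be checked with Python's parser. This sketch assumes a recent Python (3.9.2 or later, or a release with the security backport), where `parse_qsl` splits on "&" only by default and takes a `separator` argument for servers that want the ";" style instead.

```python
from urllib.parse import parse_qsl, urlsplit

parts = urlsplit("http://domain.com/page?key=value&secondkey=second%20value")

# The "?" is only a separator; it is not part of the query component.
print(parts.query)  # key=value&secondkey=second%20value

# Pairs are split on "&", then on "=", and each part is percent-decoded.
print(parse_qsl(parts.query))  # [('key', 'value'), ('secondkey', 'second value')]

# Parsing ";"-separated pairs instead requires opting in explicitly.
print(parse_qsl("key=value;secondkey=secondvalue", separator=";"))
```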
This link has the answer and formatted values you all need.
https://perishablepress.com/url-character-codes/
For your convenience, this is the list:
< %3C
> %3E
# %23
% %25
{ %7B
} %7D
| %7C
\ %5C
^ %5E
~ %7E
[ %5B
] %5D
` %60
; %3B
/ %2F
? %3F
: %3A
@ %40
= %3D
& %26
$ %24
+ %2B
" %22
space %20
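Every entry in the list follows one rule: "%" followed by the character's ASCII value as two uppercase hex digits. A small sketch that regenerates the table (the `percent_code` name is hypothetical):

```python
def percent_code(ch: str) -> str:
    """Return the %XX code for a single ASCII character (uppercase hex)."""
    if len(ch) != 1 or ord(ch) > 0x7F:
        raise ValueError("expected a single ASCII character")
    return f"%{ord(ch):02X}"


# The same characters as the list above, in the same order.
for ch in '<>#%{}|\\^~[]`;/?:@=&$+" ':
    print(repr(ch), percent_code(ch))
```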
I have a problem with a Regular Expression Validator. It accepts 0-9 and A-Z and rejects ' and ", but it does not accept lowercase letters. Here is my expression: ^[a-z|A-Z|0-9|]+[^\"\']*$
You should use:
^[a-zA-Z0-9]+$
The | ("OR") should be used in groups, e.g. ([a-z]|[A-Z]|...). Also, by adding [^"']*, you allow users to enter phrases like a ##%$^&&&&*&^&*#$##$ (starting with an alphanumeric character, followed by any non-quote characters).
My suggested RegEx means:
^ <start>
[a-zA-Z0-9] Any alphanumeric character, upper- or lowercase
+ at least once
$ <end>
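To see the difference concretely, here is a sketch using Python's `re` module as a stand-in for the .NET validator; for plain character classes and ^/$ anchors like these, the two engines behave the same way.

```python
import re

original = re.compile(r"^[a-z|A-Z|0-9|]+[^\"\']*$")
suggested = re.compile(r"^[a-zA-Z0-9]+$")

samples = ["abc123", "ABC", "a ##%$^&&&&*&^&*#$##$", "it's"]

for s in samples:
    # original accepts the symbol soup because [^"']* matches anything but quotes
    print(repr(s), bool(original.match(s)), bool(suggested.match(s)))
```

Expected: both patterns accept "abc123" and "ABC", both reject "it's", but only the original accepts the third sample.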
If all you need is that the text contain exclusively letters and digits, you could use:
/^[a-zA-Z0-9]+$/
Notice that there is no | in the character set, all the things you put inside the [] are implicitly "or"ed together. Adding a | would allow for a literal | in the text.
Change the + to * if an empty string is valid. You don't need to exclude quotes specifically, since you're not allowing them in at all.
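Both points are quick to confirm, again using Python's `re` as an illustrative stand-in: a "|" inside [...] is just another allowed character, and "+" versus "*" decides whether the empty string passes.

```python
import re

with_pipe = re.compile(r"^[a-z|A-Z|0-9]+$")
plus = re.compile(r"^[a-zA-Z0-9]+$")  # one or more characters
star = re.compile(r"^[a-zA-Z0-9]*$")  # zero or more, so "" is accepted

print(bool(with_pipe.match("ab|c")))  # True: "|" is a literal inside [...]
print(bool(plus.match("ab|c")))       # False
print(bool(plus.match("")))           # False
print(bool(star.match("")))           # True
```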
A shorter version of a similar regex:
^[\w]+$
Note: it will also match non-Latin characters and "_", which may be either good or bad depending on your requirements.
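A sketch of that caveat in Python, where \w is Unicode-aware by default and `re.ASCII` narrows it to [a-zA-Z0-9_]; the exact Unicode set differs slightly between regex engines, so check the one your validator uses.

```python
import re

word = re.compile(r"^\w+$")                    # Unicode-aware by default
ascii_word = re.compile(r"^\w+$", re.ASCII)    # [a-zA-Z0-9_] only
alnum = re.compile(r"^[a-zA-Z0-9]+$")

for s in ["abc123", "snake_case", "café", "日本"]:
    print(s, bool(word.match(s)), bool(ascii_word.match(s)), bool(alnum.match(s)))
```

Only "abc123" passes all three; "snake_case" fails the strict alphanumeric pattern because of "_", and the non-Latin samples pass only the Unicode \w version.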
I have made a custom XHTML validator in .NET (validating against the DTD plus some extra rules) and I have noticed a discrepancy between my validation and W3C validation.
In my validator I get the following error when there is a colon in the id (for example, id="mustang:horse"):
(Error) The 'id' attribute has an invalid value according to its data type.
But I do not get any errors from the W3C validator for this pattern.
I tried to find a list of invalid characters for an attribute in XML/XHTML but couldn't find one.
Thank you for your help,
There is a list, and it does permit colons.
The XHTML 1.0 spec says at http://www.w3.org/TR/xhtml1/#h-4.10
... in XHTML 1.0 the id attribute is defined to be of type ID ...
The XML 1.0 spec says at http://www.w3.org/TR/2008/REC-xml-20081126/#id
Values of type ID MUST match the Name production.
And the Name production is defined at http://www.w3.org/TR/2008/REC-xml-20081126/#NT-Name
[4] NameStartChar ::= ":" |
[A-Z] | "_" | [a-z] | [#xC0-#xD6] |
[#xD8-#xF6] | [#xF8-#x2FF] |
[#x370-#x37D] | [#x37F-#x1FFF] |
[#x200C-#x200D] | [#x2070-#x218F] |
[#x2C00-#x2FEF] | [#x3001-#xD7FF] |
[#xF900-#xFDCF] | [#xFDF0-#xFFFD] |
[#x10000-#xEFFFF]
[4a] NameChar ::= NameStartChar | "-" | "." |
[0-9] | #xB7 | [#x0300-#x036F] |
[#x203F-#x2040]
[5] Name ::= NameStartChar (NameChar)*
It also says, just above this formal definition:
Document authors are encouraged to use
names which are meaningful words or
combinations of words in natural
languages, and to avoid symbolic or
white space characters in names. Note
that COLON, HYPHEN-MINUS, FULL STOP
(period), LOW LINE (underscore), and
MIDDLE DOT are explicitly permitted.
(My emphasis)
The reason for this difference is that the W3C validator doesn't seem to do namespace-aware XHTML processing. Although XHTML documents need to be in the XHTML namespace, this is actually reasonable: HTML documents don't use namespaces, the normative valid structure of XHTML documents (as HTML) is defined by a DTD, and DTDs are not namespace-aware.
As @Alochi already noted:
Values of type ID MUST match the Name
production.
This is true when the document is parsed without namespace awareness, but it is not true if the document needs to be namespace-conformant. The Namespaces in XML specification states that IDs must match the NCName production, which explicitly forbids the colon character. Namespace-aware parsing is a common convention, so using a colon in the value of an id is not recommended, even though it is allowed when parsing is not namespace-aware.
Summary: if namespaces are ignored, an ID value must be a valid Name and it can contain a colon; otherwise it must be a valid NCName and it can't contain a colon.
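The two rules can be written as regexes built from the character ranges in productions [4] and [4a] quoted above, with NCName being the same grammar minus ":". This is a sketch for checking values in a custom validator, not the code of any particular validator; the `is_name`/`is_ncname` names are made up, and it only tests the lexical form (ID uniqueness is a separate rule).

```python
import re

# NameStartChar from production [4], without ":".
_START_NO_COLON = (
    "A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# Additional NameChar characters from production [4a] ("\\-" is a literal hyphen).
_EXTRA = "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

NAME_RE = re.compile(f"[:{_START_NO_COLON}][:{_START_NO_COLON}{_EXTRA}]*")
NCNAME_RE = re.compile(f"[{_START_NO_COLON}][{_START_NO_COLON}{_EXTRA}]*")


def is_name(value: str) -> bool:
    """True if value matches the XML 1.0 Name production (colons allowed)."""
    return NAME_RE.fullmatch(value) is not None


def is_ncname(value: str) -> bool:
    """True if value matches NCName, i.e. a Name containing no colon."""
    return NCNAME_RE.fullmatch(value) is not None


for candidate in ["mustang:horse", "mustang-horse", "1horse", ":horse"]:
    print(candidate, is_name(candidate), is_ncname(candidate))
```

"mustang:horse" is a valid Name but not an NCName, which is exactly the discrepancy between the W3C validator and a namespace-aware .NET validator.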