Regular expression to replace &amp; with & in XML file - asp.net

In an XML file, I am capturing a long list of URLs from a web page, using regex (in .NET). Within the captured URLs, I simply need to substitute '&' for all '&amp;' that are located within the URLs. How do I do this?

If you do this and save the results, you'll be left with invalid XML. If you are using a real XML parser, the &amp; will be correctly returned as & at the time you read it.
If you insist on proceeding, a simple String.Replace("&amp;", "&") on each URL should suffice.

Rather than a regex, I'd suggest using the String.Replace("&amp;", "&") string function.

Why don't you do just:
string.Join("\n", System.IO.File.ReadAllLines("file.txt")).Replace("&amp;", "&");
?
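For comparison, here is a minimal Python sketch of both points made in the answers above: a real XML parser already returns `&` where the file contains `&amp;`, and a plain string replace covers URLs captured as raw text. The sample document, element names, and URLs are made up for illustration.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical sample; element and attribute names are illustrative only.
SAMPLE = '<links><a href="http://example.com/?a=1&amp;b=2"/></links>'

# A real XML parser decodes &amp; to & when the value is read.
root = ET.fromstring(SAMPLE)
parsed_url = root.find("a").get("href")

# For URLs captured as raw text (e.g. by a regex), a plain replace is enough.
raw_url = "http://example.com/?a=1&amp;b=2"
replaced_url = raw_url.replace("&amp;", "&")

# html.unescape also decodes other entities such as &lt; and &#39;.
unescaped_url = html.unescape(raw_url)
```

All three variables end up holding the same URL with a bare `&`. Writing that value back into an XML file unescaped would make the file invalid, as the first answer warns.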

Related

Is it better to use a "?" or a ";" in a URL?

In my application, I redirect an HTTP request and also pass a parameter. Example:
http://localhost:9000/home;signup=error
Is it better to use a ; or shall I use a ? i.e. shall I do http://localhost:9000/home;signup=error or http://localhost:9000/home?signup=error?
Are the above two different from each other semantically?
The ? is a reserved character; I have read that this is both valid and invalid, but I have used it for 'slugs' when templating.
Should you choose to use it, percent-encode the query string using %3F which is not human readable, but will produce the ?. (An encoder is recommended)
Perhaps you will find a more suitable solution for your redirects by adding an .htaccess file to your project.
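On whether the two forms differ: a generic URL parser does treat them differently. The Python sketch below uses the standard library's `urlsplit` on the two URLs from the question. How a particular server framework interprets a `;` inside the path is a separate matter and is not shown here.

```python
from urllib.parse import urlsplit

# The two candidate redirect URLs from the question.
with_semicolon = urlsplit("http://localhost:9000/home;signup=error")
with_question = urlsplit("http://localhost:9000/home?signup=error")

# With ';' the parameter stays inside the path component.
semicolon_path = with_semicolon.path
semicolon_query = with_semicolon.query

# With '?' the parameter lands in the query component.
question_path = with_question.path
question_query = with_question.query
```

With `;` the path is `/home;signup=error` and the query is empty. With `?` the path is `/home` and the query is `signup=error`.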

How to build a dynamic URL with ampersands

I have to build a dynamic url in XQuery. I have a hardcoded url. Part of the url should be built from a variable. How do I build that url? I am not able to use concat in XQuery because the first half of the url has special characters (ampersands).
The value ceiatlpaqer055.coxinc.com in the URL below is dynamic and should be populated from a different variable. How do I build this URL in XQuery?
The URL is given below:
http://axiomweb604.testinc.com:8080/arsys/forms/axiom_7_6_4/SHR%3ALandingConsole/Default+Administrator+View/?mode=search&F304255500=AST:ComputerSystem&F1000000076=FormOpenNoAppList&F303647600=SearchTicketWithQual&F304255610=%27Name%27%3D%22ceiatlpaqer055.coxinc.com%22
& is a special character, so you have to escape it. Using the XML entity syntax, it can be replaced by &amp;. So you can in fact concatenate this string, e.g. by doing
"http://axiomweb604.testinc.com:8080/arsys/forms/axiom_7_6_4/SHR%3ALandingConsole/Default+Administrator+View/?mode=search&amp;F304255500=AST:ComputerSystem&amp;F1000000076=FormOpenNoAppList&amp;F303647600=SearchTicketWithQual&amp;F304255610=%27Name%27%3D%22" || "ceiatlpaqer055.coxinc.com" || "%22"
or
concat("http://axiomweb604.testinc.com:8080/arsys/forms/axiom_7_6_4/SHR%3ALandingConsole/Default+Administrator+View/?mode=search&amp;F304255500=AST:ComputerSystem&amp;F1000000076=FormOpenNoAppList&amp;F303647600=SearchTicketWithQual&amp;F304255610=%27Name%27%3D%22", "ceiatlpaqer055.coxinc.com", "%22")
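The same idea can be sketched outside XQuery. In this Python illustration the fixed part of the URL is kept as a constant, the dynamic host is percent-encoded into the qualification, and the result is XML-escaped only at the point where it is embedded in XML. The names `BASE` and `build_url` are hypothetical helpers, not part of the original question.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

# Fixed part of the URL from the question, up to the dynamic qualification.
BASE = (
    "http://axiomweb604.testinc.com:8080/arsys/forms/axiom_7_6_4/"
    "SHR%3ALandingConsole/Default+Administrator+View/"
    "?mode=search&F304255500=AST:ComputerSystem"
    "&F1000000076=FormOpenNoAppList&F303647600=SearchTicketWithQual"
    "&F304255610="
)


def build_url(host: str) -> str:
    """Append the percent-encoded qualification 'Name'="<host>" to BASE."""
    qualification = f"'Name'=\"{host}\""
    return BASE + quote(qualification, safe="")


url = build_url("ceiatlpaqer055.coxinc.com")

# Escape & as &amp; only when the URL is placed inside XML or XQuery source.
xml_safe_url = escape(url)
```

Keeping the plain URL and the XML-escaped form as separate values avoids escaping twice by accident.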

How to Show & in xml constructed from string

I am creating a web API using ASP.NET MVC 4 and the response output is XML. Before outputting to the browser, I modify the XML response so that one of the values between the start and closing tags contains a URL string which may have '&'.
When outputting in the browser, this generates an error that the XML is not well formed.
I have read in "How to show & in a XML attribute That would be produced by XSLT" that one can use D-O-E (disable-output-escaping) to generate unescaped content using XSLT,
but I don't know how this could apply to XML generated from a string and displayed in the browser.
You should encode the & as
&amp;
which is understood by XML (see http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references#Predefined%5Fentities%5Fin%5FXML)
Another alternative would be to surround the output in a CDATA tag (http://stackoverflow.com/questions/2784183/what-does-cdata-in-xml-mean)
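Both suggestions can be checked with a short Python sketch. It wraps a made-up URL in a hypothetical `<link>` element, once with the ampersand escaped and once inside a CDATA section, then parses each back to confirm the document is well formed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical URL and element name, for illustration only.
url = "http://example.com/page?a=1&b=2"

# Option 1: escape & as &amp; before inserting the text.
escaped_xml = f"<link>{escape(url)}</link>"

# Option 2: wrap the raw text in a CDATA section.
cdata_xml = f"<link><![CDATA[{url}]]></link>"

# Both parse without error and yield the original URL.
escaped_text = ET.fromstring(escaped_xml).text
cdata_text = ET.fromstring(cdata_xml).text
```

The CDATA variant breaks if the text itself contains `]]>`, so escaping is usually the safer default.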

Parameter separator in URLs, the case of misused question mark

What I don't really understand is the benefit of using '?' instead of '&' in urls:
It makes nobody's life easier if we use a different character as the first separator character.
Can you come up with a reasonable explanation?
EDIT: after more research I found that "&" can be a part of file name (terms&conditions.html) so "?" is a good separator. But still I think using "?" for separators makes lives easier (from url generators and parsers point of view):
Is there any advantage in using "&" which is not clear at the first glance?
From the URI spec's (RFC 3986) point of view, the only separator here is "?". The format of the query is opaque; the ampersands are just something that HTML happens to use for form submissions.
The answer's pretty much in this article - http://www.skorks.com/2010/05/what-every-developer-should-know-about-urls/ . To highlight it, here goes :
Query is the preferred way to send some parameters to a resource on
the server. These are key=value pairs and are separated from the rest
of the URL by a ? (question mark) character and are normally separated
from each other by & (ampersand) characters. What you may not know is
the fact that it is legal to separate them from each other by the ;
(semi-colon) character as well. The following URLs are equivalent:
http://www.blah.com/some/crazy/path.html?param1=foo&param2=bar
http://www.blah.com/some/crazy/path.html?param1=foo;param2
RFC 3986 (https://www.ietf.org/rfc/rfc3986.txt) defines general and sub-delimiters ... '?' is a general delimiter, '&' and ';' are sub-delimiters. The spec is pretty clear about that.
In this case the later '?' characters would be treated as part of the query. If the query parser follows the spec strictly, it would then pass the whole query on to the destination app. If the destination app chooses to further process the query string in a manner that treats '?' as a delimiter between name-value pairs, that is up to the app's designers.
My guess is that this often 'just works' because code that splits query strings and the original uri uses all delimiters for matching: 1) first query is split on '?' then 2) query string is parsed using char match list that includes '?' (convenience only).... This could be occurring in ubiquitous parsing libraries already.
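The point about later '?' characters can be seen with Python's standard URL parser, used here only as one example of a spec-following parser. The URL is a made-up variant of the one quoted above, with a second '?' inside a parameter value.

```python
from urllib.parse import parse_qs, urlsplit

# Only the first '?' separates the path from the query.
parts = urlsplit("http://www.blah.com/some/crazy/path.html?param1=foo&param2=a?b")

path = parts.path
query = parts.query

# The query is then split into pairs on '&'; the later '?' is ordinary data.
params = parse_qs(query)
```

Here `query` is `param1=foo&param2=a?b`, and `params` maps `param2` to `a?b`. A parser that also split on the second '?' would be applying its own convention rather than the URI spec.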

Encoding apostrophe

I am building up a string on the server that is put into a JavaScript variable on the client.
What is the best way of encoding this to avoid any issues?
Right now on the server I am doing something like this:
html = html.Replace("'", "&#39;");
But I assume there is a more elegant, foolproof way of doing this.
You're really better off using the Microsoft Anti-Cross Site Scripting Library to do this. They provide a JavaScriptEncode method that does what you want:
Microsoft.Security.Application.AntiXss.JavaScriptEncode("My 'Quotes' and ""more"".", False)
html = html.Replace("'", "%27");
I'm not sure in which context you're using this string, but \' might be what you're looking for. The backslash is an escape character and allows you to use certain characters that can't otherwise be present in a string literal. This is what the output JavaScript should look like:
alert('It\'s amazing');
Of course, you could use alert("It's amazing"); in this particular case.
Anyway, if you're building JavaScript code:
html = html.Replace("'", "\\'");
On the other hand, there are other characters besides apostrophes that need some processing. Using the Microsoft Anti-Cross Site Scripting Library would get all of them at once.
I found that the AntiXSS library was not able to accomplish what I was looking for, which was to encode server side and decode in javascript.
Instead I used Microsoft.JScript.dll which allows you to:
GlobalObject.escape(string);
and on the client side in javascript:
unescape(string);
The characters that you need to escape in a string value are the backslash and the character used as string delimiter.
If apostrophes (') are used as string delimiter:
html = html.Replace(@"\", @"\\").Replace("'", @"\'");
If quotation marks (") are used as string delimiter:
html = html.Replace(@"\", @"\\").Replace(@"""", @"\""");
If you don't know which delimiter is used, or if it may change in the future, you can just escape both:
html = html.Replace(@"\", @"\\").Replace("'", @"\'").Replace(@"""", @"\""");
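The order of the replacements matters: backslashes have to be escaped first, otherwise the backslashes added for the delimiter would themselves be doubled. Here is a minimal Python sketch of the same rule; `escape_js_string` is a hypothetical helper name, and `json.dumps` is shown as an alternative that emits a complete double-quoted literal.

```python
import json


def escape_js_string(value: str, delimiter: str = "'") -> str:
    """Escape backslashes first, then the string delimiter."""
    escaped = value.replace("\\", "\\\\")
    return escaped.replace(delimiter, "\\" + delimiter)


# Apostrophe-delimited and quote-delimited cases.
single = escape_js_string("It's amazing")
double = escape_js_string('say "hi"', delimiter='"')

# A value that already contains a backslash.
mixed = escape_js_string("a\\b'c")

# json.dumps produces a full double-quoted literal and also handles
# control characters such as newlines.
original = "It's a \"test\"\n"
literal = json.dumps(original)
```

Neither approach handles every context. For example, a string embedded in an inline `<script>` block may need further treatment, which is one reason the answers above recommend a dedicated encoding library.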
