Handling HTTP query string inside PWP HTML page in Prolog

I'm currently working on a project that demands a few web applications written in Prolog, and I chose to use the famous SWI-Prolog PWP library, which parses a script with Prolog queries inside an HTML file.
I have a page responding to the following request example:
/user?id=N
Where N is an integer value.
But I'm having trouble reading the id query string parameter of the request inside the HTML file.
I have the .pl file:
showUser(UserId, Request) :-
reply_pwp_file(mydir('user_page.html'), [mime_type('text/html')], Request).
I don't know how I can read the UserId inside the page, or how to use the Request to retrieve the UserId from the query string again.
I tried this way in the HTML markup:
<span pwp:ask="http_parameters(Request, [id(UserId, [optional(true)])])." pwp:use="UserId" />
Has anyone had this kind of trouble before?
Thank you very much.
Here are some interesting links that may help:
PWP/SGML Pages
SWI-Prolog HTTP Library

It took me some time, but at least I've been able to run the demo_pwp.pl that I found in ~/pl-devel/packages/http/examples. Now, after
?- server(1234).
I open the URL
http://localhost:1234/user_id.pwp?user_id=1&user_name=carlo
where I wrote in ~/pl-devel/packages/http/examples/pwp/user_id.pwp file
<?xml version="1.0"?>
<!DOCTYPE html>
<html xmlns:pwp="http://www.cs.otago.ac.nz/staffpriv/ok/pwp.pl">
<head>
<title>Context variables for PWP scripts</title>
</head>
<body>
<p>This PWP demo lists the context-parameters that are passed into
the script.
</p>
<ul>
<li pwp:ask="member(Name=Value, CONTEXT)">
<span class="name" pwp:use="Name"/>
=
<span class="value" pwp:use="writeq(Value)"/>
</li>
</ul>
<!-- here is the specific part for my answer -->
<p pwp:ask="memberchk('QUERY'=Q, CONTEXT),memberchk(user_id=UID,Q),memberchk(user_name=NAME,Q)">
UID : <span pwp:use="UID"/> / NAME : <span pwp:use="NAME"/>
</p>
<!-- nested access is well thought -->
<p pwp:ask="member('QUERY'=Q,CONTEXT)">
UID : <span pwp:use="UID" pwp:ask="member(user_id=UID,Q)"/>
/ NAME : <span pwp:use="NAME" pwp:ask="member(user_name=NAME,Q)"/>
</p>
</body>
</html>
(that's a copy of context.pwp, with my additions at the bottom)
and I get
This PWP demo lists the context-parameters that are passed into the script.
...
- QUERY = [user_id='1',user_name=carlo]
...
UID : 1 / NAME : carlo
UID : 1 / NAME : carlo
Then I can confirm that the guidelines that Giulio suggested are ok.

This is really out of the blue, since I've not churned out Prolog in a long time, but I'm slightly amused at the effort of writing web applications in Prolog, and because I sympathize (long story short: I tried myself years ago, but it wasn't pure Prolog) I figured I could take my chance at pointing out what I noticed by reading the documentation. Its clarity and extensiveness, by the way, are not the reason why PWP is "famous", I presume.
However, buried somewhere in the PWP page you linked there is a blurb about the attribute pwp:use, that's said to take a Term as its value.
Term is a Prolog term; variables in Term are bound by the context. An empty Term is regarded as a missing value for this attribute. The Prolog variable CONTEXT refers to the entire context, a list of Name = Value, where Name is a Prolog atom holding the name of the context variable and Value is an arbitrary Prolog term.
Buried somewhere else, namely the documentation page for reply_pwp_page/3 (oh, there's no reply_pwp_file/3 up there in the page you linked, really, even if you used it) there's another interesting snippet listing the contents of the so-called initial context, and in particular:
QUERY [is a] Var=Value list representing the query-parameters
Since there is no hint or suggestion or even example about the use of the query parameters list - but that's hardly the worst problem for one that's forced to write web applications in Prolog anyway - my personal take is that the name for query parameter id is just id (hoping that Var is just a misname for Param, not a real Prolog variable) and that the value is, well, just the value, but then again we know nothing about conversions or whatever may happen automatically during the parsing of the query string, since in the query string everything is, well, a string, but you may need a numeric id, and you are probably left on your own converting that string to a number. I guess there's some magical predicate doing exactly that, somewhere. Ain't Prolog wonderful?
So, without any other clue, and with lots of thanks for those writing the documentation of this... stuff, my wild guess is that you need somewhere the following element, an empty span nonetheless, which is illegal in any reasonably valid HTML document:
<span pwp:ask="..."/>
where, as the ask value, you should provide a query that traverses the CONTEXT list (by means of member/2, maybe?) until it finds a term of the form 'QUERY'=QueryParameters; then in QueryParameters you should have the actual query parameters list, so you need to traverse it in the same fashion as the CONTEXT list before, and when you find a term of the form id=N, there you finally are: N should contain the value of your hard-earned user id.
Now, I really hope it's way simpler than what I have outlined. Remember, it's just a wild guess by looking at the documentation you pointed to. But, while others will be quite probably busy down-voting this answer for a number of reasons (and hopefully because it's plain wrong, and the solution is way simpler), my last, parting suggestion is for you to discuss the constraints of your project again with whoever is in charge of them, because writing web applications in Prolog is really an unreasonable thing to do when there are plenty of frameworks (frameworks, I say, not just some module thrown into the standard library for the "greater good") written in other languages that are incredibly well documented, much simpler to understand and, of course, to use.
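On the point about conversions: the demo output in the other answer shows user_id='1', which suggests the value arrives as text and not as a number. The same thing can be seen outside Prolog. Here is a minimal Python sketch (not PWP code; the helper name int_param is made up for illustration) that parses the demo URL from this thread and converts the id explicitly:

```python
from urllib.parse import urlsplit, parse_qs

# URL taken from the demo in this thread.
url = "http://localhost:1234/user_id.pwp?user_id=1&user_name=carlo"

# parse_qs maps each parameter name to a list of string values.
query = parse_qs(urlsplit(url).query)

def int_param(params, name, default=None):
    """Hypothetical helper: return the first value of `name` as an int, or `default`."""
    values = params.get(name)
    if not values:
        return default
    try:
        return int(values[0])
    except ValueError:
        return default

user_id = int_param(query, "user_id")
print(query)    # every value is still a string at this point
print(user_id)  # converted explicitly
```

On the Prolog side the equivalent step would be an explicit atom-to-number conversion after looking the value up in QUERY.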

Related

Do MIME types REQUIRE the subtype to be specified

The question is clear enough, but let me flesh it out with the actual example I am encountering:
In interpreting an RSS feed, there is an image specified sometimes where it is not known what kind of image it is, but it is nonetheless clear that the link type is an image, such as in this example:
<itunes:image href="http://static1.doda.com/57914/1500w/" />
That iTunes spec does not enter the image type attribute for the image. So then let's say I am taking this image link (with others) and re-syndicating it, but now as a standard ATOM link. To specify it is an image, the type attribute of the link needs to start with image (as an image MIME type), and yet what should I do about the subtype:
<link href="http://static1.doda.com/57914/1500w/" type="image" /> //??
I'm guessing MIME types can never do this, but is that the case? Can the subtype ever be left off? Another solution is not so happy either: to enter a FALSE but common subtype (e.g.: "image/jpeg").
--- update ---
Thanks to Julien for that research. So one solution just occurred to me: using a generic subtype of a given value such as: unknown, or none.
image/unknown
image/none
Maybe just go with 'unknown', but 'none' has its benefits as well. This might be especially useful if a lot of people started using one (or both) of these values when they don't know the subtype.
I would love to hear in the comments how that idea strikes you guys, yea or nay? Good idea or bad?
Yes, in theory, RFC 4287 says:
Whatever a media type is, it contains at least one slash
atomMediaType = xsd:string { pattern = ".+/.+" }
Now, from experience, there are many feeds out there which do not include the sub-type. Be robust: if you publish a feed, make sure you include both... and if you consume feeds, be ready to handle feeds where it's missing!
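To make the "be robust" advice concrete, here is a small Python sketch. It checks a value against the atomMediaType pattern quoted above and also splits a type leniently, so a consumer can still tell that a bare "image" is an image. The function names are made up for illustration.

```python
import re

# Pattern from the quoted atomMediaType rule; XSD patterns match the whole string.
ATOM_MEDIA_TYPE = re.compile(r".+/.+")

def is_atom_media_type(value):
    """True when the value has the type/subtype shape the pattern requires."""
    return ATOM_MEDIA_TYPE.fullmatch(value) is not None

def split_media_type(value):
    """Lenient split: returns (type, subtype), with subtype None when it is missing."""
    main, _, sub = value.strip().partition("/")
    return main.lower(), (sub.lower() or None)

print(is_atom_media_type("image/jpeg"))  # publishing side: should pass
print(is_atom_media_type("image"))       # publishing side: should fail
print(split_media_type("image"))         # consuming side: still usable
```

A publisher would run the strict check before emitting a link; a consumer would use the lenient split and treat a missing subtype as "some kind of image".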

How secure is the simple use of addslashes() and stripslashes() on code contents?

Making an ad manager plugin for WordPress, so the advertisement code can be almost anything, from good code to dirty, even evil.
I'm using simple sanitization like:
$get_content = '<script>/*code to destroy the site*/</script>';
//insert into db
$sanitized_code = addslashes( $get_content );
When viewing:
$fetched_data = /*slashed code*/;
//show as it's inserted
echo stripslashes( $fetched_data );
I'm avoiding base64_encode() and base64_decode() as I learned their performance is a bit slow.
Is that enough?
If not, what else should I do to protect the site and/or DB from attacks using bad ad code?
I'd love to get your explanation of why you are suggesting something - it'll help me decide the right thing in future too. Any help would be greatly appreciated.
addslashes then stripslashes is a round trip. You are echoing the original string exactly as it was submitted to you, so you are not protected at all from anything. '<script>/*code to destroy the site*/</script>' will be output exactly as-is to your web page, allowing your advertisers to do whatever they like in your web page's security context.
Normally when including submitted content in a web page, you should be using htmlspecialchars so that everything comes out as plain text and < just means a less-than sign.
If you want an advertiser to be able to include markup, but not dangerous constructs like <script> then you need to parse the HTML, only allowing tags and attributes you know to be safe. This is complicated and difficult. Use an existing library such as HTMLPurifier to do it.
If you want an advertiser to be able to include markup with scripts, then you should put them in an iframe served from a different domain name, so they can't touch what's in your own page. Ads are usually done this way.
I don't know what you're hoping to do with addslashes. It is not the correct form of escaping for any particular injection context and it doesn't even remove difficult characters. There is almost never any reason to use it.
If you are using it on string content to build a SQL query containing that content then STOP, this isn't the proper way to do that and you will also be mangling your strings. Use parameterised queries to put data in the database. (And if you really can't, the correct string literal escape function would be mysql_real_escape_string or other similarly-named functions for different databases.)
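The two separate concerns above (getting data into the database, and getting it onto a page) can be sketched as follows. This is a Python analogue for illustration only, not WordPress code: the sqlite3 placeholder plays the role of a parameterised query, and html.escape plays the role of htmlspecialchars. The table and column names are made up.

```python
import html
import sqlite3

ad_code = "<script>/*code to destroy the site*/</script>"

# Storage: a parameterised query. The driver handles quoting, so the string
# is stored unchanged and no manual slash-escaping is needed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ads (id INTEGER PRIMARY KEY, code TEXT)")  # hypothetical schema
conn.execute("INSERT INTO ads (code) VALUES (?)", (ad_code,))
stored = conn.execute("SELECT code FROM ads").fetchone()[0]

# Output: escape for the HTML context at the moment of rendering, so the
# markup is shown as text instead of being executed.
escaped = html.escape(stored)
print(escaped)
```

The stored value is byte-for-byte what was submitted; the escaping happens only on output, and only in the form the output context needs.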

What's the correct format for TCDL linkAttributes?

I can see the technology-independent Tridion Content Delivery Language (TCDL) link has the following parameters, which are pretty well described on SDL Live Content.
type
origin
destination
templateURI
linkAttributes
textOnFail
addAnchor
VariantId
How do we add multiple attribute-value pairs for the linkAttributes? Specifically, what do we use to escape the double quotes as well as separate pairs (e.g. if we need class="someclass" and onclick="someevent").
The separate pairs are just space delimited, like a normal series of attributes. Try XML encoding the value of linkAttributes however. So, " becomes &quot;, etc.
If you are using some Javascript, you might take care of the Javascript quotes too, as in \".
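As an illustration of the XML-encoding suggestion, here is a Python sketch that builds a space-delimited attribute string and encodes it for use inside a double-quoted XML attribute value. The helper name is made up, and whether Tridion accepts exactly this form is an assumption to verify against your own published output.

```python
from xml.sax.saxutils import escape

def encode_link_attributes(pairs):
    """Hypothetical helper: join name="value" pairs with spaces, then XML-encode."""
    raw = " ".join(f'{name}="{value}"' for name, value in pairs)
    # escape() handles &, < and >; the extra mapping also encodes double quotes.
    return escape(raw, {'"': "&quot;"})

encoded = encode_link_attributes([("class", "someclass"), ("onclick", "someevent")])
print(encoded)
```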
Edit: after I figured out your real question, the answer is a lot simpler:
You should wrap the values inside your linkAttributes in single quotes. Spaces inside linkAttributes are typically handled fine; but if not, escape them with %20.
If you need something more or want something that isn't handled by the standard tcdl:ComponentLink, remember that you can always create your own TCDL tag and use a TagHandler or TagRenderer (look them up in the docs for examples or search for Jaime's article on TagRenderer) to do precisely what you want.
My original answer was to a question you didn't ask: what is the format for TCDL tags (in general). But the explanation might still be useful to some, so remains below.
I'd suggest having a look at what format the default building blocks (e.g. the Link Resolver TBB in the Default Finish Actions) output and using that as a guideline.
This is what I could quickly get from the transport package of a published page:
<tcdl:Link type="Page" origin="tcm:5-199-64" destination="tcm:5-206-64"
templateURI="tcm:0-0-0" linkAttributes="" textOnFail="true"
addAnchor="" variantId="">Home</tcdl:Link>
<tcdl:ComponentPresentation type="Embedded" componentURI="tcm:5-69"
templateURI="tcm:5-133-32">
<span>
...
One of the things that I know from experience: your entire TCDL tag will have to be on a single line (I wrapped the lines above for readability only). Or at least that is the case if it is used to invoke a REL TagRenderer. Clearly the tcdl:ComponentPresentation tag above will span multiple lines, so that "single line rule" doesn't apply everywhere.
And that is probably the best advice: given the fact that TCDL tags are processed at multiple points in Tridion Publishing, Deployment and Delivery pipeline, I'd stick to the format that the default TBBs output. And from my sample that seems to be: put everything on a single line and wrap the values in (double) quotes.

Are there any anti-XSS libraries for ASP.Net?

I was reading some questions trying to find a good solution to preventing XSS in user provided URLs(which get turned into a link). I've found one for PHP but I can't seem to find anything for .Net.
To be clear, all I want is a library which will make user-provided text safe(including unicode gotchas?) and make user-provided URLs safe(used in a or img tags)
I noticed that StackOverflow has very good XSS protection, but sadly that part of their Markdown implementation seems to be missing from MarkdownSharp. (and I use MarkdownSharp for a lot of my content)
Microsoft has the Anti-Cross Site Scripting Library; you could start by taking a look at it and determining if it fits your needs. They also have some guidance on how to avoid XSS attacks that you could follow if you determine the tool they offer is not really what you need.
There's a few things to consider here. Firstly, you've got ASP.NET Request Validation which will catch many of the common XSS patterns. Don't rely exclusively on this, but it's a nice little value add.
Next up you want to validate the input against a white-list and in this case, your white-list is all about conforming to the expected structure of a URL. Try using Uri.IsWellFormedUriString for compliance against RFC 2396 and RFC 2732:
var sourceUri = UriTextBox.Text;
if (!Uri.IsWellFormedUriString(sourceUri, UriKind.Absolute))
{
// Not a valid URI - bail out here
}
AntiXSS has Encoder.UrlEncode which is great for encoding string to be appended to a URL, i.e. in a query string. Problem is that you want to take the original string and not escape characters such as the forward slashes otherwise http://troyhunt.com ends up as http%3a%2f%2ftroyhunt.com and you've got a problem.
As the context you're encoding for is an HTML attribute (it's the "href" attribute you're setting), you want to use Encoder.HtmlAttributeEncode:
MyHyperlink.NavigateUrl = Encoder.HtmlAttributeEncode(sourceUri);
What this means is that a string like http://troyhunt.com/<script> will get escaped to http://troyhunt.com/&lt;script&gt; - but of course Request Validation would catch that one first anyway.
Also take a look at the OWASP Top 10 Unvalidated Redirects and Forwards.
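The validate-then-encode sequence described above is not specific to .NET. Here is a minimal Python sketch of the same idea, for illustration only: the function name and the scheme allow-list are assumptions, and it is not a substitute for a maintained library.

```python
import html
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}  # assumption: only web links are wanted

def safe_href(user_url):
    """Return the URL escaped for an HTML attribute, or None if it is rejected."""
    candidate = user_url.strip()
    try:
        parts = urlsplit(candidate)
    except ValueError:
        return None
    # Step 1: white-list validation. Require an absolute http(s) URL with a host,
    # which rejects things like javascript: URLs and relative paths.
    if parts.scheme.lower() not in ALLOWED_SCHEMES or not parts.netloc:
        return None
    # Step 2: encode for the output context (an HTML attribute), leaving
    # slashes and colons alone so the link still works.
    return html.escape(candidate, quote=True)

print(safe_href("http://troyhunt.com/<script>"))
print(safe_href("javascript:alert('xss')"))
```

Note the order: validation decides whether the value is acceptable at all, and encoding is applied only at the point of output.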
I think you can do it yourself by creating an array of the characters and another array with the codes.
If you find a character from the first array, replace it with the matching code. This will help you! [but definitely not 100%]
character array
<
>
...
Code Array
&lt;
&gt;
...
I rely on HtmlSanitizer. It is a .NET library for cleaning HTML fragments and documents from constructs that can lead to XSS attacks.
It uses AngleSharp to parse, manipulate, and render HTML and CSS.
Because HtmlSanitizer is based on a robust HTML parser it can also shield you from deliberate or accidental
"tag poisoning" where invalid HTML in one fragment can corrupt the whole document leading to broken layout or style.
Usage:
var sanitizer = new HtmlSanitizer();
var html = @"<script>alert('xss')</script><div onload=""alert('xss')"""
+ @"style=""background-color: test"">Test<img src=""test.gif"""
+ @"style=""background-image: url(javascript:alert('xss')); margin: 10px""></div>";
var sanitized = sanitizer.Sanitize(html, "http://www.example.com");
Assert.That(sanitized, Is.EqualTo(@"<div style=""background-color: test"">"
+ @"Test<img style=""margin: 10px"" src=""http://www.example.com/test.gif""></div>"));
There's an online demo, plus there's also a .NET Fiddle you can play with.
(copy/paste from their readme)

Create XML object from poorly formatted HTML

I want to make an XML document from an HTML one so I can use the XML parsing tools. My problem is that my HTML is not guaranteed to be XHTML nor valid. How can I bypass the exceptions? In this string <p> is not terminated, nor is <br> nor <meta>.
var poorHtml:String = "<html><meta content=\"stuff\" name=\"description\"><p>Hello<br></html>";
var html:XML = new XML(poorHtml);
TypeError: Error #1085: The element type "meta" must be terminated by the matching end-tag "</meta>".
I did some searching and couldn't come up with anything, except that this doesn't really seem possible. The major issue is how it should correct the document when the format is not valid.
In the case of browsers, every browser does this based upon its own rules of what should happen when the closing tag isn't found (put it in wherever it would produce valid XML and subsequently a DOM tree, self-terminate the tag, or remove the tag), how a closing tag with no opening tag should be handled, what to do about unclosed attributes, etc.
Unfortunately I don't know of anything in the specification that explains what should be done in this case. With XHTML, just as Flex treats it, these are fatal errors that result in no functionality, unlike HTML4, which tolerated them through its quirks and transitional DTD options.
To avoid the error or give better error messaging you can use this:
var poorHtml:String = "<html><meta content=\"stuff\" name=\"description\"><p>Hello<br></html>";
try
{
var html:XML = new XML(poorHtml);
}
catch(e:TypeError)
{
trace("error caught")
}
but it's likely you'll be best off using some sort of server side script to validate the XML or correct the XML before passing it over to the client.
There is probably an implementation of HTML Tidy in just about any language you might happen to be working with. This looks promising for your situation: http://code.google.com/p/as3htmltidylib/
If you don't want to drag in a whole library (I wouldn't), you could just write your own XML parser that handles errors in whatever way suits you (I'd suggest auto-closing tags until the document makes sense again, ignoring end tags with no start tags, maybe un-closing certain special tags such as "body" and "html"). This has the added advantage that you can optimize it for whatever jobs you need it for, i.e. by storing a list of all elements with the attribute "href" as you come to them.
You could try to pass your HTML through HTML Tidy on the server before loading it. I believe that HTML Tidy does a good job at cleaning up broken HTML.
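If the clean-up does happen on the server, the "auto-close tags until the document makes sense again" idea from the earlier answer can be sketched with a lenient tokenizer. The following is a minimal Python illustration (not ActionScript, and not HTML Tidy): the class and function names are made up, the list of void elements is the usual HTML one, and edge cases such as comments, doctypes or multiple root elements are ignored.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# HTML elements that never take a closing tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}

class LenientTreeBuilder(HTMLParser):
    """Builds an ElementTree from tag soup, closing whatever the input left open."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        attrib = {name: (value or "") for name, value in attrs}
        if self.stack:
            element = ET.SubElement(self.stack[-1], tag, attrib)
        elif self.root is None:
            element = self.root = ET.Element(tag, attrib)
        else:
            return  # ignore anything after the root element has been closed
        if tag not in VOID_ELEMENTS:
            self.stack.append(element)

    def handle_endtag(self, tag):
        open_tags = [element.tag for element in self.stack]
        if tag in open_tags:  # end tags with no start tag are ignored
            # Close the matching element and everything opened inside it.
            index = len(open_tags) - 1 - open_tags[::-1].index(tag)
            del self.stack[index:]

    def handle_data(self, data):
        if not self.stack:
            return
        parent = self.stack[-1]
        if len(parent):  # text following a child belongs to that child's tail
            last = parent[-1]
            last.tail = (last.tail or "") + data
        else:
            parent.text = (parent.text or "") + data

def parse_tag_soup(markup):
    builder = LenientTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root

poor_html = '<html><meta content="stuff" name="description"><p>Hello<br></html>'
tree = parse_tag_soup(poor_html)
print(ET.tostring(tree, encoding="unicode"))  # well-formed XML for the client
```

The serialized result has meta and br self-closed and the p element closed before the closing html tag, so a strict XML parser on the client should accept it.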

Resources