What is the difference between AntiXss.HtmlEncode and HttpUtility.HtmlEncode?

I just ran across a question with an answer suggesting the AntiXss library to avoid cross site scripting. It sounded interesting, but reading the MSDN blog, it appears to just provide an HtmlEncode() method. And I already use HttpUtility.HtmlEncode().
Why would I want to use AntiXss.HtmlEncode over HttpUtility.HtmlEncode?
Indeed, I am not the first to ask this question, and Google turns up some answers, mainly:
A white-list instead of black-list approach
A 0.1ms performance improvement
Well, that's nice, but what does it mean for me? I don't care so much about the performance of 0.1ms and I don't really feel like downloading and adding another library dependency for functionality that I already have.
Are there examples of cases where the AntiXss implementation would prevent an attack that the HttpUtility implementation would not?
If I continue to use the HttpUtility implementation, am I at risk? What about this 'bug'?

I don't have an answer specifically to your question, but I would like to point out that the white-list vs. black-list approach is not just "nice". It's important. Very important. When it comes to security, every little thing is important. Remember that with cross-site scripting and cross-site request forgery, even if your site is not showing sensitive data, a hacker could infect your site by injecting javascript and use it to get sensitive data from another site. So doing it right is critical.
OWASP guidelines specify using a white-list approach. PCI Compliance guidelines also specify this in coding standards (since they refer to the OWASP guidelines).
Also, the newer version of the AntiXss library has a nice new function: .GetSafeHtmlFragment() which is nice for those cases where you want to store HTML in the database and have it displayed to the user as HTML.
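For reference, a minimal usage sketch (the class that hosts the method depends on the library version; in the 4.x releases it lives on the Sanitizer class, in 3.x on AntiXss itself):

```csharp
// Hedged sketch: adjust the class name to match your AntiXSS version.
string untrustedHtml = "<b>hello</b><script>alert('xss')</script>";

// Keeps the harmless markup and strips the script block, so the result can be
// stored in the database and later rendered as HTML.
string safeFragment =
    Microsoft.Security.Application.Sanitizer.GetSafeHtmlFragment(untrustedHtml);
```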
Also, as for the "bug", if you're coding properly and following all the security guidelines, you're using parameterized stored procedures, so the single quotes will be handled correctly. If you're not coding properly, no off-the-shelf library is going to protect you fully. The AntiXss library is meant to be a tool to be used, not a substitute for knowledge. Relying on the library to do it right for you would be like expecting a really good paintbrush to turn out good paintings without a good artist.
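For anyone less familiar with that point, a minimal sketch of a parameterized command (the table, column and variables here are made up for illustration):

```csharp
using System.Data.SqlClient;

// Because @name is a parameter, a single quote inside userInput is treated as
// data rather than SQL, so it cannot break out of the statement.
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "SELECT Id FROM Users WHERE UserName = @name", conn))
{
    cmd.Parameters.AddWithValue("@name", userInput);
    conn.Open();
    object id = cmd.ExecuteScalar();
}
```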
Edit - Added
As asked in the question, an example of where the anti xss will protect you and HttpUtility will not:
HttpUtility.HtmlEncode and Server.HtmlEncode do not prevent Cross Site Scripting
That's according to the author, though. I haven't tested it personally.
It sounds like you're up on your security guidelines, so this may not be something I need to tell you, but just in case a less experienced developer is out there reading this, the reason I say that the white-list approach is critical is this.
Right now, today, HttpUtility.HtmlEncode may successfully block every attack out there, simply by removing/encoding < and >, plus a few other "known potentially unsafe" characters, but someone is always trying to think of new ways of breaking in. Allowing only known-safe (white list) content is a lot easier than trying to think of every possible unsafe bit of input an attacker could possibly throw at you (black-list approach).

In terms of why you'd use one over the other, consider that the AntiXSS library gets released more often than the ASP.NET framework. Since, as David Stratton says, 'someone is always trying to think of new ways of breaking in', when someone does come up with one, the AntiXSS library is much more likely to get an updated release to defend against it.

The following are the differences between Microsoft.Security.Application.AntiXss.HtmlEncode and System.Web.HttpUtility.HtmlEncode methods:
Anti-XSS uses the white-listing technique, sometimes referred to as the principle of inclusions, to provide protection against Cross-Site Scripting (XSS) attacks. This approach works by first defining a valid or allowable set of characters, and encoding anything outside this set (invalid characters or potential attacks). System.Web.HttpUtility.HtmlEncode and other encoding methods in that namespace use the principle of exclusions and encode only certain characters designated as potentially dangerous, such as the <, >, & and ' characters (see the sketch after this list).
The Anti-XSS Library's list of white (or safe) characters supports more than a dozen languages (Greek and Coptic, Cyrillic, Cyrillic Supplement, Armenian, Hebrew, Arabic, Syriac, Arabic Supplement, Thaana, NKo and more).
The Anti-XSS library has been designed specifically to mitigate XSS attacks, whereas the HttpUtility encoding methods are created to ensure that ASP.NET output does not break HTML.
Performance - the average delta between AntiXss.HtmlEncode() and HttpUtility.HtmlEncode() is +0.1 milliseconds per transaction.
Anti-XSS Version 3.0 provides a test harness which allows developers to run both XSS validation and performance tests.
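A small sketch of the difference in approach (illustrative only; exact output varies by framework and AntiXSS version, and later AntiXSS releases expose the method on the Encoder class instead):

```csharp
string input = "Hello <script> & 'quotes'";

// Black-list: encodes a small fixed set of characters designated as dangerous
// (<, >, &, " and, in later framework versions, ').
string a = System.Web.HttpUtility.HtmlEncode(input);

// White-list: encodes everything that is not on the allowed list, so any
// character outside the configured safe ranges also becomes a numeric entity.
string b = Microsoft.Security.Application.AntiXss.HtmlEncode(input);
```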

Most XSS vulnerabilities (any type of vulnerability, actually) are based purely on the fact that existing security did not "expect" certain things to happen. Whitelist-only approaches are more apt to handle these scenarios by default.

We use the white-list approach for Microsoft's Windows Live sites. I'm sure that there are any number of security attacks that we haven't thought of yet, so I'm more comfortable with the paranoid approach. I suspect there have been cases where the black-list exposed vulnerabilities that the white-list did not, but I couldn't tell you the details.


ASP.NET Profile Provider: is it a value-add?

I've been trying to write an open-source profile provider to work against PostgreSQL (I was frustrated with the limitations and incompleteness in the other projects I'd seen available), but the documentation and examples of how people use it were surprisingly sparse. Even the SO tag for asp.net-profiles has only a little over 100 questions associated with it.
The more I dig in to making it work, the less practical it seems; the value added does not seem to justify the complications involved. Additionally, it only seems to work for a limited scope of web projects without a bunch of extra work.
I feel like I'm being led to the conclusion that it is not a popular technology, and that there are better ways to persist a more robust user-based information set.
Is my take on this fundamentally flawed? Is this widely used? I'm on the cusp of abandoning my profile provider as it seems to offer little of value.
I have always eschewed the ASP.NET Membership provider in favor of a custom implementation of IPrincipal for one simple reason. I've almost never needed the out-of-the-box functionality it provides.
Any custom implementation means creating your own implementation of MembershipProvider. Amongst other methods that I have never implemented, it includes wonders like RequiresQuestionAndAnswer and MaxInvalidPasswordAttempts. It forces an implementation upon you that you might not need and will take you more time to complete properly.
Sure, you could cheat and put a NotImplementedException in methods that you're not particularly bothered about, but what right-minded coder would feel comfortable with that in a production system? :D
I really like a lot of Microsoft's stuff, but my experience is that a lot of their "out-of-the-box" solutions are fine in vanilla mode, but the wheels tend to come off when you travel off the beaten path. A bit of cherry-picking is therefore required. My advice? Leave this one on the vine.
No, the Profile system in asp.net is not widely used, primarily because of the reasons you mention. It's just not useful for a lot of people.
The easiest solution is to simply create a profile table in your app, then key it on the ProviderUserKey of the Membership system.
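A rough sketch of that, assuming the SQL membership provider (the UserProfile table and its columns are hypothetical):

```csharp
using System;
using System.Data.SqlClient;
using System.Web.Security;

MembershipUser user = Membership.GetUser();
Guid userId = (Guid)user.ProviderUserKey;   // the SQL provider keys users by Guid

using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand(
    "SELECT DisplayName, TimeZone FROM UserProfile WHERE UserId = @id", conn))
{
    cmd.Parameters.AddWithValue("@id", userId);
    conn.Open();
    using (var reader = cmd.ExecuteReader())
    {
        // read your custom profile columns here
    }
}
```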

Abstraction or not?

The other day I stumbled onto a rather old Usenet post by Linus Torvalds. It is the infamous "You are full of bull****" post in which he defends his choice of using plain C for Git over something more modern.
In particular this post made me think about the enormous number of abstraction layers that accumulate one over the other where I work. Mine is a Windows .Net environment. I must say that I like C# and the .Net environment; it really makes most things easy.
Now, I come from a very different background made of Unix technologies like C and a plethora of scripting languages; to me, also, OOP is just one, and not always the best, programming paradigm. I often struggle (in a working kind of way, of course!) with my colleagues (one in particular), because they appear to be of the "any problem can be solved with an additional level of abstraction" church, while I'm more of the "keep it simple" school. I think that there is a very different mental approach to the problems that maybe comes from exposure to different cultures.
As a very simple example, for the first project I did here I needed some configuration for an application. I made a 10-line class to load and parse a txt file located in the program's root dir, containing colon-separated key/value pairs, one per row. It worked.
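Something along these lines (a reconstruction for illustration; the original class isn't shown here):

```csharp
using System.Collections.Generic;
using System.IO;

static class SimpleConfig
{
    // Loads "key: value" pairs, one per row, from a text file in the app's root dir.
    public static Dictionary<string, string> Load(string path)
    {
        var settings = new Dictionary<string, string>();
        foreach (var line in File.ReadAllLines(path))
        {
            if (string.IsNullOrWhiteSpace(line)) continue;
            var parts = line.Split(new[] { ':' }, 2);
            if (parts.Length == 2)
                settings[parts[0].Trim()] = parts[1].Trim();
        }
        return settings;
    }
}
```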
In the end, to standardize the approach to the configuration problem, we now have a library, to be located on every machine running each configured program, that calls a service which, at startup, loads an XML file that contains references to other XML files, one per application, that contain the configurations themselves.
Now, it is extensible and made up of fancy reusable abstractions, providers and all, but I still think that if we one day really do reuse part of it, the time taken to build it would have been enough to write the needed code from scratch, or to copy/paste the old code and modify it.
What are your thoughts about it? Can you point out some interesting references dealing with the problem?
Thanks
Abstraction makes it easier to construct software and understand how it is put together, but it complicates fully understanding certain issues around performance and security, because the abstraction layers introduce certain kinds of complexity.
Torvalds' position is not absurd, but he is an extremist.
Simple answer: programming languages provide data structures and ways to combine them. Use these directly at first, do not abstract. If you find you have representation invariants to maintain that are at a high risk of being broken due to a large number of usage sites possibly outside your control, then consider abstraction.
To implement this, first provide functions and convert the call sites to use them without hiding the representation. Hide the data representation only when you're satisfied your functional representation is sufficient. Make sure at this time to document the invariant being protected.
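A tiny hypothetical example of that progression:

```csharp
// Invariant to protect: an account balance must never go negative.

// Step 1: use the plain representation (a decimal) directly at the call sites.

// Step 2: once many call sites risk breaking the invariant, route the mutation
// through a function, without hiding the representation yet.
static decimal Withdraw(decimal balance, decimal amount)
{
    if (amount > balance)
        throw new System.InvalidOperationException("balance must never go negative");
    return balance - amount;
}

// Step 3: only when the functional interface has proven sufficient, hide the
// decimal inside a class and document the invariant it protects.
```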
An "extreme programming" version of this: do not abstract until you have test cases that break your program. If you think the invariant can be breached, write the case that breaks it first.
Here's a similar question: https://stackoverflow.com/questions/1992279/abstraction-in-todays-languages-excited-or-sad.
I agree with Steve Emmerson - 'Coders at Work' would give you some excellent perspective on this issue.

Are computer languages copyrighted? Can I make a compiler or IDE or anything for any of them?

Are computer languages copyrighted, or do they have some restrictions imposed on them regarding how they can be used? What does that mean in practice? If so, what can be done and what cannot be done? Could I make a compiler or IDE or anything for any of them?
For example, for PL/SQL?
Unfortunately, programming languages may be encumbered by patents. This appears to be the case e.g. with the Aikido language.
Just recently this seems to have become a non-issue for the C# programming language (and the .NET Common Language Infrastructure).
To answer your question regarding what can and what cannot be done: if in your implementation of the language you use an invention that somebody patented, you definitely don't want to try to make profit with your implementation in any country where the patent applies (unless you licence the tech, of course). However, if you can circumvent the patent, i.e., implement for example a compiler for the same language without using that specific trick but something else, then you should not have a problem. Patents need (well, should need) to be very specific, so this might often be possible. (IANAL, though.)
You really need to familiarize yourself with copyright. Copyright applies to works of art: writings, paintings, etc. So the programming language itself cannot be copyrighted. The text describing it usually is, but that only prevents you from copying that text - it doesn't prevent you from reading it, understanding it, and using it.
So for PL/SQL, it's probably the case that its description is copyrighted by Oracle, but that can't stop you from making compilers and IDEs. As Pukku points out: there are other kinds of intellectual property, such as patents and trade marks, which may prevent you from doing these things (or calling them PL/SQL when done), but not copyright.

Is obfuscation the best answer [duplicate]

Possible Duplicates:
How effective is obfuscation?
Protect ASP.NET Source code
(Why) should I use obfuscation?
Is obfuscation the best answer for protecting our code ?
Especially in web projects, when you want to deliver your web project as libraries of code to your customer (the person who ordered it).
Edited
My first priority is server-side code,
and second client-side,
but the main goal is when you want to deliver a complete web project
where you made every piece of your code into components and DLLs; how effectively can you protect them and prevent others from reconstructing your code from them?
Edited
The problem is that I want to protect code that I have written for a company that ordered it; right now all my code is inside some DLLs.
They could reverse engineer those and get my code, and I want to prevent them from doing so.
Is there any way to do so or not?
I think this is a distinct question, and I didn't ask what obfuscation is, nor for tools for doing it; beyond that, I think this is separate from client-server security.
Sorry if my question wasn't clear at first, but if it really is a case for deletion, no problem for me.
Also, I wanted a comparative look at this problem and the solutions,
because I don't think obfuscation is the only possible solution; maybe there are some logical workarounds for this problem.
Maybe not the best. If you are really ambitious, you can write your own web server (plugin).
But is it worth the effort?
Software is like a bike in the Netherlands: there is no known way of protection that is 100% safe. Either you use better protection than the other bikes have (thieves are lazy), or you obfuscate the bike so they won't take it.
Another way to increase the level of protection is to use custom-made ActiveX code to store mission-critical algorithms. Of course, that can be reverse engineered too, but JavaScript is easier to reverse engineer.
What exactly are you trying to protect your code from?
Does your client-side code contain valuable business logic?
If not: you shouldn't bother obfuscating something that doesn't have much value. Personally I think client-side code theft is something that people are far too concerned about. 99% of web apps don't really have anything special in terms of implementation on the client side. What you need to worry about more is someone ripping off the idea or visual look, which you obviously can't obfuscate.
If it does: you need to consider refactoring that logic out of the client side, as even with heavy obfuscation, a determined party will always be able to untangle it relatively easily. The code that adds real value to your app should ideally be running on your servers where it's considerably more difficult to get access to.
Even if people stealing your HTML markup or JavaScript were something to worry about (and it probably isn't), obfuscation doesn't really solve the problem. In my opinion it is a waste of effort and money.
Hosting a critical function as a web service is probably the most sure way to protect it. It keeps the code out of the user's hands entirely. But then you're stuck hosting a service, and your users have to be on line to use your functionality.
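As a sketch of that idea (an old-style ASMX service; every name here is made up):

```csharp
using System.Web.Services;

[WebService(Namespace = "http://example.com/pricing")]
public class PricingService : WebService
{
    [WebMethod]
    public decimal Quote(int productId, int quantity)
    {
        // The proprietary logic runs here and never ships to the client;
        // callers only ever see inputs and outputs.
        return CalculateInternally(productId, quantity);
    }

    private decimal CalculateInternally(int productId, int quantity)
    {
        return quantity * 9.99m;   // placeholder for the real algorithm
    }
}
```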
Obfuscators help by hiding useful names and replacing control flow with weird but logically equivalent alternatives. They might thwart an amateur, but they'll only slow down a skilled reverse engineer for a few minutes, and they won't stop someone who is determined to penetrate your secrets.
If you really want to protect your code, you should write native code using a native code compiler (C++, Delphi). This still does not guarantee that your code is 100% safe, because any experienced developer can read assembler and essentially disassemble the native code program.
A determined hacker will always find a way to get to what they want.
The best we can do is to make it hard or painful for the would-be hacker to get at our code and the following options can help us:
Customize the CLR engine
Run an obfuscation tool over your code and use name and control flow obfuscation and string encryption
Make the application a Web-based application where all your proprietary code sits on a server somewhere
Watermark your code using your own custom techniques to "throw off" the would-be hacker
Implement techniques to prevent debugging (this is a very advanced topic!)
I really like a comment made by one of the head developers of the .NET framework where he said that he does not feel it's really the fact that others can get at our code that should be a concern to us, but rather, we should concern ourselves with the level of support we provide with our products.
So if we provide a good support base, it does not matter what the hackers do with our code, because the clients will trust us and our ability to support them using our product and not some cheap hacker-hacked program.
NO, obfuscation is not the best way to protect your code.
The tool you need to use is "copyright".
There is no (technological) way you can protect your code from someone determined enough (provided they have access to the binaries / scripts).
What you can do is prevent them from legally modifying/distributing your code.
The normal server-side code in Web projects should under no circumstances be visible to the outside world. So there is no point in obfuscating the code.
Besides that, two minor points:
JavaScript code is visible to the user and can be obfuscated. Minifying JavaScript to save bandwidth is recommended anyway, and minified JS is also obfuscated to a degree.
Also important is that on production systems the customErrors configuration setting should be set to RemoteOnly or On, to avoid showing a stack trace with too much code detail.
If your client side code has any broad value to others, it will get reverse engineered regardless of any obfuscation.
The reality is that it's likely not going to be broadly useful to many, and there is a lot of other code out there to look at, so it's probably not worth doing more than minifying the code, which is plenty of obfuscation; and if your code is large, it will improve download speed.
Have you considered the alternative? That it's a good thing to give something back to the community? I'm sure you've looked at the code of more than one site, no?

Best way to incorporate spell checkers with a build process

I try to externalize all strings (and other constants) used in any application I write, for many reasons that are probably second-nature to most stack-overflowers, but one thing I would like to have is the ability to automate spell checking of any user-visible strings. This poses a couple problems:
Not all strings are user-visible, and it's non-trivial to separate them and keep that separation in place (but it is possible)
Most, if not all, string externalization methods I've used involve significant text that will not pass a spell checker such as aspell/ispell (e.g. theStrName="some string." and comments)
Many spellcheckers (once again, aspell/ispell) don't handle many words out of the box (generally technical terms, proper nouns, or just 'new' terminology, like metadata).
How do you incorporate something like this into your build procedures/test suites? It is not feasible to have someone manually spell check all the strings in an application each time they are changed -- and there is no chance that they will all be spelled correctly the first time.
We do it manually, if errors aren't picked up during testing then they're picked up by the QA team, or during localization by the translators, or during localization QA. Then we lodge a bug.
Most of our developers are not native English speakers, so it's not an uncommon problem for us. The number that slip through the cracks is so small that this is a satisfactory solution for us.
Nothing over a few hundred lines is ever 100% bug-free (well... maybe the odd piece of embedded code), just think of spelling mistakes as bugs and don't waste too much time on it.
As soon as your application matures, over 90% of strings won't change between releases, and it would be a reasonably trivial exercise to compare two versions of your resources, figure out what's new (check them first), what's changed/updated (check next) and what hasn't changed (no need to check these).
So think of it more like I need to check ALL of these manually the first time, and I'm only going to have to check 10% of them next time. Now ask yourself if you still really need to automate spell checking.
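In code, the "only re-check what changed" idea might look roughly like this, assuming the strings from two releases have already been loaded into dictionaries keyed by resource name:

```csharp
using System.Collections.Generic;

static IEnumerable<string> StringsNeedingReview(
    Dictionary<string, string> previous,
    Dictionary<string, string> current)
{
    foreach (var entry in current)
    {
        string old;
        // New or changed strings need a fresh spell check; unchanged ones were
        // already checked in an earlier release and can be skipped.
        if (!previous.TryGetValue(entry.Key, out old) || old != entry.Value)
            yield return entry.Value;
    }
}
```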
I can think of two ways to approach this semi-automatically:
Have the compiler help you differentiate between strings used in the UI and strings used elsewhere. Overload different variants of the string datatype depending on its purpose, and overload the output methods to only accept that type - that way you can create a fake UI that just outputs the UI strings, and do the spell checking on that.
Whether this is doable of course depends on the platform and the overall architecture of the application.
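A sketch of the idea, with made-up names (UiString and Render are not framework types):

```csharp
// Anything user-visible must be wrapped in UiString, because the output
// methods refuse plain strings...
public sealed class UiString
{
    public string Value { get; private set; }
    public UiString(string value) { Value = value; }
}

public static class Page
{
    public static void Render(UiString text)
    {
        // write to the real UI here
    }
}

// ...which makes it easy to build a fake UI (or a flat dump) of every UiString
// constant and run the spell checker over just that output.
```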
Another approach could be to simply update the spell checker's database with all the strings that appear in the code - comments, xpaths, table names, you name it - and regard them as perfectly cromulent. This will of course reduce the precision of the spell checking.
First thing, regarding string externalization - GNU GetText (if used properly) creates string files that contain almost no text other than the actual content of the strings (there are some headers, but it's easy to make a spell checker ignore them).
Second thing, what I would do is to run the spell checker in a continuous integration environment and have the errors fed externally, probably through a web interface but email will also work. Developers can then review the errors and either fix them in the code or use some easy interface to let the spell check know that a misspelling should be ignored (a web interface can integrate both the error view and the spell checker interface).
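As a rough sketch of feeding only the translatable content to the checker, assuming GNU GetText .po files (single-line msgstr entries only; continuation lines are ignored here):

```csharp
using System.IO;
using System.Text.RegularExpressions;

static class PoStrings
{
    // Writes just the translated text to the given output, so the spell checker
    // never sees identifiers, comments or file headers.
    public static void Dump(string poFile, TextWriter output)
    {
        var msgstr = new Regex("^msgstr \"(.*)\"$");
        foreach (var line in File.ReadLines(poFile))
        {
            var m = msgstr.Match(line);
            if (m.Success && m.Groups[1].Value.Length > 0)
                output.WriteLine(m.Groups[1].Value);   // pipe this to aspell/ispell
        }
    }
}
```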
If you're using java and are storing your localized strings in resource bundles then you could check the Bundle.properties files and validate the bundle strings. You could also add a special comment annotation that your processor could use to determine if an entry should be skipped.
This method will allow you to give a hint as to the locale and provide a way of checking multiple languages within the one build process.
I can't answer how you would perform the actual spell checking itself, though I think what I've presented will guide you as to the method of performing the spell checking.
Use aspell. It's a programme, it's available for unixoids and cygwin, it can be run over lots of kinds of source code. Use it.
First point, please don't put it into you build process. I would be a vengeful coder if I (meaning my computer) had to spell check all the content on the site every time I tried to debug or build a new feature. I don't even think this kind of operation belongs as a unit test (you're testing a human interface, not a computerised one).
Second point, don't write a script. You're going to have so many false positives that people will stop reading the reports, and you will be no better off than when you started.
Third point, this is probably most easily solved by having humans do it: QA team, copy writers, beta testers, translators, etc. All the big sites with internationalised content that I've built had the same process: we took the copy from the copy writers, sent it to the translating service/agency, put it into the persistence layer, and deployed it. Testers (QA, developers, PMs, designers, etc.) would find spelling or grammatical mistakes and lodge bug reports. There is just too much red tape and pairs of eyes for that many spelling/grammar errors to slip through.
Fourth point, there will always be spelling and grammar mistakes on your page. Even major newspaper web sites haven't gotten around this and they have whole office buildings filled with editors.
