Is it insecure to execute code via an HTTP URL? - r

I'm suspicious of the installation mechanism of Bioconductor. It looks like it is just executing (via source()) the R script from an HTTP URL. Isn't this an insecure approach vulnerable to a man-in-the-middle attack? I would think that they should be using HTTPS. If not, can someone explain why the current approach is acceptable?

Yes, you are correct.
Loading executable code over a cleartext connection is vulnerable to a MITM.
Unless the code is loaded over HTTPS, where SSL/TLS can be used to encrypt and authenticate the connection, or unless it has been signed and is verified at the client, a MITM attacker could alter the input stream and cause arbitrary code to be executed on your system.
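One mitigation (not something the Bioconductor project itself is stated to do here) is to fetch the script over HTTPS and verify a digest the publisher has announced out of band before executing anything. A minimal sketch in Java, reusing the URL from the question and leaving the expected digest as a placeholder:

import java.io.InputStream;
import java.net.URL;
import java.security.MessageDigest;
import java.util.HexFormat; // Java 17+

public class VerifiedDownload {
    public static void main(String[] args) throws Exception {
        // HTTPS variant of the URL from the question; the digest is a placeholder
        // that would come from the publisher over a trusted channel.
        String scriptUrl = "https://bioconductor.org/biocLite.R";
        String expectedSha256 = "<publisher-provided SHA-256 digest>";

        byte[] body;
        try (InputStream in = new URL(scriptUrl).openStream()) {
            body = in.readAllBytes();
        }
        String actual = HexFormat.of().formatHex(
                MessageDigest.getInstance("SHA-256").digest(body));

        if (!actual.equalsIgnoreCase(expectedSha256)) {
            throw new SecurityException("Digest mismatch: refusing to execute downloaded code");
        }
        // Only at this point would the script be handed to an interpreter.
        System.out.println("Digest verified; " + body.length + " bytes downloaded");
    }
}

The same idea applies in R: download the script to a file, compare its checksum against a published value, and only then source() it.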

Allowing code to execute via an HTTP GET request essentially means you're allowing user input to be processed directly by the application, thus directly influencing its behaviour. Whilst this is often what the developer wants (say, to query specific information from a database), it may be exploited in the ways you have already mentioned (e.g. MITM). This is often a bad idea (though I'm not referring to Bioconductor specifically), as it opens the system to possible XSS/(B)SQLi attacks, amongst others.
However, the URL http://bioconductor.org/biocLite.R is essentially just a file placed on the web server, and from what it seems, source() is being used to download it directly. There does not seem to be any user input anywhere in this example, so no, I wouldn't mark it as unsafe; however, your analysis of the cleartext risk is indeed correct.
Note: this is simply referring to parameterised GET requests, e.g. http://example.com/artists/artist.php?id=1. Similar insecurities can be exploited in many other parts of an HTTP request, such as Host header attacks, but the general concept is the same: no user input should ever be processed directly by the application.
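To illustrate that last point, the usual fix for the artist.php?id=1 style of lookup is a parameterised query. A rough Java sketch, assuming a hypothetical artists table and connection details:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ArtistLookup {
    public static void main(String[] args) throws Exception {
        int id = Integer.parseInt(args[0]);   // the untrusted ?id= value from the request

        // Hypothetical connection details and table name, for illustration only.
        try (Connection conn = DriverManager.getConnection("jdbc:postgresql://localhost/music", "app", "secret");
             PreparedStatement stmt = conn.prepareStatement("SELECT name FROM artists WHERE id = ?")) {
            stmt.setInt(1, id);               // bound as data, never spliced into the SQL text
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("name"));
                }
            }
        }
    }
}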

Related

CSRF protection while making use of server side caching

Situation
There is a site at examp.le that costs a lot of CPU/RAM to generate, and a leaner examp.le/backend that will perform various tasks to read, write and serve user-specific data for authenticated requests. A lot of resources could be saved by utilizing a server-side cache on the examp.le site (but not on examp.le/backend) and just asynchronously grabbing all user-specific data from the backend once the page arrives at the client. (Total loading time may even be lower, despite the need for an additional request.)
Threat model
CSRF attacks. Assuming (maybe foolishly) that examp.le is reliably safeguarded against XSS code injection, we still need to consider scripts on a malicious site exploit.me that cause the victim's browser to run a request against examp.le/backend with their authorization cookies included automagically, causing the server to perform some kind of data mutation on behalf of the user.
Solution / problem with that
As far as I understand, the commonly used countermeasure is to include another token in the generated examp.le page. The server can verify this token is linked to the current user's session and will only accept requests that can provide it. But I assume caching won't work very well if we are baking a random token into every response to examp.le..?
So then...
I see two possible solutions: One would be some sort of "hybrid caching" where each response to examp.le is still programmatically generated, but that program just merges small dynamic parts into some cached output. This wouldn't work with caching systems that operate at the higher layers of the server stack, let alone a CDN, but it still might have its merits. I don't know if there are standard ways or libraries to do this, or more specifically if there are solutions for WordPress (which happens to be the culprit in my case).
The other (preferred) solution would be to get an initial anti-CSRF token directly from examp.le/backend. But I'm not quite clear in my understanding of the implications of that. If the script on exploit.me could somehow obtain that token, the whole mechanism would make no sense to begin with. The way I understand it, if we leave exploitable browser bugs and security holes out of the picture and consider only requests coming from a non-obscure browser visiting exploit.me, then the Origin header can be absolutely trusted to be tamper-proof. Is that correct? But then that raises the question: wouldn't we get mostly the same amount of security in this scenario by only checking the authentication cookie and the Origin header, without throwing tokens back and forth?
I'm sorry if this question feels a bit all over the place, but I'm partly still in the process of getting the whole picture clear ;-)
First of all: Cross-Site Scripting (XSS) and Cross-Site Request Forgery (CSRF) are two different categories of attacks. I assume you meant to tackle the CSRF problem only.
Second of all: it's crucial to understand what CSRF is about. Consider the following.
A POST request to examp.le/backend changes some kind of crucial data.
The request to examp.le/backend is protected by authentication mechanisms, which generate valid session cookies.
I want to attack you. I do it by sending you a link to a page I have forged at cats.com/best_cats_evr.
If you are logged in to examp.le in one browser tab and you open cats.com/best_cats_evr in another, the code will be executed.
The code on the site cats.com/best_cats_evr will send a POST request to examp.le/backend. The cookies will be attached, as there is no reason why they should not be. You will perform a change on examp.le/backend without knowing it.
So, having said that, how can we prevent such attacks?
The CSRF case is very well known to the community, and it makes little sense for me to write everything down myself. Please check the OWASP CSRF Prevention Cheat Sheet, as it is one of the best pages you can find on this topic.
And yes, checking the origin would help in this scenario. But checking the origin will not help if I find an XSS vulnerability in examp.le/somewhere_else and use it against you.
What would also help would be to avoid plain POST requests (a cross-site HTML form can send those without any origin checks) and use e.g. PUT instead, where the CORS preflight should help... But this quickly turns out to be too much rocket science for the dev team to handle, and sticking to good old anti-CSRF tokens (supported by default in every framework) should help.
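As a rough illustration of the origin check discussed above (a sketch only, with examp.le standing in for the real site and the Servlet API chosen arbitrarily), a filter on examp.le/backend might reject state-changing requests whose Origin header is missing or not on an allowlist:

import java.io.IOException;
import java.util.Set;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch of an origin check for state-changing requests (examp.le is the question's placeholder domain).
public class OriginCheckFilter implements Filter {
    private static final Set<String> ALLOWED_ORIGINS = Set.of("https://examp.le");
    private static final Set<String> SAFE_METHODS = Set.of("GET", "HEAD", "OPTIONS");

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        HttpServletResponse response = (HttpServletResponse) res;

        if (!SAFE_METHODS.contains(request.getMethod())) {
            String origin = request.getHeader("Origin");
            // Some older clients omit Origin; a production check usually falls back to Referer.
            if (origin == null || !ALLOWED_ORIGINS.contains(origin)) {
                response.sendError(HttpServletResponse.SC_FORBIDDEN, "Cross-origin request rejected");
                return;
            }
        }
        chain.doFilter(req, res);
    }

    @Override
    public void init(FilterConfig filterConfig) { }

    @Override
    public void destroy() { }
}

This covers the exploit.me scenario from the question, but, as noted above, it does nothing against an XSS hole on the trusted origin itself.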

Is HTTP Conditional GET reliable to detect if web content has been modified?

I'd like to ask a question regarding HTTP conditional GET. Is this actually reliable for detecting changes to a web resource?
I mean, if I write a program to check whether page content has changed by using an HTTP conditional GET, is it possible for the web server to be misconfigured (or intentionally configured) to report that there are no changes even though the content of the HTML or XML (RESTful) resource has changed?
(I'm referring to requesting a web page with an "If-Modified-Since" header as part of the GET request. So, is the modified date that comes back always reliable?)
Of course it is possible. But the whole point of using a communication protocol is that you trust the other side to fulfil it.
Situations like the one you mention are usually called "Byzantine", because one of the endpoints is not merely failing to follow the protocol but actively cheating.
Yes, it is possible for a server to say that the content hasn't changed even though it has. It is still just code running on the server so it can do anything it wants.
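To make the client side of this concrete, here is a hedged Java sketch of a conditional GET that trusts a 304 but also hashes any returned body, so a misbehaving Last-Modified implementation can still be caught by comparing digests between runs (the URL and stored header value are placeholders):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.security.MessageDigest;
import java.util.HexFormat; // Java 17+

public class ConditionalGetCheck {
    public static void main(String[] args) throws Exception {
        String url = "https://example.com/resource";           // placeholder resource
        String lastModified = "Mon, 01 Jan 2024 00:00:00 GMT"; // saved from a previous response

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("If-Modified-Since", lastModified)
                .GET()
                .build();
        HttpResponse<byte[]> response = client.send(request, HttpResponse.BodyHandlers.ofByteArray());

        if (response.statusCode() == 304) {
            // We are trusting the server here: "not modified" is its claim, not a verified fact.
            System.out.println("Server reports the resource is unchanged");
        } else {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(response.body());
            // Comparing this digest with one stored from the last fetch detects changes
            // even if the server's Last-Modified handling is misconfigured.
            System.out.println("New representation, SHA-256 = " + HexFormat.of().formatHex(digest));
        }
    }
}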

What is the difference between REST and HTTP protocols?

What is the REST protocol and how does it differ from the HTTP protocol?
REST is a design style for protocols. It was developed by Roy Fielding in his PhD dissertation and formalised the approach behind HTTP/1.0: finding what worked well with it, and then using this more structured understanding to influence the design of HTTP/1.1. So, while it was after-the-fact in a lot of ways, REST is the design style behind HTTP.
Fielding's dissertation can be found at http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm and is very much worth reading, and also very readable. PhD dissertations can be pretty hard-going, but this one is wonderfully well-described and very readable to those of us without a comparable level of Computer Science. It helps that REST itself is pretty simple; it's one of those things that are obvious after someone else has come up with it. (It also for that matter encapsulates a lot of things that older web developers learnt themselves the hard way in one simple style, which made reading it a major "a ha!" moment for many).
Application-level protocols other than HTTP can also use REST, but HTTP is the classic example.
Because HTTP uses REST, all uses of HTTP are using a REST system. The description of a web application or service as RESTful or non-RESTful relates to whether it takes advantage of REST or works against it.
The classic example of a RESTful system is a "plain" website without cookies (cookies aren't always counter to REST, but they can be): client state is changed by the user clicking a link which loads another page, or doing GET form queries which bring back results. POST form queries can change both server and client state (the server does something on the basis of the POST, and then sends a hypertext document describing the new state). URIs describe resources, but the entity (document) describing one may differ according to the content-type or language preferred by the user. Finally, it's always been possible for browsers to update the page itself through PUT and DELETE, though this has never been very common and is, if anything, less so now.
The classic example of a non-RESTful system using HTTP is something which treats HTTP as if it was a transport protocol, and with every request sends a POST of data to the same URI which is then acted upon in an RPC-like manner, possibly with the connection itself having shared state.
A RESTful computer-readable system (i.e. not a website in a browser, but something used programmatically) would obtain information about the resources concerned by GETting a URI, which would return a document (e.g. in XML, but not necessarily) describing the state of the resource, including URIs to related resources (hypermedia, therefore); it would change their state by PUTting entities describing the new state or by DELETEing them, and would have other actions performed by POSTing.
Key advantages are:
Scalability: The lack of shared state makes for a much more scalable system. (This was demonstrated to me massively when I removed all use of session state from a heavily hit website. While I was expecting it to give a bit of extra performance, even a long-time anti-session advocate like myself was blown away by the massive gain from removing what had been pretty slim use of sessions; it wasn't even why I had been removing them!)
Simplicity: There are a few different ways in which REST is simpler than more RPC-like models, in particular there are only a few "verbs" that are ever possible, and each type of resource can be reasoned about in reasonable isolation to the others.
Lightweight Entities: More RPC-like models tend to end up with a lot of data in the entities sent both ways just to reflect the RPC-like model. This isn't needed. Indeed, sometimes a simple plain-text document is all that is really needed in a given case, in which case with REST, that's all we would need to send (though this would be an "end-result" case only, since plain-text doesn't link to related resources). Another classic example is a request to obtain an image file, RPC-like models generally have to wrap it in another format, and perhaps encode it in some way to let it sit within the parent format (e.g. if the RPC-like model uses XML, the image will need to be base-64'd or similar to fit into valid XML). A RESTful model would just transmit the file the same as it does to a browser.
Human Readable Results: Not necessarily so, but it is often easy to build a RESTful webservice where the results are relatively easy to read, which aids debugging and development no end. I've even built one where an XSLT meant that the entire thing could be used by humans as a (relatively crude) website, though it wasn't primarily for human-use (essentially, the XSLT served as a client to present it to users, it wasn't even in the spec, just done to make my own development easier!).
Looser binding between server and client: Leads to easier later development, and to easier changes in how the system is hosted. Indeed, if you keep to the hypertext model, you can change the entire structure, including moving from a single host to multiple hosts for different services, without changing client code at all.
Caching: For the GET operations where the client obtains information about the state of a resource, standard HTTP caching mechanisms allow both for statements that the resource won't meaningfully change until a certain date at the earliest (no need to query at all until then) or that it hasn't changed since the last query (send a couple hundred bytes of headers saying this rather than several kilobytes of data). The improvement in performance can be immense (big enough to move the performance of something from the point where it is impractical to use to the point where performance is no longer a concern, in some cases).
Availability of toolkits: Because it works at a relatively simple level, if you have a webserver you can build a RESTful system's server and if you have any sort of HTTP client API (XHR in browser javascript, HttpWebRequest in .NET, etc) you can build a RESTful system's client.
Resilience: In particular, the lack of shared state means that a client can die and come back into use without the server knowing, and even the server can die and come back into use without the client knowing. Obviously communications during that period will fail, but once the server is back online things can just continue as they were. This also really simplifies the use of web farms for redundancy and performance: each server acts as if it's the only server there is, and it doesn't matter that it's actually only dealing with a fraction of the requests from a given client.
REST is an approach that leverages the HTTP protocol, and is not an alternative to it.
http://en.wikipedia.org/wiki/Representational_State_Transfer
Data is uniquely referenced by URL and can be acted upon using HTTP operations (GET, PUT, POST, DELETE, etc.). A wide variety of MIME types are supported for the message/response, but XML and JSON are the most common.
For example, to read data about a customer you could use an HTTP GET operation with the URL http://www.example.com/customers/1. If you want to delete that customer, simply use the HTTP DELETE operation with the same URL.
The Java code below demonstrates how to make a REST call over the HTTP protocol:
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import javax.xml.bind.JAXBContext;

// Open a connection to the customer resource and ask for an XML representation.
String uri = "http://www.example.com/customers/1";
URL url = new URL(uri);
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("GET");
connection.setRequestProperty("Accept", "application/xml");

// Unmarshal the XML response into a Customer object using JAXB.
JAXBContext jc = JAXBContext.newInstance(Customer.class);
InputStream xml = connection.getInputStream();
Customer customer = (Customer) jc.createUnmarshaller().unmarshal(xml);
connection.disconnect();
For a Java (JAX-RS) example see:
http://bdoughan.blogspot.com/2010/08/creating-restful-web-service-part-45.html
REST is not a protocol, it is a generalized architecture for describing a stateless, caching client-server distributed-media platform. A REST architecture can be implemented using a number of different communication protocols, though HTTP is by far the most common.
REST is not a protocol; it is a way of exposing your application, mostly done over HTTP.
For example, say you want to expose an API operation of your application that does getClientById.
Instead of creating a URL like
yourapi.com/getClientById?id=4
you can do
yourapi.com/clients/id/4
Since you are using the GET method, it means that you want to GET data.
You take advantage of the HTTP methods: GET/DELETE/PUT.
yourapi.com/clients/id/4 can also deal with deletion: if you send a DELETE method instead of GET, it means you want to delete the record.
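Sketching that in code (JAX-RS annotations, with a hypothetical in-memory store; the paths mirror the yourapi.com/clients/id/4 example above):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.ws.rs.DELETE;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.Produces;

@Path("/clients")
public class ClientResource {
    // In-memory stand-in for a real data store (illustration only).
    private static final Map<Long, String> CLIENTS = new ConcurrentHashMap<>();

    // GET yourapi.com/clients/id/4 -> read the record
    @GET
    @Path("/id/{id}")
    @Produces("text/plain")
    public String getClientById(@PathParam("id") long id) {
        return CLIENTS.get(id);
    }

    // DELETE yourapi.com/clients/id/4 -> same URL, different verb, removes the record
    @DELETE
    @Path("/id/{id}")
    public void deleteClient(@PathParam("id") long id) {
        CLIENTS.remove(id);
    }
}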
All the answers are good.
I hereby add a detailed description of REST and how it uses HTTP.
REST = Representational State Transfer
REST is a set of rules that, when followed, enable you to build a distributed application that has a specific set of desirable constraints.
It is stateless, which means that ideally no session state should be maintained between the client and server.
It is the responsibility of the client to pass its context to the server with each request, and the server can then store this context to process the client's further requests. For example, a session maintained by the server is identified by a session identifier passed by the client.
Advantages of Statelessness:
Web services can treat each method call separately.
Web services need not maintain the client's previous interactions.
This in turn simplifies application design.
HTTP is itself a stateless protocol, unlike TCP, and thus RESTful web services work seamlessly with it.
Disadvantages of Statelessness:
One extra layer, in the form of headers, needs to be added to every request to carry the client's state.
For security, we may need to add header information to every request (a sketch follows below).
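For instance (a minimal sketch, with a hypothetical URL and token), every call carries the client's credentials itself because the server holds no session:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class StatelessCall {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // The Authorization header repeats on every request; no server-side session is assumed.
        HttpRequest request = HttpRequest.newBuilder(URI.create("https://api.example.com/orders/42"))
                .header("Authorization", "Bearer <token issued at login>")
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}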
HTTP Methods supported by REST:
GET /string/someotherstring:
It is idempotent (multiple identical calls have the same effect as a single call) and should ideally return the same results every time a call is made.
PUT:
Like GET, it is idempotent; it is used to update resources.
POST: should contain a URL and a body.
Used for creating resources. It is not idempotent: multiple calls may return different results and may create multiple resources.
DELETE:
Used to delete resources on the server.
HEAD:
The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response. The meta information contained in the HTTP headers in response to a HEAD request SHOULD be identical to the information sent in response to a GET request.
OPTIONS:
This method allows the client to determine the options and/or requirements associated with a resource, or the capabilities of a server, without implying a resource action or initiating a resource retrieval.
HTTP Responses
The full list of HTTP status codes covers all the responses; here are a few important ones:
200 - OK
3XX - Redirection; additional information or action is needed from the client (e.g. URL redirection)
400 - Bad Request
401 - Unauthorized
403 - Forbidden
The request was valid, but the server is refusing action. The user might not have the necessary permissions for a resource, or may need an account of some sort.
404 - Not Found
The requested resource could not be found but may be available in the future. Subsequent requests by the client are permissible.
405 - Method Not Allowed
A request method is not supported for the requested resource; for example, a GET request on a form that requires data to be presented via POST, or a PUT request on a read-only resource.
500 - Internal Server Error
502 - Bad Gateway

How can I reliably detect if Flash was the originator of a request to a service?

I need to be able to detect if Flash was the originator of a request to an ASP.NET service. The reason is that Flash is unable to process SOAP messages when the response status code is something other than 200. However, I allow exceptions to bubble up through our SOAP web services, and as a result the status code for a SOAP server fault is 500. Before Flash 10 I was able to check the referrer property, and if it ended in .SWF I changed the status code to 200 so that our Flex application could process the SOAP messages appropriately. But since the introduction of Flash 10 the referrer is no longer sent. I would like to use the x-flash-version header, but it seems to be sent only when using IE, not FF.
Which brings me to my question: How can I reliably detect if Flash was the originator of a request to a service?
You cannot reliably do this - after all, it could be a proxy, or someone may have snooped your Flash component's traffic to work out how to reuse your API without whatever restrictions the Flash version would impose.
For a basic sanity check to differentiate the output, you could just as simply add a flag to say "Flash API version please"; but with all HTTP communications, it is relatively trivial to fake whatever is required.
How about http://domain.com/path/to/target?flash=true? If all you are doing is changing the API or returning different errors, you don't need a secure detection method.
Edit: Note, this is definitely not "reliable", but do you truly need a reliable detection method or one that merely works? This works; it's just not secure, and if you need it to be secure you are doing something wrong, because it's impossible to know which client is actually in use.
You can check the user agent (but it could be faked); Flash uses something like "Adobe Flash".
The most secure way (of the easy options presented) is to regex-match the referrer URL, which will have .swf in it.
That would be a heck of a lot harder to spoof than a query string/form param of &flash=true. It's certainly hackable using tools that can send false HTTP headers (referrer), but out of the options presented it takes the most effort.
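The asker's stack is ASP.NET, but purely to illustrate the referrer-matching idea (sketched in Java, the language already used on this page), something like the following could tag requests whose Referer ends in .swf; the attribute name and pattern are assumptions, and as the answers stress, the header can be forged:

import java.io.IOException;
import java.util.regex.Pattern;
import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;

// Illustration only: flag requests whose Referer ends in ".swf" as "probably Flash".
// The header is client-controlled, so this is a convenience check, not a security control.
public class FlashRefererFilter implements Filter {
    private static final Pattern SWF_REFERER = Pattern.compile(".*\\.swf(\\?.*)?$", Pattern.CASE_INSENSITIVE);

    @Override
    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest request = (HttpServletRequest) req;
        String referer = request.getHeader("Referer");
        boolean probablyFlash = referer != null && SWF_REFERER.matcher(referer).matches();
        // Downstream code could use this attribute to, e.g., rewrite a SOAP fault's 500 to 200 for Flash clients.
        request.setAttribute("probablyFlash", probablyFlash);
        chain.doFilter(req, res);
    }

    @Override
    public void init(FilterConfig filterConfig) { }

    @Override
    public void destroy() { }
}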

How to post a file to an image hosting service in .NET?

Scenario:
localhost receives the current HttpRequest with 3 hidden inputs and a posted file. I must then forward this form data to an external image host and get the response.
See the System.Net.WebClient and related classes. You can use them to create a request to the remote server and handle the response. Also get Fiddler to help you replicate what the browser sends.
I hate doing this. It wastes my server's bandwidth and ties up IIS threads, as well as using my server's CPU. It sucks and it's worth avoiding at all costs. Many services (one that comes to mind is fliqz) provide a mechanism such that the files are uploaded directly from the client to their server (bypassing yours), and they then make a request to your server passing it various info on the query string.
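If you do have to proxy the upload yourself, the first answer's suggestion amounts to rebuilding the multipart request server-side and posting it onward. That answer names System.Net.WebClient; the sketch below shows the same idea in Java (the language used elsewhere on this page), with a hypothetical image-host URL and field names:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ForwardUpload {
    public static void main(String[] args) throws Exception {
        String boundary = "----upload" + System.nanoTime();
        byte[] fileBytes = Files.readAllBytes(Path.of("photo.jpg"));   // the file received from the client

        // Hypothetical external image host endpoint.
        HttpURLConnection conn = (HttpURLConnection) new URL("https://imagehost.example/upload").openConnection();
        conn.setDoOutput(true);
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);

        try (OutputStream out = conn.getOutputStream()) {
            // One text field (stand-in for the hidden inputs) followed by the file part, then the closing boundary.
            out.write(("--" + boundary + "\r\nContent-Disposition: form-data; name=\"apiKey\"\r\n\r\nSECRET\r\n")
                    .getBytes(StandardCharsets.UTF_8));
            out.write(("--" + boundary + "\r\nContent-Disposition: form-data; name=\"image\"; filename=\"photo.jpg\"\r\n"
                    + "Content-Type: image/jpeg\r\n\r\n").getBytes(StandardCharsets.UTF_8));
            out.write(fileBytes);
            out.write(("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Image host responded with HTTP " + conn.getResponseCode());
    }
}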
