What is the proper user-agent to use for a server?

For Filepicker.io we built "grab from url", but certain sites aren't happy when no User-Agent header is sent. I could just use a stock browser user agent as suggested in some other answers, but as a good web citizen I wanted to know if there is a more appropriate user-agent to set for a server requesting another server's data?

Depends on the language you wrote your server in. For example, Python's urllib sets a default value to User-agent: Python-urllib/2.1, but you can just as easily set it to something like User-agent: filepicker.io/<your-version-here> or something more language specific if you'd like.
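
As a minimal sketch of that second option, Python 3's urllib.request lets you attach a descriptive User-Agent to each request. The version number and contact URL below are placeholders for illustration, not values Filepicker.io actually uses.

```python
import urllib.request

# Hypothetical product token and contact URL; substitute your own.
USER_AGENT = "filepicker.io/1.0 (+https://www.filepicker.io/)"


def build_request(url):
    """Return a Request that identifies this server by name and version."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url, timeout=10):
    """Download the body of `url`, sending the custom User-Agent."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

Including a URL in the header value gives site operators somewhere to look if they want to know who is fetching their pages.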

Related

Which HTTP headers are set by the web browser and sent automatically

I am assuming that all web browsers send User-Agent, DNT, Accept, Accept-Language, Accept-Encoding, etc. automatically, and that the web developer does not have to do anything to set these headers. I am saying this because www.whatismybrowser.com used to show these header values.
If so then which headers are set by the web browser and sent automatically?

OP here. I got the answer from reddit.
One thing you could easily do is create a page like test.php and set it to just:
<?php
print_r($_SERVER);
Then visit that page in the different browser and OS combinations that you care about and note the headers you're looking for.
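
The same idea can be sketched in Python with the standard library's wsgiref, assuming you would rather not set up PHP. The function names and the port are arbitrary choices for this example; a WSGI environ exposes each request header under an HTTP_ prefix, much like $_SERVER does.

```python
from wsgiref.simple_server import make_server


def request_headers(environ):
    """Extract the HTTP request headers from a WSGI environ dict."""
    headers = {}
    for key, value in environ.items():
        if key.startswith("HTTP_"):
            # HTTP_ACCEPT_LANGUAGE -> Accept-Language
            name = key[5:].replace("_", "-").title()
            headers[name] = value
    return headers


def app(environ, start_response):
    """Reply with the request headers as plain text, one per line."""
    found = request_headers(environ)
    lines = [f"{name}: {value}" for name, value in sorted(found.items())]
    body = "\n".join(lines).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def serve(port=8000):
    """Run the echo app until interrupted (the port is an arbitrary choice)."""
    with make_server("", port, app) as server:
        server.serve_forever()
```

Calling serve() and opening http://localhost:8000/ in each browser shows what that browser sent without any help from the page author.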

Cross site scripting vulnerabilities for onmouseover in url

A website was audited for vulnerabilities, and the scan flagged XSS on many pages which, from my point of view, do not appear to be vulnerable, as I don't display any data captured from a form on the page or from the URL (such as the query string).
Acunetix flagged the following URL as XSS by adding some JavaScript code
http://www.example.com/page-one//?'onmouseover='pU0e(9527)
Report:
GET /page-one//?'onmouseover='pU0e(9527)'bad=' HTTP/1.1
Referer: https://www.example.com/
Connection: keep-alive
Authorization: Basic FXvxdAfafmFub25cfGb=
Accept: */*
Accept-Encoding: gzip,deflate
Host: example.com
So, how could this be vulnerable, or is it even possible that it's vulnerable?
Above all, if onmouseover can be injected this way, how would the page be affected?

Since you asked for more information, I'll post my response as an answer.
The main question as I see it:
Can there still be an XSS vulnerability from the query string if I don't use any of the parameters in my code?
Well, if they actually aren't used at all, then it should not be possible. However, there are subtle ways that you could be using them that you may have overlooked. (Posting the actual source code would be useful here).
One example would be something like this:
Response.Write("<a href='" +
    HttpContext.Current.Request.Url.AbsoluteUri + "'>share this link!</a>");
This would put the entire URL in the body of the web page. The attacker can make use of the query string even though its parameters aren't mapped to variables, because the full URL is written in the response. Keep in mind it could also be in a hidden field.
Be careful writing out values like HttpContext.Current.Request.Url.AbsoluteUri or HttpContext.Current.Request.Url.PathAndQuery.
Some tips:
Confirm that the scanner is not reporting a false positive by opening the link in a modern browser like Chrome. Check the console for an error about "XSS Auditor" or similar.
Use an antixss library to encode untrusted output before writing to the response.
Read this: https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet
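
To illustrate the encoding tip, here is a rough Python equivalent of the "share this link" example, using the standard library's html.escape in place of a dedicated anti-XSS library. The function name share_link is made up for this sketch, and the attack string is the one from the scanner report.

```python
import html


def share_link(request_url):
    """Build the anchor tag with the URL escaped for use in an HTML attribute."""
    # quote=True also escapes ' and ", so the value cannot close the attribute.
    safe_url = html.escape(request_url, quote=True)
    return "<a href='" + safe_url + "'>share this link!</a>"


attack = "http://www.example.com/page-one//?'onmouseover='pU0e(9527)'bad='"
print(share_link(attack))
```

Because the single quotes are written out as character references, the injected onmouseover stays inside the href value instead of becoming an attribute of its own.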

Should I use the Content-Location header this way?

Preface:
After reading a lot about HTTP and REST, you have spent a few hours devising a cunning content-negotiation scheme. So that your web API can serve XML, JSON and HTML from a single URL. Because, you know, a resource should only have one URL and different representations should be requested using Accept headers. You start to wonder why it took the web 20 years for that realization.
And that is when reality slaps you in the face.
So to help browsers (and yourself trying to debug) with coercing your service to serve the desired content type you do what every self-respecting REST evangelist would despise you for: Filename extensions.
Eternal torment in hell notwithstanding, is the following use of Content-Location + .ext acceptable?
Say we have users at /users/:loginname for example /users/bob. This would be the API endpoint for anything that is capable of setting a proper Accept header. But for any possible Content-Type (or at least some), we allow an alternate method of access and that is a URL with a filetype suffix. For example /users/bob.html for an HTML representation. Let's assume (and that is a big assumption to make) login names will never contain a period/dot.
Request:
GET /users/bob.json HTTP/1.1
Host: example.com
Response:
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 14
Content-Location: /users/bob

{"foo": "bar"}
This would allow me to encode alternative ways to access (in this case) the user information.
For example a link to a user page could be Bob.
A link to a vCard (to add the user to the Address-Book/Outlook/anything) would be Bob.
Are there any pitfalls I have missed? What would be pros/cons of this?
Edit: This popped up a bit late for me to notice. And even though it touches the subject and is really helpful, I think it's not exactly what I'm looking for...

As far as I can tell, you use Content-Location exactly the wrong way; it should point to the more specific URI.

According to RFC 2616:
The Content-Location entity-header field MAY be used to supply
the resource location for the entity enclosed in the message
when that entity is accessible from a location separate from
the requested resource's URI.
and
The Content-Location value is not a replacement for the original
requested URI; it is only a statement of the location of the resource
corresponding to this particular entity at the time of the request.
So generally, yes, you can use the Content-Location header to identify the origin resource. The main disadvantage of using an extension suffix is that you are creating additional URLs, e.g. /users/bob, /users/bob.vcf, and /users/bob.html are three different resources.
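
One way to picture the "more specific URI" reading is a server that negotiates on the generic URL and reports the suffixed URL in Content-Location. The sketch below assumes a made-up suffix table and a deliberately simplified Accept parser (no wildcards); it is not taken from either answer.

```python
# Hypothetical mapping from media type to URL suffix.
SUFFIX_BY_TYPE = {
    "application/json": ".json",
    "text/html": ".html",
    "text/vcard": ".vcf",
}


def parse_accept(accept):
    """Return the media types in an Accept header, highest q-value first."""
    ranked = []
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranked.append((q, media_type))
    # The sort is stable, so equal q-values keep their header order.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [media_type for q, media_type in ranked if q > 0]


def negotiate(path, accept):
    """Pick a representation for `path` and return its response headers."""
    for media_type in parse_accept(accept):
        suffix = SUFFIX_BY_TYPE.get(media_type)
        if suffix:
            return {
                "Content-Type": media_type,
                # Points at the specific representation, not the generic URL.
                "Content-Location": path + suffix,
            }
    return None  # nothing acceptable; the caller might answer 406
```

With this arrangement a request for /users/bob carrying Accept: application/json would be answered with Content-Location: /users/bob.json.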

Specify supported media types when sending "415 unsupported media type"

If a client sends data in an unsupported media type to an HTTP server, the server answers with status "415 Unsupported Media Type". But how does it tell the client which media types are supported? Is there a standard or at least a recommended way to do so? Or would it just be written to the response body as text?

There is no specification at all for what to do in this case, so expect implementations to be all over the place. (What would be sensible would be if the server's response included something like an Accept: header since that has pretty much the right semantics, if currently in the wrong direction.)
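
A minimal sketch of that idea, assuming the server both lists the supported types in the body and repeats them in an Accept header on the response. The function name is invented for the example, and clients should not be expected to act on the header.

```python
def unsupported_media_type(received, supported):
    """Build a (status, headers, body) triple for a 415 reply."""
    text = (
        "Unsupported media type: " + received + "\n"
        "Supported media types: " + ", ".join(supported) + "\n"
    )
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        # Following the suggestion above; many clients will ignore this.
        ("Accept", ", ".join(supported)),
    ]
    return "415 Unsupported Media Type", headers, text.encode("utf-8")
```

The plain-text body keeps the information available to a human reading the response even if the client ignores the header.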

I believe you can do this with the HTTP OPTIONS method.
Also the status code of 300 Multiple Choices could be used if your scenario fits a certain use case. If they send a request with an Accept header of application/xml and you only support text/plain and that representation lives at a distinct URL then you can respond with a 300 and in the Location header the URL of that representation. I realize this might not exactly fit your question, but it's another possible option.
And from the HTTP Spec:
10.4.7 406 Not Acceptable
The resource identified by the request is only capable of generating response entities which have content characteristics not acceptable according to the accept headers sent in the request.
Unless it was a HEAD request, the response SHOULD include an entity containing a list of available entity characteristics and location(s) from which the user or user agent can choose the one most appropriate. The entity format is specified by the media type given in the Content-Type header field. Depending upon the format and the capabilities of the user agent, selection of the most appropriate choice MAY be performed automatically. However, this specification does not define any standard for such automatic selection.
Note: HTTP/1.1 servers are allowed to return responses which are
not acceptable according to the accept headers sent in the
request. In some cases, this may even be preferable to sending a
406 response. User agents are encouraged to inspect the headers of
an incoming response to determine if it is acceptable.

tl;dr;
Edited the generated proxy class to inherit from Microsoft.Web.Services3.WebServicesClientProtocol.
I came across this question when troubleshooting this error, so I thought I would help the next person who might come through here, although not sure if it answers the question as stated. I ran into this error when at some point I had to take over an existing solution which was utilizing WSE and MTOM encoding. It was a windows client calling a web service.
To the point, the client was calling the web service where it would throw that error.
Something that contributed to resolving that error for me was to check the web service proxy class that apparently is generated by default to inherit from System.Web.Services.Protocols.SoapHttpClientProtocol.
Essentially that meant that it didn't actually use WSE3.
Anyhow I manually edited the proxy and changed it to inherit from Microsoft.Web.Services3.WebServicesClientProtocol.
BTW, to see the generated proxy class in VS click on the web reference and then click the 'Show All Files' toolbar button. The reference.cs is da place of joy!
Hope it helps.

In his book "HTTP Developer's Handbook" on page 81 Chris Shiflett explains what a 415 means, and then he says, "The media type used in the content of the HTTP response should be indicated in the Content-Type entity header."
1) So is Content-Type a possible answer? It would presumably be a comma-separated list of accepted content types. The obvious problem with this possibility is that Content-Type is an entity header not a response header.
2) Or is this a typo in the book? Did he really mean to say "the HTTP request"?

HTTP Get content type

I have a program that is supposed to interact with a web server and retrieve a file containing structured data using http and cgi. I have a couple questions:
The CGI script on the server needs to specify a body, right? What should the content-type be?
Should I be using POST or GET?
Could anyone tell me a good resource for reading about HTTP?

If you just want to retrieve the resource, I'd use GET. And with GET you don't need a Content-Type, since a GET request has no body. As for HTTP, I'd suggest reading the HTTP/1.1 specification.

The content-type specified by the server will depend on what type of data you plan to return. As Jim said, if it's JSON you can use 'application/json'. The obvious payload for the response would be whatever data you're sending to the client.
From the server's perspective it shouldn't matter that much. In general, if you're not expecting a lot of information from the client, I'd set up the server to respond to GET requests as opposed to POST requests. An advantage I like is simply being able to specify what I want in the URL (this can't be done if it's expecting a POST request).
I would point you to the RFC for HTTP. It is probably the best source of information; maybe not the most user-friendly way to get your answers, but it should have everything you need.

For (1) the Content-Type depends on the structured data. If it's XML you can use application/xml, JSON can be application/json, etc. Content-Type is set by the server. Your client would ask for that type of content using the Accept header. (Try to use existing data format standards and content types if you can.)
For (2) GET is best (you aren't sending up any data to the server).
I found RESTful Web Services by Richardson and Ruby a very interesting introduction to HTTP. It takes a very strict, but very helpful, view of HTTP.
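
For point (1), a CGI script writes its header lines, then a blank line, then the body. The sketch below assumes the structured data is JSON; the sample record and function names are invented for illustration.

```python
import json
import sys


def cgi_response(data):
    """Return a complete CGI response: header line, blank line, JSON body."""
    body = json.dumps(data)
    return "Content-Type: application/json\r\n\r\n" + body


def main():
    """Entry point for the CGI script; the record below is hypothetical."""
    sys.stdout.write(cgi_response({"foo": "bar"}))
```

The client can then issue a plain GET for the script's URL, optionally with Accept: application/json, and parse the body it gets back.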
