Different WebDav resource paths for same resource when using different clients - webdav

I am in the process of testing a WebDAV-enabled view in my system using a number of different clients. One particular client sometimes seems to do strange things, and I was wondering if this is to be expected.
The log below shows how it has somehow mangled the path to the directory (a collection in WebDAV speak) by including the full path to the servlet again (look at the last line). None of the other clients do this. Is this simply because it's a poorly coded client that is probably broken?
[org.eclipse.jetty.util.log] : REQUEST /milton/http:/127.0.0.1/milton/!renamed/ on org.eclipse.jetty.server.nio.SelectChannelConnector$2#59fb21
[org.eclipse.jetty.util.log] : servlet=com.bradmcevoy.http.MiltonServlet-11108810
[org.eclipse.jetty.util.log] : servlet holder=
[org.eclipse.jetty.util.log] : chain=
[com.bradmcevoy.http.HttpManager] : PROPFIND :: http://127.0.0.1:9000/milton/http:/127.0.0.1/milton/!renamed/ - http://127.0.0.1:9000/milton/http:/127.0.0.1/milton/!renamed/
[org.eclipse.jetty.util.log] : RESPONSE /milton/http:/127.0.0.1/milton/!renamed/ 404
I have looked at the response log that the client makes available, and the names are not mangled there; they make sense:
/milton/!renamed
and not
/milton/http:/127.0.0.1/milton/!renamed/

It is most likely the client. What it looks like is that the server is replying with a fully qualified URL, with scheme, host and port (http://127.0.0.1:9000), but the client is treating it as a relative URI (an href like "/milton/abc") and prepending that information back onto it.
Returning hrefs (like /milton/abc) is the more common approach, but both forms are legal.
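The difference can be sketched with RFC 3986 URL resolution (the values below are taken from the logs; the "broken client" line is an illustration of the failure mode, not the client's actual code):

```python
from urllib.parse import urljoin

base = "http://127.0.0.1:9000/milton/"           # the request URL
href = "http://127.0.0.1:9000/milton/!renamed/"  # fully qualified href from the server

# A conforming client resolves the href against the request URL (RFC 3986);
# an absolute href simply replaces the base:
resolved = urljoin(base, href)

# A broken client instead treats the whole href as a path segment and
# prepends the collection path, producing a doubled-up URL like the log shows:
mangled = base + href
```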

Related

When serving a single-page application that uses the History API, should HTTP content negotiation be used?

When serving a single-page application (SPA) that uses the History API, it's common practice to serve the application's HTML instead of a 404 response for requests to any unknown resource. If the application experiences some kind of full-page reload, this allows it to continue presenting the same content to the user.
This can have some negative consequences. For example, an <img> element with a typo in its src attribute will probably get served HTML content instead of a 404. It's possible to avoid this particular case by utilizing HTTP content negotiation, and only serving the HTML when the request headers indicate that the client can accept it.
However, I'm concerned that this is actually a misuse of content negotiation. As I understand it, the main purpose of content negotiation is to provide different representations of the same resource, not to determine whether a resource exists at all.
If you did implement content negotiation for SPA serving, it would probably be more appropriate to use a 406 Not Acceptable response rather than a 404 Not Found. But MDN, at least, seems to indicate that a 406 response is generally a bad idea:
In practice, this error is very rarely used. Instead of responding using this error code, which would be cryptic for the end user and difficult to fix, servers ignore the relevant header and serve an actual page to the user. It is assumed that even if the user won't be completely happy, they will prefer this to an error code.
If a server returns such an error status, the body of the message should contain the list of the available representations of the resources, allowing the user to choose among them.
The most popular implementation of this pattern for JavaScript servers seems to be connect-history-api-fallback. At the time of writing, it uses content negotiation to determine whether to serve the SPA HTML. It doesn't seem to use 406 responses though, instead opting for 404s.
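The kind of check connect-history-api-fallback performs can be sketched roughly like this (a minimal sketch; the function and parameter names are illustrative, not the library's actual API):

```python
def should_serve_spa_shell(method, path, accept, static_files):
    """Decide whether an unknown path should fall back to the SPA's index.html."""
    if method != "GET":
        return False
    if path in static_files:
        return False  # a real resource exists, so serve it normally
    if "." in path.rsplit("/", 1)[-1]:
        return False  # looks like a file request (e.g. a typo'd <img> src)
    # Only fall back for clients that indicate they accept HTML:
    return "text/html" in accept
```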
So with all of the above in mind, my questions are:
What is the correct way to serve the HTML for a single-page app?
Should HTTP content negotiation be involved at all?
Additionally, if content negotiation is a desirable solution here, then in the case that an unknown resource is requested, but the client has indicated that HTML is not acceptable:
Should a 406 response be favoured over a 404 response?
What should the body of the response contain?
What additional header content is required to ensure that the system is well-behaved? (For example, I expect that I would probably at least need to set a Vary header to ensure that HTTP caching works correctly)
I feel the only right way to solve this is to not do a catch-all for every possible route, and instead correctly use 404's and serve HTML when there's actually a page to be served.

head request returns different content-type [duplicate]

I would like to send requests.get to this website:
requests.get('https://rent.591.com.tw')
and I always get
<Response [404]>
I know this is a common problem and have tried different approaches, but still failed.
All other websites are fine, though.
Any suggestions?
Webservers are black boxes. They are permitted to return any valid HTTP response, based on your request, the time of day, the phase of the moon, or any other criteria they pick. If another HTTP client gets a different response, consistently, try to figure out what the differences are in the request that Python sends and the request the other client sends.
That means you need to:
Record all aspects of the working request
Record all aspects of the failing request
Try out what changes you can make to make the failing request more like the working request, and minimise those changes.
I usually point my requests at an http://httpbin.org endpoint, have it record the request, and then experiment.
For requests, there are several headers that are set automatically, and many of these you would not normally expect to have to change:
Host: this must be set to the hostname you are contacting, so that the server can properly serve multiple sites from the same address. requests sets this one for you.
Content-Length and Content-Type, for POST requests, are usually set from the arguments you pass to requests. If these don't match, alter the arguments you pass in to requests (but watch out with multipart/* requests, which use a generated boundary recorded in the Content-Type header; leave generating that to requests).
Connection: leave this to the client to manage
Cookies: these are often set on an initial GET request, or after first logging into the site. Make sure you capture cookies with a requests.Session() object and that you are logged in (supplying credentials the same way the browser did).
Everything else is fair game but if requests has set a default value, then more often than not those defaults are not the issue. That said, I usually start with the User-Agent header and work my way up from there.
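Before changing anything, it can help to see exactly which headers requests will send by default; a small sketch that inspects a prepared request without touching the network:

```python
import requests

# Build the request but do not send it, so we can inspect its headers:
session = requests.Session()
prepared = session.prepare_request(requests.Request("GET", "https://rent.591.com.tw"))

# The default User-Agent identifies Python, which some sites block:
default_ua = prepared.headers["User-Agent"]  # e.g. "python-requests/2.31.0"
```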
In this case, the site is filtering on the user agent; it looks like they are blacklisting Python. Setting it to almost any other value already works:
>>> requests.get('https://rent.591.com.tw', headers={'User-Agent': 'Custom'})
<Response [200]>
Next, you need to take into account that requests is not a browser. requests is only an HTTP client; a browser does much, much more. A browser parses HTML for additional resources such as images, fonts, styling and scripts, loads those additional resources too, and executes scripts. Scripts can then alter what the browser displays and load additional resources. If your requests results don't match what you see in the browser, but the initial request the browser makes does match, then you'll need to figure out what other resources the browser has loaded and make additional requests with requests as needed. If all else fails, use a project like requests-html, which lets you run a URL through an actual, headless Chromium browser.
The site you are trying to contact makes an additional AJAX request to https://rent.591.com.tw/home/search/rsList?is_new_list=1&type=1&kind=0&searchtype=1&region=1, take that into account if you are trying to scrape data from this site.
Next, well-built sites will use security best-practices such as CSRF tokens, which require you to make requests in the right order (e.g. a GET request to retrieve a form before a POST to the handler) and handle cookies or otherwise extract the extra information a server expects to be passed from one request to another.
Last but not least, if a site is blocking scripts from making requests, they are probably either trying to enforce terms of service that prohibit scraping, or they have an API they'd rather have you use. Check for either, and take into consideration that you might be blocked more effectively if you continue to scrape the site anyway.
One thing to note: I was using requests.get() to do some webscraping off of links I was reading from a file. What I didn't realise was that the links had a newline character (\n) when I read each line from the file.
If you're getting multiple links from a file instead of a Python data type like a string, make sure to strip any \r or \n characters before you call requests.get("your link"). In my case, I used
with open("filepath") as file:  # open for reading ('w' would truncate the file)
    links = file.read().splitlines()  # splitlines() strips the \n from each line
for link in links:
    response = requests.get(link)
In my case this was due to the fact that the website address was recently changed, and I was provided the old website address. At least this changed the status code from 404 to 500, which, I think, is progress :)

Difference Between // and http://

I know that HTTP is hyper text transfer protocol, and I know that's how (along with HTTPS) one accesses a website. However, what does just a // do? For instance, to access Google's copy of jQuery, one would use the url //ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js, as opposed to http://....
What exactly is the difference? What does just // indicate?
Thanks.
Using // means the resource is requested over whatever protocol (i.e. http vs https) your user is currently using for the page.
So you don't have to worry about managing http: vs https: yourself, and you avoid potential browser mixed-content security warnings. It is good practice to stick with this approach.
For example: if your user is accessing http://yourdomain/, that script file would automatically be requested as http://ajax.googleapis.com/...
if your current request is http
//ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js
will be treated as
http://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js
if your current request is https
//ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js
will be treated as
https://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js
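This scheme-inheritance behaviour is exactly what RFC 3986 "network-path references" specify, and can be demonstrated with Python's urljoin (the yourdomain page URLs are placeholders):

```python
from urllib.parse import urljoin

src = "//ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js"

# The scheme is inherited from the page the reference appears on:
http_url = urljoin("http://yourdomain/page.html", src)
https_url = urljoin("https://yourdomain/page.html", src)
```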

How to know when to resolve referer

I was working on my server and encountered the need to implement the use of request.headers.referer. When I ran tests and read headers to determine how to write the parsing functions, I couldn't find a way to differentiate between requests invoked by a link from outside the server, from outside the directory, and requests for local resources from a given HTML response. For instance:
Going from localhost/dir1 to localhost/dir2 using <a href="http://localhost/dir2"> will yield the response headers:
referer:"http://localhost/dir1" url:"/dir2"
while the HTML file sent from localhost/dir2 asking for resources using the local URI style.css will yield:
referer:"http://localhost/dir2" url:"/style.css"
and the same situation involving an image could end up
referer:"http://localhost/dir2" url:"/_images/image.png"
How would I prevent incorrect resolution between url and referer from accidentally being parsed as http://localhost/dir1/dir2 or http://localhost/_images/image.png and so on? Is there a way to tell in what way the URI is being referred to by the browser, and how can either the browser or server identify when http://localhost/dir2/../dir1 is the intended destination?
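Note that the browser resolves relative references against the page's own URL (per RFC 3986) before sending the request, so the server only ever sees the already-resolved absolute path and never needs to combine url with referer itself. A quick sketch of that resolution:

```python
from urllib.parse import urljoin

# Without a trailing slash, "dir2" is treated as a document and replaced:
a = urljoin("http://localhost/dir2", "style.css")
# With a trailing slash, "dir2/" is treated as a directory and kept:
b = urljoin("http://localhost/dir2/", "style.css")
```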

How should I handle unsupported verbs on a resource?

I am developing a RESTful framework and am deciding how to handle an unsupported verb being called against a resource. For example, someone trying to PUT to a read-only resource.
My initial thought was a 404 error, but the error is not that the resource cannot be found, it exists, just the user is trying to use the resource incorrectly. Is there a more appropriate error code? What is the most common way in which this situation is handled?
Is it that you simply don't support a certain verb, i.e. DELETE? In that case I'd use the following HTTP response code when someone uses a verb you don't support:
405 Method Not Allowed
A request was made of a resource using a request method not supported by that resource; for example, using GET on a form which requires data to be presented via POST, or using PUT on a read-only resource. [source]
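A minimal sketch of that dispatch logic (the routing table and resource names are hypothetical; per the HTTP spec, a 405 response must carry an Allow header listing the methods the resource does support):

```python
# Hypothetical routing table: each resource maps to the verbs it supports.
ALLOWED = {"/reports": {"GET", "HEAD"}}  # a read-only resource

def handle(method, path):
    allowed = ALLOWED.get(path)
    if allowed is None:
        return 404, {}  # the resource itself does not exist
    if method not in allowed:
        # The resource exists, but not for this verb: 405 plus an
        # Allow header advertising what the client may use instead.
        return 405, {"Allow": ", ".join(sorted(allowed))}
    return 200, {}
```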
I don't think you would receive a request to your app at all if the incorrect verb were used (but that probably depends on which specific technologies you're using on the server side).
To be more helpful to potentially confused clients, I suppose you could create a stub endpoint/action for each commonly incorrect verb/method combination and then send back a friendly "use {verbname} instead for this request" text response, but I'd personally just invest a bit of time in better developer documentation : )
You could also seamlessly redirect to the correct action in those cases...
