CDN: Forward to a different resource instead of redirect

I need to serve different resources (especially images) for the same URLs, depending on complex logic based on several factors (cookie, IP, time, random). I want to take advantage of a CDN (caching, availability, proximity). So I want the CDN to make a call to my server in order to decide which resource to serve for each request. It is very important not to use redirects, so the user will never see a 30x status code.
For clarification:
The user requests http://resources.mydomain.com/img/a.jpg, whose domain is handled by the CDN.
The CDN makes a call to my server, sending the requested URL, the cookies, and the user's IP.
My server returns the name of the real resource to serve (http://hidden.mydomain.com/img/a-version3.jpg); see the sketch after this list.
The CDN requests that image if it is not already in cache.
The CDN answers the user's request with the a-version3.jpg data, but without any redirect.
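For illustration, here is a minimal sketch (in Python/Flask, not tied to any particular CDN) of the decision endpoint from step 3; the route, parameter names and selection logic are all hypothetical:

from flask import Flask, request, jsonify
import random

app = Flask(__name__)

@app.route('/decide')
def decide():
    # The CDN is assumed to forward the requested path, the user's cookies
    # and the client IP; the parameter names are made up for this sketch.
    requested_path = request.args.get('path', '/img/a.jpg')
    client_ip = request.args.get('ip', request.remote_addr)
    ab_group = request.cookies.get('ab_group')

    # "Complex logic" placeholder: use the cookie if present, otherwise pick at random.
    version = ab_group or random.choice(['version1', 'version2', 'version3'])
    real_resource = requested_path.replace('.jpg', '-' + version + '.jpg')

    # Return the real resource the CDN should fetch and serve; the end user
    # never receives a redirect.
    return jsonify({'resource': 'http://hidden.mydomain.com' + real_resource})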
Is it possible using any current commercial solution?

Yes, I believe CDNetworks has supported this for a long time.
It is now called "Origin Logic Control". You can check the description at http://www.cdnetworks.com/wp-content/uploads/2013/08/CDNetworks-ContentAccel-DS-EN2.pdf:
Allows a customer’s domain to require checking with the origin on every request.
You can return a special HTTP header (or a special HTTP body, I am not sure anymore) to tell CDNetworks to return the resource directly (using the cached version if available) instead of a 30x status code.

You can enable Redirect Chasing to get what you are looking for. Alternatively, look at the Akamai blog post on Edge Redirect for a faster option.

Related

Always serve stale/cached data from edge servers

Is it possible to always serve stale/cached data from CDN edge servers like Akamai?
The reason is that there may be a problem on the origin server that takes 2-3 days to solve. My origin server responds properly, but I don't want it to get overloaded, and I would like the CDN to keep serving the cached data instead for some time.
Yes, Akamai can serve stale content if the request to the origin times out or returns an error code. In the property configuration this is controlled by the "Caching" and "Cache HTTP Error Responses" behaviors.
Note, however, that your content will need to be fairly popular to remain in cache. If it's not popular, then it may be evicted before you're able to repair your origin.
A better alternative is to implement a Site Failover ruleset, which allows you to serve your page with alternate content from a separate origin, or static assets from Akamai's NetStorage. The typical setup is a rule that matches a failed origin and applies the standard Fail Over behavior.
The "Action" field provides the following options, which can each be configured to your needs:
Serve stale content
Redirect to a different location
Use alternate hostname in this property
Use alternate hostname on provider network
Serve alternate content from NetStorage

head request returns different content-type [duplicate]

I would like to send a requests.get() request to this website:
requests.get('https://rent.591.com.tw')
and I always get
<Response [404]>
I know this is a common problem and have tried different approaches, but I still get the same result.
All other websites work fine.
Any suggestions?
Webservers are black boxes. They are permitted to return any valid HTTP response, based on your request, the time of day, the phase of the moon, or any other criteria they pick. If another HTTP client gets a different response, consistently, try to figure out what the differences are in the request that Python sends and the request the other client sends.
That means you need to:
Record all aspects of the working request
Record all aspects of the failing request
Try out what changes you can make to make the failing request more like the working request, and minimise those changes.
I usually point my requests to a http://httpbin.org endpoint, have it record the request, and then experiment.
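For example, echoing your own request back through httpbin makes the comparison concrete:

import requests

# httpbin echoes back whatever it received, so you can diff this output
# against what the working client (e.g. your browser's dev tools) sends.
resp = requests.get('https://httpbin.org/headers')
print(resp.json())            # the headers requests sent by default

# For POSTs, /anything also echoes the body and content type:
resp = requests.post('https://httpbin.org/anything', data={'q': 'test'})
data = resp.json()
print(data['headers'])        # headers of the POST request
print(data['form'])           # the form body as the server saw it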
For requests, there are several headers that are set automatically, and many of these you would not normally expect to have to change:
Host: this must be set to the hostname you are contacting, so that the server can serve the right site when it hosts several. requests sets this one for you.
Content-Length and Content-Type, for POST requests, are usually set from the arguments you pass to requests. If these don't match, alter the arguments you pass in to requests (but watch out with multipart/* requests, which use a generated boundary recorded in the Content-Type header; leave generating that to requests).
Connection: leave this to the client to manage
Cookies: these are often set on an initial GET request, or after first logging into the site. Make sure you capture cookies with a requests.Session() object and that you are logged in (supply credentials the same way the browser does); see the sketch after this list.
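A minimal sketch of that last point, using a Session so cookies persist between requests (the URLs and form field names are placeholders):

import requests

session = requests.Session()
session.headers['User-Agent'] = 'Custom'

# An initial GET picks up any cookies the site sets up front.
session.get('https://example.com/')

# Log in the same way the browser does; the session cookie is then
# reused automatically on every later request.
session.post('https://example.com/login',
             data={'username': 'user', 'password': 'secret'})

resp = session.get('https://example.com/protected-page')
print(resp.status_code)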
Everything else is fair game but if requests has set a default value, then more often than not those defaults are not the issue. That said, I usually start with the User-Agent header and work my way up from there.
In this case, the site is filtering on the user agent; it looks like they are blacklisting Python, and setting it to almost any other value already works:
>>> requests.get('https://rent.591.com.tw', headers={'User-Agent': 'Custom'})
<Response [200]>
Next, you need to take into account that requests is not a browser. requests is only an HTTP client; a browser does much, much more. A browser parses HTML for additional resources such as images, fonts, styling and scripts, loads those additional resources too, and executes scripts. Scripts can then alter what the browser displays and load additional resources. If your requests results don't match what you see in the browser, but the initial request the browser makes matches, then you'll need to figure out what other resources the browser has loaded and make additional requests with requests as needed. If all else fails, use a project like requests-html, which lets you run a URL through an actual, headless Chromium browser.
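As a rough example of that last option (requests-html downloads a Chromium build the first time you call render()):

from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://rent.591.com.tw')
r.html.render()                               # executes the page's JavaScript
print(r.html.find('title', first=True).text)  # now reflects the rendered page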
The site you are trying to contact makes an additional AJAX request to https://rent.591.com.tw/home/search/rsList?is_new_list=1&type=1&kind=0&searchtype=1&region=1, take that into account if you are trying to scrape data from this site.
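If that JSON endpoint is what you are actually after, you can request it directly; note that the headers or cookies the site requires may well have changed since this answer was written, so treat this as a starting point only:

import requests

url = ('https://rent.591.com.tw/home/search/rsList'
       '?is_new_list=1&type=1&kind=0&searchtype=1&region=1')
resp = requests.get(url, headers={'User-Agent': 'Custom'})
print(resp.status_code)
if resp.ok:
    print(resp.json())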
Next, well-built sites will use security best-practices such as CSRF tokens, which require you to make requests in the right order (e.g. a GET request to retrieve a form before a POST to the handler) and handle cookies or otherwise extract the extra information a server expects to be passed from one request to another.
Last but not least, if a site is blocking scripts from making requests, they are probably either trying to enforce terms of service that prohibit scraping, or they have an API they'd rather have you use. Check for either, and take into account that you might be blocked more effectively if you continue to scrape the site anyway.
One thing to note: I was using requests.get() to do some webscraping off of links I was reading from a file. What I didn't realise was that the links had a newline character (\n) when I read each line from the file.
If you're getting multiple links from a file instead of a Python data type like a string, make sure to strip any \r or \n characters before you call requests.get("your link"). In my case, I used
with open("filepath", 'w') as file:
links = file.read().splitlines()
for link in links:
response = requests.get(link)
In my case this was due to the fact that the website address had recently changed, and I had been given the old address. At least that changed the status code from 404 to 500, which, I think, is progress :)

How to work around POST being changed to GET on 302 redirect?

Some parts of my website are only accessible via HTTPS (not the whole website: a security vs. performance compromise), and HTTPS is enforced with a 302 redirect on requests to the secure part when they are sent over plain HTTP.
The problem is that on a 302 redirect of a POST, all major browsers automatically switch it to a GET (as far as I know this should only happen on 303, but nobody seems to care). An additional issue is that all POST data is lost.
So what are my options here, other than accepting POSTs to the secure site over HTTP and redirecting afterwards, or changing loads of code to make sure all POSTs to the secure part of the website go over HTTPS from the beginning?
You are right, this is the only reliable way: the POST request should go over an HTTPS connection from the very beginning. Moreover, it is recommended that the form that leads to such a POST is also loaded over HTTPS; usually the first form behind which you have the HTTPS connection is a login form. Browsers apply different security restrictions to pages loaded over HTTP and over HTTPS, so this lowers the risk of a malicious script executing in a context that holds sensitive data.
I think that's what 307 is for. RFC2616 does say:
If the 307 status code is received in response to a request other
than GET or HEAD, the user agent MUST NOT automatically redirect the
request unless it can be confirmed by the user, since this might
change the conditions under which the request was issued.
but it says the same thing about 302 and we know what happens there.
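For what it's worth, issuing the 307 is straightforward on the server side; a minimal sketch assuming a Flask app (the URLs are illustrative only):

from flask import Flask, redirect

app = Flask(__name__)

@app.route('/upload', methods=['POST'])
def upload():
    # 307 asks the client to repeat the same method and body at the new
    # location, unlike 302, which browsers in practice turn into a GET.
    return redirect('https://secure.example.com/upload', code=307)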
Unfortunately, you have a bigger problem than browsers not dealing with response codes the way the RFC's say, and that has to do with how HTTP works. Simplified, the process looks like this:
The browser sends the request
The browser indicates it has sent the entire request
The server sends the response
Presumably your users are sending some sensitive information in their post and this is why you want them to use encryption. However, if you send a redirect response (step 3) to the user's unencrypted POST (step 1), the user has already sent all of the sensitive information out unencrypted.
It could be that you don't consider the information the user sends that sensitive, and only consider the response that you send to be sensitive. However, this turns out not to make sense. Sensitive information should be available only to certain individuals, and the information used to authenticate the user is necessarily part of the request; since that request went out unencrypted, anyone who captured it can replay it, which makes your response available to anyone as well. So, if the response is sensitive, the request is sensitive too.
It seems that you are going to want to change lots of code to make sure all secure posts use HTTPS (you probably should have written them that way in the first place). You might also want to reconsider your decision to only host some of your website on HTTPS. Are you sure your infrastructure can't handle using all HTTPS connections? I suspect that it can. If not, it's probably time for an upgrade.

Is there a way to redirect Post requests preserving post data or an alternative?

I am setting up a CDN relying only on header redirects or temporary URLs served by an API controlled by a database cluster.
The goal is to reduce hardware costs and have flexible nodes with only FTP/HTTP/PHP as requirements, creating a cheap solution for websites that can work with this.
However, my problem is that I want a static address that file uploads (containing a ClientID and token) can be sent to. I am using a simple POST.
But the file should be sent directly to the most idle server.
So what I want is a POST request to http://whatever.com/upload.php that is redirected to http://server-in-cdn.whatever.com/upload.php without losing the data.
The problem is that the POST request gets converted into a GET request and the POST data is lost.
The W3C documentation states that the 307 status code could be used, but it's not reliable and user confirmation is required.
Or is there an alternative? I am not really into network stuff... but I think the classic solution would be some sort of load balancer or a router running BGP/Quagga or something like that, and the traffic would still go through that node. Is that correct?
Or is there a way to redirect the traffic entirely at the network/DNS level?
Thanks in advance.

Is it safe to redirect to the same URL?

I have URLs of the form http://domain/image/⟨uuid⟩/42x42/some_name.png. The Web server (nginx) is configured to look for a file /some/path/image/⟨uuid⟩/thumbnail_42x42.png, and if it does not exist, it sends the URL to the backend (Django via mod_wsgi) which then generates the thumbnail. Then the backend emits a 302 redirect to exactly the same URL that was requested by the client, with the idea that upon this second request the server will notice the thumbnail file and send it directly.
The question is, will this work with all the browsers? So far testing has shown no problems, but can I be sure all the user agents will interpret this as intended?
Update: Let me clarify the intent. Currently this works as follows:
The client requests a thumbnail of an image.
The server sees the file does not exist, so it forwards the request to the backend.
The backend creates the thumbnail and returns 302.
The backend releases all its resources, letting the server serve the newly generated file to the current and subsequent clients.
Having the backend serve the newly created image is worse for two reasons:
Two ways of serving the same data must be created;
The server is much better at serving static content. What if the client has an extremely slow link? The backend is not particularly fast nor memory-efficient, and keeping it in memory while spoon-feeding the client can be wasteful.
So I keep the backend working for the minimum amount of time.
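For concreteness, here is a minimal sketch of that backend step, assuming Django and Pillow; the directory layout follows the question, but the source path, view name and URL routing are hypothetical:

import os
from django.http import HttpResponseRedirect
from PIL import Image

THUMBNAIL_ROOT = '/some/path/image'
SOURCE_ROOT = '/some/path/originals'    # assumed location of the full-size images

def thumbnail(request, uuid, width, height, name):
    target = os.path.join(THUMBNAIL_ROOT, uuid,
                          'thumbnail_{0}x{1}.png'.format(width, height))
    if not os.path.exists(target):
        os.makedirs(os.path.dirname(target), exist_ok=True)
        img = Image.open(os.path.join(SOURCE_ROOT, uuid, name))
        img.thumbnail((int(width), int(height)))
        img.save(target, 'PNG')
    # Redirect back to the URL that was just requested; on the second
    # request nginx finds the file on disk and serves it itself.
    return HttpResponseRedirect(request.path)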
Update²: I’d really appreciate some RFC references or opinions of someone with experience with lots of browsers. All those affirmative answers are pleasant but they look somewhat groundless.
If a client doesn't handle this, that client is broken. Most clients will follow redirects up to some maximum count, so yes, it should be fine, unless your backend fails to generate the thumbnail for some reason, in which case the client will loop until it hits that limit.
You could instead change the URLs to something like http://domain/djangoapp/generate_thumbnail and have that return the thumbnail itself, with the proper content type and so on.
Yes, it's fine to re-direct to the same URI as you were at previously.
