Cloudflare Worker fetch() global does not respect Cloudflare worker routes - cloudflare-workers

I have a web application, proxied by Cloudflare, running at hybrid-app.example.
I am using the Cloudflare Workers Sites pattern to host a static site at hybrid-app.example/static-content/. This worker is listening on the route *hybrid-app.example/static-content*. I can verify that this worker is configured correctly by navigating to https://hybrid-app.example/static-content/hello.html and seeing the correct content from my static site.
I also have a second Cloudflare Worker that uses the Router pattern to proxy select pages from the static site to vanity URLs on the same domain. In particular, this worker is configured to listen on the route *hybrid-app.example/hello*. Internally, it does the following:
return fetch('https://hybrid-app.example/static-content/hello.html')
But when I navigate to https://hybrid-app.example/hello, I see a GET request to /static-content/hello.html hit my origin server. So it appears that when I call fetch() in a Cloudflare Worker, it resolves the request without checking for matches against any other worker routes. Is there an alternative to the Fetch API I can use to force a request from one Cloudflare Worker to check for other matching CF worker routes before resolving?
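For concreteness, the second worker looks roughly like this (a sketch based on the description above; the handler structure is assumed):

addEventListener('fetch', (event) => {
  event.respondWith(handleRequest(event.request));
});

async function handleRequest(request) {
  // Proxy the vanity URL to the page served by the Workers Sites worker.
  // In practice, this subrequest is what ends up hitting my origin server.
  return fetch('https://hybrid-app.example/static-content/hello.html');
}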

This is not possible; Cloudflare workers always bypass Cloudflare (including the step where requests are routed to workers) for requests in the same zone:
Regarding worker composition (one worker invoking another worker):
such request chaining isn’t possible on same-zone subrequests. Instead, we always send those to the origin. This limitation is in place to resolve an ambiguity: how do we know that a subrequest is “trusted” and should be sent directly to the origin, versus “untrusted” and should go in the front door, running all Cloudflare features (including Workers) from the start? (Note that this ambiguity doesn’t exist for cross-zone subrequests: such subrequests are obviously untrusted from the perspective of the other zone.)
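To illustrate the distinction drawn in that quote, from inside a worker running on hybrid-app.example (the second hostname is a hypothetical separate zone, purely for contrast):

// Same-zone subrequest: sent straight to the origin for hybrid-app.example,
// so it is never matched against the *hybrid-app.example/static-content* route.
await fetch('https://hybrid-app.example/static-content/hello.html');

// Cross-zone subrequest: enters the other zone through the "front door",
// so that zone's Cloudflare features (including its Workers) run as usual.
await fetch('https://some-other-zone.example/hello.html');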

Related

How to configure nginx to only allow requests from my CloudFront client?

I have a server behind nginx, and I have a frontend distributed on AWS CloudFront using AWS Amplify. I'd like requests that don't come from my client to be denied at the reverse-proxy level. If others think I should do this at the app level, please let me know.
What I've tried so far is to allow all the IPs of AWS's CloudFront system (https://ip-ranges.amazonaws.com/ip-ranges.json) and deny everything else. However, requests from the correct client still get blocked.
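For reference, that allow list can be generated from the published JSON with something like this (a sketch; it assumes Node 18+ with the global fetch, only covers the IPv4 prefixes, and filtering on the CLOUDFRONT service is my assumption about which ranges matter):

// generate-allow-list.mjs: print nginx "allow" rules for CloudFront's published ranges
const res = await fetch('https://ip-ranges.amazonaws.com/ip-ranges.json');
const { prefixes } = await res.json();

const rules = prefixes
  .filter((p) => p.service === 'CLOUDFRONT')  // keep only the CloudFront ranges
  .map((p) => `allow ${p.ip_prefix};`);

rules.push('deny all;');                      // everything else is denied
console.log(rules.join('\n'));                // paste into the relevant nginx block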
My other alternative is to resolve the domain's IP on every request and check against that, but I'd rather not do a DNS lookup every time.
I can also include some kind of token with every request, but come on - there's gotta be some easier way to get this done.
Any ideas?

Kubernetes sticky session / load balance by header value

I'm working on a project where I want to use Kubernetes and Docker. The microservice I'm about to implement must create a permanent HTTP/2 connection per user/client to another service (provided by others; I can't modify anything in that service) in order to send asynchronous, cloud-initiated messages to that user. Also, each subsequent request from that client must use the same connection.
Obviously that is a challenge in terms of scalability, because every request from a client must be routed to the same instance of my microservice, the one that created the permanent connection to the other service. What makes things worse is that my clients can change their IPs and can't use cookies. What they can do, however, is send a custom header value that identifies them.
I thought about HAProxy and nginx, but can't find an option in either of them to load-balance requests by a header value. Is there really no way to do that? How would you approach this issue? Any ideas?
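For illustration, the behaviour I'm after boils down to something like this, only done by a proper proxy rather than hand-rolled code (a sketch in Node; the backend addresses and the X-Client-Id header name are made up):

import http from 'node:http';
import crypto from 'node:crypto';

const backends = ['10.0.0.11:8080', '10.0.0.12:8080', '10.0.0.13:8080'];

// Hash the client-supplied header so the same value always maps to the same backend.
function pickBackend(clientId) {
  const hash = crypto.createHash('sha1').update(clientId).digest();
  return backends[hash.readUInt32BE(0) % backends.length];
}

http.createServer((req, res) => {
  const [host, port] = pickBackend(req.headers['x-client-id'] ?? '').split(':');
  const upstream = http.request(
    { host, port, path: req.url, method: req.method, headers: req.headers },
    (upstreamRes) => {
      res.writeHead(upstreamRes.statusCode, upstreamRes.headers);
      upstreamRes.pipe(res);
    }
  );
  req.pipe(upstream);
}).listen(8000);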
Thanks!

What is "Reverse Proxy" and "Load Balancing" in Nginx / Web server terms?

These are two phrases I hear very often, mainly associated with Nginx. Can someone give me a layman's definition?
Definitions are often difficult to understand. I guess you just need some explanation of their use cases.
A short explanation is: load balancing is one of the functions of a reverse proxy, and a reverse proxy is one of the pieces of software that can do load balancing.
A longer explanation is given below.
For example, a service of your company has customers in the UK and Germany. Because the policy is different for these two countries, your company has two web servers, uk.myservice.com for the UK and de.myservice.com for Germany, each with different business logic. In addition, your company wants there to be only one unified endpoint, myservice.com, for the service. In this case, you need to set up a reverse proxy as the unified endpoint. The proxy accepts the URL myservice.com and rewrites the URL of incoming requests so that requests from the UK (determined by source IP) go to uk.myservice.com and requests from Germany go to de.myservice.com. From the point of view of a client in the UK, it never knows the response was actually generated by uk.myservice.com.
In this case, the request traffic to the service is in fact spread across the servers at uk.myservice.com and de.myservice.com as a side effect, but we normally don't say it is being used as a load balancer; we just call it a reverse proxy.
But let's say your company uses the same policy for all countries and has two servers, a.myservice.com and b.myservice.com, only because the workload is too heavy for one server machine. In this case, we normally call the reverse proxy a load balancer, to emphasize why it is being used.
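The routing decision in the first example amounts to something like this (a sketch; how the country is determined, e.g. a GeoIP lookup of the source IP, is left out):

// The client only ever sees myservice.com; the proxy picks the real backend.
function upstreamFor(requestUrl, country) {
  const backend = country === 'UK' ? 'https://uk.myservice.com' : 'https://de.myservice.com';
  return backend + new URL(requestUrl).pathname;
}

// upstreamFor('https://myservice.com/account', 'UK') -> 'https://uk.myservice.com/account'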
Here is the basic definition:
A reverse proxy is a proxy host that receives requests from clients and sends each one to one of the servers behind it. Nginx and Apache httpd are commonly used as reverse proxies. They sit in the administrative network of the web server that serves the request.
This is in contrast with a (forward) proxy, which sits in front of a client and sends requests to a web server on the client's behalf. As an example, your corporate network address translator is a forward proxy. Forward proxies sit in the administrative network of the client where the request originates.
Load balancing is a function performed by reverse proxies. Client requests are received by the load balancer, which sends each request to one of the nodes (hosts) in the server pool, in an attempt to balance the load across the various nodes.
I see both of them as functionality of an HTTP/web server.
A load balancer's job is to distribute the workload between server nodes in a way that makes the best use of them.
A reverse proxy is an interface to the external world, forwarding requests to a server node (even when there is only a single node).
Its other use cases are caching of static content, compression, etc.

How to make a fault-tolerant system which can immediately handle a server going down

Before XYZ.com went down, I noticed that my requests were being routed to IP address 192.33.31.xxx, and when it came back up they were routed to IP 50.17.196.xxx. Is this some sort of server switching? Isn't dynamic server switching on failure a way to make a fault-tolerant system?
Most heavily used websites work with load balancers directing requests to multiple servers. The logic that defines the routing ranges from very simple to very complex.
For example, the logic can be:
once a request is handled by one particular server, requests from the same IP are handled by the same server, or
use a round-robin way of handling requests (in this case the client IP plays no role), or
route the request to the least-used server, etc.
A website can also use a hybrid of all of the above approaches.
Now, if the website you were accessing was using IP-based routing, the sequence would have been:
This is your first request, so you are assigned a server which will handle it (this can be round robin or other logic, e.g. the server with the least load).
The load balancer might then have logic that for the next X (say 100) requests, or for a time period Y (say 1 hour), or both, all requests from the same IP are routed to the same server.
So in your case, a request to XYZ routes to some server a.b.c.d, and if you make another call after 1 hour, or you make the 101st call, a different server might handle your request.
Now if a server goes down, you can have fallback mechanisms like:
If a server does not respond within some pre-configured time period, remove that server from the live-server list in the load balancer configuration and reroute the request.
Or keep idle, pre-running spare servers (costly, but appropriate when every request must be served without fail). As soon as a server goes down, bring up a spare and assign it the IP that was assigned to the old server which went down.
There can be better and much more complex solutions to this problem, though.
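A minimal sketch of the first fallback mechanism (assumes Node 18+; the backend addresses, the /healthz path and the timeouts are made up, and real load balancers do this far more robustly):

const backends = ['http://10.0.0.11:8080', 'http://10.0.0.12:8080'];
let alive = [...backends];
let next = 0;

// Periodically probe every backend; drop the ones that don't answer in time.
async function healthCheck() {
  const checks = await Promise.all(
    backends.map(async (url) => {
      try {
        const res = await fetch(url + '/healthz', { signal: AbortSignal.timeout(2000) });
        return res.ok ? url : null;
      } catch {
        return null; // timed out or connection refused: treat as down
      }
    })
  );
  alive = checks.filter(Boolean);
}
setInterval(healthCheck, 5000);

// Round robin over whatever is currently alive.
function pickBackend() {
  if (alive.length === 0) throw new Error('no healthy backends');
  return alive[next++ % alive.length];
}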

Will a request to api.myapp.com be slower than a request to api-myapp.herokuapp.com when hosted on Heroku?

I'm trying to understand the best way to handle SOA on Heroku. I've got it into my head that making requests to custom domains will somehow be slower, or would all requests go "out" via the internet anyway?
On previous projects that were SOA in nature we had dedicated hosting, so we could make requests like http://blogs/ (obviously on the internal network). I'm wondering if Heroku treats *.herokuapp.com requests as "internal"... Or is it clever enough to know that myapp.com is actually myapp.herokuapp.com and route locally? Or am I missing the point completely, and in fact all requests are "external"?
What you are asking about is general knowledge of how internet requests work.
Whenever your application makes a request to, let's say, example.com, the domain name is first translated into an IP address using so-called DNS servers.
So this is how it works: whether you request myapp.com or myapp.herokuapp.com, you always request information from a specific IP address, and the domain name you requested is passed along as part of the request headers.
The server which receives the request will look that domain name up in its internal records and handle the request accordingly.
So the conclusion is that it does not matter whether you use myapp.com or myapp.herokuapp.com; the speed of the request will be the same.
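You can see the mechanics with a quick lookup (a sketch; it assumes Node, run as an ES module, and the hostnames are just the ones from the question):

import dns from 'node:dns/promises';

for (const host of ['api.myapp.com', 'api-myapp.herokuapp.com']) {
  const { address } = await dns.lookup(host);
  // The request then goes to this IP, with "Host: <host>" sent in the headers.
  console.log(`${host} -> ${address}`);
}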
PS: Since Heroku load-balances your requests between the different running instances of myapp.com, the speed will depend on several factors: how quickly your application responds, how many instances you have running and the average load per instance, and how loaded the load balancer is at the moment. But it surely will not depend on which domain name you use.
