What are the steps to switch a Drupal website/server to HTTP/2 - drupal

We need to convert our Drupal website to HTTP/2. What steps would be needed to convert the website and the server?

1. Turn it on in your web server in dev.
2. Test it to make sure there are no unintended consequences. Nothing should have to change at the app level to use HTTP/2, since the web server and the browser take care of all that for you automatically (one of the many nice things about HTTP/2!), but there can be unintended consequences due to the removal of download constraints. See here as an example: https://www.lucidchart.com/techblog/2019/04/10/why-turning-on-http2-was-a-mistake/ (a quick way to verify the protocol is actually being negotiated is sketched after this list).
3. Decide if you want to optimise for HTTP/2 (e.g. bundle and sprite less) to get the most out of the new protocol.
4. Turn it on in your web server in prod.
5. Monitor and measure the impact.
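A quick way to check step 2 is to request the dev site with a client that can negotiate HTTP/2. A minimal sketch in Python, assuming the httpx library is installed with its http2 extra and that https://dev.example.com/ is a placeholder for your own dev URL:

# Minimal check that the dev server negotiates HTTP/2.
# Assumes `pip install httpx[http2]`; https://dev.example.com/ is a placeholder.
import httpx

with httpx.Client(http2=True) as client:
    response = client.get("https://dev.example.com/")
    # httpx reports the negotiated protocol version per response.
    print(response.http_version)  # expect "HTTP/2" once the server is switched over

If this still prints HTTP/1.1, look at the TLS/ALPN configuration on the server first, since browsers only speak HTTP/2 over HTTPS.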

Related

Simulating a remote website locally for testing

I am developing a browser extension. The extension works on external websites we have no control over.
I would like to be able to test the extension. One of the major problems I'm facing is displaying a website 'as-is' locally.
Is it possible to display a website 'as-is' locally?
I want to be able to serve the website exactly as-is locally for testing. This means I want to simulate the exact same HTTP data, including iframe ads, etc.
Is there an easy way to do this?
More info:
I'd like my system to act as closely to the remote website as possible. I'd like to run a command (fetch, for example) that would let me go to the site in my browser (without internet access) and get exactly the same thing I would otherwise, including content that is not from a single domain, Google ads, etc.
I don't mind using a virtual machine if this helps.
I figured this would be quite a useful thing in testing, especially when I have a bug I need to reliably reproduce on sites that have many random factors (which ads show, etc.).
As was already mentioned, caching proxies should do the trick for you (BTW, this is the simplest solution). There are quite a lot of different implementations, so you just need to spend some time selecting a proper one (in my experience, Squid is a good choice). Anyway, I would like to highlight two other interesting options:
Option 1: Betamax
Betamax is a tool for mocking external HTTP resources such as web services and REST APIs in your tests. The project was inspired by the VCR library for Ruby. Betamax aims to solve these problems by intercepting HTTP connections initiated by your application and replaying previously recorded responses.
Betamax comes in two flavors. The first is an HTTP and HTTPS proxy that can intercept traffic made in any way that respects Java’s http.proxyHost and http.proxyPort system properties. The second is a simple wrapper for Apache HttpClient.
BTW, Betamax has a very interesting feature for you:
Betamax is a testing tool and not a spec-compliant HTTP proxy. It ignores any and all headers that would normally be used to prevent a proxy caching or storing HTTP traffic.
Option 2: Wireshark and replay proxy
Grab all the traffic you are interested in using Wireshark and replay it. I would say it is not that hard to implement the required replay tool yourself, but you can also use an existing solution called replayproxy, which:
- parses HTTP streams from .pcap files
- opens a TCP socket on port 3128 and listens as an HTTP proxy, using the extracted HTTP responses as a cache while refusing all requests for unknown URLs.
Such an approach gives you full control and a bit-for-bit precise simulation.
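If you would rather roll a tiny tool yourself than deploy Squid or replayproxy, the record-and-replay idea behind both options can be sketched in Python. This is an illustration of the technique only, not a spec-compliant proxy: the cache directory and port are arbitrary, it handles plain-HTTP GETs, drops the original response headers, and expects the browser to be configured to use it as its HTTP proxy.

# A toy record-and-replay HTTP proxy: the first run records responses to disk,
# later runs replay them so the site can be browsed offline.
# Illustration only (GET, plain HTTP, no headers preserved); port and cache path are arbitrary.
import hashlib, os, urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

CACHE_DIR = "replay-cache"
os.makedirs(CACHE_DIR, exist_ok=True)

class ReplayProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # A browser configured to use a proxy sends the absolute URL in the request line.
        key = os.path.join(CACHE_DIR, hashlib.sha256(self.path.encode()).hexdigest())
        if not os.path.exists(key):                       # record on first sight
            with urllib.request.urlopen(self.path) as upstream, open(key, "wb") as f:
                f.write(upstream.read())
        with open(key, "rb") as f:                        # replay from disk
            body = f.read()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

ThreadingHTTPServer(("127.0.0.1", 3128), ReplayProxy).serve_forever()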
I don't know if there is an easy way, but there is a way.
You can set up a local webserver, something like IIS, Apache, or minihttpd.
Then you can grab the website contents using wget (it has an option for mirroring). Many browsers also have a "save whole web page" option that will grab everything, like images.
Ads will most likely come from remote sites, so you may have to manually edit those lines in the HTML to either not reference the actual ad-servers, or set up a mock ad yourself (like a banner image).
Then you can navigate your browser to http://localhost to visit your local website, assuming port 80 which is the default.
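As a hedged sketch of the above (example.com is a placeholder and wget must be installed), you could do the mirroring and the serving from one small Python script instead of setting up IIS/Apache:

# Mirror a site with wget and serve the copy locally.
# example.com is a placeholder; serving on port 80 usually needs admin rights,
# so swap in 8000 if you just want http://localhost:8000/.
import subprocess
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

subprocess.run(["wget", "--mirror", "--convert-links", "--page-requisites",
                "--directory-prefix=mirror", "https://example.com/"], check=False)

handler = partial(SimpleHTTPRequestHandler, directory="mirror/example.com")
ThreadingHTTPServer(("127.0.0.1", 80), handler).serve_forever()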
Hope this helps!
I assume you want to serve a remote site that's not under your control. In that case you can use a proxy server and have that server cache every response aggressively. However, this has its limits. First of all, you will have to visit every site you intend to use through this proxy (with a browser, for example); second, you will not be able to emulate form processing.
Alternatively, you could use a spider to download all the content of a certain website. Depending on the spider software, it may even be able to follow JavaScript-built links. You can then use a webserver to serve that content.
The service http://www.json-gen.com provides mocks for HTML, JSON and XML via REST. That way, you can test your frontend separately from the backend.

Why use Mongrel2?

I'm confused what purpose Mongrel2 serves/provides that nginx doesn't already do.
(Yes, I've read the manual, but I must be too much of a noob to understand how it's fundamentally different from nginx.)
My current web application stack is:
- nginx: webserver
- Lua: programming language
- FastCGI + LuaJIT: to connect nginx to Lua
- Postgres: database
If you could name only one thing, it would be that Mongrel2 is built around ZeroMQ, which means that scaling your web server has never been easier.
If a request comes in, Mongrel2 receives it (nothing unusual here, same as for NginX and any other httpd). The next thing that happens is that Mongrel2 distributes the task of compiling a response to n (ZeroMQ-enabled) backends, waits for them to do the work, receives the results, compiles the response and sends it off to the client.
Now, the magic is that n can be any number and that each of the n backends can be written in any language supported by ZeroMQ (20 or so). Plus, everything goes across the network, so each backend can be a dedicated box, possibly in another datacenter.
In other words: with NginX and all the rest you have to handle scalability in your logic tier; Mongrel2 allows you to start this (from a request/response-cycle point of view) right where the request hits your infrastructure, at the httpd, rather than letting complexity penetrate down to your logic tier, which blows complexity upwards by at least an order of magnitude, IMO.
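To make that fan-out concrete, here is a minimal sketch of one such backend in Python with pyzmq. It only illustrates the ZeroMQ push/pull pattern described above; the endpoints are placeholders and this is not Mongrel2's actual handler wire protocol.

# One of the n backends: pull requests from the front end, push responses back.
# Illustrates the ZeroMQ pattern only; the endpoints below are placeholders,
# not Mongrel2's real handler protocol.
import zmq

ctx = zmq.Context()

requests = ctx.socket(zmq.PULL)          # the front end PUSHes work to many workers
requests.connect("tcp://frontend-host:9997")

responses = ctx.socket(zmq.PUSH)         # workers PUSH results back to a collector
responses.connect("tcp://frontend-host:9996")

while True:
    body = requests.recv_string()        # blocks until the front end hands us a request
    responses.send_string("handled: " + body)

Because the workers only connect outward, you can start as many of them as you like, on as many boxes as you like, without touching the front end.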
You should look at the strengths of each and decide to use either or both, depending on your use cases.
While it seems on the surface that nginx does everything that Mongrel2 provides, you'll find there are major differences in focus between the two.
Nginx shines as a front-end webserver, that can proxy requests to your backend webservers/appservers and also serve static content.
Mongrel2 is a slight change in the stack. As mentioned, its power comes from its use of ZeroMQ as the transport layer between it and the backend appservers. It can serve dynamic request URLs (app requests) and direct the compute portion of the task out to different backends using ZeroMQ.
Mongrel2 allows you to serve not just HTTP and WebSockets but other protocols too (if you're inclined to do so), all from the same server. The user would never know that portions of the app are being served from different backends.
If the functional requirements for your webapp keep changing, or you want to add things like streaming, the ability to code the backend in different languages, etc., then I would definitely look at Mongrel2. Or even have a hybrid, where you use nginx/haproxy/varnish for static files and caching, and everything else is directed to Mongrel2.

Http requests / concurrency?

Say a website on my localhost takes about 3 seconds to do each request. This is fine, and as expected (as it is doing some fancy networking behind the scenes).
However, if I open the same URL in several tabs (in Firefox), then reload them all at the same time, it appears to load each page sequentially rather than all at the same time. What is this all about?
I have tried it on Windows Server 2008 IIS and Windows 7 IIS.
It really depends on the web browser you are using and how tab support in it has been programmed.
It is probably using a single thread to load each tab in turn, which would explain your observation.
Edit:
As others have mentioned, it is also a very real possibility that the webserver running on your localhost is single-threaded.
If I remember correctly, the HTTP/1.1 spec (RFC 2616) recommended limiting the number of concurrent connections to the same host to 2; modern browsers typically allow around 6. This is one reason high-load websites use CDNs (content delivery networks).
network.http.max-connections 60
network.http.max-connections-per-server 30
The above two preferences determine how many connections Firefox makes to a server. If the limit is reached, further requests are queued (or pipelined, if pipelining is enabled).
Each browser implements this in its own way. The requests are made in such a way as to maximize performance. Moreover, it also depends on the server (your localhost, which is slower).
Your local web server configuration might have only one thread, so every subsequent request will wait for the previous one to finish.
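You can reproduce the effect with Python's built-in server; in this small sketch a 3-second sleep stands in for the "fancy networking". With HTTPServer the tabs finish one after another; swap in ThreadingHTTPServer and they finish together.

# Each request sleeps 3 seconds, mimicking the slow backend work.
# With HTTPServer (single-threaded) several tabs load sequentially;
# switch to ThreadingHTTPServer and they load concurrently.
import time
from http.server import BaseHTTPRequestHandler, HTTPServer, ThreadingHTTPServer

class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(3)                      # the "fancy networking"
        body = b"done after 3 seconds\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

server_class = HTTPServer                  # try ThreadingHTTPServer to compare
server_class(("127.0.0.1", 8000), SlowHandler).serve_forever()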

Harvesting Dynamic HTTP Content to produce Replicating HTTP Static Content

I have a slowly evolving dynamic website served from J2EE. The response time and load capacity of the server are inadequate for client needs. Moreover, ad hoc requests can unexpectedly affect other services running on the same application server/database. I know the reasons and can't address them in the short term. I understand HTTP caching hints (expiry, etags....) and for the purpose of this question, please assume that I have maxed out the opportunities to reduce load.
I am thinking of doing a brute force traversal of all URLs in the system to prime a cache and then copying the cache contents to geodispersed cache servers near the clients. I'm thinking of Squid or Apache HTTPD mod_disk_cache. I want to prime one copy and (manually) replicate the cache contents. I don't need a federation or intelligence amongst the slaves. When the data changes, invalidating the cache, I will refresh my master cache and update the slave versions, probably once a night.
Has anyone done this? Is it a good idea? Are there other technologies that I should investigate? I could program this, but I would prefer a solution built by configuring open-source technologies.
Thanks
I've used Squid before to reduce load on dynamically-created RSS feeds, and it worked quite well. It just takes some careful configuration and tuning to get it working the way you want.
Using a primed cache server is an excellent idea (I've done the same thing using wget and Squid). However, it is probably unnecessary in this scenario.
It sounds like your data is fairly static and the problem is server load, not network bandwidth. Generally, the problem exists in one of two areas:
- Database query load on your DB server.
- Business logic load on your web/application server.
Here is a JSP-specific overview of caching options.
I have seen huge performance increases by simply caching query results. Even adding a cache with a duration of 60 seconds can dramatically reduce load on a database server. JSP has several options for in-memory cache.
Another area available to you is output caching. This means that the content of a page is created once, but the output is used multiple times. This reduces the CPU load of a web server dramatically.
My experience is with ASP, but the exact same mechanisms are available on JSP pages. In my experience, with even a small amount of caching you can expect a 5-10x increase in max requests per sec.
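The idea is language-agnostic; here is a minimal sketch of a 60-second query-result cache, shown in Python purely for brevity (run_query is a placeholder for your real data-access call). The JSP in-memory caches mentioned above follow the same pattern.

# A 60-second result cache in front of an expensive query.
# Sketch only: run_query is a placeholder for the real data-access call.
import time

_cache = {}          # key -> (expires_at, result)
TTL_SECONDS = 60

def cached_query(key, run_query):
    now = time.time()
    entry = _cache.get(key)
    if entry and entry[0] > now:          # fresh hit: skip the database entirely
        return entry[1]
    result = run_query()                  # miss or stale: hit the database once
    _cache[key] = (now + TTL_SECONDS, result)
    return result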
I would use tiered caching here; deploy Squid as a reverse proxy server in front of your app server as you suggest, but then deploy a Squid at each client site that points to your origin cache.
If geographic latency isn't a big deal, then you can probably get away with just priming the origin cache like you were planning to do and then letting the remote caches prime themselves off that one based on client requests. In other words, just deploying caches out at the clients might be all you need to do beyond priming the origin cache.
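Since the origin cache sits in front of the app server, priming it can be as simple as fetching every known URL once. A hedged sketch in Python using the requests library, where urls.txt stands in for your brute-force URL list:

# Warm the reverse-proxy cache by requesting every known URL once.
# urls.txt (one URL per line, pointing at the proxy-fronted hostname) is a placeholder.
import requests

with open("urls.txt") as f:
    for url in (line.strip() for line in f if line.strip()):
        response = requests.get(url, timeout=30)
        print(response.status_code, url)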

Physically Separating Secure and Non Secure Web Requests

We have been doing some research into physically isolating the secure and non-secure sections of our web application into two applications. All "http" requests would be served by one server (or cluster) and all "https" requests would be served by another server (or cluster).
The reason that we are looking into this is partially for the survivability of the application. Since the secure section of the application is revenue generating we could, for example, have a larger and/or more powerful cluster to serve the requests. Conversely, when we upgrade the hardware in the secure application, it could be re-purposed to serve the non-secure site - basically extending the life of the servers.
Has anyone worked with this approach? We had an RFP out to a (well known) vendor last year for an architectural assessment and this was one of the possible paths that was recommended. While I see the potential upside, I worry about things such as maintenance, deployment, version control, etc.
Depending on how your app is architected, it seems to me that if you used virtualisation / load balancing you could get the same benefits of guaranteed resources and isolation for the paid area, while also being able to dynamically burst resources to deal with spikes in load in either area. Your current proposal allows you to guarantee and prioritise resources, but it may result in some of them sitting idle.
Plus it would be easier to manage load through configuration, as it would then be a pure deployment issue and an entirely separate concern. You'd also be more independent of your hardware upgrade path as you'd just be adding/assigning virtual machines to the new hardware.
