What are the guidelines for architecting a website with 5000+ concurrent hits? [closed] - software-design

I am designing the architecture for a website that will have to handle 5000+ concurrent hits, with each user possibly requiring heavy background processing. What are the guidelines for this? Which technologies are recommended?

This is primarily for Java/Spring Boot.
Try to achieve horizontal scalability.
Use JPA with L2 caching backed by Redis if you're doing heavy processing.
Utilize cloud infrastructure for managed Redis/serverless databases (it saves you the operational headache).
When implementing your web tier, use a reactive programming platform such as Project Reactor. This will allow you to scale better with fewer resources (see the sketch after this list).
Offload any scheduled processing away from your main app cluster, since a scheduler usually runs as a single instance.
Don't put your background processing on the same nodes as your main app cluster.
Offload UI responsibility to the client (i.e. just expose APIs).
Avoid request/response (except for authentication) and focus on subscribing to events to update the local client data, or use something like CouchDB to synchronize data between server and device.
Leverage caching on the device.
Do not "proxy" large content; instead, upload it directly to an object store like S3 (or better, use MinIO to avoid vendor lock-in with Amazon).
Leverage different types of data store technologies:
RDBMS (stable, easily understood, less vendor lock-in if using an ORM, easy to back up and restore, not as scalable for writes)
Elasticsearch (efficient searching of data, but use it only for search; vendor lock-in)
Kafka (stable, harder to understand, but much more scalable; vendor lock-in)
Hazelcast/Memcached/Redis (non-durable key-value stores, very fast, highly scalable, useful for sharing and caching data)
I intentionally didn't list others like Cassandra and MongoDB, as these would mean major vendor lock-in and less transferable skills.
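To illustrate the reactive web tier point above, here is a minimal sketch, assuming Spring WebFlux with Project Reactor on the classpath; the controller, the ReportService and the endpoint path are hypothetical names, not part of the question. The idea is to keep the small pool of event-loop threads free by pushing heavy work onto a bounded elastic scheduler.

// Hypothetical sketch: a non-blocking endpoint that offloads heavy background work.
// Assumes spring-boot-starter-webflux (Project Reactor) is available; names are made up.
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Mono;
import reactor.core.scheduler.Schedulers;

@RestController
public class ReportController {

    private final ReportService reportService; // hypothetical service doing the heavy processing

    public ReportController(ReportService reportService) {
        this.reportService = reportService;
    }

    @GetMapping("/reports/{id}")
    public Mono<String> report(@PathVariable String id) {
        // Wrap the blocking, CPU/IO-heavy call and move it off the event loop,
        // so the handful of WebFlux worker threads is never blocked by it.
        return Mono.fromCallable(() -> reportService.buildReport(id))
                   .subscribeOn(Schedulers.boundedElastic());
    }
}

With this shape, thousands of concurrent connections are held cheaply by the event loop while the actual processing is bounded and queued separately.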

Related

How are requirements of network specified to ensure QoS? [closed]

When I review the hardware requirements of many database-backed enterprise solutions, I find requirements for the application server (OS, processor, RAM, disk space, etc.), for the database server (versions, RAM, etc.) and requirements for the client.
Here is an example of Oracle Fusion Middleware.
I cannot find requirements on network speed or architecture (switch speed, SAN IOPS, RIOPS, etc.). I do not want a bad user experience from my application that is actually caused by network latency in the client's environment.
When sending clients the required hardware specifications, how do you state the requirements in these areas? What are the relevant measures of network performance? (Or is it simply a matter of requiring IOPS = x?)
Generally, there is more than one level of detail for requirements. You'd typically differentiate them into a range from 0 (rough mission statement) to 4 (technical details), for example.
So if you specify that your SAN shall operate with a bandwidth capacity of at least x, that sits at the detailed end of that scale. Make sure to break your main ideas ("The system shall be responsive, in order to prevent clients from becoming impatient and leaving for competitors...") down into more measurable aims, such as the one above, for example: "at least 95% of requests complete within a given response time under a given concurrent load".
Stephen Withall has written down good examples in his book "Software Requirement Patterns"; see chapter 9, page 191 ff. It is not that expensive.
He breaks it down into recommendations on, and I quote, Response Time, Throughput, Dynamic Capacity, Static Capacity and Availability.
Of course, that covers the software side. You'd probably be well advised to begin by defining what the whole system asserts under specified circumstances: When do we start to measure (e.g. when the client request comes in at the network gateway)? What average network delay do we assume to be beyond our influence? From how many different clients do we measure, and from how many different autonomous systems do they connect? Exactly what kind of tasks do they execute, and for which kind of resource will they be exceptionally demanding? When do we stop measuring? Do we really run a complete system test with all the hardware involved? Which kinds of network monitoring will we provide at runtime? And so on.
That should help you more than just assigning a value to a unit like transfer rate or IOPS, which might not even solve your problem. If you later find that the network hardware performs below your expectations, it is rather easy to exchange, especially if you hand your hosting to an external partner. The software, however, is not easy to exchange.
Be sure to differentiate between what is a requirement or constraint that you have to meet and what is actually part of the technical solution you offer; there may be more than one solution. Speed is a requirement (a vague one, though); the hardware architecture is a solution.

HTTP to JMS Bridge [closed]

Overview
I'd like to expose a message queue to the internet so that client applications can communicate with some of our back-end services.
I don't want to expose the JMS endpoint directly for security reasons. Also, a plain HTTP transport would obviate the need to distribute JMS plugins to heterogeneous client applications (.NET, Java, JavaScript).
Research Findings
ActiveMQ
I've taken a look at ActiveMQ's "built-in" REST interface:
http://activemq.apache.org/rest.html
But in testing, I found the demo to be unreliable (i.e. "Where did my messages go?"). Also, it wasn't well documented how to turn the demo into a "real" implementation.
ESB
Since this sounds like a classic "Bridge" pattern from enterprise integration patterns, I looked at the major open-source ESB/SOA integration engines:
Spring Integration
Mule
ServiceMix
Of the three, the clearest piece of documentation seems to be ServiceMix's, which offers an In-Only message exchange pattern; I would require both POST-ing and GET-ing messages.
Unfortunately, in terms of evaluation, it seems like I would have to do a deep dive into each implementation and configuration. I realize that an out-of-the-box setup may be too much to ask, but I'd rather not learn all three just to find out which one fits my needs best. So...
Questions
Have you implemented a similar architecture? What did you use?
Regardless of the first answer, which would you suggest now?
Which is simplest?
You could always check out the Apache Camel Project.
It allows you to expose and route requests from HTTP, web services, etc. to a JMS queue.
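For example, a bare-bones Camel route bridging HTTP to JMS might look like the sketch below. This is only an illustration, assuming the camel-jetty and camel-jms components are on the classpath and a JMS ConnectionFactory is already configured; the port, path and queue name are made up.

// Rough sketch, not a drop-in configuration: accept HTTP requests and push the body onto a JMS queue.
// Assumes camel-jetty and camel-jms with a configured ConnectionFactory; names are illustrative.
import org.apache.camel.builder.RouteBuilder;

public class HttpToJmsRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("jetty:http://0.0.0.0:8080/messages")   // hypothetical HTTP endpoint
            .to("jms:queue:incoming");               // hypothetical queue name
    }
}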
Although I voted for Will's answer, the servlet is really the way to go here.
Or you could write a servlet and do this in a couple dozen lines of code.
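A minimal sketch of that servlet approach, assuming a JMS 2.0 broker client on the classpath; the JNDI name, URL mapping and queue name are assumptions for illustration only.

// Illustrative only: take the raw HTTP body and enqueue it as a JMS text message.
import java.io.IOException;
import java.util.stream.Collectors;

import javax.jms.ConnectionFactory;
import javax.jms.JMSContext;
import javax.naming.InitialContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/bridge")   // hypothetical URL mapping
public class HttpToJmsServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        // Read the raw request body as the message payload.
        String body = req.getReader().lines().collect(Collectors.joining("\n"));
        try {
            // Look up the broker's ConnectionFactory (the JNDI name is an assumption).
            ConnectionFactory cf = (ConnectionFactory) new InitialContext().lookup("jms/ConnectionFactory");
            try (JMSContext ctx = cf.createContext()) {   // JMS 2.0 simplified API
                ctx.createProducer().send(ctx.createQueue("inbound"), body);
            }
            resp.setStatus(HttpServletResponse.SC_ACCEPTED);
        } catch (Exception e) {
            resp.sendError(HttpServletResponse.SC_INTERNAL_SERVER_ERROR, "Could not enqueue message");
        }
    }
}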
I have a similar goal: exposing a lightweight HTTP resource for clients. It acts as an adapter that takes simple text messages and puts them asynchronously onto a queue for later processing. My research results so far (only additions to the existing answers):
HornetQ REST
Good, but the caller has to know the destination name, which is undesirable for my use case.
Documentation
HJB (HTTP JMS Bridge)
Also didn't fit my needs; besides that, the documentation is hard to understand and the project is no longer maintained.
Website
I will probably end up writing my own adapter, using either a thin servlet or Apache Camel.

How common are web farms/gardens? Should I design my website for them? [closed]

I'm running an ASP.NET website; the server both reads and writes data to a database but also stores some frequently accessed data directly in process memory as a cache. When new requests come in, they are processed based on data in the cache before it's written to the DB.
My hosting provider suddenly decided to put their servers behind a load balancer. This means that my caching system will go bananas as several servers randomly process the requests. So I have to rewrite a big chunk of my application only to get worse performance, since I now have to query the database instead of doing a lightning-fast in-memory check.
First, I don't really see the point of distributing the load on the IIS server, since in my experience DB queries are most often the bottleneck, and now the DB has to take even more of a beating. Second, it seems like these things would require careful planning, not just something a hosting provider would set up for all their clients and expect every application to be written to suit.
Are these sorts of things common, or was I stupid to use process memory as a cache in the first place?
Should I start looking for a new hosting provider, or can I expect web farming to arrive sooner or later anywhere? Should I keep transitions like this in mind for all future apps I write and avoid in-process caching and similar designs completely?
(Please don't turn this into a farming vs. not-farming battle; I'm just wondering if it's so common that I have to keep it in mind when developing.)
I am definitely more of a developer than a network/deployment guru. So while I have a reasonably good overall understanding of these concepts (and some firsthand experience with pitfalls/limitations), I'll rely on other SO'ers to more thoroughly vet my input. With that caveat...
First thing to be aware of: a "web farm" is different from a "web garden". A web farm is usually a series of (physical or virtual) machines, usually each with a unique IP address, behind some sort of load-balancer. Most load balancers support session-affinity, meaning a given user will get a random machine on their first hit to the site, but will get that same machine on every subsequent hit. Thus, your in-memory state-management should still work fine, and session affinity will make it very likely that a given session will use the same application cache throughout its lifespan.
My understanding is a "web garden" is specific to IIS, and is essentially "multiple instances" of the webserver running in parallel on the same machine. It serves the same primary purpose as a web farm (supporting a greater number of concurrent connections). However, to the best of my knowledge it does not support any sort of session affinity. That means each request could end up in a different logical application, and thus each could be working with a different application cache. It also means that you cannot use in-process session handling - you must go to an ASP Session State Service, or SQL-backed session configuration. Those were the big things that bit me when my client moved to a web-garden model.
"First i don't really see the point of distributing the load on the iis server as in my experience DB queries are most often the bottleneck". IIS has a finite number of worker threads available (configurable, but still finite), and can therefore only serve a finite number of simultaneous connections. Even if each request is a fairly quick operation, on busy websites, that finite ceiling can cause slow user experience. Web farms/gardens increases that number of simultaneous requests, even if it doesn't perfectly address leveling of CPU load.
"Are these sort of things common or was i stupid using the process memory as cache in the first place? " This isn't really an "or" question. Yes, in my experience, web farms are very common (web gardens less so, but that might just be the clients I've worked with). Regardless, there is nothing wrong with using memory caches - they're an integral part of ASP.NET. Of course, there's numerous ways to use them incorrectly and cause yourself problems - but that's a much larger discussion, and isn't really specific to whether or not your system will be deployed on a web farm.
IN MY OPINION, you should design your systems assuming:
they will have to run on a web farm/garden
you will have session-affinity
you will NOT have application-level-cache-affinity
This is certainly not an exhaustive guide to distributed deployment. But I hope it gets you a little closer to understanding some of the farm/garden landscape.

Do we really need the app server? [closed]

I'm about to start writing a web app (ASP.NET/IIS7) which will be accessible over the internet. It will be placed behind a firewall which accepts HTTP and HTTPS.
The previous system, which we are going to replace, doesn't let the web server talk directly to a database; instead, it makes highly specialized web service calls (through a second firewall that only allows this kind of call) to a separate app server, which then goes to the DB to operate on the data.
I have worked on many systems in my day, but this is the first one which has taken security this seriously. Is this a common setup? My first thought was to use Windows Authentication in the connection string on the web server, have the user be a restricted DB user (who can only view and update its own data), and then allow DB access through the inner firewall as well.
Am I naïve? It seems like I will have to do a lot of data mapping if we use the current setup for the new system.
Edit: The domain of this app is online ordering of goods (business to business). Users (businesses) log in, input what they can deliver in any given time period, view previous transaction history, view projected demand for goods, etc. No actual money is exchanged through this system, but it provides the information on which goods are available for sale, which is input data for the ordering system.
This type of arrangement (a DMZ with the web server, communicating through a firewall with the app server, which communicates through another firewall with the DB) is very common in certain types of environments, especially in large transactional systems (online corporate banking, for example).
There are very good security reasons for doing this, the main one being that it slows down an attack on your systems. The traditional term for it is Defence in Depth (or Defense, if you are on that side of the water).
Reasonable security assumption: your webserver will be continually under attack
So you stick it in a DMZ and limit the types of connection it can make by using a firewall. You also limit the webserver to just being a web server - this reduces the number of possible attacks (the attack surface)
2nd reasonable security assumption: at some point a zero-day exploit will be found that will reach your web server and allow it to be compromised, which could lead to an attack on your user/customer database.
So you have a firewall limiting the number of connections to the application server.
3rd reasonable security assumption: zero-days will be found for the app server, but the odds of finding zero-days for the web and app servers at the same time are reduced dramatically if you patch regularly.
So if the value of your data/transactions is high enough, adding that extra layer could be essential to protect yourself.
We have an app that is configured similarly. The interface layer lives on a web server in the DMZ, the DAL is on a server inside the firewall, and a web service bridges the gap between them. In conjunction with this we have an authorization manager inside the firewall which exposes another web service that is used to control what users are allowed to see and do within the app. This app is one of our main client data tracking systems, and it is accessible to our internal employees and outside contractors. It also deals with medical information, so it falls under the HIPAA rules. So while I don't think this setup is particularly common, it is not unheard of, particularly with highly sensitive data or in situations where you have to deal with audits by a regulatory body.
Any reasonably scalable, reasonably secure, conventional web application is going to abstract the database away from the web machine using one or more service and caching tiers. SQL injection is one of the leading vectors for penetration/hacking/cracking, and databases often tend to be one of the more complex, expensive pieces of the overall architecture/TCO. Using service tiers allows you to move logic out of the DB, to employ out-of-process caching, to shield the DB from injection attempts, and so on. You get better, cheaper, more secure performance this way. It also allows for greater flexibility when it comes to upgrades, redundancy or maintenance.
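As a concrete illustration of the injection point (a sketch under assumed names; the question's stack is ASP.NET, but the idea is language-neutral and shown here in Java): the service tier owns the SQL and binds user input as parameters, so nothing coming from the web tier is ever concatenated into a query string.

// Illustrative only: the OrderService class, table and column names are made up.
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.sql.DataSource;

public class OrderService {

    private final DataSource dataSource;

    public OrderService(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    public List<String> findOrderIdsForCustomer(String customerId) throws SQLException {
        String sql = "SELECT order_id FROM orders WHERE customer_id = ?";
        List<String> ids = new ArrayList<>();
        try (Connection con = dataSource.getConnection();
             PreparedStatement ps = con.prepareStatement(sql)) {
            ps.setString(1, customerId);           // bound parameter, never string concatenation
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getString("order_id"));
                }
            }
        }
        return ids;
    }
}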
Configuring the user's access rights seems like a more robust solution to me. Your data access layer should have some security built in as well. Adding this additional layer could end up being a performance hit, but it really depends on what mechanism you're using to move data from "WebServer1" to "WebServer2". Without more specific information in that regard, it's not possible to give a more solid answer.

Creating a P2P / Decentralized file sharing network [closed]

I was wondering where I could learn more about decentralized sharing and P2P networks. Ideally, I'd like to create something to help students share files with one another over their university's network, so they could share without fear of outside entities.
I'm not trying to build the next Napster here, just wondering if this idea is feasible. Are there any open source P2P networks out there that could be tweaked to do what I want?
Basically you need a server (well, you don't NEED a server, but it would make things much simpler) that stores user IPs among other things, such as file hash lists.
That server can be in any environment you want (which is very convenient).
Then each client connects to the server (it should have a DNS name; it can be a free one, I've used no-ip.com once), sends basic information first (such as its IP and a file hash list), and then checks in every now and then (say every 5 minutes or less) to report that it's still reachable.
When a client searches for files/users, it just asks the server.
This is a centralized network, but the file sharing itself happens over direct p2p client-to-client connections.
The reason to do it like this is that you can't know an IP to connect to without some reference. (A minimal client-side sketch of this announce loop follows after the list below.)
Just to clear this server thing up:
- Torrents use trackers.
- eMule's ED2K uses lugdunum servers.
- eMule's "true p2p" Kademlia uses known nodes (clients), most of the time obtained from servers like these.
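A rough client-side sketch of the announce/heartbeat idea described above. The tracker URL, query parameters and 5-minute interval are assumptions for illustration, not a real protocol.

// Hypothetical client-side announce loop: register with the central server, then report in
// periodically so the server knows this peer is still reachable. Everything here is made up.
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TrackerAnnouncer {

    private static final String TRACKER = "http://tracker.example.edu/announce"; // made-up address
    private final HttpClient http = HttpClient.newHttpClient();

    public void start(String peerId, String fileHashList) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Announce immediately, then every 5 minutes, as suggested above.
        scheduler.scheduleAtFixedRate(() -> announce(peerId, fileHashList), 0, 5, TimeUnit.MINUTES);
    }

    private void announce(String peerId, String fileHashList) {
        try {
            String query = "?peer=" + URLEncoder.encode(peerId, StandardCharsets.UTF_8)
                         + "&files=" + URLEncoder.encode(fileHashList, StandardCharsets.UTF_8);
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(TRACKER + query))
                    .GET()
                    .build();
            http.send(request, HttpResponse.BodyHandlers.discarding());
        } catch (Exception e) {
            // Tracker unreachable: skip this round and try again on the next tick.
        }
    }
}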
Tribler is what you are looking for!
It's a fully decentralized BitTorrent client from the Delft University of Technology. It's open source and written in Python, so it's also a great starting point for learning.
Use DC++
What is wrong with BitTorrent?
Edit: There is also a pre-built P2P network on Microsoft operating systems that is pretty cool as the basis to build something. http://technet.microsoft.com/en-us/network/bb545868.aspx
