When I review the hardware requirements of many database-backed enterprise solutions, I find requirements for the application server (OS, processor, RAM, disk space, etc.), for the database server (version, RAM, etc.), and for the client.
Here is an example of Oracle Fusion Middleware.
I cannot find requirements for network speed or architecture (switch speed, SAN IOPS, RIOPS, etc.). I do not want my application to deliver a bad user experience that is actually caused by network latency in the client's environment.
When sending clients a hardware requirements specification, how do you state the requirements in these areas? What are the relevant measures of network performance? (Or is it simply a matter of requiring IOPS = x?)
Generally, requirements are written at more than one level of detail. You would typically differentiate the levels of detail on a scale from, for example, 0 (rough mission statement) to 4 (technical details).
So if you specify that your SAN shall operate with a bandwidth capacity of at least x, that would sit at the detailed end of that scale. Make sure to break your main ideas ("The system shall be responsive, in order to prevent clients from becoming impatient and leaving for competitors...") down into more measurable aims like the one above.
Stephen Withall has written down good examples in his book "Software Requirement Patterns"; see chapter 9, page 191 ff. It is not that expensive.
He breaks it down into recommendations on, and I quote, Response Time, Throughput, Dynamic Capacity, Static Capacity and Availability.
Of course, that book is about software requirements! But basically, you'd probably be well advised to begin by defining what the whole system must deliver under specified circumstances:
- When do we start measuring? (e.g. when the client request comes in at the network gateway)
- What average network delay do we assume to be beyond our influence?
- From how many different clients do we measure, and from how many different autonomous systems do these make contact?
- Exactly what kind of task(s) do they execute, and for which kind of resource will that be exceptionally demanding?
- When do we stop measuring?
- Do we really do a complete system test with all hardware involved?
- Which kinds of network monitoring will we provide at runtime? etc.
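To make the "when do we start/stop measuring" point concrete, here is a minimal sketch of a response-time probe, assuming Java 11+; the endpoint URL and the request count are made up for illustration:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.time.Duration;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class ResponseTimeProbe {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint; in a real test this would be the system's network gateway.
            URI target = URI.create("http://app.example.internal/health");
            HttpClient client = HttpClient.newBuilder()
                    .connectTimeout(Duration.ofSeconds(5))
                    .build();

            List<Long> latenciesMs = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                long start = System.nanoTime();          // "start measuring" point
                client.send(HttpRequest.newBuilder(target).GET().build(),
                            HttpResponse.BodyHandlers.discarding());
                latenciesMs.add((System.nanoTime() - start) / 1_000_000);  // "stop measuring" point
            }

            Collections.sort(latenciesMs);
            System.out.println("median: " + latenciesMs.get(latenciesMs.size() / 2) + " ms");
            System.out.println("p95:    " + latenciesMs.get((int) (latenciesMs.size() * 0.95)) + " ms");
        }
    }

Deciding where the measurement starts and stops (here: on the probing client) is exactly the kind of assumption that belongs in the requirement itself.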
Answering those questions should help you more than just assigning a value to a unit like transfer rate or IOPS, which might not even solve your problem. If you find later that the network hardware performs below your expectations, it is rather easy to exchange, especially if you hand your hosting to an external partner. The software, however, is not easy to exchange.
Be sure to differentiate between what is a requirement or constraint that you have to meet and what is actually part of the technical solution you offer; there may be more than one solution. Speed is a requirement (a vague one, though). A hardware architecture is a solution.
I am designing the architecture for a website that will receive more than 5000 concurrent hits, with each user possibly requiring heavy background processing. What are the guidelines for this? Which technologies are recommended?
This is primarily for Java/Spring Boot.
Try to achieve horizontal scalability.
Use JPA with L2 caching backed by Redis if you're doing processing.
Utilize a cloud infrastructure for managed Redis/serverless databases (saves you the headache).
When implementing your web tier, use a reactive programming framework like Project Reactor; this will allow you to scale better with fewer resources (see the sketch after this list).
Offload any scheduled processing away from your main app cluster, since the scheduler usually runs as a single instance.
Don't put your background processing on the same nodes as your main app cluster.
Offload UI responsibility to the client (i.e. just expose APIs).
Avoid request/response (except for authentication) and focus on subscribing to events to update the local client data, or use something like CouchDB to synchronize data between server and device.
Leverage caching on the device.
Do not "proxy" large content; instead use direct upload to an object store like S3 (or better, use MinIO to avoid vendor lock-in to Amazon).
Leverage different types of data store technologies:
RDBMS (stable, easily understood, less vendor lock-in if you use an ORM, easy to back up and restore, not as scalable for writes)
Elasticsearch (efficient searching of data, but only use it for search; vendor lock-in)
Kafka (stable, harder to understand, but much more scalable; vendor lock-in)
Hazelcast/Memcached/Redis (non-durable in-memory key-value stores, very fast, super scalable, useful for sharing and caching data)
I intentionally didn't list others like Cassandra and MongoDB, as these would mean major vendor lock-in and less transferable skills.
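To illustrate the reactive point above, here is a minimal sketch assuming only reactor-core on the classpath; loadUserProfile is a made-up stand-in for a blocking JPA call, not part of any real API:

    import reactor.core.publisher.Mono;
    import reactor.core.scheduler.Schedulers;

    public class ReactiveOffloadSketch {

        // Made-up stand-in for a blocking call such as a JPA repository lookup.
        static String loadUserProfile(long id) {
            try {
                Thread.sleep(50);                       // simulate database latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "profile-" + id;
        }

        public static void main(String[] args) {
            Mono<String> profile = Mono
                    .fromCallable(() -> loadUserProfile(42L))
                    // Keep blocking work off the event-loop threads.
                    .subscribeOn(Schedulers.boundedElastic());

            System.out.println(profile.block());        // block() only for the demo
        }
    }

In a Spring WebFlux controller you would return the Mono instead of calling block(); the point is simply that blocking work is shifted onto a dedicated scheduler so the event loop stays free.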
What is currently the most seeded torrent in the world?
Is it even possible to accurately determine the most connected torrent?
I suppose it would be possible if you had access to every ISP's network and could filter out the protocol BitTorrent uses. Maybe that's a little unrealistic and too low-level.
Alternatively, every tracker could be queried, counting the seeds for each torrent and eventually arriving at the torrent with the highest count.
Any tracker with enough users to sway the weighting of the count would be included in the query.
Oddly enough, the program BitChe allows searching multiple trackers/search engines for a specific text query. However, there is no "popular torrents" section.
"Popular Torrents this Week", like at (https://1337x.to/home/) is not the same or a complete metric.
Every tracker / search engine with a significant volume of monthly traffic would need to be included, and the count metric would have to be a difference accumulated monthly. Accuracy would improve over time.
A good list is at (https://blokt.com/guides/best-torrent-sites) showing the traffic volume (in millions) of multiple popular torrent trackers / search engines.
Has anyone developed this already by any chance?
What is currently the most seeded torrent in the world? Should we know this? A definitive list of the most popular torrents to date.
Is it even possible to accurately determine the most connected torrent?
Accurately? No. BitTorrent is a protocol; it can be deployed in private networks. For example, if large cloud providers happened to use BitTorrent internally and we didn't know about it, they might be operating the largest swarm in the world and any outside observer would miss it.
If we limit ourselves to torrents traversing the public internet, then perhaps, if you had access to several internet exchanges and could passively sniff traffic...
Since most BitTorrent traffic is unencrypted, it would be possible to gather statistics that way. There is no other way to globally observe torrents marked as private, since they only communicate with their specific trackers, which in turn may not publish their statistics to unauthenticated users.
If you restrict yourself to the subset of non-private torrents on the open internet seeded through clients with DHT peer discovery enabled, then it gets easier. You can first build a database of torrents via active infohash sampling and by passively observing get_peers lookups coming your way. Since popular torrents create a lot of traffic, you're likely to learn about those infohashes fairly soon. Then perform DHT scrapes to get the seed count for each torrent, and perhaps connect to the top 100 torrents gathered that way and perform peer exchanges to validate the DHT scrapes.
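To make the tracker side of seed counting concrete, here is a rough Java sketch of the HTTP "scrape" convention. The tracker URL and the infohash are placeholders, and a real implementation would need a bencode decoder to read the seed ("complete") counts out of the response:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.charset.StandardCharsets;

    public class TrackerScrapeSketch {
        public static void main(String[] args) throws Exception {
            byte[] infoHash = new byte[20];                            // placeholder 20-byte infohash
            String scrapeBase = "http://tracker.example.org/scrape";   // hypothetical tracker URL

            // The scrape convention percent-encodes the raw infohash bytes.
            StringBuilder encoded = new StringBuilder();
            for (byte b : infoHash) {
                encoded.append('%').append(String.format("%02x", b & 0xff));
            }

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(scrapeBase + "?info_hash=" + encoded))
                    .GET()
                    .build();

            // The body is a bencoded dictionary: "files" maps each infohash to
            // counts such as "complete" (seeds) and "incomplete" (leechers).
            HttpResponse<byte[]> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofByteArray());
            System.out.println(new String(response.body(), StandardCharsets.ISO_8859_1));
        }
    }

Repeating this per tracker and aggregating the "complete" counts is the query-every-tracker approach described in the question; the DHT-based approach above avoids depending on trackers publishing their statistics.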
I was wondering where I could learn more about decentralized sharing and P2P networks. Ideally, I'd like to create something to help students share files with one another over their university's network, so they could share without fear of outside entities.
I'm not trying to build the next Napster here, just wondering if this idea is feasible. Are there any open source P2P networks out there that could be tweaked to do what I want?
Basically, you need a server (well, you don't NEED a server, but it would make things much simpler) that stores user IPs, among other things such as file hash lists.
That server can be in any environment you want (which is very convenient).
Then each client connects to the server (which should have a DNS name; it can be a free one, I've used no-ip.com once), first sends basic information (such as its IP and a file hash list), and then sends something every now and then (say every 5 minutes or less) to report that it's still reachable.
When a client searches for files/users, it just asks the server.
This is a centralized network, but the file sharing itself is done over P2P client-to-client connections.
The reason to do it like this is that you can't know which IP to connect to without some reference; a minimal sketch of such an index server follows the list below.
Just to clear this server thing up:
- Torrents use trackers.
- eMule's ED2K uses Lugdunum servers.
- eMule's "true p2p" Kademlia uses known nodes (clients), most of the time obtained from servers like this.
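Here is a minimal sketch of such a central index server; the line protocol (REGISTER/SEARCH), the port, and the use of plain TCP are all made up for illustration:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.Map;
    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    // Minimal index server: clients REGISTER the file hashes they hold,
    // other clients SEARCH a hash and get back the IPs that announced it.
    public class IndexServer {
        private static final Map<String, Set<String>> hashToPeers = new ConcurrentHashMap<>();

        public static void main(String[] args) throws Exception {
            try (ServerSocket server = new ServerSocket(9000)) {   // hypothetical port
                while (true) {
                    Socket client = server.accept();
                    new Thread(() -> handle(client)).start();      // one thread per connected peer
                }
            }
        }

        private static void handle(Socket client) {
            String peerIp = client.getInetAddress().getHostAddress();
            try (BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                String line;
                while ((line = in.readLine()) != null) {
                    String[] parts = line.split(" ", 2);
                    if (parts.length < 2) continue;
                    if (parts[0].equals("REGISTER")) {
                        hashToPeers.computeIfAbsent(parts[1], k -> ConcurrentHashMap.newKeySet()).add(peerIp);
                        out.println("OK");
                    } else if (parts[0].equals("SEARCH")) {
                        out.println(hashToPeers.getOrDefault(parts[1], Set.of()));
                    }
                }
            } catch (Exception ignored) {
                // Drop the connection on errors; the peer will re-register on its next heartbeat.
            }
        }
    }

A client would open a socket to this server, send REGISTER lines for the hashes it holds (repeating them periodically as its heartbeat), and send SEARCH lines to discover peers; the actual file transfer then happens directly between the clients.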
Tribler is what you are looking for!
It's a fully decentralized BitTorrent client from the Delft University of Technology. It's open source and written in Python, so it's also a great starting point for learning.
Use DC++
What is wrong with BitTorrent?
Edit: There is also a pre-built P2P network on Microsoft operating systems that is pretty cool as a basis to build something on: http://technet.microsoft.com/en-us/network/bb545868.aspx
And what would you recommend for an ASP.NET web application with a not-so-large SQL Server database (around 10 GB)?
I was just wondering, is it a good idea to have an Amazon EC2 instance configured and ready to host your app in an emergency?
In this scenario, what would be the best approach to keeping the database updated (log shipping? manual backup and restore?), and what is the easiest and fastest way to change the DNS settings?
Edit: the acceptable downtime would be somewhere between 4 and 6 hours; that's why I considered the Amazon EC2 option, for its lower cost compared to renting a secondary server.
Update - Just saw your comment. Amazon EC2 with log shipping is definitely the way to go. Don't use mirroring, because that normally assumes the standby database is always available. Changing your DNS should not take more than half an hour if you set your TTL to that, which would give you time to apply any pending logs. You might turn the server on once a week or so just to apply pending logs (or less often, to avoid racking up hourly costs).
Your primary hosting location should have redundancy at all levels:
Multiple internet connections,
Multiple firewalls set to failover,
Multiple clustered web servers,
Multiple clustered database servers,
If you store files, use a SAN or Amazon S3,
Every server should have some form of RAID depending on the server's purpose,
Every server can have multiple PSUs connected to separate power sources/breakers,
External and internal server monitoring software,
Power generator that automatically turns on when the power goes out, and a backup generator for good measure.
That'll keep you running at your primary location in the event of most failure scenarios.
Then have a single server set up at a remote location that is kept updated using log shipping, and include it in your deployment script (after your normal production servers are updated...). A colocated server on the other side of the country does nicely for these purposes. To minimize the downtime involved in switching to the secondary location, keep the TTL on your DNS records as low as you are comfortable with.
Of course, this much hardware is going to be steep, so you'll need to determine what being down for 1 second, 1 minute, 10 minutes, etc. is worth to the business, and adjust accordingly.
It all depends on what your downtime requirements are. If you've got to be back up in seconds in order to not lose your multi-billion dollar business, then you'll do things a lot differently to if you've got a site that makes you maybe $1000/month and whose revenue won't be noticeably affected if it's down for a day.
I know that's not a particularly helpful answer, but this is a big area, with a lot of variables, and without more information it's almost impossible to recommend something that's actually going to work for your situation (since we don't really know what your situation is).
The starting point for a rock-solid DR strategy is to first work out what the true cost of your server/platform downtime is to the business.
The following article will get you started along the right lines.
https://web.archive.org/web/1/http://articles.techrepublic%2ecom%2ecom/5100-10878_11-1038783.html
If you require further guidance, good old Google can provide plenty more reading.
A project of this nature requires you to collaborate with your key business decision makers and you will need to communicate to them what the associated costs of downtime are and what the business impact would be. You will likely need to collaborate with several business units in order to gather the required information. Collectively you then need to come to a decision as to what is considered acceptable downtime for your business. Only then can you devise a DR strategy to accommodate these requirements.
You will also find that conducting this exercise may highlight shortcomings in your platform's current configuration with regard to high availability, and these may also need to be reviewed as a side project.
The key point to take away from all of this is that the decision about what constitutes an acceptable period of downtime is not for the DBA alone to make; rather, your role is to provide the information and expert knowledge necessary so that a realistic decision can be reached. Your task is then to implement a strategy that meets the business requirements.
Don’t forget to test your DR strategy by conducting a test scenario in order to validate your recovery times and to practice the process. Should the time come when you need to implement your DR strategy you will likely be under pressure, your phone will be ringing frequently and people will be hovering around you like mosquitoes. Having already honed and practiced your DR response, you can be confident in taking control of the situation and implementing the recovery will be a smooth process.
Good luck with your project.
I haven't worked with other third-party tools, but I have used CloudEndure, and as far as the replica you get, I can tell you it is a really high-end product. Replication is done at very short time intervals, which makes your replica very reliable. But I can see you don't need your site back up within seconds, so asking for a price quote or going with a different vendor might help.
The discussion of dual-core vs. quad-core is as old as quad-cores themselves, and the answer is usually "it depends on your scenario". So here the scenario is a web server (Windows 2003 (not sure if x32 or x64), 4 GB RAM, IIS, ASP.NET 3.0).
My impression is that the CPU in a web server does not need to be THAT fast, because requests are usually rather lightweight, so having more (slower) cores should be the better choice given that we get many small requests.
But since I do not have much experience with IIS load balancing, and since I don't want to spend a lot of money only to find out I've made the wrong choice, can someone with a bit more experience comment on whether more slower cores or fewer faster cores is better?
For something like a web server, dividing up the work of handling each connection is (relatively) easy. I'd say it's safe to say that web serving is one of the most common (and ironed-out) uses of parallel code. And since you are able to split much of the processing into multiple discrete threads, more cores really do benefit you. This is one of the big reasons why shared hosting is even possible. If server software like IIS and Apache couldn't handle requests in parallel, every page request would have to be served in a queue... likely making load times unbearably slow.
This is also why high-end server operating systems like Windows Server 2008 Enterprise support something like 64 cores and 2 TB of RAM. These are applications that can actually take advantage of that many cores.
Also, since each request likely has a low CPU load, you can probably (for some applications) get away with more, slower cores. But obviously having faster cores means each task gets done quicker and, in theory, the server can handle more tasks and more requests.
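To make the "more cores = more requests handled concurrently" idea tangible, here is a toy Java sketch using the JDK's built-in HttpServer; the port and the one-worker-per-core sizing are illustrative assumptions, and a real IIS/ASP.NET deployment manages its own thread pool:

    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.util.concurrent.Executors;

    // Toy illustration of why more cores help a web server: requests are handled
    // by a pool of worker threads, so independent requests run in parallel.
    public class ParallelRequestsDemo {
        public static void main(String[] args) throws Exception {
            int workers = Runtime.getRuntime().availableProcessors();  // roughly one worker per core

            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);  // hypothetical port
            server.setExecutor(Executors.newFixedThreadPool(workers));
            server.createContext("/", exchange -> {
                byte[] body = ("handled by " + Thread.currentThread().getName()).getBytes();
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            System.out.println("Serving with " + workers + " worker threads on port 8080");
        }
    }

With slow cores each individual response takes longer, but with more cores more responses are in flight at once; which trade-off wins depends on how CPU-heavy each request actually is.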
We use Apache on Linux, which forks a process to handle each request. We've found that more cores help our throughput, since they reduce the time processes spend waiting to be placed on the run queue. I don't have much experience with IIS, but I imagine the same applies to its thread pool.
Mark Harrison said:
I don't have much experience with IIS, but I imagine the same scenario applies with its thread pool.
Indeed - more cores = more threads running concurrently. IIS is inherently multithreaded, and takes easy advantage of this.
The more the better. As programming languages become more complex and abstract, more processing power will be required.
At least Jeff believes quad-core is better.