I want to cache static frequently used content on disk - nginx

We are going to deploy a storage server without RAID (we have lots of data but limited storage for now, and the data is not critical), so we will assign a subdomain to each of the 12 x 8 TB drives for our clients to download from.
Clients will download content through a static URL over HTTP (http://subdomain1.xyzwebsite.com/folder1/file1.mkv). Our server is powerful, with 128 GB of RAM, a 6 x 2-core processor and a 10 Gigabit network card, but without RAID, multiple clients downloading from the same drive will create a bottleneck. To overcome this I started looking into Varnish Cache, but I am not satisfied that I understand how it would serve the data (I do not understand how to set the object size or how to manually choose RAM or disk as the cache location).
NOTE: each file can range in size from 500 MB to 4 GB.
We do not want a separate server for caching; we want to use this powerful server for it. As a solution, I think that since each file is located on a single drive, it may be possible to copy/mirror/cache frequently used content (files downloaded in the last 24 or 12 hours) to a second drive and serve the same file under the same subdomain.
NOTE: Nginx knows which files are accessed via access.log.
Scenario:
There are 12 drives (plus 2 separate OS drives, which I am not counting here). I will store data on 11 drives and use the 12th drive as a copy/mirror/cache for all of them. I know how HTTP works: even if I add multiple IPs to the same domain, a client can only download from one IP at a time (I will add multiple IP addresses on the same server). My solution is that data will be served via round-robin, so if one client is downloading from one IP, another client might download from the second IP.
Now I don't know how to implement it. I tried searching for solutions but did not find any. There are two main problems:
How to copy/mirror/cache only the frequently used data from the 11 drives to 1 drive and serve it from there.
If I add a second IP address entry to the same subdomain and the file is not yet on the 12th drive, how will it be fetched?
An Nginx- or Varnish-based solution on the same server is required; if a RAM-based cache can be done, that would be good too.
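A minimal sketch of the first half of problem 1, picking the "hot" files out of access.log, assuming nginx's default "combined" log format and Python 3.7+. The function name hot_files and its parameters are hypothetical; it only ranks paths, and copying them to the 12th drive or routing requests there is not shown. Because range requests (status 206) can log one download as several lines, the counts are only a rough popularity signal.

```python
import re
from collections import Counter
from datetime import datetime, timedelta

# Matches the start of nginx's default "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status ...
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def hot_files(lines, now, window_hours=24, top_n=10):
    """Return the top_n most-requested paths within the last window_hours.

    `now` must be timezone-aware, because $time_local carries an offset.
    """
    cutoff = now - timedelta(hours=window_hours)
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip malformed or unexpected lines
        # $time_local looks like: 10/Oct/2019:13:55:36 +0000
        ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        if ts < cutoff or m.group("method") != "GET":
            continue
        if m.group("status") not in ("200", "206"):
            continue  # count full and partial (range) downloads only
        counts[m.group("path")] += 1
    return [path for path, _ in counts.most_common(top_n)]
```

Usage would be something like hot_files(open("/var/log/nginx/access.log"), datetime.now(timezone.utc)) (path hypothetical), followed by a separate job that copies the returned files to the cache drive.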

Varnish can be used for this, but unfortunately not the open source version.
Varnish Enterprise features the so-called Massive Storage Engine (MSE), which uses both disk and RAM to store large volumes of data.
Instead of storing each object as a separate file, MSE uses large pre-allocated files with filesystem-like behavior. This is much faster and less prone to disk fragmentation.
In MSE you can configure how individual disks should behave and how much storage per disk is used. Each disk or group of disks can be tagged.
Using Varnish Enterprise's MSE VMOD, you can then control what content is stored on each disk or group of disks.
You can decide how content is distributed to disk based on content type, URL, content size, disk usage and many other parameters. You can also choose not to persist content on disk, but just keep content in memory.
Independently of the MSE VMOD, "hot content" is automatically buffered from disk into memory. There are also "waterlevel" settings you can tune to ensure that enough space is always available.
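The general "waterlevel" idea can be illustrated with a byte-budgeted LRU cache that starts evicting above a high mark and stops once usage falls to a low mark. This is a hypothetical Python sketch of the concept only, not how Varnish MSE is implemented; the class name and the 95%/85% thresholds are made up.

```python
from collections import OrderedDict


class WaterlevelCache:
    """Byte-budgeted LRU cache with high/low water marks (illustrative only)."""

    def __init__(self, capacity_bytes, high=0.95, low=0.85):
        self.capacity = capacity_bytes
        self.high = high * capacity_bytes  # start evicting above this
        self.low = low * capacity_bytes    # evict down to this
        self.used = 0
        self.items = OrderedDict()  # key -> size, least recently used first

    def get(self, key):
        if key not in self.items:
            return False
        self.items.move_to_end(key)  # mark as most recently used
        return True

    def put(self, key, size):
        if size > self.capacity:
            return False  # object can never fit
        if key in self.items:
            self.used -= self.items.pop(key)
        self.items[key] = size
        self.used += size
        if self.used > self.high:
            self._evict()
        return True

    def _evict(self):
        # Drop least recently used objects until usage reaches the low mark,
        # never evicting the object that was just inserted.
        while self.used > self.low and len(self.items) > 1:
            _, size = self.items.popitem(last=False)
            self.used -= size
```

Evicting down to a lower mark, rather than just below the limit, avoids running eviction on every single insert once the disk is full.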

Related

I want to scale a MariaDB database for a huge number of query requests

I use Moodle on CentOS 7 with PHP, MariaDB and Nginx. A huge number of users use this Moodle instance. If the number of users grows beyond 300 users per second, Moodle responds with a delay and seems to hang.
I read about:
Galera (multi-master clustering with 3 nodes)
master-slave (separating reads and writes)
MaxScale
increasing RAM and CPU (I have up to 288 GB RAM, a 24-core CPU and an SSD drive)
What is the best practice for serving a huge number of requests without delay? How can I scale my database (since it is the bottleneck)? I want to scale it to serve a huge number of requests (most of them are reads from the database).
MariaDB (and MySQL) can scale 'infinitely' for reads by using Replication and sending read requests to Slave servers.
500 connections per second is very high. (But I don't know what the practical limit is.)
There are several extra tools that can do "connection pooling". Search for this; it may let you go well past 500 logical connections on a single server.
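To show what "connection pooling" means, here is a generic Python sketch in which many callers share a fixed number of physical connections. The ConnectionPool class and the factory argument are hypothetical; a real factory would call your MariaDB driver's connect function, and the dedicated pooling tools do this at the proxy level rather than inside the application.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Share a fixed number of physical connections among many callers."""

    def __init__(self, factory, size):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())  # open all connections up front

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty after `timeout`.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for the next caller
```

Each request borrows a connection only for the duration of its queries, so the server sees `size` connections no matter how many logical clients there are.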
In the case of Galera, you could have 3 read-write nodes, plus any number of Slaves hanging off each of the 3.
For simple Master-Slave, there can be any number of Slaves hanging off the one Master.
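The read/write split behind the Master-Slave setup can be sketched as a tiny router: plain SELECTs go round-robin to the Slaves, everything else goes to the Master. This is a hypothetical Python illustration, similar in spirit to the read/write splitting MaxScale offers, not its actual logic; it ignores replication lag, so a read issued right after a write may not see that write on a Slave.

```python
import itertools


class ReadWriteRouter:
    """Route plain SELECTs to replicas (round-robin), everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def route(self, sql):
        words = sql.split()
        verb = words[0].upper() if words else ""
        normalized = " ".join(words).upper()
        # SELECT ... FOR UPDATE takes locks, so it must run on the primary.
        is_plain_read = verb == "SELECT" and "FOR UPDATE" not in normalized
        if is_plain_read and self._replicas is not None:
            return next(self._replicas)
        return self.primary  # writes, locking reads, DDL, everything else
```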
Obviously, you can do generic MySQL/MariaDB tuning first, and use a recent version of Moodle (3.7 is current right now).
After that, one thing you can check is how sessions are implemented.
https://docs.moodle.org/37/en/Session_handling
This page also has many more tips:
https://docs.moodle.org/37/en/Performance_recommendations

Why is direct output to network share much slower than inter-buffering?

This is an Arch Linux system where I mounted a network device over SSHFS (SFTP) using GVFS, managed by the Nemo file manager. I'm using Handbrake to convert a video that lies on my SSD.
Observations:
If I encode the video using Handbrake and set the destination to a folder on the SSD, I get 100 FPS
If I copy a file from the SSD to the network share (without Handbrake), I get 3 MB/s
However, if I combine both (using Handbrake with the destination set to a folder on the network share), I get 15 FPS and 0.2 MB/s, both being significantly lower than the available capacities.
I suppose this is a buffering problem. But where does it reside? Is it Handbrake's fault, or perhaps GVFS caching not enough? Long story short, how can the available capacities be fully used in this situation?
When the output file is accessed over SFTP, Handbrake writes it in small portions rather than as one piece, meaning many transfers are started and finished, each adding overhead.
Your best bet for solving this issue is to encode to the SSD first and then transfer the ENTIRE finished file to the network share. 3 MB/s is slower than direct access to an older, large-capacity mechanical drive, so it will not give you the performance you are looking for; direct access to a network share is therefore not recommended unless you can speed up those transfers significantly.
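A toy model of why many small transfers hurt: if every request pays a fixed round-trip cost on top of the raw transfer time, small chunks spend most of their time waiting. This is a simplified Python sketch; the 20 ms per-request latency and the chunk sizes are assumptions, and it does not model SFTP pipelining or GVFS caching.

```python
import math


def effective_throughput(total_bytes, chunk_bytes, bandwidth, latency_s):
    """Toy model: bytes/second when a transfer is split into separate requests.

    bandwidth -- raw link speed in bytes per second
    latency_s -- fixed cost (round trip) paid once per request
    """
    requests = math.ceil(total_bytes / chunk_bytes)
    transfer_time = total_bytes / bandwidth
    overhead = requests * latency_s
    return total_bytes / (transfer_time + overhead)


# Hypothetical numbers: a 100 MB file over the 3 MB/s share with 20 ms per request.
small = effective_throughput(100_000_000, 32 * 1024, 3_000_000, 0.020)
large = effective_throughput(100_000_000, 8 * 1024 * 1024, 3_000_000, 0.020)
print(f"32 KiB chunks: {small / 1e6:.2f} MB/s, 8 MiB chunks: {large / 1e6:.2f} MB/s")
```

With these assumed numbers it prints about 1.06 MB/s for 32 KiB chunks versus 2.98 MB/s for 8 MiB chunks, which is why copying one finished file tends to get much closer to the link's capacity.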

To how many users per second can a 1 MB page be served through a 100 Mbps (12.5 MBps) uplink port of a dedicated server?

I am planning to increase the capacity of my dedicated server, as my current server cannot handle the load of my application.
Hence, I need to understand the uplink port connections offered by various dedicated server providers.
In Amazon EC2 this is listed as "Network Performance", and 10 Gigabit is provisioned only on the largest instances.
Please advise.
Simply put, a 12.5MB/s connection is going to be able to serve a 1MB page to 12.5 users every second.
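The arithmetic can be written out as a small Python function, with an optional compression ratio to reflect the point about compressed pages. The function name is hypothetical and the 0.5 ratio is an assumed example; it ignores TCP/IP and HTTP header overhead and assumes every visitor fetches the whole page, so real figures will be lower.

```python
def pages_per_second(link_mbps, page_mb, compression_ratio=1.0):
    """Upper bound on full page loads per second over a saturated link.

    link_mbps         -- link speed in megabits per second
    page_mb           -- uncompressed page weight in megabytes
    compression_ratio -- compressed size / original size (1.0 = uncompressed)
    """
    link_mb_per_s = link_mbps / 8  # 8 bits per byte: 100 Mbps -> 12.5 MB/s
    return link_mb_per_s / (page_mb * compression_ratio)


print(pages_per_second(100, 1))       # 12.5
print(pages_per_second(100, 1, 0.5))  # 25.0, if the page compresses to half its size
```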
That said, are you absolutely sure it's the network throughput that's causing the problem, rather than a CPU or memory limit? In my experience, the network link is very rarely the bottleneck.
Bear in mind that a 1MB page will often compress to far less than that, assuming the server's compression is configured correctly. And unless you're genuinely seeing 12.5 new users every second, they will likely have a lot of the static assets (images, scripts, etc) cached either in their browser or by an upstream proxy, so they won't be requested every time.
If you really are just serving a 1MB page to a very high number of users rather than being bound by CPU, then you might have more luck investigating a CDN (like Cloudflare or CloudFront) than simply upgrading to a faster link.

Does an increase in the number of requests to the server make the website slow?

On my office website, a page has 3 CSS files, 2 JavaScript files, 11 images and 1 page request, a total of 17 requests to the server. If 10000 people visit my office site...
Could this slow the website down because of the extra requests?
And will the huge traffic cause any issues for the server?
As I recall, my tiny office server has:
Intel i3 processor
Nvidia 2 GB graphics card
Microsoft Windows Server 2008
8 GB DDR3 RAM
500 GB hard disk
The website is developed in ASP.NET.
Internet speed was 10 Mbps download and 2 Mbps upload, using a static IP address.
There are many reasons a website may be slow.
A huge spike in additional traffic.
Extremely large or non-optimized graphics.
A large number of external calls.
Server issues.
All websites should have optimized images, Flash files and videos. Large media files slow down the overall loading of each page, so optimize each image. PNG's improved compression can offer better-looking images at a smaller file size. You could also run a traceroute to your site.
Hope this helps.
This question is impossible to answer because there are so many variables. It sounds like you're hypothesising that you will have 10000 simultaneous users; do you really expect that many?
The only way to find out if your server and site hold up under that kind of load is to profile it.
There is a tool called Apache Bench (http://httpd.apache.org/docs/2.0/programs/ab.html) which you can run from the command line to simulate a number of requests to your server and benchmark it. The tool comes with an Apache install; you can simulate 10000 requests to your server and see how the request time holds up. At the same time, you can run Performance Monitor in Windows to diagnose any bottlenecks.
Example usage taken from Wikipedia:
ab -n 100 -c 10 http://www.yahoo.com/
This will execute 100 HTTP GET requests, processing up to 10 requests concurrently, to the specified URL, in this example, "http://www.yahoo.com".
I don't think that downloads your page dependencies (js, css, images), but there probably are other tools you can use to simulate that.
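As a rough stand-in that also requests a page's dependencies, here is a Python sketch where each simulated visitor fetches the page and then its assets. The function names and any URLs you pass are hypothetical; it is not a replacement for ab, and a single Python client can become the bottleneck itself, so run it from a separate machine and only against servers you control.

```python
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """GET one URL and return (HTTP status, elapsed seconds)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            resp.read()  # download the full body, as a browser would
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code  # 4xx/5xx responses still count as completed requests
    return status, time.perf_counter() - start


def visit(urls):
    """Simulate one visitor requesting the page and its assets in order."""
    return [fetch(u) for u in urls]


def load_test(urls, visits=100, concurrency=10):
    """Run `visits` simulated visitors, at most `concurrency` at a time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = [r for page in pool.map(visit, [urls] * visits) for r in page]
    latencies = sorted(t for _, t in results)
    return {
        "requests": len(results),
        "ok": sum(1 for status, _ in results if status == 200),
        "median_s": latencies[len(latencies) // 2],
        "max_s": latencies[-1],
    }
```

For the 17-request page in the question you would pass the page URL plus its 16 asset URLs, e.g. load_test([page_url, css_url, ...], visits=1000, concurrency=50).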
I'd recommend that you enable compression on your site and set up caching, as this will significantly reduce the load and the number of requests for very little effort.
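To see why compression helps so much for text assets, this Python sketch measures the gzip ratio of a made-up stylesheet. The CSS sample is hypothetical and very repetitive, so real files will compress less well; already-compressed formats such as JPEG or PNG gain almost nothing, which is why servers usually compress only text types.

```python
import gzip


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return compressed size / original size for gzip at the given level."""
    return len(gzip.compress(data, compresslevel=level)) / len(data)


css = b".button { color: #333; padding: 4px 8px; border-radius: 3px; }\n" * 200
print(f"{len(css)} bytes, gzip ratio {compression_ratio(css):.3f}")
```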
Rather than hardware, you should think about your server's upload capacity (2 Mbps in your case). If your upload bandwidth is low, that will of course be a problem.
The most likely reason is that one session locks all the remaining requests.
If you don't use sessions, turn them off and check again.
Related:
Replacing ASP.Net's session entirely
jQuery Ajax calls to web service seem to be synchronous

What's the best solution for file storage for a load-balanced ASP.NET app?

We have an ASP.NET file delivery app (internal users upload, external users download) and I'm wondering what the best approach is for distributing files so we don't have a single point of failure by only storing the app's files on one server. We distribute the app's load across multiple front end web servers, meaning for file storage we can't simply store a file locally on the web server.
Our current setup has us pointing at a share on a primary database/file server. Throughout the day we robocopy the contents of the share on the primary server over to the failover. This scenario ensures we have a secondary machine with fairly current data on it, but we want to get to the point where we can fail over from the primary to the failover and back again without data loss or errors in the front-end app. Right now it's a fairly manual process.
Possible solutions include:
Robocopy. Simple, but it doesn't easily allow you to fail over and back again without multiple jobs running all the time (copying data back and forth)
Store the file in a BLOB in SQL Server 2005. I think this could be a performance issue, especially with large files.
Use the FILESTREAM type in SQL Server 2008. We mirror our database so this would seem to be promising. Anyone have any experience with this?
Microsoft's Distributed File System. Seems like overkill from what I've read since we only have 2 servers to manage.
So how do you normally solve this problem and what is the best solution?
Consider a cloud solution like AWS S3. It's pay for what you use, scalable and has high availability.
You need a SAN with RAID. They build these machines for uptime.
This is really an IT question...
When there are a variety of different application types sharing information via the medium of a central database, storing file content directly into the database would generally be a good idea. But it seems you only have one type in your system design - a web application. If it is just the web servers that ever need to access the files, and no other application interfacing with the database, storage in the file system rather than the database is still a better approach in general. Of course it really depends on the intricate requirements of your system.
If you do not perceive DFS as a viable approach, you may wish to consider failover clustering of your file server tier, whereby your files are stored on external shared storage (not an expensive SAN, which I believe is overkill for your case since DFS is already out of your reach) connected to both the active and passive file servers. If the active file server goes down, the passive one can take over and continue reads/writes to the shared storage. The Windows Server 2008 clustering disk driver has been improved over Windows Server 2003 for this scenario (as per the article); note that it requires a storage solution supporting SCSI-3 Persistent Reservation (PR) commands.
I agree with Omar Al Zabir on high availability web sites:
Do: Use Storage Area Network (SAN)
Why: Performance, scalability, reliability and extensibility. SAN is the ultimate storage solution. SAN is a giant box running hundreds of disks inside it. It has many disk controllers, many data channels, many cache memories. You have ultimate flexibility on RAID configuration, adding as many disks you like in a RAID, sharing disks in multiple RAID configurations and so on. SAN has faster disk controllers, more parallel processing power and more disk cache memory than regular controllers that you put inside a server. So, you get better disk throughput when you use SAN over local disks. You can increase and decrease volumes on-the-fly, while your app is running and using the volume. SAN can automatically mirror disks and upon disk failure, it automatically brings up the mirrors disks and reconfigures the RAID.
Full article is at CodeProject.
Because I don't personally have the budget for a SAN right now, I rely on option 1 (ROBOCOPY) from your post. But the files that I'm saving are not unique and can be recreated automatically if they are lost for some reason, so absolute fault tolerance is not necessary in my case.
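For reference, the one-way copy such a scheduled job performs can be approximated in Python: copy files that are missing on the failover share or newer on the primary. This is a hypothetical sketch, not robocopy; it deletes nothing (unlike robocopy's mirror mode), and it does not address the fail-back problem from the question, since changes made on the failover would need a second job in the other direction.

```python
import shutil
from pathlib import Path
from typing import List


def mirror_newer(src: Path, dst: Path) -> List[Path]:
    """Copy files missing from dst, or newer in src, preserving relative paths.

    One-way only: nothing is deleted from dst.
    """
    copied = []
    for path in src.rglob("*"):
        if not path.is_file():
            continue
        target = dst / path.relative_to(src)
        if target.exists() and target.stat().st_mtime >= path.stat().st_mtime:
            continue  # destination already up to date
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves the modification time
        copied.append(target)
    return copied
```

Because copy2 preserves modification times, a second run with no changes on the primary copies nothing.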
I suppose it depends on the type of download volume that you would be seeing. I am storing files in a SQL Server 2005 Image column with great success. We don't see heavy demand for these files, so performance is really not that big of an issue in our particular situation.
One of the benefits of storing the files in the database is that it makes disaster recovery a breeze. It also becomes much easier to manage file permissions as we can manage that on the database.
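The store-in-the-database pattern looks like this in outline. The answer uses a SQL Server 2005 image column; this sketch swaps in Python's built-in sqlite3 purely so it runs anywhere, so the table, column and function names are hypothetical and SQL Server specifics (varbinary(max), FILESTREAM, permissions) are not shown.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a database with a single table of file BLOBs."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " id INTEGER PRIMARY KEY, name TEXT NOT NULL, content BLOB NOT NULL)"
    )
    return db


def save_file(db, name, data):
    """Insert one file and return its generated id."""
    with db:  # commits on success, rolls back on error
        cur = db.execute("INSERT INTO files (name, content) VALUES (?, ?)", (name, data))
    return cur.lastrowid


def load_file(db, file_id):
    """Return (name, bytes) for the id, or None if it does not exist."""
    row = db.execute("SELECT name, content FROM files WHERE id = ?", (file_id,)).fetchone()
    return None if row is None else (row[0], bytes(row[1]))
```

Since the file bytes live in the database, backing up or mirroring the database also covers the files, which is the disaster-recovery benefit mentioned above.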
Windows Server has a File Replication Service that I would not recommend. We have used it for some time and it has caused a lot of headaches.
DFS is probably the easiest solution to set up, although depending on the reliability of your network it can become unsynchronized at times, which requires you to break the link and re-sync, which is quite painful to be honest.
Given the above, I would be inclined to use a SQL Server storage solution, as this reduces the complexity of your system rather than increasing it.
Do some tests to see if performance will be an issue first.
