Why is direct output to a network share much slower than intermediate buffering?

This is an Arch Linux system where I mounted a network device over SSHFS (SFTP) using GVFS, managed by the Nemo file manager. I'm using Handbrake to convert a video that lies on my SSD.
Observations:
If I encode the video using Handbrake and set the destination to a folder on the SSD, I get 100 FPS.
If I copy a file from the SSD to the network share (without Handbrake), I get 3 MB/s.
However, if I combine both (using Handbrake with the destination set to a folder on the network share), I get 15 FPS and 0.2 MB/s, both significantly lower than the available capacities.
I suppose this is a buffering problem, but where does it reside? Is it Handbrake's fault, or is GVFS perhaps not caching enough? Long story short: how can the available capacity be fully used in this situation?

When writing the output file over SFTP, Handbrake sends small portions of data at a time rather than one large stream, which means it is starting and finishing lots of small transfers, and each one adds round-trip overhead.
Your best bet for solving this is to do the encode entirely on the SSD and then copy the finished file to the network share in one large sequential transfer. 3 MB/s is slower than direct access to an older, large-capacity mechanical drive, so writing directly to the network share will not give you the performance you are looking for unless you can speed up those transfers significantly.
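If you do copy the finished file to the share afterwards, doing it in large sequential chunks keeps the number of SFTP round trips down. A minimal sketch in Python, where the paths and the 4 MiB chunk size are just assumptions to illustrate the idea:
import shutil

SRC = "/home/user/encodes/output.mkv"                   # hypothetical finished encode on the SSD
DST = "/run/user/1000/gvfs/sftp:host=nas/output.mkv"    # hypothetical GVFS SSHFS mount path

# Copy in large sequential chunks so each transfer request moves a lot of data
# instead of issuing many tiny writes.
with open(SRC, "rb") as fsrc, open(DST, "wb") as fdst:
    shutil.copyfileobj(fsrc, fdst, length=4 * 1024 * 1024)  # 4 MiB per chunk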

Related

I want to cache frequently used static content on disk

We are going to deploy a storage server without RAID (we have lots of data but limited storage for now, and the data is not critical), so we will assign a subdomain to each of the 12 x 8 TB drives for our clients to download from.
Clients will download content through a static URL over HTTP (http://subdomain1.xyzwebsite.com/folder1/file1.mkv). The server is powerful, with 128 GB of RAM, 6 x 2-core processors and a 10 Gigabit LAN card, but without RAID, multiple clients downloading from the same drive becomes a bottleneck. To work around this I started looking into Varnish Cache, but I am not satisfied that I understand how it will serve the data (I do not understand setting the object size and manually setting the cache location to RAM or disk).
NOTE: each file ranges from 500 MB to 4 GB in size.
We do not want a separate server for caching; we want to use this powerful server for it. My idea for a solution: the data sits on one drive, and if possible, frequently used content (files downloaded in the last 12 or 24 hours) is copied/mirrored/cached to a second drive and served as the same file under the same subdomain.
NOTE: Nginx knows which files are accessed via its access.log.
Scenario:
There are 12 drives (plus 2 separate drives for the OS, which I am not counting here). I will store data on 11 drives and use the 12th drive as a copy/mirror/cache for all of them. I know how HTTP works: even if I add multiple IP addresses to the same domain, a client can only download from one IP at a time (I will add multiple IP addresses on the same server). So my solution is that data will be served via round-robin: while one client is downloading from one IP, another client might download from the second IP.
Now I don't know how to implement this. I tried searching for solutions but did not find any. There are two main problems:
How to copy/mirror/cache only the frequently used data from the 11 drives to the 12th drive and serve it from there.
If I add a second IP address entry for the same subdomain and the data is not on the 12th drive, how will it be fetched?
An Nginx- or Varnish-based solution on the same server is required; if a RAM-based cache is possible as well, even better.
Varnish can be used for this, but unfortunately not the open source version.
Varnish Enterprise features the so-called Massive Storage Engine (MSE), which uses both disk and RAM to store large volumes of data.
Instead of storing each object in its own file, MSE uses large pre-allocated files with filesystem-like behavior. This is much faster and less prone to disk fragmentation.
In MSE you can configure how individual disks should behave and how much storage per disk is used. Each disk or group of disks can be tagged.
Based on Varnish Enterprise's MSE VMOD, you can then control what content is stored on each disk or group of disks.
You can decide how content is distributed to disk based on content type, URL, content size, disk usage and many other parameters. You can also choose not to persist content on disk, but just keep content in memory.
Regardless of this MSE VMOD, "hot content" will automatically be buffered from disk into memory. There are also "waterlevel" settings you can tune to decide how to automatically ensure that enough space is always available.
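For the log-driven copy idea described in the question, a hand-rolled prototype is also possible outside Varnish: periodically count which files show up most often in the Nginx access.log and copy those onto the cache drive. A rough sketch, where the log location, mount points, log format and threshold are all assumptions:
import re
import shutil
from collections import Counter
from pathlib import Path

ACCESS_LOG = "/var/log/nginx/access.log"   # hypothetical log location
DATA_ROOT = Path("/mnt/drive1")            # hypothetical data drive
CACHE_ROOT = Path("/mnt/drive12/cache")    # hypothetical cache drive
THRESHOLD = 50                             # downloads per log period before a file counts as "hot"

# Very simplified combined-log parsing: grab the request path from each GET line.
path_re = re.compile(r'"GET (\S+) HTTP/')

counts = Counter()
with open(ACCESS_LOG) as log:
    for line in log:
        m = path_re.search(line)
        if m:
            counts[m.group(1).split("?", 1)[0]] += 1

for url_path, hits in counts.items():
    if hits < THRESHOLD:
        continue
    src = DATA_ROOT / url_path.lstrip("/")
    dst = CACHE_ROOT / url_path.lstrip("/")
    if src.is_file() and not dst.exists():
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)             # promote the hot file to the cache drive
Nginx would then need to be told to look on the cache drive first (for example with try_files) and fall back to the data drive when a file has not been promoted yet.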

How many parallel connections can safely be made to a server

I am trying to make a simple general-purpose multi-threaded async downloader in Python. How many parallel connections can generally be made to a server with minimal risk of being banned or rate limited?
I am aware that the network will be a limiting factor in some cases, but for the sake of discussion let's assume that it isn't an issue here. I/O is also done asynchronously.
According to Browserscope, browsers make a maximum of 17 connections at a time.
However, according to my research, most download managers download files in multiple parts and make 8+ connections per file.
1. How many files can be downloaded at a time?
2. How many chunks of a single file can be downloaded at one time?
3. What should be the minimum size of those chunks to make it worth the overhead of creating parallel connections?
It depends.
While some servers tolerate a high number of connections, others don't. General web servers might be more on the high side (low two digits); file hosters might be more sensitive.
There's little more to say unless you can check the server's configuration, or just try it and remember the result for next time once your ban has timed out.
You should, however, watch your bandwidth. Once you max out your access line, there's no gain in increasing the number of connections further.
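In practice, the safest approach for the downloader in the question is to make the per-host connection count a configurable cap and start low. A sketch using aiohttp, where the URLs are placeholders and the limit of 4 per host is only a conservative guess, not a universally safe number:
import asyncio
import aiohttp

URLS = ["https://example.com/file1", "https://example.com/file2"]  # placeholder URLs
PER_HOST_LIMIT = 4   # conservative guess; tune per server and back off on 429/403 responses

async def fetch(session, url):
    async with session.get(url) as resp:
        resp.raise_for_status()
        return await resp.read()

async def main():
    # The connector enforces the per-host cap for every request made on this session.
    connector = aiohttp.TCPConnector(limit_per_host=PER_HOST_LIMIT)
    async with aiohttp.ClientSession(connector=connector) as session:
        results = await asyncio.gather(*(fetch(session, u) for u in URLS))
        print([len(body) for body in results])

asyncio.run(main())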

How to maximize downloading throughput by multi-threading

I am using multi-threading to speed up the process of downloading a bunch of files from the Web.
How might I determine how many threads I should use to maximize or nearly maximize the total download throughput?
PS:
I am using my own laptop and the bandwidth is 1 Mbps.
The data I want is the webpage source code from coursera.com.
There are many more factors than just the number of threads when you want to speed up downloading files from the network. Actually, I don't believe you will achieve much this way unless there are some limitations you haven't described (such as a maximum bandwidth per connection on the server side, a multi-link client that can use different links to download different data, downloading different parts from different servers, or similar).
Under usual conditions, using multiple threads to download something will slow the process down. You will need to maintain a couple of connections and somehow synchronise the data (except if you download, e.g., different files at the same time).
I would say that in "ordinary" conditions the much bigger limitation is your bandwidth, so using more threads will not make downloading faster; in that case you only share your total bandwidth across many connections.
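Since the practical answer is "your bandwidth is the limit", the way to pick a thread count is to measure: download the same set of pages with 1, 2, 4, 8 workers and stop increasing once total time stops improving. A rough sketch, where the URL list and worker counts are placeholders:
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

URLS = ["https://www.coursera.org/"] * 16   # placeholder: the pages you actually want

def fetch(url):
    with urlopen(url, timeout=30) as resp:
        return len(resp.read())

for workers in (1, 2, 4, 8):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total_bytes = sum(pool.map(fetch, URLS))
    elapsed = time.perf_counter() - start
    print(f"{workers} workers: {total_bytes / elapsed / 1e3:.0f} kB/s")
On a 1 Mbps line the throughput curve will flatten almost immediately, which is exactly the point the answer makes.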

Does an increase in the number of requests to the server cause the website to slow down?

On my office website, a webpage has 3 CSS files, 2 JavaScript files, 11 images and 1 page request, a total of 17 requests to the server. If 10,000 people visit my office site...
Could this slow the website down because of the additional requests?
And could heavy traffic cause any issues for the server?
As I remember, my tiny office server has:
an Intel i3 processor,
an Nvidia 2 GB graphics card,
Microsoft Windows Server 2008,
8 GB DDR3 RAM and
a 500 GB hard disk.
The website is developed in ASP.NET.
The internet connection is 10 Mbps download and 2 Mbps upload, using a static IP address.
There are many reasons a website may be slow.
A huge spike in traffic.
Extremely large or non-optimized graphics.
A large number of external calls.
Server issues.
All websites should have optimized images, Flash files, and videos. Large media files slow down the overall loading of each page, so optimize each image; PNG images offer improved optimization that can give better-looking images at a smaller file size. You could also run a traceroute to your site.
Hope this helps.
This question is impossible to answer because there are so many variables. It sounds like you're hypothesising that you will have 10,000 simultaneous users; do you really expect there to be that many?
The only way to find out if your server and site hold up under that kind of load is to profile it.
There is a tool called Apache Bench (http://httpd.apache.org/docs/2.0/programs/ab.html) which you can run from the command line to simulate a number of requests to your server and benchmark it. The tool comes with an install of Apache; you can then simulate 10,000 requests to your server and see how the request time holds up. At the same time you can run Performance Monitor in Windows to diagnose whether there are any bottlenecks.
Example usage taken from wikipedia
ab -n 100 -c 10 http://www.yahoo.com/
This will execute 100 HTTP GET requests, processing up to 10 requests
concurrently, to the specified URL, in this example,
"http://www.yahoo.com".
I don't think that downloads your page dependencies (JS, CSS, images), but there are probably other tools you can use to simulate that.
I'd recommend that you ensure compression is enabled on your site and set up caching, as this will significantly reduce the load and the number of requests for very little effort.
Rather than hardware, you should think about your server's upload capacity. If your upload bandwidth is low, of course it would be a problem.
The most likely reason is that one session locks all the other requests.
If you do not use session state, turn it off and check again.
Related:
Replacing ASP.Net's session entirely
jQuery Ajax calls to web service seem to be synchronous

How to pick a file I/O buffer size for reading a file in Windows?

While investigating some slow performance in my application while reading a file over a WAN, I noticed that copying that file in Windows Explorer was significantly faster.
Some further investigation with Process Monitor revealed the cause: my application was using the C runtime's default BUFSIZ of 512 bytes, while Windows Explorer had somehow determined that it should read the file in 61440-byte blocks (which is apparently the maximum supported by either SMB or Windows' implementation of SMB). As a result, Windows Explorer made a LOT fewer round trips and ran a lot faster.
Most recommendations for buffer size are somewhere in the 4k-16k range, but for a WAN environment, minimizing round trips by maximizing the buffer size makes sense. How does Windows Explorer determine what buffer size to use?
I would have thought that something less than the network MTU might be good.
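On the application side, the buffer size is usually something you can just set yourself rather than inherit from the runtime default. A sketch in Python, where the 61440-byte chunk size is simply copied from the observation about Explorer (a starting point, not a rule) and the UNC path is hypothetical:
CHUNK = 61440  # block size observed for Explorer's SMB reads; tune for your WAN
PATH = r"\\fileserver\share\big_input.dat"  # hypothetical UNC path over the WAN

total = 0
# Reading in large explicit chunks means far fewer round trips than the
# C runtime's default 512-byte buffered reads.
with open(PATH, "rb", buffering=0) as f:
    while True:
        block = f.read(CHUNK)
        if not block:
            break
        total += len(block)

print(f"read {total} bytes in {CHUNK}-byte requests")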
