Mapped drive download/upload is really slow - azure-storage-files

I shared some files on Azure Storage.
My internet speed is 8 Mbps, but I am only getting around 1 Mbps when I download files from the shared drive.
Could you please let me know how I can increase the bandwidth?

The target throughput for a single file share can be up to 60 MB per second in Azure File storage; this information is documented here: https://azure.microsoft.com/en-us/documentation/articles/storage-scalability-targets/. So I don't think this issue is caused by the Azure storage side. Different Azure VM sizes have different network bandwidth; refer to https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-windows-sizes/ for more details. I would suggest changing the VM size to see whether that improves your bandwidth. Finally, please check your environment's network bandwidth and make sure no other apps are consuming it.
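Before blaming the share, it can help to measure the effective throughput of the mapped drive itself, separate from what the copy dialog reports. A minimal sketch in Python (the drive letter and test file path are assumptions, not anything from your setup):

    import shutil
    import time
    from pathlib import Path

    # Hypothetical paths: point src at a large file on the mapped Azure file share.
    src = Path(r"Z:\testdata\large_file.bin")
    dst = Path(r"C:\temp\large_file.bin")

    start = time.perf_counter()
    shutil.copyfile(src, dst)
    elapsed = time.perf_counter() - start

    size_mib = src.stat().st_size / (1024 * 1024)
    print(f"{size_mib:.1f} MiB in {elapsed:.1f} s = "
          f"{size_mib / elapsed:.2f} MiB/s ({size_mib * 8 / elapsed:.1f} Mbit/s)")

If the figure here roughly matches the ~1 Mbps you are seeing, the bottleneck is most likely the network path (VM size, SMB over the Internet, or your local link) rather than the share's own throughput limit.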

Related

Are Google Cloud Disks OK to use with SQLite?

Google Cloud disks are network disks that behave like local disks. SQLite expects a local disk so that locking and transactions work correctly.
A. Is it safe to use Google Cloud disks for SQLite?
B. Do they support the right locking mechanisms? How is this done over the network?
C. How do disk IOPS and throughput relate to SQLite performance? If I have a 1 GB SQLite file with queries that take 40 ms to complete locally, how many IOPS would this use? Which disk performance tier should I choose (standard, balanced, SSD)?
Thanks.
Related
https://cloud.google.com/compute/docs/disks#pdspecs
Persistent disks are durable network storage devices that your instances can access like physical disks
https://www.sqlite.org/draft/useovernet.html
the SQLite library is not tested in across-a-network scenarios, nor is that reasonably possible. Hence, use of a remote database is done at the user's risk.
Yeah, the article you referenced essentially stipulates that since reads and writes are "simplified" at the OS level, they can be unpredictable, resulting in "loss in translation" issues when going local-network-remote.
They also point out that it may very well work totally fine in testing, and perhaps in production for a time, but there are known side effects which are hard to detect and mitigate against -- so it's a slight gamble.
Again, the implementation they are describing is not Google Cloud Disk, but rather simply a remote networked arrangement.
My point is more that Google Cloud Disk may be more "virtual" rather than purely network-attached storage... to my mind that would be where to look, and evaluate it from there.
Check out this thread for some additional insight into the issues: https://serverfault.com/questions/823532/sqlite-on-google-cloud-persistent-disk
Additionally, I was looking around and found this thread, where one poster suggests using SQLite as a read-only asset, then deploying updates in a far more controlled process.
https://news.ycombinator.com/item?id=26441125
The persistent disk acts like a normal disk in your VM and is only accessible to one VM at a time.
So it's safe to use; you won't lose any data.
For the performance part, you just have to test it for your specific workload. If you have plenty of spare RAM and your database is read-heavy with few writes, the whole database will be cached by the OS (Linux) disk cache, so it will be crazy fast, even on HDD storage.
But if you are low on spare RAM, the database won't be in the OS cache, and writes are always synced to disk, which causes lots of I/O operations.
In that case, use the highest-performing disk you can (or are willing to) afford.
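To make the "just test it" advice concrete, here is a rough sketch that times synced writes and repeated reads against a SQLite file placed on the persistent disk; the mount point, table layout, and row counts are placeholders, so adapt them to your real workload:

    import sqlite3
    import time

    DB_PATH = "/mnt/pd/test.db"  # assumed mount point of the persistent disk

    conn = sqlite3.connect(DB_PATH)
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k INTEGER PRIMARY KEY, v TEXT)")

    # Writes: one transaction (and therefore one sync to disk) per insert.
    start = time.perf_counter()
    for i in range(1000):
        with conn:
            conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (i, "x" * 100))
    write_s = time.perf_counter() - start

    # Reads: after the first pass these are likely served from the OS page cache.
    start = time.perf_counter()
    for i in range(1000):
        conn.execute("SELECT v FROM kv WHERE k = ?", (i,)).fetchone()
    read_s = time.perf_counter() - start

    print(f"writes: {1000 / write_s:.0f}/s, reads: {1000 / read_s:.0f}/s")
    conn.close()

Run it on each disk type you are considering; the write numbers are the ones that will move, since reads of a warm database mostly come out of RAM.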

Which one is cheaper to use: AWS RDS or my own database?

I have a WordPress application running on my EC2 instance on AWS. I haven't decided between Amazon RDS and my own database on different hosting. Which one is cheaper to use? Let's say I have my own MySQL database on Lunarpages or Bluehost hosting, and I let my WordPress site on the EC2 instance connect remotely to that database instead of to Amazon RDS. I've heard people say Amazon RDS is very expensive, so I thought maybe I could save costs by connecting WordPress to my own database rather than to RDS. I don't know whether that is true, or how well it would perform. Which one is the better option? Any suggestions appreciated. Thank you.
I don't agree with that. In AWS, the first thing you do is set up a virtual private network and create the corresponding network access interfaces. My experience working with heavy CMSs is that the architecture is much more stable with EC2 + RDS, each in its own instance. In addition, RDS has automated version maintenance and is much less likely to fail or suffer a crash, as opposed to MySQL or similar running on the same virtual machine.
Also, in terms of speed and performance, working with this scheme, for example with WordPress, the system flies; the speed is much higher and noticeable even with small machines.
Running the database on different hosting will add extra latency.
Let's do the math on AWS RDS for the smallest instances (taking the eu-west-1 region as an example):
Running on RDS: db.t2.micro at $0.018 per hour, or $12.96 per month. Free the first year under the AWS free tier.
Running on EC2: t2.micro (you configure MySQL, backups, ...) at $0.0126 per hour, or $9.07 per month. Free the first year under the AWS free tier.
If your application is small enough, you could host both your database and your application on the same machine (solution 2)
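For reference, the monthly figures above are just the hourly price multiplied by roughly 720 hours in a month; a quick check (prices as quoted above for eu-west-1):

    HOURS_PER_MONTH = 24 * 30  # ~720 hours

    prices_per_hour = {
        "RDS db.t2.micro": 0.018,   # managed MySQL
        "EC2 t2.micro": 0.0126,     # self-managed MySQL + backups
    }

    for name, hourly in prices_per_hour.items():
        print(f"{name}: ${hourly * HOURS_PER_MONTH:.2f}/month")
    # RDS db.t2.micro: $12.96/month
    # EC2 t2.micro: $9.07/month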
Performance-wise, it is not good to have the database on a totally different network from where the website itself is hosted; it will add delay. Imagine if you have a lot of calls: it will multiply the delay.
You can host a local database on the EC2 instance itself; this would be the best choice.

Shared data for Amazon EC2 instances

To handle high traffic, I'm planning to scale out and run my web application (WordPress-based) on several EC2 instances (I'm very new to AWS). The instances need to work on the same data (images, videos...).
I am thinking about using S3 as the storage for this shared data.
My questions are:
If I use S3, do I need to write extra code for my application to upload and get data to/from S3? Or is there a magic way to mount S3 on the EC2 instances, so that the instances can access S3 as if it were local storage?
I've heard that S3 is a bit slow since it is accessed through web services (so if users upload files, it takes time to push them to S3). Is there any better way to store shared data?
I've read some documents about the scaling abilities of Amazon EC2, but none of them mention how to handle shared data. Any help is highly appreciated. Thanks.
There is no native facility to 'mount' an S3 bucket as storage to an EC2 instance, although there are several 3rd-party apps which offer mechanisms to make S3 storage available as a virtual drive or repository. Most of them offer a preset amount of free storage and then a tiered charging mechanism for larger amounts - Google for 'S3 storage interface' and take a look.
Whether you write code to use S3 through the API or use an interface layer, there will always be some latency between your app and the storage. That's a fact of physics and there's nothing you can do to eliminate the delay, because the S3 repository is not local to the EC2 cluster - so you will never achieve 'local' storage access speeds.
An alternative might be to use EBS which is local to your EC2 instance - it has different properties to S3 (for example, it does not offer edge locations for regionally-localised access) but is much faster for application use because it is inside the EC2 cluster and mounted as local storage.
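If you go the API route instead of a mounted drive, the "extra code" is fairly small. A minimal sketch with boto3 (the bucket name and object key here are made up for illustration):

    import boto3

    s3 = boto3.client("s3")      # credentials come from the instance role or environment
    BUCKET = "my-shared-assets"  # hypothetical bucket name

    # One instance stores a file a user just uploaded.
    s3.upload_file("/tmp/photo.jpg", BUCKET, "uploads/photo.jpg")

    # Any other instance can fetch the same object later.
    s3.download_file(BUCKET, "uploads/photo.jpg", "/tmp/photo.jpg")

    # Or hand users a short-lived pre-signed URL instead of proxying the bytes yourself.
    url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": "uploads/photo.jpg"},
        ExpiresIn=3600,
    )
    print(url)

Pre-signed URLs also sidestep part of the latency concern, because downloads go straight from S3 to the user rather than through your EC2 instance.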
You could mount an S3 bucket onto all your EC2 instances. It's a two-way mount, so all your files will be synchronized. You could use s3fs to do the mounting.
I used this guide and set up mine pretty quick: Mount S3 onto EC2
If you are then concerned about speed, you can use Amazon ElastiCache or even use EBS as a cache drive.
For starters, your question lacks some details about your application architecture, but there are some possibilities.
First, if your project is medium-sized, you could use GlusterFS on your main nodes as servers and clients at the same time (using the native or NFS protocol), with an RDS *Multi-AZ* MySQL instance for the database. Use CloudFront as a CDN with the CDN Linker or W3TC plugins. Also, put an ELB in front.
In this particular case I would recommend at least a couple of c3.large instances.
Second, when your project grows, you should make an instance AMI and create an auto-scaling group that connects to your main storage and compute instances. (Also consider lifting compute load off these rather small instances.)
Things to consider additionally:
A great article about WordPress clusters on AWS is http://harish11g.blogspot.ru/2012/01/scaling-wordpress-aws-amazon-ec2-high.html
An alternative to the GlusterFS solution could be the Ceph File System.
You could (or maybe should) also use an SSD cache (for example, flashcache) for that GlusterFS volume.

Efficient reliable incremental HTTP multi-file (or whole directory) upload software

Imagine you have a web site to which you want to send a lot of data. Say 40 files, totaling the equivalent of 2 hours of upload bandwidth. You expect to have 3 connection losses along the way (think: mobile data connection, WLAN vs. microwave). You can't be bothered to retry again and again; this should be automated. Interruptions should not cause more data loss than necessary. Retrying a complete file is a waste of time and bandwidth.
So here is the question: Is there a software package or framework that
synchronizes a local directory (contents) to the server via HTTP,
is multi-platform (Win XP/Vista/7, MacOS X, Linux),
can be delivered as one self-contained executable,
recovers partially uploaded files after interrupted network connections or client restarts,
can be generated on a server to include authentication tokens and upload target,
can be made super simple to use
or what would be a good way to build one?
Options I have found until now:
Neat packaging of rsync. This requires an rsync (server) instance on the server side that is aware of a privilege system.
A custom Flash program. As I understand, Flash 10 is able to read a local file as a bytearray (indicated here) and is obviously able to speak HTTP to the originating server. Seen in question 1344131 ("Upload image thumbnail to server, without uploading whole image").
A custom native application for each platform.
Thanks for any hints!
Related work:
HTML5 will allow multiple files to be uploaded or at least selected for upload "at once". See here for example. This is agnostic to the local files and does not feature recovery of a failed upload.
Efficient way to implement a client multiple file upload service basically asks for SWFUpload or YUIUpload (Flash-based multi-file uploaders, otherwise "stupid")
A comment in question 997253 suggests JUpload - I think using a Java applet will at least require the user to grant additional rights so it can access local files
GearsUploader seems great but requires Google Gears - that is going away soon
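If no existing package fits, the resume logic itself is small: ask the server how many bytes it already holds, seek to that offset, and stream the rest in chunks with a Content-Range header. A rough Python sketch of that idea (the endpoint URLs and the offset query are assumptions, not an existing protocol or framework):

    import os
    import requests

    UPLOAD_URL = "https://example.com/upload"  # hypothetical server endpoint
    CHUNK = 1024 * 1024                        # 1 MiB per request

    def upload_resumable(path, file_id, token):
        size = os.path.getsize(path)
        headers = {"Authorization": f"Bearer {token}"}

        # Assumed server behaviour: report how many bytes of this file it already has.
        offset = int(requests.get(f"{UPLOAD_URL}/{file_id}/offset", headers=headers).text)

        with open(path, "rb") as f:
            f.seek(offset)
            while offset < size:
                chunk = f.read(CHUNK)
                resp = requests.put(
                    f"{UPLOAD_URL}/{file_id}",
                    data=chunk,
                    headers={
                        **headers,
                        "Content-Range": f"bytes {offset}-{offset + len(chunk) - 1}/{size}",
                    },
                )
                resp.raise_for_status()
                offset += len(chunk)

A client would loop this over the directory listing and retry on connection errors; the hard part in practice is the server-side bookkeeping and the authentication-token generation, not the transfer loop itself.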

What's the best solution for file storage for a load-balanced ASP.NET app?

We have an ASP.NET file delivery app (internal users upload, external users download) and I'm wondering what the best approach is for distributing files so we don't have a single point of failure by only storing the app's files on one server. We distribute the app's load across multiple front end web servers, meaning for file storage we can't simply store a file locally on the web server.
Our current setup has us pointing at a share on a primary database/file server. Throughout the day, we robocopy the contents of the share on the primary server over to the failover. This scenario ensures we have a secondary machine with fairly current data on it, but we want to get to the point where we can fail over from the primary to the failover and back again without data loss or errors in the front-end app. Right now it's a fairly manual process.
Possible solutions include:
Robocopy. Simple, but it doesn't easily allow you to fail over and back again without multiple jobs running all the time (copying data back and forth)
Store the file in a BLOB in SQL Server 2005. I think this could be a performance issue, especially with large files.
Use the FILESTREAM type in SQL Server 2008. We mirror our database so this would seem to be promising. Anyone have any experience with this?
Microsoft's Distributed File System. Seems like overkill from what I've read since we only have 2 servers to manage.
So how do you normally solve this problem and what is the best solution?
Consider a cloud solution like AWS S3. It's pay for what you use, scalable and has high availability.
You need a SAN with RAID. They build these machines for uptime.
This is really an IT question...
When a variety of different application types share information through a central database, storing file content directly in the database is generally a good idea. But it seems you have only one type in your system design - a web application. If only the web servers ever need to access the files, and no other application interfaces with the database, storing them in the file system rather than the database is still the better approach in general. Of course, it really depends on the detailed requirements of your system.
If you do not perceive DFS as a viable approach, you may wish to consider failover clustering of your file server tier, whereby your files are stored on external shared storage (not an expensive SAN, which I believe is overkill for your case since DFS is already out of reach) connected to active and passive file servers. If the active file server goes down, the passive one can take over and continue reads/writes to the shared storage. The Windows 2008 clustering disk driver has been improved over Windows 2003 for this scenario (as per the article), which implies the requirement of a storage solution supporting SCSI-3 persistent reservation (PR) commands.
I agree with Omar Al Zabir on high availability web sites:
Do: Use Storage Area Network (SAN)
Why: Performance, scalability, reliability and extensibility. SAN is the ultimate storage solution. SAN is a giant box running hundreds of disks inside it. It has many disk controllers, many data channels, many cache memories. You have ultimate flexibility on RAID configuration, adding as many disks you like in a RAID, sharing disks in multiple RAID configurations and so on. SAN has faster disk controllers, more parallel processing power and more disk cache memory than regular controllers that you put inside a server. So, you get better disk throughput when you use SAN over local disks. You can increase and decrease volumes on-the-fly, while your app is running and using the volume. SAN can automatically mirror disks and upon disk failure, it automatically brings up the mirror disks and reconfigures the RAID.
Full article is at CodeProject.
Because I don't personally have the budget for a SAN right now, I rely on option 1 (ROBOCOPY) from your post. But the files that I'm saving are not unique and can be recreated automatically if they die for some reason, so absolute fault-tolerance isn't necessary in my case.
I suppose it depends on the type of download volume that you would be seeing. I am storing files in a SQL Server 2005 Image column with great success. We don't see heavy demand for these files, so performance is really not that big of an issue in our particular situation.
One of the benefits of storing the files in the database is that it makes disaster recovery a breeze. It also becomes much easier to manage file permissions, as we can manage them in the database.
Windows Server has a File Replication Service that I would not recommend. We have used it for some time and it has caused a lot of headaches.
DFS is probably the easiest solution to set up, although depending on the reliability of your network it can become unsynchronized at times, which requires you to break the link and re-sync, which is quite painful to be honest.
Given the above, I would be inclined to use a SQL Server storage solution, as this reduces the complexity of your system rather than increasing it.
Do some tests to see if performance will be an issue first.
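For anyone weighing the database option, the round trip is simple regardless of the client stack; a hedged sketch in Python with pyodbc against an assumed Files(Name nvarchar, Content varbinary(max)) table:

    import pyodbc

    # Connection details are placeholders for your own server and database.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=dbserver;"
        "DATABASE=FileStore;Trusted_Connection=yes;"
    )
    cur = conn.cursor()

    # Store an uploaded file as a BLOB in the assumed Files table.
    with open("report.pdf", "rb") as f:
        cur.execute("INSERT INTO Files (Name, Content) VALUES (?, ?)",
                    "report.pdf", f.read())
    conn.commit()

    # Retrieve it again for download.
    row = cur.execute("SELECT Content FROM Files WHERE Name = ?", "report.pdf").fetchone()
    with open("downloaded.pdf", "wb") as f:
        f.write(row.Content)

Either way, test with your largest expected files first; streaming very large BLOBs through the database layer is exactly where the performance concern mentioned above shows up.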
