Hosting big files for users

Hosting big files for users - http

We need to be able to supply big files to our users. The files can easily grow to 2 or 3 GB. These files are not movies or similar; they are software needed to control and develop robots in an educational capacity.
There is some disagreement in our project group about how we should approach this challenge. First of all, BitTorrent is not a solution for us (despite the benefits it could bring). The files will be available through HTTP (not FTP) and via a file stream so we can control who gets access to the files.
As a former pirate in the early days of the internet, I often struggled with corrupt files and used file hashes and filesets to minimize the amount of re-downloading required. I advocate a small application that downloads and verifies a fileset and extracts the big install file once it is completely downloaded and verified.
My colleagues don't think this is necessary and point to the TCP/IP protocol's inherent capabilities to avoid corrupt downloads. They also mention that Microsoft has moved away from a download manager for their MSDN files.
Are corrupt downloads still a widespread issue, or will the time we spend creating a solution to this problem be wasted compared to the number of people who will actually be affected by it?
If a download manager is the way to go, what approach would you suggest we take?
-edit-
Just to clarify: is downloading 3 GB of data in one chunk over HTTP a problem, or should we make our own EXE that downloads the big file in smaller chunks (and verifies them)?

You do not need to write your own download manager. There are some smarter approaches:
1) Split the files into smaller chunks, say 100 MB each. Then even if a download is corrupted, the user only has to re-download that particular chunk.
2) Most web servers understand and serve Range headers. You can recommend that users use a download manager or browser add-on that takes advantage of this. If your users are on Unix/Linux systems, wget is such a utility.
3) It's true that TCP/IP has mechanisms for preventing corruption, but it basically assumes that the network is still up and accessible. #2 above is one possible workaround for cases where the network went down completely in the middle of a download.
4) Finally, it is always good to provide a file hash to your users. This not only verifies the download but also helps ensure the security of the software that you are distributing.
HTH
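As a rough illustration of the chunk and hash suggestions above, here is a minimal Python sketch. It assumes the server publishes a list of SHA-256 digests, one per 100 MB chunk; the function names and the choice of SHA-256 are illustrative assumptions, not something from the answer. It reports which chunks of a downloaded file are damaged or missing and builds the Range header a client could send to fetch just one chunk again.

```python
import hashlib
import io

CHUNK_SIZE = 100 * 1024 * 1024  # 100 MB, the chunk size suggested in the answer


def chunk_hashes(stream, chunk_size=CHUNK_SIZE):
    """Return the SHA-256 hex digest of each consecutive chunk of a binary stream."""
    digests = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digests.append(hashlib.sha256(chunk).hexdigest())
    return digests


def corrupt_chunks(stream, expected, chunk_size=CHUNK_SIZE):
    """Return the indices of chunks that differ from the published digests."""
    actual = chunk_hashes(stream, chunk_size)
    bad = [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e]
    # Chunks that are missing from a truncated download must be fetched too.
    bad.extend(range(len(actual), len(expected)))
    return bad


def range_header(index, chunk_size=CHUNK_SIZE):
    """Build the HTTP Range header that requests only chunk number `index`."""
    start = index * chunk_size
    return {"Range": f"bytes={start}-{start + chunk_size - 1}"}
```

A client would open the downloaded file in binary mode, pass it to `corrupt_chunks` together with the published digests, and re-request each reported index with `range_header`. Whether this is worth building, compared with simply publishing one hash for the whole file, is the judgment call the question is about.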

Related

Protecting a USB drive in java

I'm going to create a Java program that allows "locking" a USB drive by making its files accessible only with a password. Similar software that does this is USB Safeguard.
Here is what I am thinking of doing:
Store all files into a single archive on the USB.
Encrypt the archive using AES or Blowfish.
Hide the archive.
The problem is, how can I "unlock" the USB? What approach can I take here? Here is what I have thought of:
Ramdisk: It is very hard, if not impossible, to load a ramdisk from an encrypted archive. While it may be plausible in C++, I think it may be much harder in Java and might involve messing with the system classes, which would kill the compatibility of the software and defeat the whole purpose of using Java.
Loading the unencrypted archive onto the USB - Nobody likes waiting 10 minutes just to view a file on a USB. Copying all the files might take some time. Also, what about free space on the USB?
Loading unencrypted archive onto harddrive - While being very insecure and error-prone, this looks like the only possible way to get it done.
Creating a custom file browser allowing the user to browse the archive - Do you use WinRAR to browse your files? Would you like doing it? No. A custom file browser will take a lot of time to create, and again, is an error-prone and user-unfriendly approach.
I can't think of any other way of doing this. Can anyone think of a better way? Note that this is going to be free and open-source software.

TrueCrypt is Free Open-Source software for storing encrypted files on a storage device (i.e. USB drive). It runs on Windows, Linux, and MacOS. TrueCrypt even allows hidden volumes. I would start with their source code, and proceed from there.

rsync vs SyncML (Funambol)

I would like some idea about how rsync compares to SyncML/Funambol, especially when it comes to bandwidth, sync over unstable network and multiple clients to one server.
This is to sync several mobile devices with a directory structure of growing text files. (So we essentially want as much as possible on the server, inconsistent files are not really a problem, and we know where changes originate.)
So far, it seems Funambol doesn't compress, doesn't handle partial updates, and it is difficult to handle interruptions in a file-transfer.
I know rsync doesn't go through the server, but I don't quite see how that is a disadvantage.

Olav,
rsync can:
Compress the data (as you said) - thus gaining better performance over the net.
Synchronize only the changed data within each file - thus, once again, saving time.
Be run by multiple users at the same time. This is very basic behavior for backup software.
And one of my favorites: work over a secure shell.
You might want to check Rsyncrypto, for compressing and encrypting at the same time.
Dotan
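To make the "only the changed data within each file" point concrete, here is a simplified Python sketch. It is not rsync's actual algorithm (rsync uses rolling checksums so it can also match data that has shifted position); it only compares blocks at fixed offsets, which is enough for the growing text files described in the question. The block size and function names are assumptions made for illustration.

```python
import hashlib

BLOCK = 4096  # hypothetical block size


def block_signatures(data, block=BLOCK):
    """Digest of every fixed-size block in the receiver's copy of the file."""
    return [hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]


def delta(new_data, old_signatures, block=BLOCK):
    """Return (offset, bytes) pairs for blocks the receiver lacks or has wrong."""
    changes = []
    for n, offset in enumerate(range(0, len(new_data), block)):
        piece = new_data[offset:offset + block]
        if (n >= len(old_signatures)
                or hashlib.sha256(piece).hexdigest() != old_signatures[n]):
            changes.append((offset, piece))
    return changes


def apply_delta(old_data, changes, new_length):
    """Rebuild the new file from the old copy plus the transferred blocks."""
    buf = bytearray(old_data[:new_length].ljust(new_length, b"\0"))
    for offset, piece in changes:
        buf[offset:offset + len(piece)] = piece
    return bytes(buf)
```

For a file that only grows at the end, `delta` returns just the last partial block and whatever was appended after it, so the bytes on the wire scale with the size of the change rather than the size of the file.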

What's the best strategy for large amounts of audio files in mobile application?

I have an S60 app which I am writing in Qt and which has a large number (several thousand) of small audio files attached to it, each containing a clip of a spoken word. My app doesn't demand high-fidelity reproduction of the sounds, but even at the lowest bit rate and in mono MP3 they average 6 KB each. That means my app may have a footprint of up to 10 MB. The nature of the app is such that only a few audio clips will be needed at any one time - maybe as many as 50, but more like 1-10.
So my question has two parts:
1) is 10 MB for a mobile app too large?
2) what is a reasonable alternative to shipping all audio files at install time?
Thanks

Have you considered rolling all clips into a single file and then seek in the stream? I'm not sure how much the per-file overhead of MP3 is but it might help.
That said, every S60 mobile phone out there should have 1GB or more, so 10MB doesn't sound like "too much". But you should deliver the app as a JAR file which people can download from your website with a PC and then install by cable. Downloading large amounts of data by the phone itself is pretty expensive in many parts of the world (Switzerland, for example).
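The single-file idea from this answer can be sketched as follows. This is a minimal Python illustration, not S60 or Qt code: the clips are concatenated into one packed file and a small index of (offset, length) pairs tells the reader where to seek. The function names and the in-memory index are assumptions; a real app would need to store the index somewhere, for example at the start of the packed file.

```python
def pack_clips(clips, out):
    """Write clips (dict of name -> bytes) back to back into a binary stream.

    Returns an index mapping each name to its (offset, length) in the stream.
    """
    index = {}
    for name, data in clips.items():
        index[name] = (out.tell(), len(data))
        out.write(data)
    return index


def read_clip(packed, index, name):
    """Seek to one clip inside the packed stream and read only its bytes."""
    offset, length = index[name]
    packed.seek(offset)
    return packed.read(length)
```

Only the requested clip is read from storage, so the thousands of clips cost one file handle and one index lookup instead of thousands of small files.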

In terms of persistent storage, 10 MB isn't a lot for modern mobile devices, so - once downloaded - storing your application data on the device shouldn't be a problem.
The other type of footprint you may want to consider, however, is memory usage. Ideally, you'd have clips buffered in RAM before starting playback, to minimise latency. Given that most Symbian devices impose a per-process default heap size limit of 1 MB, you can't hold all clips in memory, so your app would need to manage the loading and clearing of a cache.
It isn't generally possible to buffer multiple compressed clips at a time on Symbian however, since buffering a clip typically requires usage of a scarce resource (namely an audio co-processor). Opening a new clip while another is already open will typically cause the first to be closed, meaning that you can only buffer one in memory at a time.
If you do need to reduce latency, your app will therefore need to take care of loading and decompressing where necessary, to give PCM which you can then feed to the audio stack.
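The "loading and clearing of a cache" mentioned above could look roughly like this. It is a minimal Python sketch of a least-recently-used cache with a byte budget, not Symbian code; `loader` is a hypothetical callback that would read a clip and return its decoded PCM bytes, and the budget would have to be chosen to fit under the heap limit.

```python
from collections import OrderedDict


class ClipCache:
    """Keep recently used clips in memory, within a fixed byte budget."""

    def __init__(self, loader, max_bytes):
        self._loader = loader          # hypothetical: name -> bytes
        self._max_bytes = max_bytes
        self._clips = OrderedDict()    # oldest entry first
        self._size = 0

    def __contains__(self, name):
        return name in self._clips

    def get(self, name):
        if name in self._clips:
            # Cache hit: mark the clip as most recently used.
            self._clips.move_to_end(name)
            return self._clips[name]
        data = self._loader(name)
        self._clips[name] = data
        self._size += len(data)
        # Evict least recently used clips until back under budget,
        # but always keep the clip that was just loaded.
        while self._size > self._max_bytes and len(self._clips) > 1:
            _, evicted = self._clips.popitem(last=False)
            self._size -= len(evicted)
        return data
```

Since the question says only 1-10 clips are typically needed at once, a small budget should already give a high hit rate.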

10MB is definitely on the large side. Most apps are < 1MB, but I think that I've seen some large ones (6-10-15 MB), like dictionaries.
Most S60 phones have in-phone storage space of around 100MB, but they also have memory cards and these are usually 128MB+, and 4GB is not uncommon for higher-end phones. You need to check the specs for your target phones!
Having such a large install pack will make installing over the air prohibitive. Try to merge the files so that you only have a few large files instead of many small ones, or the install will take too long.
An alternative would be to ship the most used sounds and download the rest as needed. S60 has security checks and you will need to give the app special permissions when you sign it.

Have you thought about separating the thousands of audio files into batches of, say, 20?
You can include a few batches into the application installation file and let the user download one (or more) batch at a time from your application GUI, as and when needed...

Store the sound files in a SQLite database, and access them only upon demand. Sounds like you are writing a speaking dictionary. Keep the app itself as small as possible. This will make the app load very fast by comparison. As the database market has matured, it seems a developer only needs to know about two database engines anymore: SQLite, for maximum-performance desktop and handheld applications, and MySQL for huge multi-user databases. Do not load all those sounds on startup unless it is critical. My favorite speaking dictionary application is still the creaking Microsoft Bookshelf '96; no kidding.
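A minimal sketch of the SQLite suggestion, using Python's standard `sqlite3` module for illustration (on S60 with Qt the equivalent would go through the platform's own SQLite bindings). The table and column names are assumptions. Each clip is stored as a BLOB keyed by its word and is fetched only when requested.

```python
import sqlite3


def create_clip_db(path, clips):
    """Create (or update) a clip database from a dict of word -> audio bytes."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS clips ("
        "word TEXT PRIMARY KEY, audio BLOB NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO clips (word, audio) VALUES (?, ?)",
        clips.items(),
    )
    conn.commit()
    return conn


def load_clip(conn, word):
    """Fetch a single clip on demand; returns None if the word is unknown."""
    row = conn.execute(
        "SELECT audio FROM clips WHERE word = ?", (word,)
    ).fetchone()
    return row[0] if row is not None else None
```

Nothing is loaded at startup; each lookup is a single indexed query on the primary key.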

10MB for a mobile app is not too large, provided you convince the user that the content he / she is going to pull over the air is worth the data charges the user will incur.
Symbian as a platform can work well with this app as the actual audio files will be delivered from within the SIS file but the binary will not contain them and hence will not cause memory problems...
The best option would be to offer the media files for download via your website so that the user can download and sync them via PC Suite / Mass Storage transfer. Allow the user to download the files into e:\Others or some publicly available folder and offer to read the media from there...
My 2cents...

Large file download in background, initiated from the browser

Is there any reasonable method to allow users of a webapp to download large files? I'm looking for something other than the browser's built-in download dialog - the requirements are that the user initiates the download from the browser and then some other application takes over, downloads the file in background and doesn't exit when the browser is closed. It might possibly work over http, ftp or even bittorrent. Platform independence would be a nice thing to have but I'm mostly concerned with Windows.

This might be a suitable use for BitTorrent. It works using a separate program (in most browsers), and will still run after the browser is closed. Not a perfect match, but meets most of your demands.

Maybe BITS is something for you?
Background Intelligent Transfer Service
Purpose
Background Intelligent Transfer Service (BITS) transfers files (downloads or uploads) between a client and server and provides progress information related to the transfers. You can also download files from a peer.
Where Applicable
Use BITS for applications that need to:
Asynchronously transfer files in the foreground or background.
Preserve the responsiveness of other network applications.
Automatically resume file transfers after network disconnects and computer restarts.
Developer Audience
BITS is designed for C and C++ developers.
Windows only

Try Free Download Manager. It integrates with IE and Firefox.

Take a look at this:
http://msdn.microsoft.com/en-us/library/aa753618(VS.85).aspx
It's only for IE though.
Another way is to write a BandObject for IE, which hooks up on all links and starts your application.
http://www.codeproject.com/KB/shell/dotnetbandobjects.aspx

Depending on how large the files are, pretty much all web browsers have built-in download managers. Just put a link to the file, and the browser will take over when the user clicks. You could also simply recommend that people install a download manager before downloading the file, linking to a recommended free client for Windows/Linux/OS X.
Depending on how large the files are, BitTorrent could be an option. You would offer a .torrent file, which people open in a download client that is separate from the browser.
There are drawbacks, mainly depending on your intended audience:
BitTorrent is rarely allowed on corporate or school networks
It can be difficult to use (as it's a new concept to lots of people). For example, if someone doesn't have a torrent client installed, they get a tiny file they cannot open, which can be confusing
Problems with NAT/port-forwarding/firewalls are quite common
You have to run a torrent tracker and seed the file
...but there are also benefits, mainly reduced bandwidth usage on the server, as people who download also seed the file.

How to avoid pauses when editing code on a network drive?

I'm planning on doing more coding from home but in order to do so, I need to be able to edit files on a Samba drive on our dev server. The problem I've run into with several editors is that the network latency causes the editor to lock up for long periods of time (Eclipse, TextMate). Some editors cope with this a lot better than others, but are there any file system or other tweaks I can make to minimize the impact of lag?
A few additional points:
There's a policy against having company data on personal machines, so I'd like to avoid checking out the code locally.
The mount is over a PPTP VPN connection.
Mounting to Linux or OS X client

Use a source control system — Subversion, Perforce, Git, Mercurial, Bazaar, etc. — so you're never editing code on a shared server. Instead you should be editing a local work area and committing changes to a repository located on the network.
Also, convince your company to adapt their policy such that company code is allowed on personal machines if it's on an encrypted volume. Encrypted disk images that you can use for this are trivial to create using Disk Utility, and can use strong cryptography. You can get even more security by not storing your encryption passphrase in your keychain, and instead typing it every time you mount the encrypted volume; this means that even if your local user account is compromised, as long as you don't have the volume mounted, nobody else will be able to mount it.
I did this all the time when I was consulting and none of my clients — some of whom had similar rules about company code — ever had a problem with it once I explained how things worked. (I think some of them even started using encrypted disk images even within their offices.)

Remate plugin simply disables this dreadful refresh-on-focus feature.
Download, unpack, doubleclick and choose "Disable Refresh on Regaining Focus" from "Window" menu (you can refresh manually by right-clicking project in drawer). Voila!

If you are accessing the data from your personal computer, it is in your RAM, so we will assume that you just can't store it on your hard drive, floppy, USB stick, etc.
Your solution is a RAM drive. Copy the files you need to edit there using whatever method you prefer (I would suggest source control) and then you can edit them without lag. When you are done commit them back to the server.
As was pointed out your editor may be caching changes to your temp directory, or maybe even your swap file (if it is in memory, then it can get swapped out). The solution to that is get a much larger RAM drive and run a Virtual Machine in the RAM drive. Not sure what OS you are running, but you can get a pretty slim install of most OS's if all you are doing is editing source code.
If you don't have enough RAM, then get a Gigabyte i-RAM solid state drive and remove the battery, that way it will lose everything when you power down.
Set your VMWare to not allow the OS to swap any of the virtual machine. Keep a baseline VM on your hard drive and copy it to your RAM drive before booting it up. Then you can use the hard drive in the VM like a hard drive, even though it is RAM.
Might be a good idea to run a secure erase on your RAM drive before powering down. Also keep in mind that they have found if you super cool a RAM chip before removing it from a functioning computer, and place it in a new computer quick enough, the data may still be intact.
I guess it all comes down to how detailed that policy is, and how it is interpreted.
Good luck!

Short answer: there is no trick that will fix this. CIFS is really geared towards a LAN with reasonably calm traffic, so you have zero chance of avoiding intermittent lag when accessing a share through a VPN. The editor at some point needs to access the file with blocking I/O, because it makes no real sense to do otherwise.
You could switch editor and use Emacs + TRAMP which is geared to work on remote files.
