Boxfuse deployment without encrypting AMIs - cloudcaptain

Is it possible to make a Boxfuse deployment to AWS without encrypting images? Even the smallest Play Framework fused application (~90MB) takes about 5 minutes (as opposed to announced ~50 seconds) to encrypt, not to mention projects of greater size. For that reason it would be interesting to be able to disable encrypting at least for environments where deployments are frequent (QA, test, etc.).
There is a comment somewhere from Boxfuse about choosing AWS regions which are optimised for AMI signing, but I didn't have luck with that. It is still rather slow for my case.

Related

Remotely Verifying the Application in execution

Is it possible to prove to the remote party that the application I am running in my system is the same as I am claiming that I am running using DRTM or SRTM? If yes then How?
Theoretically: yes. The concept is called remote attestation.
The basic idea is: First you have a sound chain of trust built on your platform, like:
BIOS ==> Boot loader ==> OS ==> Applications
The resulting measurements are stored in the PCRs.
Now you can let the TPM sign this set of PCRs, that's called quote.
You can submit this quote to a remote entity. Here the problems start:
How can you proof that the quote was signed by a hardware TPM and not an emulator?
Possible solutions: pre-shared keys or some kind of CA.
How can you be sure that the PCR values represent a trusted system state?
That's not so easy. If you have SRTM, you have to consider every possible combination of
how your system load the components. E.g. in BIOS-phase, in which order are the
option-ROMs loaded?
Here DRTM comes for the rescue, but it makes the matter just slightly easier. With DRTM
you can forget about all the pre-DRTM stuff. If you have a small trusted environment,
say like flicker, then you'll have a manageable set of trusted configurations.
If you have a full-featured OS, than it's hard.
First, you have to find an OS that measures everything. IBM's IMA for the Linux
kernel is one example.
Then, the slightest difference in the order of loaded components will lead to
different PCR values. Furthermore consider all the combinations of states the
different installed software packages might be in.
Possible solutions are to restrict the possible set of PCR values that represent a
valid configuration. For example you can measure a whole OS image instead of each
binary. An example is the acTvSM platform published a few years ago.
Conclusion: There is no easy, off-the-shelf solution, but you can design a system such that it fits your requirements.

Difference between gLite and OpenNebula

What is the difference between gLite and OpenNebula?
I intend to differences in structure and in usage. When to use the one and when the other?
I think g.lite is a DSL service that offers a download speed that is slower than other forms (maximum of 1.5 Mbps).
Whereas, OpenNebula is a cloud computing toolkit for managing heterogeneous or different distributed database center infrastructures.
gLite is a grid middleware, which -- very simply put -- takes care of (bulk) submission and processing of grid jobs (+ related tasks like scheduling, data handling, monitoring, information service and others).
OpenNebula is a cloud management framework, which -- again put simply -- provides you with virtual machines on demand, based on available templates.
So, you will use gLite if your workload comes in the form of jobs (think scripts), which can run in the target system (gLite is currently only supported in Debian 6 and Scinetific Linux 5 & 6). You will use OpenNebula if you need on-demand virtual machines running your own (or publically available vanilla) system images. You can of course preinstall any software you need in those images.)
I promise to give a better aimed answer if you say more about what your situation is and what you want to achieve. Both products are actually intended to provide different kind of service.

Anyone using Memcached with ASP.NET on a distributed farm?

We have 22 HTTP servers each running their own individual ASP.NET Caches. They read from a read only DB that is only updated off peak hours.
We use a file dependency to invalidate the cache, prompting the servers to "new up" their caches...If this is accidentally done during peak hours, it risks bringing down our DB cluster due to the sudden deluge of open connections.
Has anyone used memcached with ASP.NET in this distributed form? It seems to me that it would offer a huge advantage of having to only build up one cache (and hit the DB 21 times less), while memcached would handle distributing it on each box.
If you have, do you place it on the same box as the HTTP boxes, or do you run a separate cache tier? How well does it scale, can we expect it to need powerful servers? Our working dataset is not huge (We fit it into 4 gigs of memory on each HTTP box just fine).
How do you handle invalidation?
Looking for experiences and war stories.
EDIT: Win2k3, IIS6, 64-bit servers...4 gigs per box (I believe, we may have upped it to 16 gigs when we changed to 64-bit servers).
"memcached would handle distributing it on each box"
memcached does not distribute or replicate a cache to each box in a memcached farm. The memcached client basically hashes the key and chooses a cache server based on that hash. When one of the memcached servers fail you will lose whatever cached items existed on that server, however, the client will recognize the failure and begin writing values to a different server. This being the case, your code needs to account for missing items in the cache and reset them if necessary.
This article discusses the memcached architecture in more detail: How memcached works.
Best practice (according to the memcached site) is to run memcached on the same box as your web server app or else you're making http calls (which isn't all that bad, but it's not optimal). If you're running a 64-bit app server (which you probably should if you're going to be running memcached), then you can load up each of the servers with loads of memory and it will be available to memcached. There's not much in the way of CPU resources used by memcached, so if your current app server isn't very taxed, it will remain that way.
Haven't used them together, but I've used them both on separate projects.
Last I saw the documentation explicitly said that sharing with the web server was ok.
Memcache really only needs RAM and if you take your asp.net cache out of the equation how much RAM is you web server actually using? Probably not much. It won't compete much with your web server for CPU and it doesn't need disk at all. You might consider segmenting off the network traffic (if you don't already) from the incoming web requests.
It worked well and was fast I didn't have any problems with it.
Oh, invalidation was explicit on the project I used it on. Not sure what other modes there are for that.
If you want to get replication accross your memcached servers then it maybe worth a look at repcached. It's a patch for memcached that handles the replication part.
Worth checking out Velocity, which is a distributed cache provided by Microsoft. I cannot give you a point-by-point comparison to memcached, but Velocity is integrated with ASP.NET and will continue to get more development and integration.

ASP.NET performance measurements on a hosted platform

I have a large ASP.NET website on a hosted platform. It shares the machine with a lot of other applications. We do not have access to the machine itself (only an FTP account).
Our client is complaining that it is starting to perform rather badly, particularly around peak hours. I've run some remote measurements (using a JMeter-like tool) that tells me that, yes, it does indeed perform rather badly during peak hours. It doesn't tell me why though. The client is resisting a move to a dedicated server without some hard facts.
As I see it, what I need are hard data about the machine itself. Setting up a local performance test environment would be extremely time-consuming, and I have no way to estimate the server performance.
My question: is there a good way to collect (a lot) of performance measurements when I have limited access to the machine, and certainly no access to the performance monitor? Any code would have to run in the asp.net application itself, without screwing it up too much.
We had a similar problem with our asp.net application hosted on a shared server, which also started to perform badly during peak hours.
Although I don't know of an elegant solution to your question, this is what we did:
Talk to your host providers to see what additional information they can give you - it's in their best interest to keep their clients happy. Our host providers were able to give us some time with one of their network engineers who provided us with some decent CPU and memory utilization stats.
Take your own performance measurements by dumping information to either a log file (using log4net) and/or the database - for example, user sessions, search times, page hits, timing measurements around key functionality. From this information we were able to ascertain what our systems normal behavior was for a set number of automation tests.
Setup a local server (not necessarily same stats as hosted/production server) with your application loaded and give it a full load/performance/capacity testing (we used Red Gate's ANTS Profiler). The stats that you gather from that will give you and your client a good indication of how the system should behave under certain loads with a known environment. Yes, this can be time consuming but it will give you a great performance measuring tool so that you can catch/fix bottlenecks locally rather than on production.
Good luck.

What's the best solution for file storage for a load-balanced ASP.NET app?

We have an ASP.NET file delivery app (internal users upload, external users download) and I'm wondering what the best approach is for distributing files so we don't have a single point of failure by only storing the app's files on one server. We distribute the app's load across multiple front end web servers, meaning for file storage we can't simply store a file locally on the web server.
Our current setup has us pointing at a share on a primary database/file server. Throughout the day we robocopy the contents of the share on the primary server over to the failover. This scneario ensures we have a secondary machine with fairly current data on it but we want to get to the point where we can failover from the primary to the failover and back again without data loss or errors in the front end app. Right now it's a fairly manual process.
Possible solutions include:
Robocopy. Simple, but it doesn't easily allow you to fail over and back again without multiple jobs running all the time (copying data back and forth)
Store the file in a BLOB in SQL Server 2005. I think this could be a performance issue, especially with large files.
Use the FILESTREAM type in SQL Server 2008. We mirror our database so this would seem to be promising. Anyone have any experience with this?
Microsoft's Distributed File System. Seems like overkill from what I've read since we only have 2 servers to manage.
So how do you normally solve this problem and what is the best solution?
Consider a cloud solution like AWS S3. It's pay for what you use, scalable and has high availability.
You need a SAN with RAID. They build these machines for uptime.
This is really an IT question...
When there are a variety of different application types sharing information via the medium of a central database, storing file content directly into the database would generally be a good idea. But it seems you only have one type in your system design - a web application. If it is just the web servers that ever need to access the files, and no other application interfacing with the database, storage in the file system rather than the database is still a better approach in general. Of course it really depends on the intricate requirements of your system.
If you do not perceive DFS as a viable approach, you may wish to consider Failover clustering of your file server tier, whereby your files are stored in an external shared storage (not an expensive SAN, which I believe is overkill for your case since DFS is already out of your reach) connected between Active and Passive file servers. If the active file server goes down, the passive may take over and continue read/writes to the shared storage. Windows 2008 clustering disk driver has been improved over Windows 2003 for this scenario (as per article), which indicates the requirement of a storage solution supporting SCSI-3 (PR) commands.
I agree with Omar Al Zabir on high availability web sites:
Do: Use Storage Area Network (SAN)
Why: Performance, scalability,
reliability and extensibility. SAN is
the ultimate storage solution. SAN is
a giant box running hundreds of disks
inside it. It has many disk
controllers, many data channels, many
cache memories. You have ultimate
flexibility on RAID configuration,
adding as many disks you like in a
RAID, sharing disks in multiple RAID
configurations and so on. SAN has
faster disk controllers, more parallel
processing power and more disk cache
memory than regular controllers that
you put inside a server. So, you get
better disk throughput when you use
SAN over local disks. You can increase
and decrease volumes on-the-fly, while
your app is running and using the
volume. SAN can automatically mirror
disks and upon disk failure, it
automatically brings up the mirrors
disks and reconfigures the RAID.
Full article is at CodeProject.
Because I don't personally have the budget for a SAN right now, I rely on option 1 (ROBOCOPY) from your post. But the files that I'm saving are not unique and can be recreated automatically if they die for some reason so absolute fault-tolerance is necessary in my case.
I suppose it depends on the type of download volume that you would be seeing. I am storing files in a SQL Server 2005 Image column with great success. We don't see heavy demand for these files, so performance is really not that big of an issue in our particular situation.
One of the benefits of storing the files in the database is that it makes disaster recovery a breeze. It also becomes much easier to manage file permissions as we can manage that on the database.
Windows Server has a File Replication Service that I would not recommend. We have used that for some time and it has caused alot of headaches.
DFS is probably the easiest solution to setup, although depending on the reliability of your network this can become un-synchronized at times, which requires you to break the link, and re-sync, which is quite painful to be honest.
Given the above, I would be inclined to use a SQL Server storage solution, as this reduces the complexity of your system, rather then increases it.
Do some tests to see if performance will be an issue first.

Resources