Cassandra and asp.net (C#) - asp.net

I am interested to create portal on cassandra services, since I faced some performance and scale issues starting from 1 million of records.
Definitely, it could be solved, but I am interested on other options.
My main issues is cost of updating all necessary indexes, to make reading fast.
First, is cassandra is good way for asp.net programmers? I mean, maybe there is some other projects, which worth to take a look
And second, can you provide any documentation samples on how to start with cassandra programming from C#?

since I faced performance and scale issues starting from 1 million of records.
Maybe your design was not that good, NoSQL is not a magic bullet for bad design. I have multi billion row tables and 95% of the response is sub second. Also what do you mean by updating indexes, do you mean updating statistics or rebuilding indexes?

since I faced performance and scale
issues starting from 1 million of
records.
You know, the one million mark for modern databases is where it is not something "totally ridiculously small" where you can ignore actually knowing what you do. Below one million is "tiny". I have a 800 million row table and get a LOT of sql running through with it - no problem at all.
First, is cassandra is good way for
asp.net programmers?
I would more suggest a basic book about SQL, reading the documentation and POSSIBLY throwing some hardware on the problem. As in: having totally bad hardware will kill all data management systems.

If you are using Cassandra for your .NET Application take a look at Aquiles. I developed it based on my company needs. If you find it useful or need any help let me know.

You can't really speak of Cassandra documentation. There's a myriad of partial tutorials on the web.
You may want to setup Linux in a virtual machine, because the windows build process is quite challenging, to say the least. (http://www.virtualbox.org, http://www.ubuntu.com)
Here's the howto:
http://www.ridgway.co.za/archive/2009/11/06/net-developers-guide-to-getting-started-with-cassandra.aspx
Note that the cassandra SVN url and the code sample have changed since the writing of this tutorial.
Here's another C# client:
http://github.com/mattvv/hectorsharp
And here some sample code:
http://www.copypastecode.com/26752/
Note that you need to download the latest Java Development Kit (JDK) from Sun for Linux.
It's not in the repositories of Ubuntu 10.04.
Then you need to type
export JAVA_HOME="/path/to/jdk"
in order for Cassandra to find your Java installation.
You might also want to take a look at:
http://en.wikipedia.org/wiki/NoSQL
Especially the taxonomy section is interesting.
Make sure Cassandra is the right type of NoSQL solution for your problem, e.g. use Neo4J if your problem actually is a graph problem.
Also, you need to make sure your NoSQL solution is ACID-compliant.
For example, Neo4J is the only ACID-compliant NoSQL graph engine.
Edit: Here's a jumpstart guide for Windows, without compiling:
http://coderjournal.com/2010/03/cassandra-jump-start-for-the-windows-developer/
http://www.ronaldwidha.net/2010/06/23/running-cassandra-on-windows-first-attempt/
http://www.yafla.com/dforbes/Getting_Started_with_Apache_Cassandra_a_NoSQL_frontrunner_on_Windows/

Instead of cassandra you might take a look at: ravendb. Supposedly it is a document store made with and created for .Net. It has Linq integration, and is (again supposedly) very fast.
As with any new technology, read if it helps you with your specific case, and check if it is proven technology (Do they have mainstream clients using it).
Before you go into this route see if you can't optimize your current solution first. Check if your queries are fast, if the indexes are done correctly, and if you can't remove load by adding caching.
Last nut not least, if adding some processors to your SQL machine might fix issues, it is typically a much cheaper solution.

If you want to do something new, then instead of going for noSQL, you might want to consider trying a database cluster.
The idea is when two machines each search half of the original database at the same time, you have half the search time without totally redesigning your existing database.

Related

Asp.net NHibernate CPU performance after upgrade

Has anyone else had CPU spikes after switching over to NHibernate?
We switched to using NHibernate about 2 years ago. Since then we've had issues with the server running using the CPU near 60 - 80. We also had issues with the server running out of memory.
Weve consistently been told to optimize our query. Which we did with only limited success. It wasnt until I recently upgraded from NHibernate 2.1 to 3.2 that we finally saw an improvement in the CPU. It dropped from a 60 percent average to about 30 percent. I was amazed, I was told by many who consider themselves experts that upgrading NHibernate would only produce limited improvements if any at all.
My question is ... Has anyone else noticed CPU spikes with NHibernate nd have they seen any improvement after doing a major version upgrade. And last, why exactly is the new version performing so much better? I know NHibernate 3 has a lot better support for linq and about 70 percent of my queries use Linq, so my guess is that may be part of the reason I'm seeing better performace.
Also, does anyone have any ideas how I can optimism NHibernate to produce even better CPU performance other than upgrading the dlls which I have already done.
I'm currently running NHibernate 3.2 and fluent NHibernate 1.2 upgraded from 2.1 and 1.0 respectively.
I suspect you have been told the same as I am about to recommend, but I urge you to look at all possibilities and discount them.
Weve consistently been told to optimize our query - Suspicions always lie with either the SQL generated by the ORM or the amount of time the DB takes to execute the query. This is sound advice and you must disprove this by using the following methods.
First I would set up a trace on the live database server that runs for a week. Once this is done you may find that you get suggestions on indexes or SQL related issues.
Secondly I would fire up NHProf on my development box and run some stress tests against heavily used pages or pages that have a lot of database trips to see what is going on behind the scenes with NHibernate. NHProf will give you advice about various problems including; select n+1, unbounded results, large number of rows returned, queries with too many joins etc. Again this tool is invaluable to bridge the gap between SQL server and your code.
Hopefully after this exercise you will have ideas on how to fix certain issues, introduce caching OR if you find you don't have any items to address give you valuable feedback that you can then post to the NHUser group.
After all if you think about it tens of thousands of users use NHibernate. I have used NHibernate myself for several years and subscribe to the NHusers group and I have not seen the CPU spike issue before. Always it turns out to be either; the SQL generated, the database is under pressure or large recordsets being hydrated

What about OpenEJB? Is it worth it? Any opinions?

I whould like to know some opinions about OpenEJB: we are considering to use it on a new project, but really didn't found many opinions about it.
So, here is my question: how about it? Does it perform well? Is it stable enough for a production environment?
We switched to OpenEJB (deployed embedded in our app on Tomcat). Performance tests showed better or not worse results processing our transactions compared to JBoss (transactions include data access, JMS, and servlets). We use ActiveMQ within OpenEJB for JMS. There are no stability problems as of yet - we are still in staging (pre-production) environment though. The documentation is definitely lacking, but not as poor as other embedded choices. Overall, we consider this as a good choice if you run on Tomcat. Deploying it on other application servers turned out to be much more difficult (JBoss, Weblogic, Websphere) but there are not many reasons for this usually (we had few but dropped this after several attempts basically failed).
And as in all open source products: expect lack of support (documentation, troubleshooting, bugs, etc.) to be compensated by free access to sources.
We've had experience with Oracle OAS and JBOSS before. We decided to give OpenEJB a try. We've found out that it is not only very fast but it also much easier to setup and configure, and it has much better defaults.
Currently we implement our own failure measures in the client, so we don't know how they compare for clustering, or other advanced features that we don't use.
We we have to go back and deal with JBOSS in the developer side, we see a drop on productivity, because it takes too long to bootstrap.

Caching Solutions

Has anyone done a thorough comparison of AppFabric and NCache or AppFabric and ScaleOut? We are currently looking to implement either AppFabric, NCache or ScaleOut for distributed caching in geographically distant locations and I would like to know anyone's thoughts who has compared them side by side. I appreciate that many people use one or the other and tell me why their chosen solution is great but I am really looking for a comparison of the two products. Such things as what does AppFabric not do or not do well (if anything), partially from a features point of view but also from developer's point of view. Is working with one compared to the other nicer, easier, more flexible, more powerful, etc.
There are plenty of lists of features which I can compare but am really looking for a comparison from someone who has perhaps been in a similar position to us and has performed the evaluation that we are about to launch into which will give us some food for thought whilst we do so.
Thanks in advance.
Here is a good comparison between the features of NCache and Appfabric
As a more mature product, NCache has a number of more advanced caching features that Velocity/AppFabric doesn't have -- check out their website for some "marketing" comparisons.
However, we have had a number of issues troubleshooting NCache and obtaining more visibility from their support/engineering team into certain behaviors of their application. Given that, plus the cost compared to AppFabric, I'm not sure I would recommend NCache at this point -- at least, we're in the process of re-evaluating our caching provider.
My frustration/complaint with Velocity/AppFabric is the the sluggishness in the release schedule. Seems like they were in CTP forever. Certainly Microsoft can crush NCache on price alone. There are now players like NorthScale (memcached) that are entering the fray which I think are also worth considering. A lot depends on what you want to use caching for in your application.
The most used one is Memcached. for sure.
we currently are starting using AppFabric as our dcache, as it easily integrates into our .net solutions, and has a good feature set, that we want to use.
if you just do basic dcaching, make a abstraction of caching itself (or use the .net 4 System.Runtime.Caching.ObjectCache) so you are safe if you want to do changes. or want to stress test more solutions.
Also, depending on your App architecture, think of using more entities/instances of your DCache, as different parts maybe favor different systems.
It is looking like we will need more advanced functionality than what Velocity provides so it will be either NCache or ScaleOut. There are good reasons for both, we just need to sort through these. We do not have Unix resources so memcached is out. I know there is a Windows port but colleagues who know memcached tell me that it is somewhat buggy and if you are going to bother going down the memcached path, you really should make the effort to go for the Unix version.
Some might argue that this is a biased comparison, but it's worth reviewing..
http://www.alachisoft.com/comparison/ncache-vs-appfabric.html
PDF has the full review.
http://www.alachisoft.com/downloads/comparison/ncache-vs-appfabric.pdf

What are you using for Distributed Caching in web farms running ASP.NET?

I am curious as to what others are using in this situation. I know a couple of the options that are out there like a memcached port or ScaleOutSoftware. The memcached ports don't seem to be actively worked on (correct me if I'm wrong). ScaleOutSoftware is too expensive for me (I don't doubt it is worth it). This is not to say that I don't want to hear about people using memcached or ScaleOutSoftware. I'm just stating what I "know" at this point.
So my question is basically this: for those of you ACTIVELY using distributed caching, what are you using, are you happy with it, and what should I look out for?
I am moving to two servers very soon...both will be at the same location. I use caching fairly heavily (but carefully) to reduce the load on my database server.
Edit: I downloaded Scaleout Software's solution. I've coded for it and it seems to work real well. I just have to decide if my wallet will part with the cash for it. :) Anyone have experiences good or bad with ScaleoutSoftware?
Edit Again: It's been a little while since I asked this? Any more thoughts on it? We ended up buying the solution from ScaleOutSoftware and have been happy with it, but I'm curious what others are doing.
Microsoft has a product pending code-named Velocity. It's still in CTP, and is moving slowly, but looks like it will be pretty good. We'll be beating it up in the near future to see how it handles what we want it to do (> 2 million read/writes per hour). Will post back with results.
There is a 100% native .NET, well documented open source (LGPL) project called Shared Cache. Looks like it is not yet mentioned on SO, but it's promising and should be able to do what most people expect from a distributed cache. It even supports different strategies like distributed or replicated caching etc.
I will update this post with more details as soon as I had a chance to try it on a real project.
We're currently using an incredibly simple cache that I wrote in a couple of hours, based on re-hosting the ASP.NET cache in a Windows Service (more info and source code here). I won't pretend it's anywhere near as optimised as something like Memcached but we were just looking for something simple and free until Velocity came along, and it's held up extremely well even under fairly heavy load.
It comes down to our personal preference for core components - i.e. ones that affect whether the site is available or not - that they are either (a) supported by a vendor with a history of rapid and high quality support, or (b) written by us so that if something goes wrong we can fix it quickly. Open source is all well and good, and indeed we do use some OSS, but if your site is offline then unfortunately newsgroups et al don't have a 1 hour SLA, and just because it's OSS doesn't mean you have the necessary understanding or ability to fix it yourself.
We are using the memcached port for Windows and we are very pleased with it. The enyim.com memcached client API is great and easy to work with. It's also open source, which is a big advantage, if you ask me.
We are now using this setup in a production web-app and it has helped a lot in improving its performance.
There's a great .NET wrapper/port found here on Codeplex. Awesomesauce!
We use memcached with the enyim library in a production environment (www.funda.nl). Works fine, very pleased with it, but we did notice a substantial raise in CPU use on the clients. Presumably due to the serializing/deserializing going on. We do around 1000 reads per second.
One tried and tested product by 100's of customers worldwide is NCache. Its
a feature rich product that lets you store session state in a redundant and highly available manner, lets you share data
within the enterprise as well as bridging for WAN communication essentially acting as a data fabric and lastly it lets you build an elastic caching tier so that when
your application scales, you can add servers to the cache and actually boost performance further.

What are your experiences with Windows Workflow Foundation?

I am evaluating WF for use in line of business applications on the web, and I would love to hear some recent first-hand accounts of this technology.
My main interest here is in improving the maintainability of projects and maybe in increasing developer productivity when working on complex processes that change frequently.
I really like the idea of WF, however it seems to be relatively unknown and many older comments I've come across mention that it's overwhelmingly complex once you get into it.
If it's overdesigned to the point that it's unusable (or a bad tradeoff) for a small to medium-sized project, that's something that I need to know.
Of course, it has been out since late 2006, so perhaps it has matured. If that's the case, that's another piece of information that would be very helpful!
Thanks in advance!
Windows Workflow Foundation is a very capable product but still very much in its 1st version :-(
The main reasons for use include:
Visually modeling business requirements.
Separating your business logic from the business rules and externalizing rules as XML files.
Separating your business flow from your application by externalizing your workflows as XML files.
Creating long running processes with the automatic ability to react if nothing has happened for some extended period of time. For example an invoice not being paid.
Automatic persistence of long running workflows to keep resource usage down and allow a process and/or machine to restart.
Automatic tracking of workflows helping with business requirements.
WF comes as a library/framework so most of the time you need to write the host that instantiates the WF runtime. That said, using WCF hosted in IIS is a viable solution and saves a lot of work. However the WCF/WF coupling is less than perfect and needs some serious work. See here http://msmvps.com/blogs/theproblemsolver/archive/2008/08/06/using-a-transactionscopeactivity-with-a-wcf-receiveactivity.aspx for more details. Expect quite a few changes/enhancements in the next version.
WF (and WCF) are pretty central to a lot of the new stuff coming out of Microsoft. You can expect some interesting announcements during the PDC.
BTW keeping multiple versions of a workflow running takes a bit of work but that is mostly standard .NET. I just did a series of blog posts on the subject starting here: http://msmvps.com/blogs/theproblemsolver/archive/2008/09/10/versioning-long-running-workfows.aspx
About visually modeling business requirements.
In theory, this works quite well with a separation of intent and implementation. However, in practice, you will drop quite a few extra activities on a workflow purely for technical reasons, and that sort of defeats the purpose as You have to tell a business analyst to ignore half the shapes and lines.
Related question: When to use Windows Workflow Foundation? My answer there:
You may need WF only if any of the
following is true:
You have a long-running process.
You have a process that changes frequently.
You want a visual model of the process.
For more details, see Paul Andrew's
post: What to use Windows Workflow Foundation for?
Please do not confuse or relate WF
with visual programming of any kind.
It is wrong and can lead to very bad
architecture/design decisions.
So, if you have such requirements, then WF is a good candidate. Of course it is relatively complex, but mention that the problems that is trying to solve is also complex (and sometimes very complex). IMHO, it is very complex for example to dehydrate/rehydrate objects that have event handlers attached (with events that can be triggered when the object is not in memory).
I can not judge what you mean by "small to medium-sized project", but in general I would say that if your project has at least two requirements from the above list, then you can consider WF as a solution.
We've used WF in a large-ish SharePoint application and I can say it's OK. It has lots of power and flexibility. and, as Kevin mentions, once you grok the underlying concepts of workflows, you can do pretty much anything you want with it.
On the other hand, it has some really serious issues, like lack of versioning, which can really hurt your application in the future. We've been forced to deploy up to 3 parallel versions of the same workflow named xxx-v1, xxx-v2 and xxx-v3 to keep older instances running and have new instances use the updated versions. A real pain in the ass. Oh, and there are also some really non-intuitive concepts in there (correlation tokens, wtf??)
We had a project at work that I was involved in using Workflows.
The idea (from management), was that us programmers would write the Workflow Activities along with the "engine" and framework. Then non-programmers would take care of all the rest by compiling their own Workflows into dlls which the engine would automatically load.
Management was sold on this idea of non-programmers using Workflow to help develop software, and it was pretty much a complete waste of time. The problem we were trying to solve with this project was relatively complex and we knew from the very beginning that the software would have to be modified almost constantly (its calculations were dependent on other companies and governements).
The end result was that we were unable to make the Workflow modules generic enough for anyone else to use. So the programmers were the ones who were forced to work with the Workflows, and all the Workflows did was get in our way.
I've been using Workflow 4.0 for the last few months and although mostly impressed, I've found it extremely hard to learn.
For the most recent version (that comes with .NET 4.0 RC), there is next-to-no documentation on the web, in any books or no training courses available. I've only found articles relating to the now defunct 3.0 version. Even the MSDN documenation is light on the ground.
The workflow designer is not as intuitive as it should be by any means so learning is very hard. I've had to rely on answers from a single person on StackOverflow (thanks by the way Maurice!) - and I would be stuffed without his help.
So in summary, I think it has potential but you would be quite mad to learn it yet - wait for more training, documentation and books otherwise you will be going into it blind!
Last year we completed a working application with WF, now used as the backbone of an unbelievably huge system which is used by a very big bank for its mortgage process. The pe process has many steps starting from customer application to approval of credit.
Although it was a success, there were so many problems and crisis all along the way. And it wont worth the trouble for any smaller size projects.
I consider MS WF as a low-level workflow library rather than a fully fledged enterprise workflow product such as K2. It will enable you to build a workflow enabled application, but is not in itself a workflow application. My experiance of it in this capacity has been positive, although we have had to build a lot of our own infrastructure around it (a pub/sub framework, a worlkflow lifetime manager etc). A lot of the documentation out there is fairly simplistic and does not cover building up an enterprise workflow application based on MS WF.
Hard to learn. Quite flexible. Not to be confused with a visual tool for end users, only for programmers. Not sure if I like the dependancy property approach.
It really depends on what you want to do with it. I've only used it a little, but compared to more mature products like MetaStorm (I know technically it's a BPM, but there is still a workflow component), Process Choriographer and IBM MQ workflow, there's no comparison. It's just not mature enough. On the other hand it's free where the others are not and can probably get the job done. I don't know if I would place a multi-million dollar operation on it, but with smaller ones, I'd give it another shot. The real hurdle you are going to face is the change in thought process it requires. If you don't have developers that have worked with state systems before that can be a real hurdle.
Brian, I can't reply to your comment, but anyway, by versioning i mean making changes to the underlying code of the workflow without breaking already running instances, and gracefully applying updates to existing workflows. I'm not sure about 'stock' WF, but at least in SharePoint environment there's no concept of workflow versions so new versions have to be deployed as completely different workflows which becomes a maintenance nightmare.
This has nothing to do with 'rehydration', rehydration is the process by which you bring a 'dormant' workflow back to activity after some event or change in state. That is handled transparently by the workflow runtime.
WF ist integrated into SharePoint (WSS 3.0), and i have created quite a few workflows for various SharePoint-Websites, so i can tell about my experience of WF in SharePoint. Compared with other workflow-frameworks WF scores well. It's stable (i haven't experienced any mysterious errors), workflows are fairly easy to design (thank to the workflow-designer in Visual Studio) and you can use not only sequential but also state-machine workflows.
It's not perfect, of course, and a developer will definitly need some time to understand the concept (of i.e. the Activity Model); but it's definitely useable - even for "small tasks".
Never tried WFF, but I remember reading this article about WFF by Leon Bambrick where he basically says the whole genre of software development tools is nonsense. Might help you decide one way or the other.

Resources