Scaling exercises to practice - scaling

I was trying to find some exercises online to practice scaling techniques (memcached, SQL optimization, sharding databases), but I could only find descriptions of these techniques, not any project on which to try them.
This link with slides on scaling techniques is an interesting one, as it sums up some tools to achieve scalability quite well.
Is there a Project Euler kind of site for these kinds of activities? Or at least some exercises (such as a downloadable ASP.NET/PHP site with obvious slowdowns, concurrency issues, subtle bugs) for people to try and learn how to fight this issue?

I find that the site High Scalability has some nice insights.

It might be interesting to hack at WordPress. Its caching plugins take care of a lot of scaling issues, but it would be cool to write your own plugin or hack at the source to cut down on SQL queries or to cache static pages. If you come up with something, make sure to let the rest of the community know!

George's slides are definitely a good basis to work from. Note that he is not talking about a specific technique or technology; rather he's discussing more general architectural and design decisions that will help your application scale as a whole.
I personally think this sort of high-level thinking would be much more valuable than individual optimisation techniques. Perhaps you could take a well known web application and hack it until it scales well across multiple machines? A cluster of lots of cheap, low-power EC2 machines could be really useful here. Getting an existing or new application to run properly across a number of machines would be a fantastic exercise.
Counter-intuitively, rather than getting as much as possible to run on a single machine, I'd say it would be much more educational to get the same application running on several machines.
Once you have that, it makes sense to move onto more specific improvements like a separate static content tier, memcached, DB sharding, batch operations and so on.
In terms of specific projects to work on, how about cloning Twitter, Flickr or The Pirate Bay? They've all had performance and scaling challenges in the past.
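To make two of the specific improvements mentioned above a bit more concrete, here is a minimal sketch of a cache-aside read and of hash-based shard selection. Everything in it is an assumption for illustration: `SlowStore` stands in for a slow SQL query, a plain dict stands in for memcached, and `shard_for` is a hypothetical helper, not part of any library.

```python
import hashlib


class SlowStore:
    """Stand-in for a slow SQL query; counts how often it is hit."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = 0

    def load(self, key):
        self.calls += 1
        return self.rows[key]


def cached_load(cache, store, key):
    """Cache-aside read: try the cache first, fall back to the store on a miss."""
    if key in cache:
        return cache[key]
    value = store.load(key)
    cache[key] = value  # populate the cache for the next reader
    return value


def shard_for(key, shard_count):
    """Pick a shard by hashing the key, so the same key always maps to the same shard."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


store = SlowStore({"user:1": "alice", "user:2": "bob"})
cache = {}  # a plain dict standing in for memcached

first = cached_load(cache, store, "user:1")   # miss: goes to the store
second = cached_load(cache, store, "user:1")  # hit: served from the cache
```

A useful exercise is to swap the dict for a real cache server and then work out how to invalidate entries when the underlying row changes, which this sketch deliberately ignores.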

Related

Is programming in layers real?

I am fairly new to product development and I am trying to work on a product. The problem I have noticed is that people draw diagrams and charts showing different modules and layers.
But as I am working alone (I am my own team), I got a bit confused about how the parts of the program interact during development, and I am wondering whether developing a product in modules is real or not.
Maybe I am not a great programmer, but I see no boundaries when data start to travel from frontend to backend.
I've written a lot of layered applications. Layering can be a useful pattern, but it can lead you astray too, and thinking in modules is a bit more useful.
One problem with layers is that they're often used as a reason for repackaging data as it flows through the system, when the data is packaged perfectly well when it enters the system, such as from a database.
Another issue is that layering by its very nature stacks modules on top of one another - this is just too naive for most systems.
I suggest you get a good book on design patterns and spend some time studying and understanding the trade-offs with different architectural approaches. Developing modular applications is not easy but it's worth taking the time to do it well.

How do you deal with design changes?

I just finished working on a project for the last couple of months. It's online and ready to go. The client is now back with what is more or less a complete rewrite of most parts of the application. A new contract has been drafted and payment made for the additional work involved.
I'm wondering what would be the best way to start reworking this whole thing. What are the first few things you would do? How would you rework the design in a way that you stay confident that the stuff you're changing does not break other stuff?
In short, how would you tackle drastic application design changes efficiently (both DB and code)?
Presuming that you have unit tests in place, this is just refactoring.
If you don't have unit tests in place, then:
1. Write unit tests for the parts you're likely to keep.
2. Write unit tests for the parts you're going to change.
3. Run the tests. The "keep" tests should pass. The "change" tests should fail.
4. Start refactoring until the tests pass.
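The steps above can be sketched roughly as follows. This is only an illustration under assumptions: `legacy_total` and `invoice_label` are hypothetical functions, and plain check functions stand in for a real test framework so that the pass/fail pattern is easy to see.

```python
def legacy_total(prices):
    """Hypothetical behaviour we intend to keep as it is."""
    return sum(prices)


def invoice_label(order_id):
    """Hypothetical behaviour the client wants changed.

    It currently returns e.g. 'Order 7'; the new spec asks for 'INV-0007'.
    """
    return "Order %d" % order_id


def check_keep():
    # Pins down existing behaviour that must survive the rework.
    assert legacy_total([1, 2, 3]) == 6


def check_change():
    # Describes the new behaviour; expected to fail until the code is changed.
    assert invoice_label(7) == "INV-0007"


def passes(check):
    """Run one check and report whether it passed."""
    try:
        check()
        return True
    except AssertionError:
        return False


keep_ok = passes(check_keep)      # should pass before any refactoring
change_ok = passes(check_change)  # should fail before any refactoring
```

Once `invoice_label` is rewritten to the new format, both checks pass, and the "keep" check tells you that nothing else broke along the way.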
This is not a new thing in software; people have done this and written a lot about it.
Try reading:
Working Effectively with Legacy Code
Refactoring Databases: Evolutionary Database Design
The techniques explained in these books are invaluable for sustaining any kind of long-running IT project.
Database design is different from application design in this regard.
Very often, client rethinking changes the application completely, but changes little, if anything, in the fundamental underlying data model of the enterprise. The reason for this is that clients tend to think in terms of business processes, but not in terms of fundamental data. Business processing and data processing are tightly coupled. Data storage is less tightly coupled.
In the days of classical database design, designers learned how to exploit this pattern, by dividing their database design into (at least) two layers: logical design and physical design. There are any number of times that a change of business process requires a complete rewrite of the application, and a major rework of the database physical design, but requires few, if any, changes to the logical design.
If your database design didn't separate out the layers like this, it's hard to tell what gets affected and what doesn't. Start with your tables and columns. Ask yourself if any of the changes require removing any column from the table it's in, or require inventing new columns. If the answer is no, you're in luck. Next, look at the constraints placed on the database (things like PRIMARY KEY, FOREIGN KEY, UNIQUE and NOT NULL). These constraints might be tightened or loosened by the client's changes. If not, you're in luck. If you didn't declare any constraints in the database, and chose to do all your integrity protection in application code, you're probably out of luck.
You still have a fair amount of work to do in terms of changing the indexes on the tables, and the way the application works with the data. But you've salvaged part of the investment in the old system.
The application itself is much more vulnerable to client changes in process than the database. If your database design was completely driven by your application design, you may be out of luck.
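As a small illustration of the point about declaring constraints in the database rather than only in application code, here is a sketch using Python's built-in `sqlite3` module. The `customer` and `orders` tables are hypothetical; your own schema will differ. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Hypothetical schema: the integrity rules live in the table definitions.
conn.execute("""
    CREATE TABLE customer (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    )""")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id)
    )""")

with conn:  # commit the seed row so later rollbacks do not discard it
    conn.execute("INSERT INTO customer (id, email) VALUES (1, 'a@example.com')")


def rejected(sql):
    """Return True if the database itself refuses the statement."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_email = rejected(
    "INSERT INTO customer (id, email) VALUES (2, 'a@example.com')")
orphan_order = rejected(
    "INSERT INTO orders (id, customer_id) VALUES (1, 99)")
valid_order = rejected(
    "INSERT INTO orders (id, customer_id) VALUES (2, 1)")
```

Because the rules are declared in the schema, they keep protecting the data even if the application on top is rewritten from scratch.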
If it's THAT drastic of a change it might be best to just start over. I've worked on a number of projects that have gone through some drastic changes.
Starting over gives you a chance to use the experience gained since the last project and provide a more efficient product.
I would recommend against trying to rework the old site into the new site; you'll probably spend more time fiddling around changing things than you would have if you had just rewritten it.
Best of luck to you!
How would you rework the design in a way that you stay confident that the stuff you're changing does not break other stuff? In short, how would you tackle drastic application design changes efficiently (both DB and code)?
Tests, code complexity/coverage metrics, and a continuous integration system. Run them early and often, so you know which parts are the riskiest and where to start writing.
These will become your safety nets when you have to make potentially problematic changes. If something does break, your CI system will tell you, and you won't have spent weeks down some rabbit hole before you realize there's a problem.
Sometimes you do things better the second time around so just try and stay positive. Plus you will have more domain knowledge this time around.

Automating paper forms and process flow in the office

I have been tasked with automating some of the paper forms in HR. This might turn into "automate all forms" eventually, so I want to approach this in a way which will be best for the long term and will be a good framework as this project grows.
The first things that come to mind were:
-InfoPath/SharePoint (We don't currently use SharePoint, and it wouldn't be an option for the next two years.)
-Workflow Foundation (I've looked into this and it does not seem too attractive or appropriate)
Options I'm considering at this point:
-Custom ASP.NET (VB.NET) & SQL Server, which is what my team mostly writes their apps with.
-Leverage InfoPath for creating the forms electronically. Wondering if there is a good approach to integrating this with a custom-built ASP.NET app.
-Considering creating the app as an MVC web app.
My question is this:
-Are there other options I might want to consider?
-Are there any starter kits or VB.NET-based open source projects out there which would be a starting point or could be used as a good reference? Here I'm mostly concerned with the workflow processing.
-Any comments from those who have gone down this path?
This is going to sound really dumb, but the main lesson from my many years of helping companies automate paper form-based processes is to understand the process first. You will most likely find that no single person understands the whole thing. You will need to role-play the many paths through the process to get your head around it. And once you present your findings, everyone will be shocked because they had no idea it was that complex. Use that as an opportunity to streamline.
Automating a broken process only makes it screw up faster and tell a lot of people.
As far as tools, my experience dates me but try to go with something with these properties:
EASY to change. You WILL be changing it. So don't hard-code anything.
Possible revision control - changes to a process may or may not affect documents already en route?
Visual workflow editing. Everyone wants this but they'll all ask you to drive it. Still, nice tools.
Not sure if this helps or not - but 80% of success in automating processes is not technology.
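One way to read "don't hard-code anything" is to keep the workflow itself as data, so that changing the process means editing a table rather than the program. A minimal sketch, assuming a hypothetical leave-request form with made-up state and action names:

```python
# Hypothetical leave-request workflow, stored as data rather than code.
# Each state maps the allowed actions to the state they lead to.
WORKFLOW = {
    "submitted": {"approve": "manager_approved", "reject": "rejected"},
    "manager_approved": {"approve": "hr_approved", "reject": "rejected"},
    "hr_approved": {},
    "rejected": {},
}


def advance(workflow, state, action):
    """Return the next state, or raise ValueError if the action is not allowed."""
    try:
        return workflow[state][action]
    except KeyError:
        raise ValueError("action %r not allowed in state %r" % (action, state))


# Walk one form through the happy path.
state = "submitted"
for action in ["approve", "approve"]:
    state = advance(WORKFLOW, state, action)
```

In a real system the table would probably live in the database with a version number, which is one way to handle the question of documents that are already part-way through an older revision of the process.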
This is slightly off topic, but related - defect tracking systems generally have workflow engines/state. (In fact, I think Joel or some other FC employee posted something about using FB for managing the initial emails and resume process)
I second the other advice about modeling the workflow before doing any coding or technology choices. You will also want this to be flexible.
as n8owl reminded us, automating a mess yields an automated mess - which is not an improvement. Many paper-forms systems have evolved over decades and can be quite redundant and unruly. Some may view "messing with the forms" as a violation of their personal fiefdoms, so watch your back ;-)
1. model the workflow in terms of the forms used by whom in what roles for what purposes; this documents the current process as a baseline. Get estimates of how long each step takes, both in terms of man-hours and calendar time
2. understand the workflow in terms of the information gathered, generated, and transmitted
3. consolidate the information on the forms into a new set of forms for minimal workflow
4. be prepared to be told "This is the way we've always done it and we're not going to change", and to gently (a) validate their feelings, (b) explain how less work is more efficient, and (c) show concrete benefits [vs. the baseline from step 1]
5. soft-code when possible; use processing rules when possible; web services and html forms (esp. w/jquery) will go a long way if you have an intranet
6. beware of canned packages (including sharepoint) unless you are absolutely certain they encompass your organization's current and future needs
good luck!
--S
I detect here a general tone of caution with regard to a workflow-based approach, and I must agree. Be aware of the caveats of most workflow technologies, which sacrifice usability for flexibility.

How many organizations use vendor-supplied SOA stacks?

My workplace recently started a SOA initiative. After a year-long examination of the biggest vendors (IBM and Oracle) they have decided which one to use and are now in the process of investing quite a lot of money in the whole SOA stack (application servers, BAM, process servers, ESB, UDDI-like solution etc).
How many organizations are really using a full-blown SOA stack? Has this technology shown any proof of being better? I'm afraid of a 'Silver Bullet' syndrome.
I work at a SOA shop (and we sell our own stack...perhaps you bought ours!), and it can really help businesses become more agile...if it's done well.
The problems come when:
People start making everything into a service, and you end up with just as many interconnections and interfaces as you had before you inserted an ESB. This makes change very difficult.
If you're using BPM with human interfaces: people don't 'get' portlets. Instead of making individual portlets do one task each, they make them do lots of things, which defeats the object of BPM. I can expand on this lots if necessary, but this might not be relevant to you.
It's all implemented at once. It's a massive system change, so try and do it slice by slice. (for example: just front your existing systems with web services, and build a new UI on top. Then gradually replace the UI calls that went to the old system with ones to the new system.) This will aid user acceptance as well as be a much safer way to do things. Management possibly won't want this approach (it's harder to manage) but emphasise the benefits. A lot.
It's sold as codeless development. This doesn't exist, and probably never will. Even if you don't have to write any code, if you don't know how code works then your solution will be incomplete, ill-thought-out or unmaintainable.
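The slice-by-slice approach described above (front the existing systems with services, then move calls over gradually) can be sketched as a small routing facade. This is only an illustration under assumptions: `LegacySystem`, `NewSystem`, `ServiceFacade` and the operation names are all hypothetical, and in practice the facade would be a set of web services rather than in-process classes.

```python
class LegacySystem:
    """Hypothetical existing system, still serving every operation."""

    def get_customer(self, customer_id):
        return {"id": customer_id, "source": "legacy"}

    def get_orders(self, customer_id):
        return {"customer": customer_id, "source": "legacy"}


class NewSystem:
    """Hypothetical replacement; only the migrated slice exists so far."""

    def get_customer(self, customer_id):
        return {"id": customer_id, "source": "new"}


class ServiceFacade:
    """The UI talks only to this facade; routing moves one operation at a time."""

    def __init__(self, legacy, new, migrated):
        self.legacy = legacy
        self.new = new
        self.migrated = set(migrated)  # names of operations already moved over

    def call(self, operation, *args):
        backend = self.new if operation in self.migrated else self.legacy
        return getattr(backend, operation)(*args)


facade = ServiceFacade(LegacySystem(), NewSystem(), migrated=["get_customer"])
customer = facade.call("get_customer", 42)  # served by the new system
orders = facade.call("get_orders", 42)      # still served by the legacy system
```

Migrating another slice then means implementing it in the new system and adding its name to `migrated`, while the UI code stays untouched.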
From what I've seen, if SOA's done well then your business can turn on a sixpence, and it's cool. If it's done badly then it probably won't be worse than your legacy system, but it won't be better, and you'll have had an expensive and painful time in between for nothing.
I could give you a customer list but I dunno if I should, so I'll leave it. Suffice it to say we have some massive, everyday brand names using our stack end to end.

Productivity gains of using CASE tools for development

I was using a CASE tool called MAGIC for a system I'm developing. I had never used this kind of tool before, and at first sight I liked it. A month later I had a lot of the application generated; I felt very productive and, I would say, satisfied.
In some ways I felt uncomfortable, because there was no code and none of the things I was used to, but on the other hand I could speed up my development. In the end I returned to C# because I find it more flexible to develop with: I can do unit testing, use CVS, I have access to more resources and, basically, I have "all the control". I felt that the tool didn't give me confidence, and I thought that later in the project I would not be able to manage it because of its rigid, enforced development rules. A lot of things, like sending emails or using my own controls, also had their complications; it seemed that at some point it was not going to be as easy as I initially thought and as the product claims. This reminds me of a very nice article called "No Silver Bullet".
This CASE tool had its advantages, but on the other hand it doesn't have resources you can consult, and the license and certification are very expensive. Another disappointing thing for me is that its simplistic approach to development scared me, first because of my inexperience with these kinds of tools, and second because I thought that if I continued using it, it might turn into a complex monster that I could not manage later in the project.
I think it's good to use these kinds of solutions to speed things up, but I wonder: why aren't these programs as popular as VS.NET, J2EE, Ruby, Python, etc., if they claim to enhance productivity more than the tools I've mentioned?
We use a CASE tool at my current company for code generation and we are trying to move away from it.
The benefits that it brings - a graphical representation of the code making components 'easier' to pick up for new developers - are outweighed by the disadvantages in my opinion.
Those main disadvantages are:
We cannot do automatic merges, making parallel development on one component close to impossible.
Developers get dependent on the tool and 'forget' how to handcode.
Just a couple questions for you:
How much productivity do you gain compared to the control that you use?
How testable and reliable is the code you create?
How well can you implement a new pattern into your design?
I can't imagine that there is a CASE tool out there that would let me write a test first and then generate the code I need. I'd rather stick to ReSharper, which can easily do my mundane tasks, and retain full control of my code.
The project I'm on originally went w/ the Oracle Development Suite to put together a web application.
Over time (5+ years), customer requirements became more complex than originally anticipated, and the screens were not easily maintainable. So, the team informally decided to start doing custom (hand coded) screens in web PL/SQL, instead of generating them using the Oracle Development Suite CASE tools (Oracle Designer).
The Oracle Report Builder component of the Development Suite is still being used by the team, as it seems to "get the job done" in a timely fashion. In general, the developers using the Report Builder tool are not very comfortable coding.
In this case, it seems that the productivity aspect of such CASE tools is heavily dependent on customer requirements and developer skill sets/training/background.
Unfortunately the Magic tool doesn't generate code, and it can't implement a design pattern either. I don't have control over the code because, as I stated before, there is no code to modify. The bottom line is that it can speed up productivity in some ways, but it is impossible to use CVS or patterns with it, and I can't control all the details.
I agree with gary when he says "it seems that the productivity aspect of such CASE tools is heavily dependent on customer requirements and developer skill sets/training/background", but I also can't agree more with Klelky:
Those main disadvantages are:
1. We cannot do automatic merges, making it close to impossible for parallel development on one component.
2. Developers get dependent on the tool and 'forget' how to handcode.
Thanks

Resources