Is there a standard practice for storing default application data? - application-data

Our application includes a default set of data. The default data consists of coefficients and other factors that are unlikely to ever change but still need to be updatable by the user.
Currently, the original default data is stored as a populated class within the application, and user updates are saved to an external XML file. This design lets us offer a "reset" feature that restores the original defaults. Our rationale for not storing the defaults externally (e.g. in an XML file) was to minimize the risk of them being altered. The overall volume of data doesn't warrant a database.
Is there a standard practice for storing "default" application data?

Suppose I were to answer: "Yes, there is a standard. 79% of systems worldwide externalise to a database." Would you now feel motivated to adopt a database? Surely not! Your particular requirements don't merit that overhead.
We're talking trade-offs here. Do the defaults need to change frequently? How much effort is it to change them using your current approach? Do you need to release different versions of the application with different defaults? Do the defaults change as you move from UAT to Production?
If you explore your requirements, an engineering solution should emerge. In all likelihood you will then make a better choice than the current common practice ("standard") that most folks have adopted, which all too often is to use whatever technique they used on their previous project.
For what it's worth, my personal "standard" is to externalise everything. Even when I don't expect things to change, sometime, somewhere, they do. Once I've decided to externalise, XML or property files don't make much difference to me.

Properties files sound OK to me. You can also include them inside the jar so that you don't have to carry them around separately.
Edit: the "reset" function goes into your application code, though.

Having these defaults in an external file could make updating them easier; you could always keep a pristine copy in the download/on the CD, etc.
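To illustrate the pattern the question describes (compiled-in defaults, external XML overrides, and a "reset"), here is a minimal sketch; the type, file and key names are all invented, and a real application would shape this around its own data:

// Minimal sketch: defaults compiled into the application, user overrides
// persisted to an external XML file, and a "reset" that simply discards the
// override file. All names here are hypothetical.
using System.Collections.Generic;
using System.IO;
using System.Xml.Linq;

public static class CoefficientStore
{
    // Compiled-in defaults; they cannot be altered by editing files on disk.
    private static readonly Dictionary<string, double> Defaults =
        new Dictionary<string, double> { { "DragCoefficient", 0.47 }, { "SafetyFactor", 1.5 } };

    private const string OverrideFile = "overrides.xml";

    public static Dictionary<string, double> Load()
    {
        var values = new Dictionary<string, double>(Defaults);
        if (File.Exists(OverrideFile))
        {
            // Each <factor name="..." value="..."/> element replaces a default.
            foreach (var e in XDocument.Load(OverrideFile).Root.Elements("factor"))
                values[(string)e.Attribute("name")] = (double)e.Attribute("value");
        }
        return values;
    }

    public static void ResetToDefaults()
    {
        // Restoring the defaults is just a matter of discarding the overrides.
        if (File.Exists(OverrideFile)) File.Delete(OverrideFile);
    }
}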

Related

Data Removal Standards

I want to write an application that removes data from a hard drive. Are there any standards I need to adhere to which will ensure that my software removes at least the bare minimum, or should I just use off-the-shelf software? If so, any advice?
I think any "standard" you may encounter won't be any less science fiction or science mysticism than anything you come up with yourself. Basically, as long as you physically overwrite the data (even just once), no commercial forensic service, however much money you throw at it, will claim to be able to recover your data.
(Any "overwrite 35 times with rotating bit patterns" advice may have been true for coarsely spaced magnetic tapes in the 1970s, but it is entirely irrelevant for contemporary hard disks).
The far more important problem you have to solve is how to overwrite the data physically. This is essentially impossible through any sort of application or even OS programming; you'll have to find a way to talk to the hardware properly and get a reliable confirmation that the location you intended to write to has indeed been written to, and that the clusters in question haven't been relocated to other parts of the disk where they might leak the data.
So in essence this is a very low-level question that will probably have you poring over your hard disk manufacturer's manuals quite a bit if you want a genuine solution.
Please define "data removal". Is this scrubbing in order to make undeletion impossible, or simply deletion of data?
It is common to write over a file several times with a random bit pattern if one wants to make sure it cannot be recovered. Due to the analog nature of the magnetic bit patterns, it might be possible to recover overwritten data in some circumstances.
A normal file system delete operation, on the other hand, is reversible in most cases. When you delete a file that way, you remove the file allocation table entry, not the data.
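To make the "overwrite before deleting" advice concrete, here is a minimal application-level sketch of a single overwrite pass (repeat the loop if you prefer multiple passes). As the answer above about talking to the hardware points out, this cannot guarantee physical erasure, since the file system or drive may have relocated or copied the underlying blocks; FileWiper is an invented name.

// Illustrative only: one pass of random data over the file's current contents,
// then a normal delete. This does not defeat block relocation, wear levelling,
// shadow copies or backups.
using System;
using System.IO;
using System.Security.Cryptography;

public static class FileWiper
{
    public static void OverwriteAndDelete(string path)
    {
        var buffer = new byte[64 * 1024];
        using (var rng = RandomNumberGenerator.Create())
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write, FileShare.None))
        {
            long remaining = fs.Length;
            while (remaining > 0)
            {
                rng.GetBytes(buffer);
                int chunk = (int)Math.Min(buffer.Length, remaining);
                fs.Write(buffer, 0, chunk);
                remaining -= chunk;
            }
            fs.Flush(true); // ask the OS to push the data down to the device
        }
        File.Delete(path);
    }
}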
There are standards... see http://en.wikipedia.org/wiki/Data_erasure
You don't give any details, so it is hard to tell whether they apply to your situation. Deleting a file with the OS's built-in file deletion can almost always be reverted. On the other hand, formatting a drive (NOT a quick format) is usually OK, except when you deal with sensitive data (data from clients, patients, finance, etc., or security-relevant material); then the standards mentioned above, which use different numbers of rounds and patterns of overwriting, make it nearly impossible to revert the deletion. In really sensitive cases you first use the best of these methods, then format the drive, then use that method again, and then destroy the drive physically (which means real destruction, not just removing the electronics or similar!).
The best way to avoid all this hassle is to plan for this kind of thing and use strong, proven full-disk encryption (with a key NOT stored on the drive's electronics or media!). That way you can simply format the drive (NOT a quick format) and then sell it, for example, since any strong encryption will look like "random data" and is (if implemented correctly) absolutely useless without the key(s).

What cache strategy do I need in this case?

I have what I consider to be a fairly simple application. A service returns some data based on another piece of data. A simple example: given a state name, the service returns the capital city.
All the data resides in a SQL Server 2008 database. The majority of this "static" data will rarely change. It will occasionally need to be updated and, when it does, I have no problem restarting the application to refresh the cache, if one is implemented.
Some data, which is more "dynamic", will be kept in the same database. This data includes contacts, statistics, etc. and will change more frequently (anywhere from hourly to daily to weekly). This data will be linked to the static data above via foreign keys (just like a SQL JOIN).
My question is: what exactly am I trying to implement here, and how do I get started doing it? I know the static data will be cached, but I don't know where to start with that. I tried searching but came up with so much stuff that I'm not sure where to begin. Recommendations for tutorials would also be appreciated.
You don't need to cache anything until you have a performance problem. Only once you have a noticeable problem and have measured your application tiers to determine that the database is in fact the bottleneck (which it rarely is) should you start looking into caching data. It is always a trade-off: memory vs. CPU vs. real-time data availability. There is no reason to make your application more complicated than it needs to be just because you can.
An extremely simple 'win' here (I assume you're using WCF here) would be to use the declarative attribute-based caching mechanism built into the framework. It's easy to set up and manage, but you need to analyze your usage scenarios to make sure it's applied at the right locations to really benefit from it. This article is a good starting point.
Beyond that, I'd recommend looking into one of the many WCF books that deal with higher-level concepts like caching and try to figure out if their implementation patterns are applicable to your design.
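If the attribute-based route doesn't fit your endpoints, a plain in-memory cache over the static lookup data is a simple alternative. The sketch below assumes System.Runtime.Caching and a hypothetical GetCapitalFromDatabase call standing in for the SQL Server query; it is not the WCF attribute mechanism from the article.

// Cache the rarely-changing lookup data in a MemoryCache; an application
// restart (which the question says is acceptable) also clears it.
using System;
using System.Runtime.Caching;

public class CapitalLookup
{
    private static readonly MemoryCache Cache = MemoryCache.Default;

    public string GetCapital(string stateName)
    {
        string key = "capital:" + stateName;
        var cached = Cache.Get(key) as string;
        if (cached != null) return cached;

        string capital = GetCapitalFromDatabase(stateName); // the real SQL hit
        Cache.Set(key, capital, new CacheItemPolicy
        {
            // Static data rarely changes, so a long absolute expiration is fine.
            AbsoluteExpiration = DateTimeOffset.Now.AddHours(12)
        });
        return capital;
    }

    private string GetCapitalFromDatabase(string stateName)
    {
        // Placeholder for the real SQL Server 2008 query.
        throw new NotImplementedException();
    }
}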

Abstraction or not?

The other day I stumbled onto a rather old Usenet post by Linus Torvalds. It is the infamous "You are full of bull****" post, where he defends his choice of plain C for Git over something more modern.
In particular, this post made me think about the enormous number of abstraction layers that accumulate one on top of the other where I work. Mine is a Windows .NET environment. I must say that I like C# and the .NET environment; it really makes most things easy.
Now, I come from a very different background made of Unix technologies like C and a plethora of scripting languages; to me, also, OOP is just one, and not always the best, programming paradigm. I often struggle (in a working kind of way, of course!) with my colleagues (one in particular), because they appear to belong to the "any problem can be solved with an additional level of abstraction" church, while I'm more of the "keep it simple" school. I think there is a very different mental approach to problems that maybe comes from exposure to different cultures.
As a very simple example, for the first project I did here I needed some configuration for an application. I wrote a class of about ten lines to load and parse a txt file, located in the program's root dir, containing colon-separated key/value pairs, one per row. It worked.
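Roughly, the loader amounted to something like this (a reconstruction for illustration, not the actual class):

// Colon-separated key/value pairs, one per line, read from the program's root dir.
using System;
using System.Collections.Generic;
using System.IO;

public static class SimpleConfig
{
    public static Dictionary<string, string> Load(string fileName = "app.conf")
    {
        var path = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, fileName);
        var settings = new Dictionary<string, string>();
        foreach (var line in File.ReadAllLines(path))
        {
            var parts = line.Split(new[] { ':' }, 2); // split on the first colon only
            if (parts.Length == 2) settings[parts[0].Trim()] = parts[1].Trim();
        }
        return settings;
    }
}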
In the end, to standardize the approach to the configuration problem, we now have a library, deployed on every machine running each configured program, that calls a service which, at startup, loads an XML file containing references to other XML files, one per application, that hold the configurations themselves.
Now, it is extensible and made up of fancy reusable abstractions, providers and all, but I still think that, if one day we really do reuse part of it, in the time it took to build we could have written the needed code from scratch, or copy/pasted the old code and modified it.
What are your thoughts about it? Can you point out some interesting references dealing with the problem?
Thanks
Abstraction makes it easier to construct software and understand how it is put together, but it complicates fully understanding certain issues around performance and security, because the abstraction layers introduce certain kinds of complexity.
Torvalds' position is not absurd, but he is an extremist.
Simple answer: programming languages provide data structures and ways to combine them. Use these directly at first, do not abstract. If you find you have representation invariants to maintain that are at a high risk of being broken due to a large number of usage sites possibly outside your control, then consider abstraction.
To implement this, first provide functions and convert the call sites to use them without hiding the representation. Hide the data representation only when you're satisfied your functional representation is sufficient. Make sure at this time to document the invariant being protected.
An "extreme programming" version of this: do not abstract until you have test cases that break your program. If you think the invariant can be breached, write the case that breaks it first.
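A small sketch of that progression, with invented types (Inventory, StockList) and an invented invariant (quantities never go negative):

// Step 1: the data stays a plain structure; call sites go through small helpers,
// but the representation is still visible.
using System.Collections.Generic;

public static class Inventory
{
    public static int Quantity(Dictionary<string, int> stock, string item)
        => stock.TryGetValue(item, out var n) ? n : 0;

    public static void Add(Dictionary<string, int> stock, string item, int count)
        => stock[item] = Quantity(stock, item) + count;
}

// Step 2, only if callers keep breaking the invariant: hide the dictionary
// behind a class and enforce the invariant in one place.
public class StockList
{
    private readonly Dictionary<string, int> _stock = new Dictionary<string, int>();

    public int Quantity(string item) => _stock.TryGetValue(item, out var n) ? n : 0;

    public void Add(string item, int count)
    {
        var next = Quantity(item) + count;
        if (next < 0) throw new System.ArgumentOutOfRangeException(nameof(count));
        _stock[item] = next;
    }
}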
Here's a similar question: https://stackoverflow.com/questions/1992279/abstraction-in-todays-languages-excited-or-sad.
I agree with @Steve Emmerson - 'Coders at Work' would give you some excellent perspective on this issue.

How expensive is reading file properties? .NET

We're experimenting with appending timestamps to some URLs to let things cache but freshen them when they do change. We have code that boils down to this:
DateTime ts = File.GetLastWriteTime(absPath);
where absPath is the mapped path of a URL. So the web server will be checking this file's last write time every time we serve up a link to the file. Kinda gives me the willies - should it?
You should performance-test it, but offhand I doubt it's any more expensive than testing for a file's existence or attributes (e.g. whether it's read-only), and it's certainly less expensive than actually opening the file.
If (after testing) you decide that it's a problem, you could also cache your calls to GetLastWriteTime (e.g. don't call it more than once every 5 seconds for any given file).
Also, I've never used it but if caching is a concern I hope you've considered delegating its implementation to some specialist like Squid instead of rolling your own.
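A sketch of the memoization suggested above (TimestampCache is an invented helper and the 5-second TTL is arbitrary):

// Remember the last write time per path and only go back to the file system
// after a short interval has elapsed.
using System;
using System.Collections.Concurrent;
using System.IO;

public static class TimestampCache
{
    private static readonly TimeSpan Ttl = TimeSpan.FromSeconds(5);
    // path -> (when we last checked, the last write time we saw)
    private static readonly ConcurrentDictionary<string, Tuple<DateTime, DateTime>> Entries =
        new ConcurrentDictionary<string, Tuple<DateTime, DateTime>>();

    public static DateTime GetLastWriteTime(string absPath)
    {
        Tuple<DateTime, DateTime> entry;
        if (Entries.TryGetValue(absPath, out entry) && DateTime.UtcNow - entry.Item1 < Ttl)
            return entry.Item2;

        var lastWrite = File.GetLastWriteTime(absPath); // the actual disk hit
        Entries[absPath] = Tuple.Create(DateTime.UtcNow, lastWrite);
        return lastWrite;
    }
}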
I have not tried this, but your question is relevant to a situation that I have been thinking about.
You did not indicate what data is changing (database, XML data, etc.).
ASP.NET caching does support updating the cache based on a variety of dependencies.
Check out the sections on File-based Dependency, Time-based Dependency, and Key-based Dependency in this article.
"Dependencies allow us to invalidate a particular item within the Cache based on changes to files, changes to other Cache keys, or at a fixed point in time. Let's look at each of these dependencies."
Here is the article:
http://msdn.microsoft.com/en-us/library/ms972379.aspx
Thanks
Joe
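To connect the file-based dependency to the timestamp scenario in this question: cache the computed stamp and let ASP.NET evict it when the file changes. This is a hedged sketch using the classic System.Web cache; UrlStamper and GetStamp are invented names.

using System;
using System.IO;
using System.Web;
using System.Web.Caching;

public static class UrlStamper
{
    public static string GetStamp(string absPath)
    {
        var cache = HttpRuntime.Cache;
        var stamp = cache[absPath] as string;
        if (stamp == null)
        {
            stamp = File.GetLastWriteTimeUtc(absPath).Ticks.ToString();
            // The CacheDependency invalidates this entry whenever the file changes.
            cache.Insert(absPath, stamp, new CacheDependency(absPath));
        }
        return stamp;
    }
}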
Essentially, there are three answers to your question of "how expensive?".
Too expensive - you've tested it and something has to change for the system to be usable.
Acceptable - you've tested it and it isn't great, but it is fast enough to use.
Rather cheap - you've tested it and there is no noticeable impact on performance.
We can't really answer the question for you, so you'll just have to try it out. If you decide that it was too expensive or that it's worth your time to move it from acceptable to rather cheap, change the question to ask how to speed things up.
It'll incur additional small disk I/Os when the links are generated. If you create many URLs in a short period of time this could be a bottleneck. No one can say for sure whether this will impact your scenario - you really need to measure and see if it's going to be an issue.
Or if you're worried about it, why not cache it for a minute?

Should I cache instances of frequently accessed classes

I'm new to .NET and was wondering if there is a performance gain to keeping an instance of, for example, a DAL object in scope.
Coming from the ColdFusion world, I would instantiate a component and store it in the application scope so that every time my code needed to use that component it would not have to be instantiated over and over again, affecting performance.
Is there any benefit to doing this in ASP.Net apps?
Unless you are actually experiencing a performance problem, you need not worry yourself with optimizations like this.
Solve the business problems first, and use good design. As long as you have a decent abstraction layer for your data access code, then you can always implement a caching solution later down the road if it becomes a problem.
Remember that any caching solution increases complexity dramatically.
No. In the multi-tier world of ASP.NET this would be considered a case of "premature optimization". Once a site's suite of stubs, scripts and programs has scaled up and been running for a few months, you can look at logs and traces to see what might be cached, spawned or rewritten to improve performance. And as the infamous Jeff Atwood says, "Most code optimizations for web servers will benefit from money being spent on new and improved hardware rather than tweaking code for hours and hours".
Yes indeed you can and probably should. Oftentimes the storage for this is in the Session; you store data that you want for the user.
If it's a global thing, you may load it in the Application_Start event and place it somewhere, possibly the HttpCache.
And just a note, some people use "Premature Optimisation" to avoid optimising at all; this is nonsense. It is reasonable to cache in this case.
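A minimal sketch of that Application_Start approach, assuming a Global.asax code-behind and an invented, expensive-to-build BuildLookupTables helper:

using System;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_Start(object sender, EventArgs e)
    {
        // Build the expensive, rarely-changing object once at startup.
        var lookups = BuildLookupTables();

        // No expiration: it lives for the lifetime of the application; a
        // restart (or an explicit Cache.Remove) acts as the "refresh".
        HttpRuntime.Cache.Insert("Lookups", lookups);
    }

    private static object BuildLookupTables()
    {
        return new object(); // placeholder for the real initialization
    }
}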
It is very important to do a cost/benefit analysis before caching any object; one must consider all the factors, such as:
Performance advantage
Frequency of use
Hardware
Scalability
Maintainability
Time available for delivery (one of the most important factors)
Finally, it is always useful to cache objects that are very costly to create or that you use very frequently, e.g. table data (from the DB) or XML data.
Does the class you are considering this for have state? If not (and DAL classes often do not have state, or do not need it), then you should make its methods static, and then you don't need to instantiate it at all. If the only state it holds is a connection string, you can also make that field static and avoid the need to instantiate the class that way.
Otherwise, take a look at the design pattern called Flyweight.
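For the stateless case, a hedged sketch of what a static DAL might look like (CustomerDal, the "Main" connection string name and the query are all invented):

// No instance to cache at all: static methods plus a static connection string.
using System.Configuration;
using System.Data.SqlClient;

public static class CustomerDal
{
    private static readonly string ConnectionString =
        ConfigurationManager.ConnectionStrings["Main"].ConnectionString;

    public static string GetCustomerName(int customerId)
    {
        using (var conn = new SqlConnection(ConnectionString))
        using (var cmd = new SqlCommand("SELECT Name FROM Customers WHERE Id = @id", conn))
        {
            cmd.Parameters.AddWithValue("@id", customerId);
            conn.Open();
            return cmd.ExecuteScalar() as string;
        }
    }
}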
