Add new columns in asp .net application - asp.net

I am facing this question in a new little project:
The system to be built will allow user to add new columns to a table in the system, and then the user will be able to maintain the data, I think there is two ways to implement this:
1) create a few tables including "columns" table with "columnName" "columnValue" "datatype" etc to store the column definition, aonther table "XXCoumn" to store the value of the column (entered by user), and user a store procedure to query/update column data.
2) create the column in the table schema when user enter a new column, then the maintain of the table data is just as normal
which way do you guys reckon? or any new suggestion?
Some additional info: the data volumn is small, and I need to create reports.

Any good recommendations would require a much better understanding of your requirements, but here are some comments on the options you mentioned, as well as some additional thoughts.
1) Entity-Attribute-Value (EAV) Design: This is the option you describe where you have a table that has columns for ColumnName, Type and Value. This option has the advantage of being able to accomodate unlimited new columns easily, but I have found it to be painful when the time comes to retrieve meaningful data back. For example, say you have rows in this EAV table for {Color, varchar}{Red, Green, Blue}, and {Size, varchar}{Small, Medium, Large}. If you want to find all the small green items, you need something like this (untested SQL of course):
SELECT *
FROM ITEMS
WHERE ITEMID IN (SELECT ITEMID
FROM ITEM_ATTRIBUTES ATT INNER JOIN ITEM_VALUES VLS
ON ATT.AttributeID = VLS.AttributeID
WHERE ATT.ColumnName = 'Color' AND VLS.Value = 'Green')
AND ITEMID IN (SELECT ITEMID
FROM ITEM_ATTRIBUTES ATT INNER JOIN ITEM_VALUES VLS
ON ATT.AttributeID = VLS.AttributeID
WHERE ATT.ColumnName = 'Size' AND VLS.Value = 'Small')
Contrast this with having actual columns on the items table for color and size:
SELECT *
FROM ITEMS
WHERE COLOR = 'Green' AND SIZE = 'Small'
In addition, you will have a difficult time maintaining data integrity, if that is important for this app (and it is almost always important, even when you are told otherwise). In the example above, you will need to implement extra logic if "Color" should be limited to Blue, Green, and Red. Also, you will need to implement even more logic if certain colors only come in certain sizes (example - blue items are only available in small and medium)
2) User-Defined Columns: Just giving the user the ability to add additional columns to the table has the advantage of making data retrieval simpler, but all the data integrity issues remain. Also, your app usually requires extra logic to deal with the unknown columns.
3) Pre-Existing Custom Columns: I have worked with a few apps, such as CRMs, that provide a dozen or more columns already in place for user definition. Basically, the designers put in columns like "Text1","Text2","Text3","Number1","Number2", etc. The users then provide header and description information for these columns, and that is what the app uses for display purposes. This model has the advantage of easy data retrieval, as well as a pre-defined DB schema which should simplify app logic. Data integrity issues remain, however. The other obvious downside is that you will run out of pre-defined columns, which is what you are usually trying to avoid with this type of solution.
As with most design issues, there are tradeoffs to each solution. My experience has been that while many users/clients say they want solutions like these, in reality they are simply trying to ensure they don't get trapped with an app that can't grow with their needs. I have found that there are actually very few places where a design like this is needed. I can almost always create a design that addresses the expansion desires of the client without putting them into the role of database designer.

"The system to be built will allow user to add new columns to a table in the system..."
Really - that's the user story? Sounds like you've already made up your mind on the solution, to me.
Whether it's a good idea or not to allow a user to extend schemas is pretty context dependent. I'd have little problem in an admin-like, limited use way. But it'd be a horribly bad idea in a MySpace type way. I suspect your situation lies somewhere between those 2 extremes.
Extending the schema would lead to greatly more efficient queries - as you could add indexes and such - but it does expose some relational rules on your users. Also, the extension would (probably) lock the entire table and concurrent edits would need to be dealt with.

If this is centrally hosted by you, I would suggest NOT allowing user-input data to change the schema of the database (i.e. drive the creation of new tables).
Rather you may want to look into using XML fields in SQL to store variable field names of data, or a more generic table structure... this technique works pretty well if we're not talking crazy amounts of data...

Is it possible you're looking at your solution sideways? It sounds like you need a mapping table (sort of like your #1). You have a table, say "objects" for example, a table called "properties" which holds what you're calling columns and then a table that holds the values, so it just has object_id, property_id, value.
To put in a smarter way than I said it, take a look at the Entity-attribute-value model.

Related

3 column query in DynamoDB using DynamooseJs

My table is (device, type, value, timestamp), where (device,type,timestamp) makes a unique combination ( a candidate for composite key in non-DynamoDB DBMS).
My queries can range between any of these three attributes, such as
GET (value)s from (device) with (type) having (timestamp) greater than <some-timestamp>
I'm using dynamoosejs/dynamoose. And from most of the searches, I believe I'm supposed to use a combination of the three fields (as a single field ; device-type-timestamp) as id. However the set: function of Schema doesn't let me use the object properties (such as this.device) and due to some reasons, I cannot do it externally.
The closest I got (id:uuidv4:hashKey, device:string:GlobalSecIndex, type:string:LocalSecIndex, timestamp:Date:LocalSecIndex)
and
(id:uuidv4:rangeKey, device:string:hashKey, type:string:LocalSecIndex, timestamp:Date:LocalSecIndex)
and so on..
However, while using a Query, it becomes difficult to fetch results of particular device,type as the id, (hashKey or rangeKey) keeps missing from the scene.
So the question. How would you do it for such kind of table?
And point to be noted, this table is meant to gather content from IoT devices, which is generated every 5 mins by each device on an average.
I'm curious why you are choosing DynamoDB for this task. Advanced queries like this seem to be much better suited for a SQL based database as opposed to a NoSQL database. Due to the advanced nature of SQL queries, this task in my experience is a lot easier in SQL databases. So I would encourage you to think about if DynamoDB is truly the right system for what you are trying to do here.
If you determine it is, you might have to restructure your data a little bit. You could do something like having a property that is device-type and that will be the device and type values combined. Then set that as an index, and query based on that and sort by the timestamp, and filter out the results that are not greater than the value you want.
You are correct that currently, Dynamoose does not pass in the entire object into the set function. This is something that personally I'm open to exploring. I'm a member on the GitHub project, and if you would like to submit a PR adding that feature I would be more than happy to help explore that option with you and get that into the codebase.
The other thing you might want to explore is having a DynamoDB stream, that will set that device-type property whenever it gets added to your DynamoDB table. That would abstract that logic out of DynamoDB and your application. I'm not sure if it's necessary for what you are doing to decouple it to that level, but it might be something you want to explore.
Finally, depending on your setup, you could figure out which item will be more unique, device or type, and setup an index on that property. Then just query based on that, and filter out the results of the other property that you don't want. I'm not sure if that is what you are looking for, it will of course work, but I'm not sure how many items you will have in your table, and there get to be questions about scalability at a certain level. One way to solve some of those scalability questions might be to set the TTL of your items if you know that you the timestamp you are querying for is constant, or predictable ahead of time.
Overall there are a lot of ways to achieve what you are looking to do. Without more detail about how many items, what exactly those properties will be doing, the amount of scalability you require, which of those properties will be most unique, etc. it's hard to give a good solution. I would highly encourage you to think about if NoSQL is truly the best way to go. That query you are looking to do seems a LOT more like a SQL query. Not saying it's impossible in DynamoDB, but it will require some thought about how you want to structure your data model, and such.
Considering opinion of #charlie-fish, I decided to jump into Dynamoose and improvise the code to pass the model to the set function of the attribute. However, I discovered that the model is already being passed to default parameter of the attribute. So I changed my Schema to the following:
id:hashKey;default: function(model){ return model.device + "" + model.type; }
timestamp:rangeKey
For anyone landing here on this answer, please note that the default & set functions can access attribute options & schema instance using this . However both those functions should be regular functions, rather than arrow functions.
Keeping this here as an answer, but I won't accept it as an answer to my question for sometime, as I want to wait for someone else to hit out a better approach.
I also want to make sure that if a value is passed for id field, it shouldn't be set. For this I can use set to ignore the actual incoming value, which I don't know how, as of yet.

How do I create an Access Form for Table 1 where two Table 1 fields have different data from a single field in Table 2?

I’m absolutely stumped by what I suspect is probably simple for those experienced with Access. I’m brand new to Access (2010) and studying a lot, but unfortunately still confused by many basic concepts. I’m embarrassed to admit that I have spent about 40 hours trying (unsuccessfully) to solve the specific issue below. Please forgive me if I haven’t included enough detail here - I’m not sure how much someone needs to know to address this. I’m happy to edit and/or provide more information.
My question:
How do I create a Form for creating new records / editing existing records in Table1 where two fields in Table1 have different values from a single field in Table2? (I have better detail below)
I want the Form to have all the records from Table1. I have tried many different ways with queries, sub-forms, etc., but can’t pull it off. I’m fairly certain the issue is related to how I address Table2. Ideally, the user would be able to select from dropdowns in the form for the two fields to be updated in Table1.
I am including screenshots of a mockup of my intended Form concept, the object Relationships as I currently have them, the design and datasheet views of Table1 and the design and datasheet views of Table2.
“Table1” above is “t_PEOPLE” in the images while “Table2” is “t_COLORS.”
The object relationship types are currently one-to-many with enforced referential integrity (cascade update related fields) and the join properties are "include ALL records from 't_PEOPLE' and only those records from 't_COLORS' where the joined fields are equal."
I'm happy to send the actual database file if that helps.
I will be very grateful for any guidance - thank you!!
The general approach to this is as follows:
A) If I understand what you're trying to do here, your t_colors table is usually referred to as a Reference table or Lookup table. You need to make one form to add, edit, and delete records in this t_colors table. How the user accesses that form varies. I'll get to that in a minute.
B) The form for your People table needs to have drop down menus for your two color selections. In the dropdown menus' RowSource, you will use a query that looks up values in your t_colors table.
C) Depending what you are using your colors for in t_people, you should consider making a third table with PeopleID and ColorID in it. It would then link to both t_people and t_colors. This would allow you to have multiple colors specified for a single person, and you wouldn't be limited to two. In your People form, you would use a subform for these colors. The subform would probably need to be a datasheet form or a continuous form. If you are using a datasheet form for your people form, then you would need to use a datasheet form for the colors subform.
If the user wants to use a color that isn't already in your t_colors table, you need to give them a way of inserting that color. There are various approaches to this. You could use a union query in the dropdowns RowSource that shows a "" option. If selected you would bring up your Colors form and when they close the colors form you have to requery your dropdown menu. Or you could insert the color for them using VBA when they enter a value that is "Not In List" (an event that Combobox's have).
Please note that the relationships you've defined are not overly helpful or important in this case. Yes, they can be helpful when it comes to using the Update Cascade or Delete Cascade features. But quite truthfully, relationships are basically for programmers, to make sure you get an error if and when referential integrity is violated. Users should never see these errors and properly designed forms will prevent them from occurring. The main reason to use them is that it will force you to design your forms properly by giving you an error when something is wrong, hopefully during your own testing phase of the project.

Tables with data that will never be deleted or changed

This is a more in depth follow up to a question I asked yesterday about storing historical data ( Storing data in a side table that may change in its main table ) and I'm trying to narrow down my question.
If you have a table that represents a data object at the application level and need that table for historical purposes is it considered bad practice to set it up to where the information can't be deleted. Basically I have a table representing safety requirements for a worker and I want to make it so that these requirements can never be deleted or changed. So if a change needs to made a new record is created.
Is this not a good idea? What are the best practice to deal with data like this? I have a table with historical safety training data and it points to the table with requirement data (as well as some other key tables) so I can't let the requirements be changed or the historical table will be pointing to the wrong information.
Is this not a good idea?
Your scenario sounds perfectly valid to me. If you have historical data that you need to keep there are various ways to meeting that requirement.
Option 1:
Store all historical data and current data in one table (make sure you store a creation date so you know what's old and what's new). When you need to retrieve the most recent record for someone, just base it on the most recent date that exists in the table.
Option 2:
Store all historical data in a separate table and keep current data in another. This might be beneficial if you're working with millions of records so you don't degrade performance of any applications built on top of it. Either at the time of creating a new record or through some nightly job you can move old data into the other table to keep your current table lightweight.
Here is one alternative, that is not necessarily "better" but is something to keep in mind...
You could have separate "active" and "historical" tables, then create a trigger so whenever a row in the active table is modified or deleted, the old row values are copied to the historical table, together with the timestamp.
This way, the application can work with the active table in a natural way, while the accurate history of changes is automatically generated in the historical table. And since this works at the DBMS level, you'll be more resistant to application bugs.
Of course, things can get much messier if you need to maintain a history of the whole graph of objects (i.e. several tables linked via FOREIGN KEYs). Probably the simplest option is to simply forgo referential integrity for historical tables and just keep it for active tables.
If that's not enough for your project's needs, you'll have to somehow represent a "snapshot" of the whole graph at the moment of change. One way to do it is to treat the connections as versioned objects too. Alternatively, you could just copy all the connections with each version of the endpoint object. Either case will complicate your logic significantly.

How to setup data model for customizable application

I have an ASP.NET data entry application that is used by multiple clients. The application consists of multiple data entry modules that are common to all clients.
I now have multiple clients that want their own custom module added which will typically consist of a dozen or so data points. Some values will be text, others numeric, some will be dropdown selections, etc.
I'm in need of suggestions for handling the data model for this. I have two thoughts on how to handle. First would be to create a new table for each new module for each client. This is pretty clean but I don't particular like it. My other thought is to have one table with columns for each custom data point for each client. This table would end up with a lot of columns and a lot of NULL values. I don't really like either solution and suspect there's a better way to do this, so any feedback you have will be appreciated.
I'm using SQL Server 2008.
As always with these questions, "it depends".
The dreaded key-value table.
This approach relies on a table which lists the fields and their values as individual records.
CustomFields(clientId int, fieldName sysname, fieldValue varbinary)
Benefits:
Infinitely flexible
Easy to implement
Easy to index
non existing values take no space
Disadvantage:
Showing a list of all records with complete field list is a very dirty query
The Microsoft way
The Microsoft way of this kind of problem is "sparse columns" (introduced in SQL 2008)
Benefits:
Blessed by the people who design SQL Server
records can be queried without having to apply fancy pivots
Fields without data don't take space on disk
Disadvantage:
Many technical restrictions
a new field requires DML
The xml tax
You can add an xml field to the table which will be used to store all the "extra" fields.
Benefits:
unlimited flexibility
can be indexed
storage efficient (when it fits in a page)
With some xpath gymnastics the fields can be included in a flat recordset.
schema can be enforced with schema collections
Disadvantages:
not clearly visible what's in the field
xquery support in SQL Server has gaps which makes getting your data a real nightmare sometimes
There are maybe more solutions, but to me these are the main contenders. Which one to choose:
key-value seems appropriate when the number of extra fields is limited. (say no more than 10-20 or so)
Sparse columns is more suitable for data with many properties which are filled out infrequent. Sounds more appropriate when you can have many extra fields
xml column is very flexible, but a pain to query. Appropriate for solutions that write rarely and query rarely. ie: don't run aggregates etc on the data stored in this field.
I'd suggest you go with the first option you described. I wouldn't over think it. The second option you outlined would be a bad idea in my opinion.
If there are fields common to all the modules you're adding to the system you should consider keeping those in a single table then have other tables with the fields specific to a particular module related back to the primary key in the common table. This is basically table inheritance (http://www.sqlteam.com/article/implementing-table-inheritance-in-sql-server) and will centralize the common module data and make it easier to query across modules.

Fastest alternative to Datatable.Select() to narrow cached data?

Stack Overflowers:
I have a search function on my company's website (based on .NET 2.0) that allows you to narrow the product catalog using up to 9 different fields. Right now, after you make your selections on the frontend I am building a dynamic query and hitting the database (SQL Server) to get the resulting list of items numbers.
I would like to move away from hitting the database everytime and do all of this in memory for faster results. Basically a 3500 - 4500 row "table" with 10 columns: the item number (which could be a primary key) and the 9 attribute fields (which have repeating values for many many rows). There can be any number of different searches between the 9 columns to get the items you want:
Column A = 'foo' AND Column D = 'bar'
Column B = 'foo' AND Column C = 'bar' AND Column I = 'me'
Column H = 'foo'
etc...
Based on my research, the .Select() function seems like the slowest way to perform the search, but it stands out to me as being the quickest and easiest way to perform the narrowing searches to get the list of item numbers:
MyDataSet.Select("Column B = 'foo' AND Column E = 'bar' AND Column I = 'me'")
In my specific case, what method do you suggest I use as an alternative that has the same narrowing functionality and better performance instead of settling for the datatable.select() method?
Your best alternative is to let your database do what it's best at: querying and filtering data.
Caching DataTables (especially ones with 3500-4500 rows) is a bad idea for web applications. Calling Select() on a DataTable doesn't reduce the number of rows in the DataTable - it returns a new collection of rows (copied from the original), which means you'll still have the original 4000 rows sitting in the cache. Better to have nothing at all in the cache, and just get the rows you need when the user requests them.
DataTables (and DataSets) are best used with fat clients (usually Windows applications) that need to work with in-memory copies of database data while in a disconnected state.
Datatables are not optimally built for being queried, I wouldn't recommend going down this route, unless you really have a documented performance problem that you're certain would be improved by doing so.
If your dynamic queries are slow, it's probably because you haven't indexed your table properly in your database. Databases are designed to be able to optimally query your data, so my hunch would be that a little work on the database side of things should get you where you need to go.
If you really need to query ADO.Net datatables, make sure you read Scaling ADO.Net DataTables thoroughly. It talks about things you can do to speed up the performance of them, and gives you some benchmarks so you can see the difference.

Resources