Bad idea to duplicate GUIDs when replicating a database?

I have the following scenario and I would like to know if my logic is sound or not.
Let's say I have a master DB that, among others, contains a people table. I also have several other DBs that contain an exact copy of the master DB's people table. The people table needs to be kept synchronized among all DBs. The DBs use GUIDs, so when a new person is added to the table a new GUID is assigned to that row.
What should I do when that new person's row in the master DB needs to be synchronized with all the other DBs? I can copy over the data and generate a new GUID (the default behavior in the software I'm using), or I can copy the data AND the GUID.
I've thought about this and I feel copying the GUID is the best solution. The main reasoning behind this is that I might not have a way to uniquely identify each person (please don't ask why), so when there is an update to a row I'm not sure I can look up the matching row in each DB and sync the changes. However, if I copy the GUID, I can simply look for that GUID in each table and know that it is the matching row.
I don't see any downsides to using the same GUID in what is essentially a replicated table. Of course I understand using the same GUID for different data would be a bad idea (and I don't see why you'd want to do that).
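To illustrate the kind of sync I have in mind (the table and column names here are just hypothetical), matching on the copied GUID would let each DB be updated with something like:

    -- Hypothetical sketch: upsert rows from a staging copy of the master
    -- people table into a replica, matching on the replicated GUID.
    MERGE INTO dbo.People AS target
    USING dbo.People_FromMaster AS source
        ON target.PersonGUID = source.PersonGUID  -- same GUID = same person everywhere
    WHEN MATCHED THEN
        UPDATE SET target.FirstName = source.FirstName,
                   target.LastName  = source.LastName
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (PersonGUID, FirstName, LastName)
        VALUES (source.PersonGUID, source.FirstName, source.LastName);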
Does my idea make sense?

Related

Data revisioning/versioning best practices

I am struggling to define an effective revisioning process. We have some data spread across multiple tables. We cannot delete or update; we need to create new revisions of the same data. I know the usual solution of a history table containing all revisions, but that only seems to work well for revisions of simple structures, such as those of a blogging platform.
What if you have a database with many complex structures, where even the simplest of them looks like the TableA/TableB example below?
If you change something in TableA, you can keep the old data in a history table. What happens, though, if you change something in TableB, which defines what a record in TableA is? It almost forces you to create a copy of TableA (a new ID, in other words) and recreate its underlying structures (more new IDs). The whole process of creating a new ID each time a mistake is corrected or some peripheral data is added doesn't feel right.
Is there any good practice for such cases? I read somewhere about keeping the whole old data structure revisioned as XML, but that approach copes poorly with schema changes and is not easily queryable. Technologies such as Flashback don't cover the whole spectrum of our needs either.
Note: We're using Oracle v11.2.

Tables with data that will never be deleted or changed

This is a more in-depth follow-up to a question I asked yesterday about storing historical data (Storing data in a side table that may change in its main table), and I'm trying to narrow down my question.
If you have a table that represents a data object at the application level and you need that table for historical purposes, is it considered bad practice to set it up so that the information can't be deleted? Basically, I have a table representing safety requirements for a worker and I want to make it so that these requirements can never be deleted or changed. So if a change needs to be made, a new record is created.
Is this not a good idea? What are the best practices for dealing with data like this? I have a table with historical safety training data and it points to the table with requirement data (as well as some other key tables), so I can't let the requirements be changed or the historical table will be pointing to the wrong information.
Is this not a good idea?
Your scenario sounds perfectly valid to me. If you have historical data that you need to keep, there are various ways to meet that requirement.
Option 1:
Store all historical data and current data in one table (make sure you store a creation date so you know what's old and what's new). When you need to retrieve the most recent record for someone, just take the row with the latest creation date for that person.
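As a rough sketch (table and column names are assumed, not from your schema), that lookup could be:

    -- Newest requirement row per person, based on the stored creation date.
    SELECT sr.*
    FROM SafetyRequirements AS sr
    WHERE sr.CreatedDate = (SELECT MAX(sr2.CreatedDate)
                            FROM SafetyRequirements AS sr2
                            WHERE sr2.PersonID = sr.PersonID);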
Option 2:
Store all historical data in a separate table and keep current data in another. This might be beneficial if you're working with millions of records so you don't degrade performance of any applications built on top of it. Either at the time of creating a new record or through some nightly job you can move old data into the other table to keep your current table lightweight.
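A sketch of such a nightly move, again with assumed table and column names:

    -- Archive every row that has a newer row for the same person,
    -- then remove the archived rows from the current table.
    BEGIN TRANSACTION;

    INSERT INTO SafetyRequirementsHistory
    SELECT sr.*
    FROM SafetyRequirements AS sr
    WHERE EXISTS (SELECT 1 FROM SafetyRequirements AS newer
                  WHERE newer.PersonID = sr.PersonID
                    AND newer.CreatedDate > sr.CreatedDate);

    DELETE sr
    FROM SafetyRequirements AS sr
    WHERE EXISTS (SELECT 1 FROM SafetyRequirements AS newer
                  WHERE newer.PersonID = sr.PersonID
                    AND newer.CreatedDate > sr.CreatedDate);

    COMMIT TRANSACTION;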
Here is one alternative, that is not necessarily "better" but is something to keep in mind...
You could have separate "active" and "historical" tables, then create a trigger so whenever a row in the active table is modified or deleted, the old row values are copied to the historical table, together with the timestamp.
This way, the application can work with the active table in a natural way, while the accurate history of changes is automatically generated in the historical table. And since this works at the DBMS level, you'll be more resistant to application bugs.
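A minimal sketch of such a trigger in T-SQL (the Requirements/RequirementsHistory tables and their columns are assumed for illustration):

    -- RequirementsHistory mirrors Requirements plus an ArchivedAt column.
    CREATE TRIGGER trg_Requirements_History
    ON dbo.Requirements
    AFTER UPDATE, DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- "deleted" holds the pre-change values for both updates and deletes.
        INSERT INTO dbo.RequirementsHistory (RequirementID, Description, ArchivedAt)
        SELECT d.RequirementID, d.Description, SYSUTCDATETIME()
        FROM deleted AS d;
    END;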
Of course, things can get much messier if you need to maintain a history of the whole graph of objects (i.e. several tables linked via FOREIGN KEYs). Probably the simplest option is to forgo referential integrity for historical tables and just keep it for active tables.
If that's not enough for your project's needs, you'll have to somehow represent a "snapshot" of the whole graph at the moment of change. One way to do it is to treat the connections as versioned objects too. Alternatively, you could just copy all the connections with each version of the endpoint object. Either case will complicate your logic significantly.

You cannot add or change a record because a related record is required in table

I'm fairly new to Access.
I have a DB table that needs to be normalized. I have some information about a person. These people are authorized to grant access to areas at our work site. Every person may be authorized several times to manage different areas, and of course different people can be authorized to manage different areas. My first try was to store the authorization and the areas together, but I realized that I was really repeating the data that way. After doing some study I decided that the best way to do this was to create four tables:
tblPerson, tblPermission, tblArea, tblArea_Permission
The tblArea_Permission is a join table for the many-to-many relationship between tblPermission and tblArea (this is something that I just learned about). I seemingly set up the table relationships OK on the Relationships tab. I also use a query for adding records to the join table. When I try to do this, with a query that pulls the records from the other tables, I get "You cannot add or change a record because a related record is required in table XXX." This would seem to be impossible.
I decided that I could probably live with the DB not enforcing referential integrity, so I removed it and used a composite primary key on the two fields, because every person with permission will control an area in only one combination. That seemed to work, but then I noticed that the records would randomly change. I decided that the DB must be corrupt. Parts of the DB seemed to be working correctly, so I started with a new database, imported the tables and one form, and then started to rebuild the new tables as described above. I got the same error.
Any help would be greatly appreciated. I've read through some different books, and used google, but nothing addresses this.
If a person is authorised to manage an area, you need a persons_area table:
PersonID )
AreaID   ) composite primary key
which shows which areas each person can manage. I am not sure where the permissions table is coming from.
You will then not be able to add a record to the persons_area table unless you have a matching ID in the area table and a matching ID in the persons table. If either of these IDs is missing, you will get the error above.
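Expressed as generic SQL DDL (in Access you would normally build this in the table designer and the Relationships window; the column names here are assumed), the join table amounts to:

    -- Composite primary key plus two foreign keys: a row can only be added
    -- when both the referenced person and area already exist.
    CREATE TABLE persons_area (
        PersonID INT NOT NULL,
        AreaID   INT NOT NULL,
        CONSTRAINT pk_persons_area PRIMARY KEY (PersonID, AreaID),
        CONSTRAINT fk_pa_person FOREIGN KEY (PersonID) REFERENCES tblPerson (PersonID),
        CONSTRAINT fk_pa_area   FOREIGN KEY (AreaID)   REFERENCES tblArea (AreaID)
    );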
If you want more relevant comments on your DB design, you will need to post schemas.

SQL Server database design

I am planning to create a website using ASP.NET and SQL Server. However, my plan for the database design leaves me wondering if there is a better way.
The website will serve as a repository of information for various users. I figure I would have two databases, a Membership and Profile database.
The profile database would contain user data for all users, where each user may have ~20 tables. I would create the tables when the user account is created and generate a key used to name the tables. The tables are not directly related.
For Example a set of tables for two different users could look like:
User1 Tables - TransactionTable_Key1, AssetTable_Key1, ResearchTable_Key1 ....;
User2 Tables - TransactionTable_Key2, AssetTable_Key2, ResearchTable_Key2 ....;
The Key1, Key2, etc. values would be retrieved based on the MembershipID data when the account was created. This could result in a very large number of tables over time. I'm not sure whether setting up the database this way will limit scalability. Any recommendations?
Edit: I should mention that some of these tables would contain 20k+ rows.
Realistically it sounds like you only really need one database for this.
From the way you worded your question, it sounds like you're trying to dynamically create tables for users as they create accounts. I wouldn't recommend this method.
What you want to do is create a master table that contains a primary key for each individual user. I'm assuming this is the Membership table. Then create the ~20 tables that you need for the profiles of these members. Every record, no matter the number of users that you have, will go into these tables. These 20 tables would need to have a foreign key pointing to the unique identifier of the Membership table.
When you want to query a Member for their user information, just select from the tables where the membership table's primary Id matches the foreign key in the profile tables.
This would result in only a few tables in the end and is easily maintainable and follows better database design.
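A sketch of that shape with made-up names (Transactions standing in for one of the ~20 profile tables):

    CREATE TABLE Membership (
        MemberID INT IDENTITY PRIMARY KEY,
        UserName NVARCHAR(100) NOT NULL
    );

    CREATE TABLE Transactions (
        TransactionID INT IDENTITY PRIMARY KEY,
        MemberID      INT NOT NULL REFERENCES Membership (MemberID),  -- foreign key
        Amount        DECIMAL(18, 2) NOT NULL
    );

    -- One member's data is just a filter on the foreign key.
    DECLARE @MemberID INT = 42;
    SELECT t.* FROM Transactions AS t WHERE t.MemberID = @MemberID;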
Your ORM layer (EF, LINQ, DAL code) will hate having to deal with one set of tables per tenant. It is much better to have either one set of tables for all tenants in a single database, or a separate database per tenant. The latter is only better if schema upgrades have to be vetted per tenant (like Salesforce.com has). If you can afford to upgrade all tenants to a new schema at once, then there is no reason for a database per tenant.
When you design a schema that holds multiple tenants, the important things to remember are:
don't use heaps; every table must have a clustered index
add the tenant ID as the leftmost key of every clustered index
add the tenant ID as the leftmost key of every non-clustered index too
add the left.TenantID = right.TenantID predicate to every join
add the table.TenantID = @currentTenantID predicate to every query
These are fairly simple rules, and if you obey them (with no exceptions) you will get a perfect partitioning per tenant of every query (no query will ever scan rows in a range of a different tenant), so you eliminate contention between tenants. To be more thorough, you can disable lock escalation to make sure no tenant's locks escalate and block every other tenant.
This design also lends itself to table partitioning and to sharding the database for scale-out.
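A T-SQL sketch of those rules, using hypothetical Orders/OrderLines tables:

    -- TenantID leads the clustered index, so each tenant's rows are stored together.
    CREATE TABLE dbo.Orders (
        TenantID INT NOT NULL,
        OrderID  INT NOT NULL,
        Total    DECIMAL(18, 2) NOT NULL,
        CONSTRAINT pk_Orders PRIMARY KEY CLUSTERED (TenantID, OrderID)
    );

    CREATE TABLE dbo.OrderLines (
        TenantID   INT NOT NULL,
        OrderID    INT NOT NULL,
        LineNumber INT NOT NULL,
        Qty        INT NOT NULL,
        CONSTRAINT pk_OrderLines PRIMARY KEY CLUSTERED (TenantID, OrderID, LineNumber)
    );

    -- TenantID also leads every non-clustered index.
    CREATE NONCLUSTERED INDEX ix_Orders_Total ON dbo.Orders (TenantID, Total);

    -- Every join repeats the tenant predicate, and every query filters on the current tenant.
    DECLARE @currentTenantID INT = 7;
    SELECT o.OrderID, ol.LineNumber, ol.Qty
    FROM dbo.Orders AS o
    JOIN dbo.OrderLines AS ol
        ON ol.TenantID = o.TenantID
       AND ol.OrderID  = o.OrderID
    WHERE o.TenantID = @currentTenantID;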
You definitely don't want to create a set of tables for each user, and you would want these in only one database. Even with SQL Server 2008's large capacity for tables (really, total objects in the database), it would quickly become unmanageable. Your best bet is to use 20 tables and separate each user's rows via a column. You might consider partitioning the tables by this user value, but that should be tested for performance reasons too.
Yes, since the tables only contain id, key, and value, why not make one single table?
Have the columns:
id, user ID, key, value
Put an Index on the user ID field.
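A sketch of that single table (names assumed; "key" is renamed because it is a reserved word in many SQL dialects):

    CREATE TABLE UserData (
        Id        INT IDENTITY PRIMARY KEY,
        UserID    INT NOT NULL,
        DataKey   NVARCHAR(100) NOT NULL,
        DataValue NVARCHAR(MAX) NULL
    );

    -- Index on the user ID so each user's rows are cheap to fetch.
    CREATE INDEX ix_UserData_UserID ON UserData (UserID);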
A key idea behind a relational database is that the table structure does not change. You create a solid set of tables, and these are the "bones" of your application.
Cheers,
Daniel
Neal,
The solution really depends on your requirements. If security and data access are a concern and you have only a handful of users, you can set up a separate DB for each user, with each user's access restricted to his/her own database.
Otherwise, what Daniel Williams suggested is a good alternative, where you have one DB and tables laid out with an indexed column partitioning the users' data rows.
It's hard to tell from the summary, but it looks like you are designing for dynamic attribution by user. This design approach is called EAV (Entity-Attribute-Value) and consists of a simple base collection key (UserID, SiteID, ProductID...) and then rows consisting of name/value pairs. In a more complex version, categories are sometimes added as "super columns" to the tuple/row and provide sub-groupings for a set of name/value pairs.
Designing in this way moves responsibility for data type integrity, relational integrity and tuple integrity to the application layer.
The risk with doing this in a relational system is that the tuple, or row, is broken into a set of rows. Updates, deletes, missing values and the definition of a tuple are no longer easily accessible through human interaction. As your application evolves and the definition of a tuple changes, it becomes almost impossible to tell whether a name/value pair is missing because it's part of an earlier-version tuple or because it was unintentionally deleted. Ad-hoc research also becomes harder to manage, as business analysts must keep an understanding of the virtual structure either in their heads or in the documentation provided.
If you are looking to implement an EAV model, I would suggest you look at a non-relational (NoSQL) solution like MongoDB or CouchDB. These stores allow a developer to save and retrieve "documents", or JSON-formatted messages, that are essentially made up of a collection of name/value pairs and can look very much like a serialized object. The advantage here is that you can store dynamic attribution without breaking your tuple. You always know that you have a complete tuple because you can store and retrieve it as a single "blob" of information that can be serialized and deserialized at will. You can also update single attributes within the tuple, if that's a concern.
MongoDB also provides some database-like features such as multiple-attribute indexes, a query engine that is robust in comparison to other similar non-relational offerings and a sharding solution that is much less trouble than trying to do it with MySQL.
I hope this helps.

Generating unique database IDs in code

One requirement is that when persisting my C# objects to the database I must decide the database ID (surrogate primary key) in code.
Second requirement is that the database type for the key must be int or char(x)... so no uniqueidentifier or binary(16) or the like.
These are unchangeable requirements.
What would be the best way to go about handling this?
One idea is base64-encoded GUIDs, looking like "XSiZtdXcKU68QWe7N96Dig". These are easily created in code and are, to me, acceptable in URLs if necessary. But will it be too expensive in terms of performance (indexing, size) to have all primary and foreign keys be char(22)? Offhand I really like this idea.
Another idea would be to create a code version of a database sequence, generating incremented integers for me. But I don't know if this is plausible, and I would need some guidance to ensure its reliability. The sequencer must know how far it has come, and what about threads that I don't control, etc.?
I imagine that no table involved will ever exceed 1,000,000 rows... it will probably be far less.
You could have a table called "sequences". For each table there would be a row with a counter. Then, when you need another number, fetch it from the counter table and increment it. Put it in a transaction and you will have uniqueness.
However this will suffer in terms of performance, of course.
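A T-SQL sketch of that counter table (names assumed); a single UPDATE both increments the counter and hands back the reserved value, so two callers can't receive the same number:

    CREATE TABLE dbo.Sequences (
        TableName VARCHAR(128) NOT NULL PRIMARY KEY,
        NextValue INT NOT NULL
    );
    INSERT INTO dbo.Sequences (TableName, NextValue) VALUES ('People', 1);

    -- Reserve the next id: OUTPUT returns the value before the increment.
    DECLARE @ids TABLE (Id INT);
    UPDATE dbo.Sequences
    SET NextValue = NextValue + 1
    OUTPUT deleted.NextValue INTO @ids
    WHERE TableName = 'People';

    SELECT Id FROM @ids;  -- use this as the new primary key value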
A simple incrementing int would be the easiest way to ensure uniqueness. This is what the database will do if you let it. If you set the key column to auto_increment, the database will do this for you automatically.
There are no security issues with this, but since you will be handling it yourself instead of letting the database engine take care of it, you will need to ensure that you don't generate the same id twice. This should be simple if you are on a single threaded system, but if your program is distributed you will need to put some effort into ensuring the uniqueness.
Seeing that you have an ASP.NET app, you could do the following (hoping and assuming all users must authenticate themselves before using your app!):
Assign each user a unique "UserID" in your database (can be INT, or CHAR)
Assign each user a "HighestSequentialID" (INT) in your database
When the user logs on, read those values from the database and store them in e.g. a custom principal, or in a cookie, or something else
Whenever the user is about to insert a row, create a segmented ID: (UserID).(User's sequential number) and store it as VARCHAR(20) - e.g. your UserID is 15 and thus this user's entries would have unique IDs of "15.00001", "15.00002" and so on.
When the user logs off (or at any other time), update that user's new highest-used sequential ID in the database so that next time around, you'll know what this user used last.
Again - you'll have to do a lot more housekeeping work yourself, and it's always prone to a mishap (assigning a duplicate user ID, or misinterpreting the highest sequential number for that user).
I would strongly recommend trying to get these requirements changed - with these in place, all solutions will be sub-optimal at best, while using the database to handle this would be totally painless.
Marc
For a table below 1,000,000 rows, I would not be too terribly concerned about a char(22) primary key. Of course, the ideal solution for a situation like this would be for each object to have something unique about it that you could leverage for the key, even if it is a multi-part key. The next-best solution would be to have the requirements changed :)

Resources