This is a more in-depth follow-up to a question I asked yesterday about storing historical data (Storing data in a side table that may change in its main table), and I'm trying to narrow down my question.
If you have a table that represents a data object at the application level and you need that table for historical purposes, is it considered bad practice to set it up so that the information can't be deleted? Basically, I have a table representing safety requirements for a worker, and I want to make it so that these requirements can never be deleted or changed. So if a change needs to be made, a new record is created.
Is this not a good idea? What is the best practice for dealing with data like this? I have a table with historical safety training data, and it points to the table with requirement data (as well as some other key tables), so I can't let the requirements be changed or the historical table will be pointing to the wrong information.
Your scenario sounds perfectly valid to me. If you have historical data that you need to keep, there are various ways to meet that requirement.
Option 1:
Store all historical data and current data in one table (make sure you store a creation date so you know what's old and what's new). When you need to retrieve the most recent record for someone, just base it on the most recent date that exists in the table.
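A minimal sketch of Option 1, using Python's built-in sqlite3 module only because it is easy to run. The table and column names (worker_requirement, worker_id, requirement, created_at) and the sample rows are made up for illustration; they are not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE worker_requirement (
        id          INTEGER PRIMARY KEY,
        worker_id   INTEGER NOT NULL,
        requirement TEXT    NOT NULL,
        created_at  TEXT    NOT NULL  -- ISO-8601 strings sort chronologically as text
    )
""")

# Hypothetical sample data: worker 1 has an old and a new version.
conn.executemany(
    "INSERT INTO worker_requirement (worker_id, requirement, created_at) "
    "VALUES (?, ?, ?)",
    [
        (1, "Hard hat", "2015-01-10T09:00:00"),
        (1, "Hard hat and goggles", "2015-06-02T14:30:00"),
        (2, "Steel-toe boots", "2015-03-20T08:15:00"),
    ],
)


def current_requirement(conn, worker_id):
    """Return the newest requirement text for a worker, or None if none exists."""
    row = conn.execute(
        """
        SELECT requirement
        FROM worker_requirement
        WHERE worker_id = ?
        ORDER BY created_at DESC, id DESC  -- id breaks ties on equal timestamps
        LIMIT 1
        """,
        (worker_id,),
    ).fetchone()
    return row[0] if row else None
```

Older rows are never touched, so anything that points at a specific row id keeps pointing at the values that were valid at the time.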
Option 2:
Store all historical data in a separate table and keep current data in another. This might be beneficial if you're working with millions of records so you don't degrade performance of any applications built on top of it. Either at the time of creating a new record or through some nightly job you can move old data into the other table to keep your current table lightweight.
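A rough sketch of the "nightly job" variant of Option 2, again with Python's sqlite3 module. The table names (requirement_current, requirement_history), the columns, and the rule "a row is old once a newer row exists for the same worker" are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE requirement_current (
        id          INTEGER PRIMARY KEY AUTOINCREMENT,  -- ids are never reused
        worker_id   INTEGER NOT NULL,
        requirement TEXT    NOT NULL,
        created_at  TEXT    NOT NULL
    );
    CREATE TABLE requirement_history (
        id          INTEGER PRIMARY KEY,
        worker_id   INTEGER NOT NULL,
        requirement TEXT    NOT NULL,
        created_at  TEXT    NOT NULL
    );
""")

# Ids of rows that have a newer sibling for the same worker
# (the id breaks ties when two rows share a timestamp).
SUPERSEDED_IDS = """
    SELECT c.id
    FROM requirement_current AS c
    WHERE EXISTS (
        SELECT 1
        FROM requirement_current AS newer
        WHERE newer.worker_id = c.worker_id
          AND (newer.created_at > c.created_at
               OR (newer.created_at = c.created_at AND newer.id > c.id))
    )
"""


def archive_superseded(conn):
    """Move superseded rows to the history table; return how many were moved."""
    with conn:  # copy and delete in one transaction
        conn.execute(
            "INSERT INTO requirement_history (id, worker_id, requirement, created_at) "
            "SELECT id, worker_id, requirement, created_at "
            "FROM requirement_current WHERE id IN (" + SUPERSEDED_IDS + ")"
        )
        cur = conn.execute(
            "DELETE FROM requirement_current WHERE id IN (" + SUPERSEDED_IDS + ")"
        )
    return cur.rowcount


# Hypothetical sample data, then one run of the job.
conn.executemany(
    "INSERT INTO requirement_current (worker_id, requirement, created_at) "
    "VALUES (?, ?, ?)",
    [
        (1, "Hard hat", "2015-01-10T09:00:00"),
        (1, "Hard hat and goggles", "2015-06-02T14:30:00"),
        (2, "Steel-toe boots", "2015-03-20T08:15:00"),
    ],
)
moved = archive_superseded(conn)
```

Keeping the original id when a row moves means references made while the row was current can still be resolved against the history table.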
Here is one alternative, that is not necessarily "better" but is something to keep in mind...
You could have separate "active" and "historical" tables, then create a trigger so whenever a row in the active table is modified or deleted, the old row values are copied to the historical table, together with the timestamp.
This way, the application can work with the active table in a natural way, while the accurate history of changes is automatically generated in the historical table. And since this works at the DBMS level, you'll be more resistant to application bugs.
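As an illustration of the trigger idea, here is a small sketch using SQLite through Python's sqlite3 module. Trigger syntax differs between DBMSs, and the table and column names (requirement, requirement_history, operation, changed_at) are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE requirement (
        id          INTEGER PRIMARY KEY,
        description TEXT NOT NULL
    );

    CREATE TABLE requirement_history (
        history_id     INTEGER PRIMARY KEY,
        requirement_id INTEGER NOT NULL,
        description    TEXT    NOT NULL,
        operation      TEXT    NOT NULL,
        changed_at     TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
    );

    -- Copy the OLD row values whenever an active row is modified...
    CREATE TRIGGER requirement_after_update
    AFTER UPDATE ON requirement
    BEGIN
        INSERT INTO requirement_history (requirement_id, description, operation)
        VALUES (OLD.id, OLD.description, 'UPDATE');
    END;

    -- ...or deleted.
    CREATE TRIGGER requirement_after_delete
    AFTER DELETE ON requirement
    BEGIN
        INSERT INTO requirement_history (requirement_id, description, operation)
        VALUES (OLD.id, OLD.description, 'DELETE');
    END;
""")

# The application only ever touches the active table.
conn.execute("INSERT INTO requirement (id, description) VALUES (1, 'Hard hat')")
conn.execute("UPDATE requirement SET description = 'Hard hat and goggles' WHERE id = 1")
conn.execute("DELETE FROM requirement WHERE id = 1")

history = conn.execute(
    "SELECT requirement_id, description, operation "
    "FROM requirement_history ORDER BY history_id"
).fetchall()
```

After these three statements the active table is empty, while the history table holds the two superseded versions of the row along with the time each change happened.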
Of course, things can get much messier if you need to maintain a history of the whole graph of objects (i.e. several tables linked via FOREIGN KEYs). Probably the simplest option is to forgo referential integrity for historical tables and keep it only for active tables.
If that's not enough for your project's needs, you'll have to somehow represent a "snapshot" of the whole graph at the moment of change. One way to do it is to treat the connections as versioned objects too. Alternatively, you could just copy all the connections with each version of the endpoint object. Either case will complicate your logic significantly.
Could you help me understand this approach?
I have to write a query that performs some operations. I don't want to use containers, since I read that temporary tables are faster, at least for my case, but I don't get how they work:
The web service that I will use to insert into the temporary table will be consumed by several people at the same time, and the values will be different for each user, which is the reason I want to do this. But I don't understand how the temporary table will manage the data for each user. Since it is only one table, if one user calls the WS, the table will contain some rows, but then another user could call the WS at the same time, which should fill the table with other values. How does this work?
Are temporary tables stored per user, or how does it work in my case?
Thanks in advance
Both kinds of temp table are based on scope. When the variable/buffer goes out of scope, the table is dropped. So each user or WS call uses its own table.
You can find specs here:
https://msdn.microsoft.com/en-us/library/gg845661.aspx
https://msdn.microsoft.com/en-us/library/bb314749.aspx
I am struggling to define an effective process of revisioning. We have some data spread across multiple tables. We cannot delete or update; we need to create new issues of the same data. I know the solution of a history table containing all revisions etc., but that seems to work fine only as long as you want to keep revisions of simple structures, such as a blogging platform.
What if you have a database with many complex structures, where the simplest of them looks like this below.
If you change something in TableA, you can keep the old data in a history table. What happens, though, if you change something in TableB, which defines what a record in TableA is? It almost forces you to create a copy of TableA (a new ID, in other words) and recreate its underlying structures (more new IDs). The whole process of creating a new ID each time a mistake is corrected or some peripheral data is added doesn't feel right.
Is there any good practice for such cases? I read somewhere about keeping the whole old data structure revisioned in XML, but that practice copes poorly with schema changes and is not easily queryable. Technologies such as Flashback don't cover the whole spectrum of our needs either.
Tip: We're using Oracle v11.2.
I need to date/timestamp various transactions, and can add that explicitly into the data structure.
Firebase creates an ID like IuId2Du7p9rJoT-BARu using some algorithm.
Is there a way I can decode the date/time from the firebase-created ID and avoid storing a separate date/timestamp?
Short answer: no.
I've asked the same question previously, because my engineer instincts tell me I can never duplicate data. The conclusion that I came to after I thought this through to the logical end, is that even in a SQL database there exists tons of duplication. It's simply hidden under the covers (as indices, temporary tables, and memory caches). This is a part of large and active data.
So drop the timestamp in the data and go have lunch; save yourself some energy :)
Alternately, skip the timestamp entirely. You know that the records are stored by timestamp already, assuming you haven't provided your own priority, so you should be good to go.
I'm developing a quick side project that needs a users table, and I want them to be able to store profile data. I was already reaching for the ASP.NET profile provider when I realized that users will only ever have one profile.
I realize that frequently changing data will impact performance on things like indexes and stuff but how frequent is too frequent?
If I have one profile change per month per user happening for say 1000 users, is that a lot?
Or are we talking more like users changing profile data on an hourly basis?
I realize this isn't an exact science, but I'm trying to gauge at what point it starts to become a problem and, since my users' profile data will probably rarely change, whether I should bother with the extra work or just wait a few decades for it to be a problem.
One thing to consider is how adding a large text column to a table will affect the layout of the rows. Some databases will store the large columns inlined with the other fixed size columns; this will make the rows variable sized and that means more work for the database when it needs to pull a row off the disk. Other databases (such as PostgreSQL) store large text columns away from the fixed size columns; this leads to fixed sized rows with quick access during table scans and the like but an extra bit of work is needed to pull out the text columns.
1000 users isn't that much in database terms so there's probably nothing to worry about one way or the other. OTOH, little one-off side projects have a nasty habit of turning into real mission critical projects when you're not looking so doing it right from the beginning is a good idea.
I think Justin Cave has covered the index issue well enough.
As long as you structure your data access properly (i.e. all access to your user table goes through one isolated pile of code) then changing your data schema for users won't be much work anyway.
Does the profile information actually need to be indexed? Or are you just going to be retrieving it based on the USER_ID of the table or some other indexed USER column? If the profile data isn't indexed, which seems likely to me, then there are no performance impacts on other indexes on the table.
The only reason I can think of to be concerned about putting profile information in the table is if there is a lot of data compared to the necessary information to define a user and if the USER table needs to be full scanned for some reason. In that case, increasing the size of the table would adversely affect the performance of a table scan. Assuming that you don't have a use case where it's regularly going to make sense to do a full scan on the USERS table, and given that the table will only have 1000 rows, that's probably not a big deal.
I've searched through the site and haven't found a question/answer that quite answers my question; the closest one I found was: Syncing objects between two disparate systems best approach.
Anyway, to begin: because there are no RSS feeds available, I'm screen scraping a webpage. It does a fetch, then goes through the webpage to scrape out all of the information that I'm interested in, and dumps that information into a sqlite database so that I can query the information at my leisure without repeatedly fetching from the website.
However I'm also storing various metadata on the data itself that is stored in the sqlite db, such as: have I looked at the data, is the data new/old, bookmark to a chunk of data (Think of it as a collection of unrelated data, and the bookmark is just a pointer to where I am in processing/reading of the said data).
So right now my current problem is trying to figure out how to update the local sqlite database with new data and/or changed data from the website in a manner that is effective and straightforward.
Here's my current idea:
Download the page itself
Create a temporary table for the parsed data to go into
Do a comparison between the official and the temporary table and copy updates and/or new information to the official table
This process seems kind of complicated because I would have to figure out how to determine if the data in the temporary table is new, updated, or unchanged. So I am wondering if there is a better approach, or if anyone has any suggestions on how to architect/structure such a system?
Edit 1:
I'm not sure where to put the additional information, in a comment or as an edit, so I'm going to add it here.
This expands a bit on the metadata with regard to bookmarking. Basically, the data source can create new data or additions to the current data, so one reason I was thinking of the temporary table idea was so that I would be able to determine whether a data source that has been "bookmarked" has any new data or not.
Is it really important to determine if the data in the temporary table is new, updated or unchanged? Do you really need to keep a history of the changes?
NO: don't use the temporary table; just mark your old records as old (with a timestamp), don't do updates, and just insert your new data.
YES: your idea seems correct to me, but it all depends on how much data you need to process each time; I don't think it is feasible with a large amount of data.
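For the YES case, a rough sketch of the temporary-table comparison with Python's sqlite3 module. It assumes every scraped record carries some stable identifier (called source_key here); that column, the item table, and the seen flag standing in for the local metadata are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE item (
        source_key TEXT PRIMARY KEY,          -- assumed stable id from the page
        body       TEXT NOT NULL,
        seen       INTEGER NOT NULL DEFAULT 0 -- local metadata: already read?
    )
""")


def sync(conn, scraped):
    """Merge (source_key, body) pairs into item; return (new_keys, changed_keys)."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS temp.staging")
        conn.execute(
            "CREATE TEMP TABLE staging (source_key TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )
        conn.executemany("INSERT INTO staging VALUES (?, ?)", scraped)

        # New: present in staging, absent from the official table.
        new_keys = [r[0] for r in conn.execute(
            "SELECT s.source_key FROM staging AS s "
            "LEFT JOIN item AS i ON i.source_key = s.source_key "
            "WHERE i.source_key IS NULL ORDER BY s.source_key")]
        # Changed: present in both, but the content differs.
        changed_keys = [r[0] for r in conn.execute(
            "SELECT s.source_key FROM staging AS s "
            "JOIN item AS i ON i.source_key = s.source_key "
            "WHERE i.body <> s.body ORDER BY s.source_key")]

        conn.execute(
            "INSERT INTO item (source_key, body) "
            "SELECT s.source_key, s.body FROM staging AS s "
            "WHERE NOT EXISTS (SELECT 1 FROM item AS i WHERE i.source_key = s.source_key)")
        # Changed rows get the new body and are flagged unread again;
        # unchanged rows keep their metadata.
        conn.execute(
            "UPDATE item SET seen = 0, "
            "body = (SELECT s.body FROM staging AS s WHERE s.source_key = item.source_key) "
            "WHERE EXISTS (SELECT 1 FROM staging AS s "
            "WHERE s.source_key = item.source_key AND s.body <> item.body)")
    return new_keys, changed_keys


first = sync(conn, [("a", "one"), ("b", "two")])
conn.execute("UPDATE item SET seen = 1")  # pretend everything has been read
second = sync(conn, [("a", "one"), ("b", "two v2"), ("c", "three")])
```

The two returned lists are what would tell you whether a "bookmarked" source has anything new. The whole scrape is loaded into the staging table on every run, which is where the concern about large amounts of data comes in.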