SQL Server Table Structure, StartDate and EndDate - asp.net

I have a Tariffs table for international dialing codes, with StartDate and EndDate columns.
I'm using an ASP.NET application to import Excel offers into this table. Each offer contains about 10,000 rows, so it is a large table (about 3 million rows).
What is the faster approach in SQL Server 2008, a stored procedure or a trigger, to change the previous EndDate for the same tariff, same prefix and same destination when a new row with a new rate is inserted?
And how can I undo saving an offer of 10,000 rows, getting the table back and restoring the updated records to their previous state?
Thank you.

The information in your question seems a bit jumbled, partly because of the ideas within it but also because of unhelpful grammar/whitespace (sorry to be so blunt, but these things are important), but I'll try my best to answer.
In general, assume that a trigger is slower than a stored proc. Triggers also add a higher level of complexity than many alternatives, like procs, so always be sure you really need one before using one.
That said, I don't understand why you'd need a trigger if you're only inserting into one table. Triggers are usually used to implement a complex chain of logic. If it's a straight insert or update, then keep it simple and use a proc.
If it's just an insert, then the quickest way of all is a bulk insert.
Since you want to keep the previous state, my advice would be to create an archive/audit table (basically a duplicate, possibly with some extra fields like WhenInserted). On insert, move the existing rows into the archive (i.e. insert them into the archive table and then delete them from the original), and then do a bulk insert for the new rows.
But you use the word "change", so it's difficult to know what you really want. Hope that helps.
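If "change" means closing off the previous rate when a new one arrives, here is a minimal sketch of the proc route. It assumes the Excel offer has already been bulk-loaded into a staging table, and all of the table/column names (TariffsStaging, TariffsArchive, Prefix, Destination, Rate) are guesses based on your description. The single transaction is also what gives you the "undo": if anything fails, the whole 10,000-row offer rolls back and the table keeps its previous state.

CREATE PROCEDURE dbo.ApplyOffer
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRY
        BEGIN TRANSACTION;

        -- Keep a copy of the rows that are about to be superseded
        INSERT INTO dbo.TariffsArchive (Prefix, Destination, Rate, StartDate, EndDate, WhenArchived)
        SELECT t.Prefix, t.Destination, t.Rate, t.StartDate, t.EndDate, GETDATE()
        FROM dbo.Tariffs AS t
        INNER JOIN dbo.TariffsStaging AS s
            ON s.Prefix = t.Prefix AND s.Destination = t.Destination
        WHERE t.EndDate IS NULL;

        -- Close the previous EndDate for the same prefix/destination
        UPDATE t
        SET t.EndDate = s.StartDate
        FROM dbo.Tariffs AS t
        INNER JOIN dbo.TariffsStaging AS s
            ON s.Prefix = t.Prefix AND s.Destination = t.Destination
        WHERE t.EndDate IS NULL;

        -- Add the new offer in one set-based insert (no row-by-row loop)
        INSERT INTO dbo.Tariffs (Prefix, Destination, Rate, StartDate, EndDate)
        SELECT Prefix, Destination, Rate, StartDate, NULL
        FROM dbo.TariffsStaging;

        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        -- Any error undoes the whole offer and restores the previous state
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        DECLARE @msg NVARCHAR(2048) = ERROR_MESSAGE();
        RAISERROR(@msg, 16, 1);
    END CATCH
END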

Related

Adding new fields to historical tables in BigQuery

I'm getting daily exports of Google Analytics data into BigQuery, and these form the basis for our main reporting dataset.
Over time I need to add new columns for additional things we use to enrich the data - say, a mapping from URL to 'reporting category', for example.
This is easy to add as a new column onto the processed tables (there are about 10 processing steps at the moment for all the enrichment we do).
The issue is when stakeholders then ask: can we add that new column to the historical data?
Currently I then need to rerun all the daily jobs, which is very slow and costly.
This is coming up frequently enough that I'm seriously thinking about redesigning my data pipelines to cater for the fact that I often need to essentially drop and recreate ALL the data from time to time, whenever I need to add a new field or correct old dirty data or something.
I'm just wondering if there are better ways to:
Add a new column to an old table in BQ (I'd be happy to do this by hand in these instances, since I can just join the new column based on the GA [hit_key] I have defined, which is basically a row key).
(Less common) Update existing tables based on some WHERE condition.
I'm wondering what the best practices are, and whether anyone has had similar issues where you basically need to update a historical schema, and if there are ways to do it without just dropping and recreating everything, which is essentially what I'm doing at the moment.
To be clearer on my current approach: I'm taking the [ga_sessions_yyyymmdd] table and making a series of [ga_data_prepN_yyyymmdd] tables, where I either add new columns at each step or reduce the data in some way. There are now 11 of these steps, and each time I'm taking all 100 or more columns along for the ride. This is what I'm going to try to design away from, as currently 90% of the columns at each stage don't even need to be touched; they could just be joined back on at the end, maybe based on hit_key or something.
It's a little bit messy, though, to try and pick apart.
Adding new columns to the schema of the existing historical tables is possible, but the values for newly added columns will be NULLs. If you do need to populate values into these columns, probably the best approach is to use an UPDATE DML statement. More details on how to try it out are here: Does BigQuery support UPDATE, DELETE, and INSERT (SQL DML) statements?
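For example, once the reporting_category column has been added to the table's schema (it will be NULL for existing rows), a backfill in BigQuery standard SQL could look roughly like this; the project/dataset/table names and the mapping table are purely illustrative:

UPDATE `myproject.mydataset.ga_data_prep1_20170101` AS t
SET reporting_category = m.reporting_category
FROM `myproject.mydataset.url_to_category` AS m
WHERE t.hit_key = m.hit_key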

Move Records from "Active Tables" to "Archive Tables" in SQL Server 2012

This one is tricky to me on the SQL side. I have 4 tables, and I think I'm having more of a logic issue than anything else. I have a table called Vehicle. Within my application, any record changed in this Vehicle table gets inserted into a VLog table. The primary key for Vehicle is VehicleID, and it is related to VehicleID in VLog. I also have two archive tables, VehicleArchive and VLogArchive, and I'm not positive how to archive both of the live tables in SQL.
Within my application, I'm able to "archive" certain records based on a certain parameter. This was easy, as I was using a GridView and could verify the VehicleID and insert any record with that VehicleID into VLogArchive and VehicleArchive. However, I'm now going to be dealing with live records and was wondering if there is a solution in SQL. There are multiple records for each VehicleID in VLog, as VLog keeps track of all the changes made in Vehicle.
Within the application itself, on the "Update" button click, I was able to insert the records from VLog into VLogArchive by comparing them to the VehicleID within the GridView, then remove those records from VLog, with the Location changed to "File Room." It then goes on to insert the records from Vehicle into VehicleArchive, again by comparing to the VehicleID within the GridView, as long as the Location is "File Room." It seems backwards, but I had to do it this way because if I tried to remove a record from Vehicle before removing the related records from VLog, it wouldn't delete, since the tables are related. I do not know how to take this approach in SQL and was wondering if someone knows how to move all the records at the same time when the Location is "File Room."
I have found this and this but I'm not positive these are the approaches that I need. Thanks for the help!
I'm going to disagree with @Richard. I'm always going to lean towards keeping your tables 'lean and mean' if at all possible. If the data is old/unused, get it out of your production tables! Keeping unnecessary records in your transactional tables presents many performance and maintenance risks. Since you're going to include a check on that flag in EVERY select query you run (using the view that @Richard suggests), you're going to have to add that flag to every nonclustered index to avoid tipping-point problems. You're also going to run into problems with unique constraint enforcement (if your database is properly designed), as well as making queries more error prone (what if someone writes a report and doesn't use the view?). The list of problems with metadata flags like that goes on and on and on. Just don't do it.
There are a lot of different ways to peel the onion on archiving data. I prefer more frequent and smaller transactions myself. If you're archiving to the same server, use T-SQL as in your first link. If you're archiving to a different server, use SSIS.
Don't do it like that; add an Archived bit flag to the tables and use that to filter the views.
Then use this as the source for the GridView:
SELECT * FROM Vehicle WHERE Archived=0
To mark a record as archived you can then do:
UPDATE Vehicle SET Archived=1 WHERE ID=1
Moving records like this on a production system is never going to be easy and will almost certainly cause more problems than you can imagine.
If you really do want to do this, then you're going to have to do it record by record, by copying the data; and of course this is going to break referential integrity, which you'll have to fix up manually if any other tables reference Vehicle.
You can do this either in code as a background procedure or (better) using SQL / SSIS on the db server.
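A rough sketch of the same-server T-SQL route, reusing the table names from the question (the archive tables are assumed to have the same column layout and no triggers or foreign keys of their own, and Location is assumed to be a column on Vehicle). The child VLog rows are moved first so the foreign key to Vehicle is never violated:

BEGIN TRANSACTION;

-- OUTPUT ... INTO copies the deleted rows into the archive table
-- in the same statement as the delete.
DELETE vl
OUTPUT DELETED.* INTO dbo.VLogArchive
FROM dbo.VLog AS vl
INNER JOIN dbo.Vehicle AS v ON v.VehicleID = vl.VehicleID
WHERE v.Location = 'File Room';

-- Then move the Vehicle rows themselves.
DELETE v
OUTPUT DELETED.* INTO dbo.VehicleArchive
FROM dbo.Vehicle AS v
WHERE v.Location = 'File Room';

COMMIT TRANSACTION;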

SQL server inserting lots of data from ASP.NET?

I have this application where there is a parent-child table structure and customers can order products. The whole structure is too complex to post here, but suffice it to say there is one Order table and one OrderDetails table for storing the orders. Currently what we are doing is INSERT one record into the Order table, and then, for each item the customer added, insert that item into the OrderDetails table in a loop. The solution is not scalable for obvious reasons. It works fine for 100 or so items, but if the user goes over 1,000 items, or a quantity of 1,000 for an item, you start to notice the unresponsiveness of the application.
There are a couple of solutions that come to mind, but I am not sure which one would scale well. One is to use BulkInsert from my ASP.NET application to insert into the OrderDetails table. The second is to generate XML and pass that to a SQL proc, which extracts and inserts the data into the OrderDetails table from the XML, but that has the associated overhead of the memory consumed by the generated XML. I know I could benchmark and see for myself what suits my application best, but I would like to know what the most common strategy is and which would scale better compared to the other. Also, if there is another technique I could use instead of these two that would perform better (I know performance is a subjective word, but let me narrow it down to speed), I could use that. Which is generally used the most? What do you use in your application?
You could consider exploring the option of using a table valued parameter in the database. You will have to create a table type object, whose structure will mimic that of the OrderDetails table. The stored proc for inserting the data will accept an input parameter of this type (such parameters are always READONLY).
In your server side code, you can construct a DataTable object containing all the Order Details data, which will be mapped to the input parameter of the stored proc. Ensure that the order of columns in the DataTable object exactly matches the order in the table valued parameter. Upon executing the query, all the data will be inserted in one shot. This will save you from looping for each row of data that is there, and will also prevent the overhead of XML parsing. This approach though will involve passing an entire object over the network.
You can read more about it here: MSDN Table Valued Parameters
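Roughly, the database side could look like this; the column names are just placeholders for whatever OrderDetails actually contains:

CREATE TYPE dbo.OrderDetailsType AS TABLE
(
    ProductId INT           NOT NULL,
    Quantity  INT           NOT NULL,
    UnitPrice DECIMAL(10,2) NOT NULL
);
GO

CREATE PROCEDURE dbo.InsertOrderDetails
    @OrderId INT,
    @Details dbo.OrderDetailsType READONLY   -- table-valued parameters must be READONLY
AS
BEGIN
    SET NOCOUNT ON;

    -- One set-based insert instead of one round trip per item
    INSERT INTO dbo.OrderDetails (OrderId, ProductId, Quantity, UnitPrice)
    SELECT @OrderId, ProductId, Quantity, UnitPrice
    FROM @Details;
END

On the ASP.NET side, the DataTable is passed as a SqlParameter with SqlDbType.Structured and TypeName set to "dbo.OrderDetailsType".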
1000 items for an order does seem quite excessive!
Would it be feasible to introduce a limit of 100 items per order into the business logic of the application?

How to work with data stored in database?

Good day everyone,
I have some questions about how to do calculations on data stored in the database. For example, I have a table:
| ID | Item name | quantity of items | item price | date |
and, for example, I have 10,000 records stored.
The first thing I need to do is pick up the items from a date interval, so I won't need the whole table for my calculations. Then, for the items from that date interval, I have to add some fields, for example to calculate:
full price = quantity of items * item price
and store them in a new table for each item. So the table for the items picked from the date interval should look like this:
| ID | Item name | quantity of items | item price | date | full price |
The point is that I don't know how to store the items that I picked from the date interval. Do I have to create some temporary table, or something?
This will be used in an ASP.NET web application, and for the calculations in the database I think I will use SQL queries. Maybe there is an easier way to do it? Thank you for your time.
As other people have said, you can perform these calculations on the fly rather than storing them.
However, to answer your question, a query like this should do the trick:
I haven't tested this so the syntax might be off a touch, though it will get you on the right track.
Ultimately you have to do an INSERT with a SELECT:
INSERT INTO itemFullPrice
SELECT id, itemname, itemqty, itemprice, [date], itemqty * itemprice AS fullprice
FROM items
WHERE [date] BETWEEN '2012/10/01' AND '2012/11/01'
Again, don't shoot me if I have the syntax a little off - it's a busy day today :D
With 10,000 records, it would not be a good idea to use temporary tables.
You'd be better off having another table, called ProductsPriceHistory, where you periodically calculate and store, let's say, monthly reports.
This way, your reports would be faster and you wouldn't have to do the calculations every time you want a report.
Be aware that this approach is OK if your date intervals are fixed, i.e. monthly, quarterly, yearly, etc.
If your date intervals are dynamic, e.g. from 10/20/2011 to 10/25/2011, or from 05/20/2011 to 08/13/2011, this approach wouldn't work.
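For the fixed-interval case, the periodic job could be as simple as something like the following; the Items column names and the history table layout are just assumptions:

INSERT INTO dbo.ProductsPriceHistory (ItemName, TotalQuantity, TotalFullPrice, PeriodStart)
SELECT ItemName,
       SUM(Quantity),
       SUM(Quantity * ItemPrice),
       '2012-10-01'
FROM dbo.Items
WHERE [date] >= '2012-10-01' AND [date] < '2012-11-01'
GROUP BY ItemName;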
Another approach is to do the calculations in ASP.NET.
Even with 10000 records, your best bet is to calculate something like this on the fly. This is what structured databases were designed to do.
For instance:
SELECT [quantity of items] * [item price] AS [full price]
, [MyTable].*
FROM [MyTable]
More complex calculations that involve JOINs to 3 or more tables and thousands of records might lend themselves to storing values.
There are a few approaches:
Use an SQL query to calculate it on the fly - this way nothing extra is stored in the database.
Use the same or another table to store the calculated values.
Use a calculated field.
If you have a low database load (a few queries per minute, a few thousand rows per fetch), then use the first approach.
If calculating on the fly performs poorly (millions of records, x fetches per second...), try the second or third approach.
The third one is fine if your DB supports calculated and persisted fields, as MSSQL Server does.
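For instance, in SQL Server the third option could look like this (table and column names are illustrative):

ALTER TABLE dbo.Items
ADD FullPrice AS (Quantity * ItemPrice) PERSISTED;

-- FullPrice can now be selected, filtered and even indexed like a normal
-- column, and the engine keeps it up to date on every insert/update.
SELECT ID, ItemName, FullPrice
FROM dbo.Items
WHERE [date] BETWEEN '2012-10-01' AND '2012-11-01';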
EDIT:
Typically, as others have said, you will perform the calculation in your query. That works as long as your project is simple enough.
First, when the table where you store all the items and their prices comes under heavy insert/update/delete traffic from multiple clients, you don't want to block or be blocked by others. You have to understand that, for example, an update on table X can block your select from table X until it is finished (look up page/row locks). This means you are going to love a parallel denormalized structure (a table with the product and the calculated values stored along with it). This is where, e.g., reporting comes into play.
Second, when the calculation is simple enough (a*b) and done over not-so-many records, it's fine. When you have, say, 10M records and have to correlate each row with several other rows and do some aggregation over groups, there is a chance that a calculated/persisted field will save you time - you can get results 10-100 times faster with this approach.
You should separate concerns in your application:
aspx pages for presentation
sql server for data persistency
some kind of intermediate "business" layer for extra logic like fullprice = p * q
E.g. if you are using LINQ to SQL for data retrieval, it is very simple to add the fullprice to your entities. The same goes for Entity Framework. Also, if you want, you can already do the computation of p*q in the SQL SELECT. Only if performance really becomes an issue should you start thinking about temporary tables, views with clustered indexes, etc.
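As a rough illustration of that last point, an indexed view that persists the computed value might look something like this (names are made up; SCHEMABINDING and the unique clustered index are what make SQL Server materialise the view):

CREATE VIEW dbo.vItemsFullPrice
WITH SCHEMABINDING
AS
SELECT ID, ItemName, Quantity, ItemPrice,
       Quantity * ItemPrice AS FullPrice
FROM dbo.Items;
GO

CREATE UNIQUE CLUSTERED INDEX IX_vItemsFullPrice ON dbo.vItemsFullPrice (ID);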

Reindexing a large SQL Server database to Lucene

We have a web service method which accepts some data and puts it into a Lucene index. We use it to index new and updated entries from our ASP.NET web app.
These entries are stored in a large SQL Server table (20M rows and growing), and I need a way to reindex the whole table in case the current index gets deleted or corrupted. I'm not sure what the optimal way to retrieve chunks of data from a large table is. Currently, we use the fact that the table has an auto-increment PK, so we fetch chunks of 1,000 rows until the query starts to return nothing. Something like this (in pseudo-language):
i = 0
while (true)
{
    SELECT col1, col2, col3 FROM mytable WHERE pk BETWEEN i AND i + 1000
    .... if result is empty 20 times in a row, break ....
    .... otherwise send result to web service to reindex ....
    i = i + 1000
}
This way, we don't need to do a SELECT COUNT(*), which would be a big performance killer, and we just move up the PK values until we stop getting any results. This has its con: if there is a gap of more than 20,000 values somewhere in the table, it will stop indexing, assuming it has reached the end, but that's a tradeoff we have to live with for now.
Can anyone suggest a more efficient way of getting data from a table to index? I would assume we are not the first ones facing this problem - search engines are widely used nowadays :)
For what we do with Lucene, we rarely need to reindex everything. I can't remember coming across any case where the whole index was corrupted (Lucene is actually quite safe/good at this), but there have been many times when individual items needed to be reindexed for one reason or another. I'd say the most frequent reindexing patterns are:
reindex items by given id (or set of ids)
reindex items by given period of time
The latter, of course, requires a separate DB index on the relevant date field(s), which could be a bit costly for 20M+ records, but we decided to go for it (our biggest deployment had up to 10M records), as disk space is cheap these days anyway.
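In SQL Server terms that boils down to something like the following (the ModifiedDate column name is an assumption):

CREATE INDEX IX_mytable_ModifiedDate ON dbo.mytable (ModifiedDate);

-- "reindex by period" then becomes a plain range query
DECLARE @periodStart DATETIME = '2013-01-01',
        @periodEnd   DATETIME = '2013-02-01';

SELECT col1, col2, col3
FROM dbo.mytable
WHERE ModifiedDate >= @periodStart AND ModifiedDate < @periodEnd;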
EDIT: added a few explanations as per the question author's comment.
If the source data structure changes, requiring reindexing of all records, our approach is to roll out new code which ensures all new data is correct (basically, it forms a correct Lucene Document from that moment on). Then we can reindex things in batches (either manually or automatically), by providing the relevant period ranges. To a certain extent, this also applies to Lucene version changes.
Why is a COUNT(*) a performance killer? What about MAX(id)? I'm thinking that an index would provide the information needed for those queries. You do have an index on your primary key, right?
I actually just figured it out - I can use IDENT_CURRENT(table_name) to get the last generated id, and use that instead of MAX() or Count() - this method should blow the other two away :)
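If you do want to keep paging by the PK without worrying about gaps at all, one common pattern (just a sketch, reusing the names from the pseudo-code above) is to fetch the next batch strictly after the last key seen and stop on the first empty batch:

DECLARE @lastId INT = 0;   -- the application remembers the highest pk sent so far

SELECT TOP (1000) pk, col1, col2, col3
FROM dbo.mytable
WHERE pk > @lastId
ORDER BY pk;

-- After each batch: set @lastId to the largest pk returned and repeat.
-- An empty result means the whole table has been indexed - no COUNT(*)
-- and no "20 empty chunks" heuristic needed.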
