Optimizing a table with a huge text-field - asp.net

I have a project which generates snapshots of a database, converts them to XML and then stores the XML inside a separate database. Unfortunately, these snapshots are becoming huge files, now about 10 megabytes each. Fortunately, I only have to store them for about a month before they can be discarded, but still, a month of snapshots turns out to be really bad for performance... I think there is a way to improve performance a lot.

No, not by storing the XML in a separate folder somewhere, because I don't have write access to any location on that server. The XML must stay within the database. But somehow the field [Content] might be optimized so things speed up... I won't need any full-text search options on this field; I will never do any searching based on it. So perhaps I could disable this field for search instructions or whatever?

The table has no references to other tables, but the structure is fixed. I cannot rename things or change the field types, so I wonder if optimization is still possible. Well, is it?
The structure, as generated by SQL Server:
CREATE TABLE [dbo].[Snapshots](
[Identity] [int] IDENTITY(1,1) NOT NULL,
[Header] [varchar](64) NOT NULL,
[Machine] [varchar](64) NOT NULL,
[User] [varchar](64) NOT NULL,
[Timestamp] [datetime] NOT NULL,
[Comment] [text] NOT NULL,
[Content] [text] NOT NULL,
CONSTRAINT [PK_SnapshotLog]
PRIMARY KEY CLUSTERED ([Identity] ASC)
WITH (PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON,
FILLFACTOR = 90) ON [PRIMARY],
CONSTRAINT [IX_SnapshotLog_Header]
UNIQUE NONCLUSTERED ([Header] ASC)
WITH (PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON,
FILLFACTOR = 90)
ON [PRIMARY],
CONSTRAINT [IX_SnapshotLog_Timestamp]
UNIQUE NONCLUSTERED ([Timestamp] ASC)
WITH (PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON,
FILLFACTOR = 90)
ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
Performance isn't just slow when selecting data from this table, but also when selecting or inserting data in the other tables of this database! When I delete all records from this table, the whole system is fast. When I start adding snapshots, performance starts to decrease; after about 30 snapshots, performance becomes bad and the risk of connection timeouts increases.

Maybe the problem isn't in the database itself, although it's still slow when used through the management tool. (It's fast when Snapshots is empty.) I mainly use ASP.NET 3.5 and the Entity Framework to connect to this database and read the multiple tables. Maybe some performance can be gained there, although that wouldn't explain why the database is also slow from the management tools and from other applications with a direct connection...

The table is in the PRIMARY filegroup. Can you move this table to a different filegroup, or is even that constrained? If you can, you should move it to a different filegroup with its own physical file; that should help a lot. Check out how to create a new filegroup and move an object onto it, as sketched below.
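A rough sketch of what that looks like (database, filegroup, and file names are illustrative; one caveat: because [Content] is LOB data, rebuilding the clustered index moves only the in-row data, while the text/image pages stay on the filegroup named in TEXTIMAGE_ON, so moving the LOB data itself generally means rebuilding the table):
ALTER DATABASE [YourDb] ADD FILEGROUP [SnapshotData];
ALTER DATABASE [YourDb] ADD FILE (
    NAME = N'SnapshotData1',
    FILENAME = N'D:\SQLData\SnapshotData1.ndf'
) TO FILEGROUP [SnapshotData];
-- DROP_EXISTING rebuilds the PK's index in place and relocates the in-row data
CREATE UNIQUE CLUSTERED INDEX [PK_SnapshotLog]
    ON [dbo].[Snapshots] ([Identity] ASC)
    WITH (DROP_EXISTING = ON, FILLFACTOR = 90)
    ON [SnapshotData];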

Given your constraints you could try zipping the XML before inserting it into the DB as binary. This should significantly reduce the storage cost of this data.
You mention this is bad for performance, but how often are you reading from this snapshot table? If the snapshots are just stored, they should only affect performance when writing. If you are reading them often, are you sure the performance issue is with the data storage and not with parsing 10 MB of XML?
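If you want to gauge the possible savings first, and you're on SQL Server 2016 or later (where COMPRESS/DECOMPRESS exist; on older versions the zipping would have to happen in application code, e.g. with GZipStream), a quick check might look like this - note that actually storing the compressed bytes in the existing text column would still require a type change or base64 encoding:
DECLARE @xml nvarchar(max) =
    (SELECT TOP (1) CAST([Content] AS nvarchar(max)) FROM dbo.Snapshots);
-- COMPRESS returns a GZip-compressed varbinary(max)
SELECT DATALENGTH(@xml)           AS original_bytes,
       DATALENGTH(COMPRESS(@xml)) AS compressed_bytes;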

The whole system became a lot faster when I replaced the TEXT datatype with the NVARCHAR(MAX) datatype. HLGEM pointed out to me that the TEXT datatype is outdated and thus troublesome. Whether the datatype of these columns can really be swapped for the more modern type that easily is still an open question, though. (Translated: I need to test whether the code will work with the altered datatype...)
So, if I alter the datatype from TEXT to NVARCHAR(MAX), is there anything that would break because of this? What problems can I expect?
Right now this seems to solve the problem, but I need to do some lobbying before I'm allowed to make the change, so I need to be really sure it won't cause any (unexpected) problems.
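For reference, a minimal sketch of the change itself (test against a copy first). Two caveats worth knowing: nvarchar(max) stores UTF-16, so purely single-byte text roughly doubles in size - varchar(max) is the closer drop-in replacement for text - and anything still using READTEXT/WRITETEXT/UPDATETEXT or TEXTPTR() will break, since those only work on the old LOB types:
ALTER TABLE [dbo].[Snapshots] ALTER COLUMN [Comment] nvarchar(max) NOT NULL;
ALTER TABLE [dbo].[Snapshots] ALTER COLUMN [Content] nvarchar(max) NOT NULL;
-- existing values stay on the old LOB pages until rewritten;
-- this one-off update forces the conversion so reads benefit immediately
UPDATE [dbo].[Snapshots] SET [Comment] = [Comment], [Content] = [Content];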

Related

Web service throws "The value 'null' cannot be parsed as the type 'Guid'." error

I have a system which stores data from an online SQL Server database in local storage. Data records are uploaded and downloaded using a web service. I am using an ADO.Net Entity Data Model in my code.
On some upload requests for one table the routine fails when I try to call it giving an error message "The value 'null' cannot be parsed as the type 'Guid'." This only happens occasionally and I have not worked out how to repeat the problem. I have logged it 80 times in the last month and in that time the routine has been called successfully 1200 times.
I have five fields in the database record for this table that are defined as uniqueidentifiers. Two of these are 'NOT NULL' and the other three are 'NULL'. Here is the 'CREATE TABLE' query showing the guid fields in this table:
CREATE TABLE [dbo].[Circuit](
[CircuitID] [uniqueidentifier] NOT NULL,
[BoardID] [uniqueidentifier] NOT NULL,
[RCDID] [uniqueidentifier] NULL,
[CircuitMasterID] [uniqueidentifier] NULL,
[DeviceID] [uniqueidentifier] NULL,
CONSTRAINT [PK_CircuitGuid] PRIMARY KEY NONCLUSTERED
(
[CircuitID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
GO
ALTER TABLE [dbo].[Circuit] WITH CHECK ADD CONSTRAINT [FK_Circuit_RCD] FOREIGN KEY([RCDID])
REFERENCES [dbo].[RCD] ([RCDID])
GO
ALTER TABLE [dbo].[Circuit] CHECK CONSTRAINT [FK_Circuit_RCD]
GO
ALTER TABLE [dbo].[Circuit] WITH CHECK ADD CONSTRAINT [FK_CircuitGuid_Board] FOREIGN KEY([BoardID])
REFERENCES [dbo].[Board] ([BoardID])
GO
ALTER TABLE [dbo].[Circuit] CHECK CONSTRAINT [FK_CircuitGuid_Board]
GO
The data uploaded for the guid fields in this table looks like this:
{"__type":"Circuit:#WaspWA","BoardID":"edb5f774-5e5d-490c-860b-73c3419628cf","CircuitID":"e95bbfa3-2af6-49a5-94dd-c98924ec9a62","CircuitMasterID":null,"DeviceID":"daf12fce-675c-46d9-94c4-ed28c63cdf30","RCDID":null}
This record was created on one machine uploaded to the online SQL Server database and then downloaded to another machine.
I have other similar tables in the database which never give any problems. It is just this table which I am getting error messages from. The two fields which are defined as 'NOT NULL' (BoardID and CircuitID) always have data in them and are never null.
Is there something obvious that I have missed here?
The problem was that the value 'null', a string, was being written into my local copy of CircuitMasterID rather than null. So when I tried to write this to SQL Server, it didn't like it. The SQL error message shows null in quotes, but I was not sure whether that was because it was a string or because the error message put the value in quotes to delineate it.
The value 'null' had found its way into the CircuitMasterID field because I had written the value null out into some HTML, and when this was saved back to the field it became the string 'null'. I am storing data in local storage, which does not give very good type control. Note to self: must add better type control.

using ssdt, how can I create a filtered index on the latest 7 days?

We use SSDT to deploy our database changes. We have a script that recreates the index every week. Our script looks like this:
declare @cmd varchar(max)
set @cmd = '
CREATE NONCLUSTERED INDEX [iAudit-ModifiedDateTime] ON [dbo].[Audit]
(
[ModifiedDateTime] ASC
)
WHERE ModifiedDateTime > ''###''
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = ON, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 75) ON [PRIMARY]
'
set @cmd = replace(@cmd, '###', convert(varchar(8), dateadd(day, -3, getdate()), 112))
exec (@cmd)
Unfortunately, when we run SSDT to update the database, it changes the index back to the definition in the project, or drops it when it is not included. Is there some way I can get around this?
The reason we need the filtered index is to load the latest records from an Audit table with hundreds of millions of rows into a data warehouse.
There are some options, in order of complexity:
Don't include the index definition in the project and disable the "Drop indexes not in source" option. In Visual Studio this is found in the Advanced options dialog of the Publish dialog. When using SqlPackage.exe to publish, you can use the parameter /p:DropIndexesNotInSource=false
Don't include the index definition in the project and put the index creation script into a post-deployment script. This ensures the index is always recreated after schema updates are deployed (see the sketch after this list).
Use a community-authored deployment contributor to filter out modifications to this index. See https://the.agilesql.club/Blogs/Ed-Elliott/HOWTO-Filter-Dacpac-Deployments
Author a deployment contributor to filter out modifications to this index. See https://github.com/Microsoft/DACExtensions/
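For option 2, the weekly script can go more or less verbatim into the project's post-deployment script (a sketch, assuming a Script.PostDeployment.sql with Build Action = PostDeploy; the IF EXISTS guard replaces DROP_EXISTING so the very first deployment doesn't fail):
IF EXISTS (SELECT 1 FROM sys.indexes
           WHERE [name] = 'iAudit-ModifiedDateTime'
             AND [object_id] = OBJECT_ID('dbo.Audit'))
    DROP INDEX [iAudit-ModifiedDateTime] ON [dbo].[Audit];

DECLARE @cmd varchar(max) = '
CREATE NONCLUSTERED INDEX [iAudit-ModifiedDateTime] ON [dbo].[Audit]
    ([ModifiedDateTime] ASC)
WHERE ModifiedDateTime > ''###''
WITH (FILLFACTOR = 75) ON [PRIMARY]';
SET @cmd = replace(@cmd, '###', convert(varchar(8), dateadd(day, -3, getdate()), 112));
EXEC (@cmd);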

How can I save a PDF file to a SQL table using an ASP.NET webpage?

This looks like a huge pile, but it's actually a very focused question. It looks bigger than it is because I am providing context and describing what I have been able to work out so far.
Let me start with the question in more precise terms: "Using an ASP.NET webpage, how can I: (a) "attach" a saved PDF file to a database table using a Formview, and (b) allow the user to view that saved PDF file when the database table row is selected in a Gridview?"
While I can easily store the path and filename of the PDF file, if the PDF file is renamed, moved, or deleted, the database record now has a broken link. My client has requested that I "attach" the actual PDF file to the database record to prevent broken links.
That answers the why: because my client requested it. Now it's a matter of figuring out how.
Here is what I have done up to now in my research:
I learned how to enable Filestream for a SQL Server 2012 database.
I created a table where one of the columns is a varbinary(max).
(Table definition language shown below in "Code Block #1".)
Using available online examples, I was able to test and verify a working T-SQL script -- however, I have not yet succeeded in turning it into a stored procedure, because I do not know how to make the filename a variable in an OPENROWSET statement. (Script shown below in "Code Block #2".)
Where I'm drawing the big blank is the ASP.NET side of the equation. Here is the system I hope to set up. I'm not too restricted about the details, so long as they work along these lines.
User uses the Formview (connected to the database via SqlDataSource) to type in the values entered on the paper form, and finally "attach" the saved PDF file to the "Scanned_PDF_File" field.
A gridview immediately refreshes, showing the results from the "Scanned_PDFs" table, allowing the user to select a row and view the saved PDF file.
Is this approach possible? Any directions of further research would be greatly appreciated. Thank you for your help.
Code Block #1: Here is the definition of the SQL database table.
USE [IncidentReport_v3]
GO
/****** Object: Table [dbo].[Scanned_PDFs] Script Date: 1/13/2015 11:56:58 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[Scanned_PDFs](
[ID] [int] IDENTITY(1,1) NOT NULL,
[DateEntered] [date] NOT NULL,
[Scanned_PDF_File] [varbinary](max) NOT NULL,
CONSTRAINT [PK_Scanned_PDFs] PRIMARY KEY CLUSTERED ([ID] ASC) WITH (
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON
) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
Code Block #2: This is the T-SQL script I used to test the ability to insert a row with a PDF file. It works great as a proof of concept if I hand-type the PDF file name and path, but I will need to make that filename a variable that the user supplies. I envision using this as a stored procedure -- or perhaps I could use this code on the client side? Not sure yet.
USE IncidentReport_v3;
GO
DECLARE @pdf AS VARBINARY(max)
SELECT @pdf = cast(bulkcolumn AS VARBINARY(max))
FROM openrowset(BULK '\\wales\e$\test\test.pdf', SINGLE_BLOB) AS x
INSERT INTO dbo.Scanned_PDFs (
DateEntered,
Scanned_PDF_File
)
SELECT cast('1/12/2015' AS DATE),
@pdf;
GO
You'll want to convert the PDF to a byte array before getting to the data layer, and set [Scanned_PDF_File] to the result. You can parse the file name from the upload or take it from some other value.
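In practice that means reading the upload into a byte array in the code-behind (for example from FileUpload.FileBytes) and passing it as a varbinary(max) parameter; the database side then reduces to a plain parameterized insert, sketched below with an illustrative procedure name. If the server itself must read the file instead, note that OPENROWSET(BULK ...) only accepts a literal path, so that variant has to be built with dynamic SQL.
CREATE PROCEDURE [dbo].[usp_InsertScannedPdf] -- illustrative name
    @DateEntered date,
    @PdfBytes varbinary(max)
AS
BEGIN
    SET NOCOUNT ON;
    INSERT INTO dbo.Scanned_PDFs (DateEntered, Scanned_PDF_File)
    VALUES (@DateEntered, @PdfBytes);
    -- return the new key so the gridview can select/refresh the row
    SELECT SCOPE_IDENTITY() AS NewId;
END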

Improving SQLite Query Performance

I have run the following query in both SQLite and SQL Server. On SQLite the query has never finished running - I have let it sit for hours and it still continues to run. On SQL Server it takes a little less than a minute. The table has several hundred thousand records. Is there a way to improve the performance of the query in SQLite?
update tmp_tbl
set prior_symbol = (select o.symbol
from options o
where o.underlying_ticker = tmp_tbl.underlying_ticker
and o.option_type = tmp_tbl.option_type
and o.expiration = tmp_tbl.expiration
and o.strike = (select max(o2.strike)
from options o2
where o2.underlying_ticker = tmp_tbl.underlying_ticker
and o2.option_type = tmp_tbl.option_type
and o2.expiration = tmp_tbl.expiration
and o2.strike < tmp_tbl.strike));
Update: I was able to get what I needed done using some Python code, handling the data mapping outside of SQL. However, I am puzzled by the performance difference between SQLite and SQL Server - I was expecting SQLite to be much faster.
When I ran the above query initially, neither table had any indexes other than a standard primary key, id, which is unrelated to the data. I created two indexes as follows:
create index options_table_index on options(underlying_ticker, option_type, expiration, strike);
and:
create index tmp_tbl_index on tmp_tbl(underlying_ticker, option_type, expiration, strike);
But that didn't help. The query still runs without producing any output - I let it go for nearly 40 minutes.
The table definition for tmp_tbl is:
create table tmp_tbl(id integer primary key,
symbol text,
underlying_ticker text,
option_type text,
strike real,
expiration text,
mid real,
prior_symbol real,
prior_premium real,
ratio real,
error_flag bit);
The definition of the options table is similar but has a few more fields.
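One rewrite worth trying before giving up on SQLite: collapse the nested max() lookup into a single correlated subquery with ORDER BY ... LIMIT 1. With the options_table_index above in place, SQLite can usually satisfy each lookup with a single index probe instead of re-scanning for the max (an untested sketch against this schema):
update tmp_tbl
set prior_symbol = (select o.symbol
                    from options o
                    where o.underlying_ticker = tmp_tbl.underlying_ticker
                      and o.option_type = tmp_tbl.option_type
                      and o.expiration = tmp_tbl.expiration
                      and o.strike < tmp_tbl.strike
                    order by o.strike desc
                    limit 1);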

SqlDataSource inserts id 1004 instead of 14; How to fix? [duplicate]

This question already has answers here:
Identity increment is jumping in SQL Server database
(6 answers)
Closed 7 years ago.
I have a strange scenario in which the auto identity int column in my SQL Server 2012 database is not incrementing properly.
Say I have a table which uses an int auto identity as a primary key; it is sporadically skipping increments, for example:
1,
2,
3,
4,
5,
1004,
1005
This is happening on a random number of tables at very random times, and I cannot replicate it to find any trends.
How is this happening?
Is there a way to make it stop?
This is all perfectly normal. Microsoft added sequences in SQL Server 2012 (finally, I might add) and changed the way identity keys are generated. Have a look here for some explanation.
If you want to have the old behaviour, you can:
use trace flag 272 - this will cause a log record to be generated for each generated identity value. The performance of identity generation may be impacted by turning on this trace flag.
use a sequence generator with the NO CACHE setting (http://msdn.microsoft.com/en-us/library/ff878091.aspx); a minimal sketch follows below
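A minimal sketch of the sequence option (table and sequence names are illustrative); NO CACHE means no block of values is pre-allocated in memory, so a restart or failover cannot discard a cached range:
CREATE SEQUENCE dbo.seqOrderId AS int
    START WITH 1 INCREMENT BY 1 NO CACHE;

CREATE TABLE dbo.Orders (
    OrderId int NOT NULL
        CONSTRAINT DF_Orders_OrderId DEFAULT (NEXT VALUE FOR dbo.seqOrderId)
        CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED,
    OrderedAt datetime2 NOT NULL
);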
Got the same problem and found the following bug report for SQL Server 2012.
If it is still relevant, see the conditions that cause the issue - there are some workarounds there as well (didn't try them, though).
Failover or Restart Results in Reseed of Identity
While trace flag 272 may work for many, it definitely won't work for hosted SQL Server Express installations. So I created an identity table and use it through an INSTEAD OF trigger. I'm hoping this helps someone else, and/or gives others an opportunity to improve my solution. The last line allows returning the last identity value added. Since I typically use this to add a single row, it works to return the identity of a single inserted row.
The identity table:
CREATE TABLE [dbo].[tblsysIdentities](
[intTableId] [int] NOT NULL,
[intIdentityLast] [int] NOT NULL,
[strTable] [varchar](100) NOT NULL,
[tsConcurrency] [timestamp] NULL,
CONSTRAINT [PK_tblsysIdentities] PRIMARY KEY CLUSTERED
(
[intTableId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
and the insert trigger:
-- INSERT --
IF OBJECT_ID ('dbo.trgtblsysTrackerMessagesIdentity', 'TR') IS NOT NULL
DROP TRIGGER dbo.trgtblsysTrackerMessagesIdentity;
GO
CREATE TRIGGER trgtblsysTrackerMessagesIdentity
ON dbo.tblsysTrackerMessages
INSTEAD OF INSERT AS
BEGIN
DECLARE @intTrackerMessageId INT
DECLARE @intRowCount INT
SET @intRowCount = (SELECT COUNT(*) FROM INSERTED)
-- note: reading the counter and then updating it in separate statements can
-- race under concurrent inserts; an atomic UPDATE with an OUTPUT clause avoids that
SET @intTrackerMessageId = (SELECT intIdentityLast FROM tblsysIdentities WHERE intTableId=1)
UPDATE tblsysIdentities SET intIdentityLast = @intTrackerMessageId + @intRowCount WHERE intTableId=1
INSERT INTO tblsysTrackerMessages(
[intTrackerMessageId],
[intTrackerId],
[strMessage],
[intTrackerMessageTypeId],
[datCreated],
[strCreatedBy])
SELECT @intTrackerMessageId + ROW_NUMBER() OVER (ORDER BY [datCreated]) AS [intTrackerMessageId],
[intTrackerId],
[strMessage],
[intTrackerMessageTypeId],
[datCreated],
[strCreatedBy] FROM INSERTED;
SELECT TOP 1 @intTrackerMessageId + @intRowCount FROM INSERTED;
END
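For this to work, tblsysIdentities needs a seed row for each table served by such a trigger, with intTableId matching the value the trigger reads (values illustrative):
-- the timestamp column populates itself, so it is omitted here
INSERT INTO dbo.tblsysIdentities (intTableId, intIdentityLast, strTable)
VALUES (1, 0, 'tblsysTrackerMessages');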
