Ok. I have been pulling my hair out for the past couple of days trying to do the world's simplest report...
So after figuring out the table associations and pulling sample data, it has come to my attention that I need to change up how I'm pulling the data.
Part of this is pulling Year-to-Date from the Invoice table.
However, QODBC is stupid (it's probably me, but it makes me feel better to blame the driver).
SELECT * FROM Invoice WHERE TimeCreated > '2014-01-01 00:00:00.000' keeps giving me the error Invalid operand for operator: >
Searching Google has provided no help for me.
Soooo... I need help with searching by a date field. Anyone have ANY ideas or suggestions?
Also, bonus points but related... Anyone else have issues with the speed of the QODBC driver? Some tables can be searched just as fast as MySQL, but some tables... holy crap, 10 minutes for a simple query. Ideas for improving the speed of those?
Date Format
SELECT *
from InvoiceLine
WHERE Txndate >= {d '2005-09-23'}
Timestamp Format
SELECT *
FROM Customer
WHERE TimeCreated = {ts '1999-07-29 14:24:18.000'}
SELECT *
from InvoiceLine
WHERE TimeModified >= {ts '2005-09-23 00:00:00.000'}
Refer: http://support.flexquarters.com/esupport/index.php?/Default/Knowledgebase/Article/View/2203/50/how-are-dates-formatted-in-sql-queries-when-using-the-quickbooks-generated-time-stamps
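For completeness, the same escape syntax works when issuing the query from .NET through the ODBC provider. A minimal sketch (the DSN name "QuickBooks Data" and the columns selected are assumptions; substitute your own QODBC DSN and fields):

using System;
using System.Data.Odbc;

class QodbcDateQuery
{
    static void Main()
    {
        // DSN name is an assumption -- use whatever your QODBC data source is actually called.
        using (var conn = new OdbcConnection("DSN=QuickBooks Data;"))
        {
            conn.Open();
            // Note the {ts '...'} escape rather than a plain quoted date string.
            var sql = "SELECT RefNumber, TimeCreated FROM Invoice " +
                      "WHERE TimeCreated >= {ts '2014-01-01 00:00:00.000'}";
            using (var cmd = new OdbcCommand(sql, conn))
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("{0}  {1}", reader["RefNumber"], reader["TimeCreated"]);
            }
        }
    }
}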
How to make QODBC run faster
Keep in mind that QODBC is not a database tool, but rather a translation tool. QuickBooks is a non-normalized flat file system which has no indexes available and will not perform like SQL Server or dBase files. Every transaction you request must be translated and communicated to QuickBooks via large complicated XML transactions.
Try and keep your result set as small as possible to get a feel for the system, carefully design and test any multi-file joins, and keep the number of returned fields to a minimum for maximum performance.
Our main goal is to make it easier to access QuickBooks data in a standardised database-like fashion, but queries must be optimized to perform as fast as possible.
Also, try and use ranges of dates on TxnDate, TxnDateMacro and TimeModified as much as possible to narrow down the data to the smallest possible segment. For example, make something similar to this in the first part of your WHERE clause:
Invoice.TimeModified >= {ts'2003-09-01 17:01:09'} AND
Invoice.TimeModified <= {ts'2003-09-02 17:01:09'}
I would suggest using the Optimizer: sp_optimizefullsync All
See: How to setup QODBC Optimizer and where are the Optimizer options for all the details about optimizer. (http://support.flexquarters.com/esupport/index.php?/Default/Knowledgebase/Article/View/2358/48/how-to-setup-qodbc-optimizer-and-where-are-the-optimizer-options)
By default the optimizer will update new and changed entries in a table from QuickBooks first and then execute the query against the local optimized table. This is faster than reading everything out of QuickBooks every time, especially the more data you have.
Never used this driver but try this:
SELECT * FROM Invoice WHERE TimeCreated > {ts '2014-01-01 00:00:00.000'}
Might need to muck with the format of the date string a little, just a guess.
As far as the speed of your selects, if the queries have WHERE clauses this can be impacted by not having an index on the table. Tables with indexes will return results faster than tables without.
I am trying to understand the limitations of DynamoDB/NoSQL, mostly as a learning exercise. I came across a problem that is fairly simple in a relational database, but I cannot figure out how to accomplish it in DynamoDB even with full control of rebuilding the tables and indexes.
Problem: Every day everyone in an office chooses one fruit for lunch. At the end of the week, I just want a list of everyone who ate both an apple and a banana.
Example Data: (sample table of each employee's daily fruit choice omitted)
I thought the employee name should be the PK, the day of the week should be the SK, and Fruit would be an attribute. But that doesn't seem to work, because you can't query against an attribute.
Is there a way to structure the data to make this work? Is there another tool like OpenSearch, HiveQL, or GraphQL that can help me do what I am trying to do here?
Thanks.
When you say it's "fairly simple in a relational database", what you mean is it's simple to express, not exactly simple to compute. You're pushing a lot of list intersection work to the database. As your data set grows, the response time for your query will get slower and slower. At some point the database will no longer be able to give you the answer. And while it's consuming CPU (before timing out) you're negatively impacting the load on the relational database server for other users.
With DynamoDB you can't express queries that take unbounded effort to compute or that depend so much on total data set size for their performance characteristics. You have to design a query system up front that doesn't get exponentially slower as the data set grows.
The DynamoDB design then depends on what you know up front. For example, do you know it's always the intersection of an apple and a banana? Then during insert of a new food, note whether the person has now eaten both, and mark them as such on a user metadata item. Use that marker later during the query phase.
Sound like a nuisance? Well, if your data set isn't growing large and/or you don't need reliably fast query performance, then a relational database solves this problem well. Different databases for different purposes.
DynamoDB also supports SCAN and not only QUERY.
A simple design for the table is to have the PK be the name of the person, and the attributes be numeric counts of each fruit that you increment every day:
UPDATE "FRUIT_COUNTS"
SET BANANA=BANANA + 1
WHERE Employee='Bob'
Then, at the end of the week, you can run a simple PartiQL query on the table:
SELECT * FROM "FRUIT_COUNTS"
WHERE BANANA > 0 AND APPLE > 0
Background:
I'm using SQL Server 2008 and ASP.NET 4 on Windows 2008
I have one table with about 10 million rows of products that I make available online for users to browse -- not search. Each of the 10 million products has extra attributes -- like categories -- that I keep in lookup tables -- there are three or four lookup tables.
Problem
When someone browses and starts using filters (shipping location, price, quality, brand), I need to join the tables, apply all the filters, and return the results. It's very slow and I want to make it faster. Sometimes users will apply a very broad filter, resulting in 800,000 results, and though I only return the first 10 of those for browsing, I still need to run the query for the full 800,000.
What I've Tried Already
I've joined all the information from the various tables into one physical table and then created a covering index for the table.
The queries are much faster, but there is a good bit of maintenance I have to do on the table behind the scenes, with jobs to make sure that if something goes out of stock I take it out within a reasonable time frame (5 minutes or so).
I don't use materialized/indexed views b/c I've got aggregates in the results which SQL Server doesn't seem to like.
Question
How can I speed up browse results beyond the indexing and table optimization that I've already done? I'm not doing any full-text searches -- I'm filtering with exact parameters.
Possible Solutions I've Thought Of
Large caching solution -- AppFabric or MemCached. I know next to nothing about these and don't know whether they are appropriate.
Small caching solution -- Maybe leveraging ASP.NET caching -- but every person is going to apply different filters so I'm not sure how much this will give me.
SSDs -- as a larger-scale solution I've thought about getting SSDs but that will be down the road
CDN -- I don't think a CDN will help b/c the bottleneck here is my database's search capabilities, not the bandwidth/distance to the requester.
I had a similar problem with a complex join query causing horrible response times. I was able to solve it using Lucene.NET, a .NET implementation of the Lucene search index. Basically, you build indexes on data fields (your categories) and then you can search via those categories and return thousands of rows very quickly. It takes the join operation out of the equation because the index already knows which records fit your criteria.
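If it helps to see the shape of it, here is a rough sketch against the older Lucene.NET 3.x API (field names, values, and the index path are made up for illustration; this is not code from the article below):

using System;
using System.IO;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using LuceneVersion = Lucene.Net.Util.Version;

class ProductSearchSketch
{
    static void Main()
    {
        // Build (or rebuild) the index from your product data.
        var dir = FSDirectory.Open(new DirectoryInfo(@"C:\ProductIndex"));
        var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_30);
        using (var writer = new IndexWriter(dir, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
        {
            var doc = new Document();
            doc.Add(new Field("ProductId", "12345", Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.Add(new Field("Category", "Shoes", Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.Add(new Field("Location", "TX", Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.AddDocument(doc);
            writer.Commit();
        }

        // Query: no SQL joins, just a boolean combination of indexed terms.
        var searcher = new IndexSearcher(dir, true);
        var query = new BooleanQuery();
        query.Add(new TermQuery(new Term("Category", "Shoes")), Occur.MUST);  // BooleanClause.Occur.MUST on 2.x
        query.Add(new TermQuery(new Term("Location", "TX")), Occur.MUST);
        var hits = searcher.Search(query, 10);  // top 10 matching documents

        foreach (var scoreDoc in hits.ScoreDocs)
        {
            var found = searcher.Doc(scoreDoc.Doc);
            Console.WriteLine(found.Get("ProductId"));
        }
    }
}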
The following is a very good article on Lucene.NET. I highly recommend it. It took a search result that was taking 20 seconds using standard joins and reduced the response time to less than a second.
http://www.codeproject.com/Articles/29755/Introducing-Lucene-Net
Also, feel free to ping me if you have specific Lucene.NET implementation questions. I just got through a lot of research/learning in order to implement it properly on my site, so if you have specific questions on how to make it work I may be able to help with that as well.
"I perform the full query b/c I need to populate the new filters and
the number of results along with the search results. For example, if
someeone filters on category of "Shoes", and location of TX, some of
the other filters are going to be restricted based on the previous
filter."
Try executing two queries: One to count all results and one to select the top N. Maybe your bottleneck is copying 800,000 rows to the client. Doing two queries would fix this at the cost of an additional query. The cost is likely to be less than 2x though due to optimizations for few rows and for count-only queries.
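A rough sketch of that from ADO.NET (table, column, and filter names are placeholders, not from the question): the first command returns only the count for the filter UI, the second returns just the rows you actually display.

using System;
using System.Data.SqlClient;

class BrowsePageSketch
{
    static void Main()
    {
        var filter = "WHERE Category = @cat AND Location = @loc";  // stand-in for the user's filters
        using (var conn = new SqlConnection("<your connection string>"))
        {
            conn.Open();

            // Query 1: count only -- the 800,000 matching rows never travel over the wire.
            int total;
            using (var countCmd = new SqlCommand("SELECT COUNT(*) FROM Products " + filter, conn))
            {
                countCmd.Parameters.AddWithValue("@cat", "Shoes");
                countCmd.Parameters.AddWithValue("@loc", "TX");
                total = (int)countCmd.ExecuteScalar();
            }

            // Query 2: just the first page for display.
            using (var pageCmd = new SqlCommand(
                "SELECT TOP (10) ProductId, Name, Price FROM Products " + filter + " ORDER BY Name", conn))
            {
                pageCmd.Parameters.AddWithValue("@cat", "Shoes");
                pageCmd.Parameters.AddWithValue("@loc", "TX");
                using (var reader = pageCmd.ExecuteReader())
                {
                    while (reader.Read())
                        Console.WriteLine("{0}: {1}", reader["ProductId"], reader["Name"]);
                }
            }

            Console.WriteLine("{0} results total", total);
        }
    }
}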
Let's say I have a table in a database with 10K records. I don't need to actually use those 10K records anymore, but I still need to keep them in the database. That very table is now going to be used to store new data, so there's going to be more records coming in on top of the 10K records already present in the table. As opposed to the "old" 10K records, I do need to work with the newly inserted data. Right now I'm doing this to get the data I need:
List<Stuff> l = (from x in db.Table
                 where x.id > id
                 select x).ToList();
My question now is: how does the where clause in LINQ (or in SQL in general) work under the covers? Is the ENTIRE table going to be searched until (x.id > id) is true? Because let's say the table will grow from 10K records to 20K. It'd be a little silly to look through the entire 20K records if I know that I only have to start looking from a certain point.
I've had performance problems (not dramatic, but bad enough to be agitated by it) with this while using LINQ to Entities, which I kind of don't understand because it should be no problem at all for a modern computer to sift through a mere 20K records. I've been advised to use a stored procedure instead of a LINQ query, but I don't know whether or not this will boost performance.
Any feedback will be appreciated.
It's going to behave just like a similarly worded SQL query would. The question is whether the overhead you're experiencing is happening in the query or in the conversion of the query to a list. The query itself as you've written should equate literally to:
Select ID, Column1, Column2, Column3, ... , Column(n+1)
From db.Table
Where ID > id
This query should be fairly fast depending on the nature of the data. The query itself will not be executed until it is acted upon, however. In this case, you're converting it to a list, which is the equivalent of acting upon it. I can't find the comment someone made to me about this practice, but I've found it to be quite helpful in keeping performance clean. Unless you have some very specific need, you should leave your queries as IQueryable. Converting them to lists doubles the effort because first the query must be executed and then the result set must be converted into an appropriate IEnumerable (List in this case).
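For illustration, continuing the snippet from the question (db, Stuff, and id as defined there), the difference looks roughly like this:

// Still an IQueryable -- nothing has hit the database yet.
IQueryable<Stuff> newRows = db.Table.Where(x => x.id > id);

// Further composition is folded into the eventual SQL, not done in memory.
IQueryable<Stuff> firstPage = newRows.OrderBy(x => x.id).Take(50);

// Only here, when the results are enumerated, does the query actually execute.
List<Stuff> materialized = firstPage.ToList();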
So you have 2 potential bottlenecks. The simple query could be taking a long time to query a massive collection of data, or the number of records could be bottlenecking at the point where the List is created. Another possibility is the nature of ID in this case. If it is numeric, that will save you some time. If it's performing a text-based search then it's going to be heavier.
To answer your specific question, yes, it's going to search every record in the database and return all of the records that match the expression. Edit: If the database has a proper index on the column in question, it will not search EVERY record but rather will use the index to perform the search. (From a comment by @Pleun.)
As for using a stored procedure, the idea that it will magically boost performance is a load of hogwash, but it's a perfectly acceptable alternative. I have several programs that routinely run similar queries against a database with over 40 million records, and the only performance issue I've run into so far has been CPU usage when multiple users are performing rapid-fire queries. To solve your specific issue, I'd recommend that you tune it a little in SQL Management Studio until the query you want returns to your interface with an acceptable speed. Then you can convert that query into a compatible LINQ statement. As long as you leave it as an IQueryable it should exhibit similar results.
Is it quicker to make one trip to the database and bring back 3000+ rows, then manipulate them in .NET with LINQ, or quicker to make 6 calls bringing back a couple of hundred rows at a time?
It will entirely depend on the speed of the database, the network bandwidth and latency, the speed of the .NET machine, the actual queries etc.
In other words, we can't give you a truthful general answer. I know which sounds easier to code :)
Unfortunately this is the kind of thing which you can't easily test usefully without having an exact replica of the production environment - most test environments are somewhat different to the production environment, which could seriously change the results.
Is this for one user, or will many users be querying the data? The single database call will scale better under load.
Speed is only one consideration among many.
How flexible is your code? How easy is it to revise and extend when the requirements change? How easy is it for another person to read and maintain your code? How portable is your code? What if you change to a different DBMS, or a different programming language? Are any of these considerations important in your case?
Having said that, go for the single round trip if all other things are equal or unimportant.
You mentioned that the single round trip might result in reading data you don't need. If all the data you need can be described in a single result table, then it should be possible to devise a query that will get that result. That result table might deliver some result data in more than one row, if the query denormalizes the data. In that case, you might gain some speed by obtaining the data in several result tables, and composing the result yourself.
You haven't given enough information to know how much programming effort it will be to compose a single query or to compose the data returned by 6 queries.
As others have said, it depends.
If you know which 6 SQL statements you're going to execute beforehand, you can bundle them into one call to the database, and return multiple result sets using ADO or ADO.NET.
http://support.microsoft.com/kb/311274
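A minimal sketch of that pattern with SqlClient (table and column names are placeholders): both statements go to the server in one batch, and NextResult() advances the reader to the second result set.

using System;
using System.Data.SqlClient;

class BatchedResultSetsSketch
{
    static void Main()
    {
        using (var conn = new SqlConnection("<your connection string>"))
        using (var cmd = new SqlCommand(
            "SELECT Id, Name FROM Orders WHERE CustomerId = @id; " +
            "SELECT Id, Sku FROM OrderLines WHERE CustomerId = @id;", conn))
        {
            cmd.Parameters.AddWithValue("@id", 1);
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    Console.WriteLine("Order {0}: {1}", reader["Id"], reader["Name"]);

                reader.NextResult();  // move to the second SELECT in the batch

                while (reader.Read())
                    Console.WriteLine("Line {0}: {1}", reader["Id"], reader["Sku"]);
            }
        }
    }
}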
the problem I have here is that I need it all, I just need it displayed separately...
The answer to your question is that 1 query for 3000 rows is better than 6 queries for 500 rows (given that you are bringing all 3000 rows back regardless).
However, there's no way you're going (to want) to display 3000 rows at a time, is there? In all likelihood, irrespective of using Linq, you're going to want to run aggregating queries and get the database to do the work for you. You should hopefully be able to construct the SQL (or Linq query) to perform all required logic in one shot.
Without knowing what you're doing, it's hard to be more specific.
If you absolutely, positively need to bring back all the rows, then investigate the ToLookup() method for your LINQ IQueryable<T>. It's very handy for grouping results in non-standard ways.
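Something along these lines, for example (table, property, and grid names are assumptions based on the follow-up in this thread): one round trip brings everything back, and the lookup lets each grid pull its own slice without touching the database again.

// One query: all rows for the user.
var rows = (from t in db.Tab
            where t.UserId == 1
            select t).ToList();

// ILookup is a read-only, multi-valued dictionary; building it is purely in-memory.
ILookup<int, Tab> byType = rows.ToLookup(t => t.TypeId);

grid1.DataSource = byType[1];  // only the typeid = 1 rows
grid2.DataSource = byType[2];  // only the typeid = 2 rows (call DataBind() afterwards in WebForms)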
Oh, and I highly recommend LINQPad (free) for trying out queries with LINQ. It has loads of examples, and it also shows you the SQL and lambda forms so you can familiarize yourself with LINQ <-> lambda form <-> SQL.
Well, the answer is always "it depends". Do you want to optimize on the database load or on the application load?
My general answer in this case would be to use queries that are as specific as possible at the database level, therefore making 6 calls.
Thx
I was kind of thinking "ball park", but it sounds as though it's a choice thing... the difference is likely small.
I was thinking that getting all the data and manipulating it in .NET would be the best -- I have nothing concrete to base this on (hence the question), I just tend to feel that calls to the DB are expensive, and if I know I need all the data... get it in one hit?!?
Part of the problem is that you have not provided sufficient information to give you a precise answer. Obviously, available resources need to be considered.
If you pull 3000 rows infrequently, it might work for you in the short term. However, if there are say 10,000 people that execute the same query (ignoring cache effects), this could become a problem for both the app and db.
Now in the case of something like pagination, it makes sense to pull in just what you need. But that would be a general rule to try to only pull what is necessary. It's much more elegant to use a scalpel instead of a broadsword. =)
If you are talking about a query that has already been run by SQL (so optimized by SQL Server), working with LINQ or a SqlDataReader might actually have the same performance.
The only difference will be "how hard will it be to maintain your code?"
LINQ doesn't send any query to the database until you ask for the result with ".ToList()" or ".ToArray()" or even ".Count()". LINQ builds your query dynamically, so it is exactly the same as having a SqlDataReader but with runtime verification.
Rather than speculating, why don't you try both and measure the results?
It depends
1) If your connector implementation precaches a lot of objects AND you have big rows (for example blobs, country polygons etc.), you have a problem: you have to download a LOT of data. I once optimized code that had this problem; it was downloading megs of garbage all the time via localhost, and my software now runs 10 times faster because I removed the precaching via an option.
2) If your rows are small and you have a good chance that you need to read through all 3000, you're better off with one big result set.
3) If you don't use prepared statements, every query has to be parsed! A big result set might be better.
Hope it helped
I always stick to the rule of "bring in what I need" and nothing more...the problem I have here is that I need it all, I just need it displayed separately.
So say...
I have a table with userid and typeid. I want to display all records for a userid, displayed on the page in grids, say separated by typeid.
At the moment I call a sproc that does "select field1, field2 from tab where userid=1",
then on the page set the datasource of each grid to: from t in tab where typeid=2 select t;
rather than calling a different sproc ("select field1, field2 from tab where userid=1 and typeid=2") 6 times.
??
How would one go about profiling a few queries that are being run from an ASP.NET application? There is some software where I work that runs extremely slow because of the database (I think). The tables have indexes but it still drags because it's working with so much data. How can I profile to see where I can make a few minor improvements that will hopefully lead to larger speed improvements?
Edit: I'd like to add that the webserver likes to timeout during these long queries.
SQL Server has some excellent tools to help you with this situation. These tools are built into Management Studio (which used to be called Enterprise Manager + Query Analyzer).
Use SQL Profiler to show you the actual queries coming from the web application.
Copy each of the problem queries out (the ones that eat up lots of CPU time or IO). Run the queries with "Display Actual Execution Plan". Hopefully you will see some obvious index that is missing.
You can also run the tuning wizard (the button is right next to "Display Actual Execution Plan"). It will run the query and make suggestions.
Usually, if you already have indexes and queries are still running slow, you will need to re-write the queries in a different way.
Keeping all of your queries in stored procedures makes this job much easier.
To profile SQL Server, use the SQL Profiler.
And you can use ANTS Profiler from Red Gate to profile your code.
Another .NET profiler which plays nicely with ASP.NET is dotTrace. I have personally used it and found lots of bottlenecks in my code.
I believe you have the answer you need to profile the queries. However, this is the easiest part of performance tuning. Once you know it is the queries and not the network or the app, how do you find and fix the problem?
Performance tuning is a complex thing. But there are some places to look at first. You say you are returning lots of data? Are you returning more data than you need? Are you really returning only the columns and records you need? Returning 100 columns by using select * can be much slower than returning the 5 columns you are actually using.
Are your indexes and statistics up-to-date? Look up how to update statistics and re-index in BOL if you haven't done this in a while. Do you have indexes on all the join fields? How about the fields in the where clause?
Have you used a cursor? Have you used subqueries? How about UNION -- if you are using it, can it be changed to UNION ALL?
Are your queries sargable? (Google the term if you're unfamiliar with it.)
Are you using distinct when you could use group by?
Are you getting locks?
There are many other things to look at; these are just a starting place.
If there is a particular query or stored procedure I want to tune, I have found turning on statistics before the query to be very useful:
SET STATISTICS TIME ON
SET STATISTICS IO ON
When you turn on statistics in Query Analyzer, the statistics are shown in the Messages tab of the Results pane.
IO statistics have been particularly useful for me, because it lets me know if I might need an index. If I see a high read count from the IO statistics, I might try adding different indexes to the affected tables. As I try an index, I run the query again to see if the read count has gone down. After a few iterations, I can usually find the best index(es) for the tables involved.
Here are links to MSDN for these statistics commands:
SET STATISTICS TIME
SET STATISTICS IO