Pull everything from the DB and then compute stats? - asp.net

I am working on an auto trade website. On the page where the ad list is displayed, I plan to show the number of ads in different categories. For example, for location I will display something like "Vancouver (50), Richmond (12), Surrey (20)"; for vehicle make, "Honda (20), Ford (12), VW (24)".
I am not sure if I should pull ALL the ads from the db into a List first, bind one page of the result to a GridView control, and then compute stats for each category using LINQ. Of course, I will limit the number of rows pulled from the db using some kind of condition, maybe setting the maximum number of rows returned to 500.
My major concern: is this going to be a memory hog?

It is a bad idea to count rows in memory. Use
SELECT COUNT(*) FROM ... [GROUP BY ...]
which is much more efficient: the database returns one row per category instead of every ad.
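A minimal sketch of the GROUP BY approach follows. It uses Python's built-in sqlite3 module as a stand-in for the real database. The `ads` table and its `location`/`make` columns are hypothetical. The point is that only one (value, count) pair per category crosses the wire, not every ad.

```python
import sqlite3

def count_ads_by(conn, column):
    """Return {category value: number of ads}, counted by the database via GROUP BY."""
    # Column names cannot be bound as SQL parameters, so only whitelisted names are allowed.
    if column not in ("location", "make"):
        raise ValueError(f"unsupported category column: {column}")
    rows = conn.execute(
        f"SELECT {column}, COUNT(*) FROM ads GROUP BY {column} ORDER BY {column}"
    ).fetchall()
    return {value: count for value, count in rows}

# Hypothetical "ads" table with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ads (id INTEGER PRIMARY KEY, location TEXT, make TEXT)")
conn.executemany(
    "INSERT INTO ads (location, make) VALUES (?, ?)",
    [("Vancouver", "Honda"), ("Vancouver", "Ford"), ("Richmond", "VW"),
     ("Surrey", "Honda"), ("Vancouver", "VW")],
)

by_location = count_ads_by(conn, "location")
print(", ".join(f"{k} ({v})" for k, v in by_location.items()))
# prints "Richmond (1), Surrey (1), Vancouver (3)"
```

The page then only needs one such query per category, plus the usual paged query for the grid.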
Also consider caching these values. If multiple people load your page simultaneously but the car counts don't change more often than once per minute, cache the counts and expire them after 1 minute.
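The caching idea can be sketched as a tiny time-to-live cache. This is an illustration in Python, not ASP.NET code; in ASP.NET the built-in cache with an absolute expiration would play this role. The `TtlCache` class and the `load_counts` loader are hypothetical, and the clock is injectable only so the expiry can be demonstrated without sleeping. The sketch is not thread-safe.

```python
import time

class TtlCache:
    """Caches computed values for ttl_seconds (60 s by default), then recomputes them."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable time source
        self._entries = {}          # key -> (time computed, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]         # still fresh: skip the database
        value = compute()           # missing or stale: hit the database again
        self._entries[key] = (now, value)
        return value

# Usage with a fake clock and a hypothetical loader that would run the COUNT query.
fake_now = [0.0]
calls = []

def load_counts():
    calls.append(fake_now[0])
    return {"Vancouver": 50, "Richmond": 12, "Surrey": 20}

cache = TtlCache(ttl_seconds=60, clock=lambda: fake_now[0])
cache.get_or_compute("location_counts", load_counts)   # computed
fake_now[0] = 30.0
cache.get_or_compute("location_counts", load_counts)   # served from cache
fake_now[0] = 61.0
cache.get_or_compute("location_counts", load_counts)   # expired, recomputed
print(len(calls))  # prints 2
```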

Related

Invisible graphs cause report to slow

I have a report with a parameter where the end user chooses a practice name that corresponds to a group of people. Most of these groups have fewer than 10 people, but a small number of them have as many as 150. When there are more than 15 people in a given group, they want separate graphs, each with no more than 15 people. So for most of the groups, we only need one graph. For a few, we need a lot of graphs.
Behind the scenes, I created a graph for each multiple of 15 people and set each one to be visible only if there are actually that many people in the group. This does what I need, but it makes the report super slow. As far as I can tell, when an end user runs the report it still somehow renders the hidden graphs, slowing it all to heck. (I did find a link which I think suggests this is a known bug.)
I need to have one report where the end user selects the practice name, so I can't make two reports, "My practice is normal" and "My practice is ginormous". I thought maybe I could make a conditional sub-report split into those two reports based on the practice name parameter, but that doesn't appear to be possible; you can play around with visibility but I'm guessing that will still cause the invisible graph rendering problem and not help my speed.
Are there any other cool tips I can try to speed up my report, or is this just a case of too many graphs spoiling the broth?
The easiest way would be to generate a group number for every 15 people and then use a list control to repeat the chart for each group.
Here's a very quick example of this in action. I just used some sample data from one of the AdventureWorks sample databases.
Here's my query that returns every person in each selected department. Note that I have commented out the DECLAREs, as these were just in there for testing.
--DECLARE @Department varchar(50) = ''
--DECLARE @chartMax int = 5
SELECT
GroupName, v.Department, v.FirstName, v.LastName
, ChartGroup = (ROW_NUMBER() OVER(PARTITION BY Department ORDER BY LastName, FirstName)-1) / @chartMax -- calc which chart number the person belongs to
, Salary = ((ABS(CHECKSUM(NewId())) % 100) * 500) + (ABS(CHECKSUM(NewId())) % 1000) + 10000 -- Just some random number to plot
FROM [HumanResources].[vEmployeeDepartment] v
WHERE Department IN (@Department)
ORDER BY Department
The key bit is the ChartGroup column:
ChartGroup = (ROW_NUMBER() OVER(PARTITION BY Department ORDER BY LastName, FirstName)-1) / @chartMax
This gives the first 5 rows in each department a ChartGroup of 0, the next 5 a ChartGroup of 1, and so on, because dividing one integer by another truncates. I used 5 rather than 15 just so it's easier to demo.
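To make the ChartGroup arithmetic concrete, here is a small Python sketch that mirrors `(ROW_NUMBER() OVER (PARTITION BY Department ORDER BY LastName, FirstName) - 1) / @chartMax` outside the database. The `assign_chart_groups` helper and the generated names are hypothetical. The department sizes (6 in Engineering, 18 in Sales) are the ones used in the example.

```python
from collections import defaultdict

def assign_chart_groups(people, chart_max):
    """people: iterable of (department, first_name, last_name).
    Returns {(department, chart_group): [names]} using the same numbering as the query."""
    by_dept = defaultdict(list)
    for dept, first, last in people:
        by_dept[dept].append((last, first))       # ORDER BY LastName, FirstName
    charts = {}
    for dept, names in by_dept.items():
        # enumerate(..., start=1) plays the role of ROW_NUMBER() within each partition
        for row_number, (last, first) in enumerate(sorted(names), start=1):
            group = (row_number - 1) // chart_max  # integer division, like int / int in T-SQL
            charts.setdefault((dept, group), []).append(f"{first} {last}")
    return charts

people = [("Engineering", f"E{i}", f"Person{i:02d}") for i in range(6)]
people += [("Sales", f"S{i}", f"Person{i:02d}") for i in range(18)]
charts = assign_chart_groups(people, chart_max=5)
sizes = {key: len(names) for key, names in charts.items()}
print(sorted(sizes.items()))
# Engineering -> groups 0, 1 (5 + 1); Sales -> groups 0..3 (5 + 5 + 5 + 3)
```

Each distinct (department, ChartGroup) pair becomes one instance of the list, and therefore one chart.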
Now, in your report, add a List and set its dataset property to the dataset containing your main data (the query above in my case).
Next, edit the 'details' row group properties and add a grouping by Practice and ChartGroup (Department and ChartGroup in this example).
In the list box's textbox, right-click, then insert a chart.
Set the chart up as required. In my example, I used salary as the values on a pie chart and the employee names as the labels.
Note that I set the department as a multi-value parameter and also set the number of persons per chart (chartMax) as a report parameter.
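When I preview the report, 'Engineering', which has 6 employees, gets two charts (5 people and 1 person).
Sales has 18 employees, so it gets four charts (5, 5, 5 and 3).
... and so on. With the number of persons per chart set to 15, it will generate a new chart for every 15 people or part thereof.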

How do I create a running count of outcomes sequentially by date and unique to a specific person/ID?

I have a list of unique customers who have made transactions over a year (Jan – Dec). They have bought products using 3 different methods (card, cash, check). My goal is to build a multi-class classification model to predict the method of payment.
To do this I am engineering some recency and frequency features into my training data. I am having trouble with the following frequency count, because the only way I know how to do it is in Excel with the COUNTIFS and SUMIFS functions, which are prohibitively slow. If someone can help and/or suggest another solution, it would be very much appreciated.
I have a data set with 3 columns (Customer ID, Purchase Date, and Payment Type), sorted by Purchase Date and then Customer ID. How do I get a prior frequency count of each payment type that excludes the current row's transaction and any later transactions (Purchase Date > the current one)? Basically, I want a running count of each payment option per unique Customer ID, over the dates strictly before the purchase date of that training row. In my head I see it as “crawling” backwards through the transactions and counting. I am looking to generate 3 prior-count columns (one per payment type) programmatically.
This gives you the answer as a list of CustomerID, PurchaseDate, PaymentMethod and prior counts
SELECT CustomerID, PurchaseDate, PaymentMethod,
(
select count(CustomerID) from History T
where
T.CustomerID=History.CustomerID
and T.PaymentMethod=History.PaymentMethod
and T.PurchaseDate<History.PurchaseDate
)
AS PriorCount
FROM History;
You can save this query and use it as the source for a crosstab query to get the columnar format you want
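As a variant of the crosstab step, the three prior-count columns can be produced directly with one correlated subquery per payment method. The sketch below runs this in Python's sqlite3 as a stand-in for Access. It assumes the `History` table and column names from the answer, payment methods `card`/`cash`/`check`, and dates stored as ISO `YYYY-MM-DD` text so that `<` compares them chronologically. The sample rows are made up. A crosstab of the single `PriorCount` column would only fill the column for the method used on each row, whereas this version fills all three.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE History (CustomerID INTEGER, PurchaseDate TEXT, PaymentMethod TEXT)")
conn.executemany("INSERT INTO History VALUES (?, ?, ?)", [   # made-up sample data
    (1, "2017-01-05", "card"),
    (2, "2017-01-20", "cash"),
    (1, "2017-02-10", "cash"),
    (1, "2017-03-01", "card"),
    (1, "2017-03-01", "check"),
    (1, "2017-04-15", "card"),
])

METHODS = ("card", "cash", "check")
# One correlated subquery per method: same customer, that method, strictly earlier date.
prior_columns = ",\n       ".join(
    "(SELECT COUNT(*) FROM History t"
    " WHERE t.CustomerID = h.CustomerID AND t.PaymentMethod = ?"
    f" AND t.PurchaseDate < h.PurchaseDate) AS Prior_{m}"
    for m in METHODS
)
query = f"""
SELECT h.CustomerID, h.PurchaseDate, h.PaymentMethod,
       {prior_columns}
FROM History h
ORDER BY h.PurchaseDate, h.CustomerID, h.PaymentMethod
"""
rows = conn.execute(query, METHODS).fetchall()   # the three ? bind to card, cash, check
for row in rows:
    print(row)
```

Note that two purchases on the same day (2017-03-01 above) do not count each other, because the filter is strictly `<`.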
Some notes:
I assumed "History" as the source table name - you can change the query above to use the correct source
To use this as a query, open a new query in design view. Close the window that asks what tables the query is to be built on. Open the SQL view of the query design - like design view, but it shows the SQL instead of the normal design interface. Copy the above into the SQL view.
You should now be able to switch to datasheet view and see the results
When the query is working to your satisfaction, save it with any appropriate name
Open a new query in design view
When you get the list of tables to include, switch to the list of queries and include the query you just saved
Change the query type to crosstab and update the query as needed to select rows, columns and values - look up "access crosstab queries" if you need more help.
Another tip to see what is happening here:
You can take the subquery (the part inside the parentheses above) and make just that statement into its own query, excluding the opening and closing parentheses. Then you can look at its design view to see what it does.
Save it with an appropriate name and put it into the query above in place of the statement in parentheses; then you can look at the design view.
Sometimes it's easier to visualize and learn from 2 queries strung together this way than to work with sub queries.

How to Show Null in a Cell in MS Access 2010

Alright, so I'm trying to write a query that displays a null value in a field if a duplicate ID exists. There are 2 tables in a parent-child relationship, and each parent record can have one to many child records. In my scenario, we have fuel tanks that can get many inspections done on them. Tank is the parent table and Tank_Inspections is the child table. There's a capacity field that I'm getting from the Tank table and joining with the tank inspection records, and it shows up twice if multiple inspections exist for that tank. That is expected, but I don't want to double count the capacity and only want to show it once. I've pasted the link to a screenshot of how it should be displayed if multiple records exist for the parent table; the highlighted cell should be blank. As you can see, TankID = 65 has two inspections of different types, and since I'm getting the capacity field from the Tank table, it's inserted twice. I want to write a query so that if two or more inspections exist for a tank, the capacity is shown only once and the other capacity values are blank. Suggestions?
http://imgur.com/6bYE8wS
This sounds like a job for an analytic function. Since Access doesn't natively support these, there is a hack to accomplish the row_number() analytic function that sounds like it would meet your needs:
Achieving ROW_NUMBER / PARTITION BY in MS Access
You create a query that invokes this self-join and use that instead of the table. Once you have the row number on each row, it would look something like this:
Tank ID | Inspection ID | Row
59 | 6841 | 1
60 | 6842 | 1
65 | 7344 | 1
65 | 6843 | 2
And your capacity formula would change from [Inspection].[Capacity] to something like this:
IIf([Self Join].[Row] = 1, [Inspection].[Capacity], Null)
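A minimal sketch of the self-join/correlated-count trick follows, run in Python's sqlite3 as a stand-in for Access. A row's number is the count of inspections for the same tank whose InspectionID is greater than or equal to its own. That ordering (descending InspectionID) is an assumption chosen to reproduce the table above, since any unique ordering column would work. The capacities are made up, and `CASE WHEN ... END` stands in for Access's `IIf(...)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tank (TankID INTEGER PRIMARY KEY, Capacity INTEGER);
CREATE TABLE Tank_Inspections (InspectionID INTEGER PRIMARY KEY, TankID INTEGER);
""")
conn.executemany("INSERT INTO Tank VALUES (?, ?)", [(59, 1000), (60, 500), (65, 2000)])  # made-up capacities
conn.executemany("INSERT INTO Tank_Inspections VALUES (?, ?)",
                 [(6841, 59), (6842, 60), (7344, 65), (6843, 65)])

query = """
SELECT r.TankID, r.InspectionID, r.RowNum,
       CASE WHEN r.RowNum = 1 THEN k.Capacity END AS Capacity  -- NULL on repeat rows
FROM (
    SELECT i.TankID, i.InspectionID,
           -- emulated ROW_NUMBER() OVER (PARTITION BY TankID ORDER BY InspectionID DESC)
           (SELECT COUNT(*) FROM Tank_Inspections t
             WHERE t.TankID = i.TankID AND t.InspectionID >= i.InspectionID) AS RowNum
    FROM Tank_Inspections i
) AS r
JOIN Tank k ON k.TankID = r.TankID
ORDER BY r.TankID, r.RowNum
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)   # tank 65's second inspection shows None for Capacity
```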

Dynamodb data model for process/transaction monitoring

I want to keep track of a multi-stage processing job.
I likely just need the following fields:
batchId (guid) | eventId (guid) | statusId (int) | timestamp | message (string)
There are relatively small number of events per batch.
I want to be able to easily query events that have a statusId less than n (still being processed or didn't finish processing).
Would using multiple rows for each status change, and querying for the latest status, be the best approach? I would use a global secondary index, but statusId does not seem like a good candidate for a hash key (there are fewer than 10 statuses).
Instead of using multiple rows for every status change, if you updated the same event row instead, you could use a technique described in the DynamoDB documentation in the section 'Use a Calculated Value'. Basically this would involve adding another attribute (say 'derivedStatusId') which would be derived by appending a random number to statusId at the time of writing to DynamoDB. For example, for a statusId of 2, derivedStatusId could be one of {"2-00", "2-01", .. "2-99"}. Setting up a Global Secondary Index on derivedStatusId would give you some fan-out that will help in preventing the index from becoming hot.
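The "calculated value" idea amounts to write sharding of the index key. A minimal Python sketch of how the key could be derived on write, and which key values a reader then has to query, is below. `SHARDS`, `derived_status_id` and `keys_for_statuses` are hypothetical names, and status IDs are assumed to be small non-negative integers. No AWS calls are made here.

```python
import random

SHARDS = 100  # suffixes "00".."99", matching the {"2-00", ..., "2-99"} example

def derived_status_id(status_id, rng=random):
    """Value to write to the GSI key attribute alongside statusId, e.g. '2-37'."""
    return f"{status_id}-{rng.randrange(SHARDS):02d}"

def keys_for_statuses(status_ids):
    """Every GSI partition-key value a reader must query to cover the given statuses."""
    return [f"{s}-{shard:02d}" for s in status_ids for shard in range(SHARDS)]

print(derived_status_id(2))                 # e.g. '2-37' (random suffix)
unfinished_keys = keys_for_statuses(range(3))
print(len(unfinished_keys))                 # 300 key values for statuses 0, 1, 2
```

The trade-off is on the read side: finding every event with statusId < n takes one Query per key value (n × SHARDS of them), so a smaller shard count means fewer queries but hotter partitions.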
If you are sure that you will use this index only for unfinished events, then removing the derivedStatusId attribute from the record when it transitions to a finished status will remove it from the index as well. This is a useful property if events are expected to finish processing eventually but their records stay around forever, because the index then holds only unfinished events. This technique is called a "sparse index" and is described in more detail in the DynamoDB documentation.
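The sparse-index behaviour can be illustrated with plain Python dicts standing in for DynamoDB items. `FINISHED_STATUS`, `apply_status` and `sparse_index` are hypothetical. In DynamoDB itself, the finishing update would use an update expression such as `SET statusId = :s REMOVE derivedStatusId`, and the GSI would drop the item automatically.

```python
FINISHED_STATUS = 5  # hypothetical: any statusId >= 5 counts as "finished"

def apply_status(item, new_status, shard):
    """Return the updated item; drop derivedStatusId once finished so it leaves the index."""
    updated = dict(item, statusId=new_status)
    if new_status >= FINISHED_STATUS:
        updated.pop("derivedStatusId", None)   # no key attribute -> not in the sparse GSI
    else:
        updated["derivedStatusId"] = f"{new_status}-{shard:02d}"
    return updated

def sparse_index(items):
    """Mimic a GSI on derivedStatusId: only items that carry the attribute appear."""
    return sorted(i["eventId"] for i in items if "derivedStatusId" in i)

events = [
    {"batchId": "b1", "eventId": "e1", "statusId": 1, "derivedStatusId": "1-07"},
    {"batchId": "b1", "eventId": "e2", "statusId": 2, "derivedStatusId": "2-42"},
]
print(sparse_index(events))                      # ['e1', 'e2']
events[0] = apply_status(events[0], 5, shard=0)  # e1 finishes
print(sparse_index(events))                      # ['e2']
```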
From your question, it seems like keeping status history recording is a desired property (I assume this because you want to have multiple rows for status changes). Consider putting this historical information in the same row. DynamoDB supports list data types and also has a generous 400KB item limit which may just allow you to capture all the desired historical information in the same record.
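Keeping the history in the same item can be done with an update that both sets the current status and appends to a list attribute. The sketch below only builds the keyword arguments for a boto3 `Table.update_item` call (it does not contact AWS). It assumes `batchId`/`eventId` as the table's key and `history` as a hypothetical list attribute. Expression attribute names are used so reserved words cannot clash.

```python
import time

def status_update_request(batch_id, event_id, status_id, message, now=None):
    """kwargs for a boto3 Table.update_item call: set the current status and
    append the change to a 'history' list (created on first write)."""
    entry = {
        "statusId": status_id,
        "timestamp": int(time.time() if now is None else now),
        "message": message,
    }
    return {
        "Key": {"batchId": batch_id, "eventId": event_id},   # assumed key schema
        "UpdateExpression": (
            "SET #st = :s, "
            "#h = list_append(if_not_exists(#h, :empty), :entry)"
        ),
        "ExpressionAttributeNames": {"#st": "statusId", "#h": "history"},
        "ExpressionAttributeValues": {":s": status_id, ":empty": [], ":entry": [entry]},
    }

request = status_update_request("b1", "e1", 3, "stage 3 started", now=1700000000)
print(request["UpdateExpression"])
# With a real table resource: table.update_item(**request)
```

Because every change grows the same item, the 400KB item limit caps how much history can accumulate, so this suits jobs with a modest number of status changes.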

Creating a report with a dynamic number of columns

I am developing an ASP.NET C# 4.0 application and am working towards a cross-tab report that will return a dynamically changing number of columns, like so:
Sales Region | ProductA | ProductB | ProductC | ...
NorthEast | 10,000 | 3,000 | 2,000 | ...
SouthEast | 3,000 | 6,000 | 2,500 | ...
... | ... | ... | ... | ...
TOTAL | 100,000 | 32,500 | 7,800 | ...
There is an undetermined number of products and regions, so the returned table will have a variable number of columns and rows.
How can I design such a report in Visual Studio 2010, RDLC designer? I have already designed my stored procedure returning the results, but designing the table adapter to return the results gives me no columns (as they are not known).
I think you need a "matrix" report.
This tutorial might help you.
Unfortunately, you will probably need to rewrite the query so that it returns one row per region and product instead of pre-pivoted columns (I believe you already used PIVOT to obtain the current query); the matrix does the pivoting itself.
Check out this link: http://www.gotreportviewer.com/matrices/
You can use a matrix for your solution. Since you need a total for each column, the matrix can add a total row, so you don't need to sum each column individually. It will create rows and columns dynamically based on the DataTable records.
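What the matrix does with unpivoted rows can be sketched in a few lines of Python. This is an illustration of the data shape, not RDLC code. The query returns (region, product, amount) rows, the product columns are discovered from the data, and a TOTAL row is summed per column. `build_matrix` and `render` are hypothetical helpers, and the amounts are the NorthEast/SouthEast figures from the question.

```python
from collections import defaultdict

def build_matrix(rows):
    """rows: (region, product, amount) tuples, i.e. unpivoted data as a matrix expects.
    Returns (products, table) where table[region][product] -> amount, plus a 'TOTAL' row."""
    products = sorted({product for _, product, _ in rows})   # columns come from the data
    table = defaultdict(lambda: defaultdict(int))
    for region, product, amount in rows:
        table[region][product] += amount
        table["TOTAL"][product] += amount                     # column totals
    return products, table

def render(products, table):
    lines = [" | ".join(["Sales Region"] + products)]
    regions = sorted(r for r in table if r != "TOTAL") + ["TOTAL"]
    for region in regions:
        lines.append(" | ".join([region] + [f"{table[region][p]:,}" for p in products]))
    return "\n".join(lines)

rows = [
    ("NorthEast", "ProductA", 10000), ("NorthEast", "ProductB", 3000), ("NorthEast", "ProductC", 2000),
    ("SouthEast", "ProductA", 3000), ("SouthEast", "ProductB", 6000), ("SouthEast", "ProductC", 2500),
]
products, table = build_matrix(rows)
output = render(products, table)
print(output)
```

Adding a new product or region to the data adds a column or row without changing the layout code, which is the property the matrix gives the report.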
