AnalysisServices: Cannot query internal supporting structures for column because they are not processed. Please refresh or recalculate the table - azure-analysis-services

I'm getting the following error when trying to connect Power BI to my tabular model in AS:
AnalysisServices: Cannot query internal supporting structures for column 'table'[column] because they are not processed. Please refresh or recalculate the table 'table'
It is not a calculated column and the connection seems to work fine on the local copy. I would appreciate any help with this!

This would depend on how you are processing the data within your model. If you have just done a Process Data, then the accompanying meta objects such as relationships have not yet been built.
Every column of data that you load needs to also be processed in this way regardless of whether it is a calculated column or not.
This can be achieved by running a Process Recalc on the database, or by loading your tables or table partitions with a Process Full or Process Default rather than just a Process Data; those options automatically run the recalculation once the data is loaded.
If you have a lot of calculated columns and tables that result in a Process Recalc taking a long time, you will need to factor this in to your refreshes and model design.
If you run a Process Recalc on your database or a Process Full/Process Default on your table now, you will no longer have those errors in Power BI.
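For reference, a Process Recalc can be issued against the server as a TMSL refresh of type "calculate" (use "full" for a Process Full). Here is a minimal sketch that only builds the command in Python; the database name is a placeholder, and you would run the resulting JSON as an XMLA/TMSL query from SSMS or your usual deployment tooling:

```python
import json

# TMSL "calculate" refresh corresponds to Process Recalc; "full" to Process Full.
# "MyTabularDb" is a placeholder -- substitute your own database name.
recalc_command = {
    "refresh": {
        "type": "calculate",
        "objects": [
            {"database": "MyTabularDb"}
        ]
    }
}

# Paste the printed JSON into an XMLA query window in SSMS to execute it.
print(json.dumps(recalc_command, indent=2))
```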
A more in-depth discussion of this can be found here: http://bifuture.blogspot.com/2017/02/ssas-processing-tabular-model.html

Related

Change data detection in REST API

I'm building an ETL process that extracts data from a REST API and then pushes update messages to a queue. The API doesn't support delta detection and uses hard deletes (a deleted record simply disappears). I currently detect changes by keeping a table in DynamoDB that contains all the record IDs along with their CRCs. Whenever the API data is extracted, I compare every record's CRC against the CRC stored in DynamoDB to detect whether a change has occurred.
This lets me detect updates and inserts, but not deletions. Is there a best practice for detecting hard deletes without loading the whole dataset into memory?
I'm currently thinking of this:
1. Have a Redis/DynamoDB table where the last extracted data snapshot is temporarily saved.
2. When the extraction is complete, do the reverse pass: stream the keys from DynamoDB and compare them against the Redis dataset to detect the missing key values.
Is there a best practice / better approach with regard to this?
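A minimal sketch of the key-comparison idea in Python; the loader functions and the sample data are hypothetical and only illustrate that the hard deletes fall out of a set difference between the stored keys and the current extract:

```python
# Hypothetical helpers: load_stored_keys() would read the id -> CRC map kept in
# DynamoDB, extract_records() would pull the current snapshot from the REST API.

def load_stored_keys():
    return {"a1": "crc-1", "a2": "crc-2", "a3": "crc-3"}  # placeholder data

def extract_records():
    return {"a1": "crc-1", "a3": "crc-9"}  # placeholder data

stored = load_stored_keys()
current = extract_records()

inserted = current.keys() - stored.keys()
deleted = stored.keys() - current.keys()   # hard deletes: ids that disappeared
updated = {k for k in current.keys() & stored.keys() if current[k] != stored[k]}

print(f"inserted={inserted} updated={updated} deleted={deleted}")
```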

Complex Validation Prior to Insert

I'm looking for some feedback. I need to pull a large set of data from an interface table, validate this data, and then merge it into a core table. The validation is fairly complex and fairly involved, requiring checking data against a number of different tables and against a number of scenarios.
For business reasons, this interface table is constantly being written to by an external process. The current process does a validation pass over portions of the data, flagging any records that fail validation in the interface table itself. We've found that, due to contention, this approach scales extremely poorly, to the point where resource-wait timeouts are common. So we need to rethink updating records in place on the interface table.
I'm considering two different methods of doing this (I am open to alternatives). Throughput performance is the most important consideration, as we need this process to be able to keep pace with the data coming in. What would be the faster approach?
a) Pull the dataset into a cursor. Loop through each row, validating each record individually. If the record passes validation, perform the merge into the core table.
b) Pull the dataset into a temp table. Make multiple passes over the data in the temp table using SQL UPDATE statements to flag records that fail validation. Merge the data in the temp table that doesn't get flagged into the core table.
Or is there something else I've not considered?
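A rough sketch of option (b) using T-SQL issued through pyodbc; every table, column, and validation rule here is hypothetical and only illustrates the flag-then-merge pattern:

```python
import pyodbc

# Placeholder connection string and schema -- adjust to your environment.
conn = pyodbc.connect("DSN=warehouse")
cur = conn.cursor()

# Pull the unprocessed rows into a temp table, with a flag column for failures.
cur.execute("""
    SELECT id, customer_id, amount, CAST(0 AS bit) AS failed
    INTO #staging
    FROM dbo.interface_table
    WHERE processed_flag = 0;
""")

# One set-based pass per validation rule, flagging failures in the temp table.
cur.execute("""
    UPDATE s SET failed = 1
    FROM #staging AS s
    LEFT JOIN dbo.customers AS c ON c.id = s.customer_id
    WHERE c.id IS NULL;                      -- rule: customer must exist
""")
cur.execute("UPDATE #staging SET failed = 1 WHERE amount < 0;")  # rule: non-negative amount

# Merge only the rows that passed every rule into the core table.
cur.execute("""
    MERGE dbo.core_table AS t
    USING (SELECT id, customer_id, amount FROM #staging WHERE failed = 0) AS s
    ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET t.amount = s.amount
    WHEN NOT MATCHED THEN INSERT (id, customer_id, amount)
                          VALUES (s.id, s.customer_id, s.amount);
""")
conn.commit()
```

Set-based passes like these generally scale much better than a row-by-row cursor, and because the failure flags live in the temp table, you avoid the in-place updates on the interface table that are causing the contention.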

Out of Memory Exception in Matrix RDLC

I'm working on an RDLC report where I'm using a matrix to display the data.
The problem is that when a large amount of data is loaded, the report does not open; instead it shows the error System.OutOfMemoryException.
Reports without a matrix work fine with the same large data.
I'm trying to load around 80,000 records. Has anyone faced the same problem?
This error occurs when the computer does not have sufficient memory to complete the requested operation, which happens when one or more of the following conditions are true:
A report is too large or too complex.
The overhead of the other running processes is very high.
The physical memory of the computer is too small.
A report is processed in two stages. The two stages are execution and rendering. This issue can occur during the execution stage or during the rendering stage.
If this issue occurs during the execution stage, this issue most likely occurs because too much memory is consumed by the data that is returned in the query result. Additionally, the following factors affect memory consumption during the execution stage:
Grouping
Filtering
Aggregation
Sorting
Custom code
If this issue occurs during the rendering stage, the cause is related to what information the report displays and how the report displays the information.
Solutions:
Configure SQL Server to use more than 2 GB of physical memory.
Schedule reports to run at off-hours when memory constraints are lower.
Adjust the MemoryLimit setting accordingly.
Upgrade to a 64-bit version of Microsoft SQL Server 2005 Reporting Services.
Redesign the report, for example:
Return less data in the report queries.
Use a better restriction on the WHERE clause of the report queries.
Move complex aggregations to the data source.
Export the report to a different format. You can reduce memory consumption by displaying the report in a different format, such as Excel or PDF.
Simplify the report design: include fewer data regions or controls in the report, or use a drillthrough report to display details.
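As an illustration of "return less data" and "move complex aggregations to the data source" above, the sketch below contrasts a detail query with a pre-aggregated one; the table and column names are made up:

```python
# Detail query: every sale row is shipped to the report and grouped by the matrix.
detail_query = """
    SELECT region, product, sale_date, amount
    FROM dbo.sales
    WHERE sale_date >= '2017-01-01';
"""

# Pre-aggregated query: the database does the grouping, so far fewer rows reach
# the report and far less memory is needed during execution and rendering.
aggregated_query = """
    SELECT region, product, YEAR(sale_date) AS sale_year, SUM(amount) AS total
    FROM dbo.sales
    WHERE sale_date >= '2017-01-01'
    GROUP BY region, product, YEAR(sale_date);
"""
```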
In my case it was not a problem of how big the dataset is (how many rows); it was a question of matrix report design. The issue appears when the variable in the columns part of the matrix has a large value domain (say, more than 300 distinct values) and you already have a variable with a large value domain in the rows part. It is not a problem when both variables with large value domains are in the rows part, or both in the columns part. So use a different design, or build a dataset that takes the value domains of those variables into account.

How can I improve the performance of the SQLite database?

Background: I am using a SQLite database in my Flex application. The database is 4 MB in size and has 5 tables:
table 1 has 2,500 records
table 2 has 8,700 records
table 3 has 3,000 records
table 4 has 5,000 records
table 5 has 2,000 records.
Problem: Whenever I run a SELECT query on any table, it takes around 50 seconds to fetch the data. This has made the application quite slow and unresponsive while it fetches data from the tables.
How can I improve the performance of the SQLite database so that the time taken to fetch the data from the tables is reduced?
Thanks
As I said in a comment, without knowing what structures your database consists of and what queries you run against the data, there is little we can infer about why your queries take so much time.
However, here is some interesting reading about indexes: Use the Index, Luke!. It explains what an index is, how you should design your indexes, and what benefits you can expect.
Also, if you can post the queries and the table schemas and cardinalities (not the contents) maybe it could help.
Are you using asynchronous or synchronous execution modes? The difference between them is that asynchronous execution runs in the background while your application continues to run. Your application will then have to listen for a dispatched event and then carry out any subsequent operations. In synchronous mode, however, the user will not be able to interact with the application until the database operation is complete since those operations run in the same execution sequence as the application. Synchronous mode is conceptually simpler to implement, but asynchronous mode will yield better usability.
The first time SQLStatement.execute() is called on a SQLStatement instance, the statement is prepared automatically before executing. Subsequent calls will execute faster as long as the SQLStatement.text property has not changed. Reusing the same SQLStatement instance is better than creating new instances again and again. If you need to change your queries, consider using parameterized statements.
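The question is about Flex/AIR, but the same reuse-and-parameterize idea looks like this in Python's sqlite3 module (table and values are placeholders), just to make the concept concrete:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

# One parameterized statement executed many times with different values:
# the SQL text never changes, so the prepared form can be reused.
insert_sql = "INSERT INTO users (email) VALUES (?)"
for email in ("a@example.com", "b@example.com"):
    conn.execute(insert_sql, (email,))

rows = conn.execute("SELECT email FROM users WHERE id = ?", (1,)).fetchall()
print(rows)
```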
You can also use techniques such as deferring what data you need at runtime. If you only need a subset of data, pull that back first and then retrieve other data as necessary. This may depend on your application scope and what needs you have to fulfill though.
Specifying the database along with the table names will prevent the runtime from checking each database to find a matching table if you have multiple databases. It also prevents the runtime from choosing the wrong database. Write SELECT email FROM main.users; instead of SELECT email FROM users;, even if you only have a single database. (main is automatically assigned as the database name when you call SQLConnection.open.)
If you happen to be writing lots of changes to the database (multiple INSERT or UPDATE statements), consider wrapping them in a transaction. The changes will be made in memory by the runtime and then written to disk when the transaction is committed. If you don't use a transaction, each statement results in its own disk writes to the database file, which can be slow and consume a lot of time.
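For comparison, here is the same batching idea sketched with Python's sqlite3 (in AIR you would start and commit a transaction on the SQLConnection instead; the data below is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (user_id INTEGER, event TEXT)")

rows = [(1, "login"), (1, "view"), (2, "login")]  # placeholder data

# The "with" block opens a single transaction and commits once at the end,
# so the batch results in one commit instead of one per statement.
with conn:
    conn.executemany("INSERT INTO log (user_id, event) VALUES (?, ?)", rows)

print(conn.execute("SELECT COUNT(*) FROM log").fetchone())
```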
Try to avoid schema changes. The table definition data is kept at the start of the database file, and the runtime loads these definitions when the database connection is opened. Data added to tables is kept after the table definition data in the file. If you make changes such as adding columns or tables, the new table definitions are mixed in with the table data, so the runtime has to read definition data from different parts of the file rather than just from the beginning. The SQLConnection.compact() method restructures the file so the table definition data sits at the beginning again, but its downside is that it can itself take a lot of time, especially if the database file is large.
Lastly, as Benoit pointed out in his comment, consider improving the SQL queries and table structure you're using. It would be helpful to know whether your database structure and queries are the actual cause of the slow performance. My guess is that you're using synchronous execution; if you switch to asynchronous mode you'll see better responsiveness, but that doesn't mean it has to stop there.
The Adobe Flex documentation online has more information on improving database performance and best practices working with local SQL databases.
You could try indexing some of the columns used in the WHERE clause of your SELECT statements. You might also try minimizing usage of the LIKE keyword.
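A minimal illustration with Python's sqlite3 (table and column names are placeholders): creating an index on the column that appears in the WHERE clause lets SQLite seek into the index instead of scanning the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

# EXPLAIN QUERY PLAN should now report a search using idx_orders_customer
# rather than a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT total FROM orders WHERE customer_id = ?", (42,)
).fetchall()
print(plan)
```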
If you are joining your tables together, you might try simplifying the table relationships.
Like others have said, it's hard to get specific without knowing more about your schema and the SQL you are using.

How to handle large amounts of data for a web statistics module

I'm developing a statistics module for my website that will help me measure conversion rates, and other interesting data.
The mechanism I use is to store an entry in a statistics table in my DB each time a user enters a specific zone (I avoid duplicate records with the help of cookies).
For example, I have the following zones:
Website - a general zone used to count unique users as I stopped trusting Google Analytics lately.
Category - self descriptive.
Minisite - self descriptive.
Product Image - whenever a user sees a product and the lead submission form.
The problem is that after a month, my statistics table is packed with a lot of rows, and the ASP.NET pages I wrote to parse the data load really slowly.
I thought about writing a service that would somehow parse the data, but I can't see any way to do that without losing flexibility.
My questions:
How do large-scale data-parsing applications like Google Analytics load the data so fast?
What is the best way for me to do it?
Maybe my DB design is wrong and I should store the data in only one table?
Thanks to anyone that helps,
Eytan.
The basic approach you're looking for is called aggregation.
You are interested in certain functions calculated over your data, and instead of calculating them "online" when the reporting page is loaded, you calculate them offline, either via a nightly batch process or incrementally whenever a log record is written.
A simple enhancement would be to store counts per user/session instead of storing every hit and counting them. That would reduce your analytic processing requirements by a factor on the order of the number of hits per session. Of course, it would increase processing costs when inserting log entries.
Another kind of aggregation is called online analytical processing, which only aggregates along some dimensions of your data and lets users aggregate the other dimensions in a browsing mode. This trades off performance, storage and flexibility.
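A minimal sketch of the nightly batch-aggregation idea, with made-up table and column names; it uses Python's sqlite3 only to keep the example self-contained:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hits (zone TEXT, user_id INTEGER, hit_date TEXT);
    CREATE TABLE daily_zone_counts (zone TEXT, hit_date TEXT, hits INTEGER, users INTEGER);
    INSERT INTO hits VALUES ('minisite', 1, '2010-06-01'),
                            ('minisite', 1, '2010-06-01'),
                            ('category', 2, '2010-06-01');
""")

# Nightly job: roll the raw hits up into one row per zone per day, so the
# reporting pages read the small aggregate table instead of the raw log.
conn.execute("""
    INSERT INTO daily_zone_counts (zone, hit_date, hits, users)
    SELECT zone, hit_date, COUNT(*), COUNT(DISTINCT user_id)
    FROM hits
    WHERE hit_date = '2010-06-01'
    GROUP BY zone, hit_date;
""")

print(conn.execute("SELECT * FROM daily_zone_counts").fetchall())
```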
It seems like you could do well by using two databases. One is for transactional data and it handles all of the INSERT statements. The other is for reporting and handles all of your query requests.
You can index the snot out of the reporting database, and/or denormalize the data so fewer joins are used in the queries. Periodically export data from the transaction database to the reporting database. This act will improve the reporting response time along with the aggregation ideas mentioned earlier.
Another trick to know is partitioning. Look up how that's done in the database of your choice - but basically the idea is that you tell your database to keep a table partitioned into several subtables, each with an identical definition, based on some value.
In your case, "range partitioning" is very useful: the partition is chosen based on the range into which a value falls. If you partition by date range, you can create separate sub-tables for each week (or each day, or each month, depending on how you use your data and how much of it there is).
This means that if you specify a date range when you issue a query, data outside that range is not even considered; that can lead to very significant time savings, even better than an index (an index still grows with your data and must be traversed, whereas out-of-range partitions are skipped entirely).
This makes both online queries (ones issued when you hit your ASP page), and the aggregation queries you use to pre-calculate necessary statistics, much faster.
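As an illustration, here is what range partitioning by date looks like in PostgreSQL syntax; the exact DDL differs per engine (SQL Server, MySQL, and Oracle have analogous but different syntax), and the table and column names are made up. The statements are shown as plain strings:

```python
# PostgreSQL-flavoured DDL for range partitioning by date.
create_parent = """
    CREATE TABLE hits (
        zone     text,
        user_id  integer,
        hit_date date
    ) PARTITION BY RANGE (hit_date);
"""

create_june = """
    CREATE TABLE hits_2010_06 PARTITION OF hits
        FOR VALUES FROM ('2010-06-01') TO ('2010-07-01');
"""

# A query constrained to June only touches hits_2010_06; the other
# partitions are pruned and never scanned.
june_report = """
    SELECT zone, COUNT(*) FROM hits
    WHERE hit_date >= '2010-06-01' AND hit_date < '2010-07-01'
    GROUP BY zone;
"""
```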
