I’m facing a little issue to calculate a distinct count a number of clients in SSAS OLAP Cube. The difficulty appears for the credited client’s accounts, in other words, for the clients how have credit (quantity = -1) or for the clients how have bought the product and they receive a credit after (quantity = 0). My actual distinct count in my cube considers these two cases as real buying transaction, but in fact they’re not. I’ve checked in SSAS to make a distinct count with the expression (SUM Quantity > 1), but I didn’t find nothing. Now I’m thinking to model these cases directly in my Datawarehouse, but I don’t see how can’t do it. Can anyone de give me a little help?
Thanks.
I would feed this data into SSAS using a SQL View. Within that View I would define a calculation to return NULL for the rows you dont want to count, something like this:
CASE WHEN quantity <= 0 THEN NULL ELSE Client_Account END AS Client_Account_For_Distinct_Count
Then I would use that column as the basis of the SSAS Distinct Count measure.
Related
I'm creating my first ever project with Firebase, and I come to the point when I need some statistics based on user input. I know Firebase (or NoSQL databases in general) are not ideal for statistics but they work for me in any other cases so I would like to give it a try.
What I have:
I work on the application where people can invite a friend to work for their company, so I have a collection of "referrals" where ID of each referral is basically UserID of a user to who the referral belongs, and then there is a subcollection with name "items" where data are stored.
How my data looks like:
Each item have these data:
applicant
appliedDate
position(part of position is positionId & department on which this position is coming from)
status
What I wanted is to let user to make statistics based on:
date range
status
department
What I was thinking about:
It's probably not the best idea to let firebase iterate over all referrals once users make requests as it may get really expensive on firebase. What I was thinking of is using cloudfunctions to calculate statistics always when something change e.g. when a new applicant applies I will increase the counter by one and the same for a counter to a specific department. However I feel like this make work for total numbers or for predefined queries e.g. "LAST MONTH" but once I will not know what dates user will select it start to get tricky.
Any idea how can I design something like this?
Thanks a lot!
What you're considering is the idiomatic approach to calculate aggregated in Firestore, and most NoSQL databases. If you follow this pattern, Firestore is quite well suited to storing statistics.
It's ad-hoc statistic, like the unknown data range, that are trickier. Usually this comes down to storing the right values to allow you to get rid of the need to read an unknown number of documents to calculate a value.
For example, if you store counters for the statistics per month, week, day and hour, you can satisfy a wide range of date ranges with a limited number of read operations. You may need to read multiple documents, but the number of documents to read depends on the range, and not on the total number of documents in the database.
Of course, for the most flexible ad-hoc querying, you may still want to consider another solution, such as BigQuery, which was made precisely for this use-case.
did anyone know what is the formula for calculation field to count running total for the group of few dimensions and sort by payment date? eg: I want to count running total for "Sales", group by ProductName, Location, Date, PPID, sort by payment date in descending order.
I can done this in "Table Calculation" but not meet my requirement. because after I get the output, I need to apply it in another calculation fields. So I need to count the running total by formula.
Thanks
Tableau calcs are the only calculations in Tableau that take the order of rows into account.
Your other option is to use custom SQL to write a windowing or analytic query. Read about the SQL keywords PARTITION and OVER. Not all databases support them, but most major ones do.
I'm a little unfamiliar with ClickHouse and still study it by trial and error. Got a question about it.
Talking about the star scheme of data representations, with dimensions and facts. Currently, I keep everything in PostgreSQL, but OLAP queries with aggregations start to show bad timing, so I'm going to move some fact tables to ClickHouse. Initial tests of CH show incredible performance, however, in real life the queries should include joins to dimension tables from PostgreSQL. I know I can connect them as dictionaries.
Question: I found that using dictionaries I can make requests similar to LEFT JOINs in good old RDBMS, ie values from resultset could be joined with corresponding values from the dictionary. But can they be filtered by some restrictions on dictionary keys (as in INNER JOIN)? For example, in PostgreSQL I have a table users (id, name, ...) and in ClickHouse I have table visits (user_id, source, medium, session_time, timestamp, ...) with metrics about their visits to the site. Can I make a query to CH to fetch aggregated metrics (number of daily visits for given date range) of users which name matches some condition (LIKE "EVE%" for example)?
It sounds like ODBC table function is what you're looking for. ClickHouse have a bunch of table functions which work like Postgres foreign tables. The setup is similar to Dictionaries but you gain the traditional JOIN behavior. It currently doesn't show up in the official document. You can refer to this https://github.com/yandex/ClickHouse/blob/master/dbms/tests/integration/test_odbc_interaction/test.py#L84 . And in near future (this year), ClickHouse will have standard JOIN statement supported.
The dictionary will basically replace the value first. As I understand it your dictionary would be based off your users table.
Here is an example. Hopefully I am understanding your question.
select dictGetString('accountidmap', 'domain', tuple(toString(account_id))) AS domain, sum(session) as sessions from session_distributed where date = '2018-10-15' and like(domain, '%cats%') group by domain
This is a real query on our database so If there is something you want to try/confirm let me know
I am new to bigquery. Have a couple of rookie questions --
Is there any way to do a select top x * query, laid out kind of like the preview pane in table details? It can be a lot easier to understand when you can visually see the data and structure.
You can create a unique visit ID by concatenating VisitId and FullVisitorId. Why doesn't this match count of sessions? How do unique visits differ from sessions definitionally?
Thanks!
COUNT(DISTINCT field[, N]) is a statistical approximation. For counts less than N, it is exact.
To get an exact count for large values, use count(*) on a group each by, however this may be a much slower query.
See COUNT documentation at https://cloud.google.com/bigquery/query-reference.
Good day everyone,
I have some questions, about how to do calculations of data stored in the database. Like, I have a table:
| ID | Item name | quantity of items | item price | date |
and for example i have stored 10000 records.
First that I need to do is to pick up items from a date interval, so I wont need the whole database for my calculations. And then I get items from that date interval, I have to add some tables, for example to calculate:
full price = quantity of items * item price
and store them in new table for each item. So the database for the items picked from the date interval should look like this:
| ID | Item name | quantity of items | item price | date | full price |
The point is that I don't know how to store that items which i picked with date interval. Like, do i have create some temporary table, or something?
This will be using an ASP.NET web application, and for calculations in the database I think I will use SQL queries. Maybe there is an easier way to do it? Thank you for your time to help me.
Like other people have said, you can perform these queries on the fly rather than store them.
However, to answer your question, a query like this should do the trick..
I haven't tested this so the syntax might be off a touch, though it will get you on the right track.
Ultimately you have to do an insert with a select
insert into itemFullPrice
select id, itemname, itemqty, itemprice, [date], itemqty*itemprice as fullprice from items where [date] between '2012/10/01' AND '2012/11/01'
again..don't shoot me if i have got the syntax a little off.. it's a busy day today :D
Having 10000 records, it'd not be a good idea to use temporary tables.
You'd better have another table, called ProductsPriceHistory, where you peridodically calculate and store, let's say, monthly reports.
This way, your reports would be faster and you wouldn't have to make calculations everytime you want to get your report.
Be aware this approach is OK if your date intervals are fixed, I mean, monthly, quarterly, yearly, etc.
If your date intervals are dynamic, ex. from 10/20/2011 to 10/25/2011, from 05/20/2011 to 08/13/2011, etc, this approach wouldn't work.
Another approach is to make calculations on ASP.Net.
Even with 10000 records, your best bet is to calculate something like this on the fly. This is what structured databases were designed to do.
For instance:
SELECT [quantity of items] * [item price] AS [full price]
, [MyTable].*
FROM [MyTable]
More complex calculations that involve JOINs to 3 or more tables and thousands of records might lend itself to storing values.
There are few approaches:
use sql query to calculate that on the fly - this way nothing is stored to the database
use same or another table to perform calculation
use calculated field
If you have low database load (few queries per minute, few thousands of rows per fetch) then use first aproach.
If calculation on the fly performs poorly (millions of records, x fetches per second...) try second or third aproach.
Third one is ok if your db supports calculated and persisted fields, say MSSQL Server.
EDIT:
Typically, as others said, you will perform calculation in your query. That is, as long as your project is simple enough.
First, when the table where you store all the items and their prices becomes attacked with insert/update/deletes from multiple clients, you don't want to block or be blocked by others. You have to understand that e.g. table X update will possibly block your select from table X until it is finished (look up page/row lock). This means that you are going to love parallel denormalized structure (table with product and the calculated stuff along with it). This is where e.g. reporting comes into play.
Second, when calculation is simple enough (a*b) and done over not-so-many records, then it's ok. When you have e.g. 10M records and you have to correlate each row with several other rows and do some aggregation over some groups, there is a chance that calculated/persisted field will save your time - you can gain up to 10-100 times faster result using this approach.
You should separate concerns in your application:
aspx pages for presentation
sql server for data persistency
some kind of intermediate "business" layer for extra logic like fullprice = p * q
E.g. if you are using Linq-2-sql for data retrieval, it is very simple to add a the fullprice to your entities. The same for entity framework. Also, if you want, you can already do the computation of p*q in the SQL select. Only if performance really becomes an issue, you can start thinking about temporary tables, views with clustered indexes etc.