How to apply market basket analysis / association rules in QlikView?

I want to use association rules in QlikView to find the best product combinations, but I'm not technically strong in QlikView. Can anyone provide tutorials or kindly show me how to use QlikView's features to perform association rule analysis? (The time slice is flexible, if possible.)
Currently I'm trying this tutorial for market basket analysis:
[ http://www.quickqlearqool.nl/?p=965 ]
But it does not count the number of customers with the same purchase behaviour (within 1 month).
Data I have:
Order table:
Order_ID | Order_date | Product_ID | customer_ID | Quantity
Product table:
Product_ID | Product Name
The following Dropbox link contains a sample dataset:
[ https://www.dropbox.com/s/mu1bz1wbrojou4t/AssociationRule.zip ]
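In plain SQL, the pair-counting logic I'm after would look roughly like this (a sketch only; Orders is an assumed table name, and the columns follow the Order table above):

-- For each product pair, count the distinct customers who bought both
-- products within the same calendar month
SELECT a.Product_ID AS product_a,
       b.Product_ID AS product_b,
       COUNT(DISTINCT a.customer_ID) AS customers
FROM Orders AS a
JOIN Orders AS b
  ON  a.customer_ID = b.customer_ID
  AND a.Product_ID < b.Product_ID
  AND EXTRACT(YEAR FROM a.Order_date) = EXTRACT(YEAR FROM b.Order_date)
  AND EXTRACT(MONTH FROM a.Order_date) = EXTRACT(MONTH FROM b.Order_date)
GROUP BY a.Product_ID, b.Product_ID
ORDER BY customers DESC;

The condition a.Product_ID < b.Product_ID keeps each pair once rather than twice.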
Thanks everyone in advance!

Related

BigQuery to Data Studio: Show reliable COUNT DISTINCT regardless of the selected period

In my BigQuery project I store event data integrated from Firebase. The granularity and dimensionality are such that presenting raw data in Data Studio quickly makes the report VERY slow (1-2 minutes per page/interaction).
I then started to think how I could create pre-aggregated tables in BigQuery to speed everything up, but quickly realised COUNT DISTINCT metrics would be a problem with this approach.
Let me explain:
SELECT user, date
FROM UNNEST([
STRUCT("Adam" AS user, "20190923" AS date),
("Bob", "20190923"),
("Carl", "20190923"),
("Adam", "20190924"),
("Bob", "20190924"),
("Adam", "20190925"),
("Carl", "20190925"),
("Bob", "20190926")
]) AS website_visits;
+------+----------+
| User | Date |
+------+----------+
| Adam | 20190923 |
| Bob | 20190923 |
| Carl | 20190923 |
| Adam | 20190924 |
| Bob | 20190924 |
| Adam | 20190925 |
| Carl | 20190925 |
| Bob | 20190926 |
+------+----------+
The above is a table of website visits.
Clearly, creating a pre-aggregated table like
SELECT date, COUNT(DISTINCT user) FROM website_visits GROUP BY date
has the limitation that the count cannot be aggregated further (let alone dynamically) to get a total: doing a SUM would return 8 unique users, which is not correct; there are only 3 unique users.
In BigQuery, this is fixed by using HLL_COUNT, which, despite the approximation, works OK for me.
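For example, the pre-aggregation could store one mergeable sketch per day (a sketch of the approach; mydataset.daily_sketches is an assumed table name):

-- Store one mergeable HLL sketch per day
CREATE OR REPLACE TABLE mydataset.daily_sketches AS
SELECT date, HLL_COUNT.INIT(user) AS user_sketch
FROM mydataset.website_visits
GROUP BY date;

-- Merge the daily sketches over any date range to get a de-duplicated total
-- (returns 3 for the sample data above, not 8)
SELECT HLL_COUNT.MERGE(user_sketch) AS unique_users
FROM mydataset.daily_sketches
WHERE date BETWEEN '20190923' AND '20190926';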
Now to the big question:
How can I do the same so that the result is displayable in Data Studio?
HLL_COUNT.EXTRACT is not available as a function there, and in the report I always have to keep in mind that the date range is set by the user however they like, so it's not possible to store a pre-aggregated result for ALL cases...
EDIT 1: APPROX_COUNT_DISTINCT
As per Bobbylank's answer, I tried APPROX_COUNT_DISTINCT. However, I found that this just seems to move the issue down the line. My fault for not explaining the full use case.
Although performance is acceptable, it does not seem possible to blend a data source with this calculated metric.
Example: after displaying the number of unique users in the selected period (which now works), I'm also trying to display Average Revenue Per User (ARPU) in Data Studio, like Firebase does.
To do this, I have to compute SUM(REVENUE) / APPROX_COUNT_DISTINCT(USER).
Clearly, REVENUE works OK with pre-aggregation and is available in the raw data. I then tried to blend the raw data with a table containing just user visits. However, APPROX_COUNT_DISTINCT can't be used in the blended data definition, as calculated metrics are not allowed.
Even when using the USER field as a metric with Count Distinct aggregation, which returns the correct figures when showing revenue and user count separately, the problem becomes aggregation when I try to divide them: applying SUM or AVG to the field basically yields AVG(REVENUE/USERS) for each day.
I also then tried to store REVENUE directly in the visits table, but was reminded by Data Studio that I can't mix dimensions and metrics in a calculated field.
APPROX_COUNT_DISTINCT might be more performance-friendly for you?
https://support.google.com/datastudio/answer/9189108?hl=en
Otherwise, the only way I can think of would be to pre-calculate several metrics (e.g. unique users on that day, 7-day cumulative, 14-day, etc.) as your customers require, for each single day.
Or you could provide a two-page report with both of these methods, with the caveat that the first can be used over any time period but will be much slower?
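A sketch of that pre-calculation (assuming dates are stored as YYYYMMDD strings, as in the sample data):

-- One row per day with the trailing 7-day unique-user count,
-- computed with a self-join over the visits table
SELECT d.date,
       COUNT(DISTINCT v.user) AS unique_users_7d
FROM (SELECT DISTINCT date FROM mydataset.website_visits) AS d
JOIN mydataset.website_visits AS v
  ON PARSE_DATE('%Y%m%d', v.date)
     BETWEEN DATE_SUB(PARSE_DATE('%Y%m%d', d.date), INTERVAL 6 DAY)
         AND PARSE_DATE('%Y%m%d', d.date)
GROUP BY d.date;

Each window size you want to offer (1-day, 7-day, 14-day, ...) would need its own pre-calculated column or table.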

Separate tables vs map lists - DynamoDB

I need your help; I am quite new to databases.
I'm trying to set up a table in DynamoDB to store info about TV shows. It seems pretty simple and straightforward, but I am not sure if what I am doing is correct.
So far I have this structure. I am trying to fit everything about the TV shows into one table. Seasons and episodes are contained within a list of maps within a list of maps.
Is this too much layering?
Would this present a problem in the future if some items become huge?
Should I separate some of these lists of maps into another table?
Shows table
Ideally, you should not put a potentially unbounded list in a single row in DynamoDB because you could end up running into the item size limit of 400 KB. Also, if you were to read or write one episode of one show, you would consume capacity as if you were reading or writing all the episodes in the show.
Take a look at the adjacency list pattern. It’s a good choice because it will allow you to easily find the seasons in a show and the episodes in a season. You can also take a look at this slide deck. Part of the way through, it talks about hierarchical data, which is exactly what you’re dealing with.
If you can provide more information about your query patterns, I can give you more guidance on how to model your data in the table.
Update (2018-11-26)
Based on your comments, it sounds like you should use composite keys to establish hierarchical 1-N relationships.
By using a composite sort key of DataType:ItemId where ItemId is a different format depending on the data type, you have a lot of flexibility.
This approach will allow you to easily get the seasons in a show, get all episodes in all seasons, get all episodes in a particular season, or even get all episodes between season 1, episode 5 and season 2, episode 5.
hash_key | sort_key | data
----------|-----------------|----------------------------
SHOW_1234 | SHOW:SHOW_1234 | {name:"Some TV Show", ...
SHOW_1234 | SEASON:SE_01 | {descr:"In this season, the main character...
SHOW_1234 | EPISODE:S01_E01 | {...
SHOW_1234 | EPISODE:S01_E02 | {...
Here are the various key condition expressions for the queries I mentioned:
hash_key = "SHOW_1234" and sort_key begins_with("SEASON:") – gets all seasons
hash_key = "SHOW_1234" and sort_key begins_with("EPISODE:") – gets all episodes in all season
hash_key = "SHOW_1234" and sort_key begins_with("EPISODE:S02_") – gets all episodes in season 2
hash_key = "SHOW_1234" and sort_key between "EPISODE:S01_E5" and "EPISODE:S02_E5" – gets all episodes between season 1, episode 5 and season 2 episode 5

How to get the lowest price of products?

Table structure
The above table has 10 products with various prices from 3 suppliers. I need to pick the supplier who can give the lowest price.
I tried with MS Access 2013 but was unable to get the lowest price. Your valuable guidance is much appreciated.
SID = Supplier ID
PCODE = Product Code
Thank you very much for your time
I assume cheapest means the lowest per-ml price per item.
So do the following:
Create query #1, which includes the product, supplier, whatever other fields you want in the final answer, and a calculated field for the per-ml price.
Create an aggregate query #2 on query #1, which groups by product and gives the minimum per_ml_price. Now you have a "table" with the cheapest price for each product.
Lastly, you want to find the rows that match the lowest price: (inner-)join query #2 and query #1, and output the fields you want (product, supplier, etc.).
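A sketch of the three queries in Access SQL (PRICES, PRICE and VOLUME_ML are assumed names for the table, the supplier price and the product volume; adjust to your actual fields):

Query1 (per-ml price for every supplier/product row):

SELECT SID, PCODE, PRICE, PRICE / VOLUME_ML AS PER_ML_PRICE
FROM PRICES;

Query2 (cheapest per-ml price per product):

SELECT PCODE, MIN(PER_ML_PRICE) AS MIN_PER_ML_PRICE
FROM Query1
GROUP BY PCODE;

Final query (the supplier(s) matching the cheapest price):

SELECT Query1.PCODE, Query1.SID, Query1.PER_ML_PRICE
FROM Query1 INNER JOIN Query2
  ON (Query1.PCODE = Query2.PCODE)
 AND (Query1.PER_ML_PRICE = Query2.MIN_PER_ML_PRICE);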

eBay API upload TrackingId by Sold listing | Sales record

To upload tracking numbers to eBay via the API I use the CompleteSale call, where I pass an OrderId or TransactionId. But sometimes I have neither a TransactionId nor an OrderId; I just have a Sold listing | Sales record.
it looks like:
Sold listing | Sales record 123456
How can I upload a tracking number using the Sold listing | Sales record value? Should I first get the TransactionId or OrderId?
Appreciate any help. Thank you!
In that case you need to call GetSellingManagerSoldListings, then get the SaleRecord array, which contains the Transaction arrays. There is a TransactionID in each transaction item.

ASP.Net / MySQL : Translating content into several languages

I have an ASP.Net website which uses a MySQL database for the back end. The website is an English e-commerce system, and we are looking at the possibility of translating it into about five other languages (French, Spanish etc). We will be getting human translators to perform the translation - we've looked at automated services but these aren't good enough.
The static text on the site (e.g. headings, buttons etc) can easily be served up in multiple languages via .Net's built in localization features (resx files etc).
The thing that I'm not so sure about is how best to store and retrieve the multi-language content in the database. For example, there is a products table that includes these fields...
productId (int)
categoryId (int)
title (varchar)
summary (varchar)
description (text)
features (text)
The title, summary, description and features text would need to be available in all the different languages.
Here are the two options that I've come up with...
Create additional fields for each language
For example we could have titleEn, titleFr, titleEs, etc. for all the languages, and repeat this for all text columns. We would then adapt our code to use the appropriate field depending on the language selected. This feels a bit hacky and would lead to some very large tables. Also, if we wanted to add additional languages in the future, it would be time-consuming to add even more columns.
Use a lookup table
We could create a new table with the following format...
textId | languageId | content
-------------------------------
10 | EN | Car
10 | FR | Voiture
10 | ES | Coche
11 | EN | Bike
11 | FR | Vélo
We'd then adapt our products table to reference the appropriate textId for the title, summary, description and features instead of having the text stored in the product table. This seems much more elegant, but I can't think of a simple way of getting this data out of the database and onto the page without using complex SQL statements. Of course adding new languages in the future would be very simple compared to the previous option.
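For example, fetching a single product would need one join per translated column, something like this (a sketch; translations is an assumed table name, titleId/summaryId/descriptionId/featuresId are the assumed reference columns, and @lang stands for the current language):

SELECT p.productId,
       t1.content AS title,
       t2.content AS summary,
       t3.content AS description,
       t4.content AS features
FROM products p
JOIN translations t1 ON t1.textId = p.titleId AND t1.languageId = @lang
JOIN translations t2 ON t2.textId = p.summaryId AND t2.languageId = @lang
JOIN translations t3 ON t3.textId = p.descriptionId AND t3.languageId = @lang
JOIN translations t4 ON t4.textId = p.featuresId AND t4.languageId = @lang;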
I'd be very grateful for any suggestions about the best way to achieve this! Is there any "best practice" guidance out there? Has anyone done this before?
In your case, I would recommend using two tables:
Product
-------------------------------
ProductID | Price | Stock
-------------------------------
10 | 10 | 15
ProductLoc
-----------------------------------------------
ProductID | Lang | Name | Description
-----------------------------------------------
10 | EN | Bike | Excellent Bike
10 | ES | Bicicleta | Excelente bici
This way you can use:
SELECT * FROM
Product LEFT JOIN ProductLoc ON Product.ProductID = ProductLoc.ProductID
AND ProductLoc.Lang = #CurrentLang
(Left join just in case there is no record for the current lang in the ProductLoc table)
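For reference, the corresponding MySQL DDL could look like this (a sketch; column types are assumptions):

CREATE TABLE Product (
  ProductID INT PRIMARY KEY,
  Price     DECIMAL(10,2),
  Stock     INT
);

CREATE TABLE ProductLoc (
  ProductID   INT NOT NULL,
  Lang        CHAR(2) NOT NULL,
  Name        VARCHAR(255),
  Description TEXT,
  PRIMARY KEY (ProductID, Lang),
  FOREIGN KEY (ProductID) REFERENCES Product (ProductID)
);

The composite primary key (ProductID, Lang) guarantees at most one translation per language for each product.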
It's not a good idea just to add new columns to the existing table; it will be really hard to add a new language in the future. The lookup table is much better, but I think you could have performance problems because of the number of translated records.
I think the best solution is to have a shared table:
products: id, categoryid,
and the same table structure for every language:
products_en, products_de: product_id (fk), title, price, description, ...
You just select from the shared one and join the table for your language. The advantage is that you can localize even the price, category, ...
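The query is then a plain join, for example for English (a sketch; names follow the layout above):

SELECT p.id, p.categoryid, l.title, l.price, l.description
FROM products AS p
JOIN products_en AS l ON l.product_id = p.id;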
