Power BI - Count instances of a string across multiple columns

I've been searching the forums here but can't find anything that matches exactly what I'm trying to do. I have split a string by a delimiter and now have an Excel file with the following column names and some basic sample data:
Learnt 1 - Learnt 2 - Learnt 3 - Learnt 4
Books - (blank) - (blank) - (blank)
Online - Books - (blank) - (blank)
(blank) - Books - Bootcamp - (blank)
Bootcamp - (blank) - Books - (blank)
Each of the four Learnt columns is either populated by a method of learning (there are around 8 possibilities, which apply to every column) or blank, and more than one column can be non-blank in any given row. Very simply, I want to count the total number of times each method appears across all the columns in Power BI, so the expected results here would be Books 4, Online 1, Bootcamp 2. Any help would be greatly appreciated.
I have tried using the USERELATIONSHIP function to link this to a separate table that lists all the methods of learning (Table1), using the measure below, but had no luck:
CountinlearntocodeColumns =
CALCULATE (
COUNTROWS ( LearningMethodTable ),
USERELATIONSHIP ( Table1[Method], LearningMethodTable[Learnt 1] )
)
+ CALCULATE (
COUNTROWS ( LearningMethodTable ),
USERELATIONSHIP ( Table1[Method], LearningMethodTable[Learnt 2] )
)
+ CALCULATE (
COUNTROWS ( LearningMethodTable ),
USERELATIONSHIP ( Table1[Method], LearningMethodTable[Learnt 3] )
)

You are making life unnecessarily difficult for yourself by working with a pivoted table in Power BI. Fix that first.
In Power Query, simply unpivot all columns and filter out the blanks:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcsrPzy5WUNJRgqJYnWgl/7yczLxUkCCKLEgKWQzIKElOzC1QgEmiCCBrjo0FAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Learnt 1 " = _t, #"Learnt 2 " = _t, #"Learnt 3 " = _t, #"Learnt 4" = _t]),
#"Unpivoted Columns" = Table.UnpivotOtherColumns(
Source, {}, "Learnt", "Method"),
#"Filtered Rows" = Table.SelectRows(
#"Unpivoted Columns", each [Method] <> null and [Method] <> "")
in
#"Filtered Rows"
The result is a long table with two columns, Learnt and Method, and one row per non-blank entry.
Now you can easily create your count in DAX:
Count = COUNT('Table'[Method])
and put it in a table visual next to 'Table'[Method] to get one count per method.
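To sanity-check the logic outside Power BI, here is a minimal Python sketch of the same unpivot, filter and count steps on the question's sample data. It only mirrors the idea: the helper names unpivot and drop_blanks are hypothetical, None stands in for "(blank)", and unlike the real Table.UnpivotOtherColumns this unpivot keeps nulls so that the separate filter step has something to do.

```python
from collections import Counter

# Sample data from the question; None stands in for "(blank)".
rows = [
    {"Learnt 1": "Books", "Learnt 2": None, "Learnt 3": None, "Learnt 4": None},
    {"Learnt 1": "Online", "Learnt 2": "Books", "Learnt 3": None, "Learnt 4": None},
    {"Learnt 1": None, "Learnt 2": "Books", "Learnt 3": "Bootcamp", "Learnt 4": None},
    {"Learnt 1": "Bootcamp", "Learnt 2": None, "Learnt 3": "Books", "Learnt 4": None},
]


def unpivot(table):
    """One output row per (column, value) pair, roughly like unpivoting all columns."""
    return [
        {"Learnt": column, "Method": value}
        for row in table
        for column, value in row.items()
    ]


def drop_blanks(table):
    """Keep only rows whose Method is neither null nor an empty string."""
    return [r for r in table if r["Method"] not in (None, "")]


long_table = drop_blanks(unpivot(rows))
counts = Counter(r["Method"] for r in long_table)
print(dict(counts))  # {'Books': 4, 'Online': 1, 'Bootcamp': 2}
```

Once the data is in this long shape, counting per method is a plain group-and-count, which is why the unpivot makes the DAX side trivial.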

Related

DISTINCT keyword issues in SQLite

I'm trying to run a query that should return every mechanic along with the sum of their commissions from another table, but I only get one mechanic's name and the sum of all commissions. I've tried writing the query in different ways but get the same result.
The Query:
SELECT DISTINCT m.mechID, fname || ' ' || lname AS 'Full Name', SUM(commission) AS 'Total Commissions Per Mechanic'
FROM
mechanics AS m
INNER JOIN mech_commissions AS mc on m.mechID = mc.mechID
ORDER BY "Full Name";
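I think you want an aggregation query here. Without GROUP BY, SUM() aggregates over all the joined rows and returns a single row, and DISTINCT cannot change that: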
SELECT m.mechID, m.fname || ' ' || m.lname AS `Full Name`,
SUM(mc.commission) AS `Total Commissions Per Mechanic`
FROM mechanics AS m
INNER JOIN mech_commissions AS mc ON m.mechID = mc.mechID
GROUP BY 1, 2
ORDER BY `Full Name`;
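A minimal sketch with Python's built-in sqlite3 module shows the difference between the two queries. The table contents are made up for illustration; the only point is that the ungrouped query collapses to one row while the grouped one returns one row per mechanic.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mechanics (mechID INTEGER PRIMARY KEY, fname TEXT, lname TEXT);
    CREATE TABLE mech_commissions (mechID INTEGER, commission REAL);
    INSERT INTO mechanics VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Cole');
    INSERT INTO mech_commissions VALUES (1, 100), (1, 50), (2, 75);
""")

# Original query: SUM() without GROUP BY aggregates every joined row into one.
ungrouped = conn.execute("""
    SELECT DISTINCT m.mechID, fname || ' ' || lname AS 'Full Name',
           SUM(commission) AS 'Total Commissions Per Mechanic'
    FROM mechanics AS m
    INNER JOIN mech_commissions AS mc ON m.mechID = mc.mechID
    ORDER BY "Full Name"
""").fetchall()

# Answer's query: GROUP BY produces one row per mechanic.
grouped = conn.execute("""
    SELECT m.mechID, m.fname || ' ' || m.lname AS `Full Name`,
           SUM(mc.commission) AS `Total Commissions Per Mechanic`
    FROM mechanics AS m
    INNER JOIN mech_commissions AS mc ON m.mechID = mc.mechID
    GROUP BY 1, 2
    ORDER BY `Full Name`
""").fetchall()

print(ungrouped)  # one row; its mechID and name come from an arbitrary joined row
print(grouped)    # [(1, 'Ann Lee', 150.0), (2, 'Bob Cole', 75.0)]
```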

Count Multiple Rows from Multiple Tables (Power BI)

I am trying to write a Power BI measure that calculates the number of rows that meet a condition.
I have 5 tables: Table1, Table2, Table3, Table4 and Table5.
Each of these tables has two columns, ID and Date. I would like to count all IDs where the Date is not empty.
I tried this measure, but it doesn't work:
All Total Hires =
SUMX(
UNION(
SELECTCOLUMNS(Table1,"A",Table1[Name]),
SELECTCOLUMNS(Table2,"A",Table2[Name]),
SELECTCOLUMNS(Table3,"A",Table3[Name]),
SELECTCOLUMNS(Table4,"A",Table4[Name]),
SELECTCOLUMNS(Table5,"A",Table5[Name])
)
,IF([A] <> NULL, 1, 0))
Does anyone know a solution to this problem?
You can do something like this using COUNTROWS and BLANK(). Note: I've assumed that an empty Date is null/blank rather than a string such as ' '.
Table1 Non Blanks = CALCULATE(COUNTROWS('Table1'), FILTER('Table1', 'Table1'[Date] <> BLANK()))
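Create one such measure per table and add them together, or write it as a single measure:
All Total Hires =
CALCULATE(COUNTROWS('Table1'), FILTER('Table1', 'Table1'[Date] <> BLANK()))
+ CALCULATE(COUNTROWS('Table2'), FILTER('Table2', 'Table2'[Date] <> BLANK()))
+ CALCULATE(COUNTROWS('Table3'), FILTER('Table3', 'Table3'[Date] <> BLANK()))
+ CALCULATE(COUNTROWS('Table4'), FILTER('Table4', 'Table4'[Date] <> BLANK()))
+ CALCULATE(COUNTROWS('Table5'), FILTER('Table5', 'Table5'[Date] <> BLANK()))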
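You can also try the measure below, which stacks the filtered tables with UNION and counts the result (UNION keeps duplicate rows, and all the tables must have the same number of columns):
count_id =
COUNTROWS(
UNION(
FILTER(Table1, Table1[Date] <> BLANK()),
FILTER(Table2, Table2[Date] <> BLANK()),
FILTER(Table3, Table3[Date] <> BLANK()),
FILTER(Table4, Table4[Date] <> BLANK()),
FILTER(Table5, Table5[Date] <> BLANK())
)
)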
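The counting logic is easy to check outside DAX. This minimal Python sketch models each table as a list of hypothetical (ID, Date) rows, with None playing the role of BLANK(), and shows that adding per-table counts and counting one stacked union give the same total.

```python
from datetime import date

# Hypothetical stand-ins for Table1..Table5: (ID, Date) rows, None = blank Date.
tables = {
    "Table1": [(1, date(2021, 1, 4)), (2, None)],
    "Table2": [(3, date(2021, 2, 1))],
    "Table3": [(4, None), (5, None)],
    "Table4": [(6, date(2021, 3, 15)), (7, date(2021, 3, 16))],
    "Table5": [],
}


def count_non_blank(rows):
    """Count rows whose Date is not blank, like COUNTROWS(FILTER(...))."""
    return sum(1 for _id, d in rows if d is not None)


# Approach 1: one count per table, added together.
per_table = {name: count_non_blank(rows) for name, rows in tables.items()}

# Approach 2: stack all rows first (duplicates kept, like DAX UNION), then count.
union = [row for rows in tables.values() for row in rows]
total = count_non_blank(union)

print(per_table)
print(total)  # 4
```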

Undesired flattening occurring

I'm using BigQuery on exported Google Analytics (GA) data (see the GA export schema).
Looking at the documentation, I see that when I selected a field that is inside a record it will automatically flatten that record and duplicate the surrounding columns.
So I tried to create a denormalized table that I could query with a more SQL-like mindset:
SELECT
CONCAT( date, " ", if (hits.hour < 10,
CONCAT("0", STRING(hits.hour)),
STRING(hits.hour)), ":", IF(hits.minute < 10, CONCAT("0", STRING(hits.minute)), STRING(hits.minute)) ) AS hits.date__STRING,
CONCAT(fullVisitorId, STRING(visitId)) AS session_id__STRING,
fullVisitorId AS google_identity__STRING,
MAX(IF(hits.customDimensions.index=7, hits.customDimensions.value,NULL)) WITHIN RECORD AS customer_id__LONG,
hits.hitNumber AS hit_number__INT,
hits.type AS hit_type__STRING,
hits.isInteraction AS hit_is_interaction__BOOLEAN,
hits.isEntrance AS hit_is_entrance__BOOLEAN,
hits.isExit AS hit_is_exit__BOOLEAN,
hits.promotion.promoId AS promotion_id__STRING,
hits.promotion.promoName AS promotion_name__STRING,
hits.promotion.promoCreative AS promotion_creative__STRING,
hits.promotion.promoPosition AS promotion_position__STRING,
hits.eventInfo.eventCategory AS event_category__STRING,
hits.eventInfo.eventAction AS event_action__STRING,
hits.eventInfo.eventLabel AS event_label__STRING,
hits.eventInfo.eventValue AS event_value__INT,
device.language AS device_language__STRING,
device.screenResolution AS device_resolution__STRING,
device.deviceCategory AS device_category__STRING,
device.operatingSystem AS device_os__STRING,
geoNetwork.country AS geo_country__STRING,
geoNetwork.region AS geo_region__STRING,
hits.page.searchKeyword AS hit_search_keyword__STRING,
hits.page.searchCategory AS hits_search_category__STRING,
hits.page.pageTitle AS hits_page_title__STRING,
hits.page.pagePath AS page_path__STRING,
hits.page.hostname AS page_hostname__STRING,
hits.eCommerceAction.action_type AS commerce_action_type__INT,
hits.eCommerceAction.step AS commerce_action_step__INT,
hits.eCommerceAction.option AS commerce_action_option__STRING,
hits.product.productSKU AS product_sku__STRING,
hits.product.v2ProductName AS product_name__STRING,
hits.product.productRevenue AS product_revenue__INT,
hits.product.productPrice AS product_price__INT,
hits.product.productQuantity AS product_quantity__INT,
hits.product.productRefundAmount AS hits.product.product_refund_amount__INT,
hits.product.v2ProductCategory AS product_category__STRING,
hits.transaction.transactionId AS transaction_id__STRING,
hits.transaction.transactionCoupon AS transaction_coupon__STRING,
hits.transaction.transactionRevenue AS transaction_revenue__INT,
hits.transaction.transactionTax AS transaction_tax__INT,
hits.transaction.transactionShipping AS transaction_shipping__INT,
hits.transaction.affiliation AS transaction_affiliation__STRING,
hits.appInfo.screenName AS app_current_name__STRING,
hits.appInfo.screenDepth AS app_screen_depth__INT,
hits.appInfo.landingScreenName AS app_landing_screen__STRING,
hits.appInfo.exitScreenName AS app_exit_screen__STRING,
hits.exceptionInfo.description AS exception_description__STRING,
hits.exceptionInfo.isFatal AS exception_is_fatal__BOOLEAN
FROM
[98513938.ga_sessions_20151112]
HAVING
customer_id__LONG IS NOT NULL
AND customer_id__LONG != 'NA'
AND customer_id__LONG != ''
I wrote the result of this query into another table, denorm (with "Flatten Results" on and "Allow Large Results" on).
I get different results when I query denorm with the clause
WHERE session_id__STRING = "100001897901013346771447300813"
versus wrapping the above query in the following (which yields the desired results):
SELECT * FROM (_above query_) AS foo WHERE session_id__STRING = "100001897901013346771447300813"
I'm sure this is by design, but could someone explain the difference between these two methods?
I believe you are saying that you did check the "Flatten Results" box when you created the output table? And I assume from your question that session_id__STRING is a repeated field?
If those are correct assumptions, then what you are seeing is exactly the behavior you referenced from the documentation above. You asked BigQuery to "flatten results" so it turned your repeated field into an un-repeated field and duplicated all the fields around it so that you have a flat (i.e., no repeated data) table.
If the desired behavior is the one you see when querying over the subquery, then you should uncheck that box when creating your table.
"Looking at the documentation, I see that when I selected a field that is inside a record it will automatically flatten that record and duplicate the surrounding columns."
This is not correct. By the way, could you point to that documentation? It needs to be improved.
Selecting a field does not flatten that record. So if you have a table T with a single record {a = 1, b = (2, 2, 3)}, then do
SELECT * FROM T WHERE b = 2
You still get a single record {a = 1, b = (2, 2)}. SELECT COUNT(a) from this subquery would return 1.
But once you write results of this query with flatten=on, you get two records: {a = 1, b = 2}, {a = 1, b = 2}. SELECT COUNT(a) from the flattened table would return 2.
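The example above can be modelled in a few lines of Python. This is a toy model of the behaviour the answer describes, not how BigQuery is implemented: a record is a dict, the repeated field b is a list, filter_repeated mimics a WHERE on a repeated field, and flatten mimics writing the result with "Flatten Results" on (both helper names are hypothetical).

```python
# Toy model of table T: one record whose field b is repeated.
T = [{"a": 1, "b": [2, 2, 3]}]


def filter_repeated(table, field, value):
    """WHERE on a repeated field: keep matching values; keep the record if any match."""
    out = []
    for rec in table:
        kept = [v for v in rec[field] if v == value]
        if kept:
            out.append({**rec, field: kept})
    return out


def flatten(table, field):
    """Flatten Results: one output row per repeated value, other fields duplicated."""
    return [{**rec, field: v} for rec in table for v in rec[field]]


def count_a(table):
    """SELECT COUNT(a): number of rows with a non-null a."""
    return sum(1 for rec in table if rec.get("a") is not None)


filtered = filter_repeated(T, "b", 2)  # [{'a': 1, 'b': [2, 2]}]
flat = flatten(filtered, "b")          # [{'a': 1, 'b': 2}, {'a': 1, 'b': 2}]
print(count_a(filtered), count_a(flat))  # 1 2
```

The duplicated a values in the flattened output are exactly why aggregates over denorm differ from the same aggregates over the subquery.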

How to execute a complex sql statement and get the results in an array?

I would like to execute a fairly complex SQL statement using SQLite.swift and get the result, preferably as an array, to use as the data source for a table view. The statement looks like this:
SELECT defindex, AVG(price) FROM prices WHERE quality = 5 AND price_index != 0 GROUP BY defindex ORDER BY AVG(price) DESC
I was studying the SQLite.swift documentation to find out how to do it properly, but I couldn't find a way. I could call prepare on the database and iterate through the Statement object, but that wouldn't be optimal performance-wise.
Any help would be appreciated.
Most sequences in Swift can be collected into an array simply by passing the sequence to the Array initializer:
let stmt = db.prepare(
"SELECT defindex, AVG(price) FROM prices " +
"WHERE quality = 5 AND price_index != 0 " +
"GROUP BY defindex " +
"ORDER BY AVG(price) DESC"
)
let rows = Array(stmt)
Building a data source from this should be relatively straightforward at this point.
If you use the type-safe API, it would look like this:
let query = prices.select(defindex, average(price))
.filter(quality == 5 && price_index != 0)
.group(defindex)
.order(average(price).desc)
let rows = Array(query)
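For comparison, Python's built-in sqlite3 module exposes the same idea through fetchall(), which materializes the whole result set as a list of tuples. This minimal sketch runs the question's statement against a few made-up rows in prices, to show what the array ends up containing:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (defindex INTEGER, price REAL, quality INTEGER, price_index INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?)",
    [
        (10, 4.0, 5, 1),
        (10, 6.0, 5, 2),
        (20, 9.0, 5, 1),
        (20, 100.0, 3, 1),  # quality != 5, filtered out
        (30, 50.0, 5, 0),   # price_index = 0, filtered out
    ],
)

rows = conn.execute(
    "SELECT defindex, AVG(price) FROM prices "
    "WHERE quality = 5 AND price_index != 0 "
    "GROUP BY defindex "
    "ORDER BY AVG(price) DESC"
).fetchall()

print(rows)  # [(20, 9.0), (10, 5.0)]
```

Each tuple maps naturally onto one table-view row, just as each element of the Swift array would.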

Schedule planning procedure

My family owns a medium-sized transport company, and when I joined the business 3 years ago we had no software to manage all the transports we had to do. With 20 drivers this was a problem, so I sat down, learned the basics of VBA and built an app in Excel to manage and dispatch the different trips to our drivers by email. It "works" for now, but we are planning a future expansion, so I started learning Xojo (I'm on a Mac, and it's the closest thing to VBA).
We receive an Excel file one day ahead that tells us which trips we have to do (we transport people). Basically, it's a sheet with all the different customers. I import this sheet into a "week file" so I can use the data afterwards through different macros. There is a lot of irrelevant information in this sheet, but the columns we are interested in are Type, Number and Hour.
So basically, I have to take all my rows (100+), group them by type and number, then order them by hour.
Here's a quick example of what my sheet looks like when sorted (the different colours are different drivers):
I don't think my procedure for getting this result is very good. I loop through all the rows in a hidden data sheet with an If statement that checks whether it's a new type or trip number, save the time and row references (first row, last row) in an array, then loop through the array to export the ranges to the display sheet. Keep in mind that I wrote this 3 weeks after learning that VBA existed. It "works", but I'd like a better process.
I will be using SQLite to store all the information in the application I'm starting to write. I'd like suggestions on how I could sort all my data faster using SQL. I'm looking for a procedure; I can figure out how to code it.
Here's a sample of the code I wrote:
For RowSearch = 2 To RowCount
If Sheets(DataSheetName).Cells(RowSearch, 2).Value <> Sheets(DataSheetName).Cells(RowSearch - 1, 2).Value _
Or Sheets(DataSheetName).Cells(RowSearch, 3).Value <> Sheets(DataSheetName).Cells(RowSearch - 1, 3).Value Then
Blocks(TripCount, 1) = Position
Blocks(TripCount, 2) = RowSearch - 1
Blocks(TripCount, 3) = Format(Sheets(DataSheetName).Cells(Position, 4).Value, "hh:mm")
TripCount = TripCount + 1
Position = RowSearch
End If
Next RowSearch
Blocks(TripCount, 1) = Position
Blocks(TripCount, 2) = RowSearch - 1
Blocks(TripCount, 3) = Format(Sheets(DataSheetName).Cells(Position, 4).Value, "hh:mm")
'Sorts the blocks by time, loops through each trip's row range to sort the trips by time and type, and writes the blocks
RowSelect = 1
For BlockSearch = 1 To TripCount
TempHour = "99:99"
For RowOrder = 1 To TripCount
If Blocks(RowOrder, 3) <= TempHour Then
TempHour = Blocks(RowOrder, 3)
Trips(BlockSearch, 1) = Blocks(RowOrder, 1)
Trips(BlockSearch, 2) = Blocks(RowOrder, 2)
RowChange = RowOrder
End If
Next RowOrder
RowRange = Trips(BlockSearch, 2) - Trips(BlockSearch, 1) + 1
FieldValue = Sheets(DataSheetName).Range("A" & Trips(BlockSearch, 1) & ":" & "R" & Trips(BlockSearch, 2))
Sheets(SheetName).Range("A" & RowSelect & ":" & "R" & RowSelect + RowRange - 1) = FieldValue
Sheets(SheetName).Rows(RowSelect + RowRange).Insert Shift:=xlDown ' named arguments use :=, not =
RowSelect = RowSelect + RowRange + 1
Blocks(RowChange, 3) = "99:99"
Next BlockSearch
In SQL, "grouping" is an operation that not only partitions the rows into groups but also aggregates all of a group's rows into a single output row per group.
In your example, the rows are simply sorted by type, number, and hour, which would require a query like this:
SELECT *
FROM MyTable
ORDER BY Type, Number, Hour
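A minimal sketch with Python's sqlite3 module contrasts the two operations on a few hypothetical trips (the Customer column and all values are made up). ORDER BY keeps every row, while GROUP BY returns one row per (Type, Number) group; ordering the groups by MIN(Hour) is roughly what the VBA's block-sorting pass does, assuming each block's time is its earliest hour. Hours are stored as zero-padded "hh:mm" text so that they sort correctly as strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (Type TEXT, Number INTEGER, Hour TEXT, Customer TEXT)")
# Hypothetical trips, inserted out of order on purpose.
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        ("B", 2, "09:15", "Customer 1"),
        ("A", 1, "08:30", "Customer 2"),
        ("A", 1, "07:45", "Customer 3"),
        ("B", 2, "08:00", "Customer 4"),
        ("A", 3, "10:00", "Customer 5"),
    ],
)

# Sorting keeps every row and only changes their order.
sorted_rows = conn.execute(
    "SELECT * FROM MyTable ORDER BY Type, Number, Hour"
).fetchall()

# Grouping collapses each (Type, Number) block into a single summary row.
groups = conn.execute(
    "SELECT Type, Number, MIN(Hour) AS FirstHour, COUNT(*) AS Trips "
    "FROM MyTable GROUP BY Type, Number ORDER BY FirstHour"
).fetchall()

for row in sorted_rows:
    print(row)
print(groups)  # [('A', 1, '07:45', 2), ('B', 2, '08:00', 2), ('A', 3, '10:00', 1)]
```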
