Power BI Power Query: Adding Custom Column is Changing Date Format

I have an Azure Databricks data source, and I am experiencing the strangest behavior I have ever seen. I have a datetime field that has milliseconds that I need to retain.
I am able to parse this out to get just the milliseconds, or create a text-friendly key field, but as soon as I add any Custom Column step, the date gets reformatted and drops the milliseconds, which then ruins my other calculations.
The datetime column remains typed as Text. The custom column does not reference my datetime at all; it's a completely unrelated calculation. It's as if, while calculating the new field, Power Query creates a shallow copy that re-detects the column metadata and tries to be smart about datetimes.
I have no idea how to stop this. I have already disabled the automatic type-detection options, with no effect.
This is blocking me from doing any duration analysis on events. Has anyone encountered this before?

If you use the below code to generate the custom column, the values in the custom column will only display up to the second. Internally, however, the millisecond component is still part of the value, so further sorting or duration calculations on it will take the milliseconds into account.
DateTime.FromText([Timestamp])
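To make that concrete, here is a minimal M sketch (the inline #table stands in for the Databricks source, and all step and column names are placeholders):

let
    // Hypothetical two-row stand-in for the Databricks source
    Source = #table({"Timestamp"},
        {{"2018-02-08 04:12:46.774"}, {"2018-02-08 04:12:47.012"}}),
    // Parse the text into datetime; the preview rounds the display to seconds,
    // but the millisecond component is retained internally
    Parsed = Table.AddColumn(Source, "ParsedTs",
        each DateTime.FromText([Timestamp]), type datetime),
    // Fractional seconds since midnight show that the milliseconds survive
    WithMs = Table.AddColumn(Parsed, "SecondsSinceMidnight",
        each Duration.TotalSeconds([ParsedTs] - Date.StartOfDay([ParsedTs])), type number)
in
    WithMs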

Related

How to ingest historical data with proper creation time?

When ingesting historical data, we would like it to become consistent with streamed data with respect to caching and retention, hence we need to set proper creation time on the data extents.
The options I found:
creationTime ingestion property,
with(creationTime='...') query ingestion property,
creationTimePattern parameter of LightIngest.
All options seem to have very limited usability as they require manual work or scripting to populate creationTime with some granularity based on the ingested data.
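For reference, the second option looks roughly like this (a sketch; the table name and blob URL are placeholders):

.ingest into table MyEvents (h'https://mystore.blob.core.windows.net/container/events.csv')
  with (format='csv', creationTime='2018-02-08T04:12:47Z')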
In case the "virtual" ingestion time can be extracted from the data in the form of a datetime column, or otherwise derived (e.g. based on an integer ID), is it possible to instruct the engine to set the creation time from an expression evaluated over the data row?
If such a feature is missing, what could be other handy alternatives?
creationTime is a tag on an extent/shard.
The idea is to be able to effectively identify and drop / cool data at the end of the retention time.
In this context, your suggested capability raises some serious issues.
If all records have the same date, no problem, we can use this date as our tag.
If we have different dates, but they span a short period, we might decide to take the min / avg / max date.
However -
What is the behavior you would expect in case of a file that contains dates spanning a long period?
Fail the ingestion?
Use the current time as the creationTime?
Use the min / avg / max date, although they clearly don't fit the data well?
Park the records in a temp store until (if ever) we get enough records with similar dates to create the batches?
Scripting seems the most reasonable way to go here.
If your files are indeed homogeneous in their record dates, then you don't need to scan all records; just read the first record and use its date, as in the sketch below.
If the dates are heterogeneous, then we are in the scenario described by the "However" part above.
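To illustrate that scripting route for the homogeneous case, here is a minimal Python sketch (the table name, blob URL, and date column position are hypothetical):

import csv
from datetime import datetime

def build_ingest_command(local_csv, blob_url, table="MyEvents", date_col=0):
    """Derive creationTime from the first record, assuming the file is
    homogeneous in its record dates, and emit a Kusto ingest command."""
    with open(local_csv, newline="") as f:
        first_row = next(csv.reader(f))
    # Parse the record's datetime column and normalize to ISO 8601
    creation = datetime.fromisoformat(first_row[date_col]).isoformat()
    return (f".ingest into table {table} (h'{blob_url}') "
            f"with (format='csv', creationTime='{creation}')")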

Creating a filter to switch between timezones in Tableau Report

I am adding a filter in Tableau to switch between timezones when looking at the report.
Currently, I have a datetime field in MT (Mountain Time), and I want a filter where you can go back and forth between MT and CT (Central Time).
What would be the best way to accomplish this?
Would it be better practice to add a new field to my data source for the central timezone or to handle the conversion logic in Tableau?
This can be achieved in Tableau in two steps:
Create a parameter with the strings of time zones that you want to display
Create a calculated field that is connected to this parameter and has a switch case to add or subtract time from your original timestamp.
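For example (a sketch; the parameter and field names are placeholders, and it relies on CT being one hour ahead of MT):

// Shift the stored MT timestamp only when the user picks CT
CASE [Timezone Selector]
WHEN "CT" THEN DATEADD('hour', 1, [Timestamp MT])
ELSE [Timestamp MT]
END

Dropping this field on the view instead of the original timestamp makes the displayed times follow the parameter.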
If switching between timezones happens frequently, it is worth considering adding the converted column to your database itself, depending on your data size and dashboard performance.

Mismatched Timestamp after Query

I've found that a timestamp returned by a query in Google Sheets sometimes differs from the original value the query was based on.
At the online community I'm volunteering in, we use Google Forms to record volunteer hours. For our users to be able to verify their clock in/clock outs, we take the form responses with timestamps and filter them via a Query to only display those for one specific user:
=QUERY(A:F,"Select A,B,D where '"&J4&"'=F")
where J4 contains the username we are filtering for.
We calculate the row each stamp can be found in via a MATCH function, where M2:M is the range containing the timestamps the query above returns and A2:A holds the original timestamps:
=iferror(arrayformula(MATCH(M2:M,A2:A,0)+1),)
Now we found that sometimes the MATCH failed even though we could verify that the timestamp in question existed. After some format wrangling, we found the problem, illustrated with one example below:
The timestamp in question read 2/8/2018 4:12:47. Converted to a decimal, the value in column A turned into 43139.1755413195, while the very same time stamp in the query result read 43139.1755413194. The very last decimal, invisible unless you change the format to number and look at the formula line at the top of the sheet, has changed.
We have several different time stamps where the last decimal in the query result differs from the original the query is based on. Whether the last decimal in the query was one higher or lower than the original was inconsistent.
For our sheet, we have now implemented a workaround of truncating the number earlier. However, that seems very inelegant. Is there a more elegant solution, or a way to prevent (what we assume to be) rounding errors like this from happening? My searches of Google and the forums have not turned up anything like it, though I'm having trouble phrasing the problem in a way that gives relevant hits.
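One slightly more robust spin on that workaround is to compare the serial numbers rounded to whole seconds (86400 seconds per day), which absorbs last-decimal drift in either direction; a sketch:

=ARRAYFORMULA(IFERROR(MATCH(ROUND(M2:M*86400), ROUND(A2:A*86400), 0)+1,))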

How to pass parameters to measures in Power BI?

I'm new to Power BI and here's the deal:
I have the following query which calculates a measure:
MyMeasure = CALCULATE(COUNTA(F_incident[INCIDENT_ID]);F_incident[OPEN_TIME]>DATE(2016;1;1))
I need the date to be replaced by a parameter #param, so that external users could enter custom dates causing the measure to recalculate.
Is this possible in Power BI?
In your situation you are looking for an end-user to enter a date. That date will then be used in a measure to show you the number of incidents since that date (but not including that date).
I would recommend, as mentioned in the comments, a regular date table related to your F_Incident table that you could then use with a regular date slicer. While a regular date slicer requires a range rather than a single date, it is a lot more flexible for the end-user. Power BI's built-in slicer handles dates quite well. E.g. the relative date slicer allows an end-user to quickly pick "last month" as an option. See: https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-may-feature-summary/#reportView
If you've genuinely ruled out a regular date table for some reason, then another solution for a measure that responds to user input is to create a disconnected parameter table with options for the user to choose from (typically via a slicer). More information here: http://www.daxpatterns.com/parameter-table/
This parameter table can certainly be a date table. Because the table isn't related to any other table, it doesn't automatically filter anything. However, it can be referenced in measures such as you describe in your question. (I would recommend doing more error checking in your measure for situations such as nothing being selected, or multiple dates being selected.)
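A hedged sketch of such a measure, assuming a disconnected table named DateParameter with a single Date column (SELECTEDVALUE needs a recent Power BI version; MIN can stand in on older ones), written with the semicolon separators used above:

MyMeasure =
// Fall back to the original hard-coded date when no single date is selected
VAR SelectedDate = SELECTEDVALUE(DateParameter[Date]; DATE(2016; 1; 1))
RETURN
    CALCULATE(
        COUNTA(F_incident[INCIDENT_ID]);
        F_incident[OPEN_TIME] > SelectedDate
    )

The fallback also covers the error-checking cases mentioned above, since SELECTEDVALUE returns the alternate date when nothing or multiple dates are selected.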
Once you have a parameter table set up, you can also pass in the filter information by URL. More information here: https://powerbi.microsoft.com/en-us/documentation/powerbi-service-url-filters/. Note that you can't pass a date directly via URL, but if you add a text-field version of the date in your parameter table, you can filter on that to the same effect. Note, however, that it's more common to put a slicer for the parameter value right on the report rather than passing it in via URL.

Array calculation in Tableau, maxif routine

I'm fairly new to Tableau, and I'm struggling in building some routines that could be easily implemented in Excel (though it would take forever for big sets of data).
So here is the deal, consider a dataset with the following fields:
int [id_order] -> id of the sales order (deepest level, there are only unique entries of id_order)
int [id_client] -> as I want to know who bought it
date [purchase_date] -> when the customer bought the product
What I want to know is, for each order, when was the last time (if ever) the client bought something. In other words: what is the highest purchase_date for that client that is smaller than the current purchase_date?
In Excel, the approach is simple (but again, not efficient), using an array-entered "maxif":
{=MAX(IF(id_client=B1,IF(purchase_date<C1,purchase_date)))}
Is there a way to do this kind of calculation in Tableau?
You can do this in Tableau using table calculations. They take a little time to understand how to use well, but are very powerful and flexible. I posted a sample Tableau workbook for a similar question in an answer to the SO question "Find first time a condition is met".
Your situation is similar, but with the extra complication that you want to repeat the analysis for each client id, so you might want to try a recursive approach using the Previous_Value() function instead of the approach used in that example - though I'm not certain that previous_value() will fit your situation.
Still, it might be helpful to download the example workbook I mentioned to get an idea how table calculations can address similar problems.
Just to register the solution, in case someone has the same question.
So basically, the solution I found uses a table calculation, which is not computed until it's used on a sheet (and only within the context of that sheet). That's a bit limiting, so what I do is create a sheet with all the fields I need (plus whatever the table calculation requires), export the data (to mdb), and connect to this new file.
So, for my example, the right table calculation is (let's name it last_order_date):
LOOKUP(MAX([purchase_date]),-1)
Explanation: the MAX() is necessary because LOOKUP (like all table calculations) does not work on raw data, only on aggregations. You can use SUM, AVG, MAX, ATTR, whatever suits you; as in my case there is only one corresponding row, any of them will do just fine and return the same value.
The -1 indicates that I'm looking for the element immediately before the current entry (of the table, as you define it). If it were FIRST(), it would go for the first entry of the table, and LAST() would go for the last.
Now, I have to put it on a sheet. So I'll bring the fields id_client, id_order, purchase_date and last_order_date.
Then I have to define the parameters of my table calculation last_order_date (Edit Table Calculation). I'll go to Compute Using and choose Advanced. Now I'll set Partitioning to id_client, and Addressing to all the rest. What will that do? It means Tableau will create a temporary partition for each id_client, and the table calculation will run within each of those partitions.
Additionally, I will sort by the field purchase_date, aggregated with Max (again, the aggregation requirement), ascending, to guarantee my entries are in chronological order.
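Putting the pieces together, the calculation and its Compute Using settings look roughly like this (the comment lines just summarize the dialog choices):

// last_order_date: the previous row's purchase_date within the client's partition
LOOKUP(MAX([purchase_date]), -1)
// Edit Table Calculation -> Compute Using -> Advanced:
//   Partitioning: id_client
//   Addressing:   id_order, purchase_date
//   Sort by:      purchase_date (Max), Ascending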
Now, what does it do? For each entry, it looks at that id_client's partition and picks the purchase_date immediately before the current entry, which is exactly what I need.
To avoid spending Tableau processing on visualization, I often put all the fields on Detail (and leave nothing else on the canvas) and use a bar chart (which still lets me inspect the data). Then I export it to mdb and connect to it again. Unfortunately, Tableau doesn't export directly to tde.
