Mismatched Timestamp after Query - datetime

I've found that a timestamp returned by a QUERY in Google Sheets sometimes differs from the original timestamp the query was based on.
At the online community I volunteer in, we use Google Forms to record volunteer hours. So that our users can verify their clock-ins/clock-outs, we take the form responses with timestamps and filter them via a QUERY to display only those for one specific user:
=QUERY(A:F,"Select A,B,D where '"&J4&"'=F")
where J4 contains the username we are filtering for.
We calculate the row each stamp can be found in via a MATCH function, where M2:M is the range containing the timestamps the query above returns and A2:A holds the original timestamps:
=iferror(arrayformula(MATCH(M2:M,A2:A,0)+1),)
Now we found that sometimes, the MATCH failed even though we could verify that the timestamp in question existed. Some format wrangling later, we found the problem, illustrated for one example below:
The timestamp in question read 2/8/2018 4:12:47. Converted to a decimal, the value in column A turned into 43139.1755413195, while the very same timestamp in the query result read 43139.1755413194. The very last decimal, invisible unless you change the format to number and look at the formula bar at the top of the sheet, had changed.
We have several different timestamps where the last decimal in the query result differs from the original value the query is based on. Whether the last decimal in the query result was one higher or one lower than the original was inconsistent.
For our sheet, we have now implemented a workaround of truncating the number earlier. However, that seems very inelegant. Is there a more elegant solution, or a way to prevent (what we assume to be) rounding errors like this from happening? My search of Google and the forums has not turned up anything like it, though I'm having trouble phrasing the problem in a way that gives me relevant hits.
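To illustrate what we believe is happening, here is a minimal sketch (Python, using the serial values from the example above) showing that the two numbers differ only in the last stored digit, and that rounding before comparison, which is our truncation workaround in spirit, makes them match again:
# Serial date/time values observed for the same timestamp (2/8/2018 4:12:47).
original = 43139.1755413195   # value in column A (form response)
queried  = 43139.1755413194   # value returned by the QUERY

# An exact comparison (which is effectively what MATCH does) fails:
print(original == queried)        # False
print(abs(original - queried))    # ~1e-10 of a day, far below one second

# Rounding both sides before comparing makes the match succeed again.
# Six decimals still resolve well below one second (1 s = 1/86400 ≈ 1.16e-5 day).
print(round(original, 6) == round(queried, 6))   # True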

Related

How to ingest historical data with proper creation time?

When ingesting historical data, we would like it to become consistent with streamed data with respect to caching and retention, hence we need to set proper creation time on the data extents.
The options I found:
creationTime ingestion property,
with(creationTime='...') query ingestion property,
creationTimePattern parameter of LightIngest.
All options seem to have very limited usability as they require manual work or scripting to populate creationTime with some granularity based on the ingested data.
If the "virtual" ingestion time can be extracted from the data in the form of a datetime column, or otherwise inherited (e.g. based on an integer ID), is it possible to instruct the engine to set the creation time from an expression evaluated against the data row?
If such a feature is missing, what could be other handy alternatives?
creationTime is a tag on an extent/shard.
The idea is to be able to effectively identify and drop / cool data at the end of the retention time.
In this context, your suggested capability raises some serious issues.
If all records have the same date, no problem, we can use this date as our tag.
If we have different dates but they span a short period, we might decide to take the min / avg / max date.
However -
What is the behavior you would expect in case of a file that contains dates that span a long period?
Fail the ingestion?
Use the current time as the creationTime?
Use the min / avg / max date, although they clearly don't fit the data well?
Park the records in a temp store until (if ever) we get enough records with similar dates to create the batches?
Scripting seems the most reasonable way to go here.
If your files are indeed homogeneous by their record dates, then you don't need to scan all the records; just read the first record and use its date.
If the dates are heterogeneous, then we are in the scenario described by the "However" part above.
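To make the scripting suggestion concrete, here is a minimal sketch of the per-file approach (Python; the CSV layout, the ISO-formatted datetime in the first column, and the table/blob names are assumptions made purely for illustration):
import csv
from datetime import datetime

# Sketch of the scripting approach above. Assumptions (illustrative only):
# the file is CSV, the datetime is the first field in ISO 8601 format, and
# each file is homogeneous enough that the first record's date represents it.
def build_ingest_command(csv_path, blob_url, table):
    with open(csv_path, newline="") as f:
        first_row = next(csv.reader(f))
    record_time = datetime.fromisoformat(first_row[0])
    creation_time = record_time.strftime("%Y-%m-%dT%H:%M:%SZ")
    # Stamp the resulting extent with the data-derived creation time.
    return (".ingest into table {} ('{}') "
            "with (creationTime='{}', format='csv')").format(table, blob_url, creation_time)

# Example (the command can then be executed from the Kusto SDK or a client tool):
# print(build_ingest_command("results.csv", "https://<storage>/results.csv", "MyTable"))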

Power BI Power Query: Adding Custom Column is Changing Date Format

I have an Azure Databricks data source, and I am experiencing the strangest behavior I have ever seen. I have a datetime field that has milliseconds that I need to retain.
I am able to parse this out to get just the milliseconds, or create a text-friendly key field, but as soon as I add any Custom Column step, the date gets reformatted and drops the milliseconds, which then ruins my other calculations.
The datetime column remains of the Text data type. The custom column does not reference my datetime at all; it's a completely unrelated calculation. It's as if, while calculating the new field, Power Query creates a shallow copy that re-detects the metadata and tries to be smart about datetimes.
I have no idea how to stop this. I have disabled the following Options:
This is literally blocking me from doing duration analysis on events. Has anyone encountered this before?
If you use the code below to generate the custom column, the values in that column will only display precision down to the second, but the milliseconds are still available internally as part of the value. Any further sorting or duration calculation will still take the milliseconds into account.
DateTime.FromText([Timestamp])

DynamoDB top item per partition

We are new to DynamoDB and struggling with what seems like it would be a simple task.
It is not actually related to stocks (it's about recording machine results over time) but the stock example is the simplest I can think of that illustrates the goal and problems we're facing.
The two query scenarios are:
All historical values of a given stock symbol <= We think we have this figured out
The latest value of all stock symbols <= We do not have a good solution here!
Assume that updates are not synchronized, e.g. the moment of the last update record for TSLA may be different than for AMZN.
The 3 attributes are just { Symbol, Moment, Value }. We could make the hash_key Symbol and the range_key Moment, and we believe we could achieve the first query easily/efficiently.
We also assume we could get the latest value for a single, specified Symbol following https://stackoverflow.com/a/12008398, as sketched below.
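For reference, that single-symbol version looks roughly like this with boto3 (table and attribute names are just the ones from this example):
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Stocks")  # hash key: Symbol, range key: Moment

# Latest value for one symbol: read the partition newest-first, take one item.
resp = table.query(
    KeyConditionExpression=Key("Symbol").eq("TSLA"),
    ScanIndexForward=False,  # descending Moment order
    Limit=1,
)
latest = resp["Items"][0] if resp["Items"] else None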
The SQL solution for getting the latest value for each Symbol would look a lot like https://stackoverflow.com/a/6841644
But... we can't come up with anything efficient for DynamoDB.
Is it possible to do this without either retrieving everything or making multiple round trips?
The best idea we have so far is to somehow use update triggers or streams to track the latest record per Symbol and essentially keep that cached. That could be in a separate table, or in the same table with extra info like a column IsLatestForMachineKey (effectively a bool). With every insert, you'd grab the row where IsLatestForMachineKey=1, compare the Moments, and if the inserted record is newer, set the new one to 1 and the older one to 0.
This is starting to feel complicated enough that I question whether we're taking the right approach at all, or maybe DynamoDB itself is a bad fit for this, even though the use case seems so simple and common.
There is a way that is fairly straightforward, in my opinion.
Rather than using a GSI, just use two tables with (almost) the exact same schema. The hash key of both should be symbol. They should both have moment and value. Pick one of the tables to be stocks-current and the other to be stocks-historical. stocks-current has no range key. stocks-historical uses moment as a range key.
Whenever you write an item, write it to both tables. If you need strong consistency between the two tables, use the TransactWriteItems API.
If your data might arrive out of order, you can add a ConditionExpression to prevent newer data in stocks-current from being overwritten by out of order data.
The read operations are pretty straightforward, but I’ll state them anyway. To get the latest value for everything, scan the stocks-current table. To get historical data for a stock, query the stocks-historical table with no range key condition.
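A rough boto3 sketch of the write path (table and attribute names are illustrative; the condition on stocks-current keeps late, out-of-order data from clobbering a newer value, and TransactWriteItems could wrap both writes if the two tables must never disagree):
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
historical = dynamodb.Table("stocks-historical")  # hash key: symbol, range key: moment
current = dynamodb.Table("stocks-current")        # hash key: symbol, no range key

def record_value(symbol, moment, value):
    # Every reading goes into the full history.
    historical.put_item(Item={"symbol": symbol, "moment": moment, "value": value})
    try:
        # Only overwrite the "current" item if this reading is newer than the
        # one already there (or no item exists yet).
        current.put_item(
            Item={"symbol": symbol, "moment": moment, "value": value},
            ConditionExpression="attribute_not_exists(moment) OR moment < :m",
            ExpressionAttributeValues={":m": moment},
        )
    except ClientError as e:
        # An older reading arriving late is expected; anything else is a real error.
        if e.response["Error"]["Code"] != "ConditionalCheckFailedException":
            raise

# ISO 8601 strings for moment keep lexical and chronological order identical.
record_value("TSLA", "2018-02-08T04:12:47Z", 345)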

Date being identified sometimes as string, sometimes as date

I have a script that uploads a Date / Time to Datastore. Everything worked perfectly for a while, but after some time I noticed the following:
I decided to investigate this issue, using the following query:
SELECT DISTINCT data_subida FROM mynamespace ORDER BY data_subida DESC
The query above got me the following results:
It shows me that it stopped sending data on January 10th, but I am sure I am sending more data, so I scrolled down and found the following:
At some point, Datastore stopped storing my date as a Date / Time and started storing it as a String. But, if I open the entity to visualize its data, I get the following result:
So, is this a common issue? Am I doing something wrong? Is there a way to tell Datastore to CAST or CONVERT my field before ordering? Or at least to force it to query only the values interpreted as Date / Time, or only those interpreted as String? I need to use this timestamp as a watermark in an ETL process, and without proper ordering I will duplicate data.
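For reference, I suspect the mix happens because the script sometimes writes a formatted string instead of a native datetime; with the Python client (a sketch, assuming that is roughly how the upload works; the kind and property name are taken from my query above), the stored type depends only on the Python type written:
from datetime import datetime, timezone
from google.cloud import datastore

client = datastore.Client()

# Stored as a Date/Time property: sorts together with the other timestamps.
good = datastore.Entity(client.key("mynamespace"))
good["data_subida"] = datetime.now(timezone.utc)
client.put(good)

# Stored as a String property: sorts separately from the Date/Time values,
# which would explain the split ordering seen in the query results above.
bad = datastore.Entity(client.key("mynamespace"))
bad["data_subida"] = datetime.now(timezone.utc).isoformat()
client.put(bad)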
Thanks in advance.

How to pass parameters to measures in Power BI?

I'm new to Power BI and here's the deal:
I have the following query which calculates a measure:
MyMeasure = CALCULATE(COUNTA(F_incident[INCIDENT_ID]);F_incident[OPEN_TIME]>DATE(2016;1;1))
I need the date to be replaced by a parameter #param, so that external users can enter custom dates and have the measure recalculate.
Is this possible in Power BI?
In your situation you are looking for an end-user to enter a date. That date will then be used in a measure to show you the number of incidents since that date (but not including that date).
I would recommend, as mentioned in the comments, a regular date table related to your F_Incident table that you could then use with a regular date slicer. While a regular date slicer requires a range rather than a single date, it is a lot more flexible for the end-user. Power BI's built-in slicer handles dates quite well. E.g. the relative date slicer allows an end-user to quickly pick "last month" as an option. See: https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-may-feature-summary/#reportView
If you've genuinely ruled out a regular date table for some reason, then another solution for a measure that responds to user input is to create a disconnected parameter table with options for the user to choose from (typically via a slicer). More information here: http://www.daxpatterns.com/parameter-table/
This parameter table can certainly be a date table. Because the table isn't related to any other table, it doesn't automatically filter anything. However, it can be referenced in measures such as you describe in your question. (I would recommend doing more error checking in your measure for situations such as nothing being selected, or multiple dates being selected.)
Once you have a parameter table set up, you can also pass in the filter information by URL. More information here: https://powerbi.microsoft.com/en-us/documentation/powerbi-service-url-filters/. Note that you can't pass a date directly via URL, but if you add a text-field version of the date in your parameter table, you can filter on that to the same effect. Note, however, that it's more common to put a slicer for the parameter value right on the report rather than passing it in via URL.
