I have a script that uploads a Date / Time value to Datastore. Everything worked perfectly for a while, but after some time I noticed a problem and decided to investigate it with the following query:
SELECT DISTINCT data_subida FROM mynamespace ORDER BY data_subida DESC
The results suggested that the data stopped arriving on January 10th, but I am sure I am still sending data, so I scrolled down and found the cause: at some point, Datastore stopped storing my date as a Date / Time and started storing it as a String. Yet if I open one of the affected entities to inspect its data, the value still looks like a normal Date / Time.
So, is this a common issue? Am I doing something wrong? Is there a way to tell Datastore to CAST or CONVERT my field before ordering? Or, at least, to force it to return only the entities stored as Date / Time, or only those stored as String? I need to use this timestamp as a watermark in an ETL process, and without proper ordering I will duplicate the data.
Thanks in advance.
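One way around the mixed types, sketched below with the Python Datastore client, would be to backfill the affected entities: read the ones whose value came through as a string, parse it, and write it back as a real datetime so ordering works again. The kind name "MyKind" and the assumption that the strings are ISO-8601 are illustrative, not taken from the question.

# Backfill sketch only: normalise string-typed values back to datetime.
# Assumes google-cloud-datastore; "MyKind" and the ISO-8601 format are hypothetical.
from datetime import datetime
from google.cloud import datastore

client = datastore.Client(namespace="mynamespace")

for entity in client.query(kind="MyKind").fetch():
    value = entity.get("data_subida")
    if isinstance(value, str):                              # only the entities stored as String
        entity["data_subida"] = datetime.fromisoformat(value)
        client.put(entity)                                  # re-save with a real datetime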
Related
I have an Azure Databricks data source, and I am experiencing the strangest behavior I have ever seen. I have a datetime field that has milliseconds that I need to retain.
I am able to parse this out to get just the milliseconds, or create a text-friendly key field, but as soon as I add any Custom Column step, the date gets reformatted and drops the milliseconds, which then ruins my other calculations.
The datetime column remains the Text data type. The custom column does not reference my datetime; it is a completely unrelated calculation. It's as if, during calculation of a new field, the engine creates a shallow copy that re-detects the metadata and tries to be smart about datetimes.
I have no idea how to stop this; I have already disabled the automatic type-detection options, with no effect.
This is literally blocking me from doing duration analysis on events. Has anyone encountered this before?
If you use the code below to generate the custom column, the displayed values will only show precision down to the second, but the milliseconds are still present internally in the value. Any further sorting or duration calculation will still take the milliseconds into account.
DateTime.FromText([Timestamp])
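As an illustration of that point, a duration computed from two such parsed columns keeps the fractional seconds even though the preview only shows whole seconds. The inline table and the column names Start and End below are assumptions, not from the original question.

let
    // Tiny self-contained example; real data would come from the Databricks source
    Source = #table({"Start", "End"},
                    {{"2024-01-01T00:00:00.1234567", "2024-01-01T00:00:01.7534567"}}),
    Parsed = Table.AddColumn(Source, "StartDT", each DateTime.FromText([Start]), type datetime),
    Parsed2 = Table.AddColumn(Parsed, "EndDT", each DateTime.FromText([End]), type datetime),
    // Subtracting two datetimes yields a duration; TotalSeconds keeps the milliseconds
    WithDuration = Table.AddColumn(Parsed2, "Seconds",
        each Duration.TotalSeconds([EndDT] - [StartDT]), type number)
in
    WithDuration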
I've found that a timestamp returned by a QUERY in Google Sheets sometimes differs from the original value the query was based on.
At the online community I volunteer in, we use Google Forms to record volunteer hours. So that our users can verify their clock-ins and clock-outs, we take the form responses with timestamps and filter them via a QUERY to display only those for one specific user:
=QUERY(A:F,"Select A,B,D where '"&J4&"'=F")
where J4 contains the username we are filtering for.
We calculate the row each timestamp can be found in via a MATCH function, where M2:M is the range containing the timestamps the query above returns and A2:A holds the original timestamps:
=iferror(arrayformula(MATCH(M2:M,A2:A,0)+1),)
Now we found that sometimes, the MATCH failed even though we could verify that the timestamp in question existed. Some format wrangling later, we found the problem, illustrated for one example below:
The timestamp in question read 2/8/2018 4:12:47. Converted to a decimal, the value in column A turned into 43139.1755413195, while the very same time stamp in the query result read 43139.1755413194. The very last decimal, invisible unless you change the format to number and look at the formula line at the top of the sheet, has changed.
We have several different time stamps where the last decimal in the query result differs from the original the query is based on. Whether the last decimal in the query was one higher or lower than the original was inconsistent.
For our sheet, we have now implemented a workaround of truncating the number earlier, but that seems very inelegant. Is there a more elegant solution, or a way to prevent (what we assume to be) rounding errors like this from happening? My searches of Google and the forums have not turned up anything like it, though I'm having trouble phrasing the problem in a way that gives me relevant hits.
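For illustration, one way to make the lookup tolerant of that last-digit difference (using the same ranges as above) is to round both sides to whole seconds before matching; this is only a sketch of the idea, not the exact workaround described in the question.

=ARRAYFORMULA(IFERROR(MATCH(ROUND(M2:M*86400), ROUND(A2:A*86400), 0)+1,))

Multiplying the serial date by 86400 converts it to seconds, so rounding to the nearest whole second discards the stray digit in the tenth decimal place.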
I'm trying to save a property value that looks like a date, or part of a date, and it gives me an error in Azure Cosmos DB with the Graph API (Gremlin). The statement looks like this:
g.V('id').property('PartReference', '2016-02');
The error message is:
Gremlin Query Compilation Error: Data type 'Date' not yet supported by Binary Comparison functions
To me it seems like Gremlin or Cosmos DB is trying to guess the data type and getting it wrong?
At the time of writing, Azure's graph API cares only about three types of data: bool, string and number. At the root of it, you should be able to convert any complex or contextual data into its primitive representation and bypass this delight of theirs...
For date and time data, I have settled on using ticks, which can be saved as a number and filtered on.
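As an illustration of the ticks approach (the property name and the tick value below are made up, not from the original post), the value is stored and filtered as a plain number:

g.V('id').property('partReferenceTicks', 635892768000000000)
g.V().has('partReferenceTicks', gt(635892768000000000))

Because the property is just a number, the binary comparison functions work on it without Cosmos DB trying to interpret it as a date.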
I disagree that it cares only about bool, string and number, since it is obviously trying to process a date-like string as a date. I have hit this problem where I serialised a date to ISO format and got back a US-format date with only seconds precision.
I do agree that the workaround for now is to use ticks; I have switched to ticks, and I hope that once this problem is solved I can reprocess the data and go back to ISO format.
I have not tried the Gremlin.Net API; it might handle dates consistently.
I built a web application using PHP and Postgres. Now I'm porting that same application to JavaScript and SQLite. I must say it has not been too tough, and SQLite has successfully interpreted the same queries I use in Postgres.
Except for this one.
SELECT SUM(t.subtotal)/MAX(EXTRACT(epoch FROM (r.fecha_out - r.fecha_in))/86400) AS subtotal,
COUNT(t.id) AS habitaciones FROM reserva_habitacion t
LEFT JOIN reserva r ON t.id_reserva=r.id
WHERE (r.fecha_in <= '2015-03-27' AND r.fecha_out > '2015-03-27') AND r.estado <> 5
The Firefox plugin SQLite Manager hints that the error is in the epoch FROM part, but I cannot get my head around it. What am I doing wrong, and how could I fix it?
Any suggestions are welcome!
SQLite, unusually for a relational database, is completely dynamically typed, as discussed in this manual page.
Postgres, in contrast, is strictly typed, and uses operator overloading so that timestamp - timestamp gives you an interval. An interval can then be passed to the SQL-standard extract() function, in this case to give a total number of seconds between two timestamps. (See the manual page on Date/Time functions and operators.)
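For instance, in Postgres:

SELECT EXTRACT(epoch FROM (timestamp '2015-03-28' - timestamp '2015-03-27'));  -- 86400 seconds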
In SQLite, you have no such column type, so you have two choices:
Store your DateTimes as Unix timestamps directly; at that point, the extract(epoch FROM ...) is redundant, because r.fecha_out - r.fecha_in will already give you the difference in seconds.
Store your DateTimes as strings in a standard format and use the SQLite Date and Time functions to work with them. In this case, you could use strftime('%s', foo) to convert each value to a Unix timestamp, e.g. strftime('%s', r.fecha_out) - strftime('%s', r.fecha_in). A rewritten version of the query is sketched below.
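Assuming the dates are stored as ISO-8601 text, the original query could be rewritten roughly like this (an untested sketch, following the second option):

SELECT SUM(t.subtotal) /
       -- strftime('%s', ...) turns the ISO text into Unix seconds; divide by 86400.0 for days
       MAX((strftime('%s', r.fecha_out) - strftime('%s', r.fecha_in)) / 86400.0) AS subtotal,
       COUNT(t.id) AS habitaciones
FROM reserva_habitacion t
LEFT JOIN reserva r ON t.id_reserva = r.id
WHERE r.fecha_in <= '2015-03-27' AND r.fecha_out > '2015-03-27' AND r.estado <> 5;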
I'm aware that Solr provides a date field which can store a time instance and then range queries can be performed to match all documents which have that field within a particular range.
My problem is the inverse of this. I need to associate multiple time ranges with documents and then search for all documents which have the searched time within one of those ranges.
For example, I'm indexing outlets and have 3-4 ranges during which each outlet is open. I need to search for all outlets which are open at a particular time instant.
One way of doing this is to index the start time and end time of each duration as separate date fields and compare them at search time, like:
(time1_1 < t AND time1_2 > t) OR (time2_1 < t AND time2_2 > t) OR (time3_1 < t AND time3_2 > t)
Is there a better/faster/cleaner way to do this?
From your example it looks like the entities of your index are the outlet stores, and you store their opening and closing times in separate (probably dynamic) fields.
If you want a different approach, you have to consider restructuring the existing schema, or even creating an additional index that uses another entity.
It may seem unusual at first, but if this query is the most essential one for your app, then you should consider making the entity of your new index the thing you actually want to query: the particular time instance. I take it a time instance is either a whole day, or maybe half or a quarter of a day.
The schema would include fields like the ID, the start of the day (or half day, or whatever granularity you choose), its end, and a multi-valued list of ids pointing to the outlets, which remain in your current index (use a multi-core setup).
Even if you choose quarter days to handle morning, afternoon and night hours separately, and even if you index several years ahead, the data should not explode.
This different schema setup allows you to:
do the most important computation during import, so that it is easily accessible at query time,
issue a simple query that returns what you seek in one hit.
You could even forgo Date fields by using a custom way to identify the ranges. I am thinking of building the identifier from the date plus a string that indicates whether it is morning, afternoon, etc., and using that as the unique ID in Solr. If you can derive such an ID from any "time instance" that is queried, you end up with a simple ID lookup.
e.g.
What is open on 2013/03/03 in the morning?
/solr/openhours/select?q=id:2013_03_03_am
returns:
Array of outlet ids.
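For illustration, a document in such an openhours index might look something like this (field names, times and outlet ids are made up):

{
  "id": "2013_03_03_am",
  "start": "2013-03-03T06:00:00Z",
  "end": "2013-03-03T12:00:00Z",
  "outlet_ids": ["outlet_17", "outlet_42", "outlet_93"]
}

Looking up the document by its ID then directly yields the list of outlets open in that time slot.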