I'm aware that Solr provides a date field which can store a time instance, and that range queries can then be performed to match all documents whose value for that field falls within a particular range.
My problem is the inverse of this. I need to associate multiple time ranges with documents and then search for all documents which have the searched time within one of those ranges.
For example, I'm indexing outlets, and each outlet has 3-4 ranges during which it is open. I need to search for all outlets which are open at a particular time instance.
One way of doing this is to index the start time and end time of each duration as separate date fields and compare during search, like
(time1_1 < t AND time1_2 > t) OR (time2_1 < t AND time2_2 > t) OR (time3_1 < t AND time3_2 > t)
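Concretely, that filter could be sent to Solr like this (a rough sketch; the core name "outlets" and the response handling are assumptions):

import requests

t = "2013-03-03T10:00:00Z"  # the time instance being searched
# one clause per stored range; [* TO t] / [t TO *] is Solr's range syntax
fq = " OR ".join(
    f"(time{i}_1:[* TO {t}] AND time{i}_2:[{t} TO *])" for i in (1, 2, 3)
)
resp = requests.get("http://localhost:8983/solr/outlets/select",
                    params={"q": "*:*", "fq": fq, "wt": "json"})
print(resp.json()["response"]["docs"])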
Is there a better/faster/cleaner way to do this?
From your example, it looks like the entities of your index are the outlet stores, with their opening and closing times stored in separate (probably dynamic) fields.
If you're asking for a different approach, you should consider restructuring the existing schema, or even creating an additional one that uses another entity.
It may seem unusual at first, but if this query is the most essential one to your app, then you should consider making the entity of your new index the thing you actually want to query: the particular time instance. I take it a time instance is either a whole day, or maybe half or a quarter of a day.
The schema would include fields like the ID, the start of the day (or half day, or whatever granularity you choose), its end, and a multivalued list of IDs pointing to the outlets stored in your current index (use a multi-core setup).
Even if you choose quarter days to handle morning, afternoon and night hours separately, and even pre-generating several years ahead, the data should not explode.
This different schema setup allows you to:
do the most important computation during import, so that it is easily accessible when querying,
run a simple query that returns what you seek in one hit.
You could even forgo date fields by using a custom way to identify the ranges. I am thinking of creating the identifier from the date plus a string that indicates whether it is morning, afternoon, etc. This would be used as the unique ID in Solr. If you can create such an ID from any "time instance" that is queried, you'd end up with a simple ID lookup.
e.g.
What is open on 2013/03/03 in the morning?
/solr/openhours/select?q=id:2013_03_03_am
returns:
Array of outlet ids.
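In code, the whole lookup could then be as small as this (a sketch, assuming the am/pm half-day convention above and a multivalued outlet_ids field on the openhours documents):

from datetime import datetime
import requests

def open_hours_id(dt: datetime) -> str:
    # assumed ID convention: the date plus an am/pm half-day bucket
    return dt.strftime("%Y_%m_%d_") + ("am" if dt.hour < 12 else "pm")

resp = requests.get("http://localhost:8983/solr/openhours/select",
                    params={"q": "id:" + open_hours_id(datetime(2013, 3, 3, 9, 30)),
                            "wt": "json"})
docs = resp.json()["response"]["docs"]
outlet_ids = docs[0]["outlet_ids"] if docs else []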
When ingesting historical data, we would like it to become consistent with streamed data with respect to caching and retention, hence we need to set proper creation time on the data extents.
The options I found:
creationTime ingestion property,
with(creationTime='...') query ingestion property,
creationTimePattern parameter of LightIngest.
All options seem to have very limited usability as they require manual work or scripting to populate creationTime with some granularity based on the ingested data.
If the "virtual" ingestion time can be extracted from the data in the form of a datetime column, or otherwise derived (e.g. based on an integer ID), is it possible to instruct the engine to set the creation time as an expression based on the data row?
If such a feature is missing, what could be other handy alternatives?
creationTime is a tag on an extent/shard.
The idea is to be able to effectively identify and drop / cool data at the end of the retention time.
In this context, your suggested capability raises some serious issues.
If all records have the same date, no problem, we can use this date as our tag.
If we have different dates, but they span a short period, we might decide to take the min / avg / max date.
However -
What is the behavior you would expect in case of a file that contains dates that span a long period?
Fail the ingestion?
Use the current time as the creationTime?
Use the min / avg / max date, although they clearly don't fit the data well?
Park the records in a temp store until (if ever) we get enough records with similar dates to create the batches?
Scripting seems the most reasonable way to go here.
If your files are indeed homogeneous in their records' dates, then you don't need to scan all the records; just read the first record and use its date.
If the dates are heterogeneous, then we are back at the scenario described in the "However" part above.
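As a sketch of that scripting route (the CSV layout, the EventTime column, and the table name are all hypothetical), you could read the date off the first record and splice it into the creationTime ingestion property:

import csv

def first_record_date(path: str, column: str = "EventTime") -> str:
    # homogeneous files assumed: the first record's date stands in for the file
    with open(path, newline="") as f:
        row = next(csv.DictReader(f))
    return row[column][:10]  # keep just the date, e.g. '2021-07-27'

# the same value could equally be handed to LightIngest or a queued-ingest client
cmd = (".ingest into table MyTable (h'https://mystorage/file.csv') "
       f"with (creationTime='{first_record_date('file.csv')}')")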
Let's say I have a multi-restaurant food ordering app.
I'm storing orders in Firestore as documents.
Each order object/document contains:
total: double
deliveredByUid: str
restaurantId: str
I want to see, at any time during the day, the totals of every driver for each restaurant, like so:
robert: { mcdonalds: 10, kfc: 20 }
alex: { mcdonalds: 35, kfc: 10 }
What is the best way of calculating the totals of all the orders?
I'm currently thinking of the following:
The safest and easiest method, but expensive: each time I need to know the totals, I just query all the documents for that day and calculate them one by one.
Cloud Functions method: each time an order is added/removed, modify a value at a specific Realtime Database child: /totals/driverId/placeId
Manual totals: each time a driver completes an order and writes their ID to the order object, make another write to the specific Realtime Database child.
Edit: added the whole order object because I was asked to.
What I would most likely do is make sure orders are completely atomic (or as atomic as they can be). Most likely, I'd perform the order on the client within a transaction or batch write (both are atomic) that would not only create this document in question but also update the delivery driver's document by incrementing their running total. Depending on how extensible I wanted to get, I may even create subcollections within the user's document that represented chunks of time if I wanted to be able to record totals by month or year, or whatever. You really want to think this one through now.
The reason I'd advise against your suggested pattern is because it's not atomic. If the operation succeeds on the client, there is no guarantee it will succeed in the cloud. If you make both writes part of the same transaction then they could never be out of sync and you could guarantee that the total will always be accurate.
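A minimal sketch of that pattern with the Python Admin SDK (the totals collection layout here is just one possible shape, not something your schema prescribes):

from google.cloud import firestore

db = firestore.Client()

def place_order(order_id: str, order: dict) -> None:
    batch = db.batch()  # every write in the batch commits atomically, or none do
    batch.set(db.collection("orders").document(order_id), order)
    total_ref = (db.collection("totals")
                   .document(order["deliveredByUid"])
                   .collection("restaurants")
                   .document(order["restaurantId"]))
    # Increment bumps the running total server-side, no read required
    batch.set(total_ref, {"total": firestore.Increment(order["total"])}, merge=True)
    batch.commit()

place_order("order123", {"total": 10.0, "deliveredByUid": "robert",
                         "restaurantId": "mcdonalds"})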
I have a request to alter existing columns which are of type 'time'; instead of capturing just the time, I need to capture the so-called "UTC time".
My idea is to create a fixed code list with all time zones, and then reference it from the appropriate table as a FK.
My questions are:
Can a column of type 'time' also hold information regarding the time zone (UTC, for example 15:00:00 +2 (GMT+2)), and if not, could you suggest another type for that column?
Or should I separate it into two columns? For example: [15:00:00] - StartTime, [+2:00] - UtcOffset
EF insert: when inserting into the DB, should I convert my DateTime object to, for example, a DateTimeOffset for that particular column?
Thanks in advance.
From the comments in your question, it sounds like you are building an appointment scheduling system. I'll base my answer on that, because your specific questions aren't quite aligned to the scenario you described.
First, it's important to understand that the relationship between a time zone and an offset is a one-to-many relationship. One time zone can have multiple offsets. In other words, a time zone is not an offset, but rather a time zone has an offset.
A time zone represents a geographic region where the local time is the same throughout. It is identified by a string ID, such as "America/Los_Angeles" (an IANA time zone ID) or "Pacific Standard Time" (a Windows time zone ID). In .NET, you will use them on the TimeZoneInfo object with the Id instance property or methods like FindSystemTimeZonesById.
An offset is like -07:00 or +05:30 or even +13:45. Any given offset applies only at a particular date and time. For example, in America/Los_Angeles, either -08:00 or -07:00 applies depending on whether daylight saving time is in effect at a given point in time. Keep in mind that DST is not the only reason for offsets to be different - many time zones have changed their standard time at some point in their history.
Also, it's called an offset because local time deviates from UTC by a certain amount. UTC itself always has a zero offset, denoted either by +00:00 or sometimes by Z. It's similar to GMT, except in how it is defined. UTC applies universally, everywhere. GMT technically applies only on the prime meridian. They both refer to the zero offset. You should prefer to say "UTC" in most cases.
Next, you should separate your application logic between future scheduling and present/past record keeping.
Present/past is the easier of the two. Since the moment in time has actually occurred, the local time and its offset from UTC are fixed forever. You can store the local time and its offset in a single .NET DateTimeOffset structure (mapped to a datetimeoffset field in SQL Server). In other words, you can simply store 2021-07-27T12:00:00-08:00.
Note that you could instead store the equivalent UTC date and time, which would be 2021-07-27T20:00:00+00:00. However you've then lost the local time, and thus would need to convert back using the original time zone if you wanted to see that time. Some people prefer that, but I think it's more useful to store the original value.
For future scheduling, the situation is a bit different. Consider that the offset might not be the same for one appointment as it will be for the next appointment in the same time zone. Also consider that the definitions for which offset apply might change in between the time you schedule the appointment and when it comes around. (The likelihood of that increases the further out you schedule.)
Thus, for each location you should not store an offset, but rather a time zone identifier. Add a TimeZoneId to your object that stores each location (or each appointment depending on your model schema). Use TimeZoneInfo.GetSystemTimeZones to list the available time zones. The DisplayName property can be shown to your user, and the selected Id property gets assigned to the TimeZoneId.
Next you have to consider if you are scheduling a single appointment or creating a recurring appointment pattern.
For a single appointment you simply need the local date and time for that appointment. You can use a .NET DateTime struct (use datetime2 in SQL). Don't apply an offset, and don't convert to UTC. Just store the information provided.
For recurring appointments, you need to think through the information provided and store exactly what is given. For example, if the appointment is at 10:00 every other Tuesday, you'll need to store "10:00", "Tuesdays" and "2 week intervals". The data types for each will vary depending on how you choose to store and apply them. For example, you might use a time type for 10:00, but you could store the other values using integers. Appointments of different patterns could get stored in different ways.
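To make the "store the pattern, resolve the offset per occurrence" idea concrete, here is the concept sketched with Python's zoneinfo (in .NET you would go through TimeZoneInfo instead; the pattern shape is hypothetical):

from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

# assumed stored pattern: 10:00 local, every 2 weeks, plus a stored zone ID
tz = ZoneInfo("America/Los_Angeles")
local_time = time(10, 0)
interval = timedelta(weeks=2)

def next_occurrence(previous_local: datetime) -> datetime:
    # the zone is attached only at computation time: the offset is resolved
    # per occurrence, so a DST change between occurrences changes the UTC instant
    return datetime.combine((previous_local + interval).date(), local_time, tzinfo=tz)

prev = datetime(2021, 3, 2, 10, 0)                        # before the US DST switch
print(next_occurrence(prev).astimezone(ZoneInfo("UTC")))  # 17:00 UTC, not 18:00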
Alternatively, some like to store patterns using a string containing a CRON expression. You can google that for more details.
Now you have everything you need to both schedule an appointment and record that appointment after it happens. But there's one part missing - you'll likely want some table of upcoming appointments that is easily queryable. For that, you've got a few options:
You can create a separate table via a background job of some kind. Periodically it would query all the appointments, use their information to compute the next upcoming appointment time, and insert it. You can store that in a DateTimeOffset, either as local time or as UTC. (SQL Server will always compute indexes on the equivalent UTC time either way.)
You could just add another field to your appointments that shows the next actual appointment time. You can then compute the next upcoming appointment whenever the appointment is created or updated, or when that appointment occurs, and update the table accordingly.
With either approach keep in mind that you will want to periodically check for time zone data updates (either via Windows Update or keeping the tzdata package current on Linux). You will also want to periodically re-compute future appointment times, in case that time zone data has been modified in a way that affects the appointments.
If all of this sounds super complicated - sorry, but it is. Doing scheduling worldwide across time zones right is challenging. If you want it simpler, you might want to look into a pre-made solution such as Quartz.NET, which you can integrate into your application.
This seems like such an elementary part of databases; I can't believe Dynamo doesn't do this.
Supposing I have a Case. I have 2 dates: when the Case became active, and when it became inactive. I want to write a query that would return the count of active cases for a given Date.
In SQL (and MySQL has special date indices), I could do an expression 'where :date between active and inactive.' I can't do this in Dynamo for a bunch of reasons:
there is no date type
there only seem to be concatenated keys; since everything is a hash, there's no BETWEEN
So far the only things I have come up with are:
Sharding - I should probably shard this table. I did some reading on that, and the way Dynamo does sharding seems simple, although it kinda sucks that you end up with 2 tables
if I do this, then I can just ask for the active count each day and store it
which means if I wanted the count for a day in the past, I would have to table scan, and worse, scan 2 tables (as I understand it)
Date Partitions - the problem here is which date we partition on; I guess activation. The presumption then is that a count for a given date would use a key expression of active <= :date and a filter expression of inactive is null (sketched below)
Distinct Events - if I am recording Events on each case, the count of active cases on a given date is also the distinct set of CaseIDs in the Events table for that date, but that looks like it's not easy to do
Still reading, so I would not be surprised if I'm missing something obvious. Actually, one other possible way to do this is to move the event data to Timestream and then have it compute this aggregate.
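For reference, the date-partition idea might look roughly like this in boto3 (the table name, key names, and partitioning scheme are assumptions; ISO-8601 strings are used so lexicographic order matches date order):

import boto3
from boto3.dynamodb.conditions import Attr, Key

table = boto3.resource("dynamodb").Table("Cases")

def active_count(partition: str, date: str) -> int:
    # date is an ISO-8601 string, e.g. "2021-07-27"
    resp = table.query(
        KeyConditionExpression=Key("pk").eq(partition) & Key("active").lte(date),
        # active on `date` means: never closed, or closed after that date
        FilterExpression=Attr("inactive").not_exists() | Attr("inactive").gt(date),
        Select="COUNT",
    )
    return resp["Count"]  # pagination omitted for brevity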
I'm new to Power BI and here's the deal:
I have the following query which calculates a measure:
MyMeasure = CALCULATE(COUNTA(F_incident[INCIDENT_ID]);F_incident[OPEN_TIME]>DATE(2016;1;1))
I need the date to be replaced by a parameter #param, so that external users could enter custom dates causing the measure to recalculate.
Is this possible in Power BI?
In your situation you are looking for an end-user to enter a date. That date will then be used in a measure to show you the number of incidents since that date (but not including that date).
I would recommend, as mentioned in the comments, a regular date table related to your F_Incident table that you could then use with a regular date slicer. While a regular date slicer requires a range rather than a single date, it is a lot more flexible for the end-user. Power BI's built-in slicer handles dates quite well. E.g. the relative date slicer allows an end-user to quickly pick "last month" as an option. See: https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-may-feature-summary/#reportView
If you've genuinely ruled out a regular date table for some reason, then another solution for a measure that responds to user input is to create a disconnected parameter table with options for the user to choose from (typically via a slicer). More information here: http://www.daxpatterns.com/parameter-table/
This parameter table can certainly be a date table. Because the table isn't related to any other table, it doesn't automatically filter anything. However, it can be referenced in measures such as you describe in your question. (I would recommend doing more error checking in your measure for situations such as nothing being selected, or multiple dates being selected.)
Once you have a parameter table set up, you can also pass in the filter information by URL. More information here: https://powerbi.microsoft.com/en-us/documentation/powerbi-service-url-filters/. Note that you can't pass a date directly via URL, but if you add a text-field version of the date in your parameter table, you can filter on that to the same effect. Note, however, that it's more common to put a slicer for the parameter value right on the report rather than passing it in via URL.
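As a sketch of that last point (the report URL, table, and field names here are purely hypothetical), building such a URL might look like:

from urllib.parse import quote

report_url = "https://app.powerbi.com/groups/me/reports/REPORT_ID/ReportSection"
# filter on the text-field copy of the date in the parameter table
url_filter = "DateParam/DateText eq '2016-01-01'"
print(report_url + "?filter=" + quote(url_filter))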