SSIS Merge Join by closest time - datetime

I have multiple datasources. After many transformations, I end up with something like this:
Datasource1
StationId: Id integer
OriginDate: Datetime (e.g. 2011-04-25 16:53:26.623)
I have added transformations to this component to split out the year, the month and the day, like this:
(DT_I2)YEAR(OriginDate)
and the same for the month using MONTH(OriginDate) and for the day using DAY(OriginDate).
I also extract the time using (DT_DBTIME)OriginDate.
I made those transformations because I think they may be useful for my question, but I am not sure.
This datasource comes from MSSQL.
Datasource2
CurrentDate (e.g. 01/04/2011)
StationId: Integer
Hour (e.g. 01:59)
Note that in this datasource both CurrentDate and Hour are varchar.
This datasource comes from MySQL
What I need is to join these two datasets on the same StationId (easy) and on the closest date (not easy). For each row in Datasource1 I should get the row in Datasource2 with the closest time on the same date. This is because Datasource2 holds measurements taken at given times, Datasource1 holds events that happen at given times, and I need to relate each event to a measurement.
How can I achieve this? The Merge Join component only supports equality joins. I would like to avoid staging if possible.
I thought about separating the hours and minutes in both datasources and comparing them, while also comparing the dates and StationId for equality, but I am not sure how to accomplish the first part.
I am stuck on which approach to take.
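The matching rule itself (same StationId, same date, nearest time) can be sketched in Python as a language-neutral illustration; this is not SSIS code, and inside a package the same logic would plausibly live in a Script Component or run after staging. The function names are hypothetical, and the parser assumes CurrentDate is day-first (dd/mm/yyyy) and Hour is HH:MM; change the format string if the MySQL data is month-first. Sorting each station/date group once and binary-searching it means only the two neighbours around the event time need comparing.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime


def parse_measure_time(current_date: str, hour: str) -> datetime:
    # Assumption: CurrentDate is dd/mm/yyyy and Hour is HH:MM (both varchar in MySQL).
    # Use "%m/%d/%Y %H:%M" instead if the source is actually month-first.
    return datetime.strptime(f"{current_date} {hour}", "%d/%m/%Y %H:%M")


def closest_join(events, measures):
    """Match each event to the measure with the nearest time on the same date.

    events:   iterable of (station_id, origin_datetime)        # Datasource1
    measures: iterable of (station_id, current_date, hour)      # Datasource2, strings
    Returns a list of (station_id, origin_datetime, measure_datetime or None).
    """
    # Equality part of the join: group measure timestamps by (station, calendar date).
    by_key = defaultdict(list)
    for station_id, current_date, hour in measures:
        ts = parse_measure_time(current_date, hour)
        by_key[(station_id, ts.date())].append(ts)
    for times in by_key.values():
        times.sort()

    matched = []
    for station_id, origin in events:
        times = by_key.get((station_id, origin.date()), [])
        # The nearest measure is either just before or just after the event time.
        i = bisect_left(times, origin)
        best = None
        for j in (i - 1, i):
            if 0 <= j < len(times):
                candidate = times[j]
                if best is None or abs(candidate - origin) < abs(best - origin):
                    best = candidate
        matched.append((station_id, origin, best))
    return matched
```

With this rule a measure from a different date is never matched, even when it is closer in absolute time; dropping the date from the grouping key would relax that.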

Related

Separating DATE / DATETIME elements into different columns for fast querying in MySQL?

I want to store data in MySQL and query it based on the current day. I want to know what is the best practice to do so.
I want to store daily totals so that queries for the totals will be quick. I thought about modeling my table as follows:
TotalsByCountry
- Year
- Month
- Day
- countryId
- totalNumber
When I query the totals for a specific day and for specific country, I will query the table based on 4 columns, the Year, Month, Day and countryId.
I wanted to know if this is good practice, or whether there is a better way, like using a single column that holds the year, month and day, and querying only two columns: the datetime column and countryId.
I need your help choosing the right way to model the table. I also want to make another table that stores totals by gender, so please take that into consideration too.
The data will need to be accessed frequently, possibly in real time, because I want to show data changes as they happen. I will be developing the web app in ASP.NET and will probably use WebSockets to keep a persistent connection that pushes updates to the user, so when data changes it is reflected on the user's webpage in real time. That's why I need a table model that is ready for many queries. I will cache results for a few seconds so it won't stress the DB too much.
I hope I provided enough information, if not, please comment and I will reply.
Having three separate columns to store each individual element of a date (year/month/day) will add unnecessary overhead to your database in terms of insert performance and disk space.
What you will want to do is simply have a single DATETIME column to store the date and time, and have a composite index set up on (countryId, datetime_col).
Even if you wanted to query all rows based on a specific day or month, MySQL will still be able to use indexes on the DATETIME field, provided that you write your queries the right way and make sure never to wrap the DATETIME column within a function when you perform your conditional check.
Here is how you can write your query so that it will still be able to utilize indexes:
-- Get the sum of totalNumber of all rows based on current day
-- where countryId = 1
SELECT SUM(totalNumber) AS totalsum
FROM tbl
WHERE countryId = 1 AND
datetime_col >= CAST(CURDATE() AS DATETIME) AND
datetime_col < CAST(CURDATE() + INTERVAL 1 DAY AS DATETIME)
By making the comparison on the bare DATETIME column, the query remains sargable (i.e. able to use index range scans), and MySQL will be able to use indexes to quickly look up rows.
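The same half-open day range can be tried without a MySQL server by running it against an in-memory SQLite database through Python's sqlite3 module; this is a stand-in, not MySQL. The table and column names follow the answer, the index name and sample rows are hypothetical, and the day boundaries are computed in application code and passed as parameters instead of using CURDATE(). It relies on SQLite storing the datetimes as zero-padded 'YYYY-MM-DD HH:MM:SS' text, which compares correctly as strings.

```python
import sqlite3
from datetime import date, datetime, timedelta


def day_total(conn: sqlite3.Connection, country_id: int, day: date) -> int:
    # Half-open range [midnight, next midnight): the column is compared bare,
    # so an index on (countryId, datetime_col) can serve both conditions.
    start = datetime(day.year, day.month, day.day)
    end = start + timedelta(days=1)
    row = conn.execute(
        "SELECT COALESCE(SUM(totalNumber), 0) FROM tbl "
        "WHERE countryId = ? AND datetime_col >= ? AND datetime_col < ?",
        (country_id, start.isoformat(" "), end.isoformat(" ")),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (countryId INTEGER, datetime_col TEXT, totalNumber INTEGER)")
conn.execute("CREATE INDEX idx_country_dt ON tbl (countryId, datetime_col)")
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?)",
    [
        (1, "2011-04-25 00:00:00", 5),
        (1, "2011-04-25 23:59:59", 7),
        (1, "2011-04-26 00:00:00", 100),  # next midnight: excluded by the '<' bound
        (2, "2011-04-25 12:00:00", 50),   # different country
    ],
)
print(day_total(conn, 1, date(2011, 4, 25)))  # 12
```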
On the other hand, if you were to try to wrap the DATETIME column within a function to make the comparison:
-- Get the sum of totalNumber of all rows based on current day
-- where countryId = 1
SELECT SUM(totalNumber) AS totalsum
FROM tbl
WHERE countryId = 1 AND
DATE(datetime_col) = CURDATE()
...it would be quite inefficient, because the DATE() function wrapping the column renders the query non-sargable: no index can be used to satisfy the condition on the DATETIME column. (A composite index on (countryId, datetime_col) can still narrow the search by countryId, but every row for that country then has to be checked.)
You can also efficiently query for the total sum of all rows in the current month:
-- Get the sum of totalNumber of all rows based on current month
-- where countryId = 1
SELECT SUM(totalNumber) AS monthsum
FROM tbl
WHERE countryId = 1 AND
datetime_col >= CAST(CONCAT(YEAR(NOW()), '-', MONTH(NOW()), '-01') AS DATETIME) AND
datetime_col < CAST(CONCAT(YEAR(NOW()), '-', MONTH(NOW()), '-01') AS DATETIME) + INTERVAL 1 MONTH
And within the current year:
-- Get the sum of totalNumber of all rows based on current year
-- where countryId = 1
SELECT SUM(totalNumber) AS yearsum
FROM tbl
WHERE countryId = 1 AND
datetime_col >= CAST(CONCAT(YEAR(NOW()), '-01-01') AS DATETIME) AND
datetime_col < CAST(CONCAT(YEAR(NOW()), '-01-01') AS DATETIME) + INTERVAL 1 YEAR
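All three queries above use the same half-open pattern, `datetime_col >= start AND datetime_col < end`. If the boundaries are computed in application code instead of with CONCAT and CAST, they can be passed in as query parameters and the column still stays bare. A minimal Python sketch of the boundary arithmetic; the helper names are hypothetical.

```python
from datetime import date, datetime, timedelta


def day_bounds(d: date) -> tuple[datetime, datetime]:
    # [midnight, next midnight)
    start = datetime(d.year, d.month, d.day)
    return start, start + timedelta(days=1)


def month_bounds(d: date) -> tuple[datetime, datetime]:
    # [first of this month, first of next month); December rolls into next year
    start = datetime(d.year, d.month, 1)
    if d.month == 12:
        return start, datetime(d.year + 1, 1, 1)
    return start, datetime(d.year, d.month + 1, 1)


def year_bounds(d: date) -> tuple[datetime, datetime]:
    # [1 January, 1 January of next year)
    return datetime(d.year, 1, 1), datetime(d.year + 1, 1, 1)
```

Using the first instant of the next period as an exclusive upper bound avoids guessing the last second (or fraction of a second) of a day, month or year.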
My argument is:
If you want fast database lookups, you need well-built queries that use indexes.
Your approach requires four indexes (which means slower inserts), whereas with a single date column you need just two. Query complexity will also increase if you ever need to search for date ranges.

Time diff calculations where date and time are in separate columns

I've got a query where I'm trying to get the duration in hours (e.g. 6.5 hours) between two different times.
In my database, time and date are held in different fields so I can efficiently query on just a startDate, or endDate as I never query specifically on time.
My query looks like this:
SELECT COUNT(*), IFNULL(SUM(TIMEDIFF(endTime,startTime)),0) FROM events WHERE user=18
Sometimes an event will go overnight, so the difference between times needs to take into account the differences between the dates as well.
I've been trying:
SELECT COUNT(*), IFNULL(SUM(TIMEDIFF(CONCAT(endDate,' ',endTime),CONCAT(startDate,' ',startTime))),0) FROM events WHERE user=18
Unfortunately I only get errors when I do this, and I can't seem to combine the two fields into a single timestamp.
Pretty sure your problem is that your concatenated values are being sent to TIMEDIFF() as strings rather than DATETIMEs. Try converting them explicitly with CAST(... AS DATETIME):
SELECT COUNT(*), IFNULL(SUM(TIMEDIFF(CAST(CONCAT(endDate,' ',endTime) AS DATETIME),CAST(CONCAT(startDate,' ',startTime) AS DATETIME))),0) FROM events WHERE user=18
I don't have a MySQL DB in front of me to test that, but I think that or some similar form of it is what you are looking for. There's an example of it in the MySQL docs involving MICROSECOND:
http://dev.mysql.com/doc/refman/5.0/en/datetime.html
Edit: Hmm... looks like TIMEDIFF is supposed to work with strings. Worth trying anyway.
TIMEDIFF(endDate,startDate) + TIMEDIFF(endTime,startTime)
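Whichever SQL form ends up working, the underlying calculation is: combine each date with its time into one timestamp, subtract, and convert the difference to hours, which handles events that cross midnight. A minimal Python sketch of that arithmetic follows (function names hypothetical). On the MySQL side, TIMESTAMP(startDate, startTime) combines a date and a time into a DATETIME, and TIMESTAMPDIFF(SECOND, start, end) / 3600 would give hours as a plain number that SUM() can add; treat that as an untested suggestion.

```python
from datetime import date, datetime, time


def duration_hours(start_date: date, start_time: time, end_date: date, end_time: time) -> float:
    # Combine each date/time pair into one datetime so overnight events work.
    start = datetime.combine(start_date, start_time)
    end = datetime.combine(end_date, end_time)
    return (end - start).total_seconds() / 3600


def total_hours(events) -> float:
    # events: iterable of (startDate, startTime, endDate, endTime) rows
    return sum(duration_hours(*row) for row in events)


print(duration_hours(date(2011, 4, 25), time(22, 0), date(2011, 4, 26), time(4, 30)))  # 6.5
```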

ASP.NET / SQL drop-down list sort order

I am trying to correct the sort order of my ASP.NET drop down list.
The problem I have is that I need to select a distinct Serial number and have these numbers organised by DateTime Desc.
However I cannot ORDER BY DateTime if using DISTINCT without selecting the DateTime field in my query.
However if I select DateTime this selects every data value associated with a single Serial number and results in duplications.
The purpose of my page is to display data for ALL serials, or data associated with one serial. When a new cycle begins (because it is a new production run) the serial reverts to 1, so I cannot simply order by serial number either.
When I use the following SQL statement, the list box is in the order I require, but after a period of time (usually a few hours) the order changes and appears to have no organised structure.
http://img7.imageshack.us/i/captureky.jpg/
I'm fairly new to ASP.NET / SQL. Does anyone know of a solution to my problem?
If you have multiple date times for each serial number, then which do you want to use for ordering? If the most recent, try this:
SELECT SerialNumber,
MAX(DateTimeField)
FROM Table
GROUP BY SerialNumber
ORDER BY 2 DESC
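To see the GROUP BY / MAX approach return one row per serial ordered by its latest timestamp, here is a small run against an in-memory SQLite database via Python's sqlite3 module (a stand-in for SQL Server; the table name Readings and the rows are hypothetical, since the answer uses the placeholder Table). Naming the aggregate and ordering by that alias is equivalent to ORDER BY 2 but easier to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Readings (SerialNumber INTEGER, DateTimeField TEXT)")
conn.executemany(
    "INSERT INTO Readings VALUES (?, ?)",
    [
        (1, "2011-01-01 08:00:00"),
        (2, "2011-01-01 09:00:00"),
        (3, "2011-01-01 12:00:00"),
        (1, "2011-01-02 10:00:00"),  # serial restarts at 1 in a new production run
    ],
)

# One row per serial, ordered by that serial's most recent timestamp.
rows = conn.execute(
    "SELECT SerialNumber, MAX(DateTimeField) AS LastSeen "
    "FROM Readings GROUP BY SerialNumber ORDER BY LastSeen DESC"
).fetchall()
print(rows)  # [(1, '2011-01-02 10:00:00'), (3, '2011-01-01 12:00:00'), (2, '2011-01-01 09:00:00')]
```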
I don't know if everybody agrees with this, but when I see a DISTINCT in a query, the first thought that goes through my mind is "This is wrong". Generally, DISTINCT is not necessary, and it is used when the person writing the query doesn't know very well what they are doing; this might be the case here, since you said you are new to SQL.
Without complete knowledge of your model it is difficult to be a hundred percent sure, but I would say you should use a GROUP BY clause instead of DISTINCT; then you can order the results correctly.

How do I pull values between two datetimes at specific interval in MySQL?

I have an application that writes temperature values to a MySQL table every second. Each row consists of the temperature and a datetime field.
I need to pull these values out of the table at specific intervals: every second, minute, hour, etc.
For example, I need to pull out the values between two datetimes and show the temperature on each hour in that range.
One option I've considered is to create a temporary table that holds a list of the time intervals generated using MySQL INTERVAL and then joining that against the main table.
I'm just wondering if there's some time and date functions in MySQL that I'm overlooking that would make this easier?
Thanks.
You could use BETWEEN for the date range, and then an extra WHERE condition using TIME() to check whether the timestamp falls exactly on the hour, i.e. its time part ends in :00:00 (for instance, 16:00:00). If it does, keep the row; if not, leave it out.
Example (untested):
SELECT temp, date
FROM temperatures
WHERE (date BETWEEN '2009/01/03 12:00:00' AND '2009/01/04 12:00:00')
AND (time(date) LIKE '%:00:00')
ORDER BY date ASC
LIMIT 10
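The LIKE '%:00:00' filter relies on a row existing at exactly HH:00:00; if the logger skips that second, the hour silently drops out. A hedged alternative is to split the range into fixed-size buckets and keep the first reading in each one, which works for any step (1, 60 or 3600 seconds). The Python sketch below shows the idea on rows already fetched with the BETWEEN range; the function name and sample data are hypothetical.

```python
from datetime import datetime


def sample_every(readings, start: datetime, end: datetime, step_seconds: int):
    """Return the first reading in each step-sized bucket of [start, end].

    readings: iterable of (timestamp, temperature), sorted by timestamp.
    Buckets are aligned to `start`, so start on a whole hour for hourly output.
    """
    first_in_bucket = {}
    for ts, temp in readings:
        if start <= ts <= end:
            bucket = int((ts - start).total_seconds()) // step_seconds
            # setdefault keeps the earliest reading seen for each bucket
            first_in_bucket.setdefault(bucket, (ts, temp))
    return [first_in_bucket[k] for k in sorted(first_in_bucket)]
```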

How to store and get datetime value in SQLite

My table contains a Birthdate field whose data type is datetime.
I want to get all records having birthday today.
How can I get it?
Try this query:
SELECT * FROM mytable
WHERE strftime('%m-%d', 'now') = strftime('%m-%d', birthday)
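The answer's query can be tried against an in-memory SQLite database from Python's sqlite3 module. The sketch below (table contents and the helper name are hypothetical) passes the reference day as a parameter instead of 'now' so the result is reproducible. Note that SQLite evaluates 'now' in UTC; strftime('%m-%d', 'now', 'localtime') uses local time instead, which matters for "today" near midnight.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, birthday TEXT)")  # ISO 'YYYY-MM-DD' text
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("Ann", "1976-12-03"), ("Bob", "1980-12-04"), ("Cy", "2001-12-03")],
)


def birthdays_on(conn, day_iso: str):
    # Same month-day comparison as the answer, with the reference day as a parameter.
    return conn.execute(
        "SELECT name FROM mytable "
        "WHERE strftime('%m-%d', ?) = strftime('%m-%d', birthday) ORDER BY name",
        (day_iso,),
    ).fetchall()


print(birthdays_on(conn, "2011-12-03"))  # [('Ann',), ('Cy',)]
```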
Having a special datetime type has always seemed like unnecessary overhead to me; integers are fast, flexible, and use less space.
For general datetime values, use Unix epoch timestamps. They are easy to work with, extremely flexible, and timezone (and even calendar!) agnostic. (I recently wrote an article on using them, which I really have to plug...)
That said, if you're only interested in dates in the Gregorian calendar, you may want to use a large integer in the format YYYYMMDD, e.g. 19761203. For your particular usage you could even create a four-digit integer like MMDD, say 1203 for 3 December. That's got to result in fast selects!
SQLite has very poor support for storing dates. You can use the method suggested by Nick D above, but bear in mind that this query will result in a full table scan, because wrapping the column in strftime() prevents SQLite from using any index on it (in fact, SQLite does not support dates as a built-in type at all; they are stored as text, real or integer values).
If you really want a fast query, you'll have to add a separate (integer) column for storing the birth day (1-31) and create an index on it.
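Putting the MMDD integer idea from the previous answer together with this answer's advice to index a dedicated column gives a lookup that compares a bare, indexed integer, so no function wraps the column. A minimal sqlite3 sketch, assuming birthdays arrive as ISO 'YYYY-MM-DD' strings; the birth_mmdd column, the index name and the helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (name TEXT, birthday TEXT, birth_mmdd INTEGER)")
conn.execute("CREATE INDEX idx_birth_mmdd ON mytable (birth_mmdd)")


def add_person(conn, name: str, birthday_iso: str) -> None:
    # birth_mmdd is derived once at insert time, e.g. '1976-12-03' -> 1203.
    mmdd = int(birthday_iso[5:7] + birthday_iso[8:10])
    conn.execute("INSERT INTO mytable VALUES (?, ?, ?)", (name, birthday_iso, mmdd))


def people_born_on(conn, month: int, day: int):
    # Plain equality on an indexed integer column: nothing wraps the column.
    return conn.execute(
        "SELECT name FROM mytable WHERE birth_mmdd = ? ORDER BY name",
        (month * 100 + day,),
    ).fetchall()


add_person(conn, "Ann", "1976-12-03")
add_person(conn, "Bob", "1980-12-04")
add_person(conn, "Cy", "2001-03-17")
print(people_born_on(conn, 12, 3))  # [('Ann',)]
```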
If you only want to compare dates, you can add a single INTEGER column that stores the date as a UTC value (but this trick won't let you search on individual date components easily).
Good luck
