How to retrieve data for a specific date from a table in a Kusto query - azure-data-explorer

I need to retrieve data for a specific date from a table.
Can anyone help me out with this?
Sample data:

Timestamp   message
2010-09-12  king
2010-09-12  queen
2010-09-13  raju
2010-09-13  Rani
2010-09-14  Ramu
2010-09-12  somu
Expected results:

Timestamp   message
2010-09-12  king
2010-09-12  queen
2010-09-12  somu
Only results for the date 2010-09-12 are required.
Thanks in advance.

If all datetime values are at the start of the day, you can use a simple equality check:
datatable(Timestamp:datetime, message:string)
[
datetime("2010-09-12") ,"king"
,datetime("2010-09-12") ,"queen"
,datetime("2010-09-13") ,"raju"
,datetime("2010-09-13") ,"Rani"
,datetime("2010-09-14") ,"Ramu"
,datetime("2010-09-12") ,"somu"
]
| where Timestamp == datetime("2010-09-12")
Timestamp             message
2010-09-12T00:00:00Z  king
2010-09-12T00:00:00Z  queen
2010-09-12T00:00:00Z  somu
If the datetime values have time-of-day parts, you'll need to check a range of dates instead:
datatable(Timestamp:datetime, message:string)
[
datetime("2010-09-12 00:00:00") ,"king"
,datetime("2010-09-12 12:34:56") ,"queen"
,datetime("2010-09-13 00:00:00") ,"raju"
,datetime("2010-09-13 15:23:02") ,"Rani"
,datetime("2010-09-14 11:11:11") ,"Ramu"
,datetime("2010-09-12 02:03:04") ,"somu"
]
| where Timestamp >= datetime("2010-09-12") and Timestamp < datetime("2010-09-13")
Timestamp             message
2010-09-12T00:00:00Z  king
2010-09-12T02:03:04Z  somu
2010-09-12T12:34:56Z  queen
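An alternative sketch (not part of the original answer) uses KQL's startofday() to normalize each value so a single equality matches the whole day, even when the timestamps carry time-of-day parts; note that filtering on a computed expression may be less efficient than the range check above on large tables:
datatable(Timestamp:datetime, message:string)
[
datetime("2010-09-12 00:00:00") ,"king"
,datetime("2010-09-12 12:34:56") ,"queen"
,datetime("2010-09-13 00:00:00") ,"raju"
,datetime("2010-09-13 15:23:02") ,"Rani"
,datetime("2010-09-14 11:11:11") ,"Ramu"
,datetime("2010-09-12 02:03:04") ,"somu"
]
// startofday() truncates each timestamp to midnight, so one equality covers the full day
| where startofday(Timestamp) == datetime("2010-09-12")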

Related

Surround Search with KQL

How can I retrieve five records that were logged (based on a given datetime column) before and after one or several records?
For reference, with Linux logs we can search for "failed login" and obtain a list of the 5 events logged before and after each failed login. The query can be phrased as follows:
$ grep -B 5 -A 5 'failed login' var/log/auth.log
Source: https://www.manageengine.com/products/eventlog/logging-guide/syslog/analyzing-syslogs-with-tools-techniques.html > search "Surround Search".
I tried the next() function, but it doesn't retrieve the entire record, only the value of a specific column.
Example:
cluster("https://help.kusto.windows.net").database("Samples").
StormEvents
| serialize
| extend NextEpisode = next(EpisodeId,5)
| extend PrevEpisode = prev(EpisodeId,5)
| extend formated_text = strcat("Current episode: ", EpisodeId, " .Next episode: ", NextEpisode, " .Prev episode: ", PrevEpisode)
| where StartTime == datetime(2007-12-13T09:02:00Z)
| where EndTime == datetime(2007-12-13T10:30:00Z)
| project-reorder formated_text, *
Use the rows_near() plugin:
cluster("https://help.kusto.windows.net").database("Samples").StormEvents
| order by StartTime asc
| evaluate rows_near(EventType == "Dense Smoke", 5)
| project StartTime, EventType
StartTime             EventType
2007-09-04T18:15:00Z  Thunderstorm Wind
2007-09-04T18:51:00Z  Thunderstorm Wind
2007-09-04T19:15:00Z  Flash Flood
2007-09-04T22:00:00Z  Dense Fog
2007-09-04T22:00:00Z  Dense Fog
2007-09-04T22:00:00Z  Dense Smoke
2007-09-04T22:00:00Z  Dense Fog
2007-09-04T22:00:00Z  Dense Fog
2007-09-05T02:00:00Z  Flash Flood
2007-09-05T04:45:00Z  Flash Flood
2007-09-05T06:00:00Z  Flash Flood
2007-10-17T15:51:00Z  Thunderstorm Wind
2007-10-17T15:55:00Z  Hail
2007-10-17T15:56:00Z  Thunderstorm Wind
2007-10-17T15:58:00Z  Hail
2007-10-17T16:00:00Z  Thunderstorm Wind
2007-10-17T16:00:00Z  Dense Smoke
2007-10-17T16:00:00Z  Thunderstorm Wind
2007-10-17T16:00:00Z  Thunderstorm Wind
2007-10-17T16:03:00Z  Funnel Cloud
2007-10-17T16:05:00Z  Thunderstorm Wind
2007-10-17T16:08:00Z  Hail
2007-11-05T06:00:00Z  Lake-Effect Snow
2007-11-05T06:00:00Z  Winter Storm
2007-11-05T07:00:00Z  Winter Storm
2007-11-05T07:00:00Z  Winter Storm
2007-11-05T07:00:00Z  Winter Storm
2007-11-05T07:00:00Z  Dense Smoke
2007-11-05T07:00:00Z  Winter Storm
2007-11-05T08:44:00Z  Hail
2007-11-05T09:57:00Z  Blizzard
2007-11-05T11:00:00Z  Strong Wind
2007-11-05T11:00:00Z  Strong Wind

Multiple orderBy in Firestore

I have a question about how multiple orderBy works.
Supposing these documents:
collection/
    doc1/
        date: yesterday at 11:00pm
        number: 1
    doc2/
        date: today at 01:00am
        number: 6
    doc3/
        date: today at 13:00pm
        number: 0
If I order by two fields like this:
.orderBy("date", "desc")
.orderBy("number", "desc")
.get()
How are those documents sorted? And, what about doing the opposite?
.orderBy("number", "desc")
.orderBy("date", "desc")
.get()
Will this result in the same order?
I'm a bit confused since I don't know if it will always end up ordering by the last orderBy.
In the documentation for orderBy() in Firebase it says this:
You can also order by multiple fields. For example, if you wanted to order by state, and within each state order by population in descending order:
Query query = cities.orderBy("state").orderBy("population", Direction.DESCENDING);
So, it is basically that, with the same logic as ORDER BY in SQL. Let's say you have a database of customers from all over the world. You can use ORDER BY Country to order them by country, in whichever direction you want. But if you add a second argument, say Customer Name, it will first order by Country and then, within each country, order by Customer Name. Example:
1. Adam | USA |
2. Jake | Germany |
3. Anna | USA |
4. Semir | Croatia |
5. Hans | Germany |
When you call orderBy("country") you will get this:
1. Semir | Croatia |
2. Jake | Germany |
3. Hans | Germany |
4. Adam | USA |
5. Anna | USA |
Then, when you add orderBy("customer name") as a second sort, you get this:
1. Semir | Croatia |
2. Hans | Germany |
3. Jake | Germany |
4. Adam | USA |
5. Anna | USA |
You can see that Hans and Jake switched places, because H comes before J, but the list is still ordered by country first. In your case, when you use this:
.orderBy("date", "desc")
.orderBy("number", "desc")
.get()
It will first order by date and then by number. But since no two of your documents share the same date value, you won't notice any difference; the same goes for the reversed query. Now let's say two of your documents had the same date, so your data looks like this:
collection/
    doc1/
        date: yesterday at 11:00pm
        number: 1
    doc2/
        date: today at 01:00am
        number: 6
    doc3/
        date: today at 01:00am
        number: 0
Now doc2 and doc3 are both dated today at 01:00am. When you order by date alone, they end up next to each other (with doc2 likely first, since Firestore falls back to the document ID for remaining ties). Adding orderBy("number") then compares the numbers within each group of equal dates. So, with an ascending orderBy("number") (no "desc"), you would get this:
orderBy("date");
// output: 1. doc1, 2. doc2, 3. doc3
orderBy("number");
// output: 1. doc1, 2. doc3, 3. doc2
Because number 0 is before 6. Just reverse it for desc.
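For completeness, a minimal sketch of the same chained ordering with the Python client (google-cloud-firestore); the collection and field names are taken from the question, and this assumes the composite index Firestore requires for ordering on two fields already exists:
from google.cloud import firestore

db = firestore.Client()

# Order by date first; "number" only breaks ties between equal dates.
docs = (
    db.collection("collection")
      .order_by("date", direction=firestore.Query.DESCENDING)
      .order_by("number", direction=firestore.Query.DESCENDING)
      .stream()
)
for doc in docs:
    print(doc.id, doc.to_dict())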

How do I find the day difference between 2 date columns in Azure App Insights?

We have a log file where we store the searches happening on our platform. There is a departure date, and I want to find the searches where the departure date is more than 330 days after today.
I am trying to run a query to find the difference between the departure date column and logTime (the time the event entered the log), but I am getting the error below:
Query could not be parsed at 'datetime("departureDate")' on line [5,54]
Token: datetime("departureDate")
Line: 5
Position: 54
The departure date format is mm/dd/yyyy, and logTime is in App Insights' usual datetime format.
The query I am running is below:
customEvents
| where name == "SearchLog"
| extend departureDate = tostring(customDimensions.departureDate)
| extend logTime = tostring(customDimensions.logTime)
| where datetime_diff('day',datetime("departureDate"),datetime("logTime")) > 200
As suggested, I ran the query below, but now I get 0 results even though there is data that satisfies the given criteria.
customEvents
| where name == "SearchLog"
| extend departureDate = tostring(customDimensions.departureDate)
| extend logTime = tostring(customDimensions.logTime)
| where datetime_diff('day',todatetime(departureDate),todatetime(logTime)) > 200
Example:

departureDate: 04/09/2020
logTime: 8/13/2019 8:45:39 AM -04:00
I also tried the query below to check whether the date format is supported, and it gave the correct response.
customEvents
| project datetime_diff('day', datetime('04/30/2020'),datetime('8/13/2019 8:25:51 AM -04:00'))
Please use the query below; it uses the todatetime() function to convert the strings to datetime:
customEvents
| where name == "SearchLog"
| extend departureDate = tostring(customDimensions.departureDate)
| extend logTime = tostring(customDimensions.logTime)
| where datetime_diff('day',todatetime(departureDate),todatetime(logTime)) > 200
The double quotes inside the datetime() calls in the where clause should be removed. Your code should look like this:
where datetime_diff('day',datetime(departureDate),datetime(logTime)) > 200
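As for the original goal (searches whose departure date falls more than 330 days after today), a sketch building on the todatetime() approach above; now() and the 330d timespan literal are standard KQL, while the event and column names are simply those from the question:
customEvents
| where name == "SearchLog"
| extend departureDate = todatetime(customDimensions.departureDate)
// now() + 330d is "today plus 330 days"; keep departures later than that
| where departureDate > now() + 330d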

Pyspark: Convert String Datetime in 12 hour Clock to Date time with 24 hour clock (Time Zone Change)

Edit: Apologies, the sample dataframe was a little off. Below is the corrected sample dataframe I'm trying to convert:
Timestamp (CST)
12/8/2018 05:23 PM
11/29/2018 10:20 PM
I tried the following code based on the recommendation below, but it returned null values:
df = df.withColumn('Timestamp (CST)_2', from_unixtime(unix_timestamp(col(('Timestamp (CST)')), "yyyy/MM/dd hh:mm:ss aa"), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))
df = df.withColumn("Timestamp (CST)_3", F.to_timestamp(F.col("Timestamp (CST)_2")))
--------------------------------------------------------------------------------
I have a field called "Timestamp (CST)" that is a string. It is in Central Standard Time.
Timestamp (CST)
2018-11-21T5:28:56 PM
2018-11-21T5:29:16 PM
How do I create a new column that takes "Timestamp (CST)", changes it to UTC, and converts it to a datetime on the 24-hour clock?
Below is my desired table; I would like the datatype to be timestamp:
Timestamp (CST)_2
2018-11-21T17:28:56.000Z
2018-11-21T17:29:16.000Z
I tried the following code but all the results came back null:
df = df.withColumn("Timestamp (CST)_2", to_timestamp("Timestamp (CST)", "yyyy/MM/dd h:mm p"))
First, import from_unixtime, unix_timestamp, and col:
from pyspark.sql.functions import from_unixtime, unix_timestamp, col
Then, reconstructing your scenario in a DataFrame df_time
>>> cols = ['Timestamp (CST)']
>>> vals = [
... ('2018-11-21T5:28:56 PM',),
... ('2018-11-21T5:29:16 PM',)]
>>> df_time = spark.createDataFrame(vals, cols)
>>> df_time.show(2, False)
+---------------------+
|Timestamp (CST) |
+---------------------+
|2018-11-21T5:28:56 PM|
|2018-11-21T5:29:16 PM|
+---------------------+
Then, my approach would be
>>> df_time_twenfour = df_time.withColumn('Timestamp (CST)', \
... from_unixtime(unix_timestamp(col(('Timestamp (CST)')), "yyyy-MM-dd'T'hh:mm:ss aa"), "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))
>>> df_time_twenfour.show(2, False)
+------------------------+
|Timestamp (CST) |
+------------------------+
|2018-11-21T17:28:56.000Z|
|2018-11-21T17:29:16.000Z|
+------------------------+
Notes:
- If you want the time in 24-hour format, use HH instead of hh.
- Since you have a PM marker, use aa in yyyy-MM-dd'T'hh:mm:ss aa to parse it.
- Your input string has a T in it, so you have to include it in the format, as above.
The option aa, as mentioned in #pyy4917's answer, might raise legacy datetime-parser errors on newer Spark versions. To fix it, replace aa with a.
The full code is below:
df_time_twenfour = df_time.withColumn(
    'Timestamp (CST)',
    from_unixtime(
        unix_timestamp(col('Timestamp (CST)'), "yyyy-MM-dd'T'hh:mm:ss a"),
        "yyyy-MM-dd'T'HH:mm:ss.SSS'Z'"))

Extract date from a string column containing timestamp in Pyspark

I have a dataframe which has a date in the following format:
+----------------------+
|date |
+----------------------+
|May 6, 2016 5:59:34 AM|
+----------------------+
I intend to extract the date from this in the format YYYY-MM-DD, so for the above value the result should be 2016-05-06.
But when I extract it using the following:
df.withColumn('part_date', from_unixtime(unix_timestamp(df.date, "MMM dd, YYYY hh:mm:ss aa"), "yyyy-MM-dd"))
I get the following date
2015-12-27
Can anyone please advise on this? I do not intend to convert my df to an RDD to use Python's datetime functions; I want to do this within the DataFrame itself.
There are some errors in your pattern: uppercase YYYY is the week-based year in Java date patterns, which is why you got 2015-12-27 (the start of week 1 of week-year 2016); use lowercase yyyy instead. Here's a suggestion:
from_pattern = 'MMM d, yyyy h:mm:ss aa'
to_pattern = 'yyyy-MM-dd'
df.withColumn('part_date', from_unixtime(unix_timestamp(df['date'], from_pattern), to_pattern)).show()
+----------------------+----------+
|date |part_date |
+----------------------+----------+
|May 6, 2016 5:59:34 AM|2016-05-06|
+----------------------+----------+
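On Spark 2.2+, a slightly more direct sketch of the same extraction uses to_timestamp and to_date, yielding a proper DateType column rather than a string:
from pyspark.sql import functions as F

df.withColumn(
    'part_date',
    # to_date truncates the parsed timestamp to a date (DateType, not string)
    F.to_date(F.to_timestamp(df['date'], 'MMM d, yyyy h:mm:ss a'))
).show()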
