I got the following dataset:
let data = datatable(Timestamp:datetime, Name:string, Value:int)
[
datetime(2022-02-18 10:00:00 AM), "AX_100A_A00", 100,
datetime(2022-02-18 10:01:00 AM), "BX_101B_B00", 200,
datetime(2022-02-18 10:02:00 AM), "CX_102C_C00", 300,
datetime(2022-02-18 10:03:00 AM), "DX_103D_D00", 400,
datetime(2022-02-18 10:04:00 AM), "EX_104E_E00", 500,
];
let mydict = dynamic(
{
"100A":"New York"
,"101B":"Geneva"
,"102C":"France"
,"103D":"US"
,"104E":"Canada"
}
);
data
| summarize result = max(Value) by Floor_Name = tostring(mydict[substring(Name, 3, 4)])
To illustrate what I am trying to achieve here: between the two underscores there is a code which represents a specific location.
My question is how to add a condition that checks whether the code between the two underscores exists as a key in the dictionary. If it doesn't exist, just display the code itself; if it does exist, display its friendly name. For example, assume a new name FX_105F_F00 is added. Since 105F is not found in the dictionary, no friendly name is available, so it should be displayed as it is. An iff() condition should presumably be added to the floor name in the query, but what should the syntax be?
A few things to note here:
It is valid to address a non-existing key within a JSON document.
The returned result in that case would be null.
The string data type doesn't support null values.
Using tostring() on a null value returns an empty string.
coalesce() works for empty strings the same way it works for null values of other data types.
So, for 105F, mydict["105F"] is null and tostring(mydict["105F"]) is an empty string, so coalesce() proceeds to the 2nd value (Floor_Code).
let data = datatable(Timestamp:datetime, Name:string, Value:int)
[
datetime(2022-02-18 10:00:00 AM), "AX_100A_A00", 100,
datetime(2022-02-18 10:01:00 AM), "BX_101B_B00", 200,
datetime(2022-02-18 10:02:00 AM), "CX_102C_C00", 300,
datetime(2022-02-18 10:03:00 AM), "DX_103D_D00", 400,
datetime(2022-02-18 10:04:00 AM), "EX_104E_E00", 500,
datetime(2022-02-18 10:05:00 AM), "FX_105F_F00", 600
];
let mydict = dynamic(
{
"100A":"New York"
,"101B":"Geneva"
,"102C":"France"
,"103D":"US"
,"104E":"Canada"
}
);
data
| extend Floor_Code = substring(Name, 3, 4)
| summarize result = max(Value) by Floor_Name = coalesce(tostring(mydict[Floor_Code]), Floor_Code)
Floor_Name    result
New York      100
Geneva        200
France        300
US            400
Canada        500
105F          600
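For reference, the same lookup-with-fallback logic can be sketched in plain Python (a minimal illustration of the coalesce() pattern, not Kusto):

```python
# Floor-code to friendly-name mapping, mirroring mydict above.
mydict = {"100A": "New York", "101B": "Geneva", "102C": "France",
          "103D": "US", "104E": "Canada"}

def floor_name(name: str, mapping: dict) -> str:
    """Extract the code between the underscores and map it to a friendly
    name, falling back to the raw code when the key is unknown."""
    code = name[3:7]  # same slice as substring(Name, 3, 4)
    return mapping.get(code, code)  # dict.get plays the role of coalesce()

print(floor_name("AX_100A_A00", mydict))  # → New York
print(floor_name("FX_105F_F00", mydict))  # → 105F
```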
Here is my dataset in Kusto. I am trying to generate a releaseRank column based on the release column value.
Input Dataset:
let T = datatable(release:string, metric:long)
[
"22.05", 20,
"22.04", 40,
"22.03", 50,
"22.01", 560
];
T
|take 100;
Desired output:
I found that Kusto has serialize and row_number():
T
|serialize
|extend releaseRank = row_number()
|take 100;
But if the release value is repeated, I need the releaseRank to be the same. For example, given the dataset below, I am not getting the desired output:
let T = datatable(release:string, metric:long)
[
"22.05", 20,
"22.05", 21,
"22.04", 40,
"22.03", 50,
"22.01", 560
];
T
|serialize
|extend releaseRank = row_number()
|take 100;
Expected output:
This should do what we want.
let T = datatable(release:string, metric:long)
[
"22.05", 20,
"22.05", 21,
"22.04", 40,
"22.03", 50,
"22.01", 560
];
T
| sort by release desc, metric asc
| extend Rank=row_rank(release)
release    metric    Rank
22.05      20        1
22.05      21        1
22.04      40        2
22.03      50        3
22.01      560       4
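As a cross-check, the dense-ranking behavior of row_rank can be sketched in plain Python (assuming the input is already sorted, as the query's sort guarantees):

```python
def dense_rank(values):
    """Assign the same rank to equal consecutive values and bump the
    rank by one for each new distinct value (input must be pre-sorted)."""
    ranks, rank, prev = [], 0, object()
    for v in values:
        if v != prev:
            rank += 1
            prev = v
        ranks.append(rank)
    return ranks

releases = ["22.05", "22.05", "22.04", "22.03", "22.01"]  # sorted desc
print(dense_rank(releases))  # → [1, 1, 2, 3, 4]
```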
Background
We have a dataset with the following format in Azure Data Explorer.
sensorid        timestamp     value
valve1          24-03-2021    123
valve1          23-03-2021    234
cylinderspeed   23-03-2021    1.2
valvestatus     23-03-2021    open
valvestatus     24-03-2021    closed
cylinderspeed   25-03-2021    2
The different sensors have different reporting intervals, some report every second, some a few times per day.
By using this query
datatable (sourcetimestamp: datetime, sensorid:string, value:dynamic)
[datetime(2021-03-23), "valve1", 123,
datetime(2021-03-24), "valve1", 234,
datetime(2021-03-23), "cylinderspeed", 1.2,
datetime(2021-03-23), "valvestatus", "open",
datetime(2021-03-24), "valvestatus", "closed",
datetime(2021-03-25), "cylinderspeed", 2]
| summarize average=any(value) by bin(sourcetimestamp, 1s), sensorid
| evaluate pivot(sensorid, any(average))
I can generate this table
timestamp     valve1    cylinderspeed    valvestatus
23-03-2021    123       1.2              open
24-03-2021    234                        closed
25-03-2021              2
The problem
How can I continue the above query so that empty cells are filled with the previous value from that column?
You can use one of the series_fill functions, such as series_fill_forward(). Note that the easiest way to get the arrays to fill is by using the make-series operator.
Since time series expect numeric values, I translated the valvestatus enum to double:
datatable (sourcetimestamp: datetime, sensorid:string, value:dynamic)
[datetime(2021-03-23), "valve1", 123,
datetime(2021-03-24), "valve1", 234,
datetime(2021-03-23), "valvestatus", "open",
datetime(2021-03-24), "valvestatus", "closed",
datetime(2021-03-23), "cylinderspeed", 1.2,
datetime(2021-03-24), "cylinderspeed", 2]
| extend value = case(value=="open", double(1), value=="closed", double(0), value)
| make-series values = any(value) default=double(null) on sourcetimestamp from(datetime(2021-03-23 00:00:00.0000000)) to(datetime(2021-03-24 00:00:00.0000000)) step 1h by sensorid
| extend values = series_fill_forward(values)
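The forward-fill semantics of series_fill_forward can be illustrated with a small Python sketch:

```python
def fill_forward(series):
    """Replace each None with the most recent non-null value,
    leaving any leading Nones untouched (no prior value to carry)."""
    filled, last = [], None
    for v in series:
        if v is not None:
            last = v
        filled.append(last)
    return filled

# e.g. hourly cylinderspeed samples with gaps
print(fill_forward([1.2, None, None, 2.0, None]))  # → [1.2, 1.2, 1.2, 2.0, 2.0]
```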
I am stuck with a Kusto query.
This is what I want to do: show the day-wise sales amount alongside the previous month's sales amount on the same day.
datatable(DateStamp:datetime, OrderId:string, SalesAmount:int)
[
"02-01-2019", "I01", 100,
"02-01-2019", "I02", 200,
"02-02-2019", "I03", 250,
"02-02-2019", "I04", 150,
"02-03-2019", "I13", 110,
"01-01-2019", "I10", 20,
"01-02-2019", "I11", 50,
"01-02-2019", "I12", 30,
]
| extend SalesDate = format_datetime(DateStamp, 'MM/dd/yyyy')
| summarize AmountOfSales = sum(SalesAmount) by SalesDate
This is what I see.
And, instead this is what I want to show as result --
I couldn't figure out how to add multiple summarize operators in one query.
Here's an option:
datatable(DateStamp:datetime, OrderId:string, SalesAmount:int)
[
"02-01-2019", "I01", 100,
"02-01-2019", "I02", 200,
"02-02-2019", "I03", 250,
"02-02-2019", "I04", 150,
"02-03-2019", "I13", 110,
"01-01-2019", "I10", 20,
"01-02-2019", "I11", 50,
"01-02-2019", "I12", 30,
]
| summarize AmountOfSales = sum(SalesAmount) by bin(DateStamp, 1d)
| as hint.materialized = true T
| extend prev_month = datetime_add("Month", -1, DateStamp)
| join kind=leftouter T on $left.prev_month == $right.DateStamp
| project SalesDate = format_datetime(DateStamp, 'MM/dd/yyyy'), AmountOfSales, AmountOfSalesPrevMonth = coalesce(AmountOfSales1, 0)
SalesDate     AmountOfSales    AmountOfSalesPrevMonth
01/01/2019    20               0
01/02/2019    80               0
02/01/2019    300              20
02/02/2019    400              80
02/03/2019    110              0
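The self-join above amounts to looking up each day's total under the same day of the previous month, with coalesce() supplying 0 when there is no match. A minimal Python sketch of that logic (using a simplified previous-month calculation that assumes the same day number exists in the previous month):

```python
from datetime import date

# daily totals, as produced by the summarize step above
daily = {date(2019, 1, 1): 20, date(2019, 1, 2): 80,
         date(2019, 2, 1): 300, date(2019, 2, 2): 400, date(2019, 2, 3): 110}

def prev_month_same_day(d: date) -> date:
    # naive rollback; assumes day d.day exists in the previous month
    return date(d.year - 1, 12, d.day) if d.month == 1 else date(d.year, d.month - 1, d.day)

# (day, sales, previous-month sales); dict.get(..., 0) mimics coalesce(..., 0)
rows = [(d, total, daily.get(prev_month_same_day(d), 0))
        for d, total in sorted(daily.items())]
print(rows[2])  # 2019-02-01: sales 300, previous month's same day had 20
```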
I'm running into an issue when trying to create a histogram of specific createdAt datetimes for orders. Even after creating timezone-aware datetimes, .weekday() returns the same day, even though the local time should fall on a different day.
The code I'm using to test this occurrence is as follows:
import datetime
import pytz
value = {
'createdAt': '2017-04-24T00:48:03+00:00'
}
created_at = datetime.datetime.strptime(value['createdAt'], '%Y-%m-%dT%H:%M:%S+00:00')
timezone = pytz.timezone('America/Los_Angeles')
created_at_naive = created_at
created_at_aware = timezone.localize(created_at_naive)
print(created_at_naive) # 2017-04-24 00:48:03
print(created_at_aware) # 2017-04-24 00:48:03-07:00
print(created_at_naive.weekday()) # 0 (Monday)
print(created_at_aware.weekday()) # 0 (should be Sunday)
The problem is that you need to actually change the datetime to the new timezone:
>>> timezone('UTC').localize(created_at)
datetime.datetime(2017, 4, 24, 0, 48, 3, tzinfo=<UTC>)
>>> timezone('UTC').localize(created_at).astimezone(timezone('America/Los_Angeles'))
datetime.datetime(2017, 4, 23, 17, 48, 3, tzinfo=<DstTzInfo 'America/Los_Angeles' PDT-1 day, 17:00:00 DST>)
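The same conversion can be done with the standard-library zoneinfo module (Python 3.9+, shown here as an alternative to pytz):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# %z parses the +00:00 offset, yielding an aware UTC datetime directly
created_at = datetime.strptime('2017-04-24T00:48:03+00:00', '%Y-%m-%dT%H:%M:%S%z')
local = created_at.astimezone(ZoneInfo('America/Los_Angeles'))
print(local)            # → 2017-04-23 17:48:03-07:00
print(local.weekday())  # → 6 (Sunday)
```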
I have a data frame df:
PRICE
2004-03-19 36.250000
2004-03-20 36.237500
2004-03-21 36.225000
2004-03-22 36.212500
etc...
The index is of type:
DatetimeIndex(['2004-03-19', '2004-03-20', '2004-03-21', ...],
dtype='datetime64[ns]', length=1691, freq='D')
I want to retrieve the PRICE at a certain day using df[datetime.date(2004,3,19)]. This is what pandas does:
KeyError: datetime.date(2004, 3, 19)
The following works, but that can't be the way it is supposed to work:
df[df.index.isin(pd.DatetimeIndex([datetime.date(2004,3,19)]))].PRICE.values[0]
The problem here is that the lookup requires an exact match, and a datetime.date object is not equal to a pandas Timestamp, so no matches occur.
You can use loc with a DatetimeIndex:
print(df.loc[pd.DatetimeIndex(['2004-3-19'])])
            PRICE
2004-03-19  36.25
Or you can use loc, convert the string '2004-3-19' with to_datetime, and take its date:
print(df.loc[pd.to_datetime('2004-3-19').date()])
PRICE    36.25
Name: 2004-03-19 00:00:00, dtype: float64
If you need the value of PRICE:
print(df.loc[pd.DatetimeIndex(['2004-3-19']), 'PRICE'])
2004-03-19    36.25
Name: PRICE, dtype: float64
print(df.loc[pd.DatetimeIndex(['2004-3-19']), 'PRICE'].values[0])
36.25
print(df.loc[pd.to_datetime('2004-3-19').date(), 'PRICE'])
36.25
But if you add a time to the datetime, the DatetimeIndex matches:
print(df.loc[pd.to_datetime('2004-3-19 00:00:00')])
PRICE    36.25
Name: 2004-03-19 00:00:00, dtype: float64
print(df.loc[pd.to_datetime('2004-3-19 00:00:00'), 'PRICE'])
36.25
Your index appears to be timestamps, whereas you are trying to equate them to datetime.date objects.
Rather than trying to retrieve the price via df[datetime.date(2004,3,19)], I would simply recommend df['2004-3-19'].
If you are intent on using datetime.date values, you should first convert the index.
df.index = [d.date() for d in df.index]