Kusto query analysis based on Key-Value telemetry data - flatten dataset - azure-data-explorer

I have a Kusto table containing telemetry data like the following:
Timestamp            Key          Value
2022-11-10 10:00:01  Position     87.3
2022-11-10 10:00:13  Temperature  10.2
2022-11-10 10:00:55  Temperature  10.4
2022-11-10 10:01:25  Position     81.3
2022-11-10 10:01:42  Temperature  12.2
2022-11-10 10:02:13  Temperature  12.8
2022-11-10 10:02:44  Position     74.3
2022-11-10 10:03:01  Temperature  18.6
2022-11-10 10:03:19  Position     87.3
2022-11-10 10:03:38  Temperature  10.6
2022-11-10 10:04:00  Temperature  10.7
2022-11-10 10:04:00  Temperature  10.1
2022-11-10 10:04:25  Position     80.3
2022-11-10 10:04:59  Temperature  12.6
I would like to perform some analysis where I calculate the average temperature in a certain area, in buckets of 5 minutes.
Therefore I would like to average all temperatures from the moment the latest position was sent until the position is updated.
I would like to have something like the following:
Timestamp            Area  Temperature
2022-11-10 10:00:00  1     10.4
2022-11-10 10:00:00  2     12.53
2022-11-10 10:00:00  3     18.6
I tried extending the table with an Area and a Temperature column based on the Key value:
Timestamp            Key          Value  Area  Temperature
2022-11-10 10:00:01  Position     87.3   1
2022-11-10 10:00:13  Temperature  10.2         10.2
2022-11-10 10:00:55  Temperature  10.4         10.4
2022-11-10 10:01:25  Position     81.3   2
2022-11-10 10:01:42  Temperature  12.2         12.2
2022-11-10 10:02:13  Temperature  12.8         12.8
2022-11-10 10:02:44  Position     74.3   3
2022-11-10 10:03:01  Temperature  18.6         18.6
2022-11-10 10:03:19  Position     87.3   1
2022-11-10 10:03:38  Temperature  10.6         10.6
2022-11-10 10:04:00  Temperature  10.7         10.7
2022-11-10 10:04:00  Temperature  10.1         10.1
2022-11-10 10:04:25  Position     80.3   2
2022-11-10 10:04:59  Temperature  12.6         12.6
I then tried to fill the null values with the previous non-null value, followed by an aggregation; however, the prev() function cannot look back to the previous non-null value.
I currently have no idea how to achieve my goal.

You can use the scan operator to analyze sequences. Here is a possible solution, but without the Area code and with the 4 distinct positions instead.
datatable (Timestamp: datetime, Key: string, Value: real) [
    datetime(2022-11-10 10:00:01), "Position", 87.3,
    datetime(2022-11-10 10:00:13), "Temperature", 10.2,
    datetime(2022-11-10 10:00:55), "Temperature", 10.4,
    datetime(2022-11-10 10:01:25), "Position", 81.3,
    datetime(2022-11-10 10:01:42), "Temperature", 12.2,
    datetime(2022-11-10 10:02:13), "Temperature", 12.8,
    datetime(2022-11-10 10:02:44), "Position", 74.3,
    datetime(2022-11-10 10:03:01), "Temperature", 18.6,
    datetime(2022-11-10 10:03:19), "Position", 87.3,
    datetime(2022-11-10 10:03:38), "Temperature", 10.6,
    datetime(2022-11-10 10:04:00), "Temperature", 10.7,
    datetime(2022-11-10 10:04:00), "Temperature", 10.1,
    datetime(2022-11-10 10:04:25), "Position", 80.3,
    datetime(2022-11-10 10:04:59), "Temperature", 12.6,
]
| sort by Timestamp asc
| scan declare (Position: real) with (
    step Pos output=none: Key == "Position";
    step Temp: Key == "Temperature" => Position = Pos.Value;
)
| summarize round(avg(Value), 2) by Position
Result:
Position  avg_Value
87.3      10.4
81.3      12.5
74.3      18.6
80.3      12.6
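To get all the way to the requested output, the same scan result can be extended with a position-to-area mapping and 5-minute buckets. The area boundaries below are made up for illustration, since the question does not define how positions map to areas:

```kusto
| sort by Timestamp asc
| scan declare (Position: real) with (
    step Pos output=none: Key == "Position";
    step Temp: Key == "Temperature" => Position = Pos.Value;
)
| extend Area = case(Position >= 85.0, 1, Position >= 78.0, 2, 3)  // hypothetical boundaries
| summarize Temperature = round(avg(Value), 2) by bin(Timestamp, 5m), Area
```

With the sample data, positions 87.3 map to area 1, 81.3 and 80.3 to area 2, and 74.3 to area 3, which reproduces the groups in the desired output.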

Related

How do I display dates on x-axis only when the date changes in ggplot?

order_dates Value
1 2022-08-27 00:00:10 80.9
2 2022-08-27 00:16:40 81.6
3 2022-08-27 00:33:28 81.2
4 2022-08-27 05:37:12 81.4
5 2022-08-28 08:52:24 89.0
6 2022-08-28 09:50:28 100.6
7 2022-08-28 12:30:08 84.9
I would like to plot this data and display all times on the x-axis, however, I'd like to display the date once for every instance the date changes. So I would have a date marker corresponding to row 1 and row 5. How can I achieve this?
library(ggplot2)
library(scales)  # for date_format()

ggplot(df, aes(order_dates, Value)) +
  geom_line() +
  scale_x_datetime(labels = date_format("%Y-%m-%d"), date_breaks = "day")
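If the aim is to keep a tick for every observation but print the date part only where the calendar day changes (rows 1 and 5 here), one option is to build the label vector manually. This is a sketch assuming the data frame and column names from the question:

```r
library(ggplot2)

# TRUE for the first observation of each calendar day
new_day <- !duplicated(as.Date(df$order_dates))

# full date-time label on day changes, time-only label otherwise
labs <- ifelse(new_day,
               format(df$order_dates, "%Y-%m-%d %H:%M"),
               format(df$order_dates, "%H:%M"))

ggplot(df, aes(order_dates, Value)) +
  geom_line() +
  scale_x_datetime(breaks = df$order_dates, labels = labs)
```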

Modify R column by creating function, code error

I wrote this function to modify a specific column of a data frame, and I want to reuse it for different columns and data frames, but the function does not work; I get an error message.
change.date <- function(df_date, col_nb, first.year, second.year){
  df_date$col_nb <- gsub(first.year, second.year, df_date$col_nb)
  df_date$col_nb <- as.Date(df_date$col_nb)
  df_date$col_nb <- as.numeric(df_date$col_nb)
}
change.date(df_2020, df_2020[1], "2020", "2020")
Error in `$<-.data.frame`(`*tmp*`, "col_nb", value = character(0)) :
  replacement table has 0 rows, replaced table has 7265
my reproducible data are:
df_2020 <- dput(test_qst)
structure(list(Date = structure(c(1588809600, 1588809600, 1588809600,
1588809600, 1588809600, 1588809600, 1588809600, 1588809600, 1588809600,
1588809600, 1588809600, 1588809600, 1588809600, 1588809600), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), Depth = c(1.72, 3.07, 3.65, 4.58,
5.39, 6.31, 7.27, 8.57, 9.73, 10.78, 11.71, 12.81, 13.79, 14.96
), salinity = c(34.7299999999999, 34.79, 34.76, 34.78, 34.77,
34.79, 34.76, 34.71, 34.78, 34.78, 34.7999999999999, 34.86, 34.7999999999999,
34.83)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-14L))
You may try
change.date <- function(df_date, col_nb, first.year, second.year){
  df_date[[col_nb]] <- gsub(first.year, second.year, df_date[[col_nb]])
  df_date[[col_nb]] <- as.Date(df_date[[col_nb]])
  df_date[[col_nb]] <- as.numeric(df_date[[col_nb]])
  df_date
}
change.date(df_2020, "Date", "2020","2020")
Date Depth salinity
<dbl> <dbl> <dbl>
1 18389 1.72 34.7
2 18389 3.07 34.8
3 18389 3.65 34.8
4 18389 4.58 34.8
5 18389 5.39 34.8
6 18389 6.31 34.8
7 18389 7.27 34.8
8 18389 8.57 34.7
9 18389 9.73 34.8
10 18389 10.8 34.8
11 18389 11.7 34.8
12 18389 12.8 34.9
13 18389 13.8 34.8
14 18389 15.0 34.8
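As a side note, the 18389 values in this output are not garbage: as.numeric() on a Date gives the number of days since 1970-01-01, and 2020-05-07 is day 18389:

```r
as.numeric(as.Date("2020-05-07"))
#> [1] 18389
```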
One issue with using gsub this way is that you lose the Date class. Unless you need a numerical timescale, it may be better to keep dates for plotting and analysis.
Using dplyr, the following extracts the years, changes them, and then rebuilds the dates (even if the year stays the same):
library(dplyr)

change.date <- function(df_date, col_nb = "Date", first.year, second.year) {
  col_nb <- which(colnames(df_date) %in% col_nb)
  df_date %>%
    mutate(year = lubridate::year(.[[col_nb]])) %>%
    mutate(year = ifelse(year == first.year, second.year, year)) %>%
    mutate(Date = lubridate::make_date(year, lubridate::month(.[[col_nb]]), lubridate::day(.[[col_nb]]))) %>%
    select(-year)
}
change.date(df_2020, "Date", 2020, 2020)
# A tibble: 14 x 3
Date Depth salinity
<date> <dbl> <dbl>
1 2020-05-07 1.72 34.7
2 2020-05-07 3.07 34.8
3 2020-05-07 3.65 34.8
4 2020-05-07 4.58 34.8
5 2020-05-07 5.39 34.8
6 2020-05-07 6.31 34.8
7 2020-05-07 7.27 34.8
8 2020-05-07 8.57 34.7
9 2020-05-07 9.73 34.8
10 2020-05-07 10.8 34.8
11 2020-05-07 11.7 34.8
12 2020-05-07 12.8 34.9
13 2020-05-07 13.8 34.8
14 2020-05-07 15.0 34.8
If you do want numeric dates, use this instead of the second-to-last line:
mutate(Date = as.numeric(lubridate::make_date(year, lubridate::month(.[[col_nb]]), lubridate::day(.[[col_nb]])))) %>%
One comment on your function: be consistent with naming style. Camel case, snake case, or (less commonly) dot case are all acceptable, but mixing them makes it harder to keep track of variables, e.g. df_date versus first.year.

Read Quarterly time series data as Dates in R

Year A B C D E F
1993-Q1 15.3 5.77 437.02 487.68 97 86.9
1993-Q2 13.5 5.74 455.2 504.5 94.7 85.4
1993-Q3 12.9 5.79 469.42 523.37 92.4 82.9
:::
2021-Q1 18.3 6.48 35680.82 29495.92 182.2 220.4
2021-Q2 7.9 6.46 36940.3 30562.03 180.4 218
Dataset1 <- read.csv('C:/Users/s/Desktop/R/intro/data/Dataset1.csv')
class(Dataset1)
[1] "data.frame"
time_series <- ts(Dataset1, start=1993, frequency = 4)
class(time_series)
[1] "mts" "ts" "matrix"
I don't know how to proceed from there to read my Year column as quarterly dates instead of numbers.
The Date class does not work well with the ts class; it is better to use year and quarter. Using the input shown reproducibly in the Note at the end, use read.csv.zoo with the yearqtr class and then convert it to ts. The strip.white argument is probably not needed, but we add it just in case.
library(zoo)
z <- read.csv.zoo("Dataset1.csv", FUN = as.yearqtr, format = "%Y-Q%q",
                  strip.white = TRUE)
tt <- as.ts(z)
tt
## A B C D E F
## 1993 Q1 15.3 5.77 437.02 487.68 97.0 86.9
## 1993 Q2 13.5 5.74 455.20 504.50 94.7 85.4
## 1993 Q3 12.9 5.79 469.42 523.37 92.4 82.9
class(tt)
## [1] "mts" "ts" "matrix"
as.integer(time(tt)) # years
## [1] 1993 1993 1993
cycle(tt) # quarters
## Qtr1 Qtr2 Qtr3
## 1993 1 2 3
as.numeric(time(tt)) # time in years
## [1] 1993.00 1993.25 1993.50
If you did want to use Date class it would be better to use a zoo (or xts) series.
zd <- aggregate(z, as.Date, c)
zd
## A B C D E F
## 1993-01-01 15.3 5.77 437.02 487.68 97.0 86.9
## 1993-04-01 13.5 5.74 455.20 504.50 94.7 85.4
## 1993-07-01 12.9 5.79 469.42 523.37 92.4 82.9
If you want a data frame or xts object then fortify.zoo(z), fortify.zoo(zd), as.xts(z) or as.xts(zd) can be used depending on which one you want.
Note
Lines <- "Year,A,B,C,D,E,F
1993-Q1,15.3,5.77,437.02,487.68,97,86.9
1993-Q2,13.5,5.74,455.2,504.5,94.7,85.4
1993-Q3,12.9,5.79,469.42,523.37,92.4,82.9
"
cat(Lines, file = "Dataset1.csv")
lubridate has a really nice year-quarter function, yq, that converts year-quarters to dates.
Dataset1 <- structure(list(
  Year = c("1993-Q1", "1993-Q2", "1993-Q3", "1993-Q4", "1994-Q1", "1994-Q2"),
  ChinaGDP = c(15.3, 13.5, 12.9, 14.1, 14.1, 13.3),
  Yuan = c(5.77, 5.74, 5.79, 5.81, 8.72, 8.7),
  totalcredit = c(437.02, 455.2, 469.42, 521.68, 363.42, 389.01),
  bankcredit = c(487.68, 504.5, 523.37, 581.83, 403.48, 431.06),
  creditpercGDP = c(97, 94.7, 92.4, 95.6, 91.9, 90),
  creditGDPratio = c(86.9, 85.4, 82.9, 85.7, 82.8, 81.2)),
  row.names = c(NA, 6L), class = "data.frame")
library(lubridate)
library(dplyr)
df_quarter <- Dataset1 %>%
  mutate(date = yq(Year)) %>%
  relocate(date, .after = Year)
df_quarter
#> Year date ChinaGDP Yuan totalcredit bankcredit creditpercGDP
#> 1 1993-Q1 1993-01-01 15.3 5.77 437.02 487.68 97.0
#> 2 1993-Q2 1993-04-01 13.5 5.74 455.20 504.50 94.7
#> 3 1993-Q3 1993-07-01 12.9 5.79 469.42 523.37 92.4
#> 4 1993-Q4 1993-10-01 14.1 5.81 521.68 581.83 95.6
#> 5 1994-Q1 1994-01-01 14.1 8.72 363.42 403.48 91.9
#> 6 1994-Q2 1994-04-01 13.3 8.70 389.01 431.06 90.0
#> creditGDPratio
#> 1 86.9
#> 2 85.4
#> 3 82.9
#> 4 85.7
#> 5 82.8
#> 6 81.2
Created on 2022-01-15 by the reprex package (v2.0.1)

Replace from NA to random values

I want to replace NA values with random values. This data frame has columns like "DayOfWeek", and I don't know how to complete it. I tried the missForest function, but I think it only works on integer columns. Do you have any idea how I can complete all of the columns?
travel <- read.csv("https://openmv.net/file/travel-times.csv")
library(missForest)
summary(travel)
set.seed(82)
travel1 <- prodNA(travel, noNA = 0.2)
travel2 <- missForest(travel1)
You can use the imputeTS package to insert random values into your time series. The na_random function can be used for this. It works on numeric columns (the other columns are left untouched, which might be useful, since you probably do not need random text in the Comments column).
You can call
library("imputeTS")
na_random(yourData)
and the function will look for the lowest and highest value of each column and insert random values between these bounds for you.
But you can also define your own bounds for the random values like this:
library("imputeTS")
na_random(yourData, lower_bound = 0, upper_bound = 25)
For your data this could look like this:
library("imputeTS")
# To read the input correctly and have the right data types
travel <- read.csv("https://openmv.net/file/travel-times.csv", na.strings = "")
travel$FuelEconomy <- as.numeric(travel$FuelEconomy)
# To perform the missing data replacement
travel <- na_random(travel)
First, if you want to read "" strings as NAs, you need the additional argument na.strings = "" in read.csv. Then, do you mean replacing an NA observation of a variable with another random observation of the same variable? If so, consider the following procedure:
travel <- read.csv("https://openmv.net/file/travel-times.csv", na.strings = "")
set.seed(82)
res <- data.frame(lapply(travel, function(x) {
  is_na <- is.na(x)
  replace(x, is_na, sample(x[!is_na], sum(is_na), replace = TRUE))
}))
res looks like this
Date StartTime DayOfWeek GoingTo Distance MaxSpeed AvgSpeed AvgMovingSpeed FuelEconomy TotalTime MovingTime Take407All Comments
1 1/6/2012 16:37 Friday Home 51.29 127.4 78.3 84.8 8.5 39.3 36.3 No Medium amount of rain
2 1/6/2012 08:20 Friday GSK 51.63 130.3 81.8 88.9 8.5 37.9 34.9 No Put snow tires on
3 1/4/2012 16:17 Wednesday Home 51.27 127.4 82.0 85.8 8.5 37.5 35.9 No Heavy rain
4 1/4/2012 07:53 Wednesday GSK 49.17 132.3 74.2 82.9 8.31 39.8 35.6 No Accident blocked 407 exit
5 1/3/2012 18:57 Tuesday Home 51.15 136.2 83.4 88.1 9.08 36.8 34.8 No Rain, rain, rain
6 1/3/2012 07:57 Tuesday GSK 51.80 135.8 84.5 88.8 8.37 36.8 35.0 No Backed up at Bronte
7 1/2/2012 17:31 Monday Home 51.37 123.2 82.9 87.3 - 37.2 35.3 No Pumped tires up: check fuel economy improved?
8 1/2/2012 07:34 Monday GSK 49.01 128.3 77.5 85.9 - 37.9 34.3 No Pumped tires up: check fuel economy improved?
9 12/23/2011 08:01 Friday GSK 52.91 130.3 80.9 88.3 8.89 39.3 36.0 No Police slowdown on 403
10 12/22/2011 17:19 Thursday Home 51.17 122.3 70.6 78.1 8.89 43.5 39.3 No Start early to run a batch

Adding mini radar plots as markers on leaflet map

I have the following dataset of weather conditions at 5 different sites, observed in 15-minute intervals over a year, and am developing a Shiny app based on it.
site_id date_time latitude longitude ambient_air_tem~ relative_humidy barometric_pres~ average_wind_sp~ particulate_den~
<chr> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 arc1046 2019-11-15 09:15:00 -37.8 145. 14.4 65.4 1007. 7.45 3.9
2 arc1048 2019-11-15 09:15:00 -37.8 145. 14.0 65.5 1006. 6.95 4.4
3 arc1045 2019-11-15 09:15:00 -37.8 145. 14.8 60 1007. 4.93 3.9
4 arc1047 2019-11-15 09:15:00 -37.8 145. 14.4 66.1 1008. 7.85 4.5
5 arc1050 2019-11-15 09:15:00 -37.8 145. 14.1 64.7 1007. 5.8 3.9
6 arc1045 2019-11-15 09:30:00 -37.8 145. 15.4 57.1 1007. 4.43 3.8
7 arc1046 2019-11-15 09:30:00 -37.8 145. 14.8 63.2 1007. 7.6 4.5
8 arc1047 2019-11-15 09:30:00 -37.8 145. 15.2 62.7 1008 7.13 3.6
9 arc1048 2019-11-15 09:30:00 -37.8 145. 14.6 62.2 1007. 7.09 4.7
10 arc1050 2019-11-15 09:30:00 -37.8 145. 14.6 62.5 1007 5.94 3.5
I mapped the 5 sites using leaflet.
leaflet(quarter_hour_readings) %>%
  addTiles() %>%
  addCircleMarkers(
    layerId = ~site_id,
    label = ~site_id)
I now want to include a radial (spider) plot on each of the markers on the map upon selecting a single date. For now I have filtered out the data values at a single date for the following radial plot.
library(fmsb)
dat <- rbind(c(85.00, 100.00, 2000.00, 160.00, 999.9, 1999.9),
             c(-40.00, 0.00, 10.00, 0.00, 0.00, 0.00),
             quarter_hour_readings %>%
               filter(date_time == as.POSIXct("2019-11-15 09:15:00", tz = "UTC")) %>%
               column_to_rownames(var = "site_id") %>%
               select(c("ambient_air_temperature", "relative_humidy", "barometric_pressure",
                        "average_wind_speed", "particulate_density_2.5", "particulate_density_10")))
radarchart(dat)
I am however unsure how to include these radial plots on the respective markers on the map, and whether there is an easier way to handle this. Although I found a package for inserting minicharts on leaflet maps, I wasn't able to find how to add radar plots to a map.
Note: since you did not provide a reproducible dataset, I use some fake data.
You can follow the approach described here:
library(leaflet)
library(fmsb)
library(shiny)  # for plotPNG()

m <- leaflet() %>% addTiles()
rand_lng <- function(n = 5) rnorm(n, -93.65, .01)
rand_lat <- function(n = 5) rnorm(n, 42.0285, .01)
rdr_dat <- structure(list(
  total = c(5, 1, 2.15031008049846, 4.15322054177523, 2.6359076872468),
  phys = c(15, 3, 12.3804132539814, 6.6208886719424, 12.4789917719968),
  psycho = c(3, 0, 0.5, NA, 3),
  social = c(5, 1, 2.82645894121379, 4.82733338139951, 2.81333662476391),
  env = c(5, 1, 5, 2.5, 4)),
  row.names = c(NA, -5L), class = "data.frame")

# Render a plot to a PNG file and encode it as a data URI usable as a marker icon
makePlotURI <- function(expr, width, height, ...) {
  pngFile <- plotPNG(function() { expr }, width = width, height = height, ...)
  on.exit(unlink(pngFile))
  base64 <- httpuv::rawToBase64(readBin(pngFile, raw(1), file.size(pngFile)))
  paste0("data:image/png;base64,", base64)
}

set.seed(1)
plots <- data.frame(lat = rand_lat(),
                    lng = rand_lng(),
                    radar = rep(makePlotURI({radarchart(rdr_dat)}, 200, 200, bg = "white"), 5))
m %>% addMarkers(icon = ~ icons(radar), data = plots)
