I'm looking for a smart way in Kusto Query Language (KQL) to reformat a table. One column (in this example the Car column) determines the names of the new columns. So I'm looking for a KQL pipe command that adds columns and reduces the number of rows by reordering the content.
E.g. I would like to format this example table:
Distance  avg. Velocity  Car
0         0              Audi
0         0              VW
0         0              Porsche
200       60             Audi
200       55             VW
200       70             Porsche
400       63             Audi
400       54             VW
400       77             Porsche
to look like this:
Distance  Audi  VW  Porsche
0         0     0   0
200       60    55  70
400       63    54  77
Is there a good (maybe one-line) KQL command to get to this result?
Background information: After an Azure Digital Twins query, joined with an Azure Data Explorer table and some piped commands, I get Table 1, but I want to plot the data as separate graphs in one Grafana diagram. Therefore I need the data as shown in Table 2. Currently I use hard-coded car names, e.g. '| where Car == "Audi"', and join the columns together... That's neither efficient nor reusable. There must be a better way!
I'm looking forward to your answers :)
Chris
The short answer to your question is to use the pivot plugin,
for example:
datatable(Distance:long, avg_Velocity:long, Car:string)
[
0, 0, "Audi",
0, 0, "VW",
0, 0, "Porsche",
200, 60, "Audi",
200, 55, "VW",
200, 70, "Porsche",
400, 63, "Audi",
400, 54, "VW",
400, 77, "Porsche",
]
| evaluate pivot(Car, avg(avg_Velocity))
Distance  Audi  Porsche  VW
0         0     0        0
200       60    70       55
400       63    77       54
However, you should be able to chart the original table in Grafana as is by specifying the series columns correctly for the specific chart that you need. I don't have Grafana around, but here is how it would look in the Azure Data Explorer Dashboard: [screenshot of the dashboard chart configuration]
I have multiple series of timepoints. Some series have five timepoints, others have ten or fifteen timepoints. The series are in multiples of five because the event I am measuring is always five timepoints long; some recordings have multiple events in succession. For instance:
Series 1:
0
77
98
125
174
Series 2:
0
69
95
117
179
201
222
246
277
293
0 marks the beginning of each series. Series 1 is a single event, but Series 2 is two events in succession. The 6th timepoint in Series 2 is the start of the second event in that series.
I have an R dataframe that contains every timepoint in one column:
dd <- data.frame(
timepoint=c(0, 77, 98, 125, 174,
0, 69, 95, 117, 179, 201, 222, 246, 277, 293)
)
I need to know the duration from the start of each event to the 4th timepoint in each event. For the above data, that means:
Duration 1: 125 - 0 = 125
Duration 2: 117 - 0 = 117
Duration 3: 277 - 201 = 76
How can I write a simple piece of R code that will tell me the duration of that interval regardless of how many series or events there are, i.e. regardless of how many numbers are in the column?
I tried using diff() and seq_along(), but that seems only useful for every nth number, which doesn't work in this case.
diff(vec[seq_along(vec) %% 4 == 1])
This is maybe one way to do it with dplyr. We break the data into "runs", which reset at each 0, and then into "sequences", which reset every 5 values.
dd %>%
  group_by(run = cumsum(timepoint == 0)) %>%
  mutate(seq = (row_number() - 1) %/% 5 + 1) %>%
  group_by(run, seq) %>%
  summarize(diff = timepoint[4] - timepoint[1])
# run seq diff
# <int> <dbl> <dbl>
# 1 1 1 125
# 2 2 1 117
# 3 2 2 76
It makes it somewhat easy to tie the value back to where it came from.
If you just wanted to use indexing, here's a helper function
diff4v1 <- function(x) {
  idx <- (seq_along(x) - 1) %% 5 + 1
  x[idx == 4] - x[idx == 1]
}
diff4v1(dd$timepoint)
# [1] 125 117 76
This is your data frame (hypothetical)
df = data.frame(series = round(rnorm(40, 100, 50)))
head(df)
series
1 16
2 35
3 75
4 125
5 190
6 85
And these are your differences
idx <- 1:nrow(df)
df[which(idx %% 5 == 4), "series"] - df[which(idx %% 5 == 1), "series"]
[1] 109 -38 -101 -47 34 -52 -63 -5
I'm trying to mutate a column and populate it with exact matches from a list if those occur, and if not, the closest match possible.
My data frame looks like this:
index <- seq(1, 10, 1)
blockID <- c(100, 120, 132, 133, 201, 207, 210, 238, 240, 256)
df <- as.data.frame(cbind(index, blockID))
index blockID
1 1 100
2 2 120
3 3 132
4 4 133
5 5 201
6 6 207
7 7 210
8 8 238
9 9 240
10 10 256
I want to mutate a new column that checks whether blockID is in a list. If yes, it should just keep the value of blockID. If not, It should return the nearest value in blocklist:
blocklist <- c(100, 120, 130, 150, 201, 205, 210, 238, 240, 256)
so the additional column should contain
100 (match),
120 (match),
130 (no match for 132--nearest value is 130),
130 (no match for 133--nearest value is 130),
201,
205 (no match for 207--nearest value is 205),
210,
238,
240,
256
Here's what I've tried:
df2 <- df %>% mutate(blockmatch = ifelse(blockID %in% blocklist, blockID, ifelse(match.closest(blockID, blocklist, tolerance = Inf), "missing")))
I just put in "missing" to complete the ifelse() statements, but it shouldn't actually be returned anywhere since the preceding cases will be fulfilled for every value of blockID. However, the resulting df2 just has "missing" in all the cells where it should have substituted the nearest number. I know there are base R alternatives to match.closest but I'm not sure that's the problem. Any ideas?
You don't need if..else. Your rule can be simplified by saying that we always take the blocklist element with the least absolute difference from blockID. If the values match, the absolute difference is 0 (which will always be the least).
With that here's a simple base R solution -
df$blockmatch <- sapply(df$blockID, function(x) blocklist[order(abs(x - blocklist))][1])
index blockID blockmatch
1 1 100 100
2 2 120 120
3 3 132 130
4 4 133 130
5 5 201 201
6 6 207 205
7 7 210 210
8 8 238 238
9 9 240 240
10 10 256 256
Here are a couple of ways with dplyr -
df %>%
rowwise() %>%
mutate(
blockmatch = blocklist[order(abs(blockID - blocklist))][1]
)
df %>%
mutate(
blockmatch = sapply(blockID, function(x) blocklist[order(abs(x - blocklist))][1])
)
Thanks to @Onyambu, here's a faster way -
df$blockmatch <- blocklist[max.col(-abs(sapply(blocklist, '-', df$blockID)))]
Given the following list of region coordinates:
x y Width Height
1 1 65 62
1 59 66 87
1 139 78 114
1 218 100 122
1 311 126 84
1 366 99 67
1 402 102 99
7 110 145 99
I wish to identify all possible rectangle combinations that can be formed by combining any two or more of the above rectangles. For instance, one rectangle could be
1 1 66 146 by combining 1 1 65 62 and 1 59 66 87
What would be the most efficient way to find all possible combinations of rectangles using this list?
Okay. I'm sorry for not being specific about the problem.
I have been working on an algorithm for object detection that identifies different windows across the image that might have the object. But sometimes, the object gets divided into several windows. So, when looking for objects, I want to use all the windows that I identified as well as try their different combinations for object detection.
So far I have tried using 2 loops, going across all the windows one by one: if the x coordinate of the window in the second loop lies within the window from the first loop, I merge the two by taking the left-most, top-most, right-most and bottom-most coordinates.
However, this has been taking a lot of time and there are a lot of duplicates present in the final output. I would like to see if there is a more efficient and easier way to do this.
I hope this information helps.
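The pairwise merge described above can be sketched in R (a rough illustration under stated assumptions, not a tuned solution; the rects data frame and merge_pair helper are hypothetical names, with the coordinates taken from the table in the question). Each pair of windows is merged into its bounding box, and exact duplicates are dropped with unique():

```r
# Hypothetical data: the region list from the question (x, y, width, height)
rects <- data.frame(
  x = c(1, 1, 1, 1, 1, 1, 1, 7),
  y = c(1, 59, 139, 218, 311, 366, 402, 110),
  w = c(65, 66, 78, 100, 126, 99, 102, 145),
  h = c(62, 87, 114, 122, 84, 67, 99, 99)
)

# Bounding box of two rectangles: from the left/top-most to the
# right/bottom-most corner, then back to width/height form
merge_pair <- function(a, b) {
  x1 <- min(a$x, b$x)
  y1 <- min(a$y, b$y)
  x2 <- max(a$x + a$w, b$x + b$w)
  y2 <- max(a$y + a$h, b$y + b$h)
  data.frame(x = x1, y = y1, w = x2 - x1, h = y2 - y1)
}

# All pairwise merges; unique() removes duplicate results
pairs <- combn(nrow(rects), 2)
merged <- do.call(rbind, lapply(seq_len(ncol(pairs)), function(k) {
  merge_pair(rects[pairs[1, k], ], rects[pairs[2, k], ])
}))
merged <- unique(merged)
```

This still enumerates all choose(n, 2) pairs; restricting merge_pair to pairs that actually overlap, and repeating until no new rectangles appear, would be one way to cover combinations of three or more windows without generating every subset.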
I'm having a problem with nested for loops and ifelse statements. This is my dataframe abund:
Species Total C1 C2 C3 C4
1 Blue 223 73 30 70 50
2 Black 221 17 50 56 98
3 Yellow 227 29 99 74 25
4 Green 236 41 97 68 30
5 Red 224 82 55 21 66
6 Orange 284 69 48 73 94
7 Black 154 9 63 20 62
8 Red 171 70 58 13 30
9 Blue 177 57 27 8 85
10 Orange 197 88 61 18 30
11 Orange 112 60 8 31 13
I would like to add together some of abund’s columns but only if they match the correct species I’ve specified in the vector colors.
colors <- c("Black", "Red", "Blue")
So, if the Species in abund matches a species in colors, then add columns C2 through C4 together into a new vector minus. If the species in abund does not match, add a 0 to minus at that position.
I'm having trouble with my code and hope it's just a small matter of defining a range, but I'm not sure. This is my code so far:
# Use for loop to create vector of sums for select species or 0 for species not selected
for( i in abund$Species)
{
for( j in colors)
{
minus <- ifelse(i == j, sum(abund[abund$Species == i,
"C2"]:abund[abund$Species == i, "C4"]), 0)
}
}
Which returns this: There were 12 warnings (use warnings() to see them)
and this "vector": minus [1] 0
This is my target:
minus
[1] 150 204 0 0 142 0 145 101 120 0 0
Thank you for your time and help with this.
This is probably better done without any loops.
# Create the vector
minus <- rep(0, nrow(abund))
# Identify the "colors" cases
inColors <- abund[["Species"]] %in% colors
# Set the values
minus[inColors] <- rowSums(abund[inColors, c("C2","C3","C4")])
Also, for what it's worth, there are quite a few problems with your original code. First, your first for loop isn't doing what you think. In each iteration, the value of i is set to the next value in abund$Species, so first it is "Blue", then "Black", then "Yellow", etc. As a result, when you index using abund[abund$Species == i, ], you may return multiple rows (e.g. "Blue" will give you rows 1 and 9, since both of those rows have Species == "Blue").
Second, when you write abund[abund$Species == i, "C2"]:abund[abund$Species == i, "C4"], you are not selecting the columns C2, C3 and C4; you are building a sequence that starts at the value in C2 and ends at the value in C4. For example, when i == "Yellow" it returns 99:25, i.e. 99, 98, 97, ..., 26, 25. The warnings you saw came from a combination of this problem and the previous one: when i == "Blue", you were trying to build a sequence starting at both 30 and 27 and ending at both 50 and 85, and the warning says that only the first start and end values were used, giving 30:50.
Finally, you were constantly overwriting your value of minus rather than filling it in element by element. You need to create minus first, as above, and index into it for the assignment, like minus[i] <- newValue.
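Putting those three fixes together, a corrected version of the loop might look like the sketch below (looping over row indices rather than species values; the data frame is rebuilt from the question, and the vectorized version above is still the preferable approach):

```r
# abund as in the question (only the columns the loop touches)
abund <- data.frame(
  Species = c("Blue", "Black", "Yellow", "Green", "Red", "Orange",
              "Black", "Red", "Blue", "Orange", "Orange"),
  C2 = c(30, 50, 99, 97, 55, 48, 63, 58, 27, 61, 8),
  C3 = c(70, 56, 74, 68, 21, 73, 20, 13, 8, 18, 31),
  C4 = c(50, 98, 25, 30, 66, 94, 62, 30, 85, 30, 13)
)
colors <- c("Black", "Red", "Blue")

minus <- numeric(nrow(abund))        # create the result vector first
for (i in seq_len(nrow(abund))) {    # loop over row indices, not species names
  if (abund$Species[i] %in% colors) {
    # sum the C2..C4 values of this row (column selection, not a `:` sequence)
    minus[i] <- sum(abund[i, c("C2", "C3", "C4")])
  }                                  # else: keep the 0 already there
}
minus
# [1] 150 204   0   0 142   0 145 101 120   0   0
```

The loop is only here to show where the original went wrong; the %in%-based answers above remain the idiomatic choice.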
Note that ifelse is vectorized so you usually don't need any for loops when using it.
I like Barker's answer best, but if you wanted to do this with ifelse this is the way:
abund$minus = with(abund, ifelse(
Species %in% colors, # if the species matches
C2 + C3 + C4, # add the columns
0 # otherwise 0
))
Even though this is just one line and Barker's is 3, on large data it will be slightly more efficient to avoid ifelse.
However, ifelse statements can be nested and are often easier to work with when conditions get complicated - so there are definitely good times to use them. On small to medium sized data the speed difference will be negligible so just use whichever you think of first.
# Create a column called minus with the length of the number of existing rows.
# The default value is zero.
abund$minus <- integer(nrow(abund))
# Perform sum of C2 to C4 only in those rows where Species is in the colors vector
abund$minus[abund$Species %in% colors] <- rowSums(abund[abund$Species %in% colors, c("C2", "C3", "C4")])
I'm trying to break down a vector of event times into episodes. An episode must meet 2 criteria. 1) It consists of 3 or more events and 2) those events have inter-event times of 25 time units or less. My data is organized in a data frame as shown below.
So far, I figured out that I can find the difference between events with diff(EventTime). By creating a logical vector that marks events meeting the 2nd inter-event criterion, I can use rle(EpisodeTimeCriterion) to get the total number and lengths of episodes.
EventTime TimeDifferenceBetweenNextEvent EpisodeTimeCriterion
25 NA NA
75 50 TRUE
100 25 TRUE
101 1 TRUE
105 4 TRUE
157 52 FALSE
158 1 TRUE
160 2 TRUE
167 7 TRUE
169 2 TRUE
170 1 TRUE
175 5 TRUE
178 3 TRUE
278 100 FALSE
302 24 TRUE
308 6 TRUE
320 12 TRUE
322 459 FALSE
However, I would like to know the timing of the episodes, and rle() doesn't give me that.
Ideally I would like to generate a data frame that looks like this:
Episode EventsPerEpisode EpisodeStartTime EpisodeEndTime
1 4 75 105
2 7 158 178
3 3 302 322
I know that this is probably a simple problem, but being new to R, the only solution I can envision is some series of loops. Is there a way of doing this without loops? Or is there a package that lends itself to this sort of analysis?
Thanks!
Edited for clarity. Added a desired outcome data frame and expanded the example data to make it clearer.
You've got the pieces you need. You just need a variable that gives each episode a number/name so you can group by it. rle(...)$lengths gives you the run lengths, so use rep to repeat each episode number that many times:
runs <- rle(df$EpisodeTimeCriterion)$lengths # You don't need this extra variable, but it makes the code more readable
df$Episode <- rep(1:length(runs), runs)
so df looks like
> head(df)
EventTime TimeDifferenceBetweenNextEvent EpisodeTimeCriterion Episode
1 25 NA NA 1
2 75 50 TRUE 2
3 100 25 TRUE 2
4 101 1 TRUE 2
5 105 4 TRUE 2
6 157 52 FALSE 3
Now use dplyr to summarize the data:
library(dplyr)
df2 <- df %>% filter(EpisodeTimeCriterion) %>% group_by(Episode) %>%
summarise(EventsPerEpisode = n(),
EpisodeStartTime = min(EventTime),
EpisodeEndTime = max(EventTime))
which returns
> df2
Source: local data frame [3 x 4]
Episode EventsPerEpisode EpisodeStartTime EpisodeEndTime
(int) (int) (dbl) (dbl)
1 2 4 75 105
2 4 7 158 178
3 6 3 302 320
If you want your episode numbers to be integers starting with one, you can clean up with
df2$Episode <- 1:nrow(df2)
Data
If someone wants to play with the data, the results of dput(df) before running the above code:
df <- structure(list(EventTime = c(25, 75, 100, 101, 105, 157, 158,
160, 167, 169, 170, 175, 178, 278, 302, 308, 320, 322), TimeDifferenceBetweenNextEvent = c(NA,
50, 25, 1, 4, 52, 1, 2, 7, 2, 1, 5, 3, 100, 24, 6, 12, 459),
EpisodeTimeCriterion = c(NA, TRUE, TRUE, TRUE, TRUE, FALSE,
TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE,
TRUE, FALSE)), .Names = c("EventTime", "TimeDifferenceBetweenNextEvent",
"EpisodeTimeCriterion"), row.names = c(NA, -18L), class = "data.frame")
Here is one approach I came up with, using a combination of cut2 from the Hmisc package and cumsum to label episodes with numbers:
library(Hmisc)
library(dplyr)
df$episodeCut <- cut2(df$TimeDifferenceBetweenNextEvent, c(26))
df$episode <- cumsum((df$episodeCut == '[ 1,26)' & lag(df$episodeCut) != '[ 1,26)') | df$episodeCut != '[ 1,26)')
Output is as follows:
EventTime TimeDifferenceBetweenNextEvent EpisodeTimeCriterion episodeCut episode
1 25 50 FALSE [26,52] 1
2 75 25 TRUE [ 1,26) 2
3 100 1 TRUE [ 1,26) 2
4 101 4 TRUE [ 1,26) 2
5 105 52 TRUE [26,52] 3
6 157 52 FALSE [26,52] 4
As you can see, it tags rows 2, 3, 4 as belonging to a single episode.
Is this what you are looking for? I'm not sure from your description, so my answer may be wrong.