Function to find the start and end of conditional selection - R

I have data that looks as follows:
Date | Time | Temperature
16995 | "12:00" | 23
16995 | "12:30" | 24
...
17499 | "23:30" | 23
17500 | "00:00" | 24
I'm writing a function to select a range of cases based on certain start and end time points. To do this I need to determine the start_pt and end_pt indices which should match with a pair of rows in the dataframe.
select_case <- function(df,date,time) {
  start_pt = 0
  end_pt = 0
  for (i in 1:nrow(df)) {
    if ((date[i] == 17000) & (time[i] == "12:00")) {
      start_pt <- i
      return(start_pt)
    } else {
      next
    }
  }
  for (i in start_pt:nrow(df)) {
    if (date[i] == 17500) {
      end_pt <- i - 1
      return(end_pt)
      break
    } else {
      next
    }
  }
  return(df[start_pt:end_pt,])
}
When I called:
test <- select_case(data,data$Date,data$Time)
test
I expected the following:
Date | Time | Temperature
17000 | "12:00" | 23
17000 | "12:30" | 24
...
17499 | "23:00" | 23
17499 | "23:30" | 23
Instead, I got:
[1] 1
I'm not sure where I went wrong. When I ran each of the two for loops separately from the R console, substituting the corresponding arguments for each loop, I got the correct indices for both start_pt and end_pt.

I tried putting each loop in a separate function, named sta(date,time) and end(date). Then I combined them in the following function:
binder <- function(date,time) {
return(sta(date,time),end(date))
}
and called:
sta_end <- binder(date,time)
I got the error:
Error in return(sta(date, time), end(date)) :
multi-argument returns are not permitted
So I combined the two results with c(), and it worked:
binder <- function(date,time) {
return(c(sta(date,time),end(date)))
}
sta_end <- binder(date,time)
[1] 1 <an index for end_pt>
So the mistake I made in my original function was using return() three times. return() exits the function immediately, so the function only ever returned the first value it reached, start_pt. I took out the first two return() calls and kept only the last one:
return(df[start_pt:end_pt,])
This worked, i got the expected result.
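For comparison, here is a sketch of the same selection without explicit loops. It assumes Date is numeric and Time is a character column, as in the sample data. which() finds the matching row indices directly. The two indices come back together in a named list, which sidesteps the one-value-per-return() limitation. The helper find_points() and the parameter names are hypothetical. Their defaults reproduce the hard-coded 17000, "12:00" and 17500 from the question.

```r
# Hypothetical helper: returns both indices at once as a named list
find_points <- function(df, start_date, start_time, end_date) {
  # First row matching the start date and time
  start_pt <- which(df$Date == start_date & df$Time == start_time)[1]
  if (is.na(start_pt)) stop("start point not found")
  # First row on the end date that comes after start_pt
  after <- seq_len(nrow(df)) > start_pt
  end_hit <- which(df$Date == end_date & after)[1]
  if (is.na(end_hit)) stop("end point not found")
  # Stop one row before the first end-date row, as in the original loop
  list(start_pt = start_pt, end_pt = end_hit - 1)
}

select_case <- function(df, start_date = 17000, start_time = "12:00",
                        end_date = 17500) {
  pts <- find_points(df, start_date, start_time, end_date)
  df[pts$start_pt:pts$end_pt, ]
}

# Small made-up data in the same shape as the question's
data <- data.frame(
  Date = c(16995, 17000, 17000, 17499, 17500),
  Time = c("12:00", "12:00", "12:30", "23:30", "00:00"),
  Temperature = c(23, 23, 24, 23, 24),
  stringsAsFactors = FALSE
)
test <- select_case(data)
test
```

Returning a list() instead of c() keeps the two indices labelled (pts$start_pt, pts$end_pt), which also works when the values are of different types.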

Related

Kusto: Apply function on multiple column values during bag_unpack

Given a dynamic field, say milestones, with a value like {"ta": 1655859586546, "tb": 1655859586646}, how do I print a table with columns "ta", "tb", etc., and a single row containing unixtime_milliseconds_todatetime(tolong(taValue)), unixtime_milliseconds_todatetime(tolong(tbValue)), etc.?
I figured I would need to write a function I can call, so I created this:
let f = view(a:string ){
unixtime_milliseconds_todatetime(tolong(a))
};
I can use this function on a normal column as: project f(columnName).
However, in this case it's a dynamic field, and the number of keys in it is large, so I don't want to enter the fields manually. This is what I have so far:
log_table
| take 1
| evaluate bag_unpack(milestones, "m_") // This gives me fields as columns
// | project-keep m_* // This would work if I just wanted the values; however, I want f(columnValue)
| project-keep f(m_*) // This of course doesn't work, but explains the idea.
Based on the mv-apply operator
// Generate data sample. Not part of the solution.
let log_table = materialize(range record_id from 1 to 10 step 1 | mv-apply range(1, 1 + rand(5), 1) on (summarize milestones = make_bag(pack_dictionary(strcat("t", make_string(to_utf8("a")[0] + toint(rand(26)))), 1600000000000 + rand(60000000000)))));
// Solution Starts here.
log_table
| mv-apply kv = milestones on
(
extend k = tostring(bag_keys(kv)[0])
| extend v = unixtime_milliseconds_todatetime(tolong(kv[k]))
| summarize milestones = make_bag(pack_dictionary(k, v))
)
| evaluate bag_unpack(milestones)
Sample output (values vary because the sample data is random): one row per record_id (1 to 10), with a datetime column for each key that occurs (ta, tb, tc, td, te, tf, tg, th, ti, tk, tl, tm, to, tp, tr, tt, tu, tw, tx, tz) holding values such as 2021-07-06T20:24:47.767Z; a cell is empty when that record has no such key.
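For readers working in R (the language of the other questions on this page), here is a sketch of the same per-key conversion. It assumes the milestones bag has already been parsed into a named R list; that list is a hypothetical stand-in for the dynamic column, not something Kusto produces. Each epoch-millisecond value is converted to a UTC date-time, and each key becomes a column, much as bag_unpack does.

```r
# Hypothetical stand-in for one parsed `milestones` bag (epoch milliseconds)
milestones <- list(ta = 1655859586546, tb = 1655859586646)

# Convert one epoch-millisecond value to a UTC date-time
ms_to_datetime <- function(ms) {
  as.POSIXct(ms / 1000, origin = "1970-01-01", tz = "UTC")
}

# Apply the conversion to every key, then lay the results out as one row:
# each list name becomes a column name
row <- as.data.frame(lapply(milestones, ms_to_datetime))
row
```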

Where condition in KQL

I am looking for help with a Kusto query:
| where test == "Jump" and Time > 2
| where test == "Run" and Time > 20
| where test == "Stand" and Time > 5
Interestingly, I am not getting an error message, but I should be getting results.
I do get results when I run the where clauses separately, but when I execute the query as a whole, I get no results. Any idea why?
Test Time
Jump 10
Run 13
Stand 15
Jump 5
Run 15
Stand 4
The query you included is the equivalent of:
...
| where test == "Jump" and Time > 2
and test == "Run" and Time > 20
and test == "Stand" and Time > 5
| ...
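In the data set you've provided, the intersection (and) of all of the conditions you included is empty, therefore no records are returned. In fact it is empty for any data, because no single row can have test equal to "Jump", "Run" and "Stand" at the same time. Taken one at a time: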
| where test == "Jump" and Time > 2: 2 matching records are (Jump, 10), (Jump, 5)
| where test == "Run" and Time > 20: there are no matching records.
| where test == "Stand" and Time > 5: 1 matching record is (Stand, 15)
I think your intention was to use the or keyword, e.g.:
...
| where (test == "Jump" and Time > 2)
or (test == "Run" and Time > 20)
or (test == "Stand" and Time > 5)
| ...
Otherwise, you'll need to clarify what the expected output is.
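The same AND-versus-OR behaviour can be checked quickly in R, using the sample table from the question. This is an R analog of the KQL, not Kusto itself. subset() plays the role of where, & plays the role of and, and | plays the role of or.

```r
# The sample table from the question
df <- data.frame(
  test = c("Jump", "Run", "Stand", "Jump", "Run", "Stand"),
  Time = c(10, 13, 15, 5, 15, 4),
  stringsAsFactors = FALSE
)

# Chained filters = one AND of every condition; no row can have test equal
# to "Jump", "Run" and "Stand" at once, so this is always empty
and_rows <- subset(df, test == "Jump" & Time > 2 &
                       test == "Run" & Time > 20 &
                       test == "Stand" & Time > 5)

# OR of the grouped conditions keeps rows matching any one group
or_rows <- subset(df, (test == "Jump" & Time > 2) |
                      (test == "Run" & Time > 20) |
                      (test == "Stand" & Time > 5))
or_rows
```

The OR version keeps (Jump, 10), (Stand, 15) and (Jump, 5), matching the per-condition breakdown above.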

Kusto query to split pie chart in half as per results

I am trying to display the results of a Kusto (KQL) query in a pie chart. The goal is to show the pie chart split half and half in two colors in case of failure, and in a single full color in case of pass.
Basically, the log from a site contains rows marked as passed or failed. When all rows pass, the pie chart should show 100% in one color. If even a single row fails, it should show 50% in one color and 50% in the other. The query below works 1) when all rows pass (full color) and 2) when some rows pass and some fail, or even just one fails (half and half). 3) But when all rows fail, it displays a single color instead of splitting the pie chart in half.
QUERY I USED:
results
| where Name contains "jobqueues"
| where timestamp > ago(1h)
| extend PASS = ErLvl > 2
| extend FAIL = ErLvl < 2
| project PASS ,FAIL
| extend status = iff(PASS==true,"PASS","FAIL")
| summarize count() by status
| extend display = iff(count_>0,1,0)
| summarize percentile(display, 50) by status
| render piechart
Please suggest what can be done to solve this problem. Thanks in advance.
Let's summarize your question:
There are only two possible outcomes of your query:
1. A pie chart showing 50% vs 50%
2. A pie chart showing 100%
From your description we learn that:
- when all rows are PASS, we plot pie chart 2;
- when any row is FAIL, we plot pie chart 1.
Let's see how we can achieve this, starting after these lines from your code:
| extend status = iff(PASS==true,"PASS","FAIL")
| summarize count() by status
We should have a table that looks like this:
status | count_
PASS | x
FAIL | y
It looks like we need to apply some logic here. You were originally plotting based on the result of the operation. My idea was to generate a table of pass = 1 and fail = 1 for the 50% vs 50% case, and another table of pass = 1 and fail = 0 for the 100% case.
Following that logic, we need to perform the following mappings:

status | count_ |         | status | count2
fail   | > 0    | maps to | fail   | 1
pass   | > 0    |         | pass   | 1

status | count_ |         | status | count2
fail   | > 0    | maps to | fail   | 1
pass   | = 0    |         | pass   | 1

status | count_ |         | status | count2
fail   | = 0    | maps to | fail   | 0
pass   | > 0    |         | pass   | 1
Logical representation (given count_ >= 0):
- fail row: if count_ > 0 then count2 = 1, else count2 = 0
- pass row: count2 is always 1
We only need to apply this to the row where status == "FAIL", but summarize doesn't produce a row for a status that has no observations, so we first make sure both counts always exist.
Guarantee summarize results:
| extend fail_count = iif(status == "FAIL", count_, 0)
| extend pass_count = iif(status == "PASS", count_, 0)
| project fail_count,pass_count
| summarize sum(fail_count), sum(pass_count)
Apply logic
| extend FAIL = iff(sum_fail_count > 0, 1, 0)
| extend PASS = 1
| project FAIL, PASS
Now our result looks like this:
PASS | FAIL
1 | 1 or 0
In order to plot this as a pie chart, we just need to transpose it so that the columns PASS and FAIL become rows of a "Status" column.
We can use a simple pack and mv-expand for this
//transpose for rendering
| extend tmp = pack("FAIL",FAIL,"PASS",PASS)
| mv-expand kind=array tmp
| project Status=tostring(tmp[0]), Count=toint(tmp[1])
| render piechart
And that's it!
Final query:
results
| where Name contains "jobqueues"
| where timestamp > ago(1h)
| extend PASS = ErLvl > 2
| extend FAIL = ErLvl < 2
| project PASS ,FAIL
| extend status = iff(PASS==true,"PASS","FAIL")
| summarize count() by status
//ensure results
| extend fail_count = iif(status == "FAIL", count_, 0)
| extend pass_count = iif(status == "PASS", count_, 0)
| project fail_count,pass_count
| summarize sum(fail_count), sum(pass_count)
//apply logic
| extend FAIL = iff(sum_fail_count > 0, 1, 0)
| extend PASS = 1
| project FAIL, PASS
//transpose for rendering
| extend Temp = pack("FAIL",FAIL,"PASS",PASS)
| mv-expand kind=array Temp
| project Status=tostring(Temp[0]), Count=toint(Temp[1])
| render piechart
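To sanity-check the mapping outside Kusto, here is a small R sketch. R is the language of the other questions on this page. pie_slices() is a hypothetical helper, not part of either the question or the answer. It reproduces the three cases in the mapping tables: the FAIL slice is 1 if any row failed and 0 otherwise, and the PASS slice is always 1.

```r
# Hypothetical helper mirroring the answer's logic: FAIL slice is 1 when at
# least one row failed (else 0); PASS slice is always 1
pie_slices <- function(status) {
  fail_count <- sum(status == "FAIL")  # 0 when no row failed
  data.frame(
    Status = c("FAIL", "PASS"),
    Count = c(if (fail_count > 0) 1L else 0L, 1L),
    stringsAsFactors = FALSE
  )
}

all_pass  <- pie_slices(c("PASS", "PASS", "PASS"))  # 0 vs 1: one full color
some_fail <- pie_slices(c("PASS", "FAIL", "PASS"))  # 1 vs 1: half and half
all_fail  <- pie_slices(c("FAIL", "FAIL"))          # 1 vs 1: half and half
```

In base R, pie(all_fail$Count, labels = all_fail$Status) would draw the half-and-half chart. The all-FAIL case now splits in half, which was the case the original query got wrong.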

Replace Value in a Column Using a Loop and Custom Function - R

I have a data.frame with a column (named "color") in which every value is "black". I have also created a function that can replace "black" with other colors depending on the value in another column (the "growth" column). I need to create a loop that uses this function to replace the values in the "color" column according to the "growth" value.
# Create a function
check_it <- function(x)
if(x>500){
return("green")
} else if(x<0) {
return("red")
} else {
return("blue")
}
# Create a loop using check_it
for(x in 1:nrow(all_data)) {
...
# Given this hint:
# You can use 1:nrow(all_data) as a set of indices
# to do something like the following inside the loop:
# all_data[i, "color"] <-
# check_it( all_data[i, "growth"] )
Any suggestions?
SAMPLE DATA
| station_id | timestamp | growth.x | growth.y | color |
--------------------------------------------------------
| DB1 | 1/14/01 | 59.916 | 59.9164 | black |
--------------------------------------------------------
| DB1 | 1/14/02 | 316.128 | 316.128 | black |
--------------------------------------------------------
| DB1 | 1/14/03 | -12.456 | -12.456 | black |
--------------------------------------------------------
| DB1 | 1/14/04 | 537.443 | 537.443 | black |
--------------------------------------------------------
Thanks for the help! Thanks to the comments, I understood that my function wouldn't work without the proper arguments (I only had "x"), and that I hadn't told it where to look for the "growth" value.
Here's the code I ended up using:
check_it <- function(x, all_data)
if(all_data[x, "growth.x"] >500){
return("green")
} else if(all_data[x, "growth.x"] <0) {
return("red")
} else {
return("blue")
}
# Create a loop using check_it
for(x in 1:nrow(all_data)) {
all_data[x, "color"] <- check_it(x, all_data)
}
Well, of course there are plenty of solutions to your problem. But since you specifically asked for a loop and provided your own function, I tried to stick as closely as possible to what you've done so far. You do, however, have two growth columns, so I used growth.y.
datf <- read.table(text="
station_id timestamp growth.x growth.y color
DB1 1/14/01 59.916 59.9164 black
DB1 1/14/02 316.128 316.128 black
DB1 1/14/03 12.456 12.456 black
DB1 1/14/04 537.443 537.443 black",
header = TRUE, stringsAsFactors = FALSE)
#I had to change your function a little:
check_it <- function(x, dat)
if(dat[x, "growth.y"] >500){
return("green")
} else if(dat[x, "growth.y"] < 0) {
return("red")
} else {
return("blue")
}
Now your loop variable x is the row index of the data.frame, and the loop steps through the rows. Before, this was not the case: you just passed a number to your function.
#And finally the loop
for(x in 1:nrow(datf)){
datf[x, "color"] <- check_it(x, datf)
}
> datf
station_id timestamp growth.x growth.y color
1 DB1 1/14/01 59.916 59.9164 blue
2 DB1 1/14/02 316.128 316.1280 blue
3 DB1 1/14/03 12.456 12.4560 blue
4 DB1 1/14/04 537.443 537.4430 green
You should, however, consider looking at the *apply family of functions.
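Following that suggestion, here is a sketch of two loop-free versions. It assumes the sample data from the question, including the negative growth value in row 3, and uses growth.y as the answer does. classify() is a hypothetical per-value variant of check_it that takes the growth value itself rather than a row index.

```r
# Sample data from the question
datf <- data.frame(
  station_id = "DB1",
  timestamp = c("1/14/01", "1/14/02", "1/14/03", "1/14/04"),
  growth.x = c(59.916, 316.128, -12.456, 537.443),
  growth.y = c(59.9164, 316.128, -12.456, 537.443),
  color = "black",
  stringsAsFactors = FALSE
)

# Hypothetical per-value classifier with the same thresholds as check_it
classify <- function(g) if (g > 500) "green" else if (g < 0) "red" else "blue"

# *apply style: vapply() calls classify() once per value and checks that
# each call returns a single character string
datf$color <- vapply(datf$growth.y, classify, character(1))

# Fully vectorised alternative with nested ifelse()
datf$color2 <- ifelse(datf$growth.y > 500, "green",
                      ifelse(datf$growth.y < 0, "red", "blue"))
datf
```

Both columns come out as blue, blue, red, green. The ifelse() version avoids calling an R function per row, which matters more as the data grows.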

Replacement and non-matches with 'sub'

Months ago I ended up with a sub statement that originally worked with my input data. It has since stopped working, causing me to re-examine my ugly process. I hate to share it, but it accomplished several things at once:
active$id[grep("CIR",active$description)] <- sub(".*CIR0*(\\d+).*","\\1",active$description[grep("CIR",active$description)],perl=TRUE)
This statement created a new id column by finding rows that had an id embedded in the description column. The sub statement would find the number following "CIR" (skipping any leading zeros) and populate the id column only if there was an id within a row's description. I recognize it is inefficient, with the embedded grep subsetting on both sides of the assignment.
Is there a way to have a 'sub' replacement be NA or empty if the pattern does not match? I feel like I'm missing something very simple but ask for the community's assistance. Thank you.
Example with the results of creating an id column:
| name | id | description |
|------+-----+-------------------|
| a | 343 | Here is CIR00343 |
| b | | Didn't have it |
| c | 123 | What is CIR0123 |
| d | | CIR lacks a digit |
| e | 452 | CIR452 is next |
I was struggling with the same issue a few weeks ago. I ended up using the str_match function from the stringr package. It returns NA if the target string is not found. Just make sure you subset the result correctly. An example:
library(stringr)
str = "Little_Red_Riding_Hood"
sub(".*(Little).*","\\1",str) # Returns 'Little'
sub(".*(Big).*","\\1",str) # Returns 'Little_Red_Riding_Hood'
str_match(str,".*(Little).*")[1,2] #Returns 'Little'
str_match(str,".*(Big).*")[1,2] # Returns NA
I think in this case you could try using ifelse(), e.g.,
active$id[grep("CIR",active$description)] <- ifelse(match, replacement, "")
where match should evaluate to TRUE if there's a match, and replacement is what that element would be replaced with in that case. Likewise, if match evaluates to FALSE, that element is replaced with an empty string (or NA if you prefer).
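Here is a concrete version of this ifelse() idea, as a sketch. It rebuilds the active data frame from the example table and uses grepl() as the match test. Unlike the line above, it fills the whole id column at once, so the grep() subsetting on both sides of the assignment is no longer needed. Rows without "CIR" followed by a digit get NA_character_.

```r
# Example data rebuilt from the table in the question
active <- data.frame(
  name = c("a", "b", "c", "d", "e"),
  description = c("Here is CIR00343", "Didn't have it", "What is CIR0123",
                  "CIR lacks a digit", "CIR452 is next"),
  stringsAsFactors = FALSE
)

# TRUE where "CIR" is followed (after optional zeros) by at least one digit
has_id <- grepl("CIR0*\\d", active$description)

# Keep the extracted number where there is a match, NA otherwise
active$id <- ifelse(has_id,
                    sub(".*CIR0*(\\d+).*", "\\1", active$description, perl = TRUE),
                    NA_character_)
active
```

ifelse() still evaluates sub() on every element, including the non-matching ones, but those results are simply discarded in favour of NA.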
