Kusto (KQL): Count of all columns where value < 0 - azure-data-explorer

Objective: count the rows in which any of the columns has a value < 0. Column values can be either positive or negative.
Example:
Table
| summarize count() by Field
| where (Col1 <0 or Col2 <0 or Col3 <0 or Col4 <0)
The result I get back is:
A | 1
B | 1
C | 0
New to Kusto - what am I doing wrong?
Thanks

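The where filter runs after summarize, and at that point only Field and the count_ column exist, so it cannot see the per-row Col1..Col4 values. Try reversing the order of the filter and the aggregation, i.e.: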
Table
| where (Col1 <0 or Col2 <0 or Col3 <0 or Col4 <0)
| summarize count() by Field
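or use the countif() aggregation function, which also keeps groups whose count is 0 (with where before summarize, those groups disappear from the result):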
Table
| summarize countif(Col1 <0 or Col2 <0 or Col3 <0 or Col4 <0) by Field
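The difference between the two queries (groups with no matching rows are dropped vs. kept with a count of 0) can be illustrated outside Kusto. The sketch below is base R, uses a hypothetical data frame `tbl` with made-up values (not the asker's data), and only mirrors the KQL logic; it is not Kusto code.

```r
# Hypothetical example data: one grouping column and four numeric columns
tbl <- data.frame(
  Field = c("A", "A", "B", "C"),
  Col1  = c(-1, 2, 3, 4),
  Col2  = c(5, 6, -7, 8),
  Col3  = c(1, 1, 1, 1),
  Col4  = c(2, 2, 2, 2),
  stringsAsFactors = FALSE
)

# TRUE for rows where at least one of Col1..Col4 is negative
neg_row <- rowSums(tbl[, c("Col1", "Col2", "Col3", "Col4")] < 0) > 0

# Like "where ... | summarize count() by Field":
# filter first, so groups with no matching rows disappear
filtered_counts <- table(tbl$Field[neg_row])

# Like "summarize countif(...) by Field":
# every group is kept, including those with a count of 0
countif_counts <- tapply(neg_row, tbl$Field, sum)

filtered_counts
countif_counts
```

With this data, the filtered version reports only A and B, while the countif-style version also reports C with 0, matching the shape of the result in the question.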

Related

Internal joins in R

How do I get this data in R for a large data set?
data <- data.frame(col1 = c("A", "A", "B", "B", "B", "", "", "", "", "", ""),
                   col2 = c(1, 2, 3, 4, 2, 5, 4, 7, 1, 2, 3),
                   col3 = c(5, 6, 7, 10, 15, 15, 10, 20, 30, 40, 50))
where
col3 = sale number.
Desired output:
From col2, select the rows that are not assigned to the col1 group. For example, col2 values 1 and 2 are assigned to A in col1, so I want the col3 sale numbers excluding those that belong to A, i.e. excluding 5 and 6.
Similarly, col2 values 3, 4, and 2 are assigned to B, so I want the col3 sale numbers excluding those that belong to B, i.e. excluding 7, 10, and 15.
Expected result:
col1 col2 col3(SUM OF SALE)
A 1 30
A 2 40
B 3 50
B 4 10
B 2 6
B 2 40
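No answer was posted here, and the expected output does not follow a single obvious rule: the `B 2 6` row draws on row 2, which is assigned to A, yet `A 2` does not pick up row 5 (assigned to B, sale 15). One reading that reproduces most of it is an inner join of each assigned (col1, col2) pair against the unassigned rows (blank col1) with the same col2. The base R sketch below is written under that assumption only; it does not produce the `B 2 6` row.

```r
# Example data from the question, with the quoting fixed
data <- data.frame(
  col1 = c("A", "A", "B", "B", "B", "", "", "", "", "", ""),
  col2 = c(1, 2, 3, 4, 2, 5, 4, 7, 1, 2, 3),
  col3 = c(5, 6, 7, 10, 15, 15, 10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# Split into rows that belong to a group and rows that do not
assigned   <- data[data$col1 != "", c("col1", "col2")]
unassigned <- data[data$col1 == "", c("col2", "col3")]

# Inner join on col2: each assigned (col1, col2) pair picks up the
# sale numbers (col3) of unassigned rows with the same col2
result <- merge(assigned, unassigned, by = "col2")
result <- result[order(result$col1, result$col2), c("col1", "col2", "col3")]
rownames(result) <- NULL
result
```

If the sale numbers should be summed per (col1, col2) pair, `aggregate(col3 ~ col1 + col2, data = result, FUN = sum)` collapses duplicates; with this data every pair is already unique.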

Keep first occurrence of rows irrespective of column of each element

Is there a function that treats the elements of a row as set and returns only the first occurrence of each unique set?
In example below, rows 1 and 3 should be considered equal. It should be irrelevant for the function foo whether an element is in col1 or col2.
df <- data.frame(col1 = c('a', 'b', '1'), col2 = c('1', '2', 'a'))
foo(df)
> col1 col2
> 1 a 1
> 2 b 2
You could do something like this:
df[!duplicated(t(apply(df,1,sort))),]
col1 col2
1 a 1
2 b 2
It sorts each row (so that a-1 and 1-a end up the same), and then selects only those rows of df that are not duplicates.
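With exactly two columns, the row-by-row `apply()` call can be replaced by the vectorized `pmin()`/`pmax()`, which put each pair into a canonical (smaller, larger) order before `duplicated()` checks it. This is a sketch assuming both columns are character vectors (not factors); for more than two columns, the `apply()` version above is the general one.

```r
df <- data.frame(col1 = c("a", "b", "1"), col2 = c("1", "2", "a"),
                 stringsAsFactors = FALSE)

# Canonical form of each row: smaller value first, larger second,
# so "a"/"1" and "1"/"a" produce the same key
key <- data.frame(lo = pmin(df$col1, df$col2), hi = pmax(df$col1, df$col2))

# Keep only the first occurrence of each unordered pair
res <- df[!duplicated(key), ]
res
```

As with `sort()`, string comparison depends on the locale, but any consistent order works here because only equality of the canonical pairs matters.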

Replace different values in one column, according to the row information in another column

I am working with genomic data stored in a data frame; its first three rows are shown below:
Chrom | POS | ID | REF | ALT | HapA | HapB |
----------------------------------------------------------
22 | 16495833 | rs116911124 | A | C | 1 | 0 |
22 | 19873357 | rs116378360 | T | A | 0 | 1 |
22 | 21416404 | rs117982183 | T | T | 0 | . |
So, I would like to replace the values "0", "1", and "." in the "HapA" and "HapB" columns according to the REF and ALT columns, for every row in the data frame. For example:
a) For the first row, I want to replace the "1" in the HapA column with the "C" from the ALT column, and the "0" in the HapB column with the "A" from the REF column.
b) For the second row, replace the "0" with the "T" from the "REF" column and the "1" with the "A" from the "ALT" column.
c) Finally, replace "." with NA.
I think this could be done with ifelse() or with data.table.
Thank you very much.
It's a bit unclear what you want exactly, since you don't specify what should happen to the 0 in the third row of the HapA column, but given what you said, this is a dplyr solution:
library(dplyr)
df <- read.table(text = "
'Chrom' 'POS' 'ID' 'REF' 'ALT' 'HapA' 'HapB'
22 16495833 'rs116911124' 'A' 'C' 1 0
22 19873357 'rs116378360' 'T' 'A' 0 1
22 21416404 'rs117982183' 'T' 'T' 0 .", header = TRUE, stringsAsFactors = FALSE)
df %>%
mutate(HapA = ifelse(HapA == 1, ALT, ifelse(HapA == 0, REF, NA)),
HapB = ifelse(HapB == 1, ALT, ifelse(HapB == 0, REF, NA)))
## Chrom POS ID REF ALT HapA HapB
## 1 22 16495833 rs116911124 A C C A
## 2 22 19873357 rs116378360 T A T A
## 3 22 21416404 rs117982183 T T T <NA>
I think if_else(), recode(), or case_when() could all work for this. Here I've used mutate_at() to apply the same function to both HapA and HapB. If a value in those columns is not 1, 0, or ".", the function returns it unchanged as a character string.
mutate_at(df, vars(HapA, HapB),
          function(x) {
            # `.` is not the data frame inside this function, so take REF/ALT
            # from df itself; compare as character so "." needs no special case
            case_when(x == "1" ~ df$ALT,
                      x == "0" ~ df$REF,
                      x == "." ~ NA_character_,
                      TRUE ~ as.character(x))
          })
There wasn't really a question, but I'm going to guess what it was:
How can I replace the values of HapA and HapB following these rules:
If "0", then replace with the value of REF.
If "1", then replace with the value of ALT.
If ".", then replace with NA.
Note that I'm also assuming HapA and HapB are character columns, since . can't be a numeric value.
If this is the right interpretation, there's no need to use fancy tricks. This is an "if-else" problem. Here's a solution using data.table, which I think is common in genomic analysis. First I'll create the example dataset:
library(data.table)
dt <- fread(
header = TRUE,
colClasses = c(
Chrom = "character",
POS = "integer",
ID = "character",
REF = "character",
ALT = "character",
HapA = "character",
HapB = "character"
),
input = "
Chrom POS ID REF ALT HapA HapB
22 16495833 'rs116911124' 'A' 'C' 1 0
22 19873357 'rs116378360' 'T' 'A' 0 1
22 21416404 'rs117982183' 'T' 'T' 0 ."
)
dt
# Chrom POS ID REF ALT HapA HapB
# 1: 22 16495833 'rs116911124' 'A' 'C' 1 0
# 2: 22 19873357 'rs116378360' 'T' 'A' 0 1
# 3: 22 21416404 'rs117982183' 'T' 'T' 0 .
That was the long part. Here's the short part.
dt[HapA == "0", HapA := REF]
dt[HapA == "1", HapA := ALT]
dt[HapA == ".", HapA := NA]
dt[HapB == "0", HapB := REF]
dt[HapB == "1", HapB := ALT]
dt[HapB == ".", HapB := NA]
dt
# Chrom POS ID REF ALT HapA HapB
# 1: 22 16495833 'rs116911124' 'A' 'C' 'C' 'A'
# 2: 22 19873357 'rs116378360' 'T' 'A' 'T' 'A'
# 3: 22 21416404 'rs117982183' 'T' 'T' 'T' NA
I strongly suggest writing this out in a simple way, like the above. It's short, has little repetition, and is easily understood at a glance. However, if you wanted to generalize this to many columns, that would require writing a lot of repetitive lines. So here's a loop version:
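# Assumes dt is freshly created, i.e. before the six replacements above
replaced_columns <- c("HapA", "HapB")  # Swap these for any number of columns
source_columns <- c("REF", "ALT")      # The i-th column fills values equal to i - 1
for (rr in replaced_columns) {
  for (source_i in seq_along(source_columns)) {
    # Rows holding "0" (filled from REF) or "1" (filled from ALT); "." is left as-is
    target_rows <- which(dt[[rr]] == source_i - 1)
    dt[
      target_rows,
      (rr) := .SD,
      .SDcols = source_columns[source_i]
    ]
  }
}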
dt
# Chrom POS ID REF ALT HapA HapB
# 1: 22 16495833 'rs116911124' 'A' 'C' 'C' 'A'
# 2: 22 19873357 'rs116378360' 'T' 'A' 'T' 'A'
# 3: 22 21416404 'rs117982183' 'T' 'T' 'T' .
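The same generalization can be sketched in base R with no packages. Like the loop above, the position of a code picks the source column, but here `match()` turns "0"/"1" into column positions and matrix indexing pulls one value per row, so "." (or any other unlisted code) becomes NA automatically. The helper name `recode_hap()` is hypothetical, and HapA/HapB are assumed to be character columns.

```r
# Example data from the question, with HapA/HapB kept as character
df <- data.frame(
  REF  = c("A", "T", "T"),
  ALT  = c("C", "A", "T"),
  HapA = c("1", "0", "0"),
  HapB = c("0", "1", "."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the position of each value in `codes` selects the
# source column; values not found in `codes` (such as ".") give NA
recode_hap <- function(hap, sources, codes = c("0", "1")) {
  idx <- match(hap, codes)             # "0" -> 1, "1" -> 2, "." -> NA
  sources[cbind(seq_along(hap), idx)]  # element [row i, column idx[i]]
}

src <- as.matrix(df[, c("REF", "ALT")])
for (col in c("HapA", "HapB")) {
  df[[col]] <- recode_hap(df[[col]], src)
}
df
```

Adding more source columns only requires extending `src` and `codes` in the same order.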

How to create new column in data.table based on presence of value in another data.table

I have two data.tables A and B:
Table A:
Col1 Col2
A    1
B    2
C    3
D    4

Table B:
Col1 Col2
A    popular
B    moderate
C    not popular
D    popular

Desired result (A with a new Col3):
Col1 Col2 Col3
A    1    popular
B    2    moderate
.    .    .
.    .    .
For each value in Col1 of A, I want to check whether it exists in Col1 of B. If it does, I want to create a third column in A from the corresponding value in Col2 of B. How can I achieve this?
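We can use an update join. Because A and B both have a Col2 column, B's copy has to be referenced as i.Col2; a bare Col2 in j refers to A's own Col2:
library(data.table)
setDT(A)
setDT(B)
A[B, Col3 := i.Col2, on = .(Col1)]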
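Without data.table, base R's `match()` gives the same lookup: it finds the position of each `A$Col1` value in `B$Col1`, and indexing `B$Col2` by those positions fills `Col3`, with NA where there is no match. This sketch uses the values from the question; the extra row `E` in A is hypothetical and only shows the no-match case.

```r
A <- data.frame(Col1 = c("A", "B", "C", "D", "E"), Col2 = 1:5,
                stringsAsFactors = FALSE)
B <- data.frame(Col1 = c("A", "B", "C", "D"),
                Col2 = c("popular", "moderate", "not popular", "popular"),
                stringsAsFactors = FALSE)

# Position of each A$Col1 value in B$Col1 (NA when absent)
pos <- match(A$Col1, B$Col1)

# Look up B's Col2 at those positions; unmatched rows get NA
A$Col3 <- B$Col2[pos]
A
```

If `B$Col1` contains duplicates, `match()` uses the first occurrence.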

Checking if a value is numerical in R

I have two dataframes, df1 and df2.
df1:
col1 <- c('30','30','30','30')
col2 <- c(3,13,18,41)
col3 <- c("heavy","light","blue","black")
df1 <- data.frame(col1,col2,col3)
>df1
col1 col2 col3
1 30 3 heavy
2 30 13 light
3 30 18 blue
4 30 41 black
df2:
col1 <- c('10',"NONE")
col2 <- c(21,"NONE")
col3 <- c("blue","NONE")
df2 <- data.frame(col1,col2,col3)
>df2
col1 col2 col3
1 10 21 blue
2 NONE NONE NONE
I wrote a short script that does the following: if a value in col3 equals "light", remove that row and all subsequent rows from the data frame. So df1 would look like:
col1 col2 col3
1 30 3 heavy
And there would be no changes to df2 (as it has no matches to "light" in col3).
The two data frames above are separate examples; the script below refers to a generic "df" so I don't have to copy and paste the same code twice with df1 replaced by df2.
phrase=c("light")
start_rownum=which(grepl(phrase, df[,3]))
end_rownum=nrow(df)
end_rownum=as.numeric(end_rownum)
if(start_rownum > 0){
df=df[-c(start_rownum:end_rownum),]
}
This script works fine with df1, as the start_rownum has a numerical value. However, I get the following error with df2:
Error in start_rownum:end_rownum : argument of length 0
Instead of saying "if(start_rownum > 0)", is there some way to check if start_rownum has a numerical value? I can't find a working solution.
Thanks.
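For anyone who has a similar problem, I solved it by changing the condition to:
if (length(start_rownum) > 0 && is.numeric(start_rownum))
The length check is what does the work: which() always returns an integer vector, so is.numeric() is TRUE even when nothing matches, but in that case the vector is empty (length 0).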
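A sketch of the whole script as a reusable function, using `match()` instead of `which(grepl(...))`. Two behaviours differ from the original and are assumptions worth checking: `match()` looks for an exact value, so "lightblue" would not count as "light" (the regular expression in `grepl()` would match it), and only the first match is used, so several matching rows cannot produce a condition longer than 1 inside `if()`. The function name `drop_from_phrase()` is hypothetical.

```r
# Hypothetical helper: drop the first row whose column `col` equals `phrase`,
# together with every row after it; return df unchanged if nothing matches
drop_from_phrase <- function(df, phrase = "light", col = 3) {
  start <- match(phrase, df[[col]])  # first matching row, or NA
  if (is.na(start)) {
    return(df)
  }
  df[seq_len(start - 1), , drop = FALSE]
}

df1 <- data.frame(col1 = c("30", "30", "30", "30"), col2 = c(3, 13, 18, 41),
                  col3 = c("heavy", "light", "blue", "black"))
df2 <- data.frame(col1 = c("10", "NONE"), col2 = c("21", "NONE"),
                  col3 = c("blue", "NONE"))

drop_from_phrase(df1)  # keeps only the "heavy" row
drop_from_phrase(df2)  # no "light" in col3, so returned unchanged
```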
