How to generate a JSON field based on results of a Kusto query - azure-data-explorer

Say we have a table like this:
datatable(one:string, two:string)
[
"A", "textA",
"A", "textB",
"A", "textC",
"B", "textA1",
"B", "textB1",
"B", "textC1",
]
| summarize by one
Alongside each distinct value of one, we need a JSON column that aggregates all the corresponding values of two. In this case, we would get:
A, "textA,textB,textC"
B, "textA1,textB1,textC1"
I know how to pack the fields of one row into a new column, but I have no idea how to gather the values from the different rows being summarized into a single value.

Use make_list():
datatable(one:string, two:string)
[
"A", "textA",
"A", "textB",
"A", "textC",
"B", "textA1",
"B", "textB1",
"B", "textC1",
]
| summarize make_list(two) by one
one  list_two
A    ["textA","textB","textC"]
B    ["textA1","textB1","textC1"]
Add strcat_array() if you want to convert the result to a string.
datatable(one:string, two:string)
[
"A", "textA",
"A", "textB",
"A", "textC",
"B", "textA1",
"B", "textB1",
"B", "textC1",
]
| summarize strcat_array(make_list(two), ",") by one
one  Column1
A    textA,textB,textC
B    textA1,textB1,textC1

Related

Finding values in column "a" that have different values in column "b" across two data sets

The data contains multiple columns and 3000 rows.
The two data frames share the same OrderNo values, but some have a different Ordertype.
I want to get every OrderNo whose Ordertype differs between the two data frames.
I have isolated the two columns from the two data frames and sorted them in ascending order. Then I tried to use cbind to combine the two columns and find the missing values in one of them.
xxx <- data.frame( orderNo = c(1:10), Ordertype = c("a", "b", "c", "d", "a", "b", "c", "d", "e", "f"))
yyy <- data.frame( orderNo = c(1:10), Ordertype = c("a", "b", "c", "d", "a", "b", "e", "d", "e", "f"))
In the above example, OrderNo 7 corresponds to "c" in one data frame and "e" in the other. As my output, I want the set of all such numbers that have a different value in the Ordertype column.
It sounds like you want a data frame that contains differences between two data frames, matched by (and including) orderNo. Is that correct?
One possibility is:
res <- merge(xxx, yyy, by = "orderNo")
res[res[,2] != res[,3], ]
orderNo Ordertype.x Ordertype.y
7 7 c e
Using dplyr and anti_join you can do the following to find differences:
library(dplyr)
inner_join(anti_join(xxx, yyy), anti_join(yyy, xxx), by='orderNo')
orderNo Ordertype.x Ordertype.y
1 7 c e
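If only the order numbers themselves are needed, as the question asks, the merge approach can be reduced to a vector. This is a minimal base R sketch, assuming orderNo is unique within each data frame; the names both and changed are made up for illustration, and stringsAsFactors = FALSE is set so that the comparison works on character values in any R version.

```r
# Example data from the question, with Ordertype kept as character
xxx <- data.frame(orderNo = 1:10,
                  Ordertype = c("a", "b", "c", "d", "a", "b", "c", "d", "e", "f"),
                  stringsAsFactors = FALSE)
yyy <- data.frame(orderNo = 1:10,
                  Ordertype = c("a", "b", "c", "d", "a", "b", "e", "d", "e", "f"),
                  stringsAsFactors = FALSE)

# Line the two data frames up by order number
both <- merge(xxx, yyy, by = "orderNo")

# Keep only the order numbers whose types disagree
changed <- both$orderNo[both$Ordertype.x != both$Ordertype.y]
changed
```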

Count number of observations with elements in the same order

I am trying to pre-process some data in order to build a Sunburst plot in R. In short, I need to count how many observations have their elements in the same order.
The elements of each observation are character strings. The order does matter.
mylist <- list(c("a", "b", "c"),
c("x", "y"),
c("b", "c", "a"),
c("a", "b", "c"))
Desired output would be something like:
"a-b-c" = 2
"x-y" = 1
"b-c-a" = 1
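One possible approach, sketched here in base R, is to collapse each observation into a single string key that preserves the element order and then tabulate the keys. The names keys and counts are illustrative, and the sketch assumes that no element contains the "-" separator itself.

```r
mylist <- list(c("a", "b", "c"),
               c("x", "y"),
               c("b", "c", "a"),
               c("a", "b", "c"))

# Collapse each observation into one order-preserving key, e.g. "a-b-c"
keys <- vapply(mylist, paste, character(1), collapse = "-")

# Count how many observations share each key
counts <- table(keys)
counts
```

Because the key is built from the elements in their original order, c("a", "b", "c") and c("b", "c", "a") are counted separately.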

R - combining columns by specific conditions

I currently have a data frame as follows:
groups <- data.frame(name=paste("person",c(1:27),sep=""),
assignment1 = c("F","A","B","H", "A", "E", "D", "G", "I", "I", "E", "A", "D", "C", "F", "C", "D", "H", "F", "H", "G", "I", "G", "C", "B", "E", "B"),
assignment2 = c("H", "F", "F", "D", "E", "G", "A", "E", "I", "C", "A", "H", "G", "B", "I", "C", "E", "I", "C", "A", "B", "B", "G", "D", "H", "F", "D"),stringsAsFactors = FALSE)
I would like to create a list for each person that contains only the people he or she has already worked with. For example, person1 is in group F for the 1st assignment and in group H for the 2nd.
The members of group F on the 1st assignment are {"person1", "person15", "person19"}.
The members of group H on the 2nd assignment are {"person1", "person12", "person25"}.
I would like to create a vector for person1 like
{"person15", "person19", "person12", "person25"}.
Does anyone know a convenient way to do this in R?
Any help will be appreciated. Thanks in advance.
You could do this:
teammates <- lapply(1:nrow(groups), function(i) {
assig1 <- subset(groups, assignment1 == groups$assignment1[i])$name
assig2 <- subset(groups, assignment2 == groups$assignment2[i])$name
unq_set <- unique(c(assig1, assig2))
return(setdiff(unq_set, groups$name[i]))
})
This takes a vector of row indices and, for each one, applies a function that a) gets the names of the people whose assignment 1 or assignment 2 group matches the given row, b) takes the unique union of these, and c) returns that set minus the name of the person the group is built around.
The output is a list like this:
[[1]]
[1] "person15" "person19" "person12" "person25"
[[2]]
[1] "person5" "person12" "person3" "person26"
[[3]]
[1] "person25" "person27" "person2" "person26"
...and so on
For brevity, the following is equivalent (though the order inside the list items may differ). It uses the same subsetting logic as @user5219763's answer, but the setdiff part is important because it removes the person from his or her own list.
teammates <- lapply(1:nrow(groups), function(i) {
setdiff(
with(groups, name[assignment1 == assignment1[i] |
assignment2 == assignment2[i] ]),
groups$name[i])
})
Here's a solution using dplyr and tidyr:
library(dplyr)
library(tidyr)
groups %>%
gather(var, val, -name) %>%
unite(comb, var, val) %>%
left_join(.,., by = 'comb') %>%
group_by(name.x) %>%
summarise(out = list(name.y))
The heavy lifting is done by the left_join. Before that, we combine the columns so that we can merge on keys such as assignment1_F. Each person's list includes that person and is not de-duplicated; that is up to you.
However, as @akrun says, if you are doing a lot of this kind of thing, use igraph.
You could use is.element()
workedWith <- function(index,data=groups){
data[is.element(data[,2],data[index,2]) | is.element(data[,3],data[index,3]),1]
}
lapply(X = seq_len(nrow(groups)), FUN = workedWith)

Reorder / arrange bars in a plot(table) while keeping value names

I would like to plot the result of table in a decreasing order, but if I sort the table before plotting it the plot does not show the value names anymore.
a <- data.frame(var = c("A", "A", "B", "B", "B", "B", "B", "C", "D", "D", "D"))
plot(table(a))
plot(sort(table(a)))
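We get the counts with table ('tbl'), compute the ordering, and assign the reordered counts back into 'tbl' with tbl[] so that it keeps its table structure. Because tbl[] <- replaces only the values, the labels have to be reordered with the same permutation; otherwise each bar keeps its original label. In the OP's code, sort() apparently does not return a table, which is why the value names disappear.
tbl <- table(a)
ord <- order(tbl)                               # permutation that sorts the counts
tbl[] <- tbl[ord]                               # reorder the counts, keep the table structure
dimnames(tbl)[[1]] <- dimnames(tbl)[[1]][ord]   # reorder the labels to match
plot(tbl)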

NAs in the dummies package

I am using the dummy.data.frame function from the R dummies package to create dummy variables for the k levels of my factor. Unfortunately, my factor has NAs. When I use dummy.data.frame, it creates k dummies with no NAs and an extra dummy that flags the missing values with 1.
However, I would like to still have the NAs in the k dummies and not a dummy for the missing values.
Is this possible with that function? Do you know any other functions that can help me?
I usually do this kind of thing with model.matrix(). Using it with the option na.action set to "na.pass" retains the NAs in their correct places. This option does not seem to change the behavior of the function dummy(), so model.matrix() might be your easiest bet. For example, for a single factor letters, the following should do the trick:
options(na.action="na.pass")
letters <- c( "a", "a", "b", "c", "d", "e", "f", "g", "h", "b", "b", NA )
model.matrix(~letters-1)
Or for several variables or columns of a data frame as well:
letters <- c( "a", "a", "b", "c", "d", "e", "f", "g", "h", "b", "b", NA )
betters <- c( "a", "a", "c", "c", "c", "d", "d", "d", NA, "e", "e", "e" )
model.matrix(~letters+betters-1)
The important trick here really is to set the option na.action. After this dummy recoding, it is a good idea to return the option to its default value:
options(na.action="na.omit")
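Rather than hard-coding "na.omit" when resetting, one option is to save whatever value was in effect and put it back afterwards, since options() returns the previous settings when it changes them. The sketch below assumes a small made-up factor x; the names old and mm are illustrative.

```r
x <- c("a", "a", "b", NA)

# options() returns the previous values of the options it changes
old <- options(na.action = "na.pass")

# The row with the missing value is kept, with NA in every dummy column
mm <- model.matrix(~ x - 1)

# Restore whatever na.action was set before
options(old)

mm
```

This way the session ends up with its original setting even if that setting was not the default.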
