Group by a column but concat another column with comma delimited - azure-data-explorer

I have a table with company_name and RegistrationId columns.
How can I write a query that groups by company_name but concatenates all values of RegistrationId into a single comma-separated string column (say AllIDs), like 123,456,789?

This can be expressed as:
T
| summarize AllIds=make_list(RegistrationId) by company_name
You can use the make_set() function instead to build a set of unique IDs (without repetitions).
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/makelist-aggfunction
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/makeset-aggfunction
If you later need to format the array as a string, use the strcat_array() function, for example strcat_array(AllIds, ","):
https://learn.microsoft.com/en-us/azure/data-explorer/kusto/query/strcat-arrayfunction

Related

How to name a new dataframe based on input character value

In R, I am trying to get input from a user to create the name of a new data frame, e.g.
number <- readline(prompt = "what is your number:")
This creates a character string with one entry, e.g. number: "4"
Now I want to create a data frame named after the input character, and subset some other information from another table based on that number, for example:
number_4 <- subset(df, df$NO=="4")
As I might be doing hundreds of these, I do not want to name each data frame manually. Is there a way to use the character to name a data frame?
We can use the assign() function:
assign(paste0("number_", number), subset(df, NO == number))

Concatenating strings to get object name

a <- "name"
df$a
Here, df is my data frame, and name is one of the column names of df. How can I tell R to use the value stored in a as the column name, instead of treating a as a literal name?
First extract the column you want to work with, then convert it to the object type you need. Because a holds the column name as a string, use [[ rather than $ to extract it.
Example:
A = factor(df[[a]])
A = as.vector(df[[a]])
1 - You can only concatenate vectors.
2 - The single letter "a" is not a descriptive object name. Use another name, for example: Work1

How to use pandas Grouper on multiple keys?

I need to groupby-transform a dataframe by a datetime column AND another str (object) column, to apply a function per group and assign the result to each row of the group. I understand the groupby workflow but cannot make a pandas.Grouper for both conditions at the same time. Thus:
How to use pandas.Grouper on multiple columns?
Use the DataFrame.groupby with a list of pandas.Grouper as the by argument like this:
df['result'] = df.groupby([
    pd.Grouper('dt', freq='D'),
    pd.Grouper('other_column')
]).transform(foo)
If your second column is a non-datetime series, you can group it with a date-time column like this:
df['res'] = df.groupby([
    pd.Grouper('dt', freq='D'),
    'other_column'
]).transform(foo)
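Note that in this case you don't have to use pd.Grouper for the second column, because it is a string column and not a datetime column: a plain column name is enough. pd.Grouper is only needed when you want the freq argument, and freq only works with datetime-like columns. Also note that transform on the whole grouped frame returns a DataFrame, so if more than one non-key column remains, select the column to transform before calling transform.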
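As a minimal, self-contained sketch of the pattern above: the sample data, the value column, and the group_mean column are hypothetical names chosen for illustration, and "mean" stands in for the unspecified foo. The first variant passes the string column by name; the second wraps it in pd.Grouper without freq, which simply groups by the column's values.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "dt": pd.to_datetime([
        "2023-01-01 08:00", "2023-01-01 09:00",
        "2023-01-01 17:00", "2023-01-02 10:00",
    ]),
    "other_column": ["a", "b", "a", "a"],
    "value": [1.0, 5.0, 3.0, 7.0],
})

# Group by calendar day of 'dt' and by the string column, then
# broadcast each group's mean back onto the rows of that group.
grouped = df.groupby([pd.Grouper(key="dt", freq="D"), "other_column"])
df["group_mean"] = grouped["value"].transform("mean")

# Same grouping, with the string column wrapped in pd.Grouper (no freq).
same = df.groupby(
    [pd.Grouper(key="dt", freq="D"), pd.Grouper(key="other_column")]
)["value"].transform("mean")
```

Selecting the value column before transform keeps the result a single Series, so it can be assigned to one new column.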

How to avoid reading data from a dataframe when the passed column name do not match exactly?

I recently discovered that R returns data for a column name that does not exist exactly as passed, as long as the data frame has a column whose name starts with what was passed.
So if you have a data frame X with columns named, say, fruits and vegetables, and you retrieve data as X$fruit, it will give you the fruits column even though the passed name (fruit) does not match the data frame column name (fruits). It fails if there are also column names like fruitss, I believe because R then cannot decide whether X$fruit refers to fruits or fruitss.
How to avoid this?
The $ operator does partial matching, which can cause confusion when column names share a prefix, so it is better to use [[ or [ to extract columns: they match the entire string and not a partial one.
X[["fruit"]]
Or
X[, "fruit"]

count occurrences in pipe delimited string in dataframe

I have a Names column in my dataframe as follows:
Names
steve|chris|jeff
melissa|jo|john
chris|susan|redi
john|fiona|bart
jo|chris|fiona
The entries are pipe delimited. Is there a way to count the occurrences of the names in this column? For example, Chris occurs 3 times. Using a package like "plyr" works when there are only single entries in the column, but not sure about entries that are combined like above.
