How to get count-of-a-count with dplyr?


Let's say we have the data frame
df <- data.frame(x = c("a", "a", "b", "a", "c"))
Using dplyr count, we get
df %>% count(x)
x n
1 a 3
2 b 1
3 c 1
I now want to do a count on the resulting n column. If the n column were named m, the result I'm looking for is
m n
1 1 2
2 3 1
How can this be done with dplyr?
Thank you very much!

dplyr seems to have trouble with count(n).
For instance:
d <- data.frame(n = sample(1:2, 10, TRUE), x = 1:10)
d %>% count(n)
A workaround is to rename n:
df %>% # using data defined in question
count(x) %>%
rename(m = n) %>%
count(m)
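The rename step can also be folded into the first count. The following is a minimal sketch, assuming dplyr 0.8.0 or later, where count() accepts a name argument for the output column:

```r
library(dplyr)

df <- data.frame(x = c("a", "a", "b", "a", "c"))

# Name the first count column `m` directly, then count its values.
result <- df %>%
  count(x, name = "m") %>%
  count(m)

result
#   m n
# 1 1 2
# 2 3 1
```

This avoids ever having a column called n in the input to the second count().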

EDIT: I was wrong. Didn't have the newest version of dplyr so I didn't have the count function.
With dplyr, one way to count is with n(). In your example, you would do the following to obtain the first counts:
df <- data.frame(x = c("a", "a", "b", "a", "c"))
df %>% group_by(x) %>% summarise(count=n())
Then if you want to count the occurrences of particular counts you can do:
df %>% group_by(x) %>% summarise(count=n()) %>% group_by(count) %>% summarise(newCount=n())
This is a dplyr way.

sum((df %>% count(x))$n)
##[1] 5

If you are willing to give data.table a try, it can be quite straightforward.
df <- data.frame(x = c("a", "a", "b", "a", "c"))
library(data.table)
setDT(df)[, .N, by=x][, list(count_of_N=.N), by=N]
# N count_of_N
# 1: 3 1
# 2: 1 2

If you want to count:
df %>% count(x) %>% summarise(length(n))
# length(n)
#1 3
If you want the sum:
df %>% count(x) %>% summarise(sum(n))
# sum(n)
#1 5

It's not pure dplyr, but this may work:
countr<-function(x){data.frame(table(x))}
t<-count(df,x)
countr(t[,2])

Related

concatenate 2 vectors to string by common element

I have a data.frame with 2 columns. If an element appears in both columns, that should be the grouping criterion. I then want to create a new column which concatenates all elements of each group into a single, sorted string.
df <- tibble::tribble(
~col1, ~col2,
"a", "b",
"b","c",
"c","b",
"d",NA,
"e","d",
"f","d",
"g","d",
"h","i",
"i","h",
"j", NA
)
outcome <- tibble::tribble(
~result,
c("a_b_c"),
c("d_e_f_g"),
c("h_i"),
c("j")
)
Any help is appreciated, since I have not yet found a starting point for solving this. Thanks!

Get the connected components from igraph and paste.
library(dplyr)
library(igraph)
df %>%
mutate(col2 = coalesce(col2, col1)) %>%
as.matrix %>%
graph_from_edgelist %>%
components %>%
groups %>%
sapply(paste, collapse = "_") %>%
stack
giving:
values ind
1 a_b_c 1
2 d_e_f_g 2
3 h_i 3
4 j 4

R group by problem: the multiple combination ID

I'm using a group-by operation on a dataset in R, but some IDs must belong to more than one group. Here is a sample dataset:
df <- data.frame(ID = c ("A","A","B","C","C","D"),
Var1 = c(1,3,2,3,1,2))
ID Var1
A 1
A 3
B 2
C 3
C 1
D 2
I need to group ID into A+B, B+C, and D (call F = A+B and G = B+C), so B contributes to two groups. The target result is:
ID Var1
F 6
G 6
D 2
I use the following code to solve it
library(dplyr)
library(tidyr)
df <- df %>% mutate(F=ifelse(ID %in% c("A", "B"), 1, 0),
G = ifelse(ID %in% c("B", "C"), 1, 0),
D = ifelse(ID == "D", 1, 0))
df %>%
gather(var, val, F:D) %>%
filter(val==1) %>%
group_by(var) %>%
summarise(Var1=sum(Var1))
But this approach fails because of a memory limit (the dataset is large).
Is there another way to solve it?
Any suggestions would be greatly appreciated.
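One possible way to reduce memory use, offered as a sketch rather than a tested fix for the large dataset: sum Var1 per ID first, so the data shrinks to one row per ID, and only then join to a small lookup table that maps each ID to the combined groups it belongs to. The membership table and its grp column are hypothetical names introduced for this illustration.

```r
library(dplyr)

df <- data.frame(ID = c("A", "A", "B", "C", "C", "D"),
                 Var1 = c(1, 3, 2, 3, 1, 2))

# Hypothetical lookup table: one row per (ID, combined group) pair.
# B appears twice because it belongs to both F and G.
membership <- data.frame(ID  = c("A", "B", "B", "C", "D"),
                         grp = c("F", "F", "G", "G", "D"))

result <- df %>%
  group_by(ID) %>%
  summarise(Var1 = sum(Var1)) %>%        # shrink to one row per ID first
  inner_join(membership, by = "ID") %>%  # duplicate only the small summary
  group_by(grp) %>%
  summarise(Var1 = sum(Var1))

result
```

Because the duplication for overlapping groups happens on the per-ID summary instead of on the raw rows, the intermediate tables stay small. On the sample data this gives F = 6, G = 6, and D = 2.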

Generating multiple columns at once with dplyr

I often have to dynamically generate multiple columns based on values in existing columns. Is there a dplyr equivalent of the following?
cols <- c("x", "y")
foo <- c("a", "b")
df <- data.frame(a = 1, b = 2)
df[cols] <- df[foo] * 5
> df
a b x y
1 1 2 5 10

Not the most elegant:
library(tidyverse)
df %>%
mutate_at(vars(foo),function(x) x*5) %>%
set_names(.,nm=cols) %>%
cbind(df,.)
a b x y
1 1 2 5 10
This can be made more elegant, as suggested by @akrun:
df %>%
mutate_at(vars(foo), list(new = ~ . * 5)) %>%
rename_at(vars(matches('new')), ~ c('x', 'y'))
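In dplyr 1.0.0 and later, mutate_at() and rename_at() are superseded by across() and rename_with(). A minimal sketch of the same idea under that assumption (the "_new" suffix is an arbitrary temporary name chosen for this example):

```r
library(dplyr)

cols <- c("x", "y")
foo  <- c("a", "b")
df   <- data.frame(a = 1, b = 2)

result <- df %>%
  # Multiply the columns named in `foo`, writing to temporary *_new columns.
  mutate(across(all_of(foo), ~ .x * 5, .names = "{.col}_new")) %>%
  # Replace the temporary names with the target names in `cols`.
  rename_with(~ cols, ends_with("_new"))

result
#   a b x  y
# 1 1 2 5 10
```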

Drop unused levels from a factor after filtering data frame using dplyr

I used dplyr functions to create a new data set that contains only the names that have fewer than 4 rows.
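# stringsAsFactors = TRUE makes name a factor (no longer the default since R 4.0.0)
df <- data.frame(name = c("a", "a", "a", "b", "b", "c", "c", "c", "c"), x = 1:9, stringsAsFactors = TRUE)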
aa = df %>%
group_by(name) %>%
filter(n() < 4)
But when I type
table(aa$name)
I get,
a b c
3 2 0
I would like to have my output as follow
a b
3 2
How can I completely separate the new data frame aa from df, so that the unused level c is dropped?

To complete your answer and KoenV's comment: you can write your solution in one line. Applying either droplevels or factor removes the unused levels:
table(droplevels(aa$name))
table(factor(aa$name))
Or, because you are using dplyr, add droplevels at the end of the pipeline:
aa <- df %>%
group_by(name) %>%
filter(n() < 4) %>%
droplevels()
table(aa$name)
# Without using table
df %>%
group_by(name) %>%
summarise(count = n()) %>%
filter(count < 4)

aaNew <- droplevels(aa)
table(aaNew$name)

overlapping groups in dplyr

I'm trying to calculate "rolling" summary statistics based on a grouping factor. Is there a nice way to process by (overlapping) groups, based on (say) an ordered factor?
As an example, say I want to calculate the sum of val by groups
df <- data.frame(grp = c("a", "a", "b", "b", "c", "c", "c"),
val = rnorm(7))
For groups based on grp, it's easy:
df %>% group_by(grp) %>% summarise(total = sum(val))
# result:
grp total
1 a 1.6388
2 b 0.7421
3 c 1.1707
However, what I want to do is calculate "rolling" sums for successive groups ("a" & "b", then "b" & "c", etc.). The desired output would be something like this:
grp1 grp2 total
1 a b 2.3809
2 b c 1.9128
I'm having trouble doing this in dplyr. In particular, I can't seem to figure out how to get "overlapping" groups - the "b" rows in the above example should end up in two output groups.

Try lag:
df %>%
group_by(grp) %>%
arrange(grp) %>%
summarise(total = sum(val)) %>%
mutate(grp1 = lag(grp), grp2 = grp, total = total + lag(total)) %>%
select(grp1, grp2, total) %>%
na.omit
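To check the lag approach with numbers that can be verified by hand, here is the same pipeline on fixed values instead of rnorm (val = 1:7 is an assumption made for this illustration). The per-group sums are a = 3, b = 7, c = 18, so the pairwise totals should be 10 and 25:

```r
library(dplyr)

df <- data.frame(grp = c("a", "a", "b", "b", "c", "c", "c"),
                 val = 1:7)

result <- df %>%
  group_by(grp) %>%
  summarise(total = sum(val)) %>%
  # Pair each group with the previous one and add their totals.
  mutate(grp1 = lag(grp), grp2 = grp, total = total + lag(total)) %>%
  select(grp1, grp2, total) %>%
  na.omit()   # the first group has no predecessor

result
```

Note that this relies on summarise() returning the groups in sorted order, so it only pairs groups that are adjacent in that order.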
