This question already has answers here:
Faster ways to calculate frequencies and cast from long to wide
(4 answers)
Closed 4 years ago.
I want to get, for each group, the frequencies of the values of a factor/categorical variable.
The following does not work:
library(data.table)
dt<-data.table(fac=c("l1","l1","l2"),grp=c("A","B","B"))
dt[,fac:=as.factor(fac)]
dt[,list( table(fac) ),by=grp]
The error message is:
Error in `[.data.table`(dt, , list(table(fac)), by = grp) :
All items in j=list(...) should be atomic vectors or lists. If you are trying something like j=list(.SD,newcol=mean(colA)) then use := by group instead (much quicker), or cbind or merge afterwards.
Is there an easy way to accomplish this task? Thanks.
We can use dcast and skip both the factor conversion and the table() call in the OP's code.
dcast(dt, grp~fac, length)
# grp l1 l2
#1: A 1 0
#2: B 1 1
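If the counts are also needed in long format, one possible two-step variant is to count with .N first and then cast. This is only a sketch built on the question's sample data; the names counts_long and counts_wide are introduced here for illustration.

```r
library(data.table)

dt <- data.table(fac = c("l1", "l1", "l2"), grp = c("A", "B", "B"))

# Long format: one row per (grp, fac) combination that occurs in the data.
counts_long <- dt[, .N, by = .(grp, fac)]

# Wide format: cast the counts and fill combinations that never occur with 0.
counts_wide <- dcast(counts_long, grp ~ fac, value.var = "N", fill = 0L)
print(counts_wide)
```

Counting first keeps the intermediate table small, which may matter when there are many rows but few distinct combinations.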
This question already has answers here:
Numbering rows within groups in a data frame
(10 answers)
Closed 1 year ago.
I am searching for a function that returns a vector with the position/count of each value of a vector.
Here is an example:
I have:
vec<-c("A","A","A","B","B","C")
I want:
c(1,2,3,1,2,1)
I have created a function that works but I am looking for a faster way to get it, as I have a big dataset.
Thank you very much in advance
One way would be to use ave in base R:
vec<-c("A","A","A","B","B","C")
result <- as.integer(ave(vec, vec, FUN = seq_along))
result
[1] 1 2 3 1 2 1
We can use rowid from data.table
library(data.table)
rowid(vec)
#[1] 1 2 3 1 2 1
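When the vector is a column of a data.table, the same numbering can be stored as a new column. The sketch below assumes a hypothetical table dt with the question's values in a column named vec; pos and pos2 are illustrative column names. It shows rowid next to the grouped seq_len(.N) idiom, which should give the same result.

```r
library(data.table)

# Hypothetical table holding the question's vector as a column.
dt <- data.table(vec = c("A", "A", "A", "B", "B", "C"))

# Running position within each value, computed with rowid().
dt[, pos := rowid(vec)]

# The same numbering via a grouped sequence.
dt[, pos2 := seq_len(.N), by = vec]

print(dt)
```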
This question already has answers here:
Dynamically add column names to data.table when aggregating
(2 answers)
Closed 4 years ago.
I'm using R's data.table, and am trying to assign a column with := named with a character object while performing an operation by group.
If it's not done by group, things are relatively straightforward:
dt <- data.table(mtcars)[, .(cyl, mpg)]
thing2 <- 'mpgx2'
dt[,(thing2):=mpg*2]
However, when I'm doing things by group, an error occurs:
DT <- data.table(V1=c(1L,2L),
V2=LETTERS[1:3],
V3=round(rnorm(4),4),
V4=1:12)
ghi <- "def"
DT[,.((ghi)=mean(V3)),by=V1]
Specifically, Error: unexpected '=' in "DT[,.((ghi)=".
How can I rectify this?
We can use setNames
DT[,setNames(.(mean(V3)), ghi), by = V1]
# V1 def
#1: 1 -1.4663
#2: 2 0.0414
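Since the question mentions := by group, here is a small sketch of both variants side by side: adding a column whose name is held in a character object, and aggregating with a name set via setNames. The data are a simplified, deterministic stand-in (an assumption, not the OP's table), and list() is used instead of .() inside setNames to avoid depending on how a given data.table version resolves the alias.

```r
library(data.table)

# Simplified, deterministic stand-in for the question's DT (assumption).
DT <- data.table(V1 = c(1L, 2L, 1L, 2L), V3 = c(1, 2, 3, 4))
ghi <- "def"

# Assignment by group: wrapping the name in parentheses makes data.table
# evaluate `ghi` and use its value ("def") as the new column name.
DT[, (ghi) := mean(V3), by = V1]

# Aggregation by group: name the result list from the character object.
agg <- DT[, setNames(list(mean(V3)), ghi), by = V1]
print(agg)
```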
This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 6 years ago.
I am trying to summarize this dataset by grouping by name (Almeria, Ath Bilbao,...) and have the sum of its corresponding values in column 2 (HalfTimeResult) and 3 (FullTimeResult). I tried with the aggregate and group_by functions but have not been able to obtain the right output.
What function and how would I use it to obtain an output like this?
This is the dataset that I am working with:
We can use data.table
library(data.table)
setDT(df1)[, lapply(.SD, sum), by = HomeTeam]
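The question's dataset is not reproduced here, so the following sketch uses a small made-up df1 with the column names mentioned in the question (HomeTeam, HalfTimeResult, FullTimeResult); the values are invented purely for illustration. It also passes .SDcols so that only the intended numeric columns are summed.

```r
library(data.table)

# Made-up stand-in for the question's data (values are illustrative only).
df1 <- data.frame(
  HomeTeam       = c("Almeria", "Ath Bilbao", "Almeria", "Ath Bilbao"),
  HalfTimeResult = c(1, 0, 2, 1),
  FullTimeResult = c(2, 1, 3, 1)
)

# Sum the selected columns within each team.
res <- setDT(df1)[, lapply(.SD, sum), by = HomeTeam,
                  .SDcols = c("HalfTimeResult", "FullTimeResult")]
print(res)
```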
This question already has answers here:
Collapse text by group in data frame [duplicate]
(2 answers)
Closed 7 years ago.
I got a table like this:
id words
1 I like school.
2 I hate school.
3 I like cakes.
1 I like cats.
Here's what I want to do, joining the strings in each row according to id.
id words
1 I like school. I like cats.
2 I hate school.
3 I like cakes.
Is there a package to do that in R?
We can paste the 'words' together grouped by 'id'. This can be done with any of the group-by operations. One way is data.table: we convert the 'data.frame' to a 'data.table' (setDT(df1)) and then do the paste operation described above.
# install.packages(c("data.table"), dependencies = TRUE)
library(data.table)
setDT(df1)[, list(words = paste(words, collapse=' ')), by = id]
A base R operation would be to use aggregate
aggregate(words~id, df1, FUN= paste, collapse=' ')
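For a reproducible check, the table from the question can be rebuilt as a data.frame and run through both approaches. In this sketch the names base_res and dt_res are illustrative, and as.data.table is used instead of setDT so that df1 itself stays a plain data.frame.

```r
library(data.table)

# The question's table, rebuilt by hand.
df1 <- data.frame(
  id    = c(1, 2, 3, 1),
  words = c("I like school.", "I hate school.", "I like cakes.", "I like cats."),
  stringsAsFactors = FALSE
)

# Base R: collapse the strings within each id.
base_res <- aggregate(words ~ id, data = df1, FUN = paste, collapse = " ")

# data.table: same operation on a copy of the data.
dt_res <- as.data.table(df1)[, list(words = paste(words, collapse = " ")), by = id]

print(dt_res)
```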
This question already has answers here:
Select groups with more than one distinct value
(3 answers)
Closed 7 years ago.
I have data like below:
ID category class
1 a m
1 a s
1 b s
2 a m
3 b s
4 c s
5 d s
I want to subset the data by only including those "ID" which have several (> 1) different categories.
My expected output:
ID category class
1 a m
1 a s
1 b s
Is there a way to do so?
I tried
library(dplyr)
df %>%
group_by(ID) %>%
filter(n_distinct(category, class) > 1)
But it gave me an error:
# Error: expecting a single value
Using data.table
library(data.table) #see: https://github.com/Rdatatable/data.table/wiki for more
setDT(data) #convert to native 'data.table' type by reference
data[ , if(uniqueN(category) > 1) .SD, by = ID]
uniqueN is data.table's (fast) native mask for length(unique()), and .SD is just the whole data.table (in more general cases, it can represent a subset of columns, e.g. when the .SDcols argument is activated). So basically the middle statement (j, the column selection argument) says to return all columns and rows associated with an ID for which there are at least two distinct values of category.
Use the by argument of uniqueN to extend this to a case involving distinct counts over multiple columns.
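A runnable sketch of the filter on the question's sample data follows. The second call is one possible reading of the multi-column case: it counts distinct (category, class) combinations by passing .SD to uniqueN together with its by argument. The names res and res2 are illustrative; on this particular sample both calls happen to keep the same rows.

```r
library(data.table)

# The question's sample data.
data <- data.table(
  ID       = c(1, 1, 1, 2, 3, 4, 5),
  category = c("a", "a", "b", "a", "b", "c", "d"),
  class    = c("m", "s", "s", "m", "s", "s", "s")
)

# Keep IDs with more than one distinct category.
res <- data[, if (uniqueN(category) > 1) .SD, by = ID]

# Keep IDs with more than one distinct (category, class) combination.
res2 <- data[, if (uniqueN(.SD, by = c("category", "class")) > 1) .SD, by = ID]

print(res)
```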