Subset data frame based on a column number threshold [duplicate]

I have a question about counting zeros per row. I have a dataframe like this:
a = c(1,2,3,4,5,6,0,2,5)
b = c(0,0,0,2,6,7,0,0,0)
c = c(0,5,2,7,3,1,0,3,0)
d = c(1,2,6,3,8,4,0,4,0)
e = c(0,4,6,3,8,4,0,6,0)
f = c(0,2,5,5,8,4,2,7,4)
g = c(0,8,5,4,7,4,0,0,0)
h = c(1,3,6,7,4,2,0,4,2)
i = c(1,5,3,6,3,7,0,5,3)
j = c(1,5,2,6,4,6,8,4,2)
DF<- data.frame(a=a,b=b,c=c,d=d,e=e,f=f,g=g,h=h,i=i,j=j)
a b c d e f g h i j
1 1 0 0 1 0 0 0 1 1 1
2 2 0 5 2 4 2 8 3 5 5
3 3 0 2 6 6 5 5 6 3 2
4 4 2 7 3 3 5 4 7 6 6
5 5 6 3 8 8 8 7 4 3 4
6 6 7 1 4 4 4 4 2 7 6
7 0 0 0 0 0 2 0 0 0 8
8 2 0 3 4 6 7 0 4 5 4
9 5 0 0 0 0 4 0 2 3 2
I want to count the number of zeros per row. If the number of zeros in a row is more than a certain number, say 4, I want to remove the complete row. The resulting dataframe should look like this:
a b c d e f g h i j
2 2 0 5 2 4 2 8 3 5 5
3 3 0 2 6 6 5 5 6 3 2
4 4 2 7 3 3 5 4 7 6 6
5 5 6 3 8 8 8 7 4 3 4
6 6 7 1 4 4 4 4 2 7 6
8 2 0 3 4 6 7 0 4 5 4
Is that possible? Thank you!

It's not only possible, but very easy:
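DF[rowSums(DF == 0) <= 4, ]  # DF == 0 is a logical matrix; rowSums() counts the TRUEs in each row
You could also use apply:
DF[apply(DF == 0, 1, sum) <= 4, ]  # same count, computed row by row (MARGIN = 1)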
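If you need this for different thresholds, the rowSums approach can be wrapped in a small function. This is a minimal sketch: drop_zero_rows is a hypothetical name, and treating NA cells as "not zero" (na.rm = TRUE) is an assumption you may want to change.

```r
# Rebuild the example data frame from the question
DF <- data.frame(
  a = c(1,2,3,4,5,6,0,2,5), b = c(0,0,0,2,6,7,0,0,0),
  c = c(0,5,2,7,3,1,0,3,0), d = c(1,2,6,3,8,4,0,4,0),
  e = c(0,4,6,3,8,4,0,6,0), f = c(0,2,5,5,8,4,2,7,4),
  g = c(0,8,5,4,7,4,0,0,0), h = c(1,3,6,7,4,2,0,4,2),
  i = c(1,5,3,6,3,7,0,5,3), j = c(1,5,2,6,4,6,8,4,2)
)

# Hypothetical helper: keep only the rows that contain at most `n` zeros.
# na.rm = TRUE means NA cells are simply not counted as zeros (an assumption).
drop_zero_rows <- function(df, n) {
  zeros_per_row <- rowSums(df == 0, na.rm = TRUE)
  df[zeros_per_row <= n, , drop = FALSE]
}

res <- drop_zero_rows(DF, 4)
print(res)
```

With n = 4 this keeps rows 2, 3, 4, 5, 6 and 8, matching the expected output; drop = FALSE keeps the result a data frame even if only one column is present.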

Related

R function to evaluate the values in data.table columns against another set of values

I have columns containing answers to different test questions (e.g. Q1, Q2, Q3), and I would like to write a function that evaluates those answers and creates new columns with a score for each question (1 or 0), where 'id' identifies each individual.
id Q1 Q2 Q3
1 4 3 3
2 7 3 7
3 8 5 6
4 8 2 8
5 4 6 8
6 4 6 6
7 4 6 5
8 4 6 8
9 4 6 6
The output I'm looking for is:
id Q1 Q2 Q3 Q1_score Q2_score Q3_score
1 4 3 3 1 0 0
2 7 3 7 0 0 0
3 8 5 6 0 0 0
4 8 2 8 0 0 1
5 4 6 8 1 1 1
6 4 6 6 1 1 0
7 4 6 5 1 1 0
8 4 6 8 1 1 1
9 4 6 6 1 1 0
I've defined the correct answers and the new column names below, but I can't seem to figure out the function that would do something like "for the first question 'Q1', if the answer is equal to the first value in 'answers', return 1 else 0"... then "for the second question 'Q2', if the answer is equal to the second value...", etc.
answers = c(4, 6, 8)
newcols = paste0('Q', 1:3, '_score')
dt[, (newcols) := function, id, .SDcols = 2:4]  # 'function' is a placeholder for the part I can't figure out
Starting with the data in a data frame called quiz:
> quiz
id Q1 Q2 Q3
1 1 4 3 3
2 2 7 3 7
3 3 8 5 6
4 4 8 2 8
5 5 4 6 8
6 6 4 6 6
7 7 4 6 5
8 8 4 6 8
9 9 4 6 6
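Extract the Q columns into a matrix and compare row by row: transpose so that each column holds one respondent's answers, compare against the answer key, then transpose back. The transpose is needed because R stores matrices in column-major order, so the vector c(4,6,8) is recycled down each column; after transposing, its elements line up with Q1, Q2 and Q3.
Adding 0 converts the logical result to numeric. Then fix up the names with a quick paste0 and cbind the result onto the original. Here's a solution: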
> resp = t(t(quiz[,2:4]) == c(4,6,8))+0
> colnames(resp)=paste0(colnames(resp),"_score")
> cbind(quiz, data.frame(resp))
id Q1 Q2 Q3 Q1_score Q2_score Q3_score
1 1 4 3 3 1 0 0
2 2 7 3 7 0 0 0
3 3 8 5 6 0 0 0
4 4 8 2 8 0 0 1
5 5 4 6 8 1 1 1
6 6 4 6 6 1 1 0
7 7 4 6 5 1 1 0
8 8 4 6 8 1 1 1
9 9 4 6 6 1 1 0
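If the double transpose feels opaque, base R's sweep() can compare each column against its own answer directly. This is a swapped-in technique rather than the answer's method; the quiz data is rebuilt from the question and answers is the vector the asker defined.

```r
# Rebuild the quiz data shown in the question
quiz <- data.frame(
  id = 1:9,
  Q1 = c(4, 7, 8, 8, 4, 4, 4, 4, 4),
  Q2 = c(3, 3, 5, 2, 6, 6, 6, 6, 6),
  Q3 = c(3, 7, 6, 8, 8, 6, 5, 8, 6)
)
answers <- c(4, 6, 8)

# MARGIN = 2 pairs answers[k] with column k, so no transposing is needed;
# adding 0 turns the logical matrix into 0/1 numbers
m <- as.matrix(quiz[, 2:4])
resp <- sweep(m, 2, answers, FUN = "==") + 0
colnames(resp) <- paste0(colnames(resp), "_score")

scored <- cbind(quiz, resp)
print(scored)
```

sweep() also checks that the length of answers matches the number of columns being swept, which catches an answer key of the wrong length.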
We specify .SDcols as all the columns except the first one, use Map to compare each list element (i.e. column) with the corresponding value in answers, convert the result to integer, and assign it (:=) to new columns whose names are built with paste0.
library(data.table)
# dt: the quiz data as a data.table; answers <- c(4, 6, 8) as defined in the question
dt[, paste0(names(dt)[-1], "_score") :=
Map(function(x,y) as.integer(x==y), .SD, answers), .SDcols = -1]
dt
# id Q1 Q2 Q3 Q1_score Q2_score Q3_score
#1: 1 4 3 3 1 0 0
#2: 2 7 3 7 0 0 0
#3: 3 8 5 6 0 0 0
#4: 4 8 2 8 0 0 1
#5: 5 4 6 8 1 1 1
#6: 6 4 6 6 1 1 0
#7: 7 4 6 5 1 1 0
#8: 8 4 6 8 1 1 1
#9: 9 4 6 6 1 1 0
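Without data.table, the same Map idea works on a plain data frame. A minimal base R sketch, assuming the data is held in a data.frame called quiz (rebuilt here from the question):

```r
quiz <- data.frame(
  id = 1:9,
  Q1 = c(4, 7, 8, 8, 4, 4, 4, 4, 4),
  Q2 = c(3, 3, 5, 2, 6, 6, 6, 6, 6),
  Q3 = c(3, 7, 6, 8, 8, 6, 5, 8, 6)
)
answers <- c(4, 6, 8)

# Map() walks the question columns and the answers in parallel,
# returning one 0/1 integer vector per question
score_cols <- paste0(names(quiz)[-1], "_score")
quiz[score_cols] <- Map(function(x, y) as.integer(x == y), quiz[-1], answers)
print(quiz)
```

Assigning a list to new column names with [<- adds one column per list element, which mirrors the := assignment in the data.table version.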

Count non-zero values of column in R [duplicate]

This question already has an answer here: Add a new column of the sum by group [duplicate] (1 answer). Closed 6 years ago.
Suppose I have a data frame like this one:
DF
Id X Y Z
1 1 5 0
1 2 0 0
1 3 0 5
1 4 9 0
1 5 2 3
1 6 5 0
2 1 5 0
2 2 4 0
2 3 0 6
2 4 9 6
2 5 2 0
2 6 5 2
3 1 5 6
3 2 4 0
3 3 6 5
3 4 9 0
3 5 2 0
3 6 5 0
I want to count the number of non-zero entries of Z within each Id and record that value in a new column, Count, so the new data frame will look like this:
DF1
Id X Y Z Count
1 1 5 0 2
1 2 0 0 2
1 3 0 5 2
1 4 9 0 2
1 5 2 3 2
1 6 5 0 2
2 1 5 0 3
2 2 4 0 3
2 3 0 6 3
2 4 9 6 3
2 5 2 0 3
2 6 5 2 3
3 1 5 6 2
3 2 4 0 2
3 3 6 5 2
3 4 9 0 2
3 5 2 0 2
3 6 5 0 2
We can use ave from base R, counting the number of non-zero values of Z grouped by Id:
DF$Count <- ave(DF$Z, DF$Id, FUN = function(x) sum(x != 0))
DF$Count
#[1] 2 2 2 2 2 2 3 3 3 3 3 3 2 2 2 2 2 2
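A self-contained version of the ave approach, with the data rebuilt from the question. As a small variant (an editorial choice, not the answer's exact code), it passes the logical flags Z != 0 to ave and sums them within each group.

```r
DF <- data.frame(
  Id = rep(1:3, each = 6),
  X  = rep(1:6, times = 3),
  Y  = c(5,0,0,9,2,5, 5,4,0,9,2,5, 5,4,6,9,2,5),
  Z  = c(0,0,5,0,3,0, 0,0,6,6,0,2, 6,0,5,0,0,0)
)

# Flag non-zero Z values, then sum the flags within each Id;
# ave() writes each group's total back onto every row of that group
DF$Count <- ave(DF$Z != 0, DF$Id, FUN = sum)
print(DF)
```

Because ave() returns a vector as long as its input, the counts line up with the rows regardless of how the Ids are ordered.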
You can try this; it gives you exactly what you want:
library(data.table)
dt <- data.table(DF)
dt[, Count := sum(Z != 0), by = Id]
dt
# Id X Y Z Count
# 1: 1 1 5 0 2
# 2: 1 2 0 0 2
# 3: 1 3 0 5 2
# 4: 1 4 9 0 2
# 5: 1 5 2 3 2
# 6: 1 6 5 0 2
# 7: 2 1 5 0 3
# 8: 2 2 4 0 3
# 9: 2 3 0 6 3
# 10: 2 4 9 6 3
# 11: 2 5 2 0 3
# 12: 2 6 5 2 3
# 13: 3 1 5 6 2
# 14: 3 2 4 0 2
# 15: 3 3 6 5 2
# 16: 3 4 9 0 2
# 17: 3 5 2 0 2
# 18: 3 6 5 0 2
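This will also work, provided the rows are sorted by Id and every Id has at least one non-zero Z (aggregate drops groups with no rows, which would misalign the counts):
DF$Count <- rep(aggregate(Z~Id, DF[DF$Z != 0,], length)$Z, table(DF$Id))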
Id X Y Z Count
1 1 1 5 0 2
2 1 2 0 0 2
3 1 3 0 5 2
4 1 4 9 0 2
5 1 5 2 3 2
6 1 6 5 0 2
7 2 1 5 0 3
8 2 2 4 0 3
9 2 3 0 6 3
10 2 4 9 6 3
11 2 5 2 0 3
12 2 6 5 2 3
13 3 1 5 6 2
14 3 2 4 0 2
15 3 3 6 5 2
16 3 4 9 0 2
17 3 5 2 0 2
18 3 6 5 0 2
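The rep(..., table(...)) trick depends on row order and on every Id having a non-zero Z. A sketch that avoids both assumptions: count per Id over all rows, then look up each row's Id with match(). The helper name count_nonzero_by_id is hypothetical.

```r
DF <- data.frame(
  Id = rep(1:3, each = 6),
  X  = rep(1:6, times = 3),
  Y  = c(5,0,0,9,2,5, 5,4,0,9,2,5, 5,4,6,9,2,5),
  Z  = c(0,0,5,0,3,0, 0,0,6,6,0,2, 6,0,5,0,0,0)
)

# Hypothetical helper: aggregate over *all* rows, so an Id whose Z values are
# all zero still gets a count of 0; match() maps each row's Id to its group's
# count without assuming the rows are sorted by Id.
count_nonzero_by_id <- function(df) {
  counts <- aggregate(Z ~ Id, data = df, FUN = function(z) sum(z != 0))
  counts$Z[match(df$Id, counts$Id)]
}

DF$Count <- count_nonzero_by_id(DF)
print(DF)
```

On the question's data this gives the same 2, 3, 2 counts per Id as the other answers.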
