Count the zeros in each row of a data frame in R

I am trying to count the zeros in each row and then subtract that count from 5.
For example, in Excel: =3-COUNTIF(SM1:SM3,0)
Is there a solution for this?
df <- data.frame("T_1_1" = c(68,NA,0,105,NA,0,135,NA,24),
                 "T_1_2" = c(26,NA,0,73,NA,97,46,NA,0),
                 "T_1_3" = c(93,32,NA,103,NA,0,147,NA,139),
                 "S_2_1" = c(69,67,94,0,NA,136,NA,92,73),
                 "S_2_2" = c(87,67,NA,120,NA,122,0,NA,79),
                 "S_2_3" = c(150,0,NA,121,NA,78,109,NA,0),
                 "T_1_0" = c(79,0,0,NA,98,NA,15,NA,2)
)
df <- df %>% mutate(ltc = (5-rowSums(select(., matches('T_1[1-9]')) == 0,na.rm = TRUE)))

I believe you forgot an underscore in matches().
library(dplyr)
df %>%
  mutate(ltc = 5 - rowSums(select(., matches('T_1_[1-9]')) == 0, na.rm = TRUE))

Here is a base R option using rowSums (note that it counts zeros in every column of df):
df$ltc <- 5 - rowSums(df == 0, na.rm = TRUE)
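To restrict the count to the T_1_1 to T_1_3 columns, as the dplyr version does, the columns can be picked by name first. This is only a sketch on the question's data; selecting columns with grepl and the helper names t_cols and zero_count are my own choices, not part of the original answers.

```r
# Same data as in the question
df <- data.frame(T_1_1 = c(68, NA, 0, 105, NA, 0, 135, NA, 24),
                 T_1_2 = c(26, NA, 0, 73, NA, 97, 46, NA, 0),
                 T_1_3 = c(93, 32, NA, 103, NA, 0, 147, NA, 139),
                 S_2_1 = c(69, 67, 94, 0, NA, 136, NA, 92, 73),
                 S_2_2 = c(87, 67, NA, 120, NA, 122, 0, NA, 79),
                 S_2_3 = c(150, 0, NA, 121, NA, 78, 109, NA, 0),
                 T_1_0 = c(79, 0, 0, NA, 98, NA, 15, NA, 2))

# Columns named T_1_1 ... T_1_9 (T_1_0 does not match the pattern)
t_cols <- grepl("^T_1_[1-9]$", names(df))

# Zeros per row in those columns only; NA cells are ignored
zero_count <- rowSums(df[, t_cols] == 0, na.rm = TRUE)

df$ltc <- 5 - zero_count
df$ltc
# 5 5 3 5 5 3 5 5 4
```

Anchoring the pattern with ^ and $ keeps it from matching longer column names that merely contain T_1_1.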

Related

R Set Column Value based on other Column Values

I need to set the values of a column to 0 or 1 based on the values of other columns.
If they are 0 or NA, the new column should be 1.
I thought about:
ifelse(df[,53:62]==0|NA, df$newCol <- 1, df$newCol <- 0)
But in the end I get only 1 in the new column.
Thanks for your help.
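I think the tidyverse fits this common use case perfectly:
library(tidyverse)
# example data: a 100 x 100 tibble whose columns alternate 0 and 1
df_example <- matrix(c(0,1), ncol = 100, nrow = 100) %>%
  as_tibble()
# recode columns 53 to 62: 1 where the value is 0 or NA, otherwise 0
df_example %>%
  mutate(across(.cols = 53:62,
                .fns = ~ if_else(.x == 0 | is.na(.x), 1, 0))) %>%
  select(V54) # example: inspect one of the recoded columns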
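If the goal is a single new column rather than recoded copies of the existing ones, a row-wise check is one way to get it. The sketch below assumes "they are 0 or NA" means every selected column in that row is 0 or NA; the toy columns a, b, c stand in for columns 53:62 and are hypothetical. It also sidesteps the problem in the original attempt, where `x == 0 | NA` never yields FALSE, so only the first assignment inside ifelse ever runs.

```r
# Toy data standing in for columns 53:62 of the real data frame
df <- data.frame(a = c(0, 1, NA, 0),
                 b = c(NA, 0, NA, 2),
                 c = c(0, 0, 0, NA))
cols <- c("a", "b", "c")

# Logical matrix: TRUE where a cell is NA or equal to 0
zero_or_na <- is.na(df[cols]) | df[cols] == 0

# 1 when every selected column is 0 or NA in that row, otherwise 0
df$newCol <- as.integer(rowSums(zero_or_na) == length(cols))
df$newCol
# 1 0 1 0
```

To flag rows where at least one column is 0 or NA instead, the comparison would become `rowSums(zero_or_na) > 0`.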

How to group by two columns in R, but with an if condition on the second?

I can't find any help on the internet.
I have 3 columns in a .sav file loaded into RStudio.
One is M, with values 1,2,3,4,5,6,7 and label weight, and another is N, with values 1,2,3 and label diet.
I want to group by these columns, but for column N I only want to pick the rows where the value is 1. The last column, A, holds age data.
I wrote this:
library(dplyr)
df %>%
  group_by(M, N) %>%
  summarize(values = mean(A, na.rm = TRUE))
This groups the data, but for all values of N.
I tried something like this:
library(dplyr)
df %>%
  group_by(M, N == 1) %>%
  summarize(values = mean(A, na.rm = TRUE))
but again I got groups for all categories of N, including NA.
Expected: I want to group by M (all values) and keep only the rows where N = 1.
What should that group_by look like?
We can group by 'M' and summarise the filtered 'A':
library(dplyr)
df %>%
  group_by(M) %>%
  summarise(values = mean(A[N == 1], na.rm = TRUE))
Another option is to put a filter in between, but this also drops any 'M' group that has no rows with 'N' equal to 1:
df %>%
  filter(N == 1) %>%
  group_by(M) %>%
  summarise(values = mean(A, na.rm = TRUE))
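The difference between the two options shows up when a group has no rows with N equal to 1. The following sketch uses made-up data (not the asker's .sav file) in which group M = 2 has no such rows; the names by_subset and by_filter are only for illustration.

```r
library(dplyr)

# Toy data: M = 2 has no rows with N == 1
df <- data.frame(M = c(1, 1, 1, 2, 2),
                 N = c(1, 1, 2, 2, 3),
                 A = c(20, 30, 99, 40, 50))

# Option 1: subset A inside summarise; every M group is kept,
# and a group without N == 1 gets NaN (mean of an empty vector)
by_subset <- df %>%
  group_by(M) %>%
  summarise(values = mean(A[N == 1], na.rm = TRUE))

# Option 2: filter first; M = 2 is absent from the result
by_filter <- df %>%
  filter(N == 1) %>%
  group_by(M) %>%
  summarise(values = mean(A, na.rm = TRUE))

nrow(by_subset)  # 2
nrow(by_filter)  # 1
```

Which behaviour is preferable depends on whether empty groups should stay visible in the output.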

Is there a way of creating a loop that will create a new variable for each of the original 18 variables?

I have a data set with 4 variables; one of them is a dummy stating whether the individual graduated from a particular program (exits). I need to create a loop that will, for each of the other 3 variables, create two new variables (the mean for dummy = 1 and the mean for dummy = 0). This is my code. I want to make it more efficient, since afterwards I want to create a new data.frame for exits == 0 and subtract one from the other.
summary_means_1 = bf %>%
  filter(exits == 1) %>%
  summarise(
    # bare column names are used here: bf$v25_grad would ignore the filter above
    v1_1 = as.double(mean(v25_grad, na.rm = TRUE)),
    v2_1 = as.double(mean(v29_read, na.rm = TRUE)),
    v3_1 = as.double(mean(v30_math, na.rm = TRUE))
  )
You can do this with the dplyr package.
Say this is your data (simplified):
df <- data.frame(Dummy=sample(0:1, 10, TRUE), V1=rnorm(10, 10), V2=rpois(10, 0.5))
This code will calculate the mean of each column, split by dummy:
library(dplyr) # provides group_by(), summarise() and the %>% pipe
df %>%
  group_by(Dummy) %>%
  summarise(Mean_V1 = mean(V1, na.rm = TRUE),
            Mean_V2 = mean(V2, na.rm = TRUE))
You'll need to add a new line in the summarise call for each column.
Using base R you can use colMeans with subsetted data:
colMeans(df[df$Dummy==0, -1])
colMeans(df[df$Dummy==1, -1])
Or you could combine them like this:
data.frame(Col=c("V1", "V2"),
Mean_0=colMeans(df[df$Dummy==0, -1]),
Mean_1=colMeans(df[df$Dummy==1, -1]))
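To avoid writing one line per column, dplyr's across() can apply the same summary to every non-grouping column, and the difference between the two groups can be taken from the base R colMeans results. This is a sketch on made-up data, assuming dplyr 1.0.0 or later; the fixed Dummy pattern and the names means and diff_means are mine.

```r
library(dplyr)

set.seed(1)  # only so that the toy data is reproducible
df <- data.frame(Dummy = rep(0:1, each = 5),
                 V1 = rnorm(10, 10),
                 V2 = rpois(10, 0.5))

# Mean of every non-grouping column, one row per Dummy value
means <- df %>%
  group_by(Dummy) %>%
  summarise(across(everything(), ~ mean(.x, na.rm = TRUE)))

# Difference of the group means (Dummy == 1 minus Dummy == 0),
# returned as a named numeric vector
diff_means <- colMeans(df[df$Dummy == 1, -1], na.rm = TRUE) -
  colMeans(df[df$Dummy == 0, -1], na.rm = TRUE)
```

Inside a grouped summarise, everything() skips the grouping column, so Dummy does not need to be excluded by hand.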

Filter N number of columns for specific value

I have a large dataframe with over 100 conditions as boolean columns (not an ideal setup but I can't change it). I'm trying to make a function that takes a variable number of condition-columns, then filters where all conditions are 1 or all are zero.
SETUP
library(dplyr)
set.seed(123)
ID <- sample(1:5, 20, replace = TRUE)
Val <- round(runif(length(ID), 20, 40),0)
cond_1 <- sample(0:1, length(ID), replace = TRUE)
cond_2 <- sample(0:1, length(ID), replace = TRUE)
cond_3 <- sample(0:1, length(ID), replace = TRUE)
cond_4 <- sample(0:1, length(ID), replace = TRUE)
df <- data.frame(ID, Val, cond_1, cond_2, cond_3, cond_4, stringsAsFactors = FALSE)
Example of desired function for any two columns:
filterTwoCols <- function(df, cols){
  # Select desired conditions
  df1 <- df %>%
    select(ID, Val, one_of(cols))
  # Filter on all conditions == 0 or all conditions == 1
  df2 <- df1 %>%
    filter(.[,ncol(.)] == 1 & .[,ncol(.) - 1] == 1 |
           .[,ncol(.)] == 0 & .[,ncol(.) - 1] == 0)
  return(df2)
}
filterTwoCols(df, c('cond_1', 'cond_4'))
filterTwoCols(df, c('cond_3', 'cond_2'))
What I want to be able to do is name any number of conditions (e.g. filterManyCols(df, c('cond_1', 'cond_3', 'cond_4'))), but I don't know how to do this without naming them explicitly in the filter (.[,ncol(.) - 2] == 1, .[,ncol(.) - 3] == 1, etc). If the number of columns selected doesn't match the number of conditions in the filter, it won't work. Any thoughts?
One option is filter_at:
library(tidyverse)
filterManyCols <- function(df, cols){
  # Select desired conditions
  # Not clear whether we need to subset the columns or get the filtered
  # full dataset columns
  # df <- df %>%
  #   select(ID, Val, one_of(cols))
  map_df(0:1, ~ df %>%
    filter_at(vars(one_of(cols)), all_vars(. == .x)))
}
filterManyCols(df, c('cond_1', 'cond_4'))
filterManyCols(df, c('cond_1', 'cond_2', 'cond_3'))
filterManyCols(df, c('cond_1', 'cond_2', 'cond_3', 'cond_4'))
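In current dplyr the scoped verbs such as filter_at are superseded, and the same filter can be written with if_all(). The sketch below assumes dplyr 1.0.4 or later; the function name filterManyCols2 and the small data frame are hypothetical and only serve to show the idea.

```r
library(dplyr)

# Keep rows where all named condition columns are 1, or all are 0
filterManyCols2 <- function(df, cols) {
  df %>%
    filter(if_all(all_of(cols), ~ .x == 1) |
           if_all(all_of(cols), ~ .x == 0))
}

# Small deterministic example
df <- data.frame(ID = 1:5,
                 cond_1 = c(1, 0, 1, 0, 1),
                 cond_2 = c(1, 0, 0, 0, 1),
                 cond_3 = c(1, 0, 1, 1, 0))

filterManyCols2(df, c("cond_1", "cond_2", "cond_3"))$ID
# 1 2
filterManyCols2(df, c("cond_1", "cond_2"))$ID
# 1 2 4 5
```

Unlike the map_df version, which stacks the all-0 rows before the all-1 rows, this keeps the rows in their original order.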

Mean excluding zero and na for all columns with dplyr

I want to compute the mean of every column of my data frame with the dplyr package.
n = c(NA, 3, 5)
s = c("aa", "bb", "cc")
b = c(3, 0, 5)
df = data.frame(n, s, b)
Here I want my function to return mean = 4 for the n and b columns.
I tried mean(df$n[df$n>0]), but that's not practical for a large data frame.
I want something like df %>% summarise_each(funs(mean)) ...
Thanks
If you don't want the 0s, you are probably treating them as NAs, so let's be explicit about it and then summarise the numeric columns with na.rm = TRUE:
library(dplyr)
df[df==0] <- NA
summarize_if(df, is.numeric, mean, na.rm = TRUE)
#   n b
# 1 4 4
As a one liner:
summarize_if(`[<-`(df, df==0, value= NA), is.numeric, mean, na.rm = TRUE)
and in base R (result as a named numeric vector)
sapply(`[<-`(df, df==0, value= NA)[sapply(df, is.numeric)], mean, na.rm=TRUE)
See also David's elegant answer:
df %>% summarise_each(funs(mean(.[!is.na(.) & . != 0])), -s)
Or
df %>% summarise_each(funs(mean(.[. != 0], na.rm = TRUE)), -s)
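Since summarise_each() and funs() are deprecated in recent dplyr releases, the same idea can be expressed with across(). A minimal sketch on the question's data, assuming dplyr 1.0.0 or later; the name res is mine.

```r
library(dplyr)

df <- data.frame(n = c(NA, 3, 5),
                 s = c("aa", "bb", "cc"),
                 b = c(3, 0, 5))

# For each numeric column: drop the zeros, then average ignoring NA
res <- df %>%
  summarise(across(where(is.numeric), ~ mean(.x[.x != 0], na.rm = TRUE)))

res$n  # 4
res$b  # 4
```

Selecting with where(is.numeric) skips the character column s, so it does not have to be excluded by name.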
