Using a loop in filter


I am quite new to R. Using dplyr's filter(), I want to select the records for which any of a list of variables is not NA.
df %>% filter (var1 != "NA" | var2 != "NA" | var3 != "NA" )
The problem is that I have 85 such variables (ending with HR). So I have extracted them and put them in a vector.
hr_variables <- grep("HR$", names(ssc), value=TRUE)
I would like to write a loop that goes through hr_variables and applies the OR condition to each element inside filter().
Is this possible in R?

We can do this concisely in base R. The following keeps the rows that have no NA in any of the HR columns:
ssc[!rowSums(is.na(ssc[hr_variables])),]
# col1_HR col2_HR col3
#2 1 3 0.5365853
#3 2 4 0.4196231
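Or using tidyverse (extract() here is magrittr's alias for `[`, so it is called with its namespace to avoid tidyr::extract)
library(tidyverse)
ssc %>%
  select(all_of(hr_variables)) %>%   # select_() is deprecated
  map(~ is.na(.)) %>%                # one logical vector per HR column
  reduce(`|`) %>%                    # TRUE where any HR column is NA
  `!` %>%
  magrittr::extract(ssc, ., )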
Or with complete.cases
ssc %>%
  select(all_of(hr_variables)) %>%
  complete.cases() %>%
  magrittr::extract(ssc, ., )
data
set.seed(24)
ssc <- data.frame(col1_HR = c(NA, 1, 2, 3), col2_HR = c(NA, 3, 4, NA), col3 = rnorm(4))
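The question asks for an OR condition (keep a row if at least one HR variable is present), while the snippets above keep only rows where every HR variable is present. A minimal sketch of both variants, assuming dplyr >= 1.0.4 (which provides if_any() and if_all()); the col3 values here are made up for illustration:

```r
library(dplyr)

# Toy data modelled on the answer's example (col3 values are arbitrary)
ssc <- data.frame(col1_HR = c(NA, 1, 2, 3),
                  col2_HR = c(NA, 3, 4, NA),
                  col3    = c(0.1, 0.2, 0.3, 0.4))
hr_variables <- grep("HR$", names(ssc), value = TRUE)

# OR: keep rows where at least one HR column is not NA
any_present <- ssc %>%
  filter(if_any(all_of(hr_variables), ~ !is.na(.x)))

# AND: keep rows where every HR column is not NA
all_present <- ssc %>%
  filter(if_all(all_of(hr_variables), ~ !is.na(.x)))
```

With this data, any_present drops only the first row, whereas all_present also drops the fourth.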

Related

How can I change multiple columns using the same condition in R?

I need to recode some columns in my data, there are 29 columns with the same coded expressions
The cells are coded with numbers, something like that:
1 - Normal
2 - Altered
3 - NA
I want to write a for loop that changes all of these columns at once, transforming the numeric codes (1; 2; 3) into labels (Normal; Altered; NA).
This is what I'm trying. I don't get any error message, but it isn't working:
for (i in names(df[,123:151])){
mutate(i = case_when(
i == 1 ~ 'Normal',
i == 2 ~ 'Altered',
i == 3 ~ 'NA'))
}

An easy way to do this would be to use dplyr from tidyverse.
library(tidyverse)
#make test dataframe
col1 <- c("1", "2", "3")
col2 <- c(3, 2, 2)
df <- data.frame(col1, col2)
df_recoded<-df %>%
mutate(across(.cols = everything(), ~case_when(
. == 1 ~ 'Normal',
. == 2 ~ 'Altered',
. == 3 ~ NA_character_)))

Try this:
df %>% mutate(across(.cols = all_of(names(df)[123:151]),
  .fns = ~recode(., `1` = "Normal", `2` = "Altered", `3` = "NA", .default = NA_character_)))
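One likely reason the loop in the question has no effect is that i holds a column name (a string) rather than the column itself, and the result of mutate() is never assigned back to df. If a loop-like approach is preferred, a base R sketch with a lookup vector might look like this; the toy data frame and the column names t1 and t2 are hypothetical stand-ins for the 29 coded columns:

```r
# Hypothetical toy data; t1 and t2 stand in for the coded columns
df <- data.frame(id = 1:3, t1 = c(1, 2, 3), t2 = c(3, 1, 2))
code_cols <- c("t1", "t2")

# Lookup vector: the position is the numeric code, the value is the label
labels <- c("Normal", "Altered", NA)

# Index the lookup vector by each column's codes and assign the result back
df[code_cols] <- lapply(df[code_cols], function(x) labels[x])
```

For the real data, code_cols could be replaced by names(df)[123:151].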

Applying functions in dplyr pipes

Given a data frame like data:
data <- data.frame(group = rep(c('a','b'), each= 100),
value = rnorm(200))
We want to filter values for group == b using dplyr and use boxplot.stats to identify outliers:
library(dplyr)
data%>%
filter(group == 'b')%>%
summarise(out.stats = boxplot.stats(value))
This returns the error "Column out.stats must be length 1 (a summary value), not 4". Why does this not work, and how do you apply functions like this inside a pipe?

The following answers both the question and the OP's last comment on it, which asks for the row numbers of the outliers:
what if we want to return the row numbers that go with
boxplot.stats()$out from the pipe? so if we did
b<-data%>%filter(group=='b') outside of the pipe, we could have used:
which(b$value %in% boxplot.stats(b$value)$out)
This can be done by left-joining the outlier values back to the original data, which returns the matching rows:
library(dplyr)
set.seed(1234)
data <- data.frame(group = rep(c('a','b'), each= 100),
value = rnorm(200))
data %>% filter(group == 'b') %>% pull(value) %>%
boxplot.stats() %>% '[['('out') %>%
data.frame() %>%
left_join(data, by = c('.' = 'value'))
# . group
#1 3.043766 b
#2 -2.732220 b
#3 -2.855759 b
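The join above returns the outlier values with their group, not their positions. If the original row numbers are needed, one possible sketch is to record them before filtering; the column name row is an assumption made for illustration:

```r
library(dplyr)

set.seed(1234)
data <- data.frame(group = rep(c("a", "b"), each = 100),
                   value = rnorm(200))

outliers <- data %>%
  mutate(row = row_number()) %>%                 # remember the original position
  filter(group == "b") %>%
  filter(value %in% boxplot.stats(value)$out)    # keep only the boxplot outliers
```

Here outliers$row indexes into the full data frame, so data$value[outliers$row] gives back the outlier values.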

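With dplyr 1.0.0 or later, summarise() can return more than one row per group. Because boxplot.stats() returns a list of four elements (stats, n, conf, out), out.stats becomes a list column with four rows. (From dplyr 1.1.0, multi-row results from summarise() are deprecated in favour of reframe().)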
library(dplyr) # >= 1.0.0
data%>%
filter(group == 'b')%>%
summarise(out.stats = boxplot.stats(value))
# out.stats
#1 -2.4804222, -0.7546693, 0.1304050, 0.6390749, 2.2682247
#2 100
#3 -0.08980661, 0.35061653
#4 -3.014914
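Another way to avoid the "must be length 1" error, which should work across dplyr versions, is to wrap the multi-element result in list() so that each summary value is a single list element. A sketch, with n_out and out as illustrative column names:

```r
library(dplyr)

set.seed(1234)
data <- data.frame(group = rep(c("a", "b"), each = 100),
                   value = rnorm(200))

out_summary <- data %>%
  filter(group == "b") %>%
  summarise(
    n_out = length(boxplot.stats(value)$out),  # scalar summary
    out   = list(boxplot.stats(value)$out)     # vector wrapped as one list element
  )
```

The outlier values can then be retrieved with out_summary$out[[1]].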

How to use mutate rowwise with complex row operation?

How can I use mutate to achieve the below?
bd_diag_date <- df %>%
apply(1, function(dates) last(na.omit(dates))) %>%
as.data.frame() %>%
`colnames<-`("diag_date")
I tried the code below, but it didn't work. It fails with Error: Column 'diagnosis_date' is of unsupported type symbol, and I can't work out why. Should I assume mutate takes any function that can be applied to a vector? If not, what kind of operation does it accept?
bd_diag_date <- df %>%
rowwise() %>%
{mutate(., diag_date=last(na.omit(all_vars(.))))}
I also have a more general question: how can I debug this? Every time I run into this kind of problem I have to search Stack Exchange, which doesn't feel like the right way to improve my dplyr skills.

We can use pmap
library(dplyr)
library(purrr)
df %>%
mutate(diag_date = pmap(., ~ last(na.omit(c(...)))))
If the columns are numeric, we can use pmap_dbl; plain pmap returns a list column.
df %>%
mutate(diag_date = pmap_dbl(., ~ last(na.omit(c(...)))))
# col1 col2 col3 diag_date
#1 1 NA 2 2
#2 NA 2 NA 2
#3 3 4 NA 4
If we need to return only a single column, use transmute
df %>%
transmute(diag_date = pmap_dbl(., ~ last(na.omit(c(...)))))
Or with group_split and map
df %>%
group_split(grp = row_number(), .keep = FALSE) %>%
map_dfr(~ .x %>%
transmute(diag_date = last(na.omit(unlist(.)))))
Or using base R with max.col
df$diag_date <- df[cbind(seq_len(nrow(df)), max.col(!is.na(df), 'last'))]
data
df <- data.frame(col1 = c(1, NA, 3), col2 = c(NA, 2, 4), col3 = c(2, NA, NA))
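Since the question starts from rowwise(), here is a sketch of how that attempt could be written with c_across(), assuming dplyr >= 1.0.0. It uses base tail() instead of last(), and it assumes every row has at least one non-missing value (otherwise the result has length 0 and mutate() would fail):

```r
library(dplyr)

df <- data.frame(col1 = c(1, NA, 3), col2 = c(NA, 2, 4), col3 = c(2, NA, NA))

result <- df %>%
  rowwise() %>%
  # c_across() collects the row's values into one vector;
  # tail(..., 1) picks the last non-missing value
  mutate(diag_date = tail(na.omit(c_across(everything())), 1)) %>%
  ungroup()
```

This is usually slower than pmap_dbl or max.col on large data, but it reads closest to the original apply() call.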

Using dplyr mutate (or other package) to create new column based on count of specific values in each row

I have a data frame containing several forms of data, such as:
<dbl> <chr> <dttm> <chr> <chr>
0001 cccc Feb-01-18 bbbb 1ab76
0002 bbbb Apr-02-20 cccc 7we54
...
What I'm trying to do is create a new column "f" that holds the number of specific character values (e.g., "cccc" or "bbbb") in each row. I've tried combining dplyr's mutate function with rowSums, but have not had any luck despite trying several variations.
df %>% mutate(new = rowSums(. == "cccc"))
Any guidance would be greatly appreciated. Thanks!

One option is to combine the two comparisons with |
library(dplyr)
df %>%
  mutate(f = rowSums(. == "cccc" | . == "bbbb"))
This can also be made more specific by checking only the columns of class character
df %>%
select_if(is.character) %>%
transmute(f = rowSums(. == "cccc" | . == "bbbb"))%>%
bind_cols(df, .)
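With dplyr >= 1.0.0, the same idea can be written without the select/bind_cols round trip by using across() and where() inside rowSums(). A sketch on made-up data (the column names and values below are illustrative, not the asker's):

```r
library(dplyr)

# Hypothetical data resembling the question's layout
df <- data.frame(id   = c(1, 2),
                 a    = c("cccc", "zzzz"),
                 b    = c("bbbb", "cccc"),
                 code = c("1ab76", "7we54"),
                 stringsAsFactors = FALSE)
targets <- c("cccc", "bbbb")

counted <- df %>%
  # TRUE/FALSE per character column, summed across each row
  mutate(f = rowSums(across(where(is.character), ~ .x %in% targets)))
```

Using %in% rather than == also means missing values count as non-matches instead of turning the row sum into NA.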

Base R solution:
df <- data.frame(a = c("c","b"), d = c("c", "c"), e = c(1,2), stringsAsFactors = FALSE)
pattern <- "c"
df["count"] <- rowSums(apply(df, 2, function(x, s = pattern) x %in% s))

Avoiding missing row after summarise

I'm using RStudio version 0.98.1028 on Windows. When summarising a multi-level data frame with dplyr and sum(), I lose a row whose sum is 0. In other words, if my original data frame is something like
group <- as.factor(rep(c('X', 'Y'), each = 1, times = 6))
type <- as.factor(rep(c('a', 'b'), each = 2, times = 3))
day <- as.factor(rep(1:3, each = 4))
df = data.frame(type = type, day = day, value = abs(rnorm(12)))
df = df[day != 1 | type != 'a',]
and I summarise it
df1 = df %>%
group_by(day, type) %>%
summarise(sum = sum(value))
then one row is missing: the combination of day = 1 and type = a, which I would like to keep (even if its sum is 0).
Thanks in advance!
EB

You could try left_join
library(dplyr)
left_join(expand.grid(type=unique(df$type), day=unique(df$day)), df1) %>%
group_by(day, type) %>%
summarise(sum=sum(value, na.rm=TRUE))
# day type sum
#1 1 a 0.0000000
#2 1 b 0.5132914
#3 2 a 1.2482210
#4 2 b 0.9232343
#5 3 a 2.0381779
#6 3 b 0.7558351
where df1 is the filtered data (not the summary from the question):
df1 <- df[day != 1 | type != 'a',]
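Because day and type are factors, another option that may be simpler is to keep empty factor combinations when grouping. A sketch assuming dplyr >= 1.0.0 (the .drop argument itself has been available since 0.8.0); the seed is arbitrary:

```r
library(dplyr)

set.seed(1)
type <- factor(rep(c("a", "b"), each = 2, times = 3))
day  <- factor(rep(1:3, each = 4))
df   <- data.frame(type = type, day = day, value = abs(rnorm(12)))
df   <- df[df$day != 1 | df$type != "a", ]   # drop the day 1 / type a rows

totals <- df %>%
  group_by(day, type, .drop = FALSE) %>%     # keep unused factor-level combinations
  summarise(sum = sum(value), .groups = "drop")
```

The day = 1, type = a group is empty, and the sum of an empty numeric vector is 0, so that row appears with sum 0.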
