How to select the rows in R data.frame?

How can I select the rows that have the value 1 at least once across the 4 columns, or that have 0 in every column?

We can use filter with if_any/if_all
library(dplyr)
df1 %>%
filter(if_any(everything(), ~ . == 1) | if_all(everything(), ~ . == 0))
Or with base R
df1[(rowSums(df1 == 1) > 0) | (rowSums(df1 == 0) == ncol(df1)),]
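
As a rough illustration of the base R version, here is a minimal sketch on made-up data. The contents of df1 and the helper names has_one and all_zero are assumptions for the example, not part of the question.

```r
# Hypothetical sample data: four numeric columns
df1 <- data.frame(a = c(1, 0, 2, 0),
                  b = c(0, 0, 2, 1),
                  c = c(0, 0, 3, 0),
                  d = c(2, 0, 2, 0))

has_one  <- rowSums(df1 == 1) > 0          # at least one 1 in the row
all_zero <- rowSums(df1 == 0) == ncol(df1) # every column is 0

result <- df1[has_one | all_zero, ]
print(result)
```

Rows 1 and 4 are kept because they contain a 1, row 2 because it is all zeros, and row 3 is dropped. If the data can contain NA, rowSums(..., na.rm = TRUE) may be needed.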

Related

How to group by, then create a new subset in which the values are conditionally based on each group's first row and a particular value

I have a simple data frame like this
df <- data.frame(x=c(1,1,3,3,2,2,2,1),
y=c('a','b','a','b','e','a','d','c'))
I want to group by x, create a new data frame of 2 columns: 'x' and 'test'.
The value of 'test' will be based on conditions:
If the first row of a group has y == 'a' and 'c' appears among that group's values of y, then 'test' = 1, else 0
If the first row of a group has y == 'e' and 'd' appears among that group's values of y, then 'test' = 1, else 0
So the expected outcome would be as below
Thank you very much.

library(dplyr)
df %>%
group_by(x) %>%
summarise(test = ((first(y) == "a" && any(y == "c")) || (first(y) == "e" && any(y == "d"))) * 1L)

library(dplyr)
library(stringr)
df |>
group_by(x) |>
mutate(test = (row_number() == 1 & y == "a" & sum(str_detect(y, "c"))) |
(row_number() == 1 & y == "e" & sum(str_detect(y, "d")))) |>
summarize(test = sum(test))
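
For comparison, the same rule can be sketched in base R without any packages. The helper name flag_group is hypothetical, and the sketch assumes that "first row" means the first row of each group in the data's current order.

```r
df <- data.frame(x = c(1, 1, 3, 3, 2, 2, 2, 1),
                 y = c('a', 'b', 'a', 'b', 'e', 'a', 'd', 'c'))

# Hypothetical helper: 1 if the group starts with "a" and contains "c",
# or starts with "e" and contains "d"; otherwise 0
flag_group <- function(y) {
  as.integer((y[1] == "a" && any(y == "c")) ||
             (y[1] == "e" && any(y == "d")))
}

# split() keeps the original row order inside each group
test <- vapply(split(df$y, df$x), flag_group, integer(1))
result <- data.frame(x = as.numeric(names(test)), test = unname(test))
print(result)
```

With the data from the question this gives test = 1 for x = 1 and x = 2, and test = 0 for x = 3.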

Mutate in R conditional

I have a dataframe df with columns payment_type and payment_amount.
I want to do a conditional mutate such that if payment_type is "tpt" or "trt", payment_amount is multiplied by 5.
df$payment_amount<-df%>%select(payment_amount)%>%filter(payment_type=='tpt' | payment_type=='trt')%>%mutate(payment_amount=payment_amount*100)
But this isn't working.
TIA

Here is an alternative to Park's answer with %in%.
df %>%
mutate(payment_amount = case_when(
payment_type %in% c("tpt", "trt") ~ 5 * payment_amount,
TRUE ~ payment_amount
))

Try
df %>%
mutate(payment_amount = case_when(
payment_type == "tpt" | payment_type == "trt" ~ 5 * payment_amount,
TRUE ~ payment_amount
))
Approach based on question
df$payment_amount[(df$payment_type=='tpt' | df$payment_type == "trt")]<- df %>%
filter(payment_type=='tpt' | payment_type=='trt')%>%
select(payment_amount)%>%
mutate(payment_amount=payment_amount*100) %>% pull
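
A base R sketch of the same conditional update, using logical indexing instead of dplyr. The sample rows and the name idx are made up for illustration, and the factor 5 follows the wording of the question.

```r
# Hypothetical sample data
df <- data.frame(payment_type   = c("tpt", "abc", "trt"),
                 payment_amount = c(10, 20, 30))

# TRUE for the rows whose amount should be scaled
idx <- df$payment_type %in% c("tpt", "trt")

# Only the selected rows are changed; all others keep their value
df$payment_amount[idx] <- df$payment_amount[idx] * 5
print(df)
```

Here payment_amount becomes 50, 20, 150.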

How to count the number of changes in column (R)

df <- data.frame(Name=c('black','white','green','red','brown', 'blue'),
Num=c(1,1,1,0,1,0))
How many times does 1 change to 0 in the column Num? How can I count this in R?

One way is to use head, tail and count instances where the previous value was 1 and current value is 0.
sum(head(df$Num, -1) == 1 & tail(df$Num, -1) == 0)
#[1] 2
Using the same logic with dplyr lead/lag we can do
library(dplyr)
df %>% filter(Num == 0 & lag(Num) == 1) %>% nrow()
df %>% filter(Num == 1 & lead(Num) == 0) %>% nrow()

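We can also use rle from base R. Every run of 1s except the last run is followed by a 0 (assuming Num holds only 0 and 1), so we count the 1-runs after dropping the final run
sum(head(rle(df$Num)$values, -1) == 1)
#[1] 2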
Or with rleid from data.table
library(data.table)
# counts the runs of 1s, which equals the number of 1 -> 0 changes only when Num ends in 0
nrow(setDT(df)[, .N[any(Num > 0)] , rleid(Num)])
#[1] 2
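
Another possible sketch uses diff from base R. It assumes Num contains only 0 and 1 with no NA, so a step from 1 to 0 is the only way to get a difference of -1. The name n_changes is made up for the example.

```r
df <- data.frame(Name = c('black', 'white', 'green', 'red', 'brown', 'blue'),
                 Num  = c(1, 1, 1, 0, 1, 0))

# diff() gives Num[i + 1] - Num[i]; for 0/1 data, -1 marks a 1 -> 0 change
n_changes <- sum(diff(df$Num) == -1)
print(n_changes)
```

For the data above, the differences are 0, 0, -1, 1, -1, so the count is 2.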

Creating functions with logical comparatives as input R

I've got several sequential comparative evaluations to conduct with two variables in R in order to check for concordance.
In this example, say I have a boolean ANES_6 and a numeric ANES. The boolean is 1 if the patient had anesthesia for more than 6 hours, 0 else. The numeric value is the time the patient was under anesthesia.
I'm looking to write a function which can replace multiple copy-pastes of the following:
data %>% select(ANES_6, ANES) %>%
filter(ANES_6 == 1 & ANES < 6) %>%
tally()
data %>% select(ANES_6, ANES) %>%
filter(ANES_6 == 0 & ANES >= 6) %>%
tally()
data %>% select(ANES_6, ANES) %>%
filter(ANES_6 == 1 & ANES >= 6) %>%
tally()
data %>% select(ANES_6, ANES) %>%
filter(ANES_6 == 0 & ANES < 6) %>%
tally()
I could create the following function (non-exhaustive of all cases shown above):
my_func <- function(x, y) {
if (x == "gt" && y == 1) {
data %>% select(ANES_6, ANES) %>%
filter(ANES >= 6 & ANES_6 == 1) %>%
tally()
} else if (x == "lt" && y == 0) {
data %>% select(ANES_6, ANES) %>%
filter(ANES < 6 & ANES_6 != 1) %>%
tally()
}}
which takes x and y as input, with values for x being c('lt', 'gt') and y being c(0, 1), in order to evaluate all possible conditions. However, this would entail writing more code, not less.
Is there a way to input logical comparisons in the function such that the following works:
my_func <- function(x, y) {
data %>% select(ANES_6, ANES) %>%
filter(ANES x 6 & ANES_6 == y)
}
with x replaced by >=, <, etc. in the input of the function. Currently, this does not work. Are there any workarounds?

Try grouping. The question should normally include reproducible test data but I have provided it this time.
library(dplyr)
data <- data.frame(ANES_6 = c(0, 0, 1, 1), ANES = 5:6) # test data
data %>%
group_by(ANES_6, ANES >= 6) %>%
tally %>%
ungroup
giving:
# A tibble: 4 x 3
ANES_6 `ANES >= 6` n
<dbl> <lgl> <int>
1 0. FALSE 1
2 0. TRUE 1
3 1. FALSE 1
4 1. TRUE 1
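
If a function that takes the comparison as input is still wanted, one possible sketch is to pass the operator as a string and look it up with match.fun from base R. The function name count_matches and its arguments are hypothetical, and the threshold of 6 is taken from the question.

```r
# Hypothetical helper: count rows where `ANES <op> threshold` holds
# and ANES_6 equals `flag`
count_matches <- function(data, op, flag, threshold = 6) {
  cmp <- match.fun(op)  # turns ">=", "<", ... into a callable function
  sum(cmp(data$ANES, threshold) & data$ANES_6 == flag)
}

data <- data.frame(ANES_6 = c(0, 0, 1, 1), ANES = c(5, 6, 5, 6))  # test data

n_ge_1 <- count_matches(data, ">=", 1)  # ANES >= 6 and ANES_6 == 1
n_lt_0 <- count_matches(data, "<", 0)   # ANES <  6 and ANES_6 == 0
print(c(n_ge_1, n_lt_0))
```

In R, operators such as >= are ordinary functions, so match.fun(">=")(a, b) is the same as a >= b. Both calls above return 1 for this test data.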

R get rows based on multiple conditions - use dplyr and reshape2

df <- data.frame(
exp=c(1,1,2,2),
name=c("gene1", "gene2", "gene1", "gene2"),
value=c(1,1,3,-1)
)
While trying to get accustomed to dplyr and reshape2, I stumbled over finding a "simple" way to select rows based on several conditions. I want those genes (the name variable) that have value above 0 in experiment 1 (exp == 1) AND at the same time value below 0 in experiment 2; in df this would be "gene2". Surely there must be many ways to do this, e.g. subset df for each set of conditions (exp == 1 & value > 0, and exp == 2 & value < 0) and then join the results of these subsets:
library(dplyr)
inner_join(filter(df, exp == 1 & value > 0), filter(df, exp == 2 & value < 0), by = c("name" = "name"))$name
Although this works, it looks very awkward, and I feel that such conditional filtering lies at the heart of reshape2 and dplyr, but I cannot figure out how to do it. Can someone enlighten me here?

One alternative that comes to mind is to transform the data to a "wide" format and then do the filtering.
Here's an example using "data.table" (for the convenience of compound-statements):
library(data.table)
dcast(as.data.table(df), name ~ exp, value.var = "value")[`1` > 0 & `2` < 0]
# name 1 2
# 1: gene2 1 -1
Similarly, with "dplyr" and "tidyr":
library(dplyr)
library(tidyr)
df %>%
spread(exp, value) %>%
filter(`1` > 0 & `2` < 0)
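
In newer versions of tidyr (1.0.0 and later), spread is superseded by pivot_wider. A minimal sketch of the same reshape-then-filter idea follows; the names_prefix "exp" and the object names wide and result are choices made for this example, used to avoid backticked column names.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  exp   = c(1, 1, 2, 2),
  name  = c("gene1", "gene2", "gene1", "gene2"),
  value = c(1, 1, 3, -1)
)

# One row per gene, one column per experiment (exp1, exp2)
wide <- df %>%
  pivot_wider(names_from = exp, values_from = value, names_prefix = "exp")

result <- wide %>% filter(exp1 > 0, exp2 < 0)
print(result)
```

Only gene2 remains, with exp1 = 1 and exp2 = -1.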

Another dplyr option is:
group_by(df, name) %>% filter(value[exp == 1] > 0 & value[exp == 2] < 0)
#Source: local data frame [2 x 3]
#Groups: name
#
# exp name value
#1 1 gene2 1
#2 2 gene2 -1

Probably this is even more convoluted than your own solution, but I think it has a "dplyr" feel:
df %>%
filter((exp == 1 & value > 0) | (exp == 2 & value < 0)) %>%
group_by(name) %>%
filter(length(unique(exp)) == 2) %>%
select(name) %>%
unique()
#Source: local data frame [1 x 1]
#Groups: name
# name
#1 gene2

filter accepts multiple conditions separated by commas, the same way select accepts multiple columns. Each extra condition is combined with AND:
group_by(df, name) %>% filter(value[exp == 1] > 0, value[exp == 2] < 0)
From official documentation: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
The examples shown there are:
flights[flights$month == 1 & flights$day == 1, ] in base R
filter(flights, month == 1, day == 1) in dplyr.