ifelse in R with two or more conditions

How can I use a conditional statement in R to define value in a column based on two column conditions?
Data
Term (months) DayLate NEW_STATUS
12 0 .....
24 24 .....
17 30 .....
9 15 .....
36 21 .....
Pseudocode
if(term <= 12){
if(DayLate <= 14) then NEW_STATUS = "NORM"
if(DayLate between 15~30) then NEW_STATUS = "SPECIAL"
}else if(term > 12){
if(DayLate <= 29) then NEW_STATUS = "NORM"
if(DayLate between 30~89) then NEW_STATUS = "SPECIAL"
}

This can be done with nested conditional statements: ifelse() in base R, or if_else() and case_when() from dplyr.
# data
df <- structure(list(Term = c(12L, 24L, 17L, 9L, 36L), DayLate = c(0L,
24L, 30L, 15L, 21L)), class = "data.frame", row.names = c(NA, -5L))
(1) Base R
within(df,
NEW_STATUS <- ifelse(Term <= 12,
ifelse(DayLate <= 14, "NORM", "SPECIAL"),
ifelse(DayLate <= 29, "NORM", "SPECIAL"))
)
(2) dplyr
library(dplyr)
df %>% mutate(
NEW_STATUS = case_when(
Term <= 12 ~ if_else(DayLate <= 14, "NORM", "SPECIAL"),
TRUE ~ if_else(DayLate <= 29, "NORM", "SPECIAL")
)
)
Output
# Term DayLate NEW_STATUS
# 1 12 0 NORM
# 2 24 24 NORM
# 3 17 30 SPECIAL
# 4 9 15 SPECIAL
# 5 36 21 NORM
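Both answers only test the NORM limit, so a DayLate beyond the SPECIAL band (over 30 for terms of 12 months or less, over 89 for longer terms) still comes out as "SPECIAL". If such rows should be flagged instead, a minimal base R sketch can carry both upper limits per row. The helper name status_for and the choice of NA for out-of-range values are assumptions, not part of the question.

```r
# Hypothetical helper: NEW_STATUS from Term and DayLate, honoring both bands.
# Assumption: DayLate above the SPECIAL band is not covered, so it becomes NA.
status_for <- function(term, day_late) {
  short <- term <= 12
  norm_max <- ifelse(short, 14, 29)     # last DayLate value that is "NORM"
  special_max <- ifelse(short, 30, 89)  # last DayLate value that is "SPECIAL"
  ifelse(day_late <= norm_max, "NORM",
         ifelse(day_late <= special_max, "SPECIAL", NA_character_))
}

df <- data.frame(Term = c(12L, 24L, 17L, 9L, 36L),
                 DayLate = c(0L, 24L, 30L, 15L, 21L))
df$NEW_STATUS <- status_for(df$Term, df$DayLate)
df
```

For the sample data this gives the same result as the answers; it only differs for rows outside both bands.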

Related

How to return only the rows that meet a condition from a list in R?

I have a problem with my R code.
I have a list named Bought_list containing customer and checkout elements, where checkout is a data frame.
This is what checkout looks like:
items price qty total
Milk 10 2 20
Dolls 15 10 150
Chocolate 5 5 25
Toys 50 1 50
I want to know which items are for play_purpose and which are for date_purpose,
so I made two logical vectors:
play_purpose <- Bought_list[["checkout"]][,"total"] >= 50 & Bought_list[["checkout"]][,"total"] <= 150
date_purpose <- Bought_list[["checkout"]][,"total"] > 0 & Bought_list[["checkout"]][,"total"] < 50
How can I return the item names and total values that meet each condition, like this?
for play_purpose:
Dolls 150
Toys 50
for date_purpose:
Milk 20
Chocolate 25
I'm not clear on the structure of your data, but you could subset with your current code:
play_purpose <-
Bought_list[["checkout"]][Bought_list[["checkout"]][, "total"] >= 50 &
Bought_list[["checkout"]][, "total"] <= 150, c(1, 4)]
# items total
#2 Dolls 150
#4 Toys 50
date_purpose <-
Bought_list[["checkout"]][Bought_list[["checkout"]][, "total"] > 0 &
Bought_list[["checkout"]][, "total"] < 50, c(1, 4)]
# items total
#1 Milk 20
#3 Chocolate 25
Another option is to use dplyr:
library(dplyr)
Bought_list$checkout %>%
filter(total >= 50 & total <= 150) %>%
select(items, total)
Bought_list$checkout %>%
filter(total > 0 & total < 50) %>%
select(items, total)
Or, if you need to apply this to multiple data frames in the list, you could use map() from purrr:
library(purrr)
map(Bought_list, ~ .x %>%
filter(total >= 50 & total <= 150) %>%
select(items, total))
map(Bought_list, ~ .x %>%
filter(total > 0 & total < 50) %>%
select(items, total))
Data
Bought_list <- list(checkout = structure(list(items = c("Milk", "Dolls", "Chocolate",
"Toys"), price = c(10L, 15L, 5L, 50L), qty = c(2L, 10L, 5L, 1L
), total = c(20L, 150L, 25L, 50L)), class = "data.frame", row.names = c(NA,
-4L)))
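If you want both groups at once rather than two separate subsets, one option is to label each row with its purpose and then use split(). This is a base R sketch; it reuses the question's purpose names and assumes that rows outside both ranges (a total of 0 or less, or above 150) should simply be dropped, which split() does for rows whose label is NA.

```r
# Self-contained copy of the checkout data from the question
checkout <- data.frame(
  items = c("Milk", "Dolls", "Chocolate", "Toys"),
  price = c(10L, 15L, 5L, 50L),
  qty   = c(2L, 10L, 5L, 1L),
  total = c(20L, 150L, 25L, 50L),
  stringsAsFactors = FALSE
)

# Label each row with the purpose whose range its total falls into;
# rows matching neither range get NA
purpose <- ifelse(checkout$total > 0 & checkout$total < 50, "date_purpose",
           ifelse(checkout$total >= 50 & checkout$total <= 150, "play_purpose",
                  NA_character_))

# split() returns one data frame per purpose; NA labels are left out
by_purpose <- split(checkout[, c("items", "total")], purpose)
by_purpose
```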

Creating a column with factor variables conditional on multiple other columns?

I have 4 columns, called Amplification, CNV.gain, Homozygous.Deletion.Frequency and Heterozygous.Deletion.Frequency. I want to create a new column that, if any of the values in these 4 columns is:
greater than or equal to 5 and less than or equal to 10, returns Low;
greater than 10 and less than or equal to 20, returns Medium;
greater than 20, returns High.
An example of the final table (long_fused) would look like this:
CNV.Gain Amplification Homozygous.Deletion.Frequency Heterozygous.Deletion.Frequency Threshold
3 5 10 0 Low
0 0 11 8 Medium
7 16 25 0 High
So far I've tried the following code, but although it fills in the new column, it does so incorrectly.
library(dplyr)
long_fused <- long_fused %>%
mutate(Percent_sample_altered = case_when(
Amplification>=5 & Amplification < 10 & CNV.gain>=5 & CNV.gain < 10 | CNV.gain>=5 & CNV.gain<=10 & Homozygous.Deletion.Frequency>=5 & Homozygous.Deletion.Frequency<=10| Heterozygous.Deletion.Frequency>=5 & Heterozygous.Deletion.Frequency<=10 ~ 'Low',
Amplification>= 10 & Amplification<20 |CNV.gain>=10 & CNV.gain<20| Homozygous.Deletion.Frequency>= 10 & Homozygous.Deletion.Frequency<20 | Heterozygous.Deletion.Frequency>=10 & Heterozygous.Deletion.Frequency<20 ~ 'Medium',
Amplification>20 | CNV.gain >20 | Homozygous.Deletion.Frequency >20 | Heterozygous.Deletion.Frequency>20 ~ 'High'))
As always any help is appreciated!
Data in dput format
long_fused <-
structure(list(CNV.Gain = c(3L, 0L, 7L), Amplification = c(5L,
0L, 16L), Homozygous.Deletion.Frequency = c(10L, 11L, 25L),
Heterozygous.Deletion.Frequency = c(0L, 8L, 0L), Threshold =
c("Low", "Medium", "High")), class = "data.frame",
row.names = c(NA, -3L))
Here is a way with rowwise() followed by the base function cut().
library(dplyr)
long_fused %>%
rowwise() %>%
mutate(new = max(c_across(-Threshold)),
# intervals are [5,10], (10,20], (20,Inf]; a maximum below 5 gives NA
new = cut(new, c(5, 10, 20, Inf), labels = c("Low", "Medium", "High"), include.lowest = TRUE))
Here's an alternative using case_when:
library(dplyr)
long_fused %>%
mutate(max = do.call(pmax, select(., -Threshold)),
#If you don't have a Threshold column in your data, just use .
#mutate(max = do.call(pmax, .),
Threshold = case_when(between(max, 5, 10) ~ 'Low',
max > 10 & max <= 20 ~ 'Medium',
max > 20 ~ 'High'))
# CNV.Gain Amplification Homozygous.Deletion.Frequency
#1 3 5 10
#2 0 0 11
#3 7 16 25
# Heterozygous.Deletion.Frequency max Threshold
#1 0 10 Low
#2 8 11 Medium
#3 0 25 High
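Both answers first reduce each row to its largest value, which matches "if any of the values" as long as the highest band should win. The same idea works in base R without dplyr. This sketch assumes the four numeric columns are the only columns present (drop Threshold first if your data already has it) and that rows whose maximum is below 5 should be NA.

```r
long_fused <- data.frame(
  CNV.Gain = c(3L, 0L, 7L),
  Amplification = c(5L, 0L, 16L),
  Homozygous.Deletion.Frequency = c(10L, 11L, 25L),
  Heterozygous.Deletion.Frequency = c(0L, 8L, 0L)
)

# Largest value in each row across the four columns
row_max <- do.call(pmax, long_fused)

# cut() uses right-closed intervals by default; include.lowest = TRUE also
# closes the first one, giving [5, 10] -> Low, (10, 20] -> Medium,
# (20, Inf] -> High. Anything below 5 falls outside the breaks and becomes NA.
long_fused$Threshold <- cut(row_max,
                            breaks = c(5, 10, 20, Inf),
                            labels = c("Low", "Medium", "High"),
                            include.lowest = TRUE)
long_fused
```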

Correlation of similar variables in R

I have slightly edited the data table.
I would like to correlate variables with similar names in my dataset:
A_y B_y C_y A_p B_p C_p
1 15 52 32 30 98 56
2 30 99 60 56 46 25
3 10 25 31 20 22 30
..........
n 55 23 85 12 34 52
I would like to obtain correlation of
A_y-A_p: 0.78
B_y-B_p: 0.88
C_y-C_p: 0.93
How can I do it in R? Is it possible?
This is really dangerous. Behavior of data.frames with invalid column names is undefined by the language definition. Duplicated column names are invalid.
You should restructure your input data. Anyway, here is an approach with your input data.
DF <- read.table(text = " A B C A B C
1 15 52 32 30 98 56
2 30 99 60 56 46 25
3 10 25 31 20 22 30", header = TRUE, check.names = FALSE)
sapply(unique(names(DF)), function(s) do.call(cor, unname(DF[, names(DF) == s])))
# A B C
#0.9995544 0.1585501 -0.6004010
#compare:
cor(c(15, 30, 10), c(30, 56, 20))
#[1] 0.9995544
Here is another base R option
within(
rev(
stack(
Map(
function(x) do.call(cor, unname(x)),
split.default(df, gsub("_.*", "", names(df)))
)
)
),
ind <- sapply(
ind,
function(x) {
paste0(grep(paste0("^", x), names(df), value = TRUE),
collapse = "-"
)
}
)
)
which gives
ind values
1 A_y-A_p 0.9995544
2 B_y-B_p 0.1585501
3 C_y-C_p -0.6004010
Data
df <- structure(list(A_y = c(15L, 30L, 10L), B_y = c(52L, 99L, 25L),
C_y = c(32L, 60L, 31L), A_p = c(30L, 56L, 20L), B_p = c(98L,
46L, 22L), C_p = c(56L, 25L, 30L)), class = "data.frame", row.names = c("1",
"2", "3"))
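With the edited column names, you can also pair each _y column with its _p partner explicitly by name, so the result does not depend on column order. A base R sketch, assuming every _y column has a matching _p column:

```r
df <- data.frame(A_y = c(15L, 30L, 10L), B_y = c(52L, 99L, 25L),
                 C_y = c(32L, 60L, 31L), A_p = c(30L, 56L, 20L),
                 B_p = c(98L, 46L, 22L), C_p = c(56L, 25L, 30L))

# All "_y" columns and the "_p" partner each one should be paired with
y_cols <- grep("_y$", names(df), value = TRUE)
p_cols <- sub("_y$", "_p", y_cols)

# Correlation of each pair, labelled like "A_y-A_p"
r <- mapply(function(a, b) cor(df[[a]], df[[b]]), y_cols, p_cols)
names(r) <- paste(y_cols, p_cols, sep = "-")
r
```

The values agree with the two answers above (0.9995544, 0.1585501 and -0.6004010).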

Recode continuous variable in R based on conditions

I want to translate SPSS syntax into R code, but I'm a total beginner in R and am struggling to get it to work.
The SPSS syntax is
DO IF (Geschlecht = 0).
RECODE hang0 (SYSMIS=SYSMIS) (Lowest thru 22.99=0) (23 thru 55=1) (55.01 thru Highest=2)
INTO Hang.
ELSE IF (Geschlecht = 1).
RECODE hang0 (SYSMIS=SYSMIS) (Lowest thru 21.99=0) (22 thru 54=1) (54.01 thru Highest=2)
INTO Hang.
END IF.
I have installed the car package, but I can't get the range recoding to work. I tried
td_new$Hang <- recode(td_new$hang0, "0:22.99=0; 23:55=1; else=2")
Nor can I get the if-else construct to work. My last attempt was
if(td_new$Geschlecht == 0){
td_new$Hang <- td_new$hang0 = 3
} else if (td_new$Geschlecht == 1) {
td_new$Hang <- td_new$hang0 = 5)
} else
td_new$hang0 <- NA
(this was without the recoding, just to test the if-else function).
Would be very happy if someone helped!
Thanks a lot in advance :)!
Sorry, edited to add:
The data structure looks as follows:
Geschlecht hang0
0 15
1 45
1 7
0 11
And I want to recode hang0 such that
for boys (Geschlecht = 0): all values < 23 = 0, values from 23 to 55 = 1, all values > 55 = 2
and for girls (Geschlecht = 1): all values < 22 = 0, values from 22 to 54 = 1, all values > 54 = 2
Here's an approach with case_when:
library(dplyr)
td_new %>%
mutate(Hang = case_when(Geschlecht == 0 & hang0 < 23 ~ 0,
Geschlecht == 0 & hang0 >= 23 & hang0 <= 55 ~ 1,
Geschlecht == 0 & hang0 > 55 ~ 2,
Geschlecht == 1 & hang0 < 22 ~ 0,
Geschlecht == 1 & hang0 >= 22 & hang0 <= 54 ~ 1,
Geschlecht == 1 & hang0 > 54 ~ 2,
TRUE ~ NA_real_))
# Geschlecht hang0 Hang
#1 0 15 0
#2 1 45 1
#3 1 7 0
#4 0 11 0
The final TRUE ~ NA_real_ line returns NA for any row that matches none of the conditions, for example a missing hang0 value.
Data
td_new <- structure(list(Geschlecht = c(0L, 1L, 1L, 0L), hang0 = c(15L, 45L, 7L, 11L)), class = "data.frame", row.names = c(NA, -4L))
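For a base R version that mirrors the SPSS DO IF / RECODE logic, the cut points can be looked up per row from the Geschlecht code. This is a sketch: recode_hang is a hypothetical helper name, and it assumes Geschlecht is coded only as 0 or 1 (other codes or missing values give NA) and, as in the SPSS syntax, that the top of the middle band (55 or 54) is still coded 1.

```r
# Hypothetical helper: recode hang0 into 0/1/2 using sex-specific cut points
recode_hang <- function(sex, hang0) {
  key <- as.character(sex)
  # First value coded 1, per Geschlecht (0 = boys, 1 = girls)
  lower <- unname(c(`0` = 23, `1` = 22)[key])
  # Last value coded 1, per Geschlecht
  upper <- unname(c(`0` = 55, `1` = 54)[key])
  # Unknown codes give NA bounds, so the comparisons (and the result) are NA
  ifelse(hang0 < lower, 0, ifelse(hang0 <= upper, 1, 2))
}

td_new <- data.frame(Geschlecht = c(0L, 1L, 1L, 0L), hang0 = c(15L, 45L, 7L, 11L))
td_new$Hang <- recode_hang(td_new$Geschlecht, td_new$hang0)
td_new
```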

R programming: Subsetting rows on logical conditions

Sample data:
sampleData
Ozone Solar.R Wind Temp Month Day sampleData.Ozone
1 41 190 7.4 67 5 1 41
2 36 118 8.0 72 5 2 36
3 12 149 12.6 74 5 3 12
.........
I want to extract the records where Ozone > 31.
Here is the code:
data <- sampleData[sampleData$ozone > 31]
And get the error below:
Error in if (inherits(X[[j]], "data.frame") && ncol(xj) > 1L) X[[j]] <- as.matrix(X[[j]]) :
missing value where TRUE/FALSE needed
How should I correct it? Thanks!
R is case sensitive, so ozone has to match the column name Ozone in your data.frame. Also, to subset a data.frame you need two indices (row and column) separated by a comma. If there is nothing after the comma, all columns are selected:
sampleData[sampleData$Ozone > 31,]
Other methods to subset a data.frame:
subset(sampleData, Ozone > 31)
or with dplyr:
library(dplyr)
sampleData %>%
filter(Ozone > 31)
Result:
Ozone Solar.R Wind Temp Month Day sampleData.Ozone
1 41 190 7.4 67 5 1 41
2 36 118 8.0 72 5 2 36
Data:
sampleData = structure(list(Ozone = c(41L, 36L, 12L), Solar.R = c(190L, 118L,
149L), Wind = c(7.4, 8, 12.6), Temp = c(67L, 72L, 74L), Month = c(5L,
5L, 5L), Day = 1:3, sampleData.Ozone = c(41L, 36L, 12L)), .Names = c("Ozone",
"Solar.R", "Wind", "Temp", "Month", "Day", "sampleData.Ozone"
), class = "data.frame", row.names = c("1", "2", "3"))
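One difference between these methods shows up with missing values. The sample looks like the first rows of R's built-in airquality data, whose Ozone column does contain NAs. In this sketch a fourth row with a missing Ozone value is added purely for illustration: bracket indexing returns an all-NA row for it, while subset() and which() drop it (dplyr's filter() also drops rows where the condition is NA).

```r
# Illustration only: a fourth row with a missing Ozone value
sampleData <- data.frame(Ozone = c(41L, 36L, 12L, NA), Day = 1:4)

# Logical index is TRUE TRUE FALSE NA; the NA selects a row of NAs
with_na <- sampleData[sampleData$Ozone > 31, ]

# subset() treats NA conditions as FALSE
via_subset <- subset(sampleData, Ozone > 31)

# which() keeps only the positions that are TRUE
via_which <- sampleData[which(sampleData$Ozone > 31), ]

nrow(with_na)     # 3
nrow(via_subset)  # 2
nrow(via_which)   # 2
```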
