Recode continuous variable in R based on conditions

I want to "translate" a syntax written in SPSS into R code but am a total beginner in R and struggling to get it to work.
The SPSS syntax is
DO IF (Geschlecht = 0).
RECODE hang0 (SYSMIS=SYSMIS) (Lowest thru 22.99=0) (23 thru 55=1) (55.01 thru Highest=2)
INTO Hang.
ELSE IF (Geschlecht = 1).
RECODE hang0 (SYSMIS=SYSMIS) (Lowest thru 21.99=0) (22 thru 54=1) (54.01 thru Highest=2)
INTO Hang.
END IF.
I have installed the "car" package in R, but I can't get the range recoding to work. I have tried:
td_new$Hang <- recode(td_new$hang0, "0:22.99=0; 23:55=1; else=2")
I also can't get the if/else construct to work. My last attempt was:
if(td_new$Geschlecht == 0){
td_new$Hang <- td_new$hang0 = 3
} else if (td_new$Geschlecht == 1) {
td_new$Hang <- td_new$hang0 = 5)
} else
td_new$hang0 <- NA
(this was without the recoding, just to test the if-else function).
Would be very happy if someone helped!
Thanks a lot in advance :)!
Sorry, edited to add:
The data structure looks as follows:
Geschlecht hang0
0 15
1 45
1 7
0 11
And I want to recode hang0 such that
for boys (Geschlecht = 0): all values < 23 = 0, values between 23 and 55 = 1, all values > 55 = 2
and for girls (Geschlecht = 1): all values < 22 = 0, values between 22 and 54 = 1, all values > 54 = 2

Here's an approach with case_when:
library(dplyr)
# Use == (comparison) here; = inside case_when() would be read as an argument name
td_new %>%
  mutate(Hang = case_when(Geschlecht == 0 & hang0 < 23 ~ 0,
                          Geschlecht == 0 & hang0 >= 23 & hang0 <= 55 ~ 1,
                          Geschlecht == 0 & hang0 > 55 ~ 2,
                          Geschlecht == 1 & hang0 < 22 ~ 0,
                          Geschlecht == 1 & hang0 >= 22 & hang0 <= 54 ~ 1,
                          Geschlecht == 1 & hang0 > 54 ~ 2,
                          TRUE ~ NA_real_))
# Geschlecht hang0 Hang
#1 0 15 0
#2 1 45 1
#3 1 7 0
#4 0 11 0
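The final line (TRUE ~ NA_real_) assigns NA to any row that matches none of the conditions, for example when hang0 or Geschlecht is missing.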
Data
td_new <- structure(list(Geschlecht = c(0L, 1L, 1L, 0L), hang0 = c(15L, 45L, 7L, 11L)), class = "data.frame", row.names = c(NA, -4L))
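A base R sketch of the same recoding, without dplyr, is below. It also illustrates why the if/else attempt in the question cannot work on a whole column: if() evaluates a single TRUE/FALSE value (R 4.2.0 and later throw an error when the condition has length greater than one), while ifelse() works element by element. The helper name hang_category is hypothetical, and it assumes that values falling between the SPSS ranges (such as 22.995 for boys) should count as 0.

```r
# Example data from the question
td_new <- data.frame(Geschlecht = c(0L, 1L, 1L, 0L),
                     hang0 = c(15L, 45L, 7L, 11L))

# Hypothetical helper: 0 below `lower`, 1 from `lower` to `upper` (inclusive), 2 above
hang_category <- function(x, lower, upper) {
  ifelse(x < lower, 0, ifelse(x <= upper, 1, 2))
}

# Vectorized equivalent of DO IF / ELSE IF; any other Geschlecht value (incl. NA) gives NA
td_new$Hang <- ifelse(td_new$Geschlecht == 0,
                      hang_category(td_new$hang0, 23, 55),
                      ifelse(td_new$Geschlecht == 1,
                             hang_category(td_new$hang0, 22, 54),
                             NA))
td_new
```

Because ifelse() propagates NA, a missing hang0 also ends up as NA, which mirrors the (SYSMIS=SYSMIS) part of the SPSS syntax.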

Related

How to return only the items that meet a condition from a list in R?

I have a problem with my R code.
I have a list named Bought_list containing customer and checkout elements (checkout is a data frame).
This is how checkout looks:
items price qty total
Milk 10 2 20
Dolls 15 10 150
Chocolate 5 5 25
Toys 50 1 50
I want to know which items are for play_purpose and which are for date_purpose, so I made logical vectors:
play_purpose <- Bought_list[["checkout"]][,"total"] >= 50 & Bought_list[["checkout"]][,"total"] <= 150
date_purpose <- Bought_list[["checkout"]][,"total"] > 0 & Bought_list[["checkout"]][,"total"] < 50
How can I return the item names and totals that meet each condition, like this?
for play_purpose:
Dolls 150
Toys 50
for date_purpose:
Milk 20
Chocolate 25
I'm not clear on the structure of your data, but you could subset with your current code:
play_purpose <-
Bought_list[["checkout"]][Bought_list[["checkout"]][, "total"] >= 50 &
Bought_list[["checkout"]][, "total"] <= 150, c(1, 4)]
# items total
#2 Dolls 150
#4 Toys 50
date_purpose <-
Bought_list[["checkout"]][Bought_list[["checkout"]][, "total"] > 0 &
Bought_list[["checkout"]][, "total"] < 50, c(1, 4)]
# items total
#1 Milk 20
#3 Chocolate 25
Another option is to use dplyr:
library(dplyr)
Bought_list$checkout %>%
filter(total >= 50 & total <= 150) %>%
select(items, total)
Bought_list$checkout %>%
filter(total > 0 & total < 50) %>%
select(items, total)
Or, if you need to apply this to multiple data frames in the list, you could use map() from purrr:
library(purrr)
map(Bought_list, ~ .x %>%
filter(total >= 50 & total <= 150) %>%
select(items, total))
map(Bought_list, ~ .x %>%
filter(total > 0 & total < 50) %>%
select(items, total))
Data
Bought_list <- list(checkout = structure(list(items = c("Milk", "Dolls", "Chocolate",
"Toys"), price = c(10L, 15L, 5L, 50L), qty = c(2L, 10L, 5L, 1L
), total = c(20L, 150L, 25L, 50L)), class = "data.frame", row.names = c(NA,
-4L)))
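If you want both groups in one step, a base R sketch is to label each row with its purpose and split() the data frame by that label. It assumes checkout is a plain data frame like the one in the Data section (in your own code, use Bought_list$checkout); rows whose total matches neither range get NA and are dropped, because split() ignores NA groups.

```r
# Example data from the question
checkout <- data.frame(items = c("Milk", "Dolls", "Chocolate", "Toys"),
                       price = c(10L, 15L, 5L, 50L),
                       qty = c(2L, 10L, 5L, 1L),
                       total = c(20L, 150L, 25L, 50L),
                       stringsAsFactors = FALSE)

# Label every row with the purpose its total falls into (NA if neither)
purpose <- ifelse(checkout$total >= 50 & checkout$total <= 150, "play_purpose",
                  ifelse(checkout$total > 0 & checkout$total < 50, "date_purpose", NA))

# One data frame per purpose, keeping only items and total
by_purpose <- split(checkout[, c("items", "total")], purpose)
by_purpose$play_purpose
by_purpose$date_purpose
```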

Creating a column with factor variables conditional on multiple other columns?

I have 4 columns, called Amplification, CNV.Gain, Homozygous.Deletion.Frequency and Heterozygous.Deletion.Frequency. I want to create a new column that, if any of the values in these 4 columns is:
greater than or equal to 5 and less than or equal to 10, returns Low
greater than 10 and less than or equal to 20, returns Medium
greater than 20, returns High
An example of the final table (long_fused) would look like this:
CNV.Gain Amplification Homozygous.Deletion.Frequency Heterozygous.Deletion.Frequency Threshold
3        5             10                            0                               Low
0        0             11                            8                               Medium
7        16            25                            0                               High
So far I've tried the following code. It does fill in a new column, but with incorrect values.
library(dplyr)
long_fused <- long_fused %>%
mutate(Percent_sample_altered = case_when(
Amplification>=5 & Amplification < 10 & CNV.gain>=5 & CNV.gain < 10 | CNV.gain>=5 & CNV.gain<=10 & Homozygous.Deletion.Frequency>=5 & Homozygous.Deletion.Frequency<=10| Heterozygous.Deletion.Frequency>=5 & Heterozygous.Deletion.Frequency<=10 ~ 'Low',
Amplification>= 10 & Amplification<20 |CNV.gain>=10 & CNV.gain<20| Homozygous.Deletion.Frequency>= 10 & Homozygous.Deletion.Frequency<20 | Heterozygous.Deletion.Frequency>=10 & Heterozygous.Deletion.Frequency<20 ~ 'Medium',
Amplification>20 | CNV.gain >20 | Homozygous.Deletion.Frequency >20 | Heterozygous.Deletion.Frequency>20 ~ 'High'))
As always any help is appreciated!
Data in dput format
long_fused <-
structure(list(CNV.Gain = c(3L, 0L, 7L), Amplification = c(5L,
0L, 16L), Homozygous.Deletion.Frequency = c(10L, 11L, 25L),
Heterozygous.Deletion.Frequency = c(0L, 8L, 0L), Threshold =
c("Low", "Medium", "High")), class = "data.frame",
row.names = c(NA, -3L))
Here is a way with rowwise() followed by the base function cut().
library(dplyr)
long_fused %>%
  rowwise() %>%
  mutate(new = max(c_across(-Threshold)),
         # include.lowest = TRUE closes the first interval: [5, 10], (10, 20], (20, Inf]
         new = cut(new, c(5, 10, 20, Inf), labels = c("Low", "Medium", "High"),
                   include.lowest = TRUE))
Here's an alternative using pmax() and case_when():
library(dplyr)
long_fused %>%
  mutate(max = do.call(pmax, select(., -Threshold)),
         # If you don't have a Threshold column in your data, just use . instead:
         # mutate(max = do.call(pmax, .),
         Threshold = case_when(max >= 5 & max <= 10 ~ 'Low',
                               max > 10 & max <= 20 ~ 'Medium',
                               max > 20 ~ 'High'))
# CNV.Gain Amplification Homozygous.Deletion.Frequency
#1 3 5 10
#2 0 0 11
#3 7 16 25
# Heterozygous.Deletion.Frequency max Threshold
#1 0 10 Low
#2 8 11 Medium
#3 0 25 High
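Both answers label each row by its largest value across the four columns. A base R sketch of the same idea, without dplyr, is below. It uses include.lowest = TRUE so the intervals are [5, 10], (10, 20] and (20, Inf], matching the ranges in the question; a row whose maximum is below 5 gets NA. The data frame is rebuilt without the Threshold column so that pmax() only sees the four numeric columns.

```r
long_fused <- data.frame(CNV.Gain = c(3L, 0L, 7L),
                         Amplification = c(5L, 0L, 16L),
                         Homozygous.Deletion.Frequency = c(10L, 11L, 25L),
                         Heterozygous.Deletion.Frequency = c(0L, 8L, 0L))

# Row-wise maximum of the four columns (a data frame is a list of columns)
row_max <- do.call(pmax, long_fused)

# Intervals [5, 10], (10, 20], (20, Inf]; anything below 5 becomes NA
long_fused$Threshold <- cut(row_max, breaks = c(5, 10, 20, Inf),
                            labels = c("Low", "Medium", "High"),
                            include.lowest = TRUE)
long_fused
```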

How to extract the previous n rows where a certain column value cannot be a particular value?

I've been searching for quite some time now with no luck. Essentially, I'm trying to figure out a way in R to extract, starting from each row where the "LTO" column is 1, the previous n rows where the "LTO" column is 0.
Data table:
Week Price LTO
1/1/2019 11 0
2/1/2019 12 0
3/1/2019 11 0
4/1/2019 11 0
5/1/2019 9.5 1
6/1/2019 10 0
7/1/2019 8 1
For example, with n = 3, starting from 5/1/2019 where LTO = 1, I want to pull the rows 4/1/2019, 3/1/2019 and 2/1/2019.
Then for 7/1/2019, where LTO is also 1, I want to grab the rows 6/1/2019, 4/1/2019 and 3/1/2019. In this case it skips the row 5/1/2019 because it has a 1 in the LTO column.
Any help would be much appreciated.
There could be a better way to do this; here is one attempt using base R.
#Number of rows to look back
n <- 3
#Find row index where LTO is 1.
inds <- which(df$LTO == 1)
#Remove row index where LTO is 1
remaining_rows <- setdiff(seq_len(nrow(df)), inds)
#For every inds find the previous n rows from remaining_rows
#use it to subset from the dataframe and add a new column week2
#with its corresponding date
do.call(rbind, lapply(inds, function(x) {
o <- match(x - 1, remaining_rows)
transform(df[remaining_rows[o:(o - (n -1))], ], week2 = df$Week[x])
}))
# Week Price LTO week2
#4 4/1/2019 11 0 5/1/2019
#3 3/1/2019 11 0 5/1/2019
#2 2/1/2019 12 0 5/1/2019
#6 6/1/2019 10 0 7/1/2019
#41 4/1/2019 11 0 7/1/2019
#31 3/1/2019 11 0 7/1/2019
Data
df <- structure(list(Week = structure(1:7, .Label = c("1/1/2019",
"2/1/2019", "3/1/2019", "4/1/2019", "5/1/2019", "6/1/2019", "7/1/2019"), class =
"factor"), Price = c(11, 12, 11, 11, 9.5, 10, 8), LTO = c(0L, 0L, 0L,
0L, 1L, 0L, 1L)), class = "data.frame", row.names = c(NA, -7L))
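The match(x - 1, remaining_rows) step above assumes that the row just before each LTO = 1 row has LTO = 0 and that at least n such rows exist earlier. With two consecutive LTO = 1 rows, match() returns NA and the sequence o:(o - (n - 1)) fails; near the top of the data the sequence can go negative, which fails with a subscript error. A sketch that avoids both issues takes, for each LTO = 1 row, the last n LTO = 0 rows before it with tail(). The function name previous_zero_rows is hypothetical, and when fewer than n earlier rows exist it simply returns the ones available.

```r
df <- data.frame(
  Week = c("1/1/2019", "2/1/2019", "3/1/2019", "4/1/2019",
           "5/1/2019", "6/1/2019", "7/1/2019"),
  Price = c(11, 12, 11, 11, 9.5, 10, 8),
  LTO = c(0L, 0L, 0L, 0L, 1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each LTO == 1 row, the previous n rows with LTO == 0
previous_zero_rows <- function(df, n) {
  inds <- which(df$LTO == 1)       # rows where LTO is 1
  zero_rows <- which(df$LTO == 0)  # candidate rows where LTO is 0
  do.call(rbind, lapply(inds, function(x) {
    # LTO == 0 rows strictly before x; keep at most the last n, nearest first
    prev <- rev(tail(zero_rows[zero_rows < x], n))
    if (length(prev) == 0) return(NULL)  # nothing earlier to return
    transform(df[prev, ], week2 = df$Week[x])
  }))
}

result <- previous_zero_rows(df, n = 3)
result
```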

ifelse in r with two or more conditions

How can I use a conditional statement in R to define value in a column based on two column conditions?
Data
Term(in month) DayLate NEW_STATUS
12 0 .....
24 24 .....
17 30 .....
9 15 .....
36 21 .....
Pseudocode
if(term <= 12){
if(DayLate <= 14) then NEW_STATUS = "NORM"
if(DayLate between 15~30) then NEW_STATUS = "SPECIAL"
}else if(term > 12){
if(DayLate <= 29) then NEW_STATUS = "NORM"
if(DayLate between 30~89) then NEW_STATUS = "SPECIAL"
}
This can be done with nested conditional statements: ifelse() in base R, or if_else() and case_when() in dplyr.
# data
df <- structure(list(Term = c(12L, 24L, 17L, 9L, 36L), DayLate = c(0L,
24L, 30L, 15L, 21L)), class = "data.frame", row.names = c(NA, -5L))
(1) base way
within(df,
NEW_STATUS <- ifelse(Term <= 12,
ifelse(DayLate <= 14, "NORM", "SPECIAL"),
ifelse(DayLate <= 29, "NORM", "SPECIAL"))
)
(2) dplyr
library(dplyr)
df %>% mutate(
  NEW_STATUS = case_when(
    Term <= 12 ~ if_else(DayLate <= 14, "NORM", "SPECIAL"),
    TRUE ~ if_else(DayLate <= 29, "NORM", "SPECIAL")
  )
)
Output
# Term DayLate NEW_STATUS
# 1 12 0 NORM
# 2 24 24 NORM
# 3 17 30 SPECIAL
# 4 9 15 SPECIAL
# 5 36 21 NORM
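Both versions above label every value past the NORM range as SPECIAL, which matches the sample output. The pseudocode, however, only defines SPECIAL up to 30 days (term <= 12) or 89 days (term > 12). If values beyond those limits should stay unclassified, a base R sketch that follows the pseudocode literally could look like this; new_status is a hypothetical helper name, and returning NA outside the stated ranges is an assumption.

```r
df <- data.frame(Term = c(12L, 24L, 17L, 9L, 36L),
                 DayLate = c(0L, 24L, 30L, 15L, 21L))

# Hypothetical helper: thresholds depend on whether the term is <= 12 months
new_status <- function(term, day_late) {
  norm_max    <- ifelse(term <= 12, 14, 29)  # last day counted as NORM
  special_max <- ifelse(term <= 12, 30, 89)  # last day counted as SPECIAL
  ifelse(day_late <= norm_max, "NORM",
         ifelse(day_late <= special_max, "SPECIAL", NA_character_))
}

df$NEW_STATUS <- new_status(df$Term, df$DayLate)
df
```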

How to add modified row to dataframe in R?

I have made a function that increments the values in certain columns of a certain row. It subsets my data frame to find the row it needs (by looking at sex, then age, then deprivation, then number of partners), adds to whichever columns need it (depending on these risk factors), and then calculates the risk (my code is for STI testing).
However, this does not change my existing dataframe with the new values, but creates a new variable patientRow which holds these new values. I need help with how I can incorporate this into my existing dataframe. Thanks!
adaptRisk <- function(dataframe, sexNum, ageNum, deprivationNum,
partnerNum, testResult){
sexRisk = subset(dataframe, sex == sexNum)
ageRisk = subset(sexRisk, age == ageNum)
depRisk = subset(ageRisk, deprivation == deprivationNum)
patientRow = subset(depRisk, partners == partnerNum)
if (testResult == "positive") {
patientRow$tested <- patientRow$tested + 1
patientRow$infected <- patientRow$infected + 1
}
else if (testResult == "negative") {
patientRow$tested <- patientRow$tested + 1
}
patientRow <- transform(patientRow, risk = infected/tested)
return(patientRow)
}
This is the head of my dataframe to give you an idea:
sex age deprivation partners tested infected risk
1 Female 16-19 1-2 0-1 132 1 0.007575758
2 Female 16-19 1-2 2 25 1 0.040000000
3 Female 16-19 1-2 >=3 30 1 0.033333333
4 Female 16-19 3 0-1 80 2 0.025000000
5 Female 16-19 3 2 12 1 0.083333333
6 Female 16-19 3 >=3 18 1 0.055555556
The dput of my data is:
structure(list(sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label =
c("Female",
"Male"), class = "factor"), age = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = c("16-19", "20-24", "25-34", "35-44"), class =
"factor"),
deprivation = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("1-2",
"3", "4-5"), class = "factor"), partners = structure(c(2L,
3L, 1L, 2L, 3L, 1L), .Label = c(">=3", "0-1", "2"), class = "factor"),
tested = c(132L, 25L, 30L, 80L, 12L, 18L), infected = c(1L,
1L, 1L, 2L, 1L, 1L), uninfected = c(131L, 24L, 29L, 78L,
11L, 17L), risk = c(0.00757575757575758, 0.04, 0.0333333333333333,
0.025, 0.0833333333333333, 0.0555555555555556)), .Names = c("sex",
"age", "deprivation", "partners", "tested", "infected", "uninfected",
"risk"), row.names = c(NA, 6L), class = "data.frame")
An example call to the function:
adaptRisk(data, "Female", "16-19", 3, 2, "positive")
sex age deprivation partners tested infected uninfected risk
5 Female 16-19 3 2 13 2 11 0.1538462
I have adjusted your function (see all the way below) using base R syntax. It does the job, but is not the most beautiful code.
Issue:
The subsets create several extra (and unnecessary) data frames instead of replacing the values in place when the conditions match. The function also returned a different data frame (only the matching rows), so your existing data frame was never updated.
I adjusted it so that the filters are applied directly to the values you want to change.
transform() is not needed here: only the risk value of the affected row is recalculated.
You might want to build in some warnings or stop() calls in case the filters return more than one record.
You can now use
df <- adaptRisk(df, "Female", "16-19", "3", "2", "positive")
to replace the values in the data frame you supply to the function.
Examples:
# affects row 5
adaptRisk(df, "Female", "16-19", "3", "2", "positive")
sex age deprivation partners tested infected uninfected risk
1 Female 16-19 1-2 0-1 132 1 131 0.007575758
2 Female 16-19 1-2 2 25 1 24 0.040000000
3 Female 16-19 1-2 >=3 30 1 29 0.033333333
4 Female 16-19 3 0-1 80 2 78 0.025000000
5 Female 16-19 3 2 13 2 11 0.153846154
6 Female 16-19 3 >=3 18 1 17 0.055555556
# affects row 5
adaptRisk(df, "Female", "16-19", "3", "2", "negative")
sex age deprivation partners tested infected uninfected risk
1 Female 16-19 1-2 0-1 132 1 131 0.007575758
2 Female 16-19 1-2 2 25 1 24 0.040000000
3 Female 16-19 1-2 >=3 30 1 29 0.033333333
4 Female 16-19 3 0-1 80 2 78 0.025000000
5 Female 16-19 3 2 13 1 11 0.076923077
6 Female 16-19 3 >=3 18 1 17 0.055555556
function:
adaptRisk <- function(data, sexNum, ageNum, deprivationNum,
partnerNum, testResult){
if (testResult == "positive") {
data$tested[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum] <- data$tested[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum] + 1
data$infected[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum] <- data$infected[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum] + 1
data$risk[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum] <- data$infected[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum]/data$tested[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum]
}
else if (testResult == "negative") {
data$tested[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum] <- data$tested[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum] + 1
data$risk[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum] <- data$infected[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum]/data$tested[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum]
}
return(data)
}
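The repeated filter in the function above can be computed once and reused. The sketch below does that and also adds the stop() check suggested earlier, for the case where the filters match more than one row (or none). Like the version above, it leaves the uninfected column unchanged; the variable name idx and the error message are illustrative.

```r
# Example data rebuilt from the dput in the question
df <- data.frame(
  sex = factor(rep("Female", 6), levels = c("Female", "Male")),
  age = factor(rep("16-19", 6), levels = c("16-19", "20-24", "25-34", "35-44")),
  deprivation = factor(c("1-2", "1-2", "1-2", "3", "3", "3"),
                       levels = c("1-2", "3", "4-5")),
  partners = factor(c("0-1", "2", ">=3", "0-1", "2", ">=3"),
                    levels = c(">=3", "0-1", "2")),
  tested = c(132L, 25L, 30L, 80L, 12L, 18L),
  infected = c(1L, 1L, 1L, 2L, 1L, 1L),
  uninfected = c(131L, 24L, 29L, 78L, 11L, 17L)
)
df$risk <- df$infected / df$tested

adaptRisk <- function(data, sexNum, ageNum, deprivationNum,
                      partnerNum, testResult) {
  # Compute the row filter once; which() drops NA comparisons
  idx <- which(data$sex == sexNum &
                 data$age == ageNum &
                 data$deprivation == deprivationNum &
                 data$partners == partnerNum)
  if (length(idx) != 1) {
    stop("Expected exactly one matching row, found ", length(idx))
  }
  if (testResult == "positive") {
    data$tested[idx] <- data$tested[idx] + 1
    data$infected[idx] <- data$infected[idx] + 1
  } else if (testResult == "negative") {
    data$tested[idx] <- data$tested[idx] + 1
  }
  # Only the matching row's risk is recalculated
  data$risk[idx] <- data$infected[idx] / data$tested[idx]
  data
}

updated <- adaptRisk(df, "Female", "16-19", "3", "2", "positive")
updated[5, ]
```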
The function outputs a single row that, apparently, you intend to use to replace the original row(s). Because subset() keeps the original row names, you could replace the original row by doing something like this:
## original data frame is named patientData
patientRow <- adaptRisk(patientData, "Female", "16-19", 3, 2, "positive")
patientData[row.names(patientRow), ] <- patientRow
