I have written a function that increments the values in certain columns of a particular row. It subsets my dataframe to find the row it needs (by sex, then age, then deprivation, then number of partners), adds to whichever columns need updating (depending on the test result), and then recalculates the risk (my code is for STI testing).
However, this does not update my existing dataframe; it returns a new variable, patientRow, which holds the new values. How can I write these values back into my existing dataframe? Thanks!
adaptRisk <- function(dataframe, sexNum, ageNum, deprivationNum,
partnerNum, testResult){
sexRisk = subset(dataframe, sex == sexNum)
ageRisk = subset(sexRisk, age == ageNum)
depRisk = subset(ageRisk, deprivation == deprivationNum)
patientRow = subset(depRisk, partners == partnerNum)
if (testResult == "positive") {
patientRow$tested <- patientRow$tested + 1
patientRow$infected <- patientRow$infected + 1
}
else if (testResult == "negative") {
patientRow$tested <- patientRow$tested + 1
}
patientRow <- transform(patientRow, risk = infected/tested)
return(patientRow)
}
This is the head of my dataframe to give you an idea:
sex age deprivation partners tested infected risk
1 Female 16-19 1-2 0-1 132 1 0.007575758
2 Female 16-19 1-2 2 25 1 0.040000000
3 Female 16-19 1-2 >=3 30 1 0.033333333
4 Female 16-19 3 0-1 80 2 0.025000000
5 Female 16-19 3 2 12 1 0.083333333
6 Female 16-19 3 >=3 18 1 0.055555556
The dput of my data is:
structure(list(sex = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label =
c("Female",
"Male"), class = "factor"), age = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = c("16-19", "20-24", "25-34", "35-44"), class =
"factor"),
deprivation = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("1-2",
"3", "4-5"), class = "factor"), partners = structure(c(2L,
3L, 1L, 2L, 3L, 1L), .Label = c(">=3", "0-1", "2"), class = "factor"),
tested = c(132L, 25L, 30L, 80L, 12L, 18L), infected = c(1L,
1L, 1L, 2L, 1L, 1L), uninfected = c(131L, 24L, 29L, 78L,
11L, 17L), risk = c(0.00757575757575758, 0.04, 0.0333333333333333,
0.025, 0.0833333333333333, 0.0555555555555556)), .Names = c("sex",
"age", "deprivation", "partners", "tested", "infected", "uninfected",
"risk"), row.names = c(NA, 6L), class = "data.frame")
An example call to the function:
adaptRisk(data, "Female", "16-19", 3, 2, "positive")
sex age deprivation partners tested infected uninfected risk
5 Female 16-19 3 2 13 2 11 0.1538462
I have adjusted your function (see the bottom of this answer) using base R syntax. It does the job, but it is not the most beautiful code.
Issue:
The subsets create several extra data.frames that are not needed, instead of replacing the values in place where the conditions match. The function also returned only the single subsetted row, so the existing data.frame was never updated.
I adjusted it so that the filters are applied directly to the columns you want to change.
transform might have unintended side effects, and you were recalculating the whole risk column. Now only the affected value is recalculated.
You might want to build in some warnings or stops in case the filters return more than one record.
You can now use
df <- adaptRisk(df, "Female", "16-19", "3", "2", "positive")
to replace the values in the data.frame you supply to the function.
Examples:
# affects row 5
adaptRisk(df, "Female", "16-19", "3", "2", "positive")
sex age deprivation partners tested infected uninfected risk
1 Female 16-19 1-2 0-1 132 1 131 0.007575758
2 Female 16-19 1-2 2 25 1 24 0.040000000
3 Female 16-19 1-2 >=3 30 1 29 0.033333333
4 Female 16-19 3 0-1 80 2 78 0.025000000
5 Female 16-19 3 2 13 2 11 0.153846154
6 Female 16-19 3 >=3 18 1 17 0.055555556
# affects row 5
adaptRisk(df, "Female", "16-19", "3", "2", "negative")
sex age deprivation partners tested infected uninfected risk
1 Female 16-19 1-2 0-1 132 1 131 0.007575758
2 Female 16-19 1-2 2 25 1 24 0.040000000
3 Female 16-19 1-2 >=3 30 1 29 0.033333333
4 Female 16-19 3 0-1 80 2 78 0.025000000
5 Female 16-19 3 2 13 1 11 0.076923077
6 Female 16-19 3 >=3 18 1 17 0.055555556
function:
adaptRisk <- function(data, sexNum, ageNum, deprivationNum,
partnerNum, testResult){
if (testResult == "positive") {
data$tested[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum] <- data$tested[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum] + 1
data$infected[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum] <- data$infected[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum] + 1
data$risk[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum] <- data$infected[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum]/data$tested[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum]
}
else if (testResult == "negative") {
data$tested[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum] <- data$tested[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum] + 1
data$risk[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum] <- data$infected[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum]/data$tested[data$sex == sexNum &
data$age == ageNum &
data$deprivation == deprivationNum &
data$partners == partnerNum]
}
return(data)
}
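The repeated filter in the function above can be computed once and reused. The following is only a sketch of that idea, not a drop-in replacement: the name adaptRiskCompact is hypothetical, the small data frame just reproduces rows 4 to 6 of the question's data, and unlike the function above it stops on an unknown testResult or when the filters do not match exactly one row (the guard suggested earlier in this answer).

```r
# Sketch: same update as the longer function, with the row selector computed once.
adaptRiskCompact <- function(data, sexNum, ageNum, deprivationNum,
                             partnerNum, testResult) {
  # Fail early on anything other than "positive" or "negative"
  testResult <- match.arg(testResult, c("positive", "negative"))

  # Position(s) of the row(s) matching all four risk factors
  idx <- which(data$sex == sexNum &
                 data$age == ageNum &
                 data$deprivation == deprivationNum &
                 data$partners == partnerNum)

  # Guard: insist on exactly one matching row
  if (length(idx) != 1) {
    stop("expected exactly one matching row, found ", length(idx))
  }

  # Every result counts as a test; only positives count as infections
  data$tested[idx] <- data$tested[idx] + 1
  if (testResult == "positive") {
    data$infected[idx] <- data$infected[idx] + 1
  }

  # Recalculate the risk for the affected row only
  data$risk[idx] <- data$infected[idx] / data$tested[idx]
  data
}

# Example data: rows 4 to 6 of the question's data frame
df <- data.frame(
  sex = factor(rep("Female", 3)),
  age = factor(rep("16-19", 3)),
  deprivation = factor(rep("3", 3)),
  partners = factor(c("0-1", "2", ">=3")),
  tested = c(80L, 12L, 18L),
  infected = c(2L, 1L, 1L)
)
df$risk <- df$infected / df$tested

df <- adaptRiskCompact(df, "Female", "16-19", "3", "2", "positive")
```

As with the longer version, the result has to be assigned back to df for the change to persist, because R functions work on a copy of their arguments.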
The function outputs a single row that, apparently, you intend to use as a replacement for the original row. Because subset keeps the original row names, you could write the row back by doing something like this:
## original data frame is named patientData
patientRow <- adaptRisk(patientData, "Female", "16-19", 3, 2, "positive")
patientData[row.names(patientRow), ] <- patientRow
I have the following data in R:
gender <- c("Male","Female")
gender <- sample(gender, 5000, replace=TRUE, prob=c(0.45, 0.55))
gender <- as.factor(gender)
disease <- c("Yes","No")
disease <- sample(disease, 5000, replace=TRUE, prob=c(0.4, 0.6))
disease <- as.factor(disease)
status <- c("Immigrant","Citizen")
status <- sample(status, 5000, replace=TRUE, prob=c(0.3, 0.7))
status <- as.factor(status )
my_data = data.frame(gender, status, disease)
I want to make a table that shows:
What percent of male immigrants have the disease?
What percent of male non-immigrants have the disease?
What percent of female immigrants have the disease?
What percent of female non-immigrants have the disease?
I tried to do this with the following code:
t1 <- xtabs(disease ~ gender + status, data=my_data)
But I get this error:
Error in Summary.factor(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, :
‘sum’ not meaningful for factors
Can someone please show me what I am doing wrong and how to fix this?
Thank you!
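Since all of the columns are factors, use count from dplyr to tally each combination and then convert the counts to percentages. Note that the percentages below are computed within each gender for the No and Yes columns separately, so each of those columns sums to 100 per gender: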
library(dplyr)
library(tidyr)
my_data %>%
dplyr::count(across(everything())) %>%
pivot_wider(names_from = disease, values_from =n, values_fill = 0) %>%
group_by(gender) %>%
mutate(100 *across(No:Yes, proportions)) %>%
ungroup
-output
# A tibble: 4 × 4
gender status No Yes
<fct> <fct> <dbl> <dbl>
1 Female Citizen 69.4 72.4
2 Female Immigrant 30.6 27.6
3 Male Citizen 70.4 68.7
4 Male Immigrant 29.6 31.3
xtabs needs a numeric response on the left-hand side of the formula, so it can work if we first convert the disease column to integer:
apply(xtabs(n ~ disease + gender + status,
transform(my_data, n = as.integer(disease))), c(1, 2), proportions) * 100
, , gender = Female
disease
status No Yes
Citizen 69.36724 72.41993
Immigrant 30.63276 27.58007
, , gender = Male
disease
status No Yes
Citizen 70.40185 68.68687
Immigrant 29.59815 31.31313
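If the goal is the share of each gender/status group that has the disease (for example, the percentage of male immigrants with the disease), one possible sketch in base R is to count with a one-sided xtabs formula and normalise with prop.table over the gender and status margins. The seed below is an arbitrary assumption, added only so the simulated data are reproducible; the names counts, pct and pct_yes are illustrative.

```r
set.seed(1)  # arbitrary seed (assumption) so the sketch is reproducible
n <- 5000
my_data <- data.frame(
  gender  = factor(sample(c("Male", "Female"), n, replace = TRUE, prob = c(0.45, 0.55))),
  status  = factor(sample(c("Immigrant", "Citizen"), n, replace = TRUE, prob = c(0.3, 0.7))),
  disease = factor(sample(c("Yes", "No"), n, replace = TRUE, prob = c(0.4, 0.6)))
)

# Counts for every gender x status x disease combination;
# a one-sided formula makes xtabs count rows instead of summing a response
counts <- xtabs(~ gender + status + disease, data = my_data)

# Divide each cell by its gender/status total, so Yes + No = 100 in every group
pct <- 100 * prop.table(counts, margin = c(1, 2))

# Percentage with the disease in each gender/status group
pct_yes <- pct[, , "Yes"]
pct_yes
```

Here pct_yes["Male", "Immigrant"] is the percentage of male immigrants who have the disease, and likewise for the other three groups.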
I have a problem with my R code.
I have a list named Bought_list with the elements customer and checkout (checkout is a data frame).
This is what the checkout data frame looks like:
items price qty total
Milk 10 2 20
Dolls 15 10 150
Chocolate 5 5 25
Toys 50 1 50
I want to know which items are for play_purpose and which are for date_purpose,
so I made two boolean variables:
play_purpose <- Bought_list[["checkout"]][,"total"] >= 50 & Bought_list[["checkout"]][,"total"] <= 150
date_purpose <- Bought_list[["checkout"]][,"total"] > 0 & Bought_list[["checkout"]][,"total"] < 50
How can I return the item names and total values for each condition, like this?
for play_purpose:
Dolls 150
Toys 50
for date_purpose :
Milk 20
Chocolate 25
I'm not clear on the structure of your data, but you could subset with your current code:
play_purpose <-
Bought_list[["checkout"]][Bought_list[["checkout"]][, "total"] >= 50 &
Bought_list[["checkout"]][, "total"] <= 150, c(1, 4)]
# items total
#2 Dolls 150
#4 Toys 50
date_purpose <-
Bought_list[["checkout"]][Bought_list[["checkout"]][, "total"] > 0 &
Bought_list[["checkout"]][, "total"] < 50, c(1, 4)]
# items total
#1 Milk 20
#3 Chocolate 25
Another option is to use dplyr:
library(dplyr)
Bought_list$checkout %>%
filter(total >= 50 & total <= 150) %>%
select(items, total)
Bought_list$checkout %>%
filter(total > 0 & total < 50) %>%
select(items, total)
Or, if you need to apply this to multiple data frames in the list, we could use map from purrr (this assumes every element of the list is a data frame with items and total columns):
library(purrr)
map(Bought_list, ~ .x %>%
filter(total >= 50 & total <= 150) %>%
select(items, total))
map(Bought_list, ~ .x %>%
filter(total > 0 & total < 50) %>%
select(items, total))
Data
Bought_list <- list(checkout = structure(list(items = c("Milk", "Dolls", "Chocolate",
"Toys"), price = c(10L, 15L, 5L, 50L), qty = c(2L, 10L, 5L, 1L
), total = c(20L, 150L, 25L, 50L)), class = "data.frame", row.names = c(NA,
-4L)))
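Since both purposes are the same range filter with different bounds, the base R subsetting shown earlier in this answer can be wrapped in a small helper. This is only a sketch: items_in_range and its include_lower/include_upper arguments are hypothetical names, not part of any package.

```r
# Hypothetical helper: rows of a checkout data frame whose total lies in a range.
items_in_range <- function(checkout, lower, upper,
                           include_lower = TRUE, include_upper = TRUE) {
  # Choose >= / > and <= / < depending on whether each bound is inclusive
  above <- if (include_lower) checkout$total >= lower else checkout$total > lower
  below <- if (include_upper) checkout$total <= upper else checkout$total < upper
  checkout[above & below, c("items", "total")]
}

# Same data as in the answer
Bought_list <- list(checkout = data.frame(
  items = c("Milk", "Dolls", "Chocolate", "Toys"),
  price = c(10L, 15L, 5L, 50L),
  qty   = c(2L, 10L, 5L, 1L),
  total = c(20L, 150L, 25L, 50L)
))

# 50 <= total <= 150
play_purpose <- items_in_range(Bought_list$checkout, 50, 150)

# 0 < total < 50
date_purpose <- items_in_range(Bought_list$checkout, 0, 50,
                               include_lower = FALSE, include_upper = FALSE)
```

Selecting the columns by name rather than by position (c(1, 4)) keeps the helper working if the column order changes.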
I have 4 columns, called Amplification, CNV.gain, Homozygous.Deletion.Frequency, Heterozygous.Deletion.Frequency. I want to create a new column in which, if any of the values in these 4 columns are:
greater than or equal to 5 and less than or equal to 10, it returns low
greater than 10 and less than or equal to 20, it returns medium
greater than 20, it returns high
An example of the final table (long_fused) would look like this:
CNV.Gain  Amplification  Homozygous.Deletion.Frequency  Heterozygous.Deletion.Frequency  Threshold
3         5              10                             0                                Low
0         0              11                             8                                Medium
7         16             25                             0                                High
So far, I've tried the following code. Although it fills in the "Threshold" column, it does so incorrectly.
library(dplyr)
long_fused <- long_fused %>%
mutate(Percent_sample_altered = case_when(
Amplification>=5 & Amplification < 10 & CNV.gain>=5 & CNV.gain < 10 | CNV.gain>=5 & CNV.gain<=10 & Homozygous.Deletion.Frequency>=5 & Homozygous.Deletion.Frequency<=10| Heterozygous.Deletion.Frequency>=5 & Heterozygous.Deletion.Frequency<=10 ~ 'Low',
Amplification>= 10 & Amplification<20 |CNV.gain>=10 & CNV.gain<20| Homozygous.Deletion.Frequency>= 10 & Homozygous.Deletion.Frequency<20 | Heterozygous.Deletion.Frequency>=10 & Heterozygous.Deletion.Frequency<20 ~ 'Medium',
Amplification>20 | CNV.gain >20 | Homozygous.Deletion.Frequency >20 | Heterozygous.Deletion.Frequency>20 ~ 'High'))
As always any help is appreciated!
Data in dput format
long_fused <-
structure(list(CNV.Gain = c(3L, 0L, 7L), Amplification = c(5L,
0L, 16L), Homozygous.Deletion.Frequency = c(10L, 11L, 25L),
Heterozygous.Deletion.Frequency = c(0L, 8L, 0L), Threshold =
c("Low", "Medium", "High")), class = "data.frame",
row.names = c(NA, -3L))
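Here is a way with rowwise followed by the base function cut. include.lowest = TRUE closes the first interval on the left, so a row maximum of exactly 5 is labelled Low; maxima below 5 become NA.
library(dplyr)
long_fused %>%
  rowwise() %>%
  mutate(new = max(c_across(-Threshold)),
         new = cut(new, c(5, 10, 20, Inf), labels = c("Low", "Medium", "High"), include.lowest = TRUE)) %>%
  ungroup()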
Here's an alternative using case_when (row maxima below 5 match no condition and become NA):
library(dplyr)
long_fused %>%
  mutate(max = do.call(pmax, select(., -Threshold)),
         #If you don't have Threshold column in your data just use .
         #mutate(max = do.call(pmax, .),
         Threshold = case_when(between(max, 5, 10) ~ 'Low',
                               max > 10 & max <= 20 ~ 'Medium',
                               max > 20 ~ 'High'))
# CNV.Gain Amplification Homozygous.Deletion.Frequency
#1 3 5 10
#2 0 0 11
#3 7 16 25
# Heterozygous.Deletion.Frequency max Threshold
#1 0 10 Low
#2 8 11 Medium
#3 0 25 High
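To check how the thresholds from the question behave at their edges, here is a small base R sketch. It assumes the rule is applied to the largest of the four columns in each row, as both answers do; the probe vector x and the names breaks, labels and row_max are illustrative.

```r
breaks <- c(5, 10, 20, Inf)
labels <- c("Low", "Medium", "High")

# Probe the edges: the intervals are [5, 10], (10, 20] and (20, Inf]
x <- c(4, 5, 10, 11, 20, 21)
bands <- cut(x, breaks = breaks, labels = labels, include.lowest = TRUE)
as.character(bands)
# NA "Low" "Low" "Medium" "Medium" "High"

# The four numeric columns from the question's dput
long_fused <- data.frame(
  CNV.Gain = c(3L, 0L, 7L),
  Amplification = c(5L, 0L, 16L),
  Homozygous.Deletion.Frequency = c(10L, 11L, 25L),
  Heterozygous.Deletion.Frequency = c(0L, 8L, 0L)
)

# Row-wise maximum without rowwise(): pmax is vectorised over the columns
row_max <- do.call(pmax, long_fused)
long_fused$Threshold <- cut(row_max, breaks = breaks, labels = labels,
                            include.lowest = TRUE)
```

A maximum below 5 falls outside every interval and is returned as NA, which makes rows that meet none of the three rules easy to spot.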
I want to "translate" a syntax written in SPSS into R code but am a total beginner in R and struggling to get it to work.
The SPSS syntax is
DO IF (Geschlecht = 0).
RECODE hang0 (SYSMIS=SYSMIS) (Lowest thru 22.99=0) (23 thru 55=1) (55.01 thru Highest=2)
INTO Hang.
ELSE IF (Geschlecht = 1).
RECODE hang0 (SYSMIS=SYSMIS) (Lowest thru 21.99=0) (22 thru 54=1) (54.01 thru Highest=2)
INTO Hang.
END IF.
I have installed the "car" package in R, but I cannot get the "range" recoding to work. I have tried
td_new$Hang <- recode(td_new$hang0, "0:22.99=0; 23:55=1; else=2")
and I cannot get the if-else construct to work either. My last attempt was
if(td_new$Geschlecht == 0){
td_new$Hang <- td_new$hang0 = 3
} else if (td_new$Geschlecht == 1) {
td_new$Hang <- td_new$hang0 = 5)
} else
td_new$hang0 <- NA
(this was without the recoding, just to test the if-else function).
Would be very happy if someone helped!
Thanks a lot in advance :)!
Sorry, edited to add:
The data structure looks as follows:
Geschlecht hang0
0 15
1 45
1 7
0 11
And I want to recode hang0 such that
for boys (Geschlecht = 0): all values < 23 = 0, values between 23 and 55 = 1, all values > 55 = 2
and for girls (Geschlecht = 1): all values < 22 = 0, values between 22 and 54 = 1, all values > 54 = 2
Here's an approach with case_when:
library(dplyr)
td_new %>%
  mutate(Hang = case_when(Geschlecht == 0 & hang0 < 23 ~ 0,
                          Geschlecht == 0 & hang0 >= 23 & hang0 <= 55 ~ 1,
                          Geschlecht == 0 & hang0 > 55 ~ 2,
                          Geschlecht == 1 & hang0 < 22 ~ 0,
                          Geschlecht == 1 & hang0 >= 22 & hang0 <= 54 ~ 1,
                          Geschlecht == 1 & hang0 > 54 ~ 2,
                          TRUE ~ NA_real_))
# Geschlecht hang0 Hang
#1 0 15 0
#2 1 45 1
#3 1 7 0
#4 0 11 0
The final line is there to catch NAs.
Data
td_new <- structure(list(Geschlecht = c(0L, 1L, 1L, 0L), hang0 = c(15L, 45L, 7L, 11L)), class = "data.frame", row.names = c(NA, -4L))
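The same recode can also be sketched in base R by choosing the cut points per row and counting how many of them a value has passed. The function name recode_hang is hypothetical, and the sketch assumes Geschlecht only contains 0, 1 or NA (any other code would be treated like 1).

```r
# Hypothetical helper: sex-specific recode of hang0 into 0 / 1 / 2.
# Cut points follow the SPSS syntax in the question:
#   boys  (0): below 23 -> 0, 23 to 55 -> 1, above 55 -> 2
#   girls (1): below 22 -> 0, 22 to 54 -> 1, above 54 -> 2
recode_hang <- function(geschlecht, hang0) {
  lower <- ifelse(geschlecht == 0, 23, 22)
  upper <- ifelse(geschlecht == 0, 55, 54)
  # Each TRUE counts as 1, so the sum is 0, 1 or 2; NA inputs stay NA
  (hang0 >= lower) + (hang0 > upper)
}

td_new <- data.frame(Geschlecht = c(0L, 1L, 1L, 0L),
                     hang0 = c(15L, 45L, 7L, 11L))
td_new$Hang <- recode_hang(td_new$Geschlecht, td_new$hang0)
```

Missing values propagate on their own here, which mirrors the SYSMIS=SYSMIS clause without needing a separate branch.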
What should I do when I want to create a new column with mutate based on an if condition?
Example:
dt <- read.table(text="
name,gender,fat_%
adam,male,32
anya,female,27
gilang,male,24
andine,female,34
",sep=',',header=TRUE)
dt
## name gender fat_.
## 1 adam male 32
## 2 anya female 27
## 3 gilang male 24
## 4 andine female 34
My question:
What code do I have to write to create a new column that takes one of two values, "yes" or "no"?
My new column should look like this:
name gender fat_% obesity
adam male 32 yes
anya female 27 no
gilang male 24 no
andine female 34 yes
Note: the rule for obesity is
male and fat > 26 = yes, female and fat > 32 = yes; male and fat < 26 = no, female and fat < 32 = no
A couple of suggestions first. Gender can be a single character (M/F). % is not a syntactically valid character in a column name (read.table turned fat_% into fat_.). And for your column 'fat', did you perhaps mean BMI?
Does this work for you (with the column renamed to fat_)?
library(dplyr)
dt %>%
  mutate(newcol = ifelse(gender == "male", fat_ > 26, fat_ > 32))
Two solutions.
First, a base R solution:
df$obesity <- ifelse(df$gender == "m" & df$fat_ > 26, "yes",
                     ifelse(df$gender == "f" & df$fat_ > 32, "yes", "no"))
Using mutate from dplyr, more compact code based on dplyr's if_else rather than base R's ifelse is this:
library(dplyr)
df %>%
  mutate(obesity = if_else(gender == "m" & fat_ > 26 | gender == "f" & fat_ > 32, "yes", "no"))
RESULT:
df
name gender fat_ obesity
1 adam m 32 yes
2 anya f 27 no
3 gilang m 24 no
4 andine f 34 yes
DATA:
df <- data.frame(
name = c("adam", "anya", "gilang", "andine"),
gender = c("m", "f", "m", "f"),
fat_ = c(32,27,24,34)
)
One approach is to use case_when from dplyr:
library(dplyr)
df %>%
mutate(obesity = case_when(gender == "male" & fat > 26 ~ "yes",
gender == "female" & fat > 32 ~ "yes",
TRUE ~ "no"))
# name gender fat obesity
#1 adam male 32 yes
#2 anya female 27 no
#3 gilang male 24 no
#4 andine female 34 yes
Once you understand the syntax, it comes in handy quite often.
Data
df <- structure(list(name = structure(c(1L, 3L, 4L, 2L), .Label = c("adam",
"andine", "anya", "gilang"), class = "factor"), gender = structure(c(2L,
1L, 2L, 1L), .Label = c("female", "male"), class = "factor"),
fat = c(32, 27, 24, 34)), class = "data.frame", row.names = c(NA,
-4L))
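When the rule is "compare against a cutoff that depends on the group", another option is a named lookup vector instead of one condition per group. This is a sketch in base R using the cutoffs from the question (26 for male, 32 for female); the names cutoff and limit are illustrative, and a gender value missing from cutoff would give NA.

```r
df <- data.frame(
  name   = c("adam", "anya", "gilang", "andine"),
  gender = c("male", "female", "male", "female"),
  fat    = c(32, 27, 24, 34)
)

# One cutoff per gender, looked up by name for every row
cutoff <- c(male = 26, female = 32)
limit <- unname(cutoff[as.character(df$gender)])

# "yes" when the row's fat value is above its own group's cutoff
df$obesity <- ifelse(df$fat > limit, "yes", "no")
df
```

Adding another group then only means adding one entry to cutoff, with no change to the comparison itself.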