Create new column with if else in R

I have a database like this:
structure(list(code = c(1, 2, 3, 4), age = c(25, 30, 45, 50),
car = c(0, 1, 0, 1)), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
I want to create a column "drivers under 40" with these conditions:
0 if Age<40 & car==0
1 if Age<40 & car==1
How do I create this new column with these conditions?
I tried using ifelse to create the variable, but it doesn't work:
drivers <- ifelse((age <= 40) & (car==0), 0, ifelse((age<=40) & (car==1), 1))
Is the code written wrong?
Is there another way to do it? I am afraid of messing up the parentheses, so I'd prefer a simpler or faster method if there is one.

Here is a dplyr version with case_when
library(dplyr)
df %>%
mutate(drivers_under_40 = case_when(age <= 40 & car==0 ~ 0,
age <= 40 & car==1 ~ 1,
TRUE ~ NA_real_))
code age car drivers_under_40
<dbl> <dbl> <dbl> <dbl>
1 1 25 0 0
2 2 30 1 1
3 3 45 0 NA
4 4 50 1 NA

A base R option
df1$drivers_under_40 <- with(df1, (age <= 40 & car == 1)* NA^(age> 40))
df1$drivers_under_40
[1] 0 1 NA NA

Unless you work with dplyr, you have to specify the data in your ifelse statement, for example data$column.
You also have to assign the result to a new column.
And the value for the last "else" is missing: the inner ifelse needs a third argument.
So your ifelse statement should look like this:
data = structure(list(code = c(1, 2, 3, 4), age = c(25, 30, 45, 50),
car = c(0, 1, 0, 1)), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"))
data$drivers <- ifelse((data$age <= 40) & (data$car==0), 0, ifelse((data$age<=40) & (data$car==1), 1, "here you have to fill another 'else' value"))
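To make the placeholder above concrete, here is a minimal base R sketch that fills the final "else" with NA. It uses age < 40 as stated in the question (the answers above use <= 40), and it rebuilds the example data as a plain data.frame so it runs on its own. Because the wanted value equals car whenever the age condition holds, a single ifelse is enough; that shortcut is my simplification, not something from the original answers.

```r
# Example data from the question, rebuilt as a plain data.frame
data <- data.frame(code = c(1, 2, 3, 4),
                   age  = c(25, 30, 45, 50),
                   car  = c(0, 1, 0, 1))

# Nested form: every ifelse() gets all three arguments, NA is the final "else"
data$drivers <- ifelse(data$age < 40 & data$car == 0, 0,
                ifelse(data$age < 40 & data$car == 1, 1, NA))

# Shorter form: under 40, the result is simply the value of car
data$drivers_under_40 <- ifelse(data$age < 40, data$car, NA)

data$drivers_under_40
```

Both columns come out identical for this data, and the shorter form has only one pair of parentheses to keep track of.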

Related

Group_by not working, summarize() computing identical values?

I am using the data found here: https://www.kaggle.com/cdc/behavioral-risk-factor-surveillance-system. In RStudio, I have named the CSV file BRFSS2015. I have created two new columns, arth and no_arth, to compare people who have arthritis with people who do not. Grouping by these variables, I am now trying to find the mean and sd of their weights. The weight variable was generated from another variable in the dataset using this code: (weight = BRFSS2015$WEIGHT2). Below is the code I am trying to run for the mean and sd.
BRFSS2015%>%
group_by(arth,no_arth)%>%
summarize(mean_weight=mean(weight),
sd_weight=sd(weight))
I am getting output that says mean and sd for these two groups is identical. I doubt this is correct. Can someone check and tell me why this is happening? The numbers I am getting are:
arth: mean = 733.2044; sd= 2197.377
no_arth: mean= 733.2044; sd= 2197.377
Here is how I created the variables arth and no_arth:
a=BRFSS2015%>%
select(HAVARTH3)%>%
filter(HAVARTH3=="1")
b=BRFSS2015%>%
select(HAVARTH3)%>%
filter(HAVARTH3=="2")
as.data.frame(BRFSS2015)
arth=c(a)
no_arth=c(b)
BRFSS2015$arth <- c(arth, rep(NA, nrow(BRFSS2015)-length(arth)))
BRFSS2015$no_arth <- c(no_arth, rep(NA, nrow(BRFSS2015)-length(no_arth)))
as.tibble(BRFSS2015)
Before I started, I also removed NAs from weight using weight=na.omit(WEIGHT2)
Based on the info you provided, one can only guess what went wrong in your analysis. But here is working code using a snippet of the real data.
library(tidyverse)
BRFSS2015_minimal <- structure(list(HAVARTH3 = c(
1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2,
1, 1, 1, 1, 1, 1, 2, 1, 2
), WEIGHT2 = c(
280, 165, 158, 180, 142,
145, 148, 179, 84, 161, 175, 150, 9999, 140, 170, 128, 200, 178,
155, 163
)), row.names = c(NA, -20L), class = c(
"tbl_df", "tbl",
"data.frame"
))
BRFSS2015_minimal %>%
filter(!is.na(WEIGHT2), HAVARTH3 %in% 1:2) %>%
mutate(arth = HAVARTH3 == 1, no_arth = HAVARTH3 == 2,weight = WEIGHT2) %>%
group_by(arth, no_arth) %>%
summarize(
mean_weight = mean(weight),
sd_weight = sd(weight),
.groups = "drop"
)
#> # A tibble: 2 × 4
#> arth no_arth mean_weight sd_weight
#> <lgl> <lgl> <dbl> <dbl>
#> 1 FALSE TRUE 165 10.8
#> 2 TRUE FALSE 865 2629.
Code used to create dataset
BRFSS2015 <- readr::read_csv("2015.csv")
BRFSS2015_minimal <- dput(head(BRFSS2015[c("HAVARTH3", "WEIGHT2")], 20))
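The large sd for the arthritis group in the output above is driven by the single value 9999 in the snippet. That looks like a non-response code rather than a real weight, but this is an assumption on my part and should be checked against the dataset's codebook. Under that assumption, a base R sketch of the same grouped summary with the value excluded could look like this:

```r
# Same 20-row snippet as above, as a plain data.frame
BRFSS2015_minimal <- data.frame(
  HAVARTH3 = c(1, 2, 1, 1, 1, 1, 2, 1, 1, 2, 2,
               1, 1, 1, 1, 1, 1, 2, 1, 2),
  WEIGHT2  = c(280, 165, 158, 180, 142, 145, 148, 179, 84, 161,
               175, 150, 9999, 140, 170, 128, 200, 178, 155, 163)
)

# ASSUMPTION: 9999 is a sentinel code, not a weight; verify in the codebook
clean <- subset(BRFSS2015_minimal,
                HAVARTH3 %in% 1:2 & !is.na(WEIGHT2) & WEIGHT2 != 9999)

# One mean and one sd per value of HAVARTH3 (1 = arthritis, 2 = no arthritis)
mean_weight <- tapply(clean$WEIGHT2, clean$HAVARTH3, mean)
sd_weight   <- tapply(clean$WEIGHT2, clean$HAVARTH3, sd)

mean_weight
sd_weight
```

Grouping by the single HAVARTH3 column is enough here; separate arth and no_arth columns carry the same information twice.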

How to add a new column based on a few other variables

I am new to R and am having trouble creating a new variable using conditions from already existing variables. I have a dataset that has a few columns: Name, Month, Binary for Gender, and Price. I want to create a new variable, Price2, that will:
make the price charged 20 if [the month is 6-9(Jun-Sept) and Gender is 0]
make the price charged 30 if [the month is 6-9(Jun-Sept) and Gender is 1]
make the price charged 0 if [the month is 1-5(Jan-May) or month is 10-12(Oct-Dec)]
--
structure(list(Name = c("ADI", "SLI", "SKL", "SNK", "SIIEL", "DJD"), Mon = c(1, 2, 3, 4, 5, 6), Gender = c(1, NA, NA, NA, 1, NA), Price = c(23, 34, 32, 64, 23, 34)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))
Using case_when() from the dplyr package:
mydf$newprice <- dplyr::case_when(
mydf$Mon >= 6 & mydf$Mon <= 9 & mydf$Gender == 0 ~ 20,
mydf$Mon >= 6 & mydf$Mon <= 9 & mydf$Gender == 1 ~ 30,
mydf$Mon < 6 | mydf$Mon > 9 ~ 0)
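If you would rather stay in base R, the same three rules can be sketched with ifelse and %in%. The data frame below is made up for illustration (the sample in the question has no rows in months 6-9 with a known Gender), and the column name Price2 follows the question. Rows in the summer months with a missing Gender come out as NA, which I assume is the wanted behaviour.

```r
# Hypothetical data covering every branch of the rules
mydf <- data.frame(Mon    = c(1, 6, 7, 9, 12, 8),
                   Gender = c(1, 0, 1, NA, 0, 1))

summer <- mydf$Mon %in% 6:9  # TRUE for Jun-Sep

# Price that applies inside the summer months: 20 for Gender 0, 30 for Gender 1
summer_price <- ifelse(mydf$Gender == 0, 20,
                ifelse(mydf$Gender == 1, 30, NA))

# Outside the summer months the price is always 0, whatever Gender is
mydf$Price2 <- ifelse(summer, summer_price, 0)

mydf
```

Splitting the month test from the gender test keeps each ifelse short, so there are fewer nested parentheses to get wrong.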

Combine ifelse two conditions and loop

I have a list of data frames (file1, file2, ..., file72). For each data frame I want to create one variable containing information from another data frame based on two conditions.
The idea is simple:
condition 1: if file*$countryid equals source$country, and
condition 2: if file*$year is higher than source$starting but lower than source$ending, then if true I want to create a column file*$rank with the value in source$rank
I have been trying code lines like this but this code does not go through all lines in source:
file1$rank<-ifelse(file1$countryid=source$countryid & file1$year>source$starting & file1$year<source$ending,source$rank,NA)
In addition I would like to implement this within a loop to avoid iterating manually through all these dataframes:
dflist<-Filter(is.data.frame, mget(ls()))
dflist<-function(df,x){df$rank<-ifelse(df$countryid=source$countryid & df$year>source$starting & df$year<source$ending,source$rank,NA))
Here is an example of the data I have.
Thank you!
> dput(file1)
structure(list(id = c(1, 2, 3), countryid = c(10, 10, 13), year = c(1948,
1954, 1908)), row.names = c(NA, -3L), class = c("tbl_df", "tbl",
"data.frame"))
dput(file2)
structure(list(id = c(1, 2, 3), countryid = c(13, 10, 13), year = c(1907,
1908, 1907)), row.names = c(NA, -3L), class = c("tbl_df", "tbl",
"data.frame"))
> dput(source)
structure(list(country = c(13, 13, 13, 10, 10, 10), rank = c(1,
2, 3, 1, 2, 3), starting = c(1885, 1909, 1940, 1902, 1907, 1931
), ending = c(1908, 1939, 1960, 1906, 1930, 1960)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
We can use a non-equi join after getting all the file\\d+ datasets into a list
library(data.table)
out <- lapply(mget(ls(pattern = '^file\\d+$')), function(dat)
setDT(dat)[, year := as.integer(year)][as.data.table(source), rank := i.rank,
on = .(countryid = country, year > starting, year < ending)])
-output
out
#$file1
# id countryid year rank
#1: 1 10 1948 3
#2: 2 10 1954 3
#3: 3 13 1908 NA
#$file2
# id countryid year rank
#1: 1 13 1907 1
#2: 2 10 1908 2
#3: 3 13 1907 1
If the original objects need to be updated, use list2env
list2env(out, .GlobalEnv)
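For comparison, here is a rough base R sketch of the same lookup without data.table. The helper name add_rank and the name src for the lookup table are mine, not from the question (I avoided source because it is also the name of a base R function). It checks each row of a file against every row of the lookup table, which is what the single vectorised ifelse in the question could not do. It is slower than the non-equi join on large data and takes the first match if several ranges overlap.

```r
file1 <- data.frame(id = c(1, 2, 3), countryid = c(10, 10, 13),
                    year = c(1948, 1954, 1908))
file2 <- data.frame(id = c(1, 2, 3), countryid = c(13, 10, 13),
                    year = c(1907, 1908, 1907))
src <- data.frame(country  = c(13, 13, 13, 10, 10, 10),
                  rank     = c(1, 2, 3, 1, 2, 3),
                  starting = c(1885, 1909, 1940, 1902, 1907, 1931),
                  ending   = c(1908, 1939, 1960, 1906, 1930, 1960))

# Hypothetical helper: look up the rank for each row of df in src
add_rank <- function(df, src) {
  df$rank <- vapply(seq_len(nrow(df)), function(i) {
    hit <- src$rank[src$country  == df$countryid[i] &
                    src$starting <  df$year[i] &
                    src$ending   >  df$year[i]]
    if (length(hit) == 0) NA_real_ else hit[1]  # NA when no range matches
  }, numeric(1))
  df
}

# Apply the helper to every data frame in a named list
out <- lapply(list(file1 = file1, file2 = file2), add_rank, src = src)
out
```

The ranks agree with the data.table output above, including the NA for year 1908 in country 13, which falls on a boundary and is excluded by the strict inequalities.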

if_else with haven_labelled column fails because of wrong class

I have the following data:
dat <- structure(list(value = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
label = "value: This is my label",
labels = c(`No` = 0, `Yes` = 1),
class = "haven_labelled"),
group = structure(c(1, 2, 1, 1, 2, 3, 3, 1, 3, 1, 3, 3, 1, 2, 3, 2, 1, 3, 3, 1),
label = "my group",
labels = c(first = 1, second = 2, third = 3),
class = "haven_labelled")),
row.names = c(NA, -20L),
class = c("tbl_df", "tbl", "data.frame"),
label = "test.sav")
As you can see, the data uses a special class from the tidyverse's haven package: so-called labelled columns.
Now I want to recode my initial value variable such that:
if group equals 1, value should stay the same, otherwise it should be missing
I was trying the following, but getting an error:
dat_new <- dat %>%
mutate(value = if_else(group != 1, NA, value))
# Error: `false` must be a logical vector, not a `haven_labelled` object
I got so far as to understand that if_else from dplyr requires the true and false checks in the if_else command to be of same class and since there is no NA equivalent for class labelled (e.g. similar to NA_real_ for doubles), the code probably fails, right?
So, how can I recode my initial variables and preserve the labels?
I know I could change my code above and replace the if_else by R's base version ifelse. However, this deletes all labels and coerces the value column to a numeric one.
You can try dplyr::case_when for cases where group == 1. If no cases are matched, NA is returned:
dat %>% mutate(value = case_when(group == 1 ~ value))
You can create an NA value in the haven_labelled class with this ugly code:
haven::labelled(NA_real_, labels = attr(dat$value, "labels"))
I'd recommend writing a function for that, e.g.
labelled_NA <- function(value)
haven::labelled(NA_real_, labels = attr(value, "labels"))
and then the code for your mutate isn't quite so ugly:
dat_new <- dat %>%
mutate(value = if_else(group != 1, labelled_NA(value), value))
Then you get
> dat_new[1:5,]
# A tibble: 5 x 2
value group
<dbl+lbl> <dbl+lbl>
1 0 [No] 1 [first]
2 NA 2 [second]
3 0 [No] 1 [first]
4 0 [No] 1 [first]
5 NA 2 [second]
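Another route that sidesteps if_else altogether is plain subassignment, which in base R keeps the attributes of the vector being modified. The sketch below builds a five-row stand-in for the data without loading haven, so it only demonstrates base R behaviour; I have not verified here how the assignment behaves once haven's own methods for labelled vectors are loaded, so treat that part as an assumption.

```r
# Five-row stand-in for the question's data, built without loading haven
dat <- structure(list(
  value = structure(c(0, 0, 0, 0, 0),
                    label = "value: This is my label",
                    labels = c(No = 0, Yes = 1),
                    class = "haven_labelled"),
  group = structure(c(1, 2, 1, 1, 2),
                    label = "my group",
                    labels = c(first = 1, second = 2, third = 3),
                    class = "haven_labelled")),
  row.names = c(NA, -5L), class = "data.frame")

# Set value to NA wherever group is not 1; class and labels stay on the column
dat$value[dat$group != 1] <- NA

attributes(dat$value)
```

Only the selected elements are overwritten, so the label, labels and class attributes remain attached to the column.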

How to create a new dataset based on multiple conditions in R?

I have a dataset called carcom that looks like this
carcom <- data.frame(household = c(173, 256, 256, 319, 319, 319, 422, 422, 422, 422), individuals= c(1, 1, 2, 1, 2, 3, 1, 2, 3, 4))
Here individuals is "1" for the father, "2" for the mother, and "3" or "4" for a child. I would like to get two new columns. The first should indicate the number of children in that household, if there are any. The second should be the total weight of the household, assigning "1" to the father, "0.5" to the mother and "0.3" to each child. My new dataset should look like this
newcarcom <- data.frame(household = c(173, 256, 319, 422), child = c(0, 0, 1, 2), weight = c(1, 1.5, 1.8, 2.1))
I have been trying to find the solutions for days. Would be appreciated if someone helps me. Thanks
We can count number of individuals with value 3 and 4 in each household. To calculate weight we change the value for 1:4 to their corresponding weight values using recode and then take sum.
library(dplyr)
newcarcom <- carcom %>%
group_by(household) %>%
summarise(child = sum(individuals %in% 3:4),
weight = sum(recode(individuals,`1` = 1, `2` = 0.5, .default = 0.3)))
# household child weight
# <dbl> <int> <dbl>
#1 173 0 1
#2 256 0 1.5
#3 319 1 1.8
#4 422 2 2.1
Base R version suggested by @markus
newcarcom <- do.call(data.frame, aggregate(individuals ~ household, carcom, function(x)
c(child = sum(x %in% 3:4), weight = sum(replace(y <- x^-1, y < 0.5, 0.3)))))
An option with data.table
library(data.table)
setDT(carcom)[, .(child = sum(individuals %in% 3:4),
weight = sum(recode(individuals,`1` = 1, `2` = 0.5, .default = 0.3))), household]
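If the x^-1 trick in the base R version is hard to read, the same result can be sketched with an explicit lookup vector and tapply. The vector name weights and the helper column w are my own names, and the sketch assumes individuals only ever takes the values 1 to 4.

```r
carcom <- data.frame(
  household   = c(173, 256, 256, 319, 319, 319, 422, 422, 422, 422),
  individuals = c(1, 1, 2, 1, 2, 3, 1, 2, 3, 4))

# Lookup by position: 1 = father, 2 = mother, 3 and 4 = children
weights <- c(1, 0.5, 0.3, 0.3)
carcom$w <- weights[carcom$individuals]

# Per-household totals: number of children and summed weight
child  <- tapply(carcom$individuals %in% 3:4, carcom$household, sum)
weight <- tapply(carcom$w, carcom$household, sum)

newcarcom <- data.frame(household = as.numeric(names(child)),
                        child     = as.vector(child),
                        weight    = as.vector(weight))
newcarcom
```

Changing a weight then means editing one number in the lookup vector rather than a condition.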

Resources