I have a continuous variable in R. Entries 1-30 need to stay the same. NAs are coded as 99 and 0 was coded as 88 for some reason. I'm trying to figure out how to recode 99s to NA and 88s to 0, but keep any values 1-30 as is.
I have tried a few things, but I'm pretty new to R and coding in general. None of my attempts have come even close, and most of the examples I'm coming across in my search are about categorical variables, recoding continuous as categorical, or binning. I want to keep the variable continuous and change only the 88s and 99s.
I tried using mutate in a few different ways, but none worked. Most of the outcomes were an error or the new MH variable with nothing actually changed.
With dplyr, you can use
recode()
df %>%
mutate(y = recode(x, `88` = 0, `99` = NA_real_))
case_match()
df %>%
mutate(y = case_match(x, 88 ~ 0, 99 ~ NA, .default = x))
case_when()
df %>%
mutate(y = case_when(x == 88 ~ 0, x == 99 ~ NA, .default = x))
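For anyone who wants to try this on a toy column first, here is a minimal runnable sketch. It assumes dplyr is installed; the tibble and its values are made up for illustration. It uses TRUE ~ x as the fallback, which also works on dplyr versions older than 1.1.0, where .default and case_match() are not available.

```r
library(dplyr)

# Made-up example column: 1-30 are real values, 88 means 0, 99 means missing
df <- tibble(x = c(1, 15, 30, 88, 99))

out <- df %>%
  mutate(y = case_when(
    x == 88 ~ 0,        # 88 was used as the code for zero
    x == 99 ~ NA_real_, # 99 was used as the code for missing
    TRUE    ~ x         # everything else is kept unchanged
  ))

out
```

The result stays numeric because every right-hand side is a double.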
Using fcase
library(data.table)
setDT(df)[, y := fcase(!x %in% c(88, 99), x, x == 88, 0)]
You have a lot of options at your disposal with the tidyverse packages (e.g., dplyr, tidyr). One option is to use na_if to turn the 99s into NA and if_else to turn the 88s to 0.
I have created a fake dataset below, but if you have questions about your specific dataset, you should provide a reproducible example with your own data.
library(tidyverse)
a <- sample(x = c(1, 2, 3, 4, 99, 88), size = 30, replace = TRUE)
b <- sample(x = c(1, 2, 3, 4, 99, 88), size = 30, replace = TRUE)
c <- sample(x = c(1, 2, 3, 4, 99, 88), size = 30, replace = TRUE)
df <- data.frame(a, b, c)
df
df %>%
mutate(across(everything(), ~na_if(., 99))) %>%
mutate(across(everything(), ~if_else(. == 88, 0, .)))
We can update matching values in place with base R
df$y[df$y == 99] <- NA
df$y[df$y == 88] <- 0
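A small self-contained variant of the base R approach, as a sketch on a made-up vector (mh is a hypothetical name standing in for the MH variable from the question). It uses %in% instead of ==, which never returns NA, so the index stays clean even after some values have already been set to NA.

```r
# Made-up values: 1-30 are real, 88 means 0, 99 means missing
mh <- c(1, 15, 30, 88, 99, 7, 99, 88)

mh_clean <- mh                      # work on a copy, keep the original
mh_clean[mh_clean %in% 99] <- NA    # 99 -> NA
mh_clean[mh_clean %in% 88] <- 0     # 88 -> 0 (%in% is FALSE for NA, never NA)

mh_clean
```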
I have this df
df <- data.table(id=c(1,2,3,4,5,6,7,8,9,10),
var1=c(0,4,5,6,99,3,5,5,23,0),
var2=c(22,4,6,25,6,70,75,23,24,21))
I would like to create a third column being:
df <- data.table(id=c(1,2,3,4,5,6,7,8,9,10),
var1=c(0,4,5,6,99,3,5,5,23,0),
var2=c(22,4,6,25,6,70,75,23,24,21),
var3=c("0_22","4_4","5_6","6_25","99_6","3_70","5_75","5_23","23_24","0_21"))
where the value of each cell will be "var1 underscore var2".
Var1 and Var2 are categorical variables as they represent medications. Var3 would be to represent a combination of medications.
How can I do this?
thanks!
Load packages
library(data.table)
library(dplyr)
Create dataframe
df <- data.table(
id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
var1 = c(0, 4, 5, 6, 99, 3, 5, 5, 23, 0),
var2 = c(22, 4, 6, 25, 6, 70, 75, 23, 24, 21)
)
Add new variable
By means of dplyr package and sprintf
df <- df %>%
mutate(var3 = sprintf("%d_%d", var1, var2))
By means of dplyr package and paste0
df <- df %>%
mutate(var3 = paste0(var1, "_", var2))
By means of base package and sprintf
df$var3 <- sprintf("%d_%d", df$var1, df$var2)
By means of base package and paste0
df$var3 <- paste0(df$var1, "_", df$var2)
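One caveat on the sprintf variants, offered as a hedged note: %d only accepts doubles that hold whole numbers, so it would stop with an error if a code ever had a fractional part, whereas paste0 formats whatever it is given. A minimal base R sketch with made-up vectors (med1 and med2 are hypothetical names, not from the question):

```r
# Made-up medication codes stored as doubles
med1 <- c(0, 4, 99)
med2 <- c(22, 4, 6)

via_sprintf <- sprintf("%d_%d", med1, med2)  # fine: all values are whole numbers
via_paste   <- paste0(med1, "_", med2)

# Both routes give the same strings for whole-number codes
same <- identical(via_sprintf, via_paste)

# %d refuses a double with a fractional part
fails_on_fraction <- inherits(try(sprintf("%d", 1.5), silent = TRUE), "try-error")
```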
As @Wimpel says, the solution is df$var3 <- paste(df$var1, df$var2, sep = "_")
thanks!!
You can do this efficiently using the tidyverse and the unite() function
library(tidyverse)
df <- tibble(id=c(1,2,3,4,5,6,7,8,9,10),
var1=c(0,4,5,6,99,3,5,5,23,0),
var2=c(22,4,6,25,6,70,75,23,24,21)) %>%
# create new variable
unite(var3, c(var1, var2), sep = "_", remove = FALSE)
First, this question is likely a repeat, however the solutions I've tried (eg using enquo() and !!, or get(), or {{}}) have not yielded results.
The Problem
I have a function that needs to take column names passed to it in a pipeline, perform a series of dplyr-based functions with some base R components, and return the new dataframe. The problem is that the function will not take the column names passed to it as variables in the referenced dataframe, treating them as strings instead.
The Data
This code will create a usable dataframe for this problem:
df_ext <- tibble(ID = c(rep(1, 5), rep(2, 5)),
TIME = rep(c(1, 2, 3, 4, 5), 2),
VAL = c(0, 1, 2, 2, 3, 1, 5, 0, 1, 4))
The Current Function
Here's a version of the function that I can share. It's designed to create a series of categories for the data I pass to it, but this is a simplified version that just calculates some basic groupings (ie, it doesn't do much of anything).
my_fun <- function(.data, id, time, val){
require("dplyr")
df <- df |>
group_by(id) |>
mutate(val_lag = if_else(val > 0, time - lag(time, default = 0), 0)) |>
mutate(first_time = min(time),
last_time = max(time),
first_val_pos = ifelse(any(val), min(time[val > 0]), NA),
last_val_pos = ifelse(any(val > 0), max(time[val > 0]), NA)) |>
group_by(grp = cumsum(val_lag == 0)) |>
mutate(val_pos_run = cumsum(val_lag)) |>
ungroup() |>
group_by(id) |>
mutate(ada_bl = ifelse(first_val_pos <= 0, val[time == first_val_pos], 0)) |>
ungroup()
df
}
df_ext |>
my_fun(id = ID, time = TIME, val = VAL)
If anyone can get the columns from the dataframe to pass into the function and get treated as columns in the pipe-referenced dataframe, you'd be ending a very frustrating headache.
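A minimal sketch of one way this could work, assuming dplyr 1.0 or later and R 4.1+ for the native pipe: wrap each column argument in {{ }} wherever it is used, and start the pipeline from .data rather than df. my_fun_sketch is a hypothetical, cut-down version of the function (only the grouping and the first/last columns), not the full original.

```r
library(dplyr)

df_ext <- tibble(ID = c(rep(1, 5), rep(2, 5)),
                 TIME = rep(c(1, 2, 3, 4, 5), 2),
                 VAL = c(0, 1, 2, 2, 3, 1, 5, 0, 1, 4))

# Hypothetical simplified function: {{ }} forwards the bare column names
my_fun_sketch <- function(.data, id, time, val) {
  .data |>                       # use the function argument, not a global df
    group_by({{ id }}) |>
    mutate(
      first_time = min({{ time }}),
      last_time = max({{ time }}),
      # first and last time point with a positive value, NA if there is none
      first_val_pos = if (any({{ val }} > 0)) min({{ time }}[{{ val }} > 0]) else NA_real_,
      last_val_pos = if (any({{ val }} > 0)) max({{ time }}[{{ val }} > 0]) else NA_real_
    ) |>
    ungroup()
}

out <- df_ext |>
  my_fun_sketch(id = ID, time = TIME, val = VAL)
```

The same {{ }} pattern should carry over to the remaining mutate steps of the original function.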
I am working with a very messy data set and I need to use the recode() function in a pipe to turn the numbers 0:30 into five numerical categories (0,1,2,3,4).
What I have:
recode(var, 10:30 = 4,
6:9 = 3,
3:5 = 2,
1:2 = 1,
0 = 0))
Any help is greatly appreciated!
It may be easier with case_when
library(dplyr)
case_when(var %in% 10:30 ~ 4,
var %in% 6:9 ~ 3,
var %in% 3:5 ~ 2,
var %in% 1:2 ~ 1,
var == 0 ~ 0)
Or another option is cut
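as.integer(cut(var, breaks = c(-1, 0, 2, 5, 9, 30))) - 1L
NOTE: cut numbers its intervals from 1, hence the - 1L to get codes 0 to 4; values outside 0:30 become NA. Change the include.lowest and right arguments of cut to adjust which side of each interval is closed.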
data
set.seed(24)
var <- sample(0:35, 50, replace = TRUE)
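To see how cut() lines up with the 0 to 4 coding, here is a small base R sketch on hand-picked values. The breaks (-1, 0, 2, 5, 9, 30) and the - 1L shift are assumptions chosen to match the bins in the question; the input vector is made up.

```r
# Hand-picked values covering every bin edge, plus one value above 30
vals <- c(0, 1, 2, 3, 5, 6, 9, 10, 30, 35)

# Intervals are (-1,0], (0,2], (2,5], (5,9], (9,30]; cut() numbers them 1..5
codes <- as.integer(cut(vals, breaks = c(-1, 0, 2, 5, 9, 30))) - 1L

# Side-by-side view of value and category
data.frame(vals, codes)
```

Values above 30 fall outside the last interval and come back as NA, the same thing case_when does when no condition matches.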
I have coordinates in two different vectors named x and y. I want to create a third variable telling which quadrant (quarter) each point is in. With my ifelse statement I cannot figure out how to use a range, for example 25>x>0 and 25>y>0 instead of just x>0 & y>0. Also, because I want to create this variable with multiple quadrants, I would like to have multiple ifelse statements that do not overwrite each other, so that at the end my third column gives me the quadrant for each pair of x and y.
Here is my code so far; I am kind of stuck here.
library(dplyr)
x <- c(3,-2, 35, -4, 34, 20, 45)
y <- c(0, 2, 33, -4, 50, 30, 10)
df <- cbind.data.frame(x, y)
df
#Creating a third variable reacting to coordinate
df$quart <-ifelse(df$x>0 & df$y>0, "quart1", NA)
df$quart <-ifelse(df$x<0 & df$y<0, "quart2", NA)
df
Thank you for your help
Simple approach
You can nest multiple ifelse() calls in a single expression. It may seem hard to follow at first, but an indentation style that makes sense to you gets you some of the way.
#Creating a third variable reacting to coordinate
df$quart <- ifelse(
df$x<0 & df$y<0,
"quart3",
ifelse(
df$x<25 & df$y<25,
"quart2",
ifelse(
df$x<50 & df$y<50,
"quart1",
"other"
)
)
)
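Note that with this approach each test only has an upper bound and uses a strict <, so a point whose x or y equals a boundary falls through to the next branch (25 lands in "quart1", 50 lands in "other"), and a point with mixed signs such as (-2, 2) lands in "quart2". Consider <= instead of <, and add lower bounds (for example x >= 0 & x < 25) if you need true ranges.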
Improved readability
I suggest that you utilize dplyr, for easier readability.
library(dplyr)
df <- data.frame(
x = c(3,-2, 35, -4, 34, 20, 45),
y = c(0, 2, 33, -4, 50, 30, 10)
)
df <- mutate(
df,
quart = ifelse(
x < 0 & y < 0,
"quart3",
ifelse(
x < 25 & y < 25,
"quart2",
ifelse(
x < 50 & y < 50,
"quart1",
"other"
)
)
)
)
The strength of mutate is that you can refer to your variables (columns) by bare name instead of writing df$x and df$y each time. Note that mutate still works on whole column vectors, just like the base version; if you need a function that only handles single values, combine it with rowwise() or wrap the function in Vectorize().
After reading OP comment and OP answer, I updated the answer.
Using case_when ended up being the easiest way for me. Thank you for your comments/solutions.
df <- data.frame(
x = c(3,-2, 35, -4, 34, 20, 45),
y = c(0, 2, 33, -4, 50, 30, 10)
)
df
df %>%
mutate(category = case_when(
x > 25 & y >25 ~ "q1",
x > 0 & y > 0 ~ "q2",
x < 0 & y < 0 ~ "q3",
TRUE ~ "other"
)
)
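For the range form from the question (25 > x > 0 and 25 > y > 0), each condition can spell out both bounds. This is only a sketch, assuming dplyr is installed; the band limits and labels below are illustrative and not the asker's final definition.

```r
library(dplyr)

df <- data.frame(
  x = c(3, -2, 35, -4, 34, 20, 45),
  y = c(0, 2, 33, -4, 50, 30, 10)
)

out <- df %>%
  mutate(quart = case_when(
    x > 0 & x < 25 & y > 0 & y < 25     ~ "quart1", # both coordinates strictly inside (0, 25)
    x >= 25 & x < 50 & y >= 25 & y < 50 ~ "quart2", # both coordinates inside [25, 50)
    x < 0 & y < 0                       ~ "quart3", # both coordinates negative
    TRUE                                ~ "other"   # anything that matched no band
  ))

out
```

Because case_when takes the first matching condition, the bands do not overwrite each other the way repeated ifelse assignments do.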
I'm trying to aggregate the values of the variable Schulbildung (education) that are less than 12 into one group and sum the value of n for it. I tried using the aggregate() function but it didn't work. Does anybody have an idea?
Use mutate with an ifelse statement to recode every value that is smaller than 12.
Then summarise with dplyr.
df <- data.frame(
Education = c(18, 16, 15, 12, 10, 8),
entries = c(200, 100, 50, 50, 10 ,5)
)
You said Education is a grouping variable, so this means this is not the original data.frame, right?
library(dplyr)
df %>%
ungroup() %>%
mutate(Education = ifelse(Education < 12, "others", Education)) %>%
group_by(Education) %>%
summarise(entries = sum(entries))
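Since the question mentions aggregate(), the same result can be sketched in base R. This assumes the data look like the example data frame in this answer (Education and entries are this answer's stand-in column names, not the asker's real ones).

```r
df <- data.frame(
  Education = c(18, 16, 15, 12, 10, 8),
  entries = c(200, 100, 50, 50, 10, 5)
)

# Collapse every level below 12 into a single "others" label
df$Education <- ifelse(df$Education < 12, "others", as.character(df$Education))

# Sum the counts within each (possibly collapsed) level
out <- aggregate(entries ~ Education, data = df, FUN = sum)
out
```

Note that Education becomes a character column here, just as in the dplyr version, because "others" cannot live in a numeric column.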