Recode a range of values into one number using 'recode()' in tidyverse

I am working with a very messy data set and need to use the recode() function in a pipe to turn the numbers 0:30 into five numerical categories (0, 1, 2, 3, 4).
What I have:
recode(var, 10:30 = 4,
6:9 = 3,
3:5 = 2,
1:2 = 1,
0 = 0))
Any help is greatly appreciated!

It may be easier with case_when
library(dplyr)
case_when(var %in% 10:30 ~ 4,
var %in% 6:9 ~ 3,
var %in% 3:5 ~ 2,
var %in% 1:2 ~ 1,
var == 0 ~ 0)
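Or another option is cut. Its integer codes start at 1, so subtract 1 to get the categories 0 to 4; ending the breaks at 30 leaves larger values as NA, as case_when does:
as.integer(cut(var, breaks = c(-Inf, 0, 2, 5, 9, 30))) - 1L
NOTE: change the include.lowest and right options in cut to adjust which interval endpoints are closed.
Data: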
set.seed(24)
var <- sample(0:35, 50, replace = TRUE)
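As a quick sanity check, here is a minimal sketch that runs both approaches on the sample data and compares them. It assumes the cut() codes are shifted by one so they start at 0 and that the breaks end at 30; the names by_case and by_cut are only for illustration.

```r
library(dplyr)

set.seed(24)
var <- sample(0:35, 50, replace = TRUE)

# case_when version: values outside 0:30 match no condition and become NA
by_case <- case_when(
  var %in% 10:30 ~ 4,
  var %in% 6:9   ~ 3,
  var %in% 3:5   ~ 2,
  var %in% 1:2   ~ 1,
  var == 0       ~ 0
)

# cut version: intervals are (-Inf,0], (0,2], (2,5], (5,9], (9,30]
# cut() codes start at 1, so subtract 1 to get categories 0 to 4
by_cut <- as.integer(cut(var, breaks = c(-Inf, 0, 2, 5, 9, 30))) - 1L

# Compare the two results, including the NA positions for values above 30
identical(as.numeric(by_cut), by_case)
```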

Related

User defined function in R with dplyr

I have a data frame and am trying to write a function that counts the number of records by TRT01AN and another variable chosen by the user (I am posting a reduced data frame with only one extra variable to keep it simple):
dataframe <- as.data.frame(cbind(ID = c(1,2,3,4,5,6), TRT01AN = c(1, 1, 3, 2, 2, 2),
AGEGR1 = c("Adult","Child","Adolescent","Adolescent","Adolescent","Child")))
sub1 <- function(SUB1) {
# Calculate number of subjects in each treatment arm
bigN1 <- dataframe %>%
group_by_(SUB1,TRT01AN) %>%
summarise(N = n_distinct(ID))
return(bigN1)
}
bigN1<-sub1(SUB1="AGEGR1")
If I do that with group_by_, I get an error that TRT01AN doesn't exist, and if I use group_by, SUB1 can't be found... Any idea how I can have both variables, a "permanent" one and one defined as the argument of the function?
Thank you!
Try using double curly braces, {{ }}, and pass the column name unquoted in the function call (a quoted string would be treated as a constant value rather than as a column):
library(dplyr)
dataframe <-
as.data.frame(cbind(
ID = c(1, 2, 3, 4, 5, 6),
TRT01AN = c(1, 1, 3, 2, 2, 2),
AGEGR1 = c(
"Adult",
"Child",
"Adolescent",
"Adolescent",
"Adolescent",
"Child"
)
))
sub1 <- function(SUB1) {
# Calculate number of subjects in each treatment arm
bigN1 <- dataframe %>%
group_by({{SUB1}}, TRT01AN) %>%
summarise(N = n_distinct(ID))
return(bigN1)
}
bigN1 <- sub1(AGEGR1)
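If the column name really has to arrive as a string, as in the original call sub1(SUB1 = "AGEGR1"), one option is across(all_of()) inside group_by(). This is a sketch, not the answerer's code: sub1_chr is a hypothetical name, and the data frame is rebuilt with data.frame() so the numeric columns stay numeric.

```r
library(dplyr)

dataframe <- data.frame(
  ID = c(1, 2, 3, 4, 5, 6),
  TRT01AN = c(1, 1, 3, 2, 2, 2),
  AGEGR1 = c("Adult", "Child", "Adolescent", "Adolescent", "Adolescent", "Child")
)

# Hypothetical variant of sub1() that takes the grouping column as a string
sub1_chr <- function(SUB1) {
  dataframe %>%
    # all_of() looks the string up as a column name; TRT01AN stays fixed
    group_by(across(all_of(SUB1)), TRT01AN) %>%
    summarise(N = n_distinct(ID), .groups = "drop")
}

bigN1 <- sub1_chr("AGEGR1")
bigN1
```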

How to insert multiple columns at specific positions in large data frame in R?

I have a large data frame in R with over 200 participants who have answered 152 questions. Now, I want to insert a column based on a conditional query after each "Answer" column. As an example, I have the following data frame:
data <- data.frame(Participants = 1:5,
Answer1 = c(4, 6, 2, 2, 3),
Answer2 = c(5, 1, 3, 5, 4))
I now want to insert a new conditional column of "Confidence" after each "Answer" column. For the column of "Answer1", the query would look like this:
data$Confidence1 <- ifelse(data$Answer1 == 1 | data$Answer1 == 6, 2, ifelse(data$Answer1 == 2 | data$Answer1 == 5, 1, 0))
In the end, I want the data frame to look like this:
data <- data.frame(Participants = 1:5,
Answer1 = c(4, 6, 2, 2, 3),
Confidence1 = c(0, 2, 1, 1, 0),
Answer2 = c(5, 1, 3, 5, 4),
Confidence2 = c(1, 2, 0, 1, 0))
Does anyone have an idea on how to achieve this for all "Answer" columns at once? Thanks!
Here is one solution using the modify() function from the purrr package and the response by @r2evans to "merge two data table into one, with alternating columns in R":
library(purrr)
# create data, adding one more answer to your example
data <- data.frame(Participants = 1:5,
Answer1 = c(4, 6, 2, 2, 3),
Answer2 = c(5, 1, 3, 5, 4),
Answer3 = c(3, 2, 3, 4, 5))
# make a new df containing only the answer columns from data
answers <- data[,2:4]
# make confidence df and give it correct names
conf <- modify(answers, function(x) ifelse(x == 1 | x == 6, 2, ifelse(x == 2 | x == 5, 1, 0)))
names(conf) <- paste0("Confidence",1:ncol(answers))
# set order
neworder <- order(c(2*(seq_along(answers) - 1) + 1,
2*seq_along(conf)))
# put it together
cbind(Participants = data[,1], cbind(answers, conf)[,neworder])
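With 152 questions, hardcoding the column positions gets awkward. Below is a base R sketch that picks the answer columns by name instead. It assumes every answer column starts with "Answer" and that Participants is the only other column to keep; answer_cols, interleaved and result are illustrative names.

```r
data <- data.frame(Participants = 1:5,
                   Answer1 = c(4, 6, 2, 2, 3),
                   Answer2 = c(5, 1, 3, 5, 4))

# Find the answer columns by name rather than by position
answer_cols <- grep("^Answer", names(data), value = TRUE)

# Apply the same confidence rule to every answer column
conf <- lapply(data[answer_cols], function(x) {
  ifelse(x %in% c(1, 6), 2, ifelse(x %in% c(2, 5), 1, 0))
})
names(conf) <- sub("^Answer", "Confidence", answer_cols)

# rbind() stacks the two name vectors as rows; as.vector() reads the matrix
# column by column, which alternates Answer1, Confidence1, Answer2, ...
interleaved <- as.vector(rbind(answer_cols, names(conf)))

result <- cbind(data, as.data.frame(conf))[, c("Participants", interleaved)]
result
```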

How to loop through vector of column names, mutate each column with assignment back to column and function referencing loop index

If you can forgive my interest in loops, I'd like to know how to loop through a vector of variable names (must be strings in my use case) and mutate the original columns. In this toy example, I want to calculate the mean of the column i plus z.
df_have <- data.frame(x=c(1, 1, 2, 3, 3),
y=c(2, 2, 3, 4, 4),
z=c(0, 1, 2, 3, 4))
for (i in c("x", "y")) {
df_test <-
df_have %>%
mutate(!!i := mean(i)+z)
}
df_want <- data.frame(x=c(2, 3, 4, 5, 6), # mean 2 + z
y=c(3, 4, 5, 6, 7), # mean 3 + z
z=c(0, 1, 2, 3, 4))
Well, if you want to do a loop, then
df_test <- df_have
for (i in c("x", "y")) {
df_test <-
df_test %>%
mutate(!!i := mean((!!as.name(i)))+z)
}
Note that you need to turn those strings into symbols in order to use them in the expression for mutate. An easier trick in this case would be
df_have %>% mutate_at(c("x","y"), funs(mean(.)+z))
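Since funs() has been deprecated since dplyr 0.8.0, the same idea can be written with across() on a recent dplyr (1.0.0 or later is assumed here). A minimal sketch, where cols is an illustrative name for the vector of column-name strings:

```r
library(dplyr)

df_have <- data.frame(x = c(1, 1, 2, 3, 3),
                      y = c(2, 2, 3, 4, 4),
                      z = c(0, 1, 2, 3, 4))

# Column names stay as strings, as required in the question
cols <- c("x", "y")

df_test <- df_have %>%
  # .x is the current column; z is looked up in the data frame
  mutate(across(all_of(cols), ~ mean(.x) + z))

df_test
```

No explicit loop is needed, because across() applies the formula to each named column in turn.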

Loop through each variable and collect output R

I have a data frame that looks like this. The names and number of columns will NOT be consistent (sometimes 'C' will not be present, other times 'D', 'E', 'F' may be present, etc.)
# name and number of columns varies...so need flexible process
A <- c(1, 2, 1, 2, 3, 2, 1, 1, 1, 2, 1, 4, 3, 1, 2, 2, 1, 2, 4, 8)
B <- c(5, 6, 6, 5, 3, 7, 2, 1, 1, 2, 7, 4, 7, 8, 5, 7, 6, 6, 4, 7)
C <- c(9, 1, 2, 2, 1, 4, 5, 6, 7, 8, 89, 9, 7, 6, 5, 6, 8, 9 , 67, 6)
ABC <- data.frame(A, B, C)
I want to loop through each variable and collect various information. This is a simple example, but what I am doing will be more complicated. I say that so that somebody doesn't just recommend some sort of summary() type solution.
maximum_value <- max(A)
mean_value <- mean(A)
# lots of other calculations for A
ID = 'A'
tempA <- data.frame(ID, maximum_value, mean_value)
maximum_value <- max(B)
mean_value <- mean(B)
# lots of other calculations for B
ID = 'B'
tempB <- data.frame(ID, maximum_value, mean_value)
maximum_value <- max(C)
mean_value <- mean(C)
# lots of other calculations for C
ID = 'C'
tempC <- data.frame(ID, maximum_value, mean_value)
output <- rbind(tempA, tempB, tempC)
Here is my attempt at creating a loop to go through the variables one by one and aggregate output. I can't figure out how to get [i] to point at an individual column of the data frame ABC.
# initialize data frame
data__ <- data.frame(ID__ = as.character(),
max__ = as.numeric(),
mean__ = as.numeric())
# loop through A, then B, then C
for(i in A:C) {
ID__ <- '[i]'
max__ <- maximum[i]
mean__ <- mean[i]
data__temp <- (ID__, max__, mean__)
data__ <- rbind(data__, data__temp)
}
If I were doing this in SAS, I would use a select into within proc sql to create a list of the variable names, then write an array, and then I could loop through them that way, but there's something I'm missing here.
How would I tell R to do this process for each variable in the data frame?
If you use the tidyverse dplyr and tidyr packages, you can do
library(dplyr)
library(tidyr)
ABC %>% gather(ID, value) %>% group_by(ID) %>% summarize_all(funs(mean, max))
or
ABC %>% gather(ID, value) %>% group_by(ID) %>%
summarize(maximum_value = max(value), mean_value=mean(value))
If you'd rather use base functions and there are a lot of "weird" functions, you can use purrr's map2_df function, which passes each column and its name to your function and row-binds the results
library(purrr)
map2_df(ABC, names(ABC), function(a, n) {
data.frame(ID = n, max_val = max(a), mean_val = mean(a))
})
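For a version with no packages at all, which is closest to the loop in the question, here is a base R sketch. The helper name summarise_column is made up for illustration; names(ABC) supplies whatever columns happen to be present, so nothing is hardcoded.

```r
A <- c(1, 2, 1, 2, 3, 2, 1, 1, 1, 2, 1, 4, 3, 1, 2, 2, 1, 2, 4, 8)
B <- c(5, 6, 6, 5, 3, 7, 2, 1, 1, 2, 7, 4, 7, 8, 5, 7, 6, 6, 4, 7)
C <- c(9, 1, 2, 2, 1, 4, 5, 6, 7, 8, 89, 9, 7, 6, 5, 6, 8, 9, 67, 6)
ABC <- data.frame(A, B, C)

# Build a one-row summary for a single column, given its name as a string
summarise_column <- function(column_name) {
  values <- ABC[[column_name]]  # [[ ]] extracts a column by name
  data.frame(ID = column_name,
             maximum_value = max(values),
             mean_value = mean(values))
  # further calculations for the column would go here
}

# Apply the helper to every column name, then stack the one-row results
output <- do.call(rbind, lapply(names(ABC), summarise_column))
output
```

The key point is that ABC[[column_name]] is how a loop index holding a column name gets turned into the column itself.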

OR operator in filter()?

I want to use the filter() function to find the types that have an x value less than or equal to 4, OR a y value greater than 5. I think this is a simple fix, but I can't find much info in ?filter. I think I almost have it:
x = c(1, 2, 3, 4, 5, 6)
y = c(3, 6, 1, 9, 1, 1)
type = c("cars", "bikes", "trains")
df = data.frame(x, y, type)
df2 = df %>%
filter(x<=4)
Try the | operator (using y > 5 to match "greater than 5"):
df %>%
filter(x <= 4 | y > 5)
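To see the difference between OR and AND in filter(), here is a small sketch on the question's data. The type vector is repeated explicitly with rep() so all columns have six values; either and both are illustrative names.

```r
library(dplyr)

df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(3, 6, 1, 9, 1, 1),
  type = rep(c("cars", "bikes", "trains"), times = 2)
)

# OR: keep rows where at least one condition holds
either <- df %>% filter(x <= 4 | y > 5)

# AND: conditions separated by commas must all hold
both <- df %>% filter(x <= 4, y > 5)

nrow(either)
nrow(both)
```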
