How to change column order in tbl_summary() in R

I am using tbl_summary in R to create a table, but I do not like the default ordering of the columns.
```
tbl9 <- DH_df2 %>% select(Age_Group, BOP_Level)
tbl9 %>%
  tbl_summary(by = Age_Group,
              missing = "no") %>%
  add_p(everything() ~ "chisq.test") %>%
  modify_header(label ~ "**Age Groups**") %>%
  modify_caption("**Table 1. Correlation Age and BOP**") %>%
  bold_labels()
```
My output gives me the correct information and table, but I want to move "Under 35" so that it comes before "Age36~50". How can I do that?
Current column output: Age Groups / Age36~50 / Age51~ / Under 35 / p-value

The easiest way to define the order of the by variable levels is to make your by variable a factor with the levels in the order you'd like them to appear.
PS if you provide reproducible examples in your posts (i.e. code we can all run on our machines), you'll get more detailed solutions.
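The factor approach described above can be sketched as follows. This is a minimal illustration on a made-up data frame: the `demo` object and its values are assumptions, not the asker's `DH_df2`. The releveled column would then be passed to tbl_summary(by = Age_Group) as before.

```r
# Hypothetical stand-in for the asker's data; values are invented.
demo <- data.frame(
  Age_Group = c("Age36~50", "Under 35", "Age51~", "Under 35", "Age36~50"),
  BOP_Level = c("Low", "High", "Low", "Low", "High")
)

# Without explicit levels, factor() sorts the labels,
# which is why "Under 35" ends up last by default.
levels(factor(demo$Age_Group))

# Set the levels in the order the columns should appear.
demo$Age_Group <- factor(
  demo$Age_Group,
  levels = c("Under 35", "Age36~50", "Age51~")
)

# A by-group summary now follows the level order.
table(demo$BOP_Level, demo$Age_Group)
```

Only the level order changes here; the underlying values stay the same.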

https://statisticsglobe.com/reorder-columns-of-data-frame-in-r
These are the basic methods for reordering the columns of a data frame (taken from the link above).
Use the indexing operator:
data[ , c(2, 1, 3)] # by index
data[ , c("x2", "x1", "x3")] # by name
Use subset:
subset(data, select = c(2, 1, 3))
Use dplyr's select function:
data %>% select(x2, x1, x3)

You can use relocate function from the dplyr package.
library(dplyr)
tbl9 <- tbl9 %>%
relocate(` Under 35`, .before = `Age36~50`)

Related

mutate cells of a range of columns if the column name is in another column

I have a huge dataset where I would like to change a cell value in a range of columns, if the column name is in another column.
I know I can loop through cells, and use ifelse, but this becomes very slow very soon, it seems. I got as far as using mutate() and across() but cannot work out how to make a logical with the column name.
I would be grateful if someone could suggest a vectorized approach, or point me to a similar question (which I was unable to find!), using tidyverse if possible.
Example of a dataset and the nested for loops:
a <- c(1,2,3,4)
b <- c(5,6,7,8)
c <- c(9,10,11,12)
d <- c("a","b","c","none")
test <- data.frame(a,c,b,d)
for(column in 1:3){
for(row in 1:nrow(test)){
test[row,column] <- ifelse(names(test)[column] == test$d[row], -99, test[row, column])
}
}
I found the solution to my own question: cur_column() gives the name of the current column inside an across() call, and it can be combined with ifelse().
test %>% mutate(across(c(a, b, c), ~ifelse(cur_column() == d, -99, .)))
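For comparison, the same replacement can be sketched in base R without the nested loops. This is only an illustration, not part of the original answer: it loops over the target columns once and uses a vectorized comparison for the rows. The `target_cols` name is introduced here for the example.

```r
test <- data.frame(
  a = c(1, 2, 3, 4),
  c = c(9, 10, 11, 12),
  b = c(5, 6, 7, 8),
  d = c("a", "b", "c", "none")
)

target_cols <- c("a", "b", "c")  # columns that may be overwritten

for (col in target_cols) {
  # test$d == col is a logical vector over all rows:
  # TRUE where d names the current column.
  test[[col]][test$d == col] <- -99
}

test
```

The loop runs once per column rather than once per cell, so the cost grows with the number of target columns, not the number of rows.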
You could also do this one column at a time with case_when(), comparing the reference column d against each target column's name.
library(tidyverse)
test %>%
  mutate(a = case_when(
    d == "a" ~ -99,
    TRUE ~ a
  ))
You could then add a new mutate, or include it in the same mutate, for each "target" column, for example:
test %>%
  mutate(a = case_when(
    d == "a" ~ -99,
    TRUE ~ a
  )) %>%
  mutate(b = case_when(
    d == "b" ~ -99,
    TRUE ~ b
  ))
If you have multiple source columns (i.e. several columns like d), you would need to add conditions to your mutates that account for each of them. Since your test data does not include that case, I won't get into it unless required.

Using named list to generate cases for case_when

I am trying to figure out how to use case_when for groups stored in a list so modifying the list will modify the result of the case_when.
Here is a toy test case:
library(tidyverse)
info <- tibble(target = letters[seq(1, 10)])
groups <- list("A" = letters[seq(1, 10, by = 3)],
"B" = letters[seq(2, 10, by = 3)],
"C" = letters[seq(3, 10, by = 3)])
info %>% mutate(case_when(
target %in% groups$A ~ names(groups)[1],
target %in% groups$B ~ names(groups)[2],
target %in% groups$C ~ names(groups)[3]
))
This gives the output I want but I want to generate the options in the case_when dynamically from the list. I imagine it would be something like this:
generate_cases <- function(x, i) {
### I have no idea what to do here...
}
cases <- groups %>% imap(generate_cases)
info %>% mutate(case_when(!!! cases))
I suspect the answer uses something like quo() and rlang::expr(), but I really can't figure out how to string it together.
Here's one way using the purrr::imap function
cases <- imap(groups, ~quo(target %in% !!.x ~ !!.y))
info %>% mutate(case_when(
!!!cases
))
A better alternative might be to reshape your groups into a proper lookup table so you can do an efficient left join. One way would be
info %>%
left_join(stack(groups), by=c("target"="values"))
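The lookup-table idea can also be sketched in base R alone, which makes the shape of stack(groups) explicit. This is an illustrative variant, not the answerer's code: `lookup` and the `group` column name are assumptions made for the example, and `info` is built as a plain data frame rather than a tibble.

```r
info <- data.frame(target = letters[1:10])
groups <- list(
  A = letters[seq(1, 10, by = 3)],
  B = letters[seq(2, 10, by = 3)],
  C = letters[seq(3, 10, by = 3)]
)

# stack() turns the named list into a two-column data frame:
# `values` (the letters) and `ind` (the list name, as a factor).
lookup <- stack(groups)

# match() finds each target's row in the lookup table.
info$group <- as.character(lookup$ind[match(info$target, lookup$values)])

info
```

Editing `groups` changes the result without touching the matching code, which is what the question asked for.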

How to add a new column using mutate function from a group of existing columns with similar names

I would like to add a column to my data frame based upon the values in other columns.
Here is an extract of the data.
On each row, if any of the 4 TOPER columns has any of the following values (92514, 92515, 92508, 92510, 92511 or 92512), I want the S_Flag column to be equal to 1. If not, the S_Flag value should be 0.
I have highlighted the data where this is true (case nos 2, 4, 6 and 8); for those rows S_Flag should be 1.
I have tried using an ifelse inside a mutate function, but I am not sure how to look across all 4 TOPER columns within the ifelse function.
Have tried
tt <- mutate(rr, S_Flag = ifelse( any(vars(everything()) %in% toper_vec), 1,0))
where rr is the original data frame and toper_vec is a vector containing the 6 TOPER column values.
I hope that makes sense. By the way, I am in the early stages of learning R.
Thank you for any assistance.
A couple of quick fixes should make your code work:
(1) use rowwise() and
(2) use across().
The revised code reads:
tt <- rr %>%
rowwise() %>%
mutate(S_Flag = if_else( any(across(everything()) %in% toper_vec), 1,0))
A similar question was addressed in the following informative post: Check row wise if value is present in column and update new column row wise
Applying the suggested approach in that post to your immediate question, the following should work:
library(tidyverse)
toper_vec <- c(92514, 92515, 92508, 92510, 92511, 92512)
df <- data.frame("CASE" = c(1, 2, 3, 4, 5),
"TOPER1" = c(86509, 92514, 87659, 45232, 86509),
"TOPER2" = c(12341, 10094, 12341, 92508, 10094),
"TOPER3" = c(86509, 67326, 41908, 50567, 50567))
new_df <- df %>%
rowwise() %>%
mutate(S_Flag = case_when(TOPER1 %in% toper_vec ~ 1,
TOPER2 %in% toper_vec ~ 1,
TOPER3 %in% toper_vec ~ 1,
TRUE ~ 0))
Here's an alternative, reusing toper_vec and df from Blue050205's answer:
df %>%
rowwise() %>%
mutate(s_flag = if_else(any(c_across(starts_with("TOP")) %in% toper_vec), 1, 0))
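A vectorized variant avoids rowwise() altogether, which may matter on large data. The following is a minimal base R sketch using the same example data as the answers above. The `toper_cols` and `hit` names are introduced here for illustration only.

```r
toper_vec <- c(92514, 92515, 92508, 92510, 92511, 92512)
df <- data.frame(
  CASE   = c(1, 2, 3, 4, 5),
  TOPER1 = c(86509, 92514, 87659, 45232, 86509),
  TOPER2 = c(12341, 10094, 12341, 92508, 10094),
  TOPER3 = c(86509, 67326, 41908, 50567, 50567)
)

# Pick the TOPER columns by name prefix.
toper_cols <- grep("^TOPER", names(df), value = TRUE)

# One logical vector per column (TRUE where the value is in toper_vec),
# then combine them element-wise with OR.
hit <- Reduce(`|`, lapply(df[toper_cols], function(x) x %in% toper_vec))

df$S_Flag <- as.integer(hit)
df
```

Each column is checked in a single %in% call, so no per-row evaluation is needed.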

Add multiple columns with mutate using column-based conditions, without using explicit column name + POSIX

I have a dataframe of data: 1 column is POSIX, the rest is data.
I need to remove selectively some data from a group of columns and add these "new" columns to the original dataframe.
I can "easily" do it in base R (I am an old-style user). I'd like to do it more compactly with mutate_at or with other function... although I am having several issues.
A solution homemade with base R could be
df <- data.frame("date" = seq.POSIXt(as.POSIXct(format(Sys.time(),"%F %T"),tz="UTC"),length.out=20,by="min"), "a.1" = rnorm(20,0,3), "a.2" = rnorm(20,1,2), "b.1"= rnorm(20,1,4), "b.2"= rnorm(20,3,4))
df1 <- lapply(df[,grep("^a",names(df))], function(x) replace(x, which(x > 0 & x < 0.2), NA))
df1 <- data.frame(matrix(unlist(df1), nrow = nrow(df), byrow = F)) ## convert to data.frame
names(df1) <- grep("^a",names(df),value=T) ## rename columns
df1 <- cbind.data.frame("date"=df$date, df1) ## add date
Can anyone help me set up something that works with dplyr + transmute?
So far I have come up with something like:
df %>%
select(starts_with("a.")) %>%
transmute(
case_when(
.>0.2 ~ NA,
)
) %>%
cbind.data.frame(df)
But I am quite stuck, since I can't combine transmute with case_when: all examples that I found use explicitly the column names in case_when, but I can't, since I won't know the names of the column in advance. I will only know the initial of the columns that I need to transmute.
Thanks,
Alex
We can use transmute_at if the intention is to return only those columns specified in the vars
library(dplyr)
df %>%
transmute_at(vars(starts_with('a')), ~ case_when(. > 0.2~ NA_real_, TRUE~ .)) %>%
bind_cols(df %>% select(date), .)
If we need all the columns to return, but only change the columns of interest in vars, then we need mutate_at instead of transmute_at
df %>%
mutate_at(vars(starts_with('a')), ~ case_when(. > 0.2~ NA_real_, TRUE~ .)) %>%
select(date, starts_with('a')) # only need if we are selecting a subset of columns
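In dplyr 1.0.0 and later, the scoped verbs such as mutate_at() and transmute_at() are superseded by across(). A minimal sketch of the same idea with across() is shown below. The small data frame is a deterministic stand-in with invented values, and `all_cols` and `a_only` are names chosen for this example. It follows the answer's condition (values above 0.2 become NA).

```r
library(dplyr)

# Small deterministic stand-in for the asker's data (values invented).
df <- data.frame(
  date = as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + 60 * (0:2),
  a.1 = c(0.1, 0.5, -1),
  a.2 = c(0.3, 0.0, 0.2),
  b.1 = c(1, 2, 3)
)

# Keep every column, but set values above 0.2 to NA in the "a." columns.
all_cols <- df %>%
  mutate(across(starts_with("a."), ~ replace(.x, .x > 0.2, NA)))

# Keep only date plus the modified "a." columns.
a_only <- all_cols %>%
  select(date, starts_with("a."))
```

Because starts_with() selects by prefix, the column names do not need to be known in advance.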

conditional mutate by dplyr

I want to create a new column in my dataset that i) drops the last character if the word starts with "vi", and ii) drops the last 2 characters if the word does not start with "vi". I know how to do that in base R, like below:
iris$Species <- as.character(iris$Species)
iris$Species_mod <- substr(iris$Species,
1,
ifelse(grepl('^vi',iris$Species),
nchar(iris$Species)-1,
nchar(iris$Species)-2))
But I have a hard time in deciphering the mutate, if_else and matches in dplyr. Can anyone enlighten me? Thanks!
Same idea, except you explicitly need to convert your factor to a string for nchar():
iris = mutate(iris, Species_mod = substr(Species, 1, nchar(as.character(Species)) -
ifelse(grepl('^vi', Species), 1, 2)))
You can try something like:
iris %>%
  mutate(Species = as.character(Species)) %>%
  mutate(species2 = case_when(
    startsWith(Species, "vi") ~ substr(Species, 1, nchar(Species) - 1),
    TRUE ~ substr(Species, 1, nchar(Species) - 2)
  ))

Resources