I want to create a new column in my dataset that: i) drops the last character if the word starts with "vi"; and ii) drops the last 2 characters if the word does not start with "vi". I know how to do that in base R, like below:
iris$Species <- as.character(iris$Species)
iris$Species_mod <- substr(iris$Species,
1,
ifelse(grepl('^vi',iris$Species),
nchar(iris$Species)-1,
nchar(iris$Species)-2))
But I have a hard time deciphering mutate, if_else and matches in dplyr. Can anyone enlighten me? Thanks!
Same idea, except that you need to explicitly convert your factor to a string before calling nchar():
iris = mutate(iris, Species_mod = substr(Species, 1, nchar(as.character(Species)) -
ifelse(grepl('^vi', Species), 1, 2)))
You can try something like:
iris %>%
mutate(Species = as.character(Species)) %>%
mutate(species2 = case_when(
# 'vi%' is an SQL-style wildcard; == compares literally in R, so test the prefix instead
startsWith(Species, 'vi') ~ substr(Species, 1, nchar(Species) - 1),
TRUE ~ substr(Species, 1, nchar(Species) - 2)
))
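For reference, the rule can be checked by hand on the three species names in iris. This is a minimal base R sketch; the helper name trim_species is made up for illustration and is not part of dplyr or of the answers above.

```r
# Hypothetical helper (name chosen for illustration only):
# drop 1 trailing character if the word starts with "vi", otherwise drop 2.
trim_species <- function(x) {
  x <- as.character(x)                        # nchar() does not accept factors
  n_drop <- ifelse(startsWith(x, "vi"), 1, 2) # vectorised choice per element
  substr(x, 1, nchar(x) - n_drop)
}

species <- c("setosa", "versicolor", "virginica")
trimmed <- trim_species(species)
print(trimmed)
# "seto" "versicol" "virginic"
```

Note that "versicolor" starts with "ve", not "vi", so it loses two characters like "setosa" does.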
I am using tbl_summary in R to create a table, but I do not like the default ordering of the columns.
```
tbl9 <- DH_df2 %>% select(Age_Group, BOP_Level)
tbl9 %>%
tbl_summary(by = Age_Group,
missing = "no") %>%
add_p(everything() ~ "chisq.test") %>%
modify_header(label ~ "**Age Groups**") %>%
modify_caption("**Table 1. Correlation Age and BOP**") %>%
bold_labels()
```
My output gives me the correct information and table, but I want to move "Under 35" so that it comes before "Age36~50". How can I do that?
Current Column output: Age Groups / Age36~50 / Age51~ / Under 35 / p-value
The easiest way to define the order of the by variable levels is to make your by variable a factor with the levels in the order you'd like them to appear.
PS if you provide reproducible examples in your posts (i.e. code we can all run on our machines), you'll get more detailed solutions.
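A minimal sketch of the factor approach, using a small made-up data frame in place of DH_df2 (which was not posted). The level labels are copied from the column output in the question and are assumptions; they must match the values in your data exactly, including any stray spaces.

```r
# Toy stand-in for DH_df2; the real data was not shared, so these rows are invented.
toy_df <- data.frame(
  Age_Group = c("Age36~50", "Under 35", "Age51~", "Under 35", "Age36~50"),
  BOP_Level = c("Low", "High", "Low", "Low", "High")
)

# Without explicit levels, factor() orders the labels by sorting them,
# which is why "Under 35" ends up last in the table.
# Setting the levels explicitly fixes the order the groups appear in.
toy_df$Age_Group <- factor(
  toy_df$Age_Group,
  levels = c("Under 35", "Age36~50", "Age51~")
)

print(levels(toy_df$Age_Group))
```

With the by variable stored as a factor like this, tbl_summary(by = Age_Group) should lay out the columns in the level order.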
https://statisticsglobe.com/reorder-columns-of-data-frame-in-r
These are the basic methods (taken from above link).
use the indexing operator:
data[ , c(2, 1, 3)] with index
data[ , c("x2", "x1", "x3")] with names
use subset:
subset(data, select = c(2, 1, 3))
use dplyr's select function:
data %>% select(x2, x1, x3)
You can use the relocate function from the dplyr package.
library(dplyr)
tbl9 <- tbl9 %>%
relocate(` Under 35`, .before = `Age36~50`)
I need to set the values of a column to 0 or 1 based on other columns' values.
If they are 0 or NA, the new column should be 1.
I thought about:
ifelse(df[,53:62]==0|NA, df$newCol <- 1, df$newCol <- 0)
But in the end I get only 1 in the new column.
Thanks for your help.
I think the tidyverse fits this common use case perfectly:
library(tidyverse)
df_example <- matrix(c(0,1),ncol = 100,nrow = 100) %>%
as_tibble()
df_example %>%
mutate(across(.cols = 53:62,
.fns = ~ if_else(.x == 0|is.na(.x),
1,
0))
) %>%
select(V54) # example: inspect one of the recoded columns
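If the goal is a single new column rather than recoding each of the ten columns, one possible reading of the question is: newCol is 1 when every value in columns 53 to 62 of a row is 0 or NA, and 0 otherwise. The base R sketch below assumes that reading and uses a small made-up data frame with three columns in place of columns 53:62. Note that x == 0 | NA does not test for missing values; is.na() is needed for that.

```r
# Toy data: three columns stand in for df[, 53:62] from the question.
df <- data.frame(
  a = c(0, 0, NA, 2),
  b = c(0, NA, NA, 0),
  c = c(0, 1, NA, 0)
)
cols <- c("a", "b", "c")   # in the original question this would be 53:62

# Logical matrix: TRUE where a cell is missing or equal to 0.
zero_or_na <- is.na(df[cols]) | df[cols] == 0

# Assumed rule: 1 if every selected cell in the row is 0 or NA, otherwise 0.
df$newCol <- as.integer(rowSums(zero_or_na) == length(cols))

print(df$newCol)
# 1 0 1 0
```

If "any" rather than "all" is meant, replacing the comparison with rowSums(zero_or_na) > 0 would give that variant.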
In a dplyr workflow I try to paste a 0 in each column of a dataframe after the newvar column when newvar == 0, else do nothing.
I modified the iris dataset:
library(dplyr)
n <- 150 # sample size
iris1 <- iris %>%
mutate(id = row_number(), .before = Sepal.Length) %>%
mutate(newvar = sample(c(0,1), replace=TRUE, size=n), .before = Sepal.Length ) %>%
mutate(across(.[,3:ncol(.)], ~ case_when(newvar==0 ~ 0)))
I tried a solution like the one in "How to combine the across () function with mutate () and case_when () to mutate values in multiple columns according to a condition?".
My understanding:
with .[,3:ncol(.)] I go through the columns after newvar column.
with case_when(newvar==0 I try to set the condition.
with ~ 0 after newvar==0 I try to say paste 0 if condition is fulfilled.
I know that I am doing something wrong, but I don't know what! Thank you for your help.
.[,3:ncol(.)] returns the values of those columns, not the column positions that across() expects. Using 3:ncol(.) should work fine.
In general, it is also better to avoid referring to columns by position and to use their names instead. You can do this in one mutate call.
library(dplyr)
n <- 150
iris %>%
mutate(id = row_number(),
newvar = sample(c(0,1), replace=TRUE, size=n),
across(Sepal.Length:Petal.Width, ~ case_when(newvar==0 ~ 0,
newvar == 1 ~ .)))
I would like to add a column to my data frame based upon the values in other columns.
Here is an extract of the data.
On each row, if any of the 4 TOPER columns has one of the following values (92514, 92515, 92508, 92510, 92511 or 92512), I want the S_Flag column to be equal to 1. If not, the S_Flag value should be 0.
I have highlighted the data where this is true (case nos 2, 4, 6 and 8); for those rows S_Flag should be 1.
I have tried using an ifelse inside a mutate function. I am just not sure how to look across all 4 TOPER columns within the ifelse function.
Have tried
tt <- mutate(rr, S_Flag = ifelse( any(vars(everything()) %in% toper_vec), 1,0))
where rr is the original data frame and toper_vec is a vector containing the 6 TOPER column values.
Hope that makes sense. By the way, I am in the early stages of learning R.
Thank you for any assistance.
A couple of quick fixes should make your code work:
(1) use rowwise() and
(2) use across().
The revised code reads:
tt <- rr %>%
rowwise() %>%
mutate(S_Flag = if_else( any(across(everything()) %in% toper_vec), 1,0))
A similar question was addressed in the following informative post: Check row wise if value is present in column and update new column row wise
Applying the suggested approach in that post to your immediate question, the following should work:
library(tidyverse)
toper_vec <- c(92514, 92515, 92508, 92510, 92511, 92512)
df <- data.frame("CASE" = c(1, 2, 3, 4, 5),
"TOPER1" = c(86509, 92514, 87659, 45232, 86509),
"TOPER2" = c(12341, 10094, 12341, 92508, 10094),
"TOPER3" = c(86509, 67326, 41908, 50567, 50567))
new_df <- df %>%
rowwise() %>%
mutate(S_Flag = case_when(TOPER1 %in% toper_vec ~ 1,
TOPER2 %in% toper_vec ~ 1,
TOPER3 %in% toper_vec ~ 1,
TRUE ~ 0))
Here's an alternative, reusing toper_vec and df from Blue050205:
df %>%
rowwise() %>%
mutate(s_flag = if_else(any(c_across(starts_with("TOP")) %in% toper_vec), 1, 0))
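rowwise() can become slow on large data frames. As a rough alternative, here is a vectorised base R sketch that reuses toper_vec and the example df from the answers above; the names toper_cols and hit are introduced here for illustration only.

```r
toper_vec <- c(92514, 92515, 92508, 92510, 92511, 92512)
df <- data.frame(CASE = c(1, 2, 3, 4, 5),
                 TOPER1 = c(86509, 92514, 87659, 45232, 86509),
                 TOPER2 = c(12341, 10094, 12341, 92508, 10094),
                 TOPER3 = c(86509, 67326, 41908, 50567, 50567))

# Pick the TOPER columns by name rather than by position.
toper_cols <- grep("^TOPER", names(df), value = TRUE)

# One logical vector per column (is the value in toper_vec?),
# then combine the columns element-wise with OR.
per_column <- lapply(df[toper_cols], function(col) col %in% toper_vec)
hit <- Reduce(function(a, b) a | b, per_column)

df$S_Flag <- as.integer(hit)
print(df$S_Flag)
# 0 1 0 1 0
```

Cases 2 and 4 are flagged because 92514 and 92508 appear in TOPER1 and TOPER2 respectively.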
I am trying to replace some filtered values of a data set. So far, I wrote these lines of code:
df %>%
filter(group1 == uniq[i]) %>%
mutate(values = ifelse(sum(values) < 1, 2, NA)),
where uniq is just a list containing the group names I want to focus on (and group1 and values are column names). This is actually working. However, it only outputs the altered filtered rows and does not replace anything in the data set df. Does anyone have an idea where my mistake is? Thank you so much! The following code reproduces the example:
group1 <- c("A","A","A","B","B","C")
values <- c(0.6,0.3,0.1,0.2,0.8,0.9)
df = data.frame(group1, values) # group2 was never defined, so it is left out here
uniq <- unique(unlist(df$group1))
for (i in 1:length(uniq)){
df <- df %>%
filter(group1 == uniq[i]) %>%
mutate(values = ifelse(sum(values) < 1, 2, NA))
}
What I would like is to keep all values except the last one, which forms its own group (group1 == "C") with a sum of 0.9 < 1. So I'd like to get the exact same data frame, except that 0.9 is replaced with NA. Moreover, would it be possible to just use if instead of ifelse?
dplyr won't create a new object unless you use an assignment operator (<-).
Compare
require(dplyr)
data(mtcars)
mtcars %>% filter(cyl == 4)
with
mtcars4 <- mtcars %>% filter(cyl == 4)
mtcars4
The data are the same, but in the second example the filtered data is stored in a new object, mtcars4.
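To change values within one group while keeping every other row, the filter step can be avoided altogether, because assigning the filtered result back to df is what drops the remaining rows. The base R sketch below follows the outcome described in the question (the 0.9 in group C becomes NA because its group sum is below 1). The tolerance tol is an added assumption to guard against floating-point rounding when a group sums to 1.

```r
group1 <- c("A", "A", "A", "B", "B", "C")
values <- c(0.6, 0.3, 0.1, 0.2, 0.8, 0.9)
df <- data.frame(group1, values)

# Sum of values per group, repeated for every row of that group.
group_sum <- ave(df$values, df$group1, FUN = sum)

# Assumed tolerance: 0.6 + 0.3 + 0.1 may not be exactly 1 in floating point.
tol <- 1e-8

# Replace values only in groups whose sum is clearly below 1;
# all other rows are left as they are.
df$values[group_sum < 1 - tol] <- NA

print(df)
```

No loop over uniq is needed, and neither if nor ifelse is required: the logical index selects the rows to overwrite. A plain if() evaluates a single TRUE/FALSE, so it would only fit a test made once per group, not an element-wise one.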