Mutate if variable name appears in a list - r

I would like to use dplyr to divide a subset of variables by their IQR. I am open to approaches other than what I've tried so far, which is a combination of mutate_if and %in%. I want to reference the variables by the names in a list rather than indexing the data frame by position. Thanks for any thoughts!
contin <- c("age", "ct")
data %>%
mutate_if(%in% contin, function(x) x/IQR(x))

Use across() with all_of(), which selects the columns named in a character vector:
data %>%
mutate(across(all_of(contin), ~.x/IQR(.x)))
Working example:
data <- head(iris)
contin <- c("Sepal.Length", "Sepal.Width")
data %>%
mutate(across(all_of(contin), ~.x/IQR(.x)))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 15.69231 7.777778 1.4 0.2 setosa
2 15.07692 6.666667 1.4 0.2 setosa
3 14.46154 7.111111 1.3 0.2 setosa
4 14.15385 6.888889 1.5 0.2 setosa
5 15.38462 8.000000 1.4 0.2 setosa
6 16.61538 8.666667 1.7 0.4 setosa
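One point worth checking when selecting by a character vector: all_of() raises an error if any listed name is missing from the data, while any_of() silently skips missing names. A minimal sketch, assuming a made-up name not_a_column that does not exist in iris:

```r
library(dplyr)

data <- head(iris)
# "not_a_column" is a hypothetical name used only to illustrate a missing column
contin <- c("Sepal.Length", "not_a_column")

# any_of() scales the columns that exist and ignores the rest
res <- data %>%
  mutate(across(any_of(contin), ~ .x / IQR(.x)))

# all_of() is strict: the same vector makes the call fail
strict_failed <- inherits(
  try(data %>% mutate(across(all_of(contin), ~ .x / IQR(.x))), silent = TRUE),
  "try-error"
)
```

Which helper fits depends on whether a missing variable should stop the analysis or be tolerated.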

Related

Rename several columns using start with in r

I want to rename multiple columns that start with the same string.
However, none of the code I tried changed the column names.
For example this:
df %>% rename_at(vars(matches('^oldname,\\d+$')), ~ str_replace(., 'oldname', 'newname'))
And also this:
df %>% rename_at(vars(starts_with(oldname)), funs(sub(oldname, newname, .))
Do you know a suitable way to rename these columns?
Thank you!
Taking iris as an example, you can use rename_with() to replace the "Petal" prefix of the matching column names with a new string.
head(iris) %>%
rename_with(~ sub("^Petal", "New", .x), starts_with("Petal"))
Sepal.Length Sepal.Width New.Length New.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
You can also use rename_at() in this case, although rename_if(), rename_at(), and rename_all() have been superseded by rename_with().
head(iris) %>%
rename_at(vars(starts_with("Petal")), ~ sub("^Petal", "New", .x))
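The same pattern can be sketched for the case in the question, where the shared prefix is followed by digits. The data frame and the column names oldname1, oldname2 and other are made up for illustration:

```r
library(dplyr)

# Hypothetical data frame; the column names are assumptions, not from the question
df <- data.frame(oldname1 = 1:3, oldname2 = 4:6, other = 7:9)

# matches() selects by regular expression; sub() rewrites only the prefix
renamed <- df %>%
  rename_with(~ sub("^oldname", "newname", .x), matches("^oldname\\d+$"))

names(renamed)
```

Here the strings are quoted; passing a bare oldname to starts_with() would look for an object of that name instead of the literal prefix.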

Multiply value with specific columns

I want to multiply a value (0.045) with specific columns (that start with "i") in a dataset. There is also a column called "id" that has the value 0.045 in all rows.
I've tried this, which did not work:
df %>%
mutate(across(starts_with("i")), ~.id)
The columns to be multiplied can be specified based on position or based on the fact that they all start with "i"
Hope someone can help me.
Thanks a lot!
Magnus
Try this. I used the iris dataset to create the example. Note that the function applied to the columns must be passed inside across(), not outside it as in your code. Here is the solution:
library(tidyverse)
#Code
iris %>%
mutate(across(starts_with("Sepal"), ~ .x * 0.045))
Output (some rows):
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 0.2295 0.1575 1.4 0.2 setosa
2 0.2205 0.1350 1.4 0.2 setosa
3 0.2115 0.1440 1.3 0.2 setosa
4 0.2070 0.1395 1.5 0.2 setosa
5 0.2250 0.1620 1.4 0.2 setosa
6 0.2430 0.1755 1.7 0.4 setosa
7 0.2070 0.1530 1.4 0.3 setosa
8 0.2250 0.1530 1.5 0.2 setosa
9 0.1980 0.1305 1.4 0.2 setosa
Base R solution:
cols_bool <- startsWith(names(iris), "Sepal")
cbind(iris[,!cols_bool, drop = FALSE], iris[,cols_bool, drop = FALSE] * 0.045)
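If the multiplier should come from the id column rather than a hard-coded constant, one possible sketch is below. The data frame and the columns i1, i2 and other are hypothetical. Since id itself starts with "i", it is excluded from the selection so that it is not multiplied by itself:

```r
library(dplyr)

# Hypothetical data; only the id column (0.045 in every row) comes from the question
df <- data.frame(
  id    = rep(0.045, 3),
  i1    = c(10, 20, 30),
  i2    = c(1, 2, 3),
  other = c(5, 5, 5)
)

# Select columns starting with "i" except id, then multiply each by id row-wise
scaled <- df %>%
  mutate(across(starts_with("i") & !id, ~ .x * id))
```

Columns that do not match the selection (other, and id itself) are left unchanged.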

How to tidily create multiple columns from sets of columns?

I'm looking to use a non-across function from mutate to create multiple columns. My problem is that the variable in the function will change along with the crossed variables. Here's an example:
needs=c('Sepal.Length','Petal.Length')
iris %>% mutate_at(needs, ~./'{col}.Width')
This obviously doesn't work, but I'm looking to divide Sepal.Length by Sepal.Width and Petal.Length by Petal.Width.
I think the elements of needs should be the part that both column names have in common (here, the prefix).
You can then select the columns matching each pattern in needs and divide them by position. !! and := are used to set the name of each new column.
library(dplyr)
library(rlang)
needs = c('Sepal','Petal')
purrr::map_dfc(needs, ~iris %>%
select(matches(.x)) %>%
transmute(!!paste0(.x, '_divide') := .[[1]]/.[[2]]))
# Sepal_divide Petal_divide
#1 1.457142857 7.000000000
#2 1.633333333 7.000000000
#3 1.468750000 6.500000000
#4 1.483870968 7.500000000
#...
#...
If you want to add these as new columns, you can bind_cols() the result above with iris.
Here is a base R approach that relies on the columns you want to divide sharing a name pattern:
res <- sapply(split.default(iris[-ncol(iris)], sub('\\..*', '', names(iris[-ncol(iris)]))), function(i) i[1] / i[2])
iris[names(res)] <- res
head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Petal.Length Sepal.Sepal.Length
#1 5.1 3.5 1.4 0.2 setosa 7.00 1.457143
#2 4.9 3.0 1.4 0.2 setosa 7.00 1.633333
#3 4.7 3.2 1.3 0.2 setosa 6.50 1.468750
#4 4.6 3.1 1.5 0.2 setosa 7.50 1.483871
#5 5.0 3.6 1.4 0.2 setosa 7.00 1.388889
#6 5.4 3.9 1.7 0.4 setosa 4.25 1.384615
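Another way to pair each Length column with its Width counterpart is to stay inside across() and look up the partner column by name. This is a sketch, assuming a dplyr version that provides across() and cur_column(); the suffix _per_width is an arbitrary choice:

```r
library(dplyr)

ratios <- iris %>%
  mutate(across(
    ends_with("Length"),
    # For the current column, find the matching ".Width" column by name
    ~ .x / .data[[sub("Length$", "Width", cur_column())]],
    .names = "{.col}_per_width"
  ))

head(ratios[c("Sepal.Length_per_width", "Petal.Length_per_width")])
```

This keeps the original columns and appends the two ratio columns in one step.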

R Defined Function to review numeric column and calculate log

I have a dataframe with 10 vars. Three are factors and seven are numeric. I want to write a defined function that looks through each column and determines if it is numeric; and if it is numeric calculate the log.
Here's one simple way with dplyr package -
your_df %>%
mutate_if(is.numeric, log)
As per comment, if you want to keep the original variables and add the logs as new variables -
your_df %>%
mutate_if(is.numeric, list(LG = log))
Example -
head(iris) %>%
mutate_if(is.numeric, list(LG = log))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_LG Sepal.Width_LG Petal.Length_LG Petal.Width_LG
1 5.1 3.5 1.4 0.2 setosa 1.629241 1.252763 0.3364722 -1.6094379
2 4.9 3.0 1.4 0.2 setosa 1.589235 1.098612 0.3364722 -1.6094379
3 4.7 3.2 1.3 0.2 setosa 1.547563 1.163151 0.2623643 -1.6094379
4 4.6 3.1 1.5 0.2 setosa 1.526056 1.131402 0.4054651 -1.6094379
5 5.0 3.6 1.4 0.2 setosa 1.609438 1.280934 0.3364722 -1.6094379
6 5.4 3.9 1.7 0.4 setosa 1.686399 1.360977 0.5306283 -0.9162907
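The scoped verbs such as mutate_if() have been superseded by across(). An equivalent sketch, assuming dplyr 1.0.0 or later, that keeps the originals and adds the logs under the same _LG suffix:

```r
library(dplyr)

logged <- head(iris) %>%
  # where(is.numeric) selects numeric columns; .names controls the new column names
  mutate(across(where(is.numeric), log, .names = "{.col}_LG"))

names(logged)
```

Dropping the .names argument overwrites the numeric columns in place instead of adding new ones.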
Using the dplyr package, you can select only the numeric columns and take their logs. This example uses the iris dataset:
iris_1 <- as.data.frame(lapply(iris %>% select_if(is.numeric), log))
> head(iris_1)
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 1.629241 1.252763 0.3364722 -1.6094379
2 1.589235 1.098612 0.3364722 -1.6094379
3 1.547563 1.163151 0.2623643 -1.6094379
4 1.526056 1.131402 0.4054651 -1.6094379
5 1.609438 1.280934 0.3364722 -1.6094379
6 1.686399 1.360977 0.5306283 -0.9162907
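Since the question asks for a user-defined function, here is a minimal base R sketch. The name log_numeric is made up for illustration; it checks each column and replaces only the numeric ones with their logs:

```r
# Hypothetical helper: log-transform every numeric column, leave the rest untouched
log_numeric <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))  # TRUE for numeric columns
  df[is_num] <- lapply(df[is_num], log)         # apply log column by column
  df
}

iris_logged <- log_numeric(head(iris))
head(iris_logged)
```

Factor columns such as Species pass through unchanged. Note that log() returns -Inf for zeros and NaN (with a warning) for negative values, so the input may need checking first.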

When trying to call an object with get() within group_by and mutate, it brings up the entire object and not the grouped object. How do I fix this?

Here is my code:
data(iris)
spec<-names(iris[1:4])
iris$Size<-factor(ifelse(iris$Sepal.Length>5,"A","B"))
for(i in spec){
attach(iris)
output<-iris %>%
group_by(Size)%>%
mutate(
out=mean(get(i)))
detach(iris)
}
The for loop is written around some graphing and report writing that uses object 'i' in various parts. I am using dplyr and plyr.
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Size out
1 5.1 3.5 1.4 0.2 setosa A 1.199333
2 4.9 3.0 1.4 0.2 setosa B 1.199333
3 4.7 3.2 1.3 0.2 setosa B 1.199333
4 4.6 3.1 1.5 0.2 setosa B 1.199333
5 5.0 3.6 1.4 0.2 setosa B 1.199333
Notice that the variable 'out' has the same value in every row: it is the mean of the entire dataset rather than the grouped mean.
> tapply(iris$Petal.Width,iris$Size,mean)
A B
1.432203 0.340625
> mean(iris$Petal.Width)
[1] 1.199333
Using get() and attach() isn't really consistent with dplyr because it interferes with the environments in which the functions are evaluated. It would be better to use the standard-evaluation equivalent of mutate here, as described in the NSE vignette (vignette("nse", package="dplyr")):
for(i in spec){
output<-iris %>%
group_by(Size)%>%
mutate_(.dots=list(out=lazyeval::interp(~mean(x), x=as.name(i))))
# print(output)
}
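The underscored verbs such as mutate_() were deprecated in dplyr 0.7.0. A sketch of the same loop using the .data pronoun, assuming a dplyr version that supports it; the names iris2 and results are introduced here only for illustration:

```r
library(dplyr)

iris2 <- iris
iris2$Size <- factor(ifelse(iris2$Sepal.Length > 5, "A", "B"))
spec <- names(iris2)[1:4]

results <- list()
for (i in spec) {
  results[[i]] <- iris2 %>%
    group_by(Size) %>%
    # .data[[i]] looks up the column named by the string i within each group
    mutate(out = mean(.data[[i]])) %>%
    ungroup()
}

# One distinct mean per Size group for the last variable
unique(results[["Petal.Width"]][c("Size", "out")])
```

No attach() or get() is needed, so the grouped data is what mean() actually sees.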

Resources