Multiply specific columns by a value - R

I want to multiply specific columns of a dataset (those whose names start with "i") by a value (0.045). There is also a column called "id" that has the value 0.045 in all rows.
I've tried this, which did not work:
df %>%
mutate(across(starts_with("i")), ~.id)
The columns to be multiplied can be specified either by position or by the fact that their names all start with "i".
Hope someone can help me.
Thanks a lot!
Magnus

Try this. I used the iris dataset to create the example. Note that the function used to mutate the columns must be placed inside across(), not outside it as in your code. Here is the solution:
library(tidyverse)
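# Multiply every column whose name starts with "Sepal" by 0.045
# (inside the ~ lambda, . stands for the current column)
iris %>%
  mutate(across(starts_with("Sepal"), ~ . * 0.045))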
Output (some rows):
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 0.2295 0.1575 1.4 0.2 setosa
2 0.2205 0.1350 1.4 0.2 setosa
3 0.2115 0.1440 1.3 0.2 setosa
4 0.2070 0.1395 1.5 0.2 setosa
5 0.2250 0.1620 1.4 0.2 setosa
6 0.2430 0.1755 1.7 0.4 setosa
7 0.2070 0.1530 1.4 0.3 setosa
8 0.2250 0.1530 1.5 0.2 setosa
9 0.1980 0.1305 1.4 0.2 setosa
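Applied to the question's own data, the same idea needs one extra step: starts_with("i") also matches the "id" column, so it has to be excluded, otherwise id itself would be multiplied as well. The sketch below assumes a small hypothetical data frame df with columns id, i1, i2 and other; inside the lambda, id refers to the id column of the data, so the multiplier does not need to be typed in.

```r
library(dplyr)

# Hypothetical data mirroring the question: "id" is 0.045 in every row
df <- data.frame(
  id    = rep(0.045, 3),
  i1    = c(1, 2, 3),
  i2    = c(10, 20, 30),
  other = c(5, 6, 7)
)

res <- df %>%
  # Select columns starting with "i" but leave "id" itself untouched;
  # inside the lambda, .x is the current column and id is the id column
  mutate(across(starts_with("i") & !id, ~ .x * id))

res
```

Replacing `~ .x * id` with `~ .x * 0.045` gives the same result when the constant is known in advance.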

Base R solution:
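# TRUE for columns whose names start with "Sepal"
cols_bool <- startsWith(names(iris), "Sepal")
# Note: cbind() puts the unchanged columns first, so the column order differs from iris
cbind(iris[, !cols_bool, drop = FALSE], iris[, cols_bool, drop = FALSE] * 0.045)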
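If the column order should stay the same, the selected columns can be overwritten in place instead of being rebuilt with cbind(). A minimal base R sketch (iris2 is just an illustrative copy):

```r
# TRUE for every column whose name starts with "Sepal"
cols_bool <- startsWith(names(iris), "Sepal")

iris2 <- iris
# Assign back into the same columns so the original column order is kept
iris2[cols_bool] <- iris2[cols_bool] * 0.045

head(iris2, 3)
```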

Related

Mutate if variable name appears in a list

I would like to use dplyr to divide a subset of variables by their IQR. I am open to approaches other than what I've tried, which is a combination of mutate_if and %in%. I want to reference the vector of names (contin) instead of indexing the data frame by position. Thanks for any thoughts!
contin <- c("age", "ct")
data %>%
mutate_if(%in% contin, function(x) x/IQR(x))
You should use:
data %>%
mutate(across(all_of(contin), ~.x/IQR(.x)))
Working example:
data <- head(iris)
contin <- c("Sepal.Length", "Sepal.Width")
data %>%
mutate(across(all_of(contin), ~.x/IQR(.x)))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 15.69231 7.777778 1.4 0.2 setosa
2 15.07692 6.666667 1.4 0.2 setosa
3 14.46154 7.111111 1.3 0.2 setosa
4 14.15385 6.888889 1.5 0.2 setosa
5 15.38462 8.000000 1.4 0.2 setosa
6 16.61538 8.666667 1.7 0.4 setosa
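all_of() is strict: it errors if any name in contin is missing from the data, whereas any_of() silently skips missing names. To keep the original variables as well, across() accepts a .names template. A sketch on the same example data (the "_iqr" suffix is an arbitrary choice):

```r
library(dplyr)

data <- head(iris)
contin <- c("Sepal.Length", "Sepal.Width")

res <- data %>%
  # all_of() errors if a name in contin is missing from the data;
  # .names writes the scaled values to new columns and keeps the originals
  mutate(across(all_of(contin), ~ .x / IQR(.x), .names = "{.col}_iqr"))

res
```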

R: Transforming variables over many columns

I want to transform multiple columns in a large data.frame at once using across.
As an example, I want to make this transformation:
library(tidyverse)
iris %>% mutate(Sepal.Length2 = (Sepal.Length^4-min(Sepal.Length^4)) / (max(Sepal.Length^4) - min(Sepal.Length^4)))
but for all columns starting with "Sepal".
I think I can use this command, but I can't figure out how to add my function:
iris %>% mutate(across(starts_with("Sepal")), ... )
Sorry if this is too trivial, but I don't know what to search for on Google to find useful pages.
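We can use across() with a lambda in which . is the current column; .names = '{.col}2' stores the results in new columns (Sepal.Length2, Sepal.Width2) instead of overwriting the originals: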
library(dplyr)
iris1 <- iris %>%
mutate(across(starts_with("Sepal"),
~ (.^4-min(.^4)) / (max(.^4) - min(.^4)), .names = '{.col}2'))
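For comparison, a base R sketch of the same transformation without dplyr: grep() picks the Sepal columns and lapply() adds a rescaled copy of each. The names sepal_cols and iris2 are illustrative.

```r
# Columns whose names start with "Sepal"
sepal_cols <- grep("^Sepal", names(iris), value = TRUE)

iris2 <- iris
# Add one rescaled copy per column, named e.g. "Sepal.Length2"
iris2[paste0(sepal_cols, "2")] <- lapply(iris[sepal_cols], function(x) {
  x4 <- x^4
  (x4 - min(x4)) / (max(x4) - min(x4))  # min-max scaling of x^4 to [0, 1]
})

head(iris2, 3)
```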
Alternatively, define the transformation as a named function and pass it to across(). Numerator and denominator each need their own parentheses; without them, operator precedence divides only min(x^4):
my_function <- function(x) {
  # Min-max scaling of x^4 to the range [0, 1]
  y <- (x^4 - min(x^4)) / (max(x^4) - min(x^4))
  return(y)
}
iris %>%
  mutate(across(starts_with("Sepal"), my_function))
Because no .names argument is given, this overwrites Sepal.Length and Sepal.Width with the scaled values (each column then ranging from 0 to 1) instead of adding new columns.

How to tidily create multiple columns from sets of columns?

I'm looking to use mutate to create multiple columns at once, but with a function that is not a simple across() call: a second variable in the function has to change along with each column being iterated over. Here's an example:
needs=c('Sepal.Length','Petal.Length')
iris %>% mutate_at(needs, ~./'{col}.Width')
This obviously doesn't work, but I'm looking to divide Sepal.Length by Sepal.Width and Petal.Length by Petal.Width.
I think needs should contain the part of the name that both columns share (here "Sepal" and "Petal").
You can then select the columns matching each pattern in needs and divide them by position. !! and := are used to assign the names of the new columns.
library(dplyr)
library(rlang)
needs = c('Sepal','Petal')
purrr::map_dfc(needs, ~iris %>%
select(matches(.x)) %>%
transmute(!!paste0(.x, '_divide') := .[[1]]/.[[2]]))
# Sepal_divide Petal_divide
#1 1.457142857 7.000000000
#2 1.633333333 7.000000000
#3 1.468750000 6.500000000
#4 1.483870968 7.500000000
#...
#...
If you want to add these as new columns, you can bind_cols() the result above with iris.
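A base R sketch that keeps the question's original needs vector of numerator columns. It assumes every "<prefix>.Length" column has a matching "<prefix>.Width" column, and the "_per_Width" suffix for the new columns is a hypothetical naming choice.

```r
# The question's original vector of numerator columns
needs <- c("Sepal.Length", "Petal.Length")

iris2 <- iris
for (col in needs) {
  # Assumes each "<prefix>.Length" has a matching "<prefix>.Width" column
  width_col <- sub("Length$", "Width", col)
  # Hypothetical naming scheme for the new column, e.g. "Sepal.Length_per_Width"
  iris2[[paste0(col, "_per_Width")]] <- iris2[[col]] / iris2[[width_col]]
}

head(iris2, 3)
```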
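Here is a base R approach, assuming the columns you want to divide share a name prefix (the part before the first "."):
# Drop Species, split the remaining columns by prefix ("Petal", "Sepal"),
# then divide the first column of each group by the second
num_cols <- iris[-ncol(iris)]
res <- sapply(split.default(num_cols, sub('\\..*', '', names(num_cols))), function(i) i[1] / i[2])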
iris[names(res)] <- res
head(iris)
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species Petal.Petal.Length Sepal.Sepal.Length
#1 5.1 3.5 1.4 0.2 setosa 7.00 1.457143
#2 4.9 3.0 1.4 0.2 setosa 7.00 1.633333
#3 4.7 3.2 1.3 0.2 setosa 6.50 1.468750
#4 4.6 3.1 1.5 0.2 setosa 7.50 1.483871
#5 5.0 3.6 1.4 0.2 setosa 7.00 1.388889
#6 5.4 3.9 1.7 0.4 setosa 4.25 1.384615

R Defined Function to review numeric column and calculate log

I have a data frame with 10 variables: three are factors and seven are numeric. I want to write a function that goes through each column, checks whether it is numeric, and if so calculates its log.
Here's one simple way with the dplyr package:
your_df %>%
mutate_if(is.numeric, log)
As per the comment, if you want to keep the original variables and add the logs as new variables:
your_df %>%
  mutate_if(is.numeric, list(LG = log))
Example:
head(iris) %>%
  mutate_if(is.numeric, list(LG = log))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_LG Sepal.Width_LG Petal.Length_LG Petal.Width_LG
1 5.1 3.5 1.4 0.2 setosa 1.629241 1.252763 0.3364722 -1.6094379
2 4.9 3.0 1.4 0.2 setosa 1.589235 1.098612 0.3364722 -1.6094379
3 4.7 3.2 1.3 0.2 setosa 1.547563 1.163151 0.2623643 -1.6094379
4 4.6 3.1 1.5 0.2 setosa 1.526056 1.131402 0.4054651 -1.6094379
5 5.0 3.6 1.4 0.2 setosa 1.609438 1.280934 0.3364722 -1.6094379
6 5.4 3.9 1.7 0.4 setosa 1.686399 1.360977 0.5306283 -0.9162907
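mutate_if() is superseded in dplyr 1.0 and later by across() combined with where(). A sketch of the same "keep the originals and add the logs" variant, assuming dplyr 1.0 or newer:

```r
library(dplyr)

res <- head(iris) %>%
  # where() selects columns by predicate; .names keeps the originals and
  # adds log-transformed copies such as "Sepal.Length_LG"
  mutate(across(where(is.numeric), log, .names = "{.col}_LG"))

res
```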
Using the dplyr package, you can select only the numeric columns and take their log. In this example I used the iris dataset:
iris_1 <- as.data.frame(lapply(iris %>% select_if(is.numeric), log))
> head(iris_1)
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 1.629241 1.252763 0.3364722 -1.6094379
2 1.589235 1.098612 0.3364722 -1.6094379
3 1.547563 1.163151 0.2623643 -1.6094379
4 1.526056 1.131402 0.4054651 -1.6094379
5 1.609438 1.280934 0.3364722 -1.6094379
6 1.686399 1.360977 0.5306283 -0.9162907
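Since the question asks for a user-defined function, here is a base R sketch; log_numeric is a hypothetical name. Unlike the select_if() version, it keeps the factor columns and logs the numeric ones in place. Note that log() returns -Inf for zeros and NaN (with a warning) for negative values.

```r
# Log-transform every numeric column of a data frame and leave
# factor (and other non-numeric) columns unchanged
log_numeric <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  df[is_num] <- lapply(df[is_num], log)
  df
}

iris_log <- log_numeric(iris)
head(iris_log, 3)
```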

Defining functions (in rollapply) using lines of a dataframe

First of all, I have a data frame (let's call it "years") with 5 rows and 10 columns. I need to build a new one by computing (x1-x2)/x1, where x1 is the first and x2 the second element of a column in "years", then (x2-x3)/x2, and so forth. I thought rollapply would be the best tool for the task, but I can't figure out how to define such a function to use in rollapply.
I'm new to R, so I hope my question is not too basic. Anyway, I couldn't find a similar question here, so I'd be really thankful if someone could help me.
You can use transform(), diff() and length(); there is no need for rollapply. Since diff() returns x2-x1, negate it to get (x1-x2)/x1:
> df <- head(iris,5) # some data
> transform(df, New = c(NA, -diff(Sepal.Length)/Sepal.Length[-length(Sepal.Length)]))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species New
1 5.1 3.5 1.4 0.2 setosa NA
2 4.9 3.0 1.4 0.2 setosa 0.03921569
3 4.7 3.2 1.3 0.2 setosa 0.04081633
4 4.6 3.1 1.5 0.2 setosa 0.02127660
5 5.0 3.6 1.4 0.2 setosa -0.08695652
diff.zoo in the zoo package with the arithmetic = FALSE argument divides each number by the previous one in the same column, so 1 minus that ratio gives (x1-x2)/x1 for every column at once (DF is your data frame):
library(zoo)
as.data.frame(1 - diff(zoo(DF), arithmetic = FALSE))
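To apply (x1-x2)/x1 to every column without zoo, a base R sketch with lapply() works too. The years data below is a hypothetical stand-in for the question's data frame; like diff(), the result has one row fewer than the input.

```r
# Hypothetical stand-in for the question's "years" data frame
years <- data.frame(a = c(10, 8, 6, 6, 3),
                    b = c(100, 110, 99, 120, 60))

# For each column x, compute (x[i] - x[i + 1]) / x[i] for consecutive rows
rel_change <- as.data.frame(lapply(years, function(x) {
  prev <- x[-length(x)]  # x1, x2, ..., x(n-1)
  nxt  <- x[-1]          # x2, x3, ..., xn
  (prev - nxt) / prev
}))

rel_change
```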
