I want to transform multiple columns in a large data.frame at once using across().
As an example, I want to apply this transformation:
library(tidyverse)
iris %>% mutate(Sepal.Length2 = (Sepal.Length^4-min(Sepal.Length^4)) / (max(Sepal.Length^4) - min(Sepal.Length^4)))
but for all columns starting with "Sepal".
I think I can use the command below, but I can't figure out how to add my function.
iris %>% mutate(across(starts_with("Sepal")), ... )
Sorry if this is trivial, but I don't know what to search for to find useful pages.
We can pass a lambda (formula) to across() and use .names to add the results as new columns:
library(dplyr)
iris1 <- iris %>%
  mutate(across(starts_with("Sepal"),
                ~ (.^4 - min(.^4)) / (max(.^4) - min(.^4)),
                .names = '{.col}2'))
Alternatively, define a named function and pass it to across():
my_function <- function(x) {
  # Min-max scale x^4 to the range [0, 1]
  y <- (x^4 - min(x^4)) / (max(x^4) - min(x^4))
  return(y)
}
iris %>%
  mutate(across(starts_with("Sepal"), my_function))
Output: the Sepal.Length and Sepal.Width columns are overwritten with values rescaled to [0, 1] (the smallest value in each column becomes 0, the largest becomes 1); Petal.Length, Petal.Width and Species are unchanged.
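The two approaches can be combined: a named function plus .names keeps the original columns and adds the rescaled ones. This is a minimal sketch, assuming dplyr >= 1.0.0; the helper name rescale_pow4 is made up for illustration.

```r
library(dplyr)

# Min-max scaling of x^4; `rescale_pow4` is a hypothetical helper name.
rescale_pow4 <- function(x) {
  p <- x^4
  (p - min(p)) / (max(p) - min(p))
}

# Apply it to every column starting with "Sepal" and store the results
# in new columns with a "2" suffix, leaving the originals untouched.
iris2 <- iris %>%
  mutate(across(starts_with("Sepal"), rescale_pow4, .names = "{.col}2"))

# Sanity check: each new column should span the interval from 0 to 1.
range(iris2$Sepal.Length2)
range(iris2$Sepal.Width2)
```

Checking the range afterwards is a cheap way to catch a misplaced parenthesis in the scaling formula.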
I'm not sure why the first line below gives an error while the second one works. My understanding was that using names(.) in a formula tells R to use the data before the pipe operator. That seems to work for the .cols argument but not inside the formula.
iris%>%rename_with(~gsub("Petal","_",names(.)),all_of(names(.)))
iris%>%rename_with(~~gsub("Petal","_",names(iris)),all_of(names(.)))
rename_with applies a function to the names of the passed data frame. The function should be one that, given the vector of names, returns the altered names, so the syntax is much simpler than you are trying to make it:
iris %>%
rename_with(~ gsub("Petal", "_", .x))
#> Sepal.Length Sepal.Width _.Length _.Width Species
#> 1 5.1 3.5 1.4 0.2 setosa
#> 2 4.9 3.0 1.4 0.2 setosa
#> 3 4.7 3.2 1.3 0.2 setosa
#> 4 4.6 3.1 1.5 0.2 setosa
#> 5 5.0 3.6 1.4 0.2 setosa
#> 6 5.4 3.9 1.7 0.4 setosa
#... etc
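As for why names(.) fails in the formula: inside the formula, . (or .x) is the character vector of column names that rename_with() passes to the function, not the data frame, so names(.) is NULL there. To rename only some columns, a selection can be given through the .cols argument. A minimal sketch, assuming dplyr >= 1.0.0; the replacement prefix "P" is only an illustration.

```r
library(dplyr)

# Rename only the columns selected by .cols. The formula receives the
# selected names as a character vector (.x) and must return new names
# of the same length.
renamed <- iris %>%
  rename_with(~ gsub("Petal", "P", .x), .cols = starts_with("Petal"))

names(renamed)
#> [1] "Sepal.Length" "Sepal.Width"  "P.Length"     "P.Width"      "Species"
```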
I am new to R and have a question.
Create a function, zScore, that takes a vector of numbers (x) and converts it to a vector of z-scaled numbers (see code below). (Don't worry about NAs.)
#This creates the z-scaled numbers for sepal lengths
(iris$Sepal.Length - mean(iris$Sepal.Length))/sd(iris$Sepal.Length)
#This creates the z-scaled numbers for sepal widths
(iris$Sepal.Width - mean(iris$Sepal.Width))/sd(iris$Sepal.Width)
Write a zScore function that is flexible.
Thank you for any help you can provide.
You can use the following code:
# Z-score function
zscore <- function(x) {
(x - mean(x))/sd(x)
}
library(tidyverse)
iris %>%
mutate(zscore_sepal.length = zscore(Sepal.Length)) %>%
mutate(zscore_sepal.width = zscore(Sepal.Width))
Output (first rows):
Sepal.Length Sepal.Width Petal.Length Petal.Width Species zscore_sepal.length zscore_sepal.width
1 5.1 3.5 1.4 0.2 setosa -0.8976739 1.01560199
2 4.9 3.0 1.4 0.2 setosa -1.1392005 -0.13153881
3 4.7 3.2 1.3 0.2 setosa -1.3807271 0.32731751
4 4.6 3.1 1.5 0.2 setosa -1.5014904 0.09788935
5 5.0 3.6 1.4 0.2 setosa -1.0184372 1.24503015
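To make the function more flexible, it can be applied to every numeric column in one call with across(). The sketch below assumes dplyr >= 1.0.0; the na.rm argument is an addition for illustration and was not part of the original exercise.

```r
library(dplyr)

# Z-score with optional NA handling (na.rm is an added, hypothetical argument).
zscore <- function(x, na.rm = FALSE) {
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}

# Apply zscore() to all numeric columns and prefix the new columns.
scaled <- iris %>%
  mutate(across(where(is.numeric), zscore, .names = "zscore_{.col}"))

# Each new column should have a mean of (numerically) zero and an sd of one.
new_cols <- select(scaled, starts_with("zscore_"))
round(colMeans(new_cols), 10)
sapply(new_cols, sd)
```

Checking that the mean is zero and the standard deviation is one is a quick way to confirm that the parentheses in the formula are in the right place.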
I want to multiply a value (0.045) with specific columns (that start with "i") in a dataset. There is also a column called "id" that has the value 0.045 in all rows.
I've tried this, which did not work:
df %>%
mutate(across(starts_with("i")), ~.id)
The columns to be multiplied can be specified based on position or based on the fact that they all start with "i"
Hope someone can help me.
Thanks a lot!
Magnus
Try this. I used the iris dataset to build the example. Note that the function to apply must be passed inside across(), not outside it as in your code. Here is the solution:
library(tidyverse)
#Code
iris %>%
mutate(across(starts_with("Sepal"), ~.*0.045))
Output (some rows):
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 0.2295 0.1575 1.4 0.2 setosa
2 0.2205 0.1350 1.4 0.2 setosa
3 0.2115 0.1440 1.3 0.2 setosa
4 0.2070 0.1395 1.5 0.2 setosa
5 0.2250 0.1620 1.4 0.2 setosa
6 0.2430 0.1755 1.7 0.4 setosa
7 0.2070 0.1530 1.4 0.3 setosa
8 0.2250 0.1530 1.5 0.2 setosa
9 0.1980 0.1305 1.4 0.2 setosa
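If the factor should come from the id column rather than being typed in, note that starts_with("i") would also select id itself. The sketch below excludes it from the selection; the data frame df and its columns i1, i2 and other are made up for illustration, and dplyr >= 1.0.0 is assumed.

```r
library(dplyr)

# Hypothetical data: `id` holds the factor, i1 and i2 are the columns to scale.
df <- data.frame(
  id    = 0.045,
  i1    = c(1, 2, 3),
  i2    = c(10, 20, 30),
  other = c(5, 5, 5)
)

result <- df %>%
  # Select columns starting with "i", but leave `id` itself untouched.
  mutate(across(starts_with("i") & !id, ~ .x * id))

result
```

Inside the lambda, id refers to the id column of the data frame, so each selected column is multiplied row by row with the value stored there.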
Base R solution:
cols_bool <- startsWith(names(iris), "Sepal")
cbind(iris[,!cols_bool, drop = FALSE], iris[,cols_bool, drop = FALSE] * 0.045)
I have a data frame with 10 variables. Three are factors and seven are numeric. I want to write a function that goes through each column, checks whether it is numeric, and if so calculates the log.
Here's one simple way with the dplyr package -
your_df %>%
  mutate_if(is.numeric, log)
As per the comment, if you want to keep the original variables and add the logs as new variables -
your_df %>%
  mutate_if(is.numeric, list(LG = ~ log(.)))
Example -
head(iris) %>%
  mutate_if(is.numeric, list(LG = ~ log(.)))
Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Length_LG Sepal.Width_LG Petal.Length_LG Petal.Width_LG
1 5.1 3.5 1.4 0.2 setosa 1.629241 1.252763 0.3364722 -1.6094379
2 4.9 3.0 1.4 0.2 setosa 1.589235 1.098612 0.3364722 -1.6094379
3 4.7 3.2 1.3 0.2 setosa 1.547563 1.163151 0.2623643 -1.6094379
4 4.6 3.1 1.5 0.2 setosa 1.526056 1.131402 0.4054651 -1.6094379
5 5.0 3.6 1.4 0.2 setosa 1.609438 1.280934 0.3364722 -1.6094379
6 5.4 3.9 1.7 0.4 setosa 1.686399 1.360977 0.5306283 -0.9162907
Using the "dplyr" package, you can select only the numeric columns and calculate the log. In my example I used the "iris" dataset:
iris_1 <- as.data.frame(lapply(iris %>% select_if(is.numeric), log))
> head(iris_1)
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 1.629241 1.252763 0.3364722 -1.6094379
2 1.589235 1.098612 0.3364722 -1.6094379
3 1.547563 1.163151 0.2623643 -1.6094379
4 1.526056 1.131402 0.4054651 -1.6094379
5 1.609438 1.280934 0.3364722 -1.6094379
6 1.686399 1.360977 0.5306283 -0.9162907
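Since dplyr 1.0.0 the scoped verbs such as mutate_if() and select_if() are superseded by across() and where(). A rough equivalent of the answers above, assuming dplyr >= 1.0.0, might look like this:

```r
library(dplyr)

# Replace every numeric column by its natural log; factors are left alone.
logged <- iris %>%
  mutate(across(where(is.numeric), log))

# Keep the originals and add the logs as new columns with an "_LG" suffix.
logged_extra <- iris %>%
  mutate(across(where(is.numeric), log, .names = "{.col}_LG"))

head(logged_extra)
```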
This works, but typing out every column name is tedious.
> library(dplyr)
> mutate(iris, a = paste( Petal.Width, Petal.Length) ) %>% head
Sepal.Length Sepal.Width Petal.Length Petal.Width Species a
1 5.1 3.5 1.4 0.2 setosa 0.2 1.4
2 4.9 3.0 1.4 0.2 setosa 0.2 1.4
3 4.7 3.2 1.3 0.2 setosa 0.2 1.3
4 4.6 3.1 1.5 0.2 setosa 0.2 1.5
5 5.0 3.6 1.4 0.2 setosa 0.2 1.4
6 5.4 3.9 1.7 0.4 setosa 0.4 1.7
How can I use dplyr's "Select helpers" in paste()?
> mutate(iris, a = paste( starts_with("Petal") ))
Error in mutate_impl(.data, dots) :
wrong result size (0), expected 150 or 1
> mutate_(iris, a = paste( starts_with("Petal") ))
Error in parse(text = x)[[1]] : subscript out of bounds
> mutate_(iris, a = paste( starts_with(Petal) ))
Error in is.string(match) : object 'Petal' not found
> mutate(iris, a = paste( grep("Petal", names(iris), value=T) ))
Error in mutate_impl(.data, dots) :
wrong result size (2), expected 150 or 1
This did not work either.
> mutate(iris, a = paste( names(iris)[base::startsWith(names(iris),"Petal")] ))
Error in mutate_impl(.data, dots) :
wrong result size (2), expected 150 or 1
I wrote a rather cumbersome function, but it works. I may use it unless I find something simpler.
> paste.colprefix <- function(DFNAME, PREFIX){
+ TMP <- eval(parse(text= paste0("grep(\"", PREFIX, "\",names(", DFNAME, "), v=T)")))
+ TMP <- paste0(DFNAME, "$",TMP)
+ TMP <- paste0(TMP, collapse = ",")
+ eval(parse(text= paste0( "paste(", TMP, ")")))
+ }
>
> iris$PetalPaste <- paste.colprefix("iris", "Petal")
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species PetalPaste
1 5.1 3.5 1.4 0.2 setosa 1.4 0.2
2 4.9 3.0 1.4 0.2 setosa 1.4 0.2
3 4.7 3.2 1.3 0.2 setosa 1.3 0.2
4 4.6 3.1 1.5 0.2 setosa 1.5 0.2
5 5.0 3.6 1.4 0.2 setosa 1.4 0.2
6 5.4 3.9 1.7 0.4 setosa 1.7 0.4
>
You cannot call select helpers such as starts_with() directly inside paste(); they only work within a selection context.
A workaround is to extract the matching column names from the data frame first and then pass those columns to paste().
To get the column names, you can use either of the following techniques.
base::startsWith(character vector, prefix)
cn <- names(iris)[base::startsWith(names(iris), "Petal")]
stringr::str_detect(character vector, regex to find)
cn <- names(iris)[stringr::str_detect(names(iris), "^Petal")]
Either method returns the vector of column names that start with "Petal".
You can then use it as follows to get the expected result:
iris$a <- do.call(paste, iris[cn])
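If adding tidyr is acceptable, unite() accepts select helpers directly and pastes the selected columns together. This is a sketch of that alternative, not the approach from the answer above; the column name a follows the question.

```r
library(tidyr)

# Paste all columns starting with "Petal" into a new column `a`,
# separated by a space, and keep the original columns (remove = FALSE).
pasted <- unite(iris, "a", starts_with("Petal"), sep = " ", remove = FALSE)

head(pasted$a)
```

Unlike the do.call() approach, unite() places the new column in front of the first selected column rather than at the end of the data frame.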