How to add multiple columns with purrr? [duplicate] - r

This question already has answers here:
Mutate multiple columns in a dataframe
(6 answers)
Closed 2 years ago.
I'm looking for a tidy way to perform the following:
iris %>% mutate(
  petal_width_pct  = Petal.Width / Sepal.Width,
  petal_length_pct = Petal.Length / Sepal.Width,
  sepal_length_pct = Sepal.Length / Sepal.Width,
  sepal_width_pct  = Sepal.Width / Sepal.Width)
I'd like to add a percentage column for each numeric column, in a way which doesn't require the listing of each variable. I imagine I need a purrr map of some kind, but I haven't figured it out yet!

Instead of map, I think you should look at across:
library(dplyr)
iris %>% mutate(across(where(is.numeric), ~./Sepal.Width, .names ='{col}_pct'))
In dplyr versions before 1.0.0 you could use mutate_if to the same effect:
iris %>% mutate_if(is.numeric, list(pct = ~./Sepal.Width))
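A quick way to check what the across() call produces, as a minimal sketch assuming dplyr >= 1.0.0 is installed. The object name iris_pct is only for illustration, and {.col} is the spelling used in the current across() documentation for the column-name placeholder:

```r
library(dplyr)

# Divide every numeric column by Sepal.Width and store the result in new
# columns named <original>_pct (iris_pct is an illustrative name).
iris_pct <- iris %>%
  mutate(across(where(is.numeric), ~ .x / Sepal.Width, .names = "{.col}_pct"))

# The five original columns are kept and four *_pct columns are appended.
names(iris_pct)
head(iris_pct$Petal.Width_pct)
```

Note that the new columns keep the original capitalisation (Petal.Width_pct rather than petal_width_pct), because the names are built from the existing column names.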

Related

What is the dot (".") notation in R? [duplicate]

This question already has answers here:
What does the dplyr period character "." reference?
(2 answers)
Closed 2 years ago.
I want to group the tidyr dataset relig_income by religion, compute the mean number of believers (N_People) for each religion, and sort the result in descending order of that mean. I tried the first snippet below, but according to my online course the correct answer is the second. What does the dot in the call to arrange mean?
I am getting two different results.
My Code:
tidy_df %>% group_by(religion) %>% summarise(mean_believers = mean(N_People)) %>% arrange(mean_believers, desc(mean_believers))
Correct Answer:
tidy_df %>% group_by(religion) %>% summarise(mean_believers = mean(N_People)) %>% arrange(., desc(mean_believers))
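The . is the placeholder for the data passed through %>%. In arrange(., desc(mean_believers)) it simply stands for the summarised data frame, so that call sorts by mean_believers in descending order. Your version sorts by mean_believers ascending first, and the second key desc(mean_believers) only breaks ties, which is why the two results differ.
You can also reference specific columns of the piped data with .$your_column
Take a look at the documentation for the pipe (?`%>%`).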
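To see the behaviour of the dot concretely, here is a small sketch. The tibble toy and its values are made up for illustration and only stand in for the summarised data from the question:

```r
library(dplyr)

# Hypothetical stand-in for the summarised data in the question.
toy <- tibble(religion = c("A", "B", "C"), mean_believers = c(10, 30, 20))

# `.` is the left-hand side of %>%, so these two calls do the same thing.
with_dot    <- toy %>% arrange(., desc(mean_believers))
without_dot <- toy %>% arrange(desc(mean_believers))
identical(with_dot, without_dot)

# Sorting by mean_believers first gives ascending order; the second key
# desc(mean_believers) only breaks ties, so it changes nothing here.
ascending <- toy %>% arrange(mean_believers, desc(mean_believers))
ascending$religion
```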

R - how to use group by function properly [duplicate]

This question already has answers here:
dplyr groups not working with dollar sign data$column syntax
(1 answer)
Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?
(2 answers)
Closed last year.
I'm trying to compute the average and correlation for some variables, broken down by gender. For some reason my group_by call does not seem to be working.
data(PSID1982, package ="AER" )
PSID1982 %>%
group_by(gender) %>%
summarise(avgeduc = mean(PSID1982$education), avgexper = mean(PSID1982$experience), avgwage= mean(PSID1982$wage),cor_wagvseduc = cor( x=PSID1982$wage, y= PSID1982$education))
The result is just the summary statistics of the entire group, not broken up into different genders.
Your code runs, but inside pipes and dplyr functions you should not refer to columns as PSID1982$Column_Name. That expression pulls the full, ungrouped column out of the original data frame, so every group gets the same overall value. Use the bare column name instead, as follows:
PSID1982 %>%
  group_by(gender) %>%
  summarise(avgeduc = mean(education),
            avgexper = mean(experience),
            avgwage = mean(wage),
            cor_wagvseduc = cor(x = wage, y = education))
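The difference between the two spellings can be seen on a built-in dataset. This is only a sketch: mtcars and its am column stand in for PSID1982 and gender so that no extra package is needed, and the names wrong and right are illustrative:

```r
library(dplyr)

# mtcars stands in for PSID1982 here so the example needs no extra package.
wrong <- mtcars %>%
  group_by(am) %>%
  summarise(avg_mpg = mean(mtcars$mpg))   # full column, ignores the groups

right <- mtcars %>%
  group_by(am) %>%
  summarise(avg_mpg = mean(mpg))          # evaluated once per group

wrong$avg_mpg  # the same overall mean repeated for each group
right$avg_mpg  # one mean per value of am
```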

Get mean values if a key column value is duplicated with dplyr (R) [duplicate]

This question already has answers here:
Means multiple columns by multiple groups [duplicate]
(4 answers)
Closed 4 years ago.
This is my data. If the gene column has a duplicated value (e.g. CASZ1), I would like to get the mean of each Sample column for that gene.
Input data
Output data
I googled it and tried a few things, but I am stuck. I am sorry for asking a question that looks exactly like homework.
My code
data %>% group_by(gene) %>% summarise(avg = mean(colnames(data)) --- error...
You can use summarise_at along with a regular expression to make sure that any column not starting with your pattern is left out (the ^ anchors the match to the start of the column name):
data %>% group_by(gene) %>% summarise_at(vars(matches("^Sample")), mean)
Is that what you're looking for?
You can use summarise_all (funs() is deprecated as of dplyr 0.8.0, so pass the function directly):
library(dplyr)
data %>% group_by(gene) %>% summarise_all(mean)
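With dplyr >= 1.0.0 the same idea can be written with across(). The sketch below uses made-up data, since the question's actual table was not included in the post; the column names Sample1 and Sample2 and all values are assumptions:

```r
library(dplyr)

# Hypothetical data: the real input table was not part of the post.
data <- tibble(
  gene    = c("CASZ1", "CASZ1", "TP53"),
  Sample1 = c(1, 3, 5),
  Sample2 = c(2, 4, 6)
)

# One row per gene, with the mean of every column starting with "Sample".
gene_means <- data %>%
  group_by(gene) %>%
  summarise(across(starts_with("Sample"), mean))
gene_means
```

Genes that appear only once keep their original values, because the mean of a single value is that value.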

dplyr use select() helpers inside mutate() [duplicate]

This question already has answers here:
dplyr mutate rowSums calculations or custom functions
(7 answers)
Closed 4 years ago.
I'd like to make a new variable which represents the sum (or other function) of many other variables which all start with "prefix_". Is there a way to do this neatly using the select() helpers (e.g. starts_with())?
I don't think mutate_at() works for this since I'm just trying to create a single new variable based on many existing variables.
My attempt:
df %<>%
mutate(newvar = sum(vars(starts_with("prefix_"))))
This of course doesn't work. Many thanks!
A reproducible example:
mtcars %<>%
rename("prefix_mpg" = mpg) %>%
rename("prefix_cyl" = cyl) %>%
mutate(newvar = sum(var(starts_with("prefix_"))))
Intended output would be mtcars$newvar which is the sum of prefix_mpg and prefix_cyl. Of course I could just explicitly name mpg and cyl but in my actual case it's a long list of variables, too long to name conveniently.
We can use starts_with inside a select call and pass the result to rowSums. Here . refers to the data frame coming out of the previous pipe step.
library(dplyr)
mtcars %>%
  rename(prefix_mpg = mpg, prefix_cyl = cyl) %>%
  mutate(newvar = rowSums(select(., starts_with("prefix_"))))
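As a check of the approach above, and to illustrate the "or other function" part of the question, the sketch below adds a row-wise mean as well and compares the sum against the explicit calculation. The names mt and newmean are illustrative only:

```r
library(dplyr)

mt <- mtcars %>%
  rename(prefix_mpg = mpg, prefix_cyl = cyl) %>%
  mutate(
    newvar  = rowSums(select(., starts_with("prefix_"))),   # row-wise sum
    newmean = rowMeans(select(., starts_with("prefix_")))   # row-wise mean
  )

# rowSums() returns a named vector, so drop the names before comparing.
all.equal(unname(mt$newvar), mt$prefix_mpg + mt$prefix_cyl)
```

This relies on the magrittr pipe %>%, which is what makes . available inside the nested select() call.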

R - group data frame from a variable [duplicate]

This question already has answers here:
dplyr: How to use group_by inside a function?
(4 answers)
Closed 6 years ago.
I want to set the column for grouping a data frame into a variable and then group and summarise the data frame based on it, i.e.
require(dplyr)
var <- colnames(mtcars)[10]
summaries <- mtcars %>% dplyr::group_by(var) %>% dplyr::summarise_each(funs(mean))
so that I can simply change var and rerun the second line without changing anything else. Unfortunately my solution does not work, because group_by expects a bare column name and not a variable holding one.
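Use group_by_, which takes arguments as character strings (note that group_by_ has been deprecated since dplyr 0.7.0; in current versions use group_by(across(all_of(var))) or group_by(.data[[var]]) instead):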
require(dplyr)
var <- colnames(mtcars)[10]
summaries <- mtcars %>% dplyr::group_by_(var) %>% dplyr::summarise_each(funs(mean))
(Maybe resources on standard vs non-standard evaluation would be of interest: http://adv-r.had.co.nz/Computing-on-the-language.html)
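For current dplyr, where group_by_, summarise_each and funs are deprecated, the same task could be sketched as follows, assuming dplyr >= 1.0.0. Only the variable var needs to change to group by a different column:

```r
library(dplyr)

var <- colnames(mtcars)[10]   # "gear"

# all_of() turns the character vector into a column selection, and
# across() lets group_by() accept that selection.
summaries <- mtcars %>%
  group_by(across(all_of(var))) %>%
  summarise(across(everything(), mean))
summaries
```

Inside summarise(), everything() covers only the non-grouping columns, so the grouping column is kept as the key rather than averaged.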

Resources