What is the dot (".") notation in R? [duplicate]

This question already has answers here:
What does the dplyr period character "." reference?
(2 answers)
Closed 2 years ago.
I want to group the tidyr dataset relig_income by religion, compute the mean of N_People for each religion, and sort the result in descending order of that mean. I tried the first snippet below, but according to my online course the correct answer is the second. What does the dot in the call to arrange mean?
The two snippets give me different results.
My Code:
tidy_df %>% group_by(religion) %>% summarise(mean_believers = mean(N_People)) %>% arrange(mean_believers, desc(mean_believers))
Correct Answer:
tidy_df %>% group_by(religion) %>% summarise(mean_believers = mean(N_People)) %>% arrange(., desc(mean_believers))

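The . is a placeholder for the data passed through %>%, that is, the left-hand side of the pipe. By default %>% inserts that data as the first argument, so arrange(., desc(mean_believers)) is equivalent to arrange(desc(mean_believers)).
Your version gives a different result because arrange(mean_believers, desc(mean_believers)) sorts by mean_believers in ascending order first; the desc() term is only used to break ties.
You can also use the dot to reference specific columns of the data, for example .$your_column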
Take a look at the documentation for pipe
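To make the difference between the two pipelines concrete, here is a minimal sketch on a made-up tibble. The data frame toy and its values are assumptions for illustration, not the course data.

```r
library(dplyr)

# Toy stand-in for the summarised data; the values are invented.
toy <- tibble(religion = c("A", "B", "C"),
              mean_believers = c(10, 30, 20))

# The dot is the left-hand side of %>%, so these two calls are equivalent.
with_dot    <- toy %>% arrange(., desc(mean_believers))
without_dot <- toy %>% arrange(desc(mean_believers))

# Here the first sort key is ascending, so desc() only breaks ties.
asker_version <- toy %>% arrange(mean_believers, desc(mean_believers))

with_dot$religion       # "B" "C" "A"
asker_version$religion  # "A" "C" "B"
```

The explicit dot is only needed when the piped data should go somewhere other than the first argument.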

Related

How to add multiple columns with purrr? [duplicate]

This question already has answers here:
Mutate multiple columns in a dataframe
(6 answers)
Closed 2 years ago.
I'm looking for a tidy way to perform the following:
iris %>% mutate(
petal_width_pct = Petal.Width/Sepal.Width,
petal_length_pct = Petal.Length/Sepal.Width,
sepal_length_pct = Sepal.Length/Sepal.Width,
sepal_width_pct = Sepal.Width/Sepal.Width)
I'd like to add a percentage column for each numeric column, in a way which doesn't require the listing of each variable. I imagine I need a purrr map of some kind, but I haven't figured it out yet!
Instead of map, I think you should look at across. In released versions of dplyr (>= 1.0.0) the placeholder in .names is {.col}:
library(dplyr)
iris %>% mutate(across(where(is.numeric), ~ . / Sepal.Width, .names = '{.col}_pct'))
In dplyr versions < 1.0.0 you could use mutate_if to the same effect.
iris %>% mutate_if(is.numeric, list(pct = ~./Sepal.Width))
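Both snippets name the new columns after the originals (for example Petal.Width_pct), not in the snake_case style the question uses. If those names matter, one possible follow-up step is rename_with. This is a sketch assuming dplyr >= 1.0.0, and the result name pct is made up for the example.

```r
library(dplyr)

pct <- iris %>%
  # Divide every numeric column by Sepal.Width and keep the originals.
  mutate(across(where(is.numeric), ~ .x / Sepal.Width, .names = "{.col}_pct")) %>%
  # Rename only the new columns: Petal.Width_pct becomes petal_width_pct.
  rename_with(~ tolower(gsub(".", "_", .x, fixed = TRUE)), ends_with("_pct"))

names(pct)
```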

How do I group data based on a variable? [duplicate]

This question already has answers here:
Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?
(2 answers)
Closed 2 years ago.
I'm trying to group the data in my data set based on whether the value in one of the columns is 1, 2 or 3. The columns of my data are CLASS and PERF, and I want to group by the CLASS column. The code I have used is
visible2<-visible %>%
group_by(CLASS) %>%
summarise(mean_performance = mean(PERF), sd_performance = sd(PERF))
The output I get is just one value each for the mean and standard deviation of the performance across all groups, rather than 3 rows, one for each group.
It could be because the plyr package is also loaded and plyr::summarise has masked dplyr::summarise. We can call dplyr::summarise explicitly, or rerun this in a fresh R session with only dplyr loaded
library(dplyr)
visible %>%
group_by(CLASS) %>%
dplyr::summarise(mean_performance = mean(PERF), sd_performance = sd(PERF))
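A quick way to check for this kind of masking is to ask which package the bare name summarise resolves to. The sketch below uses mtcars as a stand-in for the asker's data (cyl plays the role of CLASS and mpg of PERF); those substitutions are assumptions, since the original data set is not shown.

```r
library(dplyr)

# Which package does the bare name resolve to? "dplyr" is expected here;
# "plyr" would mean plyr was attached later and masks it.
environmentName(environment(summarise))

# mtcars stands in for the asker's data: cyl for CLASS, mpg for PERF.
by_group <- mtcars %>%
  group_by(cyl) %>%
  dplyr::summarise(mean_performance = mean(mpg), sd_performance = sd(mpg))

nrow(by_group)  # 3, one row per value of cyl
```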

R - how to use group by function properly [duplicate]

This question already has answers here:
dplyr groups not working with dollar sign data$column syntax
(1 answer)
Why does summarize or mutate not work with group_by when I load `plyr` after `dplyr`?
(2 answers)
Closed last year.
I'm trying to compute the average and correlation of some variables, grouped by gender. For some reason, my group_by call does not seem to be working.
data(PSID1982, package ="AER" )
PSID1982 %>%
group_by(gender) %>%
summarise(avgeduc = mean(PSID1982$education), avgexper = mean(PSID1982$experience), avgwage= mean(PSID1982$wage),cor_wagvseduc = cor( x=PSID1982$wage, y= PSID1982$education))
The result is just the summary statistics of the entire data set, not broken up by gender.
Your pipeline is structured correctly, but when you use pipes and dplyr functions you should not refer to columns as PSID1982$Column_Name. That expression always returns the full column from the original data frame and ignores the grouping. Use the bare column name instead, as follows:
PSID1982 %>%
group_by(gender) %>%
summarise(avgeduc = mean(education),
avgexper = mean(experience),
avgwage= mean(wage),
cor_wagvseduc = cor( x=wage, y= education))
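The effect is easy to reproduce without the AER package. In this sketch mtcars stands in for PSID1982, with am in the role of gender and mpg in the role of wage; those substitutions are assumptions made only so the example runs on built-in data.

```r
library(dplyr)

# mtcars stands in for PSID1982: am plays the role of gender, mpg of wage.
wrong <- mtcars %>%
  group_by(am) %>%
  summarise(avg = mean(mtcars$mpg))   # ignores groups: same value in every row

right <- mtcars %>%
  group_by(am) %>%
  summarise(avg = mean(mpg))          # evaluated once per group

wrong
right
```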

Get mean values if a key column value is duplicated with dplyr (R) [duplicate]

This question already has answers here:
Means multiple columns by multiple groups [duplicate]
(4 answers)
Closed 4 years ago.
This is my data. What I would like to do is, if the gene column has duplicated value (e.g. CASZ1), then I would like to get mean values for each Sample column.
Input data
Output data
I googled it and tried, but I am stuck. I am sorry for asking a question that looks exactly like homework.
My code
data %>% group_by(gene) %>% summarise(avg = mean(colnames(data)) --- error...
You can use summarise_at with a regular expression, so that any column not matching your pattern is left out:
data %>% group_by(gene) %>% summarise_at(vars(matches("Sample")), mean)
Is that what you're looking for?
You can use summarise_all:
library(dplyr)
data %>% group_by(gene) %>% summarise_all(mean)
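In dplyr >= 1.0.0 the same result can be written with across. Below is a minimal sketch on invented data: the column names Sample1 and Sample2, the second gene name and all values are placeholders, since the original input is not shown.

```r
library(dplyr)

# Invented data: the Sample columns and their values are placeholders.
data <- tibble(gene    = c("CASZ1", "CASZ1", "TP53"),
               Sample1 = c(1, 3, 5),
               Sample2 = c(2, 6, 4))

# One row per gene, with the mean of every column whose name starts with "Sample".
averaged <- data %>%
  group_by(gene) %>%
  summarise(across(starts_with("Sample"), mean))

averaged
```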

R - group data frame from a variable [duplicate]

This question already has answers here:
dplyr: How to use group_by inside a function?
(4 answers)
Closed 6 years ago.
I want to set the column for grouping a data frame into a variable and then group and summarise the data frame based on it, i.e.
require(dplyr)
var <- colnames(mtcars)[10]
summaries <- mtcars %>% dplyr::group_by(var) %>% dplyr::summarise_each(funs(mean))
so that I can simply change var and rerun the second line without changing anything else. Unfortunately this does not work, because group_by expects a bare column name rather than a variable holding the name.
Use group_by_, which takes arguments as character strings:
require(dplyr)
var <- colnames(mtcars)[10]
summaries <- mtcars %>% dplyr::group_by_(var) %>% dplyr::summarise_each(funs(mean))
(Maybe resources on standard vs non-standard evaluation would be of interest: http://adv-r.had.co.nz/Computing-on-the-language.html)
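Note that the underscore verbs such as group_by_, as well as summarise_each and funs, have since been deprecated in dplyr. A sketch of one current equivalent, assuming dplyr >= 1.0.0, uses across together with all_of to select the grouping column by name:

```r
library(dplyr)

var <- colnames(mtcars)[10]   # "gear"

summaries <- mtcars %>%
  # all_of() selects the column whose name is stored in var.
  group_by(across(all_of(var))) %>%
  # everything() covers all non-grouping columns.
  summarise(across(everything(), mean))

summaries
```

Changing var to another column name is enough to regroup; the pipeline itself stays the same.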
