Passing variable names to the count() function in R

I am trying to use the count() function to get the counts of each level of a column in R. I have 37 columns, and I wanted to know if there is a way to pass column names other than typing them out.
I am currently using:
x1Count <- totalCount %>%
  group_by(Country) %>%
  count(X1.Environmental.Regulation) %>%
  drop_na()
I want to run this through a loop, with the count() function taking the column names from a list such as colnames(totalCount).
Is there another way to pass inputs to the count() function that will allow me to use column numbers or refer to another list?

We can convert the column-name strings into symbols (with rlang::syms()) and splice them into count() with !!!. In the example below, we get the frequency counts of columns 4 and 5, grouped by 'Country':
library(tidyverse)
totalCount %>%
group_by(Country) %>%
count(!!! rlang::syms(names(.)[4:5]))
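For the loop described in the question, a minimal sketch is to map over the column names and turn each one into a single symbol with rlang::sym() before unquoting it with !!. The data frame below is a small hypothetical stand-in for totalCount; the columns X1 and X2 are made up for illustration.

```r
library(dplyr)
library(purrr)
library(rlang)

# Hypothetical stand-in for the question's totalCount data frame
totalCount <- data.frame(
  Country = c("A", "A", "B", "B", "B"),
  X1 = c("yes", "no", "yes", "yes", NA),
  X2 = c("low", "low", "high", "low", "high"),
  stringsAsFactors = FALSE
)

# Every column except the grouping column
cols <- setdiff(names(totalCount), "Country")

# One frequency table per column: sym() turns the string into a symbol,
# and !! injects it into count() as if it had been typed as a bare name
counts <- map(cols, function(col) {
  totalCount %>%
    group_by(Country) %>%
    count(!!sym(col)) %>%
    ungroup()
})
names(counts) <- cols
```

Each element of counts then holds the per-Country frequency table for one column; piping each result through tidyr::drop_na(), as in the question, would drop the NA level.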

Related

Distinct function in dplyr package

I'm trying to get unique values for 2 columns. I get them when I write it separately, but it doesn't work for both in the same command. I do it like this:
dataname %>%
select(column name, column name) %>%
distinct() %>%
summarise(name=n(), name=n())
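As a hedged illustration (the columns col1 and col2 are hypothetical placeholders), distinct() applied to two columns returns the unique combinations of the pair, not the unique values of each column separately; n_distinct() inside summarise() gives a per-column count instead:

```r
library(dplyr)

# Hypothetical data standing in for 'dataname'
dataname <- data.frame(
  col1 = c(1, 1, 2, 2),
  col2 = c("a", "b", "b", "b"),
  stringsAsFactors = FALSE
)

# distinct() on two columns keeps unique combinations of (col1, col2)
combos <- dataname %>%
  select(col1, col2) %>%
  distinct()

# n_distinct() counts the unique values of each column on its own
per_column <- dataname %>%
  summarise(n_col1 = n_distinct(col1), n_col2 = n_distinct(col2))
```

Here combos has three rows, (1, a), (1, b) and (2, b), while per_column reports two unique values for each column.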
In this way, I only get unique values for the first column. What is the problem?

Store specific index value of output in R

I'm basically looking for the equivalent of the following Python code in R:
df.groupby('Categorical')['Count'].count()[0]
The following is what I'm doing in R:
by(df$count,df$Categorical,sum)
This accomplishes the same thing as the Python code, but I'd like to know how to store an index value in a variable in R (I'm new to R).
Based on the by() code, it seems like we can use the following (assuming that 'count' is a column of 1s):
library(dplyr)
out <- df %>%
group_by(Categorical) %>%
summarise(Sum = sum(count))
If the 'count' column has other values as well, note that the Python code is taking the frequency count of the 'Categorical' column. So a similar option, which stores the first count in out, would be:
out <- df %>%
count(Categorical) %>%
slice(1) %>%
pull(n)
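Either result can also be indexed like an ordinary vector. A minimal sketch with hypothetical data (only the column names Categorical and count come from the question); keep in mind that R indexes from 1, so Python's [0] corresponds to [1] here:

```r
library(dplyr)

# Hypothetical example data using the question's column names
df <- data.frame(
  Categorical = c("b", "a", "b", "a", "a"),
  count = c(1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Base R: tapply() returns a named vector (groups sorted alphabetically),
# so it can be indexed by position or by group name
sums <- tapply(df$count, df$Categorical, sum)
first_sum <- sums[[1]]   # first group by position
a_sum <- sums[["a"]]     # the group named "a"

# dplyr: count() returns a data frame; index its n column
out <- df %>%
  count(Categorical)
first_n <- out$n[1]
```

With this data, group "a" sorts first and occurs three times, so all three stored values are 3.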

With dplyr and enquo my code works but not when I pass to purrr::map

I want to create a plot for each column in a vector called dates. My data frame contains only these columns, and for each one I want to group on it, count the occurrences, and then plot the result.
The code below works, except for map(), which I want to use to go across a previously unknown number of columns. I think I'm using map() correctly; I've had success with it before. I'm new to using quosures, but given that my function call works, I'm not sure what is wrong. I've looked at several other posts that appear to be set up this way.
df <- data.frame(
date1 = c("2018-01-01","2018-01-01","2018-01-01","2018-01-02","2018-01-02","2018-01-02"),
date2 = c("2018-01-01","2018-01-01","2018-01-01","2018-01-02","2018-01-02","2018-01-02"),
stringsAsFactors = FALSE
)
dates<-names(df)
library(tidyverse)
dates.count<-function(.x){
group_by<-enquo(.x)
df %>% group_by(!!group_by) %>% summarise(count=n()) %>% ungroup() %>% ggplot() + geom_point(aes(y=count,x=!!group_by))
}
dates.count(date1)
map(dates,~dates.count(.x))
I get this error: Error in grouped_df_impl(data, unname(vars), drop) : Column .x is unknown
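When you pass the variable names to map(), you are passing strings, which means you need ensym() instead of enquo(). ensym() accepts either a bare name or a string and returns a symbol, whereas enquo() captures whatever expression it is given.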
So your function would look like
dates.count <- function(.x){
  # ensym() accepts a bare name or a string and returns a symbol
  grp_var <- ensym(.x)
  df %>%
    group_by(!!grp_var) %>%
    summarise(count = n()) %>%
    ungroup() %>%
    ggplot() +
    geom_point(aes(y = count, x = !!grp_var))
}
And you would use the variable names as strings for the argument.
dates.count("date2")
Note that tidyeval doesn't always play nicely with the formula interface of map() (I think I'm remembering that correctly). You can always use an anonymous function instead, but in your case, where you want to map the column names to a function with a single argument, you can just do:
map(dates, dates.count)
Using the formula interface in map(), I needed an extra !!, because otherwise ensym() captures the symbol .x itself rather than the string it holds:
map(dates, ~dates.count(!!.x))
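To check the behaviour without drawing plots, here is a sketch in which count_by() is a hypothetical stand-in for dates.count() with the ggplot step removed. It compares a bare-name call, a string call, map() with the function passed directly, and map() with the formula plus !!:

```r
library(dplyr)
library(purrr)
library(rlang)

df <- data.frame(
  date1 = c("2018-01-01", "2018-01-01", "2018-01-02"),
  date2 = c("2018-01-01", "2018-01-02", "2018-01-02"),
  stringsAsFactors = FALSE
)

# Hypothetical plot-free version of dates.count()
count_by <- function(.x) {
  grp_var <- ensym(.x)  # bare name or string -> symbol
  df %>%
    group_by(!!grp_var) %>%
    summarise(count = n()) %>%
    ungroup()
}

by_name <- count_by(date1)       # bare column name
by_string <- count_by("date1")   # column name as a string

all_cols <- map(names(df), count_by)                 # function passed directly
all_cols_lambda <- map(names(df), ~ count_by(!!.x))  # formula needs !!
```

Without the !!, the formula version would hand ensym() the symbol .x, and group_by() would then look for a column literally called .x, which matches the error in the question.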

Passing column names as both variables and columns in a single dplyr function in R

I am writing code in which a column name (e.g. "Category") is supplied by the user and assigned to a variable biz.area. For example:
biz.area <- "Category"
The original data frame is saved as risk.data. The user also supplies the range of columns to analyze by providing column names for the variables first.column and last.column.
Text in these columns will be broken up into bigrams for further text analysis including tf_idf.
My code for this analysis is given below.
x.bigrams <- risk.data %>%
gather(fields, alldata, first.column:last.column) %>%
unnest_tokens(bigrams,alldata,token = "ngrams", n=2) %>%
count(bigrams, biz.area, sort=TRUE) %>%
bind_tf_idf(bigrams, biz.area, n) %>%
arrange(desc(tf_idf))
However, I get the following error.
Error in grouped_df_impl(data, unname(vars), drop) : Column x.biz.area is unknown
This is because count() treats biz.area as a bare column name instead of as a variable holding the column name. If I use count_() instead, I get the following error.
Error in compat_lazy_dots(vars, caller_env()) : object 'bigrams' not found
This is because count_() expects to find only variables and bigrams is not a variable.
How can I pass both a constant and a variable to count() or count_()?
Thanks for your suggestion!
It looks to me like you need quosures, so that you can pass column names as variables rather than as strings or values. Since you're already using dplyr, you can use dplyr's non-standard evaluation techniques.
Try something along these lines:
library(tidyverse)
analyze_risk <- function(area, firstcol, lastcol) {
# turn your arguments into quosures
areaq <- enquo(area)
firstcolq <- enquo(firstcol)
lastcolq <- enquo(lastcol)
# run your analysis on the risk data
risk.data %>%
gather(fields, alldata, !!firstcolq:!!lastcolq) %>%
unnest_tokens(bigrams,alldata,token = "ngrams", n=2) %>%
count(bigrams, !!areaq, sort=TRUE) %>%
bind_tf_idf(bigrams, !!areaq, n) %>%
arrange(desc(tf_idf))
}
In this case, your users would pass bare column names into the function like this:
myresults <- analyze_risk(Category, Name_of_Firstcol, Name_of_Lastcol)
If you want users to pass in strings, you'll need to use rlang::sym() (or rlang::ensym(), which accepts either a string or a bare name) instead of enquo().
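For the string case in the question, a minimal sketch is to convert biz.area with rlang::sym() and inject it with !!, so the bare name bigrams and the variable can sit side by side in one count() call. The data frame below is a hypothetical stand-in for risk.data after the bigram step, so tidytext isn't needed to run it:

```r
library(dplyr)
library(rlang)

# Hypothetical stand-in for the data after unnest_tokens()
bigram_data <- data.frame(
  bigrams = c("supply chain", "supply chain", "data breach"),
  Category = c("Ops", "Ops", "IT"),
  stringsAsFactors = FALSE
)

biz.area <- "Category"  # column name supplied as a string, as in the question

# bigrams is a bare column name; !!sym(biz.area) injects the column named in biz.area
x.counts <- bigram_data %>%
  count(bigrams, !!sym(biz.area), sort = TRUE)
```

The same !!sym(biz.area) should carry over to bind_tf_idf(), assuming your tidytext version evaluates its term and document arguments with tidy evaluation.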

Error using dplyr package in R

I am using the code below to summarise the data with respect to column x: it counts the values in column x of the dataset unique_data and arranges the counts in descending order.
unique_data %>%
group_by(x) %>%
arrange(desc(count(x)))
But when I execute the above code, I get the following error message:
Error: no applicable method for 'group_by_' applied to an object of class "character"
Kindly let me know what is going wrong in my code. For your information, column x is of character data type.
Regards,
Mohan
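The reason is the wrapping of count() inside arrange(). Inside arrange(), x evaluates to the character column itself, so count(x) ends up calling group_by_ on a character vector instead of a data frame, which produces the error. We need to do these steps separately: using the same code as in the OP's post, just split the count and arrange steps into two separate pipes. The output of count() is a frequency column 'n' (by default), which we then arrange in descending (desc) order.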
unique_data %>%
group_by(x) %>%
count(x) %>%
arrange(desc(n))
Also, the group_by() is not needed. According to the ?count documentation:
tally is a convenient wrapper for summarise that will either call n or
sum(n) depending on whether you're tallying for the first time, or
re-tallying. count() is similar, but also does the group_by for you.
So based on that, we can just do
count(unique_data, x) %>%
arrange(desc(n))
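count() also has a sort argument that puts the largest groups first, which can replace the separate arrange(desc(n)) step. A small sketch with hypothetical values for x:

```r
library(dplyr)

# Hypothetical data; only the column name x comes from the question
unique_data <- data.frame(
  x = c("b", "a", "b", "c", "b", "a"),
  stringsAsFactors = FALSE
)

# Two-step version from the answer
two_step <- count(unique_data, x) %>%
  arrange(desc(n))

# One-step version: sort = TRUE orders by n, largest first
one_step <- count(unique_data, x, sort = TRUE)
```

Both return the values of x with their frequencies, most frequent first.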
