Using plyr to count with a which condition - r

I am trying to use the which function in conjunction with the count function. I would like to count the number of factors that follow a which condition. This code isn't correct, but any advice would be appreciated.
library(plyr)
count(data, 'factor', which numeric > 10)
#Base version attempt
count(data$factor, which(data$numeric > 10))
Error in UseMethod("group_by_") :
no applicable method for 'group_by_' applied to an object of class "factor"

This isn't exactly what you're looking for but here are two pieces of advice:
plyr is the predecessor of dplyr, so I would use the newer package, especially because it comes with the tidyverse. dplyr's count can deal with factors.
You may not need a factor here at all. If not, I would suggest just coercing with as.character.
With dplyr you could write something like:
data %>% filter(numeric > 10) %>% count(factor)
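As a minimal sketch of the filter-then-count idea above, here is a toy data frame that reuses the question's column names factor and numeric. The values are made up for illustration, and the base R line is just one possible alternative, not something the original answer proposed.

```r
library(dplyr)

# Hypothetical data using the column names from the question
data <- data.frame(
  factor  = factor(c("a", "b", "a", "c", "b", "a")),
  numeric = c(5, 12, 20, 11, 3, 15)
)

# Keep only the rows that meet the condition, then count per factor level
counts <- data %>%
  filter(numeric > 10) %>%
  count(factor)

# Base R alternative: subset the factor with the condition, then tabulate
base_counts <- table(data$factor[data$numeric > 10])
```

Note that which() is not needed in either version: a logical condition can be used directly for filtering or subsetting.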

Related

dplyr::recode Why does pipe generate error?

If I use recode in a pipe, I get an error:
df <- df %>%
recode(unit, .missing="g")
Error in UseMethod("recode") : no applicable method for 'recode'
applied to an object of class "c('tbl_df', 'tbl', 'data.frame')"
If I pull it out of the pipe, it works fine:
df$unit <- recode(df$unit, .missing="g")
Any ideas why? I'd like to stay in the pipe if possible.
An equivalent of the base R solution in dplyr is to use it inside mutate:
df %>%
mutate(unit = recode(unit, .missing="g"))
Directly chaining recode after %>% passes the data frame to recode as its first argument, which doesn't match recode's parameters. The first argument .x needs to be a vector; unlike some other dplyr functions, recode doesn't use non-standard evaluation to interpret unit as the column with that name in df. Most functions designed for direct use with the pipe take a data frame as their first argument and return one. The magrittr documentation explains how the pipe works in more detail.
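To make the difference concrete, here is a small sketch with a made-up data frame (the values in unit are an assumption for illustration). recode receives the column as a vector because it is called inside mutate, so the pipe only ever hands the data frame to mutate.

```r
library(dplyr)

# Hypothetical data: one missing unit
df <- data.frame(unit = c("kg", NA, "g"), stringsAsFactors = FALSE)

# mutate() takes the data frame from the pipe; recode() only sees the vector `unit`
out <- df %>%
  mutate(unit = recode(unit, .missing = "g"))
```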

Could not find function mutate_if

I am trying to use the mutate_if function from the dplyr package to convert all the character columns to factor columns. I am aware of alternative approaches for this transformation, but I am curious to see how mutate_if works. I tried the following command:
df <- df %>% mutate_if(is.character, as.factor)
But I am getting this message:
could not find function mutate_if
I reinstalled dplyr but I am still getting the same error message.
When I was having this problem, I also got the error message "there is no package called 'pillar'" while loading the dplyr package. (Re-installing dplyr didn't seem to install pillar.) Installing the pillar package allowed things to work again.
df <-df %>% dplyr::mutate_if(is.character,as.factor)
Try it like this. When I faced the same error, I called the function like this.
If your installed version of dplyr has no mutate_if function, you could use this rather long alternative, which extracts the names of the character columns itself:
dt1 <- dt1 %>% mutate_each_(funs(as.factor),names(which (sapply(dt1, class) == 'character',arr.ind = TRUE)))
If you already have a list of the variables to convert, say in an object called varlist, you can use this one:
dt1 <- dt1 %>% mutate_each_(funs(as.factor),varlist)
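For comparison, here is a hedged sketch of the same conversion on a made-up data frame. It assumes a reasonably recent dplyr (1.0.0 or later), where mutate_if is still available and across() with where() is the newer spelling; neither line comes from the answers above.

```r
library(dplyr)

# Hypothetical data with one character column and one integer column
df <- data.frame(name = c("x", "y", "x"), value = 1:3,
                 stringsAsFactors = FALSE)

# Scoped variant, as in the question
out_if <- df %>% mutate_if(is.character, as.factor)

# across() variant (assumes dplyr >= 1.0.0)
out_across <- df %>% mutate(across(where(is.character), as.factor))
```

If the first call fails with "could not find function", checking packageVersion("dplyr") and whether library(dplyr) loaded without errors is a reasonable first step.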

r dplyr group_by - by variable content

I use dplyr's group_by function to group my data frame, and I need to group the data by a column whose name I don't know yet. I need to decide it as the code runs, so the name can't be hard-coded.
For example, I can't use
data %>% group_by(col_name)
I need to do something like
c <- "col_name"
data %>% group_by(c)
When I try doing so, it throws an error:
Error: unknown variable to group by : c
All the examples I find are for the trivial case where you can hard-code the name of the column, both in the group_by example and in the R help.
Thanks.
As others have said in the comments, you should look up non-standard evaluation (NSE). Here you need the lazyeval package and the group_by_ function, which lets you use standard evaluation. It will look like this:
data %>% group_by_(lazyeval::interp(~var, var = as.name(c)))
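As an alternative sketch, assuming dplyr 1.0.0 or later is installed, the same "group by a column whose name is stored in a variable" idea can be written without lazyeval. The use of mtcars and the variable name col are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical: the column name is only known at run time
col <- "cyl"

# all_of() selects the column whose name is stored in `col`
grouped <- mtcars %>%
  group_by(across(all_of(col))) %>%
  summarise(n = n(), .groups = "drop")
```

Using a name other than c for the variable also avoids shadowing base R's c() function.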

mutate_each_ non-standard evaluation

I'm really struggling with putting dplyr functions within my own functions. I understand the function_ suffix for the standard evaluation versions, but I'm still having problems, and I have seemingly tried all combinations of eval, paste and lazy.
Trying to divide multiple columns by the median of the control for a group. Example data includes an additional column in iris named 'Control', so each species has 40 'normal', and 10 'control'.
data(iris)
control <- rep(c(rep("normal", 40), rep("control", 10)), 3)
iris$Control <- control
Normal dplyr works fine:
out_df <- iris %>%
group_by(Species) %>%
mutate_each(funs(./median(.[Control == "control"])), 1:4)
Trying to wrap this up into a function:
norm_iris <- function(df, control_col, control_val, species, num_cols = 1:4){
out <- df %>%
group_by_(species) %>%
mutate_each_(funs(./median(.[control_col == control])), num_cols)
return(out)
}
norm_iris(iris, control_col = "Control", control_val = "control", species = "Species")
I get the error:
Error in UseMethod("as.lazy_dots") :
no applicable method for 'as.lazy_dots' applied to an object of class "c('integer', 'numeric')"
Using funs_ instead of funs I get Error:...: need numeric data
If you haven't already, it might help you to read the vignette on standard evaluation, although it sounds like some of this may be changing soon.
Your function is missing the use of interp from package lazyeval in the mutate_each_ line. Because you are trying to use a variable name (the Control variable) in the funs, you need funs_ in this situation along with interp. Notice that this is a situation where you don't need mutate_each_ at all. You would need it if you were trying to use column names instead of column numbers when selecting the columns you want to mutate.
Here is what the line would look like in your function instead of what you have:
mutate_each(funs_(interp(~./median(.[x == control_val]), x = as.name(control_col))),
num_cols)
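For readers on newer dplyr releases, here is a hedged sketch of the whole function written with across(), all_of() and the .data pronoun instead of mutate_each and lazyeval. It assumes dplyr 1.0.0 or later and is not the code from the answer above; the copy iris2 is introduced here only to leave the built-in iris untouched.

```r
library(dplyr)

# Example data as in the question: 40 "normal" and 10 "control" rows per species
iris2 <- iris
iris2$Control <- rep(c(rep("normal", 40), rep("control", 10)), 3)

norm_iris <- function(df, control_col, control_val, species, num_cols = 1:4) {
  # Translate column positions into names before grouping
  cols <- names(df)[num_cols]
  df %>%
    group_by(across(all_of(species))) %>%
    # Divide each selected column by the median of its control rows,
    # computed within each group
    mutate(across(all_of(cols),
                  ~ .x / median(.x[.data[[control_col]] == control_val]))) %>%
    ungroup()
}

out <- norm_iris(iris2, control_col = "Control", control_val = "control",
                 species = "Species")
```

After normalising, the median of the control rows within each species should be 1 for every normalised column, which is a quick way to check the result.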

Understanding plyr's ddply function

I am learning R and don't understand a section of the below function. In the below function what exactly is count=length(address) doing? Is there another way to do this?
crime_dat = ddply(crime, .(lat, lon), summarise, count = length(address))
The plyr library has two very common "helper" functions, summarize and mutate.
Summarise is used when you want to discard irrelevant data/columns, keeping only the levels of the grouping variable(s) and the results of the summary functions applied to those groups (in your example, length).
Mutate is used to add a column (analogous to transform in base R), but without discarding anything. If you run these two commands, they should illustrate the difference nicely.
library(plyr)
ddply(mtcars, .(cyl), summarise, count = length(mpg))
ddply(mtcars, .(cyl), mutate, count = length(mpg))
In this example, as in your example, the goal is to figure out how many rows there are in each group. When using ddply like this with summarise, we need to pick a function that takes a single column (vector) as an argument, so length is a good choice. Since we're just counting rows / taking the length of the vector, it doesn't matter which column we pass to it. Alternatively, we could use nrow, but for that we have to pass a whole data.frame, so summarise won't work. In this case it saves us typing:
ddply(mtcars, .(cyl), nrow)
But if we want to do more, summarise really shines:
ddply(mtcars, .(cyl), summarise, count = length(mpg),
mean_mpg = mean(mpg), mean_disp = mean(disp))
Is there another way to do this?
Yes, many other ways.
I'd second Alex's recommendation to use dplyr for things like this. The summarize and mutate concepts are still used, but it works faster and results in more readable code.
Other options include the data.table package (also a great option), tapply() or aggregate() in base R, and countless other possibilities.
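As a rough sketch of two of those alternatives on the same mtcars example, assuming dplyr is installed (plyr is deliberately not loaded here, since loading both packages can mask functions such as summarise):

```r
library(dplyr)

# dplyr: n() counts the rows in each group, so no column has to be passed in
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(count = n(),
            mean_mpg = mean(mpg),
            mean_disp = mean(disp))

# Base R: row counts per group
base_counts <- table(mtcars$cyl)

# Base R: group means with aggregate()
base_means <- aggregate(cbind(mpg, disp) ~ cyl, data = mtcars, FUN = mean)
```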
