Could not find function mutate_if - r

I am trying to use the function mutate_if under dplyr() package to convert all the character columns to factor columns. I am aware of alternate approaches for this transfromation, but I am curious to see how mutate_if works. I tried the following command:
df <-df %>% mutate_if(is.character,as.factor)
But I am geting a message, :
could not find function mutate_if
I reinstalled dplyr() but still I am getting the same error message.

When I was having this problem, I also got the error message "there is no package called 'pillar'" while loading the dplyr package. (Re-installing dplyr didn't seem to install pillar.) Installing the pillar package allowed things to work again.

df <-df %>% dplyr::mutate_if(is.character,as.factor)
Try like this. when I faced the same error I used the mutate function like this.

There is no more mutate_if function. You could use this one which is really long if you would not be able to extract names of variables which are from type of character:
dt1 <- dt1 %>% mutate_each_(funs(as.factor),names(which (sapply(dt1, class) == 'character',arr.ind = TRUE)))
If you have list of factor variables lets say in "varlist" object, you can use this one:
dt1 <- dt1 %>% mutate_each_(funs(as.factor),varlist)

Related

Anonymous function to select variables and get unique values

I have a simple task which I would like to loop over many datasets (which have similar variable names). I know how to do it dplyr, but I need to convert it to base R in order to get it into an anonymous function.
For example (this not the real data I am working with):
This is my dplyr approach:
mtcars %>%
select(mpg, contains("cyl")) %>%
distinct()
However, when I throw this into an anonymous function:
I get an error: Error: No tidyselect variables were registered
mtcars %>% (function(x) subset(x, select=c(mpg, contains("cyl")))
Any ideas about how to solve this, and how to add distinct() to the function so that I only get unique values? Any and all suggestions are appreciated, thank you!

Subsetting on a tibble using "[]" gives "object not found" error

The article on dplyr here says "[]" (square brackets) can be used to subset filtered Tibbles like this:
filter(mammals, adult_body_mass_g > 1e7)[ , 3]
But I am getting an "object not found" error.
Here is the replication of the error on a more known dataset "iris"
library(dplyr)
iris %>% filter(Sepal.Length>6) [,c(1:3)]
Error in filter_(.data, .dots = lazyeval::lazy_dots(...)) :
object 'Sepal.Length' not found
I also want to mention that I am deliberately not preferring to use the native subsetting in dplyr using select() as I need a vector output and not a data frame on a single column. Unfortunately, dplyr always forces a data frame output (for good reasons).
You need an extra pipe:
iris %>% filter(Sepal.Length>6) %>% .[,1:3]
Sorry, forgot the . before the brackets.
Note: Your code will probably be more readable if you stick to the tidyverse syntax and use select as the last operation.
iris %>%
filter(Sepal.Length > 6) %>%
select(1:3)
The dplyr-native way of doing this is to use select:
iris %>% filter(Sepal.Length > 6) %>% select(1:3)
You could also use {} so that the filtering is done before [ is applied:
{iris %>% filter(Sepal.Length>6)}[,c(1:3)]
Or, as suggested in another answer, use the . notation to indicated where the data should go in relation to [:
iris %>% filter(Sepal.Length>6) %>% .[,1:3]
You can also load magrittr explicitly and use extract, which is a "pipe-able" version of [:
library(magrittr)
iris %>% filter(Sepal.Length>6) %>% extract( ,1:3)
The blog entry you reference is old in dplyr time - about 3 years old. dplyr has been changing a lot. I don't know whether the blog's suggestion worked at the time it was written or not, but I'd recommend finding more recent sources to learn about this frequently changing package.

Using to PLYR to count with Which Condition

I am trying to use the which function in conjunction with the count function. I would like to count the number of factors that follow a which condition. This code isn't correct, but any advice would be appreciated.
library(plyr)
count(data, 'factor', which numeric > 10)
#Base version attempt
count(data$factor, which(data$numeric > 10))
Error in UseMethod("group_by_") :
no applicable method for 'group_by_' applied to an object of class "factor"
This isn't exactly what you're looking for but here are two pieces of advice:
plyr is an older version of dplyr so I would use the newer one, especially because it come in the tidyverse group. dplyr's count can deal with factors.
Factors aren't commonly used in R anymore. I would suggest just coercing with as.character
With dplyr you could write something like:
data %>% filter(numeric > 10) %>% count(factor)

r dplyr group_by - by variable content

I use dplyr group_by function to group my data frame,
and need to be able to group the data, by a column, i don't know the name of the column yet, i need to decide it along the code, so the name can't be hard coded.
for example,
i can't use
data %>% group_by(col_name)
i need to do somthing like
data %>% c <- col_name
data %>% group_by(c)
when i try doing so, it popes error:
Error: unknown variable to group by : c
All the examples I find are for the trevial case when you can hard code the name of the column
group by example
Same in the r help
Thanks.
You would like to look up NSE as others have said in their comments. Using that also requires you to use lazyeval package, and group_by_ function, which allows you to you standard evaluation. So it will look like:
data %>% group_by_(lazyeval::interp(~var, var = as.name(c)))

R: Making pivot table with dplyr or reshape2 package

I am trying to make simple pivot table in R using dplyr or reshape2 packages as my dataset is too large and R goes out of memory with sqldf. The two columns of my dataset that I want to make a pivot table out of is "Product" and "Cust_Id". I want to count the number of customer per product. And this is what I got.
library(reshape2)
mydata<-read.table("Book1.txt",header=TRUE,fill=TRUE)
mydata.m<-melt(mydata,id=c("Product"),measured=c(Cust_Id))
mydata.d<-dcast(mydata.m,Product~variable,count)
It returns
Error in UseMethod("group_by_"):
no applicable method for 'group_by_' applied to an object of class "c('integer','numeric')"
I have also tried dplyr with below code(not sure about the last step though as I did it on the other laptop)
library(dplyr)
mydata.df<-tbl_df(mydata)
summarize(mydata.df,Product,Cust_Id=n())
I got no error message but a lot of values seems to be missing in the output.
I really appreciate your input. Thanks in advance.
Try this:
library(dplyr)
mydata <- mydata %>%
group_by(Product) %>%
summarise(nCustomers = n())
Alternatively, if you only want to count unique customers, you can do:
library(dplyr)
mydata <- mydata %>%
group_by(Product) %>%
summarise(nCustomers = n_distinct(Cust_Id))
If this really is a big data set then your best option in the data.table package
require(data.table)
mydata_data_table = data.table(mydata)
number_customer = mydata_data_table[, .(number_customers = .N), by=Product]

Resources