I am trying to use the mutate_if function from the dplyr package to convert all character columns to factor columns. I am aware of alternative approaches for this transformation, but I am curious to see how mutate_if works. I tried the following command:
df <- df %>% mutate_if(is.character, as.factor)
But I am getting this message:
could not find function mutate_if
I reinstalled dplyr, but I am still getting the same error message.
When I was having this problem, I also got the error message "there is no package called 'pillar'" while loading the dplyr package. (Re-installing dplyr didn't seem to install pillar.) Installing the pillar package allowed things to work again.
df <- df %>% dplyr::mutate_if(is.character, as.factor)
Try it like this. When I faced the same error, I called the function with the explicit dplyr:: prefix, as in the line above.
If your installed version of dplyr has no mutate_if function (older releases did not), you could use this longer form, which extracts the names of the character columns itself:
dt1 <- dt1 %>% mutate_each_(funs(as.factor), names(which(sapply(dt1, class) == 'character', arr.ind = TRUE)))
If you already have the names of the columns to convert, say in an object called "varlist", you can use this one:
dt1 <- dt1 %>% mutate_each_(funs(as.factor), varlist)
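Before reaching for a workaround, it may help to confirm what the installed dplyr actually provides. The following is a minimal sketch, assuming a dplyr release that exports mutate_if; the toy data frame df and its columns id and grp are made up for illustration.

```r
# Report the installed dplyr version and whether it exports mutate_if
print(packageVersion("dplyr"))
has_mutate_if <- "mutate_if" %in% getNamespaceExports("dplyr")
print(has_mutate_if)

library(dplyr)

# Hypothetical example data: one integer column, one character column
df <- data.frame(id = 1:3, grp = c("a", "b", "a"), stringsAsFactors = FALSE)

# Convert every character column to a factor; other columns are untouched
df <- df %>% mutate_if(is.character, as.factor)
print(sapply(df, class))
```

If has_mutate_if is FALSE, updating dplyr (and any missing dependencies it reports while loading) is probably the first thing to try.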
I have a simple task which I would like to loop over many datasets (which have similar variable names). I know how to do it in dplyr, but I need to convert it to base R in order to get it into an anonymous function.
For example (this is not the real data I am working with):
This is my dplyr approach:
mtcars %>%
select(mpg, contains("cyl")) %>%
distinct()
However, when I throw this into an anonymous function:
mtcars %>% (function(x) subset(x, select = c(mpg, contains("cyl"))))
I get an error: Error: No tidyselect variables were registered
Any ideas about how to solve this, and how to add distinct() to the function so that I only get unique values? Any and all suggestions are appreciated, thank you!
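One possible base R sketch: contains() is a tidyselect helper that only works inside dplyr's select(), so within subset() it has no columns to search. Matching the names with grep() and de-duplicating with unique() avoids that. The function name select_unique is made up for illustration.

```r
# Base R stand-in for select(mpg, contains("cyl")) %>% distinct()
select_unique <- function(x) {
  # "mpg" plus every column whose name contains "cyl"
  keep <- c("mpg", grep("cyl", names(x), fixed = TRUE, value = TRUE))
  # drop = FALSE keeps a data frame even if only one column matches
  unique(x[, keep, drop = FALSE])
}

res <- select_unique(mtcars)
print(head(res))
```

The same body can be used inline as an anonymous function, for example inside lapply() over a list of datasets.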
The article on dplyr here says "[]" (square brackets) can be used to subset filtered Tibbles like this:
filter(mammals, adult_body_mass_g > 1e7)[ , 3]
But I am getting an "object not found" error.
Here is a reproduction of the error on a better-known dataset, iris:
library(dplyr)
iris %>% filter(Sepal.Length>6) [,c(1:3)]
Error in filter_(.data, .dots = lazyeval::lazy_dots(...)) :
object 'Sepal.Length' not found
I also want to mention that I am deliberately avoiding dplyr's native subsetting with select(), because I need a vector as output for a single column and not a data frame. Unfortunately, dplyr always forces a data frame output (for good reasons).
You need an extra pipe:
iris %>% filter(Sepal.Length>6) %>% .[,1:3]
Sorry, forgot the . before the brackets.
Note: Your code will probably be more readable if you stick to the tidyverse syntax and use select as the last operation.
iris %>%
filter(Sepal.Length > 6) %>%
select(1:3)
The dplyr-native way of doing this is to use select:
iris %>% filter(Sepal.Length > 6) %>% select(1:3)
You could also use {} so that the filtering is done before [ is applied:
{iris %>% filter(Sepal.Length>6)}[,c(1:3)]
Or, as suggested in another answer, use the . notation to indicate where the data should go in relation to [:
iris %>% filter(Sepal.Length>6) %>% .[,1:3]
You can also load magrittr explicitly and use extract, which is a "pipe-able" version of [:
library(magrittr)
iris %>% filter(Sepal.Length>6) %>% extract( ,1:3)
The blog entry you reference is old in dplyr time - about 3 years old. dplyr has been changing a lot. I don't know whether the blog's suggestion worked at the time it was written or not, but I'd recommend finding more recent sources to learn about this frequently changing package.
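For the vector output the question asks about, here is a small sketch. The original call fails because [ binds more tightly than %>%, so the brackets are applied to the filter() call before the data is piped in. This assumes a dplyr version that provides pull(); the base R line needs no packages. The variable names v_pull and v_base are made up for illustration.

```r
library(dplyr)

# pull() extracts a single column as a plain vector
v_pull <- iris %>% filter(Sepal.Length > 6) %>% pull(Sepal.Width)

# Base R equivalent: selecting one column of a data.frame drops to a vector
v_base <- iris[iris$Sepal.Length > 6, "Sepal.Width"]

print(is.numeric(v_pull))
print(identical(v_pull, v_base))
```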
I am trying to use the which function in conjunction with the count function. I would like to count how often each factor level occurs among the rows that satisfy a which condition. This code isn't correct, but any advice would be appreciated.
library(plyr)
count(data, 'factor', which numeric > 10)
#Base version attempt
count(data$factor, which(data$numeric > 10))
Error in UseMethod("group_by_") :
no applicable method for 'group_by_' applied to an object of class "factor"
This isn't exactly what you're looking for, but here are two pieces of advice:
plyr is the predecessor of dplyr, so I would use the newer package, especially because it is part of the tidyverse. dplyr's count can deal with factors.
A factor is often not needed for this kind of counting. If the column gives you trouble, I would suggest just coercing it with as.character.
With dplyr you could write something like:
data %>% filter(numeric > 10) %>% count(factor)
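For comparison, a base R sketch of the same count that needs neither plyr nor dplyr. The data frame below is made up for illustration, using the column names factor and numeric from the question.

```r
# Hypothetical example data
data <- data.frame(
  factor  = factor(c("a", "b", "a", "c", "b", "a")),
  numeric = c(5, 12, 20, 11, 3, 15)
)

# Keep the factor values from rows where numeric > 10, then tabulate them
counts <- table(data$factor[data$numeric > 10])
print(counts)
```

A logical condition can index directly, so which() is not required here; data$factor[which(data$numeric > 10)] gives the same rows when there are no missing values.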
I use the dplyr group_by function to group my data frame,
and I need to group the data by a column whose name I don't know in advance. I decide on it while the code runs, so the name can't be hard-coded.
For example,
I can't use
data %>% group_by(col_name)
I need to do something like
c <- "col_name"
data %>% group_by(c)
When I try doing so, it raises an error:
Error: unknown variable to group by : c
All the examples I find are for the trivial case where you can hard-code the name of the column
group by example
Same in the R help
Thanks.
You will want to look up NSE (non-standard evaluation), as others have said in their comments. Working around it requires the lazyeval package and the group_by_ function, which lets you use standard evaluation. So it will look like:
data %>% group_by_(lazyeval::interp(~var, var = as.name(c)))
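In more recent dplyr releases the underscore verbs are deprecated. One alternative sketch, assuming dplyr 1.0 or later (for across() and all_of()), passes the column name as a string. The variable col_name and the use of mtcars are for illustration only.

```r
library(dplyr)

# The grouping column is only known at run time, as a string
col_name <- "cyl"

res <- mtcars %>%
  group_by(across(all_of(col_name))) %>%  # group by the column named in col_name
  summarise(n = n())

print(res)
```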
I am trying to make a simple pivot table in R using the dplyr or reshape2 packages, as my dataset is too large and R runs out of memory with sqldf. The two columns of my dataset that I want to build the pivot table from are "Product" and "Cust_Id". I want to count the number of customers per product. This is what I have so far.
library(reshape2)
mydata<-read.table("Book1.txt",header=TRUE,fill=TRUE)
mydata.m<-melt(mydata,id=c("Product"),measured=c(Cust_Id))
mydata.d<-dcast(mydata.m,Product~variable,count)
It returns
Error in UseMethod("group_by_"):
no applicable method for 'group_by_' applied to an object of class "c('integer','numeric')"
I have also tried dplyr with the code below (not sure about the last step, though, as I did it on another laptop)
library(dplyr)
mydata.df<-tbl_df(mydata)
summarize(mydata.df,Product,Cust_Id=n())
I got no error message, but a lot of values seem to be missing in the output.
I really appreciate your input. Thanks in advance.
Try this:
library(dplyr)
mydata <- mydata %>%
group_by(Product) %>%
summarise(nCustomers = n())
Alternatively, if you only want to count unique customers, you can do:
library(dplyr)
mydata <- mydata %>%
group_by(Product) %>%
summarise(nCustomers = n_distinct(Cust_Id))
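To make the difference between n() and n_distinct() concrete, here is a small sketch on made-up data. The values in mydata are hypothetical; only the column names Product and Cust_Id come from the question.

```r
library(dplyr)

# Hypothetical data: customer 1 bought product A twice, customer 3 bought B twice
mydata <- data.frame(
  Product = c("A", "A", "A", "B", "B"),
  Cust_Id = c(1, 1, 2, 3, 3)
)

res <- mydata %>%
  group_by(Product) %>%
  summarise(
    nRows      = n(),                 # one count per row, repeat customers included
    nCustomers = n_distinct(Cust_Id)  # each customer counted once per product
  )

print(res)
```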
If this really is a big data set, then your best option is the data.table package:
require(data.table)
mydata_data_table = data.table(mydata)
number_customer = mydata_data_table[, .(number_customers = .N), by=Product]
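Note that .N counts rows. If each customer should be counted only once per product, data.table's uniqueN() can be used instead. A sketch on made-up data, where the values and the names dt and res_dt are hypothetical:

```r
library(data.table)

# Hypothetical data with repeat purchases
mydata <- data.frame(
  Product = c("A", "A", "A", "B", "B"),
  Cust_Id = c(1, 1, 2, 3, 3)
)

dt <- as.data.table(mydata)

# .N counts rows per product; uniqueN() counts distinct customers per product
res_dt <- dt[, .(number_rows = .N, number_customers = uniqueN(Cust_Id)), by = Product]

print(res_dt)
```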