I'm trying to split a column into tokens using the tokenizers package, but I keep receiving the error: could not find function "unnest_tokens". I am using R 3.5.3 and have installed and reinstalled dplyr, tidytext, tidyverse, tokenizers, and tidyr, but I still get the error.
I have also quit and restarted R and RStudio.
comments_tidy <- comments %>%
unnest_tokens(word, txt) %>% #Break the comments into individual words
filter(!word %in% undesirable_words) %>% #Remove undesirables
anti_join(stop_words) #Data provided by the tidytext package
I receive the following:
Error in unnest_tokens(., word, txt) :
could not find function "unnest_tokens"
As mentioned in the comments, you should add library() calls for the packages you use at the top of your code. In addition, make sure that all the packages and their dependencies are installed. The following snippet checks whether a given package (in this case dplyr) is installed and installs it if needed.
if ("dplyr" %in% installed.packages()[, "Package"]){
cat("'dplyr' is installed.")
} else {
install.packages("dplyr", dependencies = TRUE)
}
library(dplyr)
The command installed.packages()[, "Package"] gives you a vector of all installed packages, which is a handy trick for debugging all kinds of could not find function "foo" errors.
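When debugging this kind of error, it can help to check three things separately: whether the package is installed, whether it is attached in the current session, and whether the function is visible. Below is a minimal sketch of that check; diagnose_function is a hypothetical helper name (not part of any package), and stats/median are used only because they ship with R.

```r
# Hypothetical helper: distinguishes "not installed" from "installed but not attached".
diagnose_function <- function(fn, pkg) {
  list(
    # TRUE if the package can be found in a library (this loads its namespace,
    # but does not attach it to the search path)
    installed = requireNamespace(pkg, quietly = TRUE),
    # TRUE if library(pkg) has been called in this session
    attached = paste0("package:", pkg) %in% search(),
    # TRUE if a function with this name can be found from the current environment
    visible = exists(fn, mode = "function")
  )
}

# Example with a package that is attached by default
str(diagnose_function("median", "stats"))
```

For the question above, the call would be diagnose_function("unnest_tokens", "tidytext"). If installed is TRUE but attached and visible are FALSE, the package is there and only the library(tidytext) call is missing.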
Just be sure to run this first
install.packages("tidytext")
library(tidytext)
You only need to run the first line once - it will install the package.
You need to run the second line in each new R session, since that will load the package.
Calling library(tidytext) and using tidytext::unnest_tokens() solved the problem for me.
I'm running an example in R, going through the steps, and everything is working so far, except that this code produces an error:
words <- dtm %>%
as.matrix %>%
colnames %>%
(function(x) x[nchar(x) < 20])
Error: could not find function "%>%"
I don't understand what the benefit of using this special operator
%>% is, and any feedback would be great.
You need to load a package (like magrittr or dplyr) that defines the function first, then it should work.
install.packages("magrittr") # package installations are only needed the first time you use it
install.packages("dplyr") # alternative installation of the %>%
library(magrittr) # needs to be run every time you start R and want to use %>%
library(dplyr) # alternatively, this also loads %>%
The pipe operator %>% was introduced to "decrease development time and to improve readability and maintainability of code."
But everyone has to decide for themselves whether it really fits their workflow and makes things easier.
For more information on magrittr, click here.
Not using the pipe %>%, this code would return the same as your code:
words <- colnames(as.matrix(dtm))
words <- words[nchar(words) < 20]
words
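To see why the error says "could not find function", it may help to note that an operator like %>% is just an ordinary function that some package has to define. The sketch below defines a deliberately minimal stand-in called %then% (a hypothetical name, and far simpler than the real magrittr pipe) and checks that the piped and nested forms agree. The dtm here is toy data, not the asker's document-term matrix.

```r
# Toy stand-in for a document-term matrix (hypothetical data)
dtm <- matrix(
  0, nrow = 2, ncol = 3,
  dimnames = list(NULL, c("cat", "dog", "pneumonoultramicroscopicsilicovolcanoconiosis"))
)

# Hypothetical minimal pipe: apply the right-hand function to the left-hand value
`%then%` <- function(lhs, rhs) rhs(lhs)

# Piped form, mirroring the structure of the code in the question
piped <- dtm %then%
  as.matrix %then%
  colnames %then%
  (function(x) x[nchar(x) < 20])

# Nested form without any pipe
nested <- colnames(as.matrix(dtm))
nested <- nested[nchar(nested) < 20]

identical(piped, nested)
```

Until %then% is defined, R would report could not find function "%then%" in exactly the same way as it does for %>% when neither magrittr nor dplyr is loaded.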
EDIT:
(I am extending my answer due to a very useful comment that was made by @Molx)
Despite being from magrittr, the pipe operator is more commonly used
with the package dplyr (which imports magrittr and re-exports %>%), so
whenever you see someone using %>%, check whether you should load dplyr
instead.
On Windows: if you use %>% inside a %dopar% loop, you have to tell foreach to load dplyr (or magrittr, which dplyr imports) on the workers through the .packages argument.
Example:
plots <- foreach(myInput=iterators::iter(plotCount), .packages=c("RODBC", "dplyr")) %dopar%
{
return(getPlot(myInput))
}
If you omit the .packages argument and use %do% instead, so that everything runs in a single process, it works fine. With %do% the loop runs in the current R session, where the packages are already loaded, so nothing has to be loaded again on separate workers.
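The underlying reason can be illustrated without foreach. The sketch below uses the base parallel package instead (a different tool from the foreach/%dopar% setup above, chosen only because it ships with R) to show that a package attached in the main session is not automatically attached in worker processes. It assumes the machine can start two local worker processes; splines is used because it ships with R but is not attached by default.

```r
library(parallel)

# Start two worker processes; each one is a fresh R session
cl <- makeCluster(2)

# Attach a package in the main session only
library(splines)

# Ask each worker whether splines is on its search path
before <- unlist(clusterEvalQ(cl, "package:splines" %in% search()))

# Attach the package on every worker, then ask again
invisible(clusterEvalQ(cl, library(splines)))
after <- unlist(clusterEvalQ(cl, "package:splines" %in% search()))

stopCluster(cl)

print(before)
print(after)
```

The .packages argument of foreach plays the role of the clusterEvalQ(cl, library(...)) step here: it makes sure each worker loads the listed packages before running the loop body.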
One needs to install magrittr as follows
install.packages("magrittr")
Then, in one's script, don't forget to add on top
library(magrittr)
For the meaning of the operator %>% you might want to consider this question: What does %>% function mean in R?
Note that the same operator would also work with the library dplyr, as it imports from magrittr.
dplyr used to have a similar operator (%.%), which is now deprecated. Here we can read about the differences between %.% (deprecated operator from the library dplyr) and %>% (operator from magrittr, that is also available in dplyr)
The %>% pipe operator is not part of base R (base R only gained its own native pipe, |>, in R 4.1.0). To use %>% you need to load one of the following packages: dplyr, tidyverse or magrittr
If you stumbled upon this while calculating powers of matrices, install and load the expm package (dplyr alone is not enough):
library(expm)
I'm maintaining an openCPU (R API) instance with frequent package updates. OpenCPU (sensibly) separates its own core packages into different folders, so that they aren't accidentally broken out of the lockstep with the installed version.
However, this can lead to duplicated packages being installed in the user folder which, then again, leads to errors when the openCPU API tries to unload and reattach the package to get the newer version. I frequently cause these sorts of problems when trying to update packages.
I usually check for them using this snippet.
ocpubasics <- rownames(installed.packages(lib.loc ="/usr/lib/opencpu/library"))
userpkgs <- rownames(installed.packages(lib.loc="/usr/local/lib/R/site-library"))
(dupe_pkgs <- userpkgs[ userpkgs %in% ocpubasics])
remove.packages(dupe_pkgs, lib="/usr/local/lib/R/site-library")
However, this doesn't catch all cases (because there are five library paths) and it also removes duplicates whose versions match (which don't really hurt and are sometimes necessary for a package to be installed). So, I'm wondering if someone has written a function which, given a vector of library paths, checks whether any package has a mismatched version installed in a different library path.
I ended up writing the following code; maybe it's useful to others.
An old package could hide in any library
.libPaths(c( "/usr/local/lib/opencpu/site-library",
"/usr/local/lib/R/site-library",
"/usr/lib/R/site-library",
"/usr/lib/R/library",
"/usr/lib/opencpu/library" ))
Get a list of duplicated packages
library(tidyverse)
pkgs <- installed.packages()
pkgs <- as.data.frame(pkgs)
dupes <- pkgs %>% select(Package, Version, LibPath) %>%
group_by(Package) %>%
filter(n_distinct(Version, na.rm = TRUE) > 1)
Check which version is installed in which library
dupes %>%
spread(LibPath, Version) %>%
knitr::kable()
Remove any duplicates with older versions
dupes %>%
group_by(Package) %>%
# Caution: Version is compared as text here, so "1.10" sorts before "1.9"
arrange(desc(Version)) %>%
filter(Version != first(Version)) %>%
purrr::pmap(~ remove.packages(..1, ..3))
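Since version strings do not sort correctly as plain text, a variant that compares them with package_version() may be safer. Below is a minimal base R sketch under that assumption; find_version_mismatches is a hypothetical helper name, and the toy data frame stands in for the output of installed.packages(). It only reports mismatches and does not remove anything.

```r
# Hypothetical helper: given a data frame with character columns Package,
# Version and LibPath, return the rows of packages installed in more than
# one version, flagging which row holds the newest version.
find_version_mismatches <- function(pkgs) {
  n_versions <- tapply(pkgs$Version, pkgs$Package, function(x) length(unique(x)))
  mismatched <- names(n_versions)[n_versions > 1]
  out <- pkgs[pkgs$Package %in% mismatched, , drop = FALSE]
  out$Newest <- vapply(seq_len(nrow(out)), function(i) {
    same <- out$Package == out$Package[i]
    # package_version() compares component by component, so 1.10.0 > 1.9.0
    package_version(out$Version[i]) == max(package_version(out$Version[same]))
  }, logical(1))
  out
}

# Toy data (hypothetical library paths and versions)
toy <- data.frame(
  Package = c("jsonlite", "jsonlite", "curl"),
  Version = c("1.9.0", "1.10.0", "4.3"),
  LibPath = c("/lib/a", "/lib/b", "/lib/a"),
  stringsAsFactors = FALSE
)

res <- find_version_mismatches(toy)
print(res)
```

On a real installation, the input could be built with as.data.frame(installed.packages(lib.loc = .libPaths()), stringsAsFactors = FALSE), and the rows where Newest is FALSE would be the candidates for remove.packages().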
I recently downloaded googlesheets via
devtools::install_github("jennybc/googlesheets")
and am experiencing some difficulties. When running the script shown at
https://github.com/jennybc/googlesheets, I always get:
Error: could not find function "%>%"
How can I solve that problem?
Reproducible example:
Download:
devtools::install_github("jennybc/googlesheets")
require(googlesheets)
Data:
gap_key <- "1HT5B8SgkKqHdqHJmn5xiuaC04Ngb7dG9Tv94004vezA"
copy_ss(key = gap_key, to = "Gapminder")
gap <- register_ss("Gapminder")
Error occurs:
oceania_csv <- gap %>% get_via_csv(ws = "Oceania")
Load the dplyr package first, which provides the %>% operator. This is noted here in the README you link to (suppressMessages is optional):
googlesheets is designed for use with the %>% pipe operator and, to a lesser extent, the data-wrangling mentality of dplyr. The examples here use both, but we'll soon develop a vignette that shows usage with plain vanilla R. googlesheets uses dplyr internally but does not require the user to do so.
library("googlesheets")
suppressMessages(library("dplyr"))
You can install dplyr with
install.packages("dplyr")
See here for more about the pipe operator (%>%).
My question is about packages that share the same function name. How can I tell R which package's version of the function I want to use?
I tried loading the package I wanted to use again in the code, but it still did not work. My case is select, which exists in both MASS and dplyr. I want to use the dplyr version, but I always get the error unused argument...
You can use the :: operator:
iris %>%
head(n = 3) %>%
dplyr::select(Sepal.Length)
See here for details.
Or detach MASS, as described in this post.
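The masking behaviour itself can be reproduced without MASS or dplyr. The sketch below is only an illustration: it deliberately defines a function named median in the current environment so that it masks stats::median, then shows that the :: form still reaches the package version. The masking function is hypothetical and exists only for this demonstration.

```r
# Define a function that masks stats::median (for illustration only)
median <- function(x) "masked"

# A plain call finds the masking definition first
masked_result <- median(c(1, 5, 9))

# The :: operator names the package explicitly and bypasses the mask
explicit_result <- stats::median(c(1, 5, 9))

# Check which namespace the explicitly qualified function belongs to
owner <- environmentName(environment(stats::median))

print(masked_result)
print(explicit_result)
print(owner)

# Remove the masking definition again
rm(median)
```

The same reasoning applies to select: whichever of MASS and dplyr was attached last wins for a plain select() call, while dplyr::select() always refers to the dplyr version.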