Getting this error from R Markdown when trying to export my .RMD
"Error in filter(Gastropods, Species == "Cellana") : object 'Species' not found Calls: <Anonymous> ... withCallingHandlers -> withVisible -> eval -> eval -> filter"
However, all my plots are coming out successfully. I can clearly see in the data that the species column is there and that Cellana is a species. No spelling errors or anything.
My first 20 or so lines of code are below
###
---
title: " Lab Report 2 - z5016113"
output: html_notebook
---
#1. Gastropod abundance vs. height on the shore
```{r}
Gastropods <- read.csv(file = "MaroubraZones.csv", header = TRUE)
library(ggplot2, dplyr)
```
```{r}
Gastropods$Zone <- factor(Gastropods$Zone, levels = c("Low", "Mid", "High"))
```
```{r}
Cellana <- filter(Gastropods, Species == "Cellana") # <- this line is causing the error
```
```{r}
ggplot(Cellana, aes(Zone, Abundance)) + geom_boxplot()
```
###
It looks like this might be a bigger issue with dplyr and filter. I found other posts suggesting they had the same problem, and the answer seemed to be to use dplyr::filter rather than just filter in the command.
It might also be worth testing this by filtering out the mollusc of interest before you convert it to a factor?
I have also had similar issues filtering items out and a restart of R fixed the issue.
dplyr::filter has not been found because you haven't loaded dplyr. Since there are other functions named filter in other packages, R tries to apply one of those instead (and fails).
From ?library:
library(package, [...])
[...]
package the name of a package, given as a name or literal
character string, or a character string, depending on whether
character.only is FALSE (default) or TRUE).
This means you can only load one package at a time. Here, you are trying to load both ggplot2 and dplyr in the same call. Only ggplot2 is loaded. The correct way to do this is:
library(dplyr)
library(ggplot2)
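A quick way to confirm this diagnosis is to ask R which filter() it would call. The sketch below uses only base R and a made-up `gastropods` data frame (not the poster's file), and assumes a fresh session in which dplyr has not been attached:

```r
# Hypothetical stand-in for the data read from MaroubraZones.csv
gastropods <- data.frame(
  Species   = c("Cellana", "Nerita", "Cellana"),
  Abundance = c(3, 5, 2)
)

# Name of the package whose filter() is found first.
# In a fresh session without dplyr attached this is "stats".
filter_source <- environmentName(environment(filter))
print(filter_source)

# stats::filter() evaluates its second argument as ordinary R code, so the
# bare column name Species is looked up as a variable and the call fails.
result <- tryCatch(
  stats::filter(gastropods, Species == "Cellana"),
  error = function(e) conditionMessage(e)
)
print(result)
```

If `filter_source` prints "dplyr" after the two separate library() calls, the original chunk should pick up the intended function.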
Related
How can I include inline R code that refers to a variable name that contains spaces or other unusual characters (actual use-case is Pr(>F))? Backticks are the solution in plain R script, but they don't seem to work when the code is inline in a markdown doc. Here's an example:
```{r}
df <- data.frame(mydata= 1:10, yourdata = 20:29)
names(df) <- c("your data", "my data")
```
The first five values of your data are `r df$`your data`[1:5]`
Which when knitted gives:
Quitting from lines 7-9 (test-main.Rmd)
Error in base::parse(text = code, srcfile = NULL) :
2:0: unexpected end of input
1: df$
^
Calls: <Anonymous> ... <Anonymous> -> withVisible -> eval -> parse_only -> <Anonymous>
Execution halted
Note that this is different from showing the backticks. All I want to do is have the code executed when the doc is knitted. My workaround is to assign the value of the odd-named variable to another object with a simple name in the chunk preceding the inline code. But I'm curious about how to directly call inline these objects with unusual names.
In this instance you can use normal quotes,
The first five values of your data are `r df$"your data"[1:5]`
or rather
The first five values of your data are `r df[["your data"]][1:5]`
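Both forms can be checked in a plain R session before knitting. This is a minimal sketch reusing the question's `df`; it only shows that the quoted `$` form and the `[[` form return the same values as the backtick form, so neither needs a backtick inside the inline code delimiters:

```r
df <- data.frame(mydata = 1:10, yourdata = 20:29)
names(df) <- c("your data", "my data")

with_backticks <- df$`your data`[1:5]      # fine in a script, breaks inline
with_quotes    <- df$"your data"[1:5]      # quoted name after $
with_brackets  <- df[["your data"]][1:5]   # list-style extraction

stopifnot(identical(with_backticks, with_quotes),
          identical(with_backticks, with_brackets))
print(with_brackets)
```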
I have noticed some strange behaviour in the R exams package when I load the dplyr library. The example below only works if I explicitly call the dplyr namespace, as indicated in the comments. Notice that the error only occurs in a fresh session, i.e. you need to restart R in order to see what I see. You need to place the code below in a file exam.Rmd, then call
library(exams)
library(dplyr)
exams2html("exam.Rmd") # in pwd
# this is exam.Rmd
```{r datagen,echo=FALSE,results='hide',warning=FALSE,message=FALSE}
df = data.frame(i = 1:4, y = 1:4, group = paste0("g",rep(1:2,2)))
# works:
b2 = diff(dplyr::filter(df,group!="g1")$y)
b3 = diff(dplyr::filter(df,group!="g2")$y)
# messes up the complete exercise:
# b2 = diff(filter(df,group!="g1")$y)
# b3 = diff(filter(df,group!="g2")$y)
nq = 2
questions <- solutions <- explanations <- rep(list(""), nq)
type <- rep(list("num"),nq)
questions[[1]] = "What is the value of $b_2$ rounded to 3 digits?"
questions[[2]] = "What is the value of $b_3$ rounded to 3 digits?"
solutions[[1]] = b2
solutions[[2]] = b3
explanations[[1]] = paste("You have to subtract the conditional mean of group 2 from the reference group 1. This gives:",b2)
explanations[[2]] = paste("You have to subtract the conditional mean of group 3 from the reference group 1. This gives:",b3)
```
Question
========
You are given the following dataset on two variables `y` and `group`.
```{r showdata,echo=FALSE}
# kable(df,row.names = FALSE,align = "c")
df
```
some text with math
$y_i = b_0 + b_2 g_{2,i} + b_3 g_{3,i} + e_i$
```{r questionlist, echo = FALSE, results = "asis"}
answerlist(unlist(questions), markup = "markdown")
```
Solution
========
```{r sollist, echo = FALSE, results = "asis"}
answerlist(unlist(explanations), markup = "markdown")
```
Meta-information
================
extype: cloze
exsolution: `r paste(solutions,collapse = "|")`
exclozetype: `r paste(type, collapse = "|")`
exname: Dummy Manual computation
extol: 0.001
Thanks for raising this issue and to @hrbrmstr for the explanation of one part of the problem. However, one part of the explanation is still missing:
Of course, the root of the problem is that both stats and dplyr export different filter() functions. And it can depend on various factors which function is found first.
In an interactive session it is sufficient to load the packages in the right order with stats being loaded automatically and dplyr subsequently. Hence this works:
library("knitr")
library("dplyr")
knit("exam.Rmd")
It took me a moment to figure out what is different when you do:
library("exams")
library("dplyr")
exams2html("exam.Rmd")
It turns out that in the latter code chunk knit() is called by exams2html(), and hence the NAMESPACE of the exams package changes the lookup order because it imports the entire stats package. Therefore, stats::filter() is found before dplyr::filter() unless the code is evaluated in an environment where dplyr was loaded, such as the .GlobalEnv. (For more details see the answer by @hrbrmstr.)
As there is no pressing reason for the exams package to import the entire stats package, I have changed the NAMESPACE to import only the required functions selectively (which does not include the filter() function). Please install the development version from R-Forge:
install.packages("exams", repos = "http://R-Forge.R-project.org")
And then your .Rmd can be compiled without dplyr::... just by including library("dplyr") - either within the .Rmd or before calling exams2html(). Both should work now as expected.
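The lookup order described above can be mimicked with plain environments. In this sketch the environment names (`search_path`, `pkg_imports`, `chunk_env`) and the stand-in function are invented for illustration; they are not objects from exams or knitr:

```r
# Stand-in for the attached packages, where dplyr::filter would live
search_path <- new.env(parent = baseenv())
assign("filter", function(...) "dplyr-like filter", envir = search_path)

# Stand-in for a package's imports: importing all of stats brings
# stats::filter() along, and imports are searched before attached packages
pkg_imports <- new.env(parent = search_path)
assign("filter", stats::filter, envir = pkg_imports)

# Code evaluated from inside the package resolves names via its imports
chunk_env <- new.env(parent = pkg_imports)

found_in_package <- get("filter", envir = chunk_env)
found_on_search  <- get("filter", envir = search_path)

print(identical(found_in_package, stats::filter))  # TRUE
print(found_on_search())
```

Importing only the functions a package needs, as the fix does, corresponds to leaving `filter` out of `pkg_imports`, so the lookup falls through to the attached packages.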
Using your exams.Rmd, this is the source pane where I'm about to hit cmd-enter:
(I added quiet=FALSE so I could see what was going on).
Here's the console output after cmd-enter:
And here's the output:
If you read all the way through to the help on knit:
envir: Environment in which code chunks are to be evaluated, for example, parent.frame(), new.env(), or globalenv()).
So parent.frame() or globalenv() is required vs what you did (you don't seem to fully understand environments). You get TRUE from your exists() call because by default inherits is TRUE in the exists function, and that tells the function to "[search] the enclosing frames of the environment" (from the help on exists).
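The effect of `inherits` is easy to see in isolation. A minimal sketch, with made-up object and environment names:

```r
outer_env <- new.env()
inner_env <- new.env(parent = outer_env)
assign("my_object", 42, envir = outer_env)

# Default inherits = TRUE: enclosing environments are searched as well
found_with_inherits <- exists("my_object", envir = inner_env)

# inherits = FALSE: only inner_env itself is checked
found_without_inherits <- exists("my_object", envir = inner_env,
                                 inherits = FALSE)

print(c(found_with_inherits, found_without_inherits))  # TRUE FALSE
```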
And, you should care deeply about source code and triaging errors. You're using a programming language and open source software and you are right that the library(dplyr) didn't work inside the Rmd due to some terrible code choices in this "great" package and that you don't want pointed out since you don't want to look at source code.
End, as I can do no more for you. I just hope others benefit from this.
I'm trying to read two data frames into a comparative object so I can plot them using pgls.
I'm not sure what the error being returned means, and how to go about getting rid of it.
My code:
library(ape)
library(geiger)
library(caper)
taxatree <- read.nexus("taxonomyforzeldospecies.nex")
LWEVIYRcombodata <- read.csv("LWEVIYR.csv")
LWEVIYRcombodataPGLS <-data.frame(LWEVIYRcombodata$Sum.of.percentage,OGT=LWEVIYRcombodata$OGT, Species=LWEVIYRcombodata$Species)
comp.dat <- comparative.data(taxatree, LWEVIYRcombodataPGLS, "Species")
Returns error:
> comp.dat <- comparative.data(taxatree, LWEVIYRcombodataPGLS, 'Species')
Error in if (tabulate(phy$edge[, 1])[ntips + 1] > 2) FALSE else TRUE :
missing value where TRUE/FALSE needed
This might come from your data set and your phylogeny having some discrepancies that comparative.data struggles to handle (by the look of the error message).
You can try cleaning both the data set and the tree using dispRity::clean.data:
library(dispRity)
## Reading the data
taxatree <- read.nexus("taxonomyforzeldospecies.nex")
LWEVIYRcombodata <- read.csv("LWEVIYR.csv")
LWEVIYRcombodataPGLS <- data.frame(LWEVIYRcombodata$Sum.of.percentage,OGT=LWEVIYRcombodata$OGT, Species=LWEVIYRcombodata$Species)
## Cleaning the data
cleaned_data <- clean.data(LWEVIYRcombodataPGLS, taxatree)
## Preparing the comparative data object
comp.dat <- comparative.data(cleaned_data$tree, cleaned_data$data, "Species")
However, as @MrFlick suggests, it's hard to know if that solves the problem without a reproducible example.
The cause turned out to be that I was using a nexus file. ?comparative.data does not specify which kind of phylo object it expects, but trees read from newick files seem to work fine, whereas trees read from nexus files do not.
Given a fresh session,
executing a small ggparcoord(.) example provided in the documentation of the function
library(GGally)
data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = diamonds.samp, columns = c(1, 5:10))
results in the following plot:
Again, starting in a fresh session and executing the same script with the loaded dplyr
library(GGally)
library(dplyr)
data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = diamonds.samp, columns = c(1, 5:10))
results in:
Error: (list) object cannot be coerced to type 'double'
Note that the order of the library(.) statements does not matter.
Questions
Is there something wrong with the code samples?
Is there a way to overcome the problem (over some namespace functions)?
Or is this a bug?
I need both dplyr and ggparcoord(.) in a bigger analysis, but this minimal example reflects the problem I am facing.
Versions
R # 3.2.3
dplyr # 0.4.3
GGally # 1.0.1
ggplot # 2.0.0
UPDATE
To wrap the excellent answer given by Joran up:
Answers
The code samples are in fact wrong as ggparcoord(.) expects a data.frame not a tbl_df as given by the diamonds data set (if dplyr is loaded).
The problem is solved by coercing the tbl_df to a data.frame.
No, it is not a bug.
Working code sample:
library(GGally)
library(dplyr)
data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = as.data.frame(diamonds.samp), columns = c(1, 5:10))
Converting my comments to an answer...
The GGally package here is making the reasonable assumption that using [ on a data frame should behave the way it always does and always has. However, this all being in the Hadley-verse, the diamonds data set is a tbl_df as well as a data.frame.
When dplyr is loaded, the behavior of [ is overridden such that drop = FALSE is always the default for a tbl_df. So there's a place in GGally where data[,"cut"] is expected to return a vector, but instead it returns another data frame.
...specifically, the error is thrown in your example while attempting to execute:
data[, fact.var] <- as.numeric(data[, fact.var]).
Since data[,fact.var] remains a data frame, and hence a list, as.numeric won't work.
As for your conclusion that this isn't a bug, I'd say....maybe. Probably. At least there probably isn't anything the GGally package author ought to do to address it. You just have to be aware that using tbl_df's with non-Hadley written packages may break things.
As you noted, removing the extra class attributes fixes the problem, as it returns R to using the normal [ method.
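The difference can be reproduced in base R by passing drop = FALSE explicitly, which is what a tbl_df does by default according to the explanation above. The data frame here is a made-up miniature, not the diamonds data:

```r
toy <- data.frame(cut   = factor(c("Good", "Ideal", "Good")),
                  carat = c(0.3, 0.5, 0.4))

# Plain data.frame: a single column selected with [ is dropped to a vector,
# so as.numeric() returns the factor codes
as_vector <- toy[, "cut"]
print(as.numeric(as_vector))

# With drop = FALSE the result stays a data frame (a list), and
# as.numeric() cannot coerce it
as_frame <- toy[, "cut", drop = FALSE]
coercion <- tryCatch(as.numeric(as_frame),
                     error = function(e) conditionMessage(e))
print(coercion)
```

Wrapping the input in as.data.frame(), as in the working code sample, restores the first behaviour.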
Workaround: coerce your data for ggparcoord to as.data.table(...) or as.data.table(... , keep.rownames=TRUE) unless you want to lose all your rownames.
Cause: as per @joran's investigation, when dplyr is loaded, tbl_df overrides [ so that drop = FALSE.
Solution: file a pull-request on GGally.
edit: fixed in v1.3.0 (https://github.com/ggobi/ggally/commit/bfa930d102289d723de2ce9ec528baf42b3b7b40)
I am putting together a small tutorial using R Markdown. I want to include the following:
This is how we create a vector in R:
```{r}
x <- c(1,2,3,4,5)
x
```
We need to use the c() operator to do this; if we don't we get an error as shown below:
```{r}
x <- (1,2,3,4,5)
```
I want to show the error message that R would normally give if I tried to create a vector without the c(), namely
"Error: unexpected ',' in "x <- (1,"
However, when I knit the markdown it stops at the line containing the error. So, how do I deliberately include a line with a mistake in it, in order to demonstrate the error?
Thanks.
Try
```{r, error = TRUE}
x <- (1, 2, 3, 4, 5)
```
In R Markdown the error chunk option defaults to FALSE, so knitting stops at the first error (knitr's own default is TRUE). You'll either need to set error = TRUE in each chunk where you want it or call knitr::opts_chunk$set(error = TRUE) at the start of your document.
I'm not sure Benjamin's answer will work. Doesn't work for me at least - because the error is a syntax error.
I have two imperfect solutions to this problem. You can 'hack' something that looks right by not evaluating the code with the syntax error, then having a chunk underneath which is evaluated and just shows the error message.
```{r, eval = FALSE}
x <- (1,2,3,4,5)
```
```{r, echo = FALSE}
cat("Error: unexpected \',\' in \'try(x <- (1,\'")
```
Or you can run the code in a different engine. However, it also gives a message saying execution halted that I can't work out how to remove.
```{r, engine='Rscript', error=TRUE}
x <- (1, 2, 3, 4, 5)
```
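For the first workaround, the message does not have to be typed by hand. As a sketch (not part of either answer), base R's parse() can be run on the faulty line as text and the resulting message captured, then printed from a chunk with echo = FALSE:

```r
bad_code <- "x <- (1,2,3,4,5)"

# parse() reports the syntax error at run time, where tryCatch() can catch it
parse_message <- tryCatch(
  {
    parse(text = bad_code)
    "no syntax error"
  },
  error = function(e) conditionMessage(e)
)

cat("Error:", parse_message, "\n")
```

The captured text starts with a `<text>:line:column:` prefix rather than matching the console message word for word, so it may need trimming before display.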