SparkR distinct (on databricks) - r

I am new to SparkR, so please forgive if my question is very basic.
I work on databricks and try to get all unique dates of a column of a SparkDataFrame.
When I run:
uniquedays <- SparkR::distinct(df$datadate)
I get the error message:
unable to find an inherited method for function ‘distinct’ for signature ‘"Column"’
On Stack Overflow, I found out that this usually means
(If I run isS4(df), it returns TRUE):
That is the type of message you will get when attempting to apply an S4 generic function to an object of a class for which no defined S4 method exists
I also tried to run
uniquedays <- SparkR::unique(df$datadate)
where I get the error message:
unique() applies only to vectors
It feels like, I am missing something basic here.
Thank you for your help!

Try this:
library(magrittr)
uniquedays <- SparkR::select(df, df$datadate) %>% SparkR::distinct()

Related

Error: object 'skim_without_charts' not found [duplicate]

I got the error message:
Error: object 'x' not found
Or a more complex version like
Error in mean(x) :
error in evaluating the argument 'x' in selecting a method for function 'mean': Error: object 'x' not found
What does this mean?
The error means that R could not find the variable mentioned in the error message.
The easiest way to reproduce the error is to type the name of a variable that doesn't exist. (If you've defined x already, use a different variable name.)
x
## Error: object 'x' not found
The more complex version of the error has the same cause: calling a function when x does not exist.
mean(x)
## Error in mean(x) :
## error in evaluating the argument 'x' in selecting a method for function 'mean': Error: object 'x' not found
Once the variable has been defined, the error will not occur.
x <- 1:5
x
## [1] 1 2 3 4 5
mean(x)
## [1] 3
You can check to see if a variable exists using ls or exists.
ls() # lists all the variables that have been defined
exists("x") # returns TRUE or FALSE, depending upon whether x has been defined.
Errors like this can occur when you are using non-standard evaluation. For example, when using subset, the error will occur if a column name is not present in the data frame to subset.
d <- data.frame(a = rnorm(5))
subset(d, b > 0)
## Error in eval(expr, envir, enclos) : object 'b' not found
The error can also occur if you use custom evaluation.
get("var", "package:stats") #returns the var function
get("var", "package:utils")
## Error in get("var", "package:utils") : object 'var' not found
In the second case, the var function cannot be found when R looks in the utils package's environment because utils is further down the search list than stats.
In more advanced use cases, you may wish to read:
The Scope section of the CRAN manual Intro to R and demo(scoping)
The Non-standard evaluation chapter of Advanced R
While executing multiple lines of code in R, you need to first select all the lines of code and then click on "Run".
This error usually comes up when we don't select our statements and click on "Run".
Let's discuss why an "object not found" error can be thrown in R in addition to explaining what it means. What it means (to many) is obvious: the variable in question, at least according to the R interpreter, has not yet been defined, but if you see your object in your code there can be multiple reasons for why this is happening:
check syntax of your declarations. If you mis-typed even one letter or used upper case instead of lower case in a later calling statement, then it won't match your original declaration and this error will occur.
Are you getting this error in a notebook or markdown document? You may simply need to re-run an earlier cell that has your declarations before running the current cell where you are calling the variable.
Are you trying to knit your R document and the variable works find when you run the cells but not when you knit the cells? If so - then you want to examine the snippet I am providing below for a possible side effect that triggers this error:
{r sourceDataProb1, echo=F, eval=F}
# some code here
The above snippet is from the beginning of an R markdown cell. If eval and echo are both set to False this can trigger an error when you try to knit the document. To clarify. I had a use case where I had left these flags as False because I thought i did not want my code echoed or its results to show in the markdown HTML I was generating. But since the variable was then used in later cells, this caused an error during knitting. Simple trial and error with T/F TRUE/FALSE flags can establish if this is the source of your error when it occurs in knitting an R markdown document from RStudio.
Lastly: did you remove the variable or clear it from memory after declaring it?
rm() removes the variable
hitting the broom icon in the evironment window of RStudio clearls everything in the current working environment
ls() can help you see what is active right now to look for a missing declaration.
exists("x") - as mentioned by another poster, can help you test a specific value in an environment with a very lengthy list of active variables
I had a similar problem with R-studio. When I tried to do my plots, this message was showing up.
Eventually I realised that the reason behind this was that my "window" for the plots was too small, and I had to make it bigger to "fit" all the plots inside!
Hope to help
I'm going to add this on here even though it's not a new question as it comes quite highly in the search results for the error:
As mentioned above, re checking syntax, if you're using dplyr, make sure you have all the %>% pipes at the end of the lines above the error, otherwise the contents of anything like a select statement won't pass down into the next part of the code block.

Running a MCMC analysis with a new tree I made using BM, anyone know what this error would mean?

tree_mvBM <- read.nexus("C:/Users/Zach/Desktop/tree_mvBM.tre")
View(tree_mvBM)
dat <- data$Tp; names(dat) <- rownames(data)
Error in data$Tp : object of type 'closure' is not subsettable
You're trying to refer to an object called data in your global workspace, presumably a data frame. The object doesn't exist (you forgot to read it in,or you called it something else, or ... ?), so R is instead finding the built-in function data. It is trying to "subset" it (i.e. $Tp tells R to extract the element named "Tp"), which is not possible because you can't extract an element of a function. (Functions are called "closures" in R for technical reasons.)
This is one reason (probably the main reason) that you shouldn't give your variables names that match the names of built-in R objects (like I, t, c, data, df, ...). If you had called your data my_data instead the error message would be
Error: object 'my_data' not found
which might be easier to understand.
This is such a common error that there are jokes about it (image search the error message):

Error in UseMethod("select_") in Blogdown

I am using Blogdown to create a new post and I am getting the following error when trying to preview.
The code works well in my Rmarkdown file but I cannot update it to my blog. Do anyone know where the problem is?
Quitting from lines 36-47
Error in UseMethod("select_") :
no applicable method for 'select_' applied to an object of class "function"
Calls: local ... freduce -> -> select -> select.default -> select_
Execution halted
Here is my code in lines 36-47;
library(corrplot)
library(RColorBrewer)
library(tidyverse)
corrplot(cor(df %>% select(Sales, Customers, Store,
Open, SchoolHoliday,
DayOfWeek, month, year,
CompetitionDistance,
Promo, Promo2_active) %>%
filter(!is.na(Sales), !is.na(CompetitionDistance))),
type="upper", order="original",
col=brewer.pal(n=8, name="RdYlBu"))
Thanks a lot.
I think you're getting this error because you don't have an object called df in your global environment. Either your data frame hasn't been created yet or it is called something else. There is a little-known function called df in the stats package, which is on the search path when you start an R session. You can check this by starting a new R session and typing df into the console. You will see the body of the function stats::df.
You are therefore getting the error because you are trying to subset a function, not a data frame. To resolve the error, make sure you create a data frame called df before your call to corrplot

Error : could not find function "ImportMethodFrom"

I tried to run the code in Chapter 7 Data mining with R learning with case study book but I got an error in following line:
rankWorkflows(svm, maxs = TRUE)
The error was:
Error in as.character.default(X[[i]], ...) : no method for coercing
this S4 class to a vector
Then I searched on the internet and found following solution:
importMethodsFrom(GenomicRanges, as.data.frame)
and again again I got a new error:
Error: could not find function "importMethodFrom"
I searched a lot but I got nothing :(
You can try using library(sos) to find the packages where your function is located.
library(sos)
findFn("replaceherewithyourfunction")
Based on the answer of #Bea, there does not seem to be a importMethodsFrom anywhere in R. My guess is you found the call in a NAMESPACE file. Those files have different syntax than normal R scripts.
If you want to load a specific function from an R package (rather than all functions from a package), you can use libraryname::functionname instad of functionname in your code. In your case, replace as.data.frame with GenomicRanges::as.data.frame
If this does not work (for example because you don't have as.data.frame anywhere in your code), you can also load the whole GenomicRanges library with library(GenomicRanges)

R 'object XX not found' error thrown inside function, but not in script

I am fairly new to R, so my apologies if this question is a bit silly.
I am calling a function in an external package ('mmlcr', although I don't think that is directly relevant to my problem), and one of the required inputs (data) is a data.frame. I compose the data.frame from various data using the following approach (simplified for illustration):
#id, Time, and value are vectors created elsewhere in the code.
myData = data.frame(a=id, b=Time, c=value)
out <- mmlcr( input1, input2, data=myData, input4)
Which throws the error:
Error in is.data.frame(data) : object 'myData' not found
The debugger indicates that this error is thrown during the mmlcr() call.
I then added a print(ls()) immediately prior to the mmlcr() call, and the output confirmed that "myData" was in my function workspace; further is.data.frame(myData) returned TRUE. So it seems that "myData" is successfully being created, but for some reason it is not passing into the mmlcr() function properly. (Commenting this line causes no error to be thrown, so I'm pretty sure this is the problematic line).
However, when I put the exact same code in a script (i.e., not within a function block), no such error is thrown and the output is as expected. Thus, I assume there is some scoping issue that arises.
I have tried both assignment approaches:
myData = data.frame(a=id, b=Time, c=value)
myData <- data.frame(a=id, b=Time, c=value)
and both give me the same error. I admit that I don't fully understand the scope model in R (I've read about the differences between = and <- and I think I get it, but I'm not sure).
Any advice you can offer would be appreciated.
MMLCR is now deprecated and you should search for some alternatives. Without looking too much into it, I sleuthed through an old repo and found the culprit:
m <- eval(m, data)
in the function mmlcr.default. There are a lot of reasons why this is bad, but scoping is the big one. R has this issue with the subset.data.frame function, see my old SO question. Rather than modify the source code, I would find a way to do your function with a subroutine using a for, repeat, or while loop.

Resources