I am trying to execute Random Forest algorithm on SparkR, with Spark 1.5.1 installed. I am not getting clear idea, why i am getting the error -
Error: could not find function "includePackage"
Further even if I use mapPartitions function in my code , i get the error saying -
Error: could not find function "mapPartitions"
Please find the below code:
rdd <- SparkR:::textFile(sc, "http://localhost:50070/explorer.html#/Datasets/Datasets/iris.csv",5)
includePackage(sc,randomForest)
rf <- mapPartitions(rdd, function(input) {
## my function code for RF
}
This is more of a comment and a cross question rather than an answer (not allowed to comment because of the reputation) but just to take this further, if we are using the collect method to convert the rdd back to an R dataframe, isnt that counter productive as if the data is too large, it would take too long to execute in R.
Also does it mean that we could possibly use any R package say, markovChain or a neuralnet using the same methodology.
Kindly check the functions that can be in used in sparkR http://spark.apache.org/docs/latest/api/R/index.html
This doesn't include function mapPartitions() or includePackage()
#For reading csv in sparkR
sparkRdf <- read.df(sqlContext, "./nycflights13.csv",
"com.databricks.spark.csv", header="true")
#Possible way to use `randomForest` is to convert the `sparkR` data frame to `R` data frame
Rdf <- collect(sparkRdf)
#compute as usual in `R` code
>install.packages("randomForest")
>library(rainForest)
......
#convert back to sparkRdf
sparkRdf <- createDataFrame(sqlContext, Rdf)
Related
I'm using R Studio based on R 3.4.3. However, when I tried to call the forecast.HoltWinters function, R told me that "could not find function "forecast.HoltWinters"". Inspect the installed package (v8.2) told me that it's true, there is no forecast.HoltWinters. But the manual in https://cran.r-project.org/web/packages/forecast/ clearly stated that forecast.HoltWinters is still available.
I have also tried stats::HoldWinters, but it's working wrong. The code run fine on another computer, but it couldn't run at all on mine. Is there any solution?
Here is the code. Book2.csv has enough data to last more than 3 periods.
dltt <- read.csv("book2.csv", header = TRUE)
dltt.ts <- ts(dltt$Total, frequency=12, start=c(2014,4))
dltt.ts.hw <- HoltWinters(dltt.ts)
library(forecast)
dltt.ts.hw.fc <- forecast.HoltWinters(dltt.ts.hw) //Error as soon as I run this line
Fit a HoltWinters model using the HoltWinters function and then use forecast. Its all in the help for HoltWinters and forecast, namely "The function invokes particular _methods_ which depend on the class of the first argument". I'll copy the guts of it here:
m <- HoltWinters(co2)
forecast(m)
Note this will call the non-exported forecast.HoltWinters function, which you should never call directly using triple-colon notation as some may suggest.
Goal: mas5 normalize data.
Problem: when I try the following R code, I get this
error: unable to find an inherited method for function bg.correct for signature ExpressionFeatureSet, character
I have looked on SO, and found the following: What does this mean: unable to find an inherited method for function ‘A’ for signature ‘"B"’, but I am not exactly sure how to fix my specific problem and use the mas5 function properly. I have also looked at this affy manual but still stuck...
installpkg("affy")
library('affy')
setwd("/Users/er/Desktop/DesktopFolders/DataSets/CD8Helios/Microarray/CELfiles/CEL")
cel_Files <- list.celfiles()
affyRaw <- read.celfiles(cel_Files)
eset <- mas5(affyRaw)
If you are sure that the .cel files were created based on experiments performed on the type of array that works with affy package than you should try this workflow using ReadAffy from affy package.
cel_Files <- list.celfiles()
affyRaw <- affy::ReadAffy(filenames=cel_Files)
eset <- mas5(affyRaw)
However, it might be the case that the affy package is not designed for your array type. Then, you should switch to the oligo and oligoClasses packages and normalize with analogous function rma
cel_Files <- oligoClasses::list.celfiles()
affyRaw <- oligo::read.celfiles(cel_Files)
eset <- oligo::rma(affyRaw)
I am using R version 3.3.2 and the package copula version 0.999-15 to evaluate the fitting of the normal copula to my data. My data and code are:
Data: https://www.dropbox.com/s/tdg8bfzmy4nd1dd/jumps.dat?dl=0
library(copula)
data <- read.csv(file="jumps.dat", head=F, sep="")
cop_model <- ellipCopula("normal", dim = 2)
m <- pobs(as.matrix(data))
fitCopula(cop_model, m, method = 'mpl')
After I run the code I receive the following error:
Error in `freeParam<-`(`*tmp*`, value = estimate) : the length of 'value' is not equal to the number of free parameters
Calls: fitCopula ... fitCopula.ml -> fitCopStart -> fitCopula.icor -> freeParam<-
Execution halted
I have no idea what is happening here. The fitting for Clayton and Gumbel is pretty fine. Searching for similar errors in the web, I have found nothing. Reading the documentation (https://www.rdocumentation.org/packages/copula/versions/0.999-15/topics/fitCopula?) for some specificity for ellipCopula, I have found an specific option for posDef, but it did not returns any solution at all.
Old question, but I found this, so will share my solution.
Try to run the following, this is a minimum working example:
library(copula)
print("-----------")
mycop <- ellipCopula("normal", dim=4)
data <- matrix(runif(400), nrow=4)
fitCopula(mycop, t(data))
print("-----------")
For me, this works fine if I open R and type in the lines one by one, but fails if I run as a script with Rscript. The solution is that you need library(methods) as well.
For some reason this worked with copula v0.999-v14, but was broken by v0.999-v16. Alas.
I am trying to prepare data for cluster analysis. That's why I have prepared data tables in excel and the headers are "id","name","crime_type","crime_date","gender","age"
Then , I convert the excel into .csv format.
Then , I write the following command ->
>crime <- read.csv("crime_data.csv",header=T)
>crime # I print , and it prints
# now I will do cluster with kmeans()
>kmeans.result <- kmeans(crime,3)
But , it shows errors.
"Error is as follows :
Error in do_one(nmeth) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning message:
In kmeans(crime, 3) : NAs introduced by coercion"
What I am doing wrong here...
I can't speak to your specific problem without knowing what you data looks like but it could be as simple as giving the xlsx package a try. I think it handles NaNs better
install.packages(xlsx)
library(xlsx)
yourdata <- read.xlsx("YOURDATASHEET.xlsx", sheetName="THESHEETNAME")
Seems like you are asking two questions. For the first; you can also try reading directly from the clipboard (beware of large tables tough, but so far I have good results with 40k rows, 30 col)
d1<-read.table(file="clipboard",sep="\t",header=FALSE,stringsAsFactors=FALSE)
set header to TRUE if you want to name your columns. You can also use what was suggested above to open excel sheets directly but this might not be practical if you have non standard tables.
For the second part perhaps you should convert to numerical using the sapply function and or suppressWarnings().
Yesterday I posted this question on Stats Exchange and based on the response I got, I decided to do some analysis using R's src() function. It's part of the "sensitivity" package.
I installed the package with no trouble, and then tried the following command:
sens <- src(seminars, REV, rank=TRUE, nboot=100)
sens is a new variable to store the results of the test
seminars is a data frame that I imported from a CSV file using the read.csv() command
REV is the name of a variable/column in seminars and my desired response variable
When I ran the command, I got the following error:
Error in data.frame(Y = y, X) : object 'REV' not found
Any thoughts?
From the documentation of src
y: a vector containing the responses corresponding to the design
of experiments (model output variables).
The input needs to be a vector (apparently) and you're attempting to pass in a name (and not even quoting the name at that). Since REV isn't defined (I'm guessing due to the error message) in the global environment it doesn't know what to do.
From reading the documentation it sounds like what you want to do is pass sensitivity[,-which(colnames(sensitivity) == "REV")] (just the design matrix - you don't want to include the responses) in as x and sensitivity[,"REV"] in as y.
This error is linked to the fact that the data.frame X=seminars include factors with 0 value, which produce an error while constructing the regression coefficient. You can first remove them as they don't contribute to the variance of the output.