I have a list of data frames, xyz, and in every data frame there are 2 numeric vectors (x and y). I want to apply the interpSpline function from package splines to x and y, but when I do :
lapply(xyz, function (x){
x%>%
interpSpline(x,y)
})
I get the following error:
Error in data.frame(x = as.numeric(obj1), y = as.numeric(obj2)) :
(list) object cannot be coerced to type 'double'
It doesn't work because interpSpline doesn't take a data frame as its first argument.
xyz <- list(data.frame(x=rnorm(10),y=rnorm(10)),
data.frame(x=rnorm(10),y=rnorm(10)))
library(splines)
sf <- function(d) with(d, interpSpline(x, y))
s <- lapply(xyz,sf)
You could also use interpSpline(d$x,d$y). It might be possible to do enough contortions to get interpSpline to work with pipes, but it hardly seems worth the trouble ...
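As a hedged aside, interpSpline() also accepts a formula as its first argument and a data frame as its second, which avoids with() altogether. A minimal sketch, regenerating the same kind of fake data as above (the seed is my own addition, only for reproducibility):

```r
library(splines)

set.seed(42)  # assumption: seed added only to make the sketch reproducible
xyz <- list(data.frame(x = rnorm(10), y = rnorm(10)),
            data.frame(x = rnorm(10), y = rnorm(10)))

# formula interface: the names in the formula are looked up in the data frame
s <- lapply(xyz, function(d) interpSpline(y ~ x, d))

# evaluate the first spline at the original x values; an interpolating
# spline should reproduce the original y values there
p <- predict(s[[1]], xyz[[1]]$x)
```

The result is still a plain list of spline objects, one per data frame, so predict() and plot() can be applied to each element.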
Per your comment on Ben's answer, interpSpline() requires the input x values to be unique. To avoid that error you could use the spline() function instead of interpSpline(); by default it averages the y values at tied x values (ties = mean). Each element of s will then be a list holding the x and y coordinates of the interpolated curve, evaluated by default at n equally spaced points spanning the range of x. However, you will not have all the other output that you get from interpSpline().
set.seed(1)
# fake up some data that has duplicate 'x' values
xyz <- list(data.frame(x=round(rnorm(100),1),y=round(rnorm(100),1)),
data.frame(x=round(rnorm(100),1),y=round(rnorm(100),1)))
library(splines)
sf <- function(d) with(d,spline(x,y))
s <- lapply(xyz,sf)
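If you do need the full interpSpline() object, one possible workaround (my own assumption, not part of the answer above) is to collapse the duplicated x values yourself before fitting, here by averaging y within each x using aggregate(). A minimal sketch, where sf_unique is a hypothetical helper name:

```r
library(splines)

set.seed(1)
# fake data with duplicate 'x' values, as above
xyz <- list(data.frame(x = round(rnorm(100), 1), y = round(rnorm(100), 1)),
            data.frame(x = round(rnorm(100), 1), y = round(rnorm(100), 1)))

# hypothetical helper: average y over tied x, then interpolate
sf_unique <- function(d) {
  agg <- aggregate(y ~ x, data = d, FUN = mean)  # one row per distinct x
  interpSpline(agg$x, agg$y)
}
s2 <- lapply(xyz, sf_unique)
```

Averaging is only one way to resolve ties; whether it is appropriate depends on what the duplicated x values mean in your data.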
I am trying to produce a list of graphical objects (grobs), suitable for passing to gridExtra::grid.arrange (or some gg equivalent if that is easier). I have redrafted my example to eliminate multiple problems that were kindly pointed out by @MrFlick and @Ronak Shah. However, I still get the same error message:
Error in .f(.x[[i]], ...) : unused argument (NULL)
(And I don't get a single Google hit for "unused argument (NULL)" r | purrr. The closest I find is unused argument (.null = NA).)
Each object is a set of four plots (plotted together) that are output as side effects of the acf or pacf functions, showing the autocorrelation and autocovariance between two variables and their various lags. I am trying to write a function that takes a tsibble, a vector of independent variable column names from that tsibble, and the name of a single outcome variable (also a column in the input tsibble). Using purrr, it should take the acf or pacf of each pair formed by the outcome variable and one of the specified columns, convert the plot to a grob, and output a list of the grobs.
Here is my non-working function, acf version:
acf_grobs <- function(dat., mod_nms, outcome){
depend <- select(dat., all_of(mod_nms))
out <- map(depend, ~ grob(acf(x=cbind(.x, y))), y = dat.[[outcome]])
out
}
And a reproducible example:
library(tidyverse)
library(fpp3)
I know that this loads a lot more packages than needed for this example, but I am loading all these packages for other reasons.
aa <- 1:4
bb <- 1:4
cc <- 1:4
day <- 1:4
tb <- tibble(aa, bb, cc, day)
tsb <- as_tsibble(tb, index = day)
var_of_interest <- "aa"
mod.nms <- c("bb", "cc")
acf_grobs(dat. = tsb, mod_nms = mod.nms, outcome = var_of_interest)
I still believe the error message is caused by my failure to pass the single argument outcome to the function argument of purrr::map correctly, though this belief is shaken because I am now passing it a different way, and still getting the same error message.
I'm sure the question is a bit dumb (sorry)... I'm trying to create a function that uses different variables I have stored in a data frame. The function looks like this:
mlr_turb <- function(Cond_in, Flow_in, pH_in, pH_out, Turb_in, nm250_i, nm400_i, nm250_o, nm400_o){
Coag = (+0.032690 + 0.090289*Cond_in + 0.003229*Flow_in - 0.021980*pH_in - 0.037486*pH_out
+ 0.016031*Turb_in - 0.026006*nm250_i + 0.093138*nm400_i - 0.397858*nm250_o - 0.109392*nm400_o)/0.167304
return(Coag)
}
m4_turb <- mlr_turb(dataset)
The problem comes when I try to run my function on a data frame (with the same variable names). It doesn't detect my variables and shows this message:
Error in mlr_turb(dataset) :
argument "Flow_in" is missing, with no default
But Flow_in is actually there, along with all the other variables.
I think I have misplaced or am missing something in the function that would let it take the variables from the dataset. I have searched a lot but have not found an answer...
No dumb questions!
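I think you're looking for do.call. This function allows you to unpack values into a function as arguments. A data frame is a list of columns, so do.call passes each column to the function as an argument named after that column. Here's a really simple example.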
# a simple function that takes x, y and z as arguments
myFun <- function(x, y, z){
result <- (x + y)/z
return(result)
}
# a simple data frame with columns x, y and z
myData <- data.frame(x=1:5,
y=(1:5)*pi,
z=(11:15))
# unpack the values into the function using do.call
do.call('myFun', myData)
Output:
[1] 0.3765084 0.6902654 0.9557522 1.1833122 1.3805309
You have run into a standard problem when writing R, related to standard evaluation (SE) vs non-standard evaluation (NSE). If you want more detail, you can have a look at this blog post I wrote.
I think the most convenient way to write a function that uses variables is to pass the variable names as arguments of the function.
Let's take @Muon's example again.
# a simple function that takes x, y and z as arguments
myFun <- function(x, y, z){
result <- (x + y)/z
return(result)
}
The question is where R should find the values behind the names x, y and z. In a function, R first looks in the function's own environment (here x, y and z are defined as parameters), then in the environment where the function was defined (the global environment for a function defined at top level), and then in the attached packages.
In myFun, R expects vectors. If you give a column name, you will get an error. What if you want to give a column name? You must tell R that the name you gave should be looked up in the scope of a data frame. You can, for instance, do something like this:
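To make that lookup order concrete, here is a small sketch (the names f, g and z are made up for illustration):

```r
z <- 100  # defined outside any function

f <- function(x) {
  y <- 2      # local variable: found in the function's own environment
  x * y + z   # z is not local, so R looks in the enclosing environment
}

g <- function(x, z) x * 2 + z  # here z is a parameter and masks the outer z

res_f <- f(1)         # uses the outer z
res_g <- g(1, z = 0)  # uses the z passed as an argument
```

Here res_f is 102 and res_g is 2: the same name z resolves to different values depending on where R finds it first.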
myFun <- function(df, col1 = "x", col2 = "y", col3 = "z"){
result <- (df[,col1] + df[,col2])/df[,col3]
return(result)
}
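A short usage sketch of this column-name pattern, under the assumption that the data frame has numeric columns named x, y and z. myFunCols is a hypothetical variant that extracts columns with [[ ]], which returns a plain vector for ordinary data frames and tibbles alike:

```r
myData <- data.frame(x = 1:5, y = (1:5) * pi, z = 11:15)

# hypothetical variant of the function above, using [[ ]] to extract columns
myFunCols <- function(df, col1 = "x", col2 = "y", col3 = "z") {
  (df[[col1]] + df[[col2]]) / df[[col3]]
}

res_default <- myFunCols(myData)                          # (x + y) / z
res_swapped <- myFunCols(myData, col1 = "z", col3 = "x")  # (z + y) / x
```

Because the column names are ordinary strings, they can be stored in variables or looped over without any non-standard evaluation.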
You can go much further in that direction with the data.table package. If you start writing functions that need to use variables from a data frame, I recommend having a look at this package.
I like Muon's answer, but I couldn't get it to work if there are columns in the data.frame not in the function. Using the with() function is a simple way to make this work as well...
#Code from Muon:
# a simple function that takes x, y and z as arguments
myFun <- function(x, y, z){
result <- (x + y)/z
return(result)
}
# a simple data frame with columns x, y and z
myData <- data.frame(x=1:5,
y=(1:5)*pi,
z=(11:15),
a=6:10) #adding a var not used in myFun
# unpack the values into the function using do.call
do.call('myFun', myData)
#generates an error for the unused "a" column
#using with() function:
with(myData, myFun(x, y, z))
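Another option, sketched here as my own suggestion, is to keep do.call but pass only the columns whose names match the function's formal arguments; formals() returns those arguments as a named list:

```r
myFun <- function(x, y, z) (x + y) / z

myData <- data.frame(x = 1:5, y = (1:5) * pi, z = 11:15, a = 6:10)

# keep only the columns named after myFun's arguments: "x", "y", "z"
args_needed <- names(formals(myFun))
res_docall <- do.call(myFun, myData[args_needed])

# same computation via with(), for comparison
res_with <- with(myData, myFun(x, y, z))
```

Both approaches give the same values; the do.call version has the advantage that the argument names do not need to be typed out again.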
I discovered plyr and was playing with an example but could not understand why it does not work: I have a data frame of 10 (x,y) coordinates and want to plot these points one after the other
## Creating the data
df <- data.frame(a=rnorm(10),b=rnorm(10))
## Empty plot
plot(0, xlim=c(-2,2), ylim=c(-2,2))
## Function to be repeated
plot.pts <- function(x){
points(x$a,x$b)
}
## Magic d_plyr
d_ply(df,plot.pts)
But I get the error
Error in UseMethod("as.quoted") :
no applicable method for 'as.quoted' applied to an object of class "function"
I understood d_ply is the function to be used in that case, hence what am I doing wrong?
Because you are not really dividing the dataframe into groups based on variables, just calling a function for each row, I think a_ply suits better than d_ply:
a_ply(df,.margins = 1, .fun = plot.pts)
In your original d_ply call you passed your function where the .variables argument that tells d_ply how to group the data was supposed to be, giving you that error.
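For comparison, a plyr-free sketch of the same row-by-row idea in base R (an alternative I am adding, not what the answer above uses): split the data frame into one-row pieces and call the function on each. The seed and the null graphics device are my own additions so the sketch runs the same way everywhere.

```r
set.seed(1)  # assumption: seed added for reproducibility
df <- data.frame(a = rnorm(10), b = rnorm(10))

# draw to a null device so the sketch also runs non-interactively;
# drop this line and the dev.off() call in an interactive session
pdf(NULL)
plot(0, xlim = c(-2, 2), ylim = c(-2, 2), type = "n")  # empty plot

plot.pts <- function(x) points(x$a, x$b)

# one-row data frames, in row order
rows <- split(df, seq_len(nrow(df)))
invisible(lapply(rows, plot.pts))
dev.off()
```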
I am trying to run this line:
knn(mydades.training[,-7],mydades.test[,-7],mydades.training[,7],k=5)
but I always get this error:
Error in knn(mydades.training[, -7], mydades.test[, -7], mydades.training[, :
NA/NaN/Inf in foreign function call (arg 6)
In addition: Warning messages:
1: In knn(mydades.training[, -7], mydades.test[, -7], mydades.training[, :
NAs introduced by coercion
2: In knn(mydades.training[, -7], mydades.test[, -7], mydades.training[, :
NAs introduced by coercion
Any ideas, please?
PS: mydades.training and mydades.test are defined as follows:
N <- nrow(mydades)
permut <- sample(c(1:N),N,replace=FALSE)
ord <- order(permut)
mydades.shuffled <- mydades[ord,]
prop.train <- 1/3
NOMBRE <- round(prop.train*N)
mydades.training <- mydades.shuffled[1:NOMBRE,]
mydades.test <- mydades.shuffled[(NOMBRE+1):N,]
I suspect that your issue lies in having non-numeric data fields in 'mydades'. The error line:
NA/NaN/Inf in foreign function call (arg 6)
makes me suspect that the knn-function call to the C language implementation fails. Many functions in R actually call underlying, more efficient C implementations, instead of having an algorithm implemented in just R. If you type just 'knn' in your R console, you can inspect the R implementation of 'knn'. There exists the following line:
Z <- .C(VR_knn, as.integer(k), as.integer(l), as.integer(ntr),
as.integer(nte), as.integer(p), as.double(train), as.integer(unclass(clf)),
as.double(test), res = integer(nte), pr = double(nte),
integer(nc + 1), as.integer(nc), as.integer(FALSE), as.integer(use.all))
where .C means that we're calling a C function named 'VR_knn' with the provided function arguments. Since you get two of the warnings
NAs introduced by coercion
I think two of the as.double/as.integer calls fail, and introduce NA values. If we start counting the parameters, the 6th argument is:
as.double(train)
that may fail in cases such as:
# as.double can not translate text fields to doubles, they are coerced to NA-values:
> as.double("sometext")
[1] NA
Warning message:
NAs introduced by coercion
# while the following text is cast to double without an error:
> as.double("1.23")
[1] 1.23
You get two of the coercion warnings, which are probably produced by 'as.double(train)' and 'as.double(test)'. Since you did not provide us with exact details of what 'mydades' looks like, here are some of my best guesses (using artificial data from a multivariate normal distribution):
library(MASS)
mydades <- mvrnorm(100, mu=c(1:6), Sigma=matrix(1:36, ncol=6))
mydades <- cbind(mydades, sample(LETTERS[1:5], 100, replace=TRUE))
# This breaks knn
mydades[3,4] <- Inf
# This breaks knn
mydades[4,3] <- -Inf
# The two Inf assignments above, however, do not produce the "NAs introduced by coercion" warning
# This breaks knn and gives the same error as yours; just some raw text
mydades[1,2] <- mydades[50,1] <- "foo"
mydades[100,3] <- "bar"
# ... or perhaps wrongly formatted exponential numbers?
mydades[1,1] <- "2.34EXP-05"
# ... or wrong decimal symbol?
mydades[3,3] <- "1,23"
# should be 1.23, as R uses '.' as decimal symbol and not ','
# ... or most likely a whole column is non-numeric, since the error is given twice (as.double problem both in training AND test set)
mydades[,1] <- sample(letters[1:5],100,replace=TRUE)
I would not keep both the numeric data and class labels in a single matrix, perhaps you could split the data as:
mydadesnumeric <- mydades[,1:6] # 6 first columns
mydadesclasses <- mydades[,7]
Using calls
str(mydades); summary(mydades)
may also help you/us in locating the problematic data entries and correct them to numeric entries or omitting non-numeric fields.
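Beyond str() and summary(), a small helper can point at the exact cells that fail numeric conversion. This is only a sketch on made-up data; not_numeric and dd are hypothetical names, and it assumes the columns are character (or factor) vectors:

```r
# small made-up data frame with a few problematic entries
dd <- data.frame(v1 = c("1.5", "foo", "2.34EXP-05", "3"),
                 v2 = c("1,23", "4", "5", "6"),
                 stringsAsFactors = FALSE)

# TRUE where a non-missing entry cannot be converted to a number
not_numeric <- function(col) {
  converted <- suppressWarnings(as.numeric(as.character(col)))
  is.na(converted) & !is.na(col)
}

bad <- sapply(dd, not_numeric)           # logical matrix, same shape as dd
bad_cells <- which(bad, arr.ind = TRUE)  # row/column positions of bad cells
```

Note that entries such as "Inf" convert without a warning, so they would not be flagged by this check; is.finite() on the converted values would be needed for those.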
The rest of the run code (after breaking the data), as provided by you:
N <- nrow(mydades)
permut <- sample(c(1:N),N,replace=FALSE)
ord <- order(permut)
mydades.shuffled <- mydades[ord,]
prop.train <- 1/3
NOMBRE <- round(prop.train*N)
mydades.training <- mydades.shuffled[1:NOMBRE,]
mydades.test <- mydades.shuffled[(NOMBRE+1):N,]
# 7th column seems to be the class labels
library(class)  # provides knn()
knn(train=mydades.training[,-7], test=mydades.test[,-7], cl=mydades.training[,7], k=5)
Great answer by @Teemu.
As this is a well-read question, I will give the same answer from an analytics perspective.
The KNN function classifies data points by calculating the Euclidean distance between the points. That's a mathematical calculation requiring numbers. All variables in KNN must therefore be coercible to numeric.
The data preparation for KNN often involves three tasks:
(1) Fix all NA or "" values
(2) Convert all factors into a set of booleans, one for each level in the factor
(3) Normalize the values of each variable to the range [0, 1] so that no variable's range has an unduly large impact on the distance measurement.
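The three steps above can be sketched in base R as follows. The data, the column names and the minmax helper are all made up for illustration, and dropping incomplete rows is only one of several ways to handle missing values:

```r
# made-up data: a numeric column with an NA, a factor, and another numeric
raw <- data.frame(age    = c(23, 35, NA, 51, 42),
                  colour = factor(c("red", "blue", "red", "green", "blue")),
                  income = c(20, 55, 38, 90, 61))

# (1) handle missing values; here rows with any NA are simply dropped
clean <- na.omit(raw)

# (2) expand the factor into one 0/1 indicator column per level
#     ("- 1" removes the intercept so every level gets its own column)
dummies <- model.matrix(~ colour - 1, data = clean)

# (3) min-max normalise each numeric column to the range [0, 1]
#     (assumes the column is not constant, otherwise this divides by zero)
minmax <- function(v) (v - min(v)) / (max(v) - min(v))
num <- sapply(clean[c("age", "income")], minmax)

prepared <- cbind(num, dummies)  # all-numeric matrix, ready for a distance calculation
```

In a real train/test setting the minimum and maximum would normally be computed on the training set and then reused to scale the test set.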
I would also point out that the function seems to fail when using integers. I needed to convert everything into "num" type prior to calling the knn function. This includes the target feature, for which most methods in R use the factor type. Thus, as.numeric(my_frame$target_feature) is required.
Instead of writing one vector subscript operation per line, such as:
x.and.y <- intersect(x, y)
idx.x <- match(x, x.and.y)
idx.x <- idx.x[!is.na(idx.x)]
I could chain them in one line:
x.and.y <- intersect(x, y)
idx.x <- subset(tmp <- match(x, x.and.y), !is.na(tmp))
In order to do that, I must give the intermediate vector a name to be used in subscript operations. To make the code even more concise, is there a way to refer to a vector anonymously? Like this:
x.and.y <- intersect(x, y)
idx.x <- match(x, x.and.y)[!is.na] ## illegal R
Considering intersect calls match, what you're doing is redundant. intersect is defined as:
function (x, y)
{
y <- as.vector(y)
unique(y[match(as.vector(x), y, 0L)])
}
And you can get the values that your 3 lines of code pick out (x.and.y[idx.x], i.e. the elements of x that also occur in y) directly with %in%: x[x %in% y].
I realize this may not be representative of your actual problem, but "referring to a vector anonymously" doesn't really fit the R paradigm. Function arguments are pass-by-value. You're essentially saying, "I want a function to manipulate an object, but I don't want to provide the object to the function."
You could use R's scoping rules to do this (which is what mplourde did using Filter with an anonymous function), but you're going to create quite a bit of convoluted code that way.
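For illustration, here is a hedged sketch of what such approaches can look like on made-up vectors. Filter() with a predicate, or an anonymous function called on the spot, both avoid naming the intermediate result, at some cost in readability:

```r
x <- c(5, 3, 8, 3, 1)
y <- c(3, 1, 9)

x.and.y <- intersect(x, y)

# the named-intermediate version from the question
tmp <- match(x, x.and.y)
idx.named <- tmp[!is.na(tmp)]

# no named intermediate: Filter() keeps the elements where the predicate is TRUE
idx.filter <- Filter(Negate(is.na), match(x, x.and.y))

# or an anonymous function applied on the spot
idx.anon <- (function(v) v[!is.na(v)])(match(x, x.and.y))

# the values those indices select are the elements of x that occur in y
vals <- x.and.y[idx.named]
```

All three index vectors are identical, and vals equals x[x %in% y], so which form to use is mostly a question of taste.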