Call a variable name from a dataframe within a function in R

I can't figure out a simple problem: how to refer to a dataframe column whose name is passed to a function as a string.
I have a function defined as:
box_lambda = function(Valuename,data1){
data1[,Valuename]=ifelse(data1[,Valuename]==0,0.000001,data1[,Valuename])
b= boxcox(get(Valuename) ~ Age.Group+Sex , data = data1)
lambda <- b$x[which.max(b$y)]
return(lambda)
}
But this doesn't work as I get error:
Error in eval(f): 'list' object cannot be coerced to type 'double'
I tried
data1[[Valuename]]=ifelse(data1[Valuename]]==0,0.000001,data1[[Valuename]])
Any help is appreciated!

First, you dropped a bracket that is needed to address a field:
data1[[Valuename]]
You can also use a series of other approaches described [here][1] and [here][2]. For instance, you can use:
library(dplyr)
data1 %>%
filter(!!as.name(Valuename) == 0)
So finally you can use:
data1[[Valuename]][data1[[Valuename]] == 0] <- 0.000001
This replaces each 0 with a small epsilon and leaves the other values unchanged.
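The difference between the two bracket forms, and the replacement above wrapped in a function, can be sketched as follows. The data frame df, the helper replace_zeros and its eps argument are hypothetical names used only for illustration. reformulate() from base R is shown as one possible way to build the model formula from a string instead of get(); it is not part of the original answer.

```r
# Toy data standing in for data1 (hypothetical values)
df <- data.frame(Value = c(0, 2.5, 0, 4),
                 Sex = c("F", "M", "F", "M"))
Valuename <- "Value"

# Single brackets keep a one-column data frame (a list);
# double brackets extract the underlying numeric vector
as_frame <- df[Valuename]
as_vector <- df[[Valuename]]
is.list(as_frame)
is.numeric(as_vector)

# Hypothetical helper: replace exact zeros in one column with a small constant
replace_zeros <- function(data, colname, eps = 1e-6) {
  data[[colname]][data[[colname]] == 0] <- eps
  data
}
fixed <- replace_zeros(df, Valuename)

# Build the formula from the column name stored in the string
model_formula <- reformulate("Sex", response = Valuename)
```

The resulting formula object can then be passed to a modelling function together with data = fixed.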
[1]: https://stackoverflow.com/a/74173690/5043424
[2]: https://stackoverflow.com/a/48219802/5043424

Related

I tried to give a name for a function. There is a result with object not found due to an assignment error

I have a function (e.g. ROC()) and I want to give it another name, such as right_roc <- ROC(), but I get an error like this: object 'right_roc' not found. How can I solve this problem? Thanks in advance.
Assign it without the parentheses to get the function itself rather than the result of calling it:
# create an alias for the sum function
f <- sum
f(1,2,3)
#[1] 6
Use partial function application to create more specific versions of a function:
right_roc <- purrr::partial(nsROC::gROC, side="right")
right_roc(X, D)
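The same idea can be sketched in base R without purrr: a small wrapper function fixes one argument and passes the rest through. The name round2 is hypothetical and only serves as an illustration.

```r
# Alias: no parentheses, so f refers to the function itself
f <- sum
f(1, 2, 3)

# Wrapper that fixes digits = 2 and passes x through to round()
round2 <- function(x) round(x, digits = 2)
round2(3.14159)
```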

Creating a function for GWR maps

I have written code for GWR maps, and it works well when I run it outside a function. However, when I wrap it in a function I get an error. I was wondering if anyone could help, thank you!
# a = polygon shapefile
# b = dependent variable of shapefile
# c = explanatory variable 1
# d = explanatory variable 2
GWR_map <- function(a,b,c,d){
GWRbandwidth <- gwr.sel(a$b ~ a$c+a$d, a,adapt=T)
gwr.model = gwr(a$b ~ a$c+a$d, data = a, adapt=GWRbandwidth, hatmatrix=TRUE, se.fit=TRUE)
gwr.model
}
GWR_map(OA.Census,"Qualification", "Unemployed", "White_British")
The above code produces the following error:
Error in model.frame.default(formula = a$b ~ a$c + a$d, data = a, drop.unused.levels = TRUE) :
invalid type (NULL) for variable 'a$b'
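You can't use function parameters with $: a$b looks for a column literally named b, not the column whose name is stored in b, so it returns NULL. Try changing your function to use the [[x]] notation instead. It should look like this: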
GWR_map <- function(a,b,c,d){
GWRbandwidth <- gwr.sel(a[[b]] ~ a[[c]]+a[[d]], a,adapt=T)
gwr.model = gwr(a[[b]] ~ a[[c]]+a[[d]], data = a, adapt=GWRbandwidth, hatmatrix=TRUE, se.fit=TRUE)
gwr.model
}
The R help docs (section 6.2 on lists) explain this difference well:
Additionally, one can also use the names of the list components in double square brackets, i.e., Lst[["name"]] is the same as Lst$name. This is especially useful, when the name of the component to be extracted is stored in another variable as in
x <- "name"; Lst[[x]]
It is very important to distinguish Lst[[1]] from Lst[1]. ‘[[...]]’ is the operator used to select a single element, whereas ‘[...]’ is a general subscripting operator. Thus the former is the first object in the list Lst, and if it is a named list the name is not included. The latter is a sublist of the list Lst consisting of the first entry only. If it is a named list, the names are transferred to the sublist.
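The quoted behaviour can be checked with a small list. This is a minimal sketch with made-up values; Lst and x follow the names used in the quotation.

```r
Lst <- list(name = c(10, 20, 30))
x <- "name"

by_dollar <- Lst$x      # looks for a component literally called "x"
by_bracket <- Lst[[x]]  # uses the value stored in x, i.e. "name"

first_element <- Lst[[1]]  # the numeric vector itself
first_sublist <- Lst[1]    # a list of length one that keeps the name
```

Here by_dollar is NULL because no component is called "x", which is the same situation as a$b inside GWR_map.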

make function detect nonexistent column when specified as df$x

I have functions that operate on a single vector (for example, a column in a data frame). I want users to be able to use $ to specify the columns that they pass to these functions; for example, I want them to be able to write myFun(df$x), where df is a data frame. But in such cases, I want my functions to detect when x isn't in df. How may I do this?
Here is a minimal illustration of the problem:
myFun <- function (x) sum(x)
data(iris)
myFun(iris$Petal.Width) # returns 179.9
myFun(iris$XXX) # returns 0
I don't want the last line to return 0. I want it to throw an error message, as XXX isn't a column in iris. How may I do this?
One way is to run as.character(match.call()) inside the function. I could then use the parts of the resulting string to determine the name of df, and in turn check for the existence of x. But this seems like a not-so-robust solution.
It won't suffice to throw an error whenever x has length 0: I want to detect whether the vector exists, not whether it has length 0.
I searched for related posts on Stack Overflow, but I didn't find any.
iris$XXX returns NULL, and that NULL is passed to sum:
sum(NULL)
#[1] 0
Note that both iris$XXX and iris[['XXX']] return NULL. If we need an error, either subset or dplyr::select gives one:
iris %>%
select(XXX)
Error: Can't subset columns that don't exist.
✖ Column XXX doesn't exist.
Run rlang::last_error() to see where the error occurred.
Or with pull
iris %>%
pull(XXX)
Error: object 'XXX' not found
Run rlang::last_error() to see where the error occurred.
subset(iris, select = XXX)
Error in eval(substitute(select), nl, parent.frame()) :
object 'XXX' not found
We could make the function throw an error if NULL is passed. Because of the way the function takes its argument, it receives only the value, not any information about the object the value came from.
myFun <- function (x) {
stopifnot(!is.null(x))
sum(x)
}
However, this would be a non-specific error, because NULL can reach the function in other ways as well, not only through a missing column.
If we need to check that the column is valid, then both the data and the column name should be passed to the function:
myFun2 <- function(data, colnm) {
stopifnot(exists(colnm, data))
sum(data[[colnm]])
}
myFun2(iris, 'XXX')
#Error in myFun2(iris, "XXX") : exists(colnm, data) is not TRUE
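A variant of the same check that reports the missing column by name could look like the sketch below. The function name sum_column, the toy data frame and the wording of the message are all hypothetical.

```r
# Hypothetical variant of myFun2 with an explicit column check
sum_column <- function(data, colnm) {
  if (!colnm %in% names(data)) {
    stop("Column '", colnm, "' not found in data", call. = FALSE)
  }
  sum(data[[colnm]])
}

toy <- data.frame(x = c(1, 2, 3))
total <- sum_column(toy, "x")

# tryCatch captures the error so that its message can be inspected
msg <- tryCatch(sum_column(toy, "XXX"),
                error = function(e) conditionMessage(e))
```

Like myFun2, this requires the caller to pass the data frame and the column name separately instead of writing df$x.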

What is difference between subset function and filter function in R?

When I run these two calls in R, one returns an error but the other works well. Why? I thought both functions return the same thing.
impute[1,]$steps <- filter(steps_per_interval,
interval==impute[1,]$interval)[,2]
Error: invalid subscript type 'integer'
impute[1,]$steps <- subset(steps_per_interval,
interval==impute[1,]$interval)[,2]
Not sure if I'm correct, but it seems that inside filter you can't combine $ and [] in the same expression, as in interval==impute[1,]$interval. Instead you could try:
x <- which(colnames(impute)=="interval")
library(dplyr)
impute[1,]$steps <- filter(steps_per_interval,
interval==impute[1,x])[,2]

input 'data' is not double type?

While programming in R, I keep running into the following error:
Error in data.validity(data, "data") : Bad usage: input 'data' is
not double type.
Can anyone please explain why this error is happening, i.e. the reasons in the dataset which cause the error to arise?
Here is the code I'm running. The packages I have loaded are cluster, psych and clv.
data1 <- read.table(file='dataset.csv', sep=',', header=T, row.names=1)
data1.p <- as.matrix(data1)
hello.data <- data1.p[,1:15]
agnes.mod <- agnes(hello.data)
v.pred <- as.integer(cutree(agnes.mod,3)) # "cut" the tree
scatt <- clv.Scatt(hello.data, v.pred)
Error in data.validity(data, "data") :
Bad usage: input 'data' is not double type.
The key part of data.validity() raising the error is:
data = as.matrix(data)
if( !is.double(data) )
stop(paste("Bad usage: input '", name, "' is not double type.", sep=""))
data is converted to a matrix and then checked via is.double() to see whether it is a numeric matrix. If it isn't, the condition is true and the error is raised. So why isn't your data (hello.data) numeric when converted to a matrix? Either you have character variables in your data or there are factors. Do you have factors? Try
str(hello.data)
Are there any non-numeric variables in there? If you have character data then get rid of it. If you have factors, then data.validity() could coerce via data.matrix() but as it doesn't, try
hello.data <- data.matrix(hello.data)
after the line creating hello.data then run the rest of your code.
Whether this makes sense (treating a nominal or ordinal variable as a simple numeric) is unclear as you haven't provided a reproducible example or explained what your data are etc.
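One further case that an is.double() check rejects is a matrix stored as integers, since is.double() is FALSE for integer storage even though is.numeric() is TRUE. The sketch below uses made-up values; whether this applies to dataset.csv is an assumption, as the data are not shown.

```r
# An all-integer matrix is numeric but not double
int_mat <- matrix(1:6, nrow = 2)
is.numeric(int_mat)
is.double(int_mat)

# Changing the storage mode converts the values and keeps the dimensions
dbl_mat <- int_mat
storage.mode(dbl_mat) <- "double"
is.double(dbl_mat)

# A single character column makes as.matrix() return a character matrix
mixed <- data.frame(a = c(1.5, 2.5), b = c("x", "y"))
chr_mat <- as.matrix(mixed)
is.character(chr_mat)
```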
