lme object within a function uses response variable from previous run - r

I am fitting an lme model inside a function, but the model does not use the response variable that I pass in. Instead, it uses the response variable from the previous time I called the function.
library(nlme)
library(car)
# DATA (Example)
S1=data.frame(blok = c(rep("blokI",16),rep("blokII",16)),
treat=rep(c("plus","plus","min","min"),8),
field = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,16,16),
subfield = rep(c("a","b"),16),
var1=rnorm(32)^2,
var2=rnorm(32)^2)
S2=data.frame(blok = c(rep("blokI",16),rep("blokII",16)),
treat=rep(c("plus","plus","min","min"),8),
field = c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15,16,16),
subfield = rep(c("a","b"),16),
var3=rnorm(32)^2,
var4=rnorm(32)^2)
# FUNCTION
get.stats = function(S,dofor){
U = data.frame(crop = NA, C = NA, T_S=NA, T_F=NA, T_DF=NA, T_P=NA, R_S=NA, R_F=NA, R_DF=NA, R_P=NA, TR_S=NA,TR_F=NA, TR_DF=NA, TR_P=NA)
for (i in 1:length(dofor)){
f = as.formula(paste(dofor[i]," ~ treat",sep=""))
model = lme(f, data=S, random = ~1|blok/treat/field/subfield)
av = Anova(model, type = "II")
U[i,1] = NaN
U[i,2] = dofor[i]
U[i,3] = av$`Chisq`[1]
U[i,4] = NaN
U[i,5] = av$Df[1]
U[i,6] = av$`Pr(>Chisq)`[1]
U[i,7] = av$`Chisq`[2]
U[i,8] = NaN
U[i,9] = av$Df[2]
U[i,10] = av$`Pr(>Chisq)`[2]
U[i,11] = av$`Chisq`[3]
U[i,12] = NaN
U[i,13] = av$Df[3]
U[i,14] = av$`Pr(>Chisq)`[3]
}
return(U)
}
dofor=c("var1","var2")
U = get.stats(S1, dofor) # First call of function
dofor=c("var3","var4")
U = rbind(U, get.stats(S2, dofor)) # Second call of function
The first time I call the function I get Error in eval(x$call$fixed) : object 'f' not found, which is already strange to me because f is defined within the function. I then execute the line where f is defined (for i = 1) by hand and call the function again. Now it works, but when I call the function a second time with S2 as the data input I get the error:
Error in eval(predvars, data, env) : object 'var2' not found
I have to repeat this model for many variables on the yield of different crops in different years. With this function I intend to collect all the statistics for the different crop species in one matrix, which I can later edit in an Excel sheet.
Any suggestions on how to solve or by-pass my problem?
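One workaround that is often suggested for this kind of error is sketched below. It assumes the cause is that the fitted object's stored call refers to the formula by its variable name (f) rather than by value, so code that later re-evaluates the call cannot find it. Building the call with do.call() passes the formula itself. The example uses the Orthodont data that ships with nlme and a simplified random-effects structure; the helper name fit_inlined is hypothetical.

```r
library(nlme)

# Hypothetical helper: fit one response at a time, passing the formula
# by value so the stored call does not depend on a local variable `f`.
fit_inlined <- function(resp, dat) {
  f <- as.formula(paste(resp, "~ age"))
  do.call("lme", list(fixed = f, data = dat, random = ~ 1 | Subject))
}

model <- fit_inlined("distance", Orthodont)

# The stored call carries the formula itself, so it can be re-evaluated
# outside the function that created it.
formula(model)
fixef(model)
```

Inside get.stats, the same pattern would replace the direct lme(f, ...) call. Note that do.call() also embeds the data frame in the stored call, which can make a printed call long.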

Related

Making a function that builds a dataframe

I'm trying to write a function that builds a dataframe and returns it. The new dataframe is made of columns taken from another dataframe that I have, called metadata, in addition to some extra data that I want to include or leave out by passing TRUE or FALSE when calling the function.
Here is what I did:
make_data = function(metric, use_additions = FALSE){
data = data.frame(my_metric = metadata[['metric']], gender = metadata$Gender ,
age = as.numeric(metadata$Age) , use_additions = t(additional_data))
data = data %>% dplyr::select(my_metric, everything())
return(data)
}
data = make_data(CR, FALSE)
I want to pass a different metric value each time, while all other features stay the same. So here, for example, I called the function with metric as CR, which is the name of the column I want from metadata. The argument I want to control is use_additions: sometimes I want to add it and sometimes I don't.
metadata and additional_data have the exact same row names and the same rows number. It's just adding the data or not.
I get this error(s):
Error in data.frame(metric = metadata[["metric"]], gender = metadata$Gender, :
arguments imply differing number of rows: 0, 1523
In addition: Warning message:
In data.frame(metric = metadata[["metric"]], gender = metadata$Gender, :
Error in data.frame(my_metric = metadata[["metric"]], gender = metadata$Gender, :
arguments imply differing number of rows: 0, 1523
I've tried several ways to do this, with '' and without, and using the $, but none of these worked. For example, when I type metric = metadata[[metric]] I get this:
Error in (function(x, i, exact) if (is.matrix(i)) as.matrix(x)[[i]] else .subset2(x, :
object 'CR' not found
Pass the column name as a string and use it to index metadata, and add the extra data only when use_additions is TRUE:
make_data = function(colname, use_additions = FALSE){
data = data.frame(my_metric = metadata[[colname]], gender = metadata$Gender ,
age = as.numeric(metadata$Age))
if (use_additions) data$use_additions=additional_data
return(data)
}
data = make_data("CR", FALSE)
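The difference between the indexing forms can be seen on a toy data frame. This is a minimal sketch; the three-row metadata below is made up for illustration and only mimics the column names used in the question.

```r
# Made-up stand-in for the real `metadata` data frame
metadata <- data.frame(CR = c(0.1, 0.4, 0.9),
                       Gender = c("F", "M", "F"),
                       Age = c("31", "45", "28"))
col <- "CR"

# "metric" in quotes is a literal column name; no such column exists,
# so the result is NULL (zero rows), which data.frame() cannot combine
is.null(metadata[["metric"]])

# A string held in a variable selects the column it names
metadata[[col]]   # plain vector
metadata[col]     # one-column data frame

out <- data.frame(my_metric = metadata[[col]],
                  gender = metadata$Gender,
                  age = as.numeric(metadata$Age))
names(out)
```

Passing the bare name CR instead of "CR" fails earlier, because R tries to evaluate CR as an object before the function body ever uses it.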

for loop in ctree [R]

I want to run a decision tree for each variable in my dataframe, so I'm using this:
results_cont = list()
for (i in 2:(ncol(DATA)-1)) {
current_var = colnames(DATA[i])
current_result = ctree(TARGET ~ current_var, DATA, control = ctrl)
results_cont[[i]] = current_result
}
Where DATA is a dataframe where the first column is the ID and the last column (TARGET) is my binary Target.
I keep getting this error:
Error in trafo(data = data, numeric_trafo = numeric_trafo, factor_trafo = factor_trafo, :
data class “character” is not supported
But I don't have any character columns in my dataframe.
Is there anything wrong with my loop, or is it something else?
Thank you guys.
Since you do not provide data, I have not tested this, but I believe your problem is the line
current_result = ctree(TARGET ~ current_var, DATA, control = ctrl)
This is not working because current_var is just a character string. You need to build the formula as a string and then convert it to a formula - like this:
current_var = colnames(DATA[i])
FORM = as.formula(paste("TARGET ~ ", current_var))
current_result = ctree(FORM, DATA, control = ctrl)
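The same loop can be sketched end to end with base R only. In this untested-against-ctree sketch, mtcars stands in for DATA, its am column stands in for TARGET, and lm() stands in for ctree(); reformulate() is a base R alternative to as.formula(paste(...)).

```r
# Stand-in data: mtcars ships with R; `am` plays the role of TARGET
DATA <- mtcars
predictors <- setdiff(colnames(DATA), "am")

results <- list()
for (v in predictors) {
  # reformulate() builds `am ~ <v>` from character strings
  form <- reformulate(v, response = "am")
  results[[v]] <- lm(form, data = DATA)  # lm() stands in for ctree()
}

names(results)
formula(results[["wt"]])
```

Storing each result under the variable name, rather than under the column index, also avoids an empty first slot when the loop starts at column 2.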

Resolving a variable value within a function call R

I have a data frame defined as follows:
model_comp
logLik IC Lack of fit Res var
W2.4 -353.2939 716.5878 1.361885e-01 26.80232
baro5 -353.2936 718.5871 NaN 27.04363
LL.5 -353.2940 718.5880 NaN 27.04384
LL.3 -360.3435 728.6871 3.854799e-04 29.99842
W1.3 -360.3842 728.7684 3.707592e-04 30.01948
W1.4 -360.3129 730.6258 7.850947e-05 30.25028
LL.4 -360.3170 730.6340 7.818416e-05 30.25243
The best model fit is the one with the lowest IC (information criteria). I want to use the best fit to do some plotting etc... So I created:
> bestmodel <- noquote(paste0(as.name(rownames(model_comp[which.min(model_comp$IC),])),"()"))
> bestmodel
[1] W2.4()
I want to use W2.4() as a function call into the drc package.
For example this call works when manually specified:
drm(y~x,logDose = 10, fct=W2.4())
I'm trying to use the value in bestmodel instead to do something like:
drm(y~x,logDose = 10,fct = as.formula(paste(bestmodel)))
I've tried all the options given here with no success. I've messed with as.formula(), noquote(), as.name() with no success.
I also tried as.name(paste0(as.name(bestmodel),"()")) where I didn't add on the "()" in the bestmodel definition above. Still no dice.
model_comp <- structure(list(logLik = c(-353.293902612472, -353.293568997018,
-353.294024776211, -360.343530770823, -360.384220907907, -360.312897918459,
-360.317018443052), IC = c(716.587805224944, 718.587137994035,
718.588049552421, 728.687061541646, 728.768441815814, 730.625795836919,
730.634036886105), `Lack of fit` = c(0.136188459104035, NaN,
NaN, 0.000385479884900107, 0.000370759187117765, 7.85094742623572e-05,
7.81841606352332e-05), `Res var` = c(26.8023196097934, 27.0436263934882,
27.0438389102235, 29.9984226526044, 30.0194755526501, 30.2502847248304,
30.2524338881051)), .Names = c("logLik", "IC", "Lack of fit",
"Res var"), row.names = c("W2.4", "baro5", "LL.5", "LL.3", "W1.3",
"W1.4", "LL.4"), class = "data.frame")
Using noquote() only suppresses the quotes when a string is printed; it doesn't turn a character value into an executable piece of code. There is a big difference in R between a character value and a symbol or function call. You can't just replace one with the other.
So let's say you have extracted the character value from the rownames
x <- "W2.4"
This is basically the string version of the function you want. You can get the value of a symbol (in this case the function W2.4 from the drc package) from its string name with get(). So you can call
drm(y~x, logDose = 10, fct = get(x)())
Note the extra parentheses. The get(x) call returns the W2.4 function, and the second set of parentheses calls the function returned by get().
Using the ryegrass dataset that comes with the drc package, we can see that these two lines return the same thing
drm(rootl ~ conc, data = ryegrass, fct = W2.4())
drm(rootl ~ conc, data = ryegrass, fct = get(x)())
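The same lookup can be tried without drc installed. This is a small sketch that uses base R's mean() as a stand-in for W2.4; match.fun() and do.call() are shown as alternatives to get() and are not part of the answer above.

```r
x <- "mean"                # a function name held as a string

f1 <- get(x)               # look the object up by name
f2 <- match.fun(x)         # similar, but only ever returns a function

f1(c(1, 2, 6))
f2(c(1, 2, 6))
do.call(x, list(c(1, 2, 6)))   # call by name in one step

# noquote() changes printing only; the value is still not a function
is.function(noquote(x))
```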

Calling R function in Vertica: Failure in UDx RPC call ... Exception in processPartitionForR: [0 (non-NA) cases]

I have created a transform UDF in R, which runs linear regression over a table partitioned by an Id of some entity. I tested it in the console and it worked flawlessly, the results made sense and all was good. However, in the practical setting (code run by the server, not by me manually) with the same data I always see the same error:
ERROR 3399: Failure in UDx RPC call InvokeProcessPartition(): Error calling processPartition() in User Defined Object [remove_temperature_correlation] at [/scratch_a/release/24506/vbuild/vertica/OSS/UDxFence/RInterface.cpp:1387], error code: 0, message: Exception in processPartitionForR: [0 (non-NA) cases]
Here is the actual function:
require('splines')
timefy <- function(time) {
time = as.POSIXct(time, tz='utc', origin='1970-01-01T00:00:00Z')
return(time)
}
remove_temperature_correlation <- function(data, params=list()) {
names(data) = c('Time', 'Value', 'Temperature')
# Check params
df = params[['df']]
if (is.null(df))
df = 4
degree = params[['degree']]
if (is.null(degree))
degree = 1
# Convert Vertica timestamps to R's POSIXct format
data$ct = timefy(data$Time)
# Fit model
formula = Value ~ bs(Temperature, df = df, degree = degree)
fitmodel = lm(formula, data=data)
data$NormalizedValue = residuals(fitmodel)
return(data[c('Time', 'Value', 'Temperature', 'NormalizedValue')])
}
remove_temperature_correlation_parameters <- function()
{
num_params = 2
param = data.frame(datatype=rep(NA, num_params),
length=rep(NA, num_params),
scale=rep(NA, num_params),
name=rep(NA, num_params))
param[1,1] = 'int'
param[1,4] = 'df'
param[2,1] = 'int'
param[2,4] = 'degree'
return(param)
}
remove_temperature_correlation_return_type <- function(x, param)
{
num_params = 4
param = data.frame(datatype=rep(NA, num_params),
length=rep(NA, num_params),
scale=rep(NA, num_params),
name=rep(NA, num_params))
param[1,1] = 'timestamptz'
param[1,4] = 'Time'
param[2,1] = 'float'
param[2,4] = 'Value'
param[3,1] = 'float'
param[3,4] = 'Temperature'
param[4,1] = 'float'
param[4,4] = 'NormalizedValue'
return(param)
}
remove_temperature_correlation_factory <- function()
{
list(name=remove_temperature_correlation,
udxtype=c('transform'),
# time, value, temperature
intype=c('timestamptz', 'float', 'float'),
outtype=c('any'),
outtypecallback=remove_temperature_correlation_return_type,
parametertypecallback=remove_temperature_correlation_parameters,
volatility=c('stable'),
strict=c('called_on_null_input'))
}
I tried to reproduce a similar error in a test environment and found that supplying the function with just a single row raises the same error. I'm very new to Vertica and its UDFs, so it would really help to get some ideas on how to debug this further and on what the cause could be. Some googling and conversations led me to believe that the cause is in the way the data is partitioned (maybe some empty partitions arriving at the UDF?)
Here is how I call the UDF:
create local temporary table TEMP_table
on commit preserve rows
as
SELECT
Id,
remove_temperature_correlation(Time, Value, Temperature USING PARAMETERS df = 4, degree=1)
OVER(partition by Id order by Time)
FROM temp_input;
The temp_input table is simply storing the corresponding data in multiple rows.
What could be the way to solve it, any ideas on how to find out where exactly this error happens and how to handle it?
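One way to test the partitioning hypothesis outside Vertica is to guard the fit and see which inputs trigger the fallback. The sketch below is only an assumption about where the failure comes from: it supposes lm() is being handed a partition with too few usable rows. The helper name safe_residuals is hypothetical and the code has not been run inside a Vertica UDx.

```r
library(splines)

# Hypothetical guard: return NA residuals instead of failing when a
# partition has too few complete rows to fit the spline model.
safe_residuals <- function(data, df = 4, degree = 1) {
  ok <- complete.cases(data[c("Value", "Temperature")])
  out <- rep(NA_real_, nrow(data))
  # Need more usable rows than coefficients (df spline terms + intercept)
  if (sum(ok) <= df + 1) return(out)
  fit <- lm(Value ~ bs(Temperature, df = df, degree = degree),
            data = data[ok, , drop = FALSE])
  out[ok] <- residuals(fit)
  out
}

set.seed(1)
d <- data.frame(Time = 1:50, Value = rnorm(50),
                Temperature = runif(50, 0, 30))
head(safe_residuals(d))
safe_residuals(d[1, ])   # single-row partition falls back to NA
```

If the guarded version runs on the server, counting the partitions that return only NA would show whether small or all-NULL partitions are the trigger.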

glmulti wrapper for lmer does not produce results

I am using a glmulti wrapper for glmer (binomial) and the summary is:
This is glmulti 1.0.7, Apr. 2013.
Length Class Mode
0 NULL NULL
Following what was done in this thread (which is for lmer), "glmulti runs indefinitely when using genetic algorithm with lme4", I get the same result as above. Could it be that the versions have changed since then and the wrapping has to be done differently? The following is the dummy code (lifted from the link above):
x = as.factor(round(runif(30),1))# dummy grouping factor
yind = runif(30,0,10) # mock dependent variable
a = runif(30) # dummy covariate
b = runif(30) # another dummy covariate
c = runif(30) # an another one
d = runif(30)
tmpdata <- data.frame(x=x,yind=yind,a=a,b=b,c=c,d=d)
lmer.glmulti <- function (formula, data, random = "", ...) {
lmer(paste(deparse(formula), random), data = data, REML=F, ...)
}
summary(glmulti(formula = yind~a*b*c*d,
data = tmpdata,
random = '+(1|x)',
level = 2,
method = 'h',
crit = 'aicc',
marginality = TRUE,
fitfunc = lmer.glmulti))
lme4 version: 1.1.5
glmulti version: 1.0.7
"R version 3.0.2 (2013-09-25)"
SOLUTION
This works:
lmer.glmulti <- function (formula, data, random, ...) {
lmer(paste(deparse(formula), random), data = data)
}
glmulti(y = yind~a*b*c*d,
data = tmpdata,
random = '+(1|x)',
level = 2,
method = 'h',
crit = 'aicc',
marginality = TRUE,
fitfunc = lmer.glmulti)
packageVersion('lme4')
‘1.1.5’
packageVersion('glmulti')
‘1.0.7’
R.version: 3.1.0
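What the wrapper hands to lmer can be inspected with base R alone. This is a small sketch, not part of the original solution: the helper name add_random is hypothetical, and collapsing the deparse() output is an extra precaution, since deparse() can return several strings for a long formula.

```r
# Hypothetical helper: append a random-effects term (given as a string)
# to a fixed-effects formula, as the wrapper does before calling lmer.
add_random <- function(formula, random) {
  # Collapse first, in case deparse() splits a long formula into pieces
  fixed <- paste(deparse(formula, width.cutoff = 500L), collapse = " ")
  as.formula(paste(fixed, random))
}

full <- add_random(yind ~ a * b * c * d, "+(1|x)")
full
all.vars(full)
```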
FYI: From the package maintainer:
"fitfunc must be the name of a function so your other call including the function definition in the glmulti call cannot work."
"you named the first argument to glmulti 'formula', where it must be unnamed or 'y'... Sorry. But y is a formula (if passing a string it is the dependent variable only). "
