Implementing ECDF in R

I'm trying to implement the R function ecdf() myself.
I'm considering two cases: one where t is a scalar, the other where t is a vector.
#First case
my.ecdf<-function(x,t) {
indicator<-ifelse(x<=t,1,0)
out<-sum(indicator)/length(x)
out
}
#Second case
my.ecdf<-function(x,t) {
out<-length(t)
for(i in 1:length(t)) {
indicator<-ifelse(x<=t[i],1,0)
out[i]<-sum(indicator)/length(x)
}
out
}
How can I check against the R function ecdf() whether I'm doing the right thing? This function takes just x as an argument, so I can't specify the value of t.

You could just plot the results and see that it gives something very similar:
# slightly improved version of my.ecdf
my.ecdf<-function(x,t) {
out<-numeric(length(t))
for(i in 1:length(t)) {
indicator <- as.numeric(x<=t[i])
# divide by the sample size length(x), not by length(t)
out[i] <- sum(indicator)/length(x)
}
out
}
# test 1
x <- rnorm(1000)
plot(ecdf(x))
lines(seq(-4, 4, length=1000),
my.ecdf(x, seq(-4, 4, length=1000)),
col='red')
# test 2
x <- rexp(1000)
plot(ecdf(x))
lines(seq(0, 8, length=1000),
my.ecdf(x, seq(0, 8, length=1000)),
col='red')
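A numeric check is also possible, because ecdf(x) returns a function that can itself be evaluated at any t. The sketch below uses a compact vectorised my.ecdf (an assumed equivalent of the loop version, dividing by length(x)) and compares both on a few arbitrarily chosen evaluation points:

```r
# compact variant of my.ecdf: share of observations <= each element of t
my.ecdf <- function(x, t) {
  vapply(t, function(ti) mean(x <= ti), numeric(1))
}

set.seed(42)
x <- rnorm(200)
t <- c(-1, 0, 0.5, 2)      # arbitrary evaluation points

Fn <- ecdf(x)              # ecdf() returns a step function
comparison <- all.equal(my.ecdf(x, t), Fn(t))
comparison
```

If the two implementations agree, all.equal() reports TRUE; otherwise it describes the difference.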

A general tip - you can view the source code of any function by typing its name into the console without parentheses or arguments:
ecdf
function (x)
{
x <- sort(x)
n <- length(x)
if (n < 1)
stop("'x' must have 1 or more non-missing values")
vals <- unique(x)
rval <- approxfun(vals, cumsum(tabulate(match(x, vals)))/n,
method = "constant", yleft = 0, yright = 1, f = 0, ties = "ordered")
class(rval) <- c("ecdf", "stepfun", class(rval))
assign("nobs", n, envir = environment(rval))
attr(rval, "call") <- sys.call()
rval
}

Related

homals package for Nonlinear PCA in R: Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent

I am trying to implement NLPCA (Nonlinear PCA) on a data set using the homals package in R but I keep on getting the following error message:
Error in dimnames(x) <- dn : length of 'dimnames' [1] not equal to array extent
The data set I use can be found in the UCI ML Repository and it's called dat when imported in R: https://archive.ics.uci.edu/ml/datasets/South+German+Credit+%28UPDATE%29
Here is my code (some code is provided once the data set is downloaded):
nlpcasouthgerman <- homals(dat, rank=1, level=c('nominal','numerical',rep('nominal',2),
'numerical','nominal',
rep('ordinal',2), rep('nominal',2),
'ordinal','nominal','numerical',
rep('nominal',2), 'ordinal',
'nominal','ordinal',rep('nominal',3)),
active=c(FALSE, rep(TRUE, 20)), ndim=3, verbose=1)
I am trying to predict the first attribute, therefore I set it to be active=FALSE.
The output looks like this (skipped all iteration messages):
Iteration: 1 Loss Value: 0.000047
Iteration: 2 Loss Value: 0.000044
...
Iteration: 37 Loss Value: 0.000043
Iteration: 38 Loss Value: 0.000043
Error in dimnames(x) <- dn :
length of 'dimnames' [1] not equal to array extent
I don't understand why this error comes up. I have used the same code on some other data set and it worked fine so I don't see why this error persists. Any suggestions about what might be going wrong and how I could fix this issue?
Thanks!

The error seems to come from how the homals function handles factor levels that cannot be converted to numbers. For your data this affects the number_credits levels: sort(as.numeric((rownames(clist[[i]])))) silently drops the NAs produced by the conversion, and because one of the levels does convert to a number (it does not give an NA value), the subsequent attempt to catch the problem by checking for NAs fails.
So either you have to modify the homals function to take care of such an edge case, or change problematic factor levels. This might be something to file as a bug report to the package maintainer.
As a work-around in your case you could do something like:
levels(dat$number_credits)[1] <- "_1"
and the function should run without problems.
Edit:
I think one solution would be to change one line of code in the homals function, though there is no guarantee that this works as intended. It is better to submit a bug report to the package author/maintainer; see https://cran.r-project.org/web/packages/homals/ for the address.
Using rnames <- as.numeric(rownames(clist[[i]]))[order(as.numeric(rownames(clist[[i]])))] instead of rnames <- sort(as.numeric((rownames(clist[[i]])))) keeps the NAs (sort() drops them by default), so the code that follows can identify them. I am not sure why the author did not try to preserve factor levels outright.
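The difference between the two expressions can be seen on a toy vector. The level labels below are hypothetical stand-ins for a factor in which only one level is numeric; they are not read from the data set:

```r
# hypothetical row names: one converts to a number, the others do not
lev <- c("1", "2-3", "4-5", ">= 6")
num <- suppressWarnings(as.numeric(lev))   # 1 NA NA NA

sorted  <- sort(num)          # sort() drops NAs by default
ordered <- num[order(num)]    # order() keeps NAs (placed last)

length(sorted)                # shorter than lev
length(ordered)               # same length as lev
anyNA(sorted)                 # the NA check cannot fire on this one
anyNA(ordered)                # the NA check fires on this one
```

With the sort() version the result has fewer elements than there are rows and contains no NA, so the fallback to the original row names is never triggered and the later rownames() assignment fails.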
Anyway, you could run a modified function in your local environment, which requires explicitly calling internal (non-exported) homals functions via :::, as shown below. This is not necessarily the best approach, but it would help you out in a pinch.
homals <- function (data, ndim = 2, rank = ndim, level = "nominal", sets = 0,
active = TRUE, eps = 0.000001, itermax = 1000, verbose = 0) {
dframe <- data
name <- deparse(substitute(dframe))
nobj <- nrow(dframe)
nvar <- ncol(dframe)
vname <- names(dframe)
rname <- rownames(dframe)
for (j in 1:nvar) {
dframe[, j] <- as.factor(dframe[, j])
levfreq <- table(dframe[, j])
if (any(levfreq == 0)) {
newlev <- levels(dframe[, j])[-which(levfreq == 0)]
}
else {
newlev <- levels(dframe[, j])
}
dframe[, j] <- factor(dframe[, j], levels = sort(newlev))
}
varcheck <- apply(dframe, 2, function(tl) length(table(tl)))
if (any(varcheck == 1))
stop("Variable with only 1 value detected! Can't proceed with estimation!")
active <- homals:::checkPars(active, nvar)
rank <- homals:::checkPars(rank, nvar)
level <- homals:::checkPars(level, nvar)
if (length(sets) == 1)
sets <- lapply(1:nvar, "c")
if (!all(sort(unlist(sets)) == (1:nvar))) {
print(cat("sets union", sort(unlist(sets)), "\n"))
stop("inappropriate set structure !")
}
nset <- length(sets)
mis <- rep(0, nobj)
for (l in 1:nset) {
lset <- sets[[l]]
if (all(!active[lset]))
next
jset <- lset[which(active[lset])]
for (i in 1:nobj) {
if (any(is.na(dframe[i, jset])))
dframe[i, jset] <- NA
else mis[i] <- mis[i] + 1
}
}
for (j in 1:nvar) {
k <- length(levels(dframe[, j]))
if (rank[j] > min(ndim, k - 1))
rank[j] <- min(ndim, k - 1)
}
x <- cbind(homals:::orthogonalPolynomials(mis, 1:nobj, ndim))
x <- homals:::normX(homals:::centerX(x, mis), mis)$q
y <- lapply(1:nvar, function(j) homals:::computeY(dframe[, j], x))
sold <- homals:::totalLoss(dframe, x, y, active, rank, level, sets)
iter <- pops <- 0
repeat {
iter <- iter + 1
y <- homals:::updateY(dframe, x, y, active, rank, level, sets,
verbose = verbose)
smid <- homals:::totalLoss(dframe, x, y, active, rank, level,
sets)/(nobj * nvar * ndim)
ssum <- homals:::totalSum(dframe, x, y, active, rank, level, sets)
qv <- homals:::normX(homals:::centerX((1/mis) * ssum, mis), mis)
z <- qv$q
snew <- homals:::totalLoss(dframe, z, y, active, rank, level,
sets)/(nobj * nvar * ndim)
if (verbose > 0)
cat("Iteration:", formatC(iter, digits = 3, width = 3),
"Loss Value: ", formatC(c(smid), digits = 6,
width = 6, format = "f"), "\n")
r <- abs(qv$r)/2
ops <- sum(r)
aps <- sum(La.svd(crossprod(x, mis * z), 0, 0)$d)/ndim
if (iter == itermax) {
stop("maximum number of iterations reached")
}
if (smid > sold) {
warning(cat("Loss function increases in iteration ",
iter, "\n"))
}
if ((ops - pops) < eps)
break
else {
x <- z
pops <- ops
sold <- smid
}
}
ylist <- alist <- clist <- ulist <- NULL
for (j in 1:nvar) {
gg <- dframe[, j]
c <- homals:::computeY(gg, z)
d <- as.vector(table(gg))
lst <- homals:::restrictY(d, c, rank[j], level[j])
y <- lst$y
a <- lst$a
u <- lst$z
ylist <- c(ylist, list(y))
alist <- c(alist, list(a))
clist <- c(clist, list(c))
ulist <- c(ulist, list(u))
}
dimlab <- paste("D", 1:ndim, sep = "")
for (i in 1:nvar) {
if (ndim == 1) {
ylist[[i]] <- cbind(ylist[[i]])
ulist[[i]] <- cbind(ulist[[i]])
clist[[i]] <- cbind(clist[[i]])
}
options(warn = -1)
# Here is the line that I changed in the code:
# rnames <- sort(as.numeric((rownames(clist[[i]]))))
rnames <- as.numeric(rownames(clist[[i]]))[order(as.numeric(rownames(clist[[i]])))]
options(warn = 0)
if ((any(is.na(rnames))) || (length(rnames) == 0))
rnames <- rownames(clist[[i]])
if (!is.matrix(ulist[[i]]))
ulist[[i]] <- as.matrix(ulist[[i]])
rownames(ylist[[i]]) <- rownames(ulist[[i]]) <- rownames(clist[[i]]) <- rnames
rownames(alist[[i]]) <- paste(1:dim(alist[[i]])[1])
colnames(clist[[i]]) <- colnames(ylist[[i]]) <- colnames(alist[[i]]) <- dimlab
colnames(ulist[[i]]) <- paste(1:dim(as.matrix(ulist[[i]]))[2])
}
names(ylist) <- names(ulist) <- names(clist) <- names(alist) <- colnames(dframe)
rownames(z) <- rownames(dframe)
colnames(z) <- dimlab
dummymat <- as.matrix(homals:::expandFrame(dframe, zero = FALSE, clean = FALSE))
dummymat01 <- dummymat
dummymat[dummymat == 2] <- NA
dummymat[dummymat == 0] <- Inf
scoremat <- array(NA, dim = c(dim(dframe), ndim), dimnames = list(rownames(dframe),
colnames(dframe), paste("dim", 1:ndim, sep = "")))
for (i in 1:ndim) {
catscores.d1 <- do.call(rbind, ylist)[, i]
dummy.scores <- t(t(dummymat) * catscores.d1)
freqlist <- apply(dframe, 2, function(dtab) as.list(table(dtab)))
cat.ind <- sequence(sapply(freqlist, length))
scoremat[, , i] <- t(apply(dummy.scores, 1, function(ds) {
ind.infel <- which(ds == Inf)
ind.minfel <- which(ds == -Inf)
ind.nan <- which(is.nan(ds))
ind.nael <- which((is.na(ds) + (cat.ind != 1)) ==
2)
ds[-c(ind.infel, ind.minfel, ind.nael, ind.nan)]
}))
}
disc.mat <- apply(scoremat, 3, function(xx) {
apply(xx, 2, function(cols) {
(sum(cols^2, na.rm = TRUE))/nobj
})
})
result <- list(datname = name, catscores = ylist, scoremat = scoremat,
objscores = z, cat.centroids = clist, ind.mat = dummymat01,
loadings = alist, low.rank = ulist, discrim = disc.mat,
ndim = ndim, niter = iter, level = level, eigenvalues = r,
loss = smid, rank.vec = rank, active = active, dframe = dframe,
call = match.call())
class(result) <- "homals"
result
}

Time varying parameter-matrix in deSolve R

I have been struggling with this for a long time. I have a logistic growth function where the growth parameter
r is a matrix. The model is constructed so that it outputs two state variables, N1 and N2.
I would like to be able to change the r parameter over time. When time < 50 I would like
r = r1 where
r1=matrix(c(
2,3),
nrow=1, ncol=2)
When time >= 50 I would like r=r2 where
r2=matrix(c(
1,2),
nrow=1, ncol=2)
Here is my function. Any help is highly appreciated.
rm(list = ls())
library(deSolve)
model <- function(time, y, params) {
with(as.list(c(y,params)),{
N = y[paste("N",1:2, sep = "")]
dN <- r*N*(1-N/K)
return(list(c(dN)))
})
}
r=matrix(c(
4,5),
nrow=1, ncol=2)
K=100
params <- list(r,K)
y<- c(N1=0.1, N2=0.2)
times <- seq(0,100,1)
out <- ode(y, times, model, params)
plot(out)
Ideally I would like something like this, but it does not work:
model <- function(time, y, params) {
with(as.list(c(y,params)),{
N = y[paste("N",1:2, sep = "")]
r = ifelse(times < 10, matrix(c(1,3),nrow=1, ncol=2),
ifelse(times > 10, matrix(c(1,4),nrow=1, ncol=2), matrix(c(1,2),nrow=1, ncol=2)))
print(r)
dN <- r*N*(1-N/K)
return(list(c(dN)))
})
}
Thank you for your time.

Here is a generic approach that uses an extended version of the approx function. Note also some further simplifications of the model function and the additional plot of the parameter values.
Edit: changed according to the suggestion of Lewis Carter to make the parameter change at t=3, so that the effect can be seen.
library(simecol) # contains approxTime, a vector version of approx
model <- function(time, N, params) {
r <- approxTime(params$signal, time, rule = 2, f=0, method="constant")[-1]
K <- params$K
dN <- r*N*(1-N/K)
return(list(c(dN), r))
}
signal <- matrix(
# time, r[1, 2],
c( 0, 2, 3,
3, 1, 2,
100, 1, 2), ncol=3, byrow=TRUE
)
## test of the interpolation
approxTime(signal, c(1, 2.9, 3, 100), rule = 2, f=0, method="constant")
params <- list(signal = signal, K = 100)
y <- c(N1=0.1, N2=0.2)
times <- seq(0, 10, 0.1)
out <- ode(y, times, model, params)
plot(out)
For a small number of state variables, as in the example, separate signals created with approxfun from package stats look less generic but may be slightly faster.
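A minimal sketch of the approxfun variant, using the same break point at t = 3 as the signal matrix. The names r1_fun and r2_fun are illustrative. The model function is called directly here so that the block runs with base R only; it can be passed to ode() from deSolve in the same way as the model above.

```r
# one piecewise-constant signal per growth rate (names are illustrative)
r1_fun <- approxfun(c(0, 3, 100), c(2, 1, 1),
                    method = "constant", f = 0, rule = 2)
r2_fun <- approxfun(c(0, 3, 100), c(3, 2, 2),
                    method = "constant", f = 0, rule = 2)

model <- function(time, N, params) {
  r  <- c(r1_fun(time), r2_fun(time))   # growth rates valid at this time
  dN <- r * N * (1 - N / params$K)      # logistic growth
  list(c(dN), r)
}

y      <- c(N1 = 0.1, N2 = 0.2)
params <- list(K = 100)

before <- model(1, y, params)   # uses r = (2, 3)
after  <- model(5, y, params)   # uses r = (1, 2)
before[[2]]
after[[2]]
```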
As a further improvement, one may consider replacing the "hard" transitions with a smoother one. This can then be formulated directly as a function, without the need for approx, approxfun or approxTime.
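One possible way to write such a smooth transition is a logistic weight between the two parameter sets. This is only a sketch: the function name r_smooth, the switch time t0 and the steepness k are assumptions chosen for illustration, not part of the original answer.

```r
# smooth switch from r_before to r_after around time t0
# (t0 and k are illustrative; a larger k gives a sharper transition)
r_smooth <- function(time, r_before, r_after, t0 = 3, k = 10) {
  w <- 1 / (1 + exp(-k * (time - t0)))   # close to 0 before t0, close to 1 after
  (1 - w) * r_before + w * r_after
}

r_start <- r_smooth(0,  c(2, 3), c(1, 2))   # approximately (2, 3)
r_mid   <- r_smooth(3,  c(2, 3), c(1, 2))   # halfway between the two sets
r_end   <- r_smooth(10, c(2, 3), c(1, 2))   # approximately (1, 2)
```

Inside the model function, r would then be computed as r_smooth(time, ...) instead of being interpolated from a signal matrix.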
Edit 2:
Package simecol imports deSolve, and we need only a small function from it. So instead of loading simecol it is also possible to include the approxTime function explicitly in the code. The conversion from data frame to matrix improves performance, but a matrix is preferred anyway in such cases.
approxTime <- function(x, xout, ...) {
if (is.data.frame(x)) {x <- as.matrix(x); wasdf <- TRUE} else wasdf <- FALSE
if (!is.matrix(x)) stop("x must be a matrix or data frame")
m <- ncol(x)
y <- matrix(0, nrow=length(xout), ncol=m)
y[,1] <- xout
for (i in 2:m) {
y[,i] <- as.vector(approx(x[,1], x[,i], xout, ...)$y)
}
if (wasdf) y <- as.data.frame(y)
names(y) <- dimnames(x)[[2]]
y
}

If you want to pass a matrix parameter, you should pass a list of parameters; you can then modify it inside the model once your time limit is exceeded (in the example below you don't even have to pass the r matrix to the model function):
library(deSolve)
model <- function(time, y, params) {
with(as.list(c(y,params)),{
if(time < 3) r = matrix(c(2,3), nrow = 1, ncol = 2)
else r = matrix(c(1,3), nrow = 1, ncol = 2)
N = y[paste("N",1:2, sep = "")]
dN <- r*N*(1-N/K)
return(list(c(dN)))
})
}
y <- c(N1=0.1, N2=0.2)
params <- list(r = matrix(c(0,0), nrow = 1, ncol = 2), K=100)
times <- seq(0,10,0.1)
out <- ode(y, times, model, params)
plot(out)
You can see examples of this, for instance, in the help page for delay differential equations: ?dede

Error message about the plot in regression model

I have an R program for a regression that gives me an error message that I do not understand. The regression model takes as input heat data (Q_htg) and the corresponding temperature data (T_amb) and then builds a linear regression for those two variables. Afterwards I want to use the trained regression model to predict some outputs. Here is the code:
linearPowerScaling2.function <-
function(Dataset,
numberOfDaysForAggregation,
normOutsideTemperature) {
heatingPower <- Dataset$Q_htg
outSideTemperature <- Dataset$T_amb
aggregationLevel <- numberOfDaysForAggregation * 1440
index <- 0
meanValuesOutsideTemperature <-
vector(, length(outSideTemperature) / aggregationLevel)
for (i in seq(1, length(outSideTemperature), aggregationLevel)) {
sum <- 0
for (j in seq(i, i + aggregationLevel - 1, 1)) {
sum <- sum + outSideTemperature[j]
}
index <- index + 1
meanValuesOutsideTemperature[index] <- sum / aggregationLevel
}
index <- 0
meanValuesHeatingDemand <-
vector(, length(heatingPower) / aggregationLevel)
for (i in seq(1, length(heatingPower), aggregationLevel)) {
sum <- 0
for (j in seq(i, i + aggregationLevel - 1, 1)) {
sum <- sum + heatingPower[j]
}
index <- index + 1
meanValuesHeatingDemand[index] <- sum / aggregationLevel
}
linearModel <-
lm(meanValuesHeatingDemand ~ meanValuesOutsideTemperature)
abline(linearModel, col = "red")
pred <- predict(linearModel, data.frame(meanValuesOutsideTemperature = c(normOutsideTemperature)))
List<-list(meanValuesHeatingDemand, meanValuesOutsideTemperature)
List2 <- vector("list", length(heatingPower)/aggregationLevel)
for (i in seq(1, length(meanValuesHeatingDemand),1)){
List2 [[i]]<-c(meanValuesHeatingDemand[i], meanValuesOutsideTemperature[i])
}
List3<-List2[order(sapply(List2, function(x) x[1], simplify=TRUE), decreasing=FALSE)]
firstTemperatureWithHeatingDemand<-0
firstHeatingDemand<-0
for (i in seq(1, length(List3), 1)) {
if(List3[[i]][1]>0) {
firstTemperatureWithHeatingDemand<-List3[[i]][2]
firstHeatingDemand<-List3[[i]][1]
break}
}
regression2ValuesX <- vector(, 5)
regression2ValuesY <- vector(, 5)
regression2ValuesX [1] <- firstTemperatureWithHeatingDemand
regression2ValuesY [1] <-firstHeatingDemand
List3<-List2[order(sapply(List2, function(x) x[1], simplify=TRUE), decreasing=TRUE)]
for (i in seq(1, length(regression2ValuesX) - 1, 1)) {
regression2ValuesX[i + 1]<-List3[[i]][2]
regression2ValuesY[i + 1]<-List3[[i]][1]
}
plot(regression2ValuesX, regression2ValuesY)
linearModel2 <-
lm(regression2ValuesY ~ regression2ValuesX)
abline(linearModel2, col = "blue")
pred <- predict(linearModel2, data.frame(regression2ValuesX = c(normOutsideTemperature)))
paste("Predicted heating demand:", round(pred))
}
When I run with the command
linearPowerScaling2.function(data_heat_test, 1, -12)
I get the error message:
Error in int_abline(a = a, b = b, h = h, v = v, untf = untf, ...) :
plot.new has not been called yet
3. int_abline(a = a, b = b, h = h, v = v, untf = untf, ...)
2. abline(linearModel, col = "red") at LinearPowerScaling2_Function.R#33
1. linearPowerScaling2.function(data_heat_test, 1, -12)
The data itself should be okay. Can anyone tell me, what the problem is?

Without a minimal reproducible example it's hard to test whether this solves it, but the error message tells you that you are calling abline() before calling plot().
That's exactly what happens on line 33: abline(linearModel, col = "red") runs before any plot has been created.
Hope this helps.
Check here to see how to make a minimal reproducible example.
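The fix amounts to creating the plot first and only then adding the regression line. A minimal sketch with made-up temperature and demand values (the variable names temp and demand are illustrative and not taken from the question's data set):

```r
# avoid writing a plot file when this is run as a script
if (!interactive()) pdf(NULL)

set.seed(1)
temp   <- c(-10, -5, 0, 5, 10, 15)          # made-up outside temperatures
demand <- 20 - 1.2 * temp + rnorm(6)        # made-up heating demand

fit <- lm(demand ~ temp)

plot(temp, demand)            # plot() must come first: it opens the plot
abline(fit, col = "red")      # abline() can only add to an existing plot

pred <- predict(fit, data.frame(temp = -12))
```

In the function from the question, this corresponds to calling plot(meanValuesOutsideTemperature, meanValuesHeatingDemand) before the first abline().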

Manual simulation of Markov Chain in R (3)

I have tried to improve my previous code so that I can incorporate conditional probability.
Source Code
states <- c(1, 2)
alpha <- c(1, 1)/2
mat <- matrix(c(0.5, 0.5,
0, 1), nrow = 2, ncol = 2, byrow = TRUE)
# this function calculates the next state, if present state is given.
# X = present states
# pMat = probability matrix
nextX <- function(X, pMat)
{
#set.seed(1)
probVec <- vector() # initialize vector
if(X == states[1]) # if the present state is 1
{
probVec <- pMat[1,] # take the 1st row
}
if(X==states[2]) # if the present state is 2
{
probVec <- pMat[2,] # take the 2nd row
}
return(sample(states, 1, replace=TRUE, prob=probVec)) # calculate the next state
}
# this function simulates n1 steps
steps <- function(alpha1, mat1, n1)
{
vec <- vector(mode="numeric", length = n1+1) # initialize an empty vector
X <- sample(states, 1, replace=TRUE, prob=alpha1) # initial state
vec[1] <- X
for (i in 2:(n1+1))
{
X <- nextX(X, mat1)
vec[i] <- X
}
return (vec)
}
# this function repeats the simulation n1 times.
# steps(alpha1=alpha, mat1=mat, n1=5)
simulate <- function(alpha1, mat1, n1)
{
mattt <- matrix(nrow=n1, ncol=6, byrow=T);
for (i in 1:(n1))
{
temp <- steps(alpha1, mat1, 5)
mattt[i,] <- temp
}
return (mattt)
}
Execution
I created this function so that it can handle any conditional probability:
prob <- function(simMat, fromStep, toStep, fromState, toState)
{
mean(simMat[toStep+1, simMat[fromStep+1, ]==fromState]==toState)
}
sim <- simulate(alpha, mat, 10)
p <- prob(sim, 0,1,1,1) # P(X1=1|X0=1)
p
Output
NaN
Why is this source code giving NaN?
How can I correct it?

I didn't inspect the rest of your code, but it seems that only prob has a mistake; you are mixing up rows with columns and instead it should be
prob <- function(simMat, fromStep, toStep, fromState, toState)
mean(simMat[simMat[, fromStep + 1] == fromState, toStep + 1] == toState)
Then NaN still remains a valid possibility for the following reason. We are looking at a conditional probability P(X1=1|X0=1) which, by definition, is well defined only when P(X0=1)>0. The same holds with sample estimates: if there are no cases where X0=1, then the "denominator" in the mean inside of prob is zero. Thus, it cannot and should not be fixed (i.e., returning 0 in those cases would be wrong).
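Both points can be checked on small hand-made matrices (rows are simulation runs, columns are steps 0, 1, 2). The matrices sim_a and sim_b are invented for illustration:

```r
prob <- function(simMat, fromStep, toStep, fromState, toState)
  mean(simMat[simMat[, fromStep + 1] == fromState, toStep + 1] == toState)

# two of three runs start in state 1; one of those stays in state 1
sim_a <- rbind(c(1, 1, 2),
               c(1, 2, 2),
               c(2, 2, 2))
p_a <- prob(sim_a, 0, 1, 1, 1)   # estimate of P(X1=1|X0=1)

# no run starts in state 1, so the condition is never met
sim_b <- rbind(c(2, 2, 2),
               c(2, 2, 2))
p_b <- prob(sim_b, 0, 1, 1, 1)   # mean over an empty selection

p_a
p_b
```

In the second case the selection is empty, and mean() of a zero-length vector is NaN.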

Creating a function in R but getting a replacement has length zero error

I tried to create a function f so that, when a value x is passed in, it returns the corresponding y value. But when I try to run the code to plot, I get an error that says that my y_values replacement has length zero.
f <- function(x){
if (x<0){
print(y_values<-x*x*x)
}
if(x>0 & x<=1){
print(y_values<-x*x)
}
if(x>1){
print(y_values<-sqrt(x))
}
}
x_values <- seq(-2, 2, by = 0.1)
y_values <- rep(NA, length(x_values))
for (i in seq_along(x_values)) {
x <- x_values[i]
y_values[i] <- f(x)
}
# output
plot(x_values, y_values, type = "l")

Two issues:
From ?print
‘print’ prints its argument and returns it invisibly (via
‘invisible(x)’)
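So your function f prints the values to the console instead of explicitly returning them. What it actually returns is the value of its last expression, the final if statement, which is NULL whenever x <= 1.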
As per your definition of f, the function does not know how to deal with x=0; so this will create a problem when you store the output of f(0) later.
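The resulting error can be reproduced in isolation. The function g below is a hypothetical stand-in with a single branch; an if statement whose condition is FALSE and that has no else evaluates to NULL:

```r
# hypothetical function with no branch for x <= 0
g <- function(x) {
  if (x > 0) x^2
}

res <- g(0)          # no branch matched, so the result is NULL
is.null(res)

y <- rep(NA, 3)
# assigning NULL to a vector element fails; capture the message instead of stopping
msg <- tryCatch({
  y[1] <- g(0)
  "no error"
}, error = function(e) conditionMessage(e))
msg
```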
We can fix these issues by slightly altering f as
f <- function(x) {
y_values <- NA
if (x<0){
y_values<-x*x*x
}
if(x>0 & x<=1){
y_values<-x*x
}
if(x>1){
y_values<-sqrt(x)
}
return(y_values)
}
Then
x_values <- seq(-2, 2, by = 0.1)
y_values <- rep(NA, length(x_values))
for (i in seq_along(x_values)) {
x <- x_values[i]
y_values[i] <- f(x)
}
plot(x_values, y_values, type = "l")
You could also use Vectorize to obtain a vectorised function f2, which allows you to pass x_values as a vector, thereby avoiding the explicit for loop:
f2 <- Vectorize(f)
x_values <- seq(-2, 2, by = 0.1)
y_values <- f2(x_values)
The resulting plot is the same.

I would recommend you explore other methods for coding something like this.
Here is one option that doesn't use a for loop. If you are specifically practising for loops, then the fix Mauritus Evers made should work for you.
library(tidyverse)
data.frame(x_values = seq(-2, 2, by = 0.1)) %>%
mutate(y_values = case_when(x_values < 0 ~ x_values^3,
x_values>=0 & x_values<=1 ~ x_values^2,
x_values>1 ~ sqrt(x_values))) %>%
ggplot(aes(x_values, y_values)) + geom_point()
Note that I changed your code to produce output when x_values = 0.
