For functions like f = tanh, the derivative can be written as df[i,j] = f[i]*(delta(i,j) - f[j]). When g is a vector or column matrix, df can be calculated in three ways, using a loop, a transpose, or apply:
set.seed(5)
g1<-matrix(rnorm(3),ncol=1)
f1<-tanh(g1)
df1a = matrix(NA,nrow=nrow(f1),ncol=nrow(f1))
df1b = matrix(NA,nrow=nrow(f1),ncol=nrow(f1))
for (i in seq(1,nrow(f1))) {
for (j in seq(1,nrow(f1))) {
df1a[i,j] = -f1[j]*f1[i]
df1b[i,j] = -f1[j]*f1[i]
}
df1b[i,i] = df1b[i,i]+f1[i]
}
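# Transpose version: negated outer product of f1 with itself
df2a <- -f1 %*% t(f1)
df2b <- diag(as.vector(f1))+df2a
# apply version: multiply f1 by each of its elements in turn
df3a <- (-1)*apply(f1,1,'*',f1)
df3b <- diag(as.vector(f1))+df3a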
does_m1_m2_match = all.equal(df1b, df2b, tolerance = 1e-5)
does_m1_m3_match = all.equal(df1b, df3b, tolerance = 1e-5)
does_m1_m2_match
does_m1_m3_match
How can this be extended to the case where g is a matrix that should be treated as a collection of column vectors? Here is the loop implementation. What would the apply implementation look like?
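# Assumption: f2 is not defined in the original post. Here each column
# repeats f1, so that the comparison with df1b at the end applies.
f2 <- cbind(f1, f1)
ddf2a = array(NA,c(nrow(f2),nrow(f2),ncol(f2)))
ddf2b = array(NA,c(nrow(f2),nrow(f2),ncol(f2)))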
for (k in seq(1,ncol(f2))) {
for (i in seq(1,nrow(f2))) {
for (j in seq(1,nrow(f2))) {
ddf2a[i,j,k] = -f2[j,k]*f2[i,k]
ddf2b[i,j,k] = -f2[j,k]*f2[i,k]
}
ddf2b[i,i,k] = ddf2b[i,i,k]+f2[i,k]
}
}
for (k in seq(1,ncol(f2))) {
does_m1_md_match = all.equal(df1b, ddf2b[,,k], tolerance = 1e-5)
print(paste('k',k,does_m1_md_match))
}
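One possible apply-style answer to the matrix case, offered as a sketch rather than a definitive implementation: build the Jacobian for a single column with diag() and tcrossprod(), then let vapply() stack one Jacobian per column into a 3-d array. The helper name jacobian_col and the example matrix g2 are assumptions made for illustration; they do not appear in the original post.

```r
set.seed(5)
g2 <- matrix(rnorm(6), nrow = 3)   # hypothetical input: two column vectors
f2 <- tanh(g2)

# Jacobian for one column f, using the formula f[i]*(delta(i,j) - f[j])
jacobian_col <- function(f) diag(f, nrow = length(f)) - tcrossprod(f)

# With a matrix-shaped FUN.VALUE, vapply returns an array of
# dimension (nrow, nrow, ncol): one slice per column of f2
ddf_apply <- vapply(seq_len(ncol(f2)),
                    function(k) jacobian_col(f2[, k]),
                    matrix(0, nrow(f2), nrow(f2)))

# Reference: explicit triple loop, as in the question
ddf_loop <- array(NA_real_, c(nrow(f2), nrow(f2), ncol(f2)))
for (k in seq_len(ncol(f2))) {
  for (i in seq_len(nrow(f2))) {
    for (j in seq_len(nrow(f2))) {
      ddf_loop[i, j, k] <- -f2[j, k] * f2[i, k]
    }
    ddf_loop[i, i, k] <- ddf_loop[i, i, k] + f2[i, k]
  }
}

# Compare the two results
isTRUE(all.equal(ddf_apply, ddf_loop))
```

vapply is used here instead of sapply because it fixes the shape of each result up front, so the output is always a 3-d array even when f2 has a single column.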
I am working on data replacement in a matrix. Each value is replaced by the standard deviation (sd) of the (1+2k) x (1+2k) submatrix centered on that value.
replace.loop = function(n, m, k, pad){
#search the value row by row, column by column
for (i in n) {
for (j in m) {
pad[i,j] = sd(as.vector(pad[(i-k):(i+k),(j-k):(j+k)]))
}
}
return(pad) # return the matrix after the calculation has finished
}
Is there any way to rewrite this function with one of the apply() functions? I am new to R, so I am not sure which apply() function I should use.
For example:
X = matrix(c(.5,.5,.4,.4,.3,.5,.5,.4,.3,.3,.4,.4,.3,.2,.2,.4,.4,.3,.2,.1,.3,.3,.2,.2,.1), ncol=5)
k = 1
pad.X = matrix(0, dim(X)[1]+2*k, dim(X)[2]+2*k)
n = (k+1):(dim(X)[1]+k); m = (k+1):(dim(X)[2]+k)
pad.X[n, m] = X
Thanks!
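I created a copy of pad called padOut, so that every sd is computed from the original values rather than from values that have already been replaced. mapply returns the results as a vector, which you can then assign to the relevant cells of padOut using matrix indexing: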
replace.apply = function(n, m, k, pad){
kk <- -k:k
idx <- expand.grid(n, m)
padOut <- pad
padOut[as.matrix(idx)] <- mapply(function(i) sd(pad[idx[i,1] + kk, idx[i, 2] + kk]), 1:(length(m)*length(n)))
return(padOut)
}
With your example data:
X = matrix(c(.5,.5,.4,.4,.3,.5,.5,.4,.3,.3,.4,.4,.3,.2,.2,.4,.4,.3,.2,.1,.3,.3,.2,.2,.1), ncol=5)
k = 1
pad.X = matrix(0, dim(X)[1]+2*k, dim(X)[2]+2*k)
n = (k+1):(dim(X)[1]+k); m = (k+1):(dim(X)[2]+k)
pad.X[n, m] = X
replace.loop = function(n, m, k, pad){
#search the value row by row, column by column
padOut <- pad
for (i in n) {
for (j in m) {
padOut[i,j] = sd(as.vector(pad[(i-k):(i+k),(j-k):(j+k)]))
}
}
return(padOut) # return the matrix after the calculation has finished
}
pad1 <- replace.loop(n, m, k, pad.X)
pad2 <- replace.apply(n, m, k, pad.X)
identical(pad1, pad2)
#> [1] TRUE
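As a variation on the mapply answer, here is a minimal sketch that uses outer() with Vectorize() so the result comes back already shaped as a length(n) x length(m) matrix. The names replace.outer and window_sd are hypothetical and are not part of the original answer.

```r
# Example data from the question
X <- matrix(c(.5,.5,.4,.4,.3,.5,.5,.4,.3,.3,.4,.4,.3,.2,.2,
              .4,.4,.3,.2,.1,.3,.3,.2,.2,.1), ncol = 5)
k <- 1
pad.X <- matrix(0, nrow(X) + 2*k, ncol(X) + 2*k)
n <- (k+1):(nrow(X)+k); m <- (k+1):(ncol(X)+k)
pad.X[n, m] <- X

replace.outer <- function(n, m, k, pad) {
  kk <- -k:k
  # sd of the (1+2k) x (1+2k) window centered on cell (i, j) of the original pad
  window_sd <- function(i, j) sd(pad[i + kk, j + kk])
  padOut <- pad
  # outer() passes all (i, j) combinations at once; Vectorize() makes
  # window_sd handle them one pair at a time
  padOut[n, m] <- outer(n, m, Vectorize(window_sd))
  padOut
}

pad3 <- replace.outer(n, m, k, pad.X)
```

Like the mapply version, this still calls sd once per cell, so it is mainly a readability change and not a speed-up.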
Below is a function that extracts p-values from multiple Chi-Square tests and displays them as a matrix. I'm trying to do the same, but to extract residuals instead. Any help is appreciated.
Sample data:
df <- data.frame(first_column = c(rep("E1_C1",5), rep("E1_C2",3), rep("E2_C2",7),rep("E3_C3",5)),
second_column = c(rep("E1_C1",3), rep("E1_C2",10), rep("E2_C2",4),rep("E3_C3",3)),
third_column = c(rep("E1_C1",7), rep("E1_C2",4), rep("E2_C2",3),rep("E3_C3",6)),
fourth_column = c(rep("E1_C1",4), rep("E1_C2",6), rep("E2_C2",6),rep("E3_C3",4))
)
Chi-square matrix function for P-Values:
chisqmatrix <- function(x) {
names = colnames(x); num = length(names)
m = matrix(nrow=num,ncol=num,dimnames=list(names,names))
for (i in 1:(num-1)) {
for (j in (i+1):num) {
#browser()
m[j,i] = chisq.test(x[, i, drop = TRUE],x[, j, drop = TRUE])$p.value
}
}
return (m)
}
Generate Chi-Square p-value matrix
res <- chisqmatrix(df)
res[, -ncol(res)]
In your case, the residuals returned by each test form a 4x4 matrix. Instead of storing the results in a matrix, the following solution stores them in a list. This way you can hold matrices of different sizes.
With minimal changes from your original code:
chisqlist <- function(x) {
names = colnames(x); num = length(names)
m = list()
index = 1
for (i in 1:(num-1)) {
for (j in (i+1):num) {
#browser()
m[[index]] = chisq.test(x[, i, drop = TRUE],x[, j, drop = TRUE])$residuals
index=index+1
}
}
return (m)
}
Edit:
I do prefer @Onyambu's answer, which I didn't see. It would be faster than a nested for loop.
Simply change your function from requesting $p.value to requesting $residuals. This gives the Pearson residuals, (observed - expected) / sqrt(expected). If you want standardized residuals, request $stdres. Because each set of residuals is itself a matrix, m has to be a list matrix so that a single cell can hold it.
chisqmatrix <- function(x) {
names = colnames(x); num = length(names)
# list matrix: each cell can hold a whole residual matrix
m = matrix(vector("list", num*num),nrow=num,ncol=num,dimnames=list(names,names))
for (i in 1:(num-1)) {
for (j in (i+1):num) {
m[[j,i]] = chisq.test(x[, i, drop = TRUE],x[, j, drop = TRUE])$residuals
}
}
return (m)
}
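As a loop-free variation on the list approach, here is a sketch that uses combn() to enumerate the column pairs and lapply() to collect one residual matrix per pair. The list element names (for example "first_column vs second_column") are an illustrative choice, not something taken from the answers above.

```r
df <- data.frame(
  first_column  = c(rep("E1_C1",5), rep("E1_C2",3),  rep("E2_C2",7), rep("E3_C3",5)),
  second_column = c(rep("E1_C1",3), rep("E1_C2",10), rep("E2_C2",4), rep("E3_C3",3)),
  third_column  = c(rep("E1_C1",7), rep("E1_C2",4),  rep("E2_C2",3), rep("E3_C3",6)),
  fourth_column = c(rep("E1_C1",4), rep("E1_C2",6),  rep("E2_C2",6), rep("E3_C3",4))
)

# All unordered pairs of column names
pairs <- combn(colnames(df), 2, simplify = FALSE)

# One Pearson residual matrix per pair; chisq.test may warn about
# small expected counts on data this size, so warnings are suppressed here
res <- lapply(pairs, function(p) {
  suppressWarnings(chisq.test(df[[p[1]]], df[[p[2]]]))$residuals
})
names(res) <- vapply(pairs, paste, character(1), collapse = " vs ")

res[["first_column vs second_column"]]
```

Naming the list elements makes it easier to tell which pair of columns each residual matrix belongs to than with positional indices.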
Hi so I have the following function:
kde.cv = function(X,s) {
l = length(X)
log.fhat.vector = c()
for (i in 1:l) {
current.log.fhat = log ( kde(X[i],X[-i],s) )
log.fhat.vector[i] = current.log.fhat
}
CV.score = sum(log.fhat.vector)
return(CV.score)
}
I'd like to vectorize this without using any for loops or apply statements, but I can't seem to find a way to do so. Help would be appreciated. Thanks.
EDIT: Following the requests for clarification, here are the function inputs and the user-defined function called inside kde.cv. X is a dataset in the form of a vector; the one I used as input has length 7: c(-1.1653, -0.7538, -1.3218, -2.3394, -1.9766, -1.8718, -1.5041). s is a single scalar, set to 0.2 here. kde is a user-defined function that I wrote. Here is the implementation:
kde = function(x,X,s){
l = length(x)
b = matrix(X,l,length(X),byrow = TRUE)
c = x - b
phi.matrix = dnorm(c,0,s)
d = rowMeans(phi.matrix)
return(d)
}
In this function, X is the same vector of data points used in kde.cv, and s is the same scalar value of 0.2. x is a vector of evaluation points for the function; I used seq(-2.5, -0.5, by = 0.1).
Here is an option using sapply
kde.cv = function(X,s)
sum(sapply(1:length(X), function(i) log(kde(X[i], X[-i], s))))
For convenience, please provide a more complete example, including the kde() function. Is that a custom function?
As an alternative to sapply, you can try Vectorize(). There are examples on Stack Overflow, such as the question "Vectorize() vs apply()".
Here is an example
f1 <- function(x,y) return(x+y)
f2 <- Vectorize(f1)
f1(1:3, 2:4)
[1] 3 5 7
f2(1:3, 2:4)
[1] 3 5 7
and the second example
f1 <- function(x)
{
new.vector<-c()
for (i in 1:length(x))
{
new.vector[i]<-sum(x[i] + x[-i])
}
return(sum(new.vector))
}
f2<-function(x)
{
f3<-function(y, i)
{
u<-sum(y[i]+y[-i])
return(u)
}
f3.v<-Vectorize(function(i) f3(y = x, i=i))
new.value<-f3.v(1:length(x))
return(sum(new.value))
}
f1(1:3)
[1] 24
f2(1:3)
[1] 24
Note: Vectorize is a wrapper for mapply
EDIT 1
Based on your response, I edited your kde.cv function.
kde.cv = function(X,s) {
l = length(X)
log.fhat.vector = c()
for (i in 1:l) {
current.log.fhat = log ( kde(X[i],X[-i],s) )
log.fhat.vector[i] = current.log.fhat
}
CV.score = sum(log.fhat.vector)
return(CV.score)
}
kde = function(x,X,s){
l = length(x)
b = matrix(X,l,length(X),byrow = TRUE)
c = x - b
phi.matrix = dnorm(c,0,s)
d = rowMeans(phi.matrix)
return(d)
}
##### Vectorize kde.cv ######
kde.cv.v = function(X,s)
{
log.fhat.vector = c()
kde.v<-Vectorize(function(i) kde(X[i], X[-i], s))
CV.score <- sum(log(kde.v(1:length(X))))
return(CV.score)
}
X<-c(-1.1653, -0.7538, -1.3218, -2.3394, -1.9766, -1.8718, -1.5041)
s<-0.2
x<-seq(-2.5, -0.5, by = 0.1)
kde.cv(X, s)
[1] -10.18278
kde.cv.v(X, s)
[1] -10.18278
EDIT 2
I think the following functions may meet your requirement. Since the lowercase x is not used in your kde.cv, I edited both functions to take only X and s.
kde.cv.2 <- function(X,s)
{
log.fhat.vector<-log(kde.2(X, s))
CV.score = sum(log.fhat.vector)
return(CV.score)
}
kde.2<-function(X, s)
{
l <- length(X)
b <- matrix(rep(X, l), l, l, byrow = TRUE)
c <- X - b
diag(c) <- NA
phi.matrix <- dnorm(c, 0, s)
d <- rowMeans(phi.matrix, na.rm = TRUE)
return(d)
}
X<-c(-1.1653, -0.7538, -1.3218, -2.3394, -1.9766, -1.8718, -1.5041)
s<-0.2
kde.cv(X,s)
[1] -10.18278
kde.cv.2(X, s)
[1] -10.18278
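The same leave-one-out idea can also be sketched with outer(), which builds the matrix of pairwise differences directly. This is a hedged illustration, not the answerer's code: the names kde.cv.outer and kde.cv.ref are hypothetical, and kde.cv.ref inlines what kde() computes for a single evaluation point.

```r
kde.cv.outer <- function(X, s) {
  # phi[i, j] = dnorm(X[i] - X[j], 0, s) for every pair of data points
  phi <- dnorm(outer(X, X, "-"), 0, s)
  diag(phi) <- 0   # drop each point's own contribution (leave-one-out)
  # mean over the remaining length(X) - 1 points, then sum of logs
  sum(log(rowSums(phi) / (length(X) - 1)))
}

# Reference: loop-style version, with kde(X[i], X[-i], s) written out
kde.cv.ref <- function(X, s) {
  sum(sapply(seq_along(X), function(i) log(mean(dnorm(X[i] - X[-i], 0, s)))))
}

X <- c(-1.1653, -0.7538, -1.3218, -2.3394, -1.9766, -1.8718, -1.5041)
s <- 0.2
all.equal(kde.cv.outer(X, s), kde.cv.ref(X, s))
```

Setting the diagonal to 0 and dividing by length(X) - 1 plays the same role as the NA diagonal with na.rm = TRUE in kde.2.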
I'm trying to loop through integers 1:1000 comparing the result of a function I've created with an R function. Specifically, I have:
floor.log2 = function(n) {
x = 1
i = 0
while (x <= n) {
x = 2*x
i = i + 1
}
print(i-1)
}
And I want to compare with:
floor(log(n, base = 2))
Every comparison loop I've created ends up printing each index 1:1000 - what is a succinct way to compare results for 1:1000 in these functions without R printing the indices?
I would modify the existing function you wrote to:
floor.log2 = function(n) {
x = 1
i = 0
while (x <= n) {
x = 2*x
i = i + 1
}
return(i-1)
}
To test:
iter <- 1000 # Number of integers to test (1:1000, as in the question)
vec1 <- c() # Container of your custom function
vec2 <- c() # Container for the comparison function
for(i in 1:iter) {
vec1[i] <- floor.log2(i)
vec2[i] <- floor(log(i, base = 2))
}
Finally:
all(vec1 == vec2)
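The same check can be written without growing vectors inside a loop. The sketch below assumes the return() version of floor.log2 and uses vapply() for the custom function, while the built-in expression is already vectorized; the variable names mine, builtin, and mismatches are illustrative.

```r
floor.log2 <- function(n) {
  x <- 1
  i <- 0
  while (x <= n) {
    x <- 2*x
    i <- i + 1
  }
  return(i - 1)   # return the value instead of printing it
}

n <- 1:1000
mine    <- vapply(n, floor.log2, numeric(1))  # custom function, one call per integer
builtin <- floor(log(n, base = 2))            # built-in version works on the whole vector

all(mine == builtin)                  # single TRUE/FALSE, nothing printed per index
mismatches <- n[mine != builtin]      # integers where the two disagree, if any
```

Because nothing inside floor.log2 calls print(), only the final all() result is shown at the console.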
I am trying to create a function to calculate the Box-Cox transformation in R, where you iterate values of lambda (lambdas) in a formula to maximize L. What I ultimately want is a vector of L, such that for all i in lambda, there is a corresponding L value.
y <- c(256,256,231,101,256,213,241,246,207,143,287,240,262,234,146,255,184,161,252,229,283,132,218,113,194,237,181,262,104)
df <- 28
n=29
lambdas <- seq(-3,3,0.001)
L <- c(rep(NA,length(lambdas)))
for(i in lambdas) {
if(i != 0) {
yprime <- (((y^i)-1)/i)
} else
{ yprime <- log(y)
}
st2 <- var(yprime)
L <- (((-df/2)*(log(st2))) + ((i-1)*(df/n)*(sum(log(y)))))
}
What I end up with is L as a vector of length 1 that holds only the value from the final iteration.
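Use seq_along to generate an index into both lambdas[] and L[], so that each iteration stores its result in L[i] instead of overwriting L. The zero test then has to look at lambdas[i], not at the index i:
for(i in seq_along(lambdas)) {
# compare the lambda value, not the index; the tolerance covers the case where seq() does not hit 0 exactly
if(abs(lambdas[i]) > 1e-12) {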
yprime <- (((y^lambdas[i])-1)/lambdas[i])
} else {
yprime <- log(y)
}
st2 <- var(yprime)
L[i] <- (((-df/2)*(log(st2))) + ((lambdas[i]-1)*(df/n)*(sum(log(y)))))
}
plot(L)
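The loop can also be folded into a function applied over lambdas, which gives the vector L directly and makes it easy to pick the lambda that maximizes it. This is a sketch under the same formula as the question; the names boxcox_loglik and best_lambda are hypothetical, and the 1e-12 tolerance for treating lambda as zero is an assumption.

```r
y <- c(256,256,231,101,256,213,241,246,207,143,287,240,262,234,146,
       255,184,161,252,229,283,132,218,113,194,237,181,262,104)
n  <- length(y)   # 29 observations
df <- n - 1       # 28, as in the question

# L for a single lambda, using the formula from the question
boxcox_loglik <- function(lambda) {
  # treat lambda within floating-point noise of 0 as 0 (assumed tolerance)
  yprime <- if (abs(lambda) > 1e-12) (y^lambda - 1) / lambda else log(y)
  (-df/2) * log(var(yprime)) + (lambda - 1) * (df/n) * sum(log(y))
}

lambdas <- seq(-3, 3, 0.001)
L <- vapply(lambdas, boxcox_loglik, numeric(1))  # one L value per lambda

best_lambda <- lambdas[which.max(L)]             # lambda with the largest L
plot(lambdas, L, type = "l")
```

Plotting L against lambdas, instead of against the index, puts the actual lambda values on the x-axis.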