Uniform kernel function returning NaN - r

I'm writing my own uniform kernel function like so:
uniform.kernel <- function(data, predict.at, iv.name, dv.name, bandwidth){
#Load in the DV/IV and turn them into vectors
iv <- data$iv.name
dv <- data$dv.name
#Given the point we're predicting,
#what kernel weights does each observation of the iv receive?
kernelvalue <- ifelse(abs((iv - predict.at)/bandwidth)<= 1, 0.5,0)
#Given these kernel values and the dv,
#what is our estimate of the conditional expectation?
conditional.expectation <-sum(kernelvalue*dv)/sum(kernelvalue)
#Return the expectation
return(conditional.expectation)
}
And then applying it to this data:
set.seed(101)
x <- seq(from=0, to=100, by=.1)
errors <- runif(min=.5, max=5000, n=length(x))
y <- x^2 - 3*x + errors^1.1
combo.frame <- cbind.data.frame(x,y)
Only, when I apply the function to the data (like below), I get "NaN".
uniform.kernel(combo.frame, 20, "x","y", 4)
However, when I just write out the steps within my function to the data set directly (without using the function), I get the correct answer. For example, I do the following and get the correct results:
kernelvalue <- ifelse(abs((combo.frame$x - 20)/4)<= 1, 0.5,0)
conditional.expectation <- sum(kernelvalue*combo.frame$y)/sum(kernelvalue)
Why am I getting NaN when I use the function?

You can't use the $ operator with character objects like that. Use the [ operator instead. Replace the first two lines in your function like this:
iv <- data[,iv.name]
dv <- data[,dv.name]
and it works as expected.

Related

R: How to plot custom range of polynomial produced by lm poly fit

I'm confused by the coefficients produced by the output of lm
Here's a copy of the data I'm working with
(postprocessed.csv)
"","time","value"
"1",1,2.61066016308988
"2",2,3.41246054742996
"3",3,3.8608767964033
"4",4,4.28686048552237
"5",5,4.4923132964825
"6",6,4.50557049744317
"7",7,4.50944447661246
"8",8,4.51097373134893
"9",9,4.48788748823809
"10",10,4.34603985656981
"11",11,4.28677073671406
"12",12,4.20065901625172
"13",13,4.02514194962519
"14",14,3.91360194972916
"15",15,3.85865748409081
"16",16,3.81318053258601
"17",17,3.70380706527433
"18",18,3.61552922363713
"19",19,3.61405310598722
"20",20,3.64591327503384
"21",21,3.70234435835577
"22",22,3.73503970503372
"23",23,3.81003078640584
"24",24,3.88201196162666
"25",25,3.89872518158949
"26",26,3.97432743542362
"27",27,4.2523675144599
"28",28,4.34654855854847
"29",29,4.49276038902684
"30",30,4.67830892029687
"31",31,4.91896819673664
"32",32,5.04350767355202
"33",33,5.09073406942046
"34",34,5.18510849382162
"35",35,5.18353176529036
"36",36,5.2210776270173
"37",37,5.22643491929207
"38",38,5.11137006553725
"39",39,5.01052467981257
"40",40,5.0361056705898
"41",41,5.18149486951409
"42",42,5.36334869132276
"43",43,5.43053620818444
"44",44,5.60001072279525
I have fitted a 4th order polynomial to this data using the following script:
library(ggplot2)
library(matrixStats)
library(forecast)
df_input <- read.csv("postprocessed.csv")
x <- df_input$time
y <- df_input$value
df <- data.frame(x, y)
poly4model <- lm(y~poly(x, degree=4), data=df)
v <- seq(30, 40)
vv <- poly4model$coefficients[1] +
poly4model$coefficients[2] * v +
poly4model$coefficients[3] * (v ^ 2) +
poly4model$coefficients[4] * (v ^ 3) +
poly4model$coefficients[5] * (v ^ 4)
pdf("postprocessed.pdf")
plot(df)
lines(v, vv, col="red", pch=20, lw=3)
dev.off()
I initially tried using the predict function to do this, but couldn't get that to work, so resorted to implementing this "workaround" using some new vectors v and vv to store the data for the line in the region I am trying to plot.
Ultimatly, I am trying to do this:
Fit a 4th order polynomial to the data
Plot the 4th order polynomial over the range of data in one color
Plot the 4th order polynomial over the range from the last value to the last value + 10 (prediction) in a different color
At the moment I am fairly sure using v and vv to do this is not "the best way", however I would have thought it should work. What is happening is that I get very large values.
Here is a screenshot from Desmos. I copied and pasted the same coefficients as shown by typing poly4model$coefficients into the console. However, something must have gone wrong because this function is nothing like the data.
I think I've provided enough info to be able to run this short script. However I will add the pdf as well.
It is easiest to use the predict function to create your line. To do that, you pass the model and a data frame with the desired independent variables to the predict function.
x <- df_input$time
y <- df_input$value
df <- data.frame(x, y)
poly4model <- lm(y~poly(x, degree=4), data=df)
v <- seq(30, 40)
#Notice the column in the dataframe is the same variable name
# as the variable in the model!
predict(poly4model, data.frame(x=v))
plot(df)
lines(v, predict(poly4model, data.frame(x=seq(30, 40))), col="red", pch=20, lw=3)
NOTE
The function poly "Returns or evaluates orthogonal polynomials of degree 1 to degree over the specified set of points x: these are all orthogonal to the constant polynomial of degree 0." To return the "normal" polynomial coefficients one needs to use the "raw=TRUE" option in the function.
poly4model <- lm(y~poly(x, degree=4, raw=TRUE), data=df)
Now your equation above will work.

Principal component analysis using R. Automatic and manual results do not match

Two different methods of the principal component analysis were conducted to analyze the following data (ch082.dat) using the Box1's R-code, below.https://drive.google.com/file/d/1xykl6ln-bUnXIs-jIA3n5S3XgHjQbkWB/view?usp=sharing
The first method uses the rotation matrix (See 'ans_mat' under the '#rotated data' of the Box1's code) and,
the second method uses the 'pcomp' function (See 'rpca' under the '#rotated data' of the Box1's code).
However, there is a subtle discrepancy in the answer between the method using the rotation matrix and the method using the 'pcomp' function.
make it match
My Question
What should I do so that the result of the rotation matrix -based method matches the result of the'pcomp' function?
As far as I've tried with various data, including other data, the actual discrepancies seem to be limited to scale shifts and mirroring transformations.
The results of the rotation matrix -based method is shown in left panel.
The results of the pcomp function -based method is shown in right panel.
Mirror inversion can be seen in "ch082.dat" data.(See Fig.1);
It seems that, in some j, the sign of the "jth eigenvector of the correlation matrix" and the sign of the "jth column of the output value of the prcomp function" may be reversed. If there is a degree of overlap in the eigenvalues, it is possible that the difference may be more complex than mirror inversion.
Fig.1
There is a scale shift for the Box2's data (See See Fig.2), despite the centralization and normalization to the data.
Fig.2
Box.1
#dataload
##Use the 'setwd' function to specify the directory containing 'ch082.dat'.
##For example, if you put this file directly under the C drive of your Windows PC, you can run the following command.
setwd("C:/") #Depending on where you put the file, you may need to change the path.
getwd()
w1<-read.table("ch082.dat",header = TRUE,row.names = 1,fileEncoding = "UTF-8")
w1
#Function for standardizing data
#Thanks to https://qiita.com/ohisama2/items/5922fac0c8a6c21fcbf8
standalize <- function(data)
{ for(i in length(data[1,]))
{
x <- as.matrix(data[,i])
y <- (x-mean(x)/sd(x))
data[,i] <- y
}
return(data)}
#Method using rotation matrix
z_=standalize(w1)
B_mat=cor(z_) #Compute correlation matrix
eigen_m <- eigen(B_mat)
sample_mat <- as.matrix(z_)
ans_mat=sample_mat
for(j in 1:length(sample_mat[1,])){
ans_mat[,j]=sample_mat%*%eigen_m$vectors[,j]
}
#Method using "rpca" function
rpca <- prcomp(w1,center=TRUE, scale=TRUE)
#eigen vectors
eigen_m$vectors
rpca
#rotated data
ans_mat
rpca$x
#Graph Plots
par(mfrow=c(1,2))
plot(
ans_mat[,1],
ans_mat[,2],
main="Rotation using eigenvectors"
)
plot(rpca$x[,1], rpca$x[,2],
main="Principal component score")
par(mfrow=c(1,1))
#summary
summary(rpca)$importance
Box2.
sample_data <- data.frame(
X = c(2,4, 6, 5,7, 8,10),
Y = c(6,8,10,11,9,12,14)
)
X = c(2,4, 6, 5,7, 8,10)
Y = c(6,8,10,11,9,12,14)
plot(Y ~ X)
w1=sample_data
Reference
https://logics-of-blue.com/principal-components-analysis/
(Written in Japanease)
The two sets of results agree. First we can simplify your code a bit. You don't need your function or the for loop:
z_ <- scale(w1)
B_mat <- cor(z_)
eigen_m <- eigen(B_mat)
ans_mat <- z_ %*% eigen_m$vectors
Now the prcomp version
z_pca <- prcomp(z_)
z_pca$sdev^2 # Equals eigen_m$values
z_pca$rotation # Equals eigen_m$vectors
z_pca$x # Equals ans_mat
Your original code mislabeled ans_mat columns. They are actually the principal component scores. You can fix that with
colnames(ans_mat) <- colnames(z_pca$x)
The pc loadings (and therefore the scores) are not uniquely defined with respect to reflection. In other words multiplying all of the loadings or scores in one component by -1 flips them but does not change their relationships to one another. Multiply z_pca$x[, 1] by -1 and the plots will match:
z_pca$x[, 1] <- z_pca$x[, 1] * -1
dev.new(width=10, height=6)
par(mfrow=c(1,2))
plot(ans_mat[,1], ans_mat[,2], main="Rotation using eigenvectors")
plot(z_pca$x[,1], z_pca$x[,2], main="Principal component score")

sampling random values each iteration

I have some simulated data, on top of the data I add some noise to see how the noise affects my data for further analyses. I created the following function
create.noise <- function(n, amount_needed, mean, sd){
set.seed(25)
values <- rnorm(n, mean, sd)
returned.values <- sample(values, size=amount_needed)
}
I call this function in the following loop:
dataframe.noises <- as.data.frame(noises) #i create here a dataframe dim 1x45 containing zeros
for(i in 1:100){
noises <- as.matrix(create.noise(100,45,0,1))
dataframe.noises[,i] <- noises
data_w_noise <- df.data_responses+noises
Estimators <- solve(transposed_schema %*% df.data_schema) %*% (transposed_schema %*% data_w_noise)
df.calculated_estimators[,i] <-Estimators
}
The code above always returns the same values, one solution I tried is sending i as parameter(which i think isn't correct) for each iteration I add i to the set.seed(25+i)
This gives me a unique value for each iteration, butas mentioned I don't think that this is the correct way to go with it.

Returning 'traditional' notations of functions in the context of fourier interpolation

in numerical analysis we students are obligated to implement code in R that given a function f(x) finds its Fourier interpolation tN(x) and computes the interpolation error
$||f(x)-t_{N}(x)||=\int_{0}^{2\pi}$ $|f(x)-t_{N}(x)|^2$
or a variety of different $N$
I first tried to compute the d-coefficients according to this formular:
$d = \frac 1N M y$
with M denoting the DFT matrix and y denoting a series of equidistant function values with
$y_j = f(x_j)$ and
$x_j = e^{\frac{2*pi*i}N*j}$
for $j = 1,..,N-1$.
My goal was to come up with a sum that can be described by:
$t_{N}(x) = \Sigma_{k=0}^{N-1} d_k * e^{i*k*x}$
Which would be easier to later integrate in sort of a subsequently additive notation.
f <- function(x) 3/(6+4*cos(x)) #first function to compare with
g <- function(x) sin(32*x) #second one
xj <- function(x,n) 2*pi*x/n
M <- function(n){
w = exp(-2*pi*1i/n)
m = outer(0:(n-1),0:(n-1))
return(w^m)
}
y <- function(n){
f(xj(0:(n-1),n))
}
transformFunction <- function(n, f){
d = 1/n * t(M(n)) %*% f(xj(0:(n-1),n))
script <- paste(d[1])
for(i in 2:n)
script <- paste0(script,paste0("+",d[i],"*exp(1i*x*",i,")"))
#trans <- sum(d[1:n] * exp(1i*x*(0:(n-1))))
return(script)
}
The main purpose of the transform function was, initially, to return a function - or rather: a mathematical expression - which could then be used in order to declarate my Fourier Interpolation Function. Problem is, based on my fairly limited knowledge, that I cannot integrate functions that still have sums nested in them (which is why I commented the corresponding line in the code).
Out of absolute desperation I then tried to paste each of the summands in form of text subsequently, only to parse them again as an expression.
So the main question that remains is: how do I return mathmatical expressions in a manner that allow me to use them as a function and later on integrate them?
I am sincerely sorry for any misunderstanding or confusion, as well as my seemingly amateurish coding.
Thanks in advance!
A function in R can return any class, so specifically also objects of class function. Hence, you can make trans a function of x and return that.
Since the integrate function requires a vectorized function, we use Vectorize before outputting.
transformFunction <- function(n, f){
d = 1/n * t(M(n)) %*% f(xj(0:(n-1),n))
## Output function
trans <- function(x) sum(d[1:n] * exp(1i*x*(0:(n-1))))
## Vectorize output for the integrate function
Vectorize(trans)
}
To integrate, now simply make a new variable with the output of transformFunction:
myint <- transformFunction(n = 10,f = f)
Test: (integrate can only handle real-valued functions)
integrate(function(x) Re(myint(x)),0,2)$value +
1i*integrate(function(x) Im(myint(x)),0,2)$value
# [1] 1.091337-0.271636i

R code to estimate parameters from non-linear function

Suppose I have a data set. There are some categorical variables and some numerical variables. I want to estimate the parameters of a model e^{X'b} for every categories and others. I am trying to do it in R code. I need to do it by creating design matrix for categorical variables like age==2 and age==3, where considering age==1 as reference category. But this program is not running and giving errors. What is the problem?
sex <- c("F","M","F","M","F","M")
age <- c(1,3,2,3,1,2) # categorical variable with three age categories
age <- as.factor(age)
dat <- data.frame(sex,age)
myfun <- function(par, data){
xx <- data
func <- exp(par[1]*(xx$age==2)+par[2]*(xx$age==3)+par[3]*factor(xx$sex))
return(-func)
}
optim(myfun, par=c(0.1,0.4,0.7), data=dat)
Your function myfun returns a vector of length 6 (because you multiply with xx$sex). It needs to be of length 1. Also, the optim function takes the par first and the function as second parameter.
EDIT: You need to rewrite your function to return a single value. -exp(X'b) is a vector of length of your observations. Maybe this goes in your direction:
myfun1 <- function(par, data) {
xx <- matrix(c(as.numeric(dat$sex), age, rep(1, nrow(dat))), ncol=3)
sum(-exp(xx %*% par))
}
optim(c(0.1,0.4,0.7), myfun1, data=dat)
Note that it would be more efficient to pass xx to optim since the calculation of xx is independent of the iteration.

Resources