Error in FUN(newX[, i], ...) : could not find function "sim"

I am learning R from a book and was working through an example in which, for each row of a matrix, the corresponding element of the result vector is 1 or 0, depending on whether the majority of the first d elements in that row are 1 or 0. The code used was:
copymaj <- function(rw,d) {
maj <- sum(rw[1:d]) / d
return(if(maj > 0.5) 1 else 0)
}
x
[,1] [,2] [,3] [,4] [,5]
[1,] 1 0 1 1 0
[2,] 1 1 1 1 0
[3,] 1 0 0 1 1
[4,] 0 1 1 1 0
apply(x,1,copymaj,3)
This produces the error above. Calling apply(x,1,copymaj(3)) also produces an error.
R 2.13 is installed
Please help!

As @BenBarnes pointed out, you probably misspelled sum; I think you wrote sim instead of sum.
I was able to reproduce your error by doing:
copymaj0 <- function(rw,d) {
maj <- sim(rw[1:d]) / d # here you have sim, this causes the error
return(if(maj > 0.5) 1 else 0)
}
copymaj1 <- function(rw,d) {
maj <- sum(rw[1:d]) / d # here you have sum which works well for me
return(if(maj > 0.5) 1 else 0)
}
x <- matrix(c(1,0,1,1,0,
1,1,1,1,0,
1,0,0,1,1,
0,1,1,1,0), ncol=5, byrow=TRUE)
apply(x,1,copymaj0,3) # prints error
Error in FUN(newX[, i], ...) : could not find function "sim"
apply(x,1,copymaj1,3) # works well
[1] 1 1 0 1
I really think you misspelled sum.
apply(x,1,copymaj1(3)) won't work because, as ?apply shows, the signature is apply(X, MARGIN, FUN, ...). Writing copymaj1(3) calls the function itself with rw = 3 and d missing, so it fails instead of handing a function to apply. Optional arguments to FUN (d = 3 in your case) have to be supplied through ..., as in apply(x,1,copymaj1,3), not as apply(x,1,copymaj1(3)).
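The equivalent ways of handing d to the function through apply can be sketched as follows, reusing the function and matrix from above. The names by_position, by_name and by_wrapper are only illustrative.

```r
# Same function and matrix as in the answer above.
copymaj <- function(rw, d) {
  maj <- sum(rw[1:d]) / d
  if (maj > 0.5) 1 else 0
}
x <- matrix(c(1,0,1,1,0,
              1,1,1,1,0,
              1,0,0,1,1,
              0,1,1,1,0), ncol = 5, byrow = TRUE)

# Extra arguments after FUN are forwarded to copymaj through `...`.
by_position <- apply(x, 1, copymaj, 3)
by_name     <- apply(x, 1, copymaj, d = 3)
# An anonymous wrapper makes the call explicit.
by_wrapper  <- apply(x, 1, function(rw) copymaj(rw, 3))

print(by_position)
```

All three calls give the same vector; naming the argument (d = 3) tends to be the easiest to read.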


Detecting zero in solution of linear system of equations (Ax=b)

Suppose the following system of equations Ax = b with:
> A <- matrix(c(2,0,-1,0,0,2,2,1,-1,2,0,0,0,1,0,0), ncol = 4)
> A
[,1] [,2] [,3] [,4]
[1,] 2 0 -1 0
[2,] 0 2 2 1
[3,] -1 2 0 0
[4,] 0 1 0 0
> b <- c(-2,5,0,0)
Solving these equations with solve() yields:
> x <- solve(A,b)
> x
[1] 6.66e-16 4.44e-16 2.00e+00 1.00e+00
This is just an example, but A and b can be of any form.
I need to detect whether any component of x is 0. Now, the first two components should actually be 0, but they are both higher than the machine epsilon .Machine$double.eps = 2.22e-16 which makes them very small, but not equal to zero.
I think I understand that this is caused by rounding errors in floating point arithmetic inside solve(). What I need to know is whether it is possible (from a practical point of view) to determine an upper bound on these errors, so that 0s can be detected. For example, instead of
> x == 0
[1] FALSE FALSE FALSE FALSE
one would use something like this:
> x > -1e-15 & x < 1e-15
[1] TRUE TRUE FALSE FALSE
Giving more insight into this problem would be appreciated.
One way to approach this is to check whether we can find a better solution to the linear system if we assume the components to be zero. For that we would want to solve A[,3:4]%*%y=b, since A%*%c(0,0,x[3],x[4])=A[,3:4]%*%c(x[3],x[4]). This is an overdetermined system, so we can't use solve to find a solution. We can, however, use qr.solve:
> x.new = c(0,0,qr.solve(A[,3:4],b))
It remains to check if this solution is really better:
> norm(A%*%x.new - b) < norm(A%*%x - b)
[1] TRUE
Thus we have a good reason to suspect that x[1]==x[2]==0.
In this simple example it is obviously possible to guess the true solution by looking at the approximate solution:
> x.true = c(0,0,2,1)
> norm(A%*%x.true - b)
[1] 0
This is however not very helpful in the general case.
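A complementary and simpler check is to treat a component as zero when it is tiny relative to the largest component of the solution. The sketch below assumes a relative tolerance of sqrt(.Machine$double.eps), which is a common rule of thumb and not a rigorous error bound (the achievable accuracy also depends on how well conditioned A is). The names tol and is_zero are illustrative.

```r
A <- matrix(c(2,0,-1,0, 0,2,2,1, -1,2,0,0, 0,1,0,0), ncol = 4)
b <- c(-2, 5, 0, 0)
x <- solve(A, b)

# Assumed rule-of-thumb tolerance, scaled by the largest component of x.
tol <- sqrt(.Machine$double.eps) * max(abs(x))
is_zero <- abs(x) < tol

print(is_zero)
```

For this example the first two components are flagged as zero and the last two are not. For an ill-conditioned A the tolerance would likely need to be looser.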

How to implement q-learning in R?

I am learning about q-learning and found a Wikipedia post and this website.
According to the tutorials and pseudo code I wrote this much in R
#q-learning example
#http://mnemstudio.org/path-finding-q-learning-tutorial.htm
#https://en.wikipedia.org/wiki/Q-learning
set.seed(2016)
iter=100
dimension=5;
alpha=0.1 #learning rate
gamma=0.8 #exploration/ discount factor
# n x n matrix
Q=matrix( rep( 0, len=dimension*dimension), nrow = dimension)
Q
# R -1 is fire pit,0 safe path and 100 Goal state########
R=matrix( sample( -1:0, dimension*dimension,replace=T,prob=c(1,2)), nrow = dimension)
R[dimension,dimension]=100
R #reward matrix
################
for(i in 1:iter){
row=sample(1:dimension,1)
col=sample(1:dimension,1)
I=Q[row,col] #randomly choosing initial state
Q[row,col]=Q[row,col]+alpha*(R[row,col]+gamma*max(Qdash-Q[row,col])
#equation from wikipedia
}
But I have a problem with max(Qdash-Q[row,col], which according to the website is Max[Q(next state, all actions)]. How do I programmatically search all actions for the next state?
The second problem is this pseudo code
Do While the goal state hasn't been reached.
Select one among all possible actions for the current state.
Using this possible action, consider going to the next state.
Get maximum Q value for this next state based on all possible actions.
Compute: Q(state, action) = R(state, action) + Gamma * Max[Q(next state, all actions)]
Set the next state as the current state.
End Do
Is it this?
while(Q<100){
Q[row,col]=Q[row,col]+alpha*(R[row,col]+gamma*max(Qdash-Q[row,col])
}
This post is by no means a complete implementation of Q-learning in R. It is an attempt to answer the OP with regard to the description of the algorithm on the website linked in the post and in Wikipedia.
The assumption here is that the reward matrix R is as described in the website. Namely that it encodes reward values for possible actions as non-negative numbers, and -1's in the matrix represent null values (i.e., where there is no possible action to transition to that state).
With this setup, an R implementation of the Q update is:
Q[cs,ns] <- Q[cs,ns] + alpha*(R[cs,ns] + gamma*max(Q[ns, which(R[ns,] > -1)]) - Q[cs,ns])
where
cs is the current state at the current point in the path.
ns is the new state based on a (randomly) chosen action at the current state. This action is chosen from the collection of possible actions at the current state (i.e., for which R[cs,] > -1). Since the state transition itself is deterministic here, the action is the transition to the new state.
For this action resulting in ns, we want to add its maximum (future) value over all possible actions that can be taken at ns. This is the so-called Max[Q(next state, all actions)] term in the linked website and the "estimate of optimal future value" in Wikipedia. To compute this, we want to maximize over the ns-th row of Q but consider only columns of Q for which columns of R at the corresponding ns-th row are valid actions (i.e., for which R[ns,] > -1). Therefore, this is:
max(Q[ns, which(R[ns,] > -1)])
An interpretation of this value is a one-step look ahead value or an estimate of the cost-to-go in dynamic programming.
The equation in the linked website is the special case in which alpha, the learning rate, is 1. We can view the equation in Wikipedia as:
Q[cs,ns] <- (1-alpha)*Q[cs,ns] + alpha*(R[cs,ns] + gamma*max(Q[ns, which(R[ns,] > -1)]))
where alpha "interpolates" between the old value Q[cs,ns] and the learned value R[cs,ns] + gamma*max(Q[ns, which(R[ns,] > -1)]). As noted in Wikipedia,
In fully deterministic environments, a learning rate of alpha=1 is optimal
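As a quick numerical check that the incremental form and the interpolation form of the update are the same, here is a small sketch with made-up scalar values standing in for the matrix entries (q_old, reward and q_next are hypothetical names, not part of the answer's code):

```r
alpha  <- 0.3   # learning rate (arbitrary illustrative value)
gamma  <- 0.8   # discount factor
q_old  <- 10    # stands in for Q[cs,ns]
reward <- 0     # stands in for R[cs,ns]
q_next <- 50    # stands in for max(Q[ns, which(R[ns,] > -1)])

# Incremental form: old value plus alpha times the temporal difference.
incremental  <- q_old + alpha * (reward + gamma * q_next - q_old)
# Interpolation form: weighted average of old value and learned value.
interpolated <- (1 - alpha) * q_old + alpha * (reward + gamma * q_next)
print(c(incremental, interpolated))

# With alpha = 1 the old value drops out, leaving the website's formula.
alpha_one <- q_old + 1 * (reward + gamma * q_next - q_old)
print(alpha_one)
```

Both forms give 19 here, and with alpha = 1 the result is reward + gamma * q_next = 40.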
Putting it all together into a function:
q.learn <- function(R, N, alpha, gamma, tgt.state) {
## initialize Q to be zero matrix, same size as R
Q <- matrix(rep(0,length(R)), nrow=nrow(R))
## loop over episodes
for (i in 1:N) {
## for each episode, choose an initial state at random
cs <- sample(1:nrow(R), 1)
## iterate until we get to the tgt.state
while (1) {
## choose next state from possible actions at current state
## Note: if only one possible action, then choose it;
## otherwise, choose one at random
next.states <- which(R[cs,] > -1)
if (length(next.states)==1)
ns <- next.states
else
ns <- sample(next.states,1)
## this is the update
Q[cs,ns] <- Q[cs,ns] + alpha*(R[cs,ns] + gamma*max(Q[ns, which(R[ns,] > -1)]) - Q[cs,ns])
## break out of while loop if target state is reached
## otherwise, set next.state as current.state and repeat
if (ns == tgt.state) break
cs <- ns
}
}
## return resulting Q normalized by max value
return(100*Q/max(Q))
}
where the input parameters are:
R is the rewards matrix as defined in the blog
N is the number of episodes to iterate
alpha is the learning rate
gamma is the discount factor
tgt.state is the target state of the problem.
Using the example in the linked website as a test:
N <- 1000
alpha <- 1
gamma <- 0.8
tgt.state <- 6
R <- matrix(c(-1,-1,-1,-1,0,-1,-1,-1,-1,0,-1,0,-1,-1,-1,0,-1,-1,-1,0,0,-1,0,-1,0,-1,-1,0,-1,0,-1,100,-1,-1,100,100),nrow=6)
print(R)
## [,1] [,2] [,3] [,4] [,5] [,6]
##[1,] -1 -1 -1 -1 0 -1
##[2,] -1 -1 -1 0 -1 100
##[3,] -1 -1 -1 0 -1 -1
##[4,] -1 0 0 -1 0 -1
##[5,] 0 -1 -1 0 -1 100
##[6,] -1 0 -1 -1 0 100
Q <- q.learn(R,N,alpha,gamma,tgt.state)
print(Q)
## [,1] [,2] [,3] [,4] [,5] [,6]
##[1,] 0 0 0.0 0 80 0.00000
##[2,] 0 0 0.0 64 0 100.00000
##[3,] 0 0 0.0 64 0 0.00000
##[4,] 0 80 51.2 0 80 0.00000
##[5,] 64 0 0.0 64 0 100.00000
##[6,] 0 80 0.0 0 80 99.99994
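Once Q has been learned, one way to use it is to follow the largest Q value from each state until the target is reached. The helper greedy_path below is a hypothetical addition, not part of the answer; it hard-codes the Q matrix printed above and breaks ties by taking the first maximum.

```r
# Q matrix copied from the printed result above.
Q <- matrix(c( 0,  0,  0.0,  0, 80,   0.00000,
               0,  0,  0.0, 64,  0, 100.00000,
               0,  0,  0.0, 64,  0,   0.00000,
               0, 80, 51.2,  0, 80,   0.00000,
              64,  0,  0.0, 64,  0, 100.00000,
               0, 80,  0.0,  0, 80,  99.99994),
            nrow = 6, byrow = TRUE)

# Follow the highest-valued action from `start` until `tgt.state`.
# max.steps guards against cycling if Q has not converged.
greedy_path <- function(Q, start, tgt.state, max.steps = nrow(Q)) {
  path <- start
  cs <- start
  for (step in seq_len(max.steps)) {
    if (cs == tgt.state) break
    cs <- which.max(Q[cs, ])   # first maximum wins on ties
    path <- c(path, cs)
  }
  path
}

print(greedy_path(Q, start = 3, tgt.state = 6))
```

Starting from state 3 this follows 3, 4, 2, 6. State 4 has two equally good actions (2 and 5), so a different tie-breaking rule could equally give 3, 4, 5, 6.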

within function warnings when using while in a function

I wrote a function to calculate the critical depth of water in a circular channel
when the flow (Q) and diameter (d) are given:
D_Critic<- function (Q,Dia) {
g=9.81
Diff=1
Phi=0.01
while(Diff>=0.001) {
A=16*Q*sqrt((2/g)*sin(Phi/2))
B=Dia^5/2*(Phi-sin(Phi))^3/2
Diff=A-B
Phi=Phi+0.001
Yc=Dia/2*(1-cos(Phi/2))
}
return(Yc)
}
Now I want to use the within function to bind Yc to the data frame DQ, but it returns only the first calculated Yc, along with several repeated warnings:
Q<-c(2.5975,2.5900,2.4183,2.3077)
D<-c(1,1,1,1)
DQ<-data.frame(Q,D)
> D_Q<-within(DQ,Yc<-D_Critic( Q/2, D))
There were 50 or more warnings (use warnings() to see the first 50)
> D_Q
Q D Yc
1 2.5975 1 0.52609
2 2.5900 1 0.52609
3 2.4183 1 0.52609
4 2.3077 1 0.52609
> warnings()
Warning messages:
1: In while (Diff >= 0.001) { ... :
the condition has length > 1 and only the first element will be used
The while statement takes a single logical value, e.g. Diff >= 0.001 where Diff must be a single number. The first time you go through the loop this is the case, as Diff equals 1. In the second iteration, however, Diff becomes equal to A-B, where A and B are both vectors of length 4.
So, when your code reaches the second iteration, while issues a warning, because it cannot use a whole vector of logicals. It simply uses the first element of the logical vector and discards the rest.
You need to consider what Diff actually is. It is probably meant to be a single number, e.g. Diff <- sum(A-B) or Diff <- sum((A-B)^2). This would give a single Diff value and get rid of the warnings. What Diff should be exactly depends on the theory you are working with; your textbook should describe it.
I resolved it by calling the function once per row:
Yc<-matrix(NA,length(DQ$Q),1)
for (i in 1:length(DQ$Q)) {
Yc[i,1]<- D_Critic(DQ$Q[i]/2,DQ$D[i])
}
> Yc
[,1]
[1,] 0.5260900
[2,] 0.5255907
[3,] 0.5163489
[4,] 0.5098512
DQ<-cbind(DQ,Yc)
> DQ
Q D Yc
1 2.5975 1 0.5260900
2 2.5900 1 0.5255907
3 2.4183 1 0.5163489
4 2.3077 1 0.5098512
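The same row-by-row evaluation can be written without an explicit loop. The following sketch uses mapply, which calls D_Critic once per (Q, D) pair so that the while condition only ever sees a single value; D_Critic is copied unchanged from the question.

```r
# D_Critic as defined in the question (expects scalar Q and Dia).
D_Critic <- function(Q, Dia) {
  g <- 9.81
  Diff <- 1
  Phi <- 0.01
  while (Diff >= 0.001) {
    A <- 16 * Q * sqrt((2 / g) * sin(Phi / 2))
    B <- Dia^5 / 2 * (Phi - sin(Phi))^3 / 2
    Diff <- A - B
    Phi <- Phi + 0.001
    Yc <- Dia / 2 * (1 - cos(Phi / 2))
  }
  return(Yc)
}

DQ <- data.frame(Q = c(2.5975, 2.5900, 2.4183, 2.3077),
                 D = c(1, 1, 1, 1))

# One scalar call per row; the results are collected into a vector.
DQ$Yc <- mapply(D_Critic, DQ$Q / 2, DQ$D)
print(DQ)
```

This should reproduce the Yc column obtained with the for loop above, since it performs the same scalar calls.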

Feature hashing in R for Text classification

I'm trying to implement feature hashing in R to help me with a text classification problem, but I'm not sure I'm doing it the way it should be done. Part of my code is based on this post: Hashing function for mapping integers to a given range?.
My code:
random.data = function(n = 200, wlen = 40, ncol = 10){
random.word = function(n){
paste0(sample(c(letters, 0:9), n, TRUE), collapse = '')
}
matrix(replicate(n, random.word(wlen)), ncol = ncol)
}
feature_hash = function(doc, N){
doc = as.matrix(doc)
library(digest)
idx = matrix(strtoi(substr(sapply(doc, digest), 28, 32), 16L) %% (N + 1), ncol = ncol(doc))
sapply(1:N, function(r)apply(idx, 1, function(v)sum(v == r)))
}
set.seed(1)
doc = random.data(50, 16, 5)
feature_hash(doc, 3)
[,1] [,2] [,3]
[1,] 2 0 1
[2,] 2 1 1
[3,] 2 0 1
[4,] 0 2 1
[5,] 1 1 1
[6,] 1 0 1
[7,] 1 2 0
[8,] 2 0 0
[9,] 3 1 0
[10,] 2 1 0
So, I'm basically converting the strings to integers using the last 5 hex digits of the md5 hash returned by digest. Questions:
1 - Is there any package that can do this for me? I haven't found any.
2 - Is it a good idea to use digest as the hash function? If not, what can I do?
PS: I should test whether it works before posting, but my files are quite big and take a lot of processing time, so I think it is smarter to have someone point me in the right direction, because I'm sure I'm doing it wrong!
Thanks for any help on this!
I don't know of any existing CRAN package for this.
However, I wrote a package for myself to do feature hashing. The source code is here: https://github.com/wush978/FeatureHashing, but the API is different.
In my case, I use it to convert a data.frame to CSRMatrix, a customized sparse matrix in the package. I also implemented a helper function to convert the CSRMatrix to Matrix::dgCMatrix. For text classification, I guess the sparse matrix will be more suitable.
If you want to try it, please check the test script here: https://github.com/wush978/FeatureHashing/blob/master/tests/test-conver-to-dgCMatrix.R
Note that I have only used it on Ubuntu, so I don't know whether it works on Windows or macOS. Please feel free to ask any questions about the package at https://github.com/wush978/FeatureHashing/issues.
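For reference, the hashing trick itself can be sketched in a few lines of base R. This is only an illustration under stated assumptions: toy_hash is a made-up polynomial string hash standing in for a real hash such as the md5 from digest, and hash_features is a hypothetical helper. One detail worth noting about the question's code is that %% (N + 1) produces values 0..N while only buckets 1..N are counted, so tokens that hash to 0 are dropped; the sketch uses %% N + 1 so that every token lands in a bucket.

```r
# Toy polynomial string hash (an assumption for illustration only);
# a real application would use a proper hash function.
toy_hash <- function(s) {
  h <- 0
  for (code in utf8ToInt(s)) h <- (h * 31 + code) %% 2147483647
  h
}

# Map every token in every row to a bucket in 1..N, then count per bucket.
hash_features <- function(doc, N) {
  doc <- as.matrix(doc)
  bucket <- function(word) toy_hash(word) %% N + 1
  counts <- apply(doc, 1, function(words) {
    tabulate(vapply(words, bucket, numeric(1)), nbins = N)
  })
  t(counts)  # one row per document, one column per bucket
}

doc <- matrix(c("apple", "pear",  "fig",
                "kiwi",  "apple", "apple"), nrow = 2, byrow = TRUE)
hashed <- hash_features(doc, 4)
print(hashed)
```

Each row of the result sums to the number of tokens in that document, which is a cheap sanity check that no token was lost.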

Trouble with applying a nested loop on a list

I have a list consisting of 3 elements:
datalist=list(a=datanew1,b=datanew2,c=datanew3)
datalist$a :
Inv_ret Firm size leverage Risk Liquidity Equity
17 0.04555968 17.34834 0.1323199 0.011292273 0.02471489 0
48 0.01405835 15.86315 0.6931730 0.002491093 0.12054914 0
109 0.04556252 16.91602 0.1714068 0.006235836 0.01194579 0
159 0.04753472 14.77039 0.3885720 0.007126830 0.06373028 0
301 0.03941040 16.94377 0.1805346 0.005450653 0.01723319 0
datalist$b :
Inv_ret Firm size leverage Risk Liquidity Equity
31 0.04020832 18.13300 0.09326265 0.015235240 0.01579559 0.005025379
62 0.04439078 17.84086 0.11016402 0.005486982 0.01266566 0.006559096
123 0.04543250 18.00517 0.12215307 0.011154742 0.01531451 0.002282790
173 0.03960613 16.45457 0.10828643 0.011506857 0.02385191 0.009003780
180 0.03139643 17.57671 0.40063094 0.003447233 0.04530395 0.000000000
datalist$c :
Inv_ret Firm size leverage Risk Liquidity Equity
92 0.03081029 19.25359 0.10513159 0.01635201 0.025760806 0.000119744
153 0.03280746 19.90229 0.11731517 0.01443786 0.006769735 0.011999005
210 0.04655847 20.12543 0.11622403 0.01418010 0.003125632 0.003802365
250 0.03301018 20.67197 0.13208234 0.01262499 0.009418828 0.021400052
282 0.04355975 20.03012 0.08588316 0.01918129 0.004213846 0.023657440
I am trying to run cor.test on each element of the datalist above:
Cor.tests=sapply(datalist,function(x){
for(h in 1:length(names(x))){
for(i in 1:length(names(x$h[i]))){
for(j in 1:length(names(x$h[j]))){
cor.test(x$h[,i],x$h[,j])$p.value
}}}})
But I get an error :
Error in cor.test.default(x$h[, i], x$h[, j]) :
'x' must be a numeric vector
Any suggestions about what I am doing wrong?
P.S. If I simply have one dataframe, datanew1 :
Inv_ret Firm size leverage Risk Liquidity Equity
17 0.04555968 17.34834 0.1323199 0.011292273 0.02471489 0
48 0.01405835 15.86315 0.6931730 0.002491093 0.12054914 0
109 0.04556252 16.91602 0.1714068 0.006235836 0.01194579 0
159 0.04753472 14.77039 0.3885720 0.007126830 0.06373028 0
301 0.03941040 16.94377 0.1805346 0.005450653 0.01723319 0
I use this loop :
results=matrix(NA,nrow=6,ncol=6)
for(i in 1:length(names(datanew1))){
for(j in 1:length(names(datanew1))){
results[i,j]<-cor.test(datanew1[,i],datanew1[,j])$p.value
}}
And the output is:
results :
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.000000e+00 7.085663e-09 3.128975e-10 3.018239e-02 4.806400e-10 0.475139526
[2,] 7.085663e-09 0.000000e+00 2.141581e-21 0.000000e+00 2.247825e-20 0.454032499
[3,] 3.128975e-10 2.141581e-21 0.000000e+00 2.485924e-25 2.220446e-16 0.108643838
[4,] 3.018239e-02 0.000000e+00 2.485924e-25 0.000000e+00 5.870007e-15 0.006783324
[5,] 4.806400e-10 2.247825e-20 2.220446e-16 5.870007e-15 0.000000e+00 0.558827862
[6,] 4.751395e-01 4.540325e-01 1.086438e-01 6.783324e-03 5.588279e-01 0.000000000
Which is exactly what I want. But I want to get 3 matrices, one for each element of the datalist above.
EDIT:
If I do as Joran says:
Cor.tests=lapply(datalist,function(x){
results=matrix(NA,nrow=6,ncol=6)
for(i in 1:length(names(x))){
for(j in 1:length(names(x))){
results[i,j]<-cor.test(x[,i],x[,j])$p.value
}}})
I get:
$a
NULL
$b
NULL
$c
NULL
This can be done without for loops.
1) A solution with base R:
lapply(datalist,
function(datanew) outer(seq_along(datanew),
seq_along(datanew),
Vectorize(function(x, y)
cor.test(datanew[ , x],
datanew[ , y])$p.value)))
2) A solution with the package psych:
library(psych)
lapply(datalist, function(datanew) corr.test(datanew)$p)
A modified version of approach in the question:
lapply(datalist, function(x) {
results <- matrix(NA,nrow=6,ncol=6)
for(i in 1:6){
for(j in 1:6){
results[i,j]<-cor.test(x[,i],x[,j])$p.value
}
}
return(results)
})
There were two major problems in these commands:
The matrix results was not returned. I added return(results) to the function.
You want to have a 6 by 6 matrix whereas your data frames have seven columns. I replaced 1:length(names(x)) with 1:6 in the for loops.
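The first point can be seen in isolation with a minimal sketch (the function names are made up for illustration): a function whose last expression is a for loop returns NULL, which is why every list element came back as NULL in the question's EDIT.

```r
# The for loop is the last expression, so the function returns NULL.
sum_no_return <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i
}

# Making the result the last expression returns it.
sum_with_return <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i
  total
}

print(is.null(sum_no_return(3)))
print(sum_with_return(3))
```

The first call yields NULL and the second yields 6, even though both loops do the same work.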
I'm not going to attempt to provide you with working code, but hopefully what follows will help explain why what you're trying isn't working.
Let's look at the first few lines of your sapply call:
Cor.tests=sapply(datalist,function(x){
for(h in 1:length(names(x))){
for(i in 1:length(names(x$h[i]))){
Let's stop here and think for a moment about x$h[i]. At this point, x is the argument passed to your anonymous function in sapply (presumably either a data frame or a matrix; I can't be sure from your question which it is).
At this point in your code, what is h? h is the index variable in the previous for loop, so initially h has the value 1. The $ operator is for selecting items from an object by name. Is there something in x named h? I think not.
But then things get even worse as you attempt to select the ith element within this nonexistent thing named h inside x. I'm honestly not even sure what R's interpreter will do with that, since you're referencing the variable i in the expression that is supposed to define the range of values for i. Circular, anyone?
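The difference between selecting by a literal name and selecting by the value of a variable can be shown with a small sketch (df, by_dollar and by_bracket are illustrative names):

```r
df <- data.frame(a = 1:3, b = 4:6)
h <- 1

by_dollar  <- df$h      # looks for a column literally named "h"
by_bracket <- df[[h]]   # evaluates h, so this is the first column

print(is.null(by_dollar))
print(by_bracket)
```

Here df$h is NULL because no column is called h, whereas df[[h]] (or df[, h]) returns the first column.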
If you simply remove all attempts at the third for loop, you should have more luck. Just take the working version, plop it down in the body of the anonymous function, and replace every occurrence of datanew1 with x.
Good luck.
(PS - You might be happier with the output of lapply rather than sapply.)
