integrate() gives horribly wrong answer:
integrate(function (x) dnorm(x, -5, 0.07), -Inf, Inf, subdivisions = 10000L)
# 2.127372e-23 with absolute error < 3.8e-23
The return value should obviously be 1 (the normal distribution integrates to 1), but integrate() returns a ridiculously small number, with wrong error reporting, and no warning...
Any ideas?
It seems the default integrate() is horribly buggy... and I just found this by chance! Is there any reliable R package for numerical integration?
EDIT: I tried package pracma and I see the same problem! :
require(pracma)
integral(function (x) dnorm(x, -5, 0.07), -Inf, Inf)
# For infinite domains Gauss integration is applied!
# [1] 0
EDIT: Hmm... digging deeper, it seems that it has trouble finding the very narrow domain on which the function is numerically > 0. When I set the limits to certain (very close to 0 and 1) quantiles, it starts to work:
integral(function (x) dnorm(x, -5, 0.07), qnorm(1e-10, -5, 0.07), qnorm(1 - 1e-10, -5, 0.07))
But anyway, this is quite a horrible gotcha... I wonder if there is any remedy for it.
From the online documentation: "Like all numerical integration routines, these evaluate the function on a finite set of points. If the function is approximately constant (in particular, zero) over nearly all its range it is possible that the result and error estimate may be seriously wrong."
I take this to mean "caveat emptor". I notice that in your example, the absolute error is greater than the value of the integral. Given that you know f(x) > 0 for all x, at least it's giving you the chance to spot that something has gone wrong. It's down to you to take the opportunity.
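For instance, a crude programmatic version of that check (my own sketch, not part of the quoted documentation) might look like this:
res <- integrate(function(x) dnorm(x, -5, 0.07), -Inf, Inf, subdivisions = 10000L)
# treat the result as suspect when the error estimate is as large as the value
if (res$abs.error >= abs(res$value)) {
  warning("integrate(): error estimate swamps the value; result is unreliable")
}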
integrate( function(x) dnorm(x, -5, 0.07), -20, 10, subdivisions=1000L)
Gives
1 with absolute error < 9.8e-07
The warning in the online doc says to me that, given your apparent definition of buggy, the answer to your question is "no, there is no reliable numerical integration method. Not in R or any other language". No numerical integration technique should be used blindly. The user needs to check that their inputs are sensible and that the output is reasonable. It's no good believing an answer just because the computer gave it to you.
See also this post.
Expanding a little further on @r2evans's and @Limey's comments:
@Limey: for very general problems like this, there is simply no way to guarantee a generic solution.
One way to solve such a problem is to use more knowledge of the properties of the integrand (@r2evans's answer); the answer referenced by @Limey goes into detail for a different problem.
One "gotcha" that you may not have thought of is that trying out a bunch of generic methods, tuning settings, etc. may mislead you into concluding that some settings/methods are generically better than the first one you tried that failed to get the right answer. (Methods that work may work better because they're generically better, but trying them on one example doesn't prove it!)
As an example, the description of pcubature() (in ?cubature::pcubature) says:
This algorithm is often superior to h-adaptive integration for
smooth integrands in a few (<=3) dimensions, but is a poor choice
in higher dimensions or for non-smooth integrands.
However, recall that pcubature() happens to fail for your example, which is a smooth low-dimensional case - exactly where pcubature() is supposed to perform better - which suggests that it may be just luck that hcubature() works and pcubature() doesn't in this case.
An illustration of how sensitive the results can be to parameters (lower/upper limits in this case):
library(emdbook)
cc <- curve3d(integrate(dnorm, mean = -5, sd = 0.07,
                        lower = x, upper = y, subdivisions = 1000L)$value,
              xlim = c(-30, -10), ylim = c(0, 30), n = c(61, 61),
              sys3d = "image", col = c("black", "white"),
              xlab = "lower", ylab = "upper")
White squares are successful (integral=1), black squares are bad (integral=0).
Try package cubature.
library(cubature)
hcubature(function (x) dnorm(x, -5, 0.07), -Inf, Inf)
#$integral
#[1] 1
#
#$error
#[1] 9.963875e-06
#
#$functionEvaluations
#[1] 405
#
#$returnCode
#[1] 0
Note that function pcubature in the same package also returns 0.
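For reference, that failing call is simply (a sketch of the same comparison; inspect the full return structure yourself):
pcubature(function (x) dnorm(x, -5, 0.07), -Inf, Inf)$integral
# 0, as noted above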
From vignette("cubature"), section Introduction. My emphasis.
This R cubature package exposes both the hcubature and pcubature
routines of the underlying C cubature library, including the
vectorized interfaces.
Per the documentation, use of pcubature is advisable only for smooth
integrands in dimensions up to three at most. In fact, the pcubature
routines perform significantly worse than the vectorized hcubature
in inappropriate cases. So when in doubt, you are better off using
hcubature.
Since in this case the integrand is the normal density, a smooth and 1-dimensional function, there would be reasons to prefer pcubature. But it doesn't give the right result. The vignette concludes the following.
Vectorized hcubature seems to be a good starting point.
For smooth integrands in low dimensions (≤3), pcubature might be worth trying out. Experiment before using in a production package.
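As a sketch of the vectorized interface the vignette refers to (assuming hcubature's vectorInterface argument; with it, the integrand receives a matrix with one column per evaluation point and should return a matrix of values):
library(cubature)
f_vec <- function(x) matrix(dnorm(x, -5, 0.07), ncol = ncol(x))
hcubature(f_vec, -Inf, Inf, vectorInterface = TRUE)$integral
# expected to be close to 1, as with the scalar call above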
Interesting workaround: not too surprisingly, integrate does well when the values sampled (on (-Inf, Inf), no less) are closer to the "center" of the data. You can mitigate the problem by using your function but hinting at a center:
Without adjustment:
t(sapply(-10:10, function(i) integrate(function (x) dnorm(x, i, 0.07), -Inf, Inf, subdivisions = 10000L)))
# value abs.error subdivisions message call
# [1,] 0 0 1 "OK" Expression
# [2,] 1 4.611403e-05 10 "OK" Expression
# [3,] 6.619713e-19 1.212066e-18 2 "OK" Expression
# [4,] 7.344551e-71 0 2 "OK" Expression
# [5,] 3.389557e-06 6.086176e-06 3 "OK" Expression
# [6,] 2.127372e-23 3.849798e-23 2 "OK" Expression
# [7,] 1 3.483439e-05 8 "OK" Expression
# [8,] 1 6.338078e-07 11 "OK" Expression
# [9,] 1 3.408389e-06 7 "OK" Expression
# [10,] 1 6.414833e-07 8 "OK" Expression
# [11,] 1 7.578907e-06 3 "OK" Expression
# [12,] 1 6.414833e-07 8 "OK" Expression
# [13,] 1 3.408389e-06 7 "OK" Expression
# [14,] 1 6.338078e-07 11 "OK" Expression
# [15,] 1 3.483439e-05 8 "OK" Expression
# [16,] 2.127372e-23 3.849798e-23 2 "OK" Expression
# [17,] 3.389557e-06 6.086176e-06 3 "OK" Expression
# [18,] 7.344551e-71 0 2 "OK" Expression
# [19,] 6.619713e-19 1.212066e-18 2 "OK" Expression
# [20,] 1 4.611403e-05 10 "OK" Expression
# [21,] 0 0 1 "OK" Expression
If we add a "centering" hint, though, we get more consistent results:
t(sapply(-10:10, function(i) integrate(function (x, offset) dnorm(x + offset, i, 0.07), -Inf, Inf, subdivisions = 10000L, offset = i)))
# value abs.error subdivisions message call
# [1,] 1 7.578907e-06 3 "OK" Expression
# [2,] 1 7.578907e-06 3 "OK" Expression
# [3,] 1 7.578907e-06 3 "OK" Expression
# [4,] 1 7.578907e-06 3 "OK" Expression
# [5,] 1 7.578907e-06 3 "OK" Expression
# [6,] 1 7.578907e-06 3 "OK" Expression
# [7,] 1 7.578907e-06 3 "OK" Expression
# [8,] 1 7.578907e-06 3 "OK" Expression
# [9,] 1 7.578907e-06 3 "OK" Expression
# [10,] 1 7.578907e-06 3 "OK" Expression
# [11,] 1 7.578907e-06 3 "OK" Expression
# [12,] 1 7.578907e-06 3 "OK" Expression
# [13,] 1 7.578907e-06 3 "OK" Expression
# [14,] 1 7.578907e-06 3 "OK" Expression
# [15,] 1 7.578907e-06 3 "OK" Expression
# [16,] 1 7.578907e-06 3 "OK" Expression
# [17,] 1 7.578907e-06 3 "OK" Expression
# [18,] 1 7.578907e-06 3 "OK" Expression
# [19,] 1 7.578907e-06 3 "OK" Expression
# [20,] 1 7.578907e-06 3 "OK" Expression
# [21,] 1 7.578907e-06 3 "OK" Expression
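The same hint can be wrapped into a small helper (the name integrate_centered and the center argument are made up for this illustration):
integrate_centered <- function(f, center, ...) {
  # shift the integrand so its mass sits near 0 before calling integrate()
  integrate(function(x) f(x + center), -Inf, Inf, ...)
}
integrate_centered(function(x) dnorm(x, -5, 0.07), center = -5, subdivisions = 10000L)
# should report a value of 1, as in the "centered" table above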
I recognize this is a heuristic mitigation, presumes knowing something about your distribution before integrating, and is not a perfect "generic" solution. Just offering another perspective.
Suppose the following system of equations Ax = b with:
> A <- matrix(c(2,0,-1,0,0,2,2,1,-1,2,0,0,0,1,0,0), ncol = 4)
> A
[,1] [,2] [,3] [,4]
[1,] 2 0 -1 0
[2,] 0 2 2 1
[3,] -1 2 0 0
[4,] 0 1 0 0
> b <- c(-2,5,0,0)
Solving these equations with solve() yields:
> x <- solve(A,b)
> x
[1] 6.66e-16 4.44e-16 2.00e+00 1.00e+00
This is just an example, but A and b can be of any form.
I need to detect whether any component of x is 0. Now, the first two components should actually be 0, but they are both larger than the machine epsilon .Machine$double.eps = 2.22e-16, which makes them very small but not equal to zero.
I think I understand that this is caused by rounding errors in floating point arithmetic inside solve(). What I need to know is whether it is possible (from a practical point of view) to determine an upper bound on these errors, so 0s can be detected. For example, instead of
> x == 0
[1] FALSE FALSE FALSE FALSE
one would use something like this:
> x > -1e-15 & x < 1e-15
[1] TRUE TRUE FALSE FALSE
Giving more insight into this problem would be appreciated.
One way to approach this is to check if we can find a better solution to the linear system if we assume the components to be zero. For that we would want to solve A[,3:4]%*%y=b, since A%*%c(0,0,x[3],x[4]) = A[,3:4]%*%c(x[3],x[4]). This is an overdetermined system, so we can't use solve to find a solution. We can however use qr.solve:
> x.new = c(0,0,qr.solve(A[,3:4],b))
It remains to check if this solution is really better:
> norm(A%*%x.new - b) < norm(A%*%x - b)
[1] TRUE
Thus we have a good reason to suspect that x[1]==x[2]==0.
In this simple example it is obviously possible to guess the true solution by looking at the approximate solution:
> x.true = c(0,0,2,1)
> norm(A%*%x.true - b)
[1] 0
This is however not very helpful in the general case.
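As a simpler complement to the residual comparison above, a relative-tolerance check along the lines the question sketches could look like this (near_zero and the tolerance value are illustrative choices, not a universal rule):
near_zero <- function(x, tol = 1e-10) {
  # a component counts as zero when it is tiny relative to the solution's scale
  abs(x) < tol * max(abs(x), 1)
}
near_zero(x)
# [1]  TRUE  TRUE FALSE FALSE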
Presently, I am working through the rbprobitGibbs example in the RStudio help file, which contains the following sample:
##
## rbprobitGibbs example
##
if(nchar(Sys.getenv("LONG_TEST")) != 0) {R=2000} else {R=10}
set.seed(66)
simbprobit = function(X, beta) {
  ## function to simulate from binary probit including x variable
  y = ifelse((X %*% beta + rnorm(nrow(X))) < 0, 0, 1)
  list(X = X, y = y, beta = beta)
}
nobs=200
X=cbind(rep(1,nobs),runif(nobs),runif(nobs))
beta=c(0,1,-1)
nvar=ncol(X)
simout=simbprobit(X,beta)
Data1=list(X=simout$X,y=simout$y)
Mcmc1=list(R=R,keep=1)
out=rbprobitGibbs(Data=Data1,Mcmc=Mcmc1)
summary(out$betadraw,tvalues=beta)
if(0){
## plotting example
plot(out$betadraw,tvalues=beta)
}
When I step through the code, I don't see anywhere that the A matrix is set. It is only when I reach this line:
out=rbprobitGibbs(Data=Data1,Mcmc=Mcmc1)
that I see the A matrix displayed in the output, which I understand has to be a k * k matrix, where betabar is a k * 1 matrix.
Prior Parms:
betabar
# [1] 0 0 0
A
# [,1] [,2] [,3]
# [1,] 0.01 0.00 0.00
# [2,] 0.00 0.01 0.00
# [3,] 0.00 0.00 0.01
So I can understand how A gets its dimensions; however, what is not clear to me is how the values in A are set to 0.01. I am trying to figure out how I can allow a user calling the rbprobitGibbs function to set the precision via A to whatever they like. I can see where A is output, but how are its values based on some input? Does anyone have any suggestions? TIA.
UPDATE:
Here is the output produced, but as far as I can determine it is identical whether I use prior = list(rep(0,3), .2*diag(3)) or not:
> out
$betadraw
[,1] [,2] [,3]
[1,] 0.3565099 0.6369436 -0.9859025
[2,] 0.4705437 0.7211755 -1.1955608
[3,] 0.1478930 0.6538157 -0.6989660
[4,] 0.4118663 0.7910846 -1.3919411
[5,] 0.0385419 0.9421720 -0.7359932
[6,] 0.1091359 0.7991905 -0.7731041
[7,] 0.4072556 0.5183280 -0.7993501
[8,] 0.3869478 0.8116237 -1.2831395
[9,] 0.8893555 0.5448905 -1.8526630
[10,] 0.3165972 0.6484716 -0.9857531
attr(,"class")
[1] "bayesm.mat" "mcmc"
attr(,"mcpar")
[1] 1 10 1
It gets this value from a scaling constant on the prior precision matrix. In the source, you will note that if you do not supply a prior precision then it will generate a k x k identity matrix and multiply it by 0.01 (the value shown on the diagonal of A above). Nothing fancy here. These scaling constants for all of the various functions in bayesm can be found in the ./bayesm/R/bayesmConstants.R file.
if (is.null(Prior$A)) {
A = BayesmConstant.A * diag(nvar)
}
Should you like to supply your own constant, say .2, you could do so as follows: prior = list(rep(0,k), .2*diag(k)), or even introduce some relational information into the prior.
Very late to the party, but I ran across this same issue and just figured it out. In order to change the A matrix and the betabar prior, you have to name them as well, since all of your other input arguments are named.
For example, your code should be:
rbprobitGibbs(Data=Data1, Prior=list(betabar=betabar1, A=A1), Mcmc=Mcmc1)
If you do that, you are able to set your own values for betabar and A.
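Putting that together with the example above, a sketch of the full call (betabar1 and A1 are just illustrative names; nvar is 3 in this example):
betabar1 <- rep(0, nvar)   # prior mean for beta
A1 <- 0.2 * diag(nvar)     # prior precision, replacing the 0.01 default
out <- rbprobitGibbs(Data = Data1,
                     Prior = list(betabar = betabar1, A = A1),
                     Mcmc = Mcmc1)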
I want to find the solution of:
-x^3+6*x^2+51*x+44=0
but with R. Is it possible?
I found the package Ryacas, but nobody seems to be able to make it work.
May sound trivial, but I'm not able to find an easy way to do this...
Do you have an alternative?
Thanks guys!
You can use polynom package:
library(polynom)
p <- polynomial(c(44,51,6,-1))
# 44 + 51*x + 6*x^2 - x^3
solve(p)
# [1] -4 -1 11
But you can simply use the function polyroot from the base package:
polyroot(c(44,51,6,-1))
# [1] -1+0i -4+0i 11+0i
You can keep just the real part with Re:
Re(polyroot(c(44,51,6,-1)))
# [1] -1 -4 11
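Since polyroot() works over the complex numbers, a slightly more cautious variant (my own suggestion) keeps only the roots whose imaginary part is numerically zero before dropping it:
r <- polyroot(c(44, 51, 6, -1))
Re(r[abs(Im(r)) < 1e-8])   # filter out genuinely complex roots first
# [1] -1 -4 11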
Here we solve for the roots using the relationship between a matrix and its characteristic polynomial.
Given the polynomial a0 + a1*x^1 + a2*x^2 + x^3, define the matrix:
0 0 -a0
1 0 -a1
0 1 -a2
The eigenvalues of this matrix are the roots of the polynomial.
Substituting y = -x in your polynomial equation gives this
y^3 + 6*y^2 - 51*y + 44=0
And gives this example
> z <- matrix(c(0,1,0,0,0,1,-44,51,-6),3,3)
> z
[,1] [,2] [,3]
[1,] 0 0 -44
[2,] 1 0 51
[3,] 0 1 -6
> eigen(z)
$values
[1] -11 4 1
$vectors
[,1] [,2] [,3]
[1,] 0.6172134 0.73827166 0.98733164
[2,] -0.7715167 -0.67115606 -0.15707549
[3,] 0.1543033 -0.06711561 -0.02243936
Or, since we've substituted -y for x:
> eigen(-z)$values
[1] 11 -4 -1
See: http://www-math.mit.edu/~edelman/publications/polynomial_roots.pdf
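For arbitrary coefficients, the same idea can be wrapped into a small helper (companion_roots is a made-up name for this sketch; it assumes a monic polynomial a0 + a1*x + ... + x^n):
companion_roots <- function(coefs) {
  # coefs = c(a0, a1, ..., a_{n-1}); build the companion matrix and take its eigenvalues
  n <- length(coefs)
  m <- cbind(rbind(rep(0, n - 1), diag(n - 1)), -coefs)
  eigen(m, only.values = TRUE)$values
}
# -x^3 + 6*x^2 + 51*x + 44 = 0 is equivalent to the monic x^3 - 6*x^2 - 51*x - 44 = 0:
companion_roots(c(-44, -51, -6))
# [1] 11 -4 -1  (possibly in a different order)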
I just stumbled upon this question, and I am not sure if anything inherently changed around the Ryacas package, but it seems to work great in 2020. Here is a helpful vignette to get started: https://cran.r-project.org/web/packages/Ryacas/vignettes/getting-started.html
Following the vignette, things seem to work as expected when I run the code:
library(Ryacas)
# initialize equation:
eq <- "-x^3+6*x^2+51*x+44"
# simplify the equation:
library(glue)
yac_str(glue("Simplify({eq})"))
[1] "6*x^2-x^3+51*x+44"
# factor:
yac_str(glue("Factor({eq})"))
[1] "(-1)*(x-11)*(x+4)*(x+1)"
You can evaluate the expression like this, plugging in whatever values for x:
# evaluate
evaluate(eq,list(x=c(0,1,10,100,-100)))
[[1]]
$src
[1] "-x^3+6*x^2+51*x+44"
attr(,"class")
[1] "source"
[[2]]
[1] "[1] 44 100 154 -934856 1054944\n"
Here you can see the results where x=0 produced an answer of 44, x=1 produced an answer of 100, etc...
If you evaluate the new simplified or factored versions, you of course end up with exactly the same results:
evaluate(yac_str(glue("Simplify({eq})")),list(x=c(0,1,10,100,-100)))
[[1]]
$src
[1] "6*x^2-x^3+51*x+44"
attr(,"class")
[1] "source"
[[2]]
[1] "[1] 44 100 154 -934856 1054944\n"
Notice the formula changed in the $src output, but we get the same results.
Here's the factored one too:
evaluate(yac_str(glue("Factor({eq})")),list(x=c(0,1,10,100,-100)))
[[1]]
$src
[1] "(-1)*(x-11)*(x+4)*(x+1)"
attr(,"class")
[1] "source"
[[2]]
[1] "[1] 44 100 154 -934856 1054944\n"
The only real difference between what I outlined here and what's outlined in the vignette is the actual formula, and the fact that I used library(glue) instead of paste0, which is also a fair option.
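Since the original goal was the roots themselves, yacas can also be asked to solve the equation directly. A sketch, assuming yacas's Solve() command (I have not re-run this, so treat it as illustrative):
yac_str(glue("Solve({eq} == 0, x)"))
# expected to return the three real roots x == 11, x == -4, x == -1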
I'm trying to match stock trades from one data frame with the mid-quote that was prevailing at that time. The time stamps don't match exactly; I just have a corresponding interval of quotes covering the time the trade happened.
I wrote a loop which works, but since I know that loops should be avoided whenever possible, I looked for an alternative.
First, this is my loop:
t = dim(x1)[1]
z = 1
for (i in 1:t) {
  flag = FALSE
  while (flag == FALSE) {
    if (x1[z, 1] > x2[i, 1]) {
      x2[i, 2] = x1[z - 1, 2]
      flag = TRUE
    }
    else {
      z = z + 1
    }
  }
}
I found advice on Stack Overflow to merge the two arrays, so I added the upper bound of the interval as another column and matched the corresponding times with the subset function.
Unfortunately, this method takes far more time than the loop. I assume it's due to the huge array that is created by merging. The data frames with the quotes have around 500,000 observations and the transaction data around 100,000.
Is there a more elegant (and especially faster) way to solve this problem?
Furthermore, for some data I get the error message "missing value where TRUE/FALSE needed", even though the if-condition works when I do it manually.
edit:
My quote data would look like this:
Time midquote
[1,] 35551 50.85229
[2,] 35589 53.77627
[3,] 36347 54.27945
[4,] 37460 52.01283
[5,] 37739 53.65414
[6,] 38249 52.34947
[7,] 38426 50.59568
[8,] 39858 53.75646
[9,] 40219 51.38876
[10,] 40915 52.09319
and my transaction data:
Time midquote
[1,] 36429 0
[2,] 38966 0
[3,] 39334 0
[4,] 39998 0
[5,] 40831 0
So for each time in the latter, I want to look up the prevailing midquote from the former. The time in the example is in seconds from midnight.
For your example datasets, the following approach is faster:
x2[ , 2] <- x1[vapply(x2[, 1], function(x) which(x <= x1[, 1])[1] - 1L,
FUN.VALUE = integer(1)), 2]
# Time midquote
# [1,] 36429 54.27945
# [2,] 38966 50.59568
# [3,] 39334 50.59568
# [4,] 39998 53.75646
# [5,] 40831 51.38876
A second approach:
o <- order(c(x1[ , 1], x2[ , 1]))
tmp <- c(x1[ , 2], x2[ , 2])[o]
idx <- which(!tmp)
x2[ , 2] <- tmp[unlist(tapply(idx, c(0, cumsum(diff(idx) > 1)),
function(x) x - seq_along(x)), use.names = FALSE)]
# Time midquote
# [1,] 36429 54.27945
# [2,] 38966 50.59568
# [3,] 39334 50.59568
# [4,] 39998 53.75646
# [5,] 40831 51.38876
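Another option (my own suggestion, not part of the answers above): base R's findInterval() does the same lookup in one vectorized call, assuming the quote times in x1[, 1] are sorted:
idx <- findInterval(x2[, 1], x1[, 1])  # index of the last quote time <= each transaction time
x2[, 2] <- x1[idx, 2]
# Note: idx is 0 for a transaction that occurs before the first quote,
# so such rows would need special handling.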
I have a list consisting of 3 elements:
datalist=list(a=datanew1,b=datanew2,c=datanew3)
datalist$a :
Inv_ret Firm size leverage Risk Liquidity Equity
17 0.04555968 17.34834 0.1323199 0.011292273 0.02471489 0
48 0.01405835 15.86315 0.6931730 0.002491093 0.12054914 0
109 0.04556252 16.91602 0.1714068 0.006235836 0.01194579 0
159 0.04753472 14.77039 0.3885720 0.007126830 0.06373028 0
301 0.03941040 16.94377 0.1805346 0.005450653 0.01723319 0
datalist$b :
Inv_ret Firm size leverage Risk Liquidity Equity
31 0.04020832 18.13300 0.09326265 0.015235240 0.01579559 0.005025379
62 0.04439078 17.84086 0.11016402 0.005486982 0.01266566 0.006559096
123 0.04543250 18.00517 0.12215307 0.011154742 0.01531451 0.002282790
173 0.03960613 16.45457 0.10828643 0.011506857 0.02385191 0.009003780
180 0.03139643 17.57671 0.40063094 0.003447233 0.04530395 0.000000000
datalist$c :
Inv_ret Firm size leverage Risk Liquidity Equity
92 0.03081029 19.25359 0.10513159 0.01635201 0.025760806 0.000119744
153 0.03280746 19.90229 0.11731517 0.01443786 0.006769735 0.011999005
210 0.04655847 20.12543 0.11622403 0.01418010 0.003125632 0.003802365
250 0.03301018 20.67197 0.13208234 0.01262499 0.009418828 0.021400052
282 0.04355975 20.03012 0.08588316 0.01918129 0.004213846 0.023657440
I am trying to run cor.test on each element of the datalist above:
Cor.tests=sapply(datalist,function(x){
for(h in 1:length(names(x))){
for(i in 1:length(names(x$h[i]))){
for(j in 1:length(names(x$h[j]))){
cor.test(x$h[,i],x$h[,j])$p.value
}}}})
But I get an error :
Error in cor.test.default(x$h[, i], x$h[, j]) :
'x' must be a numeric vector
Any suggestions about what I am doing wrong?
P.S. If I simply have one dataframe, datanew1 :
Inv_ret Firm size leverage Risk Liquidity Equity
17 0.04555968 17.34834 0.1323199 0.011292273 0.02471489 0
48 0.01405835 15.86315 0.6931730 0.002491093 0.12054914 0
109 0.04556252 16.91602 0.1714068 0.006235836 0.01194579 0
159 0.04753472 14.77039 0.3885720 0.007126830 0.06373028 0
301 0.03941040 16.94377 0.1805346 0.005450653 0.01723319 0
I use this loop :
results = matrix(NA, nrow = 6, ncol = 6)
for (i in 1:length(names(datanew1))) {
  for (j in 1:length(names(datanew1))) {
    results[i, j] <- cor.test(datanew1[, i], datanew1[, j])$p.value
  }
}
And the output is:
results :
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.000000e+00 7.085663e-09 3.128975e-10 3.018239e-02 4.806400e-10 0.475139526
[2,] 7.085663e-09 0.000000e+00 2.141581e-21 0.000000e+00 2.247825e-20 0.454032499
[3,] 3.128975e-10 2.141581e-21 0.000000e+00 2.485924e-25 2.220446e-16 0.108643838
[4,] 3.018239e-02 0.000000e+00 2.485924e-25 0.000000e+00 5.870007e-15 0.006783324
[5,] 4.806400e-10 2.247825e-20 2.220446e-16 5.870007e-15 0.000000e+00 0.558827862
[6,] 4.751395e-01 4.540325e-01 1.086438e-01 6.783324e-03 5.588279e-01 0.000000000
Which is exactly what I want. But I want to get 3 matrices, one for each element of the datalist above.
EDIT:
If I do as Joran says:
Cor.tests=lapply(datalist,function(x){
results=matrix(NA,nrow=6,ncol=6)
for(i in 1:length(names(x))){
for(j in 1:length(names(x))){
results[i,j]<-cor.test(x[,i],x[,j])$p.value
}}})
I get:
$a
NULL
$b
NULL
$c
NULL
This can be done without for loops.
1) A solution with base R:
lapply(datalist,
function(datanew) outer(seq_along(datanew),
seq_along(datanew),
Vectorize(function(x, y)
cor.test(datanew[ , x],
datanew[ , y])$p.value)))
2) A solution with the package psych:
library(psych)
lapply(datalist, function(datanew) corr.test(datanew)$p)
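Either way, it can help to carry the column names onto the resulting p-value matrices (a small addition of my own, building on option 1):
Cor.tests <- lapply(datalist, function(datanew) {
  p <- outer(seq_along(datanew), seq_along(datanew),
             Vectorize(function(i, j) cor.test(datanew[, i], datanew[, j])$p.value))
  dimnames(p) <- list(names(datanew), names(datanew))  # label rows and columns
  p
})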
A modified version of the approach in the question:
lapply(datalist, function(x) {
  results <- matrix(NA, nrow = 6, ncol = 6)
  for (i in 1:6) {
    for (j in 1:6) {
      results[i, j] <- cor.test(x[, i], x[, j])$p.value
    }
  }
  return(results)
})
There were two major problems in these commands:
1) The matrix results was not returned. I added return(results) to the function.
2) You want to have a 6 by 6 matrix whereas your data frames have seven columns. I replaced 1:length(names(x)) with 1:6 in the for loops.
I'm not going to attempt to provide you with working code, but hopefully what follows will help explain why what you're trying isn't working.
Let's look at the first few lines of your sapply call:
Cor.tests=sapply(datalist,function(x){
for(h in 1:length(names(x))){
for(i in 1:length(names(x$h[i]))){
Let's stop here and think for a moment about x$h[i]. At this point, x is the argument passed to your anonymous function in sapply (presumably either a data frame or matrix; I can't be sure from your question which it is).
At this point in your code, what is h? h is the index variable in the previous for loop, so initially h has the value 1. The $ operator is for selecting items from an object by name. Is there something in x named h? I think not.
But then things get even worse as you attempt to select the ith element within this non-existent thing named h inside x. I'm honestly not even sure what R's interpreter will do with that, since you're referencing the variable i in the expression that is supposed to define the range of values for i. Circular, anyone?
If you simply remove all attempts at the third for loop, you should have more luck. Just take the working version, plop it down in the body of the anonymous function, and replace every occurrence of datanew1 with x.
Good luck.
(PS - You might be happier with the output of lapply rather than sapply.)