R: Selecting only up to the maximum

I want to calculate the mean of all the acceleration values from a displacement value of 0.01 to max. However, I do not want to include any acceleration values after the maximum value. How is this done?
mean(
  subset(`S1_Intns40_chainno-Sheet1`,
         Displacement > 0.01:max(Displacement),
         select = c("Acceleration"))$Acceleration)
[1] -0.8371687

Like so:
limitedAvgAcc <- mean(data$Acceleration[(data$Displacement <= max(data$Displacement)) & (data$Displacement >= 0.01)])
This works because the bracketed expression creates a logical (Boolean) vector, which is used to subset the Acceleration vector before averaging.
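For instance, with a tiny hypothetical data frame dat holding Displacement and Acceleration columns (made-up values):
dat <- data.frame(Displacement = c(0.005, 0.02, 0.05, 0.03),
                  Acceleration = c(-1.2, -0.8, -0.6, -0.9))
keep <- (dat$Displacement >= 0.01) & (dat$Displacement <= max(dat$Displacement))
keep                           # FALSE TRUE TRUE TRUE
mean(dat$Acceleration[keep])   # -0.7666667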

I have gotten the following code to work well enough for my purpose:
lift_S1_intns40_chaino <-
  S1_Intns40_chainno[which.min(S1_Intns40_chainno$Displacement < 0.01):
                       which.max(S1_Intns40_chainno$Displacement), ]
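Rewritten on a small made-up data frame (hypothetical values), the same slicing idea looks like this; rows after the displacement peak are excluded:
d <- data.frame(Displacement = c(0.004, 0.02, 0.06, 0.03),
                Acceleration = c(-1.1, -0.9, -0.5, -0.8))
first_row <- which(d$Displacement >= 0.01)[1]  # first row at or above 0.01
peak_row  <- which.max(d$Displacement)         # row with the largest displacement
mean(d$Acceleration[first_row:peak_row])       # -0.7 (rows 2 and 3 only)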


Is there a way to handle calculations involving exponentials of big values in R?

I have looked a bit online and on the site, but I did not find any solution. My problem is relatively simple, so if you could point me to a possible solution, much appreciated.
test_vec <- c(2, 8, 709, 600)
mean(exp(test_vec))        # 2.054602e+307, still finite
test_vec_bis <- c(2, 8, 710, 600)
mean(exp(test_vec_bis))    # Inf, because exp(710) overflows
exp(709)                   # ~8.22e+307
exp(710)                   # Inf
# The largest finite double is about 1.8e+308, so exp() overflows just past 709
How can I calculate the mean of my vector and deal with the Inf values, given that R can probably represent the mean itself but not every term in the numerator of the mean calculation?
There is an edge case where you can solve your problem simply by restating it mathematically, but it only helps if your vector is extremely long and/or your large exponents are only slightly past the numeric limit:
Since the mean sum(x)/n can be written as sum(x/n), and since exp(x)/exp(y) = exp(x - y), you can calculate sum(exp(x - log(n))), which buys you headroom of log(n) in the exponent.
mean(exp(test_vec))
[1] 2.054602e+307
sum(exp(test_vec - log(length(test_vec))))
[1] 2.054602e+307
sum(exp(test_vec_bis - log(length(test_vec_bis))))
[1] 5.584987e+307
While this works for your example, most likely this won't work for your real vector.
In this case, you will have to consult packages like Rmpfr as suggested by #fra.
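For completeness, a minimal sketch with Rmpfr might look roughly like this (assuming the package is installed; precBits = 2048 is an arbitrary choice):
library(Rmpfr)
big <- mpfr(test_vec_bis, precBits = 2048)  # arbitrary-precision copies of the values
sum(exp(big)) / length(big)                 # exp(710) is representable at this precision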
Here's one way: keep only those elements of test_vec whose exponential is finite (< Inf):
mean(exp(test_vec)[which(exp(test_vec) < Inf)])
[1] 1.257673e+260
t2 <- c(2,8,600)
mean(exp(t2))
[1] 1.257673e+260
This assumes you were looking to exclude values that result in Inf, of course.

Arithmetic with extremely small values in R

I am facing a big issue in computing the cumsum() of a vector. The vector has ~10,000 elements, and from roughly element 2,000 onward the values drop to around 1e-310. To give a feeling of the distribution I am dealing with, here is a plot.
When I try to apply cumsum() I get lots of ones, which is impossible, and a minimum value around 10^-2. I am porting code we developed in Matlab, where of course there are no problems. For some reason, R seems to have trouble working with such small numbers, to the extent that applying standard functions returns unexpected, and wrong, results.
I searched over stack overflow and found these two posts:
R: Number precision, how to prevent rounding?
Controlling number of decimal digits in print output in R
Unfortunately, none of them helped me out.
I also tried to use Rcpp's cumsum() function, with no luck. I guess the problem comes directly from the precision of my matrix object.
I am not even sure how to reproduce this so I am happy to share my 9137 x 2 matrix. I am completely stuck with this.
Looking forward to hearing from you guys!
Thanks
Update
Here is a sample of 100 elements from my matrix:
y <- sample( BestPair, 100 )
dput( y )
c(7.74958917761984e-289, 4.19283869686715e-319, 1.52834266884531e-319,
2.22089175309335e-297, 4.93980517192742e-298, 1.37861543592719e-301,
1.47044459800611e-317, 6.49068860911021e-319, 1.83302927898675e-305,
8.39514422452147e-312, 2.88487711616485e-300, 0.000544461085044608,
0.000435738736513519, 1.35649914843994e-309, 4.30826678309556e-310,
2.60728322623343e-319, 0.000544460617547516, 5.28815204888643e-299,
0.00102710912090133, 0.00198425117943324, 1.99711912768841e-304,
8.34594499227505e-306, 7.42055412763084e-300, 5.00039717762739e-311,
1.8750204972032e-305, 1.06513324565406e-310, 5.00487443690634e-313,
3.4890421843663e-319, 7.48945537292364e-310, 1.92948452007191e-310,
1.19840058299897e-305, 0.000532438536688165, 6.53966533658245e-318,
0.000499821676385928, 2.02305525482572e-305, 5.18981575493413e-306,
8.82648276295387e-320, 7.30476057376283e-320, 1.23073492422415e-291,
4.1801705284367e-311, 5.10863383734203e-318, 1.12106998189819e-298,
9.34823978505262e-297, 0.00093615863896107, 5.3667092510628e-311,
3.85094526994501e-319, 1.3693720559483e-313, 3.96230943126873e-311,
2.03293191294298e-319, 2.38607510351427e-291, 0.000544460855322345,
1.74738584846597e-310, 1.41874408662835e-318, 5.73056861298345e-319,
3.28565325597139e-302, 3.5412805275117e-310, 1.19647007227024e-302,
1.71539915106223e-305, 2.10738303243284e-311, 6.47783846432427e-313,
5.0072402480979e-303, 7.7250380240544e-303, 9.75451890703679e-309,
0.000533945755492525, 0.00211359631486468, 1.6612179399628e-312,
0.000521804571338402, 4.12194185271951e-308, 1.12829365794294e-313,
8.89772702908418e-319, 5.092756929242e-312, 7.45208240537024e-311,
6.60385177095196e-298, 0.000544461017733648, 1.62108867188263e-318,
3.95135528339003e-309, 1.8792966379072e-292, 5.98494480819088e-295,
0.00051614492665081, 2.25198141886419e-300, 7.97467977809552e-305,
1.78098757558338e-311, 1.66946525895122e-313, 0.000778442249425894,
6.58100207570114e-312, 0.00120733768329515, 3.44432924341767e-319,
6.38151190880713e-313, 7.1129669216109e-300, 4.11319531475755e-319,
7.21747577033383e-304, 1.48709312807403e-318, 1.39519898470211e-303,
4.58585270141592e-312, 2.16309869205282e-295, 7.55248601743105e-314,
3.16365476892733e-310, 1.96961507010996e-305, 3.21125377499206e-318,
3.66277772043574e-304)
Update 2
Apparently, imposing the following:
BestPair[ BestPair < .Machine$double.eps ] <- 0
does not solve the issue. I am still finding weird results from cumsum(). Here is a plot to better explain what I am dealing with. The cumulative probability has this shape because BestPair has been sorted in decreasing order. I want the 1 from cumsum() to sit at the top of my vector.
Here is a summary of the object:
> summary(CumProb)
##   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## 0.0250  1.0000  1.0000  0.9685  1.0000  1.0000
Update 3. Results from Matlab
Here is the result as computed with Matlab. As you can see, I can get a pretty decent distribution, which I cannot achieve in R even after truncating the original matrix.

Get location of row with median value in R data frame

I am a bit stuck with this basic problem, but I cannot find a solution.
I have two data frames (dummies below):
x<- data.frame("Col1"=c(1,2,3,4), "Col2"=c(3,3,6,3))
y<- data.frame("ColA"=c(0,0,9,4), "ColB"=c(5,3,20,3))
I need to use the location of the median value of one column in df x to then retrieve a value from df y. For this, I am trying to get the row number of the median value in e.g. x$Col1, so I can then retrieve the corresponding value with something like y[,"ColB"][row.number].
Is there an elegant way/function for doing this? Solutions might need to account for two cases: when the sample has an even number of values and when it has an odd number (with an even count, the median may not be a value found in the sample, since it is the mean of the two middle values).
The problem is a little underspecified.
What should happen when the median isn't in the data?
What should happen if the median appears in the data multiple times?
Here's a solution which takes the (absolute) difference between each value and the median, then returns the index of the first row for which that difference vector achieves its minimum.
with(x, which.min(abs(Col1 - median(Col1))))
# [1] 2
The quantile function with type = 1 (i.e. no averaging) may also be of interest, depending on your desired behavior. It returns the lower of the two "sides" of the median, while the which.min method above can depend on the ordering of your data.
quantile(x$Col1, .5, type = 1)
# 50%
# 2
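Either way, the retrieval step from y mentioned in the question is then just indexing with that row number, e.g.:
idx <- with(x, which.min(abs(Col1 - median(Col1))))
y$ColB[idx]
# [1] 3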
An option using quantile is
with(x, which(Col1 == quantile(Col1, .5, type = 1)))
# [1] 2
This could possibly return multiple row-numbers.
Edit:
If you want it to only return the first match, you could modify it as shown below
with(x, which.min(Col1 != quantile(Col1, .5, type = 1)))
Here, something like y$ColB[which(x$Col1 == round(median(x$Col1)))] would do the trick.
The problem is that x has an even number of rows, so the median 2.5 is not an integer. In this case you have to choose between 2 and 3.
Note: The above works for your example, not for general cases (e.g. c(-2L,2L) or with rational numbers). For the more general case see #IceCreamToucan's solution.

How to compute for the mean and sd

I need help on 4b please
‘Warpbreaks’ is a built-in dataset in R. Load it using the function data(warpbreaks). It consists of the number of warp breaks per loom, where a loom corresponds to a fixed length of yarn. It has three variables namely, breaks, wool, and tension.
b. For the ‘AM.warpbreaks’ dataset, compute for the mean and the standard deviation of the breaks variable for those observations with breaks value not exceeding 30.
data(warpbreaks)
warpbreaks <- data.frame(warpbreaks)
AM.warpbreaks <- subset(warpbreaks, wool=="A" & tension=="M")
mean(AM.warpbreaks<=30)
sd(AM.warpbreaks<=30)
This is how I understood the problem, and I typed the code as in the last two lines. However, I wasn't able to run the last two lines, while the first three lines ran successfully. Can anybody tell me what the error is here?
Thanks! :)
Another way to go about it, which avoids generating a bunch of intermediate datasets and then having to remember which is which (this is more of a personal preference, though):
data(warpbreaks)
mean(AM.warpbreaks[which(AM.warpbreaks$breaks<=30),"breaks"])
sd(AM.warpbreaks[which(AM.warpbreaks$breaks<=30),"breaks"])
There are two problems with your code. The first is that you are comparing to 30, but you're looking at the entire data frame, rather than just the "breaks" column.
AM.warpbreaks$breaks <= 30
is an expression that refers to the breaks being less than thirty.
But mean(AM.warpbreaks$breaks <= 30) will not give the answer you want either, because R evaluates the inner expression to a vector of TRUE/FALSE values indicating whether each break count is at most 30, so the mean is the proportion of such observations rather than their average.
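For example, with a few made-up break counts:
mean(c(10, 25, 40) <= 30)
# [1] 0.6666667   # the proportion of values at or below 30, not their mean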
Generally, you just want to take another subset for an analysis like this.
AM.lt.30 <- subset(AM.warpbreaks, breaks <= 30)
mean(AM.lt.30$breaks)
sd(AM.lt.30$breaks)

Find the percentage of values that meet some criterion by columns of a matrix

I've created an 8 x 1000 matrix of exponentially distributed values. This represents 1000 iterations (columns) of sampling 8 times from an exponential distribution. I am trying to figure out how to get the percentage of values in each column that are less than a critical value, so that I end up with a vector of 1000 percentages. I've tried a couple of things, but being relatively new to R I'm having some difficulty.
This is my current version of code that doesn't quite work. I've used the apply function (without the for loop) when I want the mean or variance of the samples, so I've been trying this approach but this percentage thing seems to require something different. Any thoughts?
m=1000
n=8
theta=4
crit=3
x=rexp(m*n,1/theta)
Mxs=matrix(x,nrow=n)
ltcrit=matrix(nrow=m,ncol=1)
for(i in 1:m){
lt3=apply(Mxs,2,length(which(Mxs[,i]<crit)/n))
}
ltcrit
You can use apply without any for loop and get the answer:
percentages = apply(Mxs, 2, function(column) mean(column < crit))
Note the creation of an anonymous function with function(column) mean(column < crit). You probably used apply(Mxs, 2, mean) when you wanted the means of the columns, or apply(Mxs, 2, var) when you wanted the variance of the columns, but notice that you can put any function you want into that space and it will perform it on each column.
Also note that mean(column < crit) is a good way to get the percentage of values in column less than crit.
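To see the same idiom on a small made-up matrix:
toy <- matrix(c(1, 2, 5, 7, 0.5, 9), nrow = 2)     # hypothetical 2 x 3 matrix
apply(toy, 2, function(column) mean(column < 3))   # fraction below 3 in each column
# [1] 1.0 0.0 0.5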
You can use colMeans:
colMeans(Mxs < crit)
[1] 0.500 0.750 0.250 0.375 0.375 0.875 ...
