I am trying to use match() in R to find any matching values within a certain interval. For example:
v <- c(2.2, 2.4, 4.3, 1.3, 4.5, 6.8, 0.9)
match(2.4, v)
gives me the position where 2.4 occurs in v, but what if I wanted to allow a range for matching? For example, 2.4 +/- 0.2?
Any help is greatly appreciated, thanks in advance!
In that case, I would use subsetting:
v[v>2.2 & v<2.6]
or
which(v>2.2 & v<2.6)
depending on whether you want the values or the indices.
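If you need this kind of tolerance match more than once, it is easy to wrap in a tiny helper. Note that match_range below is just an illustrative name, not an existing base function; it returns indices and includes the endpoints:
match_range <- function(v, target, tol) {
  which(v >= target - tol & v <= target + tol)  # indices of v within target +/- tol
}
match_range(v, 2.4, 0.2)
[1] 1 2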
This is another option:
which(findInterval(v, c(-.2, .2) + 2.4) == 1)
[1] 1 2
findInterval(v, c(-.2, .2) + 2.4) gives you 1 1 2 0 2 2 0, where 1 means the element is inside the interval, 0 means it's to the left, and 2 means to the right.
I would like to round the min and max of a vector to integers such that the interval defined by the two values is an actual superset of the original data.
For instance, for a vector between 1.1 and 2.9, round(range()) returns 1 and 3, which is the desired outcome:
x <- seq(1.1, 2.9, 0.1)
oldrange_x <- range(x)
newrange_x <- round(oldrange_x)
newrange_x
On the other hand, for 1.8 and 2.9 round(range()) returns 2 and 3, which is not a superset of the initial vector:
y <- seq(1.8, 2.9, 0.1)
oldrange_y <- range(y)
newrange_y <- round(oldrange_y)
newrange_y
and leads me to use a combination of floor() and ceiling()
newrange_y2 <- c(floor(min(oldrange_y)), ceiling(max(oldrange_y)))
newrange_y2
Is there a ready-made function for this (essentially a roundrange() function) that would avoid the ugly construction and make the code a bit more readable?
Your use of floor and ceiling is perfectly fine, not ugly at all. Here's how you would wrap it in a function.
superset <- function(x) {
  c(floor(min(x)), ceiling(max(x)))
}
superset(seq(1.8, 2.9, 0.1))
[1] 1 3
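If you would rather have the roundrange() name from the question, a near-identical variant computes the range once; this is purely a stylistic sketch, not an existing base function:
roundrange <- function(x) {
  r <- range(x)                    # c(min(x), max(x)) in one call
  c(floor(r[1]), ceiling(r[2]))
}
roundrange(seq(1.8, 2.9, 0.1))
[1] 1 3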
Given a vector of specified values, for example:
x = c(4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0, 1.7, 1.3, 1.0)
I would like to create a new vector of any length comprised only of values from x, sampled randomly, such that the combined mean is 3.15. I have tried using the rnorm() function, but that only gives me random numbers centred on 3.15, not values drawn from the specified set. Could anyone point me in the correct direction?
The problem with your question is that there are an infinite number of ways to sample from
x = c(4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0, 1.7, 1.3, 1.0)
to get a mean of roughly 3.15; you just have to specify a probability for each value.
Doing
n = 20
sample(x, n, replace = TRUE)
assumes each value is equally likely, so you would get a mean close to 2.5 (the mean of x). But if you re-weight the probabilities, you can get closer to what you want. One way of doing this might be
p = 1/(x - 3.15)^2 # or try p = 1/abs(x - 3.15)
sample(x, n, replace = TRUE, prob = p)
where p gives higher weight to values closer to 3.15, so those are more likely to be drawn. It isn't perfect (the true expected value is roughly 3.12, and most draws end up being 2.7, 3.0 or 3.3), but then again there isn't a single solution.
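As a quick check of that caveat, you can compute the expected value implied by those weights directly (sample() normalizes prob internally, so there is no need to rescale p first):
p = 1/(x - 3.15)^2
sum(x * p) / sum(p)  # expected value of a single weighted draw, roughly 3.12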
Here's my brute force method:
library(dplyr)  # for between()

samp315 <- function(n = 20, desmean = 3.15, distance = 0.001) { # defaults: n = 20, accepted range 3.149-3.151
  x <- c(4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0, 1.7, 1.3, 1.0)
  samp <- 0  # dummy starting value so the while condition is evaluated at least once
  i <- 0     # attempt counter
  # keep drawing until a sample (samp) with a mean inside the accepted range is found,
  # OR until 1 million attempts have been made
  while (!between(mean(samp), desmean - distance, desmean + distance) & i < 1000000) {
    samp <- sample(x, n, replace = TRUE)  # draw a sample of size n from the list of values (x)
    i <- i + 1                            # count the attempt
  }
  # if the loop stopped because the counter hit a million, issue a warning; otherwise return samp
  if (i < 1000000) samp else warning("Couldn't find an appropriate sample, please select a lower n, a desired mean closer to 2.5, or a greater distance")
}
Now, every time you do samp315():
eg<-samp315()
mean(eg)
[1] 3.15
eg
[1] 3.0 3.7 3.0 3.7 3.3 3.7 3.3 3.3 4.0 1.0 1.7 3.0 2.0 4.0 3.7 3.7 2.3 3.3 4.0 3.3
If you want a sample of different length, just place whatever number you wish inside samp315(). The larger the number, however, the longer it will take to find a sample that will get your desired mean.
You can also change your desired mean by setting desmean, and play around with the tolerance by setting distance to whatever deviation (+/-) from the desired mean you will accept. The defaults are n=20 and a range from 3.149 to 3.151.
To avoid an infinite loop for highly unlikely combinations of n and distance, I set a maximum of 1 million attempts, after which the function quits with a warning.
As @mickey pointed out, we can weight the probability of each item according to how far it is from the target mean. However, that does not quite work on its own, because there are more elements in x below the desired mean than above it, which skews the sampling towards the lower values. We can account for this by adjusting the probabilities according to how many elements lie above or below the desired mean:
x = c(4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0, 1.7, 1.3, 1.0)
xbar = 3.15
xhi = x[which(x > xbar)]   # values above the target mean
xlo = x[which(x < xbar)]   # values below the target mean
probhi = 1/(xhi - xbar)
problo = 1/(xbar - xlo)
probhi = probhi * length(problo) / length(probhi)  # rebalance for the unequal group sizes
n = 1e5
set.seed(1)
y = sample(x, size = n, replace = TRUE, prob = c(probhi, problo))
mean(y)
# [1] 3.150216
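A quick way to see why the re-weighting lands on the target: after rescaling, the values above xbar contribute the same total weighted deviation as the values below it, so the implied expected value is exactly xbar. You can check this directly; c(probhi, problo) lines up with x because x lists the values above xbar first:
p = c(probhi, problo)    # sampling weights, aligned with x
sum(x * p) / sum(p)      # theoretical mean under these weights
# [1] 3.15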
I would like to sample a random number between 0 and 1, with a 90% probability of it coming from 0-0.3 and a 10% probability of it coming from 0.3-1.
I tried the following:
0.9*runif(1, 0, 0.3) + 0.1*runif(1, 0.3, 1)
But that's not quite it: I will never get the number 0.8, for example.
Is there a simple way to do it in Base R?
sample(c(runif(1,0,0.3),runif(1,0.3,1)),1,prob=c(0.9,0.1))
Usually in R you want to do things in a vectorized way, so rather than drawing one number at a time, draw all of them in one call (much faster). Here you can use sample() to draw the upper bound of each uniform, and then make all the draws with a single runif() call. Like this (the example below uses a 0.03 cutpoint and a 10/90 split; substitute your own cutpoint and weights):
nsamples <- 100000
res <- runif(nsamples, 0, sample(c(0.03, 1), nsamples, TRUE, prob = c(10, 90)))

# just to check the result
hist(res)

# this should be around 0.127 (= 0.9*0.03 + 0.1*1) if correct
mean(res < 0.03)
You can write a small function to do the job whenever you need it.
runif_probs <- function(n, p = 0.9, cutpoint = 0.3) {
  ifelse(runif(n) <= p, runif(n, 0, cutpoint), runif(n, cutpoint, 1))
}
set.seed(8862)
which(runif_probs(100) > 0.8)
#[1] 38 62
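As a quick sanity check of the split (not part of the answer above, just verifying the proportions), roughly 90% of the draws should fall at or below the default cutpoint:
set.seed(1)
mean(runif_probs(1e5) <= 0.3)  # should be close to 0.9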
I am using the baysout function for outlier detection from the 'dprep' package in R. According to the R documentation, the returned value is supposed to be a two-column matrix: the first column contains the indexes of the top num.out observations (the user-defined number of outliers to return) and the second the outlyingness measure for each index.
The problem is that I want to access the index number separately but I am not able to: the function actually returns a num.out x 1 matrix rather than a num.out x 2 matrix. The index value and the outlyingness measure are both there, but I cannot access them separately. Please see sample code below:
# Install and load the dprep library
install.packages("dprep")
library(dprep)
# Create 5x3 matrix for input to baysout function
A = matrix(c(0.8, 0.4, 1.2, 0.4, 1.2, 1.1, 0.3,
0.1, 1.9, 1.1, 0.9, 1.4, 0.3, 1.5, 0.5), nrow=5, ncol=3)
# Run the baysout function on matrix A and store result in outliers
outliers <- baysout(A, blocks = 3, nclass=0, k = 3, num.out = 3)
# print out result
print(outliers)
# attempt to access the index
print(outliers[1,1])
Output is as follows:
# print out result
print(outliers)
       [,1]
4 3.625798
3 2.901654
2 2.850419

# attempt to access the index
print(outliers[1,1])
       4
3.625798
This is not my real data, which is much larger, but I would like to gain access to the index. In the example above I would like to be able to access the number 4 on its own; it is coupled with the 3.625798 and I am not able to access each figure separately. Would anyone have any advice on how I could do this?
solution by ekstroem
Use:
index <- as.numeric(rownames(outliers))
The documentation may not be entirely correct. In any case the index is stored in the row names.
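If it helps, you can also rebuild the two-column layout the documentation describes from the row names and the existing column. This is only a sketch assuming outliers is the matrix returned above (the column names are illustrative):
result <- cbind(index = as.numeric(rownames(outliers)),
                outlyingness = outliers[, 1])
result[1, "index"]  # the 4 on its own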
I would like to take many values from interpolation at once.
For example, in my data file ('int.txt'), I have a "conc1" value corresponding to each "depth1" (e.g., 1.1 m, 2.1 m, 3.1 m, 4.1 m, 5.1 m, 6.1 m).
After interpolating my concentration data, I want to obtain "conc" values at depths of 1.2, 2.2, 3.2, 4.2 and 5.2 m.
Following the comments below (I'm editing my question), I wrote code like this:
f = approxfun(depth1, conc1, rule = 1, method = 'linear')
i <- approx(depth1, conc1, rule = 1, method = 'linear', xout = seq(1.2, 5.2, 1.0))
It works well. Here, I have two more questions.
1. How can I make two columns from the data in i? Can I add these two columns to my data, 'int'? In this case, the last rows of the new columns will have no values.
2. I have one more x, y pair (y = conc2, x = depth2). I have a "conc2" value at each "depth2", and "depth2" does not have regular intervals (its values are like 1.3, 2.7, 3.2, ...). After interpolating as above, I want to extract the "conc1" values corresponding to every "depth2".
Please let me know how to do these things. Thank you very much for your help :)
approxfun() generates a function that interpolates between the given x and y vectors. You can call that function on a vector to get many interpolated values at once. There are several customizations you can make (such as the interpolation method and what to do outside the data range), but this should get you started until you need something more complicated.
?approxfun
f = approxfun(x = c(1.1, 2.1, 3.1, 4.1, 5.1), y = c(1, 3, 5, 2, 4), rule = 1, method = 'constant')
plot(y = f(seq(1.1, 5.1, .1)), x = seq(1.1, 5.1, .1))

f = approxfun(x = c(1.1, 2.1, 3.1, 4.1, 5.1), y = c(1, 3, 5, 2, 4), rule = 1, method = 'linear')
plot(y = f(seq(1.1, 5.1, .1)), x = seq(1.1, 5.1, .1))
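For the two follow-up questions, here is a rough sketch using the question's depth1, conc1 and depth2 names (assumptions taken from the post, not data I can see). approx() returns a list with components x and y that can be bound into two columns, and the function returned by approxfun() can be evaluated at any new depths; merging the result back into 'int' is then an ordinary cbind/merge step once the lengths are reconciled:
# interpolated concentrations at the new regular depths, as two columns
i <- approx(depth1, conc1, rule = 1, method = 'linear', xout = seq(1.2, 5.2, 1.0))
new_points <- data.frame(depth = i$x, conc = i$y)

# evaluate the interpolating function at the irregular depth2 values
f <- approxfun(depth1, conc1, rule = 1, method = 'linear')
conc1_at_depth2 <- f(depth2)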