How to count the numbers inside a specific range - r

I'm still having problems counting numbers.
I'm trying to find the number of values inside [-0.5, 0.5] with the first line, and the number of values outside the same range with the second line.
I use abc = rnorm(100, mean=0, sd=1). So I have 100 numbers in total, but I only get 35 numbers inside the range and 35 outside the range, which doesn't add up to 100.
length(abc[abc>=-0.5 & abc<=0.5])
[1] 35
length(abc[abc<-0.5 & abc>0.5])
[1] 35
Then I tried:
length(which(abc>=-0.5 & abc<=0.5))
[1] 40
length(which(abc<-0.5 & abc>0.5))
[1] 26
And it still doesn't add up to 100. What's wrong?

You are after:
R> set.seed(1)
R> abc = rnorm(100, mean=0, sd=1)
R> length(abc[abc >= -0.5 & abc <= 0.5])
[1] 41
R> length(abc[abc < -0.5 | abc > 0.5])
[1] 59
What went wrong
Two things:
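abc < -0.5 & abc > 0.5 asks for values that are less than -0.5 and greater than 0.5 at the same time, which no number can satisfy. You want | (OR) instead of & (AND).
However, what you actually typed was abc[abc<-0.5 & abc>0.5]. This does something quite different because of how R parses <- when there is no space between < and -. Let's pull it apart: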
R> abc[abc<-0.5 & abc>0.5]
[1] 1.5953 0.7383 0.5758 1.5118 1.1249 0.9438 <snip>
Now let's look at abc
R> abc
[1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE
You've changed the value of abc! This is because <- is the assignment operator. You have set abc equal to 0.5 & abc > 0.5. To avoid this, put a space between < and the minus sign, i.e. write abc < -0.5 (as in my code).
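To see the parsing issue directly, here is a small sketch on a made-up five-element vector (the values are chosen for illustration, not taken from the question). quote() captures the expression without evaluating it, so you can inspect how R reads abc<-0.5 without overwriting anything.

```r
# Made-up values so the counts can be checked by hand
abc <- c(-1.2, -0.3, 0.1, 0.7, 2.0)

# With spaces, `<` compares against -0.5, and `|` means "either condition"
inside  <- abc >= -0.5 & abc <= 0.5
outside <- abc < -0.5 | abc > 0.5
sum(inside)   # 2
sum(outside)  # 3

# Without the space, the top-level call is an assignment, not a comparison
e <- quote(abc<-0.5 & abc>0.5)
e[[1]]        # the assignment operator `<-`
e[[2]]        # abc, the variable that would be overwritten
```

Because inside and outside are complements here (no NA values), the two counts add up to length(abc).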

When the range is symmetric around zero, as it is here, it can be helpful to use the absolute value, so that you only have one comparison to make:
length(abc[abs(abc)<=0.5])
[1] 41
length(abc[abs(abc)>0.5])
[1] 59
Or you can use cut and table to do it in one line:
table(cut(abs(abc),c(-Inf,0.5,Inf)))
(-Inf,0.5] (0.5,Inf]
41 59
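The absolute-value trick works directly only because [-0.5, 0.5] is centred on zero. For an arbitrary range [lo, hi], one sketch is to shift by the midpoint and compare against the half-width. in_range is a hypothetical helper name, not a base R function, and values lying exactly on a boundary can be affected by floating-point rounding when lo or hi are not exactly representable.

```r
# Hypothetical helper: TRUE where lo <= x <= hi, using a single abs() comparison
in_range <- function(x, lo, hi) {
  mid  <- (lo + hi) / 2   # centre of the range
  half <- (hi - lo) / 2   # half-width of the range
  abs(x - mid) <= half
}

x <- c(-1, 0, 0.5, 2, 3.5, 4)
sum(in_range(x, 0.5, 3.5))    # 3  (0.5, 2 and 3.5)
sum(!in_range(x, 0.5, 3.5))   # 3

# With lo = -0.5 and hi = 0.5 this reduces to abs(x) <= 0.5
in_range(c(-0.6, -0.5, 0, 0.5, 0.6), -0.5, 0.5)
```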

As a shortcut, you can also do it this way:
set.seed(1)
abc <- rnorm(100, mean=0, sd=1)
sum(abc>=-0.5 & abc<=0.5)
# [1] 41
sum(abc< -0.5 | abc>0.5)
# [1] 59
This works because sum considers TRUE as 1 and FALSE as 0.
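Since TRUE counts as 1, mean() on the same logical vector gives the proportion instead of the count, which you can compare with the theoretical probability for a standard normal from pnorm(). A sketch on made-up values; na.rm = TRUE only matters if your data may contain NA.

```r
x <- c(-2, -0.4, 0, 0.3, 1.1)     # made-up values
inside <- x >= -0.5 & x <= 0.5

sum(inside, na.rm = TRUE)    # count: 3
mean(inside, na.rm = TRUE)   # proportion: 0.6

# Probability that a N(0, 1) draw lands in [-0.5, 0.5], about 0.383,
# so roughly 38 of 100 draws are expected inside, give or take sampling noise
pnorm(0.5) - pnorm(-0.5)
```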

Alternatively via subset:
set.seed(1)
abc <- rnorm(100, mean=0, sd=1)
length(subset(abc, abc >= (-0.5) & abc <= 0.5))
[1] 41
length(subset(abc, abc < (-0.5) | abc > 0.5))
[1] 59

Related

Variable FOR LOOP in R [duplicate]

I have a question about creating vectors. If I do a <- 1:10, "a" has the values 1,2,3,4,5,6,7,8,9,10.
My question is how do you create a vector with specific intervals between its elements. For example, I would like to create a vector that has the values from 1 to 100 but only count in intervals of 5 so that I get a vector that has the values 5,10,15,20,...,95,100
I think that in MATLAB we can do 5:5:100; how do we do this using R?
I could try doing 5*(1:20) but is there a shorter way? (since in this case I would need to know the whole length (100) and then divide by the size of the interval (5) to get the 20)
In R the equivalent function is seq and you can use it with the option by:
seq(from = 5, to = 100, by = 5)
# [1] 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95 100
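seq(by = ) behaves like MATLAB's start:step:stop in that it stops at the last value that does not exceed to, so the starting value matters. A quick check, comparing against the 5*(1:20) idea from the question:

```r
a <- seq(from = 5, to = 100, by = 5)
b <- 5 * (1:20)
all.equal(a, b)        # TRUE

# Starting at 1 gives 1, 6, 11, ..., 96; 100 itself is never reached
s <- seq(from = 1, to = 100, by = 5)
length(s)              # 20
s[length(s)]           # 96
```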
In addition to by you can also have other options such as length.out and along.with.
length.out: If you want to get a total of 10 numbers between 0 and 1, for example:
seq(0, 1, length.out = 10)
# gives 10 equally spaced numbers from 0 to 1
along.with: It takes the length of the vector you supply as input and provides a vector from 1:length(input).
seq(along.with=c(10,20,30))
# [1] 1 2 3
However, instead of using the along.with option, it is recommended to use seq_along. From the documentation for ?seq:
seq is generic, and only the default method is described here. Note that it dispatches on the class of the first argument irrespective of argument names. This can have unintended consequences if it is called with just one argument intending this to be taken as along.with: it is much better to use seq_along in that case.
seq_along: use this instead of seq(along.with = ...)
seq_along(c(10,20,30))
# [1] 1 2 3
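The documentation's warning is easy to trigger with a length-1 vector, and a related problem shows up with 1:length(x) when x is empty. A short demonstration:

```r
v <- c(10)               # a vector of length 1
seq(v)                   # 1 2 ... 10: treated as seq_len(10), not "along" v
seq_along(v)             # 1

empty <- numeric(0)
1:length(empty)          # 1 0: a for loop over this would run twice
seq_along(empty)         # integer(0): a for loop over this runs zero times
```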
Use the code
x = seq(0,100,5) #this means (starting number, ending number, interval)
the output will be
[1] 0 5 10 15 20 25 30 35 40 45 50 55 60 65 70 75
[17] 80 85 90 95 100
Sometimes we want to divide a vector into a given number of equal-width intervals.
In this case, you can use a function where (a) is a vector and
(b) is the number of intervals. (Let's suppose you want 4 intervals.)
a <- 1:10
b <- 4
FunctionIntervalM <- function(a,b) {
seq(from=min(a), to = max(a), by = (max(a)-min(a))/b)
}
FunctionIntervalM(a,b)
# 1.00 3.25 5.50 7.75 10.00
Therefore you have 4 intervals:
1.00 - 3.25
3.25 - 5.50
5.50 - 7.75
7.75 - 10.00
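An alternative sketch for the same breakpoints uses length.out = b + 1 (b intervals need b + 1 boundaries), which avoids computing the step size by hand. The function name interval_breaks is hypothetical.

```r
interval_breaks <- function(a, b) {
  # b intervals need b + 1 equally spaced boundaries
  seq(from = min(a), to = max(a), length.out = b + 1)
}

interval_breaks(1:10, 4)
# 1.00 3.25 5.50 7.75 10.00  (same as FunctionIntervalM(a, b))
```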
You can also use the cut function:
cut(a, 4)
# (0.991,3.25] (0.991,3.25] (0.991,3.25] (3.25,5.5] (3.25,5.5] (5.5,7.75]
# (5.5,7.75] (7.75,10] (7.75,10] (7.75,10]
#Levels: (0.991,3.25] (3.25,5.5] (5.5,7.75] (7.75,10]
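cut(a, 4) slightly widens the outer limits (hence 0.991). If you want the interval boundaries to be exactly the breakpoints computed by the function above, you can pass them to cut as a vector; include.lowest = TRUE keeps the minimum value in the first interval, and table() then gives the count per interval. A sketch:

```r
a <- 1:10
breaks <- seq(min(a), max(a), length.out = 5)   # 1.00 3.25 5.50 7.75 10.00
groups <- cut(a, breaks, include.lowest = TRUE)
table(groups)
# counts 3, 2, 2, 3 for [1,3.25], (3.25,5.5], (5.5,7.75], (7.75,10]
```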

use one variable conditioned on another

I am new to R, so not very adept at it. I am trying to use the values of one variable, conditioned on the corresponding value in the other variable. For example,
x 1 2 3 10 20 30
y 45 60 20 78 65 27
I need to calculate a variable, say m, where
m= 5 * (value of y, given value of x)
So, given x=3, corresponding y=20 then m = 5*(20|x=3) = 100
and, if x=30, corresponding y=27, then m = 5*(27|x=30) = 135
Could you please tell me how to define m in this case?
Thanks
Try this
5*y[x == 3]
## [1] 100
And
5*y[x == 30]
## [1] 135
Edit: based on your new explanation, it looks like you are looking for match, i.e.,
m <- c(0, 1, 15, 20, 3)
y[match(m, x)]*5
## [1] NA 225 NA 325 100
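For a self-contained version, here are x and y typed in from the question together with the match() lookup. An alternative sketch uses a named vector as a lookup table keyed by the values of x converted to character; that conversion is safe for whole numbers like these but can be fragile for arbitrary decimals.

```r
x <- c(1, 2, 3, 10, 20, 30)
y <- c(45, 60, 20, 78, 65, 27)

# match() returns the position of each element of m in x (NA if absent)
m <- c(0, 1, 15, 20, 3)
y[match(m, x)] * 5
# [1] NA 225 NA 325 100

# Named-vector lookup: names are the x values, elements are the y values
lookup <- setNames(y, x)
unname(lookup[as.character(m)]) * 5
# [1] NA 225 NA 325 100
```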

find largest smaller element

I have two lists of indices:
> k.start
[1] 3 19 45 120 400 809 1001
> k.event
[1] 3 4 66 300
I need a list that contains, for each element of k.event, the largest value in k.start which is less than or equal to it. The desired result is
k.desired = c(3,3,45,120)
So, I'm trying to replicate this code, except without a for loop:
k.desired <- numeric(length(k.event))
for (i in seq_along(k.event)) {
  k.desired[i] <- k.start[max(which(k.start <= k.event[i]))]
}
Thanks!
You could use
vapply(k.event, function(x) max(k.start[k.start <= x]), 1)
# [1] 3 3 45 120
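A vectorised alternative (a different technique from the vapply answer) is base R's findInterval(), which for each value returns the index of the last element of a sorted vector that is less than or equal to it. This sketch assumes k.start is sorted in increasing order. findInterval() returns 0 for an event smaller than every start, which the sketch turns into NA; the vapply version would instead return -Inf with a warning in that case.

```r
k.start <- c(3, 19, 45, 120, 400, 809, 1001)
k.event <- c(3, 4, 66, 300)

idx <- findInterval(k.event, k.start)   # 1 1 3 4
idx[idx == 0] <- NA                      # no start <= event: mark as NA
k.start[idx]
# [1] 3 3 45 120
```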

R: Finding the beginning of a (exponential?) decay?

How can I find the index marked by the red vertical line in the following example?
# Get the data as "tmpData"
source("http://pastie.org/pastes/9350691/download")
# Plot
plot(tmpData,type="l")
abline(v=49,col="red")
The following approach is promising, but how to find the peak maximum?
library(RcppRoll)
n <- 10
smoothedTmpData <- roll_mean(tmpData,n)
plot(-diff(smoothedTmpData),type="l")
abline(v=49,col="red")
which.max(-diff(smoothedTmpData)) gives you the index of the maximum.
http://www.inside-r.org/r-doc/base/which.max
I'm unsure if this is your actual question...
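Since the pastie data may no longer be available, here is a sketch on a made-up, noise-free signal that is flat up to index 49 and then decays exponentially. It also shows the index bookkeeping: element i of diff(sig) is sig[i + 1] - sig[i], so which.max(-diff(sig)) points at the last flat point before the steepest drop. Real, noisy data would still need smoothing as in the question, and a rolling mean may shift the index depending on the window length and alignment.

```r
n <- 150
i <- seq_len(n)
# Flat at 100 up to i = 49, then exponential decay with time constant 10
sig <- ifelse(i <= 49, 100, 100 * exp(-(i - 49) / 10))

drop <- -diff(sig)      # drop[k] = sig[k] - sig[k + 1]
which.max(drop)         # 49: the last point before the decay starts
```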
Where there is a single peak in the gradient, as in your example dataset, gwieshammer is correct: you can just use which.max to find it.
For the case where there are multiple possible peaks, you need a more sophisticated approach. R has lots of peak finding functions (of varying quality). One that works for this data is wavCWTPeaks in wmtsa.
library(RcppRoll)
library(wmtsa)
source("http://pastie.org/pastes/9350691/download")
n <- 10
smoothedTmpData <- roll_mean(tmpData, n)
gradient <- -diff(smoothedTmpData)
cwt <- wavCWT(gradient)
tree <- wavCWTTree(cwt)
(peaks <- wavCWTPeaks(tree))
## $x
## [1] 4 52
##
## $y
## [1] 302.6718 5844.3172
##
## attr(,"peaks")
## branch itime iscale time scale extrema iendtime
## 1 1 5 2 5 2 16620.58 4
## 2 2 57 26 57 30 20064.64 52
## attr(,"snr.min")
## [1] 3
## attr(,"scale.range")
## [1] 1 28
## attr(,"length.min")
## [1] 10
## attr(,"noise.span")
## [1] 5
## attr(,"noise.fun")
## [1] "quantile"
## attr(,"noise.min")
## 5%
## 4.121621
So the main peak close to 50 is correctly found, and the routine picks up another smaller peak at the start.

Create categorical variable in R based on range

I have a data frame with a column of integers that I would like to use as a reference to make a new categorical variable. I want to divide the variable into three groups and set the ranges myself (i.e. 0-5, 6-10, etc.). I tried cut, but when given a number of groups it splits the range into equal-width intervals, which doesn't suit my right-skewed data. I have also tried if/then statements, but these output a TRUE/FALSE value and I would like to keep my original variable. I am sure there is a simple way to do this but I cannot seem to figure it out. Any advice on a quick, simple way to do this?
I had something in mind like this:
x x.range
3 0-5
4 0-5
6 6-10
12 11-15
x <- rnorm(100,10,10)
cut(x,c(-Inf,0,5,6,10,Inf))
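To reproduce the table from the question exactly, one sketch is to pass explicit breaks plus labels. With the default right = TRUE, the interval (-1, 5] holds the integers 0 to 5 and (5, 10] holds 6 to 10, and so on; this relies on x containing whole numbers, as in the question.

```r
x <- c(3, 4, 6, 12)
x.range <- cut(x, breaks = c(-1, 5, 10, 15),
               labels = c("0-5", "6-10", "11-15"))
data.frame(x, x.range)
# x.range is 0-5, 0-5, 6-10, 11-15 for x = 3, 4, 6, 12
```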
Ian's answer (cut) is the most common way to do this, as far as I know.
I prefer to use shingle, from the lattice package;
the argument that specifies the binning intervals seems a little more intuitive to me.
You use shingle like so:
library(lattice)
# mock some data
data = sample(0:40, 200, replace=TRUE)
a = c(0, 5); b = c(5, 9); c = c(9, 19); d = c(19, 33); e = c(33, 41)
my_bins = matrix(rbind(a, b, c, d, e), ncol=2)
# printing my_bins shows the binning intervals I've set:
[,1] [,2]
[1,] 0 5
[2,] 5 9
[3,] 9 19
[4,] 19 33
[5,] 33 41
shx = shingle(data, intervals=my_bins)
# typing shx at the interactive prompt gives you a nice frequency table:
# Intervals:
min max count
1 0 5 23
2 5 9 17
3 9 19 56
4 19 33 76
5 33 41 46
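The counts above add up to 218 rather than 200. A likely explanation is that adjacent intervals share their endpoints (5, 9, 19, 33), so a value sitting exactly on a boundary is counted in both neighbouring intervals. The sketch below reproduces that kind of count in base R on a small made-up vector, assuming each interval is closed at both ends; count_in_bins is a hypothetical helper, not part of lattice.

```r
# Hypothetical helper: count values falling in each [lower, upper] row of bins
count_in_bins <- function(x, bins) {
  vapply(seq_len(nrow(bins)),
         function(i) sum(x >= bins[i, 1] & x <= bins[i, 2]),
         integer(1))
}

bins <- rbind(c(0, 5), c(5, 9))   # two intervals sharing the endpoint 5
x <- c(1, 5, 5, 7)
count_in_bins(x, bins)            # 3 3: each 5 is counted in both intervals
sum(count_in_bins(x, bins))       # 6, more than length(x) = 4
```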
We can use smart_cut from package cutr:
devtools::install_github("moodymudskipper/cutr")
library(cutr)
x <- c(3,4,6,12)
To cut with intervals of width 5 starting at 1:
smart_cut(x,list(5,1),"width" , simplify=FALSE)
# [1] [1,6) [1,6) [6,11) [11,16]
# Levels: [1,6) < [6,11) < [11,16]
To get exactly your requested output :
smart_cut(x,c(0,6,11,16), labels = ~paste0(.y[1],'-',.y[2]-1), simplify=FALSE, open_end = TRUE)
# [1] 0-5 0-5 6-10 11-15
# Levels: 0-5 < 6-10 < 11-15
more on cutr and smart_cut
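If installing a GitHub package is not an option, a base R sketch gets the same left-closed intervals of width 5 starting at 1: seq() builds the breaks, right = FALSE makes intervals closed on the left, include.lowest = TRUE then closes the last interval on the right (matching the [11,16] level above), and ordered_result = TRUE gives an ordered factor like smart_cut's output.

```r
x <- c(3, 4, 6, 12)
breaks <- seq(1, 16, by = 5)              # 1 6 11 16
g <- cut(x, breaks, right = FALSE, include.lowest = TRUE,
         ordered_result = TRUE)
g
# values [1,6) [1,6) [6,11) [11,16], levels [1,6) < [6,11) < [11,16]
```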
