Find n smallest values from data? - r

How can I get the 3 smallest values in the data automatically?
Data:
data <- c(4,3,5,2,2,1,1,5,6,7,8,9)
[1] 4 3 5 2 2 1 1 5 6 7 8 9
The min() function returns just 1 value, and I want to get the 3 smallest values from the data.
min(data)
[1] 1
Can I get this from the data?
[1] 1 1 2

Simply take the first three values of a sorted vector
> sort(data)[1:3]
[1] 1 1 2
Another alternative is the head function, which shows the first n values of an R object, so for the three lowest numbers you need the head of a sorted vector:
> head(sort(data), 3)
[1] 1 1 2
...but you could take the head of almost any other R object.
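As a minimal sketch, the sort-then-head idea above can be wrapped in a small helper. The name n_smallest is made up for illustration and is not a base R function. The second part uses order() in case the positions of the smallest values are needed rather than the values themselves.

```r
# Hypothetical helper (not part of base R): the n smallest values of x.
n_smallest <- function(x, n) {
  # sort() drops NA values by default; head() copes with n > length(x)
  head(sort(x), n)
}

data <- c(4, 3, 5, 2, 2, 1, 1, 5, 6, 7, 8, 9)
n_smallest(data, 3)
# [1] 1 1 2

# order() returns positions instead of values; ties keep their original order
idx <- head(order(data), 3)
idx
# [1] 6 7 4
```

If n is larger than the vector, head() simply returns everything that is there, which [1:n] indexing would pad with NA.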
If you are interested in the value that marks the upper boundary of the lowest k percent of values, use the quantile function:
> quantile(data, 0.1)
10%
1.1
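A short sketch of how that cutoff might then be used to pull out the values at or below it. The variable names cutoff and low are illustrative, and the result depends on the default interpolation used by quantile (type = 7).

```r
data <- c(4, 3, 5, 2, 2, 1, 1, 5, 6, 7, 8, 9)

# 10% quantile with the default interpolation (type = 7)
cutoff <- quantile(data, 0.1)

# keep only the values at or below the cutoff
low <- data[data <= cutoff]
low
# [1] 1 1
```

Note that this selects by value, so the number of results is not fixed the way it is with head(sort(data), 3).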

data <- c(4,3,5,2,2,1,1,5,6,7,8,9)
sort(data, decreasing = FALSE)[1:3]

Related

How to create a vector of positions of a numeric vector in R?

I have a vector of numbers that contain some gaps. For example,
vec <- c(3,1,7,3,5,7)
So, there are 4 distinct values, and I would like to transform vec into a vector of values (without gaps) giving the rank of each entry while keeping its position. In this case, I would like to obtain
2 1 4 2 3 4
that is, values between 1 and 4 showing the order of each entry in the original vector vec.
You can use match to help you look up the values in a sorted unique order. For example
vec <- c(3,1,7,3,5,7)
match(vec, sort(unique(vec)))
# [1] 2 1 4 2 3 4
This works because match returns, for each element of vec, its position in the sorted vector of unique values, and positions start at 1.
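To make the lookup step visible, here is the same computation split in two. The names lookup and pos are only for illustration.

```r
vec <- c(3, 1, 7, 3, 5, 7)

# distinct values in increasing order: the "lookup table"
lookup <- sort(unique(vec))
lookup
# [1] 1 3 5 7

# position of each element of vec within the lookup table
pos <- match(vec, lookup)
pos
# [1] 2 1 4 2 3 4

# indexing the lookup table with pos recovers the original vector
identical(lookup[pos], vec)
# [1] TRUE
```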
We may use factor
as.integer(factor(vec))
[1] 2 1 4 2 3 4

create a new variable in r based on cycling timestamps (s) to identify replicates

I have the following dummy data:
x<-c(0,1,2,3,0,1,2)
y<-c(1,1,1,1,1,1,1)
z<-c(2,2,2,2,2,2,2)
a<-data.frame(x,y,z)
However, I would like to add a variable that identifies the replicates based on the cycle of x:
f<-c(1,1,1,1,2,2,2)
b<-data.frame(a,f)
head(b)
I need to do this in code, as I have 48 individuals with thousands of observations each, potentially with a different length of x (seconds). Any help would be very much appreciated.
You can use cumsum and diff to count the occurrences where x returns to 0, or even just moves backwards:
a$f <- c(0, cumsum(diff(a$x) < 0)) + 1
a
#> x y z f
#> 1 0 1 2 1
#> 2 1 1 2 1
#> 3 2 1 2 1
#> 4 3 1 2 1
#> 5 0 1 2 2
#> 6 1 1 2 2
#> 7 2 1 2 2
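Since the question mentions 48 individuals, the counter probably needs to restart for each one. A possible sketch using base R's ave(), assuming there is a column identifying the individual (called id here, which is an assumption and not in the original data):

```r
# Made-up data: two individuals, each with its own cycles of x.
# The column name `id` is an assumption for illustration.
d <- data.frame(
  id = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  x  = c(0, 1, 2, 0, 1, 0, 1, 0, 1)
)

# ave() applies the function within each id group and
# returns the results in the original row order
d$f <- ave(d$x, d$id, FUN = function(v) c(0, cumsum(diff(v) < 0)) + 1)
d$f
# [1] 1 1 1 2 2 1 1 2 2
```

The replicate number starts again at 1 for the second individual because diff() never compares values across the id boundary.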
If a cycle is started by x == 0, then all you need to do is calculate your column f as follows:
f <- cumsum(a$x == 0)
If, more generally, you need to check whether a value in the x column is not strictly larger than its predecessor (and thus denotes a new cycle), try this:
f <- cumsum(a$x <= c(Inf, a$x[-nrow(a)]))
The above compares every observation in a$x against a shifted version of the vector: an Inf value is prepended to make sure the first observation counts as a new cycle, and the last observation is removed so that the shifted vector has the same length.
Please see Allan Cameron's answer for a more elegant approach to this, using diff.
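The three variants agree on the example data but are not interchangeable. A small made-up vector with a repeated timestamp shows where they differ (the variable names are illustrative):

```r
# Made-up example: the value 1 is repeated within the first cycle
x <- c(0, 1, 1, 2, 0, 1)

# new cycle whenever x is exactly 0
by_zero <- cumsum(x == 0)
by_zero
# [1] 1 1 1 1 2 2

# new cycle whenever x strictly decreases
by_drop <- c(0, cumsum(diff(x) < 0)) + 1
by_drop
# [1] 1 1 1 1 2 2

# new cycle whenever x fails to strictly increase (a repeat counts too)
by_nonincrease <- cumsum(x <= c(Inf, x[-length(x)]))
by_nonincrease
# [1] 1 1 2 2 3 3
```

Which one is right depends on whether a repeated timestamp should start a new replicate in your data.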

Get first n indexes fulfilling a condition in r

I have a large vector with different values. I would like to find the indexes of the N smallest values that are less than a particular value.
For example, in the following vector I want only the 3 indexes of the smallest values that are less than 3:
x2 <- c(1.6,0.35,1,3,6,8,1.5,2)
x3 <- which(x2 < 3)
x3
[1] 1 2 3 7 8
From x3 I can extract the first three values, but they are not the smallest values in the vector. If I order the x2 vector before applying the condition, I lose the indexes of the values. What I want in the end is as follows:
[1] 2 3 7
The rank function is what you are looking for:
which(rank(x2)<=3 & x2<3)
#[1] 2 3 7
Try:
match(sort(x2[x2 < 3])[1:3], x2)
#[1] 2 3 7
We can match the smallest 3 values less than the threshold to the original vector.
Edit: this will work with both unique and non-unique vectors:
which(!is.na(match(x2, sort(x2[x2 < 3])[1:3])))
[1] 2 3 7
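One thing to watch with the approaches above is ties: rank() averages tied ranks by default, and matching on values picks up every duplicate, so either can return more or fewer than n indexes. A sketch of an order()-based alternative that returns at most n positions; the helper name n_smallest_below is made up for illustration.

```r
# Hypothetical helper: positions of the n smallest values below `threshold`.
n_smallest_below <- function(x, n, threshold) {
  idx <- which(x < threshold)   # candidate positions
  idx <- idx[order(x[idx])]     # candidates ordered by value (ties keep position order)
  sort(head(idx, n))            # keep at most n, report in position order
}

x2 <- c(1.6, 0.35, 1, 3, 6, 8, 1.5, 2)
n_smallest_below(x2, 3, 3)
# [1] 2 3 7

# With tied values the rank-based filter returns all three tied positions...
ties <- c(1, 1, 1, 5)
which(rank(ties) <= 2 & ties < 3)
# [1] 1 2 3

# ...while the helper stops at n
n_smallest_below(ties, 2, 3)
# [1] 1 2
```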

interaction, number of groups

From a vector a, I'm looking for a function (quick to compute) that returns a vector of numbers ranging from 1 to the number of levels in a, indicating which values are equal.
I know how to do this with a for loop but it is a bit slow to run.
a <- c(11,14,11,22,14,22)
nlevels(as.factor(a)) == 3
Solution
b <- c(1,2,1,3,2,3)
meaning that in positions 1 and 3 (where b is 1) the values in a are equal,
in positions 2 and 5 (where b is 2) the values in a are equal,
etc.
Thank you
You can use as.numeric() on a factor to get this:
a <- c(11,14,11,22,14,22)
as.numeric(factor(a))
# [1] 1 2 1 3 2 3
Here is one function that's quickly made:
numberfun <- function(x) {
  y <- unique(x)  # distinct values, in order of first appearance
  match(x, y)     # position of each element within y
}
a <- c(11,14,11,22,14,22)
numberfun(a)
#[1] 1 2 1 3 2 3
a <- c(99,99,22,22,44,22,99)
numberfun(a)
#[1] 1 1 2 2 3 2 1
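The two answers number the groups differently, which only shows up when the values do not first appear in increasing order. A quick side-by-side on the second example (variable names are illustrative):

```r
a <- c(99, 99, 22, 22, 44, 22, 99)

# factor() numbers groups by sorted value: 22 -> 1, 44 -> 2, 99 -> 3
by_value <- as.numeric(factor(a))
by_value
# [1] 3 3 1 1 2 1 3

# match() on unique() numbers groups by first appearance: 99 -> 1, 22 -> 2, 44 -> 3
by_appearance <- match(a, unique(a))
by_appearance
# [1] 1 1 2 2 3 2 1
```

Both identify the same groups of equal values; only the labels differ, so either should do if the labels themselves carry no meaning.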

How do I get the tabulate() function in R to include the value of zero? Are there other options?

I have a test data set called "predicted" that results after taking 100 bootstrap samples from a random normal distribution. Predicted is filled with integer data (from 0 to 20).
When I use the following function:
predicted_output <- as.matrix(tabulate(predicted,
nbins = max(0, predicted, na.rm = FALSE)))
I observe that all counts for the value 0 are excluded from the resulting matrix (consistent with the tabulate documentation, which notes that NAs are (silently) ignored). How can I augment tabulate to produce a matrix which, in my case, has 21 rows and includes the counts for the zeros?
An easy workaround is to change NA values to max(predicted)+1. You can get the counts of 0 as well by doing tabulate(predicted+1):
x <- c(1,1,0,0,0,2,3,7,10,NA,5,2,NA,10)
x[is.na(x)] <- max(x, na.rm=TRUE) + 1
tabulate(x+1)
# [1] 3 2 2 1 0 1 0 1 0 0 2 2
Note that the counts for 0s and NAs are also included above. The first value (3) is the number of 0s and the last is the count of NAs.
You can check this with:
x <- c(1,1,0,0,0,2,3,7,10,NA,5,2,NA,10)
table(x, exclude=NULL)
# x
# 0 1 2 3 5 7 10 <NA>
# 3 2 2 1 1 1 2 2
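For the 21-row matrix asked about in the question, one possible sketch is to shift the data by one and fix nbins, so that every value from 0 to 20 gets a row even if it never occurs. The predicted vector below is a made-up stand-in for the real data and is assumed to contain no NAs.

```r
# Made-up stand-in for `predicted`: integers between 0 and 20, no NAs
predicted <- c(0, 0, 3, 20, 7, 3, 0)

# shift by 1 so that bin i holds the count of the value i - 1
counts <- tabulate(predicted + 1, nbins = 21)
names(counts) <- 0:20

predicted_output <- as.matrix(counts)
dim(predicted_output)
# [1] 21  1

predicted_output["0", 1]   # count of zeros
# [1] 3
```

An alternative along the same lines would be table(factor(predicted, levels = 0:20)), which also keeps the empty levels.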
