Get first n indexes fulfilling a condition in R

I have a large vector containing different values. I would like to find the indexes of the N smallest values that are less than a particular value.
For example, in the following vector I want only 3 indexes whose values are less than 3:
x2 <- c(1.6,0.35,1,3,6,8,1.5,2)
x3 <- which(x2 < 3)
x3
[1] 1 2 3 7 8
From x3 I can extract the first three values, but they are not the smallest values in the vector. If I order the x2 vector before applying the condition, I lose the indexes of the values. What I want in the end is the following:
[1] 2 3 7

The rank function is what you are looking for:
which(rank(x2)<=3 & x2<3)
#[1] 2 3 7
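One caveat, shown as a minimal sketch: rank() averages tied ranks by default, so with duplicated values the condition rank(x) <= n can select more than n positions. Assuming ties should be broken by position, ties.method = "first" gives every element a distinct rank. The vector y below is a made-up example, not data from the question.

```r
# Hypothetical vector with ties (not from the question)
y <- c(1, 2, 2, 2, 9)

# Default ranking averages ties: the three 2s all receive rank 3
idx_default <- which(rank(y) <= 3 & y < 3)
idx_default
# [1] 1 2 3 4

# Breaking ties by position yields exactly three indexes
idx_first <- which(rank(y, ties.method = "first") <= 3 & y < 3)
idx_first
# [1] 1 2 3
```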

Try:
match(sort(x2[x2 < 3])[1:3], x2)
#[1] 2 3 7
We can match the smallest 3 values less than the threshold to the original vector.
Edit: this version works with both unique and non-unique vectors:
which(!is.na(match(x2, sort(x2[x2 < 3])[1:3])))
[1] 2 3 7
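The same idea can also be written with order(), which avoids matching values back to positions. This is only a sketch; first_n_below is a hypothetical helper name, not a base R function, and it assumes ties should be resolved in favour of earlier positions.

```r
# first_n_below is a hypothetical helper, not part of base R
first_n_below <- function(x, threshold, n) {
  idx <- which(x < threshold)   # positions that satisfy the condition
  idx <- idx[order(x[idx])]     # those positions, sorted by their values
  sort(head(idx, n))            # keep the n smallest, back in original order
}

x2 <- c(1.6, 0.35, 1, 3, 6, 8, 1.5, 2)
res <- first_n_below(x2, threshold = 3, n = 3)
res
# [1] 2 3 7
```

Because head() is used rather than [1:n], the result simply has fewer than n entries when fewer than n values fall below the threshold, instead of containing NA.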

Related

How to create a vector of positions of a numeric vector in R?

I have a vector of numbers that contain some gaps. For example,
vec <- c(3,1,7,3,5,7)
So, there are 4 different values and I would like to transform it into a vector of values (without gaps) indicating the order of the entry while respecting the same position. So, in this case, I would like to obtain
2 1 4 2 3 4
That is, a sequence of values between 1 and 4 giving the rank of each entry among the distinct values of the original vector vec.
You can use match to help you look up the values in a sorted unique order. For example
vec <- c(3,1,7,3,5,7)
match(vec, sort(unique(vec)))
# [1] 2 1 4 2 3 4
This works because match returns the position of each value in the sorted unique vector, and positions start at 1.
We may use factor
as.integer(factor(vec))
[1] 2 1 4 2 3 4
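As a small sketch of why the two answers agree: both rely on the sorted distinct values as a lookup table. The names lookup, by_match and by_factor are introduced here only for illustration.

```r
vec <- c(3, 1, 7, 3, 5, 7)

# The sorted distinct values act as a lookup table: 1 3 5 7
lookup <- sort(unique(vec))

# Position of each element in the lookup table
by_match <- match(vec, lookup)

# factor() sorts the distinct values into levels; as.integer() returns the level codes
by_factor <- as.integer(factor(vec))

identical(by_match, by_factor)
# [1] TRUE
```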

How can I multiply specific numbers in a vector using R

I have a task to multiply numbers in a vector, but only those that are divisible by 3 (that is, x %% 3 == 0). I figured out how to replace certain elements of a vector, but it only works if I replace them with a fixed number. I wasn't able to find an answer at http://www.r-tutor.com/r-introduction/vector or on this site; everyone only extracts the values into another vector.
x <- c(1,1,2,2,2,3,3)
x[x%%2==0] = 5
# [1] 1 1 5 5 5 3 3
Why doesn't this work?
x[x%%3==0] = x*3
I expect to get this:
c(1,1,5,5,5,9,9)
The vectors on the lhs and rhs of the assignment operator do not have the same length.
length(x*3)
#[1] 7
length(x[x%%3 ==0])
#[1] 2
We need to do
x[x%%3==0] <- x[x%%3==0]*3
x
#[1] 1 1 5 5 5 9 9
Instead of repeating the logical vector, we can store it in an object and then do the substitution
i1 <- x%%3 == 0
x[i1] <- x[i1]*3
In the first assignment, the right-hand side was a single value, which is recycled to replace every element for which the logical condition is met
Another option is
pmax(x, x*(!x%%3)*3)
#[1] 1 1 5 5 5 9 9
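A further option, sketched here, is ifelse(), which picks element by element between the tripled and the original value. Note that the pmax() trick relies on the values being positive; the vector y below is a made-up example (not from the question) showing where the two approaches diverge.

```r
x <- c(1, 1, 5, 5, 5, 3, 3)

# Triple the multiples of 3, keep everything else unchanged
res_x <- ifelse(x %% 3 == 0, x * 3, x)
res_x
# [1] 1 1 5 5 5 9 9

# Hypothetical vector containing a negative multiple of 3
y <- c(-3, 2, 6)

res_ifelse <- ifelse(y %% 3 == 0, y * 3, y)
res_ifelse
# [1] -9  2 18

# pmax() keeps the larger of the two candidates, so -3 is not replaced by -9
res_pmax <- pmax(y, y * (!y %% 3) * 3)
res_pmax
# [1] -3  2 18
```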

Find n smallest values from data?

How can I get the 3 minimum values from the data automatically?
Data:
data <- c(4,3,5,2,2,1,1,5,6,7,8,9)
[1] 4 3 5 2 2 1 1 5 6 7 8 9
The min() function returns just 1 value, and I want to get the 3 minimum values from the data.
min(data)
[1] 1
Can I get this from the data?
[1] 1 1 2
Simply take the first three values of a sorted vector
> sort(data)[1:3]
[1] 1 1 2
Another alternative is the head function, which shows the first n values of an R object, so for the three lowest numbers you take the head of a sorted vector
> head(sort(data), 3)
[1] 1 1 2
...but you could take head of possibly any other R object.
If you are interested in the value that marks the upper boundary of the lowest k percent of values, use the quantile function
> quantile(data, 0.1)
10%
1.1
data <- c(4,3,5,2,2,1,1,5,6,7,8,9)
sort(data, decreasing = FALSE)[1:3]
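If the positions of the smallest values are needed as well, order() can be used in place of sort(). A minimal sketch, where pos is a name introduced only for illustration:

```r
data <- c(4, 3, 5, 2, 2, 1, 1, 5, 6, 7, 8, 9)

# order() returns the positions that would sort the vector;
# tied values keep their original relative order
pos <- order(data)[1:3]
pos
# [1] 6 7 4

# Indexing with those positions recovers the three smallest values
data[pos]
# [1] 1 1 2
```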

Computing difference between rows in a data frame

I have a data frame. I would like to compute how "far" each row is from a given row. Let us consider it for the 1st row. Let the data frame be as follows:
> sampleDF
X1 X2 X3
1 5 5
4 2 2
2 9 1
7 7 3
What I wish to do is the following:
Compute the difference between the 1st row & others: sampleDF[1,]-sampleDF[2,]
Consider only the absolute value: abs(sampleDF[1,]-sampleDF[2,])
Compute the sum of the newly formed data frame of differences: rowSums(newDF)
Now to do this for the whole data frame.
newDF <- sapply(2:4,function(x) { return (abs(sampleDF[1,]-sampleDF[x,]));})
This creates a problem in that the result is a transposed list. Hence,
newDF <- as.data.frame(t(sapply(2:4,function(x) { return (abs(sampleDF[1,]-sampleDF[x,]));})))
But another problem arises while computing rowSums:
> class(newDF)
[1] "data.frame"
> rowSums(newDF)
Error in base::rowSums(x, na.rm = na.rm, dims = dims, ...) :
'x' must be numeric
> newDF
X1 X2 X3
1 3 3 3
2 1 4 4
3 6 2 2
>
Puzzle 1: Why do I get this error? I did notice that newDF[1,1] is a list & not a number. Is it because of that? How can I ensure that the result of the sapply & transpose is a simple data frame of numbers?
So I proceed to create a global data frame & modify it within the function:
sapply(2:4,function(x) { newDF <<- as.data.frame(rbind(newDF,abs(sampleDF[1,]-sampleDF[x,])));})
> newDF
X1 X2 X3
2 3 3 3
3 1 4 4
4 6 2 2
> rowSums(newDF)
2 3 4
9 9 10
>
This is as expected.
Puzzle 2: Is there a cleaner way to achieve this? How can I do this for every row in the data frame (shown above is only for "distance" from row 1. Would need to do this for other rows as well)? Is running a loop the only option?
To put it in words, you are trying to compute the Manhattan distance:
dist(sampleDF, method = "manhattan")
#    1  2  3
# 2  9
# 3  9 10
# 4 10  9  9
Regarding your implementation, I think the problem is that your inner function is returning a data.frame when it should return a numeric vector. Doing return(unlist(abs(sampleDF[1,]-sampleDF[x,]))) should fix it.
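Both routes can be sketched end to end as follows. The data frame is rebuilt from the values printed in the question, the method names accepted by dist() are lowercase, and the names d, from_row1 and by_loop are introduced here only for illustration.

```r
sampleDF <- data.frame(X1 = c(1, 4, 2, 7),
                       X2 = c(5, 2, 9, 7),
                       X3 = c(5, 2, 1, 3))

# Full Manhattan distance matrix; row 1 holds the distances from row 1
d <- as.matrix(dist(sampleDF, method = "manhattan"))
from_row1 <- d[1, -1]
from_row1
#  2  3  4
#  9  9 10

# The loop version, with the inner function returning a single number
by_loop <- sapply(2:4, function(i) {
  sum(abs(unlist(sampleDF[1, ]) - unlist(sampleDF[i, ])))
})
by_loop
# [1]  9  9 10
```

Indexing a different row of d gives the distances from that row, so no extra loop is needed to cover every row of the data frame.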

interaction, number of groups

From a vector a I'm looking for a function (quick to compute) that returns a vector with numbers ranging between 1 and the number of levels in vector a and indicating which values are equal.
I know how to do this with a for loop but it is a bit slow to run.
a <- c(11,14,11,22,14,22)
length(levels(as.factor(a))) == 3
Solution
b <- c(1,2,1,3,2,3)
meaning that in positions 1 and 3 (where b is 1) the values in a are equal,
in positions 2 and 5 (where b is 2) the values in a are equal,
and so on.
Thank you
You can use as.numeric() on a factor to get this:
a <- c(11,14,11,22,14,22)
as.numeric(factor(a))
# [1] 1 2 1 3 2 3
Here is one function that's quickly made:
numberfun <- function(x) {
  y <- unique(x)
  match(x, y)
}
a <- c(11,14,11,22,14,22)
numberfun(a)
#[1] 1 2 1 3 2 3
a <- c(99,99,22,22,44,22,99)
numberfun(a)
#[1] 1 1 2 2 3 2 1
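The two answers number the groups differently, which the following sketch makes explicit: factor() numbers groups by sorted value, while match() against unique() numbers them by first appearance. The names by_value and by_appearance are introduced only for illustration.

```r
a <- c(99, 99, 22, 22, 44, 22, 99)

# Groups numbered by sorted value: 22 -> 1, 44 -> 2, 99 -> 3
by_value <- as.numeric(factor(a))
by_value
# [1] 3 3 1 1 2 1 3

# Groups numbered by order of first appearance: 99 -> 1, 22 -> 2, 44 -> 3
by_appearance <- match(a, unique(a))
by_appearance
# [1] 1 1 2 2 3 2 1
```

Both results identify the same groups; only the labels differ. For the vector in the question the two agree because its values first appear in ascending order.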
