Determining minimum values in a vector in R - r

I need some help finding more than one minimum value in a vector. Suppose I have a vector x:
x <- c(1, 10, 2, 4, 100, 3)
and would like to determine the indexes of the 3 smallest elements, i.e. 1, 2 and 3. I need the indexes because I will be using them to access the corresponding elements in another vector. Of course, sorting will give the minimum values, but I want to know the indexes at which they occur prior to sorting.

To find the indexes, try this:
which(x %in% sort(x)[1:3]) # this gives you an index vector
[1] 1 3 6
This says that the first, third and sixth elements hold the three lowest values in your vector. To see which values these are, try:
x[ which(x %in% sort(x)[1:3])] # this gives the vector of values
[1] 1 2 3
or just
x[c(1,3,6)]
[1] 1 2 3
If you have any duplicated values, you may want to select the unique values first and then sort them in order to find the indexes, like this (suggested by @Jeffrey Evans in his answer):
which(x %in% sort(unique(x))[1:3])

I think you mean you want to know the indices of the bottom 3 elements? In that case you want order(x)[1:3], which returns the indices sorted by value.
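Since the stated goal is to use the indices on another vector, a minimal sketch of that step might look like the following. The companion vector y is hypothetical and only stands in for the asker's second vector.

```r
x <- c(1, 10, 2, 4, 100, 3)
y <- c("a", "b", "c", "d", "e", "f")  # hypothetical second vector, same length as x

# order(x) lists the positions of x from smallest to largest value;
# keeping the first three gives the positions of the three smallest values.
idx <- order(x)[1:3]
idx     # 1 3 6

# Use the same positions to pull the matching elements from y.
y[idx]  # "a" "c" "f"
```

Unlike which(x %in% ...), this always returns exactly three indices, ordered by the size of the values they point to.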

You can use unique to account for duplicate minimum values.
x<-c(1,10,2,4,100,3,1)
which(x %in% sort(unique(x))[1:3])

Here's another way with rank that includes duplicates.
x <- c(x, 3)
# [1] 1 10 2 4 100 3 3
which(rank(x, ties.method='min') <= 3)
# [1] 1 3 6 7
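To see how the approaches in these answers differ when ties are present, the sketch below runs them side by side on the vector with a duplicated 3. It uses only base R; the variable names are illustrative.

```r
x <- c(1, 10, 2, 4, 100, 3, 3)
k <- 3

# order(): exactly k indices; ties are kept in their original order.
by_order <- order(x)[1:k]

# rank() with ties.method = "min": every index whose rank is within the k smallest.
by_rank <- which(rank(x, ties.method = "min") <= k)

# unique() + %in%: every index holding one of the k smallest distinct values.
by_unique <- which(x %in% sort(unique(x))[1:k])

by_order   # 1 3 6
by_rank    # 1 3 6 7
by_unique  # 1 3 6 7
```

Which one is appropriate depends on whether a fixed number of indices is needed or whether all tied positions should be returned.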

Related

Looping through items on a list in R

This may be a simple question, but I'm fairly new to R.
What I want to do is perform some kind of addition on the indexes of a list, so that once the index goes past the maximum it wraps around to the first value in the list and starts over from there.
For example:
x <-2
data <- c(0,1,2,3,4,5,6,7,8,9,10,11)
data[x]
1
data[x+12]
1
data[x+13]
2
or something functionally equivalent. In the end I want to be able to do something like
v=6
x=8
y=9
z=12
values <- c(v,x,y,z)
data <- c(0,1,2,3,4,5,6,7,8,9,10,11)
set <- c(data[values[1]],data[values[2]], data[values[3]],data[values[4]])
set
5 7 8 11
values <- values + 8
set
1 3 4 7
I've tried some things with addition and subtraction relative to the length of my list, but it does not work well on the lower numbers.
I hope this was a clear enough explanation.
Thanks in advance!
We don't need a loop here, as a vector can be indexed by another vector of any length:
data[values]
#[1] 5 7 8 11
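NOTE: Both objects are vectors, not lists.
If we need the index to wrap around after adding to it, subtract the length from any index that overshoots and subset data with the result:
values <- values + 8
data[ifelse(values > length(data), values - length(data), values)]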
#[1] 1 3 4 7
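If an index can overshoot by more than one full length, modular arithmetic is a common alternative. The sketch below assumes a small helper, here called wrap_index (a hypothetical name, not a base R function), that maps any positive integer onto the range 1..n.

```r
# Hypothetical helper: wrap 1-based indices onto 1..n.
# Subtracting 1 before %% and adding 1 afterwards keeps the result
# 1-based, so an index of n stays n instead of becoming 0.
wrap_index <- function(i, n) ((i - 1) %% n) + 1

data <- c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
values <- c(6, 8, 9, 12)
n <- length(data)

set0 <- data[wrap_index(values, n)]       # no wrapping needed
set1 <- data[wrap_index(values + 8, n)]   # wraps once
set2 <- data[wrap_index(values + 32, n)]  # wraps more than once

set0  # 5 7 8 11
set1  # 1 3 4 7
set2  # 1 3 4 7
```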

Create a new vector with no element being in the same position as the original vector?

Suppose I have a vector V1 (with two or more elements):
V1 <- 1:10
I can reorder the original vector with the function sample. This function, however, cannot guarantee that no element of the new vector is in the same position as in the original vector. For example:
set.seed(4)
V2 <- sample(V1)
This results in a vector that has two elements in the same position as in the original one:
V1[V1 == V2]
3 5
My question is: is it possible to generate a random vector such that no element is in the same position in the two vectors?
Requiring that no element stays in its original position means you don't want a purely random permutation, since a random permutation can leave elements in place. The best I could come up with is to loop, calling sample until we find a vector where every element has moved:
v1 <- 1:10
v1_perm <- v1
cnt <- 0
while (sum(v1 == v1_perm) > 0) {
  v1_perm <- sample(v1)
  cnt <- cnt + 1
}
v1
v1_perm
paste0("It took ", cnt, " tries to find a suitable vector")
[1] 1 2 3 4 5 6 7 8 9 10
[1] 3 10 4 7 8 1 6 2 5 9
[1] "It took 3 tries to find a suitable vector"
Note that I check for shifted positions by comparing values. This isn't strictly equivalent, because two entries could hold the same value. But assuming all your entries are unique, zero overlap of values is the same as zero overlap of indices.
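When the entries are not guaranteed to be unique, one option is to run the same rejection loop on the positions instead of the values. The following is a minimal sketch under that assumption; derange_index is a hypothetical helper name, not an existing function.

```r
# Hypothetical helper: return a random permutation of 1..n in which
# no index stays in its own position. It keeps resampling until the
# condition holds, so n must be at least 2.
derange_index <- function(n) {
  stopifnot(n >= 2)
  idx <- seq_len(n)
  repeat {
    perm <- sample(idx)
    if (!any(perm == idx)) return(perm)
  }
}

set.seed(4)
v1 <- c(5, 5, 7, 9, 9, 2)  # duplicates are fine: positions are compared, not values
perm <- derange_index(length(v1))
v2 <- v1[perm]

any(perm == seq_along(v1))  # FALSE: every element has moved
```

With duplicated values, v2 may still show an equal value at some position, but it always comes from a different element of v1.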

R: Which element of a list corresponds to an element of a sorted list

I have a vector x, and I want to know which 5 elements of x have the smallest values and where they are located in x.
The smallest 5 values of x will be sort(x)[1:5],
But how do I know what place these values have in the original x vector?
You are looking for the order function.
order returns a permutation which rearranges its first argument into ascending or descending order, breaking ties by further arguments. sort.list is the same, using only one argument.
> x <- rnorm(10)
> x
[1] 1.6722546 1.3608374 0.7912174 -0.7017244 -0.2093535 1.7224396 -0.9370661 -1.5226014 0.4416517 -0.0455294
> order(x)
[1] 8 7 4 5 10 9 3 2 1 6
> x[order(x)[1:3]]
[1] -1.5226014 -0.9370661 -0.7017244
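To keep the locations and the values together, the result of order can be used twice. The sketch below reuses the numbers printed above (rounded to two decimals, so x here is an assumption rather than a fresh rnorm draw) and collects both in a data frame.

```r
# Values copied from the example output above, rounded to two decimals.
x <- c(1.67, 1.36, 0.79, -0.70, -0.21, 1.72, -0.94, -1.52, 0.44, -0.05)

# Positions of the five smallest values, smallest first.
idx <- order(x)[1:5]

# Pair each position with the value found there.
smallest <- data.frame(position = idx, value = x[idx])
smallest
```

Passing decreasing = TRUE to order would give the positions of the largest values in the same way.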

Excluding a number of answers from a R dataframe

I'm looking for a way to exclude a number of answers from the result of a length call.
This is a follow-on question from "Getting R Frequency counts for all possible answers". In SQL the syntax could be
select * from someTable
where variableName not in ( 0, null )
Given
Id <- c(1,2,3,4,5)
ClassA <- c(1,NA,3,1,1)
ClassB <- c(2,1,1,3,3)
R <- c(5,5,7,NA,9)
S <- c(3,7,NA,9,5)
df <- data.frame(Id,ClassA,ClassB,R,S)
ZeroTenNAScale <- c(0:10,NA);
R.freq = setNames(nm=c('R','freq'),data.frame(table(factor(df$R,levels=ZeroTenNAScale,exclude=NULL))));
S.freq = setNames(nm=c('S','freq'),data.frame(table(factor(df$S,levels=ZeroTenNAScale,exclude=NULL))));
length(S.freq$freq[S.freq$freq!=0])
# 5
How would I change
length(S.freq$freq[S.freq$freq!=0])
to get an answer of 4 by excluding 0 and NA?
We can use colSums:
colSums(!is.na(S.freq)[S.freq$freq!=0,])[[1]]
#[1] 4
You can use sum to add up the counts. If the NA values were in the column being summed, you could pass na.rm = TRUE; however, because the NA is located in a different column, you first need to remove the row containing NA.
The solution is as follows: we remove the rows containing NA by subsetting S.freq[!is.na(S.freq$S),], and we also select the second column, freq. (The sum equals the number of distinct answers here only because every nonzero count is 1.)
sum(S.freq[!is.na(S.freq$S), "freq"])
# 4
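As a small illustration of how missing values interact with sum, the sketch below uses a hypothetical vector in which the NA sits in the values being summed, which is the case where the na.rm argument applies.

```r
counts <- c(1, NA, 1, 1, 1)  # hypothetical counts with one missing entry

total_with_na <- sum(counts)                   # NA: a missing value propagates
total_without_na <- sum(counts, na.rm = TRUE)  # 4: the NA is dropped before summing
```

In S.freq the NA is a level of the S column rather than a value in freq, which is why the answer above filters rows instead.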
You can try na.omit (to remove NAs) and subset (to get rid of all rows where freq equals 0):
subset(na.omit(S.freq), freq != 0)
   S freq
4  3    1
6  5    1
8  7    1
10 9    1
From here, that's straightforward:
length(subset(na.omit(S.freq), freq != 0)$freq)
[1] 4
Does it solve your problem?
Just add !is.na(S.freq$S) as a second filter:
length(S.freq$freq[S.freq$freq!=0 & !is.na(S.freq$S)])
If you want to extend it with other conditions, you could make an index vector first for readability:
idx <- S.freq$freq!=0 & !is.na(S.freq$S)
length(S.freq$freq[idx])
You're looking for values with frequency > 0, which means you're looking for the unique values. You can get this information directly from the vector S:
length(unique(df$S))
and, leaving NA aside, you get the answer 4 with:
length(unique(df$S[!is.na(df$S)]))
Regarding your question on how to exclude a number of items based on their value:
In R this is easily done with logical vectors, as you already do in your code:
length(S.freq$freq[S.freq$freq!=0])
You can combine different conditions into one logical vector and use it for subsetting, e.g.
length(S.freq$freq[S.freq$freq!=0 & !is.na(S.freq$S)])
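For the SQL-style "not in (0, null)" filter from the question, a close base R counterpart is negating %in%. Unlike ==, %in% treats NA as an ordinary value to match, so 0 and NA can be excluded in one step. The sketch below works on a plain vector; the extra 0 is added only for illustration.

```r
s <- c(3, 7, NA, 9, 5, 0)  # the S column from the question plus a hypothetical 0

# %in% returns TRUE or FALSE (never NA), so negating it is safe for subsetting.
kept <- s[!(s %in% c(0, NA))]
kept          # 3 7 9 5

n_answers <- length(unique(kept))
n_answers     # 4
```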

R: Index to unique vector that returns original

I have a vector v <- c(6,8,5,5,8) of which I can obtain the unique values using
> u <- unique(v)
> u
[1] 6 8 5
Now I need an index i = [2,3,1,1,3] that returns the original vector v when indexed into u.
> u[i]
[1] 6 8 5 5 8
I know such an index can be generated automatically in Matlab, the ci index, but does not seem to be part of the standard repertoire in R. Is anyone aware of a function that can do this?
The background is that I have several vectors with anonymized IDs that are long character strings:
ids
"PTefkd43fmkl28en==3rnl4"
"cmdREW3rFDS32fDSdd;32FF"
"PTefkd43fmkl28en==3rnl4"
"PTefkd43fmkl28en==3rnl4"
"cmdREW3rFDS32fDSdd;32FF"
To reduce the file size and simplify the code, I want to transform them into integers of the sort
ids
1
2
1
1
2
and found that the index of the unique vector does just this. Since there are many rows, I am hesitant to write a function that loops over each element of the unique vector and wonder whether there is a more efficient way — or a completely different way to transform the character strings into matching integers.
Try with match
df1$ids <- with(df1, match(ids, unique(ids)) )
df1$ids
#[1] 1 2 1 1 2
Or we can convert to factor and coerce to numeric
with(df1,as.integer(factor(ids, levels=unique(ids))))
#[1] 1 2 1 1 2
Using u and v. Judging by the index i in the OP's post, u must have been sorted:
u <- sort(unique(v))
match(v, u)
#[1] 2 3 1 1 3
Or using findInterval. Make sure that 'u' is sorted.
findInterval(v,u)
#[1] 2 3 1 1 3
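As a check that the match approach really reproduces the original vector, the sketch below runs the round trip on both the numeric example and the character ids from the question. No sorting is needed for match; the variable names are illustrative.

```r
# Numeric example: u is left in order of first appearance.
v <- c(6, 8, 5, 5, 8)
u <- unique(v)
i <- match(v, u)
i       # 1 2 3 3 2
u[i]    # 6 8 5 5 8, the original v

# Character ids from the question, mapped to compact integer codes.
ids <- c("PTefkd43fmkl28en==3rnl4", "cmdREW3rFDS32fDSdd;32FF",
         "PTefkd43fmkl28en==3rnl4", "PTefkd43fmkl28en==3rnl4",
         "cmdREW3rFDS32fDSdd;32FF")
lookup <- unique(ids)        # keep this to translate codes back to ids
codes <- match(ids, lookup)
codes   # 1 2 1 1 2
```

Keeping the lookup vector alongside the codes makes the transformation reversible via lookup[codes].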
