I have a vector and I want to find the indices of the k greatest elements, not the elements themselves, which I could get with sort. One idea would be to pair each value with its index and use a custom sort function that compares only the first element of each pair (a classic solution to this problem), but surely there has to be a simpler way? Note that performance doesn't matter.
First I create an example vector:
vector <- c(1, 3, 6, 2, 7, 8, 10, 4)
Next, you can use the following code which will output the top k elements as x with index ix:
k <- 3
lst <- sort(vector, index.return=TRUE, decreasing=TRUE)
lapply(lst, `[`, lst$x %in% head(unique(lst$x),k))
Output:
$x
[1] 10 8 7
$ix
[1] 7 6 5
As you can see, ix gives the indices of the top k elements.
Using rank.
x <- c(1, 3, 6, 2, 7, 8, 10, 4)
seq_along(x)[rank(-x) < 4]
# [1] 5 6 7
If you have ties, the result is this:
x <- c(10, 3, 6, 2, 7, 8, 10, 4)
seq_along(x)[rank(-x) < 4]
# [1] 1 6 7
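For comparison, here is a minimal base R sketch using order(), which is not part of either answer above. It assumes you want exactly k indices back; when values are tied, order() still returns only k positions rather than every tied one.

```r
x <- c(1, 3, 6, 2, 7, 8, 10, 4)
k <- 3

# order() returns the positions that would sort x; with decreasing = TRUE
# the positions of the largest values come first, so keep the first k.
top_idx <- head(order(x, decreasing = TRUE), k)
top_idx
# [1] 7 6 5

# The values at those positions, largest first
x[top_idx]
# [1] 10  8  7
```

Unlike the rank approach, the indices come back ordered by value (largest first) rather than by position.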
I have the following multiset X, and I want to find the distances between all pairs of its numbers. Is there a way to do this in a for loop, so that if I were given a multiset of a different size I wouldn't have to write it out manually as I did below?
The final answer (sorted) for this example is [2, 2, 3, 3, 4, 5, 6, 7, 8, 10].
X <- c(0, 10, 8, 3, 6)
L <- length(X)
print(L)
## for (i in seq(from = 1, to = L)) {}
# X has 5 elements, so there is no X[6] to compare against.
# c() combines the distances into one vector; print() treats extra arguments as options, not values.
print(c(abs(X[1] - X[2]), abs(X[1] - X[3]),
        abs(X[1] - X[4]), abs(X[1] - X[5]),
        abs(X[2] - X[3]), abs(X[2] - X[4]),
        abs(X[2] - X[5]),
        abs(X[3] - X[4]), abs(X[3] - X[5]),
        abs(X[4] - X[5])))
You may see this vector as a column vector and apply dist:
sort(dist(X))
# [1] 2 2 3 3 4 5 6 7 8 10
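Since the question asks specifically about a loop, here is a minimal sketch of the same computation written that way, followed by a one-line alternative with combn(). The variable names d and d_combn are only illustrative, and the loop assumes X has at least two elements.

```r
X <- c(0, 10, 8, 3, 6)
L <- length(X)

# Visit every pair (i, j) with i < j and collect the absolute difference.
d <- numeric(0)
for (i in seq_len(L - 1)) {
  for (j in (i + 1):L) {
    d <- c(d, abs(X[i] - X[j]))
  }
}
sort(d)
# [1]  2  2  3  3  4  5  6  7  8 10

# Same result without an explicit loop: combn() applies diff() to each pair.
d_combn <- sort(abs(combn(X, 2, FUN = diff)))
```

Both versions work for any length of X from two upwards, so nothing has to be written out by hand.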
I have vectors in R containing a lot of 0's and a few non-zero numbers. Each vector starts with a non-zero number.
For example <1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0>
I would like to set all of the zeros equal to the most recent non-zero number.
I.e. this vector would become <1,1,1,1,1,1,2,2,2,2,2,2,4,4,4,4>
I need to do this for about 100 vectors containing around 6 million entries each. Currently I am using a for loop:
for (k in 1:length(vector)) {
  if (vector[k] == 0) {
    vector[k] <- vector[k - 1]
  }
}
Is there a more efficient way to do this?
Thanks!
One option would be to replace those 0s with NA, then use zoo::na.locf:
x <- c(1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)
x[x == 0] <- NA
zoo::na.locf(x) ## you possibly need: `install.packages("zoo")`
# [1] 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4
Thanks to Richard for showing me how to use replace:
zoo::na.locf(replace(x, x == 0, NA))
You could try this:
k <- c(1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)
k[which(k != 0)[cumsum(k != 0)]]
or another case, where the non-zero values are not increasing, so a cummax-based approach would not be appropriate:
k <- c(1,0,0,0,0,0,2,0,0,0,0,0,1,0,0,0)
k[which(k != 0)[cumsum(k != 0)]]
Logic:
I keep track of the indices of the non-zero vector elements with which(k != 0); let's call this new vector x, so x = c(1, 7, 13).
Next I "sample" from this new vector. How? From k I create a vector that increments every time there is a non-zero element, cumsum(k != 0); let's call it y, so y = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3).
I "sample" from vector x with x[y], i.e. I take the first element of x 6 times, then the second element 6 times, and the third element 4 times. Let's call this new vector z, so z = c(1, 1, 1, 1, 1, 1, 7, 7, 7, 7, 7, 7, 13, 13, 13, 13).
Finally I "sample" from vector k with k[z], i.e. I take the first element 6 times, then the 7th element 6 times, then the 13th element 4 times.
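The steps above can be spelled out with named intermediates. This is only an unpacked sketch of the one-liner from this answer; the names x, y, z follow the explanation, and filled is an illustrative name for the result.

```r
k <- c(1, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 4, 0, 0, 0)

x <- which(k != 0)    # positions of the non-zero entries: 1 7 13
y <- cumsum(k != 0)   # how many non-zero entries have been seen so far
z <- x[y]             # position of the most recent non-zero entry
filled <- k[z]        # value found at that position

filled
# [1] 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4
```

As in the question, this relies on the vector starting with a non-zero number; a leading 0 would make y start at 0, and those positions would be dropped from the result.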
Adding to @李哲源's answer:
If the leading NAs need to be replaced with the nearest non-NA value, and the other NAs with the last non-NA value, the code can be:
x <- c(0,0,1,0,0,0,0,0,2,0,0,0,0,0,4,0,0,0)
zoo::na.locf(zoo::na.locf(replace(x, x == 0, NA),na.rm=FALSE),fromLast=TRUE)
# you possibly need: `install.packages("zoo")`
# [1] 1 1 1 1 1 1 1 1 2 2 2 2 2 2 4 4 4 4
I have two vectors: one (v.num) holds the values of a variable in df, and the other (v.type) holds the type that corresponds to each of those values. I want to add the matching v.type to df.
v.num<-c(5, 6, 7, 8, 9, 10, 11)
v.type<-c(1, 3, 5, 2, 2, 4, 1)
The df looks like this:
set.seed(2016)
df <- data.frame(v.num=sample(5:11, 40000, replace=TRUE), obs=rnorm(40000))
What I want to do is create a v.type column in df that stores the v.type corresponding to each row's v.num.
Like:
head(df)
  v.num        obs v.type
1     6  1.6149522      3
2     6 -0.2676644      3
3    10  0.3013365      4
4     5 -0.8514377      1
5     8  0.5786278      2
6     5 -1.2974004      1
I tried
for (i in 1:nrow(df)) {
  for (v in 1:length(v.num)) {
    if (v.num[v] == df$v.num[i]) {
      df$v.type[i] <- v.type[v]
    }
  }
}
But it takes pretty long, because I have 40000 rows. What is the most efficient way to do this task?
Check out the following code:
v.num<-c(5, 6, 7, 8, 9, 10, 11)
v.type<-c(1, 3, 5, 2, 2, 4, 1)
set.seed(2016)
df <- data.frame(v.num=sample(5:11, 40000, replace=TRUE), obs=rnorm(40000))
df$v.type <- v.type[match(df$v.num, v.num)]
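To see why the match line works, here is a small sketch on six values instead of 40000 rows. The vector keys is an illustrative stand-in for df$v.num, using the six v.num values shown in the question's head(df).

```r
v.num  <- c(5, 6, 7, 8, 9, 10, 11)
v.type <- c(1, 3, 5, 2, 2, 4, 1)

# Illustrative stand-in for df$v.num
keys <- c(6, 6, 10, 5, 8, 5)

# match() gives, for each key, its position in the lookup vector v.num
pos <- match(keys, v.num)
pos
# [1] 2 2 6 1 4 1

# Indexing v.type by those positions pulls out the corresponding types
looked_up <- v.type[pos]
looked_up
# [1] 3 3 4 1 2 1
```

The whole lookup is vectorised, so it replaces the nested loops with a single pass; a key that does not occur in v.num would yield NA.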
I have a graph with vertex names from 1 to 10:
library(igraph)
library(Cairo)
g<- graph(c(0,1,0,4,0,9,1,7,1,9,2,9,2,3,2,5,3,6,3,9,4,5,4,8,5,8,6,7,6,8,7,8),n=10,dir=FALSE)
V(g)$name<-c(1:10)
V(g)$label<-V(g)$name
coords <- c(0,0,13.0000,0,5.9982,5.9991,7.9973,7.0009,-1.0008,11.9999,0.9993,11.0002,7.9989,13.0009,10.9989,14.0009,5.9989,14.0009,7.0000,4.0000)
coords <- matrix(coords, 10,2,byrow=T)
plot(g,layout=coords)
listMn<-neighborhood(g,1,0:9)
I'd like to do this, but in the opposite direction:
m1<-V(g)[listMn[[7]]]$name
The instruction above returns:
7 4 8 9
How can I get listMn[[7]] = 6 3 7 8 from the names 7 4 8 9?
Node numbering starts at zero: listMn[[7]] gives the numbers of the neighbours of the seventh node (number 6, name 7), i.e., 6, 3, 7, 8, corresponding to names (add 1 to the numbers) 7, 4, 8, 9.
Using strings for the names may be less confusing:
V(g)$name <- as.character( 1:10 )
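The reverse lookup, from names back to numbers, can be sketched in base R without touching the graph object. This assumes the zero-based vertex numbering described in this answer; as far as I know, later igraph releases number vertices from 1, in which case the subtraction would not be needed. The names vertex_names, wanted and ids are illustrative.

```r
# Names as stored on the graph, in vertex order (assumption: same order as V(g))
vertex_names <- as.character(1:10)

# Names we want to translate back into vertex numbers
wanted <- c("7", "4", "8", "9")

# match() gives 1-based positions; subtract 1 for zero-based vertex numbers
ids <- match(wanted, vertex_names) - 1
ids
# [1] 6 3 7 8
```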