Extract vector from matrix based on column index changing by row - r

I am struggling with a theoretically simple problem with R:
say I have the following matrix:
a <- matrix(1:16,ncol=4)
and the following vector showing the column position I need to extract for each row:
b <- c(4,3,1,1)
I need to return the following vector:
[1] 13 10 3 4
In other words, for each row I need to extract the element whose column position is given by the corresponding value of b.
I have searched extensively on this site but could not find a solution.
Can anyone please help me? Thanks

You can try
a[cbind(1:nrow(a), b)]
#[1] 13 10 3 4
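The one-liner above relies on R's matrix indexing: when a two-column matrix is used as the index, each row of it is read as a (row, column) pair. A minimal sketch that spells this out, using the a and b from the question (idx and res are names introduced here for illustration, and seq_len(nrow(a)) is swapped in for 1:nrow(a) so that a zero-row matrix is handled safely):

```r
a <- matrix(1:16, ncol = 4)
b <- c(4, 3, 1, 1)

# Build the index matrix: first column is the row number,
# second column is the column to pick in that row.
idx <- cbind(seq_len(nrow(a)), b)

# Each row of idx selects exactly one element of a.
res <- a[idx]
res
#[1] 13 10  3  4
```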

Related

How to calculate most frequent occurring terms/words in a document collection/corpus using R?

First I create a document term matrix like below
dtm <- DocumentTermMatrix(docs)
Then I sum the occurrences of each word as below
totalsums <- colSums(as.matrix(dtm))
My totalsums (R says type 'double') looks like below for first 7 elements.
aaab aabb aabc aacc abbb abbc abcc ...
9 2 10 4 7 3 12 ...
I managed to sort this with the following command
sorted.sums <- sort(totalsums, decreasing=T)
Now I want to extract the first 4 terms/words with the highest sums which are greater than value 5.
I could get the first 4 highest with sorted.sums[1:4] but how can I set a threshold value?
I managed to do this with the order function as below, but is there a way to do this other than with the sort function, and without using the findFreqTerms function?
ord.totalsums <- order(totalsums)
findFreqTerms(dtm, lowfreq=5)
Appreciate your thoughts on this.
You can use
sorted.sums[sorted.sums > 5][1:4]
But if at least 4 values are greater than 5, sorted.sums[1:4] on its own should work as well.
To get the words you can use names.
names(sorted.sums[sorted.sums > 5][1:4])
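One caveat with [1:4]: if fewer than 4 values pass the threshold, the result is padded with NA. A small sketch of a variant that uses head() instead, which simply returns fewer elements. The data are the first 7 elements shown in the question; top and few are names made up for this example.

```r
# First 7 elements of totalsums as shown in the question
totalsums <- c(aaab = 9, aabb = 2, aabc = 10, aacc = 4,
               abbb = 7, abbc = 3, abcc = 12)
sorted.sums <- sort(totalsums, decreasing = TRUE)

# Keep values above the threshold, then take at most the first 4
top <- head(sorted.sums[sorted.sums > 5], 4)
names(top)
#[1] "abcc" "aabc" "aaab" "abbb"

# With a stricter threshold only two terms qualify; head() returns
# just those two, whereas [1:4] would add two NA entries.
few <- head(sorted.sums[sorted.sums > 9.5], 4)
length(few)
#[1] 2
```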

Create vector selecting values from two different vectors

Currently, this code works to do what I want to do where dx$res is a vector selecting values from dx$val1 or dx$val2 depending on value of dx$x0.
x0<-c(1,2,1,2,2,1)
val1<-c(8,6,4,5,3,2)
val2<-c(4,8,6,7,9,5)
dx<-data.frame(x0,val1,val2)
dx$res<-(dx$x0==1)*dx$val1+(dx$x0==2)*dx$val2
I would like to know if there were more elegant methods to do this like using apply function.
One option is model.matrix with rowSums. It is also more general for 'n' number of distinct elements in the 'x0' column.
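# select only the value columns, so that a 'res' column created earlier is not included
dx$res <- rowSums(dx[c("val1", "val2")]*model.matrix(~ factor(x0) - 1 , dx))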
dx$res
#[1] 8 8 4 7 9 2
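Another possibility, sketched here under the assumption that x0 holds the position (1, 2, ...) of the value column to pick, is to index the value columns with a two-column (row, column) matrix. The name vals is introduced only for this example.

```r
x0   <- c(1, 2, 1, 2, 2, 1)
val1 <- c(8, 6, 4, 5, 3, 2)
val2 <- c(4, 8, 6, 7, 9, 5)
dx <- data.frame(x0, val1, val2)

# Matrix of candidate values: column 1 is val1, column 2 is val2
vals <- as.matrix(dx[c("val1", "val2")])

# For row i, pick column x0[i]
dx$res <- vals[cbind(seq_len(nrow(dx)), dx$x0)]
dx$res
#[1] 8 8 4 7 9 2
```

Like the model.matrix approach, this extends to more than two value columns as long as x0 contains the matching column positions.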

Twofold, consecutive row selecting starting at different rows in R

I have got the following problem. I have a data.frame with an x and y column representing some points in space:
X<-c(18.25743,18.25783,18.25823,18.25850,18.25863,18.25878,
18.25885,18.25912,18.25943,18.25962,18.25978,18.26000,
18.26022,18.26051,18.26070,18.26095,18.26118,18.26140,
18.26189,18.26250,18.26310,18.26390)
Y<-c(44.69561,44.69564,44.69567,44.69567,44.69586,
44.69600,44.69637,44.69671,44.69691,44.69701,44.69720,
44.69740,44.69763,44.69774,44.69787,44.69790,44.69791,
44.69795,44.69812,44.69802,44.69812,44.69834)
eDF<-data.frame(X,Y)
Now my problem is that they are "sorted" wrongly for plotting. So what I need is a function that pairs up the rows of the two points which belong together (in a list of lists):
1 and 12 is ID1
2 and 13 is ID2
3 and 14 is ID3
...
11 and 22 is ID11
Each list created this way within the list of lists should have its own unique ID (simply numbered from 1 to the end), because I have this problem in all my data sets, which have different lengths.
It would be great if the starting row of the second selection (row 12 here) were flexible, always taking the first row after half of the data, i.e. (rownumber/2)+1, which is 12 in this example.
I have tried some things and I think I'm on the right track, but I can't figure out a solution by myself.
This function is pretty close, but I can't manage to make it start at different rows (1 and 12):
lapply(2:nrow(eDF), function(x) eDF[(x-1):x,])
I also tried to figure it out with seq, and it would do what I need if I could make a list of lists by combining both code samples. I also need to replace the hard-coded start and end numbers with a dynamic solution.
eDF[(seq(1,to=11,by=1)),] # selecting rows 1 to 11
eDF[(seq(12,to=nrow(eDF),by=1)),] #selecting rows 12 to end
Anyone any ideas?
I don't know if you needed an ID column inside of the new list but another way would be:
#create the IDs
eDF$ID <- rep(1:11,2)
#split the data.frame according to those
mylist <- split(eDF, eDF$ID)
Output:
mylist
$`1`
X Y ID
1 18.25743 44.69561 1
12 18.26000 44.69740 1
$`2`
X Y ID
2 18.25783 44.69564 2
13 18.26022 44.69763 2
$`3`
X Y ID
3 18.25823 44.69567 3
14 18.26051 44.69774 3
$`4`
X Y ID
4 18.2585 44.69567 4
15 18.2607 44.69787 4
#and so on...
You could just do split(eDF, rep(1:11,2)) if you don't need the ID column.
We can modify the OP's lapply code
lapply(1:11, function(i) eDF[c(i, i+11),])
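Both answers hard-code 11, while the question asks for the split point to follow from the number of rows. A minimal sketch of a dynamic version, assuming the data always has an even number of rows; pair_halves is a hypothetical helper name, and the small df below is toy data rather than the eDF from the question.

```r
# Hypothetical helper: pair row i with row i + nrow/2
pair_halves <- function(df) {
  half <- nrow(df) / 2
  # Assumption: an even number of rows, so both halves line up
  stopifnot(half == floor(half))
  split(df, rep(seq_len(half), 2))
}

# Toy data with 6 rows: rows 1 and 4, 2 and 5, 3 and 6 belong together
df <- data.frame(X = 1:6, Y = 11:16)
pairs <- pair_halves(df)
pairs[["1"]]
#  X  Y
#1 1 11
#4 4 14
```

The list names "1", "2", ... serve as the IDs; calling pair_halves(eDF) on the 22-row data from the question should pair row 1 with row 12, and so on.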

Apply an operation to some elements of a vector by using indices

I've got a fairly basic question concerning vector operations in R. I want to apply a certain operation (e.g. increment) to specific elements of a vector by using a vector containing the indices of the elements.
For example:
ind <- c(2,5,8)
vec <- seq(1,10)
I want to add 1 to the 2nd, 5th and 8th element of vec. In the end I'd like to have:
vec <- c(1,3,3,4,6,6,7,9,9,10)
I tried vec[ind] + 1
but that returns only the three elements. I could use a for-loop, of course, but knowing R, I'm sure there's a more elegant way.
Any help would be much appreciated.
We have to assign it
vec[ind] <- vec[ind] + 1
vec
#[1] 1 3 3 4 6 6 7 9 9 10
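The reason the assignment is needed is that vec[ind] + 1 only computes a new three-element vector and leaves vec untouched. A short sketch showing this, together with base R's replace() as an alternative that returns the modified copy; vec2 is a name introduced here for illustration.

```r
ind <- c(2, 5, 8)
vec <- seq(1, 10)

# This returns only the incremented elements; vec itself is unchanged
vec[ind] + 1
#[1] 3 6 9

# replace() returns a copy of vec with the chosen elements swapped in
vec2 <- replace(vec, ind, vec[ind] + 1)
vec2
# [1]  1  3  3  4  6  6  7  9  9 10
```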

Find indices of 5 closest samples in distance matrix

Users
I have a distance matrix dMat and want to find the 5 nearest samples to the first one. What function can I use in R? I know how to find the closest sample (cf. 3rd line of code), but can't figure out how to get the other 4 samples.
The code:
Mat <- replicate(10, rnorm(10))
dMat <- as.matrix(dist(Mat))
which(dMat[,1]==min(dMat[,1]))
The 3rd line of code finds the index of the closest sample to the first sample.
Thanks for any help!
Best,
Chega
You can use order to do this:
head(order(dMat[-1,1]),5)+1
[1] 10 3 4 8 6
Note that I removed the first one, as you presumably don't want to include the fact that your reference point is 0 distance away from itself.
Alternative using sort:
sort(dMat[,1], index.return = TRUE)$ix[1:6]
It would be nice to add a set.seed(.) when using random numbers in matrix so that we could show the results are identical. I will skip the results here.
Edit (correct solution): The above solutions only work if the first element is the smallest value in the column! That holds for a true distance matrix, where dMat[1,1] is 0, but not for an arbitrary column of values. Here's a solution that always gives the 5 values closest to the first element of the column:
> sort(abs(dMat[-1,1] - dMat[1,1]), index.return=TRUE)$ix[1:5] + 1
Example:
> dMat <- matrix(c(70,4,2,1,6,80,90,100,3), ncol=1)
# James' solution
> head(order(dMat[-1,1]),5) + 1
[1] 4 3 9 2 5 # values are 1,2,3,4,6 (wrong)
# old sort solution
> sort(dMat[,1], index.return = TRUE)$ix[1:6]
[1] 4 3 9 2 5 1 # values are 1,2,3,4,6,70 (wrong)
# Correct solution
> sort(abs(dMat[-1,1] - dMat[1,1]), index.return=TRUE)$ix[1:5] + 1
[1] 6 7 8 5 2 # values are 80,90,100,6,4 (right)
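For the distance-matrix case from the question, the order approach can be wrapped so that any sample can act as the reference point. This is only a sketch: nearest_k is a hypothetical helper, and the points below are deterministic toy data chosen so the result can be checked by hand.

```r
# Hypothetical helper: indices of the k samples closest to sample i
nearest_k <- function(dMat, i = 1, k = 5) {
  d <- dMat[, i]
  d[i] <- Inf          # exclude the reference sample itself
  head(order(d), k)
}

# Toy data: 7 points on a line, so distances to point 1 are easy to see
pts <- matrix(c(0, 1, 5, 2, 10, 3, 20), ncol = 1)
dMat <- as.matrix(dist(pts))

nearest_k(dMat)        # distances from sample 1 are 1, 5, 2, 10, 3, 20
#[1] 2 4 6 3 5
```

Setting the reference distance to Inf instead of dropping that row avoids the index shift that the + 1 correction handles when only the first sample is used.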
