Match a vector to multiple consecutive rows in R [duplicate] - r

This question already has answers here:
How to index a vector sequence within a vector sequence
(5 answers)
Closed 5 years ago.
I have got a dataframe and I need to find row numbers where the values of the entries in one column match a certain pattern.
Let col1 = matrix(c(1,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,0,0,0,1), nrow = 21, ncol = 1) be an example of my column, and let r = c(2, 0, 2) be the vector I need to match against it.
I need R to return the row numbers where the pattern in r matches the values in col1 (in this case rows 11, 12 and 13).
I thought I could achieve this with row.match, but that is not the case. I have tried different combinations of match function, but it doesn't yield any results either.
Maybe my approach is wrong from the start, but I find it hard to believe that there is no function that would give me the expected result with some adjustment.
Thanks.

You could do this using rollapply from zoo. Basically, this runs identical on a rolling basis with a window of length(r). The result tells you that the sequence is present starting at position 11 of the col1 vector.
library(zoo)
which(rollapply(col1,length(r),identical,r))
[1] 11
To get a vector of positions, you could do:
which(rollapply(col1,length(r),identical,r))+0:(length(r)-1)
[1] 11 12 13
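For comparison, here is a minimal base R sketch of the same idea, in case adding zoo is not an option. The helper name find_pattern is hypothetical (it is not from any package), and the sketch assumes a plain numeric column. It also expands every match into its row numbers, whereas adding 0:(length(r)-1) to the result of which() relies on there being a single match.

```r
# Example data from the question
col1 <- matrix(c(1,0,0,0,0,0,0,0,0,0,2,0,2,0,0,0,0,0,0,0,1), nrow = 21, ncol = 1)
r <- c(2, 0, 2)

# Hypothetical helper: start positions of every occurrence of `pattern` in `x`
find_pattern <- function(x, pattern) {
  x <- as.vector(x)
  n <- length(pattern)
  if (n == 0 || n > length(x)) return(integer(0))
  starts <- seq_len(length(x) - n + 1)
  # Compare each window of length n with the pattern; NA comparisons count as no match
  hits <- vapply(starts,
                 function(i) isTRUE(all(x[i:(i + n - 1)] == pattern)),
                 logical(1))
  starts[hits]
}

starts <- find_pattern(col1, r)

# Expand each start position into the full run of matching row numbers
rows <- as.vector(outer(seq_along(r) - 1, starts, `+`))
print(starts)
print(rows)
```

The window comparison uses == rather than identical, so integer and double values compare equal; whether that is what you want depends on your data.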

Related

Finding the most repeated value using table() function [duplicate]

This question already has answers here:
How to retrieve the most repeated value in a column present in a data frame
(9 answers)
Closed 2 years ago.
I was given a sample vector v and was asked to use R code to extract, as a number (meaning: not as a character string), the value that was repeated the most times in v.
(Hints: use table(); note that which.max() gives you the index of a vector's maximum value, such as the maximum count within a table; names() allows you to extract the values of the original vector when applied to the output of table().)
My answer is as follows:
names(which.max(table(v)))
It returns the correct answer as a string, not as a number. Am I using the hint correctly? Thanks.
names() returns the values as character strings, so wrap the result in as.integer() or as.numeric() to convert it to a number.
as.integer(names(which.max(table(v))))
Moreover, in case of a tie, which.max returns only the first maximum. If you want all the tied values, you can use:
v <- c(1, 1, 2, 4, 5, 3, 3)
as.integer(names(which.max(table(v))))
#[1] 1
tab <- table(v)
as.integer(names(tab[max(tab) == tab]))
#[1] 1 3
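The tie-aware version can be wrapped in a small function. This is only a sketch: the name most_frequent is hypothetical, and it uses as.numeric instead of as.integer on the assumption that v may hold non-integer values, which as.integer would truncate.

```r
# Hypothetical helper: all most frequent values of `v`, returned as numbers
most_frequent <- function(v) {
  tab <- table(v)
  # Keep the names of every entry whose count equals the maximum count
  as.numeric(names(tab)[as.vector(tab == max(tab))])
}

res <- most_frequent(c(1.5, 1.5, 2, 4, 3, 3))
print(res)
```

Here 1.5 and 3 both occur twice, so both are returned.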

Reshaping matrix into vector of alternate columns [duplicate]

This question already has answers here:
Convert a matrix to a 1 dimensional array
(11 answers)
Closed 4 years ago.
I have a matrix measuring 91 x 2 (i.e 91 rows and two columns).
mat1 <- matrix(1:182, 91, 2)
I need to turn this matrix into a vector with a single row. I can do that with the following:
mat2 <- matrix(mat1, nrow = 1, byrow = TRUE)
However, I would like the rows of the original matrix to appear one after another. Currently it takes all of column 1, then all of column 2, and joins them sequentially, whereas I need them in one long row like this: 1, 92, 2, 93, 3, 94, etc. The final structure would be 1 x 182 (i.e. one row with 182 columns).
How can I achieve this?
Thanks.
We can transpose the matrix and convert it to a vector
c(t(mat1))
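To see why this works, here is a small sketch on a 3 x 2 matrix (mat_small is a made-up stand-in for mat1). R stores matrices column by column, so transposing first makes the column-wise read-out follow the original rows.

```r
# Small stand-in for mat1: column 1 is 1 2 3, column 2 is 4 5 6
mat_small <- matrix(1:6, nrow = 3, ncol = 2)

# Transpose, then flatten: the values now come out row by row
row_wise <- c(t(mat_small))

# If a one-row matrix is needed rather than a plain vector
one_row <- matrix(row_wise, nrow = 1)
print(row_wise)
print(dim(one_row))
```

The original attempt with byrow = TRUE did not help because byrow only controls how the new matrix is filled; the values of mat1 are still read in column order.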

R count number of variables with value ="mq" per row [duplicate]

This question already has answers here:
How to count the frequency of a string for each row in R
(4 answers)
Closed 4 years ago.
I have a data frame with 70 variables. I want to create a new variable that counts, for each row, how many of the 70 variables take the value "mq".
I am looking for something like this:
ID   Var1   Var2   Count_mq
1    mq     mq     2
2    1      mq     1
3    1      7      0
I have found this solution:
count_row_if("mq",DT)
But it gives me a vector with those values for the whole data frame and it is quite slow to compute.
I would like to find a solution using the function apply() but I don't know how to achieve this.
Best.
You can use the 'apply' function to count a particular value in your existing dataframe 'df',
df$count.MQ <- apply(df, 1, function(x) length(which(x=="mq")))
Here the second argument is 1 since you want to count for each row. You can read more about it from https://www.rdocumentation.org/packages/base/versions/3.5.1/topics/apply
I assume the dataset is named DT. I am not entirely sure what you want, but this is how I understand it: the data frame has 70 columns, and in some rows one or more of them hold the value 'mq'.
If I got that right, the code below counts those values for each row.
apply(DT, MARGIN = 1, FUN = function(x) sum(x == "mq", na.rm = TRUE))
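Since speed was a concern in the question, a vectorised alternative may be worth trying. This is a sketch on made-up data (the three rows from the question's example table); comparing the data frame to "mq" gives a logical matrix, and rowSums counts the TRUE values in each row without an explicit loop.

```r
# Hypothetical example data; the real data frame has 70 variables
DT <- data.frame(Var1 = c("mq", "1", "1"),
                 Var2 = c("mq", "mq", "7"),
                 stringsAsFactors = FALSE)

# Columns to check; listed explicitly so the new count column is never included
vars <- c("Var1", "Var2")

# TRUE counts as 1, so the row sums are the per-row number of "mq" values
DT$Count_mq <- rowSums(DT[vars] == "mq", na.rm = TRUE)
print(DT)
```

Whether this is actually faster than apply on your data is something to measure rather than assume.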

subset whole data frame for value and return rows in which value are found [duplicate]

This question already has answers here:
Finding rows containing a value (or values) in any column
(3 answers)
Closed 6 years ago.
I am trying to subset a data frame containing 626 obs. of 149 variables and I want to look for a specific string and return the rows that have that value regardless of what column it is found in.
For example:
I am looking for the string "GO:0004674" in a data frame that can contain it in many different columns and rows.
For example the string "GO:0004674" can be found in row 12, 13 and 14. So I would want to keep only those rows and later on export them.
How can I perform this? All examples that I have seen thus far only look for string in a specific column and not in the whole dataframe.
Any help will be greatly appreciated.
You can use apply to perform a row-wise operation by setting the argument MARGIN = 1. Example:
mydf[apply(mydf, MARGIN = 1, FUN = function(x) {"GO:0004674" %in% x}), ]
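A vectorised variant of the same filter is sketched below on a small made-up data frame (the column names id, term1 and term2 are assumptions, not from the question). It compares every cell to the target string at once and keeps the rows with at least one match.

```r
# Hypothetical example data with the target string in different columns
mydf <- data.frame(id    = 1:4,
                   term1 = c("GO:0004674", "GO:0005524", NA, "GO:0006468"),
                   term2 = c(NA, "GO:0004674", "GO:0016301", NA),
                   stringsAsFactors = FALSE)
target <- "GO:0004674"

# Logical matrix of exact matches; NA cells are ignored when counting
keep <- rowSums(mydf == target, na.rm = TRUE) > 0
hits <- mydf[keep, ]
print(hits)

# To export the result later, something like this could be used
# (the file name is a placeholder):
# write.csv(hits, "hits.csv", row.names = FALSE)
```

Both versions test for exact equality of a whole cell; if the string can appear inside a longer cell value, a grepl-based test would be needed instead.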

R - sum vectors matching names [duplicate]

This question already has answers here:
Sum rows in data.frame or matrix
(7 answers)
Closed 7 years ago.
I need to sum the columns of a table whose names start with a particular string.
An example table might be:
tbl<-data.frame(num1=c(3,2,9), num2=c(3,2,9),n3=c(3,2,9),char1=c('a', 'b', 'c'))
I get the list of columns (in this example there are only 2, but the real case has more than 20).
a<-colnames(tbl)[grep('num', colnames(tbl))]
I tried with
sum(tbl[,a])
But I get only one number with the total sum of the elements in both vectors.
What I need is the result of:
tbl$num1+ tbl$num2
We can either use Reduce
Reduce(`+`, tbl[a])
Or rowSums. The rowSums also has the option of removing the NA elements with na.rm=TRUE.
rowSums(tbl[a])
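Putting the pieces together on the example table, as a sketch: the pattern "^num" is an assumption on my part, anchoring the match at the start of the name, since the question asks for names starting with the string while grep('num', ...) matches it anywhere in the name.

```r
tbl <- data.frame(num1 = c(3, 2, 9), num2 = c(3, 2, 9),
                  n3 = c(3, 2, 9), char1 = c("a", "b", "c"))

# Names of the columns that start with "num"
a <- grep("^num", colnames(tbl), value = TRUE)

# Element-wise sum across the selected columns, one value per row
row_totals <- rowSums(tbl[a])

# Same result with Reduce, adding the columns pairwise
reduce_totals <- Reduce(`+`, tbl[a])
print(row_totals)
```

sum(tbl[, a]) returned a single number because sum adds up every element it is given, while rowSums and Reduce keep one result per row.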
