R subset string values including vertical bar (|)

I am trying to subset a data set based on a column value: I want to keep only the rows where that column holds single-level information. Here is what my data look like.
data <- cbind(v1=c("a", "ab", "a|12|bc", "a|b", "ac","bc|2","b|bc|12"),
v2=c(1,2,3,5,3,1,2))
> data
v1 v2
[1,] "a" "1"
[2,] "ab" "2"
[3,] "a|12|bc" "3"
[4,] "a|b" "5"
[5,] "ac" "3"
[6,] "bc|2" "1"
[7,] "b|bc|12" "2"
I want to keep only the rows whose character values do not include "|", like below:
> data
v1 v2
[1,] "a" "1"
[2,] "ab" "2"
[3,] "ac" "3"
Basically, I am trying to get rid of two-level (x|y) or three-level (x|y|z) values. Any thoughts on this?
Thanks!

We can use grep to find the rows that contain |, set invert = TRUE to get the row indices of the elements that have no |, and use those indices to subset the rows of the matrix:
data[grep("|", data[,1], invert = TRUE, fixed = TRUE), ]
# v1 v2
#[1,] "a" "1"
#[2,] "ab" "2"
#[3,] "ac" "3"
NOTE: fixed = TRUE is needed because otherwise the pattern is interpreted as a regular expression, in which | is the metacharacter for alternation (OR). Other options are to escape it (\\|) or to place it inside square brackets ([|]) to match the literal character (when fixed = FALSE).
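To make the note concrete, here is a minimal sketch comparing the unescaped pattern with the three literal forms mentioned above. The vector x is a made-up sample taken from the question's v1 column.

```r
# Sample values (a subset of the question's v1 column).
x <- c("a", "ab", "a|12|bc", "a|b")

# Unescaped "|" is a regex alternation of two empty patterns,
# so it matches every string.
unescaped <- grepl("|", x)

# Three ways to match a literal vertical bar.
literal_fixed   <- grepl("|", x, fixed = TRUE)
literal_escaped <- grepl("\\|", x)
literal_class   <- grepl("[|]", x)

print(unescaped)
print(literal_fixed)
```

All three literal forms should agree with each other, while the unescaped pattern flags every element and is therefore useless as a filter.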

Using the logical grepl, this can be done as follows. I will leave it as two lines of code for clarity, but it is straightforward to turn it into a one-liner.
i <- !grepl("\\|", data[, 1])
data[i, ]
# v1 v2
#[1,] "a" "1"
#[2,] "ab" "2"
#[3,] "ac" "3"
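The one-liner mentioned above could look like the sketch below. The drop = FALSE argument is an addition, not part of the original answer: it keeps the result a matrix even if only one row survives the filter.

```r
# Data from the question: a character matrix with columns v1 and v2.
data <- cbind(v1 = c("a", "ab", "a|12|bc", "a|b", "ac", "bc|2", "b|bc|12"),
              v2 = c(1, 2, 3, 5, 3, 1, 2))

# Keep rows whose v1 has no literal "|"; drop = FALSE preserves the matrix shape.
result <- data[!grepl("|", data[, "v1"], fixed = TRUE), , drop = FALSE]
print(result)
```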


Pipe that leads to a map ends up giving a list of incorrect length

Using the combn function, I want to generate all possible combinations of the vector c("1", "2", "3") when choosing 2 elements (m = 2.) The code looks like this:
comparisons <- combn(c("1", "2", "3"), m = 2)
[,1] [,2] [,3]
[1,] "1" "1" "2"
[2,] "2" "3" "3"
I then transpose this matrix, so it becomes this:
comparisons <- t(comparisons)
[,1] [,2]
[1,] "1" "2"
[2,] "1" "3"
[3,] "2" "3"
The last step is to generate a list, where each element is a row from this transposed matrix. I used map, and it gave me exactly what I wanted:
comparisons <- map(1:3, ~ comparisons[.x, ])
[[1]]
[1] "1" "2"
[[2]]
[1] "1" "3"
[[3]]
[1] "2" "3"
This is all fine and dandy, but when I try to pipe all of these together in one nice assignment, the resulting list is incorrect.
comparisons <- combn(c("1", "2", "3"), m = 2) %>%
t() %>%
map(1:3, ~ .[.x, ])
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
[[4]]
NULL
[[5]]
NULL
[[6]]
NULL
When I turn your matrix into a tibble and then into a list, I get your desired output. Every data frame or tibble is also a list, so each column becomes one element of the list. The output below assumes comparisons is the original, untransposed combn() result with its columns named a, b and c.
library(purrr)
library(tibble) # as_tibble() comes from the tibble package
comparisons %>%
as_tibble() %>%
as.list() %>% # at this point you already have the desired output
transpose() # optional: run this last step only if you also want the list transposed
$a # Before running transpose
[1] "1" "2"
$b
[1] "1" "3"
$c
[1] "2" "3"
# After running transpose
[[1]]
[[1]]$a
[1] "1"
[[1]]$b
[1] "1"
[[1]]$c
[1] "2"
[[2]]
[[2]]$a
[1] "2"
[[2]]$b
[1] "3"
[[2]]$c
[1] "3"
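As for why the piped version returns six NULLs, the most likely explanation is that the pipe passes the matrix as the first argument of map, so map iterates over its six elements rather than over 1:3, and 1:3 is taken as the function argument. The sketch below avoids the pipe entirely and uses base R only; the names m, by_row and direct are illustrative.

```r
# Transposed matrix of combinations: 3 rows, 2 columns, 6 elements in total.
m <- t(combn(c("1", "2", "3"), m = 2))
print(length(m))

# Option 1: index the rows explicitly.
by_row <- lapply(seq_len(nrow(m)), function(i) m[i, ])

# Option 2: ask combn for a list directly and skip the transpose.
direct <- combn(c("1", "2", "3"), m = 2, simplify = FALSE)

print(identical(by_row, direct))
```

Option 2 is probably the simplest route when the list of pairs is the only thing needed.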

Subset rows based on "start and stop" strings

I am looking to write an R script that will search a column for a specific value and begin subsetting rows until a specific text value is reached.
Example:
X1 X2
[1,] "a" "1"
[2,] "b" "2"
[3,] "c" "3"
[4,] "d" "4"
[5,] "e" "5"
[6,] "f" "6"
[7,] "c" "7"
[8,] "k" "8"
What I'd like to do is search through X1 until the letter 'c' is found, and begin to subset rows until another letter 'c' is found, at which point the subset procedure would stop. Using the above example, the result should be a vector containing c(3,4,5,6,7).
Assume there will be no more than 2 rows where X1 equals 'c'
Any help is greatly appreciated.
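You can look up where a value occurs with the function which, and use the result as an index to get the values you are looking for. If you want everything from the first to the second "c", it would look like this (this assumes df is a data frame with columns X1 and X2; for a matrix, use df[, "X1"] instead of df$X1):
indices <- which(df$X1 == 'c')
rows <- indices[1]:indices[2] # named rows to avoid masking base::range
df$X2[rows]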
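Since the example in the question is printed as a matrix, here is a small sketch of the same idea with matrix indexing. The object name m is hypothetical; the values are those shown in the question.

```r
# The example data as a character matrix.
m <- cbind(X1 = c("a", "b", "c", "d", "e", "f", "c", "k"),
           X2 = as.character(1:8))

# Positions of the start and stop markers.
hits <- which(m[, "X1"] == "c")

# Row numbers from the first "c" to the second "c", inclusive.
rows <- hits[1]:hits[2]
print(rows)

# The corresponding rows of the matrix.
print(m[rows, , drop = FALSE])
```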

How to delete more than one row in a matrix, not just last row, when using a for loop in R

I tried to work this out myself and looked online for how to do it, but found no direct answer. Basically, I am trying to delete the rows in a matrix that have more than 3 characters. My code is only deleting the last row. Rows 16-31 should be deleted. The i gets iterated, but only the last row which satisfies the condition is deleted. However, more rows must be deleted. Thanks for the help in advance!
setwd("~/Desktop/Rpractice")
c <- c("1", "2", "3", "4", "5")
combine <- function (x, y) {combn (y, x, paste, collapse = ",")}
combination_mat <- as.matrix(unlist(lapply (1:length (c), combine, c)))
for (i in length(combination_mat)) {
if (nchar(combination_mat[i]) > 3) {
newmat <- print(as.matrix(combination_mat[-i,]))
}
}
You really do not need a loop to remove those rows; e.g., you can find the rows with at most 3 characters and keep only those (note the drop = FALSE argument, which keeps the matrix structure instead of simplifying the result to a vector):
> combination_mat[nchar(combination_mat[, 1]) <= 3, , drop = FALSE]
[,1]
[1,] "1"
[2,] "2"
[3,] "3"
[4,] "4"
[5,] "5"
[6,] "1,2"
[7,] "1,3"
[8,] "1,4"
[9,] "1,5"
[10,] "2,3"
[11,] "2,4"
[12,] "2,5"
[13,] "3,4"
[14,] "3,5"
[15,] "4,5"
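For completeness, the loop in the question fails because for (i in length(x)) runs exactly once, with i equal to the length, and because each pass would rebuild newmat from the untouched original matrix anyway. A minimal sketch of the difference, using a made-up vector items:

```r
# Made-up sample: two short entries and two long ones.
items <- c("1", "1,2", "1,2,3", "1,2,3,4")

# for (i in length(items)) visits a single value: the length itself.
visited_once <- integer(0)
for (i in length(items)) visited_once <- c(visited_once, i)

# seq_along(items) visits every position.
visited_all <- integer(0)
for (i in seq_along(items)) visited_all <- c(visited_all, i)

# The vectorized filter needs no loop at all.
kept <- items[nchar(items) <= 3]

print(visited_once)
print(visited_all)
print(kept)
```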

Subsetting Identical Observations in R [duplicate]

This question already has answers here:
Finding ALL duplicate rows, including "elements with smaller subscripts"
(9 answers)
Closed 8 years ago.
I am trying to look at protein sequence homology using R, and I'd like to go through a data frame looking for identical pairs of Position and Letter. The data look similar to the frame below:
Letter <- c("A", "B", "C", "D", "D", "E", "G", "L")
Position <- c(1, 2, 3, 4, 4, 5, 6, 7)
data.set <- cbind(Position, Letter)
Which yields:
Position Letter
[1,] "1" "A"
[2,] "2" "B"
[3,] "3" "C"
[4,] "4" "D"
[5,] "4" "D"
[6,] "5" "E"
[7,] "6" "G"
[8,] "7" "L"
I'd like to loop through and find all identical observations (in this case, observations 4 and 5), but I'm having difficulty in discovering the best way to do it.
I'd like the resultant data frame to look like:
Position Letter
[1,] "4" "D"
[2,] "4" "D"
The ways I've tried to do this ended up yielding this code, but unfortunately it returns one value of TRUE because I realized that I am comparing two identical data frames:
> identical(data.set[1:nrow(data.set),1:2], data.set[1:nrow(data.set),1:2])
[1] TRUE
I'm not sure if looping through using the identical() function would be the best way? I'm sure there's a more elegant solution that I am missing.
Thanks for any help!
Try the unique function:
unique(data.set)
...
You can call duplicated twice, once in the default direction and once with fromLast = TRUE, and combine the results so that every copy of a repeated row is flagged:
data.set[duplicated(data.set) | duplicated(data.set, fromLast = TRUE), ]
# Position Letter
#[1,] "4" "D"
#[2,] "4" "D"
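The reason both directions are needed is that duplicated alone marks only the later copies of a repeated row. A short sketch on the question's data shows the difference; the names forward, backward and dup are illustrative.

```r
# Data from the question (cbind coerces everything to character).
Letter   <- c("A", "B", "C", "D", "D", "E", "G", "L")
Position <- c(1, 2, 3, 4, 4, 5, 6, 7)
data.set <- cbind(Position, Letter)

# Forward pass flags the second copy only; backward pass flags the first copy only.
forward  <- duplicated(data.set)
backward <- duplicated(data.set, fromLast = TRUE)

# Their union flags every row that has an identical partner.
dup <- forward | backward
print(which(dup))
print(data.set[dup, , drop = FALSE])
```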

r - pairwise combinations of rows from table?

Assume a table as below:
X =
col1 col2 col3
row1 "A" "0" "1"
row2 "B" "2" "NA"
row3 "C" "1" "2"
I select combinations of two rows, using the code below:
pair <- apply(X, 2, combn, m=2)
This returns a matrix of the form:
pair =
[,1] [,2] [,3]
[1,] "A" "0" "1"
[2,] "B" "2" NA
[3,] "A" "0" "1"
[4,] "C" "1" "2"
[5,] "B" "2" NA
[6,] "C" "1" "2"
I wish to iterate over pair, taking two rows at a time, i.e. first isolate [1,] and [2,], then [3,] and [4,], and finally [5,] and [6,]. These rows will then be passed as arguments to regression models, i.e. lm(Y ~ row[i]*row[j]).
I am dealing with a large dataset. Can anybody advise how to iterate over a matrix two rows at a time, assign those rows to variables and pass as arguments to a function?
Thanks,
S ;-)
It is unnecessary to duplicate the rows of your matrix like that, and with a large data set it might become problematic. Instead, just pick out the relevant rows for each pair. It is convenient to create the selection of row indices beforehand, something like this perhaps:
xselect <- combn(1:nrow(X),2)
To illustrate with your data (assuming you only use columns 2 and 3):
X <- matrix(c("A", "B", "C", 0,2,1,1,NA,2),3,3)
Y <- rnorm(2, 4, 2)
for (i in 1:ncol(xselect))
{
x1 <- as.numeric(X[xselect[1,i], c(2,3)])
x2 <- as.numeric(X[xselect[2,i], c(2,3)])
print(lm(Y ~ x1 * x2))
}
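The index-based selection can also be checked on its own, without fitting any model. The sketch below only collects the two rows of each pair into a list; the name row_pairs is hypothetical.

```r
# Example matrix from the answer (everything is stored as character).
X <- matrix(c("A", "B", "C", 0, 2, 1, 1, NA, 2), 3, 3)

# Each column of xselect holds the two row numbers of one pair.
xselect <- combn(seq_len(nrow(X)), 2)

# Pull out the two rows of X for every pair without copying X up front.
row_pairs <- lapply(seq_len(ncol(xselect)),
                    function(i) X[xselect[, i], , drop = FALSE])

print(xselect)
print(row_pairs[[1]])
```

Only the small index matrix is built in advance, which is the point of the approach for large data.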
I'm not sure exactly what you're trying to do with the linear models, but to iterate over the stacked matrix pair, two rows at a time, make a factor that labels each pair of rows, and then use by:
fac <- as.factor(rep(seq_len(nrow(pair) / 2), each = 2))
by(pair, fac, FUN)
where FUN is whatever function you want to apply to each pair of rows in pair.
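As a rough illustration of the grouping idea, the sketch below builds the stacked six-row matrix from the question by hand and uses split instead of by, which is a swapped-in technique chosen only so the groups are easy to inspect. The names row_groups and pair_labels are hypothetical.

```r
# The stacked matrix from the question: rows 1-2, 3-4 and 5-6 form the pairs.
pair <- rbind(c("A", "0", "1"), c("B", "2", NA),
              c("A", "0", "1"), c("C", "1", "2"),
              c("B", "2", NA),  c("C", "1", "2"))

# One group label per consecutive pair of rows: 1 1 2 2 3 3.
fac <- rep(seq_len(nrow(pair) / 2), each = 2)

# Row numbers belonging to each pair.
row_groups <- split(seq_len(nrow(pair)), fac)

# Apply a function to each pair; here it just joins the first-column labels.
pair_labels <- vapply(row_groups,
                      function(idx) paste(pair[idx, 1], collapse = "-"),
                      character(1))
print(pair_labels)
```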
