Referring to multiple columns by name - r

My question is a variation of this question. What I want is to add a prefix to a vector of column names (which is a subset of all column names). I tried to extend the solution from the link to more columns as follows, but got stuck.
Data:
m2 <- cbind(1,1:4,4:1)
colnames(m2) <- c("x","y","z")
x y z
[1,] 1 1 4
[2,] 1 2 3
[3,] 1 3 2
[4,] 1 4 1
colnames(m2)[colnames(m2) == c("x","z")] <- paste("Sub", colnames(m2)[colnames(m2) == c("x","z")], sep = "_")
Warning messages:
1: In colnames(m2) == c("x", "z") :
longer object length is not a multiple of shorter object length
2: In colnames(m2) == c("x", "z") :
longer object length is not a multiple of shorter object length
m2
Sub_x y z
[1,] 1 1 4
[2,] 1 2 3
[3,] 1 3 2
[4,] 1 4 1
The code gives two warnings and only changes one column.
Desired output:
m2 <- cbind(1,1:4,4:1)
colnames(m2) <- c("x","y","z")
colnames(m2)[1] <- paste("Sub", colnames(m2)[1], sep = "_")
colnames(m2)[3] <- paste("Sub", colnames(m2)[3], sep = "_")
m2
Sub_x y Sub_z
[1,] 1 1 4
[2,] 1 2 3
[3,] 1 3 2
[4,] 1 4 1
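The warnings in the attempt above come from ==, which compares element by element and recycles c("x","z") against the three column names, so only the first position matches. A minimal base R sketch that uses %in% for set membership instead (the helper name sel is only illustrative):

```r
m2 <- cbind(1, 1:4, 4:1)
colnames(m2) <- c("x", "y", "z")

# %in% tests membership in the set, with no elementwise recycling
sel <- colnames(m2) %in% c("x", "z")
colnames(m2)[sel] <- paste("Sub", colnames(m2)[sel], sep = "_")

colnames(m2)
# [1] "Sub_x" "y"     "Sub_z"
```

The same logical index works for any number of target columns, since it always has one entry per column.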

Alternative solution using dplyr
library(dplyr)
m2 %>%
as_tibble() %>%
rename_with(.cols = c("x", "z"), ~ stringr::str_c("Sub_", .))


How to select rows from a matrix in R

Newbie here, so maybe this problem has an obvious solution:
I have a large matrix with no row or column names. I would like to create a data frame consisting of rows from the matrix where a column has a particular value. The column doesn't have a name and I don't know its position in advance.
Here is a small example:
S <- matrix(c(1,1,2,3,0,0,-2,0,1,2),5,2)
which prints as:
[,1] [,2]
[1,] 1 0
[2,] 1 -2
[3,] 2 0
[4,] 3 1
[5,] 0 2
Now I want to create a data frame of the rows that contain an even entry in some column. This should produce a data frame with rows 1, 2, 3 and 5. In my situation there are 1000 columns, so I can't enumerate them all. (What I need is a quantifier that ranges across columns.)
Update:
Thanks to a valuable comment from MrFlick (with S converted to a data frame and is.even defined as in the first answer below):
S %>% rowwise() %>% filter(any(is.even(c_across())))
First answer:
1. Convert the matrix to a data frame.
2. Apply the helper function is.even to every column with across().
3. Filter on the resulting logical columns.
library(dplyr)
is.even <- function(v) v %% 2 == 0
S <- data.frame(S)
S %>% mutate(across(everything(), is.even, .names = "{.col}.{.fn}")) %>%
filter(X1.1 != FALSE | X2.1 != FALSE) %>%
select(X1, X2)
Output:
X1 X2
1 1 0
2 1 -2
3 2 0
4 0 2
We can do it like this
subset(
as.data.frame(S),
(V1 * V2) %% 2 == 0
)
which gives
V1 V2
1 1 0
2 1 -2
3 2 0
5 0 2
If you have many columns, you can use an alternative
subset(
u <- as.data.frame(S),
Reduce("*", u) %% 2 == 0
)
You can test for "evenness" by looking at the value mod 2 (%% 2) and checking if it's zero. You can perform this check across all rows using rowSums() to look for any TRUE values. So you can do
S[rowSums(S %% 2 ==0)>0, ]
# [,1] [,2]
# [1,] 1 0
# [2,] 1 -2
# [3,] 2 0
# [4,] 0 2
to get just the rows with an even value.
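Since the question asks for a data frame rather than a matrix, the rowSums() idea can be sketched as follows. The names keep and evens are illustrative, and drop = FALSE is added on the assumption that a single matching row should still come back as a one-row matrix rather than a plain vector:

```r
S <- matrix(c(1, 1, 2, 3, 0, 0, -2, 0, 1, 2), 5, 2)

# TRUE for every row with at least one even entry, in any column
keep <- rowSums(S %% 2 == 0) > 0
which(keep)
# [1] 1 2 3 5

# drop = FALSE keeps the matrix shape even if only one row matches
evens <- as.data.frame(S[keep, , drop = FALSE])
```

This scales to 1000 columns unchanged, because the test is applied to the whole matrix at once.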
We can use apply from base R on a logical matrix constructed with %% to create a logical vector that can be used to subset the rows
S[ apply(!(S %% 2), 1, any),]
-output
[,1] [,2]
[1,] 1 0
[2,] 1 -2
[3,] 2 0
[4,] 0 2
Or with collapse for efficiency
library(collapse)
sbt(S, dapply(S, MARGIN = 1, FUN = function(x) any(! x %% 2)))
[,1] [,2]
[1,] 1 0
[2,] 1 -2
[3,] 2 0
[4,] 0 2
Or use tidyverse
library(dplyr)
as.data.frame(S) %>%
filter(if_any(everything(), ~ !(. %% 2) ))
V1 V2
1 1 0
2 1 -2
3 2 0
4 0 2

R rbind command remove extra information

x=rbind(rep(1:3),rep(1:3))
x
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 1 2 3
How is it possible to remove the brackets and the values inside them, including the commas? I tried make.row.names = FALSE but this does not work.
You can do it with rownames and colnames:
colnames(x) <- 1:3
rownames(x) <- 1:2
x
# 1 2 3
#1 1 2 3
#2 1 2 3
You're probably confusing matrices with data frames?
x <- rbind(rep(1:3), rep(1:3))
x
# [,1] [,2] [,3]
# [1,] 1 2 3
# [2,] 1 2 3
The display is perfectly fine, since x is a matrix:
class(x)
# [1] "matrix" "array"   (only "matrix" before R 4.0.0)
You could change dimnames like so
dimnames(x) <- list(1:nrow(x), 1:ncol(x))
x
# 1 2 3
# 1 1 2 3
# 2 1 2 3
However, probably you want a data frame.
x <- as.data.frame(rbind(rep(1:3), rep(1:3)))
x
# V1 V2 V3
# 1 1 2 3
# 2 1 2 3
class(x)
# [1] "data.frame"
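If the goal is only to print the values without any row or column labels, one possible sketch is to write the matrix to the console with write.table() and both sets of names switched off. This changes only the display; x itself stays an ordinary matrix:

```r
x <- rbind(1:3, 1:3)

# Print the bare values; no [1,] or [,1] labels are emitted
write.table(x, row.names = FALSE, col.names = FALSE)
# 1 2 3
# 1 2 3
```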

System of equations. How to split a string to gain two matrices A and b in R

I have a string like for example this (the number of variables and constants is not important):
> my_string <- "-x+2y+z=-1; x-3y-2z=-1; 3x-y-z=4"
I know how to obtain a cbind(A, b) matrix using replace and numeric functions...
# [,1] [,2] [,3] [,4]
# [1,] -1 2 1 -1
# [2,] 1 -3 -2 -1
# [3,] 3 -1 -1 4
...but have no idea how to automatically gain two matrices A and b
A
# [,1] [,2] [,3]
# [1,]-1 2 1
# [2,] 1 -3 -2
# [3,] 3 -1 -1
b
# [,1]
# [1,]-1
# [2,]-1
# [3,] 4
In other words, how can I split this string on = to get one matrix with the numerical elements located before the equals sign and another with the elements located after it?
EDIT.
So far I made this:
my_string<-"-x+2y+z=-1; x-3y-2z=-1; 3x-y-z=4"
my_string<-gsub('([[:punct:]]|\\s)([a-z])', '\\11\\2', my_string)
my_string<-stringr::str_replace_all(my_string,"[a-z]"," ")
my_string<-stringr::str_replace_all(my_string,"; ",";")
my_string<-stringr::str_replace_all(my_string,"[-]","+-")
my_string<-stringr::str_replace_all(my_string,"[+]"," ")
my_string<-stringr::str_replace_all(my_string,"[=] ","=")
my_string<-stringr::str_replace_all(my_string," ",",")
my_string<-stringr::str_replace_all(my_string," ",",")
my_string<-stringr::str_replace_all(my_string," ",",")
my_string<-gsub("^,","",my_string)
my_string <- strsplit(my_string, "=|;")
And I gained:
# "-1,2,1" "-1" "1,-3,-2" "-1" "3,-1,-1" "4"
How can I combine these strings?
> A <- "-1,2,1,1,-3,-2,3,-1,-1"
> b <- "-1,-1,4"
Here are some alternatives. All can handle strings like my_string shown in the question but (3), (4) and (5) can also handle equations in which some of the variables are missing and the variables are out of order. Only (4) hard codes the variable names but it is generalized in (5).
1) Insert 1 before any variable that has no numeric multiplier giving s1. Then extract the variable names, assuming they are one letter each, and count the unique ones giving the number n. Then extract the numbers, convert them to numeric and shape them into a matrix using n. It is assumed that all three variables are present in each equation and that they are in the same order since that is the case in the question's example.
library(gsubfn)
my_string<-"-x+2y+z=-1; x-3y-2z=-1; 3x-y-z=4"
s1 <- gsub('(^|\\W)([a-z])', '\\11\\2', my_string) # from your prior question
n <- length(strapplyc(s1, "[a-z]", simplify = unique))
matrix(strapply(s1, "(-?\\d+)", as.numeric, simplify = c), n, byrow = TRUE)
giving:
[,1] [,2] [,3] [,4]
[1,] -1 2 1 -1
[2,] 1 -3 -2 -1
[3,] 3 -1 -1 4
2) A variation is to split s1 from above at semicolon giving s2. Then use strapply to pick out the numbers giving mat. Finally convert the numbers from character to numeric.
library(gsubfn)
s2 <- strsplit(s1, ";")
mat <- do.call("rbind", sapply(s2, strapply, "(-?\\d+)"))
matrix(as.numeric(mat), nrow(mat))
giving:
[,1] [,2] [,3] [,4]
[1,] -1 2 1 -1
[2,] 1 -3 -2 -1
[3,] 3 -1 -1 4
3) This alternative can handle missing variables such as in the example below where y is missing in the first equation. varnames are the variable names. The extr function takes a variable name and extracts its coefficients or 0 if the variable does not appear.
library(gsubfn)
my_string2 <- "-x+z=-1; x-3y-2z=-1; 3x-y-z=4"
s1 <- gsub('(^|\\W)([a-z])', '\\11\\2', my_string2)
s2 <- strsplit(s1, ";")
varnames <- sort(strapplyc(s1, "[a-z]", simplify = unique))
extr <- function(x)
  # keep the minus sign and digits; "\\D" would also strip the sign
  strapply(s2[[1]], paste0("-?\\d", x), ~ as.numeric(gsub("[^-0-9]", "", x)), empty = 0)
A <- sapply(varnames, extr)
b <- as.numeric(sub(".*=", "", s2[[1]]))
giving:
> A
x y z
[1,] -1 0 1
[2,] 1 -3 -2
[3,] 3 -1 -1
> b
[1] -1 -1 4
4) This one replaces x with *c(1, 0, 0), y with *c(0,1,0) and z with *c(0,0,1) and then evaluates them to produce A. It is particularly simple. It can also handle equations in which not all variables are present.
It assumes that the variables are x, y and z although it could be generalized.
my_string2 <- "-x+z=-1; x-3y-2z=-1; 3x-y-z=4"
s1 <- gsub('(^|\\W)([a-z])', '\\11\\2', my_string2)
s2 <- strsplit(s1, ";")
s <- sub("=.*", "", s2[[1]])
s <- gsub("x", "*c(1, 0, 0)", s)
s <- gsub("y", "*c(0, 1, 0)", s)
s <- gsub("z", "*c(0, 0, 1)", s)
A <- eval(parse(text = paste("rbind(", paste(s, collapse = ","), ")")))
b <- as.numeric(sub(".*=", "", s2[[1]]))
giving:
> A
[,1] [,2] [,3]
[1,] -1 0 1
[2,] 1 -3 -2
[3,] 3 -1 -1
> b
[1] -1 -1 4
5) This is a generalized version of (4) in which x, y and z are not hard coded. It can handle unordered and missing variables. We first get the variable names in varnames and split the input string giving spl. Then, for the ith variable name, we replace it with a vector of 0's with a 1 in the ith position giving ss1, and insert * before any such vector prefixed by a digit giving ss2. Finally we remove = and everything after it giving ss3, surround the result with rbind(...) and evaluate it as an R expression giving A. b is everything after = converted to numeric.
library(gsubfn)
my_string2 <- "-z+x=-1; x-3y-2z=-1; 3x-y-z=4"
ss0 <- my_string2
varnames <- sort(strapplyc(ss0, "[a-z]", simplify = unique))
spl <- strsplit(ss0, ";")[[1]]
ss1 <- gsubfn("[a-z]", x ~ (match(x, varnames) == seq_along(varnames))+0, spl)
ss2 <- gsub("(\\d)c", "\\1*c", ss1)
ss3 <- sub("=.*", "", ss2)
A <- eval(parse(text = paste("rbind(", paste(ss3, collapse = ","), ")")))
b <- as.numeric(sub(".*=", "", ss2))
giving:
> A
[,1] [,2] [,3]
[1,] 1 0 -1
[2,] 1 -3 -2
[3,] 3 -1 -1
> b
[1] -1 -1 4
With base R only. A bit ugly, too many calls to strsplit and *apply functions.
my_string <- "-x+2y+z=-1; x-3y-2z=-1; 3x-y-z=4"
sp1 <- unlist(strsplit(my_string, ";"))
sp2 <- strsplit(sp1, "=")
b <- as.numeric(sapply(sp2, '[[', 2))
sp3 <- lapply(lapply(sp2, '[[', 1), function(s) gsub("([-+])([[:alpha:]])", "\\11\\2", s))
sp3 <- lapply(sp3, trimws)
sp3 <- lapply(sp3, function(s) sub("^([[:alpha:]])", "1\\1", s))
A <- do.call(rbind, lapply(sp3, function(x) as.numeric(unlist(strsplit(x, "[[:alpha:]]")))))
Here's a base version with fairly simple regex:
mystring <- "-x+2y+z=-1; x-3y-2z=-1; 3x-y-z=4"
equations <- strsplit(mystring, '; ')[[1]] # split equations
coefs <- strsplit(equations, '[xyz=]+') # split into list of vectors of coefficients
# iterate over coefficients, clean, and simplify
Ab <- t(sapply(coefs, function(x){
missing1 <- !grepl('\\d', x); # detect coefficients with no numbers
x[missing1] <- paste0(x[missing1], '1'); # paste ones on those
as.numeric(x) # coerce from strings (substitute `as.integer` if suitable)
}))
Ab
#> [,1] [,2] [,3] [,4]
#> [1,] -1 2 1 -1
#> [2,] 1 -3 -2 -1
#> [3,] 3 -1 -1 4
A <- Ab[, 1:3]
b <- Ab[, 4, drop = FALSE]
A
#> [,1] [,2] [,3]
#> [1,] -1 2 1
#> [2,] 1 -3 -2
#> [3,] 3 -1 -1
b
#> [,1]
#> [1,] -1
#> [2,] -1
#> [3,] 4
solve(A, b)
#> [,1]
#> [1,] 2
#> [2,] -1
#> [3,] 3
Use do.call(cbind, lapply(...)) instead of t(sapply(...)) if you like. Note that the simplicity of the regex depends on the regularity of the equations; you'll need a more robust solution if terms are out of order or missing.
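The split into A and b does not need the column count hard coded. A small sketch, assuming the augmented matrix Ab always carries the right-hand side in its last column (k and x are illustrative names), which also checks the solution by substituting it back:

```r
Ab <- matrix(c(-1,  2,  1, -1,
                1, -3, -2, -1,
                3, -1, -1,  4), nrow = 3, byrow = TRUE)

k <- ncol(Ab)                 # index of the right-hand-side column
A <- Ab[, -k, drop = FALSE]   # coefficient matrix
b <- Ab[,  k, drop = FALSE]   # constants as a one-column matrix

x <- solve(A, b)
# Substituting the solution back should reproduce b
stopifnot(isTRUE(all.equal(A %*% x, b)))
```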
Already solved, but I just wanted to post mine too
my_string <- "-x+2y+z=-1; x-3y-2z=-1; 3x-y-z=4"
split <- strsplit(strsplit(my_string, ";")[[1]], "=")
right <- do.call(rbind, lapply(split, function(x) as.numeric(x[[2]])))
left <- lapply(split, function(x) x[[1]])
left <- do.call(rbind, lapply(left, function(x) {
eq_fs = unlist(strsplit(x, "\\W")); eq_fs = eq_fs[eq_fs != ""]
eq_ss = unlist(strsplit(x, "\\w"))
eq_ss = eq_ss[c(T, eq_ss[2:length(eq_ss)] != "")]
idx = grepl("\\d", eq_fs)
nums = rep(1, length(eq_fs))
nums[idx] = gsub(".*?(\\d).*", "\\1", eq_fs[idx], perl = TRUE)
nums = as.numeric(nums) * as.numeric(paste0(eq_ss, 1))
return(nums)
}))
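For the question's own intermediate result, where the pieces alternate between left-hand and right-hand sides, a possible sketch is to pick the odd and even positions by recycling a logical index. It assumes the six pieces are held in a plain character vector, here given the hypothetical name parts:

```r
parts <- c("-1,2,1", "-1", "1,-3,-2", "-1", "3,-1,-1", "4")

lhs <- parts[c(TRUE, FALSE)]   # odd positions: coefficient strings
rhs <- parts[c(FALSE, TRUE)]   # even positions: constants

# One numeric row per equation, stacked into the coefficient matrix
A <- do.call(rbind, lapply(strsplit(lhs, ","), as.numeric))
b <- matrix(as.numeric(rhs), ncol = 1)
```

If the pieces come from strsplit(), which returns a list, the vector can be obtained first with unlist() or [[1]].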

Generalize R %in% operator to match tuples

I spent a while the other day looking for a way to check if a row vector is contained in some set of row vectors in R. Basically, I want to generalize the %in% operator to match a tuple instead of each entry in a vector. For example, I want:
row.vec = c("A", 3)
row.vec
# [1] "A" "3"
data.set = rbind(c("A",1),c("B",3),c("C",2))
data.set
# [,1] [,2]
# [1,] "A" "1"
# [2,] "B" "3"
# [3,] "C" "2"
row.vec %tuple.in% data.set
# [1] FALSE
for my made-up operator %tuple.in% because the row vector c("A",3) is not a row vector in data.set. Using the %in% operator gives:
row.vec %in% data.set
# [1] TRUE TRUE
because "A" and 3 are in data.set, which is not what I want.
I have two questions. First, are there any good existing solutions to this?
Second, since I couldn't find any (even if they exist), I tried to write my own function to do it. It works for an input matrix of row vectors, but I'm wondering if any experts can suggest improvements:
is.tuple.in <- function(matrix1, matrix2){
  # Apply rbind() so that matrix1 has columns even if it is a row vector.
  matrix1 = rbind(matrix1)
  if(ncol(matrix1) != ncol(matrix2)){
    stop("Matrices must have the same number of columns.") }
  # Now check for the first row and handle other rows recursively
  row.vec = matrix1[1,]
  tuple.found = FALSE
  for(i in 1:nrow(matrix2)){
    # If we find a match, then this row exists in matrix 2 and we can break the loop
    if(all(row.vec == matrix2[i,])){
      tuple.found = TRUE
      break
    }
  }
  # If there are more rows to be checked, use a recursive call
  if(nrow(matrix1) > 1){
    return(c(tuple.found, is.tuple.in(matrix1[2:nrow(matrix1),],matrix2)))
  } else {
    return(tuple.found)
  }
}
I see a couple of problems with it that I'm not sure how to fix. First, I'd like the base case to be clear at the start of the function. I didn't manage to do this because I pass matrix1[2:nrow(matrix1),] in the recursive call, which produces an error if matrix1 has one row. So instead of getting to a case where matrix1 is empty, I have an if condition at the end deciding if more iterations are necessary.
Second, I think the use of rbind() at the start is sloppy, but I needed it for when matrix1 had been reduced to a single row. Without using rbind(), ncol(matrix1) produced an error in the 1-row case. I figure my trouble here has to do with a lack of knowledge about R data types.
Any help would be appreciated.
I'm wondering if you have made this a bit more complicated than it is. For example,
set.seed(1618)
vec <- c(1,3)
mat <- matrix(rpois(1000,3), ncol = 2)
rownames(mat) <- 1:nrow(mat)
mat[sapply(1:nrow(mat), function(x) all(vec %in% mat[x, ])), ]
# gives me this
# [,1] [,2]
# 6 3 1
# 38 3 1
# 39 3 1
# 85 1 3
# 88 1 3
# 89 1 3
# 95 3 1
# 113 1 3
# ...
you could subset this further if you care about the order
or you could modify the function slightly:
mat[sapply(1:nrow(mat), function(x)
all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]
# [,1] [,2]
# 85 1 3
# 88 1 3
# 89 1 3
# 113 1 3
# 133 1 3
# 139 1 3
# 187 1 3
# ...
another example with a longer vector
set.seed(1618)
vec <- c(1,4,5,2)
mat <- matrix(rpois(10000, 3), ncol = 4)
rownames(mat) <- 1:nrow(mat)
mat[sapply(1:nrow(mat), function(x) all(vec %in% mat[x, ])), ]
# [,1] [,2] [,3] [,4]
# 57 2 5 1 4
# 147 1 5 2 4
# 279 1 2 5 4
# 303 1 5 2 4
# 437 1 5 4 2
# 443 1 4 5 2
# 580 5 4 2 1
# ...
I see a couple that match:
mat[sapply(1:nrow(mat), function(x)
all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]
# [,1] [,2] [,3] [,4]
# 443 1 4 5 2
# 901 1 4 5 2
# 1047 1 4 5 2
but only three
for your single row case:
vec <- c(1,4,5,2)
mat <- matrix(c(1,4,5,2), ncol = 4)
rownames(mat) <- 1:nrow(mat)
mat[sapply(1:nrow(mat), function(x)
all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]
# [1] 1 4 5 2
here is a simple function with the above code
is.tuplein <- function(vec, mat, exact = TRUE) {
  rownames(mat) <- 1:nrow(mat)
  if (exact)
    tmp <- mat[sapply(1:nrow(mat), function(x)
      all(paste(vec, collapse = '') %in% paste(mat[x, ], collapse = ''))), ]
  else tmp <- mat[sapply(1:nrow(mat), function(x) all(vec %in% mat[x, ])), ]
  return(tmp)
}
is.tuplein(vec = vec, mat = mat)
# [1] 1 4 5 2
seems to work, so let's make our own %in% operator:
`%tuple%` <- function(x, y) is.tuplein(vec = x, mat = y, exact = TRUE)
`%tuple1%` <- function(x, y) is.tuplein(vec = x, mat = y, exact = FALSE)
and try her out
set.seed(1618)
c(1,2,3) %tuple% matrix(rpois(1002,3), ncol = 3)
# [,1] [,2] [,3]
# 133 1 2 3
# 190 1 2 3
# 321 1 2 3
set.seed(1618)
c(1,2,3) %tuple1% matrix(rpois(1002,3), ncol = 3)
# [,1] [,2] [,3]
# 48 2 3 1
# 64 2 3 1
# 71 1 3 2
# 73 3 1 2
# 108 3 1 2
# 112 1 3 2
# 133 1 2 3
# 166 2 1 3
Does this do what you want (even for more than 2 columns)?
paste(row.vec,collapse="_") %in% apply(data.set,1,paste,collapse="_")
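The one-liner above can be wrapped into the %tuple.in% operator the question asks for, returning one logical per row instead of recursing. This is only a sketch: the helper key is illustrative, and the separator "\x1f" is an assumption chosen because it is unlikely to occur in the data. Pasting with an empty separator would be ambiguous, since c(1, 23) and c(12, 3) would both collapse to "123".

```r
`%tuple.in%` <- function(x, table) {
  x <- rbind(x)   # a plain vector becomes a one-row matrix
  if (ncol(x) != ncol(table)) {
    stop("Both arguments must have the same number of columns.")
  }
  # Collapse each row into a single string key (separator is an assumption)
  key <- function(m) apply(m, 1, paste, collapse = "\x1f")
  unname(key(x) %in% key(table))
}

row.vec  <- c("A", 3)
data.set <- rbind(c("A", 1), c("B", 3), c("C", 2))

row.vec %tuple.in% data.set
# [1] FALSE
c("B", 3) %tuple.in% data.set
# [1] TRUE
```

Because rbind() leaves an existing matrix unchanged, the same operator accepts several rows on the left-hand side and returns one result per row.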

in R, how to retrieve a complete matrix using combn?

My problem, stripped of its specific purpose, looks like this.
How can I transform a set of combinations like the following?
First, I use combn(letters[1:4], 2) to compute the combinations:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "a" "a" "a" "b" "b" "c"
[2,] "b" "c" "d" "c" "d" "d"
Then I use each column to obtain another data frame:
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1 2 3 4 5 6
Each element is obtained from the corresponding column of the matrix above (for example, the first element comes from the first column).
How can I then transform this data frame into a matrix like the following?
a b c d
a 0 1 2 3
b 1 0 4 5
c 2 4 0 6
d 3 5 6 0
Elements whose row and column names are the same are zero; the others take the corresponding values from above.
Here is one way that works:
inputs <- letters[1:4]
combs <- combn(inputs, 2)
N <- seq_len(ncol(combs))
nams <- unique(as.vector(combs))
out <- matrix(ncol = length(nams), nrow = length(nams))
out[lower.tri(out)] <- N
out <- t(out)
out[lower.tri(out)] <- N
out <- t(out)
diag(out) <- 0
rownames(out) <- colnames(out) <- inputs
Which gives:
> out
a b c d
a 0 1 2 3
b 1 0 4 5
c 2 4 0 6
d 3 5 6 0
If I had to do this a lot, I'd wrap those function calls into a function.
Another option is to use as.matrix.dist() to do the conversion for us by setting up a "dist" object by hand. Using some of the objects from earlier:
## Far easier
out2 <- N
class(out2) <- "dist"
attr(out2, "Labels") <- as.character(inputs)
attr(out2, "Size") <- length(inputs)
attr(out2, "Diag") <- attr(out2, "Upper") <- FALSE
out2 <- as.matrix(out2)
Which gives:
> out2
a b c d
a 0 1 2 3
b 1 0 4 5
c 2 4 0 6
d 3 5 6 0
Again, I'd wrap this in a function if I had to do it more than once.
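As suggested, the steps can be wrapped in a function. The following is a sketch under the assumption that the values arrive in combn() order; pairs_to_matrix is a hypothetical name. It fills the lower triangle once and mirrors it by adding the transpose:

```r
pairs_to_matrix <- function(values, labels) {
  n <- length(labels)
  stopifnot(length(values) == choose(n, 2))
  # Start from zeros so the diagonal needs no separate step
  out <- matrix(0, n, n, dimnames = list(labels, labels))
  # Column-major filling of the lower triangle matches combn() order
  out[lower.tri(out)] <- values
  out + t(out)   # mirror the lower triangle into the upper one
}

m <- pairs_to_matrix(1:6, letters[1:4])
m["a", "b"]
# [1] 1
m["d", "c"]
# [1] 6
```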
Does it have to be a mirror matrix with zeros over the diagonal?
combo <- combn(letters[1:4], 2)
in.combo <- matrix(1:6, nrow = 1)
combo <- rbind(combo, in.combo)
out.combo <- matrix(rep(NA, 16), ncol = 4)
colnames(out.combo) <- letters[1:4]
rownames(out.combo) <- letters[1:4]
for(cols in 1:ncol(combo)) {
vec1 <- combo[, cols]
out.combo[vec1[1], vec1[2]] <- as.numeric(vec1[3])
}
> out.combo
a b c d
a NA 1 2 3
b NA NA 4 5
c NA NA NA 6
d NA NA NA NA
