Print a matrix flattened by column using a separator - r

I have the following matrix made from a vector of vectors, which I want to print separated by the & operator.
vec1 <- c(1, 2, 3, 4)
vec2 <- c(5, 6, 7, 8)
vec3 <- c(9, 10, 11, 12)
vec4 <- c(13, 14, 15, 16)
vec5 <- c(17, 18, 19, 20)
vec6 <- c(21, 22, 23, 24)
Mat <- matrix(c(vec1, vec2, vec3, vec4, vec5, vec6), nrow = 6, ncol = 4, byrow = TRUE)
(vect1 <- c(Mat[1,1], Mat[1,2], Mat[1,3], Mat[1,4], Mat[3,1], Mat[3,2], Mat[3,3], Mat[3,4], Mat[5,1], Mat[5,2], Mat[5,3], Mat[5,4]))
This is what I want for the above.
[1] 1 & 2 & 3 & 4 & 9 & 10 & 11 & 12 & 17 & 18 & 19 & 20
(vect2 <- c(Mat[2,1], Mat[2,2], Mat[2,3], Mat[2,4], Mat[4,1], Mat[4,2], Mat[4,3], Mat[4,4], Mat[6,1], Mat[6,2], Mat[6,3], Mat[6,4]))
This is what I want for the above.
[1] 5 & 6 & 7 & 8 & 13 & 14 & 15 & 16 & 21 & 22 & 23 & 24
I need this output for a LaTeX table, where the & symbol separates each element from the next.

c() is a convenient function to flatten matrices by column, so t() then c() flattens by row.
Mat |>
t() |>
c() |>
paste(collapse = " & ")
"1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 & 17 & 18 & 19 & 20 & 21 & 22 & 23 & 24"
Feel free to leave out the paste step if you do not require it in string format.
|> is the native pipe, available in base R since version 4.1, if you are unfamiliar with it.
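The question asks for rows 1, 3, 5 and rows 2, 4, 6 as two separate strings, so the same t() then c() idea can be applied to a subset of rows. A minimal sketch, assuming a small helper named flatten_rows (a made-up name, not part of base R):

```r
# Same matrix as in the question, filled row by row
Mat <- matrix(1:24, nrow = 6, ncol = 4, byrow = TRUE)

# Hypothetical helper: flatten the chosen rows row-wise and join with a separator
flatten_rows <- function(m, rows, sep = " & ") {
  paste(c(t(m[rows, , drop = FALSE])), collapse = sep)
}

odd_rows  <- flatten_rows(Mat, seq(1, nrow(Mat), by = 2))
even_rows <- flatten_rows(Mat, seq(2, nrow(Mat), by = 2))

odd_rows
# [1] "1 & 2 & 3 & 4 & 9 & 10 & 11 & 12 & 17 & 18 & 19 & 20"
even_rows
# [1] "5 & 6 & 7 & 8 & 13 & 14 & 15 & 16 & 21 & 22 & 23 & 24"
```

drop = FALSE keeps the subset a matrix even if only one row is selected, so t() still behaves as expected.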

programming R ifelse conditions loop

Hello, I need help with R programming. I have a data.frame B with four columns:
x<- c(1,2,1,2,1,2,1,2,1,2,1,2,.......etc.)
y<-c(5,5,8,8,12,12,19,19,30,30,50,50,...etc.)
z<- c(2018-11-08,2018-11-08,2018-11-09,2018-11-09,2018-11-11,2018-11-11,2018-11-20,2018-11-20,2018-11-29,2018-11-29,2018-11-30,2018-11-30,.......etc.)
m<-c(0,1,1,0,1,1,0,1,0,1,0,1,...etc.)
The data has 2 million rows and I need to create a new column. The new column should look like this:
t<-c(0,1,0,0,0,0,0,1,0,1,0,1,....)
The loop version of the code looks like this:
B$t[1]=ifelse(B$y[1]==B$y[2] & B$z[1]==B$z[2] & B$x[1]==2 & B$m[1]==1,1,0)
for (i in 2:length(B$z))
{
B$t[i]<-ifelse(B$y[i]==B$y[i-1] & B$z[i]==B$z[i-1] & B$x[i]==2 & B$m[i]==1 & B$m[i]!=B$m[i-1],1,0)
}
I do not want to use a loop.
I am using base R only.
I also have a second question, for a data.frame E:
x<- c(1,2,3,1,2,3,1,2,3,1,2,3,.......etc.)
y<-c(5,5,5,8,8,8,12,12,12,19,19,19,30,30,30,50,50,50,...etc.)
z<- c(2018-11-08,2018-11-08,2018-11-08,2018-11-09,2018-11-09,2018-11-09,2018-11-11,2018-11-11,2018-11-11,2018-11-20,2018-11-20,2018-11-20,2018-11-29,2018-11-29,2018-11-29,2018-11-30,2018-11-30,2018-11-30,.......etc.)
m<-c(0,1,1,0,0,1,0,1,0,1,0,1,0,0,1...etc.)
The data has 2 million rows and I need to create a new column. The new column should look like this:
t<-c(0,1,0,0,1,....)
The loop version of the code looks like this:
E$t[1]=ifelse(E$y[1]==E$y[2] & E$z[1]==E$z[2] & E$x[1]==2 & E$m[1]==1,1,0)
E$t[2]=ifelse(E$y[2]==E$y[3] & E$z[2]==E$z[3] & E$x[2]==3 & E$m[2]==1,1,0)
for (i in 3:length(E$y))
{
E$t[i]<-ifelse(E$y[i]==E$y[i-2] & E$z[i]==E$z[i-2] & E$x[i]==3 & E$m[i]==1 &
E$m[i-1]==0 & E$m[i-2]==0,1,0)
}
I do not want to use a loop.
I am using base R only.
Here is a solution with base R:
N <- nrow(B)
B$t <- ifelse(B$y==c(NA, B$y[-N]) & B$z==c(NA, B$z[-N]) & B$x==2 & B$m==1 & B$m!=c(NA, B$m[-N]), 1, 0)
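The line above covers data.frame B. The second case (data.frame E, looking back two rows) can probably be handled the same way. A rough sketch, assuming a small helper called lag_by (my own name, not a base R function) and sample data in which the last three values of m are made up, because the question elides them:

```r
# Sample data shaped like E; the last three values of m are invented for illustration
E <- data.frame(
  x = rep(1:3, times = 6),
  y = rep(c(5, 8, 12, 19, 30, 50), each = 3),
  z = rep(c("2018-11-08", "2018-11-09", "2018-11-11",
            "2018-11-20", "2018-11-29", "2018-11-30"), each = 3),
  m = c(0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1)
)

# Hypothetical helper: shift a vector down by k positions, padding the top with NA
lag_by <- function(v, k) c(rep(NA, k), head(v, -k))

# Same condition as the loop body, evaluated for all rows at once
cond <- E$y == lag_by(E$y, 2) & E$z == lag_by(E$z, 2) &
  E$x == 3 & E$m == 1 &
  lag_by(E$m, 1) == 0 & lag_by(E$m, 2) == 0

cond[is.na(cond)] <- FALSE  # rows without a full two-row look-back get 0
E$t <- as.integer(cond)
```

This only reproduces the for loop from the question (rows 3 onwards); the first two rows are simply set to 0 here, which is an assumption about what is wanted for them.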
Here is a solution with data.table:
library("data.table")
B <- data.table(
x= c(1,2,1,2,1,2,1,2,1,2,1,2), y= c(5,5,8,8,12,12,19,19,30,30,50,50),
z= c("2018-11-08", "2018-11-08", "2018-11-09", "2018-11-09", "2018-11-11", "2018-11-11", "2018-11-20",
"2018-11-20", "2018-11-29", "2018-11-29", "2018-11-30", "2018-11-30"),
m= c(0,1,1,0,1,1,0,1,0,1,0,1)
)
B[, t := ifelse(y==c(NA, y[- .N]) & z==c(NA, z[- .N]) & x==2 & m==1 & m!=c(NA, m[- .N]), 1, 0)]
or (if logical is acceptable)
B[, t := (y==c(NA, y[- .N]) & z==c(NA, z[- .N]) & x==2 & m==1 & m!=c(NA, m[- .N]))]
or using shift()
B[, t := (y==shift(y) & z==shift(z) & x==2 & m==1 & m!=shift(m))]
With dplyr you can use if_else and lag:
library(dplyr)
dat %>%
mutate(t = if_else(
y == lag(y) & z == lag(z) & x == 2 & m == 1 & m != lag(m), 1, 0)
) # mutate lets you create a new variable in dat (named t here)
# x y z m t
# 1 1 5 2018-11-08 0 0
# 2 2 5 2018-11-08 1 1
# 3 1 8 2018-11-09 1 0
# 4 2 8 2018-11-09 0 0
# 5 1 12 2018-11-11 1 0
# 6 2 12 2018-11-11 1 0
# 7 1 19 2018-11-20 0 0
# 8 2 19 2018-11-20 1 1
# 9 1 30 2018-11-29 0 0
# 10 2 30 2018-11-29 1 1
# 11 1 50 2018-11-30 0 0
# 12 2 50 2018-11-30 1 1
Data:
x<- c(1,2,1,2,1,2,1,2,1,2,1,2)
y<-c(5,5,8,8,12,12,19,19,30,30,50,50)
z<- c("2018-11-08","2018-11-08","2018-11-09","2018-11-09","2018-11-11","2018-11-11","2018-11-20","2018-11-20","2018-11-29","2018-11-29","2018-11-30","2018-11-30")
m<-c(0,1,1,0,1,1,0,1,0,1,0,1)
dat <- data.frame(x, y, z, m)

Extract multiple ranges from a numeric vector

To simplify my question: I want to extract certain ranges from a numeric vector, for example these 3 ranges from 1:20 at the same time:
1 < x < 5
8 < x < 12
17 < x < 20
Therefore, the expected output is 2, 3, 4, 9, 10, 11, 18, 19.
I tried to use the function findInterval() with its rightmost.closed and left.open arguments, but no combination of arguments gives the expected result.
x <- 1:20
v <- c(1, 5, 8, 12, 17, 20)
x[findInterval(x, v) %% 2 == 1]
# [1] 1 2 3 4 8 9 10 11 17 18 19
x[findInterval(x, v, rightmost.closed = T) %% 2 == 1]
# [1] 1 2 3 4 8 9 10 11 17 18 19 20
x[findInterval(x, v, left.open = T) %% 2 == 1]
# [1] 2 3 4 5 9 10 11 12 18 19 20
By the way, the conditions can also be given as a matrix like this:
[,1] [,2]
[1,] 1 5
[2,] 8 12
[3,] 17 20
I don't want to use a for loop unless it's necessary.
I am grateful for any help.
I'd probably do it using purrr::map2 or Map, passing your lower and upper bounds as arguments and filtering x with a custom function:
library(purrr)
x <- 1:20
lower_bounds <- c(1, 8, 17)
upper_bounds <- c(5, 12, 20)
map2(
lower_bounds, upper_bounds, function(lower, upper) {
x[x > lower & x < upper]
}
)
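For reference, the base R Map variant of the same idea might look like the sketch below. Both map2 and Map return a list with one element per range, so unlist() is used here to get a single vector (the names pieces and result are only for illustration):

```r
x <- 1:20
lower_bounds <- c(1, 8, 17)
upper_bounds <- c(5, 12, 20)

# One list element per (lower, upper) pair, strict inequalities on both sides
pieces <- Map(function(lower, upper) x[x > lower & x < upper],
              lower_bounds, upper_bounds)

# Flatten the list into one vector
result <- unlist(pieces)
result
# [1]  2  3  4  9 10 11 18 19
```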
You may use data.table::inrange and its incbounds argument. Assuming the ranges are in a matrix 'm', as shown in your question:
m <- matrix(v, ncol = 2, byrow = TRUE)
x[data.table::inrange(x, m[ , 1], m[ , 2], incbounds = FALSE)]
# [1] 2 3 4 9 10 11 18 19
You were on the right path, and left.open indeed helps, but rightmost.closed concerns only the last interval rather than the right "side" of each interval. Hence, we need to call findInterval twice, once with the default settings and once with left.open = TRUE, and keep only the values that both calls place in an odd-numbered interval. As you yourself figured out, it looks like an optimal way to do that is
x[findInterval(x, v) %% 2 == 1 & findInterval(x, v, left.open = TRUE) %% 2 == 1]
# [1] 2 3 4 9 10 11 18 19
Clearly there are alternatives. E.g.,
fun <- function(x, v)
if(length(v) > 1) v[1] < x & x < v[2] | fun(x, v[-1:-2]) else FALSE
x[fun(x, v)]
# [1] 2 3 4 9 10 11 18 19
I found an easy way just with sapply() :
x <- 1:20
v <- c(1, 5, 8, 12, 17, 20)
(v.df <- as.data.frame(matrix(v, 3, 2, byrow = T)))
# V1 V2
# 1 1 5
# 2 8 12
# 3 17 20
y <- sapply(x, function(x){
ind <- (x > v.df$V1 & x < v.df$V2)
if(any(ind)) x else NA
})
y[!is.na(y)]
# [1] 2 3 4 9 10 11 18 19

Automatically replacing only particular characters in a string

To illustrate my problem: we have a string containing a system of equations:
x0<-"3w+2x+y-3z=-5; 5w+x+2z=31; -2w-x+3y+4z=7; -3x-5y+z=8"
Next steps:
library(gsubfn)  # provides strapply() and gsubfn()
varnames <- sort(strapply(x0, "[a-z]", simplify = unique))
spl <- strsplit(x0, ";")[[1]]
my_string<-unlist(spl)
my_string<-trimws(my_string)
ss1 <- gsubfn("[a-z]", x ~ (match(x, varnames) == seq_along(varnames))+0,
spl)
ss2 <- gsub("(\\d)c", "\\1*c", ss1)
ss3 <- sub("=.*", "", ss2)
A <- eval(parse(text = paste("rbind(", paste(ss3, collapse = ","), ")")))
b <- as.numeric(sub(".*=", "", ss2))
z<-matrix(cbind(A,b), nrow=ncol(cbind(A,b)), ncol=nrow(cbind(A,b)),
byrow=TRUE)
x1<-toString(z)
x1<-stringr::str_replace_all(x1,","," &")
x1
The output is:
3 & 2 & 1 & -3 & -5 & 5 & 1 & 0 & 2 & 31 & -2 & -1 & 3 & 4 & 7 & 0 & -3 & -5 & 1 & 8
But I want to achieve:
3 & 2 & 1 & -3 &|& -5 \\ 5 & 1 & 0 & 2 &|& 31 \\ -2 & -1 & 3 & 4 &|& 7 \\ 0 & -3 & -5 & 1 &|& 8
In other words: how can I replace in x1 every fourth (in this example) "&" (which stands for "=" in the x0 string) with "&|&", and every fifth "&" (which stands for ";" in x0) with "\\", so that I can create a LaTeX table in markdown?
Thank you in advance.
Man that took a lot out of me, lol. Feels like there must be an easier way..
library(stringr)
x2 <- str_replace_all(x1, '(-?\\d+\\s&\\s-?\\d+\\s&\\s-?\\d+\\s&\\s)(-?\\d+\\s?)&?\\s?(-?\\d+)\\s?&?\\s?', '\\1\\2&|& \\3 \\\\ ')
x2 <- substr(x2, 1, nchar(x2)-3)
x2
#[1] "3 & 2 & 1 & -3 &|& -5 \\ 5 & 1 & 0 & 2 &|& 31 \\ -2 & -1 & 3 & 4 &|& 7 \\ 0 & -3 & -5 & 1 &|& 8"
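One possibly easier route is to skip the flattened string and build the rows straight from the coefficient matrix and the right-hand side. A sketch, assuming A and b hold what the parsing step in the question produces (they are typed in by hand here) and using a made-up helper latex_row:

```r
# Coefficient matrix and right-hand side, entered by hand for this sketch
A <- rbind(c( 3,  2,  1, -3),
           c( 5,  1,  0,  2),
           c(-2, -1,  3,  4),
           c( 0, -3, -5,  1))
b <- c(-5, 31, 7, 8)

# Hypothetical helper: coefficients joined by " & ", then " &|& " and the right-hand side
latex_row <- function(coefs, rhs) {
  paste0(paste(coefs, collapse = " & "), " &|& ", rhs)
}

rows <- vapply(seq_len(nrow(A)), function(i) latex_row(A[i, ], b[i]), character(1))

# "\\\\" in R source is two literal backslashes, the LaTeX row separator
latex <- paste(rows, collapse = " \\\\ ")
writeLines(latex)
# 3 & 2 & 1 & -3 &|& -5 \\ 5 & 1 & 0 & 2 &|& 31 \\ -2 & -1 & 3 & 4 &|& 7 \\ 0 & -3 & -5 & 1 &|& 8
```

writeLines() shows the string as it would be written to a file; print() would display every backslash doubled.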

Split a vector in R depending on entries

I input a vector vec<-c(2, 3, 4, 8, 10, 12, 15, 19, 20, 23, 27, 28, 39, 47, 52, 60, 64, 75), and the size of the intervals that I want to break the vector entries into.
In this example I want to break this into 9 different vectors based on the size of each entry.
In my case I want vector number 1 to be the entries in the interval [1,9], then vector 2 to be the entries in [10,18], etc.
In other words:
vec1: 2 3 4 8
vec2: 10 12 15
vec3: 19 20 23 27
etc...
I have tried using the split function but I do not know how to set a ratio that will work.
Maybe the following will do what you want.
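# extend the breaks to the next multiple of 9 so that max(vec) is not left outside them
f <- cut(vec, seq(0, 9 * ceiling(max(vec) / 9), by = 9), include.lowest = TRUE)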
sp <- split(vec, f)
sp <- sp[sapply(sp, function(x) length(x) != 0)]
sp
Use integer division %/% to return a vector of which group each value belongs in. Then split into separate vectors. Use (vec-1) to be "inclusive", i.e. 27 goes with group 3, not group 4.
split(vec,(vec-1) %/% 9)
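If readable names on the pieces would help, the same integer-division grouping can be labelled with the interval each group covers. A small sketch, assuming a width of 9 and a sorted input as in the question (width, grp, labels and parts are just illustrative names):

```r
vec <- c(2, 3, 4, 8, 10, 12, 15, 19, 20, 23, 27, 28, 39, 47, 52, 60, 64, 75)
width <- 9

# Zero-based group index: 1..9 -> 0, 10..18 -> 1, and so on
grp <- (vec - 1) %/% width

# Label each value with the closed interval its group covers, e.g. "[1,9]"
labels <- sprintf("[%d,%d]", grp * width + 1, (grp + 1) * width)

# factor() with explicit levels keeps the groups in order of first appearance
parts <- split(vec, factor(labels, levels = unique(labels)))

length(parts)
# [1] 9
parts[["[19,27]"]]
# [1] 19 20 23 27
```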
Edit:
Another way using dplyr and cut which explicitly tags each interval
require(dplyr)
df2 <- data.frame(vec = vec)
df2 %>% mutate(interval = cut(vec,breaks=seq(0,((max(vec) %/% 9) +1) * 9,9),include.lowest=TRUE,right=TRUE))
vec interval
1 2 [0,9]
2 3 [0,9]
3 4 [0,9]
4 8 [0,9]
5 10 (9,18]
6 12 (9,18]
7 15 (9,18]
8 19 (18,27]
9 20 (18,27]
10 23 (18,27]
11 27 (18,27]
maybe this
library(purrr)
vec <- c(2, 3, 4, 8, 10 ,12, 15 ,19, 20, 23, 27, 28, 39, 47, 52, 60, 64, 75)
vec1 <- keep(vec, function(x) x >= 1 & (x) <= 9)
vec2 <- keep(vec, function(x) x >= 10 & (x) <= 18)

R similar column names ifelse

Adding reproducible code as suggested by the answers:
Qs<-paste0("Q2_", 1:18)
set.seed(15)
maindata <- data.frame(ID=1:5)
for(q in Qs) {
maindata[,q] <- sample(1:20,5,replace=T)
}
I have the code below. Is there a better way to achieve the output without writing each line? I thought of writing a for loop to iterate over questions 1 to 18, but felt that a for loop might not be too efficient...
ifelse(maindata$Q2_1 > 2 & maindata$Q2_1< 11 & !is.na(maindata$Q2_1), 1, 0 )+
ifelse(maindata$Q2_2 > 2 & maindata$Q2_2< 11 & !is.na(maindata$Q2_2), 1, 0)+
ifelse(maindata$Q2_3 > 2 & maindata$Q2_3< 11 & !is.na(maindata$Q2_3), 1, 0)+
ifelse(maindata$Q2_4 > 2 & maindata$Q2_4< 11 & !is.na(maindata$Q2_4), 1, 0)+
ifelse(maindata$Q2_5 > 2 & maindata$Q2_5< 11 & !is.na(maindata$Q2_5), 1, 0)+
ifelse(maindata$Q2_6 > 2 & maindata$Q2_6< 11 & !is.na(maindata$Q2_6), 1, 0)+
ifelse(maindata$Q2_7 > 2 & maindata$Q2_7< 11 & !is.na(maindata$Q2_7), 1, 0)+
ifelse(maindata$Q2_8 > 2 & maindata$Q2_8< 11 & !is.na(maindata$Q2_8), 1, 0)+
ifelse(maindata$Q2_9 > 2 & maindata$Q2_9< 11 & !is.na(maindata$Q2_9), 1, 0)+
ifelse(maindata$Q2_10 > 2 & maindata$Q2_10< 11 & !is.na(maindata$Q2_10), 1, 0)+
ifelse(maindata$Q2_11 > 2 & maindata$Q2_11< 11 & !is.na(maindata$Q2_11), 1, 0)+
ifelse(maindata$Q2_12 > 2 & maindata$Q2_12< 11 & !is.na(maindata$Q2_12), 1, 0)+
ifelse(maindata$Q2_13 > 2 & maindata$Q2_13< 11 & !is.na(maindata$Q2_13), 1, 0)+
ifelse(maindata$Q2_14 > 2 & maindata$Q2_14< 11 & !is.na(maindata$Q2_14), 1, 0)+
ifelse(maindata$Q2_15 > 2 & maindata$Q2_15< 11 & !is.na(maindata$Q2_15), 1, 0)+
ifelse(maindata$Q2_16 > 2 & maindata$Q2_16< 11 & !is.na(maindata$Q2_16), 1, 0)+
ifelse(maindata$Q2_17 > 2 & maindata$Q2_17< 11 & !is.na(maindata$Q2_17), 1, 0)+
ifelse(maindata$Q2_18 > 2 & maindata$Q2_18< 11 & !is.na(maindata$Q2_18), 1, 0)
Well, here's one way. First, let's create some sample data
Qs<-paste0("Q2_", 1:18)
set.seed(15)
maindata <- data.frame(ID=1:5)
for(q in Qs) {
maindata[,q] <- sample(1:20,5,replace=T)
}
Here we make a list of all the question names (Qs) and we create a data.frame with 5 rows where each column contains values sampled from 1:20. If we want the score for each line for each individual, we can do
score <- rowSums(sapply(Qs, function(q)
maindata[,q] > 2 & maindata[,q] <11 & !is.na(maindata[,q]) )+0)
Here I use sapply to iterate over the question names. I write the formula once and swap in the different question names. The function returns a logical vector, and adding zero converts FALSE to 0 and TRUE to 1. Then I use rowSums to add up the scores across rows. We can see the results with
cbind(maindata[,"ID", drop=F], score)
# ID score
# 1 1 9
# 2 2 8
# 3 3 4
# 4 4 6
# 5 5 10
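A possibly shorter variant compares all question columns at once, since comparison operators work on a whole data.frame and return a logical matrix. This is a sketch on the same sample data; score_vec is just an illustrative name, and na.rm = TRUE stands in for the !is.na() check:

```r
Qs <- paste0("Q2_", 1:18)
set.seed(15)
maindata <- data.frame(ID = 1:5)
for (q in Qs) {
  maindata[, q] <- sample(1:20, 5, replace = TRUE)
}

# Logical matrix with one column per question; NA cells stay NA and are
# dropped by na.rm = TRUE, so they count as 0 just like with !is.na()
in_range <- maindata[Qs] > 2 & maindata[Qs] < 11
score_vec <- rowSums(in_range, na.rm = TRUE)
```

It should give the same scores as the sapply version, without looping over the column names explicitly.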
