I have two strings:
x1 = "03011000"
x2 = "13001000"
The strings have an exact overlap in their non-zero characters. For every character position, I would like to get the maximum of the two characters.
So in this case the result would be:
result = "13011000"
The first character is a 1 because x2 has a 1 at the first position. The fourth character is also a 1 because x1 has a 1 at this position.
The way I go about it is the following:
paste0(mapply(pmax, strsplit(x1, ""), strsplit(x2, "")), collapse = "")
But this seems excessive, as I have to split every string into its individual characters and compare them. Is there a better approach?
Using raw comparison:
rawToChar(as.raw(pmax(as.numeric(charToRaw(x1)), as.numeric(charToRaw(x2)))))
# [1] "13011000"
We could wrap it into a function:
foo <- function(x, y){
mapply(FUN = function(x, y) {
rawToChar(as.raw(pmax(as.numeric(charToRaw(x)), as.numeric(charToRaw(y)))))
}, x = x, y = y, USE.NAMES = FALSE)
}
x1 <- "03011000"
x2 <- "13001000"
foo(x1, x2)
# [1] "13011000"
x1 <- c("03011000", "ab", "123")
x2 <- c("13001000", "cd", "212")
foo(x1, x2)
# [1] "13011000" "cd" "223"
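A related sketch for a single pair of strings of equal length. It uses base R's utf8ToInt() and intToUtf8() in place of the charToRaw()/as.raw() round trip. It relies on the same fact as the raw version: the codes of "0" to "9" are ordered like the digit values. For vectors of strings it would still need to be wrapped in mapply(), as foo() does.

```r
# Assumes x1 and x2 are single strings of the same length.
x1 <- "03011000"
x2 <- "13001000"

# utf8ToInt() returns one integer code point per character;
# "0".."9" map to 48..57, so comparing codes compares digit values.
codes1 <- utf8ToInt(x1)
codes2 <- utf8ToInt(x2)

# Element-wise maximum of the codes, then back to a single string.
result <- intToUtf8(pmax(codes1, codes2))
result
# [1] "13011000"
```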
The strings have an exact overlap in their non-zero characters.
I assume this means that whenever both strings have a nonzero character at the same position, those characters are guaranteed to match?
If so, it is sufficient to find the positions with zeros in one vector and not in the other (with setdiff) and make the string edit:
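r <- gregexpr("0", c(x1, x2))
w <- setdiff(r[[1]], r[[2]])  # positions that are "0" in x1 but not in x2
# wrap in list() so regmatches() treats w as one set of matches for x;
# without it, regmatches() errors whenever length(w) > 1
rr <- list(structure(w, match.length = rep(1L, length(w)), useBytes = TRUE))
x <- x1
regmatches(x, rr) <- regmatches(x2, rr)
x
# [1] "13011000"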
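The same overlap assumption can also be applied character by character: wherever x1 has a "0", copy x2's character at that position, because any nonzero character in x1 already agrees with x2. A minimal sketch, assuming two single digit strings of equal length. The helper name fill_from is hypothetical.

```r
# Hypothetical helper: fill the zeros of `a` with the characters of `b`.
# Assumes a and b are single digit strings of equal length whose
# characters agree wherever both are nonzero.
fill_from <- function(a, b) {
  ca <- strsplit(a, "")[[1]]
  cb <- strsplit(b, "")[[1]]
  zero <- ca == "0"
  ca[zero] <- cb[zero]  # where a is "0", b's digit is the maximum
  paste(ca, collapse = "")
}

fill_from("03011000", "13001000")
# [1] "13011000"
```

Unlike the setdiff() version, this handles any number of positions to copy without building match data by hand.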
Related
Suppose I have a string x <- "abc%de%fgh%ijk%" and I have a vector y <- c(1,2,3,4). Is there a clean way in base R to replace the % in x with the ordered vector y with the result being x <- "abc1de2fgh3ijk4"?
In base R, we can split the 'x' at % and then paste with 'y'
paste0(strsplit(x, "%", fixed = TRUE)[[1]], y, collapse = "")
Output:
[1] "abc1de2fgh3ijk4"
We may also use gsubfn to insert the count at %
library(gsubfn)
p <- proto(fun = function(this, x) count )
gsubfn("%", p, x)
If the 'y' values are different, use the count as index to extract the 'y'
p <- proto(fun = function(this, x) y[count])
gsubfn("%", p, x)
For example, with different values of 'y':
> y <- c(24, 31, 15, 23)
> gsubfn("%", p, x)
[1] "abc24de31fgh15ijk23"
data
x <- "abc%de%fgh%ijk%"
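Another base R option, sketched here, is to assign into the matches of "%" with `regmatches<-`. The k-th "%" is replaced by the k-th element of y. Unlike the strsplit()/paste0() version, this does not rely on x ending with "%" (for "abc%de", the paste0() version would recycle y onto the trailing piece). It assumes length(y) equals the number of "%" in x.

```r
x <- "abc%de%fgh%ijk%"
y <- c(1, 2, 3, 4)

# Positions of every "%" (a list with one element per string in x).
m <- gregexpr("%", x)

# Replace the k-th "%" with the k-th element of y.
regmatches(x, m) <- list(as.character(y))
x
# [1] "abc1de2fgh3ijk4"
```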
I wonder whether it is possible to drop the parts of the elements of the character vector n1 that partially overlap with variables in the formula f1.
For example, in n1, "chyes" and "bmi:chyes" partially overlap with ch in f1.
Thus, in desired_output, we want to drop the "ch" part from them. Because the other elements of n1 either fully overlap with an element of f1 (e.g. bmi) or don't exist in f1 (e.g. intrcpt), we leave them unchanged.
I have tried the following solution, but can't get my desired output.
Is obtaining my desired_output possible in base R or the tidyverse?
f1 <- yi~ bmi*ch
n1 <- c("intrcpt","bmi","chyes","bmi:chyes")
desired_output <- c("intrcpt","bmi","yes","bmi:yes")
### Current unsuccessful solution:
foo <- function(fmla, vec) {
v1 <- all.vars(fmla)
v2 <- setdiff(vec, v1)
v1 <- paste0('^', v1)
v3 <- sub(paste(v1, collapse = "|"), "", v2)
vec[vec %in% v2] <- v3
vec
}
### EXAMPLE OF USE:
foo(f1, n1)
# "intrcpt" "bmi" "chyes" "bmi:chyes"
This function does what you want, but I agree with @Onyambu that it is worth considering whether your underlying problem actually requires string manipulation.
f <- function(fm, nm) {
vars <- vapply(attr(terms(fm), "variables"), deparse, "")[-1L]
subpat <- paste0(gsub("([()])", "\\\\\\1", vars), collapse = "|")
l <- rapply(strsplit(nm, ":"), sub, how = "list",
perl = TRUE,
pattern = sprintf("^(?!(%1$s)$)(%1$s)(.+)$", subpat),
replacement = "\\3")
vapply(l, paste0, "", collapse = ":")
}
fm1 <- yi ~ bmi * ch
nm1 <- c("intrcpt", "bmi", "chyes", "bmi:chyes")
f(fm1, nm1)
[1] "intrcpt" "bmi" "yes" "bmi:yes"
fm2 <- yi ~ bmi * factor(ch)
nm2 <- c("intrcpt", "bmi", "factor(ch)yes", "bmi:factor(ch)yes")
f(fm2, nm2)
[1] "intrcpt" "bmi" "yes" "bmi:yes"
fm3 <- gi ~ 0 + time:pub_type + time_wk_whn + time_wk_btw
nm3 <- c("time_wk_whn", "time_wk_btw", "timePost-test 1:pub_typejournal", "timePost-test 2:pub_typejournal")
f(fm3, nm3)
[1] "time_wk_whn" "time_wk_btw"
[3] "Post-test 1:journal" "Post-test 2:journal"
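The core of f() is the perl = TRUE pattern built with sprintf(). The negative lookahead `(?!(...)$)` skips terms that are exactly a variable name, and the remaining groups strip a variable-name prefix from everything else. A small illustration on individual terms, with the variable names typed in by hand rather than extracted with terms():

```r
vars <- c("yi", "bmi", "ch")
alt <- paste(vars, collapse = "|")

# ^(?!(alt)$) : fail if the whole term is exactly one variable name
# (alt)(.+)$  : otherwise capture the variable prefix and the remainder
pat <- sprintf("^(?!(%1$s)$)(%1$s)(.+)$", alt)

out <- sub(pat, "\\3", c("bmi", "chyes", "intrcpt"), perl = TRUE)
out
# [1] "bmi" "yes" "intrcpt"
```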
I want the following: if the last value of each element of the list ListResiduals (e.g. OptionAOptionD) is greater than the value with the corresponding name in ListSigma (e.g. OptionAOptionD), the name (e.g. OptionAOptionD) should be added to the Watchlist list.
In the last line of the code I put "> 5" just so the example runs; it is this "> 5" that I want to replace with the condition described in the previous paragraph.
DF <- data.frame("OptionA" = sample(1:100, 50),
"OptionB" = sample(1:100, 50),
"OptionC" = sample(1:100, 50),
"OptionD" = sample(1:100, 50))
#Unfolding options and creating DF
UnFolding <- data.frame(
First = as.vector(sapply(names(DF[]), function(x)
sapply(names(DF[]), function(y)
paste0(x)))),
Second = as.vector(sapply(names(DF[]), function(x)
sapply(names(DF[]), function(y)
paste0(y)))))
#Deleting lines with the same names
UnFolding <-
UnFolding[UnFolding$First != UnFolding$Second, ]
#Creating list with dependent and independent variables
LMList <- apply(UnFolding, 1, function(x)
as.formula(paste(x[1], "~", x[2])))
#Change list data to variable names
names(LMList) <- substring(lapply(LMList, paste, collapse = ""), 2)
#Linear regression - lm()
LMListRegression <- lapply(LMList, function(x) {
eval(call("lm", formula = x, data = DF))
})
#Residuals
ListResiduals <- lapply(LMListRegression, residuals)
#Sigma
ListSigma <- lapply(LMListRegression, function(x) {
sigma(x)*2
})
#Watchlist
Watchlist <- as.list(unlist(lapply(ListResiduals,
function(x) names(x)[1][tail(x, 1) > 5])))
I would gravitate towards converting your Sigma and Residual values to vectors and comparing the vectors. You could also use a data.frame approach to be sure the order of your lists/vectors doesn't change.
# create a vector with the last value from the Residuals list.
last_residual <- sapply(ListResiduals, `[`, 50)
# sapply() names these "OptionBOptionA.50" etc.; reuse the list names instead
names(last_residual) <- names(ListResiduals)
# Using sapply() rather than lapply() returns a named vector
sigma_vector <- sapply(LMListRegression, function(x) {
sigma(x)*2
})
Watchlist <- sigma_vector[last_residual > sigma_vector]
Watchlist
# named numeric(0)
In your example, it returns an empty named vector because no values meet your condition:
max(last_residual)
# [1] 31.70949
min(sigma_vector)
# [1] 52.93234
# To demonstrate that it works, let's divide sigma by 2 so that at least some values pass
half_sigma <- sigma_vector/2
Watchlist2 <- sigma_vector[last_residual > half_sigma]
Watchlist2
# OptionDOptionA OptionDOptionB OptionDOptionC
# 54.52411 57.09503 56.79341
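If, as in the question, the goal is a list of model names rather than the sigma values, the comparison can return names instead. A minimal sketch with small made-up stand-ins for ListResiduals and ListSigma (the values are hypothetical; only the structure matches the question). The sigma values are looked up by name, so the pairing does not depend on the two lists having the same order.

```r
# Hypothetical stand-ins: one residual vector and one sigma per model name.
ListResiduals <- list(AB = c(1.2, -0.4, 3.5), BA = c(0.3, 2.1, -0.8))
ListSigma <- list(AB = 2.0, BA = 1.5)

# Last residual of each model, as a named numeric vector.
last_residual <- vapply(ListResiduals, function(r) r[length(r)], numeric(1))

# Sigma values reordered to match last_residual by name.
sigma_vec <- unlist(ListSigma)[names(last_residual)]

# Names of the models whose last residual exceeds their sigma.
Watchlist <- as.list(names(last_residual)[last_residual > sigma_vec])
Watchlist
```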
I have a character vector of identical elements:
vec <- c("AXAXAXA", "AXAXAXA", "AXAXAXA")
I would like to replace "X" with "Y" but leave one "X" in each string, a different one in each string, so that I get
vec_res <- c("AXAYAYA", "AYAXAYA", "AYAYAXA")
The strings are always the same, and the number of elements (strings) in the vector equals the number of "X"s in each string.
I am new to stringr, but I suspect there could be a smart way to do this.
Another option:
x <- 'AXAXAXA'
y <- grepRaw('X', x, all = TRUE)
x <- chartr('X', 'Y', rep(x, length(y)))
substr(x, y, y) <- 'X'  # string i gets its X back at position y[i]
x
# [1] "AXAYAYA" "AYAXAYA" "AYAYAXA"
One option utilizing purrr could be:
library(purrr)
map2_chr(.x = grepRaw("X", vec[1], all = TRUE),
.y = vec,
~ `substr<-`(gsub("X", "Y", .y), .x, .x, "X"))
[1] "AXAYAYA" "AYAXAYA" "AYAYAXA"
The same thing with base R:
mapply(function(x, y) `substr<-`(gsub("X", "Y", y), x, x, "X"),
x = grepRaw("X", vec[1], all = TRUE),
y = vec)
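The map2_chr() and mapply() versions call `substr<-` once per string, but `substr<-` is itself vectorised over its start and stop arguments and recycles the replacement value, so a single call can restore a different X in each string. A sketch, assuming, as the question states, that all elements of vec are identical and that length(vec) equals the number of X's:

```r
vec <- c("AXAXAXA", "AXAXAXA", "AXAXAXA")

# Positions of the X's in the (identical) strings.
pos <- gregexpr("X", vec[1], fixed = TRUE)[[1]]
stopifnot(length(pos) == length(vec))

# Turn every X into Y, then restore one X per string:
# element i gets start = stop = pos[i].
res <- gsub("X", "Y", vec, fixed = TRUE)
substr(res, pos, pos) <- "X"
res
# [1] "AXAYAYA" "AYAXAYA" "AYAYAXA"
```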
I'm not sure this is a very common operation or that stringr will help much here. Here's a base R function. Since repeating the input string is redundant, this function expects a single value and generates one output value for each of the X's.
swap_expand <- function(x) {
stopifnot(length(x)==1)
m <- gregexpr("X", x)
regmatches(x, m) <- "Y" #set all X's to Y
m <- m[[1]]
r <- rep(x, length(m)) #repeat for each X
substr(r, m, m)<-"X" #replace a different Y each time
r
}
swap_expand("AXAXAXA")
# [1] "AXAYAYA" "AYAXAYA" "AYAYAXA"
Given a data.frame and a vector containing only -1, 0, and 1, with length equal to the number of columns of the data.frame, is there a natural way to transform the vector into a formula in which the columns at positions with -1 appear on the left side and those with 1 appear on the right side?
For example, given the following data.frame
df = data.frame(
'a' = rnorm(10),
'b' = rnorm(10),
'c' = rnorm(10),
'd' = rnorm(10),
'e' = rnorm(10))
and the following vector: vec = c(-1,-1,0,1,1).
Is there a natural way to build formula a+b~d+e?
We assume that if there are no 1's in vec that we should use a right hand side of 1 and if there are no -1's in vec then the left hand side is empty.
The alternatives each produce a character string but if a formula class object is wanted use formula(s) where s is that string.
1) paste each side. Subset the names corresponding to the -1 entries of vec to give LHS and paste/collapse them, do the same with the 1 entries to give RHS, and paste the two together with ~. If we knew that there was at least one 1 in vec, we could omit the if statement. Of the solutions here, this seems the most straightforward.
nms <- names(df)
LHS <- paste(nms[vec == -1], collapse = "+")
RHS <- paste(nms[vec == 1], collapse = "+")
if (RHS == "") RHS <- "1"
paste0(LHS, "~", RHS)
## [1] "a+b~d+e"
2) sapply. Alternately, combine the LHS and RHS lines into a single sapply. If we knew that there was at least one 1 in vec, then we could simplify the code by omitting the if statement. This approach is shorter than (1).
sa <- sapply(c(-1, 1), function(x) paste(names(df)[vec == x], collapse = "+"))
if (sa[2] == "") sa[2] <- "1"
paste0(sa[1], "~", sa[2])
## [1] "a+b~d+e"
3) tapply. We can alternately combine the LHS and RHS lines into a single tapply like this:
ta <- tapply(names(df), vec, paste, collapse = "+")
paste0(if (any(vec == -1)) ta[["-1"]], "~", if (any(vec == 1)) ta[["1"]] else 1)
## [1] "a+b~d+e"
If we knew that -1 and 1 each appear at least once in vec then we can simplify the last line to:
paste0(ta[["-1"]], "~", ta[["1"]])
## [1] "a+b~d+e"
Overall this approach is the shortest if we can guarantee that there will be at least one 1 and at least one -1 but otherwise handling the edge cases seems somewhat cumbersome compared to the other approaches.
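As noted at the start of this answer, formula() turns any of these strings into a formula object. A sketch wrapping approach (1) in a function. The name vec_to_formula is hypothetical, and the small df below only supplies column names.

```r
# Hypothetical helper: -1 -> left-hand side, 1 -> right-hand side, 0 -> dropped.
vec_to_formula <- function(nms, vec) {
  lhs <- paste(nms[vec == -1], collapse = "+")
  rhs <- paste(nms[vec == 1], collapse = "+")
  if (rhs == "") rhs <- "1"       # no 1's: intercept-only right-hand side
  formula(paste0(lhs, "~", rhs))  # an empty lhs gives a one-sided formula
}

df <- data.frame(a = 1, b = 2, c = 3, d = 4, e = 5)
fm <- vec_to_formula(names(df), c(-1, -1, 0, 1, 1))
class(fm)
# [1] "formula"
deparse(fm)
# [1] "a + b ~ d + e"
```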
We could do this by pasting the names by group:
paste(aggregate(nm ~ vec, subset(data.frame(nm = names(df), vec,
stringsAsFactors = FALSE), vec != 0),
FUN = paste, collapse= ' + ')[['nm']], collapse=' ~ ')
#[1] "a + b ~ d + e"
Another option is tapply:
paste(tapply(names(df), vec, FUN = paste,
collapse= ' + ')[c('-1', '1')], collapse= ' ~ ')
#[1] "a + b ~ d + e"