How to replace wildcard characters with sampled characters in R

I have the following sequence:
s0 <- "KDRH?THLA???RT?HLAK"
The wildcard character is indicated by ?.
I want to replace each wildcard with a character sampled from this vector:
AADict <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H",
"I", "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
Since s0 has 5 wildcards (?), I would sample 5 characters from AADict:
set.seed(1)
nof_wildcard <- 5
tolower(sample(AADict, nof_wildcard, TRUE))
Which gives [1] "d" "q" "a" "r" "l"
Hence the expected result is:
KDRH?THLA???RT?HLAK
KDRHdTHLAqarRTlHLAK
So each sampled character must be placed exactly at the position of a ?, but the order of the sampled characters is not important.
E.g. this answer is also acceptable: KDRHqTHLAdlaRTrHLAK.
How can I achieve that with R?
Other examples are:
s1 <- "FKDHKHIDVKDRHRTHLAK????RTRHLAK"
s2 <- "FKHIDVKDRHRTRHLAK??????????"

One approach is to replace the "?" characters 'one at a time' using a loop, e.g.
s0 <- "KDRH?THLA???RT?HLAK"
AADict <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H",
"I", "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
s0
#> [1] "KDRH?THLA???RT?HLAK"
repeat{s0 <- sub("\\?", sample(tolower(AADict), 1), s0); if(grepl("\\?", s0) == FALSE) break}
s0
#> [1] "KDRHtTHLAidwRTyHLAK"
s1 <- "FKDHKHIDVKDRHRTHLAK????RTRHLAK"
repeat{s1 <- sub("\\?", sample(tolower(AADict), 1), s1); if(grepl("\\?", s1) == FALSE) break}
s1
#> [1] "FKDHKHIDVKDRHRTHLAKrstaRTRHLAK"
s2 <- "FKHIDVKDRHRTRHLAK??????????"
repeat{s2 <- sub("\\?", sample(tolower(AADict), 1), s2); if(grepl("\\?", s2) == FALSE) break}
s2
#> [1] "FKHIDVKDRHRTRHLAKdvcfmheiqn"
Another approach which can also allow for sampling without replacement:
s0 <- "KDRH?THLA???RT?HLAK"
AADict <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H",
"I", "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
matches <- gregexpr("\\?", s0)
regmatches(s0, matches) <- lapply(lengths(matches), sample, x = tolower(AADict), replace = FALSE)
s0
#> [1] "KDRHdTHLAlanRTiHLAK"
Created on 2022-10-22 by the reprex package (v2.0.1)

You could split your string into single characters, which makes it easy to replace the wildcards without a loop (a loop was my first approach):
replace_wc <- function(x, dict) {
x <- strsplit(x, split = "")[[1]]
ix <- grepl("\\?", x)
x[ix] <- sample(dict, sum(ix), replace = TRUE)
return(paste0(x, collapse = ""))
}
s0 <- "KDRH?THLA???RT?HLAK"
AADict <- c(
"A", "R", "N", "D", "C", "E", "Q", "G", "H",
"I", "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V"
)
set.seed(1)
replace_wc(s0, tolower(AADict))
#> [1] "KDRHdTHLAqarRTlHLAK"
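The same split-and-replace idea can be applied to a whole vector of strings. The following is only a minimal sketch and not part of the answer above: the name replace_wc_vec and its replace argument are hypothetical, and it assumes the wildcard is always a literal ?.

```r
# Hypothetical helper: vectorised variant of replace_wc().
# Assumes every wildcard is a literal "?".
replace_wc_vec <- function(x, dict, replace = TRUE) {
  vapply(strsplit(x, split = "", fixed = TRUE), function(chars) {
    ix <- chars == "?"
    # Sampling without replacement needs at least as many
    # dictionary entries as there are wildcards.
    if (!replace && sum(ix) > length(dict)) {
      stop("more wildcards than dictionary entries")
    }
    chars[ix] <- sample(dict, sum(ix), replace = replace)
    paste0(chars, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

AADict <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H",
            "I", "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
seqs <- c("KDRH?THLA???RT?HLAK",
          "FKDHKHIDVKDRHRTHLAK????RTRHLAK",
          "FKHIDVKDRHRTRHLAK??????????",
          "NOWILDCARD")  # a string without wildcards is returned unchanged
set.seed(1)
out <- replace_wc_vec(seqs, tolower(AADict))
out
```

Because each string is handled independently, a string with no ? passes through untouched, and replace = FALSE stops with an error rather than silently reusing characters when a string has more wildcards than the dictionary has entries.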

Here is a vectorized function to replace the "?" characters in a vector of strings.
fun <- function(x, dict = AADict) {
dict <- tolower(dict)
inx <- gregexpr("\\?", x)
sapply(seq_along(x), \(j) {
for(i in inx[[j]]) {
substr(x[j], i, i) <- sample(dict, 1L)
}
x[j]
})
}
AADict <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H",
"I", "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
s0 <- "KDRH?THLA???RT?HLAK"
s1 <- "FKDHKHIDVKDRHRTHLAK????RTRHLAK"
s2 <- "FKHIDVKDRHRTRHLAK??????????"
fun(s0)
#> [1] "KDRHsTHLAwppRTwHLAK"
fun(s1)
#> [1] "FKDHKHIDVKDRHRTHLAKyfqfRTRHLAK"
fun(s2)
#> [1] "FKHIDVKDRHRTRHLAKnsfehqwmkv"
fun(c(s0, s1, s2))
#> [1] "KDRHiTHLAdssRTgHLAK" "FKDHKHIDVKDRHRTHLAKcdivRTRHLAK"
#> [3] "FKHIDVKDRHRTRHLAKfrpafwpnif"
Created on 2022-10-22 with reprex v2.0.2

Related

Distance matrix from proxy package into a dataframe

I have a distance matrix from this code
I would like to convert the distanceMatrix into a dataframe. I use this:
library(reshape2)
melt(distanceMatrix)
or
as.data.frame(distanceMatrix)
and I receive this error:
Error in as.data.frame.default(x[[i]], optional = TRUE) :
cannot coerce class "crossdist" to a data.frame
Data
distanceMatrix <-
structure(c(1.1025096478618, 2.48701192612548, 1.81748937453859,
0.68928345814907, 3.4194165172611, 1.39021901561926, 0.696405607391678,
1.09511501308162, 0.733071057157832, 0.894074317336616, 0.274302486490285,
2.00790247099612, 2.03702210657379, 0.790303515570192, 0.76573433957666,
1.0571870370502, 2.08607605440225, 1.18691928628668, 0.950127106192438,
1.90183580897689, 1.06791623757733, 1.95426617861089, 1.28359907050968,
0.639828869115434, 1.2125883228325, 1.17334881171837, 2.86424081724093,
4.29579721901031, 2.48106485650871, 2.47992202769688, 4.78094585963798,
3.08269692108197, 2.51054397059837, 2.78351950724781, 1.9552995309483,
1.02672164296738, 2.04833064878561, 2.40777909325915, 1.37714830319657,
2.54290296394426, 1.99486295133513, 1.42661425293529, 2.75973709232752,
0.632464187558431, 2.64349038129557, 3.04900615202494, 1.34349249286485,
0.66548291586285, 1.14201671902258, 2.20314775706901, 3.027560891124,
2.58016468923376, 0.701837450761437, 1.82650318310107, 1.17318969224049,
0.898229996978744, 2.04804918964036, 0.510384590416117, 1.20067408397491,
0.479351971313752, 0.900264653292786, 2.17660319096498, 1.11774249289539,
1.50312712068438, 2.35380779446751, 0.74568873241509, 0.860144296532242,
1.49609968893816, 1.27903173482324, 2.30242237929782, 0.546178045451667,
0.696804454166844, 1.57330737370915, 3.18912158434627, 2.63481498585198,
0.743304574607114, 1.2813138290548, 0.278296684614969), .Dim = c(26L,
3L), .Dimnames = list(c("a", "b", "c", "d", "e", "f", "g", "h",
"i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u",
"v", "w", "x", "y", "z"), c("A", "B", "C")), class = "crossdist", method = "Euclidean", call = proxy::dist(x = voterIdealPoints,
y = candidateIdealPoints))
Use
as.data.frame(as.matrix(distanceMatrix))
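If the proxy package is not loaded, or a long (melted) layout is wanted, a base R route is to strip the class attribute first. This is a hedged sketch on a small made-up object: toy only mimics the "crossdist" class attribute and was not produced by proxy::dist, and the distance values are invented.

```r
# Toy stand-in for a crossdist object (values are made up).
toy <- structure(
  matrix(c(1.1, 2.5, 1.8, 0.7, 3.4, 1.4), nrow = 3,
         dimnames = list(c("a", "b", "c"), c("A", "B"))),
  class = "crossdist", method = "Euclidean"
)

plain <- unclass(toy)            # drop the class so matrix methods apply
attr(plain, "method") <- NULL    # drop the extra attribute as well

wide <- as.data.frame(plain)     # one column per candidate
long <- as.data.frame(as.table(plain), responseName = "distance")  # one row per pair
wide
long
```

The long form has one row per row/column pair, which is roughly what melt() would return for a plain matrix.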

How to order data.frame in my specific 'vector' order in R language?

I have a data.frame shown below:
To analyse the relationship between those 10 features and disorder propensity, I need to sort the data.frame in my amino acid order, which is stored in a vector like this: c("L", "I", "V", "Y", "C", "F", "R", "W", "M", "H", "N", "T", "G", "D", "Q", "A", "K", "S", "P", "E")
I tried properties[aa == c("L", "I", "V", "Y", "C", "F", "R", "W", "M", "H", "N", "T", "G", "D", "Q", "A", "K", "S", "P", "E"), ], which doesn't seem to work for me.
What's the right way to sort the data.frame in my 'vector' order?
You can make your column aa a factor and give the factor levels in the correct order. The factor can then be sorted according to the levels. Look at this example:
my_order <- c("X", "Y", "Z", "A", "B") # defines the order
test <- c("A", "B", "Y", "Z", "Z", "A", "X", "X", "B") # a normal character vector
test2 <- factor(test, levels = my_order) # convert it to factor and specify the levels
test2 # original order unchanged
test2[order(test2)] # ordered by custom order
Note that you must specify all occurring factor levels or this will not work: values missing from levels become NA!
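Applied to a data.frame, the same idea could look like the sketch below. The data frame here is a made-up stand-in for properties (the column name hydrophobicity and its values are purely illustrative), and it uses match() instead of a factor, which is an alternative way to get the same row order.

```r
aa_order <- c("L", "I", "V", "Y", "C", "F", "R", "W", "M", "H",
              "N", "T", "G", "D", "Q", "A", "K", "S", "P", "E")

# Hypothetical stand-in for the real 'properties' data.frame.
properties <- data.frame(
  aa = c("A", "L", "E", "V"),
  hydrophobicity = c(1.8, 3.8, -3.5, 4.2)  # illustrative values only
)

# match() gives each row's position in the custom order;
# order() then sorts the rows by that position.
sorted <- properties[order(match(properties$aa, aa_order)), ]
sorted$aa
# "L" "V" "A" "E"
```

Rows whose aa value does not appear in aa_order get NA from match() and are placed last by order().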

How can I generate a user password in R [closed]

I want to generate a password for users in R. Until now I have been using Excel and the following VB script. How can I transfer the functionality into an appropriate R script? Thank you very much.
myArr = Array("", 2, 3, 4, 5, 6, 7, 8, 9, "A", "B", _
"C", "D", "E", "F", "G", "H", "J", "K", "L", "M", _
"N", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", _
"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", _
"p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", _
"!", "§", "$", "%", "&", "(", ")", "*")
intArr = UBound(myArr)
intA = Application.InputBox("Wieviele Passwörter sollen erstellt werden?", "PasswortGenerator", 10, , , , , 1)
If Not TypeName(intA) = "Boolean" Then
Randomize
intC = ActiveCell.Column
intZ = ActiveCell.Row
For intZ = intZ To intZ + intA 'Number of passwords
For intP = 1 To 8 'Number of characters in the password
strP = strP & myArr(Int(intArr * Rnd + 1))
Next intP
If Application.WorksheetFunction.CountIf(ActiveCell.EntireColumn, strP) = 0 Then
ActiveSheet.Cells(intZ, intC).Value = strP
End If
strP = ""
Next intZ
End If
End Sub
Thank you very much.
OK, based on the answer below I'm trying to set up a function that will generate a password for each employee (mitarbeiter). I want to add a new variable 'passwort' in the function with the generated password for each employee. So, thanks again for your help.
genPsw <- function(num, len=8) {
# Define the password conventions
myArr <- c("", 2, 3, 4, 5, 6, 7, 8, 9, "A", "B",
"C", "D", "E", "F", "G", "H", "J", "K", "L", "M",
"N", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z",
"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o",
"p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
"!", "§", "$", "%", "&", "(", ")", "*")
# replicate is a wrapper for the common use of sapply for repeated evaluation of an expression
# (which will usually involve random number generation).
replicate(num, paste(sample(myArr, size=len, replace=T), collapse=""))
# Determine the number of rows of the data frame mitarbeiter
dim_mitarbeiter <- nrow(mitarbeiter)
for(i in 1:dim_mitarbeiter) {
# Random Number Generation
set.seed(i)
# Generate password
mitarbeiter$passwort <- genPsw(i)
}
}
genPsw <- function(num, len=8) {
myArr <- c("", 2, 3, 4, 5, 6, 7, 8, 9, "A", "B",
"C", "D", "E", "F", "G", "H", "J", "K", "L", "M",
"N", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z",
"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o",
"p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
"!", "§", "$", "%", "&", "(", ")", "*")
replicate(num, paste(sample(myArr, size=len, replace=TRUE), collapse=""))
}
set.seed(1)
genPsw(3)
# [1] "JRf§E§&l" "j5ECnSsa" "p*St%Fk9"
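To get one password per employee, the generator can be called once with the number of rows instead of inside a loop. The sketch below is an assumption-laden illustration: the data frame mitarbeiter and its name column are invented, and gen_psw is a hypothetical variant that drops the leading "" entry. As far as I can tell, the VB index Int(intArr * Rnd + 1) never selects element 0, whereas sample() in R can pick "" and so produce passwords shorter than len.

```r
# Hypothetical variant of genPsw() without the "" placeholder.
gen_psw <- function(num, len = 8) {
  chars <- c(2:9, setdiff(LETTERS, c("I", "O")), letters,
             "!", "\u00a7", "$", "%", "&", "(", ")", "*")  # "\u00a7" is the section sign
  replicate(num, paste(sample(chars, size = len, replace = TRUE), collapse = ""))
}

# Made-up employee table for illustration.
mitarbeiter <- data.frame(name = c("Anna", "Ben", "Cara"))

set.seed(42)  # only for reproducibility of the example
mitarbeiter$passwort <- gen_psw(nrow(mitarbeiter))
mitarbeiter
```

Unlike the VB script, this does not check for duplicate passwords; anyDuplicated(mitarbeiter$passwort) could be used for that. Note also that sample() is not a cryptographically secure source of randomness.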

Replacement with vectors

I have a vector with all consonants and I want every single consonant to be replaced with a "C" in a given character vector. Assume my vector is x below:
x <- c("abacate", "papel", "importante")
v <- c("a", "e", "i", "o", "u")
c <- c("b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n", "p", "q", "r", "s", "t", "v", "w", "x", "y", "z")
find <- c
replace <- "C"
found <- match(x, find)
ifelse(is.na(found), x, replace[found])
This is not working. Could anybody tell me what the problem is and how I can fix it?
Thanks
Regular expressions (gsub) are far more flexible in general, but for that particular problem you can also use the chartr function which will run faster:
old <- c("b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n",
"p", "q", "r", "s", "t", "v", "w", "x", "y", "z")
new <- rep("C", length(old))
chartr(paste(old, collapse = ""),
paste(new, collapse = ""), x)
Use gsub to replace the letters in a character vector:
c <- c("b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n", "p", "q", "r", "s", "t", "v", "w", "x", "y", "z")
consonants = paste(c("[", c, "]"), collapse="")
replaced = gsub(consonants, "C", x)
consonants becomes a regular expression, [bcdfghjklmnpqrstvwxyz], which means "any letter inside the brackets."
One of the reasons your code wasn't working is that match doesn't look for strings within other strings, it only looks for exact matches. For example:
> match(c("a", "b"), "a")
[1] 1 NA
> match(c("a", "b"), "apple")
[1] NA NA
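As a quick check, the two approaches above can be run side by side on x. This is a small sketch; the variable names via_gsub and via_chartr are just for illustration, and the consonant vector is built with setdiff() instead of being typed out.

```r
x <- c("abacate", "papel", "importante")
consonants <- setdiff(letters, c("a", "e", "i", "o", "u"))

# Character class such as [bcdfg...z]: every match is replaced by "C".
via_gsub <- gsub(paste0("[", paste(consonants, collapse = ""), "]"), "C", x)

# chartr() needs 'old' and 'new' of the same length, hence 21 times "C".
via_chartr <- chartr(paste(consonants, collapse = ""),
                     strrep("C", length(consonants)), x)

stopifnot(identical(via_gsub, via_chartr))
via_gsub
# both give: "aCaCaCe", "CaCeC", "iCCoCCaCCe"
```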

How can I partition a vector?

How can I build a function
slice(x, n)
which would return a list of vectors where each vector except maybe the last has size n, i.e.
slice(letters, 10)
would return
list(c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
c("k", "l", "m", "n", "o", "p", "q", "r", "s", "t"),
c("u", "v", "w", "x", "y", "z"))
?
slice<-function(x,n) {
N<-length(x);
lapply(seq(1,N,n),function(i) x[i:min(i+n-1,N)])
}
You can use the split function:
split(letters, as.integer((seq_along(letters) - 1) / 10))
If you want to make this into a new function:
slice <- function(x, n) split(x, as.integer((seq_along(x) - 1) / n))
slice(letters, 10)
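Note that split() returns a named list (the names are the group numbers), whereas the list in the question is unnamed. A possible variant, sketched here under the name slice2 (hypothetical), drops the names and uses ceiling() for the grouping:

```r
# Hypothetical variant: returns an unnamed list of chunks of size n.
slice2 <- function(x, n) {
  stopifnot(n >= 1)
  # Elements 1..n fall in group 1, n+1..2n in group 2, and so on.
  unname(split(x, ceiling(seq_along(x) / n)))
}

res <- slice2(letters, 10)
lengths(res)
# 10 10 6
```

For an empty input this returns an empty list, whereas seq(1, N, n) in the lapply version stops with an error when N is 0.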
