Problem with regex (check string for certain repetitions) - r

I would like to check whether in a text there are a) three consonants in a row or b) four identical letters in a row. Can someone please help me with the regular expressions?
library(tidyverse)
df <- data.frame(text = c("Completely valid", "abcdefg", "blablabla", "flahaaaa", "asdf", "another text", "a last one", "sj", "ngbas"))
consonants <- c("q", "w", "r", "t", "z", "p", "s", "d", "f", "g", "h", "k", "l", "m", "n", "b", "x")
df %>% mutate(
invalid = FALSE,
# Length too short
invalid = ifelse(nchar(text)<3, TRUE, invalid),
# Contains three consonants in a row: e.g. "ngbas"
invalid = ifelse(str_detect(text,"???"), TRUE, FALSE), # <--- Regex missing
# More than 3 identical characters in a row: e.g. "flahaaaa"
invalid = ifelse(str_detect(text,"???"), TRUE, FALSE) # <--- Regex missing
)

Three consonants in a row:
[qwrtzpsdfghklmnbx]{3}
Sequences of length > 3 of a specific char:
([a-z])(\\1){3}
# The double backslash occurs due to its role as the escape character in strings.
The latter uses a backreference. The number is the ordinal of the capture group (the expression in parentheses) being referenced, in this case the character class of lowercase Latin letters.
For clarity, character case is not taken into account here.
Without backreferences, the solution gets a bit lengthy:
(aaaa|bbbb|cccc|dddd|eeee|ffff|gggg|hhhh|iiii|jjjj|kkkk|llll|mmmm|nnnn|oooo|pppp|qqqq|rrrr|ssss|tttt|uuuu|vvvv|wwww|xxxx|yyyy|zzzz)
The relevant docs can be found here.
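To check the two patterns quickly, here is a minimal sketch in base R. Using grepl with perl = TRUE is my choice for the illustration, not part of the answer above; the sample strings come from the question.

```r
text <- c("ngbas", "flahaaaa", "blablabla", "asdf")

# Three consonants in a row, using the consonant list from the question
three_consonants <- grepl("[qwrtzpsdfghklmnbx]{3}", text)

# Four identical lowercase letters in a row, via a backreference
four_identical <- grepl("([a-z])\\1{3}", text, perl = TRUE)

print(data.frame(text, three_consonants, four_identical))
```

Note that the consonant list from the question omits c, j, v and y, so a string such as "abcdefg" is not flagged by this character class.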

You don't need the length check for the regexes to work: they simply won't match strings that are too short (keep the nchar() test only if short strings should still be flagged).
Your code has an error: the last ifelse overwrites any earlier result. For example, if the second ifelse is TRUE and the third is FALSE, the output is FALSE, so the earlier checks are lost instead of being combined.
I corrected this by passing invalid, rather than FALSE, as the fallback value.
Here is the complete code:
df %>% mutate(
invalid = FALSE,
# Contains three consonants in a row: e.g. "ngbas"
invalid = ifelse(str_detect(text, regex("[BCDFGHJKLMNPQRSTVWXYZ]{3}", ignore_case = TRUE)), TRUE, invalid),
# More than 3 identical characters in a row: e.g. "flahaaaa"
invalid = ifelse(str_detect(text, regex("([a-zA-Z])\\1{3}", ignore_case = TRUE)), TRUE, invalid)
)
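One way to avoid the overwriting problem altogether is to compute each check separately and combine them with |. This is a minimal sketch in base R, assuming the nchar() check from the question should be kept; the variable names are mine.

```r
text <- c("a last one", "flahaaaa", "sj", "ngbas", "blablabla")

# Each check is its own logical vector
too_short        <- nchar(text) < 3
three_consonants <- grepl("[bcdfghjklmnpqrstvwxyz]{3}", text, ignore.case = TRUE)
four_identical   <- grepl("([a-z])\\1{3}", text, ignore.case = TRUE, perl = TRUE)

# A text is invalid if ANY of the checks fires
invalid <- too_short | three_consonants | four_identical
print(data.frame(text, invalid))
```

Because the checks are combined in one expression, no check can silently undo the result of an earlier one.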

Related

Generate all possible combinations of a text string with two specific letters substituted for each other in R

Using R, I have generated several strings of letters, each 6 to 25 characters long. For each one I'd like to generate all the combinations of the string with every "I" substituted for an "L" and vice versa; the order of the characters should stay the same.
For example:
Input
"IVGLWEA"
OUTPUT
"IVGLWEA"
"LVGLWEA"
"LVGIWEA"
"IVGIWEA"
"LVGLWEA"
many thanks
rob
Edit: Thanks to @Skaqqs for the dynamic solution!
string <- "IVGLWEA"
# find the number of I's and L's in the string
n <- length(unlist(gregexpr("I|L", string)))
# make a grid of all possible combinations with this amount of I's and L's
df <- expand.grid(rep(list(c("I", "L")), n))
# replace I's and L's with %s
string_ <- gsub("I|L", "\\%s", string)
# replace %s with letters in grid
do.call(sprintf, as.list(c(string_, df)))
Result:
[1] "IVGIWEA" "LVGIWEA" "IVGLWEA" "LVGLWEA"
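The dynamic solution can be wrapped in a function. This is a sketch under stated assumptions: il_variants is a hypothetical name, the input is assumed to contain only letters (a literal % would confuse sprintf), and a string without any I or L is returned unchanged.

```r
il_variants <- function(string) {
  hits <- gregexpr("[IL]", string)[[1]]
  if (hits[1] == -1) return(string)            # no I or L: nothing to vary
  n <- length(hits)
  # all I/L assignments for the n positions, kept as character columns
  grid <- expand.grid(rep(list(c("I", "L")), n), stringsAsFactors = FALSE)
  # turn every I or L into a %s placeholder
  template <- gsub("[IL]", "%s", string)
  # fill the placeholders row by row
  do.call(sprintf, c(list(template), unname(as.list(grid))))
}

il_variants("IVGLWEA")
```

A string with n occurrences of I or L yields 2^n variants, so the output grows quickly for long inputs.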
Here's an extremely inefficient (but concise!) approach:
Create all potential combinations of your input characters and use regex to extract the desired pattern.
pattern <- "(I|L)VG(I|L)WEA"
b <- c("I", "V", "G", "L", "W", "E", "A")
strings <- apply(expand.grid(rep(list(b), 7)), 1, paste0, collapse = "")
grep(pattern, strings, value = TRUE)
[1] "IVGIWEA" "LVGIWEA" "IVGLWEA" "LVGLWEA"

R find the index of a list

I have a list of character vectors, and after some lines of code one element of the list has zero characters. How can I extract the index of the element that has zero characters?
Original list
blocks <- list(
c("A", "B"),
c("C","D", "E", "R", "T"),
c("X"),
c("N")
)
Transformed list
blocks <- list(
character(0),
c("C","D", "E", "R", "T"),
c("X"),
c("N")
)
Not sure what you want, but I guess grep can do the trick:
if you want to know which element of the list contains a letter, use grep('A', blocks)
if you want to know the position of a letter within the whole list, you can try grep('A', unlist(blocks))
if you want something else, well, try it as well!
If we want a logical index of the elements that are character(0), we can use lengths on the transformed 'blocks'
!lengths(blocks)
#[1] TRUE FALSE FALSE FALSE
lengths is a convenient wrapper for sapply(blocks, length) and is much faster.
lengths(blocks)
#[1] 0 5 1 1
returns a length of 0 for the first list element. Negating with ! coerces the 0 to TRUE and every other length to FALSE.
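If an integer position is wanted rather than a logical vector, which can be applied to the same test. A minimal sketch using the transformed list from the question (empty_idx is my name for the result):

```r
blocks <- list(
  character(0),
  c("C", "D", "E", "R", "T"),
  c("X"),
  c("N")
)

# positions of the elements with zero length
empty_idx <- which(lengths(blocks) == 0)
empty_idx
#[1] 1
```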

Simple way to print a vector and string in one return value in R [duplicate]

I'm confused by paste, and thought it was just simple concatenating.
whales <- c("C","D","C","D","D")
quails <- c("D","D","D","D","D")
results <-paste(whales, quails, collapse = '')
Why would this return "C DD DC DD DD D" instead of CD DD CD DD DD?
Moreover, why would
results <-paste(whales[1], quails[1], collapse = '')
return
"C D" ?
with a space?
Thanks,
D
EDIT
OK, I see that
results <-paste(whales, quails, collapse = NULL, sep='')
will get me what I want, but could someone explain why the previous code didn't work? And thank you to the answerers.
For those who like visuals, here is my take on explaining how paste works in R:
sep creates an element-wise sandwich, stuffed with the value of the sep argument.
collapse creates ONE big sandwich, with the value of the collapse argument added between the sandwiches produced by sep.
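The sandwich picture can be sketched in code. The vectors x and y are made up for the illustration:

```r
x <- c("a", "b", "c")
y <- c("1", "2", "3")

# sep goes between the pieces of each element
by_sep <- paste(x, y, sep = "-")
by_sep
# [1] "a-1" "b-2" "c-3"

# collapse then joins those elements into one string
one_string <- paste(x, y, sep = "-", collapse = "+")
one_string
# [1] "a-1+b-2+c-3"

# the default sep is a single space, which explains the "C D" in the question
default_sep <- paste(x[1], y[1])
default_sep
# [1] "a 1"
```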
For the first question, try the following (which might be more illustrative than choosing to repeat 2 characters).
### Note that paste joins corresponding elements together...
paste(c("A", "S", "D", "F"),
c("W", "X", "Y", "Z"))
[1] "A W" "S X" "D Y" "F Z"
### Note that with collapse, R converts the above
# result into a length 1 character vector.
paste(c("A", "S", "D", "F"),
c("W", "X", "Y", "Z"), collapse = '')
[1] "A WS XD YF Z"
What you really want to do (to get the "desired" result) is the following:
### "Desired" result:
paste(whales, quails, sep = '', collapse = ' ')
[1] "CD DD CD DD DD"
Note that we are setting the sep and collapse arguments to different values, which relates to your second question. sep is the string placed between the terms within each element, whereas collapse is the string placed between the elements when they are joined into a single result.
Try
paste(whales, quails, collapse = '', sep = '')
[1] "CDDDCDDDDD"
Alternatively, use the shortcut paste0, which is paste with sep = ''
paste0(whales, quails, collapse = '')

Function to generate a random password

I want to generate a random password for employees with the function below. This is my first attempt at writing functions in R, so I need a bit of help.
genPsw <- function(num, len=8) {
# Define the password conventions
myArr <- c("", 2, 3, 4, 5, 6, 7, 8, 9, "A", "B",
"C", "D", "E", "F", "G", "H", "J", "K", "L", "M",
"N", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z",
"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o",
"p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
"!", "§", "$", "%", "&", "(", ")", "*")
# replicate is a wrapper for the common use of sapply for repeated evaluation of an expression
# (which will usually involve random number generation).
replicate(num, paste(sample(myArr, size=len, replace=T), collapse=""))
# nrow of dataframe mitarbeiter
dim_mitarbeiter <- nrow(mitarbeiter)
for(i in 1:dim_mitarbeiter) {
# Random Number Generation with i
set.seed(i)
# Generate password for the new variable passwort
mitarbeiter$passwort <- genPsw(i)
}
}
With the answer from Floo0 I've changed the function to something like this, but it doesn't work:
genPsw <- function(num, len=8) {
# Define the password conventions
sam<-list()
sam[[1]]<-1:9
sam[[2]]<-letters
sam[[3]]<-LETTERS
sam[[4]]<-c("!", "§", "$", "%", "&", "(", ")", "*")
# nrow of dataframe mitarbeiter
dim_mitarbeiter <- nrow(mitarbeiter)
for(i in 1:dim_mitarbeiter) {
# Random Number Generation with i
tmp<-mapply(sample,sam,c(2,2,2,2))
# Generate password for the new variable passwort
mitarbeiter$passwort <- paste(sample(tmp),collapse="")
}
}
What about
samp<-c(2:9,letters,LETTERS,"!", "§", "$", "%", "&", "(", ")", "*")
paste(sample(samp,8),collapse="")
result is something like this
"HKF§VvnD"
Caution:
This approach does not enforce having capitals, numbers, and non-alphanumeric symbols
EDIT:
If you want to enforce a certain number of capitals, numbers, and non alpha numeric symbols you could go with this:
sam<-list()
sam[[1]]<-1:9
sam[[2]]<-letters
sam[[3]]<-LETTERS
sam[[4]]<-c("!", "§", "$", "%", "&", "(", ")", "*")
tmp<-mapply(sample,sam,c(2,2,2,2))
paste(sample(tmp),collapse="")
Where c(2,2,2,2) specifies the number of digits, lowercase letters, capital letters and symbols (in this order). Result:
[1] "j$bP%5R3"
EDIT2:
To produce a new column in your table mitarbeiter, just use
passwort<-replicate(nrow(mitarbeiter),paste(sample(mapply(sample,sam,c(2,2,2,2))),collapse=""))
mitarbeiter$passwort<-passwort
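Putting the pieces together, the per-class sampling could be wrapped in one function. This is only a sketch: make_passwords and its arguments are hypothetical names, the symbol pool is restricted to ASCII characters here (the "§" from above is left out), and sampling is done with replacement.

```r
make_passwords <- function(n, counts = c(2, 2, 2, 2)) {
  # character pools: digits, lowercase, uppercase, symbols
  pools <- list(
    as.character(1:9),
    letters,
    LETTERS,
    c("!", "$", "%", "&", "(", ")", "*")
  )
  one_password <- function() {
    # draw the requested number of characters from each pool
    picked <- unlist(Map(function(pool, k) sample(pool, k, replace = TRUE),
                         pools, counts))
    # shuffle so the classes do not appear in a fixed order
    paste(sample(picked), collapse = "")
  }
  replicate(n, one_password())
}

pw <- make_passwords(5)
pw
```

With a data frame such as mitarbeiter, the result of make_passwords(nrow(mitarbeiter)) could then be assigned to a new column.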
There is a function that generates random strings in the stringi package:
require(stringi)
stri_rand_strings(n=2, length=8, pattern="[A-Za-z0-9]")
## [1] "90i6RdzU" "UAkSVCEa"
This might work; you may want to alter the ASCII argument to avoid unwanted symbols:
generatePwd <- function(plength=8, ASCII=c(33:126)) paste(sapply(sample(ASCII, plength), function(x) rawToChar(as.raw(x))), collapse="")
The script below creates a password of a specified length from a combination of upper and lowercase letters, digits and punctuation symbols.
library(magrittr) # Load this package to use the %>% (pipe) operator
library(stringr) # Provides str_split()
# Store the punctuation symbols as a character vector in punc
punc <- "!#$%&’()*+,-./:;<=>?@[]^_`{|}~" %>%
  str_split("", simplify = TRUE) %>%
  as.vector()
# Randomly draw the specified number of characters (here 8) from the pool
pw <- sample(c(LETTERS, letters, 0:9, punc), 8) %>%
  paste(collapse = "")
pw # Return password
[1] "fAXVpyOs"
I like the brevity of L.R.'s solution, although I don't follow what it does 100%.
My solution lets you specify the length of the password, ensures that at least one lowercase letter, one uppercase letter, one digit, and one special character are included, and allows reproducibility. (ht to moazzem for spelling out all the special characters.)
gen_pass <- function(len=8,seeder=NULL){
set.seed(seeder) # to allow replicability of passw generation
# get all combinations of 4 nums summing to length len
all_combs <- expand.grid(1:(len-3),1:(len-3),1:(len-3),1:(len-3))
sum_combs <- all_combs[apply(all_combs, 1, function(x) sum(x)==len),]
# special character vector
punc <- unlist(strsplit("!#$%&’()*+,-./:;<=>?@[]^_`{|}~",""))
# list of all characters to sample from
chars <- list(punc,LETTERS,0:9,letters)
# retrieve the number of characters from each list element
# specified in the sampled row of sum_combs
pass_chars_l<- mapply(sample, chars,
sum_combs[sample(1:nrow(sum_combs),1),],
replace = TRUE)
# unlist sets of password characters
pass_chars <- unlist(pass_chars_l)
# jumble characters and combine into password
passw <- str_c(sample(pass_chars),collapse = "")
return(passw)
}
I am still wondering how the repeated 1:(len-3) arguments in expand.grid(1:(len-3),1:(len-3),1:(len-3),1:(len-3)) can be expressed more elegantly.
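One possible answer, borrowing the rep(list(...), n) idiom used with expand.grid elsewhere on this page: build the list of identical ranges once and pass it to expand.grid. A minimal sketch with len fixed at 8 for illustration (parts is my name for the intermediate list):

```r
len <- 8

# four identical ranges 1:(len - 3), built once
parts <- rep(list(seq_len(len - 3)), 4)
all_combs <- expand.grid(parts)

# keep only the rows whose four counts add up to len
sum_combs <- all_combs[rowSums(all_combs) == len, ]
nrow(sum_combs)
# [1] 35
```

rowSums replaces the apply(..., sum) call here; that is an optional simplification, not something the original function requires.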
I have just tried the function proposed by Fmerhout. It seems like an excellent solution, thanks a lot. But because of the last line of code:
passw <- str_c(sample(pass_chars), collapse = "")
the function does not work for me (str_c comes from the stringr package, which has to be loaded first).
I tried:
passw <- str(sample(pass_chars), collapse = "")
... and now it runs, although str only prints the structure of the vector.
Example with a seed:
>gen_pass(4,2)
gives: chr [1:4] "y" "[" "O" "1"
In order to obtain a directly usable password, I changed the end of the code this way:
zz <- sample(pass_chars)
passw <- paste(zz, sep = "", collapse = "")
return(passw)
}
So, now we get for example:
> gen_pass(35, 2)
[1] "0OD}1O}8DKMqTL[JEFZBwKMJWGD’VZ=VRnD"
This is interesting because we only have to remember the parameters passed to the function, in this case 35 and 2.
Thank you, Mr Fmerhout.
To conclude: with this little script we have a way to create long, strong passwords without using a dictionary and without having to record our passwords anywhere.
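Remembering the parameters is enough because the same seed always reproduces the same draw. A minimal sketch with a hypothetical helper draw_pw, which is much simpler than gen_pass and only illustrates the seeding:

```r
draw_pw <- function(len, seed) {
  set.seed(seed)                       # fix the random number generator
  pool <- c(letters, LETTERS, 0:9)     # coerced to a character vector
  paste(sample(pool, len, replace = TRUE), collapse = "")
}

first  <- draw_pw(12, 2)
second <- draw_pw(12, 2)
identical(first, second)
# [1] TRUE
```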
