How to generate word sequence - r

1. I want to generate combinations of characters from a given word, with each letter repeated consecutively at least once and at most twice. The resulting words are of unequal lengths. For example, from
"cat"
to
"cat", "catt", "caat", "caatt", "ccat", "ccatt", "ccaat", "ccaatt"
The required function takes a word of length n and generates 2^n words of unequal length. This is analogous to binary numbers, where n digits give 2^n combinations. For example, a 3-digit binary number gives
000 001 010 011 100 101 110 111
combinations, where 0=t and 1=tt.
2. The same function should also cap the result at 2 consecutive repetitions of a character, even if the given word already contains repeated letters. For example,
"catt"
to
"catt" "ccatt" "caatt" "ccaatt"
I tried something like this
pos=expand.grid(l1=c(1,11),l2=c(2,22),l3=c(3,33))
result=chartr('123','cat',paste0(pos[,1],pos[,2],pos[,3]))
#[1] "cat" "ccat" "caat" "ccaat" "catt" "ccatt" "caatt" "ccaatt"
It gives the correct sequence, but I am stuck on generalizing it to any given word of any length.
Thank you.

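x="cat"
l=seq(nchar(x))                             # positions 1..n, one per letter
n=paste(l,collapse="")                      # "123": placeholder digits for chartr
m=split(c(l,paste0(l,l)),rep(l,2))          # per position: single ("1") or doubled ("11")
chartr(n,x,do.call(paste0,expand.grid(m)))  # all combinations, digits mapped back to letters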
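
The answer above relies on single-digit placeholders, so it presumably stops working for words longer than 9 letters (position 10 becomes the two-character string "10"). Below is a minimal sketch of the same expand.grid idea that builds the choices directly from the letters. The function name double_letter_variants is made up for illustration and is not part of the original answer.

```r
# Hypothetical helper: every letter appears either once or doubled.
double_letter_variants <- function(word) {
  letters_in_word <- strsplit(word, "")[[1]]
  # For each letter, the two choices: "c" or "cc"
  options <- lapply(letters_in_word, function(ch) c(ch, paste0(ch, ch)))
  # All 2^n combinations of choices; the first letter varies fastest
  grid <- expand.grid(options, stringsAsFactors = FALSE)
  # Paste the columns together row by row
  do.call(paste0, grid)
}

double_letter_variants("cat")
# [1] "cat"    "ccat"   "caat"   "ccaat"  "catt"   "ccatt"  "caatt"  "ccaatt"
```

Because no placeholder digits are involved, the word length is limited only by the 2^n size of the result.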

1. Just an addition to the answer given by Onyambu, to solve the second part of the question, i.e. restrict the output to at most 2 consecutive repetitions of a character, whatever the number of consecutive repetitions in the input word.
x="catt"
l=seq(nchar(x))
n=paste(l,collapse="")
m=split(c(l,paste0(l,l)),rep(l,2))
o <- chartr(n,x,do.call(paste0,expand.grid(m)))
The line below collapses every run of more than 2 identical consecutive characters down to 2; unique() then drops the resulting duplicates:
unique(gsub('([[:alpha:]])\\1{2,}', '\\1\\1', o))
#[1] "catt" "ccatt" "caatt" "ccaatt"
2. If you want all the combinations from "cat" to "ccaatt", whatever the number of consecutive repetitions of characters in the input word, the code is
x1="catt"
The line below reduces every run of repeated characters to a single character:
x2= gsub('([[:alpha:]])\\1+', '\\1', x1)
l=seq(nchar(x2))
n=paste(l,collapse="")
m=split(c(l,paste0(l,l)),rep(l,2))
o <- chartr(n,x2,do.call(paste0,expand.grid(m)))  # map onto the de-duplicated word x2
unique(gsub('([[:alpha:]])\\1{2,}', '\\1\\1', o))
#[1] "cat" "ccat" "caat" "ccaat" "catt" "ccatt" "caatt" "ccaatt"
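
As an alternative to generating all words and then collapsing long runs with gsub, the cap can be applied before the combinations are built. The sketch below uses run-length encoding (rle), which is a different technique from the regex post-processing above: a letter that already appears two or more times in a row is only offered in its doubled form. The function name capped_variants is hypothetical.

```r
# Hypothetical helper: runs of length >= 2 stay doubled, single letters
# may be kept single or doubled.
capped_variants <- function(word) {
  runs <- rle(strsplit(word, "")[[1]])
  options <- Map(function(ch, len) {
    doubled <- paste0(ch, ch)
    if (len >= 2) doubled else c(ch, doubled)
  }, runs$values, runs$lengths)
  # unname() so repeated letters do not produce duplicate column names
  do.call(paste0, expand.grid(unname(options), stringsAsFactors = FALSE))
}

capped_variants("catt")
# [1] "catt"   "ccatt"  "caatt"  "ccaatt"
```

For a word without repeated letters, such as "cat", this returns the same 8 words as the original approach.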

Related

Generating random 1 to 12 digit numbers in R following some condition

I wish to write a program in R to generate one random number (a positive integer) for each length from 3 digits to 12 digits, following these conditions:
There is no order in the consecutive number digits.
Strictly no repetition of digits within a number, up to and including the 9-digit number.
0 can be used only after the 9-digit number.
After 10 digits, a digit can be used twice, but in no particular order.
And most importantly:
The first number will not be the last number of the next line and vice versa.
All I know how to use is the sample command in R:
sample(1:9, size=n, replace=FALSE)
where n is the number of digits I wish to generate. However, I need to write a more generalized function or program which strictly obeys these conditions.

R: pairwise matrix of the number of characters that differ among strings

I have a vector containing a large number of strings that are all of the same length. For example:
vec = c("keep", "teem", "meat", "weep")
I would like to compare every possible pair of strings from within this vector and count the number of characters that differ between them. Using the vector above, "keep" would be compared to every other string in the vector, "teem" would be compared to every other string, and so on.
I'm only interested in counting the number of characters from the same position within each string that are different. So for example "keep" vs. "teem" would have 2 differences, "keep" vs. "meat" 3 differences, etc. I'd like to output the results as a pairwise matrix, where the strings in the vector make up the row names and column names.
I've learned from another post (How can I compare two strings to find the number of characters that match in R, using substitution distance?) that I can pass the adist function to mapply to calculate the number of differences between two strings:
mapply(adist,string1,string2)
But I'm not sure how to modify this to operate over every possible pairwise combination in my vector, and to place the results in a pairwise matrix. Any ideas for how to do that? Thanks!!
Do you mean using adist like below?
> `dimnames<-`(adist(vec),rep(list(vec),2))
     keep teem meat weep
keep    0    2    3    1
teem    2    0    3    2
meat    3    3    0    3
weep    1    2    3    0
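An option with stringdistmatrix (method = "hamming" counts position-wise mismatches; the default method is an edit distance)
library(stringdist)
out <- as.matrix(stringdistmatrix(vec, method = "hamming"))
dimnames(out) <- list(vec, vec)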
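
Note that adist computes an edit distance (insertions and deletions are allowed), which can be smaller than the position-wise count for some pairs; "abcd" vs. "bcda", for example, differ in all 4 positions but are only 2 edits apart. A minimal base R sketch that counts strictly position-wise differences, assuming all strings have the same length as stated in the question (the names chars and hamming are illustrative):

```r
vec <- c("keep", "teem", "meat", "weep")

# One row per string, one column per character position
chars <- do.call(rbind, strsplit(vec, ""))

# Compare every pair of rows and count the positions that differ
hamming <- outer(seq_along(vec), seq_along(vec),
                 Vectorize(function(i, j) sum(chars[i, ] != chars[j, ])))
dimnames(hamming) <- list(vec, vec)

hamming["keep", "teem"]
# [1] 2
```

For the example vector this gives the same matrix as the adist call, since no pair there can be aligned more cheaply by shifting characters.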

Count number of digits including leading zeros in R

What is a way to count the number of digits of a numeric object in R including leading zeroes?
For example, I know nchar(x) will return the number of digits if x is numeric but what about instances in which x includes leading zeros?
Note: Count the number of integer digits does not address the issue of leading zeros.
Example:
x<-7
nchar(7)
[1] 1 #that's fine
x<-07
nchar(07)
[1] 1 #that is NOT fine: I want the value of 2 to appear
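
A numeric value does not retain leading zeros (07 is parsed as the number 7), so they can only be counted if the value is kept as text. The sketch below shows both cases, plus formatC for padding a number to a known width; the variable names are illustrative.

```r
x_num <- 07      # parsed as the number 7; the leading zero is gone
x_chr <- "07"    # stored as a character string, so the zero survives

nchar(x_num)     # 1
nchar(x_chr)     # 2

# If the intended width is known, the zeros can be put back as text
padded <- formatC(7, width = 2, flag = "0")
nchar(padded)    # 2
```

In practice this usually means reading such values as character (for example IDs or postal codes) rather than as numeric.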

find count of substrings of all anagrams of 1st string that are anagram of 2nd

Need an approach to solve this problem!
Problem: Given two strings of lowercase letters, count (modulo 10^9+7) the matches of non-intersecting substrings, over all distinct anagrams of the 1st string, that are equal to any anagram of the 2nd string.
Example :
1) String 1: "ABC", String 2: "AB"
Answer = 4
Explanation : 'ABC','BAC','CAB','CBA' all contribute 1 such match each.
2) String 1: "ABCAB", String 2: "AB"
Answer = 40
Explanation: One possible anagram of string 1 is 'ABABC', for which the match count is 2 ('AB' and 'AB'), while 'BABCA' contributes only one match ('BA' or 'AB').
Constraints :
n,m are lengths of first and second strings
0 < n < 200
0 < m < 100
The approach I tried involved pre-computing the first 200 factorials modulo 10^9+7, then calculating from the given string the maximum number of non-intersecting patterns (mx) it could contain, and looping from p=1 to mx to calculate the number of rearrangements of the first string that contain exactly p non-intersecting occurrences of the pattern (i.e. string 2).
Is there a different approach that I am missing here?
Here is another approach you can use -
1) Calculate the number of anagrams of string2 (say x). You can google permutations and combinations to get a method to do so (in O(1)).
2) Calculate at most how many copies of string2 string1 can contribute. This can be done by counting how many times each character of string2 occurs in string1. In your second example 'A' and 'B' each occur twice in string1, so you can get at most 2 anagrams at one time. Note: if the character frequencies don't match, take the lowest. For instance, if 'A' occurs three times and 'B' twice, you can get at most 2 anagrams, so you take the frequency of the least-repeated character.
3) Calculate the answer using the formulae of permutation and combination:
Number of string1 anagrams with only one anagram of string2: x*(n1-n2+1), where n1 and n2 are the lengths of string1 and string2 respectively.
With two anagrams: (n1-2*n2+2)*x*x
And so on.
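
Steps 1 and 2 above can be sketched as follows. This is only an illustration under stated assumptions: it uses plain floating-point factorials, so it ignores the modulo 10^9+7 requirement and overflows for long strings, and it does not implement or verify the formulae in step 3. The function names count_anagrams and max_copies are made up for this example.

```r
# Step 1: number of distinct anagrams of s = n! / (c1! * c2! * ...),
# where c1, c2, ... are the counts of each distinct letter.
count_anagrams <- function(s) {
  counts <- as.integer(table(strsplit(s, "")[[1]]))
  factorial(nchar(s)) / prod(factorial(counts))
}

# Step 2: how many disjoint copies of s2's letters fit into s1.
max_copies <- function(s1, s2) {
  letters_used <- unique(strsplit(s2, "")[[1]])
  # Count only the letters of s2; letters missing from s1 get count 0
  c1 <- table(factor(strsplit(s1, "")[[1]], levels = letters_used))
  c2 <- table(factor(strsplit(s2, "")[[1]], levels = letters_used))
  min(as.integer(c1) %/% as.integer(c2))
}

count_anagrams("AB")        # 2
count_anagrams("ABCAB")     # 30
max_copies("ABCAB", "AB")   # 2
```

A full solution would replace the floating-point factorials with modular arithmetic (precomputed factorials and modular inverses), as the question already suggests.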

Counting positive smiles in string using R

In src$Review each row is filled with text in Russian. I want to count the number of positive smiles in each row. For example, in "My apricot is orange)) (for sure)" I want to count not just the quantity of outbound brackets (i.e., excluding general brackets in "(for sure)"), but the amount of positive smiling characters ("))" — at least two outbound brackets, number of ":)", ":-)"). So, it works only if at least two outbound brackets are exhibited.
Assume there is a string "I love this girl!)))) (she makes me happy) every day:):) :-)!" Here we count: )))) (4 units), ":)" (2 units), ":-)" (1 unit). Then we add up the units (i.e., 7). Note that we don't count the brackets in "(she makes me happy)".
Now I have following code in my script:
smilecounts <- str_count(src$Review, "[))]")
As far as I can tell from comparing the data set with the output of this command, it counts only the total number of bracket pairs ("()").
I only need the total number of ":)", ":-)" and "))" (the total number of outbound brackets which appear as "))" in rows) to be counted. For example, ")))))" contains 5 outbound brackets; the condition of at least two outbound brackets together is satisfied, so we count all the brackets in this part of the text (i.e., 5 outbound brackets).
Thank you so much for help in advance.
We can use regex lookarounds (with stringr) to extract each ) that follows a ) or ! (for the run of brackets), a :, or a :-, then use length to get the count.
library(stringr)
length(str_extract_all(str1, '(?<=\\)|\\!)\\)')[[1]])
#[1] 4
length(str_extract_all(str1, '(?<=:)\\)')[[1]])
#[1] 2
length(str_extract_all(str1, '(?<=:-)\\)')[[1]])
#[1] 1
Or this can be done using a loop
pat <- c('(?<=\\)|\\!)\\)', '(?<=:)\\)', '(?<=:-)\\)')
sum(sapply(lapply(pat, str_extract_all, string=str1),
function(x) length(unlist(x))))
#[1] 7
data
str1 <- "I love this girl!)))) (she makes me happy) every day:):) :-)!"
One way with gregexpr and regmatches:
vec <- "I love this girl!)))) (she makes me happy) every day:):) :-)!"
Solution:
#matches the locations of :-) or ))+ or :)
a <- gregexpr(':-)+|))+|:)+', vec)
#extracts those
b <- regmatches(vec, a)[[1]]
b
#[1] "))))" ":)" ":)" ":-)"
#table counts the instances
table(b)
b
))))  :-)   :)
   1    1    2
Then I suppose you could count the number of single )s using
nchar(b[1])
[1] 4
Or in a more automated way:
tab <- table(b)
#the following means "if a name of the table consists only of ) then
#count the number of )s"
tab2 <- ifelse(gsub(')','', names(table(b)))=='', nchar(names(table(b))), table(b))
names(tab2) <- names(tab)
> tab2
)))) :-) :)
4 1 2
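
The two answers can also be combined into a single base R function that returns one total per string. This is a sketch under one assumption about the rules: a run of two or more ")" counts once per bracket, while ":)" and ":-)" count as one each. The function name count_smiles is hypothetical.

```r
# Hypothetical helper: total number of "positive smile" units per string.
count_smiles <- function(text) {
  # Either a run of 2+ closing brackets, or ":)" / ":-)"
  pattern <- "\\){2,}|:-?\\)"
  hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))
  vapply(hits, function(h) {
    is_run <- grepl("^\\)+$", h)           # TRUE for "))", ")))", ...
    sum(ifelse(is_run, nchar(h), 1L))      # runs count per bracket, emoticons count 1
  }, numeric(1))
}

str1 <- "I love this girl!)))) (she makes me happy) every day:):) :-)!"
count_smiles(str1)
# [1] 7
```

Since the function is vectorized over its input, it could be applied to a whole column such as src$Review. How a mixed form like ":))" should be scored is not specified in the question; here it matches as ":)" and the remaining single bracket is ignored.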
