Replace letters with ciphertext ones - r

I've been playing with the gsub2 function from "R: replace characters using gsub, how to create a function?" to create a ciphertext:
from<-c('s','l','k','u','m','i','x','j','o','p','n','q','b','v','w','z','f','y','t','g','h','a','e','d','c','r')
to<-c('z','e','b','r','a','s','c','d','f','g','h','i','j','k','l','m','n','o','p','q','t','u','v','w','x','y')
For example:
original text: the who's 1973
ciphertext: ptv ltn'm 1973
The problem is that gsub2 replaces some letters twice (o->f->n and s->z->m), which messes up my ciphertext and makes it almost impossible to decode. Could anyone point out the mistake I'm making? Thanks!

One way is to use a named vector as the encoding cipher. An easy way to create such a named vector is to use setNames:
cipher <- setNames(to, from)
cipher
s l k u m i x j o p n q b v w z f y t g h a e d c r
"z" "e" "b" "r" "a" "s" "c" "d" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "t" "u" "v" "w" "x" "y"
For the encoding function, you can use strsplit and paste:
encode <- function(x){
splitx <- strsplit(x, "")[[1]]
xx <- cipher[splitx]
xx[is.na(xx)] <- splitx[is.na(xx)]
paste(xx, collapse="")
}
encode("the who's 1973")
[1] "ptv ltf'z 1973"

You can also use chartr, as mentioned in a popular answer (12 upvotes) to the question you linked:
cipher <- function(x)
chartr( "slkumixjopnqbvwzfytghaedcr", "zebrascdfghijklmnopqtuvwxy", x )
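Since the question mentions decoding, here is a minimal sketch of a round trip with chartr. The names encode_chartr and decode_chartr are hypothetical and not part of the original answers; decoding simply swaps the two alphabets. chartr translates each character once, so it avoids the double replacement seen with sequential substitutions.

```r
# The two alphabets from the question, written as single strings
from <- "slkumixjopnqbvwzfytghaedcr"
to   <- "zebrascdfghijklmnopqtuvwxy"

# Hypothetical helper names; chartr maps each character in one pass
encode_chartr <- function(x) chartr(from, to, x)
decode_chartr <- function(x) chartr(to, from, x)

encoded <- encode_chartr("the who's 1973")
decoded <- decode_chartr(encoded)
encoded
decoded
```

Characters that do not appear in the alphabets (spaces, digits, the apostrophe) pass through unchanged.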

Related

Opposite of "bitwAnd" Function in R?

I found this function in R that can generate the "power set" for a set of elements:
f <- function(set) {
n <- length(set)
masks <- 2^(1:n-1)
lapply( 1:2^n-1, function(u) set[ bitwAnd(u, masks) != 0 ] )
}
results = f(LETTERS[1:5])
results = sapply(results, paste, collapse = " ")
In a previous question (Subsetting Elements in a "Hypothetical" List), I learned how to interact with very large "power sets" that the computer can not load into memory. For example - suppose I wanted to make the "power set" for all 26 letters in the English alphabet (this set would contain 2^26 = 67108864 elements). I could find out the "13626980"th element in this list without actually generating the list (since it would be impossible to generate/store such a big list):
LETTERS[bitwAnd(13626980, 2^(1:26-1)) != 0]
[1] "C" "F" "G" "J" "K" "L" "N" "O" "P" "Q" "R" "S" "T" "W" "X"
I had the following question : Is it possible to do the "opposite" of this task?
For example, given the sequence of letters ("C" "F" "G" "J" "K" "L" "N" "O" "P" "Q" "R" "S" "T" "W" "X"), can some function determine which number (13626980) it corresponds to? Is there some hypothetical function like:
#input
> hypothetical_function(c("C", "F", "G", "J", "K", "L", "N", "O", "P", "Q", "R", "S", "T", "W", "X"))
#output
13626980
Is this possible?
Thank you!
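One possible sketch of the reverse mapping, assuming the same bit layout as in the question (the i-th letter contributes 2^(i-1)). The function name letters_to_index and its universe argument are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: map a subset of letters back to its index
letters_to_index <- function(subset, universe = LETTERS) {
  pos <- match(subset, universe)   # position of each letter in the universe
  stopifnot(!anyNA(pos), !anyDuplicated(pos))
  sum(2^(pos - 1))                 # each position sets one bit
}

idx <- letters_to_index(c("C", "F", "G", "J", "K", "L", "N", "O",
                          "P", "Q", "R", "S", "T", "W", "X"))
idx

# Round trip using the expression from the question
back <- LETTERS[bitwAnd(idx, 2^(1:26 - 1)) != 0]
back
```

The order of the input letters does not matter, because each letter only contributes its own power of two to the sum.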

R function for repeat loop to create multiple variables from table

Apologies for any poorly formed question - I'm very new to R.
I am looking to create multiple character strings from this data table..
I have created the character string:
coor1 <- R_data[1,8]
I am looking to iterate this for other indices as follows:
coor1 <- data[1,8]
coor2 <- data[2,8]
coor3 <- data[3,8]
coor4 <- data[4,8]
coor5 <- data[5,8] etc....
I have tried using a for loop but with no success. Any advice would be great.
Thanks very much.
I think you just need to subset the 8th column with $:
data <- data.frame(V1 = rep(NA, 26))
data[, 2:7] <- NA
data[, 8] <- letters
data$V8
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s" "t" "u"
[22] "v" "w" "x" "y" "z"
data[1, 8]
[1] "a"
data[2, 8]
[1] "b"
data[3, 8]
[1] "c"
data[4, 8]
[1] "d"
You can assign it to a variable as well:
coor <- data$V8
And extract a single result with []:
coor[1]
[1] "a"
coor[2]
[1] "b"
This can also be accomplished from the original dataframe:
data$V8[1]
[1] "a"
which is the same as:
data[1, 8]
[1] "a"
What a loop would look like:
coors <- character(nrow(data)) # preallocate storage for the results
for(i in seq_len(nrow(data))){
coors[i] <- data[i, 8]
}
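If the goal is to refer to each value by a name such as coor1, coor2, and so on, a named vector may be simpler than creating many separate variables. This is a minimal sketch on a stand-in data frame; the names coor1, coor2, ... are generated with paste0 and are an assumption about the desired naming.

```r
# Stand-in data frame with letters in the 8th column
data <- as.data.frame(matrix(NA, nrow = 26, ncol = 7))
data$V8 <- letters

# One named vector instead of 26 separate variables
coors <- setNames(data[, 8], paste0("coor", seq_len(nrow(data))))

coors[["coor3"]]
coors[c("coor1", "coor26")]
```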

Finding midpoint of string in R (mid character of a word)

I'd like to find the midpoint of any word after the following is done to the word:
>x = 'hello'
>y = strsplit(x, '')
>y
[[1]]
[1] "h" "e" "l" "l" "o"
>z = unlist(y)
>z
[1] "h" "e" "l" "l" "o"
Doing this then allows for :
> z[1]
[1] "h"
> z[4]
[1] "l"
The difference is that before z = unlist(y), indexing the original string x does not give individual characters: x[1] returns the whole word and x[2] returns NA. For example:
> x = 'hello'
> strsplit(x, '')
[[1]]
[1] "h" "e" "l" "l" "o"
> x[1]
[1] "hello"
> x[2]
[1] NA
Anyways, what I want to do is find the mid point of words that are in this format so that the output would be something like:
"l"
in the case of the word "hello". Also, in this example we have a word with 5 letters allowing to easily designate a single character as the midpoint but for a word like "bake" I would like to designate both "a" and "k" together as the midpoint.
Try
f1 <- function(str1){
N <- nchar(str1)
if(!N%%2){
res <- substr(str1, N/2, (N/2)+1)
}
else{
N1 <- median(sequence(N))
res <- substr(str1, N1, N1)
}
res
}
f1('bake')
#[1] "ak"
f1('hello')
#[1] "l"
Another option. get_middle assumes the word has already been split into characters, as per your description:
get_middle <- function(x) {
mid <- (length(x) + 1) / 2
x[unique(c(ceiling(mid), floor(mid)))]
}
Then:
words <- c("bake", "hello")
lapply(strsplit(words, ""), get_middle)
Produces:
[[1]]
[1] "k" "a"
[[2]]
[1] "l"
You could try this:
midpoint <- function(word) {
# Split the word into a vector of letters
split <- strsplit(word, "")[[1]]
# Get the number of letters in the word
n <- nchar(word)
# Get the two middle letters for words of even length,
# otherwise get the single middle letter
if (n %% 2 == 0) {
c(split[n/2], split[n/2+1])
} else {
split[ceiling(n/2)]
}
}
In the case of a word of even length, the middle two characters are returned as a vector.
midpoint("hello")
#[1] "l"
midpoint("bake")
#[1] "a" "k"
How about:
mid<-function(str)substr(str,(nchar(str)+1)%/%2,(nchar(str)+2)%/%2)
Or slightly more legibly:
mid2<-function(str){
n1<-nchar(str)+1
substr(str,n1%/%2,(n1+1)%/%2)
}
> mid("bake")
[1] "ak"
> mid("hello")
[1] "l"
This has the advantage that it immediately vectorizes:
> mid(c("bake","hello"))
[1] "ak" "l"
It is slower than @akrun's solution for long words, but my second version is faster; apparently counting characters can be costly for longer strings.
If you want the final product in a list, you can just strsplit the result:
mid3<-function(str)strsplit(mid2(str),"")
word = c("bake","hello")
print(nchar(word))
q = ifelse (nchar(word)%%2==0, substr(word,nchar(word)/2,nchar(word)/2+1),substr(word,nchar(word)/2+1,nchar(word)/2+1))
print(q)
[1] 4 5
[1] "ak" "l"

Generating Multiple Subsets in R

I have a large sequence of bytes, and I would like to generate a list containing an arbitrary number of subsets of that sequence. I suspect I need to use one of the apply functions, but the trick is that I need to iterate over the vector of starting positions, not the sequence itself.
Here's an example of how I want it to work --
extrct_by_mod <- function(x, startpos, endpos, lrecl)
{
x[1:length(x) %% lrecl %in% startpos:endpos]
}
tmp_seq <- letters[1:25]
startpos <- c(0, 2)
endpos <- c(1, 5)
lrecl <- 5
list_one <- extrct_by_mod(x=tmp_seq, startpos=startpos[1], endpos=endpos[1], lrecl=lrecl)
list_two <- extrct_by_mod(x=tmp_seq, startpos=startpos[2], endpos=endpos[2], lrecl=lrecl)
what_i_want <- list(list_one, list_two)
Ideally, I'd like to be able to just add more values to startpos and endpos and thus automatically generate more subsets to add to my list. Note that the subsets will not be the same length, and in some cases, not even the same type.
My datasets are fairly large, so something that scales well would be ideal. I realize that this could be done with a loop, but I'm understanding that you generally want to avoid looping in R.
Thank you!
Saving some time by pre-calculating the modulo-selection index:
> cats <- 1:length(tmp_seq) %% lrecl
> mapply(function(start,end) { tmp_seq[cats %in% start:end]} , startpos, endpos)
[[1]]
[1] "a" "e" "f" "j" "k" "o" "p" "t" "u" "y"
[[2]]
[1] "b" "c" "d" "g" "h" "i" "l" "m" "n" "q" "r" "s" "v" "w" "x"
(It is not correct that R apply functions are any faster than equivalent loops.)
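One caveat with mapply is that it simplifies the result to a matrix when all subsets happen to have the same length. A minimal sketch using Map, which always returns a list, with the example data from the question (the variable name subsets is hypothetical):

```r
tmp_seq  <- letters[1:25]
startpos <- c(0, 2)
endpos   <- c(1, 5)
lrecl    <- 5

# Pre-compute the modulo index once, as in the answer above
cats <- seq_along(tmp_seq) %% lrecl

# Map never simplifies, so the result is always a list
subsets <- Map(function(start, end) tmp_seq[cats %in% start:end],
               startpos, endpos)
subsets
```

Adding more values to startpos and endpos produces more list elements without any other change.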

Scan without spaces in R?

How do I scan for individual characters in a .txt file in R? From my understanding, scan uses whitespace as a separator, but what if I want whitespace itself to be one of the characters I scan for?
For example, if I scan the string "Hello World", how do I get H,e,l,l,o, ,W,o,r,l,d?
strsplit would also be your friend here:
test <- readLines(textConnection("Hello world
Line two"))
strsplit(test,"")
> strsplit(test,"")
[[1]]
[1] "H" "e" "l" "l" "o" " " "w" "o" "r" "l" "d"
[[2]]
[1] "L" "i" "n" "e" " " "t" "w" "o"
And unlisted as suggested by @Thilo...
> unlist(strsplit(test,""))
[1] "H" "e" "l" "l" "o" " " "w" "o" "r" "l" "d" "L" "i" "n" "e" " " "t" "w" "o"
I would go a two-step approach: First read the file as plain text with readLines and then split the single lines to vectors of characters:
lines <- readLines("test.txt")
characterlist <- lapply(lines, function(x) substring(x, 1:nchar(x), 1:nchar(x)))
Note that this approach does not return a well formed matrix or data.frame, but a list.
Depending on what you want to do, there might be a few different modifications:
unlist(characterlist)
gives you a single vector of all characters in the file. If your text file is so well behaved that every line has exactly the same number of characters, you can use sapply instead of lapply (lapply has no simplify argument, whereas sapply simplifies by default), and you will get a matrix with one column per line.
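A minimal sketch of the matrix case, using a small character vector as a stand-in for the output of readLines (the contents of lines here are made up for illustration). Both lines have three characters, so sapply can return a matrix.

```r
# Stand-in for readLines("test.txt"); both lines have the same length
lines <- c("abc", "d f")

# One column per line, one row per character position
chars <- sapply(lines,
                function(x) substring(x, 1:nchar(x), 1:nchar(x)),
                USE.NAMES = FALSE)
chars
dim(chars)
```

The space in the second line is kept as its own element, which is what the question asks for.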

Resources