How do I Compute Binary String Permutations in R? - r

I have a binary string like this:
0010000
I'd like to have all these permutations:
1010000
0110000
0011000
0010100
0010010
0010001
Does anybody know which function in R could give me these results?

R has functions for bitwise operations, so we can get the desired numbers with bitwOr:
bitwOr(16, 2^(6:0))
#> [1] 80 48 16 24 20 18 17
...or if we want to exclude the original,
setdiff(bitwOr(16, 2^(6:0)), 16)
#> [1] 80 48 24 20 18 17
However, bitwOr works on integers, not on binary strings. That's OK, though; we can build some conversion functions:
bin_to_int <- function(bin){
vapply(strsplit(bin, ''),
function(x){sum(as.integer(x) * 2 ^ seq(length(x) - 1, 0))},
numeric(1))
}
int_to_bin <- function(int, bits = 32){
vapply(int,
function(x){paste(as.integer(rev(head(intToBits(x), bits))), collapse = '')},
character(1))
}
Now:
input <- bin_to_int('0010000')
output <- setdiff(bitwOr(input, 2^(6:0)),
input)
output
#> [1] 80 48 24 20 18 17
int_to_bin(output, bits = 7)
#> [1] "1010000" "0110000" "0011000" "0010100" "0010010" "0010001"
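As a variation on the approach above, base R's strtoi can do the string-to-integer step. This is only a sketch: the names bin, width, masks, result and strings are made up for illustration, and it assumes the input has at most 31 bits so that it fits in an R integer.

```r
bin <- "0010000"
width <- nchar(bin)

# Parse the binary string as a base-2 integer (16 for this input)
input <- strtoi(bin, base = 2L)

# One mask per bit position, most significant bit first
masks <- as.integer(2^((width - 1):0))

# Set each bit in turn, then drop the value equal to the input
result <- setdiff(bitwOr(input, masks), input)

# Convert back to fixed-width binary strings
strings <- vapply(
  result,
  function(v) paste(rev(as.integer(intToBits(v))[seq_len(width)]), collapse = ""),
  character(1)
)
strings
```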

library(stringr)
bin <- '0010000'
ones <- str_locate_all(bin, '1')[[1]][,1]
zeros <- (1:str_length(bin))[-ones]
sapply(zeros, function(x){
str_sub(bin, x, x) <- '1'
bin
})
#> [1] "1010000" "0110000" "0011000" "0010100" "0010010" "0010001"

We assume that the problem is to successively replace each 0 in the input with a 1 for an input string of 0's and 1's.
Replace each character successively with a "1", regardless of its value and then remove any components of the result equal to the input. No packages are used.
input <- "0010000"
setdiff(sapply(1:nchar(input), function(i) `substr<-`(input, i, i, "1")), input)
## [1] "1010000" "0110000" "0011000" "0010100" "0010010" "0010001"

Related

how to do for-loop in R

aa <- order(maxstaCode$gateInComingCnt, decreasing=TRUE)[1:10]
aa
[1] 11 121 19 79 13 21 43 10 15 138
for(i in aa){
maxinum <- c(maxstaCode$gateInComingCnt[i])
}
maxinum
I wanted the loop to use each number in aa, in order, as an index into the column and to collect the corresponding values, giving the result below:
[1] 6235770 2805043 2772432 2592227 2461369 2428441 1990890 1821025 1595055
[10] 1491299
but it turned out:
[1] 1491299
The issue is that maxinum is overwritten on each iteration of the for loop, so only the last value survives. Instead, we need to append to it with c(maxinum, ...):
maxinum <- c()
for(i in aa){
maxinum <- c(maxinum, maxstaCode$gateInComingCnt[i])
}
maxinum
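Since R indexing is vectorised, the loop can also be skipped altogether. A minimal sketch, using a small made-up data frame as a stand-in for the asker's maxstaCode (the values are hypothetical):

```r
# Hypothetical stand-in for the asker's data
maxstaCode <- data.frame(gateInComingCnt = c(50, 10, 40, 30, 20))

# Positions of the three largest counts, largest first
aa <- order(maxstaCode$gateInComingCnt, decreasing = TRUE)[1:3]

# Index with the whole vector at once instead of looping
maxinum <- maxstaCode$gateInComingCnt[aa]
maxinum
```

This also avoids growing a vector inside a loop, which gets slow for long inputs.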

R : Extract a Specific Number out of a String

I have a vector as below
data <- c("6X75ML","24X37.5ML (KKK)", "6X2X75ML", "168X5CL (UUU)")
Here I want to extract the first number before the "X" for each of the elements.
Where there are two "X"s, e.g. "6X2X75ML", the product of the two numbers (6 multiplied by 2, i.e. 12) should be returned.
expected output
6, 24, 12, 168
Thank you for the help...
Here's a possible solution using regular expressions:
data <- c("6X75ML","24X37.5ML (KKK)", "6X2X75ML", "168X5CL (UUU)")
# this regular expression finds every group of digits followed
# by an upper-case 'X' in each string and returns a list of the matches
tokens <- regmatches(data,gregexpr('[[:digit:]]+(?=X)',data,perl=TRUE))
res <- sapply(tokens,function(x)prod(as.numeric(x)))
> res
[1] 6 24 12 168
Here is a method using base R:
dataList <- strsplit(data, split="X")
sapply(dataList, function(x) Reduce("*", as.numeric(head(x, -1))))
[1] 6 24 12 168
strsplit breaks up each string at "X". The resulting list is fed to sapply, which then performs an operation on all but the final element of each vector in the list. The operation is to convert the elements to numerics and then multiply them. The final element is dropped using head(x, -1).
As @zheyuan-li comments, prod can fill in for Reduce and will probably be a bit faster:
sapply(dataList, function(x) prod(as.numeric(head(x, -1))))
[1] 6 24 12 168
We can also use str_extract_all from stringr:
library(stringr)
sapply(str_extract_all(data, "\\d+(?=X)"), function(x) prod(as.numeric(x)))
#[1] 6 24 12 168
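# Another base R approach using regexpr and substr
# position of the first "X" and the number in front of it
ind <- regexpr("X", data)
val <- as.integer(substr(data, 1, ind - 1))
# text after the first "X"; a second multiplier exists if it starts with digits followed by "X"
data2 <- substring(data, ind + 1)
ind2 <- regexpr("[0-9]+X", data2)
second <- ind2 == 1
if (any(second)) {
  val2 <- as.integer(substr(data2[second], 1, attr(ind2, "match.length")[second] - 1))
  val[second] <- val[second] * val2
}
val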

Select all binary neighbors of decimal number

Let's say I have a number in decimal format: 5
its binary version is: 00101
I would like to write a function that takes the decimal number x
and returns all other decimal numbers that have a single digit difference (in their binary forms) from the original one:
so for the example above the neighbors are:
10101 01101 00111 00001 00100
and the corresponding decimals are:
21 13 7 1 4
I would like a solution that is computationally efficient and doesn't take a long time even if I have say a million digits.
Is this possible to do?
I've no idea how trial and error got me here, but it looks valid unless I've messed up binaries and decimals:
bin_neighs = function(x, n) bitwXor(x, (2 ^ (0:(n - 1))))
bin_neighs(5, 5)
#[1] 4 7 1 13 21
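The reason the XOR one-liner works is that each power of two has exactly one bit set, and XOR with such a mask flips that bit and leaves the others alone. The sketch below checks this for x = 5 with 5 bits; the names masks, neighbours and n_diff are made up for illustration. Note that R's bitw* functions operate on 32-bit integers, so this would not extend as-is to numbers with far more binary digits.

```r
x <- 5L

# 1, 2, 4, 8, 16: each mask has a single bit set
masks <- as.integer(2^(0:4))

# Flip one bit of x per mask
neighbours <- bitwXor(x, masks)

# Count how many bits differ between x and each neighbour
n_diff <- vapply(
  neighbours,
  function(v) sum(as.integer(intToBits(bitwXor(x, v)))),
  integer(1)
)
neighbours
n_diff
```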
I think you're asking how to take as input a number 5 and to return all neighboring binary values. To do this, you need to convert the number to a useful binary format (just the bits you want to flip), flip each bit, and return the result:
library(R.utils)
bin.neighbors <- function(x, num.neighbors=NA) {
# Get the bits with the appropriate amount of padding
bits <- as.numeric(unlist(strsplit(intToBin(x), "")))
if (!is.na(num.neighbors) & num.neighbors > length(bits)) {
bits <- c(rep(0, num.neighbors-length(bits)), bits)
}
# Build a matrix where each column is a bit vector of a neighbor
mat <- matrix(bits, length(bits), length(bits))
diag(mat) <- 1-diag(mat)
# Return the decimal values of the neighbors using strtoi
apply(mat, 2, function(x) strtoi(paste0(x, collapse=""), 2))
}
bin.neighbors(5, 5)
# [1] 21 13 1 7 4
Because each number has a number of binary representations with different numbers of leading 0s (e.g. 5 can be represented as 101, 0101, 00101, 000101, 0000101, etc.), I added an argument num.neighbors to specify the length of the output vector from the function. You can pass NA to obtain an output vector equal to the number of bits in the binary representation of the input with no leading zeros.
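Here's another way using magrittr's pipe:
library(magrittr)
# The default width is the bit length of a; floor(log2(a)) + 1 also covers exact powers of 2
binNeighbours <- function(a, numNeighbours = floor(log2(a)) + 1) {
  rep(a, numNeighbours) %>%
    # row i holds the bits of a, least significant first (seq_along is safe for length 1)
    outer(., seq_along(.) - 1, function(x, y) x %/% (2 ^ y) %% 2) %>%
    # flip one bit per row
    `diag<-`(., 1 - diag(.)) %>%
    # convert each row back to a decimal number
    `%*%`(2 ^ (0:(nrow(.) - 1))) %>%
    `[`(, 1)
}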

Arithmetic Progression series in R

I am new to this forum. I guess something like this has been asked before but, I am not really sure if that is what I want.
I have a sequence like this,
1 2 3 4 5 8 9 10 12 14 15 17 18 19
What I want is to find every subset of these numbers that forms a series, i.e. each number in the subset differs from the previous one by a constant amount, and the subset has at least 3 elements.
i.e., I can see that (1,2,3,4,5) forms one such series in which numbers appear after an interval of 1 and the total size of this set is 5 which satisfies the minimum threshold criteria.
(1,3,5) forms one such a pattern in which the numbers appear after an interval of 2.
(8,10,12,14) forms another such pattern with an interval of 2. So, as you can see, the interval of repetition can be anything.
Also, for each such set I want only the maximal one. I don't want (8,10,12) as output, although it satisfies the minimum threshold of 3 and the constant difference; I want only the one of maximal length, i.e. (8,10,12,14).
Similarly, for (1,2,3,4,5), I don't want (1,2,3) or (2,3,4,5) as output, only the one of maximal length, i.e. (1,2,3,4,5).
How can I do this in R?
Edit: That is, I want any set that forms a basic AP series with any common difference, provided the series has at least 3 elements and is maximal.
Edit2: I have tried using rle and acf in R, but that doesn't entirely solve my problem.
Edit3: When I used acf, it basically gave me the lag with the maximum peak, which I could have used as a difference. However, I want all possible differences. rle is a different thing altogether: it gave me the longest run of repeated values, which doesn't occur in my data.
If you are looking for sequences of consecutive numbers, then cgwtools::seqle will find them for you in the same way rle finds a sequence of repeated values.
In the general case of basically any subset of your data which form such a sequence, such as the 8,10,12,14 case you cite, your criteria are so general as to be very difficult to satisfy. You'd have to start at each element of your series and do a forward-looking search for x[j] +1, x[j]+2, x[j]+3 ... ad infinitum. This suggests using some tree-based algorithms.
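For small inputs, the forward-looking search described above can be written as a brute-force double loop rather than a tree. The following is only a sketch under the assumption that the input is treated as a set of distinct numbers; the function name maximal_aps and its min_len argument are made up here, and the nested loops plus %in% lookups make it unsuitable for long vectors.

```r
maximal_aps <- function(x, min_len = 3) {
  x <- sort(unique(x))
  n <- length(x)
  out <- list()
  if (n < 2) return(out)
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      d <- x[j] - x[i]
      # If x[i] - d is present, this series can be extended backwards,
      # so x[i] is not the start of a maximal series for difference d
      if ((x[i] - d) %in% x) next
      run <- c(x[i], x[j])
      # Extend forwards for as long as the next term is present
      while ((run[length(run)] + d) %in% x) {
        run <- c(run, run[length(run)] + d)
      }
      if (length(run) >= min_len) out[[length(out) + 1]] <- run
    }
  }
  out
}

res <- maximal_aps(c(1, 2, 3, 4, 5, 8, 9, 10, 12, 14, 15, 17, 18, 19))
```

Because every series is started only where it cannot be extended backwards and then grown as far forwards as possible, (8,10,12,14) is returned but (8,10,12) is not.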
Here's a potential solution - albeit a very ugly, sloppy one:
##
arithSeq <- function(x=nSeq, minSize=4){
##
dx <- diff(x,lag=1)
Runs <- rle(diff(x))
##
rLens <- Runs[[1]]
rVals <- Runs[[2]]
pStart <- c(
rep(1,rLens[1]),
rep(cumsum(1+rLens[-length(rLens)]),times=rLens[-1])
)
pEnd <- pStart + c(
rep(rLens[1]-1, rLens[1]),
rep(rLens[-1],times=rLens[-1])
)
pGrp <- rep(1:length(rLens),times=rLens)
pLen <- rep(rLens, times=rLens)
dAll <- data.frame(
pStart=pStart,
pEnd=pEnd,
pGrp=pGrp,
pLen=pLen,
runVal=rep(rVals,rLens)
)
##
dSub <- subset(dAll, pLen >= minSize - 1)
##
uVals <- unique(dSub$runVal)
##
maxSub <- subset(dSub, runVal==uVals[1])
maxLen <- max(maxSub$pLen)
maxSub <- subset(maxSub, pLen==maxLen)
##
if(length(uVals) > 1){
for(i in 2:length(uVals)){
iSub <- subset(dSub, runVal==uVals[i])
iMaxLen <- max(iSub$pLen)
iSub <- subset(iSub, pLen==iMaxLen)
maxSub <- rbind(
maxSub,
iSub)
maxSub
}
##
}
##
deDup <- maxSub[!duplicated(maxSub),]
seqStarts <- as.numeric(rownames(deDup))
outList <- list(NULL); length(outList) <- nrow(deDup)
for(i in 1:nrow(deDup)){
outList[[i]] <- list(
Sequence = x[seqStarts[i]:(seqStarts[i]+deDup[i,"pLen"])],
Length=deDup[i,"pLen"]+1,
StartPosition=seqStarts[i],
EndPosition=seqStarts[i]+deDup[i,"pLen"])
outList
}
##
return(outList)
##
}
##
So there are things that can definitely be improved in this function - for instance I made a mistake somewhere in the calculation of pStart and pEnd, the start and end indices of a given arithmetic sequence, but it just so happened that the true start positions of such sequences are given as the rownumbers of one of the intermediate data.frames, so that was a hacky sort of solution. Anyways, it accepts a numeric vector x and a minimum length parameter, minSize. It will return a list containing information about sequences meeting the criteria you outlined above.
set.seed(1234)
lSeq <- sample(1:25,100000,replace=TRUE)
nSeq <- c(1:10,12,33,13:17,16:26)
##
> arithSeq(nSeq)
[[1]]
[[1]]$Sequence
[1] 16 17 18 19 20 21 22 23 24 25 26
[[1]]$Length
[1] 11
[[1]]$StartPosition
[1] 18
[[1]]$EndPosition
[1] 28
##
> arithSeq(x=lSeq,minSize=5)
[[1]]
[[1]]$Sequence
[1] 13 16 19 22 25
[[1]]$Length
[1] 5
[[1]]$StartPosition
[1] 12760
[[1]]$EndPosition
[1] 12764
[[2]]
[[2]]$Sequence
[1] 11 13 15 17 19
[[2]]$Length
[1] 5
[[2]]$StartPosition
[1] 37988
[[2]]$EndPosition
[1] 37992
Like I said, it's sloppy and inelegant, but it should get you started.
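The core idea in arithSeq, run-length encoding the successive differences, can be sketched more compactly. This is not a drop-in replacement: the hypothetical arith_runs below returns every run of adjacent elements with a constant difference and at least min_size elements, rather than only the longest run per difference value.

```r
arith_runs <- function(x, min_size = 3) {
  r <- rle(diff(x))
  # A run of k equal differences spans k + 1 elements of x
  ends <- cumsum(r$lengths) + 1
  starts <- ends - r$lengths
  keep <- r$lengths + 1 >= min_size
  Map(function(s, e) x[s:e], starts[keep], ends[keep])
}

nSeq <- c(1:10, 12, 33, 13:17, 16:26)
runs <- arith_runs(nSeq, min_size = 4)
runs
```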

How to sum specific vectors in a list in R

I know this should be simple but I just can't do it...I have a data frame called data that works nicely and does what I want it to with the correct column headers and everything. I can call colSums() to get a list of 21 numbers which are the sums of each column.
> a <- colSums(data,na.rm = TRUE)
> names(a) <- NULL
> a
[1] 1000000.00 680000.00 170000.00 462400.00 115600.00 144500.00 314432.00 78608.00 98260.00 122825.00 213813.76 53453.44 66816.80
[14] 83521.00 104401.25 145393.36 36348.34 45435.42 56794.28 70992.85 88741.06
The problem is I need a list with the first number alone, the sum of the next two, sum of the next 3, sum of the next 4 etc. until I run out of numbers. I imagine it would look something like this:
c(sum(a[1]),sum(a[2:3]),sum(a[4:6])... etc.
Any help or a different way to do this would be greatly appreciated!
Thank you.
You should only need to go out to something on the order of sqrt(length(vector)). The seq function lets you specify a start integer and a length, so sending a sequence of integers to seq(1+x*(x-1)/2, length=x) should create the right set of sequences. It wasn't clear whether incomplete sequences at the end should return a result or NA, so I put in na.rm=TRUE. You might decide otherwise. (You did not illustrate a data frame but rather an ordinary numeric vector.)
sumsegs <- function(vec) sapply(1:sqrt(2*length(vec)), function(x)
sum( vec[seq(1+x*(x-1)/2, length=x)], na.rm=TRUE) )
a <- scan()
1000000.00 680000.00 170000.00 462400.00 115600.00 144500.00 314432.00 78608.00 98260.00 122825.00 213813.76 53453.44 66816.80 83521.00 104401.25 145393.36 36348.34 45435.42 56794.28 70992.85 88741.06
# 22: enter carriage return to stop scan input
#Read 21 items
sumsegs(a)
#[1] 1000000.0 850000.0 722500.0 614125.0 522006.2 443705.3
I'm not exactly sure what the right upper limit is on the number to send to the inner function. sqrt(length(vec)) is too short, but sqrt(2*length(vec)) seems to be "working" at lower numbers anyway.
> sapply( sapply(1:sqrt(2*100), function(x) seq(1+x*(x-1)/2, length=x) ), max)
[1] 1 3 6 10 15 21 28 36 45 55 66 78 91 105
> sapply( sapply(1:sqrt(100), function(x) seq(1+x*(x-1)/2, length=x) ), max)
[1] 1 3 6 10 15 21 28 36 45 55
The following expression returns the last index reached by the sequences so formed. Using a factor of 2.1 rather than 2 corrects minor deficiencies for lengths in the range 500-1000:
tail(lapply( sapply(1:sqrt(2.1*500), function(x) seq(1+x*(x-1)/2, length=x) ), max),1 )
[[1]]
[1] 528
tail(lapply( sapply(1:sqrt(2*500), function(x) seq(1+x*(x-1)/2, length=x) ), max),1 )
[[1]]
[1] 496
Going higher did not seem to degrade the "times 2" correction. There's probably some kewl number theory explanation for this.
tail(lapply( sapply(1:sqrt(2*100000), function(x) seq(1+x*(x-1)/2, length=x) ), max),1 )
[[1]]
[1] 100128
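One possible explanation for why a factor near 2 works: group k ends at index k*(k+1)/2 (a triangular number), so covering len elements needs the smallest n with n*(n+1)/2 >= len, which is ceiling((sqrt(8*len + 1) - 1)/2), roughly sqrt(2*len). The sketch below uses that exact bound; the function name sum_tri_groups is made up, and it assumes an incomplete final group should still be summed.

```r
sum_tri_groups <- function(vec) {
  len <- length(vec)
  # Smallest n such that n * (n + 1) / 2 >= len
  n <- ceiling((sqrt(8 * len + 1) - 1) / 2)
  # Group labels 1, 2, 2, 3, 3, 3, ... truncated to the vector length
  groups <- rep(seq_len(n), seq_len(n))[seq_len(len)]
  as.vector(tapply(vec, groups, sum))
}

sum_tri_groups(1:6)
sum_tri_groups(1:4)
```

With 1:6 the groups are complete (1, 2+3, 4+5+6); with 1:4 the third group holds only the value 4.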
Alternatively, a much more naive method is:
sums <- colSums(data)
n <- 0   # the current group covers n + 1 elements
i <- 1   # index where the current group starts
newVec <- vector("numeric")
while (i <= length(sums)) {
  # clip the last group if the vector runs out early
  last <- min(i + n, length(sums))
  newVec <- c(newVec, sum(sums[i:last]))
  # the next group starts right after this one and is one element longer
  i <- i + n + 1
  n <- n + 1
}
newVec
Here's a similar approach, using rep(...) and by(...)
n <- (-1+sqrt(1+8*length(a)))/2 # number of groups
groups <- rep(1:n,1:n) # indexing vector
result <- as.vector(by(a,groups,sum))
result
# [1] 1000000.0 850000.0 722500.0 614125.0 522006.2 443705.3
