Suppose I have a string
age<-c("7y2m4d","5m4d","7y5m6d")
I want to convert it to a numeric vector like
c(7.34, 0.43, 7.43)
How can I make the R code?
We can assume there is 365 days in a year and 365/12 days in a month.
lubridate::duration will convert your strings to (approximate) seconds.
library(lubridate)
library(magrittr)
age <- c("7y2m4d", "5m4d", "7y5m6d")
age_sec <- age %>%
duration() %>%
as.numeric()
age_sec
[1] 226508400 13494600 234570600
Then you can approximate years as 365 * 24 * 60 * 60 seconds:
age_sec / (365 * 24 * 60 * 60)
[1] 7.182534 0.427911 7.438185
Another solution with base R:
age<-c("7y2m4d","5m4d","7y5m6d")
age <- gsub('y', ' + ', age)
age <- gsub('m', ' / 12 + ', age)
age <- gsub('d', ' / 365', age)
sapply(age, function(x) eval(parse(text = x)))
#7 + 2 / 12 + 4 / 365 5 / 12 + 4 / 365 7 + 5 / 12 + 6 / 365
# 7.1776256 0.4276256 7.4331050
The idea is to create the formula and then evaluate it for each element of your vector.
These solutions:
handle missing y, m and/or d and
give the same answer as in the question (except for the first element of age for which the question appears to have computed the answer incorrectly)
avoid the use of eval
only use base (except for alternative 1a)
Comparing the solutions below on the basis of simplicity (1a) is the simplest and automatically handles all the edge cases without specific code for them suggesting that it is the most natural; however, it does make use of a package. (1) is only slightly more complex and uses no packages and (2) pretty short and also does not use any packages but it is not as simple as (1) or (1a).
1) Here getNum extracts and returns the number from x associated with the code (the code is "y", "m" or "d") or if the code is not present in x returns 0. We then add up the year, month/12 and day/365.
getNum <- function(code, x) {
pat <- sprintf(".*?(\\d+)%s.*", code)
as.numeric(ifelse(grepl(code, x), sub(pat, "\\1", x), 0))
}
getNum("y", age) + getNum("m", age) / 12 + getNum("d", age) / 365
## [1] 7.1776256 0.4276256 7.4331050
1a) This is similar to (1) except that we use strapply in gsubfn to simplify getNum. In fact getNum reduces to a single strapply call and the regular expression it uses is also simpler.
library(gsubfn)
getNum <- function(code, x) {
strapply(x, paste0("(\\d+)", code), as.numeric, empty = 0, simplify = TRUE)
}
getNum("y", age) + getNum("m", age) / 12 + getNum("d", age) / 365
## [1] 7.1776256 0.4276256 7.4331050
2) This alternative converts each string to dcf format and uses read.dcf to create a matrix of the y, m and d numbers.
In detail, the first line of code is to handle certain edge cases which are not actually present in the sample data in the question. We first append 0d to age (from the question) if d is missing so that we can handle the case where y, m and d are all missing. We also prepend a dummy entry to ensure that y, m and d are present in at least one entry. If we knew that y, m and d were present in at least one component and there was no component in which y, m and d were all simultaneously missing then this first line of code could be omitted.
The second line of code converts each input character string to dcf form and reads it into a matrix ensuring that the columns are in a known order and deleting the dummy entry added above.
Finally we replace NAs with 0 and and use matrix multiplication to add up the year, month/12 and day/365.
a0 <- c("0y0m0d", paste0(age, ifelse(grepl("d", age), "", "0d")))
m <- read.dcf(textConnection(gsub("(\\d+)(\\D)", "\\2: \\1\n", a0)))[-1, c("y", "m", "d")]
m[is.na(m)] <- 0
c(array(as.numeric(m), dim(m)) %*% c(1, 1/12, 1/365))
## [1] 7.1776256 0.4276256 7.4331050
Update: Rearranged and added (1) and (1a).
Related
I have a column in a dataset that looks like this :
Actual
chr
5.25%
-5.50*1000000000
0.24%
-4.00*1000
4.5%
My goal is to access it and automatically convert the cells that have *1000 or *1000000000 and make the calculation, ex -5.5 * 1000000000 should be - 5 500 000 000 on the cell and -4 * 1000 should be -4000.
Does anyone have a hint how to do this?
Best regards
This can be done using first a splitting operation on * and then a mapping operation based on purrr's function map_dbl to perform the calculations:
library(purrr)
library(dplyr)
df %>%
# Step 1: split strings on `*`:
mutate(x_new = strsplit(x,"\\*")) %>%
# Step 2: convert to numeric and perform calculation:
mutate(x_new = ifelse(str_detect(x_new, ","),
map_dbl(x_new, function(x) as.numeric(x)[1] * as.numeric(x)[2]),
x_new))
x x_new
1 -5.50*1000000000 -5.5e+09
2 35% 35%
3 -4.00*1000 -4000
(warning messages can be ignored)
Test data:
df <- data.frame(x = c("-5.50*1000000000", "35%", "-4.00*1000"))
If your string is guaranteed to be a valid expression that R can evaluate literally, you can use
eval(parse(text = '-5*1000'))
This parses the string into R code equivalent, and then executes it using eval. In this case, it leads to a numerical result of -5000.
Tread with care. More background on using eval(parse) can be found here
After trying for an embarrassingly long time and extensive searches online, I come to you with a problem.
I am looking for a method to (non-randomly) shuffle a string to get a string which has the maximal ‘distance’ from the original one, while still containing the same set of characters.
My particular case is for short nucleotide sequences (4-8 nt long), as represented by these example sequences:
seq_1<-"ACTG"
seq_2<-"ATGTT"
seq_3<-"ACGTGCT"
For each sequence, I would like to get a scramble sequence which contains the same nucleobase count, but in a different order.
A favourable scramble sequence for seq_3 could be something like;
seq_3.scramble<-"CATGTGC"
,where none of the sequence positions 1-7 has the same nucleobase, but the overall nucleobase count is the same (A =1, C = 2, G= 2, T=2). Naturally it would not always be possible to get a completely different string, but these I would just flag in the output.
I am not particularly interested in randomising the sequence and would prefer a method which makes these scramble sequences in a consistent manner.
Do you have any ideas?
python, since I don't know r, but the basic solution is as follows
def calcDistance(originalString,newString):
d = 0
i=0
while i < len(originalString):
if originalString[i] != newString[i]: d=d+1
i=i+1
s = "ACTG"
d_max = 0
s_final = ""
for combo in itertools.permutations(s):
if calcDistance(s,combo) > d_max:
d_max = calcDistance(s,combo)
s_final = combo
Give this a try. Rather than return a single string that fits your criteria, I return a data frame of all strings sorted by their string-distance score. String-distance score is calculated using stringdist(..., ..., method=hamming), which determines number of substitutions required to convert string A to B.
seq_3<-"ACGTGCT"
myfun <- function(S) {
require(combinat)
require(dplyr)
require(stringdist)
vec <- unlist(strsplit(S, ""))
P <- sapply(permn(vec), function(i) paste(i, collapse=""))
Dist <- c(stringdist(S, P, method="hamming"))
df <- data.frame(seq = P, HD = Dist, fixed=TRUE) %>%
distinct(seq, HD) %>%
arrange(desc(HD))
return(df)
}
library(combinat)
library(dplyr)
library(stringdist)
head(myfun(seq_3), 10)
# seq HD
# 1 TACGTGC 7
# 2 TACGCTG 7
# 3 CACGTTG 7
# 4 GACGTTC 7
# 5 CGACTTG 7
# 6 CGTACTG 7
# 7 TGCACTG 7
# 8 GTCACTG 7
# 9 GACCTTG 7
# 10 GATCCTG 7
I have an excel file of a list of sequences. How would I go about getting the number of times a letter appears before a letter in square brackets? An example of an entry is below.
GTCCTGGTTGTAGCTGAAGCTCTTCCC[A]CTCCTCCCGATCACTGGGACGTCCTATGT
I'd also like to do this for the letter after the square brackets.
Edit: Apologies for the confusion. Take the example below. Id like to count how many times A, C, G, and T appears immediately before and after the letter in square brackets (for which there is only one per line). So to count the occurences of A[A]A, A[A]C, C[A]A, and so on. The file is in excel, and I'm happy to use any method in excel, R or in Linux.
CCCACCCGCCAGGAAGCCGCTATCACTGTCCAAGTTGTCATCGGAACTCC[A]CCAGCCTGTGGACTTGGCCTGGTGCCGCCCATCCCCCTTGCGGTCCTTGC
ACCACTACCCCCTTCCCCACCATCCACCTCAGAAGCAGTCCCAGCCTGCC[A]CCCGCCAGCCCCTGCCCAGCCCTGGCTTTTTGGAAACGGGTCAGGATTGG
TTTGCTTTAAAATACTGCAACCACTCCAGGTAAATCTTCCGCTGCCTATA[A]CCCCGCCAATGAGCCTGCACATCAGGAGAGAAAGGGAAGTAACTCAAGCA
GAAATCTTCTGAAACAGTCTCCAGAAGACTGTCTCCAAATACACAGCAGA[A]CCAGCCAGTCCACAGCACTTTACCTTCTCTATTCTCAGATGGCAATTGAG
GGACTGCCCCAAGGCCCGCAGGGAGGTGGAGCTGCACTGGCGGGCCTCCC[A]GTGCCCGCACATCGTACGGATCGTGGATGTGTACGAGAATCTGTACGCAG
GGCCCAACGCCATCCTGAAACTCACTGACTTTGGCTTTGCCAAGGAAACC[A]CCAGCCACAACTCTTTGACCACTCCTTGTTATACACCGTACTATGTGGGT
TCTGCCTGGTCCGCTGGAGCTGGGCATTGAAGCCCCGCAGCTGCTCAGCC[A]CCTGCCCCGCCATCAAGAAGGCCCCACCGGCCCTGGGAAGGACACCCCTG
TTTGAAGCCCTTATGAACCAAGAAACCTTCGTTCAGGACCTCAAAATCAA[A]CCCCGCCACATGCAGCTCGCAGGCCTGCAGGAGGAAAGACAGGTTAGCAA
CTGCAGCCTACCTGTCCATGTCCCAGGGGGCCGTTGCCAACGCCAACAGC[A]CCCCGCCGCCCTATGAGCGTACCCGCCTCTCCCCACCCCGGGCCAGCTAC
ACTGGCAAACATGTTGAGGACAATGATGGAGGGGATGAGCTTGCATAGGA[A]CCTGCCGTAGGGCCACTGTCCCTGGAGAGCCAAGTGAGCCAGCGAGAAGG
CACCCTCAGAGAAGAAGAAAGGAGCTGAGGAGGAGAAGCCAAAGAGGAGG[A]GGCAGGAGAAGCAGGCAGCCTGCCCCTTCTACAACCACGAGCAGATGGGC
CCAGCCCTGTATGAGGACCCCCCAGATCAGAAAACCTCACCCAGTGGCAA[A]CCTGCCACACTCAAGATCTGCTCTTGGAATGTGGATGGGCTTCGAGCCTG
TTCCTGTGCGCCCCAACAACTCCTTTAGCTGGCCTAAAGTGAAAGGACGG[A]CCTGCCAATGAAAATAGACTTTCAGGGTCTAGCAGAAGGCAAGACCACCA
CTAACACCCGCACGAGCTGCTGGTAGATCTGAATGGCCAAGTCACTCAGC[A]CCTGCCGATACTCAGCCAGGTCAAAATTGGTGAGGCAGTGTTCATTCTGG
AGTTCTGCATCTGGAGCAAATCCTTGGCACTCCCTCATGCTGGCTATCAC[A]CCTGCCACGAATGTGCCATGGCCCAACCCTGCAGTCCATAAAGAAAACAA
CGTGCCCATGCAGCTAGTGCTCTTCCGAGAGGCTATTGAACACAGTGAGC[A]CCTGCCACGCCTATCCCCTTCCCCATCATCTCAGTGATGGGGTATGTCTA
ACAAGGACCTGGCCCTGGGGCAGCCCCTCAGCCCACCTGGTCCCTGCCTT[A]CCCAGCCAGTACTCTCCATCAGCACGGCCGAAGCCCAGCTTGTAGTCATT
You could split the original string into parts. From the start of the string to the first [ and from the first ] to the end of the string.
int count = firstPart.Count(f => f == 'a');
count += secondPart.Count(f => f == 'a');
Option Explicit
Sub test()
Dim seq As String
seq = "GTCCTGGTTGTAGCTGAAGCTCTTCCC[A]CTCCTCCCGATCACTGGGACGTCCTATGT"
Debug.Print CountLetter("A", seq)
End Sub
Function CountLetter(letter As String, ByVal sequence As String) As Long
'--- assumes the letter in the brackets is the same as that being counted
Dim allLetters() As String
allLetters = Split("A,C,G,T", ",")
Dim letterToDelete As Variant
For Each letterToDelete In allLetters
If letterToDelete <> letter Then
sequence = Replace(sequence, letterToDelete, "")
End If
Next letterToDelete
CountLetter = Len(sequence) - 1
End Function
x = "GTCCTGGTTGTAGCTGAAGCTCTTCCC[A]CTCCTCCCGATCACTGGGACGTCCTATGT"
#COUNT 'A'
sapply(unlist(strsplit(x,"\\[[A-z]\\]")), function(a) length(unlist(gregexpr("A", a))))
# GTCCTGGTTGTAGCTGAAGCTCTTCCC CTCCTCCCGATCACTGGGACGTCCTATGT
# 3 4
#COUNT 'G'
sapply(unlist(strsplit(x,"\\[[A-z]\\]")), function(a) length(unlist(gregexpr("G", a))))
# GTCCTGGTTGTAGCTGAAGCTCTTCCC CTCCTCCCGATCACTGGGACGTCCTATGT
# 7 6
New R solution (after clarification by OP)
Let's assume the data have been read from Excel into a data.table called los (list of sequences) which has only one column called sequence. Then, the occurences can be counted as follows:
library(data.table)
los[, .N, by = stringr::str_extract(sequence, "[ACGT]\\[[ACGT]\\][ACGT]")]
# stringr N
#1: C[A]C 8
#2: A[A]C 5
#3: C[A]G 1
#4: G[A]G 1
#5: G[A]C 1
#6: T[A]C 1
str_extract() looks for one of the letters A, C, G, T followed by [ followed by one of the letters A, C, G, T followed by ] followed by one of the letters A, C, G, T in column sequence and extracts the matching substrings. Then, los is grouped by the substrings and the number of occurences is counted (.N).
Data
If the Excel file is stored in CSV format then it can be read using data.table's fread() function like this
los <- fread("your_file_name.csv")
(Perhaps, some parameters to fread() might need to be adjusted for the specific file.)
However, some data already are provided in the question. These can be read as character string using fread() as well:
los <- fread("sequence
CCCACCCGCCAGGAAGCCGCTATCACTGTCCAAGTTGTCATCGGAACTCC[A]CCAGCCTGTGGACTTGGCCTGGTGCCGCCCATCCCCCTTGCGGTCCTTGC
ACCACTACCCCCTTCCCCACCATCCACCTCAGAAGCAGTCCCAGCCTGCC[A]CCCGCCAGCCCCTGCCCAGCCCTGGCTTTTTGGAAACGGGTCAGGATTGG
TTTGCTTTAAAATACTGCAACCACTCCAGGTAAATCTTCCGCTGCCTATA[A]CCCCGCCAATGAGCCTGCACATCAGGAGAGAAAGGGAAGTAACTCAAGCA
GAAATCTTCTGAAACAGTCTCCAGAAGACTGTCTCCAAATACACAGCAGA[A]CCAGCCAGTCCACAGCACTTTACCTTCTCTATTCTCAGATGGCAATTGAG
GGACTGCCCCAAGGCCCGCAGGGAGGTGGAGCTGCACTGGCGGGCCTCCC[A]GTGCCCGCACATCGTACGGATCGTGGATGTGTACGAGAATCTGTACGCAG
GGCCCAACGCCATCCTGAAACTCACTGACTTTGGCTTTGCCAAGGAAACC[A]CCAGCCACAACTCTTTGACCACTCCTTGTTATACACCGTACTATGTGGGT
TCTGCCTGGTCCGCTGGAGCTGGGCATTGAAGCCCCGCAGCTGCTCAGCC[A]CCTGCCCCGCCATCAAGAAGGCCCCACCGGCCCTGGGAAGGACACCCCTG
TTTGAAGCCCTTATGAACCAAGAAACCTTCGTTCAGGACCTCAAAATCAA[A]CCCCGCCACATGCAGCTCGCAGGCCTGCAGGAGGAAAGACAGGTTAGCAA
CTGCAGCCTACCTGTCCATGTCCCAGGGGGCCGTTGCCAACGCCAACAGC[A]CCCCGCCGCCCTATGAGCGTACCCGCCTCTCCCCACCCCGGGCCAGCTAC
ACTGGCAAACATGTTGAGGACAATGATGGAGGGGATGAGCTTGCATAGGA[A]CCTGCCGTAGGGCCACTGTCCCTGGAGAGCCAAGTGAGCCAGCGAGAAGG
CACCCTCAGAGAAGAAGAAAGGAGCTGAGGAGGAGAAGCCAAAGAGGAGG[A]GGCAGGAGAAGCAGGCAGCCTGCCCCTTCTACAACCACGAGCAGATGGGC
CCAGCCCTGTATGAGGACCCCCCAGATCAGAAAACCTCACCCAGTGGCAA[A]CCTGCCACACTCAAGATCTGCTCTTGGAATGTGGATGGGCTTCGAGCCTG
TTCCTGTGCGCCCCAACAACTCCTTTAGCTGGCCTAAAGTGAAAGGACGG[A]CCTGCCAATGAAAATAGACTTTCAGGGTCTAGCAGAAGGCAAGACCACCA
CTAACACCCGCACGAGCTGCTGGTAGATCTGAATGGCCAAGTCACTCAGC[A]CCTGCCGATACTCAGCCAGGTCAAAATTGGTGAGGCAGTGTTCATTCTGG
AGTTCTGCATCTGGAGCAAATCCTTGGCACTCCCTCATGCTGGCTATCAC[A]CCTGCCACGAATGTGCCATGGCCCAACCCTGCAGTCCATAAAGAAAACAA
CGTGCCCATGCAGCTAGTGCTCTTCCGAGAGGCTATTGAACACAGTGAGC[A]CCTGCCACGCCTATCCCCTTCCCCATCATCTCAGTGATGGGGTATGTCTA
ACAAGGACCTGGCCCTGGGGCAGCCCCTCAGCCCACCTGGTCCCTGCCTT[A]CCCAGCCAGTACTCTCCATCAGCACGGCCGAAGCCCAGCTTGTAGTCATT")
Old solution (before clarification by OP) - left here for reference
This is a solution in base R with help of the stringr package which will work with a "list" of sequences (a data.frame), any single letter enclosed in square brackets, and arbitrary lengths of the sequences. It assumes that the data already have been read from file into a data.frame which is named los here.
# create data: data frame with two sequences
los <- data.frame(
sequence = c("GTCCTGGTTGTAGCTGAAGCTCTTCCC[A]CTCCTCCCGATCACTGGGACGTCCTATGT",
"GTCCTGGTTGTAGCTGAAGCTCTTCCCACT[C]CTCCCGATCACTGGGACGTCCTATGT"))
# split sequences in three parts
mat <- stringr::str_split_fixed(los$sequence, "[\\[\\]]", n = 3)
los$letter <- mat[, 2]
los$n_before <- stringr::str_count(mat[, 1], mat[, 2])
los$n_after <- stringr::str_count(mat[, 3], mat[, 2])
print(los)
# sequence letter n_before n_after
#1 GTCCTGGTTGTAGCTGAAGCTCTTCCC[A]CTCCTCCCGATCACTGGGACGTCCTATGT A 3 4
#2 GTCCTGGTTGTAGCTGAAGCTCTTCCCACT[C]CTCCCGATCACTGGGACGTCCTATGT C 9 9
Note this code works best if there is exactly one pair of square brackets in each sequence. Any additional brackets will be ignored.
It will also work if there is more than just one letter enclosed in brackets, e.g., [GT].
I'm confessing that I'm addicted to Hadley Wickham's stringr package because I have difficulties to remember the inconsistently named base R functions for string maninpulation like strsplit, grepl, sub, match, gregexpr, etc. To understand what I mean please have a look at the Usage and See Also sections of ?grep and compare to stringr.
I would think that R packages for bioinformatics, such as seqinr or Biostrings, would be a good starting point. However, here's a "roll your own" solution.
First step: get your data from Excel into R. I will assume that file mydata.xlsx contains one sheet with a column of sequence and no header. You need to adapt this for your file and sheet format.
library(readxl)
sequences <- read_excel("mydata.xlsx", col_names = FALSE)
colnames(sequences) <- "sequence"
Now you need a function to extract the base in square brackets and the bases at -1 and +1. This function uses the stringr package to extract bases using regular expressions.
get_bases <- function(seq) {
require(stringr)
require(magrittr)
subseqs <- str_match(seq, "^([ACGT]+)\\[([ACGT])\\]([ACGT]+)$")
bases <- list(
before = subseqs[, 2] %>% str_sub(-1, -1),
base = subseqs[, 3],
after = subseqs[, 4] %>% str_sub(1, 1)
)
return(bases)
}
Now you can pass the column of sequences to the function to generate a list of lists, which can be converted to a data frame.
library(purrr)
sequences_df <- lapply(sequences, get_bases) %>%
map_df(as.data.frame, stringsAsFactors = FALSE)
head(sequences_df, 3)
before base after
1 C A C
2 C A C
3 A A C
The last step is to use functions from dplyr and tidyr to count up the bases.
library(tidyr)
sequences_df %>%
gather(position, letter, -base) %>%
group_by(base, position, letter) %>%
tally() %>%
spread(position, n) %>%
select(base, letter, before, after)
Result using your 17 example sequences. I would use better names than I did if I were you: base = the base in square brackets, letter = the base being counted, before = count at -1, after = count at +1.
base letter before after
* <chr> <chr> <int> <int>
1 A A 5 NA
2 A C 9 15
3 A G 2 2
4 A T 1 NA
i have a code using R language, i want to sum all data frame (df$number is unlist result in 'res')
total result is = [1] 1 3 5 7 9 20 31 42
digits <- function(x){as.integer(substring(x, seq(nchar(x)), seq(nchar(x))))}
generated <- function(x){ x + sum(digits(x))}
digitadition <- function(x,N) { c(x, replicate(N-1, x <<- generated(x))) }
res <- NULL
for(i in 0:50){
for(j in 2:50){
tmp <- digitadition(i,j)
IND <- 50*(i-1) + (j-1) - (i-1) #to index results
res[IND] <- tmp[length(tmp)]
}
}
df <- data.frame(number = unlist(res), generator=rep(1:50, each=49), N=2:50)
total <- table(df$number)[as.numeric(names(table(df$number)))<=50]
setdiff(1:50, as.numeric(names(total)))
sum(total)
i'm using sum(total) but the result of summary is '155' it is not the right answer, cause the right answer is '118'
what the spesific code to sum the 'total'?
thank you.
I ran your code and I think you may be confused on what you want to sum.
You setdiff contains the values 1 3 5 7 9 20 31 42 which sum is 118.
So, if you do sum(setdiff(1:50, as.numeric(names(total)))), you'll get the 118 you are looking for.
Your total variable is different from this. Let me explain what you are doing and what I think you should do.
Your code: total <- table(df$number)[as.numeric(names(table(df$number)))<=50]]
When you table(), you get each unique value from the vector, and the number of how many times this number appears on your vector.
And when you get the names() of this table, you get each of these unique values as a character, that's why you are setting as.numeric.
But the function unique() do this job for you, he extracts uniques values from a vector.
Here's what you can do: total <- unique(df$number[which(df$number <= 50)])
Where which() get the ID's of values <= 50, and unique extracts unique values of these ID's.
And finally: sum(setdiff(1:50, total)) that sums all the values from 1 to 50 that are not in your total vector.
And in my opinion, sum(setdiff(total, 1:50)) its more intuitive.
Let's say I have a number in decimal format: 5
its binary version is: 00101
I would like to write a function that takes the decimal number x
and returns all other decimal numbers that have a single digit difference (in their binary forms) from the original one:
so for the example above the neighbors are:
10101 01101 00111 00001 00100
and the corresponding decimals are:
21 13 7 1 4
I would like a solution that is computationally efficient and doesn't take a long time even if I have say a million digits.
Is this possible to do?
I've no idea how trial and error got me here, but it looks valid unless I've messed up binaries and decimals:
bin_neighs = function(x, n) bitwXor(x, (2 ^ (0:(n - 1))))
bin_neighs(5, 5)
#[1] 4 7 1 13 21
I think you're asking how to take as input a number 5 and to return all neighboring binary values. To do this, you need to convert the number to a useful binary format (just the bits you want to flip), flip each bit, and return the result:
library(R.utils)
bin.neighbors <- function(x, num.neighbors=NA) {
# Get the bits with the appropriate amount of padding
bits <- as.numeric(unlist(strsplit(intToBin(x), "")))
if (!is.na(num.neighbors) & num.neighbors > length(bits)) {
bits <- c(rep(0, num.neighbors-length(bits)), bits)
}
# Build a matrix where each column is a bit vector of a neighbor
mat <- matrix(bits, length(bits), length(bits))
diag(mat) <- 1-diag(mat)
# Return the decimal values of the neighbors using strtoi
apply(mat, 2, function(x) strtoi(paste0(x, collapse=""), 2))
}
bin.neighbors(5, 5)
# [1] 21 13 1 7 4
Because each number has a number of binary representations with different numbers of leading 0s (e.g. 5 can be represented as 101, 0101, 00101, 000101, 0000101, etc.), I added an argument num.neighbors to specify the length of the output vector from the function. You can pass NA to obtain an output vector equal to the number of bits in the binary representation of the input with no leading zeros.
Here's another way using magrittr's pipe:
binNeighbours <- function(a, numNeighbours = ceiling(log2(a))) {
rep(a, numNeighbours) %>%
outer(., seq(.) - 1, function(x, y) x %/% (2 ^ y) %% 2) %>%
`diag<-`(., 1 - diag(.)) %>%
`%*%`(2 ^(0:(nrow(.) - 1))) %>%
`[`(, 1)
}