Encoding whole numbers in R to a base 62 character vector - r

What's a quick way to encode either integer values or numeric whole number values in R to a character vector in base 62 encoding, i.e. a string that only contains [a-zA-Z0-9]? Would translating the answer to this question be sufficient?
converting a number base 10 to base 62 (a-zA-Z0-9)
Edited
Here's my solution:
toBase <- function(num, base=62) {
bv <- c(seq(0,9),letters,LETTERS)
r <- num %% base
res <- bv[r+1]
q <- floor(num/base)
while (q > 0L) {
r <- q %% base
q <- floor(q/base)
res <- paste(bv[r+1],res,sep='')
}
res
}
to10 <- function(num, base=62) {
bv <- c(seq(0,9),letters,LETTERS)
vb <- list()
for (i in 1:length(bv)) vb[[bv[i]]] <- i
num <- strsplit(num,'')[[1]]
res <- vb[[num[1]]]-1
if (length(num) > 1)
for (i in 2:length(num)) res <- base * res + (vb[[num[i]]]-1)
res
}
Is that missing anything?

Here's a solution that does base 36 using [0-9A-Z] that could easily be adapted for base 62 using [a-zA-Z0-9]. And yes, it's basically just a translation of the solution to the other question you linked to.
https://github.com/graywh/r-gmisc/blob/master/R/baseConvert.R

Here's a variant of the above code that allows you to convert a vector of numbers to base 16. It's not particularly elegant, as it isn't vectorized, but it gets the job done.
toBase <- function(num, base=16) {
bv <- c(0:9,letters,LETTERS)
r <- list()
q <- list()
res <- list()
for(i in 1:length(num)){
r[i] <- num[i] %% base
res[i] <- bv[r[[i]]+1]
q[i] <- floor(num[i]/base)
while (q[[i]] > 0L) {
r[i] <- q[[i]] %% base
q[i] <- floor(q[[i]]/base)
res[i] <- paste(bv[r[[i]]+1],res[[i]],sep='')
}
}
return(do.call('c', res))
}
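As a rough alternative to the explicit loop above, the scalar conversion can be wrapped in vapply so that a whole vector is encoded in one call. This is only a sketch: the names digits62, encode_one, to_base and from_base are made up for illustration, and the digit order (0-9, then a-z, then A-Z) is assumed to match bv in the code above.

```r
# Assumed digit alphabet: 0-9, a-z, A-Z (same order as bv above)
digits62 <- c(0:9, letters, LETTERS)

# Hypothetical helper: encode a single non-negative whole number
encode_one <- function(n, base = 62) {
  out <- character(0)
  repeat {
    out <- c(digits62[n %% base + 1], out)  # prepend the next digit
    n <- n %/% base
    if (n == 0) break
  }
  paste(out, collapse = "")
}

# Hypothetical wrapper: encode every element of a numeric vector
to_base <- function(x, base = 62) {
  vapply(x, encode_one, character(1), base = base)
}

# Hypothetical decoder: turn the strings back into numbers
from_base <- function(s, base = 62) {
  vapply(strsplit(s, ""), function(ch) {
    vals <- match(ch, digits62) - 1
    sum(vals * base^(rev(seq_along(vals)) - 1))
  }, numeric(1))
}

to_base(c(0, 61, 62))   # "0" "Z" "10"
```

Very large values lose precision once they exceed what a double can represent exactly, so this sketch is only meant for ordinary whole numbers.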

To make this more standard, you should implement it in a similar way to the conversion to hexadecimal. (see here for naming.)
as.exindadeomode <- function(x)
{
#Give x a class of "exindadeomode"
#Contents as per as.hexmode
}
format.exindadeomode <- function (x, width = NULL, upper.case = FALSE, ...)
{
#Return appropriate characters
#Contents as per format.hexmode
}
To convert back to integer, just strip the class, using as.integer.
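The skeleton above could be filled in roughly as follows. This is a sketch under assumptions, not the hexmode implementation itself: the class name base62mode is made up for illustration, the digit order 0-9, a-z, A-Z is assumed, and the width and upper.case arguments of format.hexmode are left out.

```r
# Hypothetical constructor: tag non-negative whole numbers with a class
as.base62mode <- function(x) {
  stopifnot(all(x >= 0), all(x == floor(x)))
  structure(x, class = "base62mode")
}

# S3 format method: return the base 62 digits as strings
format.base62mode <- function(x, ...) {
  digits <- c(0:9, letters, LETTERS)
  vapply(unclass(x), function(n) {
    out <- character(0)
    repeat {
      out <- c(digits[n %% 62 + 1], out)
      n <- n %/% 62
      if (n == 0) break
    }
    paste(out, collapse = "")
  }, character(1))
}

# S3 print method: show the formatted strings without quotes
print.base62mode <- function(x, ...) {
  print(format(x), quote = FALSE)
  invisible(x)
}

x <- as.base62mode(c(10, 62))
format(x)       # "a" "10"
as.integer(x)   # 10 62, class stripped
```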

Related

Function with variables that are different dataframe sizes

So I'm trying to run the function below hoping to get 224 vectors in the output, but I only get one and I have no idea why.
ee <- 0.95
td <- 480
tt <- c(60,10,14,143,60)
tt <- as.data.frame(tt)
r <- vector()
m <- function(d)
{
n <- length(tt)
c <- nrow(d)
for (j in 1:c)
{
for (i in 1:n)
{
r[i] <- tt[i]/(td*ee/d[j,])
}
return(r)
}
#where d is a data frame of 224 obs. of 1 variable
and the output I'm getting is
[[1]]
[1] 1026.3158 171.0526 239.4737 2446.0526 1026.3158
The problem is that your function returns as soon as the first r vector has been computed, because return is placed inside the outer loop.
One way to do this is to store the results in a list:
r <- vector()
m_bis <- function(d) {
res <- list() # store all the vectors here
n <- length(tt)
c <- nrow(d)
for (j in 1:c) {
for (i in 1:n) {
r[i] <- tt[i] / (td * ee / d[j,])
}
res[j] <- r
}
return(res)
}
That should yield something like this:
m_bis(as.data.frame(mtcars$mpg))
> [[1]]
[1] 2.7631579 0.4605263 0.6447368 6.5855263 2.7631579
...
[[32]]
[1] 2.8157895 0.4692982 0.6570175 6.7109649 2.8157895
Use vectorization to speed up the computation:
outer(as.vector(tt[,1]), as.vector(d[,1]), function(x,y){x*y/(td*ee)})
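As a rough check of the outer() approach, the sketch below compares it with a per-row computation on a tiny input. The two-row data frame d and the names loop_res and vec_res are made up for illustration; the real d has 224 rows.

```r
ee <- 0.95
td <- 480
tt <- data.frame(tt = c(60, 10, 14, 143, 60))
# Hypothetical two-row stand-in for the 224-row data frame d
d <- data.frame(x = c(21, 22.8))

# Per-row version: one vector for each row of d
loop_res <- lapply(d[, 1], function(dj) tt[, 1] / (td * ee / dj))

# Vectorized version: column j holds the vector for row j of d
vec_res <- outer(tt[, 1], d[, 1], function(x, y) x * y / (td * ee))

all.equal(vec_res[, 1], loop_res[[1]])   # TRUE
```

The result is a 5 x 2 matrix here; with the full data it would be 5 x 224, one column per observation.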

How does Big Integer math work in R gmp package?

As a self-taught programmer, I was unaware of libraries such as gmp and wrote several "Big Integer" functions myself to handle large integer arithmetic. The key idea driving my algorithms is storing my large integers of interest in an array, i.e. each index represents a single digit of the number (e.g. 123456789 would be stored in the array as (1,2,3,4,5,6,7,8,9)). An example of big integer addition is below.
MyBigIntegerAddition <- function(x1, x2) {
MyNum1 <- as.integer(strsplit(as.character(x1), "")[[1]])
MyNum2 <- as.integer(strsplit(as.character(x2), "")[[1]])
if (length(MyNum1) < length(MyNum2)) {
while (length(MyNum1) < length(MyNum2)) {MyNum1 <- c(0L, MyNum1)}
} else if (length(MyNum2) < length(MyNum1)) {
while (length(MyNum2) < length(MyNum1)) {MyNum2 <- c(0L, MyNum2)}
}
MyNum1 <- MyNum1 + MyNum2
lenMyNum1 <- length(MyNum1)
if (lenMyNum1 >= 2L) {
for (j in lenMyNum1:2L) {
TempB1 <- MyNum1[j] %% 10L
TempB2 <- floor(MyNum1[j]/ 10L)
MyNum1[j] <- TempB1
MyNum1[j - 1L] <- MyNum1[j - 1L] + TempB2
}
}
while ((MyNum1[1L] / 10L) > 1L) {
TempB1 <- MyNum1[1L] %% 10L
TempB2 <- floor(MyNum1[1L]/ 10L)
MyNum1[1L] <- TempB1
MyNum1 <- c(TempB2, MyNum1)
}
paste(MyNum1, collapse = "")
}
Below is a random example comparing output
MyBigIntegerAddition("103489710232857289034750289347590984710923874","298723475623746529875692873456927834569298347569237")
[1] "298723579113456762732981908207217182160283058493111"
> add.bigz("103489710232857289034750289347590984710923874","298723475623746529875692873456927834569298347569237")
Big Integer ('bigz') :
[1] 298723579113456762732981908207217182160283058493111
I have provided a function that verifies my results as well.
TestStringMath <- function(n,Lim1,Lim2) {
samp1 <- sample(Lim1:Lim2,n)
samp2 <- sample(Lim1:Lim2,n)
count <- 0L
for (i in 1:n) {
temp1 <- add.bigz(samp1[i], samp2[i])
temp2 <- as.bigz(MyBigIntegerAddition(samp1[i], samp2[i]))
if (!(temp1==temp2)) {
count <- count+1L
}
}
count
}
Question: How exactly do gmp's arithmetic functions work? Do they convert numbers to strings and use arrays? Do they simply use more memory?

Converting a function to accept input directly in r

I was reading a book and I came across this function in R. This function basically finds out patterns in the input string having a minimum threshold of 3.
vec <- "da0abcab0abcaab0d0"
find_rep_path <- function(vec, reps) {
regexp <- paste0(c("(.+)", rep("\\1", reps - 1L)), collapse = "")
match <- regmatches(vec, regexpr(regexp, vec, perl = TRUE))
substr(match, 1, nchar(match) / reps)
}
vals <- unique(strsplit(vec, "")[[1]])
str <- NULL
for (i in seq.int(nchar(vec))) {
x <- vec
for (v in vals) {
substr(x, i, i) <- v
tmp <- find_rep_path(x, 3)
if (length(tmp) > 0)
str <- c(str, tmp)
}
}
nc <- nchar(str)
unique(str[which(nc == max(nc))])
Now, I wish to convert this function into the form like,
function("da0abcab0abcaab0d0"). This means, that I can easily pass a string to the function directly and not hardcode it in the original function. How can I modify this?
I know this is a beginner question but I am completely at sea right now as far as R is concerned. Please help!
I don't see how it's hardcoded. But you can just wrap your code into a function if that's what you mean?
# Function 1
find_rep_path <- function(vec, reps) {
regexp <- paste0(c("(.+)", rep("\\1", reps - 1L)), collapse = "")
match <- regmatches(vec, regexpr(regexp, vec, perl = TRUE))
substr(match, 1, nchar(match) / reps)
}
# Function 2
foo <- function(vec) {
vals <- unique(strsplit(vec, "")[[1]])
str <- NULL
for (i in seq.int(nchar(vec))) {
x <- vec
for (v in vals) {
substr(x, i, i) <- v
tmp <- find_rep_path(x, 3)
if (length(tmp) > 0)
str <- c(str, tmp)
}
}
nc <- nchar(str)
return(unique(str[which(nc == max(nc))]))
}
vec <- "da0abcab0abcaab0d0"
foo(vec)
#[1] "0ab" "abc"
Edit1
To get the positions of the matches you can use gregexpr:
a <- foo(vec)
gregexpr(a[1], vec)
#[[1]]
#[1] 3 9
#attr(,"match.length")
#[1] 3 3
#attr(,"useBytes")
#[1] TRUE
This tells you that a[1] ("0ab") was matched in vec at positions 3 and 9. Run ?gregexpr for more information.
Edit2
To add this information to each match, we can do something like
bar <- function(vec) {
m <- foo(vec)
ans <- sapply(m, gregexpr, vec, fixed = TRUE)
ans <- lapply(ans, function(x) {attributes(x) <- NULL; x})
return(ans)
}
bar(vec)
#$`0ab`
#[1] 3 9
#
#$abc
#[1] 4 10

looping through a matrix with a function

I'd like to perform this function on a matrix 100 times. How can I do this?
v = 1
m <- matrix(0,10,10)
rad <- function(x) {
idx <- sample(length(x), size=1)
flip = sample(0:1,1,rep=T)
if(flip == 1) {
x[idx] <- x[idx] + v
} else if(flip == 0) {
x[idx] <- x[idx] - v
return(x)
}
}
This is what I have so far but doesn't work.
for (i in 1:100) {
rad(m)
}
I also tried this, which seemed to work, but gave me an output of like 5226 rows for some reason. The output should just be a 10X10 matrix with changed values depending on the conditions of the function.
reps <- unlist(lapply(seq_len(100), function(x) rad(m)))
Ok I think I got it.
The return statement in your function sits inside only one branch of the if statement, so the function returns the matrix only about 50% of the time; in the other case it returns just the value of the last assignment (a single number) rather than the matrix. You should change the function to this:
rad <- function(x) {
idx <- sample(length(x), size=1)
flip = sample(0:1,1,rep=T)
if(flip == 1) {
x[idx] <- x[idx] + v
} else if(flip == 0) {
x[idx] <- x[idx] - v
}
return(x)
}
Then you can do:
for (i in 1:n) {
m <- rad(m)
}
Note that this is semantically equal to:
for (i in 1:n) {
tmp <- rad(m) # return a modified version of m (m is not changed yet)
# and put it into tmp
m <- tmp # set m equal to tmp, then in the next iteration we will
# start from a modified m
}
Calling rad(m) on its own does not change m.
Why?
The function works on a local copy of the matrix m, and that copy disappears when the function ends.
So you need to save what the function returns.
As @digEmAll wrote, the correct code is:
for (i in 1:100) {
m <- rad(m)
}
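The copy semantics described above can be seen in a small sketch. The helper bump_first and the variable names are made up for illustration.

```r
# Hypothetical helper that modifies its argument and returns it
bump_first <- function(x) {
  x[1] <- x[1] + 1
  x
}

m <- matrix(0, 2, 2)
bump_first(m)        # a modified copy is returned, then discarded
unchanged <- m[1]    # still 0: the original m was not touched

m <- bump_first(m)   # reassigning keeps the change
changed <- m[1]      # now 1
```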
You don't need a loop here. The whole operation can be vectorized.
v <- 1
m <- matrix(0,10,10)
n <- 100 # number of random replacements
idx <- sample(length(m), n, replace = TRUE) # indices
flip <- sample(c(-1, 1), n, replace = TRUE) # subtract or add
newVal <- aggregate(v * flip ~ idx, FUN = sum) # calculate new values for indices
m[newVal[[1]]] <- m[newVal[[1]]] + newVal[[2]] # add new values

R: creating a matrix with unknown number of rows

I have written the code below to generate a matrix containing what is, to me, a fairly complex pattern. In this case I determined that there are 136 rows in the finished matrix by trial and error.
I could write a function to calculate the number of matrix rows in advance, but the function would be a little complex. In this example the number of rows in the matrix = ((4 * 3 + 1) + (3 * 3 + 1) + (2 * 3 + 1) + (1 * 3 + 1)) * 4.
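For what it's worth, the row count in this particular example can be sketched in a few lines. This assumes the pattern stated above (four rows for the first value of d in each group, three rows for every other value); the names c_vals, rows_per_c and n_rows are made up for illustration.

```r
v1 <- 5
v2 <- 2
v3 <- 2

c_vals <- seq_len(v1 - 1)             # values taken by the loop variable c
rows_per_c <- 3 * (v1 - c_vals) + 1   # 13 10 7 4
n_rows <- sum(rows_per_c) * v2 * v3   # 136
```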
Is there an easy and efficient way to create matrices in R without hard-wiring the number of rows in the matrix statement? In other words, is there an easy way to let R simply add a row to a matrix as needed when using for-loops?
I have presented one solution that employs rbind at each pass through the loops, but that seems a little convoluted and I was wondering if there might be a much easier solution.
Sorry if this question is redundant with an earlier question. I could not locate a similar question using the search feature on this site or using an internet search engine today, although I think I have found a similar question somewhere in the past.
Below are 2 sets of example code, one using rbind and the other where I used trial and error to set nrow=136 in advance.
Thanks for any suggestions.
v1 <- 5
v2 <- 2
v3 <- 2
v4 <- (v1-1)
my.matrix <- matrix(0, nrow=136, ncol=(v1+4) )
i = 1
for(a in 1:v2) {
for(b in 1:v3) {
for(c in 1:v4) {
for(d in (c+1):v1) {
if(d == (c+1)) l.s = 4
else l.s = 3
for(e in 1:l.s) {
my.matrix[i,c] = 1
if(d == (c+1)) my.matrix[i,d] = (e-1)
else my.matrix[i,d] = e
my.matrix[i,(v1+1)] = a
my.matrix[i,(v1+2)] = b
my.matrix[i,(v1+3)] = c
my.matrix[i,(v1+4)] = d
i <- i + 1
}
}
}
}
}
my.matrix2 <- matrix(0, nrow=1, ncol=(v1+4) )
my.matrix3 <- matrix(0, nrow=1, ncol=(v1+4) )
i = 1
for(a in 1:v2) {
for(b in 1:v3) {
for(c in 1:v4) {
for(d in (c+1):v1) {
if(d == (c+1)) l.s = 4
else l.s = 3
for(e in 1:l.s) {
my.matrix2[1,c] = 1
if(d == (c+1)) my.matrix2[1,d] = (e-1)
else my.matrix2[1,d] = e
my.matrix2[1,(v1+1)] = a
my.matrix2[1,(v1+2)] = b
my.matrix2[1,(v1+3)] = c
my.matrix2[1,(v1+4)] = d
i <- i+1
if(i == 2) my.matrix3 <- my.matrix2
else my.matrix3 <- rbind(my.matrix3, my.matrix2)
my.matrix2 <- matrix(0, nrow=1, ncol=(v1+4) )
}
}
}
}
}
all.equal(my.matrix, my.matrix3)
If you have some upper bound on the size of the matrix, you can create a matrix large enough to hold all the data
my.matrix <- matrix(0, nrow=v1*v2*v3*v4*4, ncol=(v1+4) )
and truncate it at the end.
my.matrix <- my.matrix[1:(i-1),]
This is the generic form to do it. You can adapt it to your problem
matrix <- NULL
for(...){
...
matrix <- rbind(matrix,vector)
}
where vector contains the row elements
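A related pattern, different from calling rbind on every pass, is to collect the rows in a list and bind them once at the end. The sketch below uses a small made-up double loop; the names rows and my.matrix are illustrative only.

```r
rows <- list()                      # grows as rows are produced
for (i1 in 0:2) {
  for (i2 in 0:2) {
    rows[[length(rows) + 1]] <- c(i1, i2)
  }
}

# Bind all collected rows into a matrix in a single call
my.matrix <- do.call(rbind, rows)
dim(my.matrix)   # 9 2
```

This avoids knowing the number of rows in advance and avoids copying the whole matrix on every iteration.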
I stumbled upon this solution today: convert the matrix to a data.frame. As new rows are needed by the for-loop those rows are automatically added to the data.frame. Then you can convert the data.frame back to a matrix at the end if you want. I am not sure whether this constitutes something similar to iterative use of rbind. Perhaps it becomes very slow with large data.frames. I do not know.
my.data <- matrix(0, ncol = 3, nrow = 2)
my.data <- as.data.frame(my.data)
j <- 1
for(i1 in 0:2) {
for(i2 in 0:2) {
for(i3 in 0:2) {
my.data[j,1] <- i1
my.data[j,2] <- i2
my.data[j,3] <- i3
j <- j + 1
}
}
}
my.data
my.data <- as.matrix(my.data)
dim(my.data)
class(my.data)
EDIT: July 27, 2015
You can also delete the first matrix statement, create an empty data.frame then convert the data.frame to a matrix at the end:
my.data <- data.frame(NULL,NULL,NULL)
j <- 1
for(i1 in 0:2) {
for(i2 in 0:2) {
for(i3 in 0:2) {
my.data[j,1] <- i1
my.data[j,2] <- i2
my.data[j,3] <- i3
j <- j + 1
}
}
}
my.data
my.data <- as.matrix(my.data)
dim(my.data)
class(my.data)
