Create decimal values from character hex in R

With the following code I'm reading a text file and displaying one column on the screen.
externalData <- read.delim("testdata.txt", header = FALSE, sep = "#")
i <- 1
while (i < 11) {
  # value in row i, column 3
  appel <- as.character(externalData[i, 3])
  i <- i + 1
  print(appel)
}
The output looks like this:
I'm trying to convert these values from hexadecimal to decimal.
I've tried the following:
strtoi(c(appel))
But this doesn't seem to work: it only removes the quotation marks from the first and last values and sets everything in between to NA (probably because they contain letters).

Here are 3 ways to convert hexadecimal (character) to decimal (numeric):
x <- c("158308", "bb1787", "853f91")
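# 1. strtoi() with an explicit base
strtoi(x, base = 16L)
# 2. via the "hexmode" class
as.integer(as.hexmode(x))
# 3. R parses strings with a "0x" prefix as hexadecimal
as.integer(paste0("0x", x))
# more general version, for input that may already carry the "0x" prefix:
# as.integer(ifelse(!grepl("^0x", x), paste0("0x", x), x))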
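One caveat: strtoi() and as.integer() return integers, so hex values above 7fffffff (2147483647) come back as NA. A minimal sketch that returns doubles instead, assuming the inputs are plain hex strings with or without a 0x prefix; hex_to_num is a hypothetical helper name, not a base R function:

```r
# Hypothetical helper: hex strings -> doubles, so values above the 32-bit
# integer limit (7fffffff) do not become NA. Exact up to 2^53.
hex_to_num <- function(x) {
  x <- as.character(x)
  # add the "0x" prefix where it is missing (0x or 0X are both accepted)
  needs_prefix <- !grepl("^0[xX]", x)
  x[needs_prefix] <- paste0("0x", x[needs_prefix])
  # as.numeric() parses "0x..." strings as hexadecimal
  as.numeric(x)
}

hex_to_num(c("158308", "0xd8db89", "ffffffff"))
```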

From ?strtoi
Convert strings to integers according to the given base using the C function strtol, or choose a suitable base following the C rules.
Arguments
x a character vector, or something coercible to this by as.character.
base an integer which is between 2 and 36 inclusive, or zero (default).
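The "C rules" for base = 0 are those of strtol(): a 0x or 0X prefix means hexadecimal, a leading 0 means octal, and anything else is read as decimal. A quick illustration with made-up inputs:

```r
# base = 0L lets strtol() pick the base from each string's prefix
strtoi(c("0x1A", "077", "10"), base = 0L)
# hexadecimal 1A = 26, octal 77 = 63, decimal 10 = 10
```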
1. Create a minimal reproducible example
appel <- c("158308", "d8db89")
2. Solution using the base argument of strtoi:
strtoi(appel, base=16)
Returns:
[1] 1409800 14211977
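Applied to the question's data, the while loop isn't needed at all, because strtoi() is vectorised. A sketch, assuming the hex strings sit in column 3 of a '#'-separated file; the inline text below is a made-up stand-in for testdata.txt:

```r
# Made-up stand-in for testdata.txt: '#'-separated, hex strings in column 3
txt <- "a#x#158308\nb#y#d8db89\nc#z#853f91"

# colClasses = "character" keeps values such as "158308" from being read as numbers
externalData <- read.delim(text = txt, header = FALSE, sep = "#",
                           colClasses = "character")

# convert the whole column in one call instead of looping row by row
appel <- externalData[[3]]
dec <- strtoi(appel, base = 16L)
print(dec)
```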

Related

Display numbers with commas in console

I am working with a bunch of large numbers. I know how to convert numbers to comma format (from "Comma separator for numbers in R?"). What I don't know is how to display numbers in the console with commas without converting them from numeric. I want to see the commas so I can compare numbers while working, but I need to keep the numbers numeric to do calculations. I know how to get rid of scientific notation (from "How to disable scientific notation?"), but I can't find an equivalent for a comma or dollar format.
You could create a new method for print(), for a custom class I will call "bignum":
print.bignum <- function(x, ...) {
  # show commas and no scientific notation; the stored numbers are unchanged
  print(format(x, scientific = FALSE, big.mark = ",", trim = TRUE))
  invisible(x)
}
x <- c(1e6, 2e4, 5e8)
class(x) <- c(class(x), "bignum")
x
[1] "1,000,000" "20,000" "500,000,000"
x * 2
[1] "2,000,000" "40,000" "1,000,000,000"
y <- x + 1
y
[1] "1,000,001" "20,001" "500,000,001"
class(y) <- "numeric"
y
[1] 1000001 20001 500000001
For any numeric object x, if you add "bignum" to its class attribute via class(x) <- c(class(x), "bignum"), it will print with commas as you described, but otherwise behave like a numeric vector, as shown above.
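If the quotation marks in that output get in the way, a small variation is to print the formatted strings with quote = FALSE. This is a sketch, not part of the answer above: as_bignum() is a hypothetical helper name, and the values are the ones from the example. Arithmetic keeps the extra class because, with no Ops method for "bignum", R copies the operands' attributes onto the result.

```r
# Hypothetical constructor: tag a numeric vector with the extra "bignum" class
as_bignum <- function(x) {
  stopifnot(is.numeric(x))
  class(x) <- unique(c(class(x), "bignum"))
  x
}

# Print method: commas, no scientific notation, no quotes; returns x invisibly
print.bignum <- function(x, ...) {
  print(format(unclass(x), scientific = FALSE, big.mark = ",", trim = TRUE),
        quote = FALSE)
  invisible(x)
}

x <- as_bignum(c(1e6, 2e4, 5e8))
x
x * 2
```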

How to refactor a vector?

I have this vector
v <- c("firstOne","firstTwo","secondOne")
I would like to turn the vector into a factor, assigning c("firstOne","firstTwo") to the same level (i.e., firstOne). I have tried this:
> factor(v, labels = c("firstOne", "firstOne", "secondOne"))
[1] firstOne firstOne secondOne
Levels: firstOne firstOne secondOne
But I get duplicated levels (and a warning message advising against this). Instead, I would like the output to look like:
[1] firstOne firstOne secondOne
Levels: firstOne secondOne
Is there any way to get this output without brutally substituting the character strings?
Here are a couple of options:
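# Option 1: recode the strings, then build the factor
f <- factor(ifelse(v %in% c("firstOne", "firstTwo"), "firstOne", "secondOne"))
# Option 2: values not listed in levels become NA, which are then set to "firstOne"
f <- factor(v, levels = c("firstOne", "secondOne")); f[is.na(f)] <- "firstOne"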
A factor is just a numeric (integer) vector with labels, so manipulating a factor amounts to manipulating integers rather than character strings. Therefore, performance-wise, it is perfectly OK to do:
f <- as.factor(v)
f[f %in% c('firstOne', 'firstTwo')] <- 'firstOne'
f <- droplevels(f)
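Base R also offers a one-step alternative: assigning a named list to levels() merges old levels into new ones (each name is a new level, each element lists the old levels that map to it). A sketch using the vector from the question:

```r
v <- c("firstOne", "firstTwo", "secondOne")
f <- factor(v)
# new level "firstOne" absorbs the old levels "firstOne" and "firstTwo"
levels(f) <- list(firstOne = c("firstOne", "firstTwo"), secondOne = "secondOne")
f
```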
You could use the rec() function of the sjmisc package:
rec(v, "firstTwo=firstOne;else=copy", as.fac = TRUE)
[1] firstOne firstOne secondOne
Levels: firstOne secondOne
(The output is shortened; note that the sjmisc package supports labelled data and thus adds label attributes to the vector, which you'll see in the console output as well.)
Eventually I also found a solution that looks somewhat sloppy, although I don't see any major issues with it (I'd be glad to hear about possible problems, though):
v <- c("firstOne","firstTwo","secondOne")
factor(v)
factor(factor(v,labels = c("firstOne","firstOne","secondOne")))
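For what it's worth, newer versions of R (3.4.0 onward, as far as I know) accept duplicated labels in factor() and merge them into a single level, which gives the desired output directly without the second factor() call. A sketch, assuming such a version:

```r
v <- c("firstOne", "firstTwo", "secondOne")
# levels fixes which original value gets which label; duplicated labels are merged
f <- factor(v, levels = c("firstOne", "firstTwo", "secondOne"),
            labels = c("firstOne", "firstOne", "secondOne"))
f
```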

Finding the position of a character within a string

I am trying to find the equivalent of the SAS ANYALPHA function in R. This function searches a character string for an alphabetic character and returns the first position at which such a character is found.
Example: for the string '123456789A', the ANYALPHA function would return 10, since the first alphabetic character is at position 10 in the string. I would like to replicate this function in R but have not been able to figure it out. I need to search for any alphabetic character regardless of case (i.e. [:alpha:]).
Thanks for any help you can offer!
Here's an anyalpha function. I added a few extra features: you can specify the maximum number of matches you want with the n argument (it defaults to 1), and you can get the matched characters instead of their positions with value=TRUE:
anyalpha <- function(txt, n = 1, value = FALSE) {
  txt <- as.character(txt)
  # positions of all alphabetic characters in the first string (-1 if there are none)
  indx <- gregexpr("[[:alpha:]]", txt)[[1]]
  if (indx[1] == -1) return(NA)
  # keep at most n positions
  ret <- indx[seq_len(min(n, length(indx)))]
  if (value) {
    substring(txt[1], ret, ret)   # the matched characters themselves
  } else {
    ret
  }
}
#test
x <- '123A56789BC'
anyalpha(x)
#[1] 4
anyalpha(x, 2)
#[1] 4 10
anyalpha(x, 2, value=TRUE)
#[1] "A" "B"
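If only the first position is needed, as with ANYALPHA, regexpr() already returns it for every element of a character vector, with -1 where nothing matches. A minimal vectorised sketch; first_alpha is a hypothetical name, and mapping -1 to NA is an arbitrary choice made here:

```r
# First alphabetic position in each string; NA when there is none
first_alpha <- function(txt) {
  pos <- regexpr("[[:alpha:]]", as.character(txt))
  pos <- as.integer(pos)        # drop the match.length/useBytes attributes
  pos[pos == -1L] <- NA_integer_
  pos
}

first_alpha(c("123456789A", "123A56789BC", "12345"))
```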

Convert binary vector to decimal

I have a vector of a binary string:
a<-c(0,0,0,1,0,1)
I would like to convert this vector into decimal.
I tried using the compositions package and its unbinary() function; however, this solution and most others I have found on this site require a g-adic string as the input argument.
My question is how can I convert a vector rather than a string to decimal?
To illustrate the problem:
library(compositions)
unbinary("000101")
[1] 5
This gives the correct solution, but:
unbinary(a)
unbinary("a")
unbinary(toString(a))
produces NA.
You could try this function
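bitsToInt <- function(x) {
  # pad with leading zeros to a multiple of 32 bits (no padding if already a multiple),
  # then reverse, because packBits() expects the least significant bit first
  packBits(rev(c(rep(FALSE, (32 - length(x) %% 32) %% 32), as.logical(x))), "integer")
}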
a <- c(0,0,0,1,0,1)
bitsToInt(a)
# [1] 5
Here we skip the character conversion, and only base functions are used. It is meant for up to 31 bits, since the result is a single 32-bit signed integer.
It is likely that
unbinary(paste(a, collapse=""))
would have worked should you still want to use that function.
There is a one-liner solution:
Reduce(function(x,y) x*2+y, a)
Explanation:
Expanding the application of Reduce results in something like:
Reduce(function(x,y) x*2+y, c(0,1,0,1,0)) = (((0*2 + 1)*2 + 0)*2 + 1)*2 + 0 = 10
For each subsequent bit, we double the value accumulated so far and then add that bit.
See also the documentation of the Reduce() function.
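In formula form, with bits b_1, ..., b_n ordered from most to least significant, the Reduce() one-liner evaluates the positional sum by Horner's rule (the sum on the left is also what bintodec() further down computes):

```latex
\mathrm{value} = \sum_{i=1}^{n} b_i \, 2^{\,n-i}
  = \Bigl(\cdots\bigl((b_1 \cdot 2 + b_2)\cdot 2 + b_3\bigr)\cdot 2 + \cdots\Bigr)\cdot 2 + b_n,
\qquad \text{e.g. } (0,0,0,1,0,1) \mapsto 1\cdot 2^{2} + 1\cdot 2^{0} = 5
```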
If you'd like to stick to using compositions, just convert your vector to a string:
library(compositions)
a <- c(0,0,0,1,0,1)
achar <- paste(a,collapse="")
unbinary(achar)
[1] 5
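Without the compositions dependency, base R's strtoi() can parse the same pasted string with base = 2. A sketch; note that strtoi() returns an integer, so it only covers up to 31 bits (larger values come back as NA):

```r
a <- c(0, 0, 0, 1, 0, 1)
# paste the bits into "000101", then parse that string as a base-2 number
strtoi(paste(a, collapse = ""), base = 2L)
```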
This function will do the trick.
bintodec <- function(y) {
# find the decimal number corresponding to binary sequence 'y'
if (! (all(y %in% c(0,1)))) stop("not a binary sequence")
res <- sum(y*2^((length(y):1) - 1))
return(res)
}
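If there are many bit vectors of the same length, the same weights that bintodec() uses can be applied to all of them at once with a matrix product. A sketch assuming one bit vector per row, most significant bit first; the matrix m is made up:

```r
# each row is one binary number, most significant bit first
m <- rbind(c(0, 0, 0, 1, 0, 1),
           c(1, 1, 1, 1, 1, 1),
           c(1, 0, 0, 0, 0, 0))
# weights 2^(k-1), ..., 2, 1 for k columns
weights <- 2^((ncol(m) - 1):0)
# one decimal value per row
dec <- drop(m %*% weights)
dec
```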

Vector-version / Vectorizing a for which equals loop in R

I have a vector of values, call it X, and a data frame, call it dat.fram. I want to run something like "grep" or "which" to find all the indices of dat.fram[,3] which match each of the elements of X.
Below is the very inefficient for loop I'm using. Note that X has many observations and each element of match.ind can have zero or more matches. Also, dat.fram has over 1 million observations. Is there a vectorized way in R to make this process more efficient?
Ultimately, I need a list, since I will pass it to another function that retrieves the appropriate values from dat.fram.
Code:
match.ind <- vector("list", length(X))   # preallocate one slot per element of X
for (i in seq_along(X)) {
  match.ind[[i]] <- which(dat.fram[, 3] == X[i])
}
UPDATE:
Ok, wow, I just found an awesome way of doing this... it's really slick. Wondering if it's useful in other contexts...?!
### define v as a sample column of data - you should define v to be
### the column in the data frame you mentioned (dat.fram[,3])
v = sample(1:150000, 1500000, replace = TRUE)
### now here's the trick: concatenate the indices for each possible value of v,
### to form mybiglist - the rownames of mybiglist give you the possible values
### of v, and the values in mybiglist give you the index points
mybiglist = tapply(seq_along(v),v,c)
### now you just want the parts of this that intersect with X... again I'll
### generate a random X but use whatever X you need to
X = sample(1:200000, 150000)
mylist = mybiglist[which(names(mybiglist)%in%X)]
And that's it! As a check, let's look at the first 3 elements of mylist:
> mylist[1:3]
$`1`
[1] 401143 494448 703954 757808 1364904 1485811
$`2`
[1] 230769 332970 389601 582724 804046 997184 1080412 1169588 1310105
$`4`
[1] 149021 282361 289661 456147 774672 944760 969734 1043875 1226377
There's a gap at 3, as 3 doesn't appear in X (even though it occurs in v). And the
numbers listed against 4 are the index points in v where 4 appears:
> which(X==3)
integer(0)
> which(v==3)
[1] 102194 424873 468660 593570 713547 769309 786156 828021 870796
[10] 883932 1036943 1246745 1381907 1437148
> which(v==4)
[1] 149021 282361 289661 456147 774672 944760 969734 1043875 1226377
Finally, it's worth noting that values that appear in X but not in v won't have an entry in the list, but this is presumably what you want anyway, since they have no matches!
Extra note: You can use the code below to create an NA entry for each member of X not in v...
blanks = sort(setdiff(X,names(mylist)))
mylist_extras = rep(list(NA),length(blanks))
names(mylist_extras) = blanks
mylist_all = c(mylist,mylist_extras)
mylist_all = mylist_all[order(as.numeric(names(mylist_all)))]
Fairly self-explanatory: mylist_extras is a list with one entry for each value of X that is missing from names(mylist), and each of those entries is simply NA. The final two lines first merge mylist and mylist_extras, and then reorder the result so that the names in mylist_all are in numeric order. These names should then match exactly the (unique) values in the vector X.
Cheers! :)
ORIGINAL POST BELOW... superseded by the above, obviously!
Here's a toy example with tapply that might well run significantly quicker... I made X and d relatively small so you could see what's going on:
X = 3:7
n = 100
d = data.frame(a = sample(1:10, n, replace = TRUE), b = sample(1:10, n, replace = TRUE),
c = sample(1:10, n, replace = TRUE), stringsAsFactors = FALSE)
tapply(X,X,function(x) {which(d[,3]==x)})
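For the list the question asks for (one element per value of X, in X's order, with empty results kept), split() does the grouping in one pass, much like the tapply(seq_along(v), v, c) trick above. A sketch with made-up toy data standing in for dat.fram[, 3] and X; col is a hypothetical name:

```r
set.seed(1)
col <- sample(1:10, 100, replace = TRUE)   # stand-in for dat.fram[, 3]
X <- c(3, 7, 12, 5)                        # 12 never occurs in col

# group row indices by value; values of col not in X become NA and are dropped,
# values of X not in col become empty integer vectors
groups <- split(seq_along(col), factor(col, levels = unique(X)))

# one element per element of X, in X's order (duplicates in X are handled too)
match.ind <- unname(groups[as.character(X)])
str(match.ind)
```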
