How to convert a hex string to text in R?

Is there a function which converts a hex string to text in R?
For example:
I have the hex string 1271763355662E324375203137, which should be converted to qv3Uf.2Cu 17.
Does someone know a good solution in R?

Here's one way:
s <- '1271763355662E324375203137'
h <- sapply(seq(1, nchar(s), by=2), function(x) substr(s, x, x+1))
rawToChar(as.raw(strtoi(h, 16L)))
## [1] "\022qv3Uf.2Cu 17"
And if you want, you can sub out non-printable characters as follows:
gsub('[^[:print:]]+', '', rawToChar(as.raw(strtoi(h, 16L))))
## [1] "qv3Uf.2Cu 17"
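The two steps above can be wrapped into one small helper. This is only a sketch in base R: the name hex_to_text and its printable_only argument are made up for illustration, and it assumes a single hex string with an even number of digits and no embedded 00 bytes (rawToChar() rejects those unless they are filtered out).

```r
# Hypothetical helper: decode one hex string to text.
hex_to_text <- function(hex, printable_only = FALSE) {
  stopifnot(is.character(hex), length(hex) == 1L,
            nchar(hex) > 0, nchar(hex) %% 2 == 0)
  starts <- seq(1, nchar(hex), by = 2)
  # substring() is vectorised, so no sapply() is needed to cut the pairs
  bytes <- as.raw(strtoi(substring(hex, starts, starts + 1), 16L))
  if (printable_only) {
    # keep printable ASCII only (0x20 to 0x7e)
    bytes <- bytes[bytes >= as.raw(0x20) & bytes <= as.raw(0x7e)]
  }
  rawToChar(bytes)
}

hex_to_text("1271763355662E324375203137", printable_only = TRUE)
## [1] "qv3Uf.2Cu 17"
```

Filtering on the raw bytes avoids the regular expression, at the cost of also dropping any non-ASCII bytes.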

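Just to add to @jbaums' answer, or to simplify it: the wkb package can do the splitting for you. hex2raw() already returns a raw vector, so it can be passed straight to rawToChar():
library(wkb)
hex_string <- '231458716E234987'
hex_raw <- wkb::hex2raw(hex_string)
text <- rawToChar(hex_raw)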

An alternative way that separates the two parts involved:
1. Turn the initial string into a vector of bytes (with values as hexadecimals).
2. Convert those raw bytes into characters (excluding any that are not printable).
Part 1:
s <- '1271763355662E324375203137'
sc <- unlist(strsplit(s, ""))
i1 <- (1:nchar(s)) %% 2 == 1
# vector of bytes (as character)
s_pairs1 <- paste0(sc[i1], sc[!i1])
# make it explicit that this is a series of hexadecimal bytes
s_pairs2 <- paste0("0x", s_pairs1)
head(s_pairs2)
#> [1] "0x12" "0x71" "0x76" "0x33" "0x55" "0x66"
Part 2:
s_raw1 <- as.raw(s_pairs2)
# filter non printable values (ascii < 32 = 0x20)
s_raw2 <- s_raw1[s_raw1 >= as.raw("0x20")]
rawToChar(s_raw2)
#> [1] "qv3Uf.2Cu 17"
We could also use the as.hexmode() function to turn s_pairs1 into a vector of hexadecimals:
s_pairs2 <- as.hexmode(s_pairs1)
head(s_pairs2)
#> [1] "12" "71" "76" "33" "55" "66"
Created on 2023-01-03 by the reprex package (v2.0.1)

Related

Convert character string to number in R

I am trying to convert a character string to a number in R. Example:
a=1; b=2
If my input is "abba", I want my output to be a+b+b+a = 1+2+2+1 = 6.
Here's my attempt so far:
str= "abba"
paste(unlist(strsplit(unlist(str_extract_all(str, "[aA-zZ]+")), split = "")),collapse="+")
[1] "a+b+b+a"
I don't know how to convert this to numeric since as.numeric() returns NA. Any help is appreciated!
Another option is to set factor levels for the letters, like this:
str= "abba"
sum(as.numeric(factor(unlist(strsplit(str, "")), levels = letters)))
#> [1] 6
Created on 2022-09-28 with reprex v2.0.2
You could use a data.frame to translate your letters to numbers, and look the numbers up with match after using strsplit:
translation <- data.frame(letter = letters,
                          number = 1:26)
str <- "abba"
sum(translation$number[match(strsplit(str, "")[[1]], translation$letter)])
#> [1] 6
An option with chartr
eval(parse(text = chartr('ab', '12', gsub("(?<=\\w)(?=\\w)", "+",
str, perl = TRUE))))
[1] 6
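If the values are not simply the alphabet positions, a named vector works as a lookup table. The sketch below uses the a=1, b=2 mapping from the question; the names values and score are made up for illustration. A letter that has no entry in values yields NA, so the sum becomes NA as well.

```r
# Lookup table: names are the letters, elements are their values.
values <- c(a = 1, b = 2)

# Hypothetical helper: split the word into letters and sum their values.
score <- function(word, values) {
  letters_in_word <- strsplit(word, "")[[1]]
  sum(unname(values[letters_in_word]))
}

score("abba", values)
#> [1] 6
```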

How to space-pad unicode string character in R?

I am trying to pad a character with a space using sprintf() (but any base R alternative would be fine).
It works as expected for letter "a" but for "β" it won't work:
sprintf('% 2s', 'a')
#> [1] " a"
sprintf('% 2s', 'β')
#> [1] "β"
sprintf('% 3s', 'β')
#> [1] " β"
I guess it has to do with the fact that it takes two bytes (i.e., two of sprintf's "characters") to represent the "β" string. How could I change my code so that it pads with spaces in a way that treats "β" as one character (i.e., one visible character)?
Convert the string to native first. This worked for me on Windows but not on https://rdrr.io/snippets/ which reports its .Platform$os.type as unix.
s <- 'β'; n <- 3 # inputs
sprintf("%*s", n, enc2native(s)) # or hard code the 3 and drop n
## [1] " ß"
Alternatively, use paste0 or substring<- with strrep, or convert the string to X's, perform the sprintf and then convert back. These worked on Windows and on https://rdrr.io/snippets/ .
# 2
paste0(strrep(' ', n - nchar(s)), s)
## [1] " β"
# 3
`substring<-`(strrep(" ", n), n - nchar(s) + 1, n, s)
## [1] " ß"
# 4
sub("X+", s, sprintf("%*s", n, strrep("X", nchar(s))))
## [1] " β"
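The paste0/strrep variant can be made vectorised and safe for strings that are already longer than the target width. This is a sketch only; pad_left is a made-up name, and it counts characters with nchar(type = "chars") rather than bytes (nchar(type = "width") would count display columns instead, which differs for wide East Asian characters).

```r
# Hypothetical helper: left-pad each string with spaces to `width` characters.
pad_left <- function(s, width) {
  # pmax() guards against negative counts, which strrep() rejects
  n_pad <- pmax(width - nchar(s, type = "chars"), 0)
  paste0(strrep(" ", n_pad), s)
}

# "\u03b2" is the Greek letter beta, written as an escape to keep the source ASCII
padded <- pad_left(c("a", "\u03b2", "abcd"), 3)
nchar(padded)
## [1] 3 3 4
```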

How to insert back a character in a string at the exact position where it was originally

I have strings that contain dots here and there. I would like to remove them (done), perform some other operations (also done), and then insert the dots back at their original positions (not done). How could I do that?
library(stringr)
stringOriginal <- c("abc.def","ab.cd.ef","a.b.c.d")
dotIndex <- str_locate_all(pattern ='\\.', stringOriginal)
stringModified <- str_remove_all(stringOriginal, "\\.")
I see that str_sub() may help, for example str_sub(stringModified[2], 3,2) <- "." gets me somewhere, but it is still far from the right place, and also I have no idea how to do it programmatically. Thank you for your time!
Update:
stringOriginal <- c("11.123.100","11.123.200","1.123.1001")
stringOriginalF <- as.factor(stringOriginal)
dotIndex <- str_locate_all(pattern ='\\.', stringOriginal)
stringModified <- str_remove_all(stringOriginal, "\\.")
stringNumFac <- sort(as.numeric(stringModified))
stringi::stri_sub(stringNumFac[1:2], 3, 2) <- "."
stringi::stri_sub(stringNumFac[1:2], 7, 6) <- "."
stringi::stri_sub(stringNumFac[3], 2, 1) <- "."
stringi::stri_sub(stringNumFac[3], 6, 5) <- "."
factor(stringOriginal, levels = stringNumFac)
After such manipulation, I am able to order the numbers, convert them back to strings and use them later for plotting.
But since I wouldn't know the position of the dots in advance, I want to do this programmatically. Another approach for factor ordering is also welcome, although I am still curious about how to programmatically insert a character back into a string at the exact position where it was originally.
This might be one of the cases for using base R's strsplit, which gives you a list, with a vector of substrings for each entry in your original vector. You can manipulate these with lapply or sapply very easily.
split_string <- strsplit(stringOriginal, "[.]")
#> split_string
#> [[1]]
#> [1] "11" "123" "100"
#>
#> [[2]]
#> [1] "11" "123" "200"
#>
#> [[3]]
#> [1] "1" "123" "1001"
Now you can do this to get the numbers
sapply(split_string, function(x) as.numeric(paste0(x, collapse = "")))
# [1] 11123100 11123200 11231001
And this to put the dots (or any replacement for the dots) back in:
sapply(split_string, paste, collapse = ".")
# [1] "11.123.100" "11.123.200" "1.123.1001"
And you could get the location of the dots within each element of your original vector like this (the last cumulative sum is dropped because it points one past the end of the string, not at a dot):
lapply(split_string, function(x) head(cumsum(nchar(x) + 1), -1))
# [[1]]
# [1] 3 7
#
# [[2]]
# [1] 3 7
#
# [[3]]
# [1] 2 6
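To answer the literal question (putting a character back at its original position), one possible base R sketch is shown here. The helper names dot_positions and insert_at are made up for illustration, and the approach assumes the operations performed in between neither add nor remove characters.

```r
# Record where each "." sits in the original strings (empty if there is none).
dot_positions <- function(x) {
  lapply(gregexpr(".", x, fixed = TRUE), function(m) as.integer(m[m > 0]))
}

# Re-insert `ch` at the given original positions, working left to right,
# so that each inserted character shifts the rest into its original place.
insert_at <- function(s, pos, ch = ".") {
  for (p in sort(pos)) {
    s <- paste0(substr(s, 1, p - 1), ch, substr(s, p, nchar(s)))
  }
  s
}

stringOriginal <- c("abc.def", "ab.cd.ef", "a.b.c.d")
pos <- dot_positions(stringOriginal)
stringModified <- gsub(".", "", stringOriginal, fixed = TRUE)

restored <- mapply(insert_at, stringModified, pos, USE.NAMES = FALSE)
restored
# [1] "abc.def"  "ab.cd.ef" "a.b.c.d"
```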

Extract numbers after a pattern in vector of characters

I'm trying to extract values from a vector of strings. Each string in the vector (there are about 2300 of them) follows the pattern of the example below:
"733|Overall (-2 to 2): _________2________________|How controversial is each sentence (1-5)?|Sent. 1 (ANALYSIS BY...): ________1__________|Sent. 2 (Bail is...): ____3______________|Sent. 3 (2) A...): _______1___________|Sent. 4 (3) A...): _______1___________|Sent. 5 (Proposition 100...): _______5___________|Sent. 6 (In 2006,...): _______3___________|Sent. 7 (That legislation...): ________1__________|Types of bias (check all that apply):|Pro Anti|X O Word use (bold, add alternate)|X O Examples (italicize)|O O Extra information (underline)|X O Any other bias (explain below)|Last sentence makes it sound like an urgent matter.|____________________________________________|NA|undocumented, without a visa|NA|NA|NA|NA|NA|NA|NA|NA|"
What I'd like is to extract the numbers following the pattern "Sent. " and place them into a separate vector. For the example, I'd like to extract "1311531".
I'm having trouble using gsub to accomplish this.
library(tidyverse)
Data <- c("PASTE YOUR WHOLE STRING")
str_locate(Data, "Sent. ")
Reference <- str_locate_all(Data, "Sent. ") %>% as.data.frame()
Reference %>% names() #Returns [1] "start" "end"
Reference <- Reference %>% mutate(end = end +1)
YourNumbers <- substr(Data,start = Reference$end[1], stop = Reference$end[1])
for (i in 2:dim(Reference)[1]){
Temp <- substr(Data,start = Reference$end[i], stop = Reference$end[i])
YourNumbers <- paste(YourNumbers, Temp, sep = "")
}
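YourNumbers #Returns "1234567" (the character right after each "Sent. ", i.e. the sentence index, not the value on the underscore line)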
We can use str_match_all from stringr to get all the numbers that follow "Sent".
str_match_all(text, "Sent.*?_+(\\d+)")[[1]][, 2]
#[1] "1" "3" "1" "1" "5" "3" "1"
A base R option using strsplit and sub
lapply(strsplit(ss, "\\|"), function(x)
sub("Sent.+: _+(\\d+)_+", "\\1", x[grepl("^Sent", x)]))
#[[1]]
#[1] "1" "3" "1" "1" "5" "3" "1"
Sample data
ss <- "733|Overall (-2 to 2): _________2________________|How controversial is each sentence (1-5)?|Sent. 1 (ANALYSIS BY...): ________1__________|Sent. 2 (Bail is...): ____3______________|Sent. 3 (2) A...): _______1___________|Sent. 4 (3) A...): _______1___________|Sent. 5 (Proposition 100...): _______5___________|Sent. 6 (In 2006,...): _______3___________|Sent. 7 (That legislation...): ________1__________|Types of bias (check all that apply):|Pro Anti|X O Word use (bold, add alternate)|X O Examples (italicize)|O O Extra information (underline)|X O Any other bias (explain below)|Last sentence makes it sound like an urgent matter.|____________________________________________|NA|undocumented, without a visa|NA|NA|NA|NA|NA|NA|NA|NA|"
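For the full vector of about 2300 strings, the strsplit/sub idea can be wrapped so that each string yields one collapsed value such as "1311531". This is a sketch; extract_sent_values is a made-up name, and demo is a shortened, hypothetical stand-in for the real data.

```r
# Hypothetical helper: one collapsed string of "Sent." values per input element.
extract_sent_values <- function(x) {
  vapply(strsplit(x, "|", fixed = TRUE), function(fields) {
    sent <- fields[startsWith(fields, "Sent.")]
    # keep only the digits sitting between the underscores
    paste0(sub("^Sent.+: _+(\\d+)_*$", "\\1", sent), collapse = "")
  }, character(1))
}

# Shortened illustrative input (not the real data)
demo <- c(
  "733|Overall (-2 to 2): ___2___|Sent. 1 (ANALYSIS BY...): ___1___|Sent. 2 (Bail is...): _3____|",
  "734|Sent. 1 (Foo...): __5__|NA|"
)
extract_sent_values(demo)
#[1] "13" "5"
```

An element without any "Sent." field produces an empty string.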

Simple loop for changing file names doesn't work

I wrote the following loop to convert user input, which can be single-, two- or three digit numbers, into all three digit numbers; such that an input vector [7, 8, 9, 10, 11] would be converted into an output vector [007, 008, 009, 010, 011]. This is my code:
zeroes <- function(id){
for(i in 1:length(id)){
if(id[i] <= 9){
id[i] <- paste("00", id[i], sep = "")
}
else if(id[i] >= 10 && id[i] <= 99){
id[i] <- paste("0", id[i], sep = "")
}
}
id
}
For an input vector
id <- 50:100
I get the following output:
[1] "050" "0051" "0052" "0053" "0054" "0055" "0056" "0057" "0058" "0059"
[11] "0060" "0061" "0062" "0063" "0064" "0065" "0066" "0067" "0068" "0069"
[21] "0070" "0071" "0072" "0073" "0074" "0075" "0076" "0077" "0078" "0079"
[31] "0080" "0081" "0082" "0083" "0084" "0085" "0086" "0087" "0088" "0089"
[41] "090" "091" "092" "093" "094" "095" "096" "097" "098" "099"
[51] "00100"
So, it looks like for id[1] the function works, then there is a bug for the following numbers, but for id[41:50], I get the correct output again. I haven't been able to figure out why this is the case, and what I am doing wrong. Any suggestions are warmly welcomed.
It's because when you do the first replacement on id in your function, the whole vector becomes character (an atomic vector can't store both numbers and character strings).
So zeroes(51) works fine:
> zeroes(51)
[1] "051"
but if it's the second item, it fails:
> zeroes(c(50,51))
[1] "050" "0051"
because by the time your loop gets to the 51, it's actually "51" in quotes. And that fails:
> zeroes("51")
[1] "0051"
because "51" is less than 9:
> "51"<9
[1] TRUE
because R converts the 9 to a "9" and then does a character comparison, so only the "5" gets compared with the "9", and "5" comes before "9" in the collating sequence.
Other languages might convert the character "51" to numeric and then compare with the numeric 9 and say "51"<9 is False, but R does it this way.
Lesson: don't overwrite your input vectors! (and use sprintf).
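As a sketch of the sprintf route: the input is left untouched and a new character vector is returned. zero_pad is a made-up name, and as.integer() is an assumption here that truncates any non-integer input.

```r
# Hypothetical helper: zero-pad integers to a fixed width without a loop.
zero_pad <- function(id, width = 3) {
  fmt <- paste0("%0", width, "d")   # e.g. "%03d"
  sprintf(fmt, as.integer(id))
}

zero_pad(c(7, 8, 9, 10, 11))
# [1] "007" "008" "009" "010" "011"

# The comparison pitfall from above, and the explicit conversion that avoids it:
"51" < 9               # TRUE: compared as strings
as.numeric("51") < 9   # FALSE: compared as numbers
```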
