Simple loop for changing file names doesn't work - r

I wrote the following loop to convert user input, which can be one-, two- or three-digit numbers, into three-digit strings, so that an input vector [7, 8, 9, 10, 11] would be converted into the output vector [007, 008, 009, 010, 011]. This is my code:
zeroes <- function(id){
for(i in 1:length(id)){
if(id[i] <= 9){
id[i] <- paste("00", id[i], sep = "")
}
else if(id[i] >= 10 && id[i] <= 99){
id[i] <- paste("0", id[i], sep = "")
}
}
id
}
For an input vector
id <- 50:100
I get the following output:
[1] "050" "0051" "0052" "0053" "0054" "0055" "0056" "0057" "0058" "0059"
[11] "0060" "0061" "0062" "0063" "0064" "0065" "0066" "0067" "0068" "0069"
[21] "0070" "0071" "0072" "0073" "0074" "0075" "0076" "0077" "0078" "0079"
[31] "0080" "0081" "0082" "0083" "0084" "0085" "0086" "0087" "0088" "0089"
[41] "090" "091" "092" "093" "094" "095" "096" "097" "098" "099"
[51] "00100"
So, it looks like for id[1] the function works, then there is a bug for the following numbers, but for id[41:50], I get the correct output again. I haven't been able to figure out why this is the case, and what I am doing wrong. Any suggestions are warmly welcomed.

It's because when you do the first replacement on id in your function, the whole vector becomes character (because a vector can't store both numbers and characters).
So zeroes(51) works fine:
> zeroes(51)
[1] "051"
but if it's the second item, it fails:
> zeroes(c(50,51))
[1] "050" "0051"
because by the time your loop gets to the 51, it's actually "51" in quotes. And that fails:
> zeroes("51")
[1] "0051"
because "51" is less than 9:
> "51"<9
[1] TRUE
because R converts the 9 to "9" and then does a character comparison. The first characters already differ, so only the "5" gets compared with the "9", and "5" comes before "9" in the collating sequence.
Other languages might convert the character "51" to numeric and then compare with the numeric 9 and say "51"<9 is False, but R does it this way.
Lesson: don't overwrite your input vectors! (and use sprintf).
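The sprintf suggestion can be sketched as follows. This is a minimal illustration, not the asker's code: the name zeroes_fixed is made up here, and the as.integer() call assumes the input holds whole numbers.

```r
# Hypothetical replacement for zeroes(): pad to three digits without a loop.
zeroes_fixed <- function(id) {
  # "%03d" = integer, minimum width 3, padded on the left with zeros.
  # as.integer() lets whole numbers stored as doubles be formatted with %d.
  sprintf("%03d", as.integer(id))
}

padded <- zeroes_fixed(c(7, 8, 9, 10, 11, 100))
print(padded)
```

Because sprintf is vectorized and returns a new character vector, the numeric input is never overwritten and no string comparison takes place.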

Related

strsplit(rquote, split = "")[[1]] in R

rquote <- "r's internals are irrefutably intriguing"
chars <- strsplit(rquote, split = "")[[1]]
This question has been asked before on this forum and has one answer on it but I couldn't understand anything from that answer, so here I am asking this question again.
In the above code what is the meaning of [[1]] ?
The program that I'm trying to run:
rquote <- "r's internals are irrefutably intriguing"
chars <- strsplit(rquote, split = "")[[1]]
rcount <- 0
for (char in chars) {
if (char == "r") {
rcount <- rcount + 1
}
if (char == "u") {
break
}
}
print(rcount)
When I don't use [[1]] I get the following warning message in for loop and I get a wrong output of 1 for rcount instead of 5:
Warning message: the condition has length > 1 and only the first element will be used
strsplit is vectorized. That means it splits each element of a vector into a vector of pieces. To handle this vector of vectors it returns a list in which each slot (indexed by [[) corresponds to one element of the input vector.
If you use the function on a one-element vector (a single string, as you do), you get a one-slot list. Using [[1]] right after strsplit() selects the first slot of the list, which is the character vector you expected.
Unfortunately, without [[1]] your list chars still works in a for loop: you get a single iteration over its one slot. Inside if you then compare the whole vector of letters against "r", which throws the warning. Since the first element of that comparison is TRUE, the condition holds and rcount is raised by 1, which is your result. Since you are iterating over the one phrase rather than over its letters, the loop stops there.
Maybe if you run something like strsplit(c("one", "two"), split="") , the outcome will be more straightforward.
> strsplit(c("one", "two"), split="")
[[1]]
[1] "o" "n" "e"
[[2]]
[1] "t" "w" "o"
> strsplit(c("one", "two"), split="")[[1]]
[1] "o" "n" "e"
> strsplit(c("one"), split="")[[1]][2]
[1] "n"
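As a rough sketch of the same point, the count from the question can also be computed without a loop once [[1]] has extracted the character vector. The names split_list and first_u are introduced here for illustration only. Note that in R 4.2.0 and later, an if() condition of length greater than one is an error rather than a warning, so the version without [[1]] stops instead of returning 1.

```r
rquote <- "r's internals are irrefutably intriguing"

split_list <- strsplit(rquote, split = "")  # a list with one slot
chars <- split_list[[1]]                    # the character vector of letters

# Position of the first "u"; if there is none, count over the whole vector.
first_u <- match("u", chars, nomatch = length(chars) + 1L)

# Count the "r"s that occur before that position.
rcount <- sum(chars[seq_len(first_u - 1L)] == "r")
print(rcount)
```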
We'll start with the below as data, without [[1]]:
rquote <- "r's internals are irrefutably intriguing"
chars2 <- strsplit(rquote, split = "")
class(chars2)
[1] "list"
It is always good to have an estimate of your return value (your '5' above). We have both length and lengths.
length(chars2)
[1] 1 # our list
lengths(chars2)
[1] 40 # elements within our list
We'll use lengths as the upper bound of our for loop and, as you did, initialize the counter outside the loop:
rcount2 <- 0
for (i in 1:lengths(chars2)) {
if (chars2[[1]][i] == 'r') {
rcount2 <- rcount2 +1
}
if (chars2[[1]][i] == 'u') {
break
}
}
print(rcount2)
[1] 6
length(which(chars2[[1]] == 'r')) # as a check, and another way to estimate
[1] 6
Now supposing, rather than list, we have a character vector:
chars1 <- strsplit(rquote, split = '')[[1]]
length(chars1)
[1] 40
rcount1 <- 0
for(i in 1:length(chars1)) {
if(chars1[i] == 'r') {
rcount1 <- rcount1 +1
}
if (chars1[i] == 'u') {
break
}
}
print(rcount1)
[1] 5
length(which(chars1 == 'r'))
[1] 6
Hey, there's your '5'. What's going on here? Head scratch...
all.equal(chars1, unlist(chars2))
[1] TRUE
That break should give us just the 5 'r's that occur before the first 'u' is encountered. So what is happening when it's a list (or does that matter?), and how does the final 'r' make it into rcount2?
And this is where the fun begins. After a break for coffee and some thinking, the list version runs okay: it also returns 5, so the 6 above was just the usual morning hallucination. They come and go. But, as a final note, when you really want to torture yourself, put browser() inside your for loop and step through:
Browse[1]> i
[1] 24
Browse[1]> n
debug at #7: break
Browse[1]> chars2[[1]][i] == 'u'
[1] TRUE
Browse[1]> n
> rcount2
[1] 5

making for loop for character vector in R

char_vector <- c("Africa", "identical", "ending" ,"aa" ,"bb", "rain" ,"Friday" ,"transport") # character vector
Suppose I have the above character vector
I would like to create a for loop that prints only the elements of the vector that have more than 5 characters and start with a vowel,
and that also deletes from the vector those elements that do not start with a vowel.
I created this for loop, but it also prints empty results:
for (i in char_vector){
if (str_length(i) > 5){
i <- str_subset(i, "^[AEIOUaeiou]")
print(i)
}
}
The result for the above is
[1] "Africa"
[1] "identical"
[1] "ending"
character(0)
character(0)
My desired result would only be the first 3 characters
I'm really new to R and facing huge difficulty with creating a for loop for this problem. Any help would be greatly appreciated!
Use grepl with the pattern ^[AEIOUaeiuo]\w{5,}$:
char_vector <- c("Africa", "identical", "ending" ,"aa" ,"bb", "rain" ,"Friday" ,"transport")
char_vector <- char_vector[grepl("^[AEIOUaeiuo]\\w{5,}$", char_vector)]
char_vector
[1] "Africa" "identical" "ending"
The regex pattern used here says to match words which:
^ from the start of the word
[AEIOUaeiuo] starts with a vowel
\w{5,} followed by 5 or more characters (total length > 5)
$ end of the word
You don't need a for loop, because these functions are vectorized in R.
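The question actually asks for two things: print the long vowel-initial words, and drop every element that does not start with a vowel. A minimal base R sketch that keeps the two steps separate might look like this (the names starts_with_vowel, longer_than_five and to_print are illustrative, not from the original posts):

```r
char_vector <- c("Africa", "identical", "ending", "aa", "bb", "rain", "Friday", "transport")

# One logical value per element, computed for the whole vector at once.
starts_with_vowel <- grepl("^[aeiou]", char_vector, ignore.case = TRUE)
longer_than_five  <- nchar(char_vector) > 5

# Step 1: print the elements that satisfy both conditions.
to_print <- char_vector[starts_with_vowel & longer_than_five]
print(to_print)

# Step 2: keep only the elements that start with a vowel.
char_vector <- char_vector[starts_with_vowel]
print(char_vector)
```

Under these conditions "aa" survives the second step (it starts with a vowel) but is not printed in the first (it is too short).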
A simple solution using grep and substr (refer to Tim Biegeleisen's answer for details):
substr(grep('^[aeiou].{5}', char_vector, ignore.case = TRUE, value = TRUE), 1, 3)
# [1] "Afr" "ide" "end"
With stringr functions, you'd rather use str_detect instead of str_subset, and you can take advantage of the fact that those functions are vectorized:
library(stringr)
char_vector[str_length(char_vector) > 5 & str_detect(char_vector, "^[AEIOUaeiou]")]
#[1] "Africa" "identical" "ending"
or, if you want to keep your for loop and collect the matches in a single vector:
vec <- c()
for (i in char_vector){
if (str_length(i) > 5 && str_detect(i, "^[AEIOUaeiou]")){
vec <- c(vec, i)
}
}
vec
# [1] "Africa" "identical" "ending"
The first 3 characters?
library(stringr)
for (i in char_vector){
if (str_length(i) > 5 && str_detect(i, "^[AEIOUaeiou]")) {
word <- str_sub(i, 1, 3)
print(word)
}
}
output is:
[1] "Afr"
[1] "ide"
[1] "end"
Using only base R functions. No need for a loop. I wrapped the steps in a function so you can use it with other character vectors. You could make this code shorter (see @utubun's answer) but I feel it is easier to understand the process with a "one line, one step" approach.
char_vector <- c("Africa", "identical", "ending" ,"aa" ,"bb", "rain" ,"Friday" ,"transport")
yourfun <- function(char_vector){
char_vector <- char_vector[nchar(char_vector) > 5] # keep only the strings that are more than 5 characters long
char_vector <- char_vector[grep(pattern = "^[AEIOUaeiou]", char_vector)] # keep only the strings that start with a vowel
return(char_vector) # return the filtered strings
# to get the first three characters of each string instead,
# remove the return() above and uncomment the two lines below
# out <- substring(char_vector, 1, 3) # select only the first 3 characters of each string
# return(out)
}
yourfun(char_vector = char_vector)
#> [1] "Africa" "identical" "ending"
Created on 2022-05-09 by the reprex package (v2.0.1)

Setting maximum numbers of characters in number

I want to limit a number to a maximum number of characters. For example, take the value 517.1918.
If I set the maximum number of characters to three, it should give me just 517 (just the first three characters).
My work so far
I tried to split my number into two parts, the first containing the first three digits and the second containing the remaining digits, with the following code:
d_convert<-function(x){
x<-sub('(.{3})(.{2})', '\\1', x)
x
}
d_convert(12345)
And it works, but I'm not sure how I can use length(x)-3 in place of the 2 in (.{2}). I tried print(paste()) but it didn't work. Is there a simple way to do it?
Try using signif which rounds a number to a given number of significant digits.
> signif(517.1918, 3)
[1] 517
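Note that signif rounds rather than truncates, so it only matches "the first three characters" when no rounding up occurs. If plain truncation of the printed form is what is wanted, a sketch along the following lines may help. The helper names first_chars and first_chars_regex are hypothetical, both return character strings, and both rely on how R converts the number to text (very large or very small values may be rendered in scientific notation).

```r
# Hypothetical helper: keep only the first n characters of a number's text form.
first_chars <- function(x, n = 3) {
  substr(as.character(x), 1, n)
}

# Same idea with the regex from the question: ".*" matches whatever is left,
# so the length of the remainder never has to be computed.
first_chars_regex <- function(x, n = 3) {
  sub(paste0("^(.{", n, "}).*$"), "\\1", x)
}

print(first_chars(517.1918))
print(first_chars_regex(12345))
print(signif(517.9, 3))  # rounds up to 518 instead of truncating to 517
```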
I'm not sure if I understood what you want, but you can try this:
d_convert2 <-function(x, digits=3){
x <- gsub("\\D", "", x)
num_string <- strsplit(x, "")[[1]]
out <- list(digits = num_string[1L:digits], renaming = num_string[(digits+1):length(num_string)])
out <- lapply(out, paste0, collapse="")
return(out)
}
> d_convert2(12345)
$digits
[1] "123"
$renaming
[1] "45"
> d_convert2("1,234.5")
$digits
[1] "123"
$renaming
[1] "45"

Dynamically numbering files in R with placeholders using an If-elseif

I have a vector
x <- c(1,90,233)
I need to convert this to a vector of the form:
result = c("001.csv","090.csv","233.csv")
This is the function that I wrote to perform this operation:
convert <- function(x){
for (a in 1:length(x)){
if (x[a]<10) {
x[a]<- paste("00",x[a],".csv",sep="")
}
else if (x[a] < 100) {
x[a]<- paste("0", x[a], ".csv",sep="")
}
else {
x[a]<-paste(x[a],".csv",sep="")
}
}
x
}
The output I got was:
[1] "001.csv" "90.csv" "233.csv"
So x[2], which is 90, was processed in the else part and not the else if part. Then I changed the else if condition to x[a]<=99:
convert <- function(x){
for (a in 1:length(x)){
if (x[a]<10) {
x[a]<- paste("00",x[a],".csv",sep="")
}
else if (x[a] <= 99) {
x[a]<- paste("0", x[a], ".csv",sep="")
}
else {
x[a]<-paste(x[a],".csv",sep="")
}
}
x
}
I got this output:
[1] "001.csv" "090.csv" "0233.csv"
Now both x[2] and x[3], i.e. 90 and 233, are being processed in the else if part. What am I doing wrong here? And how do I get the output I need?
This is a little bit more dynamic as you do not need to specify the number of places held by the largest number.
Step 1:
Obtain the maximum number of places held.
(nb = max(nchar(x)))
To get:
3
Step 2:
Paste the number into a sprintf() call that will automatically format the digit.
sprintf("%0*d.csv", nb, x)
To get:
[1] "001.csv" "090.csv" "233.csv"
The problem is that the first pass of your loop assigns a character value, which converts the whole vector to type character. You can get around that by using nchar:
convert <- function(x){
for (a in 1:length(x)){
if (nchar(x[a]) == 1) {
x[a]<- paste("00",x[a],".csv",sep="")
}
else if (nchar(x[a]) == 2) {
x[a]<- paste("0", x[a], ".csv",sep="")
}
else {
x[a]<-paste(x[a],".csv",sep="")
}
}
x
}
sprintf("%03d", x)
[1] "001" "090" "233"
You can avoid a call to paste by including the ".csv" in the format string:
sprintf("%03d.csv", x)
[1] "001.csv" "090.csv" "233.csv"
The problem with the original code is the conversion to character, which happens on the first element.
Here's the conversion to character:
> x <- c(1, 90, 233)
> x
[1] 1 90 233
> x[1] <- "001.csv"
> x
[1] "001.csv" "90" "233"
Here's the resulting comparison of the second element:
> "90" <= 99
[1] TRUE
> "90" < 100
[1] FALSE
Similarly for the third:
> "233" < 100
[1] FALSE
> "233" <= 99
[1] TRUE
In all of these cases, the right-hand side is converted to character, then the comparison is made, as character strings.
Your code doesn't work as expected because the whole vector gets converted into a character vector after the first assignment (conversion of numeric to character).
Please note that when a string is compared to a number, the number is converted to a string and the characters are compared one by one. For example, if you compare "90" to 100, then "9" is compared to "1", hence control goes to the else part; in the comparison of "233" to 99, "2" is compared to "9".
You can get around this by assigning the changed values to another vector. Or you could use the str_pad function from the stringr package.
library(stringr)
x=c(1,90,233)
padded_name= str_pad(x,width=3,side="left",pad="0")
file_name = paste0(padded_name, ".csv")
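The other workaround mentioned, assigning the changed values to another vector, could look roughly like this. It is a sketch only: convert_safe and out are names chosen here, and the if / else if logic is copied from the question.

```r
# Results go into a separate character vector `out`, so `x` stays numeric
# and every comparison in the loop is a numeric comparison.
convert_safe <- function(x) {
  out <- character(length(x))
  for (a in seq_along(x)) {
    if (x[a] < 10) {
      out[a] <- paste0("00", x[a], ".csv")
    } else if (x[a] < 100) {
      out[a] <- paste0("0", x[a], ".csv")
    } else {
      out[a] <- paste0(x[a], ".csv")
    }
  }
  out
}

result <- convert_safe(c(1, 90, 233))
print(result)
```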

How to convert a hex string to text in R?

Is there a function which converts a hex string to text in R?
For example:
I have the hex string 1271763355662E324375203137, which should be converted to qv3Uf.2Cu 17.
Does someone know a good solution in R?
Here's one way:
s <- '1271763355662E324375203137'
h <- sapply(seq(1, nchar(s), by=2), function(x) substr(s, x, x+1))
rawToChar(as.raw(strtoi(h, 16L)))
## [1] "\022qv3Uf.2Cu 17"
And if you want, you can sub out non-printable characters as follows:
gsub('[^[:print:]]+', '', rawToChar(as.raw(strtoi(h, 16L))))
## [1] "qv3Uf.2Cu 17"
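For reuse, the two steps above can be wrapped in a small function. This is a sketch under a few assumptions: the name hex_to_text and its printable_only argument are made up here, the input is assumed to be a single non-empty string with an even number of hex digits, and "printable" is taken to mean ASCII 0x20 through 0x7e.

```r
# Hypothetical helper: decode a string of hex digit pairs into text.
hex_to_text <- function(s, printable_only = TRUE) {
  stopifnot(nchar(s) %% 2 == 0)                # need whole bytes
  starts <- seq(1, nchar(s), by = 2)
  pairs  <- substring(s, starts, starts + 1)   # "12" "71" "76" ...
  bytes  <- as.raw(strtoi(pairs, base = 16L))  # hex text -> raw bytes
  if (printable_only) {
    # keep printable ASCII only: 0x20 (space) through 0x7e (~)
    bytes <- bytes[bytes >= as.raw(0x20) & bytes <= as.raw(0x7e)]
  }
  rawToChar(bytes)
}

decoded <- hex_to_text("1271763355662E324375203137")
print(decoded)
```

Filtering on the raw bytes before calling rawToChar avoids the regular expression and drops the leading 0x12 control byte from the example string.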
Just to add to @jbaums's answer, or to simplify it:
library(wkb)
hex_string <- '231458716E234987'
hex_raw <- wkb::hex2raw(hex_string)
text <- rawToChar(as.raw(strtoi(hex_raw, 16L)))
An alternative way that separates the two parts involved:
Turn the initial string into a vector of bytes (with values as hexadecimals)
Convert those raw bytes into characters (excluding any not printable)
Part 1:
s <- '1271763355662E324375203137'
sc <- unlist(strsplit(s, ""))
i1 <- (1:nchar(s)) %% 2 == 1
# vector of bytes (as character)
s_pairs1 <- paste0(sc[i1], sc[!i1])
# make explicit it is a series of hexadecimals bytes
s_pairs2 <- paste0("0x", s_pairs1)
head(s_pairs2)
#> [1] "0x12" "0x71" "0x76" "0x33" "0x55" "0x66"
Part 2:
s_raw1 <- as.raw(s_pairs2)
# filter non printable values (ascii < 32 = 0x20)
s_raw2 <- s_raw1[s_raw1 >= as.raw("0x20")]
rawToChar(s_raw2)
#> [1] "qv3Uf.2Cu 17"
We could also use as.hexmode() function to turn s_pairs1 into a vector of hexadecimals
s_pairs2 <- as.hexmode(s_pairs1)
head(s_pairs2)
#> [1] "12" "71" "76" "33" "55" "66"
Created on 2023-01-03 by the reprex package (v2.0.1)
