I have a problem using a for-loop in R. The following code
a <- seq(-2, 5)
for(i in 1:length(a)){
a[i] <- if(a[i] <= 0) "aa" else a[i]
}
should result in the following vector
> a
[1] "aa" "aa" "aa" "1" "2" "3" "4" "5"
Instead we have the following result:
> a
[1] "aa" "-1" "aa" "1" "2" "3" "4" "5"
Why isn't R able to replace "-1" with "aa"?
We tried another solution which works fine:
a <- seq(-2, 5)
b <- NULL
for(i in 1:length(a)){
b[i] <- if(a[i] <= 0) "aa" else a[i]
}
it produces the expected result:
> b
[1] "aa" "aa" "aa" "1" "2" "3" "4" "5"
Why does the latter example work fine and the first one not?
Thank you very much for your help!
Best regards!!
The collation sequence may not be as you (or Matthew) understand. The character "-" may not be lower in the lexical ordering for your operating system. String comparisons are OS specific. (See ?Comparison) After the first replacement the entire vector was coerced to character and if "-" > 0 returns TRUE on your machine then you have the answer. I will bet that this code will act as you expected:
a <- seq(-2, 5)
for(i in 1:length(a)){
a[i] <- if( as.numeric(a[i]) <= 0) "aa" else a[i]
}
I suspect that Henrik's suggestion should also behave as you expect, because it would create a logical vector from the numeric comparison first, and then select from the choice of "aa" and a.
(In the second instance a itself was never coerced to character; only b was, so every comparison on a[i] stayed numeric.)
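A minimal sketch of the coercion described above, together with one vectorised alternative. The use of ifelse here is an assumption about what a vectorised suggestion might look like, not necessarily the exact code Henrik proposed; a_num and res are illustrative names.

```r
a <- seq(-2, 5)
a[1] <- "aa"          # a single character assignment coerces the whole vector
class(a)              # "character"
# From here on, a[2] <= 0 compares the strings "-1" and "0";
# the result depends on the locale's collation order.

# Vectorised alternative: do the numeric comparison once, before any coercion.
a_num <- seq(-2, 5)
res <- ifelse(a_num <= 0, "aa", a_num)
res
```

Because the comparison a_num <= 0 is evaluated on the untouched numeric vector, the collation order never comes into play.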
I wonder how a for loop can be used here without getting a non-numeric error. I would like to put several character values into a vector Nums using a for loop.
But after the third line, the vector becomes chr, so the remaining lines cannot run. The same thing happens when I use an if statement or a while loop. Can someone give a hint about this?
for(n in 1:30){
Nums<-1:n
Nums[Nums%%2==0 & Nums%%3==0]<-"OK1"
Nums[Nums%%2==0 & Nums%%3!=0]<-"OK2"
Nums[Nums%%2!=0 & Nums%%3==0]<-"OK3"
Nums[Nums%%2!=0 & Nums%%3!=0]<-n
}
Error in Nums%%2 : non-numeric argument to binary operator
I don't think the loop is actually doing what you want it to do. You are replacing Nums at every iteration, so nothing is actually being saved. Maybe you don't actually want a loop.
Nums <- 1:30
x <- 1:30
dplyr::case_when(
Nums%%2==0 & x%%3==0 ~ "OK1",
Nums%%2==0 & x%%3!=0 ~ "OK2",
Nums%%2!=0 & x%%3==0 ~ "OK3",
Nums%%2!=0 & x%%3!=0 ~ as.character(x)
)
#> [1] "1" "OK2" "OK3" "OK2" "5" "OK1" "7" "OK2" "OK3" "OK2" "11" "OK1"
#> [13] "13" "OK2" "OK3" "OK2" "17" "OK1" "19" "OK2" "OK3" "OK2" "23" "OK1"
#> [25] "25" "OK2" "OK3" "OK2" "29" "OK1"
Character and numeric values can't coexist in a vector*. As @Ands. points out, you don't really need a loop for this. If you want to avoid case_when (which is from the dplyr package, part of the "tidyverse"), you can do:
n <- 30
Nums <- 1:n
x <- as.character(Nums)
x[Nums%%2==0 & Nums%%3==0]<-"OK1"
x[Nums%%2==0 & Nums%%3!=0]<-"OK2"
x[Nums%%2!=0 & Nums%%3==0]<-"OK3"
You don't need the final statement because the remaining elements of x were already set to the character form of the corresponding numbers by as.character(Nums).
If you want to use a for loop and replace as you go, you could convert the vector to a list:
Nums <- 1:n
Nums <- as.list(Nums)
for (i in 1:n) {
if (i%%2==0 & i%%3==0) Nums[[i]] <- "OK1"
if (i%%2==0 & i%%3!=0) Nums[[i]] <- "OK2"
if (i%%2!=0 & i%%3==0) Nums[[i]] <- "OK3"
}
unlist(Nums)
* Technically they can't coexist in an atomic vector — lists are vectors too ...
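A short illustration of the footnote, as a sketch only (mixed and as_list are illustrative names): combining types in an atomic vector coerces everything to one type, while a list keeps each element's own type.

```r
# Atomic vectors hold a single type, so mixing types coerces every element.
mixed <- c(1, "OK1", 3)
class(mixed)              # "character"

# A list is also a vector, but each element keeps its own type.
as_list <- list(1, "OK1", 3)
sapply(as_list, class)    # "numeric" "character" "numeric"
is.vector(as_list)        # TRUE
```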
This question seems to be partially answered here, but that answer is not specific enough for me. I would like to understand better when an object is updated by reference and when it is copied.
The simplest example is growing a vector. The following code is very inefficient in R because the memory is not allocated before the loop and a copy is made at each iteration.
x = runif(10)
y = c()
for(i in 2:length(x))
y = c(y, x[i] - x[i-1])
Allocating the vector up front reserves the memory once, instead of reallocating it at each iteration. This code is therefore drastically faster, especially with long vectors.
x = runif(10)
y = numeric(length(x))
for(i in 2:length(x))
y[i] = x[i] - x[i-1]
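For comparison, a sketch of the same computation without any loop, using base R's diff. The seed and the names y_loop and y_vec are illustrative assumptions; the leading 0 mirrors the untouched first element of the preallocated vector.

```r
set.seed(1)                      # arbitrary seed, only for reproducibility
x <- runif(10)

# Preallocated loop, as above
y_loop <- numeric(length(x))
for (i in 2:length(x))
  y_loop[i] <- x[i] - x[i - 1]

# Vectorised equivalent: diff() returns x[i] - x[i-1] for i = 2..n
y_vec <- c(0, diff(x))

all.equal(y_loop, y_vec)
```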
And here comes my question. Actually when a vector is updated it does move. There is a copy that is made as shown below.
a = 1:10
tracemem(a)  # tracemem() is a base R function
[1] "<0xf34a268>"
a[1] <- 0L
tracemem[0xf34a268 -> 0x4ab0c3f8]:
a[3] <-0L
tracemem[0x4ab0c3f8 -> 0xf2b0a48]:
But in a loop this copy does not occur
y = numeric(length(x))
for(i in 2:length(x))
{
y[i] = x[i] - x[i-1]
print(address(y))
}
Gives
[1] "0xe849dc0"
[1] "0xe849dc0"
[1] "0xe849dc0"
[1] "0xe849dc0"
[1] "0xe849dc0"
[1] "0xe849dc0"
[1] "0xe849dc0"
[1] "0xe849dc0"
[1] "0xe849dc0"
I understand why code is slow or fast depending on its memory allocations, but I don't understand R's logic. Why and how, for the same statement, is the update made by reference in one case and by copy in the other? In the general case, how can we know what will happen?
This is covered in Hadley's Advanced R book. In it he says (paraphrasing here) that whenever 2 or more variables point to the same object, R will make a copy and then modify that copy. Before going into examples, one important note which is also mentioned in Hadley's book is that when you're using RStudio
the environment browser makes a reference to every object you create on the command line.
Given your observed behavior, I'm assuming you're using RStudio which we will see will explain why there are actually 2 variables pointing to a instead of 1 like you might expect.
The function we'll use to check how many variables are pointing to an object is refs(). In the first example you posted you can see:
library(pryr)
a = 1:10
refs(a)
#[1] 2
This suggests (which is what you found) that 2 variables are pointing to a and thus any modification to a will result in R copying it, then modifying that copy.
Checking the for loop we can see that y always has the same address and that refs(y) = 1 in the for loop. y is not copied because there are no other references pointing to y in your function y[i] = x[i] - x[i-1]:
for(i in 2:length(x))
{
y[i] = x[i] - x[i-1]
print(c(address(y), refs(y)))
}
#[1] "0x19c3a230" "1"
#[1] "0x19c3a230" "1"
#[1] "0x19c3a230" "1"
#[1] "0x19c3a230" "1"
#[1] "0x19c3a230" "1"
#[1] "0x19c3a230" "1"
#[1] "0x19c3a230" "1"
#[1] "0x19c3a230" "1"
#[1] "0x19c3a230" "1"
On the other hand, if you introduce a non-primitive function of y in your for loop, you will see that the address of y changes each time, which is more in line with what we would expect:
is.primitive(lag)
#[1] FALSE
for(i in 2:length(x))
{
y[i] = lag(y)[i]
print(c(address(y), refs(y)))
}
#[1] "0x19b31600" "1"
#[1] "0x19b31948" "1"
#[1] "0x19b2f4a8" "1"
#[1] "0x19b2d2f8" "1"
#[1] "0x19b299d0" "1"
#[1] "0x19b1bf58" "1"
#[1] "0x19ae2370" "1"
#[1] "0x19a649e8" "1"
#[1] "0x198cccf0" "1"
Note the emphasis on non-primitive. If your function of y is primitive such as - like: y[i] = y[i] - y[i-1] R can optimize this to avoid copying.
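The copy-on-modify rule itself can be checked without pryr or RStudio. The following is a minimal sketch in base R (a, b, and the helper f are illustrative names) showing that modifying one of two names bound to the same object leaves the other untouched:

```r
a <- 1:10
b <- a            # a and b now refer to the same underlying object
b[1] <- 0L        # modifying b triggers a copy; a is left unchanged
a[1]              # 1
b[1]              # 0

# The same applies to function arguments: the caller's object is not altered.
f <- function(v) {
  v[1] <- 99L
  v
}
out <- f(a)
a[1]              # still 1
out[1]            # 99
```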
Credit to @duckmayr for helping explain the behavior inside the for loop.
To complete @MikeH.'s answer, here is some additional code:
library(pryr)
x = runif(10)
y = numeric(length(x))
print(c(address(y), refs(y)))
for(i in 2:length(x))
{
y[i] = x[i] - x[i-1]
print(c(address(y), refs(y)))
}
print(c(address(y), refs(y)))
The output shows clearly what happened
[1] "0x7872180" "2"
[1] "0x765b860" "1"
[1] "0x765b860" "1"
[1] "0x765b860" "1"
[1] "0x765b860" "1"
[1] "0x765b860" "1"
[1] "0x765b860" "1"
[1] "0x765b860" "1"
[1] "0x765b860" "1"
[1] "0x765b860" "1"
[1] "0x765b860" "2"
There is a copy at the first iteration, because RStudio holds a second reference to y. After this first copy, the new y is only touched inside the loop, so RStudio does not create any additional reference and no copy is made during the following updates: y is updated in place. On loop exit, y becomes visible in the global environment again and RStudio creates an extra reference, but this obviously does not change the address.
I am trying to match the last digit in a character vector and replace it with the matched digit - 1. I believe gsub is what I need to use, but I cannot figure out what to use as the 'replacement' argument. I can match the last number using:
gsub('[0-9]$', ???, chrvector)
But I am not sure how to replace the matched number with itself - 1.
Any help would be much appreciated.
Thank you.
We can do this easily with gsubfn
library(gsubfn)
gsubfn("([0-9]+)", ~as.numeric(x)-1, chrvector)
#[1] "str97" "v197exdf"
Or for the last digit
gsubfn("([0-9])([^0-9]*)$", ~paste0(as.numeric(x)-1, y), chrvector2)
#[1] "str97" "v197exdf" "v33chr138d"
data
chrvector <- c("str98", "v198exdf")
chrvector2 <- c("str98", "v198exdf", "v33chr139d")
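If adding a package is not an option, something similar can be sketched in base R with regexpr and the regmatches replacement function. This is an alternative technique, not what the answer above uses; the pattern relies on a Perl-style lookahead to find the last digit, and it assumes every element contains at least one digit and that the last digit is not zero.

```r
chrvector2 <- c("str98", "v198exdf", "v33chr139d")

# Locate the last digit: a digit followed only by non-digits up to the end.
m <- regexpr("[0-9](?=[^0-9]*$)", chrvector2, perl = TRUE)

# Replace each matched digit with itself minus one.
result <- chrvector2
regmatches(result, m) <- as.character(as.integer(regmatches(result, m)) - 1L)
result
```

A trailing "0" would become "-1" here, so that case needs separate handling.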
Assuming the last digit is not zero,
chrvector <- as.character(1:5)
chrvector
#[1] "1" "2" "3" "4" "5"
chrvector <- paste(chrvector, collapse='') # convert to character string
chrvector <- paste0(substring(chrvector,1, nchar(chrvector)-1), as.integer(gsub('.*([0-9])$', '\\1', chrvector))-1)
unlist(strsplit(chrvector, split=''))
# [1] "1" "2" "3" "4" "4"
This works even if you have the last digit zero:
chrvector <- c(as.character(1:4), '0') # [1] "1" "2" "3" "4" "0"
chrvector <- paste(chrvector, collapse='')
chrvector <- as.character(as.integer(chrvector)-1)
unlist(strsplit(chrvector, split=''))
# [1] "1" "2" "3" "3" "9"
Could somebody explain to me why this does not print all the numbers separately in R?
numberstring <- "0123456789"
for (number in numberstring) {
print(number)
}
Aren't strings just arrays of chars? What's the way to do it in R?
In R "0123456789" is a character vector of length 1.
If you want to iterate over the characters, you have to split the string into
a vector of single characters using strsplit.
numberstring <- "0123456789"
numberstring_split <- strsplit(numberstring, "")[[1]]
for (number in numberstring_split) {
print(number)
}
# [1] "0"
# [1] "1"
# [1] "2"
# [1] "3"
# [1] "4"
# [1] "5"
# [1] "6"
# [1] "7"
# [1] "8"
# [1] "9"
Just for fun, here are a few other ways to split a string at each character.
x <- "0123456789"
substring(x, 1:nchar(x), 1:nchar(x))
# [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"
regmatches(x, gregexpr(".", x))[[1]]
# [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"
scan(text = gsub("(.)", "\\1 ", x), what = character())
# [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"
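Possible with stringr::str_split (stringr is part of the tidyverse). Note that it returns a list:
library(stringr)
numberstring <- "0123456789"
str_split(numberstring, boundary("character"))
# [[1]]
#  [1] "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"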
Here's a naive approach for iterating a string using a for loop and substring. This isn't any better than existing answers for the common case, but it might be useful if you want to break out of the loop early instead of always traversing the entire string once up front, as str_split/scan/substring(x, 1:nchar(x), 1:nchar(x))/regmatches requires.
s <- "0123456789"
if (s != "") {
for (i in 1:nchar(s)) {
print(substring(s, i, i))
}
}
The if is needed to avoid looping backwards from 1 to 0, inclusive of both ends.
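As a variation, seq_len(nchar(s)) yields an empty sequence for an empty string, so the guard can be dropped. The sketch below uses a hypothetical helper, first_index_of, to show the early exit this approach allows:

```r
# Hypothetical helper: position of the first occurrence of `target` in `s`,
# scanning one character at a time and stopping as soon as it is found.
first_index_of <- function(s, target) {
  for (i in seq_len(nchar(s))) {      # seq_len(0) is empty: no iterations for ""
    if (substring(s, i, i) == target) {
      return(i)
    }
  }
  NA_integer_
}

first_index_of("0123456789", "3")   # 4
first_index_of("", "3")             # NA
```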
Your question is not 100% clear as to the desired outcome (print each character individually from a string, or store each number in a way that the given print loop will result in each number being produced on its own line).
To store numberstring such that it prints using the loop you included:
numberstring<-c(0,1,2,3,4,5,6,7,8,9)
for(number in numberstring){print(number);}
[1] 0
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
I wrote a function in R to attach zeros such that any number between 1 and 100 comes out as 001 (1), 010 (10), and 100 (100) but I can't figure out why the if statements aren't qualifying like I would like them to.
id <- 1:11
Attach_zero <- function(id){
i<-1
for(i in id){
if(id[i] < 10){
id[i] <- paste("00",id[i], sep = "")
}
if((id[i] < 100)&&(id[i]>=10)){
id[i] <- paste("0",id[i], sep = "")
}
print(id[i])
}
}
The output is "001", "2", "3",... "010", "11"
I have no idea why the for loop is skipping middle integers.
The problem here is that you're assigning a character string (e.g. "001") to a numeric vector. When you do this, the entire id vector is converted to character (elements of a vector must be of one type).
So, after comparing 1 to 10 and assigning "001" to id[1], the next element of id is "2" (i.e. character 2). When an inequality includes a character element (e.g. "2" < 10), the numeric side is coerced to character, and alphabetic sorting rules apply. Under these rules both "100" and "10" come before "2", so neither of your if conditions is met. This is the case for all numbers except 10: "10" is not less than "10", but it is less than "100" and at least "10", so your second if condition is met. When you get to 11, neither condition is met once again, since the "word" "11" comes after the "word" "100".
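The comparisons described above can be checked directly. A small sketch (cmp_2, cmp_10, and cmp_11 are illustrative names):

```r
# Comparing a character value with a number coerces the number to character,
# so these are string comparisons, not numeric ones.
cmp_2  <- c("2" < 10,  "2" < 100)                   # "2" sorts after "10" and "100"
cmp_10 <- c("10" < 10, "10" < 100 && "10" >= 10)    # only the second condition holds
cmp_11 <- c("11" < 10, "11" < 100)                  # "11" sorts after "100"

cmp_2    # FALSE FALSE
cmp_10   # FALSE TRUE
cmp_11   # FALSE FALSE

# Converting back to numeric restores the intended comparison.
as.numeric("2") < 10   # TRUE
```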
While there are a couple of ways to fix your function, this functionality exists in R (as mentioned in the comments), both with sprintf and formatC.
sprintf('%03d', 1:11)
formatC(1:11, flag="0", width=3)
# [1] "001" "002" "003" "004" "005" "006" "007" "008" "009" "010" "011"
For another vectorised approach, you could use nested ifelse statements:
ifelse(id < 10, paste0('00', id), ifelse(id < 100, paste0('0', id), id))
Try this:
id <- 1:11
Attach_zero <- function(id){
id1 <- id
i <- 1
for (i in seq_along(id)) {
if(id[i] < 10){
id1[i] <- paste("00", id[i], sep = "")
}
if(id[i] < 100 & id[i] >= 10){
id1[i] <- paste("0", id[i], sep = "")
}
}
print(id1)
}
If you try your function with id = c(1:3, 6:11):
Attach_zero(id)
##[1] "001"
##[1] "2"
##[1] "3"
##[1] "8"
##[1] "9"
##[1] "010"
##[1] "11"
##Error in if (id[i] < 10) { : missing value where TRUE/FALSE needed
What happens here is that some elements are skipped because of the values i takes. The initial i <- 1 does nothing, because i is immediately overwritten by for (i in id), which gives i the value of each element of id in turn rather than an index. So with id <- c(1:3, 6:11), positions 4 and 5 are never visited, and position 10 does not exist, so id[10] is NA and the if fails, as shown.
Just correcting your function to include all the elements of the id:
Attach_zero <- function(id){
for(i in 1:length(id)){
if(id[i] < 10){
id[i] <- paste("00",id[i], sep = "")
}
if((id[i] < 100)&&(id[i]>=10)){
id[i] <- paste("0",id[i], sep = "")
}
print(id[i])
}
}
Attach_zero(id)
##[1] "001"
##[1] "2"
##[1] "3"
##[1] "6"
##[1] "7"
##[1] "8"
##[1] "9"
##[1] "010"
##[1] "11"
Note that 6 and 7 now appear in this output.
And using sprintf as jbaums says, including it in a function:
Attach_zero <- function(id){
return(sprintf('%03d', id)) #You can change return for print if you want
}
Attach_zero(id)
## [1] "001" "002" "003" "006" "007" "008" "009" "010" "011"