Modifying for loop with if conditions to apply format in R - r

I am creating a variable called indexPoints that contains a subset of index values that passed certain conditions -
set.seed(1)
x = abs(rnorm(100,1))
y = abs(rnorm(100,1))
threshFC = 0.5
indexPoints=c()
seqVec = seq(1, length(x))
for (i in seq_along(seqVec)){
fract = x[i]/y[I]
fract[1] = NaN
if (!is.nan(fract)){
if(fract > (threshFC + 1) || fract < (1/(threshFC+1))){
indexPoints = c(indexPoints, i)
}
}
}
I am trying to recreate indexPoints using a more efficient method like apply methods (any except sapply). I started the process as shown below -
set.seed(1)
x = abs(rnorm(100,1))
y = abs(rnorm(100,1))
threshFC = 0.5
seqVec <- seq_along(x)
fract = x[seqVec]/y[seqVec]
fract[1] = NaN
vapply(fract, function(i){
if (!is.nan(fract)){ if(fract > (threshFC + 1) || fract < (1/(threshFC+1))){ i}}
}, character(1))
However, this attempt causes an ERROR:
Error in vapply(fract, function(i) { : values must be length 1,
but FUN(X[[1]]) result is length 0
How can I modify the code to put it in an apply format? Note: sometimes the fract variable contains NaN values, which I mimicked in the minimal examples above with "fract[1] = NaN".

There are several problems with your code:
You tell vapply that you expect the internal code to return a character, yet the only thing you ever return is i which is numeric;
You only explicitly return something when all conditions are met, which means if the conditions are not all good, you do not return anything ... this is the same as return(NULL) which is also not character (try vapply(1:2, function(a) return(NULL), character(1)));
You explicitly set fract[1] = NaN and then test !is.nan(fract), so you will never get anything; and
(Likely a typo) You reference y[I] (capital "i"), which is an error unless I is defined somewhere, in which case it stops being a runtime error and becomes a logic error.
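To see the second point in isolation, here is a minimal sketch (the helper name keep_if_big is made up for illustration). An if without an else returns NULL when the condition is false, and NULL has length 0, which is what the vapply error message is complaining about.

```r
# Hypothetical helper: returns i only when the condition holds.
keep_if_big <- function(i) {
  if (i > 1) i
}

# With no else branch, a false condition yields NULL (length 0).
is.null(keep_if_big(1L))
length(keep_if_big(1L))

# vapply insists on exactly one value per call, so this fails;
# tryCatch turns the error into a flag so the script keeps running.
failed <- tryCatch({
  vapply(1:2, keep_if_big, integer(1))
  FALSE
}, error = function(e) TRUE)
failed
```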
If I fix the code (remove NaN assignment) in your for loop, I get
indexPoints
# [1] 3 4 5 6 10 11 12 13 14 15 16 18 20 21 25 26 28 29 30 31 32 34 35 38 39
# [26] 40 42 43 44 45 47 48 49 50 52 53 54 55 56 57 58 59 60 61 64 66 68 70 71 72
# [51] 74 75 77 78 79 80 81 82 83 86 88 89 90 91 92 93 95 96 97 98 99
If we really want to do this one at a time (I recommend against it, read below), then there are a few methods:
Use Filter to only return the indices where the condition is true:
indexPoints2 <- Filter(function(i) {
fract <- x[i] / y[i]
!is.nan(fract) && (fract > (threshFC+1) | fract < (1/(threshFC+1)))
}, seq_along(seqVec))
identical(indexPoints, indexPoints2)
# [1] TRUE
Use vapply correctly, returning an integer either way:
indexPoints3 <- vapply(seq_along(seqVec), function(i) {
fract <- x[i] / y[i]
if (!is.nan(fract) && (fract > (threshFC+1) | fract < (1/(threshFC+1)))) i else NA_integer_
}, integer(1))
str(indexPoints3)
# int [1:100] NA NA 3 4 5 6 NA NA NA 10 ...
indexPoints3 <- indexPoints3[!is.na(indexPoints3)]
identical(indexPoints, indexPoints3)
# [1] TRUE
(Notice the explicit NA_integer_ rather than a bare NA: every branch then returns an integer, matching the integer(1) template given to vapply.)
We can instead just return the logical if the index matches the conditions:
logicalPoints4 <- vapply(seq_along(seqVec), function(i) {
fract <- x[i] / y[i]
!is.nan(fract) && (fract > (threshFC+1) | fract < (1/(threshFC+1)))
}, logical(1))
head(logicalPoints4)
# [1] FALSE FALSE TRUE TRUE TRUE TRUE
identical(indexPoints, which(logicalPoints4))
# [1] TRUE
But really, there is absolutely no need to use vapply or any of the apply functions, since this can be easily (and much more efficiently) checked as a vector:
fract <- x/y # all at once
indexPoints5 <- which(!is.nan(fract) & (fract > (threshFC+1) | fract < (1/(threshFC+1))))
identical(indexPoints, indexPoints5)
# [1] TRUE
(If you don't use which, you'll see that it gives you a logical vector indicating if the conditions are met, similar to bullet 3 above with logicalPoints4.)
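If this check is needed in more than one place, the vectorized version could be wrapped in a small function. The sketch below is only one way to package it; the name fold_change_index and its argument names are invented here, and thresh plays the role of threshFC.

```r
# Hypothetical wrapper around the vectorized check.
fold_change_index <- function(x, y, thresh = 0.5) {
  fract <- x / y                 # 0/0 produces NaN, which is excluded below
  upper <- thresh + 1
  lower <- 1 / (thresh + 1)
  which(!is.nan(fract) & (fract > upper | fract < lower))
}

# Ratios are NaN, 3, 1 and 1/3, so only positions 2 and 4 qualify.
idx <- fold_change_index(c(0, 3, 1, 1), c(0, 1, 1, 3))
idx
```

Using & and | (rather than && and ||) is what lets the whole vector be tested in one pass.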

Related

Twin primes less than 87 in R

I am trying to list the twin primes less than 87. I'm using the Eratosthenes approach. Here is what I've worked on so far:
Eratosthenes <- function(n) {
# Return all prime numbers up to n (based on the sieve of Eratosthenes)
if (n >= 2) {
sieve <- seq(2, n) # initialize sieve
primes <- c() # initialize primes vector
for (i in seq(2, n)) {
if (any(sieve == i)) { # check if i is in the sieve
primes <- c(primes, i) # if so, add i to primes
sieve <- sieve[(sieve %% i) != 0] # remove multiples of i from sieve
}
}
return(primes)
} else {
stop("Input value of n should be at least 2.")
}
}
Era <- c(Eratosthenes(87))
i <- 2:86
for (i in Era){
if (Era[i]+2 == Era[i+1]){
print(c(Era[i], Era[i+1]))
}
}
The first thing I don't understand is this error:
Error in if (Era[i] + 2 == Era[i + 1]) { :
missing value where TRUE/FALSE needed
The second thing is that twin primes are missing from the list, for example (29,31).
Within your for loop, i is no longer an index but an element of Era. In this case, you can use (i+2) %in% Era to check whether i+2 is the twin:
for (i in Era){
if ((i+2) %in% Era){
print(c(i,i+2))
}
}
which gives
[1] 3 5
[1] 5 7
[1] 11 13
[1] 17 19
[1] 29 31
[1] 41 43
[1] 59 61
[1] 71 73
A simpler way might be using diff, e.g.,
i <- Era[c(diff(Era)==2,FALSE)]
print(cbind(i,j = i+2))
which gives
> print(cbind(i,j = i+2))
i j
[1,] 3 5
[2,] 5 7
[3,] 11 13
[4,] 17 19
[5,] 29 31
[6,] 41 43
[7,] 59 61
[8,] 71 73
Firstly, (23,29) is not a twin prime pair.
Secondly, your answer may be found here.
Edit: I've tried your code and found that the length of Era is 23.
When if (Era[i] + 2 == Era[i+1]) runs with i = 23, it reads Era[24], which is NA, and that causes the error.
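The mechanism behind that error can be reproduced on a tiny vector. This is just an illustrative sketch with made-up values, not the original Era vector.

```r
v <- c(2, 3, 5)

# Indexing one past the end does not fail; it quietly returns NA.
v[4]

# if() cannot branch on NA, so the comparison below raises an error;
# tryCatch records that instead of stopping the script.
errored <- tryCatch({
  if (v[4] + 2 == 5) print("twin")
  FALSE
}, error = function(e) TRUE)
errored
```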
for (i in Era) will set i to 2, then 3, then 5 etc which is not what you intended. Use for (i in seq_len(length(Era) - 1)).
for (i in seq_len(length(Era) - 1)){
if (Era[i] + 2 == Era[i + 1]){
print(c(Era[i], Era[i + 1]))
}
}
#> [1] 3 5
#> [1] 5 7
#> [1] 11 13
#> [1] 17 19
#> [1] 29 31
#> [1] 41 43
#> [1] 59 61
#> [1] 71 73
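As a side note, the Eratosthenes function in the question grows primes with c() and rescans the sieve on every step. A sketch of the same idea using a logical vector is shown below; the function name sieve_primes is invented here, and the twin-prime step reuses the diff approach from the answer above.

```r
# Sieve of Eratosthenes on a logical vector: is_prime[k] says whether k is prime.
sieve_primes <- function(n) {
  stopifnot(n >= 2)
  is_prime <- rep(TRUE, n)
  is_prime[1] <- FALSE
  for (p in seq_len(floor(sqrt(n)))) {
    if (is_prime[p]) {
      # p <= sqrt(n) guarantees p * p <= n, so seq() never runs backwards
      is_prime[seq(p * p, n, by = p)] <- FALSE
    }
  }
  which(is_prime)
}

primes <- sieve_primes(87)

# A prime starts a twin pair when the gap to the next prime is exactly 2.
first <- primes[c(diff(primes) == 2, FALSE)]
twins <- cbind(first, second = first + 2L)
twins
```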

reducing repetitive tasks in data.table in R

I notice that I am doing the same thing multiple times, just with slightly different values:
HCCtreshold <- 40000
claimsMonthly[, HCC12mnth := +(HCCtreshold < claim12month)][ HCC12mnth == 1, `:=` (aboveHCCth12mnth = (claim12month - HCCtreshold))][is.na(aboveHCCth12mnth),aboveHCCth12mnth := 0]
claimsMonthly[, HCC11mnth := +(HCCtreshold < claim11month)][ HCC11mnth == 1, `:=` (aboveHCCth11mnth = (claim11month - HCCtreshold))][is.na(aboveHCCth11mnth),aboveHCCth11mnth := 0]
claimsMonthly[, HCC10mnth := +(HCCtreshold < claim10month)][ HCC10mnth == 1, `:=` (aboveHCCth10mnth = (claim10month - HCCtreshold))][is.na(aboveHCCth10mnth),aboveHCCth10mnth := 0]
So I started with something like this:
k <- seq.default(from = 8, to = 12, by = 1)
claimsMonthly[paste0("HCC", k, "mnth") := lapply(k, function(x) (+(HCCtreshold < paste0("HCC", k, "mnth"))))]
I get an error:
Error: Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").
I also tried:
for(k in 8:12){
claimsMonthly[, paste0("HCC", k, "mnth") := +(HCCtreshold < paste0("HCC", k, "mnth"))]
}
The columns are created correctly, but the values inside them are wrong: I get a 1 everywhere.
What am I doing wrong?
I can offer some suggestions and, with some fake data, try them out.
You can programmatically define names on the left-hand side of := if you wrap a vector in c(...), so for instance DT[ c(vec_of_names) := list(some, values)].
You can programmatically retrieve values of variables with a vector of variable names and mget. While I generally think mget can indicate problematic code, I believe that here it carries low risk. (While mget and get normally retrieve variables from the calling environment, often .GlobalEnv, from within a data.table operation they retrieve columns just as easily.)
Instead of a double-tap of assignment with == 1 and then is.na(...), we can use some logical trickery and the data.table::fcoalesce function. (If you aren't familiar, fcoalesce operates like SQL's coalesce function, which is a vector-friendly way of finding the first non-NA value in arguments of vectors.)
fcoalesce(c(1, 2, NA, NA), c(11, 12, 13, NA), c(21, 22, 23, 24))
# [1] 1 2 13 24
We can use fcoalesce(some + math * calc, 0) to do the math and, if NA, replace it with 0. (We use it on the above* variables below, and not necessarily on the HCC* logical variables. It can apply there too, if desired. If those HCC* variables are throw-away, though, it just doesn't matter.)
Fake data:
library(data.table)
set.seed(42)
hccthreshold <- 50
dat <- data.table( claim10month = sample(99, 10), claim11month = sample(99, 10), claim12month = sample(99, 10) )
dat$claim11month[5] <- NA
dat
# claim10month claim11month claim12month
# 1: 91 46 90
# 2: 92 71 14
# 3: 28 91 96
# 4: 80 25 91
# 5: 61 NA 8
# 6: 49 89 49
# 7: 69 97 37
# 8: 13 11 84
# 9: 60 95 41
# 10: 64 51 76
First, let's programmatically determine the column names we want to act on, and from them create the same vectors for the new variables. (I'm a big fan of determining and adapting these variable names programmatically, so that if you get a partial data set your code still works. You might consider setting checks and alarms to catch something wrong. For instance, stopifnot(length(claimnames) == 12L), in case you are expecting to always have precisely 12 months.)
claimnames <- grep("^claim[0-9]+month", colnames(dat), value = TRUE)
hccnames <- gsub("^claim", "HCC", claimnames)
abovenames <- gsub("^claim", "aboveHCC", claimnames)
claimnames
# [1] "claim10month" "claim11month" "claim12month"
hccnames
# [1] "HCC10month" "HCC11month" "HCC12month"
abovenames
# [1] "aboveHCC10month" "aboveHCC11month" "aboveHCC12month"
And now, we can process the data.
dat[, c(hccnames) := lapply(mget(claimnames), `>`, hccthreshold) ]
dat[, c(abovenames) := Map(function(hcc, clm) fcoalesce(clm - hcc * hccthreshold, 0),
mget(hccnames), mget(claimnames)) ]
dat
# claim10month claim11month claim12month HCC10month HCC11month HCC12month aboveHCC10month aboveHCC11month aboveHCC12month
# 1: 91 46 90 TRUE FALSE TRUE 41 46 40
# 2: 92 71 14 TRUE TRUE FALSE 42 21 14
# 3: 28 91 96 FALSE TRUE TRUE 28 41 46
# 4: 80 25 91 TRUE FALSE TRUE 30 25 41
# 5: 61 NA 8 TRUE NA FALSE 11 0 8
# 6: 49 89 49 FALSE TRUE FALSE 49 39 49
# 7: 69 97 37 TRUE TRUE FALSE 19 47 37
# 8: 13 11 84 FALSE FALSE TRUE 13 11 34
# 9: 60 95 41 TRUE TRUE FALSE 10 45 41
# 10: 64 51 76 TRUE TRUE TRUE 14 1 26
I chose to keep the HCC* variables as logical instead of your +(...) integers, but it's directly translatable and up to you.
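Two follow-up points may be worth spelling out. First, the "1 everywhere" in the question's for loop most likely comes from comparing a number with the string returned by paste0: R converts the number to text and compares the two as text, which gives a single TRUE (in most locales) that is recycled down the column. Second, the question's original code stores 0 when a claim is at or below the threshold, whereas clm - hcc * hccthreshold leaves the claim unchanged in that case. The sketch below is a loop-based variant that keeps the zeroing behaviour; the toy data and the variable names threshold, claim_col and above_col are assumptions made for illustration.

```r
library(data.table)

threshold <- 50   # stand-in for HCCtreshold

# Number vs. string: the number is converted to text and compared as text.
threshold < "claim12month"

# Toy data; the column names mirror the fake data above.
dt <- data.table(claim11month = c(46, 71, NA),
                 claim12month = c(90, 14, 96))

for (k in 11:12) {
  claim_col <- paste0("claim", k, "month")
  above_col <- paste0("aboveHCC", k, "month")
  excess <- pmax(dt[[claim_col]] - threshold, 0)  # 0 when at or below the threshold
  excess[is.na(excess)] <- 0                      # missing claims also count as 0
  dt[, (above_col) := excess]                     # parentheses: use the name stored in above_col
}
dt
```

The key difference from the question's attempt is dt[[claim_col]], which fetches the column's values instead of comparing against the column's name.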

Return list of lists from foreach loop in R

I have a function which returns a list of two objects (a list l and a number n). I want to loop over this function in a foreach loop.
create_lists <- function(){
l = sample(100, 5)
n = sample(100, 1)
return(list(l=l, n=n))}
Because create_lists has a list as output, this post told me to use a combine function which looks like this:
combine_custom <- function(list1, list2){
ls = c(list1$l, list2$l)
ns = c(list1$n, list2$n)
return(list(l = ls, n = ns))
}
So now my foreach loop looks like this:
m = foreach(i=1:5, .combine = combine_custom)%do%{
create_lists()}
My desired output would be:
m$l
[[1]]
[1] 100 25 86 21 28
[[2]]
[1] 78 37 79 41 61
[[3]]
[1] 73 22 78 94 13
[[4]]
[1] 15 28 76 78 52
[[5]]
[1] 32 93 92 2 1
m$n
[1] 52 56 3 79 82
But what I get is something like this:
$l
[1] 84 28 75 59 68 84 28 75 59 68
$n
[1] 31 91 18 98 39
So I have two problems:
1) Why is everything but two of the l lists dropped?
2) How can I make m$l to be a list of lists?
EDIT:
I tried another approach I got from here which does not use c:
combine_custom <- function(list1, list2){
ls = list1$l[[length(list1$l)+1]] = list(list2$l)
ns = c(list1$n, list2$n)
return(list(l = ls, n = ns))
}
But this gave the same result as described above, to be exact:
$l
$l[[1]]
[1] 65 84 48 81 82
$n
[1] 88 79 92 36 71
I have found another way which avoids the problem mentioned above, namely that combine has to create a new list first and later only append lists.
Also, the real function I am using actually returns a list of lists, so the following proved useful:
combine_custom <- function(list1, list2) {
if (plotrix::listDepth(list1$l) > plotrix::listDepth(list2$l)) {
ls <- c(list1$l, list(list2$l))
} else {
ls <- c(list(list1$l), list(list2$l))
}
ns <- c(list1$n, list2$n)
return(list(l = ls, n = ns))
}
This is not perfect if the function can return lists of varying nesting depths, but it works in my case.
The combine part is giving a lot of trouble because, on the first iteration, it needs to make a list out of two lists, but on the second iteration it needs to append one list as an element to a list of lists.
Another approach (may or may not work depending on the size of your actual data/problem) is to use the purrr package for working with lists:
> m <- foreach(i=1:3)%do%{create_lists()}
> m
[[1]]
[[1]]$l
[1] 21 33 12 50 36
[[1]]$n
[1] 74
[[2]]
[[2]]$l
[1] 12 80 39 78 6
[[2]]$n
[1] 74
[[3]]
[[3]]$l
[1] 9 61 75 63 94
[[3]]$n
[1] 2
> purrr::transpose(m)
$l
$l[[1]]
[1] 21 33 12 50 36
$l[[2]]
[1] 12 80 39 78 6
$l[[3]]
[1] 9 61 75 63 94
$n
$n[[1]]
[1] 74
$n[[2]]
[1] 74
$n[[3]]
[1] 2
Hope that helps!
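If pulling in purrr is not an option, the same reshaping can be sketched in base R. The example below uses lapply as a stand-in for the foreach call so that it runs without extra packages; a foreach loop without .combine returns a list of the same shape, so the last step should apply to it unchanged. Unlike transpose, this variant returns n as a plain vector, which matches the desired output in the question.

```r
create_lists <- function() {
  list(l = sample(100, 5), n = sample(100, 1))
}

set.seed(1)
# Stand-in for: results <- foreach(i = 1:5) %do% { create_lists() }
results <- lapply(1:5, function(i) create_lists())

# Pull each component out of every result.
m <- list(
  l = lapply(results, `[[`, "l"),              # list of five integer vectors
  n = vapply(results, `[[`, integer(1), "n")   # integer vector of length five
)
str(m)
```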
Thank you @Maria H., you solved my problem! The 'plotrix' package didn't work for me, but I used 'collapse' and it worked fine:
combine_custom1 <- function(a, b) {
if (collapse::ldepth(a) > collapse::ldepth(b)) {
ls <- c(a, list(b))
} else {
ls <- c(list(a), list(b))
}
return(ls)
}

Integers that are not divisible by several numbers

I am trying to print a vector with the integers between 1 and 100 that are not divisible by 2, 3 and 7 in R.
I tried seq but I am not sure how to continue.
Another option is to use Filter to, well, filter the sequence for any number that meets your condition:
Filter(function(i) { all(i %% c(2,3,7) != 0) }, seq(100))
## [1] 1 5 11 13 17 19 23 25 29 31 37 41 43 47 53 55 59 61 65 67 71 73 79 83 85 89 95 97
Note that while this may (IMO) be the most readable, it's the worst in terms of performance (so far):
UPDATED to take into account rawr's for loop solution:
microbenchmark(
filter={ v1 <- seq(100); Filter(function(i) { all(i %% c(2,3,7) != 0) }, v1) },
reduce={ v1 <- seq(100); v1[!Reduce(`|`,lapply(c(2,3,7), function(x) !(v1 %%x)))] },
rowout={ v1 <- seq(100); v1[rowSums(outer(v1, c(2, 3, 7), "%%") == 0) == 0] },
looopy={ v1 <- seq(100); for (ii in c(2,3,7)) v1 <- v1[-which(v1 %% ii == 0)]; v1 },
times=1000
)
## Unit: microseconds
## expr min lq mean median uq max neval cld
## filter 108.280 118.7000 143.88592 126.2155 136.6290 2349.952 1000 c
## reduce 21.552 23.8095 25.91997 24.8150 25.8580 144.067 1000 ab
## rowout 26.075 28.4920 31.11812 29.5350 31.2125 184.225 1000 b
## looopy 14.149 16.0765 18.11806 16.8995 17.8595 160.485 1000 a
To make it fair I added sequence generation to all of them (and, I was doing this to compare relative performance vs actual speed anyway, so the comparison results still work).
Original statement:
"Unsurprisingly, akrun's is optimal :-)"
is now superseded by:
"Unsurprisingly, rawr's is optimal :-)"
Basically you want to compute each of the numbers in 1:100 modulo 2, 3, and 7. You could use outer to perform all the modulo operations in a single vectorized operation, using rowSums to identify the elements in 1:100 that are not perfectly divided by 2, 3, or 7.
v1 <- 1:100
v1[rowSums(outer(v1, c(2, 3, 7), "%%") == 0) == 0]
# [1] 1 5 11 13 17 19 23 25 29 31 37 41 43 47 53 55 59 61 65 67 71 73 79 83 85 89 95 97
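The same outer/rowSums idea can be wrapped so the range and the divisors are not hardcoded. This is a small sketch; the function name not_divisible_by is made up for illustration.

```r
# Keep the values that leave a non-zero remainder for every divisor.
not_divisible_by <- function(values, divisors) {
  remainders <- outer(values, divisors, "%%")   # one row per value, one column per divisor
  values[rowSums(remainders == 0) == 0]         # rows with no zero remainder
}

res <- not_divisible_by(1:100, c(2, 3, 7))
head(res)
length(res)
```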
We can do this in a loop with lapply and the modulo operator: negating (!) turns each zero remainder into TRUE, Reduce with | flags the elements divisible by any of the numbers, and negating that result subsets 'v1'.
v1[!Reduce(`|`,lapply(c(2,3,7), function(x) !(v1 %%x)))]
Or instead of looping, this can be also done in a faster way.
v1[!(!v1%%2) + (!v1%%3) + (!v1%%7)]
data
v1 <- seq(100)
The other answers are better, but if you really need to use a for loop, as this question suggests, this could be a possibility:
x <- vector()
n <- 1L
for(i in 1:100){if (i%%2!=0 & i%%3!=0 & i%%7!=0) {x[n] <- i; n <- n+1}}
#> x
# [1] 1 5 11 13 17 19 23 25 29 31 37 41 43 47 53 55 59 61 65 67 71 73 79 83 85 89 95 97
As already mentioned, the other answers posted here are better because they exploit the vectorized capabilities of R. The short code shown here is probably slower than any of the other answers and more complicated to maintain. It is the typical syntax of other programming languages, like C or FORTRAN, applied to R. It works, but it is not the way things should be done.
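Rather than using modulo arithmetic explicitly, we can generate the negative modulo sequence easily by counting down. Taking the elementwise minimum of the three sequences with pmin gives zero wherever any of the three numbers divides the position, so converting to logical and dropping the result into which() keeps the rest.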
which(as.logical(pmin(rep_len(1:0, 100),
rep_len(2:0, 100),
rep_len(6:0, 100))))
If we want to be a bit less hardcoded, we might use do.call with lapply():
which(as.logical(do.call(pmin, lapply(c(2,3,7)-1, function(x)rep_len(x:0, 100)))))
EDIT:
Here's one way to do it using logicals:
v1 <- logical(100); for (ii in c(2,3,7) -1) v1 <- v1 | rep_len(rep(c(F,T), c(ii,1)), 100) ; which(!v1)
I had the same problem in my class. I assumed the teacher gave me all the information I needed to find the answer and I was correct. This is week one and all that other silly stuff all you other advanced people used has not come up.
I did this though.
r = c(1:100)
which(r %% 3 != 0 & r %% 7 != 0 & r %% 2 != 0)
Use the which function.

replacing specific elements of a vector

I am trying to write the user-defined function below in R:
wrkexpcode.into.month <- function(vec) {
tmp.vec <- vec
tmp.vec[tmp.vec == 0 | tmp.vec == 9] <- NA
tmp.vec[tmp.vec == 1] <- 4
tmp.vec[tmp.vec == 2] <- 13
tmp.vec[tmp.vec == 3] <- 31
tmp.vec[tmp.vec == 4] <- 78
tmp.vec[tmp.vec == 5] <- 174
tmp.vec[tmp.vec == 6] <- 240
return (tmp.vec)
}
but when I execute with a simple command like
wrkexpcode.into.month(c(3,2,2,3,1,3,5,6,4))
the result comes like
[1] 31 13 13 31 78 31 174 240 78
but I expect the result like
[1] 31 13 13 31 **4** 31 174 240 78
How can I fix this?
You have to carefully follow the flow of your function, evaluating what the values are. You are expecting 1 to be replaced by 4 based on tmp.vec[tmp.vec == 1] <- 4, however in tmp.vec[tmp.vec == 4] <- 78 later down the road, the 4 is replaced by a 78. This is caused by replacing the values in tmp.vec and using tmp.vec for determining what needs to be replaced. As @MattewPlourde said, you need to base the replacement on vec:
tmp.vec[vec == 1] <- 4
Although I would simply replace the code by:
wrkexpcode.into.month <- function(vec) {
translation_vector = c('0' = NA, '1' = 4, '2' = 13, '3' = 31,
'4' = 78, '5' = 174, '6' = 240, '9' = NA)
return(translation_vector[as.character(vec)])
}
wrkexpcode.into.month(c(3,2,2,3,1,3,5,6,4))
# 3 2 2 3 1 3 5 6 4
# 31 13 13 31 4 31 174 240 78
See also a blogpost I wrote recently about this kind of operation.
I think it will be much easier to use one of the many recode functions that are designed for such purposes instead of hard-coding it. It's just a one-liner then, e.g.
library(likert)
x <- c(3,2,2,3,1,3,5,6,4)
recode(x, from=c(0:6, 9), to=c(NA, 4,13,31,78,174,240,NA))
[1] 31 13 13 31 4 31 174 240 78
And if desired, wrap it into a function, e.g.
wrkexpcode.into.month <- function(x)
recode(x, from=c(0:6, 9), to=c(NA, 4,13,31,78,174,240,NA))
wrkexpcode.into.month(x)
[1] 31 13 13 31 4 31 174 240 78
You could create a matrix mapping each input value (column 1) to the desired output value (column 2):
table=matrix(c(0,1,2,3,4,5,6,9,NA,4,13,31,78,174,240,NA),ncol=2)
and then use sapply on the vector c(3,2,2,3,1,3,5,6,4):
sapply(c(3,2,2,3,1,3,5,6,4), function(x) table[which(table[,1] == x),2] )
to give you the desired output too
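For completeness, the same lookup can be done in base R without a per-element loop by using match. This is a sketch with an invented helper name (recode_by_match). It looks every input up against the original codes in one step, so a freshly written 4 is never mistaken for an input code of 4.

```r
# Hypothetical helper: position of each input in 'from' selects the value in 'to'.
recode_by_match <- function(vec, from, to) {
  to[match(vec, from)]   # inputs not found in 'from' become NA
}

out <- recode_by_match(c(3, 2, 2, 3, 1, 3, 5, 6, 4),
                       from = c(0:6, 9),
                       to   = c(NA, 4, 13, 31, 78, 174, 240, NA))
out
```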

Resources