Integers that are not divisible by several numbers - r

I am trying to print a vector with the integers between 1 and 100 that are not divisible by 2, 3 and 7 in R.
I tried seq but I am not sure how to continue.

Another option is to use Filter to, well, filter the sequence for any number that meets your condition:
Filter(function(i) { all(i %% c(2,3,7) != 0) }, seq(100))
## [1] 1 5 11 13 17 19 23 25 29 31 37 41 43 47 53 55 59 61 65 67 71 73 79 83 85 89 95 97
Note that while this may (IMO) be the most readable, it's the worst in terms of performance (so far):
UPDATED to take into account rawr's for loop solution:
microbenchmark(
filter={ v1 <- seq(100); Filter(function(i) { all(i %% c(2,3,7) != 0) }, v1) },
reduce={ v1 <- seq(100); v1[!Reduce(`|`,lapply(c(2,3,7), function(x) !(v1 %%x)))] },
rowout={ v1 <- seq(100); v1[rowSums(outer(v1, c(2, 3, 7), "%%") == 0) == 0] },
looopy={ v1 <- seq(100); for (ii in c(2,3,7)) v1 <- v1[-which(v1 %% ii == 0)]; v1 },
times=1000
)
## Unit: microseconds
## expr min lq mean median uq max neval cld
## filter 108.280 118.7000 143.88592 126.2155 136.6290 2349.952 1000 c
## reduce 21.552 23.8095 25.91997 24.8150 25.8580 144.067 1000 ab
## rowout 26.075 28.4920 31.11812 29.5350 31.2125 184.225 1000 b
## looopy 14.149 16.0765 18.11806 16.8995 17.8595 160.485 1000 a
To make it fair I added sequence generation to all of them (and, I was doing this to compare relative performance vs actual speed anyway, so the comparison results still work).
Original statement:
"Unsurprisingly, akrun's is optimal :-)"
is now superseded by:
"Unsurprisingly, rawr's is optimal :-)"

Basically you want to compute each of the numbers in 1:100 modulo 2, 3, and 7. You could use outer to perform all the modulo operations in a single vectorized operation, using rowSums to identify the elements in 1:100 that are not perfectly divided by 2, 3, or 7.
v1 <- 1:100
v1[rowSums(outer(v1, c(2, 3, 7), "%%") == 0) == 0]
# [1] 1 5 11 13 17 19 23 25 29 31 37 41 43 47 53 55 59 61 65 67 71 73 79 83 85 89 95 97

We can do this in a loop using lapply using the modulo operator, convert the 0 to TRUE by negating (!), use Reduce with | to find the corresponding list elements that are either TRUE, negate and subset the 'v1'
v1[!Reduce(`|`,lapply(c(2,3,7), function(x) !(v1 %%x)))]
Or instead of looping, this can be also done in a faster way.
v1[!(!v1%%2) + (!v1%%3) + (!v1%%7)]
data
v1 <- seq(100)

The other answers are better, but if you really need to use a for loop, as this question suggests, this could be a possibility:
x <- vector()
n <- 1L
for(i in 1:100){if (i%%2!=0 & i%%3!=0 & i%%7!=0) {x[n] <- i; n <- n+1}}
#> x
# [1] 1 5 11 13 17 19 23 25 29 31 37 41 43 47 53 55 59 61 65 67 71 73 79 83 85 89 95 97
As already mentioned, the other answers posted here are better because they exploit the vectorized capabilities of R. The short code shown here is probably slower than any of the other answers and more complicated to maintain. It is the typical syntax of other programming languages, like C or FORTRAN, applied to R. It works, but it is not the way things should be done.

Rather than using modulo arithmetic explicitly, we can generate the negative modulo sequence easily by counting down. Then for each of the three sequences, we can OR them all together, then drop it into which().
which(as.logical(pmin(rep_len(1:0, 100),
rep_len(2:0, 100),
rep_len(6:0, 100))))
If we want to be a bit less hardcoded, we might use do.call with lapply():
which(as.logical(do.call(pmin, lapply(c(2,3,7)-1, function(x)rep_len(x:0, 100)))))
EDIT:
Here's one way to do it using logicals:
v1 <- logical(100); for (ii in c(2,3,7) -1) v1 <- v1 | rep_len(rep(c(F,T), c(ii,1)), 100) ; which(!v1)

I had the same problem in my class. I assumed the teacher gave me all the information I needed to find the answer and I was correct. This is week one and all that other silly stuff all you other advanced people used has not came up.
I did this though.
r = c(1:100)
which(r %% 3 == 0 & r %% 7 == 0 & r %% 2 == 0)
Use the which function.

Related

How to use Logical indexing and min function to find the row which has min value?

So, I know how to find it using the subset function. Is there any way not to use subset function?
Example dataset:
Month A B
J 67 89
F 48 69
M 78 89
A 54 90
M 54 75
So, lets say I need to write a code to find the min value in Column B.
My Code: subset(df, B == min(df)
My question:
How to use Logical indexing and min function for this dataset? I don't wanna use subset.
You can use which to find the postitions.
x <- c(2,1,3,1)
which(x == min(x))
#[1] 2 4
To get the first hit which.min could be used.
which.min(x)
#[1] 2
With the given data set.
x <- read.table(header=TRUE, text="Month A B
J 67 89
F 48 69
M 78 89
A 54 90
M 54 75")
which(x$B == min(x$B))
#[1] 2
which(x[2:3] == min(x[2:3]), TRUE)
# row col
#[1,] 2 1

reducing repetitive tasks in data.table in R

I notice that i am doing the same thing multiple time, just with slightly different values:
HCCtreshold <- 40000
claimsMonthly[, HCC12mnth := +(HCCtreshold < claim12month)][ HCC12mnth == 1, `:=` (aboveHCCth12mnth = (claim12month - HCCtreshold))][is.na(aboveHCCth12mnth),aboveHCCth12mnth := 0]
claimsMonthly[, HCC11mnth := +(HCCtreshold < claim11month)][ HCC11mnth == 1, `:=` (aboveHCCth11mnth = (claim11month - HCCtreshold))][is.na(aboveHCCth11mnth),aboveHCCth11mnth := 0]
claimsMonthly[, HCC10mnth := +(HCCtreshold < claim10month)][ HCC10mnth == 1, `:=` (aboveHCCth10mnth = (claim10month - HCCtreshold))][is.na(aboveHCCth10mnth),aboveHCCth10mnth := 0]
So started with something like this:
k <- seq.default(from = 8, to = 12, by = 1)
claimsMonthly[paste0("HCC", k, "mnth") := lapply(k, function(x) (+(HCCtreshold < paste0("HCC", k, "mnth"))))]
i get an error:
Error: Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").
I also tried:
for(k in 8:12){
claimsMonthly[, paste0("HCC", k, "mnth") := +(HCCtreshold < paste0("HCC", k, "mnth"))]
}
the columns are created correctly, but i get incorrect values inside them. I get an 1 everywhere
I am not sure what i am doing wrong?
I can offer some suggestions and, with some fake data, try them out.
You can programmatically define names on the left-hand side of := if you wrap a vector in c(...), so for instance DT[ c(vec_of_names) := list(some, values)].
You can programmatically retrieve values of variables with a vector of variable names and mget. While I generally think mget can indicate problematic code, I believe that in here it works with low risk. (While mget and get normally retrieve variables from the operating environment, often .GlobalEnv, from within a data.table operation then retrieve columns just as easily.)
Instead of a double-tap of assignment with == 1 and then is.na(...), we can use some logical trickery and the data.table::fcoalesce function. (If you aren't familiar, fcoalesce operates like SQL's coalesce function which is a vector-friendly way of finding the first non-NA value in arguments of vectors.
fcoalesce(c(1, 2, NA, NA), c(11, 12, 13, NA), c(21, 22, 23, 24))
# [1] 1 2 13 24
We can use fcoalesce(some + math * calc, 0) to do the math and, if NA, replace it with 0. (We use it on the above* variables below, and not necessarily on the HCC* logical variables. It can apply there too, if desired. If those HCC* variables are throw-away, though, it just doesn't matter.)
Fake data:
library(data.table)
set.seed(42)
hccthreshold <- 50
dat <- data.table( claim10month = sample(99, 10), claim11month = sample(99, 10), claim12month = sample(99, 10) )
dat$claim11month[5] <- NA
dat
# claim10month claim11month claim12month
# 1: 91 46 90
# 2: 92 71 14
# 3: 28 91 96
# 4: 80 25 91
# 5: 61 NA 8
# 6: 49 89 49
# 7: 69 97 37
# 8: 13 11 84
# 9: 60 95 41
# 10: 64 51 76
First, let's programmatically determine the column names we want to act on, and from then create the same vectors for the new variables. (I'm a big fan of determining and adapting these variable names programmatically, so that if you get a partial data set your code still works. You might consider setting checks and alarms to catch something wrong. For instance, stopifnot(length(claimnames) == 12L), in case you are expecting to always have precisely 12 months.)
claimnames <- grep("^claim[0-9]+month", colnames(dat), value = TRUE)
hccnames <- gsub("^claim", "HCC", claimnames)
abovenames <- gsub("^claim", "aboveHCC", claimnames)
claimnames
# [1] "claim10month" "claim11month" "claim12month"
hccnames
# [1] "HCC10month" "HCC11month" "HCC12month"
abovenames
# [1] "aboveHCC10month" "aboveHCC11month" "aboveHCC12month"
And now, we can process the data.
dat[, c(hccnames) := lapply(mget(claimnames), `>`, hccthreshold) ]
dat[, c(abovenames) := Map(function(hcc, clm) fcoalesce(clm - hcc * hccthreshold, 0),
mget(hccnames), mget(claimnames)) ]
dat
# claim10month claim11month claim12month HCC10month HCC11month HCC12month aboveHCC10month aboveHCC11month aboveHCC12month
# 1: 91 46 90 TRUE FALSE TRUE 41 46 40
# 2: 92 71 14 TRUE TRUE FALSE 42 21 14
# 3: 28 91 96 FALSE TRUE TRUE 28 41 46
# 4: 80 25 91 TRUE FALSE TRUE 30 25 41
# 5: 61 NA 8 TRUE NA FALSE 11 0 8
# 6: 49 89 49 FALSE TRUE FALSE 49 39 49
# 7: 69 97 37 TRUE TRUE FALSE 19 47 37
# 8: 13 11 84 FALSE FALSE TRUE 13 11 34
# 9: 60 95 41 TRUE TRUE FALSE 10 45 41
# 10: 64 51 76 TRUE TRUE TRUE 14 1 26
I chose to keep the HCC* variables as logical instead of your +(...) integers, but it's directly translatable and up to you.

How can I extract a part of a vector to another vector (including positions)

I have a vector with different values (positive and negative), so, I want to select only the 10 lowest odd number values, and the 10 lowest pair values. Help me, please!
This is a way to do it using base R.
vector with odd and even numbers
x <- sample(-100:100, 30)
The modulus operator in R help to get the job done. You can use it this way
c(
# Extract the lowest even numbers
head(sort(x[x %% 2 == 0]), 5),
# Extract the lowest odds numbers
head(sort(x[x %% 2 == 1]), 5)
)
Given vector vas your input vector, you can obtain the desired output (including positions) via the following code
names(v) <- seq_along(v)
# lowest 10 odd numbers
low_odd <- sort(v[v%%2==1])[1:10]
# positions of those odd numbers in v
low_odd_pos <- as.numeric(names(low_odd))
# lowest 10 even numbers
low_even <- sort(v[v%%2==0])[1:10]
# positions of those even numbers in v
low_even_pos <- as.numeric(names(low_even))
Example
set.seed(1)
v <- sample(-50:50)
then
> low_odd
43 101 39 95 85 72 7 73 45 29
-49 -47 -45 -43 -41 -39 -37 -35 -33 -31
> low_odd_pos
[1] 43 101 39 95 85 72 7 73 45 29

Flip Every Nth Coin in R [duplicate]

This question already has answers here:
R: How to use ifelse statement for a vector of characters
(2 answers)
Closed 6 years ago.
My friend gave me a brain teaser that I wanted to try on R.
Imagine 100 coins in a row, with heads facing up for all coins. Now every 2nd coin is flipped (thus becoming tails). Then every 3rd coin is flipped. How many coins are now showing heads?
To create the vector, I started with:
flips <- rep('h', 100)
levels(flips) <- c("h", "t")
Not sure how to proceed from here. Any help would be appreciated.
Try this:
coins <- rep(1, 100) # 1 = Head, 0 = Tail
n = 3 # run till the time when you flip every 3rd coin
invisible(sapply(2:n function(i) {indices <- seq(i, 100, i); coins[indices] <<- (coins[indices] + 1) %% 2}) )
which(coins == 1)
# [1] 1 5 6 7 11 12 13 17 18 19 23 24 25 29 30 31 35 36 37 41 42 43 47 48 49 53 54 55 59 60 61 65 66 67 71 72 73 77 78 79 83 84 85 89 90 91 95 96 97
sum(coins==1)
#[1] 49
If you run till n = 100, only the coins at the positions which are perfect squares will be showing heads.
coins <- rep(1, 100) # 1 = Head, 0 = Tail
n <- 100
invisible(sapply(2:n, function(i) {indices <- seq(i, 100, i); coins[indices] <<- (coins[indices] + 1) %% 2}) )
which(coins == 1)
# [1] 1 4 9 16 25 36 49 64 81 100
sum(coins==1)
# [1] 10

Modified Bootstrapping

I'm interested in developing a modified bootstrap that samples some vector of length x, with replacement, but must meet a number of number of criteria before stopping the sampling. I'm attempting to calculate confidence intervals for lambda of a populations growth rate, 10000 iterations, but in some groupings of individuals, say vector 13, there are very few individuals growing out of the group. Typical bootstrapping would lead to a fair number instances where growth in this vector does not occur and hence the model falls apart. Each vector consists of a certain number of 1's, 2's, and 3's where 1 represents staying within a group, 2 growing out of a group, and 3 death. Here is what I have so far without the modification, it is likely not the best approach time wise, but I am new to R.
st13 <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,3,3)
#runs
n <- 10000
stage <- st13
stagestay <- vector()
stagemoved <- vector()
stagedead <- vector()
for(i in 1:n){
index <- sample(stage, replace=T)
stay <- ((length(index[index==1]))/(length(index)))
moved <- ((length(index[index==2]))/(length(index)))
stagestay <- rbind(stagestay,stay)
stagemoved <- rbind(stagemoved,moved)
}
Currently, this samples
My question is then: In what way can I modify the sample function to continue sampling these numbers until the length of "index" is at least the same as st13 AND until at least 1 instance of a 2 is present in "index"?
Thanks very much,
Kristopher Hennig
Masters Student
University of Mississippi
Oxford, MS, 38677
Update:
The answer from #lselzer reminded me that the requirement was for the length of the sample to be at least as long as st13. My code above just keeps sampling until it finds a bootstrap sample that contains a 2. The code of #lselzer grows the sample, 1 new index at a time, until the sample contains a 2. This is quite inefficient as you might have to call sample() many times till you get 2. My code might repeat a long time before a 2 is returned in the sample. So can we do any better?
One way would be to sample a large sample with replacement using a single call to sample(). Check which are 2s and see if there is a 2 within the first length(st13) entries. If there is, return those entries, if not, find the first 2 in the large sample and return all entries up to an including that one. If there are no 2s, add on another large sample and repeat. Here is some code:
#runs
n <- 100 #00
stage <- st13
stagedead <- stagemoved <- stagestay <- Size <- vector()
sampSize <- 100 * (len <- length(stage)) ## sample size to try
for(i in seq_len(n)){
## take a large sample
samp <- sample(stage, size = sampSize, replace = TRUE)
## check if there are any `2`s and which they are
## and if no 2s expand the sample
while(length((twos <- which(samp == 2))) < 1) {
samp <- c(samp, sample(stage, size = sampSize, replace = TRUE))
}
## now we have a sample containing at least one 2
## so set index to the required set of elements
if((min.two <- min(twos)) <= len) {
index <- samp[seq_len(len)]
} else {
index <- samp[seq_len(min.two)]
}
stay <- length(index[index==1]) / length(index)
moved <- length(index[index==2]) / length(index)
stagestay[i] <- stay
stagemoved[i] <- moved
Size[i] <- length(index)
}
Here is a really degenerate vector with only a single 2 in 46 entries:
R> st14 <- sample(c(rep(1, 45), 2))
R> st14
[1] 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[39] 1 1 1 1 1 1 1 1
If I use the above loop on it rather than st13, I get the following for the minimum sample size required to get a 2 on each of the 100 runs:
R> Size
[1] 65 46 46 46 75 46 46 57 46 106 46 46 46 66 46 46 46 46
[19] 46 46 46 46 46 279 52 46 63 70 46 46 90 107 46 46 46 87
[37] 130 46 46 46 46 46 46 60 46 167 46 46 46 71 77 46 46 84
[55] 58 90 112 52 46 53 85 46 59 302 108 46 46 46 46 46 174 46
[73] 165 103 46 110 46 80 46 166 46 46 46 65 46 46 46 286 71 46
[91] 131 61 46 46 141 46 46 53 47 83
So it would suggest that the sampSize I chose (100 * length(stage)) is a bit of overkill here but as all the operators we are using are vectorised we probably don't incur to much penalty for the overly long initial sample size, and we certainly don't incur any extra sample() calls.
Original:
If I understand you correctly, the problem is that sample() might not return any 2 indicies at all. If so, we can continue sampling until it does using the repeat control flow construct.
I've altered your code accordingly, and optimised it a bit because you never grow objects in a loop like you were doing. There are other ways this could be improved, but I'll stick with the loop for now. Explanation comes below.
st13 <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,3,3)
#runs
n <- 10000
stage <- st13
stagedead <- stagemoved <- stagestay <- vector()
for(i in seq_len(n)){
repeat {
index <- sample(stage, replace = TRUE)
if(any(index == 2)) {
break
}
}
stay <- length(index[index==1]) / length(index)
moved <- length(index[index==2]) / length(index)
stagestay[i] <- stay
stagemoved[i] <- moved
}
This is the main change related to your Q:
repeat {
index <- sample(stage, replace = TRUE)
if(any(index == 2)) {
break
}
}
what this does is repeat the code contained in the braces until a break is triggered to jump us out of the repeat loop. So what happens is we take a bootstrap sample, then check if any of the sample contains the index 2. If there are any 2s then we break out and carry on with the rest of the current for loop iteration. If the sample doesn't contain any 2s, the break is not triggered and we go round again taking another sample. This will happen until we do get a sample with a 2 in it.
For starters, sample has a size argument which you could use to match the length of st13. The second part of your question could be solved using a while loop.
st13 <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,3,3)
#runs
n <- 10000
stage <- st13
stagestay <- vector()
stagemoved <- vector()
stagedead <- vector()
for(i in 1:n){
index <- sample(stage, length(stage), replace=T)
while(!any(index == 2)) {
index <- c(index, sample(stage, 1, replace = T))
}
stay <- ((length(index[index==1]))/(length(index)))
moved <- ((length(index[index==2]))/(length(index)))
stagestay[i] <- stay
stagemoved[i] <- moved
}
While I was writing this Gavin posted his answer which is similar to mine, but I added the size argument to be sure index has at least the lenght of st13

Resources