For each column, count number of rows until condition is met - r

I would like to create a vector whose values are derived from counting the number of rows (for each column), starting at the last row, and counting "up" until a one is reached. For example,
1 1 1
1 1 0
1 0 0
would result in the following answer 0 1 2. There are 0 rows until 1 is reached in column one, and 1 row until a 1 is reached for column 2, etc.
I would like to implement the solution described above into the following code (#TimeSince):
Lattice <- rep(NA_integer_, 6) #
Results <- rep(0, 6) #
TimeSince <- rep(0,6) #
Prob <- c(0.92, 0.90, 0.85, 0.80, 0.35, 0.15)
resultList <- list()
for (j in 1:100) {
for (i in 1:6){
if (runif(1,min=0, max=1) < Prob[i]){
Lattice[i] <- 1}
else{Lattice[i:6] <- 0}
if (Lattice[i] == 0) break()}
resultList[[j]] <- Lattice
Results <- Lattice + Results
#TimeSince[[j]] <- count rows until '1' in Results per column
}

The following code produces the result I was seeking above:
#Edited code based on Ben Bolker comment
for (j in 1:100) {
for (i in 1:6){
if (runif(1,min=0, max=1) < Prob[i]){
Lattice[i] <- 1}
else{Lattice[i:6] <- 0}
if (Lattice[i] == 0) break()}
resultList[[j]] <- Lattice
dm <- do.call("rbind",resultList) # added in edit
TimeSince <- (apply(dm,2,function(x) which(rev(x)==1)[1])) - 1 # added in edit
}
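As a quick check, the `which(rev(x) == 1)[1] - 1` idea can be tried on the small 3 x 3 example from the question. This is only a sketch: the helper name `rows_since_one` is made up for illustration, and returning NA for a column that contains no 1 is an assumption about the desired behaviour.

```r
# The 3 x 3 example matrix from the question
m <- matrix(c(1, 1, 1,
              1, 1, 0,
              1, 0, 0), nrow = 3, byrow = TRUE)

# Hypothetical helper: number of rows between the last row and the nearest 1 above it
rows_since_one <- function(col) {
  hit <- which(rev(col) == 1)          # positions of 1s, counted from the bottom
  if (length(hit) == 0) NA_integer_ else hit[1] - 1L
}

time_since <- apply(m, 2, rows_since_one)
time_since
# [1] 0 1 2
```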

Related

Locating the row and col number with the entry number in R

I am trying to find the row and column numbers of a matrix once I have the entry number. For example if I am trying to find a 0 in a matrix full of numbers I would try something like this
test_array <- array(1,c(30,30))
test_array[200] <- 0
counter <- 0
for(i in test_array){
counter <- counter + 1
if(i == 0){
print(counter)
}
}
200
So now I know that at position 200 I have a 0 but how do I check where it is in terms of its row and col.
Something like (15, 8)
You can use the arr.ind argument in which:
which(test_array == 0, arr.ind = TRUE)
#> row col
#> [1,] 20 7
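For intuition, the same row and column can be recovered by hand, because R stores matrices in column-major order. The sketch below assumes a single match; the variable names `idx`, `n_rows`, `row_num` and `col_num` are illustrative only. Base R's `arrayInd()` performs the same conversion.

```r
test_array <- array(1, c(30, 30))
test_array[200] <- 0

idx <- which(test_array == 0)            # linear (entry) index
n_rows <- nrow(test_array)

# Column-major storage: entries run down column 1, then column 2, and so on
row_num <- (idx - 1) %% n_rows + 1
col_num <- (idx - 1) %/% n_rows + 1
c(row = row_num, col = col_num)

# Built-in equivalent
arrayInd(idx, dim(test_array))
```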

Storing simulation results in R

I want to estimate Mantel-Haenszel Differential Item Functioning (DIF) Odds Ratio and HMDDIF index. I wrote the function below. It seems to me I am making a mistake when storing the results. Would you please take a look at this and give me feedback?
Here is the sample data:
# generate dataset
r <- 1000
c <- 16
test <- matrix(rbinom(r*c,1,0.5),r,c)
# create sum scores for each student using first 15 columns
test <- cbind(test, apply(test[,1:15],1,sum))
colnames(test) <- c("v1","v2","v3","v4","v5","v6","v7","v8","v9","v10","v11","v12","v13","v14","v15","group","score")
test <- as.data.frame(test)
The first 15 columns are the student True/false responses to items/questions. The group membership column is the 16th column. The student "score" variable is the sum of item scores at the last (17th) column. The formula can be found here in the picture that I got from Wikipedia (https://en.wikipedia.org/wiki/Differential_item_functioning).
For each score category, I want to estimate the last two formulas in this picture. Rows are 10 students and columns are six items/questions. Again, the 16th column is group membership (1-focal, 0-reference).
Here is my function code.
library(dplyr)
# this function first starts with the first item and loop k scores from 1-15. Then move to the second item.
# data should only contain the items, grouping variable, and person score.
Mantel.Haenszel <- function (data) {
# browser() #runs with debug
for (item in 1:15) { #item loop not grouping/scoring
item.incorrect <- data[,item] == 0
item.correct <- data[,item] == 1
Results <- c()
for (k in 1:15) { # for k scores
Ak <- nrow(filter(data, score == k, group == 0, item.correct)) # freq of ref group & correct
Bk <- nrow(filter(data, score == k, group == 0, item.incorrect)) # freq of ref group & incorrect
Ck <- nrow(filter(data, score == k, group == 1, item.correct)) # freq of foc group & correct
Dk <- nrow(filter(data, score == k, group == 1, item.incorrect)) # freq of foc group & incorrect
nrk <- nrow(filter(data, score == k, group == 0)) #sample size for ref
nfk <- nrow(filter(data, score == k, group == 1)) #sample size for focal
if (Bk == 0 | Ck == 0) {
next
}
nominator <-sum((Ak*Dk)/(nrk + nfk))
denominator <-sum((Bk*Ck)/(nrk + nfk))
odds.ratio <- nominator/denominator
if (odds.ratio == 0) {
next
}
MH.D.DIF <- (-2.35)*log(odds.ratio) #index
# save the output
out <- list("Odds Ratio" = odds.ratio, "MH Diff" = MH.D.DIF)
results <- rbind(Results, out)
return(results)
} # close score loop
} # close item loop
} #close function
Here is what I get
# test function
Mantel.Haenszel(test)
> Mantel.Haenszel(test)
Odds Ratio MH Diff
out 0.2678571 3.095659
What I want to get is
> Mantel.Haenszel(test)
Odds Ratio MH Diff
out 0.2678571 3.095659
## ##
.. ..
(15 rows here for 15 score categories in the dataset)
Should you not expect a result for every combination of item and k, for a maximum of 225 output rows, barring any iterations skipped by next? If so, I think you just need to change a few minor things. First, declare Results only once, at the beginning of your function. Then, make sure you are rbind-ing and returning either Results or results, but not both (R is case-sensitive, so these are two different objects). Then, move your return to the function level rather than inside the loops. In the example below I've also included the current item and k for demonstration:
Mantel.Haenszel <- function (data) {
# browser() #runs with debug
Results <- c()
for (item in 1:15) {
#item loop not grouping/scoring
item.incorrect <- data[, item] == 0
item.correct <- data[, item] == 1
for (k in 1:15) {
# for k scores
Ak <-
nrow(filter(data, score == k, group == 0, item.correct)) # freq of ref group & correct
Bk <-
nrow(filter(data, score == k, group == 0, item.incorrect)) # freq of ref group & incorrect
Ck <-
nrow(filter(data, score == k, group == 1, item.correct)) # freq of foc group & correct
Dk <-
nrow(filter(data, score == k, group == 1, item.incorrect)) # freq of foc group & incorrect
nrk <-
nrow(filter(data, score == k, group == 0)) #sample size for ref
nfk <-
nrow(filter(data, score == k, group == 1)) #sample size for focal
if (Bk == 0 | Ck == 0) {
next
}
nominator <- sum((Ak * Dk) / (nrk + nfk))
denominator <- sum((Bk * Ck) / (nrk + nfk))
odds.ratio <- nominator / denominator
if (odds.ratio == 0) {
next
}
MH.D.DIF <- (-2.35) * log(odds.ratio) #index
# save the output
out <-
list(
item = item,
k = k,
"Odds Ratio" = odds.ratio,
"MH Diff" = MH.D.DIF
)
Results <- rbind(Results, out)
} # close score loop
} # close item loop
return(Results)
} #close function
test.output <- Mantel.Haenszel(test)
Gives an output like:
> head(test.output, 20)
item k Odds Ratio MH Diff
out 1 3 2 -1.628896
out 1 4 4.666667 -3.620046
out 1 5 0.757085 0.6539573
out 1 6 0.5823986 1.27041
out 1 7 0.9893293 0.02521097
out 1 8 1.078934 -0.1785381
out 1 9 1.006237 -0.01461145
out 1 10 1.497976 -0.9496695
out 1 11 1.435897 -0.8502066
out 1 12 1.5 -0.952843
out 2 3 0.8333333 0.4284557
out 2 4 2.424242 -2.08097
out 2 5 1.368664 -0.7375117
out 2 6 1.222222 -0.4715761
out 2 7 0.6288871 1.089938
out 2 8 1.219512 -0.4663597
out 2 9 1 0
out 2 10 2.307692 -1.965183
out 2 11 0.6666667 0.952843
out 2 12 0.375 2.304949
Is that what you're looking for?
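A related pattern, shown here only as a sketch on toy data, is to collect one small data frame per iteration in a list and bind them once at the end. This yields an ordinary data frame rather than a matrix of list cells. The names `rows`, `results` and `value` are placeholders and not part of the code above.

```r
rows <- list()
for (item in 1:3) {
  for (k in 1:4) {
    if (k == 2) next                     # skipped combinations simply add no row
    rows[[length(rows) + 1]] <- data.frame(item = item, k = k, value = item * k)
  }
}

# Bind all collected rows in a single step
results <- do.call(rbind, rows)
nrow(results)
# [1] 9
```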

R: find consecutive occurrence of a number

First, define some functions to bind a list row-wise and column-wise:
# a function to append vectors row wise
rbindlist <- function(list) {
n <- length(list)
res <- NULL
for (i in seq(n)) res <- rbind(res, list[[i]])
return(res)
}
cbindlist <- function(list) {
n <- length(list)
res <- NULL
for (i in seq(n)) res <- cbind(res, list[[i]])
return(res)
}
# generate sample data
sample.dat <- list()
set.seed(123)
for(i in 1:365){
vec1 <- sample(c(0,1), replace=TRUE, size=5)
sample.dat[[i]] <- vec1
}
dat <- rbindlist(sample.dat)
dat has five columns. Each column is a location and has 365 days of the year (365 rows) with values 1 or 0.
I have another dataframe (see below) which has certain days of the year for each column (location) in dat.
# generate second sample data
set.seed(123)
sample.dat1 <- list()
for(i in 1:5){
vec1 <- sort(sample(c(258:365), replace=TRUE, size=4), decreasing = F)
sample.dat1[[i]] <- vec1
}
dat1 <- cbindlist(sample.dat1)
I need to use dat1 to subset days in dat to do a calculation. An example below:
1) For location 1 (first column in both dat1 and dat):
In column 1 of dat, select the days from 289 till 302 (using dat1), find the longest consecutive occurrence of 1.
Repeat it and this time select the days from 303 (302 + 1) till 343 from dat, find the longest consecutive occurrence of 1.
Repeat it for 343 till 353: select the days from 344 (343 + 1) till 353, find the longest consecutive occurrence of 1.
2) Do this for all the columns
If I want to do sum of 1s, I can do this:
dat <- as.tibble(dat)
dat1 <- as.tibble(dat1)
pmap(list(dat,dat1), ~ {
range1 <- ..2[1]
range2 <- ..2[2]
range3 <- ..2[3]
range4 <- ..2[4]
sum.range1 <- sum(..1[range1:range2]) # this will generate sum between range 1 and range 2
sum.range2 <- sum(..1[range2:range3]) # this will generate sum between range 2 and range 3
sum.range3 <- sum(..1[range3:range4]) # this will generate sum between range 3 and range 4
c(sum.range1=sum.range1,sum.range2=sum.range2,sum.range3=sum.range3)
})
For the longest consecutive occurrence of 1 within each range, I thought of using the rle function. Example below:
pmap(list(dat,dat1), ~ {
range1 <- ..2[1]
range2 <- ..2[2]
range3 <- ..2[3]
range4 <- ..2[4]
spell.range1 <- rle(..1[range1:range2]) # run-length encode the range; this holds the runs of BOTH values (0 and 1)
spell.1.range1 <- tapply(spell.range1$lengths, spell.range1$values, max)[2] # this should select the maximum consecutive run of 1
spell.range2 <- rle(..1[range2:range3]) # run-length encode the range; this holds the runs of BOTH values (0 and 1)
spell.1.range2 <- tapply(spell.range2$lengths, spell.range2$values, max)[2] # this should select the maximum consecutive run of 1
spell.range3 <- rle(..1[range3:range4]) # run-length encode the range; this holds the runs of BOTH values (0 and 1)
spell.1.range3 <- tapply(spell.range3$lengths, spell.range3$values, max)[2] # this should select the maximum consecutive run of 1
c(spell.1.range1 = spell.1.range1, spell.1.range2 = spell.1.range2, spell.1.range3 = spell.1.range3)
})
I get an error, which I think is because I am not using the rle function properly here. I would really like to keep the code as above, since my other code follows the same pattern and the format of the output suits my needs, so I would appreciate it if someone could suggest how to fix it.
OP's code does work for me. So, without a specific error message it is impossible to understand why the code is not working for the OP.
However, the sample datasets created by the OP are matrices (before they were coerced to tibble) and I felt challenged to find a way to solve the task in base R without using purrr:
To find the longest run of consecutive occurrences of a particular value val in a vector x, we can use the following function:
max_rle <- function(x, val) {
y <- rle(x)
len <- y$lengths[y$values == val]
if (length(len) > 0) max(len) else NA
}
Examples:
max_rle(c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1), 1)
[1] 4
max_rle(c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1), 0)
[1] 2
# find consecutive occurrences in column batches
lapply(seq_len(ncol(dat1)), function(col_num) {
start <- head(dat1[, col_num], -1L)
end <- tail(dat1[, col_num], -1L) - 1
sapply(seq_along(start), function(range_num) {
max_rle(dat[start[range_num]:end[range_num], col_num], 1)
})
})
[[1]]
[1] 8 4 5
[[2]]
[1] 4 5 2
[[3]]
[1] NA 3 4
[[4]]
[1] 5 5 4
[[5]]
[1] 3 2 3
The outer lapply() loops over the columns of dat and dat1, respectively. The inner sapply() loops over the row ranges stored in dat1 and subsets dat accordingly.
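One thing that may be worth checking in the question's code is the positional `[2]` index on the `tapply()` result. The sketch below, on made-up input, shows what `rle()` returns and how `[2]` behaves when a range contains only 1s. Whether this is the cause of the reported error is an assumption, since no error message was given.

```r
# What rle() returns for a mixed vector
x <- c(0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1)
runs <- rle(x)
runs$lengths   # 1 4 2 1 1 2
runs$values    # 0 1 0 1 0 1

# A range that contains only 1s: there is just one group, so position 2 does not exist
only_ones <- c(1, 1, 1)
runs1 <- rle(only_ones)
longest <- tapply(runs1$lengths, runs1$values, max)
by_position <- longest[2]      # NA, although the longest run of 1s is 3
by_name <- longest["1"]        # 3, selected by group label instead of position
```

Selecting by name returns NA only when the range really contains no 1s, which is the same convention max_rle() uses.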

R create vector with a for and while loop

Good morning,
I have the following problem.
My data.frame "data" has the format:
Type amount
1 2
2 0
3 3
I would like to create a vector with the format:
1
1
3
3
3
This means I would like to transform my data.
I created a vector and wrote the following code for my transformation in R:
vector <- numeric(5)
for (i in 1:3){
k <- 1
while (k <= data[i,2]){
vector[k] <- data[i,1]
k <- k+1
}
}
The problem is that I get the following result, and I have no idea where I go wrong:
3
3
3
0
0
There might be many different ways of solving this particular problem in R, but I am curious why my solution doesn't work. I am thankful for alternatives, but I would really like to know what my mistake is.
Thanks for your help!
Try this solution:
df <- data.frame(type = c(1, 2, 3), amount = c(2, 0, 3))
result <- unlist(mapply(function(x, y) rep.int(x, y), df[, "type"], df[, "amount"]))
result
The output is as follows:
# [1] 1 1 3 3 3
Your code is indeed buggy. The corrected code should look like this:
df <- data.frame(type = c(1, 2, 3), amount = c(2, 0, 3))
vector <- numeric(5)
k <- 1
for (i in 1:3) {
j <- 1
while (j <= df[i, 2]) {
vector[k] <- df[i, 1]
k <- k + 1
j <- j + 1
}
}
vector
# [1] 1 1 3 3 3
Probably the fastest and most elegant way to obtain this result has been posted before in a comment by @akrun:
with(data, rep(Type, amount))
[1] 1 1 3 3 3
However, if you want to do this with for/while loops, it could be helpful to use a list for such cases, where the number of entries is not known at the beginning.
Here is an example with minimal modifications of your code:
my_list <- vector("list", 3)
for (i in 1:3) {
k <- 1
while (k <= data[i,2]){
my_list[[i]][k] <- data[i,1]
k <- k + 1
}
}
vector <- unlist(my_list)
#> vector
#[1] 1 1 3 3 3
The reason why your code didn't work was essentially that you were trying to put too much information into a single variable, k. It cannot serve as both an index into your output vector and a counter for the number of copies of each entry in the first column of data, since that counter is reset to 1 each time the for loop moves on to the next row.
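To make the overwrite visible, here is a sketch that reruns the original loop on the data from the question (the result is stored in `out` rather than `vector`, a name chosen only for this illustration) and compares it with the vectorised `rep()` call.

```r
data <- data.frame(Type = c(1, 2, 3), amount = c(2, 0, 3))

out <- numeric(5)
for (i in 1:3) {
  k <- 1                        # reset for every row, so writing restarts at out[1]
  while (k <= data[i, 2]) {
    out[k] <- data[i, 1]        # row 3 overwrites what row 1 stored
    k <- k + 1
  }
}
out
# [1] 3 3 3 0 0

# Vectorised equivalent of the intended result
fixed <- rep(data$Type, times = data$amount)
fixed
# [1] 1 1 3 3 3
```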

Euler Project #1 in R

Problem
Find the sum of all numbers below 1000 that are divisible by 3 or 5
One solution I created:
x <- c(1:999)
values <- x[x %% 3 == 0 | x %% 5 == 0]
sum(values)
Second solution I can't get to work and need help with. I've pasted it below.
I'm trying to use a loop (here, I use while() and after this I'll try for()). I am still struggling with keeping references to indexes (locations in a vector) separate from values/observations within vectors. Loops seem to make it more challenging for me to distinguish the two.
Why does this not produce the answer to Euler #1?
x <- 0
i <- 1
while (i < 100) {
if (i %% 3 == 0 | i %% 5 == 0) {
x[i] <- c(x, i)
}
i <- i + 1
}
sum(x)
And in words, line by line this is what I understand is happening:
x gets value 0
i gets value 1
while object i's value (not the index #) is < 1000
if is divisible by 3 or 5
add that number i to the vector x
add 1 to i (in order to keep the loop going to the defined limit of 1e3)
sum all items in vector x
I am guessing x[i] <- c(x, i) is not the right way to add an element to vector x. How do I fix this and what else is not accurate?
First, your loop runs until i < 100, not i < 1000.
Second, replace x[i] <- c(x, i) with x <- c(x, i) to add an element to the vector.
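Putting both fixes together gives the sketch below. The original `x[i] <- c(x, i)` tries to store a whole vector in the single slot `x[i]`: R keeps only the first element (with a warning) and pads the skipped positions with NA, so `sum(x)` comes out as NA.

```r
x <- 0
i <- 1
while (i < 1000) {                 # upper bound corrected from 100 to 1000
  if (i %% 3 == 0 | i %% 5 == 0) {
    x <- c(x, i)                   # append the value i to the end of x
  }
  i <- i + 1
}
sum(x)
# [1] 233168
```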
Here is a shortcut that performs this sum, which is probably more in the spirit of the problem:
3*(333*334/2) + 5*(199*200/2) - 15*(66*67/2)
## [1] 233168
Here's why this works:
In the set of integers [1,999] there are:
333 values that are divisible by 3. Their sum is 3*sum(1:333) or 3*(333*334/2).
199 values that are divisible by 5. Their sum is 5*sum(1:199) or 5*(199*200/2).
Adding these up gives a number that is too high by the sum of their intersection, which consists of the values that are divisible by 15. There are 66 such values, and their sum is 15*sum(1:66) or 15*(66*67/2).
As a function of N, this can be written:
f <- function(N) {
threes <- floor(N/3)
fives <- floor(N/5)
fifteens <- floor(N/15)
3*(threes*(threes+1)/2) + 5*(fives*(fives+1)/2) - 15*(fifteens*(fifteens+1)/2)
}
Giving:
f(999)
## [1] 233168
f(99)
## [1] 2318
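As a sanity check, the closed form can be compared with a brute-force sum for a few values of N. This is only a sketch: `f` is repeated from above so the block runs on its own, and `brute` and `Ns` are names introduced here for illustration.

```r
# Closed form from above, repeated so this block is self-contained
f <- function(N) {
  threes <- floor(N / 3)
  fives <- floor(N / 5)
  fifteens <- floor(N / 15)
  3 * (threes * (threes + 1) / 2) + 5 * (fives * (fives + 1) / 2) -
    15 * (fifteens * (fifteens + 1) / 2)
}

# Brute force: filter 1..N and add up the matches
brute <- function(N) {
  x <- seq_len(N)
  sum(x[x %% 3 == 0 | x %% 5 == 0])
}

Ns <- c(9, 15, 99, 999)
agree <- vapply(Ns, function(N) f(N) == brute(N), logical(1))
all(agree)
# [1] TRUE
```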
And another way:
x <- 1:999
sum(which(x%%5==0 | x%%3==0))
# [1] 233168
A very efficient approach is the following:
div_sum <- function(x, n) {
# returns twice the sum of all integers from 1 to n
# that are divisible by x (the caller divides by 2 at the end)
max_num <- n %/% x
(x * (max_num + 1) * max_num)
}
n <- 999
a <- 3
b <- 5
(div_sum(a, n) + div_sum(b, n) - div_sum(a * b, n)) / 2
In contrast, a very short code is the following:
x=1:999
sum(x[!x%%3|!x%%5])
Here is an alternative that I think gives the same answer (using 99 instead of 999 as the upper bound):
iters <- 100
x <- rep(0, iters-1)
i <- 1
while (i < iters) {
if (i %% 3 == 0 | i %% 5 == 0) {
x[i] <- i
}
i <- i + 1
}
sum(x)
# [1] 2318
Here is the for-loop mentioned in the original post:
iters <- 99
x <- rep(0, iters)
for (i in 1:iters) {
if (i %% 3 == 0 | i %% 5 == 0) {
x[i] <- i
}
}
sum(x)
# [1] 2318
