I'm using runner::streak_run to count runs of 0s and 1s in a column called "inactive_indicator".
The column is: 0,0,0,1,1,1,0,1,1,0,0,0,0,0,0,0,0,1,1,1,1
For runner::streak_run(inactive_indicator)
I get the following:
1,2,3,1,2,3,1,1,2,1,2,3,4,5,5,5,5,1,2,3,4
Why is it stuck on 5 when it should go up to 8?
The documentation describes k as: "running window size. By default window size equals length(x). Allow varying window size specified by vector of length(x)".
As I understand it, the default should therefore be enough.
The problem goes away and I get the expected result when I set k explicitly:
runner::streak_run(inactive_indicator, k = length(inactive_indicator))
Why doesn't it work in the first place?
This can be solved with rle from base R
sequence(rle(inactive_indicator)$lengths)
#[1] 1 2 3 1 2 3 1 1 2 1 2 3 4 5 6 7 8 1 2 3 4
Checked with runner
runner::streak_run(inactive_indicator)
#[1] 1 2 3 1 2 3 1 1 2 1 2 3 4 5 6 7 8 1 2 3 4
It is possible that the column is not numeric and contains leading/trailing spaces. In that case, use trimws
runner::streak_run(trimws(inactive_indicator))
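A small sketch of why stray whitespace matters, using base R only. The vector raw is a made-up example (not the asker's data), and the conversion with as.numeric is an assumption about what the column should contain:

```r
# Hypothetical character column with stray spaces around the digits
raw <- c("0 ", " 0", "1", " 1 ", "1")

# Untrimmed, every element is a different string, so each run has length 1
untrimmed_runs <- rle(raw)$lengths

# Trim the whitespace and convert to numeric before counting streaks
clean <- as.numeric(trimws(raw))
streaks <- sequence(rle(clean)$lengths)
streaks
```

After trimming, the two zeros and the three ones form runs of length 2 and 3, so the streak counter no longer restarts at every element.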
data
inactive_indicator <- c(0,0,0,1,1,1,0,1,1,0,0,0,0,0,0,0,0,1,1,1,1)
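To see how the rle approach arrives at the streak counts, the intermediate run lengths can be inspected. This is only a base R illustration of the answer above and says nothing about how runner computes its result internally:

```r
inactive_indicator <- c(0,0,0,1,1,1,0,1,1,0,0,0,0,0,0,0,0,1,1,1,1)

# rle() collapses the vector into consecutive runs of equal values
runs <- rle(inactive_indicator)
runs$lengths  # length of each run
runs$values   # value repeated within each run

# sequence() counts 1..n within each run, which is the streak counter
streaks <- sequence(runs$lengths)
max(streaks)  # the longest streak
```

The run of eight zeros shows up as a single entry of 8 in runs$lengths, which is why the counter reaches 8 rather than stopping at 5.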
I have a dataset called restrictions and I know if people can do actions (eat with a fork, come out of bed...).
Each number represents with which level of difficulty each individual can do an action (1: No difficulty, 2: Some difficulties, 3: High difficulties, 4: Cannot do the action at all)
I am mostly interested in level 4.
The dataset looks like this (with many more variables)
> head(restrictions)
  RATOI_I RAHAB_I RANOU_I RAELI_I RAACH_I RAREP_I RAMEN_I RAADM_I RAMED_I RADPI_I RADPE_I RABUS_I
1       4       4       1       1       4       4       4       4       1       1       4       4
2       4       3       3       1       4       4       4       4       4       2       4       4
I would like to know how many people are at level 4 in RATOI_I (I can do that) and, among those people, how many are also at level 4 in RAHAB_I and in each of the other variables.
I looked at sapply(), but I am completely lost: I do not know how to use it or which function to pass to it.
Or should I use group_by() instead?
Thanks in advance!
You can use apply with sum on restrictions==4 to count the number of values equal to 4 in each column.
apply(restrictions==4, 2, sum)
#colSums(restrictions==4) #Alternative
#RATOI_I RAHAB_I RANOU_I RAELI_I RAACH_I RAREP_I RAMEN_I RAADM_I RAMED_I RADPI_I RADPE_I RABUS_I
# 2 1 0 0 2 2 2 2 1 0 2 2
Or only for the rows where restrictions$RATOI_I==4 (thanks to @Daniel-o for pointing this out); note the comma, which makes the condition select rows rather than columns:
apply(restrictions[restrictions$RATOI_I==4, ]==4, 2, sum)
#colSums(restrictions[restrictions$RATOI_I==4, ]==4)
#RATOI_I RAHAB_I RANOU_I RAELI_I RAACH_I RAREP_I RAMEN_I RAADM_I RAMED_I RADPI_I RADPE_I RABUS_I
# 2 1 0 0 2 2 2 2 1 0 2 2
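A self-contained sketch of the conditional count, assuming a reduced version of the data with only the first four columns of the two rows shown in the question. The name level4_in_ratoi is made up for illustration:

```r
# Reduced, hypothetical copy of the first four columns from the question
restrictions <- data.frame(
  RATOI_I = c(4, 4),
  RAHAB_I = c(4, 3),
  RANOU_I = c(1, 3),
  RAELI_I = c(1, 1)
)

# Keep only the rows (people) at level 4 in RATOI_I
level4_in_ratoi <- restrictions[restrictions$RATOI_I == 4, , drop = FALSE]

# Count the level-4 entries per column among those people
counts <- colSums(level4_in_ratoi == 4)

# The same count written with sapply(), as asked about in the question
counts_sapply <- sapply(level4_in_ratoi, function(col) sum(col == 4))
counts
```

Using drop = FALSE keeps the result a data frame even when the filter leaves a single column, so colSums keeps working on narrower data.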
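This can also be done in base R by recoding the values to 0/1 and summing the columns. The recoding overwrites the data, so work on a copy:
df <- restrictions  # work on a copy
df[df<4]<-0
df[df==4]<-1
colSums(df)
#RATOI_I RAHAB_I RANOU_I RAELI_I RAACH_I RAREP_I RAMEN_I RAADM_I RAMED_I RADPI_I RADPE_I RABUS_I
# 2 1 0 0 2 2 2 2 1 0 2 2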
I'd think this would be simple using the rev() and seq() functions, but am struggling to get the reverse order part correct.
I'm trying to get 5432101234543210 from 5:0.
Not too hard to set as a function...
try_it <- function(x) {
  # x[-1] drops the first element, so the turning points are not repeated
  c(rev(x), x[-1], rev(x)[-1])
}
try_it(0:5)
# [1] 5 4 3 2 1 0 1 2 3 4 5 4 3 2 1 0
Edit
Extend function to have variable repeats
try_it <- function(x, reps) {
  # one descending leg, then (reps - 1) / 2 up-and-down pairs
  c(rev(x), rep(c(x[-1], rev(x)[-1]), (reps - 1) / 2))
}
try_it(0:5, 5)
# [1] 5 4 3 2 1 0 1 2 3 4 5 4 3 2 1 0 1 2 3 4 5 4 3 2 1 0
Note: I've not worked hard to generalise this extension. It will not return the correct length for an even number of repetitions, but I'm sure you could modify it to suit your requirements.
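One way to avoid the odd/even restriction is to build the sequence leg by leg. This is a minimal sketch, not the answerer's code; the function name zigzag and its legs argument are made up for illustration:

```r
# x:    the first leg, e.g. 5:0
# legs: total number of legs, alternating direction after each one
zigzag <- function(x, legs) {
  stopifnot(length(x) >= 2, legs >= 1)
  out <- x
  leg <- x
  for (i in seq_len(legs - 1)) {
    leg <- rev(leg)          # reverse direction
    out <- c(out, leg[-1])   # drop the shared turning point
  }
  out
}

zigzag(5:0, 3)
# [1] 5 4 3 2 1 0 1 2 3 4 5 4 3 2 1 0
```

Because each leg is appended separately, an even number of legs works too: zigzag(5:0, 2) simply stops at the top of the first climb.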
Imagine a vector of integers like so:
> rep(c(1,4,2),10)
[1] 1 4 2 1 4 2 1 4 2 1 4 2 1 4 2 1 4 2 1 4 2 1 4 2 1 4 2 1 4 2
For us human beings it seems easy to identify the pattern 1 - 4 - 2, even without knowing how the vector was created. But how would you identify this pattern using R?
Edit
As this question was marked as a dupe, I'm going to make it more specific. The example above was a simple one to explain the idea. The main goal is to identify more hidden patterns, like 1 4 2 5 6 7 1 4 2 9 1 4 2 3 4 5 1 4 2, and also patterns that are only approximately the same, like 1 4 2 1 4 1.99 1 4 2 1.01 4 2 1 4.01 2. What approaches would always identify the pattern 1 4 2 in those cases?
Assuming that the subpattern must start at the beginning of the input and repeat through to its end, try each subpattern length k = 1, 2, 3, ... in turn. Only patterns of at most half the length of the input are considered:
for (k in seq_len(length(x)/2)) {
  pat <- x[1:k]
  if (identical(rep(pat, length.out = length(x)), x)) {
    print(pat)
    break
  }
}
## [1] 1 4 2
Note: This was used as the input x:
x <- rep(c(1, 4, 2), 10)
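The loop above requires an exact match. For the approximately repeating case raised in the question's edit, one option is to compare within a tolerance. This is a sketch under that assumption; the function name find_pattern and the tolerance of 0.05 are hypothetical choices, and it still assumes the pattern starts at the first element and repeats to the end:

```r
# Return the shortest leading subpattern whose repetition matches x
# within an absolute tolerance tol, or NULL if there is none
find_pattern <- function(x, tol = 0) {
  n <- length(x)
  for (k in seq_len(n %/% 2)) {
    pat <- x[seq_len(k)]
    candidate <- rep(pat, length.out = n)
    if (all(abs(candidate - x) <= tol)) {
      return(pat)
    }
  }
  NULL
}

noisy <- c(1, 4, 2, 1, 4, 1.99, 1, 4, 2, 1.01, 4, 2, 1, 4.01, 2)
find_pattern(noisy, tol = 0.05)
# [1] 1 4 2
```

Note that the returned pattern is taken from the first occurrence, so noise in the first k elements is carried into the result. Patterns interrupted by unrelated values, such as 1 4 2 5 6 7 1 4 2 9, are not handled by this approach.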
If I have a vector numbers <- c(1,1,2,4,2,2,2,2,5,4,4,4), and I use 'table(numbers)', I get
names 1 2 4 5
counts 2 5 4 1
What if I want it to include 3 as well or, more generally, all numbers in 1:max(numbers), even if they do not occur in numbers? That is, how would I generate output like this:
names 1 2 3 4 5
counts 2 5 0 4 1
If you want R to count values that do not occur in the data, create a factor and set its levels explicitly. table returns a count for every level, including levels with no observations.
table(factor(numbers, levels=1:max(numbers)))
# 1 2 3 4 5
# 2 5 0 4 1
For this particular example (positive integers), tabulate would also work:
numbers <- c(1,1,2,4,2,2,2,2,5,4,4,4)
tabulate(numbers)
# [1] 2 5 0 4 1
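tabulate returns an unnamed vector, so if the labelled layout of table is wanted, the names can be attached afterwards. A small sketch (the variable names top and counts are made up here):

```r
numbers <- c(1,1,2,4,2,2,2,2,5,4,4,4)

# nbins fixes the number of bins; here it is the largest value observed
top <- max(numbers)
counts <- setNames(tabulate(numbers, nbins = top), seq_len(top))
counts
# 1 2 3 4 5
# 2 5 0 4 1
```

Keep in mind that tabulate only counts positive integers and silently ignores values outside 1..nbins, so the factor approach is the safer choice when zeros or negative values can occur.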
I have some data:
Length(cm) Frequency
1 5
2 2
3 3
4 5
Is there a way to expand these numbers in R without typing them out manually, so that I can work out the standard error of the mean for length? I would like to end up with a dataset like
1 1 1 1 1 2 2 3 3 3 4 4 4 4 4
which I can then work on. Thanks
You can use rep.
> l <- 1:4
> f <- c(5,2,3,5)
> rep(l,f)
[1] 1 1 1 1 1 2 2 3 3 3 4 4 4 4 4
In addition to using rep to replicate the observations you could also use the wtd.mean and wtd.var functions in the Hmisc package to compute the weighted summaries without expanding (this will be better if the expanded vector would take up a large portion of memory).
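I recommend keeping the table in a data frame (assumed here to be called data, with columns length and freq) and expanding it on the fly. This gives the standard deviation; divide it by sqrt(sum(data$freq)) to get the standard error of the mean:
sd(rep(data$length, data$freq))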
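Putting the pieces together, here is a sketch that computes the standard error of the mean both ways: by expanding with rep, and directly from the frequencies without expanding. The data frame and its column names are assumptions based on the table in the question, and the second route uses plain base R arithmetic rather than Hmisc:

```r
# Hypothetical data frame holding the frequency table from the question
data <- data.frame(length = 1:4, freq = c(5, 2, 3, 5))

# Route 1: expand the table, then apply the usual formula sd / sqrt(n)
expanded <- rep(data$length, data$freq)
sem_expanded <- sd(expanded) / sqrt(length(expanded))

# Route 2: the same quantity from the frequencies, without expanding
n <- sum(data$freq)                                   # number of observations
m <- sum(data$length * data$freq) / n                 # weighted mean
v <- sum(data$freq * (data$length - m)^2) / (n - 1)   # sample variance
sem_weighted <- sqrt(v / n)

all.equal(sem_expanded, sem_weighted)
```

The second route never allocates the expanded vector, which matters when the frequencies are large.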