R - Occurrence by variable - r

I have a dataset called restrictions that records whether people can perform certain actions (eat with a fork, get out of bed...).
Each number represents the level of difficulty with which an individual can do an action (1: No difficulty, 2: Some difficulties, 3: High difficulties, 4: Cannot do the action at all).
I am mostly interested in level 4.
The dataset looks like this (with many more variables):
> head(restrictions)
RATOI_I RAHAB_I RANOU_I RAELI_I RAACH_I RAREP_I RAMEN_I RAADM_I RAMED_I RADPI_I RADPE_I RABUS_I
1 4 4 1 1 4 4 4 4 1 1 4 4
2 4 3 3 1 4 4 4 4 4 2 4 4
I would like to know how many people are level 4 in RATOI_I (I can do that) and, among those people, how many are also level 4 in RAHAB_I and in each of the other variables.
I looked at the function sapply() but I am completely lost: I do not know how to use it or which function to pass to it.
Or should I use the group_by() function instead?
Thanks in advance!

You can use apply with sum on restrictions==4 to count the values equal to 4 in each column.
apply(restrictions==4, 2, sum)
#colSums(restrictions==4) #Alternative
#RATOI_I RAHAB_I RANOU_I RAELI_I RAACH_I RAREP_I RAMEN_I RAADM_I RAMED_I RADPI_I RADPE_I RABUS_I
# 2 1 0 0 2 2 2 2 1 0 2 2
Or only for the rows where restrictions$RATOI_I==4 (thanks to #Daniel-o for pointing this out). Note the comma after the condition: it subsets rows and keeps all columns.
apply(restrictions[restrictions$RATOI_I==4,]==4, 2, sum)
#colSums(restrictions[restrictions$RATOI_I==4,]==4)
#RATOI_I RAHAB_I RANOU_I RAELI_I RAACH_I RAREP_I RAMEN_I RAADM_I RAMED_I RADPI_I RADPE_I RABUS_I
# 2 1 0 0 2 2 2 2 1 0 2 2
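The difference between the overall and the conditional count only shows up when some rows are not level 4 in RATOI_I. Here is a minimal sketch, assuming a cut-down version of the data with four of the columns and a hypothetical third row (not in the original question) whose RATOI_I is 2:

```r
# First two rows follow the question; the third row is hypothetical.
restrictions <- data.frame(
  RATOI_I = c(4, 4, 2),
  RAHAB_I = c(4, 3, 4),
  RANOU_I = c(1, 3, 4),
  RAELI_I = c(1, 1, 4)
)

# Level-4 counts per column over everyone
overall <- colSums(restrictions == 4)

# Keep only the rows that are level 4 in RATOI_I (note the comma),
# then count level 4 per column within that subset
sub <- restrictions[restrictions$RATOI_I == 4, , drop = FALSE]
conditional <- colSums(sub == 4)

# The same conditional count written with sapply, one column at a time
conditional_sapply <- sapply(sub, function(col) sum(col == 4))

overall
conditional
```

In this sketch RAHAB_I counts 2 overall but only 1 among the people who are level 4 in RATOI_I, because the hypothetical third row is excluded from the subset.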

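We can also do this in base R by recoding the values to 0/1 and summing (note that this overwrites df, so work on a copy of the data):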
df[df<4]<-0
df[df==4]<-1
colSums(df)
#RATOI_I RAHAB_I RANOU_I RAELI_I RAACH_I RAREP_I RAMEN_I RAADM_I RAMED_I RADPI_I RADPE_I RABUS_I
# 2 1 0 0 2 2 2 2 1 0 2 2

Related

runner::streak_run shows unexpected result when k remains unchanged

I'm using runner::streak_run to count sequences of 0 and 1 in a column called "inactive_indicator".
The column is: 0,0,0,1,1,1,0,1,1,0,0,0,0,0,0,0,0,1,1,1,1
For runner::streak_run(inactive_indicator)
I get the following:
1,2,3,1,2,3,1,1,2,1,2,3,4,5,5,5,5,1,2,3,4
Why is it stuck on 5 when it should go up to 8?
The documentation describes k as: running window size. By default window size equals length(x). Allow varying window size specified by vector of length(x)
As I understand it, the default should be enough.
The problem is resolved and I get the expected results when running:
runner::streak_run(inactive_indicator, k=length(inactive_indicator))
Why doesn't it work in the first place?
This can be solved with rle from base R
sequence(rle(inactive_indicator)$lengths)
#[1] 1 2 3 1 2 3 1 1 2 1 2 3 4 5 6 7 8 1 2 3 4
Checked with runner
runner::streak_run(inactive_indicator)
#[1] 1 2 3 1 2 3 1 1 2 1 2 3 4 5 6 7 8 1 2 3 4
It is possible that the column contains some leading/trailing spaces and is therefore not numeric. In that case, use trimws
runner::streak_run(trimws(inactive_indicator))
data
inactive_indicator <- c(0,0,0,1,1,1,0,1,1,0,0,0,0,0,0,0,0,1,1,1,1)
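To see why the rle approach gives the streak counter, it can help to look at the intermediate run lengths. A small sketch using only base R and the data above (the variable names runs and streak are just for illustration):

```r
inactive_indicator <- c(0,0,0,1,1,1,0,1,1,0,0,0,0,0,0,0,0,1,1,1,1)

# rle() collapses the vector into runs of identical values
runs <- rle(inactive_indicator)
runs$lengths  # length of each run: 3 3 1 2 8 4
runs$values   # value of each run:  0 1 0 1 0 1

# sequence() expands each run length n into 1, 2, ..., n,
# which is exactly a counter that restarts at every change of value
streak <- sequence(runs$lengths)

# The longest streak is simply the largest run length
longest <- max(runs$lengths)
```

Because this does not use a window at all, the counter cannot be capped by a window size.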

How to BiCluster with constant values in columns - in R

My Problem in general:
I have a data frame in which I would like to find all bi-clusters with constant values in columns.
For example, the initial data frame:
> df
v1 v2 v3
1 0 2 1
2 1 3 2
3 2 4 3
4 3 3 4
5 4 2 3
6 5 2 4
7 2 2 3
8 3 1 2
And, for example, I would like to find a cluster like this:
> cluster1
v1 v3
1 2 3
2 2 3
I tried the biclust package and tested several of its functions, but the result was never what I want to achieve.
I figured out that I might be able to use the BCPlaid function with fit.model = y ~ m, but this also seems to produce different results.
Is there a way to achieve this task efficiently?

I am trying to take a vector of numbers 5:0 and repeat it 3 times, every other time reversing its order

I'd think this would be simple using the rev() and seq() functions, but am struggling to get the reverse order part correct.
I'm trying to get 5432101234543210 from 5:0.
Not too hard to set up as a function...
try_it <- function(x) {
  # x[-1] drops the first element so the turning points are not duplicated
  c(rev(x), x[-1], rev(x)[-1])
}
try_it(0:5)
# [1] 5 4 3 2 1 0 1 2 3 4 5 4 3 2 1 0
Edit
Extend the function to allow a variable number of repeats
try_it <- function(x, reps) {
  # after the first pass, each up/down pair is repeated (reps - 1) / 2 times
  c(rev(x), rep(c(x[-1], rev(x)[-1]), (reps - 1) / 2))
}
try_it(0:5, 5)
# [1] 5 4 3 2 1 0 1 2 3 4 5 4 3 2 1 0 1 2 3 4 5 4 3 2 1 0
Note: I've not worked hard to generalise this extension; it will not return the correct length for an even number of repetitions. I'm sure you could modify it to suit your requirements.
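One way to handle both odd and even repeat counts is to append one pass at a time and flip the direction on every other pass. This is only a sketch: zigzag is a hypothetical name, and unlike try_it it takes the vector in its starting order (5:0 rather than 0:5).

```r
# Hypothetical helper: x is the first pass, reps is the total number of passes.
zigzag <- function(x, reps) {
  out <- x
  for (i in seq_len(reps - 1)) {
    # odd-numbered extra passes run in reverse, even-numbered ones forwards
    pass <- if (i %% 2 == 1) rev(x) else x
    # drop the first element so the turning point is not repeated
    out <- c(out, pass[-1])
  }
  out
}

zigzag(5:0, 3)
# [1] 5 4 3 2 1 0 1 2 3 4 5 4 3 2 1 0
zigzag(5:0, 2)
# [1] 5 4 3 2 1 0 1 2 3 4 5
```

A loop keeps the sketch easy to read; for long vectors or many repeats, preallocating the result would be more efficient.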

R table function

If I have a vector numbers <- c(1,1,2,4,2,2,2,2,5,4,4,4), and I use 'table(numbers)', I get
names 1 2 4 5
counts 2 5 4 1
What if I want it to include 3 as well or, more generally, all numbers from 1:max(numbers) even if they do not occur in numbers? That is, how would I generate output like this:
names 1 2 3 4 5
counts 2 5 0 4 1
If you want R to count values that aren't there, you should create a factor and explicitly set the levels. table will return a count for each level, including a zero for any level that never occurs.
table(factor(numbers, levels=1:max(numbers)))
# 1 2 3 4 5
# 2 5 0 4 1
For this particular example (positive integers), tabulate would also work:
numbers <- c(1,1,2,4,2,2,2,2,5,4,4,4)
tabulate(numbers)
# [1] 2 5 0 4 1
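The "positive integers" caveat matters: tabulate only counts values from 1 upwards and ignores anything else, whereas the factor approach counts whatever levels you list. A short sketch with hypothetical data that includes zeros:

```r
# Hypothetical data (not from the question) containing zeros
numbers <- c(0, 1, 1, 2, 4, 2, 0)

# Explicit levels starting at 0: every level gets a count, even if it is zero
counts_table <- table(factor(numbers, levels = 0:max(numbers)))
counts_table
# 0 1 2 3 4
# 2 2 2 0 1

# tabulate() only has bins 1..max, so the two zeros are not counted
counts_tab <- tabulate(numbers)
counts_tab
# [1] 2 2 0 1
```

The table result also keeps the values as names, which tabulate does not.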

Generating interaction terms manually

I am trying to estimate a fixed effects panel with individual-specific time trends using plm and am running up against the same problem as other people. I'm more than willing to use the workaround described in the linked CrossValidated question but cannot figure out how to generate the necessary data frame columns.
That is, I have a data frame of the form
data.frame(date=rep(1:5,times=3),id=rep(1:3,each=5))
and would like to add to this data frame a column for each id that is named date_idX, has the same value as date for all observations where id==X and zero otherwise.
Any more elegant solutions to my problem would of course also be appreciated.
> dfrm <- data.frame(date=rep(1:5,times=3),id=rep(1:3,each=5))
>
> X <-3; dfrm$time_idX <- dfrm$date*(dfrm$id==X)
> dfrm
date id time_idX
1 1 1 0
2 2 1 0
3 3 1 0
4 4 1 0
5 5 1 0
6 1 2 0
7 2 2 0
8 3 2 0
9 4 2 0
10 5 2 0
11 1 3 1
12 2 3 2
13 3 3 3
14 4 3 4
15 5 3 5
I suspect that what you really wanted was to do this in a regression formula. For that, the I() function is needed. This is pseudo-code:
regfun( form = yield ~ I(date*(id==X) ), data=dfrm)
I'm not guaranteeing this will be a proper solution to the problem of using plm, but it is a method that should work with ordinary regression. You should edit your question to include a proper test case.
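To create one column per id rather than for a single X, the same multiplication can be put in a loop. This is a minimal sketch using the date_idX naming from the question; whether these columns then behave as wanted inside plm is not tested here.

```r
dfrm <- data.frame(date = rep(1:5, times = 3), id = rep(1:3, each = 5))

# One column per id: equal to date where id == X, zero elsewhere
for (X in unique(dfrm$id)) {
  dfrm[[paste0("date_id", X)]] <- dfrm$date * (dfrm$id == X)
}

names(dfrm)
# [1] "date"     "id"       "date_id1" "date_id2" "date_id3"
```

Using dfrm[[name]] <- ... lets the column name be built from X, which the dfrm$time_idX form above cannot do.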
