I am trying to estimate a fixed effects panel with individual-specific time trends using plm and am running up against the same problem as other people. I'm more than willing to use the workaround described in the linked CrossValidated question but cannot figure out how to generate the necessary data frame columns.
That is, I have a data frame of the form
data.frame(date=rep(1:5,times=3),id=rep(1:3,each=5))
and would like to add to this data frame a column for each id that is named date_idX, has the same value as date for all observations where id==X and zero otherwise.
Any more elegant solutions to my problem would of course also be appreciated.
> dfrm <- data.frame(date=rep(1:5,times=3),id=rep(1:3,each=5))
>
> X <-3; dfrm$time_idX <- dfrm$date*(dfrm$id==X)
> dfrm
date id time_idX
1 1 1 0
2 2 1 0
3 3 1 0
4 4 1 0
5 5 1 0
6 1 2 0
7 2 2 0
8 3 2 0
9 4 2 0
10 5 2 0
11 1 3 1
12 2 3 2
13 3 3 3
14 4 3 4
15 5 3 5
I suspect that what you really wanted was to do this inside a regression formula. For that, the I() function is needed. This is pseudo-code:
regfun( form = yield ~ I(date*(id==X) ), data=dfrm)
I'm not guaranteeing this will be a proper solution to the problem of using plm, but it is a method that should work with ordinary regression. You should edit your question to include a proper test case.
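The question asked for one column per id, not just a single one. A minimal sketch of that, assuming the column names should follow the date_idX pattern from the question and that a plain loop over the unique ids is acceptable:

```r
dfrm <- data.frame(date = rep(1:5, times = 3), id = rep(1:3, each = 5))

# One column per id: equal to date where id == X, zero elsewhere
for (X in unique(dfrm$id)) {
  dfrm[[paste0("date_id", X)]] <- dfrm$date * (dfrm$id == X)
}

head(dfrm)
```

Whether plm accepts these columns as individual-specific trends is not verified here; the sketch only builds the data frame.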
I have a dataset called restrictions that records whether people can perform certain actions (eating with a fork, getting out of bed, ...).
Each number gives the level of difficulty with which an individual can perform an action (1: No difficulty, 2: Some difficulties, 3: High difficulties, 4: Cannot do the action at all)
I am mostly interested in level 4.
The dataset looks like this (with many more variables)
> head(restrictions)
RATOI_I RAHAB_I RANOU_I RAELI_I RAACH_I RAREP_I RAMEN_I RAADM_I RAMED_I RADPI_I RADPE_I RABUS_I
1 4 4 1 1 4 4 4 4 1 1 4 4
2 4 3 3 1 4 4 4 4 4 2 4 4
I would like to know how many people are level 4 in RATOI_I (I can do that) and, among those people, how many are also level 4 in RAHAB_I and in each of the other variables.
I looked at the function sapply() but I am completely lost, I do not know how to use it and with which function.
Or must I maybe use the group_by() function?
Thanks in advance!
You can use apply with sum on restrictions==4 to count the number of values equal to 4 in each column.
apply(restrictions==4, 2, sum)
#colSums(restrictions==4) #Alternative
#RATOI_I RAHAB_I RANOU_I RAELI_I RAACH_I RAREP_I RAMEN_I RAADM_I RAMED_I RADPI_I RADPE_I RABUS_I
# 2 1 0 0 2 2 2 2 1 0 2 2
Or count only among the rows where restrictions$RATOI_I==4 (thanks to @Daniel-o for pointing this out). Note the comma after the row condition, which selects rows rather than columns:
apply(restrictions[restrictions$RATOI_I==4, ]==4, 2, sum)
#colSums(restrictions[restrictions$RATOI_I==4, ]==4)
#RATOI_I RAHAB_I RANOU_I RAELI_I RAACH_I RAREP_I RAMEN_I RAADM_I RAMED_I RADPI_I RADPE_I RABUS_I
# 2 1 0 0 2 2 2 2 1 0 2 2
We can also do this with base R by recoding the values (note that this overwrites df, so work on a copy):
df[df<4]<-0
df[df==4]<-1
colSums(df)
>RATOI_I RAHAB_I RANOU_I RAELI_I RAACH_I RAREP_I RAMEN_I RAADM_I RAMED_I RADPI_I RADPE_I RABUS_I
2 1 0 0 2 2 2 2 1 0 2 2
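Both rows in the sample data have RATOI_I equal to 4, so the conditional and unconditional counts above coincide. A small sketch with a hypothetical third row (not from the question) makes the difference visible; only three of the columns are used to keep it short:

```r
# Toy data: the first two rows follow the question, the third row is made up
restrictions <- data.frame(
  RATOI_I = c(4, 4, 2),
  RAHAB_I = c(4, 3, 4),
  RANOU_I = c(1, 3, 4)
)

# Counts of level 4 per column over everyone
overall <- colSums(restrictions == 4)

# Counts of level 4 per column, restricted to people with RATOI_I == 4
# which() drops rows where RATOI_I is NA instead of returning NA rows
keep <- which(restrictions$RATOI_I == 4)
conditional <- colSums(restrictions[keep, , drop = FALSE] == 4)

overall
conditional
```

If the other columns can contain missing values, passing na.rm = TRUE to colSums would be one way to ignore them.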
I have a question regarding longitudinal study analysis and work with R.
I have the following data format:
ID Visit Behaviour Distance_to_first_visit_in_month
1 0 1 0
1 1 1 6
1 2 1 12
1 3 1 50
2 0 3 0
2 1 3 8
2 2 3 16
2 3 3 25
2 4 3 40
2 5 3 60
3 0 1 0
3 1 1 6
3 2 1 12
3 3 3 24
3 4 3 30
3 5 3 55
I need the data in the following format:
ID Visit Behaviour Distance_to_first_visit_in_month Status
1 0 1 0 0
2 0 3 0 1
3 3 3 24 1
If a person has Behaviour 1 at every visit until the end, they should simply be censored because the study has finished. If a person has Behaviour 3, I need the Distance_to_first_visit_in_month of the visit where 3 occurs for the first time, because that is where they get status 1 in the Kaplan-Meier curve.
I tried to filter for the maximal Distance_to_first_visit_in_month and take the Behaviour at that visit. When I bring the data into wide format this is easy to get. But I can't get the right Distance_to_first_visit_in_month when the person already has Behaviour 3 at the beginning, or when it changes to 3 at a later visit.
I have 300 IDs with sometimes 11 visits, so I can't prepare the data manually.
Do you have an idea?
Thank you in advance.
Best Christina
As you don't explain how to aggregate your data to the second dataset, I can only show you how to get the IDs that match your conditions and how to implement the status variable. See this example:
library(dplyr)
# get IDs whose Behaviour is 1 at every visit
id_list1 <- lapply(df %>% split(.$ID),function(x){
  if(all(x$Behaviour==1)){
    return(unique(x$ID))
  }
}) %>%
  unlist()
# get id's with 3 as first value
id_list3 <- lapply(df %>% split(.$ID),function(x){
if(x[x$Visit==0,"Behaviour"]==3){
return(unique(x$ID))
}
}) %>%
unlist()
df %>%
mutate(Status = ifelse(ID %in% id_list3,1,0)) %>%
mutate(new_dist = ifelse(!ID %in% id_list3,Distance_to_first_visit_in_month,NA))
Please note that you'll get named vectors in id_list1 and id_list3. There are no duplicates, just the name of the element matching the element.
And by "at the beginning", do you mean Visit number 0? If not, you'll have to adjust x$Visit==0.
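For the one-row-per-ID table itself, here is a minimal base R sketch. It assumes that the event time is the first visit with Behaviour 3 and that people who never reach 3 are censored at their last observed visit. That last assumption differs from the row shown for ID 1 in the question (Visit 0), so adjust it if another convention is intended.

```r
df <- data.frame(
  ID        = c(rep(1, 4), rep(2, 6), rep(3, 6)),
  Visit     = c(0:3, 0:5, 0:5),
  Behaviour = c(1, 1, 1, 1,  3, 3, 3, 3, 3, 3,  1, 1, 1, 3, 3, 3),
  Distance_to_first_visit_in_month =
    c(0, 6, 12, 50,  0, 8, 16, 25, 40, 60,  0, 6, 12, 24, 30, 55)
)

# For each ID keep a single row: first visit with Behaviour 3 (Status 1),
# otherwise the last visit (Status 0, censored)
rows <- lapply(split(df, df$ID), function(x) {
  x <- x[order(x$Visit), ]
  hit <- which(x$Behaviour == 3)
  if (length(hit) > 0) {
    cbind(x[hit[1], ], Status = 1L)
  } else {
    cbind(x[nrow(x), ], Status = 0L)
  }
})
surv <- do.call(rbind, rows)
rownames(surv) <- NULL
surv
```

The resulting Distance_to_first_visit_in_month and Status columns are the time and event inputs that a Kaplan-Meier fit would need.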
My problem in general:
I have a data frame in which I would like to find all bi-clusters with constant values in columns.
For example, the initial data frame:
> df
v1 v2 v3
1 0 2 1
2 1 3 2
3 2 4 3
4 3 3 4
5 4 2 3
6 5 2 4
7 2 2 3
8 3 1 2
And, for example, I would like to find a cluster like this:
> cluster1
v1 v3
1 2 3
2 2 3
I tried the biclust package and tested several of its functions, but the result was never what I want to achieve.
I figured out that I might be able to use the BCPlaid function with fit.model = y ~ m, but that also seems to produce different results.
Is there an efficient way to achieve this?
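One way to read the task is: for a chosen set of columns, find the groups of rows that share identical values in all of those columns. A brute-force sketch of that reading in base R follows. The helper name find_constant_groups is hypothetical and this is not the biclust API. Looping over all column subsets grows exponentially with the number of columns, so it only suits small data frames.

```r
df <- data.frame(
  v1 = c(0, 1, 2, 3, 4, 5, 2, 3),
  v2 = c(2, 3, 4, 3, 2, 2, 2, 1),
  v3 = c(1, 2, 3, 4, 3, 4, 3, 2)
)

# Hypothetical helper: row indices grouped by identical values in `cols`,
# keeping only groups with at least `min_rows` rows
find_constant_groups <- function(data, cols, min_rows = 2) {
  key <- do.call(paste, c(unname(as.list(data[cols])), sep = "_"))
  groups <- split(seq_len(nrow(data)), key)
  groups[lengths(groups) >= min_rows]
}

# The cluster from the question: rows sharing the same (v1, v3) values
groups <- find_constant_groups(df, c("v1", "v3"))
cluster1 <- df[groups[[1]], c("v1", "v3")]

# The same search over every pair of columns
pairs <- combn(names(df), 2, simplify = FALSE)
all_groups <- lapply(pairs, function(cols) find_constant_groups(df, cols))
names(all_groups) <- vapply(pairs, paste, character(1), collapse = "+")
```

Here cluster1 holds the v1 and v3 values of the original rows 3 and 7, which matches the example cluster apart from the row names.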
If I have a vector numbers <- c(1,1,2,4,2,2,2,2,5,4,4,4), and I use 'table(numbers)', I get
names 1 2 4 5
counts 2 5 4 1
What if I want it to include 3 as well or, more generally, all numbers from 1:max(numbers) even if they do not occur in numbers? That is, how would I generate output such as:
names 1 2 3 4 5
counts 2 5 0 4 1
If you want R to count values that aren't there, you should create a factor and explicitly set the levels. table will return a count for each level, including zero for levels that never occur.
table(factor(numbers, levels=1:max(numbers)))
# 1 2 3 4 5
# 2 5 0 4 1
For this particular example (positive integers), tabulate would also work:
numbers <- c(1,1,2,4,2,2,2,2,5,4,4,4)
tabulate(numbers)
# [1] 2 5 0 4 1
I have some data:
Length(cm) Frequency
1 5
2 2
3 3
4 5
Is there a way to expand these numbers in R without typing them out manually, so that I can work out the standard error of the mean for length? I would like a dataset like:
1 1 1 1 1 2 2 3 3 3 4 4 4 4 4
which I can then work on? Thanks
You can use rep.
> l <- 1:4
> f <- c(5,2,3,5)
> rep(l,f)
[1] 1 1 1 1 1 2 2 3 3 3 4 4 4 4 4
In addition to using rep to replicate the observations you could also use the wtd.mean and wtd.var functions in the Hmisc package to compute the weighted summaries without expanding (this will be better if the expanded vector would take up a large portion of memory).
I recommend using a dataframe:
sd(rep(data$length, data$freq))
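Since the question asks for the standard error of the mean and not the standard deviation, here is a short sketch of both routes: expanding with rep, and computing directly from the frequencies without expanding. The column names length and freq are assumed, as in the line above.

```r
data <- data.frame(length = 1:4, freq = c(5, 2, 3, 5))

# Route 1: expand the observations, then SE = sd / sqrt(n)
x <- rep(data$length, data$freq)
se_expanded <- sd(x) / sqrt(length(x))

# Route 2: same quantity from the frequency table, no expansion
n <- sum(data$freq)
m <- sum(data$length * data$freq) / n                 # weighted mean
v <- sum(data$freq * (data$length - m)^2) / (n - 1)   # sample variance
se_weighted <- sqrt(v / n)

all.equal(se_expanded, se_weighted)
```

The second route avoids building the expanded vector, which matters when the frequencies are large.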