R: Resampling Control Group in Propensity Scoring

I am currently working on a propensity score matching model.
An example of my code is provided below:
library(MatchIt)
head(data)
match.it.data <- matchit(TARGET ~ PROBABILITY, data = data, method = "nearest", ratio = 1)
I was wondering if anyone knows how to allow my treated group observations to be matched with control group observations more than once.
In other words, a control group observation would be able to be matched to my treated group more than once. So rather than strict one-to-one matching, each treated unit would get the closest possible control, potentially letting the treated group resample the control group, somewhat like a bootstrap simulation with multiple iterations.
Thank you for your help.

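Simply add replace = TRUE to the call to matchit(). Each treated unit is then still matched to its nearest control, but a control unit can serve as the match for more than one treated unit.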
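A minimal sketch of what replace = TRUE changes, assuming MatchIt 4.x and its bundled lalonde data in place of the asker's TARGET/PROBABILITY data. Counting the entries of match.matrix shows that some controls are matched to several treated units.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# 1:1 nearest-neighbour matching on a logistic-regression propensity score,
# allowing each control to be reused for several treated units
m_rep <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "nearest", replace = TRUE)

# match.matrix has one row per treated unit and one column per match;
# tabulating it counts how often each control was chosen
times_used <- table(m_rep$match.matrix)

length(times_used)  # number of distinct controls used
max(times_used)     # most times a single control was reused
```

Without replace = TRUE, every entry of times_used would be 1.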

Related

R: Nearest neighbour matching with MatchIt

I would like to use nearest neighbour matching with MatchIt in R.
So far I have used the following code:
Matching <- matchit(Treatment ~ Size + Age + Expenses, data = data, method = "nearest", distance = "glm", replace = TRUE)
I have two questions:
Question 1.)
When I run this code and then print Matching, I get a summary.
One line then says
A matchit object
- method: 1:1 nearest neighbor matching with replacement
I want the same control observation to be matched multiple times if needed.
Is the code above doing that?
I am confused because it says "1:1 nearest neighbor matching with replacement": because of the "1:1" part, I don't know whether each control observation is now used at most once. However, since I use replace=TRUE in the code, I thought that this does exactly what I want, so that one observation in the control group can be matched several times.
Could someone explain to me if my understanding is correct?
Question 2.)
After having run
Matching <- matchit(Treatment ~ Size + Age + Expenses, data = data, method = "nearest", distance = "glm", replace = TRUE)
I would like to estimate the average treatment effect.
I use the following document as a reference on how to estimate it:
https://cran.r-project.org/web/packages/MatchIt/vignettes/estimating-effects.html#after-pair-matching-with-replacement
However, I would like to use clustered standard errors by subclass and id.
Therefore, I need to first write the code:
Matching_gm <- get_matches(Matching)
When I look at the weights of Matching_gm they are always 1. However, when I run summary(Matching$weights) there are many weights that are different from 1.
Why do the weights change when I use get_matches()? As far as I know, this should not be the case.
It is called 1:1 matching because each treated unit gets one match, but it is possible that control units are reused as matches for multiple treated units. If you set ratio to something other than 1, you would be doing k:1 matching, where each treated unit gets k matches, composed of controls that may be reused for other treated units.
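To see the 1:1 versus k:1 distinction, here is a sketch on assumed data (MatchIt 4.x and its bundled lalonde dataset, not the asker's variables). With ratio = 2 and replace = TRUE, match.matrix gains a second column, while controls may still be reused across treated units.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# 2:1 matching with replacement: each treated unit receives two controls,
# and any control may also be chosen for other treated units
m_2to1 <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, method = "nearest", ratio = 2, replace = TRUE)

dim(m_2to1$match.matrix)  # one row per treated unit, two columns of matches

# controls can still appear in the rows of several treated units
max(table(m_2to1$match.matrix))
```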
get_matches() produces a row for each unit for each time it is matched. If a control unit is matched twice (i.e., to two different treated units), it will have two rows each with a weight of 1 in the get_matches() output, but it will have a weight of 2 in the matchit() output (though this weight may be scaled to be different from 2). If you use match.data() instead of get_matches(), you will see that each unit receives only one row and the weights for each control unit are the same as in the matchit() output.
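The row and weight bookkeeping can be checked directly. This sketch assumes MatchIt 4.x and the bundled lalonde data; the unit column is a hypothetical helper added only so that units can be tracked across the two outputs.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")
lalonde$unit <- rownames(lalonde)  # hypothetical helper id column

m <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
             data = lalonde, method = "nearest", replace = TRUE)

gm <- get_matches(m)  # one row per unit per pair; reused controls repeat
md <- match.data(m)   # one row per matched unit, carrying the matchit() weights

nrow(gm)           # two rows per matched pair
nrow(md)           # treated units plus distinct controls
table(gm$weights)  # all 1 for 1:1 matching

# a control's match.data() weight is proportional to how often it appears in gm
ctrl_counts <- table(gm$unit[gm$treat == 0])
ctrl_w <- setNames(md$weights[md$treat == 0], md$unit[md$treat == 0])
ratio <- ctrl_w[names(ctrl_counts)] / as.numeric(ctrl_counts)
range(ratio)       # a single constant: the scaling factor
```

The constant in the last line is the scaling that can make a twice-matched control's weight differ from 2.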

How to use the "how" function for an unbalanced repeated design

I have a set of control and treated plots which have been sampled over several years. I run the prc function in the vegan package and want to perform a permutation test to check whether control and treated plots differ significantly over the years. As my data are unbalanced, I cannot use the strata function. My code looks like this:
library(vegan)
year=as.factor(c(rep(1995,8),rep(1999,8),rep(2001,8),rep(2013,4),rep(1995,4),
rep(1999,4),rep(2001,4),rep(2013,4)))
treatment=as.factor(c(rep("control",28),rep("treated",16)))
I've written this, but I'm sure that it is wrong because the treatment is missing here:
h1 <- how(within = Within(type = "series", mirror = F),
blocks = year, nperm = 999
)
Any suggestions are greatly appreciated.
Under the null hypothesis, samples from the control or treated groups are exchangeable and hence you don't want them in the permutation design; you really want to permute them to generate the permutation-based null distribution for the test statistic.
The permutation design is there to indicate what isn't exchangeable.
You haven't explained why you want samples within the blocks to be permuted in series; why are samples within years also time series? If they're not, you don't want this.
You only need to worry about imbalance if you want to permute the strata. Whilst using blocks is similar in some respects to using strata, blocks are never permuted, so imbalance does not matter for them; likewise, unbalanced strata are fine as long as you won't be permuting them.
If you want to permute the years as groups of samples, then you'll need strata and you'll need balance at the year level, which you don't have.
What you have defined with your call to how() is:
groups samples by year and as such samples will never be swapped between years, and
samples within the levels of year will be permuted in series, keeping their temporal order intact after applying cyclic shift permutations.
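A small check of what h1 does, using the permute package (which supplies the how() designs that vegan uses) and the year factor from the question. h2 is an assumed alternative for the case where samples within a year are not a time series; neither design is claimed to be the right test for the asker's prc model.

```r
library(permute)

# year factor from the question: 44 samples, years used as blocks
year <- as.factor(c(rep(1995, 8), rep(1999, 8), rep(2001, 8), rep(2013, 4),
                    rep(1995, 4), rep(1999, 4), rep(2001, 4), rep(2013, 4)))

# the question's design: cyclic-shift ("series") permutation within each year
h1 <- how(within = Within(type = "series", mirror = FALSE),
          blocks = year, nperm = 999)
# assumed alternative: free permutation within each year
h2 <- how(blocks = year, nperm = 999)

set.seed(42)
p1 <- shuffle(length(year), control = h1)
p2 <- shuffle(length(year), control = h2)

# blocks are never permuted: every position keeps a sample from the same year
all(year[p1] == year)
all(year[p2] == year)

# under h1, the indices inside a block are a cyclic shift of that block's samples
is_cyclic_shift <- function(x, ref) {
  n <- length(ref)
  any(vapply(0:(n - 1),
             function(k) all(x == ref[(seq_len(n) + k - 1) %% n + 1]),
             logical(1)))
}
in_1995 <- which(year == "1995")
is_cyclic_shift(p1[in_1995], in_1995)
```

Note that the 1995 block contains both control (samples 1 to 8) and treated (samples 29 to 32) plots, so the series shift runs over all twelve samples in that year as a single sequence.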
If that's not what you want to do, you need to explain in words what you want to do. By "do" I mean what is it you want to test? What is your model in vegan?

Reducing "treatment" sample size through MatchIt (or another package) to increase sample similarity

I am trying to match two samples on several covariates using MatchIt, but I am having difficulty creating samples that are similar enough. Both my samples are plenty large (~1000 in the control group, ~5000 in the comparison group).
I want to get a matched sample with participants as closely matched as possible and I am alright with losing sample size in the control group. Right now, MatchIt only returns two groups of 1000, whereas I want two groups that are very closely matched and would be fine with smaller groups (e.g., 500 instead of 1000).
Is there a way to do this through either MatchIt or another package? I would rather avoid using random sampling and then match if possible because I want as close a match between groups as possible.
Apologies for not having a reproducible example; I am still pretty new to using R and couldn't figure out how to make a sample of this issue.
Below is the code I have for matching the two groups.
library(MatchIt)
library(car)  # recode() below uses car's '1 = 1; 2 = 0' syntax
data <- na.omit(data)
data$Group<- as.numeric(data$Group)
data$Group<- recode(data$Group, '1 = 1; 2 = 0')
m.out <- matchit(Group ~ Age + YearsEdu + Income + Gender, data = data, ratio = 1)
s.out <- summary(m.out, standardize = TRUE)
plot(s.out)
matched.data <- match.data(m.out)
MatchIt, like other similar packages, offers several matching routines that let you play around with the settings. Check out the argument method, which is set to method = 'nearest' by default. This means that, unless you specify otherwise, it will look for the best match for each of the treatment observations. In your case, you will always have 1000 matched pairs with this setting.
You can choose to set it to method = 'exact', which is much more restrictive. In the documentation you will find:
This technique matches each treated unit to all possible control units with exactly the same values on all the covariates, forming subclasses such that within each subclass all units (treatment and control) have the same covariate values.
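On the lalonde dataset (bundled with MatchIt; in MatchIt 4 and later it records ethnicity in a single race factor instead of separate black and hispan columns), you can run:
m.out <- matchit(treat ~ educ + race, data = lalonde, method = 'exact')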
summary(m.out)
As a consequence, it discards the treatment observations that could not be matched. Have a look at the other options for method; maybe you will find one you like better.
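A sketch comparing how many units each method keeps, assuming MatchIt 4.x and its bundled lalonde data, where ethnicity is coded in a single race factor. The n_kept() helper is hypothetical and simply counts units with a positive matching weight.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

f <- treat ~ educ + race
m_near  <- matchit(f, data = lalonde, method = "nearest")  # default: 1:1, no replacement
m_exact <- matchit(f, data = lalonde, method = "exact")

# hypothetical helper: a unit is retained if its matching weight is positive
n_kept <- function(m, group) sum(m$weights > 0 & lalonde$treat == group)

c(nearest = n_kept(m_near, 1), exact = n_kept(m_exact, 1))  # treated units kept
c(nearest = n_kept(m_near, 0), exact = n_kept(m_exact, 0))  # controls kept
```

Nearest-neighbour matching keeps every treated unit here because controls outnumber treated units, whereas exact matching drops treated units that have no control with identical covariate values and keeps every control in a matched subclass.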
That being said, be mindful not to discard too many treatment observations. If you do, you will make the treatment group look like the control group (instead of the opposite), which might lead to unwanted results.
You should look into the package designmatch, which implements a form of matching called cardinality matching that does what you want (i.e., find the largest matched set that yields desired balance). Unlike MatchIt, designmatch doesn't use a distance variable; instead, it uses optimization to solve the matching problem. You select exactly how balanced you want each covariate to be, and it will do its best to solve the problem while retaining as many matches as possible. The methodology is described in Zubizarreta, Paredes, & Rosenbaum (2014).

Model with Matched pairs and repeated measures

I will delete this if it is too loosely related to programming, but my search has turned up NULL, so I'm hoping someone can help.
I have a case/control matched-pairs design with repeated measurements, and I am looking for a model/function/package in R.
I have 2 measures at time=1 and 2 measures at time=2. I have case/control status as Group (2 levels) and the matched-pair id as match_id, and I want to estimate the effect of Group, time, and their interaction on speed, a continuous variable.
I wanted to do something like this:
(reg_id is the actual participant ID)
speed_model <- geese(speed ~ time*Group, id = c(reg_id,match_id),
data=dataforGEE, corstr="exchangeable", family=gaussian)
I want to model the autocorrelation within a person via reg_id, but also within the matched pairs via match_id.
But I get:
Error in model.frame.default(formula = speed ~ time * Group, data = dataFullGEE, :
variable lengths differ (found for '(id)')
Can geese, or GEE in general, not handle clustering on two sets of IDs? Is there a way to do this at all? I'm sure there is.
Thank you for any help you can provide.
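This is definitely a better question for Cross Validated. As for the error: c(reg_id, match_id) concatenates the two ID vectors into a single vector twice as long as the data, which is why model.frame() reports that the variable lengths differ; geese() takes one cluster ID vector. Since you have exactly 2 observations per subject, I would consider the ANCOVA model, with the data in wide format (one row per subject, the time 1 and time 2 speeds as separate columns) and clustering only on the matched pair: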
geese(speed_at_time_2 ~ speed_at_time_1*Group, id = match_id,
data=dataforGEE, corstr="exchangeable", family=gaussian)
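A minimal sketch of this ANCOVA with geepack, on simulated wide-format data (the data frame, pair structure and effect sizes are invented for illustration, not the asker's data). Each participant contributes one row holding speed_at_time_1 and speed_at_time_2, and only the matched pair is used as the cluster.

```r
library(geepack)

set.seed(1)
n_pairs <- 60
# hypothetical wide-format data: one row per participant, two rows per matched pair
dat <- data.frame(
  match_id = rep(seq_len(n_pairs), each = 2),
  Group    = factor(rep(c("Case", "Control"), times = n_pairs))
)
pair_effect <- rep(rnorm(n_pairs), each = 2)  # shared within a matched pair
dat$speed_at_time_1 <- 10 + pair_effect + rnorm(2 * n_pairs)
dat$speed_at_time_2 <- 2 + 0.8 * dat$speed_at_time_1 +
  0.5 * (dat$Group == "Case") + pair_effect + rnorm(2 * n_pairs)

# geese expects the rows of each cluster to be contiguous, so sort by cluster id
dat <- dat[order(dat$match_id), ]

fit <- geese(speed_at_time_2 ~ speed_at_time_1 * Group, id = match_id,
             data = dat, corstr = "exchangeable", family = gaussian)
summary(fit)
```

The baseline speed enters as a covariate rather than as a second repeated measure, so the within-person correlation no longer has to be modelled and a single cluster variable suffices.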
Regarding the use of ANCOVA, you might find this reference useful.

Is there an add-on that allows me to create groups that are matched according to one or more criteria?

I want to compare two groups of subjects (0, 1) but want to make sure that the differences I observe aren't due to a third variable, which differs significantly between the two groups. Group 1 is much smaller than group 0, so I guess it would be optimal to select a subset of subjects from group 0 that best matches group 1 on the third variable. In a perfect world, the add-on would select a subset from both groups that both maximizes the number of subjects and matches the groups on the third variable. Is there any add-on available that helps me do that? If not, you might know an efficient way to achieve the same with some clever coding. Of course, it would be even better if I could match the groups on a similarity measure based on several variables.
Take a look at the sampling package. I believe it is the most full featured for doing these types of things. Anyway, here is a worked example:
library(sampling)
set.seed(12345)
# Set number of subjects
n = 1000
# Generate data
group = factor(sample(c(0,0,1), n, replace=TRUE))
x = 0.2 * as.numeric(group) + rnorm(n)
data = data.frame(group, x)
# Demonstrate the significant group effect
summary(lm(x ~ group, data=data))
# Let's say we want a sample with 50 subjects in each group
pik = inclusionprobastrata(as.numeric(data$group), c(50, 50))
picks = balancedstratification(cbind(data$x), as.numeric(data$group), pik)
# Pick out our balanced sample
new.data = data[picks==1, ]
# Demonstrate that the group effect is gone
summary(lm(x ~ group, data=new.data))
