I am estimating what's often called the "event-study" specification of a difference-in-differences model in R. Basically, we observe treated and control units over time and estimate a two-way fixed effects model with parameters for the "effect" of being treated in each time period (omitting one period, usually the one before treatment, as the reference period). I am struggling with how to compactly specify this model with R formulas.
For example, here is the model...
library(lfe)
library(tidyverse)
library(dummies)
N <- 100
df <- tibble(
  id = rep(1:N, 5),
  treat = id >= ceiling(N / 2),
  time = rep(1:5, each = N),
  x = rnorm(5 * N)
)
# produce an outcome variable
df <- df %>% mutate(
  y = x - treat * (time == 5) + time + rnorm(5 * N)
)
head(df)
# easily recover the parameters with the true model...
summary(felm(
  y ~ x + I(treat * (time == 5)) | id + time, data = df
))
Now, I want to do an event-study design using period 4 as the baseline, because treatment happens in period 5. We expect coefficients near zero in the pre-periods (1-3, with period 4 omitted as the reference), and a negative treatment effect for the treated in the treatment period (time == 5).
df$timefac <- factor(df$time, levels = c(4, 1, 2, 3, 5))
summary(felm(
  y ~ x + treat * timefac | id + time, data = df
))
That looks good, but produces lots of NAs because several of the coefficients are absorbed by the unit and time effects. Ideally, I can specify the model without those coefficients...
# create dummy for each time period for treated units
tdum <- dummy(df$time)
df <- bind_cols(df, as.data.frame(tdum))
df <- df %>% mutate_at(vars(time1:time5), ~ . * treat)
# estimate model, manually omitting one dummy
summary(felm(
  y ~ x + time1 + time2 + time3 + time5 | id + time, data = df
))
Now, the question is how to specify this model in a compact way. I thought the following would work, but it produces very unpredictable output...
summary(felm(
  y ~ x + treat:timefac | id + time, data = df
))
With the above, R does not use period 4 as the reference period and sometimes chooses to include the interaction with untreated rather than treated. The output is...
Coefficients:
Estimate Std. Error t value Pr(>|t|)
x 0.97198 0.05113 19.009 < 2e-16 ***
treatFALSE:timefac4 NA NA NA NA
treatTRUE:timefac4 -0.19607 0.28410 -0.690 0.49051
treatFALSE:timefac1 NA NA NA NA
treatTRUE:timefac1 -0.07690 0.28572 -0.269 0.78796
treatFALSE:timefac2 NA NA NA NA
treatTRUE:timefac2 NA NA NA NA
treatFALSE:timefac3 0.15525 0.28482 0.545 0.58601
treatTRUE:timefac3 NA NA NA NA
treatFALSE:timefac5 0.97340 0.28420 3.425 0.00068 ***
treatTRUE:timefac5 NA NA NA NA
Is there a way to specify this model without having to manually produce dummies and interactions for treated units for every time period?
If you know Stata, I'm essentially looking for something as easy as:
areg y x i.treat##ib4.time, absorb(id)
(Note how simple it is to tell Stata to treat the variable as categorical (the i. prefix) without making dummies for time, and to indicate that period 4 should be the base period (the b4 part of ib4.).)
The package fixest performs fixed-effects estimations (like lfe) and includes utilities to deal with interactions. The function i (or interact) is what you're looking for.
Here is an example where the treatment is interacted with the period, with period 5 as the dropped reference:
library(fixest)
data(base_did)
est_did = feols(y ~ x1 + i(treat, period, 5) | id + period, base_did)
est_did
#> OLS estimation, Dep. Var.: y
#> Observations: 1,080
#> Fixed-effects: id: 108, period: 10
#> Standard-errors: Clustered (id)
#> Estimate Std. Error t value Pr(>|t|)
#> x1 0.973490 0.045678 21.312000 < 2.2e-16 ***
#> treat:period::1 -1.403000 1.110300 -1.263700 0.206646
#> treat:period::2 -1.247500 1.093100 -1.141200 0.254068
#> treat:period::3 -0.273206 1.106900 -0.246813 0.805106
#> treat:period::4 -1.795700 1.088000 -1.650500 0.099166 .
#> treat:period::6 0.784452 1.028400 0.762798 0.445773
#> treat:period::7 3.598900 1.101600 3.267100 0.001125 **
#> treat:period::8 3.811800 1.247500 3.055500 0.002309 **
#> treat:period::9 4.731400 1.097100 4.312600 1.8e-05 ***
#> treat:period::10 6.606200 1.120500 5.895800 5.17e-09 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: -2,984.58 Adj. R2: 0.48783
The nice thing is that you can plot the interacted coefficients straight from the estimation to get a quick visual representation of the results (and if you find the default graph too sober, you can customize almost everything in it):
coefplot(est_did)
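For instance, coefplot passes the usual base-plot arguments through, so a title can be added directly (a small sketch; the title text here is made up):
coefplot(est_did, main = "Event study: treatment effects by period")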
If you don't want to use fixest for estimation, you can still use the function i to create the interactions. Its syntax is i(var, f, ref, drop, keep): it interacts the variable var with a dummy variable for each value in f. You can select which values of f to retain with the arguments ref, drop and keep. drop, well... drops values from f; ref is the same as drop, except that the references are shown in coefplot (while the values in drop don't appear in the graph).
Here's an example of what i does:
head(with(base_did, i(treat, period, keep = 3:7)))
#> treat:period::3 treat:period::4 treat:period::5 treat:period::6 treat:period::7
#> 1 0 0 0 0 0
#> 2 0 0 0 0 0
#> 3 1 0 0 0 0
#> 4 0 1 0 0 0
#> 5 0 0 1 0 0
#> 6 0 0 0 1 0
head(with(base_did, i(treat, period, drop = 3:7)))
#> treat:period::1 treat:period::2 treat:period::8 treat:period::9 treat:period::10
#> 1 1 0 0 0 0
#> 2 0 1 0 0 0
#> 3 0 0 0 0 0
#> 4 0 0 0 0 0
#> 5 0 0 0 0 0
#> 6 0 0 0 0 0
You can find more information on fixest here.
You can redefine the timefac so that untreated observations are coded as the omitted time category.
df <- df %>%
  mutate(timefac = factor(ifelse(treat == 0, 4, time),
                          levels = c(4, 1, 2, 3, 5)))
Then, you can use timefac without interactions and get a regression table with no NAs.
summary(felm(
y ~ x + timefac | id + time, data = df
))
Coefficients:
Estimate Std. Error t value Pr(>|t|)
x 0.98548 0.05028 19.599 < 2e-16 ***
timefac1   -0.01335    0.27553  -0.048    0.961
timefac2   -0.10332    0.27661  -0.374    0.709
timefac3    0.24169    0.27575   0.876    0.381
timefac5   -1.16305    0.27557  -4.221 3.03e-05 ***
This idea came from: https://blogs.worldbank.org/impactevaluations/econometrics-sandbox-event-study-designs-co
I'm trying to implement a predictive model from a publication (see here for reference).
The paper specifies predictive models that are derived from a previous clinical study and provides the coefficients and covariance matrices for each.
I'm fairly familiar with fitting a model to data in R - but I've never had to specify one.
Specifically, I am looking to create the model so that I can leverage predict() to generate predictive outcomes for a different set of patients while accounting for the model's variability.
For convenience I've provided one of the two models and the related coefficients and covariance matrices; both are of a similar form. Any help is greatly appreciated:
# Model 1
# TKV model
#
# delta_TKV = exp(intercept + a x age + b x Ln(TKV_t) + c x female + d x age x Ln(TKV_t)) - 500
# delta_TKV - the change in total kidney volume (TKV) over a period of time in years
# age - age of patient in years
# Ln(TKV_t) - natural log of total kidney volume at time t
# female - boolean value for gender
# age:Ln(TKV_t) - interaction term between age and Ln(TKV)
# Coefficients Estimate SE
# intercept 0.7889 1.1313
# age 0.1107 0.0287
# Ln(TKV) 0.8207 0.1556
# Female -0.0486 0.0266
# Age:Ln(TKV) -0.0160 0.0039
# Covariance intercept age Ln(TKV) Female Age:Ln(TKV)
# intercept 1.279758 -0.031790 -0.175654 -0.001306 0.004362
# age -0.031790 0.00823 0.004361 -0.000016 -0.000113
# Ln(TKV) -0.175651 0.004361 0.024207 -0.000155 -0.000601
# Female -0.001306 -0.000016 0.000155 0.000708 0.000002
# Age:Ln(TKV) 0.004362 -0.000113 -0.000601 0.000002 0.000016
I don't know whether you can build a model object with custom coefficients that works with predict(). But you can use model.matrix to generate a design matrix based on your formula, e.g.
data = data.frame(delta_TKV = 1:3, TKV_t = 3.5, female = c(T, F, T), age = 40:42)
model = model.matrix(log(delta_TKV + 500) ~ age + log(TKV_t) + female + age:log(TKV_t),
                     data)
model
#> (Intercept) age log(TKV_t) femaleTRUE age:log(TKV_t)
#> 1 1 40 1.252763 1 50.11052
#> 2 1 41 1.252763 0 51.36328
#> 3 1 42 1.252763 1 52.61604
#> attr(,"assign")
#> [1] 0 1 2 3 4
#> attr(,"contrasts")
#> attr(,"contrasts")$female
#> [1] "contr.treatment"
coefs = c(
  intercept = 0.7889,
  age = 0.1107,
  `log(TKV)` = 0.8207,
  female = -0.0486,
  `Age:log(TKV)` = -0.0160
)
model %*% coefs
#> [,1]
#> 1 5.394674
#> 2 5.533930
#> 3 5.575986
I transformed the formula to match an lm specification, so the response is the logarithm of y + 500; to recover y, apply the inverse transformation (exponentiate and subtract 500). The same would apply if you were to use lm.
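To make that concrete, here is a hedged sketch of the back-transformation, plus a rough uncertainty measure built from the published covariance matrix. Vmat below is typed in from the question and symmetrized (a covariance matrix must be symmetric; the table as printed has small transcription discrepancies), and the variance formula Var(x'b) = x' V x is the standard result for a linear combination of coefficients:
# back-transform the linear predictor to the delta_TKV scale
exp(model %*% coefs) - 500
# covariance matrix from the question (symmetrized), same order as coefs
Vmat <- matrix(c(
   1.279758, -0.031790, -0.175654, -0.001306,  0.004362,
  -0.031790,  0.008230,  0.004361, -0.000016, -0.000113,
  -0.175654,  0.004361,  0.024207, -0.000155, -0.000601,
  -0.001306, -0.000016, -0.000155,  0.000708,  0.000002,
   0.004362, -0.000113, -0.000601,  0.000002,  0.000016
), nrow = 5, byrow = TRUE)
# standard error of each linear predictor, on the log(delta_TKV + 500) scale
sqrt(diag(model %*% Vmat %*% t(model)))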
I am trying to do a panel regression in R.
pdata <- pdata.frame(NEW, index = c("Year"))
And:
R1 <- plm(Market_Cap ~ GDP_growthR + Volatility_IR + FDI
+ Savings_rate, data=pdata, model="between")
However when I want to use the within (or random) estimator, I got the following error:
Error in plm.fit(data, model, effect, random.method, random.models, random.dfcor, : empty model
But, when I use the between estimator, everything is fine. Do you have any explanation and suggestion?
Thank you!
You should heed the advice in the comments.
I addressed a version of the OP's question on CV. If the structure of the data is the same, then you're only observing one cross-sectional unit over time. In your setting, you're observing a single country over many years. If your data was a true panel dataset, you would be observing more than one country over at least two years. For example, I will simulate a small panel data frame.
library(dplyr)
library(plm)
set.seed(12345)
panel <- tibble(
  country = c(rep("Spain", 5), rep("France", 5), rep("Croatia", 5)),
  year = rep(2016:2020, 3),  # each country is observed over 5 years
  x = rnorm(15),             # sample 15 random deviates (5 per country)
  y = sample(c(10000:100000), size = 15)  # sample incomes (range: 10,000 - 100,000)
) %>%
  mutate(
    France = ifelse(country == "France", 1, 0),
    Croatia = ifelse(country == "Croatia", 1, 0),
    y_2017 = ifelse(year == 2017, 1, 0),
    y_2018 = ifelse(year == 2018, 1, 0),
    y_2019 = ifelse(year == 2019, 1, 0),
    y_2020 = ifelse(year == 2020, 1, 0)
  )
Inside of the mutate() function I appended dummies for all countries and all years, excluding one country and one year. In your other question, you estimate time fixed effects. Software invariably drops one year to avoid collinearity. You don't need to append the dummies, but they are helpful for explication purposes. Here is a classic panel data frame:
# Panel - varies across two dimensions (country + time)
# 3 countries observed over 5 years for a total of 15 country-year observations
# A tibble: 15 x 10
country year x y France Croatia y_2017 y_2018 y_2019 y_2020
<chr> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Spain 2016 0.586 81371 0 0 0 0 0 0
2 Spain 2017 0.709 10538 0 0 1 0 0 0
3 Spain 2018 -0.109 26893 0 0 0 1 0 0
4 Spain 2019 -0.453 71363 0 0 0 0 1 0
5 Spain 2020 0.606 43308 0 0 0 0 0 1
6 France 2016 -1.82 42544 1 0 0 0 0 0
7 France 2017 0.630 88187 1 0 1 0 0 0
8 France 2018 -0.276 91368 1 0 0 1 0 0
9 France 2019 -0.284 65563 1 0 0 0 1 0
10 France 2020 -0.919 22061 1 0 0 0 0 1
11 Croatia 2016 -0.116 80390 0 1 0 0 0 0
12 Croatia 2017 1.82 48623 0 1 1 0 0 0
13 Croatia 2018 0.371 93444 0 1 0 1 0 0
14 Croatia 2019 0.520 79582 0 1 0 0 1 0
15 Croatia 2020 -0.751 33367 0 1 0 0 0 1
As @DaveArmstrong correctly noted, you should specify the panel indexes. First, we specify a panel data frame, then we estimate the model.
pdata <- pdata.frame(panel, index = c("year", "country"))
random <- plm(y ~ x, model = "random", data = pdata)
A one-way random effects model is fit. The call to summary() will produce the following (abridged output):
Call:
plm(formula = y ~ x, data = pdata, model = "random")
Balanced Panel: n = 5, T = 3, N = 15
Effects:
var std.dev share
idiosyncratic 685439601 26181 0.819
individual 151803385 12321 0.181
theta: 0.2249
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-49380 -17266 6221 17759 32442
Coefficients:
Estimate Std. Error z-value Pr(>|z|)
(Intercept) 58308.0 8653.7 6.7380 1.606e-11 ***
x 7777.0 8808.9 0.8829 0.3773
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
But your data does not have this structure, hence the error message. In fact, your data is similar to carving out one country from this panel. For example, suppose we winnowed down the data frame to Croatian observations only. The following code takes a subset of the previous data frame:
croatia_only <- panel %>%
filter(country == "Croatia") # grab only the observations from Croatia
Here, longitudinal variation only exists for one country. In other words, by restricting attention to Croatia, we cannot exploit the variation across countries; we only have variation in one dimension! The resulting data frame looks like the following:
# Time Series - varies across one dimension (time)
# A tibble: 5 x 10
country year x y France Croatia y_2017 y_2018 y_2019 y_2020
<chr> <int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Croatia 2016 -0.116 80390 0 1 0 0 0 0
2 Croatia 2017 1.82 48623 0 1 1 0 0 0
3 Croatia 2018 0.371 93444 0 1 0 1 0 0
4 Croatia 2019 0.520 79582 0 1 0 0 1 0
5 Croatia 2020 -0.751 33367 0 1 0 0 0 1
Now I will re-estimate a random effects model with one country:
pdata <- pdata.frame(croatia_only, index = c("year", "country"))
random_croatia <- plm(y ~ x , model = "random", data = pdata)
This should reproduce your error message (i.e., empty model). Note, you only have variation within one country! As you correctly noted, a "between" model runs, but not for the reasons you might presume. A true between-effects model averages over all years within a country, then runs ordinary least squares on the 'averaged' data. In your setting, taking the average over your time series results in a single country mean; since you only observe one country, you would have only one observation, and such a model is inestimable. However, you can 'pool' together all of your yearly observations for one country and run a linear model instead, and that is effectively what you're doing. To test this out using one country, try comparing the "between" model with the "pooling" model. They should produce identical estimates of x.
# Run this using the croatia_only data frame
summary(plm(y ~ x , model = "between", data = pdata))
summary(plm(y ~ x , model = "pooling", data = pdata))
It should be painfully obvious now, but model = "pooling" is equivalent to running lm().
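A minimal check of that equivalence with the croatia_only data from above (the coefficients should match):
# pooled OLS via plm and a plain linear model give the same estimates
coef(plm(y ~ x, model = "pooling", data = pdata))
coef(lm(y ~ x, data = croatia_only))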
If you want me to tie this into your previous post, try estimating a linear model with separate dummies for all years as covariates. You will quickly discover that you have no residual degrees of freedom, which is exactly the problem outlined in your other post.
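As a sketch of that degrees-of-freedom problem with the Croatian subset: 5 observations against an intercept, x, and four year dummies is 6 parameters, so lm() returns NA for a coefficient it cannot identify and reports zero residual degrees of freedom.
# 5 observations vs. 6 parameters: residual df is exhausted
summary(lm(y ~ x + y_2017 + y_2018 + y_2019 + y_2020, data = croatia_only))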
In sum, I would look for data from other countries. Once you do that, you can use plm() for all it's worth.
I am using R to compute the linear regression on the following model, as well as find the marginal effects of age on pizza at specific points (20,30,40,50,55).
mod6.22c <- lm(pizza ~ age + income + age*income +
I((age*age)*income), data = piz4)
The problem I am running into is that, when using the margins command, R does not see interaction terms that are inserted into lm() as I((age * age) * income). The margins command will only produce accurate average marginal effects when the interaction terms are written with the formula operator (e.g., variable1 * variable2). I also can't create a new variable in my table (table$newvariable <- table$variable1^2), because the margins command won't identify newvariable as related to variable1.
This has been fine up until now, when my interaction terms were only a quadratic or an x*y interaction, but now I need to calculate the average marginal effects with the interaction term age^2 * income included in the model. The only ways I can get the summary lm output to be correct are using I(age^2 * income) or creating a new variable in my table. As stated before, the margins command can't read I(age^2 * income), and if I create a new variable, the margins command doesn't recognize that the variables are related, so the average marginal effects produced are incorrect.
The error I am receiving:
> summary(margins(mod6.22c, at = list(age= c(20,30,40,50,55)),
variables = "income"))
Error in names(classes) <- clean_terms(names(classes)) :
'names' attribute [4] must be the same length as the vector [3]
I appreciate any help in advance.
Summary of data:
pizza is annual expenditure on pizza; female, hs, college, and grad are dummy variables; income is in thousands of dollars per year; age is in years.
> head(piz4)
pizza female hs college grad income age agesq
1 109 1 0 0 0 19.5 25 625
2 0 1 0 0 0 39.0 45 2025
3 0 1 0 0 0 15.6 20 400
4 108 1 0 0 0 26.0 28 784
5 220 1 1 0 0 19.5 25 625
6 189 1 1 0 0 39.0 35 1225
Libraries used:
library(data.table)
library(dplyr)
library(margins)
tldr
This works:
mod6.22 <- lm(pizza ~ age + income + age*income, data = piz4)
summary(margins(mod6.22, at = list(age = c(20,30,40,50,55)), variables = "income"))
factor age AME SE z p lower upper
income 20.0000 4.5151 1.5204 2.9697 0.0030 1.5352 7.4950
income 30.0000 3.2827 0.9049 3.6276 0.0003 1.5091 5.0563
income 40.0000 2.0503 0.4651 4.4087 0.0000 1.1388 2.9618
income 50.0000 0.8179 0.7100 1.1520 0.2493 -0.5736 2.2095
income 55.0000 0.2017 0.9909 0.2036 0.8387 -1.7403 2.1438
This doesn't work:
mod6.22c <- lm(pizza ~ age + income + age*income + I((age * age)*income), data = piz4)
summary(margins(mod6.22c, at = list(age = c(20,30,40,50,55)), variables = "income"))
Error in names(classes) <- clean_terms(names(classes)) :
'names' attribute [4] must be the same length as the vector [3]
How do I get margins to read my interaction variable I((age*age)*income)?
I would like to plot each of the variables that are part of the glm model, where the y axis is the predicted probability and the x axis is the variable levels or values.
Here is my code that I tried in order to do it:
The data:
dat <- read.table(text = "target apcalc admit num
0 0 0 21
0 0 1 24
0 1 0 55
0 1 1 72
1 0 0 5
1 0 1 31
1 1 0 11
1 1 1 3", header = TRUE)
The glm model:
f<-glm(target ~ apcalc + admit +num, data = dat,family=binomial(link='logit'))
The loop to present the desired plot:
for (i in 1:length(f$var.names)) {
  plot(predict(f, i.var.names = i, newdata = dat, type = 'response'))
}
I got a strange plot as output ("Index" on the x axis and "predict(f,i.var.names=i,newdata=dat,type='response')" on the y axis). How can I fix my code in order to get the desired result?
(I don't have the reputation yet to present it here.)
Here's how to plot all your variables against the predicted probability:
f <- glm(target ~ apcalc + admit + num, data = dat, family = binomial(link = "logit"))
PredProb <- predict(f, type = 'response')  # predicted probabilities
par(mfrow = c(2, 2))  # one panel per column of dat (including the outcome)
for (i in names(dat)) {
  plot(dat[, i], PredProb, xlab = i)
}
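If you would rather have the mean predicted probability at each level of a predictor on the y axis, here is a variation on the same idea (base R only; the predictor names are taken from the model above):
par(mfrow = c(1, 3))
for (i in c("apcalc", "admit", "num")) {
  m <- tapply(PredProb, dat[[i]], mean)  # average predicted probability per level
  plot(as.numeric(names(m)), m, type = "b",
       xlab = i, ylab = "mean predicted probability")
}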
On running the f <- glm(.....) part, f$var.names gives NULL as output; there must be an error there.
f<-glm(target ~ apcalc + admit +num, data=dat,family=binomial("logit"))
f
Call: glm(formula = target ~ apcalc + admit + num, family = binomial("logit"),
data = dat)
Coefficients:
(Intercept) apcalc admit num
2.2690 3.1742 2.4406 -0.1721
Degrees of Freedom: 7 Total (i.e. Null); 4 Residual
Null Deviance: 11.09
Residual Deviance: 5.172 AIC: 13.17
f$var.names
NULL
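There is no var.names component on a glm object. If you want the predictor names programmatically, they live in the terms object (base R):
attr(terms(f), "term.labels")
[1] "apcalc" "admit"  "num"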
I'm having some trouble using coxph(). I have two categorical variables, "tecnologia" and "pais", and I want to evaluate the possible interaction effect of "pais" on "tecnologia". "tecnologia" is a factor with 2 levels: gps and convencional. And "pais" has 2 levels: PT and ES. I have no idea why this warning keeps appearing.
Here's the code and the output:
cox_AC<-coxph(Surv(dados_temp$dias_seg,dados_temp$status)~tecnologia*pais,data=dados_temp)
Warning message:
In coxph(Surv(dados_temp$dias_seg, dados_temp$status) ~ tecnologia * :
X matrix deemed to be singular; variable 3
> cox_AC
Call:
coxph(formula = Surv(dados_temp$dias_seg, dados_temp$status) ~
tecnologia * pais, data = dados_temp)
coef exp(coef) se(coef) z p
tecnologiagps -0.152 0.859 0.400 -0.38 7e-01
paisPT 1.469 4.345 0.406 3.62 3e-04
tecnologiagps:paisPT NA NA 0.000 NA NA
Likelihood ratio test=23.8 on 2 df, p=6.82e-06 n= 127, number of events= 64
I'm opening another question about this subject, although I made a similar one some months ago, because I'm facing the same problem again, with other data. And this time I'm sure it's not a data related problem.
Can somebody help me?
Thank you
UPDATE:
The problem does not seem to be a perfect classification
> xtabs(~status+tecnologia,data=dados)
tecnologia
status conv doppler gps
0 39 6 24
1 30 3 34
> xtabs(~status+pais,data=dados)
pais
status ES PT
0 71 8
1 49 28
> xtabs(~tecnologia+pais,data=dados)
pais
tecnologia ES PT
conv 69 0
doppler 1 8
gps 30 28
Here's a simple example which seems to reproduce your problem:
> library(survival)
> (df1 <- data.frame(t1=seq(1:6),
s1=rep(c(0, 1), 3),
te1=c(rep(0, 3), rep(1, 3)),
pa1=c(0,0,1,0,0,0)
))
t1 s1 te1 pa1
1 1 0 0 0
2 2 1 0 0
3 3 0 0 1
4 4 1 1 0
5 5 0 1 0
6 6 1 1 0
> (coxph(Surv(t1, s1) ~ te1*pa1, data=df1))
Call:
coxph(formula = Surv(t1, s1) ~ te1 * pa1, data = df1)
coef exp(coef) se(coef) z p
te1 -23 9.84e-11 58208 -0.000396 1
pa1 -23 9.84e-11 100819 -0.000229 1
te1:pa1 NA NA 0 NA NA
Now let's look for 'perfect classification' like so:
> (xtabs( ~ s1+te1, data=df1))
te1
s1 0 1
0 2 1
1 1 2
> (xtabs( ~ s1+pa1, data=df1))
pa1
s1 0 1
0 2 1
1 3 0
Note that a value of 1 for pa1 exactly predicts having a status s1 equal to 0. That is to say, based on your data, if you know that pa1 == 1 then you can be sure that s1 == 0. Thus fitting Cox's model is not appropriate in this setting and will result in numerical errors.
This can be seen with
> coxph(Surv(t1, s1) ~ pa1, data=df1)
giving
Warning message:
In fitter(X, Y, strats, offset, init, control, weights = weights, :
Loglik converged before variable 1 ; beta may be infinite.
It's important to look at these cross tables before fitting models. Also it's worth starting with simpler models before considering those involving interactions.
If we add the interaction term to df1 manually like this:
> (df1 <- within(df1,
+ te1pa1 <- te1*pa1))
t1 s1 te1 pa1 te1pa1
1 1 0 0 0 0
2 2 1 0 0 0
3 3 0 0 1 0
4 4 1 1 0 0
5 5 0 1 0 0
6 6 1 1 0 0
Then check it with
> (xtabs( ~ s1+te1pa1, data=df1))
te1pa1
s1 0
0 3
1 3
We can see that it's a useless classifier, i.e. it does not help predict status s1.
When combining all 3 terms, the fitter does manage to produce a numerical value for te1 and pa1 even though pa1 is a perfect predictor as above. However, a look at the values for the coefficients and their errors shows them to be implausible.
Edit @JMarcelino: if you look at the output from the first coxph model in the example, you'll see the warning message:
2: In coxph(Surv(t1, s1) ~ te1 * pa1, data = df1) :
X matrix deemed to be singular; variable 3
Which is likely the same error you're getting and is due to this problem of classification. Also, your third cross table xtabs(~ tecnologia+pais, data=dados) is not as important as the table of status by interaction term. You could add the interaction term manually first as in the example above then check the cross table. Or you could say:
> with(df1,
table(s1, pa1te1=pa1*te1))
pa1te1
s1 0
0 3
1 3
That said, I notice one of the cells in your third table has a zero (conv, PT) meaning you have no observations with this combination of predictors. This is going to cause problems when trying to fit.
In general, the outcome should have some values for all levels of the predictors, and the predictors should not classify the outcome exactly all-or-nothing or 50/50.
Edit 2 @user75782131: Yes, generally speaking, xtabs or a similar cross-table should be examined in models where the outcome and predictors are discrete, i.e. have a limited number of levels. If 'perfect classification' is present then a predictive model / regression may not be appropriate. This is true, for example, for logistic regression (binary outcome) as well as Cox's model.
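As a convenience, here is a small hypothetical helper along those lines (check_separation is my own name, not an existing function): it cross-tabulates a discrete outcome against each listed predictor and flags empty cells, the warning sign discussed above.
# hypothetical helper: flag empty cells in outcome-by-predictor cross tables
check_separation <- function(outcome, predictors, data) {
  for (v in predictors) {
    tab <- xtabs(reformulate(c(outcome, v)), data = data)
    if (any(tab == 0))
      message("Empty cell(s) in ", outcome, " by ", v, " - possible separation")
  }
}
check_separation("s1", c("te1", "pa1"), df1)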