Adding industry dummies to 2SLS in R

For my master's thesis I want to regress the stock price drop during the pandemic on the ESG score. With OLS this works fine. To check for potential endogeneity, I also run a 2SLS regression with the industry average ESG score as the instrument. This works fine as long as I leave out the industry dummies. When I add them, I get the following warning message:
Warning message:
In pf(w, df[1L], df[2L], lower.tail = FALSE) : NaNs produced
Moreover, the diagnostics for weak instruments and Wu-Hausman also show NaN.
I am aware of the dummy variable trap, so not all industries are included (I_Agri is left out as the reference category).
Does anyone know why I get this warning? Any help is appreciated. My results with and without the dummies were posted as screenshots:
[Screenshot: results without dummies]
[Screenshot: results with dummies]
I managed to reproduce the warning with the first 10 rows of my data (the object is called mwe in the code below):
structure(list(NAME = c("A-MARK PRECIOUS METALS", "AAON", "AAR", "ABBOTT LABORATORIES", "ABBVIE", "ABEONA THERAPEUTICS", "ABERCROMBIE & FITCH A", "ABIOMED", "ABM INDS.", "ABRAXAS PETROLEUM"), ESG = c(30.93, 46.31, 24.66, 70.67, 79.79, 36.58, 69.13, 25.88, 72.66, 18.88 ), LogAssets = c(13.538484837701, 13.0147959839376, 14.5473975668402, 18.0628622893116, 18.8299054426017, 11.9263455151269, 15.0139386185708, 13.9751825387466, 15.1444141253049, 11.9688365085588), Quick = c(0.38, 2.27, 1.69, 1.14, 0.6, 2.26, 1.24, 4, 1.32, 0.1), I_Agri = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), I_Cons = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), I_Fin = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), I_Man = c(0, 1, 0, 1, 1, 1, 0, 1, 0, 0), I_Mining = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1), I_Serv = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 0), I_Trade = c(1, 0, 0, 0, 0, 0, 1, 0, 0, 0), I_Utility = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0), Drop = c(0.107419712070875, 0.277738886944347, 0.791427308882015, 0.297000895255148, 0.31485022307202, 0.75, 0.527002967359051, 0.222692078618225, 0.473209685729006, 0.683189655172414), Leverage = c(0.8177, 0.0178, 0.3993, 0.3623, 0.8679, 0.0169, 0.2659, 0, 0.3257, 1.5458 ), ROA = c(0.0622, 0.1926, 0.0065, 0.0726, 0.0542, -0.4324, -0.0259, 0.1888, 0.0095, -0.6467), ESG_A = c(41.9334803921569, 41.6947268673356, 42.0122772277228, 41.6947268673356, 41.6947268673356, 41.6947268673356, 41.9334803921569, 41.6947268673356, 37.5789174311926, 34.9968604651163 )), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame" ))
The code I used:
library(AER)
IV2=ivreg(Drop~ESG+LogAssets+Leverage+ROA+Quick+I_Cons+I_Fin+I_Man+I_Mining+I_Serv+I_Trade+I_Utility | ESG_A+LogAssets+Leverage+ROA+Quick+I_Cons+I_Fin+I_Man+I_Mining+I_Serv+I_Trade+I_Utility, data=mwe)
summary(IV2, diagnostics = TRUE)
RStudio version: 2022.2.0.443. Operating system: Windows 10 Pro 64-bit.
Thank you!
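One possible cause is worth checking. In the 10-row sample, ESG_A takes exactly one value per industry, which is what you would expect from an industry average. If that also holds in the full data, ESG_A is a linear combination of the intercept and the industry dummies, so once the dummies are in the instrument list the excluded instrument adds no new information, and the first-stage F test can end up with zero numerator degrees of freedom (pf() returns NaN with a warning when a degrees-of-freedom argument is not positive). The minimal sketch below only tests the first part of that reasoning on the sample values; the industry factor is a hypothetical helper rebuilt by hand from the I_* columns and is not part of the original data.

```r
# Hypothetical helper: industry of each of the 10 sample firms, read off the I_* dummy columns
industry <- factor(c("Trade", "Man", "Utility", "Man", "Man",
                     "Man", "Trade", "Man", "Serv", "Mining"))
# ESG_A values copied from the sample data above
ESG_A <- c(41.9334803921569, 41.6947268673356, 42.0122772277228,
           41.6947268673356, 41.6947268673356, 41.6947268673356,
           41.9334803921569, 41.6947268673356, 37.5789174311926,
           34.9968604651163)

# Regress the instrument on the industry dummies; lm() builds the dummies from the factor
check <- lm(ESG_A ~ industry)

# Largest absolute residual: essentially zero if ESG_A is constant within each industry
max_abs_resid <- max(abs(residuals(check)))
max_abs_resid
```

A maximum residual of essentially zero means the dummies reproduce ESG_A exactly. Running the same check on the full data would show whether the instrument still carries any variation once the dummies are added.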


Aggregate similar constructs / factor analysis with binary variables

To reduce the number of constructs, I would like to aggregate the variables in the following data frame, which contains only binary variables corresponding to "yes/no" (first 10 rows shown; the original data frame has 169 rows).
outcome <-
structure(list(Q9_Automazione.processi = c(0, 0, 0, 0, 0, 0,
1, 1, 1, 0), Q9_Velocita.Prod = c(1, 0, 0, 1, 0, 0, 1, 1, 1,
0), Q9_Flessibilita.Prod = c(0, 0, 0, 1, 0, 0, 1, 1, 0, 1), Q9_Controllo.processi = c(0,
0, 0, 1, 0, 0, 1, 1, 0, 0), Q9_Effic.Magazzino = c(0, 0, 0, 1,
0, 0, 0, 0, 0, 0), Q9_Riduz.Costi = c(0, 1, 0, 0, 0, 0, 0, 0,
0, 1), Q9_Miglior.Sicurezza = c(0, 0, 0, 0, 0, 0, 1, 0, 1, 1),
Q9_Connett.Interna = c(0, 0, 0, 0, 0, 0, 0, 1, 1, 0), Q9_Connett.Esterna = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), Q9_Virtualizzazione = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0), Q9_Innov.Prod = c(0, 0, 0, 0, 0,
1, 0, 0, 0, 1), Q9_Person.Prod = c(0, 1, 0, 1, 0, 1, 0, 0,
0, 1), Q9_Nuovi.Mercati = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
Q9_Nuovi.BM = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Q9_Perform.Energ = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0), Q9_Perform.SostAmb = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 10L), class = "data.frame")
I have tried factor analysis with tetrachoric correlations, both on the tetrachoric correlation matrix (the value returned by the KMO function turns out to be inadequate) and directly on the data frame, using cor = "tet" in the fa function (which gives a negative Tucker Lewis Index).
I have been reading up on this but cannot find a methodology that is adequate and that I am confident is correct.
In short, I would like to aggregate similar constructs: for example, check whether column 5 is almost always 1 ("yes") when column 11 is 1, and if so, merge them.
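The check described above (is one item almost always "yes" when another is "yes"?) can be computed for every pair at once. This is a minimal sketch, not a factor-analysis method: entry [i, j] of the result is the share of rows with item j equal to 1 in which item i is also 1. conditional_yes is a hypothetical helper name, and the three columns are copied from the 10-row sample.

```r
# Conditional "yes" rates between binary items:
# result[i, j] = count(item i and item j both 1) / count(item j is 1)
conditional_yes <- function(df) {
  m <- as.matrix(df)
  both  <- crossprod(m)          # both[i, j] = number of rows where items i and j are both 1
  n_yes <- diag(both)            # number of rows where each item is 1
  sweep(both, 2, n_yes, "/")     # divide column j by count(j is 1); items never 1 give NaN
}

# Three columns copied from the 10-row sample above
outcome_small <- data.frame(
  Q9_Velocita.Prod      = c(1, 0, 0, 1, 0, 0, 1, 1, 1, 0),
  Q9_Flessibilita.Prod  = c(0, 0, 0, 1, 0, 0, 1, 1, 0, 1),
  Q9_Controllo.processi = c(0, 0, 0, 1, 0, 0, 1, 1, 0, 0)
)
cy <- conditional_yes(outcome_small)
round(cy, 2)
```

In this sample, whenever Q9_Controllo.processi is 1 the other two items are also 1 (rate 1), but not the other way round (0.6 and 0.75). Which threshold counts as "almost always" is a judgment call.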
Here is the code I tried:
library(psych)
library(corrplot)    # corrplot()
library(ggcorrplot)  # ggcorrplot()
library(magrittr)    # %>%
# -------- Tetrachoric --------
tet <- tetrachoric(outcome)
corr_matrix <- tet$rho  # assumed: corr_matrix is the tetrachoric correlation matrix
corrplot(corr_matrix, "ellipse", tl.cex = 0.75, tl.col = "black")
par(mfrow = c(1,2))
corr_matrix %>%
ggcorrplot(show.diag = F,
type="lower",
lab=TRUE,
lab_size=2)
KMO(corr_matrix)
# psych cannot infer the sample size from a correlation matrix, so pass it explicitly
cortest.bartlett(corr_matrix, n = nrow(outcome))
fa.parallel(corr_matrix, n.obs = nrow(outcome), fm = "ml")
factor <- fa(corr_matrix, nfactors = 3, n.obs = nrow(outcome), rotate = "oblimin", fm = "ml")
print(factor, cut = 0.3, digits = 3)
# -------- Pearson --------
cor(outcome, method = 'pearson', use = "pairwise.complete.obs") %>%
ggcorrplot(show.diag = F,
type="lower",
lab=TRUE,
lab_size=2)
KMO(outcome)
cortest.bartlett(outcome)
fa.parallel(outcome)
factor1 <- fa(outcome, nfactors = 3, rotate = "oblimin", cor = "tet", fm = "ml")
print(factor1, cut = 0.3, digits = 3)

t.test outputs in the `tables` package in R

So here's a sample of the data I am working with:
> dput(candidateEvokeDFYoung)
structure(list(youngTreatment = structure(c(NA, 1, 0, 1, 0, 1,
0, 1, 1, 0, 1, 1, 0, 0, NA, NA, NA, NA, 1, 1), format.stata = "%10.0g"),
candTrustworthy = structure(c(0, 0, 0, 0, 0, 0, 0, 1, 0,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), format.stata = "%10.0g"),
candKnowledgeable = structure(c(1, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1), format.stata = "%10.0g"),
candQualified = structure(c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 1), format.stata = "%10.0g"),
candConservative = structure(c(0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0), format.stata = "%10.0g"),
candLiberal = structure(c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0), format.stata = "%10.0g"), candInexperienced = structure(c(0,
1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0), format.stata = "%10.0g"),
candPrincipled = structure(c(1, 1, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 1, 1, 1, 0, 0, 0, 0), format.stata = "%10.0g"),
candDistance = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0), format.stata = "%10.0g"),
candEfficacy = structure(c(1, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 1, 0, 0), format.stata = "%10.0g")), row.names = c(NA,
-20L), class = c("tbl_df", "tbl", "data.frame"))
What I am trying to do is generate a table of t.test results using the tables package. I used lapply to run a t.test on each variable, with youngTreatment as the grouping variable:
candidateEvokesDiffYoung = lapply(candidateEvokeDFYoung[-1], function(x) t.test(x ~ candidateEvokeDFYoung$youngTreatment))
This gives me a list of lists, and I have no idea how to use tables::tabular to access
list[['statistic']]
and
list[['p.value']]
I could just pull all of these out manually and put them in a data frame for stargazer or something similar, but I was wondering whether anyone knows how to do this more efficiently with the tables package.
t.test returns objects of the class htest. I believe the best way to gather the results of an object of the class htest is to use the function tidy of the package broom.
library(broom)
candidateEvokesDiffYoung = lapply(candidateEvokeDFYoung[-1],
function(x) {
t.test(x ~ candidateEvokeDFYoung$youngTreatment)
})
m <- t(sapply(candidateEvokesDiffYoung, tidy))
This lets you refer to the elements much as you seem to be trying to:
> m["candTrustworthy", "p.value"][[1]]
[1] 0.7875872
> unlist(m[, "p.value"])
candTrustworthy candKnowledgeable candQualified candConservative candLiberal candInexperienced candPrincipled candDistance candEfficacy
0.7875872 0.7875872 0.7875872 0.3632175 0.3465935 0.6933006 0.3790778 NaN 0.3632175
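Because t(sapply(..., tidy)) returns a matrix whose cells are lists (hence the [[1]] above), it can be simpler to extract only statistic and p.value into an ordinary data frame with base R, which is easier to hand to a table-formatting package. A minimal sketch on made-up data; dat, a and b are hypothetical names, not the question's variables.

```r
# Made-up data: a 0/1 grouping column followed by two outcome columns
set.seed(123)
dat <- data.frame(group = rep(c(0, 1), each = 10),
                  a = rnorm(20),
                  b = rnorm(20, mean = 0.5))

# One t.test per outcome column, as in the question
tests <- lapply(dat[-1], function(x) t.test(x ~ dat$group))

# Keep only the fields needed for the table, as plain numeric columns
results <- data.frame(
  variable  = names(tests),
  statistic = vapply(tests, function(tt) unname(tt$statistic), numeric(1)),
  p.value   = vapply(tests, function(tt) tt$p.value, numeric(1)),
  row.names = NULL
)
results
```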

Weighted mean in R

How can I get the weighted average of my data? I have already searched online, but when I use the weighted.mean function I keep getting the same result, and I don't know what I am doing wrong.
Below is a sample of the dataset:
dput(head(new))
structure(list(comp.1 = c(0.5, 0.25, 0, 0.25, 0.31, 0.3), comp.2 = c(0.3,
0.15, 0, 0.15, 0, 0), comp.3 = c(0.2, 0.6, 1, 0.6, 0.69, 0.7),
genderMale = c(0, 1, 1, 1, 0, 0), SeniorCitizen = c(0, 0,
0, 0, 0, 0), PartnerYes = c(1, 0, 0, 0, 0, 0), DependentsYes = c(0,
0, 0, 0, 0, 0), tenure = c(-1.28015700354285, 0.064298112878097,
-1.23941593940889, 0.512449818351747, -1.23941593940889,
-0.994969554605076), MultipleLinesYes = c(0, 0, 0, 0, 0,
1), `InternetServiceFiber optic` = c(0, 0, 0, 0, 1, 1), OnlineSecurityYes = c(0,
1, 1, 1, 0, 0), OnlineBackupYes = c(1, 0, 1, 0, 0, 0), DeviceProtectionYes = c(0,
1, 0, 1, 0, 1), TechSupportYes = c(0, 0, 0, 1, 0, 0), StreamingTVYes = c(0,
0, 0, 0, 0, 1), StreamingMoviesYes = c(0, 0, 0, 0, 0, 1),
`ContractOne year` = c(0, 1, 0, 1, 0, 0), `ContractTwo year` = c(0,
0, 0, 0, 0, 0), PaperlessBillingYes = c(1, 0, 1, 0, 1, 1),
`PaymentMethodCredit card (automatic)` = c(0, 0, 0, 0, 0,
0), `PaymentMethodElectronic check` = c(1, 0, 0, 0, 1, 1),
`PaymentMethodMailed check` = c(0, 1, 1, 0, 0, 0), MonthlyCharges = c(-1.16161133177258,
-0.260859369930086, -0.363897417225722, -0.747797238601399,
0.196164226945719, 1.15840663636787), TotalCharges = c(1.47494433546539,
3.27634689625303, 2.03402652377511, 3.26499480914874, 2.18084241464668,
2.91407858538911)), row.names = c("1", "2", "3", "4", "5",
"6"), class = "data.frame")
As you can see, I have 3 components (comp.1, comp.2, comp.3), each holding posterior probabilities. I would like to get the weighted average of each component as well as the overall weighted average. I have tried:
weighted.mean(new$comp.1, new$SeniorCitizen)
weighted.mean(new$comp.2, new$SeniorCitizen)
weighted.mean(new$comp.3, new$SeniorCitizen)
This gave me 0.24, 0.14 and 0.61. But no matter which variable I use, I get the same output. What am I doing wrong?
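weighted.mean(x, w) returns sum(w * x) / sum(w). With a 0/1 weight such as SeniorCitizen, that is simply the mean of x over the rows where the weight is 1; if the weight is 0 in every row, the result is 0/0, i.e. NaN. Also, because comp.1 + comp.2 + comp.3 = 1 in every row, the three weighted means with the same weight always sum to 1 (0.24 + 0.14 + 0.61 is 1 up to rounding). A small check using columns copied from the head() output above:

```r
# Columns copied from dput(head(new)) above
comp.1 <- c(0.5, 0.25, 0, 0.25, 0.31, 0.3)
comp.2 <- c(0.3, 0.15, 0, 0.15, 0, 0)
comp.3 <- c(0.2, 0.6, 1, 0.6, 0.69, 0.7)
genderMale    <- c(0, 1, 1, 1, 0, 0)
SeniorCitizen <- c(0, 0, 0, 0, 0, 0)

# With 0/1 weights, weighted.mean() is the mean over the rows where the weight is 1
wm1 <- weighted.mean(comp.1, genderMale)
mean(comp.1[genderMale == 1])   # same value: (0.25 + 0 + 0.25) / 3

# The components sum to 1 per row, so their weighted means sum to 1 as well
wm_all <- sapply(list(comp.1, comp.2, comp.3), weighted.mean, w = genderMale)
sum(wm_all)

# All-zero weights: sum(w) is 0, so the result is NaN
weighted.mean(comp.1, SeniorCitizen)
```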

How to add ellipses to a Bray-Curtis NMDS plot in the vegan package

I have plotted a point graph using the vegan package, but I want to circle the samples that received the same treatment. As shown in the figure, the 3 colors correspond to 3 treatments, and I want to draw a circle around each group.
Here is my code.
library(vegan)
library(MASS)
library(readxl)
bray1 <- read_excel("bray1.xlsx")
cols <- c("red", "blue","blue", "green","green","red","blue","green","green","red","red","blue")
row.names(bray1) <- c("SI1", "SII0", "SI0", "SII2", "SI2", "SII1", "SIII0", "SIV2", "SIII2", "SIV1", "SIII1", "SIV0")
bcdist <- vegdist(bray1, "bray")
bcmds <- isoMDS(bcdist, k = 2)
plot(bcmds$points, type = "n", xlab = "", ylab = "")
text(bcmds$points, dimnames(bray1)[[1]],col = cols,size=10)
My data:
bray1<-structure(list(`Andropogon virginicus` = c(0, 0, 0, 0, 2.7, 31.5333333333333, 0, 0, 0, 0, 0, 0), `Oenothera parviflora` = c(61.6,30.3333333333333, 7.53333333333333, 0, 11.7333333333333, 0, 0, 0,75.4, 0, 0, 0), `Lespedeza cuneata` = c(0, 0, 0, 0, 0, 46.7333333333333, 0, 0, 3, 0, 0, 0), `Lespedeza pilosa` = c(0, 1.93333333333333, 0, 0, 1.73333333333333, 0, 0, 0, 0, 1.7, 0, 0), `Chamaesyce maculata` = c(0, 0, 0,4.733333333, 0, 0, 0, 0, 0, 0, 0, 0), `Chamaesyce nutans` = c(0,0, 0, 0, 0,0, 0.166666666666667, 0, 0, 0, 0, 0), `Bidens frondosa` = c(0, 0, 0,1.76666666666667, 1.03333333333333, 3.23333333333333, 0, 0, 0, 0, 0, 0), `Erigeron annuus` = c(0, 0, 0, 0, 0.4, 0, 0, 0, 0, 0, 0, 0), `Erigeron canadensis` = c(0, 0, 0, 0, 0, 4.33333333333333, 0, 0, 9.1, 2.066666667, 0,0), `Equisetum arvense` = c(46, 62.7333333333333, 0, 1.66666666666667, 0, 0.533333333333333, 0, 0, 0, 0, 0, 0), `Erigeron sumatrensis` = c(0, 0, 0, 0, 0, 16.4333333333333, 0, 4, 0, 6.633333333, 0, 0), `Hypochaeris radicata` = c(0, 3.76666666666667, 116.6, 0, 5.033333333, 9.76666666666667, 29, 0, 23.1666666666667, 82.16666667, 0, 0), `Lactuca indica` = c(10.26666667, 0, 1.566666667, 120.1333333, 44.36666667, 42.0333333333333, 0, 14.2333333333333, 0, 0, 14.36666667, 22.2), `Solidago altissima` = c(0, 1.06666666666667, 33.93333333, 0, 0, 0, 0, 0, 0, 6.6, 0, 0), `Sonchus asper` = c(0, 35.9, 0, 0, 0, 7.46666666666667,
29.6666666666667, 4.96666666666667, 0, 0, 0.23, 2.933333333 )), .Names = c("Andropogon virginicus", "Oenothera parviflora", "Lespedeza cuneata", "Lespedeza pilosa", "Chamaesyce maculata", "Chamaesyce nutans", "Bidens frondosa", "Erigeron annuus", "Erigeron canadensis", "Equisetum arvense", "Erigeron sumatrensis", "Hypochaeris radicata", "Lactuca indica", "Solidago altissima", "Sonchus asper"), row.names = c(NA, -12L), class = c("tbl_df", "tbl", "data.frame"))
Here are a couple of alternatives based on the dataEllipse function in the car package. I have made a few minor alterations to your base graph. I found the pure "green" text hard to read, so I switched it to "darkgreen". I changed the plotting limits so that the full ellipses fit in the picture. Also, your text() call included a size argument; text() has no size argument, so I replaced it with cex to set the font size.
library(car)
Group = c(1,2,2,3,3,1,2,3,3,1,1,2)
cols <- c("red", "blue","blue", "darkgreen","darkgreen","red","blue",
"darkgreen","darkgreen","red","red","blue")
In the first version, I did what I think you asked for, ellipses marking the treatment groups.
plot(bcmds$points, type = "n", xlab = "", ylab = "",
xlim=c(-0.8,0.8), ylim=c(-0.8,0.8), asp=1)
text(bcmds$points, dimnames(bray1)[[1]],col = cols, cex=0.8)
dataEllipse(bcmds$points[,1], bcmds$points[,2], factor(Group),
plot.points=F, add=T, col=c("red", "blue", "green"),
levels=rep(0.6, 3), center.pch=0, group.labels="", lwd=1)
In the second version, instead of using the outline of the ellipse, I use a transparent fill color to show the ellipses.
plot(bcmds$points, type = "n", xlab = "", ylab = "",
xlim=c(-0.8,0.8), ylim=c(-0.8,0.8), asp=1)
text(bcmds$points, dimnames(bray1)[[1]],col = cols, cex=0.8)
dataEllipse(bcmds$points[,1], bcmds$points[,2], factor(Group),
plot.points=F, add=T, col=c("red", "blue", "green"),
levels=rep(0.6, 3), center.pch=0, group.labels="",
lty=0, fill=TRUE, fill.alpha=0.04)
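To see what such an ellipse is, or to avoid the car dependency, here is a base-R sketch that computes the outline for one group: it stretches a unit circle by the Cholesky factor of the group's covariance matrix and shifts it to the group mean. The radius sqrt(2 * qf(level, 2, n - 1)) is the usual normal-theory data-ellipse scaling; whether it reproduces dataEllipse's output exactly is an assumption. ellipse_xy is a hypothetical helper name.

```r
# Sketch: outline of a normal-theory data ellipse for one group of 2-D points.
# Assumption: radius sqrt(2 * qf(level, 2, n - 1)), the usual data-ellipse scaling.
ellipse_xy <- function(pts, level = 0.6, segments = 51) {
  pts    <- as.matrix(pts)
  center <- colMeans(pts)
  shape  <- cov(pts)
  radius <- sqrt(2 * qf(level, 2, nrow(pts) - 1))
  angles <- seq(0, 2 * pi, length.out = segments)
  unit   <- cbind(cos(angles), sin(angles))            # points on the unit circle
  sweep(radius * unit %*% chol(shape), 2, center, "+")  # stretch by the covariance, shift to the mean
}
```

To use it on the plot above, draw after the text() call, for example: for (g in 1:3) lines(ellipse_xy(bcmds$points[Group == g, ]), col = c("red", "blue", "darkgreen")[g]).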

Plotting each row of a data frame in a separate graph

The code is probably very simple, but I have never plotted in R before.
I would like a line plot for every row, with each plot on a separate graph.
The values in my data range from 0 to 1. A value of 1 is the maximum of the plot; in some cases a single row has several maxima. I would like a PDF file as the output.
Data:
> dput(head(tbl_end))
structure(list(`NA` = structure(1:6, .Label = c("AT1G01050",
"AT1G01080", "AT1G01090", "AT1G01220", "AT1G01320", "AT1G01420",
"ATCG00800", "ATCG00810", "ATCG00820", "ATCG01090", "ATCG01110",
"ATCG01120", "ATCG01240", "ATCG01300", "ATCG01310", "ATMG01190"
), class = "factor"), `10` = c(0, 0, 0, 0, 0, 0), `20` = c(0,
0, 0, 0, 0, 0), `52.5` = c(0, 1, 0, 0, 0, 0), `81` = c(0, 0.660693687777888,
0, 0, 0, 0), `110` = c(0, 0.521435654491704, 0, 0, 0, 1), `140.5` = c(0,
0.437291194705566, 0, 0, 0, 1), `189` = c(0, 0.52204783488213,
0, 0, 0, 0), `222.5` = c(0, 0.524298383907171, 0, 0, 0, 0), `278` = c(1,
0.376865096972469, 0, 1, 0, 0), `340` = c(0, 0, 0, 0, 0, 0),
`397` = c(0, 0, 0, 0, 0, 0), `453.5` = c(0, 0, 0, 0, 0, 0
), `529` = c(0, 0, 0, 0, 0, 0), `580` = c(0, 0, 0, 0, 0,
0), `630.5` = c(0, 0, 0, 0, 0, 0), `683.5` = c(0, 0, 0, 0,
0, 0), `735.5` = c(0, 0, 0, 0, 0, 0), `784` = c(0, 0, 0.476101907006443,
0, 0, 0), `832` = c(0, 0, 1, 0, 0, 0), `882.5` = c(0, 0,
0, 0, 0, 0), `926.5` = c(0, 0, 0, 0, 1, 0), `973` = c(0,
0, 0, 0, 0, 0), `1108` = c(0, 0, 0, 0, 0, 0), `1200` = c(0,
0, 0, 0, 0, 0)), .Names = c(NA, "10", "20", "52.5", "81",
"110", "140.5", "189", "222.5", "278", "340", "397", "453.5",
"529", "580", "630.5", "683.5", "735.5", "784", "832", "882.5",
"926.5", "973", "1108", "1200"), row.names = c(NA, 6L), class = "data.frame")
It would be great to have the row name at the top of each page of the PDF.
Here's an example using your dputed data:
# open the pdf file
pdf(file='myfile.pdf')
# since I don't know what values should be on the X axis,
# I'm just using values from 1 to number of y-values
x <- 1:(ncol(tbl_end)-1)
for(i in 1:nrow(tbl_end)){
# plot onto a new pdf page
plot(x=x,y=tbl_end[i,-1],type='b',main=tbl_end[i,1],xlab='X',ylab='Y')
}
# close the pdf file
dev.off()
where the first page looks something like this:
[Figure: first page of myfile.pdf]
If you want to change the style of the plot (e.g. lines without the little circles), have a look at the documentation in ?plot; for example, type = 'l' draws lines only.
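If the column names (10, 20, 52.5, ...) are the x positions, which is an assumption since the question does not say, a variant of the loop can read them with as.numeric(colnames(...)) and draw plain lines. plot_rows_to_pdf is a hypothetical helper; on.exit() makes sure the PDF device is closed even if a plot fails.

```r
# Sketch: one PDF page per row, using the numeric column names as x positions (assumption)
plot_rows_to_pdf <- function(df, file) {
  x <- as.numeric(colnames(df)[-1])      # "10", "20", "52.5", ... -> 10, 20, 52.5, ...
  pdf(file = file)
  on.exit(dev.off())                     # close the PDF device when the function exits
  for (i in seq_len(nrow(df))) {
    y <- as.numeric(unlist(df[i, -1]))   # one row -> plain numeric vector
    plot(x, y, type = "l", ylim = c(0, 1),
         main = as.character(df[i, 1]),  # row name from the first column as page title
         xlab = "X", ylab = "Y")
  }
  invisible(x)
}

# Usage with the question's data:
# plot_rows_to_pdf(tbl_end, "myfile.pdf")
```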
