Lately I ran into a problem that I spent a long time on but could not solve in the end. I want to use the pgmm function from the plm package to produce GMM estimates on cross-country panel data covering 180 countries and 65 time periods. Here is my code:
pgmm(D_rcr ~ lag(D_rcr, 1) + eco_cycle + I(log(Human_trend)) +
       I(log(capital_trend)) + I(log(rtfpna)) + exp_rate + urban +
       industry + service | plm::lag(mpk3_delta, 3:6),
     data = data_test, index = c("country", "year"),
     effect = "twoways", transformation = "ld")
And the data is like:
country year D_rcr eco_cycle Human_trend capital_trend rtfpna exp_rate urban industry service
1000 Burkina Faso 1999 0.0074201618 0.0295545705 4.644064 23946.998 0.8284378 -8.221149e-06 17.166 25.19151 42.19550
1001 Burkina Faso 2000 -0.0046062428 -0.0085762554 4.781708 25026.203 0.8177401 -8.013943e-06 17.844 21.52736 45.66413
1002 Burkina Faso 2001 -0.0074698958 -0.0022468581 4.942214 26203.394 0.8430429 -4.433730e-06 18.540 19.47667 43.47496
1003 Burkina Faso 2002 -0.0072339948 -0.0180040290 5.102502 27513.395 0.8564266 -4.243651e-06 19.258 17.52184 43.92530
1004 Burkina Faso 2003 0.0208224248 -0.0013267292 5.262760 28994.841 0.8928111 -4.900598e-06 19.996 21.18051 41.74380
1005 Burkina Faso 2004 0.0077643394 -0.0164384391 5.424015 30686.577 0.9057222 -5.039807e-06 20.757 21.17522 44.30414
1006 Burkina Faso 2005 -0.0162568441 0.0079704026 5.588694 32625.279 0.9540021 -6.000714e-06 21.537 17.97970 42.98950
1007 Burkina Faso 2006 0.0157383040 0.0101814490 5.759905 34843.140 0.9746150 -6.004488e-06 22.339 17.62221 45.65378
1008 Burkina Faso 2007 0.0200791048 -0.0074020766 5.940701 37366.725 0.9920313 -5.925001e-06 23.163 18.95747 48.37608
1009 Burkina Faso 2008 -0.0329526715 -0.0083514921 6.134060 40213.051 1.0026470 -6.737820e-06 23.993 16.22030 43.57783
1010 Burkina Faso 2009 0.0108550329 -0.0364106100 6.341043 43385.046 0.9904070 -5.581967e-06 24.828 19.32466 45.10181
1011 Burkina Faso 2010 0.0003792556 -0.0105232997 6.561223 46865.511 1.0080181 -3.757856e-06 25.665 23.00269 41.38044
1012 Burkina Faso 2011 0.0036570272 -0.0008078762 6.808363 50612.776 1.0000000 -1.947466e-06 26.505 27.15270 39.00203
1013 Burkina Faso 2012 -0.0133615481 0.0088997716 7.066275 54562.488 0.9885733 -4.819380e-06 27.346 24.91152 40.03299
1014 Burkina Faso 2013 -0.0124167169 0.0180233629 7.332700 58635.384 0.9726224 -4.963807e-06 28.186 20.99917 43.38722
1015 Burkina Faso 2014 -0.0093625110 0.0183559642 7.605543 62756.115 0.9531422 -2.547616e-06 29.024 20.47991 44.29216
1016 Burundi 1980 -0.0076063659 -0.0518049023 2.122768 4760.103 0.8508636 -4.026274e-05 4.339 12.61903 25.13108
1017 Burundi 1981 0.0062886770 0.0123692536 2.204532 5003.674 0.9142978 -2.922222e-05 4.503 13.41068 25.27003
1018 Burundi 1982 -0.0073451957 -0.0326804079 2.286727 5257.374 0.8792791 -3.623259e-05 4.674 15.44782 27.69609
1019 Burundi 1983 -0.0048256924 -0.0422295228 2.369051 5513.472 0.8658228 -3.508869e-05 4.850 15.50037 27.25349
1020 Burundi 1984 -0.0083655241 -0.0846945313 2.450960 5763.198 0.8221024 -3.652248e-05 5.033 13.84020 26.02882
1021 Burundi 1985 0.0062427433 -0.0081527185 2.531672 5997.500 0.8820450 -2.695840e-05 5.221 13.00197 25.46080
1022 Burundi 1986 0.0085330972 -0.0050508112 2.610223 6208.165 0.8880036 -2.122831e-05 5.417 13.51567 27.95960
1023 Burundi 1987 0.0013978951 0.0048612717 2.685521 6388.418 0.8895743 -2.379938e-05 5.620 17.12692 27.76170
1024 Burundi 1988 0.0120351151 0.0273960217 2.756473 6533.799 0.9023147 -1.564180e-05 5.830 16.66728 29.08768
1025 Burundi 1989 -0.0040708811 0.0237706740 2.822166 6643.176 0.8862884 -1.854892e-05 6.047 19.66485 26.65657
1026 Burundi 1990 0.0031577402 0.0461310277 2.882089 6718.144 0.9082311 -2.401352e-05 6.271 18.96324 25.15806
1027 Burundi 1991 0.0053723287 0.0913896512 2.944615 6763.149 0.9525304 -2.121296e-05 6.455 19.59612 26.09317
1028 Burundi 1992 -0.0006242234 0.1118378705 3.002747 6784.165 0.9633930 -2.326152e-05 6.637 21.17273 25.29381
1029 Burundi 1993 -0.0140939288 0.0566603249 3.058046 6787.775 0.8787442 -2.704195e-05 6.823 22.44800 24.93331
1030 Burundi 1994 -0.0051914045 0.0446976884 3.112606 6780.856 0.8391036 -2.185189e-05 7.014 22.47806 30.74383
1031 Burundi 1995 -0.0101237974 -0.0044596262 3.168960 6770.888 0.7683206 -2.265241e-05 7.211 19.24821 32.60719
1032 Burundi 1996 -0.0120905700 -0.0733896784 3.230019 6765.241 0.6975604 -1.365663e-05 7.412 12.63033 30.14807
1033 Burundi 1997 -0.0018359105 -0.0465258303 3.298963 6770.493 0.6909751 -9.966291e-06 7.618 15.62753 36.64545
1034 Burundi 1998 0.0010393142 0.0151532770 3.379006 6792.311 0.7113460 -1.557693e-05 7.830 15.84338 36.12517
1035 Burundi 1999 -0.0087320046 0.0148961511 3.473067 6835.203 0.6845708 -1.100638e-05 8.036 16.20578 35.90429
1036 Burundi 2000 -0.0036065995 0.0062503238 3.583363 6902.337 0.6644523 -1.597629e-05 8.246 16.93214 35.00821
1037 Burundi 2001 -0.0015909534 0.0154145281 3.715491 6997.577 0.6616966 -1.724226e-05 8.461 16.49441 37.06893
1038 Burundi 2002 0.0000723279 0.0336641982 3.865710 7124.685 0.6701715 -1.645656e-05 8.682 16.69844 37.51544
1039 Burundi 2003 -0.0081492568 -0.0184660371 4.033378 7286.589 0.6383393 -1.518514e-05 8.908 17.03472 36.60505
1040 Burundi 2004 -0.0053202507 -0.0321812288 4.217188 7484.465 0.6333758 -1.595708e-05 9.139 17.70286 36.85237
1041 Burundi 2005 -0.0073075515 -0.1193560863 4.415421 7718.514 0.5999591 -2.871464e-05 9.375 18.45308 37.05039
1042 Burundi 2006 0.0953559671 -0.1570653899 4.626188 7989.549 0.6067449 -3.880530e-05 9.617 16.71110 38.94535
1043 Burundi 2007 0.0094422379 -0.1988202659 4.847617 8298.925 0.6223342 -3.338575e-05 9.864 18.03837 44.62553
1044 Burundi 2008 0.0217479374 -0.1709468848 5.077926 8647.895 0.6832167 -3.469164e-05 10.118 15.98312 43.42593
1045 Burundi 2009 0.0516023614 -0.0108370357 5.315478 9036.445 0.8509285 -2.165262e-05 10.376 16.63140 42.83632
1046 Burundi 2010 0.0128353068 0.0302280799 5.558862 9461.720 0.9318257 -1.805687e-05 10.641 16.70423 42.84727
1047 Burundi 2011 0.0164302478 0.0549153842 5.820920 9917.799 1.0000000 -1.674949e-05 10.912 16.89924 42.75338
1048 Burundi 2012 0.0177947374 0.0834531229 6.088302 10396.854 1.0781934 -1.448994e-05 11.189 16.88556 42.53204
1049 Burundi 2013 -0.0094655827 0.0443990294 6.360665 10890.133 1.0653062 -1.141332e-05 11.472 17.73397 42.43866
1050 Burundi 2014 -0.0061952542 0.0121767175 6.637850 11389.026 1.0576664 -1.038206e-05 11.761 18.31099 42.42737
The error is
Error in solve.default(crossprod(WX, t(crossprod(WX, A1)))) :
Lapack routine dgesv: system is exactly singular: U[10,10] = 0
And sometimes after adjustments, e.g. data_test <- dplyr::filter(data_test, !is.na(rtfpna)), the error becomes:
Error in solve.default(A1) :
system is computationally singular: reciprocal condition number = 1.14054e-16
or
Error in solve.default(crossprod(WX, t(crossprod(WX, A2)))) :
system is computationally singular: reciprocal condition number = 1.69599e-24
I guess that 1) the pgmm function cannot handle an unbalanced data frame as well as the plm function does, especially when about 10% of the data is NA, and 2) the solve function has no fallback for inverting a matrix when an eigenvalue is too small. Also, according to my colleague who works mainly in Stata, Stata does not have this problem. So my questions are: how can I fix this, and is my code heading in the right direction?
Any suggestion would be helpful.
Judging from the data you provided, this could cause the error: your dataset is not balanced. It seems that your data for Burundi starts in 1980, while Burkina Faso starts in 1999.
I had the same error. In my dataset I had the years 1891 to 1899, but 1892 was missing; I had forgotten to clean the data so that it was balanced. When I removed 1891, the problem was solved.
Intuitively this makes sense: Sys-GMM uses higher-order lags to instrument the first lag. If years are missing at random, this obviously does not work consistently.
Of course, it could also be that some of your explanatory variables are highly correlated.
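Both suspicions (gaps in the years and collinear regressors) can be checked before calling pgmm. The following is only a minimal base R sketch: the toy panel data frame and the matrix X are made up for illustration, and with the real data you would use data_test and its own regressors instead.

```r
# Toy panel (hypothetical values): country B is missing the year 1892.
panel <- data.frame(
  country = c(rep("A", 4), rep("B", 3)),
  year    = c(1891:1894, 1891, 1893, 1894)
)

# For each country, list the years missing between its first and last observation.
gaps <- lapply(split(panel$year, panel$country), function(y) {
  setdiff(seq(min(y), max(y)), y)
})
gaps <- gaps[lengths(gaps) > 0]  # keep only countries that have gaps
print(gaps)

# The panel is balanced only if every country has the same number of rows.
obs_per_country <- table(panel$country)
is_balanced <- length(unique(as.vector(obs_per_country))) == 1
print(is_balanced)

# Collinearity check on a hypothetical design matrix: x2 is exactly 2 * x1,
# so the matrix rank is lower than its number of columns.
X <- cbind(intercept = 1, x1 = 1:4, x2 = 2 * (1:4))
rank_deficient <- qr(X)$rank < ncol(X)
print(rank_deficient)
```

If gaps is non-empty or rank_deficient is TRUE on the real data, a singular matrix inside the estimator would not be surprising.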
I have just loaded the built-in R data set 'emissions'.
I would like to remove the first row, 'United States', from the data set.
Apparently I can do it like this:
data2 <- data[-1,]
But what if I know the name of the row but not its position in the data set?
How can I remove it by referring only to its name, knowing that this row is named 'United States'?
Here is how data set looks like:
GDP perCapita CO2
UnitedStates 8083000 29647 6750
Japan 3080000 24409 1320
Germany 1740000 21197 1740
France 1320000 22381 550
UnitedKingdom 1242000 21010 675
Italy 1240000 21856 540
Russia 692000 4727 2000
Canada 658000 21221 700
Spain 642400 16401 370
Australia 394000 20976 480
Netherlands 343900 21755 240
Poland 280700 7270 400
Belgium 236300 23208 145
Sweden 176200 19773 75
I have only tried referring to rows by position. It works fine, but I guess that in bigger data sets I will not scroll through the rows and count them...
You could filter your dataframe by row.names using the following code:
data2[!(row.names(data2) %in% "UnitedStates"),]
#> GDP perCapita CO2
#> Japan 3080000 24409 1320
#> Germany 1740000 21197 1740
#> France 1320000 22381 550
#> UnitedKingdom 1242000 21010 675
#> Italy 1240000 21856 540
#> Russia 692000 4727 2000
#> Canada 658000 21221 700
#> Spain 642400 16401 370
#> Australia 394000 20976 480
#> Netherlands 343900 21755 240
#> Poland 280700 7270 400
#> Belgium 236300 23208 145
#> Sweden 176200 19773 75
Created on 2022-12-26 with reprex v2.0.2
Make sure you spelled the row name right.
Data:
data2 <- read.table(text = ' GDP perCapita CO2
UnitedStates 8083000 29647 6750
Japan 3080000 24409 1320
Germany 1740000 21197 1740
France 1320000 22381 550
UnitedKingdom 1242000 21010 675
Italy 1240000 21856 540
Russia 692000 4727 2000
Canada 658000 21221 700
Spain 642400 16401 370
Australia 394000 20976 480
Netherlands 343900 21755 240
Poland 280700 7270 400
Belgium 236300 23208 145
Sweden 176200 19773 75', header = TRUE)
Yet another approach (the pipe needs magrittr or dplyr to be loaded):
library(magrittr)
setdiff(rownames(data2),
        c('UnitedStates', 'SkipThis', 'OmitThatToo')
) %>%
  data2[., ]
Using which:
mtcars[which(rownames(mtcars)!='Mazda RX4'),]
As has been said before (use the row name exactly as it is spelled in your data):
df[row.names(df) != "UnitedStates",]
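The row-name approaches above can be tried on a small stand-in for the emissions data. This is only a sketch: emis is a hypothetical three-row data frame built from the first rows shown in the question, and drop = FALSE is added so the result stays a data frame even if only one column is present.

```r
# Small stand-in for the emissions data, with countries as row names.
emis <- data.frame(
  GDP = c(8083000, 3080000, 1740000),
  CO2 = c(6750, 1320, 1740),
  row.names = c("UnitedStates", "Japan", "Germany")
)

# Drop rows by name; the vector on the right can hold several names.
to_remove <- c("UnitedStates")
kept <- emis[!(rownames(emis) %in% to_remove), , drop = FALSE]
print(kept)
```

Using %in% rather than != makes it easy to remove several named rows at once, and a name that does not occur in the data is simply ignored.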
This should be simple, but I just can't get it to work.
I have a dataframe all_emissions_state_total that looks something like this:
tribe state scc pollutant emissions unit category eis year fraction
NA WY 707 Methane 546 TON onroad NA 2011 NA
NA WY 707 Methane 38 TON onroad NA 2011 NA
NA WY 3405 Methane 2937 TON onroad NA 2011 NA
NA MT 707 Methane 665 TON onroad NA 2011 NA
NA WY 390 CO2 740 TON onroad NA 2011 NA
NA MT 390 CO2 12 TON onroad NA 2011 NA
NA WY 3405 Methane 329 TON onroad NA 2011 NA
GHYU WY 390 CO2 44 TON point NA 2011 NA
BERS WY 390 CO2 64445 TON point NA 2011 596
SDSH KS 707 Methane 123 TON point NA 2011 3890
SDSH MT 707 Methane 58 TON point NA 2011 112
And I want it to look like this:
state scc pollutant emissions unit year
WY 707 Methane 584 TON 2011
MT 707 Methane 723 TON 2011
WY 3405 Methane 3266 TON 2011
WY 390 CO2 65229 TON 2011
MT 390 CO2 12 TON 2011
KS 707 Methane 123 TON 2011
In the original dataframe all_emissions_state_total, tribe, state, scc, pollutant, emissions, category, eis, and fraction vary. unit is always TON, and year is always 2011.
I want the rows to be grouped by state, scc, and pollutant, and the emissions column to be the sum over the rows in each group. tribe, category, eis, and fraction do not matter and can be dropped, but unit and year need to stay.
This is what I thought would work:
all_emissions_state <- all_emissions_state_total %>%
group_by( state, scc, pollutant ) %>%
summarise( emissions = sum( emissions ) )
but my output for this is a 1x1 dataframe all_emissions_state that has column emissions and 1 value that is the sum of all emissions from the dataframe.
One option in base R:
keys <- with(df, paste0(state, scc, pollutant))
New_df <- do.call(rbind, lapply(split(df, keys), function(x)
  x[1, c("state", "scc", "pollutant", "emissions", "unit", "year")]))
New_df$emissions <- sapply(split(df$emissions, keys), sum)
row.names(New_df) <- NULL
> New_df
state scc pollutant emissions unit year
1 KS 707 Methane 123 TON 2011
2 MT 390 CO2 12 TON 2011
3 MT 707 Methane 723 TON 2011
4 WY 3405 Methane 3266 TON 2011
5 WY 390 CO2 65229 TON 2011
6 WY 707 Methane 584 TON 2011
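This should work if unit and year are constant within each group. Try calling dplyr::summarise() explicitly; you may have a function conflict (for example, plyr loaded after dplyr masks summarise() with a version that ignores the grouping).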
all_emissions_state <- all_emissions_state_total %>%
dplyr::group_by(state, scc, pollutant) %>%
dplyr::summarise(
emissions = sum(emissions),
unit = dplyr::first(unit),
year = dplyr::first(year)
)
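For comparison, the same grouping can be sketched in base R with aggregate(), which avoids any masking issue between packages. The data frame emis below is retyped from the sample rows in the question, keeping only the columns that matter; putting unit and year on the grouping side of the formula is an assumption that relies on them being constant, as the question states.

```r
# Sample rows from the question, keeping only the relevant columns.
emis <- data.frame(
  state     = c("WY", "WY", "WY", "MT", "WY", "MT", "WY", "WY", "WY", "KS", "MT"),
  scc       = c(707, 707, 3405, 707, 390, 390, 3405, 390, 390, 707, 707),
  pollutant = c("Methane", "Methane", "Methane", "Methane", "CO2", "CO2",
                "Methane", "CO2", "CO2", "Methane", "Methane"),
  emissions = c(546, 38, 2937, 665, 740, 12, 329, 44, 64445, 123, 58),
  unit      = "TON",
  year      = 2011
)

# Sum emissions within each state/scc/pollutant group; unit and year are
# constant, so adding them as grouping variables just carries them along.
totals <- aggregate(emissions ~ state + scc + pollutant + unit + year,
                    data = emis, FUN = sum)
print(totals)
```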
I have an old piece of software that uses some kind of database which saves its data in TXB/TZB files. The data in those files looks like this:
1911 ¸£
1913 ¼£
1916 ְ£
1921 ִ£
1922 ָ£
1923 ּ£
1924 װ£
1925 ה£
1926 ט£
1929 ל£
1930 פ£
1931 £
1932 ₪
1933 ₪
1934 ,₪
1935 <₪
1936 h₪
1937 x₪
1938 €₪
1939 ”₪
Does anyone know this extension? How can I decode the data? The file displays the data partially (the years in the example above), but around that data there are some unrecognized characters.
I'm trying to analyse time series data of inflation rates from 1960 to 2015. The dataset is a yearly time series over 56 years with one real value per year, as follows:
Year Inflation percentage
1960 1.783264746
1961 1.752021563
1962 3.57615894
1963 2.941176471
1964 13.35403727
1965 9.479452055
1966 10.81081081
1967 13.0532972
1968 2.996404315
1969 0.574712644
1970 5.095238095
1971 3.081105573
1972 6.461538462
1973 16.92815855
1974 28.60169492
1975 5.738605162
1976 -7.63438068
1977 8.321619342
1978 2.517518817
1979 6.253164557
1980 11.3652609
1981 13.11510484
1982 7.887270664
1983 11.86886396
1984 8.32157969
1985 5.555555556
1986 8.730811404
1987 8.798689021
1988 9.384775808
1989 3.26256011
1990 8.971233545
1991 13.87024609
1992 11.78781925
1993 6.362038664
1994 10.21150033
1995 10.22488756
1996 8.977149075
1997 7.16425362
1998 13.2308409
1999 4.669821024
2000 4.009433962
2001 3.684807256
2002 4.392199745
2003 3.805865922
2004 3.76723848
2005 4.246353323
2006 6.145522388
2007 6.369996746
2008 8.351816444
2009 10.87739112
2010 11.99229692
2011 8.857845297
2012 9.312445605
2013 10.90764331
2014 6.353194544
2015 5.872426595
'stock1' contains my data where the first column stands for Year, and the second for 'Inflation.percentage', as follows:
stock1<-read.csv("India-Inflation time series.csv", header=TRUE, stringsAsFactors=FALSE, as.is=TRUE)
The following is my code for creating the time series object:
stock <- ts(stock1$Inflation.percentage,start=(1960), end=(2015),frequency=1)
Following this, I am trying to decompose the time series object 'stock' using the following line of code:
decom_add <- (decompose(stock, type ="additive"))
Here I get an error:
Error in decompose(stock, type = "additive") : time series has no or less than 2 periods
Why is this so? I initially thought it has something to do with frequency, but since the data is annual, the frequency has to be 1 right? If it is 1, then aren't there definitely more than 2 periods in the data?
Why isn't decompose() working? What am I doing wrong?
Thanks a lot in advance!
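decompose() needs a frequency greater than 1 (and at least two full periods of data), so with frequency = 1 it stops with this error. You could try frequency = 2, but that changes your model by imposing a seasonal cycle that annual data does not have. The better way, in my view, is to load data that also contains a month column, so that the frequency is 12.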
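The behaviour can be reproduced with a short sketch. The annual values are the first eight inflation figures from the question, rounded; the monthly series is simulated purely for illustration and is not real inflation data.

```r
# Annual series (frequency = 1): decompose() has no seasonal period to work with.
annual <- ts(c(1.78, 1.75, 3.58, 2.94, 13.35, 9.48, 10.81, 13.05),
             start = 1960, frequency = 1)
res <- tryCatch(decompose(annual, type = "additive"),
                error = function(e) conditionMessage(e))
print(res)  # the error message, captured as a string

# Hypothetical monthly series (frequency = 12) covering three full years:
# a sine-shaped seasonal pattern plus random noise.
set.seed(1)
monthly <- ts(rnorm(36) + rep(sin(2 * pi * (1:12) / 12), 3),
              start = c(1960, 1), frequency = 12)
dec <- decompose(monthly, type = "additive")
print(names(dec))
```

With purely annual data there is no seasonal component to extract, so a smoother for the trend may be closer to what is needed than decompose().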
I have a dataset of some 39k rows of data, an excerpt is below:
'Country', 'Group', 'Item', 'Year' are categorical
'Production' and 'Waste' are numerical
'LF' is also numerical, but is the result of 'Waste'/'Production'
Region Country Group Item Year Production Waste LF
Europe Bulgaria Cereals Wheat 1961 2040 274 0.134313725
Europe Bulgaria Cereals Wheat 1962 2090 262 0.125358852
Europe Bulgaria Cereals Wheat 1963 1894 277 0.14625132
Europe Bulgaria Cereals Wheat 1964 2121 286 0.134842056
Europe Bulgaria Cereals Wheat 1965 2923 341 0.116660965
Europe Bulgaria Cereals Wheat 1966 3193 385 0.120576261
Europe Bulgaria Cereals Barley 1961 612 15 0.024509804
Europe Bulgaria Cereals Barley 1962 599 16 0.026711185
Europe Bulgaria Cereals Barley 1963 618 16 0.025889968
Europe Bulgaria Cereals Barley 1964 764 21 0.027486911
Europe Bulgaria Cereals Barley 1965 876 22 0.025114155
Europe Bulgaria Cereals Barley 1966 1064 24 0.022556391
I have used the following code to generate 991 different means by Country and Item:
df2 <- aggregate(LF ~ Country + Item, data=df1, FUN='mean')
The results of this function look ok.
I would like to test whether the respective means of LF in df2 are different to the underlying annual observations in df1 for each Country-Item combination (ie. if FALSE, then LF is really just a static ratio, if TRUE then 'Waste' is independent from 'Production').
How might this best be done? There seem to be 991 tests to conduct for this dataset alone and I don't know how to mix the apply and t.test functions in this manner.
Thanks!
t.test requires two groups to compare on a numeric/scale dependent output variable. Here, it seems to me that for each combination of country and item you want to compare all different year averages/means. In other words, you are trying to investigate if year is influencing the LF averages, for each combination of country and item.
The easiest way to do this is to create a linear model (LF ~ Year) for each combination of country and item and interpret the coefficient and p value of the variable year.
library(dplyr)
library(broom)
set.seed(115)
# example dataset
dt = data.frame(Country = rep("country1",12),
Item = c(rep("item1",6), rep("item2",6)),
Year = rep(1961:1966,2),
LF = runif(12,0,1))
# general means by country and item
dt %>% group_by(Country,Item) %>% summarise(Mean_LF = mean(LF))
# each years means by country and item
dt %>% group_by(Country,Item,Year) %>% summarise(Mean_LF = mean(LF))
# does year influence the means for each country and item?
dt %>% group_by(Country,Item) %>% do(tidy(lm(LF~Year, data=.)))
Hope this helps. Let me know if I'm missing something and I'll update my code.
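The same per-group regression can also be sketched in base R, which makes it easy to collect one slope and one p-value per Country-Item combination. The example data mirrors the simulated dt above; the Holm adjustment at the end is an added suggestion rather than part of the original answer, on the assumption that running about 991 tests calls for some multiple-testing correction.

```r
set.seed(115)
# Simulated example data, same shape as in the answer above.
dt <- data.frame(
  Country = rep("country1", 12),
  Item    = rep(c("item1", "item2"), each = 6),
  Year    = rep(1961:1966, 2),
  LF      = runif(12)
)

# One data frame per Country-Item combination (empty combinations dropped).
groups <- split(dt, list(dt$Country, dt$Item), drop = TRUE)

# Fit LF ~ Year in each group and keep the slope on Year and its p-value.
year_tests <- do.call(rbind, lapply(names(groups), function(g) {
  coefs <- summary(lm(LF ~ Year, data = groups[[g]]))$coefficients
  data.frame(group   = g,
             slope   = coefs["Year", "Estimate"],
             p_value = coefs["Year", "Pr(>|t|)"])
}))

# Adjust for the number of tests being run (Holm's method).
year_tests$p_adj <- p.adjust(year_tests$p_value, method = "holm")
print(year_tests)
```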