System exactly singular with pgmm (package plm) - r

I am trying to run a pgmm regression (the Arellano-Bond estimator) following the online example with the EmplUK dataset.
My dataset is unbalanced, with some missing values (which I also tried removing, without any difference). This is a paste of the R data frame:
row.names ID Year p I
1 23 1 1992 NA NA
2 22 1 1993 17.01 NA
3 21 1 1994 15.86 NA
4 20 1 1995 17.02 7.512347
5 19 1 1996 20.64 7.685104
6 18 1 1997 19.11 12.730282
7 17 1 1998 12.76 12.633871
8 16 1 1999 17.90 7.416381
9 15 1 2000 28.66 6.396114
10 14 1 2001 24.46 9.213729
11 13 1 2002 24.99 20.117159
12 12 1 2003 28.85 11.117816
13 11 1 2004 38.26 11.242638
14 10 1 2005 54.57 13.015168
15 9 1 2006 65.16 18.507212
16 8 1 2007 72.44 18.875281
17 7 1 2008 96.94 24.459170
18 6 1 2009 61.74 21.332035
19 5 1 2010 79.61 17.119038
20 4 1 2011 111.26 16.941914
21 3 1 2012 111.63 19.964875
22 2 1 2013 108.56 28.863894
23 1 1 2014 99.03 15.182615
24 45 2 1993 17.01 NA
25 44 2 1994 15.86 NA
26 43 2 1995 17.02 NA
27 42 2 1996 20.64 NA
28 41 2 1997 19.11 NA
29 40 2 1998 12.76 NA
30 39 2 1999 17.90 11.428262
31 38 2 2000 28.66 20.232613
32 37 2 2001 24.46 25.811754
33 36 2 2002 24.99 18.959958
34 35 2 2003 28.85 20.767074
35 34 2 2004 38.26 29.260406
36 33 2 2005 54.57 25.837434
37 32 2 2006 65.16 32.675618
38 31 2 2007 72.44 48.415190
39 30 2 2008 96.94 42.444435
40 29 2 2009 61.74 40.047462
41 28 2 2010 79.61 49.090816
42 27 2 2011 111.26 53.828050
43 26 2 2012 111.63 61.684020
44 25 2 2013 108.56 68.394140
45 24 2 2014 99.03 55.738584
46 76 3 1984 NA NA
47 75 3 1985 NA NA
48 74 3 1986 NA NA
49 73 3 1987 18.53 NA
50 72 3 1988 14.91 NA
51 71 3 1989 18.23 NA
52 70 3 1990 23.76 17.046268
53 69 3 1991 20.04 30.191128
54 68 3 1992 19.32 30.414108
55 67 3 1993 17.01 27.916000
56 66 3 1994 15.86 26.437651
57 65 3 1995 17.02 25.895513
58 64 3 1996 20.64 26.791996
59 63 3 1997 19.11 30.074375
60 62 3 1998 12.76 42.636103
61 61 3 1999 17.90 46.862510
62 60 3 2000 28.66 30.154079
63 59 3 2001 24.46 30.297644
64 58 3 2002 24.99 34.851205
65 57 3 2003 28.85 38.854943
66 56 3 2004 38.26 37.542447
67 55 3 2005 54.57 38.456399
68 54 3 2006 65.16 43.465535
69 53 3 2007 72.44 41.749414
70 52 3 2008 96.94 48.371262
71 51 3 2009 61.74 54.914470
72 50 3 2010 79.61 65.444964
73 49 3 2011 111.26 76.888119
74 48 3 2012 111.63 81.833602
75 47 3 2013 108.56 83.800483
76 46 3 2014 99.03 79.713947
My code is the following:
data <- plm.data(Autoregression, index = c("ID", "Year"))
Panel <- subset(data, !is.na(I))
Are <- pgmm(I ~ p + lag(I, 0:1) | lag(I, 2:99),
            data = Panel, effect = "twoways", model = "onestep")
I have also tried many other versions, including every possible lag length, shorter or longer. I suspect the problem is related to the lag function inside pgmm, which for some reason does not create lags and simply pastes the same variable again and again, obviously making the matrix singular. I have also tried creating proper lags in Excel, importing the text file, and using the Excel-lagged variables instead of the lag function. Unfortunately, I am not sure about the pgmm syntax, and again it didn't work.
The error is the following :
Error in solve.default(crossprod(WX, t(crossprod(WX, A1)))) :
Lapack routine dgesv: system is exactly singular: U[3,3] = 0
In addition: Warning message:
In pgmm(I ~ lag(I, 1) + p | lag(I, 2:10), Panel, effect = "twoways", :
the first-step matrix is singular, a general inverse is used
Can you please help me?
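For comparison, here is a sketch modelled on the EmplUK example from the plm documentation (the variable names and lag choices come from that example, not from the data set above). On a short, unbalanced panel, restricting the instrument lags, e.g. 2:4 instead of 2:99, often avoids an exactly singular instrument matrix:

```r
library(plm)
data("EmplUK", package = "plm")

# One-step Arellano-Bond with a deliberately short instrument window;
# lag(., 2:99) on a short panel creates many collinear instrument
# columns, which is a common cause of the dgesv singularity error.
ab <- pgmm(log(emp) ~ lag(log(emp), 1) + log(wage) | lag(log(emp), 2:4),
           data = EmplUK, effect = "twoways", model = "onestep")
summary(ab)
```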


Remove connecting line between first and last record of ggplot

I'm struggling with a problem which should be quite easy to solve. However, I wasn't able to fix it.
Here is my data:
cluster variable value
1 1 1988 16.266506
2 2 1988 1.651491
3 3 1988 1.414906
4 4 1988 3.524106
5 5 1988 1.255048
6 6 1988 5.247590
7 1 1989 58.542374
8 2 1989 55.348154
9 3 1989 1.281950
10 4 1989 79.946518
11 5 1989 295.739329
12 6 1989 111.941471
13 1 1990 35.831376
14 2 1990 163.154334
15 3 1990 6.267801
16 4 1990 36.135324
17 5 1990 32.184136
18 6 1990 65.808952
19 1 1991 104.319331
20 2 1991 271.297555
21 3 1991 1.717811
22 4 1991 6.162088
23 5 1991 223.614068
24 6 1991 144.494680
25 1 1992 24.920946
26 2 1992 97.514737
27 3 1992 2.338454
28 4 1992 8.236198
29 5 1992 119.907743
30 6 1992 59.466458
31 1 1993 35.915740
32 2 1993 11.444630
33 3 1993 1.754765
34 4 1993 5.023139
35 5 1993 2.464351
36 6 1993 12.793560
37 1 1994 10.192094
38 2 1994 17.123972
39 3 1994 1.148919
40 4 1994 44.892803
41 5 1994 46.797657
42 6 1994 27.554103
43 1 1995 47.949046
44 2 1995 58.979519
45 3 1995 2.156014
46 4 1995 2.386940
47 5 1995 16.583813
48 6 1995 30.259352
49 1 1996 50.782284
50 2 1996 23.318913
51 3 1996 245.206559
52 4 1996 92.726616
53 5 1996 23.872951
54 6 1996 60.165873
55 1 1997 16.047945
56 2 1997 96.002264
57 3 1997 154.553556
58 4 1997 4.683534
59 5 1997 4.230310
60 6 1997 40.414528
61 1 1998 9.674630
62 2 1998 276.691314
63 3 1998 1.539398
64 4 1998 47.155072
65 5 1998 167.535475
66 6 1998 121.591620
67 1 1999 1.996771
68 2 1999 19.291985
69 3 1999 9.627251
70 4 1999 2.111284
71 5 1999 5.077251
72 6 1999 7.651787
73 1 2000 1.533511
74 2 2000 4.749388
75 3 2000 77.764969
76 4 2000 8.822520
77 5 2000 1.398238
78 6 2000 9.021997
79 1 2001 4.147357
80 2 2001 4.655750
81 3 2001 1.192090
82 4 2001 4.542792
83 5 2001 1.401844
84 6 2001 3.560301
85 1 2002 4.358921
86 2 2002 5.588824
87 3 2002 9.286483
88 4 2002 3.383068
89 5 2002 1.630102
90 6 2002 4.163573
91 1 2003 114.590967
92 2 2003 416.672354
93 3 2003 2.145251
94 4 2003 7.990705
95 5 2003 650.406070
96 6 2003 280.008212
97 1 2004 31.423147
98 2 2004 25.393737
99 3 2004 2.134556
100 4 2004 38.647510
101 5 2004 58.314884
102 6 2004 35.628008
103 1 2005 120.493931
104 2 2005 76.455433
105 3 2005 206.615430
106 4 2005 59.008307
107 5 2005 27.198275
108 6 2005 79.833828
109 1 2006 235.611236
110 2 2006 84.379053
111 3 2006 86.638692
112 4 2006 201.766197
113 5 2006 3.348146
114 6 2006 127.260565
115 1 2007 31.479617
116 2 2007 8.959114
117 3 2007 1.191066
118 4 2007 24.147700
119 5 2007 17.038608
120 6 2007 18.790166
121 1 2008 54.826089
122 2 2008 2.163957
123 3 2008 1.479409
124 4 2008 3.141238
125 5 2008 1.304543
126 6 2008 13.931403
127 1 2009 63.018339
128 2 2009 101.637635
129 3 2009 5.172660
130 4 2009 58.412126
131 5 2009 236.547752
132 6 2009 106.876749
133 1 2010 11.843006
134 2 2010 4.458760
135 3 2010 12.711000
136 4 2010 57.260891
137 5 2010 38.884449
138 6 2010 26.512278
139 1 2011 134.628759
140 2 2011 216.482243
141 3 2011 5.593466
142 4 2011 3.980969
143 5 2011 27.394367
144 6 2011 93.071463
145 1 2012 3.696990
146 2 2012 17.026470
147 3 2012 21.556694
148 4 2012 1.682511
149 5 2012 13.405246
150 6 2012 9.999758
151 1 2013 1.642975
152 2 2013 44.140334
153 3 2013 42.019019
154 4 2013 2.643122
155 5 2013 1.234858
156 6 2013 15.342229
157 1 2014 2.200339
158 2 2014 3.041888
159 3 2014 42.076690
160 4 2014 1.359859
161 5 2014 1.271090
162 6 2014 4.638317
163 1 2015 95.083916
164 2 2015 204.618897
165 3 2015 1.191329
166 4 2015 18.865633
167 5 2015 228.506156
168 6 2015 129.305328
169 1 2016 81.739401
170 2 2016 40.525547
171 3 2016 192.637080
172 4 2016 9.985224
173 5 2016 61.758033
174 6 2016 57.468834
175 1 2017 201.880418
176 2 2017 98.496414
177 3 2017 230.865579
178 4 2017 25.877045
179 5 2017 93.934230
180 6 2017 112.588227
I want to plot this as a line plot in ggplot so that every cluster gets its own line. This should be straightforward and I have done it many times, but for some reason, this time ggplot connects the first and the last data point with an additional line. How can I remove that line?
ggplot(median_min_7, aes(x = variable, y = value)) +
geom_line(mapping = aes(color = cluster), group = 1)
Since you provided no reproducible data, it is hard to check for errors, but this works for me. The stray line comes from group = 1, which tells geom_line to treat all rows as one series; grouping by cluster draws one line per cluster instead.
library(data.table)
library(ggplot2)
# Code ------------------
ggplot(DT, aes(x = variable, y = value, group = cluster, colour = as.factor(cluster))) +
geom_line()
# Sample data --------------
DT <- fread("row cluster variable value
1 1 1988 16.266506
2 2 1988 1.651491
3 3 1988 1.414906
4 4 1988 3.524106
5 5 1988 1.255048
6 6 1988 5.247590
7 1 1989 58.542374
8 2 1989 55.348154
9 3 1989 1.281950
10 4 1989 79.946518
11 5 1989 295.739329
12 6 1989 111.941471
13 1 1990 35.831376
14 2 1990 163.154334
15 3 1990 6.267801
16 4 1990 36.135324
17 5 1990 32.184136
18 6 1990 65.808952
19 1 1991 104.319331
20 2 1991 271.297555
21 3 1991 1.717811
22 4 1991 6.162088
23 5 1991 223.614068
24 6 1991 144.494680
25 1 1992 24.920946
26 2 1992 97.514737
27 3 1992 2.338454
28 4 1992 8.236198
29 5 1992 119.907743
30 6 1992 59.466458
31 1 1993 35.915740
32 2 1993 11.444630
33 3 1993 1.754765
34 4 1993 5.023139
35 5 1993 2.464351
36 6 1993 12.793560
37 1 1994 10.192094
38 2 1994 17.123972
39 3 1994 1.148919
40 4 1994 44.892803
41 5 1994 46.797657
42 6 1994 27.554103
43 1 1995 47.949046
44 2 1995 58.979519
45 3 1995 2.156014
46 4 1995 2.386940
47 5 1995 16.583813
48 6 1995 30.259352
49 1 1996 50.782284
50 2 1996 23.318913
51 3 1996 245.206559
52 4 1996 92.726616
53 5 1996 23.872951
54 6 1996 60.165873
55 1 1997 16.047945
56 2 1997 96.002264
57 3 1997 154.553556
58 4 1997 4.683534
59 5 1997 4.230310
60 6 1997 40.414528
61 1 1998 9.674630
62 2 1998 276.691314
63 3 1998 1.539398
64 4 1998 47.155072
65 5 1998 167.535475
66 6 1998 121.591620
67 1 1999 1.996771
68 2 1999 19.291985
69 3 1999 9.627251
70 4 1999 2.111284
71 5 1999 5.077251
72 6 1999 7.651787
73 1 2000 1.533511
74 2 2000 4.749388
75 3 2000 77.764969
76 4 2000 8.822520
77 5 2000 1.398238
78 6 2000 9.021997
79 1 2001 4.147357
80 2 2001 4.655750
81 3 2001 1.192090
82 4 2001 4.542792
83 5 2001 1.401844
84 6 2001 3.560301
85 1 2002 4.358921
86 2 2002 5.588824
87 3 2002 9.286483
88 4 2002 3.383068
89 5 2002 1.630102
90 6 2002 4.163573
91 1 2003 114.590967
92 2 2003 416.672354
93 3 2003 2.145251
94 4 2003 7.990705
95 5 2003 650.406070
96 6 2003 280.008212
97 1 2004 31.423147
98 2 2004 25.393737
99 3 2004 2.134556
100 4 2004 38.647510
101 5 2004 58.314884
102 6 2004 35.628008
103 1 2005 120.493931
104 2 2005 76.455433
105 3 2005 206.615430
106 4 2005 59.008307
107 5 2005 27.198275
108 6 2005 79.833828
109 1 2006 235.611236
110 2 2006 84.379053
111 3 2006 86.638692
112 4 2006 201.766197
113 5 2006 3.348146
114 6 2006 127.260565
115 1 2007 31.479617
116 2 2007 8.959114
117 3 2007 1.191066
118 4 2007 24.147700
119 5 2007 17.038608
120 6 2007 18.790166
121 1 2008 54.826089
122 2 2008 2.163957
123 3 2008 1.479409
124 4 2008 3.141238
125 5 2008 1.304543
126 6 2008 13.931403
127 1 2009 63.018339
128 2 2009 101.637635
129 3 2009 5.172660
130 4 2009 58.412126
131 5 2009 236.547752
132 6 2009 106.876749
133 1 2010 11.843006
134 2 2010 4.458760
135 3 2010 12.711000
136 4 2010 57.260891
137 5 2010 38.884449
138 6 2010 26.512278
139 1 2011 134.628759
140 2 2011 216.482243
141 3 2011 5.593466
142 4 2011 3.980969
143 5 2011 27.394367
144 6 2011 93.071463
145 1 2012 3.696990
146 2 2012 17.026470
147 3 2012 21.556694
148 4 2012 1.682511
149 5 2012 13.405246
150 6 2012 9.999758
151 1 2013 1.642975
152 2 2013 44.140334
153 3 2013 42.019019
154 4 2013 2.643122
155 5 2013 1.234858
156 6 2013 15.342229
157 1 2014 2.200339
158 2 2014 3.041888
159 3 2014 42.076690
160 4 2014 1.359859
161 5 2014 1.271090
162 6 2014 4.638317
163 1 2015 95.083916
164 2 2015 204.618897
165 3 2015 1.191329
166 4 2015 18.865633
167 5 2015 228.506156
168 6 2015 129.305328
169 1 2016 81.739401
170 2 2016 40.525547
171 3 2016 192.637080
172 4 2016 9.985224
173 5 2016 61.758033
174 6 2016 57.468834
175 1 2017 201.880418
176 2 2017 98.496414
177 3 2017 230.865579
178 4 2017 25.877045
179 5 2017 93.934230
180 6 2017 112.588227")
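The extra segment comes from group = 1, which makes geom_line draw all rows as a single series in data order. A minimal toy sketch (hypothetical values) contrasting the two groupings:

```r
library(ggplot2)
toy <- data.frame(cluster  = rep(1:2, each = 3),
                  variable = rep(1988:1990, 2),
                  value    = c(1, 3, 2, 5, 4, 6))

# group = 1: all six points become one series, so a single line
# zig-zags across both clusters (the unwanted connecting segment).
ggplot(toy, aes(variable, value)) +
  geom_line(aes(color = as.factor(cluster)), group = 1)

# group = cluster: one line per cluster, no stray segment.
ggplot(toy, aes(variable, value,
                group = cluster, colour = as.factor(cluster))) +
  geom_line()
```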

Adding a vector to a data frame each time it goes through a for loop

My loop finds the top 20 counties with the highest abundances for 10 different years for 35 species of birds. I created a for loop that gives me a data frame with the county and abundance (two columns, named "y") for each year, and I want to put that into a larger data frame across the 10 years, so each species in the list has 200 rows of data.
I am struggling to add the data frame of one year (y) to the next as the loop runs. I have tried rbind and append, and it keeps saying I have the wrong replacement rows, or it repeats the same 20 rows throughout the data frame.
for (i in 1:35) {
  bbscounties[[i]] = as.data.frame(matrix(nrow = 200, ncol = 4))
  colnames(bbscounties[[i]]) = c("Species", "Year", "CountyGEOID", "Abundance")
}
for (i in 1:35) {
  for (j in c(1992:1996, 2013:2017)) {
    sub = subset(species[[i]], Year == j)
    for (k in 1:nrow(sub)) {
      pt1 = st_sfc(st_point(c(sub[k, ]$Longitude, sub[k, ]$Latitude)))
      cnty1 = extract.county(pt1, counties)
      sub$County[k] = as.numeric(levels(cnty1$GEOID))[cnty1$GEOID]
      y = rbind(y, y)
      x = aggregate(sub$SpeciesTotal, by = list(County = sub$County), FUN = sum)
      y = arrange(x, desc(x))[1:20, ]
      bbscounties[[i]][, 1] = species[[i]]$AOU[1]
      bbscounties[[i]][, 2] = sub$Year[1]
      bbscounties[[i]][, 3] = y[, 1]
      bbscounties[[i]][, 4] = y[, 2]
    }
  }
}
head(species[[1]])
X CountryNum StateNum Route RouteDataID RPID Year AOU Count10 Count20 Count30 Count40 Count50 StopTotal SpeciesTotal Active Starttime
120 92256 840 14 103 6345387 101 2012 Aimophila_ruficeps 7 0 0 0 0 5 7 1 510
155 92291 840 14 103 6357807 101 2013 Aimophila_ruficeps 2 4 0 0 0 3 6 1 510
157 92293 840 14 103 6218217 101 1994 Aimophila_ruficeps 6 1 0 0 0 6 7 1 510
162 92298 840 14 103 6226712 101 1996 Aimophila_ruficeps 12 3 0 0 0 10 15 1 510
182 92318 840 14 103 6215329 101 1993 Aimophila_ruficeps 1 1 0 0 0 2 2 1 510
184 92320 840 14 103 6191824 101 1981 Aimophila_ruficeps 4 0 0 0 0 3 4 1 510
Latitude Longitude Stratum BCR LT RT RTD County
120 34.56753 -118.5573 92 32 -  1 - Roadside route 1 - Random, 50 Stops 205
155 34.56753 -118.5573 92 32 -  1 - Roadside route 1 - Random, 50 Stops 205
157 34.56753 -118.5573 92 32 -  1 - Roadside route 1 - Random, 50 Stops 205
162 34.56753 -118.5573 92 32 -  1 - Roadside route 1 - Random, 50 Stops 205
182 34.56753 -118.5573 92 32 -  1 - Roadside route 1 - Random, 50 Stops 205
184 34.56753 -118.5573 92 32 -  1 - Roadside route 1 - Random, 50 Stops 205
nrow(species[[1]])
[1] 1992
extract.county = function(pt, counties) {
  pt = st_sfc(pt)
  st_crs(pt) = st_crs(counties)
  pt = st_sf(pt)
  st_join(pt, counties)
}
bbscounties[[1]][1:50,]
Species Year CountyGEOID Abundance
1 Aimophila_ruficeps 1992 48243 44
2 Aimophila_ruficeps 1992 4023 39
3 Aimophila_ruficeps 1992 4019 20
4 Aimophila_ruficeps 1992 4003 16
5 Aimophila_ruficeps 1992 35047 13
6 Aimophila_ruficeps 1992 48209 12
7 Aimophila_ruficeps 1992 48259 12
8 Aimophila_ruficeps 1992 4007 11
9 Aimophila_ruficeps 1992 48137 10
10 Aimophila_ruficeps 1992 6053 7
11 Aimophila_ruficeps 1992 6017 5
12 Aimophila_ruficeps 1992 48463 5
13 Aimophila_ruficeps 1992 6037 3
14 Aimophila_ruficeps 1992 48413 3
15 Aimophila_ruficeps 1992 6083 2
16 Aimophila_ruficeps 1992 35029 2
17 Aimophila_ruficeps 1992 40075 2
18 Aimophila_ruficeps 1992 48377 2
19 Aimophila_ruficeps 1992 4012 1
20 Aimophila_ruficeps 1992 4013 1
21 Aimophila_ruficeps 1992 48243 44
22 Aimophila_ruficeps 1992 4023 39
23 Aimophila_ruficeps 1992 4019 20
24 Aimophila_ruficeps 1992 4003 16
25 Aimophila_ruficeps 1992 35047 13
26 Aimophila_ruficeps 1992 48209 12
27 Aimophila_ruficeps 1992 48259 12
28 Aimophila_ruficeps 1992 4007 11
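One way to make the appending step work, sketched here with the county-extraction part replaced by a hypothetical placeholder table: collect one small data frame per (species, year) in a list and bind them once at the end, instead of overwriting the same 200-row frame on every pass.

```r
# Sketch: accumulate per-year results in a list, then rbind once.
results <- list()
for (i in 1:2) {                    # stand-in for the 35 species
  for (j in c(1992, 1993)) {        # stand-in for the 10 years
    # placeholder for the real top-20 county/abundance table "y"
    y <- data.frame(CountyGEOID = 1:20, Abundance = 20:1)
    results[[length(results) + 1]] <-
      data.frame(Species = i, Year = j, y)
  }
}
bbs <- do.call(rbind, results)
nrow(bbs)   # 2 species x 2 years x 20 rows = 80
```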

How to count the number of instances where the wind direction is < -3.54 for >= 5 consecutive days in a season

I have a data set of the U wind direction recorded every day from 1993 to 2016. I would like to know how many instances there are in a season (say autumn, March to May) where U is < -3.54 for >= 5 consecutive days. I have looked at previous questions and answers on Stack Overflow, but I haven't been able to find an example to guide me. Any help would be appreciated.
A section of my data set looks like this:
Year Month Day U
1 1993 1 1 2.2752712
2 1993 1 2 -2.3828683
3 1993 1 3 -6.5054070
4 1993 1 4 -6.5550585
5 1993 1 5 -0.8896707
6 1993 1 6 -2.2694185
7 1993 1 7 1.6020930
8 1993 1 8 4.4161047
9 1993 1 9 -3.4612790
10 1993 1 10 -4.1855815
11 1993 1 11 4.3345735
12 1993 1 12 6.7505038
13 1993 1 13 2.7704460
14 1993 1 14 -0.3126935
15 1993 1 15 -5.2111823
16 1993 1 16 0.1577910
17 1993 1 17 3.2431200
18 1993 1 18 4.1351938
19 1993 1 19 8.8824225
20 1993 1 20 11.5171123
21 1993 1 21 8.4929847
22 1993 1 22 -0.6879845
23 1993 1 23 9.1634883
24 1993 1 24 8.0907365
25 1993 1 25 -5.9970930
26 1993 1 26 -11.9065697
27 1993 1 27 -0.0509885
28 1993 1 28 -0.9271122
29 1993 1 29 -1.2506782
30 1993 1 30 2.8655622
31 1993 1 31 5.1648452
32 1993 2 1 -0.6710272
33 1993 2 2 -0.1745542
34 1993 2 3 7.1772285
35 1993 2 4 -1.2568218
36 1993 2 5 -1.4439727
37 1993 2 6 0.6784107
38 1993 2 7 8.6756010
39 1993 2 8 1.5709885
40 1993 2 9 -6.4978875
41 1993 2 10 0.8981590
42 1993 2 11 -5.4501548
43 1993 2 12 -2.0549033
44 1993 2 13 -0.9364535
45 1993 2 14 2.3316280
46 1993 2 15 8.4644767
47 1993 2 16 4.2322285
48 1993 2 17 -4.2141278
49 1993 2 18 -7.1285853
50 1993 2 19 -3.9616670
51 1993 2 20 3.0711045
52 1993 2 21 0.8550193
53 1993 2 22 2.7637208
54 1993 2 23 -4.0326550
55 1993 2 24 -6.9834690
56 1993 2 25 -7.1804845
57 1993 2 26 2.7410468
58 1993 2 27 0.9994572
59 1993 2 28 -2.1881782
60 1993 3 1 -1.6012982
61 1993 3 2 0.4499225
62 1993 3 3 -2.4872480
63 1993 3 4 -2.1658527
64 1993 3 5 -1.4132365
65 1993 3 6 2.2400198
66 1993 3 7 -3.1068022
67 1993 3 8 -0.5415117
68 1993 3 9 0.9616280
69 1993 3 10 -7.1419960
70 1993 3 11 1.2279457
71 1993 3 12 6.1011240
72 1993 3 13 4.9892440
73 1993 3 14 4.8197285
74 1993 3 15 1.6525583
75 1993 3 16 -9.0284302
76 1993 3 17 -3.3607170
77 1993 3 18 5.7897092
78 1993 3 19 -2.5350580
79 1993 3 20 -3.1431975
80 1993 3 21 6.2275968
81 1993 3 22 0.9624417
82 1993 3 23 -8.9311823
83 1993 3 24 -9.6640115
84 1993 3 25 -9.7974420
85 1993 3 26 -3.8447093
86 1993 3 27 1.6185270
87 1993 3 28 -4.5626552
88 1993 3 29 -7.6756202
89 1993 3 30 5.4181783
90 1993 3 31 5.9135658
91 1993 4 1 3.4654847
92 1993 4 2 -2.1095738
93 1993 4 3 -9.3131203
94 1993 4 4 -8.1391280
95 1993 4 5 -10.7533140
96 1993 4 6 6.1808530
97 1993 4 7 5.7693025
98 1993 4 8 0.3322870
99 1993 4 9 10.3273835
100 1993 4 10 5.7872480
101 1993 4 11 0.8317830
102 1993 4 12 -0.7549225
103 1993 4 13 11.9887015
104 1993 4 14 4.1117440
105 1993 4 15 1.2044572
106 1993 4 16 1.3899808
107 1993 4 17 11.2100388
108 1993 4 18 8.2815310
109 1993 4 19 -0.8285080
110 1993 4 20 -5.7935273
111 1993 4 21 -4.0424420
112 1993 4 22 -0.5786045
113 1993 4 23 0.3742055
114 1993 4 24 -0.4698642
115 1993 4 25 -0.3981780
116 1993 4 26 5.5060660
117 1993 4 27 5.0961628
118 1993 4 28 4.3308137
119 1993 4 29 7.8211433
120 1993 4 30 1.4068415
121 1993 5 1 -6.0343218
122 1993 5 2 2.5626165
123 1993 5 3 -0.2517055
124 1993 5 4 -0.3624998
125 1993 5 5 5.4518413
126 1993 5 6 8.0799417
127 1993 5 7 9.6727713
128 1993 5 8 6.9166862
129 1993 5 9 5.1044767
130 1993 5 10 -3.5812015
131 1993 5 11 -0.6386435
132 1993 5 12 3.8953680
133 1993 5 13 2.2846125
134 1993 5 14 6.8920930
135 1993 5 15 6.3412790
136 1993 5 16 9.9857557
137 1993 5 17 4.9041085
138 1993 5 18 1.2711628
139 1993 5 19 -0.8744572
140 1993 5 20 -1.7563565
141 1993 5 21 7.7133918
142 1993 5 22 1.8609305
143 1993 5 23 5.0106588
144 1993 5 24 2.2513178
145 1993 5 25 9.8685660
146 1993 5 26 17.1051357
147 1993 5 27 15.9958140
148 1993 5 28 11.9747288
149 1993 5 29 10.4338953
150 1993 5 30 9.8273450
151 1993 5 31 2.9315697
152 1993 6 1 -4.8080815
153 1993 6 2 7.4390697
154 1993 6 3 9.7631200
155 1993 6 4 3.0179265
156 1993 6 5 -0.9081978
157 1993 6 6 0.8990115
158 1993 6 7 -1.6712595
159 1993 6 8 -6.6958335
160 1993 6 9 3.0657173
161 1993 6 10 2.8695543
162 1993 6 11 14.8854070
163 1993 6 12 6.0319572
164 1993 6 13 -0.8188955
165 1993 6 14 -2.1511820
166 1993 6 15 2.8237210
167 1993 6 16 6.0374808
168 1993 6 17 5.7747092
169 1993 6 18 3.7086240
170 1993 6 19 11.2165893
171 1993 6 20 13.0581202
172 1993 6 21 10.7091860
173 1993 6 22 5.5876357
174 1993 6 23 7.3413180
175 1993 6 24 -3.0820543
176 1993 6 25 -0.4195735
177 1993 6 26 2.3836045
178 1993 6 27 -3.6750388
179 1993 6 28 10.1507362
180 1993 6 29 11.7455232
181 1993 6 30 4.6698065
You can do:
with(rle(Data$U < -3.54), sum(values==TRUE & lengths>=5))
If you want to explore only some months:
D <- subset(Data, Month %in% 3:5)
with(rle(D$U < -3.54), sum(values==TRUE & lengths>=5))
There is no such sequence.
To have some data in the result I changed the task a little bit:
R <- rle(D$U > -3.54)
Rdat <- with(R, data.frame(values, lengths))
Rdat$start <- 1 + cumsum(c(0, head(Rdat$lengths, -1)))
Ri <- subset(Rdat, values==TRUE & lengths>=5)
cbind(D[Ri$start,], Ri$lengths)
The result shows the starting days of the sequences.
#> cbind(D[Ri$start,], Ri$lengths)
# Year Month Day U Ri$lengths
#60 1993 3 1 -1.6012982 9
#70 1993 3 11 1.2279457 5
#76 1993 3 17 -3.3607170 6
#96 1993 4 6 6.1808530 14
#112 1993 4 22 -0.5786045 9
#122 1993 5 2 2.5626165 8
#131 1993 5 11 -0.6386435 21
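The rle logic is easiest to see on a tiny hypothetical vector (values invented for illustration, not taken from the data set):

```r
u <- c(-5, -6, -4, -7, -5, 2, -4, -4, 1)   # hypothetical daily U values
r <- rle(u < -3.54)
# r encodes the runs: TRUE x 5, FALSE x 1, TRUE x 2, FALSE x 1
sum(r$values & r$lengths >= 5)   # 1: one run of >= 5 consecutive days
```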
Here's a way with dplyr that also keeps the first day of each instance.
library(dplyr)
df %>%
  filter(Month %in% 3:5) %>%
  mutate(threshold = U > -3.54,
         group = cumsum(threshold != lag(threshold, default = FALSE))) %>%
  group_by(group) %>%
  mutate(n_days = n()) %>%
  summarise_all(first) %>%
  filter(threshold, n_days >= 5) %>%
  select(-group, -threshold) -> instances
instances
# A tibble: 7 x 5
Year Month Day U n_days
<int> <int> <int> <dbl> <int>
1 1993 3 1 -1.60 9
2 1993 3 11 1.23 5
3 1993 3 17 -3.36 6
4 1993 4 6 6.18 14
5 1993 4 22 -0.579 9
6 1993 5 2 2.56 8
7 1993 5 11 -0.639 21
nrow(instances)
[1] 7

Trying to add a legend in the upper left corner of the graph, but the code I've taken from other scripts isn't working [duplicate]

This question already has answers here:
Add legend to ggplot2 line plot
(4 answers)
Closed 3 years ago.
I want to add a legend for my two data sets in the upper left corner of my plot, but somehow the code that I've taken from other scripts isn't working. Does anybody have an idea why?
library("ggplot2")
library("reshape2")
data<-read.csv("trial.csv",header=TRUE,dec=".",sep=',',na.strings="NA")
#Example of data
Year Annual Cumulative
1 1960 1 1
2 1961 0 1
3 1962 0 1
4 1963 0 1
5 1964 0 1
6 1965 0 1
7 1966 0 1
8 1967 1 2
9 1968 0 2
10 1969 0 2
11 1970 0 2
12 1971 0 2
13 1972 1 3
14 1973 0 3
15 1974 1 4
16 1975 1 5
17 1976 0 5
18 1977 0 5
19 1978 0 5
20 1979 4 9
21 1980 2 11
22 1981 1 12
23 1982 1 13
24 1983 3 16
25 1984 1 17
26 1985 2 19
27 1986 1 20
28 1987 4 24
29 1988 3 27
30 1989 3 30
31 1990 3 33
32 1991 1 34
33 1992 4 38
34 1993 0 38
35 1994 4 42
36 1995 4 46
37 1996 3 49
38 1997 5 54
39 1998 2 56
40 1999 0 56
41 2000 6 62
42 2001 11 73
43 2002 8 81
44 2003 2 83
45 2004 5 88
46 2005 7 95
47 2006 13 108
48 2007 22 130
49 2008 13 143
50 2009 13 156
51 2010 17 173
52 2011 14 187
53 2012 24 211
54 2013 24 235
55 2014 18 253
56 2015 19 272
57 2016 17 289
58 2017 16 305
59 2018 24 329
60 2019 9 338
Plotting code:
p1 <- ggplot(data = data, aes(x = Year)) +
  geom_line(aes(y = Cumulative), linetype = "solid", color = "red", size = 1.1) +
  geom_point(aes(y = Cumulative), shape = 1, color = "red", size = 3, stroke = 1.5) +
  geom_line(aes(y = Annual), linetype = "solid", color = "darkorange", size = 1.1) +
  geom_point(aes(y = Annual), shape = 1, color = "darkorange", size = 3, stroke = 1.5) +
  scale_y_continuous(sec.axis = sec_axis(~ . * 1/10, name = "Annual\n"))
p1 <- p1 + labs(x = "\nYear", y = "Cumulative\n")
p1 + theme(axis.title.x = element_text(size = 18),
           axis.text.x = element_text(size = 14),
           axis.title.y = element_text(size = 18),
           axis.text.y = element_text(size = 14),
           axis.ticks = element_blank())
There are a number of ways to do this. One quick option is to melt the data by Year and simplify geom_point and geom_line to remove some repetition. A legend will be created for you, which you can customize and relocate via legend.position in your theme.
library("ggplot2")
library("reshape2")
data.melt <- melt(data, "Year")
ggplot(data.melt, aes(x = Year, y = value, color = variable)) +
  geom_point(shape = 1, size = 3, stroke = 1.5) +
  geom_line(linetype = "solid", size = 1.1) +
  scale_colour_manual(values = c("darkorange", "red")) +
  scale_y_continuous(sec.axis = sec_axis(~ . * 1/10, name = "Annual\n")) +
  labs(x = "\nYear", y = "Cumulative\n", color = "Legend Title") +
  theme(axis.title.x = element_text(size = 18),
        axis.text.x = element_text(size = 14),
        axis.title.y = element_text(size = 18),
        axis.text.y = element_text(size = 14),
        axis.ticks = element_blank(),
        legend.position = c(0.1, 0.9))

Facet Wrap in ggplot2 for my data and how to calculate P50?

I have the survival data (%) over time from my experiment. The data is as follows:
Survival <- read.table(text= "Species Var Year Time PerSurvival
1 1 2014 0 86
1 1 2014 1 74
1 1 2014 7 80
1 1 2014 14 69
1 1 2014 21 63
1 1 2014 28 52
1 1 2014 35 53
1 1 2014 42 47
1 1 2014 50 32
1 2 2015 0 99
1 2 2015 1 98
1 2 2015 7 95
1 2 2015 14 91
1 2 2015 21 60
1 2 2015 28 61
1 2 2015 35 48
1 2 2015 42 43
1 2 2015 50 10
1 3 2014 0 98
1 3 2014 1 97
1 3 2014 7 84
1 3 2014 14 82
1 3 2014 21 53
1 3 2014 28 52
1 3 2014 35 44
1 3 2014 42 29
1 3 2014 50 5
1 4 2014 0 92
1 4 2014 1 84
1 4 2014 7 78
1 4 2014 14 73
1 4 2014 21 57
1 4 2014 28 52
1 4 2014 35 46
1 4 2014 42 41
1 4 2014 50 13
2 5 2014 0 99
2 5 2014 1 97
2 5 2014 7 95
2 5 2014 14 86
2 5 2014 21 73
2 5 2014 28 76
2 5 2014 35 64
2 5 2014 42 29
2 5 2014 56 0
2 6 2015 0 94
2 6 2015 1 100
2 6 2015 7 90
2 6 2015 14 82
2 6 2015 21 76
2 6 2015 28 52
2 6 2015 35 50
2 6 2015 50 8
2 7 2014 0 98
2 7 2014 1 98
2 7 2014 7 96
2 7 2014 14 95
2 7 2014 21 82
2 7 2014 28 70
2 7 2014 35 81
2 7 2014 42 75
2 7 2014 50 9
2 8 2015 0 92
2 8 2015 1 94
2 8 2015 7 86
2 8 2015 14 84
2 8 2015 21 86
2 8 2015 28 78
2 8 2015 35 68
2 8 2015 50 53", header = TRUE)
I used the following code to visualise the data and fit a logistic function:
View(Survival)
attach(Survival)
library(ggplot2)
ggplot(data = Survival) +
  geom_point(mapping = aes(x = Time, y = PerSurvival)) +
  xlim(0, 60) + ylim(0, 100) +
  geom_smooth(mapping = aes(x = Time, y = PerSurvival), method = "nls",
              formula = y ~ 100 / (1 + exp(-a * (x - b))),
              se = FALSE, fill = NA,
              method.args = list(start = c(a = -0.05, b = 50))) +
  facet_wrap(~ Year + Var, nrow = 2)
Running this code resulted in the following graph:
I need help with the following questions:
What code should be used to generate two separate plots, one for 2014 and one for 2015? The 2014 plot should have 5 curves (one per Var) and the 2015 plot 3 curves. How can the points and curve for each Var be coloured, or the points of each curve be given a different shape?
What code should be used to generate the plots in two rows, one row per year, using the facet_wrap options? In that case the top row should have 5 individual panels, each with points and a curve, and likewise the bottom row should have 3.
How can the time for a 50% reduction in survival be derived for each Var? How is the dose.p function in library(MASS) used?
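On the P50 question, one observation follows directly from the fitted formula: in PerSurvival = 100 / (1 + exp(-a * (Time - b))), survival equals 50 exactly when Time = b, so the fitted b is the P50 for each Var. A sketch (assuming nls converges with these starting values; dose.p in MASS is aimed at glm dose-response fits, so it is not needed with this parameterisation):

```r
# Fit the same logistic curve per Var and read off b as the P50.
p50 <- sapply(split(Survival, Survival$Var), function(d) {
  fit <- nls(PerSurvival ~ 100 / (1 + exp(-a * (Time - b))),
             data = d, start = list(a = -0.05, b = 50))
  coef(fit)[["b"]]
})
p50   # named vector: estimated P50 (time to 50% survival) per Var
```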
