Plot a data frame using ggplot - r

I have a dataframe, with the following data:
data1$YEAR data1$WEEK data1$TOTAL.PATIENTS
1 2009 1 579428
9 2009 2 565631
17 2009 3 582932
25 2009 4 611176
33 2009 5 638613
41 2009 6 648304
49 2009 7 624583
57 2009 8 659573
65 2009 9 623389
73 2009 10 637672
81 2009 11 605503
89 2009 12 608342
97 2009 13 586651
105 2009 14 564460
113 2009 15 558837
121 2009 16 577836
129 2009 17 624734
137 2009 18 598189
145 2009 19 550300
153 2009 20 544432
161 2009 21 531526
169 2009 22 538177
177 2009 23 493761
185 2009 24 521701
193 2009 25 512268
201 2009 26 475877
209 2009 27 480680
217 2009 28 502466
225 2009 29 503971
233 2009 30 485804
241 2009 31 496666
249 2009 32 506019
257 2009 33 544827
265 2009 34 588916
273 2009 35 573972
281 2009 36 571201
289 2009 37 638302
296 2009 38 608464
303 2009 39 606458
311 2009 40 855346
319 2009 41 853912
327 2009 42 906536
335 2009 43 898860
343 2009 44 899425
351 2009 45 864348
359 2009 46 853552
367 2009 47 654101
375 2009 48 814550
383 2009 49 781811
391 2009 50 728401
399 2009 51 536961
407 2009 52 583299
2 2010 1 721138
...
The second column (YEAR) holds the years 2009 to 2015, and the third column (WEEK) is the week of the year.
I would like to plot this data frame so that the x-axis shows the weeks of each year separately. How can I do that?

Does this work, or do you need to re-label the x-axis to show the year only? (In the following plot, the x-axis shows Year-Week.)
head(df)
Year Week TOTAL.PATIENTS
1 2009 11 605503
2 2009 12 608342
3 2009 13 586651
4 2009 14 564460
5 2009 15 558837
6 2009 16 577836
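library(ggplot2)
library(scales)  # for the comma() label formatter
# Year-Week label such as "2009-01"
df$Year_Week <- paste(df$Year, sprintf('%02d', df$Week), sep='-')
# A factor gives each year its own discrete colour and its own line
df$Year <- as.factor(df$Year)
ggplot(df, aes(Year_Week,TOTAL.PATIENTS,col=Year, group=Year)) +
geom_line(lwd=2) + scale_y_continuous(labels = comma) +
xlab('Year-Week') +
theme(axis.text.x = element_text(angle=90, vjust = 0.5))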
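Zero-padding the week with sprintf('%02d', ...) matters because Year_Week is a character column, and ggplot orders a character x-axis alphabetically. A minimal base-R sketch of the difference (the weeks vector is just an illustrative assumption):

```r
weeks <- c(1, 2, 10, 52)

unpadded <- paste(2009, weeks, sep = "-")                   # "2009-1", "2009-2", ...
padded   <- paste(2009, sprintf("%02d", weeks), sep = "-")  # "2009-01", "2009-02", ...

# A character vector becomes a factor whose levels are sorted as text
levels(factor(unpadded))  # week 10 lands between weeks 1 and 2
levels(factor(padded))    # chronological order
```

Without the padding, week 10 would be drawn between weeks 1 and 2 on the x-axis.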

Remove connecting line between first and last record of ggplot

I'm struggling with a problem which should be quite easy to solve. However, I wasn't able to fix it.
Here is my data:
cluster variable value
1 1 1988 16.266506
2 2 1988 1.651491
3 3 1988 1.414906
4 4 1988 3.524106
5 5 1988 1.255048
6 6 1988 5.247590
7 1 1989 58.542374
8 2 1989 55.348154
9 3 1989 1.281950
10 4 1989 79.946518
11 5 1989 295.739329
12 6 1989 111.941471
13 1 1990 35.831376
14 2 1990 163.154334
15 3 1990 6.267801
16 4 1990 36.135324
17 5 1990 32.184136
18 6 1990 65.808952
19 1 1991 104.319331
20 2 1991 271.297555
21 3 1991 1.717811
22 4 1991 6.162088
23 5 1991 223.614068
24 6 1991 144.494680
25 1 1992 24.920946
26 2 1992 97.514737
27 3 1992 2.338454
28 4 1992 8.236198
29 5 1992 119.907743
30 6 1992 59.466458
31 1 1993 35.915740
32 2 1993 11.444630
33 3 1993 1.754765
34 4 1993 5.023139
35 5 1993 2.464351
36 6 1993 12.793560
37 1 1994 10.192094
38 2 1994 17.123972
39 3 1994 1.148919
40 4 1994 44.892803
41 5 1994 46.797657
42 6 1994 27.554103
43 1 1995 47.949046
44 2 1995 58.979519
45 3 1995 2.156014
46 4 1995 2.386940
47 5 1995 16.583813
48 6 1995 30.259352
49 1 1996 50.782284
50 2 1996 23.318913
51 3 1996 245.206559
52 4 1996 92.726616
53 5 1996 23.872951
54 6 1996 60.165873
55 1 1997 16.047945
56 2 1997 96.002264
57 3 1997 154.553556
58 4 1997 4.683534
59 5 1997 4.230310
60 6 1997 40.414528
61 1 1998 9.674630
62 2 1998 276.691314
63 3 1998 1.539398
64 4 1998 47.155072
65 5 1998 167.535475
66 6 1998 121.591620
67 1 1999 1.996771
68 2 1999 19.291985
69 3 1999 9.627251
70 4 1999 2.111284
71 5 1999 5.077251
72 6 1999 7.651787
73 1 2000 1.533511
74 2 2000 4.749388
75 3 2000 77.764969
76 4 2000 8.822520
77 5 2000 1.398238
78 6 2000 9.021997
79 1 2001 4.147357
80 2 2001 4.655750
81 3 2001 1.192090
82 4 2001 4.542792
83 5 2001 1.401844
84 6 2001 3.560301
85 1 2002 4.358921
86 2 2002 5.588824
87 3 2002 9.286483
88 4 2002 3.383068
89 5 2002 1.630102
90 6 2002 4.163573
91 1 2003 114.590967
92 2 2003 416.672354
93 3 2003 2.145251
94 4 2003 7.990705
95 5 2003 650.406070
96 6 2003 280.008212
97 1 2004 31.423147
98 2 2004 25.393737
99 3 2004 2.134556
100 4 2004 38.647510
101 5 2004 58.314884
102 6 2004 35.628008
103 1 2005 120.493931
104 2 2005 76.455433
105 3 2005 206.615430
106 4 2005 59.008307
107 5 2005 27.198275
108 6 2005 79.833828
109 1 2006 235.611236
110 2 2006 84.379053
111 3 2006 86.638692
112 4 2006 201.766197
113 5 2006 3.348146
114 6 2006 127.260565
115 1 2007 31.479617
116 2 2007 8.959114
117 3 2007 1.191066
118 4 2007 24.147700
119 5 2007 17.038608
120 6 2007 18.790166
121 1 2008 54.826089
122 2 2008 2.163957
123 3 2008 1.479409
124 4 2008 3.141238
125 5 2008 1.304543
126 6 2008 13.931403
127 1 2009 63.018339
128 2 2009 101.637635
129 3 2009 5.172660
130 4 2009 58.412126
131 5 2009 236.547752
132 6 2009 106.876749
133 1 2010 11.843006
134 2 2010 4.458760
135 3 2010 12.711000
136 4 2010 57.260891
137 5 2010 38.884449
138 6 2010 26.512278
139 1 2011 134.628759
140 2 2011 216.482243
141 3 2011 5.593466
142 4 2011 3.980969
143 5 2011 27.394367
144 6 2011 93.071463
145 1 2012 3.696990
146 2 2012 17.026470
147 3 2012 21.556694
148 4 2012 1.682511
149 5 2012 13.405246
150 6 2012 9.999758
151 1 2013 1.642975
152 2 2013 44.140334
153 3 2013 42.019019
154 4 2013 2.643122
155 5 2013 1.234858
156 6 2013 15.342229
157 1 2014 2.200339
158 2 2014 3.041888
159 3 2014 42.076690
160 4 2014 1.359859
161 5 2014 1.271090
162 6 2014 4.638317
163 1 2015 95.083916
164 2 2015 204.618897
165 3 2015 1.191329
166 4 2015 18.865633
167 5 2015 228.506156
168 6 2015 129.305328
169 1 2016 81.739401
170 2 2016 40.525547
171 3 2016 192.637080
172 4 2016 9.985224
173 5 2016 61.758033
174 6 2016 57.468834
175 1 2017 201.880418
176 2 2017 98.496414
177 3 2017 230.865579
178 4 2017 25.877045
179 5 2017 93.934230
180 6 2017 112.588227
I want to plot this as a ggplot line plot so that every cluster gets its own line. This should be straightforward and I have done it many times, but for some reason ggplot now connects the first and the last data point with an additional line. How can I remove that line?
ggplot(median_min_7, aes(x = variable, y = value)) +
geom_line(mapping = aes(color = cluster), group = 1)
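The extra line comes from group = 1: it puts every row into a single group, so ggplot draws one path through all the clusters. Map group = cluster inside aes() instead, and use as.factor(cluster) to get one discrete colour per cluster. This works for me:
library(data.table)
library(ggplot2)
# Code (create DT with the sample-data block below first) ------------------
ggplot(DT, aes(x = variable, y = value, group = cluster, colour = as.factor(cluster))) +
geom_line()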
# Sample data --------------
DT <- fread("row cluster variable value
1 1 1988 16.266506
2 2 1988 1.651491
3 3 1988 1.414906
4 4 1988 3.524106
5 5 1988 1.255048
6 6 1988 5.247590
7 1 1989 58.542374
8 2 1989 55.348154
9 3 1989 1.281950
10 4 1989 79.946518
11 5 1989 295.739329
12 6 1989 111.941471
13 1 1990 35.831376
14 2 1990 163.154334
15 3 1990 6.267801
16 4 1990 36.135324
17 5 1990 32.184136
18 6 1990 65.808952
19 1 1991 104.319331
20 2 1991 271.297555
21 3 1991 1.717811
22 4 1991 6.162088
23 5 1991 223.614068
24 6 1991 144.494680
25 1 1992 24.920946
26 2 1992 97.514737
27 3 1992 2.338454
28 4 1992 8.236198
29 5 1992 119.907743
30 6 1992 59.466458
31 1 1993 35.915740
32 2 1993 11.444630
33 3 1993 1.754765
34 4 1993 5.023139
35 5 1993 2.464351
36 6 1993 12.793560
37 1 1994 10.192094
38 2 1994 17.123972
39 3 1994 1.148919
40 4 1994 44.892803
41 5 1994 46.797657
42 6 1994 27.554103
43 1 1995 47.949046
44 2 1995 58.979519
45 3 1995 2.156014
46 4 1995 2.386940
47 5 1995 16.583813
48 6 1995 30.259352
49 1 1996 50.782284
50 2 1996 23.318913
51 3 1996 245.206559
52 4 1996 92.726616
53 5 1996 23.872951
54 6 1996 60.165873
55 1 1997 16.047945
56 2 1997 96.002264
57 3 1997 154.553556
58 4 1997 4.683534
59 5 1997 4.230310
60 6 1997 40.414528
61 1 1998 9.674630
62 2 1998 276.691314
63 3 1998 1.539398
64 4 1998 47.155072
65 5 1998 167.535475
66 6 1998 121.591620
67 1 1999 1.996771
68 2 1999 19.291985
69 3 1999 9.627251
70 4 1999 2.111284
71 5 1999 5.077251
72 6 1999 7.651787
73 1 2000 1.533511
74 2 2000 4.749388
75 3 2000 77.764969
76 4 2000 8.822520
77 5 2000 1.398238
78 6 2000 9.021997
79 1 2001 4.147357
80 2 2001 4.655750
81 3 2001 1.192090
82 4 2001 4.542792
83 5 2001 1.401844
84 6 2001 3.560301
85 1 2002 4.358921
86 2 2002 5.588824
87 3 2002 9.286483
88 4 2002 3.383068
89 5 2002 1.630102
90 6 2002 4.163573
91 1 2003 114.590967
92 2 2003 416.672354
93 3 2003 2.145251
94 4 2003 7.990705
95 5 2003 650.406070
96 6 2003 280.008212
97 1 2004 31.423147
98 2 2004 25.393737
99 3 2004 2.134556
100 4 2004 38.647510
101 5 2004 58.314884
102 6 2004 35.628008
103 1 2005 120.493931
104 2 2005 76.455433
105 3 2005 206.615430
106 4 2005 59.008307
107 5 2005 27.198275
108 6 2005 79.833828
109 1 2006 235.611236
110 2 2006 84.379053
111 3 2006 86.638692
112 4 2006 201.766197
113 5 2006 3.348146
114 6 2006 127.260565
115 1 2007 31.479617
116 2 2007 8.959114
117 3 2007 1.191066
118 4 2007 24.147700
119 5 2007 17.038608
120 6 2007 18.790166
121 1 2008 54.826089
122 2 2008 2.163957
123 3 2008 1.479409
124 4 2008 3.141238
125 5 2008 1.304543
126 6 2008 13.931403
127 1 2009 63.018339
128 2 2009 101.637635
129 3 2009 5.172660
130 4 2009 58.412126
131 5 2009 236.547752
132 6 2009 106.876749
133 1 2010 11.843006
134 2 2010 4.458760
135 3 2010 12.711000
136 4 2010 57.260891
137 5 2010 38.884449
138 6 2010 26.512278
139 1 2011 134.628759
140 2 2011 216.482243
141 3 2011 5.593466
142 4 2011 3.980969
143 5 2011 27.394367
144 6 2011 93.071463
145 1 2012 3.696990
146 2 2012 17.026470
147 3 2012 21.556694
148 4 2012 1.682511
149 5 2012 13.405246
150 6 2012 9.999758
151 1 2013 1.642975
152 2 2013 44.140334
153 3 2013 42.019019
154 4 2013 2.643122
155 5 2013 1.234858
156 6 2013 15.342229
157 1 2014 2.200339
158 2 2014 3.041888
159 3 2014 42.076690
160 4 2014 1.359859
161 5 2014 1.271090
162 6 2014 4.638317
163 1 2015 95.083916
164 2 2015 204.618897
165 3 2015 1.191329
166 4 2015 18.865633
167 5 2015 228.506156
168 6 2015 129.305328
169 1 2016 81.739401
170 2 2016 40.525547
171 3 2016 192.637080
172 4 2016 9.985224
173 5 2016 61.758033
174 6 2016 57.468834
175 1 2017 201.880418
176 2 2017 98.496414
177 3 2017 230.865579
178 4 2017 25.877045
179 5 2017 93.934230
180 6 2017 112.588227")
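To see why the grouping matters, here is a small sketch that counts how many separate lines ggplot actually draws. It uses a few rows from the question's data and maps group = 1 inside aes() to mimic a constant group; layer_data() returns the data ggplot uses for a layer.

```r
library(ggplot2)

# Two clusters over three years (values from the first rows of the question)
df <- data.frame(
  cluster  = c(1, 2, 1, 2, 1, 2),
  variable = c(1988, 1988, 1989, 1989, 1990, 1990),
  value    = c(16.266506, 1.651491, 58.542374, 55.348154, 35.831376, 163.154334)
)

# Constant group: every row belongs to the same path
one_group <- ggplot(df, aes(x = variable, y = value, group = 1)) +
  geom_line()

# One group (and one discrete colour) per cluster
by_cluster <- ggplot(df, aes(x = variable, y = value, group = cluster,
                             colour = as.factor(cluster))) +
  geom_line()

# Number of distinct paths in each plot
n_one     <- length(unique(layer_data(one_group)$group))
n_cluster <- length(unique(layer_data(by_cluster)$group))
```

With a constant group there is a single path, which is what lets the line jump from one cluster to another.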

Trying to add a legend in the left upper corner of the graph, but the code I've taken from other scripts isn't working [duplicate]

This question already has answers here:
Add legend to ggplot2 line plot
I want to add a legend for my two data sets in the upper-left corner of my plot, but the code I've taken from other scripts isn't working. Does anybody have an idea why?
library("ggplot2")
library("reshape2")
data<-read.csv("trial.csv",header=TRUE,dec=".",sep=',',na.strings="NA")
#Example of data
Year Annual Cumulative
1 1960 1 1
2 1961 0 1
3 1962 0 1
4 1963 0 1
5 1964 0 1
6 1965 0 1
7 1966 0 1
8 1967 1 2
9 1968 0 2
10 1969 0 2
11 1970 0 2
12 1971 0 2
13 1972 1 3
14 1973 0 3
15 1974 1 4
16 1975 1 5
17 1976 0 5
18 1977 0 5
19 1978 0 5
20 1979 4 9
21 1980 2 11
22 1981 1 12
23 1982 1 13
24 1983 3 16
25 1984 1 17
26 1985 2 19
27 1986 1 20
28 1987 4 24
29 1988 3 27
30 1989 3 30
31 1990 3 33
32 1991 1 34
33 1992 4 38
34 1993 0 38
35 1994 4 42
36 1995 4 46
37 1996 3 49
38 1997 5 54
39 1998 2 56
40 1999 0 56
41 2000 6 62
42 2001 11 73
43 2002 8 81
44 2003 2 83
45 2004 5 88
46 2005 7 95
47 2006 13 108
48 2007 22 130
49 2008 13 143
50 2009 13 156
51 2010 17 173
52 2011 14 187
53 2012 24 211
54 2013 24 235
55 2014 18 253
56 2015 19 272
57 2016 17 289
58 2017 16 305
59 2018 24 329
60 2019 9 338
Plotting code:
p1<-ggplot(data=data,aes(x=Year))+
geom_line(aes(y=Cumulative),linetype="solid",color="red",size=1.1)+
geom_point(aes(y=Cumulative),shape=1,color="red",size=3,stroke=1.5)+
geom_line(aes(y=Annual),linetype="solid",color="darkorange",size=1.1)+
geom_point(aes(y=Annual),shape=1,color="darkorange",size=3,stroke=1.5)+
scale_y_continuous(sec.axis=sec_axis(~.*1/10,name="Annual\n"))
p1<-p1+labs(x="\nYear",y="Cumulative\n")
p1+theme(axis.title.x=element_text(size=18),
axis.text.x=element_text(size=14),
axis.title.y=element_text(size=18),
axis.text.y=element_text(size=14),
axis.ticks=element_blank())
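There are a number of alternative ways to do this. ggplot2 only draws a legend for aesthetics that are mapped inside aes(); in your code every colour is set as a constant outside aes(), so there is nothing to build a legend from. One quick fix is to melt the data by year, which also lets you merge the repeated geom_point and geom_line calls. A legend is then created for you, which you can customize and move with legend.position in your theme.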
library("ggplot2")
library("reshape2")
# Annual is drawn on the primary (Cumulative) scale, so multiply it by 10 to
# match the secondary axis, which labels the primary values divided by 10
data.scaled <- transform(data, Annual = Annual * 10)
data.melt <- melt(data.scaled, "Year")
ggplot(data.melt, aes(x = Year, y = value, color = variable)) +
geom_point(shape=1, size=3, stroke=1.5) +
geom_line(linetype="solid", size=1.1) +
scale_colour_manual(values=c("darkorange", "red")) +
scale_y_continuous(sec.axis=sec_axis(~.*1/10, name="Annual\n")) +
labs(x="\nYear", y="Cumulative\n", color="Legend Title") +
theme(axis.title.x=element_text(size=18),
axis.text.x=element_text(size=14),
axis.title.y=element_text(size=18),
axis.text.y=element_text(size=14),
axis.ticks=element_blank(),
# legend position as fractions of the plot area: near the upper-left corner
legend.position = c(0.1, 0.9))
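If you would rather keep the data in wide format, another common idiom is to map a constant label to colour inside aes() for each series and then assign the actual colours with a named scale_colour_manual(). A minimal sketch, using a few rows from the question and multiplying Annual by 10 so it lines up with the secondary axis (which divides the primary values by 10); the legend can then be positioned as in the answer above.

```r
library(ggplot2)

# A few rows from the question's data
data <- data.frame(
  Year       = 2015:2019,
  Annual     = c(19, 17, 16, 24, 9),
  Cumulative = c(272, 289, 305, 329, 338)
)

p <- ggplot(data, aes(x = Year)) +
  # A constant string mapped inside aes() becomes a legend entry
  geom_line(aes(y = Cumulative, colour = "Cumulative")) +
  geom_point(aes(y = Cumulative, colour = "Cumulative"), shape = 1) +
  # Annual is scaled by 10 to match the secondary axis below
  geom_line(aes(y = Annual * 10, colour = "Annual")) +
  geom_point(aes(y = Annual * 10, colour = "Annual"), shape = 1) +
  # Named values tie each legend label to a colour
  scale_colour_manual(values = c(Cumulative = "red", Annual = "darkorange")) +
  scale_y_continuous(sec.axis = sec_axis(~ . / 10, name = "Annual")) +
  labs(x = "Year", y = "Cumulative", colour = NULL)
```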

set limits for scale_x_date in ggplot2 in facet_grid context

I have the following data frame:
date individus annee
80 2013-07-23 0 2013
77 2013-07-12 0 2013
63 2013-05-13 7 2013
72 2013-06-25 2 2013
7 2011-04-19 20 2011
58 2013-04-23 6 2013
4 2011-04-11 7 2011
52 2012-07-03 0 2012
56 2012-08-06 9 2012
6 2011-04-15 0 2011
38 2012-05-02 8 2012
67 2013-05-28 1 2013
66 2013-05-24 0 2013
59 2013-04-26 46 2013
73 2013-06-28 9 2013
74 2013-07-02 0 2013
22 2011-06-14 44 2011
70 2013-06-17 0 2013
41 2012-05-11 0 2012
14 2011-05-13 6 2011
42 2012-05-15 0 2012
27 2011-07-18 0 2011
18 2011-05-26 0 2011
36 2012-04-13 39 2012
31 2011-07-29 12 2011
55 2012-07-13 25 2012
49 2012-06-14 17 2012
50 2012-06-18 69 2012
51 2012-06-25 65 2012
57 2013-04-19 41 2013
I would like to plot this data with ggplot2, with facet_grid on annee, using this code:
library(ggplot2)
plot<-ggplot(data=lob.df)+
# geom_point(aes(x=date, y=individus))+
# factor() so that annee works with the discrete colour scale
geom_smooth(aes(x=date, y=individus, colour=factor(annee)))+
labs(x="Date",y="Nombre d'individus")+
scale_colour_discrete(name="Année")+
facet_grid(.~annee)
This gives me one panel per year, but every panel's x-axis spans the dates of all three years.
I would like to remove all that blank space, so I have played with scale_x_date, but I wasn't able to reduce the graph limits.
You can set the scales parameter in facet_grid to be "free_x" like this:
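library(ggplot2)
# df is created in the "Data:" block below
plot<-ggplot(data=df)+
# geom_point(aes(x=date, y=individus))+
geom_smooth(aes(x=date, y=individus, colour=annee))+
labs(x="Date",y="Nombre d'individus")+
# scales="free_x" gives every panel its own x-axis range
facet_grid(.~annee,scales="free_x")
##
print(plot)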
I had to modify the aesthetics of your plot a little bit because your code was not running on my machine (I'm not using a very recent release of R), but using facet_grid(.~annee,scales="free_x")
should still work fine for you.
Data:
df <- read.table(
text=" date individus annee
80 2013-07-23 0 2013
77 2013-07-12 0 2013
63 2013-05-13 7 2013
72 2013-06-25 2 2013
7 2011-04-19 20 2011
58 2013-04-23 6 2013
4 2011-04-11 7 2011
52 2012-07-03 0 2012
56 2012-08-06 9 2012
6 2011-04-15 0 2011
38 2012-05-02 8 2012
67 2013-05-28 1 2013
66 2013-05-24 0 2013
59 2013-04-26 46 2013
73 2013-06-28 9 2013
74 2013-07-02 0 2013
22 2011-06-14 44 2011
70 2013-06-17 0 2013
41 2012-05-11 0 2012
14 2011-05-13 6 2011
42 2012-05-15 0 2012
27 2011-07-18 0 2011
18 2011-05-26 0 2011
36 2012-04-13 39 2012
31 2011-07-29 12 2011
55 2012-07-13 25 2012
49 2012-06-14 17 2012
50 2012-06-18 69 2012
51 2012-06-25 65 2012
57 2013-04-19 41 2013")
##
df$date <- as.Date(df$date)
df$individus <- as.numeric(df$individus)
df$annee <- as.numeric(df$annee)
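To check what scales="free_x" changes, the sketch below builds the plot with fixed and free x scales and inspects the panel layout. It uses a few rows of the data above and geom_point() instead of geom_smooth() to keep it small; ggplot_build(p)$layout$layout is an internal table with one row per panel, used here only for inspection.

```r
library(ggplot2)

# A few rows from the question: each year is only observed from April to August
df <- data.frame(
  date      = as.Date(c("2011-04-11", "2011-06-14", "2011-07-29",
                        "2012-04-13", "2012-06-18", "2012-08-06",
                        "2013-04-19", "2013-05-28", "2013-07-23")),
  individus = c(7, 44, 12, 39, 69, 9, 41, 1, 0),
  annee     = rep(2011:2013, each = 3)
)

p_base <- ggplot(df, aes(x = date, y = individus)) + geom_point()
fixed  <- p_base + facet_grid(. ~ annee)                     # default: shared x range
free   <- p_base + facet_grid(. ~ annee, scales = "free_x")  # one x range per panel

# SCALE_X records which x scale each panel uses
fixed_layout <- ggplot_build(fixed)$layout$layout
free_layout  <- ggplot_build(free)$layout$layout
fixed_layout$SCALE_X
free_layout$SCALE_X
```

If you also want the panel widths to be proportional to each year's date range, facet_grid() accepts space = "free_x" as well.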

Time series, change monthly data to quarterly

I have some monthly data like this:
1/1/90,620
2/1/90,591
3/1/90,574
4/1/90,542
5/1/90,534
6/1/90,545
#...etc
With the ts() function it's easy to turn the data into a time series like this:
Jan Feb Mar ... Nov Dec
1990 620 591 574 ... 493 464
1991 100 200 300 ...........
Is there any way to change it into a quarterly layout like this:
1st 2nd 3rd 4th
1990-Q1 620 591 574 464
1990-Q2 100 200 300 400
1990-Q3 ...
1990-Q4 ...
1991-Q1 ...
I tried to change
ts(mydata,start=c(1990,1),frequency=12)
to
ts(mydata,start=c(as.yearqrt("1990-1",1)),frequency=4)
but it doesn't seem to work.
Could anyone help me? Thank you very much.
monthly <- ts(mydata, start = c(1990, 1), frequency = 12)
quarterly <- aggregate(monthly, nfrequency = 4)
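As a quick check, here is a minimal sketch applying this to the six sample rows from the question. Reading them as CSV text and the "%m/%d/%y" date format are assumptions about how the data is stored. Note that aggregate() sums the months in each quarter by default.

```r
# The six sample rows from the question (month/day/two-digit year)
raw <- read.csv(text = "1/1/90,620
2/1/90,591
3/1/90,574
4/1/90,542
5/1/90,534
6/1/90,545", header = FALSE, col.names = c("date", "value"))

raw$date <- as.Date(raw$date, format = "%m/%d/%y")  # "90" is read as 1990

# Start the series at the year and month of the first observation
first   <- as.integer(c(format(raw$date[1], "%Y"), format(raw$date[1], "%m")))
monthly <- ts(raw$value, start = first, frequency = 12)

quarterly <- aggregate(monthly, nfrequency = 4)  # default FUN = sum
quarterly
```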
I don't agree with Hyndman on this one, which is rare, as Hyndman can usually do no wrong. However, I can show that his solution doesn't give the OP what he wants.
test<-c(1:100)
test_ts <- ts(test, start=c(2000,1), frequency=12)
test_ts
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2000 1 2 3 4 5 6 7 8 9 10 11 12
2001 13 14 15 16 17 18 19 20 21 22 23 24
2002 25 26 27 28 29 30 31 32 33 34 35 36
2003 37 38 39 40 41 42 43 44 45 46 47 48
2004 49 50 51 52 53 54 55 56 57 58 59 60
2005 61 62 63 64 65 66 67 68 69 70 71 72
2006 73 74 75 76 77 78 79 80 81 82 83 84
2007 85 86 87 88 89 90 91 92 93 94 95 96
2008 97 98 99 100
test_agg <- aggregate(test_ts, nfrequency=4)
test_agg
Qtr1 Qtr2 Qtr3 Qtr4
2000 6 15 24 33
2001 42 51 60 69
2002 78 87 96 105
2003 114 123 132 141
2004 150 159 168 177
2005 186 195 204 213
2006 222 231 240 249
2007 258 267 276 285
2008 294
Well, wait: that first quarter isn't the average of the 3 months, it's the sum (1+2+3 = 6, but you want it to show the mean, 2). So you will need to modify that a tad.
test_agg <- aggregate(test_ts, nfrequency=4)/3
# divisor is (old freq)/(new freq) = 12/4 = 3
test_agg
Qtr1 Qtr2 Qtr3 Qtr4
2000 2 5 8 11
2001 14 17 20 23
2002 26 29 32 35
2003 38 41 44 47
2004 50 53 56 59
2005 62 65 68 71
2006 74 77 80 83
2007 86 89 92 95
2008 98
Which now shows you the mean of the monthly data written as quarterly.
The divisor is the trick here. If you had weekly (freq=52) and wanted quarterly (freq=4) you'd divide by 52/4=13.
If you want the mean instead of the sum, just pass mean as the FUN argument:
quarterly <- aggregate(monthly, nfrequency = 4, FUN = mean)
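The two approaches above should agree: dividing the quarterly sums by (old frequency)/(new frequency) gives the same result as FUN = mean. A small sketch with made-up series (1:24 monthly and 1:104 weekly) to confirm both the 12/4 = 3 and the 52/4 = 13 divisors:

```r
# Monthly to quarterly: 12 / 4 = 3 months per quarter
monthly <- ts(1:24, start = c(2000, 1), frequency = 12)
q_sum   <- aggregate(monthly, nfrequency = 4)              # FUN = sum (default)
q_mean  <- aggregate(monthly, nfrequency = 4, FUN = mean)
all.equal(q_mean, q_sum / 3)

# Weekly to quarterly: 52 / 4 = 13 weeks per quarter
weekly <- ts(1:104, start = c(2000, 1), frequency = 52)
w_mean <- aggregate(weekly, nfrequency = 4, FUN = mean)
all.equal(w_mean, aggregate(weekly, nfrequency = 4) / 13)
```

Passing FUN = mean avoids having to work out the divisor by hand.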

Summing cells of some rows and columns

I have a large data frame where some rows have repeated values in some of their columns. I want to keep the repeated values and sum those which are different. Below there is a sample of my data:
data<-data.frame(season=c(2008,2009,2010,2011,2011,2012,2000,2001),
lic=c(132228,140610,149215,158559,158559,944907,37667,45724),
client=c(174,174,174,174,174,174,175,175),
qtty=c(31,31,31,31,31,31,36,26),
held=c(60,65,58,68,68,70,29,23),
catch=c(7904,6761,9236,9323.2,801,NA,2330,3594.5),
potlift=c(2715,2218,3000,3887,750,NA,2314,3472))
season lic client qtty held catch potlift
2008 132228 174 31 60 7904 2715
2009 140610 174 31 65 6761 2218
2010 149215 174 31 58 9236 3000
2011 158559 174 31 68 9323.2 3887
2011 158559 174 31 68 801 750
2012 944907 174 31 70 NA NA
2000 37667 175 36 29 2330 2314
2001 45724 175 26 23 3594.5 3472
Note that season 2011 appears twice: every variable (lic ... held) has the same value in both rows except catch and potlift. I need to keep the values of lic ... held and sum catch and potlift, so my new data frame should look like the example below:
season lic client qtty held catch potlift
2008 132228 174 31 60 7904 2715
2009 140610 174 31 65 6761 2218
2010 149215 174 31 58 9236 3000
2011 158559 174 31 68 10124.2 4637
2012 944907 174 31 70 NA NA
2000 37667 175 36 29 2330 2314
2001 45724 175 26 23 3594.5 3472
I have attempted to do this with aggregate, but that function sums everything. Any help will be appreciated.
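One option is ave(): replace catch and potlift with their sums within each group of rows that share lic, client, qtty and held, then drop the resulting duplicate rows with unique():
# Sum within groups of rows that share lic, client, qtty and held
data$catch <- with(data, ave(catch,list(lic,client,qtty,held),FUN=sum))
data$potlift <- with(data, ave(potlift,list(lic,client,qtty,held),FUN=sum))
# The repeated 2011 rows are now identical, so unique() keeps only one
unique(data)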
season lic client qtty held catch potlift
1 2008 132228 174 31 60 7904.0 2715
2 2009 140610 174 31 65 6761.0 2218
3 2010 149215 174 31 58 9236.0 3000
4 2011 158559 174 31 68 10124.2 4637
6 2012 944907 174 31 70 NA NA
7 2000 37667 175 36 29 2330.0 2314
8 2001 45724 175 26 23 3594.5 3472
aggregate seems to work fine for me, but I'm not sure what you were trying:
> aggregate(cbind(catch, potlift) ~ ., data, sum, na.action = "na.pass")
season lic client qtty held catch potlift
1 2001 45724 175 26 23 3594.5 3472
2 2000 37667 175 36 29 2330.0 2314
3 2010 149215 174 31 58 9236.0 3000
4 2008 132228 174 31 60 7904.0 2715
5 2009 140610 174 31 65 6761.0 2218
6 2011 158559 174 31 68 10124.2 4637
7 2012 944907 174 31 70 NA NA
Here, cbind() lists the columns you want to aggregate (sum); the . on the right-hand side of the formula means "group by all other columns not mentioned in the cbind() call".
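For a large data frame with many columns, the data-frame interface of aggregate() lets you build the grouping columns programmatically instead of typing them. A minimal sketch on the question's sample data; the names summed and keys are just illustrative:

```r
data <- data.frame(season=c(2008,2009,2010,2011,2011,2012,2000,2001),
                   lic=c(132228,140610,149215,158559,158559,944907,37667,45724),
                   client=c(174,174,174,174,174,174,175,175),
                   qtty=c(31,31,31,31,31,31,36,26),
                   held=c(60,65,58,68,68,70,29,23),
                   catch=c(7904,6761,9236,9323.2,801,NA,2330,3594.5),
                   potlift=c(2715,2218,3000,3887,750,NA,2314,3472))

summed <- c("catch", "potlift")         # columns to add up
keys   <- setdiff(names(data), summed)  # every other column is a grouping key

# NA values are summed as-is, so the 2012 row keeps NA
res <- aggregate(data[summed], by = data[keys], FUN = sum)
res
```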
