This question already has answers here: Aggregate by specific year in R (2 answers). Closed last year.
I'm currently having an issue manipulating/aggregating my data frame. The data frame I have is as follows:
Farm    Year  Cow  Duck  Chicken  Sheep  Horse
Farm 1  2020   22    12      100     30     25
Farm 1  2020    0    12      120     20     20
Farm 1  2019   16     6       80     10     16
Farm 1  2019   12     0       50      0     11
Farm 1  2018    8     0        0     16      0
Farm 1  2018    0     0       10     13     12
Farm 2  2020   31    28       27     10     14
Farm 2  2020    0    13       31     20      0
Farm 2  2019    3    31        0     20     43
Farm 2  2019   20    50       43     17     42
Farm 2  2018   39    33        0     48     10
Farm 2  2018   34    20       28     12     12
Farm 3  2020   27     0       37     30     42
Farm 3  2020   50     9        0      0      0
Farm 3  2019    0    19        0     20     16
Farm 3  2019    0     2        0      0      7
Farm 3  2018    0     0        5     27      0
Farm 3  2018    0     7       43     49     42
For simplicity, the code for the data frame is as follows:
Farms = c(rep("Farm 1", 6), rep("Farm 2", 6), rep("Farm 3", 6))
Year = rep(c(2020,2020,2019,2019,2018,2018),3)
Cow = c(22,0,16,12,8,0,31,0,3,20,39,34,27,50,0,0,0,0)
Duck = c(12,12,6,0,0,0,28,13,31,50,33,20,0,9,19,2,0,7)
Chicken = c(100,120,80,50,0,10,27,31,0,43,0,28,37,0,0,0,5,43)
Sheep = c(30,20,10,0,16,13,10,20,20,17,48,12,30,0,20,0,27,49)
Horse = c(25,20,16,11,0,12,14,0,43,42,10,12,42,0,16,7,0,42)
Data = data.frame(Farms, Year, Cow, Duck, Chicken, Sheep, Horse)
Does anyone know how I can transform the data frame into the table below using group_by, aggregate, pivot_wider, or any other approach? The target table groups each farm by year and takes the average of each animal within that year.
Farm    Year   Cow  Duck  Chicken  Sheep  Horse
Farm 1  2020    11    12      110     25   22.5
Farm 1  2019    14     3       65      5   13.5
Farm 1  2018     4     0        5   14.5      6
Farm 2  2020  15.5  20.5       29     15      7
Farm 2  2019  11.5  40.5     21.5   18.5   42.5
Farm 2  2018  36.5  26.5       14     30     11
Farm 3  2020  38.5   4.5     18.5     15     21
Farm 3  2019     0  10.5        0     10   11.5
Farm 3  2018     0   3.5       24     38     21
For example, Cow for Farm 1 in 2020 is the average of the two 2020 rows: (22 + 0)/2 = 11.
Thank you in advance, and a happy 2022 to all!
aggregate(.~Year + Farms, Data, mean)
Year Farms Cow Duck Chicken Sheep Horse
1 2018 Farm 1 4.0 0.0 5.0 14.5 6.0
2 2019 Farm 1 14.0 3.0 65.0 5.0 13.5
3 2020 Farm 1 11.0 12.0 110.0 25.0 22.5
4 2018 Farm 2 36.5 26.5 14.0 30.0 11.0
5 2019 Farm 2 11.5 40.5 21.5 18.5 42.5
6 2020 Farm 2 15.5 20.5 29.0 15.0 7.0
7 2018 Farm 3 0.0 3.5 24.0 38.0 21.0
8 2019 Farm 3 0.0 10.5 0.0 10.0 11.5
9 2020 Farm 3 38.5 4.5 18.5 15.0 21.0
aggregate(.~Farms + Year, Data, mean)
Farms Year Cow Duck Chicken Sheep Horse
1 Farm 1 2018 4.0 0.0 5.0 14.5 6.0
2 Farm 2 2018 36.5 26.5 14.0 30.0 11.0
3 Farm 3 2018 0.0 3.5 24.0 38.0 21.0
4 Farm 1 2019 14.0 3.0 65.0 5.0 13.5
5 Farm 2 2019 11.5 40.5 21.5 18.5 42.5
6 Farm 3 2019 0.0 10.5 0.0 10.0 11.5
7 Farm 1 2020 11.0 12.0 110.0 25.0 22.5
8 Farm 2 2020 15.5 20.5 29.0 15.0 7.0
9 Farm 3 2020 38.5 4.5 18.5 15.0 21.0
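The requested table lists each farm's years in descending order (2020, 2019, 2018), whereas aggregate() sorts the grouping variables in ascending order. A minimal base-R sketch for reproducing the exact row order of the target table; it rebuilds the question's Data object so it runs on its own, and the result name res is just illustrative:

```r
# Rebuild the question's data so this block runs on its own
Farms <- c(rep("Farm 1", 6), rep("Farm 2", 6), rep("Farm 3", 6))
Year <- rep(c(2020, 2020, 2019, 2019, 2018, 2018), 3)
Cow <- c(22,0,16,12,8,0,31,0,3,20,39,34,27,50,0,0,0,0)
Duck <- c(12,12,6,0,0,0,28,13,31,50,33,20,0,9,19,2,0,7)
Chicken <- c(100,120,80,50,0,10,27,31,0,43,0,28,37,0,0,0,5,43)
Sheep <- c(30,20,10,0,16,13,10,20,20,17,48,12,30,0,20,0,27,49)
Horse <- c(25,20,16,11,0,12,14,0,43,42,10,12,42,0,16,7,0,42)
Data <- data.frame(Farms, Year, Cow, Duck, Chicken, Sheep, Horse)

# Mean of every animal column within each Farms/Year combination
res <- aggregate(. ~ Farms + Year, data = Data, FUN = mean)

# Sort by farm, then by year descending, to match the requested layout
res <- res[order(res$Farms, -res$Year), ]
rownames(res) <- NULL
res
```

The negative sign inside order() only works because Year is numeric; for a character or factor column, order(..., decreasing = ...) or rev() would be needed instead.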
library(dplyr)
Data %>%
  group_by(Farms, Year) %>%
  summarise(across(everything(), mean), .groups = 'drop')
# A tibble: 9 x 7
Farms Year Cow Duck Chicken Sheep Horse
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Farm 1 2018 4 0 5 14.5 6
2 Farm 1 2019 14 3 65 5 13.5
3 Farm 1 2020 11 12 110 25 22.5
4 Farm 2 2018 36.5 26.5 14 30 11
5 Farm 2 2019 11.5 40.5 21.5 18.5 42.5
6 Farm 2 2020 15.5 20.5 29 15 7
7 Farm 3 2018 0 3.5 24 38 21
8 Farm 3 2019 0 10.5 0 10 11.5
9 Farm 3 2020 38.5 4.5 18.5 15 21
Onyambu's answer is good. One small thing, which I know you didn't ask about: you might want to consider whether by "average" you mean the mean or the median. At first glance the data look rather skewed, so the median might serve you better. You can check the distributions like this:
library(tidyr)    # pivot_longer() and the %>% pipe
library(ggplot2)
Data %>%
  pivot_longer(cols = 3:7, names_to = 'names', values_to = 'values') %>%
  ggplot(aes(x = values)) + geom_density() + facet_wrap(~names)
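A minimal sketch comparing the two statistics with base R's aggregate(), rebuilding the question's data so it runs on its own. One thing worth noting for this particular data set: every farm/year group has exactly two rows, and the median of two numbers is their mean, so the two statistics coincide here. The choice only starts to matter once a group holds three or more observations.

```r
# Rebuild the question's data so this block runs on its own
Farms <- c(rep("Farm 1", 6), rep("Farm 2", 6), rep("Farm 3", 6))
Year <- rep(c(2020, 2020, 2019, 2019, 2018, 2018), 3)
Cow <- c(22,0,16,12,8,0,31,0,3,20,39,34,27,50,0,0,0,0)
Duck <- c(12,12,6,0,0,0,28,13,31,50,33,20,0,9,19,2,0,7)
Chicken <- c(100,120,80,50,0,10,27,31,0,43,0,28,37,0,0,0,5,43)
Sheep <- c(30,20,10,0,16,13,10,20,20,17,48,12,30,0,20,0,27,49)
Horse <- c(25,20,16,11,0,12,14,0,43,42,10,12,42,0,16,7,0,42)
Data <- data.frame(Farms, Year, Cow, Duck, Chicken, Sheep, Horse)

# Same grouping, two different summary statistics
means   <- aggregate(. ~ Farms + Year, data = Data, FUN = mean)
medians <- aggregate(. ~ Farms + Year, data = Data, FUN = median)

# With exactly two rows per group, the two tables are identical
all.equal(means, medians)
```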
I want to conditionally create a new variable equal to an existing one. My data look like this:
id id2
1.1 1 1
1.2 2 2
1.3 3 3
1.4 4 4
1.5 NA 5
5.5 5 6
5.6 6 7
5.7 7 8
5.8 8 9
5.51 NA 10
9.9 9 11
9.10 10 12
9.11 11 13
9.4 NA 14
12.12 12 15
12.2 NA 16
13.13 13 17
13.14 14 18
13.15 15 19
13.16 16 20
How can I create a new variable id3 that equals id2 when id is missing, and is missing (NA) when id is not missing? The result should look like this:
id id2 id3
1.1 1 1
1.2 2 2
1.3 3 3
1.4 4 4
1.5 NA 5 5
5.5 5 6
5.6 6 7
5.7 7 8
5.8 8 9
5.51 NA 10 10
9.9 9 11
9.10 10 12
9.11 11 13
9.4 NA 14 14
12.12 12 15
12.2 NA 16 16
13.13 13 17
13.14 14 18
13.15 15 19
13.16 16 20
Thanks!!
Assuming that dat is your data frame, you can use ifelse() from base R:
dat$id3 <- with(dat, ifelse(is.na(id), id2, NA))
Or
dat2 <- transform(dat, id3 = ifelse(is.na(id), id2, NA))
DATA
dat <- read.table(text = " id id2
1.1 1 1
1.2 2 2
1.3 3 3
1.4 4 4
1.5 NA 5
5.5 5 6
5.6 6 7
5.7 7 8
5.8 8 9
5.51 NA 10
9.9 9 11
9.10 10 12
9.11 11 13
9.4 NA 14
12.12 12 15
12.2 NA 16
13.13 13 17
13.14 14 18
13.15 15 19
13.16 16 20",
header = TRUE)
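If you prefer to avoid ifelse(), base R's replace() can do the same job: start from id2 and set the rows where id is present to NA. A minimal sketch on a few rows of the question's data (rows 1.1, 1.2, 1.5, 5.5 and 5.51, with row names dropped for brevity):

```r
# A few rows taken from the question's data
dat <- data.frame(id  = c(1, 2, NA, 5, NA),
                  id2 = c(1, 2, 5, 6, 10))

# Copy id2, then blank out every row where id is not missing
dat$id3 <- replace(dat$id2, !is.na(dat$id), NA)
dat
```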
This question already has answers here: Selecting only integers from a vector [duplicate] (2 answers). Closed 5 years ago.
I would like to filter my data frame to keep only the rows where the first column, v, holds an integer value:
v P_el
1 2.5 0
2 3.0 78
3 3.5 172
4 4.0 287
5 4.5 426
6 5.0 601
7 5.5 814
8 6.0 1069
9 6.5 1367
10 7.0 1717
11 7.5 2110
12 8.0 2546
13 8.5 3002
14 9.0 3427
15 9.5 3751
16 10.0 3922
The output should look like this:
v P_el
2 3 78
4 4 287
6 5 601
8 6 1069
10 7 1717
12 8 2546
14 9 3427
16 10 3922
We can check whether the values leave a remainder of 0 when divided by 1.
dat[dat$v %% 1 == 0, ]
v P_el
2 3 78
4 4 287
6 5 601
8 6 1069
10 7 1717
12 8 2546
14 9 3427
16 10 3922
DATA
dat <- read.table(text = " v P_el
1 2.5 0
2 3.0 78
3 3.5 172
4 4.0 287
5 4.5 426
6 5.0 601
7 5.5 814
8 6.0 1069
9 6.5 1367
10 7.0 1717
11 7.5 2110
12 8.0 2546
13 8.5 3002
14 9.0 3427
15 9.5 3751
16 10.0 3922",
header = TRUE)
You can use the seq() function if you know the pattern of the sequence in column v (here the integer values sit on every second row):
dat
# v P_el
# 1 2.5 0
# 2 3.0 78
# 3 3.5 172
# 4 4.0 287
# 5 4.5 426
# 6 5.0 601
# 7 5.5 814
# 8 6.0 1069
# 9 6.5 1367
# 10 7.0 1717
# 11 7.5 2110
# 12 8.0 2546
# 13 8.5 3002
# 14 9.0 3427
# 15 9.5 3751
# 16 10.0 3922
dat[seq(2,16,by = 2),]
# v P_el
# 2 3 78
# 4 4 287
# 6 5 601
# 8 6 1069
# 10 7 1717
# 12 8 2546
# 14 9 3427
# 16 10 3922
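The modulo test `v %% 1 == 0` compares floating-point numbers exactly. That is fine for values read from text such as 3.0, but values produced by arithmetic can carry tiny rounding errors (R FAQ 7.31), in which case an exact test silently drops rows. A sketch of a tolerance-based check; the helper name is_whole and the tolerance sqrt(.Machine$double.eps) are my own choices, not from the answers above, and the data frame is the first four rows of the question's data:

```r
# First four rows of the question's data
dat <- data.frame(v = c(2.5, 3.0, 3.5, 4.0), P_el = c(0, 78, 172, 287))

# Hypothetical helper: TRUE when x is within tol of the nearest integer
is_whole <- function(x, tol = sqrt(.Machine$double.eps)) {
  abs(x - round(x)) < tol
}

dat[is_whole(dat$v), ]

# Where the difference shows: sqrt(2) * sqrt(2) is not stored exactly as 2
sqrt(2) * sqrt(2) %% 1 == 0   # exact test fails
is_whole(sqrt(2) * sqrt(2))   # tolerance test succeeds
```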
I am trying to calculate diameter growth for a set of trees over a number of years, in a data frame where each row is a given tree in a given year. Typically, this sort of data has each individual stem as a single row, with that stem's diameter for each year in a separate column, but for various reasons this data frame needs to stay in a format where each row is an individual stem in an individual year. A simplified version of the data is as follows:
df<-data.frame("Stem"=c(1:5,1:5,1,2,3,5,1,2,3,5,6),
"Year"=c(rep(1997,5), rep(1998,5), rep(1999,4), rep(2000,5)),
"Diameter"=c(1:5,seq(1.5,5.5,1),2,3,4,6,3,5,7,9,15))
df
Stem Year Diameter
1 1 1997 1.0
2 2 1997 2.0
3 3 1997 3.0
4 4 1997 4.0
5 5 1997 5.0
6 1 1998 1.5
7 2 1998 2.5
8 3 1998 3.5
9 4 1998 4.5
10 5 1998 5.5
11 1 1999 2.0
12 2 1999 3.0
13 3 1999 4.0
14 5 1999 6.0
15 1 2000 3.0
16 2 2000 5.0
17 3 2000 7.0
18 5 2000 9.0
19 6 2000 15.0
What I am trying to accomplish is to make a new column that takes the diameter for a given stem in a given year and subtracts the diameter for that same stem in the previous year. I assume that this will require some set of nested for loops, something like:
for (i in 1:length(unique(df$Stem))) {      # loop over stems
  for (t in 2:length(unique(df$Year))) {    # loop over years after the first
    # .....
  }
}
What I'm struggling with is how to write the function that calculates Diameter[t] - Diameter[t-1] for each stem. Any suggestions would be greatly appreciated.
Try:
do.call(rbind, lapply(split(df, df$Stem), function(x) transform(x, diff = c(0, diff(x$Diameter)))))
Stem Year Diameter diff
1.1 1 1997 1.0 0.0
1.6 1 1998 1.5 0.5
1.11 1 1999 2.0 0.5
1.15 1 2000 3.0 1.0
2.2 2 1997 2.0 0.0
2.7 2 1998 2.5 0.5
2.12 2 1999 3.0 0.5
2.16 2 2000 5.0 2.0
3.3 3 1997 3.0 0.0
3.8 3 1998 3.5 0.5
3.13 3 1999 4.0 0.5
3.17 3 2000 7.0 3.0
4.4 4 1997 4.0 0.0
4.9 4 1998 4.5 0.5
5.5 5 1997 5.0 0.0
5.10 5 1998 5.5 0.5
5.14 5 1999 6.0 0.5
5.18 5 2000 9.0 3.0
6 6 2000 15.0 0.0
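Rnso's answer works. You could also do something slightly shorter (sort the data frame by stem first, so that the values returned by tapply() line up with the rows):
df <- df[order(df$Stem, df$Year), ]
df$diff <- unlist(tapply(df$Diameter, df$Stem, function(x) c(NA, diff(x))))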
Stem Year Diameter diff
1 1 1997 1.0 NA
6 1 1998 1.5 0.5
11 1 1999 2.0 0.5
15 1 2000 3.0 1.0
2 2 1997 2.0 NA
7 2 1998 2.5 0.5
12 2 1999 3.0 0.5
16 2 2000 5.0 2.0
3 3 1997 3.0 NA
8 3 1998 3.5 0.5
13 3 1999 4.0 0.5
17 3 2000 7.0 3.0
4 4 1997 4.0 NA
9 4 1998 4.5 0.5
5 5 1997 5.0 NA
10 5 1998 5.5 0.5
14 5 1999 6.0 0.5
18 5 2000 9.0 3.0
19 6 2000 15.0 NA
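Another base-R option that skips the re-sorting step: ave() applies a function within each group but returns the results in the original row order, so they can be assigned straight back to df. A minimal sketch, assuming the rows are in chronological order within each stem (the block orders by Year first to make sure):

```r
df <- data.frame(Stem = c(1:5, 1:5, 1, 2, 3, 5, 1, 2, 3, 5, 6),
                 Year = c(rep(1997, 5), rep(1998, 5), rep(1999, 4), rep(2000, 5)),
                 Diameter = c(1:5, seq(1.5, 5.5, 1), 2, 3, 4, 6, 3, 5, 7, 9, 15))

# Put observations in chronological order before differencing
df <- df[order(df$Year), ]

# Difference within each Stem; results come back in the original row order
df$diff <- ave(df$Diameter, df$Stem, FUN = function(x) c(NA, diff(x)))
df
```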
Or, if you're willing to use the data.table package, you can be very succinct:
library(data.table)
DT <- data.table(df)
setkey(DT, Stem)
DT[, diff := c(NA, diff(Diameter)), by = Stem]
df <- as.data.frame(DT)
Stem Year Diameter diff
1 1 1997 1.0 NA
2 1 1998 1.5 0.5
3 1 1999 2.0 0.5
4 1 2000 3.0 1.0
5 2 1997 2.0 NA
6 2 1998 2.5 0.5
7 2 1999 3.0 0.5
8 2 2000 5.0 2.0
9 3 1997 3.0 NA
10 3 1998 3.5 0.5
11 3 1999 4.0 0.5
12 3 2000 7.0 3.0
13 4 1997 4.0 NA
14 4 1998 4.5 0.5
15 5 1997 5.0 NA
16 5 1998 5.5 0.5
17 5 1999 6.0 0.5
18 5 2000 9.0 3.0
19 6 2000 15.0 NA
If you have a large dataset, data.table has the advantage of being extremely fast.
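The approaches above all difference consecutive observations of each stem, which equals growth since the previous calendar year only when no stem skips a year. If the real data can have gaps, one alternative is a self-join that pairs each row with the same stem's row from Year - 1 and leaves NA when that year is missing. A minimal sketch with base R's merge(); the column names Diameter_prev and growth are illustrative, not from the question:

```r
df <- data.frame(Stem = c(1:5, 1:5, 1, 2, 3, 5, 1, 2, 3, 5, 6),
                 Year = c(rep(1997, 5), rep(1998, 5), rep(1999, 4), rep(2000, 5)),
                 Diameter = c(1:5, seq(1.5, 5.5, 1), 2, 3, 4, 6, 3, 5, 7, 9, 15))

# Shift every record forward one year, so it lines up with the year it precedes
prev <- data.frame(Stem = df$Stem, Year = df$Year + 1, Diameter_prev = df$Diameter)

# Left join on Stem and Year: NA wherever the same stem has no record for Year - 1
out <- merge(df, prev, by = c("Stem", "Year"), all.x = TRUE)
out$growth <- out$Diameter - out$Diameter_prev
out
```

merge() sorts the result by the key columns, so out comes back ordered by Stem and then Year.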
I am having a problem plotting my data as a 3D surface using this script:
library(lattice)
wireframe(Z~X*Y, data=FI02, xlab="X", ylab="Y", main="Surface elevation", drape=TRUE,
colorkey=TRUE, screen=list(z=-60, x=-60))
The output is just an empty cube with no data or surface (see attachment). What was my mistake?
"X" "Y" "Z" "Plot"
552032.707 413894.885 10.8 2
552033.707 413896.585 13.4 2
552036.907 413899.685 18.5 2
552039.307 413898.085 10.5 2
552039.807 413894.585 11.2 2
552044.107 413894.985 9 2
552044.007 413895.035 11.5 2
552043.607 413896.985 13.4 2
552047.407 413897.885 8.2 2
552045.207 413898.985 10.7 2
552042.307 413902.085 9.4 2
552040.907 413902.885 12.5 2
552036.607 413901.585 11.4 2
552036.207 413901.435 12.4 2
552039.907 413905.285 18 2
552036.707 413906.585 9.7 2
552037.407 413908.785 6.3 2
552038.907 413911.085 7.5 2
552039.607 413911.285 16.8 2
552041.107 413908.985 9.5 2
552041.307 413910.385 14.5 2
552042.207 413909.985 9.3 2
552050.707 413911.985 12.5 2
552048.907 413909.985 18.6 2
552044.507 413906.585 6.7 2
552047.807 413904.085 6.8 2
552048.007 413904.285 12.8 2
552050.407 413903.885 9.7 2
552049.107 413909.785 5.2 2
552050.507 413910.785 12.5 2
552052.407 413908.685 16.5 2
552057.907 413910.385 10.3 2
552058.707 413909.785 18.5 2
552058.907 413910.485 12.4 2
552059.707 413908.385 15.3 2
552060.307 413910.785 7.2 2
552061.207 413911.985 11.8 2
552071.007 413912.185 17 2
552068.707 413911.385 8.3 2
552069.107 413910.885 15.5 2
552068.607 413908.485 8 2
Try this to see why I don't think this data is well suited for wireframe:
library(lattice)
cloud(Z~X+Y, data=FI02, xlab="X", ylab="Y", main="Surface elevation",
type="l", screen=list(z=-60, x=-60))
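As far as I understand the lattice documentation, wireframe() expects z values on a regular grid, that is, at every combination of the x and y values. These survey points are scattered, with almost no repeated X or Y values, which would explain why only the empty bounding cube is drawn. If a smoothed surface is acceptable, one workaround is to fit a surface to the points and evaluate it on a regular grid. The sketch below assumes stats::loess() as the smoother (any 2D interpolator would do) and an arbitrary 30 x 30 grid; both are my choices, not part of the question:

```r
library(lattice)

# Survey points from the question
FI02 <- read.table(header = TRUE, text = "
X Y Z Plot
552032.707 413894.885 10.8 2
552033.707 413896.585 13.4 2
552036.907 413899.685 18.5 2
552039.307 413898.085 10.5 2
552039.807 413894.585 11.2 2
552044.107 413894.985 9 2
552044.007 413895.035 11.5 2
552043.607 413896.985 13.4 2
552047.407 413897.885 8.2 2
552045.207 413898.985 10.7 2
552042.307 413902.085 9.4 2
552040.907 413902.885 12.5 2
552036.607 413901.585 11.4 2
552036.207 413901.435 12.4 2
552039.907 413905.285 18 2
552036.707 413906.585 9.7 2
552037.407 413908.785 6.3 2
552038.907 413911.085 7.5 2
552039.607 413911.285 16.8 2
552041.107 413908.985 9.5 2
552041.307 413910.385 14.5 2
552042.207 413909.985 9.3 2
552050.707 413911.985 12.5 2
552048.907 413909.985 18.6 2
552044.507 413906.585 6.7 2
552047.807 413904.085 6.8 2
552048.007 413904.285 12.8 2
552050.407 413903.885 9.7 2
552049.107 413909.785 5.2 2
552050.507 413910.785 12.5 2
552052.407 413908.685 16.5 2
552057.907 413910.385 10.3 2
552058.707 413909.785 18.5 2
552058.907 413910.485 12.4 2
552059.707 413908.385 15.3 2
552060.307 413910.785 7.2 2
552061.207 413911.985 11.8 2
552071.007 413912.185 17 2
552068.707 413911.385 8.3 2
552069.107 413910.885 15.5 2
552068.607 413908.485 8 2
")

# Smooth surface through the scattered points (loess is an assumed choice)
fit <- loess(Z ~ X + Y, data = FI02)

# Regular 30 x 30 grid spanning the observed X and Y ranges
grid <- expand.grid(X = seq(min(FI02$X), max(FI02$X), length.out = 30),
                    Y = seq(min(FI02$Y), max(FI02$Y), length.out = 30))

# predict() on an expand.grid() result returns a matrix; flatten it to a column
grid$Z <- as.vector(predict(fit, newdata = grid))

p <- wireframe(Z ~ X * Y, data = grid, xlab = "X", ylab = "Y",
               main = "Surface elevation (loess fit)", drape = TRUE,
               colorkey = TRUE, screen = list(z = -60, x = -60))
print(p)
```

With only 41 irregularly spaced points, the fitted surface is heavily smoothed and can be misleading in the corners of the grid where there are no observations, so it is worth comparing it against the cloud() plot above.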