Where is my 'if-else' block going wrong? - r

I have a data frame df with the following data:
ID P1 P2 Year Month A B
11084 23 43 2001 April 41.9 -99.99
67985 76 12 2001 May 6.9 -9.99
11084 34 64 2001 June -999 -99.99
34084 56 77 2001 July NA -99.99
11043 90 54 2001 August NA -99.99
23084 55 32 2001 September 50.8 -99.99
11084 77 14 2001 October 0 -99.99
54328 89 56 2001 November -999 -99.99
I'm trying to add two new columns and fill 'Yes'/'No' values for the records with missing values. My expected output is:
ID P1 P2 Year Month A B A_miss B_miss
11084 23 43 2001 April 41.9 -99.99 No Yes
67985 76 12 2001 May 6.9 123 No No
11084 34 64 2001 June -999 -99.99 Yes Yes
34084 56 77 2001 July NA -99.99 Yes Yes
11043 90 54 2001 August NA -99.99 Yes Yes
23084 55 32 2001 September 50.8 -99.99 No Yes
11084 77 14 2001 October 0 -99.99 No Yes
54328 89 56 2001 November -999 -99.99 Yes Yes
I'm new to R. I was trying to achieve this using a simple for loop and if/else conditions in the following way:
for(i in length(df$A))
{
if(df$A[i] == -999 || df$A[i] == 'NA')
df$A_miss[i] <- 'Yes'
else
df$A_miss[i] <- 'No'
}
I first tried the loop on column 'A', but only the else branch executes every time, and the entire 'A_miss' column is filled with 'No'. I can't work out why the if branch isn't working.
Where am I going wrong?

Your loop is not correctly defined. This one works:
for (i in 1:length(df$A)) {
if(df$A[i] == -999 || is.na(df$A[i]) )
df$A_miss[i] <- 'Yes'
else
df$A_miss[i] <- 'No'
}
The range should be written as (i in 1:length(df$A)), not as (i in length(df$A)). Hope this helps.
PS: As you can see, the important correction pointed out by @Pascal has been implemented here.
PPS: The version below should be much faster than your code with the for loop:
df$A_miss <- 'No'
df$A_miss[which(df$A == -999 | is.na(df$A))] <- 'Yes'
(I just noticed that this solution is very similar to the one that had been suggested earlier by @Daniel Fischer)
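To see why the original loop misbehaves, the short sketch below may help. It uses a toy vector `A` copied from the question's column (the name `visited` is only illustrative) and shows that `length(A)` is a single number, so the loop body runs exactly once, and that comparing against the string 'NA' does not detect missing values.

```r
A <- c(41.9, 6.9, -999, NA, NA, 50.8, 0, -999)

# length(A) is the single number 8, so this loop visits i = 8 only
visited <- c()
for (i in length(A)) visited <- c(visited, i)
print(visited)

# seq_along(A) yields 1, 2, ..., 8 and is also safe when A is empty
print(seq_along(A))

# Comparing a missing value with the string "NA" yields NA, not TRUE,
# so is.na() is the reliable test for missingness
print(NA == "NA")
print(is.na(A))
```

`seq_along(df$A)` is often preferred over `1:length(df$A)` because the latter counts down from 1 to 0 when the vector is empty.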

A vectorized version:
df <- structure(list(ID = c(11084L, 67985L, 11084L, 34084L, 11043L,
23084L, 11084L, 54328L), P1 = c(23L, 76L, 34L, 56L, 90L, 55L,
77L, 89L), P2 = c(43L, 12L, 64L, 77L, 54L, 32L, 14L, 56L), Year = c(2001L,
2001L, 2001L, 2001L, 2001L, 2001L, 2001L, 2001L), Month = structure(c(1L,
5L, 4L, 3L, 2L, 8L, 7L, 6L), .Label = c("April", "August", "July",
"June", "May", "November", "October", "September"), class = "factor"),
A = c(41.9, 6.9, -999, NA, NA, 50.8, 0, -999), B = c(-99.99,
123, -99.99, -99.99, -99.99, -99.99, -99.99, -99.99), A_miss = c("No",
"No", "Yes", "Yes", "Yes", "No", "No", "Yes")), .Names = c("ID",
"P1", "P2", "Year", "Month", "A", "B", "A_miss"), row.names = c(NA,
-8L), class = "data.frame")
df$A_miss <- ifelse(df$A == -999 | is.na(df$A), "yes", "no")
df$B_miss <- ifelse(df$B == -99.99 | is.na(df$B), "yes", "no")
ID P1 P2 Year Month A B A_miss B_miss
1 11084 23 43 2001 April 41.9 -99.99 no yes
2 67985 76 12 2001 May 6.9 123.00 no no
3 11084 34 64 2001 June -999.0 -99.99 yes yes
4 34084 56 77 2001 July NA -99.99 yes yes
5 11043 90 54 2001 August NA -99.99 yes yes
6 23084 55 32 2001 September 50.8 -99.99 no yes
7 11084 77 14 2001 October 0.0 -99.99 no yes
8 54328 89 56 2001 November -999.0 -99.99 yes yes
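If more columns need the same treatment, the two `ifelse()` lines can be generalised. The following is only a sketch on toy data: the lookup vector `sentinels` is a hypothetical name, and it assumes each column has exactly one numeric code that stands for "missing".

```r
# Toy data with the same sentinel codes as in the question
df <- data.frame(A = c(41.9, -999, NA, 0),
                 B = c(-99.99, 123, -99.99, NA))

# Hypothetical lookup: column name -> value that encodes "missing"
sentinels <- c(A = -999, B = -99.99)

for (col in names(sentinels)) {
  # is.na() comes first so that TRUE | NA evaluates to TRUE for real NAs
  flagged <- is.na(df[[col]]) | df[[col]] == sentinels[[col]]
  df[[paste0(col, "_miss")]] <- ifelse(flagged, "Yes", "No")
}

print(df)
```

The loop here runs over column names, not rows, so the row-wise work stays vectorized.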

Maybe you could try this, without any loop or if clause:
df$A_miss <- "no"
df$A_miss[(df$A == -999) | (is.na(df$A))] <- "yes"

Using the which command might increase the speed of the process:
df$A_miss <- NA_character_
df$A_miss[which(df$A == -999 | is.na(df$A))] <- 'Yes'
df$A_miss[which(is.na(df$A_miss))] <- 'No'

Related

Calculating Percent Change in R for Multiple Variables

I'm trying to calculate percent change in R, where each time point is encoded in the column label (table below). I have dplyr loaded, and my dataset is loaded in R under the name data. Below is the code I'm using, but it's not calculating correctly. I want to create a new data frame called data_per_chg that contains, for each variable, the percent change from "v1". For instance, for the wbc variable, I would like to calculate the percent change of wbc.v1 from wbc.v1, wbc.v2 from wbc.v1, wbc.v3 from wbc.v1, etc., and do that for all the remaining variables in my dataset. I'm assuming I can use a loop to do this, but I'm pretty new to R, so I'm not sure how to proceed. Any guidance will be greatly appreciated.
id wbc.v1 wbc.v2 wbc.v3 rbc.v1 rbc.v2 rbc.v3 hct.v1 hct.v2 hct.v3
a1 23 63 30 23 56 90 13 89 47
a2 81 45 46 N/A 18 78 14 45 22
a3 NA 27 14 29 67 46 37 34 33
data_per_chg<-data%>%
group_by(id%>%
arrange(id)%>%
mutate(change=(wbc.v2-wbc.v1)/(wbc.v1))
data_per_chg
Assuming the NA values are all NA and no N/A
library(dplyr)
library(stringr)
data <- data %>%
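mutate(across(where(is.character), ~ na_if(.x, "N/A"))) %>% # column-wise; recent dplyr versions reject na_if() on a whole data frame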
type.convert(as.is = TRUE) %>%
mutate(across(-c(id, matches("\\.v1$")), ~ {
v1 <- get(str_replace(cur_column(), "v\\d+$", "v1"))
(.x - v1)/v1}, .names = "{.col}_change"))
-output
data
id wbc.v1 wbc.v2 wbc.v3 rbc.v1 rbc.v2 rbc.v3 hct.v1 hct.v2 hct.v3 wbc.v2_change wbc.v3_change rbc.v2_change rbc.v3_change hct.v2_change hct.v3_change
1 a1 23 63 30 23 56 90 13 89 47 1.7391304 0.3043478 1.434783 2.9130435 5.84615385 2.6153846
2 a2 81 45 46 NA 18 78 14 45 22 -0.4444444 -0.4320988 NA NA 2.21428571 0.5714286
3 a3 NA 27 14 29 67 46 37 34 33 NA NA 1.310345 0.5862069 -0.08108108 -0.1081081
If we want to keep the 'v1' columns as well
data %>%
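mutate(across(where(is.character), ~ na_if(.x, "N/A"))) %>% # column-wise; recent dplyr versions reject na_if() on a whole data frame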
type.convert(as.is = TRUE) %>%
mutate(across(ends_with('.v1'), ~ .x - .x,
.names = "{str_replace(.col, 'v1', 'v1change')}")) %>%
transmute(id, across(ends_with('change')),
across(-c(id, matches("\\.v1$"), ends_with('change')),
~ {
v1 <- get(str_replace(cur_column(), "v\\d+$", "v1"))
(.x - v1)/v1}, .names = "{.col}_change")) %>%
select(id, starts_with('wbc'), starts_with('rbc'), starts_with('hct'))
-output
id wbc.v1change wbc.v2_change wbc.v3_change rbc.v1change rbc.v2_change rbc.v3_change hct.v1change hct.v2_change hct.v3_change
1 a1 0 1.7391304 0.3043478 0 1.434783 2.9130435 0 5.84615385 2.6153846
2 a2 0 -0.4444444 -0.4320988 NA NA NA 0 2.21428571 0.5714286
3 a3 NA NA NA 0 1.310345 0.5862069 0 -0.08108108 -0.1081081
data
data <- structure(list(id = c("a1", "a2", "a3"), wbc.v1 = c(23L, 81L,
NA), wbc.v2 = c(63L, 45L, 27L), wbc.v3 = c(30L, 46L, 14L), rbc.v1 = c("23",
"N/A", "29"), rbc.v2 = c(56L, 18L, 67L), rbc.v3 = c(90L, 78L,
46L), hct.v1 = c(13L, 14L, 37L), hct.v2 = c(89L, 45L, 34L), hct.v3 = c(47L,
22L, 33L)), class = "data.frame", row.names = c(NA, -3L))
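For comparison, the same calculation can be sketched in base R without dplyr. This assumes the "N/A" strings have already been converted to NA and the columns are numeric; the toy data below is a subset of the question's table, and `value_cols`, `baseline_cols`, and `change` are illustrative names.

```r
data <- data.frame(
  id     = c("a1", "a2", "a3"),
  wbc.v1 = c(23, 81, NA), wbc.v2 = c(63, 45, 27), wbc.v3 = c(30, 46, 14),
  rbc.v1 = c(23, NA, 29), rbc.v2 = c(56, 18, 67), rbc.v3 = c(90, 78, 46)
)

value_cols <- setdiff(names(data), "id")
# For every column, the name of its own baseline, e.g. "wbc.v3" -> "wbc.v1"
baseline_cols <- sub("\\.v\\d+$", ".v1", value_cols)

# Percent change of each column relative to its baseline column
change <- Map(function(col, base) (data[[col]] - data[[base]]) / data[[base]],
              value_cols, baseline_cols)
names(change) <- paste0(value_cols, "_change")

data_per_chg <- cbind(data["id"], as.data.frame(change))
print(data_per_chg)
```

Rows whose baseline is NA come out as NA, which matches the output shown in the answer above.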

How to replace data in a column in R?

So I have a dataframe called "myData"
print(myData)
ID Name Status AGE
123 Mike Yes 18
124 John No 20
125 Lily Yes 21
126 Jasper No 24
127 Toby Yes 27
128 Will No 19
129 Oscar Yes 32
I received a second data frame, called "myData2", that contains updated "Status" values.
It has fewer observations than my original one and only has the ID and Status columns.
This is the updated data frame:
print(myData2)
ID Status
123 Yes
125 Yes
126 Yes
128 No
129 No
Is there a function that can update the 'Status' column in myData with the data in myData2, matching on the column "ID"?
This is my desired output:
ID Name Status AGE
123 Mike Yes 18
124 John No 20
125 Lily Yes 21
126 Jasper Yes 24
127 Toby Yes 27
128 Will No 19
129 Oscar No 32
We can use data.table join to quickly update the first dataset 'Status' with the values of second after joining on 'ID'
library(data.table)
setDT(myData)[myData2, Status := i.Status, on = .(ID)]
myData
# ID Name Status AGE
#1: 123 Mike Yes 18
#2: 124 John No 20
#3: 125 Lily Yes 21
#4: 126 Jasper Yes 24
#5: 127 Toby Yes 27
#6: 128 Will No 19
#7: 129 Oscar No 32
In dplyr, we do a left_join and then coalesce the 'Status' columns
library(dplyr)
myData %>%
left_join(myData2, by = 'ID') %>%
mutate(Status = coalesce(Status.y, Status.x)) %>%
select(-Status.x, -Status.y)
data
myData <- structure(list(ID = 123:129, Name = c("Mike", "John", "Lily",
"Jasper", "Toby", "Will", "Oscar"), Status = c("Yes", "No", "Yes",
"No", "Yes", "No", "Yes"), AGE = c(18L, 20L, 21L, 24L, 27L, 19L,
32L)), class = "data.frame", row.names = c(NA, -7L))
myData2 <- structure(list(ID = c(123L, 125L, 126L, 128L, 129L), Status = c("Yes",
"Yes", "Yes", "No", "No")), class = "data.frame", row.names = c(NA,
-5L))
Here is a base R solution using merge, i.e.,
myData$Status <- with(merge(myData,myData2,by = "ID",all.x = TRUE),
ifelse(is.na(Status.y),Status.x,Status.y))
such that
> myData
ID Name Status AGE
1 123 Mike Yes 18
2 124 John No 20
3 125 Lily Yes 21
4 126 Jasper Yes 24
5 127 Toby Yes 27
6 128 Will No 19
7 129 Oscar No 32
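Another base R possibility is `match()`, which looks up the row position of each updated ID and overwrites only those rows. The sketch below rebuilds the two data frames from the question; `idx` and `found` are illustrative names. Because `merge()` sorts its result by the key by default, this variant may be the safer choice when myData is not already ordered by ID.

```r
myData <- data.frame(
  ID = 123:129,
  Name = c("Mike", "John", "Lily", "Jasper", "Toby", "Will", "Oscar"),
  Status = c("Yes", "No", "Yes", "No", "Yes", "No", "Yes"),
  AGE = c(18L, 20L, 21L, 24L, 27L, 19L, 32L),
  stringsAsFactors = FALSE
)
myData2 <- data.frame(
  ID = c(123L, 125L, 126L, 128L, 129L),
  Status = c("Yes", "Yes", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# Row position in myData of every ID in myData2 (NA if an ID is unknown)
idx <- match(myData2$ID, myData$ID)
found <- !is.na(idx)

# Overwrite only the matched rows; all other rows keep their old Status
myData$Status[idx[found]] <- myData2$Status[found]
print(myData)
```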

How to loop variables from a data.frame into another into a single column

I am trying to extract only 32 specific species from the data.frame dat and create another data.frame in which all species are stacked into a single column, with the year, values, and temperature each in a single column as well. I also want a column giving the month that each temperature belongs to.
An example of data.frame:
structure(list(Year = c(1994L, 1995L, 1996L, 1997L, 1998L, 1999L,
2000L, 2002L, 2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 2009L,
2010L, 2011L, 2012L, 2013L), Species = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
), .Label = "Blackbird", class = "factor"), Farmland = c(96.0309523809524,
96.8520833333333, 96.781746031746, 96.8597222222222, 97.4410299003322,
96.6654846335697, 96.858803986711, 97.0811403508772, 96.9259974259974,
97.2803571428571, 96.6017598343685, 96.3777777777778, 96.3227670288895,
96.8100546279118, 96.431746031746, 96.6232323232323, 96.2537878787879,
96.1431827431827, 96.0778288740245), X.Jan. = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L), .Label = "Jan", class = "factor"), atwo.TempJanuary = c(5.06916107894286,
4.390669300225, 3.88357903166667, 1.80642228995455, 5.16489863448837,
5.54367179174468, 4.83031500397674, 5.40830211455263, 4.26790743608108,
4.927588606725, 5.841963431, 4.3303368412, 7.08188921457143,
6.75067792993878, 2.83417096753488, 1.36880495640909, 4.35569636247727,
5.82305364068889, 3.52697043756522)), row.names = c(NA, -19L), class = "data.frame")
An extra example (This is the original data.frame dat):
structure(list(Year = c(2006L, 2007L, 1999L, 2004L, 1995L, 2011L,
2011L), Species = structure(c(2L, 4L, 3L, 6L, 2L, 5L, 1L), .Label = c("Buzzard",
"Collared Dove", "Greenfinch", "Linnet", "Meadow Pipit", "Willow Warbler"
), class = "factor"), TempJanuary = c(2.128387049, 4.233225712,
5.270967624, 4.826451505, 4.390322483, 3.841290237, 3.981290234
), TempFebruary = c(0.927499979, 3.098928502, 4.67428561, 5.05103437,
6.343214144, 6.414285571, 6.625356995), TempMarch = c(1.637741899,
3.22096767, 7.312257901, 6.444515985, 5.337096655, 6.787741784,
7.052903068), TempApril = c(4.877333224, 5.888999868, 9.510666454,
9.386333124, 9.005333132, 12.40966639, 12.50166639), TempMay = c(8.729999805,
7.748064343, 13.09096745, 12.1638707, 11.68935458, 12.83032229,
13.07967713), TempJune = c(11.48033308, 11.20633308, 13.91166636,
15.77399965, 14.05266635, 14.30733301, 14.56133301), TempJuly = c(14.86354805,
11.9338707, 17.85612863, 16.44451576, 18.92935442, 15.53612868,
15.75161255), TempAugust = c(12.45225779, 11.48419329, 16.54935447,
18.31516088, 19.22483828, 15.80225771, 16.08387061), TempSeptember = c(13.45633303,
10.09333311, 15.94333298, 15.27299966, 13.52733303, 15.41933299,
15.68566632), TempOctober = c(10.24387074, 7.462903059, 10.5161288,
10.84709653, 13.05967713, 12.67774165, 12.83967713), TempNovember = c(4.650999896,
3.614999919, 7.246333171, 7.388666502, 7.455999833, 9.371333124,
9.511333121), TempDecember = c(3.764516045, 2.116774146, 4.268064421,
4.825161182, 2.01741931, 5.582903101, 5.701290195), Farmland = c(100L,
100L, 40L, 90L, 80L, 10L, 80L)), row.names = c(1L, 100L, 1000L,
2000L, 3000L, 5000L, 10000L), class = "data.frame")
Another look into the data.frame:
'data.frame': 19 obs. of 5 variables:
$ Year : int 1994 1995 1996 1997 1998 1999 2000 2002 2003 2004 ...
$ Species : Factor w/ 1 level "Blackbird": 1 1 1 1 1 1 1 1 1 1 ...
$ Farmland : num 96 96.9 96.8 96.9 97.4 ...
$ X.Jan. : Factor w/ 1 level "Jan": 1 1 1 1 1 1 1 1 1 1 ...
$ atwo.TempJanuary: num 5.07 4.39 3.88 1.81 5.16 ...
A deeper look into dat:
Year Species TempJanuary TempFebruary TempMarch TempApril
1 2006 Collared Dove 2.128387 0.927500 1.637742 4.877333
100 2007 Linnet 4.233226 3.098929 3.220968 5.889000
1000 1999 Greenfinch 5.270968 4.674286 7.312258 9.510666
2000 2004 Willow Warbler 4.826452 5.051034 6.444516 9.386333
3000 1995 Collared Dove 4.390322 6.343214 5.337097 9.005333
5000 2011 Meadow Pipit 3.841290 6.414286 6.787742 12.409666
10000 2011 Buzzard 3.981290 6.625357 7.052903 12.501666
TempMay TempJune TempJuly TempAugust TempSeptember TempOctober
1 8.730000 11.48033 14.86355 12.45226 13.45633 10.243871
100 7.748064 11.20633 11.93387 11.48419 10.09333 7.462903
1000 13.090967 13.91167 17.85613 16.54935 15.94333 10.516129
2000 12.163871 15.77400 16.44452 18.31516 15.27300 10.847097
3000 11.689355 14.05267 18.92935 19.22484 13.52733 13.059677
5000 12.830322 14.30733 15.53613 15.80226 15.41933 12.677742
10000 13.079677 14.56133 15.75161 16.08387 15.68567 12.839677
TempNovember TempDecember Farmland
1 4.651000 3.764516 100
100 3.615000 2.116774 100
1000 7.246333 4.268064 40
2000 7.388667 4.825161 90
3000 7.456000 2.017419 80
5000 9.371333 5.582903 10
10000 9.511333 5.701290 80
And here are a few examples of the code I have been using to get here:
#Blackbird population-------------------------------------------------------------
Black_Bird<-aggregate(Farmland ~ Year + Species + TempJanuary, dat[dat$Species=="Blackbird" & dat$Farmland >80,],mean)
Black_bird <- ddply(Black_Bird, .(Year, Species, TempJanuary), Farmland=round(mean(Farmland), 2))
aone<-aggregate(Farmland ~ Year + Species, Black_bird, mean)
atwo<-aggregate(TempJanuary ~ Year + Species, Black_bird, mean)
aone<-aone[, -2]
#Buzzard Population-----------
Buzzard_Bird <-aggregate(Farmland ~ Year + Species + TempJanuary, dat[dat$Species=="Buzzard" & dat$Farmland >80,],mean)
Buzzard_bird <- ddply(Buzzard_Bird, .(Year, Species, TempJanuary), Farmland=round(mean(Farmland), 2))
athree<-aggregate(Farmland ~ Year + Species, Buzzard_bird, mean)
afour<-aggregate(TempJanuary ~ Year + Species, Buzzard_bird, mean)
athree<-athree[, -2]
#Combine and melt into single columns-----------------------------------------------------
mod1<-cbind(atwo, afour, aone, athree)
melt(mod1, id.vars = c("Year", "Farmland", "Species"), measure.vars = c("TempJanuary"), variable.name = "Month", value.name = "Temperature" )
melt has not been working as intended: it doesn't place Buzzard into the same column as Blackbird, and it stops at 19 rows. This approach also seems ineffective and time-consuming. Is there a faster, more efficient solution?
This is what it should look like:
Year Species Farmland Month Temperature
2008 Blackbird 83.0 Jan 9.011174
2009 Blackbird 83.0 Jan 10.155201
2012 Greenfinch 83.0 Feb 9.578269
2009 Swallow 83.0 Mar 10.361573
2010 Robin 84.5 Oct 9.191641
I have 32 Species to select from:
[1] Dunnock Blackbird Song Thrush Bullfinch
[5] Corn Bunting Turtle Dove Grey Partridge Yellow Wagtail
[9] Starling Linnet Yellowhammer Skylark
[13] Kestrel Reed Bunting Whitethroat Greenfinch
[17] Rook Stock dove Goldfinch Woodpigeon
[21] Jackdaw House martin Swallow Lapwing
[25] Wren Robin Blue Tit Great tit
[29] Long-tailed Tit Chaffinch Buzzard Sparrowhawk
32 Levels: Blackbird Blue Tit Bullfinch Buzzard ... Yellowhammer
And 12 months of temperature from Jan-December.
This is some earlier code that led me in the wrong direction:
library(psych)
dat_two <- aggregate(Farmland ~ Species + Year + TempJanuary + TempFebruary + TempMarch + TempApril + TempMay + TempJune + TempJuly + TempAugust + TempSeptember + TempOctober + TempNovember + TempDecember, dat[dat$Species %in% c('Starling', 'Skylark', 'Yellow Wagtail', 'Kestrel', 'Yellowhammer', 'Greenfinch', 'Swallow', 'Lapwing', 'House Martin', 'Long-tailed Tit', 'Linnet', 'Grey Partridge', 'Turtle Dove', 'Corn Bunting', 'Bullfinch', 'Song Thrush', 'Blackbird', 'Dunnock', 'Whitethroat', 'Rook', 'Woodpigeon', 'Reed Bunting', 'Stock Dove', 'Goldfinch', 'Jackdaw', 'Wren', 'Robin', 'Blue Tit', 'Great Tit', 'Chaffinch', 'Buzzard', 'Sparrowhawk') & dat$Farmland >80,], mean)
dat_three <- aggregate(Farmland ~ Species + Year + TempJanuary + TempFebruary + TempMarch + TempApril + TempMay + TempJune + TempJuly + TempAugust + TempSeptember + TempOctober + TempNovember + TempDecember , dat_two, mean)
colnames(dat_two) <- c("Species", "Year", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec", "Farmland")
library(plyr)
dat_one <- ddply(dat_three, .(Species, Year, TempJanuary, TempFebruary, TempMarch, TempApril, TempMay, TempJune, TempJuly, TempAugust, TempSeptember, TempOctober, TempNovember, TempDecember), summarise, mean = round(mean(Farmland), 2))
#-----------------------------------------------------------------
Jan_Year <- ddply(dat_one, .(Year), summarise, TempJanuary=round(geometric.mean(TempJanuary, na.rm=TRUE), 2))
Feb_Year <- ddply(dat_one, .(Year), summarise, TempFebruary=round(geometric.mean(TempFebruary, na.rm=TRUE), 2))
Mar_Year <- ddply(dat_one, .(Year), summarise, TempMarch=round(geometric.mean(TempMarch, na.rm=TRUE), 2))
Apr_Year <- ddply(dat_one, .(Year), summarise, TempApril=round(geometric.mean(TempApril, na.rm=TRUE), 2))
May_Year <- ddply(dat_one, .(Year), summarise, TempMay=round(geometric.mean(TempMay, na.rm=TRUE), 2))
Jun_Year <- ddply(dat_one, .(Year), summarise, TempJune=round(geometric.mean(TempJune, na.rm=TRUE), 2))
Jul_Year <- ddply(dat_one, .(Year), summarise, TempJuly=round(geometric.mean(TempJuly, na.rm=TRUE), 2))
Aug_Year <- ddply(dat_one, .(Year), summarise, TempAugust=round(geometric.mean(TempAugust, na.rm=TRUE), 2))
Sep_Year <- ddply(dat_one, .(Year), summarise, TempSeptember=round(geometric.mean(TempSeptember, na.rm=TRUE), 2))
Oct_Year <- ddply(dat_one, .(Year), summarise, TempOctober=round(geometric.mean(TempOctober, na.rm=TRUE), 2))
Nov_Year <- ddply(dat_one, .(Year), summarise, TempNovember=round(geometric.mean(TempNovember, na.rm=TRUE), 2))
Dec_Year <- ddply(dat_one, .(Year), summarise, TempDecember=round(geometric.mean(TempDecember, na.rm=TRUE), 2))
Farm_Year <- ddply(dat_one, .(Year), summarise, Farmland=round(geometric.mean(mean, na.rm=TRUE), 2))
Farm_Temp <- cbind(Farm_Year, Jan_Year, Feb_Year, Mar_Year, Apr_Year,May_Year, Jun_Year, Jul_Year, Aug_Year, Sep_Year, Oct_Year, Nov_Year, Dec_Year)
Farm_Temp <- Farm_Temp[, !duplicated(colnames(Farm_Temp))]
colnames(Farm_Temp) <- c("Year", "Farmland", "Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
Farm_Temp <- Farm_Temp[, -2]
#-----------------------------
Spring <- aggregate((TempMarch + TempApril + TempMay)/3~Year, Farm_Temp, mean)
Summer <- aggregate((TempJune + TempJuly + TempAugust)/3 ~ Year, Farm_Temp, geometric.mean)
Autumn <- aggregate((TempSeptember + TempOctober+TempNovember)/3~Year, Farm_Temp, geometric.mean)
Winter <- aggregate((TempDecember + TempJanuary + TempFebruary)/3~Year, Farm_Temp, geometric.mean)
Season_Temp <- cbind(Farm_Year, Spring,Summer, Autumn, Winter)
Season_Temp <- Season_Temp[, !duplicated(colnames(Season_Temp))]
colnames(Season_Temp) <- c("Year", "Farmland", "spring", "Summer", "Autumn", "Winter")
#--------------------------------------------------------------------------------------------------------------
library(reshape2)
Season_practice <- aggregate((Mar+ Apr + May)/3 ~ Year + Species + Farmland, dat_two, geometric.mean)
prac1 <- aggregate((Jun+ Jul + Aug)/3 ~ Year + Species + Farmland, dat_two, geometric.mean)
prac1 <- prac1[, c(-1, -2, -3)]
prac2 <- aggregate((Sep + Oct + Nov)/3 ~ Year + Species + Farmland, dat_two, geometric.mean)
prac2 <- prac2[, c(-1, -2, -3)]
prac3 <- aggregate((Dec+ Jan + Feb)/3 ~ Year + Species + Farmland, dat_two, geometric.mean)
prac3 <- prac3[, c(-1, -2, -3)]
Season_practice <- cbind(Season_practice, prac1, prac2, prac3)
colnames(Season_practice) <- c("Year", "Species", "Farmland", "Spring", "Summer", "Autumn", "Winter")
Seasonal_Temp <- melt(Season_practice, id.vars = c("Year", "Species", "Farmland"), measure = c("Spring", "Summer", "Autumn", "Winter"), variable.name = "Month", value.name = "Temperature")
Practicing_Temp <- melt(dat_two, id.vars = c("Year", "Species"), measure = c('Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'), variable.name = "Month", value.name = "Temperature")
This is an extract from a larger data.frame produced in an attempt to do what was mentioned. As you can see, the values repeat across the seasons, which shouldn't be the case because the months have different values, so I must have gone wrong somewhere along the way:
Year Species Month Farmland
1 1994 Blackbird Spring 95.96875
2 1995 Blackbird Spring 95.46875
3 1996 Blackbird Spring 95.64815
4 1997 Blackbird Spring 95.62071
5 1998 Blackbird Spring 95.71925
6 1999 Blackbird Spring 95.74444
7 2000 Blackbird Spring 95.82440
8 2002 Blackbird Spring 95.78333
9 2003 Blackbird Spring 95.61640
10 2004 Blackbird Spring 95.86797
11 2005 Blackbird Spring 95.08452
12 2006 Blackbird Spring 94.66667
13 2007 Blackbird Spring 95.60745
14 2008 Blackbird Spring 93.98383
15 2009 Blackbird Spring 95.08167
16 2010 Blackbird Spring 95.23426
17 2011 Blackbird Spring 95.25000
18 2012 Blackbird Spring 94.75204
19 2013 Blackbird Spring 94.28821
20 1994 Blackbird Summer 95.96875
21 1995 Blackbird Summer 95.46875
22 1996 Blackbird Summer 95.64815
23 1997 Blackbird Summer 95.62071
24 1998 Blackbird Summer 95.71925
25 1999 Blackbird Summer 95.74444
26 2000 Blackbird Summer 95.82440
27 2002 Blackbird Summer 95.78333
28 2003 Blackbird Summer 95.61640
29 2004 Blackbird Summer 95.86797
30 2005 Blackbird Summer 95.08452
31 2006 Blackbird Summer 94.66667
32 2007 Blackbird Summer 95.60745
33 2008 Blackbird Summer 93.98383
34 2009 Blackbird Summer 95.08167
35 2010 Blackbird Summer 95.23426
36 2011 Blackbird Summer 95.25000
37 2012 Blackbird Summer 94.75204
38 2013 Blackbird Summer 94.28821
39 1994 Blackbird Autumn 95.96875
40 1995 Blackbird Autumn 95.46875
41 1996 Blackbird Autumn 95.64815
42 1997 Blackbird Autumn 95.62071
43 1998 Blackbird Autumn 95.71925
44 1999 Blackbird Autumn 95.74444
45 2000 Blackbird Autumn 95.82440
46 2002 Blackbird Autumn 95.78333
47 2003 Blackbird Autumn 95.61640
48 2004 Blackbird Autumn 95.86797
49 2005 Blackbird Autumn 95.08452
50 2006 Blackbird Autumn 94.66667
51 2007 Blackbird Autumn 95.60745
52 2008 Blackbird Autumn 93.98383
53 2009 Blackbird Autumn 95.08167
54 2010 Blackbird Autumn 95.23426
55 2011 Blackbird Autumn 95.25000
56 2012 Blackbird Autumn 94.75204
57 2013 Blackbird Autumn 94.28821
58 1994 Blackbird Winter 95.96875
59 1995 Blackbird Winter 95.46875
60 1996 Blackbird Winter 95.64815
61 1997 Blackbird Winter 95.62071
62 1998 Blackbird Winter 95.71925
63 1999 Blackbird Winter 95.74444
64 2000 Blackbird Winter 95.82440
65 2002 Blackbird Winter 95.78333
66 2003 Blackbird Winter 95.61640
67 2004 Blackbird Winter 95.86797
68 2005 Blackbird Winter 95.08452
69 2006 Blackbird Winter 94.66667
70 2007 Blackbird Winter 95.60745
71 2008 Blackbird Winter 93.98383
72 2009 Blackbird Winter 95.08167
73 2010 Blackbird Winter 95.23426
74 2011 Blackbird Winter 95.25000
75 2012 Blackbird Winter 94.75204
76 2013 Blackbird Winter 94.28821
Consider reshape to re-structure your data from wide to long format and then aggregate by year, month, or assigned season.
Input
Year,Species,TempJanuary,TempFebruary,TempMarch,TempApril,TempMay,TempJune,TempJuly,TempAugust,TempSeptember,TempOctober,TempNovember,TempDecember,Farmland
2006,Collared Dove,2.128387,0.9275,1.637742,4.877333,8.73,11.48033,14.86355,12.45226,13.45633,10.243871,4.651,3.764516,100
2007,Linnet,4.233226,3.098929,3.220968,5.889,7.748064,11.20633,11.93387,11.48419,10.09333,7.462903,3.615,2.116774,100
1999,Greenfinch,5.270968,4.674286,7.312258,9.510666,13.090967,13.91167,17.85613,16.54935,15.94333,10.516129,7.246333,4.268064,40
2004,Willow Warbler,4.826452,5.051034,6.444516,9.386333,12.163871,15.774,16.44452,18.31516,15.273,10.847097,7.388667,4.825161,90
1995,Collared Dove,4.390322,6.343214,5.337097,9.005333,11.689355,14.05267,18.92935,19.22484,13.52733,13.059677,7.456,2.017419,80
2011,Meadow Pipit,3.84129,6.414286,6.787742,12.409666,12.830322,14.30733,15.53613,15.80226,15.41933,12.677742,9.371333,5.582903,10
2011,Buzzard,3.98129,6.625357,7.052903,12.501666,13.079677,14.56133,15.75161,16.08387,15.68567,12.839677,9.511333,5.70129,80
R
bird_df = read.csv(...)
# RESHAPE WIDE TO LONG
r_df <- reshape(bird_df, varying = colnames(bird_df)[3:14], times = colnames(bird_df)[3:14],
v.names = "Temperature", timevar = "Month",
new.row.names = 1:1E5, direction = "long")
# ASSIGN COLUMNS
r_df$Month <- factor(substr(gsub("Temp", "", r_df$Month), 1, 3), levels = month.abb)
r_df$Season <- ifelse(r_df$Month %in% c("Mar", "Apr", "May"), "Spring",
ifelse(r_df$Month %in% c("Jun", "Jul", "Aug"), "Summer",
ifelse(r_df$Month %in% c("Sep", "Oct", "Nov"), "Autumn",
ifelse(r_df$Month %in% c("Dec", "Jan", "Feb"), "Winter", NA)
)
)
)
# RE-ORDER ROWS
r_df <- data.frame(with(r_df, r_df[order(Year, Month, Species),]),
row.names = NULL)
Output
head(r_df)
# Year Species Farmland Month Temperature id Season
# 1 1995 Collared Dove 80 Jan 4.390322 5 Winter
# 2 1995 Collared Dove 80 Feb 6.343214 5 Winter
# 3 1995 Collared Dove 80 Mar 5.337097 5 Spring
# 4 1995 Collared Dove 80 Apr 9.005333 5 Spring
# 5 1995 Collared Dove 80 May 11.689355 5 Spring
# 6 1995 Collared Dove 80 Jun 14.052670 5 Summer
# ...
aggregate(cbind(Temperature, Farmland) ~ Species + Year, r_df, mean)
# Year Species Temperature Farmland
# 1 2011 Buzzard 11.114639 80
# 2 1995 Collared Dove 10.419384 80
# 3 2006 Collared Dove 7.434402 100
# 4 1999 Greenfinch 10.512513 40
# 5 2007 Linnet 6.841882 100
# ...
aggregate(cbind(Temperature, Farmland) ~ Species + Year + Month, r_df, mean)
# Year Month Species Temperature Farmland
# 1 2011 Jan Buzzard 3.981290 80
# 2 2011 Feb Buzzard 6.625357 80
# 3 2011 Mar Buzzard 7.052903 80
# 4 2011 Apr Buzzard 12.501666 80
# 5 2011 May Buzzard 13.079677 80
# ...
aggregate(cbind(Temperature, Farmland) ~ Species + Year + Season, r_df, mean)
# Species Year Season Temperature Farmland
# 1 Collared Dove 1995 Autumn 11.347669 80
# 2 Greenfinch 1999 Autumn 11.235264 40
# 3 Willow Warbler 2004 Autumn 11.169588 90
# 4 Collared Dove 2006 Autumn 9.450400 100
# 5 Linnet 2007 Autumn 7.057078 100
# ...
I think this is what you're asking for. You need to have the tidyverse installed:
library('tidyverse')
dat %>%
pivot_longer(matches('Temp'),
names_to = 'Month',
values_to = 'Temp',
names_prefix = 'Temp')
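One likely reason the question's attempt stops at 19 rows is that `cbind()` places the per-species frames side by side, whereas stacking species into one column requires row-binding. The base R sketch below filters first and then stacks one block of rows per month. The data values are invented for illustration, and `wanted`, `keep`, `temp_cols`, and `long` are hypothetical names; only the column layout follows dat.

```r
# Toy data in the same wide layout as dat (values are made up)
dat <- data.frame(
  Year = c(2008L, 2008L, 2009L, 2009L),
  Species = c("Blackbird", "Buzzard", "Blackbird", "Magpie"),
  TempJanuary = c(4, 4, 5, 5),
  TempFebruary = c(6, 6, 7, 7),
  Farmland = c(90, 85, 95, 99),
  stringsAsFactors = FALSE
)

wanted <- c("Blackbird", "Buzzard")  # stands in for the list of 32 species

# One filter for all species instead of one block of code per species
keep <- dat[dat$Species %in% wanted & dat$Farmland > 80, ]

# Stack one block of rows per month column (rbind, not cbind)
temp_cols <- grep("^Temp", names(keep), value = TRUE)
long <- do.call(rbind, lapply(temp_cols, function(col) {
  data.frame(Year = keep$Year,
             Species = keep$Species,
             Farmland = keep$Farmland,
             Month = substr(sub("^Temp", "", col), 1, 3),
             Temperature = keep[[col]],
             stringsAsFactors = FALSE)
}))
print(long)
```

With the real data, `wanted` would hold the 32 species names, and the result could then be passed to `aggregate()` as in the reshape answer.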

Months to integer R

This is part of the dataframe I am working on. The first column represents the year, the second the month, and the third one the number of observations for that month of that year.
2005 07 2
2005 10 4
2005 12 2
2006 01 4
2006 02 1
2006 07 2
2006 08 1
2006 10 3
I have observations from 2000 to 2018. I would like to run a kernel regression on this data, so I need to create a continuous integer index from a date-class vector. For instance, Jan 2000 would be 1, Jan 2001 would be 13, Jan 2002 would be 25, and so on. With that I will be able to run the kernel regression. Later on, I need to translate the index back (1 would be Jan 2000, 2 would be Feb 2000, and so on) to plot my model.
Just use a little algebra:
df$cont <- (df$year - 2000L) * 12L + df$month
You could go backward with modulus and integer division.
df$year <- (df$cont - 1L) %/% 12L + 2000L # shift to 0-based first so that December stays in its own year
df$month <- (df$cont - 1L) %% 12L + 1L # maps 0..11 back to 1..12
Here, %% is the modulus operator and %/% is the integer division operator. See ?"%%" for an explanation of these and other arithmetic operators.
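As a quick check of this arithmetic, the sketch below wraps both directions in two helper functions and verifies the round trip over all 228 months from 2000 to 2018. The names `to_index` and `from_index` are hypothetical, and the 2000 origin is taken from the question. The index is shifted to 0-based before dividing, which is one way to keep December in the correct year.

```r
# Year/month -> running month index (Jan of the origin year is 1)
to_index <- function(year, month, origin = 2000L) {
  (year - origin) * 12L + month
}

# Running month index -> year/month; subtracting 1 first makes the
# index 0-based, so index 12 maps to December 2000 and not to 2001
from_index <- function(idx, origin = 2000L) {
  list(year  = (idx - 1L) %/% 12L + origin,
       month = (idx - 1L) %% 12L + 1L)
}

idx <- 1:228
back <- from_index(idx)

# Converting back and forth should reproduce every index
stopifnot(all(to_index(back$year, back$month) == idx))

print(to_index(c(2000L, 2001L, 2002L), 1L))
print(from_index(12L))
```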
What you can do is something like the following. First create a dates data.frame with expand.grid so we have all the years and months from 2000 01 to 2018 12. Next put this in the correct order and last add an order column so that 2000 01 starts with 1 and 2018 12 is 228. If you merge this with your original table you get the below result. You can then remove columns you don't need. And because you have a dates table you can return the year and month columns based on the order column.
dates <- expand.grid(year = seq(2000, 2018), month = seq(1, 12))
dates <- dates[order(dates$year, dates$month), ]
dates$order <- seq_along(dates$year)
merge(df, dates, by.x = c("year", "month"), by.y = c("year", "month"))
year month obs order
1 2005 10 4 70
2 2005 12 2 72
3 2005 7 2 67
4 2006 1 4 73
5 2006 10 3 82
6 2006 2 1 74
7 2006 7 2 79
8 2006 8 1 80
data:
df <- structure(list(year = c(2005L, 2005L, 2005L, 2006L, 2006L, 2006L, 2006L, 2006L),
month = c(7L, 10L, 12L, 1L, 2L, 7L, 8L, 10L),
obs = c(2L, 4L, 2L, 4L, 1L, 2L, 1L, 3L)),
class = "data.frame",
row.names = c(NA, -8L))
An option is to use the yearmon type from the zoo package and calculate the number of months elapsed since Jan 2000 as the difference between two yearmon values.
library(zoo)
# +1 has been added to the difference so that Jan 2000 is treated as 1
df$slNum = (as.yearmon(paste0(df$year, df$month),"%Y%m")-as.yearmon("200001","%Y%m"))*12+1
# year month obs slNum
# 1 2005 7 2 67
# 2 2005 10 4 70
# 3 2005 12 2 72
# 4 2006 1 4 73
# 5 2006 2 1 74
# 6 2006 7 2 79
# 7 2006 8 1 80
# 8 2006 10 3 82
Data:
df <- read.table(text =
"year month obs
2005 07 2
2005 10 4
2005 12 2
2006 01 4
2006 02 1
2006 07 2
2006 08 1
2006 10 3",
header = TRUE, stringsAsFactors = FALSE)
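For the plotting step, the index can also be turned back into real dates with base R alone. This is a minimal sketch that assumes index 1 corresponds to January 2000, as in the question; `month_starts` and `dates` are illustrative names, and the indices are taken from the output above.

```r
idx <- c(67L, 70L, 72L, 73L)

# First day of every month, starting at the origin month (index 1)
month_starts <- seq(as.Date("2000-01-01"), by = "month",
                    length.out = max(idx))

# Pick the month that belongs to each index
dates <- month_starts[idx]
print(format(dates, "%Y-%m"))
```

The resulting Date vector can be passed directly to plotting functions as the x-axis variable.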

Transform Year-to-date to Quarterly data with data.table

Quarterly data from a data provider has the issue that for some variables the quarterly data values are actually Year-to-date figures. That means the values are the sum of all previous quarters (Q2 = Q1 + Q2 , Q3 = Q1 + Q2 + Q3, ...).
The structure of the original data looks like the following:
library(data.table)
library(plyr)
dt.quarter.test <- structure(list(Year = c(2000L, 2000L, 2000L, 2000L, 2001L, 2001L, 2001L, 2001L)
, Quarter = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L)
, Data.Year.to.Date = c(162, 405, 610, 938, 331, 1467, 1981, 2501))
, .Names = c("Year", "Quarter", "Data.Year.to.Date"), class = c("data.table", "data.frame"), row.names = c(NA, -8L))
In order to calculate the quarterly values I therefore need to subtract the previous Quarter from Q2, Q3 and Q4.
I've managed to get the desired results by using the ddply function from the plyr package.
dt.quarter.result <- ddply(dt.quarter.test, "Year"
, transform
, Data.Quarterly = Data.Year.to.Date - shift(Data.Year.to.Date, n = 1L, type = "lag", fill = 0))
dt.quarter.result
Year Quarter Data.Year.to.Date Data.Quarterly
1 2000 1 162 162
2 2000 2 405 243
3 2000 3 610 205
4 2000 4 938 328
5 2001 1 331 331
6 2001 2 1467 1136
7 2001 3 1981 514
8 2001 4 2501 520
But I am not really happy with the command, since it seems quite clumsy and I would like to get some input on how to improve it and especially do it directly within the data.table.
Here is the data.table syntax, and you might find data.table cheat sheet helpful:
library(data.table)
dt.quarter.test[, Data.Quarterly := Data.Year.to.Date - shift(Data.Year.to.Date, fill = 0), Year][]
# Year Quarter Data.Year.to.Date Data.Quarterly
# 1: 2000 1 162 162
# 2: 2000 2 405 243
# 3: 2000 3 610 205
# 4: 2000 4 938 328
# 5: 2001 1 331 331
# 6: 2001 2 1467 1136
# 7: 2001 3 1981 514
# 8: 2001 4 2501 520
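For comparison, the same de-accumulation can be sketched in base R with `ave()` and `diff()`, without data.table or plyr. The data frame below copies the question's values into a plain data.frame, and the sketch assumes the rows are already ordered by quarter within each year.

```r
df <- data.frame(
  Year = rep(c(2000L, 2001L), each = 4),
  Quarter = rep(1:4, times = 2),
  Data.Year.to.Date = c(162, 405, 610, 938, 331, 1467, 1981, 2501)
)

# Within each year, prepend 0 and take successive differences:
# Q1 stays as it is, Q2..Q4 become the increments over the previous quarter
df$Data.Quarterly <- ave(df$Data.Year.to.Date, df$Year,
                         FUN = function(x) diff(c(0, x)))
print(df)
```

If the rows might arrive unordered, sorting by Year and Quarter first would be needed for the differences to be meaningful.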
