Calculating the change in % of data by year - r

I am trying to calculate the percentage change by year in the following dataset. Does anyone know if this is possible?
I have the year-to-year difference, but I am unsure how to turn it into a percentage:
diff(economy_df_by_year$gdp_per_capita)
df
year gdp
1998 8142.
1999 8248.
2000 8211.
2001 7926.
2002 8366.
2003 10122.
2004 11493.
2005 12443.
2006 13275.
2007 15284.

Assuming that gdp is the total value, you could do something like this:
library(tidyverse)
tribble(
~year, ~gdp,
1998, 8142,
1999, 8248,
2000, 8211,
2001, 7926,
2002, 8366,
2003, 10122,
2004, 11493,
2005, 12443,
2006, 13275,
2007, 15284
) -> df
df |>
  mutate(pdiff = 100 * (gdp - lag(gdp)) / lag(gdp)) # change relative to the previous year
#> # A tibble: 10 × 3
#> year gdp pdiff
#> <dbl> <dbl> <dbl>
#> 1 1998 8142 NA
#> 2 1999 8248 1.30
#> 3 2000 8211 -0.449
#> 4 2001 7926 -3.47
#> 5 2002 8366 5.55
#> 6 2003 10122 21.0
#> 7 2004 11493 13.5
#> 8 2005 12443 8.27
#> 9 2006 13275 6.69
#> 10 2007 15284 15.1
This relies on the tidyverse framework.
If gdp is already the difference, you will need the total to get a percentage, if that is what you mean by percentage change by year.

df$change <- NA
df$change[2:10] <- (df$gdp[2:10] - df$gdp[1:9]) / df$gdp[1:9]
This assigns the yearly GDP growth, as a fraction of the previous year's value, to each row except the first, which remains NA. Multiply by 100 to get a percentage.

df$diff <- c(0,diff(df$gdp))
df$percentDiff <- 100*(c(0,(diff(df$gdp)))/(df$gdp - df$diff))
This is another possibility. Note that the first row gets 0 here rather than NA.
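A minimal base R sketch of the same calculation, assuming the percentage is meant relative to the previous year's value. `pct_change` is a hypothetical helper name, not a function from any package:

```r
# Hypothetical helper: percent change of each element relative to the one before it.
pct_change <- function(x) {
  # diff(x) has length(x) - 1 elements; head(x, -1) drops the last value,
  # so each difference is divided by the value it started from.
  c(NA, 100 * diff(x) / head(x, -1))
}

gdp <- c(8142, 8248, 8211, 7926, 8366, 10122, 11493, 12443, 13275, 15284)
growth <- pct_change(gdp)

# The first year has no predecessor, so its change is NA.
print(round(growth, 2))
```

Padding with NA keeps the result the same length as the input, so it can be assigned straight to a data frame column.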

Related

Find average change in timeseries

I have an annual mean timeseries dataset for 15 years, and I am trying to find the average change/increase/decrease in this timeseries.
The timeseries I have is spatial (average values for each grid-cell/pixel, years repeat).
How can I do this in R via dplyr?
Sample data
year = c(2005, 2005, 2005, 2005, 2006, 2006, 2006, 2006, 2007, 2007, 2007, 2007, 2008, 2008, 2008, 2008)
Tmean = c(24, 24.5, 25.8,25, 24.8, 25, 23.5, 23.8, 24.8, 25, 25.2, 25.8, 25.3, 25.6, 25.2, 25)
Code
library(tidyverse)
df = data.frame(year, Tmean)
change = df$year %>%
# Sort by year
arrange(year) %>%
mutate(Diff_change = Tmean - lag(Tmean), # Difference in Tmean between years
Rate_percent = (Diff_change / year)/Tmean * 100) # Percent change # **returns inf values**
Average_change = mean(change$Rate_percent, na.rm = TRUE)
To find the average: mean(). To find the differences or changes: diff()
So, to find the average change:
> avg_change <- mean(diff(Tmean))
> print(avg_change)
[1] 0.06666667
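One property worth knowing here: consecutive differences telescope, so the mean of diff() depends only on the first and last values. A small check, using the sample Tmean vector from the question:

```r
Tmean <- c(24, 24.5, 25.8, 25, 24.8, 25, 23.5, 23.8,
           24.8, 25, 25.2, 25.8, 25.3, 25.6, 25.2, 25)

avg_change <- mean(diff(Tmean))

# The sum of consecutive differences collapses to (last - first).
n <- length(Tmean)
endpoint_change <- (Tmean[n] - Tmean[1]) / (n - 1)

print(all.equal(avg_change, endpoint_change))
```

This means the values in between do not influence the result at all, which may or may not be what is wanted.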
If you need that as a percentage, express each difference (this year minus last year) relative to last year's value, like so:
> pct_change <- Tmean[2:length(Tmean)] / Tmean[1:(length(Tmean)-1)] - 1
> avg_pct_change <- mean(pct_change) * 100
> print(avg_pct_change)
[1] 0.3101632
We can put those vectors into a data frame to use with dplyr (...if that's how you want to do it; this is straightforward with base R as well).
library(dplyr)
df <- data.frame(year, Tmean)
change <- df %>%
  arrange(year) %>%
  mutate(Diff_change = Tmean - lag(Tmean), # Difference in Tmean between consecutive rows
         # Percent change relative to the previous row. Dividing by year - lag(year)
         # as well would give Inf/NaN here, because years repeat in the updated data.
         Rate_percent = Diff_change / lag(Tmean) * 100)
Average_change = mean(change$Rate_percent, na.rm = TRUE)
Results (with updated question data)
> change
year Tmean Diff_change Rate_percent
1 2005 24.0 NA NA
2 2005 24.5 0.5 2.0833333
3 2005 25.8 1.3 5.3061224
4 2005 25.0 -0.8 -3.1007752
5 2006 24.8 -0.2 -0.8000000
6 2006 25.0 0.2 0.8064516
7 2006 23.5 -1.5 -6.0000000
8 2006 23.8 0.3 1.2765957
9 2007 24.8 1.0 4.2016807
10 2007 25.0 0.2 0.8064516
11 2007 25.2 0.2 0.8000000
12 2007 25.8 0.6 2.3809524
13 2008 25.3 -0.5 -1.9379845
14 2008 25.6 0.3 1.1857708
15 2008 25.2 -0.4 -1.5625000
16 2008 25.0 -0.2 -0.7936508
> Average_change
[1] 0.3101632
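Because each year appears several times in the sample data (one row per grid cell), the row-to-row differences above mix changes between cells with changes between years. A possible alternative, sketched here in base R under the assumption that the year-over-year change of the annual means is what is wanted, is to average within each year first:

```r
year <- c(2005, 2005, 2005, 2005, 2006, 2006, 2006, 2006,
          2007, 2007, 2007, 2007, 2008, 2008, 2008, 2008)
Tmean <- c(24, 24.5, 25.8, 25, 24.8, 25, 23.5, 23.8,
           24.8, 25, 25.2, 25.8, 25.3, 25.6, 25.2, 25)
df <- data.frame(year, Tmean)

# One row per year: the mean over all grid cells in that year.
annual <- aggregate(Tmean ~ year, data = df, FUN = mean)
annual <- annual[order(annual$year), ]

# Percent change of each annual mean relative to the previous year's mean.
prev <- c(NA, head(annual$Tmean, -1))
annual$Rate_percent <- (annual$Tmean - prev) / prev * 100

Average_change <- mean(annual$Rate_percent, na.rm = TRUE)
print(annual)
print(Average_change)
```

If a per-cell trend is needed instead, the data would also need a cell identifier to group by, which the sample does not include.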

R: How do I avoid getting an error when merging two data frames (group by/summarise)?

I have a big data frame of 80,000 rows. It was created by combining individual data frames from different years. The origin variable indicates the year of the entry's original data frame.
Here is an example of the first few of the big data frame rows that show how data frames from 2003 and 2011 were combined.
df_1:
ID City State origin
1 NY NY 2003
2 NY NY 2003
3 SF CA 2003
1 NY NY 2011
3 SF CA 2011
2 NY NY 2011
4 LA CA 2011
5 SD CA 2011
Now I want to create a new variable called first_appearance that takes the min of the origin variable for each ID:
final_df:
ID City State origin first_appearance
1 NY NY 2003 2003
2 NY NY 2003 2003
3 SF CA 2003 2003
1 NY NY 2011 2003
3 SF CA 2011 2003
2 NY NY 2011 2003
4 LA CA 2011 2011
5 SD CA 2011 2011
So far, I've tried using:
prestep_final <- df_1 %>% group_by(ID) %>% summarise(first_appearance = min(origin))
final_df <- merge(prestep_final, df_1, by = "ID")
The prestep_final step works and produces a data frame with the ID and the first_appearance.
Unfortunately, the merge step doesn't work and yields a data frame with NA entries only.
How can I improve my code so that I can produce a table like final_df above? I'd appreciate any suggestions and don't have package preferences.
If you change summarise to mutate you get your desired result without merging:
library(tidyverse)
df <- tibble::tribble(
~ID, ~City, ~State, ~origin,
1, 'NY', 'NY', 2003,
2, 'NY', 'NY', 2003,
3, 'SF', 'CA', 2003,
1, 'NY', 'NY', 2011,
3, 'SF', 'CA', 2011,
2, 'NY', 'NY', 2011,
4, 'LA', 'CA', 2011,
5, 'SD', 'CA', 2011
)
df %>%
  group_by(ID) %>%
  mutate(first_appearance = min(origin))
#> # A tibble: 8 x 5
#> # Groups: ID [5]
#> ID City State origin first_appearance
#> <dbl> <chr> <chr> <dbl> <dbl>
#> 1 1 NY NY 2003 2003
#> 2 2 NY NY 2003 2003
#> 3 3 SF CA 2003 2003
#> 4 1 NY NY 2011 2003
#> 5 3 SF CA 2011 2003
#> 6 2 NY NY 2011 2003
#> 7 4 LA CA 2011 2011
#> 8 5 SD CA 2011 2011
Created on 2020-06-10 by the reprex package (v0.3.0)
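The reason this works: summarise() returns one row per group, whereas mutate() keeps every row and repeats the group-level value on each of them. A small sketch on a cut-down data frame (the three rows are made up for illustration):

```r
library(dplyr)

df_small <- data.frame(ID = c(1, 2, 1), origin = c(2003, 2003, 2011))

# summarise(): collapses each ID to a single row.
collapsed <- df_small %>%
  group_by(ID) %>%
  summarise(first_appearance = min(origin))

# mutate(): keeps all rows; ungroup() drops the grouping afterwards.
kept <- df_small %>%
  group_by(ID) %>%
  mutate(first_appearance = min(origin)) %>%
  ungroup()

print(nrow(collapsed))
print(nrow(kept))
```

Adding ungroup() at the end is optional, but it avoids later operations silently running per ID.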
An option with data.table
library(data.table)
setDT(df)[, first_appearance := min(origin), ID]
Or in base R
df$first_appearance <- with(df, ave(origin, ID, FUN = min))
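For comparison, the two-step approach from the question (aggregate, then merge) can also be sketched in base R. On the sample rows this merge returns the expected values; why the original merge produced NA entries cannot be determined from what the question shows.

```r
df_1 <- data.frame(
  ID     = c(1, 2, 3, 1, 3, 2, 4, 5),
  City   = c("NY", "NY", "SF", "NY", "SF", "NY", "LA", "SD"),
  State  = c("NY", "NY", "CA", "NY", "CA", "NY", "CA", "CA"),
  origin = c(2003, 2003, 2003, 2011, 2011, 2011, 2011, 2011)
)

# Step 1: one row per ID with the earliest origin.
first_seen <- aggregate(origin ~ ID, data = df_1, FUN = min)
names(first_seen)[names(first_seen) == "origin"] <- "first_appearance"

# Step 2: join back onto the full data; merge() sorts the result by ID.
final_df <- merge(df_1, first_seen, by = "ID")
print(final_df)
```

Unlike the ave() one-liner, this changes the row order, because merge() sorts by the key column by default.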

Select unique entries showing at least one value from another column

I have the following dataset (32000 entries) of water chemical compounds annual means organized by monitoring sites and sampling year:
data= data.frame(Site_ID=c(1, 1, 1, 2, 2, 2, 3, 3, 3), Year=c(1976, 1977, 1978, 2004, 2005, 2006, 2003, 2004, 2005), AnnualMean=c(1.1, 1.2, 1.1, 2.1, 2.6, 3.1, 2.7, 2.6, 1.9))
Site_ID Year AnnualMean
1 1976 1.1
1 1977 1.2
1 1978 1.1
2 2004 2.1
2 2005 2.6
2 2006 3.1
3 2003 2.7
3 2004 2.6
3 2005 1.9
I would like to select the data only from monitoring sites that have at least one measurement in 2005 in their time range. With the above dataset, the expected output dataset would be:
Site_ID Year AnnualMean
2 2004 2.1
2 2005 2.6
2 2006 3.1
3 2003 2.7
3 2004 2.6
3 2005 1.9
I am completely new to R and have been struggling with data manipulation, so thank you in advance!
With dplyr:
library(dplyr)
data %>%
  group_by(Site_ID) %>%
  filter(2005 %in% Year)
Here is a base R solution, using subset + ave (ave returns 0/1 here, so it is converted back to logical):
dfout <- subset(data, as.logical(ave(Year, Site_ID, FUN = function(x) 2005 %in% x)))
such that
> dfout
Site_ID Year AnnualMean
4 2 2004 2.1
5 2 2005 2.6
6 2 2006 3.1
7 3 2003 2.7
8 3 2004 2.6
9 3 2005 1.9
An option with data.table
library(data.table)
setDT(data)[, .SD[2005 %in% Year], Site_ID]
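Another way to see what these one-liners do, as a base R sketch: first collect the IDs of the sites that have a 2005 row, then keep every row belonging to those sites. `sites_2005` is just an illustrative variable name.

```r
data <- data.frame(
  Site_ID    = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  Year       = c(1976, 1977, 1978, 2004, 2005, 2006, 2003, 2004, 2005),
  AnnualMean = c(1.1, 1.2, 1.1, 2.1, 2.6, 3.1, 2.7, 2.6, 1.9)
)

# Sites with at least one measurement in 2005.
sites_2005 <- unique(data$Site_ID[data$Year == 2005])

# Keep all years for those sites, not only the 2005 rows.
dfout <- data[data$Site_ID %in% sites_2005, ]
print(dfout)
```

Splitting it into two steps also makes it easy to inspect which sites qualified before filtering.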

Conversion of monthly data to yearly data in a dataframe in r

I have a dataframe showing monthly mgpp from 2000-2010:
dataframe1
Year Month mgpp
1: 2000 1 0.01986404
2: 2000 2 0.011178429
3: 2000 3 0.02662008
4: 2000 4 0.05034293
5: 2000 5 0.23491388
---
128: 2010 8 0.13234501
129: 2010 9 0.10432369
130: 2010 10 0.04329537
131: 2010 11 0.04343289
132: 2010 12 0.09494946
I am trying to convert dataframe1 into a raster that shows the variable mgpp. However, I first want to reformat the data frame so that it contains only the yearly mgpp. The expected outcome is shown below:
dataframe1
Year mgpp
1: 2000 0.01986704
2: 2001 0.01578429
3: 2002 0.02662328
4: 2003 0.05089593
5: 2004 0.07491388
6: 2005 0.11229201
7: 2006 0.10318569
8: 2007 0.07129537
9: 2008 0.04373689
10: 2009 0.02885386
11: 2010 0.74848348
I want to aggregate the months by mean. For instance, the value for 2000 should be a single value: the mean from January to December of 2000. How can I achieve this? Help would be appreciated.
Here is a data.table approach.
library(data.table)
setDT(dataframe1)[,.(Yearly.mgpp = mean(mgpp)),by=Year]
Year Yearly.mgpp
1: 2000 0.06858387
2: 2010 0.08366928
Or if you prefer dplyr.
library(dplyr)
dataframe1 %>%
  group_by(Year) %>%
  summarise(Yearly.mgpp = mean(mgpp))
# A tibble: 2 x 2
Year Yearly.mgpp
<dbl> <dbl>
1 2000 0.0686
2 2010 0.0837
Or base R.
result <- sapply(split(dataframe1$mgpp,dataframe1$Year),mean)
data.frame(Year = as.numeric(names(result)),Yearly.mgpp = result)
Year Yearly.mgpp
2000 2000 0.06858387
2010 2010 0.08366928
Sample Data
dataframe1 <- structure(list(Year = c(2000, 2000, 2000, 2000, 2000, 2010, 2010,
2010, 2010, 2010), Month = c(1, 2, 3, 4, 5, 8, 9, 10, 11, 12),
mgpp = c(0.01986404, 0.011178429, 0.02662008, 0.05034293,
0.23491388, 0.13234501, 0.10432369, 0.04329537, 0.04343289,
0.09494946)), class = "data.frame", row.names = c(NA, -10L
))
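Base R's aggregate() gives the same yearly means in one call. The sketch below also counts the months per year, on the assumption that it is worth checking for incomplete years before trusting an annual mean (the sample data has only five months per year):

```r
dataframe1 <- data.frame(
  Year  = c(2000, 2000, 2000, 2000, 2000, 2010, 2010, 2010, 2010, 2010),
  Month = c(1, 2, 3, 4, 5, 8, 9, 10, 11, 12),
  mgpp  = c(0.01986404, 0.011178429, 0.02662008, 0.05034293, 0.23491388,
            0.13234501, 0.10432369, 0.04329537, 0.04343289, 0.09494946)
)

# Mean mgpp per year.
yearly <- aggregate(mgpp ~ Year, data = dataframe1, FUN = mean)

# Number of monthly values that went into each mean.
counts <- aggregate(Month ~ Year, data = dataframe1, FUN = length)
names(counts)[names(counts) == "Month"] <- "n_months"

yearly <- merge(yearly, counts, by = "Year")
print(yearly)
```

A year with fewer than 12 months could then be flagged or dropped before building the raster.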

Column manipulation in R - matching correct names

I have a data.frame composed of multiple columns and thousands of rows. Below is its head():
|year |state_name|idealPoint| vote_no| vote_yes|
|:--------------|---------:|---------:|---------:|---------:|
|1971 | China | -25.0000| 31.0000| 45.4209|
|1972 | China | -26.2550| 38.2974| 45.4209|
|1973 | China | 28.2550| 35.2974| 45.4209|
|1994 | Czech | 27.2550| 34.2974| 45.4209|
As you can see, not all countries (there are 196 of them) joined voting at the UN in the same year.
What I want to do is create a new column in my data.frame (votes) that holds the absolute difference between China's ideal point and the Czech ideal point for a given year. I know how to create the new column with dplyr, but how do I match the correct countries from the list of 196? (Years in which only one of the two countries had joined can then be deleted manually, I think.)
The final output should be a new data.frame (or new columns in votes) looking like this, supposing, for instance, that China's ideal point in 1994 was 2.2550:
|year |state_name|idealPoint|Abs.Difference China_Czech|
|:--------------|---------:|---------:|-------------------------:|
|1971 | China | -25.0000| NA |
|1972 | China | -26.2550| NA |
|1973 | China | 28.2550| NA |
|1994 | Czech | 27.2550| 25.0000 |
Code:
df1 <- data.frame(year = c(1994,1995,1996,1997,1994,1995,1996,1997),
state_name = c("China","China","China","China","Czech_Republic","Czech_Republic","Czech_Republic","Czech_Republic"),
idealpoints = c(-25.0000,-26.2550,28.2550,27.2550,-27.0000,-28.2550,29.2550,22.2550),
vote_no = c(31.0000,38.2974,35.2974,34.2974,33.0000,36.2974,37.2974,38.2974),
vote_yes = c(45.4209,45.4209,45.4209,45.4209,45.4209,45.4209,45.4209,45.4209))
china_df <- df1[df1$state_name == "China",]
czech_df <- df1[df1$state_name == "Czech_Republic",]
china_czech_merge <- merge(china_df,czech_df,by = "year")
china_czech_merge$Abs_diff <- abs(china_czech_merge$idealpoints.x - china_czech_merge$idealpoints.y)
Output:
year state_name.x idealpoints.x vote_no.x vote_yes.x state_name.y idealpoints.y vote_no.y vote_yes.y Abs_diff
1 1994 China -25.000 31.0000 45.4209 Czech_Republic -27.000 33.0000 45.4209 2
2 1995 China -26.255 38.2974 45.4209 Czech_Republic -28.255 36.2974 45.4209 2
3 1996 China 28.255 35.2974 45.4209 Czech_Republic 29.255 37.2974 45.4209 1
4 1997 China 27.255 34.2974 45.4209 Czech_Republic 22.255 38.2974 45.4209 5
I think this will work for you.
Thanks
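To avoid repeating these steps for each pair among the 196 countries, the merge can be wrapped in a function. This is a sketch: `abs_diff_by_year` and its arguments are hypothetical names, and it assumes the column names used in df1 above. Because merge() performs an inner join by default, years in which only one of the two countries voted drop out on their own.

```r
# Hypothetical helper: absolute ideal-point difference between countries a and b, per year.
abs_diff_by_year <- function(df, a, b) {
  left  <- df[df$state_name == a, c("year", "idealpoints")]
  right <- df[df$state_name == b, c("year", "idealpoints")]
  # Inner join on year: only years present for both countries survive.
  both <- merge(left, right, by = "year", suffixes = c("_a", "_b"))
  both$abs_diff <- abs(both$idealpoints_a - both$idealpoints_b)
  both
}

# Same values as df1 above, plus a made-up China-only row for 1993
# to show that unmatched years are dropped.
df1 <- data.frame(
  year = c(1993, 1994, 1995, 1996, 1997, 1994, 1995, 1996, 1997),
  state_name = c(rep("China", 5), rep("Czech_Republic", 4)),
  idealpoints = c(-24, -25, -26.255, 28.255, 27.255,
                  -27, -28.255, 29.255, 22.255)
)

res <- abs_diff_by_year(df1, "China", "Czech_Republic")
print(res)
```

Calling the helper with other country names gives the same table for any other pair.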
Does this perhaps solve your problem?
library(tibble)
library(dplyr)
a <- tribble(
~year, ~ctry, ~vote,
1994, "China", 5,
1995, "China", 100,
1996, "China", 600,
1997, "China", 45,
1998, "China", 9,
1994, "Czech_Republic", 1,
1995, "Czech_Republic", 5,
1996, "Czech_Republic", 100,
1997, "Czech_Republic", 40,
1998, "Czech_Republic", 6,
)
a %>%
  group_by(year) %>%
  mutate(foo = abs(vote - lag(vote)))
Output:
# A tibble: 10 x 4
# Groups: year [5]
year ctry vote foo
<dbl> <chr> <dbl> <dbl>
1 1994 China 5 NA
2 1995 China 100 NA
3 1996 China 600 NA
4 1997 China 45 NA
5 1998 China 9 NA
6 1994 Czech_Republic 1 4
7 1995 Czech_Republic 5 95
8 1996 Czech_Republic 100 500
9 1997 Czech_Republic 40 5
10 1998 Czech_Republic 6 3
You'll have to filter down the data to fit your needs, e.g. by country.
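The lag-based approach depends on the row order inside each year. An order-independent sketch, assuming tidyr is available, is to spread the countries into columns first and subtract by name (the data below is a shortened copy of the example above):

```r
library(dplyr)
library(tidyr)

a <- data.frame(
  year = c(1994, 1995, 1994, 1995),
  ctry = c("China", "China", "Czech_Republic", "Czech_Republic"),
  vote = c(5, 100, 1, 5)
)

wide <- a %>%
  # One row per year, one column per country.
  pivot_wider(names_from = ctry, values_from = vote) %>%
  mutate(abs_diff = abs(China - Czech_Republic))

print(wide)
```

Years in which one of the two countries has no row end up with NA in that country's column, and therefore NA in abs_diff.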

Resources