How can I add new variable with MUTATE: growth rate? - r

I haven't coded for several months and now am stuck with the following issue.
I have the following dataset:
Year World_export China_exp World_import China_imp
1 1992 3445.534 27.7310 3402.505 6.2220
2 1993 1940.061 27.8800 2474.038 18.3560
3 1994 2458.337 39.6970 2978.314 3.3270
4 1995 4641.168 15.9790 5504.787 18.0130
5 1996 5680.688 74.1650 6939.291 25.1870
6 1997 7206.604 70.2440 8639.422 31.9030
7 1998 7069.725 99.6510 8530.293 41.5030
8 1999 5916.077 169.4593 6673.743 37.8139
9 2000 7331.588 136.2180 8646.253 47.3789
10 2001 7471.374 143.0542 8292.893 41.2899
11 2002 8074.975 217.4286 9092.341 46.4730
12 2003 9956.433 162.2522 11558.007 71.7753
13 2004 13751.671 282.8678 16345.452 157.0768
14 2005 15976.238 430.8655 16708.094 284.1065
15 2006 19728.935 398.6704 22344.856 553.6356
16 2007 24275.244 484.5276 28693.113 815.7914
17 2008 32570.781 613.3714 39381.251 1414.8120
18 2009 21282.228 173.9463 28563.576 1081.3720
19 2010 25283.462 475.7635 34884.450 1684.0839
20 2011 41418.670 636.5881 45759.051 2193.8573
21 2012 46027.529 432.6025 46404.382 2373.4535
22 2013 37132.301 460.7133 43022.550 2829.3705
23 2014 36046.461 640.2552 40502.268 2373.2351
24 2015 26618.982 781.0016 30264.299 2401.1907
25 2016 23537.354 472.7022 27609.884 2129.4806
What I need is simple: to compute the growth rate of each variable, that is, take the difference between two consecutive elements, divide it by the first of the two and multiply by 100.
I'm trying to write a script, but it ends with an error message:
trade_Ch %>%
mutate (
World_exp_grate = sapply(2:nrow(trade_Ch),function(i)((World_export[i]-World_export[i-1])/World_export[i-1]))
)
Error in mutate_impl(.data, dots) : Column World_exp_grate must
be length 25 (the number of rows) or one, not 24
although this piece of code gives me right values:
x <- sapply(2:nrow(trade_Ch),function(i)((trade_Ch$World_export[i]-trade_Ch$World_export[i-1])/trade_Ch$World_export[i-1]))
How can I correctly embed this code into the mutate() call from the dplyr package?
OR
Is there another elegant way to solve this issue?

library(dplyr)
df %>%
mutate_each(funs(chg = ((.-lag(.))/lag(.))*100), World_export:China_imp)
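mutate_each() and funs() are deprecated in recent dplyr releases. Here is a minimal sketch of the same idea with across(), assuming dplyr 1.0.0 or later. The data frame holds only the first four rows and two columns of the question's data, and the _chg suffix is an arbitrary naming choice:

```r
library(dplyr)

# First four rows of the question's data, two columns only
trade_Ch <- data.frame(
  Year         = 1992:1995,
  World_export = c(3445.534, 1940.061, 2458.337, 4641.168),
  China_exp    = c(27.7310, 27.8800, 39.6970, 15.9790)
)

growth <- trade_Ch %>%
  mutate(across(
    World_export:China_exp,
    ~ (.x - lag(.x)) / lag(.x) * 100,  # percent change from the previous row
    .names = "{.col}_chg"              # keep the originals, add *_chg columns
  ))
```

The first row of every *_chg column is NA because there is no previous year to compare against.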

trade_Ch %>%
mutate(world_exp_grate = 100*(World_export - lag(World_export))/lag(World_export))

The problem is that you cannot calculate the World_exp_grate for your first row. Therefore you have to set it to NA.
One variant to solve this is
trade_Ch %>%
mutate (World_export_lag = lag(World_export),
World_exp_grate = (World_export - World_export_lag)/World_export_lag) %>%
select(-World_export_lag)
lag shifts the vector by one position.
lag(1:5)
# [1] NA 1 2 3 4
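The error in the question arises because sapply(2:nrow(trade_Ch), ...) returns one value fewer than there are rows. As a sketch, the same calculation can be padded with a leading NA in base R. Here x is a made-up vector standing in for one column:

```r
# Made-up values standing in for a column such as World_export
x <- c(100, 110, 99)

# diff(x) gives x[i] - x[i-1]; head(x, -1) drops the last element,
# leaving the previous-period values used as the denominator
growth <- c(NA, diff(x) / head(x, -1) * 100)

length(growth) == length(x)  # same length as x, so it fits inside mutate()
```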

Related

SQL `lead()` equivalent in R

I want to do something like LEAD(mes) OVER(PARTITION BY CODIGO_CLIENTE ORDER BY mes) mes_2 in R, but I don't know a similar function.
I have no clue how to work it out.
Since you shared no data or desired output, here is an example with lead() from the dplyr package. The example is from the help page of lead(). It should give you a good idea of what you can do with this function.
df <- data.frame(year = 2000:2005, value = (0:5) ^ 2)
scrambled <- df[sample(nrow(df)), ]
year value
1 2000 0
5 2004 16
3 2002 4
4 2003 9
2 2001 1
6 2005 25
right <- mutate(scrambled, `next` = lead(value, order_by = year))
arrange(right, year)
year value next
1 2000 0 1
2 2001 1 4
3 2002 4 9
4 2003 9 16
5 2004 16 25
6 2005 25 NA
Since you're new to R I suggest you read a bit on the dplyr package. Also, to make it easier for the people trying to help you, please provide more details next time!
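For the PARTITION BY part of the SQL expression, one possible translation is to combine group_by() with lead(). This is only a sketch: the column names CODIGO_CLIENTE and mes come from the question, while the data frame clientes and its values are invented for illustration.

```r
library(dplyr)

# Hypothetical data; only the column names come from the question
clientes <- data.frame(
  CODIGO_CLIENTE = c("A", "A", "A", "B", "B"),
  mes            = c(3, 1, 2, 5, 4)
)

resultado <- clientes %>%
  group_by(CODIGO_CLIENTE) %>%                   # PARTITION BY CODIGO_CLIENTE
  mutate(mes_2 = lead(mes, order_by = mes)) %>%  # LEAD(mes) OVER (... ORDER BY mes)
  ungroup()
```

Within each client, mes_2 holds the next month in ascending order, and the last month of each client gets NA.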

Transpose column and group dataframe [duplicate]

This question already has answers here:
How to reshape data from long to wide format
I'm trying to change a dataframe in R to group multiple rows by a measurement. The table has a location (km), a size (mm), a count of things in that size bin, a site and a year. I want to take the sizes, make a column from each one (2, 4 and 6 in this example), and place the corresponding count into the row for that location, site and year.
It seems like a combination of transposing and grouping, but I can't figure out a way to accomplish this in R. I've looked at t(), dcast() and aggregate(), but those aren't really close at all.
So I would go from something like this:
df <- data.frame(km=c(rep(32,3),rep(50,3)), mm=rep(c(2,4,6),2), count=sample(1:25,6), site=rep("A", 6), year=rep(2013, 6))
km mm count site year
1 32 2 18 A 2013
2 32 4 2 A 2013
3 32 6 12 A 2013
4 50 2 3 A 2013
5 50 4 17 A 2013
6 50 6 21 A 2013
To this:
km site year mm_2 mm_4 mm_6
1 32 A 2013 18 2 12
2 50 A 2013 3 17 21
Edit: I tried the solution in a suggested duplicate, but it did not work for me; I'm not really sure why. The answer below worked better.
As suggested in the comment above, we can use the sep argument in spread:
library(tidyr)
spread(df, mm, count, sep = "_")
km site year mm_2 mm_4 mm_6
1 32 A 2013 4 20 1
2 50 A 2013 15 14 22
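spread() has since been superseded in tidyr by pivot_wider(). Here is a sketch of the equivalent call, assuming tidyr 1.0.0 or later. It uses the counts printed in the question rather than sample(), so the result is reproducible:

```r
library(tidyr)

# Counts copied from the question's printed table instead of sample()
df <- data.frame(km    = rep(c(32, 50), each = 3),
                 mm    = rep(c(2, 4, 6), 2),
                 count = c(18, 2, 12, 3, 17, 21),
                 site  = "A",
                 year  = 2013)

wide <- pivot_wider(df,
                    names_from   = mm,     # one new column per size bin
                    values_from  = count,  # filled with the matching count
                    names_prefix = "mm_")  # column names mm_2, mm_4, mm_6
```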
As you mentioned dcast(), here is a method using it.
set.seed(1)
df <- data.frame(km=c(rep(32,3),rep(50,3)),
mm=rep(c(2,4,6),2),
count=sample(1:25,6),
site=rep("A", 6),
year=rep(2013, 6))
library(reshape2)
dcast(df, ... ~ mm, value.var="count")
# km site year 2 4 6
# 1 32 A 2013 13 10 20
# 2 50 A 2013 3 17 1
And if you want a bit of a challenge you can try the base function reshape().
df2 <- reshape(df, v.names="count", idvar="km", timevar="mm", ids="mm", direction="wide")
colnames(df2) <- sub("count.", "mm_", colnames(df2), fixed = TRUE)
df2
# km site year mm_2 mm_4 mm_6
# 1 32 A 2013 13 10 20
# 4 50 A 2013 3 17 1

How to 'stretch' the cell of a column from a data frame in R

'stretch' may not be the most suitable way to put it, but I can't come up with any other word.
I have a data frame like this :
var1 <- c(rep(0, each=9),1999,rep(0, each=9),2000,rep(0, each=9),2001)
var2 <- c(rnorm(n=30))
df1 <- data.frame(var1,var2)
What I want to do is replace every 0 in the column var1 with the next nonzero number encountered in the column. Hence I want something like:
var1 <- c(rep(1999, each=10),rep(2000, each=10),rep(2001, each=10))
var2 <- c(rnorm(n=30))
df2 <- data.frame(var1,var2)
With var2 having specific and ordered values I don't want to move around.
The thing is, the data frame is 500 000 rows long, so I would rather not have to look up the row number of every var1 value different from 0.
(it's likely that such question has been asked before, but since I couldn't find another word than 'stretch'...)
One way using na.locf from zoo:
library(zoo)
#convert zeros to NA in order to use na.locf afterwards
df1$var1[df1$var1 == 0] <- NA
#fromLast carries the observations backwards
df1$var1 <- na.locf(df1$var1, fromLast = TRUE)
Out:
> df1
var1 var2
1 1999 -0.04750614
2 1999 -0.35462388
3 1999 0.30700748
4 1999 1.09506443
5 1999 -0.61049306
6 1999 0.66687294
7 1999 0.54623236
8 1999 -0.04848903
9 1999 -0.56502719
10 1999 0.08067966
11 2000 -0.05474748
12 2000 0.27380898
13 2000 -0.21283353
14 2000 -0.89820808
15 2000 -0.18752047
16 2000 0.21827094
17 2000 0.56370895
18 2000 -1.21738551
19 2000 -0.61426847
20 2000 -1.34144736
21 2001 -0.52697208
22 2001 0.90209640
23 2001 -0.52040468
24 2001 -0.37432746
25 2001 -0.21218776
26 2001 0.88372231
27 2001 0.54274394
28 2001 0.06127087
29 2001 0.04263164
30 2001 0.52294204
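If you already work with the tidyverse, a similar result can be sketched with na_if() from dplyr and fill() from tidyr instead of zoo. The set.seed() call is an addition here and only serves to make var2 reproducible:

```r
library(dplyr)
library(tidyr)

set.seed(1)  # only so that var2 is reproducible
df1 <- data.frame(
  var1 = c(rep(0, 9), 1999, rep(0, 9), 2000, rep(0, 9), 2001),
  var2 = rnorm(30)
)

filled <- df1 %>%
  mutate(var1 = na_if(var1, 0)) %>%  # zeros become NA
  fill(var1, .direction = "up")      # each NA takes the next non-NA value below it
```

Only var1 is touched; var2 keeps its values and its order.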

How to calculate the exponential in some columns of a dataframe in R?

I have a dataframe:
X Year Dependent.variable.1 Forecast.Dependent.variable.1
1 2009 12.42669703 12.41831191
2 2010 12.39309563 12.40043599
3 2011 12.36596964 12.38256006
4 2012 12.32067284 12.36468414
5 2013 12.303095 12.34680822
6 2014 NA 12.32893229
7 2015 NA 12.31105637
8 2016 NA 12.29318044
9 2017 NA 12.27530452
10 2018 NA 12.25742859
I want to calculate the exponential of the third and fourth columns. How can I do that?
In case your dataframe is called dfs, you can do the following:
dfs[c('Dependent.variable.1','Forecast.Dependent.variable.1')] <- exp(dfs[c('Dependent.variable.1','Forecast.Dependent.variable.1')])
which gives you:
X Year Dependent.variable.1 Forecast.Dependent.variable.1
1 1 2009 249371 247288.7
2 2 2010 241131 242907.5
3 3 2011 234678 238603.9
4 4 2012 224285 234376.5
5 5 2013 220377 230224.0
6 6 2014 NA 226145.1
7 7 2015 NA 222138.5
8 8 2016 NA 218202.9
9 9 2017 NA 214336.9
10 10 2018 NA 210539.5
In case you know the column numbers, this could then also simply be done by using:
dfs[,3:4] <- exp(dfs[,3:4])
which gives you the same result as above. I usually prefer to use the actual column names as the indices might change when the data frame is further processed (e.g. I delete columns, then the indices change).
Or you could do:
dfs$Dependent.variable.1 <- exp(dfs$Dependent.variable.1)
dfs$Forecast.Dependent.variable.1 <- exp(dfs$Forecast.Dependent.variable.1)
In case you want to store these columns in new variables (below they are called exp1 and exp2, respectively), you can do:
exp1 <- exp(dfs$Dependent.variable.1)
exp2 <- exp(dfs$Forecast.Dependent.variable.1)
In case you want to apply it to more than two columns and/or use more complicated functions, I highly recommend looking at apply/lapply.
Does that answer your question?
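As a rough sketch of the lapply route, here is a two-row stand-in for the data frame with a slightly more involved function (exponentiate, then round to one decimal). The rounding is only an illustrative choice and not part of the question:

```r
# Two-row stand-in for the question's data frame
dfs <- data.frame(
  Year = c(2009, 2010),
  Dependent.variable.1 = c(12.42669703, 12.39309563),
  Forecast.Dependent.variable.1 = c(12.41831191, 12.40043599)
)

cols <- c("Dependent.variable.1", "Forecast.Dependent.variable.1")

# lapply() returns a list with one transformed vector per column;
# assigning it back into dfs[cols] replaces those columns in place
dfs[cols] <- lapply(dfs[cols], function(x) round(exp(x), 1))
```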

Sum column values that match year in another column in R

I have the following dataframe
y<-data.frame(c(2007,2008,2009,2009,2010,2010),c(10,13,10,11,9,10),c(5,6,5,7,4,7))
colnames(y)<-c("year","a","b")
I want a final data.frame that sums, within each year, the values of y$a into a new "a" column and the values of y$b into a new "b" column, so that it looks like this:
year a b
2007 10 5
2008 13 6
2009 21 12
2010 19 11
The following loop has done it for me,
years<- as.numeric(levels(factor(y$year)))
add.a<- numeric(length(y[,1]))
add.b<- numeric(length(y[,1]))
for(i in years){
ind<- which(y$year==i)
add.a[ind]<- sum(as.numeric(as.character(y[ind,"a"])))
add.b[ind]<- sum(as.numeric(as.character(y[ind,"b"])))
}
y.final<-data.frame(y$year,add.a,add.b)
colnames(y.final)<-c("year","a","b")
y.final<-subset(y.final,!duplicated(y.final$year))
but I just think there must be a faster command. Any ideas?
Kindest regards,
Marco
The aggregate function is a good choice for this sort of operation; type ?aggregate for more information about it.
aggregate(cbind(a,b) ~ year, data = y, sum)
# year a b
#1 2007 10 5
#2 2008 13 6
#3 2009 21 12
#4 2010 19 11
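For comparison, the same grouping can be sketched with dplyr, assuming version 1.0.0 or later for across():

```r
library(dplyr)

y <- data.frame(year = c(2007, 2008, 2009, 2009, 2010, 2010),
                a    = c(10, 13, 10, 11, 9, 10),
                b    = c(5, 6, 5, 7, 4, 7))

y.final <- y %>%
  group_by(year) %>%               # one group per year
  summarise(across(c(a, b), sum))  # sum a and b within each group
```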
