How to calculate the exponential in some columns of a dataframe in R?

I have a dataframe:
X Year Dependent.variable.1 Forecast.Dependent.variable.1
1 2009 12.42669703 12.41831191
2 2010 12.39309563 12.40043599
3 2011 12.36596964 12.38256006
4 2012 12.32067284 12.36468414
5 2013 12.303095 12.34680822
6 2014 NA 12.32893229
7 2015 NA 12.31105637
8 2016 NA 12.29318044
9 2017 NA 12.27530452
10 2018 NA 12.25742859
I want to calculate the exponential of the third and fourth columns. How can I do that?

In case your dataframe is called dfs, you can do the following:
dfs[c('Dependent.variable.1','Forecast.Dependent.variable.1')] <- exp(dfs[c('Dependent.variable.1','Forecast.Dependent.variable.1')])
which gives you:
X Year Dependent.variable.1 Forecast.Dependent.variable.1
1 1 2009 249371 247288.7
2 2 2010 241131 242907.5
3 3 2011 234678 238603.9
4 4 2012 224285 234376.5
5 5 2013 220377 230224.0
6 6 2014 NA 226145.1
7 7 2015 NA 222138.5
8 8 2016 NA 218202.9
9 9 2017 NA 214336.9
10 10 2018 NA 210539.5
In case you know the column numbers, this could then also simply be done by using:
dfs[,3:4] <- exp(dfs[,3:4])
which gives you the same result as above. I usually prefer to use the actual column names as the indices might change when the data frame is further processed (e.g. I delete columns, then the indices change).
Or you could do:
dfs$Dependent.variable.1 <- exp(dfs$Dependent.variable.1)
dfs$Forecast.Dependent.variable.1 <- exp(dfs$Forecast.Dependent.variable.1)
In case you want to store these columns in new variables (below they are called exp1 and exp2, respectively), you can do:
exp1 <- exp(dfs$Forecast.Dependent.variable.1)
exp2 <- exp(dfs$Dependent.variable.1)
In case you want to apply it to more than two columns and/or use more complicated functions, I highly recommend looking at apply/lapply.
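As a minimal sketch of the lapply route: the small data frame below is a hypothetical three-row excerpt of the question's data, and cols is just an illustrative name for the vector of columns to transform.

```r
# Hypothetical excerpt of the question's data frame (first rows only)
dfs <- data.frame(
  X = 1:3,
  Year = 2009:2011,
  Dependent.variable.1 = c(12.42669703, 12.39309563, NA),
  Forecast.Dependent.variable.1 = c(12.41831191, 12.40043599, 12.38256006)
)

# Columns to transform, selected by name so the code survives reordering
cols <- c("Dependent.variable.1", "Forecast.Dependent.variable.1")

# lapply applies exp to each selected column; NA values stay NA
dfs[cols] <- lapply(dfs[cols], exp)
```

Any other function of one vector can be passed in place of exp, and cols can hold as many column names as needed.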
Does that answer your question?

Related

How can I add new variable with MUTATE: growth rate?

I haven't coded for several months and now am stuck with the following issue.
I have the following dataset:
Year World_export China_exp World_import China_imp
1 1992 3445.534 27.7310 3402.505 6.2220
2 1993 1940.061 27.8800 2474.038 18.3560
3 1994 2458.337 39.6970 2978.314 3.3270
4 1995 4641.168 15.9790 5504.787 18.0130
5 1996 5680.688 74.1650 6939.291 25.1870
6 1997 7206.604 70.2440 8639.422 31.9030
7 1998 7069.725 99.6510 8530.293 41.5030
8 1999 5916.077 169.4593 6673.743 37.8139
9 2000 7331.588 136.2180 8646.253 47.3789
10 2001 7471.374 143.0542 8292.893 41.2899
11 2002 8074.975 217.4286 9092.341 46.4730
12 2003 9956.433 162.2522 11558.007 71.7753
13 2004 13751.671 282.8678 16345.452 157.0768
14 2005 15976.238 430.8655 16708.094 284.1065
15 2006 19728.935 398.6704 22344.856 553.6356
16 2007 24275.244 484.5276 28693.113 815.7914
17 2008 32570.781 613.3714 39381.251 1414.8120
18 2009 21282.228 173.9463 28563.576 1081.3720
19 2010 25283.462 475.7635 34884.450 1684.0839
20 2011 41418.670 636.5881 45759.051 2193.8573
21 2012 46027.529 432.6025 46404.382 2373.4535
22 2013 37132.301 460.7133 43022.550 2829.3705
23 2014 36046.461 640.2552 40502.268 2373.2351
24 2015 26618.982 781.0016 30264.299 2401.1907
25 2016 23537.354 472.7022 27609.884 2129.4806
What I need is simple: to compute growth rates of each variable, that is, find difference between two elements, divide it by first element and multiply by 100.
I'm trying to write a script, that ends up with error message:
trade_Ch %>%
mutate (
World_exp_grate = sapply(2:nrow(trade_Ch),function(i)((World_export[i]-World_export[i-1])/World_export[i-1]))
)
Error in mutate_impl(.data, dots) : Column World_exp_grate must
be length 25 (the number of rows) or one, not 24
although this piece of code gives me right values:
x <- sapply(2:nrow(trade_Ch),function(i)((trade_Ch$World_export[i]-trade_Ch$World_export[i-1])/trade_Ch$World_export[i-1]))
How can I correctly embed the code into my MUTATE part from dplyr package?
OR
Is there another elegant way to solve this issue?
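library(dplyr)
# mutate_each() and funs() are deprecated in current dplyr; across() is the equivalent
trade_Ch %>%
mutate(across(World_export:China_imp, ~ 100 * (.x - lag(.x)) / lag(.x), .names = "{.col}_chg"))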
trade_Ch %>%
mutate(world_exp_grate = 100*(World_export - lag(World_export))/lag(World_export))
The problem is that you cannot calculate the World_exp_grate for your first row. Therefore you have to set it to NA.
One variant to solve this is
trade_Ch %>%
mutate(World_export_lag = lag(World_export),
World_exp_grate = 100 * (World_export - World_export_lag) / World_export_lag) %>%
select(-World_export_lag)
lag shifts the vector by one position.
lag(1:5)
# [1] NA 1 2 3 4
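For comparison, the same lag-based calculation can be sketched in base R without dplyr. The helper name growth_rate and the four-row excerpt of the data are assumptions made for illustration; the first row is NA because it has no previous value.

```r
# Hypothetical excerpt of the question's data (first four years, two variables)
trade_Ch <- data.frame(
  Year = 1992:1995,
  World_export = c(3445.534, 1940.061, 2458.337, 4641.168),
  China_exp = c(27.7310, 27.8800, 39.6970, 15.9790)
)

# Percentage change relative to the previous element; the first element is NA
growth_rate <- function(x) {
  previous <- c(NA, head(x, -1))  # same shift that dplyr::lag performs
  100 * (x - previous) / previous
}

# Apply the helper to every column except Year and store the results
value_cols <- setdiff(names(trade_Ch), "Year")
rates <- lapply(trade_Ch[value_cols], growth_rate)
names(rates) <- paste0(value_cols, "_grate")
trade_Ch[names(rates)] <- rates
```

Because the result has the same length as the input, it avoids the "must be length 25 ... not 24" error that the sapply(2:nrow(...)) version triggers.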

How to express a variable as a function of 2 others in a dataframe composed of 3 vectors

I know it is fundamental but I can't find the trick ...
Here is an example:
Species <- c("dark frog",rep(c("elephant","tiger","boa"),3),"black mamba")
Year <- c(rep(2011,4),rep(2012,3),rep(2013,4))
Abundance <- c(2,4,5,6,9,2,1,5,6,8,4)
df <- data.frame(Species, Year, Abundance)
I would like to obtain another data frame (3 rows * 5 columns) holding the abundance values, with the species as the column names (each species appearing only once) and the years as the row names (each also appearing only once).
Could someone help me, please?
You mean something like this?
> xtabs(Abundance~Year+Species, data=df)
Species
Year black mamba boa dark frog elephant tiger
2011 0 6 2 4 5
2012 0 1 0 9 2
2013 4 8 0 5 6
The class for the above is a table, so if you prefer a data.frame instead, you can try:
library(tidyr)
new.df<- spread(df, key = Species, value = Abundance)
Year black mamba boa dark frog elephant tiger
1 2011 NA 6 2 4 5
2 2012 NA 1 NA 9 2
3 2013 4 8 NA 5 6
If you want 0s instead of NA add the following line:
new.df[is.na(new.df)]<- 0
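If you would rather stay in base R, one possible sketch is to convert the xtabs result directly with as.data.frame.matrix. This keeps the years as row names and the species as column names, with 0 for missing combinations. The names tab and wide are illustrative only.

```r
# Data from the question
Species <- c("dark frog", rep(c("elephant", "tiger", "boa"), 3), "black mamba")
Year <- c(rep(2011, 4), rep(2012, 3), rep(2013, 4))
Abundance <- c(2, 4, 5, 6, 9, 2, 1, 5, 6, 8, 4)
df <- data.frame(Species, Year, Abundance)

# Cross-tabulate: sums of Abundance by Year (rows) and Species (columns)
tab <- xtabs(Abundance ~ Year + Species, data = df)

# Convert the 2-D table to a data frame, keeping its row and column labels
wide <- as.data.frame.matrix(tab)
```

Note that xtabs sums the values, so this only matches the spread result when each Year/Species pair occurs at most once, as it does here.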

Conditional cumulative subtraction

This is what my data.table looks like:
library(data.table)
dt <- fread('
Year Total Shares Balance
2017 10 1 10
2016 12 2 9
2015 10 2 7
2014 10 3 6
2013 10 NA 3
')
**Balance** is my desired column. I am trying to compute a cumulative subtraction: start from the first value of Total, which is 10 (this is also the first value of Balance), and then cumulatively subtract the values in Shares. So the second value is 10-1 = 9, the third value is 9-2 = 7, and so on. There is one condition: if the Year is 2014, subtract the Shares value after dividing it by 2. So the fourth value is 7-(2/2) = 6 and the fifth value is 6-3 = 3. I want the calculation to end at the last row.
My attempt is:
dt[, Balance:= ifelse( Year == 2014, cumsum(Total[1]-Shares/2), cumsum(Total[1] - Shares))]
Here is one method.
dt[, Balance2 := Total[1] - cumsum(shift(Shares * (1 - (0.5 *(Year == 2015))), fill=0))]
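shift is used to create a lagged variable, and its first element is filled with 0 using fill=0. The other elements are calculated as Shares * (1 - (0.5 * (Year == 2015))), which returns Shares except when Year == 2015, in which case Shares * 0.5 is returned. The test is on 2015 rather than 2014 because the halving is applied before the shift: the 2015 Shares value is the one subtracted in the 2014 row.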
which returns
dt
Year Total Shares Balance Balance2
1: 2017 10 1 10 10
2: 2016 12 2 9 9
3: 2015 10 2 7 7
4: 2014 10 3 6 6
5: 2013 10 NA 3 3
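The same halve, lag, then accumulate idea can also be sketched in base R, assuming a plain data.frame instead of a data.table. The intermediate names adjusted and lagged are illustrative only.

```r
# Data from the question, without the desired Balance column
dt <- data.frame(
  Year = 2017:2013,
  Total = c(10, 12, 10, 10, 10),
  Shares = c(1, 2, 2, 3, NA)
)

# Halve the Shares value that will be subtracted in the 2014 row,
# i.e. the value sitting in the 2015 row
adjusted <- dt$Shares * ifelse(dt$Year == 2015, 0.5, 1)

# Shift down by one position, filling the first element with 0;
# this also drops the trailing NA, which is never subtracted
lagged <- c(0, head(adjusted, -1))

# Start from the first Total and subtract the running sum
dt$Balance2 <- dt$Total[1] - cumsum(lagged)
```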
FWIW, I wanted to provide a functional alternative that would allow for more flexible calculations in the cumulative differences, indexing, etc. I also have read in the data with read.table.
dt <- read.table(header=TRUE, text='
Year Total Shares Balance
2017 10 1 10
2016 12 2 9
2015 10 2 7
2014 10 3 6
2013 10 NA 3
')
makeNewBalance <- function(dt) {
output <- NULL
for (i in 1:nrow(dt)) {
if (i==1) {
output[i] <- dt$Total[i]
} else {
output[i] <- output[i-1] - as.integer(ifelse(dt$Year[i]==2014,
dt$Shares[i-1]/2,
dt$Shares[i-1]))
}
}
return(output)
}
dt$NewBalance <- makeNewBalance(dt)
which also returns
> dt
Year Total Shares Balance NewBalance
1 2017 10 1 10 10
2 2016 12 2 9 9
3 2015 10 2 7 7
4 2014 10 3 6 6
5 2013 10 NA 3 3

Sum column values that match year in another column in R

I have the following dataframe
y<-data.frame(c(2007,2008,2009,2009,2010,2010),c(10,13,10,11,9,10),c(5,6,5,7,4,7))
colnames(y)<-c("year","a","b")
I want to have a final data.frame that adds together, within the same year, the values in "y$a" in the new "a" column and the values in "y$b" in the new "b" column, so that it looks like this:
year a b
2007 10 5
2008 13 6
2009 21 12
2010 19 11
The following loop has done it for me,
years<- as.numeric(levels(factor(y$year)))
add.a<- numeric(length(y[,1]))
add.b<- numeric(length(y[,1]))
for(i in years){
ind<- which(y$year==i)
add.a[ind]<- sum(as.numeric(as.character(y[ind,"a"])))
add.b[ind]<- sum(as.numeric(as.character(y[ind,"b"])))
}
y.final<-data.frame(y$year,add.a,add.b)
colnames(y.final)<-c("year","a","b")
y.final<-subset(y.final,!duplicated(y.final$year))
but I just think there must be a faster command. Any ideas?
Kindest regards,
Marco
The aggregate function is a good choice for this sort of operation; type ?aggregate for more information about it.
aggregate(cbind(a,b) ~ year, data = y, sum)
# year a b
#1 2007 10 5
#2 2008 13 6
#3 2009 21 12
#4 2010 19 11
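As a possible alternative sketch, base R's rowsum computes grouped column sums in one call. Unlike aggregate, it returns the groups as row names rather than as a year column; the name sums is illustrative only.

```r
# Data from the question
y <- data.frame(c(2007, 2008, 2009, 2009, 2010, 2010),
                c(10, 13, 10, 11, 9, 10),
                c(5, 6, 5, 7, 4, 7))
colnames(y) <- c("year", "a", "b")

# Sum columns a and b within each year; the years become row names
sums <- rowsum(y[, c("a", "b")], group = y$year)

# Restore year as a regular column if the aggregate-style layout is preferred
y.final <- data.frame(year = as.numeric(rownames(sums)), sums, row.names = NULL)
```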

How to reference data frame ranges within a list using a logical condition

With this question I would like to complement the discussion started here
I have a list containing data frames for which I need to update specific ranges one at a time. The 'one at a time' requirement I think makes lapply hard to use here - but correct me if I am wrong.
Let me propose the following example to clarify.
set.seed(1)
d1<-data.frame(a=rnorm(5), b=c(rep(2006, times=4),NA), c=seq(from=22, to=30, by=2))
d2<-data.frame(a=1:5, b=c(2007, 2007, NA, NA, 2007), c=11:15)
d3<-data.frame(a=12:16, b=c(NA, NA, NA, 2008, 2008), c=21:25)
my.ls<-list(d1=d1, d2=d2, d3=d3)
my.ls
$d1
a b c
# 1.5117812 2006 22
# 0.3898432 2006 24
# -0.6212406 2006 26
# -2.2146999 2006 28
# 1.1249309 NA 30
$d2
a b c
# 1 2007 11
# 2 2007 12
# 3 NA 13
# 4 NA 14
# 5 2007 15
$d3
a b c
# 12 NA 21
# 13 NA 22
# 14 NA 23
# 15 2008 24
# 16 2008 25
Now, assume I want to replace the NAs in the b column of $d2 with the value of the previous year, so that rows 3 and 4 show 2006. Following the linked discussion, I thought I could do that using the code,
my.ls[["d2"]][,is.na(b)]<-2006
but I was wrong. R reports an error which says that the object b can't be found. I think here the problem for me is to understand how to reference data frames ranges within lists using a logical condition.
As usual, the solution might be way easier than I am figuring it out in my mind...
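One way this could be approached, offered as a sketch rather than a definitive answer: inside the square brackets R looks for b in the calling environment, not inside the data frame, so the column has to be referenced through the list element explicitly. The logical vector then selects rows, not columns. The name missing_b is illustrative only.

```r
# The d2 element from the example, wrapped in a list as in the question
d2 <- data.frame(a = 1:5, b = c(2007, 2007, NA, NA, 2007), c = 11:15)
my.ls <- list(d2 = d2)

# Build the logical condition from the column inside the list element
missing_b <- is.na(my.ls[["d2"]]$b)

# Use it as a row index and name the column to update
my.ls[["d2"]][missing_b, "b"] <- 2006
```

Only the d2 element is modified, which fits the "one at a time" requirement without needing lapply.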
