How to "extrapolate" values of panel data in R?


I have panel data with NA values, like below:
uid year month day value
1 1 2016 8 1 NA
2 1 2016 8 2 NA
3 1 2016 8 3 30
4 1 2016 8 4 NA
5 1 2016 8 5 20
6 2 2016 8 1 40
7 2 2016 8 2 NA
8 2 2016 8 3 50
9 2 2016 8 4 NA
10 2 2016 8 5 NA
I would like to perform a linear interpolation, so I wrote this code:
library(dplyr)
library(zoo)
panel_df <- group_by(panel_df, uid)
panel_df <- mutate(panel_df, value=na.approx(value, na.rm=FALSE))
then I get the output:
uid year month day value
1 1 2016 8 1 NA
2 1 2016 8 2 NA
3 1 2016 8 3 30
4 1 2016 8 4 25
5 1 2016 8 5 20
6 2 2016 8 1 40
7 2 2016 8 2 45
8 2 2016 8 3 50
9 2 2016 8 4 NA
10 2 2016 8 5 NA
Here na.approx interpolates the interior NA values successfully but does not extrapolate at the edges.
Is there a good way to replace the values in the 1st and 2nd rows with the first non-NA value of this user (30)? Similarly, how can I replace the values in the 9th and 10th rows with the last non-NA value of this user (50)?

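One way to do this is to use na.spline() from the same zoo package. Note that, as the output shows, it extrapolates along the fitted trend rather than repeating the first or last non-NA value: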
panel_df <- group_by(panel_df, uid)
panel_df <- mutate(panel_df, value=na.spline(value))
panel_df
Source: local data frame [10 x 5]
Groups: uid [2]
uid year month day value
<int> <int> <int> <int> <dbl>
1 1 2016 8 1 40
2 1 2016 8 2 35
3 1 2016 8 3 30
4 1 2016 8 4 25
5 1 2016 8 5 20
6 2 2016 8 1 40
7 2 2016 8 2 45
8 2 2016 8 3 50
9 2 2016 8 4 55
10 2 2016 8 5 60
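
The question asked for the edge NAs to take the nearest non-NA value (30 and 50), which the spline output above does not do. A minimal base R sketch of that behaviour follows, assuming each uid has at least two non-NA values; fill_edges is a hypothetical helper name, not part of zoo or dplyr. It relies on approx() with rule = 2, which returns the value at the closest data extreme for points outside the known range. Since zoo's na.approx passes extra arguments on to approx, na.approx(value, na.rm = FALSE, rule = 2) inside the original mutate call should behave the same way.

```r
# Example data copied from the question (year and month omitted for brevity).
panel_df <- data.frame(
  uid   = rep(1:2, each = 5),
  day   = rep(1:5, times = 2),
  value = c(NA, NA, 30, NA, 20, 40, NA, 50, NA, NA)
)

# Hypothetical helper: interpolate linearly between known points and,
# because rule = 2, repeat the nearest known value outside their range.
# Assumes the group has at least two non-NA values.
fill_edges <- function(v) {
  known <- !is.na(v)
  approx(x = which(known), y = v[known], xout = seq_along(v), rule = 2)$y
}

# Apply the helper separately within each uid.
panel_df$value <- ave(panel_df$value, panel_df$uid, FUN = fill_edges)
panel_df$value
# [1] 30 30 30 25 20 40 45 50 50 50
```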

Related

How to update data from column i row 2 to column j row 1 but grouped by two variables (dplyr) in a R dataframe?

I have two columns: site (6 sites) and month (Jan - Mar), with one sample per site in each month. Each row has a corresponding value in column i. Within each site, I want to copy column i, row 2 to column j, row 1; column i, row 3 to column j, row 2; and column i, row 1 to column j, row 3. This pattern should repeat for each site. So, if column i goes from 1 to 18, column j would be 2 3 1 5 6 4 8 9 7 11 12 10 14 15 13 17 18 16. I tried to modify the code from an answer to a similar problem using dplyr. I used the group_by function in dplyr so that the shift would wrap around within each group, but the function is operating on the entire column.
library(dplyr)
col.site <- c(1,1,1,2,2,2,3,3,3,4,4,4,5,5,5,6,6,6)
col.month <- c("Jan","Feb","Mar","Jan","Feb","Mar","Jan","Feb","Mar","Jan","Feb","Mar","Jan","Feb","Mar","Jan","Feb","Mar")
col.i <- c(1:18)
df <- data.frame(col.site,col.month, col.i)
df <- df %>% group_by(col.month,col.site) %>%
mutate(col.j = lead(col.i, default = col.i[1]))
col.j
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 1
What I expected col.j:
[1] 2 3 1 5 6 4 8 9 7 11 12 10 14 15 13 17 18 16

I think you should group_by col.site only:
library(dplyr)
df %>%
group_by(col.site) %>%
mutate(col.j = lead(col.i, default = first(col.i)))
# col.site col.month col.i col.j
# <dbl> <chr> <int> <int>
# 1 1 Jan 1 2
# 2 1 Feb 2 3
# 3 1 Mar 3 1
# 4 2 Jan 4 5
# 5 2 Feb 5 6
# 6 2 Mar 6 4
# 7 3 Jan 7 8
# 8 3 Feb 8 9
# 9 3 Mar 9 7
#10 4 Jan 10 11
#11 4 Feb 11 12
#12 4 Mar 12 10
#13 5 Jan 13 14
#14 5 Feb 14 15
#15 5 Mar 15 13
#16 6 Jan 16 17
#17 6 Feb 17 18
#18 6 Mar 18 16

Using data.table
library(data.table)
setDT(df)[, col.j := shift(col.i, type = 'lead', fill = first(col.i)), col.site]
Or using dplyr
library(dplyr)
df %>%
group_by(col.site) %>%
mutate(col.j = c(col.i[-1], col.i[1]))
Output:
# col.site col.month col.i col.j
# <dbl> <chr> <int> <int>
# 1 1 Jan 1 2
# 2 1 Feb 2 3
# 3 1 Mar 3 1
# 4 2 Jan 4 5
# 5 2 Feb 5 6
# 6 2 Mar 6 4
# 7 3 Jan 7 8
# 8 3 Feb 8 9
# 9 3 Mar 9 7
#10 4 Jan 10 11
#11 4 Feb 11 12
#12 4 Mar 12 10
#13 5 Jan 13 14
#14 5 Feb 14 15
#15 5 Mar 15 13
#16 6 Jan 16 17
#17 6 Feb 17 18
#18 6 Mar 18 16
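
For comparison, the same within-group rotation can be sketched in base R with ave(), without loading dplyr or data.table. This rebuilds the question's data frame; it is an illustrative alternative, not one of the original answers.

```r
# Rebuild the example data from the question.
df <- data.frame(
  col.site  = rep(1:6, each = 3),
  col.month = rep(c("Jan", "Feb", "Mar"), times = 6),
  col.i     = 1:18
)

# Within each site, shift values up by one row and wrap the first value
# around to the last row of that site.
df$col.j <- ave(df$col.i, df$col.site, FUN = function(x) c(x[-1], x[1]))
df$col.j
# [1]  2  3  1  5  6  4  8  9  7 11 12 10 14 15 13 17 18 16
```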

How to remove columns with duplicate values in a data frame?

I have the following data:
Years A B C D
2015 1 7 1 13
2016 2 8 2 14
2017 3 9 3 15
2018 4 10 4 16
2019 5 11 5 17
2020 6 12 6 18
I want the result to look like this (the columns with duplicate values removed):
Years A B D
2015 1 7 13
2016 2 8 14
2017 3 9 15
2018 4 10 16
2019 5 11 17
2020 6 12 18
Thanks in advance for all the help!

Combine the functions unclass and duplicated to find matching columns and then take the others:
df[!duplicated(unclass(df))]
output:
Years A B D
<dbl> <dbl> <dbl> <dbl>
1 2015 1 7 13
2 2016 2 8 14
3 2017 3 9 15
4 2018 4 10 16
5 2019 5 11 17
6 2020 6 12 18

Or we can transpose the dataset and apply duplicated, which then compares the original columns as rows:
df1[!duplicated(t(df1))]
# Years A B D
#1 2015 1 7 13
#2 2016 2 8 14
#3 2017 3 9 15
#4 2018 4 10 16
#5 2019 5 11 17
#6 2020 6 12 18
data
df1 <- structure(list(Years = 2015:2020, A = 1:6, B = 7:12, C = 1:6,
D = 13:18), class = "data.frame", row.names = c(NA, -6L))

If you want something fast, try this approach
df[!duplicated(as.list(df))]
# Years A B D
# 1 2015 1 7 13
# 2 2016 2 8 14
# 3 2017 3 9 15
# 4 2018 4 10 16
# 5 2019 5 11 17
# 6 2020 6 12 18
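
As a self-contained check of the list-based approach, the sketch below rebuilds the example data and also reports which column was dropped. The names dup_cols and result are illustrative only.

```r
# Example data from the question; C repeats the values of A.
df1 <- data.frame(Years = 2015:2020, A = 1:6, B = 7:12, C = 1:6, D = 13:18)

# Treat each column as one list element and flag repeats of earlier columns.
dup_cols <- duplicated(as.list(df1))
names(df1)[dup_cols]
# [1] "C"

# Keep only the first occurrence of each distinct column.
result <- df1[!dup_cols]
names(result)
# [1] "Years" "A"     "B"     "D"
```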

Retaining unique values per individual id in a dataframe in R

A very basic question! I searched a lot and tried to work it out myself but eventually had to come here. :)
Well here is a sample dataframe
df<- data.frame(id=c(1,1,1,1,2,2,2,2,3,3,3,3),
quarter=c(1,2,3,4,1,2,3,4,1,2,3,4),
year=c(2015,2015,2015,2015,2015,2015,2015,2015,2015,2015,2015,2015),
value=c(2.75,2.75,2.75,2.75,2.90,2.90,2.90,2.90,2.21,2.21,2.21,2.21))
> df
id quarter year value
1 1 1 2015 2.75
2 1 2 2015 2.75
3 1 3 2015 2.75
4 1 4 2015 2.75
5 2 1 2015 2.90
6 2 2 2015 2.90
7 2 3 2015 2.90
8 2 4 2015 2.90
9 3 1 2015 2.21
10 3 2 2015 2.21
11 3 3 2015 2.21
12 3 4 2015 2.21
I need a unique value per id, so I use this:
df$value[duplicated(df$value)]<-NA
And I get what I need.
> df
id quarter year value
1 1 1 2015 2.75
2 1 2 2015 NA
3 1 3 2015 NA
4 1 4 2015 NA
5 2 1 2015 2.90
6 2 2 2015 NA
7 2 3 2015 NA
8 2 4 2015 NA
9 3 1 2015 2.21
10 3 2 2015 NA
11 3 3 2015 NA
12 3 4 2015 NA
Now let's say that I have a new dataframe in which different ids share the same value:
df<- data.frame(id=c(1,1,1,1,2,2,2,2,3,3,3,3),
quarter=c(1,2,3,4,1,2,3,4,1,2,3,4),
year=c(2015,2015,2015,2015,2016,2016,2016,2016,2015,2015,2015,2015),
value=c(2.75,2.75,2.75,2.75,2.75,2.75,2.75,2.75,2.21,2.21,2.21,2.21))
If I use the same code, the value for ID 2 will be set to NA in every row, because it duplicates the value for ID 1.
How can I retain the unique values for every ID per year?
Any help is much appreciated.

Here is a base R solution using ave + duplicated
df <- within(df,value <- ave(value,
id,
year,
FUN = function(v) ifelse(duplicated(v),NA,v)))
such that
> df
id quarter year value
1 1 1 2015 2.75
2 1 2 2015 NA
3 1 3 2015 NA
4 1 4 2015 NA
5 2 1 2015 2.90
6 2 2 2015 NA
7 2 3 2015 NA
8 2 4 2015 NA
9 3 1 2015 2.21
10 3 2 2015 NA
11 3 3 2015 NA
12 3 4 2015 NA

Applying duplicated to the combination of id and year (bound together with cbind) instead of to value should give you the desired result:
df[duplicated(cbind(df$id, df$year)), "value"]<-NA
Using this solution on your second data.frame that gave you missing rows:
df<- data.frame(id=c(1,1,1,1,2,2,2,2,3,3,3,3),
quarter=c(1,2,3,4,1,2,3,4,1,2,3,4),
year=c(2015,2015,2015,2015,2016,2016,2016,2016,2015,2015,2015,2015),
value=c(2.75,2.75,2.75,2.75,2.75,2.75,2.75,2.75,2.21,2.21,2.21,2.21))
df[duplicated(cbind(df$id, df$year)), "value"]<-NA
Returns:
id quarter year value
1 1 1 2015 2.75
2 1 2 2015 NA
3 1 3 2015 NA
4 1 4 2015 NA
5 2 1 2016 2.75
6 2 2 2016 NA
7 2 3 2016 NA
8 2 4 2016 NA
9 3 1 2015 2.21
10 3 2 2015 NA
11 3 3 2015 NA
12 3 4 2015 NA
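
The line above blanks every row after the first in each id/year pair, which is fine when the value is constant within a pair. If a value could change within a pair, one possible variation is to include value in the duplicate check. The sketch below assumes a modified copy of the example data in which id 1 changes from 2.75 to 3.10 mid-year; that change is hypothetical and only there to show the difference.

```r
# Hypothetical variant of the example: id 1 has two distinct values in 2015.
df <- data.frame(
  id      = rep(1:3, each = 4),
  quarter = rep(1:4, times = 3),
  year    = rep(c(2015, 2016, 2015), each = 4),
  value   = c(2.75, 2.75, 3.10, 3.10,
              2.75, 2.75, 2.75, 2.75,
              2.21, 2.21, 2.21, 2.21)
)

# Flag rows whose (id, year, value) combination has already appeared,
# then blank only those repeated values.
repeated <- duplicated(df[c("id", "year", "value")])
df$value[repeated] <- NA
df$value
# [1] 2.75   NA 3.10   NA 2.75   NA   NA   NA 2.21   NA   NA   NA
```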

Ordering zoo object by months - sequentially

I have a data frame that looks like this:
month SYMBOL val1 val2 val3
Jan/2017 A 3 4 6
Feb/2017 A 1 2 4
Mar/2017 A 2 5 3
Apr/2017 A 4 3 6
May/2017 A 6 2 8
Jan/2017 B 7 3 1
Feb/2017 B 3 7 3
Mar/2017 B 1 3 6
Apr/2017 B 7 2 8
May/2017 B 9 7 2
Jan/2017 C 0 8 6
Feb/2017 C 1 3 9
Mar/2017 C 3 3 1
Apr/2017 C 4 1 5
May/2017 C 6 7 1
When I convert it into a zoo object with yearmon as the index, the rows are reordered and the SYMBOL column changes like this:
SYMBOL val1 val2 val3
Jan/2017 A 3 4 6
Jan/2017 B 7 3 1
Jan/2017 C 0 8 6
Feb/2017 A 1 2 4
Feb/2017 B 3 7 3
Feb/2017 C 1 3 9
Mar/2017 A 2 5 3
Mar/2017 B 1 3 6
Mar/2017 C 3 3 1
Apr/2017 A 4 3 6
Apr/2017 B 7 2 8
Apr/2017 C 6 2 8
May/2017 A 9 7 2
May/2017 B 4 1 5
May/2017 C 6 7 1
Is there a way to order the months sequentially within each symbol while creating a zoo object, so that the SYMBOL column remains as A A A, B B B, C C C instead of getting distorted? zoo inevitably changes the order to Jan Jan Jan, Feb Feb Feb instead of Jan - May for symbol A, Jan - May for symbol B, and so on.

A zoo object is a time series and, in particular, time series have ordered observations. If you want to represent an object that is not a time series, then either don't use zoo or somehow rework it into a time series.
1) Multivariate time series. Although the data as presented (see Lines in the Note below) is not a time series, it can be represented as a multivariate time series by splitting it on the second input column:
library(zoo)
z <- read.zoo(text = Lines, split = 2, FUN = as.yearmon, format = "%b/%Y", header = TRUE)
giving:
> z
val1.A val2.A val3.A val1.B val2.B val3.B val1.C val2.C val3.C
Jan 2017 3 4 6 7 3 1 0 8 6
Feb 2017 1 2 4 3 7 3 1 3 9
Mar 2017 2 5 3 1 3 6 3 3 1
Apr 2017 4 3 6 7 2 8 4 1 5
May 2017 6 2 8 9 7 2 6 7 1
2) by list of multiple time series. Alternatively, it would also be possible to represent it as a by list of zoo objects:
DF <- read.table(text = Lines, header = TRUE)
byz <- by(DF[-2], DF[2], function(x) read.zoo(x, FUN = as.yearmon, format = "%b/%Y"))
giving:
> byz
SYMBOL: A
val1 val2 val3
Jan 2017 3 4 6
Feb 2017 1 2 4
Mar 2017 2 5 3
Apr 2017 4 3 6
May 2017 6 2 8
------------------------------------------------------------
SYMBOL: B
val1 val2 val3
Jan 2017 7 3 1
Feb 2017 3 7 3
Mar 2017 1 3 6
Apr 2017 7 2 8
May 2017 9 7 2
------------------------------------------------------------
SYMBOL: C
val1 val2 val3
Jan 2017 0 8 6
Feb 2017 1 3 9
Mar 2017 3 3 1
Apr 2017 4 1 5
May 2017 6 7 1
3) Synthesized index. It may be difficult to manipulate such an object, but to cover all the possibilities, one could synthesize a new index from the SYMBOL and month columns to create a zoo series with a character index like this:
myindex <- function(sym, mon) paste(sym, format(as.yearmon(mon, "%b/%Y"), "%Y-%m"))
z2 <- read.zoo(text = Lines, index = 2:1, FUN = myindex, header = TRUE)
giving the following zoo object:
> z2
val1 val2 val3
A 2017-01 3 4 6
A 2017-02 1 2 4
A 2017-03 2 5 3
A 2017-04 4 3 6
A 2017-05 6 2 8
B 2017-01 7 3 1
B 2017-02 3 7 3
B 2017-03 1 3 6
B 2017-04 7 2 8
B 2017-05 9 7 2
C 2017-01 0 8 6
C 2017-02 1 3 9
C 2017-03 3 3 1
C 2017-04 4 1 5
C 2017-05 6 7 1
Note: The input in reproducible form is:
Lines <- "month SYMBOL val1 val2 val3
Jan/2017 A 3 4 6
Feb/2017 A 1 2 4
Mar/2017 A 2 5 3
Apr/2017 A 4 3 6
May/2017 A 6 2 8
Jan/2017 B 7 3 1
Feb/2017 B 3 7 3
Mar/2017 B 1 3 6
Apr/2017 B 7 2 8
May/2017 B 9 7 2
Jan/2017 C 0 8 6
Feb/2017 C 1 3 9
Mar/2017 C 3 3 1
Apr/2017 C 4 1 5
May/2017 C 6 7 1"
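
If a zoo object is not actually required, the "don't use zoo" route mentioned at the start of this answer can be sketched with a plain data frame sorted by symbol and then by month. This example uses a small shuffled subset of the data (val1 only) and parses the month labels with the built-in month.abb constant so that it does not depend on the locale; the names mon, year and sorted are illustrative.

```r
# Shuffled subset of the question's data (symbols A and B, Jan to Mar, val1 only).
DF <- data.frame(
  month  = c("Feb/2017", "Jan/2017", "Jan/2017", "Mar/2017", "Feb/2017", "Mar/2017"),
  SYMBOL = c("B", "A", "B", "A", "A", "B"),
  val1   = c(3, 3, 7, 2, 1, 1),
  stringsAsFactors = FALSE
)

# Split the "Mon/YYYY" label into a month number and a year.
mon  <- match(substr(DF$month, 1, 3), month.abb)
year <- as.integer(substr(DF$month, 5, 8))

# Order by symbol first, then chronologically within each symbol.
sorted <- DF[order(DF$SYMBOL, year, mon), ]
sorted
```

The result keeps all rows for A together in Jan, Feb, Mar order, followed by all rows for B.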

R paired column index

Say I have two data frames, A and B:
mth <- c(rep(1:5,2))
day <- c(rep(10,5),rep(11,5))
hr <- c(3,4,5,6,7,3,4,5,6,7)
v <- c(3,4,5,4,3,3,4,5,4,3)
A <- data.frame(cbind(mth,day,hr,v))
year <- c(2008:2012)
mth <- c(1:5)
B <- data.frame(cbind(year,mth))
What I want should look like this:
mth <- c(rep(2008:2012,2))
day <- c(rep(10,5),rep(11,5))
hr <- c(3,4,5,6,7,3,4,5,6,7)
v <- c(3,4,5,4,3,3,4,5,4,3)
A <- data.frame(cbind(mth,day,hr,v))
Basically, what I need is to replace the values in column mth of A with the matching values from column year of B. Maybe I didn't search for the right keyword, but I was not able to get what I want (I tried which()). Please help, thank you.

A2 <- merge(A,B, by = "mth")[ , -1]
names(A2)[(which(names(A2)=="year"))] <- "mth"
> A2
day hr v mth
1 10 3 3 2008
2 11 3 3 2008
3 11 4 4 2009
4 10 4 4 2009
5 11 5 5 2010
6 10 5 5 2010
7 11 6 4 2011
8 10 6 4 2011
9 10 7 3 2012
10 11 7 3 2012

Probably the easiest solution is to use merge, which is equivalent to a sql join in a lot of ways:
merge(A,B)
#-----
merge(A, B)
mth day hr v year
1 1 10 3 3 2008
2 1 11 3 3 2008
3 2 11 4 4 2009
4 2 10 4 4 2009
5 3 11 5 5 2010
6 3 10 5 5 2010
7 4 11 6 4 2011
8 4 10 6 4 2011
9 5 10 7 3 2012
10 5 11 7 3 2012
You could also probably use match like this to replace mth in place:
A$mth <- B[match(A$mth, B$mth),1]
#-----
mth day hr v
1 2008 10 3 3
2 2009 10 4 4
3 2010 10 5 5
4 2011 10 6 4
5 2012 10 7 3
6 2008 11 3 3
7 2009 11 4 4
8 2010 11 5 5
9 2011 11 6 4
10 2012 11 7 3
While a little dense, that code indexes B by matching the two mth columns from A and B and then grabs the first column (year).
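
The match-based replacement can also be written with the column referred to by name rather than by position, which may be easier to read. A minimal self-contained sketch, rebuilding A and B with the same values as in the question:

```r
# Rebuild the two data frames from the question.
A <- data.frame(
  mth = rep(1:5, times = 2),
  day = rep(10:11, each = 5),
  hr  = rep(3:7, times = 2),
  v   = rep(c(3, 4, 5, 4, 3), times = 2)
)
B <- data.frame(year = 2008:2012, mth = 1:5)

# For each row of A, find the row of B with the same mth and take its year.
A$mth <- B$year[match(A$mth, B$mth)]
A$mth
# [1] 2008 2009 2010 2011 2012 2008 2009 2010 2011 2012
```

Unlike merge, this keeps the rows of A in their original order.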
