NA value in a dataframe

I am trying to apply a function to a column of a dataframe, but when I do, I get a column full of NA values. I don't understand why.
Here is my code:
courbe <- function(x) exp(coef(regression)[1]*x+coef(regression[2]))
dataT[,c(2)] <- courbe(dataT[,c(1)])
And here is my dataframe:
DateRep Cases
1 25 NA
2 24 NA
3 23 NA
4 22 NA
5 21 NA
6 20 NA
7 19 NA
8 18 NA
9 17 NA
10 16 NA
11 15 NA
12 14 NA
13 13 NA
14 12 NA
15 11 NA
16 10 NA
17 9 NA
18 8 NA
19 7 NA
20 6 NA
21 5 NA
22 4 NA
23 3 NA
24 2 NA
25 1 NA
26 0 NA
The output of print(coef(regression)) :
Coefficients:
(Intercept) dataT$DateRep
2.7095 0.2211

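As figured out in the comments, the mistake was in the placement of the indices: compare coef(regression)[1] with coef(regression[2]). In the second call the index sits inside coef(), so it subsets the model object instead of the coefficient vector; it should read coef(regression)[2].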
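A minimal sketch of the corrected indexing, using made-up data because the original model is not shown. The toy `x` and `y` vectors and the `log(y) ~ x` fit are assumptions for illustration only; the one thing taken from the answer is that the index belongs outside `coef()`. The sketch also assumes the intended curve is exp(intercept + slope * x), which matches the order in which `coef()` reports the terms.

```r
# Hypothetical data: an exact exponential trend, fitted on the log scale
x <- 0:25
y <- exp(2.7 + 0.22 * x)
regression <- lm(log(y) ~ x)

# Index the coefficient vector, not the model object
intercept <- unname(coef(regression)[1])
slope     <- unname(coef(regression)[2])

courbe <- function(x) exp(intercept + slope * x)

# One value per input, with no NA
fitted_curve <- courbe(x)
```

With the index outside `coef()`, each coefficient is a single number and the function returns one value per element of `x`.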

Related

Fill missing values in time series using previous day data - R

I have a data frame where each row is a different date and each column is a different time series.
The date range in the table is 01.01.2019-01.01.2021.
Some of the time series cover only part of that range, and they have missing values on weekends and holidays.
How can I fill the missing values in each time series with the previous day's value, but only within the dates that series actually covers? For example, if the series in a given column runs from 01.03.2019 to 01.09.2019, I want to fill only the missing values inside that range.
In addition, if a series stops for more than 5 days and then resumes, I want the filling to stop during the gap and restart once the series continues.
I have tried the fill function:
data <- data %>%
fill(colnames(data))
but it also fills in the missing data after a given time series has ended.
For example the df is:
# Date time_series_1 time_series_2 time_series_3
1 01-01-2019 NA 10 8
2 02-01-2019 5 NA 10
3 03-01-2019 10 NA 20
4 04-01-2019 20 6 40
5 05-01-2019 30 NA NA
6 06-01-2019 NA 8 NA
7 07-01-2019 7 NA NA
8 08-01-2019 5 NA NA
9 09-01-2019 NA NA 5
10 10-01-2019 NA NA NA
11 11-01-2019 NA NA 7
12 12-01-2019 NA NA 10
13 13-01-2019 NA NA 11
14 14-01-2019 NA NA 12
15 15-01-2019 NA NA NA
16 16-01-2019 NA NA 9
17 17-01-2019 NA NA 10
18 18-01-2019 NA NA 10
19 19-01-2019 5 NA 11
20 20-01-2019 NA NA NA
21 21-01-2019 5 NA NA
22 22-01-2019 6 NA NA
The desired output is:
# Date time_series_1 time_series_2 time_series_3
1 01-01-2019 NA 10 8
2 02-01-2019 5 10 10
3 03-01-2019 10 10 20
4 04-01-2019 20 6 40
5 05-01-2019 30 6 40
6 06-01-2019 30 8 40
7 07-01-2019 7 NA 40
8 08-01-2019 5 NA 40
9 09-01-2019 NA NA 5
10 10-01-2019 NA NA 5
11 11-01-2019 NA NA 7
12 12-01-2019 NA NA 10
13 13-01-2019 NA NA 11
14 14-01-2019 NA NA 12
15 15-01-2019 NA NA 12
16 16-01-2019 NA NA 9
17 17-01-2019 NA NA 10
18 18-01-2019 NA NA 10
19 19-01-2019 5 NA 11
20 20-01-2019 5 NA 11
21 21-01-2019 5 NA 11
22 22-01-2019 6 NA 11

Edit
Thanks to @G. Grothendieck for pointing out that na.locf0 has a maxgap argument, which handles the 5-day condition directly.
data[-1] <- lapply(data[-1], zoo::na.locf0, maxgap = 5)
data
Earlier Answer
You can write a function with rle and zoo::na.locf0 that replaces NA only if the run of consecutive NA values has length less than or equal to 5. Apply this function to multiple columns with lapply.
conditionally_replace_na <- function(x) {
ifelse(with(rle(is.na(x)), rep(lengths, lengths)) <= 5 & is.na(x),
zoo::na.locf0(x), x)
}
data[-1] <- lapply(data[-1], conditionally_replace_na)
data
# Date time_series_1 time_series_2 time_series_3
#1 01-01-2019 NA 10 8
#2 02-01-2019 5 10 10
#3 03-01-2019 10 10 20
#4 04-01-2019 20 6 40
#5 05-01-2019 30 6 40
#6 06-01-2019 30 8 40
#7 07-01-2019 7 NA 40
#8 08-01-2019 5 NA 40
#9 09-01-2019 NA NA 5
#10 10-01-2019 NA NA 5
#11 11-01-2019 NA NA 7
#12 12-01-2019 NA NA 10
#13 13-01-2019 NA NA 11
#14 14-01-2019 NA NA 12
#15 15-01-2019 NA NA 12
#16 16-01-2019 NA NA 9
#17 17-01-2019 NA NA 10
#18 18-01-2019 NA NA 10
#19 19-01-2019 5 NA 11
#20 20-01-2019 5 NA 11
#21 21-01-2019 5 NA 11
#22 22-01-2019 6 NA 11
The function can also be applied with dplyr::across:
library(dplyr)
data %>% mutate(across(starts_with('time_series'), conditionally_replace_na))
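For readers who want to see the rule without the zoo dependency, here is a rough base R sketch of "carry the last observation forward, but only across runs of at most `maxgap` missing values". The name `locf_maxgap` is hypothetical, and this is not how zoo implements `maxgap`; it only mirrors the rle idea from the earlier answer, checked against the first column of the example.

```r
# Hypothetical helper: fill NA with the previous value, but only inside
# runs of at most `maxgap` consecutive NA values
locf_maxgap <- function(x, maxgap = 5) {
  runs <- rle(is.na(x))
  # length of the run each element belongs to
  run_len <- rep(runs$lengths, runs$lengths)
  # position of the most recent non-NA value (0 if there is none yet)
  last_obs <- cummax(seq_along(x) * (!is.na(x)))
  fill <- is.na(x) & run_len <= maxgap & last_obs > 0
  x[fill] <- x[last_obs[fill]]
  x
}

# time_series_1 from the example above
ts1 <- c(NA, 5, 10, 20, 30, NA, 7, 5, rep(NA, 10), 5, NA, 5, 6)
filled <- locf_maxgap(ts1, maxgap = 5)
```

The leading NA stays NA because there is no earlier value to carry forward, and the 10-day gap is left untouched because it is longer than `maxgap`.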

Expand a dataframe based on columns in the dataframe in R

I have the following dataframe in R
df<-data.frame(
"Val1"=seq(from=1, to=40, by=5), 'Val2'=c(2,4,2,5,11,3,5,3),
"Val3"=seq(from=5, to=40, by=5), "Val4"=c(3,5,7,3,7,5,7,8))
The resulting dataframe looks as follows. Val1 and Val3 are the causal variables; Val2 and Val4 are the dependent variables.
Val1 Val2 Val3 Val4
1 1 2 5 3
2 6 4 10 5
3 11 2 15 7
4 16 5 20 3
5 21 11 25 7
6 26 3 30 5
7 31 5 35 7
8 36 3 40 8
I wish to obtain the following dataframe as output:
Val1 Val2 Val3 Val4
1 1 2 1 NA
2 2 NA 2 NA
3 3 NA 3 3
4 4 NA 4 NA
5 5 NA 5 NA
6 6 4 6 NA
7 7 NA 7 NA
8 8 NA 8 NA
9 9 NA 9 NA
10 10 NA 10 5
11 11 2 11 NA
12 12 NA 12 NA
13 13 NA 13 NA
14 14 NA 14 NA
15 15 NA 15 7
16 16 5 16 NA
17 17 NA 17 NA
18 18 NA 18 NA
19 19 NA 19 NA
20 20 NA 20 3
21 21 11 21 NA
22 22 NA 22 NA
23 23 NA 23 NA
24 24 NA 24 NA
25 25 NA 25 7
26 26 3 26 NA
27 27 NA 27 NA
28 28 NA 28 NA
29 29 NA 29 NA
30 30 NA 30 5
31 31 5 31 NA
32 32 NA 32 NA
33 33 NA 33 NA
34 34 NA 34 NA
35 35 NA 35 7
36 36 3 36 NA
37 37 NA 37 NA
38 38 NA 38 NA
39 39 NA 39 NA
40 40 NA 40 8
How do I accomplish this? I have written the following code, but it involves creating a second dataframe and then copying data from the first to the second. Is there a way to overwrite the existing dataframe? I would like to avoid loops.
df2<-data.frame('Val1'=
seq(from=min(na.omit(c(df$Val1, df$Val3))), to= max(na.omit(c(df$Val1,
df$Val3))), by=1), "Val3"=seq(from=min(na.omit(c(df$Val1, df$Val3))), to=
max(na.omit(c(df$Val1, df$Val3))), by=1))
###### Create two loops
for(i in df$Val1){
for(j in df2$Val1){
if(i==j){
df2$Val2[df2$Val1==j]=df$Val2[df$Val1==i]
} else{df2$Val2[df2$Val1==j]=NA}}}
for(i in df$Val3){ for(j in df2$Val3){
if(i==j){df2$Val4[df2$Val3==j]=df$Val4[df$Val3==i]
} else{df2$Val4[df2$Val3==j]=NA}}}
Is there a faster, vectorised way to accomplish the same thing?

Assuming there's a slight error in your output example (row 3 should show NA for Val4 and the 3 in row 3 should be in row 5), this works:
library(tidyverse)
df_new <- bind_cols(
df %>%
select(Val1, Val2) %>%
complete(., expand(., Val1 = 1:40)),
df %>%
select(Val3, Val4) %>%
complete(., expand(., Val3 = 1:40))
)
> df_new
# A tibble: 40 x 4
Val1 Val2 Val3 Val4
<dbl> <dbl> <dbl> <dbl>
1 1 2 1 NA
2 2 NA 2 NA
3 3 NA 3 NA
4 4 NA 4 NA
5 5 NA 5 3
6 6 4 6 NA
7 7 NA 7 NA
8 8 NA 8 NA
9 9 NA 9 NA
10 10 NA 10 5
# ... with 30 more rows
We use bind_cols() to put together two parts of the dataframe:
First we select the first two columns, expand() the causal variable and complete() the data, then we do it again for the third and fourth column.
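As an alternative to the tidyr approach, and not part of the answer itself, the same expansion can be sketched in base R with `match()`. The variable names `grid` and `df_new` are chosen here for illustration; the sketch assumes, like the answer, that the target range is 1 to 40.

```r
df <- data.frame(
  Val1 = seq(from = 1, to = 40, by = 5), Val2 = c(2, 4, 2, 5, 11, 3, 5, 3),
  Val3 = seq(from = 5, to = 40, by = 5), Val4 = c(3, 5, 7, 3, 7, 5, 7, 8))

# Full range that both causal variables should cover
grid <- 1:40

# match() returns the row in df holding each grid value, or NA if absent;
# indexing with NA yields NA, which leaves the gaps empty
df_new <- data.frame(
  Val1 = grid,
  Val2 = df$Val2[match(grid, df$Val1)],
  Val3 = grid,
  Val4 = df$Val4[match(grid, df$Val3)]
)

head(df_new, 10)
```

Because the lookup is vectorised, no loop or second pass over the data is needed.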

fill in NAs in dataframe between values

I have an example dataset
newdata<-data.frame(Tow.y=c(21,"NA","NA","NA","NA","NA",22,"NA","NA","NA","NA","NA",23,"NA","NA"),Tow=c("NA","NA","NA",21,"NA","NA","NA","NA",22,"NA","NA","NA","NA","NA",23))
newdata$Tow.y<-as.numeric(as.character(newdata$Tow.y))
newdata$Tow<-as.numeric(as.character(newdata$Tow))
newdata1<-newdata %>%
mutate(Station = coalesce(Tow.y, Tow))
newdata1
This code produces:
Tow.y Tow Station
1 21 NA 21
2 NA NA NA
3 NA NA NA
4 NA 21 21
5 NA NA NA
6 NA NA NA
7 22 NA 22
8 NA NA NA
9 NA 22 22
10 NA NA NA
11 NA NA NA
12 NA NA NA
13 23 NA 23
14 NA NA NA
15 NA 23 23
I would like to fill in NAs for NAs between unique values in Station. So NAs in between the two 21 values would be 21, the NAs in between the 22s would be 22, etc. The NAs in between consecutive numbers would remain NAs.
Like this:
Tow.y Tow Station
1 21 NA 21
2 NA NA 21
3 NA NA 21
4 NA 21 21
5 NA NA NA
6 NA NA NA
7 22 NA 22
8 NA NA 22
9 NA 22 22
10 NA NA NA
11 NA NA NA
12 NA NA NA
13 23 NA 23
14 NA NA 23
15 NA 23 23
I have tried the na.locf function in the zoo package, but that replaces all NA values.
newdata1$Station2<-na.locf(newdata1$Station,na.rm = F)
Other examples I have looked at show that you can use na.locf with a grouping variable, but I don't have a grouping variable that is complete for the data set. Does anyone have a method for filling in the NAs only where I need them filled?

Here's a good way. I left the helper columns in to demonstrate how it works, but you can easily remove them with a select.
newdata1 %>%
mutate(from_first = zoo::na.locf(Station, na.rm = FALSE),
from_last = zoo::na.locf(Station, na.rm = FALSE, fromLast = TRUE),
result = if_else(from_first == from_last, from_first, Station))
# Tow.y Tow Station from_first from_last result
# 1 21 NA 21 21 21 21
# 2 NA NA NA 21 21 21
# 3 NA NA NA 21 21 21
# 4 NA 21 21 21 21 21
# 5 NA NA NA 21 22 NA
# 6 NA NA NA 21 22 NA
# 7 22 NA 22 22 22 22
# 8 NA NA NA 22 22 22
# 9 NA 22 22 22 22 22
# 10 NA NA NA 22 23 NA
# 11 NA NA NA 22 23 NA
# 12 NA NA NA 22 23 NA
# 13 23 NA 23 23 23 23
# 14 NA NA NA 23 23 23
# 15 NA 23 23 23 23 23
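The forward-fill and backward-fill comparison above can also be sketched without zoo. The helpers `fill_forward` and `fill_backward` below are hypothetical stand-ins written for this illustration, not zoo functions; the `station` vector is the Station column from the question.

```r
# Hypothetical helper: carry the last non-NA value forward
fill_forward <- function(x) {
  idx <- cummax(seq_along(x) * (!is.na(x)))
  idx[idx == 0] <- NA          # nothing to carry before the first value
  x[idx]
}

# Hypothetical helper: carry the next non-NA value backward
fill_backward <- function(x) rev(fill_forward(rev(x)))

station <- c(21, NA, NA, 21, NA, NA, 22, NA, 22, NA, NA, NA, 23, NA, 23)

fwd <- fill_forward(station)
bwd <- fill_backward(station)

# Fill only where the value before and the value after agree
result <- ifelse(!is.na(fwd) & !is.na(bwd) & fwd == bwd, fwd, station)
```

An NA is filled only when the nearest value on each side is the same, which is what separates a gap inside one station from a gap between two different stations.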

Based on the example, it seems that the 'Tow' and 'Tow.y' values match in a 'start', 'end' way. In that case, we can use base R methods.
Create a sequence index ('i1') to replicate the non-NA elements in 'Tow' (or 'Tow.y') for the 'Station' column. 'lst' is a list of numeric indices, which is used to assign the values to 'Station'.
lst <- do.call(Map, c(f = seq, unname(lapply(newdata,
function(x) seq_along(x)[!is.na(x)]))))
i1 <- unlist(lst)
newdata$Station[i1] <- rep(na.omit(newdata$Tow), lengths(lst))
newdata
# Tow.y Tow Station
#1 21 NA 21
#2 NA NA 21
#3 NA NA 21
#4 NA 21 21
#5 NA NA NA
#6 NA NA NA
#7 22 NA 22
#8 NA NA 22
#9 NA 22 22
#10 NA NA NA
#11 NA NA NA
#12 NA NA NA
#13 23 NA 23
#14 NA NA 23
#15 NA 23 23
Or using the same logic with tidyverse
library(tidyverse)
newdata %>%
mutate_all(~ row_number() * !is.na(.)) %>%
map( ~ .x[.x!=0]) %>%
transpose %>%
map(reduce, `:`) %>%
set_names(na.omit(newdata$Tow)) %>%
stack %>%
right_join(newdata %>% mutate(values = row_number())) %>%
rename(Station = ind) %>%
ungroup %>%
select(names(newdata), everything(), -values)
# Tow.y Tow Station
#1 21 NA 21
#2 NA NA 21
#3 NA NA 21
#4 NA 21 21
#5 NA NA <NA>
#6 NA NA <NA>
#7 22 NA 22
#8 NA NA 22
#9 NA 22 22
#10 NA NA <NA>
#11 NA NA <NA>
#12 NA NA <NA>
#13 23 NA 23
#14 NA NA 23
#15 NA 23 23

Combine dataframes with "one column name in common with different number of elements in that common column" into one

Assume we have 5 dataframes that share a timestamp column ("year"). Each may have a different number of rows and columns, but the first column of each is called year. Each starts and ends in a different year, so there is no common start or end date across the dataframes. We want to combine them into one dataframe aligned on year, so that the values recorded for a given year in one dataframe sit on the same row as the values for that year in the others. Where a dataframe has no data for a year, we want to fill the blanks with NA.
How can we line the data up and combine them into one dataframe?
Assume for the sake of argument, we have the following dataframes:
M1 <- data.frame(year=2000:2010, v1=16:26, v2=25:35)
M1; dim(M1) # 11x3
M2 <- data.frame(year=2005:2018, v3=6:19, v4=5:18, v5=3:16)
M2; dim(M2) #14x4
M3 <- data.frame(year=2002:2016, v3=3:17, v6=2:16, v7=0:14)
M3; dim(M3) # 15x4
M4 <- data.frame(year=2008:2020, v3=9:21, v6=8:20, v8=6:18)
M4; dim(M4) # 13x4
M5 <- data.frame(year=2018:2020, v9=19:21, v10=18:20, v11=16:18, v12=29:31)
M5; dim(M5) # 3x5
Note: A very similar question was asked by another useR and closed as "not clear". This is a clearer restatement of it.

I think it's more efficient to use the full_join command of dplyr. You can use full_join every time a new dataframe needs to be joined, or use it once within a reduce function that will work sequentially. See both methods below:
# create example datasets
M1 <- data.frame(year=2000:2010, v1=16:26, v2=25:35)
M2 <- data.frame(year=2005:2018, v3=6:19, v4=5:18, v5=3:16)
M3 <- data.frame(year=2002:2016, v3=3:17, v6=2:16, v7=0:14)
M4 <- data.frame(year=2008:2020, v3=9:21, v6=8:20, v8=6:18)
M5 <- data.frame(year=2018:2020, v9=19:21, v10=18:20, v11=16:18, v12=29:31)
First method:
library(dplyr)
# use the full_join command
# you have to "manually" use a full_join command for every new dataset you want to join
full_join(M1, M2, by="year") %>%
full_join(M3, by="year") %>%
full_join(M4, by="year") %>%
full_join(M5, by="year")
# year v1 v2 v3.x v4 v5 v3.y v6.x v7 v3 v6.y v8 v9 v10 v11 v12
# 1 2000 16 25 NA NA NA NA NA NA NA NA NA NA NA NA NA
# 2 2001 17 26 NA NA NA NA NA NA NA NA NA NA NA NA NA
# 3 2002 18 27 NA NA NA 3 2 0 NA NA NA NA NA NA NA
# 4 2003 19 28 NA NA NA 4 3 1 NA NA NA NA NA NA NA
# 5 2004 20 29 NA NA NA 5 4 2 NA NA NA NA NA NA NA
# 6 2005 21 30 6 5 3 6 5 3 NA NA NA NA NA NA NA
# 7 2006 22 31 7 6 4 7 6 4 NA NA NA NA NA NA NA
# 8 2007 23 32 8 7 5 8 7 5 NA NA NA NA NA NA NA
# 9 2008 24 33 9 8 6 9 8 6 9 8 6 NA NA NA NA
# 10 2009 25 34 10 9 7 10 9 7 10 9 7 NA NA NA NA
# 11 2010 26 35 11 10 8 11 10 8 11 10 8 NA NA NA NA
# 12 2011 NA NA 12 11 9 12 11 9 12 11 9 NA NA NA NA
# 13 2012 NA NA 13 12 10 13 12 10 13 12 10 NA NA NA NA
# 14 2013 NA NA 14 13 11 14 13 11 14 13 11 NA NA NA NA
# 15 2014 NA NA 15 14 12 15 14 12 15 14 12 NA NA NA NA
# 16 2015 NA NA 16 15 13 16 15 13 16 15 13 NA NA NA NA
# 17 2016 NA NA 17 16 14 17 16 14 17 16 14 NA NA NA NA
# 18 2017 NA NA 18 17 15 NA NA NA 18 17 15 NA NA NA NA
# 19 2018 NA NA 19 18 16 NA NA NA 19 18 16 19 18 16 29
# 20 2019 NA NA NA NA NA NA NA NA 20 19 17 20 19 17 30
# 21 2020 NA NA NA NA NA NA NA NA 21 20 18 21 20 18 31
Second method:
library(purrr)
# apply full join sequentially to the datasets in your list
list(M1,M2,M3,M4,M5) %>%
reduce(full_join, by="year")
# year v1 v2 v3.x v4 v5 v3.y v6.x v7 v3 v6.y v8 v9 v10 v11 v12
# 1 2000 16 25 NA NA NA NA NA NA NA NA NA NA NA NA NA
# 2 2001 17 26 NA NA NA NA NA NA NA NA NA NA NA NA NA
# 3 2002 18 27 NA NA NA 3 2 0 NA NA NA NA NA NA NA
# 4 2003 19 28 NA NA NA 4 3 1 NA NA NA NA NA NA NA
# 5 2004 20 29 NA NA NA 5 4 2 NA NA NA NA NA NA NA
# 6 2005 21 30 6 5 3 6 5 3 NA NA NA NA NA NA NA
# 7 2006 22 31 7 6 4 7 6 4 NA NA NA NA NA NA NA
# 8 2007 23 32 8 7 5 8 7 5 NA NA NA NA NA NA NA
# 9 2008 24 33 9 8 6 9 8 6 9 8 6 NA NA NA NA
# 10 2009 25 34 10 9 7 10 9 7 10 9 7 NA NA NA NA
# 11 2010 26 35 11 10 8 11 10 8 11 10 8 NA NA NA NA
# 12 2011 NA NA 12 11 9 12 11 9 12 11 9 NA NA NA NA
# 13 2012 NA NA 13 12 10 13 12 10 13 12 10 NA NA NA NA
# 14 2013 NA NA 14 13 11 14 13 11 14 13 11 NA NA NA NA
# 15 2014 NA NA 15 14 12 15 14 12 15 14 12 NA NA NA NA
# 16 2015 NA NA 16 15 13 16 15 13 16 15 13 NA NA NA NA
# 17 2016 NA NA 17 16 14 17 16 14 17 16 14 NA NA NA NA
# 18 2017 NA NA 18 17 15 NA NA NA 18 17 15 NA NA NA NA
# 19 2018 NA NA 19 18 16 NA NA NA 19 18 16 19 18 16 29
# 20 2019 NA NA NA NA NA NA NA NA 20 19 17 20 19 17 30
# 21 2020 NA NA NA NA NA NA NA NA 21 20 18 21 20 18 31

You can simply do,
Reduce(function(x, y)merge(x, y, by = 'year', all = TRUE), mget(ls(pattern = 'M[0-9]+')))
which gives,
year v1 v2 v3.x v4 v5 v3.y v6.x v7 v3 v6.y v8 v9 v10 v11 v12
1 2000 16 25 NA NA NA NA NA NA NA NA NA NA NA NA NA
2 2001 17 26 NA NA NA NA NA NA NA NA NA NA NA NA NA
3 2002 18 27 NA NA NA 3 2 0 NA NA NA NA NA NA NA
4 2003 19 28 NA NA NA 4 3 1 NA NA NA NA NA NA NA
5 2004 20 29 NA NA NA 5 4 2 NA NA NA NA NA NA NA
6 2005 21 30 6 5 3 6 5 3 NA NA NA NA NA NA NA
7 2006 22 31 7 6 4 7 6 4 NA NA NA NA NA NA NA
8 2007 23 32 8 7 5 8 7 5 NA NA NA NA NA NA NA
9 2008 24 33 9 8 6 9 8 6 9 8 6 NA NA NA NA
10 2009 25 34 10 9 7 10 9 7 10 9 7 NA NA NA NA
11 2010 26 35 11 10 8 11 10 8 11 10 8 NA NA NA NA
12 2011 NA NA 12 11 9 12 11 9 12 11 9 NA NA NA NA
13 2012 NA NA 13 12 10 13 12 10 13 12 10 NA NA NA NA
14 2013 NA NA 14 13 11 14 13 11 14 13 11 NA NA NA NA
15 2014 NA NA 15 14 12 15 14 12 15 14 12 NA NA NA NA
16 2015 NA NA 16 15 13 16 15 13 16 15 13 NA NA NA NA
17 2016 NA NA 17 16 14 17 16 14 17 16 14 NA NA NA NA
18 2017 NA NA 18 17 15 NA NA NA 18 17 15 NA NA NA NA
19 2018 NA NA 19 18 16 NA NA NA 19 18 16 19 18 16 29
20 2019 NA NA NA NA NA NA NA NA 20 19 17 20 19 17 30
21 2020 NA NA NA NA NA NA NA NA 21 20 18 21 20 18 31

Some of the data frames have identical column names, so there may be different values for the same year and column name.
Therefore, I suggest combining all data frames with rbindlist() and then using melt() and dcast() with an appropriate aggregation function, which makes those "duplicate" entries visible:
df_list <- mget(paste0("M", 1:5))
library(data.table)
rbindlist(df_list, use.names = TRUE, fill = TRUE, idcol = "df")[
, melt(.SD, id.vars = c("df", "year"), na.rm = TRUE)][
, dcast(.SD, year ~ variable, function(x) toString(unique(x)))]
year v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12
1: 2000 16 25
2: 2001 17 26
3: 2002 18 27 3 2 0
4: 2003 19 28 4 3 1
5: 2004 20 29 5 4 2
6: 2005 21 30 6 5 3 5 3
7: 2006 22 31 7 6 4 6 4
8: 2007 23 32 8 7 5 7 5
9: 2008 24 33 9 8 6 8 6 6
10: 2009 25 34 10 9 7 9 7 7
11: 2010 26 35 11 10 8 10 8 8
12: 2011 12 11 9 11 9 9
13: 2012 13 12 10 12 10 10
14: 2013 14 13 11 13 11 11
15: 2014 15 14 12 14 12 12
16: 2015 16 15 13 15 13 13
17: 2016 17 16 14 16 14 14
18: 2017 18 17 15 17 15
19: 2018 19 18 16 18 16 19 18 16 29
20: 2019 20 19 17 20 19 17 30
21: 2020 21 20 18 21 20 18 31
year v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12
Alternatively, the name of the source data frame can be used to make the column names unique:
rbindlist(df_list, use.names = TRUE, fill = TRUE, idcol = "df")[
, melt(.SD, id.vars = c("df", "year"), na.rm = TRUE)][
, dcast(.SD, year ~ paste(variable, df, sep = "_"))]
year v10_M5 v11_M5 v12_M5 v1_M1 v2_M1 v3_M2 v3_M3 v3_M4 v4_M2 v5_M2 v6_M3 v6_M4 v7_M3 v8_M4 v9_M5
1: 2000 NA NA NA 16 25 NA NA NA NA NA NA NA NA NA NA
2: 2001 NA NA NA 17 26 NA NA NA NA NA NA NA NA NA NA
3: 2002 NA NA NA 18 27 NA 3 NA NA NA 2 NA 0 NA NA
4: 2003 NA NA NA 19 28 NA 4 NA NA NA 3 NA 1 NA NA
5: 2004 NA NA NA 20 29 NA 5 NA NA NA 4 NA 2 NA NA
6: 2005 NA NA NA 21 30 6 6 NA 5 3 5 NA 3 NA NA
7: 2006 NA NA NA 22 31 7 7 NA 6 4 6 NA 4 NA NA
8: 2007 NA NA NA 23 32 8 8 NA 7 5 7 NA 5 NA NA
9: 2008 NA NA NA 24 33 9 9 9 8 6 8 8 6 6 NA
10: 2009 NA NA NA 25 34 10 10 10 9 7 9 9 7 7 NA
11: 2010 NA NA NA 26 35 11 11 11 10 8 10 10 8 8 NA
12: 2011 NA NA NA NA NA 12 12 12 11 9 11 11 9 9 NA
13: 2012 NA NA NA NA NA 13 13 13 12 10 12 12 10 10 NA
14: 2013 NA NA NA NA NA 14 14 14 13 11 13 13 11 11 NA
15: 2014 NA NA NA NA NA 15 15 15 14 12 14 14 12 12 NA
16: 2015 NA NA NA NA NA 16 16 16 15 13 15 15 13 13 NA
17: 2016 NA NA NA NA NA 17 17 17 16 14 16 16 14 14 NA
18: 2017 NA NA NA NA NA 18 NA 18 17 15 NA 17 NA 15 NA
19: 2018 18 16 29 NA NA 19 NA 19 18 16 NA 18 NA 16 19
20: 2019 19 17 30 NA NA NA NA 20 NA NA NA 19 NA 17 20
21: 2020 20 18 31 NA NA NA NA 21 NA NA NA 20 NA 18 21
year v10_M5 v11_M5 v12_M5 v1_M1 v2_M1 v3_M2 v3_M3 v3_M4 v4_M2 v5_M2 v6_M3 v6_M4 v7_M3 v8_M4 v9_M5
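The idea of tagging each column with its source data frame can also be sketched in base R, without data.table. This is an illustrative alternative rather than the answer's method; the names `frames`, `tagged`, and `combined` are chosen here.

```r
M1 <- data.frame(year = 2000:2010, v1 = 16:26, v2 = 25:35)
M2 <- data.frame(year = 2005:2018, v3 = 6:19, v4 = 5:18, v5 = 3:16)
M3 <- data.frame(year = 2002:2016, v3 = 3:17, v6 = 2:16, v7 = 0:14)
M4 <- data.frame(year = 2008:2020, v3 = 9:21, v6 = 8:20, v8 = 6:18)
M5 <- data.frame(year = 2018:2020, v9 = 19:21, v10 = 18:20, v11 = 16:18, v12 = 29:31)

frames <- list(M1 = M1, M2 = M2, M3 = M3, M4 = M4, M5 = M5)

# Append the source name to every column except the key
tagged <- Map(function(d, nm) {
  names(d)[-1] <- paste(names(d)[-1], nm, sep = "_")
  d
}, frames, names(frames))

# Full outer join on year, one data frame at a time
combined <- Reduce(function(x, y) merge(x, y, by = "year", all = TRUE), tagged)

names(combined)
```

Renaming before the merge avoids the automatic `.x`/`.y` suffixes, so it stays clear which data frame each `v3` or `v6` came from.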

Step 1: Find the minimum and maximum year taking into account all the dataframes:
min(M1["year"], M2["year"], M3["year"], M4["year"], M5["year"]) # 2000
max(M1["year"], M2["year"], M3["year"], M4["year"], M5["year"]) # 2020
Step 2: Extend M1 through M5 by padding them with NA rows, taking care of the missing years at the beginning and at the end:
M1NA <- rbind(M1, data.frame(year=2011:2020, v1=NA, v2=NA))
M1NA
M2NA <- rbind(data.frame(year=2000:2004, v3=NA, v4=NA, v5=NA), M2, data.frame(year=2019:2020, v3=NA, v4=NA, v5=NA))
M2NA
M3NA <- rbind(data.frame(year=2000:2001, v3=NA, v6=NA, v7=NA), M3, data.frame(year=2017:2020, v3=NA, v6=NA, v7=NA))
M3NA
M4NA <- rbind(data.frame(year=2000:2007, v3=NA, v6=NA, v8=NA), M4)
M4NA
M5NA <- rbind(data.frame(year=2000:2017, v9=NA, v10=NA, v11=NA, v12=NA), M5)
M5NA
Step 3: Combine the NA-padded dataframes into the final dataframe. There is no need to repeat the year column from the other dataframes, so drop it from each:
CombinedFrame <- cbind(M1NA, M2NA[-1], M3NA[-1], M4NA[-1], M5NA[-1])
CombinedFrame
If you need a matrix, convert the resulting dataframe in either of the following ways:
CombinedMatrix <- as.matrix(sapply(CombinedFrame, as.numeric))
CombinedMatrix
CombinedMatrix <- matrix(as.numeric(unlist(CombinedFrame)),nrow=nrow(CombinedFrame))
CombinedMatrix
Note: The above conversions take into account the possibility of the existence of strings in dataframe (i.e., the matrix at hand at the beginning)
This produces the following (just as desired):
year v1 v2 v3 v4 v5 v3 v6 v7 v3 v6 v8 v9 v10 v11 v12
[1,] 2000 16 25 NA NA NA NA NA NA NA NA NA NA NA NA NA
[2,] 2001 17 26 NA NA NA NA NA NA NA NA NA NA NA NA NA
[3,] 2002 18 27 NA NA NA 3 2 0 NA NA NA NA NA NA NA
[4,] 2003 19 28 NA NA NA 4 3 1 NA NA NA NA NA NA NA
[5,] 2004 20 29 NA NA NA 5 4 2 NA NA NA NA NA NA NA
[6,] 2005 21 30 6 5 3 6 5 3 NA NA NA NA NA NA NA
[7,] 2006 22 31 7 6 4 7 6 4 NA NA NA NA NA NA NA
[8,] 2007 23 32 8 7 5 8 7 5 NA NA NA NA NA NA NA
[9,] 2008 24 33 9 8 6 9 8 6 9 8 6 NA NA NA NA
[10,] 2009 25 34 10 9 7 10 9 7 10 9 7 NA NA NA NA
[11,] 2010 26 35 11 10 8 11 10 8 11 10 8 NA NA NA NA
[12,] 2011 NA NA 12 11 9 12 11 9 12 11 9 NA NA NA NA
[13,] 2012 NA NA 13 12 10 13 12 10 13 12 10 NA NA NA NA
[14,] 2013 NA NA 14 13 11 14 13 11 14 13 11 NA NA NA NA
[15,] 2014 NA NA 15 14 12 15 14 12 15 14 12 NA NA NA NA
[16,] 2015 NA NA 16 15 13 16 15 13 16 15 13 NA NA NA NA
[17,] 2016 NA NA 17 16 14 17 16 14 17 16 14 NA NA NA NA
[18,] 2017 NA NA 18 17 15 NA NA NA 18 17 15 NA NA NA NA
[19,] 2018 NA NA 19 18 16 NA NA NA 19 18 16 19 18 16 29
[20,] 2019 NA NA NA NA NA NA NA NA 20 19 17 20 19 17 30
[21,] 2020 NA NA NA NA NA NA NA NA 21 20 18 21 20 18 31
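The three manual steps can be generalised so that the year ranges do not have to be typed by hand. The sketch below is one possible way to do that in base R; `pad_to_years` is a hypothetical helper name, and the approach (looking rows up with `match()`) is a substitute for the explicit `rbind()` calls, not a restatement of them.

```r
M1 <- data.frame(year = 2000:2010, v1 = 16:26, v2 = 25:35)
M2 <- data.frame(year = 2005:2018, v3 = 6:19, v4 = 5:18, v5 = 3:16)
M3 <- data.frame(year = 2002:2016, v3 = 3:17, v6 = 2:16, v7 = 0:14)
M4 <- data.frame(year = 2008:2020, v3 = 9:21, v6 = 8:20, v8 = 6:18)
M5 <- data.frame(year = 2018:2020, v9 = 19:21, v10 = 18:20, v11 = 16:18, v12 = 29:31)
frames <- list(M1, M2, M3, M4, M5)

# Step 1: overall year range
years <- seq(min(sapply(frames, function(d) min(d$year))),
             max(sapply(frames, function(d) max(d$year))))

# Step 2: hypothetical helper that aligns one frame to the full range;
# years missing from the frame become all-NA rows, and year is dropped
pad_to_years <- function(d, years) {
  out <- d[match(years, d$year), -1, drop = FALSE]
  rownames(out) <- NULL
  out
}

# Step 3: bind the padded frames side by side, with year once
CombinedFrame <- cbind(year = years,
                       do.call(cbind, lapply(frames, pad_to_years, years = years)))
```

As in the manual version, the repeated names `v3` and `v6` are kept as they are, so columns are best addressed by position when they collide.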

Rank instances by missing amount in descending order

I want to sort this dataset so that the rows (instances) are ranked by their number of missing values, in descending order.
Can someone help me do this in R? Is there a command for it?
df=data.frame(x=c(1,4,6,NA,7,NA,9,10,4,NA),
y=c(10,12,NA,NA,14,18,20,15,12,17),
z=c(225,198,NA,NA,NA,130,NA,200,NA,99),
v=c(44,51,NA,NA,45,NA,25,36,75,NA))
df
x y z v
1 1 10 225 44
2 4 12 198 51
3 6 NA NA NA
4 NA NA NA NA
5 7 14 NA 45
6 NA 18 130 NA
7 9 20 NA 25
8 10 15 200 36
9 4 12 NA 75
10 NA 17 99 NA
I want to get this result:
x y z v
4 NA NA NA NA
3 6 NA NA NA
6 NA 18 130 NA
10 NA 17 99 NA
5 7 14 NA 45
7 9 20 NA 25
9 4 12 NA 75
1 1 10 225 44
2 4 12 198 51
8 10 15 200 36

In my comment I incorrectly remembered the name of the argument for changing the direction of an order result. The fix is simply to use the correct name:
> df[ order(rowSums(is.na(df)), decreasing=TRUE), ]
x y z v
4 NA NA NA NA
3 6 NA NA NA
6 NA 18 130 NA
10 NA 17 99 NA
5 7 14 NA 45
7 9 20 NA 25
9 4 12 NA 75
1 1 10 225 44
2 4 12 198 51
8 10 15 200 36
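To make the one-liner easier to follow, here is the same ranking split into steps. The intermediate names `na_count` and `ranked` are introduced for illustration, and `order(-na_count)` is used here as an equivalent way to sort in descending order.

```r
df <- data.frame(x = c(1, 4, 6, NA, 7, NA, 9, 10, 4, NA),
                 y = c(10, 12, NA, NA, 14, 18, 20, 15, 12, 17),
                 z = c(225, 198, NA, NA, NA, 130, NA, 200, NA, 99),
                 v = c(44, 51, NA, NA, 45, NA, 25, 36, 75, NA))

# is.na(df) is a logical matrix; rowSums() counts the TRUEs in each row
na_count <- rowSums(is.na(df))

# Negating the counts sorts from most to fewest missing values;
# rows with equal counts keep their original relative order
ranked <- df[order(-na_count), ]

ranked
```

Keeping `na_count` as a separate vector also makes it easy to inspect how many values each row is missing before reordering.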
