Inputting values into data frame of longer length - r

I have 2 data frames with different numbers of rows (A has 55 and B has 41). I would like to take the Py values from data frame B and put them into A$Py corresponding to the "Link".
I tried
link.list <- A$Link
for(i in 1:length(link.list)){
A$Py[i] <- B[which(B$Link==link.list[i]), "Py"]
}
But get:
Error in A$Py[i] <- B[which(B$Link == link.list[i]), "Py"] :
replacement has length zero
I assume this error is triggered when there is an A$Link that is not in B. Any ideas for solving this problem?
Thanks
data frame A:
Link VU Py
1 DVH1-1 1 NA
2 DVH1-10 9 NA
3 DVH1-2 1 NA
4 DVH1-3 1 NA
5 DVH1-4 9 NA
6 DVH1-5 9 NA
7 DVH1-6 1 NA
8 DVH1-7 1 NA
9 DVH1-8 10 NA
10 DVH1-9 10 NA
11 DVH2-1 2 NA
12 DVH2-2 1 NA
13 DVH2-3 9 NA
14 DVH2-4 9 NA
15 DVH2-5 10 NA
16 DVH2-6 9 NA
17 DVH2-7 4 NA
18 DVH2-8 9 NA
19 DVH3-1 1 NA
20 DVH3-2 12 NA
21 DVH3-3 12 NA
22 DWH1-1 4 NA
23 DWH1-10 8 NA
24 DWH1-2 4 NA
25 DWH1-3 4 NA
26 DWH1-4 8 NA
27 DWH1-5 8 NA
28 DWH1-6 4 NA
29 DWH1-7 4 NA
30 DWH1-8 9 NA
31 DWH1-9 9 NA
32 DWH2-1 4 NA
33 DWH2-2 4 NA
34 DWH2-3 8 NA
35 DWH2-4 8 NA
36 DWH2-5 8 NA
37 DWH2-6 8 NA
38 DWH2-7 7 NA
39 DWH2-8 5 NA
40 DWH3-1 3 NA
41 DWH3-2 49 NA
42 DWH3-3 0 NA
43 MH1-1 0 NA
44 MH1-2 1 NA
45 MH1-3 1 NA
46 MH1-4 1 NA
47 MH1-5 1 NA
48 UH1-1 17 NA
49 UH1-2 17 NA
50 UH1-3 17 NA
51 UH1-4 19 NA
52 UH2-1 4 NA
53 UH2-2 15 NA
54 UH3-1 24 NA
55 UH3-2 25 NA
data frame B:
Link Py
1 DVH1-1 0
2 DVH1-10 4
3 DVH1-2 0
4 DVH1-3 14
5 DVH1-4 0
6 DVH1-5 2
7 DVH1-6 12
8 DVH1-7 11
9 DVH1-8 9
10 DVH1-9 9
11 DVH2-1 0
12 DVH2-2 14
13 DVH2-3 3
14 DVH2-4 0
15 DVH2-5 10
16 DVH2-6 0
17 DVH2-7 2
18 DVH2-8 4
19 DVH3-1 16
20 DVH3-3 8
21 DWH1-1 6
22 DWH1-10 2
23 DWH1-2 0
24 DWH1-3 7
25 DWH1-5 0
26 DWH1-6 12
27 DWH1-7 10
28 DWH1-8 0
29 DWH1-9 3
30 DWH2-1 0
31 DWH2-2 10
32 DWH2-7 0
33 DWH2-8 9
34 DWH3-1 0
35 DWH3-2 0
36 MH1-1 0
37 UH1-3 6
38 UH1-4 4
39 UH2-1 0
40 UH2-2 9
41 UH3-2 4

Use merge with by = 'Link'. Setting all.x = TRUE keeps every row of x (in your case x = A) and fills Py with NA where B has no matching Link.
I've only passed the first two columns of A, as the A$Py values in your example are all NA.
merge(A[,1:2],B,by='Link', all.x = TRUE)
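As a minimal sketch of what all.x = TRUE does, the following uses three rows copied from the data above (DVH3-2 appears in A but not in B). The reduced data frames are only for illustration.

```r
# Three rows of A and the two matching rows of B, copied from the question
A <- data.frame(Link = c("DVH1-1", "DVH1-10", "DVH3-2"),
                VU   = c(1, 9, 12),
                Py   = NA)
B <- data.frame(Link = c("DVH1-1", "DVH1-10"),
                Py   = c(0, 4))

# Left join: every row of A is kept; Py is NA where B has no such Link
merged <- merge(A[, c("Link", "VU")], B, by = "Link", all.x = TRUE)
print(merged)
```

Note that merge() sorts the result by the by column by default, so the row order can differ from the order in A.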

> head(a)
X Link VU Py
1 1 DVH1-1 1 NA
2 2 DVH1-10 9 NA
3 3 DVH1-2 1 NA
4 4 DVH1-3 1 NA
5 5 DVH1-4 9 NA
6 6 DVH1-5 9 NA
> head(b)
X Link Py
1 1 DVH1-1 0
2 2 DVH1-10 4
3 3 DVH1-2 0
4 4 DVH1-3 14
5 5 DVH1-4 0
6 6 DVH1-5 2
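# match() gives, for each a$Link, the row of b with the same Link (NA if there is none).
# Indexing b with a logical mask built from a would misalign rows once a Link is missing from b.
a$Py1 <- b$Py[match(a$Link, b$Link)]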
> head(a)
X Link VU Py Py1
1 1 DVH1-1 1 NA 0
2 2 DVH1-10 9 NA 4
3 3 DVH1-2 1 NA 0
4 4 DVH1-3 1 NA 14
5 5 DVH1-4 9 NA 0
6 6 DVH1-5 9 NA 2


R: How to swap values in a data frame with condition?

My dataset has two IDs per row, one for a parent and one for a child, but I don't know which is which. I do, however, have their ages.
This is the table I am working with:
ID1 ID2 sex1 sex2 age1 age2
1 8 9 1 2 44 11
2 17 7 1 1 56 76
3 1 44 NA NA 16 55
4 3 13 NA NA NA NA
5 55 6 2 NA 56 10
6 4 33 2 NA 45 9
7 2 66 1 NA 12 45
8 72 99 NA NA NA NA
9 12 11 2 2 30 12
By using an if statement, I want to identify who's who according to their age.
Here is the code I made but it is not working:
install.packages('seqinr')
library(seqinr)
for (i in 1:nrow(data)){
if (data$age2[i]> data$age1[i]){
swap(data$age1[i], data$age2[i])
}
}
The error message:
Error in if (data$age2[i] > data$age1[i]) { :
missing value where TRUE/FALSE needed
I want to put the parents' age in age1 and the child's age in age2.
Does anyone have a better idea of how to do this?
Welcome to SO!
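The error comes from the rows where an age is NA: the comparison returns NA, which if() cannot evaluate. You can do this without any for loop if you only need the higher value in age1 and the lower value in age2, by comparing the two columns row by row:
# Written to age_1/age_2 so the result can be compared with the original columns.
# To overwrite age1/age2 instead, compute both results before assigning, because the second line reads age1.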
df$age_1 <- pmax(df$age1, df$age2)
df$age_2 <- pmin(df$age1, df$age2)
With result:
ID1 ID2 sex1 sex2 age1 age2 age_1 age_2
1 8 9 1 2 44 11 44 11
2 17 7 1 1 56 76 76 56
3 1 44 NA NA 16 55 55 16
4 3 13 NA NA NA NA NA NA
5 55 6 2 NA 56 10 56 10
6 4 33 2 NA 45 9 45 9
7 2 66 1 NA 12 45 45 12
8 72 99 NA NA NA NA NA NA
9 12 11 2 2 30 12 30 12
With data:
df <- read.table(text = 'ID1 ID2 sex1 sex2 age1 age2
1 8 9 1 2 44 11
2 17 7 1 1 56 76
3 1 44 NA NA 16 55
4 3 13 NA NA NA NA
5 55 6 2 NA 56 10
6 4 33 2 NA 45 9
7 2 66 1 NA 12 45
8 72 99 NA NA NA NA
9 12 11 2 2 30 12', header = T)
library(tidyverse)
df <- read_table(
"ID1 ID2 sex1 sex2 age1 age2
8 9 1 2 44 11
17 7 1 1 56 76
1 44 NA NA 16 55
3 13 NA NA NA NA
55 6 2 NA 56 10
4 33 2 NA 45 9
2 66 1 NA 12 45
72 99 NA NA NA NA
12 11 2 2 30 12"
)
Method 1:
df %>%
transform(age1 = case_when(age1 > age2 ~ age1,
TRUE ~ age2),
age2 = case_when(age2 < age1 ~ age2,  # age2 keeps the smaller value
TRUE ~ age1))
Method 2:
df %>%
transform(age1 = pmax(age1, age2),
age2 = pmin(age1, age2))
ID1 ID2 sex1 sex2 age1 age2
1 8 9 1 2 44 11
2 17 7 1 1 76 56
3 1 44 NA NA 55 16
4 3 13 NA NA NA NA
5 55 6 2 NA 56 10
6 4 33 2 NA 45 9
7 2 66 1 NA 45 12
8 72 99 NA NA NA NA
9 12 11 2 2 30 12
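Both methods reorder only the ages. If the goal is to keep each person's ID and sex together with their age, one possible approach is to swap all paired columns for the rows where the second person is older. This is a minimal base R sketch, assuming the columns pair up as ID1/ID2, sex1/sex2 and age1/age2; rows with a missing age are left untouched.

```r
df <- read.table(text = "ID1 ID2 sex1 sex2 age1 age2
1 8 9 1 2 44 11
2 17 7 1 1 56 76
3 1 44 NA NA 16 55
4 3 13 NA NA NA NA
5 55 6 2 NA 56 10
6 4 33 2 NA 45 9
7 2 66 1 NA 12 45
8 72 99 NA NA NA NA
9 12 11 2 2 30 12", header = TRUE)

# Rows where both ages are known and the older person is in the second slot
swap <- !is.na(df$age1) & !is.na(df$age2) & df$age2 > df$age1

# Exchange the paired columns for those rows only.
# The right-hand side is evaluated first and assigned by position.
first  <- c("ID1", "sex1", "age1")
second <- c("ID2", "sex2", "age2")
df[swap, c(first, second)] <- df[swap, c(second, first)]

print(df)
```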

NA value in a dataframe

I am trying to apply a function to a column of a dataframe, but when I do I get a column full of NA values. I don't understand why.
Here is my code :
courbe <- function(x) exp(coef(regression)[1]*x+coef(regression[2]))
dataT[,c(2)] <- courbe(dataT[,c(1)])
And here my dataframe :
DateRep Cases
1 25 NA
2 24 NA
3 23 NA
4 22 NA
5 21 NA
6 20 NA
7 19 NA
8 18 NA
9 17 NA
10 16 NA
11 15 NA
12 14 NA
13 13 NA
14 12 NA
15 11 NA
16 10 NA
17 9 NA
18 8 NA
19 7 NA
20 6 NA
21 5 NA
22 4 NA
23 3 NA
24 2 NA
25 1 NA
26 0 NA
The output of print(coef(regression)) :
Coefficients:
(Intercept) dataT$DateRep
2.7095 0.2211
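As figured out in the comments, the mistake is a misplaced parenthesis: coef(regression[2]) subsets the model object before extracting coefficients, whereas coef(regression)[2] (in the same form as coef(regression)[1]) was intended.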
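A minimal sketch of the corrected function is shown below. The training data and the log-linear model are assumptions made for illustration, since the original fit is not shown. The printed coefficients list the intercept first, so the sketch also assumes that the intended curve is exp(intercept + slope * x).

```r
# Toy data standing in for the asker's series (an assumption)
train <- data.frame(DateRep = 0:25)
train$Cases <- exp(2.7 + 0.22 * train$DateRep)

# Log-linear fit: coef()[1] is the intercept, coef()[2] the slope
regression <- lm(log(Cases) ~ DateRep, data = train)

# Index the coefficient vector, not the model object
courbe <- function(x) exp(coef(regression)[1] + coef(regression)[2] * x)

# Fill the empty column, as in the question
dataT <- data.frame(DateRep = 25:0, Cases = NA)
dataT$Cases <- courbe(dataT$DateRep)
print(head(dataT))
```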

Is there a way to fix rows that have NA for all of their attributes and NA for their row names?

I am trying to store all rows with NA in my columns Math_G1, Math_G2 and Math_G3 in a dataset variable. However, when I do this, additional rows pop up that have NA for all of their attributes, including the row name (e.g. NA.1, NA.2 ...). How do I fix this?
I have already tried using the c() function to filter out these rows, and I have also tried the which() function, but the rows are still there.
Here is my code :
dat <- read.csv(file = "final merged.csv", stringsAsFactors=FALSE, na.strings=c("NA", "NULL"))
dat_small <- dat[c("age","traveltime","studytime",
"failures","famrel","freetime","goout","Dalc","Walc",
"health","absences","Math_G1","Math_G2","Math_G3","Por_G1","Por_G2","Por_G3","DoubleSub")]
sample_size <- 500
all_set <- sample(1:length(dat[,1]),sample_size,replace = F)
dat <- dat_small[all_set,]
index_na_math <- which(is.na(c(dat$Math_G1,dat$Math_G2,dat$Math_G3)))
index_na_por <- which(is.na(c(dat$Por_G1,dat$Por_G2,dat$Por_G3)))
index_na_both <- c(index_na_math,index_na_por)
#each row of my dataset helps define a specific student
#portugese and math are subjects that students within the dataset takes
dat_purepor <- dat[which(index_na_math),] #students who takes only portugese
dat_puremath <- dat[c(index_na_por),] # students who takes only math
dat_math <- dat[c(-index_na_math),] #students who takes math + students who take both
dat_por <- dat[c(-index_na_por),] #students who take portugese + students who take both
dat_both <- dat[c(-index_na_both),] #students who takes both math and portugese
dat_purepor
dat_puremath
I expected the output to be filtered according to my conditions but without any rows with NA as the values for all its columns so I don't understand why the final results return NA.
Here is a preview of the dataset dat_small:
> dat_small
age traveltime studytime failures famrel freetime goout Dalc Walc health absences Math_G1 Math_G2 Math_G3 Por_G1 Por_G2 Por_G3 DoubleSub
1 18 2 2 0 4 3 4 1 1 3 6 5 6 6 13 13 13 1
2 17 1 2 0 5 3 3 1 1 3 4 5 5 6 15 15 15 1
3 15 1 2 3 4 3 2 2 3 3 10 7 8 10 10 12 13 1
4 15 1 3 0 3 2 2 1 1 5 2 15 14 15 14 14 14 1
5 16 1 2 0 4 3 2 1 2 5 4 6 10 10 13 13 13 1
6 16 1 2 0 5 4 2 1 2 5 10 15 15 15 10 13 13 1
7 16 1 2 0 4 4 4 1 1 3 0 12 12 11 14 14 16 1
8 17 2 2 0 4 1 4 1 1 1 6 6 5 6 12 13 13 1
9 15 1 2 0 4 2 2 1 1 1 0 16 18 19 13 17 17 1
10 15 1 2 0 5 5 1 1 1 5 0 14 15 15 9 10 11 1
11 15 1 2 0 3 3 3 1 2 2 0 10 8 9 15 15 15 1
12 15 3 3 0 5 2 2 1 1 4 4 10 12 12 10 12 13 1
13 15 1 1 0 4 3 3 1 3 5 2 14 14 14 13 14 15 1
14 15 2 2 0 5 4 3 1 2 3 2 10 10 11 14 14 14 1
15 15 1 3 0 4 5 2 1 1 3 0 14 16 16 11 12 14 1
16 16 1 1 0 4 4 4 1 2 2 4 14 14 14 9 8 9 1
17 16 1 3 0 3 2 3 1 2 2 6 13 14 14 10 10 16 1
18 16 3 2 0 5 3 2 1 1 4 4 8 10 10 11 11 11 1
19 17 1 1 3 5 5 5 2 4 5 16 6 5 5 10 13 13 1
20 16 1 1 0 3 1 3 1 3 5 4 8 10 10 14 14 14 1
21 15 1 2 0 4 4 1 1 1 1 0 13 14 15 9 8 10 1
22 15 1 1 0 5 4 2 1 1 5 0 12 15 15 10 13 13 1
23 16 1 2 0 4 5 1 1 3 5 2 15 15 16 11 10 11 1
24 16 2 2 0 5 4 4 2 4 5 0 13 13 12 14 14 14 1
25 15 1 3 0 4 3 2 1 1 5 2 10 9 8 10 11 10 1
26 16 1 1 2 1 2 2 1 3 5 14 6 9 8 13 13 13 1
27 15 1 1 0 4 2 2 1 2 5 2 12 12 11 12 11 12 1
28 15 1 1 0 2 2 4 2 4 1 4 15 16 15 14 12 12 1
29 16 1 2 0 5 3 3 1 1 5 4 11 11 11 10 10 1 1
30 16 1 2 0 4 4 5 5 5 5 16 10 12 11 9 12 12 1
31 15 1 2 0 5 4 2 3 4 5 0 9 11 12 9 10 11 1
32 15 2 2 0 4 3 1 1 1 5 0 17 16 17 14 14 16 1
33 15 1 2 0 4 5 2 1 1 5 0 17 16 16 14 14 16 1
34 15 1 2 0 5 3 2 1 1 2 0 8 10 12 10 13 13 1
35 16 1 1 0 5 4 3 1 1 5 0 12 14 15 9 12 12 1
36 15 2 1 0 3 5 1 1 1 5 0 8 7 6 14 13 12 1
37 15 1 3 0 5 4 3 1 1 4 2 15 16 18 14 14 16 1
38 16 2 3 0 2 4 3 1 1 5 7 15 16 15 9 9 8 1
39 15 1 3 0 4 3 2 1 1 5 2 12 12 11 14 13 12 1
40 15 1 1 0 4 3 1 1 1 2 8 14 13 13 14 13 12 1
41 16 2 2 1 3 3 3 1 2 3 25 7 10 11 13 13 13 1
42 15 1 1 0 5 4 3 2 4 5 8 12 12 12 10 13 13 1
43 15 1 2 0 4 3 3 1 1 5 2 19 18 18 9 12 12 1
44 15 1 1 0 5 4 1 1 1 1 0 8 8 11 10 13 13 1
45 16 2 2 1 4 3 3 2 2 5 14 10 10 9 11 11 11 1
46 15 1 2 0 5 2 2 1 1 5 8 8 8 6 12 11 12 1
47 16 1 2 0 2 3 5 1 4 3 12 11 12 11 10 11 11 1
48 16 1 4 0 4 2 2 1 1 2 4 19 19 20 14 14 16 1
49 15 1 2 0 4 3 3 2 2 5 2 15 15 14 10 13 13 1
50 15 1 2 1 4 4 4 1 1 3 2 7 7 7 15 15 15 1
51 16 3 2 0 4 3 3 2 3 4 2 12 13 13 13 13 13 1
52 15 1 2 0 4 3 3 1 1 5 2 11 13 13 16 14 16 1
53 15 2 1 1 5 5 5 3 4 5 6 11 11 10 14 14 16 1
54 15 1 1 0 3 3 4 2 3 5 0 8 10 11 11 12 13 1
55 15 1 1 0 5 3 4 4 4 1 6 10 13 13 13 12 13 1
[ reached getOption("max.print") -- omitted 889 rows ]
Here is a preview of what happens when i run the dat_puremath dataset.
> dat_puremath
age traveltime studytime failures famrel freetime goout Dalc Walc health absences Math_G1 Math_G2 Math_G3 Por_G1 Por_G2 Por_G3 DoubleSub
918 15 2 4 0 4 4 2 2 3 3 12 16 16 16 NA NA NA 0
931 16 1 2 3 2 3 3 2 2 4 5 7 7 7 NA NA NA 0
933 16 1 2 0 3 3 4 1 1 4 0 12 13 14 NA NA NA 0
935 16 1 1 0 4 5 2 1 1 5 20 13 12 12 NA NA NA 0
927 16 2 2 0 3 4 4 1 4 5 2 13 13 11 NA NA NA 0
929 17 1 2 0 5 3 3 1 1 3 0 8 8 9 NA NA NA 0
942 17 1 3 0 3 3 2 2 2 3 3 11 11 11 NA NA NA 0
928 16 1 2 0 1 2 2 1 2 1 14 12 13 12 NA NA NA 0
936 17 1 3 0 3 2 3 1 1 4 4 10 9 9 NA NA NA 0
939 17 1 4 0 5 2 2 1 2 5 0 17 17 18 NA NA NA 0
941 17 1 2 0 4 2 2 1 1 3 12 11 9 9 NA NA NA 0
937 17 1 2 0 5 4 5 1 2 5 4 10 9 11 NA NA NA 0
925 16 1 2 0 4 4 2 1 1 3 0 14 14 14 NA NA NA 0
938 17 1 3 0 4 3 3 1 1 3 6 13 12 12 NA NA NA 0
921 15 1 3 0 4 2 2 1 1 5 2 9 11 11 NA NA NA 0
943 17 1 3 0 4 4 3 1 1 5 7 12 14 14 NA NA NA 0
NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.2 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.4 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.5 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.6 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.7 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.8 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.10 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.11 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.12 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.13 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.14 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.15 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.16 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.17 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.18 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.19 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.20 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.21 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.22 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.23 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.24 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.25 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.26 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.27 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.28 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.29 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.30 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA.31 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Can someone explain why this happens and how I can fix it? Thank you!
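is.na(c(dat$Math_G1,dat$Math_G2,dat$Math_G3)) concatenates the three columns into a single vector of length 3*nrow(dat), so which() can return positions larger than nrow(dat). Indexing dat with a row number that does not exist returns a row of NAs (named NA, NA.1, ...), which is what you are seeing.
Try the following, which builds one logical value per row: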
index_na_math <- (is.na(dat$Math_G1) | is.na(dat$Math_G2) | is.na(dat$Math_G3))
similarly for the other one, and then
index_na_both <- index_na_math | index_na_por
# or depending what you mean by 'both'
index_na_both <- index_na_math & index_na_por
The subsetting with dat_math <- dat[!index_na_math,] will yield the expected result (accordingly for the others).
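The effect can be reproduced on a small made-up data frame (the values below are hypothetical and only the column names are borrowed from the question):

```r
dat <- data.frame(Math_G1 = c(10, NA, 12, NA),
                  Math_G2 = c(11, NA, 13, NA),
                  Por_G1  = c(NA, 14, 15, 16))

# Concatenating columns gives positions 1..(2 * nrow(dat)), not row numbers
bad_idx <- which(is.na(c(dat$Math_G1, dat$Math_G2)))
print(bad_idx)         # some positions are larger than nrow(dat)
print(dat[bad_idx, ])  # out-of-range positions come back as all-NA rows

# A logical vector with one element per row avoids the problem
na_math  <- is.na(dat$Math_G1) | is.na(dat$Math_G2)
no_math  <- dat[na_math, ]    # rows with a missing math grade
has_math <- dat[!na_math, ]   # rows with all math grades present
```

Logical indexing also avoids a second pitfall: dat[-which(cond), ] returns zero rows when no element of cond is TRUE, whereas dat[!cond, ] returns all rows as expected.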

Expand a dataframe based on columns in the dataframe in R

I have the following dataframe in R
df<-data.frame(
"Val1"=seq(from=1, to=40, by=5), 'Val2'=c(2,4,2,5,11,3,5,3),
"Val3"=seq(from=5, to=40, by=5), "Val4"=c(3,5,7,3,7,5,7,8))
The resulting dataframe looks as follows. Val1 and Val3 are the causal variables, and Val2 and Val4 are the dependent variables.
Val1 Val2 Val3 Val4
1 1 2 5 3
2 6 4 10 5
3 11 2 15 7
4 16 5 20 3
5 21 11 25 7
6 26 3 30 5
7 31 5 35 7
8 36 3 40 8
I wish to obtain the following dataframe as an output
Val1 Val2 Val3 Val4
1 1 2 1 NA
2 2 NA 2 NA
3 3 NA 3 3
4 4 NA 4 NA
5 5 NA 5 NA
6 6 4 6 NA
7 7 NA 7 NA
8 8 NA 8 NA
9 9 NA 9 NA
10 10 NA 10 5
11 11 2 11 NA
12 12 NA 12 NA
13 13 NA 13 NA
14 14 NA 14 NA
15 15 NA 15 7
16 16 5 16 NA
17 17 NA 17 NA
18 18 NA 18 NA
19 19 NA 19 NA
20 20 NA 20 3
21 21 11 21 NA
22 22 NA 22 NA
23 23 NA 23 NA
24 24 NA 24 NA
25 25 NA 25 7
26 26 3 26 NA
27 27 NA 27 NA
28 28 NA 28 NA
29 29 NA 29 NA
30 30 NA 30 5
31 31 5 31 NA
32 32 NA 32 NA
33 33 NA 33 NA
34 34 NA 34 NA
35 35 NA 35 7
36 36 3 36 NA
37 37 NA 37 NA
38 38 NA 38 NA
39 39 NA 39 NA
40 40 NA 40 8
How do I accomplish this? I have written the following code, but it involves creating a second dataframe and then copying data from the first to the second. Is there a way to overwrite the existing dataframe? I would also like to avoid loops.
df2<-data.frame('Val1'=
seq(from=min(na.omit(c(df$Val1, df$Val3))), to= max(na.omit(c(df$Val1,
df$Val3))), by=1), "Val3"=seq(from=min(na.omit(c(df$Val1, df$Val3))), to=
max(na.omit(c(df$Val1, df$Val3))), by=1))
###### Create two loops
for(i in df$Val1){
for(j in df2$Val1){
if(i==j){
df2$Val2[df2$Val1==j]=df$Val2[df$Val1==i]
} else{df2$Val2[df2$Val1==j]=NA}}}
for(i in df$Val3){ for(j in df2$Val3){
if(i==j){df2$Val4[df2$Val3==j]=df$Val4[df$Val3==i]
} else{df2$Val4[df2$Val3==j]=NA}}}
Is there a faster, vectorised way to accomplish the same thing? Any help would be appreciated.
Assuming there's a slight error in your output example (row 3 should show NA for Val4 and the 3 in row 3 should be in row 5), this works:
library(tidyverse)
df_new <- bind_cols(
df %>%
select(Val1, Val2) %>%
complete(., expand(., Val1 = 1:40)),
df %>%
select(Val3, Val4) %>%
complete(., expand(., Val3 = 1:40))
)
> df_new
# A tibble: 40 x 4
Val1 Val2 Val3 Val4
<dbl> <dbl> <dbl> <dbl>
1 1 2 1 NA
2 2 NA 2 NA
3 3 NA 3 NA
4 4 NA 4 NA
5 5 NA 5 3
6 6 4 6 NA
7 7 NA 7 NA
8 8 NA 8 NA
9 9 NA 9 NA
10 10 NA 10 5
# ... with 30 more rows
We use bind_cols() to put the two halves of the dataframe side by side:
For the first half we select the first two columns, expand() the causal variable to 1:40 and complete() the data, so that values of Val1 with no observation get NA in Val2. We then do the same for the third and fourth columns.
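For comparison, the same result can be sketched in base R with match(). This is an alternative to the tidyverse answer, not part of it, and the names grid and df_new are chosen here for illustration.

```r
df <- data.frame(
  Val1 = seq(from = 1, to = 40, by = 5), Val2 = c(2, 4, 2, 5, 11, 3, 5, 3),
  Val3 = seq(from = 5, to = 40, by = 5), Val4 = c(3, 5, 7, 3, 7, 5, 7, 8))

# Every value the causal variables should take
grid <- 1:40

# match() returns the row of df holding each grid value, or NA if there is none;
# indexing with NA yields NA, which fills the gaps
df_new <- data.frame(
  Val1 = grid,
  Val2 = df$Val2[match(grid, df$Val1)],
  Val3 = grid,
  Val4 = df$Val4[match(grid, df$Val3)])

print(head(df_new, 10))
```

Assigning the result back to df instead of df_new overwrites the original dataframe.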

Rank instances by missing amount in descending order

I want to sort this dataset so that the rows (instances) are ranked by the number of missing values, in descending order.
Can someone help me do this in R? Is there a command for it?
df=data.frame(x=c(1,4,6,NA,7,NA,9,10,4,NA),
y=c(10,12,NA,NA,14,18,20,15,12,17),
z=c(225,198,NA,NA,NA,130,NA,200,NA,99),
v=c(44,51,NA,NA,45,NA,25,36,75,NA))
df
x y z v
1 1 10 225 44
2 4 12 198 51
3 6 NA NA NA
4 NA NA NA NA
5 7 14 NA 45
6 NA 18 130 NA
7 9 20 NA 25
8 10 15 200 36
9 4 12 NA 75
10 NA 17 99 NA
I want to get this result :
x y z v
4 NA NA NA NA
3 6 NA NA NA
6 NA 18 130 NA
10 NA 17 99 NA
5 7 14 NA 45
7 9 20 NA 25
9 4 12 NA 75
1 1 10 225 44
2 4 12 198 51
8 10 15 200 36
In my comment I incorrectly remembered the name of the argument for changing the direction of an order result. The fix is simply to use the correct name:
> df[ order(rowSums(is.na(df)), decreasing=TRUE), ]
x y z v
4 NA NA NA NA
3 6 NA NA NA
6 NA 18 130 NA
10 NA 17 99 NA
5 7 14 NA 45
7 9 20 NA 25
9 4 12 NA 75
1 1 10 225 44
2 4 12 198 51
8 10 15 200 36
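To make the one-liner easier to follow, here is the same computation split into steps: is.na(df) marks each missing cell, rowSums() counts them per row, and order() sorts on that count. The names na_count and sorted are chosen here for illustration.

```r
df <- data.frame(x = c(1, 4, 6, NA, 7, NA, 9, 10, 4, NA),
                 y = c(10, 12, NA, NA, 14, 18, 20, 15, 12, 17),
                 z = c(225, 198, NA, NA, NA, 130, NA, 200, NA, 99),
                 v = c(44, 51, NA, NA, 45, NA, 25, 36, 75, NA))

# Number of missing values in each row
na_count <- rowSums(is.na(df))
print(na_count)

# Rows with the most missing values first
sorted <- df[order(na_count, decreasing = TRUE), ]
print(sorted)
```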
