I have the following dataframe in R
df<-data.frame(
"Val1"=seq(from=1, to=40, by=5), 'Val2'=c(2,4,2,5,11,3,5,3),
"Val3"=seq(from=5, to=40, by=5), "Val4"=c(3,5,7,3,7,5,7,8))
The resulting dataframe looks as follows. Val1 and Val3 are the causal variables, and Val2 and Val4 are the dependent variables.
Val1 Val2 Val3 Val4
1 1 2 5 3
2 6 4 10 5
3 11 2 15 7
4 16 5 20 3
5 21 11 25 7
6 26 3 30 5
7 31 5 35 7
8 36 3 40 8
I wish to obtain the following dataframe as an output
Val1 Val2 Val3 Val4
1 1 2 1 NA
2 2 NA 2 NA
3 3 NA 3 3
4 4 NA 4 NA
5 5 NA 5 NA
6 6 4 6 NA
7 7 NA 7 NA
8 8 NA 8 NA
9 9 NA 9 NA
10 10 NA 10 5
11 11 2 11 NA
12 12 NA 12 NA
13 13 NA 13 NA
14 14 NA 14 NA
15 15 NA 15 7
16 16 5 16 NA
17 17 NA 17 NA
18 18 NA 18 NA
19 19 NA 19 NA
20 20 NA 20 3
21 21 11 21 NA
22 22 NA 22 NA
23 23 NA 23 NA
24 24 NA 24 NA
25 25 NA 25 7
26 26 3 26 NA
27 27 NA 27 NA
28 28 NA 28 NA
29 29 NA 29 NA
30 30 NA 30 5
31 31 5 31 NA
32 32 NA 32 NA
33 33 NA 33 NA
34 34 NA 34 NA
35 35 NA 35 7
36 36 3 36 NA
37 37 NA 37 NA
38 38 NA 38 NA
39 39 NA 39 NA
40 40 NA 40 8
How do I accomplish this? I have written the following code, but it involves creating a second dataframe and then copying data from the first to the second. Is there a way to overwrite the existing dataframe instead? I would like to avoid loops.
df2 <- data.frame(
  "Val1" = seq(from = min(na.omit(c(df$Val1, df$Val3))),
               to = max(na.omit(c(df$Val1, df$Val3))), by = 1),
  "Val3" = seq(from = min(na.omit(c(df$Val1, df$Val3))),
               to = max(na.omit(c(df$Val1, df$Val3))), by = 1))
# Start with NA everywhere; the loops below fill in only the matching rows.
# (An else branch that assigns NA would wipe out values set by earlier iterations.)
df2$Val2 <- NA
df2$Val4 <- NA
###### Create two loops
for (i in df$Val1) {
  for (j in df2$Val1) {
    if (i == j) {
      df2$Val2[df2$Val1 == j] <- df$Val2[df$Val1 == i]
    }
  }
}
for (i in df$Val3) {
  for (j in df2$Val3) {
    if (i == j) {
      df2$Val4[df2$Val3 == j] <- df$Val4[df$Val3 == i]
    }
  }
}
Is there a faster, vectorised way to accomplish the same thing? I would appreciate any help.
Assuming there's a slight error in your output example (row 3 should show NA for Val4 and the 3 in row 3 should be in row 5), this works:
library(tidyverse)
df_new <- bind_cols(
df %>%
select(Val1, Val2) %>%
complete(., expand(., Val1 = 1:40)),
df %>%
select(Val3, Val4) %>%
complete(., expand(., Val3 = 1:40))
)
> df_new
# A tibble: 40 x 4
Val1 Val2 Val3 Val4
<dbl> <dbl> <dbl> <dbl>
1 1 2 1 NA
2 2 NA 2 NA
3 3 NA 3 NA
4 4 NA 4 NA
5 5 NA 5 3
6 6 4 6 NA
7 7 NA 7 NA
8 8 NA 8 NA
9 9 NA 9 NA
10 10 NA 10 5
# ... with 30 more rows
We use bind_cols() to put together the two parts of the dataframe.
First we select the first two columns, expand() the causal variable to 1:40 and complete() the data, so the rows missing from the original are filled with NA. Then we do the same for the third and fourth columns.
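For comparison, the same result can be sketched in base R with match(), which looks up each position of the full 1:40 index in the causal column and returns NA where there is no match. This is only an illustration, not the approach used in the answer above; the names idx and df_new are chosen here for the example.

```r
# Data from the question
df <- data.frame(
  Val1 = seq(from = 1, to = 40, by = 5), Val2 = c(2, 4, 2, 5, 11, 3, 5, 3),
  Val3 = seq(from = 5, to = 40, by = 5), Val4 = c(3, 5, 7, 3, 7, 5, 7, 8)
)

# Full index that both causal variables are expanded to
idx <- 1:40

# match() gives the row of df holding each idx value, or NA if it is absent;
# indexing with NA yields NA, so unmatched rows are filled automatically
df_new <- data.frame(
  Val1 = idx,
  Val2 = df$Val2[match(idx, df$Val1)],
  Val3 = idx,
  Val4 = df$Val4[match(idx, df$Val3)]
)

head(df_new, 10)
```

Assigning the result back to df instead of df_new would overwrite the original dataframe, if that is what is wanted.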
My dataset has two IDs per row, one for a parent and one for a child, but I don't know which is which. I do, however, have their ages.
This is the table I am working with:
ID1 ID2 sex1 sex2 age1 age2
1 8 9 1 2 44 11
2 17 7 1 1 56 76
3 1 44 NA NA 16 55
4 3 13 NA NA NA NA
5 55 6 2 NA 56 10
6 4 33 2 NA 45 9
7 2 66 1 NA 12 45
8 72 99 NA NA NA NA
9 12 11 2 2 30 12
By using an if statement, I want to identify who's who according to their age.
Here is the code I made but it is not working:
install.packages('seqinr')
library(seqinr)
for (i in 1:nrow(data)){
if (data$age2[i]> data$age1[i]){
swap(data$age1[i], data$age2[i])
}
}
The error message:
Error in if (data$age2[i] > data$age1[i]) { :
missing value where TRUE/FALSE needed
I want to put the parent's age in age1 and the child's age in age2.
Does someone have a better idea of how to do it?
Welcome to SO!
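You can do this without any for loop. If you only need the higher value in age1 and the lower value in age2, compare the two columns row by row with pmax() and pmin():
# The results go into new columns age_1 and age_2 so they can be compared with the originals.
# To overwrite age1 and age2 instead, compute both results before assigning,
# because the second line would otherwise read the already-updated age1.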
df$age_1 <- pmax(df$age1, df$age2)
df$age_2 <- pmin(df$age1, df$age2)
With result:
ID1 ID2 sex1 sex2 age1 age2 age_1 age_2
1 8 9 1 2 44 11 44 11
2 17 7 1 1 56 76 76 56
3 1 44 NA NA 16 55 55 16
4 3 13 NA NA NA NA NA NA
5 55 6 2 NA 56 10 56 10
6 4 33 2 NA 45 9 45 9
7 2 66 1 NA 12 45 45 12
8 72 99 NA NA NA NA NA NA
9 12 11 2 2 30 12 30 12
With data:
df <- read.table(text = 'ID1 ID2 sex1 sex2 age1 age2
1 8 9 1 2 44 11
2 17 7 1 1 56 76
3 1 44 NA NA 16 55
4 3 13 NA NA NA NA
5 55 6 2 NA 56 10
6 4 33 2 NA 45 9
7 2 66 1 NA 12 45
8 72 99 NA NA NA NA
9 12 11 2 2 30 12', header = T)
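If the IDs and sexes should follow the ages, so that the whole parent record ends up in the "1" columns, one possible sketch is to build a logical index of the rows to swap and exchange the paired columns in a single assignment. This goes beyond what the answer above does, and the name swap is an assumption introduced for the example. Rows with a missing age are left untouched, which also avoids the "missing value where TRUE/FALSE needed" error from the loop in the question.

```r
df <- read.table(text = 'ID1 ID2 sex1 sex2 age1 age2
1    8   9    1    2   44   11
2   17   7    1    1   56   76
3    1  44   NA   NA   16   55
4    3  13   NA   NA   NA   NA
5   55   6    2   NA   56   10
6    4  33    2   NA   45    9
7    2  66    1   NA   12   45
8   72  99   NA   NA   NA   NA
9   12  11    2    2   30   12', header = TRUE)

# TRUE only where both ages are known and the second person is older
swap <- !is.na(df$age1) & !is.na(df$age2) & df$age2 > df$age1

# Data frame assignment is positional, so listing the columns in the
# opposite order on the right-hand side swaps each pair
cols_1 <- c("ID1", "sex1", "age1")
cols_2 <- c("ID2", "sex2", "age2")
df[swap, c(cols_1, cols_2)] <- df[swap, c(cols_2, cols_1)]

df
```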
library(tidyverse)
df <- read_table(
"ID1 ID2 sex1 sex2 age1 age2
8 9 1 2 44 11
17 7 1 1 56 76
1 44 NA NA 16 55
3 13 NA NA NA NA
55 6 2 NA 56 10
4 33 2 NA 45 9
2 66 1 NA 12 45
72 99 NA NA NA NA
12 11 2 2 30 12"
)
Method 1:
df %>%
transform(age1 = case_when(age1 > age2 ~ age1,
TRUE ~ age2),
          age2 = case_when(age2 < age1 ~ age2,
                           TRUE ~ age1))
Method 2:
df %>%
transform(age1 = pmax(age1, age2),
age2 = pmin(age1, age2))
ID1 ID2 sex1 sex2 age1 age2
1 8 9 1 2 44 11
2 17 7 1 1 76 56
3 1 44 NA NA 55 16
4 3 13 NA NA NA NA
5 55 6 2 NA 56 10
6 4 33 2 NA 45 9
7 2 66 1 NA 45 12
8 72 99 NA NA NA NA
9 12 11 2 2 30 12
I need to replace all values in a range of rows with NA. How can I do it?
For example:
x <- c(1:30)
y <- c("a","b","c")
z <- rep(3)
df1 <- data.frame(x,y,z)
I need to replace all values in rows 1:10 with NA.
We can use row index for assignment
df1[1:10, ] <- NA
-output
df1
x y z
1 NA <NA> NA
2 NA <NA> NA
3 NA <NA> NA
4 NA <NA> NA
5 NA <NA> NA
6 NA <NA> NA
7 NA <NA> NA
8 NA <NA> NA
9 NA <NA> NA
10 NA <NA> NA
11 11 b 3
12 12 c 3
13 13 a 3
14 14 b 3
15 15 c 3
16 16 a 3
17 17 b 3
18 18 c 3
19 19 a 3
20 20 b 3
21 21 c 3
22 22 a 3
23 23 b 3
24 24 c 3
25 25 a 3
26 26 b 3
27 27 c 3
28 28 a 3
29 29 b 3
30 30 c 3
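The same row-index assignment can be restricted to particular columns by also giving a column index. A small sketch, assuming (purely for illustration) that only x and z should be blanked while y is kept:

```r
# Same data as in the question; y and z are recycled to 30 rows
df1 <- data.frame(x = 1:30, y = c("a", "b", "c"), z = 3)

# Rows 1 to 10, columns x and z only
df1[1:10, c("x", "z")] <- NA

head(df1, 12)
```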
I am trying to apply a function to a column of a dataframe, but when I do this I get a column full of NA values. I don't understand why.
Here is my code:
courbe <- function(x) exp(coef(regression)[1]*x+coef(regression[2]))
dataT[,c(2)] <- courbe(dataT[,c(1)])
And here is my dataframe:
DateRep Cases
1 25 NA
2 24 NA
3 23 NA
4 22 NA
5 21 NA
6 20 NA
7 19 NA
8 18 NA
9 17 NA
10 16 NA
11 15 NA
12 14 NA
13 13 NA
14 12 NA
15 11 NA
16 10 NA
17 9 NA
18 8 NA
19 7 NA
20 6 NA
21 5 NA
22 4 NA
23 3 NA
24 2 NA
25 1 NA
26 0 NA
The output of print(coef(regression)) :
Coefficients:
(Intercept) dataT$DateRep
2.7095 0.2211
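As figured out in the comments, the mistake was in the placement of the indices: coef(regression[2]) subsets the model object before extracting the coefficients, whereas coef(regression)[2] extracts the second coefficient. Note also that, according to the printed coefficients, coef(regression)[1] is the intercept and coef(regression)[2] is the slope for DateRep.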
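A minimal sketch of the corrected function follows. The original data and model call are not shown in the question, so the data frame d and the log-linear fit here are assumptions made up for illustration (exact exponential growth, so the result is easy to check).

```r
# Hypothetical data, not from the question: Cases = exp(2 + 0.2 * DateRep)
d <- data.frame(DateRep = 0:25)
d$Cases <- exp(2 + 0.2 * d$DateRep)

# Assumed model: linear regression on the log scale
regression <- lm(log(Cases) ~ DateRep, data = d)

# Extract the coefficient vector once, then index it
b <- coef(regression)

# b[1] is the intercept, b[2] the slope; unname() drops the coefficient label
courbe <- function(x) unname(exp(b[1] + b[2] * x))

fitted_cases <- courbe(d$DateRep)
head(fitted_cases)
```

predict(regression, newdata = ...) followed by exp() would be an alternative that avoids indexing the coefficients by hand.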
I want to sort this dataset by the number of missing values per row, in descending order.
Can someone help me do this in R? Is there a command for it?
df=data.frame(x=c(1,4,6,NA,7,NA,9,10,4,NA),
y=c(10,12,NA,NA,14,18,20,15,12,17),
z=c(225,198,NA,NA,NA,130,NA,200,NA,99),
v=c(44,51,NA,NA,45,NA,25,36,75,NA))
df
x y z v
1 1 10 225 44
2 4 12 198 51
3 6 NA NA NA
4 NA NA NA NA
5 7 14 NA 45
6 NA 18 130 NA
7 9 20 NA 25
8 10 15 200 36
9 4 12 NA 75
10 NA 17 99 NA
I want to get this result :
x y z v
4 NA NA NA NA
3 6 NA NA NA
6 NA 18 130 NA
10 NA 17 99 NA
5 7 14 NA 45
7 9 20 NA 25
9 4 12 NA 75
1 1 10 225 44
2 4 12 198 51
8 10 15 200 36
In my comment I incorrectly remembered the name of the argument for changing the direction of an order result. The fix is simply to use the correct name:
> df[ order(rowSums(is.na(df)), decreasing=TRUE), ]
x y z v
4 NA NA NA NA
3 6 NA NA NA
6 NA 18 130 NA
10 NA 17 99 NA
5 7 14 NA 45
7 9 20 NA 25
9 4 12 NA 75
1 1 10 225 44
2 4 12 198 51
8 10 15 200 36
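To make the steps explicit, the count of missing values per row can be stored first and then used for ordering. The sketch below uses negation instead of decreasing = TRUE, which is an equivalent alternative for numeric keys; the names na_count and sorted are introduced here for illustration.

```r
df <- data.frame(x = c(1, 4, 6, NA, 7, NA, 9, 10, 4, NA),
                 y = c(10, 12, NA, NA, 14, 18, 20, 15, 12, 17),
                 z = c(225, 198, NA, NA, NA, 130, NA, 200, NA, 99),
                 v = c(44, 51, NA, NA, 45, NA, 25, 36, 75, NA))

# is.na(df) is a logical matrix; rowSums() counts the TRUEs in each row
na_count <- rowSums(is.na(df))

# Negating the counts sorts from most to fewest missing values;
# rows with equal counts keep their original relative order
sorted <- df[order(-na_count), ]

sorted
```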
I have 2 data frames with different numbers of rows (A has 55 and B has 41). I would like to take the Py values from data frame B and put them into A$Py corresponding to the "Link".
I tried
link.list <- A$Link
for(i in 1:length(link.list)){
A$Py[i] <- B[which(B$Link==link.list[i]), "Py"]
}
But get:
Error in A$Py[i] <- B[which(B$Link == link.list[i]), "Py"] :
replacement has length zero
I assume this error is triggered when there is an A$Link that is not in B. Any ideas for solving this problem?
Thanks
data frame A:
Link VU Py
1 DVH1-1 1 NA
2 DVH1-10 9 NA
3 DVH1-2 1 NA
4 DVH1-3 1 NA
5 DVH1-4 9 NA
6 DVH1-5 9 NA
7 DVH1-6 1 NA
8 DVH1-7 1 NA
9 DVH1-8 10 NA
10 DVH1-9 10 NA
11 DVH2-1 2 NA
12 DVH2-2 1 NA
13 DVH2-3 9 NA
14 DVH2-4 9 NA
15 DVH2-5 10 NA
16 DVH2-6 9 NA
17 DVH2-7 4 NA
18 DVH2-8 9 NA
19 DVH3-1 1 NA
20 DVH3-2 12 NA
21 DVH3-3 12 NA
22 DWH1-1 4 NA
23 DWH1-10 8 NA
24 DWH1-2 4 NA
25 DWH1-3 4 NA
26 DWH1-4 8 NA
27 DWH1-5 8 NA
28 DWH1-6 4 NA
29 DWH1-7 4 NA
30 DWH1-8 9 NA
31 DWH1-9 9 NA
32 DWH2-1 4 NA
33 DWH2-2 4 NA
34 DWH2-3 8 NA
35 DWH2-4 8 NA
36 DWH2-5 8 NA
37 DWH2-6 8 NA
38 DWH2-7 7 NA
39 DWH2-8 5 NA
40 DWH3-1 3 NA
41 DWH3-2 49 NA
42 DWH3-3 0 NA
43 MH1-1 0 NA
44 MH1-2 1 NA
45 MH1-3 1 NA
46 MH1-4 1 NA
47 MH1-5 1 NA
48 UH1-1 17 NA
49 UH1-2 17 NA
50 UH1-3 17 NA
51 UH1-4 19 NA
52 UH2-1 4 NA
53 UH2-2 15 NA
54 UH3-1 24 NA
55 UH3-2 25 NA
data frame B:
Link Py
1 DVH1-1 0
2 DVH1-10 4
3 DVH1-2 0
4 DVH1-3 14
5 DVH1-4 0
6 DVH1-5 2
7 DVH1-6 12
8 DVH1-7 11
9 DVH1-8 9
10 DVH1-9 9
11 DVH2-1 0
12 DVH2-2 14
13 DVH2-3 3
14 DVH2-4 0
15 DVH2-5 10
16 DVH2-6 0
17 DVH2-7 2
18 DVH2-8 4
19 DVH3-1 16
20 DVH3-3 8
21 DWH1-1 6
22 DWH1-10 2
23 DWH1-2 0
24 DWH1-3 7
25 DWH1-5 0
26 DWH1-6 12
27 DWH1-7 10
28 DWH1-8 0
29 DWH1-9 3
30 DWH2-1 0
31 DWH2-2 10
32 DWH2-7 0
33 DWH2-8 9
34 DWH3-1 0
35 DWH3-2 0
36 MH1-1 0
37 UH1-3 6
38 UH1-4 4
39 UH2-1 0
40 UH2-2 9
41 UH3-2 4
Use merge() and merge by Link; all.x = TRUE returns all rows of x (in your case x = A), with NA where B has no match.
I've only passed the first two columns of A, as A$Py in your example is all NA.
merge(A[,1:2],B,by='Link', all.x = TRUE)
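A small sketch of this on three rows taken from the question's data, where DVH3-2 appears in A but not in B. The result name res is introduced here for illustration.

```r
# Subset of the question's data: DVH3-2 has no entry in B
A <- data.frame(Link = c("DVH3-1", "DVH3-2", "DVH3-3"),
                VU = c(1, 12, 12),
                Py = NA)
B <- data.frame(Link = c("DVH3-1", "DVH3-3"),
                Py = c(16, 8))

# Drop the all-NA Py column of A, then left-join B on Link;
# all.x = TRUE keeps the unmatched row and fills its Py with NA
res <- merge(A[, 1:2], B, by = "Link", all.x = TRUE)

res
```

Because every row of A is kept, the result has as many rows as A, which avoids the "replacement has length zero" error from the loop.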
> head(a)
X Link VU Py
1 1 DVH1-1 1 NA
2 2 DVH1-10 9 NA
3 3 DVH1-2 1 NA
4 4 DVH1-3 1 NA
5 5 DVH1-4 9 NA
6 6 DVH1-5 9 NA
> head(b)
X Link Py
1 1 DVH1-1 0
2 2 DVH1-10 4
3 3 DVH1-2 0
4 4 DVH1-3 14
5 5 DVH1-4 0
6 6 DVH1-5 2
# Look up each a$Link in b$Link; links that are missing from b get NA
a$Py1 <- b$Py[match(a$Link, b$Link)]
> head(a)
X Link VU Py Py1
1 1 DVH1-1 1 NA 0
2 2 DVH1-10 9 NA 4
3 3 DVH1-2 1 NA 0
4 4 DVH1-3 1 NA 14
5 5 DVH1-4 9 NA 0
6 6 DVH1-5 9 NA 2