For the following example:
set.seed(24)
D <- data.frame(Team=sample(LETTERS[1:6],100,TRUE),stringsAsFactors=FALSE)
If I want to find the first row at which every team has had one turn, the following works:
max(match(unique(D$Team),D$Team))
# [1] 18
But what if I want to find the first row at which every team has played 2 games, or 3 games, or more? I'm stuck on how to do this. I guess what I'm looking for is the first index i at which all elements of table(D$Team[1:i]) are at least 2, 3, 4 and so on, but checking that for every i is slow and clunky.
You could add a column with the running number of matches played by each team, and then use max(which(...)) to look up a given number of matches:
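# Running count: how many times the team in row r has appeared in rows 1..r
D$Matches <- vapply(seq_len(nrow(D)), FUN = function(r) sum(D$Team[1:r] == D$Team[r]), 1)
getWhenAllTeamsHavePlayedNMatches <- function(nMatches) {
  # Every team has reached nMatches only if that count occurs once per team
  if (sum(D$Matches == nMatches) == length(unique(D$Team))) {
    return(max(which(D$Matches == nMatches)))
  }
  return(NA)
}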
getWhenAllTeamsHavePlayedNMatches(4)
# e.g. returns 42
If you want to precalculate all values and add a column to D:
D$Matches <- vapply(seq_len(nrow(D)), FUN = function(r) sum(D$Team[1:r] == D$Team[r]), 1)
nTeams <- length(unique(D$Team))
D$NumMatchesWithAllTeam <- vapply(seq_len(nrow(D)),
  FUN = function(r) {
    # Row r completes a round when it is the nTeams-th row with this match count
    if (sum(D$Matches[1:r] == D$Matches[r]) == nTeams)
      return(D$Matches[r])
    return(NA)
  },
  1)
Resulting data.frame:
> D
Team Matches NumMatchesWithAllTeam
1 B 1 NA
2 B 2 NA
3 E 1 NA
4 D 1 NA
5 D 2 NA
6 F 1 NA
7 B 3 NA
8 E 2 NA
9 E 3 NA
10 B 4 NA
11 D 3 NA
12 C 1 NA
13 E 4 NA
14 E 5 NA
15 B 5 NA
16 F 2 NA
17 B 6 NA
18 A 1 1
19 D 4 NA
20 A 2 NA
21 A 3 NA
22 D 5 NA
23 E 6 NA
24 A 4 NA
25 B 7 NA
26 E 7 NA
27 A 5 NA
28 D 6 NA
29 D 7 NA
30 A 6 NA
31 B 8 NA
32 B 9 NA
33 C 2 2
34 A 7 NA
35 F 3 NA
36 B 10 NA
37 E 8 NA
38 D 8 NA
39 E 9 NA
40 F 4 NA
41 C 3 3
42 C 4 4
43 B 11 NA
44 B 12 NA
45 A 8 NA
46 A 9 NA
47 C 5 NA
48 C 6 NA
49 B 13 NA
50 C 7 NA
51 C 8 NA
52 F 5 5
53 C 9 NA
54 E 10 NA
55 D 9 NA
56 F 6 6
57 C 10 NA
58 B 14 NA
59 B 15 NA
60 A 10 NA
61 C 11 NA
62 B 16 NA
63 B 17 NA
64 A 11 NA
65 E 11 NA
66 B 18 NA
67 F 7 7
68 F 8 8
69 E 12 NA
70 C 12 NA
71 A 12 NA
72 B 19 NA
73 A 13 NA
74 F 9 9
75 D 10 NA
76 C 13 NA
77 D 11 NA
78 E 13 NA
79 A 14 NA
80 E 14 NA
81 D 12 NA
82 A 15 NA
83 D 13 NA
84 B 20 NA
85 C 14 NA
86 C 15 NA
87 B 21 NA
88 F 10 10
89 C 16 NA
90 F 11 11
91 B 22 NA
92 E 15 NA
93 F 12 12
94 A 16 NA
95 C 17 NA
96 D 14 NA
97 D 15 NA
98 A 17 NA
99 C 18 NA
100 C 19 NA
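The vapply() calls above rescan the Team column once per row. As a rough sketch of the same idea in a single pass, the running count can be built with ave(). The helper name first_row_all_played and the small teams vector are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper (name and signature are assumptions for this sketch).
# Returns the first row at which every team has appeared n times, or NA.
first_row_all_played <- function(team, n) {
  # Running count of appearances per team, computed group-wise
  running <- ave(seq_along(team), team, FUN = seq_along)
  hit <- which(running == n)
  # Some team never reaches n appearances
  if (length(hit) < length(unique(team))) return(NA_integer_)
  max(hit)
}

teams <- c("A", "B", "A", "B", "C", "C", "A")
first_row_all_played(teams, 1)  # 5: C first appears in row 5
first_row_all_played(teams, 2)  # 6: C plays its second game in row 6
first_row_all_played(teams, 3)  # NA: only A plays three times
```

With the question's data this would be called as first_row_all_played(D$Team, 4).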
My dataset has two IDs per row, one for a parent and one for a child, but I don't know which is which. I do, however, have their ages.
This is the table I am working with:
ID1 ID2 sex1 sex2 age1 age2
1 8 9 1 2 44 11
2 17 7 1 1 56 76
3 1 44 NA NA 16 55
4 3 13 NA NA NA NA
5 55 6 2 NA 56 10
6 4 33 2 NA 45 9
7 2 66 1 NA 12 45
8 72 99 NA NA NA NA
9 12 11 2 2 30 12
By using an if statement, I want to identify who's who according to their age.
Here is the code I made but it is not working:
install.packages('seqinr')
library(seqinr)
for (i in 1:nrow(data)){
if (data$age2[i]> data$age1[i]){
swap(data$age1[i], data$age2[i])
}
}
The error message:
Error in if (data$age2[i] > data$age1[i]) { :
missing value where TRUE/FALSE needed
I want to put the parent's age in age1 and the child's age in age2.
Does anyone have a better idea of how to do it?
Welcome to SO!
You can do this without a for loop. If you only need the higher value in age1 and the lower value in age2, compare the two columns row by row:
# Results go into new columns age_1/age_2 so they can be compared with the originals.
# To overwrite age1/age2 instead, compute both values before assigning, because pmin() needs the original age1.
df$age_1 <- pmax(df$age1, df$age2)
df$age_2 <- pmin(df$age1, df$age2)
With result:
ID1 ID2 sex1 sex2 age1 age2 age_1 age_2
1 8 9 1 2 44 11 44 11
2 17 7 1 1 56 76 76 56
3 1 44 NA NA 16 55 55 16
4 3 13 NA NA NA NA NA NA
5 55 6 2 NA 56 10 56 10
6 4 33 2 NA 45 9 45 9
7 2 66 1 NA 12 45 45 12
8 72 99 NA NA NA NA NA NA
9 12 11 2 2 30 12 30 12
With data:
df <- read.table(text = 'ID1 ID2 sex1 sex2 age1 age2
1 8 9 1 2 44 11
2 17 7 1 1 56 76
3 1 44 NA NA 16 55
4 3 13 NA NA NA NA
5 55 6 2 NA 56 10
6 4 33 2 NA 45 9
7 2 66 1 NA 12 45
8 72 99 NA NA NA NA
9 12 11 2 2 30 12', header = T)
library(tidyverse)
df <- read_table(
"ID1 ID2 sex1 sex2 age1 age2
8 9 1 2 44 11
17 7 1 1 56 76
1 44 NA NA 16 55
3 13 NA NA NA NA
55 6 2 NA 56 10
4 33 2 NA 45 9
2 66 1 NA 12 45
72 99 NA NA NA NA
12 11 2 2 30 12"
)
Method 1:
df %>%
  transform(age1 = case_when(age1 > age2 ~ age1,
                             TRUE ~ age2),
            # transform() evaluates both expressions on the original columns
            age2 = case_when(age2 < age1 ~ age2,
                             TRUE ~ age1))
Method 2:
df %>%
transform(age1 = pmax(age1, age2),
age2 = pmin(age1, age2))
ID1 ID2 sex1 sex2 age1 age2
1 8 9 1 2 44 11
2 17 7 1 1 76 56
3 1 44 NA NA 55 16
4 3 13 NA NA NA NA
5 55 6 2 NA 56 10
6 4 33 2 NA 45 9
7 2 66 1 NA 45 12
8 72 99 NA NA NA NA
9 12 11 2 2 30 12
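The answers above reorder only the ages, so ID1/ID2 and sex1/sex2 no longer line up with them. The original error also came from if() receiving NA for rows with missing ages. A minimal sketch that handles both, assuming the older person is always the parent (an assumption taken from the question, not something the data confirms):

```r
df <- data.frame(
  ID1  = c(8, 17, 1, 3, 55, 4, 2, 72, 12),
  ID2  = c(9, 7, 44, 13, 6, 33, 66, 99, 11),
  sex1 = c(1, 1, NA, NA, 2, 2, 1, NA, 2),
  sex2 = c(2, 1, NA, NA, NA, NA, NA, NA, 2),
  age1 = c(44, 56, 16, NA, 56, 45, 12, NA, 30),
  age2 = c(11, 76, 55, NA, 10, 9, 45, NA, 12)
)

# TRUE only where both ages are known and the person in slot 2 is older;
# the is.na() guards keep NA out of the mask
swap <- !is.na(df$age1) & !is.na(df$age2) & df$age2 > df$age1

cols    <- c("ID1", "ID2", "sex1", "sex2", "age1", "age2")
swapped <- c("ID2", "ID1", "sex2", "sex1", "age2", "age1")

# Columns are assigned by position, so each pair is exchanged in the masked rows
df[swap, cols] <- df[swap, swapped]
df
```

Rows with missing ages are left untouched, since there is nothing to decide them by.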
I need to replace all values in a range of rows with NA. How can I do it?
For example:
x <- c(1:30)
y <- c("a","b","c")
z <- rep(3)
df1 <- data.frame(x,y,z)
I need to replace all values in rows 1:10 with NA.
We can assign by row index:
df1[1:10, ] <- NA
-output
df1
x y z
1 NA <NA> NA
2 NA <NA> NA
3 NA <NA> NA
4 NA <NA> NA
5 NA <NA> NA
6 NA <NA> NA
7 NA <NA> NA
8 NA <NA> NA
9 NA <NA> NA
10 NA <NA> NA
11 11 b 3
12 12 c 3
13 13 a 3
14 14 b 3
15 15 c 3
16 16 a 3
17 17 b 3
18 18 c 3
19 19 a 3
20 20 b 3
21 21 c 3
22 22 a 3
23 23 b 3
24 24 c 3
25 25 a 3
26 26 b 3
27 27 c 3
28 28 a 3
29 29 b 3
30 30 c 3
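If only some columns should be blanked, the same indexing accepts a column selector as well. A small sketch, assuming the df1 from the question; the choice of columns x and z is arbitrary:

```r
df1 <- data.frame(x = 1:30, y = c("a", "b", "c"), z = 3)

rows <- 1:10
# Blank x and z in the chosen rows, leaving y as it is
df1[rows, c("x", "z")] <- NA

head(df1, 12)
```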
I have the following dataframe in R
df<-data.frame(
"Val1"=seq(from=1, to=40, by=5), 'Val2'=c(2,4,2,5,11,3,5,3),
"Val3"=seq(from=5, to=40, by=5), "Val4"=c(3,5,7,3,7,5,7,8))
The resulting dataframe looks as follows. Val1 and Val3 are the causal variables; Val2 and Val4 are the dependent variables.
Val1 Val2 Val3 Val4
1 1 2 5 3
2 6 4 10 5
3 11 2 15 7
4 16 5 20 3
5 21 11 25 7
6 26 3 30 5
7 31 5 35 7
8 36 3 40 8
I wish to obtain the following dataframe as an output
Val1 Val2 Val3 Val4
1 1 2 1 NA
2 2 NA 2 NA
3 3 NA 3 3
4 4 NA 4 NA
5 5 NA 5 NA
6 6 4 6 NA
7 7 NA 7 NA
8 8 NA 8 NA
9 9 NA 9 NA
10 10 NA 10 5
11 11 2 11 NA
12 12 NA 12 NA
13 13 NA 13 NA
14 14 NA 14 NA
15 15 NA 15 7
16 16 5 16 NA
17 17 NA 17 NA
18 18 NA 18 NA
19 19 NA 19 NA
20 20 NA 20 3
21 21 11 21 NA
22 22 NA 22 NA
23 23 NA 23 NA
24 24 NA 24 NA
25 25 NA 25 7
26 26 3 26 NA
27 27 NA 27 NA
28 28 NA 28 NA
29 29 NA 29 NA
30 30 NA 30 5
31 31 5 31 NA
32 32 NA 32 NA
33 33 NA 33 NA
34 34 NA 34 NA
35 35 NA 35 7
36 36 3 36 NA
37 37 NA 37 NA
38 38 NA 38 NA
39 39 NA 39 NA
40 40 NA 40 8
How do I accomplish this? I have written the following code, but it involves creating a second dataframe and then copying data from the first to the second. Is there a way to overwrite the existing dataframe? I would like to avoid loops.
df2<-data.frame('Val1'=
seq(from=min(na.omit(c(df$Val1, df$Val3))), to= max(na.omit(c(df$Val1,
df$Val3))), by=1), "Val3"=seq(from=min(na.omit(c(df$Val1, df$Val3))), to=
max(na.omit(c(df$Val1, df$Val3))), by=1))
###### Create two loops
for(i in df$Val1){
for(j in df2$Val1){
if(i==j){
df2$Val2[df2$Val1==j]=df$Val2[df$Val1==i]
} else{df2$Val2[df2$Val1==j]=NA}}}
for(i in df$Val3){ for(j in df2$Val3){
if(i==j){df2$Val4[df2$Val3==j]=df$Val4[df$Val3==i]
} else{df2$Val4[df2$Val3==j]=NA}}}
Is there a faster, vectorised way to accomplish the same thing? Any help would be appreciated.
Assuming there's a slight error in your output example (row 3 should show NA for Val4 and the 3 in row 3 should be in row 5), this works:
library(tidyverse)
df_new <- bind_cols(
df %>%
select(Val1, Val2) %>%
complete(., expand(., Val1 = 1:40)),
df %>%
select(Val3, Val4) %>%
complete(., expand(., Val3 = 1:40))
)
> df_new
# A tibble: 40 x 4
Val1 Val2 Val3 Val4
<dbl> <dbl> <dbl> <dbl>
1 1 2 1 NA
2 2 NA 2 NA
3 3 NA 3 NA
4 4 NA 4 NA
5 5 NA 5 3
6 6 4 6 NA
7 7 NA 7 NA
8 8 NA 8 NA
9 9 NA 9 NA
10 10 NA 10 5
# ... with 30 more rows
We use bind_cols() to put together the two halves of the dataframe: first we select the first two columns, expand() the causal variable over 1:40 and complete() the data, then we do the same for the third and fourth columns.
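For comparison, a base R sketch of the same expansion. This is not the answer's method: it swaps in match(), which returns NA for positions that have no counterpart, so indexing with its result fills the gaps automatically. The name full is introduced here for illustration.

```r
df <- data.frame(
  Val1 = seq(from = 1, to = 40, by = 5), Val2 = c(2, 4, 2, 5, 11, 3, 5, 3),
  Val3 = seq(from = 5, to = 40, by = 5), Val4 = c(3, 5, 7, 3, 7, 5, 7, 8)
)

full <- 1:40  # the complete range of the causal variables

df_new <- data.frame(
  Val1 = full,
  Val2 = df$Val2[match(full, df$Val1)],  # NA where full has no match in Val1
  Val3 = full,
  Val4 = df$Val4[match(full, df$Val3)]   # NA where full has no match in Val3
)

head(df_new, 10)
```

Assigning the result back to df overwrites the original dataframe without any loop.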
I want to sort this dataset so that rows are ranked by their number of missing values, in descending order.
Can someone help me do this in R? Is there a command for it?
df=data.frame(x=c(1,4,6,NA,7,NA,9,10,4,NA),
y=c(10,12,NA,NA,14,18,20,15,12,17),
z=c(225,198,NA,NA,NA,130,NA,200,NA,99),
v=c(44,51,NA,NA,45,NA,25,36,75,NA))
df
x y z v
1 1 10 225 44
2 4 12 198 51
3 6 NA NA NA
4 NA NA NA NA
5 7 14 NA 45
6 NA 18 130 NA
7 9 20 NA 25
8 10 15 200 36
9 4 12 NA 75
10 NA 17 99 NA
I want to get this result :
x y z v
4 NA NA NA NA
3 6 NA NA NA
6 NA 18 130 NA
10 NA 17 99 NA
5 7 14 NA 45
7 9 20 NA 25
9 4 12 NA 75
1 1 10 225 44
2 4 12 198 51
8 10 15 200 36
In my comment I incorrectly remembered the name of the argument for changing the direction of an order result. The fix is simply to use the correct name:
> df[ order(rowSums(is.na(df)), decreasing=TRUE), ]
x y z v
4 NA NA NA NA
3 6 NA NA NA
6 NA 18 130 NA
10 NA 17 99 NA
5 7 14 NA 45
7 9 20 NA 25
9 4 12 NA 75
1 1 10 225 44
2 4 12 198 51
8 10 15 200 36
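To see why this works, it can help to look at the intermediate count. A short sketch that splits the one-liner into steps; n_missing and sorted are names introduced here for illustration:

```r
df <- data.frame(x = c(1, 4, 6, NA, 7, NA, 9, 10, 4, NA),
                 y = c(10, 12, NA, NA, 14, 18, 20, 15, 12, 17),
                 z = c(225, 198, NA, NA, NA, 130, NA, 200, NA, 99),
                 v = c(44, 51, NA, NA, 45, NA, 25, 36, 75, NA))

# is.na(df) is a logical matrix; rowSums() counts the TRUE cells in each row
n_missing <- rowSums(is.na(df))
n_missing

# order() returns row positions, here from most to fewest missing values
sorted <- df[order(n_missing, decreasing = TRUE), ]
sorted
```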
I have 2 data frames with different numbers of rows (A has 55 and B has 41). I would like to take the Py values from data frame B and put them into A$Py corresponding to the "Link".
I tried
link.list <- A$Link
for(i in 1:length(link.list)){
A$Py[i] <- B[which(B$Link==link.list[i]), "Py"]
}
But get:
Error in A$Py[i] <- B[which(B$Link == link.list[i]), "Py"] :
replacement has length zero
I assume this error is triggered when there is an A$Link that is not in B. Any ideas for solving this problem?
Thanks
data frame A:
Link VU Py
1 DVH1-1 1 NA
2 DVH1-10 9 NA
3 DVH1-2 1 NA
4 DVH1-3 1 NA
5 DVH1-4 9 NA
6 DVH1-5 9 NA
7 DVH1-6 1 NA
8 DVH1-7 1 NA
9 DVH1-8 10 NA
10 DVH1-9 10 NA
11 DVH2-1 2 NA
12 DVH2-2 1 NA
13 DVH2-3 9 NA
14 DVH2-4 9 NA
15 DVH2-5 10 NA
16 DVH2-6 9 NA
17 DVH2-7 4 NA
18 DVH2-8 9 NA
19 DVH3-1 1 NA
20 DVH3-2 12 NA
21 DVH3-3 12 NA
22 DWH1-1 4 NA
23 DWH1-10 8 NA
24 DWH1-2 4 NA
25 DWH1-3 4 NA
26 DWH1-4 8 NA
27 DWH1-5 8 NA
28 DWH1-6 4 NA
29 DWH1-7 4 NA
30 DWH1-8 9 NA
31 DWH1-9 9 NA
32 DWH2-1 4 NA
33 DWH2-2 4 NA
34 DWH2-3 8 NA
35 DWH2-4 8 NA
36 DWH2-5 8 NA
37 DWH2-6 8 NA
38 DWH2-7 7 NA
39 DWH2-8 5 NA
40 DWH3-1 3 NA
41 DWH3-2 49 NA
42 DWH3-3 0 NA
43 MH1-1 0 NA
44 MH1-2 1 NA
45 MH1-3 1 NA
46 MH1-4 1 NA
47 MH1-5 1 NA
48 UH1-1 17 NA
49 UH1-2 17 NA
50 UH1-3 17 NA
51 UH1-4 19 NA
52 UH2-1 4 NA
53 UH2-2 15 NA
54 UH3-1 24 NA
55 UH3-2 25 NA
data frame B:
Link Py
1 DVH1-1 0
2 DVH1-10 4
3 DVH1-2 0
4 DVH1-3 14
5 DVH1-4 0
6 DVH1-5 2
7 DVH1-6 12
8 DVH1-7 11
9 DVH1-8 9
10 DVH1-9 9
11 DVH2-1 0
12 DVH2-2 14
13 DVH2-3 3
14 DVH2-4 0
15 DVH2-5 10
16 DVH2-6 0
17 DVH2-7 2
18 DVH2-8 4
19 DVH3-1 16
20 DVH3-3 8
21 DWH1-1 6
22 DWH1-10 2
23 DWH1-2 0
24 DWH1-3 7
25 DWH1-5 0
26 DWH1-6 12
27 DWH1-7 10
28 DWH1-8 0
29 DWH1-9 3
30 DWH2-1 0
31 DWH2-2 10
32 DWH2-7 0
33 DWH2-8 9
34 DWH3-1 0
35 DWH3-2 0
36 MH1-1 0
37 UH1-3 6
38 UH1-4 4
39 UH2-1 0
40 UH2-2 9
41 UH3-2 4
Use merge() and merge by Link; all.x = TRUE returns all rows of x (in your case x = A), with NA where B has no matching Link.
I've only passed the first two columns of A, since A$Py in your example is all NA:
merge(A[,1:2],B,by='Link', all.x = TRUE)
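A minimal sketch of what all.x does, using three rows taken from the question's data (DVH3-2 appears in A but not in B):

```r
A <- data.frame(Link = c("DVH1-1", "DVH3-2", "UH3-2"), VU = c(1, 12, 25))
B <- data.frame(Link = c("DVH1-1", "UH3-2"), Py = c(0, 4))

# all.x = TRUE keeps every row of A; links missing from B get Py = NA
res <- merge(A, B, by = "Link", all.x = TRUE)
res
```

Note that merge() sorts the result by the key column by default, so the row order can differ from that of A.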
> head(a)
X Link VU Py
1 1 DVH1-1 1 NA
2 2 DVH1-10 9 NA
3 3 DVH1-2 1 NA
4 4 DVH1-3 1 NA
5 5 DVH1-4 9 NA
6 6 DVH1-5 9 NA
> head(b)
X Link Py
1 1 DVH1-1 0
2 2 DVH1-10 4
3 3 DVH1-2 0
4 4 DVH1-3 14
5 5 DVH1-4 0
6 6 DVH1-5 2
# Look up each a$Link in b$Link; links that are missing from b get NA
a$Py1 <- b$Py[match(a$Link, b$Link)]
> head(a)
X Link VU Py Py1
1 1 DVH1-1 1 NA 0
2 2 DVH1-10 9 NA 4
3 3 DVH1-2 1 NA 0
4 4 DVH1-3 1 NA 14
5 5 DVH1-4 9 NA 0
6 6 DVH1-5 9 NA 2