I need to replace all values in a range of rows with NA. How can I do it?
For example:
x <- c(1:30)
y <- c("a","b","c")
z <- rep(3)
df1 <- data.frame(x,y,z)
I need to replace all values in rows 1:10 with NA.
We can use the row index for assignment:
df1[1:10, ] <- NA
-output
df1
x y z
1 NA <NA> NA
2 NA <NA> NA
3 NA <NA> NA
4 NA <NA> NA
5 NA <NA> NA
6 NA <NA> NA
7 NA <NA> NA
8 NA <NA> NA
9 NA <NA> NA
10 NA <NA> NA
11 11 b 3
12 12 c 3
13 13 a 3
14 14 b 3
15 15 c 3
16 16 a 3
17 17 b 3
18 18 c 3
19 19 a 3
20 20 b 3
21 21 c 3
22 22 a 3
23 23 b 3
24 24 c 3
25 25 a 3
26 26 b 3
27 27 c 3
28 28 a 3
29 29 b 3
30 30 c 3
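If only some of the columns should be blanked, the same row-index assignment can be combined with a column selection. This is a minimal sketch that rebuilds the example data; the names `rows` and `cols` are introduced here for illustration and are not part of the question.

```r
# Rebuild the example data (y and z are recycled to length 30)
x <- 1:30
y <- c("a", "b", "c")
z <- 3
df1 <- data.frame(x, y, z)

rows <- 1:10          # hypothetical: the row range to blank out
cols <- c("x", "z")   # hypothetical: only these columns are set to NA

# Row and column indices together limit the assignment to that block
df1[rows, cols] <- NA
```

Column `y` is left untouched here; dropping `cols` (that is, `df1[rows, ] <- NA`) gives the whole-row behaviour shown above.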
I am trying to apply a function to a column of a data frame, but the result is a column full of NA values, and I don't understand why.
Here is my code:
courbe <- function(x) exp(coef(regression)[1]*x+coef(regression[2]))
dataT[,c(2)] <- courbe(dataT[,c(1)])
And here is my data frame:
DateRep Cases
1 25 NA
2 24 NA
3 23 NA
4 22 NA
5 21 NA
6 20 NA
7 19 NA
8 18 NA
9 17 NA
10 16 NA
11 15 NA
12 14 NA
13 13 NA
14 12 NA
15 11 NA
16 10 NA
17 9 NA
18 8 NA
19 7 NA
20 6 NA
21 5 NA
22 4 NA
23 3 NA
24 2 NA
25 1 NA
26 0 NA
The output of print(coef(regression)) :
Coefficients:
(Intercept) dataT$DateRep
2.7095 0.2211
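As figured out in the comments, the mistake was in the placement of the indices: the first term uses coef(regression)[1], but the second is written coef(regression[2]), with the bracket inside the call. That subsets the model object instead of the coefficient vector, so it should read coef(regression)[2].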
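A minimal sketch of the corrected function follows. The model fit is not shown in the question, so the data and the `lm()` call below are assumptions made up for illustration. The sketch also assumes the intended curve is exp(intercept + slope * x); since the printed coefficients list (Intercept) first, that means coef(regression)[1] is the intercept and coef(regression)[2] is the slope.

```r
# Hypothetical data: DateRep as in the question, with made-up observed counts
dataT <- data.frame(DateRep = 25:0)
observed <- exp(2.7 + 0.22 * dataT$DateRep) * (1 + 0.01 * sin(dataT$DateRep))

# Assumed model: log-linear regression of the observed counts on DateRep
regression <- lm(log(observed) ~ DateRep, data = dataT)

# Index the coefficient vector, not the model object:
# [1] is the intercept, [2] is the slope
courbe <- function(x) exp(coef(regression)[2] * x + coef(regression)[1])

dataT$Cases <- courbe(dataT$DateRep)
```

Passing `data = dataT` to `lm()` also gives the slope the plain name `DateRep` rather than `dataT$DateRep`, which makes the coefficients easier to look up by name.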
I have the following panel data frame:
X1 X2 X3 X4 X5 Y1 Y2 Y3 Y4 Y5
Ind 1 7 NA NA NA NA 1 4 6 8 6
Ind 2 2 NA 16 NA NA 5 16 12 3 4
Ind 3 NA NA NA 19 92 13 NA 12 NA NA
Ind 4 32 5 12 3 5 NA NA NA NA 4
Ind 5 44 3 46 3 47 3 2 NA 3 4
Ind 6 NA 34 NA 8 NA 14 15 12 3 4
Ind 7 49 55 67 49 89 6 17 2 3 4
Ind 8 NA NA 49 NA NA 11 20 6 NA 4
Ind 9 1 1 5 NA 9 NA NA NA NA NA
In pastable format:
df <- read.table(text="Index_name X1 X2 X3 X4 X5 Y1 Y2 Y3 Y4 Y5
Ind_1 7 NA NA NA NA 1 4 6 8 6
Ind_2 2 NA 16 NA NA 5 16 12 3 4
Ind_3 NA NA NA 19 92 13 NA 12 NA NA
Ind_4 32 5 12 3 5 NA NA NA NA 4
Ind_5 44 3 46 3 47 3 2 NA 3 4
Ind_6 NA 34 NA 8 NA 14 15 12 3 4
Ind_7 49 55 67 49 89 6 17 2 3 4
Ind_8 NA NA 49 NA NA 11 20 6 NA 4
Ind_9 1 1 5 NA 9 NA NA NA NA NA",row.names=1,
header=TRUE, stringsAsFactors=FALSE)
I want to filter out all rows that don't have at least 2 non-NA values in both the columns that start with X and the columns that start with Y.
For example:
Ind1: Drop (only 1 value in X1-X5)
Ind2: Keep (at least 2 values in X, and Y is complete)
Ind3: Keep (both X and Y have 2 or more observations)
Ind4: Drop (only 1 value in Y1-Y5)
Ind5: Keep
Ind6: Keep
Ind7: Keep
Ind8: Drop (only 1 value in X1-X5)
Ind9: Drop (X is fine, but Y has no values)
You could do this. Basically, you count (with rowSums) the number of non-NA data points, first in X1-X5 and then in Y1-Y5. To identify non-NAs, I use !is.na(). The ! is a negation, so the expression means "not an NA". Finally, you keep only the rows where the row sum of non-NAs is >= 2 for X1-X5 AND (&) for Y1-Y5. To be clear about the indexing, there are 10 columns in your data.frame: df[,1:5] is the first 5 columns (X1-X5) and df[,6:10] is the last 5 (Y1-Y5).
df[rowSums(!is.na(df[,1:5]))>=2 & rowSums(!is.na(df[,6:10]))>=2,]
X1 X2 X3 X4 X5 Y1 Y2 Y3 Y4 Y5
Ind_2 2 NA 16 NA NA 5 16 12 3 4
Ind_3 NA NA NA 19 92 13 NA 12 NA NA
Ind_5 44 3 46 3 47 3 2 NA 3 4
Ind_6 NA 34 NA 8 NA 14 15 12 3 4
Ind_7 49 55 67 49 89 6 17 2 3 4
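If the column positions might change, the two groups can be selected by name prefix instead of by position. This is a sketch of that variation, not part of the original answer; `x_cols`, `y_cols` and `keep` are names introduced here, and the header line assumes a space after Index_name.

```r
df <- read.table(text = "Index_name X1 X2 X3 X4 X5 Y1 Y2 Y3 Y4 Y5
Ind_1 7 NA NA NA NA 1 4 6 8 6
Ind_2 2 NA 16 NA NA 5 16 12 3 4
Ind_3 NA NA NA 19 92 13 NA 12 NA NA
Ind_4 32 5 12 3 5 NA NA NA NA 4
Ind_5 44 3 46 3 47 3 2 NA 3 4
Ind_6 NA 34 NA 8 NA 14 15 12 3 4
Ind_7 49 55 67 49 89 6 17 2 3 4
Ind_8 NA NA 49 NA NA 11 20 6 NA 4
Ind_9 1 1 5 NA 9 NA NA NA NA NA", row.names = 1,
header = TRUE, stringsAsFactors = FALSE)

# Logical masks over the column names, one per prefix
x_cols <- startsWith(names(df), "X")
y_cols <- startsWith(names(df), "Y")

# A row is kept only if both groups have at least 2 non-NA values
keep <- rowSums(!is.na(df[, x_cols])) >= 2 &
  rowSums(!is.na(df[, y_cols])) >= 2

result <- df[keep, ]
```

The filtering logic is the same as in the positional version; only the way the columns are picked differs.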
DATA
df <- read.table(text="Index_name X1 X2 X3 X4 X5 Y1 Y2 Y3 Y4 Y5
Ind_1 7 NA NA NA NA 1 4 6 8 6
Ind_2 2 NA 16 NA NA 5 16 12 3 4
Ind_3 NA NA NA 19 92 13 NA 12 NA NA
Ind_4 32 5 12 3 5 NA NA NA NA 4
Ind_5 44 3 46 3 47 3 2 NA 3 4
Ind_6 NA 34 NA 8 NA 14 15 12 3 4
Ind_7 49 55 67 49 89 6 17 2 3 4
Ind_8 NA NA 49 NA NA 11 20 6 NA 4
Ind_9 1 1 5 NA 9 NA NA NA NA NA",row.names=1,
header=TRUE, stringsAsFactors=FALSE)
I have the following dataframe in R
df<-data.frame(
"Val1"=seq(from=1, to=40, by=5), 'Val2'=c(2,4,2,5,11,3,5,3),
"Val3"=seq(from=5, to=40, by=5), "Val4"=c(3,5,7,3,7,5,7,8))
The resulting dataframe looks as follows. Val1 and Val3 are the causal variables, and Val2 and Val4 are the dependent variables.
Val1 Val2 Val3 Val4
1 1 2 5 3
2 6 4 10 5
3 11 2 15 7
4 16 5 20 3
5 21 11 25 7
6 26 3 30 5
7 31 5 35 7
8 36 3 40 8
I wish to obtain the following dataframe as an output
Val1 Val2 Val3 Val4
1 1 2 1 NA
2 2 NA 2 NA
3 3 NA 3 3
4 4 NA 4 NA
5 5 NA 5 NA
6 6 4 6 NA
7 7 NA 7 NA
8 8 NA 8 NA
9 9 NA 9 NA
10 10 NA 10 5
11 11 2 11 NA
12 12 NA 12 NA
13 13 NA 13 NA
14 14 NA 14 NA
15 15 NA 15 7
16 16 5 16 NA
17 17 NA 17 NA
18 18 NA 18 NA
19 19 NA 19 NA
20 20 NA 20 3
21 21 11 21 NA
22 22 NA 22 NA
23 23 NA 23 NA
24 24 NA 24 NA
25 25 NA 25 7
26 26 3 26 NA
27 27 NA 27 NA
28 28 NA 28 NA
29 29 NA 29 NA
30 30 NA 30 5
31 31 5 31 NA
32 32 NA 32 NA
33 33 NA 33 NA
34 34 NA 34 NA
35 35 NA 35 7
36 36 3 36 NA
37 37 NA 37 NA
38 38 NA 38 NA
39 39 NA 39 NA
40 40 NA 40 8
How do I accomplish this? I have written the following code, but it involves creating a second dataframe and then copying data from the first to the second. Is there a way to overwrite the existing dataframe? I would like to avoid loops.
df2<-data.frame(
'Val1'=seq(from=min(na.omit(c(df$Val1, df$Val3))),
to=max(na.omit(c(df$Val1, df$Val3))), by=1),
"Val3"=seq(from=min(na.omit(c(df$Val1, df$Val3))),
to=max(na.omit(c(df$Val1, df$Val3))), by=1))
###### Create two loops
for(i in df$Val1){
for(j in df2$Val1){
if(i==j){
df2$Val2[df2$Val1==j]=df$Val2[df$Val1==i]
} else{df2$Val2[df2$Val1==j]=NA}
}
}
for(i in df$Val3){
for(j in df2$Val3){
if(i==j){
df2$Val4[df2$Val3==j]=df$Val4[df$Val3==i]
} else{df2$Val4[df2$Val3==j]=NA}
}
}
Is there a faster, vectorised way to accomplish the same thing? Any help would be appreciated.
Assuming there's a slight error in your output example (row 3 should show NA for Val4 and the 3 in row 3 should be in row 5), this works:
library(tidyverse)
df_new <- bind_cols(
df %>%
select(Val1, Val2) %>%
complete(., expand(., Val1 = 1:40)),
df %>%
select(Val3, Val4) %>%
complete(., expand(., Val3 = 1:40))
)
> df_new
# A tibble: 40 x 4
Val1 Val2 Val3 Val4
<dbl> <dbl> <dbl> <dbl>
1 1 2 1 NA
2 2 NA 2 NA
3 3 NA 3 NA
4 4 NA 4 NA
5 5 NA 5 3
6 6 4 6 NA
7 7 NA 7 NA
8 8 NA 8 NA
9 9 NA 9 NA
10 10 NA 10 5
# ... with 30 more rows
We use bind_cols() to put together two parts of the dataframe:
First we select the first two columns, expand() the causal variable to the full range 1:40 and complete() the data so that the missing rows are filled with NA; then we do the same for the third and fourth columns.
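For comparison, the same result can be sketched in base R with match(), which avoids both the loops and the extra package. This is an alternative to the tidyverse answer, not a restatement of it; `idx` is a name introduced here, and the output is a plain data.frame rather than a tibble.

```r
df <- data.frame(
  Val1 = seq(from = 1, to = 40, by = 5), Val2 = c(2, 4, 2, 5, 11, 3, 5, 3),
  Val3 = seq(from = 5, to = 40, by = 5), Val4 = c(3, 5, 7, 3, 7, 5, 7, 8))

# Full range covered by the two causal variables (1 to 40 here)
idx <- seq(min(df$Val1, df$Val3), max(df$Val1, df$Val3))

# match() gives the position of each idx value in the original column,
# or NA where it is absent; indexing with NA yields NA in the result
df_new <- data.frame(
  Val1 = idx,
  Val2 = df$Val2[match(idx, df$Val1)],
  Val3 = idx,
  Val4 = df$Val4[match(idx, df$Val3)])
```

Assigning the result back to `df` instead of `df_new` overwrites the original data frame, which is what the question asks for.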
For the following example:
set.seed(24)
D <- data.frame(Team=sample(LETTERS[1:6],100,TRUE),stringsAsFactors=FALSE)
If I want to find the first row at which all teams have had 1 turn, then the following works:
max(match(unique(D$Team),D$Team))
# [1] 18
But what if I want to find the first row at which all teams have played 2 games, or 3 games, or more? I'm stuck on how to do this. I guess what I am looking for is the first index i at which all elements of table(D$Team[1:i]) are at least 2, 3, 4 and so on, but computing that for every i is quite slow and clunky.
You could add a column with the running number of matches played by each team and then use max(which(...)) to look up a given number of matches:
D$Matches <- vapply(1:nrow(D),FUN = function(r)sum(D$Team[1:r] == D$Team[r]),1)
getWhenAllTeamsHavePlayedNMatches <- function(nMatches){
if(sum(D$Matches == nMatches) == length(unique(D$Team))){
return(max(which(D$Matches == nMatches)))
}
return(NA)
}
getWhenAllTeamsHavePlayedNMatches(4)
# e.g. returns 42
If you want to precalculate all values and add a column to D :
D$Matches <- vapply(1:nrow(D),FUN = function(r)sum(D$Team[1:r] == D$Team[r]),1)
nTeams <- length(unique(D$Team))
D$NumMatchesWithAllTeam <- vapply(1:nrow(D),
FUN = function(r) {
if(sum(D$Matches[1:r] == D$Matches[r]) == nTeams)
return(D$Matches[r])
return(NA)
}
,1)
Resulting data.frame :
> D
Team Matches NumMatchesWithAllTeam
1 B 1 NA
2 B 2 NA
3 E 1 NA
4 D 1 NA
5 D 2 NA
6 F 1 NA
7 B 3 NA
8 E 2 NA
9 E 3 NA
10 B 4 NA
11 D 3 NA
12 C 1 NA
13 E 4 NA
14 E 5 NA
15 B 5 NA
16 F 2 NA
17 B 6 NA
18 A 1 1
19 D 4 NA
20 A 2 NA
21 A 3 NA
22 D 5 NA
23 E 6 NA
24 A 4 NA
25 B 7 NA
26 E 7 NA
27 A 5 NA
28 D 6 NA
29 D 7 NA
30 A 6 NA
31 B 8 NA
32 B 9 NA
33 C 2 2
34 A 7 NA
35 F 3 NA
36 B 10 NA
37 E 8 NA
38 D 8 NA
39 E 9 NA
40 F 4 NA
41 C 3 3
42 C 4 4
43 B 11 NA
44 B 12 NA
45 A 8 NA
46 A 9 NA
47 C 5 NA
48 C 6 NA
49 B 13 NA
50 C 7 NA
51 C 8 NA
52 F 5 5
53 C 9 NA
54 E 10 NA
55 D 9 NA
56 F 6 6
57 C 10 NA
58 B 14 NA
59 B 15 NA
60 A 10 NA
61 C 11 NA
62 B 16 NA
63 B 17 NA
64 A 11 NA
65 E 11 NA
66 B 18 NA
67 F 7 7
68 F 8 8
69 E 12 NA
70 C 12 NA
71 A 12 NA
72 B 19 NA
73 A 13 NA
74 F 9 9
75 D 10 NA
76 C 13 NA
77 D 11 NA
78 E 13 NA
79 A 14 NA
80 E 14 NA
81 D 12 NA
82 A 15 NA
83 D 13 NA
84 B 20 NA
85 C 14 NA
86 C 15 NA
87 B 21 NA
88 F 10 10
89 C 16 NA
90 F 11 11
91 B 22 NA
92 E 15 NA
93 F 12 12
94 A 16 NA
95 C 17 NA
96 D 14 NA
97 D 15 NA
98 A 17 NA
99 C 18 NA
100 C 19 NA
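The running count can also be computed without the row-by-row vapply(), which rescans D$Team[1:r] for every row. The sketch below uses ave() with seq_along for that; it is a variation on the answer above, and the function name `first_row_all_played` is hypothetical.

```r
set.seed(24)
D <- data.frame(Team = sample(LETTERS[1:6], 100, TRUE), stringsAsFactors = FALSE)

# Running match count per team: within each team, number the rows 1, 2, 3, ...
D$Matches <- ave(seq_along(D$Team), D$Team, FUN = seq_along)

# First row at which every team has played n matches, or NA if some team
# never reaches n. Each team hits Matches == n at most once, so the last
# such row is the answer when all teams are present.
first_row_all_played <- function(D, n) {
  rows <- which(D$Matches == n)
  if (length(rows) < length(unique(D$Team))) return(NA_integer_)
  max(rows)
}

first_row_all_played(D, 1)
first_row_all_played(D, 4)
```

For n = 1 this agrees with max(match(unique(D$Team), D$Team)) from the question. The exact row numbers depend on the sample drawn, which can differ between R versions for the same seed.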
I want to sort the rows of this dataset by their number of missing values, in descending order.
Can someone show me how to do this in R? Is there a command for it?
df=data.frame(x=c(1,4,6,NA,7,NA,9,10,4,NA),
y=c(10,12,NA,NA,14,18,20,15,12,17),
z=c(225,198,NA,NA,NA,130,NA,200,NA,99),
v=c(44,51,NA,NA,45,NA,25,36,75,NA))
df
x y z v
1 1 10 225 44
2 4 12 198 51
3 6 NA NA NA
4 NA NA NA NA
5 7 14 NA 45
6 NA 18 130 NA
7 9 20 NA 25
8 10 15 200 36
9 4 12 NA 75
10 NA 17 99 NA
I want to get this result :
x y z v
4 NA NA NA NA
3 6 NA NA NA
6 NA 18 130 NA
10 NA 17 99 NA
5 7 14 NA 45
7 9 20 NA 25
9 4 12 NA 75
1 1 10 225 44
2 4 12 198 51
8 10 15 200 36
In my comment I incorrectly remembered the name of the argument for changing the direction of an order result. The fix is simply to use the correct name:
> df[ order(rowSums(is.na(df)), decreasing=TRUE), ]
x y z v
4 NA NA NA NA
3 6 NA NA NA
6 NA 18 130 NA
10 NA 17 99 NA
5 7 14 NA 45
7 9 20 NA 25
9 4 12 NA 75
1 1 10 225 44
2 4 12 198 51
8 10 15 200 36
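The same ordering can be sketched by negating the count instead of passing decreasing=TRUE; the intermediate `na_count` is a name introduced here to make the sort key visible.

```r
df <- data.frame(x = c(1, 4, 6, NA, 7, NA, 9, 10, 4, NA),
                 y = c(10, 12, NA, NA, 14, 18, 20, 15, 12, 17),
                 z = c(225, 198, NA, NA, NA, 130, NA, 200, NA, 99),
                 v = c(44, 51, NA, NA, 45, NA, 25, 36, 75, NA))

# Number of missing values in each row
na_count <- rowSums(is.na(df))

# Sorting on the negated count puts the rows with the most NAs first;
# rows with equal counts stay in their original order
sorted <- df[order(-na_count), ]
```

Keeping `na_count` as a separate vector also makes it easy to add it as a column or to break ties on a second key, for example order(-na_count, df$y).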