Removing NA with dplyr::filter() [duplicate] - r

This question already has answers here:
How to deal with nonstandard column names (white space, punctuation, starts with numbers)
Remove rows in R matrix where all data is NA [duplicate]
The data look like this:
example<-matrix(NA,40,7)
colnames(example)=c("1month","2month","3month","4month","5month","6month","7month")
example[,1]<-rep(c(1,3,6,2,4,98,5,3,NA),len=40)
example[,2]<-rep(c(2,7,NA,8,2,NA,3,NA),len=40)
example[,3]<-rep(c(5,3,2,NA),len=40)
example[,4]<-rep(c(NA,91,98,52,35,NA),len=40)
example[,5]<-rep(c(3,NA),len=40)
example[,6]<-rep(c(98,NA,NA,123),len=40)
example[,7]<-rep(c(3,51,NA,NA,4,NA,5,NA),len=40)
example<-as.data.frame(example)
I want to remove the NAs from each column.
I can do it with the drop_na() function, but !is.na() doesn't work:
library(dplyr)
library(tidyr)
example %>% select('1month') %>% drop_na('1month')         # this works
example %>% select('1month') %>% filter(!is.na('1month'))  # this doesn't work; the result is below
I wonder why this doesn't work, and whether there is any way to use != or !is.na() instead.
Thank you for your help. Sincerely.
1month
1 1
2 3
3 6
4 2
5 4
6 98
7 5
8 3
9 NA
10 1
11 3
12 6
13 2
14 4
15 98
16 5
17 3
18 NA
19 1
20 3
21 6
22 2
23 4
24 98
25 5
26 3
27 NA
28 1
29 3
30 6
31 2
32 4
33 98
34 5
35 3
36 NA
37 1
38 3
39 6
40 2
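A likely explanation, shown as a minimal sketch assuming dplyr is installed: inside filter(), '1month' in single quotes is a character string, not a column reference. is.na('1month') is therefore a single FALSE, !is.na('1month') is a single TRUE, and filter() recycles it to every row, so nothing is removed. Because 1month starts with a digit it is a non-syntactic name, and dplyr only treats it as a column when it is wrapped in backticks (or looked up with the .data pronoun). The same applies to !=: write `1month` != 98, not '1month' != 98; filter() also drops rows where such a comparison gives NA.

```r
library(dplyr)

# Small stand-in for the question's data: one column with the non-syntactic name "1month"
example <- data.frame(`1month` = rep(c(1, 3, 6, 2, 4, 98, 5, 3, NA), length.out = 40),
                      check.names = FALSE)

# '1month' in quotes is just a character string: is.na('1month') is FALSE, so
# !is.na('1month') is a single TRUE that filter() recycles to every row.
kept_all <- example %>% filter(!is.na('1month'))

# Backticks tell dplyr that 1month is a column name, so each row is tested.
kept_non_na <- example %>% filter(!is.na(`1month`))

# .data[["..."]] does the same when the column name is stored as a string.
kept_pronoun <- example %>% filter(!is.na(.data[["1month"]]))

nrow(kept_all)     # 40
nrow(kept_non_na)  # 36
```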

Related

Avoid duplicates in numeric vector shifting numbers

I'm looking for the optimal way to go from a numeric vector containing duplicate entries, like this one:
a=c(1,3,4,4,4,5,7,9,27,28,28,30,42,43)
to this one, avoiding the duplicates by shifting +1 if appropriate:
b=c(1,3,4,5,6,7,8,9,27,28,29,30,42,43)
Side-by-side comparison:
> data.frame(a=a, b=b)
a b
1 1 1
2 3 3
3 4 4
4 4 5
5 4 6
6 5 7
7 7 8
8 9 9
9 27 27
10 28 28
11 28 29
12 30 30
13 42 42
14 43 43
Is there an easy and quick way to do it? Thanks!
In case you want it to be done only once (there may still be duplicates):
a=c(1,3,4,4,4,5,7,9,27,28,28,30,42,43)
a <- ifelse(duplicated(a),a+1,a)
output:
> a
[1] 1 3 4 5 5 5 7 9 27 28 29 30 42 43
A loop that repeats this until no duplicates remain:
a=c(1,3,4,4,4,5,7,9,27,28,28,30,42,43)
while (anyDuplicated(a)) {
a <- ifelse(duplicated(a),a+1,a)
}
output:
> a
[1] 1 3 4 5 6 7 8 9 27 28 29 30 42 43
An alternative is to use a recursive function:
no_dupes <- function(x) {
if (anyDuplicated(x) == 0)
x
else
no_dupes(x + duplicated(x))
}
no_dupes(a)
[1] 1 3 4 5 6 7 8 9 27 28 29 30 42 43
A tidyverse option using purrr::accumulate.
library(dplyr)
library(purrr)
accumulate(a, ~ if_else(.y <= .x, .x+1, .y))
# [1] 1 3 4 5 6 7 8 9 27 28 29 30 42 43
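The recurrence behind the accumulate() answer, b[i] = max(a[i], b[i-1] + 1), can also be written without a loop. A minimal base-R sketch, assuming the input is sorted in non-decreasing order as in the example: subtracting the position i turns the recurrence into a running maximum, so b = cummax(a - i) + i.

```r
a <- c(1, 3, 4, 4, 4, 5, 7, 9, 27, 28, 28, 30, 42, 43)
i <- seq_along(a)

# b[i] - i = max(a[i] - i, b[i-1] - (i-1)), i.e. a running maximum of a - i
b <- cummax(a - i) + i
b
# values: 1 3 4 5 6 7 8 9 27 28 29 30 42 43
```

Being fully vectorised, this avoids both the repeated passes of the while loop and the recursion.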

Cumulative function for a specific range of values

I have a table with a column "Age" that has values from 1 to 10, and a column "Population" that has a value for each age. I want to compute a cumulative sum of Population from the top age down, so that each result is the total population at that age and above: age 1 and above, age 2 and above, and so on. In other words, the resulting vector should be (203, 180, ... and so on). Any help would be appreciated!
Age Population Withdrawn
1 23 3
2 12 2
3 32 2
4 33 3
5 15 4
6 10 1
7 19 2
8 18 3
9 19 1
10 22 5
You can use cumsum and rev:
df$sum_above <- rev(cumsum(rev(df$Population)))
The result:
> df
Age Population sum_above
1 1 23 203
2 2 12 180
3 3 32 168
4 4 33 136
5 5 15 103
6 6 10 88
7 7 19 78
8 8 18 59
9 9 19 41
10 10 22 22
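The rev(cumsum(rev(...))) approach assumes the rows are sorted by Age in ascending order. A sketch that does not depend on row order sums Population over all rows whose Age is at least the current age; the data frame below re-creates the question's table without the Withdrawn column. It does O(n^2) work, which is fine for a table this size.

```r
df <- data.frame(Age = 1:10,
                 Population = c(23, 12, 32, 33, 15, 10, 19, 18, 19, 22))

# For each row's age, add up Population over every row with Age >= that age
df$sum_above <- sapply(df$Age, function(age) sum(df$Population[df$Age >= age]))
df$sum_above
# values: 203 180 168 136 103 88 78 59 41 22
```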

r - aggregate / subtract two variables, rows

I'm using the aggregate function to calculate the difference between two variables for every observation, something like this (and then I want to save the result as a new variable):
data1
Group Points_Attempt1 Points_Attempt2
1 1 10 5
2 1 34 23
3 1 50 5
4 1 10 12
5 2 11 21
6 2 23 23
7 2 32 10
8 2 12 10
I'm able to do something like this:
aggregate(data1[c("Points_Attempt1","Points_Attempt2")],list(data1$Group),diff)
But I want it for every single observation, and I just do not know how to select the individual observations, i.e. by row number (here 1 to 8).
So I'm looking for the following fourth column (Difference), which I would then like to save as a new variable:
Group Points_Attempt1 Points_Attempt2 Difference
1 1 10 5 5
2 1 34 23 11
3 1 50 5 45
4 1 10 12 -2
5 2 11 21 -10
6 2 23 23 0
7 2 32 10 22
8 2 12 10 2
I would be highly thankful, if someone could help me with this.
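To take differences between consecutive rows within each group, we can use across() (the older mutate_each() and funs() are deprecated in current dplyr):
library(dplyr)
data1 %>%
group_by(Group) %>%
mutate(across(c(Points_Attempt1, Points_Attempt2), ~ c(NA, diff(.x))))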
If we instead need the difference between the two variables in each row (the Difference column asked for):
data1 %>%
mutate(Difference = Points_Attempt1 - Points_Attempt2)
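Because arithmetic on vectors in R is element-wise, no aggregate() call is needed for a per-row difference. A base-R sketch that re-creates data1 from the question and adds the Difference column:

```r
data1 <- data.frame(Group = c(1, 1, 1, 1, 2, 2, 2, 2),
                    Points_Attempt1 = c(10, 34, 50, 10, 11, 23, 32, 12),
                    Points_Attempt2 = c(5, 23, 5, 12, 21, 23, 10, 10))

# Subtraction pairs up the two columns row by row
data1$Difference <- data1$Points_Attempt1 - data1$Points_Attempt2
data1$Difference
# values: 5 11 45 -2 -10 0 22 2
```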

extract a column based on two other columns

ID MON in out
2 1 23 12
3 1 23 12
7 1 33 22
1 2 22 11
2 2 111 100
1 3 21 10
2 3 22 11
2 4 111 100
7 4 21 10
2 5 31 20
7 2046 41 30
I have a large data set in this format. I want to extract column four for the rows where column 1 equals 2 and column 2 is smaller than 5.
This can be done in base R with logical indexing:
df[,4][df[,1]==2 & df[,2]<5]
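The same selection can be written with column names, which is easier to read and does not break if columns are reordered. A sketch re-creating the sample; the names ID, MON, in and out are taken from the header, and check.names = FALSE is needed because in is a reserved word in R:

```r
df <- data.frame(ID  = c(2, 3, 7, 1, 2, 1, 2, 2, 7, 2, 7),
                 MON = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 2046),
                 `in` = c(23, 23, 33, 22, 111, 21, 22, 111, 21, 31, 41),
                 out = c(12, 12, 22, 11, 100, 10, 11, 100, 10, 20, 30),
                 check.names = FALSE)

# Positional version from the answer
df[, 4][df[, 1] == 2 & df[, 2] < 5]
# Same rows selected by name
df$out[df$ID == 2 & df$MON < 5]
# values: 12 100 11 100

# subset() keeps the result as a one-column data frame
subset(df, ID == 2 & MON < 5, select = out)
```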

how to deal with this kind of data type

I used the igraph package to detect communities. When I call membership(community), the result is:
1 2 3 4 5 6 7 13 17 18 19 20 22 23 24 25
12 9 1 10 12 6 12 16 1 11 6 6 3 13 16 1
29 30 31 33 34 37 38 39 40 41 42 43 44 45 46 47
9 5 11 14 13 6 13 11 12 13 1 16 11 6 12 7
...
The first line is node ID and the second line is its corresponding community ID.
Suppose the result above is stored in X. I ran Y=data.frame(X), and the result is:
community
1 12
2 9
3 1
4 10
5 12
6 6
7 12
13 16
...
I want to index by the first column (1, 2, 3, ...), so that, for instance, Y[13,] gives 16. But currently it is Y[8,] that gives 16. How can I do this?
This question may be very simple, but I do not know how to search for it. Thanks.
data.frame(), like as.data.frame(), converts a named vector to a data frame and uses the names of the vector elements as row names.
In other words, the "first column" you see is actually the row names: rownames(Y)[8] is "13", and you can look a node up by its ID by passing the row name as a character string, e.g. Y["13", ].
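A small sketch of both lookups, using a plain named numeric vector as a stand-in for the membership() result (an assumption: only the first eight entries from the question, and no igraph call), plus an alternative that turns the node IDs into an ordinary column:

```r
# Stand-in for membership(community): names are node IDs, values are community IDs
X <- setNames(c(12, 9, 1, 10, 12, 6, 12, 16), c(1:7, 13))
Y <- data.frame(community = X)   # names(X) become the row names of Y

Y[8, ]                 # positional lookup: 8th row, i.e. node 13 -> 16
Y["13", "community"]   # lookup by row name (node ID as a string) -> 16
X[["13"]]              # same lookup directly on the named vector -> 16

# Alternative: store the node IDs as a real column and filter on it
Y2 <- data.frame(node = as.integer(names(X)), community = unname(X))
Y2$community[Y2$node == 13]   # -> 16
```

Keeping the IDs in their own column avoids confusing the character row name "13" with the row position 13.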
