My data consists of columns and rows. Each column has "NA" and different numbers.
For example column1 is:
2
1
1
NA
1
NA
NA
NA
I want to assign a column id to the numbers in each column.
for(j in 1:54){
if(!(col[j] <-"NA")){
col[j] <- i
}
}
Expected result for column1:
1
1
NA
NA
NA
1
NA
NA
1
Expected result for column2:
2
2
NA
NA
NA
2
NA
NA
2
You can use ifelse() together with is.na():
v <- c(2, 1, NA, NA, 4, 5, NA)
id <- ifelse(!is.na(v), 1, NA)
id
1 1 NA NA 1 1 NA
This means you don't need the for loop here. When a function can be applied to a whole vector at once, prefer that over a for loop.
Also, please provide your data in a form that others can run (as in my code above).
EDIT
According to the comments, you have multiple columns. You can use the same code on each column:
df <- data.frame(a= c(2, 1, NA, NA, 4, 5, NA), b= c(3, NA, NA, NA, 5, NA, 6))
id <- sapply(seq_along(df), function(i) {
  # column index where the value is present, NA where it is missing
  ifelse(!is.na(df[, i]), i, NA)
})
id
[,1] [,2]
[1,] 1 2
[2,] 1 NA
[3,] NA NA
[4,] NA NA
[5,] 1 2
[6,] 1 NA
[7,] NA 2
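As a possible alternative, and only as a sketch that assumes the data frame holds nothing but atomic columns, the same matrix can be built without iterating over the columns by combining col() with is.na(). The names m and id2 are made up for this illustration.

```r
df <- data.frame(a = c(2, 1, NA, NA, 4, 5, NA), b = c(3, NA, NA, NA, 5, NA, 6))

m <- as.matrix(df)     # work on a plain matrix
id2 <- col(m)          # every cell holds its own column index
id2[is.na(m)] <- NA    # blank out the cells that are missing in the data
id2
```

The result has the same shape as df, so it can be compared cell by cell with the sapply() version.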
I'm trying to convert values in a data frame to rank order values by row. So take this:
df = data.frame(A = c(10, 20, NA), B = c(NA, 10, 20), C = c(20, NA, 10))
When I do this:
t(apply(df, 1, rank))
I get this:
[1,] 1 3 2
[2,] 2 1 3
[3,] 3 2 1
But I want the NA values to continue showing as NA, like so:
[1,] 1 NA 2
[2,] 2 1 NA
[3,] NA 2 1
Try setting the na.last argument to "keep":
t(apply(df, 1, rank, na.last='keep'))
Output:
A B C
[1,] 1 NA 2
[2,] 2 1 NA
[3,] NA 2 1
As mentioned in the documentation of rank:
na.last:
for controlling the treatment of NAs. If TRUE, missing values in the data are put last; if FALSE, they are put first; if NA, they are removed; if "keep" they are kept with rank NA.
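To make the four settings concrete, here is a small sketch on a single vector. The vector x is made up for the illustration and mirrors the first row of the question's data.

```r
x <- c(10, NA, 20)

r_last  <- rank(x, na.last = TRUE)    # NA is ranked last
r_first <- rank(x, na.last = FALSE)   # NA is ranked first
r_drop  <- rank(x, na.last = NA)      # NA is removed, so the result is shorter
r_keep  <- rank(x, na.last = "keep")  # NA keeps its position with rank NA

r_keep
```

Only "keep" both preserves the length of the input and leaves the missing positions as NA, which is what the question asks for.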
Here is a dplyr approach
Libraries
library(dplyr)
Data
df <- tibble(A = c(10, 20, NA), B = c(NA, 10, 20), C = c(20, NA, 10))
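Code (note that across() applies rank() within each column, not within each row; for this particular data the two give the same result)
df %>%
  mutate(across(everything(), ~ rank(.x, na.last = "keep")))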
Output
# A tibble: 3 x 3
A B C
<dbl> <dbl> <dbl>
1 1 NA 2
2 2 1 NA
3 NA 2 1
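Ranking by row and ranking by column are different operations, even though they happen to agree on the data above. A minimal base R sketch on a made-up data frame (df2, by_row and by_col are illustrative names) shows a case where they differ:

```r
df2 <- data.frame(A = c(1, 2), B = c(3, 4))

# Rank within each row: apply() over margin 1, then transpose back
by_row <- t(apply(df2, 1, rank, na.last = "keep"))

# Rank within each column: one rank() call per column
by_col <- sapply(df2, rank, na.last = "keep")

by_row
by_col
```

Here every row of by_row is 1 2, while by_col has 1 1 in its first row and 2 2 in its second, so it is worth checking which of the two a given snippet computes.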
I have a dataframe with two columns. I need to check whether, wherever one column is NA, the other is not. Thanks
Edited.
I would like to know, for each row of the dataframe, whether both columns are not NA.
You can use the following code to check which row has no NA values:
df <- data.frame(x = c(1, NA),
y = c(2, NA))
which(rowSums(is.na(df)) == 0)
Output:
[1] 1
As you can see, the first row has no NA values, so both of its columns are non-NA.
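The two readings of the question (exactly one of the two columns is NA, or neither is) can each be written as a logical vector with one entry per row. This is only a sketch on made-up data; the column names x and y and the result names are assumptions.

```r
df <- data.frame(x = c(1, NA, 3, NA),
                 y = c(2, 5, NA, NA))

# TRUE where exactly one of the two columns is NA
exactly_one_na <- xor(is.na(df$x), is.na(df$y))

# TRUE where neither column is NA
both_present <- complete.cases(df)

exactly_one_na
both_present
```

Wrapping either vector in which() gives the row numbers instead of a logical mask.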
Here's some simple code to generate a column holding the NA count for each row:
x <- sample(c(1, NA), 25, replace = TRUE)
y <- sample(c(1, NA), 25, replace = TRUE)
df <- data.frame(x, y)
df$NA_Count <- apply(df, 1, function(x) sum(is.na(x)))
head(df, 9)  # no seed is set, so the values differ from run to run
x y NA_Count
1 NA 1 1
2 NA NA 2
3 1 NA 1
4 1 NA 1
5 NA NA 2
6 1 NA 1
7 1 1 0
8 1 1 0
9 1 1 0
I would like to know how I can remove from a dataset the records that have more than 5 null values in the columns that define them. The following code allows you to delete records with any NA in any column. However, how can I modify it to do exactly what I ask? Any ideas?
df[complete.cases(df), ]
Here is an example data frame. One of the rows has 6 NA values.
We sum the NA values by row in a new column, filter where the number of NA is less than or equal to 5, then remove the new column.
df <- data.frame(a = c(1,NA,1,1),
b = c(1, NA, NA, 1),
c = c(1, NA, NA, NA),
d = c(1, NA, NA ,NA),
e = c(1, NA, NA, NA),
f = c(1, NA, NA, NA))
a b c d e f
1 1 1 1 1 1 1
2 NA NA NA NA NA NA
3 1 NA NA NA NA NA
4 1 1 NA NA NA NA
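library(dplyr)
df %>%
  mutate(count = rowSums(is.na(df))) %>%  # number of NA values in each row
  filter(count <= 5) %>%                  # keep rows with at most 5 NA values
  select(-count)                          # drop the helper column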
a b c d e f
1 1 1 1 1 1 1
2 1 NA NA NA NA NA
3 1 1 NA NA NA NA
I'm assuming you are referring to values of NA in your data indicating a missing value. NULL is returned by expressions and functions whose value is undefined. First create some reproducible data:
set.seed(42)
vals <- sample.int(1000, 250)
idx <- sample.int(250, 100)
vals[idx] <- NA
example <- as.data.frame(matrix(vals, 25))
Now compute the number of missing values by row and exclude the rows with more than 5 missing values:
na.count <- rowSums(is.na(example))
example[na.count<=5, ]
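If the threshold needs to vary, the same two steps can be wrapped in a small function. This is a sketch only: drop_sparse_rows and its max_na argument are hypothetical names, not part of any package, and the data is the six-column example from the other answer.

```r
# Hypothetical helper: keep the rows with at most max_na missing values
drop_sparse_rows <- function(data, max_na = 5) {
  na_per_row <- rowSums(is.na(data))          # NA count for every row
  data[na_per_row <= max_na, , drop = FALSE]  # drop = FALSE keeps a data frame
}

df <- data.frame(a = c(1, NA, 1, 1),
                 b = c(1, NA, NA, 1),
                 c = c(1, NA, NA, NA),
                 d = c(1, NA, NA, NA),
                 e = c(1, NA, NA, NA),
                 f = c(1, NA, NA, NA))

kept <- drop_sparse_rows(df)   # removes only the row with 6 NA values
kept
```

Calling drop_sparse_rows(df, max_na = 0) gives the same rows as df[complete.cases(df), ].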
How can I do something like this:
library(zoo)
s = sample(1:10,10)
s
# [1] 3 9 1 10 2 8 6 7 4 5
rollapply(s, width=1, function(x) ifelse(x==3|x==4, return(Current_Position), return(NA)) )
The output I would like to get is :
# [1] 1 NA NA NA NA NA NA NA 9 NA
i.e. how to get that Current_Position value inside the function being called by rollapply?
In such cases apply along the index rather than the vector itself:
library(zoo)
s <- c(3, 9, 1, 10, 2, 8, 6, 7, 4, 5)
rollapply(seq_along(s), 1, function(ix) if (s[ix] %in% 3:4) ix else NA)
## [1] 1 NA NA NA NA NA NA NA 9 NA
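For a window of width 1 the rolling machinery is not strictly needed. As a sketch under that assumption, the positions can also be produced with a vectorised ifelse() over seq_along(s); the name pos is made up here.

```r
s <- c(3, 9, 1, 10, 2, 8, 6, 7, 4, 5)

# Position where the value is 3 or 4, NA everywhere else
pos <- ifelse(s %in% 3:4, seq_along(s), NA)
pos
```

For wider windows the index-based rollapply() call remains the more general pattern, since the function then receives a vector of positions rather than a single one.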
I was wondering if anyone had a quick and dirty solution to the following problem. I have a matrix that has rows of NAs, and I would like to replace each row of NAs with the previous row (if it is not also a row of NAs).
Assume that the first row is not a row of NAs
Thanks!
Adapted from an answer to this question: Idiomatic way to copy cell values "down" in an R vector
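f <- function(x) {
  idx <- !apply(is.na(x), 1, all)  # TRUE for rows that are not entirely NA
  # cumsum(idx) maps every row to the most recent row that is not all NA
  x[idx, , drop = FALSE][cumsum(idx), , drop = FALSE]
}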
x <- data.frame(a=c(1, 2, NA, 3, NA, NA), b=c(4, 5, NA, 6, NA, 7))
> x
a b
1 1 4
2 2 5
3 NA NA
4 3 6
5 NA NA
6 NA 7
> f(x)
a b
1 1 4
2 2 5
2.1 2 5
4 3 6
4.1 3 6
6 NA 7
This accounts for cases where you may have two or more all-NA rows in a row.
#create a data set like you discuss (in the future please do this yourself)
set.seed(14)
x <- matrix(rnorm(10), nrow=2)
y <- rep(NA, 5)
v <- do.call(rbind.data.frame, sample(list(x, x, y), 10, TRUE))
One approach:
NArows <- which(apply(v, 1, function(x) all(is.na(x)))) #find all NAs
notNA <- which(!seq_len(nrow(v)) %in% NArows) #find non NA rows
rep.row <- sapply(NArows, function(x) tail(notNA[x > notNA], 1)) #replacement rows
v[NArows, ] <- v[rep.row, ] #assign
v #view
This would not work if your first row is all NAs.
You can always use a loop, here assuming that the first row is not NA as indicated:
fill <- data.frame(x = c(1, NA, 3, 4, 5))
for (i in 2:nrow(fill)) {  # nrow(), not length(): length() of a data frame is its number of columns
  if (is.na(fill[i, 1])) fill[i, 1] <- fill[i - 1, 1]
}
If m is your matrix, this is your quick and dirty solution:
sapply(2:nrow(m),function(i){ if(is.na(m[i,1])) {m[i,] <<- m[(i-1),] } })
Note that it uses the ugly (and dangerous) <<- operator, and that it only tests the first column of each row for NA.
Matthew's example:
x <- data.frame(a=c(1, 2, NA, 3, NA, NA), b=c(4, 5, NA, 6, NA, 7))
na.rows <- which( apply( x , 1, function(z) (all(is.na(z)) ) ) )
x[na.rows , ] <- x[na.rows-1, ]
x
#---
a b
1 1 4
2 2 5
3 2 5
4 3 6
5 3 6
6 NA 7
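Obviously a first row with all NAs would present problems, as would two consecutive all-NA rows, because each NA row is filled only from the row directly above it.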
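One way to cope with runs of consecutive all-NA rows is to carry forward the index of the last usable row and then index the matrix once. The following is a sketch, not a drop-in replacement: fill_na_rows is a hypothetical helper, the matrix m is made up, and the first row is assumed not to be all NA, as stated in the question.

```r
# Hypothetical helper: replace each all-NA row with the nearest earlier row
# that is not all NA. Assumes the first row is not all NA.
fill_na_rows <- function(m) {
  all_na <- rowSums(!is.na(m)) == 0
  stopifnot(!all_na[1])
  # Row index of the most recent usable row, carried forward with cummax()
  src <- cummax(ifelse(all_na, 0L, seq_len(nrow(m))))
  m[src, , drop = FALSE]
}

m <- rbind(c(1, 4), c(NA, NA), c(NA, NA), c(3, 6), c(NA, 7))
filled <- fill_na_rows(m)
filled
```

Rows 2 and 3 both receive the values of row 1, and the partially missing last row is left untouched because it is not entirely NA.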
Here is a straightforward, and conceptually perhaps the simplest, one-liner:
x <- data.frame(a=c(1, 2, NA, 3, NA, NA), b=c(4, 5, NA, 6, NA, 7))
a b
1 1 4
2 2 5
3 NA NA
4 3 6
5 NA NA
6 NA 7
x1<-t(sapply(1:nrow(x),function(y) ifelse(is.na(x[y,]),x[y-1,],x[y,])))
[,1] [,2]
[1,] 1 4
[2,] 2 5
[3,] 2 5
[4,] 3 6
[5,] 3 6
[6,] NA 7
To put the column names back, just use colnames(x1) <- colnames(x)