I have a table that looks kind of like this:
# item 1 2 3 4 5 6 7 8
#1 1 2 4 6 NA NA NA NA NA
#2 2 1 4 5 6 NA NA NA NA
#3 3 NA NA NA NA NA NA NA NA
#4 4 1 2 6 NA NA NA NA NA
#5 5 2 3 4 6 7 8 NA NA
and I have a list
list1<-11:13
I want to fill the NAs in each row with the elements of list1, in order, and the result should look like this:
# item 1 2 3 4 5 6 7 8
#1 1 2 4 6 11 12 13 NA NA
#2 2 1 4 5 6 11 12 13 NA
#3 3 11 12 13 NA NA NA NA NA
#4 4 1 2 6 11 12 13 NA NA
#5 5 2 3 4 6 7 8 11 12
I tried
for(i in 1:5){
res<-which(is.na(Mydata[i,]))
Mydata[i,res]<-c(list1, rep(NA, 8))
}
It seems to work with the table in the example, but it gives many warning messages. And when I run it on a really large table it sometimes gives the wrong result. Can anyone tell me what is wrong with my code? Or is there a better way to do this?
We loop through the rows of 'Mydata' using apply with MARGIN = 1, get the numeric index of the NA elements ('i1'), take the smaller of the number of NAs and the length of list1 ('l1'), and replace only that many elements. A row with fewer NAs than values in list1 therefore receives only the first values (seq_len() is used rather than seq() so that a row without NAs is left untouched):
t(apply(Mydata, 1, function(x) {
i1 <- which(is.na(x))
l1 <- min(length(i1), length(list1))
replace(x, i1[seq_len(l1)], list1[seq_len(l1)])}))
# item X1 X2 X3 X4 X5 X6 X7 X8
#1 1 2 4 6 11 12 13 NA NA
#2 2 1 4 5 6 11 12 13 NA
#3 3 11 12 13 NA NA NA NA NA
#4 4 1 2 6 11 12 13 NA NA
#5 5 2 3 4 6 7 8 11 12
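To try the apply() answer end to end, the example table can be rebuilt as a numeric matrix. This is a minimal sketch: typing the data in with rbind() is an assumption based on the printed table (the question does not show how Mydata was created), and the function body is the one from the answer above.

```r
# Rebuild the example table from the question: an item column plus columns 1..8
# (assumption: the question does not show how Mydata was created)
Mydata <- rbind(
  c(1, 2, 4, 6, NA, NA, NA, NA, NA),
  c(2, 1, 4, 5, 6, NA, NA, NA, NA),
  c(3, NA, NA, NA, NA, NA, NA, NA, NA),
  c(4, 1, 2, 6, NA, NA, NA, NA, NA),
  c(5, 2, 3, 4, 6, 7, 8, NA, NA)
)
colnames(Mydata) <- c("item", 1:8)
list1 <- 11:13

res <- t(apply(Mydata, 1, function(x) {
  i1 <- which(is.na(x))                  # positions of the NAs in this row
  l1 <- min(length(i1), length(list1))   # how many values of list1 actually fit
  replace(x, i1[seq_len(l1)], list1[seq_len(l1)])
}))
res
```

Row 5 has only two NAs, so it receives 11 and 12 and the 13 is dropped, matching the expected output in the question.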
Or, as @RichardScriven mentioned, we can use na.omit with apply, looping over the rows:
t(apply(Mydata, 1, function(x) {
w <- na.omit(which(is.na(x))[seq_along(list1)])
x[w] <- list1[seq_along(w)]
x }))
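As for the loop in the question: Mydata[i, res] <- c(list1, rep(NA, 8)) always supplies 11 values, no matter how many NAs row i has, so the replacement length rarely matches length(res). R then warns (or, depending on the lengths, recycles or stops), which is likely the source of both the warnings and the unreliable results on a larger table. A minimal sketch of the loop with the replacement cut to exactly length(res) values, assuming Mydata is a data.frame built from the printed table:

```r
# Example table from the question as a data.frame (assumption: the real data may differ)
Mydata <- as.data.frame(rbind(
  c(1, 2, 4, 6, NA, NA, NA, NA, NA),
  c(2, 1, 4, 5, 6, NA, NA, NA, NA),
  c(3, NA, NA, NA, NA, NA, NA, NA, NA),
  c(4, 1, 2, 6, NA, NA, NA, NA, NA),
  c(5, 2, 3, 4, 6, 7, 8, NA, NA)
))
list1 <- 11:13

for (i in seq_len(nrow(Mydata))) {            # no hard-coded 1:5
  res <- which(is.na(Mydata[i, ]))            # columns that are NA in row i
  if (length(res) == 0) next                  # nothing to fill in this row
  # pad list1 with NAs, then keep exactly length(res) values
  vals <- c(list1, rep(NA, length(res)))[seq_along(res)]
  Mydata[i, res] <- vals
}
Mydata
```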
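You could do it all in one go using matrix indexing. Here dat is the table stored as a matrix, the NAs in each row are assumed to be trailing, and every row must contain at least one NA (otherwise max.col() returns column 1 for that row). Targets past the last column are dropped, and each remaining target gets its value explicitly, so no recycling warning occurs:
sel <- outer(0:2, max.col(is.na(dat), "first"), `+`)   # 3 target columns per row
keep <- c(sel <= ncol(dat))                            # drop targets past the last column
dat[cbind(c(col(sel)), c(sel))[keep, , drop = FALSE]] <- rep(11:13, ncol(sel))[keep]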
# item 1 2 3 4 5 6 7 8
#[1,] 1 2 4 6 11 12 13 NA NA
#[2,] 2 1 4 5 6 11 12 13 NA
#[3,] 3 11 12 13 NA NA NA NA NA
#[4,] 4 1 2 6 11 12 13 NA NA
#[5,] 5 2 3 4 6 7 8 11 12
If I have data such as
idx<-c("1_1_2015_0_00_00","1_1_2015_0_10_00","1_1_2015_0_30_00","1_1_2015_0_40_00","1_1_2015_0_60_00","1_1_2015_0_80_00")
rr<-c(2,3,4,1,5,6)
no<-seq(1,6)
dat<-data.frame(no,idx,rr)
Then I want to align it with a standard index:
id<-c("1_1_2015_0_00_00","1_1_2015_0_10_00","1_1_2015_0_20_00","1_1_2015_0_30_00","1_1_2015_0_40_00","1_1_2015_0_50_00","1_1_2015_0_60_00","1_1_2015_0_70_00","1_1_2015_0_80_00")
so that each missing timestamp shows up as a row of NAs, like this:
no idx rr
1 1 1_1_2015_0_00_00 2
2 2 1_1_2015_0_10_00 3
3 NA NA NA
4 3 1_1_2015_0_30_00 4
5 4 1_1_2015_0_40_00 1
6 NA NA NA
7 5 1_1_2015_0_60_00 5
8 NA NA NA
9 6 1_1_2015_0_80_00 6
How can I get this?
You can use match
dat[match(id, dat$idx), ]
# no idx rr
#1 1 1_1_2015_0_00_00 2
#2 2 1_1_2015_0_10_00 3
#NA NA <NA> NA
#3 3 1_1_2015_0_30_00 4
#4 4 1_1_2015_0_40_00 1
#NA.1 NA <NA> NA
#5 5 1_1_2015_0_60_00 5
#NA.2 NA <NA> NA
#6 6 1_1_2015_0_80_00 6
match(id, dat$idx) returns
#[1] 1 2 NA 3 4 NA 5 NA 6
and we use this vector to select rows of dat.
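A self-contained version using the vectors from the question. Resetting the row names at the end is an optional cosmetic step added here; without it the inserted rows are named NA, NA.1 and NA.2 as in the output above.

```r
idx <- c("1_1_2015_0_00_00", "1_1_2015_0_10_00", "1_1_2015_0_30_00",
         "1_1_2015_0_40_00", "1_1_2015_0_60_00", "1_1_2015_0_80_00")
rr <- c(2, 3, 4, 1, 5, 6)
no <- seq(1, 6)
dat <- data.frame(no, idx, rr)
id <- c("1_1_2015_0_00_00", "1_1_2015_0_10_00", "1_1_2015_0_20_00",
        "1_1_2015_0_30_00", "1_1_2015_0_40_00", "1_1_2015_0_50_00",
        "1_1_2015_0_60_00", "1_1_2015_0_70_00", "1_1_2015_0_80_00")

pos <- match(id, dat$idx)   # row of dat for each standard id, NA where it is missing
out <- dat[pos, ]           # NA positions become all-NA rows
rownames(out) <- NULL       # optional: renumber rows 1..9
out
```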
Let's start with two data frames:
m1 <- matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10)
df1 <- as.data.frame(m1)
df1
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 4 5 9 8 3 8 7 1 5 5
2 2 1 NA 6 6 NA 3 8 8 2
3 NA 5 7 2 1 10 8 6 5 7
4 8 1 1 6 8 4 5 3 5 2
5 10 4 9 9 1 NA 7 8 6 2
6 1 8 NA 6 5 7 9 9 9 3
7 1 10 2 4 NA 10 6 5 5 4
8 7 3 10 7 5 5 2 1 NA 1
9 NA NA 8 10 6 4 3 10 7 7
10 7 10 2 2 9 4 NA 1 2 10
m2 <- matrix(sample(c(NA, 2:20), 100, replace = TRUE), 10)
df2 <- as.data.frame(m2)
df2
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 5 NA NA 19 20 15 5 11 4 17
2 4 13 20 NA 9 18 7 11 5 12
3 17 3 14 4 6 2 11 16 11 7
4 14 10 9 16 NA 7 20 5 8 6
5 5 14 10 20 19 16 NA 7 NA NA
6 12 14 14 8 3 20 15 7 15 17
7 4 15 18 12 4 2 19 13 9 8
8 14 11 4 20 5 17 NA 13 19 12
9 15 3 14 16 14 19 17 8 5 NA
10 2 2 11 2 16 4 NA 18 20 NA
Now, I do not want to merge the two data frames, only some columns.
How can I move df2$V10 into df1$V4?
The resulting data frame would have 20 rows, with rows 11:20 of V4 filled by the 10 values of df2$V10. The remaining columns in those rows should be NA.
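Extract the 'V10' column from 'df2', wrap it in a data.frame with the column name V4, and use bind_rows to stack it under 'df1'. Columns that are missing from the new piece are filled with NA by default. (The printed values below come from a different random draw than the df1 and df2 shown above, because sample() was called without set.seed().)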
library(dplyr)
bind_rows(df1, data.frame(V4 = df2$V10))
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
#1 2 10 NA 9 7 NA NA 8 1 5
#2 2 5 10 10 8 8 3 7 NA 2
#3 3 7 NA 5 4 5 2 5 7 2
#4 9 4 6 4 8 6 7 9 8 2
#5 3 6 2 3 3 6 10 5 9 5
#6 1 NA 3 7 5 4 6 3 7 10
#7 6 3 1 3 4 10 2 6 NA 7
#8 9 1 5 4 4 7 4 2 2 1
#9 3 1 6 6 1 7 7 6 6 1
#10 NA 6 10 9 10 10 6 4 3 9
#11 NA NA NA 10 NA NA NA NA NA NA
#12 NA NA NA 3 NA NA NA NA NA NA
#13 NA NA NA 4 NA NA NA NA NA NA
#14 NA NA NA 18 NA NA NA NA NA NA
#15 NA NA NA 20 NA NA NA NA NA NA
#16 NA NA NA 11 NA NA NA NA NA NA
#17 NA NA NA 15 NA NA NA NA NA NA
#18 NA NA NA 2 NA NA NA NA NA NA
#19 NA NA NA 3 NA NA NA NA NA NA
#20 NA NA NA 14 NA NA NA NA NA NA
For multiple columns, subset df2 and rename the selected columns to their target names before calling bind_rows:
bind_rows(df1, setNames(df2[c('V10', 'V8')], c('V4', 'V2')))
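If you would rather not depend on dplyr, here is a base-R sketch of the same idea (set.seed(1) is added only so the example is reproducible; it is not part of the original). Indexing a data frame with NA row numbers returns rows that are entirely NA but keep df1's column names and types, so only V4 needs filling before rbind():

```r
set.seed(1)  # added for reproducibility; not in the original example
df1 <- as.data.frame(matrix(sample(c(NA, 1:10), 100, replace = TRUE), 10))
df2 <- as.data.frame(matrix(sample(c(NA, 2:20), 100, replace = TRUE), 10))

extra <- df1[rep(NA_integer_, nrow(df2)), ]  # 10 all-NA rows with df1's columns
extra$V4 <- df2$V10                          # put df2$V10 under df1's V4
out <- rbind(df1, extra)                     # 20 rows; other columns NA in rows 11:20
rownames(out) <- NULL                        # renumber rows 1..20
out
```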
I have a large dataframe with 300+ columns (time series) and about 2600 observations. The columns are filled with a lot of NAs, then a short time series, and then typically NAs again. I would like to find the first non-NA value in each column and replace it with NA.
This is what I'm hoping to achieve, only with a much bigger dataframe:
Before:
x1 x2 x3 x4
1 NA NA NA NA
2 NA NA NA NA
3 1 1 NA NA
4 2 2 1 1
5 3 3 2 2
6 4 4 3 3
7 5 5 4 4
8 6 6 5 5
9 7 7 6 6
10 8 8 7 7
11 9 9 NA NA
12 10 10 NA NA
13 NA NA NA NA
14 NA NA NA NA
After:
x1 x2 x3 x4
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 2 2 NA NA
5 3 3 2 2
6 4 4 3 3
7 5 5 4 4
8 6 6 5 5
9 7 7 6 6
10 8 8 7 7
11 9 9 NA NA
12 10 10 NA NA
13 NA NA NA NA
14 NA NA NA NA
I've searched around and found a way to do this for a single column, but my efforts to apply it to the whole dataframe have proven difficult.
I have created an example dataframe to reproduce my original dataframe:
#Dataframe with NA
x1=x2=c(NA,NA,1:10,NA,NA)
x3=x4=c(NA,NA,NA,1:7,NA,NA,NA,NA)
df=data.frame(x1,x2,x3,x4)
I have used this to replace the first value with NA in one column (provided by @Joshua Ulrich), but I would like to apply it to all columns without manually repeating it for 300+ columns:
NonNAindex <- which(!is.na(df[,1]))
firstNonNA <- min(NonNAindex)
is.na(df[,1]) <- seq(firstNonNA, length.out=1)
I have tried to set the above as a function and run it for all columns with apply/lapply, as well as a for loop, but haven't really figured out how to apply the changes to my dataframe. I'm sure there is something I've completely overlooked as I'm just taking my first small steps in R.
All suggestions would be highly appreciated!
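We can use base R: loop over the columns with lapply and replace the first non-NA element of each with NA. Assigning the result to df[] keeps df a data.frame:
df[] <- lapply(df, function(x) replace(x, which(!is.na(x))[1], NA))
df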
# x1 x2 x3 x4
#1 NA NA NA NA
#2 NA NA NA NA
#3 NA NA NA NA
#4 2 2 NA NA
#5 3 3 2 2
#6 4 4 3 3
#7 5 5 4 4
#8 6 6 5 5
#9 7 7 6 6
#10 8 8 7 7
#11 9 9 NA NA
#12 10 10 NA NA
#13 NA NA NA NA
#14 NA NA NA NA
Or, as @thelatemail suggested, use Position() to find the index of the first non-NA element:
df[] <- lapply(df, function(x) replace(x, Position(Negate(is.na), x), NA))
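The single-column code from the question also works for every column once it is wrapped in a function of one column vector; the step that was likely missing is assigning the lapply() result back with df[] <-, because lapply() itself returns a plain list. A sketch, where drop_first_value is a hypothetical helper name and the all-NA guard is an addition:

```r
x1 = x2 = c(NA, NA, 1:10, NA, NA)
x3 = x4 = c(NA, NA, NA, 1:7, NA, NA, NA, NA)
df = data.frame(x1, x2, x3, x4)

# Hypothetical helper: the question's three lines, applied to one column vector
drop_first_value <- function(col) {
  NonNAindex <- which(!is.na(col))
  if (length(NonNAindex) == 0) return(col)  # all-NA column: nothing to blank
  firstNonNA <- min(NonNAindex)
  is.na(col) <- seq(firstNonNA, length.out = 1)
  col
}

# lapply() returns a list; assigning into df[] keeps the data.frame structure
df[] <- lapply(df, drop_first_value)
df
```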
Since you would like to do this for all columns, you could use dplyr and apply the same function to every column with mutate() and across() (this needs dplyr 1.0.0 or later; the older mutate_all() with funs() is superseded, and funs() is deprecated). See http://dplyr.tidyverse.org/ for more information.
library(dplyr)
mutate(df, across(everything(), ~ if_else(row_number() == min(which(!is.na(.x))), NA_integer_, .x)))
#> x1 x2 x3 x4
#> 1 NA NA NA NA
#> 2 NA NA NA NA
#> 3 NA NA NA NA
#> 4 2 2 NA NA
#> 5 3 3 2 2
#> 6 4 4 3 3
#> 7 5 5 4 4
#> 8 6 6 5 5
#> 9 7 7 6 6
#> 10 8 8 7 7
#> 11 9 9 NA NA
#> 12 10 10 NA NA
#> 13 NA NA NA NA
#> 14 NA NA NA NA
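Another base-R option, sketched here as an alternative rather than taken from the answers above, is to build one logical mask that is TRUE only at the first non-NA cell of each column and assign NA through it in a single step. which.max() on a logical vector returns the position of the first TRUE; the & has_value term keeps all-NA columns (where which.max() returns 1) untouched.

```r
x1 = x2 = c(NA, NA, 1:10, NA, NA)
x3 = x4 = c(NA, NA, NA, 1:7, NA, NA, NA, NA)
df <- data.frame(x1, x2, x3, x4)

has_value <- !is.na(df)                               # logical matrix, same shape as df
first_row <- apply(has_value, 2, which.max)           # row of the first non-NA per column
mask <- row(has_value) == first_row[col(has_value)] & has_value
df[mask] <- NA                                        # logical-matrix replacement
df
```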
For example,
dataX = data.frame(a=c(1:5),b=c(2:6),c=c(3:7),d=c(4:8),e=c(5:9),f=c(6:10))
How do I insert a blank (NA) column after every 2 columns, so that the result is a, b, NA, c, d, NA, e, f, NA?
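Here is a similar method that uses a trick with matrices and integer selection of columns. cbind adds a single NA column (NewCol) to the original data.frame. The column order is then built by reshaping the original column indices into a two-row matrix (one column per pair), adding the index of the NA column as a third row with rbind, and flattening the result with c(), which gives 1, 2, 7, 3, 4, 7, 5, 6, 7 here. Selecting columns in that order repeats the NA column after every pair, and R makes the repeated names unique: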
cbind(dataX, NewCol=NA)[c(rbind(matrix(seq_along(dataX), 2), ncol(dataX)+1))]
a b NewCol c d NewCol.1 e f NewCol.2
1 1 2 NA 3 4 NA 5 6 NA
2 2 3 NA 4 5 NA 6 7 NA
3 3 4 NA 5 6 NA 7 8 NA
4 4 5 NA 6 7 NA 8 9 NA
5 5 6 NA 7 8 NA 9 10 NA
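The matrix trick assumes the number of columns is a multiple of 2. A sketch that generalizes it to a blank column after every k columns, splitting the column indices into chunks of k so a shorter final chunk is also handled; insert_na_every is a hypothetical helper name:

```r
dataX = data.frame(a=c(1:5),b=c(2:6),c=c(3:7),d=c(4:8),e=c(5:9),f=c(6:10))

# Hypothetical helper: add an NA column after every k columns of df
insert_na_every <- function(df, k) {
  n <- ncol(df)
  groups <- split(seq_len(n), (seq_len(n) - 1) %/% k)  # column indices in chunks of k
  ord <- unlist(lapply(groups, function(g) c(g, n + 1)), use.names = FALSE)
  cbind(df, NewCol = NA)[ord]  # column n + 1 is the NA column, repeated after each chunk
}

insert_na_every(dataX, 2)
insert_na_every(dataX, 4)  # chunks (a, b, c, d) and (e, f)
```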
We can use split.default to split the columns into consecutive groups of two (giving a list of data.frames), loop through the list with lapply to cbind an NA column onto each piece, and then cbind the pieces back together:
res <- do.call(cbind, setNames(lapply(split.default(dataX, (seq_len(ncol(dataX))-1)%/%2),
function(x) cbind(x, NewCol = NA)), NULL))
res
# a b NewCol c d NewCol e f NewCol
#1 1 2 NA 3 4 NA 5 6 NA
#2 2 3 NA 4 5 NA 6 7 NA
#3 3 4 NA 5 6 NA 7 8 NA
#4 4 5 NA 6 7 NA 8 9 NA
#5 5 6 NA 7 8 NA 9 10 NA
The repeated NewCol names can then be made unique:
names(res) <- make.unique(names(res))
Let us construct an empty data frame with the same number of rows as dataX:
empty_df <- data.frame(x1=rep(NA,nrow(dataX)),x2=rep(NA,nrow(dataX)),x3=rep(NA,nrow(dataX)))
dataX<-cbind(dataX,empty_df)
dataX<-dataX[c("a","b","x1","c","d","x2","e","f","x3")]
resulting in:
a b x1 c d x2 e f x3
1 1 2 NA 3 4 NA 5 6 NA
2 2 3 NA 4 5 NA 6 7 NA
3 3 4 NA 5 6 NA 7 8 NA
4 4 5 NA 6 7 NA 8 9 NA
5 5 6 NA 7 8 NA 9 10 NA
I have a little problem I'd need your help with. I have the following data frame:
set.seed(1000)
test = data.frame(a = sample(10, replace=T), b = sample(10, replace=T), c=rep(NA, 10))
> test
a b c
1 1 6 NA
2 2 4 NA
3 6 3 NA
4 6 9 NA
5 1 5 NA
6 4 3 NA
7 5 1 NA
8 3 7 NA
9 5 10 NA
10 4 2 NA
I then use diff() to compute the differences between consecutive rows within each column:
test2 = abs(apply(test, 2, diff))
> test2
a b c
[1,] 1 2 NA
[2,] 4 1 NA
[3,] 0 6 NA
[4,] 5 4 NA
[5,] 3 2 NA
[6,] 1 2 NA
[7,] 2 6 NA
[8,] 2 3 NA
[9,] 1 8 NA
I would like to replace the elements of 'test' where the corresponding difference in test2 is, say, greater than or equal to 4 with NA. For example, I would expect test[3,1] to become NA, since its difference, test2[2,1], is >= 4.
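Pad each column's differences with a leading NA, so that test2 has the same dimensions as test and each difference sits on the row of the later of the two values it compares (test2[3,1] is then |test[3,1] - test[2,1]|):
test2 <- abs(apply(test,2,function(x) c(NA, diff(x))))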
Update
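Based on the new information, set every element of test whose difference exceeds the threshold to NA. The !is.na(test2) term keeps the padded NA entries (the first row and the all-NA column c) out of the index; use >= 4 instead of > 4 if a difference of exactly 4 should also be removed, as in the test[3,1] example: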
test[!is.na(test2) & test2 >4] <- NA
test
# a b c
# 1 4 4 NA
# 2 8 8 NA
# 3 NA 4 NA
# 4 NA NA NA
# 5 6 8 NA
# 6 NA NA NA
# 7 NA 5 NA
# 8 6 7 NA
# 9 3 NA NA
# 10 3 NA NA
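A self-contained sketch with the >= 4 threshold from the question. Here test is typed in from the printout in the question rather than generated, because sample() results under set.seed(1000) can differ between R versions (the default sampling method changed in R 3.6.0). sapply() is used instead of apply() so the columns are processed directly without first converting the data frame to a matrix.

```r
# Values copied from the printed 'test' in the question
test <- data.frame(a = c(1, 2, 6, 6, 1, 4, 5, 3, 5, 4),
                   b = c(6, 4, 3, 9, 5, 3, 1, 7, 10, 2),
                   c = rep(NA, 10))

# Leading NA keeps each difference on the row of the later value
test2 <- abs(sapply(test, function(x) c(NA, diff(x))))

threshold <- 4
hit <- !is.na(test2) & test2 >= threshold  # logical matrix without NAs
test[hit] <- NA
test
```

With this data, rows 3 and 5 of a and rows 4, 5, 8 and 10 of b become NA, including the test[3,1] case from the question. Note that every comparison uses the original neighbouring values, so setting one element to NA does not change the differences used for the next row.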