I have a small problem I need your help with. I have the following data frame:
set.seed(1000)
test = data.frame(a = sample(10, replace = TRUE), b = sample(10, replace = TRUE), c = rep(NA, 10))
> test
a b c
1 1 6 NA
2 2 4 NA
3 6 3 NA
4 6 9 NA
5 1 5 NA
6 4 3 NA
7 5 1 NA
8 3 7 NA
9 5 10 NA
10 4 2 NA
and apply diff() to each column to compute the differences between consecutive rows:
test2 = abs(apply(test, 2, diff))
> test2
a b c
[1,] 1 2 NA
[2,] 4 1 NA
[3,] 0 6 NA
[4,] 5 4 NA
[5,] 3 2 NA
[6,] 1 2 NA
[7,] 2 6 NA
[8,] 2 3 NA
[9,] 1 8 NA
I would like to replace the elements of 'test' whose difference in test2 is, say, greater than or equal to 4 with NA. For example, I would expect test[3,1] to become NA, since its difference, test2[2,1], is >= 4.
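Pad each column's differences with a leading NA, so that test2 has the same dimensions as test and row i of test2 holds the difference between rows i and i - 1 of test:
test2 <- abs(apply(test, 2, function(x) c(NA, diff(x))))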
Update
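Based on the new information (use test2 >= 4 instead if a difference of exactly 4 should also count, as in the original wording of the question):
test[!is.na(test2) & test2 > 4] <- NA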
test
# a b c
# 1 4 4 NA
# 2 8 8 NA
# 3 NA 4 NA
# 4 NA NA NA
# 5 6 8 NA
# 6 NA NA NA
# 7 NA 5 NA
# 8 6 7 NA
# 9 3 NA NA
# 10 3 NA NA
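A minimal, self-contained sketch of the whole approach, using a small made-up data frame rather than the seeded one above, and the >= 4 threshold from the question. It uses sapply() over the columns, which gives the same padded difference matrix as apply(test, 2, ...) here.

```r
# Small made-up data frame (not the seeded data from the question)
test <- data.frame(a = c(1, 2, 6, 6, 1), b = c(6, 4, 3, 9, 5))

# Row i of d holds |x[i] - x[i-1]|; row 1 has no predecessor, so it is NA
d <- abs(sapply(test, function(x) c(NA, diff(x))))

# Blank out every value that jumped by 4 or more from the previous row;
# FALSE & NA is FALSE, so the NA first row is never selected
test[!is.na(d) & d >= 4] <- NA
test
```

Column a has differences NA, 1, 4, 0, 5, so rows 3 and 5 become NA; column b has NA, 2, 1, 6, 4, so rows 4 and 5 become NA.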
Related
I have generated random data like this.
data <- replicate(10, sample(0:9, 10, replace = FALSE))
# pick 15 random cell positions
ind <- sample(length(data), 15)
#now replace those indices in data with NA
data[ind] <- NA
#here is our matrix with 15 random NAs
data = as.data.frame(data)
rownames(data) = 1:10
colnames(data) = 1:10
data
This produces a data frame with NAs scattered through it. How can I reorder the entries so that each numeric value is placed in the row whose number minus 1 equals that value (0 in row 1, 1 in row 2, and so on), with NA in any row whose value (row number - 1) does not occur in that column? The first column, for example, should follow this pattern.
How can I do this? I have no clue. Sorting in increasing or decreasing order with the NAs placed last is possible, but that is not what I want.
You can write a helper function that assigns each value to index value + 1, then apply the function over all columns:
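fx <- function(x) {
  vals <- x[!is.na(x)]       # the non-missing values in this column
  pos <- vals + 1            # value v belongs in row v + 1
  out <- rep(NA, length(x))  # start from an all-NA column
  out[pos] <- vals
  out
}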
as.data.frame(sapply(data, fx))
1 2 3 4 5 6 7 8 9 10
1 NA 0 NA 0 0 0 0 NA 0 0
2 NA NA NA 1 1 NA NA NA NA NA
3 2 NA 2 2 NA NA NA NA 2 NA
4 3 NA 3 3 NA NA 3 NA 3 3
5 4 4 4 4 NA 4 NA 4 4 NA
6 5 5 NA 5 NA NA 5 5 5 NA
7 NA 6 6 NA 6 NA NA 6 NA NA
8 7 NA 7 7 NA 7 7 NA NA 7
9 NA NA NA NA 8 8 8 8 8 8
10 9 9 NA NA 9 NA NA 9 NA 9
Starting data:
set.seed(13)
data <- replicate(10, sample(
c(0:9, rep(NA, 10)),
10,
replace = FALSE
))
data <- as.data.frame(data)
colnames(data) <- 1:10
data
1 2 3 4 5 6 7 8 9 10
1 2 NA NA 2 NA NA 0 NA 3 7
2 4 NA NA 4 NA NA NA NA 2 9
3 9 9 NA 3 9 4 NA 6 4 0
4 NA NA NA 1 6 NA NA 4 NA NA
5 5 6 3 0 NA NA 5 8 8 NA
6 NA NA 7 NA NA NA 7 NA 5 3
7 3 4 6 NA 1 0 NA 5 NA NA
8 NA NA NA 7 0 7 NA NA 0 NA
9 NA 0 4 NA 8 8 8 9 NA 8
10 7 5 2 5 NA NA 3 NA NA NA
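One caveat with fx(): it assumes every value is a whole number from 0 to nrow - 1. A larger value would silently lengthen that column, and sapply() would then return a list instead of a matrix. A sketch of a guarded variant (fx_checked() and the example column are hypothetical, not from the answer):

```r
# Hypothetical guarded version of fx(): value v is written to position v + 1
fx_checked <- function(x) {
  vals <- x[!is.na(x)]
  # every value must be a whole number in 0 .. length(x) - 1
  if (any(vals < 0 | vals >= length(x) | vals != round(vals))) {
    stop("values must be whole numbers between 0 and ", length(x) - 1)
  }
  out <- rep(NA, length(x))
  out[vals + 1] <- vals
  out
}

# A made-up column: some of the values 0-9 in random order, the rest NA
col1 <- c(2, 4, 9, NA, 5, NA, 3, NA, NA, 7)
fx_checked(col1)
# NA NA 2 3 4 5 NA 7 NA 9

# Applied to every column, as in the answer:
# as.data.frame(sapply(data, fx_checked))
```

Failing loudly is a design choice: with the unguarded fx(), a bad value only shows up later as an unexpected list or an extra row.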
I have a large data frame with 300+ columns (time series) and about 2600 observations. The columns contain a long run of NAs, then a short time series, and then typically NAs again. I would like to find the first non-NA value in each column and replace it with NA.
This is what I'm hoping to achieve, only with a much bigger dataframe:
Before:
x1 x2 x3 x4
1 NA NA NA NA
2 NA NA NA NA
3 1 1 NA NA
4 2 2 1 1
5 3 3 2 2
6 4 4 3 3
7 5 5 4 4
8 6 6 5 5
9 7 7 6 6
10 8 8 7 7
11 9 9 NA NA
12 10 10 NA NA
13 NA NA NA NA
14 NA NA NA NA
After:
x1 x2 x3 x4
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 2 2 NA NA
5 3 3 2 2
6 4 4 3 3
7 5 5 4 4
8 6 6 5 5
9 7 7 6 6
10 8 8 7 7
11 9 9 NA NA
12 10 10 NA NA
13 NA NA NA NA
14 NA NA NA NA
I've searched around and found a way to do this for a single column, but my efforts to apply it to the whole data frame have proven difficult.
I have created an example dataframe to reproduce my original dataframe:
#Dataframe with NA
x1=x2=c(NA,NA,1:10,NA,NA)
x3=x4=c(NA,NA,NA,1:7,NA,NA,NA,NA)
df=data.frame(x1,x2,x3,x4)
I have used this code (provided by Joshua Ulrich) to replace the first non-NA value with NA in one column, but I would like to apply it to all columns without manually editing the code 300+ times:
NonNAindex <- which(!is.na(df[,1]))
firstNonNA <- min(NonNAindex)
is.na(df[,1]) <- seq(firstNonNA, length.out=1)
I have tried wrapping the above in a function and running it on all columns with apply/lapply, as well as in a for loop, but I haven't figured out how to get the changes back into my data frame. I'm sure I've overlooked something obvious, as I'm just taking my first small steps in R.
All suggestions would be highly appreciated!
We can use base R:
# which(!is.na(x))[1] is the position of the first non-NA value in each column
df[] <- lapply(df, function(x) replace(x, which(!is.na(x))[1], NA))
df
# x1 x2 x3 x4
#1 NA NA NA NA
#2 NA NA NA NA
#3 NA NA NA NA
#4 2 2 NA NA
#5 3 3 2 2
#6 4 4 3 3
#7 5 5 4 4
#8 6 6 5 5
#9 7 7 6 6
#10 8 8 7 7
#11 9 9 NA NA
#12 10 10 NA NA
#13 NA NA NA NA
#14 NA NA NA NA
Or, as @thelatemail suggested:
df[] <- lapply(df, function(x) replace(x, Position(Negate(is.na), x), NA))
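The question's own for-loop attempt can also work, as long as each change is assigned back into df itself; modifying a copy of a column inside a function leaves df unchanged. A minimal sketch of that loop, rebuilding the example data:

```r
x1 <- x2 <- c(NA, NA, 1:10, NA, NA)
x3 <- x4 <- c(NA, NA, NA, 1:7, NA, NA, NA, NA)
df <- data.frame(x1, x2, x3, x4)

for (j in seq_along(df)) {
  # position of the first non-NA value in column j (NA if there is none)
  first <- which(!is.na(df[[j]]))[1]
  # assign into df itself so the change persists after the loop
  if (!is.na(first)) df[[j]][first] <- NA
}
df
```

The lapply() versions above do the same thing in one step; the `df[] <-` form keeps df a data frame rather than turning it into a list.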
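Since you would like to do this for all columns, you could use the mutate_all() function from dplyr; see http://dplyr.tidyverse.org/ for more information and examples. Note that mutate_all() has since been superseded by across(), and funs() has been deprecated since dplyr 0.8.0, so newer versions of dplyr emit a deprecation warning for the code below.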
library(dplyr)
mutate_all(df, funs(if_else(row_number() == min(which(!is.na(.))), NA_integer_, .)))
#> x1 x2 x3 x4
#> 1 NA NA NA NA
#> 2 NA NA NA NA
#> 3 NA NA NA NA
#> 4 2 2 NA NA
#> 5 3 3 2 2
#> 6 4 4 3 3
#> 7 5 5 4 4
#> 8 6 6 5 5
#> 9 7 7 6 6
#> 10 8 8 7 7
#> 11 9 9 NA NA
#> 12 10 10 NA NA
#> 13 NA NA NA NA
#> 14 NA NA NA NA
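With dplyr 1.0.0 or later, the same result can be written with across() instead of mutate_all() and funs(). A sketch, assuming dplyr is installed; it reuses the replace() idea from the base R answer rather than if_else(), which also avoids the Inf (with a warning) that min(which(...)) gives for an all-NA column:

```r
library(dplyr)

x1 <- x2 <- c(NA, NA, 1:10, NA, NA)
x3 <- x4 <- c(NA, NA, NA, 1:7, NA, NA, NA, NA)
df <- data.frame(x1, x2, x3, x4)

# For every column, blank out the first non-NA value.
# which(!is.na(.x))[1] is NA for an all-NA column, and replace() then changes nothing.
out <- df %>%
  mutate(across(everything(), ~ replace(.x, which(!is.na(.x))[1], NA)))
out
```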
For example, given this data frame:
dataX = data.frame(a=c(1:5),b=c(2:6),c=c(3:7),d=c(4:8),e=c(5:9),f=c(6:10))
How do I insert a blank column after every 2 columns?
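Here is a method that uses a trick with matrices and integer column selection. cbind adds a single NA column (NewCol) to the original data.frame, at position ncol(dataX) + 1. matrix(seq_along(dataX), 2) arranges the original column indices in pairs, one pair per matrix column; rbind appends the index of the NA column as a third row, and c() reads the result column by column, giving 1, 2, 7, 3, 4, 7, 5, 6, 7. Selecting those columns repeats NewCol after every pair, and R makes the repeated names unique (NewCol.1, NewCol.2).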
cbind(dataX, NewCol=NA)[c(rbind(matrix(seq_along(dataX), 2), ncol(dataX)+1))]
a b NewCol c d NewCol.1 e f NewCol.2
1 1 2 NA 3 4 NA 5 6 NA
2 2 3 NA 4 5 NA 6 7 NA
3 3 4 NA 5 6 NA 7 8 NA
4 4 5 NA 6 7 NA 8 9 NA
5 5 6 NA 7 8 NA 9 10 NA
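The matrix trick relies on ncol(dataX) being a multiple of 2; otherwise matrix() recycles the column indices. A hypothetical helper, insert_blank_every(), that builds the same kind of index for any group size k, including a shorter final group:

```r
# Hypothetical helper: insert an all-NA column after every k columns of df
insert_blank_every <- function(df, k) {
  n <- ncol(df)
  blank <- n + 1                       # position of the NA column after cbind
  grp <- (seq_len(n) - 1) %/% k        # 0,0,1,1,2,2 for n = 6, k = 2
  # each group's column indices, followed by the blank column's index
  idx <- unlist(lapply(split(seq_len(n), grp), function(cols) c(cols, blank)),
                use.names = FALSE)
  cbind(df, NewCol = NA)[idx]
}

dataX <- data.frame(a = 1:5, b = 2:6, c = 3:7, d = 4:8, e = 5:9, f = 6:10)
res <- insert_blank_every(dataX, 2)
names(res)
```

For k = 2 the index is the same 1, 2, 7, 3, 4, 7, 5, 6, 7 as above; with five columns it becomes 1, 2, 6, 3, 4, 6, 5, 6 instead of triggering recycling.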
We can use split.default to split the columns into consecutive groups of two, giving a list of data.frames, then loop through the list, cbind an NA column onto each piece, and cbind the pieces back together:
res <- do.call(cbind, setNames(lapply(split.default(dataX, (seq_len(ncol(dataX))-1)%/%2),
function(x) cbind(x, NewCol = NA)), NULL))
res
# a b NewCol c d NewCol e f NewCol
#1 1 2 NA 3 4 NA 5 6 NA
#2 2 3 NA 4 5 NA 6 7 NA
#3 3 4 NA 5 6 NA 7 8 NA
#4 4 5 NA 6 7 NA 8 9 NA
#5 5 6 NA 7 8 NA 9 10 NA
The repeated NewCol names can then be made unique:
names(res) <- make.unique(names(res))
Let us construct an empty (all-NA) data frame with the same number of rows as dataX, bind it on, and reorder the columns by name:
empty_df <- data.frame(x1 = rep(NA, nrow(dataX)), x2 = rep(NA, nrow(dataX)), x3 = rep(NA, nrow(dataX)))
dataX<-cbind(dataX,empty_df)
dataX<-dataX[c("a","b","x1","c","d","x2","e","f","x3")]
resulting in:
a b x1 c d x2 e f x3
1 1 2 NA 3 4 NA 5 6 NA
2 2 3 NA 4 5 NA 6 7 NA
3 3 4 NA 5 6 NA 7 8 NA
4 4 5 NA 6 7 NA 8 9 NA
5 5 6 NA 7 8 NA 9 10 NA
I would like a data frame like this one, for example:
example=data.frame(a=c(1,2,3,4,5,6,7,8), b=c(1,2,3,4,5,6,7,8), c=c(1,2,3,4,5,6,7,8), d = c(1,2,3,4,5,6,7,8))
a b c d
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
4 4 4 4 4
5 5 5 5 5
6 6 6 6 6
7 7 7 7 7
8 8 8 8 8
to be transformed so that the first row stays in the same position, but each subsequent row is shifted one column further to the right than the previous row, for example:
a b c d X X.1 X.2 X.3 X.4 X.5 X.6
1 1 1 1 1 NA NA NA NA NA NA NA
2 NA 2 2 2 2 NA NA NA NA NA NA
3 NA NA 3 3 3 3 NA NA NA NA NA
4 NA NA NA 4 4 4 4 NA NA NA NA
5 NA NA NA NA 5 5 5 5 NA NA NA
6 NA NA NA NA NA 6 6 6 6 NA NA
7 NA NA NA NA NA NA 7 7 7 7 NA
8 NA NA NA NA NA NA NA 8 8 8 8
The purpose is to sum each of the new columns afterwards (adding the values down the rows of each column). The column names carry no meaning, so they don't particularly matter.
Any help would be much appreciated, as I've yet to come across anything that achieves this kind of data transformation.
EDIT: Thanks to everyone who answered, all the solutions worked great!
Here is another base R for loop. I construct the matrix first, and then fill it in.
# build matrix of missing values
myMat <- matrix(NA, nrow(example), ncol(example) + nrow(example) - 1)
# fill it in row by row with the values pulled from the matching row of example, simplified to a vector with `unlist`
for(i in seq_len(nrow(myMat))) {
myMat[i, i:(ncol(example) + i - 1)] <- unlist(example[i,], use.names=FALSE)
}
This returns
myMat
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 1 1 1 1 NA NA NA NA NA NA NA
[2,] NA 2 2 2 2 NA NA NA NA NA NA
[3,] NA NA 3 3 3 3 NA NA NA NA NA
[4,] NA NA NA 4 4 4 4 NA NA NA NA
[5,] NA NA NA NA 5 5 5 5 NA NA NA
[6,] NA NA NA NA NA 6 6 6 6 NA NA
[7,] NA NA NA NA NA NA 7 7 7 7 NA
[8,] NA NA NA NA NA NA NA 8 8 8 8
A simple approach is to create a matrix of NAs and fill in the values you want:
Shifted = matrix(NA, nrow=nrow(example),
ncol=nrow(example) + ncol(example) - 1)
for(i in 1:nrow(example)) {
Shifted[i, i:(i+ncol(example)-1)] = unlist(example[i,]) }
If you really want a data.frame, you can finish off with
as.data.frame(Shifted)
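Since the goal is to sum the new columns, here is a loop-free variant of the same idea that places every cell with matrix indexing and then takes the column sums. It is a sketch; the names m, n, k and shifted are illustrative, and example is rebuilt with the same values as in the question:

```r
example <- data.frame(a = 1:8, b = 1:8, c = 1:8, d = 1:8)

m <- as.matrix(example)
n <- nrow(m)
k <- ncol(m)
shifted <- matrix(NA_real_, n, n + k - 1)

# Cell (i, j) of example moves to column i + j - 1 of row i,
# so row i occupies columns i .. i + k - 1.
shifted[cbind(c(row(m)), c(row(m) + col(m) - 1))] <- c(m)

# The stated goal: sum each new column, skipping the NA padding
colSums(shifted, na.rm = TRUE)
# 1 3 6 10 14 18 22 26 21 15 8
```

Both c(row(m)) and c(m) read the matrix column by column, so each target position stays paired with its own value.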
I have a table that looks something like this:
# item 1 2 3 4 5 6 7 8
#1 1 2 4 6 NA NA NA NA NA
#2 2 1 4 5 6 NA NA NA NA
#3 3 NA NA NA NA NA NA NA NA
#4 4 1 2 6 NA NA NA NA NA
#5 5 2 3 4 6 7 8 NA NA
and I have a vector, list1:
list1<-11:13
In each row, I want to replace the first NAs with the elements of list1, in order, so the result should look like this:
# item 1 2 3 4 5 6 7 8
#1 1 2 4 6 11 12 13 NA NA
#2 2 1 4 5 6 11 12 13 NA
#3 3 11 12 13 NA NA NA NA NA
#4 4 1 2 6 11 12 13 NA NA
#5 5 2 3 4 6 7 8 11 12
I tried
for(i in 1:5){
res<-which(is.na(Mydata[i,]))
Mydata[i,res]<-c(list1, rep(NA, 8))
}
It seems to work with the table in the example but gives many warning messages, and when I run it on a really large table it sometimes gives the wrong result. Can anyone tell me what is wrong with my code? Or is there a better way to do this?
We loop through the rows of 'Mydata' using apply with MARGIN = 1, get the numeric index of the NA elements ('i1'), take the minimum of the number of NA elements and the length of list1 ('l1'), and replace that many elements. Because apply returns each processed row as a column, the result is transposed back with t().
t(apply(Mydata, 1, function(x) {
i1 <- which(is.na(x))
l1 <- min(length(i1), length(list1))
replace(x, i1[seq_len(l1)], list1[seq_len(l1)])}))
# item X1 X2 X3 X4 X5 X6 X7 X8
#1 1 2 4 6 11 12 13 NA NA
#2 2 1 4 5 6 11 12 13 NA
#3 3 11 12 13 NA NA NA NA NA
#4 4 1 2 6 11 12 13 NA NA
#5 5 2 3 4 6 7 8 11 12
Or, as @RichardScriven mentioned, we can use na.omit inside apply, looping over the rows:
t(apply(Mydata, 1, function(x) {
  # positions of the first NAs in the row, at most length(list1) of them
  w <- na.omit(which(is.na(x))[seq_along(list1)])
  x[w] <- list1[seq_along(w)]
  x }))
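You could do it all in one go using matrix indexing. Here dat is the data as a matrix; this assumes the NAs in each row come after all the values and that every row contains at least one NA (for a row without NAs, max.col() would point at column 1):
sel <- pmin(outer(0:2, max.col(is.na(dat), "first"), `+`), ncol(dat))
idx <- cbind(c(col(sel)), c(sel))   # (row, column) pairs, three per row of dat
keep <- !duplicated(idx)            # clamping to ncol(dat) can create repeats
dat[idx[keep, , drop = FALSE]] <- rep(11:13, ncol(sel))[keep]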
# item 1 2 3 4 5 6 7 8
#[1,] 1 2 4 6 11 12 13 NA NA
#[2,] 2 1 4 5 6 11 12 13 NA
#[3,] 3 11 12 13 NA NA NA NA NA
#[4,] 4 1 2 6 11 12 13 NA NA
#[5,] 5 2 3 4 6 7 8 11 12
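The loop in the question assigns c(list1, rep(NA, 8)), which always has 11 elements, to however many NA cells a row happens to have; that length mismatch is the likely cause of both the warnings and the occasional wrong results. A minimal sketch of the loop with matching lengths, rebuilding the example as a numeric matrix (the item column name is taken from the printed table; the rest is illustrative):

```r
# The example table from the question, as a numeric matrix
Mydata <- rbind(
  c(1, 2, 4, 6, NA, NA, NA, NA, NA),
  c(2, 1, 4, 5, 6, NA, NA, NA, NA),
  c(3, NA, NA, NA, NA, NA, NA, NA, NA),
  c(4, 1, 2, 6, NA, NA, NA, NA, NA),
  c(5, 2, 3, 4, 6, 7, 8, NA, NA)
)
colnames(Mydata) <- c("item", 1:8)
list1 <- 11:13

for (i in seq_len(nrow(Mydata))) {
  na_cols <- which(is.na(Mydata[i, ]))
  k <- min(length(na_cols), length(list1))   # how many values actually fit
  # assign exactly k values to exactly k cells, so nothing is recycled
  if (k > 0) Mydata[i, na_cols[seq_len(k)]] <- list1[seq_len(k)]
}
Mydata
```

Row 5 has only two NAs, so it receives 11 and 12, matching the expected output.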