Replace all values with NA in an R data frame

I have a set of data
x <- seq(1, 10, by=1)
y <- seq(1, 10, by=1)
data <- expand.grid(x,y)
I would like to create a new data frame called NA_data in which all of the values in that data frame are replaced with NA.

Just multiply by NA: any arithmetic operation with NA returns NA, so we could multiply, add (+), subtract (-), or divide (/) and it would still give NA.
NA_data <- data * NA
Or another option is
NA_data <- data[rep(nrow(data) + 1, nrow(data)), ]  # row nrow(data) + 1 does not exist, so every value comes back NA
Or use the matrix way
NA_data <- as.data.frame(matrix(nrow = nrow(data), ncol = ncol(data),
                                dimnames = list(NULL, names(data))))
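Whichever variant you pick, a quick sanity check (a minimal sketch) confirms the result keeps the dimensions of data and is entirely NA:
NA_data <- data * NA
dim(NA_data)          # 100 rows, 2 columns, same as data
all(is.na(NA_data))   # TRUE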

Related

Deleting x rows every y rows in a dataframe in R

How can I remove (for example) 3 rows every 10 rows?
For example, if I have a data frame with 100 rows, at the end I need a data frame with 70 rows (missing the first, second, third, eleventh, twelfth, thirteenth rows, and so on).
Using a toy data frame with 100 rows, try this:
df <- data.frame(x = 1:100, y = 1:100)
rem <- as.vector(sapply(1:3, function(i) seq(i, nrow(df), 10)))
df[-rem, ]
We can use outer to create a sequence of the rows to remove.
result <- df[-c(outer(seq(1, nrow(df), 10), 0:2, `+`)), ]
We could do this with rep in a vectorized way
i1 <- seq(1, nrow(df), 10)
out <- df[-(rep(0:2, each = length(i1)) + i1),]
data
df <- data.frame(x = 1:100, y = 1:100)
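All three approaches should agree; a quick cross-check on the toy data (a sketch):
rem  <- as.vector(sapply(1:3, function(i) seq(i, nrow(df), 10)))
i1   <- seq(1, nrow(df), 10)
out1 <- df[-rem, ]
out2 <- df[-c(outer(i1, 0:2, `+`)), ]
out3 <- df[-(rep(0:2, each = length(i1)) + i1), ]
nrow(out1)                                        # 70
identical(out1, out2) && identical(out1, out3)    # TRUE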

Create subset matrix according to criteria/ Extract key rows according to criteria

I want to subset the rows of my original matrix into two separate matrices.
I setup the problem as follows:
set.seed(2)
Mat1 <- data.frame(matrix(nrow = 4, ncol =10, data = rnorm(40,0,1)))
keep.rows = matrix(nrow =2, ncol =4)
keep.rows[,1] = c(1,2)
keep.rows[,2] = c(2,3)
keep.rows[,3] = c(2,3)
keep.rows[,4] = c(1,2)
Mat1
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
1 0.9959846 -2.2079198 -0.3869496 -1.183606 1.959357077 1.0744594 -0.8621983 -0.4213736 0.4718595 1.2309537
2 -1.6957649 1.8221225 0.3866950 -1.358457 0.007645872 0.2605978 2.0480403 -0.3508344 1.3589398 1.1471368
3 -0.5333721 -0.6533934 1.6003909 -1.512671 -0.842615198 -0.3142720 0.9399201 -1.0273806 0.5641686 0.1065980
4 -1.3722695 -0.2846812 1.6811550 -1.253105 -0.601160105 -0.7496301 2.0086871 -0.2505191 0.4559801 -0.7833167
Mat1 is my original matrix. Now, from the keep.rows matrix, I want to create two output matrices. The first output matrix (Output1) should store all the rows specified in keep.rows. The second output matrix (Output2) should store all remaining rows. In my actual application my matrices are very large, so they cannot be subset manually as I do here.
I need:
1) A function that does this simply over large matrices.
2) Ideally one where I can change the number of entries to "keep" each time. In this case each column of keep.rows applies to a block of 3 columns of Mat1; however, if my keep.rows matrix were 2x2, I might want each column to apply to 5 columns at a time.
Results should be of the form:
Output1 <- data.frame(matrix(nrow = 2, ncol =10))
Output1[1:2,1:3] <- Mat1[c(1,2), 1:3]
Output1[1:2,4:6] <- Mat1[c(2,3), 4:6]
Output1[1:2,7:9] <- Mat1[c(2,3), 7:9]
Output1[1:2,10] <- Mat1[c(1,2), 10]
Output2 <- data.frame(matrix(nrow = 2, ncol =10))
Output2[1:2,1:3] <- Mat1[c(3,4), 1:3]
Output2[1:2,4:6] <- Mat1[c(1,4), 4:6]
Output2[1:2,7:9] <- Mat1[c(1,4), 7:9]
Output2[1:2,10] <- Mat1[c(3,4), 10]
IMPORTANT: In the answer I need Output2 to be specified in a way that keeps all remaining rows. In my application my keep.rows matrix is the same size, but Mat1 contains 1000+ rows.
You can use sapply, which iterates over the columns of Mat1 with seq_along(Mat1) and subsets Mat1 using keep.rows. With cbind you get a matrix-like data.frame from the list that sapply returns. To get the remaining rows you simply place a - before keep.rows.
Output1 <- do.call(cbind, sapply(seq_along(Mat1), function(i)
  Mat1[keep.rows[, (i + 2) %/% 3], i, drop = FALSE],
  simplify = FALSE))
Output2 <- do.call(cbind, sapply(seq_along(Mat1), function(i)
  Mat1[-keep.rows[, (i + 2) %/% 3], i, drop = FALSE],
  simplify = FALSE))
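To make the mapping from Mat1 columns to keep.rows columns adjustable (point 2 of the question), the (i + 2) %/% 3 part can be pulled into a small helper; split_keep and its group_size argument are my own names, so treat this as a sketch rather than a tested utility:
split_keep <- function(mat, keep, group_size = 3) {
  # column i of `mat` uses column ceiling(i / group_size) of `keep`
  grp <- (seq_along(mat) + group_size - 1) %/% group_size
  kept    <- do.call(cbind, lapply(seq_along(mat), function(i)
    mat[keep[, grp[i]], i, drop = FALSE]))
  dropped <- do.call(cbind, lapply(seq_along(mat), function(i)
    mat[-keep[, grp[i]], i, drop = FALSE]))
  list(kept = kept, dropped = dropped)
}
res <- split_keep(Mat1, keep.rows, group_size = 3)
identical(res$kept, Output1)   # TRUE for the Output1 built above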

rollapply for moving average with non-business day

I'd like to compute a moving average on data that has NA values in the middle, like below.
library(xts)   # also attaches zoo, which provides rollapply()/rollapplyr()
date <- seq.Date(as.Date("2018-07-02"), as.Date("2018-07-14"), by = "days")
A <- c(100,110,120,130,140,NA,NA,150,160,170,180,190,200)
B <- c(200,220,240,260,280,NA,NA,300,320,340,360,380,400)
C <- c(150,160,170,180,190,200,210,NA,NA,220,230,240,250)
dataset <- data.frame(A,B,C)
dataset <- as.xts(dataset, order.by = date)
If I use rollapply like below to get a 3-day moving average...
y <- rollapply(dataset, width = 3, function(x) mean(x, na.rm = TRUE ))
This is not what I want.
For example, in the moving average of A at "2018-07-09", the result is (NA+NA+150)/1 = 150, but I want to get (130+140+150)/3 = 140.
How can I do that?
I assume you want NAs to stay as NA and otherwise to take the mean of the last 3 non-NAs.
1) Take 5 elements at a time and if the last element is NA then return NA; otherwise, remove the NAs and take the mean of the last 3. Note that this does imply that the first 4 rows will be NA.
mean_bus <- function(x) if (is.na(tail(x, 1))) NA else mean(tail(na.omit(x), 3))
y1 <- rollapplyr(dataset, width = 5, mean_bus)
2) An alternative would be to take the mean of the last 3 non-NAs and then overwrite the result with NA in all positions where the input is NA.
mean_omit <- function(x) mean(tail(na.omit(x), 3))
y <- rollapplyr(dataset, 5, mean_omit)
y2 <- replace(y, is.na(dataset), NA)
all.equal(y1, y2)
## [1] TRUE
3) If you prefer to fill in the first 4 rows with partial values then convert to zoo and use the partial= argument of rollapplyr.zoo. mean_bus is from (1).
y3 <- as.xts(rollapplyr(as.zoo(dataset), 5, mean_bus, partial = TRUE))
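As a quick check against the example in the question (a sketch), the value of A at 2018-07-09 should come out as mean(c(130, 140, 150)) = 140:
y1["2018-07-09", "A"]   # 140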
You could either remove the NAs in each series before you compute the moving average (MA), or use a larger window and keep only the last three non-NA values for the MA.
y <- rollapply(dataset, width = 5,
               function(x) mean(tail(x[!is.na(x)], 3)))
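Note that rollapply defaults to align = "center". To reproduce the right-aligned example from the question (140 for A at 2018-07-09, i.e. the mean of the last three non-NA values up to that date), pass align = "right" or use rollapplyr; a sketch:
y_right <- rollapply(dataset, width = 5, align = "right",
                     function(x) mean(tail(x[!is.na(x)], 3)))
y_right["2018-07-09", "A"]   # 140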

R: applying function to matrix except individual cell(s)

Suppose I have a 10x10 matrix. How can I fill it with 0's while excluding certain individual cells (preferably in a single operation)?
blank <- matrix(NA,nrow=10,ncol=10)
for (i in 1:10) {for (j in 1:10) {blank[i,j] <- 0 }}
# except blank[2,5], blank[9,3], blank[1,4], to be left NA
It is probably more efficient to declare the matrix as 0s and then assign NA to the small number of exception cells:
blank <- matrix(0, nrow = 10, ncol = 10)
blank[2, 5] <- blank[9, 3] <- blank[1, 4] <- NA
Or, more programmatically:
coords <- list(c(2, 5),
               c(9, 3),
               c(1, 4))
blank[do.call("rbind", coords)] <- NA
(the key being this part of ?"["):
When indexing arrays by [ a single argument i can be a matrix with as many columns as there are dimensions of x; the result is then a vector with elements corresponding to the sets of indices in each row of i.
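To see that rule in action, a small sketch: indexing with a two-column matrix touches exactly those (row, column) cells, not whole rows and columns.
blank <- matrix(0, nrow = 10, ncol = 10)
idx <- rbind(c(2, 5), c(9, 3), c(1, 4))   # one (row, column) pair per row
blank[idx] <- NA
which(is.na(blank), arr.ind = TRUE)       # exactly the three excluded cells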
If this is supposed to be a random assignment of NA to a zero matrix then this might suffice.
zero3NA <- matrix(0, 10, 10)
zero3NA[ cbind( sample(nrow(zero3NA), 3), sample(ncol(zero3NA), 3) ) ] <- NA
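A quick check (sketch) that exactly three cells were blanked:
sum(is.na(zero3NA))   # 3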

randomly replace elements in a matrix

I would like to randomly replace elements in a matrix with some specified value, here -99. I tried the first method below and it did not work. Then I tried a different approach, also below, and it did work.
Why does the first method not work? What am I doing incorrectly? Thank you for any advice.
I suspect the second method is better because, apart from working, it allows me to specify the percentage of the elements I want replaced. The first method does not since it can randomly draw the same i,j pairs repeatedly.
Here is the first method, the one that does not work:
# This does not work
set.seed(1234)
ncols <- 10
nrows <- 5
NA_value <- -99
my.fake.data <- round(rnorm(ncols*nrows, 20, 5))
my.fake.grid <- matrix(my.fake.data, nrow=nrows, ncol=ncols, byrow=TRUE)
my.fake.grid
random.i <- sample(ncols, round(0.40*nrows*ncols), replace = TRUE)
random.j <- sample(nrows, round(0.40*nrows*ncols), replace = TRUE)
my.fake.grid[random.j, random.i] <- NA_value
my.fake.grid
Here is the second method, the one that does work:
# This works
set.seed(1234)
ncols <- 10
nrows <- 5
NA_value <- -99
my.fake.data <- round(rnorm(ncols*nrows, 20, 5))
my.fake.grid <- matrix(my.fake.data, nrow=nrows, ncol=ncols, byrow=TRUE)
my.fake.grid
my.fake.data2 <- c(my.fake.grid)
random.x <- sample(length(my.fake.data2), round(0.40*length(my.fake.data2)), replace = FALSE)
my.fake.data2[random.x] <- NA_value
my.fake.grid2 <- matrix(my.fake.data2, nrow=nrows, ncol=ncols, byrow=FALSE)
my.fake.grid2
The first method does not work because my.fake.grid[random.j, random.i] indexes the full cross of all sampled rows with all sampled columns (a whole sub-grid), not individual (row, column) pairs; for that you need a single two-column index matrix. Could try
library(data.table) # For a fast cross-join; alternatively could use expand.grid
temp <- as.matrix(CJ(seq_len(nrows), seq_len(ncols))) # Create all possible row/column index combinations
indx <- temp[sample(nrow(temp), round(0.4 * nrow(temp))), ] # Sample 40% of them
my.fake.grid[indx] <- NA_value # Replace with -99
sum(my.fake.grid == -99)/(ncols * nrows) # Validating percentage
##[1] 0.4
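A base R variant of the same idea, using expand.grid as the comment above suggests (a sketch; assumes a fresh my.fake.grid):
temp <- as.matrix(expand.grid(seq_len(nrows), seq_len(ncols)))  # all row/column pairs
indx <- temp[sample(nrow(temp), round(0.4 * nrow(temp))), ]     # sample 40% of them, no repeats
my.fake.grid[indx] <- NA_value
mean(my.fake.grid == NA_value)                                  # 0.4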
