Replacing NAs in a Matrix in R - r

I have created a container of NAs and am trying to replace the NAs with a specified value that is an argument in one of the functions.
num.cars.beg<-20
num.cars.end<-70
num.cars.incr<-5
num.cols<-(abs(num.cars.end-num.cars.beg)/num.cars.incr)+1
num.its<-10
car.var.mat<-matrix(NA,num.its,num.cols) #creates empty container to hold results
car.intervals<-c(seq(num.cars.beg,num.cars.end, num.cars.incr))
colnames(car.var.mat)<-paste(car.intervals,"Cars",sep = " ")
rownames(car.var.mat)<-paste("Iter.",c(seq(1,num.its,1)),sep = "")
This has created a matrix where the rows are driven by "num.its" and the columns are "num.cars" from 20-70 in intervals of 5. For each iteration, I would like to run each column through my formula "run.sim" and replace the NAs with the value of run.sim. So for example:
              num.cars = 20     num.cars = 25
num.its = 1   run.sim1 output   run.sim2 output
num.its = 2   run.sim3 output   run.sim4 output
where,
run.sim(num.cars = each value in car.intervals, num.its = 2)

The key to filling in the container is 1) to structure the for loops over the row and column indices (1:num.itr for the rows and 1:num.cols for the columns), and 2) to pass num.cars to run.sim by indexing the vector car.intervals with [], so that for each row i the function is evaluated with the car count that belongs to column j.
num.its <- 1
num.itr <- 10
for (i in 1:num.itr){
  for (j in 1:num.cols){
    car.var.mat[i,j]<-mean(run.sim(
      num.cars = car.intervals[j],
      num.its = num.its))
  }
}
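Since run.sim is not shown in the question, here is a minimal sketch of the same fill pattern with a hypothetical stand-in for run.sim (the stub and its return value are assumptions, only there to make the example runnable). It fills one whole row per iteration instead of one cell at a time:

```r
# Hypothetical stand-in for run.sim(); the real simulation is not shown in the question.
run.sim <- function(num.cars, num.its) rep(num.cars * 2, num.its)

car.intervals <- seq(20, 70, by = 5)
num.itr <- 10   # number of rows in the container
num.its <- 1    # value passed through to run.sim()

# Container of NAs with the same row and column labels as in the question.
car.var.mat <- matrix(NA_real_, nrow = num.itr, ncol = length(car.intervals),
                      dimnames = list(paste0("Iter.", seq_len(num.itr)),
                                      paste(car.intervals, "Cars")))

# Fill one row per iteration; each column gets the mean for its car count.
for (i in seq_len(num.itr)) {
  car.var.mat[i, ] <- vapply(
    car.intervals,
    function(n) mean(run.sim(num.cars = n, num.its = num.its)),
    numeric(1)
  )
}
```

seq_len() is used instead of 1:n so the loop is simply skipped if the count is ever 0.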

Related

How to treat a single row of matrix in R as a matrix object

I have an R script that removes random rows from an nxm (n row, m column) matrix depending on which elements occur in a data set. I have a conditional statement that terminates if there are no rows remaining. This works fine if there are 0 rows, but not if there is one.
For example, if I have
m1 = rbind(c(1,2),c(1,4),c(2,3),c(2,4))
and I delete all rows
m1 = m1[-c(1,2,3,4),]
the conditional statement
if(length(m1[,1]) > 0)
evaluates correctly to FALSE and the program terminates, since the object m1 is a 0x2 matrix. However, if I delete all but one row, e.g.
m1 = m1[-c(1,2,4),]
the same conditional statement does not evaluate because the remaining row is no longer treated as a matrix object of dimension 1xn, but rather as a numeric vector, so dim(m1) is NULL and m1[,1] raises an "incorrect number of dimensions" error.
Is there some way to preserve a single row as a 1xn matrix object, other than checking if only a single row remains and applying t(as.matrix(m1)), which would be a very clumsy approach?
I've appended my complete script below, but the details of the script shouldn't be necessary to address this question. The while(length(temp_mat[,1])>0) is the step that breaks if I have a single row (but works fine if there are none or any number of rows > 1)
seq_to_mask = function(mat){
temp_mat = mat
to_mask = c()
iter = 0
while(length(temp_mat[,1])>0){
all_instances = c(temp_mat[,1],temp_mat[,2])
#number of times a sample appears
occurrences = sort(table(all_instances))
max_instances = as.numeric(names(occurrences)[length(occurrences)])
posits = which(temp_mat[,1]==max_instances | temp_mat[,2]==max_instances)
to_mask = c(to_mask, max_instances)
temp_mat = temp_mat[-posits,]
iter = iter + 1
}
return(to_mask)
}
The reason seems to be the coercion of matrix to vector when there is a single row/column. We can use drop = FALSE (by default it is drop = TRUE)
m1 <- m1[-c(1, 2, 4), , drop = FALSE]
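A small sketch of the difference, using the matrix from the question (the names one_row_vec, one_row_mat and no_rows are only for illustration):

```r
m1 <- rbind(c(1, 2), c(1, 4), c(2, 3), c(2, 4))

# Default drop = TRUE: a single remaining row collapses to a plain vector.
one_row_vec <- m1[-c(1, 2, 4), ]
is.matrix(one_row_vec)   # FALSE
dim(one_row_vec)         # NULL

# drop = FALSE keeps the matrix structure, so nrow() and [, 1] keep working.
one_row_mat <- m1[-c(1, 2, 4), , drop = FALSE]
dim(one_row_mat)         # 1 2

# Removing every row still gives a 0 x 2 matrix.
no_rows <- m1[-(1:4), , drop = FALSE]
nrow(no_rows)            # 0
```

With drop = FALSE in place, the loop condition can also be written as while(nrow(temp_mat) > 0), which reads more directly than length(temp_mat[,1]).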

How can I create a correlation matrix in R without removing null values?

I'm trying to use correlation matrices in R.
My data has a bunch of null or NA values.
My current approach is to convert these null values to 0.
This works, but it results in an inaccurate matrix because some columns that are correlated are overpowered by the 0s.
Do you know of any solution for handling the NA values?
Here is my code:
mydata = read.csv("exoplanet.csv")
res2 <- cor(mydata[sapply(mydata, function(x) is.numeric(x))])
res2[is.na(res2)] <- 0
corrplot(res2, type = "upper", order = "hclust",
tl.col = "black", tl.srt = 45)
Here is a workaround I tried:
mydata = read.csv("exoplanet.csv")
mydata = lapply(mydata, as.numeric)
mydata = as.matrix(as.numeric(unlist(mydata)))
# the reason I do this is because otherwise I get a list to double error
Now when I try to plot this, I get this error:
The matrix is not in [-1, 1]! or In as.dist.default(1 - corr) : non-square matrix
You need to remove columns that have only one distinct value (excluding NAs), since their standard deviation is zero and their correlation is undefined:
x = read.csv("./Downloads/Exoplanet extract new.csv")
is_num = sapply(x,is.numeric)
not_mono = sapply(x,function(i)length(unique(i[!is.na(i)])))>1
Then:
cor(x[,is_num & not_mono],use="pairwise.complete.obs")
You still have NA cells due to some missing values between some columns, but most likely this is the best you can do.
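To see the effect without the exoplanet file, here is a minimal sketch on a made-up data frame (the column names and values are invented for illustration). Each correlation is computed from the rows where both columns are present, so no NA has to be turned into 0:

```r
# Made-up data: two numeric columns with gaps, one constant column, one text column.
x <- data.frame(
  a = c(1, 2, 3, 4, NA),
  b = c(2, 4, 6, NA, 10),
  constant = c(5, 5, 5, 5, 5),
  label = c("p", "q", "r", "s", "t")
)

is_num <- sapply(x, is.numeric)
# Keep only columns with more than one distinct non-NA value.
not_mono <- sapply(x, function(col) length(unique(col[!is.na(col)])) > 1)

# Pairwise deletion: each pair of columns uses the rows where both are non-NA.
res <- cor(x[, is_num & not_mono], use = "pairwise.complete.obs")
res
```

Here a and b overlap in the first three rows only, and on those rows b is exactly twice a, so their correlation is 1.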

Assign a value to character named variable with index

I know we can use assign to assign values to a character name vector. For example
assign("target",1:5)
However, if we want to change the 1st element of the target(target can be a vector/matrix/list), how should we do that? target here can also be a matrix, so we can change one element, one row or one column.
I want to do something like
target[1] <- 99
if I use
assign("target[1]",99)
it will only generate a new object named target[1] whose value is 99. Here is a simple, trivial example
# This function is meaningless, just used to show my situation
# variable_name is a character
example_function <- function(variable_name){
assign(variable_name,1:5)
if(rnorm(1)>1){
variable_name[1] <- 99 # This will not work; I need some function to achieve this
}
}
example_function("justAname")
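As an alternative approach you could use the [<- function. Note that it returns a modified copy rather than changing the variable, so the result has to be assigned back to the name.
f = function(variable_name){
  assign(variable_name,1:5)
  if(rnorm(1)>1){
    # build the modified copy and store it under the same name
    assign(variable_name, `[<-`(get(variable_name), 1, value = 99))
  }
  get(variable_name)
}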
This should also work with matrices
f_mat = function(variable_name){
  assign(variable_name,matrix(1:25,nrow = 5))
  if(rnorm(1)>1){
    # replace the 1st row; the empty argument stands for "all columns"
    assign(variable_name, `[<-`(get(variable_name), 1, , value = 99))
    # for the 1st column: `[<-`(get(variable_name), , 1, value = 99)
    # supply both indices to change the element in row i, column j
  }
  get(variable_name)
}
and lists similarly.
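Another way to do this is a plain get / modify / assign round trip. This is only a sketch; the helper name set_element and its arguments are made up for illustration:

```r
# Hypothetical helper: change elements of an object that is known only by name.
set_element <- function(variable_name, index, value, envir = parent.frame()) {
  tmp <- get(variable_name, envir = envir)   # fetch the current object
  tmp[index] <- value                        # ordinary indexed assignment on the copy
  assign(variable_name, tmp, envir = envir)  # store it back under the same name
  invisible(tmp)
}

assign("target", 1:5)
set_element("target", 1, 99)
target
```

After the call, target holds 99 2 3 4 5. The envir argument defaults to the caller's environment, so the helper changes the variable where it was called from rather than a local copy.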

Reducing row reference by 1 for each for loop iteration in R

I'm working on a formula in R, that iterates over a data frame in reverse. Right now, the formula will take a set number of columns, and find the mean for each column, up to a set row number. What I'd like to do is have the row number decrease by 1 for each iteration of the for loop. The goal here is to create a "triangular" reference that uses one less value for the column means, per iteration.
Here's some code you can use to create sample data that works in the formula.
test = data.frame(p1 = c(1,2,0,1,0,2,0,1,0,0), p2 = c(0,0,1,2,0,1,2,1,0,1))
Here's the function I'm working with. My best guess is that I'll need to add some sort of reference to i in the mean(data[1:row, i]) section, but I can't seem to work the logic/math out on my own.
averagePickup = function(data, day, periods) {
# data will be your Pickup Data
# day is the day you're forecasting for (think row number)
# periods is the period or range of periods that you need to average (a column or range of columns).
pStart = ncol(data)
pEnd = ncol(data) - (periods-1)
row = (day-1)
new_frame <- as.data.frame(matrix(nrow = 1, ncol = periods))
for(i in pStart:pEnd) {
new_frame[1,1+abs(ncol(data)-i)] <- mean(data[1:row , i])
}
return(sum(new_frame[1,1:ncol(new_frame)]))
}
Right now, inputting averagePickup(test,5,2) will yield a result of 1.75. This is the sum of the means for the first 4 values of the two columns. What I'd like the result to be is 1.33333. This would be the sum of the mean of the first 4 values in column p1, and the mean of the first 3 values in column p2.
Please let me know if you need any further clarification, I'm still a total scrub at R!!!
Like this?
test = data.frame(p1 = c(1,2,0,1,0,2,0,1,0,0), p2 = c(0,0,1,2,0,1,2,1,0,1))
averagePickup = function(data, first, second) {
return(mean(data[1:first,1]) + mean(data[1:second,2]))
}
averagePickup(test,4,3)
This gives you 1.333333
Welp, I ended up figuring it out with a few more head bashes against the wall. Here's what worked for me:
averagePickup = function(data, day, periods) {
# data will be your Pickup Data
# day is the day you're forecasting for (think row number)
# periods is the period or range of periods that you need to average (a column or range of columns).
pStart = ncol(data)
pEnd = ncol(data) - (periods-1)
row = (day-1)
new_frame <- as.data.frame(matrix(nrow = 1, ncol = periods))
q <- 0 # Offset added to the row count; starts at 0 for the last column.
for(i in pStart:pEnd) {
new_frame[1,1+abs(ncol(data)-i)] <- mean(data[1:(day - periods + q) , i]) # The loop runs from the last column backwards, so the last column uses the fewest rows (day - periods).
q <- q + 1 # Each earlier column uses one more row than the column after it.
}
return(sum(new_frame[1,1:ncol(new_frame)]))
}
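The same "triangular" sum can be sketched without the counter and the temporary data frame. The function name averagePickupTri is made up here; it assumes, as above, that the last `periods` columns are used and that the k-th of those columns (counting from the left) is averaged over rows 1 to day - k:

```r
test <- data.frame(p1 = c(1,2,0,1,0,2,0,1,0,0), p2 = c(0,0,1,2,0,1,2,1,0,1))

# Hypothetical compact variant of averagePickup().
averagePickupTri <- function(data, day, periods) {
  # Indices of the last `periods` columns, left to right.
  cols <- seq(ncol(data) - periods + 1, ncol(data))
  # The k-th selected column is averaged over rows 1:(day - k).
  n_rows <- day - seq_along(cols)
  sum(mapply(function(col, n) mean(data[seq_len(n), col]), cols, n_rows))
}

averagePickupTri(test, 5, 2)
```

For the sample data this is the mean of the first 4 values of p1 (1) plus the mean of the first 3 values of p2 (1/3), i.e. 1.333333.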

use rollapply for certain rows

I wonder if it is possible to use rollapply() only for certain rows of a dataframe. I know the "by" argument can specify the every by-th time point at which I calculate FUN, but now I have a very specific vector of row indices to which I wish to apply the rollapply(). For example, I have the below dataframe:
df <- data.frame(x = (1:10), y = (11:20))
I know how to calculate the rolling mean for y column when the rolling width is 3.
library(zoo)
m <- rollapply(df$y, width = 3, FUN = mean, fill = NA, align = "right")
But what if I want the width-3-mean only for the 4th and 9th row? Is there something in "by" argument that I can manipulate? Or some other better methods (using apply to do rolling calculation maybe)?
Hopefully I am understanding your question correctly. I think you are asking how to perform a function on every 4th and 9th element in a sliding window? If yes, just restrict your function to the 4th and 9th element using x[4] and x[9]. Like this:
output <- rollapply(df, 9, function(x) (x[4] + x[9])/2, fill = NA, align = "right")
Your question could also be read as asking how to get the mean when the window contains the 4th or 9th row. This can be done by subsetting. The question you need to think about is where you want the 4th and 9th rows to be located within your window: do you want the 4th row to be at position x[1], x[2], or x[3]? What sits at the other positions will obviously affect your output. If you don't know, and all three seem reasonable, you will need to write code that creates a list of data frames containing the ranges of data you are interested in, and then use an apply function, or a for loop, to rollapply the mean function over each data frame in the list. You can then combine all of these outputs into a data frame to work with further. Like this:
# the rlist library has a function that allows us to add items to a list
# which will be handy later on
library(rlist)
library(zoo)
# your example data
df <- data.frame(x = (1:10), y = (11:20))
# a vector of your desired rows
desired_rows <- c(4,9)
# A for loop that generates a list of dataframes
# with your desired rows in the middle of each
for (i in desired_rows){
lower_bound <- i-2
upper_bound <- i+2
df_subset <- df[c(lower_bound:upper_bound), ]
if(exists("list_df_range")){
list_df_range <- list.append(list_df_range, df_subset)
}else{
list_df_range <- list(df_subset)
}
}
# a second for loop that applies your rollapply function to each
# data frame in the list and then
# returns a dataframe of the final results
# with each column named after the originating row
for (n in list_df_range){
m <- rollapply(n$y, width = 3, FUN = mean, fill = NA, align = "right")
if(exists("final_out")){
final_out <- cbind(final_out, m)
}else{
final_out <- data.frame(m)
}
}
names(final_out) <- desired_rows
Based on the poster's comment below the question, it seems that what is wanted is the mean of each rolling window of width 3 excluding the middle element of the window, keeping only the results for the 4th and 9th elements, so
cc <- c(4, 9)
rollapply(df$y, list(c(-2, 0)), mean, fill = NA)[cc]
## [1] 13 18
or
rollapplyr(df$y, 3, function(x) mean(x[-2]), fill = NA)[cc]
## [1] 13 18
or
sapply(cc, function(ix) mean(df$y[seq(to = ix, by = 2, length = 2)]))
## [1] 13 18
or
(df$y[cc - 2] + df$y[cc]) / 2
## [1] 13 18
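If instead the full width-3 trailing mean is wanted at just those rows (the question as originally worded), a base R sketch that needs no rolling function at all could look like this; the names rows, width and trailing_mean are only for illustration:

```r
df <- data.frame(x = 1:10, y = 11:20)
rows <- c(4, 9)   # rows at which the window should end
width <- 3

# For each requested row, average the `width` values ending at that row.
trailing_mean <- sapply(rows, function(i) mean(df$y[(i - width + 1):i]))
trailing_mean
```

This gives the same values as rollapply(df$y, width = 3, FUN = mean, fill = NA, align = "right")[rows], but only computes the windows that are actually needed. It assumes every requested row is at least `width`, so the window never starts before row 1.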
