Assess each number in a column in dataframe with while loop - r

Let's say I have the code below:
data = data.frame(x=numeric(), y=numeric(), z=numeric(), ans=numeric())
x = rnorm(10,0,.01)
y = rnorm(10,0,.45)
z = rnorm(10,0,.8)
ans = x+y+z
data = rbind(data, data.frame(x=x, y=y, z=z, ans=ans))
example = function(ans) {
ans^2
}
data$result = example(data$ans)
I want to use a while loop to check the ans column of the data frame. If all of the numbers in the column are less than 0, the example function should be applied to the ans column. If they are not all below 0, I would like the data to be generated again until they are. Any help is appreciated.

You can use any() to test whether any x values are still non-negative, and redraw just those values until every x is below 0.
data = data.frame(x = rnorm(10, 0, .01),
                  y = rnorm(10, 0, .45),
                  z = rnorm(10, 0, .8))
# Redraw the non-negative x values until none are left
while (any(data$x >= 0)) {
  redo <- data$x >= 0
  data$x[redo] <- rnorm(sum(redo), 0, .01)
}
# Use the columns of data, not any x, y, z left over in the workspace
data$ans = data$x + data$y + data$z
print(data)
               x           y          z      ans
1  -0.0014348613  0.51931771 -0.4695617  0.04832
2  -0.0037155145 -0.72322260  0.4650501 -0.26189
3  -0.0007619743  0.42842295 -0.3660313  0.06163
4  -0.0068680912  0.36888855  1.4445536  1.80657
5  -0.0134698425 -0.17174076 -1.2325956 -1.41781
6  -0.0029502825 -0.04208495 -1.4656484 -1.51068
7  -0.0027566384  0.09476311 -0.1328970 -0.04089
8  -0.0188576808 -0.25938843 -0.6648152 -0.94306
9  -0.0013769550 -0.00792926  0.4946057  0.48530
10 -0.0026376453 -0.15831996 -0.1263073 -0.28726
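If the goal is the one stated in the question (every value of ans below 0 before example() is applied), the same idea can be turned around: redraw the whole data set until all(data$ans < 0) holds. The following is only a sketch under assumptions: draw_data() is a hypothetical helper, and the seed is there purely to make the run repeatable. Since each ans value is negative about half the time, roughly one draw in 2^10 passes, so expect on the order of a thousand iterations.

```r
set.seed(1)  # hypothetical seed, only for reproducibility

example <- function(ans) {
  ans^2
}

# Hypothetical helper: draw one fresh 10-row data set
draw_data <- function() {
  x <- rnorm(10, 0, .01)
  y <- rnorm(10, 0, .45)
  z <- rnorm(10, 0, .8)
  data.frame(x = x, y = y, z = z, ans = x + y + z)
}

data <- draw_data()
tries <- 1
# Keep redrawing until every value in ans is below 0
while (!all(data$ans < 0)) {
  data <- draw_data()
  tries <- tries + 1
}

# Only now apply the function to the ans column
data$result <- example(data$ans)
```

tries records how many data sets were drawn before one passed the check.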

Related

R: How to access a 'complicated list'

I am working on an assignment, which tasks me to generate a list of data, using the below code.
## Use the make_data function to generate 25 different datasets, with mu_1 being a vector
library(caret)  # createDataPartition()
library(dplyr)  # %>% and slice()
x <- seq(0, 3, len=25)
make_data <- function(a){
  n = 1000
  p = 0.5
  mu_0 = 0
  mu_1 = a
  sigma_0 = 1
  sigma_1 = 1
  y <- rbinom(n, 1, p)
  f_0 <- rnorm(n, mu_0, sigma_0)
  f_1 <- rnorm(n, mu_1, sigma_1)
  x <- ifelse(y == 1, f_1, f_0)
  test_index <- createDataPartition(y, times = 1, p = 0.5, list = FALSE)
  list(train = data.frame(x = x, y = as.factor(y)) %>% slice(-test_index),
       test = data.frame(x = x, y = as.factor(y)) %>% slice(test_index))
}
dat <- sapply(x,make_data)
The code runs fine, and 'dat' appears to be a table with 2 rows and 25 columns, where each cell holds its own data frame.
Each data frame within a cell has 2 columns, and this is where I get stuck.
While I can get to the data frame in row 1, column 1, just fine (i.e. just use dat[1,1]), I can't reach the column of 'x' values within dat[1,1]. I've experimented with
dat[1,1]$x
dat[1,1][1]
But they only give unhelpful results: an error or NULL.
Any idea how I can pull the column? Thanks.
dat[1, 1] is a list of length one that contains the data frame, not the data frame itself.
class(dat[1, 1])
#[1] "list"
So to reach x you can do
dat[1, 1]$train$x
Or
dat[1, 1][[1]]$x
As a sidenote, instead of having this 2 x 25 matrix as output in dat, I would actually prefer to have a nested list.
dat <- lapply(x,make_data)
#Access `x` column of first list from `train` dataset.
dat[[1]]$train$x
However, this is quite subjective and you can choose whatever format you like best.
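Another option, shown here only as a sketch, is double-bracket indexing on the matrix itself: dat[[1, 1]] returns the cell's content (the data frame) rather than a one-element list. The make_pair function below is a hypothetical stand-in for make_data with tiny made-up data frames, so the example runs without caret or dplyr.

```r
# Hypothetical stand-in for make_data(): returns a list of two small data frames
make_pair <- function(a) {
  list(train = data.frame(x = a + 1:3, y = factor(c(0, 1, 0))),
       test  = data.frame(x = a - 1:3, y = factor(c(1, 0, 1))))
}

# sapply() simplifies the three two-element lists into a 2 x 3 matrix of lists
dat <- sapply(c(0, 1.5, 3), make_pair)
dim(dat)

# Single brackets keep the list wrapper around the cell
class(dat[1, 1])

# Double brackets extract the data frame stored in the cell
class(dat[[1, 1]])
train_x <- dat[[1, 1]]$x

# Rows can also be addressed by name
test_x <- dat[["test", 1]]$x
```

With the original dat, the same call would be dat[[1, 1]]$x.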

Mapply recycling in R

I have written a function which undertakes arima modelling, outputs a table of coefficients and p-values, ranks the p-values and returns an arima model with no significant variables.
The function takes two inputs: a time series object and a data frame.
Here is the code:
backward_stepwise<-function(x, y){
  repeat{
    arima_result<-auto_arima(x)
    arima_pvals<-p_calc(arima_result)
    arima_outputs<-run_outputs(arima_result, arima_pvals)
    arima_ranked<-rank_pval(arima_outputs)
    # temporary fix to .xreg being added to term names
    for(i in 1:length(arima_ranked$term)){
      arima_ranked$term<-gsub(arima_ranked$term, pattern = 'xreg.',
                              replacement = "")
    }
    remove_num_one<-remove_one(arima_ranked)
    # removed the cond_select function so that y and x write over themselves
    y<-subset(y, select = colnames(y) != remove_num_one)
    x<-as.ts(y)
    if(min(arima_ranked$rank, na.rm = TRUE) != 1){
      break
    }
  }
  return(arima_result)
}
I am going to apply this to a list of time series objects and list of data frames
Example of time series list
CAN_V98
ADE_U91
ADE_V95
Example of data frames
CAN_V98
ADE_U91
ADE_V95
When I apply it via mapply or a for loop, does either method take the values from the same index in both lists? I.e. will the backward step-wise function strip variables from CAN_V98 and keep using the CAN_V98 data frame from the data frame list, or will it move on to the second data frame in the list after its first loop?
# Application via for loop
for(i in mkt_grd){
x<-list_ts_actual[[i]]
y<-list_df_actual[[i]]
ts_outputs[[i]]<-backward_stepwise(x, y)
}
# Application via mapply
ts_outputs1<-mcmapply(backward_stepwise, list_ts_actual,
list_df_actual,SIMPLIFY = FALSE)
Thank you for any assistance
mapply doesn't work like a nested for loop, i.e. it does not run through every j for each i, so to speak.
mapply(function(x, y){cat("x = ", x, " y = ", y, "\n")},
x = 1:5, y = 1:5)
x = 1 y = 1
x = 2 y = 2
x = 3 y = 3
x = 4 y = 4
x = 5 y = 5
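As you can see, it iterates through both arguments in parallel, pairing the elements at the same index. If y is shorter, it is recycled (R also warns when the longer length is not a multiple of the shorter), i.e.,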
mapply(function(x, y){cat("x = ", x, " y = ", y, "\n")},
x = 1:5, y = 1:3)
x = 1 y = 1
x = 2 y = 2
x = 3 y = 3
x = 4 y = 1
x = 5 y = 2
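Applied to two named lists of equal length, as in the question, this means element i of the first list is always paired with element i of the second. The sketch below uses made-up stand-ins (ts_list, df_list, and a toy function in place of backward_stepwise) just to make the pairing visible, and compares it with an index-based for loop.

```r
# Made-up stand-ins for the list of time series and the list of data frames
ts_list <- list(CAN_V98 = 1:3, ADE_U91 = 4:6, ADE_V95 = 7:9)
df_list <- list(CAN_V98 = data.frame(a = 1),
                ADE_U91 = data.frame(a = 2),
                ADE_V95 = data.frame(a = 3))

# Toy function in place of backward_stepwise(): combines the paired inputs
toy_fun <- function(x, y) sum(x) * y$a

# mapply pairs ts_list[[i]] with df_list[[i]]
paired <- mapply(toy_fun, ts_list, df_list, SIMPLIFY = FALSE)

# Equivalent for loop over the shared names
looped <- list()
for (nm in names(ts_list)) {
  looped[[nm]] <- toy_fun(ts_list[[nm]], df_list[[nm]])
}

identical(paired, looped)
```

Both versions give the same named list, so any changes the function makes to its own copies of x and y stay inside that one call.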

Mice: partial imputation using where argument failing

I have run into a problem using the mice function for multiple imputation. I want to impute only part of the missing data, which, judging by the help page, seems possible and straightforward. But I can't get it to work.
Here is the example:
I have some missing data on x and y:
library(mice)
plouf <- data.frame(ID = rep(LETTERS[1:10],each = 10), x = sample(10,100,replace = T), y = sample(10,100,replace = T))
plouf[sample(100,10),c("x","y")] <- NA
I want only to impute missing data on y:
where <- data.frame(ID = rep(FALSE,100),x = rep(FALSE,100),y = is.na(plouf$y))
I do the imputation
plouf.imp <- mice(plouf, m = 1,method="pmm",maxit=5,where = where)
I look at the imputed values:
test <- complete(plouf.imp)
Here I still have NAs on y:
> sum(is.na(test$y))
[1] 10
If I use where to request imputation of all missing values, it works:
where <- data.frame(ID = rep(FALSE,100),x = is.na(plouf$x),y = is.na(plouf$y))
plouf.imp <- mice(plouf, m = 1,method="pmm",maxit=5,where = where)
test <- complete(plouf.imp)
> sum(is.na(test$y))
[1] 0
But it imputes x too, which I don't want in this specific case (for speed reasons in a statistical simulation study).
Does anyone have any idea?
This happens because of the line below, which sets x and y to NA in the same rows:
plouf[sample(100,10),c("x","y")] <- NA
Consider your first case, where you want to impute y only. Check its PredictorMatrix:
plouf.imp <- mice(plouf, m = 1, method="pmm", maxit=5, where = where)
plouf.imp
#PredictorMatrix:
# ID x y
#ID 0 0 0
#x 0 0 0
#y 1 1 0
It says that the missing values of y will be predicted from ID and x, since those entries are 1 in row y.
Now check your sample data, where you put NA into the x and y columns together. Wherever y is NA, x is NA as well.
So when mice follows the PredictorMatrix to impute y, it encounters NA in x and skips those rows, because all predictors (i.e. ID and x) are expected to be non-missing in order to predict the missing values of y.
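The overlap can be confirmed in base R before calling mice at all. This is a small sketch that rebuilds the sample data from the question (the seed is an assumption, added only for repeatability) and counts the rows where x and y are missing together.

```r
set.seed(42)  # hypothetical seed, only for reproducibility

plouf <- data.frame(ID = rep(LETTERS[1:10], each = 10),
                    x = sample(10, 100, replace = TRUE),
                    y = sample(10, 100, replace = TRUE))
# Same construction as in the question: x and y go missing in the same rows
plouf[sample(100, 10), c("x", "y")] <- NA

# Rows where y is missing
y_missing <- sum(is.na(plouf$y))

# Rows where x and y are missing together
both_missing <- sum(is.na(plouf$x) & is.na(plouf$y))

# Every row that needs y imputed also lacks its predictor x
both_missing == y_missing
```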
Try this -
library(mice)
#sample data
set.seed(123)
plouf <- data.frame(ID = rep(LETTERS[1:10],each = 10), x = sample(10,100,replace = T), y = sample(10,100,replace = T))
plouf[sample(100,10), "x"] <- NA
set.seed(999)
plouf[sample(100,10), "y"] <- NA
#missing value imputation
whr <- data.frame(ID = rep(FALSE,100), x = rep(FALSE,100), y = is.na(plouf$y))
plouf.imp <- mice(plouf, m = 1, method="pmm", maxit=5, where = whr)
test <- complete(plouf.imp)
sum(is.na(test$y))
#[1] 1
Here only one value of y is left unimputed: row number 39, where both x and y are NA (similar to your first case).

R-create a data frame with 2 rows in a loop

How can I create a data frame with 2 rows (a header row and one data row) with this structure?
X1  Y1  Calc1  X2  Y2  Calc2  …  Xn  Yn   Calcn
1   4   0.25   2   5   0.4    …  i   i+3  i/(i+3)
I tried using this code:
dataRowTemp<-numeric(length = 0)
dataRow<-numeric(length = 0)
headerRowTemp<-character(length = 0)
headerRow<-character(length = 0)
for (i in 1:150){
  X<- i
  Y<- i+3
  Calc <- X/Y
  dataRowTemp <- c(X,Y,Calc)
  dataRow<-c(dataRow,dataRowTemp)
  headerRowTemp <- paste(c("X", i),c("Y", i),c("Calc", i),sep='')
  headerRow<-c(headerRow,headerRowTemp)
}
Unfortunately, I can't create a correct header (headerRow), and how can I combine the two into a data.frame later?
Is there a more elegant way to do this?
Build a function to be used in each iteration.
myfun <- function(i) {
  X <- i
  Y <- i + 3
  c(X = X, Y = Y, Calc = X/Y)
}
Set the number of iterations.
n <- 150
Apply the function to the numbers from 1 to n, use matrix(..., nrow = 1) to store the output in a matrix of only 1 row, and transform it into a data.frame (because it is what you say you aim at).
mydf <- data.frame(matrix(sapply(seq_len(n), myfun), nrow = 1))
Use paste0 inside sapply to build the names for the columns of your data.frame.
names(mydf) <- c(sapply(seq_len(n), function(i) paste0(c('X', 'Y', 'Calc'), i)))
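To see that the values and names line up, the same steps can be run with a small n. This is just a check of the approach above, with n = 3 chosen arbitrarily to keep the result readable.

```r
myfun <- function(i) {
  X <- i
  Y <- i + 3
  c(X = X, Y = Y, Calc = X / Y)
}

n <- 3  # small n, chosen only for the check

# sapply() returns a 3 x n matrix; matrix(..., nrow = 1) flattens it column by column
mydf <- data.frame(matrix(sapply(seq_len(n), myfun), nrow = 1))

# The names are built in the same column-by-column order
names(mydf) <- c(sapply(seq_len(n), function(i) paste0(c("X", "Y", "Calc"), i)))

print(mydf)
```

Because both the values and the names are flattened column by column, X1, Y1, Calc1 are followed by X2, Y2, Calc2, and so on.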

Looping over dataframe to create scatterplots

Data frame
x <- data.frame(id = c("A","B","C"), x_predictor = c(5,6,7), x_depended = c(5.5, 6.5, 7.5), y_predictor=c(2,3,2), y_depended=c(3,3,2), z_predictor=c(12,10,12), z_depended=c(14,11,13))
> x
id x_predictor x_depended y_predictor y_depended z_predictor z_depended
1 A 5 5.5 2 3 12 14
2 B 6 6.5 3 3 10 11
3 C 7 7.5 2 2 12 13
I would like to create a scatterplot for each level of id and for each depended/predictor pair.
I have created a for loop that goes over the unique levels of id, but how can I also loop over the pairs of depended and predictor columns?
uni <- unique(x$id)
for (p in uni){
  print(ggplot(x[x$id == p, ], aes(y = x_depended, x = x_predictor)) + geom_point())
}
I would like to plot depended vs predictor. The depended column always directly follows its predictor column.
This code creates three scatter plots, one for each predictor/depended pair of columns in your data frame.
library(ggplot2)
x_plots <- list()
uni <- unique(x$id)
i <- 0
for (colnum in seq(2, 6, 2)) {
  # Predictor column, and the depended column that follows it.
  x_col <- names(x)[colnum]
  y_col <- names(x)[colnum + 1]
  # Increment the counter first: R indexes from 1, so uni[0] would be empty.
  i <- i + 1
  # Retrieve the current uni.
  curr_uni <- uni[i]
  # Create the ggplot command,
  # the command is created dynamically so that we can iterate through
  # different columns in our data frame.
  ggplot_cmd <- paste0("x_plots[[i]] <- ggplot(x[x$id == curr_uni, ], aes(y = ", y_col, ", x = ", x_col, "))+geom_point()")
  # Evaluate each plot.
  eval(parse(text = ggplot_cmd))
}
You can then load the multiplot() function posted here to draw all the generated plots in one figure using:
multiplot(plotlist = x_plots)
Hope this helps.
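If one plot per id and per pair is wanted, as the question describes, the loop has to cover every combination of the two. The following is a sketch under assumptions: it finds the pairs through the _predictor/_depended name suffixes rather than by column position, the names combos and plot_data are made up for illustration, and the ggplot2 step runs only if that package is installed.

```r
x <- data.frame(id = c("A", "B", "C"),
                x_predictor = c(5, 6, 7), x_depended = c(5.5, 6.5, 7.5),
                y_predictor = c(2, 3, 2), y_depended = c(3, 3, 2),
                z_predictor = c(12, 10, 12), z_depended = c(14, 11, 13))

# Assumption: every predictor column has a depended column with the same prefix
predictors <- grep("_predictor$", names(x), value = TRUE)

# One row per (id, pair) combination
combos <- expand.grid(id = as.character(unique(x$id)),
                      predictor = predictors,
                      stringsAsFactors = FALSE)
combos$depended <- sub("_predictor$", "_depended", combos$predictor)

# One small data frame per combination, with generic column names
plot_data <- Map(function(id, pred, dep) {
  rows <- x[x$id == id, ]
  data.frame(id = id, predictor = rows[[pred]], depended = rows[[dep]])
}, combos$id, combos$predictor, combos$depended)

# Build the plots only when ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  x_plots <- lapply(plot_data, function(d) {
    ggplot2::ggplot(d, ggplot2::aes(x = predictor, y = depended)) +
      ggplot2::geom_point()
  })
}
```

With three ids and three pairs this gives nine data sets, and nine plots when ggplot2 is present. Building each plot from its own small data frame keeps the aesthetics fixed to the generic column names, so no command string has to be assembled.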
