Using ifelse for changing columns in R

I would like to apply the ifelse condition to the following data frame according to the schema. I can do it repeatedly, but I have a lot of data.
My code:
d <- data.frame(x_1 = sample(1:100,10),x_2 = sample(1:100,10), y_1 =sample(1:100,10), y_2 =sample(1:100,10),
y_3 =sample(1:100,10), y_4 =sample(1:100,10))
ifelse(d$x_1>d$y_1, 0, d$x_1-d$y_1)
ifelse(d$x_2>d$y_2, 0, d$x_2-d$y_2)
ifelse(d$x_1>d$y_3, 0, d$x_1-d$y_3)
ifelse(d$x_2>d$y_4, 0, d$x_2-d$y_4) # x_1>y_5..., x_2>y_6,...
Edit:
My x_.* are days of the week so I have x_1...x_7. But my y_.* are many. Code should work as follows:
x_1-y_1
x_2-y_2
x_3-y_3
x_4-y_4
x_5-y_5
x_6-y_6
x_7-y_7
x_1-y_8
x_2-y_9
.
.
.

If you want to compare every x_.* column with every y_.* column you can use outer.
First find out "x" and "y" columns.
x_col <- grep('x', names(d), value = TRUE)
y_col <- grep('y', names(d), value = TRUE)
We can create an index that recycles x_col along y_col. The ifelse logic can also be simplified to pmin, since ifelse(x > y, 0, x - y) gives the same result as pmin(0, x - y).
inds <- seq_along(y_col) %% length(x_col)
inds[inds == 0] <- length(x_col)
We can use mapply to subtract columns.
mapply(function(x, y) pmin(0, x - y), d[x_col[inds]], d[y_col])
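The recycling idea above can be checked on a tiny data frame. This is only a sketch: the toy values are made up, the anchored patterns "^x_" and "^y_" are an assumption about how the columns are named, and the index is computed with a slightly different (but equivalent) modulo expression.

```r
# Toy data: two "x" columns and four "y" columns (values are illustrative).
d <- data.frame(x_1 = c(5, 1), x_2 = c(2, 8),
                y_1 = c(3, 4), y_2 = c(6, 8), y_3 = c(9, 0), y_4 = c(1, 9))

x_col <- grep("^x_", names(d), value = TRUE)
y_col <- grep("^y_", names(d), value = TRUE)

# Recycle the x columns along the y columns: x_1, x_2, x_1, x_2, ...
inds <- (seq_along(y_col) - 1) %% length(x_col) + 1

# One result column per (x, y) pair; pmin(0, x - y) replaces the ifelse().
res <- mapply(function(x, y) pmin(0, x - y), d[x_col[inds]], d[y_col])
colnames(res) <- paste(x_col[inds], y_col, sep = "-")
res
```

With seven x columns the same code pairs x_1 with y_1, y_8, y_15 and so on, because the index wraps around after length(x_col).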

Related

R: How to access a 'complicated list'

I am working on an assignment, which tasks me to generate a list of data, using the below code.
library(caret) # provides createDataPartition()
library(dplyr) # provides %>% and slice()
## Use the make_data function to generate 25 different datasets, with mu_1 being a vector
x <- seq(0, 3, len=25)
make_data <- function(a){
n = 1000
p = 0.5
mu_0 = 0
mu_1=a
sigma_0 = 1
sigma_1 = 1
y <- rbinom(n, 1, p)
f_0 <- rnorm(n, mu_0, sigma_0)
f_1 <- rnorm(n, mu_1, sigma_1)
x <- ifelse(y == 1, f_1, f_0)
test_index <- createDataPartition(y, times = 1, p = 0.5, list = FALSE)
list(train = data.frame(x = x, y = as.factor(y)) %>% slice(-test_index),
test = data.frame(x = x, y = as.factor(y)) %>% slice(test_index))
}
dat <- sapply(x,make_data)
The code looks good to go, and 'dat' appears to be a table with 25 columns and 2 rows, where each cell holds its own data frame.
Now, each data frame within a cell has 2 columns.
And this is where I get stuck.
While I can get to the data frame in row 1, column 1, just fine (i.e. just use dat[1,1]), I can't reach the column of 'x' values within dat[1,1]. I've experimented with
dat[1,1]$x
dat[1,1][1]
But they only throw weird responses: error/null.
Any idea how I can pull the column? Thanks.
dat[1, 1] is a list.
class(dat[1, 1])
#[1] "list"
So to reach to x you can do
dat[1, 1]$train$x
Or
dat[1, 1][[1]]$x
As a side note, instead of having this 2 x 25 matrix as output in dat, I would actually prefer to have a nested list.
dat <- lapply(x,make_data)
#Access `x` column of first list from `train` dataset.
dat[[1]]$train$x
However, this is quite subjective and you can choose whatever format you like best.
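To see the difference between the two containers without needing caret, here is a small sketch. make_toy is a hypothetical stand-in for make_data that returns the same shape (a list with train and test data frames); the values are made up.

```r
# Hypothetical stand-in for make_data(): returns list(train = ..., test = ...).
make_toy <- function(a) {
  list(train = data.frame(x = a + 1:3, y = factor(c(0, 1, 0))),
       test  = data.frame(x = a - 1:3, y = factor(c(1, 0, 1))))
}

m <- sapply(c(0, 1), make_toy)  # 2 x 2 matrix whose cells are data frames
l <- lapply(c(0, 1), make_toy)  # plain nested list

class(m[1, 1])    # single brackets keep the list wrapper
m[[1, 1]]$x       # double brackets return the data frame itself
l[[1]]$train$x    # same column through the nested list
```

Single brackets on a list (or a matrix of lists) return a list; double brackets extract the element, which is why dat[1,1]$x gives NULL while dat[[1,1]]$x would not.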

summation of columns of a data.frame based on ifelse conditions in R?

I want to add Column A and X based on the conditions that if the result is greater than zero, use the summation result otherwise make it zero. Here is my code so far
library(tidyverse)
set.seed(1500)
FakeData <- data.frame(A = runif(20,-5,20), X = runif(20,0,22))
FakeData$sum <- if (sum(FakeData$A+FakeData$X) < 0){
0
} else {
sum(FakeData$A+FakeData$X)
}
We can use pmax in base R. It should be faster
FakeData$Sum <- with(FakeData, pmax(0, A + X))
The if/else is not vectorized. Instead use ifelse
FakeData$Sum <- with(FakeData, ifelse(A + X < 0, 0, A + X))
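A quick check that the two vectorized versions agree, using a small fixed data frame (the values are chosen only for illustration, so that some row sums are negative):

```r
# Small fixed example; row sums are -4, 5 and -0.5.
FakeData <- data.frame(A = c(-5, 2, -1), X = c(1, 3, 0.5))

# if () looks at a single TRUE/FALSE, so it cannot decide row by row.
# pmax() and ifelse() both work element-wise:
sum_pmax   <- with(FakeData, pmax(0, A + X))
sum_ifelse <- with(FakeData, ifelse(A + X < 0, 0, A + X))

sum_pmax  # 0 5 0
```

pmax is usually the simpler choice here because it evaluates A + X only once.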

How to run a test for some column relative to all the rest columns in R?

I want to do grangertest() for some column relative to all the rest columns in data frame.
How can I do this automatically ?
library(lmtest)
df <- data.frame(rnorm(100),
rnorm(100),
rnorm(100),
rnorm(100),
rnorm(100))
grangertest(df[, 1] ~ df[, 2], order = 1)
grangertest(df[, 1] ~ df[, 3], order = 1)
grangertest(df[, 1] ~ df[, 4], order = 1)
grangertest(df[, 1] ~ df[, 5], order = 1)
grangertest(df[, 2] ~ df[,1], order = 1)
grangertest(df[, 2] ~ df[,3], order = 1)
grangertest(df[, 2] ~ df[,4], order = 1)
grangertest(df[, 2] ~ df[,5], order = 1)
# and so on, with different values of "order"
As a result, I want to get a table with values F and Pr(>F) for each combination of variables.
You can also go for lapply.
library(lmtest)
df <- data.frame(rnorm(100),
rnorm(100),
rnorm(100),
rnorm(100),
rnorm(100))
lapply(df[,-1], grangertest, df[,1])
lapply(df[,-2], grangertest, df[,2])
lapply(df[,-3], grangertest, df[,3])
lapply(df[,-4], grangertest, df[,4])
lapply() applies a function to a list (here the dataframe is a list of vectors) and returns a list that contains the results.
Please revise your question to be a good question and contain a reproducible example.
Either way, regarding your question:
What you want to do is apply a function to all the columns in the dataframe. Here's the dplyr solution:
library(tidyverse) # load the package(s) you need (the dplyr package is contained here)
df <- mtcars # I'll just take one of the in-built R datasets
compareVar <- df$mpg # I copy one variable of the dataframe
df %>% # I take the dataframe
select(-mpg) %>% # remove the variable I copied before
summarise_all(cor, y = compareVar) # and apply the function cor() to all columns
Important: you have to provide the function (cor() in my case, grangertest() in your case) without the parentheses, i.e. you pass the function object itself. Additional arguments must also be provided within the summarise_all() call. In my case the argument I provide is y =, which is an argument of the function cor().
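One possible way to get the requested table of F and Pr(>F) for every ordered pair of columns is sketched below. It assumes the lmtest package is installed, uses made-up column names a, b and c, and relies on grangertest() returning an anova-style table whose second row holds the test statistic and p-value. The helper get_stats is hypothetical.

```r
library(lmtest)

set.seed(1)
df <- data.frame(a = rnorm(100), b = rnorm(100), c = rnorm(100))

# All ordered pairs (response, predictor), excluding a column with itself.
pairs <- expand.grid(response = names(df), predictor = names(df),
                     stringsAsFactors = FALSE)
pairs <- pairs[pairs$response != pairs$predictor, ]

# Hypothetical helper: run one test and keep F and the p-value from row 2.
# grangertest(x, y) corresponds to the formula form y ~ x.
get_stats <- function(resp, pred, order = 1) {
  res <- grangertest(df[[pred]], df[[resp]], order = order)
  c(F = res$F[2], p_value = res[["Pr(>F)"]][2])
}

stats <- t(mapply(get_stats, pairs$response, pairs$predictor,
                  USE.NAMES = FALSE))
result <- data.frame(pairs, stats, row.names = NULL)
result
```

To try several lag orders, the same helper could be called inside another loop over order values and the resulting tables stacked with rbind.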

Subset in the data frame rows in R

I have a data frame with 30 rows and 4 columns (namely, x, y, z, u). It is given below.
mydata = data.frame(x = rnorm(30,4), y = rnorm(30,2,1), z = rnorm(30,3,1), u = rnorm(30,5))
Further, I have a sequence values, which represent row number in my data frame.
myseq = c(seq(1, 30, by = 5))
myseq
[1] 1 6 11 16 21 26
Now, I wanted to compute the prob values for each segment of 99 rows.
filt = subset(mydata[1:6,], mydata[1:6,]$x < mydata[1:6,]$y & mydata[1:6,]$z < mydata[1:6,]$u)
filt
prob = length(filt$x)/30
prob
Then I need to compute the above prob for 1:6, ..., 27:30, and so on. Here I have only 6 prob values, so I can do them one by one, but with 100 values it would be tedious. Is there a way to compute all the prob values at once?
Thank you in advance.
BTW: in subset(DF[1:99,], ...), you only need DF[1:99,] in the first argument; there is no need to repeat it inside the condition, as in
subset(DF[1:99,], cumsuml < inchivaluel & cumsumr < inchivaluer)
Think about how to do this in a list.
The first step is to break your data into the va starting points. I'll start with a list of the indices to break it into:
inds <- mapply(seq, va, c(va[-1] - 1, nrow(DF)), SIMPLIFY=FALSE)
This is now a list of sequences, starting with 1:99, then 100:198, etc. See str(inds) to verify.
Now we can subset a portion of the data based on each element's vector of indices:
filts <- lapply(inds, function(ind) subset(DF[ind,], cumsuml < inchivaluel & cumsumr < inchivaluer))
We now have a list of filtered data frames; let's summarize it:
results <- sapply(filts, function(filt) length(filt$cumsuml)/length(alpha))
Bottom line, it helps to think about how to break this problem into lists, examples at http://stackoverflow.com/a/24376207/3358272.
BTW: instead of initially making a list of indices, we could just break up the data in that first step, ala
DF2 <- mapply(function(a,b) DF[a:b,], va, c(va[-1] - 1, nrow(DF)), SIMPLIFY=FALSE)
filts <- lapply(DF2, function(x) subset(x, cumsuml < inchivaluel & cumsumr < inchivaluer))
results <- sapply(filts, function(filt) length(filt$cumsuml)/length(alpha))
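Applied to the small example from the question (mydata with 30 rows and myseq as the segment starting points), the list approach might look like the sketch below. It assumes each segment runs up to the row before the next starting point, and it keeps the question's choice of dividing by the total number of rows.

```r
set.seed(42)
mydata <- data.frame(x = rnorm(30, 4), y = rnorm(30, 2, 1),
                     z = rnorm(30, 3, 1), u = rnorm(30, 5))
myseq <- seq(1, 30, by = 5)

# End of each segment: one row before the next start, or the last row.
ends <- c(myseq[-1] - 1, nrow(mydata))
inds <- mapply(seq, myseq, ends, SIMPLIFY = FALSE)

# One prob value per segment: share of rows meeting both conditions.
prob <- sapply(inds, function(ind) {
  seg <- mydata[ind, ]
  sum(seg$x < seg$y & seg$z < seg$u) / nrow(mydata)
})
prob
```

Counting with sum() on the logical condition avoids building the filtered data frame when only its row count is needed.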

how to correlate 2 variables when X > 1

I have a data set and want to run a correlation between X and Y. However, I only want to look at X values that are greater than 1.
cor(Data$X, Data$Y, use = "complete.obs")
What argument do I add to run a correlation between X and Y only for the X values that are greater than 1?
You can subset using the [ operator.
Try this:
# Generate Example Data
Data <- data.frame(X = seq(-5, 10, 1),
Y = sample(1:100, 16))
with(data = Data[Data$X > 1, ], cor(X, Y, use = "complete.obs"))
[ lets us specify rows and columns in the style my.data.frame[rows, columns]. Here we are specifying that we want only rows where X > 1, but all columns. We could also do the following to ask for each column individually by name:
cor(Data[Data$X > 1, "X"], Data[Data$X > 1, "Y"], use = "complete.obs")
Or even the following to subset the column vectors:
cor(Data$X[Data$X > 1], Data$Y[Data$X > 1], use = "complete.obs")
Of course, these are only to illustrate the flexibility. It's best to subset the whole data set once to avoid discrepancies.
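As a sketch of "subset once, then correlate", the example below uses subset(), which also drops rows where the condition is NA. The data values are made up, and the NA in X is included on purpose to show why use = "complete.obs" matters with bracket subsetting.

```r
# Illustrative data with one missing X value.
Data <- data.frame(X = c(-2, 0, 1.5, 2, 3, 4, NA),
                   Y = c(5, 3, 1, 4, 6, 9, 2))

# subset() keeps only rows where X > 1 is TRUE (NA rows are dropped).
sub <- subset(Data, X > 1)
r_subset <- cor(sub$X, sub$Y)

# Bracket subsetting keeps an all-NA row for the NA condition,
# so complete.obs is needed to discard it.
r_bracket <- with(Data[Data$X > 1, ], cor(X, Y, use = "complete.obs"))

all.equal(r_subset, r_bracket)
```

Both routes use the same four rows, so the two correlations agree.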
