Compute multiple arrays of the same variable into one variable in R?

How can I combine several variables into one in R? For example, I have three vectors for a variable A, called A1.1, A1.2 and A1.3, and I want to compute a single variable "A" from them. How do I do that?
A1.1 <- c(1,1,1,0,0,0)
A1.2 <- c(1,0,0,1,1,1)
A1.3 <- c(0,1,1,1,1,1)
The output should look like this (in SPSS we do this with Compute Variable):
A <- c(1,1,1,1,1,1)

In R you can use simple arithmetic on vectors: add them element-wise and test whether the sum is greater than zero. For example:
A1.1 <- c(1,0,1,0,0,0)
A1.2 <- c(1,0,0,1,1,1)
A1.3 <- c(0,0,1,1,1,1)
A1 <- 1*((A1.1 + A1.2 + A1.3)>0)
> A1
[1] 1 0 1 1 1 1

In R you can use the any() function inside of apply() to make this check. For example:
a1 <- c(1,0,0,0,1,1)
a2 <- c(0,1,0,0,0,1)
a3 <- c(0,1,1,0,1,1)
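# compare with 1 first so any() receives logical values and does not warn about coercing numbers
a <- apply(data.frame(a1,a2,a3), 1, function(x) as.integer(any(x == 1)))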
And then as output:
> a
[1] 1 1 1 0 1 1
In SPSS you can take a similar approach:
COMPUTE a = ANY(1, a1 TO a3) .
EXE .
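On the R side, the same "1 if any input is 1" rule can also be written without apply(). The following is a minimal sketch using the vectors from the question; the result names A_or, A_max and A_sum are made up for illustration, and the pmax() variant assumes the inputs contain only 0 and 1.

```r
A1.1 <- c(1, 1, 1, 0, 0, 0)
A1.2 <- c(1, 0, 0, 1, 1, 1)
A1.3 <- c(0, 1, 1, 1, 1, 1)

# Element-wise logical OR, converted back to 0/1
A_or <- as.integer(A1.1 | A1.2 | A1.3)

# Element-wise maximum: 1 whenever at least one input is 1 (0/1 data only)
A_max <- pmax(A1.1, A1.2, A1.3)

# rowSums() scales to many columns held in a data frame
A_sum <- as.integer(rowSums(data.frame(A1.1, A1.2, A1.3)) > 0)

print(A_or)
```

All three variants are vectorised, so they avoid the row-by-row function calls that apply() makes.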

Related

Assign 0 or 1 based on conditional probability + previous simulated data

I'm doing a simulation study and I have some problem generating data that meet certain conditions.
My first simulated data looks like below.
A1 A2
1 0.8 6
2 0.5 3
3 0.9 2
...
1000
This is how I generated A1 & A2
set.seed(47)
df <- data.frame(A1 = rnorm(1000, mean=0.7, sd=0.1), A2 = rnorm(1000, mean=4, sd=1))
df
In tabular format, the conditions look like this, where 0 = fail and 1 = pass, and each cell is the probability of getting a 1 for A3:
       A1=0  A1=1
A2=0   0.1   0.3
A2=1   0.9   0.7
Here is the explanation in words:
I want to generate a third column (A3) based on conditional probabilities that depend on the first two columns. These are the conditions I want to apply:
If A1>=0.7 (pass) & A2>=0.8 (pass) --> A3=1 with a 70% probability (implying a 30% probability of zero)
If A1>=0.7 (pass) & A2<0.8 (fail) --> A3=1 with a 30% probability
If A1<0.7 (fail) & A2>=0.8 (pass) --> A3=1 with a 90% probability
If A1<0.7 (fail) & A2<0.8 (fail)--> A3=1 with a 10% probability
I hope my logic makes sense. Please let me know if I need more data or words to better explain. Thank you.
You could use a little trick here of converting logical vectors to integers then counting in binary.
If you do the logical test df$A1 >= 0.7 you get a vector of TRUE and FALSE values. If instead you do as.numeric(df$A1 >= 0.7) you get the equivalent vector of 1s and 0s. The trick is to do this for both variables, but multiply the second vector by 2. Now if you add both vectors together, you will get a number between 0 and 3 that corresponds to your truth table:
A1 pass, A2 pass = 3
A1 fail, A2 pass = 2
A1 pass, A2 fail = 1
A1 fail, A2 fail = 0.
Note that if we add one to these numbers, we get a value between one and four. We can therefore use them as indexes of our probability vector:
probs <- c(0.1, 0.3, 0.9, 0.7)[1 + (df$A1 >= 0.7) + 2*(df$A2 >= 0.8)]
That means we can generate the random binary numbers using rbinom like so:
df$A3 <- rbinom(1000, 1, probs)
Resulting in:
head(df)
#> A1 A2 A3
#> 1 0.8994696 5.345481 1
#> 2 0.7711143 3.662635 1
#> 3 0.7185405 3.125840 1
#> 4 0.6718235 3.914527 0
#> 5 0.7108776 3.366858 1
#> 6 0.5914263 2.082173 0
Created on 2022-09-30 with reprex v2.0.2
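To see the whole approach in one place, here is a hedged end-to-end sketch. It regenerates the data as in the question, builds the 1-to-4 index (including the +1 offset described above) and then compares the observed share of 1s in each cell with the target probability. The names p_table, idx and observed are introduced here for illustration only.

```r
set.seed(47)
df <- data.frame(A1 = rnorm(1000, mean = 0.7, sd = 0.1),
                 A2 = rnorm(1000, mean = 4, sd = 1))

# Target P(A3 = 1), ordered by index:
# 1 = both fail, 2 = A1 pass only, 3 = A2 pass only, 4 = both pass
p_table <- c(0.1, 0.3, 0.9, 0.7)

# Logical values count as 0/1 in arithmetic, so this is a binary code plus 1
idx <- 1 + (df$A1 >= 0.7) + 2 * (df$A2 >= 0.8)

# One Bernoulli draw per row, each with its own success probability
df$A3 <- rbinom(nrow(df), size = 1, prob = p_table[idx])

# Observed share of 1s in each cell that actually occurs in the data
observed <- tapply(df$A3, idx, mean)
print(observed)
```

Because A2 is drawn with mean 4 and standard deviation 1, values below 0.8 are very rare, so the two "A2 fail" cells may contain few or no rows in a sample of this size.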

Error about arguments of length zero when judge a dataframe is Null in R

Hi there, I am new to R. I am trying to check whether df is NULL and, if it is, assign "Empty" to it, but I get the error below. Do you know how to solve it?
if(is.na(df)){
df <- "Empty"}
Error in if (is.na(data_si)) { : argument is of length zero
Suppose you have this data frame:
df <- data.frame(
  A = c('A1','A2','A3','A4'),
  B = c(1,2,3,NA)
)
print(df)
Output:
   A  B
1 A1  1
2 A2  2
3 A3  3
4 A4 NA
And you want to replace the missing value with the word "empty". Note that a missing entry in a data frame is NA, not NULL, so the test to use is is.na() rather than is.null(). You can do this with a nested for loop combined with an if statement:
for (i in seq_len(nrow(df))) {
  for (j in seq_len(ncol(df))) {
    # replace the entry only when it is missing
    if (is.na(df[i, j])) {
      df[i, j] <- 'empty'
    }
  }
}
Output:
   A     B
1 A1     1
2 A2     2
3 A3     3
4 A4 empty
This code inspects every entry in the data frame and checks whether it is NA. Here "i" is the row index and "j" is the column index; entries that are not NA are left unchanged. Because a column can hold only one type, column B becomes a character column once 'empty' is written into it.
Another way to solve this problem is to use ifelse(), which is vectorised and therefore tests a whole column at once.
You can handle each column manually:
df$A <- ifelse(is.na(df$A), 'empty', df$A)
df$B <- ifelse(is.na(df$B), 'empty', df$B)
or loop over the columns:
for (i in seq_along(df)) {
  df[, i] <- ifelse(is.na(df[, i]), 'empty', df[, i])
}
Both ways give the same result.
Hope it helps!
Subset the whole dataframe on NA values and replace them:
df[is.na(df)] <- "Empty"
df
A B
1 A1 1
2 A2 2
3 A3 3
4 A4 Empty
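The answers above deal with NA values inside a data frame. The error in the question, however, appears when the object itself is NULL: is.na(NULL) returns a zero-length logical vector, and if() cannot work with a condition of length zero. A minimal sketch of a safer check follows; the helper name is_empty_df is hypothetical, and treating a zero-row data frame as "empty" is an assumption about what the asker wants.

```r
# Hypothetical helper: TRUE for NULL or for a data frame with no rows
is_empty_df <- function(x) {
  is.null(x) || (is.data.frame(x) && nrow(x) == 0)
}

df <- NULL

# is.na(NULL) has length zero, which is what triggers the original error
print(length(is.na(NULL)))

# is_empty_df() always returns a single TRUE or FALSE, so if() is safe
if (is_empty_df(df)) {
  df <- "Empty"
}
print(df)
```

The || operator short-circuits, so nrow() is never evaluated when the object is NULL.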

How to connect multiple variables into one vector

I have variables from c1 to c24, totally 24 variables. I want to do something like:
b <- c(c1,c2,c3,c4,c5,c6,c7,c8,c9,
c10,c11,c12,c13,c14,c15,c16,c17,
c18,c19,c20,c21,c22,c23,c24)
How could I do this? Something like b <- c(c1:c24) does not work: R only uses two values (c1 and c24) in this case, but I want to put all 24 values into the vector.
You can do this with lapply and get:
c1 <- c2 <- c3 <- c4 <- 1
unlist( ## convert from list to vector
lapply(
paste0("c",1:4), ## names of variables
get) ## retrieve variable by name
)
## [1] 1 1 1 1
In general, it would be a good idea to look further back in your workflow and see if it's possible to generate those variables within a list in the first place ...
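As a variation on the same idea, base R's mget() retrieves several objects by name in one call, which removes the need for lapply(). The sketch below uses four made-up variables instead of 24 and also shows the list-first workflow hinted at above; the name cs is hypothetical.

```r
c1 <- 10; c2 <- 20; c3 <- 30; c4 <- 40

# mget() returns a named list of the requested objects;
# unlist() flattens it and use.names = FALSE drops the names
b <- unlist(mget(paste0("c", 1:4)), use.names = FALSE)
print(b)

# Keeping the values in a list from the start avoids the name lookup entirely
cs <- list(c1 = 10, c2 = 20, c3 = 30, c4 = 40)
b_from_list <- unlist(cs, use.names = FALSE)
```

If the variables are vectors of different lengths, unlist() simply concatenates them in order, just as c() would.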

subset of data frame on based on multiple conditions

I'm having trouble with a particular task in my code. I have a data frame:
n <- 6
set.seed(123)
df <- data.frame(x=paste0("x",seq_along(1:n)), A=sample(c(-2:2),n,replace=TRUE), B=sample(c(-1:3),n,replace=TRUE))
#
# x A B
# 1 x1 -1 1
# 2 x2 1 3
# 3 x3 0 1
# 4 x4 2 1
# 5 x5 2 3
# 6 x6 -2 1
and a decision tree as
A>0;Y;Y;N;N
B>1;Y;N;Y;N
C;1;2;2;1
that I load by
dt <- read.csv2("tmp.csv", header=FALSE)
I'd like to loop over all possible combinations of (A>0) and (B>1) and assign the corresponding C value to the subset of rows that satisfies each combination. Here's what I did:
nr <- 3
nc <- 5
cond <- dt[1:(nr-1),1,drop=FALSE]
rule <- dt[nr,1,drop=FALSE]
subdf <- vector(mode="list",2^(nr-1))
for (i in 2:nc) {
check <- paste0("")
for (j in 1:(nr-1)) {
case <- paste0(dt[j,1])
if (dt[j,i]=="N")
case <- paste0("!",case)
check <- paste0(check, "(", case, ")" )
if (j<(nr-1))
check <- paste0(check, "&")
}
subdf[i] <- subset(df,check)
subdf[i]$C <- dt[nr,i]
}
unlist(subdf)
Unfortunately, I get an error from subset(): it cannot parse the conditions from my string statements. What should I do?
The issue is in how you create the subset: subset() expects a logical vector, and you gave it a string ('check'). The simplest solution is to parse and evaluate that string. I feel there is a more elegant way to solve this problem and I hope someone will come along and show it, but you can fix the final part of your code with the following:
mysubset <- subset(df,with(df,eval(parse(text=check))))
if(nrow(mysubset)>0){
mysubset$C <- dt[nr,i]
}
subdf[[i]]<-mysubset
I have added the parse/eval part to generate a vector of booleans to subset only the 'TRUE' cases, and added a check for whether C could be added (will give error if there are no rows).
Based on the previous answer, I came up with a more elegant/practical way of generating a vector of combined rules, and then applying them all to the data, using apply/lapply.
##create list of formatted rules
#format each 'building' block separately,
#based on rows in 'dt'.
part_conditions <- apply(dt[-nrow(dt),],MARGIN=1,FUN=function(x){
res <- sprintf("(%s%s)", ifelse(x[-1]=="Y","","!"), x[1])
})
# > part_conditions
# 1 2
# [1,] "(A>0)" "(B>1)"
# [2,] "(A>0)" "(!B>1)"
# [3,] "(!A>0)" "(B>1)"
# [4,] "(!A>0)" "(!B>1)"
#combine to vector of conditions
conditions <- apply(part_conditions, MARGIN=1,FUN=paste, collapse="&")
# > conditions
# [1] "(A>0)&(B>1)" "(A>0)&(!B>1)" "(!A>0)&(B>1)" "(!A>0)&(!B>1)"
#for each condition, test in data whether condition is 'T'
temp <- sapply(conditions, function(rule){
return(with(df, eval(parse(text=rule))))
}
)
rules <- as.numeric(t(dt[nrow(dt),-1]))
#then find which of the (in this case) four is 'T', and put the appropriate rule
#in df
df$C <- rules[apply(temp,1,which)]
> df
x A B C
1 x1 -1 1 1
2 x2 1 3 1
3 x3 0 1 1
4 x4 2 1 2
5 x5 2 3 1
6 x6 -2 1 1
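When the conditions are known in advance, the same result can be reached without eval(parse()). The sketch below is an alternative technique, not the method used in the answers: it encodes each row as a "Y"/"N" key and looks the key up in a named vector. The data values are copied from the printed data frame in the question, the conditions A>0 and B>1 are hard-coded, and the names key and lookup are made up for illustration.

```r
df <- data.frame(x = paste0("x", 1:6),
                 A = c(-1, 1, 0, 2, 2, -2),
                 B = c(1, 3, 1, 1, 3, 1))

# Decision table from the question: first letter is A>0, second is B>1
lookup <- c(YY = 1, YN = 2, NY = 2, NN = 1)

# Build one key per row, e.g. "YN" when A>0 holds and B>1 does not
key <- paste0(ifelse(df$A > 0, "Y", "N"), ifelse(df$B > 1, "Y", "N"))

# Index the named vector by key; unname() drops the key labels
df$C <- unname(lookup[key])
print(df)
```

This trades generality for safety: it cannot read arbitrary conditions from a file, but it never evaluates text as code.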

Gnu R: Rename variable in loop

I would like to create a loop in order to create 15 crosstables with one data.frame (var1), which consist of 15 variables, and another variable (var2), see data which can be downloaded here.
The code is now able to give results, but I would like to know how I can rename the variable "mytable" so that I get mytable1, mytable2, etc.
Code:
library(vcd) # for Cramer's V
var1 <- read.csv("~/example.csv", dec=",")
var2 <- sample(1:43)
i <- 1
while(i <= ncol(var1)) {
mytable[[i]] <- table(var2,var1[,i])
assocstats(mytable[[i]])
print(mytable[[i]])
i <- i + 1
}
As suggested in the comments, using names like mytable1, mytable2, etc. for a list of objects is actively discouraged when using R. Collecting all in a list is more useful and cleaner.
One way to do what you want would be this:
library(vcd) # for Cramer's V
data(mtcars)
var1 <- mtcars[ , c(2, 8:11)] ##OP's CSV no longer available
var2 <- sample(1:5, 32, TRUE)
mytable <- myassoc <- list() ##store output in a list
##a `for` loop looks simpler than `while`
for(i in 1:ncol(var1)){
mytable[[i]] <- table(var2, var1[ , i])
myassoc[[i]] <- assocstats(mytable[[i]])
}
So now to access "mytable2" and "myassoc2" you would simply do:
> mytable[[2]]
var2 0 1
1 4 2
2 6 6
3 1 1
4 2 3
5 5 2
> myassoc[[2]]
X^2 df P(> X^2)
Likelihood Ratio 1.7079 4 0.78928
Pearson 1.6786 4 0.79460
Phi-Coefficient : NA
Contingency Coeff.: 0.223
Cramer's V : 0.229
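If names such as "mytable2" are still wanted, the list elements themselves can be named, which keeps everything in one object. This is a small sketch using only base R and the built-in mtcars data; the choice of columns and the list name tabs are assumptions made for illustration, and the vcd step is left out so the example has no package dependency.

```r
# One cross-table per selected column, each against mtcars$gear
tabs <- lapply(mtcars[, c("cyl", "vs", "am")],
               function(col) table(mtcars$gear, col))

# Give the list elements the names the question asked for
names(tabs) <- paste0("mytable", seq_along(tabs))

# Elements can now be reached by name or by position
print(tabs$mytable2)
print(names(tabs))
```

Named elements work with both tabs$mytable2 and tabs[["mytable2"]], so later code can loop over the list or pick out a single table.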

Resources