How to code this if else clause in R? - r

I have a function that outputs a list of strings. I want to check whether every string in this list consists entirely of 0's, or whether at least one string (possibly more) does not.
I have a large dataset, and I am going to run my function on each of its rows. Basically:
for each row of the dataset
    mylst <- func(row[i])
    if (mylst contains only strings that are all 0's)
        process the next row of the dataset
    else
        execute some other code
Now, I can code the if-else clause but I am not able to code the part where I have to check the list for all 0's. How can I do this in R?
Thanks!

You can use this for loop:
for (i in seq_len(nrow(dat))) {
  if (!any(grepl("^0+$", dat[i, ]))) {
    # execute some other code
  }
}
where dat is the name of your data frame.
Here, the regex "^0+$" matches a string that consists of 0s only.
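As a minimal sketch of the loop described in the question, assuming a hypothetical func() that simply returns the row's values as a list of strings (the real function is not shown): note that !any(...) is TRUE only when no string is all zeros, whereas !all(...) is TRUE as soon as at least one string is not all zeros, which appears to be the condition the question describes. The names dat, func and processed are illustrative.

```r
# Toy data; `dat` and `func` are illustrative stand-ins, not the asker's objects.
dat <- data.frame(V1 = c("00", "01", "00"),
                  V2 = c("000", "010", "020"),
                  stringsAsFactors = FALSE)

# Hypothetical stand-in for the asker's function: returns a list of strings.
func <- function(row) as.list(row)

processed <- integer(0)
for (i in seq_len(nrow(dat))) {
  mylst <- func(dat[i, ])
  all_zero <- grepl("^0+$", unlist(mylst))  # one TRUE/FALSE per string
  if (all(all_zero)) {
    next  # every string is all zeros: move on to the next row
  }
  # "some other code" would go here; this sketch just records the row index
  processed <- c(processed, i)
}
processed
```

Only the first row is skipped here, because it is the only one in which every string consists of zeros.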

I'd like to suggest a solution that avoids an explicit for loop.
For a given data set df, one can find a logical vector that indicates the rows with all zeroes:
all.zeros <- apply(df, 1, function(s) all(grepl('^0+$', s))) # the grepl() call was taken from Sven's solution
With this logical vector, it is easy to subset df to remove all-zero rows:
df[!all.zeros,]
and use it for any subsequent transformations.
'Toy' dataset
df <- data.frame(V1=c('00','01','00'),V2=c('000','010','020'))
UPDATE
If you'd like to apply the function to each row first and then analyze the resulting strings, you should slightly modify the all.zeros expression:
all.zeros <- apply(df,1,function(s) all(grepl('^0+$',func(s))))
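Putting the pieces above together on the toy dataset gives a short end-to-end sketch. The variable name kept is introduced here only for illustration, and stringsAsFactors = FALSE is set explicitly so the columns stay character in older versions of R.

```r
df <- data.frame(V1 = c("00", "01", "00"),
                 V2 = c("000", "010", "020"),
                 stringsAsFactors = FALSE)

# TRUE for rows in which every string consists of zeros only
all.zeros <- apply(df, 1, function(s) all(grepl("^0+$", s)))

# Keep only the rows that are not all zeros
kept <- df[!all.zeros, ]
kept
```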

Related

How to formulate for loop here

I have a CSV file with information about cars (price, model, color, and more).
I have loaded it into R with read.csv.
Some variables are text-based categorical variables, such as Model, color, and fuel type.
I came up with a for loop to find these text-based categorical variables:
for(i in 1:dim(car)[2]){
  if(is.character(car[,i])){
    print(names(car)[i])
  }
}
# car is the name of the data set
Now I want to extend the loop so that it also finds the index of each such column. For example, the Model column is column 2, but how should I integrate this into the loop? Below is what I have so far, but the result is integer(0).
for(i in 1:dim(car)[2]){
  if(is.character(car[,i])){
    print(which(i==colnames(car)))
  }
}
dim(car)[2] is the number of columns of car. (ncol() is a more common way to get this number for a data frame).
1:dim(car)[2] is therefore 1, 2, 3, ... up to the number of columns.
So for(i in ...) means i will be 1, then 2, and so on, up to the number of columns.
When your if statement is TRUE, the current value of i is the column number. So you want print(i) inside your if() statement.
Your attempt, print(which(i==colnames(car))), fails because colnames(car) holds the names of the columns, while i is the number of the column. Names and numbers are different, so nothing matches and which() returns integer(0).
A more R-like way to do this would be to use sapply instead of a loop. Something like this:
char_cols <- sapply(car, is.character)
char_cols # named vector saying if each column is character or not
char_cols[char_cols] # look only at the character columns
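Both approaches can be tried on a small made-up data frame. The data below is hypothetical and only stands in for the asker's CSV; idx_loop and idx_sapply are names introduced for this sketch.

```r
# Hypothetical toy data standing in for the asker's CSV
car <- data.frame(Price = c(20000, 15000),
                  Model = c("A", "B"),
                  Color = c("red", "blue"),
                  stringsAsFactors = FALSE)

# Loop version: i already is the column index
idx_loop <- integer(0)
for (i in seq_len(ncol(car))) {
  if (is.character(car[, i])) {
    idx_loop <- c(idx_loop, i)
  }
}

# Vectorised version: which() turns the logical vector into named indices
idx_sapply <- which(sapply(car, is.character))

idx_loop
idx_sapply
```

The which() result carries the column names along with the indices, which is often handy when the indices are used later.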
"which" function can still be used. From the response from Gregor Thomas there is a way to modify there is a way to modify for loop
for(i in 1:ncol(car)){
if(is.character(car[,i])){
print(names(car)[i])
print(which(names(car)[i]==colnames(car)))
}
}
We first print the column name with print(names(car)[i]).
Then we ask R for the position at which that name matches the column names of the car dataset.
Once again, thank you to Mr. Gregor Thomas.
A slight variation of Gregor Thomas' smart recommendation is to use sapply with the typeof function to get the type of every column, and then the which function to get the character column numbers:
x <- sapply(car, typeof)
y <- which(x == 'character')
Also note that you can see which columns are character from a visual inspection of the data frame's structure, str(car).

Data table subsetting in r by concatenating string variables

I have a data table that I am trying to subset by creating a list of variable names by pasting together some string vectors in the j argument of the data table, but I'm running into difficulty.
I have a character vector called foos (for this example, foos <- c('FOO0','FOO1','FOO2')) and a second vector that I created with c(). I wanted to subset my data table with dt[,paste0(foos, c('VAR0','VAR1','VAR2'))], but that didn't work as expected. Printing what paste0(foos, c('VAR0','VAR1','VAR2')) returns gives
[1] "FOO0VAR0" "FOO1VAR1" "FOO2VAR2"
so it seems this approach concatenates the vectors element by element instead of concatenating the vectors themselves (that's a bit surprising to me; I'd have expected to need lapply to paste element-wise). Changing the order of c() and paste0 didn't work either. I also tried
dt[,c(foos,c('VAR0','VAR1','VAR2'))] but that doesn't work either.
Is there a way to subset by a concatenation of two string vectors built in the j argument of a data table in R?
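The difference between the two calls can be seen in a small base R sketch. It uses a plain data frame as a stand-in for the data table, and the names vars, pasted, joined and the OTHER column are introduced here for illustration only. In data.table itself, a character vector of column names is typically supplied as dt[, joined, with = FALSE]; that call is not run here, since the sketch sticks to base R.

```r
foos <- c("FOO0", "FOO1", "FOO2")
vars <- c("VAR0", "VAR1", "VAR2")  # name `vars` is introduced here for illustration

# paste0() is vectorised: it pairs up elements position by position
pasted <- paste0(foos, vars)

# c() joins the two vectors end to end into one longer vector
joined <- c(foos, vars)

# Base R data frame standing in for the data table (hypothetical columns)
df <- as.data.frame(matrix(1:14, nrow = 2,
                           dimnames = list(NULL, c(joined, "OTHER"))))

# Select the six columns named in `joined`, leaving OTHER out
subset_cols <- df[, joined]
names(subset_cols)
```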

R: Check for finite values in DataFrame

I need to check whether a data frame is "empty" or not ("empty" in the sense that the data frame contains zero finite values; if there is a mix of finite and non-finite values, it should NOT be considered "empty").
Referring to How to check a data.frame for any non-finite, I came up with a one-line piece of code that almost achieves this objective:
nrow(tmp[rowSums(sapply(tmp, function(x) is.finite(x))) > 0,]) == 0
where tmp is some data frame.
This code works fine for most cases, but it fails if the data frame contains a single row.
For example, the above code would work fine for,
tmp <- data.frame(a=c(NA,NA), b=c(NA,NA)) OR tmp <- data.frame(a=c(3,NA), b=c(4,NA))
But not for,
tmp <- data.frame(a=NA, b=NA)
because I think rowSums expects at least two rows.
I looked at some other posts such as https://stats.stackexchange.com/questions/6142/how-to-calculate-the-rowmeans-with-some-single-rows-in-data, but I still couldn't come up with a solution for my problem.
My question is: is there a clean way (i.e. avoiding loops, and ideally a one-liner) to check whether any given data frame is "empty"?
Thanks
If you are checking all columns, then you can just do
all(sapply(tmp, is.finite))
Here we are using all rather than the rowSums trick so we don't have to worry about preserving matrices.
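A small sketch contrasting the two checks on the data frames from the question. The helper name is_empty is hypothetical; it implements the question's definition of "empty" (no finite value at all) via any(), whereas all(sapply(tmp, is.finite)) reports whether every value is finite. unlist(lapply(...)) is used here instead of sapply so that single-row and multi-row data frames are flattened the same way.

```r
# Hypothetical helper: TRUE when the data frame holds no finite value at all
is_empty <- function(tmp) {
  !any(unlist(lapply(tmp, is.finite)))
}

tmp1 <- data.frame(a = c(NA, NA), b = c(NA, NA))  # no finite values
tmp2 <- data.frame(a = c(3, NA),  b = c(4, NA))   # mix of finite and NA
tmp3 <- data.frame(a = NA, b = NA)                # single row, no finite values

is_empty(tmp1)
is_empty(tmp2)
is_empty(tmp3)

# For comparison: are all values finite?
all(sapply(tmp2, is.finite))
```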

in R: combine columns of different dataframes

I am trying to combine the columns of three different data frames, so that each resulting object has the same number of rows as the original data frames and three columns, one from each source. Each of the original data frames has 10 columns and 14 rows.
I tried it with a for loop, but the result is not usable for me.
t <- NULL
for(i in 1 : length(net)) {
  a <- cbind(imp.qua.00.09[i], exp.qua.00.09[i], net[i])
  t <- list(t, a)
}
t
But in the end I would like to get 10 separate data frames with three columns each.
So I want to loop through this:
a <- cbind(imp.qua.00.09[i], exp.qua.00.09[i], net[i])
for every column of each original dataframe. But if I use t <- list(t, a) it constructs a crazy list. Thanks.
The code you're using to append elements to t is wrong; you should do it this way:
t <- list()
for(i in seq_along(net)) {
  a <- cbind(imp.qua.00.09[i], exp.qua.00.09[i], net[i])
  t[[length(t)+1]] <- a
}
t
Your code is wrong because at each step you turn t into a list whose first element is the previous t (itself a list, except in the first iteration) and whose second element is the subset. So in the end you get a kind of recursive list of two elements, where the second is the data.frame subset and the first is again a two-element list with the same structure, nested ten levels deep.
Anyway, the corrected loop is equivalent to this one-liner (which is probably more efficient, since it does not grow the list step by step):
t <- lapply(seq_along(net),
            function(i) cbind(imp.qua.00.09[i], exp.qua.00.09[i], net[i]))
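To see the shape of the result, here is a minimal sketch on made-up data. The data frames imp_df, exp_df and net_df are hypothetical stand-ins for imp.qua.00.09, exp.qua.00.09 and net, shrunk to 3 columns and 4 rows, and the column names imp, exp and net are chosen here for readability.

```r
# Hypothetical stand-ins for the three original data frames
imp_df <- data.frame(matrix(1:12, nrow = 4))
exp_df <- data.frame(matrix(101:112, nrow = 4))
net_df <- exp_df - imp_df

# One three-column data frame per original column
combined <- lapply(seq_along(net_df), function(i) {
  out <- cbind(imp_df[i], exp_df[i], net_df[i])
  names(out) <- c("imp", "exp", "net")  # avoid three columns with the same name
  out
})

length(combined)    # one list element per original column
dim(combined[[1]])  # rows of the originals, three columns
```

Each list element can then be accessed with combined[[i]], which avoids the nested structure produced by list(t, a).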
This should work:
do.call(cbind,list(imp.qua.00.09, exp.qua.00.09, net))

Evaluating dataframe and storing the result

My data frame (m*n) has a few hundred columns. I need to compare each column with all other columns (contingency table), perform a chi-squared test, and save the results for each column in a different variable.
It is working for one column at a time, like this:
s <- function(x) {
  a <- table(x, data[,1])
  b <- chisq.test(a)
}
c1 <- apply(data, 2, s)
The results are stored in c1 for column 1, but how will I loop this over all columns and save result for each column for further analysis?
If you're sure you want to do this (I wouldn't, given the multiple-testing problem), work with lists:
Data <- data.frame(
  x=sample(letters[1:3],20,TRUE),
  y=sample(letters[1:3],20,TRUE),
  z=sample(letters[1:3],20,TRUE)
)
# Make a nice list of indices
ids <- combn(names(Data),2,simplify=FALSE)
# use the appropriate apply
my.results <- lapply(ids,
  function(z) chisq.test(table(Data[,z]))
)
# use some paste voodoo to give the results the names of the column indices
names(my.results) <- sapply(ids,paste,collapse="-")
# select all values for y :
my.results[grep("y",names(my.results))]
It's not harder than that. As the last line shows, you can easily get all tests for a specific column, so there is no need to make a list for each column. That just takes longer and takes more space, but gives the same information. You can write a small convenience function to extract the data you need:
extract <- function(col, l) {
  l[grep(col, names(l))]
}
extract("y", my.results)
This means you can even loop over the column names of your data frame and get a list of lists returned:
lapply(names(Data), extract, my.results)
I strongly suggest you get yourself acquainted with working with lists, they're one of the most powerful and clean ways of doing things in R.
PS : Be aware that you save the whole chisq.test object in your list. If you only need the value for Chi square or the p-value, select them first.
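As a sketch of that last point, the p-value can be pulled out of each test while the results are being built, so that only one number per pair of columns is stored. The seed and the sample size of 60 are arbitrary choices for this illustration, and suppressWarnings() is only there to hide the small-expected-count warning that toy data tends to trigger.

```r
set.seed(42)  # only to make the toy data reproducible
Data <- data.frame(
  x = sample(letters[1:3], 60, TRUE),
  y = sample(letters[1:3], 60, TRUE),
  z = sample(letters[1:3], 60, TRUE)
)

# All pairs of column names, labelled "x-y", "x-z", "y-z"
ids <- combn(names(Data), 2, simplify = FALSE)
names(ids) <- sapply(ids, paste, collapse = "-")

# Keep only the p-value of each test instead of the whole test object
p.values <- sapply(ids, function(z) {
  suppressWarnings(chisq.test(table(Data[, z])))$p.value
})
p.values
```

The result is a named numeric vector, so the same grep-on-names approach still works for picking out all tests that involve one column.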
Fundamentally, you have a few problems here:
1. You're relying heavily on global variables rather than local ones. This makes the double usage of "data" confusing.
2. Similarly, you rely on a hard-coded value (column 1) instead of passing it as an argument to the function.
3. You're not extracting the one value you need from the chisq.test(). This means your result gets returned as a list.
You didn't provide any example data, so here's some:
m <- 10
n <- 4
mytable <- matrix(runif(m*n),nrow=m,ncol=n)
Once you fix the above problems, simply run a loop over various columns (since you've now avoided hard-coding the column) and store the result.
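One way the fixed version might look is sketched below. This is an illustration under assumptions, not the answerer's code: the helper chisq_p and the result matrix pvals are hypothetical names, the p-value is assumed to be the value of interest, and the toy data is made categorical (the runif() matrix above is continuous, which makes for degenerate contingency tables).

```r
set.seed(1)
m <- 30
n <- 4
# Categorical toy data (an assumption; the runif() matrix above is continuous)
mytable <- matrix(sample(1:3, m * n, replace = TRUE), nrow = m, ncol = n)

# Hypothetical helper: both columns are arguments, and only the p-value is returned
chisq_p <- function(dat, i, j) {
  suppressWarnings(chisq.test(table(dat[, i], dat[, j])))$p.value
}

# Store the result for every pair of distinct columns in an n x n matrix
pvals <- matrix(NA_real_, nrow = n, ncol = n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) pvals[i, j] <- chisq_p(mytable, i, j)
  }
}
pvals
```

Row i of pvals then holds the results of column i against every other column, which gives the per-column view asked for in the question without a separate variable per column.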

Resources