Names of columns in Data Frame - r

Suppose we have a dataframe data with the following column names: A1, A2, A3. I want to check if any of the columns are 0. Instead of doing the following:
if (data$A1 == 0 || data$A2 == 0 || data$A3 == 0)
{
print("TRUE")
}
is there a way of doing something like this:
for (i in 1:3)
{
if (data$Ai == 0)
{
print("TRUE")
}
}
How do you use indices in variable names?

You can't use variables with the $ syntax. It is better to subset the data.frame using column names as strings (which you can do) and then check whether any values are 0:
any(data[, paste0("A", 1:3)]==0)
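If you do want one result per column, a minimal sketch is to build each column name as a string and use [[ ]] in place of $. The values in data below are made up for illustration, and has_zero is a hypothetical name.

```r
# Made-up sample data with the column names from the question
data <- data.frame(A1 = c(1, 2), A2 = c(0, 5), A3 = c(3, 4))

# Build each column name as a string and look it up with [[ ]],
# which accepts a variable where $ does not
has_zero <- sapply(paste0("A", 1:3), function(col) any(data[[col]] == 0))

print(has_zero)
```

The result is a named logical vector, so has_zero[["A2"]] can be checked directly or any(has_zero) used for a single answer.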

It isn't clear to me whether you just want to know if any element of a column is equal to 0, if every element of a column is equal to 0, or what exactly you want. That said, in my example/answer, you can change the function argument to look for what it is you want.
I would use the apply function, as it can be simpler and faster than writing a loop. The first argument is your dataframe; the second is 1 for rows or 2 for columns (what you want):
df <- data.frame(A=c(1,0,3),B=c(3,3,3),C=c(4,3,0)) # Replace with your data
# To check whether each column contains a 0 (returns 1 if yes, 0 if no)
apply(df,2,FUN=function(x) max(x==0,na.rm=TRUE))
# To count how many 0s are in each column
apply(df,2,FUN=function(x) sum(x==0))
# To check whether all entries in each column are 0
apply(df,2,FUN=function(x) all(x==0))
# etc.
These will return named vectors where you can see the answer for each column by name.
I hope this helps!
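As a possible alternative sketch (not part of the answer above), the same per-column checks can be written without apply, which converts the data frame to a matrix first. This uses vapply and colSums from base R on the same sample data; any_zero and n_zero are hypothetical names.

```r
df <- data.frame(A = c(1, 0, 3), B = c(3, 3, 3), C = c(4, 3, 0))

# TRUE/FALSE per column: does the column contain at least one 0?
any_zero <- vapply(df, function(x) any(x == 0, na.rm = TRUE), logical(1))

# Number of 0s per column
n_zero <- colSums(df == 0, na.rm = TRUE)

print(any_zero)
print(n_zero)
```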

Related

R- Remove rows based on condition across some columns

I have a data frame like this:
I want to remove rows which have values = 0 in the columns which are "numeric". I tried some functions, but they either returned an error or didn't remove anything, because not the entire row is = 0. In summary, I need to remove the rows which are equal to 0 in the columns that have a numeric class (i.e. from sales month to expected sales). How could I do this? (The result I expect is attached below.)
P.S.: If I could do it with some function that allows me to give the number of the column instead of the name, it would be great!
Here is a simple solution with lapply.
set.seed(5)
df <- data.frame(a=1:10,b=letters[1:10],x=sample(0:5,5,replace = T),y=sample(c(0,10,20,30,40,50),5,replace = T))
df <-df[!unlist(lapply(1:nrow(df), function(i) {
any(df[i, ] == 0)
})), ]
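The question leaves open whether a row should go when any of the chosen columns is 0 or only when all of them are. A minimal vectorized sketch of both readings follows, selecting columns by position as requested. The sample data and the names cols, any_zero and all_zero are made up for illustration, and NA values are not handled.

```r
# Made-up sample data: columns 3 and 4 are the numeric columns of interest
df <- data.frame(a = 1:5,
                 b = letters[1:5],
                 x = c(0, 2, 0, 4, 0),
                 y = c(0, 10, 20, 0, 0))

cols <- 3:4  # column positions to check

# Rows where at least one of the chosen columns is 0
any_zero <- rowSums(df[, cols, drop = FALSE] == 0) > 0
# Rows where every chosen column is 0
all_zero <- rowSums(df[, cols, drop = FALSE] != 0) == 0

df_no_any <- df[!any_zero, ]  # drops rows with any 0
df_no_all <- df[!all_zero, ]  # drops only rows that are entirely 0

print(df_no_any)
print(df_no_all)
```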

How do I make a for loop change each row instead of the whole column

I have a data frame that has 1,500 observations. I created a column for low income, and I need to assign values based on the value of another variable.
What I want is that if the value of df$income is in 4:13, the loop assigns the value 1. Otherwise, if it is in 14:17, it assigns the value 0.
I have the loop like this:
low_inc <- NA
vote$low_inc <- low_inc
for(i in vote$income){
if (vote$income == 4||5||6||7||8||9||10||11||12||13) {
vote$low_inc <- 1
} else if (vote$income == 14||15||16||17){
vote$low_inc <- 0
}
}
I think the issue is that I am changing the whole column instead of each individually.
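Two things go wrong in the loop: a bare non-zero number such as 5 is treated as TRUE by ||, so the condition never compares income against each value, and vote$low_inc <- 1 overwrites the entire column. One way to sketch a fix, assuming income is coded as whole numbers, is to drop the loop and index with %in%. The sample vote data below is invented for illustration.

```r
# Invented sample data; NA stands for a missing income code
vote <- data.frame(income = c(4, 13, 14, 17, 9, NA))

# Start with NA everywhere, then fill in only the matching rows
vote$low_inc <- NA
vote$low_inc[vote$income %in% 4:13] <- 1
vote$low_inc[vote$income %in% 14:17] <- 0

print(vote)
```

Rows whose income falls outside both ranges, or is missing, keep NA in low_inc.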

rbind () on a conditional basis

Here's the question. I have a bunch of if statements that create different data frames (A-F) based on the user inputs. In some instances, some of the data frames will be empty, so maybe (A-C) are empty, but (D-F) have information.
I'm trying to create a conditional rbind(), where it combines the rows only if the data frame is not empty.
I'm not quite sure how to go about this? I don't know if I should create a bunch of conditions and use another if statement:
cond_a <- nrow(a) != 0
cond_b <- nrow(b) != 0
cond_c <- nrow(c) != 0
cond_d <- nrow(d) != 0
cond_e <- nrow(e) != 0
cond_f <- nrow(f) != 0
but then I don't know how to utilize these conditions...
EDIT: To take a step back and better explain: I have one data frame that I split into 6 different data frames by subsetting by a column, so it splits it into 6 data frames (A-G). The column has the letters A-G. These letters A-G in that column change depending on user inputs.
I then have a series of if statements that ask "if A is not empty then perform this aggregation", thereby skipping the df if it has no data in it. The aggregation takes it from 19 to 16 columns. Because the empty df has not been aggregated it still has 19 columns. After I perform these if statements and aggregations, I am left with dfA, dfB, dfC, etc. that have either been aggregated (16 columns) or are still empty (19 columns). I then want a piece of code that says "for the df that have been aggregated (16 columns), rbind; if the df has not been aggregated (is empty), then don't perform the rbind."
thanks!
You can do something like this:
1. Create an empty list for the names of the data frames you wish to select.
2. Write a series of if statements to see which data frames contain a non-zero number of rows.
3. Add the names of those non-empty data frames to the list created in step 1.
4. Write an expression which collapses those names into an rbind call.
5. Evaluate the expression.
Here is the script to do it:
1. Create an empty list to store the names of the data frames
list.df = " "
2 & 3. If statements check whether each data frame is non-empty and add its name to the list if so
if(nrow(df.a) > 0) {
list.df=c(list.df,deparse(substitute(df.a)))
}
if(nrow(df.b) > 0) {
list.df=c(list.df,deparse(substitute(df.b)))
}
if(nrow(df.c) > 0) {
list.df=c(list.df,deparse(substitute(df.c)))
}
.... So on from A through G
if(nrow(df.g) > 0) {
list.df=c(list.df,deparse(substitute(df.g)))
}
4. Formulate the expression
list.df = list.df[2:length(list.df)] # remove the first (placeholder) element
expression = paste0("df.combined = rbind(", paste0(list.df, collapse = ","), ")")
5. Evaluate the expression
eval(parse(text = expression))
df.combined will be the final dataset you are looking for.
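A shorter sketch of the same idea, offered as an alternative rather than as the answer's method, keeps the data frames themselves in a list, filters out the empty ones with Filter and binds the rest with do.call. The three small data frames are made up, and the empty one deliberately has an extra column to mimic the un-aggregated case.

```r
# Made-up data frames: df.a is empty and has a different set of columns
df.a <- data.frame(id = integer(0), value = numeric(0), extra = character(0))
df.b <- data.frame(id = 1:2, value = c(10, 20))
df.c <- data.frame(id = 3L, value = 30)

dfs <- list(df.a, df.b, df.c)

# Keep only the data frames that have at least one row
non_empty <- Filter(function(d) nrow(d) > 0, dfs)

# rbind everything that is left in one call
df.combined <- do.call(rbind, non_empty)

print(df.combined)
```

This avoids building and evaluating code as text, and it extends to any number of data frames without more if statements.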

extract data from only columns matching character strings

I have a dataset that looks something like this (but much larger)
Jul_08 <- c(1,0,2,0,3)
Aug_08 <- c(0,0,1,0,1)
Sep_08 <- c(0,1,0,0,1)
month<-c("Jul_08","Aug_08","Jul_08","Sep_08","Jul_08")
dataset <- data.frame(Jul_08 = Jul_08, Aug_08 = Aug_08, Sep_08=Sep_08,month=month)
For each row, I would like to isolate the value for a select month only, as indicated by the "month" field. In other words, for a given row, if the column "month" = Jul_08, then for a new "value" column, I would like to include the datum from the column "Jul_08" in that row.
In essence, the output would add this value column to the dataset
value<-c(1,0,2,0,3)
Creating this final dataset
dataset.value<-cbind(dataset,value)
You can use matrix indexing:
w <- match(month, names(dataset))
dataset$value <- dataset[ cbind(seq_len(nrow(dataset)), w) ]
Here the w vector tells R which column to take the value from and seq_len is used to say use the same row, so the value column is constructed by taking the 1st column in the 1st row, then the 2nd column and 2nd row, 1st column for the 3rd row, etc.
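One caveat worth checking: indexing a data frame with a matrix converts it to a matrix first, and because dataset also holds the non-numeric month column, the extracted values may come back as character strings. A minimal sketch that keeps them numeric is to index a matrix built from the month columns only. The names month_cols and m are hypothetical.

```r
dataset <- data.frame(Jul_08 = c(1, 0, 2, 0, 3),
                      Aug_08 = c(0, 0, 1, 0, 1),
                      Sep_08 = c(0, 1, 0, 0, 1),
                      month  = c("Jul_08", "Aug_08", "Jul_08", "Sep_08", "Jul_08"))

# Numeric matrix of the month columns only
month_cols <- setdiff(names(dataset), "month")
m <- as.matrix(dataset[month_cols])

# For each row, the position of the column named in "month"
w <- match(as.character(dataset$month), month_cols)

# Two-column index matrix: (row, column) pairs
dataset$value <- m[cbind(seq_len(nrow(m)), w)]

print(dataset)
```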
You can use lapply :
value <- unlist(lapply(1:nrow(dataset),
function(r){
dataset[r,as.character(dataset[r,'month'])]
}))
> value
[1] 1 0 2 0 3
Or, alternatively :
value <- diag(as.matrix(dataset[,as.character(dataset$month)]))
> value
[1] 1 0 2 0 3
Then you can cbind the new column as you did in your example.
Some notes:
I prefer unlist(lapply(...)) over sapply, since the automatic simplification implemented in sapply tends to surprise me sometimes. But I'm pretty sure you can use it here without any problem.
as.character is necessary only if the month column is a factor (as in the example); otherwise it is redundant (but I would leave it in, just to be safe).

Update binary column values in a dataframe based on checkboxGroupInput in Shiny

I have a long list of checkboxes in a checkboxGroupInput statement. The labels and values of the checkboxes correspond to a subset of the colnames in a dataframe.
For instance, the dataframe is called userdf and has columns like this:
A B C
1 1 0
If the name of the checkboxGroupInput is sotags then I want input$sotags to modify the dataframe such that if it contains A but not B or C:
A B C
1 0 0
My lame attempt at this was:
for(i in 1:colnames(userdf)){
if(colnames(userdf[i]) %in% paste(input$sotags)){userdf[,i] <- 1}
if(!colnames(userdf[i]) %in% paste(input$sotags)){userdf[,i] <- 0}
}
If you want to see my entire working code, it's here: https://github.com/hack-r/coursera_shiny
Let's say that you start with userdf in your code, which is not reactive, like so:
userdf<-data.frame(A=NA,B=NA,C=NA)
and input$sotags is your checkboxGroupInput, which will be a character value matching one of your column names.
Then you can make a new data.frame like so:
userdf2<-reactive({
as.data.frame(matrix(as.numeric(colnames(userdf)==input$sotags),nrow=1,
dimnames=list(NULL,colnames(userdf))))
})
Edited to Add:
If input$sotags is a character vector, you can replace the == with %in% in the line starting as.data.frame and that will put a 1 in all the selected columns.
I think this should give the same result as your code:
userdf[,input$sotags] <- 1
userdf[,! colnames(userdf) %in% input$sotags] <- 0
But that will result in a data frame with all rows being equal...
Why would you need that?
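Outside of Shiny, the %in% variant can be sketched as a plain function, which may make it easier to test. Here selected is a stand-in for input$sotags and flags_from_selection is a hypothetical helper name, not part of Shiny.

```r
# Hypothetical helper: one-row data frame with 1 for selected columns, 0 otherwise
flags_from_selection <- function(cols, selected) {
  flags <- as.numeric(cols %in% selected)
  as.data.frame(matrix(flags, nrow = 1, dimnames = list(NULL, cols)))
}

userdf <- data.frame(A = 1, B = 1, C = 0)

# "selected" stands in for input$sotags
userdf2 <- flags_from_selection(colnames(userdf), selected = c("A"))
print(userdf2)

# With nothing selected (NULL), every column becomes 0
userdf_none <- flags_from_selection(colnames(userdf), selected = NULL)
print(userdf_none)
```

Inside the app, the same call could be wrapped in reactive() with input$sotags passed as selected.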
