Update binary column values in a dataframe based on checkboxGroupInput in Shiny - r

I have a long list of checkboxes in a checkboxGroupInput statement. The labels and values of the checkboxes correspond to a subset of the colnames in a dataframe.
For instance, the dataframe is called userdf and has columns like this:
A B C
1 1 0
If the name of the checkboxGroupInput is sotags then I want input$sotags to modify the dataframe such that if it contains A but not B or C:
A B C
1 0 0
My lame attempt at this was:
for(i in 1:colnames(userdf)){
if(colnames(userdf[i]) %in% paste(input$sotags)){userdf[,i] <- 1}
if(!colnames(userdf[i]) %in% paste(input$sotags)){userdf[,i] <- 0}
}
If you want to see my entire working code, it's here: https://github.com/hack-r/coursera_shiny

Let's say that you start with userdf in your code, which is not reactive, like so:
userdf<-data.frame(A=NA,B=NA,C=NA)
and input$sotags is your checkboxGroupInput which will be character and one of your column names.
Then you can make a new data.frame like so:
userdf2<-reactive({
as.data.frame(matrix(as.numeric(colnames(userdf)==input$sotags),nrow=1,
dimnames=list(NULL,colnames(userdf))))
})
Edited to Add:
If input$sotags is a character vector, you can replace the == with %in% in the line starting as.data.frame and that will put a 1 in all the selected columns.
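The %in% variant can be tried outside Shiny. This is only a minimal sketch: the plain vector selected is a stand-in I am assuming for input$sotags, and the reactive() wrapper is left out so the snippet runs in an ordinary R session.

```r
# Stand-ins for the Shiny pieces (assumptions for illustration only)
userdf <- data.frame(A = 1, B = 1, C = 0)
selected <- c("A")  # plays the role of input$sotags

# 1 where the column name is among the selected tags, 0 otherwise
flags <- as.numeric(colnames(userdf) %in% selected)

# Rebuild a one-row data frame with the original column names
userdf2 <- as.data.frame(matrix(flags, nrow = 1,
                                dimnames = list(NULL, colnames(userdf))))
print(userdf2)
```

Because x %in% NULL is FALSE for every element, the same expression should also yield all zeros when no box is checked.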

I think this should give the same result as your code:
userdf[,input$sotags] <- 1
userdf[,! colnames(userdf) %in% input$sotags] <- 0
But that will result in a data frame with all rows being equal...
Why would you need that?
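To illustrate the point about identical rows, here is a small sketch on a two-row data frame. The data and the vector selected (standing in for input$sotags) are made up for the example.

```r
# Made-up two-row example; selected stands in for input$sotags
userdf <- data.frame(A = c(1, 0), B = c(1, 1), C = c(0, 1))
selected <- c("A", "C")

userdf[, selected] <- 1                         # selected columns become 1
userdf[, !colnames(userdf) %in% selected] <- 0  # all other columns become 0

print(userdf)
```

Every row ends up with the same values, because whole columns are overwritten.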

Related

Building a Dataframe Column-by-Column in R

Is there a way for me to iteratively build a dataframe in R? I would be interested in knowing how I would do so either by adding column-by-column or row-by-row. I have been trying for some time now and find myself stuck.
Here is some code that I have tried:
line <- as.list(strsplit(line, ", "))[[1]] # make into list
col_names = names(idx_for_cell_counts_by_gene_id)
df <- data.frame() # here is where I get stuck - want an empty dataframe
for (x in 1:length(col_names)) {
column_name <- col_names[[x]]
information <- line[[x]]
df$column_name <- information
}
I have tried looking at some SO examples (#1, #2) but to no avail. Is there something I should do to instantiate an empty dataframe (or, better yet, a dataframe with only 'column headers' and no rows) in R?
One issue is that df$column_name creates a column named column_name. It doesn't use the value in the object named column_name. Making a representative example and walking through it will show you:
df <- data.frame(placeholder = 0)
column_name <- "my_col"
# The following will create a column named "column_name"
df$column_name <- 0
# df
# placeholder column_name
# 1 0 0
# The following will create a column with the value inside of the object `column_name`
df[,column_name] <- 0
# df
# placeholder column_name my_col
# 1 0 0 0
Another issue is that data.frame() creates a data frame with zero rows and zero columns. All columns in a data frame must be the same length, so any column you add has to match the number of rows already there.
One way to deal with this is to create a placeholder column when you create the dataframe and then remove it later: df <- data.frame(placeholder = logical(length(line[[1]]))). There may be other more elegant ways to handle this.
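One possible alternative to the placeholder column is to collect the columns in a named list and convert it to a data frame once at the end. This is a sketch only: col_names and line below are hypothetical stand-ins for the question's objects.

```r
# Hypothetical stand-ins for the question's objects
col_names <- c("gene_id", "cell_a", "cell_b")
line <- c("g1", "10", "12")

# Collect the columns in a named list; [[ ]] uses the value of the name
cols <- list()
for (x in seq_along(col_names)) {
  cols[[col_names[x]]] <- line[x]
}

# Convert once, after all columns exist
df <- as.data.frame(cols, stringsAsFactors = FALSE)
print(df)

# A data frame with column headers but no rows
empty <- data.frame(gene_id = character(0), cell_a = character(0),
                    stringsAsFactors = FALSE)
```

Using cols[[name]] rather than cols$name is what makes the loop pick up the string stored in the variable.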

R: retrieve dataframe name from another dataframe

I have a dataframe datasetselect that tells me what dataframe to use for each case of an analysis (let's call this the relevant dataframe).
The case is assigned dynamically, and therefore which dataframe is relevant depends on that case.
Based on the case, I would like to assign the relevant dataframe to a pointer "relevantdf". I tried:
datasetselect <- data.frame(case=c("case1","case2"),dataset=c("df1","df2"))
df1 <- data.frame(var1=letters[1:3],var2=1:3)
df2 <- data.frame(var1=letters[4:10],var2=4:10)
currentcase <- "case1"
relevantdf <- get(datasetselect[datasetselect$case == currentcase,"dataset"]) # relevantdf should point to df1
I don't understand if I have a problem with the get() function or the subsetting process.
You are almost there. The problem is that the dataset column of datasetselect is a factor (the default for strings in data.frame() before R 4.0.0), and get() needs a character string, so you just need to convert it to character.
You can add this line after the definition of datasetselect:
datasetselect$dataset <- as.character(datasetselect$dataset)
And you get your expected output
> relevantdf
var1 var2
1 a 1
2 b 2
3 c 3
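For reference, here is the whole lookup as one runnable sketch. It sets stringsAsFactors = FALSE when building datasetselect, which has the same effect as the as.character() conversion. The named list at the end (datasets) is my own suggestion, not part of the question, for cases where you would rather avoid get().

```r
df1 <- data.frame(var1 = letters[1:3], var2 = 1:3)
df2 <- data.frame(var1 = letters[4:10], var2 = 4:10)

# Keep the dataset names as character strings, not factors
datasetselect <- data.frame(case = c("case1", "case2"),
                            dataset = c("df1", "df2"),
                            stringsAsFactors = FALSE)

currentcase <- "case1"
name <- datasetselect$dataset[datasetselect$case == currentcase]

# Look the object up by its name
relevantdf <- get(name)

# Alternative (an assumption, not from the question): a named list
datasets <- list(df1 = df1, df2 = df2)
relevantdf2 <- datasets[[name]]
```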

Names of columns in Data Frame

Suppose we have a dataframe data with the following column names: A1, A2, A3 . I want to check if any of the columns are 0 . Instead of doing the following:
if (data$A1 == 0 || data$A2 == 0 || data$A3 == 0)
{
print("TRUE")
}
is there a way of doing something like this:
for (i in 1:3)
{
if (data$Ai == 0)
{
print("TRUE")
}
}
How do you use indices in variable names?
You can't use variables with the $ syntax. It is better to subset the data.frame using the column names as strings (which you can do) and then check whether any values are 0:
any(data[, paste0("A", 1:3)]==0)
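A small worked sketch of this idea, with made-up values for data. It also shows a per-column check using [[ ]], which accepts a name held in a variable.

```r
# Made-up data with the column names from the question
data <- data.frame(A1 = c(1, 2), A2 = c(0, 5), A3 = c(3, 4))
cols <- paste0("A", 1:3)  # "A1" "A2" "A3"

# Is there a 0 anywhere in these columns?
any_zero <- any(data[, cols] == 0)

# Which columns contain a 0? [[ ]] looks the column up by the string in nm
has_zero <- sapply(cols, function(nm) any(data[[nm]] == 0))
print(any_zero)
print(has_zero)
```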
It isn't clear to me whether you just want to know if any element of a column is equal to 0, if every element of a column is equal to 0, or what exactly you want. That said, in my example/answer, you can change the function argument to look for what it is you want.
I would use the apply function, as it can be simpler than writing a loop. The first argument is your dataframe, the second is 1 for rows or 2 for columns (2 is what you want here):
df <- data.frame(A=c(1,0,3),B=c(3,3,3),C=c(4,3,0)) # Replace with your data
# For if you just want to know if each column contains a 0
apply(df,2,FUN=function(x) max(x==0,na.rm=T))
# For if you just want to know how many 0s are in each column
apply(df,2,FUN=function(x) sum(x==0))
# For if you want to know if all entries in each column are 0
apply(df,2,FUN=function(x) all(x==0))
# etc.
These will return named vectors where you can see what the answer is for each column by name.
I hope this helps!

subsetting using column names as objects

I am trying to subset a data frame using a column name stored in an object. Is this possible? Here is an example:
ReallyLongColNameA <- c(1,2,3,4,5,6)
ReallyLongColNameB <- c(6,5,4,3,2,1)
ReallyLongColNameC <- c(7,8,9,10,11,12)
X <- data.frame(ReallyLongColNameA, ReallyLongColNameB, ReallyLongColNameC)
Can I store a column name like this:
ShortColNameB <- names(X[2])
and then subset using the column name stored in the object ShortColNameB?
I can subset the following:
subX <- X[X$ReallyLongColNameB == 6,]
To get:
ReallyLongColNameA ReallyLongColNameB ReallyLongColNameC
1 6 7
But what if I wanted the following output, using the column name stored in an object (ShortColNameB)?
ReallyLongColNameA ReallyLongColNameB
1 6
You can easily remove the last column by subsetting on column numbers.
X[X[[ShortColNameB]]==6,c(1,2)]
You define what rows you want by filtering on the ==6 for ShortColNameB, and you define the columns you want by selecting the numbers (e.g. 1st and 2nd column, A & B).
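If you would rather not depend on column positions, the columns can probably be selected by stored names as well. In this sketch ShortColNameA is an extra variable I am introducing for illustration; it is not in the question.

```r
X <- data.frame(ReallyLongColNameA = c(1, 2, 3, 4, 5, 6),
                ReallyLongColNameB = c(6, 5, 4, 3, 2, 1),
                ReallyLongColNameC = c(7, 8, 9, 10, 11, 12))

ShortColNameA <- names(X)[1]  # hypothetical extra variable
ShortColNameB <- names(X)[2]

# Rows where column B equals 6, keeping only columns A and B by name
subX <- X[X[[ShortColNameB]] == 6, c(ShortColNameA, ShortColNameB)]
print(subX)
```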

in R: remove rows containing no integer (such as characters i.e.) from a data frame

My data frame df looks as follows:
Variable A Variable B Variable C
9 2 1
2 0 don't know
maybe 1 1
? 0 3
I need to remove all rows, where non-numerical values are used. It should look like this afterwards:
Variable A Variable B Variable C
9 2 1
I thought about something like
df[! grepl(*!= numerical*, df),]
or
df[! df %in% *!= numerical*, ]
but I don't find anything I could use as input for "take all that doesn't match numerical values". Could you please help me?
Thanks a lot!
One option is to loop through the columns, convert each to numeric so that non-numeric elements become NA, test for NA with is.na, negate (!) the result, combine the per-column logical vectors element-wise with Reduce and &, and use that to subset the rows.
df[Reduce(`&`, lapply(df, function(x) !is.na(as.numeric(x)))),]
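Here is that one-liner unpacked on the question's data, as a sketch. I am assuming the columns are stored as character (hence stringsAsFactors = FALSE) and I have shortened the column names. suppressWarnings() only hides the "NAs introduced by coercion" warnings.

```r
# The question's data, assumed to be stored as character columns
df <- data.frame(VariableA = c("9", "2", "maybe", "?"),
                 VariableB = c("2", "0", "1", "0"),
                 VariableC = c("1", "don't know", "1", "3"),
                 stringsAsFactors = FALSE)

# One logical vector per column: TRUE where the value converts to a number
is_num <- lapply(df, function(x) !is.na(suppressWarnings(as.numeric(x))))

# A row is kept only if every column is numeric in that row
keep <- Reduce(`&`, is_num)
clean <- df[keep, ]

# Optionally convert the remaining columns to numeric
clean[] <- lapply(clean, as.numeric)
print(clean)
```

Note that rows with genuine NA values are dropped too, and factor columns would need as.character() before as.numeric().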
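This might not be the best way to do it, but it works.
Here s is the data frame that contains your data:
library(magrittr) # provides the %>% pipe
contains <- lapply(seq_len(nrow(s)), function(i){
yes <- grep("[^0-9.]" , s[i,]) # cells in row i with a character other than a digit or "."
!identical(yes, integer(0)) # TRUE if the row has at least one such cell
}) %>% unlist
s <- s[which(!contains),] # keep only the rows without such cells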
Thanks!

Resources