Building a Dataframe Column-by-Column in R - r

Is there a way for me to iteratively build a dataframe in R? I would be interested in knowing how I would do so either by adding column-by-column or row-by-row. I have been trying for some time now and find myself stuck.
Here is some code that I have tried:
line <- as.list(strsplit(line, ", "))[[1]] # make into list
col_names = names(idx_for_cell_counts_by_gene_id)
df <- data.frame() # here is where I get stuck - want an empty dataframe
for (x in 1:length(col_names)) {
column_name <- col_names[[x]]
information <- line[[x]]
df$column_name <- information
}
I have tried looking at some SO examples (#1, #2) but to no avail. Is there something I should do to instantiate an empty dataframe (or, better yet, a dataframe with only 'column headers' and no rows) in R?

One issue is that df$column_name creates a column named column_name. It doesn't use the value in the object named column_name. Making a representative example and walking through it will show you:
df <- data.frame(placeholder = 0)
column_name <- "my_col"
# The following will create a column named "column_name"
df$column_name <- 0
# df
#   placeholder column_name
# 1           0           0
# The following will create a column with the value inside of the object `column_name`
df[,column_name] <- 0
# df
#   placeholder column_name my_col
# 1           0           0      0
Another issue is that data.frame() with no arguments creates a data frame with zero rows and zero columns. All columns in a data frame must be the same length, so any column you add has to match the existing number of rows; assigning a length-one value to a zero-row data frame fails.
One way to deal with this is to create a placeholder column when you create the data frame and then remove it later: df <- data.frame(placeholder = logical(length(line[[1]]))). There may be other more elegant ways to handle this.
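As a possible alternative to the placeholder column, the values can be collected in a named list first and converted to a data frame in one step. This is only a sketch: col_names and line below are hypothetical stand-ins for the objects in the question, and the second part shows one way to get a data frame that has column headers and no rows.

```r
# Hypothetical stand-ins for the question's objects
col_names <- c("gene_id", "count_a", "count_b")
line <- c("g1", "10", "20")

# Collect each value under the name held in column_name.
# [[ ]] uses the value of the variable, unlike $.
cols <- list()
for (x in seq_along(col_names)) {
  column_name <- col_names[x]
  cols[[column_name]] <- line[x]
}

# Convert the named list to a one-row data frame
df <- as.data.frame(cols, stringsAsFactors = FALSE)

# A data frame with only column headers and no rows
empty <- data.frame(matrix(nrow = 0, ncol = length(col_names)))
names(empty) <- col_names
```

Building the list first sidesteps the row-count mismatch, because the data frame is only created once every column is known.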

Related

R: adding rows to a table from a for loop

I have a 1 column table with postcodes in it: I would like to loop through each postcode using the postcode_lookup() function in the postcodeioR library.
My current attempts are the following:
x <- data.frame()
for(i in 1:3){
x[i, ] <- postcode_lookup(table$Var1[i])
}
So I instantiated a new table and tried to add the result of postcode_lookup to a new row every time, but nothing gets filled in: what I get is a data frame with 3 obs. and 0 variables. The data should look like this (imagine 31 columns and multiple rows):
[image: table]
You need to explicitly specify the number of columns when creating a data frame:
df <- as.data.frame(matrix(NA, 0, 1))
set.seed(123)
val <- runif(20)
for (i in 1:3){
df[i, ] <- val[[i]]
}
In this case, a matrix with 0 rows and 1 column is converted to a data frame. This is a convenient way to create an empty data frame with the required number of columns.
In your case, you have a data frame with 0 columns. Hence, nothing gets populated.
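When the number of columns is not known up front, another option is to collect the per-iteration results in a list and bind them once at the end. The sketch below assumes each call returns a one-row data frame with the same columns; lookup_one is a hypothetical stand-in, not the real postcode_lookup().

```r
set.seed(123)
val <- runif(20)

# Hypothetical lookup: stands in for a function that returns a one-row data frame
lookup_one <- function(i) {
  data.frame(id = i, value = val[i])
}

# Collect one data frame per iteration, then bind all rows in a single step
rows <- lapply(1:3, lookup_one)
result <- do.call(rbind, rows)
```

Here the columns of result are taken from the first returned data frame, so no empty data frame has to be declared in advance.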

R: retrieve dataframe name from another dataframe

I have a dataframe datasetselect that tells me what dataframe to use for each case of an analysis (let's call this the relevant dataframe).
The case is assigned dynamically, and therefore which dataframe is relevant depends on that case.
Based on the case, I would like to assign the relevant dataframe to a pointer "relevantdf". I tried:
datasetselect <- data.frame(case=c("case1","case2"),dataset=c("df1","df2"))
df1 <- data.frame(var1=letters[1:3],var2=1:3)
df2 <- data.frame(var1=letters[4:10],var2=4:10)
currentcase <- "case1"
relevantdf <- get(datasetselect[datasetselect$case == currentcase,"dataset"]) # relevantdf should point to df1
I don't understand if I have a problem with the get() function or the subsetting process.
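You are almost there. The problem is that the dataset column from datasetselect is a factor (data.frame() converted strings to factors by default before R 4.0.0), and get() needs a character string, so you just need to convert it to character.
You can add this line after the definition of datasetselect: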
datasetselect$dataset <- as.character(datasetselect$dataset)
And you get your expected output
> relevantdf
  var1 var2
1    a    1
2    b    2
3    c    3
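A minimal sketch of the same fix applied at the point of use, so that it works whether the column is a factor or a character vector. The stringsAsFactors = TRUE argument is added here only to reproduce the older default, and the named-list lookup at the end is an alternative to get() that is not part of the answer above.

```r
datasetselect <- data.frame(case = c("case1", "case2"),
                            dataset = c("df1", "df2"),
                            stringsAsFactors = TRUE)  # mimic the pre-4.0.0 default
df1 <- data.frame(var1 = letters[1:3], var2 = 1:3)
df2 <- data.frame(var1 = letters[4:10], var2 = 4:10)
currentcase <- "case1"

# Convert to character right before the lookup
dataset_name <- as.character(datasetselect$dataset[datasetselect$case == currentcase])
relevantdf <- get(dataset_name)

# Alternative (an assumption, not from the answer): keep the candidates in a named list
datasets <- list(df1 = df1, df2 = df2)
relevantdf2 <- datasets[[dataset_name]]
```

The named list keeps the lookup explicit and avoids searching the environment by name.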

How to sum the values of different columns in a dataframe looping on the variables names

I'm relatively new to R (used to work in Stata before) so sorry if the question is too trivial.
I've a dataframe with variables named in a sequential way that follows the following logic:
q12.X.Y
where X assumes the values from 1 to 9, and Y from 1 to 5
I need to add together the values of the variables of all the q12.X.Y variables with the Y numbers from 1 to 3 (but NOT those ending with the number 4 or 5)
Ideally I would have written a loop based on the sequential numbers of the variables, namely something like:
df$test <- 0
for(i in 1:9){
for(j in 1:3){
df$test <- df$test+ df$q12.i.j
}
}
That obviously does not work.
I also tried with the command "rowSums" and "subset"
df$test <- rowSums(subset(df,select= ...)
However I find it a bit cumbersome, as the column numbers are not sequential and i do not want to type the name of all the variables.
Any suggestion how to do that?
We can use grep to get the match
rowSums(df[grep("q12\\.[1-9]\\.[1-3]", names(df))])
or if all the column names are present, then use an exact match by creating the column names with paste
rowSums(df[paste0(rep(paste0("q12.", 1:9, "."), each = 3), 1:3)])
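To check the column selection on a small case, here is a sketch with made-up data: a hypothetical two-row data frame in which every q12.X.Y column holds 1, so the expected row sum equals the number of selected columns. The pattern is anchored with ^ and $ here as an extra precaution against partial matches.

```r
# Hypothetical data: all 45 columns q12.1.1 ... q12.9.5, each filled with 1
all_names <- paste0("q12.", rep(1:9, each = 5), ".", 1:5)
df <- as.data.frame(matrix(1, nrow = 2, ncol = length(all_names),
                           dimnames = list(NULL, all_names)))

# Select by pattern: X in 1..9, Y in 1..3
keep_grep <- grep("^q12\\.[1-9]\\.[1-3]$", names(df), value = TRUE)

# Select by constructing the names explicitly
keep_paste <- paste0(rep(paste0("q12.", 1:9, "."), each = 3), 1:3)

df$test <- rowSums(df[keep_grep])
```

Both selections contain the same 27 column names, so either can be passed to rowSums().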

Vector gets stored as a dataframe instead of being a vector

I am new to R and RStudio and I need to create a vector that stores the first 100 rows of the CSV file the program reads. However, despite all my attempts, my variable v1 ends up as a dataframe instead of an int vector. May I know what I can do to solve this? Here's my code:
library(readr)
cup_data <- read_csv("C:/Users/Asus.DESKTOP-BTB81TA/Desktop/STUDY/YEAR 2/
YEAR 2 SEM 2/PREDICTIVE ANALYTICS(1_PA_011763)/Week 1 (Intro to PA)/
Practical/cup98lrn variable subset small.csv")
# Retrieve only the selected columns
cup_data_small <- cup_data[c("AGE", "RAMNTALL", "NGIFTALL", "LASTGIFT",
"GENDER", "TIMELAG", "AVGGIFT", "TARGET_B", "TARGET_D")]
str(cup_data_small)
cup_data_small
#get the number of columns and rows
ncol(cup_data_small)
nrow(cup_data_small)
cat("No of column",ncol(cup_data_small),"\nNo of Row :",nrow(cup_data_small))
#cat
#Concatenate and print
#Outputs the objects, concatenating the representations.
#cat performs much less conversion than print.
#Print the first 10 rows of cup_data_small
head(cup_data_small, n=10)
#Create a vector V1 by selecting first 100 rows of AGE
v1 <- cup_data_small[1:100,"AGE",]
Here's what my environment says:
cup_data_small is a tibble, a slightly modified version of a dataframe that has slightly different rules to try to avoid some common quirks/inconsistencies in standard dataframes. E.g. in a standard dataframe, df[, c("a")] gives you a vector, and df[, c("a", "b")] gives you a dataframe - you're using the same syntax so arguably they should give the same type of result.
To get just a vector from a tibble with single-bracket indexing, you have to explicitly pass drop = TRUE (extracting the column with [[ ]] or $ also returns a vector), e.g.:
library(dplyr)
# Standard dataframe
iris[, "Species"]
iris_tibble = iris %>%
as_tibble()
# Remains a tibble/dataframe
iris_tibble[, "Species"]
# This gives you just the vector
iris_tibble[, "Species", drop = TRUE]
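Applied to the question, a sketch might look like the following. It assumes the tibble package is installed, and the small cup_data_small built here is a hypothetical stand-in for the data read from the CSV file.

```r
library(tibble)

# Hypothetical stand-in for cup_data_small; the real data comes from read_csv()
cup_data_small <- tibble(AGE = 1:150, GENDER = rep(c("F", "M"), 75))

# Three ways to get the first 100 values of AGE as a plain vector
v1 <- cup_data_small[["AGE"]][1:100]
v2 <- cup_data_small$AGE[1:100]
v3 <- cup_data_small[1:100, "AGE", drop = TRUE]
```

All three results are plain integer vectors of length 100 rather than one-column tibbles.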

Update binary column values in a dataframe based on checkboxGroupInput in Shiny

I have a long list of checkboxes in a checkboxGroupInput statement. The labels and values of the checkboxes correspond to a subset of the colnames in a dataframe.
For instance, the dataframe is called userdf and has columns like this:
A B C
1 1 0
If the name of the checkboxGroupInput is sotags then I want input$sotags to modify the dataframe such that if it contains A but not B or C:
A B C
1 0 0
My lame attempt at this was:
for(i in 1:colnames(userdf)){
if(colnames(userdf[i]) %in% paste(input$sotags)){userdf[,i] <- 1}
if(!colnames(userdf[i]) %in% paste(input$sotags)){userdf[,i] <- 0}
}
If you want to see my entire working code, it's here: https://github.com/hack-r/coursera_shiny
Let's say that you start with userdf in your code, which is not reactive, like so:
userdf<-data.frame(A=NA,B=NA,C=NA)
and input$sotags is your checkboxGroupInput which will be character and one of your column names.
Then you can make a new data.frame like so:
userdf2 <- reactive({
  # 1 where the column name matches the selected checkbox, 0 elsewhere
  as.data.frame(matrix(as.numeric(colnames(userdf) == input$sotags), nrow = 1,
                       dimnames = list(NULL, colnames(userdf))))
})
Edited to Add:
If input$sotags is a character vector, you can replace the == with %in% in the line starting as.data.frame and that will put a 1 in all the selected columns.
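The %in% logic can be tried outside Shiny with a plain character vector. In this sketch, selected is a hypothetical stand-in for input$sotags and the result is built without a reactive wrapper.

```r
userdf <- data.frame(A = 1, B = 1, C = 0)

# Hypothetical stand-in for input$sotags
selected <- c("A")

# 1 for every column whose name was selected, 0 otherwise
flags <- as.numeric(colnames(userdf) %in% selected)
updated <- as.data.frame(as.list(setNames(flags, colnames(userdf))))
```

Inside a Shiny app the same expression would go in a reactive(), with input$sotags in place of selected.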
I think this should give the same result as your code:
userdf[,input$sotags] <- 1
userdf[,! colnames(userdf) %in% input$sotags] <- 0
But that will result in a data frame with all rows being equal...
Why would you need that?
