R how to create a dataframe by adding columns - r

I am very new to R; I have been using Python and MATLAB my whole life.
So here is what I would like to do: during each loop iteration, I compute a column that I would like to add to a dataframe.
The problem is that I do not know the length of the column in advance, so I cannot create the dataframe with a specific size. I keep getting an error when I try to add the column to the original empty dataframe.
# extract the data where column 7 has no data.
df_glm <- data.frame(matrix(ncol = 11, nrow = 0))
for (j in 1:ncol(data_cancer)) {
  col_ele <- data_cancer[, j]
  col_filtered <- col_ele[col_bool7]
  # make new dataframe by concatenating the filtered column.
  df_glm[, j] <- col_filtered
}
data_cancer_filter <- data_cancer[, col_bool7]
How can I resolve this issue?
I am getting an error at df_glm[, j] because the column is as long as col_bool7, while df_glm has no rows. But I want to learn how to do this without creating a dataframe of the exact size beforehand.

If I am understanding this correctly, you're looping through the columns, taking the rows where col_bool7 is TRUE, and putting them in another dataframe. dplyr's filter() would be an efficient solution:
library(dplyr)
df_glm <- data_cancer %>%
  filter(col_bool7)
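For anyone who wants to stay in base R, here is a minimal sketch of the same idea. It assumes col_bool7 is a logical vector with one entry per row of data_cancer; the toy data and the names df_loop and cols are made up for illustration. The second option shows one way to build a data frame column by column without knowing the final size up front.

```r
# Toy stand-ins for the question's objects (hypothetical values)
data_cancer <- data.frame(a = 1:5, b = letters[1:5], g = c(NA, 2, NA, 4, 5))
col_bool7 <- is.na(data_cancer$g)   # rows where the column of interest is missing

# Option 1: logical row indexing; no loop and no pre-sized data frame needed
df_glm <- data_cancer[col_bool7, , drop = FALSE]

# Option 2: if a loop is required, collect the columns in a list, convert once
cols <- vector("list", ncol(data_cancer))
names(cols) <- names(data_cancer)
for (j in seq_along(data_cancer)) {
  cols[[j]] <- data_cancer[[j]][col_bool7]
}
df_loop <- as.data.frame(cols)
```

Growing a list and converting it at the end avoids the size mismatch, because the data frame is only created once every column is known.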

Related

R - Sum every two columns in a dataframe and paste results to new columns at the end

I have a dataframe of dynamic length, i.e. it gets longer every time new variables are attached. In this case, I need to sum the values in every two columns in 8:length(df) and attach the results (the sum of every two columns) at the end of this dataframe. So what I want to automate for all columns in question is this:
df <- df %>%
  mutate(sumAB = A + B)
Ideally, I would like to name these new columns based on a vector containing the intended colnames, which I already prepared. As I am fairly new to R, I could not get this running with for loops or the apply family. Every suggestion appreciated.
Thanks!
You can use split.default to split the columns into groups of two and then use sapply to sum the values in each group.
cols <- 8:ncol(df)
# pair up neighbouring columns, sum each pair row-wise, and append the sums to df
result <- cbind(df, sapply(split.default(df[cols],
    rep(1:length(cols), each = 2, length.out = length(cols))),
  rowSums, na.rm = TRUE))
result
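The question also asks for the new columns to be named from a prepared vector. A minimal sketch of that, on made-up data: the data frame below and the names in new_names are assumptions, not the asker's actual columns.

```r
# Toy data: 7 leading columns followed by 4 numeric columns (hypothetical)
df <- data.frame(matrix(1:22, nrow = 2, ncol = 11))
new_names <- c("sumAB", "sumCD")     # hypothetical prepared column names

cols <- 8:ncol(df)
# Group index 1, 1, 2, 2, ... pairs up neighbouring columns
grp <- rep(seq_along(cols), each = 2, length.out = length(cols))
sums <- sapply(split.default(df[cols], grp), rowSums, na.rm = TRUE)
colnames(sums) <- new_names          # one name per pair of columns
result <- cbind(df, sums)
```

Note that with a single-row data frame sapply would return a plain vector rather than a matrix, so the colnames() step would need adjusting in that case.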

How can I get the column/variable names of a dataframe that fit certain parameters?

I came across a problem in my DataCamp exercise that basically asked "Remove the column names in this vector that are not factors." I know what they -wanted- me to do, and that was to simply do glimpse(df) and manually delete elements of the vector containing the column names, but that wasn't satisfying for me. I figured there was a simple way to store the column names of the dataframe that are factors into a vector. So, I tried two things that ended up working, but I worry they might be inefficient.
Example data frame:
factorVar <- as.factor(LETTERS[1:10])
df1 <- data.frame(x = 1, y = 1:10, factorVar = sample(factorVar, 10))
My first solution was this:
vector1 <- names(select_if(df1, is.factor))
This worked, but select_if builds an entire filtered dataframe just so that I can read its column names. Surely there's an easier way...
Next, I tried this:
vector2 <- colnames(df1)[sapply(df1,is.factor)]
This also worked, but I wanted to know if there's a quicker, more efficient way of filtering column names based on their type and then storing the results as a vector.
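Two base R variants of the second approach, as a sketch rather than a definitive answer (the names is_fct, vector3 and vector4 are made up here). Both do roughly the same work as the sapply version, so for a data frame of ordinary width any speed difference is unlikely to matter; vapply mainly adds type safety.

```r
factorVar <- as.factor(LETTERS[1:10])
df1 <- data.frame(x = 1, y = 1:10, factorVar = sample(factorVar, 10))

# vapply is like sapply but guarantees one logical value per column
is_fct <- vapply(df1, is.factor, logical(1))
vector3 <- names(df1)[is_fct]

# Filter() keeps the columns for which the predicate returns TRUE
vector4 <- names(Filter(is.factor, df1))
```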

Square brackets and dataframes in R

Okay I am rather puzzled by the different behaviours of dataframes and xtses in R and I'm hoping someone can explain it to me.
df = as.data.frame(x = c(1,2),row.names = c("2012-12-12","2012-12-13"))
xts = as.xts(x=c(1,2),order.by = as.POSIXct(c("2012-12-12","2012-12-13")))
I have two different datasets here. When you print them, they look almost identical. When I ask for the first row of the xts, xts[1,] returns the row with the colnames and the index. But df[1,] returns only a vector.
Is there a way to return the first row of the dataframe, complete with the rownames and colname? I'm aware that I can hack it by doing as.data.frame(as.xts(df)[1,]) but is there a more elegant solution?
This is the special case where the data frame has only ONE column, so subsetting it by rows returns values from a single column and R drops the result to a vector.
To keep the data frame structure, you need to specify drop = FALSE, here
df[1, , drop = FALSE]
I'd add a recommendation that when you create a data frame from scratch, you use the data.frame() function instead of as.data.frame().
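A small sketch of the difference, building the data frame with data.frame() as recommended (the column name x is an assumption):

```r
df <- data.frame(x = c(1, 2), row.names = c("2012-12-12", "2012-12-13"))

dropped <- df[1, ]                # single column: dropped to a plain numeric
kept <- df[1, , drop = FALSE]     # stays a data frame with row and column names

is.data.frame(dropped)
is.data.frame(kept)
```

With two or more columns, df[1, ] already returns a one-row data frame, so drop = FALSE only changes the result in the one-column case.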

How do I import data from a .csv file into R without repeating the values of the first column into all the other ones?

I want to import data into R from a .csv file.
So far I have done the following:
#Clear environment
rm(list=ls())
#Read my data into R
myData <- read.csv("C:/Users/.../flow.csv", header=TRUE)
#Convert from list to array
myData <- array(as.numeric(unlist(myData)), dim=c(264,3))
#Create vectors with specific values of interest: qMax, qMin
qMax <- myData[,2]
qMin <- myData[,3]
#Transform vectors into matrices
qMax <- matrix(qMax,nrow = 12, ncol = round((length(qMax)/12)))
qMin <- matrix(qMin,nrow = 12, ncol = round((length(qMin)/12)))
After importing the data using read.csv, I have a list. I then proceed to transform this list into an array with 264 lines of data spread through 3 columns. Here I have my first problem.
I know that each column of my list holds a different set of data; the values are not the same. However, when I check what I imported, it seems that only the first column is imported correctly, and its values are then repeated in columns two and three.
Here's an image for better explanation:
The matrix has the right layout, yet wrong data. Columns 2 and 3 should have different values from each other and from column 1.
How do I correct that? I have checked the source and the original document has all the correct values.
Also, assuming I eventually get rid of this mistake, will the lines of code under "#Transform vectors into matrices" deliver a 12 x 22 matrix? The first six elements of both qMax and qMin are NA and I wish to keep them that way in the matrix. Will R do that with these lines of code, or will I need to change them?
Thank you.
Edit: As suggested by akrun, here are the results for str(myData) and for dput(droplevels(head(myData)))
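The cause of the repeated column cannot be determined from the post alone, so the following is only a sketch on synthetic data (the values and the column name t are made up). It shows two things: as.matrix() keeps the columns of an all-numeric data frame distinct without going through unlist() and array(), and matrix() fills column by column while leaving NA values where they are. One thing worth checking in str(myData) is whether any column is a factor, since as.numeric() on a factor returns its integer codes rather than the printed values.

```r
# Synthetic stand-in for one column of the CSV: 264 values, first six missing
q <- c(rep(NA, 6), seq_len(258))
myData <- data.frame(t = 1:264, qMax = q, qMin = q / 2)   # hypothetical data

# An all-numeric data frame converts directly to a 264 x 3 numeric matrix
m <- as.matrix(myData)

# matrix() fills column by column, so each column holds 12 consecutive values
qMat <- matrix(m[, "qMax"], nrow = 12, ncol = nrow(m) / 12)

dim(qMat)       # 12 22
qMat[1:6, 1]    # the six leading NAs stay in place
```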

R - creating dataframe from colMeans function

I've been trying to create a dataframe from my original dataframe, where each row in the new dataframe represents the mean of every 20 rows of the old dataframe. I discovered a function called colMeans, which does the job pretty well. The only remaining problem is how to turn that vector of results back into a dataframe that can be analysed further.
My code for colMeans (matrix1 is my original dataframe converted to a matrix; this was the only way I managed to get it to work):
a <- colMeans(matrix(matrix1, nrow = 20))
But here I get a numeric sequence with all the results concatenated in one single column (if I try, for example, as.data.frame(a)). How am I supposed to get this result back into a dataframe where each column contains only the results for a specific column name and not all the averages?
I hope my question is clear, thanks for help.
Based on methods('as.data.frame'), as.data.frame.list is an option for converting each element of a vector into a column of a data.frame:
as.data.frame.list(a)
Data:
m1 <- matrix(1:20, ncol = 4, dimnames = list(NULL, paste0('V', 1:4)))
a <- colMeans(m1)
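As a sketch of alternatives that go through the generic as.data.frame() instead of calling the method directly, reusing the example data above (res1, res2, res3, d and block are made-up names). The last part illustrates the block means the question asks about, with a block size of 5 standing in for 20 on toy data.

```r
m1 <- matrix(1:20, ncol = 4, dimnames = list(NULL, paste0("V", 1:4)))
a <- colMeans(m1)

# Each named element becomes its own column in a one-row data frame
res1 <- as.data.frame(as.list(a))
# Equivalent: transpose the vector into a 1 x 4 matrix first
res2 <- as.data.frame(t(a))

# Block means per column: average every 5 rows of a toy 10-row data frame
d <- data.frame(p = 1:10, q = 11:20)
block <- (seq_len(nrow(d)) - 1) %/% 5          # 0,0,0,0,0,1,1,1,1,1
res3 <- aggregate(d, by = list(block = block), FUN = mean)
```

aggregate() keeps one column per original variable, so the result is already a data frame with the original column names plus the block index.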
