Selecting Columns in R using list and names - r

I found this question which was helpful in subsetting a DF for numeric columns only.
Selecting only numeric columns from a data frame
However I can't figure out how to do numeric columns PLUS any other columns.
I've tried:
nums <- sapply(df, is.numeric)
df <- df[, c(nums, "charcolumn")]
and:
df <- df[,c(sapply(df, is.numeric), "Pop_Size_Group")]
Both of these gave me an "undefined columns selected" error.
I understand that sapply gives me a logical vector of TRUE/FALSE values. How can I subset my df to include all numeric columns PLUS additional columns I identify?

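Combining the logical vector with a string in c() coerces everything to character ("TRUE", "FALSE", ...), and no columns have those names, hence the error. Instead, select the names of the numeric columns and concatenate them with "Pop_Size_Group":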
df <- df[,c(names(df)[sapply(df, is.numeric)], "Pop_Size_Group")]
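A minimal sketch of why the original attempt fails and how the name-based version behaves. The data frame and every column except Pop_Size_Group are made up for illustration.

```r
# Hypothetical example data; only Pop_Size_Group comes from the question.
df <- data.frame(
  city = c("A", "B", "C"),
  Pop_Size_Group = c("small", "large", "small"),
  population = c(1200, 98000, 3400),
  area_km2 = c(3.5, 120.1, 8.2),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE for numeric columns
nums <- sapply(df, is.numeric)

# c() on a logical and a character vector coerces everything to character,
# so this index holds the strings "TRUE"/"FALSE" rather than column names
bad_index <- c(nums, "Pop_Size_Group")

# Turn the logical vector into names first, then add the extra columns.
# unique() guards against listing a column twice if an extra column is numeric.
keep <- unique(c(names(df)[nums], "Pop_Size_Group"))
result <- df[, keep, drop = FALSE]
```

drop = FALSE keeps the result a data frame even if only one column ends up selected.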

Related

iterate through DataFrame in R to change columns types

I come from Python and I am not sure how to accomplish this in R. I want to write a function that takes two arguments: a data frame and a list of column names. I want to iterate through the data frame and convert the columns whose names match the ones in the list.
This is the list of column names I want to convert (a character vector):
col.names<-c('Ri','Na','Mg')
I wrote this function but it is not returning the desired output
function.convert<- function(df,col.names){
for (i in colnames(df)) {
if (i %in% col.names){
as.factor(i)}
}}
My desired output is the same data frame, but with the specified columns converted to factor type.
You can do
df[col.names] <- do.call(cbind.data.frame, lapply(df[col.names], as.factor))
or using dplyr
df %>% mutate_at(col.names, as.factor)
Or as a function
f <- function(df, col.names) {
df[col.names] <- do.call(cbind.data.frame, lapply(df[col.names], as.factor))
df
}
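The loop in the question fails for two reasons: as.factor(i) converts the column name (a string) rather than the column, and the result is never assigned back or returned. A loop-based sketch that fixes both is below; the function name convert_to_factor and the sample values are assumptions for illustration.

```r
# Hypothetical data; the column names Ri, Na, Mg come from the question.
df <- data.frame(Ri = c(1.51, 1.52, 1.51),
                 Na = c(13, 14, 13),
                 Mg = c(4, 3, 4),
                 Al = c(1.1, 1.3, 1.2))
col.names <- c("Ri", "Na", "Mg")

convert_to_factor <- function(df, col.names) {
  # Only touch requested columns that actually exist in df
  target <- intersect(col.names, names(df))
  for (nm in target) {
    df[[nm]] <- as.factor(df[[nm]])  # convert the column and assign it back
  }
  df  # return the modified copy; the caller's df is unchanged
}

converted <- convert_to_factor(df, col.names)
```

Because R passes data frames by value, the result has to be captured, for example with df <- convert_to_factor(df, col.names).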

R Change Columns with prefix from factor to character

I have a data frame with factors and characters. I want to convert the columns whose names start with the prefix "ID_" from factor to character.
I tried the code below, but it changes the whole data frame to character; I only want to change the columns prefixed with "ID_". I don't know how many "ID_" columns will end up in the data frame (this is part of a larger function that will loop across data frames with various numbers of "ID_" columns).
###Changes the whole dataframe to character rather than only the intended columns
df.loc[] <- lapply(df.loc[, grepl("ID_", colnames(df.loc))], as.character)
The problem is you assign to the whole data frame with df.loc[] <-. Try this:
my_cols <- grepl("^ID_", colnames(df.loc))  # ^ anchors the match to the start of the name
df.loc[my_cols] <- lapply(df.loc[my_cols], as.character)
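A self-contained sketch of the same idea, using base R's startsWith() so that only a true prefix matches. The data frame contents are invented for illustration; GRID_ID_old is a hypothetical column that contains "ID_" but does not start with it.

```r
# Hypothetical data frame standing in for df.loc
df.loc <- data.frame(
  ID_site = factor(c("s1", "s2")),
  GRID_ID_old = factor(c("g1", "g2")),
  habitat = factor(c("forest", "meadow"))
)

# startsWith() matches only a true prefix; an unanchored grepl("ID_", ...)
# would also pick up GRID_ID_old
my_cols <- startsWith(colnames(df.loc), "ID_")

# Assign only into the selected columns, not into the whole data frame
df.loc[my_cols] <- lapply(df.loc[my_cols], as.character)
```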
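Here is a tidyverse solution:
library(dplyr)  # provides tibble(), %>%, mutate_at(), vars() and starts_with()
# tibble() replaces the deprecated data_frame()
food <- tibble(
  "ID_fruits" = factor(c("apple", "banana", "cherry")),
  "vegetables" = factor(c("asparagus", "broccoli", "cabbage")),
  "ID_drinks" = factor(c("absinthe", "beer", "cassis"))
)
# In dplyr >= 1.0.0, mutate(across(starts_with("ID_"), as.character)) is the newer equivalent
food %>%
  mutate_at(vars(starts_with("ID_")), as.character)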
You can also do this with ifelse:
df[] <- ifelse(grepl("^ID_", colnames(df)), lapply(df, as.character), df)

R convert data.frame to list by column

I would like to convert a data.frame into a list of data.frames by column, using base R functions and holding the first column constant. For example, I would like to split DF into a list of three data.frames, each of which includes the first column. That is, I would like to end up with the list named LONG without having to type out each list element separately. Thank you.
DF <- data.frame(OBS=1:10,HEIGHT=rnorm(10),WEIGHT=rnorm(10),TEMP=rnorm(10))
DF
LONG <- list(HEIGHT = DF[c("OBS", "HEIGHT")],
WEIGHT = DF[c("OBS", "WEIGHT")],
TEMP = DF[c("OBS", "TEMP" )])
LONG
SHORT <- as.list(DF)
SHORT
SPLIT <- split(DF, col(DF))
We can loop through the names of 'DF' except the first one, cbind the first column with the column of 'DF' selected by each name, and set the list names afterwards.
setNames(lapply(names(DF)[-1], function(x) cbind(DF[1], DF[x])), names(DF)[-1])
Or another option would be
Map(cbind, split.default(DF[-1], names(DF)[-1]), OBS=DF[1])
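As a variation on the loop above, here is a sketch that relies on sapply() naming its result after a character input, which avoids the separate setNames() call. The names key, value_cols and LONG2 are hypothetical, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible
DF <- data.frame(OBS = 1:10, HEIGHT = rnorm(10), WEIGHT = rnorm(10), TEMP = rnorm(10))

key <- names(DF)[1]                    # the column held constant
value_cols <- setdiff(names(DF), key)  # every other column

# sapply() over a character vector names the result by that vector;
# simplify = FALSE keeps the result a list of data frames
LONG2 <- sapply(value_cols, function(x) DF[c(key, x)], simplify = FALSE)
```

Each element, for example LONG2$WEIGHT, is a two-column data frame with OBS first.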

Select a numeric columns of a dataframe in a list

I have a list of dataframes. After applying a function to each one, I get new columns that are non-numeric, and I save the resulting dataframes in a list, modified_list. I want to write the modified dataframes to files, but keeping only the columns that contain numeric values.
I am stuck on the selection of numeric columns: I do not know how to select numeric columns across a list of dataframes. My code looks something like this. Do you have any idea what I can do to make it work?
library(plyr)
library(VIM)
data1 <- sleep
data2 <- sleep
data3 <- sleep
# get a list of dataframes
list_dataframes <- list(data1, data2, data3) # list of dataframes
n <- length(list_dataframes)
# apply function to the list_dataframes
modified_list <- llply(list_dataframes, myfunction)
# selects only numeric results
nums <- llply(modified_list, is.numeric)
# saving results
for (i in 1:n){
write.table(file = sprintf( "myfile/%s_hd.txt", dataframes[i]), modified_list[[i]][, nums], row.names = F, sep=",")
}
It sounds like you want to subset each data.frame in a list of data.frames to their numeric columns.
You can test which columns of a data.frame called df are numeric with
sapply(df, is.numeric)
This returns a logical vector, which can be used to subset your data.frame like this:
df[sapply(df, is.numeric)]
This returns the numeric columns of that data.frame. To do this over a list of data.frames, df_list, and return a list of subsetted data.frames:
lapply(df_list, function(df) df[sapply(df, is.numeric)])
Edit: Thanks @Richard Scriven for the simplifying suggestion.
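Putting the pieces together, a minimal sketch of filtering each data frame and then writing it out. The sample data, the list names and the use of tempdir() as output directory are assumptions for illustration; the "_hd.txt" suffix follows the question.

```r
# Hypothetical stand-ins for modified_list: two data frames with mixed types
modified_list <- list(
  first = data.frame(id = c("a", "b"), x = c(1.5, 2.5), n = c(1L, 2L)),
  second = data.frame(label = c("u", "v"), y = c(10, 20))
)

# Keep only the numeric columns of every data frame in the list
numeric_list <- lapply(modified_list, function(df) df[sapply(df, is.numeric)])

# Write one comma-separated file per data frame, named after the list element
out_dir <- tempdir()
out_files <- file.path(out_dir, sprintf("%s_hd.txt", names(numeric_list)))
for (i in seq_along(numeric_list)) {
  write.table(numeric_list[[i]], file = out_files[i], row.names = FALSE, sep = ",")
}
```

seq_along() is used instead of 1:n so the loop does nothing when the list is empty.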

R Apply function on data frame columns

I have a function in R to turn factors to numeric:
as.numeric.factor <- function(x) {as.numeric(levels(x))[x]}
and I have a dataframe that consists of both factors, numeric and other types of data.
I want to apply the function above to the whole dataframe at once, to turn all factor columns into numeric columns.
Any ideas?
thanks
You could check whether each column is a factor with is.factor and sapply, use the result as an index to pick out those columns, and convert them to numeric by applying as.numeric.factor with lapply.
indx <- sapply(dat, is.factor)
dat[indx] <- lapply(dat[indx], as.numeric.factor)
You could also apply the function without subsetting, although applying it only to the factor columns is faster:
dat[] <- lapply(dat, function(x) if(is.factor(x)) as.numeric(levels(x))[x] else x)
To prevent the columns from being read as factor in the first place, you could specify the stringsAsFactors=FALSE or colClasses argument in read.table/read.csv. I would guess that each of these columns contains at least one non-numeric entry, which makes read.table/read.csv treat it as a factor.
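A small sketch of why the conversion goes through levels(x) rather than calling as.numeric() on the factor directly: the direct call returns the internal level codes, not the values shown. The data frame below is invented for illustration.

```r
# Hypothetical data: one factor column holding numbers, plus other types
dat <- data.frame(
  a = factor(c("10", "20", "10")),
  b = c(1.5, 2.5, 3.5),
  c = c("x", "y", "z"),
  stringsAsFactors = FALSE
)

# Calling as.numeric() directly on a factor yields the level codes
codes <- as.numeric(dat$a)

# Helper from the question: convert the levels, then index them by the factor
as.numeric.factor <- function(x) as.numeric(levels(x))[x]

# Convert only the factor columns; numeric and character columns are untouched
indx <- sapply(dat, is.factor)
dat[indx] <- lapply(dat[indx], as.numeric.factor)
```

If a level is not a valid number, as.numeric() turns it into NA with a warning, so it is worth checking for NA values after the conversion.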
