R: convert a data.frame to a list by column

I would like to convert a data.frame into a list of data.frames by column using base R functions and holding the first column constant. For example, I would like to split DF into a list of three data.frames, each of which includes the first column. That is, I would like to end up with the list named LONG without having to type out each list element separately. Thank you.
DF <- data.frame(OBS=1:10,HEIGHT=rnorm(10),WEIGHT=rnorm(10),TEMP=rnorm(10))
DF
LONG <- list(HEIGHT = DF[c("OBS", "HEIGHT")],
WEIGHT = DF[c("OBS", "WEIGHT")],
TEMP = DF[c("OBS", "TEMP" )])
LONG
SHORT <- as.list(DF)
SHORT
SPLIT <- split(DF, col(DF))

We can loop through the names of 'DF' except the first one and cbind the first column with the column of 'DF' selected by each name.
setNames(lapply(names(DF)[-1], function(x) cbind(DF[1], DF[x])), names(DF)[-1])
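Or another option would be the following. Note that split.default orders the list elements alphabetically by name, and that OBS ends up as the last column of each data.frame.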
Map(cbind, split.default(DF[-1], names(DF)[-1]), OBS=DF[1])
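As a rough sketch of the same idea, the list can also be built by indexing 'DF' with the key column plus one variable at a time, which keeps OBS first and preserves the original column order. The names `vars` and `LONG2` and the `set.seed` call are assumptions added for illustration, not part of the original answer.

```r
# Reproducible version of the example data from the question.
set.seed(1)
DF <- data.frame(OBS = 1:10, HEIGHT = rnorm(10),
                 WEIGHT = rnorm(10), TEMP = rnorm(10))

# Every column except the first becomes one list element.
vars <- names(DF)[-1]

# lapply over a named vector returns a list with the same names,
# so no separate setNames() call on the result is needed.
LONG2 <- lapply(setNames(vars, vars), function(v) DF[c("OBS", v)])

str(LONG2, max.level = 1)
```

Each element of `LONG2` is a two-column data.frame, which should match the hand-written LONG from the question.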

Related

How do I rename a single column in multiple dataframes to the name of the dataframe in which they reside in R?

I am currently trying to rename a single column in multiple dataframes to match the dataframe name in R.
I have seen some questions/solutions on the site that are similar to what I am attempting to do, but none appear to do this dynamically. I have over 45 dataframes I need to rename a column in, so manually typing in each individual name is doable but time-consuming.
Dataframe1 <- column
Dataframe2 <- column
Dataframe3 <- column
I want it to look like this:
Dataframe1 <- Dataframe1
Dataframe2 <- Dataframe2
Dataframe3 <- Dataframe3
The ultimate goal is to have a master dataframe with columns Dataframe1, Dataframe2, and Dataframe3
We can get all the datasets into a list and rename at once in the list
lst1 <- lapply(mget(ls(pattern = "Dataframe\\d+")), function(x) {
names(x)[5] <- "newcol"
x})
Update
If we are renaming the columns in different datasets with different names, then create a vector of column names that corresponds to each 'Dataframe' and pass it to Map alongside the list
nm1 <- c("col5A", "col5B", "col5C", ..., "col5Z")
lst2 <- Map(function(x, y) {names(x)[5] <- y; x},
mget(ls(pattern = "Dataframe\\d+")),
nm1)
In the first snippet, we are renaming the 5th column to 'newcol'; in the update, the 5th column of each data.frame gets the corresponding element of 'nm1'.
It can also be done using tidyverse
library(dplyr)
library(purrr)
map(mget(ls(pattern = "Dataframe\\d+")), ~ .x %>%
rename_at(5, ~ "newcol"))
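To tie this back to the stated goal (each column named after the data.frame it lives in, then one master data.frame), here is a minimal base R sketch. It assumes every data.frame shares a key column called `id` and has a column literally named `column`; both names, and the sample values, are hypothetical.

```r
# Hypothetical sample data: a shared key 'id' and a generically
# named column 'column' (both names are assumptions).
Dataframe1 <- data.frame(id = 1:3, column = c(1.1, 2.2, 3.3))
Dataframe2 <- data.frame(id = 1:3, column = c(4.4, 5.5, 6.6))
Dataframe3 <- data.frame(id = 1:3, column = c(7.7, 8.8, 9.9))

# Collect the objects into a named list; list names are the object names.
df_names <- ls(pattern = "^Dataframe\\d+$")
lst <- mget(df_names)

# Rename 'column' in each data.frame to the name of that data.frame.
renamed <- Map(function(x, nm) {
  names(x)[names(x) == "column"] <- nm
  x
}, lst, names(lst))

# Merge everything on 'id' to build the master data.frame.
master <- Reduce(function(a, b) merge(a, b, by = "id"), renamed)
master
```

Passing `names(lst)` as the second argument to Map is what makes the renaming dynamic, so nothing has to be typed per data.frame.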

Manipulating a dataset by separating variables

I have a data set that looks similar to the image shown below. In total, it is over 1,000 observations long. I want to create a new data frame that separates the single variable into 3 variables. The values are separated by a "+" in each observation, so the column will need to be split using that character as the delimiter.
Here is a solution using data.table:
library(data.table)
# Data frame
df <- data.frame(MovieId.Title.Genres = c("yyyy+xxxx+wwww", "zzzz+aaaa+aaaa"))
# Data frame to data table.
df <- data.table(df)
# Split column into parts.
df[, c("MovieId", "Title", "Genres") := tstrsplit(MovieId.Title.Genres, "\\+")]
# Print data table
df
I'll assume that your movieData object is a single column data.frame object.
If you want to split a single element from your data set, use strsplit using the character + (which R wants to see written as "\\+"):
# split the first element of movieData into a vector of strings:
strsplit(as.character(movieData[1,1]), "\\+")
strsplit is vectorised, so it can be applied to the entire column at once; then massage the resulting list into a nice, usable data.frame (this assumes every row has exactly three fields):
# convert to a list of character vectors, one per row:
step1 = strsplit(as.character(movieData[,1]), "\\+")
# step1 is a list, so bind its elements row-wise and make a data.frame:
step2 = as.data.frame(do.call(rbind, step1), stringsAsFactors = FALSE)
# step2 is a nice data.frame, but its names are garbage (V1, V2, V3). Fix it:
movieDataWithColumns = setNames(step2, c("MovieId", "Title", "Genres"))
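For a quick check of the base R route on toy data, the sketch below splits on a literal "+" with `fixed = TRUE` (an alternative to escaping it as "\\+") and verifies the field count before binding. The contents of `movieData` here are invented for illustration, and the names `parts` and `out` are assumptions.

```r
# Hypothetical stand-in for the single-column data set in the question.
movieData <- data.frame(
  MovieId.Title.Genres = c("1+Toy Story+Animation", "2+Heat+Crime"),
  stringsAsFactors = FALSE
)

# Split every row on a literal "+"; fixed = TRUE avoids regex escaping.
parts <- strsplit(as.character(movieData[[1]]), "+", fixed = TRUE)

# Guard against rows that do not have exactly three fields.
stopifnot(all(lengths(parts) == 3))

# Bind the pieces row-wise and attach the column names.
out <- setNames(
  as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE),
  c("MovieId", "Title", "Genres")
)

# The pieces are character; convert the id column if it is numeric.
out$MovieId <- as.integer(out$MovieId)
out
```

If a title can itself contain a "+", this simple split would produce more than three fields, and the guard would stop rather than silently misalign columns.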

How to cbind many data frames with a loop?

I have 105 data frames with xts, zoo class and I want to combine their 6th columns into a data frame.
So, I created a data frame that contains all the data frame names to use it with a 'for' function:
mydata <- AAL
for (i in 2:105) {
k <- top100[i,1] # The first column contains all the data frame names
mydata <- cbind(mydata, k)
}
It's obviously wrong, but I have no idea either how to cbind so many data frames with completely different names (my data frame names are NASDAQ Symbols) nor how to pick the 6th column of all.
Thank you in advance
Try the foreach package. There may be a more elegant way to do this task, but this approach will work.
library(foreach)
#create simple one-column objects with columns named 'A' and 'B' (t() turns each data frame into a matrix)
df1<-t(data.frame(1,2,3))
df2<-t(data.frame(4,5,6))
colnames(df1)<-c('A')
colnames(df2)<-c('B')
#make a list
dfs<-list(df1,df2)
#join data frames column by column, this will preserve their names
foreach(x = 1:2,
.combine = cbind) %do% # don't forget this directive
{
dfs[[x]]
}
The result will be:
A B
X1 1 4
X2 2 5
X3 3 6
To pick column number 6:
df[,6]
First, you should store all of your data.frames in a list. You can then use a combination of lapply and do.call to extract and recombine the sixth columns of each of the data.frames:
# Create sample data
df_list <- lapply(1:105, function(x) {
as.data.frame(matrix(sample(1:1000, 100), ncol = 10))
})
# Extract the sixth column from each data.frame
extracted_cols <- lapply(df_list, function(x) x[6])
# Combine all of the columns together into a new data.frame
result <- do.call("cbind", extracted_cols)
One way to get all of your preexisting data.frames into a list would be to use lapply along with get:
df_list <- lapply(top100[[1]], get)
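Putting the two steps together, a small sketch might look like the following. It uses `mget` (which returns a list named after the objects) instead of `lapply` with `get`, so the combined columns carry the symbol names. The toy objects `AAL` and `AAPL`, the column name `symbol`, and the values are all assumptions made for illustration; real xts objects would behave somewhat differently from these plain data.frames.

```r
# Hypothetical stand-ins for two of the 105 objects, six columns each.
AAL  <- data.frame(matrix(1:12,  ncol = 6))
AAPL <- data.frame(matrix(13:24, ncol = 6))

# Hypothetical lookup table whose first column holds the object names.
top100 <- data.frame(symbol = c("AAL", "AAPL"), stringsAsFactors = FALSE)

# mget() fetches the objects by name and returns a named list.
# as.character() protects against the names being stored as a factor.
df_list <- mget(as.character(top100[[1]]))

# Pull the sixth column of each object as a plain vector, then cbind.
# The list names become the column names of the result.
result <- as.data.frame(do.call(cbind, lapply(df_list, function(x) x[[6]])))
result
```

This relies on all objects having the same number of rows; for time series with differing dates, a merge on the index would be needed instead of cbind.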

Select the numeric columns of a dataframe in a list

I have a list of dataframes. After applying a function I get new columns that are non-numeric. I save each resulting dataframe in a list, modified_list, but I only want to save the columns that contain numeric values.
I am stuck on the selection of numeric columns. I do not know how to select numeric columns in a list of dataframes. My code looks something like this. Do you have any idea what I can do to make this code work?
library(plyr)
library(VIM)
data1 <- sleep
data2 <- sleep
data3 <- sleep
# get a list of dataframes
list_dataframes <- list(data1, data2, data3) # list of dataframes
n <- length(list_dataframes)
# apply function to the list_dataframes
modified_list <- llply(list_dataframes, myfunction)
# selects only numeric results
nums <- llply(modified_list, is.numeric)
# saving results
for (i in 1:n){
write.table(file = sprintf( "myfile/%s_hd.txt", dataframes[i]), modified_list[[i]][, nums], row.names = F, sep=",")
}
It sounds like you want to subset each data.frame in a list of data.frames to their numeric columns.
You can test which columns of a data.frame called df are numeric with
sapply(df, is.numeric)
This returns a logical vector, which can be used to subset your data.frame like this:
df[sapply(df, is.numeric)]
This returns the numeric columns of that data.frame. To do this over a list of data.frames df_list and return a list of subsetted data.frames:
lapply(df_list, function(df) df[sapply(df, is.numeric)])
Edit: Thanks @Richard Scriven for the simplifying suggestion.
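For the saving step in the question, one possible sketch is to name the list elements and loop over those names when writing the files. The sample data, the list names `a` and `b`, and the output directory under `tempdir()` are assumptions for illustration; `vapply` is used here as a stricter variant of `sapply`.

```r
# Hypothetical list of data.frames with mixed column types.
df_list <- list(
  a = data.frame(x = 1:3, y = letters[1:3], z = c(0.5, 1.5, 2.5)),
  b = data.frame(x = 4:6, y = letters[4:6], z = c(3.5, 4.5, 5.5))
)

# Keep only the numeric columns of each data.frame.
numeric_only <- lapply(df_list, function(df) {
  df[vapply(df, is.numeric, logical(1))]
})

# Write one comma-separated file per data.frame, named after the element.
out_dir <- file.path(tempdir(), "numeric_only")
dir.create(out_dir, showWarnings = FALSE)
for (nm in names(numeric_only)) {
  write.table(numeric_only[[nm]],
              file = file.path(out_dir, sprintf("%s_hd.txt", nm)),
              row.names = FALSE, sep = ",")
}
```

Selecting the columns inside the `lapply` call avoids the problem in the original attempt, where `is.numeric` was applied to whole data.frames rather than to their columns.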

list variables to individual data.frames

Let's say I have a list of 30 data.frames called myList, each containing 2 variables (called value and rank).
I know I can use
my.DF <- do.call("cbind", myList)
to create the output my.DF containing all the variables next to each other.
Is it possible to cbind each variable individually into its own data.frame, i.e. to just have a new data.frame of just the 2nd variable?
We can extract the second column by looping over the list (lapply) and wrap with data.frame.
data.frame(lapply(myList, `[`, 2))
If we want to separate the variables,
lapply(names(myList[[1]]), function(x)
do.call(cbind,lapply(myList, `[`, x)))
data
set.seed(24)
myList <- list( data.frame(value=1:6, rank= sample(6)),
data.frame(value=7:12, rank=sample(6)))
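Building on the sample data above, a slightly more explicit sketch returns a list named by variable, with each data.frame's columns labelled by the position of the source list element. The names `vars` and `byVar` and the column-naming scheme (`value1`, `value2`, ...) are assumptions chosen for illustration.

```r
set.seed(24)
myList <- list(data.frame(value = 1:6,  rank = sample(6)),
               data.frame(value = 7:12, rank = sample(6)))

# One output data.frame per variable found in the first list element.
vars <- names(myList[[1]])

byVar <- lapply(setNames(vars, vars), function(v) {
  # Extract variable v from every data.frame as a plain vector.
  cols <- lapply(myList, `[[`, v)
  out <- as.data.frame(do.call(cbind, cols))
  # Label each column with the variable name and its list position.
  names(out) <- paste0(v, seq_along(myList))
  out
})

byVar$value
```

`byVar$rank` then holds only the rank columns, which is the "new data.frame of just the 2nd variable" the question asks for.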
