This question already has answers here:
How to split a data frame?
(8 answers)
Closed 5 years ago.
I'm new to R. I have a dataset with names in the first row, the category the names belong to in the second row, and then price observations for two years from the third row onwards. I want to split the data frame using the categories in the second row. How do I do this?
This is what my dataset looks like (in R):
This is what I want it to look like (in Excel):
Note: I cannot do this in Excel and then import, because there are far too many categories.
Multiple possibilities:
df <- data.frame(data = c(1:12), category = rep(letters[1:3], 4))
subset function.
df_a <- subset(df, category == "a")
basic data.frame subset
df_a <- df[df$category == "a",]
into a list
ls <- list()
for(category in unique(df$category)){
  ls[[category]] <- df[df$category == category, ]
}
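A minimal sketch that runs the approaches above side by side on the same toy data. The name by_category is made up for illustration. The loop compares against the loop variable, so each list element holds the rows of its own category.

```r
# Toy data, same shape as in the answer above
df <- data.frame(data = 1:12, category = rep(letters[1:3], 4))

# One category, two ways
df_a_subset  <- subset(df, category == "a")
df_a_bracket <- df[df$category == "a", ]

# All categories, collected in a named list
by_category <- list()
for (category in unique(df$category)) {
  by_category[[category]] <- df[df$category == category, ]
}

# Both single-category results should hold the same rows
all.equal(df_a_subset, df_a_bracket)

# The list has one element per category, accessible by name
names(by_category)
by_category[["b"]]
```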
You have the answer in your question. The split or split.data.frame functions would do it. The second argument is treated as a factor; if it is not one already, it is coerced with as.factor.
Example
newdf <- split.data.frame(iris, iris$Species)
newdf
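A small sketch of what the split() result looks like and how to get at the pieces. The column names price and category are invented for illustration. The grouping column here is a plain character vector, which split() converts with as.factor() internally.

```r
# Toy data: 'category' is a character column, not a factor
df <- data.frame(
  price    = c(10, 12, 7, 9, 20, 22),
  category = c("fruit", "fruit", "veg", "veg", "meat", "meat")
)

# One data frame per category, returned as a named list
pieces <- split(df, df$category)

names(pieces)         # one element per category
pieces$veg            # a single piece, selected by name
sapply(pieces, nrow)  # number of rows in each piece
```

Keeping the pieces in one list makes it easy to apply the same step to every category with lapply() instead of repeating code per category.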
Related
This question already has answers here:
Split data.frame based on levels of a factor into new data.frames
(3 answers)
Closed 2 years ago.
For this dataframe:
df = data.frame(
col = c("a","f","g","a")
)
How do I subset it for each unique letter and input it into a new dataframe like so?:
sheet_a <- subset(df, col == "a")
sheet_f <- subset(df, col == "f")
sheet_g <- subset(df, col == "g")
I think I need to loop over the unique values of the column, using the code below in a for loop, but I'm not sure how.
uniq.name_col <- unique(as.vector(df$col))
Thank you for any help!
You can try this, which includes code for exporting the data frames to the global environment:
#Create list
List <- split(df,df$col)
#Export all data frames to the global environment
list2env(List,.GlobalEnv)
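With the code above the new variables are named after the values themselves (a, f, g). A possible variation, sketched here as an assumption about what the asker wants, renames the list elements first so the variables follow the sheet_ pattern. The names pieces and target are made up, and a fresh environment is used so the example does not touch the workspace.

```r
df <- data.frame(col = c("a", "f", "g", "a"))

# Split into a named list: one data frame per unique value
pieces <- split(df, df$col)

# Prefix the names to match sheet_a / sheet_f / sheet_g
names(pieces) <- paste0("sheet_", names(pieces))

# Copy the list elements into an environment as separate variables.
# Pass .GlobalEnv instead of 'target' to create top-level variables.
target <- new.env()
invisible(list2env(pieces, envir = target))

ls(target)
get("sheet_a", envir = target)
```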
This question already has answers here:
Split a large dataframe into a list of data frames based on common value in column
(3 answers)
Closed 4 years ago.
I have a dataframe like the following
x <- c(1:100)
y <- c("a","b","c","d","e","f","g","h","i","j")
y<-rep(y, each=10)
df<-data.frame(x,y)
I would like to make a list of dataframes by subsetting by values in the y column. The end result would produce the same output as something like this:
df1 <- data.frame(df[df$y=="a",])
df2 <- data.frame(df[df$y=="b",])
...
df10 <- data.frame(df[df$y=="j",])
list <- list(df1,df2.....df10)
... but without all of the repetition. Thanks!
split(df, df$y)
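A short sketch confirming that the one-liner gives the same pieces as the repeated subsetting in the question. The names df_list and plain_list are made up for illustration.

```r
x <- 1:100
y <- rep(letters[1:10], each = 10)
df <- data.frame(x, y)

# Refer to the column through the data frame, so the call does not
# depend on a separate vector y existing in the workspace
df_list <- split(df, df$y)

length(df_list)
names(df_list)

# Compare one piece with the manual subset from the question
all.equal(df_list$a, df[df$y == "a", ])

# Drop the names to mimic an unnamed list(df1, df2, ...)
plain_list <- unname(df_list)
```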
This question already has answers here:
Selecting only numeric columns from a data frame
(12 answers)
Closed 4 years ago.
I would like to extract all columns for which the values are numeric from a dataframe, for a large dataset.
#generate mixed data
dat <- matrix(rnorm(100), nrow = 20)
df <- data.frame(letters[1 : 20], dat)
I was thinking of something along the lines of:
numdat <- df[,df == "numeric"]
That however leaves me without variables. The following gives an error.
dat <- df[,class == "numeric"]
Error in class == "numeric" :
comparison (1) is possible only for atomic and list types
What should I do instead?
Use sapply:
numdat <- df[, sapply(df, function(x) class(x) == "numeric")]
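A variant of the sapply approach, sketched with is.numeric() swapped in for the class comparison. The extra integer column n and the id column name are added here only for illustration: is.numeric() is TRUE for both double and integer columns, while class(x) == "numeric" matches doubles only.

```r
set.seed(1)
dat <- matrix(rnorm(100), nrow = 20)
df <- data.frame(id = letters[1:20], dat, n = 1:20)

# Logical vector: TRUE for every numeric (double or integer) column
is_num <- sapply(df, is.numeric)

# drop = FALSE keeps a data frame even if only one column qualifies
numdat <- df[, is_num, drop = FALSE]
names(numdat)

# For comparison: the class of each column
sapply(df, class)
```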
This question already has answers here:
How do I extract a single column from a data.frame as a data.frame?
(3 answers)
Closed 6 years ago.
How can I subset a data frame:
df <- data.frame(a = c(1,2,3), b = c(4,5,6))
such that I always get a data frame back even if only one column is selected?
Result as desired when selecting two columns:
class( df[,1:2] )
[1] "data.frame"
Result not as desired when selecting only one column:
class( df[,1] )
[1] "numeric"
Desired result when selecting one column would be equivalent to:
class( data.frame(a = c(1,2,3)) )
To clarify from Zheyuan Li:
df[1]
df[,1, drop = FALSE]
return a data frame with only column 1.
If you want to subset rows as well as columns, these work for me:
df[1:2, 1, drop = FALSE]
subset(df[1], a < 3)
subset(df, subset = a<3, select = a)
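A quick sketch to check the indexing forms above against each other, using the data frame from the question.

```r
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Matrix-style indexing with one column drops to a plain vector
class(df[, 1])

# List-style indexing keeps the data frame
class(df[1])

# Matrix-style indexing keeps the data frame when drop = FALSE
class(df[, 1, drop = FALSE])

# The same works when selecting by name or together with rows
df[, "a", drop = FALSE]
df[1:2, "a", drop = FALSE]
```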
As suggested in the comments, both of these possibilities give exactly what I was looking for. Thanks for helping me understand this better!
df[1]
df[,1,drop=FALSE]
This question already has answers here:
Group Data in R for consecutive rows
(3 answers)
Closed 6 years ago.
I have written a for loop that takes a group of 5 rows from a dataframe and passes it to a function, the function then returns just one row after doing some operations on those 5 rows. Below is the code:
for (i in 1:nrow(features_data1)){
  if (i - start == 4){
    group = features_data1[start:i,]
    group <- as.data.frame(group)
    start <- i+1
    sub_data = feature_calculation(group)
    final_data = rbind(final_data,sub_data)
  }
}
Can anyone please suggest an alternative to this, as the for loop is taking a lot of time? The function feature_calculation is huge.
Try this for a base R approach:
# convert features to data frame in advance so we only have to do this once
features_df <- as.data.frame(features_data1)
# assign each observation (row) to a group of 5 rows and split the data frame into a list of data frames
group_assignments <- as.factor(rep(1:ceiling(nrow(features_df) / 5), each = 5, length.out = nrow(features_df)))
groups <- split(features_df, group_assignments)
# apply your function to each group individually (i.e. to each element in the list)
sub_data <- lapply(X = groups, FUN = feature_calculation)
# bind your list of data frames into a single data frame
final_data <- do.call(rbind, sub_data)
You might be able to use the purrr and dplyr packages for a speed-up. The latter has a function bind_rows that is much quicker than do.call(rbind, list_of_data_frames) when the list of data frames is very large.
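The base R steps above can be tried end to end on toy data. In this sketch the data frame and the body of feature_calculation are hypothetical stand-ins, since the real ones are not shown in the question.

```r
# Hypothetical stand-ins for features_data1 and feature_calculation
features_df <- data.frame(v1 = 1:12, v2 = (1:12) * 10)
feature_calculation <- function(group) {
  # Returns a single row summarising the rows it receives
  data.frame(mean_v1 = mean(group$v1), sum_v2 = sum(group$v2))
}

# Group ids 1,1,1,1,1,2,2,... truncated to the number of rows
n <- nrow(features_df)
group_id <- rep(seq_len(ceiling(n / 5)), each = 5, length.out = n)

# Split into blocks of up to 5 rows, apply the function, bind the results
groups <- split(features_df, group_id)
final_data <- do.call(rbind, lapply(groups, feature_calculation))
final_data
```

Note one difference from the original loop: when the number of rows is not a multiple of 5, the last group here holds the leftover rows and is still passed to the function, whereas the loop only processes complete groups of 5.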