This question already has answers here:
Split data.frame based on levels of a factor into new data.frames
(3 answers)
Closed 2 years ago.
For this dataframe:
df = data.frame(
col = c("a","f","g","a")
)
How do I subset it by each unique letter and store each subset in its own data frame, like so?
sheet_a <- subset(df, col == "a")
sheet_f <- subset(df, col == "f")
sheet_g <- subset(df, col == "g")
I think I need to loop over a vector of the unique values, built with the code below, but I'm not sure how:
uniq.name_col <- unique(as.vector(df$col))
Thank you for any help!
You can try this; it also exports the data frames to the global environment:
# Create a list with one data frame per unique value of col
List <- split(df, df$col)
# Export all data frames in the list to the global environment
list2env(List, .GlobalEnv)
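Note that the objects created this way take their names from the list, so they would be called a, f and g rather than sheet_a, sheet_f and sheet_g. A minimal sketch of one way to get the prefixed names, assuming the same df as in the question; the names pieces and sheets_env are made up for illustration, and a separate environment is used here only to avoid cluttering the global one:

```r
# Same data as in the question
df <- data.frame(col = c("a", "f", "g", "a"))

# One data frame per unique value of col, as a named list
pieces <- split(df, df$col)

# Prefix the list names so the exported objects are sheet_a, sheet_f, sheet_g
names(pieces) <- paste0("sheet_", names(pieces))

# Export into a dedicated environment (hypothetical name) instead of .GlobalEnv
sheets_env <- new.env()
list2env(pieces, envir = sheets_env)

# List the objects that were created
ls(sheets_env)
```

Passing .GlobalEnv instead of sheets_env would create the same objects in the workspace, as in the answer above.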
This question already has answers here:
Split a large dataframe into a list of data frames based on common value in column
(3 answers)
Closed 4 years ago.
I have a dataframe like the following
x <- c(1:100)
y <- c("a","b","c","d","e","f","g","h","i","j")
y<-rep(y, each=10)
df<-data.frame(x,y)
I would like to make a list of dataframes by subsetting by values in the y column. The end result would produce the same output as something like this:
df1 <- data.frame(df[df$y=="a",])
df2 <- data.frame(df[df$y=="b",])
...
df10 <- data.frame(df[df$y=="j",])
list <- list(df1,df2.....df10)
... but without all of the repetition. Thanks!
split(df, df$y)
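For what it's worth, here is a small sketch of what the result looks like and how to get at the pieces, assuming the data from the question. The names parts and unnamed are only for illustration:

```r
# Rebuild the example data from the question
x <- 1:100
y <- rep(letters[1:10], each = 10)
df <- data.frame(x, y)

# split() returns a named list with one data frame per value of y
parts <- split(df, df$y)

# Number of rows in each piece
sapply(parts, nrow)

# Pieces can be accessed by group name
parts[["b"]]

# Drop the names to mirror list(df1, df2, ...) from the question
unnamed <- unname(parts)
```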
This question already has answers here:
How do I split a data frame among columns, say at every nth column?
(1 answer)
What is the algorithm behind R core's `split` function?
(1 answer)
Closed 4 years ago.
Is there an easy way in base R to split a data frame into a list of data frames based on the levels of an index factor (taken from another data frame)?
For example,
x = data.frame(num1 = 1:26, let = letters, num2 = 10:35, LET = LETTERS)
ls = list(x[, 1:2], x[, 3:4])
But let's say we had an index indicating factor levels for the columns. Can split be used?
indx = c(1,1,2,2)
? split(x, indx)
It would be the default method of split, which treats the data frame as a list of columns:
out <- split.default(x, indx)
identical(ls, setNames(out, NULL))
#[1] TRUE
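To make the difference concrete, here is a sketch on a smaller, made-up data frame with four rows and four columns, so that the same index fits both directions. Calling split on a data frame dispatches to the data frame method and groups rows, while split.default groups columns:

```r
# Small illustrative data frame (not the one from the question)
x <- data.frame(num1 = 1:4, let = letters[1:4],
                num2 = 10:13, LET = LETTERS[1:4])
indx <- c(1, 1, 2, 2)

# split() on a data frame uses split.data.frame and groups ROWS
by_rows <- split(x, indx)

# split.default() treats the data frame as a list and groups COLUMNS
by_cols <- split.default(x, indx)

dim(by_rows[["1"]])   # rows 1-2, all four columns
dim(by_cols[["1"]])   # all four rows, columns 1-2
names(by_cols[["2"]]) # the remaining two columns
```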
This question already has answers here:
Selecting only numeric columns from a data frame
(12 answers)
Closed 4 years ago.
I would like to extract all columns for which the values are numeric from a dataframe, for a large dataset.
#generate mixed data
dat <- matrix(rnorm(100), nrow = 20)
df <- data.frame(letters[1 : 20], dat)
I was thinking of something along the lines of:
numdat <- df[,df == "numeric"]
That however leaves me without variables. The following gives an error.
dat <- df[,class == "numeric"]
Error in class == "numeric" :
comparison (1) is possible only for atomic and list types
What should I do instead?
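Use sapply with is.numeric, which also catches integer columns, and keep drop = FALSE so the result stays a data frame even if only one column matches:
numdat <- df[, sapply(df, is.numeric), drop = FALSE]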
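Two base R variants of the same idea, as a sketch that assumes data like that in the question. The column name id, the seed, and the names is_num and numdat2 are only there for illustration:

```r
# Mixed data similar to the question: one character column, five numeric ones
set.seed(1)
dat <- matrix(rnorm(100), nrow = 20)
df <- data.frame(id = letters[1:20], dat)

# vapply() guarantees a logical vector with one entry per column
is_num <- vapply(df, is.numeric, logical(1))
numdat <- df[, is_num, drop = FALSE]

# Filter() applies the predicate to each column and keeps the matches
numdat2 <- Filter(is.numeric, df)

names(numdat)
```

vapply is a little stricter than sapply because it fails loudly if the predicate ever returns something other than a single logical value.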
This question already has answers here:
How to split a data frame?
(8 answers)
Closed 5 years ago.
I'm new to R. I have a dataset with names in the first row, the category each name belongs to in the second row, and then price observations for two years from the third row onwards. I want to split the data frame using the categories in the second row. How do I do this?
This is what my dataset looks like (on R):
This is what I want it to look like (in Excel):
Note: I cannot do this in Excel and then import the result because there are far too many categories.
Multiple possibilities:
df <- data.frame(data = c(1:12), category = rep(letters[1:3], 4))
With the subset function:
df_a <- subset(df, category == "a")
With basic data.frame subsetting:
df_a <- df[df$category == "a",]
Into a list:
# Start from an empty list and add one data frame per category
ls <- list()
for(category in unique(df$category)){
ls[[category]] <- df[df$category == category, ]
}
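The loop can also be written without growing a list by hand. A sketch using lapply and setNames, with the same example data; the names cats, by_loop and by_split are illustrative only:

```r
df <- data.frame(data = 1:12, category = rep(letters[1:3], 4))

# Unique category values, as character so they can serve as list names
cats <- unique(as.character(df$category))

# One data frame per category, named after the category
by_loop <- setNames(
  lapply(cats, function(ct) df[df$category == ct, ]),
  cats
)

# split() builds the same grouping in a single call
by_split <- split(df, df$category)

by_loop$b
```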
You have the answer in your question. The split or split.data.frame functions would do it. The second argument is coerced to a factor with as.factor, so either a factor or a character column works.
Example
newdf <- split.data.frame(iris, iris$Species)
newdf
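The same call works when the grouping column is plain character rather than a factor. A small sketch with made-up data; stringsAsFactors = FALSE is set explicitly so the column is character on any R version, and by_category is an illustrative name:

```r
# Grouping column stored as character, not factor
df <- data.frame(data = 1:12,
                 category = rep(letters[1:3], 4),
                 stringsAsFactors = FALSE)
is.character(df$category)

# split() coerces the grouping vector to a factor internally
by_category <- split(df, df$category)

names(by_category)
by_category$a
```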
This question already has answers here:
Sort (order) data frame rows by multiple columns
(19 answers)
Closed 6 years ago.
I am trying to order rows by a variable. I have created a sample data frame below and tried to order the rows but the ordering does not appear to work.
# Create vectors for data frame
score <- rep(seq(1:3), 2)
id <- rep(c(2014, 2015), each = 3)
var_if_1 <- rep(c(0.1, 0.8), each = 3)
var_if_2 <- rep(c(0.9, 0.7), each = 3)
var_if_3 <- rep(c(0.6, 0.2), each = 3)
# Generate and print data frame of raw data
foo <- data.frame(score, id, var_if_1, var_if_2, var_if_3)
foo
# Impose arbitrary ordering
bar <- foo[sample(1:nrow(foo)), ]
bar
# Order rows increasing on 'score'
bar[order(score), ]
What am I doing wrong? Why doesn't this order the rows on score?
You should use
bar[order(bar$score), ]
Otherwise, you are ordering by the standalone vector score from your workspace, which is still in its original, unshuffled order, instead of by the score column of bar.
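A short sketch of the fix, assuming a cut-down version of the data from the question with only two columns. The seed and the names sorted and sorted2 are illustrative. Wrapping the call in with() is one way to refer to columns by bare name, and extra arguments to order() break ties:

```r
# Cut-down version of the example data
score <- rep(1:3, 2)
id <- rep(c(2014, 2015), each = 3)
foo <- data.frame(score, id)

# Shuffle the rows
set.seed(42)
bar <- foo[sample(nrow(foo)), ]

# Order by the column inside bar, not the workspace vector
sorted <- bar[order(bar$score), ]

# with() evaluates score and id inside bar; id breaks ties in score
sorted2 <- bar[with(bar, order(score, id)), ]
sorted2
```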