From a dataframe extract columns with numerical values [duplicate] - r

This question already has answers here:
Selecting only numeric columns from a data frame
(12 answers)
Closed 4 years ago.
I would like to extract all columns for which the values are numeric from a dataframe, for a large dataset.
#generate mixed data
dat <- matrix(rnorm(100), nrow = 20)
df <- data.frame(letters[1 : 20], dat)
I was thinking of something along the lines of:
numdat <- df[,df == "numeric"]
That however leaves me without variables. The following gives an error.
dat <- df[,class == "numeric"]
Error in class == "numeric" :
comparison (1) is possible only for atomic and list types
What should I do instead?

Use sapply to test each column. is.numeric also matches integer columns, which class(x) == "numeric" would miss:
numdat <- df[, sapply(df, is.numeric)]
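As a minimal sketch of the same idea on a small made-up data frame (the column names id, n and value are assumptions for illustration only), vapply can replace sapply so the result is always a logical vector, and drop = FALSE keeps the result a data frame even when only one column is numeric:

```r
# Hypothetical data: one character column, one integer column, one double column
df <- data.frame(id = letters[1:5], n = 1:5, value = c(0.5, 1.5, 2.5, 3.5, 4.5))

# vapply guarantees a logical vector with one entry per column
is_num <- vapply(df, is.numeric, logical(1))

# drop = FALSE keeps a data frame even if only one column matches
numdat <- df[, is_num, drop = FALSE]
names(numdat)
```

Here the id column is left out and both n and value are kept.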

Related

How to loop through a vector of data frame names to print first columns of the df's? [duplicate]

This question already has answers here:
How to extract certain columns from a list of data frames
(3 answers)
Closed 2 years ago.
x is a character vector holding the names of several data frames. I am trying to print the first column of each data frame named in the vector. So far I have tried the following, but neither attempt works.
x = (c('Ethereum,another Df..., another DF...,'))
for (i in x){
print(i[,1])
}
sapply(toString(Ethereum), function(i) print(i[1]))
You can try this
x <- c('Ethereum','anotherDf',...)
for (i in x){
print(get(i)[,1])
}
You can use mget to collect the data frames in a list and then use lapply to extract the first column of each data frame in that list.
data <- lapply(mget(x), `[`, 1)
#Use `[[` to get it as vector.
#data <- lapply(mget(x), `[[`, 1)
A similar solution using purrr::map:
data <- purrr::map(mget(x), `[`, 1)
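For a self-contained illustration of the mget approach, the sketch below builds two small stand-in data frames. The name Bitcoin and all column names and values are hypothetical, since the original data is not shown:

```r
# Hypothetical data frames standing in for the ones named in x
Ethereum <- data.frame(price = c(10, 20, 30), volume = c(1, 2, 3))
Bitcoin  <- data.frame(price = c(100, 200), volume = c(5, 6))

x <- c("Ethereum", "Bitcoin")

# mget() looks the names up and returns a named list of data frames;
# `[[` with index 1 pulls the first column of each as a plain vector
first_cols <- lapply(mget(x), `[[`, 1)
str(first_cols)
```

The list keeps the names from x, so first_cols$Ethereum is the first column of the Ethereum data frame.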

R make a list of dataframes by subsetting from a dataframe [duplicate]

This question already has answers here:
Split a large dataframe into a list of data frames based on common value in column
(3 answers)
Closed 4 years ago.
I have a dataframe like the following
x <- c(1:100)
y <- c("a","b","c","d","e","f","g","h","i","j")
y<-rep(y, each=10)
df<-data.frame(x,y)
I would like to make a list of dataframes by subsetting by values in the y column. The end result would produce the same output as something like this:
df1 <- data.frame(df[df$y=="a",])
df2 <- data.frame(df[df$y=="b",])
...
df10 <- data.frame(df[df$y=="j",])
list <- list(df1,df2.....df10)
... but without all of the repetition. Thanks!
Use split, which returns a named list with one data frame per distinct value of y:
split(df, df$y)
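A short sketch of what split returns for the data in the question. The name df_list is an assumption, chosen to avoid overwriting the base function list:

```r
x <- 1:100
y <- rep(c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"), each = 10)
df <- data.frame(x, y)

# One data frame per distinct value of y, each named after that value
df_list <- split(df, df$y)

length(df_list)       # number of groups
names(df_list)        # group labels
head(df_list[["a"]])  # rows where y == "a"
```

Individual pieces can then be accessed by name, for example df_list[["j"]], instead of creating df1 through df10 by hand.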

Splitting data frame in R [duplicate]

This question already has answers here:
How to split a data frame?
(8 answers)
Closed 5 years ago.
I'm new to R. I have a dataset with names in the first row, the category each name belongs to in the second row, and then price observations for two years from the third row onwards. I want to split the data frame using the categories in the second row. How do I do this?
This is what my dataset looks like (in R):
This is what I want it to look like (in Excel):
Note: I cannot do this on Excel and then import because there are way too many categories.
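There are multiple possibilities.
df <- data.frame(data = c(1:12), category = rep(letters[1:3], 4))
Using the subset function:
df_a <- subset(df, category == "a")
Using basic data.frame subsetting:
df_a <- df[df$category == "a",]
Collecting every category into a list:
ls <- list()  # empty list to fill, one element per category
for(category in unique(df$category)){
# compare against the loop variable, not the fixed value "a"
ls[[category]] <- df[df$category == category, ]
}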
You have the answer in your question. The split or split.data.frame functions would do it. The second argument defines the grouping; if it is not already a factor, it is coerced to one with as.factor.
Example
newdf <- split.data.frame(iris, iris$Species)
newdf
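As a rough check on a small example, the sketch below splits on a plain character column and compares the result with an explicit loop over the categories. The names by_split and by_loop are assumptions for illustration:

```r
# category is deliberately kept as a character column, not a factor
df <- data.frame(data = 1:12, category = rep(letters[1:3], 4),
                 stringsAsFactors = FALSE)

# split() converts the grouping vector with as.factor() internally
by_split <- split(df, df$category)

# Equivalent explicit loop, for comparison
by_loop <- list()
for (category in unique(df$category)) {
  by_loop[[category]] <- df[df$category == category, ]
}

names(by_split)
sapply(by_split, nrow)
all.equal(by_split$a$data, by_loop$a$data)
```

Both approaches give one data frame per category. split is shorter and names the list elements automatically.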

Refactor whole data frame [duplicate]

This question already has answers here:
Drop unused factor levels in a subsetted data frame
(16 answers)
Closed 8 years ago.
I have a data frame that I have read from disk and then applied a filter:
df <- df[ df$x > 10, ]
Question: How can I refactor all factors in the data frame now that several rows have been removed?
The following worked for me:
df <- as.data.frame(lapply(df, function (x) if (is.factor(x)) factor(x) else x))
Source: http://r.789695.n4.nabble.com/Refactor-all-factors-in-a-data-frame-tp826749p826754.html
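A possible alternative, not mentioned in the answer above, is base R's droplevels function, which has a method for data frames and drops unused levels from every factor column. A minimal sketch on made-up data (the column names x and g are assumptions):

```r
# Hypothetical data: a numeric column and a factor with three levels
df <- data.frame(x = c(5, 12, 20, 8), g = factor(c("a", "b", "c", "a")))

df <- df[df$x > 10, ]
levels(df$g)          # the filter keeps all original levels, used or not

df <- droplevels(df)  # drop unused levels in every factor column
levels(df$g)
```

After droplevels, only the levels that still occur in the filtered rows remain.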

R: How to include lm residual back into the data.frame? [duplicate]

This question already has answers here:
Aligning Data frame with missing values
(4 answers)
Closed 6 years ago.
I am trying to put the residuals from lm back into the original data.frame:
fit <- lm(y ~ x, data = mydata, weight = ind)
mydata$resid <- fit$resid
The second line would normally work if the residual vector had the same length as the number of rows of mydata. However, in my case, some of the elements of ind are NA, so the residual vector is usually shorter than the number of rows. Also, fit$resid is a plain "numeric" vector, so there is no label I can use to merge it back into the mydata data.frame. Is there an elegant way to achieve this?
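I think it should be pretty easy if ind is just a vector: find the rows where ind is not NA and assign the residuals only to those rows. This assumes that x and y themselves contain no missing values, so that ind alone determines which rows were dropped.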
sel <- which(!is.na(ind))
mydata$resid <- NA
mydata$resid[sel] <- fit$resid
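Another option, sketched here under assumptions, is to fit with na.action = na.exclude. In that case residuals(fit) pads the dropped rows with NA, so the result lines up with the original rows. The data below is made up, and ind is stored as a column of mydata, which may differ from the original setup:

```r
set.seed(1)

# Hypothetical data; ind is kept as a column so lm() finds it in `data`
mydata <- data.frame(x = 1:10, y = rnorm(10),
                     ind = c(1, 1, NA, 1, 1, NA, 1, 1, 1, 1))

# na.exclude drops incomplete rows for fitting but remembers where they were
fit <- lm(y ~ x, data = mydata, weights = ind, na.action = na.exclude)

# residuals() (unlike fit$residuals) re-inserts NA for the excluded rows
mydata$resid <- residuals(fit)
mydata
```

Note that the extractor function residuals(fit) is needed here; the raw component fit$residuals still holds only the rows used in the fit.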
