R: How to include lm residual back into the data.frame? [duplicate] - r

This question already has answers here:
Aligning Data frame with missing values
(4 answers)
Closed 6 years ago.
I am trying to put the residuals from lm back into the original data.frame:
fit <- lm(y ~ x, data = mydata, weights = ind)
mydata$resid <- fit$resid
The second line would normally work if the residuals had the same length as the number of rows of mydata. However, in my case some of the elements of ind are NA, so there are usually fewer residuals than rows. Also, fit$resid is a plain "numeric" vector, so I don't see a label I could use to merge it back into the mydata data.frame. Is there an elegant way to achieve this?

I think it should be pretty easy if ind is just a vector.
sel <- which(!is.na(ind))
mydata$resid <- NA
mydata$resid[sel] <- fit$resid
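Another option, as a minimal sketch rather than the only way: fit the model with na.action = na.exclude and extract the residuals with residuals(). Rows dropped because of an NA (in ind, y or x) then come back as NA, so the result has one entry per row of mydata. The example data below is made up for illustration.

```r
# Hypothetical example data: y, x and a weight column `ind` with missing values
set.seed(1)
mydata <- data.frame(
  x   = 1:10,
  y   = 2 * (1:10) + rnorm(10),
  ind = c(1, NA, 1, 1, NA, 1, 1, 1, 1, 1)
)

# na.exclude drops incomplete rows for fitting but remembers where they were
fit <- lm(y ~ x, data = mydata, weights = ind, na.action = na.exclude)

# residuals() pads the dropped rows with NA, so the lengths match
mydata$resid <- residuals(fit)
mydata
```

Note that fit$residuals itself is not padded; only the residuals() extractor fills in the NAs.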

Related

Replace outliers of a dataframe with the mean value [duplicate]

This question already has an answer here:
How to replace outlier values?
(1 answer)
Closed 1 year ago.
I want to find all the outliers in a dataframe and replace them by the mean of the variable (column).
This is a big dataframe, composed of 46 obs. of 147 variables.
I was thinking of doing something like
new_df <- for (i in scaled.df){
i[!i %in% boxplot.stats(i)$out]
}
and then replacing the NULL values, but that code creates a NULL object. I believe the reason is that the new vectors won't have the same length.
Any ideas? Thanks.
You can write a function to do this -
replace_outlier_with_mean <- function(x) {
replace(x, x %in% boxplot.stats(x)$out, mean(x))
}
To apply it to multiple columns you can use lapply -
scaled.df[] <- lapply(scaled.df, replace_outlier_with_mean)
Or in dplyr -
library(dplyr)
scaled.df <- scaled.df %>% mutate(across(everything(), replace_outlier_with_mean))
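As a quick check on a made-up vector (not the asker's data), the function replaces the flagged value with the mean of the whole column. Note that this mean is computed with the outlier still included; whether that is desirable depends on the analysis.

```r
replace_outlier_with_mean <- function(x) {
  replace(x, x %in% boxplot.stats(x)$out, mean(x))
}

# Hypothetical toy column with one obvious outlier
x <- c(1, 2, 3, 4, 5, 100)
boxplot.stats(x)$out   # values flagged by the 1.5 * IQR rule

cleaned <- replace_outlier_with_mean(x)
cleaned

# Applied column-wise to a small made-up data frame
toy <- data.frame(a = x, b = c(10, 11, 12, 13, 14, 15))
toy[] <- lapply(toy, replace_outlier_with_mean)
toy
```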

From a dataframe extract columns with numerical values [duplicate]

This question already has answers here:
Selecting only numeric columns from a data frame
(12 answers)
Closed 4 years ago.
I would like to extract all columns for which the values are numeric from a dataframe, for a large dataset.
#generate mixed data
dat <- matrix(rnorm(100), nrow = 20)
df <- data.frame(letters[1 : 20], dat)
I was thinking of something along the lines of:
numdat <- df[,df == "numeric"]
That however leaves me without variables. The following gives an error.
dat <- df[,class == "numeric"]
Error in class == "numeric" :
comparison (1) is possible only for atomic and list types
What should I do instead?
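Use sapply with is.numeric, which also catches integer columns; drop = FALSE keeps the result a data frame even if only one column matches:
numdat <- df[, sapply(df, is.numeric), drop = FALSE]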
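The attempt df == "numeric" compares every cell with the string "numeric" rather than inspecting column types, which is why it selects nothing. A small sketch on the question's data shows what a per-column test returns instead. It uses is.numeric, which also treats integer columns as numeric, and the column name id is made up for readability.

```r
set.seed(42)
dat <- matrix(rnorm(100), nrow = 20)
df <- data.frame(id = letters[1:20], dat)

# One TRUE/FALSE per column: is the column numeric?
is_num <- sapply(df, is.numeric)
is_num

# drop = FALSE keeps a data frame even if only one column matches
numdat <- df[, is_num, drop = FALSE]
str(numdat)
```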

R sum multi columns in df technique [duplicate]

This question already has answers here:
How to sum a variable by group
(18 answers)
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 5 years ago.
I am trying to get the sum of each of the flags F1-F3 in my data frame and keep the same column names, so that I end up with a single row. It looks like I am missing a data frame/numeric conversion somewhere. Can you please advise? I am not sure why I get dim(dfs) = NULL.
df <- data.frame(label=2017, F1=1:4, F2=2:5, F3=3:6)
df
dfs <- c( max(df$label), sum(df$F1), sum(df$F2), sum(df$F3))
#dfs <- data.frame(c( max(df$label), sum(df$F1), sum(df$F2), sum(df$F3)) )
dfs
str(dfs)
dim(dfs)
colnames(dfs) <-c('Label', 'F1','F2','F3')
## Error in `colnames<-`(`*tmp*`, value = c("Label", "F1", "F2", "F3")) :
## attempt to set 'colnames' on an object with less than two dimensions
Your c() creates a vector, not a data frame. If you convert your vector to a one-row data frame with as.data.frame(t(dfs)), you'll be able to set the column names.
You might also be interested in colSums(), or maybe even the R-FAQ "How to sum a variable by group".
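The conversion described above can be sketched as follows, using the question's data. The colSums() variant at the end is one possible way to get the same row and is not taken from the original answer.

```r
df <- data.frame(label = 2017, F1 = 1:4, F2 = 2:5, F3 = 3:6)

# c() returns a plain numeric vector, which has no dim attribute
dfs <- c(max(df$label), sum(df$F1), sum(df$F2), sum(df$F3))
dim(dfs)

# Transpose to a 1 x 4 matrix, then convert to a one-row data frame
dfs_df <- as.data.frame(t(dfs))
colnames(dfs_df) <- c("Label", "F1", "F2", "F3")
dfs_df

# Possible alternative with colSums(); the column names carry over
dfs_alt <- data.frame(Label = max(df$label),
                      t(colSums(df[c("F1", "F2", "F3")])))
dfs_alt
```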

Functionally add new column based on division of two others [duplicate]

This question already has answers here:
Dynamically select data frame columns using $ and a character value
(10 answers)
Closed 7 years ago.
Background
Sorry if this is a repeat, I couldn't find an exact match to this question.
So as part of a larger function, I'm trying to add a new column in a data.frame which is basically the division of two variables within that data.frame.
For example:
data(iris)
iris_test <- function(dataset, var1, var2) {
data <- dataset
data$length_width <- data$var1/data$var2
return(data)
}
If I then call this function
iris <- iris_test(iris, 'Petal.Length', 'Petal.Width')
I expected this to add a new column, data$length_width, but the code fails:
Error in `$<-.data.frame`(`*tmp*`, "length_width", value = numeric(0)) :
replacement has 0 rows, data has 150
I suspect you could do something fancy with paste() or formula(), but really I want to understand what is happening and why.
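You cannot use character variables with the dollar notation: data$var1 looks for a column literally named var1, finds none, and returns NULL. Use double brackets instead: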
data(iris)
iris_test <- function(dataset, var1, var2) {
data <- dataset
data$length_width <- data[[var1]]/data[[var2]]
return(data)
}
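To see what is happening step by step, here is a small sketch on iris. The variable name var1 mirrors the function argument, and result is a name chosen here for illustration.

```r
data(iris)

var1 <- "Petal.Length"

# $ takes the name literally: there is no column called "var1"
is.null(iris$var1)

# [[ ]] evaluates var1 first and then looks up "Petal.Length"
head(iris[[var1]])

# NULL / NULL is numeric(0), which explains "replacement has 0 rows"
length(NULL / NULL)

iris_test <- function(dataset, var1, var2) {
  dataset$length_width <- dataset[[var1]] / dataset[[var2]]
  dataset
}
result <- iris_test(iris, "Petal.Length", "Petal.Width")
head(result$length_width)
```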

Reversing a melting operation with reshape2 [duplicate]

This question already has an answer here:
Simpler way to reconstitute a melted data frame back to the original
(1 answer)
Closed 9 years ago.
Consider the following code.
library (reshape2)
x = rnorm (20)
y = x + rnorm (20, sd = .01)
dfr <- data.frame (x, y)
mlt <- melt (dfr)
When I try to reverse this operation with dcast,
dcast (mlt, value ~ variable)
I get instead a data frame with three columns (not suitable for scatter-plotting, for instance).
How can I recover the original data frame with dcast?
How could R know the ordering that existed before the melt? i.e. the notion that row one of x matches up with row one of y.
If you add an index column (since R will complain about duplicated row.names) you can do this operation simply:
dfr$idx <- seq_along(dfr$x)
mlt <- melt(dfr, id.vars='idx')
dcast(mlt, idx ~ variable, value.var='value')
