This question already has answers here:
Dynamically select data frame columns using $ and a character value
(10 answers)
Closed 7 years ago.
Background
Sorry if this is a repeat, I couldn't find an exact match to this question.
So as part of a larger function, I'm trying to add a new column in a data.frame which is basically the division of two variables within that data.frame.
For example:
data(iris)
iris_test <- function(dataset, var1, var2) {
  data <- dataset
  data$length_width <- data$var1/data$var2
  return(data)
}
If I then call this function:
iris <- iris_test(iris, 'Petal.Length', 'Petal.Width')
I would expect it to generate a new column, data$length_width. Instead, the code fails with:
Error in `$<-.data.frame`(`*tmp*`, "length_width", value = numeric(0)) :
replacement has 0 rows, data has 150
I suspect you could do something fancy with paste() or formula(), but really I want to understand what is happening and why.
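You cannot use a character variable with the dollar notation. data$var1 looks for a column literally named var1; there is none, so it returns NULL, and NULL/NULL gives numeric(0), which is the zero-row replacement in the error message. Use [[ ]] instead, which evaluates its argument: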
data(iris)
iris_test <- function(dataset, var1, var2) {
  data <- dataset
  data$length_width <- data[[var1]]/data[[var2]]
  return(data)
}
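To see the difference directly, here is a minimal sketch on a made-up data frame. The names d, a, and b are illustrative only and do not come from the question.

```r
# Hypothetical toy data frame, for illustration only.
d <- data.frame(a = c(2, 4, 6), b = c(1, 2, 3))
var1 <- "a"
var2 <- "b"

# `$` does not evaluate var1; it looks for a column literally named "var1".
by_dollar <- d$var1

# `[[` evaluates var1 first, so it looks up the column named "a".
by_brackets <- d[[var1]]

# Dividing the two looked-up columns gives one value per row.
ratio <- d[[var1]] / d[[var2]]
```

Here by_dollar is NULL while by_brackets holds the values of column a, which is why only the [[ ]] version produces a column of the right length.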
This question already has answers here:
Add conditions to expand grid in R?
(2 answers)
Closed 1 year ago.
I am currently trying to store these values together in a list. I want to essentially loop through two sequences and save the occurrences where var1 is greater than var2. The print statement appears to work, however, I think I am doing something wrong with saving the result in a list.
var1 <- seq(5, 51, by = 2)
var2 <- seq(2, 10, by = 1)
list <- list()
for (j in seq_along(var1)) {
  for (i in seq_along(var2)) {
    if (var1[j] > var2[i]) {
      list[[i]] <- print(c(var1[j], var2[i]))
    }
  }
}
This is just an example, I am trying to use both variables to plug into a function that filters through data and provides a result. I therefore want a combination of 204 lists of data.
Thanks in advance.
This would probably be easier to store in a data.frame. You can do this in base R (the native pipe |> needs R 4.1 or later) with
expand.grid(var1 = var1, var2 = var2) |>
  subset(var1 > var2)
or with the tidyverse
library(tidyverse)
crossing(var1, var2) %>%
  filter(var1 > var2)
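If a list is still needed, note that the original loop indexes the list by i, which only runs from 1 to 9, so each pass of the outer loop overwrites earlier entries. A sketch of two ways around that follows; the names pairs, pairs2, grid, and k are introduced here for illustration.

```r
var1 <- seq(5, 51, by = 2)
var2 <- seq(2, 10, by = 1)

# Option 1: keep the nested loop, but append with a running counter
# so that every matching pair gets its own slot.
pairs <- list()
k <- 0
for (j in seq_along(var1)) {
  for (i in seq_along(var2)) {
    if (var1[j] > var2[i]) {
      k <- k + 1
      pairs[[k]] <- c(var1[j], var2[i])
    }
  }
}

# Option 2: build all combinations, filter them, then turn each
# remaining row into a length-2 vector inside a list.
grid <- subset(expand.grid(var1 = var1, var2 = var2), var1 > var2)
pairs2 <- Map(c, grid$var1, grid$var2)
```

Both options give 204 elements, matching the count expected in the question. The two lists hold the same pairs in a different order, because expand.grid varies var1 fastest.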
This question already has an answer here:
How to replace outlier values?
(1 answer)
Closed 1 year ago.
I want to find all the outliers in a dataframe and replace them by the mean of the variable (column).
This is a big dataframe, composed of 46 obs. of 147 variables.
I was thinking of doing something like
new_df <- for (i in scaled.df){
i[!i %in% boxplot.stats(i)$out]
and then replacing the NULL values, but that loop creates a NULL object. I believe the reason is that the new vectors won't have the same length.
Any ideas? Thx
You can write a function to do this -
replace_outlier_with_mean <- function(x) {
  replace(x, x %in% boxplot.stats(x)$out, mean(x))
}
To apply for multiple columns you can use lapply -
scaled.df[] <- lapply(scaled.df, replace_outlier_with_mean)
Or in dplyr -
library(dplyr)
scaled.df %>% mutate(across(everything(), replace_outlier_with_mean))
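As a quick check of how this behaves, here is a small sketch on invented data. The data frame toy and its columns u and w are hypothetical. Note that mean(x) is computed over all values, including the outliers themselves.

```r
replace_outlier_with_mean <- function(x) {
  replace(x, x %in% boxplot.stats(x)$out, mean(x))
}

# Hypothetical data: column u has one extreme value, column w has none.
toy <- data.frame(u = c(1, 2, 3, 4, 100),
                  w = c(10, 11, 12, 13, 14))

# Which values boxplot.stats() flags in u (beyond 1.5 * IQR from the hinges).
flagged <- boxplot.stats(toy$u)$out

# Apply the replacement to every column, keeping the data.frame structure.
toy[] <- lapply(toy, replace_outlier_with_mean)
```

In this toy case only the value 100 in u is flagged and replaced by the column mean, and w is left as it was.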
This question already has answers here:
Update data frame via function doesn't work
(6 answers)
Closed 1 year ago.
I have written a function to change values not being NA in each column into a new value. The following example illustrates the problem:
df <- data.frame(A=c(1,NA,1,1,NA),
B=c(NA,1,NA,1,NA),
C=c(1,NA,1,NA,1))
1's should be changed into 0's with the function:
cambio <- function(d,v){
  d[[v]][!is.na(d[[v]])] <- 0
}
The column is named within the function with [[]], and it is passed with quotes as argument to the function. I learned this in a clear and useful response to the post Pass a data.frame column name to a function.
However, after running the function, for example, with the first variable,
cambio(df,"A")
the values of that column remain unchanged.
Why does this function not work as expected?
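You have
d[[v]][!is.na(d[[v]])] <- 0
This does set every non-NA value to zero, but only in the function's local copy of d. The function then returns the value of the assignment, not the modified data frame. You're just missing the return(d) statement:
cambio <- function(d,v){
  d[[v]][!is.na(d[[v]])] <- 0
  return(d)
}
Then assign the result back, for example df <- cambio(df, "A").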
Here are a few base R solutions (each returns a new data frame, so assign the result back to df):
one:
replace(df, df == 1, 0)
two:
replace(df, !is.na(df), 0)
three:
data.frame(lapply(df, pmin, 0))
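Putting this together, here is a minimal sketch of the function approach. It assumes the goal stated in the question (non-NA values become 0) and keeps a copy named original purely for comparison.

```r
df <- data.frame(A = c(1, NA, 1, 1, NA),
                 B = c(NA, 1, NA, 1, NA),
                 C = c(1, NA, 1, NA, 1))
original <- df  # kept only to compare against later

cambio <- function(d, v) {
  # Modify the local copy, then hand it back to the caller.
  d[[v]][!is.na(d[[v]])] <- 0
  d
}

# Calling cambio() alone does not touch df; the result must be assigned.
df <- cambio(df, "A")
one_column <- df

# To process every column, loop over the names and reassign each time.
for (v in names(df)) {
  df <- cambio(df, v)
}
```

After the first call only column A has changed. After the loop, every non-NA value in df is 0 and the NA positions are unchanged.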
This question already has answers here:
Calculating ratios by group with dplyr
(2 answers)
Closed 2 years ago.
I have a problem with a for() loop running over unique site names in a data frame. The function is running, but it keeps on returning NULL result and I cannot find the mistake I must be making. $POP_RELCATEGORY is numeric. The code is:
x <- split(xx, xx$LOCALITY)
testtest <- for(i in length(unique(names(x)))){
  curr_year <- max(x[[i]]$POP_RELCATEGORY[x[[i]]$ROK == 2019])
  prev_year <- max(x[[i]]$POP_RELCATEGORY[x[[i]]$ROK == 2018])
  return(curr_year/prev_year)
}
testtest
The ideal output would be a vector consisting of curr_year/prev_year for each unique site (locality).
Thank you
You don't need to split the data into separate data frames for this task. There are functions available that handle this kind of grouped manipulation. For example, using dplyr on your data frame xx, you can try:
library(dplyr)
xx %>%
  group_by(LOCALITY) %>%
  summarise(curr_year = max(POP_RELCATEGORY[ROK == 2019]),
            prev_year = max(POP_RELCATEGORY[ROK == 2018]),
            result = curr_year/prev_year)
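As for why the original attempt gives NULL: a for loop in R always evaluates to NULL, so assigning it to testtest cannot capture results, and for(i in length(...)) visits only the last index rather than 1 through n. If you prefer to stay in base R with split(), a sketch using sapply() could look like the following. The data values are invented for illustration; only the column names follow the question.

```r
# Hypothetical data; column names follow the question, values are made up.
xx <- data.frame(
  LOCALITY = c("A", "A", "A", "B", "B"),
  ROK = c(2018, 2019, 2019, 2018, 2019),
  POP_RELCATEGORY = c(2, 4, 6, 5, 10)
)

x <- split(xx, xx$LOCALITY)

# sapply() collects the value of the function for each site,
# which is what the for loop could not do.
ratios <- sapply(x, function(site) {
  curr_year <- max(site$POP_RELCATEGORY[site$ROK == 2019])
  prev_year <- max(site$POP_RELCATEGORY[site$ROK == 2018])
  curr_year / prev_year
})
```

The result is a named numeric vector with one ratio per locality.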
This question already has answers here:
Selecting only numeric columns from a data frame
(12 answers)
Closed 4 years ago.
I would like to extract all columns for which the values are numeric from a dataframe, for a large dataset.
#generate mixed data
dat <- matrix(rnorm(100), nrow = 20)
df <- data.frame(letters[1 : 20], dat)
I was thinking of something along the lines of:
numdat <- df[,df == "numeric"]
That however leaves me without variables. The following gives an error.
dat <- df[,class == "numeric"]
Error in class == "numeric" :
comparison (1) is possible only for atomic and list types
What should I do instead?
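Use sapply to test each column, then index with the resulting logical vector. is.numeric is more robust than comparing class(x) to "numeric", since it also catches integer columns:
numdat <- df[, sapply(df, is.numeric)]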
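For context, df == "numeric" compares every cell value with the string "numeric" rather than checking column types, which is why the first attempt selects nothing. Here is a runnable sketch of the column-wise test on the question's data, using is.numeric as the check. The column name id, the seed, and drop = FALSE are additions made here for illustration.

```r
set.seed(1)  # only so the example is reproducible

# Generate mixed data as in the question (first column is character).
dat <- matrix(rnorm(100), nrow = 20)
df <- data.frame(id = letters[1:20], dat)

# One TRUE/FALSE per column: is this column numeric?
is_num <- sapply(df, is.numeric)

# Keep only the numeric columns; drop = FALSE keeps a data frame
# even if just one column qualifies.
numdat <- df[, is_num, drop = FALSE]
```

Here is_num is FALSE for id and TRUE for the five matrix columns, so numdat keeps X1 to X5.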