This question already has answers here:
Calculating ratios by group with dplyr
(2 answers)
Closed 2 years ago.
I have a problem with a for() loop running over unique site names in a data frame. The code runs, but it keeps returning NULL and I cannot find the mistake I must be making. $POP_RELCATEGORY is numeric. The code is:
x <- split(xx, xx$LOCALITY)
testtest <- for(i in length(unique(names(x)))){
  curr_year <- max(x[[i]]$POP_RELCATEGORY[x[[i]]$ROK == 2019])
  prev_year <- max(x[[i]]$POP_RELCATEGORY[x[[i]]$ROK == 2018])
  return(curr_year/prev_year)
}
testtest
The ideal output would be a vector consisting of curr_year/prev_year for each unique site (locality).
Thank you
You don't need to split the data into separate data frames for this task. There are functions available that can do this kind of grouped manipulation for you. For example, using dplyr you can try the following, where df stands for your data frame (xx in the question):
library(dplyr)
df %>%
  group_by(LOCALITY) %>%
  summarise(curr_year = max(POP_RELCATEGORY[ROK == 2019]),
            prev_year = max(POP_RELCATEGORY[ROK == 2018]),
            result = curr_year/prev_year)
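As for why the original loop gives NULL: in R a for loop always evaluates to NULL, so assigning it to testtest cannot capture anything. In addition, for(i in length(...)) visits only the single value returned by length(), and return() is only meaningful inside a function. If you prefer to stay in base R, a minimal sketch with sapply over the split list could look like this (the toy xx below is made up for illustration):

```r
# Toy data standing in for xx; the values are invented for this example
xx <- data.frame(
  LOCALITY = c("A", "A", "A", "B", "B"),
  ROK = c(2018, 2019, 2019, 2018, 2019),
  POP_RELCATEGORY = c(2, 4, 6, 5, 10)
)

# One data frame per locality
x <- split(xx, xx$LOCALITY)

# sapply() collects the value of the function for every list element,
# which is what the for loop could not do
ratios <- sapply(x, function(site) {
  curr_year <- max(site$POP_RELCATEGORY[site$ROK == 2019])
  prev_year <- max(site$POP_RELCATEGORY[site$ROK == 2018])
  curr_year / prev_year
})

ratios  # named numeric vector, one ratio per locality
```

The result is named by locality, so ratios["A"] picks out a single site.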
This question already has answers here:
Add conditions to expand grid in R?
(2 answers)
Closed 1 year ago.
I am currently trying to store these values together in a list. I want to essentially loop through two sequences and save the occurrences where var1 is greater than var2. The print statement appears to work, however, I think I am doing something wrong with saving the result in a list.
var1 <- seq(5, 51, by = 2)
var2 <- seq(2, 10, by = 1)
list <- list()
for (j in seq_along(var1)) {
  for (i in seq_along(var2)) {
    if (var1[j] > var2[i]) {
      list[[i]] <- print(c(var1[j], var2[i]))
    }
  }
}
This is just an example; I am trying to plug both variables into a function that filters data and returns a result. I therefore want a list holding all 204 combinations.
Thanks in advance.
This would probably be easier to store in a data.frame. You can do this in base R (4.1 or later, for the |> pipe) with
expand.grid(var1=var1, var2=var2) |>
  subset(var1 > var2)
or with the tidyverse
library(tidyverse)
crossing(var1, var2) %>%
  filter(var1 > var2)
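If you really need a list rather than a data frame, the filtered pairs can be turned into one. A minimal base R sketch (pairs and pair_list are names chosen here for illustration, not anything from the question):

```r
var1 <- seq(5, 51, by = 2)
var2 <- seq(2, 10, by = 1)

# All combinations, keeping only those where var1 exceeds var2
pairs <- subset(expand.grid(var1 = var1, var2 = var2), var1 > var2)

# One list element per surviving pair, each a length-2 vector c(var1, var2)
pair_list <- Map(c, pairs$var1, pairs$var2)

length(pair_list)  # 204, matching the count in the question
```

Each element can then be passed to your own function with lapply(pair_list, ...).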
This question already has an answer here:
How to replace outlier values?
(1 answer)
Closed 1 year ago.
I want to find all the outliers in a dataframe and replace them by the mean of the variable (column).
This is a big dataframe, composed of 46 obs. of 147 variables.
I was thinking of doing something like
new_df <- for (i in scaled.df){
i[!i %in% boxplot.stats(i)$out]
And then replace the NULL values, but that code creates a NULL object. I believe the reason is that the new vectors won't have the same length.
Any ideas? Thx
You can write a function to do this -
replace_outlier_with_mean <- function(x) {
  replace(x, x %in% boxplot.stats(x)$out, mean(x))
}
To apply it to every column you can use lapply -
scaled.df[] <- lapply(scaled.df, replace_outlier_with_mean)
Or in dplyr -
library(dplyr)
scaled.df <- scaled.df %>% mutate(across(everything(), replace_outlier_with_mean))
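A quick check on a toy vector (values made up for illustration) shows what the function does. Note that mean(x) is computed with the outliers still included. The second helper, replace_outlier_with_clean_mean, is a hypothetical variant, not part of the answer, that averages only the non-outlying values in case that is closer to what you want:

```r
# Function from the answer: outliers are replaced by the mean of the full vector
replace_outlier_with_mean <- function(x) {
  replace(x, x %in% boxplot.stats(x)$out, mean(x))
}

# Hypothetical variant: the mean is taken over the non-outlying values only
replace_outlier_with_clean_mean <- function(x) {
  is_out <- x %in% boxplot.stats(x)$out
  replace(x, is_out, mean(x[!is_out]))
}

x <- c(1, 2, 3, 4, 100)

res_full  <- replace_outlier_with_mean(x)        # 100 is replaced by mean(x) = 22
res_clean <- replace_outlier_with_clean_mean(x)  # 100 is replaced by mean(1:4) = 2.5
```

Both versions return a vector of the same length as the input, which is why they can be applied column by column without the length problem described in the question.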
This question already has answers here:
How to split a data frame?
(8 answers)
Closed 5 years ago.
I'm new to R. I have a dataset with names in the first row, the category the names belong to in the second row, and then price observations for two years from the third row onwards. I want to split the data frame using the categories in the second row. How do I do this?
This is what my dataset looks like (on R):
This is what I want it to look like (on Excel) :
Note: I cannot do this on Excel and then import because there are way too many categories.
Multiple possibilities
df <- data.frame(data = c(1:12), category = rep(letters[1:3], 4))
subset function.
df_a <- subset(df, category == "a")
basic data.frame subset
df_a <- df[df$category == "a",]
into a list
ls <- list()
for(category in unique(df$category)){
  ls[[category]] <- df[df$category == category, ]
}
You have the answer in your question. The split or split.data.frame functions would do it. The second argument defines the grouping; it is treated as a factor, and anything that is not already a factor is coerced with as.factor.
Example
newdf <- split.data.frame(iris, iris$Species)
newdf
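The same call works with a plain character column, since the grouping variable is coerced to a factor internally. A small sketch using the toy df from the earlier answer (by_category is a name chosen here for illustration):

```r
# Same toy data as in the earlier answer; category is a character column
df <- data.frame(data = 1:12, category = rep(letters[1:3], 4))

# One data frame per category, returned as a named list
by_category <- split(df, df$category)

names(by_category)   # "a" "b" "c"
by_category$a$data   # rows 1, 4, 7 and 10 belong to category "a"
```

Because the result is a named list, a single group is available as by_category$a or by_category[["a"]].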
This question already has answers here:
Group Data in R for consecutive rows
(3 answers)
Closed 6 years ago.
I have written a for loop that takes a group of 5 rows from a dataframe and passes it to a function, the function then returns just one row after doing some operations on those 5 rows. Below is the code:
for (i in 1:nrow(features_data1)){
  if (i - start == 4){
    group = features_data1[start:i,]
    group <- as.data.frame(group)
    start <- i+1
    sub_data = feature_calculation(group)
    final_data = rbind(final_data,sub_data)
  }
}
Can anyone please suggest an alternative? The for loop is taking a lot of time, and the function feature_calculation is huge.
Try this for a base R approach:
# convert features to data frame in advance so we only have to do this once
features_df <- as.data.frame(features_data1)
# assign each observation (row) to a group of 5 rows and split the data frame into a list of data frames
group_assignments <- as.factor(rep(1:ceiling(nrow(features_df) / 5), each = 5, length.out = nrow(features_df)))
groups <- split(features_df, group_assignments)
# apply your function to each group individually (i.e. to each element in the list)
sub_data <- lapply(X = groups, FUN = feature_calculation)
# bind your list of data frames into a single data frame
final_data <- do.call(rbind, sub_data)
You might be able to use the purrr and dplyr packages for a speed-up. The latter has a function bind_rows that is much quicker than do.call(rbind, list_of_data_frames) when the list of data frames is large.
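To see the split-and-apply pattern end to end, here is a small self-contained sketch. The feature_calculation below is only a hypothetical stand-in (the real one is not shown in the question), and the toy data are made up. One difference worth knowing: the loop in the question silently skips a trailing group with fewer than 5 rows, while split keeps it, so the sketch also shows how to drop incomplete groups if that matters.

```r
# Toy data: 12 rows, so the last group has only 2 rows
features_df <- data.frame(a = 1:12, b = (1:12) * 2)

# Hypothetical stand-in for the real feature_calculation():
# it reduces each group to a single summary row
feature_calculation <- function(group) {
  data.frame(n = nrow(group), mean_a = mean(group$a), mean_b = mean(group$b))
}

# Rows 1-5 get id 0, rows 6-10 get id 1, and so on
group_id <- (seq_len(nrow(features_df)) - 1) %/% 5
groups <- split(features_df, group_id)

# One row per group, bound into a single data frame
final_data <- do.call(rbind, lapply(groups, feature_calculation))

# Optional: keep only complete groups of 5, as the original loop effectively did
full_groups <- groups[vapply(groups, nrow, integer(1)) == 5]
final_full <- do.call(rbind, lapply(full_groups, feature_calculation))
```

If feature_calculation itself dominates the run time, restructuring the loop will only help so much; profiling that function would be the next step.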
This question already has answers here:
Dynamically select data frame columns using $ and a character value
(10 answers)
Closed 7 years ago.
Background
Sorry if this is a repeat, I couldn't find an exact match to this question.
So as part of a larger function, I'm trying to add a new column in a data.frame which is basically the division of two variables within that data.frame.
For example:
data(iris)
iris_test <- function(dataset, var1, var2) {
  data <- dataset
  data$length_width <- data$var1/data$var2
  return(data)
}
If I then use this function
iris <- iris_test(iris, 'Petal.Length', 'Petal.Width')
I would hopefully generate a new column with data$length_width, however the code is breaking.
Error in `$<-.data.frame`(`*tmp*`, "length_width", value = numeric(0)) :
replacement has 0 rows, data has 150
I suspect you could do something fancy with paste() or formula() but really I want to understand what is happening and why.
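You cannot use a character variable with the dollar notation. $ does not evaluate its right-hand side, so data$var1 looks for a column literally named var1, finds none and returns NULL. NULL/NULL is numeric(0), hence the "replacement has 0 rows" error. Use [[ ]] instead, which does evaluate its argument:
data(iris)
iris_test <- function(dataset, var1, var2) {
  data <- dataset
  data[["length_width"]] <- data[[var1]]/data[[var2]]
  return(data)
}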
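A short check of the difference, using the built-in iris data (iris2 is a name chosen here so the original iris is not overwritten):

```r
data(iris)

iris_test <- function(dataset, var1, var2) {
  data <- dataset
  # [[ ]] evaluates var1 and var2, so the strings passed in select real columns
  data$length_width <- data[[var1]] / data[[var2]]
  return(data)
}

iris2 <- iris_test(iris, "Petal.Length", "Petal.Width")
head(iris2$length_width)

# What the original version was doing: there is no column called "var1",
# so $ returns NULL and NULL / NULL has length zero
is.null(iris$var1)
length(NULL / NULL)
```

The new column has one value per row, and the two diagnostic lines at the end reproduce the cause of the original error.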