loop multiplication on different datasets with same variable names

Hey, I'm sure I'm missing something simple with mapping, but I can't get it to work. I want to use a loop to do the same calculation on multiple dataframes that have the same variable names. Basically, I want this loop to run without throwing an error:
mtcars1 <- mtcars
mtcars2 <- mtcars
for(x in c(mtcars1, mtcars2)){
x$new <- x$mpg * x$cyl
}
So that at the end, both mtcars1 and mtcars2 have a new variable called new, that is mpg * cyl.

If you first put your dataframes into a list, you can index into each using seq_along():
dfs <- list(mtcars1 = mtcars, mtcars2 = mtcars)
for (i in seq_along(dfs)) {
  dfs[[i]]$new <- dfs[[i]]$mpg * dfs[[i]]$cyl
}
Or, using lapply():
dfs <- lapply(dfs, \(x) {
  x$new <- x$mpg * x$cyl
  x
})
Result from either approach:
#> head(dfs$mtcars1)
mpg cyl disp hp drat wt qsec vs am gear carb new
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 126.0
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 126.0
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 91.2
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 128.4
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 149.6
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 108.6
If you really want to leave your dataframes loose in the environment, you could do something like:
for (nm in c("mtcars1", "mtcars2")) {
  x <- get(nm)
  x$new <- x$mpg * x$cyl
  assign(nm, x)
}
Result:
#> head(mtcars1)
mpg cyl disp hp drat wt qsec vs am gear carb new
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 126.0
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 126.0
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 91.2
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 128.4
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 149.6
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 108.6
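A variation on the get()/assign() loop, offered only as a sketch: mget() can gather the loose dataframes into a named list, lapply() applies the calculation, and list2env() writes the results back under the same names. The names nms, env, and dfs are made up for this example.

```r
mtcars1 <- mtcars
mtcars2 <- mtcars

nms <- c("mtcars1", "mtcars2")   # names of the loose data frames
env <- environment()             # the environment the objects live in

# Gather the objects into a named list and add the column to each element
dfs <- lapply(mget(nms, envir = env), function(x) {
  x$new <- x$mpg * x$cyl
  x
})

# Write the modified data frames back under their original names
invisible(list2env(dfs, envir = env))

names(mtcars1)   # now includes "new"
```

This keeps the calculation in one place, so the list-based and loose-object workflows share the same code.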

The for loop does not work the way you expect and, more importantly, neither does c(mtcars1, mtcars2): c() flattens the two dataframes into a single list of their columns. To see this, run:
test <- c(mtcars1, mtcars2)
length(test)
str(test)
You need to replace the c with list. Below is one solution:
mtcars1 <- mtcars
mtcars2 <- mtcars
test <- list(mtcars1, mtcars2)
newtest <- lapply(test, FUN=function(x)
within(x, new <- mpg * cyl))
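To make the two problems concrete, here is a small sketch (the names flat and dfs are chosen only for illustration). It shows what c() produces, and that even with list() a for loop only modifies a copy of each element:

```r
mtcars1 <- mtcars
mtcars2 <- mtcars

# c() on data frames concatenates their columns into one plain list
flat <- c(mtcars1, mtcars2)
length(flat)               # 22 columns, not 2 data frames
is.data.frame(flat[[1]])   # FALSE: each element is a single column

# list() keeps each data frame intact
dfs <- list(mtcars1, mtcars2)
length(dfs)                # 2

# Even with list(), the loop variable is a copy, so this changes nothing
for (x in dfs) {
  x$new <- x$mpg * x$cyl
}
"new" %in% names(dfs[[1]]) # FALSE
```

That is why the solutions here assign the result back, either by index or through lapply().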

We could use transform with lapply. First put the dataframes in a list, then apply transform to each element:
lst1 <- list(mtcars1, mtcars2)
lst1 <- lapply(lst1, transform, new = mpg * cyl)

Related

How to create function to use regular expressions to replace column names in a data frame?

I am feeling lost with how to create a helper function in R that takes the following 3 arguments:
a data frame,
a string pattern, and
a string "replacement pattern".
The function is supposed to replace occurrences of the string pattern in the names of the variables in the data frame with the replacement pattern.
Any guidance, tips or help would be greatly appreciated.

func <- function(x, nm1, nm2, ...) {
  names(x) <- gsub(nm1, nm2, names(x), ...)
  x
}
head(func(mtcars, "c", "C"))
# mpg Cyl disp hp drat wt qseC vs am gear Carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
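Since the question asks about regular expressions, here is a short sketch of the same helper used with an anchored pattern, and with an extra argument passed through ... to gsub(). The name rename_cols and the toy dataframe are hypothetical, not from the original answer.

```r
# rename_cols is a hypothetical name for the same helper as func above
rename_cols <- function(df, pattern, replacement, ...) {
  names(df) <- gsub(pattern, replacement, names(df), ...)
  df
}

# Regular expression: only a leading "d" is replaced
names(rename_cols(mtcars, "^d", "D_"))[3:5]
# "D_isp" "hp" "D_rat"

# fixed = TRUE is forwarded to gsub(), so "." is matched literally
toy <- data.frame(a.b = 1, axb = 2)
names(rename_cols(toy, ".", "_", fixed = TRUE))
# "a_b" "axb"
```

Without fixed = TRUE, the pattern "." would match every character, which is a common surprise when renaming columns.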

Specifying where columns should be placed in r

When I create a new variable, is there a way to specify in the function where to place it?
Right now, it adds it to the end of the dataframe, but for ease of viewing in Excel for example, I'd like to place a new calculated column beside the columns I used for the calculation.
Here's an example of code:
rawdata2 <- (rawdata1 %>% unite(location, locations1,locations2, locations3,
na.rm = TRUE, remove=TRUE)
%>% select(-location7, -location16)
%>% unite(Sector, Sectors, na.rm=TRUE, remove=TRUE)
%>% unite(TypeofSpace, TypesofSpace, type.of.spaceOther, na.rm=TRUE,
remove=TRUE)
)

You can rearrange the columns in your data frame. It looks like you are using dplyr::select in your example.
library(dplyr)
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
mtcars2 <- mtcars %>%
select(mpg, carb, everything()) ## moves carb up behind mpg
head(mtcars2)
# mpg carb cyl disp hp drat wt qsec vs am gear
# Mazda RX4 21.0 4 6 160 110 3.90 2.620 16.46 0 1 4
# Mazda RX4 Wag 21.0 4 6 160 110 3.90 2.875 17.02 0 1 4
# Datsun 710 22.8 1 4 108 93 3.85 2.320 18.61 1 1 4
# Hornet 4 Drive 21.4 1 6 258 110 3.08 3.215 19.44 1 0 3
# Hornet Sportabout 18.7 2 8 360 175 3.15 3.440 17.02 0 0 3
# Valiant 18.1 1 6 225 105 2.76 3.460 20.22 1 0 3
You can do the same thing with base subsetting. For example, in a data frame with 11 columns you can move the 11th column to directly behind the first with
mtcars3 <- mtcars[,c(1,11,2:10)]
identical(mtcars2, mtcars3)
# [1] TRUE

I ended up using relocate, documentation here: dplyr.tidyverse.org/reference/relocate.html
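For readers who want to see it in action, a minimal sketch of relocate() on mtcars (assuming dplyr 1.0.0 or later; the column name new is just an example):

```r
library(dplyr)

# Create the column, then move it next to the columns it was computed from
res <- mtcars %>%
  mutate(new = mpg * cyl) %>%
  relocate(new, .after = cyl)

names(res)[1:3]
# "mpg" "cyl" "new"
```

In the same dplyr versions, mutate() also accepts .before and .after arguments, so the placement can be done in the mutate() call itself.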

write function to replace variable with itself plus 1% of its median

I'm new to writing functions in R, but want to write a function to add 1% of the median of a variable to itself, using dplyr, and replace the variable with this transformation.
x is a numeric variable.
add_median <- function(df, x) {
x <- enquo(x)
x <- quo_name(x)
mutate(x=x+.01*median(x, na.rm=T))
}
When I run newDF <- DF %>% add_median(variable_of_interest), I get the following error:
Error in 0.01 * median(x, na.rm = T) : non-numeric argument to binary operator
What am I doing wrong here?

The original function fails for two reasons: quo_name() turns x into a character string, so 0.01 * median(x) is arithmetic on text, and df is never passed to mutate(). We could change the function to embrace the argument with {{ }} and then use the := operator instead of = in mutate:
library(dplyr)
add_median <- function(df, x) {
  df %>%
    mutate({{x}} := {{x}} + .01 * median({{x}}, na.rm = TRUE))
}
If we need to change multiple columns, use mutate_at:
add_median_multiple <- function(df, vec){
  df %>%
    mutate_at(vars(vec), ~ . + .01 * median(., na.rm = TRUE))
}
Testing:
data(mtcars)
head(mtcars) %>%
add_median(mpg)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.21 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.21 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 23.01 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.61 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.91 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.31 6 225 105 2.76 3.460 20.22 1 0 3 1
Comparison with the original 'mpg' column:
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
add_median_multiple(head(mtcars), c('mpg', 'wt'))
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.21 6 160 110 3.90 2.65045 16.46 0 1 4 4
#Mazda RX4 Wag 21.21 6 160 110 3.90 2.90545 17.02 0 1 4 4
#Datsun 710 23.01 4 108 93 3.85 2.35045 18.61 1 1 4 1
#Hornet 4 Drive 21.61 6 258 110 3.08 3.24545 19.44 1 0 3 1
#Hornet Sportabout 18.91 8 360 175 3.15 3.47045 17.02 0 0 3 2
#Valiant 18.31 6 225 105 2.76 3.49045 20.22 1 0 3 1
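In dplyr 1.0.0 and later, the scoped verbs such as mutate_at() are superseded by across(). A sketch of the multi-column version written that way might look like this (add_median_across is a made-up name, and all_of() is used so the character vector is treated as column names):

```r
library(dplyr)

# Hypothetical across()-based variant of add_median_multiple
add_median_across <- function(df, cols) {
  df %>%
    mutate(across(all_of(cols), ~ .x + 0.01 * median(.x, na.rm = TRUE)))
}

res <- add_median_across(head(mtcars), c("mpg", "wt"))
res$mpg
# 21.21 21.21 23.01 21.61 18.91 18.31
```

The values agree with the mutate_at() output above, since both add 1% of each column's median to that column.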

Convert List To Dataframe Using For Loop And Saving Under Different Names In R

I am trying to convert each of the 52 components of my list to its own dataframe.
Doing this without a for loop looks something like this, which is tedious:
df1 = as.data.frame(list[1])
df2 = as.data.frame(list[2])
df3 = as.data.frame(list[3])
.
.
.
df50 = as.data.frame(list[50])
How do I achieve this using the for loop? My attempt:
for (i in seq_along(list)) {
noquote(paste0("df", i)) = as.data.frame(list[i])
}
Error: target of assignment expands to non-language object
I think I'll have to involve assign.

If you have a list of dataframes, you can name its elements and then use list2env to create them as separate dataframes in the environment.
names(list) <- paste0('df', seq_along(list))
list2env(list, .GlobalEnv)
Using a reproducible example,
temp <- list(mtcars, mtcars)
names(temp) <- paste0('df', seq_along(temp))
list2env(temp, .GlobalEnv)
head(df1)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
head(df2)
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
However, note that:
list is the name of a built-in R function, so it is better to call your variable something else.
As @MrFlick suggested, try to keep your data in a list; a list is easier to manage than numerous separate objects in your global environment.

We can use assign instead of noquote from the OP's function
for (i in seq_along(list)) {
assign(paste0("df", i), value = list[[i]])
}
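If the dataframes can stay in a list, as recommended in the other answer, neither assign nor a for loop is needed. A minimal sketch, where my_list is a hypothetical two-element stand-in for the 52-component list:

```r
# my_list is a hypothetical stand-in for the 52-component list
my_list <- list(mtcars, iris)

# Convert every component and name the results df1, df2, ...
dfs <- lapply(my_list, as.data.frame)
names(dfs) <- paste0("df", seq_along(dfs))

names(dfs)          # "df1" "df2"
sapply(dfs, nrow)   # df1: 32, df2: 150
head(dfs$df1)
```

Each dataframe is then available as dfs$df1, dfs$df2, and so on, and further steps can be applied to all of them with lapply().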

Opposite function to add_rownames in dplyr

As an intermediate step I generate a data frame with one column as character strings and the rest are numbers. I'd like to convert it to a matrix, but first I have to convert that character column into row names and remove it from the data frame.
Is there a simple way to do this in dplyr? A function like to_rownames() that is the opposite of add_rownames()?
I saw a solution using a custom function, but it really doesn't fit the dplyr philosophy.

You can now use the tibble-package:
tibble::column_to_rownames()
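A short sketch of how this could be used for the matrix conversion in the question, assuming the tibble package is installed (the toy dataframe is made up for illustration):

```r
library(tibble)

# Toy data: one character column and two numeric columns
df <- data.frame(a = c("x", "y", "z"), b = 1:3, c = 4:6)

# Move column "a" into the row names, then convert to a matrix
m <- as.matrix(column_to_rownames(df, var = "a"))

rownames(m)   # "x" "y" "z"
colnames(m)   # "b" "c"
```

The reverse direction is tibble::rownames_to_column(), which is the recommended replacement for the deprecated add_rownames().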

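This provides both NSE (non-standard evaluation) and standard-evaluation versions:
library(dplyr)
df <- tibble(a=sample(letters, 4), b=c(1:4), c=c(5:8))
# NSE version: col may be given as a bare name or as a string
reset_rownames <- function(df, col="rowname") {
  stopifnot(is.data.frame(df))
  col <- as.character(substitute(col))
  reset_rownames_(df, col)
}
# Standard-evaluation version: col must be a string
reset_rownames_ <- function(df, col="rowname") {
  stopifnot(is.data.frame(df))
  df <- as.data.frame(df)  # plain data frame; tibbles are not designed to carry row names
  nm <- df[, col]
  df <- df[, !(colnames(df) %in% col), drop = FALSE]
  rownames(df) <- nm
  df
}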
m <- "rowname"
head(as.matrix(reset_rownames(add_rownames(mtcars), "rowname")))
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
head(as.matrix(reset_rownames_(add_rownames(mtcars), m)))
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Perhaps to_rownames() or set_rownames() makes more sense. ¯\_(ツ)_/¯ YMMV.

If you really need a matrix, you can save the character column to a separate variable, drop it, and then create the matrix:
library(dplyr)
df <- tibble(a = sample(letters, 4), b = c(1:4), c = c(5:8))
row_labels <- df %>% pull(a)  # named row_labels to avoid masking the built-in letters
a.matrix <- df %>% select(-a) %>% as.matrix()
Not sure what you are going to do after that, but this gets you as far as you asked for...
