Scale & replace in Dataframe - r

I need to scale some columns out of my dataframe and I found a function which did it perfectly. So my question is how to replace the scaled columns in my dataframe? Is there a specific function for that or am I supposed to do it in a different way?
df <- raw_data %>%
select(CLIENTNUM:Avg_Utilization_Ratio) %>%
rename(Customer_Nr = CLIENTNUM,
Customer_Act = Attrition_Flag,
Total_Product_Count = Total_Relationship_Count)
scaled <- scale(select(df, Customer_Nr, Customer_Age, Dependent_count, Months_on_book,
Total_Product_Count, Months_Inactive_12_mon, Contacts_Count_12_mon, Credit_Limit,
Total_Revolving_Bal, Avg_Open_To_Buy, Total_Amt_Chng_Q4_Q1, Total_Trans_Amt,
Total_Trans_Ct)
, center = T, scale = T)

Create a character vector of column names and apply scale() only to those columns.
cols <- c('Customer_Nr', 'Customer_Age', 'Dependent_count' .....)
df[cols] <- scale(df[cols])
Using the mtcars dataset as an example:
df <- mtcars
cols <- c('mpg', 'disp')
df[cols] <- scale(df[cols])
df
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 0.1509 6 -0.5706 110 3.90 2.62 16.5 0 1 4 4
#Mazda RX4 Wag 0.1509 6 -0.5706 110 3.90 2.88 17.0 0 1 4 4
#Datsun 710 0.4495 4 -0.9902 93 3.85 2.32 18.6 1 1 4 1
#Hornet 4 Drive 0.2173 6 0.2201 110 3.08 3.21 19.4 1 0 3 1
#Hornet Sportabout -0.2307 8 1.0431 175 3.15 3.44 17.0 0 0 3 2
#Valiant -0.3303 6 -0.0462 105 2.76 3.46 20.2 1 0 3 1
#...
#...
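One detail worth knowing when overwriting columns this way: scale() returns a matrix that carries the centering and scaling constants as attributes, and those are lost once the values are written back into the data frame. The sketch below, which assumes you may want to undo the scaling later, stores them first. The names centers, sds and restored are illustrative only.

```r
df <- mtcars
cols <- c("mpg", "disp")

# scale() returns a matrix; grab the constants before they are dropped
scaled  <- scale(df[cols])
centers <- attr(scaled, "scaled:center")  # column means
sds     <- attr(scaled, "scaled:scale")   # column standard deviations

# Overwrite only the selected columns; every other column is untouched
df[cols] <- scaled

# Undo the scaling: multiply by the sd, then add the mean back
restored <- sweep(sweep(as.matrix(df[cols]), 2, sds, "*"), 2, centers, "+")
```

Comparing restored with mtcars[cols] is a quick way to confirm that the replacement hit the intended columns.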

Related

loop multiplication on different datasets with same variable names

Hey, I'm sure I'm missing something simple with mapping, but I can't get it to work. I want to use a loop to do the same calculation for multiple dataframes that have the same variable names. Basically, I want this loop to not throw an error:
mtcars1 <- mtcars
mtcars2 <- mtcars
for(x in c(mtcars1, mtcars2)){
x$new <- x$mpg * x$cyl
}
So that at the end, both mtcars1 and mtcars2 have a new variable called new, that is mpg * cyl.
If you first put your dataframes into a list, you can index into each using seq_along():
dfs <- list(mtcars1 = mtcars, mtcars2 = mtcars)
for (i in seq_along(dfs)) {
dfs[[i]]$new <- dfs[[i]]$mpg * dfs[[i]]$cyl
}
Or, using lapply():
dfs <- lapply(dfs, \(x) {
x$new <- x$mpg * x$cyl
x
})
Result from either approach:
#> head(dfs$mtcars1)
mpg cyl disp hp drat wt qsec vs am gear carb new
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 126.0
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 126.0
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 91.2
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 128.4
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 149.6
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 108.6
If you really want to leave your dataframes loose in the environment, you could do something like
for (nm in c("mtcars1", "mtcars2")) {
x <- get(nm)
x$new <- x$mpg * x$cyl
assign(nm, x)
}
Result:
#> head(mtcars1)
mpg cyl disp hp drat wt qsec vs am gear carb new
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 126.0
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 126.0
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 91.2
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 128.4
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 149.6
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 108.6
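If the get()/assign() loop feels clumsy, a possible variation is to collect the loose data frames with mget(), transform them as a list, and write them back with list2env(). This is only a sketch; env is a helper name introduced here, and it simply points at the environment the code runs in.

```r
mtcars1 <- mtcars
mtcars2 <- mtcars

# environment() is the current environment (the global one at top level)
env <- environment()

# Collect the loose data frames into a named list
dfs <- mget(c("mtcars1", "mtcars2"), envir = env)

# Same calculation for every data frame in the list
dfs <- lapply(dfs, function(x) {
  x$new <- x$mpg * x$cyl
  x
})

# Write each list element back under its own name
invisible(list2env(dfs, envir = env))

names(mtcars1)
```

This keeps the actual work in list form, so the environment is only touched at the start and at the end.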
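The for loop does not do what you expect, and more importantly, neither does c(mtcars1, mtcars2): because data frames are lists of columns, c() concatenates them into one flat list of 22 vectors instead of a list of two data frames. To see this, run: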
test <- c(mtcars1, mtcars2)
length(test)
str(test)
You need to replace the c with list. Below is one solution:
mtcars1 <- mtcars
mtcars2 <- mtcars
test <- list(mtcars1, mtcars2)
newtest <- lapply(test, FUN=function(x)
within(x, new <- mpg * cyl))
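To make the difference between c() and list() concrete, here is a small comparison. The names flat and nested are illustrative only.

```r
mtcars1 <- mtcars
mtcars2 <- mtcars

flat   <- c(mtcars1, mtcars2)     # concatenates the columns of both data frames
nested <- list(mtcars1, mtcars2)  # keeps each data frame as one element

length(flat)                # 22: 11 columns from each data frame
is.data.frame(flat[[1]])    # FALSE: the first element is the mpg vector
length(nested)              # 2
is.data.frame(nested[[1]])  # TRUE
```

A for loop over flat therefore hands a plain numeric vector to x, which is why x$mpg fails.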
We could use transform() with lapply(). First put the data frames in a list:
lst1 <- list(mtcars1, mtcars2)
lst1 <- lapply(lst1, transform, new = mpg * cyl)

R edit subset of data frame and override original row-values

I have a dataframe, where I extract a certain subset:
tmp <- mtcars |> select(disp, hp)
then I make some data manipulation
tmp$disp <- tmp$disp*0
tmp$hp <- tmp$hp*2
Now I want to reintegrate the changes into the original
How?
Of course I could work on the original df in the first place but I just want to know how to replace all values from a df by a subset.
I want to keep the order of the column names and if possible I don't want to use any index.
I also assume there are use cases where the select query is long.
You need to select names in mtcars that match with names in tmp and then replace values.
mtcars[,names(tmp)] <- tmp
head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 0 220 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 0 220 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 0 186 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 0 220 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 0 350 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 0 210 2.76 3.460 20.22 1 0 3 1
Or instead of creating 'tmp',
library(dplyr)
mtcars <- mtcars %>%
mutate(disp = disp*0, hp = hp*2)
Or in data.table:
library(data.table)
setDT(mtcars)[, c("disp", "hp") := .(0, hp * 2)]
Or in base R
mtcars[c("disp", "hp")] <- list(0, mtcars$hp*2)
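Another answer (with dplyr loaded) is to pass tmp to mutate() directly; an unnamed data frame argument replaces the columns with matching names: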
mtcars <- mutate(mtcars, tmp)
edit: add this solution, which could be more intuitive
newdf <- mtcars |> mutate(tmp)
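Replacing by name silently assumes that the subset still lines up with the original. A minimal base R sketch with two guards is shown below; it works on a copy called df (a name introduced here) so that mtcars itself stays intact.

```r
df  <- mtcars
tmp <- df[c("disp", "hp")]   # the extracted subset

tmp$disp <- tmp$disp * 0
tmp$hp   <- tmp$hp * 2

# Guards: every column of tmp exists in df, and the rows are still aligned
stopifnot(all(names(tmp) %in% names(df)))
stopifnot(identical(rownames(tmp), rownames(df)))

# Replace by name; column order in df is unchanged
df[names(tmp)] <- tmp

identical(names(df), names(mtcars))  # TRUE
```

If tmp had been filtered or reordered in between, the second check would stop before any values were overwritten.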

How do you rename a column with dplyr using a character object [duplicate]

I'd like to rename a column using dplyr in a dynamic way by using a variable. However, it just names the column what the variable is called, rather than its content. Any ideas??
colnames(y)
[1] "time" "channel_1" "channel_2" "channel_3" "channel_4" "channel_5" "correction" "channel.corr"
ladder.channel <- "channel_4"
fsa.bc <- y %>%
select(-c(all_of(ladder.channel), "correction")) %>%
rename(ladder.channel = channel.corr)
colnames(fsa.bc)
"time" "channel_1" "channel_2" "channel_3" "channel_5" "ladder.channel"
Making the question reproducible:
library(dplyr)
y <- mtcars[, 1:8]
names(y) <- c("time", "channel_1", "channel_2", "channel_3", "channel_4", "channel_5", "correction", "channel.corr")
ladder.channel <- "channel_4"
fsa.bc <-
y %>%
select(-c(all_of(ladder.channel), "correction"))%>%
rename({{ladder.channel}} := channel.corr)
names(fsa.bc)
#> [1] "time" "channel_1" "channel_2" "channel_3" "channel_5" "channel_4"
Created on 2021-09-01 by the reprex package (v2.0.0)
You can use rename_with -
library(dplyr)
y <- mtcars
new_name <- 'MPG'
y %>% rename_with(~new_name, mpg) %>% head
# MPG cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
In base R -
names(y)[match('mpg', names(y))] <- new_name
y
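For the data in the question, the whole pipeline can also be sketched in base R without any tidy evaluation. The helper name keep is introduced here for illustration.

```r
y <- mtcars[, 1:8]
names(y) <- c("time", "channel_1", "channel_2", "channel_3",
              "channel_4", "channel_5", "correction", "channel.corr")
ladder.channel <- "channel_4"

# Drop the ladder channel and the correction column by name
keep   <- setdiff(names(y), c(ladder.channel, "correction"))
fsa.bc <- y[keep]

# Rename channel.corr to the value stored in ladder.channel
names(fsa.bc)[names(fsa.bc) == "channel.corr"] <- ladder.channel

names(fsa.bc)
```

Because the new name is used as an ordinary character value on the right-hand side, nothing has to be unquoted.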

Turning data manipulation into a function in R

I have downloaded an .ods file from this website (UK office for national statistics). Because of the way the sheet is structured, I import it as two separate dataframes:
library(readODS)
income_pretax <- read_ods('/Users/c.robin/Downloads/NS_Table_3_1a_1819.ods', range = "A4:U103")
income_posttax <- read_ods('/Users/c.robin/Downloads/NS_Table_3_1a_1819.ods', range = "A104:U203")
I want to do some cleaning on both dataframes: changing the names of two of the variables and recasting one of the variables as numeric. This is what I have for this, which works on a single df:
income_pretax <- income_pretax %>%
rename(pp_tot_income_pretax = 'Percentile point\nTotal income before tax',
'2008-09' = '2008-09(a)')
income_pretax['2008-09'] <- as.numeric(income_pretax$'2008-09')
I'm struggling to get the above into a function though. I think it should be something like the below, but honestly I have no idea how to tell R I'm passing multiple dataframes to the function, nor how to handle multiple variables. Can anyone advise on this?
##Attempting a function
cleanvars <- function(data, varlist){
data <- data %>%
rename(pp_tot_income_pretax = {{varlist}})
data['2008-09'] <- as.numeric(data$'2008-09')
}
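You can pass a named vector (new name = old name) to the function. Wrapping it in all_of() tells rename() that varlist is an external character vector rather than a column name.
library(dplyr)
cleanvars <- function(data, varlist){
data %>% rename(all_of(varlist))
}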
cleanvars(mtcars %>% head, c('new_mpg' = 'mpg', 'new_cyl' = 'cyl'))
# new_mpg new_cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
We can do this in base R
nm1 <- c('mpg', 'cyl')
nm2 <- paste0("new_", nm1)
i1 <- match(nm1, names(mtcars))
names(mtcars)[i1] <- nm2
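Two parts of the question are not covered above: the numeric conversion and running the same cleaning on several data frames. Note also that the attempted function ends with an assignment, so it returns the converted column rather than the data frame. The sketch below is one possible base R shape; clean_vars, its arguments, and the toy data frames are all hypothetical stand-ins for the imported sheets.

```r
# renames: named character vector of the form new_name = "old_name"
# numeric_cols: columns (by their new names) to convert to numeric
clean_vars <- function(data, renames, numeric_cols) {
  idx <- match(renames, names(data))
  if (anyNA(idx)) {
    stop("columns not found: ", paste(renames[is.na(idx)], collapse = ", "))
  }
  names(data)[idx] <- names(renames)
  data[numeric_cols] <- lapply(data[numeric_cols], as.numeric)
  data  # return the whole data frame, not just the last assignment
}

# Toy stand-ins for the two imported tables (made-up values)
toy_pre <- data.frame("Percentile point" = 1:3,
                      "2008-09(a)" = c("10", "20", "30"),
                      check.names = FALSE, stringsAsFactors = FALSE)
toy_post <- toy_pre

# Apply the same cleaning to every data frame in a named list
cleaned <- lapply(
  list(pre = toy_pre, post = toy_post),
  clean_vars,
  renames = c(pp = "Percentile point", "2008-09" = "2008-09(a)"),
  numeric_cols = "2008-09"
)

str(cleaned$pre)
```

Keeping the data frames in a named list means the function never needs to know how many tables there are.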

turning list into dataframe column in R function

I have a function of the (very simplified) form below.
doThing <- function(df){
result <- c() #Sets up dataframe to store lists
totals <- rowSums(df) #Creates rowsum list from df column
result[1] <- totals #intended to make totals a column in result. does not work
}
How do I assign the list created in this function to a column in my result dataset? I've also tried the following use of the assign function, to no avail
assign(result[1], totals)
Thank you all!
You could assign the row-wise sum as a new column of the data frame.
doThing <- function(df){
transform(df, total= rowSums(df))
}
doThing(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb total
#Mazda RX4 21.0 6 160.0 110 3.90 2.62 16.5 0 1 4 4 329
#Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.88 17.0 0 1 4 4 330
#Datsun 710 22.8 4 108.0 93 3.85 2.32 18.6 1 1 4 1 260
#Hornet 4 Drive 21.4 6 258.0 110 3.08 3.21 19.4 1 0 3 1 426
#Hornet Sportabout 18.7 8 360.0 175 3.15 3.44 17.0 0 0 3 2 590
#...
#...
Alternatively, assign the column with [[ and return the data frame:
doThing <- function(df) {
df[["total"]] <- rowSums(df)
df
}
doThing(mtcars)
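Both versions assume every column is numeric, since rowSums() stops with an error otherwise. A possible variant that sums only the numeric columns is sketched below; add_total is a name introduced here, and iris is used only because it contains a factor column.

```r
add_total <- function(df) {
  # Flag the numeric columns so that factors or characters are skipped
  num <- vapply(df, is.numeric, logical(1))
  df[["total"]] <- rowSums(df[num])
  df
}

out <- add_total(iris)
head(out)
```

The Species column is carried along unchanged, and total holds the sum of the four measurement columns.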

Resources