I have a dataframe, where I extract a certain subset:
tmp <- mtcars |> select(disp, hp)
Then I do some data manipulation:
tmp$disp <- tmp$disp*0
tmp$hp <- tmp$hp*2
Now I want to reintegrate the changes into the original data frame. How?
Of course I could work on the original df in the first place but I just want to know how to replace all values from a df by a subset.
I want to keep the order of the column names and if possible I don't want to use any index.
I also assume there are use cases where the select query is long.
You need to select names in mtcars that match with names in tmp and then replace values.
mtcars[,names(tmp)] <- tmp
head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 0 220 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 0 220 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 0 186 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 0 220 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 0 350 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 0 210 2.76 3.460 20.22 1 0 3 1
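To check that the name-based assignment keeps the column order and touches only the selected columns, here is a small base R sketch. It works on a copy (`cars`, a name chosen here for illustration) so the built-in `mtcars` stays intact.

```r
# Work on a copy so the built-in dataset is left untouched
cars <- mtcars

# Extract a subset by name and modify it
tmp <- cars[c("disp", "hp")]
tmp$disp <- tmp$disp * 0
tmp$hp   <- tmp$hp * 2

# Write the modified columns back, matched by name rather than position
cars[names(tmp)] <- tmp

# Column order is unchanged and untouched columns are identical
identical(names(cars), names(mtcars))   # TRUE
identical(cars$mpg, mtcars$mpg)         # TRUE
```

Because the assignment uses `names(tmp)`, it does not matter how long or complicated the original selection was.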
Or instead of creating 'tmp',
library(dplyr)
mtcars <- mtcars %>%
mutate(disp = disp*0, hp = hp*2)
Or in `data.table`:
library(data.table)
setDT(mtcars)[, c("disp", "hp") := .(0, hp * 2)]
Or in base R
mtcars[c("disp", "hp")] <- list(0, mtcars$hp*2)
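Another answer: pass the data frame itself to mutate(). An unnamed data frame argument replaces the columns with matching names: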
mtcars <- mutate(mtcars, tmp)
edit: add this solution, which could be more intuitive
newdf <- mtcars |> mutate(tmp)
I have downloaded an .ods file from this website (UK office for national statistics). Because of the way the sheet is structured, I import it as two separate dataframes:
library(readODS)
income_pretax <- read_ods('/Users/c.robin/Downloads/NS_Table_3_1a_1819.ods', range = "A4:U103")
income_posttax <- read_ods('/Users/c.robin/Downloads/NS_Table_3_1a_1819.ods', range = "A104:U203")
I want to do some cleaning on both dataframes: changing the name of the two of the variables and recasting one of the variables as numeric. This is what I have for this, which works on a single df:
income_pretax <- income_pretax %>%
rename(pp_tot_income_pretax = 'Percentile point\nTotal income before tax',
'2008-09' = '2008-09(a)')
income_pretax['2008-09'] <- as.numeric(income_pretax$'2008-09')
I'm struggling to get the above into a function though. I think it should be something like the below, but honestly I have no idea how to tell R I'm passing multiple dataframes to the function, nor how to handle multiple variables. Can anyone advise on this?
##Attempting a function
cleanvars <- function(data, varlist){
data <- data %>%
rename(pp_tot_income_pretax = {{varlist}})
data['2008-09'] <- as.numeric(data$'2008-09')
}
You can pass a named vector to the function.
library(dplyr)
cleanvars <- function(data, varlist){
  # all_of() marks varlist as an external named vector: c(new_name = "old_name")
  data %>% rename(all_of(varlist))
}
cleanvars(mtcars %>% head, c('new_mpg' = 'mpg', 'new_cyl' = 'cyl'))
# new_mpg new_cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
We can do this in base R
nm1 <- c('mpg', 'cyl')
nm2 <- paste0("new_", nm1)
i1 <- match(nm1, names(mtcars))
names(mtcars)[i1] <- nm2
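The question also asks how to apply the same cleaning to several data frames. One common approach is to put them in a list and use `lapply()`. The sketch below is base R only; `clean_vars`, `renames` and `numeric_cols` are hypothetical names, and the two toy data frames merely stand in for the imported tables. Note that the function ends by returning `data`, which the attempt in the question does not do.

```r
# Hypothetical helper: rename columns and convert some of them to numeric.
# `renames` is a named vector of the form c(new_name = "old_name").
clean_vars <- function(data, renames, numeric_cols = character()) {
  idx <- match(renames, names(data))
  if (anyNA(idx)) stop("column(s) not found: ", toString(renames[is.na(idx)]))
  names(data)[idx] <- names(renames)
  for (col in numeric_cols) {
    data[[col]] <- as.numeric(data[[col]])
  }
  data  # return the modified data frame
}

# Toy stand-ins for the two imported tables (values are made up)
pretax  <- data.frame(pct = 1:3, `2008-09(a)` = c("10", "20", "30"),
                      check.names = FALSE, stringsAsFactors = FALSE)
posttax <- data.frame(pct = 1:3, `2008-09(a)` = c("9", "18", "27"),
                      check.names = FALSE, stringsAsFactors = FALSE)

# Apply the same cleaning to every data frame in the list
cleaned <- lapply(
  list(pretax = pretax, posttax = posttax),
  clean_vars,
  renames = c("2008-09" = "2008-09(a)"),
  numeric_cols = "2008-09"
)
str(cleaned$pretax)
```

The result is a named list, so each cleaned table is available as `cleaned$pretax` or `cleaned$posttax`.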
I have a dataframe with multiple columns of a numeric type, where I want to query if a range of values exist in any of them, and bring back a true/false binary flag with as.numeric.
So I can do this the long way with:
df <- df %>%
mutate(flag = as.numeric(days_dry %in% c(1:28) |
days_frozen %in% c(1:28) |
days_fresh %in% c(1:28)))
But I have a bunch of columns I want to query. Why can't I bring back the same result with this?:
df <- df %>%
mutate(flag = as.numeric(vars(starts_with("days_")) %in% c(1:28)))
I get no error, but it doesn't bring back any cases which match the criteria.
There might be a better way, but ...
mtcars %>%
mutate(flag = rowSums(sapply(cbind(select(., starts_with("c"))), `%in%`, 4:6)) > 0) %>%
head()
# mpg cyl disp hp drat wt qsec vs am gear carb flag
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 TRUE
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 TRUE
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 TRUE
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 TRUE
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 FALSE
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 TRUE
The premise is using cbind(select(., <>)) to form a mid-pipe inner frame. From there, we sapply over its columns, converting them to columns of logicals. The last step is using rowSums(.) > 0 to determine if a row has at least one TRUE. An alternative to rowSums is Reduce(`|`, ...); while that is elegant in a list-processing kind of way, it is also slower (especially with multiple matching columns).
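As far as I can tell, the `vars()` attempt returns nothing because `vars()` produces a list of quoted expressions rather than column values, so `%in%` never sees the data. A minimal base R sketch of the same idea, on toy data with made-up values (only the `days_` column names come from the question), comparing the `Reduce()` and `rowSums()` variants:

```r
# Toy data with hypothetical values; column names follow the question
df <- data.frame(
  id          = 1:4,
  days_dry    = c(0, 30, 5, NA),
  days_frozen = c(40, 0, 0, 0),
  days_fresh  = c(0, 28, 0, 0)
)

# Select the columns by name prefix
day_cols <- df[startsWith(names(df), "days_")]

# One logical vector per column, then combine them row-wise with OR
hits <- lapply(day_cols, `%in%`, 1:28)
df$flag <- as.numeric(Reduce(`|`, hits))

# The rowSums() variant gives the same result
flag2 <- as.numeric(rowSums(sapply(day_cols, `%in%`, 1:28)) > 0)
identical(df$flag, flag2)   # TRUE
df$flag                     # 0 1 1 0
```

Note that `%in%` treats `NA` as a non-match, so the last row gets 0 rather than `NA`.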
I think I am missing a fundamental concept about R's data frames.
head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
The names of the cars here. Is this a column? I don't think so, because I am not able to access them via mtcars[,1]. And there is no column name/header for it.
How could I create a data frame like that? How could I use that special column e.g. to describe the data in a plot for example?
They are row names. To access them, use:
rownames(mtcars)
For column names use colnames, to see both row and column names, we can use:
dimnames(mtcars)
To modify, for example the first row:
rownames(mtcars)[1] <- "myNewName"
When a data frame is created with data.frame(), row names default to the integers 1 to n:
mydata <- data.frame(x = 1:5)
Then we can modify them:
rownames(mydata) <- paste0("MyName", 1:5)
Or we can add rownames when creating the data.frame:
mydata <- data.frame(x = 1:5, row.names = paste0("MyName", 1:5))
Note:
Row names are not very reliable; for example, see this post. (This may be a subjective opinion; I avoid them by moving row names into a column.)
The data.table and dplyr packages prefer not to have them. You can always copy row names into a column:
mydata$myNames <- rownames(mydata)
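To make the difference between row names and ordinary columns concrete, here is a short base R sketch. It uses a copy called `cars` and a column called `model`; both names are only illustrative.

```r
# Row names act as labels for rows, so they can be used for indexing
mtcars["Valiant", "mpg"]                 # 18.1
mtcars[c("Datsun 710", "Valiant"), 1:3]  # two rows, first three columns

# Move the row names into an ordinary column on a copy
cars <- mtcars
cars$model <- rownames(cars)
rownames(cars) <- NULL                   # reset to the default 1..n

# The labels are now regular data, usable e.g. as plot labels
head(cars[c("model", "mpg")], 3)
```

Once the names live in a column, they survive operations that drop or ignore row names.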
A shorter one-liner with the data.table package makes the row names a column:
library(data.table)
setDT(mtcars, keep.rownames = TRUE)[]
head(mtcars)
rn mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
This works too using tibble.
library(tibble)
mtcars %>%
rownames_to_column(var="carnames")
How could you create a data frame like that?
You can turn a column into row names using the textshape package. See the example below:
# column to row names
library(textshape)
state_dat <- data.frame(state.name, state.area, state.center, state.division)
column_to_rownames(state_dat)
#making 'state.name' to row names in new data 'new_state_dat'
new_state_dat<-column_to_rownames(state_dat, 'state.name')
I advise against using row.names() to turn a column into row names.
How could I use that special column e.g. to describe the data in a plot for example?
You can use the superheat package; for more information, see https://rlbarter.github.io/superheat/index.html. It is simpler and more powerful if you use the textshape package instead of row.names() to turn a column into row names.
I have a set of data frames - let us say called report_001, report_002, report_003 and so on - I have the names of them in a character vector such as:
n <- c('report_001', 'report_002', 'report_003')
I need to turn this into a list of data frames as follows:
dfList <- list(report_001 = report_001, report_002 = report_002, report_003 = report_003)
So that I can index like this:
dfList[['report_002']]
However, since I have a large number of data frames, I don't want to do this manually. Something like this has not worked:
dfList <- sapply(n, function(x) assign(x, as.name(x)))
For this question, what those data frames are is not important. To keep things simple, I can have:
report_001 <- mtcars
report_002 <- mtcars
report_003 <- mtcars
How can I achieve auto conversion of my names of data frames into a list of data frames of same name indices?
report_001 <- mtcars
report_002 <- mtcars
report_003 <- mtcars
n <- c('report_001', 'report_002', 'report_003')
# mget() looks up each name in n and returns a named list of the objects
dfList <- mget(n)
head(dfList[['report_001']])
# mpg cyl disp hp drat wt qsec vs am
# Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1
# Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1
# Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1
# Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0
# Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0
# Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0
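If even the vector of names should not be typed by hand, one option is to collect the names with `ls()` and a pattern. This is a sketch that assumes all objects matching the pattern in the current environment are the data frames of interest:

```r
report_001 <- mtcars
report_002 <- mtcars
report_003 <- mtcars

# Find the object names by pattern instead of typing them out
n <- ls(pattern = "^report_[0-9]+$")

# mget() looks up each name and returns a named list
dfList <- mget(n)
names(dfList)

# Work on all data frames at once, e.g. count the rows of each
vapply(dfList, nrow, integer(1))
```

From here, `lapply()` or `vapply()` over `dfList` replaces any code that would otherwise be repeated per data frame.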