I think I am missing a fundamental concept about R's data frames.
head(mtcars)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
What about the names of the cars here: is this a column? I don't think so, because I am not able to access them via mtcars[,1], and there is no column name/header for it.
How could I create a data frame like that? How could I use that special column, e.g. to describe the data in a plot?
They are row names. To access them, use:
rownames(mtcars)
For column names use colnames. To see both row and column names, we can use:
dimnames(mtcars)
To modify them, for example the first row name:
rownames(mtcars)[1] <- "myNewName"
When a data frame is created with data.frame() and no row names are supplied, the rows are simply numbered 1 to n.
mydata <- data.frame(x = 1:5)
Then we can modify them:
rownames(mydata) <- paste0("MyName", 1:5)
Or we can add rownames when creating the data.frame:
mydata <- data.frame(x = 1:5, row.names = paste0("MyName", 1:5))
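Once a data frame has row names, they can be used to select rows by label, much like column names select columns. A minimal sketch in base R, reusing the mydata example from above (the variable names row3, some_rows and has_row are only illustrative):

```r
# Build a small data frame with explicit row names (same as above)
mydata <- data.frame(x = 1:5, row.names = paste0("MyName", 1:5))

# Select a single row by its row name; drop = FALSE keeps it a data frame
row3 <- mydata["MyName3", , drop = FALSE]

# Select several rows by name, in the order requested
some_rows <- mydata[c("MyName5", "MyName1"), , drop = FALSE]

# Check whether a given label exists before indexing with it
has_row <- "MyName3" %in% rownames(mydata)
```

Indexing with a label that does not exist returns a row of NA values rather than an error, so the membership check can be worth doing first.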
Note:
Row names are not very reliable, for example see this post. (This could be a subjective opinion; I avoid them by moving row names into a column.)
The data.table and dplyr packages prefer not to have them. You can always copy the row names into a column:
mydata$myNames <- rownames(mydata)
A shorter one-liner with the data.table package will make the row names a column (named rn by default):
library(data.table)
setDT(mtcars, keep.rownames = TRUE)[]
head(mtcars)
rn                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
This also works with tibble:
library(tibble)
mtcars %>%
  rownames_to_column(var = "carnames")
"How could I create a data frame like that?"
You can turn a column into row names with the textshape package. See the example below.
library(textshape)
state_dat <- data.frame(state.name, state.area, state.center, state.division)
# column to row names
column_to_rownames(state_dat)
# make 'state.name' the row names in a new data frame 'new_state_dat'
new_state_dat <- column_to_rownames(state_dat, 'state.name')
I advise against using row.names() to turn a column into row names.
"How could I use that special column e.g. to describe the data in a plot for example?"
You can use the superheat package; for more information, see https://rlbarter.github.io/superheat/index.html. In my view it is simpler and more powerful to use the textshape package than row.names() to turn a column into row names.
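For the plotting part of the question, base graphics is also enough to put row names on a plot. A minimal sketch, assuming a plain scatter plot of wt against mpg is what is wanted (the choice of those two columns and the copy named cars are only for illustration):

```r
# Work on a fresh copy of the built-in data so the row names are intact
cars <- datasets::mtcars

# Scatter plot of weight against fuel economy
plot(cars$wt, cars$mpg, xlab = "wt", ylab = "mpg")

# Label each point with its row name (the car model);
# pos = 4 places the label to the right of the point
text(cars$wt, cars$mpg, labels = rownames(cars), pos = 4, cex = 0.6)
```

Because text() takes any character vector as labels, the same call works if the names have been moved into an ordinary column instead.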
Is there a way to swap the order of the first 2 columns in a data frame with 100+ columns?
All the methods online require you to input the order yourself and with 100 columns that's a bit too tedious.
Example solution being
dfrm <- dfrm[c("2", "3", "1", "4")]
However, with my large data frame this solution is impractical. I want to keep the order of all columns and only swap the first two, so that column 2 moves to column 1's position. The software I'm using requires column 1 to be sampleID, which I currently have as column 2, and that leads to an error.
Thanks
You can consider the following. dfrm is your target data frame.
dfrm <- dfrm[, c(2, 1, 3:ncol(dfrm))]
Since 3:ncol(dfrm) maintains the same column index as the original data frame, this code will preserve all the column order except the first two columns.
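One edge case worth knowing: if the data frame has exactly two columns, 3:ncol(dfrm) evaluates to c(3, 2) and the indexing fails. A small sketch of a wrapper that avoids this, where swap_first_two and the toy columns are hypothetical names made up for the example:

```r
# Hypothetical helper: swap the first two columns, keep the rest in order
swap_first_two <- function(df) {
  stopifnot(ncol(df) >= 2)
  # seq_along(df)[-(1:2)] is empty when there are only two columns,
  # whereas 3:ncol(df) would count down to c(3, 2) and fail
  df[, c(2, 1, seq_along(df)[-(1:2)]), drop = FALSE]
}

# Toy data with sampleID in the second position
dfrm <- data.frame(value = c(0.1, 0.2),
                   sampleID = c("s1", "s2"),
                   batch = c(1, 2))
swapped <- swap_first_two(dfrm)
names(swapped)
# "sampleID" "value" "batch"
```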
data.table::setcolorder(dfrm, c(2, 1))
It doesn't even need to be a data.table.
library(dplyr)
df <- tibble(a = c(2, 3),
             b = c(3, 4),
             c = c(9, 9))
df <- df %>% relocate(b, .before = a)
df
We could use select from the dplyr package to put the first two columns by name and then use everything() for the rest:
library(dplyr)
select(head(mtcars), cyl, disp, everything())
                  cyl disp  mpg  hp drat    wt  qsec vs am gear carb
Mazda RX4           6  160 21.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       6  160 21.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          4  108 22.8  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      6  258 21.4 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   8  360 18.7 175 3.15 3.440 17.02  0  0    3    2
Valiant             6  225 18.1 105 2.76 3.460 20.22  1  0    3    1
Say I am interested in renaming columns across several datasets. The columns that need to be renamed vary by name and position, so they can't be selected that way. However, the columns prior to the columns I want to rename and the columns right after are constant.
As an example, say the mpg and cyl columns in mtcars are always the first two columns and their names never change. The vs:carb columns are similar, but their positions change depending on the number of columns added before them (but after cyl). However, the variable names from hp:qsec change, and sometimes a new variable gets added between them.
Say I want to append '_Value' to the name of each column located after cyl and before vs. How would I go about doing that, ideally using dplyr?
You can try -
library(dplyr)
mtcars %>%
  rename_with(~paste0(., '_Value'), -c(mpg:cyl, vs:carb)) %>%
  head
#                   mpg cyl disp_Value hp_Value drat_Value wt_Value qsec_Value vs am gear carb
#Mazda RX4         21.0   6        160      110       3.90    2.620      16.46  0  1    4    4
#Mazda RX4 Wag     21.0   6        160      110       3.90    2.875      17.02  0  1    4    4
#Datsun 710        22.8   4        108       93       3.85    2.320      18.61  1  1    4    1
#Hornet 4 Drive    21.4   6        258      110       3.08    3.215      19.44  1  0    3    1
#Hornet Sportabout 18.7   8        360      175       3.15    3.440      17.02  0  0    3    2
#Valiant           18.1   6        225      105       2.76    3.460      20.22  1  0    3    1
If the data has other columns and you want to rename specifically the columns between cyl and vs, you can do -
start <- match('cyl', names(mtcars))
end <- match('vs', names(mtcars))
cols <- (start + 1):(end - 1)
names(mtcars)[cols] <- paste0(names(mtcars)[cols], '_Value')
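Since the question is about several datasets, the positional approach above can be wrapped in a function and applied to each one. This is only a sketch: suffix_between and datasets_list are hypothetical names, and the second dataset is simply mtcars with disp dropped to mimic a varying set of middle columns.

```r
# Hypothetical helper: append a suffix to every column strictly between
# the columns named `after` and `before`
suffix_between <- function(df, after, before, suffix = "_Value") {
  start <- match(after, names(df))
  end <- match(before, names(df))
  stopifnot(!is.na(start), !is.na(end), start < end)
  # seq_len() yields an empty index when the two anchors are adjacent,
  # whereas (start + 1):(end - 1) would count backwards in that case
  cols <- start + seq_len(end - start - 1)
  if (length(cols) > 0) {
    names(df)[cols] <- paste0(names(df)[cols], suffix)
  }
  df
}

# Two datasets whose middle columns differ
datasets_list <- list(a = datasets::mtcars,
                      b = datasets::mtcars[, c(1:2, 4:11)])
renamed <- lapply(datasets_list, suffix_between, after = "cyl", before = "vs")
names(renamed$b)
```

The helper returns a modified copy rather than changing the input, so the original data frames stay untouched.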
Here's an example
temp <- mtcars
colnames(temp)[grepl("ge", colnames(temp))] <- "garbage"
Output
                   mpg cyl  disp  hp drat    wt  qsec vs am garbage carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1       4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1       4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1       4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0       3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0       3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0       3    1
I only know what the column name starts with ("ge"); I don't know the exact column name.
I want a solution that works in a dplyr chain. This
temp %>%
  rename(vars(starts_with("ge")), "garbage")
Error: All arguments must be named
of course doesn't work. Thanks for any help.
You can use rename_at. If you know that only one column starts with "ge", this will work:
library(dplyr)
mtcars %>%
  rename_at(vars(starts_with("ge")), ~ "garbage")
Older code often wraps the function in funs(), which has been deprecated since dplyr 0.8.0; a formula or a plain function does the same job.
If you want to rename more than one column, the function needs to return a vector of names, one per selected column, or do something like gsub() to modify the existing column names.
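In dplyr 1.0.0 and later, rename_at is superseded by rename_with, which takes the function first and the column selection second. A short sketch of both cases described above; the "garbage_" prefix and the starts_with("d") selection are just illustrative choices:

```r
library(dplyr)

# Single match: replace the name of the one column starting with "ge"
one <- datasets::mtcars %>%
  rename_with(~ "garbage", starts_with("ge"))

# Several matches: transform each name instead of returning a constant,
# so the results stay unique ("disp" and "drat" both start with "d")
many <- datasets::mtcars %>%
  rename_with(~ paste0("garbage_", .x), starts_with("d"))
```

Returning a constant only works when exactly one column is selected, because the function must return one name per selected column and column names have to remain unique.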
I have a set of data frames, say report_001, report_002, report_003 and so on, and I have their names in a character vector such as:
n <- c('report_001', 'report_002', 'report_003')
I need to turn this into a list of data frames as follows:
dfList <- list(report_001 = report_001, report_002 = report_002, report_003 = report_003)
So that I can index like this:
dfList[['report_002']]
However, since I have a large number of data frames, I don't want to do this manually. Something like this has not worked:
dfList <- sapply(n, function(x) assign(x, as.name(x)))
For this question, what those data frames are is not important. To keep things simple, I can have:
report_001 <- mtcars
report_002 <- mtcars
report_003 <- mtcars
How can I automatically convert a vector of data frame names into a list of those data frames, indexed by the same names?
report_001 <- mtcars
report_002 <- mtcars
report_003 <- mtcars
n <- c('report_001', 'report_002', 'report_003')
dfList <- mget(n)
head(dfList[['report_001']])
#                    mpg cyl disp  hp drat    wt  qsec vs am
# Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1
# Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1
# Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1
# Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0
# Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0
# Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0
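If the names are not already collected in a vector, they can be looked up by pattern before calling mget. A minimal sketch, assuming the data frames live in the current environment and all follow a report_<digits> naming scheme (the regular expression is an assumption about that scheme):

```r
report_001 <- datasets::mtcars
report_002 <- datasets::mtcars
report_003 <- datasets::mtcars

# Find every object in the current environment whose name looks like
# report_<digits>, then fetch them all into a named list
n <- ls(pattern = "^report_[0-9]+$")
dfList <- mget(n)

names(dfList)
```

mget names the list elements after the objects it fetches, so dfList[['report_002']] works without any extra setNames step.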