Select column from data frame based on dynamic value in R - r

I've been banging my head against this problem and feel certain there must be an efficient way to do this in R that doesn't involve writing a for loop. Any suggestions much appreciated!
I'd like to create a new column in a data frame that contains values from existing columns in the dataframe, but where the column whose value is selected is dynamically specified. An example will help clarify:
> mydata <- head(mtcars)
> mydata
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
> myquery <- c("cyl","cyl","gear","gear","carb", "carb")
At this point, I'd like to know if there's a simple R function that will select the value of column myquery for each row of mydata, in other words:
f(mydata, myquery)
6 6 4 3 2 1
Thanks in advance if anyone knows of a simple and efficient way to write f.

You can index a data.frame with a matrix to achieve that behavior:
dd<-head(mtcars)
myquery <- c("cyl","cyl","gear","gear","carb", "carb")
dd[cbind(seq_along(myquery), match(myquery, names(dd)))]
# [1] 6 6 4 3 2 1
The first column of the matrix is the row, the second is the column (note that with this method there is no comma in the brackets, unlike a normal [,] subset). Here I converted the myquery values to their numeric column indices using match so that both columns of the matrix are the same type (as they have to be). You could also use a character matrix if you used the row names to index the rows. Thus
dd[cbind(rownames(dd), myquery)]
# [1] 6 6 4 3 2 1
also works.
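For what it's worth, the matrix-indexing idea can be wrapped in a small helper that plays the role of f from the question. This is only a sketch: the name select_by_row and the input checks are my own additions, not part of the answer above. Also note that indexing a data frame by a matrix goes through as.matrix, so with mixed column types the result is coerced to a common type.

```r
# Hypothetical helper: for each row i, return the value in column query[i].
select_by_row <- function(df, query) {
  stopifnot(length(query) == nrow(df))
  col_idx <- match(query, names(df))          # column names -> column positions
  if (anyNA(col_idx)) stop("unknown column name in query")
  # Two-column index matrix: (row, column) pairs, no comma inside the brackets
  df[cbind(seq_len(nrow(df)), col_idx)]
}

mydata <- head(mtcars)
myquery <- c("cyl", "cyl", "gear", "gear", "carb", "carb")
result <- select_by_row(mydata, myquery)
result
# [1] 6 6 4 3 2 1
```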

Related

Swap the first 2 columns in a data frame with 100 columns?

Is there a way to swap the order of the first 2 columns in a data frame with 100+ columns?
All the methods online require you to input the order yourself and with 100 columns that's a bit too tedious.
Example solution being
dfrm <- dfrm[c("2", "3", "1", "4")]
However, with my large data frame this solution is impractical. I want to keep the order of all the columns except the first two, swapping them so that column 2 takes column 1's position. The software I'm using requires column 1 to be sampleID, which I currently have as column 2, and this leads to an error.
Thanks
You can consider the following. dfrm is your target data frame.
dfrm <- dfrm[, c(2, 1, 3:ncol(dfrm))]
Since 3:ncol(dfrm) keeps the same column indices as the original data frame, this code preserves the order of every column except the first two.
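One edge case worth noting: 3:ncol(dfrm) counts backwards when the data frame has exactly two columns (3:2 is c(3, 2)), which makes the subset fail. Below is a sketch of a helper that avoids this; swap_first_two is a hypothetical name, not something from the answer.

```r
# Hypothetical helper: swap the first two columns, keep the rest in order.
swap_first_two <- function(df) {
  stopifnot(ncol(df) >= 2)
  rest <- seq_len(ncol(df))[-(1:2)]   # integer(0) when there are exactly two columns
  df[, c(2, 1, rest), drop = FALSE]
}

swapped <- swap_first_two(head(mtcars))
names(swapped)[1:3]
# [1] "cyl"  "mpg"  "disp"
```

Using seq_len(...)[-(1:2)] instead of 3:ncol(df) is the only real design choice here; for 100+ columns the two behave the same.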
data.table::setcolorder(dfrm, c(2, 1))
It doesn't even need to be a data.table.
library(dplyr)
df <- tibble(a = c(2,3),
b = c(3,4),
c = c(9,9))
df <- df %>% relocate(b, .before = a)
df
We could use select from dplyr package to put the first two columns by name and then use everything():
library(dplyr)
select(head(mtcars), cyl, disp, everything())
cyl disp mpg hp drat wt qsec vs am gear carb
Mazda RX4 6 160 21.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 6 160 21.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 4 108 22.8 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 6 258 21.4 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 8 360 18.7 175 3.15 3.440 17.02 0 0 3 2
Valiant 6 225 18.1 105 2.76 3.460 20.22 1 0 3 1

How to rename all columns between two sets of columns in a dataframe using R?

Say I am interested in renaming columns across several datasets. The columns that need to be renamed vary by name and position, so they can't be selected that way. However, the columns prior to the columns I want to rename and the columns right after are constant.
As an example, say the mpg and cyl columns in mtcars are always the first two columns and their names never change. The vs:carb columns are similar, but their positions change depending on the number of columns added before them (but after cyl). However, the variable names from hp:qsec change and sometimes a new variable will get added between them.
Say I want to append the word '_Value' to the end of each of the columns that are located after cyl and before vs. How would I go about doing that, ideally using dplyr?
You can try -
library(dplyr)
mtcars %>%
rename_with(~paste0(., '_Value'), -c(mpg:cyl, vs:carb)) %>%
head
# mpg cyl disp_Value hp_Value drat_Value wt_Value qsec_Value vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
If you have other columns in the data and want to rename specifically the columns between cyl and vs, you can do -
start <- match('cyl', names(mtcars))
end <- match('vs', names(mtcars))
cols <- (start + 1):(end - 1)
names(mtcars)[cols] <- paste0(names(mtcars)[cols], '_Value')
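The same base R steps can be sketched as a reusable function for several datasets. The name suffix_between and its arguments are hypothetical, and the guard for adjacent anchor columns is my own addition (without it, (start + 1):(end - 1) would count backwards).

```r
# Hypothetical helper: append a suffix to every column strictly between two anchors.
suffix_between <- function(df, after, before, suffix = "_Value") {
  start <- match(after, names(df))
  end <- match(before, names(df))
  stopifnot(!is.na(start), !is.na(end), start < end)
  cols <- start + seq_len(end - start - 1)   # empty when the anchors are adjacent
  if (length(cols) > 0) {
    names(df)[cols] <- paste0(names(df)[cols], suffix)
  }
  df
}

renamed <- suffix_between(mtcars, "cyl", "vs")
names(renamed)
```

Unlike the snippet above, this returns a renamed copy and leaves the input data frame untouched.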

sapply returning repeated instances of first element of vector instead of all elements calculated by custom function in R

Consider the following columns in my dataset:
df$PT : contains strings with a repeating pattern. Example:
[1] "60D 0%" "5M 2%" "4 2ND M 5%" ...
df$date : column of dates
[1] "2021-01-18" "2021-01-18" "2021-01-18" ...
I managed to create a function that reads inputs from the columns above, performs operations on them and returns another date (let's call it date2). The function works fine (I tested it by passing its arguments manually):
function1 <- function(PT, date) {
  # if/else chain to generate date2 from PT and date
  # returns either date2 or NA, depending on the if/else conditions
}
So far so good. The problem comes when I try to use sapply to apply my function1 to every single term of column df$PT and store the output (which I want to be either a single date or NA for every term in df$PT) in df$new_col, such as:
df$new_col <- sapply(df$PT,function1,date=df$date)
But instead of having the expected output in df$new_col in date format as:
date2a
date2b
date2c
date2d
...
I get only the first output repeated everywhere, and as a bare number rather than a formatted date:
18705
18705
18705
18705
...
What can be going on and how do I solve it to get the correct calculations of date2 in df$new_col?
Thank you for your help!
Because R is vectorized you can create new df columns directly from existing columns. E.g.:
cars <- mtcars
cars$new <- ifelse(cars$cyl == 6 & cars$mpg > 20, "NewVal", NA)
head(cars)
mpg cyl disp hp drat wt qsec vs am gear carb new
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 NewVal
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 NewVal
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 <NA>
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 NewVal
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 <NA>
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 <NA>
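If function1 is written to handle one PT and one date per call, a likely cause (an assumption, since the body of function1 isn't shown) is that sapply only iterates over df$PT and hands the whole df$date vector to every call. mapply iterates over both vectors in parallel. Simplifying the result also drops the Date class, which is why plain numbers such as 18705 appear; as.Date with an origin restores it. The function1 and the data below are made-up stand-ins for illustration only.

```r
# Made-up stand-in for function1: add the leading "<n>D" number of days to date,
# return NA for any other pattern.
function1 <- function(PT, date) {
  if (!grepl("^[0-9]+D", PT)) return(as.Date(NA))
  days <- as.numeric(sub("^([0-9]+)D.*$", "\\1", PT))
  date + days
}

df <- data.frame(
  PT = c("60D 0%", "5M 2%", "10D 1%"),
  date = as.Date(c("2021-01-18", "2021-01-18", "2021-02-01")),
  stringsAsFactors = FALSE
)

# One call per (PT, date) pair; the simplified result is a plain numeric vector
raw <- mapply(function1, df$PT, df$date, USE.NAMES = FALSE)

# Convert the day counts back into dates
df$new_col <- as.Date(raw, origin = "1970-01-01")
df
```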

How to recover property from table into dataframe in R? [duplicate]

I think I am missing a fundamental concept about R's data frames.
head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
The names of the cars here. Is this a column? I don't think so, because I am not able to access them via mtcars[,1]. And there is no column name/header for it.
How could I create a data frame like that? How could I use that special column e.g. to describe the data in a plot for example?
They are row names. To access them, use:
rownames(mtcars)
For column names, use colnames. To see both row and column names, use:
dimnames(mtcars)
To modify one, for example the first row name:
rownames(mtcars)[1] <- "myNewName"
When a data frame is created with data.frame, row names default to the numbers 1 to n.
mydata <- data.frame(x = 1:5)
Then we can modify them:
rownames(mydata) <- paste0("MyName", 1:5)
Or we can add rownames when creating the data.frame:
mydata <- data.frame(x = 1:5, row.names = paste0("MyName", 1:5))
Note:
Row names are not very reliable; for example, see this post. (This could be a subjective opinion; I avoid them by reassigning row names to columns.)
The data.table and dplyr packages prefer not to have them. You can always reassign row names into a column as:
mydata$myNames <- rownames(mydata)
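A slightly fuller base R sketch of the same idea, putting the names in the first column and resetting the row names afterwards. The variable names cars, cars_named and the column name car are arbitrary choices for illustration.

```r
cars <- head(mtcars)

# Put the row names into a regular first column called "car"
cars_named <- data.frame(car = rownames(cars), cars, stringsAsFactors = FALSE)

# Reset the row names to the default 1..n sequence
rownames(cars_named) <- NULL

head(cars_named, 2)
```

Once the names live in an ordinary column, they can be used like any other variable, for example as plot labels.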
A shorter one-liner with the data.table package will make the row names a column.
library(data.table)
cars <- copy(mtcars)  # work on a copy rather than the built-in dataset
setDT(cars, keep.rownames = TRUE)[]
head(cars)
rn mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
This works too, using tibble.
library(tibble)
rownames_to_column(mtcars, var = "carnames")
How could you create a data frame like that?
You can turn a column into row names using the textshape package. See the example below:
# column to row names
library(textshape)
state_dat <- data.frame(state.name, state.area, state.center, state.division)
column_to_rownames(state_dat)
#making 'state.name' to row names in new data 'new_state_dat'
new_state_dat<-column_to_rownames(state_dat, 'state.name')
I advise you not to use row.names() to turn a column into row names.
How could I use that special column e.g. to describe the data in a plot for example?
You can use the superheat package; for more information, see https://rlbarter.github.io/superheat/index.html. It is simpler and more powerful if you use the textshape package instead of row.names() to turn the column into row names.
