I am trying to split a data frame into smaller blocks based on header rows, which are currently stored as ordinary rows alongside the data, like this:
1 >904 5.000000e+00 <NA> <NA>
2 0.00961538461538 9.615385e-03 9.615385e-03 9.711538e-01
3 0.00961538461538 9.615385e-03 9.711538e-01 9.615385e-03
4 0.00961538461538 9.615385e-03 9.711538e-01 9.615385e-03
5 0.00961538461538 9.615385e-03 9.711538e-01 9.615385e-03
6 0.971153846154 9.615385e-03 9.615385e-03 9.615385e-03
7 >s36 7.000000e+00 <NA> <NA>
8 0.844325153374 7.668712e-04 1.541411e-01 7.668712e-04
9 0.0774539877301 6.909509e-01 7.745399e-02 1.541411e-01
10 0.000766871165644 7.745399e-02 1.541411e-01 7.676380e-01
11 0.76763803681 7.745399e-02 7.668712e-04 1.541411e-01
12 0.0774539877301 7.745399e-02 7.676380e-01 7.745399e-02
13 0.230828220859 6.142638e-01 7.745399e-02 7.745399e-02
14 0.460889570552 2.308282e-01 1.541411e-01 1.541411e-01
Headers are the ">" rows and the numbers below are matrices that I want as values for the headers.
I would like it to look like this:
$0
1 >904 5.000000e+00 <NA> <NA>
2 0.00961538461538 9.615385e-03 9.615385e-03 9.711538e-01
3 0.00961538461538 9.615385e-03 9.711538e-01 9.615385e-03
4 0.00961538461538 9.615385e-03 9.711538e-01 9.615385e-03
5 0.00961538461538 9.615385e-03 9.711538e-01 9.615385e-03
6 0.971153846154 9.615385e-03 9.615385e-03 9.615385e-03
$1
7 >s36 7.000000e+00 <NA> <NA>
8 0.844325153374 7.668712e-04 1.541411e-01 7.668712e-04
9 0.0774539877301 6.909509e-01 7.745399e-02 1.541411e-01
10 0.000766871165644 7.745399e-02 1.541411e-01 7.676380e-01
11 0.76763803681 7.745399e-02 7.668712e-04 1.541411e-01
12 0.0774539877301 7.745399e-02 7.676380e-01 7.745399e-02
13 0.230828220859 6.142638e-01 7.745399e-02 7.745399e-02
14 0.460889570552 2.308282e-01 1.541411e-01 1.541411e-01
With split:
split(df, cumsum(grepl("^>", df[[1]])))
$`1`
V2 V3 V4 V5
1 >904 5.000000000 <NA> <NA>
2 0.00961538461538 0.009615385 9.615385e-03 9.711538e-01
3 0.00961538461538 0.009615385 9.711538e-01 9.615385e-03
4 0.00961538461538 0.009615385 9.711538e-01 9.615385e-03
5 0.00961538461538 0.009615385 9.711538e-01 9.615385e-03
6 0.971153846154 0.009615385 9.615385e-03 9.615385e-03
$`2`
V2 V3 V4 V5
7 >s36 7.0000000000 <NA> <NA>
8 0.844325153374 0.0007668712 1.541411e-01 7.668712e-04
9 0.0774539877301 0.6909509000 7.745399e-02 1.541411e-01
10 0.000766871165644 0.0774539900 1.541411e-01 7.676380e-01
11 0.76763803681 0.0774539900 7.668712e-04 1.541411e-01
12 0.0774539877301 0.0774539900 7.676380e-01 7.745399e-02
13 0.230828220859 0.6142638000 7.745399e-02 7.745399e-02
14 0.460889570552 0.2308282000 1.541411e-01 1.541411e-01
Data:
df <- structure(list(V2 = c(">904", "0.00961538461538", "0.00961538461538",
"0.00961538461538", "0.00961538461538", "0.971153846154", ">s36",
"0.844325153374", "0.0774539877301", "0.000766871165644", "0.76763803681",
"0.0774539877301", "0.230828220859", "0.460889570552"), V3 = c(5,
0.009615385, 0.009615385, 0.009615385, 0.009615385, 0.009615385,
7, 0.0007668712, 0.6909509, 0.07745399, 0.07745399, 0.07745399,
0.6142638, 0.2308282), V4 = c("<NA>", "9.615385e-03", "9.711538e-01",
"9.711538e-01", "9.711538e-01", "9.615385e-03", "<NA>", "1.541411e-01",
"7.745399e-02", "1.541411e-01", "7.668712e-04", "7.676380e-01",
"7.745399e-02", "1.541411e-01"), V5 = c("<NA>", "9.711538e-01",
"9.615385e-03", "9.615385e-03", "9.615385e-03", "9.615385e-03",
"<NA>", "7.668712e-04", "1.541411e-01", "7.676380e-01", "1.541411e-01",
"7.745399e-02", "7.745399e-02", "1.541411e-01")), row.names = c(NA,
-14L), class = "data.frame")
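To see why the one-liner works, it can help to look at the intermediate values: `grepl("^>", ...)` flags the header rows, and `cumsum()` turns those flags into a running block number. The following is a minimal sketch on a shortened, made-up copy of the data (`df_small`, `is_header` and `group` are illustrative names, not part of the original answer). It also shows one possible variation, assumed rather than requested: dropping the header rows and using them as list names.

```r
# Shortened stand-in for the question's data (values are illustrative)
df_small <- data.frame(
  V2 = c(">904", "0.1", "0.2", ">s36", "0.3"),
  V3 = c(5, 0.1, 0.2, 7, 0.3),
  stringsAsFactors = FALSE
)

is_header <- grepl("^>", df_small$V2)  # TRUE on rows that start a new block
group <- cumsum(is_header)             # running block id: 1 1 1 2 2

# Same split as in the answer: header rows stay inside each block
with_headers <- split(df_small, group)

# Variation: drop the header rows and name each block after its header
blocks <- split(df_small[!is_header, ], group[!is_header])
names(blocks) <- sub("^>", "", df_small$V2[is_header])
```

Whether the header row should stay inside each block depends on what is done with the blocks afterwards; for numeric work the second form is usually easier, since the remaining columns can then be converted with `as.numeric()`.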
I have a dataframe (Df1) that has 7 columns, each column is a variable to be used to develop a predictive linear regression model.
My second dataframe (Df2) is a TRUE/FALSE matrix, containing every possible column combination of Df1. Thus, it has 7 columns which match those of my first dataframe (Df1), containing either TRUE or FALSE, with 127 rows (the 128th row containing FALSE in each column, has been removed).
I want to create 127 dataframes, accounting for every column combination of my Df1, with the original values from Df1.
Is there a way to iterate through each row of Df2 and, wherever TRUE is found, create a separate data frame from the corresponding columns of Df1?
Are there any other solutions or considerations?
It's always better to see a concrete example, but it sounds like we can recreate your data structure like this.
I have a dataframe (Df1) that has 7 columns, each column is a variable to be used to develop a predictive linear regression model.
We can make something similar using this code:
set.seed(1)
Df1 <- as.data.frame(sapply(1:7, function(x) rnorm(10, x)))
Df1
#> V1 V2 V3 V4 V5 V6 V7
#> 1 0.3735462 3.5117812 3.918977 5.358680 4.835476 6.398106 9.401618
#> 2 1.1836433 2.3898432 3.782136 3.897212 4.746638 5.387974 6.960760
#> 3 0.1643714 1.3787594 3.074565 4.387672 5.696963 6.341120 7.689739
#> 4 2.5952808 -0.2146999 1.010648 3.946195 5.556663 4.870637 7.028002
#> 5 1.3295078 3.1249309 3.619826 2.622940 4.311244 7.433024 6.256727
#> 6 0.1795316 1.9550664 2.943871 3.585005 4.292505 7.980400 7.188792
#> 7 1.4874291 1.9838097 2.844204 3.605710 5.364582 5.632779 5.195041
#> 8 1.7383247 2.9438362 1.529248 3.940687 5.768533 4.955865 8.465555
#> 9 1.5757814 2.8212212 2.521850 5.100025 4.887654 6.569720 7.153253
#> 10 0.6946116 2.5939013 3.417942 4.763176 5.881108 5.864945 9.172612
My second dataframe (Df2) is a TRUE/FALSE matrix, containing every possible column combination of Df1. Thus, it has 7 columns which match those of my first dataframe (Df1), containing either TRUE or FALSE, with 127 rows (the 128th row containing FALSE in each column, has been removed).
This code produces a data frame of every possible combination of the 7 columns of Df1 (128 rows, including the all-FALSE row as the last one):
Df2 <- as.data.frame(do.call(rbind, lapply(as.raw(0:127),
function(x) (rawToBits(x) == 0)[1:7])))
head(Df2)
#> V1 V2 V3 V4 V5 V6 V7
#> 1 TRUE TRUE TRUE TRUE TRUE TRUE TRUE
#> 2 FALSE TRUE TRUE TRUE TRUE TRUE TRUE
#> 3 TRUE FALSE TRUE TRUE TRUE TRUE TRUE
#> 4 FALSE FALSE TRUE TRUE TRUE TRUE TRUE
#> 5 TRUE TRUE FALSE TRUE TRUE TRUE TRUE
#> 6 FALSE TRUE FALSE TRUE TRUE TRUE TRUE
...
Now assuming I have this right, the solution is a one-liner:
result <- apply(Df2, 1, function(i) Df1[i])
Now result is a list of 128 data frames, one for each possible combination of the columns in Df1 (including an empty data frame at position 128 for the all-FALSE case).
head(result)
#> [[1]]
#> V1 V2 V3 V4 V5 V6 V7
#> 1 0.3735462 3.5117812 3.918977 5.358680 4.835476 6.398106 9.401618
#> 2 1.1836433 2.3898432 3.782136 3.897212 4.746638 5.387974 6.960760
#> 3 0.1643714 1.3787594 3.074565 4.387672 5.696963 6.341120 7.689739
#> 4 2.5952808 -0.2146999 1.010648 3.946195 5.556663 4.870637 7.028002
#> 5 1.3295078 3.1249309 3.619826 2.622940 4.311244 7.433024 6.256727
#> 6 0.1795316 1.9550664 2.943871 3.585005 4.292505 7.980400 7.188792
#> 7 1.4874291 1.9838097 2.844204 3.605710 5.364582 5.632779 5.195041
#> 8 1.7383247 2.9438362 1.529248 3.940687 5.768533 4.955865 8.465555
#> 9 1.5757814 2.8212212 2.521850 5.100025 4.887654 6.569720 7.153253
#> 10 0.6946116 2.5939013 3.417942 4.763176 5.881108 5.864945 9.172612
#>
#> [[2]]
#> V2 V3 V4 V5 V6 V7
#> 1 3.5117812 3.918977 5.358680 4.835476 6.398106 9.401618
#> 2 2.3898432 3.782136 3.897212 4.746638 5.387974 6.960760
#> 3 1.3787594 3.074565 4.387672 5.696963 6.341120 7.689739
#> 4 -0.2146999 1.010648 3.946195 5.556663 4.870637 7.028002
#> 5 3.1249309 3.619826 2.622940 4.311244 7.433024 6.256727
#> 6 1.9550664 2.943871 3.585005 4.292505 7.980400 7.188792
#> 7 1.9838097 2.844204 3.605710 5.364582 5.632779 5.195041
#> 8 2.9438362 1.529248 3.940687 5.768533 4.955865 8.465555
#> 9 2.8212212 2.521850 5.100025 4.887654 6.569720 7.153253
#> 10 2.5939013 3.417942 4.763176 5.881108 5.864945 9.172612
#>
#> [[3]]
#> V1 V3 V4 V5 V6 V7
#> 1 0.3735462 3.918977 5.358680 4.835476 6.398106 9.401618
#> 2 1.1836433 3.782136 3.897212 4.746638 5.387974 6.960760
#> 3 0.1643714 3.074565 4.387672 5.696963 6.341120 7.689739
#> 4 2.5952808 1.010648 3.946195 5.556663 4.870637 7.028002
#> 5 1.3295078 3.619826 2.622940 4.311244 7.433024 6.256727
#> 6 0.1795316 2.943871 3.585005 4.292505 7.980400 7.188792
#> 7 1.4874291 2.844204 3.605710 5.364582 5.632779 5.195041
#> 8 1.7383247 1.529248 3.940687 5.768533 4.955865 8.465555
#> 9 1.5757814 2.521850 5.100025 4.887654 6.569720 7.153253
#> 10 0.6946116 3.417942 4.763176 5.881108 5.864945 9.172612
#>
#> [[4]]
#> V3 V4 V5 V6 V7
#> 1 3.918977 5.358680 4.835476 6.398106 9.401618
#> 2 3.782136 3.897212 4.746638 5.387974 6.960760
#> 3 3.074565 4.387672 5.696963 6.341120 7.689739
#> 4 1.010648 3.946195 5.556663 4.870637 7.028002
#> 5 3.619826 2.622940 4.311244 7.433024 6.256727
#> 6 2.943871 3.585005 4.292505 7.980400 7.188792
#> 7 2.844204 3.605710 5.364582 5.632779 5.195041
#> 8 1.529248 3.940687 5.768533 4.955865 8.465555
#> 9 2.521850 5.100025 4.887654 6.569720 7.153253
#> 10 3.417942 4.763176 5.881108 5.864945 9.172612
#>
#> [[5]]
#> V1 V2 V4 V5 V6 V7
#> 1 0.3735462 3.5117812 5.358680 4.835476 6.398106 9.401618
#> 2 1.1836433 2.3898432 3.897212 4.746638 5.387974 6.960760
#> 3 0.1643714 1.3787594 4.387672 5.696963 6.341120 7.689739
#> 4 2.5952808 -0.2146999 3.946195 5.556663 4.870637 7.028002
#> 5 1.3295078 3.1249309 2.622940 4.311244 7.433024 6.256727
#> 6 0.1795316 1.9550664 3.585005 4.292505 7.980400 7.188792
#> 7 1.4874291 1.9838097 3.605710 5.364582 5.632779 5.195041
#> 8 1.7383247 2.9438362 3.940687 5.768533 4.955865 8.465555
#> 9 1.5757814 2.8212212 5.100025 4.887654 6.569720 7.153253
#> 10 0.6946116 2.5939013 4.763176 5.881108 5.864945 9.172612
#>
#> [[6]]
#> V2 V4 V5 V6 V7
#> 1 3.5117812 5.358680 4.835476 6.398106 9.401618
#> 2 2.3898432 3.897212 4.746638 5.387974 6.960760
#> 3 1.3787594 4.387672 5.696963 6.341120 7.689739
#> 4 -0.2146999 3.946195 5.556663 4.870637 7.028002
#> 5 3.1249309 2.622940 4.311244 7.433024 6.256727
#> 6 1.9550664 3.585005 4.292505 7.980400 7.188792
#> 7 1.9838097 3.605710 5.364582 5.632779 5.195041
#> 8 2.9438362 3.940687 5.768533 4.955865 8.465555
#> 9 2.8212212 5.100025 4.887654 6.569720 7.153253
#> 10 2.5939013 4.763176 5.881108 5.864945 9.172612
#> (etc)
Created on 2021-11-13 by the reprex package (v2.0.0)
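One consideration with `apply()` is that it may try to simplify its result into a matrix when every selection happens to have the same number of columns. A hedged alternative is to loop over row indices with `lapply()`, which always returns a list. The sketch below uses 3 columns instead of 7 to keep it small, and builds the TRUE/FALSE table with `expand.grid()`, which is an assumption about how such a table could be made rather than the method used above.

```r
set.seed(1)
Df1 <- as.data.frame(sapply(1:3, function(x) rnorm(5, x)))

# Every TRUE/FALSE combination of the columns, with the all-FALSE row removed
Df2 <- expand.grid(rep(list(c(TRUE, FALSE)), ncol(Df1)))
names(Df2) <- names(Df1)
Df2 <- Df2[rowSums(Df2) > 0, ]

# One data frame per row of Df2; drop = FALSE keeps single-column results
# as data frames rather than plain vectors
result <- lapply(seq_len(nrow(Df2)), function(i) {
  Df1[, unlist(Df2[i, ]), drop = FALSE]
})
```

With 3 columns this gives 7 data frames; with the 7 columns of the question the same code would give 127.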
I have some data which looks like:
col
1 €€€€€
2 ££
3 €£
4 €€
5 €€€€€
6 €€€€
7 €€
8 €€
9 €€
10 €€
11 €€
12 €€
13 €€€€
14 €€€
15 €€€€
16 €€
17 €€
18 €€€€
19 $$
20 €€€CHF
It contains a collapsed set of currency symbols of different lengths. I would like to create new columns holding the unique currencies in each row. In most rows the currencies are all the same, but in rows 3 and 20 they are mixed: €£ and €€€CHF respectively.
Expected output:
col colCur1 colCur2
1 €€€€€ €
2 ££ £
3 €£ € £
4 €€ ...
5 €€€€€
6 €€€€
7 €€
8 €€
9 €€
10 €€
11 €€
12 €€
13 €€€€
14 €€€
15 €€€€
16 €€
17 €€
18 €€€€ ...
19 $$ $
20 €€€CHF € CHF
Data:
structure(list(col = c("\200\200\200\200\200", "££", "\200£",
"\200\200", "\200\200\200\200\200", "\200\200\200\200", "\200\200",
"\200\200", "\200\200", "\200\200", "\200\200", "\200\200", "\200\200\200\200",
"\200\200\200", "\200\200\200\200", "\200\200", "\200\200", "\200\200\200\200",
"$$", "\200\200\200CHF")), class = "data.frame", row.names = c(NA,
-20L))
Here is an option (assuming the data above is stored as df1):
library(dplyr)
library(tidyr)
library(stringr)
df1 %>%
mutate(col2 = str_replace_all(col, "(.)\\1+", "\\1"),
col2 = str_replace_all(col2, "([^A-Z])([^A-Z])", "\\1,\\2"),
col2 = str_replace_all(col2, "(?<=[^A-Z])(?=[A-Z])", ","),
col2 = strsplit(col2, ",")) %>%
unnest_wider(c(col2)) %>%
rename_at(-1, ~ str_c('colCur', seq_along(.)))
-output
# A tibble: 20 x 3
# col colCur1 colCur2
# <chr> <chr> <chr>
# 1 €€€€€ € <NA>
# 2 ££ £ <NA>
# 3 €£ € £
# 4 €€ € <NA>
# 5 €€€€€ € <NA>
# 6 €€€€ € <NA>
# 7 €€ € <NA>
# 8 €€ € <NA>
# 9 €€ € <NA>
#10 €€ € <NA>
#11 €€ € <NA>
#12 €€ € <NA>
#13 €€€€ € <NA>
#14 €€€ € <NA>
#15 €€€€ € <NA>
#16 €€ € <NA>
#17 €€ € <NA>
#18 €€€€ € <NA>
#19 $$ $ <NA>
#20 €€€CHF € CHF
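For comparison, here is a base R sketch of the same idea, under the assumption that a currency is either a run of capital letters (such as CHF) or a single non-letter symbol. The vector `x` is a shortened, made-up sample written with Unicode escapes, and the names `tokens`, `cur` and `out` are illustrative only.

```r
# Shortened sample: euro is \u20ac, pound is \u00a3
x <- c("\u20ac\u20ac\u20ac", "\u20ac\u00a3", "\u20ac\u20ac\u20acCHF", "$$")

# Tokenise: a run of capital letters, or any single other character
tokens <- regmatches(x, gregexpr("[A-Z]+|[^A-Z]", x))

# Keep each currency once per row, in order of first appearance
cur <- lapply(tokens, unique)

# Pad every row to the same length with NA and bind into columns
n <- max(lengths(cur))
mat <- do.call(rbind, lapply(cur, `length<-`, n))
colnames(mat) <- paste0("colCur", seq_len(n))

out <- data.frame(col = x, mat, stringsAsFactors = FALSE)
```

This avoids the chained replacements, but it relies on the same assumption about what counts as one currency, so a code in lower case or a multi-character symbol would need a different pattern.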
I received a set of dates, but it turns out that time is reported in days since 01-01-1960 in this specific data set.
D_INDDTO
1 20758
2 20856
3 21062
4 19740
5 21222
6 21203
The specific date of interest for Patient 1 is 20758 days since 01-01-1960.
I want to create a new covariate u$date containing the date of interest in %d%m%y format. I tried
library(tidyverse)
u %>% mutate(date=as.date(D_INDDTO,origin="1960-01-01")
But that did not solve it.
u <- structure(list(D_INDDTO = c(20758, 20856, 21062, 19740, 21222,
21203, 20976, 20895, 18656, 18746)), row.names = c(NA, 10L), class = "data.frame")
Try this:
#Code 1
u %>% mutate(date=as.Date("1960-01-01")+D_INDDTO)
Output:
D_INDDTO date
1 20758 2016-10-31
2 20856 2017-02-06
3 21062 2017-08-31
4 19740 2014-01-17
5 21222 2018-02-07
6 21203 2018-01-19
7 20976 2017-06-06
8 20895 2017-03-17
9 18656 2011-01-29
10 18746 2011-04-29
Or this:
#Code 2
u %>% mutate(date=as.Date(D_INDDTO,origin="1960-01-01"))
Output:
D_INDDTO date
1 20758 2016-10-31
2 20856 2017-02-06
3 21062 2017-08-31
4 19740 2014-01-17
5 21222 2018-02-07
6 21203 2018-01-19
7 20976 2017-06-06
8 20895 2017-03-17
9 18656 2011-01-29
10 18746 2011-04-29
Or this:
#Code 3
u %>% mutate(date=format(as.Date(D_INDDTO,origin="1960-01-01"),'%d%m%y'))
Output:
D_INDDTO date
1 20758 311016
2 20856 060217
3 21062 310817
4 19740 170114
5 21222 070218
6 21203 190118
7 20976 060617
8 20895 170317
9 18656 290111
10 18746 290411
If more customization is required:
#Code 4
u %>% mutate(date=format(as.Date(D_INDDTO,origin="1960-01-01"),'%d-%m-%Y'))
Output:
D_INDDTO date
1 20758 31-10-2016
2 20856 06-02-2017
3 21062 31-08-2017
4 19740 17-01-2014
5 21222 07-02-2018
6 21203 19-01-2018
7 20976 06-06-2017
8 20895 17-03-2017
9 18656 29-01-2011
10 18746 29-04-2011
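A likely reason the original attempt failed is that R is case-sensitive: the function is `as.Date()`, and `as.date()` is not defined unless some other package happens to supply it. The attempted call was also missing its closing parenthesis. A minimal base R sketch of the conversion without dplyr, using the first three values from the question (the names `days`, `dates` and `formatted` are illustrative):

```r
days <- c(20758, 20856, 21062)

# Numeric day counts need an explicit origin to become Date objects
dates <- as.Date(days, origin = "1960-01-01")

# format() returns character strings, so do this last, only for display
formatted <- format(dates, "%d-%m-%Y")
```

Keeping the column as a `Date` and formatting only when printing is usually the safer choice, since a formatted string no longer sorts or subtracts as a date.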
Stock.Open <- rep(c(102.25,102.87,102.25,100.87,103.44,103.87,103.00),times=3)
Stock.Close <- rep(c(102.12,102.62,100.12,103.00,103.87,103.12,105.12), times=3)
Stock.id<-rep(1:3,each=7)
day<-rep(c(1:7),times=3)
df<-data.frame(day,Stock.Close,Stock.Open,Stock.id)
How do I calculate the percentage difference in Stock.Open and Stock.Close for each day with respect to the previous day? For example, I want the % change in Stock.Open between day 1 and day 2, then day 2 and day 3, day 3 and day 4, and so on.
I want to do this separately for each Stock.id.
Try this:
Stock.Open <- rep(c(102.25,102.87,102.25,100.87,103.44,103.87,103.00),times=3)
Stock.Close <- rep(c(102.12,102.62,100.12,103.00,103.87,103.12,105.12), times=3)
Stock.id<-rep(1:3,each=7)
day<-rep(c(1:7),times=3)
df<-data.frame(day,Stock.Close,Stock.Open,Stock.id)
#Subset stock data per day
temp <- subset(df, select = c(Stock.Open, Stock.Close, day, Stock.id))
#Change day and rename
temp$day <- temp$day + 1
require(plyr)
temp <- plyr::rename(temp, c("Stock.Open" = "Stock.Open.Pre", "Stock.Close" = "Stock.Close.Pre"))
#Merge back
df <- join(df, temp, by = c("day", "Stock.id"), type = "left")
#Compute difference
df$Stock.Open.Diff <- (df$Stock.Open / df$Stock.Open.Pre) - 1
df$Stock.Close.Diff <- (df$Stock.Close / df$Stock.Close.Pre) - 1
Try this:
library(dplyr)
df %>%
group_by(Stock.id) %>%
arrange(day) %>%
mutate(Change.Stock.Open = c(NA, diff(Stock.Open))/Stock.Open,
Change.Stock.Close = c(NA, diff(Stock.Close))/Stock.Close)
# A tibble: 21 x 6
# Groups: Stock.id [3]
day Stock.Close Stock.Open Stock.id Change.Stock.Open Change.Stock.Close
<int> <dbl> <dbl> <int> <dbl> <dbl>
1 1 102.12 102.25 1 NA NA
2 1 102.12 102.25 2 NA NA
3 1 102.12 102.25 3 NA NA
4 2 102.62 102.87 1 0.006027024 0.004872345
5 2 102.62 102.87 2 0.006027024 0.004872345
6 2 102.62 102.87 3 0.006027024 0.004872345
7 3 100.12 102.25 1 -0.006063570 -0.024970036
8 3 100.12 102.25 2 -0.006063570 -0.024970036
9 3 100.12 102.25 3 -0.006063570 -0.024970036
10 4 103.00 100.87 1 -0.013680976 0.027961165
# ... with 11 more rows
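(Values for day 1 of each stock are NA since there is no previous day for comparison. Note that this divides the difference by the current day's value; to express the change relative to the previous day, divide by lag(Stock.Open) and lag(Stock.Close) instead.)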
Yet another solution, this time using base R only.
df$Stock.Close.Change <- ave(df$Stock.Close, df$Stock.id, FUN = function(x) c(NA, diff(x))/x)
df$Stock.Open.Change <- ave(df$Stock.Open, df$Stock.id, FUN = function(x) c(NA, diff(x))/x)
df <- df[order(df$day), ]
row.names(df) <- NULL
df
day Stock.Close Stock.Open Stock.id Stock.Close.Change Stock.Open.Change
1 1 102.12 102.25 1 NA NA
2 1 102.12 102.25 2 NA NA
3 1 102.12 102.25 3 NA NA
4 2 102.62 102.87 1 0.004872345 0.006027024
5 2 102.62 102.87 2 0.004872345 0.006027024
6 2 102.62 102.87 3 0.004872345 0.006027024
7 3 100.12 102.25 1 -0.024970036 -0.006063570
8 3 100.12 102.25 2 -0.024970036 -0.006063570
9 3 100.12 102.25 3 -0.024970036 -0.006063570
10 4 103.00 100.87 1 0.027961165 -0.013680976
11 4 103.00 100.87 2 0.027961165 -0.013680976
12 4 103.00 100.87 3 0.027961165 -0.013680976
13 5 103.87 103.44 1 0.008375854 0.024845321
14 5 103.87 103.44 2 0.008375854 0.024845321
15 5 103.87 103.44 3 0.008375854 0.024845321
16 6 103.12 103.87 1 -0.007273080 0.004139790
17 6 103.12 103.87 2 -0.007273080 0.004139790
18 6 103.12 103.87 3 -0.007273080 0.004139790
19 7 105.12 103.00 1 0.019025875 -0.008446602
20 7 105.12 103.00 2 0.019025875 -0.008446602
21 7 105.12 103.00 3 0.019025875 -0.008446602
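The dplyr and base R answers above divide the day-to-day difference by the current day's value, while the plyr answer divides by the previous day's value, so their numbers differ slightly. If the change should be relative to the previous day, a base R sketch could look like this (`pct_change` and the `.Pct` column names are illustrative, not from the answers):

```r
Stock.Open  <- rep(c(102.25, 102.87, 102.25, 100.87, 103.44, 103.87, 103.00), times = 3)
Stock.Close <- rep(c(102.12, 102.62, 100.12, 103.00, 103.87, 103.12, 105.12), times = 3)
Stock.id <- rep(1:3, each = 7)
day <- rep(1:7, times = 3)
df <- data.frame(day, Stock.Close, Stock.Open, Stock.id)

# (x[t] - x[t-1]) / x[t-1], with NA for the first day of each stock
pct_change <- function(x) c(NA, diff(x) / head(x, -1))

# ave() applies the function within each Stock.id and keeps row order
df$Stock.Open.Pct  <- ave(df$Stock.Open,  df$Stock.id, FUN = pct_change)
df$Stock.Close.Pct <- ave(df$Stock.Close, df$Stock.id, FUN = pct_change)
```

This assumes the rows are already sorted by day within each stock, as they are in the example data; otherwise sort with `df[order(df$Stock.id, df$day), ]` first.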
I've got one data frame with the names of variables, and a 1:p index of the order that I'd like them to be in.
I've got a second data frame where the order of these variables is all messed up. How do I take the information from the first to order the columns of the second?
1> key = data.frame(index = 1:6,vars = paste("V",1:6,sep=""))
1> key
index vars
1 1 V1
2 2 V2
3 3 V3
4 4 V4
5 5 V5
6 6 V6
1> set.seed(42)
1> data = data.frame(matrix(rnorm(60),10))
1> colnames(data) = sample(key$vars)
1> data
V3 V6 V5 V2 V4 V1
1 1.37095845 1.3048697 -0.3066386 0.45545012 0.2059986 0.32192527
2 -0.56469817 2.2866454 -1.7813084 0.70483734 -0.3610573 -0.78383894
3 0.36312841 -1.3888607 -0.1719174 1.03510352 0.7581632 1.57572752
4 0.63286260 -0.2787888 1.2146747 -0.60892638 -0.7267048 0.64289931
5 0.40426832 -0.1333213 1.8951935 0.50495512 -1.3682810 0.08976065
6 -0.10612452 0.6359504 -0.4304691 -1.71700868 0.4328180 0.27655075
7 1.51152200 -0.2842529 -0.2572694 -0.78445901 -0.8113932 0.67928882
8 -0.09465904 -2.6564554 -1.7631631 -0.85090759 1.4441013 0.08983289
9 2.01842371 -2.4404669 0.4600974 -2.41420765 -0.4314462 -2.99309008
10 -0.06271410 1.3201133 -0.6399949 0.03612261 0.6556479 0.28488295
data[as.character(key$vars)]
will do the trick.
# V1 V2 V3 V4 V5 V6
# 1 0.32192527 0.45545012 1.37095845 0.2059986 -0.3066386 1.3048697
# 2 -0.78383894 0.70483734 -0.56469817 -0.3610573 -1.7813084 2.2866454
# 3 1.57572752 1.03510352 0.36312841 0.7581632 -0.1719174 -1.3888607
# 4 0.64289931 -0.60892638 0.63286260 -0.7267048 1.2146747 -0.2787888
# 5 0.08976065 0.50495512 0.40426832 -1.3682810 1.8951935 -0.1333213
# 6 0.27655075 -1.71700868 -0.10612452 0.4328180 -0.4304691 0.6359504
# 7 0.67928882 -0.78445901 1.51152200 -0.8113932 -0.2572694 -0.2842529
# 8 0.08983289 -0.85090759 -0.09465904 1.4441013 -1.7631631 -2.6564554
# 9 -2.99309008 -2.41420765 2.01842371 -0.4314462 0.4600974 -2.4404669
# 10 0.28488295 0.03612261 -0.06271410 0.6556479 -0.6399949 1.3201133
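The line above works because the rows of key are already sorted by index. If the key could arrive in any row order, one hedged variation is to sort the variable names by index before using them to select columns. The small data frames below are made up for illustration.

```r
# Key whose rows are not in index order
key <- data.frame(index = c(3, 1, 2),
                  vars = c("c", "a", "b"),
                  stringsAsFactors = FALSE)

# Data with its columns in some other order
data <- data.frame(b = 4:5, c = 6:7, a = 1:2)

# Sort the names by index, then select columns by name
ordered <- data[as.character(key$vars[order(key$index)])]
```

Selecting by name rather than by position means the result does not depend on how the columns of data were originally arranged.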