I have two vectors, one that contains a list of variables, and one that contains dates, such as
Variables_Pays <- c("PIB", "ConsommationPrivee","ConsommationPubliques",
"FBCF","ProductionIndustrielle","Inflation","InflationSousJacente",
"PrixProductionIndustrielle","CoutHoraireTravail")
Annee_Pays <- c("2000","2001")
I want to merge them into a single vector in which each variable is suffixed with each date. My desired output is
> Colonnes_Pays_Principaux
[1] "PIB_2000" "PIB_2001" "ConsommationPrivee_2000"
[4] "ConsommationPrivee_2001" "ConsommationPubliques_2000" "ConsommationPubliques_2001"
[7] "FBCF_2000" "FBCF_2001" "ProductionIndustrielle_2000"
[10] "ProductionIndustrielle_2001" "Inflation_2000" "Inflation_2001"
[13] "InflationSousJacente_2000" "InflationSousJacente_2001" "PrixProductionIndustrielle_2000"
[16] "PrixProductionIndustrielle_2001" "CoutHoraireTravail_2000" "CoutHoraireTravail_2001"
Is there a simpler, more readable way than the double for loop I have tried (successfully) below?
Colonnes_Pays_Principaux <- vector()
for (Variable in (1:length(Variables_Pays))){
  for (Annee in (1:length(Annee_Pays))){
    Colonnes_Pays_Principaux=
      append(Colonnes_Pays_Principaux,
             paste(Variables_Pays[Variable],Annee_Pays[Annee],sep="_")
      )
  }
}
expand.grid will create a data frame with all combinations of the two vectors.
with(
expand.grid(Variables_Pays, Annee_Pays),
paste0(Var1, "_", Var2)
)
#> [1] "PIB_2000" "ConsommationPrivee_2000"
#> [3] "ConsommationPubliques_2000" "FBCF_2000"
#> [5] "ProductionIndustrielle_2000" "Inflation_2000"
#> [7] "InflationSousJacente_2000" "PrixProductionIndustrielle_2000"
#> [9] "CoutHoraireTravail_2000" "PIB_2001"
#> [11] "ConsommationPrivee_2001" "ConsommationPubliques_2001"
#> [13] "FBCF_2001" "ProductionIndustrielle_2001"
#> [15] "Inflation_2001" "InflationSousJacente_2001"
#> [17] "PrixProductionIndustrielle_2001" "CoutHoraireTravail_2001"
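If the order from the question matters (all years for one variable before moving on to the next), one option is to swap the arguments, since expand.grid varies its first argument fastest. A minimal sketch, assuming a shortened version of the question's vectors; the names grille and Colonnes are only illustrative:

```r
Variables_Pays <- c("PIB", "ConsommationPrivee", "FBCF")  # shortened for illustration
Annee_Pays <- c("2000", "2001")

# Put the years first so that they cycle fastest in the grid;
# stringsAsFactors = FALSE keeps the columns as plain character vectors
grille <- expand.grid(Annee = Annee_Pays, Variable = Variables_Pays,
                      stringsAsFactors = FALSE)

# Paste variable and year back together, row by row
Colonnes <- paste0(grille$Variable, "_", grille$Annee)
Colonnes
```

With this argument order the result starts with "PIB_2000", "PIB_2001" and then moves on to the next variable.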
We can use outer:
c(t(outer(Variables_Pays, Annee_Pays, paste, sep = '_')))
# [1] "PIB_2000" "PIB_2001"
# [3] "ConsommationPrivee_2000" "ConsommationPrivee_2001"
# [5] "ConsommationPubliques_2000" "ConsommationPubliques_2001"
# [7] "FBCF_2000" "FBCF_2001"
# [9] "ProductionIndustrielle_2000" "ProductionIndustrielle_2001"
#[11] "Inflation_2000" "Inflation_2001"
#[13] "InflationSousJacente_2000" "InflationSousJacente_2001"
#[15] "PrixProductionIndustrielle_2000" "PrixProductionIndustrielle_2001"
#[17] "CoutHoraireTravail_2000" "CoutHoraireTravail_2001"
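The t() call is what produces the variable-by-variable order: outer returns a matrix with one row per variable and one column per year, and c() flattens a matrix column by column. A small sketch on hypothetical toy vectors (v, y and m are illustrative names) showing both orders:

```r
v <- c("A", "B", "C")   # stand-in for the variable names
y <- c("2000", "2001")  # stand-in for the years

m <- outer(v, y, paste, sep = "_")
dim(m)  # 3 rows (variables) by 2 columns (years)

# Flattening column by column groups the results by year
par_annee <- c(m)

# Transposing first groups the results by variable instead
par_variable <- c(t(m))

par_annee
par_variable
```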
No real need to go beyond the basics here! Use paste for pasting the strings and rep to repeat either Annee_Pays or Variables_Pays to get all combinations:
Variables_Pays <- c("PIB", "ConsommationPrivee","ConsommationPubliques",
"FBCF","ProductionIndustrielle","Inflation","InflationSousJacente",
"PrixProductionIndustrielle","CoutHoraireTravail")
Annee_Pays <- c("2000","2001")
# To get this in the same order as in your example:
paste(rep(Variables_Pays, rep(2, length(Variables_Pays))), Annee_Pays, sep = "_")
# Alternative order:
paste(Variables_Pays, rep(Annee_Pays, rep(length(Variables_Pays), 2)), sep = "_")
# Or, if order doesn't matter too much (this covers every combination only
# because the two lengths, 9 and 2, share no common factor):
paste(Variables_Pays, rep(Annee_Pays, length(Variables_Pays)), sep = "_")
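The same idea can be written with the each argument of rep, which may read a little more directly than passing a vector of repeat counts. A sketch on a shortened version of the vectors; ordre_question and ordre_alternatif are illustrative names:

```r
Variables_Pays <- c("PIB", "FBCF", "Inflation")  # shortened for illustration
Annee_Pays <- c("2000", "2001")

# Repeat each variable once per year; paste recycles Annee_Pays alongside it
ordre_question <- paste(rep(Variables_Pays, each = length(Annee_Pays)),
                        Annee_Pays, sep = "_")

# Repeat each year once per variable; paste recycles Variables_Pays instead
ordre_alternatif <- paste(Variables_Pays,
                          rep(Annee_Pays, each = length(Variables_Pays)),
                          sep = "_")

ordre_question
ordre_alternatif
```

Because one side is expanded with each and the other is recycled whole, this form covers every combination regardless of the two lengths.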
In base R:
Variables_Pays <- c("PIB", "ConsommationPrivee","ConsommationPubliques",
"FBCF","ProductionIndustrielle","Inflation","InflationSousJacente",
"PrixProductionIndustrielle","CoutHoraireTravail")
Annee_Pays <- c("2000","2001")
cbind(paste(Variables_Pays, Annee_Pays, sep = "_"), paste(Variables_Pays, rev(Annee_Pays), sep = "_"))
Is there a way to vectorize an R function over all combinations of multiple parameters and return the result as a list?
As an example, using Vectorize over rnorm produces the following, but I would like to have a list of vectors corresponding to each combination of the arguments (so it should return a list of 60 vectors instead of just 5):
> vrnorm <- Vectorize(rnorm)
> vrnorm( 10*1:5, mean = 1:4, sd = 1:3)
[[1]]
[1] 1.37858918 -0.85432372 1.87321175 2.08362291 0.02950438 1.67967249
[7] 2.25954748 1.44031251 0.09816078 0.91365201
[[2]]
[1] 1.7717267 1.7961157 2.3291686 2.6114272 2.6228930 -0.2580403
[7] 3.3232109 -0.4652434 -0.4803258 -0.1170871 0.1158350 -1.0902252
[13] -0.6400934 3.6625290 2.5924096 4.5878564 0.7265718 3.2034281
[19] -0.2499768 2.0164275
[[3]]
[1] 5.8251252 3.1089121 2.8893594 2.9079357 1.9308677 4.3359878
[7] -0.3668157 4.9728508 -0.6494110 6.7729562 6.1623976 -0.1696638
[13] 5.4664038 3.8141798 -3.1842879 2.3985010 0.3840465 4.0696628
[19] 4.8217798 3.3135100 4.9028273 3.6193840 4.8861864 3.9871897
[25] -0.1059491 3.8961742 4.8293925 3.8935335 6.3194862 4.7846143
[[4]]
[1] 3.737043 2.849215 4.611868 3.494396 2.909659 4.861474 2.000194 3.343171
[9] 4.019523 3.277575 3.885272 3.331160 4.581551 4.960162 3.061960 5.359514
[17] 4.651848 3.640535 3.612368 4.338019 5.233665 3.585976 4.018191 4.320883
[25] 2.598541 3.519587 5.231375 4.733647 2.493334 2.791483 4.330052 2.498424
[33] 3.317115 3.515012 5.079780 4.720884 3.055191 5.262385 1.939961 4.779480
[[5]]
[1] 4.31697756 0.93754587 3.96698522 -0.03680018 1.94987430 1.73985617
[7] -1.42300550 2.07764933 0.45701395 2.42548257 0.67745524 -2.42054060
[13] 1.14655845 1.60277193 -1.04636658 0.94097335 3.07688803 0.58049012
[19] 1.25812532 1.91613097 -2.95408979 3.00990345 -0.67314868 0.64746260
[25] 1.69640497 0.68493689 2.84261574 1.65290227 4.16990548 -3.30426803
[31] 3.80508273 5.95888355 -0.09021591 3.88157980 -1.19166351 2.70208228
[37] -0.56278834 -0.83943824 -0.86868493 -1.19995506 -2.30275483 1.70435276
[43] 2.67984044 -0.04976799 0.98716782 2.71171575 5.21648742 0.13860495
[49] 1.61038570 0.50679460
Use expand.grid to expand all arguments and create a data frame, and then use mapply.
dat <- expand.grid(n = 10 * 1:5, mean = 1:4, sd = 1:3)
mapply(rnorm, dat$n, dat$mean, dat$sd, SIMPLIFY = FALSE)
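As a quick check that this yields one vector per combination, the sketch below reruns the same two lines and inspects the result. The set.seed call and the name res are additions for illustration only:

```r
set.seed(1)  # only to make the sketch reproducible

# One row per combination of n, mean and sd: 5 * 4 * 3 = 60 rows
dat <- expand.grid(n = 10 * 1:5, mean = 1:4, sd = 1:3)

# rnorm is called once per row; SIMPLIFY = FALSE keeps the result as a list
res <- mapply(rnorm, dat$n, dat$mean, dat$sd, SIMPLIFY = FALSE)

length(res)                 # 60, one vector per row of dat
all(lengths(res) == dat$n)  # TRUE, each vector has the requested length
```

Row i of dat records which n, mean and sd produced res[[i]], so the data frame doubles as an index for the list.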
You can also use purrr::pmap() as an alternative to mapply:
library(purrr)
dat <- expand.grid(n = 10 * 1:5, mean = 1:4, sd = 1:3)
pmap(dat, rnorm)
EDITED
I have a simple list of column names that I would like to change the format of, ideally programmatically. This is a sample of the list:
vars_list <- c("tBodyAcc.mean...X", "tBodyAcc.mean...Y", "tBodyAcc.mean...Z",
"tBodyAcc.std...X", "tBodyAcc.std...Y", "tBodyAcc.std...Z",
"tGravityAcc.mean...X", "tGravityAcc.mean...Y", "tGravityAcc.mean...Z",
"tGravityAcc.std...X", "tGravityAcc.std...Y", "tGravityAcc.std...Z",
"fBodyAcc.mean...X", "fBodyAcc.mean...Y", "fBodyAcc.mean...Z",
"fBodyAcc.std...X", "fBodyAcc.std...Y", "fBodyAcc.std...Z",
"fBodyAccJerk.mean...X", "fBodyAccJerk.mean...Y", "fBodyAccJerk.mean...Z",
"fBodyAccJerk.std...X", "fBodyAccJerk.std...Y", "fBodyAccJerk.std...Z")
And this is the result I'm hoping for:
[3] "Time_Body_Acc_Mean_X" "Time_Body_Acc_Mean_Y"
[5] "Time_Body_Acc_Mean_Z" "Time_Body_Acc_Stddev_X"
[7] "Time_Body_Acc_Stddev_Y" "Time_Body_Acc_Stddev_Z"
[9] "Time_Gravity_Acc_Mean_X" "Time_Gravity_Acc_Mean_Y"
[11] "Time_Gravity_Acc_Mean_Z" "Time_Gravity_Acc_Stddev_X"
[13] "Time_Gravity_Acc_Stddev_Y" "Time_Gravity_Acc_Stddev_Z"
...
[43] "Freq_Body_Acc_Mean_X" "Freq_Body_Acc_Mean_Y"
[45] "Freq_Body_Acc_Mean_Z" "Freq_Body_Acc_Stddev_X"
[47] "Freq_Body_Acc_Stddev_Y" "Freq_Body_Acc_Stddev_Z"
[49] "Freq_Body_Acc_Jerk_Mean_X" "Freq_Body_Acc_Jerk_Mean_Y"
[51] "Freq_Body_Acc_Jerk_Mean_Z" "Freq_Body_Acc_Jerk_Stddev_X"
[53] "Freq_Body_Acc_Jerk_Stddev_Y" "Freq_Body_Acc_Jerk_Stddev_Z"
I've put together what feels like a really verbose way of making the changes employing regular expressions.
vars_list <- unlist(lapply(vars_list, function(x){gsub("^t", "Time", x)}))
vars_list <- unlist(lapply(vars_list, function(x){gsub("^f", "Freq", x)}))
vars_list <- unlist(lapply(vars_list, function(x){gsub("std", "Stddev", x)}))
vars_list <- unlist(lapply(vars_list, function(x){gsub("mean", "Mean", x)}))
vars_list <- unlist(lapply(vars_list, function(x){gsub("\\.+", "", x)}))
vars_list <- unlist(lapply(vars_list, function(x){gsub("\\.", "", x)}))
vars_list <- unlist(lapply(vars_list,
function(x){gsub("(?<=[a-z]).{0}(?=[A-Z])",
"_", x, perl = TRUE)}))
Is there a way to arrive at the same results more efficiently and elegantly by including two or more formatting steps in a single function call?
One alternative is to write your patterns and replacement in two vectors, then use stringi::stri_replace_all_regex which can do this replacement in a vectorized manner:
# patterns correspond to replacement at the same positions
patterns <- c('^t', '^f', 'std', 'mean', '\\.+', '(?<=[a-z])([A-Z])')
replacement <- c('Time', 'Freq', 'Stddev', 'Mean', '', '_$1')
library(stringi)
stri_replace_all_regex(vars_list, patterns, replacement, vectorize_all = FALSE)
# [1] "Time_Body_Acc_Mean_X" "Time_Body_Acc_Mean_Y"
# [3] "Time_Body_Acc_Mean_Z" "Time_Body_Acc_Stddev_X"
# [5] "Time_Body_Acc_Stddev_Y" "Time_Body_Acc_Stddev_Z"
# [7] "Time_Gravity_Acc_Mean_X" "Time_Gravity_Acc_Mean_Y"
# [9] "Time_Gravity_Acc_Mean_Z" "Time_Gravity_Acc_Stddev_X"
#[11] "Time_Gravity_Acc_Stddev_Y" "Time_Gravity_Acc_Stddev_Z"
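If adding a package is not an option, a similar effect can be sketched in base R by folding gsub over the pattern and replacement pairs with Reduce. This is an assumption-laden variant, not the stringi method itself: the helper name renommer is hypothetical, and the last replacement uses the backreference "_\\1" because gsub does not understand stringi's "$1" syntax.

```r
patterns <- c("^t", "^f", "std", "mean", "\\.+", "(?<=[a-z])([A-Z])")
replacements <- c("Time", "Freq", "Stddev", "Mean", "", "_\\1")

# Apply each pattern/replacement pair in turn to the whole vector
# (perl = TRUE is needed for the lookbehind in the last pattern)
renommer <- function(x) {
  Reduce(function(acc, i) gsub(patterns[i], replacements[i], acc, perl = TRUE),
         seq_along(patterns), x)
}

exemple <- c("tBodyAcc.mean...X", "fBodyAccJerk.std...Z")
renommer(exemple)
```

As with the stringi call, the order of the pairs matters, because each step sees the output of the previous one.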
How about this using base R's sub?
sub("t(\\w+)(Acc)\\.(\\w+)\\.+([XYZ])", "Time_\\1_\\2_\\3_\\4", vars_list);
#[1] "Time_Body_Acc_mean_X" "Time_Body_Acc_mean_Y"
#[3] "Time_Body_Acc_mean_Z" "Time_Body_Acc_std_X"
#[5] "Time_Body_Acc_std_Y" "Time_Body_Acc_std_Z"
#[7] "Time_Gravity_Acc_mean_X" "Time_Gravity_Acc_mean_Y"
#[9] "Time_Gravity_Acc_mean_Z" "Time_Gravity_Acc_std_X"
#[11] "Time_Gravity_Acc_std_Y" "Time_Gravity_Acc_std_Z"
Changing mean to Mean and std to Stddev requires two additional subs.
Ditto for t to Time and f to Freq.
As the title states, I am trying to use gsub with a vector for both the "pattern" and the "replacement". Currently, I have code that looks like this:
names(x1) <- gsub("2110027599", "Inv1", names(x1)) #x1 is a data frame
names(x1) <- gsub("2110025622", "Inv2", names(x1))
names(x1) <- gsub("2110028045", "Inv3", names(x1))
names(x1) <- gsub("2110034716", "Inv4", names(x1))
names(x1) <- gsub("2110069349", "Inv5", names(x1))
names(x1) <- gsub("2110023264", "Inv6", names(x1))
What I hope to do is something like this:
a <- c("2110027599","2110025622","2110028045","2110034716", "2110069349", "2110023264")
b <- c("Inv1","Inv2","Inv3","Inv4","Inv5","Inv6")
names(x1) <- gsub(a,b,names(x1))
I'm guessing there is an apply function somewhere that can do this, but I am not very sure which one to use!
EDIT: names(x1) looks like this (There are many more columns, but I'm leaving them out):
> names(x1)
[1] "2110023264A.Ms.Amp" "2110023264A.Ms.Vol" "2110023264A.Ms.Watt" "2110023264A1.Ms.Amp"
[5] "2110023264A2.Ms.Amp" "2110023264A3.Ms.Amp" "2110023264A4.Ms.Amp" "2110023264A5.Ms.Amp"
[9] "2110023264B.Ms.Amp" "2110023264B.Ms.Vol" "2110023264B.Ms.Watt" "2110023264B1.Ms.Amp"
[13] "2110023264Error" "2110023264E-Total" "2110023264GridMs.Hz" "2110023264GridMs.PhV.phsA"
[17] "2110023264GridMs.PhV.phsB" "2110023264GridMs.PhV.phsC" "2110023264GridMs.TotPFPrc" "2110023264Inv.TmpLimStt"
[21] "2110023264InvCtl.Stt" "2110023264Mode" "2110023264Mt.TotOpTmh" "2110023264Mt.TotTmh"
[25] "2110023264Op.EvtCntUsr" "2110023264Op.EvtNo" "2110023264Op.GriSwStt" "2110023264Op.TmsRmg"
[29] "2110023264Pac" "2110023264PlntCtl.Stt" "2110023264Serial Number" "2110025622A.Ms.Amp"
[33] "2110025622A.Ms.Vol" "2110025622A.Ms.Watt" "2110025622A1.Ms.Amp" "2110025622A2.Ms.Amp"
[37] "2110025622A3.Ms.Amp" "2110025622A4.Ms.Amp" "2110025622A5.Ms.Amp" "2110025622B.Ms.Amp"
[41] "2110025622B.Ms.Vol" "2110025622B.Ms.Watt" "2110025622B1.Ms.Amp" "2110025622Error"
[45] "2110025622E-Total" "2110025622GridMs.Hz" "2110025622GridMs.PhV.phsA" "2110025622GridMs.PhV.phsB"
What I hope to get is this:
> names(x1)
[1] "Inv6A.Ms.Amp" "Inv6A.Ms.Vol" "Inv6A.Ms.Watt" "Inv6A1.Ms.Amp" "Inv6A2.Ms.Amp"
[6] "Inv6A3.Ms.Amp" "Inv6A4.Ms.Amp" "Inv6A5.Ms.Amp" "Inv6B.Ms.Amp" "Inv6B.Ms.Vol"
[11] "Inv6B.Ms.Watt" "Inv6B1.Ms.Amp" "Inv6Error" "Inv6E-Total" "Inv6GridMs.Hz"
[16] "Inv6GridMs.PhV.phsA" "Inv6GridMs.PhV.phsB" "Inv6GridMs.PhV.phsC" "Inv6GridMs.TotPFPrc" "Inv6Inv.TmpLimStt"
[21] "Inv6InvCtl.Stt" "Inv6Mode" "Inv6Mt.TotOpTmh" "Inv6Mt.TotTmh" "Inv6Op.EvtCntUsr"
[26] "Inv6Op.EvtNo" "Inv6Op.GriSwStt" "Inv6Op.TmsRmg" "Inv6Pac" "Inv6PlntCtl.Stt"
[31] "Inv6Serial Number" "Inv2A.Ms.Amp" "Inv2A.Ms.Vol" "Inv2A.Ms.Watt" "Inv2A1.Ms.Amp"
[36] "Inv2A2.Ms.Amp" "Inv2A3.Ms.Amp" "Inv2A4.Ms.Amp" "Inv2A5.Ms.Amp" "Inv2B.Ms.Amp"
[41] "Inv2B.Ms.Vol" "Inv2B.Ms.Watt" "Inv2B1.Ms.Amp" "Inv2Error" "Inv2E-Total"
[46] "Inv2GridMs.Hz" "Inv2GridMs.PhV.phsA" "Inv2GridMs.PhV.phsB"
Lots of solutions already; here is one more:
The qdap package:
library(qdap)
names(x1) <- mgsub(a,b,names(x1))
From stringr documentation of str_replace_all, "If you want to apply multiple patterns and replacements to the same string, pass a named version to pattern."
Thus using a, b, and names(x1) from above
stringr::str_replace_all(names(x1), setNames(b, a))
EDIT
stringr::str_replace_all calls stringi::stri_replace_all_regex, which can be used directly and is quite a bit quicker.
x <- names(x1)
pattern <- a
replace <- b
microbenchmark::microbenchmark(
str = stringr::str_replace_all(x, setNames(replace, pattern)),
stri = stringi::stri_replace_all_regex(x, pattern, replace, vectorize_all = FALSE)
)
Unit: microseconds
 expr    min      lq     mean  median   uq    max neval cld
  str 1022.1 1070.45 1286.547 1175.55 1309 2526.8   100   b
 stri  145.2  150.45  190.124  160.55  178  457.9   100   a
New Answer
If we can make another assumption, the following should work. The assumption this time is that you are really interested in substituting the first 10 characters from each value in names(x1).
Here, I've stored names(x1) as a character vector named "X1". The solution essentially uses substr to separate the values in X1 into 2 parts, match to figure out the correct replacement option, and paste to put everything back together.
a <- c("2110027599", "2110025622", "2110028045",
"2110034716", "2110069349", "2110023264")
b <- c("Inv1","Inv2","Inv3","Inv4","Inv5","Inv6")
X1pre <- substr(X1, 1, 10)
X1post <- substr(X1, 11, max(nchar(X1)))
paste0(b[match(X1pre, a)], X1post)
# [1] "Inv6A.Ms.Amp" "Inv6A.Ms.Vol" "Inv6A.Ms.Watt"
# [4] "Inv6A1.Ms.Amp" "Inv6A2.Ms.Amp" "Inv6A3.Ms.Amp"
# [7] "Inv6A4.Ms.Amp" "Inv6A5.Ms.Amp" "Inv6B.Ms.Amp"
# [10] "Inv6B.Ms.Vol" "Inv6B.Ms.Watt" "Inv6B1.Ms.Amp"
# [13] "Inv6Error" "Inv6E-Total" "Inv6GridMs.Hz"
# [16] "Inv6GridMs.PhV.phsA" "Inv6GridMs.PhV.phsB" "Inv6GridMs.PhV.phsC"
# [19] "Inv6GridMs.TotPFPrc" "Inv6Inv.TmpLimStt" "Inv6InvCtl.Stt"
# [22] "Inv6Mode" "Inv6Mt.TotOpTmh" "Inv6Mt.TotTmh"
# [25] "Inv6Op.EvtCntUsr" "Inv6Op.EvtNo" "Inv6Op.GriSwStt"
# [28] "Inv6Op.TmsRmg" "Inv6Pac" "Inv6PlntCtl.Stt"
# [31] "Inv6Serial Number" "Inv2A.Ms.Amp" "Inv2A.Ms.Vol"
# [34] "Inv2A.Ms.Watt" "Inv2A1.Ms.Amp" "Inv2A2.Ms.Amp"
# [37] "Inv2A3.Ms.Amp" "Inv2A4.Ms.Amp" "Inv2A5.Ms.Amp"
# [40] "Inv2B.Ms.Amp" "Inv2B.Ms.Vol" "Inv2B.Ms.Watt"
# [43] "Inv2B1.Ms.Amp" "Inv2Error" "Inv2E-Total"
# [46] "Inv2GridMs.Hz" "Inv2GridMs.PhV.phsA" "Inv2GridMs.PhV.phsB"
Old Answer
If we can assume that names(x1) is in the same order as the pattern and replacement and that it is basically a one-for-one replacement, you might be able to get away with just sapply.
Here's an example of that particular situation:
Imagine "names(x)" looks something like this:
X1 <- paste0("A2", a, sequence(length(a)))
X1
# [1] "A221100275991" "A221100256222" "A221100280453"
# [4] "A221100347164" "A221100693495" "A221100232646"
Here's our pattern and replacement vectors:
a <- c("2110027599", "2110025622", "2110028045",
"2110034716", "2110069349", "2110023264")
b <- c("Inv1","Inv2","Inv3","Inv4","Inv5","Inv6")
This is how we might use sapply if these assumptions are valid.
sapply(seq_along(a), function(x) gsub(a[x], b[x], X1[x]))
# [1] "A2Inv11" "A2Inv22" "A2Inv33" "A2Inv44" "A2Inv55" "A2Inv66"
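Try mapply. Note that mapply pairs a, b and names(x1) element by element (recycling the shorter vectors), so this only gives the intended result when each name lines up with its own pattern.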
names(x1) <- mapply(gsub, a, b, names(x1), USE.NAMES = FALSE)
Or, even easier, str_replace from stringr.
library(stringr)
names(x1) <- str_replace(names(x1), a, b)
I needed to do something similar but had to use base R. As long as your vectors are the same length, I think this will work:
for (i in seq_along(a)){
  names(x1) <- gsub(a[i], b[i], names(x1))
}
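To see the loop at work without the full data frame, here is a small sketch on a few of the names from the question. The vector noms is a hypothetical stand-in for names(x1), and fixed = TRUE is an optional addition that treats each ID as a literal string rather than a regular expression:

```r
a <- c("2110027599", "2110025622", "2110023264")
b <- c("Inv1", "Inv2", "Inv6")

# A few column names taken from the question, standing in for names(x1)
noms <- c("2110023264A.Ms.Amp", "2110025622Error", "2110023264Pac")

# Every pattern is tried against every name, so the order of the names is irrelevant
for (i in seq_along(a)) {
  noms <- gsub(a[i], b[i], noms, fixed = TRUE)
}
noms
```

Unlike an element-by-element pairing, this works even when the names are not sorted in the same order as the patterns.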
Somehow names<- and match seems much more appropriate here...
names( x1 ) <- b[ match( names( x1 ) , a ) ]
But I am making the assumption that the elements of vector a are the actual names of your data.frame.
If a really is a pattern found within each of the names of x1 then this grepl approach with names<- could be useful...
new <- sapply( a , grepl , x = names( x1 ) )
names( x1 ) <- b[ apply( new , 1 , which.max ) ]
I am having trouble extracting the values that share the same dates across two different data sources in R. The code is:
#Monthly data
month_data <- c(580.11, 618.25, 641.24, 604.85, 580.86, 580.07, 632.97,
685.09, 754.50, 680.30, 698.37, 707.38, 480.11, 528.25,
541.24, 614.85, 680.86)
month_dates <- seq(as.Date("2001/06/01"), by = "1 months", length = 17)
month_data <- data.frame(month_dates, month_data)
#the dates_for_match is a list:
dates_for_match<-list(c( "2001-08-01","2001-09-01", "2001-10-01"),c("2001-11-01","2001-12-01","2002-01-01"),c("2002-02-01","2002-03-01","2002-04-01"),c("2002-05-01","2002-06-01","2002-07-01"),c( "2002-08-01","2002-09-01", "2002-10-01"))
Example:
> dates_for_match
[[1]]
[1] "2001-08-01" "2001-09-01" "2001-10-01"
[[2]]
[1] "2001-11-01" "2001-12-01" "2002-01-01"
[[3]]
[1] "2002-02-01" "2002-03-01" "2002-04-01"
[[4]]
[1] "2002-05-01" "2002-06-01" "2002-07-01"
[[5]]
[1] "2002-08-01" "2002-09-01" "2002-10-01"
I want to use the dates_for_match list to get the values from month_data that have the same dates.
You need %in%...
month_data[ month_dates %in% unlist( dates_for_match ) , 2 ]
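If the values are wanted separately for each element of the list rather than as one flat vector, the same %in% test can be applied per group with lapply. A sketch under a few assumptions: the names df, values, dates and by_group are illustrative, only two of the five date groups are reproduced, and the strings are converted with as.Date so that dates are compared with dates:

```r
values <- c(580.11, 618.25, 641.24, 604.85, 580.86, 580.07, 632.97,
            685.09, 754.50, 680.30, 698.37, 707.38, 480.11, 528.25,
            541.24, 614.85, 680.86)
dates <- seq(as.Date("2001-06-01"), by = "month", length.out = 17)
df <- data.frame(dates = dates, values = values)

# Two of the groups from the question, kept as character vectors
dates_for_match <- list(c("2001-08-01", "2001-09-01", "2001-10-01"),
                        c("2002-08-01", "2002-09-01", "2002-10-01"))

# For each group, keep the values whose date appears in that group
by_group <- lapply(dates_for_match,
                   function(d) df$values[df$dates %in% as.Date(d)])
by_group
```

The result is a list with one numeric vector per group, in the same order as dates_for_match.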