Partial string match in another dataframe in r

Is there a way to find all the partial matches between df_2 and df_1?
A partial match means that part of a DF_1 string is the whole DF_2 string, or vice versa.
For example, part of "for solution" is the whole string "solution".
df_1=data.frame(
DF_1=c("suspension","tablet","for solution","capsule")
)
df_2=data.frame(
index=c("1","2","3","4","5"),
DF_2=c("for suspension", "suspension", "solution", "tablet,ER","tablet,IR")
)
df_out=data.frame(
DF_1=c("suspension","suspension","tablet","tablet","for solution"),
DF_2=c("for suspension", "suspension","tablet,ER","tablet,IR","solution"),
index=c("1","2","4","5","3")
)

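We can use fuzzyjoin. regex_left_join() treats each value of the right-hand column (DF_1) as a regular expression and looks for it inside DF_2, so this finds the DF_1 strings contained in DF_2 (but not the reverse):
library(fuzzyjoin)
regex_left_join(df_2, df_1, by = c("DF_2" = "DF_1"))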

Following @akrun's suggestion of using fuzzyjoin:
According to your expected output, you want to join in both directions, and you want an inner join.
An exact match is then found twice, which is why you need to deduplicate (I did it with distinct() from dplyr, but you can use any method you like).
library(dplyr)
library(fuzzyjoin)
df_out <- distinct(
  rbind(
    regex_inner_join(df_1, df_2, by = c("DF_1" = "DF_2")),
    regex_inner_join(df_2, df_1, by = c("DF_2" = "DF_1"))
  )
)
df_out
The output is:
DF_1 index DF_2
1 suspension 2 suspension
2 for solution 3 solution
3 suspension 1 for suspension
4 tablet 4 tablet,ER
5 tablet 5 tablet,IR
This is your expected table, although the rows and columns are in a different order.

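Here is a base R option using nested *apply + grepl. For each DF_2 value it keeps the DF_1 entry that contains it or is contained in it (this assumes every DF_2 value matches exactly one DF_1 entry):
df_out <- within(
  df_2,
  DF_1 <- unlist(sapply(
    DF_2,
    function(x) {
      Filter(
        Negate(is.na),
        lapply(
          df_1$DF_1,
          # return the DF_1 entry y when either string contains the other
          function(y) if (grepl(y, x, fixed = TRUE) || grepl(x, y, fixed = TRUE)) y else NA
        )
      )
    }
  ), use.names = FALSE)
)
such that
> df_out
index DF_2 DF_1
1 1 for suspension suspension
2 2 suspension suspension
3 3 solution for solution
4 4 tablet,ER tablet
5 5 tablet,IR tablet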

This sounds like a job for grepl()!
E.g. grepl(value, chars, fixed = TRUE), where fixed = TRUE treats value as a literal string rather than a regular expression.
Let me quote an example from a different answer:
> chars <- "test"
> value <- "es"
> grepl(value, chars)
[1] TRUE
> chars <- "test"
> value <- "et"
> grepl(value, chars)
[1] FALSE
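Applying grepl() to the question's data means testing every (DF_1, DF_2) pair in both directions. Below is a minimal base R sketch of that idea, one possible approach rather than the answerer's own code: merge(by = NULL) builds the cross join of the two data frames, and fixed = TRUE makes each string a literal pattern.

```r
# Data from the question (stringsAsFactors = FALSE keeps plain character columns)
df_1 <- data.frame(DF_1 = c("suspension", "tablet", "for solution", "capsule"),
                   stringsAsFactors = FALSE)
df_2 <- data.frame(index = c("1", "2", "3", "4", "5"),
                   DF_2 = c("for suspension", "suspension", "solution", "tablet,ER", "tablet,IR"),
                   stringsAsFactors = FALSE)

# Cartesian product: every DF_1 string paired with every row of df_2
pairs <- merge(df_1, df_2, by = NULL)

# Keep a pair when either string contains the other as a literal substring
hit <- mapply(function(a, b) grepl(a, b, fixed = TRUE) || grepl(b, a, fixed = TRUE),
              pairs$DF_1, pairs$DF_2)

df_out <- pairs[hit, c("DF_1", "DF_2", "index")]
rownames(df_out) <- NULL
df_out
```

The cross join has nrow(df_1) * nrow(df_2) rows, so this is fine for small lookup tables but grows quickly for large ones.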

Related

R: Counting frequency of words from predefined dictionary

I have a very large dataset that looks like this: one column contains names, the second column contains their respective (very long) texts. I also have a pre-defined dictionary that contains at least 20 terms. How can I count the number of times these key words occur in each row of my dataframe? I have tried str_detect, grep(l), and %like%, and looped over each row, but the problem seems to be that I want to detect too many terms, and these functions stop working when I use 15+ terms or so.
Would be sooo happy if anyone could help me out with this!
col1<- c("Henrik", "Joseph", "Lucy")
col2 <- c("I am going to get groceries", "He called me at six.", "No, he did not")
df <- data.frame(col1, col2)
dict <- c("groceries", "going", "me") #but my actual dictionary is much larger

Create a unique identifier for your rows. Split col2 into words, one per row. Keep only the words in your dict. Then count the words in each row. Finally, join the counts back to the original df and replace NA with 0 for rows that contain no words from your dict.
library(dplyr)
col1 <- c("A","B","A")
col2 <- c("I am going to get groceries", "He called me at six.", "No, he did not")
df <- data.frame(col1, col2, stringsAsFactors = FALSE)
dict <- c("groceries", "going", "me")
df <- df %>% mutate(row = row_number()) %>% select(row, everything())
counts <- df %>%
  tidyr::separate_rows(col2) %>%
  filter(col2 %in% dict) %>%
  group_by(row) %>%
  count(name = "counts")
final <- left_join(df, counts, by = "row") %>%
  tidyr::replace_na(list(counts = 0L))
final
#> row col1 col2 counts
#> 1 1 A I am going to get groceries 2
#> 2 2 B He called me at six. 1
#> 3 3 A No, he did not 0

Here is a base R option using gregexpr
dfout <- within(
df,
counts <- sapply(
gregexpr(paste0(dict, collapse = "|"), col2),
function(x) sum(x > 0)
)
)
or
dfout <- within(
df,
counts <- sapply(
regmatches(col2, gregexpr("\\w+", col2)),
function(v) sum(v %in% dict)
)
)
which gives
> dfout
col1 col2 counts
1 1 I am going to get groceries 2
2 2 He called me at six. 1
3 3 No, he did not 0
Data
structure(list(col1 = 1:3, col2 = c("I am going to get groceries",
"He called me at six.", "No, he did not")), class = "data.frame", row.names = c(NA,
-3L))
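The two base R options above differ on words that merely contain a dictionary term: the gregexpr alternation counts substrings, while the \\w+ version compares whole words. A minimal sketch of a whole-word count that keeps the single-pattern style, using \\b word-boundary anchors. It runs on the question's sentences plus one hypothetical sentence where "me" also appears inside "home"; it assumes the dictionary terms contain no regex metacharacters (otherwise they would need escaping).

```r
dict <- c("groceries", "going", "me")
# The question's three sentences plus one hypothetical sentence containing "home"
col2 <- c("I am going to get groceries", "He called me at six.",
          "No, he did not", "Call me when you get home")

# Plain alternation: counts every substring hit, so "home" also counts as "me"
substring_pattern <- paste(dict, collapse = "|")
substring_counts <- lengths(regmatches(col2, gregexpr(substring_pattern, col2, perl = TRUE)))

# \\b anchors restrict the alternation to whole words
word_pattern <- paste0("\\b(", paste(dict, collapse = "|"), ")\\b")
word_counts <- lengths(regmatches(col2, gregexpr(word_pattern, col2, perl = TRUE)))

substring_counts
#> [1] 2 1 0 2
word_counts
#> [1] 2 1 0 1
```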

I think my solution gives you the output you want: for each word in your dict vector, you can see how many times it appears in each sentence. Each entry of df$col2 is a sentence, and dict is your vector of terms to match. We loop over dict and, for each term, count how many times it appears in each sentence using stringr::str_count. Note the syntax: str_count(string to search, pattern to match).
str_count returns a vector with the number of matches in each sentence. I rbind these vectors into a data frame, which therefore has one row per entry in dict. Then you can cbind dict to that data frame to see how many times each word is used in each sentence. At the very end I adjust the column names so you can match the counts to the sentence numbers. Note that if you want to calculate row means, you first need to drop the dict column of the final data frame, because it is character.
library(stringr)
col1<- c("Henrik", "Joseph", "Lucy")
col2 <- c("I am going to get groceries", "He called me at six.", "No, he did not")
df <- data.frame(col1, col2)
dict <- c("groceries", "going", "me")
word_matches <- data.frame()
for (i in dict) {
word_tot<-(str_count(df$col2, i))
word_matches <- rbind(word_matches,word_tot)
}
word_matches
colnames(word_matches) <- paste("Sentence", 1:ncol(word_matches))
cbind(dict,word_matches)
dict Sentence 1 Sentence 2 Sentence 3
1 groceries 1 0 0
2 going 1 0 0
3 me 0 1 0
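The same term-by-sentence table can be built without growing a data frame inside a loop. A minimal base R sketch, offered as an alternative to the stringr loop rather than a replacement for it: sapply() calls one counting function per dictionary term and collects the results into a matrix, and gregexpr(fixed = TRUE) counts literal occurrences much as str_count does for these plain words.

```r
col2 <- c("I am going to get groceries", "He called me at six.", "No, he did not")
dict <- c("groceries", "going", "me")

# For each term, count its literal occurrences in every sentence;
# sapply() names the result columns after the terms in dict
counts <- sapply(dict, function(w) lengths(regmatches(col2, gregexpr(w, col2, fixed = TRUE))))

# Transpose so each row is a term and each column a sentence, as in the loop version
word_matches <- t(counts)
colnames(word_matches) <- paste("Sentence", seq_along(col2))
word_matches
```

This gives the same counts as the loop version (groceries and going once in sentence 1, me once in sentence 2), with dict kept as row names instead of a character column, so rowMeans(word_matches) works directly.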

R regexp for odd sorting of a char vector

I have several hundred files that need their columns sorted in a convoluted way. Imagine a character vector x which is the result of names(foo) where foo is a data.frame:
x <- c("x1","i2","Component.1","Component.10","Component.143","Component.13",
"r4","A","C16:1n-7")
I'd like to have it ordered according to the following rule: First, alphabetical for anything starting with "Component". Second, alphabetical for anything remaining starting with "C" and a number. Third anything remaining in alphabetical order.
For x that would be:
x[c(3,4,6,5,9,8,2,7,1)]
Is this a regexp kind of task? And does one use match? Each file will have a different number of columns (so x will be of varying lengths). Any tips appreciated.

You can achieve that with the function order from base R:
x <- c("x1","i2","Component.1","Component.10","Component.143","Component.13",
"r4","A","C16:1n-7")
order(
  !startsWith(x, "Component"), # FALSE (sorted first) if it starts with "Component", TRUE otherwise
  !grepl("^C\\d", x),          # FALSE (sorted next) if it starts with "C" and a digit, TRUE otherwise
  x                            # alphabetical within each group
)
# output: 3 4 6 5 9 8 2 7 1
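Since the question mentions several hundred files with differing columns, the order() call can be wrapped in a small helper and applied to each data frame. A minimal sketch; sort_cols is a hypothetical name, and the commented-out file loop assumes CSV files in a placeholder directory.

```r
# Column names from the question
x <- c("x1", "i2", "Component.1", "Component.10", "Component.143", "Component.13",
       "r4", "A", "C16:1n-7")

# Hypothetical helper: reorder a data frame's columns by the three-tier rule
sort_cols <- function(df) {
  nm <- names(df)
  df[order(!startsWith(nm, "Component"),  # tier 1: names starting with "Component"
           !grepl("^C\\d", nm),           # tier 2: "C" followed by a digit
           nm)]                           # alphabetical within each tier
}

# A small dummy data frame with those column names, to try it out
df1 <- setNames(as.data.frame(matrix(0, nrow = 2, ncol = length(x))), x)
names(sort_cols(df1))

# Hypothetical use over many files (placeholder path); check.names = FALSE keeps
# names such as "C16:1n-7" from being converted by read.csv
# files <- list.files("path/to/files", pattern = "\\.csv$", full.names = TRUE)
# sorted <- lapply(files, function(f) sort_cols(read.csv(f, check.names = FALSE)))
```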

A brute-force solution using only base R:
first = sort(x[grepl('^Component', x)])
second = sort(x[grepl('^C\\d', x)])
third = sort(setdiff(x, c(first, second)))
c(first, second, third)

We can split it into the different groups and then use mixedsort from gtools:
v1 <- c(gtools::mixedsort(grep("Component", x, value = TRUE)),
gtools::mixedsort(grep("^C\\d+", x, value = TRUE)))
c(v1, gtools::mixedsort(x[!x %in% v1]))
#[1] "Component.1" "Component.10" "Component.13" "Component.143" "C16:1n-7" "A" "i2" "r4"
#[9] "x1"
Or, assuming these are the columns of the data.frame, another option is select:
library(dplyr)
library(gtools)
df1 %>%
  select(all_of(mixedsort(grep("^Component", names(.), value = TRUE))),
         all_of(mixedsort(grep("^C\\d+", names(.), value = TRUE))),
         all_of(mixedsort(names(.))))
If only the order of occurrence matters within each group:
df1 %>%
  select(starts_with("Component"), matches("^C\\d+"), all_of(sort(names(.))))
Data
set.seed(24)
df1 <- as.data.frame(matrix(rnorm(5 * 9), ncol = 9,
dimnames = list(NULL, x)))

How to rename individual column in dataframe [duplicate]

I know if I have a data frame with more than 1 column, then I can use
colnames(x) <- c("col1","col2")
to rename the columns. How to do this if it's just one column?
Meaning a vector or data frame with only one column.
Example:
trSamp <- data.frame(sample(trainer$index, 10000))
head(trSamp )
# sample.trainer.index..10000.
# 1 5907862
# 2 2181266
# 3 7368504
# 4 1949790
# 5 3475174
# 6 6062879
ncol(trSamp)
# [1] 1
class(trSamp)
# [1] "data.frame"
class(trSamp[1])
# [1] "data.frame"
class(trSamp[,1])
# [1] "numeric"
colnames(trSamp)[2] <- "newname2"
# Error in names(x) <- value :
# 'names' attribute [2] must be the same length as the vector [1]

This is a generalized way in which you do not have to remember the exact location of the variable:
# df = dataframe
# old.var.name = The name you don't like anymore
# new.var.name = The name you want to get
names(df)[names(df) == 'old.var.name'] <- 'new.var.name'
This code does the following:
names(df) looks up all the names in df
[names(df) == 'old.var.name'] selects only the name equal to 'old.var.name' (a logical index that is TRUE at that position)
<- 'new.var.name' assigns the new variable name.
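One consequence of the logical index: if 'old.var.name' is misspelled, the index is all FALSE and the assignment silently changes nothing. A minimal sketch of a guarded version of the same idiom; rename_col is a hypothetical helper, and the data frame mimics the question's trSamp using the first values from its head() output.

```r
# Hypothetical helper built on names(df)[names(df) == old] <- new, with a guard
# that stops instead of silently doing nothing when old is not a column name
rename_col <- function(df, old, new) {
  if (!old %in% names(df)) stop("no column named '", old, "'")
  names(df)[names(df) == old] <- new
  df
}

# One-column data frame shaped like the question's trSamp
trSamp <- data.frame(sample.trainer.index..10000. = c(5907862, 2181266, 7368504))
trSamp <- rename_col(trSamp, "sample.trainer.index..10000.", "newname2")
names(trSamp)
#> [1] "newname2"
```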

colnames(trSamp)[2] <- "newname2"
attempts to set the second column's name. Your object only has one column, so the command throws an error. This should be sufficient:
colnames(trSamp) <- "newname2"

colnames(df)[colnames(df) == 'oldName'] <- 'newName'

This is an old question, but it is worth noting that you can now use setnames from the data.table package.
library(data.table)
setnames(DF, "oldName", "newName")
# or, since the data.frame in question has just one column:
setnames(DF, "newName")
# And for reference's sake, in general (more than one column), give one new name per column:
nms <- c("col1.name", "col2.name")
setnames(DF, nms)

This can also be done using Hadley's plyr package and its rename function.
library(plyr)
df <- data.frame(foo=rnorm(1000))
df <- rename(df,c('foo'='samples'))
You can rename by the name (without knowing the position) and perform multiple renames at once. After doing a merge, for example, you might end up with:
letterid id.x id.y
1 70 2 1
2 116 6 5
3 116 6 4
4 116 6 3
5 766 14 9
6 766 14 13
Which you can then rename in one step using:
letters <- rename(letters,c("id.x" = "source", "id.y" = "target"))
letterid source target
1 70 2 1
2 116 6 5
3 116 6 4
4 116 6 3
5 766 14 9
6 766 14 13

I think the best way of renaming columns is by using the dplyr package like this:
library(dplyr)
df <- rename(df, new_col01 = old_col01, new_col02 = old_col02) # add more new = old pairs as needed
It works the same for renaming one or many columns in any dataset.

I find that the most convenient way to rename a single column is dplyr::rename_at() (superseded by rename_with() since dplyr 1.0.0, but still available):
library(dplyr)
cars %>% rename_at("speed",~"new") %>% head
cars %>% rename_at(vars(speed),~"new") %>% head
cars %>% rename_at(1,~"new") %>% head
# new dist
# 1 4 2
# 2 4 10
# 3 7 4
# 4 7 22
# 5 8 16
# 6 9 10
works well in pipe chains
convenient when names are stored in variables
works with a name or a column index
clear and compact

I like the following style for renaming data frame columns one at a time:
colnames(df)[which(colnames(df) == 'old_colname')] <- 'new_colname'
where
which(colnames(df) == 'old_colname')
returns the index of the specific column.

Let df be a data frame with the columns myDays and temp.
If you want to rename "myDays" to "Date":
library(plyr)
rename(df, c("myDays" = "Date"))
or, with a pipe (%>% from magrittr or dplyr must be loaded):
dfNew <- df %>%
  plyr::rename(c("myDays" = "Date"))

Try (the data frame has only one column, so the index is 1, not 2):
colnames(trSamp)[1] <- 'newname2'

This is likely already out there, but I was playing with renaming fields while searching for a solution and tried this on a whim. It worked for my purposes (note that the renamed column ends up as the last column):
Table1$FieldNewName <- Table1$FieldOldName
Table1$FieldOldName <- NULL
Edit: this works as well, using rename() from plyr:
df <- rename(df, c("oldColName" = "newColName"))

You can use rename.vars() from the gdata package.
library(gdata)
df <- rename.vars(df, from = "oldname", to = "newname")
This is particularly useful when you have more than one variable name to change, or when you want to append or prepend text to the variable names. Then you can do something like:
df <- rename.vars(df, from = c("old1", "old2", "old3"),
                  to = c("new1", "new2", "new3"))
For an example of appending text to a subset of variable names, see:
https://stackoverflow.com/a/28870000/180892

You could also try upData() from the Hmisc package.
library(Hmisc)
trSamp = upData(trSamp, rename=c(sample.trainer.index..10000. = 'newname2'))

If you know that your dataframe has only one column, you can use:
names(trSamp) <- "newname2"

The OP's question has been well and truly answered. However, here's a trick that may be useful in some situations: partial matching of the column name, irrespective of its position in a dataframe:
Partial matching on the name:
d <- data.frame(name1 = NA, Reported.Cases..WHO..2011. = NA, name3 = NA)
## name1 Reported.Cases..WHO..2011. name3
## 1 NA NA NA
names(d)[grepl("Reported", names(d))] <- "name2"
## name1 name2 name3
## 1 NA NA NA
Another example: partial matching on the presence of "punctuation":
d <- data.frame(name1 = NA, Reported.Cases..WHO..2011. = NA, name3 = NA)
## name1 Reported.Cases..WHO..2011. name3
## 1 NA NA NA
names(d)[grepl("[[:punct:]]", names(d))] <- "name2"
## name1 name2 name3
## 1 NA NA NA
These were examples I had to deal with today; I thought they might be worth sharing.
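A caveat with renaming via grepl(): if the pattern matches more than one column, every match receives the same new name, leaving duplicate column names. A minimal sketch on the same d, showing the pitfall and a hypothetical guard that refuses to rename unless exactly one column matches.

```r
# Data from the answer above
d <- data.frame(name1 = NA, Reported.Cases..WHO..2011. = NA, name3 = NA)

# Pitfall: "name" matches both name1 and name3, so both become "name2"
d2 <- d
names(d2)[grepl("name", names(d2))] <- "name2"

# Hypothetical guard: rename only if the pattern picks out exactly one column
hits <- grepl("Reported", names(d))
if (sum(hits) != 1) stop("pattern must match exactly one column, matched ", sum(hits))
names(d)[hits] <- "name2"
names(d)
```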

I would simply assign the new name to the column by its index:
names(dataset)[index_value] <- "new_col_name"

I found colnames() easier:
https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/row%2Bcolnames
Select some columns from the data frame:
df <- data.frame(df[, c("hhid", "b1005", "b1012_imp", "b3004a")])
and rename the selected columns in order:
colnames(df) <- c("hhid", "income", "cost", "credit")
Check the names and the values to be sure:
names(df); head(df)

I would simply add a new column to the data frame with the name I want and copy the data into it from the existing column, like this:
dataf$value <- dataf$Article1Order
Then I remove the old column, like this:
dataf$Article1Order <- NULL
This code might seem silly, but it works (note that the renamed column ends up as the last column).

We can use rename_with() to rename columns with a function (stringr functions, for example).
Consider the following data df_1:
df_1 <- data.frame(
  x = replicate(n = 3, expr = rnorm(n = 10, mean = 10, sd = 1)),
  y = sample(x = 1:2, size = 10, replace = TRUE)
)
names(df_1)
#[1] "x.1" "x.2" "x.3" "y"
Rename all variables with dplyr::everything():
library(tidyverse)
df_1 %>%
  rename_with(.fn = ~ str_c('var', seq_along(.x), sep = '_'), .cols = everything()) %>%
  names()
#[1] "var_1" "var_2" "var_3" "var_4"
Rename by part of the name with tidyselect helpers (starts_with, ends_with, contains, matches, ...).
Example with . (the x variables):
df_1 %>%
  rename_with(.fn = ~ str_c('var', seq_along(.x), sep = '_'), .cols = contains('.')) %>%
  names()
#[1] "var_1" "var_2" "var_3" "y"
Rename by class with a type-testing predicate wrapped in where(), like is.integer, is.numeric, is.factor...
Example with is.integer (y):
df_1 %>%
  rename_with(.fn = ~ str_c('var', seq_along(.x), sep = '_'), .cols = where(is.integer)) %>%
  names()
#[1] "x.1" "x.2" "x.3" "var_1"
Inside .fn, .x holds only the selected names, so seq_along(.x) produces exactly one new name per selected column and no length-mismatch warnings are raised.

library(dplyr)
rename(data, de=de.y)

finding similar element between two data

I asked a question before which was complicated, and I did not get any help, so I have tried to simplify the question and the input/output.
I have tried many approaches, but none worked. For example, here are some of them:
# 1
for(i in ncol(mydata)){
corsA = grep(colnames(mydata)[i] , colnames(mysecond))
mydata[,corsA]%in%mysecond[,i]}
# here if I get true then means they have match
## 2
are.cols.identical <- function(col1, col2) identical(mydata[,col1], mysecond[,col2])
res <- outer(colnames(mydata), colnames(mysecond),FUN = Vectorize(are.cols.identical))
cut <- apply(res, 1, function(x)match(TRUE, x))
### 3
(mydata$Rad) %in% (mysecond$Ro5_P1_A5)
#### 4
which(mydata %in% mysecond)
#### 5
match(mydata$sus., mysecond$R5_P1_A5)
or
which(mydata$sus. %in% mysecond$RP1_A5)
matches <- sapply(mydata,function(x) sapply(mysecond,identical,x))
and a few others, but none led me to an answer.

Here is another solution using regex:
rows <- mapply(grep, mysecond, mydata)
The step above returns a list with the matched rows in each column. Note that grep() uses only the first element of its pattern argument (with a warning), so each mydata column is searched for the first value of the corresponding mysecond column:
rows
If you would like to see how many rows were matched, you can do this:
lapply(rows, length)
Now we can go ahead and get the rows of interest in mydata. rows is a list, so we need to unlist() it, and it may contain duplicate rows that we don't want to appear twice in the output, so we use unique():
rows <- unique(unlist(rows))
mydata[rows,]
#View(mydata[rows,])

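require(plyr)
# split each comma-separated list of UniProt IDs into its individual IDs
dat <- strsplit(as.character(mydata$subunits..UniProt.IDs.), ',')
# one row per original row, one column per ID (rbind.fill pads shorter rows with NA)
dat <- data.frame(mydata[,1], rbind.fill(lapply(dat, function(y){as.data.frame(t(y), stringsAsFactors=FALSE)})))
# keep the rows (indices may repeat) where an ID appears in the second column of mysecond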
mydata[unlist(apply(dat,2, function(x) which(x %in% mysecond[,2]))),]
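The strsplit() idea above can also be written in base R without reshaping into a wide table. A minimal sketch on hypothetical toy data (the question does not show the real mydata and mysecond, so the column names complex, subunits and id are assumptions): split each comma-separated ID list and keep the rows where any ID appears in the second table. Each row is returned at most once, so no unique() step is needed.

```r
# Hypothetical toy data standing in for mydata and mysecond
mydata <- data.frame(
  complex = c("c1", "c2", "c3"),
  subunits = c("P1,P2", "P3", "P4, P5"),
  stringsAsFactors = FALSE
)
mysecond <- data.frame(id = c("P2", "P5", "P9"), stringsAsFactors = FALSE)

# Split each comma-separated entry into its IDs, dropping surrounding spaces
ids <- lapply(strsplit(mydata$subunits, ","), trimws)

# TRUE for rows where at least one ID occurs in mysecond$id
keep <- vapply(ids, function(v) any(v %in% mysecond$id), logical(1))
mydata[keep, ]
```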
