When I do:
> quo(DLX6-AS1)
The output is:
<quosure>
expr: ^DLX6 - AS1
env: global
Which inserts spaces around the dash.
When I try to convert that to a string, I get either:
quo(DLX6-AS1) %>% quo_name
"DLX6 - AS1"
or
quo(DLX6-AS1) %>% rlang::quo_name
or
quo(`DLX6-AS1`) %>% rlang::quo_name
Error: Can't convert a call to a string
How can I make it possible to use strings with dashes in my function? The function takes in a gene name and looks up that row in a dataframe, but some of the genes are concatenated by a dash:
geneFn <- function(exp.df = seurat.object#data, gene = SOX2) {
gene <- enquo(gene)
exp.df <- exp.df[as_name(gene), ]
}
> geneFn(DLX6-AS1)
Thanks!
This has been asked before here: https://github.com/r-lib/rlang/issues/770 , but it doesn't answer how to actually do this.
What version of rlang do you have? For me this works:
quo(`DLX6-AS1`) %>% quo_name()
#> [1] "DLX6-AS1"
You do need to use backticks when column names have special characters, otherwise they are interpreted as code.
Note that it is recommended to use either as_name() or as_label() instead of quo_name(), the latter was a misleading misnomer and might be deprecated in the future.
One option would be to stick with bare row names but wrap names that aren't syntactically valid (like names with dashes) in backticks. This could be confusing if someone else is supposed to use this function.
Here's a small, reproducible example:
library(rlang)
dat = data.frame(x1 = letters[1:2],
x2 = LETTERS[1:2])
row.names(dat) = c("DLX6-AS1", "other")
geneFn <- function(exp.df = dat, gene = other) {
gene <- enquo(gene)
exp.df[as_name(gene), ]
}
geneFn(gene = other)
# x1 x2
# other b B
geneFn(gene = `DLX6-AS1`)
# x1 x2
# DLX6-AS1 a A
If you have many names like this, it may be simpler to pass quoted names instead of bare names. This also simplifies the function a bit since you don't need tidyeval.
geneFn2 <- function(exp.df = dat, gene = "other") {
exp.df[gene, ]
}
geneFn2(gene = "other")
# x1 x2
# other b B
geneFn2(gene = "DLX6-AS1")
# x1 x2
# DLX6-AS1 a A
Another option is to make syntactically valid names row names. The make.names() function can help with this.
make.names( row.names(dat) )
[1] "DLX6.AS1" "other"
Then you could assign these new row names to replace the old and go ahead with your original function with the new names.
row.names(dat) = make.names( row.names(dat) )
What about:
geneFn <- function(exp.df = seurat.object#data, gene = SOX2) {
gene <- sub(" - ","-", deparse(enexpr(gene)))
exp.df <- exp.df[gene, ]
}
Related
I am a beginner and I'm trying to find the most efficient way to change the name of the first column for many CSV files that I will be creating. Once I have created the CSV files, I am loading them into R as follows:
data <- read.csv('filename.csv')
I have used the names() function to do the name change of a single file:
names(data)[1] <- 'Y'
However, I would like to find the most efficient way of combining/piping this name change to read.csv so the same name change is applied to every file when they are opened. I tried to write a 'simple' function to do this:
addName <- function(data) {
names(data)[1] <- 'Y'
data
}
However, I do not yet fully understand the syntax for writing a function and I can't get this to work.
Note
If you were expecting your original addName function to "mutate" an existing object like so
x <- data.frame(Column_1 = c(1, 2, 3), Column_2 = c("a", "b", "c"))
# Try (unsuccessfully) to change title of "Column_1" to "Y" in x.
addName(x)
# Print x.
x
please be aware that R passes by value rather than by reference, so x itself would remain unchanged:
Column_1 Column_2
1 1 a
2 2 b
3 3 c
Any "mutation" would be achieved by overwriting x with the return value of the function
x <- addName(x)
# Print x.
x
in which case x itself would obviously be changed:
Y Column_2
1 1 a
2 2 b
3 3 c
Answer
Anyway, here's a solution that compactly incorporates pipes (%>% from the magrittr package) and a custom function. Please note that without the linebreaks and comments, which I have added for clarity, this could be condensed to only a few lines of code.
# The dplyr package helps with easy renaming, and it includes the magrittr pipe.
library(dplyr)
# ...
filenames <- c("filename1.csv", "filename2.csv", "filename3.csv")
# A function to take a CSV filename and give back a renamed dataset taken from that file.
addName <- function(filename) {
return(# Read in the named file as a data.frame.
read.csv(file = filename) %>%
# Take the resulting data.frame, and rename its first column as "Y";
# quotes are optional, unless the name contains spaces: "My Column"
# or `My Column` are needed then.
dplyr::rename(Y = 1))
}
# Get a list of all the renamed datasets, as taken by addName() from each of the filenames.
all_files <- sapply(filenames, FUN = addName,
# Keep the list structure, in which each element is a
# data.frame.
simplify = FALSE,
# Name each list element by its filename, to help keep track.
USE.NAMES = TRUE)
In fact, you could easily rename any columns you desire, all in one fell swoop:
dplyr::rename(Y = 1, 'X' = 2, "Z" = 3, "Column 4" = 4, `Column 5` = 5)
This will read a vector of filenames, change the name of the first column of each one to "Y" and store all of the files in a list.
filenames <- c("filename1.csv","filename2.csv")
addName <- function(filename) {
data <- read.csv(filename)
names(data)[1] <- 'Y'
data
}
files <- list()
for (i in 1:length(filenames)) {
files[[i]] <- addName(filenames[i])
}
I'm a newbie in R, so please have some patience and... tips are most welcome.
My goal is to create tibble that holds a "Full Name" (of a person, that may have 2 to 4 names) and his/her gender. I must start from a tibble that contains typical Male and Female names.
Below I present a minimum working example.
My problem: I can call get_name() multiple time (in 10.000 for loop!!) and get the right answer. But, I was looking for a more 'elegant' way of doing it. replicate() unfortunately returns a vector... which make it unusable.
My doubts: I know I have some (very few... right!!) issues, like the if statement, that is evaluated every time (which is redundant), but I don't find another way to do it. Any suggestion?
Any other suggestions about code struct are also welcome.
Thank you very much in advance for your help.
# Dummy name list
unit_names <- tribble(
~Women, ~Man,
"fem1", "male1",
"fem2", "male2",
"fem3", "male3",
"fem4", "male4",
"fem5", "male5",
"fem6", NA,
"fem7", NA
)
set.seed(12345) # seed for test
# Create a tibble with the full names
full_name <- tibble("Full Name" = character(), "Gender" = character() )
get_name <- function() {
# Get the Number of 'Unit-names' to compose a 'Full-name'
nbr_names <- sample(2:4, 1, replace = TRUE)
# Randomize the Gender
gender <- sample(c("Women", "Man"), 1, replace = TRUE)
if (gender == "Women") {
lim_names <- sum( !is.na(unit_names$"Women"))
} else {
lim_names <- sum( !is.na(unit_names$"Man"))
}
# Sample the Fem/Man List names (may have duplicate)
sample(unlist(unit_names[1:lim_names, gender]), nbr_names, replace = TRUE) %>%
# Form a Full-name
paste ( . , collapse = " ") %>%
# Add it to the tibble (INCLUDE the Gender)
add_row(full_name, "Full Name" = . , "Gender" = gender)
}
# How can I make 10k of this?
full_name <- get_name()
If you pass a larger number than 1 to sample this problem becomes easier to vectorise.
One thing that currently makes your problem much harder is the layout of your unit_names table: you are effectively treating male and female names as individually paired, but they clearly aren’t: hence they shouldn’t be in columns of the same table. Use a list of two vectors, for instance:
unit_names = list(
Women = c("fem1", "fem2", "fem3", "fem4", "fem5", "fem6", "fem7"),
Men = c("male1", "male2", "male3", "male4", "male5")
)
Then you can generate random names to your heart’s delight:
generate_names = function (n, unit_names) {
name_length = sample(2 : 4, n, replace = TRUE)
genders = sample(c('Women', 'Men'), n, replace = TRUE)
names = Map(sample, unit_names[genders], name_length, replace = TRUE) %>%
lapply(paste, collapse = ' ') %>%
unlist()
tibble(`Full name` = names, Gender = genders)
}
A note on style, unlike your function the above doesn’t use any global variables. Furthermore, don’t "quote" variable names (you do this in unit_names$"Women" and for the arguments of add_row). R allows this, but this is arguably a mistake in the language specification: these are not strings, they’re variable names, making them look like strings is misleading. You don’t quote your other variable names, after all. You do need to backtick-quote the `Full name` column name, since it contains a space. However, the use of backticks, rather than quotes, signifies that this is a variable name.
I am not 100% of what you are trying to get, but if I got it right...did you try with mutate at dplyr? For example:
result= mutate(data.frame,
concated_column = paste(column1, column2, column3, column4, sep = '_'))
With a LITTLE help from Konrad Rudolph, the following elegant (and vectorized ... and fast) solution that I was looking. map2 does the necessary trick.
Here is the full working example if someone needs it:
(Just a side note: I kept the initial conversion from tibble to list because the data arrives to me as a tibble...)
Once again thanks to Konrad.
# Dummy name list
unit_names <- tribble(
~Women, ~Men,
"fem1", "male1",
"fem2", "male2",
"fem3", "male3",
"fem4", "male4",
"fem5", "male5",
"fem6", NA,
"fem7", NA
)
name_list <- list(
Women = unit_names$Women[!is.na(unit_names$Women)],
Men = unit_names$Men[!is.na(unit_names$Men)]
)
generate_names = function (n, name_list) {
name_length = sample(2 : 4, n, replace = TRUE)
genders = sample(c('Women', 'Men'), n, replace = TRUE)
#names = lapply(name_list[genders], sample, name_length) %>%
names = map2(name_list[genders], name_length, sample) %>%
lapply(paste, collapse = ' ') %>%
unlist()
tibble(`Full name` = names, Gender = genders)
}
full_name <- generate_names(10000, name_list)
I'd like to change the variable names in my data.frame from e.g. "pmm_StartTimev4_E2_C19_1" to "pmm_StartTimev4_E2_C19". So if the name ends with an underscore followed by any number it gets removed.
But I'd like for this to happen only if the variable name has the word "Start" in it.
I've got a muddled up bit of code that doesn't work. Any help would be appreciated!
# Current data frame:
dfbefore <- data.frame(a=c("pmm_StartTimev4_E2_C19_1","pmm_StartTimev4_E2_E2_C1","delivery_C1_C12"),b=c("pmm_StartTo_v4_E2_C19_2","complete_E1_C12_1","pmm_StartTo_v4_E2_C19"))
# Desired data frame:
dfafter <- data.frame(a=c("pmm_StartTimev4_E2_C19","pmm_StartTimev4_E2_E2_C1","delivery_C1_C12"),b=c("pmm_StartTo_v4_E2_C19","complete_E1_C12_1","pmm_StartTo_v4_E2_C19"))
# Current code:
sub((.*{1,}[0-9]*).*","",grep("Start",names(df),value = TRUE)
How about something like this using gsub().
stripcol <- function(x) {
gsub("(.*Start.*)_\\d+$", "\\1", as.character(x))
}
dfnew <- dfbefore
dfnew[] <- lapply(dfbefore, stripcol)
We use the regular expression to look for "Start" and then grab everything but the underscore number at the end. We use lapply to apply the function to all columns.
doit <- function(x){
x <- as.character(x)
if(grepl("Start",x)){
x <- gsub("_([0-9])","",x)
}
return(x)
}
apply(dfbefore,c(1,2),doit)
a b
[1,] "pmm_StartTimev4_E2_C19" "pmm_StartTo_v4_E2_C19"
[2,] "pmm_StartTimev4_E2_E2_C1" "complete_E1_C12_1"
[3,] "delivery_C1_C12" "pmm_StartTo_v4_E2_C19"
We can use sub to capture groups where the 'Start' substring is also present followed by an underscore and one or more numbers. In the replacement, use the backreference of the captured group. As there are multiple columns, use lapply to loop over the columns, apply the sub and assign the output back to the original data
out <- dfbefore
out[] <- lapply(dfbefore, sub,
pattern = "^(.*_Start.*)_\\d+$", replacement ="\\1")
out
dfafter[] <- lapply(dfafter, as.character)
all.equal(out, dfafter, check.attributes = FALSE)
#[1] TRUE
I have an issue about replacing strings with the new ones conditionally.
I put short version of my real problem so far its working however I need a better solution since there are many rows in the real data.
strings <- c("ca_A33","cb_A32","cc_A31","cd_A30")
Basicly I want to replace strings with replace_strings. First item in the strings replaced with the first item in the replace_strings.
replace_strings <- c("A1","A2","A3","A4")
So the final string should look like
final string <- c("ca_A1","cb_A2","cc_A3","cd_A4")
I write some simple function assign_new
assign_new <- function(x){
ifelse(grepl("A33",x),gsub("A33","A1",x),
ifelse(grepl("A32",x),gsub("A32","A2",x),
ifelse(grepl("A31",x),gsub("A31","A3",x),
ifelse(grepl("A30",x),gsub("A30","A4",x),x))))
}
assign_new(strings)
[1] "ca_A1" "cb_A2" "cc_A3" "cd_A4"
Ok it seems we have solution. But lets say if I have A1000 to A1 and want to replace them from A1 to A1000 I need to do 1000 of rows of ifelse statement. How can we tackle that?
If your vectors are ordered to be matched, then you can use:
> paste0(gsub("(.*_)(.*)","\\1", strings ), replace_strings)
[1] "ca_A1" "cb_A2" "cc_A3" "cd_A4"
You can use regmatches.First obtain all the characters that are followed by _ using regexpr then replace as shown below
`regmatches<-`(strings,regexpr("(?<=_).*",strings,perl = T),value=replace_strings)
[1] "ca_A1" "cb_A2" "cc_A3" "cd_A4"
Not the fastests but very tractable and easy to maintain:
for (i in 1:length(strings)) {
strings[i] <- gsub("\\d+$", i, strings[i])
}
"\\d+$" just matches any number at the end of the string.
EDIT: Per #Onyambu's comment, removing map2_chr as paste is a vectorized function.
foo <- function(x, y){
x <- unlist(lapply(strsplit(x, "_"), '[', 1))
paste(x, y, sep = "_"))
}
foo(strings, replace_strings)
with x being strings and y being replace_strings. You first split the strings object at the _ character, and paste with the respective replace_strings object.
EDIT:
For objects where there is no positional relationship you could create a reference table (dataframe, list, etc.) and match your values.
reference_tbl <- data.frame(strings, replace_strings)
foo <- function(x){
y <- reference_tbl$replace_strings[match(x, reference_tbl$strings)]
x <- unlist(lapply(strsplit(x, "_"), '[', 1))
paste(x, y, sep = "_")
}
foo(strings)
Using the dplyr package:
strings <- c("ca_A33","cb_A32","cc_A31","cd_A30")
replace_strings <- c("A1","A2","A3","A4")
df <- data.frame(strings, replace_strings)
df <- mutate(rowwise(df),
strings = gsub("_.*",
paste0("_", replace_strings),
strings)
)
df <- select(df, strings)
Output:
# A tibble: 4 x 1
strings
<chr>
1 ca_A1
2 cb_A2
3 cc_A3
4 cd_A4
yet another way:
mapply(function(x,y) gsub("(\\w\\w_).*",paste0("\\1",y),x),strings,replace_strings,USE.NAMES=FALSE)
# [1] "ca_A1" "cb_A2" "cc_A3" "cd_A4"
imported tibble from textfile. Many numeric columns are imported as "chr". I guess it's because they contain a "," instead of a ".".
My goal is to write a loop which runs through the names of desired columns, replaces "," with "." and converts columns into "num".
Little example:
data <- data.frame("A1" =c("2,1","2,1","2,1"), "A2" =c("1,3","1,3","1,3"),
stringsAsFactors = F) %>% as.tibble() #example data
colname <- c("A1", "A2") #creating variable for loop
for(i in colname) {
nam <- paste0("data$", i)
assign(nam, as.numeric(gsub(",",".", eval(parse(text = paste0("data$",i))))) )
}
Instead of overwriting the existing column, R creates a new variable:
data$A1 # that's the existing column as part of the tibble
[1] "2,1" "2,1" "2,1"
`data$A1` # thats just a new variable. mind the little``
[1] 2.1 2.1 2.1
I also tried to assign (<-) the new numeric values via eval, but that does not work either.
eval(parse(text = paste0("data$", i))) <- as.numeric(
gsub(",",".", eval(parse(text = paste0("data$",i)))))
Error: target of assignment expands to non-language object
Any suggestions on how to transform? I have the same issue with other columns that I want to aggregate to a new variable. This variable should also be part of the existing tibble. I could do it by hand. This would take lots of time and probably produce many mistakes.
Thanks a lot!
Sam
As you are already working with the tidyverse, you can use dplyr::mutate_at and the colname variable you have already defined.
data %>%
mutate_at(.vars = colname,
.funs = function(x) { as.numeric(gsub(",", ".", x)) })