I can construct a data.frame like this -
data.frame('a_1' = 3)
However, I want to make the column name a_1 a variable. So I tried this -
data.frame(get(paste("a", 1, sep = "_")) = 3)
With this I get below error -
Error: unexpected '=' in "data.frame(get(paste("a", 1, sep = "_")) ="
Can you please help me understand the right approach to building column names from variables?
Thanks for your pointer.
We can use tibble with := to do this
library(stringr)
library(tibble)
tibble(!! str_c("a", "_", 1) := 3)
-output
# A tibble: 1 x 1
a_1
<dbl>
1 3
In base R, this can be done using setNames
df1 <- setNames(data.frame(3), paste0("a", "_", 1))
-output
df1
a_1
1 3
Or if it is only for a specific number of columns, create the dataset, and use names
df1 <- data.frame(3)
names(df1)[1] <- paste0("a_", 1)
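The original attempt fails because the left-hand side of = inside a function call has to be a literal name, not an expression, so the name has to be attached afterwards. As a minimal sketch (the vectors cols and vals are made up for illustration), the same setNames idea extends to several computed column names at once:

```r
# Hypothetical example: three column names built from an index vector
cols <- paste("a", 1:3, sep = "_")   # "a_1" "a_2" "a_3"
vals <- list(3, 4, 5)                # one value per column

# Name the list first, then convert it to a data.frame
df_multi <- as.data.frame(setNames(vals, cols))
names(df_multi)
```

Naming the list before the conversion keeps the name construction in ordinary R code, where paste and friends can be used freely.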
I have not been programming for that long and have now encountered a problem to which I have not yet been able to find a solution.
In my dataframe there is a column that contains several pieces of information. For example, one row looks like this:
sp|O94910|AGRL1_HUMAN
or like this
sp|Q13554|KCC2B_HUMAN;sp|Q13555|KCC2G_HUMAN
Now I want to create a new column containing the identifier (a combination of letters and digits) between the two vertical bars.
For the upper example it would be O94910, for the lower Q13554; Q13555
I have already tried functions like str_extract_all, str_match or gsub. But nothing worked.
The "id" is the column I look at. It includes different combinations of digits. I need the one between the two |
> dput(head(anaDiff_PD_vs_CTRL$id, 10))
c("sp|O94910|AGRL1_HUMAN", "sp|P02763|A1AG1_HUMAN", "sp|P19652|A1AG2_HUMAN",
"sp|P25311|ZA2G_HUMAN", "sp|Q8NFZ8|CADM4_HUMAN", "sp|P08174|DAF_HUMAN",
"sp|Q15262|PTPRK_HUMAN", "sp|P78324|SHPS1_HUMAN;sp|Q5TFQ8|SIRBL_HUMAN;sp|Q9P1W8|SIRPG_HUMAN",
"sp|Q8N3J6|CADM2_HUMAN", "sp|P19021|AMD_HUMAN")
With dplyr and stringr you can try...
library(dplyr)
library(stringr)
dat %>%
rowwise() %>%
mutate(dig = str_extract_all(col, "(?<=sp\\|)[A-Z0-9]+(?=\\|)"),
dig = paste0(dig, collapse = "; "))
#> # A tibble: 4 x 2
#> # Rowwise:
#> col dig
#> <chr> <chr>
#> 1 sp|Q8NFZ8|CADM4_HUMAN Q8NFZ8
#> 2 sp|94910|AGRL1_HUMAN 94910
#> 3 sp|O94910|AGRL1_HUMAN O94910
#> 4 sp|Q13554|KCC2B_HUMAN;sp|Q13555|KCC2G_HUMAN Q13554; Q13555
data
dat <- data.frame(col = c("sp|Q8NFZ8|CADM4_HUMAN", "sp|94910|AGRL1_HUMAN", "sp|O94910|AGRL1_HUMAN", "sp|Q13554|KCC2B_HUMAN;sp|Q13555|KCC2G_HUMAN"))
Created on 2022-02-02 by the reprex package (v2.0.1)
Here is a solution without tidyverse:
dat <- read.table(text = "
sp|Q8NFZ8|CADM4_HUMAN
sp|94910|AGRL1_HUMAN
sp|O94910|AGRL1_HUMAN
sp|Q13554|KCC2B_HUMAN;sp|Q13555|KCC2G_HUMAN")
ids <- strsplit(dat$V1, ";")
ids <- lapply(ids, function(x) gsub("sp\\|([[:alnum:]]*)\\|.*", "\\1", x))
ids <- lapply(ids, function(x) paste(x, collapse="; "))
dat$newcol <- unlist(ids)
Even with tidyverse, I would define a helper function for more clarity:
library(dplyr)  # for %>% and mutate()
library(purrr)  # for map()
extract_ids <- function(x) {
ids <- strsplit(x, ";")
ids <- map(ids, ~ gsub("sp\\|([[:alnum:]]*)\\|.*", "\\1", .))
ids <- map(ids, ~ paste(., collapse="; "))
unlist(ids)
}
dat <- dat %>% mutate(ids = extract_ids(V1))
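For comparison, the lookaround pattern from the stringr answer can also be used in base R through gregexpr and regmatches with perl = TRUE. This is only a sketch; the vector id below is a shortened, made-up stand-in for the real id column:

```r
# Shortened stand-in for the id column (assumption for illustration)
id <- c("sp|O94910|AGRL1_HUMAN",
        "sp|Q13554|KCC2B_HUMAN;sp|Q13555|KCC2G_HUMAN")

# Find every run of capital letters/digits that sits between "sp|" and "|"
m <- gregexpr("(?<=sp\\|)[A-Z0-9]+(?=\\|)", id, perl = TRUE)

# regmatches() returns one character vector per input string;
# collapse each into a single "; "-separated string
acc <- vapply(regmatches(id, m), paste, character(1), collapse = "; ")
acc
```

Because vapply returns a plain character vector of the same length as id, the result can be assigned directly as a new column.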
This solution should help if you want to change your column names in a similar fashion:
library(tidyverse)
# create test data frame with column names "sp|O94910|AGRL1_HUMAN" and "sp|Q13554|KCC2B_HUMAN;sp|Q13555|KCC2G_HUMAN"
col1 <- c(1,2,3,4,5)
col2 <- c(6,7,8,9,10)
df <- data.frame(col1, col2)
names(df)[1] <- "sp|O94910|AGRL1_HUMAN"
names(df)[2] <- "sp|Q13554|KCC2B_HUMAN;sp|Q13555|KCC2G_HUMAN"
names <- as.data.frame((str_split(colnames(df), "\\|", simplify = TRUE))) # split the column names at "|" into a data frame with one piece per column
# remove all strings that contain less digits than letters or special characters
for(i in 1:nrow(names)) {
for(j in 1:ncol(names)){
if ( (str_count(as.vector(str_split(names[i,j], "\\|", simplify = TRUE)), "[0-9]") >
str_count(as.vector(str_split(names[i,j], "\\|", simplify = TRUE)), "[:alpha:]|[:punct:]") )){
names[i,j] <- names[i,j]
} else {
names[i,j] <- ""
}
}
}
# combine the columns into a single column called "colnames"
names <- names %>% unite("colnames", 1:5, na.rm = TRUE, remove = TRUE, sep = ";")
# remove all ";" separators at the start and the end of the strings, and collapse every run of ";" into a single ";"
for (i in 1:nrow(names)){
names[i,] <- str_replace(names[i,],"\\;+$", "") %>%
str_replace("^\\;+", "") %>%
str_replace_all("\\;{2,}", ";")
}
# convert column with new names into a vector
new_names <- as.vector(names$colnames)
# replace old names with new names
names(df) <- new_names
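If the names always follow the sp|ID|NAME layout shown in the question, a single gsub on names() may be enough. The following is a sketch under that assumption (df_demo is a made-up two-column data frame): each sp|...|... entry is replaced by its captured middle part, and the ";" between entries is left untouched.

```r
# Made-up data frame with the two example column names
df_demo <- data.frame(1:5, 6:10)
names(df_demo) <- c("sp|O94910|AGRL1_HUMAN",
                    "sp|Q13554|KCC2B_HUMAN;sp|Q13555|KCC2G_HUMAN")

# "sp|", then the captured ID, then "|" and everything up to the next ";"
names(df_demo) <- gsub("sp\\|([[:alnum:]]+)\\|[^;]*", "\\1", names(df_demo))
names(df_demo)
```

Unlike the digit-counting loop, this relies on the position of the ID rather than on its content, so it would also keep IDs that contain more letters than digits.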
I have a data frame df with 7 columns and a vector z containing multiple strings.
I want a data frame containing only the columns of df whose names contain a string from z.
df <- data.frame("a_means","b_means","c_means","d_means","e_mean","f_means","g_means")
z <- c("a_m","c_m","f_m")
How do I get the column number of the z strings in df? Or how do I get a dataframe with only the columns which contains the z strings.
What I want is:
print(df)
"a_means" "c_m" "f_m"
What I tried:
match(a, names(df)
and
df[,which(colnames(df) %in% colnames(df[ ,grepl(z,names(df)])]
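You can use the following, assuming the column names themselves are a_means, b_means and so on, so that their first three characters can be compared with z: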
df[,match(z, substring(colnames(df), 1, 3))]
With base R:
z <- paste(z, collapse = "|")
df[, grepl(z, names(df))] # you could use grep as well
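Since the question also asks for the column numbers, here is a small base R sketch that returns both. It assumes a data frame whose column names are a_means, b_means, and so on (the values are made up), and it uses startsWith so that the strings in z are treated literally rather than as regular expressions:

```r
# Made-up data frame with properly named columns (assumption)
df_named <- data.frame(a_means = 1, b_means = 2, c_means = 3, d_means = 4,
                       e_means = 5, f_means = 6, g_means = 7)
z <- c("a_m", "c_m", "f_m")

# TRUE for every column whose name starts with any of the strings in z
keep <- vapply(names(df_named), function(nm) any(startsWith(nm, z)), logical(1))

col_idx <- unname(which(keep))          # column numbers
subset_df <- df_named[, keep, drop = FALSE]  # matching columns only
col_idx
names(subset_df)
```

drop = FALSE keeps the result a data frame even when only one column matches.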
Combine the search patterns and use that as a pattern for stringr::str_detect() function.
library(dplyr)
library(stringr)
df <- data.frame(a_means = "a_means",
b_means = "b_means",
c_means = "c_means",
d_means = "d_means",
e_means = "e_means",
f_means = "f_means",
g_means = "g_means"
)
z <- c("a_m","c_m","f_m")
z <- paste(z, collapse = "|")
df %>% select_if(str_detect(names(df), z))
#> a_means c_means f_means
#> 1 a_means c_means f_means
You can simply do this:
library(dplyr)
df %>%
select(contains(z))
Check out help("starts_with"). You can also match to a starting prefix with starts_with() among other things.
You can use select and matches to subset the columns based on z
library(dplyr)
df <- data.frame("a_means","b_means","c_means","d_means","e_mean","f_means","g_means")
z <- c("a_m","c_m","f_m")
df %>%
select(matches(z))
#> X.a_means. X.c_means. X.f_means.
#> 1 a_means c_means f_means
I have below data
df<- data_frame(State= c('CA', 'IN', 'CHI'),
Age= c(46,29,32),
Status= c('Employed', '', 'Employed')
)
In the end, I want to create data that looks like this:
df<- data_frame(col1= c('State-CA', 'State-IN', 'State-CHI'),
col2= c('Age-46','Age-29','Age-32'),
col3= c('Status-Employed', '', 'Status-Employed')
)
Connecting the name of a column and its value with a dash. If a value is missing, the column name shouldn't connect to the value of the table. Could anyone help? Thanks in advance!
With imap, it is a single step. A data.frame is a named list of equal-length columns, so imap loops over the columns and passes each column's values as .x and its name as .y to the anonymous function (~). Empty or missing values are returned unchanged; otherwise the name and the value are pasted together with str_c
library(dplyr) # for case_when
library(purrr)
library(stringr)
imap_dfc(df, ~ case_when(.x ==""|is.na(.x) ~ as.character(.x), TRUE ~ str_c(.y, .x, sep='-')))
# A tibble: 3 x 3
# State Age Status
# <chr> <chr> <chr>
#1 State-CA Age-46 Status-Employed
#2 State-IN Age-29 ""
#3 State-CHI Age-32 Status-Employed
In base R
df[] <- Map(function(x, y) ifelse(x=="", x, paste(y, x, sep="-")),df, names(df))
I think what you are looking for has been answered on this thread - Insert Column Name into its Value using R. Hope you find this helpful!
Also, this code should work for you -
col_names <- names(df)
for (c in col_names) {
df[[c]] <- ifelse(df[[c]] != "", paste(c, df[[c]], sep = "-"), "")
}
df
Output -
State Age Status
1 State-CA Age-46 Status-Employed
2 State-IN Age-29
3 State-CHI Age-32 Status-Employed
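The base R approaches above leave NA values to chance. As a hedged sketch, the helper below (prefix_values is a name introduced here, not part of the answers) treats both NA and "" as missing and leaves them unchanged, using a plain data.frame version of the example data:

```r
# Plain data.frame version of the example data
df_in <- data.frame(State  = c("CA", "IN", "CHI"),
                    Age    = c(46, 29, 32),
                    Status = c("Employed", "", "Employed"))

# Hypothetical helper: prefix non-missing values with the column name
prefix_values <- function(x, nm) {
  x <- as.character(x)
  blank <- is.na(x) | x == ""
  x[!blank] <- paste(nm, x[!blank], sep = "-")
  x
}

out <- df_in
# Map passes each column together with its name to the helper
out[] <- Map(prefix_values, df_in, names(df_in))
out
```

Assigning into out[] keeps the data frame structure and the original column names.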
I need to prepare a certain dataset for analysis. What I have is a table with column names (obviously). The column names are as follows (sample colnames):
"X99_NORM", "X101_NORM", "X76_110_T02_09747", "X30_NORM"
(this is a vector, for those not familiar with R's colnames() function)
Now, what I want is simply to flip the values in front of, and after the underscore. e.g. X99_NORM becomes NORM_X99. Note that I want this only for the column names which contain NORM in their name.
Some other base R options
1)
Use sub to switch the beginning and end - we can make use of capturing groups here.
x <- sub(pattern = "(^X\\d+)_(NORM$)", replacement = "\\2_\\1", x = x)
Result
x
# [1] "NORM_X99" "NORM_X101" "X76_110_T02_09747" "NORM_X30"
2)
A regex-free approach that might be more efficient using chartr, dirname and paste. But we need to get the indices of the columns that contain "NORM" first
idx <- grep(x = x, pattern = "NORM", fixed = TRUE)
x[idx] <- paste0("NORM_", dirname(chartr("_", "/", x[idx])))
x
data
x <- c("X99_NORM", "X101_NORM", "X76_110_T02_09747", "X30_NORM")
x = c("X99_NORM", "X101_NORM", "X76_110_T02_09747", "X30_NORM")
replace(x,
grepl("NORM", x),
sapply(strsplit(x[grepl("NORM", x)], "_"), function(x){
paste(rev(x), collapse = "_")
}))
#[1] "NORM_X99" "NORM_X101" "X76_110_T02_09747" "NORM_X30"
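The answers above work on a character vector x. To rename the columns of an actual table, the same substitution can be applied to names() directly. A minimal sketch, assuming a made-up data frame dat_norm with the sample column names, and a slightly more general pattern that accepts any underscore-free prefix before _NORM:

```r
# Made-up data frame carrying the sample column names
dat_norm <- data.frame(X99_NORM = 1, X101_NORM = 2,
                       X76_110_T02_09747 = 3, X30_NORM = 4)

# Swap "<prefix>_NORM" to "NORM_<prefix>"; other names do not match and stay as they are
names(dat_norm) <- sub("^([^_]+)_(NORM)$", "\\2_\\1", names(dat_norm))
names(dat_norm)
```

Because sub returns non-matching strings unchanged, no separate index of the NORM columns is needed.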
A tidyverse solution with stringr:
library(tidyverse)
library(stringr)
my_data <- tibble(column = c("X99_NORM", "X101_NORM", "X76_110_T02_09747", "X30_NORM"))
my_data %>%
filter(str_detect(column, "NORM")) %>%
mutate(column_2 = paste0("NORM", "_", str_extract(column, ".+(?=_)"))) %>%
select(column_2)
# A tibble: 3 x 1
column_2
<chr>
1 NORM_X99
2 NORM_X101
3 NORM_X30
Noob here to R. Trying to figure something out. I need to build a function that adds a new column to the beginning of a dataset. This new column is a concatenation of the values in other columns that the user specifies.
Imagine this is the data set named myDataSet:
col_1 col_2 col_3 col_4
bat red 1 a
cow orange 2 b
dog green 3 c
The user could use the function like so:
addPrimaryKey(myDataSet, cols=c(1,3,4))
to get the result of a new data set with columns 1, 3 and 4 concatenated into a column called ID and added to the beginning, like so:
ID col_1 col_2 col_3 col_4
bat1a bat red 1 a
cow2b cow orange 2 b
dog3c dog green 3 c
This is the script I have been working on but I have been staring at it so long, I think I have made a few mistakes. I can't figure out how to get the column numbers from the arguments into the paste function properly.
addPrimaryKey <- function(df, cols=NULL){
newVector = rep(NA, length(cols)) ##initialize vector to length of columns
colsN <- as.numeric(cols)
df <- cbind(ID=paste(
for(i in 1:length(colsN)){
holder <- df[colsN[i]]
holder
}
, sep=""), df) ##concatenate the selected columns and add as ID column to df
df
}
Any help would be greatly appreciated. Thanks so much
paste0 works fine, with some help from do.call:
do.call(paste0, mydf[c(1, 3, 4)])
# [1] "bat1a" "cow2b" "dog3c"
Your function, thus, can be something like:
addPrimaryKey <- function(inDF, cols) {
cbind(ID = do.call(paste0, inDF[cols]),
inDF)
}
You may also want to look at interaction:
interaction(mydf[c(1, 3, 4)], drop=TRUE)
# [1] bat.1.a cow.2.b dog.3.c
# Levels: bat.1.a cow.2.b dog.3.c
This should do the trick
addPrimaryKey <-function(df, cols){
q<-apply(df[,cols], 1, function(x) paste(x, collapse=""))
df<-cbind(ID = q, df)
return(df)
}
Just add in some conditional logic for your nulls
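One possible way to handle the NULL default, sketched here on top of the do.call approach from the earlier answer. Two things are assumptions of this sketch rather than part of the question: cols = NULL is taken to mean "use all columns", and the sep argument is an extra added for illustration.

```r
addPrimaryKey <- function(df, cols = NULL, sep = "") {
  # Assumption: no columns specified means all columns form the key
  if (is.null(cols)) cols <- seq_along(df)
  # unname() so that column names cannot clash with paste()'s own arguments
  key <- do.call(paste, c(unname(as.list(df[cols])), sep = sep))
  cbind(ID = key, df)
}

# The example data from the question
myDataSet <- data.frame(col_1 = c("bat", "cow", "dog"),
                        col_2 = c("red", "orange", "green"),
                        col_3 = 1:3,
                        col_4 = c("a", "b", "c"))

res <- addPrimaryKey(myDataSet, cols = c(1, 3, 4))
res
```

Unlike the apply version, paste works column by column here, so numeric columns are not first converted to a padded character matrix.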
Two other options for combining columns are dplyr::mutate() and tidyr::unite():
library(dplyr)
df %>%
mutate(new_col = paste0(col1, col3, col4)) %>%
select(new_col, everything()) # to order the column names with the new column first
library(tidyr)
df %>%
unite(new_col, c(col1, col3, col4), sep = '', remove = FALSE)
The default in tidyr::unite() is remove = TRUE, which drops the columns that were combined and keeps the new column along with any columns that were not part of the unite() call.