I have the data below:
df<- data_frame(State= c('CA', 'IN', 'CHI'),
Age= c(46,29,32),
Status= c('Employed', '', 'Employed')
)
In the end, I want to create data that looks like this:
df<- data_frame(col1= c('State-CA', 'State-IN', 'State-CHI'),
col2= c('Age-46','Age-29','Age-32'),
col3= c('Status-Employed', '', 'Status-Employed')
)
I want to connect each column name to its value with a dash. If a value is missing, the column name should not be attached and the cell should stay empty. Could anyone help? Thanks in advance!
With imap this is a single step. A data.frame is a named list of equal-length columns, so imap loops over that list. Inside the anonymous function (~), .x holds the column values and .y holds the column name, and str_c pastes the two together. case_when comes from dplyr, so that package has to be loaded as well.
library(dplyr)
library(purrr)
library(stringr)
imap_dfc(df, ~ case_when(.x ==""|is.na(.x) ~ as.character(.x), TRUE ~ str_c(.y, .x, sep='-')))
# A tibble: 3 x 3
# State Age Status
# <chr> <chr> <chr>
#1 State-CA Age-46 Status-Employed
#2 State-IN Age-29 ""
#3 State-CHI Age-32 Status-Employed
In base R, with the column name pasted in front of the value:
df[] <- Map(function(x, y) ifelse(x == "", x, paste(y, x, sep = "-")), df, names(df))
I think this has already been answered in the thread "Insert Column Name into its Value using R". Hope you find it helpful!
This code should also work for you:
col_names <- names(df)
for (c in col_names) {
df[[c]] <- ifelse(df[[c]] != "", paste(c, df[[c]], sep = "-"), "")
}
df
Output -
State Age Status
1 State-CA Age-46 Status-Employed
2 State-IN Age-29
3 State-CHI Age-32 Status-Employed
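The answers above test only for empty strings, while the question also mentions missing values. Here is a minimal base R sketch that treats both NA and "" as missing. The NA in Age and the helper name prefix_with_name are my own assumptions for illustration, not part of the original data or answers.

```r
# Sample data from the question, with one NA added (an assumption for the demo)
df <- data.frame(
  State  = c("CA", "IN", "CHI"),
  Age    = c(46, 29, NA),
  Status = c("Employed", "", "Employed"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: prefix each value with its column name,
# but return "" where the value is NA or an empty string
prefix_with_name <- function(values, col_name) {
  is_missing <- is.na(values) | values == ""
  ifelse(is_missing, "", paste(col_name, values, sep = "-"))
}

# Map() walks the columns and their names in parallel;
# df[] <- keeps the data.frame structure while replacing every column
df[] <- Map(prefix_with_name, df, names(df))
df
```

Whether a missing cell should become "" or stay NA depends on what you do with the result next; swapping "" for NA_character_ in the helper keeps it as NA.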
Related
I can construct a data.frame like this -
data.frame('a_1' = 3)
However, I want to build the column name a_1 from a variable. So I tried this -
data.frame(get(paste("a", 1, sep = "_")) = 3)
With this I get the error below -
Error: unexpected '=' in "data.frame(get(paste("a", 1, sep = "_")) ="
Can you please help me understand the right way to set column names from variables?
Thanks for your pointer.
We can use tibble with !! and := to do this
library(stringr)
library(tibble)
tibble(!! str_c("a", "_", 1) := 3)
-output
# A tibble: 1 x 1
a_1
<dbl>
1 3
In base R, this can be done using setNames
df1 <- setNames(data.frame(3), paste0("a", "_", 1))
-output
df1
a_1
1 3
Or, if only specific columns need a computed name, create the dataset first and then assign to names
df1 <- data.frame(3)
names(df1)[1] <- paste0("a_", 1)
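The call in the question fails because the left-hand side of = in a function call must be a literal name or string; R does not evaluate an expression there. When several column names are computed at run time, one option is to build a named list first and convert it. This is a small sketch; the suffixes 1:3 and the values are only examples.

```r
# Column names computed at run time (1:3 is just an example)
col_names <- paste0("a_", 1:3)

# Attach the names to a list of column values, then convert to a data.frame
df1 <- as.data.frame(setNames(list(3, 4, 5), col_names))
df1
```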
I need to prepare a certain dataset for analysis. What I have is a table with column names (obviously). The column names are as follows (sample colnames):
"X99_NORM", "X101_NORM", "X76_110_T02_09747", "X30_NORM"
(this is a vector, for those not familiar with R's colnames() function)
Now, what I want is simply to flip the values in front of, and after the underscore. e.g. X99_NORM becomes NORM_X99. Note that I want this only for the column names which contain NORM in their name.
Some other base R options
1)
Use sub to switch the beginning and end - we can make use of capturing groups here.
x <- sub(pattern = "(^X\\d+)_(NORM$)", replacement = "\\2_\\1", x = x)
Result
x
# [1] "NORM_X99" "NORM_X101" "X76_110_T02_09747" "NORM_X30"
2)
A regex-free approach that might be more efficient using chartr, dirname and paste. But we need to get the indices of the columns that contain "NORM" first
idx <- grep(x = x, pattern = "NORM", fixed = TRUE)
x[idx] <- paste0("NORM_", dirname(chartr("_", "/", x[idx])))
x
data
x <- c("X99_NORM", "X101_NORM", "X76_110_T02_09747", "X30_NORM")
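Since the question is about column names rather than a free-standing vector, the sub() approach from option 1 can be applied to names() directly. A minimal sketch, assuming a data frame called dat (hypothetical, with made-up values):

```r
# Hypothetical data frame carrying the column names from the question
dat <- data.frame(X99_NORM = 1, X101_NORM = 2,
                  X76_110_T02_09747 = 3, X30_NORM = 4)

# Swap the two parts only where the whole name has the form X<digits>_NORM;
# names that do not match the pattern are returned unchanged by sub()
names(dat) <- sub("^(X\\d+)_(NORM)$", "\\2_\\1", names(dat))
names(dat)
```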
x = c("X99_NORM", "X101_NORM", "X76_110_T02_09747", "X30_NORM")
replace(x,
grepl("NORM", x),
sapply(strsplit(x[grepl("NORM", x)], "_"), function(x){
paste(rev(x), collapse = "_")
}))
#[1] "NORM_X99" "NORM_X101" "X76_110_T02_09747" "NORM_X30"
A tidyverse solution with stringr:
library(tidyverse)
library(stringr)
my_data <- tibble(column = c("X99_NORM", "X101_NORM", "X76_110_T02_09747", "X30_NORM"))
my_data %>%
filter(str_detect(column, "NORM")) %>%
mutate(column_2 = paste0("NORM", "_", str_extract(column, ".+(?=_)"))) %>%
select(column_2)
# A tibble: 3 x 1
column_2
<chr>
1 NORM_X99
2 NORM_X101
3 NORM_X30
I am working with some data from external sources which comes with a space in one of its variable names ("Pseudo ID"). I am trying to use purrr::map to change this variable name in all the datasets, but R seems to have problems recognising this variable. I do not want to keep changing variable names one by one... I wonder if anyone can spot the solution?
library(tidyverse)
# Mock data:
set.seed(1)
sampledata<- data.frame(
ID = sample(1:2),
name = sample(letters, 2, replace = TRUE))
colnames(sampledata)[1] <- "Pseudo ID"
# List of mock data:
datalist <- list(sampledata, sampledata)
# Set name in each dataset in the list using map
map(datalist, set_names, nm="PatientID") # BUT HOW CAN I RENAME A SPECIFIC COLUMN - 'Pseudo ID'
You can use map and set_names from purrr here.
set.seed(1)
sampledata<- data.frame(ID = sample(1:2), name = sample(letters, 2, replace = TRUE))
colnames(sampledata)[1] <- "Pseudo ID"
datalist <- list(sampledata, sampledata)
library(purrr)
map(datalist, ~ set_names(.x, nm = replace(
names(.x), names(.x) == "Pseudo ID", "PatientID"
)))
#[[1]]
# PatientID name
#1 1 o
#2 2 x
#[[2]]
# PatientID name
#1 1 o
#2 2 x
If you want to assign different names, use map2
new_id_names <- c("PatientID_1", "PatientID_2")
map2(.x = datalist, .y = new_id_names, ~ set_names(.x, nm = replace(
names(.x), names(.x) == "Pseudo ID", .y
)))
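For comparison, the same rename can be sketched in base R with lapply, in case purrr is not available. The mock data below uses fixed values instead of sample() so that the result is reproducible; that is my simplification, not part of the original answer.

```r
# Mock data with a space in the first column name
sampledata <- data.frame(ID = 1:2, name = c("a", "b"))
colnames(sampledata)[1] <- "Pseudo ID"
datalist <- list(sampledata, sampledata)

# Rename only the column called "Pseudo ID" in every data frame of the list
renamed <- lapply(datalist, function(d) {
  names(d)[names(d) == "Pseudo ID"] <- "PatientID"
  d
})
names(renamed[[1]])
```

Data frames that have no "Pseudo ID" column pass through unchanged, because the logical index then selects nothing.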
I have variables with names such as r1a r3c r5e r7g r9i r11k r13g r15i etc. I am trying to select the variables whose names start with r5 through r12 and put them in a data frame in R.
The best code that I could write to get this done is,
data %>% select(grep("r[5-9][^0-9]" , names(data), value = TRUE ),
grep("r1[0-2]", names(data), value = TRUE))
Given that my experience with regular expressions spans one day, I was wondering if anyone could help me write better and more compact code for this!
Here's a regex that gets all the columns at once:
data %>% select(grep("r([5-9]|1[0-2])", names(data), value = TRUE))
The vertical bar represents an 'or'.
As the comments have pointed out, this also matches names such as r51, and the select call can be shortened with matches(). A slightly longer regex avoids the false matches:
data %>% select(matches("r([5-9]|1[0-2])([^0-9]|$)"))
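The difference between the two patterns is easy to check on a plain character vector. In this sketch the name r51a is added to the sample names as a test case, and the leading ^ anchor in the stricter pattern is my own addition so that the r must be at the start of the name.

```r
# Sample names from the question plus one tricky case (r51a)
x <- c("r1a", "r3c", "r5e", "r7g", "r9i", "r11k", "r13g", "r15i", "r51a")

loose  <- "r([5-9]|1[0-2])"              # first pattern: no check on what follows
strict <- "^r([5-9]|1[0-2])([^0-9]|$)"   # number must be followed by a non-digit or the end

x[grepl(loose, x)]   # also keeps r51a, because "r5" matches
x[grepl(strict, x)]  # keeps only r5e, r7g, r9i, r11k
```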
Suppose that x in the code below stands for names(data). The following keeps every name whose number is 5 or greater; the function in the edit further down also takes an upper limit.
# The names of 'data'
x <- scan(what = character(), text = "r1a r3c r5e r7g r9i r11k r13g r15i")
y <- unlist(strsplit(x, "[[:alpha:]]"))
y <- as.numeric(y[sapply(y, `!=`, "")])
x[y > 4]
#[1] "r5e" "r7g" "r9i" "r11k" "r13g" "r15i"
EDIT.
You can generalize the code above into a function. It takes three arguments: the vector of variable names, and the lower and upper limits of the numbers you want to keep.
var_names <- function(x, from = 1, to = Inf){
y <- unlist(strsplit(x, "[[:alpha:]]"))
y <- as.integer(y[sapply(y, `!=`, "")])
x[from <= y & y <= to]
}
var_names(x, 5)
#[1] "r5e" "r7g" "r9i" "r11k" "r13g" "r15i"
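To get exactly the r5 to r12 range the question asks for, pass both limits. The sketch below repeats the function (with a plain comparison in place of sapply) so that it runs on its own; it assumes every name contains exactly one number.

```r
# Same idea as var_names above, repeated here so the example is self-contained
var_names <- function(x, from = 1, to = Inf) {
  y <- unlist(strsplit(x, "[[:alpha:]]"))  # split on letters, leaving the numbers
  y <- as.integer(y[y != ""])              # drop the empty pieces
  x[from <= y & y <= to]
}

x <- c("r1a", "r3c", "r5e", "r7g", "r9i", "r11k", "r13g", "r15i")
var_names(x, from = 5, to = 12)
```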
Remove the non-digits, scan the remainder in and check whether each is in 5:12 :
DF <- data.frame(r1a=1, r3c=2, r5e=3, r7g=4, r9i=5, r11k=6, r13g=7, r15i=8) # test data
DF[scan(text = gsub("\\D", "", names(DF)), quiet = TRUE) %in% 5:12]
## r5e r7g r9i r11k
## 1 3 4 5 6
Using magrittr it could also be written like this:
library(magrittr)
DF %>% .[scan(text = gsub("\\D", "", names(.)), quiet = TRUE) %in% 5:12]
## r5e r7g r9i r11k
## 1 3 4 5 6
Noob here to R. Trying to figure something out. I need to build a function that adds a new column to the beginning of a dataset. This new column is a concatenation of the values in other columns that the user specifies.
Imagine this is the data set named myDataSet:
col_1 col_2 col_3 col_4
bat red 1 a
cow orange 2 b
dog green 3 c
The user could use the function like so:
addPrimaryKey(myDataSet, cols=c(1,3,4))
to get the result of a new data set with columns 1, 3 and 4 concatenated into a column called ID and added to the beginning, like so:
ID col_1 col_2 col_3 col_4
bat1a bat red 1 a
cow2b cow orange 2 b
dog3c dog green 3 c
This is the script I have been working on but I have been staring at it so long, I think I have made a few mistakes. I can't figure out how to get the column numbers from the arguments into the paste function properly.
addPrimaryKey <- function(df, cols=NULL){
newVector = rep(NA, length(cols)) ##initialize vector to length of columns
colsN <- as.numeric(cols)
df <- cbind(ID=paste(
for(i in 1:length(colsN)){
holder <- df[colsN[i]]
holder
}
, sep=""), df) ##concatenate the selected columns and add as ID column to df
df
}
Any help would be greatly appreciated. Thanks so much
paste0 works fine, with some help from do.call (mydf here stands for the data set from the question):
do.call(paste0, mydf[c(1, 3, 4)])
# [1] "bat1a" "cow2b" "dog3c"
Your function, thus, can be something like:
addPrimaryKey <- function(inDF, cols) {
cbind(ID = do.call(paste0, inDF[cols]),
inDF)
}
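Here is a usage sketch with the data from the question. The data frame is rebuilt by hand, and the as.list/unname step is my own precaution so that column names are never mistaken for named arguments of paste0.

```r
# Data set from the question
myDataSet <- data.frame(
  col_1 = c("bat", "cow", "dog"),
  col_2 = c("red", "orange", "green"),
  col_3 = 1:3,
  col_4 = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Paste the selected columns together row by row and put the key first
addPrimaryKey <- function(inDF, cols) {
  key_parts <- unname(as.list(inDF[cols]))
  cbind(ID = do.call(paste0, key_parts), inDF)
}

result <- addPrimaryKey(myDataSet, cols = c(1, 3, 4))
result
```

cols can hold column positions, as here, or column names such as c("col_1", "col_3", "col_4").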
You may also want to look at interaction:
interaction(mydf[c(1, 3, 4)], drop=TRUE)
# [1] bat.1.a cow.2.b dog.3.c
# Levels: bat.1.a cow.2.b dog.3.c
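This should do the trick. The new column is named ID, and drop = FALSE keeps the selection a data frame even when only one column is chosen:
addPrimaryKey <-function(df, cols){
ID <- apply(df[, cols, drop = FALSE], 1, function(x) paste(x, collapse = ""))
df <- cbind(ID, df)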
return(df)
}
Just add in some conditional logic for your nulls
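If "nulls" here means missing values (NA) in the key columns, one possible way to handle them is to turn each NA into an empty string before pasting. This is only a sketch under that assumption; the function name add_primary_key and the test data are hypothetical.

```r
# Hypothetical variant: NA values contribute nothing to the key
add_primary_key <- function(df, cols) {
  parts <- lapply(df[cols], function(v) {
    v <- as.character(v)
    v[is.na(v)] <- ""   # assumption: a missing value is simply skipped
    v
  })
  cbind(ID = do.call(paste0, unname(parts)), df)
}

dat <- data.frame(
  col_1 = c("bat", NA),
  col_2 = c("red", "orange"),
  col_3 = c(1L, 2L),
  stringsAsFactors = FALSE
)
add_primary_key(dat, c(1, 3))
```

Note that skipping NA can produce duplicate keys, so it is worth checking the ID column with anyDuplicated() afterwards.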
Two other options for combining columns are dplyr::mutate() and tidyr::unite():
library(dplyr)
df %>%
mutate(new_col = paste0(col_1, col_3, col_4)) %>%
select(new_col, everything()) # to order the column names with the new column first
library(tidyr)
df %>%
unite(new_col, c(col_1, col_3, col_4), sep = '', remove = FALSE)
The default in tidyr::unite() is remove = TRUE, which drops the input columns from the data frame and keeps the new column alongside any columns that were not united.