How to set column names in R by repeating character? [duplicate] - r

Suppose I want to name the columns of a data frame L1, L2, ..., up to L200. How could I do this?
I tried colnames(df) <- c('L1':'L200'), but this does not work (returns error message NAs introduced by coercion), even though there are 200 columns.
Help on this appreciated!

We can use paste0:
colnames(df) <- paste0("L", 1:200)
or, so that it adapts to the number of columns automatically:
colnames(df) <- paste0("L", seq_along(df))
NOTE: The range operator (:) works on numbers, not on character strings. 'L1' is a string that cannot be converted to a number, so 'L1':'L200' fails with "NAs introduced by coercion", while 1:200 gives the integers from 1 to 200.

Here is another solution:
colnames(df) <- sprintf("L%d", 1:200)
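As a quick check that the three approaches above produce the same names, here is a minimal sketch. The data frame df is built here only for illustration, and the variable names nm_paste, nm_auto and nm_sprintf are made up:

```r
# Hypothetical data frame with 200 columns, created only for this example
df <- as.data.frame(matrix(0, nrow = 2, ncol = 200))

nm_paste   <- paste0("L", 1:200)
nm_auto    <- paste0("L", seq_along(df))  # seq_along(df) runs over the columns
nm_sprintf <- sprintf("L%d", 1:200)

colnames(df) <- nm_auto
head(colnames(df), 3)
```

The seq_along(df) version is probably the safest of the three, since it keeps working if the number of columns changes.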

Related

Rename column names by pattern [duplicate]

I want to rename my columns because the names are too long, for example:
chrX:99883666-99894988_TSPAN6_ENSG00000000003.10 to TSPAN6
chrX:99839798-99854882_TNMD_ENSG00000000005.5 to TNMD
chr20:49505584-49575092_DPM1_ENSG00000000419.8 to DPM1
How can I rename them, considering that the parts I want to delete differ for every column?
Using strsplit, we can split each name on the underscore and keep the second piece:
names(df) <- sapply(strsplit(names(df), "_"), `[`, 2)
If you only want to target a certain subset of names, then simply filter names(df) using that logic.
You can do that using regex. How about extracting the word between the two underscores?
x <- c("chrX:99883666-99894988_TSPAN6_ENSG00000000003.10",
"chrX:99839798-99854882_TNMD_ENSG00000000005.5",
"chr20:49505584-49575092_DPM1_ENSG00000000419.8")
sub('.*?_(\\w+)_.*', '\\1', x)
#[1] "TSPAN6" "TNMD" "DPM1"
For the column names, use names(df) instead of x.
names(df) <- sub('.*?_(\\w+)_.*', '\\1', names(df))
And if you prefer dplyr:
library(dplyr)
df <- df %>% rename_with(~sub('.*?_(\\w+)_.*', '\\1', .))
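To compare the split-based and regex-based approaches side by side, here is a small sketch on the three example names from the question. The variable names by_split and by_regex are made up for illustration:

```r
x <- c("chrX:99883666-99894988_TSPAN6_ENSG00000000003.10",
       "chrX:99839798-99854882_TNMD_ENSG00000000005.5",
       "chr20:49505584-49575092_DPM1_ENSG00000000419.8")

# Split every name on "_" and keep the second piece of each one
by_split <- vapply(strsplit(x, "_", fixed = TRUE),
                   function(parts) parts[2], character(1))

# Regex approach: capture the word between the two underscores
by_regex <- sub(".*?_(\\w+)_.*", "\\1", x)

by_split
by_regex
```

Both should agree as long as every name has the form prefix_GENE_suffix. Note that \\w does not match a hyphen, so the regex would need adjusting for gene names that contain one.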

Removing sampled values from a character vector [duplicate]

I'm quite new to R, but I'm trying to make a "randomizer" of sorts.
I have a vector
names <- c('Name1', 'Name2', [...], 'Name13')
I then sample 6 names from the vector to another vector
name_sample_1 <- sample(names, 6)
What I want is to then update the "names" vector with a line of code, rather than doing it manually. I tried running:
names <- names - name_sample_1
But this returned the error 'non-numeric argument to binary operator'. Any ideas on how to do this effectively?
You can use the handy %in% operator:
names <- paste0("name", 1:20)
sample_names <- sample(names,6)
names_updated <- names[!names %in% sample_names]
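Base R's setdiff() is another way to express the same thing. The sketch below compares it with %in%, assuming 13 placeholder names as in the question (all_names, picked and the seed are made up for illustration):

```r
set.seed(1)  # only so that the illustration is reproducible

all_names <- paste0("Name", 1:13)  # stand-in for the 13 names in the question
picked    <- sample(all_names, 6)

# Keep the names that were not sampled
remaining_in      <- all_names[!all_names %in% picked]
remaining_setdiff <- setdiff(all_names, picked)

remaining_in
```

One difference to keep in mind: setdiff() also drops duplicated values, while the %in% version keeps them, so the two only agree when the names are unique.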

Split column names in R using a separator [duplicate]

I have a dataframe X with column names such as
1_abc,
2_fgy,
27_msl,
936_hhq,
3_hdv
I want to just keep the numbers as the column name (so instead of 1_abc, just 1). How do I go about removing it while keeping the rest of the data intact?
All column names have an underscore as the separator between the numeric and character parts. There are about 400 columns, so I want to be able to code this without using specific column names.
You may use sub here for a base R option:
names(df) <- sub("^(\\d+).*$", "\\1", names(df))
Another option might be:
names(df) <- sub("_.*", "", names(df))
This would just strip off everything from the first underscore until the end of the column name.
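A small sketch of both sub() calls on a made-up three-column data frame (the data and the names digits_only and strip_suffix are assumptions for illustration):

```r
# Hypothetical data frame using a few of the column names from the question
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(df) <- c("1_abc", "27_msl", "936_hhq")

digits_only  <- sub("^(\\d+).*$", "\\1", names(df))  # keep the leading digits
strip_suffix <- sub("_.*", "", names(df))            # drop "_" and what follows

names(df) <- digits_only
df[["27"]]  # numeric-looking names need [[ ]] or backticks, e.g. df$`27`
```

Only the names change; the data in each column is untouched.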

How to point an R function to a particular column of a dataset? [duplicate]

I have a dataset called df, which has columns a and b with three integers each. I want to write a function for the mean (obviously this already exists; I want to write a larger function and this appears to be where problems are occurring). However, this function returns NA:
mean_function <- function(x) {
mean(df$x)
}
mean_function(a) returns NA, while mean(df$a) returns 2. Is there something I'm missing about how R functions handle datasets, or another problem?
We need [[ instead of $, because df$x looks for a column literally named x rather than using the value of x. With [[, pass the column name as a string:
mean_function <- function(x) {mean(df[[x]])}
mean_function("a")
If we need to pass an unquoted column name, capture it with substitute and convert it to a character string with deparse:
mean_function <- function(x) {
x <- deparse(substitute(x))
mean(df[[x]])
}
mean_function(a)
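The functions above rely on df existing in the calling environment. A variant that takes the data frame as an argument is sketched below. The function names mean_by_name and mean_unquoted, and the values in df, are made up for illustration:

```r
# Hypothetical data; column a has mean 2 as in the question
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

# Column name passed as a string
mean_by_name <- function(data, col) {
  mean(data[[col]])
}

# Column name passed unquoted and converted to a string inside the function
mean_unquoted <- function(data, col) {
  col <- deparse(substitute(col))
  mean(data[[col]])
}

mean_by_name(df, "a")
mean_unquoted(df, b)
```

Passing the data frame explicitly tends to make a larger function easier to test and reuse.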

How to convert all percentage data in R to decimal? [duplicate]

I have a large data set containing both numerical and categorical data.
A number of columns contain percentage data, e.g. "26.2%". As R does not recognise these as numbers, I wish to convert them to decimals.
I have tried:
data2 <- as.numeric(sub("%", "",data,fixed=TRUE))/100
However:
Warning message:
NAs introduced by coercion
Can someone please help with the correct approach and/or syntax?
If your data is a data frame, you cannot pass the whole thing to sub, which operates on a character vector.
Use the same expression, but column by column, e.g.
data$column1 <- as.numeric(sub("%", "", data$column1, fixed = TRUE))/100
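You could try the following (mutate_each and funs are deprecated in current dplyr, so this uses across instead):
library(dplyr)
df %>% mutate(across(everything(), ~ as.numeric(sub("%", "", .x, fixed = TRUE))/100))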
To apply this to all columns, you can combine the code provided by the other users with an apply statement. For example,
apply(d, 2, function(x) {
as.numeric(sub("%", "", x, fixed = TRUE))/100
})
where d is your data frame. Note that apply returns a matrix, not a data frame.
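Since the question mentions a mix of numerical and categorical columns, converting every column would turn the categorical ones into NA. Below is a minimal sketch that converts only the character columns whose values all end in %. The data frame d and its column names (group, rate, count) are made up for illustration:

```r
# Hypothetical mixed data frame
d <- data.frame(
  group = c("x", "y"),
  rate  = c("26.2%", "5%"),
  count = c(10, 20),
  stringsAsFactors = FALSE
)

# Flag character columns in which every non-missing value ends in "%"
is_pct <- vapply(d, function(col) {
  is.character(col) && all(grepl("%$", col[!is.na(col)]))
}, logical(1))

# Convert only the flagged columns; the others are left untouched
d[is_pct] <- lapply(d[is_pct], function(col) {
  as.numeric(sub("%", "", col, fixed = TRUE)) / 100
})

str(d)
```

Assigning through d[is_pct] keeps d a data frame, unlike apply, which returns a matrix.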
