R: Reordering multiple file paths in an object [duplicate]

This question already has answers here:
How to sort a character vector where elements contain letters and numbers?
(6 answers)
Sort columns numerically in R [duplicate]
(2 answers)
Closed 9 months ago.
I have 100 files, named "ABC - Day - 1.csv" through "ABC - Day - 100.csv".
When I read them into R, they are ordered like this: Day1, Day10, Day100, etc. (see figure 1). I know this happens because the names are sorted as character strings (alphabetically), not as numbers. Is there a way to reorder the paths in numerically correct order (Day1, Day2, Day3, ...) without manually renaming my raw files?
Here is what I have so far:
filenames <- list.files(path = "../STEP_ONE/Test_raw",
                        pattern = "ADD_Day.*\\.sav$",
                        full.names = TRUE) # Reads in the paths of the 100 files
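To see the ordering problem in isolation, here is a minimal sketch using a few made-up name stems (hypothetical, not the asker's real files). list.files() returns its results in alphabetical order, and sort() on a character vector compares strings position by position, so "Day10" and "Day100" land before "Day2":

```r
# Hypothetical file-name stems standing in for the real files
x <- c("Day2", "Day100", "Day1", "Day10")
# Character (lexicographic) sorting: "1" sorts before "2",
# so Day10 and Day100 come before Day2
sorted_chr <- sort(x)
print(sorted_chr)
```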

Let’s suppose you have a vector v with the names of your files (according to what you said, named like ___Day__.sav). You can extract the day number and reorder the names with the following code:
# Load library
library(stringr)
# Extract the digits after "Day" (column 2 of str_match() holds the capture group)
day <- as.numeric(str_match(v, "Day\\s*(\\d+)")[, 2])
# Data frame with the full file names and their day number
tab <- data.frame(file.name = v, day = day)
# Reorder `tab` according to $day
tab <- tab[order(tab$day), ]
# Sorted vector of file names
v_sorted <- tab$file.name
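If you would rather avoid the stringr dependency, a minimal base R sketch does the same thing with sub() and order(). It assumes every name contains "Day" followed by the day number; the file names below are hypothetical. (stringr's str_sort(x, numeric = TRUE) and gtools' mixedsort() are ready-made alternatives for this kind of "natural" ordering.)

```r
# Hypothetical file names standing in for the output of list.files()
v <- c("ADD_Day10.sav", "ADD_Day2.sav", "ADD_Day1.sav", "ADD_Day100.sav")
# Keep only the digits that follow "Day" and convert them to numbers
day <- as.numeric(sub(".*Day\\s*(\\d+).*", "\\1", v))
# order() on the numeric day gives the numerically correct ordering
v_sorted <- v[order(day)]
print(v_sorted)
```

Sorting by the extracted number, rather than by the string, is what makes Day2 come before Day10.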

Related

How to set column names in R by repeating character? [duplicate]

This question already has answers here:
How to create a sequence starting with a character and then with numbers in R
(1 answer)
Make sequential numeric column names prefixed with a letter
(3 answers)
Closed 1 year ago.
Suppose I want to name the columns of a data frame in R L1, L2, ..., up to L200. How could I do this?
I tried colnames(df) <- c('L1':'L200'), but this does not work (it returns the message "NAs introduced by coercion"), even though df has 200 columns.
Any help is appreciated!
We can use paste0():
colnames(df) <- paste0("L", 1:200)
or, to adapt automatically to the number of columns:
colnames(df) <- paste0("L", seq_along(df))
NOTE: In base R, the range operator (:) works on numbers, not on character strings: 'L1' is a string, whereas 1 is a number, so 1:200 gives the integers from 1 to 200.
Here is another solution:
colnames(df) <- sprintf("L%d", 1:200)
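A small runnable sketch of both answers on a hypothetical 3-column data frame (the data are made up; only the naming matters). seq_along() on a data frame counts its columns, so the names always match ncol(df):

```r
# Hypothetical data frame with 3 columns (default names X1, X2, X3)
df <- data.frame(matrix(0, nrow = 2, ncol = 3))
# Rename the columns L1, L2, L3 based on the actual number of columns
colnames(df) <- paste0("L", seq_along(df))
print(colnames(df))
# paste0() and sprintf() produce the same names
same_names <- identical(paste0("L", 1:3), sprintf("L%d", 1:3))
```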

Removing sampled values from a character vector [duplicate]

This question already has answers here:
How to tell what is in one vector and not another?
(6 answers)
Closed 2 years ago.
I'm quite new to R, but I'm trying to make a "randomizer" of sorts.
I have a vector
names <- c('Name1', 'Name2', [...], 'Name13')
I then sample 6 names from the vector to another vector
name_sample_1 <- sample(names, 6)
What I want is to then update the names vector with a single line of code instead of doing it manually. I tried running:
names <- names - name_sample_1
But this returned the error 'non-numeric argument to binary operator'. Any ideas on how to do this effectively?
You can use the handy %in% operator:
names <- paste0("name", 1:20)
sample_names <- sample(names,6)
names_updated <- names[!names %in% sample_names]
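A minimal sketch comparing the %in% answer with base R's setdiff(), using hypothetical names Name1 to Name13 to mirror the question. Both return the names that were not sampled, in their original order; one difference is that setdiff() also drops duplicate entries, which does not matter when every name is unique.

```r
set.seed(1)  # only so the example is reproducible
# Hypothetical names mirroring the question
names_all <- paste0("Name", 1:13)
name_sample_1 <- sample(names_all, 6)
# Keep the names that are NOT in the sample
remaining_in  <- names_all[!names_all %in% name_sample_1]
# Same result with setdiff()
remaining_set <- setdiff(names_all, name_sample_1)
print(remaining_in)
```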

Create a character from column names (in R) [duplicate]

This question already has answers here:
R regex find last occurrence of delimiter
(4 answers)
Closed 1 year ago.
I have a matrix with thousands of columns whose names look like this:
Z41_5_tes_ACGTTCCATAGCCGTA
Z41_5_ACGTTCCAGAGCGGTA
Z53_5_ACGTTCCAGAGCCGTA
Z53_5_ACGTTCCAGATCTGTA
Z41_5_ACGTTGCATAGCGGTA
Z41_5_tes_ACGTTCGCTAGCCGTA
I would like to create a vector containing the beginning of each column name (everything before the last underscore), as shown below:
Z41_5_tes
Z41_5
Z53_5
Z53_5
Z41_5
Z41_5_tes
I have tried the following, but it does not capture Z41_5_tes:
names <- gsub("^([^]*[^_]).$", "\1", colnames(x#data))
It returns only:
Z41_5
Z53_5
Remove everything from the last underscore onward:
sub('_[^_]*$', '', x)
#[1] "Z41_5_tes" "Z41_5" "Z53_5" "Z53_5" "Z41_5" "Z41_5_tes"
Or extract everything before the last underscore:
sub('(.*)_.*', '\\1', x)
#[1] "Z41_5_tes" "Z41_5" "Z53_5" "Z53_5" "Z41_5" "Z41_5_tes"
Data used above:
x <- c("Z41_5_tes_ACGTTCCATAGCCGTA", "Z41_5_ACGTTCCAGAGCGGTA",
"Z53_5_ACGTTCCAGAGCCGTA", "Z53_5_ACGTTCCAGATCTGTA",
"Z41_5_ACGTTGCATAGCGGTA", "Z41_5_tes_ACGTTCGCTAGCCGTA")
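Applied to the matrix itself, a minimal sketch could look like the following. The matrix m is a hypothetical stand-in whose column names are the example names x above; unique() is added only to show the distinct prefixes.

```r
# Example column names from the question
x <- c("Z41_5_tes_ACGTTCCATAGCCGTA", "Z41_5_ACGTTCCAGAGCGGTA",
       "Z53_5_ACGTTCCAGAGCCGTA", "Z53_5_ACGTTCCAGATCTGTA",
       "Z41_5_ACGTTGCATAGCGGTA", "Z41_5_tes_ACGTTCGCTAGCCGTA")
# Hypothetical matrix with those column names
m <- matrix(0, nrow = 1, ncol = length(x), dimnames = list(NULL, x))
# Drop the last underscore and everything after it
prefixes <- sub("_[^_]*$", "", colnames(m))
print(prefixes)
print(unique(prefixes))
```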

How do I extract elements from a dataframe by pattern? [duplicate]

This question already has answers here:
Subset data to contain only columns whose names match a condition
(10 answers)
Closed 3 years ago.
I have a data frame dat with many variables (columns) named like:
"x_tp1_y"
"g_tp1_z"
"f_tp2_h"
I would like to extract the columns whose names include "tp1".
I already tried this:
grep("tp1", dat)
grepl("tp1", dat)
dat["tp1",]
I just want R to give me elements with this pattern so I do not have to type in all variable names that are in the dataframe dat.
Like this: I run a command that extracts the columns with the pattern "tp1", and R returns the part of the data frame whose column names contain "tp1":
x_tp1_y g_tp1_z
      1       2
      0       3
Then I would like to store the result as a new data frame.
I know that I could use
newdat <- data.frame(dat[[1]], dat[c(1:30)])
but my data frame has so many columns that typing them all would take ages.
Thank you for your help!
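newdat <- dat[, grep("tp1", colnames(dat)), drop = FALSE]
grep() returns the indices of the elements of colnames(dat) that contain the pattern, and "[" uses those indices to select the matching columns. drop = FALSE keeps the result a data frame even when only one column matches.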
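A runnable sketch on a small hypothetical data frame (the tp1 values come from the question's desired output; the f_tp2_h values are made up). The question's grep("tp1", dat) fails because grep() coerces the data frame's columns to character and searches their contents, not their names; searching colnames(dat) fixes that. The grepl() version uses a logical index instead of integer positions.

```r
# Hypothetical data frame with the column names from the question
dat <- data.frame(x_tp1_y = c(1, 0), g_tp1_z = c(2, 3), f_tp2_h = c(5, 6))
# Integer positions of the columns whose names contain "tp1"
idx <- grep("tp1", colnames(dat), fixed = TRUE)
newdat <- dat[, idx, drop = FALSE]
# Equivalent selection with a logical vector from grepl()
newdat2 <- dat[, grepl("tp1", colnames(dat), fixed = TRUE), drop = FALSE]
print(newdat)
```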

R: keep leading zero [duplicate]

This question already has answers here:
How to avoid: read.table truncates numeric values beginning with 0
(3 answers)
Closed 8 years ago.
I have a dataset in .csv format. One of its columns contains values with a leading zero, like "05" and "02". I am importing the file with read.csv in R. The import succeeds, but the leading zeros are removed.
Thanks in advance.
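If all the values in the column have the same length (so each one lost exactly one leading zero), you can do paste0("0", NAME).
If the lengths vary, try formatC like so: formatC(NAME, width = 2, format = "d", flag = "0").
In the latter example, format = "d" formats the values as integers, width is the total number of characters (change it as needed), and flag = "0" pads with zeros instead of spaces.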
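A minimal sketch contrasting the two approaches on hypothetical inline CSV data (the column names id and code are made up). Telling read.csv to read the column as character via colClasses keeps the zeros from the start, while formatC() restores them after a numeric import:

```r
# Hypothetical CSV content with a zero-padded code column
csv_text <- "id,code\n1,05\n2,02\n3,12"
# Default import: "code" is converted to integer, so the leading zeros are lost
d_default <- read.csv(text = csv_text)
# Reading "code" as character keeps the values exactly as written in the file
d_char <- read.csv(text = csv_text, colClasses = c(code = "character"))
# Repairing after the fact with formatC(), as in the answer above
d_default$code_padded <- formatC(d_default$code, width = 2, format = "d", flag = "0")
print(d_char)
print(d_default)
```

Reading as character is usually the safer choice when the column is really an identifier, since no width has to be guessed afterwards.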
