change the sequence of numbers in a filename using R


I am sorry, I could not find an answer to this question anywhere and would really appreciate your help.
I have one .csv file for each hour of a year. The files are named as follows:
hh_dd_mm.csv (e.g. February 1st, 00:00 is 00_01_02.csv). To make it easier to sort the hours of a year, I would like to change the names to mm_dd_hh.csv.
How can I change the filenames from the pattern HH_DD_MM to MM_DD_HH in R? My attempt so far:
a <- list.files(path = ".", pattern = "HH_DD_MM")
b<-paste(pattern="MM_DD_HH")
file.rename(a,b)

Or you could do:
a <- c("00_01_02.csv", "00_02_02.csv")
gsub("(\\d{2})\\_(\\d{2})\\_(\\d{2})(.*)", "\\3_\\2_\\1\\4", a)
#[1] "02_01_00.csv" "02_02_00.csv"
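The gsub call above only builds the new names; to rename the files on disk, the result still has to go through file.rename. Here is a minimal sketch of that step, assuming all files follow the hh_dd_mm.csv pattern. The demo directory, the file names and the helper reorder_name are made up for illustration. One thing to watch: with a full year of files, a new name can equal an existing old name (01_01_02.csv becomes 02_01_01.csv, which is itself a valid old name), so the sketch moves the renamed files into a separate folder instead of renaming in place.

```r
# Throwaway demo directory with three files named hh_dd_mm.csv (illustrative).
demo_dir <- tempfile("rename_demo")
dir.create(demo_dir)
file.create(file.path(demo_dir, c("00_01_02.csv", "01_01_02.csv", "02_01_01.csv")))

# Hypothetical helper: turn hh_dd_mm.<ext> into mm_dd_hh.<ext>.
reorder_name <- function(x) {
  sub("^(\\d{2})_(\\d{2})_(\\d{2})(\\..*)$", "\\3_\\2_\\1\\4", x)
}

# Pick up only names that match the expected pattern.
old_names <- list.files(demo_dir, pattern = "^\\d{2}_\\d{2}_\\d{2}\\.csv$")
new_names <- reorder_name(old_names)

# Write into a separate folder so a new name never overwrites an old file.
out_dir <- file.path(demo_dir, "renamed")
dir.create(out_dir)

# file.rename() is vectorised and returns TRUE for each successful rename.
renamed <- file.rename(file.path(demo_dir, old_names),
                       file.path(out_dir, new_names))

sort(list.files(out_dir))
# [1] "01_01_02.csv" "02_01_00.csv" "02_01_01.csv"
```

For real data, replace demo_dir with the folder holding the files and check new_names before running the file.rename step.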

Not sure if this is the best solution, but it seems to work:
a <- c("00_01_02.csv", "00_02_02.csv")
b <- unname(sapply(a, function(x) {
  temp <- strsplit(x, "(_|[.])")[[1]]
  paste0(temp[[3]], "_", temp[[2]], "_", temp[[1]], ".", temp[[4]])
}))
b
## [1] "02_01_00.csv" "02_02_00.csv"

You can use chartr to create the new file name. Here's an example:
> write.csv(c(1,1), "12_34_56")
> list.files()
# [1] "12_34_56"
> file.rename("12_34_56", chartr("1256", "5612", "12_34_56"))
# [1] TRUE
> list.files()
# [1] "56_34_12"
chartr translates individual characters one for one, so the string keeps its original length. In the code above the mapping sends "1" to "5" and "2" to "6" (and back), which swaps "12" with "56". Note that the same mapping applies to every character in the name, so this only works when the digits to swap are known in advance and do not also occur in the middle part.
Alternatively, you can write a short string-swapping function:
> strSwap <- function(x) paste(rev(strsplit(x, "[_]")[[1]]), collapse = "_")
> ( files <- c("84_15_45", "59_95_21", "31_51_49",
"51_88_27", "21_39_98", "35_27_14") )
# [1] "84_15_45" "59_95_21" "31_51_49" "51_88_27" "21_39_98" "35_27_14"
> sapply(files, strSwap, USE.NAMES = FALSE)
# [1] "45_15_84" "21_95_59" "49_51_31" "27_88_51" "98_39_21" "14_27_35"
You could also do it with the substr<- assignment function:
> s1 <- substr(files,1,2)
> substr(files,1,2) <- substr(files,7,8)
> substr(files,7,8) <- s1
> files
# [1] "45_15_84" "21_95_59" "49_51_31" "27_88_51" "98_39_21" "14_27_35"
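The examples above use names without an extension. With names like 00_01_02.csv, reversing the "_"-separated parts directly would move the extension into the first part. Here is a sketch of an extension-aware variant, using file_ext and file_path_sans_ext from the tools package; swap_parts is a hypothetical helper name.

```r
# Hypothetical helper: reverse the "_"-separated parts of a file name
# while leaving the extension (if any) in place.
swap_parts <- function(x) {
  ext  <- tools::file_ext(x)            # "csv", or "" when there is none
  stem <- tools::file_path_sans_ext(x)  # name without the extension
  swapped <- vapply(strsplit(stem, "_", fixed = TRUE),
                    function(p) paste(rev(p), collapse = "_"),
                    character(1))
  # Re-attach the extension only where one was present.
  ifelse(nzchar(ext), paste0(swapped, ".", ext), swapped)
}

swap_parts(c("00_01_02.csv", "84_15_45"))
# [1] "02_01_00.csv" "45_15_84"
```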

Related

Subset String by Position of Characters in R

I can't seem to find an elegant solution to a relatively simple problem in R. I would like to extract characters from a string based on a vector of positions. For example, how could I extract the 1st, 3rd, and 5th characters from example.string? substr does not work without a beginning and end.
example.string <- "ApplesAndCookies"
characters.wanted <- c(1,3,5)
Expected output would be:
Ape
I can design a loop or function to do this, but there has to be an easier way...

For a single string and a single vector of positions you can use:
rawToChar(charToRaw(example.string)[characters.wanted])
Output
[1] "Ape"
For a vector of strings, you can use:
sapply(your_vector, function(x, i) rawToChar(charToRaw(x)[i]), characters.wanted)
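charToRaw indexes bytes, so the positions only line up with characters for single-byte (e.g. plain ASCII) text. A character-based sketch can use substring, which is vectorised over its start and stop arguments; pick_chars is a hypothetical helper name and the second input string is made up for illustration.

```r
# Hypothetical helper: pick the characters at `positions` from each string.
pick_chars <- function(x, positions) {
  vapply(x, function(s) {
    # substring() returns one single-character string per position.
    paste(substring(s, positions, positions), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

pick_chars("ApplesAndCookies", c(1, 3, 5))
# [1] "Ape"
pick_chars(c("ApplesAndCookies", "BananaBread"), c(1, 3, 5))
# [1] "Ape" "Bnn"
```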

A possible solution for a single string.
example.string <- "ApplesAndCookies"
characters.wanted <- c(1,3,5)
paste(unlist(strsplit(example.string, ''))[characters.wanted], collapse = '')
# ---------------------------------------------------------------------------
[1] "Ape"
Extension to a vector of strings.
example.string <- c("ApplesAndCookies","ApplesAndCookies","ApplesAndCookies")
characters.wanted <- c(1,3,5)
sapply(strsplit(example.string, ''), function(x) {
paste(x[characters.wanted], collapse = '')
})
# ---------------------------------------------------------------------------
[1] "Ape" "Ape" "Ape"

There's a function in the package "Biostrings" that allows you to do this.
You first have to install BiocManager:
if (!require("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(version = "3.14")
Next install and load the package "Biostrings"
BiocManager::install("Biostrings")
library(Biostrings)
You can then use the function letter() to subset your string. For example:
x <- "abcde"
letter(x, 1:2)
[1] "ab"

You could use:
example.string <- "ApplesAndCookies"
characters.wanted <- c(1,3,5)
paste(strsplit(example.string, "")[[1]][characters.wanted], collapse="")
Output:
[1] "Ape"

How to remove specific pattern in string?

I have a string like f <- "./DAYA-1178/10TH FEB.xlsx". I would like to extract only DAYA-1178.
What I have tried is:
f1 <- gsub(".*./","", f)
But this returns the last part of the path, "10TH FEB.xlsx".
I would appreciate any lead.

It seems you are dealing with files. You need the basename of the directory:
basename(dirname(f))
[1] "DAYA-1178"
or you could do:
sub(".*/","",dirname(f))
[1] "DAYA-1178"
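Both basename and dirname are vectorised, so the same call works on a whole vector of paths. A small sketch, where the second path is made up for illustration:

```r
# Illustrative paths; only the first one comes from the question.
paths <- c("./DAYA-1178/10TH FEB.xlsx", "./DAYA-2001/11TH FEB.xlsx")

# dirname() drops the file name, basename() keeps the last remaining component.
folders <- basename(dirname(paths))
folders
# [1] "DAYA-1178" "DAYA-2001"
```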

Using strsplit, we can split the input on path separator / and retain the second element:
f <- "./DAYA-1178/10TH FEB.xlsx"
unlist(strsplit(f, "/"))[2]
[1] "DAYA-1178"
If you wish to use sub, here is one way:
sub("^.*/(.*?)/.*$", "\\1", f)
[1] "DAYA-1178"

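You can try this:
f1 <- gsub("[.,xlsx]", "", f)
# [1] "/DAYA-1178/10TH FEB"
f3 <- strsplit(f1, "/")[[1]][2]
# [1] "DAYA-1178"
Note that "[.,xlsx]" is a character class, so it removes every ".", ",", "x", "l" and "s" anywhere in the string, not just the ".xlsx" extension. It works here only because the folder name contains none of those characters.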

How to split a string based on a semicolon following a conditional digit

I'm working in R with strings like the following:
"a1_1;a1_2;a1_5;a1_6;a1_8"
"two1_1;two1_4;two1_5;two1_7"
I need to split these strings into two strings based on the last digit being less than 7 or not. For instance, the desired output for the two strings above would be:
"a1_1;a1_2;a1_5;a1_6" "a1_8"
"two1_1;two1_4;two1_5" "two1_7"
I attempted the following to no avail:
x <- "a1_1;a1_2;a1_5;a1_6;a1_8"
str_split("x", "(\\d<7);")
In an earlier version of the question I was helped by someone that provided the following function, but I don't think it's set up to handle digits both before and after the semicolon in the strings above. I'm trying to modify it but I haven't been able to get it to come out correctly.
f1 <- function(strn) {
strsplit(gsubfn("(;[A-Za-z]+\\d+)", ~ if(readr::parse_number(x) >= 7)
paste0(",", sub(";", "", x)) else x, strn), ",")[[1]]
}
Can anyone help me understand what I'd need to do to make this split as desired?

Splitting and recombining on ;, with a simple regex capture in between.
s <- c("a1_1;a1_2;a1_5;a1_6;a1_8", "two1_1;two1_4;two1_5;two1_7")
sp <- strsplit(s, ";")
lapply(sp, function(x) {
  # TRUE where the trailing digit is below 7
  l <- as.integer(sub(".*(\\d)$", "\\1", x)) < 7
  c(paste(x[l], collapse = ";"), paste(x[!l], collapse = ";"))
})
# [[1]]
# [1] "a1_1;a1_2;a1_5;a1_6" "a1_8"
#
# [[2]]
# [1] "two1_1;two1_4;two1_5" "two1_7"
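The same idea can be wrapped in a small function for a single string, with the threshold as a parameter. This is a sketch under the assumption that every item ends in a digit; split_by_last_digit and cutoff are hypothetical names. The trailing digit is converted to an integer so the comparison is numeric.

```r
# Hypothetical helper: split one ";"-separated string into two groups,
# depending on whether each item's trailing digit is below `cutoff`.
split_by_last_digit <- function(s, cutoff = 7) {
  items <- strsplit(s, ";", fixed = TRUE)[[1]]
  last_digit <- as.integer(sub(".*(\\d)$", "\\1", items))
  low <- last_digit < cutoff
  c(paste(items[low], collapse = ";"), paste(items[!low], collapse = ";"))
}

split_by_last_digit("a1_1;a1_2;a1_5;a1_6;a1_8")
# [1] "a1_1;a1_2;a1_5;a1_6" "a1_8"
split_by_last_digit("two1_1;two1_4;two1_5;two1_7")
# [1] "two1_1;two1_4;two1_5" "two1_7"
```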

Extracting identifiers without matching files in a folder

How can I extract the identifiers for which no corresponding file was generated?
Identifiers given as input for the generation of files:
fileIden <- c('a-1','a-2','a-3','b-1','b-2','c-1','d-1','d-2','d-3','d-4')
Checking the files generated:
files <- list.files(".")
files
# [1] "a-2.csv" "a-3.csv" "b-1.csv" "c-1.csv" "d-3.csv"
# Generated here for reproducibility.
# files <- c("a-2.csv", "a-3.csv", "b-1.csv", "c-1.csv", "d-3.csv")
Expected files if the whole process completes successfully:
fileExp <- paste(fileIden, ".csv", sep = "")
# [1] "a-1.csv" "a-2.csv" "a-3.csv" "b-1.csv" "b-2.csv" "c-1.csv" "d-1.csv" "d-2.csv" "d-3.csv" "d-4.csv"
Are any expected files missing?
fileMiss <- fileExp[!fileExp %in% files]
# [1] "a-1.csv" "b-2.csv" "d-1.csv" "d-2.csv" "d-4.csv"
Expected output
# "a-1" "b-2" "d-1" "d-2" "d-4"
I am sure there is an easier way to get the above output directly, without creating the intermediate objects fileExp and fileMiss. Could you please guide me?

You can do this:
fileIden <- c('a-1','a-2','a-3','b-1','b-2','c-1','d-1','d-2','d-3','d-4')
file <- c("a-2.csv", "a-3.csv" ,"b-1.csv", "c-1.csv", "d-3.csv")
setdiff(fileIden, trimws(gsub("\\.csv","", file)))
Another approach:
setdiff(fileIden, stringr::str_extract(file,"(.*)(?=\\.csv)"))
Logic:
setdiff returns the elements of the first vector that are not in the second. gsub replaces ".csv" with an empty string, so combining the two gives the identifiers that have no matching file.
Output:
#[1] "a-1" "b-2" "d-1" "d-2" "d-4"
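A variant of the same setdiff idea strips the extension with file_path_sans_ext from the tools package instead of a regular expression. A minimal sketch using the data from the question; missing_ids is a name introduced here for illustration.

```r
fileIden <- c("a-1", "a-2", "a-3", "b-1", "b-2", "c-1", "d-1", "d-2", "d-3", "d-4")
files <- c("a-2.csv", "a-3.csv", "b-1.csv", "c-1.csv", "d-3.csv")

# Strip the extension from the generated files, then keep the identifiers
# that have no matching file.
missing_ids <- setdiff(fileIden, tools::file_path_sans_ext(files))
missing_ids
# [1] "a-1" "b-2" "d-1" "d-2" "d-4"
```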

A less elegant approach (it assumes every identifier is exactly three characters long):
result <- ifelse(fileIden %in% substr(file, 1, 3), "", fileIden)
result[result != ""]

R: How to split string and keep part of it?

I have a lot of strings like shown below.
> x=c("cat_1_2_3", "dog_2_6_3", "cow_2_8_6")
> x
[1] "cat_1_2_3" "dog_2_6_3" "cow_2_8_6"
I would like to separate the string, while still holding on to the first part of it, as demonstrated below.
"cat_1" "cat_2" "cat_3" "dog_2" "dog_6" "dog_3" "cow_2" "cow_8" "cow_6"
Any suggestions?

We can use sub:
scan(text = sub("([a-z]+)_(\\d+)_(\\d+)_(\\d+)", "\\1_\\2,\\1_\\3,\\1_\\4", x),
     what = "", sep = ",", quiet = TRUE)
#[1] "cat_1" "cat_2" "cat_3" "dog_2" "dog_6" "dog_3" "cow_2" "cow_8" "cow_6"
Another option is to split the string with strsplit:
unlist(lapply(strsplit(x, "_"), function(x) paste(x[1], x[-1], sep="_")))

You could try to split the string, then re-combine using paste.
f <- function(x) {
res <- strsplit(x,'_')[[1]]
paste(res[1], res[2:4], sep='_')
}
x <- c("cat_1_2_3", "dog_2_6_3", "cow_2_8_6")
unlist(lapply(x, f))
