How to rename multiple files in R - r

I need to rename over 100 images into a format like SITE_T001_L001.jpg, where SITE is CGS1, T is the tube number, and L is the image number.
All the images are contained in a single folder named CGS1 (the site), which is subdivided into folders named after their tube numbers. Within each tube folder the images are ordered by date, and this order gives the image number: the first one is 1, the second one is 2, and so on (the alphabetical order is not the correct one).
here, I have a graphical representation:
I found how to do it manually in R:
file.rename("Snap_029.jpg",
paste0(paste("CGS1", "T001", "L003", sep = "_"), ".jpg"))
but is there any way to automate it with a loop?
In more detail, as requested in the response:
I have this series of input filenames (including the leading path), ordered by date of modification (important).
file_list
[1] "CGS1/1/Snap_001.jpg" "CGS1/1/Snap_002.jpg" "CGS1/1/Snap_005.jpg" "CGS1/2/Snap_006.jpg" "CGS1/2/Snap_007.jpg" "CGS1/2/Snap_082.jpg"
I want to rename each image using the main folder (CGS1), the subfolder (T001 to T002), and the order of modification (L001 to L003), giving these output filenames:
new_file_list
[1] "CGS1_T001_L001.jpg" "CGS1_T001_L002.jpg" "CGS1_T001_L003.jpg" "CGS1_T002_L001.jpg" "CGS1_T002_L002.jpg" "CGS1_T002_L003.jpg"

Try this:
file_list <- list.files(path = "...", recursive = TRUE, pattern = "\\.jpg$")
### for testing
file_list <- c(
"CGS1/1/Snap_001.jpg", "CGS1/1/Snap_005.jpg", "CGS1/1/Snap_002.jpg",
"CGS1/2/Snap_006.jpg", "CGS1/2/Snap_007.jpg", "CGS1/2/Snap_0082.jpg"
)
spl <- strsplit(file_list, "[/\\\\]")
# ensure that all files are exactly two levels down
stopifnot(all(lengths(spl) == 3))
m <- do.call(rbind, spl)
m
# [,1] [,2] [,3]
# [1,] "CGS1" "1" "Snap_001.jpg"
# [2,] "CGS1" "1" "Snap_005.jpg"
# [3,] "CGS1" "1" "Snap_002.jpg"
# [4,] "CGS1" "2" "Snap_006.jpg"
# [5,] "CGS1" "2" "Snap_007.jpg"
# [6,] "CGS1" "2" "Snap_0082.jpg"
From this, we'll update the second/third columns to be what you expect.
# either one (not both), depending on if you are guaranteed integers
m[,2] <- sprintf("T%03.0f", as.integer(m[,2]))
# ... or if you may have non-numbers (pmax, not max, so the padding is computed per element)
m[,2] <- paste0("T", strrep("0", pmax(0, 3 - nchar(m[,2]))), m[,2])
# since we really don't care about 'Snap_001.jpg' (etc), we can discard the third column
new_file_list <- apply(m[,1:2], 1, paste, collapse = "_")
# back-street way of applying sequences to each CGS/T combination while preserving order
for (prefix in unique(new_file_list)) {
new_file_list[new_file_list == prefix] <- sprintf("%s_L%03d.jpg",
new_file_list[new_file_list == prefix],
seq_len(sum(new_file_list == prefix)))
}
new_file_list
# [1] "CGS1_T001_L001.jpg" "CGS1_T001_L002.jpg" "CGS1_T001_L003.jpg"
# [4] "CGS1_T002_L001.jpg" "CGS1_T002_L002.jpg" "CGS1_T002_L003.jpg"
Now it's a matter of renaming. Note that this will move all files into the current directory.
file.rename(file_list, new_file_list)
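The question asks for numbering by modification date, whereas list.files returns names in alphabetical order. A minimal sketch of one way to order by modification time before numbering is shown here. It assumes the tube folders have purely numeric names; plan_renames and the temporary demo folder are hypothetical names made up for this illustration, and the actual rename is left commented out so the plan can be inspected first.

```r
# Hypothetical helper: build a table of old and new names for one site folder.
# Assumes every image sits in <root>/<tube number>/ and tube folders are numeric.
plan_renames <- function(root) {
  files <- list.files(root, pattern = "\\.jpg$", recursive = TRUE,
                      full.names = TRUE)
  tube <- as.integer(basename(dirname(files)))
  # order by tube first, then by modification time within each tube
  ord <- order(tube, file.mtime(files))
  files <- files[ord]
  tube <- tube[ord]
  # running image number within each tube
  img <- ave(seq_along(files), tube, FUN = seq_along)
  data.frame(
    old = files,
    new = sprintf("%s_T%03d_L%03d.jpg", basename(root), tube, img),
    stringsAsFactors = FALSE
  )
}

# Demo on throwaway files in a temporary folder (not the real data)
root <- file.path(tempdir(), "CGS1")
dir.create(file.path(root, "1"), recursive = TRUE)
dir.create(file.path(root, "2"))
demo <- file.path(root, c("1/Snap_001.jpg", "1/Snap_005.jpg",
                          "1/Snap_002.jpg", "2/Snap_006.jpg"))
file.create(demo)
# give the demo files increasing modification times, in the order listed above
base_time <- as.POSIXct("2020-01-01", tz = "UTC")
for (i in seq_along(demo)) Sys.setFileTime(demo[i], base_time + 10 * i)

plan <- plan_renames(root)
print(plan)
# When the plan looks right, rename in place:
# file.rename(plan$old, file.path(root, plan$new))
```

Keeping the plan as a data frame makes it easy to check for surprises (for example duplicated target names) before any file is touched.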

Related

Transposing a dataframe that contains character strings in R

I have hundreds of tab-separated tables saved as text files in a folder. I would like to transpose all those tables using R and then export the transposed tables as tab-separated text files. I'm using the following code:
files <- list.files()
for (i in files) {
x <- t(read.table(i, header = TRUE, sep = "\t"))
filename <- paste0("transposed_", i)
write.table(x, file = filename)
}
The above code works perfectly well except that, because the tables being transposed contain character strings, the function t() returns matrices with all values as strings. As a result, the transposed tables exported with write.table have all values within quotation marks "".
So, the question is: how could I transpose dataframes that contain character strings without getting all values converted into character strings? If someone can demonstrate this using the following dataset, I can replicate for my task.
# Hypothetical dataframe
data <- data.frame(dist = 1:5,
time = 6:10,
vel = 11:15,
pos = c("x","y","z","w","k"))
row.names(data) <- c("indA","indB","indC","indD","indE")
data
# dist time vel pos
# indA 1 6 11 x
# indB 2 7 12 y
# indC 3 8 13 z
# indD 4 9 14 w
# indE 5 10 15 k
t(data)
# indA indB indC indD indE
# dist "1" "2" "3" "4" "5"
# time " 6" " 7" " 8" " 9" "10"
# vel "11" "12" "13" "14" "15"
# pos "x" "y" "z" "w" "k"
# Even if I use as.data.frame(t(data)), all values remain as character strings
I've tried several solutions offered in other topics, but none worked. Also, I'd like (if possible) to perform this task using R base functions only.
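A matrix can hold only one type, so a transposed table that mixes numbers and text is necessarily all character. The quotation marks in the exported file, however, come from the quote argument of write.table, which defaults to TRUE. A minimal base-R sketch, assuming the goal is simply an unquoted tab-separated file (the output file name is made up for the example):

```r
data <- data.frame(dist = 1:5,
                   time = 6:10,
                   vel = 11:15,
                   pos = c("x", "y", "z", "w", "k"),
                   row.names = c("indA", "indB", "indC", "indD", "indE"),
                   stringsAsFactors = FALSE)

# Convert each column to character explicitly, which avoids the space padding
# that t() on a mixed data frame introduces (" 6", " 7", ...), then transpose.
m <- t(vapply(data, as.character, character(nrow(data))))
colnames(m) <- rownames(data)

# quote = FALSE drops the quotation marks; col.names = NA writes an empty
# header cell above the row names so the columns line up.
out <- file.path(tempdir(), "transposed_demo.txt")  # hypothetical file name
write.table(m, file = out, sep = "\t", quote = FALSE, col.names = NA)
lines <- readLines(out)
print(lines)
```

Reading the file back with read.table(out, header = TRUE, sep = "\t", row.names = 1) would still give character columns wherever a column mixes numbers and text, which is a property of the data rather than of the export.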

Row names disappear after as.matrix

I noticed that if the row names of a data frame are the sequence of numbers from 1 to the number of rows, they disappear after using as.matrix. But the row names are kept if they are not such a sequence.
Here is a reproducible example:
test <- as.data.frame(list(x=c(0.1, 0.1, 1), y=c(0.1, 0.2, 0.3)))
rownames(test)
# [1] "1" "2" "3"
rownames(as.matrix(test))
# NULL
rownames(as.matrix(test[c(1, 3), ]))
# [1] "1" "3"
Does anyone have an idea on what is going on?
Thanks a lot
You can set rownames.force = TRUE when you call as.matrix (rownames = TRUE below works because R partially matches it to rownames.force):
> as.matrix(test, rownames = TRUE)
x y
1 0.1 0.1
2 0.1 0.2
3 1.0 0.3
First and foremost, we always have a numerical index for sub-setting that won't disappear and that we should not confuse with row names.
as.matrix(test)[c(1, 3), ]
# x y
# [1,] 0.1 0.1
# [2,] 1.0 0.3
WHAT's going on when you call rownames() can be seen from its use of dimnames() in the source code of base::rownames(),
function (x, do.NULL = TRUE, prefix = "row")
{
dn <- dimnames(x)
if (!is.null(dn[[1L]]))
dn[[1L]]
else {
nr <- NROW(x)
if (do.NULL)
NULL
else if (nr > 0L)
paste0(prefix, seq_len(nr))
else character()
}
}
which yields NULL for dimnames(as.matrix(test))[[1]] but yields "1" "3" in the case of dimnames(as.matrix(test[c(1, 3), ]))[[1]].
Note, that the method base:::row.names.data.frame is applied in case of data frames, e.g. rownames(test).
That should explain the WHAT. Fortunately you did not ask for the WHY, which would be rather opinion-based.
There is a difference between 'automatic' and non-'automatic' row names.
Here is a motivating example:
automatic
test <- as.data.frame(list(x = c(0.1,0.1,1), y = c(0.1,0.2,0.3)))
rownames(test)
# [1] "1" "2" "3"
rownames(as.matrix(test))
# NULL
non-'automatic'
test1 <- test
rownames(test1) <- as.character(1:3)
rownames(test1)
# [1] "1" "2" "3"
rownames(as.matrix(test1))
# [1] "1" "2" "3"
You can read about this in e.g. ?data.frame, which mentions the behavior you discovered at the end:
If row.names was supplied as NULL or no suitable component was found the row names are the integer sequence starting at one (and such row names are considered to be ‘automatic’, and not preserved by as.matrix).
When you call test[c(1, 3), ] then you create non-'automatic' rownames implicitly, which is kinda documented in ?Extract.data.frame:
If `[` returns a data frame it will have unique (and non-missing) row names.
(type `[.data.frame` into your console if you want to go deeper here.)
Others showed what this means for your case already, see the argument rownames.force in ?matrix:
rownames.force: ... The default, NA, uses NULL rownames if the data frame has ‘automatic’ row.names or for a zero-row data frame.
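The 'automatic' flag discussed above can be inspected directly. The sketch below uses the base function .row_names_info(), which by default returns the number of rows with a negative sign when the row names are automatic; the variable names are made up for the illustration.

```r
test <- data.frame(x = c(0.1, 0.1, 1), y = c(0.1, 0.2, 0.3))

# Negative value: the row names are 'automatic'
auto_n <- .row_names_info(test)

# Positive value: subsetting stored real (non-automatic) row names
subset_n <- .row_names_info(test[c(1, 3), ])

# Default behaviour drops automatic row names ...
default_rn <- rownames(as.matrix(test))
# ... unless they are forced
forced_rn <- rownames(as.matrix(test, rownames.force = TRUE))

print(list(auto_n = auto_n, subset_n = subset_n,
           default_rn = default_rn, forced_rn = forced_rn))
```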
The difference dataframe vs. matrix:
?rownames
rownames(x, do.NULL = TRUE, prefix = "row")
The important part is do.NULL, whose default is TRUE. From the documentation:
If do.NULL is FALSE, a character vector (of length NROW(x) or NCOL(x)) is returned in any case,
If the replacement versions are called on a matrix without any existing dimnames, they will add suitable dimnames. But constructions such as
rownames(x)[3] <- "c"
may not work unless x already has dimnames, since this will create a length-3 value from the NULL value of rownames(x).
For me that means (maybe not correct or professional) that to apply rownames() to a matrix, the row names must have been declared beforehand; otherwise you will get NULL, because that is the default behaviour of rownames().
In your example you experience this kind of behaviour:
Here you declare row 1 and 3 and get 1 and 3
rownames(as.matrix(test[c(1, 3), ]))
[1] "1" "3"
Here you declare nothing and get NULL because NULL is the default.
rownames(as.matrix(test))
NULL
You can overcome this by declaring before:
rownames(test) <- 1:3
rownames(as.matrix(test))
[1] "1" "2" "3"
or you could do :
rownames(as.matrix(test), do.NULL = FALSE)
[1] "row1" "row2" "row3"
> rownames(as.matrix(test), do.NULL = FALSE, prefix="")
[1] "1" "2" "3"
Similar effect with rownames.force:
rownames.force
logical indicating if the resulting matrix should have character (rather than NULL) rownames. The default, NA, uses NULL rownames if the data frame has ‘automatic’ row.names or for a zero-row data frame.
dimnames(matrix_test)
I don't know exactly why it happens, but one way to fix it is to pass the argument rownames.force = TRUE to as.matrix:
rownames(as.matrix(test, rownames.force = TRUE))

How to insert back a character in a string at the exact position where it was originally

I have strings with dots here and there. I remove the dots (done), perform some other operations (also done), and would then like to insert the dots back at their original positions (not done). How could I do that?
library(stringr)
stringOriginal <- c("abc.def","ab.cd.ef","a.b.c.d")
dotIndex <- str_locate_all(pattern ='\\.', stringOriginal)
stringModified <- str_remove_all(stringOriginal, "\\.")
I see that str_sub() may help, for example str_sub(stringModified[2], 3,2) <- "." gets me somewhere, but it is still far from the right place, and also I have no idea how to do it programmatically. Thank you for your time!
update
stringOriginal <- c("11.123.100","11.123.200","1.123.1001")
stringOriginalF <- as.factor(stringOriginal)
dotIndex <- str_locate_all(pattern ='\\.', stringOriginal)
stringModified <- str_remove_all(stringOriginal, "\\.")
stringNumFac <- sort(as.numeric(stringModified))
stringi::stri_sub(stringNumFac[1:2], 3, 2) <- "."
stringi::stri_sub(stringNumFac[1:2], 7, 6) <- "."
stringi::stri_sub(stringNumFac[3], 2, 1) <- "."
stringi::stri_sub(stringNumFac[3], 6, 5) <- "."
factor(stringOriginal, levels = stringNumFac)
After such manipulation, I am able to order the numbers, convert them back to strings, and use them later for plotting.
But since I wouldn't know the position of the dots in advance, I want to do this programmatically. Another approach to factor ordering is also welcome, although I am still curious about how to programmatically insert a character back into a string at the exact position where it was originally.
This might be one of the cases for using base R's strsplit, which gives you a list, with a vector of substrings for each entry in your original vector. You can manipulate these with lapply or sapply very easily.
split_string <- strsplit(stringOriginal, "[.]")
#> split_string
#> [[1]]
#> [1] "11" "123" "100"
#>
#> [[2]]
#> [1] "11" "123" "200"
#>
#> [[3]]
#> [1] "1" "123" "1001"
Now you can do this to get the numbers
sapply(split_string, function(x) as.numeric(paste0(x, collapse = "")))
# [1] 11123100 11123200 11231001
And this to put the dots (or any replacement for the dots) back in:
sapply(split_string, paste, collapse = ".")
# [1] "11.123.100" "11.123.200" "1.123.1001"
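And you could get the location of the dots within each element of your original vector like this (the last value in each result is one past the end of the string, not a dot, so drop it with head(x, -1) if you only want the dot positions):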
lapply(split_string, function(x) cumsum(nchar(x) + 1))
# [[1]]
# [1] 3 7 11
#
# [[2]]
# [1] 3 7 11
#
# [[3]]
# [1] 2 6 11
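For the literal question (putting a character back at recorded positions), here is a minimal base-R sketch. insert_at is a hypothetical helper written for this example, and it assumes the modified strings still have the same length as the dot-free originals.

```r
# Hypothetical helper: re-insert `char` into `s` at positions `pos`,
# where `pos` refers to positions in the ORIGINAL string (dots included).
insert_at <- function(s, pos, char = ".") {
  pos <- pos[pos > 0]            # gregexpr reports -1 when there is no match
  for (p in sort(pos)) {         # ascending order keeps original coordinates valid
    s <- paste0(substr(s, 1, p - 1), char, substr(s, p, nchar(s)))
  }
  s
}

stringOriginal <- c("abc.def", "ab.cd.ef", "a.b.c.d")

# Record where the dots were, then strip them
dot_pos <- lapply(gregexpr(".", stringOriginal, fixed = TRUE), as.integer)
stringModified <- gsub(".", "", stringOriginal, fixed = TRUE)

# ... other operations on stringModified would go here ...

restored <- mapply(insert_at, stringModified, dot_pos, USE.NAMES = FALSE)
print(restored)
```

Inserting from left to right matters: each earlier insertion shifts the rest of the string by one, which is exactly what makes the next recorded position line up again.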

Collapsing mixed types into a neat comma separated string

I have a list of mixed types which I would like to collapse into a neat comma separated string to be read somewhere else. The following is a MWE:
a <- "name"
b <- as.vector(c(10))
names(b) <- c('s')
c <- as.vector(c(1, 2))
names(c) <- c('p1', 'p2')
d <- 20
r <- list(a, b, c, d)
r
# [[1]]
# [1] "name"
#
# [[2]]
# s
# 10
#
# [[3]]
# p1 p2
# 1 2
#
# [[4]]
# [1] 20
I want this:
# [1] '"name","10","1,2","20"'
But this is as far as I got:
# Collapse individual elements into individual strings.
# `sapply` with `paste` works perfectly:
> sapply(r, paste, collapse = ",")
# [1] "name" "10" "1,2" "20"
# Try paste again (doesn't work):
> paste(sapply(r, paste, collapse = ","), collapse = ',')
# [1] "name,10,1,2,20"
I tried paste0, cat to no avail. The only way I could do it is using write.table and passing it a buffer memory. That way is too complicated, and quite error prone. I need to have my code working on a cluster with MPI.
You need to add in the quotes - the ones printed after your sapply are just markers to show they are strings. This seems to work...
cat(paste0('"',sapply(r, paste, collapse = ','),'"',collapse=','))
"name","10","1,2","20"
You might need to try with and without the cat if you are writing to a file. Without it, at the terminal, you get backslashes before the 'real' quotes.
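The same idea can be written with sprintf, keeping the result in a variable instead of printing it straight away. This is only a sketch of the approach above; the variable names are made up, and the list is rebuilt inline so the example runs on its own.

```r
r <- list("name", c(s = 10), c(p1 = 1, p2 = 2), 20)

# One string per list element, joined with commas inside the element
fields <- vapply(r, paste, character(1), collapse = ",")

# Wrap each field in literal double quotes, then join the fields
line <- paste(sprintf('"%s"', fields), collapse = ",")

# writeLines shows the real content, without R's escaping of the quotes
writeLines(line)
```

The string held in line contains the quote characters themselves; print(line) shows them with backslashes only because print escapes them for display.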

Concatenate two strings with common elements

I am working on a simple problem in R (but I have not yet figured it out though;p):
Given a vector vect1 <- c("Andy+Pete", "Mary + Pete", "Pete+ Amada", ..., "Amada + Steven", "Steven + Henry"). I want to create a new vector vect2 that contains all the elements in vect1 and new elements that share the following property: for every two strings "A+B" and "B+C", we concatenate it into "A+C" and add this new element into vect2. Can anyone please help me do this?
Also, I want to get all the elements standing in front of + in each string, is the following code correct?
for (i in length(vect1)){
vect3[i] <- regexpr(".*+", vect1[i])
}
3rd question: if I have a dataframe d with a Date column in the format %d-%b (for example, 01-Apr), how do I order this dataframe in an increasing order based on Date?? Let's just say d <- c(01-Apr,01-Mar,02-Jan,31-June,30-May).
I think you could (should) avoid both for loops and the use of external lib if not required.
So this might be a solution:
# create data
vect1 <- c("Andy+Pete", "Mary + Pete", "Pete+ Amada", "Amada + Steven", "Steven + Henry")
# create a matrix of pairs with white space removed
pairsMatrix <- do.call(rbind, sapply(vect1, function(v) strsplit(gsub(pattern = " ", replacement = "", x = v), "\\+")))
# remove dimnames (not necessary though)
dimnames(pairsMatrix) <- NULL
# for every row of pairsMatrix, check whether its second element appears elsewhere as a first element; bind the result to the original pairs
allPairs <- do.call(rbind, c(list(pairsMatrix), apply(pairsMatrix, 1, function(names) c(names[1], pairsMatrix[names[2]==pairsMatrix[,1], 2]))))
# filter out self-relationships
allPairs[allPairs[,1]!=allPairs[,2],]
[,1] [,2]
[1,] "Andy" "Pete"
[2,] "Mary" "Pete"
[3,] "Pete" "Amada"
[4,] "Amada" "Steven"
[5,] "Steven" "Henry"
[6,] "Andy" "Amada"
[7,] "Mary" "Amada"
[8,] "Pete" "Steven"
[9,] "Amada" "Henry"
Concerning your last point, I think a simple sort with proper Date object will do it.
I think this should do it but I did things I probably shouldn't do... like growing objects and nesting for loops. If you want to access all elements in front of the '+', just use name.matrix[,1].
vect1 <- c("Andy+Pete", "Mary + Pete", "Pete+ Amada","Amada + Steven", "Steven + Henry")
library(stringr)
name.matrix <- matrix(do.call('rbind',str_split(vect1, pattern = "\\s?[+]\\s?")), ncol = 2)
new.stuff <- c()
for(x in unique(name.matrix[,2])){
sub.mat.1 <- matrix(name.matrix[name.matrix[,2] == x,], ncol = 2)
sub.mat.2 <- matrix(name.matrix[name.matrix[,1] == x,], ncol = 2)
if(length(sub.mat.1) && length(sub.mat.2)){
for(y in seq_along(sub.mat.1[,2])){
new.add <- paste0(sub.mat.1[y,1],'+', sub.mat.2[,2])
new.stuff <- c(new.stuff, new.add)
}
}
}
vect2 <- c(vect1, new.stuff)
vect2
#[1] "Andy+Pete" "Mary + Pete" "Pete+ Amada" "Amada + Steven" "Steven + Henry" "Andy+Amada"
#[7] "Mary+Amada" "Pete+Steven" "Amada+Henry"
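On the second part of the question (the elements in front of the +): the loop in the question visits only the last index, because for (i in length(vect1)) iterates over a single number, and regexpr returns match positions rather than text. A minimal base-R sketch that needs no loop (before_plus and after_plus are names made up for the example):

```r
vect1 <- c("Andy+Pete", "Mary + Pete", "Pete+ Amada",
           "Amada + Steven", "Steven + Henry")

# Drop everything from the literal "+" onwards, then trim stray spaces.
# The "+" must be escaped, otherwise it is read as a regex quantifier.
before_plus <- trimws(sub("\\+.*$", "", vect1))

# The same idea for the part after the "+"
after_plus <- trimws(sub("^[^+]*\\+", "", vect1))

print(before_plus)
print(after_plus)
```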
Update:
Third question. Well, there are only 30 days in June, so you're going to get an NA there. Because the strings contain no year, as.Date fills in the current one (2018 when this was run). If it's a data.frame that you're trying to sort based on date, you'll need to use the form df[order(df$Date),]. The lubridate package might also be helpful when working with dates.
d <- c('01-Apr','01-Mar','02-Jan','31-June','30-May')
d.new <- as.Date(d, format = '%d-%b')
d.new <- d.new[order(d.new)]
d.new
#[1] "2018-01-02" "2018-03-01" "2018-04-01" "2018-05-30" NA
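For a whole data frame, an alternative sketch that avoids as.Date altogether (and with it the dependence on the current year and on the locale's month names) is to order by month and day directly. The data frame below is hypothetical: it uses valid three-letter English month abbreviations and an extra value column made up for the illustration.

```r
# Hypothetical data: Date strings in %d-%b form with English abbreviations
d <- data.frame(Date = c("01-Apr", "01-Mar", "02-Jan", "30-Jun", "30-May"),
                value = 1:5,
                stringsAsFactors = FALSE)

# Split each string into day and month; month.abb is base R's constant
# vector of English month abbreviations ("Jan", "Feb", ...)
day <- as.integer(sub("-.*$", "", d$Date))
mon <- match(sub("^.*-", "", d$Date), month.abb)

# Order the rows by month, then by day within the month
d_sorted <- d[order(mon, day), ]
print(d_sorted)
```

Any string whose month part is not in month.abb (such as "June") gets NA from match and is sorted to the end by order.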
