how create a sequence of strings with different numbers in R - r

I just cant figure it out how to create a vector in which the strings are constant but the numbers are not. For example:
c("raster[1]","raster[2]","raster[3]")
I'd like to use something like seq(raster[1],raster[99], by=1), but this does not work.
Thanks in advance.

The sprintf function should also work:
rasters <- sprintf("raster[%s]",seq(1:99))
head(rasters)
[1] "raster[1]" "raster[2]" "raster[3]" "raster[4]" "raster[5]" "raster[6]"
As suggested by Richard Scriven, %d is more efficient than %s. So, if you were working with a longer sequence, it would be more appropriate to use:
rasters <- sprintf("raster[%d]",seq(1:99))

We can do
paste0("raster[", 1:6, "]")
# [1] "raster[1]" "raster[2]" "raster[3]" "raster[4]" "raster[5]" "raster[6]"

Related

regex single digit

I have a question which I think is solved by regex use in R.
I have a set of dates (as chr) which I would like in a different format (as chr).
I have tried to fool around with the below examples where the first (new_dates) gives the right format for months 1-9 and wrong for 10-12 and (new_dates2) gives the right format for 10-12 but nothing for 1-9.
I see that the code in the first case matches a single digit twice for 10-12, but don't really know how to tell it to match only single digit.
The final vector of correct dates shows the result I would like.
dates <- c("1/2016", "2/2016", "3/2016", "4/2016", "5/2016", "6/2016", "7/2016", "8/2016", "9/2016", "10/2016", "11/2016", "12/2016", "1/2017")
new_dates <- sub("(\\d)[:/:](\\d{4})","\\2M0\\1", dates)
new_dates2 <- sub("(\\d{2})[:/:](\\d{4})","\\2M\\1", dates)
correctdates <- c("2016M01", "2016M02", "2016M03", "2016M04", "2016M05", "2016M06", "2016M07", "2016M08", "2016M09", "2016M10", "2016M11", "2016M12", "2017M1")
Here's a base R method that will return the desired format:
format(as.Date(paste0("1/",dates), "%d/%m/%Y"), "%YM%m")
[1] "2016M01" "2016M02" "2016M03" "2016M04" "2016M05" "2016M06" "2016M07" "2016M08" "2016M09"
[10] "2016M10" "2016M11" "2016M12" "2017M01"
The idea is to first convert to a Date object and then use the format function to create the desired character representation. I pasted on 1/ so that a day is present in each element.
As #a p o m said it might be better to look for another solution if you are manipulating dates but if you want to stick with regular expressions you can try this one.
([02-9]|1[0-2]?)[:\/](\d{4}) example
new_dates <- sub("(\\d{1,2})\\/(\\d{4})","\\2M0\\1", dates)
It's fine.

Processing files in a particular order in R

I have several datafiles, which I need to process in a particular order. The pattern of the names of the files is, e.g. "Ad_10170_75_79.txt".
Currently they are sorted according to the first numbers (which differ in length), see below:
f <- as.matrix (list.files())
f
[1] "Ad_10170_75_79.txt" "Ad_10345_76_79.txt" "Ad_1049_25_79.txt" "Ad_10531_77_79.txt"
But I need them to be sorted by the middle number, like this:
> f
[1] "Ad_1049_25_79.txt" "Ad_10170_75_79.txt" "Ad_10345_76_79.txt" "Ad_10531_77_79.txt"
As I just need the middle number of the filename, I thought the easiest way is, to get rid of the rest of the name and renaming all files. For this I tried using strsplit (plyr).
f2 <- strsplit (f,"_79.txt")
But I'm sure there is a way to sort the files directly, without renaming all files. I tried using sort and to describe the name with regex but without success. This has been a problem for many days, and I spent several hours searching and trying, to solve this presumably easy task. Any help is very much appreciated.
old example dataset:
f <- c("Ad_10170_75_79.txt", "Ad_10345_76_79.txt",
"Ad_1049_25_79.txt", "Ad_10531_77_79.txt")
Thank your for your answers. I think I have to modify my example, because the solution should work for all possible middle numbers, independent of their digits.
new example dataset:
f <- c("Ad_10170_75_79.txt", "Ad_10345_76_79.txt",
"Ad_1049_9_79.txt", "Ad_10531_77_79.txt")
Here's a regex approach.
f[order(as.numeric(gsub('Ad_\\d+_(\\d+)_\\d+\\.txt', '\\1', f)))]
# [1] "Ad_1049_9_79.txt" "Ad_10170_75_79.txt" "Ad_10345_76_79.txt" "Ad_10531_77_79.txt"
Try this:
f[order(as.numeric(unlist(lapply(strsplit(f, "_"), "[[", 3))))]
[1] "Ad_1049_25_79.txt" "Ad_10170_75_79.txt" "Ad_10345_76_79.txt" "Ad_10531_77_79.txt"
First we split by _, then select the third element of every list element, find the order and subset f based on that order.
I would create a small dataframe containing filenames and their respective extracted indices:
f<- c("Ad_10170_75_79.txt","Ad_10345_76_79.txt","Ad_1049_25_79.txt","Ad_10531_77_79.txt")
f2 <- strsplit (f,"_79.txt")
mydb <- as.data.frame(cbind(f,substr(f2,start=nchar(f2)-1,nchar(f2))))
names(mydb) <- c("filename","index")
library(plyr)
arrange(mydb,index)
Take the first column of this as your filename vector.
ADDENDUM:
If a numeric index is required, simply convert character to numeric:
mydb$index <- as.numeric(mydb$index)

Using column names with signs of a data frame in a qplot

I have a dataset and unfortunately some of the column labels in my dataframe contain signs (- or +). This doesn't seem to bother the dataframe, but when I try to plot this with qplot it throws me an error:
x <- 1:5
y <- x
names <- c("1+", "2-")
mydf <- data.frame(x, y)
colnames(mydf) <- names
mydf
qplot(1+, 2-, data = mydf)
and if I enclose the column names in quotes it will just give me a category (or something to that effect, it'll give me a plot of "1+" vs. "2-" with one point in the middle).
Is it possible to do this easily? I looked at aes_string but didn't quite understand it (at least not enough to get it to work).
Thanks in advance.
P.S. I have searched for a solution online but can't quite find anything that helps me with this (it could be due to some aspect I don't understand), so I reason it might be because this is a completely retarded naming scheme I have :p.
Since you have non-standard column names, you need to to use backticks (`)in your column references.
For example:
mydf$`1+`
[1] 1 2 3 4 5
So, your qplot() call should look like this:
qplot(`1+`, `2-`, data = mydf)
You can find more information in ?Quotes and ?names
As said in the other answer you have a problem because you you don't have standard names. When solution is to avoid backticks notation is to convert colnames to a standard form. Another motivation to convert names to regular ones is , you can't use backticks in a lattice plot for example. Using gsub you can do this:
gsub('(^[0-9]+)[+|-]+|[+|-]+','a\\1',c("1+", "2-","a--"))
[1] "a1" "a2" "aa"
Hence, applying this to your example :
colnames(mydf) <- gsub('(^[0-9]+)[+|-]+|[+|-]+','a\\1',colnames(mydf))
qplot(a1,a2,data = mydf)
EIDT
you can use make.names with option unique =T
make.names(c("10+", "20-", "10-", "a30++"),unique=T)
[1] "X10." "X20." "X10..1" "a30.."
If you don't like R naming rules, here a custom version with using gsubfn
library(gsubfn)
gsubfn("[+|-]|^[0-9]+",
function(x) switch(x,'+'= 'a','-' ='b',paste('x',x,sep='')),
c("10+", "20-", "10-", "a30++"))
"x10a" "x20b" "x10b" "a30aa" ## note x10b looks better than X10..1

R - Using grep and gsub to return more than one match in the same (character) vector element

Imagine we want to find all of the FOOs and subsequent numbers in the string below and return them as a vector (apologies for unreadability, I wanted to make the point there is no regular pattern before and after the FOOs):
xx <- "xasdrFOO1921ddjadFOO1234dakaFOO12345ndlslsFOO1643xasdf"
We can use this to find one of them (taken from 1)
gsub(".*(FOO[0-9]+).*", "\\1", xx)
[1] "FOO1643"
However, I want to return all of them, as a vector.
I've thought of a complicated way to do it using strplit() and gregexpr() - but I feel there is a better (and easier) way.
You may be interested in regmatches:
> regmatches(xx, gregexpr("FOO[0-9]+", xx))[[1]]
[1] "FOO1921" "FOO1234" "FOO12345" "FOO1643"
xx <- "xasdrFOO1921ddjadFOO1234dakaFOO12345ndlslsFOO1643xasdf"
library(stringr)
str_extract_all(xx, "(FOO[0-9]+)")[[1]]
#[1] "FOO1921" "FOO1234" "FOO12345" "FOO1643"
this can take vectors of strings as well, and mathces will be in list elements.
Slightly shorter version.
library(gsubfn)
strapplyc(xx,"FOO[0-9]*")[[1]]

R gsub add leading line break

I need to add a leading lines break "\n" to a list of axis label names in R. I cannot work out how to do this with gsub. For example, I need "Q1\n/\n15" to read "\nQ1\n/\n15". Neither google nor the help commands are leading me to the answer. Any advice?
Thanks in advance.
So there are about 4 answers in the comments (as of this writing), so I'll just summarize them in a proper answer.
examp <- "Q1\n/\n15"
paste("\n", examp, sep="")
gsub("^(.)","\n\\1",examp)
sprintf("\n%s", examp)
gsub("^", "\n", examp)
all of which give
[1] "\nQ1\n/\n15"
And all of which are properly vectorized (that is, if examp <- c("Q1\n/\n15", "Q1\n/\n16"), all return [1] "\nQ1\n/\n15" "\nQ1\n/\n16".

Resources