I want to generate readable number sequences (e.g. 1, 2, 3, 4 = 1-4), but for a set of data where each number in the sequence must have four digits (e.g. 99 = 0099 or 1 = 0001 or 1022 = 1022) AND where there are different letters in front of each number.
I was looking at the answer to this question, which managed to do almost exactly as I want with two caveats:
If there is a stand-alone number that does not appear in a sequence, it will appear twice with a hyphen in between
If there are several stand-alone numbers that do not appear in a sequence, they won't be included in the result
### Create Data Set ====
## Create the data for different tags. I'm only using two unique levels here, but in my dataset I've got
## 400+ unique levels.
FM <- paste0('FM', c('0001', '0016', '0017', '0018', '0019', '0021', '0024', '0026', '0028'))
SC <- paste0('SC', c('0002', '0003', '0004', '0010', '0012', '0014', '0033', '0036', '0039'))
## Combine data
my.seq1 <- c(FM, SC)
## Sort data by number in sequence
my.seq1 <- my.seq1[order(substr(my.seq1, 3, 7))]
### Attempt Number Sequencing ====
## Get the letters
sp.tags <- substr(my.seq1, 1, 2)
## Get the readable number sequence
my.seq2 <- lapply(split(my.seq1, sp.tags), ## Split data by the tag ID
function(x){
## Get the run lengths as per [previous answer][1]
rl <- rle(c(1, pmin(diff(as.numeric(substr(x, 3, 7))), 2)))
## Generate number sequence by separator as per [previous answer][1]
seq2 <- paste0(x[c(1, cumsum(rl$lengths))], c("-", ",")[rl$values], collapse="")
return(substr(seq2, 1, nchar(seq2)-1))
})
## Combine lists and sort elements
my.seq2 <- unlist(strsplit(do.call(c, my.seq2), ","))
my.seq2 <- my.seq2[order(substr(my.seq2, 3, 7))]
names(my.seq2) <- NULL
my.seq2
[1] "FM0001-FM0001" "SC0002-SC0004" "FM0016-FM0019" "FM0028" "SC0039"
my.seq1
[1] "FM0001" "SC0002" "SC0003" "SC0004" "SC0010" "SC0012" "SC0014" "FM0016" "FM0017" "FM0018" "FM0019" "FM0021"
[13] "FM0024" "FM0026" "FM0028" "SC0033" "SC0036" "SC0039"
The major problems with this are:
Some values are completely missing from the data set (e.g. FM0021, FM0024, FM0026)
The first number in the sequence (FM0001) appears with a hyphen in between
I feel like I'm getting warmer by using A5C1D2H2I1M1N2O1R2T1's answer to utilize seqToHumanReadable because it's quite elegant AND solves both problems. Two more problems are that I'm not able to tag the ID before each number and can't force the number of digits to four (e.g. 0004 becomes 4).
library(R.utils)
lapply(split(my.seq1, sp.tags), function(x){
return(unlist(strsplit(seqToHumanReadable(substr(x, 3, 7)), ',')))
})
$FM
[1] "1" " 16-19" " 21" " 24" " 26" " 28"
$SC
[1] "2-4" " 10" " 12" " 14" " 33" " 36" " 39"
Ideally the result would be:
"FM0001, SC0002-SC0004, SC0010, SC0012, SC0014, FM0016-FM0019, FM0021, FM0024, FM0026, FM0028, SC0033, SC0036, SC0039"
Any ideas? It's one of those things that's really simple to do by hand but would take blinking ages, and you'd think a function would exist for it but I haven't found it yet or it doesn't exist :(
This should do?
# get the prefix/tag and number
tag <- gsub("(^[A-Za-z]+)(.+)", "\\1", my.seq1)
num <- gsub("([A-Za-z]+)(\\d+$)", "\\2", my.seq1)
# get a sequence id
n <- length(tag)
do_match <- c(FALSE, diff(as.numeric(num)) == 1 & tag[-1] == tag[-n])
seq_id <- cumsum(!do_match) # a sequence id
# tapply to combine the result
res <- setNames(tapply(my.seq1, seq_id, function(x)
if(length(x) < 2)
return(x)
else
paste(x[1], x[length(x)], sep = "-")), NULL)
# show the result
res
#R> [1] "FM0001" "SC0002-SC0004" "SC0010" "SC0012" "SC0014" "FM0016-FM0019" "FM0021"
#R> [8] "FM0024" "FM0026" "FM0028" "SC0033" "SC0036" "SC0039"
# compare with
my.seq1
#R> [1] "FM0001" "SC0002" "SC0003" "SC0004" "SC0010" "SC0012" "SC0014" "FM0016" "FM0017" "FM0018" "FM0019" "FM0021" "FM0024"
#R> [14] "FM0026" "FM0028" "SC0033" "SC0036" "SC0039"
Data
FM <- paste0('FM', c('0001', '0016', '0017', '0018', '0019', '0021', '0024', '0026', '0028'))
SC <- paste0('SC', c('0002', '0003', '0004', '0010', '0012', '0014', '0033', '0036', '0039'))
my.seq1 <- c(FM, SC)
my.seq1 <- my.seq1[order(substr(my.seq1, 3, 7))]
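If the single comma-separated string from the question is wanted, the same run-detection idea can be wrapped in a function and the pieces collapsed at the end. This is only a sketch: collapse_runs is a name made up here, and it assumes every ID is letters followed by digits and that the input is already sorted the way my.seq1 is.

```r
# Hypothetical helper (not from the answer above): collapse consecutive
# IDs that share a tag into "first-last" and join everything with ", ".
collapse_runs <- function(ids) {
  tag <- sub("^([A-Za-z]+)\\d+$", "\\1", ids)
  num <- as.numeric(sub("^[A-Za-z]+(\\d+)$", "\\1", ids))
  n <- length(ids)
  # TRUE where an element continues the run started by its predecessor
  continues <- c(FALSE, diff(num) == 1 & tag[-1] == tag[-n])
  run_id <- cumsum(!continues)
  pieces <- tapply(ids, run_id, function(x) {
    if (length(x) < 2) x else paste(x[1], x[length(x)], sep = "-")
  })
  paste(unname(pieces), collapse = ", ")
}

ids <- c("FM0001", "SC0002", "SC0003", "SC0004", "SC0010",
         "FM0016", "FM0017", "FM0021")
collapse_runs(ids)
# [1] "FM0001, SC0002-SC0004, SC0010, FM0016-FM0017, FM0021"
```

Because the original strings are reused rather than rebuilt from the numbers, the tag and the four-digit padding are kept as they are.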
Related
My inputs are numeric and represent a year and a week number (YYYYWW). I need to create a sequence running from one input to the other.
Inputs example :
input.from <- 202144
input.to <- 202208
Desired output would be :
c(202144:202152, 202201:202208)
In my view, this is a little more complex than it looks because of these constraints :
Years with 53 weeks : I tried lubridate::isoweek(), the %W or %v format, ...
Always keep two digits for the week : I tried "%02d", ...
I also tried to convert my input to date, ...
Anyway, many attempts, but no success in writing my function.
Thanks for your help !
In case it is useful to someone one day, here is the function I finally wrote, which respects ISO 8601 :
library(ISOweek)
foo <- function(pdeb, pfin) {
from <- ISOweek::ISOweek2date(paste0(substr(pdeb, 1, 4), "-W", substr(pdeb, 5, 6), "-1"))
to <- ISOweek::ISOweek2date(paste0(substr(pfin, 1, 4), "-W", substr(pfin, 5, 6), "-1"))
res <- seq.Date(from, to, by = "week")
return(format(res, format = "%G%V"))
}
foo(201950, 202205)
Step #1 : transform the input to a character string of the form YYYY-"W"WW-1 (the Monday of that ISO week)
Step #2 : convert that string to a Date with ISOweek2date
Step #3 : build the sequence by week
Step #4 : format the sequence as "%G%V", which keeps it ISO 8601 compliant and in YYYYWW form
I'd go with
x <- c("202144", "202208")
out <- do.call(seq, c(as.list(as.Date(paste0(x, "1"), format="%Y%U%u")), by = "week"))
out
# [1] "2021-11-01" "2021-11-08" "2021-11-15" "2021-11-22" "2021-11-29" "2021-12-06" "2021-12-13" "2021-12-20" "2021-12-27"
# [10] "2022-01-03" "2022-01-10" "2022-01-17" "2022-01-24" "2022-01-31" "2022-02-07" "2022-02-14" "2022-02-21"
If you really want to keep them in the %Y%W format, then
format(out, format = "%Y%W")
# [1] "202144" "202145" "202146" "202147" "202148" "202149" "202150" "202151" "202152" "202201" "202202" "202203" "202204"
# [14] "202205" "202206" "202207" "202208"
(This answer heavily informed by Transform year/week to date object)
We could do some mathematics.
f <- function(from, to) {
r <- from:to
r[r %% 100 > 0 & r %% 100 < 53]
}
input.from <- 202144; input.to <- 202208
f(input.from, input.to)
# [1] 202144 202145 202146 202147 202148 202149 202150 202151 202152
# [10] 202201 202202 202203 202204 202205 202206 202207 202208
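Note that the filter above keeps weeks 01 to 52 only, so a week 53 would be dropped in years that have one. A possible variant, sketched here under the assumption that format() supports "%V" on your platform, lets week 53 through only for years whose ISO calendar contains it. The names has_week_53 and week_seq are made up for this example.

```r
# Hypothetical helpers, not part of the answer above.
has_week_53 <- function(year) {
  # 28 December always falls in the last ISO week of its year
  format(as.Date(paste0(year, "-12-28")), "%V") == "53"
}

week_seq <- function(from, to) {
  r <- from:to
  week <- r %% 100
  year <- r %/% 100
  # keep weeks 1..52, plus week 53 where the year actually has it
  r[week >= 1 & (week <= 52 | (week == 53 & has_week_53(year)))]
}

week_seq(202051, 202102)
# [1] 202051 202052 202053 202101 202102
```

For the inputs from the question (202144 to 202208) this returns the same 17 values as f(), since 2021 has only 52 ISO weeks.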
I have the following dataset containing dates:
> dates
[1] "20180412" "20180424" "20180506" "20180518" "20180530" "20180611" "20180623" "20180705" "20180717" "20180729"
I am trying to create a list where in each position, the name is 'Coherence_' + the first and second dates in dates. So in output1[1] I would have Coherence_20180412_20180424. Then in output1[2] I would have Coherence_20180506_20180518, etc.
I am starting with this code but it is not working the way I need:
output1<-list()
for (i in 1:5){
output1[[i]]<-paste("-Poutput1=", S1_Out_Path,"Coherence_VV_TC", dates[[i]],"_", dates[[i+1]], ".tif", sep="")
}
Do you have any suggestions?
Try this:
Without loop
even_indexes<-seq(2,10,2) # List of even indexes
odd_indexes<-seq(1,10,2) # List of odd indexes
print(paste('Coherence',paste(odd_indexes,even_indexes,sep = "_"),sep = "_"))
Link answer from here: Create list in R with specific iteration
Updated (To get data in List)
lst <- as.list(paste('Coherence',paste(odd_indexes,even_indexes,sep = "_"),sep = "_"))
OR
a=c(1:10)
for (i in seq(1, 9, 2)){
print(paste('Coherence',paste(a[i],a[i+1],sep = "_"),sep = "_"))
}
Output:
[1] "Coherence_1_2"
[1] "Coherence_3_4"
[1] "Coherence_5_6"
[1] "Coherence_7_8"
[1] "Coherence_9_10"
You can create these patterns using paste capability to operate on vectors:
dates <- c("20180412", "20180424", "20180506", "20180518", "20180530",
"20180611", "20180623", "20180705", "20180717", "20180729")
paste("Coherence", dates[-length(dates)], dates[-1], sep="_")
[1] "Coherence_20180412_20180424" "Coherence_20180424_20180506" "Coherence_20180506_20180518"
[4] "Coherence_20180518_20180530" "Coherence_20180530_20180611" "Coherence_20180611_20180623"
[7] "Coherence_20180623_20180705" "Coherence_20180705_20180717" "Coherence_20180717_20180729"
Or other simple patterns can be generated as:
paste("Coherence", dates[seq(1, length(dates), 2)], dates[seq(2, length(dates), 2)], sep="_")
[1] "Coherence_20180412_20180424" "Coherence_20180506_20180518" "Coherence_20180530_20180611"
[4] "Coherence_20180623_20180705" "Coherence_20180717_20180729"
You can use matrix(..., nrow=2):
dates <- c("20180412", "20180424", "20180506", "20180518", "20180530", "20180611", "20180623", "20180705", "20180717", "20180729")
paste0("Coherence_", apply(matrix(dates, 2), 2, FUN=paste0, collapse="_"))
# > paste0("Coherence_", apply(matrix(dates, 2), 2, FUN=paste0, collapse="_"))
# [1] "Coherence_20180412_20180424" "Coherence_20180506_20180518" "Coherence_20180530_20180611" "Coherence_20180623_20180705"
# [5] "Coherence_20180717_20180729"
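To tie this back to the loop in the question, the same non-overlapping pairing can build the full "-Poutput1=" strings and return them as a list. This is a sketch only: the value given to S1_Out_Path is a placeholder, since the real path is not shown in the question.

```r
dates <- c("20180412", "20180424", "20180506", "20180518", "20180530",
           "20180611", "20180623", "20180705", "20180717", "20180729")
S1_Out_Path <- "out/"  # placeholder, the real path is not given in the question

# positions of the first date in each non-overlapping pair: 1, 3, 5, ...
first <- seq(1, length(dates) - 1, by = 2)

output1 <- as.list(paste0("-Poutput1=", S1_Out_Path, "Coherence_VV_TC",
                          dates[first], "_", dates[first + 1], ".tif"))
output1[[1]]
# [1] "-Poutput1=out/Coherence_VV_TC20180412_20180424.tif"
```

paste0() is vectorised, so no loop is needed; as.list() only converts the resulting character vector into the list the question asks for.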
I've got a log file that looks as follows:
Data:
+datadir=/data/2017-11-22
+Nusers=5292
Parameters:
+outdir=/data/2017-11-22/out
+K=20
+IC=179
+ICgroups=3
-group 1: 1-1
ICeffects: 1-5
-group 2: 2-173
ICeffects: 6-10
-group 3: 175-179
ICeffects: 11-15
I would like to parse this logfile into a nested list using R so that the result will look like this:
result <- list(Data = list(datadir = '/data/2017-11-22',
Nusers = 5292),
Parameters = list(outdir = '/data/2017-11-22/out',
K = 20,
IC = 179,
ICgroups = list(list('group 1' = '1-1',
ICeffects = '1-5'),
list('group 2' = '2-173',
ICeffects = '6-10'),
list('group 3' = '175-179',
ICeffects = '11-15'))))
Is there a not-extremely-painful way of doing this?
Disclaimer: This is messy. There is no guarantee that this will work for larger/different files without some tweaking. You will need to do some careful checking.
The key idea here is to reformat the raw data, to make it consistent with the YAML format, and then use yaml::yaml.load to parse the data to produce a nested list.
By the way, this is an excellent example on why one really should use a common markup language for log-output/config files (like JSON, YAML, etc.)...
I assume you read in the log file using readLines to produce the vector of strings ss.
# Sample data
ss <- c(
"Data:",
" +datadir=/data/2017-11-22",
" +Nusers=5292",
"Parameters:",
" +outdir=/data/2017-11-22/out",
" +K=20",
" +IC=179",
" +ICgroups=3",
"    -group 1: 1-1",
"     ICeffects: 1-5",
"    -group 2: 2-173",
"     ICeffects: 6-10",
"    -group 3: 175-179",
"     ICeffects: 11-15")
We then reformat the data to adhere to the YAML format.
# Reformat to adhere to YAML formatting
ss <- gsub("\\+", "- ", ss); # Replace "+" with "- "
ss <- gsub("ICgroups=\\d+","ICgroups:", ss); # Replace "ICgroups=3" with "ICgroups:"
ss <- gsub("=", " : ", ss); # Replace "=" with " : "
ss <- gsub("-group", "- group", ss); # Replace "-group" with "- group"
ss <- gsub("ICeffects", " ICeffects", ss); # Replace "ICeffects" with " ICeffects"
Note that – consistent with your expected output – the value 3 from ICgroups doesn't get used, and we need to replace ICgroups=3 with ICgroups: to initiate a nested sub-list. This was the part that threw me off first...
Loading & parsing the YAML string then produces a nested list.
library(yaml)
lst <- yaml.load(paste(ss, collapse = "\n"))
lst
#$Data
#$Data[[1]]
#$Data[[1]]$datadir
#[1] "/data/2017-11-22"
#
#
#$Data[[2]]
#$Data[[2]]$Nusers
#[1] 5292
#
#
#
#$Parameters
#$Parameters[[1]]
#$Parameters[[1]]$outdir
#[1] "/data/2017-11-22/out"
#
#
#$Parameters[[2]]
#$Parameters[[2]]$K
#[1] 20
#
#
#$Parameters[[3]]
#$Parameters[[3]]$IC
#[1] 179
#
#
#$Parameters[[4]]
#$Parameters[[4]]$ICgroups
#$Parameters[[4]]$ICgroups[[1]]
#$Parameters[[4]]$ICgroups[[1]]$`group 1`
#[1] "1-1"
#
#$Parameters[[4]]$ICgroups[[1]]$ICeffects
#[1] "1-5"
#
#
#$Parameters[[4]]$ICgroups[[2]]
#$Parameters[[4]]$ICgroups[[2]]$`group 2`
#[1] "2-173"
#
#$Parameters[[4]]$ICgroups[[2]]$ICeffects
#[1] "6-10"
#
#
#$Parameters[[4]]$ICgroups[[3]]
#$Parameters[[4]]$ICgroups[[3]]$`group 3`
#[1] "175-179"
#
#$Parameters[[4]]$ICgroups[[3]]$ICeffects
#[1] "11-15"
PS. You will need to test this on larger files, and make changes to the substitution if necessary.
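The parsed result wraps every key in its own one-element list (Data[[1]]$datadir, Data[[2]]$Nusers, ...), whereas the question asks for Data$datadir and so on. One way to get closer to that shape, sketched here on a hand-built list that mimics the printed structure so the example does not depend on the yaml package, is to concatenate the one-element lists within each section. The object names are made up for the example.

```r
# Hand-built stand-in for the structure printed above (shortened)
lst <- list(
  Data = list(list(datadir = "/data/2017-11-22"), list(Nusers = 5292L)),
  Parameters = list(
    list(outdir = "/data/2017-11-22/out"),
    list(K = 20L),
    list(ICgroups = list(
      list(`group 1` = "1-1", ICeffects = "1-5"),
      list(`group 2` = "2-173", ICeffects = "6-10")
    ))
  )
)

# c() on lists concatenates at the top level only, so each section
# becomes one named list while the nested ICgroups list is left alone
flat <- lapply(lst, function(section) do.call(c, section))

flat$Data$Nusers
# [1] 5292
names(flat$Parameters)
# [1] "outdir"   "K"        "ICgroups"
```

This assumes each section is a list of single-key lists, as in the output above; other layouts would need their own handling.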
I was looking for a way to format large numbers in R as 2.3K or 5.6M. I found this solution on SO. Turns out, it shows some strange behaviour for some input vectors.
Here is what I am trying to understand -
# Test vector with weird behaviour
x <- c(302.456500093388, 32553.3619756151, 3323.71232001074, 12065.4076372462,
0, 6270.87962956305, 383.337515655172, 402.20778095643, 19466.0204345063,
1779.05474064539, 1467.09928489114, 3786.27112222457, 2080.08078309959,
51114.7097545816, 51188.7710104291, 59713.9414049798)
# Formatting function for large numbers
comprss <- function(tx) {
div <- findInterval(as.numeric(gsub("\\,", "", tx)),
c(1, 1e3, 1e6, 1e9, 1e12) )
paste(round( as.numeric(gsub("\\,","",tx))/10^(3*(div-1)), 1),
c('','K','M','B','T')[div], sep = '')
}
# Compare outputs for the following three commands
x
comprss(x)
sapply(x, comprss)
We can see that comprss(x) produces 0K as the 5th element, which is weird, but comprss(x[5]) gives the expected result. The 6th element is even weirder.
As far as I know, all the functions used in the body of comprss are vectorised. Then why do I still need to sapply my way out of this?
Here's a vectorized version adapted from pryr:::print.bytes:
format_for_humans <- function(x, digits = 3){
grouping <- pmax(floor(log(abs(x), 1000)), 0)
paste0(signif(x / (1000 ^ grouping), digits = digits),
c('', 'K', 'M', 'B', 'T')[grouping + 1])
}
format_for_humans(10 ^ seq(0, 12, 2))
#> [1] "1" "100" "10K" "1M" "100M" "10B" "1T"
x <- c(302.456500093388, 32553.3619756151, 3323.71232001074, 12065.4076372462,
0, 6270.87962956305, 383.337515655172, 402.20778095643, 19466.0204345063,
1779.05474064539, 1467.09928489114, 3786.27112222457, 2080.08078309959,
51114.7097545816, 51188.7710104291, 59713.9414049798)
format_for_humans(x)
#> [1] "302" "32.6K" "3.32K" "12.1K" "0" "6.27K" "383" "402"
#> [9] "19.5K" "1.78K" "1.47K" "3.79K" "2.08K" "51.1K" "51.2K" "59.7K"
format_for_humans(x, digits = 1)
#> [1] "300" "30K" "3K" "10K" "0" "6K" "400" "400" "20K" "2K" "1K"
#> [12] "4K" "2K" "50K" "50K" "60K"
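As for why the original comprss misbehaves on vectors: findInterval() returns 0 for values below 1, and indexing with 0 silently drops that element, so the suffix vector ends up shorter than the number vector and paste() recycles it out of step. The sketch below shows the effect on a shortened input and one possible guard (clamping the index to at least 1); comprss_fixed is a name made up here, not part of the linked solution.

```r
suffix <- c("", "K", "M", "B", "T")
breaks <- c(1, 1e3, 1e6, 1e9, 1e12)

val <- c(302.4565, 0, 6270.8796)
div <- findInterval(val, breaks)
div
# [1] 1 0 2
suffix[div]   # the 0 index is dropped, so only two suffixes remain
# [1] ""  "K"

# Possible guard: never let the index fall below 1
comprss_fixed <- function(tx) {
  val <- as.numeric(gsub(",", "", tx))
  div <- pmax(findInterval(val, breaks), 1)
  paste0(round(val / 10^(3 * (div - 1)), 1), suffix[div])
}

comprss_fixed(val)
# [1] "302.5" "0"     "6.3K"
```

With a single value such as comprss(x[5]) the suffix is simply empty, which is why sapply() appeared to fix the problem.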
I'm trying to get a list where each element has a name, by applying a function to each row of a data frame, but can't get the right output.
Assuming this is the function that I want to apply to each row:
format_setup_name <- function(m, v, s) {
a <- list()
a[[paste(m, "machines and", v, s, "GB volumes")]] <- paste(m, v, s, sep="-")
a
}
If this is the input data frame:
df <- data.frame(m=c(1,2,3), v=c(3,3,3), s=c(15,20,30))
I can't get a list that looks like:
$`1-3-15`
[1] "1 machines and 3 15 GB volumes"
$`2-3-20`
[1] "2 machines and 3 20 GB volumes"
$`3-3-30`
[1] "3 machines and 3 30 GB volumes"
Can someone give me hints how to do it?
Why do I need this? Well, I want to populate selectizeInput in shiny using values coming from the database. Since I'm combining several columns, I need a way to match the selected input with the values.
This is a good use case for setNames, which sets the names() attribute on an object and returns it. Also, if you use as.list, you can do this in just one line without any looping:
setNames(as.list(paste(df$m, ifelse(df$m == 1, "machine", "machines"), "and", df$v, df$s, "GB volumes")), paste(df$m,df$v,df$s,sep="-"))
# $`1-3-15`
# [1] "1 machine and 3 15 GB volumes"
#
# $`2-3-20`
# [1] "2 machines and 3 20 GB volumes"
#
# $`3-3-30`
# [1] "3 machines and 3 30 GB volumes"
Thomas has already found a pretty neat solution to your problem (and in one line, too!). But I'll just show you how you could have succeeded with the approach you first tried:
# We'll use the same data, this time called "dat" (I avoid calling
# objects `df` because `df` is also a function's name)
dat <- data.frame(m = c(1,2,3), v = c(3,3,3), s = c(15,20,30))
format_setup_name <- function(m, v, s) {
a <- list() # initialize the list, all is well up to here
# But here we'll need a loop to assign in turn each element to the list
for(i in seq_along(m)) {
a[[paste(m[i], v[i], s[i], sep="-")]] <-
paste(m[i], "machines and", v[i], s[i], "GB volumes")
}
return(a)
}
Note that what goes inside the brackets is the name of the element, while what's at the right side of the <- is the content to be assigned, not the inverse as your code was suggesting.
So let's try it:
my.setup <- format_setup_name(dat$m, dat$v, dat$s)
my.setup
# $`1-3-15`
# [1] "1 machines and 3 15 GB volumes"
#
# $`2-3-20`
# [1] "2 machines and 3 20 GB volumes"
#
# $`3-3-30`
# [1] "3 machines and 3 30 GB volumes"
Everything seems nice. Just one thing to note: with the $ operator, you'll need to use quotes (single or double) or backticks to access individual items by their names:
my.setup$"1-3-15" # my.setup$1-3-15 won't work
# [1] "1 machines and 3 15 GB volumes"
my.setup[['1-3-15']] # equivalent
# [1] "1 machines and 3 15 GB volumes"
Edit: lapply version
Since loops have really fallen out of favor, here's a version with lapply:
format_setup_name <- function(m, v, s) {
a <- lapply(seq_along(m), function(i) paste(m[i], "machines and", v[i], s[i], "GB volumes"))
names(a) <- paste(m, v, s, sep="-")
return(a)
}
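For the stated goal of matching a selected entry back to its values, the same "m-v-s" key can be used as a lookup into the data frame. A minimal sketch, assuming the widget hands back one of the keys as a string (the variable selected below is a stand-in for that value):

```r
dat <- data.frame(m = c(1, 2, 3), v = c(3, 3, 3), s = c(15, 20, 30))

# same key as the list names built above
keys <- paste(dat$m, dat$v, dat$s, sep = "-")

selected <- "2-3-20"             # stand-in for the value returned by the input
row <- dat[match(selected, keys), ]
row$s
# [1] 20
```

match() returns NA for a key that is not present, so the result should be checked before it is used.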