Something maybe obvious but I can't seem to see it :
I have a vector like this :
vec<-c("i: 1","n: alpha","a: term1","a: term2", "i: 2","n: beta","a: term3","i: 3","n: gamma","a: term4","a: term5","a: term6")
and I need to get this :
out<-c("i: 1","n: alpha","a: term1;term2", "i: 2","n: beta","a: term3","i: 3","n: gamma","a: term4;term5;term6")
That is, for each unique i:, fuse the a: when there are more than one.
I tried with diff and rle but the resulted code (see below) is too long and I think I'm complicating uselessly the problem...
my code :
out<-vec
a<-which(grepl("^a: ",vec))
diffa<-diff(a)
diffa1<-which(diffa==1)
rle_a<-rle(diffa)$lengths[rle(diffa)$values==1]
indwh<-1
for(ind in 1:length(rle_a)){
allindwh<-indwh:(indwh+rle_a[ind]-1)
out[a[c(diffa1[allindwh],diffa1[allindwh[length(allindwh)]]+1)]]<-paste(out[a[diffa1[allindwh[1]]]],paste(gsub("a: ","",out[a[c(diffa1[allindwh[-1]],diffa1[allindwh[length(allindwh)]]+1)]]),collapse=";"),sep=";")
indwh<-indwh+rle_a[ind]
}
out<-unique(out)
So I get what I want but I would really appreciate any hint to simplify it.
Here's an easier approach with tapply:
# index of 'a's
idx <- grepl("^a", vec)
# find groups
grp <- c(0, cumsum(diff(idx) < 0))
# apply function to vector based on groups
unlist(tapply(vec, grp, FUN = function(x)
c(x[1:2], paste("a:", paste(sub("^a:\\s*", "", x[-(1:2)]), collapse = ";")))),
use.names = FALSE)
# [1] "i: 1" "n: alpha" "a: term1;term2"
# [4] "i: 2" "n: beta" "a: term3"
# [7] "i: 3" "n: gamma" "a: term4;term5;term6"
Related
I want to generate readable number sequences (e.g. 1, 2, 3, 4 = 1-4), but for a set of data where each number in the sequence must have four digits (e.g. 99 = 0099 or 1 = 0001 or 1022 = 1022) AND where there are different letters in front of each number.
I was looking at the answer to this question, which managed to do almost exactly as I want with two caveats:
If there is a stand-alone number that does not appear in a sequence, it will appear twice with a hyphen in between
If there are several stand-alone numbers that do no appear in a sequence, they won't be included in the result
### Create Data Set ====
## Create the data for different tags. I'm only using two unique levels here, but in my dataset I've got
## 400+ unique levels.
FM <- paste0('FM', c('0001', '0016', '0017', '0018', '0019', '0021', '0024', '0026', '0028'))
SC <- paste0('SC', c('0002', '0003', '0004', '0010', '0012', '0014', '0033', '0036', '0039'))
## Combine data
my.seq1 <- c(FM, SC)
## Sort data by number in sequence
my.seq1 <- my.seq1[order(substr(my.seq1, 3, 7))]
### Attempt Number Sequencing ====
## Get the letters
sp.tags <- substr(my.seq1, 1, 2)
## Get the readable number sequence
lapply(split(my.seq1, sp.tags), ## Split data by the tag ID
function(x){
## Get the run lengths as per [previous answer][1]
rl <- rle(c(1, pmin(diff(as.numeric(substr(x, 3, 7))), 2)))
## Generate number sequence by separator as per [previous answer][1]
seq2 <- paste0(x[c(1, cumsum(rl$lengths))], c("-", ",")[rl$values], collapse="")
return(substr(seq2, 1, nchar(seq2)-1))
})
## Combine lists and sort elements
my.seq2 <- unlist(strsplit(do.call(c, my.seq2), ","))
my.seq2 <- my.seq2[order(substr(my.seq2, 3, 7))]
names(my.seq2) <- NULL
my.seq2
[1] "FM0001-FM0001" "SC0002-SC0004" "FM0016-FM0019" "FM0028" "SC0039"
my.seq1
[1] "FM0001" "SC0002" "SC0003" "SC0004" "SC0010" "SC0012" "SC0014" "FM0016" "FM0017" "FM0018" "FM0019" "FM0021"
[13] "FM0024" "FM0026" "FM0028" "SC0033" "SC0036" "SC0039"
The major problems with this are:
Some values are completely missing from the data set (e.g. FM0021, FM0024, FM0026)
The first number in the sequence (FM0001) appears with a hyphen in between
I feel like I'm getting warmer by using A5C1D2H2I1M1N2O1R2T1's answer to utilize seqToHumanReadable because it's quite elegant AND solves both problems. Two more problems are that I'm not able to tag the ID before each number and can't force the number of digits to four (e.g. 0004 becomes 4).
library(R.utils)
lapply(split(my.seq1, sp.tags), function(x){
return(unlist(strsplit(seqToHumanReadable(substr(x, 3, 7)), ',')))
})
$FM
[1] "1" " 16-19" " 21" " 24" " 26" " 28"
$SC
[1] "2-4" " 10" " 12" " 14" " 33" " 36" " 39"
Ideally the result would be:
"FM0001, SC002-SC004, SC0012, SC0014, FM0017-FM0019, FM0021, FM0024, FM0026, FM0028, SC0033, SC0036, SC0039"
Any ideas? It's one of those things that's really simple to do by hand but would take blinking ages, and you'd think a function would exist for it but I haven't found it yet or it doesn't exist :(
This should do?
# get the prefix/tag and number
tag <- gsub("(^[A-z]+)(.+)", "\\1", my.seq1)
num <- gsub("([A-z]+)(\\d+$)", "\\2", my.seq1)
# get a sequence id
n <- length(tag)
do_match <- c(FALSE, diff(as.numeric(num)) == 1 & tag[-1] == tag[-n])
seq_id <- cumsum(!do_match) # a sequence id
# tapply to combine the result
res <- setNames(tapply(my.seq1, seq_id, function(x)
if(length(x) < 2)
return(x)
else
paste(x[1], x[length(x)], sep = "-")), NULL)
# show the result
res
#R> [1] "FM0001" "SC0002-SC0004" "SC0010" "SC0012" "SC0014" "FM0016-FM0019" "FM0021"
#R> [8] "FM0024" "FM0026" "FM0028" "SC0033" "SC0036" "SC0039"
# compare with
my.seq1
#R> [1] "FM0001" "SC0002" "SC0003" "SC0004" "SC0010" "SC0012" "SC0014" "FM0016" "FM0017" "FM0018" "FM0019" "FM0021" "FM0024"
#R> [14] "FM0026" "FM0028" "SC0033" "SC0036" "SC0039"
Data
FM <- paste0('FM', c('0001', '0016', '0017', '0018', '0019', '0021', '0024', '0026', '0028'))
SC <- paste0('SC', c('0002', '0003', '0004', '0010', '0012', '0014', '0033', '0036', '0039'))
my.seq1 <- c(FM, SC)
my.seq1 <- my.seq1[order(substr(my.seq1, 3, 7))]
I've got a log file that looks as follows:
Data:
+datadir=/data/2017-11-22
+Nusers=5292
Parameters:
+outdir=/data/2017-11-22/out
+K=20
+IC=179
+ICgroups=3
-group 1: 1-1
ICeffects: 1-5
-group 2: 2-173
ICeffects: 6-10
-group 3: 175-179
ICeffects: 11-15
I would like to parse this logfile into a nested list using R so that the result will look like this:
result <- list(Data = list(datadir = '/data/2017-11-22',
Nusers = 5292),
Parameters = list(outdir = '/data/2017-11-22/out',
K = 20,
IC = 179,
ICgroups = list(list('group 1' = '1-1',
ICeffects = '1-5'),
list('group 2' = '2-173',
ICeffects = '6-10'),
list('group 1' = '175-179',
ICeffects = '11-15'))))
Is there a not-extremely-painful way of doing this?
Disclaimer: This is messy. There is no guarantee that this will work for larger/different files without some tweaking. You will need to do some careful checking.
The key idea here is to reformat the raw data, to make it consistent with the YAML format, and then use yaml::yaml.load to parse the data to produce a nested list.
By the way, this is an excellent example on why one really should use a common markup language for log-output/config files (like JSON, YAML, etc.)...
I assume you read in the log file using readLines to produce the vector of strings ss.
# Sample data
ss <- c(
"Data:",
" +datadir=/data/2017-11-22",
" +Nusers=5292",
"Parameters:",
" +outdir=/data/2017-11-22/out",
" +K=20",
" +IC=179",
" +ICgroups=3",
" -group 1: 1-1",
" ICeffects: 1-5",
" -group 2: 2-173",
" ICeffects: 6-10",
" -group 3: 175-179",
" ICeffects: 11-15")
We then reformat the data to adhere to the YAML format.
# Reformat to adhere to YAML formatting
ss <- gsub("\\+", "- ", ss); # Replace "+" with "- "
ss <- gsub("ICgroups=\\d+","ICgroups:", ss); # Replace "ICgroups=3" with "ICgroups:"
ss <- gsub("=", " : ", ss); # Replace "=" with ": "
ss <- gsub("-group", "- group", ss); # Replace "-group" with "- group"
ss <- gsub("ICeffects", " ICeffects", ss); # Replace "ICeffects" with " ICeffects"
Note that – consistent with your expected output – the value 3 from ICgroups doesn't get used, and we need to replace ICgroups=3 with ICgroups: to initiate a nested sub-list. This was the part that threw me off first...
Loading & parsing the YAML string then produces a nested list.
require(yaml);
lst <- yaml.load(paste(ss, collapse = "\n"));
lst;
#$Data
#$Data[[1]]
#$Data[[1]]$datadir
#[1] "/data/2017-11-22"
#
#
#$Data[[2]]
#$Data[[2]]$Nusers
#[1] 5292
#
#
#
#$Parameters
#$Parameters[[1]]
#$Parameters[[1]]$outdir
#[1] "/data/2017-11-22/out"
#
#
#$Parameters[[2]]
#$Parameters[[2]]$K
#[1] 20
#
#
#$Parameters[[3]]
#$Parameters[[3]]$IC
#[1] 179
#
#
#$Parameters[[4]]
#$Parameters[[4]]$ICgroups
#$Parameters[[4]]$ICgroups[[1]]
#$Parameters[[4]]$ICgroups[[1]]$`group 1`
#[1] "1-1"
#
#$Parameters[[4]]$ICgroups[[1]]$ICeffects
#[1] "1-5"
#
#
#$Parameters[[4]]$ICgroups[[2]]
#$Parameters[[4]]$ICgroups[[2]]$`group 2`
#[1] "2-173"
#
#$Parameters[[4]]$ICgroups[[2]]$ICeffects
#[1] "6-10"
#
#
#$Parameters[[4]]$ICgroups[[3]]
#$Parameters[[4]]$ICgroups[[3]]$`group 3`
#[1] "175-179"
#
#$Parameters[[4]]$ICgroups[[3]]$ICeffects
#[1] "11-15"
PS. You will need to test this on larger files, and make changes to the substitution if necessary.
I was looking for a way to format large numbers in R as 2.3K or 5.6M. I found this solution on SO. Turns out, it shows some strange behaviour for some input vectors.
Here is what I am trying to understand -
# Test vector with weird behaviour
x <- c(302.456500093388, 32553.3619756151, 3323.71232001074, 12065.4076372462,
0, 6270.87962956305, 383.337515655172, 402.20778095643, 19466.0204345063,
1779.05474064539, 1467.09928489114, 3786.27112222457, 2080.08078309959,
51114.7097545816, 51188.7710104291, 59713.9414049798)
# Formatting function for large numbers
comprss <- function(tx) {
div <- findInterval(as.numeric(gsub("\\,", "", tx)),
c(1, 1e3, 1e6, 1e9, 1e12) )
paste(round( as.numeric(gsub("\\,","",tx))/10^(3*(div-1)), 1),
c('','K','M','B','T')[div], sep = '')
}
# Compare outputs for the following three commands
x
comprss(x)
sapply(x, comprss)
We can see that comprss(x) produces 0k as the 5th element which is weird, but comprss(x[5]) gives us the expected results. The 6th element is even weirder.
As far as I know, all the functions used in the body of comprss are vectorised. Then why do I still need to sapply my way out of this?
Here's a vectorized version adapted from pryr:::print.bytes:
format_for_humans <- function(x, digits = 3){
grouping <- pmax(floor(log(abs(x), 1000)), 0)
paste0(signif(x / (1000 ^ grouping), digits = digits),
c('', 'K', 'M', 'B', 'T')[grouping + 1])
}
format_for_humans(10 ^ seq(0, 12, 2))
#> [1] "1" "100" "10K" "1M" "100M" "10B" "1T"
x <- c(302.456500093388, 32553.3619756151, 3323.71232001074, 12065.4076372462,
0, 6270.87962956305, 383.337515655172, 402.20778095643, 19466.0204345063,
1779.05474064539, 1467.09928489114, 3786.27112222457, 2080.08078309959,
51114.7097545816, 51188.7710104291, 59713.9414049798)
format_for_humans(x)
#> [1] "302" "32.6K" "3.32K" "12.1K" "0" "6.27K" "383" "402"
#> [9] "19.5K" "1.78K" "1.47K" "3.79K" "2.08K" "51.1K" "51.2K" "59.7K"
format_for_humans(x, digits = 1)
#> [1] "300" "30K" "3K" "10K" "0" "6K" "400" "400" "20K" "2K" "1K"
#> [12] "4K" "2K" "50K" "50K" "60K"
I'm trying to get a list where each element has a name, by applying a function to each row of a data frame, but can't get the right output.
Assuming this is the function that I want to apply to each row:
format_setup_name <- function(m, v, s) {
a <- list()
a[[paste(m, "machines and", v, s, "GB volumes")]] <- paste(num_machines,num_volumes,vol_size,sep="-")
a
}
If this is the input data frame:
df <- data.frame(m=c(1,2,3), v=c(3,3,3), s=c(15,20,30))
I can't get a list that looks like:
$`1-3-15`
[1] "1 machines and 3 15 GB volumes"
$`2-3-20`
[1] "2 machines and 3 20 GB volumes"
$`3-3-30`
[1] "3 machines and 3 30 GB volumes"
Can someone give me hints how to do it?
Why do I need this? Well, I want to populate selectizeInput in shiny using values coming from the database. Since I'm combining several columns, I need a way to match the selected input with the values.
This is a good use case for setNames which can add the names() attribute to an object, in place. Also, if you use as.list, you can do this in just one line without any looping:
setNames(as.list(paste(df$m, ifelse(df$m == 1, "machine", "machines"), "and", df$v, df$s, "GB volumes")), paste(df$m,df$v,df$s,sep="-"))
# $`1-3-15`
# [1] "1 machine and 3 15 GB volumes"
#
# $`2-3-20`
# [1] "2 machines and 3 20 GB volumes"
#
# $`3-3-30`
# [1] "3 machines and 3 30 GB volumes"
Thomas has already found a pretty neat solution to your problem (and in one line, too!). But I'll just show you how you could have succeeded with the approach you first tried:
# We'll use the same data, this time called "dat" (I avoid calling
# objects `df` because `df` is also a function's name)
dat <- data.frame(m = c(1,2,3), v = c(3,3,3), s = c(15,20,30))
format_setup_name <- function(m, v, s) {
a <- list() # initialize the list, all is well up to here
# But here we'll need a loop to assign in turn each element to the list
for(i in seq_along(m)) {
a[[paste(m[i], v[i], s[i], sep="-")]] <-
paste(m[i], "machines and", v[i], s[i], "GB volumes")
}
return(a)
}
Note that what goes inside the brackets is the name of the element, while what's at the right side of the <- is the content to be assigned, not the inverse as your code was suggesting.
So let's try it:
my.setup <- format_setup_name(dat$m, dat$v, dat$s)
my.setup
# $`1-3-15`
# [1] "1 machines and 3 15 GB volumes"
#
# $`2-3-20`
# [1] "2 machines and 3 20 GB volumes"
#
# $`3-3-30`
# [1] "3 machines and 3 30 GB volumes"
Everything seems nice. Just one thing to note: with the $ operator, you'll need to use single or double quotes to access individual items by their names:
my.setup$"1-3-15" # my.setup$1-3-15 won't work
# [1] "1 machines and 3 15 GB volumes"
my.setup[['1-3-15']] # equivalent
# [1] "1 machines and 3 15 GB volumes"
Edit: lapply version
Since loops have really fallen out of favor, here's a version with lapply:
format_setup_name <- function(m, v, s) {
a <- lapply(seq_along(m), function(i) paste(m[i], "machines and", v[i], s[i], "GB volumes"))
names(a) <- paste(m, v, s, sep="-")
return(a)
}
I am reposting my problem here, after I noticed that was the approach advised by knitr's author to get more help.
I am a bit puzzle with a .Rmd file that I can proceed line by line in an interactive R session, and also with R CMD BATCH, but that fails when using knit("test.Rmd"). I am not sure where the problem lies, and I tried to narrow the problem down as much as I could. Here is the example (in test.Rmd):
```{r Rinit, include = FALSE, cache = FALSE}
opts_knit$set(stop_on_error = 2L)
library(adehabitatLT)
```
The functions to be used later:
```{r functions}
ld <- function(ltraj) {
if (!inherits(ltraj, "ltraj"))
stop("ltraj should be of class ltraj")
inf <- infolocs(ltraj)
df <- data.frame(
x = unlist(lapply(ltraj, function(x) x$x)),
y = unlist(lapply(ltraj, function(x) x$y)),
date = unlist(lapply(ltraj, function(x) x$date)),
dx = unlist(lapply(ltraj, function(x) x$dx)),
dy = unlist(lapply(ltraj, function(x) x$dy)),
dist = unlist(lapply(ltraj, function(x) x$dist)),
dt = unlist(lapply(ltraj, function(x) x$dt)),
R2n = unlist(lapply(ltraj, function(x) x$R2n)),
abs.angle = unlist(lapply(ltraj, function(x) x$abs.angle)),
rel.angle = unlist(lapply(ltraj, function(x) x$rel.angle)),
id = rep(id(ltraj), sapply(ltraj, nrow)),
burst = rep(burst(ltraj), sapply(ltraj, nrow)))
class(df$date) <- c("POSIXct", "POSIXt")
attr(df$date, "tzone") <- attr(ltraj[[1]]$date, "tzone")
if (!is.null(inf)) {
nc <- ncol(inf[[1]])
infdf <- as.data.frame(matrix(nrow = nrow(df), ncol = nc))
names(infdf) <- names(inf[[1]])
for (i in 1:nc) infdf[[i]] <- unlist(lapply(inf, function(x) x[[i]]))
df <- cbind(df, infdf)
}
return(df)
}
ltraj2sldf <- function(ltr, proj4string = CRS(as.character(NA))) {
if (!inherits(ltr, "ltraj"))
stop("ltr should be of class ltraj")
df <- ld(ltr)
df <- subset(df, !is.na(dist))
coords <- data.frame(df[, c("x", "y", "dx", "dy")], id = as.numeric(row.names(df)))
res <- apply(coords, 1, function(dfi) Lines(Line(matrix(c(dfi["x"],
dfi["y"], dfi["x"] + dfi["dx"], dfi["y"] + dfi["dy"]),
ncol = 2, byrow = TRUE)), ID = format(dfi["id"], scientific = FALSE)))
res <- SpatialLinesDataFrame(SpatialLines(res, proj4string = proj4string),
data = df)
return(res)
}
```
I load the object and apply the `ltraj2sldf` function:
```{r fail}
load("tr.RData")
juvStp <- ltraj2sldf(trajjuv, proj4string = CRS("+init=epsg:32617"))
dim(juvStp)
```
Using knitr("test.Rmd") fails with:
label: fail
Quitting from lines 66-75 (test.Rmd)
Error in SpatialLinesDataFrame(SpatialLines(res, proj4string =
proj4string), (from <text>#32) :
row.names of data and Lines IDs do not match
Using the call directly in the R console after the error occurred works as expected...
The problem is related to the way format produces the ID (in the apply call of ltraj2sldf), just before ID 100,000: using an interactive call, R gives "99994", "99995", "99996", "99997", "99998", "99999", "100000"; using knitr R gives "99994", " 99995", " 99996", " 99997", " 99998", " 99999", "100000", with additional leading spaces.
Is there any reason for this behaviour to occur? Why should knitr behave differently than a direct call in R? I have to admit I'm having hard time with that one, since I cannot debug it (it works in an interactive session)!
Any hint will be much appreciated. I can provide the .RData if it helps (the file is 4.5 Mo), but I'm mostly interested in why such a difference happens. I tried without any success to come up with a self-reproducible example, sorry about that. Thanks in advance for any contribution!
After a comment of baptiste, here are some more details about IDs generation. Basically, the ID is generated at each line of the data frame by an apply call, which in turn uses format like this: format(dfi["id"], scientific = FALSE). Here, the column id is simply a series from 1 to the number of rows (1:nrow(df)). scientific = FALSE is just to ensure that I don't have results such as 1e+05 for 100000.
Based on an exploration of the IDs generation, the problem only occurred for those presented in the first message, i.e. 99995 to 99999, for which a leading space is added. This should not happen with this format call, since I did not ask for a specific number of digit in the output. For instance:
> format(99994:99999, scientific = FALSE)
[1] "99994" "99995" "99996" "99997" "99998" "99999"
However, if the IDs are generated in chunks, it might occur:
> format(99994:100000, scientific = FALSE)
[1] " 99994" " 99995" " 99996" " 99997" " 99998" " 99999" "100000"
Note that the same processed one at a time gives the expected result:
> for (i in 99994:100000) print(format(i, scientific = FALSE))
[1] "99994"
[1] "99995"
[1] "99996"
[1] "99997"
[1] "99998"
[1] "99999"
[1] "100000"
In the end, it's exactly like if the IDs were not prepared one at a time (as I would expect from an apply call by line), but in this case, 6 at a time, and only when close to 1e+05... And of course, only when using knitr, not interactive or batch R.
Here is my session information:
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)
locale:
[1] LC_CTYPE=fr_FR.UTF-8 LC_NUMERIC=C
[3] LC_TIME=fr_FR.UTF-8 LC_COLLATE=fr_FR.UTF-8
[5] LC_MONETARY=fr_FR.UTF-8 LC_MESSAGES=fr_FR.UTF-8
[7] LC_PAPER=C LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.2 adehabitatLT_0.3.12 CircStats_0.2-4
[4] boot_1.3-9 MASS_7.3-27 adehabitatMA_0.3.6
[7] ade4_1.5-2 sp_1.0-11 basr_0.5.3
loaded via a namespace (and not attached):
[1] digest_0.6.3 evaluate_0.4.4 formatR_0.8 fortunes_1.5-0
[5] grid_3.0.1 lattice_0.20-15 stringr_0.6.2 tools_3.0.1
Both Jeff and baptiste were were indeed right! This is an option problem, related to the digits argument. I managed to come up with a working minimal example (e.g. in test.Rmd):
Simple reproducible example : df1 is a data frame of 110,000 rows,
with 2 random normal variables + an `id` variable which is a series
from 1 to the number of row.
```{r example}
df1 <- data.frame(x = rnorm(110000), y = rnorm(110000), id = 1:110000)
```
From this, we create a `id2` variable using `format` and `scientific =
FALSE` to have results with all numbers instead of scientific
notations (e.g. 100,000 instead of 1e+05):
```{r example-continued}
df1$id2 <- apply(df1, 1, function(dfi) format(dfi["id"], scientific = FALSE))
df1$id2[99990:100010]
```
It works as expected using R interactively, resulting in:
[1] "99990" "99991" "99992" "99993" "99994" "99995" "99996"
[8] "99997" "99998" "99999" "100000" "100001" "100002" "100003"
[15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
However, the results are quite different using knit:
> library(knitr)
> knit("test.Rmd")
[...]
## [1] "99990" "99991" "99992" "99993" "99994" " 99995" " 99996"
## [8] " 99997" " 99998" " 99999" "100000" "100001" "100002" "100003"
## [15] "100004" "100005" "100006" "100007" "100008" "100009" "100010"
Note the additional leading spaces after 99994. The difference actually comes from the digits option, as rightly suggested by Jeff: R uses 7 by default, while knitr uses 4. This difference affects the output of format, although I don't really understand what's going on here. R-style:
> options(digits = 7)
> format(99999, scientific = FALSE)
[1] "99999"
knitr-style:
> options(digits = 4)
> format(99999, scientific = FALSE)
[1] " 99999"
But it should affect all numbers, not just after 99994 (well, to be honest, I don't even understand why it's adding leading spaces at all):
> options(digits = 4)
> format(c(1:10, 99990:100000), scientific = FALSE)
[1] " 1" " 2" " 3" " 4" " 5" " 6" " 7"
[8] " 8" " 9" " 10" " 99990" " 99991" " 99992" " 99993"
[15] " 99994" " 99995" " 99996" " 99997" " 99998" " 99999" "100000"
From this, I have no idea which is at fault: knitr, apply or format? At least, I came up with a workaround, using the argument trim = TRUE in format. It doesn't solve the cause of the problem, but did remove the leading space in the results...
I added a comment to your knitr GitHub issue with this information.
format() adds the extra whitespace when the digits option is not sufficient to display a value but scientific=FALSE is also specified. knitr sets digits to 4 inside code blocks, which causes the behavior you describe:
options(digits=4)
format(99999, scientific=FALSE)
Produces:
[1] " 99999"
While:
options(digits=5)
format(99999, scientific=FALSE)
Produces:
[1] "99999"
Thanks to Aleksey Vorona and Duncan Murdoch, this bug is now fixed in R-devel!
See: https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=15411