Understanding vectorisation

I was looking for a way to format large numbers in R as 2.3K or 5.6M. I found this solution on SO. Turns out, it shows some strange behaviour for some input vectors.
Here is what I am trying to understand -
# Test vector with weird behaviour
x <- c(302.456500093388, 32553.3619756151, 3323.71232001074, 12065.4076372462,
0, 6270.87962956305, 383.337515655172, 402.20778095643, 19466.0204345063,
1779.05474064539, 1467.09928489114, 3786.27112222457, 2080.08078309959,
51114.7097545816, 51188.7710104291, 59713.9414049798)
# Formatting function for large numbers
comprss <- function(tx) {
div <- findInterval(as.numeric(gsub("\\,", "", tx)),
c(1, 1e3, 1e6, 1e9, 1e12) )
paste(round( as.numeric(gsub("\\,","",tx))/10^(3*(div-1)), 1),
c('','K','M','B','T')[div], sep = '')
}
# Compare outputs for the following three commands
x
comprss(x)
sapply(x, comprss)
We can see that comprss(x) produces "0K" as the 5th element, which is weird, but comprss(x[5]) gives the expected result. The 6th element is even weirder.
As far as I know, all the functions used in the body of comprss are vectorised. Then why do I still need to sapply my way out of this?

Here's a vectorized version adapted from pryr:::print.bytes:
format_for_humans <- function(x, digits = 3){
grouping <- pmax(floor(log(abs(x), 1000)), 0)
paste0(signif(x / (1000 ^ grouping), digits = digits),
c('', 'K', 'M', 'B', 'T')[grouping + 1])
}
format_for_humans(10 ^ seq(0, 12, 2))
#> [1] "1" "100" "10K" "1M" "100M" "10B" "1T"
x <- c(302.456500093388, 32553.3619756151, 3323.71232001074, 12065.4076372462,
0, 6270.87962956305, 383.337515655172, 402.20778095643, 19466.0204345063,
1779.05474064539, 1467.09928489114, 3786.27112222457, 2080.08078309959,
51114.7097545816, 51188.7710104291, 59713.9414049798)
format_for_humans(x)
#> [1] "302" "32.6K" "3.32K" "12.1K" "0" "6.27K" "383" "402"
#> [9] "19.5K" "1.78K" "1.47K" "3.79K" "2.08K" "51.1K" "51.2K" "59.7K"
format_for_humans(x, digits = 1)
#> [1] "300" "30K" "3K" "10K" "0" "6K" "400" "400" "20K" "2K" "1K"
#> [12] "4K" "2K" "50K" "50K" "60K"
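As far as I can tell, the reason comprss(x) misbehaves is not vectorisation but a zero index. findInterval() returns 0 for values below the first break (here 0 < 1), and indexing c('', 'K', 'M', 'B', 'T') with 0 selects nothing, so the suffix vector comes out one element shorter than the input. paste() then silently recycles it, shifting every later suffix by one place. comprss(x[5]) works because paste() treats the empty suffix as "". The sketch below reproduces this on three values; comprss_fixed() is a hypothetical name for a variant that clamps the interval to at least 1 with pmax().

```r
# Reproduce the misalignment from comprss() on three values
suffixes <- c('', 'K', 'M', 'B', 'T')
div <- findInterval(c(302, 0, 6270), c(1, 1e3, 1e6, 1e9, 1e12))
div             # 1 0 2: 0 lies below the first break, so its interval is 0
suffixes[div]   # "" "K": index 0 selects nothing, so one suffix is lost

# paste() recycles the shorter suffix vector, so later suffixes shift
paste(c(302, 0, 6.3), suffixes[div], sep = '')   # "302" "0K" "6.3"

# Hypothetical fix: clamp the interval to at least 1 so that 0 (and other
# values below 1) get the '' suffix and both vectors stay the same length
comprss_fixed <- function(tx) {
  num <- as.numeric(gsub(",", "", tx))
  div <- pmax(findInterval(num, c(1, 1e3, 1e6, 1e9, 1e12)), 1)
  paste0(round(num / 10^(3 * (div - 1)), 1), c('', 'K', 'M', 'B', 'T')[div])
}
comprss_fixed(c(302.4565, 0, 6270.8796))   # "302.5" "0" "6.3K"
```

Clamping keeps the suffix vector aligned with the input, which is the same idea format_for_humans() relies on with pmax(..., 0).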

Related

Sequence of numbers by hyphen without hyphenating single occurrences

I want to generate readable number sequences (e.g. 1, 2, 3, 4 = 1-4), but for a set of data where each number in the sequence must have four digits (e.g. 99 = 0099 or 1 = 0001 or 1022 = 1022) AND where there are different letters in front of each number.
I was looking at the answer to this question, which does almost exactly what I want, with two caveats:
If there is a stand-alone number that does not appear in a sequence, it will appear twice with a hyphen in between
If there are several stand-alone numbers that do not appear in a sequence, they won't be included in the result
### Create Data Set ====
## Create the data for different tags. I'm only using two unique levels here, but in my dataset I've got
## 400+ unique levels.
FM <- paste0('FM', c('0001', '0016', '0017', '0018', '0019', '0021', '0024', '0026', '0028'))
SC <- paste0('SC', c('0002', '0003', '0004', '0010', '0012', '0014', '0033', '0036', '0039'))
## Combine data
my.seq1 <- c(FM, SC)
## Sort data by number in sequence
my.seq1 <- my.seq1[order(substr(my.seq1, 3, 7))]
### Attempt Number Sequencing ====
## Get the letters
sp.tags <- substr(my.seq1, 1, 2)
## Get the readable number sequence
my.seq2 <- lapply(split(my.seq1, sp.tags), ## Split data by the tag ID
function(x){
## Get the run lengths as per the previous answer
rl <- rle(c(1, pmin(diff(as.numeric(substr(x, 3, 7))), 2)))
## Generate number sequence by separator as per the previous answer
seq2 <- paste0(x[c(1, cumsum(rl$lengths))], c("-", ",")[rl$values], collapse="")
return(substr(seq2, 1, nchar(seq2)-1))
})
## Combine lists and sort elements
my.seq2 <- unlist(strsplit(do.call(c, my.seq2), ","))
my.seq2 <- my.seq2[order(substr(my.seq2, 3, 7))]
names(my.seq2) <- NULL
my.seq2
[1] "FM0001-FM0001" "SC0002-SC0004" "FM0016-FM0019" "FM0028" "SC0039"
my.seq1
[1] "FM0001" "SC0002" "SC0003" "SC0004" "SC0010" "SC0012" "SC0014" "FM0016" "FM0017" "FM0018" "FM0019" "FM0021"
[13] "FM0024" "FM0026" "FM0028" "SC0033" "SC0036" "SC0039"
The major problems with this are:
Some values are completely missing from the result (e.g. FM0021, FM0024, FM0026)
The first number in the sequence (FM0001) appears as "FM0001-FM0001", with a hyphen in between
I feel like I'm getting warmer with A5C1D2H2I1M1N2O1R2T1's answer, which uses seqToHumanReadable, because it's quite elegant AND solves both problems. The two remaining problems are that I can't put the tag ID in front of each number, and I can't force the number of digits to four (e.g. 0004 becomes 4).
library(R.utils)
lapply(split(my.seq1, sp.tags), function(x){
return(unlist(strsplit(seqToHumanReadable(substr(x, 3, 7)), ',')))
})
$FM
[1] "1" " 16-19" " 21" " 24" " 26" " 28"
$SC
[1] "2-4" " 10" " 12" " 14" " 33" " 36" " 39"
Ideally the result would be:
"FM0001, SC0002-SC0004, SC0010, SC0012, SC0014, FM0016-FM0019, FM0021, FM0024, FM0026, FM0028, SC0033, SC0036, SC0039"
Any ideas? It's one of those things that's really simple to do by hand but would take blinking ages, and you'd think a function would exist for it but I haven't found it yet or it doesn't exist :(

This should do it:
# get the prefix/tag and number ([A-Za-z] rather than [A-z], which in ASCII also matches [ \ ] ^ _ and `)
tag <- gsub("(^[A-Za-z]+)(.+)", "\\1", my.seq1)
num <- gsub("([A-Za-z]+)(\\d+$)", "\\2", my.seq1)
# get a sequence id
n <- length(tag)
do_match <- c(FALSE, diff(as.numeric(num)) == 1 & tag[-1] == tag[-n])
seq_id <- cumsum(!do_match) # a sequence id
# tapply to combine the result
res <- setNames(tapply(my.seq1, seq_id, function(x)
if(length(x) < 2)
return(x)
else
paste(x[1], x[length(x)], sep = "-")), NULL)
# show the result
res
#R> [1] "FM0001" "SC0002-SC0004" "SC0010" "SC0012" "SC0014" "FM0016-FM0019" "FM0021"
#R> [8] "FM0024" "FM0026" "FM0028" "SC0033" "SC0036" "SC0039"
# compare with
my.seq1
#R> [1] "FM0001" "SC0002" "SC0003" "SC0004" "SC0010" "SC0012" "SC0014" "FM0016" "FM0017" "FM0018" "FM0019" "FM0021" "FM0024"
#R> [14] "FM0026" "FM0028" "SC0033" "SC0036" "SC0039"
Data
FM <- paste0('FM', c('0001', '0016', '0017', '0018', '0019', '0021', '0024', '0026', '0028'))
SC <- paste0('SC', c('0002', '0003', '0004', '0010', '0012', '0014', '0033', '0036', '0039'))
my.seq1 <- c(FM, SC)
my.seq1 <- my.seq1[order(substr(my.seq1, 3, 7))]
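The question asks for a single string rather than a character vector. A minimal sketch, assuming the IDs are already sorted so that each run is contiguous, wraps the same logic in a function and joins the pieces with paste(collapse = ", "). collapse_runs() is a hypothetical name, and vapply() stands in for tapply() so the result is guaranteed to be character.

```r
# Hypothetical helper: collapse consecutive IDs that share a letter prefix.
# Assumes ids are sorted so that each run of consecutive numbers is adjacent.
collapse_runs <- function(ids) {
  tag <- sub("^([A-Za-z]+).*$", "\\1", ids)      # letter prefix, e.g. "FM"
  num <- as.numeric(sub("^[A-Za-z]+", "", ids))  # number part, e.g. 1
  n <- length(ids)
  # TRUE where an ID continues the previous run: same tag and number + 1
  cont <- c(FALSE, diff(num) == 1 & tag[-1] == tag[-n])
  run_id <- cumsum(!cont)
  # Keep single IDs as-is; write longer runs as "first-last"
  runs <- vapply(split(ids, run_id), function(x) {
    if (length(x) < 2) x else paste(x[1], x[length(x)], sep = "-")
  }, character(1))
  paste(unname(runs), collapse = ", ")
}

FM <- paste0('FM', c('0001', '0016', '0017', '0018', '0019', '0021', '0024', '0026', '0028'))
SC <- paste0('SC', c('0002', '0003', '0004', '0010', '0012', '0014', '0033', '0036', '0039'))
my.seq1 <- c(FM, SC)
my.seq1 <- my.seq1[order(substr(my.seq1, 3, 7))]

collapse_runs(my.seq1)
```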

How to select a specific interval of dataframes/objects inside a list()?

I have a list of 10 numeric vectors. I would like to select the first five (1:5), or, say, just the 3rd and the 9th of these vectors.
Here is an example list:
n_vec = lapply(1:10, function(x) rnorm(20,5,2))
bLister = list()
keeping_names = NULL
for (i in 1:length(n_vec)) {
single_name_ = paste("thisis_vec",i)
temp = n_vec[[i]]
keeping_names = c(keeping_names,single_name_)
bLister[[i]] = temp
}
names(bLister) = keeping_names
This way doesn't work:
bLister[[1:5]]
bLister[[c(3,9)]]
How can I do this?

You can subset lists like so. Note the single square brackets: [ returns a sub-list, whereas [[ extracts a single element.
> bLister[c(3, 9)]
$`thisis_vec 3`
[1] 5.603467 3.749571 3.944807 7.279552 7.122220 2.065051 2.587282 4.405463
[9] 6.687400 7.567451 6.239640 6.017510 2.484759 3.223271 5.301008 1.545704
[17] 2.465992 1.518966 6.997675 3.966775
$`thisis_vec 9`
[1] 3.900151 5.260895 7.971662 6.578425 4.861220 3.770569 1.128102 6.164506
[9] 4.767511 5.286352 3.898185 2.298500 8.476691 7.794415 7.148588 6.699527
[17] 3.638074 4.240355 8.575829 5.340551
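The difference between [ and [[ also explains the behaviour in the question: given a vector index, [[ indexes recursively, so bLister[[c(3, 9)]] asks for the 9th number inside the 3rd vector rather than two list elements, and bLister[[1:5]] fails because it tries to index five levels deep into a structure that is only two levels deep. A small sketch using a hypothetical toy list lst:

```r
# Hypothetical toy list of numeric vectors
lst <- list(a = 1:3, b = 4:6, c = 7:9)

# Single brackets return a sub-list containing the selected elements
picked <- lst[c(1, 3)]
names(picked)       # "a" "c"

# Double brackets with a vector index recurse: lst[[c(3, 2)]] is lst[[3]][[2]]
lst[[c(3, 2)]]      # 8

# Selecting by name also works with single brackets
lst[c("a", "b")]
```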

Create list with specific iteration in R

I have the following dataset containing dates:
> dates
[1] "20180412" "20180424" "20180506" "20180518" "20180530" "20180611" "20180623" "20180705" "20180717" "20180729"
I am trying to create a list where each element is 'Coherence_' followed by a pair of dates from dates. So in output1[1] I would have Coherence_20180412_20180424. Then in output1[2] I would have Coherence_20180506_20180518, etc.
I am starting with this code, but it is not working the way I need:
output1<-list()
for (i in 1:5){
output1[[i]]<-paste("-Poutput1=", S1_Out_Path,"Coherence_VV_TC", dates[[i]],"_", dates[[i+1]], ".tif", sep="")
}
Do you have any suggestions?

Try this:
Without loop
even_indexes<-seq(2,10,2) # List of even indexes
odd_indexes<-seq(1,10,2) # List of odd indexes
print(paste('Coherence',paste(odd_indexes,even_indexes,sep = "_"),sep = "_"))
Link answer from here: Create list in R with specific iteration
Updated (to get the result as a list)
lst=as.list(paste('Coherence',paste(odd_indexes,even_indexes,sep = "_"),sep = "_"))
OR
a=c(1:10)
for (i in seq(1, 9, 2)){
print(paste('Coherence',paste(a[i],a[i+1],sep = "_"),sep = "_"))
}
Output:
[1] "Coherence_1_2"
[1] "Coherence_3_4"
[1] "Coherence_5_6"
[1] "Coherence_7_8"
[1] "Coherence_9_10"

You can create these patterns using paste's ability to operate on vectors:
dates <- c("20180412", "20180424", "20180506", "20180518", "20180530",
"20180611", "20180623", "20180705", "20180717", "20180729")
# dates[-length(dates)] drops the last date and dates[-1] drops the first, pairing each date with the next
paste("Coherence", dates[-length(dates)], dates[-1], sep="_")
[1] "Coherence_20180412_20180424" "Coherence_20180424_20180506" "Coherence_20180506_20180518"
[4] "Coherence_20180518_20180530" "Coherence_20180530_20180611" "Coherence_20180611_20180623"
[7] "Coherence_20180623_20180705" "Coherence_20180705_20180717" "Coherence_20180717_20180729"
Or other simple patterns can be generated as:
paste("Coherence", dates[seq(1, length(dates), 2)], dates[seq(2, length(dates), 2)], sep="_")
[1] "Coherence_20180412_20180424" "Coherence_20180506_20180518" "Coherence_20180530_20180611"
[4] "Coherence_20180623_20180705" "Coherence_20180717_20180729"

You can use matrix(..., nrow=2):
dates <- c("20180412", "20180424", "20180506", "20180518", "20180530", "20180611", "20180623", "20180705", "20180717", "20180729")
paste0("Coherence_", apply(matrix(dates, 2), 2, FUN=paste0, collapse="_"))
# > paste0("Coherence_", apply(matrix(dates, 2), 2, FUN=paste0, collapse="_"))
# [1] "Coherence_20180412_20180424" "Coherence_20180506_20180518" "Coherence_20180530_20180611" "Coherence_20180623_20180705"
# [5] "Coherence_20180717_20180729"
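The question asks for a list (output1[[i]]), while the answers above return character vectors. A sketch, using the same dates vector, that picks the odd and even positions with a recycled logical index and wraps the result in as.list(); it assumes dates has an even length so every date has a partner.

```r
dates <- c("20180412", "20180424", "20180506", "20180518", "20180530",
           "20180611", "20180623", "20180705", "20180717", "20180729")

# Non-overlapping pairs: elements 1 & 2, 3 & 4, ...
first  <- dates[c(TRUE, FALSE)]   # logical index is recycled: odd positions
second <- dates[c(FALSE, TRUE)]   # even positions

# as.list() turns the character vector into the list the question asks for
output1 <- as.list(paste("Coherence", first, second, sep = "_"))
output1[[2]]   # "Coherence_20180506_20180518"
```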

Accessing selected elements of a list of lists in R

I have a list of lists; subgame[[i]]$Weight looks like this:
[[1]]
[1] 0.4720550 0.4858826 0.4990469 0.5115899 0.5235512 0.5349672 0.5458720
[8] 0.5562970 0.5662715 0.5758226 0.5849754 0.5937532 0.6021778 0.6102692
[15] 0.6180462 0.6255260 0.6327250 0.6396582 0.6463397 0.6527826
[[2]]
[1] 0.4639948 0.4779027 0.4911519 0.5037834 0.5158356 0.5273443 0.5383429
[8] 0.5488623 0.5589313 0.5685767 0.5778233 0.5866943 0.5952111 0.6033936
[15] 0.6112605 0.6188291 0.6261153 0.6331344 0.6399002 0.6464260
[[3]]
[1] 0.4629488 0.4768668 0.4901266 0.5027692 0.5148329 0.5263534 0.5373639
[8] 0.5478953 0.5579764 0.5676339 0.5768926 0.5857755 0.5943041 0.6024984
[15] 0.6103768 0.6179568 0.6252543 0.6322844 0.6390611 0.6455976
I want to access the j-th element of every vector in this list. For example, if j=1 I should get:
>0.4720550 0.4639948 0.4629488
How can I do it?
I found this:
sapply(1:length(subgame[[i]]$Weight),function(k) subgame[[i]]$Weight[[k]][1])
But it seems too convoluted to me.
Is there a more elegant way?

If j=1, then you're interested in subgame[[i]]$Weight[[1]][1], subgame[[i]]$Weight[[2]][1], and subgame[[i]]$Weight[[3]][1]. In other words, you want to use [1] on each list element.
But what happens when you subset a vector? For example:
(x <- rnorm(5))
# [1] -1.8965529 0.4688618 0.6588774 0.2749539 0.1829046
x[3]
# [1] 0.6588774
[ is actually a function, and it gets called in this situation. You can read a bit more about it with ?"[", but the point is that you can call it like any other function. Its first argument will be the object to subset, then you can pass it the index (or indices) you're interested in (along with some other arguments that the help page discusses):
x[3]
# [1] 0.6588774
`[`(x, 3)
# [1] 0.6588774
Note the backticks surrounding the name. A bare [ will throw an error, so you need to quote it. The same goes for other functions like +.
So if you want to get the first element of each list element, you can apply [ to each element of the list, passing it 1 or whatever j is:
sapply(subgame[[i]]$Weight, `[`, 1)
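The same call on a small toy list (weights is a hypothetical stand-in for subgame[[i]]$Weight). vapply() is a stricter alternative to sapply() that checks each result is a single number:

```r
# Hypothetical stand-in for subgame[[i]]$Weight: a list of numeric vectors
weights <- list(c(0.47, 0.49, 0.50), c(0.46, 0.48, 0.49), c(0.462, 0.477, 0.490))
j <- 1

# Call `[`(v, j) on each vector v in the list
sapply(weights, `[`, j)                # 0.470 0.460 0.462

# Same result, but errors if any call does not return exactly one number
vapply(weights, `[`, numeric(1), j)
```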

I would like to add a solution that returns the result you want for the Weight list of each element of your subgame list.
> subgame <- list(list(weight = list(c(1, 2), c(3, 4), c(5, 6))), list(weight = list(c(7, 8), c(9, 10), c(11, 12))))
>
> j = 1
>
> do.call(rbind, subgame[[1]]$weight)[,j]
[1] 1 3 5
>
> lapply(subgame, function(x) {do.call(rbind, x$weight)[,j]})
[[1]]
[1] 1 3 5
[[2]]
[1] 7 9 11

How to format numbers in R, specifying the number of significant digits but keep significant zeroes and integer part?

I've been struggling with formatting numbers in R using what I feel are very sensible rules. What I would want is to specify a number of significant digits (say 3), keep significant zeroes, and also keep all digits before the decimal point, some examples (with 3 significant digits):
1.23456 -> "1.23"
12.3456 -> "12.3"
123.456 -> "123"
1234.56 -> "1235"
12345.6 -> "12346"
1.50000 -> "1.50"
1.49999 -> "1.50"
Is there a function in R that does this kind of formatting? If not, how could it be done?
I feel these are quite sensible formatting rules, yet I have not managed to find a function in R that formats numbers this way. As far as I can tell from searching, this is not a duplicate of the many similar questions, such as this one.
Edit:
Inspired by the two good answers I put together a function myself that I believe works for all cases:
sign_digits <- function(x,d){
s <- format(x,digits=d)
if(grepl("\\.", s) && ! grepl("e", s)) {
n_sign_digits <- nchar(s) -
max( grepl("\\.", s), attr(regexpr("(^[-0.]*)", s), "match.length") )
n_zeros <- max(0, d - n_sign_digits)
s <- paste(s, paste(rep("0", n_zeros), collapse=""), sep="")
}
s
}
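sign_digits() handles one number at a time: its && / if test expects a single string, so passing a whole vector fails in recent versions of R, and format() would pad all elements of a vector to a common number of decimal places anyway. A minimal sketch of a vectorised wrapper that applies the function element by element with vapply(); sign_digits_vec() is a hypothetical name.

```r
# sign_digits() as defined in the question (works on a single number)
sign_digits <- function(x, d) {
  s <- format(x, digits = d)
  if (grepl("\\.", s) && !grepl("e", s)) {
    n_sign_digits <- nchar(s) -
      max(grepl("\\.", s), attr(regexpr("(^[-0.]*)", s), "match.length"))
    n_zeros <- max(0, d - n_sign_digits)
    s <- paste(s, paste(rep("0", n_zeros), collapse = ""), sep = "")
  }
  s
}

# Hypothetical vectorised wrapper: format each element on its own
sign_digits_vec <- function(x, d) vapply(x, sign_digits, character(1), d = d)

sign_digits_vec(c(1.23456, 12.3456, 123.456, 1234.56, 12345.6, 1.5, 1.49999), 3)
```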

format(num, digits = 3) comes very close (note that format(num, 3) would pass 3 as the trim argument, not digits).
format(1.23456,digits=3)
# [1] "1.23"
format(12.3456,digits=3)
# [1] "12.3"
format(123.456,digits=3)
# [1] "123"
format(1234.56,digits=3)
# [1] "1235"
format(12345.6,digits=3)
# [1] "12346"
format(1.5000,digits=3)
# [1] "1.5"
format(1.4999,digits=3)
# [1] "1.5"
Your rules are not actually internally consistent. You want 1234.56 to round down to 1234, yet you want 1.4999 to round up to 1.5.
EDIT This appears to deal with the very valid point made by @Henrik.
sigDigits <- function(x,d){
z <- format(x,digits=d)
if (!grepl("[.]",z)) return(z)
# pad on the right with zeros to width d + 1 (d digits plus the decimal point)
return(stringr::str_pad(z,d+1,"right","0"))
}
z <- c(1.23456, 12.3456, 123.456, 1234.56, 12345.6, 1.5000, 1.4999)
sapply(z,sigDigits,d=3)
# [1] "1.23" "12.3" "123" "1235" "12346" "1.50" "1.50"

As @jlhoward points out, your rounding rule is not consistent. Hence you should use a conditional statement:
x <- c(1.23456, 12.3456, 123.456, 1234.56, 12345.6, 1.50000, 1.49999)
ifelse(x >= 100, sprintf("%.0f", x), ifelse(x < 100 & x >= 10, sprintf("%.1f", x), sprintf("%.2f", x)))
# "1.23" "12.3" "123" "1235" "12346" "1.50" "1.50"
It's hard to tell what the intended usage is, but consistent rounding might be better. Exponential notation is one option:
sprintf("%.2e", x)
[1] "1.23e+00" "1.23e+01" "1.23e+02" "1.23e+03" "1.23e+04" "1.50e+00" "1.50e+00"

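# \(x,y) is shorthand for function(x,y) (R >= 4.1)
sig0=\(x,y){
# decimal places needed to show y significant digits (0 for large numbers)
dig=abs(pmin(0,floor(log10(abs(x)))-y+1))
# log10(0) is -Inf, so format 0 with y-1 decimal places
dig[is.infinite(dig)]=y-1
sprintf(paste0("%.",dig,"f"),x)
}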
> v=c(1111,111.11,11.1,1.1,1.99,.01,.001,0,-.11,-.9,-.000011)
> paste(sig0(v,2),collapse=" ")
[1] "1111 111 11 1.1 2.0 0.010 0.0010 0.0 -0.11 -0.90 -0.000011"
Or the following is almost the same with the exception that 0 is converted to 0 and not 0.0 (fg is a special version of f where the digits specify significant digits and not digits after the decimal point, and the # flag causes fg to not drop trailing zeroes):
> paste(sub("\\.$","",formatC(v,2,,"fg","#")),collapse=" ")
[1] "1111 111 11 1.1 2.0 0.010 0.0010 0 -0.11 -0.90 -0.000011"
