Convert number to alphabetically sortable character representation

I need to pass some numbers into functions, environments, or contexts that accept only strings. They are processed there, and after receiving the result I can convert them back to numbers. The key requirement is that sorting the string representations must give the same order as sorting the numbers numerically. I also need to be able to add numbers after creating the initial batch. Here are two conversion approaches and why they do not work:
The simplest conversion is the standard one: 1->"1", 10->"10", and so on. This is not sortable, because "10" gets sorted before "2".
The next approach is to pad with leading zeroes: 1->"001", 10->"010", and so on. This is sortable ("002" gets sorted before "010"), but it fails if a larger number needs to be added later. If, say, the numbers 2000 and 10000 need to be added later, there is no way to do so that maintains sorting.
Are there any good approaches to doing this? The question does not pertain to any particular language (although the target language in my use case is R, which has a number of places, such as vector names, that accept only character variables). Simplicity and/or standardization (of the representation, not the implementation) would both be big factors in choosing the best solution here.

I think I've had the same problem, and I used a workaround that could help you, though I'm not sure it will apply in all situations.
First, a vector of numbers stored as strings and sorted as strings:
str.numbers <- sort(as.character(1:20))
Now, I use the numeric value of each string to reorder the same vector numerically:
str.numbers[order(as.numeric(str.numbers))]
This does the trick for simple vectors, but I'm not sure it will solve more complex problems.

Funny how the world works: after searching for a solution without finding one, I came across a package, with its last release only four days ago, that focuses on this problem. This is the strex package. A reproducible example of my troubles, and how strex provides a pretty good fix, follows:
# Load the strex library
library(strex)
#> Loading required package: stringr
# This would be a function created by others, doing more than
# sorting, but nevertheless requiring sortable input
fun_requiring_char <- function(x) {
  stopifnot(is.character(x))
  sort(x)
}
# Example data
set.seed(42)
a <- sample(20, 5) ; a
#> [1] 17 5 1 10 4
b <- sample(2000, 10) ; b
#> [1] 1170 634 49 1152 1327 24 1863 356 1625 165
# Won't work, error
#fun_requiring_char(a)
# Works, but returns incorrectly sorted input
fun_requiring_char(as.character(a))
#> [1] "1" "10" "17" "4" "5"
fun_requiring_char(as.character(b))
#> [1] "1152" "1170" "1327" "1625" "165" "1863" "24" "356" "49" "634"
# Solution provided by strex
fun_requiring_char(str_alphord_nums(a))
#> [1] "01" "04" "05" "10" "17"
fun_requiring_char(str_alphord_nums(b))
#> [1] "0024" "0049" "0165" "0356" "0634" "1152" "1170" "1327" "1625" "1863"
# What quick and dirty zero padding did not allow was to first
# convert a, and then b into character, where both were of a
# unified format that would represent the numbers and yet be
# sortable in correct order according to the numbers they
# represent. However, using str_alphord_nums repeatedly gets
# very close to a solution.
ac <- str_alphord_nums(a); ac
#> [1] "17" "05" "01" "10" "04"
bc <- str_alphord_nums(b); bc
#> [1] "1170" "0634" "0049" "1152" "1327" "0024" "1863" "0356" "1625" "0165"
# Wrong order here
fun_requiring_char(c(ac,bc))
#> [1] "0024" "0049" "01" "0165" "0356" "04" "05" "0634" "10" "1152"
#> [11] "1170" "1327" "1625" "17" "1863"
# But doing an alphord again on the concatenated vectors provides a fix
fun_requiring_char(str_alphord_nums(c(ac,bc)))
#> [1] "0001" "0004" "0005" "0010" "0017" "0024" "0049" "0165" "0356" "0634"
#> [11] "1152" "1170" "1327" "1625" "1863"
Created on 2020-10-21 by the reprex package (v0.3.0)
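
Another option, which avoids re-padding when larger numbers arrive later, is to prefix each number with its digit count. The sketch below is only an illustration and is not part of strex. It assumes non-negative whole numbers with at most 99 digits, and the helper names encode_sortable and decode_sortable are made up for this example.

```r
# Hypothetical helpers (not from strex): length-prefixed encoding.
# Assumes non-negative whole numbers that are exactly representable as doubles.
encode_sortable <- function(x) {
  digits <- sprintf("%.0f", as.numeric(x))        # plain digits, no exponent
  paste0(sprintf("%02d", nchar(digits)), digits)  # two-digit length prefix
}

# Drop the two-character prefix and convert back to numeric
decode_sortable <- function(s) as.numeric(substring(s, 3))

a <- c(17, 5, 1, 10, 4)
b <- c(2000, 10000)        # added later; a does not need to be re-encoded

ac <- encode_sortable(a)
bc <- encode_sortable(b)

# Plain string sorting of the combined batches follows numeric order
sorted <- sort(c(ac, bc))
decode_sortable(sorted)
```

Because shorter numbers always carry a smaller prefix, strings encoded at different times stay mutually sortable. The cost is that the representation is less readable than simple zero padding.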

Related

R: bench::mark does not return max and mean

Consider the following example:
res <- bench::mark(rnorm(1e5))
names(res)
#> [1] "expression" "min" "median" "itr/sec" "mem_alloc"
#> [6] "gc/sec" "n_itr" "n_gc" "total_time" "result"
#> [11] "memory" "time" "gc"
I am somewhat confused that the mean and the maximum runtime are not included, contradicting the help page. The bench version is 1.0.4 and R is 3.6.3.
Does anyone know what the issue is here?

It is usually useful to run
str(res)
to inspect the structure of a function's output. In this case one of the object's members, near the end, is
# $ time :List of 1
# ..$ : 'bench_time' num 12.3ms 16.1ms 17.9ms 12.3ms 13.4ms ...
This means that res$time is a list with just one member, so the raw timings are kept in res$time[[1]], and the mean and maximum can be calculated from it.
For instance, compare median(res$time[[1]]) with the printed result; they are the same value.
median(res$time[[1]])
#[1] 12.3ms
And the mean and maximum will be
mean(res$time[[1]])
#[1] 12.5ms
max(res$time[[1]])
#[1] 17.9ms
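
When several expressions are benchmarked, the time member holds one vector per expression, so the same idea can be applied to every element at once. The sketch below uses a mock list in place of a real bench::mark() result so that it runs without the bench package. The expression names and timings are invented, and real bench_time values may need as.numeric() before summarising.

```r
# Mock stand-in for a bench::mark() result: `time` is a list with one
# numeric vector of raw timings (in seconds) per benchmarked expression.
res <- list(
  expression = c("expr_a", "expr_b"),
  time = list(
    c(0.0123, 0.0161, 0.0179, 0.0123, 0.0134),
    c(0.0201, 0.0250, 0.0225)
  )
)

# Summarise every element of the list, one row per expression
timing_summary <- data.frame(
  expression = res$expression,
  mean = vapply(res$time, mean, numeric(1)),
  max  = vapply(res$time, max, numeric(1))
)
timing_summary
```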

Reducing a data.tree created from List

I'm working on a shiny app that plots data trees. I'd like to incorporate shinyTree to permit quick comparison of plotted nodes. The issue is that shinyTree returns a redundant list of lists for the selected sub-nodes.
The actual list of lists is included below. I would like to keep only the longest branches. I would also like to remove the id node (the integer node); I don't understand why it even shows up, given the list. I have tried many different methods to work with this list, but it's been a real struggle. The list concept is difficult to understand.
I create the data.tree and plot via:
dataTree.a <- FromListSimple(checkList)
plot(dataTree.a)
> checkList
[[1]]
[[1]]$Asia
[[1]]$Asia$China
[[1]]$Asia$China$Beijing
[[1]]$Asia$China$Beijing$Round
[[1]]$Asia$China$Beijing$Round$`20383994`
[1] 0
[[2]]
[[2]]$Asia
[[2]]$Asia$China
[[2]]$Asia$China$Beijing
[[2]]$Asia$China$Beijing$Round
[1] 0
[[3]]
[[3]]$Asia
[[3]]$Asia$China
[[3]]$Asia$China$Beijing
[1] 0
[[4]]
[[4]]$Asia
[[4]]$Asia$China
[[4]]$Asia$China$Shanghai
[[4]]$Asia$China$Shanghai$Round
[[4]]$Asia$China$Shanghai$Round$`23740778`
[1] 0
[[5]]
[[5]]$Asia
[[5]]$Asia$China
[[5]]$Asia$China$Shanghai
[[5]]$Asia$China$Shanghai$Round
[1] 0
[[6]]
[[6]]$Asia
[[6]]$Asia$China
[[6]]$Asia$China$Shanghai
[1] 0
[[7]]
[[7]]$Asia
[[7]]$Asia$China
[1] 0
[[8]]
[[8]]$Asia
[[8]]$Asia$India
[[8]]$Asia$India$Delhi
[[8]]$Asia$India$Delhi$Round
[[8]]$Asia$India$Delhi$Round$`25703168`
[1] 0
[[9]]
[[9]]$Asia
[[9]]$Asia$India
[[9]]$Asia$India$Delhi
[[9]]$Asia$India$Delhi$Round
[1] 0
[[10]]
[[10]]$Asia
[[10]]$Asia$India
[[10]]$Asia$India$Delhi
[1] 0
[[11]]
[[11]]$Asia
[[11]]$Asia$India
[1] 0
[[12]]
[[12]]$Asia
[[12]]$Asia$Japan
[[12]]$Asia$Japan$Tokyo
[[12]]$Asia$Japan$Tokyo$Round
[[12]]$Asia$Japan$Tokyo$Round$`38001000`
[1] 0
[[13]]
[[13]]$Asia
[[13]]$Asia$Japan
[[13]]$Asia$Japan$Tokyo
[[13]]$Asia$Japan$Tokyo$Round
[1] 0
[[14]]
[[14]]$Asia
[[14]]$Asia$Japan
[[14]]$Asia$Japan$Tokyo
[1] 0
[[15]]
[[15]]$Asia
[[15]]$Asia$Japan
[1] 0
[[16]]
[[16]]$Asia
[1] 0

Well, I did cobble together a poor hack to make this work. Here is what I did to the 'checkList' list:
library(stringr) # provides str_replace_all
checkList <- get_selected(tree, format = "slices")
# Convert and collapse shinyTree slices to data.tree.
# This is a bit of a kludge to make the graphic work with
# shinyTree; an alternate one-liner is in the works.
# This transform works by finding the longest branches
# and only plotting them, since the other branches are
# subsets due to the slices.
# Extract the checkList names (as characters) from the checkList
tmp <- names(unlist(checkList))
# Determine the depth (number of '.'-separated parts) of each name
lens <- lapply(tmp, function(x) length(strsplit(x, ".", fixed=TRUE)[[1]]))
# Find the elements with the greatest depth; returns their indices
lens.max <- which(lens == max(sapply(lens, max)))
# Replace all '.' with '/' to prepare for the data.frame conversion
tmp <- relist(str_replace_all(tmp, "\\.", "/"), skeleton=tmp)
# Add a root node to work with multiple branches
tmp <- unlist(lapply(tmp, function(x) paste0("Root/", x)))
# Create a list of only the longest branches
longBranches <- as.list(tmp[lens.max])
# Convert the list into a data.frame for conversion
longBranches.df <- data.frame(pathString = do.call(rbind, longBranches))
# Publish the data.frame for use
vals$selDF <- longBranches.df
#save(checkList, file = "chkLists.RData") # Save for troubleshooting
print(vals$selDF)
The new checkList looks like this:
[1] "Root/Europe/France/Paris/Round/10843285" "Root/Europe/France/Paris/Round"
[3] "Root/Europe/France/Paris" "Root/Europe/France"
[5] "Root/Europe/Germany/Berlin/Diamond/3563194" "Root/Europe/Germany/Berlin/Diamond"
[7] "Root/Europe/Germany/Berlin/Round/3563194" "Root/Europe/Germany/Berlin/Round"
[9] "Root/Europe/Germany/Berlin" "Root/Europe/Germany"
[11] "Root/Europe/Italy/Rome/Round/3717956" "Root/Europe/Italy/Rome/Round"
[13] "Root/Europe/Italy/Rome" "Root/Europe/Italy"
[15] "Root/Europe/United Kingdom/London/Round/10313307" "Root/Europe/United Kingdom/London/Round"
[17] "Root/Europe/United Kingdom/London" "Root/Europe/United Kingdom"
[19] "Root/Europe"
It works :) ... but I think this could be done with a two-liner. I'll work on it again in a week or so. Any other ideas would be appreciated.
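
One possible shorter variant, sketched here as an alternative to the hack above, keeps every path that is not a prefix of another path, so branches of different depths survive as well. It uses base R only. The small checkList below is a cut-down stand-in for the shinyTree output, and the approach assumes node names contain no '.' characters.

```r
# Cut-down stand-in for the shinyTree "slices" output shown in the question
checkList <- list(
  list(Asia = list(China = list(Beijing = list(Round = list(`20383994` = 0))))),
  list(Asia = list(China = list(Beijing = list(Round = 0)))),
  list(Asia = list(China = list(Beijing = 0))),
  list(Asia = list(China = 0)),
  list(Asia = list(India = list(Delhi = 0))),
  list(Asia = list(India = 0)),
  list(Asia = 0)
)

# Flatten to "a.b.c" names, then turn them into "a/b/c" paths
paths <- gsub(".", "/", names(unlist(checkList)), fixed = TRUE)

# A path is redundant if some other path continues it one level deeper
is_prefix <- vapply(
  paths,
  function(p) any(startsWith(paths, paste0(p, "/"))),
  logical(1),
  USE.NAMES = FALSE
)
leaves <- paths[!is_prefix]
leaves

# With data.tree installed, the result could then be converted, e.g.:
# data.tree::as.Node(data.frame(pathString = paste0("Root/", leaves)))
```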

How to access attributes of a dendrogram in R

From a dendrogram which I created with
hc<-hclust(kk)
hcd<-as.dendrogram(hc)
I picked a sub-branch
k=hcd[[2]][[2]][[2]][[2]][[2]][[2]][[2]][1]
When I simply display k, this gives:
> k
[[1]]
[[1]][[1]]
[1] 243
attr(,"label")
[1] "NAfrica_002"
attr(,"members")
[1] 1
attr(,"height")
[1] 0
attr(,"leaf")
[1] TRUE
[[1]][[2]]
[1] 257
attr(,"label")
[1] "NAfrica_016"
attr(,"members")
[1] 1
attr(,"height")
[1] 0
attr(,"leaf")
[1] TRUE
attr(,"members")
[1] 2
attr(,"midpoint")
[1] 0.5
attr(,"height")
[1] 37
How can I access, for example, the "midpoint" attribute, or the second of the "label" attributes?
(I hope I'm using the correct terminology here.)
I have tried things like
k$midpoint
attr(k,"midpoint")
but both returned NULL.
And a second question, sorry: how could I add a "label" attribute after the "midpoint" attribute?

Your k is still buried one layer too deep. The attributes have been set on the first element of the list k.
attributes(k[[1]]) # Display attributes
attributes(k[[1]])$label # Access attributes
attributes(k[[1]])$label <- 'new' # Change attribute
Alternatively, you can use attr:
attr(k[[1]],'label') # Display attribute
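
The same situation can be reproduced with a built-in dataset. This is a minimal sketch, assuming that hcd[[2]][[2]] is a two-leaf sub-branch for this particular data; the wrapper list imitates the k from the question, and the label "my_branch" is an arbitrary example value.

```r
# Small dendrogram from a built-in dataset
hcd <- as.dendrogram(hclust(dist(USArrests[1:5, ])))

# Wrap a sub-branch in a plain list, imitating the k from the question
k <- list(hcd[[2]][[2]])

attr(k, "midpoint")          # NULL: the wrapper list carries no attributes
attr(k[[1]], "midpoint")     # attribute of the sub-branch itself
attr(k[[1]][[2]], "label")   # label of the second leaf in the sub-branch

# Adding a new attribute to the branch node (second question)
attr(k[[1]], "label") <- "my_branch"
names(attributes(k[[1]]))    # "label" now appears among the attribute names
```

New attributes are appended to the existing ones, so a label added this way ends up after "midpoint" in the attribute list.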

You can change attributes manually, as in the previous answer. The problem is that doing so by hand is inefficient when you need to do it many times. Also, while it is easy to change an attribute, the change may not be reflected in other functions, since they act on it only if they were programmed to do so.
For your specific question, it depends on which attribute you want to view. For "midpoint", use the get_nodes_attr function from the dendextend package, passing "midpoint" as the attribute name.
# install.packages("dendextend")
library(dendextend)
dend <- as.dendrogram(hclust(dist(USArrests[1:5,])))
# Like:
# dend <- USArrests[1:5,] %>% dist %>% hclust %>% as.dendrogram
# midpoint for all nodes
get_nodes_attr(dend, "midpoint")
And you get this:
[1] 1.25 NA 1.50 0.50 NA NA 0.50 NA NA
To change an attribute as well, you can use the various assign functions from the package: assign_values_to_leaves_nodePar, assign_values_to_leaves_edgePar, assign_values_to_nodes_nodePar, assign_values_to_branches_edgePar, remove_branches_edgePar, remove_nodes_nodePar
If all you want is to change the labels, the following feature of the package would solve your problem:
> labels(dend)
[1] "Arkansas" "Arizona" "California" "Alabama" "Alaska"
> labels(dend) <- 1:5
> labels(dend)
[1] 1 2 3 4 5
For more details on the package, you can have a look at its vignette.

Swap row 1-22 with 23-48

sampleFiles <- list.files(path="/path",pattern="*.txt");
> sampleFiles
[1] "D104.txt" "D121.txt" "D153.txt" "D155.txt" "D161.txt" "D162.txt" "D167.txt"
[8] "D173.txt" "D176.txt" "D177.txt" "D179.txt" "D204.txt" "D221.txt" "D253.txt"
[15] "D255.txt" "D261.txt" "D262.txt" "D267.txt" "D273.txt" "D276.txt" "D277.txt"
[22] "D279.txt" "N101.txt" "N108.txt" "N113.txt" "N170.txt" "N171.txt" "N172.txt"
[29] "N175.txt" "N181.txt" "N182.txt" "N183.txt" "N186.txt" "N187.txt" "N188.txt"
[36] "N201.txt" "N208.txt" "N213.txt" "N270.txt" "N271.txt" "N272.txt" "N275.txt"
[43] "N281.txt" "N282.txt" "N283.txt" "N286.txt" "N287.txt" "N288.txt"
How can I get all files starting with "N" first and those starting with "D" last? In other words, swap them.

If you want to sort by letter (N, D) and number (101, ...), you could simply swap your elements:
#random vector
vec <- c("D104.txt", "D121.txt", "D279.txt", "N101.txt", "N108.txt", "N113.txt")
#swap places
vec[c(grep("N", vec), grep("D", vec))]
[1] "N101.txt" "N108.txt" "N113.txt" "D104.txt" "D121.txt" "D279.txt"
grep finds which elements of the vector match the pattern. So we move the elements containing "N" to the front and those containing "D" to the back.
If you just want to sort by letter and number in decreasing order, you can simply do this (as Thomas suggested):
sort(vec, decreasing = TRUE)
[1] "N113.txt" "N108.txt" "N101.txt" "D279.txt" "D121.txt" "D104.txt"
Also, since you know the indices of the elements you want to swap, then:
sampleFiles[c(23:48, 1:22)]

In this case it would be as simple as:
sampleFiles[c(23:48, 1:22)]
More general solutions have been suggested, but sort(sampleFiles) would NOT succeed, because "D" < "N". You could have used:
sampleFiles[order(substr(sampleFiles, 1, 1), decreasing = TRUE)]
If you just used:
sampleFiles[rev(order(sampleFiles))]
then the numeric values would get reversed as well. Alternatively, you could use chartr to swap "D" and "N" in the argument to order, which reverses only the letter ordering:
sampleFiles[order(chartr("DN", "ND", sampleFiles))]
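
The difference between these variants can be checked on a short vector. This is a small sketch using a made-up subset of the file names. The decreasing = TRUE form is a variation that relies on order() leaving ties in their original order.

```r
# Made-up subset of the file names, already sorted as in the question
vec <- c("D104.txt", "D121.txt", "D279.txt", "N101.txt", "N108.txt", "N113.txt")

# Order by first letter only, descending; ties keep their original order
by_letter <- vec[order(substr(vec, 1, 1), decreasing = TRUE)]

# Swap "D" and "N" in the sort key so that "N" names sort first
by_chartr <- vec[order(chartr("DN", "ND", vec))]

# Reversing the full ordering also reverses the numbers within each group
fully_reversed <- vec[rev(order(vec))]

by_letter
by_chartr
fully_reversed
```

The first two results put the "N" files first while keeping the numbers ascending; the third reverses the numbers too.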

Convert two matrices into a list using apply

I have two matrices with the same number of columns but different numbers of rows:
a <- cbind(runif(5), runif(5))
b <- cbind(runif(8), runif(8))
I want to combine these in a single list, so that the first columns of a and b are paired with each other, and so on:
my_result <- list(list(a[,1], b[,1]), list(a[,2], b[,2]))
So the result would look like this:
> print(my_result)
[[1]]
[[1]][[1]]
[1] 0.9440956 0.7259602 0.7804068 0.7115368 0.2771190
[[1]][[2]]
[1] 0.4155642 0.1535414 0.6983123 0.7578231 0.2126765 0.6753884 0.8160817
[8] 0.6548915
[[2]]
[[2]][[1]]
[1] 0.7343330 0.7751599 0.4463870 0.6926663 0.9692621
[[2]][[2]]
[1] 0.5708726 0.1234482 0.2875474 0.4760349 0.2027653 0.5142006 0.4788264
[8] 0.7935544
I can't figure out how to do that without a for loop, but I'm pretty sure some *pply magic could be used here.
Any directions would be much appreciated.

I'm not sure how general a solution you're looking for (arbitrary number of matrices, ability to pass a list of matrices, etc.) but this works for your specific example:
lapply(1:2,function(i){list(a[,i],b[,i])})
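
A slightly more general sketch, assuming all matrices have the same number of columns, takes any number of matrices and derives the column count from the data instead of hard-coding 1:2. The function name pair_columns is invented for this example.

```r
set.seed(1)
a <- cbind(runif(5), runif(5))
b <- cbind(runif(8), runif(8))

# Hypothetical helper: for each column index, collect that column
# from every matrix passed in
pair_columns <- function(...) {
  mats <- list(...)
  stopifnot(length(unique(vapply(mats, ncol, integer(1)))) == 1)
  lapply(seq_len(ncol(mats[[1]])), function(i) {
    lapply(mats, function(m) m[, i])
  })
}

my_result <- pair_columns(a, b)
str(my_result)
```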
