reorder a vector to first, last, second, second last, etc - r

Is there an easy way to reorder a vector as first element, last element, second element, second-to-last element, and so on?
So for c(1,2,3,4,5) I expect to get c(1,5,2,4,3).
The reason is that I have a colour palette with 16 colours, and colour 1 is very similar to colour 2 but not to colour 16. Within my plots, however, the dots coloured with colour 1 sit close to the ones coloured with colour 2.
For my colour palette I use Set1 from ColorBrewer and also use colorRampPalette to interpolate colours in between, which makes neighbouring colours even more similar.
One workaround would be to just sample(my_colors), but I would rather reorder them as described above.

This will do what you need:
a <- c(1, 2, 3, 4, 5)
b <- rbind(a, a[5:1])  # 2 x 5 matrix: a on top, reversed a below
b[1:5]                 # read column-wise: 1 5 2 4 3
Hope this helps
You can generalise this with
rbind(a, rev(a))[1:length(a)]
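For the palette use case in the question, a sketch of how this applies (assuming RColorBrewer, as the question describes; pal and pal_reordered are my own names):
library(RColorBrewer)
pal <- colorRampPalette(brewer.pal(9, "Set1"))(16)   # 16 colours interpolated from Set1
pal_reordered <- rbind(pal, rev(pal))[1:length(pal)] # first, last, second, second-to-last, ...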

Here is an easy way to do this:
a <- seq(1, 100)
b <- a - median(a)      # signed distance from the middle
a <- a[order(-abs(b))]  # reorder by decreasing distance from the median
print(a)
[1] 1 100 2 99 3 98 4 97 5 96 6 95 7 94 8 93 9 92 10 91 11 90 12 89 13 88 14 87
[29] 15 86 16 85 17 84 18 83 19 82 20 81 21 80 22 79 23 78 24 77 25 76 26 75 27 74 28 73
[57] 29 72 30 71 31 70 32 69 33 68 34 67 35 66 36 65 37 64 38 63 39 62 40 61 41 60 42 59
[85] 43 58 44 57 45 56 46 55 47 54 48 53 49 52 50 51
From the comments:
1. From @bgoldst: a better one-line approach that doesn't involve vector names:
a[order(-abs(a-median(a)))]
2. Also from @bgoldst, for dealing with non-numeric values (here letters, ordered alphabetically by position):
letters[order(-abs(seq_along(letters)-(length(letters)+1)/2))]
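The two one-liners generalise naturally into a small helper; a sketch (the name interleave_ends is mine, not from the thread):
interleave_ends <- function(x) {
  # order positions by decreasing distance from the centre position
  x[order(-abs(seq_along(x) - (length(x) + 1) / 2))]
}
interleave_ends(c(1, 2, 3, 4, 5))
# [1] 1 5 2 4 3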

Related

In igraph, which network specifications allow groups of nodes to have the same distribution?

I am currently trying to generate a network where the degree distribution has a large variance, but with a sufficient number of nodes at each degree. For example, in igraph, if we use the Barabasi-Albert network, we can do:
g <- sample_pa(n=100,power = 1,m = 10)
g_adj <- as.matrix(as_adj(g))
rowSums(g_adj)
(output: one degree value per node -- in this run nearly all 100 values were distinct, ending in ... 93 94 95 96 97 98 99)
The above shows the degree on each of the 100 nodes. The problem for me is that I would like to only have 10-15 unique degree values, so that instead of having 93 94 95 96 97 98 99 at the end, we have instead, for example, 93 for each of the last 7 nodes. In other words, when I call
unique(rowSums(g_adj))
I'd like at most 10-15 values. Is there a way to "cluster" the nodes instead of having so many different unique degree values? thanks.
You may use sample_degseq, which generates random graphs with a given degree sequence. For instance:
degrees <- floor(seq(1, 61, length.out = 10)) # ten distinct integer degrees: 1 7 14 ... 61
times <- rep(10, 10)                          # give each of the degrees to ten vertices
g <- sample_degseq(rep(degrees, times = times), method = "vl")
table(degree(g))
# 1 7 14 21 27 34 41 47 54 61
# 10 10 10 10 10 10 10 10 10 10
Note that you may need to play with degrees and times, as ultimately rep(degrees, times = times) needs to be a graphical sequence (one realisable as the degree sequence of some graph).
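You can check that up front; a sketch (is_graphical() ships with recent igraph versions):
library(igraph)
deg_seq <- rep(floor(seq(1, 61, length.out = 10)), times = rep(10, 10))
is_graphical(deg_seq) # TRUE: some simple graph realises this degree sequence
g <- sample_degseq(deg_seq, method = "vl")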

Select values within/outside of a set of intervals (ranges) R

I've got some sort of index, like:
index <- 1:100
I've also got a list of "exclusion intervals" / ranges
exclude <- data.frame(start = c(5,50, 90), end = c(10,55, 95))
  start end
1     5  10
2    50  55
3    90  95
I'm looking for an efficient way (in R) to remove all the indexes that belong in the ranges in the exclude data frame
so the desired output would be:
1,2,3,4, 11,12,...,48,49, 56,57,...,88,89, 96,97,98,99,100
I could do this iteratively: go over every exclusion interval (e.g. with ddply) and remove the indexes that fall in it. But is there a more efficient way, or a function that does this directly?
I'm using library(intervals) to calculate my intervals; I could not find a built-in function that does this.
Another approach that looks valid could be:
starts <- findInterval(index, exclude[["start"]])
ends <- findInterval(index, exclude[["end"]]) # use exclude[["end"]] + 1L here to drop
                                              # the upper bounds from 'index' too
index[starts != (ends + 1L)] # a value above a lower bound and below the
                             # corresponding upper bound is inside that interval
The main advantage here is that no vector enumerating every element of every interval needs to be created, and that it also handles non-integer values falling inside an interval; e.g.:
set.seed(101); x = round(runif(15, 1, 100), 3)
x
# [1] 37.848 5.339 71.259 66.111 25.736 30.705 58.902 34.013 62.579 55.037 88.100 70.981 73.465 93.232 46.057
x[findInterval(x, exclude[["start"]]) != (findInterval(x, exclude[["end"]]) + 1L)]
# [1] 37.848 71.259 66.111 25.736 30.705 58.902 34.013 62.579 55.037 88.100 70.981 73.465 46.057
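One boundary caveat (my observation): as written, values equal to an interval's upper bound are kept; searching against exclude[["end"]] + 1L, as the comment in the code suggests, drops them too. Using index and exclude from the question:
starts <- findInterval(index, exclude[["start"]])
ends <- findInterval(index, exclude[["end"]])
10 %in% index[starts != (ends + 1L)]  # TRUE: the upper bound 10 survives
ends2 <- findInterval(index, exclude[["end"]] + 1L)
10 %in% index[starts != (ends2 + 1L)] # FALSE: 10 is now excluded as well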
We can use Map to build the sequences for corresponding elements of the 'start' and 'end' columns, unlist to flatten them into a single vector, and setdiff to get the values of 'index' that are not in that vector.
setdiff(index,unlist(with(exclude, Map(`:`, start, end))))
#[1] 1 2 3 4 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#[20] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
#[39] 45 46 47 48 49 56 57 58 59 60 61 62 63 64 65 66 67 68 69
#[58] 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
#[77] 89 96 97 98 99 100
Or we can use rep and sequence and then setdiff:
i1 <- with(exclude, end - start) + 1L
setdiff(index, with(exclude, rep(start, i1) + sequence(i1) - 1))
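To see what the rep/sequence expression builds, here is the expanded exclusion vector for the example data:
with(exclude, rep(start, i1) + sequence(i1) - 1)
# [1]  5  6  7  8  9 10 50 51 52 53 54 55 90 91 92 93 94 95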
NOTE: Both methods build the set of values that need to be excluded. In the case above the original vector ('index') is the sequence 1:100, so setdiff on the values works directly. If it contains arbitrary elements, use the positions instead, i.e.
index[-unlist(with(exclude, Map(`:`, start, end)))]
or
index[setdiff(seq_along(index), unlist(with(exclude, Map(`:`, start, end))))]
Another approach
> index[-do.call(c, lapply(1:nrow(exclude), function(x) exclude$start[x]:exclude$end[x]))]
[1] 1 2 3 4 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
[25] 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 56 57 58 59 60
[49] 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84
[73] 85 86 87 88 89 96 97 98 99 100

Fixing inconsistent spacing after ## in output of knitted document

Working in RStudio, I am using knitr to create PDF files with chunks of code. In the example below, notice how in the output the spacing after the ## characters differs across the three vectors.
This looks pretty neat, but I am writing a document with examples having only one line of output, and I'd like to have all the [1]'s properly in line with one another.
In the example, that would mean removing an extra space after the ##'s for the second vector. I am only starting to work with knitr and LaTeX, so I'm not sure how I would achieve such a thing. Some sort of post-processing of the .tex? Or maybe something simpler?
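A minimal chunk reproducing the issue (my reconstruction; it uses the same three vectors as the answer below):
<<spacing-demo>>=
1:5
1:10
1:100
@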
This is not a knitr problem but arises from R's printing:
> 1:5
[1] 1 2 3 4 5
> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10
> 1:100
  [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
 [19]  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36
 [37]  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54
 [55]  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72
 [73]  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89  90
 [91]  91  92  93  94  95  96  97  98  99 100
Post-processing would stop your output from looking like genuine R output.
I'd work on getting this fixed in base R (if it really is a bug and not intended) rather than try to special-case it. An R-devel email with the above example (confirmed in a recent R; the above was with 3.0.x-patched) would help clarify whether you need a workaround.
To focus attention, consider (from @Dominic Comtois's comment):
> 20:28
[1] 20 21 22 23 24 25 26 27 28
> 20:29
 [1] 20 21 22 23 24 25 26 27 28 29
Why does adding a tenth element change the way R prints the vector? (With ten elements the widest possible index label is [10], so R right-justifies [1] to that width even though only one line is printed.)
This may not necessarily be an ideal solution, but I hope it will vaguely suit your needs after some tweaks.
I've defined an "adjusted" print function:
print_adj <- function(x, adjpos=6, width=3) {
  library(stringr)
  # capture print()'s output into the character vector 'text'
  con <- textConnection("text", open="w", local=TRUE)
  sink(con)
  print(format(x, width=width), quote=FALSE)
  sink()
  close(con)
  # shift each line so that "]" lands in text column 'adjpos'
  pos <- str_locate(text, fixed("]"))
  for (i in seq_along(text))
    text[i] <- str_c(str_dup(" ", max(0, adjpos - pos[i, 1])), text[i])
  cat(text, sep="\n")
}
It prints a vector x in such a way that:
the square bracket ] always occurs in the given text column
each element occupies exactly width text columns
Sample output:
> print_adj(1:5)
   [1]   1   2   3   4   5
> print_adj(1:10)
   [1]   1   2   3   4   5   6   7   8   9  10
> print_adj(1:100)
   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28
  [29]  29  30  31  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56
  [57]  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84
  [85]  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100
If you'd like to use this function in a knitr chunk, try the following (eval=2 evaluates only the second expression; echo=1 echoes only the first):
<<eval=2, echo=1>>=
print(1:100)     # don't eval
print_adj(1:100) # don't print this cmd
@
I was able to solve my problem by defining a hook, as Gavin Simpson suggested.
\documentclass{article}
\begin{document}
<<setup, include=FALSE>>=
require(stringr)
hook.out <- function(x, options)
  return(str_c("\\begin{verbatim}",
               sub("\\s+\\[1\\]\\s+", " [1] ", x),
               "\\end{verbatim}"))
knit_hooks$set(output = hook.out)
@
<<>>=
1:9
1:10
@
\end{document}
Output now has the two [1]'s aligned.
My only remaining concern is that for longer vectors, I will need to bypass the hook and I don't know how to do that.
Credits also go to Rod Alence for his example on this page.

Stop printing after n number of lines

The max.print option (see getOption("max.print")) can be used to limit the number of values printed by a single call. For example:
options(max.print=20)
print(cars)
prints only the first 10 rows of the 2 columns. However, max.print doesn't work very well with lists. Especially if they are nested deeply, the number of lines printed to the console can still be enormous.
Is there any way to specify a harder cutoff on the amount that can be printed to the screen, for example by specifying a number of lines after which printing is interrupted? Something that also protects against printing huge recursive objects?
Based in part on this question, I would suggest just building a wrapper for print that uses capture.output to regulate what is printed:
print2 <- function(x, nlines = 10, ...)
  cat(head(capture.output(print(x, ...)), nlines), sep = "\n")
For example:
> print2(list(1:10000,1:10000))
[[1]]
[1] 1 2 3 4 5 6 7 8 9 10 11 12
[13] 13 14 15 16 17 18 19 20 21 22 23 24
[25] 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48
[49] 49 50 51 52 53 54 55 56 57 58 59 60
[61] 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84
[85] 85 86 87 88 89 90 91 92 93 94 95 96
[97] 97 98 99 100 101 102 103 104 105 106 107 108
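A small variant (my own sketch, not from the answer) that flags when output was actually cut off:
print3 <- function(x, nlines = 10, ...) {
  out <- capture.output(print(x, ...))
  if (length(out) > nlines)
    out <- c(head(out, nlines), " [ reached print limit -- output truncated ]")
  cat(out, sep = "\n")
}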

Shingles with lattice package's equal.count()

Why does the equal.count() function create overlapping shingles when it is clearly possible to create groupings with no overlap? And on what basis are the overlaps decided?
For example:
equal.count(1:100,4)
Data:
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
[23] 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
[45] 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
[67] 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
[89] 89 90 91 92 93 94 95 96 97 98 99 100
Intervals:
    min   max count
1   0.5  40.5    40
2  20.5  60.5    40
3  40.5  80.5    40
4  60.5 100.5    40
Overlap between adjacent intervals:
[1] 20 20 20
Wouldn't it be better to create groups of size 25? Or maybe I'm missing something that makes this functionality useful?
The overlap smooths the transition between shingles (which, like the roof shingles they are named after, overlap), although a windowing function such as those used in spectral analysis would arguably have been a better choice.
I believe it is a prehistoric relic: the behaviour goes back to very old pre-lattice code and is also used in coplot, a function remembered only by veteRans. lattice::equal.count calls co.intervals in the graphics package, where you will find some explanation. For non-overlapping shingles, try:
lattice::equal.count(1:100, 4, overlap = 0)
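A quick way to inspect the result (a sketch; levels() on a shingle returns its intervals):
library(lattice)
eq <- equal.count(1:100, 4, overlap = 0)
levels(eq) # four disjoint intervals, each covering 25 values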
