order() not behaving as expected - r

Ok. I am pretty convinced I am about to embarrass myself, but here we go.
I cannot get order() to work properly. I am trying to come up with a composite ranking by two different factors, which is distilled down to an example below:
test1 <- rnorm(5)
test2 <- abs(rnorm(5))
test1; test2
> 0.4839582 0.1665794 -0.7648058 -0.5492701 0.6616983
> 0.8491913 0.2840523 2.3413548 0.7299879 0.1584666
test1Ord <- order(test1, decreasing = TRUE)
test2Ord <- order(test2)
test3Ord <- test1Ord + test2Ord
test1Ord; test2Ord; test3Ord
> 5 1 2 4 3
> 5 2 4 1 3
> 10 3 6 5 6
order(as.numeric(test3Ord), decreasing = TRUE)
> 1 3 5 4 2
As you can see, the vector c(10, 3, 6, 5, 6) should be ordered 1, 5, 3, 4, 2 or 1, 5, 2, 4, 3 (because of the tie at 6). This is not what the output is.
Am I missing something?!

It looks like I was looking for rank(). (I was previously unaware of this function.) I am pretty familiar with order(), but got mixed up in what I was trying to do.
The rank() of the vector provides what I was going for.
Thanks to all for setting me straight!
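To make the difference concrete, here is a minimal sketch using the summed vector from the question. order() returns the positions of the elements in sorted order, while rank() returns the rank of each element in place. The choice of ties.method = "first" is an assumption made here so that the tie at 6 is broken by position; the default would give both tied elements 2.5.

```r
x <- c(10, 3, 6, 5, 6)

# order(): which positions hold the largest, second largest, ... values
order(x, decreasing = TRUE)
#> [1] 1 3 5 4 2

# rank(): the rank of each element, with 1 = largest (hence the negation)
rank(-x, ties.method = "first")
#> [1] 1 5 2 4 3
```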

Related

Change order of vector of nodes from level order to infix order in r

I have a vector of nodes taken from a binary regression tree. These are in level order, for example, 1,2,4,5,10,11. I would like to place them in infix order like so: 4,2,10,5,11,1. Thanks to Alistaire I have a solution that uses recursion. But as they point out, "There has to be a better way". I was hoping someone might be able to help me out with a non-recursive approach. The recursive version is very slow for vectors of any reasonable length. I have also tried creating a binary tree using igraph and data.tree but I cannot seem to get the ordering I want from these.
Yes, it's possible to do this without recursion, since you are dealing with a binary tree, which has a fixed structure (node n has children 2n and 2n + 1), like the following tree with depth 5:
Suppose we have a vector of your nodes:
nodes <- c(1, 2, 4, 5, 10, 11)
First of all, we only want a binary tree that is of a suitable depth to accommodate your largest node. We can get the required depth by doing:
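# floor(log2(n)) + 1 rather than ceiling(log(n, 2)), so that a largest node
# that is an exact power of 2 (e.g. 4 or 8) still fits in the tree
depth <- floor(log2(max(nodes))) + 1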
And a data frame that gives the node number, depth and 'leftness' of a sufficiently large binary tree like this:
df <- data.frame(node = seq(2^(depth) - 1),
                 depth = rep(seq(depth), times = 2^(seq(depth) - 1)),
                 leftness = unlist(sapply(2^seq(depth) - 1,
                   function(x) (seq(x)[seq(x) %% 2 == 1]) / (x + 1))))
However, we only need the subset of this tree that matches your nodes:
df <- df[match(nodes, df$node),]
df
#>    node depth leftness
#> 1     1     1   0.5000
#> 2     2     2   0.2500
#> 4     4     3   0.1250
#> 5     5     3   0.3750
#> 10   10     4   0.3125
#> 11   11     4   0.4375
And we can sort the nodes in order according to leftness:
df$node[order(df$leftness)]
#> [1] 4 2 10 5 11 1
Which is your expected result.
To generalize this, just put the above steps in a function:
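sort_left <- function(nodes) {
  # Depth of the smallest full binary tree that contains the largest node;
  # floor(log2(n)) + 1 also works when the largest node is a power of 2
  depth <- floor(log2(max(nodes))) + 1
  df <- data.frame(node = seq(2^(depth) - 1),
                   depth = rep(seq(depth), times = 2^(seq(depth) - 1)),
                   leftness = unlist(sapply(2^seq(depth) - 1,
                     function(x) (seq(x)[seq(x) %% 2 == 1]) / (x + 1))))
  # Keep only the requested nodes, then sort them from left to right
  df <- df[match(nodes, df$node), ]
  df$node[order(df$leftness)]
}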
So we can do:
sort_left( c(1, 2, 4, 5, 10, 11))
#> [1] 4 2 10 5 11 1
Or, given the example in your original question,
sort_left(c(1,2,4,5,10,11,20,21))
#> [1] 4 2 20 10 21 5 11 1
Which was the desired result. All without recursion.
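If the tree is deep, building the full data frame can get large, since it has 2^depth - 1 rows. As a possible alternative, the same leftness value can be computed directly for each node. This is only a sketch, assuming the usual numbering where the root is 1 and node n has children 2n and 2n + 1; sort_infix, level, and pos are names made up for this example.

```r
# Hypothetical helper: in-order ("infix") sort of heap-numbered nodes
sort_infix <- function(nodes) {
  level <- floor(log2(nodes))                # 0 for the root, 1 for nodes 2-3, ...
  pos <- nodes - 2^level                     # 0-based position within its level
  leftness <- (2 * pos + 1) / 2^(level + 1)  # horizontal position in (0, 1)
  nodes[order(leftness)]
}

sort_infix(c(1, 2, 4, 5, 10, 11))
#> [1]  4  2 10  5 11  1
sort_infix(c(1, 2, 4, 5, 10, 11, 20, 21))
#> [1]  4  2 20 10 21  5 11  1
```

The leftness values match the ones in the data frame above (0.5 for node 1, 0.3125 for node 10, and so on), but the work is proportional to the number of nodes supplied rather than to the size of the full tree.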

Conditional Replacement Column Content--many ids to be updated

Thinking I could take the easy way out, I was going to use ifelse to replace id codes across an entire dataset. I have a dataset with an id column. I have to replace these old ids with updated ids, but there are 50k+ rows with 270 unique ids. So, I first tried:
df$id <- ifelse(df$id == 2, 1,
         ifelse(df$id == 3, 5,
         ifelse(df$id == 4, 5,
         ifelse(df$id == 6, NA,
         ifelse(df$id == 7, 7,
         ifelse(df$id == 285, NA,
         ifelse(df$id == 8, 10,.....
         ifelse(df$id == 200, 19, df$id)
While this would have worked, I am limited to 51 nests, and I cannot split the replacements into separate batches, because each batch would cover only about a quarter of the set. The updates from the earlier batches would then interfere with the later ones, since the codes overlap.
I then tried
df$id[df$id== 2] <- 1
and I was going to do that for every code. However, if I update all 2s to 1, there is a later code for which both the old and the new 1s would become some number X, and I only want the old 1s to become X. I actually think this rules out the nested ifelse even if 51 were not the limit. Is there a function similar to VLOOKUP in Excel? Any ideas?
Thanks!
An older thread about replacing cell contents, which does not work in my case:
Replace contents of factor column in R dataframe
partial example
df <- data.frame(id=seq(1, 10))
old.id <- c(2, 3, 4, 6)
new.id <- c(1, 5, 5, NA)
df$id[df$id %in% old.id] <- new.id[unlist(sapply(df$id, function(x) which(old.id==x)))]
output
> df
   id
1   1
2   1
3   5
4   5
5   5
6  NA
7   7
8   8
9   9
10 10
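A VLOOKUP-style alternative is match(), which looks every id up in the old codes in a single pass. Here is a minimal sketch with made-up data in which 1 is both an old code and a new code, which is the overlap the question worries about; idx and hit are names introduced only for this example.

```r
# Hypothetical data: old 1 becomes 9, old 2 becomes 1, old 3 becomes NA
df <- data.frame(id = c(1, 2, 3, 4, 2, 1))
old.id <- c(1, 2, 3)
new.id <- c(9, 1, NA)

idx <- match(df$id, old.id)      # position of each id in old.id, NA if not listed
hit <- !is.na(idx)               # rows whose id has a replacement
df$id[hit] <- new.id[idx[hit]]   # every lookup was done on the original ids
df$id
#> [1]  9  1 NA  4  1  9
```

Because all lookups are done before anything is assigned, the 2s that become 1 are not turned into 9 afterwards, and ids without a replacement (4 here) are left alone.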

Using rollapply to process time window -- not fixed sample window for irregular time series

I am trying to compute various stats for a time window (think 20 seconds) for various signals which may or may not be recorded at every sample window. Additionally, the sampling interval is not regular -- it may be 2 or 3 or 4 seconds. Consider where t is the elapsed seconds of the experiment and d is the measurement:
require('zoo')
t<- c( 0, 1, 2, 4, 5, 6, 9, 10 )
d<- c( 2, 2, 2, 4, 4, 4, 8, 10 )
z<- zoo(d, t)
Now, as you see, there are no measurements at 3, 7, or 8 seconds. I would like to compute something like the max value in a 3 second window. Ideally my output would be like
NA, 2, 2, 4, 4, 4, 8, NA
(I don't need the NAs -- just trying to make the example clear.)
trying:
rollapply( z, 3, max)
1 2 4 5 6 9
2 4 4 4 8 10
Not quite what I'm looking for! Consider the rollapply result at t[3]. This should be 2 not 4 as the non-existent measure at 3s is IN the window, but the existing measurement at 4s is NOT. It "looks" like the results are just shifted, but you can play around with other numbers and realize it's just plain wrong.
I'm a noob to zoo, but fairly experienced in signal processing. Can't quite seem to get this to do what I need.
Thanks in advance.
Fill in the series with NAs at the missing points using a grid g and then use rollapplyr to right align the window (the default for rollapply is center alignment):
library(zoo)
g <- seq(start(z), end(z), 1.0)
zz <- merge(z, zoo(, g))
rollapplyr(zz, 3, max, na.rm = TRUE)
giving:
2 3 4 5 6 7 8 9 10
2 2 4 4 4 4 4 8 10
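For comparison, a window defined in seconds rather than in samples can also be sketched in base R without filling in the gaps. This is only an illustration and not part of the zoo approach above; t_sec, width, and win_max are names made up here, and the window is assumed to cover the last 3 seconds up to and including the current time.

```r
t_sec <- c(0, 1, 2, 4, 5, 6, 9, 10)   # elapsed seconds (irregular)
d     <- c(2, 2, 2, 4, 4, 4, 8, 10)   # measurements
width <- 3                            # window length in seconds

# For each observation time, take the max over the interval (now - width, now]
win_max <- sapply(t_sec, function(now) max(d[t_sec > now - width & t_sec <= now]))
win_max
#> [1]  2  2  2  4  4  4  8 10
```

Unlike the gridded version, this returns values only at the observed times and also for the partial windows at the start of the series.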

How to preserve the order of a vector in a table in R?

Pretty simple question, I assume. I am trying to do this for a different type of object (with class 'acf' and type 'list'), but I assume the answer is easily extendable for a vector (class numeric, type 'double'):
x<-c(4, 5, 6, 1, 2, 10, 15)
table(x)
x
1 2 4 5 6 10 15
1 1 1 1 1 1 1
I would like the output of the table to be in the same order as the vector (4, 5, 6, 1, 2, 10, 15). How can I achieve this?
table(factor(x, levels=unique(x)))
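A small sketch of why this works: table() reports counts in the order of the factor levels, and unique() returns values in order of first appearance. The vector below is made up and contains a repeat to show that the counts are still correct.

```r
x <- c(4, 5, 4, 1)
tab <- table(factor(x, levels = unique(x)))
names(tab)       # levels follow first appearance in x
#> [1] "4" "5" "1"
as.vector(tab)   # counts in that same order
#> [1] 2 1 1
```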

Benford's Law in R

I'm trying to implement Benford's Law in R. So far, everything works as expected, except that if some first digits have 0 occurrences, an error is thrown:
Error in data.frame(digit = 1:9, actual.count = first_digit_counts, actual.fraction = first_digit_counts/nrow(fraudDetection), :
arguments imply differing number of rows: 9, 5
This is because for my current data set, there are only first digits starting with 1, 2, 7, 8 and 9. How can I make it such that 3, 4, 5, 6 will have a count of 0 instead of not appearing at all in the table?
This is the part that is causing the exception to be thrown:
first_digit_counts <- as.vector(table(fraudDetection$first.digit))
The full code that this line belongs to is as follows:
# load the required packages
require(reshape)
require(stringr)
require(plyr)
require(ggplot2)
require(scales)
# load in data from CSV file
fraudDetection <- read.csv("Fraud Case in Arizona 1993.csv")
names(fraudDetection)
# take only the columns containing the counts and manipulate the data into a "long" format with only one value per row
# let's try to compare the amount of the fraudulent transactions against the Benford's Law
fraudDetection <- melt(fraudDetection["Amount"])
# add columns containing the first and last digits, extracted using regular expressions
fraudDetection <- ddply(fraudDetection, .(variable), transform, first.digit = str_extract(value, "[123456789]"), last.digit = str_extract(value, "[[:digit:]]$"))
# compare counts of each actual first digit against the counts predicted by Benford’s Law
first_digit_counts <- as.vector(table(fraudDetection$first.digit))
first_digit_actual_vs_expected <- data.frame(
  digit = 1:9,
  actual.count = first_digit_counts,
  actual.fraction = first_digit_counts / nrow(fraudDetection),
  benford.fraction = log10(1 + 1 / (1:9))
)
In order to ensure that all digits are represented in first_digit_counts, you can convert first.digit to a factor, explicitly setting the levels so they include all digits from 1 to 9:
first_digit = c(1, 1, 3, 5, 5, 5, 7, 7, 7, 7, 9)
first_digit_factor = factor(first_digit, levels=1:9) # Explicitly set the levels
That makes your table calls perform as expected:
> table(first_digit)
first_digit
1 3 5 7 9
2 1 3 4 1
> table(first_digit_factor)
first_digit_factor
1 2 3 4 5 6 7 8 9
2 0 1 0 3 0 4 0 1
> as.vector(table(first_digit_factor))
[1] 2 0 1 0 3 0 4 0 1
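Applied to the code in the question, the change would look roughly like this. It is a sketch on made-up digits rather than the real CSV: first.digit stands in for fraudDetection$first.digit, which is a character column because it comes from str_extract, and the digits 1, 2, 7, 8 and 9 mirror the ones mentioned in the question.

```r
# Hypothetical stand-in for fraudDetection$first.digit
first.digit <- c("1", "1", "2", "7", "8", "9", "9")

# Fixing the levels to 1:9 keeps the digits that never occur, with count 0
first_digit_counts <- as.vector(table(factor(first.digit, levels = 1:9)))
first_digit_counts
#> [1] 2 1 0 0 0 0 1 1 2

first_digit_actual_vs_expected <- data.frame(
  digit = 1:9,
  actual.count = first_digit_counts,
  actual.fraction = first_digit_counts / length(first.digit),
  benford.fraction = log10(1 + 1 / (1:9))
)
```

With all nine counts present, every column has 9 rows and the data.frame call no longer fails.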
A function for this is available from the rattle package
library(rattle)
dummy <- rnorm(100)
calcInitialDigitDistr(dummy, split = "none")
Useful one line function
benford = function(x) barplot(table(as.numeric(substr(x,1,1))))
benford(ggplot2::diamonds$price)
