Cumulative sum after 64 in R (conditional cumulative sum)

I am new to R. I can do the following in Excel, but I am not sure how to do it in R. Can anybody help me with this?
I want the running total of a counter value that resets once it reaches 64.
The following is my data:
x
57
57
57
57
57
57
58
58
58
58
61
61
62
62
1
1
11
16
16
16
16
16
16
22
22
22
27
28
I want the cumulative sum to continue after the counter wraps at 64. I am not sure how to do that in R.
The following is the output I require:
x
57
57
57
57
57
57
58
58
58
58
61
61
62
62
65
65
76
81
81
81
81
81
81
87
87
87
92
93
Can anybody help with doing this?
Thanks

If your data resets at 64 and continues on, and you want the cumulative total to carry through the reset, try:
diffs <- c(dat$x[1], diff(dat$x))
diffs[diffs < 0] <- 64 + diffs[diffs < 0]
cumsum(diffs)
An explanation:
The first line computes the differences from one value to the next, prepending the initial value (57 in the example) so the running total starts from it.
The second line finds all negative differences (which occur where the counter wrapped) and replaces each with 64 plus its value: if the counter went from 62 to 2, the raw difference is -60 and the true step is 64 + (-60) = 4 (2 to reach 64, then 2 more).
The third line takes the cumsum to give the final values.
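A self-contained version using the data from the question (assuming, as the answer does, that the vector is stored in a data frame `dat` with a column `x`):

```r
# Counter values that wrap around at 64 (data from the question)
dat <- data.frame(x = c(57, 57, 57, 57, 57, 57, 58, 58, 58, 58, 61, 61,
                        62, 62, 1, 1, 11, 16, 16, 16, 16, 16, 16, 22,
                        22, 22, 27, 28))

# Step sizes between consecutive readings, seeded with the first value
diffs <- c(dat$x[1], diff(dat$x))

# A negative step means the counter wrapped past 64; add 64 to recover
# the true step (e.g. 62 -> 1 is a step of 3: 62, 63, 64/0, 1)
diffs[diffs < 0] <- 64 + diffs[diffs < 0]

dat$cum <- cumsum(diffs)
dat$cum
# matches the desired output: ..., 62, 62, 65, 65, 76, 81, ..., 92, 93
```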

Related

R: indexing by T/F vector returns more values than it should -why?

I stumbled upon something weird, and I don't know why R behaves the way it does.
I have a vector with the values 1:100 (vec). I only want the values lower than 50. Normally I'd write vec[vec<50] and be happy. Today, however, I assigned the logical vector vec<50 to an object x for demonstration purposes. To show that this approach is dangerous, since it selects by position rather than by value, I used x to index another vector, vec2 <- c(vec, 20, 50, 100, 10). Oddly, it also returns the added values, even though their positions are beyond the length of x, and I can't find an explanation for why it does that.
vec <- 1:100
x <- vec<50 #logical vector of length 100
vec[x] #selects the values just as
vec[vec<50] #does
#now let's add some values
vec2 <- c(vec, 20,50,100,10)
vec2[x]
#returns:
# [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
#[30] 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 20 50 100 10
# so adds 20, 50, 100, 10 although those positions are not in x
#though it omits those new positions (not in x) when I look at the values which are not TRUE in x
vec2[x==F]
# [1] 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78
#[30] 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
confused greetings,
Lea
It is basically due to recycling: when one vector is shorter than the other, the shorter vector is reused from its beginning. Here 'x' is a logical vector of the same length as vec, but when we concatenated four more elements to create 'vec2', 'x' is recycled from the start, i.e. x[1:4] is reused for the last four positions. This can be checked by:
v1 <- vec2[x]
v2 <- vec2[c(x, x[1:4])]
identical(v1, v2)
#[1] TRUE
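The recycling is easier to see at a small scale; this sketch shows a short logical index being recycled, then reproduces the question's situation:

```r
# A length-2 logical index is recycled across a length-6 vector
v <- 1:6
v[c(TRUE, FALSE)]   # behaves like v[c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)]
# [1] 1 3 5

# The question's case: length-100 logical index, length-104 vector
vec  <- 1:100
x    <- vec < 50
vec2 <- c(vec, 20, 50, 100, 10)

# x[1:4] are all TRUE, so the four appended values are selected too
identical(vec2[x], vec2[c(x, x[1:4])])
# [1] TRUE
```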

Contradiction between complete.cases, which, and is.na that I do not understand

I use a simple example from dataset "airquality".
The first four rows are complete, which can be checked with complete.cases.
Rows 5 and 6 both contain missing values, which can be verified quickly by:
is.na(airquality[5,])
is.na(airquality[6,])
I would expect which(is.na(airquality)) to give me the row numbers that contain at least one NA value.
However, it lists 5, 10, 25, ..., i.e. row number 6 is NOT listed. Why? There is an NA value in row 6!
library(datasets)
complete.cases(airquality)
is.na(airquality[5,])
is.na(airquality[6,])
which(is.na(airquality))
There is obviously something that I do not understand here.
From help("is.na"):
The data frame method for is.na returns a logical matrix with the same
dimensions as the data frame, and with dimnames taken from the row and
column names of the data frame.
In other words, it's not giving you the information you're assuming it is. which() is counting the elements of the logical matrix described above in column-major order (down the columns), so an NA in a later column shows up as a linear position larger than the number of rows. Try
# get the cases with missingness
which(!complete.cases(airquality))
[1] 5 6 10 11 25 26 27 32 33 34 35 36 37 39 42 43 45 46 52
[20] 53 54 55 56 57 58 59 60 61 65 72 75 83 84 96 97 98 102 103
[39] 107 115 119 150
# and check against is.na
unique(sort(which(is.na(airquality), arr.ind = TRUE)[ , 1]))
[1] 5 6 10 11 25 26 27 32 33 34 35 36 37 39 42 43 45 46 52
[20] 53 54 55 56 57 58 59 60 61 65 72 75 83 84 96 97 98 102 103
[39] 107 115 119 150
all.equal(which(!complete.cases(airquality)),
unique(sort(which(is.na(airquality), arr.ind = TRUE)[ , 1])))
TRUE
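A toy data frame (made up for illustration) makes the column-major counting concrete: an NA in row 2 of the second column comes back from which() as linear position 5, not as row 2:

```r
# 3 rows, 2 columns; an NA in row 2 of each column
df <- data.frame(a = c(1, NA, 3), b = c(4, NA, 6))

which(is.na(df))
# [1] 2 5
# positions counted down column 'a' (1..3), then down column 'b' (4..6)

which(is.na(df), arr.ind = TRUE)  # returns (row, col) pairs: (2, 1) and (2, 2)
```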

Select values within/outside of a set of intervals (ranges) R

I've got some sort of index, like:
index <- 1:100
I've also got a list of "exclusion intervals" / ranges
exclude <- data.frame(start = c(5,50, 90), end = c(10,55, 95))
start end
1 5 10
2 50 55
3 90 95
I'm looking for an efficient way (in R) to remove all the indexes that belong in the ranges in the exclude data frame
so the desired output would be:
1,2,3,4, 11,12,...,48,49, 56,57,...,88,89, 96,97,98,99,100
I could do this iteratively: loop over every exclusion interval and remove the indexes that fall in each one. But is there a more efficient way (or an existing function) to do this?
I'm using library(intervals) to compute my intervals, but I could not find a built-in function that does this.
Another approach that looks valid could be:
starts = findInterval(index, exclude[["start"]])
ends = findInterval(index, exclude[["end"]]) ## use exclude[["end"]] + 1L instead if the
                                             ## upper bounds should be excluded from 'index' too
index[starts != (ends + 1L)] ## a value above a lower bound and below
                             ## an upper bound is inside that interval
The main advantage here is that no vector enumerating every element of every interval needs to be created, and that it handles arbitrary (including non-integer) values falling inside an interval; e.g.:
set.seed(101); x = round(runif(15, 1, 100), 3)
x
# [1] 37.848 5.339 71.259 66.111 25.736 30.705 58.902 34.013 62.579 55.037 88.100 70.981 73.465 93.232 46.057
x[findInterval(x, exclude[["start"]]) != (findInterval(x, exclude[["end"]]) + 1L)]
# [1] 37.848 71.259 66.111 25.736 30.705 58.902 34.013 62.579 55.037 88.100 70.981 73.465 46.057
We can use Map to get the sequence for each corresponding pair in the 'start' and 'end' columns, unlist to flatten them into a single vector, and setdiff to get the values of 'index' that are not in that vector.
setdiff(index,unlist(with(exclude, Map(`:`, start, end))))
#[1] 1 2 3 4 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
#[20] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
#[39] 45 46 47 48 49 56 57 58 59 60 61 62 63 64 65 66 67 68 69
#[58] 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88
#[77] 89 96 97 98 99 100
Or we can use rep and then use setdiff.
i1 <- with(exclude, end - start) + 1L
setdiff(index,with(exclude, rep(start, i1)+ sequence(i1)-1))
NOTE: In both methods, the constructed vector contains the positions that need to be excluded. In the case above the original vector ('index') is the sequence 1:100, so values and positions coincide and setdiff works directly on the values. If 'index' contains arbitrary elements, subset with the position vector instead, i.e.
index[-unlist(with(exclude, Map(`:`, start, end)))]
or
index[setdiff(seq_along(index), unlist(with(exclude,
Map(`:`, start, end))))]
Another approach
> index[-do.call(c, lapply(1:nrow(exclude), function(x) exclude$start[x]:exclude$end[x]))]
[1] 1 2 3 4 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
[25] 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 56 57 58 59 60
[49] 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84
[73] 85 86 87 88 89 96 97 98 99 100
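When 'index' is not simply 1:100, the position-based form from the note applies. A quick sketch with made-up values (even numbers, so that values and positions differ):

```r
idx <- seq(2, 200, by = 2)                     # 100 arbitrary values
exclude <- data.frame(start = c(5, 50, 90), end = c(10, 55, 95))

# drop the *positions* 5:10, 50:55 and 90:95, not the values
keep <- idx[-unlist(with(exclude, Map(`:`, start, end)))]

length(keep)   # 100 values minus 3 intervals of 6 positions each
# [1] 82
```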

Reorder a vector to first, last, second, second last, etc.

Is there an easy way to reorder a vector like:
first element, last element, second element, second-last element, etc.?
So for c(1,2,3,4,5) I expect to get c(1,5,2,4,3).
The reason: I have a colour palette with 16 colours, and colour 1 is very similar to colour 2 but not to colour 16. Within my plots, however, the dots coloured with colour 1 sit right next to the ones coloured with colour 2.
For my palette I use Set1 from ColorBrewer and colorRampPalette to interpolate colours in between, which makes neighbouring colours even more similar.
One option would be to just sample(my_colors), but I would rather reorder them as described above.
This will do what you need:
a <- c(1,2,3,4,5)
b <- rbind(a, a[5:1])
c <- b[1:5]
Hope this helps
You can generalise this with
rbind(a,rev(a))[1:length(a)]
Here is an easy way to do this:
a <- seq(1, 100)
b <- a - median(a)
names(a) <- b
a <- order(-abs(b))
print(a)
[1] 1 100 2 99 3 98 4 97 5 96 6 95 7 94 8 93 9 92 10 91 11 90 12 89 13 88 14 87
[29] 15 86 16 85 17 84 18 83 19 82 20 81 21 80 22 79 23 78 24 77 25 76 26 75 27 74 28 73
[57] 29 72 30 71 31 70 32 69 33 68 34 67 35 66 36 65 37 64 38 63 39 62 40 61 41 60 42 59
[85] 43 58 44 57 45 56 46 55 47 54 48 53 49 52 50 51
From the comments:
1: From @bgoldst: a better (one-line) approach that doesn't involve vector names:
a[order(-abs(a-median(a)))]
2: (Also from @bgoldst) For non-numeric values (here, letters in alphabetical order):
letters[order(-abs(seq_along(letters)-(length(letters)+1)/2))]
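Both one-liners can be checked against the example from the question:

```r
a <- c(1, 2, 3, 4, 5)

# interleave first/last with rbind, then read the matrix column-major
rbind(a, rev(a))[1:length(a)]
# [1] 1 5 2 4 3

# order by distance from the median, farthest first (ties keep original order)
a[order(-abs(a - median(a)))]
# [1] 1 5 2 4 3
```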

Producing numeric sequences in R using standard patterns

I am working on a project where I need to enter a number of "T score" tables into R. These are tables used to convert raw test scores into standardized values. They generally follow a specific pattern, but not one that is simple. For instance, one pattern is:
34,36,39,42,44,47,50,52,55,58,60,63,66,68,
71,74,76,79,82,84,87,90,92,95,98,100,103,106
I'd prefer to use a simple function to fill these in, rather than typing them by hand. I know that seq() can create a simple sequence, like:
R> seq(1,10,2)
[1] 1 3 5 7 9
Is there any way to create more complex sequences based on specific patterns? For instance, the above data could be done as:
c(34, seq(36:106, c(3,3,2))) # The pattern goes 36,39,42,44,47,50,52 (+3,+3,+2)
...however, this results in an error. I thought there would be a function that should do this, but all my Google-fu has just brought me back to the original seq().
This could be done using the cumsum (cumulative sum) function and rep:
> 31 + cumsum(rep(c(3, 2, 3), 9))
[1] 34 36 39 42 44 47 50 52 55 58 60 63 66 68 71 74 76 79 82
[20] 84 87 90 92 95 98 100 103
To make sure the sequence stops at the right place (including the final 106), repeat one extra time and truncate:
> (31 + cumsum(rep(c(3, 2, 3), 10)))[1:28]
[1] 34 36 39 42 44 47 50 52 55 58 60 63 66 68 71 74 76 79 82
[20] 84 87 90 92 95 98 100 103 106
Here is a custom function that should work in most cases. It uses the cumulative sum (cumsum()) of a sequence, and integer division to calculate the length of the desired sequence.
cseq <- function(from, to, by){
  times <- (to - from) %/% sum(by)
  x <- cumsum(c(from, rep(by, times + 1)))
  x[x <= to]
}
Try it:
> cseq(36, 106, c(3,3,2))
[1] 36 39 42 44 47 50 52 55 58 60 63 66 68 71 74 76 79 82 84 87 90 92 95 98
[25] 100 103 106
> cseq(36, 109, c(3,3,2))
[1] 36 39 42 44 47 50 52 55 58 60 63 66 68 71 74 76 79 82 84 87 90 92 95 98
[25] 100 103 106 108
Here is a non-iterative solution, in case you need a specific element of the sequence:
f <- function(x){
  d <- x %/% 3
  r <- x %% 3
  31 + d*8 + c(0,3,5)[r+1]
}
> f(1:10)
[1] 34 36 39 42 44 47 50 52 55 58
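As a sanity check, the closed form reproduces the full 28-value table from the question:

```r
# closed-form element formula: each group of 3 steps adds 3 + 2 + 3 = 8
f <- function(x){
  d <- x %/% 3
  r <- x %% 3
  31 + d*8 + c(0,3,5)[r+1]
}

target <- c(34, 36, 39, 42, 44, 47, 50, 52, 55, 58, 60, 63, 66, 68,
            71, 74, 76, 79, 82, 84, 87, 90, 92, 95, 98, 100, 103, 106)
identical(f(1:28), target)
# [1] TRUE
```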
