I need some hints on writing an efficient loop over a vector without using a "for" loop, because of performance issues.
At first glance, it seems that functions such as apply() and sapply() are recommended instead.
I have a vector converted into a matrix:
x1<-c(1,2,4,1,4,3,5,3,1,0)
Looping through the vector, I need to set x1[i+1] = x1[i] whenever x1[i] > x1[i+1].
Example:
Input vector:
x1<-as.matrix(c(1,2,4,1,4,3,5,3,1,0))
Output vector:
c(1,2,4,4,4,4,5,5,5,5)
My approach is to pass a user-defined function to apply(), but I am having difficulty expressing the relation between x1[i] and x1[i+1] inside that function.
I would be very grateful for your ideas or hints.
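For reference, a direct for-loop version of the rule above (the kind of loop I am trying to avoid; just a rough sketch on the plain vector) would look like this:
x1 <- c(1,2,4,1,4,3,5,3,1,0)
for (i in seq_len(length(x1) - 1)) {
  # carry the larger value forward whenever the next element is smaller
  if (x1[i] > x1[i + 1]) x1[i + 1] <- x1[i]
}
x1
# [1] 1 2 4 4 4 4 5 5 5 5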
In general, you can use Reduce with accumulate = TRUE for cumulative operations:
Reduce(max,x1,accumulate=TRUE)
# [1] 1 2 4 4 4 4 5 5 5 5
But as @Khashaa points out, the common cases cumsum, cumprod, cummin and, in your case, cummax are provided as efficient base functions:
cummax(x1)
# [1] 1 2 4 4 4 4 5 5 5 5
We could do this using ave (using the vector x1):
ave(x1,cumsum(c(TRUE,x1[-1]>x1[-length(x1)])), FUN=function(x) head(x,1))
#[1] 1 2 4 4 4 4 5 5 5 5
We create a grouping variable based on the condition described in the OP's post: check whether the succeeding element (x1[-1], the vector with its first element removed) is greater than the current element (x1[-length(x1)], the vector with its last element removed).
x1[-1]>x1[-length(x1)]
#[1] TRUE TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE
This comparison is one element shorter than the vector x1, so we prepend a TRUE to make the lengths equal and then take the cumsum:
cumsum(c(TRUE,x1[-1]>x1[-length(x1)]))
#[1] 1 2 3 3 4 4 5 5 5 5
We use this as the grouping variable in ave and select the first observation of x1 within each group.
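To see the groups that ave operates on, we can split x1 by this grouping variable (purely for illustration):
split(x1, cumsum(c(TRUE, x1[-1] > x1[-length(x1)])))
# $`1`
# [1] 1
# $`2`
# [1] 2
# $`3`
# [1] 4 1
# $`4`
# [1] 4 3
# $`5`
# [1] 5 3 1 0
ave then replaces every element of each group with head(x, 1), i.e. the group's first value, giving 1 2 4 4 4 4 5 5 5 5.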
Another option would be to get the logical index (c(TRUE, x1[-1] > x1[-length(x1)])) as before, negate it (!) so that TRUE becomes FALSE and FALSE becomes TRUE, convert the TRUE values to NA (NA^(!...)), and then use na.locf from library(zoo) to replace the NA values with the preceding non-NA value.
library(zoo)
na.locf(x1*NA^(!c(TRUE,x1[-1]>x1[-length(x1)])))
#[1] 1 2 4 4 4 4 5 5 5 5
My input is a vector like this:
v = c(1,2,2,3,4,5,4,1,1)
unique(v) == c(1,2,3,4,5)
Instead, I need to apply the uniqueness check only to pairs of adjacent elements, so that:
.f(v) == c(1,2,3,4,5,4,1)
Use rle from base R and extract the 'values'
rle(v)$values
[1] 1 2 3 4 5 4 1
unique gets the unique values from the whole vector, whereas rle returns the 'values' and their 'lengths' for each run of adjacent equal elements.
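For example, printing the full rle object shows both components (output roughly as below):
rle(v)
# Run Length Encoding
#   lengths: int [1:7] 1 2 1 1 1 1 2
#   values : num [1:7] 1 2 3 4 5 4 1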
Or, another option is to compare each element with the preceding one, take the cumulative sum of that change indicator to form run identifiers, and use duplicated to keep only the first element of each run:
v[!duplicated(cumsum(c(TRUE, v[-1] != v[-length(v)])))]
[1] 1 2 3 4 5 4 1
Another possible solution:
v[v != dplyr::lag(v, default = Inf)]
#> [1] 1 2 3 4 5 4 1
I have a vector of alternating TRUE and FALSE values:
dat <- c(T,F,F,T,F,F,F,T,F,T,F,F,F,F)
I'd like to number each instance of TRUE with a unique sequential number and to assign each FALSE value the number associated with the TRUE value preceding it.
Therefore, my desired output using the example dat above (which has 4 TRUE values) is:
1 1 1 2 2 2 2 3 3 4 4 4 4 4
What I tried:
I've tried the following (which works), but I know there must be a simpler solution!!
# positions of the TRUE and FALSE values
whichT <- which(dat == T)
whichF <- which(dat == F)
# for each TRUE, collect the FALSE positions that come before the next TRUE
l1 <- lapply(1:length(whichT),
             FUN = function(x)
               which(whichF > whichT[x] & whichF < whichT[(x + 1)]))
l1[[length(l1)]] <- which(whichF > whichT[length(whichT)])
# replacement values for the FALSE positions (repeat the group number)
replaceFs <- unlist(
  lapply(1:length(whichT),
         function(x) l1[[x]] <- rep(x, length(l1[[x]]))))
replaceTs <- 1:length(whichT)
dat2 <- dat
dat2[whichT] <- replaceTs
dat2[whichF] <- replaceFs
dat2
[1] 1 1 1 2 2 2 2 3 3 4 4 4 4 4
I need a simpler and quicker solution b/c my real data set is 181k rows long!
Base R solutions preferred, but any solution works
cumsum(dat) will do what you want. In arithmetic operations TRUE is converted to 1 and FALSE to 0, so taking the cumulative sum adds 1 every time you see a TRUE and adds nothing for a FALSE, which is exactly what you want.
dat <- c(T,F,F,T,F,F,F,T,F,T,F,F,F,F)
cumsum(dat)
# [1] 1 1 1 2 2 2 2 3 3 4 4 4 4 4
Instead of doing the indexing, this can easily be done with cumsum from base R. Here, TRUE/FALSE gets coerced to 1/0, and when we take the cumulative sum, the count increments by 1 wherever there is a TRUE.
cumsum(dat)
#[1] 1 1 1 2 2 2 2 3 3 4 4 4 4 4
cumsum() is the most straightforward way; however, you can also do:
Reduce("+", dat, accumulate = TRUE)
[1] 1 1 1 2 2 2 2 3 3 4 4 4 4 4
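Since the real data is about 181k rows, a quick way to compare these approaches yourself is to time them on a logical vector of that size (a rough sketch; actual timings depend on your machine):
set.seed(1)
big <- sample(c(TRUE, FALSE), 181000, replace = TRUE)
system.time(cumsum(big))                          # vectorised C loop
system.time(Reduce("+", big, accumulate = TRUE))  # R-level loop, element by element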
I have a lot of data frames in R which look like this:
A B
1 0
2 0
3 0
4 1
5 1
6 1
So between A = 3 and A = 4, B changes value from 0 to 1. What is the most idiomatic R way of returning the value of A at which B changes value?
In the data B changes the value only once, and A is sorted (from 1 to n).
Here is a possible way. Use diff to find where column b changes, but be careful: the first value of b has, by definition, not changed, and diff returns a vector with one element fewer, so we prepend FALSE.
inx <- c(FALSE, diff(data$b) != 0)
data[inx, ]
# a b
#4 4 1
After seeing the OP's comment on another post, the following code shows that this method also works when b starts with any value, not just zero.
data2 <- data.frame(a=c(1,2,3,4,5,6),b=c(1,1,1,0,0,0))
inx <- c(FALSE, diff(data2$b) != 0)
data2[inx, ]
# a b
#4 4 0
As OP mentioned,
In the data B changes the value only once
We can use cumsum with duplicated and which.max
which.max(cumsum(!duplicated(df$B)))
#[1] 4
If the value changes multiple times, this will give the index of the last change instead.
If we need to subset the row, then we can do
df[which.max(cumsum(!duplicated(df$B))), ]
# A B
#4 4 1
To break it down further for better understanding:
!duplicated(df$B)
#[1] TRUE FALSE FALSE TRUE FALSE FALSE
cumsum(!duplicated(df$B))
#[1] 1 1 1 2 2 2
which.max(cumsum(!duplicated(df$B)))
#[1] 4
In order to identify a change in a sequence, one may use diff, like in the following code:
my_df <- data.frame(A = 1:6, B = c(0,0,0,1,1,1))
which(diff(my_df$B)==1)+1
[1] 4
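As a side note, if B changed more than once, the same diff idea (using != 0, as in the earlier answer) would return every change point, not just the first; a small illustrative example:
b2 <- c(0, 0, 1, 1, 0, 0)
which(diff(b2) != 0) + 1
# [1] 3 5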
My task is to multiply the numbers in a vector, but only those that are divisible by 3 (i.e. x %% 3 == 0). I figured out how to replace certain elements of a vector with different values, but it only works when I replace them with a single fixed number. I wasn't able to find an answer here http://www.r-tutor.com/r-introduction/vector or even on this site; everyone is only extracting values into another vector.
x <- c(1,1,2,2,2,3,3)
x[x%%2==0] = 5
# [1] 1 1 5 5 5 3 3
Why doesn't this work?
x[x%%3==0] = x*3
I expect to get this:
c(1,1,5,5,5,9,9)
The vectors on the left-hand side and the right-hand side of the assignment do not have the same length.
length(x*3)
#[1] 7
length(x[x%%3 ==0])
#[1] 2
We need to do
x[x%%3==0] <- x[x%%3==0]*3
x
#[1] 1 1 5 5 5 9 9
Instead of repeating the logical condition, we can store it in an object and then do the substitution:
i1 <- x%%3 == 0
x[i1] <- x[i1]*3
In the first assignment (x[x%%2==0] = 5), the right-hand side was a single value, so it was recycled across every position where the logical condition is met.
Another option is pmax: !x%%3 is 1 for multiples of 3 and 0 otherwise, so x*(!x%%3)*3 is 3*x for those elements and 0 elsewhere, and pmax then keeps the element-wise maximum of the two vectors.
pmax(x, x*(!x%%3)*3)
#[1] 1 1 5 5 5 9 9
Having two dataframes:
x <- data.frame(numbers=c('1','2','3','4','5','6','7','8','9'), coincidence="NA")
and
y <- data.frame(numbers=c('1','3','10'))
How can I check whether the observations in y (1, 3 and 10) also exist in x, and fill the column x["coincidence"] accordingly (for example with YES/NO or TRUE/FALSE)?
In Excel I would do this with a formula combining IFERROR and VLOOKUP, but I don't know how to do the same in R.
Note:
I am open to changing the data.frames to tables or using libraries. The data frame with the numbers to check (y) will never have more than 10-20 observations, while the other one (x) will never have more than 1K observations, so I could also iterate with an if, if necessary.
We can create a vector matching the desired output with a membership test that returns TRUE or FALSE for each element. The %in% operator compares each value on the left-hand side against the set of values on the right:
x$coincidence <- x$numbers %in% y$numbers
# numbers coincidence
# 1 1 TRUE
# 2 2 FALSE
# 3 3 TRUE
# 4 4 FALSE
# 5 5 FALSE
# 6 6 FALSE
# 7 7 FALSE
# 8 8 FALSE
# 9 9 FALSE
Do the numbers have to be factors, as you've set them up? (They're not numeric values, but character.) If not, it's easy:
x <- data.frame(numbers=c('1','2','3','4','5','6','7','8','9'), coincidence="NA", stringsAsFactors=FALSE)
y <- data.frame(numbers=c('1','3','10'), stringsAsFactors=FALSE)
x$coincidence[x$numbers %in% y$numbers] <- TRUE
> x
numbers coincidence
1 1 TRUE
2 2 NA
3 3 TRUE
4 4 NA
5 5 NA
6 6 NA
7 7 NA
8 8 NA
9 9 NA
If they need to be factors, then you'll need to either set common levels or use as.character().
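A minimal sketch of the as.character() route, assuming the numbers columns are factors (this overwrites the placeholder coincidence column with a logical vector):
# convert both factor columns to character before comparing
x$coincidence <- as.character(x$numbers) %in% as.character(y$numbers)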