Create sequence based on a condition - r

How can I conditionally increment a counter when the previous value is greater than the current value? Say I have a column x in my data frame and I want a column y that starts at 1 and increments whenever the previous value is greater than the current one.
x y
1 1
2 1
3 1
4 1
5 1
6 1
1 2
2 2
3 2
4 2
5 2
6 2
7 2
8 2
1 3
2 3
5 3

As @A5C1D2H2I1M1N2O1R2T1 mentioned, you can use cumsum with diff to generate y.
cumsum(diff(x) < 0) + 1
#[1] 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3
Because diff returns one element fewer than x, prepend 1 to get y with the same length as x.
c(1, cumsum(diff(x) < 0) + 1)
#[1] 1 1 1 1 1 1 2 2 2 2 2 2 2 2 3 3 3
data
x <- c(1:6, 1:8, 1, 2, 5)
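As a minimal sketch of how the result could be attached to a data frame (the name df is an assumption, not from the question), the leading 1 can also be folded into the cumsum by prepending TRUE to the comparison:

```r
# Example data from the answer above
x <- c(1:6, 1:8, 1, 2, 5)
df <- data.frame(x = x)  # 'df' is a hypothetical name

# TRUE marks the first row and every row where x drops below the previous value;
# cumsum() then counts how many such starts have been seen so far.
df$y <- cumsum(c(TRUE, diff(df$x) < 0))
```

This gives the same y as c(1, cumsum(diff(x) < 0) + 1), stored as an integer column.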

Related

How to keep only first value in every sequence of duplicated values in R

I am trying to create a subset where I keep the first value in each sequence of numbers in a column. I tried to use:
df %>% group_by(x) %>% slice_head(n = 1)
But it only works for the first instance of each sequence.
An example data where x column contains the repeated sequence can be seen below:
x = c(2,2,2,3,3,3,1,1,1,5,5,5,2,2,2,1,1,1,3,3,3)
y = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
df= data.frame(x,y)
> df
x y
1 2 1
2 2 1
3 2 1
4 3 1
5 3 1
6 3 1
7 1 1
8 1 1
9 1 1
10 5 1
11 5 1
12 5 1
13 2 1
14 2 1
15 2 1
16 1 1
17 1 1
18 1 1
19 3 1
20 3 1
21 3 1
So the end result that I would like to achieve is:
x = c(2,3,1,5,2,1,3)
y = c(1,1,1,1,1,1,1)
df= data.frame(x,y)
> df
x y
1 2 1
2 3 1
3 1 1
4 5 1
5 2 1
6 1 1
7 3 1
Could you please help, or point me to any useful existing topics? I haven't managed to find any.
Thanks
You can try rleid from package data.table
> library(data.table)
> setDT(df)[!duplicated(rleid(x))]
x y
1: 2 1
2: 3 1
3: 1 1
4: 5 1
5: 2 1
6: 1 1
7: 3 1
Base R.
df[c(1, diff(df$x)) != 0, ]
Or also with helper functions from data.table.
library(data.table)
df[rowid(rleid(df$x)) == 1L, ]
# x y
# 1 2 1
# 4 3 1
# 7 1 1
# 10 5 1
# 13 2 1
# 16 1 1
# 19 3 1
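Using rle. Note that match(rle(df$x)$values, df$x) returns the first occurrence of each value, so it would repeat rows 1, 7 and 4 instead of selecting rows 13, 16 and 19. Computing the run starts from the run lengths avoids this.
df[cumsum(c(1, head(rle(df$x)$lengths, -1))), ]
# x y
# 1 2 1
# 4 3 1
# 7 1 1
# 10 5 1
# 13 2 1
# 16 1 1
# 19 3 1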
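The diff-based base R answer assumes that x is numeric. As a hedged sketch for character or factor columns, run starts can be found by comparing each value with its predecessor. The example data here is made up, and the code assumes at least one row and no NA in x:

```r
# Made-up example with a character column (not from the question)
df2 <- data.frame(x = c("a", "a", "b", "b", "a", "a"),
                  y = 1:6,
                  stringsAsFactors = FALSE)

n <- nrow(df2)
# TRUE for the first row and for every row whose x differs from the row above
is_start <- c(TRUE, df2$x[-1] != df2$x[-n])
first_rows <- df2[is_start, ]
```

Here first_rows keeps rows 1, 3 and 5, so the second run of "a" is retained as its own run.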

rep and/or seq function to create continuously reducing vector?

Suppose I have a vector from 1 to 5,
a<-c(1:5)
What I need to do is repeat the vector, dropping the last element on each repetition. That is, the final outcome should be
1 2 3 4 5 1 2 3 4 1 2 3 1 2 1
We can reverse the vector and apply sequence
sequence(rev(a))
#[1] 1 2 3 4 5 1 2 3 4 1 2 3 1 2 1
Or another option is toeplitz
m1 <- toeplitz(a)
m1[lower.tri(m1, diag=TRUE)]
#[1] 1 2 3 4 5 1 2 3 4 1 2 3 1 2 1
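sequence(rev(a)) returns the wanted values directly only because a is 1:5. For a vector with arbitrary values, one possible sketch is to build the index with sequence and subset with it (the vector b is a made-up example):

```r
# Made-up vector whose values differ from its positions
b <- c(10, 20, 30)

# Index pattern 1 2 3 1 2 1: full length first, then one element shorter each time
idx <- sequence(rev(seq_along(b)))
res <- b[idx]
```

With this input, res holds 10 20 30 10 20 10.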

Select rows of data frame based on a vector with duplicated values

What I want can be described as follows: I have a data frame that contains all the case-control pairs. In the following example, y is the id of the case-control pair, and there are 3 pairs in my data set. I'm resampling with respect to the values of y (the two members of a pair are either both selected or both left out).
sample_df = data.frame(x=1:6, y=c(1,1,2,2,3,3))
> sample_df
x y
1 1 1
2 2 1
3 3 2
4 4 2
5 5 3
6 6 3
select_y = c(1,3,3)
> select_y
[1] 1 3 3
Now, I have computed a vector containing the pairs I want to resample, which is select_y above. It means that case-control pair number 1 will be in my new sample, and pair number 3 will also be in it, but twice, since 3 appears twice in select_y. The desired output is:
x y
1 1
2 1
5 3
6 3
5 3
6 3
I can't find an efficient way other than writing a for loop...
Solution:
Based on @HubertL's answer, with some modifications, a 'vectorized' approach looks like:
sel_y <- as.data.frame(table(select_y))
> sel_y
select_y Freq
1 1 1
2 3 2
sub_sample_df = sample_df[sample_df$y%in%select_y,]
> sub_sample_df
x y
1 1 1
2 2 1
5 5 3
6 6 3
match_freq = sel_y[match(sub_sample_df$y, sel_y$select_y),]
> match_freq
select_y Freq
1 1 1
1.1 1 1
2 3 2
2.1 3 2
sub_sample_df$Freq = match_freq$Freq
rownames(sub_sample_df) = NULL
> sub_sample_df
x y Freq
1 1 1 1
2 2 1 1
3 5 3 2
4 6 3 2
selected_rows = rep(1:nrow(sub_sample_df), sub_sample_df$Freq)
> selected_rows
[1] 1 2 3 3 4 4
sub_sample_df[selected_rows,]
x y Freq
1 1 1 1
2 2 1 1
3 5 3 2
3.1 5 3 2
4 6 3 2
4.1 6 3 2
Another method of doing the same without a loop:
sample_df = data.frame(x=1:6, y=c(1,1,2,2,3,3))
row_names <- split(1:nrow(sample_df),sample_df$y)
select_y = c(1,3,3)
row_num <- unlist(row_names[as.character(select_y)])
ans <- sample_df[row_num,]
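The split-based approach above could be wrapped in a small helper. This is only a sketch: the function name resample_pairs and its arguments are hypothetical, and the check for unknown ids is an added assumption (without it, ids that do not occur in the pair column would be dropped silently by unlist).

```r
# Hypothetical helper wrapping the split-based approach
resample_pairs <- function(df, pair_col, ids) {
  # Row numbers of each pair, named by the pair id
  rows_by_pair <- split(seq_len(nrow(df)), df[[pair_col]])
  # Assumption: stop on ids that are not present instead of dropping them
  stopifnot(all(as.character(ids) %in% names(rows_by_pair)))
  idx <- unlist(rows_by_pair[as.character(ids)], use.names = FALSE)
  out <- df[idx, , drop = FALSE]
  rownames(out) <- NULL  # avoid row names such as 5.1, 6.1
  out
}

sample_df <- data.frame(x = 1:6, y = c(1, 1, 2, 2, 3, 3))
res <- resample_pairs(sample_df, "y", c(1, 3, 3))
```

For the example data, res contains the rows with x equal to 1, 2, 5, 6, 5, 6, in the order given by the ids.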
I can't find a way without a loop, but at least it's not a for loop, and there is only one iteration per frequency:
sample_df = data.frame(x=1:6, y=c(1,1,2,2,3,3))
select_y = c(1,3,3)
sel_y <- as.data.frame(table(select_y))
do.call(rbind,
        lapply(1:max(sel_y$Freq),
               function(freq) sample_df[sample_df$y %in%
                                        sel_y[sel_y$Freq >= freq, "select_y"], ]))
x y
1 1 1
2 2 1
5 5 3
6 6 3
51 5 3
61 6 3

R - How to create a variable based on another variable

I have:
v1 <- c(1,1,1,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4)
and I want to create v2, which numbers the successive sets of 3 elements within each value of v1:
v2 <- c(1,1,1,1,1,1,1,1,1,2,2,2,3,3,3,1,1,1,2,2,2)
Explanation:
For the first three times a number is repeated the value corresponding to that number is a 1, for the second three times it's a 2, and so on.
v1 <- c(1,1,1,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4)
Use rle to find the run lengths:
l <- rle(v1)$lengths
#[1] 3 3 9 6
Create a sequence 1:n for each run length n:
s <- sequence(l)
#[1] 1 2 3 1 2 3 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6
Use integer division:
(s - 1) %/% 3 + 1
#[1] 1 1 1 1 1 1 1 1 1 2 2 2 3 3 3 1 1 1 2 2 2
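The three steps above can be combined into one function. This is a sketch; the name block_index and the size parameter are hypothetical. Because rle works on consecutive runs, a value that reappears later in the vector starts counting again from 1.

```r
# Hypothetical helper: index of each block of 'size' elements within a run
block_index <- function(v, size = 3) {
  s <- sequence(rle(v)$lengths)  # position of each element within its run
  (s - 1) %/% size + 1           # integer division groups positions into blocks
}

v1 <- c(1,1,1,2,2,2,3,3,3,3,3,3,3,3,3,4,4,4,4,4,4)
v2 <- block_index(v1)
```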

Two dimensional heatmap with R

I have an input file of this form:
0.35217720 1 201 1
0.26413283 1 209 1
1.1665874 1 210 1
...
0.30815500 2 194 1
0.15407741 2 196 1
0.15407741 2 197 1
0.33016610 2 205 1
...
where the first column is a scalar value, the second is the x coordinate of a discrete lattice, the third is the y coordinate and the last one is time-like discrete component.
I would like to make a two-dimensional heatmap of the scalar values at a fixed time. How can I do this? Edit: I don't know how to make image() use the second and third columns as x, y coordinates.
Example file:
7.62939453 1 1 1
1.3153768 1 2 1
7.5560522 1 3 1
4.5865011 1 4 1
5.3276706 1 5 1
2.1895909 2 1 1
0.47044516 2 2 1
6.7886448 2 3 1
6.7929626 2 4 1
9.3469286 2 5 1
3.8350201 3 1 1
5.1941633 3 2 1
8.3096523 3 3 1
0.34571886 3 4 1
0.53461552 3 5 1
5.2970004 4 1 1
6.7114925 4 2 1
7.69805908 4 3 1
3.8341546 4 4 1
0.66842079 4 5 1
4.1748595 5 1 1
6.8677258 5 2 1
5.8897662 5 3 1
9.3043633 5 4 1
8.4616680 5 5 1
Reshape your data to a matrix and then use heatmap():
This worked on R version 2.10.1 (2009-12-14):
txt <- textConnection("7.62939453 1 1 1
1.3153768 1 2 1
7.5560522 1 3 1
4.5865011 1 4 1
5.3276706 1 5 1
2.1895909 2 1 1
0.47044516 2 2 1
6.7886448 2 3 1
6.7929626 2 4 1
9.3469286 2 5 1
3.8350201 3 1 1
5.1941633 3 2 1
8.3096523 3 3 1
0.34571886 3 4 1
0.53461552 3 5 1
5.2970004 4 1 1
6.7114925 4 2 1
7.69805908 4 3 1
3.8341546 4 4 1
0.66842079 4 5 1
4.1748595 5 1 1
6.8677258 5 2 1
5.8897662 5 3 1
9.3043633 5 4 1
8.4616680 5 5 1
")
df <- read.table(txt)
close(txt)
names(df) <- c("value", "x", "y", "t")
require(reshape)
dfc <- cast(df[ ,-4], x ~ y)
heatmap(as.matrix(dfc))
## Some copy/pasteable fake data for you (dput() works nicely for pasteable real data)
your_matrix <- cbind(runif(25, 0, 10), rep(1:5, each = 5), rep(1:5, 5), rep(1, 25))
heatmap_matrix <- matrix(your_matrix[, 1], nrow = 5)
## alternatively, if your_matrix isn't in order
## (The reshape method in EDi's answer is a nicer alternative)
for (i in 1:nrow(your_matrix)) {
  # place each value at its (x, y) position
  heatmap_matrix[your_matrix[i, 2], your_matrix[i, 3]] <- your_matrix[i, 1]
}
heatmap(heatmap_matrix) # one option
image(z = heatmap_matrix) # another option
require(gplots)
heatmap.2(heatmap_matrix) # this has fancier preferences
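Regarding the edit about image(): the following is a sketch in base R, assuming the data has been read into a data frame with the columns value, x, y, t as in the first answer. The data below is made up, and the names dat, slice, xs, ys and z are illustrative only.

```r
# Made-up data in the same four-column layout (value, x, y, t)
dat <- data.frame(value = seq(0.5, 12.5, by = 0.5),
                  x = rep(1:5, each = 5),
                  y = rep(1:5, times = 5),
                  t = 1)

slice <- dat[dat$t == 1, ]  # keep a single time step
xs <- sort(unique(slice$x))
ys <- sort(unique(slice$y))

# z[i, j] holds the value at (xs[i], ys[j]); lattice sites without data stay NA
z <- matrix(NA_real_, nrow = length(xs), ncol = length(ys))
z[cbind(match(slice$x, xs), match(slice$y, ys))] <- slice$value

# Rows of z map to the x axis and columns to the y axis
image(x = xs, y = ys, z = z, xlab = "x", ylab = "y")
```

Indexing the matrix with a two-column matrix of positions does not depend on the row order of the input, so it also works when the file is not sorted by x and y.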
