I have these numbers:
login.day$wday
[1] 5 6 7 1 2 3 4
and I want to map them to:
login.day$wday
[1] 4 5 6 7 1 2 3
Subtract 1 from each number, and if the result is 0, wrap it around to 7. This is embarrassingly simple, but I just can't figure it out. My attempt keeps giving me a zero:
> (login.day$wday + 6) %% 7
[1] 4 5 6 0 1 2 3
I'd prefer a solution in R. Can this be done with modulo arithmetic, or must I use an if statement?
Mathematically equivalent to the other solution, and with some explanation.
(login.day$wday - 1 - 1) %% 7 + 1
The problem is that it is hard to do modular arithmetic with numbers starting at 1.
We start by subtracting 1 to shift everything down, so we have zero-based numbers in the range [0,6].
We then subtract 1, because that is what we were trying to do to begin with.
Next, we take the modulus, and add 1 back to shift everything back up to the range [1,7].
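The same three steps can be wrapped in a small helper. This is only a sketch: wrap_shift and its arguments k and n are hypothetical names, not part of the original answer. It relies on R's %% returning a non-negative result when the divisor is positive.

```r
# Hypothetical helper: shift 1-based values by k positions, wrapping within 1..n.
wrap_shift <- function(x, k, n = 7) {
  zero_based <- x - 1          # move from [1, n] to [0, n - 1]
  shifted <- zero_based + k    # apply the shift we actually want
  shifted %% n + 1             # wrap, then move back to [1, n]
}

wday <- c(5, 6, 7, 1, 2, 3, 4)
wrap_shift(wday, -1)
# [1] 4 5 6 7 1 2 3
```

With k = -1 this reduces to the expression above.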
(login.day$wday + 5) %% 7 + 1
perhaps?
The boundary conditions are 7 -> 6, 1 -> 7 and 2 -> 1.
The result had to involve %% 7 as you so rightly spotted.
Since the last of these boundary conditions yields 1 (not 0), we need to add 1 after the modulo and reduce the number added before the modulo by 1, from 6 to 5.
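As a quick sanity check, the two modulo expressions can be compared over every weekday value, alongside the attempt from the question. The variable names here are illustrative only.

```r
wday <- 1:7

via_shifted <- (wday - 1 - 1) %% 7 + 1   # shift down, subtract, wrap, shift up
via_plus5   <- (wday + 5) %% 7 + 1       # same thing with the constants folded
naive       <- (wday + 6) %% 7           # the attempt from the question

all(via_shifted == via_plus5)
# [1] TRUE
wday[naive == 0]   # the only input the naive version gets wrong
# [1] 1
```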
I have a silly function I've written called shift that does this:
shift <- function(x = 1:10, n = 1) {
  if (n == 0) x <- x
  else x <- c(tail(x, -n), head(x, n))
  x
}
x <- c(5, 6, 7, 1, 2, 3, 4)
shift(x, -1)
# [1] 4 5 6 7 1 2 3
shift(x, -2)
# [1] 3 4 5 6 7 1 2
The use I had in mind for this was something like the following:
set.seed(1)
X <- sample(7, 20, TRUE)
X
# [1] 2 3 5 7 2 7 7 5 5 1 2 2 5 3 6 4 6 7 3 6
shift(sort(unique(X)), -1)[X]
# [1] 1 2 4 6 1 6 6 4 4 7 1 1 4 2 5 3 5 6 2 5
I like @merlin2011's solution, but just to add to the options, here is a lookup-table approach:
c(7, 1:6)[login.day$wday]
I have a file with interval values such as this for 50M lines:
>data
start_pos end_pos
1 1 10
2 3 6
3 5 9
4 6 11
And I would like to have a table of position occurrences so that I can compute the coverage on each position in the interval file such as this:
>occurence
position coverage
1 1
2 1
3 2
4 2
5 3
6 4
7 3
8 3
9 3
10 2
11 1
Is there a fast way to complete this task in R?
My plan was to loop through the data and concatenate the sequence in each interval into a vector and convert the final vector into a table.
count <- c()
for (row in 1:nrow(data)) {
  count <- c(count, data[row, ]$start_pos:data[row, ]$end_pos)
}
occurence <- table(count)
The problem is that my file is huge, and this takes way too much time and memory.
The Bioconductor IRanges package does this quickly and efficiently:
library(IRanges)
ir = IRanges(start = c(1, 3, 5, 6), end = c(10, 6, 9, 11))
coverage(ir)
with
> coverage(ir) |> as.data.frame()
value
1 1
2 1
3 2
4 2
5 3
6 4
7 3
8 3
9 3
10 2
11 1
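If installing a Bioconductor package is not an option, a base R sketch of the same idea is possible. Note that this swaps in a different technique (a difference array followed by cumsum), not what IRanges does internally, and it assumes positive integer positions. The names delta and coverage are illustrative.

```r
start_pos <- c(1, 3, 5, 6)
end_pos   <- c(10, 6, 9, 11)
n <- max(end_pos)

# +1 at each position where an interval opens, -1 just after it closes.
# One extra bin (n + 1) holds the closing mark of intervals ending at n.
delta <- tabulate(start_pos, nbins = n + 1) -
  tabulate(end_pos + 1, nbins = n + 1)

# The running sum of the marks is the number of open intervals per position.
coverage <- cumsum(delta)[seq_len(n)]
data.frame(position = seq_len(n), coverage = coverage)
```

This avoids growing a vector inside a loop: the work is a few vectorised passes over the start and end columns.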
I'm looking for an easy way to add the minimum value for each column inside my dataframe.
This feels like a common thing, but I haven't been able to find any good answers yet...maybe I'm missing something obvious.
Let's say I've got two columns (in reality I have close to 100) with positive and negative numbers.
w <- c(9, 9, 9, 9)
x <- c(-2, 0, 1, 3)
y <- c(-1, 1, 3, 4)
z <- as.data.frame(cbind(w, x, y))
w x y
1 9 -2 -1
2 9 0 1
3 9 1 3
4 9 3 4
I want z to look like this after transforming only the x and y columns (z[, 2:3]):
w x y
1 9 0 0
2 9 2 2
3 9 3 4
4 9 5 5
Does that make sense?
library(dplyr)
dplyr::mutate(z, across(c(x, y), ~ . + abs(min(.))))
w x y
1 9 0 0
2 9 2 2
3 9 3 4
4 9 5 5
You can also select by column position rather than column name by changing c(x, y) to 2:3, or to c(2:3, 5) for non-sequential column positions.
Depends exactly what you mean and what you want to happen if there aren't negative values. No matter the values, this will anchor the minimum at 0, but you should be able to adapt it if you want something slightly different.
z[2:3] = lapply(z[2:3], function(col) col - min(col))
z
#   w x y
# 1 9 0 0
# 2 9 2 2
# 3 9 3 4
# 4 9 5 5
As a side note, as.data.frame(cbind(w, x, y)) is a bad habit: if you have a mix of numeric and character values, cbind() will convert everything to character. It's shorter and safer to write data.frame(w, x, y).
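A small illustration of that side note, using hypothetical columns id and value that are not in the question:

```r
id    <- c("a", "b")
value <- c(1, 2)

m <- cbind(id, value)           # cbind() builds a matrix, which has one type
is.character(m)                 # so the numbers were coerced to strings
# [1] TRUE

df <- data.frame(id, value)     # data.frame() keeps each column's own type
is.numeric(df$value)
# [1] TRUE
```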
Do you want this?
z[2:3] <- lapply(z[2:3], function(columnValues) columnValues + abs(min(columnValues)))
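The two formulas in these answers, col - min(col) and col + abs(min(col)), agree only when the column minimum is negative or zero. A short sketch with a made-up all-positive column shows the difference:

```r
pos <- c(2, 5, 9)        # hypothetical column with no negative values
pos - min(pos)           # anchors the minimum at 0
# [1] 0 3 7
pos + abs(min(pos))      # shifts everything up instead
# [1]  4  7 11

neg <- c(-2, 0, 1, 3)    # the x column from the question
all(neg - min(neg) == neg + abs(min(neg)))
# [1] TRUE
```

Which one is right depends on what should happen to columns that are already non-negative.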
I'm struggling with a small issue. I want to fill a one-dimensional array using this for loop.
Minimal example (it doesn't work!):
Numbers <- seq(1,5)
Result <- array(NA)
for(n in Numbers){
Result[n] <- seq(n,5)
# The Result array should be like this:
# (1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5)
}
I guess there are two problems:
The values assigned to Result[n] don't all have the same length.
The index n in Result[n] is wrong. It should be dynamic, changing with every new n.
Can you guys help me?
Thank you!
We can do this with sapply
unlist(sapply(Numbers, function(x) seq(x, 5)))
#[1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
Or using the for loop
Result <- c()
for(n in Numbers){
Result <- c(Result, seq(n, 5))
}
Result
#[1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
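If you want to keep the loop but avoid growing Result on every iteration, one option is to preallocate and track a running offset, which is the "dynamic index" the question was missing. This is a sketch; upper, lens and pos are names introduced here for illustration.

```r
Numbers <- seq(1, 5)
upper <- 5

lens <- upper - Numbers + 1        # length of seq(n, upper) for each n
Result <- integer(sum(lens))       # preallocate the full result
pos <- 0                           # number of slots filled so far

for (n in Numbers) {
  len <- upper - n + 1
  Result[pos + seq_len(len)] <- seq(n, upper)   # write into the next block
  pos <- pos + len
}
Result
# [1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
```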
Using sequence and rep:
n <- 5
sequence(n:1) + rep(0:(n-1), n:1)
# [1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
You may also create an 'oversized' matrix and select the lower triangle:
m <- matrix(c(NA, 1:n), nrow = n + 1, ncol = n + 1)
m[lower.tri(m)]
# [1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
I'm looking for a way to count the frequency of each element in a vector.
ex <- c(2,2,2,3,4,5)
Desired outcome:
[1] 3 3 3 1 1 1
Is there a simple command for this?
rep(table(ex), table(ex))
# 2 2 2 3 4 5
# 3 3 3 1 1 1
If you don't want the labels you can wrap in as.vector()
as.vector(rep(table(ex), table(ex)))
# [1] 3 3 3 1 1 1
I'll add (because it seems related somehow) that if you only wanted to count runs of consecutive equal values, you could use rle instead of table:
ex2 = c(2, 2, 2, 3, 4, 2, 2, 3, 4, 4)
rep(rle(ex2)$lengths, rle(ex2)$lengths)
# [1] 3 3 3 1 1 2 2 1 2 2
As pointed out in comments, for a large vector calculating a table can be expensive, so doing it only once is more efficient:
tab = table(ex)
rep(tab, tab)
# 2 2 2 3 4 5
# 3 3 3 1 1 1
You can use
ex <- c(2,2,2,3,4,5)
outcome <- ave(ex, ex, FUN = length)
This is what thelatemail suggested. Also similar to the answer at this question
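One difference worth noting: rep(table(ex), table(ex)) returns counts in sorted order of the values, so it lines up with the input only when equal values are already grouped in sorted order, as in the example. The ave() version keeps the original order. A sketch with a made-up unsorted vector, ex3:

```r
ex3 <- c(3, 2, 2, 5)     # hypothetical unsorted input

ave(ex3, ex3, FUN = length)        # count aligned with each element
# [1] 1 2 2 1

tab <- table(ex3)
as.vector(rep(tab, tab))           # counts in sorted-value order instead
# [1] 2 2 1 1

as.vector(tab[as.character(ex3)])  # look up by name to restore alignment
# [1] 1 2 2 1
```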
Title says it all: how would I code such a repeating sequence where the base repeat unit is : a vector c(1,1,1,2) - repeated 4 times, but incrementing the values in the vector by 2 each time?
I've tried various combinations of rep, times, each and seq, and can't get the result I want.
c(1,1,1,2) + rep(seq(0, 6, 2), each = 4)
# [1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8
The rep function allows for a vector of the same length as x to be used in the times argument. We can extend the desired pattern with the super secret rep_len.
rep(1:8, rep_len(c(3, 1), 8))
#[1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8
I'm not sure if I get it right but what's wrong with something as simple as that:
unit <- c(1, 1, 1, 2)  # not named `rep`, to avoid confusion with the rep() function
step <- 2
vec <- c(unit, step + unit, 2 * step + unit, 3 * step + unit)
I accepted luke's answer as it is the easiest for me to understand (and closest to what I was already trying, but failing with!)
I have used this final form:
> c(1,1,1,2)+rep(c(0,2,4,6),each=4)
[1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8
You could do:
pattern <- rep(c(3, 1), len = 50)
unlist(lapply(1:8, function(x) rep(x, pattern[x])))
[1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8
This lets you just adjust the length of the pattern under rep(len = X) and removes any usage of addition, which some of the other answers show.
How about:
input <- c(1,1,1,2)
n <- 4
increment <- 2
sort(rep.int(seq.int(from = 0, by = increment, length.out = n), length(input))) + input
[1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8
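Most of the answers above add a per-block offset to a repeated base vector. As one more sketch of that idea, outer() can build every block at once. The function name stepped_rep and its arguments are hypothetical.

```r
# Hypothetical helper: repeat `unit` n times, adding `step` to each new block.
stepped_rep <- function(unit, n, step) {
  offsets <- step * (seq_len(n) - 1)       # 0, step, 2 * step, ...
  # outer() puts unit + offsets[j] in column j; as.vector() reads the
  # matrix column by column, so the blocks come out in order.
  as.vector(outer(unit, offsets, "+"))
}

stepped_rep(c(1, 1, 1, 2), n = 4, step = 2)
# [1] 1 1 1 2 3 3 3 4 5 5 5 6 7 7 7 8
```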