How do I repeat only a part of a vector? - r

I have a vector: 0,24,12,12,12,96,12,12,12,12,12,12.
I want to repeat only a part of it, from 96 to the last element (12). The first part (0, 24, 12, 12, 12) should stay as it is.
Could you please help?

The answer depends on whether the number 96 is always located at the 6th position of your vector. If so, please refer to the first comment underneath your question. If the position varies, however, you can first identify the position of 96 inside your vector and then repeat the part of the vector starting from there as often as you wish (2 times in the code below).
x <- c(0,24,12,12,12,96,12,12,12,12,12,12)
# Identify index of 96
id <- which(x == 96)
# Repeat part of vector starting from `id` 2 times
c(x[seq_len(id - 1)], rep(x[id:length(x)], 2))
# Which results in
# [1] 0 24 12 12 12 96 12 12 12 12 12 12 96 12 12 12 12 12 12
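If 96 might be missing, or might sit at the very first position, the lookup needs a little guarding. A minimal sketch, assuming a hypothetical helper named repeat_from (not a base R function) and assuming that the first occurrence of the value marks the start of the repeated part:

```r
# repeat_from() is a hypothetical helper, not part of base R.
repeat_from <- function(x, value, times = 2) {
  id <- match(value, x)            # index of the first occurrence, NA if absent
  if (is.na(id)) return(x)         # nothing to repeat: return x unchanged
  head_part <- x[seq_len(id - 1)]  # empty when id == 1
  tail_part <- x[id:length(x)]
  c(head_part, rep(tail_part, times))
}

x <- c(0, 24, 12, 12, 12, 96, 12, 12, 12, 12, 12, 12)
res <- repeat_from(x, 96, times = 2)
length(res)  # 19
```

Using match() rather than which() keeps id a single number even when the value occurs more than once.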

Related

seq_along() - truncating a replication in r

I would like to generate the month number to go along with a list of values. The problem is that the list is not a full 2 replications of 12 months. It is 12 from the first year and 10 from the second year.
tibble(value=rnorm(22))
Some things I have tried are rep(1:12,2), thinking that the sequence would stop when it hit the end of the length of the dataframe. I also tried seq_along(along.with=value,1:12) with the same line of thinking.
You want the length.out argument to rep():
rep(1:12, length.out = 22)
which gives
> rep(1:12, length.out = 22)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10
We get this because, from ?rep:
‘length.out’ may be given in place of ‘times’, in which case ‘x’
is repeated as many times as is necessary to create a vector of
this length. If both are given, ‘length.out’ takes priority and
‘times’ is ignored.
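In practice the target length usually comes from the data itself. A small sketch, assuming a base R data frame named df (a hypothetical name) standing in for the tibble from the question:

```r
set.seed(1)                                   # only to make the example reproducible
df <- data.frame(value = rnorm(22))           # stand-in for the 22-row tibble
df$month <- rep(1:12, length.out = nrow(df))  # recycle 1:12 up to the number of rows
tail(df$month, 3)  # 8 9 10
```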
I would roll out 22 months and then use the modulo operator to get the months in subsequent year(s):
library(dplyr)
tibble(value = rnorm(22)) %>%
  mutate(month = 1:22,
         month = ifelse(month %% 12 == 0, 12, month %% 12))
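The special case for multiples of 12 can also be avoided by shifting before taking the remainder. A sketch in base R, where n is an assumed number of rows:

```r
n <- 22                               # assumed number of rows
month <- (seq_len(n) - 1) %% 12 + 1   # counts 1..12, then wraps back to 1
month
```

Subtracting 1 first maps month 12 to remainder 11 instead of 0, so adding 1 back yields 12 without an ifelse().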

Searching the closest value in other column

Suppose we have a data frame of two columns
X Y
10 14
12 16
14 17
15 19
21 19
The first element of Y is 14, and the nearest (or equal) value in X is 14, which is the 3rd element of X. Similarly, the next element of Y, 16, is closest to 15, which is the 4th element of X.
So, the output I would like should be
3
4
4
5
5
As my data is large, can you give me some advice on a systematic/proper way to code this?
You can try this piece of code:
apply(abs(outer(d$X,d$Y,FUN = '-')),2,which.min)
# [1] 3 4 4 5 5
Here, abs(outer(d$X,d$Y,FUN = '-')) returns a matrix of absolute differences between d$X (rows) and d$Y (columns), and apply(...,2,which.min) returns the row position of the minimum in each column, i.e. the index of the closest X for each element of Y.
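For large inputs the outer() matrix has length(X) * length(Y) entries, which may not fit in memory. One alternative, sketched here with a hypothetical helper named nearest_index, sorts X once and looks each Y up with findInterval(). It assumes there are no NA values, and its result may differ from which.min when two X values are equally close to a Y.

```r
# nearest_index() is a hypothetical helper, not part of base R.
nearest_index <- function(x, y) {
  ord <- order(x)                             # positions of x in increasing order
  xs <- x[ord]
  mids <- (head(xs, -1) + tail(xs, -1)) / 2   # midpoints between sorted neighbours
  # The number of midpoints <= y tells which sorted x is closest to y
  ord[findInterval(y, mids) + 1]
}

d <- data.frame(X = c(10, 12, 14, 15, 21), Y = c(14, 16, 17, 19, 19))
nearest_index(d$X, d$Y)  # 3 4 4 5 5
```

This avoids building the full difference matrix: the cost is one sort of X plus a binary search per element of Y.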

Filling a matrix with a circular pattern

I want to write a function that fills an m by m matrix, where m is odd, as follows:
1) it starts from the middle cell of the matrix (for example, for a 5 by 5 matrix A the middle cell is A[3,3]) and puts the number 1 there
2) it goes one cell forward, adds 1 to the previous cell's value, and puts the result in the second cell
3) it goes down and puts 3, left 4, left 5, up 6, up 7, ...
For example, the resulting matrix could look like this:
7 8 9
6 1 2
5 4 3
Could somebody help me to implement this?
max_x=5
len=max_x^2
middle=ceiling(max_x/2)
A=matrix(NA,max_x,max_x)
increments=Reduce(
f=function(lhs,rhs) c(lhs,(-1)^(rhs/2+1)*rep(1,rhs)),
x=2*(1:(max_x)),
init=0
)[1:len]
idx_x=Reduce(
f=function(lhs,rhs) c(lhs,rep(c(TRUE,FALSE),each=rhs)),
1:max_x,
init=FALSE
)[1:len]
increments_x=increments
increments_y=increments
increments_x[!idx_x]=0
increments_y[idx_x]=0
A[(middle+cumsum(increments_x)-1)*(max_x)+middle+cumsum(increments_y)]=1:(max_x^2)
Gives
#> A
# [,1] [,2] [,3] [,4] [,5]
#[1,] 21 22 23 24 25
#[2,] 20 7 8 9 10
#[3,] 19 6 1 2 11
#[4,] 18 5 4 3 12
#[5,] 17 16 15 14 13
Explanation:
The vector increments denotes the steps along the path of the increasing numbers. It's either 0/+1/-1 for unchanged/increasing/decreasing row and column indices. Important here is that these numbers do not differentiate between steps along columns and rows. This is managed by the vector idx_x - it masks out increments that are either along a row (TRUE) or a column (FALSE).
The last line takes into account R's indexing logic (matrix index increases along columns).
Edit:
As per the OP's request, here is some more information about how the increments vector is calculated.
You always go two consecutive straight lines of equal length (row-wise or column-wise). The length, however, increases by 1 after you have walked twice. This corresponds to the x=2*(1:(max_x)) argument together with rep(1,rhs). The first two consecutive walks are in increasing column/row direction. Then follow two in negative direction and so on (alternating). This is accounted for by (-1)^(rhs/2+1).
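The same walk (two runs of equal length, then the run length grows by one and the direction keeps turning) can also be written as an explicit loop. This is a different technique from the vectorised Reduce() code above, given only as a sketch; spiral_fill is a hypothetical name.

```r
# spiral_fill() is a hypothetical helper; m must be odd.
spiral_fill <- function(m) {
  stopifnot(m %% 2 == 1)
  A <- matrix(NA_integer_, m, m)
  row <- col <- (m + 1) / 2          # middle cell
  d_row <- c(0, 1, 0, -1)            # row steps for right, down, left, up
  d_col <- c(1, 0, -1, 0)            # column steps for right, down, left, up
  value <- 1L
  A[row, col] <- value
  dir <- 1                           # start by moving right
  run <- 1                           # current run length
  while (value < m^2) {
    for (leg in 1:2) {               # two consecutive runs of equal length
      for (s in seq_len(run)) {
        if (value >= m^2) break      # matrix is full
        row <- row + d_row[dir]
        col <- col + d_col[dir]
        value <- value + 1L
        A[row, col] <- value
      }
      dir <- dir %% 4 + 1            # turn clockwise
    }
    run <- run + 1                   # runs get one cell longer after two legs
  }
  A
}

spiral_fill(3)
#      [,1] [,2] [,3]
# [1,]    7    8    9
# [2,]    6    1    2
# [3,]    5    4    3
```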

How to extract certain rows

As you can see, I have Price and Day columns below
Price Day
2 1
5 2
8 3
11 4
14 5
17 6
20 7
23 8
26 9
29 10
32 11
35 12
38 13
41 14
44 15
47 16
50 17
53 18
56 19
59 20
I then want the output below
Difference Day
12 5
15 10
15 15
15 20
So now I have the difference in prices every 5 days. It basically subtracts the 1st day's price from the 5th day's, then the 5th day's from the 10th day's, etc.
I already wrote code that separates my data into 5-day intervals, but I want code that subtracts the 1st day from the 5th day, the 5th day from the 10th day, etc.
So the code should look something like this
difference<-tapply(Price[,1],Day, ____________)
So basically Price[,1] will be my Price data, while "Day" is the variable I created to separate my Day data into 5-day intervals. I'm thinking that in the blank I could put a function or another variable that subtracts the 1st day's price from the 5th day's, then the 5th day's from the 10th day's, etc. You don't have to help me separate my days into intervals, just with the "difference" part. Thanks, guys.
Here's one option, assuming your data.frame is called "SODF":
within(SODF[c(1, seq(5, nrow(SODF), 5)), ], {
Price <- diff(c(0, Price))
})[-1, ]
# Price Day
# 5 12 5
# 10 15 10
# 15 15 15
# 20 15 20
The first step is basic subsetting. According to your description and expected answer, you want the first row, and then every fifth row starting from row 5:
> SODF[c(1, seq(5, nrow(SODF), 5)), ]
Price Day
1 2 1
5 14 5
10 29 10
15 44 15
20 59 20
From there, you can use diff on the "Price" column, but since diff will result in a vector that is one in length shorter than your input, you need to "pad" the input vector, which I did with diff(c(0, Price)).
# Correct values, but the number of rows needs to be 5
> diff(SODF[c(1, seq(5, nrow(SODF), 5)), "Price"])
[1] 12 15 15 15
Then, the [-1, ] at the end just deletes the extraneous row.
Update
In the comments below, @geektrader points out an alternative (thanks!). Instead of using:
SODF[c(1, seq(5, nrow(SODF), 5)), ]
as your input data.frame, you may consider using the following:
rbind(SODF[1, ], SODF[SODF$Day %% 5 == 0, ])
The difference in the two approaches is that the first approach simply subsets by row number, while the second approach subsets according to the value in the "Day" column, extracting rows where "Day" is a multiple of 5. This second approach might be useful, for instance, when there are missing rows in the dataset.
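To see the two subsetting approaches diverge, here is a small sketch that rebuilds the example data and drops one row; gappy, by_position and by_value are names made up for this illustration.

```r
SODF <- data.frame(Price = seq(2, 59, by = 3), Day = 1:20)
gappy <- SODF[-3, ]                                  # pretend Day 3 is missing

# Subset by row position: every 5th remaining row, whatever its Day is
by_position <- gappy[c(1, seq(5, nrow(gappy), 5)), ]
# Subset by value: rows whose Day is a multiple of 5
by_value <- rbind(gappy[1, ], gappy[gappy$Day %% 5 == 0, ])

by_position$Day  # 1 6 11 16
by_value$Day     # 1 5 10 15 20
```

With a row missing, the positional subset drifts off the 5-day grid, while the value-based subset still lands on Days 5, 10, 15 and 20.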
Ananda's is a nice approach (always forget about within myself). Here's another approach:
dat2 <- dat[seq(5, nrow(dat), by=5), ]
data.frame(Difference=diff(c(dat[1,1], dat2[, 1])), Day=dat2[, 2])
Here is a solution if you have a matrix as input.
The function below, given a matrix m, a column col_id and a numeric interval interv, takes every interv-th row of m and subtracts from its value in column col_id the value from the previously selected row (interv rows earlier, same column; the first difference is taken against row 1).
The results are stored in a new column called diff, appended as the last column of the selected rows of m.
In short, the approach is very similar to the one used by @Ananda Mahto.
So, this is the function:
subtract_column <- function(m, col_id, interv) {
select <- c(1, seq(interv, nrow(m), interv))
cbind(m[select[-1], ], diff = diff(m[select, col_id]))
}
Example:
# this emulates your data as a matrix
price_vect <- c(2,5,8,11,14,17,20,23,26,29,32,35,38,41,44,47,50,53,56,59)
day_vect <- 1:20
matr <- do.call(cbind, list(price = price_vect, day = day_vect))
# and this calls the function above and does the job:
# subtracts every 5 rows the current and the previous (5 rows back) value in the column `price` of matrix `matr`
subtract_column(matr, 'price', 5)
Output:
price day diff
[1,] 14 5 12
[2,] 29 10 15
[3,] 44 15 15
[4,] 59 20 15

How to reorder a column in a data frame to be the last column

I have a data frame where columns are constantly being added to it. I also have a total column that I would like to stay at the end. I think I must have skipped over some really basic command somewhere but cannot seem to find the answer anywhere. Anyway, here is some sample data:
x=1:10
y=21:30
z=data.frame(x,y)
z$total=z$x+z$y
z$w=11:20
z$total=z$x+z$y+z$w
When I type z I get this:
x y total w
1 1 21 33 11
2 2 22 36 12
3 3 23 39 13
4 4 24 42 14
5 5 25 45 15
6 6 26 48 16
7 7 27 51 17
8 8 28 54 18
9 9 29 57 19
10 10 30 60 20
Note how the total column comes before the w, and obviously any subsequent columns. Is there a way I can force it to be the last column? I am guessing that I would have to use ncol(z) somehow. Or maybe not.
You can reorder your columns as follows:
z <- z[,c('x','y','w','total')]
To do this programmatically, after you're done adding your columns, you can retrieve their names like so:
nms <- colnames(z)
Then you can grab the ones that aren't 'total' like so:
nms[nms!='total']
Combined with the above:
z <- z[, c(nms[nms!='total'],'total')]
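If this reordering has to be repeated each time columns are added, it can be wrapped in a small function. A sketch, where move_last is a hypothetical helper name and setdiff() does the same job as nms[nms!='total']:

```r
# move_last() is a hypothetical helper, not part of base R.
move_last <- function(df, col) {
  # all other columns in their current order, then `col` at the end
  df[, c(setdiff(names(df), col), col), drop = FALSE]
}

z <- data.frame(x = 1:10, y = 21:30)
z$total <- z$x + z$y
z$w <- 11:20
z$total <- z$x + z$y + z$w
z <- move_last(z, "total")
names(z)  # "x" "y" "w" "total"
```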
You have a logic issue here. Whenever you add to a data.frame, it grows to the right.
Easiest fix: keep total a vector until you are done, and only then append it. It will then be the rightmost column.
(For critical applications, you would of course determine your width k beforehand, allocate k+1 columns and just index the last one for totals.)
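The "compute the total last" idea might look like the sketch below. It assumes that every column of z is numeric and should be included in the total, so rowSums() can be applied to the whole data frame.

```r
z <- data.frame(x = 1:10, y = 21:30)
z$w <- 11:20              # add as many columns as needed first
z$total <- rowSums(z)     # computed last, so it becomes the rightmost column
names(z)  # "x" "y" "w" "total"
```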
