subtraction and division of row values in a data frame column - r

The details of the dataframe are
ID Price Result
1 20 -0.1
2 18 0.1667
3 21 -0.2381
4 16 0.1875
5 19 -1
So I have to subtract the current row from the next row, then divide by the current row: (18-20)/20 = -0.1. For the last row there is no next value, so it is treated as 0: (0-19)/19 = -1.
Please help me with this. I am getting NA at the end.

transform(df, Result = diff(c(Price, 0))/Price)
ID Price Result
1 1 20 -0.1000000
2 2 18 0.1666667
3 3 21 -0.2380952
4 4 16 0.1875000
5 5 19 -1.0000000
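As a rough illustration of why the one-liner above avoids the trailing NA, the same calculation can be spelled out step by step. This is only a sketch: the `nxt` helper vector is a name introduced here for illustration, and the data frame is rebuilt from the question's ID and Price columns.

```r
# Rebuild the example data from the question
df <- data.frame(ID = 1:5, Price = c(20, 18, 21, 16, 19))

# "Next row" prices; the last row has no successor, so pad with 0
nxt <- c(df$Price[-1], 0)

# (next - current) / current, which is what diff(c(Price, 0)) / Price computes
df$Result <- (nxt - df$Price) / df$Price
df
```

Padding with 0 before calling diff keeps the result the same length as Price, so no NA is introduced for the last row.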

Related

How to calculate the difference between rows and divide the difference with the value from the previous row in R?

Let's say I have the following dataframe:
A B C
1 15 14 12
2 7 1 6
3 8 22 5
4 11 5 1
5 4 12 4
I want to calculate the difference between the rows and then divide the difference by the value of the previous row. This is done for each variable.
The result would be something like this:
A B C A_r B_r C_r
1 15 14 12 NA NA NA
2 7 1 6 -0.53 -0.93 -0.50
3 8 22 5 0.14 21.00 -0.17
4 11 5 1 ... ... ...
5 4 12 4 ... ... ...
The general formula would be:
R(n) = [S(n) - S(n-1)] / S(n-1)
Where R represents the newly calculated variable and S represents the current variable the value R is being calculated for (A, B, C in this example).
I know I can use the diff function to calculate the difference but I don't know how I'd divide that difference by the values of previous rows.
We can use across with lag: loop across all the columns (everything()), apply the formula, and create new columns through the .names argument, i.e. add the suffix _r to the corresponding column name ({.col}).
library(dplyr)
df1 <- df1 %>%
  mutate(across(everything(), ~ (. - lag(.)) / lag(.),
                .names = "{.col}_r"))
-output
df1
A B C A_r B_r C_r
1 15 14 12 NA NA NA
2 7 1 6 -0.5333333 -0.9285714 -0.5000000
3 8 22 5 0.1428571 21.0000000 -0.1666667
4 11 5 1 0.3750000 -0.7727273 -0.8000000
5 4 12 4 -0.6363636 1.4000000 3.0000000
Or use base R with diff
df1[paste0(names(df1), "_r")] <- rbind(NA,
diff(as.matrix(df1)))/rbind(NA, df1[-nrow(df1),])
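For readers who find the matrix version hard to follow, the formula R(n) = [S(n) - S(n-1)] / S(n-1) can also be applied one column at a time. This is a minimal sketch, not part of the original answer; `rel_change` is a hypothetical helper name, and df1 is rebuilt from the question's data.

```r
df1 <- data.frame(A = c(15, 7, 8, 11, 4),
                  B = c(14, 1, 22, 5, 12),
                  C = c(12, 6, 5, 1, 4))

# diff(s) is S(n) - S(n-1); head(s, -1) is S(n-1); the first row has no predecessor
rel_change <- function(s) c(NA, diff(s) / head(s, -1))

rates <- as.data.frame(lapply(df1, rel_change))
names(rates) <- paste0(names(df1), "_r")
res <- cbind(df1, rates)
res
```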

Randomly select number (without repetition) for each group in R

I have the following dataframe containing a variable "group" and a variable "number of elements per group"
group elements
1 3
2 1
3 14
4 10
.. ..
.. ..
30 5
Then I have a set of numbers going from 1 to (let's say) 30.
Summing "elements" gives 900. What I want is to randomly select numbers from 1-30 and assign them to each group until I have filled the number of elements for that group. Each number should appear 30 times in total.
Thus, for group 1, I want to randomly select 3 numbers from 1-30,
for group 2, 1 number from 1-30, and so on until all of the groups are filled.
The final table should look like this:
group number(randomly selected)
1 7
1 20
1 7
2 4
3 21
3 20
...
Any suggestions on how I can achieve this?
In base R, if you have df like this...
df
group elements
1 3
2 1
3 14
Then you can do this...
data.frame(group = rep(df$group,                      # repeat group no...
                       df$elements),                   # ...elements times
           number = unlist(sapply(df$elements,         # for each elements...
                                  sample.int,          # ...sample <elements> numbers
                                  n = 30,              # from 1 to 30
                                  replace = FALSE)))   # without duplicates
group number
1 1 19
2 1 15
3 1 28
4 2 15
5 3 20
6 3 18
7 3 27
8 3 10
9 3 23
10 3 12
11 3 25
12 3 11
13 3 14
14 3 13
15 3 16
16 3 26
17 3 22
18 3 7
Give this a try:
df <- read.table(text = "group elements
1 3
2 1
3 14
4 10
30 5", header = TRUE)
# reproducibility
set.seed(1)
df_split2 <- do.call("rbind",
  (lapply(split(df, df$group),
          function(m) cbind(m,
                            `number(randomly selected)` =
                              sample(1:30, replace = TRUE,
                                     size = m$elements),
                            row.names = NULL
          ))))
# remove the elements column
df_split2$elements <- NULL
head(df_split2)
#> group number(randomly selected)
#> 1.1 1 25
#> 1.2 1 4
#> 1.3 1 7
#> 2 2 1
#> 3.1 3 2
#> 3.2 3 29
The split function splits the df into chunks based on the group column. We then take those smaller data frames and add a column to each by sampling from 1:30 a total of elements times. Finally, do.call with rbind binds the list back together.
You have to generate a new data frame that repeats each $group value $elements times; sample can then generate exactly the required number of random numbers. Note that sample draws without replacement here, so the total number of elements cannot exceed 30:
data<-data.frame(group=c(1,2,3,4,5),
elements=c(2,5,2,1,3))
data.elements<-data.frame(group=rep(data$group,data$elements),
number=sample(1:30,sum(data$elements)))
The result:
group number
1 1 9
2 1 4
3 2 29
4 2 28
5 2 18
6 2 7
7 2 25
8 3 17
9 3 22
10 4 5
11 5 3
12 5 8
13 5 26
I solved it as follows:
random_sample <- rep(1:30, each=30)
random_sample <- sample(random_sample)
Then I created a df with this variable and a variable containing one group per row, repeated by the number of elements in the group itself.
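The last step of this self-answer is only described in words, so here is one way it might look. This is a sketch on a scaled-down, hypothetical example (3 numbers, each appearing 2 times, and group sizes summing to 6) rather than the 30 x 30 = 900 case; `n_values`, `times_each`, and `pool` are names introduced for illustration. Note that this approach does not prevent the same number from appearing twice within one group.

```r
set.seed(42)  # for reproducibility only

# Hypothetical small input: the group sizes must sum to n_values * times_each
df <- data.frame(group = 1:3, elements = c(3, 1, 2))
n_values   <- 3   # numbers 1..3 (1..30 in the question)
times_each <- 2   # each number appears twice (30 times in the question)

# Shuffled pool in which every number appears exactly times_each times
pool <- sample(rep(seq_len(n_values), each = times_each))
stopifnot(length(pool) == sum(df$elements))

# One row per element, with the group label repeated elements times
result <- data.frame(group = rep(df$group, df$elements), number = pool)
result
```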

Split data when time intervals exceed a defined value

I have a data frame of GPS locations with a column of seconds. How can I create a new column based on time gaps? For example, for this data frame:
df <- data.frame(secs=c(1,2,3,4,5,6,7,10,11,12,13,14,20,21,22,23,24,28,29,31))
I would like to cut the data frame wherever there is a time gap of 3 or more seconds between locations and create a new column entitled 'bouts' that gives a running tally of the sections, producing a data frame like this:
id secs bouts
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 6 1
7 7 1
8 10 2
9 11 2
10 12 2
11 13 2
12 14 2
13 20 3
14 21 3
15 22 3
16 23 3
17 24 3
18 28 4
19 29 4
20 31 4
Use cumsum and diff:
df$bouts <- cumsum(c(1, diff(df$secs) >= 3))
Remember that logical values get coerced to numeric values 0/1 automatically and that diff output is always one element shorter than its input.
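To make the coercion and the length adjustment concrete, the one-liner can be unpacked into intermediate steps. This is just an illustrative sketch; `gaps` and `new_bout` are names introduced here, and the data are the secs values from the question.

```r
secs <- c(1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14,
          20, 21, 22, 23, 24, 28, 29, 31)

gaps <- diff(secs)               # 19 values for 20 inputs
new_bout <- c(TRUE, gaps >= 3)   # the first row always starts a bout
bouts <- cumsum(new_bout)        # TRUE counts as 1, FALSE as 0

data.frame(secs, bouts)
```

Prepending one element to the diff output restores the original length, so the result lines up with the rows of the data frame.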

How do I perform mathematical operations between values in two columns of an R data frame based on their position?

I'm working on an R project where I'm trying to compare frequencies with their respective values. I have an 11852 x 3 data frame with the position number in column 1, unique values ranging from 1 to 11852 in column 2, and the same set of unique values in different positions in column 3.
Because the values in columns 2 and 3 overlap, I want to find the difference between the positions of each value, based on the position number in the first column, and store it in another data frame. So if the second column has the value 2017 in position one and the third column also has 2017 in position one, the new data frame would have an entry of 2017 and then a value of 0, since they have the same position. If column 2 has a value of 5276 in the second position and column 3 has the value 5276 in position 73, then the new data frame would have a value of 71.
I would love some guidance on how to do this. Thanks.
Let me know if the code below works for you. It generates negative values if the number in the 3rd column occurs above the number in the 2nd column.
#Generate simulated data
n = 20
x <- data.frame(c1 = c(1:n), c2 = sample(n),c3 = sample(n))
#Calculate diff in position by taking difference in order
x$diff = order(x$c3)- order(x$c2)
#Reassign difference to its correct position
x$diff[order(x$c2)] <- x$diff
x
c1 c2 c3 diff
1 1 12 8 4
2 2 11 5 9
3 3 7 4 6
4 4 15 3 12
5 5 19 12 12
6 6 13 1 12
7 7 9 14 12
8 8 18 16 7
9 9 8 7 -8
10 10 16 20 -2
11 11 6 11 1
12 12 4 6 -9
13 13 14 10 -6
14 14 5 17 -12
15 15 10 18 -2
16 16 1 15 -10
17 17 3 19 -13
18 18 2 13 2
19 19 17 9 -5
20 20 20 2 -10
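An alternative way to get the same position difference, offered here only as a sketch and not as part of the original answer, is match(), which looks up where each c2 value sits in c3. It assumes c1 holds the row positions 1..n and that c3 contains every value of c2 exactly once. The three-row data frame below is a made-up example.

```r
# Hypothetical small example: c1 is the position, c2 and c3 hold the same values
x <- data.frame(c1 = 1:3,
                c2 = c(2017, 5276, 10),
                c3 = c(2017, 10, 5276))

# Position of each c2 value within c3, minus its own position in c2
x$diff <- match(x$c2, x$c3) - x$c1
x
```

As in the answer above, a negative value means the number occurs higher up in the third column than in the second.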

In R, how do I set a value for a variable based on the change from the prior (or following) row?

Given a data frame as follows:
id<-c(1,1,1,1,1,1,2,2,2,2,2,2)
t<-c(6,8,9,11,12,14,55,57,58,60,62,63)
p<-c("a","a","a","b","b","b","a","a","b","b","b","b")
df<-data.frame(id,t,p)
row id t p
1 1 6 a
2 1 8 a
3 1 9 a
4 1 11 b
5 1 12 b
6 1 14 b
7 2 55 a
8 2 57 a
9 2 58 b
10 2 60 b
11 2 62 b
12 2 63 b
I want to create a new variable 'ta' such that the value of ta is:
Zero for the row in which 'p' changes from a to b for a given id (rows 4 and 9) (this I can do).
Within each unique id, when p is 'a', the value of ta should count down from zero by the change in t between the row in question and the row below it. For example, for row 3, the value of ta should be 0 - (11-9) = -2.
Within each unique id, when p is 'b', the value of ta should count up from zero by the change in t between the row in question and the row above it. For example, for row 5, the value of ta should be 0 + (12-11) = 1.
Thus, when complete, the data frame should look as follows:
row id t p ta
1 1 6 a -5
2 1 8 a -3
3 1 9 a -2
4 1 11 b 0
5 1 12 b 1
6 1 14 b 3
7 2 55 a -3
8 2 57 a -1
9 2 58 b 0
10 2 60 b 2
11 2 62 b 4
12 2 63 b 5
I've been playing around with loops and cumsum() and head() and tail() and can't quite make this kind of within id/within condition summing work. There are a number of other questions about working with values from previous or following rows, but I can't quite reshape any of those techniques to work here. Your thoughts are greatly appreciated.
Here you go. This is a split-apply-combine strategy of breaking everything up by id, establishing the transition point between p=='a' and p=='b' and then subtracting values above and below that. It only works if your data are actually ordered in the way you show them here.
do.call('rbind',
  lapply(split(df, df$id), function(x) {
    # save values of `0` at transition points in `p`
    x <- cbind.data.frame(x, ta=ifelse(c(0,diff(as.numeric(as.factor(x$p))))==1, 0, NA))
    # identify indices for those points
    w <- which(x$ta==0)
    # handle `ta` values for `p=='b'`
    x$ta[(w+1):nrow(x)] <- x$ta[w] + (x$t[(w+1):nrow(x)] - x$t[w])
    # handle `ta` values for `p=='a'`
    x$ta[1:(w-1)] <- x$ta[w] - (x$t[w] - x$t[1:(w-1)])
    return(x)
  })
)
Result:
id t p ta
1.1 1 6 a -5
1.2 1 8 a -3
1.3 1 9 a -2
1.4 1 11 b 0
1.5 1 12 b 1
1.6 1 14 b 3
2.7 2 55 a -3
2.8 2 57 a -1
2.9 2 58 b 0
2.10 2 60 b 2
2.11 2 62 b 4
2.12 2 63 b 5
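Since every ta value in the expected output equals t minus the t of the transition row within the same id, a shorter route may be possible with ave(). This is a sketch under two assumptions that go beyond the answer above: t increases within each id, and each id has at least one 'b' row. `t0` is a name introduced here for illustration.

```r
df <- data.frame(id = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
                 t  = c(6, 8, 9, 11, 12, 14, 55, 57, 58, 60, 62, 63),
                 p  = c("a", "a", "a", "b", "b", "b", "a", "a", "b", "b", "b", "b"))

# t at the first 'b' row of each id (the smallest t among that id's 'b' rows)
t0 <- ave(ifelse(df$p == "b", df$t, NA), df$id,
          FUN = function(v) min(v, na.rm = TRUE))

# Rows before the transition come out negative, rows after it positive
df$ta <- df$t - t0
df
```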

Resources