This question already has answers here:
R: Differences by group and adding
(3 answers)
Closed 6 years ago.
I have the following dataset:
df <- data.frame (id= c(1,1,1,2,2), time = c(13,14,17,17,17))
id time
1 1 13
2 1 14
3 1 17
4 2 17
5 2 17
and I want to go through each id and subtract the previous time from the current time, with 0 for the first row of each id. My ideal output is:
#output
id time diff
1 1 13 0
2 1 14 1
3 1 17 3
4 2 17 0
5 2 17 0
What is the most efficient way to do that?
Thanks to Zheyuan Li.
This is a great solution:
df$diff <- with(df, ave(time, id, FUN = function (x) c(0, diff(x))))
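For anyone wondering why the leading 0 is needed, here is a minimal sketch of the same idea split into steps. It assumes the rows are already sorted by time within each id; the helper name lagged_diff is only for illustration.

```r
# Rebuild the example data from the question.
df <- data.frame(id = c(1, 1, 1, 2, 2), time = c(13, 14, 17, 17, 17))

# diff() returns one element fewer than its input, so a leading 0 is
# prepended to keep one value per row (the first row of a group has no
# previous time to subtract).
lagged_diff <- function(x) c(0, diff(x))

# ave() applies the function separately within each id and puts the
# results back in the original row order.
df$diff <- ave(df$time, df$id, FUN = lagged_diff)

print(df)
```

If the data is not sorted, ordering it first with something like df[order(df$id, df$time), ] should give the intended differences.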
This question already has answers here:
subtract first or second value from each row [duplicate]
(2 answers)
Closed 3 days ago.
I would like to create a new column in my dataset that shows the difference in the values (column b in example dataset) between the current row and the first row within a group (column a in example dataset) in R. How would I go about doing this?
a<-c(1,1,1,1,2,2,2,2)
b<-c(2,4,6,8,10,12,14,16)
have<-as.data.frame(cbind(a,b))
> have
a b
1 2
1 4
1 6
1 8
2 10
2 12
2 14
2 16
> want
a b c
1 2 0
1 4 2
1 6 4
1 8 6
2 10 0
2 12 2
2 14 4
2 16 6
You can use first() to address the first member in the group:
library(dplyr)
as.data.frame(cbind(a,b)) %>%
group_by(a) %>%
mutate(c = b - first(b)) %>%
ungroup()
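If you would rather not load dplyr, the same result can probably be had in base R with ave(). This is only a sketch of that alternative, not part of the answer above; it takes the first element of each group with x[1].

```r
# Example data from the question.
a <- c(1, 1, 1, 1, 2, 2, 2, 2)
b <- c(2, 4, 6, 8, 10, 12, 14, 16)
have <- data.frame(a, b)

# Within each group of a, subtract the group's first b from every b.
have$c <- ave(have$b, have$a, FUN = function(x) x - x[1])

print(have)
```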
This question already has an answer here:
Include levels of zero count in result of table()
(1 answer)
Closed 2 years ago.
I have this collection
x <- c(3,4,5,7,7,9,9,9,10,10,10,10,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,13,13,13,13,13,14,14,14,15,15)
And I want to get the frequencies of each value of the sequence 3:15 within that collection. If I do table(x) it gives me the frequencies of the existing values, but for example, the value 6 would have a frequency value of 0 and is not shown with table().
Use factor with levels in table.
table(factor(x, levels = 3:15))
# 3 4 5 6 7 8 9 10 11 12 13 14 15
# 1 1 1 0 2 0 3 4 7 10 14 3 2
Or, for the general case:
table(factor(x, levels = min(x):max(x)))
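Note that min(x):max(x) assumes x holds whole numbers. As a small illustration on a made-up vector (not the data from the question), the zero counts can then be used to list which values in the range never occur:

```r
# Hypothetical small vector, chosen only for illustration.
x <- c(3, 5, 5, 7)

# Every integer from min(x) to max(x) becomes a level, so absent values
# are reported with a count of 0.
counts <- table(factor(x, levels = min(x):max(x)))

# Names of the levels whose count is zero.
missing_values <- names(counts)[as.vector(counts) == 0]

print(counts)
print(missing_values)
```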
This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 2 years ago.
library(data.table)
dataHAVE=data.frame("student"=c(1,2,3),
"score" = c(10,11,12),
"count"=c(4,1,2))
dataWANT=data.frame("student"=c(1,1,1,1,2,3,3),
"score"=c(10,10,10,10,11,12,12),
"count"=c(4,4,4,4,1,2,2))
setDT(dataHAVE)dataHAVE[rep(1:.N,count)][,Indx:=1:.N,by=student]
I have the data 'dataHAVE' and want to produce 'dataWANT', which repeats each 'student' row 'count' times. I tried doing this in data.table as shown above, since that is the kind of solution I am looking for, but I get this error:
Error: unexpected symbol in "setDT(dat)dat"
and I cannot resolve it. Thank you so much.
Try:
setDT(dataHAVE)[rep(1:.N,count)]
Output:
student score count
1: 1 10 4
2: 1 10 4
3: 1 10 4
4: 1 10 4
5: 2 11 1
6: 3 12 2
7: 3 12 2
You could also replace 1:.N with .I and do setDT(dataHAVE)[dataHAVE[, rep(.I, count)]].
Just FYI, there's also a nice function in tidyr that does a similar thing:
tidyr::uncount(dataHAVE, count, .remove = FALSE)
Here is a base R solution
dataWANT<-do.call(rbind,
c(with(dataHAVE,rep(split(dataHAVE,student),count)),
make.row.names = FALSE))
such that
> dataWANT
student score count
1 1 10 4
2 1 10 4
3 1 10 4
4 1 10 4
5 2 11 1
6 3 12 2
7 3 12 2
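Another base R option is to repeat the row indices directly, which avoids split() and rbind(). This is a sketch of that variant; the Indx column is added only to mirror the per-student index attempted in the question.

```r
# Example data from the question.
dataHAVE <- data.frame(student = c(1, 2, 3),
                       score = c(10, 11, 12),
                       count = c(4, 1, 2))

# Repeat each row number as many times as its count says.
idx <- rep(seq_len(nrow(dataHAVE)), times = dataHAVE$count)
dataWANT <- dataHAVE[idx, ]
rownames(dataWANT) <- NULL

# Running index within each student (1, 2, ... per student).
dataWANT$Indx <- ave(seq_along(idx), dataWANT$student, FUN = seq_along)

print(dataWANT)
```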
This question already has answers here:
How to sum a variable by group
(18 answers)
Closed 6 years ago.
I have searched a lot, but not found a solution.
I have the following data frame:
Age no.observations Factor
1 1 4 A
2 1 3 A
3 1 12 A
4 1 5 B
5 1 9 B
6 1 3 B
7 2 12 A
8 2 3 A
9 2 6 A
10 2 7 B
11 2 9 B
12 2 1 B
I would like to create another column with the sum by the categories Age and Factor, thus having 19 for the first three rows, 17 for the next three, etc. I want this to be a column added to this data.frame, so dplyr and its summarise function do not help.
Use mutate with group_by to not summarise:
library(dplyr)
df %>%
group_by(Age, Factor) %>%
mutate(no.observations.in.group = sum(no.observations)) %>%
ungroup()
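A base R sketch of the same idea, for comparison: ave() accepts more than one grouping vector, so the group sum can be written back to every row without dplyr. The data frame below is rebuilt from the table in the question.

```r
# Data frame rebuilt from the table in the question.
df <- data.frame(
  Age = rep(c(1, 2), each = 6),
  no.observations = c(4, 3, 12, 5, 9, 3, 12, 3, 6, 7, 9, 1),
  Factor = rep(rep(c("A", "B"), each = 3), times = 2)
)

# Sum no.observations within each Age/Factor combination and repeat the
# sum on every row of that combination.
df$no.observations.in.group <- ave(df$no.observations, df$Age, df$Factor,
                                   FUN = sum)

print(df)
```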
This question already has answers here:
How to find the difference in value in every two consecutive rows in R?
(4 answers)
Closed 7 years ago.
I have a dataframe like so:
df <- data.frame(start=c(5,4,2),end=c(2,6,3))
start end
5 2
4 6
2 3
And I want the following result:
start end diff
5 2
4 6 1
2 3 -1
Essentially it is:
end[2] (second row) - start[1] = 6-5=1
and end[3] - start[2] = 3-4 = -1
What is a good way of doing this in R?
Just a simple vector subtraction should work
df$diff <- c(NA,df[2:nrow(df), 2] - df[1:(nrow(df)-1), 1])
start end diff
1 5 2 NA
2 4 6 1
3 2 3 -1
Alternatively, with data.table:
library(data.table)
setDT(df)[,value:=end-shift(start,1,type="lag")]
start end value
1: 5 2 NA
2: 4 6 1
3: 2 3 -1
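The same offset subtraction can also be sketched in base R with head() and tail(), which some people find easier to read than explicit index ranges. Column names are used here instead of column positions.

```r
# Example data from the question.
df <- data.frame(start = c(5, 4, 2), end = c(2, 6, 3))

# tail(x, -1) drops the first element and head(x, -1) drops the last, so
# each end is paired with the start of the previous row. The first row has
# no previous row and gets NA.
df$diff <- c(NA, tail(df$end, -1) - head(df$start, -1))

print(df)
```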