Subtract from the previous row R [duplicate] - r

I have a dataframe like so:
df <- data.frame(start=c(5,4,2),end=c(2,6,3))
start end
5 2
4 6
2 3
And I want the following result:
start end diff
5 2
4 6 1
2 3 -1
Essentially it is:
end[2] (second row) - start[1] = 6-5=1
and end[3] - start[2] = 3-4 = -1
What is a good way of doing this in R?

A simple vector subtraction should work:
df$diff <- c(NA,df[2:nrow(df), 2] - df[1:(nrow(df)-1), 1])
start end diff
1 5 2 NA
2 4 6 1
3 2 3 -1
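The same subtraction can be written with column names instead of positional indices. This is only a sketch of an alternative spelling, assuming the columns are called start and end as in the question; head() and tail() with a negative n drop the last and first element respectively.

```r
df <- data.frame(start = c(5, 4, 2), end = c(2, 6, 3))

# end from row 2 onward, minus start up to the second-to-last row;
# the first row has no previous row, so it gets NA
df$diff <- c(NA, tail(df$end, -1) - head(df$start, -1))
df
```

Naming the columns keeps the code working if the column order of df ever changes.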

library(data.table)
setDT(df)[,value:=end-shift(start,1,type="lag")]
start end value
1: 5 2 NA
2: 4 6 1
3: 2 3 -1

Related

Receive the total sum score of every number [duplicate]

Using as input data frame:
df1 <- data.frame(num = c(1,1,1,2,2,2,3))
How can I get the number of times each value occurs in the num column?
Example output:
num frequency
1 3
2 3
3 1
Use table and coerce the result to a data frame:
as.data.frame(table(df1$num))
# Var1 Freq
# 1 1 3
# 2 2 3
# 3 3 1
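or, for positive integers that run from 1 to max(num) without gaps and first appear in ascending order, use tabulate: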
with(df1, data.frame(num=unique(num), freq=tabulate(num)))
# num freq
# 1 1 3
# 2 2 3
# 3 3 1
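To get the column names from the example output directly, table can be given a named argument and as.data.frame a responseName. A minimal sketch, assuming the desired names are num and frequency as in the question:

```r
df1 <- data.frame(num = c(1, 1, 1, 2, 2, 2, 3))

# Naming the table dimension sets the first column name;
# responseName sets the name of the count column
counts <- as.data.frame(table(num = df1$num), responseName = "frequency")

# table() turns the values into factor levels; convert back to numeric if needed
counts$num <- as.numeric(as.character(counts$num))
counts
```

Unlike the tabulate version, this does not depend on the values being consecutive integers starting at 1.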

R Repeat Rows Data Table [duplicate]

library(data.table)
dataHAVE=data.frame("student"=c(1,2,3),
"score" = c(10,11,12),
"count"=c(4,1,2))
dataWANT=data.frame("student"=c(1,1,1,1,2,3,3),
"score"=c(10,10,10,10,11,12,12),
"count"=c(4,4,4,4,1,2,2))
setDT(dataHAVE)dataHAVE[rep(1:.N,count)][,Indx:=1:.N,by=student]
I have the data 'dataHAVE' and want to produce 'dataWANT', which repeats each 'student' row 'count' times. I tried to do this in data.table as shown above, since that is the solution I am after, but I get this error:
Error: unexpected symbol in "setDT(dat)dat"
I cannot resolve it. Thank you so much.
Try:
setDT(dataHAVE)[rep(1:.N,count)]
Output:
student score count
1: 1 10 4
2: 1 10 4
3: 1 10 4
4: 1 10 4
5: 2 11 1
6: 3 12 2
7: 3 12 2
You could also replace 1:.N with .I: setDT(dataHAVE)[dataHAVE[, rep(.I, count)]].
Just FYI, there's also a nice function in tidyr that does a similar thing:
tidyr::uncount(dataHAVE, count, .remove = FALSE)
Here is a base R solution
dataWANT<-do.call(rbind,
c(with(dataHAVE,rep(split(dataHAVE,student),count)),
make.row.names = FALSE))
such that
> dataWANT
student score count
1 1 10 4
2 1 10 4
3 1 10 4
4 1 10 4
5 2 11 1
6 3 12 2
7 3 12 2
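A shorter base R variant indexes the rows directly instead of splitting and re-binding. This is a sketch, assuming dataHAVE is a plain data.frame (not yet converted with setDT); the Indx column mirrors the per-student counter the question tried to add and is built here with sequence().

```r
dataHAVE <- data.frame(student = c(1, 2, 3),
                       score = c(10, 11, 12),
                       count = c(4, 1, 2))

# Repeat each row index as many times as its count says
idx <- rep(seq_len(nrow(dataHAVE)), times = dataHAVE$count)
dataWANT <- dataHAVE[idx, ]

# Drop the 1, 1.1, 1.2, ... row names produced by repeated indexing
rownames(dataWANT) <- NULL

# Running counter within each repeated block: 1..4, 1, 1..2
dataWANT$Indx <- sequence(dataHAVE$count)
dataWANT
```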

Group-ID according to numbering reset [duplicate]

I have following data:
d <- as_tibble(c(1,2,1,2,3,4,5,1,2,3,4,1,2,3,4,5,6,7))
Each run of increasing numbers is one group, and every time
the numbering resets I need a new group. What I need is a group ID for
every numbering reset; hence:
d$ID <- c(1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,4,4,4)
To visualize it:
value ID
1 1
2 1
1 2
2 2
3 2
4 2
5 2
1 3
2 3
3 3
4 3
1 4
2 4
3 4
4 4
5 4
6 4
7 4
I have tried using group_indices from dplyr, but
that doesn't do the trick, as it groups by identical values:
d$ID <- d %>% group_indices(value)
We can use diff to subtract the previous value from the current one and increment the counter whenever the values reset.
cumsum(c(TRUE, diff(d$value) < 0))
#[1] 1 1 2 2 2 2 2 3 3 3 3 4 4 4 4 4 4 4
In dplyr, we can use lag to compare each value with the previous one.
library(dplyr)
d %>% mutate(ID = cumsum(value < lag(value, default = first(value))) + 1)
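The diff/cumsum idea can be traced step by step on a plain vector. A minimal sketch using the values from the question; the intermediate name is_reset is only illustrative.

```r
value <- c(1, 2, 1, 2, 3, 4, 5, 1, 2, 3, 4, 1, 2, 3, 4, 5, 6, 7)

# TRUE wherever the value drops compared with the previous element;
# the leading TRUE makes the first element start group 1
is_reset <- c(TRUE, diff(value) < 0)

# Each TRUE bumps the running total, giving one ID per run
id <- cumsum(is_reset)
id
```

Because the test is a strict decrease, two equal consecutive values would stay in the same group.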

Creating a repeating vector sequence in R [duplicate]

I need some help. How to create the following vector sequence:
1 1 1 1 2 2 2 3 3 4
I tried to use rep and seq, but was still unsuccessful.
Try this:
rep(1:4,4:1)
Output:
[1] 1 1 1 1 2 2 2 3 3 4
or less concisely: c(rep(1, 4), rep(2,3), rep(3, 2), 4)
output: [1] 1 1 1 1 2 2 2 3 3 4
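The same pattern generalizes to any length. A small sketch, where n is a hypothetical parameter not present in the question: the value i is repeated n - i + 1 times.

```r
n <- 4

# The second argument gives the repeat count for each element: n, n-1, ..., 1
x <- rep(seq_len(n), times = rev(seq_len(n)))
x
```

With n <- 4 this is the same call as rep(1:4, 4:1).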

R - Subset dataframe where 2 columns have values [duplicate]

How can I subset a dataframe where 2 columns have values?
For example:
A B
1 2
3
5 6
8
becomes
A B
1 2
5 6
> subset(df, !is.na(df$A) & !is.na(df$B))
> df[!is.na(df$A) & !is.na(df$B),]
> df[!is.na(rowSums(df)),]
> na.omit(df)
These all return the same rows here (the rowSums version assumes every column is numeric).
The easiest way is to use na.omit (if you are targeting NA values).
See the following R snippet:
> x
a b
1 1 2
2 3 NA
3 5 6
4 NA 8
> na.omit(x)
a b
1 1 2
3 5 6
Another way is to use complete.cases as shown below:
> x[complete.cases(x),]
a b
1 1 2
3 5 6
You can also use na.exclude as shown below:
> na.exclude(x)
a b
1 1 2
3 5 6
Hope it works for you!
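If the data frame has more columns than the two of interest, na.omit and complete.cases on the whole frame will also drop rows that are missing only in the other columns. A sketch of restricting the check to A and B; the extra column C is hypothetical and added only to show the difference.

```r
df <- data.frame(A = c(1, 3, 5, NA),
                 B = c(2, NA, 6, 8),
                 C = c(NA, 1, 1, 1))

# Check completeness on columns A and B only, ignoring C
keep <- complete.cases(df[, c("A", "B")])
result <- df[keep, ]
result

# For comparison: na.omit(df) also drops row 1 because C is NA there
nrow(na.omit(df))
```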
