R Repeat Rows Data Table [duplicate]

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 2 years ago.
library(data.table)
dataHAVE=data.frame("student"=c(1,2,3),
"score" = c(10,11,12),
"count"=c(4,1,2))
dataWANT=data.frame("student"=c(1,1,1,1,2,3,3),
"score"=c(10,10,10,10,11,12,12),
"count"=c(4,4,4,4,1,2,2))
setDT(dataHAVE)dataHAVE[rep(1:.N,count)][,Indx:=1:.N,by=student]
I have the data 'dataHAVE' and want to produce 'dataWANT', which repeats each student's row 'count' times. I tried doing this in data.table, as shown above, since that is the kind of solution I am after, but I get the error
Error: unexpected symbol in "setDT(dat)dat"
and I cannot resolve it. Thank you so much.

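The error occurs because setDT(dataHAVE) and dataHAVE[...] sit on the same line with nothing between them; R needs a newline or a semicolon to separate two statements. You can also chain the subset directly onto setDT(). Try: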
setDT(dataHAVE)[rep(1:.N,count)]
Output:
student score count
1: 1 10 4
2: 1 10 4
3: 1 10 4
4: 1 10 4
5: 2 11 1
6: 3 12 2
7: 3 12 2
Alternatively, build the row index with .I: setDT(dataHAVE)[dataHAVE[, rep(.I, count)]].
There is also a function in tidyr that does the same thing (.remove = FALSE keeps the count column):
tidyr::uncount(dataHAVE, count, .remove = FALSE)

Here is a base R solution
dataWANT<-do.call(rbind,
c(with(dataHAVE,rep(split(dataHAVE,student),count)),
make.row.names = FALSE))
such that
> dataWANT
student score count
1 1 10 4
2 1 10 4
3 1 10 4
4 1 10 4
5 2 11 1
6 3 12 2
7 3 12 2
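A shorter base R sketch (not from the original answers) indexes the rows directly with rep(), so it does not depend on the rows being sorted by student the way the split() version does. The ave() line is an optional base R stand-in for the asker's Indx := 1:.N, by = student.

```r
dataHAVE <- data.frame(student = c(1, 2, 3),
                       score   = c(10, 11, 12),
                       count   = c(4, 1, 2))

# rep(1:3, c(4, 1, 2)) gives 1 1 1 1 2 3 3: row i is repeated count[i] times
out <- dataHAVE[rep(seq_len(nrow(dataHAVE)), dataHAVE$count), ]
rownames(out) <- NULL  # drop the "1.1", "1.2", ... row names created by the repeated index

# Optional: number the copies within each student, like Indx in the question
out$Indx <- ave(out$student, out$student, FUN = seq_along)
out
```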

Related

Calculate difference between current row and first row within group [duplicate]

This question already has answers here:
subtract first or second value from each row [duplicate]
(2 answers)
Closed 3 days ago.
I would like to create a new column in my dataset that shows the difference in the values (column b in example dataset) between the current row and the first row within a group (column a in example dataset) in R. How would I go about doing this?
a<-c(1,1,1,1,2,2,2,2)
b<-c(2,4,6,8,10,12,14,16)
have<-as.data.frame(cbind(a,b))
> have
a b
1 2
1 4
1 6
1 8
2 10
2 12
2 14
2 16
> want
a b c
1 2 0
1 4 2
1 6 4
1 8 6
2 10 0
2 12 2
2 14 4
2 16 6
You can use first() to address the first member in the group:
library(dplyr)
as.data.frame(cbind(a,b)) %>%
group_by(a) %>%
mutate(c = b - first(b)) %>%
ungroup()
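Without dplyr, a base R sketch does the same with ave(), which applies a function within each group of a and returns the results in the original row order:

```r
a <- c(1, 1, 1, 1, 2, 2, 2, 2)
b <- c(2, 4, 6, 8, 10, 12, 14, 16)
have <- data.frame(a, b)

# Within each group of a, subtract the group's first b value from every b value
have$c <- ave(have$b, have$a, FUN = function(x) x - x[1])
have
```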

How can I remove one number each time while replicating a number of sequence in r? [duplicate]

This question already has answers here:
Generate an incrementally increasing sequence like 112123123412345
(4 answers)
Closed 1 year ago.
I have a sequence of numbers like this:
> seq(2,6,1)
[1] 2 3 4 5 6
I would like to replicate this sequence, removing one more number from the end with each repetition. This is what I want to have:
[1] 2 3 4 5 6 2 3 4 5 2 3 4 2 3 2
Are there any functions in R that can help me get this sequence?
Thank you very much.
I am sure there is a fancier way, but the following code achieves the goal in base R.
out = integer()
a = 2:6
while (length(a) > 0) {
  out = c(out, a)      # append the current sequence first
  a = a[-length(a)]    # then drop its last element
}
out
#> [1] 2 3 4 5 6 2 3 4 5 2 3 4 2 3 2
Created on 2021-03-18 by the reprex package (v1.0.0)
That should be it (the from argument of sequence() needs R 4.0.0 or later):
sequence(5:1, from = 2)
[1] 2 3 4 5 6 2 3 4 5 2 3 4 2 3 2
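An equivalent that does not use sequence()'s from argument builds each shrinking run with lapply() and concatenates them. This is a sketch; the end points 6:2 and the start 2 are taken from the question's example.

```r
# Build 2:6, 2:5, 2:4, 2:3, 2:2 and concatenate them into one vector
out <- unlist(lapply(6:2, function(k) 2:k))
out
#> [1] 2 3 4 5 6 2 3 4 5 2 3 4 2 3 2
```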

R - Subset dataframe where 2 columns have values [duplicate]

This question already has answers here:
Remove rows with all or some NAs (missing values) in data.frame
(18 answers)
Closed 6 years ago.
How can I subset a data frame to keep only the rows where both columns have values?
For example:
A  B
1  2
3  NA
5  6
NA 8
becomes
A  B
1  2
5  6
> subset(df, !is.na(df$A) & !is.na(df$B))
> df[!is.na(df$A) & !is.na(df$B),]
> df[!is.na(rowSums(df)),]
> na.omit(df)
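All four return the same rows here. Note that the rowSums version only works when every column is numeric, and na.omit() checks every column (not just A and B) and also attaches an na.action attribute to the result.
The easiest way is to use na.omit (if the missing values are NA).
See the following R snippet: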
> x
a b
1 1 2
2 3 NA
3 5 6
4 NA 8
> na.omit(x)
a b
1 1 2
3 5 6
Another way is to use complete.cases as shown below:
> x[complete.cases(x),]
a b
1 1 2
3 5 6
You can also use na.exclude as shown below:
> na.exclude(x)
a b
1 1 2
3 5 6
Hope it works for you!
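If the data frame has other columns that may legitimately contain NA, na.omit() would drop those rows too. A sketch that checks only A and B passes just those columns to complete.cases(); the column C here is a hypothetical addition for illustration.

```r
df <- data.frame(A = c(1, 3, 5, NA),
                 B = c(2, NA, 6, 8),
                 C = c(NA, 1, NA, 1))  # hypothetical extra column with its own NAs

# Keep rows where A and B are both non-missing; NAs in C are ignored
kept <- df[complete.cases(df[, c("A", "B")]), ]
kept

# For comparison, na.omit() looks at every column and drops all four rows here
nrow(na.omit(df))
```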

Loop over each group and subtract their value [duplicate]

This question already has answers here:
R: Differences by group and adding
(3 answers)
Closed 6 years ago.
I have the following dataset:
df <- data.frame (id= c(1,1,1,2,2), time = c(13,14,17,17,17))
id time
1 1 13
2 1 14
3 1 17
4 2 17
5 2 17
and I want to go over each id and subtract the previous time from the current time (0 for the first row of each id). So my ideal output is:
#output
id time diff
1 1 13 0
2 1 14 1
3 1 17 3
4 2 17 0
5 2 17 0
What is the most efficient way to do that?
Thanks to Zheyuan Li.
This is a great solution:
df$diff <- with(df, ave(time, id, FUN = function (x) c(0, diff(x))))
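Since the question asks about efficiency, a fully vectorized base R sketch avoids calling a function once per group. It assumes the rows are already sorted so that each id's rows are contiguous.

```r
df <- data.frame(id = c(1, 1, 1, 2, 2), time = c(13, 14, 17, 17, 17))

# Difference from the previous row across the whole column...
df$diff <- c(0, diff(df$time))
# ...then reset the first row of every id to 0 (assumes each id's rows are contiguous)
df$diff[!duplicated(df$id)] <- 0
df
```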

Subtract from the previous row R [duplicate]

This question already has answers here:
How to find the difference in value in every two consecutive rows in R?
(4 answers)
Closed 7 years ago.
I have a dataframe like so:
df <- data.frame(start=c(5,4,2),end=c(2,6,3))
start end
5 2
4 6
2 3
And I want the following result:
start end diff
5 2
4 6 1
2 3 -1
Essentially, each diff is the current row's end minus the previous row's start:
end[2] - start[1] = 6 - 5 = 1
end[3] - start[2] = 3 - 4 = -1
What is a good way of doing this in R?
A simple vector subtraction should work:
df$diff <- c(NA,df[2:nrow(df), 2] - df[1:(nrow(df)-1), 1])
start end diff
1 5 2 NA
2 4 6 1
3 2 3 -1
library(data.table)
setDT(df)[,value:=end-shift(start,1,type="lag")]
start end value
1: 5 2 NA
2: 4 6 1
3: 2 3 -1
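The same lag can be written in base R by column name with head() and tail(), which avoids the hard-coded column positions 1 and 2 (a sketch, not from the original answers):

```r
df <- data.frame(start = c(5, 4, 2), end = c(2, 6, 3))

# end of rows 2..n minus start of rows 1..(n-1); the first row has no previous row
df$diff <- c(NA, tail(df$end, -1) - head(df$start, -1))
df
```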
