I have a dataset with repeated measures, and I want to use them to assign IDs. The repeated measures come from sequences of consecutive days. However, the sequences may be unbalanced (e.g., some have more days and others fewer; some start with day 1, but a few start with day 2 or 3). My question is how to create and assign the same ID within the same block of a sequence. Here is a toy dataset:
days <- data.frame(
day = c(1L,2L,3L,4L,5L,6L,8L,9L,10L,
2L,3L,4L,5L,6L,7L,9L,10L,
1L,2L,4L,5L,6L,8L,9L,10L,
1L,2L,3L,4L,5L,6L,7L,8L,9L,10L)
)
Here is the end result I expect:
id day
1 1 1
2 1 2
3 1 3
4 1 4
5 1 5
6 1 6
7 1 8
8 1 9
9 1 10
10 2 2
11 2 3
12 2 4
13 2 5
14 2 6
15 2 7
16 2 9
17 2 10
18 3 1
19 3 2
20 3 4
21 3 5
22 3 6
23 3 8
24 3 9
25 3 10
26 4 1
27 4 2
28 4 3
29 4 4
30 4 5
31 4 6
32 4 7
33 4 8
34 4 9
35 4 10
Get the difference between adjacent elements, check whether it is less than 0 (the day number drops whenever a new block starts), and take the cumulative sum:
days$id <- cumsum(c(TRUE, diff(days$day) < 0))
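To make the one-liner easier to follow, here is a minimal sketch that spells out each intermediate step on a shorter, made-up vector. The names step and new_block are only illustrative, and the approach assumes that every new block starts at a lower day than the previous block ended.

```r
# Made-up input: three blocks of days, each restarting at a lower day number.
day <- c(1L, 2L, 3L, 5L, 2L, 3L, 4L, 1L, 2L)

step <- diff(day)               # change from each day to the next
new_block <- c(TRUE, step < 0)  # the first row opens a block; so does every drop
id <- cumsum(new_block)         # running count of blocks opened so far

data.frame(id, day)
```

If a block could end on a day lower than the next block's first day (say, one block ends at 3 and the next starts at 5), the drop would not occur and the two blocks would be merged, so a different rule would be needed in that case.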
I'm new to R and computer science, so any help is appreciated.
I'm trying to figure out how to create a new column (y) in a data frame based on the value of x: each further step of 10 in x should get a new value of y. It's quite difficult to explain, and I don't know whether to start with a loop or an if statement. For example, the current data set is:
event_id x
1 0
2 2
3 5
4 11
5 12
6 17
7 25
8 28
9 30
10 34
but I want it to look like this
event_id x y
1 0 1
2 2 1
3 5 1
4 11 2
5 12 2
6 17 2
7 25 3
8 28 3
9 30 3
10 34 4
I hope this makes sense: the first 3 values are all < 10, so they are given a value of 1; the next 3 values are between 10 and 20, so y is 2, and so on.
df$y <- with(
df,
findInterval(x, seq(0, max(x) + 10, by = 10))
)
df
event_id x y
1 1 0 1
2 2 2 1
3 3 5 1
4 4 11 2
5 5 12 2
6 6 17 2
7 7 25 3
8 8 28 3
9 9 30 4
10 10 34 4
This assumes 30 should be mapped to 4, just as 0 is mapped to 1.
You can use
df$y = df$x %/% 10 + 1
df$y <- floor(df$x / 10.1) + 1
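This matches the OP's request for this data, where 30 maps to 3 (the 10.1 divisor is a workaround that drifts for larger x; for example, x = 111 yields 11 instead of 12):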
# event_id x y
#1 1 0 1
#2 2 2 1
#3 3 5 1
#4 4 11 2
#5 5 12 2
#6 6 17 2
#7 7 25 3
#8 8 28 3
#9 9 30 3
#10 10 34 4
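If the intended bins are [0, 10], (10, 20], (20, 30], and so on, which is what the requested output suggests, one possible alternative is base R's cut() with right-closed intervals. This is a sketch under that assumption, not something taken from the answers above; the breaks vector reuses the seq() call from the findInterval() answer.

```r
df <- data.frame(event_id = 1:10,
                 x = c(0, 2, 5, 11, 12, 17, 25, 28, 30, 34))

# Break points 0, 10, 20, 30, 40 for this data.
breaks <- seq(0, max(df$x) + 10, by = 10)

# Intervals are closed on the right by default, so 30 falls in (20, 30];
# include.lowest = TRUE puts 0 into the first bin; labels = FALSE returns bin numbers.
df$y <- cut(df$x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
df
```

Unlike dividing by 10.1, this keeps the bin edges exactly at multiples of 10 however large x gets.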
Suppose I have the following data frame.
table <- data.frame(
  group = c(0, 5, 10, 15, 20, 25, 30, 35, 40,
            0, 5, 10, 15, 20, 25, 30, 35, 40,
            0, 5, 10, 15, 20, 25, 30, 35, 40),
  plan = c(1, 1, 1, 1, 1, 1, 1, 1, 1,
           2, 2, 2, 2, 2, 2, 2, 2, 2,
           3, 3, 3, 3, 3, 3, 3, 3, 3),
  price = c(1, 4, 5, 6, 8, 9, 12, 12, 12,
            3, 5, 6, 7, 10, 12, 20, 20, 20,
            5, 6, 8, 12, 15, 20, 22, 28, 28)
)
group plan price
1 0 1 1
2 5 1 4
3 10 1 5
4 15 1 6
5 20 1 8
6 25 1 9
7 30 1 12
8 35 1 12
9 40 1 12
10 0 2 3
11 5 2 5
12 10 2 6
13 15 2 7
14 20 2 10
15 25 2 12
16 30 2 20
17 35 2 20
18 40 2 20
How can I get the rows for each plan up to the first occurrence of the maximum price, without the duplicated maximum rows that follow?
So the result would be:
group plan price
1 0 1 1
2 5 1 4
3 10 1 5
4 15 1 6
5 20 1 8
6 25 1 9
7 30 1 12
10 0 2 3
11 5 2 5
12 10 2 6
13 15 2 7
14 20 2 10
15 25 2 12
16 30 2 20
You can use slice in dplyr:
library(dplyr)
table %>%
group_by(plan) %>%
slice(1:which.max(price == max(price)))
which.max gives the index of the first occurrence of price == max(price). Using that, I can slice the data.frame to only keep rows for each plan up to the maximum price.
Result:
# A tibble: 22 x 3
# Groups: plan [3]
group plan price
<dbl> <dbl> <dbl>
1 0 1 1
2 5 1 4
3 10 1 5
4 15 1 6
5 20 1 8
6 25 1 9
7 30 1 12
8 0 2 3
9 5 2 5
10 10 2 6
# ... with 12 more rows
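To see why which.max() finds the cut-off, here is a small sketch on the prices of plan 1 alone. The names is_max, first_max and kept are only illustrative.

```r
price <- c(1, 4, 5, 6, 8, 9, 12, 12, 12)   # plan 1 from the question

is_max <- price == max(price)    # FALSE until the maximum is reached, TRUE afterwards
first_max <- which.max(is_max)   # TRUE counts as 1, so this is the first TRUE
kept <- price[seq_len(first_max)]

first_max
kept
```

Because which.max() returns the position of the first maximum, applying it to a logical vector gives the index of the first TRUE, and everything after that position is dropped.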
I have a data frame consisting of the fluorescence read out of multiple cells tracked over time, for example:
Number=c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
Fluorescence=c(9,10,20,30,8,11,21,31,6,12,22,32,7,13,23,33)
df = data.frame(Number, Fluorescence)
Which gets:
Number Fluorescence
1 1 9
2 2 10
3 3 20
4 4 30
5 1 8
6 2 11
7 3 21
8 4 31
9 1 6
10 2 12
11 3 22
12 4 32
13 1 7
14 2 13
15 3 23
16 4 33
Number pertains to the cell number. What I want is to collate the fluorescence readout based on the cell number. The data.frame here has it counting 1-4, whereas really I want something like this:
Number Fluorescence
1 1 9
2 1 8
3 1 6
4 1 7
5 2 10
6 2 11
7 2 12
8 2 13
9 3 20
10 3 21
11 3 22
12 3 23
13 4 30
14 4 31
15 4 32
16 4 33
Or, even better, I would have one column per Number, each holding the fluorescence of the respective cell:
1 2 3 4
1 9 10 20 30
2 8 11 21 31
3 6 12 22 32
4 7 13 23 33
I've used the which function to extract them one at a time:
Cell1=df[which(df[,1]==1),2]
But this would require me to write a line for each cell (of which there are hundreds).
Thank you for any help with this! Apologies that I'm still a bit of an R noob.
How about this:
library(tidyr);library(data.table)
number <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
fl <- c(9,10,20,30,8,11,21,31,6,12,22,32,7,13,23,33)
df <- data.table(number,fl)
df[, index:=1:.N, keyby=number]
df
number fl index
1: 1 9 1
2: 1 8 2
3: 1 6 3
4: 1 7 4
5: 2 10 1
6: 2 11 2
7: 2 12 3
8: 2 13 4
9: 3 20 1
10: 3 21 2
11: 3 22 3
12: 3 23 4
13: 4 30 1
14: 4 31 2
15: 4 32 3
16: 4 33 4
The index column is added because the spread function from tidyr needs a unique identifier for each row. See this post for more information.
spread(df,number,fl)
index 1 2 3 4
1: 1 9 10 20 30
2: 2 8 11 21 31
3: 3 6 12 22 32
4: 4 7 13 23 33
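For comparison, both requested layouts can also be sketched in base R without extra packages. This assumes, as in the example data, that every cell has the same number of readings and that rows are already in time order; long and wide are just illustrative names.

```r
Number <- c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4)
Fluorescence <- c(9, 10, 20, 30, 8, 11, 21, 31, 6, 12, 22, 32, 7, 13, 23, 33)
df <- data.frame(Number, Fluorescence)

# Long layout: rows grouped by cell number, original order kept within each cell.
long <- df[order(df$Number), ]

# Wide layout: one column per cell number, one row per time point.
wide <- do.call(cbind, split(df$Fluorescence, df$Number))

long
wide
```

Here wide is a matrix whose column names are the cell numbers; wrapping it in as.data.frame() would turn it into a data frame if that is preferred.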
I've been trying to get this working, but for some reason I'm not making any progress, and I was hoping you could help me. I have a data frame, and for each user I would like to get the average of a specific range of values, where the range is determined by another column in the same data frame.
So, let’s say I have this data frame.
a<-data.frame(user=c(rep(1,10),rep(2,10),rep(3,10)),
values=c(1:30),toot=c(rep(4,10),rep(5,10),rep(3,10)))
user values toot
1 1 4
1 2 4
1 3 4
1 4 4
1 5 4
1 6 4
1 7 4
1 8 4
1 9 4
1 10 4
2 11 5
2 12 5
2 13 5
2 14 5
2 15 5
2 16 5
2 17 5
2 18 5
2 19 5
2 20 5
3 21 3
3 22 3
3 23 3
3 24 3
3 25 3
3 26 3
3 27 3
3 28 3
3 29 3
3 30 3
What I would like is to take the average of the values from 2 elements before the toot element through the toot element.
Here's what I'm looking for:
user values toot deck
1 1 4 3
1 2 4 3
1 3 4 3
1 4 4 3
1 5 4 3
1 6 4 3
1 7 4 3
1 8 4 3
1 9 4 3
1 10 4 3
2 11 5 14
2 12 5 14
2 13 5 14
2 14 5 14
2 15 5 14
2 16 5 14
2 17 5 14
2 18 5 14
2 19 5 14
2 20 5 14
3 21 3 22
3 22 3 22
3 23 3 22
3 24 3 22
3 25 3 22
3 26 3 22
3 27 3 22
3 28 3 22
3 29 3 22
3 30 3 22
As you can see, user 1's toot value is 4, so I want to take user 1's value at the 4th element and average it with the 2 elements before it.
This is what I have so far (with many variations of this and with the by function):
a$deck<-ave(a$values,a$user,FUN=function(x)
{
z<-a$toot
y<-z-2
mean(x[y:z])
})
But the problem is that it's not using the toot value as its starting position. Here are the warning and error messages:
> Warning messages:
1: In y:z : numerical expression has 30 elements: only the first used
2: In y:z : numerical expression has 30 elements: only the first used
Error in mean(x[y:z]) :
error in evaluating the argument 'x' in selecting a method for function 'mean': Error in x[y:z] : only 0's may be mixed with negative subscripts
Anything is welcomed and appreciated, thanks.
You can do it with by(), like this:
do.call(rbind, by(a, a$user, function(x) {
  # average the values from two positions before this user's toot up to the toot position
  cbind(x, deck = mean(x$values[(x$toot[1] - 2):x$toot[1]]))
}))
Alternatively, with plyr:
library(plyr)
ddply(a,.(user),function(df) {
df$deck <- mean(df$values[(df$toot[1]-2):df$toot[1]])
df
})
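The original ave() attempt fails because a$toot inside the function is the whole 30-element column rather than the current user's value. One way to keep using ave() is to group the row indices instead of the values. This is a sketch, assuming toot is constant within each user and is at least 3; idx and end are illustrative names.

```r
a <- data.frame(user = c(rep(1, 10), rep(2, 10), rep(3, 10)),
                values = 1:30,
                toot = c(rep(4, 10), rep(5, 10), rep(3, 10)))

# Group row numbers by user, so the function can look up that user's own rows.
a$deck <- ave(seq_len(nrow(a)), a$user, FUN = function(idx) {
  end <- a$toot[idx[1]]                # this user's toot value
  mean(a$values[idx][(end - 2):end])   # mean of the 3 values ending at position toot
})

unique(a[, c("user", "toot", "deck")])
```

For the example data this gives deck values of 3, 14 and 22 for users 1, 2 and 3, matching the requested output.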
When I create a dataframe I do:
dt = data.frame(a=c(1:5),b=c(1:20))
dt
a b
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 1 6
7 2 7
8 3 8
9 4 9
10 5 10
11 1 11
12 2 12
13 3 13
14 4 14
15 5 15
16 1 16
17 2 17
18 3 18
19 4 19
20 5 20
As you can see, the values of the first column (a) are repeated.
How can I create different "columns" with different numbers of values?
Thanks
H
Use a list. A data.frame is a special kind of list in which all elements are of the same length.
list(a=c(1:5),b=c(1:20))
$a
[1] 1 2 3 4 5
$b
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
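If a rectangular data frame is still needed, one common workaround is to pad the shorter vectors with NA so that all columns have the same length. This is a sketch of that idea rather than part of the answer above; cols, n and padded are illustrative names.

```r
cols <- list(a = 1:5, b = 1:20)

n <- max(lengths(cols))   # length of the longest element

padded <- lapply(cols, function(v) {
  length(v) <- n          # extending a vector fills the new positions with NA
  v
})

dt <- as.data.frame(padded)
dt
```

Column a then holds 1 to 5 followed by fifteen NA values instead of being recycled.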