Split data when time intervals exceed a defined value


I have a data frame of GPS locations with a column of seconds. How can I create a new column based on time gaps? For example, for this data.frame:
df <- data.frame(secs=c(1,2,3,4,5,6,7,10,11,12,13,14,20,21,22,23,24,28,29,31))
I would like to cut the data frame wherever there is a time gap of 3 or more seconds between locations, and create a new column called 'bouts' that gives a running tally of the sections, producing a data frame like this:
id secs bouts
1 1 1
2 2 1
3 3 1
4 4 1
5 5 1
6 6 1
7 7 1
8 10 2
9 11 2
10 12 2
11 13 2
12 14 2
13 20 3
14 21 3
15 22 3
16 23 3
17 24 3
18 28 4
19 29 4
20 31 4

Use cumsum and diff:
df$bouts <- cumsum(c(1, diff(df$secs) >= 3))
Remember that logical values get coerced to numeric values 0/1 automatically and that diff output is always one element shorter than its input.
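The two steps can be made visible by wrapping them in a small helper. This is only a sketch: the function name `label_bouts` and its `min_gap` argument are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: label runs of observations separated by gaps >= min_gap
label_bouts <- function(secs, min_gap = 3) {
  gaps <- diff(secs)              # one element shorter than secs
  starts_new <- gaps >= min_gap   # TRUE where a new bout begins
  cumsum(c(1, starts_new))        # prepend 1 for the first row; TRUE counts as 1
}

df <- data.frame(secs = c(1, 2, 3, 4, 5, 6, 7, 10, 11, 12, 13, 14,
                          20, 21, 22, 23, 24, 28, 29, 31))
df$bouts <- label_bouts(df$secs)
table(df$bouts)                   # number of rows in each bout
```

The leading 1 makes the result the same length as the input, so it can be assigned directly as a column.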

Related

Randomly select number (without repetition) for each group in R

I have the following dataframe containing a variable "group" and a variable "number of elements per group"
group elements
1 3
2 1
3 14
4 10
.. ..
.. ..
30 5
Then I have a set of numbers going from 1 to (let's say) 30.
Summing "elements" gives 900. What I want is to randomly select a number from 1 to 30 and assign it to each group until I have filled the number of elements for that group. Each of those numbers should appear 30 times in total.
Thus, for group 1, I want to randomly select 3 numbers from 1 to 30,
for group 2, 1 number from 1 to 30, etc., until I have filled all of the groups.
the final table should look like this:
group number(randomly selected)
1 7
1 20
1 7
2 4
3 21
3 20
...
any suggestions on how I can achieve this?

In base R, if you have df like this...
df
group elements
1 3
2 1
3 14
Then you can do this...
data.frame(group = rep(df$group,                     # repeat group no...
                       df$elements),                 # ...elements times
           number = unlist(sapply(df$elements,       # for each elements...
                                  sample.int,        # ...sample <elements> numbers
                                  n = 30,            # from 1 to 30
                                  replace = FALSE))) # without duplicates
group number
1 1 19
2 1 15
3 1 28
4 2 15
5 3 20
6 3 18
7 3 27
8 3 10
9 3 23
10 3 12
11 3 25
12 3 11
13 3 14
14 3 13
15 3 16
16 3 26
17 3 22
18 3 7

Give this a try:
df <- read.table(text = "group elements
1 3
2 1
3 14
4 10
30 5", header = TRUE)
# reproducibility
set.seed(1)
df_split2 <- do.call("rbind",
                     lapply(split(df, df$group),
                            function(m) cbind(m,
                                              `number(randomly selected)` =
                                                sample(1:30, replace = TRUE,
                                                       size = m$elements),
                                              row.names = NULL)))
# remove the elements column
df_split2$elements <- NULL
head(df_split2)
#> group number(randomly selected)
#> 1.1 1 25
#> 1.2 1 4
#> 1.3 1 7
#> 2 2 1
#> 3.1 3 2
#> 3.2 3 29
The split function splits df into chunks based on the group column. We then take those smaller data frames and add a column to each by sampling from 1:30 a total of elements times. Finally, we use do.call to rbind the list back together.
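The split / lapply / rbind pattern described above can also be written one step at a time. The sketch below uses a three-group data frame and the column name `number`, both chosen here for illustration; it samples with replacement, as the answer does.

```r
set.seed(1)  # reproducibility

df <- data.frame(group = c(1, 2, 3), elements = c(3, 1, 14))

# Step 1: one single-row data frame per group
chunks <- split(df, df$group)

# Step 2: for each chunk, draw `elements` numbers from 1:30
sampled <- lapply(chunks, function(m) {
  data.frame(group  = m$group,
             number = sample(1:30, size = m$elements, replace = TRUE))
})

# Step 3: stack the pieces back into one data frame
result <- do.call(rbind, sampled)
head(result)
```

Setting `replace = FALSE` instead would prevent a number from repeating within a group, provided no group asks for more than 30 numbers.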

You have to generate a new data frame repeating $group $elements times, and then use sample to generate the exact number of random numbers:
data <- data.frame(group = c(1, 2, 3, 4, 5),
                   elements = c(2, 5, 2, 1, 3))
# sample() draws without replacement here, across all groups,
# so sum(data$elements) must not exceed 30
data.elements <- data.frame(group = rep(data$group, data$elements),
                            number = sample(1:30, sum(data$elements)))
The result:
group number
1 1 9
2 1 4
3 2 29
4 2 28
5 2 18
6 2 7
7 2 25
8 3 17
9 3 22
10 4 5
11 5 3
12 5 8
13 5 26

I solved it as follows:
random_sample <- rep(1:30, each=30)
random_sample <- sample(random_sample)
Then I create a data frame with this variable and a variable containing one group per row, repeated by the number of elements in the group itself.
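A possible end-to-end version of this approach is sketched below. The group sizes are invented for illustration (30 groups alternating between 20 and 40 elements, which sums to 900); the real `elements` column would take their place.

```r
set.seed(42)  # reproducibility

# Hypothetical group sizes: 30 groups whose elements sum to 900
groups <- data.frame(group = 1:30, elements = rep(c(20, 40), 15))
stopifnot(sum(groups$elements) == 900)

# Pool containing each number 1..30 exactly 30 times, then shuffled
random_sample <- sample(rep(1:30, each = 30))

# One row per element: group label repeated `elements` times
result <- data.frame(group  = rep(groups$group, groups$elements),
                     number = random_sample)
head(result)
```

Each number appears exactly 30 times overall, but nothing here prevents a number from repeating within a single group.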

r - aggregate / subtract two variables, rows

I'm using the aggregate function to calculate the difference between two variables for every observation, something like this (and then I want to save the result as a new variable):
data1
Group Points_Attempt1 Points_Attempt2
1 1 10 5
2 1 34 23
3 1 50 5
4 1 10 12
5 2 11 21
6 2 23 23
7 2 32 10
8 2 12 10
I'm able to do something like this:
aggregate(data1[c("Points_Attempt1","Points_Attempt2")],list(data1$Group),diff)
But I want it for every single observation, and I just do not know how to select the observations, i.e. the row numbers (here 1-8).
So I'm looking for the following fourth column (Difference), which I would then like to save as a new variable:
Group Points_Attempt1 Points_Attempt2 Difference
1 1 10 5 5
2 1 34 23 11
3 1 50 5 45
4 1 10 12 -2
5 2 11 21 -10
6 2 23 23 0
7 2 32 10 22
8 2 12 10 2
I would be highly thankful, if someone could help me with this.

We can use mutate_each (note that mutate_each and funs are deprecated in recent dplyr releases in favour of across)
library(dplyr)
data1 %>%
  group_by(Group) %>%
  mutate_each(funs(c(NA, diff(.))), 2:3)
Or if we need to subtract between the variables,
data1 %>%
  mutate(Difference = Points_Attempt1 - Points_Attempt2)
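For comparison, both operations can be done in base R without dplyr. This is a sketch using the question's data; the column name `Change_Attempt1` is made up here for the within-group difference.

```r
data1 <- data.frame(
  Group           = c(1, 1, 1, 1, 2, 2, 2, 2),
  Points_Attempt1 = c(10, 34, 50, 10, 11, 23, 32, 12),
  Points_Attempt2 = c(5, 23, 5, 12, 21, 23, 10, 10)
)

# Row-wise difference between the two variables (the requested column)
data1$Difference <- data1$Points_Attempt1 - data1$Points_Attempt2

# Difference between consecutive rows within each group;
# the first row of every group has no predecessor, hence NA
data1$Change_Attempt1 <- ave(data1$Points_Attempt1, data1$Group,
                             FUN = function(v) c(NA, diff(v)))
data1
```

Arithmetic on data frame columns is already vectorised, so no aggregation step is needed for the row-wise case.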

Counting rows based on column values in R

I have a dataframe df
Reads Counts
aaaa 10
bbbb 20
cccc 25
and so on.
I want to calculate the number of reads whose counts meet or exceed a given value, and plot that. For example, I want a data frame that looks like
Counts>= #reads with Counts>=
1 3
2 3
3 3
11 2
20 2
21 1
and so on. Can you suggest how I can get such a data frame and plot it?

Given the levels you want to plot at...
cutoffs <- 1:30
... you could do something like:
data.frame(cutoff=cutoffs, num.above=Reduce("+", lapply(df$Counts, ">=", cutoffs)))
# cutoff num.above
# 1 1 3
# 2 2 3
# 3 3 3
# 4 4 3
# 5 5 3
# 6 6 3
# 7 7 3
# 8 8 3
# 9 9 3
# 10 10 3
# 11 11 2
# 12 12 2
# 13 13 2
# 14 14 2
# 15 15 2
# 16 16 2
# 17 17 2
# 18 18 2
# 19 19 2
# 20 20 2
# 21 21 1
# 22 22 1
# 23 23 1
# 24 24 1
# 25 25 1
# 26 26 0
# 27 27 0
# 28 28 0
# 29 29 0
# 30 30 0
Basically for each value in the original data frame you compute a vector of whether it's greater than or equal to each cutoff (using lapply with >=). Then you add them up (using Reduce with +), getting the total number greater than or equal to each cutoff.
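To see the intermediate step, the lapply and Reduce calls can be run separately on the three counts from the question. The variable names `counts`, `flags` and `num_above` are chosen here for illustration.

```r
counts  <- c(10, 20, 25)
cutoffs <- 1:30

# One logical vector per read: is this read's count >= each cutoff?
flags <- lapply(counts, ">=", cutoffs)
flags[[1]][9:12]   # for the read with count 10: TRUE TRUE FALSE FALSE

# Element-wise sum of the logical vectors (TRUE counts as 1)
num_above <- Reduce("+", flags)
num_above[c(1, 10, 11, 20, 21, 25, 26)]
```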

Another option would be to use outer/colSums:
cutoffs <- 1:30
data.frame(cutoff=cutoffs, num.above=colSums(outer(df$Counts, cutoffs, ">=")))
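The question also asks for a plot, which neither answer shows. One way to do it, sketched with base graphics and the three reads from the question (the step-style line and the axis labels are choices made here, not from the original):

```r
df <- data.frame(Reads = c("aaaa", "bbbb", "cccc"), Counts = c(10, 20, 25))
cutoffs <- 1:30

# 3 x 30 logical matrix: rows are reads, columns are cutoffs
above <- outer(df$Counts, cutoffs, ">=")
res <- data.frame(cutoff = cutoffs, num.above = colSums(above))

# Step plot of how many reads meet each cutoff
plot(res$cutoff, res$num.above, type = "s",
     xlab = "Counts >=", ylab = "Number of reads")
```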

How to use less than or equal to a value of a column as a condition to select the row in another column?

Simple question, I think. Basically, I want to use the concept "less than or equal to a number" as the condition to select the row of one column, and then find the value on the same row in another column. But what happens if the number stated in the condition isn't found in the first column?
Let's assume this is my data frame:
df<-as.data.frame((matrix(c(1:10,11:20), nrow = 10, ncol = 2)))
df
V1 V2
1 1 11
2 2 12
3 3 13
4 4 14
5 5 15
6 6 16
7 7 17
8 8 18
9 9 19
10 10 20
Let's assume I want to use the condition <=5 in df$V1 to obtain the row that is used to find the value of the same row in df$V2.
df[which(df$V1 <= 5),2]
15
But what happens if the number used in the condition isn't found? Let's assume this is my new data.frame
V1 V2
1 1 11
2 2 12
3 3 13
4 4 14
5 6 15
6 7 16
7 8 17
8 9 18
9 10 19
10 11 20
Using the same above command df[which(df$V1 <= 5),2], I obtain a different answer. For some reason I obtain the entire column instead of one number.
11 12 13 14 15 16 17 18 19 20
Any suggestions?

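Use the subset operator, testing the first column and returning the second:
df[df[, 1] <= 5, 2]
This returns the V2 value of every row where V1 <= 5, so the result has one element per matching row.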
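If the goal is a single value, one reading of the question is that it wants V2 from the last row whose V1 is at or below the limit. The sketch below assumes that reading and assumes V1 is sorted in increasing order; the helper name `last_at_or_below` is made up for illustration.

```r
df1 <- data.frame(V1 = 1:10,           V2 = 11:20)  # 5 is present in V1
df2 <- data.frame(V1 = c(1:4, 6:11),   V2 = 11:20)  # 5 is missing from V1

# Hypothetical helper: V2 at the last row where V1 <= limit, or NA if none
last_at_or_below <- function(d, limit) {
  hits <- which(d$V1 <= limit)
  if (length(hits) == 0) return(NA_integer_)
  d$V2[max(hits)]
}

last_at_or_below(df1, 5)  # V1 == 5 exists, so its V2 is returned
last_at_or_below(df2, 5)  # falls back to the row with V1 == 4
```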

How do I perform mathematical operations between values in two columns of an R data frame based on their position?

I'm working on an R project where I'm trying to compare frequencies and their respective values. Essentially I have an 11852 x 3 data frame with the position number in column 1, unique values ranging from 1 to 11852 in column 2, and the same set of unique values in different positions in column 3.
Because the values in columns 2 and 3 overlap, I want to find the difference in position of each value between these two columns, based on the position number in the first column, and store it in another data frame. So if the second column has the value 2017 in position one and the third column also has 2017 in position one, the new data frame would have an entry of 2017 with a value of 0, since they have the same position. If column 2 has the value 5276 in the second position and column 3 has the value 5276 in position 73, then the new data frame would have a value of 71.
I would love some guidance on how to do this. Thanks.

Let me know if the code below works for you. It will generate negative values if a number occurs earlier in the 3rd column than in the 2nd column.
#Generate simulated data
n = 20
x <- data.frame(c1 = c(1:n), c2 = sample(n),c3 = sample(n))
#Calculate diff in position by taking difference in order
x$diff = order(x$c3)- order(x$c2)
#Reassign difference to its correct position
x$diff[order(x$c2)] <- x$diff
x
c1 c2 c3 diff
1 1 12 8 4
2 2 11 5 9
3 3 7 4 6
4 4 15 3 12
5 5 19 12 12
6 6 13 1 12
7 7 9 14 12
8 8 18 16 7
9 9 8 7 -8
10 10 16 20 -2
11 11 6 11 1
12 12 4 6 -9
13 13 14 10 -6
14 14 5 17 -12
15 15 10 18 -2
16 16 1 15 -10
17 17 3 19 -13
18 18 2 13 2
19 19 17 9 -5
20 20 20 2 -10
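The same position difference can be computed with match, which looks up where each value of c2 sits in c3. This is an alternative sketch, not the answer's method, and it assumes c1 holds the row positions 1..n and that c2 and c3 contain the same unique values.

```r
set.seed(123)  # reproducibility
n <- 20
x <- data.frame(c1 = 1:n, c2 = sample(n), c3 = sample(n))

# Position of each c2 value within c3, minus its own position
x$diff <- match(x$c2, x$c3) - x$c1

# Cross-check against the order-based approach
by_value <- order(x$c3) - order(x$c2)  # indexed by value, not by row
check <- integer(n)
check[order(x$c2)] <- by_value         # move each difference to its row
all(check == x$diff)
```

Unlike the order-based version, match does not require the values to be exactly 1..n, only that every c2 value occurs in c3.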
