I have a dataframe (df1) with two columns, one (grp) is a grouping variable, the second (num) has some measurements.
For each group I want to:
1. replace all numbers greater than 3.5 with 4
2. replace all numbers after the first instance of 4 with 4
I really only need the result of step 2; step 1 seemed like a logical starting point, but maybe it isn't required?
Example data
library(dplyr)
df1 <- data.frame(
grp = rep(c("a", "b"), each = 10),
num = c(0,1,2,5,0,1,7,0,2,1,2,2,2,2,5,0,0,0,0,6))
I can get the first part:
df1 %>%
group_by(grp) %>%
mutate(num = ifelse(num > 3.5, 4, num))
For the second part I tried using dplyr::lag and dplyr::case_when but no luck. Here is the desired output:
grp num
1 a 0
2 a 1
3 a 2
4 a 4
5 a 4
6 a 4
7 a 4
8 a 4
9 a 4
10 a 4
11 b 2
12 b 2
13 b 2
14 b 2
15 b 4
16 b 4
17 b 4
18 b 4
19 b 4
20 b 4
Any advice would be much appreciated.
You could use cumany() to flag every row from the first event (i.e. num > 3.5) onwards within each group, and replace those values with 4.
library(dplyr)
df1 %>%
group_by(grp) %>%
mutate(num2 = replace(num, cumany(num > 3.5), 4)) %>%
ungroup()
# A tibble: 20 × 3
grp num num2
<chr> <dbl> <dbl>
1 a 0 0
2 a 1 1
3 a 2 2
4 a 5 4
5 a 0 4
6 a 1 4
7 a 7 4
8 a 0 4
9 a 2 4
10 a 1 4
11 b 2 2
12 b 2 2
13 b 2 2
14 b 2 2
15 b 5 4
16 b 0 4
17 b 0 4
18 b 0 4
19 b 0 4
20 b 6 4
You can also replace cumany(num > 3.5) with cumsum(num > 3.5) > 0.
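For readers without dplyr, the same idea can be sketched in base R. This is only an illustration of the cumsum(num > 3.5) > 0 variant mentioned above; the helper name after_first is made up for the example, and ave() is used to apply the cumulative check within each group.

```r
# Example data from the question
df1 <- data.frame(
  grp = rep(c("a", "b"), each = 10),
  num = c(0, 1, 2, 5, 0, 1, 7, 0, 2, 1, 2, 2, 2, 2, 5, 0, 0, 0, 0, 6))

# TRUE from the first value above 3.5 onwards, computed separately per group
# (after_first is a hypothetical name used only in this sketch)
after_first <- ave(df1$num > 3.5, df1$grp, FUN = function(x) cumsum(x) > 0)

# Replace the flagged positions with 4
df1$num2 <- replace(df1$num, after_first, 4)
df1
```

Since cumsum() of a logical vector counts the TRUE values seen so far, cumsum(x) > 0 becomes TRUE at the first event and stays TRUE afterwards, which is what cumany() does.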
My data frame looks like this but with thousands of entries
type <- rep(c("A","B","C"),4)
time <- c(0,0,0,1,1,1,2,2,2,3,3,3)
counts <- c(0,30,15,30,30,10,31,30,8,30,8,0)
df <- data.frame(time,type,counts)
df
time type counts
1 0 A 0
2 0 B 30
3 0 C 15
4 1 A 30
5 1 B 30
6 1 C 10
7 2 A 31
8 2 B 30
9 2 C 8
10 3 A 30
11 3 B 8
12 3 C 0
For each time point greater than 0, I want to extract all the types that have counts == 30,
and then, for these types, extract their counts at the next time point.
I want my data to look like this:
time type counts time_after type_after counts_after
1 A 30 2 A 31
1 B 30 2 B 30
2 B 30 3 B 8
Any help or guidance is appreciated.
Not very elegant, but this should do the job:
library(dplyr)
type <- rep(c("A","B","C"),4)
time <- c(0,0,0,1,1,1,2,2,2,3,3,3)
counts <- c(0,30,15,30,30,10,31,30,8,30,8,0)
df <- tibble(time,type,counts)
df
#> # A tibble: 12 x 3
#> time type counts
#> <dbl> <chr> <dbl>
#> 1 0 A 0
#> 2 0 B 30
#> 3 0 C 15
#> 4 1 A 30
#> 5 1 B 30
#> 6 1 C 10
#> 7 2 A 31
#> 8 2 B 30
#> 9 2 C 8
#> 10 3 A 30
#> 11 3 B 8
#> 12 3 C 0
thirties <- df %>%
filter(counts == 30 & time != 0) %>%
mutate(time_after = time + 1)
inner_join(thirties, df, by = c("time_after" = "time",
"type" = "type")) %>%
select(time,
type = type,
counts = counts.x,
time_after,
type_after = type,
count_after = counts.y)
#> # A tibble: 3 x 6
#> time type counts time_after type_after count_after
#> <dbl> <chr> <dbl> <dbl> <chr> <dbl>
#> 1 1 A 30 2 A 31
#> 2 1 B 30 2 B 30
#> 3 2 B 30 3 B 8
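As an alternative to the self-join, here is a minimal sketch using dplyr::lead() within each type. It assumes every type has a row at every time point, so that the next row for a type really is the next time point; if time points can be missing, the join on time + 1 above is the safer choice. The column name counts_after follows the question's desired output.

```r
library(dplyr)

df <- data.frame(
  time   = c(0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  type   = rep(c("A", "B", "C"), 4),
  counts = c(0, 30, 15, 30, 30, 10, 31, 30, 8, 30, 8, 0)
)

result <- df %>%
  arrange(type, time) %>%
  group_by(type) %>%
  # look one row ahead within each type
  mutate(time_after   = lead(time),
         type_after   = type,
         counts_after = lead(counts)) %>%
  ungroup() %>%
  # keep rows with counts == 30 after time 0 that have a following time point
  filter(counts == 30, time > 0, !is.na(time_after)) %>%
  arrange(time, type)

result
```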
I have a huge dataset and want to create a binary dummy variable indicating whether a value has been observed before. Here is a sample data set.
data.frame(
id = c(rep("A",3),rep("B",3),rep("C",3)),
time = rep(seq(1:3),3),
item = c(11,12,13,11,11,13,22,11,22))
From the dataset, here is the desired column,
observed_b4 = c(NA,0,0,NA,1,0,NA,0,1)
For each group (id), I want to know whether the item has been observed before. I can do it with a for loop, but the data set is too large for that to be practical.
Using duplicated:
base:
cbind(x, flag = as.integer(duplicated(paste(x$id, x$item))))
# id time item flag
# 1 A 1 11 0
# 2 A 2 12 0
# 3 A 3 13 0
# 4 B 1 11 0
# 5 B 2 11 1
# 6 B 3 13 0
# 7 C 1 22 0
# 8 C 2 11 0
# 9 C 3 22 1
or dplyr:
library(dplyr)
x %>%
group_by(id) %>%
mutate(flag = as.integer(duplicated(item)))
## A tibble: 9 x 4
## Groups: id [3]
# id time item flag
# <chr> <int> <dbl> <int>
#1 A 1 11 0
#2 A 2 12 0
#3 A 3 13 0
#4 B 1 11 0
#5 B 2 11 1
#6 B 3 13 0
#7 C 1 22 0
#8 C 2 11 0
#9 C 3 22 1
A solution with base R that uses ave and duplicated.
ave lets you apply a function over df$item for each group defined by df$id. duplicated checks whether an item has already appeared. ave automatically returns a numeric vector (the same class as the input vector).
df$observed_b4 <- ave(df$item, df$id, FUN = duplicated)
df
#> id time item observed_b4
#> 1 A 1 11 0
#> 2 A 2 12 0
#> 3 A 3 13 0
#> 4 B 1 11 0
#> 5 B 2 11 1
#> 6 B 3 13 0
#> 7 C 1 22 0
#> 8 C 2 11 0
#> 9 C 3 22 1
However, to get exactly what you're looking for, you can use this:
df$observed_b4 <- ave(df$item, df$id, FUN = function(x) replace(duplicated(x),1,NA))
df
#> id time item observed_b4
#> 1 A 1 11 NA
#> 2 A 2 12 0
#> 3 A 3 13 0
#> 4 B 1 11 NA
#> 5 B 2 11 1
#> 6 B 3 13 0
#> 7 C 1 22 NA
#> 8 C 2 11 0
#> 9 C 3 22 1
We could group by 'id', 'item', create a logical vector with row_number() and coerce it to binary (+)
library(dplyr)
df1 %>%
group_by(id, item) %>%
mutate(flag = +(row_number() != 1))
Output:
# A tibble: 9 x 4
# Groups: id, item [7]
# id time item flag
# <chr> <int> <dbl> <int>
#1 A 1 11 0
#2 A 2 12 0
#3 A 3 13 0
#4 B 1 11 0
#5 B 2 11 1
#6 B 3 13 0
#7 C 1 22 0
#8 C 2 11 0
#9 C 3 22 1
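The dplyr versions above return 0 for the first row of each id, whereas the question asks for NA there. A small sketch of how that could be added, assuming the rows are already sorted by time within each id (the data frame name x and the column name observed_b4 are taken from the posts above):

```r
library(dplyr)

x <- data.frame(
  id   = rep(c("A", "B", "C"), each = 3),
  time = rep(1:3, 3),
  item = c(11, 12, 13, 11, 11, 13, 22, 11, 22))

res <- x %>%
  group_by(id) %>%
  # 1 if the item was seen earlier in the same id, 0 otherwise;
  # the first row of each id has no history, so it is set to NA
  mutate(observed_b4 = replace(as.integer(duplicated(item)), 1, NA_integer_)) %>%
  ungroup()

res
```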
I have data that looks like this. The sample data can be created with the following code:
ID<-c(1,1,1,1,2,2,2,3,3,3,4,4,4,4)
Days<-c(-5,1,18,30,1,8,16,1,8,6,-6,1,7,15)
Event_P<-c("","","P","","","","P","","","P","","","P","P")
Event_N<-c("","","","","N","","N","","","N","N","","N","N")
Event_C<-c("C","","C","","","","C","","","C","","","","")
Sample.data <- data.frame(ID, Days, Event_P, Event_N,Event_C)
I want to build a variable "Event" that captures all the events in each row.
What should I do? I would like to know as many ways as possible. Thanks.
One option is to use apply(), as shown here. The suggestion from @AllanCameron is also a great choice. Here is the code:
#Vectors
ID<-c(1,1,1,1,2,2,2,3,3,3,4,4,4,4)
Days<-c(-5,1,18,30,1,8,16,1,8,6,-6,1,7,15)
Event_P<-c("","","P","","","","P","","","P","","","P","P")
Event_N<-c("","","","","N","","N","","","N","N","","N","N")
Event_C<-c("C","","C","","","","C","","","C","","","","")
#Data
Sample.data <- data.frame(ID, Days, Event_P, Event_N,Event_C,stringsAsFactors = F)
#Option 1
index <- which(grepl('Event',names(Sample.data)))
Sample.data$Event <- apply(Sample.data[,index],1,function(x) paste0(x[x!=''],collapse='/'))
Output:
   ID Days Event_P Event_N Event_C Event
1   1   -5                       C     C
2   1    1
3   1   18       P               C   P/C
4   1   30
5   2    1               N             N
6   2    8
7   2   16       P       N       C P/N/C
8   3    1
9   3    8
10  3    6       P       N       C P/N/C
11  4   -6               N             N
12  4    1
13  4    7       P       N           P/N
14  4   15       P       N           P/N
Duck's answer is very good, but you mentioned you want as many ways as possible so here are two more ways:
You could use tidyverse's mutate with paste, or base R's interaction, to combine the columns, and then use gsub to clear out the unnecessary separators:
ID<-c(1,1,1,1,2,2,2,3,3,3,4,4,4,4)
Days<-c(-5,1,18,30,1,8,16,1,8,6,-6,1,7,15)
Event_P<-c("","","P","","","","P","","","P","","","P","P")
Event_N<-c("","","","","N","","N","","","N","N","","N","N")
Event_C<-c("C","","C","","","","C","","","C","","","","")
Sample.data <- data.frame(ID, Days, Event_P, Event_N,Event_C)
library(tidyverse)
Sample.data %>%
mutate(Event = paste(Event_P, Event_N, Event_C, sep='/'),
Event = gsub('^/|^//|/$|//$', '', Event),
Event = gsub('//', '/', Event))
#>    ID Days Event_P Event_N Event_C Event
#> 1   1   -5                       C     C
#> 2   1    1
#> 3   1   18       P               C   P/C
#> 4   1   30
#> 5   2    1               N             N
#> 6   2    8
#> 7   2   16       P       N       C P/N/C
#> 8   3    1
#> 9   3    8
#> 10  3    6       P       N       C P/N/C
#> 11  4   -6               N             N
#> 12  4    1
#> 13  4    7       P       N           P/N
#> 14  4   15       P       N           P/N
Sample.data$Event <-
interaction(Sample.data$Event_P, Sample.data$Event_N, Sample.data$Event_C, sep = '/') %>%
gsub('^/|^//|/$|//$', '', .) %>%
gsub('//', '/', .)
Sample.data
#>    ID Days Event_P Event_N Event_C Event
#> 1   1   -5                       C     C
#> 2   1    1
#> 3   1   18       P               C   P/C
#> 4   1   30
#> 5   2    1               N             N
#> 6   2    8
#> 7   2   16       P       N       C P/N/C
#> 8   3    1
#> 9   3    8
#> 10  3    6       P       N       C P/N/C
#> 11  4   -6               N             N
#> 12  4    1
#> 13  4    7       P       N           P/N
#> 14  4   15       P       N           P/N
Created on 2020-09-18 by the reprex package (v0.3.0)
What the pattern inside gsub('^/|^//|/$|//$', '', ...) does:
^/|^//: Take out all / or // that start the string
/$|//$: Take out all / or // that end the string
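The two patterns above are written for exactly three columns (at most two leading or trailing slashes). If more Event_ columns were added, a quantifier-based pattern might generalise better. The following is only a sketch; clean_slashes is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: tidy up strings produced by paste(..., sep = "/")
clean_slashes <- function(x) {
  x <- gsub("^/+|/+$", "", x)  # drop any run of slashes at the start or end
  gsub("/+", "/", x)           # collapse inner runs of slashes into one
}

pasted <- c("//C", "//", "P//C", "/N/", "P/N/", "P/N/C")
cleaned <- clean_slashes(pasted)
cleaned
```

Here /+ matches one or more consecutive slashes, so the same function works regardless of how many empty columns sit next to each other.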
I have a large data set which contains a time column and a column identifying each sample as a saccade or a fixation of the eye (saccade = fast eye movement, fixation = relatively stable eye movement). I want to calculate how long each period of fixations and saccades lasts, by taking the time from the start of the first "f" until the first "s", and so on. So if there are 3 consecutive rows with "s", I want to take the time in the row where the first "s" appeared and the time in the row where the last "s" appeared before the next "f". By subtracting these two times I get the duration of each fixation and saccade period.
The time scale is not continuous, since sometimes rows are deleted because of blinks in the data.
example.df <- data.frame(time = seq(1:100),
saccade = sample(letters[c(6, 19)], 100, replace = T))
Is there an easy way to do this?
Thanks a lot
We can create an index of consecutive runs using rle(), then group_by() this index and subtract the first time in each run from the last:
library(tidyverse)
example.df <- data.frame(time = seq(1:100),
saccade = sample(letters[c(6, 19)], 100, replace = T))
test <- rle(example.df$saccade == "s")
example.df$indexer <- rep(1:length(test$lengths), test$lengths)
example.df <- example.df %>%
group_by(indexer) %>%
mutate(period = time[n()] - time[1])
# A tibble: 100 x 4
# Groups: indexer [53]
time saccade indexer period
<int> <fctr> <int> <int>
1 1 s 1 1
2 2 s 1 1
3 3 f 2 0
4 4 s 3 0
5 5 f 4 3
6 6 f 4 3
7 7 f 4 3
8 8 f 4 3
9 9 s 5 1
10 10 s 5 1
# ... with 90 more rows
# drop indexer column
example.df <- example.df[setdiff(names(example.df),"indexer")]
Result as a data.frame:
example.df <- data.frame(time = seq(1:100),
saccade = sample(letters[c(6, 19)], 100, replace = T),
stringsAsFactors = FALSE)
run_len_encoding <- rle(example.df$saccade)
length_of_runs <- run_len_encoding$lengths
index_of_changes <- cumsum(length_of_runs)
duration <- diff(c(1,index_of_changes),1)
result.df <- data.frame(duration, state = run_len_encoding$values)
result.df
duration state
1 1 s
2 2 f
3 1 s
4 4 f
5 1 s
6 3 f
7 3 s
8 2 f
9 3 s
10 1 f
11 2 s
12 1 f
13 1 s
14 2 f
15 4 s
16 1 f
17 2 s
18 1 f
19 1 s
20 1 f
21 1 s
22 1 f
23 2 s
24 1 f
25 2 s
26 3 f
27 1 s
28 1 f
29 2 s
30 1 f
31 1 s
32 1 f
33 6 s
34 1 f
35 3 s
36 3 f
37 1 s
38 2 f
39 2 s
40 4 f
41 1 s
42 1 f
43 1 s
44 1 f
45 1 s
46 2 f
47 1 s
48 3 f
49 2 s
50 1 f
51 4 s
52 1 f
53 1 s
54 1 f
55 2 s
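Note that the result above counts rows per run. Since the question mentions that rows can be missing because of blinks, a run's row count may differ from its elapsed time. Below is a minimal sketch that reads the start and end of each run from the time column instead. The small data frame with gaps in time is made up for illustration, and run_id is a hypothetical helper name.

```r
# Made-up example with gaps in the time column (e.g. rows removed for blinks)
example.df <- data.frame(
  time    = c(1, 2, 3, 5, 6, 9, 10, 11),
  saccade = c("s", "s", "f", "f", "f", "s", "f", "f"),
  stringsAsFactors = FALSE)

# Label each run of identical consecutive states with an integer id
runs   <- rle(example.df$saccade)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# First and last time stamp of each run
start <- as.vector(tapply(example.df$time, run_id, min))
end   <- as.vector(tapply(example.df$time, run_id, max))

result.df <- data.frame(state = runs$values, start, end,
                        duration = end - start)
result.df
```

With this definition a run consisting of a single row has duration 0, which matches the "last time minus first time" description in the question.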