replacing specific values in a dataframe in R

I have a large df with values in a column with the sequence pattern: seq(1,3000,10).
I need to change every value in the column so that
1 = 1
11 = 2
21 = 3
31 = 4
41 = 5
The order of these numbers is jumbled in places, so I need a rule that converts every 1 to 1, 11 to 2, 21 to 3, 31 to 4 and so on, for thousands of numbers following this sequence pattern.

Example
x <- seq(1, 100, by = 10)
# [1] 1 11 21 31 41 51 61 71 81 91
You can use %/%:
x %/% 10 + 1
# [1] 1 2 3 4 5 6 7 8 9 10
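Because the mapping is pure arithmetic, it does not depend on the order of the values. A minimal sketch on a jumbled vector follows; the vector `x` and the column name `val` are made up for illustration. `match()` against the reference sequence is shown as an alternative that would return NA for any value outside the pattern.

```r
# Hypothetical jumbled input following the seq(1, 3000, 10) pattern
x <- c(21, 1, 41, 11, 31, 2991, 1)

# Integer division: 1 -> 1, 11 -> 2, 21 -> 3, ...
by_division <- x %/% 10 + 1

# Lookup against the reference sequence; unmatched values would become NA
by_lookup <- match(x, seq(1, 3000, by = 10))

# Applied to a data frame column (the column name `val` is an assumption)
df <- data.frame(val = x)
df$val <- df$val %/% 10 + 1
```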

Related

Drop rows in a data frame that are in-between two integer values in R

I have a data frame recording a participant's behaviour in an episodic task. An episode starts at trigger 90 and finishes at a trigger somewhere in the 40s. Here is a sample data frame with one column holding the row number and the other holding the actual triggers.
ex1 <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
ex2 <- c(41,1,1,90,1,1,1,44,1,90,1,2,42,1,1,1,1,90,1,41)
df <- data.frame(ex1,ex2)
> df
ex1 ex2
1 1 41
2 2 1
3 3 1
4 4 90
5 5 1
6 6 1
7 7 1
8 8 44
9 9 1
10 10 90
11 11 1
12 12 2
13 13 42
14 14 1
15 15 1
16 16 1
17 17 1
18 18 90
19 19 1
20 20 41
Now, what I am trying to do is remove all the rows that are outside the beginning and the end of the episode, as they are recordings of typed behaviour that is not interesting as it falls outside of the episode. Therefore, I want to end up with a dataframe like this:
ex1 <- c(1,4,5,6,7,8,10,11,12,13,18,19,20)
ex2 <- c(41,90,1,1,1,44,90,1,2,42,90,1,41)
df <- data.frame(ex1,ex2)
> df
ex1 ex2
1 1 41
2 4 90
3 5 1
4 6 1
5 7 1
6 8 44
7 10 90
8 11 1
9 12 2
10 13 42
11 18 90
12 19 1
13 20 41
I have been trying to use subset but I cannot make it work between a range and a number.
Thanks in advance!
Setting the values:
ex1 <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
ex2 <- c(41,1,1,90,1,1,1,44,1,90,1,2,42,1,1,1,1,90,1,41)
before <- data.frame(ex1,ex2)
before
ex1 ex2
1 1 41
2 2 1
3 3 1
4 4 90
5 5 1
6 6 1
7 7 1
8 8 44
9 9 1
10 10 90
11 11 1
12 12 2
13 13 42
14 14 1
15 15 1
16 16 1
17 17 1
18 18 90
19 19 1
20 20 41
I have built a function that should do the job.
It is based on my understanding of your problem, so it may not fit your setting perfectly.
However, I believe you can get there by adjusting the function a little to suit your needs.
library(dplyr)
episode <- function(start = 90, end = 40, data){#the default value of start is 90 and the default value of end is 40
#retrieving all the row indices that correspond to values that indicates an end
end_idx <- which(data$ex2>=end & data$ex2<=end+10)
#retrieving all the row indices that correspond to values that indicates a start
start_idx <- which(data$ex2==start)
#declaring a list that would contain the extracted sub samples in your liking
sub_sample_list <- vector("list", length(start_idx))
#looping through the start indices
for(i in seq_along(start_idx)){
#extracting the minimum among those have values larger than the i-th start_idx value
temp_end <- min(end_idx[end_idx>start_idx[i]])
#extracting the rows between the i-th start index and the minimum end index that is larger than the i-th start index
temp_sub_sample <- data[start_idx[i]:temp_end,]
#saving the sub-sample in the list
sub_sample_list[[i]] <- temp_sub_sample
}
#now row binding all the extracted sub samples
clean.df <- do.call(rbind.data.frame, sub_sample_list)
#if there is an end index that is smaller than the minimum start index
if(min(end_idx)< min(start_idx)){
#only retrieve those corresponding rows and add to the clean.df
clean.df <- rbind(data[end_idx[end_idx<min(start_idx)],], clean.df)
}
#cleaning up the row numbers a bit
rownames(clean.df) <- 1:nrow(clean.df)
#sort the clean.df by ex1
clean.df <- clean.df %>% arrange(ex1)
#returning the clean.df
return(clean.df)
}
Generating the after data set by using the episode function.
after <- episode(start = 90, end = 40, before)
after
ex1 ex2
1 1 41
2 4 90
3 5 1
4 6 1
5 7 1
6 8 44
7 10 90
8 11 1
9 12 2
10 13 42
11 18 90
12 19 1
13 20 41
And base:
ex1 <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20)
ex2 <- c(41,1,1,90,1,1,1,44,1,90,1,2,42,1,1,1,1,90,1,41)
df <- data.frame(ex1,ex2)
Index the starts of the series (90) and, if the first start is not in row 1, drop the rows before it as an incomplete episode:
start_idx <- which(df$ex2 == 90)
df <- df[start_idx[1]:nrow(df), ]
Re-index the starts, and index the ends (>= 40 and < 90):
start_idx <- which(df$ex2 == 90)
end_idx <- which(df$ex2 >= 40 & df$ex2 < 90)
Make an empty list and loop over the starts, subsetting out each start:end section:
df_lst <- list()
for (k in seq_along(start_idx)) {
df_lst[[k]] <- df[start_idx[k]:end_idx[k], ]
}
Bring them all together:
df2 <- do.call('rbind', df_lst)
df2
ex1 ex2
4 4 90
5 5 1
6 6 1
7 7 1
8 8 44
10 10 90
11 11 1
12 12 2
13 13 42
18 18 90
19 19 1
20 20 41
Fairly compact.
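The same start-to-end logic can also be sketched as a single pass that builds a logical keep vector. This is only an illustration: the helper name `keep_episodes` and the end range 40 to 50 are assumptions, and like the base answer above it drops the leading 41 in row 1.

```r
# Hypothetical helper: TRUE for rows from a start trigger through the next end trigger
keep_episodes <- function(x, start = 90, end_lo = 40, end_hi = 50) {
  keep <- logical(length(x))
  inside <- FALSE
  for (i in seq_along(x)) {
    if (x[i] == start) inside <- TRUE          # an episode opens on the start trigger
    keep[i] <- inside                          # keep the row while inside an episode
    if (inside && x[i] >= end_lo && x[i] <= end_hi) inside <- FALSE  # closes on the end trigger
  }
  keep
}

ex1 <- 1:20
ex2 <- c(41,1,1,90,1,1,1,44,1,90,1,2,42,1,1,1,1,90,1,41)
df <- data.frame(ex1, ex2)

df_kept <- df[keep_episodes(df$ex2), ]
```

An episode that never reaches an end trigger is kept through the last row, which may or may not be what you want.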

using intervals in a column to populate values for another column

I have a dataframe:
dataframe <- data.frame(Condition = rep(c(1,2,3), each = 5, times = 2),
Time = sort(sample(1:60, 30)))
Condition Time
1 1 1
2 1 3
3 1 4
4 1 7
5 1 9
6 2 11
7 2 12
8 2 14
9 2 16
10 2 18
11 3 19
12 3 24
13 3 25
14 3 28
15 3 30
16 1 31
17 1 34
18 1 35
19 1 38
20 1 39
21 2 40
22 2 42
23 2 44
24 2 47
25 2 48
26 3 49
27 3 54
28 3 55
29 3 57
30 3 59
I want to divide the total length of Time (i.e., max(Time) - min(Time)) per Condition by a constant 'x' (e.g., 3). Then I want to use that quotient to add a new variable Trial such that my dataframe looks like this:
Condition Time Trial
1 1 1 A
2 1 3 A
3 1 4 B
4 1 7 C
5 1 9 C
6 2 11 A
7 2 12 A
8 2 14 B
9 2 16 C
10 2 18 C
... and so on
As you can see, for Condition 1, Trial is populated with unique identifying values (e.g., A, B, C) every 2.67 seconds = 8 (total time) / 3. For Condition 2, Trial is populated every 2.33 seconds = 7 (total time) /3.
I am not getting what I want with my current code:
dataframe %>%
group_by(Condition) %>%
mutate(Trial = LETTERS[cut(Time, 3, labels = F)])
# Groups: Condition [3]
Condition Time Trial
<dbl> <int> <chr>
1 1 1 A
2 1 3 A
3 1 4 A
4 1 7 A
5 1 9 A
6 2 11 A
7 2 12 A
8 2 14 A
9 2 16 A
10 2 18 A
# ... with 20 more rows
Thanks!
We can take the difference of the range (range() returns the min and max as a vector), divide it by the constant (3 here), and pass the result as the breaks argument of cut(). Then use the integer index (labels = FALSE) to pick the corresponding letter from R's built-in LETTERS constant.
library(dplyr)
dataframe %>%
group_by(Condition) %>%
mutate(Trial = LETTERS[cut(Time, diff(range(Time))/3,
labels = FALSE)])
If the grouping should be based on adjacent values in 'Condition', use rleid from data.table on the 'Condition' column to create the grouping, and apply the same code as above
library(data.table)
dataframe %>%
group_by(grp = rleid(Condition)) %>%
mutate(Trial = LETTERS[cut(Time, diff(range(Time))/3,
labels = FALSE)])
Here's a one-liner using my santoku package. The rleid line is the same as mentioned in @akrun's solution.
dataframe %<>%
group_by(grp = data.table::rleid(Condition)) %>%
mutate(
Trial = chop_evenly(Time, intervals = 3, labels = lbl_seq("A"))
)
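For comparison, here is a base-R sketch of the run-based grouping that passes the number of intervals (3) directly to `cut()`. On the Time values printed in the question it reproduces the A, A, B, C, C pattern of the desired output for the first two runs. The helper names `run` and `bin` are made up for this illustration, and `cumsum()` over changes in Condition stands in for `rleid`.

```r
# Time values copied from the printed data in the question
dataframe <- data.frame(
  Condition = rep(c(1, 2, 3), each = 5, times = 2),
  Time = c(1, 3, 4, 7, 9, 11, 12, 14, 16, 18, 19, 24, 25, 28, 30,
           31, 34, 35, 38, 39, 40, 42, 44, 47, 48, 49, 54, 55, 57, 59)
)

# Run id: increases by one whenever Condition changes (base-R stand-in for rleid)
run <- cumsum(c(TRUE, diff(dataframe$Condition) != 0))

# Within each run, split the Time range into 3 equal-width intervals
bin <- ave(dataframe$Time, run, FUN = function(t) cut(t, 3, labels = FALSE))
dataframe$Trial <- LETTERS[bin]

head(dataframe, 10)
```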

How can I create a new column with the same id every n rows in R?

I have a data frame where I want to create a new column in which to assign the same ID every 30 rows.
My data frame is from an experiment and I wish to create a new "bloc" column, so that every 30 rows it increments by 1
example:
col1 : response latency = 1.0002, 1.2566, ...30times, 1.5422, ...
col2 : difficulty = easy, hard, intermediate, ...
col3 : ID = 1, 2, 3, ...30times, 31, 32, ...
And I want a new column
new col : bloc = 1, 1, ...30times, 2, 2, ...30times, 3, 3, ...
Using 5 as an example, but this of course works the same for 30
df <- data.frame(rownum = 1:23)
bloc_len <- 5
df$bloc <-
rep(seq(1, 1 + nrow(df) %/% bloc_len), each = bloc_len, length.out = nrow(df))
df
# rownum bloc
# 1 1 1
# 2 2 1
# 3 3 1
# 4 4 1
# 5 5 1
# 6 6 2
# 7 7 2
# 8 8 2
# 9 9 2
# 10 10 2
# 11 11 3
# 12 12 3
# 13 13 3
# 14 14 3
# 15 15 3
# 16 16 4
# 17 17 4
# 18 18 4
# 19 19 4
# 20 20 4
# 21 21 5
# 22 22 5
# 23 23 5
You could also use %/% (same output)
df$bloc <-
1 + seq(0, nrow(df) - 1) %/% bloc_len
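The integer-division version can be wrapped in a small helper so the block length becomes a parameter. This is a sketch: the function name `bloc_id` and the 95-row example are assumptions made for illustration.

```r
# Hypothetical helper: bloc id for each of n rows, with `size` rows per bloc
bloc_id <- function(n, size) (seq_len(n) - 1) %/% size + 1

df <- data.frame(rownum = 1:95)
df$bloc <- bloc_id(nrow(df), 30)

table(df$bloc)  # rows per bloc; the last bloc may be shorter
```

When the number of rows is not a multiple of the block length, the last bloc simply has fewer rows.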
You can use the rep(x, times) function to create the bloc column you want.
See the example below.
Create a random data set:
set.seed(12345)
data <- data.frame(
response_latency = abs(rnorm(90, 2, 1)),
difficulty = sample(c("easy", "hard", "intermediate"), 90, replace = TRUE),
ID = 1:90
)
Here, to add the bloc column in your dataset, you can use the following code:
bloc <- c(rep(x = 1, times = 30), rep(x = 2, times = 30), rep(x = 3, times = 30))
data$bloc <- bloc
head(data,n=35)
The new dataset looks as follows:
response_latency difficulty ID bloc
1 1.8890497 intermediate 1 1
2 2.9996586 intermediate 2 1
3 3.0255886 hard 3 1
4 0.3949156 hard 4 1
5 2.0027199 easy 5 1
6 2.9580737 hard 6 1
7 1.3337903 intermediate 7 1
8 1.4844084 hard 8 1
9 1.3941750 hard 9 1
10 1.6923244 intermediate 10 1
11 1.8186642 easy 11 1
12 0.9167691 easy 12 1
13 2.5987185 easy 13 1
14 1.8345693 intermediate 14 1
15 0.9177725 hard 15 1
16 2.3445309 easy 16 1
17 2.5187724 hard 17 1
18 1.2220053 hard 18 1
19 2.1636086 hard 19 1
20 0.7847963 hard 20 1
21 1.3785363 hard 21 1
22 2.9451529 intermediate 22 1
23 2.3722482 intermediate 23 1
24 2.1812877 intermediate 24 1
25 0.1383615 easy 25 1
26 1.3996498 easy 26 1
27 3.7593749 hard 27 1
28 2.0056114 hard 28 1
29 3.2195714 hard 29 1
30 2.1481248 easy 30 1
31 3.2546741 intermediate 31 2
32 2.4221608 hard 32 2
33 2.0465687 intermediate 33 2
34 1.7649423 easy 34 2
35 1.7338255 hard 35 2

R creates a different sequence of numbers

Newbie here
I want an equidistant series of numbers between 0 and 20.
Why do I get two different sets of numbers?
0:20
#[1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
seq(0:20)
#[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
Any help would be appreciated. Thank you
Apparently, when you pass a single vector to seq(), it just returns 1:length(vector), as in
> seq(c(2, 4, 6, 100))
[1] 1 2 3 4
> seq(c('a', 2, mean))
[1] 1 2 3
I don't think that's how you typically use seq(). You'll get the behavior you expect if you pass the first value in the sequence, the last value, and optionally the length of the output or the step size. Better would be
> seq(0, 20)
[1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
> seq(from = 0, to = 20)
[1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Look at the seq header from the documentation.
seq(from = 1, to = 1, by = ((to - from)/(length.out - 1)), ...)
To achieve the same behaviour as the first statement you should call it with 2 parameters (from and to).
In your second statement you are calling it with only one parameter and it seems that it is using the length of the given parameter as the number of elements to generate from the default from value, which is 1.
Check this quick example, with a 6-elements vector:
> seq(c(1,4,5,6,2,3))
[1] 1 2 3 4 5 6
In your case, the length of the vector 0:20 is 21, so seq() generates 21 numbers starting from 1: 1,2,...,21
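A short sketch that puts the two calls side by side, plus two ways of asking for an equidistant series explicitly. The variable names are made up for illustration.

```r
x <- 0:20
length(x)                  # 21 elements, because both 0 and 20 are included

one_arg <- seq(x)          # single vector argument: behaves like seq_along(x)
two_arg <- seq(0, 20)      # from and to: same values as 0:20

# Equidistant series with an explicit step or an explicit length
by_step   <- seq(0, 20, by = 5)          # 0 5 10 15 20
by_length <- seq(0, 20, length.out = 5)  # 0 5 10 15 20
```

If you really want the indices of a vector, seq_along(x) states that intent more clearly than seq(x).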

R: how to use expand.grid to generate combinations based on group

I am trying to get all combinations of values per group. I want to prevent combination of values between different groups.
To create all combinations of values (no matter which group a value belongs to) I can use:
expand.grid(value, value)
The expected result should be a subset of the result of the previous command.
Example:
#base data
value = c(1,3,5, 1,5,7,9, 2)
group = c("a", "a", "a","b","b","b","b", "c")
base <- data.frame(value, group)
#creating ALL combinations of value
allComb <- expand.grid(base$value, base$value)
#expected result is a subset of allComb.
#Note: the first column shows the row number from allComb.
#Empty rows separate the combinations per group and are shown only for clarification.
Var1 Var2
1 1 1
2 3 1
3 5 1
9 1 3
10 3 3
11 5 3
17 1 5
18 3 5
19 5 5

28 1 1
29 5 1
30 7 1
31 9 1
36 1 5
37 5 5
38 7 5
39 9 5
44 1 7
45 5 7
46 7 7
47 9 7
52 1 9
53 5 9
54 7 9
55 9 9

64 2 2
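One possible way to get this subset, sketched here rather than taken from an answer in the thread, is to expand the group labels in parallel with the values and keep only the rows where both labels agree. The names `groupComb` and `withinGroup` are made up for illustration.

```r
value <- c(1, 3, 5, 1, 5, 7, 9, 2)
group <- c("a", "a", "a", "b", "b", "b", "b", "c")

# All combinations of values, and the matching combinations of group labels
allComb   <- expand.grid(value, value)
groupComb <- expand.grid(g1 = group, g2 = group, stringsAsFactors = FALSE)

# Keep only the rows where both values come from the same group
withinGroup <- allComb[groupComb$g1 == groupComb$g2, ]

nrow(withinGroup)  # 3*3 + 4*4 + 1*1 = 26
```

Because both grids are built from vectors of the same length in the same order, row k of groupComb describes row k of allComb, and the logical subset keeps allComb's original row names.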
