Increment variable based on specific string change in R

I am looking to create a variable that increments when specific strings appear in a column. If strings "x", "y", or "z" appear in Event, I want the sequence to increment, otherwise I would like it to stay constant. Any help would be appreciated!
See table below:
Event Seq
1 a 1
2 b 1
3 x 2
4 c 2
5 a 2
6 b 2
7 y 3
8 a 3
9 z 4
10 b 4
11 y 5
12 a 5
13 b 5

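This for loop builds Seq as requested: it keeps a running counter that starts at 1 and increases each time Event is "x", "y", or "z" (it assumes df already has the Event column):
counter <- 1
df$Seq <- NA_real_
for (i in seq_len(nrow(df))) {
  # bump the counter only on the trigger strings
  if (df$Event[i] %in% c('x', 'y', 'z')) {
    counter <- counter + 1
  }
  df$Seq[i] <- counter
}
> df
Event Seq
1 a 1
2 b 1
3 x 2
4 c 2
5 a 2
6 b 2
7 y 3
8 a 3
9 z 4
10 b 4
11 y 5
12 a 5
13 b 5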
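The same sequence can also be built without a loop. The following is a minimal sketch, assuming the Event column from the question (the data frame is rebuilt here for illustration, and the name triggers is hypothetical): `%in%` marks the trigger rows, `cumsum` counts how many have been seen so far, and adding 1 makes the sequence start at 1.

```r
# Illustrative reconstruction of the Event column from the question
df <- data.frame(
  Event = c("a", "b", "x", "c", "a", "b", "y", "a", "z", "b", "y", "a", "b")
)

# Strings that should bump the sequence (hypothetical helper name)
triggers <- c("x", "y", "z")

# TRUE on trigger rows; cumsum() counts the triggers seen so far; + 1 starts at 1
df$Seq <- 1 + cumsum(df$Event %in% triggers)
df
```

Because the result depends only on a running count, no pre-existing Seq column is needed.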

Related

get the value of a cell of a dataframe based on the value in one of the columns in R

I have an example data frame in which columns "a" and "b" hold certain values, and column "c" holds the values 1 or 2. I would like to create column "d" that contains, for each row, the value from the column whose index is given in column "c".
x = data.frame(a = c(1:10), b = c(3:12), c = seq(1:2))
x
a b c
1 1 3 1
2 2 4 2
3 3 5 1
4 4 6 2
5 5 7 1
6 6 8 2
7 7 9 1
8 8 10 2
9 9 11 1
10 10 12 2
Thus column "d" for the first row should contain the value 1, since the index in column "c" is 1; for the second row d = 4, since the index in column "c" is 2; and so on. Standard indexing in R did not help me: it just returns the values of column c. How can I solve this?
You can create a matrix of row and column numbers to subset values from the dataframe.
x$d <- x[cbind(1:nrow(x), x$c)]
x
# a b c d
#1 1 3 1 1
#2 2 4 2 4
#3 3 5 1 3
#4 4 6 2 6
#5 5 7 1 5
#6 6 8 2 8
#7 7 9 1 7
#8 8 10 2 10
#9 9 11 1 9
#10 10 12 2 12
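To see why this works: when a data frame is indexed by a two-column numeric matrix, each row of that matrix is read as a (row, column) pair. Here is a minimal sketch on a smaller, made-up data frame (the names small, idx, picked, and mixed are illustrative only). One caveat worth knowing is that this form of indexing goes through as.matrix(), so a data frame with a character column returns character values.

```r
small <- data.frame(a = 1:3, b = 4:6, c = c(1, 2, 1))

# Each row of idx is a (row number, column number) pair
idx <- cbind(seq_len(nrow(small)), small$c)
picked <- small[idx]
picked

# With mixed column types the data frame is coerced to a character matrix first
mixed <- data.frame(a = 1:2, b = c("u", "v"))
mixed_pick <- mixed[cbind(1:2, c(1, 1))]
is.character(mixed_pick)
```

If the columns being picked from are all numeric, as in the question, the coercion is harmless.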
If the input is a tibble, you need to convert it to a data frame to use the above answer.
If you don't want to convert it, here is another option using rowwise.
library(dplyr)
x <- as_tibble(x)
x %>% rowwise() %>% mutate(d = c_across(everything())[c])
Another option, which works here because c only takes the values 1 and 2, uses dplyr::mutate with ifelse:
x %>% mutate(d = ifelse(c == 1, a, b))
a b c d
1 1 3 1 1
2 2 4 2 4
3 3 5 1 3
4 4 6 2 6
5 5 7 1 5
6 6 8 2 8
7 7 9 1 7
8 8 10 2 10
9 9 11 1 9
10 10 12 2 12

Assign unique non-repeated ID to nested groups with the same values in R

I have run across similar questions, but have not been able to find an answer for my specific needs.
I have a data set with a nested group design and I need to include a unique non-repeating ID to nested groups that can have identical values. While I regularly conduct this type of data wrangling, both the structure of this data set as well as the required outcome are beyond my skillset at this time.
Below I have provided an example data set (df) and what the results should look like.
I used the below code in my actual data set, but realized that it fails under certain circumstances...which are exaggerated in the example data set provided here. I prefer the ID to be sequentially numbered.
df$ID = cumsum(c(TRUE, diff(df$LENGTH) != 0))
I am open to all options (e.g., library(data.table), library(boot), etc) as it would be great if others find this post useful. However, I prefer solutions that do not require the installation and loading of additional packages.
Thanks in advance for your help.
Take care.
df <- read.table(text = "GROUP REGION TIME LENGTH
a x 1 3
a x 2 3
a x 3 3
a y 4 3
a y 5 3
a y 6 3
a z 7 2
a z 8 2
b z 1 2
b z 2 2
b x 3 2
b x 4 2
c x 1 2
c x 2 2
c y 3 2
c y 4 2
c x 5 2
c x 6 2
c z 7 1", header = TRUE)
result <- read.table(text = "GROUP REGION TIME LENGTH ID
a x 1 3 1
a x 2 3 1
a x 3 3 1
a y 4 3 2
a y 5 3 2
a y 6 3 2
a z 7 2 3
a z 8 2 3
b z 1 2 4
b z 2 2 4
b x 3 2 5
b x 4 2 5
c x 1 2 6
c x 2 2 6
c y 3 2 7
c y 4 2 7
c x 5 2 8
c x 6 2 8
c z 7 1 9", header = TRUE)
Paste GROUP and REGION columns and use rle to create a sequential ID column.
transform(df, ID = with(rle(paste(GROUP, REGION)), rep(seq_along(values), lengths)))
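As a rough illustration of what rle contributes here, the sketch below runs it on a short, made-up vector of pasted keys (the names key, runs, and id are hypothetical). rle returns the value and length of each consecutive run, and repeating the run index by the run length gives one ID per run.

```r
# Made-up pasted GROUP/REGION keys; note that "a x" appears in two separate runs
key <- c("a x", "a x", "a y", "b y", "b y", "a x")

runs <- rle(key)
runs$values   # one entry per consecutive run
runs$lengths  # how long each run is

# Repeat the run number once for every element of that run
id <- rep(seq_along(runs$values), runs$lengths)
id
```

The second run of "a x" gets a fresh ID rather than reusing the first one, which is the behaviour the question asks for when the same GROUP/REGION pair reappears later.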
In data.table we can use rleid.
library(data.table)
setDT(df)[, ID := rleid(GROUP, REGION)]
# GROUP REGION TIME LENGTH ID
# 1: a x 1 3 1
# 2: a x 2 3 1
# 3: a x 3 3 1
# 4: a y 4 3 2
# 5: a y 5 3 2
# 6: a y 6 3 2
# 7: a z 7 2 3
# 8: a z 8 2 3
# 9: b z 1 2 4
#10: b z 2 2 4
#11: b x 3 2 5
#12: b x 4 2 5
#13: c x 1 2 6
#14: c x 2 2 6
#15: c y 3 2 7
#16: c y 4 2 7
#17: c x 5 2 8
#18: c x 6 2 8
#19: c z 7 1 9
Another base R option, without rle:
transform(
  df,
  ID = cumsum(c(1, (s <- paste0(GROUP, REGION))[-1] != head(s, -1)))
)
gives
GROUP REGION TIME LENGTH ID
1 a x 1 3 1
2 a x 2 3 1
3 a x 3 3 1
4 a y 4 3 2
5 a y 5 3 2
6 a y 6 3 2
7 a z 7 2 3
8 a z 8 2 3
9 b z 1 2 4
10 b z 2 2 4
11 b x 3 2 5
12 b x 4 2 5
13 c x 1 2 6
14 c x 2 2 6
15 c y 3 2 7
16 c y 4 2 7
17 c x 5 2 8
18 c x 6 2 8
19 c z 7 1 9
With dplyr (rleid still comes from data.table):
library(dplyr)
library(data.table)
df %>%
  mutate(ID = rleid(GROUP, REGION))

What does subset(df, !duplicated(x)) do?

Looking for a detailed answer.
When we have a data frame (df) that contains three variables x, y, and z, what does the following command do?
subset(df, !duplicated(x))
The duplicated function traverses its argument sequentially and returns TRUE for each element that is identical to an earlier element. It is a generic function, so it has a default method (for vectors) as well as methods for other classes, such as data.frame. The subset function evaluates expressions passed as its second or third argument inside the data frame, so column names can be used as if they were ordinary variables. This is called "non-standard evaluation". The negation operator ! flips the result, so !duplicated(x) is TRUE exactly where a value of x appears for the first time. This call to subset therefore returns the rows of the data frame that hold the first occurrence of each distinct value in column x, and the result has as many rows as there are unique values in x.
> dat <- data.frame(x = sample(1:5, 20, replace = TRUE), y = 1:5, z = 1:4)
> dat
x y z
1 2 1 1
2 2 2 2
3 2 3 3
4 5 4 4
5 4 5 1
6 1 1 2
7 2 2 3
8 2 3 4
9 5 4 1
10 1 5 2
11 2 1 3
12 4 2 4
13 5 3 1
14 4 4 2
15 3 5 3
16 3 1 4
17 4 2 1
18 4 3 2
19 1 4 3
20 1 5 4
> subset(dat, !duplicated(x))
x y z
1 2 1 1
4 5 4 4
5 4 5 1
6 1 1 2
15 3 5 3
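Since the example above uses random data, a small deterministic sketch may make the behaviour easier to check by hand. The vector and the names dup and first_rows below are made up for illustration.

```r
x <- c(2, 2, 5, 4, 2, 5)

# TRUE wherever the value has already appeared earlier in the vector
dup <- duplicated(x)
dup

dat <- data.frame(x = x, y = 1:6)

# Keep only the first row for each distinct value of x
first_rows <- subset(dat, !duplicated(x))
first_rows

# One row per unique value of x
nrow(first_rows) == length(unique(x))
```

The original row names (1, 3, 4) are kept, which shows which rows survived.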

How to reverse a column in R

I have a dataframe as described below. Now I want to reverse the order of column B without hampering the total order of the dataframe. So now the column B has 5,4,3,2,1. I want to change it to 1,2,3,4,5. I don't want to sort as it will hamper the total ordering.
A B C
1 5 6
2 4 8
3 3 5
4 2 5
5 1 3
You can replace just that column:
x$B <- rev(x$B)
On your data:
> x$B <- rev(x$B)
> x
A B C
1 1 1 6
2 2 2 8
3 3 3 5
4 4 4 5
5 5 5 3
transform is also handy for this:
> transform(x, B = rev(B))
A B C
1 1 1 6
2 2 2 8
3 3 3 5
4 4 4 5
5 5 5 3
This doesn't modify x so you need to assign the result to something (perhaps back to x).
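A short sketch of that last point, rebuilding the question's data frame for illustration (the name reversed is hypothetical): transform returns a modified copy, and the original stays as it was until the result is assigned.

```r
x <- data.frame(A = 1:5, B = 5:1, C = c(6, 8, 5, 5, 3))

# transform() returns a new data frame; x itself is untouched
reversed <- transform(x, B = rev(B))

x$B         # still in the original order
reversed$B  # reversed order
reversed$A  # other columns are unchanged
```

Assigning back with `x <- transform(x, B = rev(B))` would overwrite the original instead.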

R cumulative sum based upon other columns

I have a data.frame as below. The data is sorted by column txt and then by column val. The summ column is the sum of the value in the val column and the summ value from the earlier row, provided that the current row and the earlier row have the same value in the txt column. How could I do this in R?
txt=c(rep("a",4),rep("b",5),rep("c",3))
val=c(1,2,3,4,1,2,3,4,5,1,2,3)
summ=c(1,3,6,10,1,3,6,10,15,1,3,6)
dd=data.frame(txt,val,summ)
> dd
txt val summ
1 a 1 1
2 a 2 3
3 a 3 6
4 a 4 10
5 b 1 1
6 b 2 3
7 b 3 6
8 b 4 10
9 b 5 15
10 c 1 1
11 c 2 3
12 c 3 6
If by "most earlier" (which in English is more properly written "earliest") you mean the nearest, which is what is implied by your expected output, then what you're talking about is a cumulative sum. You can apply cumsum() separately to each group of txt with ave():
dd <- data.frame(txt = c(rep("a", 4), rep("b", 5), rep("c", 3)), val = c(1, 2, 3, 4, 1, 2, 3, 4, 5, 1, 2, 3))
dd$summ <- ave(dd$val, dd$txt, FUN = cumsum)
dd
## txt val summ
## 1 a 1 1
## 2 a 2 3
## 3 a 3 6
## 4 a 4 10
## 5 b 1 1
## 6 b 2 3
## 7 b 3 6
## 8 b 4 10
## 9 b 5 15
## 10 c 1 1
## 11 c 2 3
## 12 c 3 6
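One property of ave worth noting is that it returns results in the original row order, so the data does not have to be sorted by group first. The sketch below uses a small, made-up data frame with interleaved groups (dd2 and the plain column are hypothetical names) and compares the grouped result with a plain cumsum.

```r
# Groups are interleaved on purpose
dd2 <- data.frame(txt = c("a", "b", "a", "b"), val = c(1, 2, 3, 4))

# Running total restarted within each txt group, kept in row order
dd2$summ <- ave(dd2$val, dd2$txt, FUN = cumsum)

# Plain running total over the whole column, for comparison
dd2$plain <- cumsum(dd2$val)
dd2
```

Here summ accumulates separately for "a" (1, then 4) and "b" (2, then 6), while plain ignores the grouping.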
