How to remove duplicate consecutive text in R separated by : [duplicate]

This question already has an answer here:
Selecting only unique values from a comma separated string [duplicate]
(1 answer)
Closed 5 years ago.
The data set looks like this:
id agent final_col
1 1 A:A A
2 1 A:A A
3 2 B B
4 3 C C
5 4 A:C:C A:C
6 4 A:C:C A:C
7 4 A:C:C A:C
How can I remove duplicate entries, to have a clean column like the final_col in R?

Let's just generate a new column based on df$agent:
df$final_col <- sapply(df$agent, function(txt){
  paste(unique(unlist(strsplit(txt, ":"))), collapse=":")
})
For each element, we split on ":", keep the unique pieces, and paste them back together. Note that unique() drops every repeated piece, not only adjacent ones.

You can do this with gsub and a regular expression:
gsub("\\b(\\w+)(\\:\\1)+\\b", "\\1", DAT$agent)
[1] "A" "A" "B" "C" "A:C" "A:C" "A:C"
Your data:
DAT = read.table(text=" id agent final_col
1 1 A:A A
2 1 A:A A
3 2 B B
4 3 C C
5 4 A:C:C A:C
6 4 A:C:C A:C
7 4 A:C:C A:C",
header=TRUE, stringsAsFactors=FALSE)
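The two answers above agree on the question's data, but they are not equivalent in general. The sketch below uses a made-up input vector `x` (not part of the question) to show the difference: the `unique()` approach removes every repeated token, while the regular expression only collapses tokens that repeat side by side. The pattern is written without the backslash before the colon, on the assumption that the colon needs no escaping.

```r
# Hypothetical input with a non-adjacent repeat ("A:C:A"); not from the question
x <- c("A:A", "A:C:C", "A:C:A")

# Split on ":", keep the first occurrence of each token, join again
dedup_all <- vapply(strsplit(x, ":", fixed = TRUE),
                    function(parts) paste(unique(parts), collapse = ":"),
                    character(1))

# Collapse only runs of the same token that sit next to each other
dedup_adjacent <- gsub("\\b(\\w+)(:\\1)+\\b", "\\1", x)

dedup_all       # "A" "A:C" "A:C"
dedup_adjacent  # "A" "A:C" "A:C:A"
```

Which behaviour is wanted depends on whether "A:C:A" should count as containing a duplicate; the question's title asks for consecutive duplicates only.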

Related

Combining the rows of a dataframe where each row is a df itself [duplicate]

This question already has answers here:
Combine a list of data frames into one data frame by row
(10 answers)
Closed 1 year ago.
I have a list in which each element is itself a data frame (or list), like this:
[[1]]
1: a b c d
1 1 2 4
[[2]]
1: a b c d
4 3 6 2
[[3]]
1: a b c d
1 2 2 1
How can I transform this to a dataframe like below?
a b c d
1 1 2 4
4 3 6 2
1 2 2 1
We can use rbindlist from data.table:
library(data.table)
rbindlist(lst1)
Or with rbind and do.call in base R:
do.call(rbind, lst1)
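For a self-contained illustration of the base R variant, the sketch below builds a hypothetical `lst1` by hand (the real object in the question is not shown in full, so its exact class is an assumption) and stacks its elements.

```r
# Hypothetical stand-in for lst1: a list of one-row data frames
lst1 <- list(
  data.frame(a = 1, b = 1, c = 2, d = 4),
  data.frame(a = 4, b = 3, c = 6, d = 2),
  data.frame(a = 1, b = 2, c = 2, d = 1)
)

# do.call passes every list element as a separate argument to rbind,
# i.e. rbind(lst1[[1]], lst1[[2]], lst1[[3]])
combined <- do.call(rbind, lst1)
combined
```

This relies on all elements having the same column names; rbind on data frames matches columns by name.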

R create group variable based on row order and condition

I have a dataframe containing multiple groups that are not explicitly stated. Instead, a new group always starts where type == 1 and continues through the following rows, where type == 2. The number of rows per group can vary.
How can I explicitly create a new variable based on the order of another column? The groups, of course, should be mutually exclusive.
My data:
df <- data.frame(type = c(1,2,2,1,2,1,2,2,2,1),
stand = 1:10)
Expected output with new group myGroup:
type stand myGroup
1 1 1 a
2 2 2 a
3 2 3 a
4 1 4 b
5 2 5 b
6 1 6 c
7 2 7 c
8 2 8 c
9 2 9 c
10 1 10 d
One option could be:
with(df, letters[cumsum(type == 1)])
[1] "a" "a" "a" "b" "b" "c" "c" "c" "c" "d"
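To see why the one-liner works, it may help to look at the intermediate running count. The sketch below reuses the question's `df`; the name `grp` is introduced here only for illustration.

```r
df <- data.frame(type = c(1, 2, 2, 1, 2, 1, 2, 2, 2, 1), stand = 1:10)

# TRUE counts as 1, so the running total goes up by one at every type == 1 row
grp <- cumsum(df$type == 1)
grp  # 1 1 1 2 2 3 3 3 3 4

# letters only covers 26 groups; the integer id in grp works for any number
df$myGroup <- letters[grp]
df
```

If the data could contain more than 26 groups, keeping the integer `grp` as the group variable avoids running out of letters.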
Here is another option using rep() + diff(), though it is not as simple as the approach by @tmfmnk:
idx <- which(df$type == 1)
v <- diff(idx)
df$myGroup <- rep(letters[seq_along(idx)], c(v, nrow(df) - sum(v)))
such that
> df
type stand myGroup
1 1 1 a
2 2 2 a
3 2 3 a
4 1 4 b
5 2 5 b
6 1 6 c
7 2 7 c
8 2 8 c
9 2 9 c
10 1 10 d

Repeat rows with a variable in r [duplicate]

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 3 years ago.
I have a data.frame with n rows, and I would like to repeat these rows according to the values of another variable.
This is an example data.frame:
df <- data.frame(a=1:3, b=letters[1:3])
df
a b
1 1 a
2 2 b
3 3 c
And this is an example of the variable:
df1 <- data.frame(x=1:3)
df1
x
1 1
2 2
3 3
Next, I would like to repeat each row of df the number of times given by the corresponding value in df1,
so that the result looks like this:
a b
1 1 a
2 2 b
3 2 b
4 3 c
5 3 c
6 3 c
If you have any idea how to solve this problem, I would be very thankful
You can simply repeat the row index:
df[rep(1:3,df1$x),]
# a b
#1 1 a
#2 2 b
#2.1 2 b
#3 3 c
#3.1 3 c
#3.2 3 c
Or, without hard-coding the size 3:
df[rep(seq_along(df1$x),df1$x),]
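Put together as a runnable sketch, using the question's `df` and `df1` (with `letters[1:3]` so that both columns have three values); `out` is a name introduced here. The last step, resetting the row names, is optional and only changes how the result prints.

```r
df  <- data.frame(a = 1:3, b = letters[1:3])
df1 <- data.frame(x = 1:3)

# Row i of df is repeated df1$x[i] times
out <- df[rep(seq_len(nrow(df)), times = df1$x), ]

# Optional: replace row names such as "2.1" with 1..n
rownames(out) <- NULL
out
```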

Repeat elements of data.frame [duplicate]

This question already has answers here:
Repeat rows of a data.frame [duplicate]
(10 answers)
Closed 7 years ago.
This seems to be a fairly simple problem but I can't find a simple solution:
I want to repeat a data.frame (i) several times as follows:
My initial data.frame:
i <- data.frame(c("A","A","A","B","B","B","C","C","C"))
i
Printing i results in:
1 A
2 A
3 A
4 B
5 B
6 B
7 C
8 C
9 C
This is how I want to repeat the elements (the numbers in the first column are just for easier viewing):
i
1 A
2 A
3 A
4 B
5 B
6 B
7 C
8 C
9 C
1 A
2 A
3 A
4 B
5 B
6 B
7 C
8 C
9 C
I tried doing it using:
i[rep(seq_len(nrow(i)), each=2),]
but it gives me output like this (the numbers in the first column are just for easier viewing):
1 A
2 A
3 A
1 A
2 A
3 A
4 B
5 B
6 B
4 B
5 B
6 B
7 C
8 C
9 C
7 C
8 C
9 C
Please help!
Not sure if this solves your problem, but to obtain the desired output you could simply repeat the entire sequence:
i <- c("A","A","A","B","B","B","C","C","C")
i2 <- rep(i,2)
#> i2
# [1] "A" "A" "A" "B" "B" "B" "C" "C" "C" "A" "A" "A" "B" "B" "B" "C" "C" "C"
Since you're dealing with a data frame, you could use a slightly modified variant:
i <- data.frame(c("A","A","A","B","B","B","C","C","C"))
i2 <- rep(i[,1],2)
You could use rbind(i, i). Does that work?
If you are working with a data frame, this code will work too (the second argument to rep is the number of copies; 2 gives the output asked for):
i[rep(seq_len(nrow(i)), 2), , drop=FALSE]
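The attempt in the question used `each`, which is the likely source of the unwanted ordering. A small sketch of the difference between `each` and `times`, followed by the data frame case; the column name `grp` and the result name `twice` are made up for this example.

```r
idx <- 1:3

rep(idx, each = 2)   # 1 1 2 2 3 3 : every element doubled in place
rep(idx, times = 2)  # 1 2 3 1 2 3 : the whole sequence, then again

# Applied to a data frame; drop = FALSE keeps a one-column frame a data frame
i <- data.frame(grp = c("A", "A", "A", "B", "B", "B", "C", "C", "C"))
twice <- i[rep(seq_len(nrow(i)), times = 2), , drop = FALSE]
nrow(twice)  # 18
```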

split a dataframe with numbers separated by the add sign '+' into new rows [duplicate]

This question already has answers here:
Split comma-separated strings in a column into separate rows
(6 answers)
Closed 6 years ago.
Sorry for the naive question but I have a dataframe like this:
n sp cap
1 1 a 3
2 2 b 3+2+4
3 3 c 2
4 4 d 1+5
I need to split the numbers separated by the plus sign ("+") into new rows in order to get a new dataframe like the one below:
n sp cap
1 1 a 3
2 2 b 3
3 2 b 2
4 2 b 4
5 3 c 2
6 4 d 1
7 4 d 5
How can I do that? strsplit?
thanks in advance
We could use cSplit from splitstackshape
library(splitstackshape)
cSplit(df1, 'cap', sep="+", 'long')
# n sp cap
#1: 1 a 3
#2: 2 b 3
#3: 2 b 2
#4: 2 b 4
#5: 3 c 2
#6: 4 d 1
#7: 4 d 5
Or we could do this in base R. Use strsplit to split the "cap" column into substrings, which returns a list (lst). Then replicate the rows of the dataset by the length of each list element, convert the lst elements to numeric, unlist them, and cbind the result with the replicated rows.
lst <- strsplit(as.character(df1$cap), "[+]")
df2 <- cbind(df1[rep(1:nrow(df1), sapply(lst, length)),1:2],
cap= unlist(lapply(lst, as.numeric)))
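A self-contained version of the base R steps might look as follows. The data frame is rebuilt by hand from the question's printout, and this sketch swaps in `fixed = TRUE` and `lengths()` for the `"[+]"` pattern and `sapply(lst, length)` used above; `parts` is a name introduced here.

```r
# The question's data, rebuilt by hand
df1 <- data.frame(n = 1:4,
                  sp = c("a", "b", "c", "d"),
                  cap = c("3", "3+2+4", "2", "1+5"),
                  stringsAsFactors = FALSE)

# fixed = TRUE treats "+" as a literal character, not a regex quantifier
parts <- strsplit(df1$cap, "+", fixed = TRUE)

# lengths() gives the number of pieces per row, i.e. how often to repeat it
df2 <- cbind(df1[rep(seq_len(nrow(df1)), lengths(parts)), c("n", "sp")],
             cap = as.numeric(unlist(parts)))
rownames(df2) <- NULL
df2
```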
