Assign value to group based on condition in column - r

I have a data frame that looks like the following:
> df = data.frame(group = c(1,1,1,2,2,2,3,3,3),
date = c(1,2,3,4,5,6,7,8,9),
value = c(3,4,3,4,5,6,6,4,9))
> df
  group date value
1     1    1     3
2     1    2     4
3     1    3     3
4     2    4     4
5     2    5     5
6     2    6     6
7     3    7     6
8     3    8     4
9     3    9     9
I want to create a new column that contains the date value per group that is associated with the value "4" from the value column.
The following data frame shows what I hope to accomplish.
  group date value newValue
1     1    1     3        2
2     1    2     4        2
3     1    3     3        2
4     2    4     4        4
5     2    5     5        4
6     2    6     6        4
7     3    7     6        8
8     3    8     4        8
9     3    9     9        8
As we can see, group 1 has the newValue "2" because that is the date associated with the value "4". Similarly, group two has newValue 4 and group three has newValue 8.
I assume there is an easy way to do this using ave() or a range of dplyr/data.table functions, but I have been unsuccessful with my many attempts.

Here's a quick data.table solution:
library(data.table)
setDT(df)[, newValue := date[value == 4L], by = group]
df
# group date value newValue
# 1: 1 1 3 2
# 2: 1 2 4 2
# 3: 1 3 3 2
# 4: 2 4 4 4
# 5: 2 5 5 4
# 6: 2 6 6 4
# 7: 3 7 6 8
# 8: 3 8 4 8
# 9: 3 9 9 8
Here's a similar dplyr version:
library(dplyr)
df %>%
  group_by(group) %>%
  mutate(newValue = date[value == 4L])
Or a possible base R solution using merge after filtering the data (will need some renaming afterwards)
merge(df, df[df$value == 4, c("group", "date")], by = "group")
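The renaming that the merge approach needs can be done before the join. The following is a minimal sketch, assuming the df from the question; the name lookup is made up for illustration, and all.x = TRUE is an added choice so that a group without a 4 would get NA instead of being dropped.

```r
df <- data.frame(group = c(1,1,1,2,2,2,3,3,3),
                 date  = c(1,2,3,4,5,6,7,8,9),
                 value = c(3,4,3,4,5,6,6,4,9))

# One row per group: the date on which value equals 4
lookup <- df[df$value == 4, c("group", "date")]

# Rename before merging so the result has no date.x / date.y columns
names(lookup)[names(lookup) == "date"] <- "newValue"

# Left join on group; groups missing from lookup would get NA
res <- merge(df, lookup, by = "group", all.x = TRUE)
res
```

Note that merge sorts the result by the key column, so the row order can differ from the input if df is not already sorted by group.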

Here is a base R option
df$newValue = rep(df$date[which(df$value == 4)], table(df$group))
Another alternative using lapply
do.call(rbind, lapply(split(df, df$group),
                      function(x) {
                        x$newValue = rep(x$date[which(x$value == 4)],
                                         each = length(x$group))
                        x
                      }))
# group date value newValue
#1.1 1 1 3 2
#1.2 1 2 4 2
#1.3 1 3 3 2
#2.4 2 4 4 4
#2.5 2 5 5 4
#2.6 2 6 6 4
#3.7 3 7 6 8
#3.8 3 8 4 8
#3.9 3 9 9 8
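The rep/table approach above relies on the rows being sorted by group and on every group containing exactly one 4. A sketch of a base R variant that does not depend on row order, using match; the shuffled data and the name hits are assumptions made for illustration:

```r
# Same data as in the question, but with the rows shuffled
df <- data.frame(group = c(3,1,2,1,3,2,1,2,3),
                 date  = c(7,1,4,2,8,5,3,6,9),
                 value = c(6,3,4,4,4,5,3,6,9))

# Rows where value is 4: one (group, date) pair per group here
hits <- df[df$value == 4, c("group", "date")]

# For every row, look up its group's date in hits.
# match() returns NA for groups that have no 4 and the first
# hit for groups that have more than one.
df$newValue <- hits$date[match(df$group, hits$group)]
df
```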

One more base R path:
df$newValue <- ave(`names<-`(df$value==4,df$date), df$group, FUN=function(x) as.numeric(names(x)[x]))
df
group date value newValue
1 1 1 3 2
2 1 2 4 2
3 1 3 3 2
4 2 4 4 4
5 2 5 5 4
6 2 6 6 4
7 3 7 6 8
8 3 8 4 8
9 3 9 9 8
10 3 11 7 8
I tested this on groups of variable length, using the data below. I assign the date column as the names of the logical vector value == 4, then, within each group, pick the name at the TRUE position and convert it back to a number.
Data
df = data.frame(group = c(1,1,1,2,2,2,3,3,3,3),
date = c(1,2,3,4,5,6,7,8,9,11),
value = c(3,4,3,4,5,6,6,4,9,7))
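The one-liner can be easier to follow when split into steps. This is a sketch of the same idea, not a different method; the names is_four and pick_date are made up for illustration.

```r
df <- data.frame(group = c(1,1,1,2,2,2,3,3,3,3),
                 date  = c(1,2,3,4,5,6,7,8,9,11),
                 value = c(3,4,3,4,5,6,6,4,9,7))

# Logical vector marking the rows where value is 4,
# carrying the dates along as names
is_four <- df$value == 4
names(is_four) <- df$date

# Within one group: take the name at the TRUE position
# and turn it back into a number
pick_date <- function(x) as.numeric(names(x)[x])

# ave() applies pick_date per group and recycles the single
# result over all rows of that group
df$newValue <- ave(is_four, df$group, FUN = pick_date)
df
```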

Related

Nested Subsetting

I have the following data frame
library(dplyr)
ID <- c(1,1,1,2,2,2,2,3,3)
Tag <- c(1,2,6,1,3,4,6,4,3)
Value <- c(5,9,3,3,5,6,4,8,9)
DF <- data.frame(ID,Tag,Value)
ID Tag Value
1 1 1 5
2 1 2 9
3 1 6 3
4 2 1 3
5 2 3 5
6 2 4 6
7 2 6 4
8 3 4 8
9 3 3 9
I would like to do the following: 1) group the rows by ID, and 2) assign the Value corresponding to a specific Tag to a new column. In the following example, I assign the Value of Tag 6 to a new column, by ID:
ID Tag Value New_Value
1 1 1 5 3
2 1 2 9 3
3 1 6 3 3
4 2 1 3 4
5 2 3 5 4
6 2 4 6 4
7 2 6 4 4
8 3 4 8 NA
9 3 3 9 NA
To the best of my knowledge, I need to subset the data in each group to get the Value for Tag 6. Here is my code and the error message:
DF %>% group_by(ID) %>% mutate(New_Value = select(filter(.,Tag==6),Value))
Adding missing grouping variables: `ID`
Error: Column `New_Value` is of unsupported class data.frame
Another possible solution is to create a new dataframe with IDs and Values for Tag 6 and join it with DF. However, I believe there is a better generic solution by only using dplyr.
I would appreciate it if you can help me understand how to perform a nested subset in this situation
Thank you
On the assumption that Tag is unique within groups, you could do:
library(dplyr)
DF %>%
  group_by(ID) %>%
  mutate(New_Value = ifelse(any(Tag == 6), Value[Tag == 6], NA))
# A tibble: 9 x 4
# Groups: ID [3]
ID Tag Value New_Value
<dbl> <dbl> <dbl> <dbl>
1 1 1 5 3
2 1 2 9 3
3 1 6 3 3
4 2 1 3 4
5 2 3 5 4
6 2 4 6 4
7 2 6 4 4
8 3 4 8 NA
9 3 3 9 NA
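For comparison, the same per-group lookup can be sketched in base R. This is only an illustration, assuming the DF from the question; it relies on the fact that indexing an empty vector with [1] returns NA, which covers IDs that have no Tag 6.

```r
DF <- data.frame(ID    = c(1,1,1,2,2,2,2,3,3),
                 Tag   = c(1,2,6,1,3,4,6,4,3),
                 Value = c(5,9,3,3,5,6,4,8,9))

# ave() hands each group's row indices to FUN; FUN returns the
# Value at Tag 6 within those rows, or NA when there is none
DF$New_Value <- ave(seq_len(nrow(DF)), DF$ID,
                    FUN = function(i) DF$Value[i][DF$Tag[i] == 6][1])
DF
```

The same [1] idiom should also work inside mutate, as Value[Tag == 6][1], in place of the ifelse call.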

Reshaping different variables for selecting values from one column in R

Below is a sample of my data; the real data has more R and O columns.
A R1 O1 R2 O2 R3 O3
1 3 3 5 3 6 4
2 3 3 5 4 7 4
3 4 4 5 5 6 5
I want to get the following data
A R O Value
1 3 1 3
1 5 2 3
1 6 3 4
2 3 1 3
2 5 2 4
2 7 3 4
3 4 1 4
3 5 2 5
3 6 3 5
I tried the melt function, but I was unsuccessful. Any help would be very much appreciated.
A solution using dplyr and tidyr. The key is to use gather to collect all the columns other than A, then use extract to split the column names into a letter and a number, and then use spread to convert the data frame back to wide format.
library(dplyr)
library(tidyr)
dt2 <- dt %>%
  gather(Column, Number, -A) %>%
  extract(Column, into = c("Column", "ID"), regex = "([A-Z]+)([0-9]+)") %>%
  spread(Column, Number) %>%
  select(A, R, O = ID, Value = O)
dt2
# A R O Value
# 1 1 3 1 3
# 2 1 5 2 3
# 3 1 6 3 4
# 4 2 3 1 3
# 5 2 5 2 4
# 6 2 7 3 4
# 7 3 4 1 4
# 8 3 5 2 5
# 9 3 6 3 5
DATA
dt <- read.table(text = "A R1 O1 R2 O2 R3 O3
1 3 3 5 3 6 4
2 3 3 5 4 7 4
3 4 4 5 5 6 5",
header = TRUE)
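If the tidyverse is not available, the same reshaping can be sketched in base R by building one small data frame per R/O pair and stacking them. This assumes, as in the sample, that the columns are named R1, O1, R2, O2, and so on; the names ids, pieces and long are made up for illustration.

```r
dt <- read.table(text = "A R1 O1 R2 O2 R3 O3
1 3 3 5 3 6 4
2 3 3 5 4 7 4
3 4 4 5 5 6 5", header = TRUE)

# Numeric suffixes of the R columns, e.g. "1", "2", "3"
ids <- sub("^R", "", grep("^R[0-9]+$", names(dt), value = TRUE))

# One data frame per suffix: pair R<i> with O<i>
pieces <- lapply(ids, function(i) {
  data.frame(A     = dt$A,
             R     = dt[[paste0("R", i)]],
             O     = as.integer(i),
             Value = dt[[paste0("O", i)]])
})

# Stack the pieces and sort to match the requested layout
long <- do.call(rbind, pieces)
long <- long[order(long$A, long$O), ]
rownames(long) <- NULL
long
```

Because the suffixes are read from the column names, the sketch should carry over to data with more R and O pairs.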

group cases by shared values in r [duplicate]

I have a dataset like this:
case x y
1 4 5
2 4 5
3 8 9
4 7 9
5 6 3
6 6 3
I would like to create a grouping variable.
This variable should have the same values when both x and y are the same.
I do not care what this value is; it only serves to group the cases. In my dataset, if x and y are the same for two cases, they are probably part of the same organization, and I want to see which organizations there are.
So my preferred dataset would look like this:
case x y org
1 4 5 1
2 4 5 1
3 8 9 2
4 7 9 3
5 6 3 4
6 6 3 4
How would I have to program this in R?
Since you said you do not care what this value is, you can simply do the following:
dt$new=as.numeric(as.factor(paste(dt$x,dt$y)))
dt
case x y new
1 1 4 5 1
2 2 4 5 1
3 3 8 9 4
4 4 7 9 3
5 5 6 3 2
6 6 6 3 2
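A solution from dplyr using group_indices. (In dplyr 1.0.0 and later, passing columns to group_indices this way is deprecated in favor of group_by followed by cur_group_id.)
library(dplyr)
dt2 <- dt %>%
  mutate(org = group_indices(., x, y))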
dt2
case x y org
1 1 4 5 1
2 2 4 5 1
3 3 8 9 4
4 4 7 9 3
5 5 6 3 2
6 6 6 3 2
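If the group numbers need to follow the order in which the groups appear, we can apply rleid from the data.table package after creating the org column. Note that rleid numbers runs of consecutive equal values, so this relies on rows with the same x and y being adjacent, as they are here.
library(dplyr)
library(data.table)
dt2 <- dt %>%
  mutate(org = group_indices(., x, y)) %>%
  mutate(org = rleid(org))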
dt2
case x y org
1 1 4 5 1
2 2 4 5 1
3 3 8 9 2
4 4 7 9 3
5 5 6 3 4
6 6 6 3 4
Update
Here is how to sort the rows by a column with arrange in dplyr.
library(dplyr)
dt %>%
arrange(x)
case x y
1 1 4 5
2 2 4 5
3 5 6 3
4 6 6 3
5 4 7 9
6 3 8 9
We can also sort by more than one column, such as arrange(x, y), or use desc to reverse the order, like arrange(desc(x)).
DATA
dt <- read.table(text = " case x y
1 4 5
2 4 5
3 8 9
4 7 9
5 6 3
6 6 3",
header = TRUE)
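Group numbers in order of first appearance can also be sketched in base R with match and unique. Unlike the rleid step, this does not need rows with the same x and y to be adjacent. The name key and the "_" separator are assumptions for illustration; the separator only has to be a string that cannot occur inside the values.

```r
dt <- data.frame(case = 1:6,
                 x = c(4, 4, 8, 7, 6, 6),
                 y = c(5, 5, 9, 9, 3, 3))

# One string per row identifying its (x, y) combination
key <- paste(dt$x, dt$y, sep = "_")

# Position of each key among the distinct keys, which unique()
# returns in order of first appearance
dt$org <- match(key, unique(key))
dt
```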

subset function in R with more than one conditions [duplicate]

I have this data.frame:
a <- c(rep("1", 3), rep("2", 3), rep("3",3), rep("4",3), rep("5",3))
b <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15)
df <-data.frame(a,b)
a b
1 1 1
2 1 2
3 1 3
4 2 4
5 2 5
6 2 6
7 3 7
8 3 8
9 3 9
10 4 10
11 4 11
12 4 12
13 5 13
14 5 14
15 5 15
I want to have something like this:
a <- c(rep("2", 3), rep("3", 3))
b <- c(4,5,6,7,8,9)
dffinal<-data.frame(a,b)
a b
1 2 4
2 2 5
3 2 6
4 3 7
5 3 8
6 3 9
I could use the "subset" function, but it's not working:
sub <- subset(df,c(2,3) == a )
a b
5 2 5
8 3 8
This command only returns one row each for "2" and "3" in column "a".
Any help?
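You're confusing == with %in%. The == operator compares element by element, recycling c(2,3) along a, whereas %in% tests whether each element of a is in the set: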
subset(df, a %in% c(2,3))
# a b
# 4 2 4
# 5 2 5
# 6 2 6
# 7 3 7
# 8 3 8
# 9 3 9
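To see why the original call kept only rows 5 and 8, it can help to look at the two logical vectors directly. A small sketch using the column a from the question; the names recycled and member are made up for illustration.

```r
a <- rep(c("1", "2", "3", "4", "5"), each = 3)

# == recycles c(2, 3) along a: odd positions are compared with 2,
# even positions with 3. R also warns that the lengths (15 and 2)
# do not divide evenly; the warning is suppressed here.
recycled <- suppressWarnings(c(2, 3) == a)
which(recycled)   # 5 8

# %in% asks, for each element of a, whether it occurs in the set
member <- a %in% c(2, 3)
which(member)     # 4 5 6 7 8 9
```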
What about this?
library(dplyr)
df %>% filter(a == 2 | a==3)
a b
1 2 4
2 2 5
3 2 6
4 3 7
5 3 8
6 3 9
We can use data.table. We convert the 'data.frame' to 'data.table' (setDT(df)), and set the 'key' as column 'a', then we subset the rows.
library(data.table)
setDT(df, key= 'a')[c('2','3')]
# a b
#1: 2 4
#2: 2 5
#3: 2 6
#4: 3 7
#5: 3 8
#6: 3 9

remove i+1th term if reoccurring

Say we have the following data
A <- c(1,2,2,2,3,4,8,6,6,1,2,3,4)
B <- c(1,2,3,4,5,1,2,3,4,5,1,2,3)
data <- data.frame(A,B)
How would one write a function so that, for A, if the same value appears again in the (i+1)th position, the repeated row is removed?
Therefore the output should look like
data.frame(c(1,2,3,4,8,6,1,2,3,4), c(1,2,5,1,2,3,5,1,2,3))
My best guess would be to use a for loop; however, I have no experience with these.
You can try
data[c(TRUE, data[-1,1]!= data[-nrow(data), 1]),]
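Since the question asks for a function, the indexing above can be wrapped in one without any for loop. This is a sketch; drop_repeats and its arguments are hypothetical names, and the zero-row check is an added safeguard.

```r
# Keep the first row of every run of equal values in column `col`
drop_repeats <- function(d, col) {
  if (nrow(d) == 0) return(d)
  v <- d[[col]]
  # Row 1 is always kept; row i is kept when it differs from row i - 1
  keep <- c(TRUE, v[-1] != v[-length(v)])
  d[keep, , drop = FALSE]
}

A <- c(1,2,2,2,3,4,8,6,6,1,2,3,4)
B <- c(1,2,3,4,5,1,2,3,4,5,1,2,3)
data <- data.frame(A, B)

res <- drop_repeats(data, "A")
res
```

The comparison v[-1] != v[-length(v)] lines up each value with its predecessor, which is what a loop over i and i+1 would do one step at a time.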
Another option, dplyr-esque:
library(dplyr)
dat1 <- data.frame(A=c(1,2,2,2,3,4,8,6,6,1,2,3,4),
B=c(1,2,3,4,5,1,2,3,4,5,1,2,3))
dat1 %>% filter(row_number() == 1 | A != lag(A))  # always keep row 1, then rows where A changes
## A B
## 1 1 1
## 2 2 2
## 3 3 5
## 4 4 1
## 5 8 2
## 6 6 3
## 7 1 5
## 8 2 1
## 9 3 2
## 10 4 3
Using diff, which calculates the pairwise differences with a lag of 1:
data[c( TRUE, diff(data[,1]) != 0), ]
output:
A B
1 1 1
2 2 2
5 3 5
6 4 1
7 8 2
8 6 3
10 1 5
11 2 1
12 3 2
13 4 3
Using rle, where the cumulative sum of the run lengths gives the starting row of each run:
A <- c(1,2,2,2,3,4,8,6,6,1,2,3,4)
B <- c(1,2,3,4,5,1,2,3,4,5,1,2,3)
data <- data.frame(A,B)
X <- rle(data$A)
Y <- cumsum(c(1, X$lengths[-length(X$lengths)]))
View(data[Y, ])
row.names A B
1 1 1 1
2 2 2 2
3 5 3 5
4 6 4 1
5 7 8 2
6 8 6 3
7 10 1 5
8 11 2 1
9 12 3 2
10 13 4 3

Resources