group cases by shared values in r [duplicate]

This question already has answers here:
R: define distinct pattern from values of multiple variables [duplicate]
(3 answers)
Closed 5 years ago.
I have a dataset like this:
case x y
1 4 5
2 4 5
3 8 9
4 7 9
5 6 3
6 6 3
I would like to create a grouping variable.
This variable should have the same values when both x and y are the same.
I do not care what this value is but it is to group them. Because in my dataset if x and y are the same for two cases they are probably part of the same organization. I want to see which organizations there are.
So my preferred dataset would look like this:
case x y org
1 4 5 1
2 4 5 1
3 8 9 2
4 7 9 3
5 6 3 4
6 6 3 4
How would I have to program this in R?

As you said, you do not care what this value is, so you can simply do the following:
dt$new=as.numeric(as.factor(paste(dt$x,dt$y)))
dt
case x y new
1 1 4 5 1
2 2 4 5 1
3 3 8 9 4
4 4 7 9 3
5 5 6 3 2
6 6 6 3 2
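If the ids should follow the order in which each (x, y) pair first appears, as in the desired output of the question, one possible base R sketch is to match each row's key against the unique keys. The column name org is taken from the question; the "_" separator is an arbitrary choice for illustration.

```r
# Data from the question
dt <- data.frame(case = 1:6,
                 x = c(4, 4, 8, 7, 6, 6),
                 y = c(5, 5, 9, 9, 3, 3))

# Build one key per row from x and y
key <- paste(dt$x, dt$y, sep = "_")

# match() gives the position of each key in unique(key),
# so ids are numbered in order of first appearance
dt$org <- match(key, unique(key))
dt$org
```

Unlike the factor approach, this does not sort the keys alphabetically, so the numbering does not depend on how the pasted strings compare.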

A dplyr solution using group_indices:
library(dplyr)
dt2 <- dt %>%
mutate(org = group_indices(., x, y))
dt2
case x y org
1 1 4 5 1
2 2 4 5 1
3 3 8 9 4
4 4 7 9 3
5 5 6 3 2
6 6 6 3 2
If the group numbers need to follow the order of appearance, we can apply rleid from the data.table package after creating the org column, as follows.
library(dplyr)
library(data.table)
dt2 <- dt %>%
mutate(org = group_indices(., x, y)) %>%
mutate(org = rleid(org))
dt2
case x y org
1 1 4 5 1
2 2 4 5 1
3 3 8 9 2
4 4 7 9 3
5 5 6 3 4
6 6 6 3 4
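One thing to keep in mind: rleid numbers consecutive runs of equal values, so the result above relies on rows of the same group being adjacent. A small base R sketch of the difference, using made-up ids in which group 1 reappears at the end:

```r
org <- c(1, 1, 4, 3, 1)   # hypothetical ids; group 1 shows up again in the last row

# Run-length numbering (the idea behind rleid): every new run gets a new id
r <- rle(org)
run_id <- rep(seq_along(r$lengths), r$lengths)
run_id

# Numbering by first appearance keeps non-adjacent rows in the same group
appearance_id <- match(org, unique(org))
appearance_id
```

If the data are not sorted so that equal (x, y) pairs sit next to each other, the second form is probably the safer choice.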
Update
Here is how to sort the rows by a column with arrange in dplyr.
library(dplyr)
dt %>%
arrange(x)
case x y
1 1 4 5
2 2 4 5
3 5 6 3
4 6 6 3
5 4 7 9
6 3 8 9
We can also do this for more than one column, such as arrange(x, y), or use desc to reverse the order, like arrange(desc(x)).
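For comparison, the same sorting can be sketched in base R with order(). This is only an equivalent illustration, not part of the dplyr answer; the object names sorted_asc and sorted_desc are made up here.

```r
# Data from the question
dt <- data.frame(case = 1:6,
                 x = c(4, 4, 8, 7, 6, 6),
                 y = c(5, 5, 9, 9, 3, 3))

# Ascending by x, like arrange(x)
sorted_asc <- dt[order(dt$x), ]

# Descending by x, like arrange(desc(x)); negating works for numeric columns
sorted_desc <- dt[order(-dt$x), ]

sorted_asc$case
sorted_desc$case
```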
DATA
dt <- read.table(text = " case x y
1 4 5
2 4 5
3 8 9
4 7 9
5 6 3
6 6 3",
header = TRUE)

Related

adding rank of size column in r [duplicate]

This question already has answers here:
How to create a consecutive group number
(13 answers)
Closed 1 year ago.
I have this set of values in the column Num. I want to create another column that ranks them by size, similar to rankt below, but I don't like the ranks that rank() produces.
x <- data.frame("Num" = c(2,5,2,7,7,7,2,5,5))
x$rankt <- rank(x$Num)
Num rankt
1 2 2
2 5 5
3 2 2
4 7 8
5 7 8
6 7 8
7 2 2
8 5 5
9 5 5
Desired outcome for rankt:
Num rankt
1 2 1
2 5 2
3 2 1
4 7 3
5 7 3
6 7 3
7 2 1
8 5 2
9 5 2
Well, a crude approach is to turn them to factors, which are just increasing numbers with labels, and then fetch those numbers:
x <- data.frame("Num" = c(2,5,2,7,7,7,2,5,5))
x$rankt <- as.numeric(as.factor( rank(x$Num) ))
x
It produces:
Num rankt
1 2 1
2 5 2
3 2 1
4 7 3
5 7 3
6 7 3
7 2 1
8 5 2
9 5 2
A solution with dplyr
library(dplyr)
x1 <- x %>%
mutate(rankt = dense_rank(Num))  # desc(-Num) is the same as Num, so rank Num directly
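A base R sketch of the same dense ranking, without any packages: match each value against the sorted unique values, so ties share a rank and the ranks have no gaps.

```r
x <- data.frame(Num = c(2, 5, 2, 7, 7, 7, 2, 5, 5))

# Position of each value among the sorted distinct values = dense rank
x$rankt <- match(x$Num, sort(unique(x$Num)))
x$rankt
```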

create a column in R with data from two other columns [duplicate]

This question already has answers here:
Replace a value NA with the value from another column in R
(5 answers)
Closed 3 years ago.
I don't have the slightest idea of programming, but I need to solve the following problem in R.
Let's suppose I have this data:
x y
5 8
6 5
2
9 8
4
0
6 6
7 3
3 2
I need to create a third column called "z" containing the data of "y", except for the missing values, where it should have the values of "x". It would be something like this:
x y z
5 8 8
6 5 5
2 2
9 8 8
4 4
0 0
6 6 6
7 3 3
3 2 2
dat <- data.frame(x=c(5,6,2,9,4,0,6,7,3), y = c(8,5,NA,8,NA,NA,6,3,2))
library(tidyverse)
dat %>% mutate(z = ifelse(is.na(y), x, y))
# x y z
# 1 5 8 8
# 2 6 5 5
# 3 2 NA 2
# 4 9 8 8
# 5 4 NA 4
# 6 0 NA 0
# 7 6 6 6
# 8 7 3 3
# 9 3 2 2
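If loading the tidyverse is not an option, the same replacement can be sketched in base R by copying y and then overwriting only the missing positions. The helper name missing_y is made up for this example.

```r
dat <- data.frame(x = c(5, 6, 2, 9, 4, 0, 6, 7, 3),
                  y = c(8, 5, NA, 8, NA, NA, 6, 3, 2))

# Start from y, then fill the NA positions with the matching x values
dat$z <- dat$y
missing_y <- is.na(dat$y)
dat$z[missing_y] <- dat$x[missing_y]
dat$z
```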

Reshaping different variables for selecting values from one column in R

Below is a sample of my data; I have more Rs and Os.
A R1 O1 R2 O2 R3 O3
1 3 3 5 3 6 4
2 3 3 5 4 7 4
3 4 4 5 5 6 5
I want to get the following data
A R O Value
1 3 1 3
1 5 2 3
1 6 3 4
2 3 1 3
2 5 2 4
2 7 3 4
3 4 1 4
3 5 2 5
3 6 3 5
I tried the melt function, but I was unsuccessful. Any help would be very much appreciated.
A solution using dplyr and tidyr. The key is to use gather to collect all the columns other than A, then use extract to split the column names into a letter part and a number part, and then use spread to convert the data frame back to wide format.
library(dplyr)
library(tidyr)
dt2 <- dt %>%
gather(Column, Number, -A) %>%
extract(Column, into = c("Column", "ID"), regex = "([A-Z]+)([0-9]+)") %>%
spread(Column, Number) %>%
select(A, R, O = ID, Value = O)
dt2
# A R O Value
# 1 1 3 1 3
# 2 1 5 2 3
# 3 1 6 3 4
# 4 2 3 1 3
# 5 2 5 2 4
# 6 2 7 3 4
# 7 3 4 1 4
# 8 3 5 2 5
# 9 3 6 3 5
DATA
dt <- read.table(text = "A R1 O1 R2 O2 R3 O3
1 3 3 5 3 6 4
2 3 3 5 4 7 4
3 4 4 5 5 6 5",
header = TRUE)
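As a package-free alternative, here is a sketch with base R's reshape(). It assumes the R and O columns are listed by hand in matching order; with more Rs and Os the two vectors in varying would need to be extended accordingly. The name long is made up for this example.

```r
dt <- read.table(text = "A R1 O1 R2 O2 R3 O3
1 3 3 5 3 6 4
2 3 3 5 4 7 4
3 4 4 5 5 6 5", header = TRUE)

long <- reshape(dt,
                direction = "long",
                # first vector becomes R, second becomes Value
                varying = list(c("R1", "R2", "R3"), c("O1", "O2", "O3")),
                v.names = c("R", "Value"),
                timevar = "O",   # index 1, 2, 3 of the R/O pair
                idvar = "A")

# Sort by A and the pair index, and put the columns in the requested order
long <- long[order(long$A, long$O), c("A", "R", "O", "Value")]
rownames(long) <- NULL
long
```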

Assign value to group based on condition in column

I have a data frame that looks like the following:
> df = data.frame(group = c(1,1,1,2,2,2,3,3,3),
date = c(1,2,3,4,5,6,7,8,9),
value = c(3,4,3,4,5,6,6,4,9))
> df
group date value
1 1 1 3
2 1 2 4
3 1 3 3
4 2 4 4
5 2 5 5
6 2 6 6
7 3 7 6
8 3 8 4
9 3 9 9
I want to create a new column that contains the date value per group that is associated with the value "4" from the value column.
The following data frame shows what I hope to accomplish.
group date value newValue
1 1 1 3 2
2 1 2 4 2
3 1 3 3 2
4 2 4 4 4
5 2 5 5 4
6 2 6 6 4
7 3 7 6 8
8 3 8 4 8
9 3 9 9 8
As we can see, group 1 has the newValue "2" because that is the date associated with the value "4". Similarly, group two has newValue 4 and group three has newValue 8.
I assume there is an easy way to do this using ave() or a range of dplyr/data.table functions, but I have been unsuccessful with my many attempts.
Here's a quick data.table one
library(data.table)
setDT(df)[, newValue := date[value == 4L], by = group]
df
# group date value newValue
# 1: 1 1 3 2
# 2: 1 2 4 2
# 3: 1 3 3 2
# 4: 2 4 4 4
# 5: 2 5 5 4
# 6: 2 6 6 4
# 7: 3 7 6 8
# 8: 3 8 4 8
# 9: 3 9 9 8
Here's a similar dplyr version
library(dplyr)
df %>%
group_by(group) %>%
mutate(newValue = date[value == 4L])
Or a possible base R solution using merge after filtering the data (will need some renaming afterwards)
merge(df, df[df$value == 4, c("group", "date")], by = "group")
Here is a base R option
df$newValue = rep(df$date[which(df$value == 4)], table(df$group))
Another alternative using lapply
do.call(rbind, lapply(split(df, df$group),
function(x){x$newValue = rep(x$date[which(x$value == 4)],
each = length(x$group)); x}))
# group date value newValue
#1.1 1 1 3 2
#1.2 1 2 4 2
#1.3 1 3 3 2
#2.4 2 4 4 4
#2.5 2 5 5 4
#2.6 2 6 6 4
#3.7 3 7 6 8
#3.8 3 8 4 8
#3.9 3 9 9 8
One more base R path:
df$newValue <- ave(`names<-`(df$value==4,df$date), df$group, FUN=function(x) as.numeric(names(x)[x]))
df
group date value newValue
1 1 1 3 2
2 1 2 4 2
3 1 3 3 2
4 2 4 4 4
5 2 5 5 4
6 2 6 6 4
7 3 7 6 8
8 3 8 4 8
9 3 9 9 8
10 3 11 7 8
I tested this on groups of different lengths (see the data below). I assigned the date column as the names of the logical vector value == 4, and then, within each group, picked the name (the date) where that vector is TRUE.
Data
df = data.frame(group = c(1,1,1,2,2,2,3,3,3,3),
date = c(1,2,3,4,5,6,7,8,9,11),
value = c(3,4,3,4,5,6,6,4,9,7))
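The answers above all assume that every group contains exactly one row with value 4. When a group has no such row, or several, some of them may fail or give unexpected results. A hedged base R sketch that takes the first matching date per group and returns NA when there is none; the data here are made up to show both cases, and date_if_4 is a hypothetical helper name.

```r
# Made-up data: group 2 has two 4s (dates 4 and 6), group 3 has none
df <- data.frame(group = c(1, 1, 1, 2, 2, 2, 3, 3),
                 date  = c(1, 2, 3, 4, 5, 6, 7, 8),
                 value = c(3, 4, 3, 4, 5, 4, 6, 9))

# Keep the date only where value is 4, NA elsewhere
date_if_4 <- ifelse(df$value == 4, df$date, NA)

# Per group: first non-NA date, or NA if the group has no 4
df$newValue <- ave(date_if_4, df$group,
                   FUN = function(d) d[!is.na(d)][1])
df$newValue
```

Taking the first match is a design choice for this sketch; depending on the data, the last or the smallest date might be more appropriate.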

How to replace the NA values after merging two data.frames? [duplicate]

This question already has answers here:
Replacing NAs with latest non-NA value
(21 answers)
Closed 7 years ago.
I have two data frames as follows:
> a <- data.frame(x=c(1,2,3,4,5,6,7,8), y=c(1,3,5,7,9,11,13,15))
> a
x y
1 1 1
2 2 3
3 3 5
4 4 7
5 5 9
6 6 11
7 7 13
8 8 15
> b <- data.frame(x=c(1,5,7), z=c(2, 4, 6))
> b
x z
1 1 2
2 5 4
3 7 6
Then I use "join" for two data.frames:
> c <- join(a, b, by="x", type="left")
> c
x y z
1 1 1 2
2 2 3 NA
3 3 5 NA
4 4 7 NA
5 5 9 4
6 6 11 NA
7 7 13 6
8 8 15 NA
My requirement is to replace the NAs in the z column with the last non-NA value before the current position. I want the result to look like this:
> c
x y z
1 1 1 2
2 2 3 2
3 3 5 2
4 4 7 2
5 5 9 4
6 6 11 4
7 7 13 6
8 8 15 6
This time (if your data is not too large) a loop is an elegant option. Note that it assumes the first value of z is not NA, since there is nothing earlier to copy from:
for(i in which(is.na(c$z))){
c$z[i] = c$z[i-1]
}
gives:
> c
x y z
1 1 1 2
2 2 3 2
3 3 5 2
4 4 7 2
5 5 9 4
6 6 11 4
7 7 13 6
8 8 15 6
data:
library(plyr)
a <- data.frame(x=c(1,2,3,4,5,6,7,8), y=c(1,3,5,7,9,11,13,15))
b <- data.frame(x=c(1,5,7), z=c(2, 4, 6))
c <- join(a, b, by="x", type="left")
You might also want to check na.locf in the zoo package.
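For larger data, the loop can be replaced by a vectorised base R sketch: record, for every position, the index of the most recent non-NA value and then index into z with it. This uses merge() instead of plyr's join to stay package-free, and the names merged and last_seen are made up for this example.

```r
a <- data.frame(x = c(1, 2, 3, 4, 5, 6, 7, 8),
                y = c(1, 3, 5, 7, 9, 11, 13, 15))
b <- data.frame(x = c(1, 5, 7), z = c(2, 4, 6))

# Left join in base R; rows come back sorted by x
merged <- merge(a, b, by = "x", all.x = TRUE)

z <- merged$z
# Index of the latest non-NA value at or before each position (0 if none yet)
last_seen <- cummax(seq_along(z) * !is.na(z))

# Prepend an NA so that index 0 maps to NA instead of being dropped
merged$z <- c(NA, z)[last_seen + 1]
merged$z
```

Unlike the loop, this leaves leading NAs as NA rather than failing when the first value is missing.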
