After running kCliques in RBGL, I have a list of cliques and their members.
I wish to construct a member-by-clique matrix from the list object created by kCliques.
As an example:
library(RBGL)
con <- file(system.file("XML/snacliqueex.gxl",package="RBGL"))
coex <- fromGXL(con)
close(con)
kcl <- kCliques(coex)
which results in
kcl<-structure(list(`1-cliques` = list(c("1", "2", "3"), c("2", "4"),
c("3", "5"), c("4", "6"), c("5", "6")), `2-cliques` = list(
c("1", "2", "3", "4", "5"), c("2", "3", "4", "5", "6")),
`3-cliques` = list(c("1", "2", "3", "4", "5", "6"))),
.Names = c("1-cliques", "2-cliques", "3-cliques"))
kcl is a list of lists: each element holds the cliques for one value of k, and each clique is a character vector of its members.
I wish to construct a member-by-clique matrix where cell i,j indicates whether node i is a member of clique j.
Here are some transformations that should work:
# remove one level of nesting: a flat list of all 8 cliques
x <- do.call("c", kcl)
# assign a number to each clique (one row per member, y = clique number)
xx <- do.call("rbind", Map(function(x, y) data.frame(x, y), x, seq_along(x)))
# cross-tabulate node membership by clique
xtabs(~x+y, xx)
which gives
y
x 1 2 3 4 5 6 7 8
1 1 0 0 0 0 1 0 1
2 1 1 0 0 0 1 1 1
3 1 0 1 0 0 1 1 1
4 0 1 0 1 0 1 1 1
5 0 0 1 0 1 1 1 1
6 0 0 0 1 1 0 1 1
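As an alternative to going through a data frame and xtabs(), the membership matrix can also be built directly with %in%. This is a minimal base R sketch that re-creates the kcl list shown above so it runs without RBGL; the names cliques, nodes and m are illustrative.

```r
# The kcl list as returned by kCliques() in the example above
kcl <- list(
  `1-cliques` = list(c("1", "2", "3"), c("2", "4"), c("3", "5"),
                     c("4", "6"), c("5", "6")),
  `2-cliques` = list(c("1", "2", "3", "4", "5"), c("2", "3", "4", "5", "6")),
  `3-cliques` = list(c("1", "2", "3", "4", "5", "6"))
)
# Flatten one level: a list of 8 cliques, each a character vector of members
cliques <- unlist(kcl, recursive = FALSE)
# All nodes that appear in any clique
nodes <- sort(unique(unlist(cliques)))
# One column per clique: 1 if the node is a member, 0 otherwise
m <- sapply(cliques, function(cl) as.integer(nodes %in% cl))
rownames(m) <- nodes
m
```

The 0/1 pattern matches the xtabs() table, but the columns keep labels derived from the list names (such as 1-cliques1) instead of plain clique numbers.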
Take this dummy data as an example:
df <- structure(list(Metastasis_Brain = c("1", "1", "0", "1", "0",
"0"), Metastasis_Liver = c("0", "0", "1", "1", "1", "0"), Metastasis_Bone = c("1",
"1", "0", "1", "1", "0")), class = "data.frame", row.names = c("Patient_1",
"Patient_2", "Patient_3", "Patient_4", "Patient_5", "Patient_6"
))
Here is what I'm looking for: if there is a 1 in both Metastasis_Brain and Metastasis_Liver, the new column should contain "Brain, Liver".
If all three tissues are 1, that row of the new column should contain "Brain, Liver, Bone".
If all are 0, it doesn't matter; NA would be fine.
Using tidyverse:
library(tidyverse)
df %>%
rownames_to_column() %>%
left_join(pivot_longer(.,-rowname, names_prefix = '.*_') %>%
filter(value>0) %>%
group_by(rowname) %>%
summarise(nm = toString(name)))
rowname Metastasis_Brain Metastasis_Liver Metastasis_Bone nm
1 Patient_1 1 0 1 Brain, Bone
2 Patient_2 1 0 1 Brain, Bone
3 Patient_3 0 1 0 Liver
4 Patient_4 1 1 1 Brain, Liver, Bone
5 Patient_5 0 1 1 Liver, Bone
6 Patient_6 0 0 0 <NA>
In base R you could do:
aggregate(ind~rn, subset(transform(stack(df),
ind = sub('.*_', '', ind), rn = rownames(df)), values>0), toString)
rn ind
1 Patient_1 Brain, Bone
2 Patient_2 Brain, Bone
3 Patient_3 Liver
4 Patient_4 Brain, Liver, Bone
5 Patient_5 Liver, Bone
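Note that subset() removes rows with no 1 at all, so Patient_6 is missing from the aggregate() result. A minimal sketch of one way to keep every patient: reuse the same stack()/aggregate() idea, then left-join the result back onto all row names with merge(all.x = TRUE). The names long, agg and out are illustrative.

```r
df <- data.frame(
  Metastasis_Brain = c("1", "1", "0", "1", "0", "0"),
  Metastasis_Liver = c("0", "0", "1", "1", "1", "0"),
  Metastasis_Bone  = c("1", "1", "0", "1", "1", "0"),
  row.names = paste0("Patient_", 1:6),
  stringsAsFactors = FALSE
)
# Long format: one row per patient/tissue pair that is flagged "1"
long <- subset(transform(stack(df), ind = sub(".*_", "", ind), rn = rownames(df)),
               values == "1")
# Collapse the tissue names per patient
agg <- aggregate(ind ~ rn, long, toString)
# Left-join onto all patients so rows without any "1" get NA
out <- merge(data.frame(rn = rownames(df), stringsAsFactors = FALSE),
             agg, by = "rn", all.x = TRUE)
out
```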
base
df <- data.frame(
  stringsAsFactors = FALSE,
  row.names = c("Patient_1","Patient_2","Patient_3","Patient_4","Patient_5","Patient_6"),
  Metastasis_Brain = c("1", "1", "0", "1", "0", "0"),
  Metastasis_Liver = c("0", "0", "1", "1", "1", "0"),
  Metastasis_Bone = c("1", "1", "0", "1", "1", "0")
)
# For each row, collect the tissue names flagged "1"; rows with no "1" get NA
df$res <- apply(df, 1, function(x) {
  hits <- gsub("Metastasis_", "", names(df)[x == "1"])
  if (length(hits) == 0) NA_character_ else toString(hits)
})
df
#> Metastasis_Brain Metastasis_Liver Metastasis_Bone res
#> Patient_1 1 0 1 Brain, Bone
#> Patient_2 1 0 1 Brain, Bone
#> Patient_3 0 1 0 Liver
#> Patient_4 1 1 1 Brain, Liver, Bone
#> Patient_5 0 1 1 Liver, Bone
#> Patient_6 0 0 0 <NA>
Created on 2022-06-20 by the reprex package (v2.0.1)
I'm trying to generate a new variable using multiple conditions evaluated against factor variables.
Say I have this data.frame of factor variables:
x<-c("1", "2", "1","NA", "1", "2", "NA", "1", "2", "2", "NA" )
y<-c("1","NA", "2", "1", "1", "NA", "2", "1", "2", "1", "1" )
z<-c("1", "2", "3", "4", "1", "2", "3", "4", "1", "2", "3")
w<- c("01", "02", "03", "04","05", "06", "07", "01", "02", "03", "04")
df<-data.frame(x,y,z,w)
df$x<-as.factor(df$x)
df$y<-as.factor(df$y)
df$z<-as.factor(df$z)
df$w<-as.factor(df$w)
str(df)
I need a new column v in my data frame that takes the value 1, 0 or NA according to the following conditions:
Takes value 1 if: x = "1", y = "1", z = "1" or "2", and w = "01" to "06".
Takes value 0 if it fails at least one of these conditions.
Takes value NA if any of x, y, z, or w is NA.
I have tried using the pipe %>% with mutate and case_when, but haven't been able to make it work.
So my desired result would be a new column v in df which would look like this:
[1] 1 NA 0 NA 1 NA NA 0 0 0 NA
Here I also use mutate with case_when. Since the NA values in your dataset are the literal character string "NA", functions like is.na() cannot identify them. I would recommend changing them to real NA values (by removing the double quotes in your input).
As I've pointed out in the comment, I'm not sure why the eighth entry is "1" when the corresponding z is not "1" or "2".
library(dplyr)
df %>% mutate(v = case_when(x == "1" & y == "1" & z %in% c("1", "2") & w %in% paste0("0", 1:6) ~ "1",
                            x == "NA" | y == "NA" | z == "NA" | w == "NA" ~ NA_character_,
                            TRUE ~ "0"))
x y z w v
1 1 1 1 01 1
2 2 NA 2 02 <NA>
3 1 2 3 03 0
4 NA 1 4 04 <NA>
5 1 1 1 05 1
6 2 NA 2 06 <NA>
7 NA 2 3 07 <NA>
8 1 1 4 01 0
9 2 2 1 02 0
10 2 1 2 03 0
11 NA 1 3 04 <NA>
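Following the recommendation to use real NA values, here is a minimal base R sketch (no dplyr needed) that returns the numeric 1/0/NA vector the question asks for. The helper to_factor() and the name meets are illustrative, not part of the original answer.

```r
x <- c("1", "2", "1", "NA", "1", "2", "NA", "1", "2", "2", "NA")
y <- c("1", "NA", "2", "1", "1", "NA", "2", "1", "2", "1", "1")
z <- c("1", "2", "3", "4", "1", "2", "3", "4", "1", "2", "3")
w <- c("01", "02", "03", "04", "05", "06", "07", "01", "02", "03", "04")

# Turn the literal string "NA" into a real missing value, then make a factor
to_factor <- function(v) factor(replace(v, v == "NA", NA))
df <- data.frame(x = to_factor(x), y = to_factor(y),
                 z = to_factor(z), w = to_factor(w))

# TRUE when every condition for v = 1 holds (may be NA where x or y is missing)
meets <- with(df, x == "1" & y == "1" & z %in% c("1", "2") &
                  w %in% sprintf("%02d", 1:6))
# Any missing value wins; otherwise 1 if all conditions hold, else 0
df$v <- ifelse(!complete.cases(df), NA, as.integer(meets))
df$v
```

Checking for missing values first with complete.cases() matters because, with real NA, x == "1" returns NA rather than FALSE.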
This question already has answers here: Select first row in each contiguous run by group (4 answers). Closed 1 year ago.
I have data with a grouping variable (ID) and some values (type):
ID <- c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3")
type <- c("1", "3", "3", "2", "3", "3", "1", "1", "1", "2", "2", "1")
dat <- data.frame(ID,type)
Within each ID, I want to delete repeated values: not every duplicate, only a value that is the same as the one immediately before it. I have annotated some examples:
# ID type
# 1 1 1
# 2 1 3 # first value in a run of 3s within ID 1: keep
# 3 1 3 # 2nd value: remove
# 4 1 2
# 5 2 3
# 6 2 3
# 7 2 1
# 8 2 1
# 9 3 1
# 10 3 2 # first value in a run of 2s within ID 3: keep
# 11 3 2 # 2nd value: remove
# 12 3 1
For example, ID 3 has the sequence of values 1, 2, 2, 1. The third value is the same as the second, so it should be deleted, giving 1, 2, 1.
Thus, the desired output is:
data.frame(ID = c("1", "1", "1", "2", "2", "3", "3", "3"),
type = c("1", "3", "2", "3", "1", "1", "2", "1"))
ID type
1 1 1
2 1 3
3 1 2
4 2 3
5 2 1
6 3 1
7 3 2
8 3 1
I've tried
dat[!duplicated(dat), ]
However, what I got was
ID <- c("1", "1", "1", "2", "2", "3", "3")
type<- c("1", "3", "2", "3", "1", "1", "2")
I know duplicated() keeps only the first occurrence of each row. How can I get the result I want?
Thanks for the help in advance!
Does this work?
library(dplyr)
dat %>% group_by(ID) %>%
mutate(flag = case_when(type == lag(type) ~ TRUE, TRUE ~ FALSE)) %>%
filter(!flag) %>% select(-flag)
# A tibble: 8 x 2
# Groups: ID [3]
ID type
<chr> <chr>
1 1 1
2 1 3
3 1 2
4 2 3
5 2 1
6 3 1
7 3 2
8 3 1
Using data.table's rleid and duplicated:
library(data.table)
setDT(dat)[!duplicated(rleid(ID, type))]
# ID type
#1: 1 1
#2: 1 3
#3: 1 2
#4: 2 3
#5: 2 1
#6: 3 1
#7: 3 2
#8: 3 1
Improved answer, including a suggestion from @Henrik.
A base R way, if you want to eliminate only consecutive duplicate rows (8 rows of output):
ID <- c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3")
type<- c("1", "3", "3", "2", "3", "3", "1", "1", "1", "2", "2", "1")
dat <- data.frame(ID,type)
subset(dat, !duplicated(with(rle(paste(dat$ID, dat$type)), rep(seq_len(length(lengths)), lengths))))
#> ID type
#> 1 1 1
#> 2 1 3
#> 4 1 2
#> 5 2 3
#> 7 2 1
#> 9 3 1
#> 10 3 2
#> 12 3 1
Created on 2021-05-22 by the reprex package (v2.0.0)
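A shorter base R sketch of the same idea compares each row with the row directly above it instead of going through rle(). The names n, keep and res are illustrative.

```r
ID <- c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3", "3", "3")
type <- c("1", "3", "3", "2", "3", "3", "1", "1", "1", "2", "2", "1")
dat <- data.frame(ID, type, stringsAsFactors = FALSE)

n <- nrow(dat)
# A row starts a new run if it is the first row, or if its ID or type
# differs from the row directly above it
keep <- c(TRUE, dat$ID[-1] != dat$ID[-n] | dat$type[-1] != dat$type[-n])
res <- dat[keep, ]
res
```

Because ID is part of the comparison, a run never continues across a change of group, so no explicit grouping is needed.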
This question already has answers here: R semicolon delimited a column into rows (3 answers). Closed 6 years ago.
Can anyone please help with this little data.frame expansion problem?
Thanks in advance!
# I have
df1 <- data.frame(rbind(c("1", "2", "3", "a/b/c"),
c("11", "0", "5", "c/d"),
c("3", "33", "0", "a"))
)
# X1 X2 X3 X4
# 1 1 2 3 a/b/c
# 2 11 0 5 c/d
# 3 3 33 0 a
# I want
data.frame(rbind(c("1", "2", "3", "a"),
c("1", "2", "3", "b"),
c("1", "2", "3", "c"),
c("11", "0", "5", "c"),
c("11", "0", "5", "d"),
c("3", "33", "0", "a"))
)
# X1 X2 X3 X4
# 1 1 2 3 a
# 2 1 2 3 b
# 3 1 2 3 c
# 4 11 0 5 c
# 5 11 0 5 d
# 6 3 33 0 a
We can use data.table:
library(data.table)
setDT(df1)[, .(X4 = unlist(strsplit(as.character(X4), "/"))), by = .(X1, X2, X3)]
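Without data.table, the same expansion can be done in base R by repeating row indices. A minimal sketch (parts and res are illustrative names); tidyr's separate_rows() is another option if the tidyverse is available.

```r
df1 <- data.frame(rbind(c("1", "2", "3", "a/b/c"),
                        c("11", "0", "5", "c/d"),
                        c("3", "33", "0", "a")),
                  stringsAsFactors = FALSE)
# Split X4 on "/" (fixed string, not a regex)
parts <- strsplit(df1$X4, "/", fixed = TRUE)
# Repeat each original row once per piece, then attach the pieces
res <- df1[rep(seq_len(nrow(df1)), lengths(parts)), c("X1", "X2", "X3")]
res$X4 <- unlist(parts)
rownames(res) <- NULL
res
```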
I have a data frame that I need to reshape, transforming repeated values in a single column into a single row with several data columns. I know this should be simple, but I can't figure out how to do it or which of the many reshape/cast functions I should use.
Part of my data looks like this:
Source ID info
1 In 842701 1
2 Out 842701 1
3 In 21846591 2
4 Out 21846591 2
5 In 22181760 3
6 In 39338740 4
7 Out 9428 5
I want to make it look like this:
ID In Out info
1 842701 1 1 1
2 21846591 1 1 2
3 22181760 1 0 3
4 39338740 1 0 4
5 9428 0 1 5
and so on, while preserving all the remaining columns (which are identical for a given entry).
I would really appreciate some help. TIA.
Here is a way using reshape2:
library(reshape2)
res <- dcast(transform(df, indx=1, ID=factor(ID, levels=unique(ID))),
ID~Source, value.var="indx", fill=0)
res
# ID In Out
#1 842701 1 1
#2 21846591 1 1
#3 22181760 1 0
#4 39338740 1 0
#5 9428 0 1
Or
res1 <- as.data.frame.matrix(table(transform(df,
ID=factor(ID, levels=unique(ID)))[,2:1]))
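Update: to keep the remaining columns (here info), put ... on the left-hand side of the formula; it stands for all variables not otherwise used in the formula or as value.var: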
dcast(transform(df1, indx=1, ID=factor(ID, levels=unique(ID))),
...~Source, value.var="indx", fill=0)
# ID info In Out
#1 842701 1 1 1
#2 21846591 2 1 1
#3 22181760 3 1 0
#4 39338740 4 1 0
#5 9428 5 0 1
You could also use reshape from base R:
res2 <- reshape(transform(df1, indx=1), idvar=c("ID", "info"),
timevar="Source", direction="wide")
res2[,3:4][is.na(res2)[,3:4]] <- 0
res2
# ID info indx.In indx.Out
#1 842701 1 1 1
#3 21846591 2 1 1
#5 22181760 3 1 0
#6 39338740 4 1 0
#7 9428 5 0 1
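Since the question asks to keep all remaining columns, here is a minimal base R sketch that needs neither reshape2 nor reshape(): it keeps the first row per ID (dropping Source) and adds one 0/1 column per Source value. It assumes, as the question states, that the other columns are identical for a given ID; res3 is an illustrative name.

```r
df1 <- data.frame(
  Source = c("In", "Out", "In", "Out", "In", "In", "Out"),
  ID = c(842701L, 842701L, 21846591L, 21846591L, 22181760L, 39338740L, 9428L),
  info = c(1L, 1L, 2L, 2L, 3L, 4L, 5L),
  stringsAsFactors = FALSE
)
# One row per ID, keeping every column except Source
# (assumes the other columns are identical for a given ID)
res3 <- df1[!duplicated(df1$ID), setdiff(names(df1), "Source"), drop = FALSE]
# Add a 0/1 indicator column for each Source value, in order of appearance
for (s in unique(df1$Source)) {
  res3[[s]] <- as.integer(res3$ID %in% df1$ID[df1$Source == s])
}
rownames(res3) <- NULL
res3
```

Unlike the dcast() call, this keeps the IDs in their original order without converting ID to a factor.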
data
df <- structure(list(Source = c("In", "Out", "In", "Out", "In", "In",
"Out"), ID = c(842701L, 842701L, 21846591L, 21846591L, 22181760L,
39338740L, 9428L)), .Names = c("Source", "ID"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7"))
df1 <- structure(list(Source = c("In", "Out", "In", "Out", "In", "In",
"Out"), ID = c(842701L, 842701L, 21846591L, 21846591L, 22181760L,
39338740L, 9428L), info = c(1L, 1L, 2L, 2L, 3L, 4L, 5L)), .Names = c("Source",
"ID", "info"), class = "data.frame", row.names = c("1", "2",
"3", "4", "5", "6", "7"))