R merging a dataframe with a vector

I have a dataframe df that looks like this:
indx adj_coords
1 1 2, 3, 4, 5, 6, 7
2 2 1, 3, 7, 8, 9, 10
3 3 1, 2, 4, 10, 11, 12
4 4 1, 3, 5, 12, 13, 14
5 5 1, 4, 6, 14, 15, 16
6 6 1, 5, 7, 16, 17, 18
I also have a vector vec that looks like this:
vec<-c(1,4,5,3,1)
I would like to get a data frame with 5 rows, where each row has the adj_coords of the indx given in vec. It should look something like:
vec adj_coords
1 2, 3, 4, 5, 6, 7
4 1, 3, 5, 12, 13, 14
5 1, 4, 6, 14, 15, 16
3 1, 2, 4, 10, 11, 12
1 2, 3, 4, 5, 6, 7
After that I would like to sample one value from each row's adj_coords so that I have something like:
vec adj_coords sampled_adj_coords
1 2, 3, 4, 5, 6, 7 3
4 1, 3, 5, 12, 13, 14 5
5 1, 4, 6, 14, 15, 16 14
3 1, 2, 4, 10, 11, 12 11
1 2, 3, 4, 5, 6, 7 6

I tried something for you; see whether it is similar to what you are looking for:
vec <- c(1,4,5,3,1)
vec <- data.frame("vec"=vec, indx=vec)
df <- structure(list(indx = 1:6, adj_coords = list(2:7, c(1L, 3L, 7L, 8L, 9L, 10L), c(1L, 2L, 4L, 10L, 11L, 12L), c(1L, 3L, 5L, 12L, 13L, 14L), c(1L, 4L, 6L, 14L, 15L, 16L), c(1L, 5L, 7L, 16L, 17L, 18L))), row.names = c(NA, 6L), class = "data.frame")
library(dplyr)
inner_join(vec, df, by = 'indx')
Result:
vec indx adj_coords
1 1 1 2, 3, 4, 5, 6, 7
2 4 4 1, 3, 5, 12, 13, 14
3 5 5 1, 4, 6, 14, 15, 16
4 3 3 1, 2, 4, 10, 11, 12
5 1 1 2, 3, 4, 5, 6, 7
Then just drop whichever of the vec and indx columns you do not need.

Another option:
df <- df[vec,]
Output:
indx adj_coords
1 1 2, 3, 4, 5, 6, 7
4 4 1, 3, 5, 12, 13, 14
5 5 1, 4, 6, 14, 15, 16
3 3 1, 2, 4, 10, 11, 12
1.1 1 2, 3, 4, 5, 6, 7
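df[vec, ] works here only because indx happens to equal the row position. A minimal base R sketch that looks rows up by the value of indx instead, using match(), so it keeps working if indx is not 1, 2, 3, ...; the names res and vec are just for illustration. If a value of vec is missing from indx, match() returns NA and that row's adj_coords entry becomes NULL.

```r
# The question's data: indx plus a list column of neighbours
df <- data.frame(indx = 1:6)
df$adj_coords <- list(2:7, c(1L, 3L, 7L, 8L, 9L, 10L), c(1L, 2L, 4L, 10L, 11L, 12L),
                      c(1L, 3L, 5L, 12L, 13L, 14L), c(1L, 4L, 6L, 14L, 15L, 16L),
                      c(1L, 5L, 7L, 16L, 17L, 18L))
vec <- c(1, 4, 5, 3, 1)

# match() returns, for each element of vec, the row of df whose indx equals it,
# preserving vec's order and its repeats
res <- data.frame(vec = vec)
res$adj_coords <- df$adj_coords[match(vec, df$indx)]
res
```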
For the random sample you can use this:
df$sampled_adj_coords <- apply(df[-1], 1, function(x) {sample(unlist(x), 1)})
Output:
indx adj_coords sampled_adj_coords
1 1 2, 3, 4, 5, 6, 7 2
4 4 1, 3, 5, 12, 13, 14 12
5 5 1, 4, 6, 14, 15, 16 4
3 3 1, 2, 4, 10, 11, 12 2
1.1 1 2, 3, 4, 5, 6, 7 3
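One caveat with sample(unlist(x), 1): if a row's adj_coords held a single positive number n, sample() would draw from 1:n instead of returning n (this behaviour is documented in ?sample). A sketch that sidesteps this by sampling a position with sample.int() and collecting the draws with vapply(); it assumes adj_coords is an integer list column as in the dput above, and set.seed() is there only to make the draw reproducible.

```r
set.seed(1)  # only for a reproducible draw

res <- data.frame(vec = c(1, 4, 5, 3, 1))
res$adj_coords <- list(2:7, c(1L, 3L, 5L, 12L, 13L, 14L), c(1L, 4L, 6L, 14L, 15L, 16L),
                       c(1L, 2L, 4L, 10L, 11L, 12L), 2:7)

# Pick one random position per row, then return the element at that position;
# integer(1) makes vapply() check that each draw is a single integer
res$sampled_adj_coords <- vapply(res$adj_coords,
                                 function(x) x[sample.int(length(x), 1)],
                                 integer(1))
res
```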


variable based on other variables in R

I have a data frame like this:
my_df <- data.frame(
b1 = c(2, 6, 3, 6, 4, 2, 1, 9, NA),
b2 = c(100, 4, 106, 102, 6, 6, 1, 1, 7),
b3 = c(75, 79, 8, 0, 2, 3, 9, 5, 80),
b4 = c(NA, 6, NA, 10, 12, 8, 3, 6, 2),
b5 = c(2, 12, 1, 7, 8, 5, 5, 6, NA),
b6 = c(9, 2, 4, 6, 7, 6, 6, 7, 9),
b7 = c(1, 3, 7, 7, 4, 2, 2, 9, 5),
b8 = c(NA, 8, 4, 5, 1, 4, 1, 3, 6),
b9 = c(4, 5, 7, 9, 5, 1, 1, 2, NA),
b10 = c(14, 2, 4, 2, 1, 1, 1, 1, 5))
I want to create a new column (NEW) that says BLUE or RED based on columns b2 and b3: if b2 is greater than or equal to 100 OR b3 is greater than or equal to 75, then BLUE, otherwise RED.
So that I will have something like this:
my_df <- data.frame(
b1 = c(2, 6, 3, 6, 4, 2, 1, 9, NA),
b2 = c(100, 4, 106, 102, 6, 6, 1, 1, 7),
b3 = c(75, 79, 8, 0, 2, 3, 9, 5, 80),
b4 = c(NA, 6, NA, 10, 12, 8, 3, 6, 2),
b5 = c(2, 12, 1, 7, 8, 5, 5, 6, NA),
b6 = c(9, 2, 4, 6, 7, 6, 6, 7, 9),
b7 = c(1, 3, 7, 7, 4, 2, 2, 9, 5),
b8 = c(NA, 8, 4, 5, 1, 4, 1, 3, 6),
b9 = c(4, 5, 7, 9, 5, 1, 1, 2, NA),
b10 = c(14, 2, 4, 2, 1, 1, 1, 1, 5),
NEW = c("BLUE", "BLUE", "BLUE", "BLUE", "RED", "RED", "RED", "RED", "BLUE"))
I have been able to work this out using this:
library(tidyverse)
greater_threshold <- 99.9
greater_threshold1 <- 74.9
my_df1 <- my_df %>%
mutate(NEW = case_when(b2 > greater_threshold ~ "BLUE",
b3 > greater_threshold1 ~ "BLUE",
TRUE ~ "RED"))
At the moment, as you can see, I am setting each 'greater threshold' slightly below the required value. This works, but my question is: is there a way to set the threshold to be ≥ 100 for b2 and ≥ 75 for b3?
For this example, I'd go with if_else instead of case_when, and compare with >= so the thresholds can be the exact values:
library(dplyr)
greater_threshold <- 100
greater_threshold1 <- 75
my_df <- data.frame(
b1 = c(2, 6, 3, 6, 4, 2, 1, 9, NA),
b2 = c(100, 4, 106, 102, 6, 6, 1, 1, 7),
b3 = c(75, 79, 8, 0, 2, 3, 9, 5, 80),
b4 = c(NA, 6, NA, 10, 12, 8, 3, 6, 2),
b5 = c(2, 12, 1, 7, 8, 5, 5, 6, NA),
b6 = c(9, 2, 4, 6, 7, 6, 6, 7, 9),
b7 = c(1, 3, 7, 7, 4, 2, 2, 9, 5),
b8 = c(NA, 8, 4, 5, 1, 4, 1, 3, 6),
b9 = c(4, 5, 7, 9, 5, 1, 1, 2, NA),
b10 = c(14, 2, 4, 2, 1, 1, 1, 1, 5)
)
my_df1 <- my_df %>%
mutate(
NEW = if_else(
b2 >= greater_threshold | b3 >= greater_threshold1,
"BLUE",
"RED"
)
)
my_df1
# b1 b2 b3 b4 b5 b6 b7 b8 b9 b10 NEW
# 1 2 100 75 NA 2 9 1 NA 4 14 BLUE
# 2 6 4 79 6 12 2 3 8 5 2 BLUE
# 3 3 106 8 NA 1 4 7 4 7 4 BLUE
# 4 6 102 0 10 7 6 7 5 9 2 BLUE
# 5 4 6 2 12 8 7 4 1 5 1 RED
# 6 2 6 3 8 5 6 2 4 1 1 RED
# 7 1 1 9 3 5 6 2 1 1 1 RED
# 8 9 1 5 6 6 7 9 3 2 1 RED
# 9 NA 7 80 2 NA 9 5 6 NA 5 BLUE
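If you would rather keep case_when, the comparison operator is the only thing that needs to change: >= works there exactly as it does in if_else. A minimal sketch using only the b2 and b3 columns from the question. One difference worth knowing: case_when treats an NA condition as FALSE and falls through to TRUE ~ "RED", whereas if_else returns NA for such a row.

```r
library(dplyr)

my_df <- data.frame(b2 = c(100, 4, 106, 102, 6, 6, 1, 1, 7),
                    b3 = c(75, 79, 8, 0, 2, 3, 9, 5, 80))

# Exact thresholds, compared with >=, in the same shape as the question's case_when
my_df1 <- my_df %>%
  mutate(NEW = case_when(b2 >= 100 ~ "BLUE",
                         b3 >= 75 ~ "BLUE",
                         TRUE ~ "RED"))
my_df1
```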

Is there a way to select every other row in a dataframe? [duplicate]

This question already has answers here:
Selecting multiple odd or even columns/rows for dataframe
(5 answers)
Closed 2 years ago.
I have a dataframe where I want to keep only the odd rows. How would I go about doing that?
We can use a logical index that R recycles along the rows to return every other row of the original dataset:
df1[c(TRUE, FALSE),]
Another option is to test the row index with the modulo operator (I have used dummy data). %% returns the remainder of a division, so to keep the odd rows you can extract the rows whose index modulo 2 is not zero (1:nrow(d) %% 2 != 0), where 1:nrow(d) is a sequential index over the rows. Here is the code:
#Code
dnew <- d[1:nrow(d)%%2!=0,]
Output:
col1 col2 col3 col4 col5
1 4 4 1 4 Y
3 6 3 3 2 N
5 3 3 3 3 N
7 5 5 5 2 Y
9 6 6 6 6 N
11 2 2 2 2 N
13 0 0 0 0 Y
15 6 6 6 3 N
17 9 1 9 8 N
Some data used:
#Data
d <- structure(list(col1 = c(4, 5, 6, 4, 3, 4, 5, 5, 6, 9, 2, 1, 0,
3, 6, 7, 9), col2 = c(4, 2, 3, 4, 3, 3, 5, 6, 6, 9, 2, 1, 0,
3, 6, 7, 1), col3 = c(1, 2, 3, 4, 3, 4, 5, 5, 6, 9, 2, 1, 0,
3, 6, 7, 9), col4 = c(4, 5, 2, 4, 3, 4, 2, 5, 6, 5, 2, 3, 0,
3, 3, 7, 8), col5 = structure(c(2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L,
1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L), .Label = c("N", "Y"), class = "factor")), class = "data.frame", row.names = c(NA,
-17L))
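Both answers rely on the same idea: build a logical vector that is TRUE for the rows to keep. A small self-contained sketch using seq_len(nrow(d)), which, unlike 1:nrow(d), also behaves correctly on a data frame with zero rows (1:0 is c(1, 0), not an empty vector). The data frame here is made up for illustration, and drop = FALSE keeps the result a data frame even when d has a single column.

```r
# Made-up example data
d <- data.frame(col1 = c(4, 5, 6, 4, 3), col5 = c("Y", "N", "N", "Y", "N"))

# TRUE at positions 1, 3, 5, ...
is_odd <- seq_len(nrow(d)) %% 2 == 1

odd_rows  <- d[is_odd, , drop = FALSE]
even_rows <- d[!is_odd, , drop = FALSE]
odd_rows
even_rows
```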

Reducing dataframe if first 4 columns match

I have a dataframe that looks like this:
> head(printing_id_map_unique_frames)
# A tibble: 6 x 5
# Groups: frame_number [6]
X1 X2 X3 row_in_frame frame_number
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2 3 15 1
2 1 2 3 15 2
3 1 2 3 15 3
4 1 2 3 15 4
5 1 2 3 15 5
6 1 2 3 15 6
As you can see, X1, X2, X3 and row_in_frame are identical in these rows.
However, eventually you get to a point where they change:
X1 X2 X3 row_in_frame frame_number
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2 3 15 32
2 1 2 3 15 33
3 1 2 3 5 34
4 1 4 5 15 35
5 1 4 5 15 36
What I would like to do is essentially compute a dataframe that looks like:
X1 X2 X3 row_in_frame num_duplicates
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2 3 15 33
2 1 2 3 5 1
...
Essentially, what I want is to "collapse" over identical first 4 columns and count how many rows of that type there are in the "num_duplicates" column.
Is there a nice way to do this in dplyr, without a messy for loop that keeps a count and checks whether the values have changed?
Below please find a full data structure via dput:
> dput(printing_id_map_unique_frames)
structure(list(X1 = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), X2 = c(2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4
), X3 = c(3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5), row_in_frame = c(15, 15, 15, 15,
15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 5, 15, 15,
15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15,
15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 15, 5
), frame_number = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29,
30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45,
46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61,
62, 63, 64, 65, 66, 67, 68)), row.names = c(NA, -68L), class = c("tbl_df",
"tbl", "data.frame"))
Here is one option with count (df1 here is printing_id_map_unique_frames):
library(dplyr) # 1.0.0
df1 %>%
count(!!! rlang::syms(names(.)[1:4]))
Or specify the unquoted column names
df1 %>%
count(X1, X2, X3, row_in_frame)
count() returns the groups sorted by their values. If we don't want to change the order, an option is to convert the first 4 columns to factors whose levels are the unique values (i.e. in order of first occurrence) and then apply count:
df1 %>%
mutate(across(1:4, ~ factor(.x, levels = unique(.x)))) %>%
count(!!! rlang::syms(names(.)[1:4])) %>%
type.convert(as.is = TRUE)
# A tibble: 4 x 5
# X1 X2 X3 row_in_frame n
# <int> <int> <int> <int> <int>
#1 1 2 3 15 33
#2 1 2 3 5 1
#3 1 4 5 15 33
#4 1 4 5 5 1
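Note that count() aggregates every row with the same combination, wherever it appears in the data. If the count should instead restart each time the combination changes (consecutive runs, which is what the "keep a count and check for a change" idea in the question describes), one option is base R's rle() on a key built from the first four columns. This is a different technique from the answer above, sketched on a small made-up df1; the "|" separator is an arbitrary choice.

```r
# Made-up example: the (1, 2, 3, 15) combination appears in two separate runs
df1 <- data.frame(X1 = c(1, 1, 1, 1, 1, 1),
                  X2 = c(2, 2, 2, 4, 4, 2),
                  X3 = c(3, 3, 3, 5, 5, 3),
                  row_in_frame = c(15, 15, 5, 15, 15, 15),
                  frame_number = 1:6)

# One string key per row built from the first four columns
key <- do.call(paste, c(df1[1:4], sep = "|"))

# rle() returns the length of each run of identical consecutive keys
runs <- rle(key)

# The last row of each run carries that run's values
ends <- cumsum(runs$lengths)
out <- df1[ends, 1:4]
out$num_duplicates <- runs$lengths
rownames(out) <- NULL
out
```

With count(), the first two rows and the last row here would be merged into one group with n = 3; the run-based version reports them separately (2 and 1).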

R Tidyverse expand a dataframe with all combinations of two variables (edgelist)

This code generates a data frame like so:
library(tidyverse)
A = c(7, 4, 3, 12, 6)
B = c(1, 10, 9, 8, 5)
C = c(5, 3, 1, 7, 6)
df <- tibble(A, B, C) %>% gather(letter1, rank)
nested <- df %>% group_by(letter1) %>% nest(ranks = c(rank))
nested
A grouped_df: 3 × 2
letter1 ranks
<chr> <list>
A 7, 4, 3, 12, 6
B 1, 10, 9, 8, 5
C 5, 3, 1, 7, 6
This is the desired data frame:
A tibble: 9 × 4
letter1 letter2 data1 data2
<chr> <chr> <list> <list>
A A 7, 4, 3, 12, 6 7, 4, 3, 12, 6
B A 1, 10, 9, 8, 5 7, 4, 3, 12, 6
C A 5, 3, 1, 7, 6 7, 4, 3, 12, 6
A B 7, 4, 3, 12, 6 1, 10, 9, 8, 5
B B 1, 10, 9, 8, 5 1, 10, 9, 8, 5
C B 5, 3, 1, 7, 6 1, 10, 9, 8, 5
A C 7, 4, 3, 12, 6 5, 3, 1, 7, 6
B C 1, 10, 9, 8, 5 5, 3, 1, 7, 6
C C 5, 3, 1, 7, 6 5, 3, 1, 7, 6
Once this step is solved, I'll run a mutate using data1 and data2 to get value, and then selecting letter1, letter2 and value will give an edgelist. I'm working with about 700 letters and the ranks lists will all be the same size and contain about 20 elements.
I'd expected to be able to use expand or expand.grid, but to no avail. Any tidyverse suggestions will be greatly appreciated.
crossing can be used:
library(tidyr)
library(purrr)
library(dplyr)
crossing(ind1 = seq_len(nrow(nested)),
ind2 = seq_len(nrow(nested))) %>%
pmap_dfr(~ bind_cols(nested[..1,], nested[..2,]) )
We can use crossing after renaming the second dataframe.
tidyr::crossing(nested, setNames(nested, c('letter2', 'rank2')))
# letter1 ranks letter2 rank2
#1 A 7, 4, 3, 12, 6 A 7, 4, 3, 12, 6
#2 A 7, 4, 3, 12, 6 B 1, 10, 9, 8, 5
#3 A 7, 4, 3, 12, 6 C 5, 3, 1, 7, 6
#4 B 1, 10, 9, 8, 5 A 7, 4, 3, 12, 6
#5 B 1, 10, 9, 8, 5 B 1, 10, 9, 8, 5
#6 B 1, 10, 9, 8, 5 C 5, 3, 1, 7, 6
#7 C 5, 3, 1, 7, 6 A 7, 4, 3, 12, 6
#8 C 5, 3, 1, 7, 6 B 1, 10, 9, 8, 5
#9 C 5, 3, 1, 7, 6 C 5, 3, 1, 7, 6
The same also works with expand_grid:
tidyr::expand_grid(nested, setNames(nested, c('letter2', 'rank2')))
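Since the question mentions expand.grid: it can build the pairs as well if it is applied to row indices rather than to the list column itself. A base R sketch of that approach; the value column uses Spearman correlation purely as a hypothetical stand-in for whatever the later mutate computes. expand.grid varies its first argument fastest, so the rows come out in the same letter1/letter2 order as the desired table. With about 700 letters this gives 490,000 pairs.

```r
# The question's nested data as a base data frame with a list column
nested <- data.frame(letter1 = c("A", "B", "C"), stringsAsFactors = FALSE)
nested$ranks <- list(c(7, 4, 3, 12, 6), c(1, 10, 9, 8, 5), c(5, 3, 1, 7, 6))

# All (i, j) pairs of row indices; i varies fastest
idx <- expand.grid(i = seq_len(nrow(nested)), j = seq_len(nrow(nested)))

pairs <- data.frame(letter1 = nested$letter1[idx$i],
                    letter2 = nested$letter1[idx$j],
                    stringsAsFactors = FALSE)
pairs$data1 <- nested$ranks[idx$i]
pairs$data2 <- nested$ranks[idx$j]

# Hypothetical edge weight: Spearman correlation between the two rank vectors
pairs$value <- mapply(function(a, b) cor(a, b, method = "spearman"),
                      pairs$data1, pairs$data2)

edges <- pairs[c("letter1", "letter2", "value")]
edges
```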

How to identify fully connected node clusters with igraph?

I'm trying to calculate the clusters of a network using igraph in R, where all nodes are connected. The plot seems to work OK, but then I'm not able to return the correct groupings from my clusters.
In this example, the plot shows 4 main clusters, but in the largest cluster not all nodes are connected.
I would like to be able to return the following list of clusters from this graph object:
[[1]]
[1] 8 9
[[2]]
[1] 7 10
[[3]]
[1] 4 6 11
[[4]]
[1] 2 3 5
[[5]]
[1] 1 3 5 12
Example code:
library(igraph)
topology <- structure(list(N1 = c(1, 3, 5, 12, 2, 3, 5, 1, 2, 3, 5, 12, 4,
6, 11, 1, 2, 3, 5, 12, 4, 6, 11, 7, 10, 8, 9, 8, 9, 7, 10, 4,
6, 11, 1, 3, 5, 12), N2 = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3,
3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10,
11, 11, 11, 12, 12, 12, 12)), .Names = c("N1", "N2"), row.names = c(NA,
-38L), class = "data.frame")
g2 <- graph.data.frame(topology, directed=FALSE)
g3 <- simplify(g2)
plot(g3)
The cliques function gets me part of the way there:
tmp <- cliques(g3)
tmp
But this list also gives groupings where not all nodes are connected. For example, this clique includes the nodes 1, 2, 3 and 5, but 1 only connects to 3, 2 only connects to 3 and 5, and 5 only connects to 2:
topology[tmp[[31]],]
# N1 N2
#6 3 2
#7 5 2
#8 1 3
Thanks in advance for any help.
You could use maximal.cliques in the igraph package. See below.
# Load package
library(igraph)
# Load data
topology <- structure(list(N1 = c(1, 3, 5, 12, 2, 3, 5, 1, 2, 3, 5, 12, 4,
6, 11, 1, 2, 3, 5, 12, 4, 6, 11, 7, 10, 8, 9, 8, 9, 7, 10, 4,
6, 11, 1, 3, 5, 12), N2 = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3,
3, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10,
11, 11, 11, 12, 12, 12, 12)), .Names = c("N1", "N2"), row.names = c(NA,
-38L), class = "data.frame")
# Get rid of loops and ensure right naming of vertices
g3 <- simplify(graph.data.frame(topology[order(topology[[1]]),],directed = FALSE))
# Plot graph
plot(g3)
# Calculate the maximal cliques
maximal.cliques(g3)
# > maximal.cliques(g3)
# [[1]]
# [1] 9 8
#
# [[2]]
# [1] 10 7
#
# [[3]]
# [1] 2 3 5
#
# [[4]]
# [1] 6 4 11
#
# [[5]]
# [1] 12 1 5 3
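To get the cliques back as sorted node labels, in the shape asked for in the question, the vertex sets returned by max_cliques() can be mapped to their names. A sketch assuming igraph 1.0 or later, where graph_from_data_frame(), max_cliques() and as_ids() are the current names (graph.data.frame() and maximal.cliques() are the older ones). The edge list is the de-duplicated, loop-free version of topology, and the final ordering (by size, then by smallest label) is an arbitrary choice.

```r
library(igraph)

# Unique non-loop edges from the question's topology data
edges <- data.frame(
  from = c(1, 1, 1, 2, 2, 3, 3, 4, 4, 5, 6, 7, 8),
  to   = c(3, 5, 12, 3, 5, 5, 12, 6, 11, 12, 11, 10, 9)
)
g <- graph_from_data_frame(edges, directed = FALSE)

# Each maximal clique is a vertex sequence; as_ids() returns the vertex names
groups <- lapply(max_cliques(g), function(vs) sort(as.numeric(as_ids(vs))))

# Order the cliques by size, then by their smallest label
groups <- groups[order(lengths(groups), sapply(groups, min))]
groups
```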
