I have a dataset that starts like this:
In dput it is
structure(list(20, TRUE, c(0, 0, 1, 1, 1, 1, 2, 3, 4, 4, 4, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7), c(8, 1, 0, 8, 9, 5,
8, 10, 10, 5, 7, 4, 11, 12, 6, 13, 14, 15, 16, 17, 18, 4, 5,
19, 4, 17), c(1, 0, 2, 5, 3, 4, 6, 7, 9, 10, 8, 11, 14, 12, 13,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25), c(2, 1, 11, 21,
24, 5, 9, 22, 14, 10, 0, 3, 6, 4, 7, 8, 12, 13, 15, 16, 17, 18,
19, 25, 20, 23), c(0, 2, 6, 7, 8, 11, 21, 24, 26, 26, 26, 26,
26, 26, 26, 26, 26, 26, 26, 26, 26), c(0, 1, 2, 2, 2, 5, 8, 9,
10, 13, 14, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26), list(c(1,
0, 1), structure(list(), names = character(0)), list(name = c("1",
"3", "5", "6", "8", "9", "12", "19", "2", "4", "7", "10", "11",
"14", "15", "16", "17", "18", "20", "13")), list(`Number of messages` = c(157,
1058, 2481, 833, 178, 119, 66, 222, 20, 343, 3, 4991, 47, 11,
83, 26, 10, 19, 33, 84, 51, 589, 79, 37, 110, 55))), <environment>), class = "igraph")
So far I have the following lines of code:
Datensatz <- read_xlsx("...")
Netzwerkgraph <- graph.data.frame(Datensatz[,1:3], directed = TRUE)
actors<-Datensatz$From
relations<-Datensatz$To
weight<-Datensatz$`Number of messages`
How can I calculate the following formula in R with my data set?
I've tried the following code:
Function <- function(i,j,x,y,z){
i <- actors
j <- relations
w <- weight
for(i in 1:20)
print (-1/(cumsum 1:length(actors, i)(w,i+1))logb(x,base=2)*1/(cumsum 1:length(actors, i)*w,i+1))
}
It isn't entirely clear how you wish to apply the given formula to your example data set, that is, exactly what inputs you are using and what outputs you wish to achieve. Hence, it also isn't clear if the following approach will be sufficient for your purposes. Here is my interpretation thus far.
If one interprets each unique value in the "from" column as being a node i, then it appears that you wish to calculate the sum of messages to each j in the "to" column for each sender i in the "from" column. One approach might then be to calculate all such sums by sender first and then run them all through a simple function that accepts the sum along with some lambda constant.
I used a lambda value of "2" below arbitrarily for illustrative purposes. Additionally, while the formula references a time t, there does not appear to be a time component in your example data set; time isn't represented in this approach. The output would presumably represent the expression for each node at a single point in time.
#written in R version 4.2.1
library(data.table)
##Example data frame
df = data.frame(from = c(1,1,3,3,3), to = c(2,3,1,2,4),nm = c(157,1058,2481,833,178))
df = data.table(df)
df
from to nm
1: 1 2 157
2: 1 3 1058
3: 3 1 2481
4: 3 2 833
5: 3 4 178
##Calculate the sum of messages by sender in "from" column
nf = df[,sum(nm), by = from]
colnames(nf) = c("from","message_total")
nf
from message_total
1: 1 1215
2: 3 3492
## Function
## inputs to function are the total number of messages of a sender in
## "from" column (called cit) and some lambda constant
icit = function(cit,lambda = 2){
-(1/(cit + lambda))*log(1/((cit + lambda)), base = 2)
}
##Find vector of values for each sender in the data set
ans = numeric(nrow(nf))
for(i in seq_len(nrow(nf))){
ans[i] = icit(nf$message_total[i])
}
ans
[1] 0.008421622 0.003368822
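For what it's worth, the same numbers can be obtained without a loop and without data.table, since arithmetic in R is vectorised. This is only a sketch under the same assumptions as above (lambda = 2 is arbitrary, and time is not modelled); the column name icit is my own choice.

```r
# Example edge list from the answer above: sender, receiver, number of messages
df <- data.frame(from = c(1, 1, 3, 3, 3),
                 to   = c(2, 3, 1, 2, 4),
                 nm   = c(157, 1058, 2481, 833, 178))

# Total messages sent per sender (base-R equivalent of the data.table step)
nf <- aggregate(nm ~ from, data = df, FUN = sum)
names(nf) <- c("from", "message_total")

# Same function as above; lambda = 2 is still an arbitrary illustrative value
icit <- function(cit, lambda = 2) {
  -(1 / (cit + lambda)) * log(1 / (cit + lambda), base = 2)
}

# Applying the function to the whole column at once replaces the loop
nf$icit <- icit(nf$message_total)
nf
```

Keeping the result as a column of nf has the side benefit that each value stays next to the sender it belongs to.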
I have data that looks like this:
df<- structure(list(14, FALSE, c(1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 12,
13, 6), c(0, 0, 0, 0, 0, 6, 6, 6, 6, 6, 6, 6, 0), c(0, 1, 2,
3, 4, 12, 5, 6, 7, 8, 9, 10, 11), c(0, 1, 2, 3, 4, 12, 5, 6,
7, 8, 9, 10, 11), c(0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13), c(0, 6, 6, 6, 6, 6, 6, 13, 13, 13, 13, 13, 13, 13, 13
), list(c(1, 0, 1), structure(list(), names = character(0)),
list(name = c("Bestman", "Tera1", "Tera2", "Tera3", "Tera4",
"Tera5", "Tetra", "Brownie1", "Brownie2", "Brownie3", "Brownie4",
"Brownie5", "Brownie6", "Brownie7")), list()), <environment>), class = "igraph")
I am trying to build a node list and mark the two cores as roots.
With two cores I can easily do this:
as_tbl_graph(df) %>%
activate(nodes) %>%
mutate(type = ifelse(name %in% c("Bestman", "Tetra"), "root", "branch")) %>%
mutate(group = ifelse(name == "Bestman" | grepl("Tera", name),
"Bestman", "Tera"))
When the number of cores grows, this method no longer works.
For example, with the data below and the code that follows it:
df2<-structure(list(28, FALSE, c(1, 2, 3, 4, 5, 6, 1, 2, 8, 7, 9,
10, 11, 7, 7, 13, 14, 15, 16, 17, 19, 20, 21, 22, 23, 24, 26,
27, 7, 12, 18, 25, 12, 18, 25, 18, 25, 25), c(0, 0, 0, 0, 0,
0, 0, 0, 7, 6, 7, 7, 7, 2, 1, 12, 12, 12, 12, 12, 18, 18, 18,
18, 18, 18, 25, 25, 0, 0, 0, 0, 7, 7, 7, 12, 12, 18), c(6, 0,
7, 1, 2, 3, 4, 5, 28, 14, 13, 9, 8, 10, 11, 12, 29, 32, 15, 16,
17, 18, 19, 30, 33, 35, 20, 21, 22, 23, 24, 25, 31, 34, 36, 37,
26, 27), c(6, 0, 7, 1, 2, 3, 4, 5, 28, 29, 30, 31, 14, 13, 9,
8, 10, 11, 12, 32, 33, 34, 15, 16, 17, 18, 19, 35, 36, 20, 21,
22, 23, 24, 25, 37, 26, 27), c(0, 0, 2, 4, 5, 6, 7, 8, 12, 13,
14, 15, 16, 18, 19, 20, 21, 22, 23, 26, 27, 28, 29, 30, 31, 32,
36, 37, 38), c(0, 12, 13, 14, 14, 14, 14, 15, 22, 22, 22, 22,
22, 29, 29, 29, 29, 29, 29, 36, 36, 36, 36, 36, 36, 36, 38, 38,
38), list(c(1, 0, 1), structure(list(), names = character(0)),
list(name = c("Bestman", "Tera1", "Tera2", "Tera3", "Tera4",
"Tera5", "Brownie2", "Tetra", "Brownie1", "Brownie3", "Brownie4",
"Brownie5", "trueG", "ckage1", "ckage2", "ckage3", "ckage4",
"ckage5", "Carowner", "Hoghet1", "Hoghet2", "Hoghet3", "Hoghet4",
"Hoghet5", "Hoghet6", "Bestwomen", "Esme2", "Esme3")), list()),
<environment>), class = "igraph")
as_tbl_graph(df2) %>%
activate(nodes) %>%
mutate(type = ifelse(name %in% c("Bestman", "Tetra", "trueG", "Carowner","Bestwomen"), "root", "branch")) %>%
mutate(group = ifelse(name == "Bestman" | grepl("Tetra", name) | grepl("trueG",name) | grepl("Carowner", name) | grepl("Bestwomen", name) , "Bestman", "Tetra","trueG","Carowner","Bestwomen" ))
I get an error. What am I doing wrong here?
Your second graph is more complex than your first. Some of the 'peripheral' nodes join more than one central node, so it is not clear how they should be labelled / colored. However, tidygraph has various grouping functions which can be used to assign the nodes to groups based on their connectivity, and the centrality of a node can be calculated automatically to help with labelling and sizing.
library(tidygraph)
library(ggraph)
df2 %>%
as_tbl_graph() %>%
activate(nodes) %>%
mutate(is_central = centrality_hub() > 0.6) %>%
mutate(group = factor(group_label_prop())) %>%
ggraph(layout = "igraph", algorithm = "nicely") +
geom_edge_link(width = 2, alpha = 0.1) +
geom_node_circle(aes(r = ifelse(is_central, nchar(name)/12, 0.1), fill = group),
color = NA) +
geom_node_text(aes(label = ifelse(is_central, name, '')), size = 5,
color = "gray40", family = "Roboto Condensed", fontface = 2) +
theme_graph() +
coord_equal() +
scale_fill_brewer(palette = "Pastel2", guide = "none")
ifelse only allows two outcomes (one value when the test is TRUE, one when it is FALSE); try dplyr::case_when instead.
https://dplyr.tidyverse.org/reference/case_when.html
Update to add requested code:
mutate(group = dplyr::case_when(name == "Bestman" ~ "Bestman",
grepl("Tetra", name) ~ "Tetra",
grepl("trueG",name) ~ "trueG",
grepl("Carowner", name) ~ "Carowner",
grepl("Bestwomen", name) ~ "Bestwomen"))
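A minimal, self-contained sketch of the same idea on a plain tibble, in case it helps to see it outside the graph pipeline. The node names and the prefix-to-root mapping (Tera* under Bestman, Brownie* under Tetra, ckage* under trueG) are my assumptions, modelled on the question's data; adjust them to your own naming scheme. Note the final TRUE ~ branch: without a fallback, rows that match none of the conditions get NA.

```r
library(dplyr)

# Hypothetical node names, modelled on the graph in the question
nodes <- tibble(name = c("Bestman", "Tera1", "Tetra", "Brownie1",
                         "trueG", "ckage1", "Esme2"))

roots <- c("Bestman", "Tetra", "trueG")

nodes <- nodes %>%
  mutate(
    # Two outcomes only, so ifelse is still fine here
    type  = ifelse(name %in% roots, "root", "branch"),
    # More than two outcomes: conditions are checked in order, first match wins
    group = case_when(
      name == "Bestman" | grepl("^Tera", name)    ~ "Bestman",
      name == "Tetra"   | grepl("^Brownie", name) ~ "Tetra",
      name == "trueG"   | grepl("^ckage", name)   ~ "trueG",
      TRUE                                        ~ "unassigned"
    )
  )
nodes
```

The same mutate() calls should work after activate(nodes) on a tbl_graph, since tidygraph forwards them to the node table.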
I have a lot of financial trading data, around a million rows, and I want to condense it into a new data frame with one row per unique UserId. I then want to add up the "Trades" for each account subject to some conditions, e.g. TransactionTypeId == 2 & AC_Type == 19. I would use SUMIFS in Excel for this, but the size of the file makes that pretty much impossible to run on my computer.
df<- structure(list(UserId = c(1, 1, 1, 1, 2,
2, 2, 3, 3, 3, 4, 5, 6,
6, 6, 7, 7, 7, 8, 8, 8,
8, 8, 9, 9, 9, 10, 11, 12,
12, 13, 13, 13, 14, 14, 15, 15,
16, 16, 16), TransactionTypeId = c(14, 1, 1, 70,
15, 1, 1, 14, 1, 1, 70, 14, 14, 1, 1, 14, 1, 1, 14, 1, 1, 1,
1, 14, 1, 1, 14, 14, 1, 1, 14, 1, 1, 1, 1, 70, 70, 14, 1, 1),
AC_Type = c(21, 21, 21, 21, 19, 19, 19, 19, 19, 19, 19, 19,
19, 19, 19, 21, 21, 21, 19, 19, 19, 19, 19, 19, 19, 19, 20,
19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20), Trades = c(30,
30, 0.00067116, 0.00067115, 249, 249, 0.00533033, 48.75,
48.75, 0.00101298, 0.00533, 24.37, 146.25, 146.25, 0.00309109,
100.01, 100.01, 0.00233551, 97.5, 90, 0.00189134, 5, 0.00245851,
234, 234, 0.00500802, 100.01, 48.75, 48.5, 0.0275474, 24,
24, 0.00051975, 100, 0.00223998, 0.00051975, 0.00205, 9.75,
8.75, 0.00017811)), row.names = c(NA, -40L), class = c("tbl_df",
"tbl", "data.frame"))
You can subset Trades with the logical condition you care about and sum what remains.
library(dplyr)
df %>%
group_by(UserId) %>%
summarise(count = sum(Trades[TransactionTypeId == 2 & AC_Type== 19]))
Not quite sure what you want ...
library(dplyr)
df %>%
group_by(UserId) %>%
filter(TransactionTypeId == 1 & AC_Type == 19) %>%
summarise(sum = sum(Trades))
# A tibble: 6 x 2
UserId sum
<dbl> <dbl>
1 2 249.
2 3 48.8
3 6 146.
4 8 95.0
5 9 234.
6 12 48.5
Here you first group_by UserId, then filter the rows that meet your conditions (NB: I've changed 2 to 1, as there aren't any 2s in the sample data), and finally summarise by summing up the values in Trades.
Using data.table
library(data.table)
setDT(df)[, .(count = sum(Trades[TransactionTypeId == 2 &
AC_Type== 19], na.rm = TRUE)), UserId]
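One difference between the approaches above that may matter: filtering first drops every user with no matching rows, whereas summing inside the group keeps them with a total of 0. Here is a package-free sketch of the "keep everyone" behaviour, using a small made-up sample with the same column names as the question (the values and the names trades, keep, totals and result are mine, for illustration only).

```r
# Small hypothetical sample with the same columns as the question's data
trades <- data.frame(
  UserId            = c(1, 1, 2, 2, 2, 3),
  TransactionTypeId = c(14, 1, 15, 1, 1, 14),
  AC_Type           = c(21, 21, 19, 19, 19, 19),
  Trades            = c(30, 30, 249, 249, 0.5, 48.75)
)

# Rows that satisfy the SUMIFS-style condition
keep <- trades$TransactionTypeId == 1 & trades$AC_Type == 19

# Zero out non-matching trades, then sum per user.
# Every UserId is kept; users with no matching rows get 0.
totals <- tapply(ifelse(keep, trades$Trades, 0), trades$UserId, sum)

result <- data.frame(UserId = as.numeric(names(totals)),
                     sum    = as.vector(totals))
result
```

On a million rows the dplyr and data.table versions will likely be faster; this is just to show the logic without dependencies.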
I would like to perform a Quade test with more than one covariate in R. I know the command quade.test and I have seen the example below:
## Conover (1999, p. 375f):
## Numbers of five brands of a new hand lotion sold in seven stores
## during one week.
y <- matrix(c( 5, 4, 7, 10, 12,
1, 3, 1, 0, 2,
16, 12, 22, 22, 35,
5, 4, 3, 5, 4,
10, 9, 7, 13, 10,
19, 18, 28, 37, 58,
10, 7, 6, 8, 7),
nrow = 7, byrow = TRUE,
dimnames =
list(Store = as.character(1:7),
Brand = LETTERS[1:5]))
y
quade.test(y)
My question is as follows: how could I introduce more than one covariate? In this example the covariate is the Store variable.
I am using this command in RStudio to split the data in one column:
CTE.info <- data.frame(strsplit(as.character(CTE$V11),'|',fixed=TRUE))
But, I am getting the error:
Error in data.frame("orderItems", "79542;2;24.000;24.000;5.310", "Credit;1;-15.000;-15.000;.000", :
arguments imply differing number of rows: 1, 11, 10, 3, 5, 4, 9, 2, 6, 7, 8, 12, 22, 13, 16, 14, 15, 19, 17, 20, 18, 28, 24
Could someone let me know how this can be fixed?
strsplit returns pieces of different lengths, which data.frame cannot combine. Pad every list element to the same length and it should work.
lst <- strsplit(as.character(CTE$V11),'|',fixed=TRUE)
d1 <- data.frame(lapply(lst, `length<-`, max(lengths(lst))))
colnames(d1) <- paste0('V', seq_along(d1))
data
CTE <- data.frame(V11= c('a|b|c', 'a|b', 'a|b|c|d'))
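As a variation, if you would rather have one row per original value (and one column per piece), the padded pieces can be bound row-wise instead. This is a sketch on the toy data above, not on your real CTE; shorter rows are filled with NA.

```r
# Toy data from the answer above
CTE <- data.frame(V11 = c('a|b|c', 'a|b', 'a|b|c|d'))

# Split each value on a literal "|" (fixed = TRUE turns off regex handling)
lst <- strsplit(as.character(CTE$V11), '|', fixed = TRUE)

# Pad every piece vector with NA up to the longest length
n <- max(lengths(lst))
padded <- lapply(lst, `length<-`, n)

# Bind row-wise: one row per original value, one column per piece
d1 <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
d1
```

Here d1 has 3 rows and 4 columns named V1 to V4, with NA where a value had fewer pieces.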
I have a directed subgraph with all the nodes in a cycle (with 21 nodes and ~250 edges) and I want to know the order of how the nodes form the cycle.
I'm not familiar with graph algorithms. I thought about applying igraph::graph.dfs to the original or the reversed graph and using the returned order or order.out as the node order, but it didn't work.
The subgraph is a strongly connected component found with igraph::clusters.
I've asked a similar question, but graph.get.subisomorphisms.vf2 takes too long to run in my case.
I'm thinking that if I can get an ordered adjacency list like this, I may be able to find the cycle starting from the longest list.
But I can only get an unordered list using igraph::get.adjlist. I'd like to know if there's a way to get an ordered list like the one below.
Any suggestions for finding the node order of the cycle?
Thanks in advance!
data
> dput(adjlist)
structure(list(`26` = c(2, 3, 4, 5, 6, 7, 8, 10, 11, 15, 16,
18, 19), `2` = c(1, 3, 4, 5, 6, 7, 8, 10, 15, 16, 18), `30` = c(1,
2, 4, 5, 6, 7, 8, 10, 11, 14, 15, 16, 17, 18, 19, 21), `25` = c(1,
2, 3, 5, 6, 7, 8, 9, 10, 11, 15, 16, 18, 21), `29` = c(1, 2,
3, 4, 6, 7, 8, 9, 10, 11, 15, 16, 18, 21), `9` = c(1, 2, 3, 4,
5, 7, 8, 10, 14, 15, 16, 18, 19), `27` = c(1, 2, 3, 4, 5, 6,
8, 14, 15, 18), `13` = c(3, 4, 5, 15), `14` = c(1, 2, 3, 4, 5,
6, 7, 8, 10, 11, 14, 15, 16, 18, 19, 21), `8` = c(1, 2, 3, 4,
5, 6, 7, 8, 14, 15, 16, 18), `23` = c(1, 2, 3, 4, 5, 6, 7, 8,
10, 14, 15, 16, 17, 18, 19), `20` = c(1, 2, 3, 4, 5, 6, 7, 8,
9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21), `19` = c(1, 2,
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 15, 16, 17, 18, 19, 21),
`17` = c(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 15, 16, 17, 18,
21), `12` = c(3, 4, 5, 6, 8), `24` = c(4, 6, 7, 8, 9, 10,
11, 15), `21` = c(13, 14), `6` = c(2, 3, 4, 5, 6, 8, 10,
15), `28` = c(1, 7, 11, 16), `15` = c(1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 14, 15, 16, 17, 18, 19, 21), `11` = c(3, 4,
5, 6, 8, 15)), .Names = c("26", "2", "30", "25", "29", "9",
"27", "13", "14", "8", "23", "20", "19", "17", "12", "24", "21",
"6", "28", "15", "11"))
Just to make sure the problem is correctly understood: you have a subgraph of a directed graph induced by the vertices of a strongly connected component. What you would like to have is a cycle containing all vertices of the component. Two possible versions (see the introductory paragraphs here for some clarification on the confusing terminology that has developed in this respect):
a) Each vertex is allowed to appear exactly once on the cycle, i.e. you want a simple cycle where each vertex is incident with exactly two edges of the cycle. Finding such a cycle is the Hamiltonian Cycle problem, a classic NP-complete problem; no polynomial-time algorithm for it is known.
b) Vertices are allowed to be incident with more than two edges of the cycle, i.e. you want a closed walk through the component. You can do that by identifying cycles that connect the component (you should be able to extract those easily enough from an algorithm that identifies strongly connected components), and then building an Eulerian cycle of the union of the cycles you found, ignoring all other edges in the component. This can be done efficiently and should be fairly straightforward to implement.
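For version a), a plain backtracking search may still be worth a try on a graph this small and dense, even though its worst case is exponential. The following is only a sketch: it assumes the adjacency list stores successors as positional indices (adj[[v]] lists the vertices reachable from vertex v), and the function name hamiltonian_cycle and the four-vertex example graph are made up for illustration.

```r
# Hypothetical adjacency list: adj[[v]] holds the successors of vertex v
adj <- list(c(2, 3), c(3, 4), c(4, 1), c(1, 2))

hamiltonian_cycle <- function(adj, start = 1) {
  n <- length(adj)
  path <- numeric(n)       # vertices in visiting order
  visited <- logical(n)    # which vertices are already on the path
  path[1] <- start
  visited[start] <- TRUE

  # Try to extend the partial path that currently has `pos` vertices
  extend <- function(pos) {
    last <- path[pos]
    if (pos == n) {
      # All vertices used: the cycle closes only if an edge leads back to start
      return(start %in% adj[[last]])
    }
    for (nxt in adj[[last]]) {
      if (!visited[nxt]) {
        visited[nxt] <<- TRUE
        path[pos + 1] <<- nxt
        if (extend(pos + 1)) return(TRUE)
        visited[nxt] <<- FALSE   # dead end: undo and try the next successor
      }
    }
    FALSE
  }

  # Return the vertex order if a cycle exists, otherwise NULL
  if (extend(1)) path else NULL
}

cycle <- hamiltonian_cycle(adj)
cycle
```

If the function returns NULL, no Hamiltonian cycle exists and version b) is the one to pursue; if it runs for too long on your data, the same conclusion applies in practice.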