How do I convert this adjacency matrix into a graph object? - r

I have a matrix of social interaction data, read from a CSV file, that looks like this:
`0` `1` `2` `3` `4` `5` `6` `7` `8` `9`
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
0 0 29 1 0 1 9 3 0 1 4
1 1 0 0 1 3 1 0 1 1 1
2 1 1 0 13 4 0 1 1 15 0
3 3 0 1 0 1 1 7 1 1 1
4 1 0 1 98 0 1 1 1 1 2
5 2 5 1 1 3 0 2 0 1 5
6 1 1 0 0 12 1 0 2 1 1
7 1 1 0 1 0 1 9 0 1 2
8 1 1 17 13 145 1 39 1 0 1
9 88 23 1 5 1 2 1 7 1 0
I am new to social network analysis, so I am not sure of my terminology, but this looks like a weighted adjacency matrix to me: for example, it tells us that student 1 has had 29 interactions with student 0 in the last year. I had this object stored as a data frame in RStudio, but when I ran the following code, I received the error below:
> fn <- graph_from_adjacency_matrix(output, weighted = T)
Error in mde(x) : 'list' object cannot be coerced to type 'double'
I've tried converting it to a matrix, but that does not seem to work either. Any help with this would be really appreciated.

You need to convert your data.frame to a matrix first and then apply graph_from_adjacency_matrix, e.g.,
g <- graph_from_adjacency_matrix(as.matrix(df),weighted = TRUE)
and plot(g) draws the resulting graph.
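To make the cause of the error more concrete, here is a minimal sketch on a made-up 3 x 3 data frame (the name toy and its values are illustrative, not taken from the question). A data frame is a list of columns, which is presumably why the coercion to double fails; as.matrix() turns it into a plain numeric matrix first. The igraph part only runs if igraph is installed.

```r
# Hypothetical toy data: weighted interactions between three students.
toy <- data.frame(`0` = c(0, 1, 2), `1` = c(29, 0, 0), `2` = c(1, 0, 0),
                  check.names = FALSE)
rownames(toy) <- colnames(toy)

is.list(toy)           # a data frame is a list of columns
adj <- as.matrix(toy)  # numeric matrix with the same row and column names
is.numeric(adj)

if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_adjacency_matrix(adj, mode = "directed", weighted = TRUE)
  # One edge per non-zero cell; the cell value is stored as the edge weight.
  print(igraph::ecount(g))
  print(igraph::E(g)$weight)
}
```

If as.matrix() returns a character matrix instead, one of the columns was probably read as text (for example an ID column from the CSV), and it would need to be dropped or converted before building the graph.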
Data
> dput(df)
structure(list(`0` = c(0L, 1L, 1L, 3L, 1L, 2L, 1L, 1L, 1L,
88L), `1` = c(29L, 0L, 1L, 0L, 0L, 5L, 1L, 1L, 1L, 23L), `2` = c(1L,
0L, 0L, 1L, 1L, 1L, 0L, 0L, 17L, 1L), `3` = c(0L, 1L, 13L,
0L, 98L, 1L, 0L, 1L, 13L, 5L), `4` = c(1L, 3L, 4L, 1L, 0L,
3L, 12L, 0L, 145L, 1L), `5` = c(9L, 1L, 0L, 1L, 1L, 0L, 1L,
1L, 1L, 2L), `6` = c(3L, 0L, 1L, 7L, 1L, 2L, 0L, 9L, 39L, 1L
), `7` = c(0L, 1L, 1L, 1L, 1L, 0L, 2L, 0L, 1L, 7L), `8` = c(1L,
1L, 15L, 1L, 1L, 1L, 1L, 1L, 0L, 1L), `9` = c(4L, 1L, 0L, 1L,
2L, 5L, 1L, 2L, 1L, 0L)), class = "data.frame", row.names = c("0",
"1", "2", "3", "4", "5", "6", "7", "8", "9"))

Related

How to find the statistical mode of each ID

Here are the observations of two individuals of my dataset.
data=structure(list(id = c(2L, 2L, 2L, 3L, 3L, 3L), trt = c(1L, 1L,
1L, 1L, 1L, 1L), status = c(0L, 0L, 0L, 2L, 2L, 2L), stage = c(3L,
3L, 3L, 4L, 4L, 4L), spiders = c(1L, 1L, 1L, 0L, 1L, 0L), sex = structure(c(2L,
2L, 2L, 1L, 1L, 1L), .Label = c("m", "f"), class = "factor"),
hepato = c(1L, 1L, 1L, 0L, 1L, 0L), edema = c(0, 0, 0, 0.5,
0, 0.5), ascites = c(0L, 0L, 0L, 0L, 0L, 0L)), row.names = c(NA,
-6L), class = "data.frame")
I want to calculate the statistical mode for each individual after grouping by id. I used the code below:
library(dplyr)
library(modeest)
data%>%
group_by(id)%>%mutate(edema2=mlv(edema))
I get an error message when calculating the mode, while this approach works well with other summary statistics such as mean, sd, min, max, etc.
The warnings you are getting suggest two things:
You have not specified which method to use, so the default method 'shorth' is used.
There is a tie in the selection of the mode value.
Alternatively, why not use a simple Mode function such as this one:
Mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
To apply by group you can use it with dplyr as :
library(dplyr)
data%>% group_by(id)%>% mutate(edema2= Mode(edema))
# id trt status stage spiders sex hepato edema ascites edema2
# <int> <int> <int> <int> <int> <fct> <int> <dbl> <int> <dbl>
#1 2 1 0 3 1 f 1 0 0 0
#2 2 1 0 3 1 f 1 0 0 0
#3 2 1 0 3 1 f 1 0 0 0
#4 3 1 2 4 0 m 0 0.5 0 0.5
#5 3 1 2 4 1 m 1 0 0 0.5
#6 3 1 2 4 0 m 0 0.5 0 0.5
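One point about ties is worth spelling out: which.max() returns the position of the first maximum, so when two values are equally frequent, Mode() returns whichever appears first in the vector. The sketch below illustrates this; all_modes is a hypothetical helper added here for comparison and is not part of the answer above.

```r
# Same Mode() as above: returns the first most frequent value.
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical variant that returns every tied value.
all_modes <- function(x) {
  ux <- unique(x)
  counts <- tabulate(match(x, ux))
  ux[counts == max(counts)]
}

x <- c(0.5, 0, 0.5, 0)  # 0.5 and 0 both occur twice
Mode(x)       # 0.5, because it appears first in x
all_modes(x)  # 0.5 0
```

Inside mutate() a single value per group is needed, so Mode() is the convenient choice there; all_modes() is mainly useful for checking whether a group has a tie at all.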

Error with using mlogit R function: "The two indexes don't define unique observations"

My dataset looks like this:
ID choice_situation Alternative Attr1 Attr2 Attr3 choice
ID_1 1 1 0 0 0 0
ID_1 1 2 1 1 0 1
ID_1 2 1 1 1 0 0
ID_1 2 2 1 1 1 1
ID_1 3 1 2 1 0 1
ID_1 3 2 3 1 0 0
ID_2 1 1 3 0 1 1
ID_2 1 2 0 0 0 0
ID_2 2 1 2 1 1 0
ID_2 2 2 2 1 1 1
ID_2 3 1 0 0 0 1
ID_2 3 2 0 0 1 0
.....
Every time I run the code of mlogit function
DCE_data<- mlogit.data(data=dataset, choice = "choice", shape = "long", alt.var = "Alternative", id.var = "ID") #ok
model<- mlogit(choice ~ Attr1 + Attr2 + Attr3 | 0, DCE_data)#error
I get the error below :
Error in dfidx(x, .idx, pkg = pkg) :
the two indexes don't define unique observations
Is the problem in the transformed data, DCE_data?
Thanks in advance!
For me your code works:
library(tidyverse)
df <- tibble::tribble(
~ID, ~choice_situation, ~Alternative, ~Attr1, ~Attr2, ~Attr3, ~choice,
"ID_1", 1L, 1L, 0L, 0L, 0L, 0L,
"ID_1", 1L, 2L, 1L, 1L, 0L, 1L,
"ID_1", 2L, 1L, 1L, 1L, 0L, 0L,
"ID_1", 2L, 2L, 1L, 1L, 1L, 1L,
"ID_1", 3L, 1L, 2L, 1L, 0L, 1L,
"ID_1", 3L, 2L, 3L, 1L, 0L, 0L,
"ID_2", 1L, 1L, 3L, 0L, 1L, 1L,
"ID_2", 1L, 2L, 0L, 0L, 0L, 0L,
"ID_2", 2L, 1L, 2L, 1L, 1L, 0L,
"ID_2", 2L, 2L, 2L, 1L, 1L, 1L,
"ID_2", 3L, 1L, 0L, 0L, 0L, 1L,
"ID_2", 3L, 2L, 0L, 0L, 1L, 0L
)
library(mlogit)
DCE_data<- mlogit.data(data=df, choice = "choice", shape = "long", alt.var = "Alternative", id.var = "ID") #ok
model<- mlogit(choice ~ Attr1 + Attr2 + Attr3 | 0, DCE_data) # runs without error on this sample
summary(model)
> model
Call:
mlogit(formula = choice ~ Attr1 + Attr2 + Attr3 | 0, data = DCE_data, method = "nr")
Coefficients:
Attr1 Attr2 Attr3
0.34137 14.86152 0.39473
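Since the sample runs cleanly, the error presumably comes from rows in the full dataset that the sample does not show. One thing worth checking, as an assumption about the cause rather than a confirmed diagnosis, is whether any combination of ID, choice_situation and Alternative occurs more than once. A base R sketch, using a made-up data frame dup_example with one deliberately repeated combination:

```r
# Hypothetical data: alternative 2 appears twice in the same choice situation.
dup_example <- data.frame(
  ID = c("ID_1", "ID_1", "ID_1"),
  choice_situation = c(1L, 1L, 1L),
  Alternative = c(1L, 2L, 2L),
  choice = c(0L, 1L, 0L)
)

keys <- dup_example[c("ID", "choice_situation", "Alternative")]
# TRUE for every row whose key combination occurs more than once
is_dup <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
dup_example[is_dup, ]
```

If this returns rows on the real data, those are the observations to inspect. If it returns nothing, the row order may be the issue instead; building an explicit choice-situation id (for example by pasting ID and choice_situation together) and passing it through the chid.var argument of mlogit.data might help, although that is untested here.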

How to determine the longest timeperiod and exclude other rows in Excel or R?

In my dataset I have the ZIPCODE of 600K+ IDs. If an ID moves to a different address, I want to determine at which zipcode they lived the longest in each year and put a '1' for that year in that row only (there is no need to combine rows, as I want to know where they lived in which year). That way an ID has a '1' for a given year in only one row, even if there are multiple rows for that ID. The yellow highlight is what I don't want; in that case there is a '1' in two rows for the same year. In the preferred dataset only one '1' per year per ID is possible.
For example: ID 4 lived in 2 places in 2013 (NY and LA), so there are 2 rows. At this point there is a 1 in each row for 2013, and I only want a 1 in the row where the ID lived the longest; the period covered is 1-1-2013 to 31-12-2018. ID 4 lived longer in LA than in NY in 2013, so only the LA row should have a 1 (in this case the NY row will be removed because only '0's remain).
I can also put this file in RStudio.
Thank you!
structure(v1)
ID CITY ZIPCODE DATE_START DATE_END DATE_END.1 X2013 X2014 X2015 X2016 X2017 X2018
1 1 NY 1234EF 1-12-2003 31-12-2018 1 1 1 1 1 1
2 2 NY 1234CD 1-12-2003 14-1-2019 14-1-2019 1 1 1 1 1 1
3 2 NY 1234AB 15-1-2019 31-12-2018 0 0 0 0 0 0
4 3 NY 1234AB 15-1-2019 31-12-2018 0 0 0 0 0 0
5 3 NY 1234CD 1-12-2003 14-1-2019 14-1-2019 1 1 1 1 1 1
6 4 LA 1111AB 4-5-2013 31-12-2018 1 1 1 1 1 1
7 4 NY 2222AB 1-12-2003 3-5-2013 3-5-2013 1 0 0 0 0 0
8 5 MIAMI 5555CD 6-2-2015 20-6-2016 20-6-2016 0 0 1 1 0 0
9 5 VEGAS 3333AB 1-1-2004 31-12-2018 1 1 1 1 1 1
10 5 ORLANDO 4444AB 26-2-2004 5-2-2015 5-2-2015 1 1 1 0 0 0
11 5 MIAMI 5555AB 21-6-2016 31-12-2018 31-12-2018 0 0 0 1 1 1
12 5 MIAMI 5555AB 1-1-2019 31-12-2018 0 0 0 0 0 0
13 6 AUSTIN 6666AB 28-2-2017 3-11-2017 3-11-2017 0 0 0 0 1 0
14 6 AUSTIN 6666AB 4-11-2017 31-12-2018 0 0 0 0 1 1
15 6 AUSTIN 7777AB 20-1-2017 27-2-2017 27-2-2017 0 0 0 0 1 0
16 6 AUSTIN 8888AB 1-12-2003 19-1-2017 19-1-2017 1 1 1 1 1 0
>
structure(list(ID = c(1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 5L,
5L, 5L, 6L, 6L, 6L, 6L), CITY = structure(c(4L, 4L, 4L, 4L, 4L,
2L, 4L, 3L, 6L, 5L, 3L, 3L, 1L, 1L, 1L, 1L), .Label = c("AUSTIN",
"LA", "MIAMI", "NY", "ORLANDO", "VEGAS"), class = "factor"),
ZIPCODE = structure(c(4L, 3L, 2L, 2L, 3L, 1L, 5L, 9L, 6L,
7L, 8L, 8L, 10L, 10L, 11L, 12L), .Label = c("1111AB", "1234AB",
"1234CD", "1234EF", "2222AB", "3333AB", "4444AB", "5555AB",
"5555CD", "6666AB", "7777AB", "8888AB"), class = "factor"),
DATE_START = structure(c(3L, 3L, 4L, 4L, 3L, 10L, 3L, 11L,
1L, 7L, 6L, 2L, 8L, 9L, 5L, 3L), .Label = c("1-1-2004", "1-1-2019",
"1-12-2003", "15-1-2019", "20-1-2017", "21-6-2016", "26-2-2004",
"28-2-2017", "4-11-2017", "4-5-2013", "6-2-2015"), class = "factor"),
DATE_END = structure(c(1L, 2L, 1L, 1L, 2L, 1L, 7L, 4L, 1L,
9L, 8L, 1L, 6L, 1L, 5L, 3L), .Label = c("", "14-1-2019",
"19-1-2017", "20-6-2016", "27-2-2017", "3-11-2017", "3-5-2013",
"31-12-2018", "5-2-2015"), class = "factor"), DATE_END.1 = structure(c(7L,
1L, 7L, 7L, 1L, 7L, 6L, 3L, 7L, 8L, 7L, 7L, 5L, 7L, 4L, 2L
), .Label = c("14-1-2019", "19-1-2017", "20-6-2016", "27-2-2017",
"3-11-2017", "3-5-2013", "31-12-2018", "5-2-2015"), class = "factor"),
X2013 = c(1L, 1L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L, 0L, 0L,
0L, 0L, 0L, 1L), X2014 = c(1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L,
1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L), X2015 = c(1L, 1L, 0L, 0L,
1L, 1L, 0L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 1L), X2016 = c(1L,
1L, 0L, 0L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 1L
), X2017 = c(1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L,
0L, 1L, 1L, 1L, 1L), X2018 = c(1L, 1L, 0L, 0L, 1L, 1L, 0L,
0L, 1L, 0L, 1L, 0L, 0L, 1L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-16L))
You can use a little help from the lubridate package to calculate how many days are spent at each location. Then we can group_by ID and use case_when to assign 1 when the time is the max or 0 otherwise.
library(lubridate)
library(dplyr)
v1 %>%
dplyr::select(ID,CITY,ZIPCODE,DATE_START,DATE_END.1) %>%
rowwise() %>%
mutate("X2013" = max(0, min(dmy("31-12-2013"),dmy(DATE_END.1)) - max(dmy("1-1-2013"),dmy(DATE_START))),
"X2014" = max(0, min(dmy("31-12-2014"),dmy(DATE_END.1)) - max(dmy("1-1-2014"),dmy(DATE_START))),
"X2015" = max(0, min(dmy("31-12-2015"),dmy(DATE_END.1)) - max(dmy("1-1-2015"),dmy(DATE_START))),
"X2016" = max(0, min(dmy("31-12-2016"),dmy(DATE_END.1)) - max(dmy("1-1-2016"),dmy(DATE_START))),
"X2017" = max(0, min(dmy("31-12-2017"),dmy(DATE_END.1)) - max(dmy("1-1-2017"),dmy(DATE_START))),
"X2018" = max(0, min(dmy("31-12-2018"),dmy(DATE_END.1)) - max(dmy("1-1-2018"),dmy(DATE_START)))) %>%
ungroup %>%
group_by(ID) %>%
mutate_at(vars(starts_with("X")),list(~ case_when(. == max(.) ~ 1,
TRUE ~ 0)))
# A tibble: 16 x 11
# Groups: ID [6]
ID CITY ZIPCODE DATE_START DATE_END.1 X2013 X2014 X2015 X2016 X2017 X2018
<int> <fct> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 NY 1234EF 1-12-2003 31-12-2018 1 1 1 1 1 1
2 2 NY 1234CD 1-12-2003 14-1-2019 1 1 1 1 1 1
3 2 NY 1234AB 15-1-2019 31-12-2018 0 0 0 0 0 0
4 3 NY 1234AB 15-1-2019 31-12-2018 0 0 0 0 0 0
5 3 NY 1234CD 1-12-2003 14-1-2019 1 1 1 1 1 1
6 4 LA 1111AB 4-5-2013 31-12-2018 1 1 1 1 1 1
7 4 NY 2222AB 1-12-2003 3-5-2013 0 0 0 0 0 0
8 5 MIAMI 5555CD 6-2-2015 20-6-2016 0 0 0 0 0 0
9 5 VEGAS 3333AB 1-1-2004 31-12-2018 1 1 1 1 1 1
10 5 ORLANDO 4444AB 26-2-2004 5-2-2015 1 1 0 0 0 0
11 5 MIAMI 5555AB 21-6-2016 31-12-2018 0 0 0 0 1 1
12 5 MIAMI 5555AB 1-1-2019 31-12-2018 0 0 0 0 0 0
13 6 AUSTIN 6666AB 28-2-2017 3-11-2017 0 0 0 0 1 0
14 6 AUSTIN 6666AB 4-11-2017 31-12-2018 0 0 0 0 0 1
15 6 AUSTIN 7777AB 20-1-2017 27-2-2017 0 0 0 0 0 0
16 6 AUSTIN 8888AB 1-12-2003 19-1-2017 1 1 1 1 0 0
There is certainly a way to implement the first mutate call without manually writing out each year, but it would take more work than just typing it out.
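For reference, one way to avoid typing each year is a small helper plus a loop. This is a base R sketch under a few assumptions: overlap_days and the two-row data frame moves are hypothetical names introduced here, dates are in d-m-Y format as in the question, and the day count mirrors the subtraction used above (end minus start, not inclusive). Unlike the case_when version, rows with zero days in a year never receive a 1.

```r
# Days of overlap between [start, end] and a calendar year (vectorised).
overlap_days <- function(start, end, year) {
  s  <- as.numeric(as.Date(as.character(start), format = "%d-%m-%Y"))
  e  <- as.numeric(as.Date(as.character(end), format = "%d-%m-%Y"))
  y0 <- as.numeric(as.Date(sprintf("%d-01-01", year)))
  y1 <- as.numeric(as.Date(sprintf("%d-12-31", year)))
  pmax(0, pmin(e, y1) - pmax(s, y0))
}

# Hypothetical subset: the two rows of ID 4 from the question.
moves <- data.frame(
  ID = c(4L, 4L),
  CITY = c("LA", "NY"),
  DATE_START = c("4-5-2013", "1-12-2003"),
  DATE_END.1 = c("31-12-2018", "3-5-2013")
)

for (y in 2013:2018) {
  days <- overlap_days(moves$DATE_START, moves$DATE_END.1, y)
  best <- ave(days, moves$ID, FUN = max)  # longest stay per ID in year y
  # 1 only for the row with the most days; rows with zero days stay 0
  moves[[paste0("X", y)]] <- as.integer(days > 0 & days == best)
}
moves
```

On these two rows, LA gets the 1 for every year and NY gets only 0s, which matches rows 6 and 7 of the output above.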

aggregation of data in R with assign of a dummy variable by condition

I have the following dataset
mydata=structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), ad_id = c(111L, 111L, 111L,
111L, 1111L, 1111L, 11111L, 11111L, 11111L, 111L, 111L, 1111L,
1111L, 11111L, 11111L, 11111L, 111111L, 111111L), price = c(1L,
0L, 1L, 0L, 2L, 0L, 3L, 0L, 0L, 1L, 0L, 2L, 0L, 3L, 0L, 0L, 1L,
0L), rev = c(2L, 0L, 0L, 2L, 3L, 3L, 4L, 4L, 4L, 2L, 2L, 3L,
3L, 4L, 4L, 4L, 0L, 0L), data = structure(c(1L, 2L, 2L, 3L, 1L,
3L, 1L, 3L, 4L, 1L, 3L, 1L, 3L, 1L, 3L, 4L, 1L, 3L), .Label = c("01.01.2018",
"01.02.2018", "01.03.2018", "02.03.2018"), class = "factor")), .Names = c("id",
"ad_id", "price", "rev", "data"), class = "data.frame", row.names = c(NA,
-18L))
How can I create a dummy variable according to the following logic:
For each id and ad_id I need to aggregate price and rev. Each ad_id has a date column (data).
If, for a given id and ad_id, rev is greater than price over the period of up to 90 days (data column, d-m-y), then the flag is set to 1; otherwise the flag is 0.
In this reproducible example, I just take 1 id and 4 ad_id values.
Aggregated by sum, it looks like this:
id ad_id price rev
1 1 111 2 4
2 1 1111 2 6
3 1 11111 3 12
4 1 111111 1 0
So for id = 1, all ad_id values (except ad_id = 111111) satisfy rev > price, so in the initial data
ad_id = 111, 1111, 11111 must have flag = 1 and 111111 must have flag = 0.
Here is the desired output:
id ad_id price rev data flag
1 1 111 1 2 01.01.2018 1
2 1 111 0 0 01.02.2018 1
3 1 111 1 0 01.02.2018 1
4 1 111 0 2 01.03.2018 1
5 1 1111 2 3 01.01.2018 1
6 1 1111 0 3 01.03.2018 1
7 1 11111 3 4 01.01.2018 1
8 1 11111 0 4 01.03.2018 1
9 1 11111 0 4 02.03.2018 1
10 1 111111 1 0 01.01.2018 0
11 1 111111 0 0 01.03.2018 0
How can I implement such a condition?
I am not sure if I understood you correctly, but is this what you are looking for:
library(tidyverse)
mydata %>% as_tibble() %>%
group_by(id, ad_id) %>%
summarise_at(vars("price", "rev"), sum) %>%
mutate(flag = if_else(rev > price, 1, 0)) %>% # flag is 1 only when rev exceeds price
select(id, ad_id, flag) %>%
left_join(mydata, ., by = c("id", "ad_id"))
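The pipeline above does not use the date column. If the "period up to 90 days" in the question means the first 90 days after each ad's earliest date (an assumption, since the question does not define the window), a base R sketch could look like this. The data frame ads is a made-up four-row subset and all helper names are illustrative.

```r
# Hypothetical subset of the question's data (dates in d.m.Y format).
ads <- data.frame(
  id    = c(1L, 1L, 1L, 1L),
  ad_id = c(111L, 111L, 111111L, 111111L),
  price = c(1L, 0L, 1L, 0L),
  rev   = c(2L, 2L, 0L, 0L),
  data  = c("01.01.2018", "01.03.2018", "01.01.2018", "01.03.2018")
)

ads$date <- as.Date(ads$data, format = "%d.%m.%Y")
key <- interaction(ads$id, ads$ad_id, drop = TRUE)

# Assumed window: rows within 90 days of the ad's first date.
first_date <- ave(as.numeric(ads$date), key, FUN = min)
in_window  <- as.numeric(ads$date) - first_date <= 90

# Group sums restricted to the window, repeated on every row of the group.
price_sum <- ave(ads$price * in_window, key, FUN = sum)
rev_sum   <- ave(ads$rev * in_window, key, FUN = sum)

ads$flag <- as.integer(rev_sum > price_sum)
ads
```

Here ad_id 111 gets flag 1 (rev 4 against price 1) and ad_id 111111 gets flag 0 (rev 0 against price 1).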

convert dataset to longitudinal data structure in R

I have a dataset that looks something like this:
> head(BurnData)
Treatment Gender Race Surface head buttock trunk up.leg low.leg resp.tract type ex.time excision antib.time antibiotic
1 0 0 0 15 0 0 1 1 0 0 2 12 0 12 0
2 0 0 1 20 0 0 1 0 0 0 4 9 0 9 0
3 0 0 1 15 0 0 0 1 1 0 2 13 0 13 0
4 0 0 0 20 1 0 1 0 0 0 2 11 1 29 0
5 0 0 1 70 1 1 1 1 0 0 2 28 1 31 0
6 0 0 1 20 1 0 1 0 0 0 4 11 0 11 0
inf.time infection
1 12 0
2 9 0
3 7 1
4 29 0
5 4 1
6 8 1
I want to run a Cox regression on the variables Surface, ex.time, antib.time and Treatment. Treatment is an indicator variable. Surface denotes the % of body burned. ex.time and antib.time both record time to event in days.
I am aware that to run a time-dependent Cox regression I need to convert the data to a longitudinal structure, but how can I do that in R?
Then I will use the formula:
coxph(formula = Surv(tstart, tstop, infection) ~ covariate)
DATA
> dput(head(BurnData))
structure(list(Treatment = c(0L, 0L, 0L, 0L, 0L, 0L), Gender = c(0L,
0L, 0L, 0L, 0L, 0L), Race = c(0L, 1L, 1L, 0L, 1L, 1L), Surface = c(15L,
20L, 15L, 20L, 70L, 20L), head = c(0L, 0L, 0L, 1L, 1L, 1L), buttock = c(0L,
0L, 0L, 0L, 1L, 0L), trunk = c(1L, 1L, 0L, 1L, 1L, 1L), up.leg = c(1L,
0L, 1L, 0L, 1L, 0L), low.leg = c(0L, 0L, 1L, 0L, 0L, 0L), resp.tract = c(0L,
0L, 0L, 0L, 0L, 0L), type = c(2L, 4L, 2L, 2L, 2L, 4L), ex.time = c(12L,
9L, 13L, 11L, 28L, 11L), excision = c(0L, 0L, 0L, 1L, 1L, 0L),
antib.time = c(12L, 9L, 13L, 29L, 31L, 11L), antibiotic = c(0L,
0L, 0L, 0L, 0L, 0L), inf.time = c(12L, 9L, 7L, 29L, 4L, 8L
), infection = c(0L, 0L, 1L, 0L, 1L, 1L), Surface_discr = structure(c(1L,
1L, 1L, 1L, 2L, 1L), .Label = c("1", "2"), class = "factor"),
ex.time_discr = c(1L, 1L, 1L, 1L, 2L, 1L), antib.time_discr = c(1L,
1L, 1L, 2L, 2L, 1L)), .Names = c("Treatment", "Gender", "Race",
"Surface", "head", "buttock", "trunk", "up.leg", "low.leg", "resp.tract",
"type", "ex.time", "excision", "antib.time", "antibiotic", "inf.time",
"infection", "Surface_discr", "ex.time_discr", "antib.time_discr"
), row.names = c(NA, 6L), class = "data.frame")
