Plotting forceNetwork diagram in R - r

I'm working on a project to plot mobile OS update times, using forceNetwork from D3 in R. I've successfully plotted a few basic iterations using simpleNetwork, but cannot get the more complicated network to correctly plot.
I have read a few posts on this subject, including this one (forceNetwork not displaying any edges), but haven't been able to work out the answer.
Like the guy in the post above, I'm able to plot the graph with lots of nodes, but cannot get them to connect. Here is what I'm looking to achieve with the nodes:
At center, the mobile operating system (Android and iOS)
First outer edge, the OEMs (Motorola, Samsung, Apple, etc.)
Second outer edge, the mobile devices, connected to the appropriate OEMs
For links, I'd like to model accordingly:
The first link, between mobile OS and OEM, just connects them, since there is a known relationship (e.g. Motorola makes Android devices, Apple makes iOS devices). For this, I'm using the numerical variable "promised", which should have a "1" for each record to establish the connection.
The second link, between OEM and device, should be "how.long" (i.e. how long it took them to deliver the actual upgrade). Ideally, I want the length of the link between the nodes to be derived from that time.
I've created "misNodes" and "misLinks" with the following data:
misNodes <- data.frame(operating.system, OEM, Device)
misLinks <- data.frame(promised, updated, how.long)
I've made a bit of progress plotting with legend and key, but haven't been able to structure in the desired way, nor have I had much success drawing the desired linkages. Here are my best attempts, so far:
#Try plotting
forceNetwork(Links=misLinks, Nodes=misNodes, Source="promised", Target="updated", Value="how.long", NodeID="Device", Group="OEM", width=1000, height=1000, opacity=1, zoom=TRUE, legend=TRUE, bounded=TRUE) #doesn't work, but making progress
forceNetwork(Links=misLinks, Nodes=misNodes, Source="promised", Target="how.long", NodeID="Device", Group="OEM", width=800, height=800, opacity=1, zoom=TRUE, legend=TRUE, bounded=TRUE)
forceNetwork(Links=misLinks, Nodes=misNodes, Source="promised", Target="how.long", Value="promised", NodeID="Device", Group="OEM", width=800, height=800, opacity=1, zoom=TRUE, legend=TRUE, bounded=TRUE)
str(misLinks)
'data.frame': 74 obs. of 3 variables:
$ promised: num 1 1 1 1 1 1 1 1 1 1 ...
$ updated : num 1 1 1 1 1 1 1 1 1 0 ...
$ how.long: num 6 3 1 4 1 6 6 6 6 0 ...
str(misNodes)
'data.frame': 74 obs. of 3 variables:
$ operating.system: Factor w/ 2 levels "Android","iOS": 1 1 1 1 1 1 1 1 1 1 ...
$ OEM : Factor w/ 7 levels "Apple","HTC",..: 2 2 2 2 2 2 2 3 3 3 ...
$ Device : Factor w/ 74 levels "Atrix HD","Atrix HD (developer edition)",..: 7 54 55 56 57 58 59 22 20 21 ...
Any tips would be appreciated...
EDIT: added some data:
head(misLinks)
promised updated how.long
1 1 1 6
2 1 1 3
3 1 1 1
4 1 1 4
5 1 1 1
6 1 1 6
head(misNodes)
operating.system OEM Device
1 Android HTC Droid DNA
2 Android HTC One
3 Android HTC One (developer edition)
4 Android HTC One (dual sim)
5 Android HTC One (Google Play Edition)
6 Android HTC One Max
Here is what the third plot above shows when you run it (please note, I'm holding cursor over the "one" node in image to illustrate...for some reason, those ones connect):
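A minimal sketch of the layout described above, under the assumption (per the networkD3 documentation) that forceNetwork expects Source and Target to be zero-based row indices into the Nodes data frame. Passing "promised" (all 1s) and "how.long" as Source and Target would then link the second node, "One", to nodes 6, 3, 1, ..., which may be why only that node appears connected. The toy rows, the "iPhone 5" entry, and the names `devices`, `nodes`, `links` and `node_index` are made up for illustration.

```r
# Hypothetical toy rows in the shape of the question's data (values are made up).
devices <- data.frame(
  operating.system = c("Android", "Android", "iOS"),
  OEM              = c("HTC", "HTC", "Apple"),
  Device           = c("Droid DNA", "One", "iPhone 5"),
  how.long         = c(6, 3, 1),
  stringsAsFactors = FALSE
)

# One node table that holds every OS, OEM and device exactly once.
# (Assumes no OS, OEM and device share the same name.)
nodes <- rbind(
  data.frame(name = unique(devices$operating.system), group = "OS",
             stringsAsFactors = FALSE),
  data.frame(name = unique(devices$OEM), group = "OEM",
             stringsAsFactors = FALSE),
  data.frame(name = devices$Device, group = "Device",
             stringsAsFactors = FALSE)
)

# Zero-based position of each name in the node table.
node_index <- function(x) match(x, nodes$name) - 1

# OS -> OEM links get a constant value; OEM -> device links carry how.long.
os_oem <- unique(devices[, c("operating.system", "OEM")])
links <- rbind(
  data.frame(source = node_index(os_oem$operating.system),
             target = node_index(os_oem$OEM),
             value  = 1),
  data.frame(source = node_index(devices$OEM),
             target = node_index(devices$Device),
             value  = devices$how.long)
)

# Build the widget only if networkD3 is installed; print(p) displays it.
if (requireNamespace("networkD3", quietly = TRUE)) {
  p <- networkD3::forceNetwork(
    Links = links, Nodes = nodes,
    Source = "source", Target = "target", Value = "value",
    NodeID = "name", Group = "group",
    # Link length grows with the link value (the scaling factors are arbitrary).
    linkDistance = networkD3::JS("function(d) { return 30 + d.value * 20; }"),
    opacity = 1, zoom = TRUE, legend = TRUE
  )
}
```

The key design choice is that misNodes and misLinks stop being two views of the same 74 rows: the node table lists entities, and the link table lists pairs of entity positions.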

Related

R- Data Analytic syntax

Purpose: I want to repeat in R an analysis I have already done in Python. The code is below; kindly help me write the equivalent code in R.
Question no 1:
For below table
caught bowled run out lbw stumped
62 21 8 4 4
caught and bowled hit wicket
2 1
But when I converted it back to a `dataframe` for use with `ggplot`, it came out as
A Freq
1 1 1
2 2 1
3 4 2
4 8 1
5 21 1
6 62 1
How do I avoid this? Kindly advise.
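The output shown above (values 1, 2, 4, 8, 21, 62 with a frequency of 2 for the value 4) is what you would get by tabulating the counts themselves, for instance by calling table() on an object that is already a table. A minimal sketch of that guess, with the hypothetical names `dismissal`, `tbl` and `df_counts`, and data rebuilt from the counts in the question:

```r
# Rebuild a raw vector with the same counts as the table in the question.
dismissal <- rep(
  c("caught", "bowled", "run out", "lbw", "stumped",
    "caught and bowled", "hit wicket"),
  times = c(62, 21, 8, 4, 4, 2, 1)
)

tbl <- table(dismissal)

# Tabulating the table again counts how often each count occurs,
# which reproduces the unwanted output from the question.
counts_of_counts <- table(tbl)

# Converting the table directly keeps one row per category.
df_counts <- as.data.frame(tbl, stringsAsFactors = FALSE)
```

`df_counts` has the columns `dismissal` and `Freq`, which can be mapped to the x and y aesthetics in ggplot.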
Question no 2:
Python code is as below:
len(df_warner[df_warner['batsman_runs']==6])
# What is the equivalent R syntax?
df_six <- df_warner2[(df_warner2$batsman_runs==6),]
nrow(df_six) # worked well
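A shorter equivalent, sketched on a made-up `df_warner2`, is to sum the logical comparison directly. One caveat: if the column contains NA, logical subsetting keeps an all-NA row for each missing value, so nrow() can overcount, whereas sum(..., na.rm = TRUE) or which() does not.

```r
# Hypothetical data with one missing value.
df_warner2 <- data.frame(batsman_runs = c(6, 4, 1, 6, NA, 0, 6))

# Count the sixes without building an intermediate data frame.
n_six <- sum(df_warner2$batsman_runs == 6, na.rm = TRUE)

# Logical subsetting keeps a row of NAs for the missing value ...
n_rows_logical <- nrow(df_warner2[df_warner2$batsman_runs == 6, , drop = FALSE])

# ... while which() drops it.
df_six <- df_warner2[which(df_warner2$batsman_runs == 6), , drop = FALSE]
```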

Algorithm to optimally define groups based on multiple responses in R

I have a scheduling puzzle that I am looking for suggestions/solutions using R.
Context
I am coordinating a series of live online group discussions where registered participants will be grouped according to their availability. In a survey, 28 participants (id) indicated morning, afternoon, or evening (am, after, pm) availability on days Monday through Saturday (18 possibilities). I need to generate groups of 4-6 participants who are available at the same time, without replacement (meaning they can only be assigned to one group). Once assigned, groups will meet weekly at the same time (i.e. Group A members will always meet Monday mornings).
Problem
Currently group assignment is being achieved manually (by a human), but with more participants optimizing group assignment will become increasingly challenging. I am interested in finding an algorithm that efficiently achieves relatively equal group placements, and respects other factors such as a person's timezone.
Sample Data
Sample data are in long-format located in an R-script here.
>str(x)
'data.frame': 504 obs. of 4 variables:
$ id : Factor w/ 28 levels "1","10","11",..: 1 12 22 23 24 25 26 27 28 2 ...
$ timezone: Factor w/ 4 levels "Central","Eastern",..: 2 1 3 4 2 1 3 4 2 1 ...
$ day.time: Factor w/ 18 levels "Fri.after","Fri.am",..: 5 5 5 5 5 5 5 5 5 5 ...
$ avail : num 0 0 1 0 1 1 0 1 0 0 ...
The first 12 rows of the data look like this:
> head(x, 12)
id timezone day.time avail
1 1 Eastern Mon.am 0
2 2 Central Mon.am 0
3 3 Mountain Mon.am 1
4 4 Pacific Mon.am 0
5 5 Eastern Mon.am 1
6 6 Central Mon.am 1
7 7 Mountain Mon.am 0
8 8 Pacific Mon.am 1
9 9 Eastern Mon.am 0
10 10 Central Mon.am 0
11 11 Mountain Mon.am 0
12 12 Pacific Mon.am 1
Ideal Solution
An algorithm to optimally define groups (size = 4 to 6) that exactly match on day.time and avail while minimizing differences on other more flexible factors (in this case timezone). In the final result, a participant should only exist in a single group.
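As a rough starting point for the requirement above, here is a naive greedy heuristic in base R. It is not an optimal solver and it ignores timezone; it only guarantees that every group has 4 to 6 members who are all available in the group's slot and that nobody is assigned twice. The simulated availability data and the function name `assign_groups` are assumptions for illustration.

```r
set.seed(1)

# Simulated long-format data in the shape of the question's `x` (values made up).
ids   <- as.character(1:28)
slots <- as.vector(outer(c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
                         c("am", "after", "pm"), paste, sep = "."))
x <- expand.grid(id = ids, day.time = slots, stringsAsFactors = FALSE)
x$avail <- rbinom(nrow(x), 1, 0.3)

# Participant-by-slot logical matrix.
avail <- tapply(x$avail, list(x$id, x$day.time), max) > 0

assign_groups <- function(avail, min_size = 4, max_size = 6) {
  groups <- list()
  free <- rep(TRUE, nrow(avail))
  names(free) <- rownames(avail)
  repeat {
    # How many unassigned participants can attend each slot?
    counts <- colSums(avail[free, , drop = FALSE])
    if (!any(free) || max(counts) < min_size) break
    slot <- names(which.max(counts))
    cand <- rownames(avail)[free & avail[, slot]]
    # Place the least flexible participants first.
    flex   <- rowSums(avail[cand, , drop = FALSE])
    chosen <- cand[order(flex)][seq_len(min(max_size, length(cand)))]
    groups[[length(groups) + 1]] <- list(slot = slot, members = chosen)
    free[chosen] <- FALSE
  }
  groups
}

groups <- assign_groups(avail)
```

Participants left unassigned at the end are those for whom no slot still has at least four free people; an exact formulation (for example as an integer program) would be needed to guarantee the best possible coverage.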
Okay, so I am not the most knowledgeable when it comes to this, but have you looked at the K-Means Clustering algorithm? You can specify the number of clusters you want and the variables for the algorithm to consider. It will then cluster the data into the specified number of clusters (i.e. categories) for you.
What do you think?
References:
https://datascienceplus.com/k-means-clustering-in-r/
http://www.sthda.com/english/wiki/cluster-analysis-in-r-unsupervised-machine-learning

Create a pedigree in R when only 1 parent is known for some individuals?

Is there a way to create a pedigree in R when only 1 parent is known for some individuals?
I've tried using the kinship2 package in R to create a pedigree, but I believe it can only handle cases where either no parents are known (considered generation 1) or both parents are known.
I believe synbreed package is able to deal with this and I have tried the code below but for some reason I receive an error code that I cannot decipher. Is there something wrong with the way my data are structured? Or how I have formulated the arguments of create.pedigree() function? Or is it not possible to construct this pedigree in synbreed package?
Note rows 5 and 6 of data frame 'Ped' which have only one parent ID.
> Ped<-read.csv("Pedigree.csv",header=T)
> library(synbreed)
> head(Ped)
IndividualID SireID DamID Sex
1 019-35751 026-34118 026-34117 male
2 019-35740 <NA> <NA> female
3 019-35791 026-34129 026-34128 male
4 019-35702 <NA> <NA> male
5 019-35784 <NA> 026-34147 female
6 019-35764 <NA> 026-34133 male
> str(Ped)
'data.frame': 1136 obs. of 4 variables:
$ IndividualID: Factor w/ 1136 levels "019-35702","019-35712",..: 6 4 10 1 9 8 3 63 62 108 ...
$ SireID : Factor w/ 136 levels "019-35712","019-35756",..: 8 NA 15 NA NA NA 23 23 23 84 ...
$ DamID : Factor w/ 131 levels "026-34101","026-34103",..: 4 NA 7 NA 13 8 NA NA NA 30 ...
$ Sex : Factor w/ 2 levels "female","male": 2 1 2 2 1 2 2 2 2 2 ...
> create.pedigree(Ped$IndividualID, Ped$SireID, Ped$DamID , unknown = NA)
Error in `[<-.data.frame`(`*tmp*`, is.na(pedigree), value = 0) :
unsupported matrix index in replacement
I don't know the synbreed package and the assumptions it makes about individuals with only one known parent, so the answer below may be different from what you are looking for.
However, if you know/believe that the unknown parents refer to independent individuals then you can "fix" your dataset by adding "empty" parents.
A toy dataset that resembles what you show could be
fid id father mother sex
1 1 . . 1
1 2 . . 2
1 3 1 2 1
1 4 1 2 1
1 5 . 2 2
1 6 . 2 1
Here we have missing fathers for individuals 5 and 6. We then add two new entries to represent the fathers that we have not seen. Thus, the dataset should be
fid id father mother sex
1 1 . . 1
1 2 . . 2
1 3 1 2 1
1 4 1 2 1
1 100 . . 1
1 101 . . 1
1 5 100 2 2
1 6 101 2 1
where we have added two new fathers that are only used to fill out the pedigree structure. This last dataset can be read with
library(kinship2)
indata <- read.table("ped.txt", header=TRUE, na.strings=".")
with(indata, pedigree(id=id, dadid=father, momid=mother, sex=sex, famid=fid))
Now, in the fixed dataset we have made an implicit assumption that individuals 5 and 6 are half siblings and that they cannot be full siblings. If the synbreed package (and the relevant computations) can handle the possibility that those individuals are full siblings, then that is different (and computationally quite difficult) from what I suggest.
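The manual fix described above, adding a placeholder founder for each missing parent, could be automated along these lines. This is a sketch in base R; the function name `fill_single_parents` and the `start_id` of 100 are hypothetical, and it makes the same assumption that each unknown parent is a distinct, unrelated individual.

```r
# Toy pedigree from the answer, with NA for unknown parents.
ped <- data.frame(
  id     = 1:6,
  father = c(NA, NA, 1, 1, NA, NA),
  mother = c(NA, NA, 2, 2, 2, 2),
  sex    = c(1, 2, 1, 1, 2, 1)
)

fill_single_parents <- function(ped, start_id = 100) {
  no_dad <- which(is.na(ped$father) & !is.na(ped$mother))
  no_mom <- which(!is.na(ped$father) & is.na(ped$mother))

  # Fresh IDs for the placeholder parents; they must not clash with real IDs.
  dad_ids <- start_id + seq_along(no_dad) - 1
  mom_ids <- start_id + length(no_dad) + seq_along(no_mom) - 1
  new_ids <- c(dad_ids, mom_ids)
  stopifnot(!any(new_ids %in% ped$id))

  ped$father[no_dad] <- dad_ids
  ped$mother[no_mom] <- mom_ids

  # Placeholder parents are founders: both of their parents are unknown.
  founders <- data.frame(
    id     = new_ids,
    father = rep(NA_real_, length(new_ids)),
    mother = rep(NA_real_, length(new_ids)),
    sex    = c(rep(1, length(dad_ids)), rep(2, length(mom_ids)))
  )
  rbind(founders, ped)
}

fixed <- fill_single_parents(ped)
```

The columns of `fixed` can then be passed to pedigree() in the same way as the `indata` columns above.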

making a table with multiple columns in r

I'm obviously a novice at writing R code.
I have tried multiple solutions to my problem from Stack Overflow but I'm still stuck.
My dataset is carcinoid: patients with a small bowel cancer, with multiple variables.
I would like to know how different variables are distributed:
carcinoid$met_any - with metastatic disease 1=yes, 2=no(computed variable)
carcinoid$liver_mets_y_n - liver metastases 1=yes, 2=no
carcinoid$regional_lymph_nodes_y_n - regional lymph nodes 1=yes, 2=no
peritoneal_carcinosis_y_n - peritoneal carcinosis 1=yes, 2=no
I have tried this solution, which is close to my wanted result:
ddply(carcinoid, .(carcinoid$met_any), summarize,
livermetastases=sum(carcinoid$liver_mets_y_n=="1"),
regionalmets=sum(carcinoid$regional_lymph_nodes_y_n=="1"),
pc=sum(carcinoid$peritoneal_carcinosis_y_n=="1"))
with the result being:
carcinoid$met_any livermetastases regionalmets pc
1 1 21 46 7
2 2 21 46 7
Now, I expected the row with 2 (= no metastases) to be empty. I would also like the rows in the column carcinoid$met_any to give the number of patients.
If someone could help me, it would be very much appreciated!
John
Edit
My dataset, although the actual column numbers are 1, 43, 28, 31, 33
1=yes, 2=no
case_nr met_any liver_mets_y_n regional_lymph_nodes_y_n pc
1 1 1 1 2
2 1 2 1 2
3 2 2 2 2
4 1 2 1 1
5 1 2 1 1
desired output - I want to count the numbers of 1:s and 2:s, if it works, all 1:s should end up in the met_any=1 row
nr liver_mets regional_lymph_nodes pc
met_any=1 4 1 4 2
met_any=2 1 4 1 3
EDIT
Although I was probably very unclear in my question, with your help I could make the table I needed!
setDT(carcinoid)[,lapply(.SD,table),.SDcols=c(43,28,31,33,17)]
gives
met_any lymph_nod liver_met paraortal extrahep
1: 50 46 21 6 15
2: 111 115 140 151 146
I am very grateful! @mtoto provided the solution
John
Based on your example data, this data.table approach works:
library(data.table)
setDT(df)[,lapply(.SD,table),.SDcols=c(2:5)]
# met_any liver_mets_y_n regional_lymph_nodes_y_n pc
# 1: 4 1 4 2
# 2: 1 4 1 3
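For comparison, the same counts can be sketched in base R on the example data. The ddply attempt in the question probably returned identical rows because `carcinoid$liver_mets_y_n` inside summarize refers to the whole data frame, not to the current group; the grouped version below sidesteps that. The object names `counts` and `by_met` are made up.

```r
# Example data from the question (1 = yes, 2 = no).
df <- data.frame(
  case_nr                  = 1:5,
  met_any                  = c(1, 1, 2, 1, 1),
  liver_mets_y_n           = c(1, 2, 2, 2, 2),
  regional_lymph_nodes_y_n = c(1, 1, 2, 1, 1),
  pc                       = c(2, 2, 2, 1, 1)
)

# Number of 1s and 2s per column; fixing the levels keeps empty categories.
counts <- sapply(df[, 2:5], function(col) table(factor(col, levels = 1:2)))

# Number of "yes" answers per variable, split by met_any.
yes_cols <- c("liver_mets_y_n", "regional_lymph_nodes_y_n", "pc")
by_met <- aggregate(df[, yes_cols] == 1,
                    by = list(met_any = df$met_any), FUN = sum)
```

In `by_met`, the met_any = 2 row is all zeros, which is the "empty" row the question expected.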

How can I sort the X axis in a Barplot in R?

I have binned data that looks like this:
(8.048,18.05] (-21.95,-11.95] (-31.95,-21.95] (18.05,28.05] (-41.95,-31.95]
81 76 18 18 12
(-132,-122] (-122,-112] (-112,-102] (-162,-152] (-102,-91.95]
6 6 6 5 5
(-91.95,-81.95] (-192,-182] (28.05,38.05] (38.05,48.05] (58.05,68.05]
5 4 4 4 4
(78.05,88.05] (98.05,108] (-562,-552] (-512,-502] (-482,-472]
4 4 3 3 3
(-452,-442] (-412,-402] (-282,-272] (-152,-142] (48.05,58.05]
3 3 3 3 3
(68.05,78.05] (118,128] (128,138] (-582,-572] (-552,-542]
3 3 3 2 2
(-532,-522] (-422,-412] (-392,-382] (-362,-352] (-262,-252]
2 2 2 2 2
(-252,-242] (-142,-132] (-81.95,-71.95] (148,158] (-1402,-1392]
2 2 2 2 1
(-1372,-1362] (-1342,-1332] (-942,-932] (-862,-852] (-822,-812]
1 1 1 1 1
(-712,-702] (-682,-672] (-672,-662] (-632,-622] (-542,-532]
1 1 1 1 1
(-502,-492] (-492,-482] (-472,-462] (-462,-452] (-442,-432]
1 1 1 1 1
(-432,-422] (-352,-342] (-332,-322] (-312,-302] (-302,-292]
1 1 1 1 1
(-202,-192] (-182,-172] (-172,-162] (-51.95,-41.95] (88.05,98.05]
1 1 1 1 1
(108,118] (158,168] (168,178] (178,188] (298,308]
1 1 1 1 1
(318,328] (328,338] (338,348] (368,378] (458,468]
1 1 1 1 1
How can I plot this data so that the bin is sorted from most negative on the left to most positive on the right? Currently my graph looks like this. Notice that it is not sorted at all. In particular the second bar (value = 76) is placed to the right of the first:
(8.048,18.05] (-21.95,-11.95]
81 76
This is the command I use to plot:
barplot(x,ylab="Number of Unique Tags", xlab="Expected - Observed")
I really want to help answer your question, but I gotta tell you, I can't make heads or tails of your data. I see a lot of opening parentheses but no closing ones. The data looks sorted descending by whatever the values are on the bottom of each row. I have no idea what to make of a value like "(8.048,18.05]".
Am I missing something obvious? Can you make a more simple example where your data structure is not a factor?
I would generally expect a data frame or a matrix with two columns, one for the X and one for the Y.
See if this example of sorting helps (I'm sort of shooting in the dark here)
tN <- table(Ni <- rpois(100, lambda=5))
r <- barplot(tN)
# stop here and examine the plot
# the next bit converts the table to a data frame,
# sorts it by frequency, and plots it again
df<-data.frame(tN)
df2<-df[order(df$Freq),]
barplot(df2$Freq)
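Names such as "(8.048,18.05]" look like the interval labels that cut() produces, where "(" is an open bound and "]" a closed one. Under that assumption, one possible approach is to extract the numeric lower bound from each name and order the table by it. The simulated `obs` values and the break points are made up for this sketch.

```r
set.seed(42)

# Simulated observations, binned and sorted by count as in the question.
obs  <- rnorm(200, mean = -50, sd = 150)
bins <- cut(obs, breaks = seq(-600, 500, by = 50))
x    <- sort(table(bins), decreasing = TRUE)

# Pull the lower bound out of labels like "(-50,0]" and order by it.
lower    <- as.numeric(sub("^\\(([^,]+),.*$", "\\1", names(x)))
x_sorted <- x[order(lower)]

# las = 2 rotates the bin labels so they stay readable.
barplot(x_sorted, las = 2,
        ylab = "Number of Unique Tags", xlab = "Expected - Observed")
```

If the original binned factor is still available, calling table() on it without sorting by count should already give the bins in increasing order, since cut() creates its levels in break order.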
