Is there an R function to help me plot the network connections for a single node?

This is my original dataset. R1, R2 and R3 are word-association responses to the cue word; tf and df are the total frequency and document frequency of the cue word, respectively.
[1]: https://i.stack.imgur.com/wpfZy.png [Image shows original dataframe]
I have cleaned up the dataset into a node list and an edge list, each with over a million rows. Plotting all of this as one network graph would take too long and would also be far too dense to understand.
[2]: https://i.stack.imgur.com/mfSfN.png [Image shows node-list]
[3]: https://i.stack.imgur.com/l60Eu.png [Image shows edge-list]
I want to be able to make a network graph for a single cue word: when I enter a cue word, I should get the network of words that are either responses to it or cue words that it is a response to.
For example, I want to see all the connections for the word 'money'. Using filter(nword == "money") only returns the node 'money' itself, but I want all the nodes connected to the cue word (in this case, 'money').
[4]: https://i.stack.imgur.com/1bKrr.png [Image shows filter()]
Is there a function or a chunk of code that would help me resolve this issue?
from to
1 1
1 6
1 8
1 17
1 18
1 22
1 23
1 38
1 67
1 80
2 82736
2 88035
2 103428
3 11
3 27
3 45
node_id nword n
1 money 13633
2 food 12338
3 water 12276
4 car 8907
5 music 8351
6 green 7890
7 red 7623
8 love 7406
9 sex 6552
10 happy 6432
11 cold 6333
12 bad 6132
13 sad 5958
14 dog 5940
15 white 5910
16 school 5832
17 fun 5594
18 time 5467
19 black 5233
20 hair 5219
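A minimal base-R sketch of the filtering step, using only the sample rows shown above. The key idea is to filter the edge list rather than the node list: keep every edge in which the cue's node_id appears at either end, then keep the nodes at both ends of those edges. It assumes `from` and `to` hold node_id values (the question does not say which direction is cue and which is response, so the function treats both ends the same way); the helper name ego_network() is hypothetical.

```r
# Sample rows copied from the question's edge and node lists
edges <- data.frame(
  from = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  to   = c(1, 6, 8, 17, 18, 22, 23, 38, 67, 80, 82736, 88035, 103428, 11, 27, 45)
)
nodes <- data.frame(
  node_id = 1:20,
  nword = c("money", "food", "water", "car", "music", "green", "red", "love",
            "sex", "happy", "cold", "bad", "sad", "dog", "white", "school",
            "fun", "time", "black", "hair"),
  n = c(13633, 12338, 12276, 8907, 8351, 7890, 7623, 7406, 6552, 6432,
        6333, 6132, 5958, 5940, 5910, 5832, 5594, 5467, 5233, 5219),
  stringsAsFactors = FALSE
)

# Hypothetical helper: 1-step neighbourhood of a cue word.
# Keeps edges where the cue is at either end ("from" or "to"),
# plus the nodes at both ends of those edges.
ego_network <- function(cue, nodes, edges) {
  id <- nodes$node_id[nodes$nword == cue]
  if (length(id) == 0) stop("cue word not found: ", cue)
  sub_edges <- edges[edges$from %in% id | edges$to %in% id, ]
  keep_ids  <- unique(c(id, sub_edges$from, sub_edges$to))
  sub_nodes <- nodes[nodes$node_id %in% keep_ids, ]
  list(nodes = sub_nodes, edges = sub_edges)
}

money <- ego_network("money", nodes, edges)
money$nodes$nword   # only ids 1-20 are in the sample node list
```

In dplyr terms the edge filter is `filter(edges, from %in% id | to %in% id)`. With the complete lists, the subset can then be plotted with igraph, e.g. `g <- igraph::graph_from_data_frame(money$edges, vertices = money$nodes)` followed by `plot(g, vertex.label = igraph::V(g)$nword)`; this requires every id in the edge subset to appear in the node subset, which holds for the full node list but not for the 20-row sample. igraph's make_ego_graph() does the same neighbourhood extraction, but it needs the whole million-edge graph built first, which may be slower than filtering the data frames.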

Related

vcdExtra::datasets not working on some packages

R 3.6.1, vcdExtra 0.7.1
vcdExtra::datasets("caret")
Error in get(x) : object 'GermanCredit' not found
vcdExtra::datasets fails on some packages, such as "caret".
Am I missing something?
Thanks
If you only need the GermanCredit dataset, try this code:
library(caret)
data("GermanCredit")
GermanCredit
You will get:
Duration Amount InstallmentRatePercentage ResidenceDuration Age NumberExistingCredits NumberPeopleMaintenance Telephone
1 6 1169 4 4 67 2 1 0
2 48 5951 2 2 22 1 1 1
3 12 2096 2 3 49 1 2 1
4 42 7882 2 4 45 1 2 1
5 24 4870 3 4 53 2 2 1
Please comment if this is what you need.
Regards,
Alexis
This is the sequence of commands I need to run for vcdExtra::datasets("caret") to work correctly:
library(evtree)
library(caret)
data(Sacramento)
data(tecator)
data(BloodBrain)
data(cox2)
data(dhfr)
data(oil)
data(mdrr)
data(pottery)
data(scat)
data(segmentationData)
vcdExtra::datasets("caret")
The output is:
Item class dim Title
1 GermanCredit data.frame 1000x21 German Credit Data
2 Sacramento data.frame 932x9 Sacramento CA Home Prices
3 absorp matrix 215x100 Fat, Water and Protein Content of Meat Samples
4 bbbDescr data.frame 208x134 Blood Brain Barrier Data
5 cars data.frame 50x2 Kelly Blue Book resale data for 2005 model year GM cars
6 cox2Class factor 462 COX-2 Activity Data
7 cox2Descr data.frame 462x255 COX-2 Activity Data
8 cox2IC50 numeric 462 COX-2 Activity Data
9 dhfr data.frame 325x229 Dihydrofolate Reductase Inhibitors Data
10 endpoints matrix 215x3 Fat, Water and Protein Content of Meat Samples
11 fattyAcids data.frame 96x7 Fatty acid composition of commercial oils
12 logBBB numeric 208 Blood Brain Barrier Data
13 mdrrClass factor 528 Multidrug Resistance Reversal (MDRR) Agent Data
14 mdrrDescr data.frame 528x342 Multidrug Resistance Reversal (MDRR) Agent Data
15 oilType factor 96 Fatty acid composition of commercial oils
16 potteryClass factor 58 Pottery from Pre-Classical Sites in Italy
17 scat data.frame 110x19 Morphometric Data on Scat
18 scat_orig data.frame 122x20 Morphometric Data on Scat
19 segmentationData data.frame 2019x61 Cell Body Segmentation
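If the goal is just to see what data a package ships, a base-R sketch that does not use vcdExtra is to read the package's data index from utils::data(package = ...), load each data file into a private environment, and only then look the object up. Index entries such as "absorp (tecator)" mean that the object absorp is stored in the data file tecator, which matches why running data(tecator) first makes absorp and endpoints available. The function name list_package_data() is hypothetical; the example call uses the base datasets package, and it assumes caret's data files load the same way with data().

```r
# Hypothetical helper: list every dataset in a package with its class and size
list_package_data <- function(pkg) {
  # Index entries look like "mtcars" or "absorp (tecator)"
  items <- utils::data(package = pkg)$results[, "Item"]
  obj  <- sub(" \\(.*\\)$", "", items)                 # object name
  file <- ifelse(grepl("\\(", items),
                 sub("^.*\\((.*)\\)$", "\\1", items),   # data file it lives in
                 obj)
  env <- new.env()
  rows <- lapply(seq_along(obj), function(i) {
    # Load the data file into a private environment, then look the object up
    utils::data(list = file[i], package = pkg, envir = env)
    x <- get(obj[i], envir = env)
    size <- if (is.null(dim(x))) as.character(length(x)) else paste(dim(x), collapse = "x")
    data.frame(Item = obj[i], class = class(x)[1], dim = size,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

res <- list_package_data("datasets")
head(res)
```

For the case in the question the call would be list_package_data("caret"), with caret installed.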

Doing a series of operations on every subset of a data frame

This is a beginner's question about R. I searched and found quite a few solutions that came close (e.g. aggregate, by), but I did not understand them well enough to apply them to my problem. I would really appreciate a more detailed explanation.
Hypothetical dataset:
Name Wheels Color Mileage seat_capacity
1 2 Red 70 2
2 3 Black 60 7
3 4 Blue 12 5
4 4 White 15 6
5 3 Yellow 45 6
6 2 Green 70 2
7 3 Silver 45 6
8 6 Silver 5 4
9 14 Red 12 2
10 2 Black 70 7
11 4 Blue 70 5
12 3 White 60 6
13 4 Yellow 12 6
14 4 Green 15 2
I first created subsets of the data based on color using split():
color <- split(df, df$Color)
For each of these subsets I will do further operations, e.g. finding the vehicles with the highest mileage among the vehicles with the lowest number of wheels in each subset, etc.
I have already written all the rules for the later steps. What I am struggling with is finding a way to run all of these operations on each subset in the list color.
Any help would be appreciated.
The following worked for me, and I sincerely want to thank @Imo and @aosmith for guiding me.
Assume I want to first group df by color, then further by wheels, and then pick the top 2 vehicles by Mileage within each of those subgroups. I used the dplyr package to do this:
library(dplyr)
my_list <- df %>% group_by(Color, Wheels) %>% top_n(2, Mileage)
HTH
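The pattern the question describes (split into a list, then run the same rules on every piece) can also be done in base R: put all the per-subset rules into one function and apply it to each element of the split list with lapply(). A minimal sketch using the hypothetical dataset above and one of the example rules (highest mileage among the vehicles with the fewest wheels); pick_vehicle() is a hypothetical name, and further rules would go inside it.

```r
# Hypothetical dataset from the question
df <- data.frame(
  Name          = 1:14,
  Wheels        = c(2, 3, 4, 4, 3, 2, 3, 6, 14, 2, 4, 3, 4, 4),
  Color         = c("Red", "Black", "Blue", "White", "Yellow", "Green", "Silver",
                    "Silver", "Red", "Black", "Blue", "White", "Yellow", "Green"),
  Mileage       = c(70, 60, 12, 15, 45, 70, 45, 5, 12, 70, 70, 60, 12, 15),
  seat_capacity = c(2, 7, 5, 6, 6, 2, 6, 4, 2, 7, 5, 6, 6, 2),
  stringsAsFactors = FALSE
)

# All rules for one subset go in one function (hypothetical name).
# Example rule: among the vehicles with the fewest wheels, keep the
# one(s) with the highest mileage.
pick_vehicle <- function(sub) {
  fewest <- sub[sub$Wheels == min(sub$Wheels), ]
  fewest[fewest$Mileage == max(fewest$Mileage), ]
}

by_color <- split(df, df$Color)          # list of data frames, one per color
results  <- lapply(by_color, pick_vehicle)
combined <- do.call(rbind, results)      # stack the per-color results
combined
```

do.call(rbind, ...) stacks the per-color results back into one data frame. The same rule in dplyr would be `df %>% group_by(Color) %>% filter(Wheels == min(Wheels)) %>% filter(Mileage == max(Mileage))`. In dplyr 1.0 and later, top_n() is superseded by slice_max(), e.g. `slice_max(Mileage, n = 2)`.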

Making a table with multiple columns in R

I'm obviously a novice at writing R code.
I have tried multiple solutions to my problem from Stack Overflow, but I'm still stuck.
My dataset, carcinoid, contains patients with small bowel cancer and has multiple variables.
I would like to know how the following variables are distributed:
carcinoid$met_any - with metastatic disease 1=yes, 2=no(computed variable)
carcinoid$liver_mets_y_n - liver metastases 1=yes, 2=no
carcinoid$regional_lymph_nodes_y_n - regional lymph nodes 1=yes, 2=no
carcinoid$peritoneal_carcinosis_y_n - peritoneal carcinosis 1=yes, 2=no
I have tried this solution, which is close to the result I want:
ddply(carcinoid, .(carcinoid$met_any), summarize,
livermetastases=sum(carcinoid$liver_mets_y_n=="1"),
regionalmets=sum(carcinoid$regional_lymph_nodes_y_n=="1"),
pc=sum(carcinoid$peritoneal_carcinosis_y_n=="1"))
The result is:
carcinoid$met_any livermetastases regionalmets pc
1 1 21 46 7
2 2 21 46 7
Now, I expected the row with 2 (= no metastases) to be empty. I would also like the rows in the carcinoid$met_any column to give the number of patients.
If someone could help me it would be very much appreciated!
John
Edit
My dataset (in the full data, these are column numbers 1, 43, 28, 31 and 33):
1 = yes, 2 = no
case_nr met_any liver_mets_y_n regional_lymph_nodes_y_n pc
1 1 1 1 2
2 1 2 1 2
3 2 2 2 2
4 1 2 1 1
5 1 2 1 1
Desired output: I want to count the number of 1s and 2s in each column. If it works, all the 1s should end up in the met_any=1 row:
nr liver_mets regional_lymph_nodes pc
met_any=1 4 1 4 2
met_any=2 1 4 1 3
EDIT
Although my question was probably very unclear, with your help I was able to make the table I needed!
setDT(carcinoid)[,lapply(.SD,table),.SDcols=c(43,28,31,33,17)]
gives
met_any lymph_nod liver_met paraortal extrahep
1: 50 46 21 6 15
2: 111 115 140 151 146
I am very grateful! @mtoto provided the solution.
John
Based on your example data, this data.table approach works:
library(data.table)
setDT(df)[,lapply(.SD,table),.SDcols=c(2:5)]
# met_any liver_mets_y_n regional_lymph_nodes_y_n pc
# 1: 4 1 4 2
# 2: 1 4 1 3
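Why the ddply attempt gave identical rows: inside summarize, carcinoid$liver_mets_y_n refers to the whole data frame rather than the current group, so every group gets the overall totals (21, 46, 7); the bare column name (liver_mets_y_n) would use only the group's rows. For the table that was actually wanted (counts of 1s and 2s per column), a base-R equivalent of the data.table answer is to call table() on each column with sapply(). This sketch uses the example rows from the edit.

```r
# Example rows from the question's edit (1 = yes, 2 = no)
carcinoid <- data.frame(
  case_nr                  = 1:5,
  met_any                  = c(1, 1, 2, 1, 1),
  liver_mets_y_n           = c(1, 2, 2, 2, 2),
  regional_lymph_nodes_y_n = c(1, 1, 2, 1, 1),
  pc                       = c(2, 2, 2, 1, 1)
)

# Count the 1s and 2s in every column except case_nr.
# factor(..., levels = 1:2) keeps a 0 count if a column lacks one of the values.
counts <- sapply(carcinoid[, -1], function(x) table(factor(x, levels = 1:2)))
rownames(counts) <- c("yes (1)", "no (2)")
counts
```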

Adding all values of a variable in R [duplicate]

This question already has answers here: How to sum a variable by group (18 answers)
Closed 7 years ago.
I'm not sure how to word the title exactly, so I will do my best to explain below. Sorry in advance for the .csv format.
I have the following example dataset:
print(data)
ID Tag Flowers
1 1 6871 1
2 2 6750 1
3 3 6859 1
4 4 6767 1
5 5 6747 1
6 6 6261 1
7 7 6750 1
8 8 6767 1
9 9 6812 1
10 10 6746 1
11 11 6496 4
12 12 6497 1
13 13 6495 4
14 14 6481 1
15 15 6485 1
Notice that in rows 2 and 7 the tag 6750 appears twice: I observed one flower on plant 6750 on two separate days, which makes two flowers over its lifetime. Basically, I want to add up all the flowers for each tag (6750, 6767, etc.) across the ~100 rows. Each tag appears more than once, usually around 4 or 5 times.
I feel like I need to apply the unlist function here, but I'm a little lost as to how to do so.
Without any extra packages, you can use the aggregate() function:
res <- aggregate(data$Flowers, list(data$Tag), sum)
This sums the values in the Flowers column for each distinct value in the Tag column.
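A runnable version using the rows shown in the question. The formula form aggregate(Flowers ~ Tag, ...) returns columns named Tag and Flowers instead of Group.1 and x; tapply() is another base-R option that returns the sums as a named vector.

```r
# Example rows from the question
data <- data.frame(
  ID      = 1:15,
  Tag     = c(6871, 6750, 6859, 6767, 6747, 6261, 6750, 6767, 6812, 6746,
              6496, 6497, 6495, 6481, 6485),
  Flowers = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 4, 1, 1)
)

# Formula interface: sum Flowers within each Tag, keeping readable column names
res <- aggregate(Flowers ~ Tag, data = data, FUN = sum)
res

# tapply() gives the same sums, named by tag
flowers_per_tag <- tapply(data$Flowers, data$Tag, sum)
flowers_per_tag
```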

Making a new data frame by looking for keywords in a specific variable

I have a big dataset of about 35,000 cases and 32 variables.
One of those variables is Description, which gives a description of the patient's status, for example: "patient suffered ischemic stroke".
Now I would like to make a data frame containing all cases in which the word "stroke", "STROKE" or "Stroke" is found in the variable Description.
Could anyone suggest an efficient way to do this? At the moment I add all the rows by hand, in a very inefficient way:
df1 <- rbind(df[1, ], df[2, ], df[3, ])
It works, but it's unbelievably inelegant and prone to mistakes.
Here I create some example data to work with.
a <- c(1:10)
b <- c(11:20)
description <- c("Stroke","ALS","Parkinsons","STROKE","STROKE","stroke","Alzheimers","Stroke","ALS","Parkinsons")
df<-data.frame(a,b,description)
df
a b description
1 1 11 Stroke
2 2 12 ALS
3 3 13 Parkinsons
4 4 14 STROKE
5 5 15 STROKE
6 6 16 stroke
7 7 17 Alzheimers
8 8 18 Stroke
9 9 19 ALS
10 10 20 Parkinsons
With this code you can remove every case (row) that is not associated with "Stroke", "STROKE" or "stroke":
df1<-df[!(df$description!="STROKE" & df$description!="Stroke" & df$description!="stroke"),]
df1
a b description
1 1 11 Stroke
4 4 14 STROKE
5 5 15 STROKE
6 6 16 stroke
8 8 18 Stroke
Hope this was what you were looking for.
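The comparison above keeps only rows whose description is exactly one of the three spellings, so a free-text entry such as "patient suffered ischemic stroke" would be dropped. A sketch that finds the word anywhere in the text, regardless of case, uses grepl() with ignore.case = TRUE. The rows below are made up for illustration.

```r
# Made-up rows with free-text descriptions, for illustration only
df <- data.frame(
  a = 1:6,
  description = c("patient suffered ischemic stroke", "ALS", "STROKE",
                  "Parkinsons", "Stroke, left hemisphere", "heatstroke"),
  stringsAsFactors = FALSE
)

# grepl() is TRUE wherever the pattern occurs anywhere in the text;
# ignore.case = TRUE covers "stroke", "Stroke" and "STROKE" in one test.
# \\b marks word boundaries, so "heatstroke" is not matched.
is_stroke <- grepl("\\bstroke\\b", df$description, ignore.case = TRUE, perl = TRUE)
df1 <- df[is_stroke, ]
df1
```

Drop the \\b word boundaries if any substring match (including words like "heatstroke") should count.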
