I want to visualize a dynamic network using the ndtv package in R.
My dataset (data) looks like this:
0 1 Apple Banana
0 1 Peach Banana
0 1 Apple Strawberry
1 2 Apple Banana
1 2 Apple Peach
2 3 Banana Peach
…
So the columns are onset, terminus, tail, head.
If I want to create a networkDynamic object from this list by
nw <- networkDynamic(edge.spells=data)
I get an error saying "the tail column of the edge.spells argument to networkDynamic must be a numeric vertex id". So I guess I need to convert those strings into numeric vertex ids. How do I do that? And if I do, how do I keep the names? I don't want a network that just displays the numeric IDs; I want to see the actual names in the network.
I couldn't find any useful information by searching the web, and this tutorial doesn't show what I want to do. I would've liked to see how they actually constructed the short.stergm.sim data instead of just using it.
Any help is very much appreciated!
I found a way to map ids to the names.
names <- unique(c(data$head, data$tail))
data$head <- match(data$head, names)
data$tail <- match(data$tail, names)
And then I could create the networkDynamic object
nw <- networkDynamic(edge.spells = data)
and add the names to the network
network.vertex.names(nw) <- names
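Putting it together, here is a self-contained sketch of the whole approach. The data frame is rebuilt by hand from the example rows above (column order onset, terminus, tail, head is what edge.spells expects), and the final plot call is just one way to check that the labels show up:

```r
library(networkDynamic)

# rebuild the example rows; edge.spells expects onset, terminus, tail, head
data <- data.frame(
  onset    = c(0, 0, 0, 1, 1, 2),
  terminus = c(1, 1, 1, 2, 2, 3),
  tail     = c("Apple", "Peach", "Apple", "Apple", "Apple", "Banana"),
  head     = c("Banana", "Banana", "Strawberry", "Banana", "Peach", "Peach"),
  stringsAsFactors = FALSE
)

# map names to integer vertex ids, keeping the mapping around
names <- unique(c(data$head, data$tail))
data$head <- match(data$head, names)
data$tail <- match(data$tail, names)

nw <- networkDynamic(edge.spells = data)
network.vertex.names(nw) <- names

# the stored names are then used when labels are displayed, e.g.
plot(nw, displaylabels = TRUE)
```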
This post helped me a lot.
I've previously asked a very similar question, which was superbly answered, but I have since changed the search terms slightly to multiple words, so I am posting a fresh question with updated code and an example.
I have a use case with lots of 'lookup tables', i.e. data frames containing strings that I am searching for within rows of a second, large data frame. I need to extract a row whenever a lookup string occurs somewhere inside a longer string, and when a match is found I need the whole row from both the large data frame and the lookup table.
I've achieved what I need via a nested for loop, but my actual dataset is massive and the lookup table will be circa 50,000 rows, so a for loop is going to be very inefficient. I have had some success with dplyr::semi_join, but that only works when the entries match exactly, whereas I am searching for a shorter phrase inside a longer string:
fruit_lookup <- data.frame(fruit=c("banana drop","apple juice","pear","plum"), rating=c(3,4,3,5))
products <- data.frame(product_code=c("535A","535B","283G","786X","765G"), product_name=c("banana drop syrup","apple juice concentrate","melon juice","coconut oil","strawberry jelly"))
results <- data.frame(product_code=NA, product_name=NA, fruit=NA, rating=NA)
for (i in 1:nrow(products)) {
  for (j in 1:nrow(fruit_lookup)) {
    if (stringr::str_detect(products$product_name[i], fruit_lookup$fruit[j])) {
      results <- tibble::add_row(results)
      results$product_code[i] <- products$product_code[i]
      results$product_name[i] <- products$product_name[i]
      results$fruit[i] <- fruit_lookup$fruit[j]
      results$rating[i] <- fruit_lookup$rating[j]
      break
    }
  }
}
results <- stats::na.omit(results)
print(results)
This yields the result I am wanting:
product_code            product_name       fruit rating
        535A       banana drop syrup banana drop      3
        535B apple juice concentrate apple juice      4
Any advice gratefully received and I won't be hurt if I have missed something obvious. Please feel free to critique my other coding practices, which may not be ideal!
This seems like a regex-join. Up-front, I'm not certain how well this scales with any of the offerings:
fuzzyjoin::regex_inner_join(products, fruit_lookup, by = c("product_name" = "fruit"))
# product_code product_name fruit rating
# 1 535A banana drop syrup banana drop 3
# 2 535B apple juice concentrate apple juice 4
Similarly, sqldf:
sqldf::sqldf("
select p.*, f.*
from fruit_lookup f
inner join products p on p.product_name like '%'||f.fruit||'%'
")
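One caveat with the regex join: the lookup strings are interpreted as regular expressions, so entries containing metacharacters (., +, parentheses, ...) could misbehave. A fixed-substring variant can be sketched with fuzzyjoin's generic matcher (this is my own addition, not something from the question):

```r
# substring match with no regex interpretation; match_fun is applied to
# vectors of candidate values from each side, so a vectorized detector
# like str_detect with fixed() works directly
fuzzyjoin::fuzzy_inner_join(
  products, fruit_lookup,
  by = c("product_name" = "fruit"),
  match_fun = function(x, y) stringr::str_detect(x, stringr::fixed(y))
)
```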
I am very new to GO analysis and I am a bit confused about how to do it for my list of genes.
I have a list of genes (n=10):
gene_list
SYMBOL ENTREZID GENENAME
1 AFAP1 60312 actin filament associated protein 1
2 ANAPC11 51529 anaphase promoting complex subunit 11
3 ANAPC5 51433 anaphase promoting complex subunit 5
4 ATL2 64225 atlastin GTPase 2
5 AURKA 6790 aurora kinase A
6 CCNB2 9133 cyclin B2
7 CCND2 894 cyclin D2
8 CDCA2 157313 cell division cycle associated 2
9 CDCA7 83879 cell division cycle associated 7
10 CDCA7L 55536 cell division cycle associated 7-like
and I simply want to find their function; I've been advised to use GO analysis tools.
I am not sure whether this is the correct way to do so.
here is my solution:
library(org.Hs.eg.db)
x <- org.Hs.egGO
# Map the Entrez gene identifiers to their GO IDs (keys must be characters)
xx <- as.list(x[as.character(gene_list$ENTREZID)])
So, I've got a list with EntrezID that are assigned to several GO terms for each genes.
for example:
> xx$`60312`
$`GO:0009966`
$`GO:0009966`$GOID
[1] "GO:0009966"
$`GO:0009966`$Evidence
[1] "IEA"
$`GO:0009966`$Ontology
[1] "BP"
$`GO:0051493`
$`GO:0051493`$GOID
[1] "GO:0051493"
$`GO:0051493`$Evidence
[1] "IEA"
$`GO:0051493`$Ontology
[1] "BP"
My question is: how can I find the function of each of these genes in a simpler way, and am I doing this right so far? I want to add the function to gene_list as a function/GO column.
Thanks in advance,
EDIT: There is a new Bioinformatics SE (currently in beta mode).
I hope I got what you are aiming at here.
BTW, for bioinformatics-related topics you can also have a look at Biostars, which has the same purpose as SO but for bioinformatics.
If you just want a list of the functions related to each gene, you can query a database such as Ensembl through the biomaRt Bioconductor package, which is an API for querying BioMart databases.
You will need an internet connection to run the queries, though.
Bioconductor provides packages for bioinformatics studies, and these packages generally come with good vignettes that walk you through the different steps of the analysis (and even highlight how you should design your data, or what some of the pitfalls are).
In your case, directly from the biomaRt vignette - task 2 in particular:
Note: there are slightly quicker ways than the one I report below.
# load the library
library("biomaRt")
# I prefer Ensembl, so that's the one I will query, but you can
# query other marts; try out: listMarts()
ensembl = useMart("ensembl")
# as it seems that you are looking for human genes:
ensembl = useDataset("hsapiens_gene_ensembl", mart = ensembl)
# if you want other model organisms, have a look at:
# listDatasets(ensembl)
You need to create your query (your list of ENTREZ ids). To see which filters you can query:
filters = listFilters(ensembl)
And then you want to retrieve attributes: your GO number and description. To see the list of available attributes:
attributes = listAttributes(ensembl)
For you, the query would look something like this:
goids = getBM(
  # you want entrezgene so you know which gene is which; go_id is the GO ID
  # and name_1006 is actually the identifier of the 'GO term name'
  attributes = c('entrezgene', 'go_id', 'name_1006'),
  filters = 'entrezgene',
  values = gene_list$ENTREZID,
  mart = ensembl)
The query itself can take a while.
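Since the aim was to attach the GO information to gene_list as extra columns, the result of getBM can be merged back onto it (a sketch; it assumes the query above succeeded and returned its entrezgene column):

```r
# left join: keeps every gene, one row per (gene, GO term) pair
annotated <- merge(gene_list, goids,
                   by.x = "ENTREZID", by.y = "entrezgene",
                   all.x = TRUE)
```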
Then you can always collapse the information into two columns (but I wouldn't recommend that for anything other than reporting purposes).
Go.collapsed <- Reduce(rbind, lapply(gene_list$ENTREZID, function(x) {
  tempo <- goids[goids$entrezgene == x, ]
  return(
    data.frame('ENTREZGENE' = x,
               'Go.ID' = paste(tempo$go_id, collapse = ' ; '),
               'GO.term' = paste(tempo$name_1006, collapse = ' ; '))
  )
}))
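An equivalent collapse can also be written with base aggregate (a sketch, assuming goids has the entrezgene, go_id and name_1006 columns from the query above):

```r
# one row per gene, with GO ids and term names collapsed into single strings
Go.collapsed <- aggregate(cbind(go_id, name_1006) ~ entrezgene,
                          data = goids,
                          FUN = paste, collapse = " ; ")
```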
Edit:
If you want to query a past version of the ensembl database:
ens82 <- useMart(host = 'sep2015.archive.ensembl.org',
                 biomart = 'ENSEMBL_MART_ENSEMBL',
                 dataset = 'hsapiens_gene_ensembl')
and then the query would be:
goids = getBM(attributes=c('entrezgene','go_id', 'name_1006'),
filters='entrezgene',values=gene_list$ENTREZID,
mart=ens82)
However, if you had in mind to do a GO enrichment analysis, your list of genes is too short.
I have a dataset which is as follows
Id Name Description Status
1 Kyla DataMining Yes
2 Kim MonteCarlo Methods No
3 Kanye Meta-Analysis May Be
4 Bruce Optimization Yes
I am trying to create a fourth column Result which will store the values from Description column if Status == Yes, if Status == No or May Be then it will just copy the values from Status which is No or May Be. The final dataset should look like this
Id Name Description Status Result
1 Kyla DataMining Yes DataMining
2 Kim MonteCarlo Methods No No
3 Kanye Meta-Analysis May Be May Be
4 Bruce Optimization Yes Optimization
So far I tried doing this using ifelse:
data1$Result <- ifelse(data1$Status == "Yes", data1$Description, data1$Status)
I don't get any error, but I don't get the right results either; I am seeing some completely unrelated numbers. Need some help.
It is because your variables Description and Status are stored as factors. You can see this using str(data1). Try converting them to character first using as.character():
data1$Status<-as.character(data1$Status)
data1$Description<-as.character(data1$Description)
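To make the fix concrete, here is a minimal self-contained sketch (the data frame is rebuilt from the example above; stringsAsFactors = TRUE mimics the pre-R-4.0 default that triggers the problem):

```r
data1 <- data.frame(
  Id = 1:4,
  Name = c("Kyla", "Kim", "Kanye", "Bruce"),
  Description = c("DataMining", "MonteCarlo Methods", "Meta-Analysis", "Optimization"),
  Status = c("Yes", "No", "May Be", "Yes"),
  stringsAsFactors = TRUE  # factors: ifelse would return the underlying integer codes
)

data1$Status <- as.character(data1$Status)
data1$Description <- as.character(data1$Description)

# now ifelse returns the intended strings
data1$Result <- ifelse(data1$Status == "Yes", data1$Description, data1$Status)
```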
I have a set of data that I am importing from a csv
info <- read.csv("test.csv")
here is an example of what it would look like
name type purchase
1 mark new yes
2 steve old no
3 jim old yes
4 bill new yes
What I want to do:
I want to loop through the purchase column and change all the yes's to True & no's to be False. Then loop through the type column and change all the old's to customer.
I've tried to mess with all the different applys and couldn't get it to work. I've also tried a bunch of the methods in this thread: Replace a value in a data frame based on a conditional (`if`) statement in R, but still no luck.
Any help or guidance would be much appreciated!
Thanks,
Nico
Here's an approach using within, basic character substitution, and basic tests for character equivalence.
within(mydf, {
  type <- gsub("old", "customer", type)
  purchase <- purchase == "yes"
})
# name type purchase
# 1 mark new TRUE
# 2 steve customer FALSE
# 3 jim customer TRUE
# 4 bill new TRUE
I've used gsub to replace "type", but there are other approaches that can be taken (eg factor, ifelse, and so on).
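For instance, the ifelse route mentioned above could look like this (assuming the data frame from the question is called info, and using as.character in case the columns were read in as factors):

```r
# same result as the within() version, column by column
info$type     <- ifelse(info$type == "old", "customer", as.character(info$type))
info$purchase <- info$purchase == "yes"
```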
I am new to R and my question should be trivial. I need to create a word cloud from a txt file containing the words and their occurrence number. For that purposes I am using the snippets package.
As can be seen at the bottom of the link, first I have to create a vector (is it right that words is a vector?) like below.
> words <- c(apple=10, pie=14, orange=5, fruit=4)
My problem is to do the same thing but create the vector from a file which would contain words and their occurrence number. I would be very happy if you could give me some hints.
Moreover, to understand the format of the file to be inserted I write the vector words to a file.
> write(words, file="words.txt")
However, the file words.txt contains only the values, not the names (apple, pie, etc.).
$ cat words.txt
10 14 5 4
Thanks.
words is a named vector, the distinction is important in the context of the cloud() function if I read the help correctly.
Write the data out correctly to a file:
write.table(words, file = "words.txt")
Create your word occurrence file like the txt file created. When you read it back in to R, you need to do a little manipulation:
> newWords <- read.table("words.txt", header = TRUE)
> newWords
x
apple 10
pie 14
orange 5
fruit 4
> words <- newWords[,1]
> names(words) <- rownames(newWords)
> words
apple pie orange fruit
10 14 5 4
What we are doing here is reading the file into newWords, then subsetting it to take the one and only column (variable), which we store in words. The last step is to take the row names from the file read in and apply them as the "names" on the words vector. We do the last step using the names() function.
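The last two steps can also be collapsed into one with setNames, which returns its first argument with the names already attached:

```r
# equivalent to the two-step subset-then-names() version above
words <- setNames(newWords[, 1], rownames(newWords))
```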
Yes, 'vector' is the proper term.
EDIT:
A better method than write.table would be to use save() and load():
save(words, file = "svwrd.rda")
load(file="svwrd.rda")
The save/load combo preserves all the structure rather than doing coercion. write.table followed by names()<- is a bit of a hassle, as you can see in both Gavin's answer here and my answer on R-help.
Initial answer:
I suggest you use as.data.frame to coerce to a data frame and then write.table() to write it to a file.
write.table(as.data.frame(words), file="savew.txt")
saved <- read.table(file="savew.txt")
saved
words
apple 10
pie 14
orange 5
fruit 4