Generating an answer Excel from all generated exams in R/exams

I'm a professor at University College Ghent (Belgium) and we are brainstorming about the organization of our exams for Research Techniques (which involves quite a bit of statistics). We are thinking of generating individual exams for all our students, but we want to make the grading as easy as possible.
We were thinking of generating n exams using R/exams and letting the students answer via Google Forms / OneDrive Forms or a similar platform, so as to obtain an Excel file with all the answers from the students. The content of each answer vector would differ, although the type of answer for each question would be the same.
It would be awesome if we could generate an answer Excel sheet with the correct answers per generated exam: that way we would only need to compare the answers provided by the students with the answers generated by R/exams. Is such functionality available or possible?
With kind regards
Jens Buysse

The functionality you are looking for is not readily available in R/exams, but it is not too hard to write a little code that puts it together.
All exams2xyz() interfaces return a list of exams, each containing a list of exercises, which in turn contain (among other things) the meta-information for each question. You can extract this and put it into an Excel sheet.
You can also use the exams_metainfo() extractor to display the information within R.
As a simple example consider:
library("exams")
set.seed(0)
exm <- exams2html(c("swisscapital.Rmd", "deriv.Rmd"), n = 3)
Now exm is a list of n = 3 exams, each containing 2 exercises, from which the meta-information can then be extracted. For example, for the first exercise in the first exam:
exm[[1]][[1]]$metainfo$name
## [1] "Swiss Capital"
exm[[1]][[1]]$metainfo$solution
## [1] FALSE FALSE FALSE TRUE FALSE
exm[[1]][[1]]$metainfo$string
## [1] "Swiss Capital: 4"
To display this information in R:
exams_metainfo(exm)
## exam1
## 1. Swiss Capital: 4
## 2. derivative exp: 55.25 (55.24--55.26)
##
## exam2
## 1. Swiss Capital: 2
## 2. derivative exp: 1.79 (1.78--1.8)
##
## exam3
## 1. Swiss Capital: 4
## 2. derivative exp: 46.73 (46.72--46.74)
You can also get just one exam via the print() method:
print(exams_metainfo(exm), 2)
## exam2
## 1. Swiss Capital: 2
## 2. derivative exp: 1.79 (1.78--1.8)
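As a sketch of the Excel export asked for above (this is not a built-in R/exams feature): collect the metainfo string of every exercise into a data frame with one row per exam, and write it to an .xlsx file. The writexl package is assumed here as one of several options for writing Excel files.
library("writexl")
# one row per exam, one column per exercise, each cell containing "name: solution"
ans <- t(sapply(exm, function(exam) sapply(exam, function(ex) ex$metainfo$string)))
ans <- data.frame(exam = seq_along(exm), ans)
colnames(ans)[-1] <- paste0("exercise", seq_len(ncol(ans) - 1))
write_xlsx(ans, "answers.xlsx")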


How to extract the journal names from a PubMed search

I am trying to perform a search on a specific author.
I can look the author up, but I don't know how to extract citations, or plot the journals that he or she has published papers in.
library(RISmed)
#now let's look up this author
res <- EUtilsSummary('Gene Myers', type='esearch', db='pubmed')
summary(res)
The first thing to notice is that what you already produced contains the PubMed IDs
for the papers that match your query.
res@PMID
[1] "30481296" "29335514" "26102528" "25333104" "23541733" "22743769"
[7] "21685076" "20937014" "20122179" "19447790" "12804086" "12061009"
Knowing the IDs, you can retrieve detailed information on all of them
using EUtilsGet
res2 = EUtilsGet(res@PMID)
Now we can get the items required for a citation from res2.
ArticleTitle(res2) ## Article Titles
Title(res2) ## Publication Names
YearPubmed(res2) ## Year of publication
Volume(res2) ## Volume
Issue(res2) ## Issue number
Author(res2) ## Lists of Authors
There is much more information embedded in the res2 object.
If you look at the help page ?Medline, you can get a good idea
of the other information.
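For example, a minimal sketch combining the accessors above into a single citation table (the column names here are arbitrary):
citations <- data.frame(
  Title   = ArticleTitle(res2),
  Journal = Title(res2),
  Year    = YearPubmed(res2),
  Volume  = Volume(res2),
  Issue   = Issue(res2),
  stringsAsFactors = FALSE
)
# Author(res2) returns a list, so it is left out of this flat table
head(citations)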
When you retrieve the detailed information on the selected articles using EUtilsGet, the journal information is stored as an ISO-abbreviated term.
library(RISmed)
#now let's look up this author
res <- EUtilsSummary('Gene Myers', type='esearch', db='pubmed')
summary(res)
res2 = EUtilsGet(res, db = "pubmed")
sort(table(res2@ISOAbbreviation), decreasing = T)[1:5] ## Top 5 journals
Gigascience  Bioinformatics  J Comput Biol  BMC Bioinformatics  Curr Biol
          3               2              2                   1          1

Extract words starting with # in R dataframe and save as new column

My dataframe column looks like this:
head(tweets_date$Tweet)
[1] b"It is #DineshKarthik's birthday and here's a rare image of the captain of #KKRiders. Have you seen him do this before? Happy birthday, DK\\xf0\\x9f\\x98\\xac
[2] b'The awesome #IPL officials do a wide range of duties to ensure smooth execution of work! Here\\xe2\\x80\\x99s #prabhakaran285 engaging with the #ChennaiIPL kid-squad that wanted to meet their daddies while the presentation was on :) #cutenessoverload #lineofduty \\xf0\\x9f\\x98\\x81
[3] b'\\xf0\\x9f\\x8e\\x89\\xf0\\x9f\\x8e\\x89\\n\\nCHAMPIONS!!
[4] b'CHAMPIONS - 2018 #IPLFinal
[5] b'Chennai are Super Kings. A fairytale comeback as #ChennaiIPL beat #SRH by 8 wickets to seal their third #VIVOIPL Trophy \\xf0\\x9f\\x8f\\x86\\xf0\\x9f\\x8f\\x86\\xf0\\x9f\\x8f\\x86. This is their moment to cherish, a moment to savour.
[6] b"Final. It's all over! Chennai Super Kings won by 8 wickets
These are tweets containing mentions that start with '#'. I need to extract all of them and save the mentions from each tweet as a single string like "#mention1 #mention2". Currently my code just extracts them as lists.
My code:
tweets_date$Mentions<-str_extract_all(tweets_date$Tweet, "#\\w+")
How do I collapse those lists in each row to form a single string separated by spaces, as described above?
Thanks in advance.
I believe it would be best if you used an 'AsIs' list column (created with I()) in this case:
extract words:
library(stringr)
Mentions <- str_extract_all(lis, "#\\w+")
some data frame:
df <- data.frame(col = 1:6, lett = LETTERS[1:6])
create a list column:
df$Mentions <- I(Mentions)
df
#output
col lett Mentions
1 1 A #DineshK....
2 2 B #IPL, #p....
3 3 C
4 4 D
5 5 E #ChennaiIPL
6 6 F
I think this is better since it allows for quite easy subsetting:
df$Mentions[[1]]
#output
[1] "#DineshKarthik" "#KKRiders"
df$Mentions[[1]][1]
#output
[1] "#DineshKarthik"
and it succinctly shows what's inside the column when printing the df.
data:
lis <- c("b'It is #DineshKarthik's birthday and here's a rare image of the captain of #KKRiders. Have you seen him do this before? Happy birthday, DK\\xf0\\x9f\\x98\\xac",
"b'The awesome #IPL officials do a wide range of duties to ensure smooth execution of work! Here\\xe2\\x80\\x99s #prabhakaran285 engaging with the #ChennaiIPL kid-squad that wanted to meet their daddies while the presentation was on :) #cutenessoverload #lineofduty \\xf0\\x9f\\x98\\x81",
"b'\\xf0\\x9f\\x8e\\x89\\xf0\\x9f\\x8e\\x89\\n\\nCHAMPIONS!!",
"b'CHAMPIONS - 2018 #IPLFinal",
"b'Chennai are Super Kings. A fairytale comeback as #ChennaiIPL beat #SRH by 8 wickets to seal their third #VIVOIPL Trophy \\xf0\\x9f\\x8f\\x86\\xf0\\x9f\\x8f\\x86\\xf0\\x9f\\x8f\\x86. This is their moment to cherish, a moment to savour.",
"b'Final. It's all over! Chennai Super Kings won by 8 wickets")
The str_extract_all function from the stringr package returns a list of character vectors. So, if you instead want a single comma-separated string per tweet, you may try using sapply for a base R option:
tweets <- str_extract_all(tweets_date$Tweet, "#\\w+")
tweets_date$Mentions <- sapply(tweets, function(x) paste(x, collapse=", "))
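Since the question asked for the mentions separated by spaces, the same approach with collapse = " " produces exactly that format:
tweets_date$Mentions <- sapply(tweets, paste, collapse = " ")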
Via Twitter's help site: "Your username cannot be longer than 15 characters. Your real name can be longer (20 characters), but usernames are kept shorter for the sake of ease. A username can only contain alphanumeric characters (letters A-Z, numbers 0-9) with the exception of underscores, as noted above. Check to make sure your desired username doesn't contain any symbols, dashes, or spaces."
Note that email addresses can be in tweets as can URLs with #'s in them (and not just the silly URLs with username/password in the host component). Thus, something like:
(^|[^[[:alnum:]_]#/\\!?=&])#([[:alnum:]_]{1,15})\\b
is likely a better, safer choice.

Text summarization in R language

I have a long text file and, using R, I want to summarize the text in 10 to 20 lines, or in a few short sentences.
How can I summarize text in about 10 lines with R?
You may try this (from the LSAfun package):
genericSummary(D,k=1)
whereby 'D' specifies your text document and 'k' the number of sentences to be used in the summary. (Further options are described in the package documentation.)
For more information:
http://search.r-project.org/library/LSAfun/html/genericSummary.html
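A minimal, self-contained illustration (the example text below is made up for demonstration):
library(LSAfun)
D <- "Text mining is the process of deriving information from text. It involves structuring the input, deriving patterns, and evaluating the output. Typical applications include categorization, clustering, and summarization."
genericSummary(D, k = 1)  # returns the single most representative sentence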
There's a package called lexRankr that summarizes text in the same way that Reddit's /u/autotldr bot summarizes articles. This article has a full walkthrough of how to use it, but here is a quick example so you can test it yourself in R:
#load needed packages
library(xml2)
library(rvest)
library(lexRankr)
#url to scrape
monsanto_url = "https://www.theguardian.com/environment/2017/sep/28/monsanto-banned-from-european-parliament"
#read page html
page = xml2::read_html(monsanto_url)
#extract text from page html using selector
page_text = rvest::html_text(rvest::html_nodes(page, ".js-article__body p"))
#perform lexrank for top 3 sentences
top_3 = lexRankr::lexRank(page_text,
                          #only 1 article; repeat same docId for all of input vector
                          docId = rep(1, length(page_text)),
                          #return 3 sentences to mimic /u/autotldr's output
                          n = 3,
                          continuous = TRUE)
#reorder the top 3 sentences to be in order of appearance in article
order_of_appearance = order(as.integer(gsub("_","",top_3$sentenceId)))
#extract sentences in order of appearance
ordered_top_3 = top_3[order_of_appearance, "sentence"]
> ordered_top_3
[1] "Monsanto lobbyists have been banned from entering the European parliament after the multinational refused to attend a parliamentary hearing into allegations of regulatory interference."
[2] "Monsanto officials will now be unable to meet MEPs, attend committee meetings or use digital resources on parliament premises in Brussels or Strasbourg."
[3] "A Monsanto letter to MEPs seen by the Guardian said that the European parliament was not “an appropriate forum” for discussion on the issues involved."

Gene ontology (GO) analysis for a list of Genes (with ENTREZID) in R?

I am very new to GO analysis and I am a bit confused about how to do it for my list of genes.
I have a list of genes (n=10):
gene_list
SYMBOL ENTREZID GENENAME
1 AFAP1 60312 actin filament associated protein 1
2 ANAPC11 51529 anaphase promoting complex subunit 11
3 ANAPC5 51433 anaphase promoting complex subunit 5
4 ATL2 64225 atlastin GTPase 2
5 AURKA 6790 aurora kinase A
6 CCNB2 9133 cyclin B2
7 CCND2 894 cyclin D2
8 CDCA2 157313 cell division cycle associated 2
9 CDCA7 83879 cell division cycle associated 7
10 CDCA7L 55536 cell division cycle associated 7-like
and I simply want to find their function; it was suggested that I use GO analysis tools.
I am not sure if this is the correct way to do so.
here is my solution:
library(org.Hs.eg.db)  # provides the org.Hs.egGO map
x <- org.Hs.egGO
# Get the Entrez gene identifiers that are mapped to a GO ID
xx <- as.list(x[gene_list$ENTREZID])
So I've got a list, keyed by Entrez ID, in which each gene is assigned to several GO terms.
for example:
> xx$`60312`
$`GO:0009966`
$`GO:0009966`$GOID
[1] "GO:0009966"
$`GO:0009966`$Evidence
[1] "IEA"
$`GO:0009966`$Ontology
[1] "BP"
$`GO:0051493`
$`GO:0051493`$GOID
[1] "GO:0051493"
$`GO:0051493`$Evidence
[1] "IEA"
$`GO:0051493`$Ontology
[1] "BP"
My question is:
how can I find the function for each of these genes in a simpler way, and am I doing it right?
I am asking because I want to add the function to gene_list as a function/GO column.
Thanks in advance,
EDIT: There is a new Bioinformatics SE (currently in beta mode).
I hope I get what you are aiming at here.
BTW, for bioinformatics-related topics you can also have a look at biostar, which has the same purpose as SO but for bioinformatics.
If you just want a list of the functions related to each gene, you can query a database such as ENSEMBL through the biomaRt Bioconductor package, which is an API for querying BioMart databases.
You will need internet access to run the query, though.
Bioconductor provides packages for bioinformatics studies, and these packages generally come with good vignettes that walk you through the different steps of the analysis (and even highlight how you should design your data, or what some of the pitfalls would be).
In your case, directly from the biomaRt vignette (task 2 in particular).
Note: there are slightly quicker ways than the one I report below:
# load the library
library("biomaRt")
# I prefer ensembl, so that is the one I will query, but you can
# query other databases; try out: listMarts()
ensembl = useMart("ensembl")
# as it seems that you are looking for human genes:
ensembl = useDataset("hsapiens_gene_ensembl",mart=ensembl)
# if you want other model organisms have a look at:
#listDatasets(ensembl)
You need to create your query (your list of Entrez IDs). To see which filters you can query:
filters = listFilters(ensembl)
Then you want to retrieve the attributes: your GO number and description. To see the list of available attributes:
attributes = listAttributes(ensembl)
For you, the query would look something like this:
goids = getBM(
  # you want entrezgene so you know which gene is which; 'go_id' is the GO ID
  # and 'name_1006' is actually the identifier of the 'GO term name'
  attributes = c('entrezgene', 'go_id', 'name_1006'),
  filters = 'entrezgene',
  values = gene_list$ENTREZID,
  mart = ensembl)
The query itself can take a while.
Then you can always collapse the information into two columns (but I wouldn't recommend that for anything other than reporting purposes).
Go.collapsed <- Reduce(rbind, lapply(gene_list$ENTREZID, function(x) {
  tempo <- goids[goids$entrezgene == x, ]
  data.frame('ENTREZGENE' = x,
             'Go.ID'   = paste(tempo$go_id, collapse = ' ; '),
             'GO.term' = paste(tempo$name_1006, collapse = ' ; '))
}))
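To then add the GO information to gene_list as extra columns (as the question asks), a minimal sketch using base R's merge() on the collapsed table; gene_list_go is a hypothetical name:
# left-join the collapsed GO annotations onto the original gene list
gene_list_go <- merge(gene_list, Go.collapsed,
                      by.x = "ENTREZID", by.y = "ENTREZGENE",
                      all.x = TRUE)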
Edit:
If you want to query a past version of the ensembl database:
ens82<-useMart(host='sep2015.archive.ensembl.org',
biomart='ENSEMBL_MART_ENSEMBL',
dataset='hsapiens_gene_ensembl')
and then the query would be:
goids = getBM(attributes=c('entrezgene','go_id', 'name_1006'),
filters='entrezgene',values=gene_list$ENTREZID,
mart=ens82)
However, if you had in mind to do a GO enrichment analysis, your list of genes is too short.

How to retrieve/calculate citation counts and/or citation indices from a list of authors?

I have a list of authors.
I wish to automatically retrieve/calculate the (ideally yearly) citation index (h-index, m-quotient, g-index, HCP indicator, ...) for each author.
Author Year Index
first 2000 1
first 2001 2
first 2002 3
I can calculate all of these metrics given the citation counts for each paper of each researcher.
Author Paper Year Citation_count
first 1 2000 1
first 2 2000 2
first 3 2002 3
Despite my efforts, I have not found an API/scraping method capable of this.
My institution has access to a number of services including Web of Science.
Effectively, the main problem is building the citation graph. Once you have that, you can compute any metric you want (e.g. h-index, g-index, PageRank).
Supposing you have a collection of papers (that you've retrieved in some way), you can extract the citations from each of them and build the citation graph. You might find ParsCit useful, an open-source CRF reference-string and logical-document-structure parsing package, which is also used by CiteSeerX and works great.
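As an illustration of the metric step (independent of how the graph is built), here is a minimal sketch computing the h-index from a vector of per-paper citation counts; h_index is a hypothetical helper name:
# h-index: the largest h such that h papers have at least h citations each
h_index <- function(counts) {
  counts <- sort(counts, decreasing = TRUE)
  sum(counts >= seq_along(counts))
}
h_index(c(1, 2, 3))  # the toy counts from the table above give h = 2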
