I want to write an R script that extracts information from MSG files (emails).
The emails are automated sign-up mails from a website. They contain information about the user (forename, surname, email, etc.). I try to extract the specific pieces of information using regex. The problem is that the order of the fields may vary.
I use the msgxtractr library, which works fine. The output looks like this:
\r\n\r\nAnrede \r\n\r\nHerr\r\n\r\nVorname \r\n\r\nJames \r\n\r\nName \r\n\r\nBond \r\n\r\
To get the information, I extract the text between two text patterns -> (.*?)
Example:
"Vorname \r\n\r\n(.*?) \r\n\r\n"
library(msgxtractr) #usage
library(magrittr)
#------set working directory--------------------------------------------------
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
#------read MSG file----------------------------------------------------------
BALBLI = read_msg("MSG/Test2.msg")
#------extract text between two patterns--------------------------------------
testAR = BALBLI[["body"]][["text"]] # body from MSG file
patternVN= "Vorname \r\n\r\n(.*?) \r\n\r\n"
searchVN <- regmatches(testAR,regexec(patternVN,testAR))
Vorname = searchVN[[1]][2]
Vorname
I have tried two test cases:
1) Good Result:
> patternVN= "Vorname \r\n\r\n(.*?) \r\n\r\n"
> searchVN <- regmatches(testAR,regexec(patternVN,testAR))
> Vorname = searchVN[[1]][2]
> Vorname
[1] "James"
2) Bad Result:
> patternVN= "Vorname \r\n\r\n(.*?) \r\n\r\n"
> searchVN <- regmatches(testAR,regexec(patternVN,testAR))
> Vorname = searchVN[[1]][2]
> Vorname
[1] "John\r\n\r\nName"
In this case the match does not stop after the first name; it runs on and captures text from the following field as well.
I would try a completely different approach.
msg <- "\r\n\r\nAnrede \r\n\r\nHerr\r\n\r\nVorname \r\n\r\nJames \r\n\r\nName \r\n\r\nBond \r\n\r\n"
msg <- gsub("^\\s+", "", msg) # remove whitespace at the beginning
msg <- gsub("\\s+$", "", msg) # ... and at the end
words <- strsplit(msg, " *[\n\r]+ *")[[1]]
res <- as.list(words[seq(2, length(words), 2)])
names(res) <- words[seq(1, length(words), 2)]
Result
> res
$Anrede
[1] "Herr"
$Vorname
[1] "James"
$Name
[1] "Bond"
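Because res is a named list, the varying field order in the email no longer matters; you can pick fields out by name. A short sketch applying the same split to the body extracted with msgxtractr (reusing the variable names from the question, and assuming every field label is followed by exactly one value, as in the sample body):
testAR <- BALBLI[["body"]][["text"]]
testAR <- gsub("^\\s+|\\s+$", "", testAR)      # trim leading/trailing whitespace
words  <- strsplit(testAR, " *[\n\r]+ *")[[1]] # split on runs of line breaks
res    <- setNames(as.list(words[seq(2, length(words), 2)]),
                   words[seq(1, length(words), 2)])
res$Vorname
# [1] "James"
res$Name
# [1] "Bond"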
In R, how can I do the following:
convert this string: "my test string"
to something like this (a full-width character string): "ｍｙ　ｔｅｓｔ　ｓｔｒｉｎｇ"
Is there a way to do this through hexadecimal character encodings?
Thanks for your help; I'm really not sure how to even start. Perhaps something with {stringr}?
I'm trying to get an output similar to what I would expect from this online conversion tool:
http://www.linkstrasse.de/en/%EF%BD%86%EF%BD%95%EF%BD%8C%EF%BD%8C%EF%BD%97%EF%BD%89%EF%BD%84%EF%BD%94%EF%BD%88%EF%BC%8D%EF%BD%83%EF%BD%8F%EF%BD%8E%EF%BD%96%EF%BD%85%EF%BD%92%EF%BD%94%EF%BD%85%EF%BD%92
Here is a possible solution using a function from the archived Nippon package. This is the han2zen function, which can be found here.
x <- "my test string"
han2zen <- function(s){
  stopifnot(is.character(s))
  zenEisu <- paste0(intToUtf8(65295 + 1:10), intToUtf8(65312 + 1:26),
                    intToUtf8(65344 + 1:26))
  zenKigo <- c(65281, 65283, 65284, 65285, 65286, 65290, 65291,
               65292, 12540, 65294, 65295, 65306, 65307, 65308,
               65309, 65310, 65311, 65312, 65342, 65343, 65372,
               65374)
  s <- chartr("0-9A-Za-z", zenEisu, s)
  s <- chartr('!#$%&*+,-./:;<=>?@^_|~', intToUtf8(zenKigo), s)
  s <- gsub(" ", intToUtf8(12288), s)
  return(s)
}
han2zen(x)
# [1] "ｍｙ　ｔｅｓｔ　ｓｔｒｉｎｇ"
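Since the question mentions {stringr}, a shorter route may be the ICU transliterator exposed by the stringi package. This is a sketch; it assumes the "Halfwidth-Fullwidth" transform is available in your ICU build, and the exact handling of the space character can depend on the ICU version:
library(stringi)
stri_trans_general("my test string", "Halfwidth-Fullwidth")
# expected: a fullwidth string such as "ｍｙ　ｔｅｓｔ　ｓｔｒｉｎｇ"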
I want to download data from a SEC filing in R. The code below does this. It creates a data frame that contains the 13F data.
#einhorn_13F_2016.R
# Holdings of D. Einhorn's Hedge Fund
# Metadata / Background Info
#https://www.sec.gov/Archives/edgar/data/1079114/000107911416000025/xslForm13F_X01/primary_doc.xml
library(ggplot2)
library(rvest)
library(stringi)
library(purrr)
library(tidyr)
library(dplyr)
# data
# read in HTML:
html_url <- "https://www.sec.gov/Archives/edgar/data/1079114/000107911416000025/xslForm13F_X01/Greenlight_13FXML_06302016.xml"
html_dat <- read_html(html_url)
#find the right table in HTML DOM
html_dat <- html_table(html_dat, header = TRUE, fill=TRUE)[[4]]
glimpse(html_dat)
# parse messed-up table header
einhorn_col <- map2_chr(html_dat[1,],html_dat[2,], paste)
einhorn <- html_dat
colnames(einhorn) <- make.names(stri_trim(stringi::stri_trans_tolower(paste0( einhorn_col, sep=""))))
einhorn <- einhorn[3:nrow(einhorn),]
# there are 2 important numeric columns
einhorn[, "value..x.1000."] <- as.numeric(gsub(",", "",einhorn[, "value..x.1000."]))
einhorn[, "shrs.or.prn.amt"] <- as.numeric(gsub(",", "", einhorn[, "shrs.or.prn.amt"]))
# most important holdings by value
einhorn %>%
group_by(name.of.issuer) %>%
summarise(sum_value=sum(value..x.1000.),sum_shares=sum(shrs.or.prn.amt)) %>%
arrange(desc(sum_value))
# show some company names
companies <- unique(einhorn$name.of.issuer)
sample(companies, 6)
Now I want to augment the data frame.
colnames(einhorn)
[1] "name.of.issuer" "title.of.class" "cusip"
[4] "value..x.1000." "shrs.or.prn.amt" "sh..prn"
[7] "put..call" "investment.discretion" "other.manager"
[10] "voting.authority.sole" "voting.authority.shared" "voting.authority.none"
Starting from column 1, "name of issuer", I want to find the market category, country of residence, etc.
I want output similar to what the finreportr::CompanyInfo("GOOG") call returns:
company CIK SIC state state.inc FY.end street.address city.state
1 GOOGLE INC. 0001288776 7370 CA DE 1231 1600 AMPHITHEATRE PARKWAY MOUNTAIN VIEW CA 94043
but when I enter values from the "name of issuer" column I don't know where to fetch this data from.
sample(companies, 6)
[1] "TAKE-TWO INTERACTIVE SOFTWAR" "TERRAFORM PWR INC"
[3] "APPLE INC" "VOYA FINL INC"
[5] "AERCAP HOLDINGS NV" "PERRIGO CO PLC"
This does not work with the values above (because they are not real ticker symbols):
finreportr::CompanyInfo("TERRAFORM PWR INC")
Result:
Error in open.connection(x, "rb") : HTTP error 400.
Calls: <Anonymous> -> <Anonymous> -> read_html.default
Is there a web service, API endpoint or R package that I can use to get this data?
Answering my own question:
I have used the Google Knowledge Graph Search API to look up company details from a strangely formatted and abbreviated string. It works in the majority of cases.
API Key handling/assignment omitted from code.
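(One common way to supply the key without hard-coding it is an environment variable; the variable name below is only an illustration.)
apikey <- Sys.getenv("KG_API_KEY") # hypothetical variable name; set it in .Renviron or the shell
stopifnot(nzchar(apikey))          # fail early if the key is missing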
(...prepend code from Question block here ....)
# show some company names
companies <- unique(einhorn$name.of.issuer)
#samp <- data.frame(company=sample(companies, 6), stringsAsFactors = FALSE)
samp <- sample(companies, 6)
kgapi_call_str <- function(query,
                           apikey,
                           templatestr="https://kgsearch.googleapis.com/v1/entities:search?key=%s&limit=1&indent=True&query=%s"){
  knowledgeapi <- sprintf(fmt = templatestr, apikey, URLencode(query))
  knowledgeapi
}

kg_api_call <- function(api_call_str, extracolumn=NA){
  json <- jsonlite::fromJSON(api_call_str)
  if(is.data.frame(json$itemListElement)) {
    json.result <- jsonlite::flatten(json$itemListElement)
    colnames(json.result) <- make.names(colnames(json.result))
    json.result$name.of.issuer <- extracolumn
    json.result
  }
}

kgapi_call_data <- function(api_call_str, extracolumn=NA){
  # note: the retries below use the global `apikey` defined elsewhere
  extracolumn_shortened <- gsub('\\s+\\w+$', '', extracolumn, perl=TRUE)
  extracolumn_shortened.2 <- gsub('\\s+\\w+$', '', extracolumn_shortened, perl=TRUE)
  json <- kg_api_call(api_call_str, extracolumn)
  if(!is.null(json)){
    return(json)
  }
  # query unsuccessful: try the shortened company name
  if (stri_length(extracolumn_shortened) > 0){
    message(sprintf("cannot resolve - 2nd try:\n%s\n%s\n\n", extracolumn, extracolumn_shortened))
    api_call_str <- kgapi_call_str(query=extracolumn_shortened, apikey=apikey)
    json <- kg_api_call(api_call_str, extracolumn)
    if(!is.null(json)){
      return(json)
    }
  }
  if(is.null(json) & stri_length(extracolumn_shortened.2) > 0) {
    message(sprintf("cannot resolve - 3rd try:\n%s\n%s\n\n", extracolumn, extracolumn_shortened.2))
    api_call_str <- kgapi_call_str(query=extracolumn_shortened.2, apikey=apikey)
    json <- kg_api_call(api_call_str, extracolumn)
  } else {
    warning(sprintf("cannot resolve: \n%s\n%s\n\n", extracolumn, extracolumn_shortened))
  }
}

kgapi_lookup <- function(lookup_str, apikey) {
  dat <- kgapi_call_data(api_call_str=kgapi_call_str(query=lookup_str, apikey=apikey), extracolumn = lookup_str)
  dat
}
#kgapi_call_str("GENERAL MTRS CO", apikey)
companies.metadata.3 <- do.call(bind_rows, lapply(companies, kgapi_lookup, apikey))
companies.metadata.4 <- companies.metadata.3 %>%
mutate(result..type=map(map(result..type, unlist), sort, decreasing=TRUE))
einhorn <- einhorn %>%
left_join(companies.metadata.4, by="name.of.issuer")
Next time I will try the CUSIP identifiers, which are also provided in the SEC 13F form, but that lookup service is not free AFAIK.
I have a vector of strings, each of which contains a number, and I would like to sort the vector according to that number.
MWE:
> str = paste0('N', sample(c(1,2,5,10,11,20), 6, replace = FALSE), 'someotherstring')
> str
[1] "N11someotherstring" "N5someotherstring" "N2someotherstring" "N20someotherstring" "N10someotherstring" "N1someotherstring"
> sort(str)
[1] "N10someotherstring" "N11someotherstring" "N1someotherstring" "N20someotherstring" "N2someotherstring" "N5someotherstring"
while I'd like to have
[1] "N1someotherstring" "N2someotherstring" "N5someotherstring" "N10someotherstring" "N11someotherstring" "N20someotherstring"
I have thought of using something like:
num = sapply(strsplit(str, split = NULL), function(s) {
  as.numeric(paste0(head(s, -15)[-1], collapse = ""))
})
str = str[sort(num, index.return=TRUE)$ix]
but I guess there might be something simpler.
There is an easy way to do this via the gtools package:
gtools::mixedsort(str)
#[1] "N1someotherstring" "N2someotherstring" "N5someotherstring" "N10someotherstring" "N11someotherstring" "N20someotherstring"
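If you prefer to avoid an extra dependency, a base-R sketch that assumes the number always directly follows the leading "N" would be:
num <- as.numeric(sub("^N(\\d+).*$", "\\1", str)) # pull out the embedded number
str[order(num)]
#[1] "N1someotherstring" "N2someotherstring" "N5someotherstring" "N10someotherstring" "N11someotherstring" "N20someotherstring"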
I'm trying to parse information from the sbml/xml file below
https://dl.dropboxusercontent.com/u/10712588/file.xml
from this code
http://search.bioconductor.jp/codes/11172
It seems that I can import the file normally by
doc <- xmlTreeParse(filename,ignoreBlanks = TRUE)
but I can't recover node attributes by
atrr <- xpathApply(doc, "//species[@id]", xmlGetAttr, "id")
or
xpathApply(doc, "//species", function(n) xmlValue(n[[2]]))
A node of the file follows...
<species id="M_10fthf_m" initialConcentration="1" constant="false" hasOnlySubstanceUnits="false" name="10-formyltetrahydrofolate(2-)" metaid="_metaM_10fthf_m" boundaryCondition="false" sboTerm="SBO:0000247" compartment="m">
<notes>
<body xmlns="http://www.w3.org/1999/xhtml">
<p>FORMULA: C20H21N7O7</p>
<p>CHARGE: -2</p>
<p>INCHI: InChI=1S/C20H23N7O7/c21-20-25-16-15(18(32)26-20)23-11(7-22-16)8-27(9-28)12-3-1-10(2-4-12)17(31)24-13(19(33)34)5-6-14(29)30/h1-4,9,11,13,23H,5-8H2,(H,24,31)(H,29,30)(H,33,34)(H4,21,22,25,26,32)/p-2/t11-,13+/m1/s1</p>
<p>HEPATONET_1.0_ABBREVIATION: HC00212</p>
<p>EHMN_ABBREVIATION: C00234</p>
</body>
</notes>
<annotation>
...
I would like to retrieve all the information inside the species nodes. Does anyone know how to do that?
There exists an SBML parsing library libSBML (http://sbml.org/Software/libSBML).
This includes a binding to R that allows access to the SBML objects directly within R, using code similar to the following:
document = readSBML(filename);
errors = SBMLErrorLog_getNumFailsWithSeverity(
  SBMLDocument_getErrorLog(document),
  enumToInteger("LIBSBML_SEV_ERROR", "_XMLErrorSeverity_t")
);
if (errors > 0) {
  cat("Encountered the following SBML errors:\n");
  SBMLDocument_printErrors(document);
  q(status=1);
}
model = SBMLDocument_getModel(document);
if (is.null(model)) {
  cat("No model present.\n");
  q(status=1);
}
species = Model_getSpecies(model, index_of_species);
id = Species_getId(species);
conc = Species_getInitialConcentration(species)
There is a Species_get(NameOfAttribute) function for each possible attribute, together with Species_isSet(NameOfAttribute), Species_set(NameOfAttribute), and Species_unset(NameOfAttribute).
The API is similar for interacting with any SBML element.
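For example, a minimal sketch of that getter pattern, using the "Name" attribute for illustration (and assuming the libSBML R bindings are installed as described above):
if (Species_isSetName(species)) {   # check the attribute before reading it
  cat("species name:", Species_getName(species), "\n")
}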
The libSBML releases include R installers, available from
http://sourceforge.net/projects/sbml/files/libsbml/5.8.0/stable
by navigating to the R_interface subdirectory for the OS and architecture of your choice.
The source code distribution of libSBML contains an examples/r directory with many examples of using libSBML to interact with SBML in the R environment.
I guess it depends on what you mean when you say you want to "retrieve" all the information in the species nodes, because that retrieved data could be coerced to any number of different formats. The following assumes you want it all in a data frame, where each row is a species node from your XML file and the columns represent different pieces of information.
When just trying to extract information, I generally find it easier to work with lists than with XML.
doc <- xmlTreeParse(xml_file, ignoreBlanks = TRUE)
doc_list <- xmlToList(doc)
Once it's in a list, you can figure out where the species data is stored:
sapply(doc_list, function(x) unique(names(x)))
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
[[4]]
[1] "species"
[[5]]
[1] "reaction"
[[6]]
[1] "metaid"
$.attrs
[1] "level" "version"
So you really only want the information in doc_list[[4]]. Take a look at just the first component of doc_list[[4]]:
str(doc_list[[4]][[1]])
List of 9
$ : chr "FORMULA: C20H21N7O7"
$ : chr "CHARGE: -2"
$ : chr "HEPATONET_1.0_ABBREVIATION: HC00212"
$ : chr "EHMN_ABBREVIATION: C00234"
$ : chr "http://identifiers.org/obo.chebi/CHEBI:57454"
$ : chr "http://identifiers.org/pubchem.compound/C00234"
$ : chr "http://identifiers.org/hmdb/HMDB00972"
$ : Named chr "#_metaM_10fthf_c"
..- attr(*, "names")= chr "about"
$ .attrs: Named chr [1:9] "M_10fthf_c" "1" "false" "false" ...
..- attr(*, "names")= chr [1:9] "id" "initialConcentration" "constant" "hasOnlySubstanceUnits" ...
So you have the information contained in the first eight lists, plus the information contained in the attributes.
Getting the attributes information is easy because it's already named. The following formats the attributes information into a data frame for each node:
doc_attrs <- lapply(doc_list[[4]], function(x) {
  x <- unlist(x[names(x) == ".attrs"])
  col_names <- gsub(".attrs.", "", names(x))
  x <- data.frame(matrix(x, nrow = 1), stringsAsFactors = FALSE)
  colnames(x) <- col_names
  x
})
Some nodes didn't appear to have attributes information and so returned empty data frames. That caused problems later so I created data frames of NAs in their place:
doc_attrs_cols <- unique(unlist(sapply(doc_attrs, colnames)))
doc_attrs[sapply(doc_attrs, length) == 0] <-
  lapply(doc_attrs[sapply(doc_attrs, length) == 0], function(x) {
    df <- data.frame(matrix(rep(NA, length(doc_attrs_cols)), nrow = 1))
    colnames(df) <- doc_attrs_cols
    df
  })
When it came to pulling non-attribute data, the names and values of the variables were generally contained within the same string. I originally tried to come up with a regular expression to extract the names, but they're all formatted so differently that I gave up and just identified all the possibilities in this particular data set:
flags <- c("FORMULA:", "CHARGE:", "HEPATONET_1.0_ABBREVIATION:",
           "EHMN_ABBREVIATION:", "obo.chebi/CHEBI:", "pubchem.compound/", "hmdb/HMDB",
           "INCHI: ", "kegg.compound/", "kegg.genes/", "uniprot/", "drugbank/")
Also, sometimes the non-attribute information was kept as just a list of values, as in the node I showed above, while other times it was contained in "notes" and "annotation" sublists, so I had to include an if/else statement to make things more consistent:
doc_info <- lapply(doc_list[[4]], function(x) {
  if(any(names(x) != ".attrs" & names(x) != "")) {
    names(x)[names(x) != ".attrs"] <- ""
    x <- unlist(do.call("c", as.list(x[names(x) != ".attrs"])))
  } else {
    x <- unlist(x[names(x) != ".attrs"])
  }
  x <- gsub("http://identifiers.org/", "", x)
  need_names <- names(x) == ""
  names(x)[need_names] <- gsub(paste0("(", paste0(flags, collapse = "|"), ").+"), "\\1", x[need_names], perl = TRUE)
  #names(x) <- gsub("\\s+", "", names(x))
  x[need_names] <- gsub(paste0("(", paste0(flags, collapse = "|"), ")(.+)"), "\\2", x[need_names], perl = TRUE)
  col_names <- names(x)
  x <- data.frame(matrix(x, nrow = 1), stringsAsFactors = FALSE)
  colnames(x) <- col_names
  x
})
To get everything together into a data frame, I suggest the plyr package's rbind.fill.
require(plyr)
doc_info <- do.call("rbind.fill", doc_info)
doc_attrs <- do.call("rbind.fill", doc_attrs)
doc_all <- cbind(doc_info, doc_attrs)
dim(doc_all)
[1] 3972 22
colnames(doc_all)
[1] "FORMULA:" "CHARGE:" "HEPATONET_1.0_ABBREVIATION:" "EHMN_ABBREVIATION:"
[5] "obo.chebi/CHEBI:" "pubchem.compound/" "hmdb/HMDB" "about"
[9] "INCHI: " "kegg.compound/" "kegg.genes/" "uniprot/"
[13] "drugbank/" "id" "initialConcentration" "constant"
[17] "hasOnlySubstanceUnits" "name" "metaid" "boundaryCondition"
[21] "sboTerm" "compartment"
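From here individual fields can be inspected by name; for instance, a quick look at a few columns (a sketch, with column names taken from the colnames() output above):
head(doc_all[, c("id", "name", "FORMULA:", "CHARGE:", "compartment")])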
As a partial answer: the document uses namespaces, and 'species' is part of the 'id' namespace. So
> xpathSApply(doc, "//id:species", xmlGetAttr, "id", namespaces="id")
[1] "M_10fthf_c" "M_10fthf_m" "M_13dampp_c" "M_h2o_c" "M_o2_c"
[6] "M_bamppald_c" "M_h2o2_c" "M_nh4_c" "M_h_m" "M_nadph_m"
...
with id:species and namespaces="id" being different from what you illustrate above.
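Building on that, here is one way (a sketch, assuming doc was parsed into internal nodes, e.g. with xmlParse()) to pull every attribute of every species node into a single data frame:
species_nodes <- getNodeSet(doc, "//id:species", namespaces = "id")
attr_list <- lapply(species_nodes, xmlAttrs)   # named character vector per node
attr_df <- do.call(plyr::rbind.fill,           # nodes may have differing attribute sets
                   lapply(attr_list, function(a)
                     as.data.frame(t(a), stringsAsFactors = FALSE)))
head(attr_df)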
I can get a list of all the available packages with the function:
ap <- available.packages()
But how can I also get a description of these packages from within R, so I can have a data.frame with two columns: package and description?
Edit of an almost ten-year-old accepted answer. What you likely want is not to scrape (unless you want to practice scraping) but to use an existing interface: tools::CRAN_package_db(). Example:
> db <- tools::CRAN_package_db()[, c("Package", "Description")]
> dim(db)
[1] 18978 2
>
The function (currently) returns 66 columns, of which the two used here are a subset.
I actually think you want "Package" and "Title", as "Description" can run to several lines. So here is the former; just put "Description" in the final subset if you really want it:
R> ## from http://developer.r-project.org/CRAN/Scripts/depends.R and adapted
R>
R> require("tools")
R>
R> getPackagesWithTitle <- function() {
+ contrib.url(getOption("repos")["CRAN"], "source")
+ description <- sprintf("%s/web/packages/packages.rds",
+ getOption("repos")["CRAN"])
+ con <- if(substring(description, 1L, 7L) == "file://") {
+ file(description, "rb")
+ } else {
+ url(description, "rb")
+ }
+ on.exit(close(con))
+ db <- readRDS(gzcon(con))
+ rownames(db) <- NULL
+
+ db[, c("Package", "Title")]
+ }
R>
R>
R> head(getPackagesWithTitle()) # I shortened one Title here...
Package Title
[1,] "abc" "Tools for Approximate Bayesian Computation (ABC)"
[2,] "abcdeFBA" "ABCDE_FBA: A-Biologist-Can-Do-Everything of Flux ..."
[3,] "abd" "The Analysis of Biological Data"
[4,] "abind" "Combine multi-dimensional arrays"
[5,] "abn" "Data Modelling with Additive Bayesian Networks"
[6,] "AcceptanceSampling" "Creation and evaluation of Acceptance Sampling Plans"
R>
Dirk has provided a terrific answer. After finishing my solution and then seeing his, I debated for some time whether to post mine, for fear of looking silly. But I decided to post it anyway, for two reasons:
it is informative to beginning scrapers like myself
it took me a while to do and so why not :)
I approached this thinking I'd need to do some web scraping and chose crantastic as the site to scrape from. First I'll provide the code and then two scraping resources that have been very helpful to me as I learn:
library(RCurl)
library(XML)
URL <- "http://cran.r-project.org/web/checks/check_summary.html#summary_by_package"
packs <- na.omit(XML::readHTMLTable(doc = URL, which = 2, header = T,
                                    strip.white = T, as.is = FALSE, sep = ",",
                                    na.strings = c("999", "NA", " "))[, 1])

Trim <- function(x) {
  gsub("^\\s+|\\s+$", "", x)
}

packs <- unique(Trim(packs))
u1 <- "http://crantastic.org/packages/"
len.samps <- 10 #for demo purpose; use:
#len.samps <- length(packs) # for all of them
URL2 <- paste0(u1, packs[seq_len(len.samps)])

scraper <- function(urls){ #function to grab description
  doc <- htmlTreeParse(urls, useInternalNodes=TRUE)
  nodes <- getNodeSet(doc, "//p")[[3]]
  return(nodes)
}

info <- sapply(seq_along(URL2), function(i) try(scraper(URL2[i]), TRUE))
info2 <- sapply(info, function(x) { #replace errors with NA
  if(class(x)[1] != "XMLInternalElementNode"){
    NA
  } else {
    Trim(gsub("\\s+", " ", xmlValue(x)))
  }
})

pack_n_desc <- data.frame(package=packs[seq_len(len.samps)],
                          description=info2) #make a dataframe of it all
Resources:
talkstats.com thread on web scraping (great beginner examples)
w3schools.com site on html stuff (very helpful)
I wanted to try to do this with an HTML scraper (rvest) as an exercise, since available.packages() in the OP doesn't contain the package descriptions.
library('rvest')
url <- 'https://cloud.r-project.org/web/packages/available_packages_by_name.html'
webpage <- read_html(url)
data_html <- html_nodes(webpage,'tr td')
length(data_html)
P1 <- html_nodes(webpage,'td:nth-child(1)') %>% html_text(trim=TRUE) # XML: The Package Name
P2 <- html_nodes(webpage,'td:nth-child(2)') %>% html_text(trim=TRUE) # XML: The Description
P1 <- P1[lengths(P1) > 0 & P1 != ""] # Remove NULL and empty ("") items
length(P1); length(P2);
mdf <- data.frame(P1, P2, row.names=NULL)
colnames(mdf) <- c("PackageName", "Description")
# This is the problem! It lists large sets column-by-column,
# instead of row-by-row. Try with the full list to see what happens.
print(mdf, right=FALSE, row.names=FALSE)
# PackageName Description
# A3 Accurate, Adaptable, and Accessible Error Metrics for Predictive\nModels
# abbyyR Access to Abbyy Optical Character Recognition (OCR) API
# abc Tools for Approximate Bayesian Computation (ABC)
# abc.data Data Only: Tools for Approximate Bayesian Computation (ABC)
# ABC.RAP Array Based CpG Region Analysis Pipeline
# ABCanalysis Computed ABC Analysis
# For small sets we can use either:
# mdf[1:6,] #or# head(mdf, 6)
However, although this works quite well for a small array/data frame (a subset), I ran into a display problem with the full list, where the data would be shown either column-by-column or unaligned. It would have been great to have this paged and properly formatted in a new window somehow. I tried using page, but I couldn't get it to work very well.
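One workaround that may help (a sketch, not fully tested against the whole list): shorten the Description column before printing, or open the result in a viewer when running interactively.
mdf$Description <- strtrim(gsub("\\s+", " ", mdf$Description), 80) # one line, max 80 chars
print(head(mdf, 20), right = FALSE, row.names = FALSE)
# View(mdf)  # in RStudio or any interactive session this opens a scrollable grid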
EDIT:
The recommended method is not the above, but rather using Dirk's suggestion (from the comments below):
db <- tools::CRAN_package_db()
colnames(db)
mdf <- data.frame(db[,1], db[,52])
colnames(mdf) <- c("Package", "Description")
print(mdf, right=FALSE, row.names=FALSE)
However, this still suffers from the display problem mentioned...