stemDocument in tm package not working on past-tense words - r

I have a file 'check_text.txt' that contains "said say says make made". I'd like to perform stemming on it to get "say say say make make". I tried to use stemDocument in the tm package, as follows, but only got "said say say make made". Is there a way to perform stemming on past-tense words? Is it necessary to do so in real-world natural language processing? Thanks!
library(tm)

filename <- 'check_text.txt'
con <- file(filename, "rb")
text_data <- readLines(con, skipNul = TRUE)
close(con)
text_VS <- VectorSource(text_data)
text_corpus <- VCorpus(text_VS)
text_corpus <- tm_map(text_corpus, stemDocument, language = "english")
as.data.frame(text_corpus)$text
EDIT: I also tried wordStem in SnowballC package
> library(SnowballC)
> wordStem(c("said", "say", "says", "make", "made"))
[1] "said" "sai" "sai" "make" "made"

If a package shipped a data set of irregular English verbs, this task would be easy. I do not know of any package with such data, so I chose to create my own database by scraping. I am not sure whether this website covers all irregular verbs; if necessary, you may want to find a more complete source and build your own database. Once you have the database, you can carry out the task.
First, I used stemDocument() to clean up present-tense forms ending in -s. Then I collected the past forms appearing in words (past), looked up their infinitive forms (inf1), and identified the order of the past forms in temp. I also identified the positions of the past forms in temp, and finally replaced the past forms with their infinitives. I repeated the same procedure for past participles.
library(tm)
library(rvest)
library(dplyr)
library(splitstackshape)

### Create a database of irregular verbs by scraping
x <- read_html("http://www.englishpage.com/irregularverbs/irregularverbs.html")

x %>%
    html_table(header = TRUE) %>%
    bind_rows %>%
    rename(Past = `Simple Past`, PP = `Past Participle`) %>%
    filter(!Infinitive %in% LETTERS) %>%
    cSplit(splitCols = c("Past", "PP"),
           sep = " / ", direction = "long") %>%
    filter(complete.cases(.)) %>%
    mutate_each(funs(gsub(pattern = "\\s\\(.*\\)$|\\s\\[\\?\\]",
                          replacement = "",
                          x = .))) -> mydic
### Work on the task
words <- c("said", "drawn", "say", "says", "make", "made", "done")
### says to say
temp <- stemDocument(words)
### past forms become present form
### Collect past forms
past <- mydic$Past[which(mydic$Past %in% temp)]
### Collect infinitive forms of past forms
inf1 <- mydic$Infinitive[which(mydic$Past %in% temp)]
### Identify the order of past forms in temp
ind <- match(temp, past)
ind <- ind[is.na(ind) == FALSE]
### Where are the past forms in temp?
position <- which(temp %in% past)
temp[position] <- inf1[ind]
### Check
temp
#[1] "say" "drawn" "say" "say" "make" "make" "done"
### PP forms to infinitive forms (same as past forms)
pp <- mydic$PP[which(mydic$PP %in% temp)]
inf2 <- mydic$Infinitive[which(mydic$PP %in% temp)]
ind <- match(temp, pp)
ind <- ind[is.na(ind) == FALSE]
position <- which(temp %in% pp)
temp[position] <- inf2[ind]
### Check
temp
#[1] "say" "draw" "say" "say" "make" "make" "do"

Related

R: create an object in a function and rename it using a variable, retaining the object for use in the current session

I have a function that I use to get financial data from the Wall Street Journal website. Basically, I want to make a copy of the data held in symData and give it the same name as symbol, so that the objects sit in the workspace and can be reused for looking at other information. I don't want to keep them permanently, so creating temp files on the filesystem is not my favoured method.
The problem is that I can't figure out how to do it.
library(httr)
library(XML)
library(data.table)
getwsj.quotes <- function(symbol)
{
    myUrl <- sprintf("https://quotes.wsj.com/AU/XASX/%s/FINANCIALS", symbol)
    symbol.data <- GET(myUrl)
    x <- content(symbol.data, as = 'text')
    wsj.tables <- sub('cr_dataTable cr_sub_capital', '\\1', x)
    symData <- readHTMLTable(wsj.tables)
    mytemp <- summary(symData)
    print(mytemp)
    d2e <- gsub('^.* ', '', names(symData[[8]]))
    my.out <- sprintf("%s has Debt to Equity Ratio of %s", symbol, d2e)
    print(my.out)
}
TickerList <- c("AMC", "ANZ")
for (Ticker in TickerList)
{
    Ticker.Data <- lapply(Ticker, FUN = getwsj.quotes)
}
The Ticker.Data output is:
> Ticker.Data
[[1]]
[1] "ANZ has Debt to Equity Ratio of 357.41"
The output from mytemp <- summary(symData) looks like this:
     Length Class      Mode
NULL 12     data.frame list
NULL  2     data.frame list
...
I tried various ways of doing it when I call the function, and all I ever get is the last symbol's data. I have searched for hours trying to find an answer but so far, no luck. I need to walk away for a few hours.
Any information would be most helpful.
Regards
Stephen
Edited: I changed my answer based on the suggestion by @MrFlick. It solved another problem.
library(httr)
library(XML)
library(data.table)
getwsj.quotes <- function(Symbol)
{
    MyUrl <- sprintf("https://quotes.wsj.com/AU/XASX/%s/FINANCIALS", Symbol)
    Symbol.Data <- GET(MyUrl)
    x <- content(Symbol.Data, as = 'text')
    wsj.tables <- sub('cr_dataTable cr_sub_capital', '\\1', x)
    SymData <- readHTMLTable(wsj.tables)
    return(SymData)
}
TickerList <- c("AMC", "ANZ", "BHP", "BXB", "CBA", "COL", "CSL", "IAG", "MQG", "NAB", "RIO", "S32", "SCG", "SUN", "TCL", "TLS", "WBC", "WES", "WOW", "WPL")
SymbolDataList <- lapply(TickerList, FUN = getwsj.quotes)
Thanks again.
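One small extension, not in the original answer, gets back to the naming goal in the question: give the list elements the ticker names so each symbol's tables can be pulled out by name, or push them into the workspace as separately named objects if that is really preferred. A minimal sketch, assuming SymbolDataList and TickerList from above:
# Make each ticker's data addressable by symbol
names(SymbolDataList) <- TickerList
SymbolDataList[["ANZ"]]        # all tables scraped for ANZ

# Or, if separate workspace objects named after each symbol are wanted:
list2env(SymbolDataList, envir = globalenv())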

JSON applied over a dataframe in R

I used the code below on one website and it returned a perfect result, looking for the keyword Emaar pasted at the end of the query:
library(httr)
library(jsonlite)
query<-"https://www.googleapis.com/customsearch/v1?key=AIzaSyA0KdZHRkAjmoxKL14eEXp2vnI4Yg_po38&cx=006431301429107149113:as7yqcm2qc8&q=Emaar"
result11 <- content(GET(query))
print(result11)
result11_JSON <- toJSON(result11)
result11_JSON <- fromJSON(result11_JSON)
result11_df <- as.data.frame(result11_JSON)
Now I want to apply the same function over a data.frame containing keywords, so I made the test .csv file below and called it Testing Website Extraction.csv:
Company Name
[1] ADES International Holding Ltd
[2] Emirates REIT (CEIC) Limited
[3] POLARCUS LIMITED
code used:
test_companies <- read.csv("... \\Testing Website Extraction.csv")
# removing spaces and adding a "+" sign, then pasting the query before it (the query already has my unique Google key and search engine ID)
test_companies$plus <- gsub(" ", "+", test_companies$Company.Name)
query <- "https://www.googleapis.com/customsearch/v1?key=AIzaSyCmD6FRaonSmZWrjwX6JJgYMfDSwlR1z0Y&cx=006431301429107149113:as7yqcm2qc8&q="
test_companies$plus <- paste0(query, test_companies$plus)
a <- test_companies$plus
length(a)
function_webs_search <- function(web_search) {content(GET(web_search))}
result <- lapply(as.character(a), function_webs_search)
result here is a list of length 3 (one element per search term), and within each element there are sublists: url (list[2]), queries (list[2]), ... items (list[10]). These are the same for each search term (same lengths). My issue is applying the remainder of the code.
#when i run:
result_JSON <- toJSON(result)
result_JSON <- as.list(fromJSON(result_JSON))
I get a list of 6 lists, each containing sublists, and putting it into a tidy data frame where the results are stacked under each other (not kept separately) is proving to be difficult.
Also note that I tried working through each of the 3 lists inside result one by one, but that is a lot of manual labour if I have a longer list of keywords.
The expected end result should have 30 observations of 37 variables (for each search term, 10 observations of 37 variables, all stacked underneath each other).
Things I have tried unsuccessfully:
These work to flatten the list:
#do.call(c , result)
#all.equal(listofvectors, res, check.attributes = FALSE)
#unlist(result, recursive = FALSE)
# for (i in 1:length(result)) {listofvectors <- c(listofvectors, result[[i]])}
#rbind()
#rbind.fill()
Even after flattening, I don't know how to organize them into a tidy final output for a non-R user to interact with.
Any help here would be greatly appreciated,
I am here in case anything is not clear about my question,
Always happy to learn more about R so please bear with me as I am just starting to catch up.
All the best and thanks in advance!
Basically, what I did was extract only the columns I need from the list of data frames; below is the final code:
library(httr)
library(jsonlite)
library(tidyr)
library(stringr)
library(purrr)
library(plyr)

test_companies <- read.csv("c:\\users\\... Companies Without Websites List.csv")
test_companies$plus <- gsub(" ", "+", test_companies$Company.Name)
query <- "https://www.googleapis.com/customsearch/v1?key=AIzaSyCmD6FRaonSmZWrjwX6JJgYMfDSwlR1z0Y&cx=006431301429107149113:as7yqcm2qc8&q="
test_companies$plus <- paste0(query, test_companies$plus)
a <- test_companies$plus
length(a)

# run each search and keep the parsed content
function_webs_search <- function(web_search) {content(GET(web_search))}
result <- lapply(as.character(a), function_webs_search)

# round-trip through JSON so the nested lists become rectangular
function_toJSONall <- function(all) {toJSON(all)}
a <- lapply(result, function_toJSONall)
function_fromJSONall <- function(all) {fromJSON(all)}
b <- lapply(a, function_fromJSONall)
function_dataframe <- function(all) {as.data.frame(all)}
c <- lapply(b, function_dataframe)

# keep only the columns needed, then stack the per-keyword data frames
function_column <- function(all) {all[ , 15:30]}
result_final <- lapply(c, function_column)
results_df <- rbind.fill(result_final)
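Since purrr is already loaded, the same fetch-parse-bind pipeline can be sketched more compactly with map_dfr(). This is only a sketch under the same assumptions as the code above (the query URLs held in test_companies$plus and the same column range):
# Compact sketch of the same pipeline with purrr::map_dfr
library(purrr)
results_df <- map_dfr(as.character(test_companies$plus), function(url) {
    res <- fromJSON(toJSON(content(GET(url))))   # parse one search result
    as.data.frame(res)[, 15:30]                  # keep the same column range as above
})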

Scraping from more than one aspx page with R

I am a linguistics student doing experiments in R. I have been looking at other questions and got a lot of help, but I am stuck at the moment, as I cannot adapt the example functions to my case, and would love to have some help.
First, I would like to go through every semester from here: http://registration.boun.edu.tr/schedule.htm, and every department here: http://registration.boun.edu.tr/scripts/schdepsel.asp
It is actually fairly easy to generate the list of links, as the final link looks something like this: http://registration.boun.edu.tr/scripts/sch.asp?donem=2017/2018-3&kisaadi=ATA&bolum=ATATURK+INSTITUTE+FOR+MODERN+TURKISH+HISTORY
Secondly, I need to select the code, name, days and hours of each course and tag the semester, which I did (probably extremely poorly, but I did it nevertheless, yay!).
library("rvest")
library("dplyr")
library("magrittr")
# define the html
reg <- read_html("http://registration.boun.edu.tr/scripts/sch.asp?donem=2017/2018-3&kisaadi=ATA&bolum=ATATURK+INSTITUTE+FOR+MODERN+TURKISH+HISTORY")
# make the html a list of tables
regtable <- reg %>% html_table(fill = TRUE)
# tag their year
regtable[[4]][ ,15] <- regtable[[1]][1,2]
regtable[[4]][1,15] <- "Semester"
# Change the Days and Hours to sth usable, but how and to what?
# parse the dates, T and Th problem?
# parse the hour 10th hour problem?
# get the necessary info
regtable <- regtable %>% .[4] %>% as.data.frame() %>% select( . , X1 , X3 , X8 , X9 , V15)
# correct the names
names(regtable) <- regtable[1,]
regtable <- regtable[-1,]
View(regtable)
But the problem is that I want to write a function that does this for more than 20 semesters and more than 50 departments. Any help would be great! I am doing this so that I can work on optimizing class hours for my department.
I guess I could do this better with the XML package, but I could not understand how to use it.
Thanks for any help,
Utku
Here is an answer building upon what you have already done. There are likely more efficient solutions, but this should be a good start. You also don't state how you would like to store the data, so what I have made assigns each combination of semester and department to its own data frame, which creates a huge number of data frames given how many departments there are. It is not ideal, but I don't know how you plan to use the data after collection.
library("rvest")
library("dplyr")
library("magrittr")
# Create a Department list
dep_list <- read_html("http://registration.boun.edu.tr/scripts/schdepsel.asp")
# Take the read html and identify all objects of class menu2 and extract the
# href which will give you the final part of the url
dep_list <- dep_list %>%
    html_nodes(xpath = '//*[@class="menu2"]') %>%
    xml_attr("href")
department_list <- gsub("/scripts/sch.asp?donem=", "", dep_list, fixed = TRUE)

# Create a list of all of the semesters
sem_list <- read_html("http://registration.boun.edu.tr/schedule.htm")
sem_list <- sem_list %>% html_table(fill = TRUE)
# Extract the table needed from the list
semester_df <- sem_list[[2]]
# The website uses a table for the dropdown, but the values are all in the second cell
# of the second column as a string
semester_list <- semester_df$X2[2]
# Separate the string into a list at the space characters
semester_list <- unlist(strsplit(semester_list, "\\s+"))

# Loop through the list of departments and, within each department, loop through the
# list of semesters to get the data you want
for (dep in department_list) {
    for (sem in semester_list) {
        url <- paste("http://registration.boun.edu.tr/scripts/sch.asp?donem=", sem, dep, sep = "")
        reg <- read_html(url)
        # make the html a list of tables
        regtable <- reg %>% html_table(fill = TRUE)
        # The data we want is in the 4th element of the created list, so extract that
        regtable <- regtable[[4]]
        # Rename the column headers to the values in the first row and remove the
        # first row
        regtable <- setNames(regtable[-1, ], regtable[1, ])
        # Create a semester column and select the variables we want
        regtable <- regtable %>%
            mutate(Semester = sem) %>%
            select(Code.Sec, Name, Days, Hours, Semester)
        # Assign the created table to a data frame
        # Could also save the file here instead
        assign(paste("table", sem, gsub(" ", "_", dep), sep = "_"), regtable)
    }
}
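If a single combined table is preferable to one data frame per semester/department combination (the caveat mentioned above), a hedged variation is to accumulate the tables in a list and bind them once at the end. This sketch reuses department_list, semester_list, and the cleaning steps from the loop above:
# Sketch: collect every semester/department table into one data frame
all_tables <- list()
for (dep in department_list) {
    for (sem in semester_list) {
        url <- paste0("http://registration.boun.edu.tr/scripts/sch.asp?donem=", sem, dep)
        regtable <- read_html(url) %>% html_table(fill = TRUE) %>% .[[4]]
        regtable <- setNames(regtable[-1, ], regtable[1, ])
        all_tables[[paste(sem, dep)]] <- regtable %>%
            mutate(Semester = sem, Department = dep) %>%
            select(Code.Sec, Name, Days, Hours, Semester, Department)
    }
}
schedule <- bind_rows(all_tables)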
Thanks to @Amanda I was able to achieve what I wanted. The only thing left is to scrape the short-name list and match the names, but I can do what I want by creating the list by hand. Any further comments on doing this more elegantly are appreciated!
library("rvest")
library("dplyr")
library("magrittr")
# Create a Department list
dep_list <- read_html("http://registration.boun.edu.tr/scripts/schdepsel.asp")
dep_list <- dep_list %>% html_table(fill = TRUE)
# Select the table from the html that contains the data we want
department_df <- dep_list[[2]]
# Rename the columns with the value of the first row and remove row
department_df <- setNames(department_df[-1, ], department_df[1, ])
# Combine the two columns into a list
department_list <- c(department_df[, 1], department_df[, 2])
# Edit the department list
# We can choose accordingly.
department_list <- department_list[c(7,8,16,20,26,33,36,37,38,39)]
# Create a list for all of the semesters
sem_list <- read_html("http://registration.boun.edu.tr/schedule.htm")
sem_list <- sem_list %>% html_table(fill = TRUE)
# Extract the table from the list needed
semester_df <- sem_list[[2]]
# The website uses a table for the dropdown but the values are all in the second cell
# of the second column as a string
semester_list <- semester_df$X2[2]
# Separate the string into a list at the space characters
semester_list <- unlist(strsplit(semester_list, "\\s+"))
# Shortnames string
# We can add whichever we want.
shortname_list <- c("FLED", "HIST" , "PSY", "LL" , "PA" , "PHIL" , "YADYOK" , "SOC" , "TR" , "TKL" )
# Length
L <- length(department_list)

# The loop to get the schedule for the selected departments
for (i in 1:L) {
    for (sem in semester_list) {
        tryCatch({
            dep <- department_list[i]
            sn <- shortname_list[i]
            url_second_part <- interaction("&kisaadi=", sn, "&bolum=",
                                           gsub(" ", "+", (gsub("&", "%26", dep))),
                                           sep = "", lex.order = TRUE)
            url <- paste("http://registration.boun.edu.tr/scripts/sch.asp?donem=", sem, url_second_part, sep = "")
            reg <- read_html(url)
            # make the html a list of tables
            regtable <- reg %>% html_table(fill = TRUE)
            # The data we want is in the 4th element of the created list, so extract that
            regtable <- regtable[[4]]
            # Rename the column headers to the values in the first row and remove the
            # first row
            regtable <- setNames(regtable[-1, ], regtable[1, ])
            # Create a semester column and select the variables we want
            regtable <- regtable %>%
                mutate(Semester = sem) %>%
                select(Code.Sec, Name, Days, Hours, Semester)
            # Assign the created table to a data frame
            # Could also save the file here instead
            assign(paste("table", sem, gsub(" ", "_", dep), sep = "_"), regtable)
        }, error = function(e) {cat("ERROR : No information on this", url, "\n")})
    }
}
### Maybe make the errors another data set or list too.

R Regex seemingly not working properly in Linux

I'm trying to scrape the Fangraphs webpage of alphabetical player indices to get a single-column data frame of each letter reference.
I have been able to get the code below to work successfully on a Windows installation of R 3.4.1, but I cannot get it to work on the Linux side at all, and I can't figure out what exactly is going wrong or different.
library(XML)
# Scrape to get the webpage
url <- paste0("http://www.fangraphs.com/players.aspx?")
table <- readHTMLTable(url, stringsAsFactors = FALSE)
letterz <- table[[2]]
letterz <- as.character(letterz)
letterz <- strsplit(letterz, split=", ")
letterz <- as.data.frame(letterz)
names(letterz) <- c("letters")
letterz$letters <- as.character(letterz$letters)
# Below this is where I can notice that the code is not operating the same
# as on my Windows machine. None of the gsub commands seem to impact
# the strings at all.
# Stripping the trailing whitespace
letterz$letters <- gsub("[[:space:]]+$", "", letterz$letters)
# Replacing patterns like "AzB Ba" to instead have "Az,Ba"
letterz$letters <- gsub("[[:upper:]]+?[[:space:]]+?[[:space:]]+?[[:space:]]+", ",", letterz$letters)
# Final cleaning up
letterz <- as.character(letterz)
letterz <- strsplit(letterz, split=",")
letterz <- as.data.frame(letterz)
names(letterz) <- c("letters")
letterz$letters <- as.character(letterz$letters)
letterz$letters <- gsub('c\\("|"\\)|"', "", letterz$letters)
letterz$letters <- gsub('^$', NA, letterz$letters)
letterz$letters <- gsub("^[[:space:]]+","", letterz$letters)
letterz$letters <- gsub("[[:space:]]+$","", letterz$letters)
letterz$letters <- gsub("'", "%27", letterz$letters)
letterz <- na.omit(letterz)
From what I could find, the only real difference between Windows and Linux regex handling would be the line-break implementation, so I went back and checked whether that was making the difference... but still saw no change.
I also tried substituting the R-specific "[[:space:]]" and "[[:upper:]]" style notation with the more standardized "\\s" to see if that would fix anything.
As for fixes, I know there are a handful of other packages I could look into to get the result I'm after, but more generally: are there differences in how Windows and Linux implement regex that I'm unaware of? And if so, how would I account for them in gsub to get the same result I get on Windows?
Thanks.
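A hedged diagnostic, since differences like this are often down to locale or encoding rather than the regex engine itself: it can help to inspect what the scraped strings actually contain on each platform before blaming gsub. The cleanup line at the end is only an assumption about a likely culprit (non-breaking spaces in the HTML, which [[:space:]] may not match under every locale):
# Compare what the strings actually contain on Windows vs. Linux
charToRaw(letterz$letters[1])   # non-breaking spaces show up as bytes c2 a0
Encoding(letterz$letters[1])    # check the declared encoding
Sys.getlocale("LC_CTYPE")       # [[:space:]] / [[:upper:]] are locale-dependent

# If non-breaking spaces turn out to be present, normalise them first
letterz$letters <- gsub("\u00A0", " ", letterz$letters)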

How to retrieve information about journals from ISI Web of Knowledge?

I am working on predicting citation counts for articles. The problem is that I need information about journals from ISI Web of Knowledge. They gather this information (journal impact factor, eigenfactor, ...) year by year, but there is no way to download all of one year's journal information at once. There is just a "mark all" option, which always marks only the first 500 journals in the list (that list can then be downloaded). I am programming this project in R. So my question is: how can I retrieve this information at once, or at least in an efficient and tidy way? Thank you for any ideas.
I used RSelenium to scrape WOS to get citation data and make a plot similar to one by Kieran Healy (mine was for archaeology journals, so my code is tailored to that).
Here's my code (from a slightly bigger project on github):
# setup broswer and selenium
library(devtools)
install_github("ropensci/rselenium")
library(RSelenium)
checkForServer()
startServer()
remDr <- remoteDriver()
remDr$open()
# go to http://apps.webofknowledge.com/
# refine search by journal... perhaps arch?eolog* in 'topic'
# then: 'Research Areas' -> archaeology -> refine
# then: 'Document types' -> article -> refine
# then: 'Source title' -> choose your favourite journals -> refine
# must have <10k results to enable citation data
# click 'create citation report' tab at the top
# do the first page manually to set the 'save file' and 'do this automatically',
# then let loop do the work after that
# before running the loop, get URL of first page that we already saved,
# and paste in next line, the URL will be different for each run
remDr$navigate("http://apps.webofknowledge.com/CitationReport.do?product=UA&search_mode=CitationReport&SID=4CvyYFKm3SC44hNsA2w&page=1&cr_pqid=7&viewType=summary")
Here's the loop to automate collecting data from the next several hundred pages of WOS results...
# Loop to get citation data for each page of results; each iteration saves a txt file.
# I used selectorgadget to check the css ids, they might be different for you.
for (i in 1:1000) {
    # click on 'save to text file'
    result <- try(
        webElem <- remDr$findElement(using = 'id', value = "select2-chosen-1")
    ); if (class(result) == "try-error") next;
    webElem$clickElement()
    # click on 'send' on pop-up window
    result <- try(
        webElem <- remDr$findElement(using = "css", "span.quickoutput-action")
    ); if (class(result) == "try-error") next;
    webElem$clickElement()
    # refresh the page to get rid of the pop-up
    remDr$refresh()
    # advance to the next page of results
    result <- try(
        webElem <- remDr$findElement(using = 'xpath', value = "(//form[@id='summary_navigation']/table/tbody/tr/td[3]/a/i)[2]")
    ); if (class(result) == "try-error") next;
    webElem$clickElement()
    print(i)
}
# there are many duplicates, but the code below will remove them
# copy the folder to your hard drive, and edit the setwd line below
# to match the location of your folder containing the hundreds of text files.
Read all text files into R...
# move them manually into a folder of their own
setwd("/home/two/Downloads/WoS")
# get text file names
my_files <- list.files(pattern = ".txt")
# make list object to store all text files in R
my_list <- vector(mode = "list", length = length(my_files))
# loop over file names and read each file into the list
my_list <- lapply(seq(my_files), function(i) read.csv(my_files[i],
                                                      skip = 4,
                                                      header = TRUE,
                                                      comment.char = " "))
# check to see it worked
my_list[1:5]
Combine list of dataframes from the scrape into one big dataframe
# use data.table for speed
install_github("rdatatable/data.table")
library(data.table)
my_df <- rbindlist(my_list)
setkey(my_df)
# filter only a few columns to simplify
my_cols <- c('Title', 'Publication.Year', 'Total.Citations', 'Source.Title')
my_df <- my_df[,my_cols, with=FALSE]
# remove duplicates
my_df <- unique(my_df)
# what journals do we have?
unique(my_df$Source.Title)
Make abbreviations for journal names, make article titles all upper case ready for plotting...
# get names
long_titles <- as.character(unique(my_df$Source.Title))
# get abbreviations automatically, perhaps not the obvious ones, but it's fast
short_titles <- unname(sapply(long_titles, function(i){
    theletters = strsplit(i, '')[[1]]
    wh = c(1, which(theletters == ' ') + 1)
    paste(theletters[wh], collapse = '')
}))
# manually disambiguate the journals that now only have 'A' as the short name
short_titles[short_titles == "A"] <- c("AMTRY", "ANTQ", "ARCH")
# remove 'NA' so it's not confused with an actual journal
short_titles[short_titles == "NA"] <- ""
# add abbreviations to big table
journals <- data.table(Source.Title = long_titles,
                       short_title = short_titles)
setkey(journals) # need a key to merge
my_df <- merge(my_df, journals, by = 'Source.Title')
# make article titles all upper case, easier to read
my_df$Title <- toupper(my_df$Title)
## create new column that is 'decade'
# first make a lookup table to get a decade for each individual year
year1 <- 1900:2050
my_seq <- seq(year1[1], year1[length(year1)], by = 10)
indx <- findInterval(year1, my_seq)
ind <- seq(1, length(my_seq), by = 1)
labl1 <- paste(my_seq[ind], my_seq[ind + 1], sep = "-")[-42]
dat1 <- data.table(data.frame(Publication.Year = year1,
                              decade = labl1[indx],
                              stringsAsFactors = FALSE))
setkey(dat1, 'Publication.Year')
# merge the decade column onto my_df
my_df <- merge(my_df, dat1, by = 'Publication.Year')
Find the most cited paper by decade of publication...
df_top <- my_df[ave(-my_df$Total.Citations, my_df$decade, FUN = rank) <= 10, ]
# inspecting this df_top table is quite interesting.
Draw the plot in a similar style to Kieran's; this code comes from Jonathan Goodwin, who also reproduced the plot for his field (1, 2):
######## plotting code from from Jonathan Goodwin ##########
######## http://jgoodwin.net/ ########
# format of data: Title, Total.Citations, decade, Source.Title
# THE WRITERS AUDIENCE IS ALWAYS A FICTION,205,1974-1979,PMLA
library(ggplot2)
ws <- df_top
ws <- ws[order(ws$decade,-ws$Total.Citations),]
ws$Title <- factor(ws$Title, levels = unique(ws$Title)) #to preserve order in plot, maybe there's another way to do this
g <- ggplot(ws, aes(x = Total.Citations,
                    y = Title,
                    label = short_title,
                    group = decade,
                    colour = short_title))

g <- g + geom_text(size = 4) +
    facet_grid(decade ~ .,
               drop = TRUE,
               scales = "free_y") +
    theme_bw(base_family = "Helvetica") +
    theme(axis.text.y = element_text(size = 8)) +
    xlab("Number of Web of Science Citations") + ylab("") +
    labs(title = "Archaeology's Ten Most-Cited Articles Per Decade (1970-)", size = 7) +
    scale_colour_discrete(name = "Journals")

g  # adjust sizing, etc.
Another version of the plot, but with no code: http://charlesbreton.ca/?page_id=179