How to shorten the group name in bibtex so that only the abbreviation appears in the text - author

I would like to cite a report by the European Commission using BibTeX (LaTeX). In the reference list I would like the full name to appear, but in the main text only the abbreviation. I am using the elsarticle-harv bibliography style.
Here is an example:
Main text:
The European Commission agreed in 2016 that ..... (EC, 2016).
References:
European Commission (EC), 2016. Communication from the ......
Here is my current bibentry:
@misc{EC2016,
  title = {{Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions: An EU Strategy on Heating and Cooling}},
  author = {{EC}},
  year = {2016}
}

You can define a cite alias through the natbib package.
\documentclass{article}
\usepackage{natbib}
\defcitealias{EC2016}{(EC, 2016)}
\usepackage{filecontents}
\begin{filecontents}{references.bib}
@misc{EC2016, title = {{Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions: An EU Strategy on Heating and Cooling}}, author = {{European Commission (EC)}}, year = {2016}}
\end{filecontents}
\bibliographystyle{elsarticle-harv}
\begin{document}
\citetalias{EC2016}
\bibliography{references}
\end{document}


How do I import a txt file into R and separate text into columns based on certain criteria

I have some job descriptions saved in a txt file format. The job title, job description, job title, etc are all lumped together and I am trying to separate them into columns. The text is about 5 pages long. Here is a sample of how the text is structured -
EXECUTIVE LEVEL
001 Chief Executive Officer: Job description of CEO.
040 Area Director: This line contains job description of the Area Director.
FINANCE TEAM
025 Chief Operating Officer: This line contains job description of the Chief Operating Officer
055 Chief Financial Officer: This person controls operations of the company and reports to the COO
MARKETING TEAM
056 Marketing Director: This person is in charge of the marketing team. Blab la bla
I would like to create a dataframe (or is it called tibble these days?) with 4 columns -
column 1 - The team name (Executive Level, Finance Team, Marketing Team, etc)
column 2 - Team number (001, 040, 025, 055, etc)
column 3 - The job title (Chief Executive Officer, Chief Operating Officer, etc)
column 4 - The job description
Thanks in advance
x2 <- x[nzchar(x)]
x3 <- split(x2, cumsum(grepl("^[A-Z]", x2)))
x4 <- lapply(x3, function(z) transform(strcapture("^([0-9]+)\\s+([^:]+):\\s*(.*)$", z[-1], list(num="", title="", desc="")), name=z[1]))
x5 <- do.call(rbind, x4)
x5
# num title desc name
# 1.1 001 Chief Executive Officer Job description of CEO. EXECUTIVE LEVEL
# 1.2 040 Area Director This line contains job description of the Area Director. EXECUTIVE LEVEL
# 2.1 025 Chief Operating Officer This line contains job description of the Chief Operating Officer FINANCE TEAM
# 2.2 055 Chief Financial Officer This person controls operations of the company and reports to the COO FINANCE TEAM
# 3 056 Marketing Director This person is in charge of the marketing team. Blab la bla MARKETING TEAM
Data, likely the result of x <- readLines(path_to_file):
x <- c("EXECUTIVE LEVEL", "001 Chief Executive Officer: Job description of CEO.", "040 Area Director: This line contains job description of the Area Director.", "", "FINANCE TEAM", "025 Chief Operating Officer: This line contains job description of the Chief Operating Officer", "055 Chief Financial Officer: This person controls operations of the company and reports to the COO", "", "MARKETING TEAM", "056 Marketing Director: This person is in charge of the marketing team. Blab la bla")
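If the column order from the question matters (team, number, title, description), a variant of the same idea might look like the sketch below. It assumes that every job line starts with a digit and every other non-empty line is a team header; the column names and the shortened sample vector are made up for illustration.

```r
# Shortened, hypothetical sample; in practice x would come from readLines()
x <- c("EXECUTIVE LEVEL",
       "001 Chief Executive Officer: Job description of CEO.",
       "",
       "FINANCE TEAM",
       "025 Chief Operating Officer: Runs operations.")

x2 <- x[nzchar(x)]                        # drop empty lines
is_header <- !grepl("^[0-9]", x2)         # assumption: job lines start with a digit
team <- x2[is_header][cumsum(is_header)]  # carry each header forward to the lines below it

# Split "NNN Title: description" into three character columns
parsed <- strcapture(
  "^([0-9]+)\\s+([^:]+):\\s*(.*)$",
  x2[!is_header],
  data.frame(number = character(), title = character(), description = character())
)

jobs <- data.frame(team = team[!is_header], parsed, stringsAsFactors = FALSE)
jobs
```

Keeping the number as a character column preserves the leading zeros (001, 025).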

How to read a cson file into R

I came across a cson (CoffeeScript-Object-Notation) file format and I would like to import it into R.
cson_example <- "# Comments!!!
# An Array with no commas!
greatDocumentaries: [
'earthlings.com'
'forksoverknives.com'
'cowspiracy.com'
]
# An Object without braces!
importantFacts:
# Multi-Line Strings! Without Quote Escaping!
emissions: '''
Livestock and their byproducts account for at least 32,000 million tons of carbon dioxide (CO2) per year, or 51% of all worldwide greenhouse gas emissions.
Goodland, R Anhang, J. “Livestock and Climate Change: What if the key actors in climate change were pigs, chickens and cows?”
WorldWatch, November/December 2009. Worldwatch Institute, Washington, DC, USA. Pp. 10–19.
http://www.worldwatch.org/node/6294
'''
milk: '''
1,000 gallons of water are required to produce 1 gallon of milk.
“Water trivia facts.” United States Environmental Protection Agency.
http://water.epa.gov/learn/kids/drinkingwater/water_trivia_facts.cfm#_edn11
'''
more: 'http://cowspiracy.com/facts'"
Is there a robust way to import cson files into R?

Transforming kwic objects into single dfm

I have a corpus of newspaper articles of which only specific parts are of interest for my research. I'm not happy with the results I get from classifying texts along different frames because the data contains too much noise. I therefore want to extract only the relevant parts from the documents. I was thinking of doing so by transforming several kwic objects generated by the quanteda package into a single dfm.
So far I've tried the following
exampletext <- c("The only reason for (the haste) which we can discern is the prospect of an Olympic medal, which is the raison d'etat of the banana republic,'' The Guardian said in an editorial under the headline ''Whatever Zola Wants. . .'' The Government made it clear it had acted promptly on the application to insure that the 5-foot-2-inch track star could qualify for the British Olympic team. The International Olympic Organization has a rule that says athletes who change their nationality must wait three years before competing for that country - a rule, however, that is often waived by the I.O.C. The British Olympic Association said it consulted with the I.O.C. before asserting Miss Budd's eligibility for the British team. ''Since Zola is now here and has a British passport she should be made to feel welcome and accepted by other British athletes,'' said Paul Dickenson, chairman of the International Athletes Club, an organization that raises money for amateur athletes and looks after their political interests. ''The thing we objected to was the way she got into the country by the Government and the Daily Mail and the commercialization exploitation associated with it.", "That left 14 countries that have joined the Soviet-led withdrawal. Albania and Iran had announced that they would not compete and did not send written notification. Bolivia, citing financial trouble, announced Sunday it would not participate.The 1972 Munich Games had the previous high number of competing countries, 122.No Protest Planned on Zola Budd YAOUNDE, Cameroon, June 4 (AP) - African countries do not plan to boycott the Los Angeles Olympics in protest of the inclusion of Zola Budd, the South African-born track star, on the British team, according to Lamine Ba, the secretary-general of the Supreme Council for Sport in Africa. 
Because South Africa is banned from participation in the Olympics, Miss Budd, whose father is of British descent, moved to Britain in March and was granted British citizenship.75 Olympians to Train in Atlanta ATLANTA, June 4 (AP) - About 75 Olympic athletes from six African countries and Pakistan will participate in a month-long training camp this summer in Atlanta under a program financed largely by a grant from the United States Information Agency, Anne Bassarab, a member of Mayor Andrew Young's staff, said today. The athletes, from Mozambique, Tanzania, Zambia, Zimbabwe, Uganda, Somalia and Pakistan, will arrive here June 24.")
mycorpus <- corpus(exampletext)
mycorpus.nat <- corpus(kwic(mycorpus, "nationalit*", window = 5, valuetype = "glob"))
mycorpus.cit <- corpus(kwic(mycorpus, "citizenship", window = 5, valuetype = "glob"))
mycorpus.kwic <- mycorpus.nat + mycorpus.cit
mydfm <- dfm(mycorpus.kwic)
This, however, generates a dfm that contains 4 documents instead of 2, and when both keywords are present in a document even more. I can't think of a way to bring the dfm down to the original number of documents.
Thank you for helping me out.
We recently added the window argument to tokens_select() for this purpose:
require(quanteda)
txt <- c("The only reason for (the haste) which we can discern is the prospect of an Olympic medal, which is the raison d'etat of the banana republic,'' The Guardian said in an editorial under the headline ''Whatever Zola Wants. . .'' The Government made it clear it had acted promptly on the application to insure that the 5-foot-2-inch track star could qualify for the British Olympic team. The International Olympic Organization has a rule that says athletes who change their nationality must wait three years before competing for that country - a rule, however, that is often waived by the I.O.C. The British Olympic Association said it consulted with the I.O.C. before asserting Miss Budd's eligibility for the British team. ''Since Zola is now here and has a British passport she should be made to feel welcome and accepted by other British athletes,'' said Paul Dickenson, chairman of the International Athletes Club, an organization that raises money for amateur athletes and looks after their political interests. ''The thing we objected to was the way she got into the country by the Government and the Daily Mail and the commercialization exploitation associated with it.", "That left 14 countries that have joined the Soviet-led withdrawal. Albania and Iran had announced that they would not compete and did not send written notification. Bolivia, citing financial trouble, announced Sunday it would not participate.The 1972 Munich Games had the previous high number of competing countries, 122.No Protest Planned on Zola Budd YAOUNDE, Cameroon, June 4 (AP) - African countries do not plan to boycott the Los Angeles Olympics in protest of the inclusion of Zola Budd, the South African-born track star, on the British team, according to Lamine Ba, the secretary-general of the Supreme Council for Sport in Africa. 
Because South Africa is banned from participation in the Olympics, Miss Budd, whose father is of British descent, moved to Britain in March and was granted British citizenship.75 Olympians to Train in Atlanta ATLANTA, June 4 (AP) - About 75 Olympic athletes from six African countries and Pakistan will participate in a month-long training camp this summer in Atlanta under a program financed largely by a grant from the United States Information Agency, Anne Bassarab, a member of Mayor Andrew Young's staff, said today. The athletes, from Mozambique, Tanzania, Zambia, Zimbabwe, Uganda, Somalia and Pakistan, will arrive here June 24.")
toks <- tokens(txt)
mt_nat <- dfm(tokens_select(toks, "nationalit*", window = 5))
mt_cit <- dfm(tokens_select(toks, "citizenship*", window = 5))
Please make sure that you are using the latest version of quanteda.
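To end up with one dfm that still has one row per original document, both patterns can probably be passed to a single tokens_select() call. This is only a sketch: it assumes an installed quanteda version whose tokens_select() has the window argument mentioned above, and the two short texts are stand-ins for the real articles.

```r
library(quanteda)

# Two short stand-in documents (hypothetical, much shorter than the real articles)
txt <- c(
  "Athletes who change their nationality must wait three years before competing for that country.",
  "Miss Budd moved to Britain in March and was granted British citizenship."
)

toks <- tokens(txt)

# Keep each keyword plus up to 5 tokens on either side, for both patterns at once
toks_kw <- tokens_select(toks, pattern = c("nationalit*", "citizenship"), window = 5)

mt <- dfm(toks_kw)
ndoc(mt)  # number of documents in the combined dfm
```

Because the selection happens on the tokens object rather than on separate kwic results, the document count is not multiplied by the number of keyword hits.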

error reading text file into new columns of a dataframe using some text editing

I have a text file (0001.txt) which contains the data as below:
<DOC>
<DOCNO>1100101_business_story_11931012.utf8</DOCNO>
<TEXT>
The Telegraph - Calcutta (Kolkata) | Business | Local firms go global
6 Local firms go global
JAYANTA ROY CHOWDHURY
New Delhi, Dec. 31: Indian companies are stepping out of their homes to try their luck on foreign shores.
Corporate India invested $2.7 billion abroad in the first quarter of 2009-2010 on top of $15.9 billion in 2008-09.
Though the first-quarter investment was 15 per cent lower than what was invested in the same period last year, merchant banker Sudipto Bose said, It marks a confidence in a new world order where Indian businesses see themselves as equal to global players.
According to analysts, confidence in global recovery, cheap corporate buys abroad and easier rules governing investment overseas had spurred flow of capital and could see total investment abroad top $12 billion this year and rise to $18-20 billion next fiscal.
For example, Titagarh Wagons plans to expand abroad on the back of the proposed Asian railroad project.
We plan to travel all around the world with the growth of the railroads, said Umesh Chowdhury of Titagarh Wagons.
India is full of opportunities, but we are all also looking at picks abroad, said Gautam Mitra, managing director of Indian Structurals Engineering Company.
Mitra plans to open a holding company in Switzerland to take his business in structurals to other Asian and African countries.
Indian companies created 3 lakh jobs in the US, while contributing $105 billion to the US economy between 2004 and 2007, according to commerce ministry statistics. During 2008-09, Singapore, the Netherlands, Cyprus, the UK, the US and Mauritius together accounted for 81 per cent of the total outward investment.
Bose said, And not all of it is organic growth. Much of our investment abroad reflects takeovers and acquisitions.
In the last two years, Suzlon acquired Portugals Martifers stake in German REpower Systems for $122 million. McNally Bharat Engineering has bought the coal and minerals processing business of KHD Humboldt Wedag. ONGC bought out Imperial Energy for $2 billion.
Indias foreign assets and liabilities today add up to more than 60 per cent of its gross domestic product. By the end of 2008-09, total foreign investment was $67 billion, more than double of that at the end of March 2007.
</TEXT>
</DOC>
Above, all of the text data sits between the <TEXT> and </TEXT> tags.
I want to read it into an R dataframe in a way that there will be four columns and the data should be read as:
Title Author Date Text
The Telegraph - Calcutta (Kolkata) JAYANTA ROY CHOWDHURY Dec. 31 Indian companies are stepping out of their homes to try their luck on foreign shores. Corporate India invested $2.7 billion abroad in the first quarter of 2009-2010 on top of $15.9 billion in 2008-09. Though the first-quarter investment was 15 percent lower than what was invested in the same period last year, merchant banker Sudipto Bose said, It marks a confidence in a new world order where Indian businesses see themselves as equal to global players.
Here is what I tried, using dplyr and readr:
# read text file
library(dplyr)
library(readr)
dat <- read_csv("0001.txt") %>% slice(-8)
# print part of data frame
head(dat, n=2)
In the code above, I tried to skip the first few lines of the text file (which are not important) and then read the rest into a data frame.
But I could not get the result I was looking for, and I am confused about what I am doing wrong.
Could someone please help?
To be able to read data into R as a data frame or table, the data needs to have a consistent structure maintained by separators. One of the most common formats is a file with comma separated values (CSV).
The data you're working with doesn't have separators though. It's essentially a string with minimally enforced structure. Because of this, it sounds like the question is more related to regular expressions (regex) and data mining than it is to reading text files into R. So I'd recommend looking into those two things if you do this task often.
That aside, to do what you're wanting in this example, I'd recommend reading the text file into R as a single string of text first. Then you can parse the data you want using regex. Here's a basic, rough draft of how to do that:
fileName <- "Path/to/your/data/0001.txt"
string <- readChar(fileName, file.info(fileName)$size)
df <- data.frame(
  Title=sub("\\s+[|]+(.*)","",string),
  Author=gsub("(.*)+?([A-Z]{2,}.*[A-Z]{2,})+(.*)","\\2",string),
  Date=gsub("(.*)+([A-Z]{1}[a-z]{2}\\.\\s[0-9]{1,2})+(.*)","\\2",string),
  Text=gsub("(.*)+([A-Z]{1}[a-z]{2}\\.\\s[0-9]{1,2})+[: ]+(.*)","\\3",string))
Output:
str(df)
'data.frame': 1 obs. of 4 variables:
$ Title : chr "The Telegraph - Calcutta (Kolkata)"
$ Author: chr "JAYANTA ROY CHOWDHURY"
$ Date : chr "Dec. 31"
$ Text : chr "Indian companies are stepping out of their homes to"| __truncated__
The reason why regex can be useful is that it allows for very specific patterns in strings. The downside is when you're working with strings that keep changing formats. That will likely mean some slight adjustments to the regex used.
read.table( file = ... , sep = "|") will solve your issue.
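Another option, if every file follows the layout shown in the question, is to work line by line instead of on one long string. The sketch below is an assumption-heavy illustration, not a general parser: it assumes the first line after <TEXT> is "Title | section | headline", the third line is the author, and the fourth line starts with a dateline such as "New Delhi, Dec. 31:". The function name parse_doc and the shortened sample are made up.

```r
# Shortened stand-in for readLines("0001.txt")
lines <- c(
  "<DOC>",
  "<DOCNO>1100101_business_story_11931012.utf8</DOCNO>",
  "<TEXT>",
  "The Telegraph - Calcutta (Kolkata) | Business | Local firms go global",
  "6 Local firms go global",
  "JAYANTA ROY CHOWDHURY",
  "New Delhi, Dec. 31: Indian companies are stepping out of their homes to try their luck on foreign shores.",
  "Corporate India invested $2.7 billion abroad in the first quarter of 2009-2010 on top of $15.9 billion in 2008-09.",
  "</TEXT>",
  "</DOC>"
)

parse_doc <- function(lines) {
  lines <- trimws(lines)
  # Keep only the lines strictly between <TEXT> and </TEXT>
  body <- lines[(which(lines == "<TEXT>")[1] + 1):(which(lines == "</TEXT>")[1] - 1)]

  title  <- trimws(sub("\\|.*$", "", body[1]))   # everything before the first "|"
  author <- body[3]                              # assumed position of the byline
  # First "Mon. DD" style date on the dateline
  date   <- regmatches(body[4], regexpr("[A-Z][a-z]{2}\\.? [0-9]{1,2}", body[4]))
  # Drop the "Place, Mon. DD:" prefix, then append the remaining paragraphs
  first  <- sub("^[^:]*:\\s*", "", body[4])
  text   <- paste(c(first, body[-(1:4)]), collapse = " ")

  data.frame(Title = title, Author = author, Date = date, Text = text,
             stringsAsFactors = FALSE)
}

df <- parse_doc(lines)
str(df)
```

With many files, the same function could be applied to each one via lapply() and the results combined with do.call(rbind, ...), provided the layout assumption holds for all of them.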

Creating Tidy Text

I am using R for text analysis. I used the 'readtext' function to pull in text from a pdf. However, as you can imagine, it is pretty messy. I used 'gsub' to replace text for different purposes. The general goal is to use one type of delimiter '%%%%%' to split records into rows, and another delimiter '#' into columns. I accomplished the first but am at a loss of how to accomplish the latter. A sample of the data found in the dataframe is as follows:
895 "The ambulatory case-mix development project\n#Published:: June 6, 1994#Authors: Baker A, Honigfeld S, Lieberman R, Tucker AM, Weiner JP#Country: United States #Journal:Project final report. Baltimore, MD, USA: Johns Hopkins University and Aetna Health Plans. Johns Hopkins\nUniversity and Aetna Health Plans, USA As the US […"
896 "Ambulatory Care Groups: an evaluation for military health care use#Published:: June 6, 1994#Authors: Bolling DR, Georgoulakis JM, Guillen AC#Country: United States #Journal:Fort Sam Houston, TX, USA: United States Army Center for Healthcare Education and Studies, publication #HR 94-\n004. United States Army Center for Healthcare Education and […]#URL: http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix=html&identifier=ADA27804"
I want to take this data and split the #Published, #Authors, #Journal, #URL into columns -- c("Published", "Authors", "Journal", "URL").
Any suggestions?
Thanks in advance!
This seems to work OK:
dfr <- data.frame(TEXT=c("The ambulatory case-mix development project\n#Published:: June 6, 1994#Authors: Baker A, Honigfeld S, Lieberman R, Tucker AM, Weiner JP#Country: United States #Journal:Project final report. Baltimore, MD, USA: Johns Hopkins University and Aetna Health Plans. Johns Hopkins\nUniversity and Aetna Health Plans, USA As the US […",
"Ambulatory Care Groups: an evaluation for military health care use#Published:: June 6, 1994#Authors: Bolling DR, Georgoulakis JM, Guillen AC#Country: United States #Journal:Fort Sam Houston, TX, USA: United States Army Center for Healthcare Education and Studies, publication #HR 94-\n004. United States Army Center for Healthcare Education and […]#URL: http://oai.dtic.mil/oai/oai?verb=getRecord&metadataPrefix=html&identifier=ADA27804"),
stringsAsFactors = FALSE)
library(magrittr)
do.call(rbind, strsplit(dfr$TEXT, "#Published::|#Authors:|#Country:|#Journal:")) %>%
  as.data.frame %>%
  setNames(nm = c("Preamble","Published","Authors","Country","Journal"))
Basically, this splits the text at each of the four field tags (note the double :: after Published!), row-binds the pieces, converts the result to a data frame, and assigns column names.
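The split-and-rbind approach relies on every record containing the same tags in the same order. Where some tags are optional (such as #URL in the sample), pulling each field out by name may be more forgiving. The following is a rough sketch under the assumption that a field ends at the next "#Word:" tag or at the end of the record; extract_field is a made-up helper and the records are shortened stand-ins.

```r
# Return the value following "#field:" in each record, or NA when the tag is absent
extract_field <- function(text, field) {
  # (?s) lets "." span the embedded newlines; ":+" tolerates the double "::"
  pattern <- paste0("(?s)#", field, ":+\\s*(.*?)\\s*(?=#[A-Za-z]+:|$)")
  hits <- regmatches(text, regexec(pattern, text, perl = TRUE))
  vapply(hits, function(h) if (length(h) >= 2) h[2] else NA_character_, character(1))
}

# Shortened, hypothetical records in the same shape as the sample data
records <- c(
  "First title\n#Published:: June 6, 1994#Authors: Baker A, Weiner JP#Country: United States #Journal:Project final report",
  "Second title#Published:: June 6, 1994#Authors: Bolling DR#Country: United States #Journal:Army report, publication #HR 94-\n004#URL: http://example.org/record"
)

fields <- c("Published", "Authors", "Country", "Journal", "URL")
out <- as.data.frame(
  lapply(setNames(fields, fields), function(f) extract_field(records, f)),
  stringsAsFactors = FALSE
)
out
```

A stray "#" inside a value, like "#HR 94-", is left alone because it is not immediately followed by letters and a colon.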
