How do I transpose this dataset in R? See below:
I downloaded a dataset that looks like this (the dates go backward from 2016 - 1975):
V1 V2 V3 V4 V5
1 2016 2016 2016 2015
4 Country Both-sexes Male Female Both-sexes
5 Afghanistan 23.4 [22.0-24.8] 22.6 [20.1-25.1] 24.1 [23.0-25.3] 23.3 [21.9-24.6]
6 Albania 26.7 [25.8-27.5] 27.0 [25.8-28.2] 26.3 [25.0-27.6] 26.6 [25.8-27.4]
7 Algeria 25.5 [24.5-26.5] 24.7 [23.4-26.1] 26.4 [24.9-27.8] 25.5 [24.5-26.4]
8 Andorra 26.7 [24.6-28.7] 27.3 [24.8-29.8] 26.1 [22.8-29.5] 26.7 [24.7-28.7]
I need to make the year and sex rows (currently numbered rows 1 & 4) into columns. Here's what I want:
1 Country Year Sex Rate
2 Afghanistan 2016 Both-sexes 23.4
3 Afghanistan 2016 Male 22.6
3 Afghanistan 2016 Female 24.1
4 Afghanistan 2015 Both-sexes 23.3
...and the rows continue on through all of the years for all of the countries in the dataset.
Here's what I have done trying to get there:
library(reshape2)
cfile <- read.csv(file = "countries-BMI.csv", header = FALSE)
# remove rows 2 and 3, which hold unnecessary info
countries_data <- cfile[-c(2,3), ]
molten_countries_data <- melt(countries_data, id=c("V1"))
...and here's my result - head(molten_countries_data):
V1 variable value
1 V2 2016
2 Country V2 Both-sexes
3 Afghanistan V2 23.4 [22.0-24.8]
4 Albania V2 26.7 [25.8-27.5]
5 Algeria V2 25.5 [24.5-26.5]
6 Andorra V2 26.7 [24.6-28.7]
Not what I wanted! Please help.
I figured it out thanks to the tip from @Dave2e to merge the first 2 rows first. Here's what I ended up doing:
library(reshape2)
library(tidyr)
#load data frame without first two rows
cdata <- read.csv("countries-BMI.csv", skip = 2, header = F)
#create header by combining top two rows
headers <- read.csv("countries-BMI.csv", nrows=2, header=FALSE)
headers_names <- sapply(headers,paste,collapse="_")
#add the new header to data frame
names(cdata) <- headers_names
#transpose the "wide data" to make it tidy/long
longdata <- melt(cdata, id.vars = c("_Country"))
#separate the year and sex columns
countriesBMI2 <- separate(data = longdata, col = variable, into = c("Year", "Sex"), sep = "_")
My result: head(countriesBMI2)
_Country Year Sex value
1 Afghanistan 2016 Both-sexes 23.4 [22.0-24.8]
2 Albania 2016 Both-sexes 26.7 [25.8-27.5]
3 Algeria 2016 Both-sexes 25.5 [24.5-26.5]
4 Andorra 2016 Both-sexes 26.7 [24.6-28.7]
5 Angola 2016 Both-sexes 23.3 [21.2-25.6]
6 Antigua and Barbuda 2016 Both-sexes 26.7 [24.6-28.8]
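The value column above still carries the bracketed interval, whereas the desired output lists a bare rate. Here is a minimal sketch of one way to strip it, assuming every entry has the form "23.4 [22.0-24.8]". The small data frame and the Rate column name are made up for illustration.

```r
# Hypothetical sample mirroring the long-format result shown above
longdata <- data.frame(
  Country = c("Afghanistan", "Albania"),
  Year    = c("2016", "2016"),
  Sex     = c("Both-sexes", "Both-sexes"),
  value   = c("23.4 [22.0-24.8]", "26.7 [25.8-27.5]"),
  stringsAsFactors = FALSE
)

# Drop everything from the whitespace before "[" onward, then convert
longdata$Rate <- as.numeric(sub("\\s*\\[.*$", "", longdata$value))

# Year arrives as text after separate(); convert it as well
longdata$Year <- as.integer(longdata$Year)
```

If some entries lack a bracketed interval, sub() leaves them unchanged, so as.numeric() still applies to them.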
Currently I have the following data set for every country in the world, from Afghanistan to Zimbabwe, for the years 1996 to 2021 (the full data set was difficult to show in a picture).
I would like to have the data in a panel data form as follows:
Country Year Central Government Debt
Afghanistan 1996 34,69009
Afghanistan ... ...
Afghanistan 2021 value for 2021
....
Zimbabwe 1996 value for 1996
Zimbabwe ... ...
Zimbabwe 2021 value for 2021
So, I would like to have the variables Country, Year and Central Government Debt as the columns. Then all the countries (from Afghanistan to Zimbabwe as one could see in the picture) as rows with the corresponding values of the government debt from every year.
I hope that I have explained the problem clearly enough.
You can use pivot_longer().
For example, here is some sample data similar to yours:
your_data = structure(list(country = c("Country1", "Country2", "Country3",
"Country4", "Country5"), `1996` = c(43.5759781114757, 39.4892847444862,
31.1313527473249, 22.3277196078561, 13.8667897786945), `1997` = c(64.1469739656895,
39.3858637614176, 185.817262185737, 97.6506256405264, 81.9881041860208
), `1998` = c(53.3410886977799, 42.1991529292427, 46.6682229144499,
54.6986216097139, 34.4061564910226)), class = "data.frame", row.names = c(NA,
-5L))
your_data
# country 1996 1997 1998
# 1 Country1 43.57598 64.14697 53.34109
# 2 Country2 39.48928 39.38586 42.19915
# 3 Country3 31.13135 185.81726 46.66822
# 4 Country4 22.32772 97.65063 54.69862
# 5 Country5 13.86679 81.98810 34.40616
Use the tidyverse package:
library(tidyverse)
new_data = your_data |>
  pivot_longer(cols = 2:4, # These are column positions
               names_to = "year",
               values_to = "central_government_debt")
new_data
# A tibble: 15 × 3
# country year central_government_debt
# <chr> <chr> <dbl>
# 1 Country1 1996 43.6
# 2 Country1 1997 64.1
# 3 Country1 1998 53.3
# 4 Country2 1996 39.5
# 5 Country2 1997 39.4
# 6 Country2 1998 42.2
# 7 Country3 1996 31.1
# 8 Country3 1997 186.
# 9 Country3 1998 46.7
# 10 Country4 1996 22.3
# 11 Country4 1997 97.7
# 12 Country4 1998 54.7
# 13 Country5 1996 13.9
# 14 Country5 1997 82.0
# 15 Country5 1998 34.4
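In the output above, year comes back as character, and the columns are picked by position. Here is a hedged variant that selects every column except country and converts the year while pivoting. It assumes a tidyr version that supports names_transform, and the wide data frame and its values are made up.

```r
library(tidyr)

# Hypothetical miniature of the wide table (values are made up)
wide <- data.frame(
  country = c("Country1", "Country2"),
  `1996`  = c(43.6, 39.5),
  `1997`  = c(64.1, 39.4),
  check.names = FALSE
)

long <- pivot_longer(
  wide,
  cols = -country,                            # every column except `country`
  names_to = "year",
  names_transform = list(year = as.integer),  # "1996" -> 1996L
  values_to = "central_government_debt"
)
```

Selecting with -country avoids hard-coding positions such as 2:4, which matters once the real data spans 1996 to 2021.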
I'm trying to get the first values from different columns to make a data frame, but I get stuck at one point and don't know how to solve it. Imagine you're using gapminder and want to get the three highest gdpPercap values for each region/year. How would you do it with dplyr?
Thanks.
I'm inferring that region is continent; if it were country, then this filter would return all rows, since each country/year combination occurs only once (so "top 3" means nothing special).
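library(dplyr)
gapminder::gapminder %>%
  group_by(continent, year) %>%
  # Note: desc() reverses the ordering, so this keeps the three LOWEST values
  # per group (as the output below shows). For the three highest, use
  # slice_max(gdpPercap, n = 3) instead.
  slice_max(desc(gdpPercap), n = 3) %>%
  ungroup()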
# # A tibble: 168 x 6
# country continent year lifeExp pop gdpPercap
# <fct> <fct> <int> <dbl> <int> <dbl>
# 1 Lesotho Africa 1952 42.1 748747 299.
# 2 Guinea-Bissau Africa 1952 32.5 580653 300.
# 3 Eritrea Africa 1952 35.9 1438760 329.
# 4 Lesotho Africa 1957 45.0 813338 336.
# 5 Eritrea Africa 1957 38.0 1542611 344.
# 6 Ethiopia Africa 1957 36.7 22815614 379.
# 7 Burundi Africa 1962 42.0 2961915 355.
# 8 Eritrea Africa 1962 40.2 1666618 381.
# 9 Lesotho Africa 1962 47.7 893143 412.
# 10 Burundi Africa 1967 43.5 3330989 413.
# # ... with 158 more rows
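For the three highest values per group, which is what the question asks for, the column can be passed to slice_max() without desc(). Here is a minimal sketch on a made-up table. The toy data frame and its values are hypothetical, so it runs without the gapminder package.

```r
library(dplyr)

# Hypothetical stand-in for gapminder: two continents, four countries each
toy <- tibble(
  continent = rep(c("X", "Y"), each = 4),
  year      = 2000L,
  gdpPercap = c(100, 400, 300, 200, 10, 40, 30, 20)
)

top3 <- toy %>%
  group_by(continent, year) %>%
  slice_max(gdpPercap, n = 3) %>%   # three largest gdpPercap per group
  ungroup()
```

By default slice_max() keeps ties, so a group can return more than n rows. Pass with_ties = FALSE if exactly three rows per group are required.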
I have searched through the forums and have not found exactly the answer to my question. I have a data set from the World Bank
library(wbstats)
Gini <- wb(indicator = c("SI.POV.GINI"),
startdate = 2005, enddate = 2020)
Gini <- Gini[,c("iso3c", "date", "value")]
names(Gini)
names(Gini)<-c("iso3c", "date", "Gini")
#Change date to numeric
class(Gini$date)
Gini$date<-as.numeric(Gini$date)
#Tibble:
# A tibble: 1,012 x 3
iso3c date Gini
<chr> <dbl> <dbl>
1 ALB 2017 33.2
2 ALB 2016 33.7
3 ALB 2015 32.9
4 ALB 2014 34.6
5 ALB 2012 29
6 ALB 2008 30
7 ALB 2005 30.6
8 DZA 2011 27.6
9 AGO 2018 51.3
10 AGO 2008 42.7
# … with 1,002 more rows
Then I try to lag this estimate by one year
#Lag Gini
library(plyr)  # provides ddply()
lg <- function(x)c(NA, x[1:(length(x)-1)])
Lagged.Gini<-ddply(Gini, ~ iso3c, transform, Gini.lag.1 = lg(Gini))
tibble(Lagged.Gini)
# A tibble: 1,032 x 4
iso3c date Gini Gini.lag.1
<chr> <dbl> <dbl> <dbl>
1 AGO 2018 51.3 NA
2 AGO 2008 42.7 51.3
3 ALB 2017 33.2 NA
4 ALB 2016 33.7 33.2
5 ALB 2015 32.9 33.7
6 ALB 2014 34.6 32.9
7 ALB 2012 29 34.6
8 ALB 2008 30 29
9 ALB 2005 30.6 30
10 ARE 2014 32.5 NA
Unfortunately, when years are missing, the lag does not recognize the gap and simply uses the nearest available year. For example, for country "ALB" the 2012 Gini estimate is not lagged by one year; the lag jumps to the next available year, which is 2008.
I would like the final data to look like my hand-edited version below, and ideally I would be able to lag by multiple years:
# A tibble: 1,032 x 4
iso3c date Gini Gini.lag.1
<chr> <dbl> <dbl> <dbl>
1 AGO 2018 51.3 NA
AGO 2017 NA 51.3
2 AGO 2008 42.7 NA
AGO 2007 NA 42.7
3 ALB 2017 33.2 NA
4 ALB 2016 33.7 33.2
5 ALB 2015 32.9 33.7
6 ALB 2014 34.6 32.9
ALB 2013 NA 29
7 ALB 2012 29 NA
8 ALB 2008 30 29
9 ALB 2005 30.6 30
10 ARE 2014 32.5 NA
pseudospin's answer is great for base R. Since you're using tibbles, here's a tidyverse version with the same effect:
Gini <- readr::read_table("
iso3c date Gini
ALB 2017 33.2
ALB 2016 33.7
ALB 2015 32.9
ALB 2014 34.6
ALB 2012 29
ALB 2008 30
ALB 2005 30.6
DZA 2011 27.6
AGO 2018 51.3
AGO 2008 42.7")
library(dplyr)
Gini %>%
transmute(iso3c, date = date - 1, Gini.lag.1 = Gini) %>%
full_join(Gini, ., by = c("iso3c", "date")) %>%
arrange(iso3c, desc(date))
# # A tibble: 17 x 4
# iso3c date Gini Gini.lag.1
# <chr> <dbl> <dbl> <dbl>
# 1 AGO 2018 51.3 NA
# 2 AGO 2017 NA 51.3
# 3 AGO 2008 42.7 NA
# 4 AGO 2007 NA 42.7
# 5 ALB 2017 33.2 NA
# 6 ALB 2016 33.7 33.2
# 7 ALB 2015 32.9 33.7
# 8 ALB 2014 34.6 32.9
# 9 ALB 2013 NA 34.6
# 10 ALB 2012 29 NA
# 11 ALB 2011 NA 29
# 12 ALB 2008 30 NA
# 13 ALB 2007 NA 30
# 14 ALB 2005 30.6 NA
# 15 ALB 2004 NA 30.6
# 16 DZA 2011 27.6 NA
# 17 DZA 2010 NA 27.6
If you need to do this n times (one more lag each time), you can extend it programmatically this way:
Ginilags <- lapply(1:3, function(lg) {
z <- transmute(Gini, iso3c, date = date - lg, Gini)
names(z)[3] <- paste0("Gini.lag.", lg)
z
})
Reduce(function(a,b) full_join(a, b, by = c("iso3c", "date")),
c(list(Gini), Ginilags)) %>%
arrange(iso3c, desc(date))
# # A tibble: 28 x 6
# iso3c date Gini Gini.lag.1 Gini.lag.2 Gini.lag.3
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 AGO 2018 51.3 NA NA NA
# 2 AGO 2017 NA 51.3 NA NA
# 3 AGO 2016 NA NA 51.3 NA
# 4 AGO 2015 NA NA NA 51.3
# 5 AGO 2008 42.7 NA NA NA
# 6 AGO 2007 NA 42.7 NA NA
# 7 AGO 2006 NA NA 42.7 NA
# 8 AGO 2005 NA NA NA 42.7
# 9 ALB 2017 33.2 NA NA NA
# 10 ALB 2016 33.7 33.2 NA NA
# # ... with 18 more rows
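Using dplyr & tidyr from tidyverse, you can do a rowwise mutate that looks up, within the same country, the value from the year before the current row's year.
library(tidyverse)
Gini %>%
  rowwise() %>%
  # .env$Gini is the data frame; a bare `Gini` here would refer to the column
  mutate(Gini.lag.1 = list(.env$Gini$Gini[.env$Gini$iso3c == iso3c &
                                           .env$Gini$date == date - 1])) %>%
  ungroup() %>%
  unnest(Gini.lag.1, keep_empty = TRUE)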
You could create a copy of the original table, but with the date having one year subtracted off. Then just join the two together on the iso3c and date columns to get the final result as you want it.
Like this
Gini_lagged <- data.frame(
iso3c = Gini$iso3c,
date = Gini$date-1,
Gini.lag.1 = Gini$Gini)
merge(Gini,Gini_lagged,all=TRUE)
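The same copy-shift-merge idea extends to several lags in base R. The sketch below is an illustration rather than part of the answer above: add_lag is a hypothetical helper, and the four-row Gini table is a cut-down sample.

```r
# Cut-down sample of the Gini table (a few rows only)
Gini <- data.frame(
  iso3c = c("ALB", "ALB", "ALB", "AGO"),
  date  = c(2017, 2016, 2014, 2018),
  Gini  = c(33.2, 33.7, 34.6, 51.3)
)

# Hypothetical helper: copy the table with the date shifted back by k years
add_lag <- function(df, k) {
  shifted <- data.frame(iso3c = df$iso3c, date = df$date - k, value = df$Gini)
  names(shifted)[3] <- paste0("Gini.lag.", k)
  shifted
}

# Full outer merge of the original with each shifted copy, one after another
lagged <- Reduce(
  function(a, b) merge(a, b, by = c("iso3c", "date"), all = TRUE),
  c(list(Gini), lapply(1:2, function(k) add_lag(Gini, k)))
)
```

Because all = TRUE keeps unmatched rows from both sides, years with no observation (such as ALB 2015 here) appear with NA in Gini and values only in the lag columns.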
I often need to rescale time series relative to their value at a certain baseline time (usually as a percent of the baseline). Here's an example.
> library(dplyr)
> library(magrittr)
> library(tibble)
> library(tidyr)
# [messages from package imports snipped]
> set.seed(42)
> mexico <- tibble(Year=2000:2004, Country='Mexico', A=10:14+rnorm(5), B=20:24+rnorm(5))
> usa <- tibble(Year=2000:2004, Country='USA', A=30:34+rnorm(5), B=40:44+rnorm(5))
> table <- rbind(mexico, usa)
> table
# A tibble: 10 x 4
Year Country A B
<int> <chr> <dbl> <dbl>
1 2000 Mexico 11.4 19.9
2 2001 Mexico 10.4 22.5
3 2002 Mexico 12.4 21.9
4 2003 Mexico 13.6 25.0
5 2004 Mexico 14.4 23.9
6 2000 USA 31.3 40.6
7 2001 USA 33.3 40.7
8 2002 USA 30.6 39.3
9 2003 USA 32.7 40.6
10 2004 USA 33.9 45.3
I want to scale A and B to express each value as a percent of the country-specific 2001 value (i.e., the A and B entries in rows 2 and 7 should be 100). My way of doing this is somewhat roundabout and awkward: extract the baseline values into a separate table, merge them back into a separate column in the main table, and then compute scaled values, with annoying intermediate gathering and spreading to avoid specifying the column names of each time series (real data sets can have far more than two value columns). Is there a better way to do this, ideally with a single short pipeline?
> long_table <- table %>% gather(variable, value, -Year, -Country)
> long_table
# A tibble: 20 x 4
Year Country variable value
<int> <chr> <chr> <dbl>
1 2000 Mexico A 11.4
2 2001 Mexico A 10.4
#[remaining tibble printout snipped]
> baseline_table <- long_table %>%
filter(Year == 2001) %>%
select(-Year) %>%
rename(baseline=value)
> baseline_table
# A tibble: 4 x 3
Country variable baseline
<chr> <chr> <dbl>
1 Mexico A 10.4
2 USA A 33.3
3 Mexico B 22.5
4 USA B 40.7
> normalized_table <- long_table %>%
inner_join(baseline_table) %>%
mutate(value=100*value/baseline) %>%
select(-baseline) %>%
spread(variable, value) %>%
arrange(Country, Year)
Joining, by = c("Country", "variable")
> normalized_table
# A tibble: 10 x 4
Year Country A B
<int> <chr> <dbl> <dbl>
1 2000 Mexico 109. 88.4
2 2001 Mexico 100. 100
3 2002 Mexico 118. 97.3
4 2003 Mexico 131. 111.
5 2004 Mexico 138. 106.
6 2000 USA 94.0 99.8
7 2001 USA 100 100
8 2002 USA 92.0 96.6
9 2003 USA 98.3 99.6
10 2004 USA 102. 111.
My second attempt was to use transform, but this failed because transform doesn't seem to recognize dplyr groups, and it would be suboptimal even if it worked because it requires me to know that 2001 is the second year in the time series.
> table %>%
arrange(Country, Year) %>%
gather(variable, value, -Year, -Country) %>%
group_by(Country, variable) %>%
transform(norm=value*100/value[2])
Year Country variable value norm
1 2000 Mexico A 11.37096 108.9663
2 2001 Mexico A 10.43530 100.0000
3 2002 Mexico A 12.36313 118.4741
4 2003 Mexico A 13.63286 130.6418
5 2004 Mexico A 14.40427 138.0340
6 2000 USA A 31.30487 299.9901
7 2001 USA A 33.28665 318.9811
8 2002 USA A 30.61114 293.3422
9 2003 USA A 32.72121 313.5627
10 2004 USA A 33.86668 324.5395
11 2000 Mexico B 19.89388 190.6402
12 2001 Mexico B 22.51152 215.7247
13 2002 Mexico B 21.90534 209.9157
14 2003 Mexico B 25.01842 239.7480
15 2004 Mexico B 23.93729 229.3876
16 2000 USA B 40.63595 389.4085
17 2001 USA B 40.71575 390.1732
18 2002 USA B 39.34354 377.0235
19 2003 USA B 40.55953 388.6762
20 2004 USA B 45.32011 434.2961
It would be nice for this to be more scalable, but here's a simple solution. You can refer to A[Year == 2001] inside mutate, much as you might do table$A[table$Year == 2001] in base R. This lets you scale against your baseline of 2001 or whatever other year you might need.
Edit: I was missing a group_by to ensure that values are only being scaled against other values in their own group. The "sanity check" (that I clearly didn't do) is that values for Mexico in 2001 should have a scaled value of 1, and same for USA and any other countries.
library(tidyverse)
set.seed(42)
mexico <- tibble(Year=2000:2004, Country='Mexico', A=10:14+rnorm(5), B=20:24+rnorm(5))
usa <- tibble(Year=2000:2004, Country='USA', A=30:34+rnorm(5), B=40:44+rnorm(5))
table <- rbind(mexico, usa)
table %>%
group_by(Country) %>%
mutate(A_base2001 = A / A[Year == 2001], B_base2001 = B / B[Year == 2001])
#> # A tibble: 10 x 6
#> # Groups: Country [2]
#> Year Country A B A_base2001 B_base2001
#> <int> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 2000 Mexico 11.4 19.9 1.09 0.884
#> 2 2001 Mexico 10.4 22.5 1 1
#> 3 2002 Mexico 12.4 21.9 1.18 0.973
#> 4 2003 Mexico 13.6 25.0 1.31 1.11
#> 5 2004 Mexico 14.4 23.9 1.38 1.06
#> 6 2000 USA 31.3 40.6 0.940 0.998
#> 7 2001 USA 33.3 40.7 1 1
#> 8 2002 USA 30.6 39.3 0.920 0.966
#> 9 2003 USA 32.7 40.6 0.983 0.996
#> 10 2004 USA 33.9 45.3 1.02 1.11
Created on 2018-05-23 by the reprex package (v0.2.0).
Inspired by Camille's answer, I found one simple approach that scales well:
table %>%
gather(variable, value, -Year, -Country) %>%
group_by(Country, variable) %>%
mutate(value=100*value/value[Year == 2001]) %>%
spread(variable, value)
# A tibble: 10 x 4
# Groups: Country [2]
Year Country A B
<int> <chr> <dbl> <dbl>
1 2000 Mexico 109. 88.4
2 2000 USA 94.0 99.8
3 2001 Mexico 100. 100
4 2001 USA 100 100
5 2002 Mexico 118. 97.3
6 2002 USA 92.0 96.6
7 2003 Mexico 131. 111.
8 2003 USA 98.3 99.6
9 2004 Mexico 138. 106.
10 2004 USA 102. 111.
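The gather/spread round trip can also be skipped. Here is a hedged sketch using dplyr's across(), which is not used in the answers above and needs a dplyr version that provides it. The tbl data frame holds made-up round numbers so the result is easy to check by eye.

```r
library(dplyr)

# Hypothetical data with round numbers (not the rnorm() example above)
tbl <- tibble(
  Year    = rep(2000:2002, 2),
  Country = rep(c("Mexico", "USA"), each = 3),
  A       = c(10, 20, 30, 5, 10, 20),
  B       = c(2, 4, 8, 50, 100, 25)
)

scaled <- tbl %>%
  group_by(Country) %>%
  # across(-Year) picks every non-grouping column except Year, i.e. A and B
  mutate(across(-Year, ~ 100 * .x / .x[Year == 2001])) %>%
  ungroup()
```

Inside a grouped mutate(), across() never selects the grouping column, so Country does not need to be excluded explicitly.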
Preserving the original values alongside the scaled ones takes more work. Here are two approaches. The first uses an extra gather call to produce two variable-name columns (one indicating the series name, the other marking original or scaled), then unites them into one column and reformats.
table %>%
gather(variable, original, -Year, -Country) %>%
group_by(Country, variable) %>%
mutate(scaled=100*original/original[Year == 2001]) %>%
gather(scaled, value, -Year, -Country, -variable) %>%
unite(variable_scaled, variable, scaled, sep='_') %>%
mutate(variable_scaled=gsub("_original", "", variable_scaled)) %>%
spread(variable_scaled, value)
# A tibble: 10 x 6
# Groups: Country [2]
Year Country A A_scaled B B_scaled
<int> <chr> <dbl> <dbl> <dbl> <dbl>
1 2000 Mexico 11.4 109. 19.9 88.4
2 2000 USA 31.3 94.0 40.6 99.8
3 2001 Mexico 10.4 100. 22.5 100
4 2001 USA 33.3 100 40.7 100
5 2002 Mexico 12.4 118. 21.9 97.3
6 2002 USA 30.6 92.0 39.3 96.6
7 2003 Mexico 13.6 131. 25.0 111.
8 2003 USA 32.7 98.3 40.6 99.6
9 2004 Mexico 14.4 138. 23.9 106.
10 2004 USA 33.9 102. 45.3 111.
A second, equivalent approach creates a new table with the columns scaled "in place" and then merges it back with the original one.
table %>%
gather(variable, value, -Year, -Country) %>%
group_by(Country, variable) %>%
mutate(value=100*value/value[Year == 2001]) %>%
ungroup() %>%
mutate(variable=paste(variable, 'scaled', sep='_')) %>%
spread(variable, value) %>%
inner_join(table)
Joining, by = c("Year", "Country")
# A tibble: 10 x 6
Year Country A_scaled B_scaled A B
<int> <chr> <dbl> <dbl> <dbl> <dbl>
1 2000 Mexico 109. 88.4 11.4 19.9
2 2000 USA 94.0 99.8 31.3 40.6
3 2001 Mexico 100. 100 10.4 22.5
4 2001 USA 100 100 33.3 40.7
5 2002 Mexico 118. 97.3 12.4 21.9
6 2002 USA 92.0 96.6 30.6 39.3
7 2003 Mexico 131. 111. 13.6 25.0
8 2003 USA 98.3 99.6 32.7 40.6
9 2004 Mexico 138. 106. 14.4 23.9
10 2004 USA 102. 111. 33.9 45.3
It's possible to replace the final inner_join with arrange(Country, Year) %>% select(-Country, -Year) %>% bind_cols(table), which may perform better for some data sets, though it orders the columns suboptimally.
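Keeping the originals next to the scaled columns can also be written without a join. This is a sketch under the assumption that dplyr's across() and its .names argument are available. The tbl data frame is made up, and the "_scaled" suffix simply mirrors the column names used above.

```r
library(dplyr)

# Hypothetical data with round numbers
tbl <- tibble(
  Year    = rep(2000:2002, 2),
  Country = rep(c("Mexico", "USA"), each = 3),
  A       = c(10, 20, 30, 5, 10, 20),
  B       = c(2, 4, 8, 50, 100, 25)
)

both <- tbl %>%
  group_by(Country) %>%
  # .names writes the results to new columns instead of overwriting A and B
  mutate(across(-Year, ~ 100 * .x / .x[Year == 2001],
                .names = "{.col}_scaled")) %>%
  ungroup()
```

The new columns are appended after the originals, so a final select() or relocate() is needed if A and A_scaled should sit side by side.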
I have a panel dataset where I want to average over a specified number of time periods (t) by variable (column).
An example:
Country Year Var 1 Var 2 Var 3
Austria 1984 1 3.6 95
Austria 1985 2 4.1 94.6
Austria 1986 1 2.6 93.6
Austria 1987 1 3 94.4
Austria 1988 1 3.9 95.2
What I want then is a new column/new dataframe with a new variable for the average for the 5 year period (1984-1988) for Var 1, a variable for the average of Var 2 and var 3 etc.
I also want to apply the function to the other countries in my dataset without the averaging mixing up countries. I was thinking of matching on a string pattern (for instance code %in% "AUT" in this case, since I have a variable with country codes), but I couldn't figure out how to do it.
Thank you very much in advance
1) Using the sample input in the Note at the end, read in the country and year from the row names and round the year up to the end of the current 5 year period so that each year from 1984 to 1988 gets rounded up to 1988, etc. Then use aggregate to calculate the means of each column by both country and year. No packages are used.
By0 <- read.table(text = rownames(DF), col.names = c("Country", "Year"))
By <- transform(By0, Year = 5 * ((Year - min(Year)) %/% 5) + min(Year) + 4)
aggregate(DF, By, mean)
giving the following:
Country Year Var 1 Var 2 Var 3
1 Australia 1988 1.6 18.46 95.52
2 Austria 1988 1.2 3.44 94.56
2) Or, if the goal is to append the mean columns to the original data frame, lapply over the columns, using ave to take the mean by Country and Year for each:
out <- cbind(DF, lapply(DF, function(x) with(By, ave(x, Country, Year, FUN = mean))))
names(out) <- c(names(DF), paste("Mean", names(DF)))
giving:
> out
Var 1 Var 2 Var 3 Mean Var 1 Mean Var 2 Mean Var 3
Austria 1984 1 3.6 95.0 1.2 3.44 94.56
Austria 1985 2 4.1 94.6 1.2 3.44 94.56
Austria 1986 1 2.6 93.6 1.2 3.44 94.56
Austria 1987 1 3.0 94.4 1.2 3.44 94.56
Austria 1988 1 3.9 95.2 1.2 3.44 94.56
Australia 1984 1 3.6 95.0 1.6 18.46 95.52
Australia 1985 2 4.1 94.6 1.6 18.46 95.52
Australia 1986 1 2.6 93.6 1.6 18.46 95.52
Australia 1987 1 3.0 94.4 1.6 18.46 95.52
Australia 1988 3 79.0 100.0 1.6 18.46 95.52
Note
The input used, shown reproducibly, is:
Lines <- "
Var 1,Var 2,Var 3
Austria 1984,1,3.6,95
Austria 1985,2,4.1,94.6
Austria 1986,1,2.6,93.6
Austria 1987,1,3,94.4
Austria 1988,1,3.9,95.2
Australia 1984,1,3.6,95
Australia 1985,2,4.1,94.6
Australia 1986,1,2.6,93.6
Australia 1987,1,3,94.4
Australia 1988,3,79,100"
DF <- read.csv(text = Lines, check.names = FALSE)
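The period-binning expression from approach 1 can be checked on its own. Here is a small sketch, assuming a ten-year span so that two 5-year periods appear. The years vector is made up, and the sample input above covers only 1984 to 1988.

```r
# Hypothetical span of years covering two 5-year periods
years <- 1984:1993

# Same formula as in By: integer-divide the offset from the first year by 5,
# then map each block to the last year of that block
period_end <- 5 * ((years - min(years)) %/% 5) + min(years) + 4

# Inspect the mapping year -> period end
data.frame(Year = years, PeriodEnd = period_end)
```

Because min(Year) is taken over all rows of By, the periods are aligned to the earliest year in the whole data set rather than to each country's own first year.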