R: multiply columns by rows to create country-specific index

I am trying to create a country-specific index based on the import share of certain food commodities.
I have the following data: prices contains time-series data on commodity prices for a number of food commodities, and weights contains the country-specific import shares for the relevant commodities (see the mock data below).
What I want to do is create a country-specific food-price index: the sum of the price series of the imported commodities, each multiplied by its import share.
So in the example data the food-price index for Australia will be:
FOOD_ct = 0.12 * WHEAT_t + 0.08 * SUGAR_t
Where c indicates country and t time.
So basically my question is: How do I multiply the columns by the rows for each country?
I have some experience with R, but trying to solve this I seem to be punching above my weight. I also haven't found any useful pointers elsewhere, so I was hoping some of you might have good suggestions.
## Code to create mock data:
## Generate data on country weights
country<-c(rep("Australia",2),rep("Zimbabwe",3))
item<-c("Wheat","Sugar","Wheat","Sugar","Soybeans")
itemcode<-c(1,2,1,2,3)
share<-c(0.12,0.08,0.16,0.08,0.03)
weights<-data.frame(country,item,itemcode,share)
## Generate data on price index
date<-seq(as.Date("2005/1/1"),by="month",length.out=12)
Wheat<-runif(12,80,160)
Sugar<-runif(12,110,230)
Soybeans<-runif(12,60,130)
prices<-data.frame(date,Wheat,Sugar,Soybeans)
EDIT: Solution
Thanks to alexwhan for his suggestion (I can't upvote, unfortunately, due to lack of Stack Overflow street cred), and to dnlbrky for the solution, which was easiest to implement with the original data.
## Load data.table package
require(data.table)
## Convert data to data table
prices<-data.table(prices)
weights<-data.table(weights,key="item")
## Extract names for all the food commodities
vars<-names(prices)[!names(prices) %in% "date"]
## Unstack items to create table in long format
prices<-data.table(date=prices[,date], stack(prices,vars),key="ind")
## Rename the columns
setnames(prices,c("values","ind"),c("price","item"))
## Calculate the food price index
priceindex <- weights[prices, allow.cartesian = TRUE][
  , list(index = sum(share * price)), by = list(country, date)]
## Order food price index if not done automatically
priceindex<-priceindex[order(priceindex$country,priceindex$date),]
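As an aside, data.table's own melt is now the more idiomatic way to get the long format than stack; a minimal equivalent sketch, assuming prices is still the wide table from the mock data (prices.long is just an illustrative name):
## Alternative reshape with data.table::melt
prices.long <- melt(as.data.table(prices), id.vars = "date",
                    variable.name = "item", value.name = "price")
setkey(prices.long, item)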

Here's one option. There will absolutely be a neater way to do this, but it should get you going.
First, I'm going to get weights into wide format so that it's easier to work with for our purposes:
library(reshape2)
weights.c <- dcast(weights, country ~ item, value.var = "share", fill = 0)
#     country Soybeans Sugar Wheat
# 1 Australia     0.00  0.08  0.12
# 2  Zimbabwe     0.03  0.08  0.16
(Naming value.var silences dcast's guessing message, and fill = 0 turns the missing Australia/Soybeans share into a zero weight instead of an NA that would propagate through the sums below.)
Then I've used apply to go through each row of weights.c and calculate the 'food-price index' (tell me if this is being calculated incorrectly, I think I followed the example right...).
FOOD <- as.data.frame(apply(weights.c, 1, function(x)
  as.numeric(x[2]) * prices$Soybeans +
  as.numeric(x[3]) * prices$Sugar + as.numeric(x[4]) * prices$Wheat))
Adding in the country and date identifiers:
colnames(FOOD) <- weights.c$country
FOOD$date <- prices$date
FOOD
# Australia Zimbabwe date
# 1 35.04337 39.99131 2005-01-01
# 2 38.95579 44.72377 2005-02-01
# 3 33.45708 38.50418 2005-03-01
# 4 30.42181 34.04647 2005-04-01
# 5 36.03443 39.90905 2005-05-01
# 6 46.21269 52.29347 2005-06-01
# 7 41.88694 48.15334 2005-07-01
# 8 34.47848 39.83654 2005-08-01
# 9 36.32498 40.60091 2005-09-01
# 10 33.74768 37.17185 2005-10-01
# 11 38.84855 44.87495 2005-11-01
# 12 36.45119 40.11678 2005-12-01
Hopefully this is close enough to what you're after...
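The neater way alluded to above is arguably a single matrix multiplication: with the shares in wide form and any NA weights zeroed, every country's index drops out at once. A sketch, assuming the weights.c and prices objects from this answer (W, P, and FOOD2 are illustrative names):
W <- as.matrix(weights.c[, -1])   # shares: one row per country, columns Soybeans, Sugar, Wheat
W[is.na(W)] <- 0                  # a missing share means the country doesn't import the item
P <- as.matrix(prices[, c("Soybeans", "Sugar", "Wheat")])  # prices in the same column order
FOOD2 <- as.data.frame(P %*% t(W))  # 12 dates x 2 countries
colnames(FOOD2) <- weights.c$country
FOOD2$date <- prices$date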

I would unstack/reshape the items in the weights table, and then use data.table to join the prices and the weights.
## Generate data table for country weights:
weights <- data.table(country = c(rep("Australia", 2), rep("Zimbabwe", 3)),
                      item = c("Wheat", "Sugar", "Wheat", "Sugar", "Soybeans"),
                      itemcode = c(1, 2, 1, 2, 3),
                      share = c(0.12, 0.08, 0.16, 0.08, 0.03),
                      key = "item")
## Generate data table for price index:
prices <- data.table(date = seq(as.Date("2005/1/1"), by = "month", length.out = 12),
                     Wheat = runif(12, 80, 160),
                     Sugar = runif(12, 110, 230),
                     Soybeans = runif(12, 60, 130))
## Get column names of all the food types:
vars<-names(prices)[!names(prices) %in% "date"]
## Unstack the items and create a "long" table:
prices<-data.table(date=prices[,date], stack(prices,vars),key="ind")
## Rename the columns:
setnames(prices,c("values","ind"),c("price","item"))
prices[1:5]
## date price item
## 1: 2005-01-01 88.25818 Soybeans
## 2: 2005-02-01 71.61261 Soybeans
## 3: 2005-03-01 77.91082 Soybeans
## 4: 2005-04-01 129.05806 Soybeans
## 5: 2005-05-01 74.63005 Soybeans
## Join the weights and prices tables, multiply the share by the price, and sum by country and date:
weights[prices, allow.cartesian = TRUE][
  , list(index = sum(share * price)), by = list(country, date)]
## country date index
## 1: Zimbabwe 2005-01-01 27.05711
## 2: Zimbabwe 2005-02-01 34.72842
## 3: Zimbabwe 2005-03-01 35.23615
## 4: Zimbabwe 2005-04-01 39.05027
## 5: Zimbabwe 2005-05-01 39.48388
## 6: Zimbabwe 2005-06-01 33.43677
## 7: Zimbabwe 2005-07-01 32.55172
## 8: Zimbabwe 2005-08-01 34.86790
## 9: Zimbabwe 2005-09-01 33.29748
## 10: Zimbabwe 2005-10-01 38.31180
## 11: Zimbabwe 2005-11-01 31.29709
## 12: Zimbabwe 2005-12-01 40.70930
## 13: Australia 2005-01-01 21.07165
## 14: Australia 2005-02-01 27.47660
## 15: Australia 2005-03-01 27.03025
## 16: Australia 2005-04-01 29.34917
## 17: Australia 2005-05-01 31.95188
## 18: Australia 2005-06-01 26.22890
## 19: Australia 2005-07-01 24.58945
## 20: Australia 2005-08-01 27.44728
## 21: Australia 2005-09-01 27.02199
## 22: Australia 2005-10-01 31.58282
## 23: Australia 2005-11-01 24.42326
## 24: Australia 2005-12-01 31.70109
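With newer data.table versions the keyed join can also be written with on=, skipping setkey entirely; a minimal variant of the last step, assuming the long prices and weights tables above:
prices[weights, on = "item", allow.cartesian = TRUE][
  , list(index = sum(share * price)), by = list(country, date)]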

Related

Taking variance of some rows above in panel structure (R data.table)

# Example of a panel data
library(data.table)
panel<-data.table(expand.grid(Year=c(2017:2020),Individual=c("A","B","C")))
panel$value<-rnorm(nrow(panel),10) # The value I am interested in
I want to take the variance of the prior two years' values by Individual.
For example, if I were to sum the values of the prior two years, I would do something like:
panel[,sum_of_past_2_years:=shift(value)+shift(value, 2),Individual]
I thought this would work.
panel[,var(c(shift(value),shift(value, 2))),Individual]
# This doesn't work of course
Ideally the answer should look like
a<-c(NA,NA,var(panel$value[1:2]),var(panel$value[2:3]))
b<-c(NA,NA,var(panel$value[5:6]),var(panel$value[6:7]))
c<-c(NA,NA,var(panel$value[9:10]),var(panel$value[10:11]))
panel[,variance_past_2_years:=c(a,b,c)]
# NAs when there is no value for 2 prior years
You can use frollapply to perform rolling operation of every 2 values.
library(data.table)
panel[, var := frollapply(shift(value), 2, var), Individual]
# Year Individual value var
# 1: 2017 A 9.416218 NA
# 2: 2018 A 8.424868 NA
# 3: 2019 A 8.743061 0.49138739
# 4: 2020 A 9.489386 0.05062333
# 5: 2017 B 10.102086 NA
# 6: 2018 B 8.674827 NA
# 7: 2019 B 10.708943 1.01853361
# 8: 2020 B 11.828768 2.06881272
# 9: 2017 C 10.124349 NA
#10: 2018 C 9.024261 NA
#11: 2019 C 10.677998 0.60509700
#12: 2020 C 10.397105 1.36742220
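Equivalently, you can roll first and lag afterwards, which states the "prior two years" intent directly; a sketch on the same panel (var2 is an illustrative name):
# Variance of each 2-year window, lagged one year so row i sees years i-2 and i-1
panel[, var2 := shift(frollapply(value, 2, var)), by = Individual]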

data.table lapply and additional columns in output

I am just hoping there is a more convenient way. Imagine I would like to run a model with different transformations of some of the columns, e.g. winsorizing. I would like to provide the model with the transformed data set plus some additional columns that do not need to be transformed. Is there a practical way to do this in one line? I do not want to replace the data using := because I am planning to run the model with different specifications of the transformation.
library(data.table)
library(DescTools)  # for Winsorize
dt <- data.table(id = 1:10,
                 Country = sample(c("Germany", "USA"), 10, replace = TRUE),
                 x = rnorm(10, 1, 10),
                 y = rnorm(10, 1, 10),
                 factor = factor(sample(LETTERS[1:2], 10, replace = TRUE)))
sel.col <- c("x", "y")
dt[, lapply(.SD, Winsorize), .SDcols = sel.col, by = factor]
I would need to call data.table again to merge the original dt with the transformed data and pay attention to the order.
data.table(dt[, .(id, Country), by = factor],
           dt[, lapply(.SD, Winsorize), .SDcols = sel.col, by = factor])
I was hoping that I could include the additional columns with the lapply call
dt[,.(lapply(.SD,Winsorize), id, Country),.SDcols=sel.col,by=factor]
Are there any other solutions?
Do you just need?
dt[, c(lapply(.SD,Winsorize), list(id = id, Country = Country)), .SDcols=sel.col,by=factor]
Unfortunately, this method gets slow with big data. Apparently this was optimised in a recent update, but it is still very slow.
There is no need to merge, you can assign columns after lapply call:
> library(DescTools)
> library(data.table)
> dt<-data.table(id=1:10, Country=sample(c("Germany", "USA"),10, replace=TRUE), x=rnorm(10,1,10),y=rnorm(10,1,10),factor=factor(sample(LETTERS[1:2],10,replace=TRUE)))
> sel.col<-c("x","y")
> dt
id Country x y factor
1: 1 Germany 13.116248 -0.4609152 B
2: 2 Germany -6.623404 -3.7048052 A
3: 3 USA -18.027532 22.2946805 A
4: 4 USA -13.377736 6.2021252 A
5: 5 Germany -12.585897 0.8255081 B
6: 6 Germany -8.816252 -12.1218135 B
7: 7 USA -3.459926 -11.5710316 B
8: 8 USA 3.180706 6.3262951 B
9: 9 Germany -5.520637 7.2877123 A
10: 10 Germany 15.857069 8.6422997 A
> # Notice an assignment `(sel.col) :=` here:
> dt[,(sel.col) := lapply(.SD,Winsorize),.SDcols=sel.col,by=factor]
> dt
id Country x y factor
1: 1 Germany 11.129140 -0.4609152 B
2: 2 Germany -6.623404 -1.7234191 A
3: 3 USA -17.097573 19.5642043 A
4: 4 USA -13.377736 6.2021252 A
5: 5 Germany -11.831968 0.8255081 B
6: 6 Germany -8.816252 -12.0116571 B
7: 7 USA -3.459926 -11.5710316 B
8: 8 USA 3.180706 5.2261377 B
9: 9 Germany -5.520637 7.2877123 A
10: 10 Germany 11.581528 8.6422997 A
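If the raw columns must stay untouched for the next specification (the asker's stated concern), the same assignment works on a copy; a minimal sketch with the dt and sel.col from above (out is an illustrative name):
# Winsorize a copy so dt keeps its raw values for other model specifications
out <- copy(dt)[, (sel.col) := lapply(.SD, Winsorize), .SDcols = sel.col, by = factor]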

Assign mean values and/or conditional assignment for unordered duplicate dyads

I've come across something a bit above my skill set. I'm working with IMF trade data that consists of data between country dyads. The IMF dataset contains 'unordered duplicate' records, in that each country individually reports trade data. However, due to a variety of timing, recording systems, regime type, etc., there are discrepancies between corresponding values. I'm trying to manipulate this data in two ways:
1. Assign the mean values to the duplicated dyads.
2. Assign the dyad values conditionally based on a separate economic indicator or development index (who do I trust more?).
There are several discussions of identifying unordered duplicates here, here, here, and here, but after a couple of days of searching I have yet to see what I'm trying to do.
Here is an example of the raw data. In reality there are many more variables and several hundred thousand dyads:
reporter<-c('USA','GER','AFG','FRA','CHN')
partner<-c('AFG','CHN','USA','CAN','GER')
year<-c(2010,2010,2010,2009,2010)
import<-c(-1000,-2000,-2400,-1200,-2000)
export<-c(2500,2200,1200,2900,2100)
rep_econ1<-c(28,32,12,25,19)
imf<-data.table(reporter,partner,year,import,export,rep_econ1)
imf
reporter partner year import export rep_econ1
1: USA AFG 2010 -1000 2500 28
2: GER CHN 2010 -2000 2200 32
3: AFG USA 2010 -2400 1200 12
4: FRA CAN 2009 -1200 2900 25
5: CHN GER 2010 -2000 2100 19
The additional wrinkle is that import and export are inverses of each other between the dyads, so they need to be matched and averaged in absolute value.
For objective 1, the resulting data.table is:
Mean
reporter partner year import export rep_econ1
USA AFG 2010 -1100 2450 28
GER CHN 2010 -2050 2100 32
AFG USA 2010 -2450 1100 12
FRA CAN 2009 -1200 2900 25
CHN GER 2010 -2100 2050 19
For objective 2:
Conditionally Assign on Higher Economic Indicator (rep_econ1)
reporter partner year import export rep_econ1
USA AFG 2010 -1000 2500 28
GER CHN 2010 -2000 2200 32
AFG USA 2010 -2500 1000 12
FRA CAN 2009 -1200 2900 25
CHN GER 2010 -2200 2000 19
It's possible that not all dyads are represented twice, so I included a solo record. I prefer data.table, but I'll go with anything that leads me down the right path.
Thank you for your time.
Pre-processing:
library(data.table)
# get G = reporter/partner group and N = number of rows for each group
# Thanks @eddi for simplifying
imf[, G := .GRP, by = .(year, pmin(reporter, partner), pmax(reporter, partner))]
imf[, N := .N, G]
Option 1 (means)
# for groups with 2 rows, average imports and exports
imf[N == 2,
    `:=`(import = (import - rev(export))/2,
         export = (export - rev(import))/2),
    by = G]
imf
# reporter partner year import export rep_econ1 G N
# 1: USA AFG 2010 -1100 2450 28 1 2
# 2: GER CHN 2010 -2050 2100 32 2 2
# 3: AFG USA 2010 -2450 1100 12 1 2
# 4: FRA CAN 2009 -1200 2900 25 3 1
# 5: CHN GER 2010 -2100 2050 19 2 2
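As a sanity check on the signs (imports are negative in this data, so the partner's export enters with a minus), tracing the USA-AFG dyad:
# USA import: (-1000 - 1200)/2   = -1100
# USA export: (2500 - (-2400))/2 =  2450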
Option 2 (highest economic indicator)
# for groups with 2 rows, choose imports and exports based on highest rep_econ1
imf[N == 2,
    c('import', 'export') := {
      o <- order(-rep_econ1)
      import <- cbind(import, -export)[o[1], o]
      .(import, export = -rev(import))
    },
    by = G]
imf
# reporter partner year import export rep_econ1 G N
# 1: USA AFG 2010 -1000 2500 28 1 2
# 2: GER CHN 2010 -2000 2200 32 2 2
# 3: AFG USA 2010 -2500 1000 12 1 2
# 4: FRA CAN 2009 -1200 2900 25 3 1
# 5: CHN GER 2010 -2200 2000 19 2 2
Option 2 explanation: You need to select the row with the highest economic indicator (i.e. row order(-rep_econ1)[1]) and use it for the imports, but if the second row is the "trusted" one, the pair needs to be reversed; otherwise you'd have the countries switched, since the second reporter's imports (now the first element of cbind(import, -export)[o[1], ]) would be assigned as the first reporter's imports (because it's the first element).
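A quick trace of group 1 (USA-AFG) with the mock data makes the indexing concrete:
# import = c(-1000, -2400); export = c(2500, 1200); rep_econ1 = c(28, 12)
# o <- order(-rep_econ1)   # c(1, 2): USA (28) is the trusted reporter
# cbind(import, -export)   # row 1: -1000 -2500 / row 2: -2400 -1200
# ...[o[1], o]             # USA's row, trust-ordered: c(-1000, -2500)
# -rev(import)             # exports: c(2500, 1000)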
Edit:
If imports and exports are both positive in the input data and need to be positive in the output data, the two calculations above can be modified as
imf[N == 2,
    `:=`(import = (import + rev(export))/2,
         export = (export + rev(import))/2),
    by = G]
And
imf[N == 2,
    c('import', 'export') := {
      o <- order(-rep_econ1)
      import <- cbind(import, export)[o[1], o]
      .(import, export = rev(import))
    },
    by = G]

Avoid For-Loops in R

I'm sure this question has been posed before, but would like some input on my specific question. In return for your help, I'll use an interesting example.
Sean Lahman provides giant datasets of MLB baseball statistics, available free on his website (http://www.seanlahman.com/baseball-archive/statistics/).
I'd like to use this data to answer the following question: What is the average number of home runs per game recorded for each decade in the MLB?
Below I've pasted all relevant script:
teamdata = read.csv("Teams.csv", header = TRUE)
decades = c(1870,1880,1890,1900,1910,1920,1930,1940,1950,1960,1970,1980,1990,2000,2010,2020)
i = 0
meanhomers = c()
for (i in c(1:length(decades))) {
  meanhomers[i] = mean(teamdata$HR[teamdata$yearID >= decades[i] & teamdata$yearID < decades[i+1]])
  i = i + 1
}
My primary question is, how could this answer have been determined without resorting to the dreaded for-loop?
Side question: What simple script would have generated the decades vector for me?
(For those interested in the answer to the baseball question, see below.)
meanhomers
[1] 4.641026 23.735849 34.456522 20.421053 25.755682 61.837500 84.012500
[8] 80.987500 130.375000 132.166667 120.093496 126.700000 148.737410 173.826667
[15] 152.973333 NaN
Edit for clarity: Turns out I answered the wrong question; the answer provided above indicates the number of home runs per team per year, not per game. A little fix of the denominator would get the correct result.
Here's a data.table example (converting teamdata to a data.table first). Because others showed how to use cut, I took another route for splitting the data into decades:
library(data.table)
teamdata <- as.data.table(teamdata)
teamdata[, list(HRperYear = mean(HR)), by = 10*floor(yearID/10)]
However, the original question mentions average HRs per game, not per year (though the code and answers clearly deal with HRs per year).
Here's how you could compute average HRs per game (and average games per team per year):
teamdata[, list(HRperYear = mean(HR), HRperGame = sum(HR)/sum(G), games = mean(G)), by = 10*floor(yearID/10)]
floor HRperYear HRperGame games
1: 1870 4.641026 0.08911866 52.07692
2: 1880 23.735849 0.21543555 110.17610
3: 1890 34.456522 0.25140108 137.05797
4: 1900 20.421053 0.13686067 149.21053
5: 1910 25.755682 0.17010657 151.40909
6: 1920 61.837500 0.40144445 154.03750
7: 1930 84.012500 0.54593453 153.88750
8: 1940 80.987500 0.52351325 154.70000
9: 1950 130.375000 0.84289640 154.67500
10: 1960 132.166667 0.81977946 161.22222
11: 1970 120.093496 0.74580935 161.02439
12: 1980 126.700000 0.80990313 156.43846
13: 1990 148.737410 0.95741873 155.35252
14: 2000 173.826667 1.07340167 161.94000
15: 2010 152.973333 0.94427984 162.00000
(The low average game totals in the 1980's and 1990's are due to the 1981 and 1994-5 player strikes).
PS: Nicely-written question, but it would be extra nice for you to provide a fully reproducible example so that I don't have to go and download the CSV to answer your question. Making dummy data is OK.
You can use seq to generate sequences.
decades <- seq(1870, 2020, by=10)
You can use cut to split up numeric variables into intervals.
teamdata$decade <- cut(teamdata$yearID, breaks=decades, dig.lab=4)
Basically it creates a factor with one level for each decade (as specified by the breaks). The dig.lab=4 is just so it prints the years as e.g. "1870" not "1.87e+03".
See ?cut for further configuration (e.g. whether '1980' is included in this decade or the next one, and so on). You can even configure the labels if you think you'll use them.
Then to do something for each decade, use the plyr package (data.table and dplyr are other options, but I think plyr has the easiest learning curve, and your data does not seem large enough to need data.table).
library(plyr)
ddply(teamdata, .(decade), summarize, meanhomers=mean(HR))
decade meanhomers
1 (1870,1880] 4.930233
2 (1880,1890] 25.409091
3 (1890,1900] 35.115702
4 (1900,1910] 20.068750
5 (1910,1920] 27.284091
6 (1920,1930] 67.681250
7 (1930,1940] 84.050000
8 (1940,1950] 84.125000
9 (1950,1960] 130.718750
10 (1960,1970] 133.349515
11 (1970,1980] 117.745968
12 (1980,1990] 127.584615
13 (1990,2000] 155.053191
14 (2000,2010] 170.226667
15 (2010,2020] 152.775000
Mine is a little different from yours because my intervals are (, ] whereas yours are [, ). You can adjust cut to switch these around.
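For the record, the switch is cut's right argument; right = FALSE gives the left-closed [, ) intervals of the original loop:
teamdata$decade <- cut(teamdata$yearID, breaks = decades, right = FALSE, dig.lab = 4)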
You can also use the sqldf package in order to use SQL queries on the data.
Here is the code:
library(sqldf)
sqldf("select floor(yearID/10)*10 as decade,avg(hr) as count
from Teams
group by decade;")
decade count
1 1870 4.641026
2 1880 23.735849
3 1890 34.456522
4 1900 20.421053
5 1910 25.755682
6 1920 61.837500
7 1930 84.012500
8 1940 80.987500
9 1950 130.375000
10 1960 132.166667
11 1970 120.093496
12 1980 126.700000
13 1990 148.737410
14 2000 173.826667
15 2010 152.973333
aggregate is handy for this sort of thing. You can use your decades object with findInterval to put the years into bins:
aggregate(HR ~ findInterval(yearID, decades), data=teamdata, FUN=mean)
## findInterval(yearID, decades) HR
## 1 1 4.641026
## 2 2 23.735849
## 3 3 34.456522
## 4 4 20.421053
## 5 5 25.755682
## 6 6 61.837500
## 7 7 84.012500
## 8 8 80.987500
## 9 9 130.375000
## 10 10 132.166667
## 11 11 120.093496
## 12 12 126.700000
## 13 13 148.737410
## 14 14 173.826667
## 15 15 152.973333
Note that the intervals used are left-closed, as you desire. Also note that the intervals need not be regular. Yours are, which leads to the "side question" of how to produce the decades vector: don't even compute it. Instead, directly compute which decade each year falls in:
aggregate(HR ~ I(10 * (yearID %/% 10)), data=teamdata, FUN=mean)
## I(10 * (yearID%/%10)) HR
## 1 1870 4.641026
## 2 1880 23.735849
## 3 1890 34.456522
## 4 1900 20.421053
## 5 1910 25.755682
## 6 1920 61.837500
## 7 1930 84.012500
## 8 1940 80.987500
## 9 1950 130.375000
## 10 1960 132.166667
## 11 1970 120.093496
## 12 1980 126.700000
## 13 1990 148.737410
## 14 2000 173.826667
## 15 2010 152.973333
I usually prefer the formula interface to aggregate as used above, but you can get better names directly by using the non-formula interface. Here's the example for each of the above:
with(teamdata, aggregate(list(mean.HR=HR), list(Decade=findInterval(yearID,decades)), FUN=mean))
## Decade mean.HR
## 1 1 4.641026
## ...
with(teamdata, aggregate(list(mean.HR=HR), list(Decade=10 * (yearID %/% 10)), FUN=mean))
## Decade mean.HR
## 1 1870 4.641026
## ...
dplyr::group_by, mixed with cut, is a good option here and avoids looping. The decades vector is just a stepped sequence.
decades <- seq(1870,2020,by=10)
cut breaks the data into categories, which I've labelled by the decades themselves for clarity.
teamdata$decade <- cut(teamdata$yearID, breaks = decades, right = FALSE,
                       labels = decades[1:(length(decades)-1)])
Then dplyr handles the grouped summarise as neatly as you could hope:
library(dplyr)
teamdata %>% group_by(decade) %>% summarise(meanhomers=mean(HR))
# decade meanhomers
# (fctr) (dbl)
# 1 1870 4.641026
# 2 1880 23.735849
# 3 1890 34.456522
# 4 1900 20.421053
# 5 1910 25.755682
# 6 1920 61.837500
# 7 1930 84.012500
# 8 1940 80.987500
# 9 1950 130.375000
# 10 1960 132.166667
# 11 1970 120.093496
# 12 1980 126.700000
# 13 1990 148.737410
# 14 2000 173.826667
# 15 2010 152.973333
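And since the question's edit points out that the real target was home runs per game, the same pipeline only needs a different denominator (G, the games column used in the data.table answer above):
teamdata %>% group_by(decade) %>% summarise(HRperGame = sum(HR)/sum(G))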

How to reshape this complicated data frame?

Here are the first 4 rows of my data:
X...Country.Name Country.Code Indicator.Name
1 Turkey TUR Inflation, GDP deflator (annual %)
2 Turkey TUR Unemployment, total (% of total labor force)
3 Afghanistan AFG Inflation, GDP deflator (annual %)
4 Afghanistan AFG Unemployment, total (% of total labor force)
Indicator.Code X2010
1 NY.GDP.DEFL.KD.ZG 5.675740
2 SL.UEM.TOTL.ZS 11.900000
3 NY.GDP.DEFL.KD.ZG 9.437322
4 SL.UEM.TOTL.ZS NA
I want my data reshaped into two columns, one for each indicator code, and I want each row to correspond to a country, something like this:
Country Name NY.GDP.DEFL.KD.ZG SL.UEM.TOTL.ZS
Turkey 5.6 11.9
Afghanistan 9.43 NA
I think I could do this with Excel, but I want to learn the R way, so that I don't need to rely on Excel every time I have a problem. Here is a dput of the data if you need it.
Edit: I actually want 3 columns, one for each indicator and one for the country's name.
Sticking with base R, use reshape. I took the liberty of cleaning up the column names. Here, I'm only showing you a few rows of the output. Remove head to see the full output. This assumes your data.frame is named "mydata".
names(mydata) <- c("CountryName", "CountryCode",
"IndicatorName", "IndicatorCode", "X2010")
head(reshape(mydata[-c(2:3)],
direction = "wide",
idvar = "CountryName",
timevar = "IndicatorCode"))
# CountryName X2010.NY.GDP.DEFL.KD.ZG X2010.SL.UEM.TOTL.ZS
# 1 Turkey 5.675740 11.9
# 3 Afghanistan 9.437322 NA
# 5 Albania 3.459343 NA
# 7 Algeria 16.245617 11.4
# 9 American Samoa NA NA
# 11 Andorra NA NA
Another option in base R is xtabs, but NA gets replaced with 0:
head(xtabs(X2010 ~ CountryName + IndicatorCode, mydata))
# IndicatorCode
# CountryName NY.GDP.DEFL.KD.ZG SL.UEM.TOTL.ZS
# Afghanistan 9.437322 0.0
# Albania 3.459343 0.0
# Algeria 16.245617 11.4
# American Samoa 0.000000 0.0
# Andorra 0.000000 0.0
# Angola 22.393924 0.0
The result of xtabs is a matrix, so if you want a data.frame, wrap the output with as.data.frame.matrix.
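If you're willing to go beyond base R, reshape2::dcast (already used earlier on this page) does the same pivot in one line and keeps the NA:
library(reshape2)
dcast(mydata, CountryName ~ IndicatorCode, value.var = "X2010")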
