Construct a variable that conditionally takes a certain value until another condition is met

I have a panel dataset with data on conflicts for which I want to identify the post-conflict years.
So I constructed a variable myself which codes a transition from conflict to peace with "3". Wherever the values for a new country begin, I coded that same variable as NA.
What I want to do now is to create a new binary variable which marks post-conflict years with a 1 and conflict and never-conflict years with a 0. For that I would have to assign a 1 to the year with a 3 in the transition variable and to every year after it, until there is an NA in the same column, as follows:
Country Year transition post-conflict
Afghanistan 1994 0 0
Afghanistan 1995 0 0
Afghanistan 1996 3 1
Afghanistan 1997 2 1
Afghanistan 1998 2 1
Albania 1994 NA 0
Albania 1994 2 0
How could I go about this?

You probably shouldn't use NA like that. NA propagates through comparisons and through functions like sum and cumsum, so they won't work the way you may want them to. You likely don't need to mark the first row of a new country anyway, since most R functions you would use for your analysis can group by Country without needing a special marker showing where each group starts.
Below I change NA to something different, and make transition a factor. Then you can use cumsum to create your new column.
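library(data.table)
setDT(df) # assuming your data is called df
# recode the NA country-start markers; otherwise cumsum would return NA for the whole group
df[is.na(transition), transition := 90]
df[, transition := as.factor(transition)]
# create post_conflict column: 1 from the first transition (3) onwards within each country
df[, post_conflict := as.integer(cumsum(transition == 3) > 0), by = Country]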
# Country Year transition post_conflict
# 1: Afghanistan 1994 0 0
# 2: Afghanistan 1995 0 0
# 3: Afghanistan 1996 3 1
# 4: Afghanistan 1997 2 1
# 5: Afghanistan 1998 2 1
# 6: Albania 1994 90 0
# 7: Albania 1994 2 0
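For comparison, here is a base R sketch of the same idea, assuming the data frame is called df and has the columns shown in the question. Using %in% treats NA as "not a 3", so the NA markers don't need recoding first, and cumsum(...) > 0 keeps the flag binary even if a country has more than one transition.

```r
# Example data from the question (Country, Year, transition)
df <- data.frame(
  Country = c(rep("Afghanistan", 5), rep("Albania", 2)),
  Year = c(1994, 1995, 1996, 1997, 1998, 1994, 1994),
  transition = c(0, 0, 3, 2, 2, NA, 2)
)

# 1 where the row is a conflict-to-peace transition; NA %in% 3 is FALSE
is_transition <- as.integer(df$transition %in% 3)

# Within each country, flag every row from the first transition onwards
df$post_conflict <- ave(is_transition, df$Country,
                        FUN = function(x) as.integer(cumsum(x) > 0))
df
```

Like the data.table version, this never switches back to 0 if conflict resumes later; that would need an extra rule.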

Related

R: Creating a table with the highest values by year

I hope I don't ask a question that has been asked already, but I couldn't quite find what I was looking for. I am fairly new to R and have no experience with programming.
I want to make a table with the top 10 values of three tests for each year. My data looks something like this:
Year Country Test1 Test2 Test3
2000 ALB 500 497 501
2001 ALB NA NA NA
...
2000 ARG 502 487 354
2001 ARG NA NA NA
...
(My years go from 2000 to 2015, but I only have observations for every third year, and even in those years there are still a lot of NAs for some countries or tests.)
I would like to get a table in which I can see the top 10 values for each test for each year: for the years 2000, 2003, 2006, ..., 2015, the top ten values and the countries that reached them, for Tests 1, 2 and 3.
AND then (I am not sure if this should be a separate question) I would like to get the table into LaTeX.
Easier to see top values this way.
You could use melt from the data.table package:
library(data.table)
# convert to data.table
setDT(df)
# convert to long format and keep Year, Country and value
df1 <- melt(df, id.vars = 1:2)
df1 <- df1[, c(1, 2, 4)]
# sorted test values for each year and country
df1 <- df1[, top_value := .(list(sort(value, decreasing = TRUE))), .(Year, Country)][, .(Year, Country, top_value)]
print(df1)
Year Country top_value
1: 2000 ALB 501,500,497
2: 2001 ALB
3: 2000 ARG 502,487,354
4: 2001 ARG
5: 2000 ALB 501,500,497
6: 2001 ALB
7: 2000 ARG 502,487,354
8: 2001 ARG
9: 2000 ALB 501,500,497
10: 2001 ALB
11: 2000 ARG 502,487,354
12: 2001 ARG
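The output above lists sorted values per country rather than the top ten per test and year. A sketch that ranks countries within each Year/Test group might look like this, assuming the column names from the question (the small data set below just reuses the question's sample values):

```r
library(data.table)

# Small example in the question's layout (Year, Country, Test1..Test3)
df <- data.table(
  Year = c(2000, 2001, 2000, 2001),
  Country = c("ALB", "ALB", "ARG", "ARG"),
  Test1 = c(500, NA, 502, NA),
  Test2 = c(497, NA, 487, NA),
  Test3 = c(501, NA, 354, NA)
)

# Long format: one row per Year/Country/Test, dropping missing scores
long <- melt(df, id.vars = c("Year", "Country"),
             variable.name = "Test", value.name = "Score",
             variable.factor = FALSE, na.rm = TRUE)

# Sort by score, then keep the first 10 rows within each Year/Test group
top10 <- long[order(-Score), head(.SD, 10), by = .(Year, Test)]
setorder(top10, Year, Test, -Score)
top10
```

For the LaTeX part, one option is knitr::kable(top10, format = "latex"); the xtable package is another.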

Create time event based dummy variable in R - leads & lags

I am currently looking for a way to create a set of dummy variables indicating a time event in a panel. Specifically, I am trying to make dummy variables covering the 20 years before and the 20 years after the event, e.g. to capture the effect of a war on trade over 20 years. I want to code these dummies for each partner in the dyads. How can I program these event dummies elegantly? I would appreciate your help :)
iso_o iso_d year mid_o mid_d
ABW AFG 1980 0 1
ABW AFG 1981 0 1
ABW AFG 1982 0 1
ABW AFG 1983 0 2
ABW AFG 1984 0 1
ABW AFG 1985 0 1
ABW AFG 1986 0 1
ABW AFG 1987 0 1
ABW AFG 1988 0 0
ABW AFG 1989 0 1
This is what I want to end up with:
iso_o iso_d year mid_o mid_d mid_o_t-20 mid_o_t-19 mid_o_t-18 .... mid_d_t-20
ABW AFG 1980 0 1 0 0 0
ABW AFG 1981 0 1 0 0 0
ABW AFG 1982 0 1 0 0 0
ABW AFG 1983 0 2 0 0 0
ABW AFG 1984 0 1 0 0 0
ABW AFG 1985 0 1 0 0 0
I'm assuming that da.f (short for data.frame, chosen so it doesn't collide with existing functions) roughly follows your structure, since you did not include your data in the question.
library(zoo)
# da.f is randomly generated in this example
set.seed(1)
da.f = data.frame(mid_o = sample(seq(0,4), 50, replace = TRUE), mid_d = sample(seq(0,4), 50, replace = TRUE))
# our result consists of 20 lags backward and forward in time;
# stats::lag is called explicitly so that dplyr::lag cannot mask it
res = stats::lag(as.zoo(da.f), -20:20, na.pad = TRUE)
On May 10th 2018, @thistleknot pointed out to me (thanks!) that dplyr masks the lag generic from the stats package. Make sure you don't have dplyr attached, or call stats::lag explicitly; otherwise my code won't run.
I think I found the culprit: github.com/tidyverse/dplyr/issues/1586
The response there: this is a natural consequence of having lots of R packages, so just be explicit and use stats::lag or dplyr::lag.
Hello there, and thank you for your help!
I found the solution to the problem. First, I had to convert the data.frame to a data.table. Second, I found a way to create multiple columns in data.table by combining sprintf and shift. That way I could create 20 lags and 20 leads with only 4 lines of code.
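library(data.table)
setDT(df) # convert the data.frame to a data.table in place
setorder(df, iso_o, iso_d, year) # time order within each dyad; by = keeps lags from spilling into the next dyad
df[, sprintf("mid_o_lag_%d", 1:20) := shift(mid_o, 1:20, type = "lag"), by = .(iso_o, iso_d)]
df[, sprintf("mid_d_lag_%d", 1:20) := shift(mid_d, 1:20, type = "lag"), by = .(iso_o, iso_d)]
df[, sprintf("mid_o_lead_%d", 1:20) := shift(mid_o, 1:20, type = "lead"), by = .(iso_o, iso_d)]
df[, sprintf("mid_d_lead_%d", 1:20) := shift(mid_d, 1:20, type = "lead"), by = .(iso_o, iso_d)]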
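A small check of why grouping by dyad matters: with by = .(iso_o, iso_d), the first year of the second dyad gets NA instead of borrowing the last value of the first dyad. The two dyads and their values below are made up for illustration, and only 2 lags/leads are used to keep the output short.

```r
library(data.table)

# Two hypothetical dyads with four years each (toy data, not from the question)
dt <- data.table(
  iso_o = "ABW",
  iso_d = rep(c("AFG", "ALB"), each = 4),
  year = rep(1980:1983, times = 2),
  mid_o = c(0, 1, 0, 2, 1, 0, 0, 0)
)
setorder(dt, iso_o, iso_d, year)

# Two lags and two leads per dyad; NA where the window runs past the dyad's data
dt[, sprintf("mid_o_lag_%d", 1:2) := shift(mid_o, 1:2, type = "lag"), by = .(iso_o, iso_d)]
dt[, sprintf("mid_o_lead_%d", 1:2) := shift(mid_o, 1:2, type = "lead"), by = .(iso_o, iso_d)]
dt
```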

Cumulative sums for the previous row

I'm trying to get cumulative sums up to the previous row/year. Running cumsum(data$fond) gives me running totals that include the current row, which doesn't work for what I want to do. I would like my data to look like the following:
year fond cumsum
1 1950 0 0
2 1951 1 0
3 1952 3 1
4 1953 0 4
5 1954 0 4
Any help would be appreciated.
data$cumsum <- c(0, cumsum(data$fond)[-nrow(data)])
With data.table, we can use the shift function. By default it uses type = "lag".
library(data.table)
setDT(df1)[, Cumsum := cumsum(shift(fond, fill = 0))]
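To check the first answer against the expected output in the question, here is a base R sketch on the question's data. It also shows an equivalent form, the running total minus the current row's value; both give the total of all previous rows.

```r
# Example data from the question
data <- data.frame(year = 1950:1954, fond = c(0, 1, 3, 0, 0))

# Shift the running total down one row (first answer)
data$cumsum <- c(0, cumsum(data$fond)[-nrow(data)])

# Equivalent: running total minus the current row's value
data$cumsum_alt <- cumsum(data$fond) - data$fond
data
```

With non-integer values the subtraction form can differ from the shifted form by floating-point rounding, so the shifted version is the safer default.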

Combine lists of different lengths

I am new to R and started learning two weeks ago. I want to take a list of tropical cyclone counts for various years (where some years are absent, because there were no tropical cyclones) and create a list with a column of every year from 1907-2013 and a column of the number of tropical cyclones.
In the example I include the list of occurrences up to 1973 (before 1912 there were none).
Year Count
1 1912 1
2 1913 1
3 1921 1
4 1940 1
5 1953 1
6 1958 1
7 1959 1
8 1960 1
9 1966 1
10 1969 1
11 1971 1
12 1973 2
I tried using a for loop and if/else statement, but it does not work. I get the message "longer object length is not a multiple of shorter object length" and "the condition has length > 1 and only the first element will be used."
tc.SP=matrix(0,len.tc.yr,2)
tc.SP[,1]=tc.year.list
for (i in 1:len.tc.yr) #107 yrs (1907-2013)
{
if (tc.SP5.count[,1] == tc.SP[,1]) #tc.SP5.count is various years of TC occ.
{tc.SP[,2]= tc.SP5.count[,2]}
else
{tc.SP[,2]= 0}
}
Thank you for any help in advance.
When you say list, I'm going to assume you want to create a data.frame. Let's say the data above is in a data.frame called cyclone. The easiest way to get a row for every year is to merge it with a complete list of years. For example:
cyclone.full <- merge(cyclone, data.frame(Year=1907:2013), all=TRUE)
Here the data.frames will automatically merge on the Year column because both sets have that column. This will put NA values in all the missing years. If you want the default to be 0, you can do
cyclone.full$Count[is.na(cyclone.full$Count)] <- 0
Then you get:
head(cyclone.full)
# Year Count
# 1 1907 0
# 2 1908 0
# 3 1909 0
# 4 1910 0
# 5 1911 0
# 6 1912 1
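The loop in the question fails because if() receives the comparison of two whole columns of different lengths (12 vs. 107 values), which causes both the recycling warning and the "condition has length > 1" message (an error since R 4.2.0). A vectorised base R alternative to merge is match(), which looks up every year at once; this is a sketch using the counts listed in the question.

```r
# Counts from the question (years with at least one tropical cyclone)
cyclone <- data.frame(
  Year = c(1912, 1913, 1921, 1940, 1953, 1958, 1959, 1960, 1966, 1969, 1971, 1973),
  Count = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2)
)

years <- 1907:2013
# Position of each year in cyclone$Year, or NA if that year had no cyclone
idx <- match(years, cyclone$Year)
cyclone.full <- data.frame(
  Year = years,
  Count = ifelse(is.na(idx), 0, cyclone$Count[idx])
)
head(cyclone.full)
```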

How to reshape this complicated data frame?

Here are the first 4 rows of my data:
X...Country.Name Country.Code Indicator.Name Indicator.Code X2010
1 Turkey TUR Inflation, GDP deflator (annual %) NY.GDP.DEFL.KD.ZG 5.675740
2 Turkey TUR Unemployment, total (% of total labor force) SL.UEM.TOTL.ZS 11.900000
3 Afghanistan AFG Inflation, GDP deflator (annual %) NY.GDP.DEFL.KD.ZG 9.437322
4 Afghanistan AFG Unemployment, total (% of total labor force) SL.UEM.TOTL.ZS NA
I want my data reshaped into two columns, one for each indicator code, and I want each row to correspond to a country, something like this:
Country Name NY.GDP.DEFL.KD.ZG SL.UEM.TOTL.ZS
Turkey 5.6 11.9
Afghanistan 9.43 NA
I think I could do this in Excel, but I want to learn the R way so that I don't need to rely on Excel every time I have a problem. Here is a dput of the data if you need it.
Edit: I actually want 3 columns: one for each indicator and one for the country's name.
Sticking with base R, use reshape. I took the liberty of cleaning up the column names. Here, I'm only showing you a few rows of the output. Remove head to see the full output. This assumes your data.frame is named "mydata".
names(mydata) <- c("CountryName", "CountryCode",
"IndicatorName", "IndicatorCode", "X2010")
head(reshape(mydata[-c(2:3)],
direction = "wide",
idvar = "CountryName",
timevar = "IndicatorCode"))
# CountryName X2010.NY.GDP.DEFL.KD.ZG X2010.SL.UEM.TOTL.ZS
# 1 Turkey 5.675740 11.9
# 3 Afghanistan 9.437322 NA
# 5 Albania 3.459343 NA
# 7 Algeria 16.245617 11.4
# 9 American Samoa NA NA
# 11 Andorra NA NA
Another option in base R is xtabs, but NA gets replaced with 0:
head(xtabs(X2010 ~ CountryName + IndicatorCode, mydata))
# IndicatorCode
# CountryName NY.GDP.DEFL.KD.ZG SL.UEM.TOTL.ZS
# Afghanistan 9.437322 0.0
# Albania 3.459343 0.0
# Algeria 16.245617 11.4
# American Samoa 0.000000 0.0
# Andorra 0.000000 0.0
# Angola 22.393924 0.0
The result of xtabs is a matrix, so if you want a data.frame, wrap the output with as.data.frame.matrix.
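Since the edit asks for exactly three columns named after the indicators, one option is to strip the "X2010." prefix that reshape adds. Here is a sketch on the four sample rows, using the cleaned-up column names from the reshape answer:

```r
# The four sample rows from the question, with the cleaned-up column names
mydata <- data.frame(
  CountryName = c("Turkey", "Turkey", "Afghanistan", "Afghanistan"),
  CountryCode = c("TUR", "TUR", "AFG", "AFG"),
  IndicatorName = c("Inflation, GDP deflator (annual %)",
                    "Unemployment, total (% of total labor force)",
                    "Inflation, GDP deflator (annual %)",
                    "Unemployment, total (% of total labor force)"),
  IndicatorCode = c("NY.GDP.DEFL.KD.ZG", "SL.UEM.TOTL.ZS",
                    "NY.GDP.DEFL.KD.ZG", "SL.UEM.TOTL.ZS"),
  X2010 = c(5.675740, 11.9, 9.437322, NA),
  stringsAsFactors = FALSE
)

wide <- reshape(mydata[c("CountryName", "IndicatorCode", "X2010")],
                direction = "wide",
                idvar = "CountryName",
                timevar = "IndicatorCode")
# Drop the "X2010." prefix so each column is named after its indicator code
names(wide) <- sub("^X2010\\.", "", names(wide))
rownames(wide) <- NULL
wide
```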
