Assign values to a name within a function - r

Here is my code:
get_test <- function(name){
data <- filter(data_all_country,country == name)
# transform the data to a time series using `ts` in `stats`
data <- ts(data$investment, start = 1950)
data <- log(data)
rule <- substitute(name)
assign(rule,data)
}
As the code shows, I am trying to write a function that takes a country's name as a character string and automatically creates a variable named after that country. The code runs, but the variable I want is not created. For example, I want a variable called Albania in the environment after I call get_test("Albania").
I wonder why?
P.S. The dataset data_all_country looks like this:
year country investment
1 1950 Albania NA
2 1951 Albania NA
3 1952 Albania NA
4 1953 Albania NA
5 1954 Albania NA
6 1955 Albania NA
Note that the dataset is fine; some of the investment values are simply NA.

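I think you have to specify the environment for assign(); otherwise it assigns in the current environment (in this case, inside the function, where the variable disappears once the function returns). Also, name is already a character string, so you can pass it to assign() directly instead of going through substitute().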
You could use
assign(name, data, envir = .GlobalEnv)
or
assign(name, data, pos = 1)
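
For reference, here is a minimal sketch of the whole function with that change applied. The toy data frame and its investment values are made up for illustration (the real data_all_country is not shown in full), and base R subsetting is used in place of filter() so the example has no package dependencies.

```r
# Toy stand-in for data_all_country; the investment values are invented.
data_all_country <- data.frame(
  year = rep(1950:1952, times = 2),
  country = rep(c("Albania", "Belgium"), each = 3),
  investment = c(10, 12, 15, 100, 110, 125),
  stringsAsFactors = FALSE
)

get_test <- function(name) {
  # Keep only the rows for the requested country
  rows <- data_all_country[data_all_country$country == name, ]
  # Build the yearly time series and take logs
  series <- log(ts(rows$investment, start = 1950))
  # name is a character string, so assign() can use it as the variable name;
  # envir = .GlobalEnv makes the variable outlive the function call
  assign(name, series, envir = .GlobalEnv)
  invisible(series)
}

get_test("Albania")
exists("Albania")
```

Returning the series and assigning it yourself (Albania <- get_test("Albania")) is often easier to reason about than having a function create global variables as a side effect.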

Related

What's the difference between these two classes, and why must I use the as_tibble function in my code?

Here is my code. I have two questions about it. First, what is the difference between these two classes? Second, why must I call as_tibble() before I can use pivot_wider()?
head(global_economy)
write.table(global_economy,"global_economy.csv",sep=",",row.names=FALSE)
class(global_economy)
Country Code Year GDP Growth CPI Imports Exports Population
Afghanistan AFG 1960 537777811 NA NA 7.024793 4.132233 8996351
Afghanistan AFG 1961 548888896 NA NA 8.097166 4.453443 9166764
Afghanistan AFG 1962 546666678 NA NA 9.349593 4.878051 9345868
Afghanistan AFG 1963 751111191 NA NA 16.863910 9.171601 9533954
Afghanistan AFG 1964 800000044 NA NA 18.055555 8.888893 9731361
Afghanistan AFG 1965 1006666638 NA NA 21.412803 11.258279 9938414
'tbl_ts' 'tbl_df' 'tbl' 'data.frame'
wider_tibble <- global_economy %>%
as_tibble()%>%
pivot_wider(names_from=Country,values_from=GDP)
class(wider_tibble)
'tbl_df' 'tbl' 'data.frame'
My guess: the first one is a time series tibble (a tsibble, class 'tbl_ts'). as_tibble() converts it back to an ordinary tibble without the time series structure, so that pivot_wider() works.
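
A small sketch of that guess, assuming the tsibble, dplyr and tidyr packages are installed. The econ data set below is a made-up stand-in for global_economy, with Country as the key and Year as the index.

```r
library(tsibble)
library(dplyr)
library(tidyr)

# Made-up miniature of global_economy: one row per Country and Year
econ <- tsibble(
  Country = rep(c("A", "B"), each = 2),
  Year = rep(2000:2001, times = 2),
  GDP = c(1, 2, 3, 4),
  key = Country,
  index = Year
)
class(econ)  # includes "tbl_ts" in addition to the tibble classes

# Drop the tsibble structure first, then reshape to one column per country
wide <- econ %>%
  as_tibble() %>%
  pivot_wider(names_from = Country, values_from = GDP)
class(wide)  # no "tbl_ts" any more
```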

Revaluing many observations with a for loop in R

I have a data set where I am looking at longitudinal data for countries.
master.set <- data.frame(
Country = c(rep("Afghanistan", 3), rep("Albania", 3)),
Country.ID = c(rep("Afghanistan", 3), rep("Albania", 3)),
Year = c(2015, 2016, 2017, 2015, 2016, 2017),
Happiness.Score = c(3.575, 3.360, 3.794, 4.959, 4.655, 4.644),
GDP.PPP = c(1766.593, 1757.023, 1758.466, 10971.044, 11356.717, 11803.282),
GINI = NA,
Status = 2,
stringsAsFactors = F
)
> head(master.set)
Country Country.ID Year Happiness.Score GDP.PPP GINI Status
1 Afghanistan Afghanistan 2015 3.575 1766.593 NA 2
2 Afghanistan Afghanistan 2016 3.360 1757.023 NA 2
3 Afghanistan Afghanistan 2017 3.794 1758.466 NA 2
4 Albania Albania 2015 4.959 10971.044 NA 2
5 Albania Albania 2016 4.655 11356.717 NA 2
6 Albania Albania 2017 4.644 11803.282 NA 2
I created that Country.ID variable with the intent of turning them into numerical values 1:159.
I am hoping to avoid doing something like this to replace the value at each individual observation:
master.set$Country.ID[master.set$Country.ID == "Afghanistan"] <- 1
As I implied, there are 159 countries listed in the data set. Because it's longitudinal, there are 460 observations.
Is there any way to use a for loop to save me a lot of time? Here is what I attempted. I made a couple of lists and attempted to use an ifelse command to tell R to label each country the next number.
Here is what I have:
#List of country names
N.Countries <- length(unique(master.set$Country))
Country <- unique(master.set$Country)
Country.ID <- unique(master.set$Country.ID)
CountryList <- unique(master.set$Country)
#For Loop to make Country ID numerically match Country
for (i in 1:460){
for (j in N.Countries){
master.set[[Country.ID[i]]] <- ifelse(master.set[[Country[i]]] == CountryList[j], j, master.set$Country)
}
}
I received this error:
Error in `[[<-.data.frame`(`*tmp*`, Country.ID[i], value = logical(0)) :
replacement has 0 rows, data has 460
Does anyone know how I can accomplish this task? Or will I be stuck using the ifelse command 159 times?
Thanks!
Maybe something like
master.set$Country.ID <- as.numeric(as.factor(master.set$Country.ID))
Or alternatively, using dplyr
library(tidyverse)
master.set <- master.set %>% mutate(Country.ID = as.numeric(as.factor(Country.ID)))
Or this, which creates a new variable Country.ID2 based on a key-value pair between Country and 1:length(unique(Country)).
library(tidyverse)
master.set <- left_join(master.set,
data.frame( Country = unique(master.set$Country),
Country.ID2 = 1:length(unique(master.set$Country))))
master.set
#> Country Country.ID Year Happiness.Score GDP.PPP GINI Status
#> 1 Afghanistan Afghanistan 2015 3.575 1766.593 NA 2
#> 2 Afghanistan Afghanistan 2016 3.360 1757.023 NA 2
#> 3 Afghanistan Afghanistan 2017 3.794 1758.466 NA 2
#> 4 Albania Albania 2015 4.959 10971.044 NA 2
#> 5 Albania Albania 2016 4.655 11356.717 NA 2
#> 6 Albania Albania 2017 4.644 11803.282 NA 2
#> Country.ID2
#> 1 1
#> 2 1
#> 3 1
#> 4 2
#> 5 2
#> 6 2
library(dplyr)
df<-data.frame("Country"=c("Afghanistan","Afghanistan","Afghanistan","Albania","Albania","Albania"),
"Year"=c(2015,2016,2017,2015,2016,2017),
"Happiness.Score"=c(3.575,3.360,3.794,4.959,4.655,4.644),
"GDP.PPP"=c(1766.593,1757.023,1758.466,10971.044,11356.717,11803.282),
"GINI"=NA,
"Status"=rep(2,6))
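# group_indices_() is deprecated in recent dplyr versions;
# cur_group_id() (dplyr >= 1.0.0) returns the index of the current group
df1 <- df %>% group_by(Country) %>% mutate(Country_id = cur_group_id()) %>% ungroup()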
View(df1)
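
The same idea also works in base R without any loop. The sketch below uses a made-up two-country data frame, with Albania listed first on purpose to show one difference between the approaches: match() numbers countries in order of first appearance, while as.factor() numbers them by the sorted order of their names.

```r
# Made-up example; Albania deliberately comes before Afghanistan
master.set <- data.frame(
  Country = c(rep("Albania", 3), rep("Afghanistan", 3)),
  Year = rep(2015:2017, times = 2),
  stringsAsFactors = FALSE
)

# IDs in order of first appearance in the data
master.set$Country.ID <- match(master.set$Country, unique(master.set$Country))

# IDs in sorted order of the country names (the as.factor() approach)
master.set$Country.ID.sorted <- as.integer(factor(master.set$Country))

master.set
```

Either column gives one integer per country for all rows at once, so 159 separate ifelse calls are not needed.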

R: Creating a table with the highest values by year

I hope I don't ask a question that has been asked already, but I couldn't quite find what I was looking for. I am fairly new to R and have no experience with programming.
I want to make a table with the top 10 values of three sections for each year. My data looks something like this:
Year Country Test1 Test2 Test3
2000 ALB 500 497 501
2001 ALB NA NA NA
...
2000 ARG 502 487 354
2001 ARG NA NA NA
...
(My years go from 2000 to 2015, I only have observations for every three years, and even in those years still a lot of NA's for some countries or tests)
I would like to get a table in which I can see the 10 top values for each test for each year. So for the year 2000,2003,2006,...,2015 the top ten values and the countries that reached those values for test 1,2&3.
And then (I am not sure if this should be a separate question) I would like to get the table into LaTeX.
Easier to see top values this way.
You could use melt from the data.table package:
library(data.table)
# convert to data table
setDT(df)
# convert it to long format and select the columns to use
df1 <- melt(df, id.vars=1:2)
df1 <- df1[,c(1,2,4)]
# get the sorted values for each year and country
df1 <- df1[,top_value := .(list(sort(value, decreasing = TRUE))), .(Year, Country)][,.(Year, Country, top_value)]
print(df1)
print(df1)
Year Country top_value
1: 2000 ALB 501,500,497
2: 2001 ALB
3: 2000 ARG 502,487,354
4: 2001 ARG
5: 2000 ALB 501,500,497
6: 2001 ALB
7: 2000 ARG 502,487,354
8: 2001 ARG
9: 2000 ALB 501,500,497
10: 2001 ALB
11: 2000 ARG 502,487,354
12: 2001 ARG
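
If the goal is the top n countries per test and per year, as in the question, one possible variation is sketched below. It assumes the data.table package; the scores, the third country and the choice of n = 2 are made up to keep the output short (use 10 for the real data).

```r
library(data.table)

# Made-up scores for three countries in two survey years
df <- data.table(
  Year = rep(c(2000, 2003), each = 3),
  Country = rep(c("ALB", "ARG", "AUS"), times = 2),
  Test1 = c(500, 502, 530, 510, NA, 525),
  Test2 = c(497, 487, 528, NA, 490, 520)
)

# Long format: one row per Year, Country and Test; NA scores are dropped
long <- melt(df, id.vars = c("Year", "Country"),
             variable.name = "Test", value.name = "Score", na.rm = TRUE)

# Sort by score (descending) within each Year and Test, then keep the first n rows
n <- 2
setorder(long, Year, Test, -Score)
top <- long[, head(.SD, n), by = .(Year, Test)]
print(top)
```

Each row of top is one country that placed in the top n for a given year and test, which is a shape that is straightforward to export as a table.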

Pass a string argument to a function as dataframe column name in dplyr

I am trying to pass a string variable to a function, to be used as the column name after some data alteration.
Here is the function:
cleandata <- function(df,name){
df <- df %>%
gather(key = 'Year',value = name,X1960:X2015)
df <- df %>%
select(-c(X,Indicator.Name,Indicator.Code))
df$Year <- substr(df$Year,start = 2,stop = 5)
df$Year <- as.factor(df$Year)
return(df)
}
I want to pass a string variable to 'name', and have it as the column name.
The current output of the function is:
> cleandata(lifeexp,'LifeExp')
Source: local data frame [13,888 x 4]
Country.Name Country.Code Year name
(fctr) (fctr) (fctr) (dbl)
1 Aruba ABW 1960 65.56937
2 Andorra AND 1960 NA
3 Afghanistan AFG 1960 32.32851
4 Angola AGO 1960 32.98483
5 Albania ALB 1960 62.25437
6 Arab World ARB 1960 46.84706
7 United Arab Emirates ARE 1960 52.24322
8 Argentina ARG 1960 65.21554
9 Armenia ARM 1960 65.86346
10 American Samoa ASM 1960 NA
.. ... ... ... ...
>
The last column should be 'LifeExp', not name. What am I missing?
Thanks in advance,
Rahul
You want to use gather_ here. See vignette('nse') for an explanation why.
year_cols <- names(df)[grepl('^X\\d{4}$', names(df))]
df %>% gather_('Year', name, year_cols)
The issue is that gather takes an unquoted name for its key and value columns, so you can't pass in a variable holding the name. It interprets whatever name you put there as the literal, unquoted name you want for the value column. This is consistent with the principle that the tidyr functions without underscores are meant for interactive use, and those with underscores should be used when your work is more programmatic.
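
In more recent tidyr versions the underscore functions are deprecated, and pivot_longer() takes its new column names as ordinary strings, so a variable can be passed directly. A minimal sketch, assuming tidyr >= 1.0.0; the lifeexp data frame below is a made-up two-row stand-in for the real one, and the select() step from the question is left out.

```r
library(tidyr)

cleandata <- function(df, name) {
  # Columns named like X1960, X1961, ...
  year_cols <- grep("^X[0-9]{4}$", names(df), value = TRUE)
  # values_to is a plain string argument, so the variable name works here
  long <- pivot_longer(df, cols = all_of(year_cols),
                       names_to = "Year", values_to = name)
  # Strip the leading "X" and store the year as a factor, as in the question
  long$Year <- as.factor(substr(long$Year, start = 2, stop = 5))
  long
}

# Made-up miniature of the lifeexp data
lifeexp <- data.frame(
  Country.Name = c("Aruba", "Angola"),
  X1960 = c(65.57, 32.98),
  X1961 = c(66.00, 33.40)
)

out <- cleandata(lifeexp, "LifeExp")
names(out)
```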

Creating new variable and new data rows for country-conflict-year observations

I'm very new to R, still learning the very basics, and I haven't yet figured out how to perform this particular operation, but it would save me lots and lots of labor and time.
I have a dataset of international conflicts with columns for country and dates that looks something like this:
country dates
Angola 1951-1953
Belize 1970-1972
I would like to reorganize the data to create variables for start year and end year, as well as create a year-observed (call it 'yrobs') column, so the set looks more like this:
country yrobs yrstart yrend
Angola 1951 1951 1953
Angola 1952 1951 1953
Angola 1953 1951 1953
Belize 1970 1970 1972
Belize 1971 1970 1972
Belize 1972 1970 1972
Someone suggested using data frames and a double for-loop, but I got a little confused trying that. Any help would be greatly appreciated, and feel free to use dummy language, as I'm still pretty green to the programming here. Thanks much.
No need for any for loops here. Use the power of R and its contributed packages, particularly plyr and reshape2.
library(reshape2)
library(plyr)
Create some data:
df <- data.frame(
country =c("Angola","Belize"),
dates = c("1951-1953", "1970-1972")
)
Use colsplit in the reshape2 package to split your dates column into two, and cbind this to the original data frame.
df <- cbind(df, colsplit(df$dates, "-", c("start", "end")))
Now for the fun bit. Use ddply in package plyr to split, apply and combine (SAC). This will take df and apply a function to each group of rows that share a country. The anonymous function inside ddply creates a small data.frame with country and observations, and the key bit is to use seq() to generate a sequence from start to end date. The power of ddply is that it does all of this splitting, applying and combining in one step. Think of it as a loop in other languages, but you don't need to keep track of your indexing variables.
ddply(df, .(country), function(x){
data.frame(
country=x$country,
yrobs=seq(x$start, x$end),
yrstart=x$start,
yrend=x$end
)
}
)
And the results:
country yrobs yrstart yrend
1 Angola 1951 1951 1953
2 Angola 1952 1951 1953
3 Angola 1953 1951 1953
4 Belize 1970 1970 1972
5 Belize 1971 1970 1972
6 Belize 1972 1970 1972
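
For comparison, the same split, apply and combine steps can be sketched in base R only, with strsplit() in place of colsplit() and Map() plus rbind() in place of ddply(). This is one possible equivalent under those substitutions, not the method used in the answer above.

```r
df <- data.frame(
  country = c("Angola", "Belize"),
  dates = c("1951-1953", "1970-1972"),
  stringsAsFactors = FALSE
)

# Split "1951-1953" into its start and end years
parts <- strsplit(df$dates, "-", fixed = TRUE)
df$yrstart <- as.integer(sapply(parts, `[`, 1))
df$yrend <- as.integer(sapply(parts, `[`, 2))

# Build one small data frame per conflict, with one row per observed year
rows <- Map(function(country, start, end) {
  data.frame(country = country, yrobs = seq(start, end),
             yrstart = start, yrend = end)
}, df$country, df$yrstart, df$yrend)

# Stack the pieces and drop the row names that rbind() generates
result <- do.call(rbind, rows)
rownames(result) <- NULL
result
```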
