Running rapply on lists of dataframes - r

To follow-up on two rapply questions, here and here from years ago, it seems rapply only works on simple classes (i.e., vector, matrix) and not the multifaceted data.frame class.
In most cases and demonstrated below, the rapply equivalent is nested lapply and its variant wrappers, v/sapply where the number of nests correlates to number of levels. Below is my testing scenario between nested lapply and rapply between vector, matrix, and dataframe types. All but datafames fail to equalize.
Question
Is there a use case in base R for rapply() to recursively run operations on a list of dataframes and return a list of dataframes as it does for lists of vectors or matrices? If not, is this a bug or should it be warned in ?rapply base R docs? Most tutorials do not show rapply dataframe examples.
One Dimension (character vector)
Below shows how rapply is equivalent to nested lapply on simple character vectors running count of characters, and even shows how rapply is appreciably faster in processing:
library(microbenchmark)
ScriptLists <- list(R = list.files(path="/path/to/Scripts", pattern="\\.R"),
Python = list.files(path="/path/to/Scripts", pattern="\\.py"),
SQL = list.files(path="/path/to/Scripts", pattern="\\.sql"),
PHP = list.files(path="/path/to/Scripts", pattern="\\.xsl"),
XSLT = list.files(path="/path/to/Scripts", pattern="\\.php"))
microbenchmark(
ScriptsLists1 <- lapply(ScriptLists, function(i){
unname(vapply(i, function(x){
nchar(x)
}, numeric(1)))
})
)
# Unit: microseconds
# min lq mean median uq max neval
# 384 408.782 524.1363 434.7675 678.016 886.377 100
microbenchmark(
ScriptsLists2 <- rapply(ScriptLists, function(x){
nchar(x)
}, how="list")
)
# Unit: microseconds
# min lq mean median uq max neval
# 110.196 112.8425 131.6141 114.5265 123.91 352.722 100
all.equal(ScriptsLists1, ScriptsLists2)
# [1] TRUE
Two Dimension Type (matrix vs. data.frame)
Input dataframe (pulled from highest year rankings of StackOverflow top users) to build list of top users' dataframes by language tags (C#, Python, R, etc.).
df <- structure(list(user = structure(c(12L, 14L, 19L, 35L, 22L, 32L,
1L, 36L, 7L, 9L, 2L, 18L, 27L, 6L, 30L, 20L, 10L, 24L, 29L, 23L,
5L, 3L, 4L, 15L, 25L, 17L, 11L, 8L, 33L, 13L, 34L, 16L, 21L,
26L, 28L, 31L), .Label = c("akrun", "alecxe", "Alexey Mezenin",
"BalusC", "Barmar", "CommonsWare", "Darin Dimitrov", "dasblinkenlight",
"Eric Duminil", "Felix Kling", "Frank van Puffelen", "Gordon Linoff",
"Greg Hewgill", "Günter Zöchbauer", "GurV", "Hans Passant", "JB Nizet",
"Jean-François Fabre", "jezrael", "Jon Skeet", "Jonathan Leffler",
"Martijn Pieters", "Martin R", "matt", "Nina Scholz", "paxdiablo",
"piRSquared", "Pranav C Balan", "Psidom", "Quentin", "Suragch",
"T.J. Crowder", "Tim Biegeleisen", "unutbu", "VonC", "Wiktor Stribi?ew"
), class = "factor"), link = structure(c(2L, 17L, 21L, 31L, 1L,
10L, 27L, 28L, 22L, 33L, 35L, 34L, 20L, 3L, 15L, 19L, 18L, 25L,
29L, 4L, 8L, 5L, 11L, 32L, 6L, 30L, 16L, 24L, 13L, 36L, 14L,
12L, 9L, 7L, 23L, 26L), .Label = c("http://www.stackoverflow.com//users/100297/martijn-pieters",
"http://www.stackoverflow.com//users/1144035/gordon-linoff",
"http://www.stackoverflow.com//users/115145/commonsware", "http://www.stackoverflow.com//users/1187415/martin-r",
"http://www.stackoverflow.com//users/1227923/alexey-mezenin",
"http://www.stackoverflow.com//users/1447675/nina-scholz", "http://www.stackoverflow.com//users/14860/paxdiablo",
"http://www.stackoverflow.com//users/1491895/barmar", "http://www.stackoverflow.com//users/15168/jonathan-leffler",
"http://www.stackoverflow.com//users/157247/t-j-crowder", "http://www.stackoverflow.com//users/157882/balusc",
"http://www.stackoverflow.com//users/17034/hans-passant", "http://www.stackoverflow.com//users/1863229/tim-biegeleisen",
"http://www.stackoverflow.com//users/190597/unutbu", "http://www.stackoverflow.com//users/19068/quentin",
"http://www.stackoverflow.com//users/209103/frank-van-puffelen",
"http://www.stackoverflow.com//users/217408/g%c3%bcnter-z%c3%b6chbauer",
"http://www.stackoverflow.com//users/218196/felix-kling", "http://www.stackoverflow.com//users/22656/jon-skeet",
"http://www.stackoverflow.com//users/2336654/pirsquared", "http://www.stackoverflow.com//users/2901002/jezrael",
"http://www.stackoverflow.com//users/29407/darin-dimitrov", "http://www.stackoverflow.com//users/3037257/pranav-c-balan",
"http://www.stackoverflow.com//users/335858/dasblinkenlight",
"http://www.stackoverflow.com//users/341994/matt", "http://www.stackoverflow.com//users/3681880/suragch",
"http://www.stackoverflow.com//users/3732271/akrun", "http://www.stackoverflow.com//users/3832970/wiktor-stribi%c5%bcew",
"http://www.stackoverflow.com//users/4983450/psidom", "http://www.stackoverflow.com//users/571407/jb-nizet",
"http://www.stackoverflow.com//users/6309/vonc", "http://www.stackoverflow.com//users/6348498/gurv",
"http://www.stackoverflow.com//users/6419007/eric-duminil", "http://www.stackoverflow.com//users/6451573/jean-fran%c3%a7ois-fabre",
"http://www.stackoverflow.com//users/771848/alecxe", "http://www.stackoverflow.com//users/893/greg-hewgill"
), class = "factor"), location = structure(c(17L, 15L, 8L, 12L,
10L, 26L, 1L, 28L, 23L, 1L, 17L, 25L, 6L, 29L, 26L, 19L, 24L,
1L, 5L, 13L, 4L, 2L, 3L, 1L, 7L, 20L, 21L, 27L, 22L, 11L, 1L,
16L, 9L, 1L, 18L, 14L), .Label = c("", "??????", "Amsterdam, Netherlands",
"Arlington, MA", "Atlanta, GA, United States", "Bellevue, WA, United States",
"Berlin, Deutschland", "Bratislava, Slovakia", "California, USA",
"Cambridge, United Kingdom", "Christchurch, New Zealand", "France",
"Germany", "Hohhot, China", "Linz, Austria", "Madison, WI", "New York, United States",
"Ramanthali, Kannur, Kerala, India", "Reading, United Kingdom",
"Saint-Etienne, France", "San Francisco, CA", "Singapore", "Sofia, Bulgaria",
"Sunnyvale, CA", "Toulouse, France", "United Kingdom", "United States",
"Warsaw, Poland", "Who Wants to Know?"), class = "factor"), year_rep = structure(c(36L,
35L, 34L, 33L, 32L, 31L, 30L, 29L, 28L, 27L, 26L, 25L, 24L, 23L,
22L, 21L, 20L, 19L, 18L, 17L, 16L, 15L, 14L, 13L, 12L, 11L, 10L,
9L, 8L, 7L, 6L, 5L, 4L, 3L, 2L, 1L), .Label = c("3,580", "3,604",
"3,636", "3,649", "3,688", "3,735", "3,796", "3,814", "3,886",
"3,920", "3,923", "3,950", "4,016", "4,046", "4,142", "4,179",
"4,195", "4,236", "4,313", "4,324", "4,348", "4,464", "4,475",
"4,482", "4,526", "4,723", "4,854", "4,936", "4,948", "5,188",
"5,258", "5,337", "5,577", "5,740", "5,835", "5,985"), class = "factor"),
total_rep = structure(c(18L, 2L, 34L, 27L, 22L, 20L, 5L,
3L, 31L, 1L, 6L, 9L, 13L, 25L, 21L, 36L, 14L, 4L, 11L, 7L,
8L, 10L, 30L, 29L, 24L, 15L, 35L, 17L, 33L, 23L, 12L, 28L,
16L, 19L, 26L, 32L), .Label = c("12,557", "154,439", "158,134",
"220,515", "229,553", "233,368", "269,380", "289,989", "30,027",
"31,602", "36,950", "401,595", "41,183", "411,535", "418,780",
"455,157", "475,813", "499,408", "507,043", "508,310", "509,365",
"525,176", "529,137", "61,135", "616,135", "64,476", "651,397",
"672,118", "7,932", "703,046", "709,683", "71,032", "77,211",
"83,237", "86,520", "921,690"), class = "factor"), tag1 = structure(c(15L,
2L, 10L, 6L, 11L, 8L, 12L, 13L, 4L, 14L, 11L, 11L, 10L, 1L,
8L, 4L, 8L, 16L, 11L, 16L, 8L, 9L, 7L, 15L, 8L, 7L, 5L, 4L,
15L, 6L, 11L, 4L, 3L, 3L, 8L, 16L), .Label = c("android",
"angular2", "c", "c#", "firebase", "git", "java", "javascript",
"laravel", "pandas", "python", "r", "regex", "ruby", "sql",
"swift"), class = "factor"), tag2 = structure(c(23L, 24L,
19L, 8L, 20L, 14L, 6L, 13L, 3L, 21L, 22L, 20L, 19L, 12L,
10L, 12L, 14L, 11L, 17L, 11L, 18L, 18L, 15L, 16L, 2L, 9L,
7L, 12L, 16L, 19L, 17L, 1L, 4L, 5L, 14L, 11L), .Label = c(".net",
"arrays", "asp.net-mvc", "bash", "c++", "dplyr", "firebase-database",
"github", "hibernate", "html", "ios", "java", "javascript",
"jquery", "jsf", "mysql", "pandas", "php", "python", "python-3.x",
"ruby-on-rails", "selenium", "sql-server", "typescript"), class = "factor"),
tag3 = structure(c(20L, 17L, 11L, 12L, 24L, 15L, 11L, 8L,
5L, 4L, 23L, 24L, 11L, 3L, 10L, 1L, 6L, 31L, 25L, 28L, 18L,
19L, 26L, 27L, 22L, 16L, 2L, 9L, 15L, 13L, 21L, 30L, 29L,
7L, 14L, 2L), .Label = c(".net", "android", "android-intent",
"arrays", "asp.net-mvc-3", "asynchronous", "bash", "c#",
"c++", "css", "dataframe", "docker", "git-pull", "html",
"java", "java-8", "javascript", "jquery", "laravel-5.3",
"mysql", "numpy", "object", "protractor", "python-2.7", "r",
"servlets", "sql-server", "swift3", "unix", "winforms", "xcode"
), class = "factor")), .Names = c("user", "link", "location",
"year_rep", "total_rep", "tag1", "tag2", "tag3"), class = "data.frame", row.names = c(NA,
-36L))
R Code
Below methods average year_rep and total_rep (5th/6th) columns in either types, matrix or dataframe. Be sure to change return statements in setup block, swapping out the commented section type. Notice only the rapply() for matrix returns same as nested lapply, but not for dataframe returns.
# NESTED LIST SETUP ------------------------------------
LangLists <- list(`c#`=list(), python=list(), sql=list(), php=list(), r=list(),
java=list(), javascript=list(), ruby=list(), `c++`=list())
LangLists <- setNames(mapply(function(i, j){
df <- subset(df, tag1 == j | tag2 == j | tag3 == j)
df$year_rep <- as.numeric(as.character(gsub(",", "", df$year_rep)))
df$total_rep <- as.numeric(as.character(gsub(",", "", df$total_rep)))
return(list(as.matrix(df))) # MATRIX TYPE
# return(list(df)) # DF TYPE
}, LangLists, names(LangLists), SIMPLIFY=FALSE), names(LangLists))
# -----------------------------------------------------
# MATRIX RETURN
LangLists1 <- lapply(LangLists, function(i){
lapply(i, function(df){
cbind(mean(as.numeric(df[,5])),
mean(as.numeric(df[,6])))
})
})
LangLists2 <- rapply(LangLists, function(i){
cbind(mean(as.numeric(i[,5])),
mean(as.numeric(i[,6])))
}, classes="matrix", how="list")
all.equal(LangLists1, LangLists2)
# [1] TRUE
# DATA FRAME RETURN
LangLists1 <- lapply(LangLists, function(i){
lapply(i, function(df){
data.frame(year_rep=mean(df$year_rep),
total_rep=mean(df$total_rep))
})
})
LangLists2 <- rapply(LangLists, function(i){
data.frame(year_rep=mean(i$year_rep),
total_rep=mean(i$total_rep))
}, classes="data.frame", how="list")
all.equal(LangLists1, LangLists2)
# [1] "Component “c#”: Component 1: Names: 2 string mismatches"
# [2] "Component “c#”: Component 1: Attributes: < names for target but not for current >"
# [3] "Component “c#”: Component 1: Attributes: < Length mismatch: comparison on first 0 components >"
# [4] "Component “c#”: Component 1: Length mismatch: comparison on first 2 components"
# [5] "Component “c#”: Component 1: Component 1: Modes: numeric, NULL"
...
In fact, whereas the nested lapply remains a list of intact dataframes of the two columns for rep means, the rapply for dataframes converts underlying dataframes to lists of NULLs. So again, why does rapply fail to return original list of dataframes compared to vectors/matrices?
# $`c#`
# $`c#`[[1]]
# $`c#`[[1]]$X
# NULL
# $`c#`[[1]]$user
# NULL
# $`c#`[[1]]$link
# NULL
# $`c#`[[1]]$location
# NULL
# $`c#`[[1]]$year_rep
# NULL
# $`c#`[[1]]$total_rep
# NULL
# $`c#`[[1]]$tag1
# NULL
# $`c#`[[1]]$tag2
# NULL
# $`c#`[[1]]$tag3
# NULL
# $python
# $python[[1]]
# $python[[1]]$X
# NULL
# $python[[1]]$user
# NULL
# $python[[1]]$link
# NULL
# $python[[1]]$location
# NULL
# $python[[1]]$year_rep
# NULL
# $python[[1]]$total_rep
# NULL
# $python[[1]]$tag1
# NULL
# $python[[1]]$tag2
# NULL
# $python[[1]]$tag3
# NULL

It appears that rapply is not designed to process lists of data.frames.
From the Details section of ?rapply it says, if
how = "list" or how = "unlist", the list is copied, all non-list elements which have a class included in classes are replaced by the result of applying f to the element and all others are replaced by deflt.
Since data.frames are lists, they do not fall under the first category. Thus, they fall into the all others catch-all and are replaced by dflt, whose default value is NULL. This explains the result of the final line of code in the question.
The final alternative argument to how is "replace" and it appears that data.frames are simply ignored under this "mode"
If how = "replace", each element of the list which is not itself a list and has a class included in classes is replaced by the result of applying f to the element.
No mention of elements which are themselves lists and running the code above with how="replace" appears to return a nested list where what were data.frames are now simple lists. So it appears that rapply went through and stripped the class attribute.

Related

R Object not found when aggregating data

I want to have the mean score of every Dish 'group'. However, when I try using the following code it doesn't find the Objects Dish/Score.
dishes <- read.table(file=theFile, header=TRUE, sep=",")
aggdata <- aggregate(formula= Score ~ Dish,
data = dishes,
FUN = mean)
And a dput from some of the data for reconstruction:
structure(list(UserName.Dish.Score = structure(c(26L, 25L, 23L,
24L, 13L, 14L, 15L, 17L, 28L, 22L, 12L, 30L, 29L, 20L, 18L, 19L,
16L, 27L, 21L, 10L, 9L, 7L, 8L, 2L, 3L, 4L, 5L, 11L, 6L, 1L), .Label = c("kjetsand;Bacalao;3",
"kjetsand;Chicken Curry;5", "kjetsand;Chicken Tikka Masala;5",
"kjetsand;Chili Con Carne;3", "kjetsand;Kebab;3", "kjetsand;Paella;3",
"kjetsand;Pasta Bolognese;3", "kjetsand;Pasta Carbonara;3", "kjetsand;Pizza Margherita;3",
"kjetsand;Pizza Napolitana;4", "kjetsand;Sushi;5", "nilstesd;Bacalao;6",
"nilstesd;Chicken Curry;4", "nilstesd;Chicken Tikka Masala;5",
"nilstesd;Chili Con Carne;4", "nilstesd;Coq au vin;5", "nilstesd;Kebab;3",
"nilstesd;Kentucky Fried Chicken;2", "nilstesd;Lutefisk;5", "nilstesd;MacDonalds Cheeseburger;1",
"nilstesd;Moules frites;5", "nilstesd;Paella;6", "nilstesd;Pasta Bolognese;5",
"nilstesd;Pasta Carbonara;5", "nilstesd;Pizza Margherita;4",
"nilstesd;Pizza Napolitana;6", "nilstesd;Ratatouille;4", "nilstesd;Sushi;6",
"nilstesd;Sweet And Sour Pork;3", "nilstesd;Taco;2"), class = "factor")), .Names = "UserName.Dish.Score", row.names = c(NA,
30L), class = "data.frame")
change the class of Score to numeric
dishes$Score <- as.numeric(dishes$Score)
then you can aggregate your data
aggregate(formula= Score ~ Dish,
data = dishes,
FUN = mean)

Convert Factor to Date column [duplicate]

This question already has an answer here:
month language in the as.date function
(1 answer)
Closed 5 years ago.
My data frame is:
x=structure(list(V1 = structure(c(33L, 35L, 36L, 37L, 39L, 4L,
6L, 7L, 8L, 10L, 14L, 16L, 18L, 19L, 21L, 25L, 27L, 28L, 29L,
30L, 1L, 17L, 31L, 32L, 34L, 38L, 40L, 2L, 3L, 5L, 9L, 11L, 12L,
13L, 15L, 20L, 22L, 23L, 24L, 26L), .Label = c("1-Feb-71", "10-Feb-71",
"11-Feb-71", "11-Jan-71", "12-Feb-71", "12-Jan-71", "13-Jan-71",
"14-Jan-71", "15-Feb-71", "15-Jan-71", "16-Feb-71", "17-Feb-71",
"18-Feb-71", "18-Jan-71", "19-Feb-71", "19-Jan-71", "2-Feb-71",
"20-Jan-71", "21-Jan-71", "22-Feb-71", "22-Jan-71", "23-Feb-71",
"24-Feb-71", "25-Feb-71", "25-Jan-71", "26-Feb-71", "26-Jan-71",
"27-Jan-71", "28-Jan-71", "29-Jan-71", "3-Feb-71", "4-Feb-71",
"4-Jan-71", "5-Feb-71", "5-Jan-71", "6-Jan-71", "7-Jan-71", "8-Feb-71",
"8-Jan-71", "9-Feb-71"), class = "factor"), V2 = structure(c(1L,
15L, 2L, 4L, 3L, 5L, 10L, 5L, 7L, 12L, 8L, 16L, 16L, 22L, 16L,
19L, 22L, 12L, 17L, 23L, 24L, 24L, 21L, 17L, 19L, 16L, 6L, 11L,
9L, 25L, 25L, 8L, 5L, 13L, 20L, 18L, 16L, 13L, 12L, 14L), .Label = c("7.1359",
"7.1367", "7.1382", "7.1386", "7.1390", "7.1397", "7.1403", "7.1406",
"7.1410", "7.1411", "7.1412", "7.1414", "7.1418", "7.1420", "7.1422",
"7.1429", "7.1431", "7.1434", "7.1435", "7.1437", "7.1439", "7.1443",
"7.1445", "7.1465", "ND"), class = "factor")), .Names = c("V1",
"V2"), class = "data.frame", row.names = c(NA, -40L))
I am trying to convert column V1 to Date, but it is not working. Ive been looking some topics but it just doesnt work.
This my code:
x$V1 <- as.Date(x$V1, format="%d-%b-%y")
It works for some rows of V1 column but not for others.
Any help?
In my version of R, the conversion in your example only works for January and not for February. I think it is related to the language.
For example, in French, February is coded as fév and so Feb is not recognized.
Once I did:
x$V1=gsub("Feb", "fév", x$V1)
it worked.
It probably depends on which language your version of R uses.

Byte code version mismatch when using subset

I have been working on the same R script now for 5 months, had some minor coding problems, but this morning I got a problem that makes me unable to run the whole script. To clean my imported data I use a lot of subset(), but this morning when running the code I got the Warning:
Error in subset(T23810, date < as.Date("2015-10-22")) : byte code version mismatch
It appears that I only get this warning after trying to run a subset function, but it blocks my whole script at the moment. What could be the cause and solution for this?
EDIT: Reproducible example
x = structure(list(names = structure(c(11L, 3L, 5L, 27L, 26L, 15L,
18L, 13L, 8L, 2L, 22L, 12L, 1L, 25L, 29L, 31L, 6L, 23L, 28L,
14L, 19L, 4L, 10L, 16L, 9L, 17L, 21L, 30L, 7L, 6L, 27L, 26L,
12L, 13L, 14L, 4L, 28L, 15L, 31L, 23L, 1L, 22L, 11L, 18L, 3L,
20L, 8L, 5L, 16L, 2L, 25L, 30L, 21L, 4L, 6L, 3L, 5L, 27L, 14L,
11L, 26L, 31L, 13L, 18L, 15L, 1L, 23L, 2L, 8L, 28L, 30L, 20L,
22L, 12L, 10L, 16L, 21L, 25L, 17L, 24L, 32L, 31L, 23L, 26L, 1L,
18L, 11L, 12L, 3L, 15L, 27L, 28L, 5L, 22L, 6L, 17L, 20L, 2L,
8L, 21L, 30L, 13L, 25L, 24L, 7L, 4L, 10L, 16L, 14L), .Label = c("50/50",
"Babylon", "Big Rock Market", "Core Gut", "Customs House", "David's Dropoff",
"David's Dropoff Deep", "Diamond Rock", "Giles Quarter", "Green Island",
"Greer Gut", "Hole in the Corner", "Hot Springs", "Ladder Labyrinth",
"Man O War", "Mount Michel", "Muck Dive", "Outer Limits", "Poriotes Point",
"Porites Point", "Rays & Anchors", "Shark Shoals", "Tedran",
"Tent Boulders", "Tent Deep", "Tent Reef", "Tent Wall", "Third Encounter",
"Torens Point", "Torrens Point", "Twilight Zone", "Wells Bay"
), class = "factor")), .Names = "names", row.names = c(NA, -109L
), class = "data.frame")
Then if I execute the following:
x[x=="Torens Point"] = "Torrens Point"
x[x=="Poriotes Point"] = "Porites Point"
x = droplevels(subset(x, names != "Muck Dive"))
I get the error:
Error in subset(x, names != "Muck Dive") : byte code version mismatch
Okay solved it and in the end it was pretty easy. Since I am working on a server and rely on versions of R that are installed on that server I didn't realize how to update R itself. Now I got it it seems to work. Thank you all for your help! This one is SOLVED!

Create and output multiple plots from list

I am attempting to create and output as pdfs a list of 64 items. My data takes the form:
QQJAN List of 64
file1: List of 2
..$x: num [1:161] 96.7 96.8 97.5 ...
..$y: num [1:161] 9.3 10.3 17.3 ...
..................................................................
file64: List of 2
..$x: num [1:161] 42.6 59.9 70.4 ...
..$y: num [1:161] 9.3 10.3 17.3 ...
I can do this for any single item in the list using:
plot(QQJAN$file1)
and can then output these files to my working directory as pdfs, but how can all 64 files in the list be plotted and outputted with their names, i.e. file1.pdf, file 2.pdf etc.
Can the lapply function be used here?
A reproducible example:
QQJAN$file1$x=c(1,2,3,4)
QQJAN$file1$y=c(2,4,5,6)
QQJAN$file2$x=c(2,2,3,5)
QQJAN$file2$y=c(2,4,5,6)
Not tested due to lack of a reproducible example:
for (i in seq_along(QQJAN)) {
pdf(sprintf("plot%i.pdf", i)) #or pdf(paste0(names(QQJAN)[i], ".pdf"))
plot(QQJAN[[i]])
dev.off()
}
If you are only interested in side effects, such as plotting, a for loop is usually appropriate. You should use lapply if you need a return value.
We can use lapply to loop over the names of the list elements, create the pdf file by pasteing the individual names with .pdf, subset the list (QQJAN[[x]]), plot.
invisible(lapply(names(QQJAN), function(x) {
pdf(paste0(x, '.pdf'))
plot(QQJAN[[x]])
dev.off()}))
data
QQJAN <- structure(list(file1 = structure(list(x = c(6L, 5L, 15L, 11L,
14L, 19L, 6L, 16L, 17L, 6L, 13L, 8L, 14L, 14L, 7L, 19L, 4L, 1L,
11L, 3L, 2L, 12L, 15L, 3L, 5L, 14L, 2L, 12L, 13L, 1L, 7L, 5L,
8L, 3L, 19L, 5L, 15L, 13L, 14L, 20L), y = c(29L, 23L, 17L, 14L,
3L, 5L, 24L, 22L, 16L, 21L, 28L, 52L, 28L, 43L, 33L, 60L, 28L,
18L, 11L, 9L, 30L, 15L, 17L, 8L, 44L, 19L, 57L, 59L, 45L, 30L,
9L, 13L, 1L, 60L, 39L, 21L, 35L, 50L, 3L, 44L)), .Names = c("x",
"y")), file2 = structure(list(x = c(11L, 3L, 11L, 5L, 8L, 7L,
6L, 18L, 8L, 17L, 7L, 15L, 19L, 3L, 10L, 12L, 13L, 2L, 9L, 10L,
15L, 13L, 3L, 6L, 16L, 1L, 20L, 5L, 9L, 4L, 12L, 1L, 6L, 13L,
18L, 7L, 18L, 19L, 15L, 13L), y = c(56L, 31L, 40L, 43L, 20L,
45L, 55L, 8L, 43L, 26L, 7L, 52L, 7L, 31L, 11L, 14L, 55L, 26L,
4L, 42L, 34L, 44L, 12L, 4L, 30L, 60L, 23L, 44L, 29L, 55L, 6L,
37L, 11L, 14L, 36L, 52L, 28L, 22L, 31L, 33L)), .Names = c("x",
"y"))), .Names = c("file1", "file2"))

More efficient alternative for ifelse loop in R

For creating a new variable based on another variable (class = factor), I made an ifelse loop.
The data:
eu <- data.frame(structure(c(1L, 4L, 5L, 12L, 9L, 13L, 16L, 18L, 27L, 10L, 25L, 21L, 28L, 19L, 8L, 26L, 6L, 3L, 15L, 14L, 11L, 17L, 20L, 23L, 24L, 2L, 22L, 7L), .Label = c("Belgie", "Bulgarije", "Cyprus", "Denemarken", "Duitsland", "Estland", "Europese Unie", "Finland", "Frankrijk", "Griekenland", "Hongarije", "Ierland", "Italie", "Letland", "Litouwen", "Luxemburg", "Malta", "Nederland", "Oostenrijk", "Polen", "Portugal", "Roemenie", "Slovenie", "Slowakije", "Spanje", "Tsjechie", "Verenigd Koninkrijk", "Zweden"), class = "factor"))
names(eu) <- "land"
My ifelse loop:
eu$plicht <- ifelse(eu$land=="Belgie", "ja",
ifelse(eu$land=="Italie","ja",
ifelse(eu$land=="Cyprus","ja",
ifelse(eu$land=="Griekenland","ja",
ifelse(eu$land=="Luxemburg","ja",
ifelse(eu$land=="Spanje","ja","nee"))))))
However, I'm wondering if there a more efficient way of doing this. Any suggestions?
You could do something like:
eu$plicht <- ifelse(eu$land %in% c("Belgie", "Italie", "Cyprus", "Griekenland", "Luxemburg", "Spanje"),
"ja", "nee")

Resources