Factor order within faceted dotplot using ggplot2 - r

I'm trying to change the plotting order within facets of a faceted dotplot in ggplot2, but I can't get it to work. Here's my melted dataset:
> London.melt
country medal.type count
1 South Korea gold 13
2 Italy gold 8
3 France gold 11
4 Australia gold 7
5 Japan gold 7
6 Germany gold 11
7 Great Britain & N. Ireland gold 29
8 Russian Federation gold 24
9 China gold 38
10 United States gold 46
11 South Korea silver 8
12 Italy silver 9
13 France silver 11
14 Australia silver 16
15 Japan silver 14
16 Germany silver 19
17 Great Britain & N. Ireland silver 17
18 Russian Federation silver 26
19 China silver 27
20 United States silver 29
21 South Korea bronze 7
22 Italy bronze 11
23 France bronze 12
24 Australia bronze 12
25 Japan bronze 17
26 Germany bronze 14
27 Great Britain & N. Ireland bronze 19
28 Russian Federation bronze 32
29 China bronze 23
30 United States bronze 29
and here's my plot command:
qplot(x = count, y = country, data = London.melt, geom = "point", facets = medal.type ~.)
The result I get is as follows:
The facets themselves appear in the order I want in this plot. Within each facet, however, I'd like to sort by count. That is, for each type of medal, I'd like the country that won the greatest number of those medals on top, and so on. The procedure I have used successfully when there are no facets (say we're only looking at gold medals) is to use the reorder function on the factor country, sorting by count, but this doesn't work in the present example.
I'd greatly appreciate any suggestions you might have.

Here's a solution using paste, free scales and some relabeling:
library(ggplot2)
London.melt$medal.type<-factor(London.melt$medal.type, levels = c("gold","silver","bronze"))
# Make every country unique
London.melt$country_l <- with(London.melt, paste(country, medal.type, sep = "_"))
# Reorder the unique country/medal ids by count
q <- qplot(x = count, y = reorder(country_l, count), data = London.melt, geom = "point") + facet_grid(medal.type ~., scales = "free_y")
# Rename the countries using the original names
q + scale_y_discrete("Country", breaks = London.melt$country_l, labels = London.melt$country)
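The key step above is that each country/medal pair gets its own factor level, so reorder() can place it independently. A minimal base R sketch of that step, using a four-row subset of London.melt (the names medals and gold_levels are illustrative only):

```r
# Four rows taken from London.melt
medals <- data.frame(
  country    = c("China", "Italy", "China", "Italy"),
  medal.type = c("gold", "gold", "silver", "silver"),
  count      = c(38, 8, 27, 9),
  stringsAsFactors = FALSE
)
# One id per country/medal pair, so each pair gets its own y position
medals$id <- paste(medals$country, medals$medal.type, sep = "_")
# reorder() sorts the levels by count (ascending); on a discrete y axis
# later levels are drawn higher, so the largest count ends up on top
medals$id <- reorder(factor(medals$id), medals$count)
print(levels(medals$id))
# With scales = "free_y", each facet only shows its own ids, in that order
gold_levels <- levels(droplevels(medals$id[medals$medal.type == "gold"]))
print(gold_levels)
```

Because the ids are unique per facet, the ordering inside the gold panel no longer depends on how many silver or bronze medals a country won.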

This is obviously quite late, and some of what I'm doing may not have been around 6 years ago, but I came across this question while doing a similar task. I'm always reluctant to set tick labels with a vector; it feels safer to use a function that operates on the original labels.
To do that, I'm creating a factor ID column from the country and the medal, joined by a delimiter character that doesn't already appear in either column; in this case, _ works. Then, with forcats::fct_reorder, I can order that column by count. The last few levels of this column are below and should correspond to the country + medal combinations with the highest counts.
library(tidyverse)
London_ordered <- London.melt %>%
mutate(id = paste(country, medal.type, sep = "_") %>%
as_factor() %>%
fct_reorder(count, .fun = min))
levels(London_ordered$id) %>% tail()
#> [1] "Great Britain & N. Ireland_gold" "United States_silver"
#> [3] "United States_bronze" "Russian Federation_bronze"
#> [5] "China_gold" "United States_gold"
Then use this ID as your y-axis. On its own, that would give very long labels that include the medal type. Because the delimiter is unique, you can write an inline function for the y-axis labels that removes the delimiter and the word characters after it, leaving just the countries. Specifying the facets with facet_wrap (instead of qplot's facets argument) then lets you set a free y-scale, so each panel only shows its own IDs.
qplot(x = count, y = id, data = London_ordered, geom = "point") +
scale_y_discrete(labels = function(x) str_remove(x, "_\\w+$")) +
facet_wrap(~ medal.type, scales = "free_y", ncol = 1)
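If you prefer not to depend on stringr, the same label function can be sketched with base R's sub() and the same pattern; this is an assumed equivalent for labels like these, not part of the answer above (strip_medal and ids are illustrative names):

```r
# Base R stand-in for function(x) str_remove(x, "_\\w+$"):
# remove the "_" delimiter and the medal name that follows it
strip_medal <- function(x) sub("_\\w+$", "", x)
ids <- c("Great Britain & N. Ireland_gold", "United States_silver", "China_bronze")
print(strip_medal(ids))
```

In the plot it would be passed the same way, e.g. scale_y_discrete(labels = strip_medal), since ggplot2 scales accept a function that maps breaks to labels.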

This is the best I can do with qplot. Not exactly what you asked for, but closer. Oops, I see you already figured that out.
q <- qplot(x = count, y = reorder(country, count), data = London.melt, geom = "point", facets = medal.type ~.)
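The reason reorder(country, count) can't sort within each facet is that it computes one summary per country (the mean, by default) across all medal types, and that single level order is reused in every panel. A small sketch with the Italy and Japan gold/bronze rows from London.melt:

```r
medals <- data.frame(
  country    = c("Italy", "Japan", "Italy", "Japan"),
  medal.type = c("gold", "gold", "bronze", "bronze"),
  count      = c(8, 7, 11, 17),
  stringsAsFactors = FALSE
)
# reorder() uses mean(count) per country over both medal types
print(tapply(medals$count, medals$country, mean))   # Italy 9.5, Japan 12
global <- reorder(factor(medals$country), medals$count)
# One global order: Japan is placed above Italy in every facet,
# even though Italy won more golds (8 vs 7)
print(levels(global))
```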
Here's a dput version so others can improve:
dput(London.melt)
structure(list(country = structure(c(9L, 6L, 3L, 1L, 7L, 4L,
5L, 8L, 2L, 10L, 9L, 6L, 3L, 1L, 7L, 4L, 5L, 8L, 2L, 10L, 9L,
6L, 3L, 1L, 7L, 4L, 5L, 8L, 2L, 10L), .Label = c("Australia",
"China", "France", "Germany", "Great Britain & N. Ireland", "Italy",
"Japan", "Russian Federation", "South Korea", "United States"
), class = "factor"), medal.type = structure(c(2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("bronze",
"gold", "silver"), class = "factor"), count = c(13L, 8L, 11L,
7L, 7L, 11L, 29L, 24L, 38L, 46L, 8L, 9L, 11L, 16L, 14L, 19L,
17L, 26L, 27L, 29L, 7L, 11L, 12L, 12L, 17L, 14L, 19L, 32L, 23L,
29L)), .Names = c("country", "medal.type", "count"), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
"14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
"25", "26", "27", "28", "29", "30"))

Related

How can I change the color of my statebins map?

I'm trying to change the color of my statebins map. I'm super new to R and still learning a lot, and googling around hasn't really helped.
This is what the sample looks like:
fipst stab state color
1 37 NC North Carolina 2
2 1 AL Alabama 2
3 28 MS Mississippi 2
4 5 AR Arkansas 2
5 47 TN Tennessee 2
6 45 SC South Carolina 1
7 23 ME Maine 2
49 32 NV Nevada 1
50 15 HI Hawaii 2
51 11 DC District of Columbia 2
digitaltax <- structure(list(fipst = c(37L, 1L, 28L, 5L, 47L, 45L, 23L, 32L,15L, 11L), stab = c("NC", "AL", "MS", "AR", "TN", "SC", "ME","NV", "HI", "DC"), state = c("North Carolina", "Alabama", "Mississippi","Arkansas", "Tennessee", "South Carolina", "Maine", "Nevada","Hawaii", "District of Columbia"), color = c(2L, 2L, 2L, 2L,2L, 1L, 2L, 1L, 2L, 2L)), row.names = c(1L, 2L, 3L, 4L, 5L, 6L,7L, 49L, 50L, 51L), class = "data.frame")
mutate(
digitaltax,
share = cut(color, breaks = 3, labels = c("No sales tax", "Exempts digital goods", "Taxes digital goods"))
) %>%
statebins(
value_col = "share", font_size = 2.5,
ggplot2_scale_function = scale_fill_brewer,
name = ""
) +
labs(title = "Which states tax digital products?") + theme_statebins()
This produces a map with a range of blues. How can I change the color? No matter what I've tried and found on google, it always throws this error:
Error in ggplot2_scale_function(...) : could not find function "ggplot2_scale_function"
Any help at all would be super appreciated. Thank you!
One approach is to use named RColorBrewer palettes with the brewer_pal argument:
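library(statebins)
# share must exist as a column before it can be referenced below
digitaltax$share <- cut(digitaltax$color, breaks = 3,
                        labels = c("No sales tax", "Exempts digital goods", "Taxes digital goods"))
statebins(digitaltax,
value_col = "color",
breaks = length(unique(digitaltax$share)),
labels = levels(droplevels(digitaltax$share)),  # labels in ascending bin order
brewer_pal = "Dark2") +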
labs(title = "Which states tax digital products?")
Run this to see all palettes:
library(RColorBrewer)
display.brewer.all()
Using your data and most of your code, I changed DC's color to 3 so that all three categories show up.
library(dplyr)
library(ggplot2)
library(statebins)
# changes DC to be color 3
digitaltax <- structure(list(fipst = c(37L, 1L, 28L, 5L, 47L, 45L, 23L, 32L,15L, 11L), stab = c("NC", "AL", "MS", "AR", "TN", "SC", "ME","NV", "HI", "DC"), state = c("North Carolina", "Alabama", "Mississippi","Arkansas", "Tennessee", "South Carolina", "Maine", "Nevada","Hawaii", "District of Columbia"), color = c(2L, 2L, 2L, 2L,2L, 1L, 2L, 1L, 2L, 3L)), row.names = c(1L, 2L, 3L, 4L, 5L, 6L,7L, 49L, 50L, 51L), class = "data.frame")
mutate(
digitaltax,
share = cut(color, breaks = 3, labels = c("No sales tax", "Exempts digital goods", "Taxes digital goods"))
) %>%
statebins(
value_col = "share", font_size = 2.5,
ggplot2_scale_function = scale_fill_brewer,
name = ""
) +
labs(title = "Which states tax digital products?") +
theme_statebins()
Created on 2020-05-12 by the reprex package (v0.3.0)
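Why changing DC to 3 reveals all three categories: cut(color, breaks = 3) splits the observed range of color into three equal-width bins, so the label a value receives depends on which values are present. A base R sketch with the color column from the question (the variable names are illustrative):

```r
labs3 <- c("No sales tax", "Exempts digital goods", "Taxes digital goods")
orig_color <- c(2, 2, 2, 2, 2, 1, 2, 1, 2, 2)   # color column from the question
edit_color <- c(2, 2, 2, 2, 2, 1, 2, 1, 2, 3)   # same, with DC changed to 3
orig_share <- cut(orig_color, breaks = 3, labels = labs3)
edit_share <- cut(edit_color, breaks = 3, labels = labs3)
print(table(orig_share))   # range 1..2: value 2 lands in the top bin
print(table(edit_share))   # range 1..3: value 2 now lands in the middle bin
# Assumption: codes 1, 2, 3 are meant to map to labs3 in that order.
# factor() maps each code to a fixed label, whatever codes are present.
fixed_share <- factor(orig_color, levels = 1:3, labels = labs3)
print(table(fixed_share))
```

So the DC change does show all categories, but it also moves every state coded 2 from "Taxes digital goods" to "Exempts digital goods". A fixed code-to-label mapping such as factor() avoids that dependence on the data's range.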

Sort and Return Top 5 Rows with Greatest Values

For this dataset, I would like to order Var1 by the corresponding frequency, from largest to smallest, and take the top 5 rows. I've been using the functions rank(), sort(), and order() to no avail.
Var1 Freq
2 Moderate 33
3 Luxury 31
4 Couples 31
5 Families with Children 33
6 Nightlife 23
7 Europe 60
8 Architecture 23
9 Drink 58
10 Northern Europe 27
11 Skiing 29
Ideally, I would like the final output to be:
Var1 Freq
7 Europe 60
9 Drink 58
5 Families with Children 33
2 Moderate 33
3 Luxury 31
When I use the functions above, R returns either a series of numbers that look like gibberish or only the Freq column in ranked order.
Here's a dplyr solution.
df %>% top_n(5, Freq) %>% arrange(-Freq)
This gives you the top 5 scores in order.
# Var1 Freq
# 1 Europe 60
# 2 Drink 58
# 3 Moderate 33
# 4 Families with Children 33
# 5 Luxury 31
# 6 Couples 31
Note that 6 entries are included due to a tie.
If you just want the top 5 regardless of ties, then you can use this:
df %>% arrange(-Freq) %>% filter(row_number() <= 5)
# Var1 Freq
# 1 Europe 60
# 2 Drink 58
# 3 Moderate 33
# 4 Families with Children 33
# 5 Luxury 31
Here is a one-liner. It uses order and head.
head(dat[order(dat$Freq, decreasing = TRUE), ], 5)
# Var1 Freq
#7 Europe 60
#9 Drink 58
#2 Moderate 33
#5 Families with Children 33
#3 Luxury 31
DATA.
dat <-
structure(list(Var1 = structure(c(7L, 6L, 2L, 5L, 8L, 4L, 1L,
3L, 9L, 10L), .Label = c("Architecture", "Couples", "Drink",
"Europe", "Families with Children", "Luxury", "Moderate", "Nightlife",
"Northern Europe", "Skiing"), class = "factor"), Freq = c(33L,
31L, 31L, 33L, 23L, 60L, 23L, 58L, 27L, 29L)), .Names = c("Var1",
"Freq"), class = "data.frame", row.names = c("2", "3", "4", "5",
"6", "7", "8", "9", "10", "11"))
Using data.table.
library(data.table)
DFDT <- as.data.table(dat)
DFDT[order(-Freq)][1:5]
Var1 Freq
1: Europe 60
2: Drink 58
3: Moderate 33
4: Families with Children 33
5: Luxury 31
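Both behaviours from the dplyr answer (exactly five rows, or keeping ties) can also be sketched in base R. This reuses the question's data; rank() with ties.method = "min" gives tied values the same rank, which mirrors how top_n() keeps the tie at 31:

```r
freq <- data.frame(
  Var1 = c("Moderate", "Luxury", "Couples", "Families with Children", "Nightlife",
           "Europe", "Architecture", "Drink", "Northern Europe", "Skiing"),
  Freq = c(33, 31, 31, 33, 23, 60, 23, 58, 27, 29),
  row.names = 2:11,
  stringsAsFactors = FALSE
)
# Exactly five rows: sort by decreasing Freq (ties keep their original order)
top5 <- head(freq[order(-freq$Freq), ], 5)
# Keep ties: every row whose "min" rank is 5 or better
keep <- rank(-freq$Freq, ties.method = "min") <= 5
top5_ties <- freq[keep, ]
top5_ties <- top5_ties[order(-top5_ties$Freq), ]
print(top5)
print(top5_ties)   # six rows, because Luxury and Couples tie at 31
```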

R, change ggplot legend names with scale_linetype_manual

I have a dataframe which looks like this:
> df
Year mpft value type index
1 1996 2 0.033827219 solid 2.1
2 1997 2 0.133278701 solid 2.1
3 1998 2 0.261428650 solid 2.1
4 1999 2 0.394702438 solid 2.1
5 1996 3 0.019079686 solid 3.1
6 1997 3 0.074332942 solid 3.1
7 1998 3 0.149042964 solid 3.1
8 1999 3 0.227812452 solid 3.1
9 1996 4 0.009909126 solid 4.1
10 1997 4 0.026231721 solid 4.1
11 1998 4 0.052912805 solid 4.1
12 1999 4 0.086256016 solid 4.1
13 1996 17 0.017256492 solid 17.1
14 1997 17 0.079446280 solid 17.1
15 1998 17 0.166014538 solid 17.1
16 1999 17 0.316175339 solid 17.1
17 1996 18 0.080072523 solid 18.1
18 1997 18 0.313289644 solid 18.1
19 1998 18 0.629398957 solid 18.1
20 1999 18 1.024946245 solid 18.1
110 1996 2 0.031634282 dashed 2.2
21 1997 2 0.139244701 dashed 2.2
31 1998 2 0.273270126 dashed 2.2
41 1999 2 0.412409808 dashed 2.2
51 1996 3 0.019430502 dashed 3.2
61 1997 3 0.079252516 dashed 3.2
71 1998 3 0.161607337 dashed 3.2
81 1999 3 0.252595611 dashed 3.2
91 1996 4 0.009976637 dashed 4.2
101 1997 4 0.027057403 dashed 4.2
111 1998 4 0.055755671 dashed 4.2
121 1999 4 0.093064641 dashed 4.2
171 1996 18 0.061041422 dashed 18.2
181 1997 18 0.245554619 dashed 18.2
191 1998 18 0.490633135 dashed 18.2
201 1999 18 0.758070060 dashed 18.2
I'm trying to plot the data with the correct legend. My first attempt was:
ggplot(df,aes(x=Year,y=value, colour = factor(mpft),linetype=type)) +
geom_line(aes(group = index), size = 1.4) +
#scale_linetype_manual(name= "Run Type", values = unique(df$type), labels = run.type) +
scale_color_manual(name = "PFT",
values = setNames(mycol[unique(df$mpft)], unique(df$mpft)),
labels = setNames(mynam[unique(df$mpft)], unique(df$mpft)))
Which gives me
I have tried adding a scale_linetype_manual with
ggplot(df,aes(x=Year,y=value, colour = factor(mpft),linetype=type)) +
geom_line(aes(group = index), size = 1.4) +
scale_linetype_manual(name= "Run Type", values = unique(df$type), labels = run.type) +
scale_color_manual(name = "PFT",
values = setNames(mycol[unique(df$mpft)], unique(df$mpft)),
labels = setNames(mynam[unique(df$mpft)], unique(df$mpft)))
with
> run.type
[1] "current" "origED3"
But what I get has the right names in the legend and a different linetype.
What am I missing?
EDIT
The dput of my dataframe is
> dput(df)
structure(list(Year = c(1996, 1997, 1998, 1999, 1996, 1997, 1998,
1999, 1996, 1997, 1998, 1999, 1996, 1997, 1998, 1999, 1996, 1997,
1998, 1999, 1996, 1997, 1998, 1999, 1996, 1997, 1998, 1999, 1996,
1997, 1998, 1999, 1996, 1997, 1998, 1999), mpft = c(2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 17L, 17L, 17L, 17L, 18L,
18L, 18L, 18L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L,
18L, 18L, 18L, 18L), value = c(0.0338272191848643, 0.133278701149992,
0.261428650232716, 0.394702437670559, 0.0190796862689925, 0.0743329421068756,
0.149042964352043, 0.227812451937011, 0.00990912614900737, 0.0262317206863519,
0.0529128049802722, 0.0862560162908444, 0.017256491619149, 0.0794462797803606,
0.166014537897384, 0.31617533869767, 0.0800725232220131, 0.31328964372358,
0.629398957462415, 1.02494624459608, 0.0316342818911836, 0.139244700529005,
0.273270126484303, 0.412409807917143, 0.0194305022713642, 0.0792525159706922,
0.161607337403947, 0.252595610607411, 0.00997663742883768, 0.0270574028188436,
0.0557556714277292, 0.0930646413413941, 0.0610414215913856, 0.245554619318541,
0.490633135315979, 0.758070059865948), type = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L), .Label = c("solid", "dashed"), class = "factor"),
index = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L,
7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 9L, 9L, 9L), .Label = c("2.1",
"3.1", "4.1", "17.1", "18.1", "2.2", "3.2", "4.2", "18.2"
), class = "factor")), .Names = c("Year", "mpft", "value",
"type", "index"), row.names = c("1", "2", "3", "4", "5", "6",
"7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17",
"18", "19", "20", "110", "21", "31", "41", "51", "61", "71",
"81", "91", "101", "111", "121", "171", "181", "191", "201"), class = "data.frame")
EDIT
Not a very elegant solution, but by substituting "dashed" with "22" and then using:
ggplot(df,aes(x=Year,y=value, colour = factor(mpft),linetype=type)) +
geom_line(aes(group = index), size = 1.4) +
scale_linetype_manual(name= "Run Type", values = unique(as.character(df$type)), labels = run.type) +
scale_color_manual(name = "PFT",
values = setNames(mycol[unique(df$mpft)], unique(df$mpft)),
labels = setNames(mynam[unique(df$mpft)], unique(df$mpft)))
I am able to get it right
Interesting question, and one I haven't come across on SO before.
Short answer
The difference is because the linetype in your first version isn't "dashed" at all.
Long answer
In your first version, without specifying anything for the linetype aesthetic, ggplot defaults to scale_linetype_discrete(), and the current code for that is:
scale_linetype <- function(..., na.value = "blank") {
discrete_scale("linetype", "linetype_d", linetype_pal(),
na.value = na.value, ...)
}
Hence, scale_linetype_discrete() gets its linetype values from the linetype_pal function, courtesy of the scales package (at least, that's the only place I found it):
> scales::linetype_pal()(2)
[1] "solid" "22"
When you specified the linetype aesthetic mapping in the second version, using scale_linetype_manual(), the corresponding current code is:
scale_linetype_manual <- function(..., values) {
manual_scale("linetype", values, ...)
}
Thus, when you explicitly ask for c("solid", "dashed") as the two linetype values in your plot, ggplot uses them. When you don't, the default values are c("solid", "22"), and "22" (2 units drawn, 2 skipped) is a more tightly spaced pattern than "dashed", which corresponds to "44".
Demonstration below, using built-in data:
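library(ggplot2)   # ggplot() and the diamonds data
library(dplyr)     # %>%, filter(), group_by(), summarise()
df.sample <- diamonds %>%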
filter(cut %in% c("Fair", "Good")) %>%
group_by(cut, clarity) %>%
summarise(price = mean(price / carat)) %>%
ungroup()
p <- ggplot(df.sample,
aes(x = clarity, y = price, group = cut,
linetype = cut)) +
geom_line(size = 1) +
guides(linetype = guide_legend(keywidth = 3, keyheight = 1)) +
theme(legend.position = c(1, 0), legend.justification = c(1, 0))
library(gridExtra)
grid.arrange(p +
labs(title = "Default scale",
subtitle = c("values = linetype_pal()(2)")),
p + scale_linetype_manual(values = c("solid", "dashed")) +
labs(title = "Manual scale",
subtitle = "values = c('solid', 'dashed')"),
p + scale_linetype_manual(values = c("solid", "22")) +
labs(title = "Manual scale",
subtitle = "values = c('solid', '22')"),
nrow = 1)
The third plot mimics the behaviour of the default scale.
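To make the "22" versus "dashed" difference concrete: a linetype string of hex digits lists the relative lengths of segments that are alternately drawn and skipped. The helper below is hypothetical (it is not part of ggplot2, grid or scales) and only decodes such a string for illustration:

```r
# Hypothetical helper: decode a hex-digit linetype string into
# alternating drawn/skipped segment lengths (relative units)
lty_segments <- function(lty) {
  lens <- strtoi(strsplit(lty, "")[[1]], base = 16L)
  data.frame(drawn = lens[c(TRUE, FALSE)], skipped = lens[c(FALSE, TRUE)])
}
print(lty_segments("22"))   # second default value of scales::linetype_pal()
print(lty_segments("44"))   # the pattern R uses for "dashed"
```

Both patterns have the same on/off ratio, which is why they look alike in a narrow legend key; "22" simply repeats twice as often.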
You're just not able to see the linetype difference; if you make the legend keys wider, it becomes visible:
ggplot(df,aes(x=Year,y=value, colour = factor(mpft),linetype=type)) +
geom_line(aes(group = index), size = 1.4) +
scale_linetype_manual(name= "Run Type", values = unique(df$type), labels = run.type) +
guides(linetype = guide_legend(keywidth = 3, keyheight = 1))

r if statement returns number of level rather than the level text

I have a table like the following image and I'm trying to use a simple if statement to return the country name only in cases where food is "Oranges". The 3rd column is the desired outcome, the 4th column is what I get in R.
In excel the formula would be:
=IF(A2="Oranges",B2,"n/a")
I have used the following r code to generate the "oranges_country" variable:
table$oranges_country <- ifelse (Food == "Oranges", Country , "n/a")
[As per the image above] The code returns the number of the level (e.g. 6) in the levels list for 'Country' rather than the country itself (e.g. "Spain"). I understand where this is coming from (the position in the levels list, as shown below), but it's a pain, particularly when using several nested if statements.
levels(Country)
[1] "California" "Ecuador" "France" "New Zealand" "Peru" "Spain" "UK"
There must be a simple way to change this?
As requested in a comment: dput(table) output as follows:
dput(table)
structure(list(Food = structure(c(1L, 1L, 3L, 1L, 1L, 3L, 3L,
2L, 2L), .Label = c("Apples", "Bananas", "Oranges"), class = "factor"),
Country = structure(c(3L, 7L, 6L, 4L, 7L, 6L, 1L, 5L, 2L), .Label = c("California",
"Ecuador", "France", "New Zealand", "Peru", "Spain", "UK"
), class = "factor"), Desired_If.Outcome = structure(c(2L,
2L, 3L, 2L, 2L, 3L, 1L, 2L, 2L), .Label = c("California",
"n/a", "Spain"), class = "factor"), oranges_country = c("n/a",
"n/a", "6", "n/a", "n/a", "6", "1", "n/a", "n/a"), desiredcolumn = c(NA,
NA, 6L, NA, NA, 6L, 1L, NA, NA)), .Names = c("Food", "Country",
"Desired_If.Outcome", "oranges_country", "desiredcolumn"), row.names = c(NA,
-9L), class = "data.frame")
Try ifelse (it is vectorized, so no loop is needed). First, convert table$Country to character:
table$Country <- as.character(table$Country)
table$desiredcolumn<-ifelse(table$Food == "Oranges", table$Country, NA)
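Why the conversion matters: a factor is stored as integer codes plus a table of levels, and ifelse() copies the codes rather than the labels. A self-contained sketch with the Food and Country values from the dput (as plain vectors named food and country):

```r
food <- factor(c("Apples", "Apples", "Oranges", "Apples", "Apples",
                 "Oranges", "Oranges", "Bananas", "Bananas"))
country <- factor(c("France", "UK", "Spain", "New Zealand", "UK",
                    "Spain", "California", "Peru", "Ecuador"))
print(levels(country))
print(as.integer(country[3]))   # Spain is level 6, hence the "6" in the question
# Converting to character inside the call returns the labels instead
oranges_country <- ifelse(food == "Oranges", as.character(country), "n/a")
print(oranges_country)
```

Converting inside the ifelse() call leaves the original column untouched, which can be handy when several nested ifelse() calls reuse the same factor.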
Here is my version:
Food<-c("Ap","Ap","Or","Ap","Ap","Or","Or","Ba","Ba")
Country<-c("Fra","UK","Sp","Nz","UK","Sp","Cal","Per","Eq")
Table<-cbind(Food,Country)
Table<-data.frame(Table)
Table$Country<-as.character(Table$Country)
Table$DC<-ifelse(Table$Food=="Or", Table$Country, NA)
Table
Food Country DC
1 Ap Fra <NA>
2 Ap UK <NA>
3 Or Sp Sp
4 Ap Nz <NA>
5 Ap UK <NA>
6 Or Sp Sp
7 Or Cal Cal
8 Ba Per <NA>
9 Ba Eq <NA>
Try this (if your table is called table):
table[table$Food == "Oranges", ]

subsetting in a for loop

My dataset has 34,000 rows and 353 columns. One column is location and it has 11,000 unique values. I want to subset the dataset within a for loop. I can do this by creating a new data frame for each subset, but I want the subsets to form a single data frame. I have included a sample dataset below
structure(list(X = structure(c(1L, 1L, 1L, 1L, 3L, 3L, 3L, 2L,
3L), .Label = c("Car", "DOG", "House"), class = "factor"), Y = c(20L,
20L, 20L, 20L, 410L, 410L, 410L, 410L, 60L), Z = structure(c(1L,
3L, 8L, 1L, 7L, 5L, 2L, 4L, 6L), .Label = c("ARGENTINA", "BERLIN GERMANY",
"BUENOS AIRES ARGENTINA", "DUBLIN IRELAND", "FROM AUSTRIA", "GERMANY",
"IN TRANSIT FROM GERMANY", "RIVER PLATE ARGENTINA"), class = "factor"),
K = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "A", class = "factor")),
.Names = c("X", "Y", "Z", "K"), class = "data.frame", row.names = c(NA, -9L))
I can use the following code to create new data frames
l=c("ARGENTINA","IRELAND")
for(i in l){
assign(paste("newdata",i,sep=""),
subset(TESTL[which(grepl(i,TESTL$Z)&
!grepl("IN TRANSIT",TESTL$Z)&!grepl("FROM",TESTL$Z)),],
select=c("X","Y","Z")))}
However, I want to create a single new data frame to hold all the subsets. I have tried the following code:
d<-data.frame()
for(i in l){d<-rbind(d,c(
subset(TESTL[which(grepl(i,TESTL$Z) & !grepl("IN TRANSIT",TESTL$Z)
& !grepl("FROM",TESTL$Z)),],
select=c("X","Y","Z")))}
I get the following warnings:
Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = "DOG") :
invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, ri, value = "DUBLIN IRELAND") :
invalid factor level, NA generated
I have tried converting the factors to characters, without success. Any help appreciated.
I think you are making your life rather difficult by using assign here and trying to store the subsets in separate data frames. Try something more like this:
# dat is your data frame (TESTL in the question)
l <- c("ARGENTINA","IRELAND")
res <- setNames(vector("list",length(l)),l)
for (i in seq_along(l)){
res[[i]] <- dat[grepl(l[i],dat$Z) & !grepl("IN TRANSIT",dat$Z) & !grepl("FROM",dat$Z),c("X","Y","Z")]
}
> res
$ARGENTINA
X Y Z
1 Car 20 ARGENTINA
2 Car 20 BUENOS AIRES ARGENTINA
3 Car 20 RIVER PLATE ARGENTINA
4 Car 20 ARGENTINA
$IRELAND
X Y Z
8 DOG 410 DUBLIN IRELAND
> do.call("rbind",res)
X Y Z
ARGENTINA.1 Car 20 ARGENTINA
ARGENTINA.2 Car 20 BUENOS AIRES ARGENTINA
ARGENTINA.3 Car 20 RIVER PLATE ARGENTINA
ARGENTINA.4 Car 20 ARGENTINA
IRELAND DOG 410 DUBLIN IRELAND
The warnings occur because the first iteration of the loop (ARGENTINA) creates X and Z as factor columns, and the second iteration (IRELAND) tries to add values such as "DOG" and "DUBLIN IRELAND" that are not among those factor levels. So:
First, convert the factor columns of TESTL to character:
for (i in names(TESTL) [grep ("factor", sapply (TESTL, class))]) {
TESTL[[i]] <- as.character (TESTL[[i]])
}
Then the following code works:
d <- data.frame(stringsAsFactors = FALSE)
for(i in l){d <- rbind(d,
TESTL [grepl(i,TESTL$Z) & !grepl("FROM|IN TRANSIT", TESTL$Z), c("X", "Y", "Z")])}
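With 11,000 locations, growing d with rbind() on every iteration copies the whole data frame each time. A common alternative, sketched here in base R on the question's sample data (the K column is omitted because it isn't selected), is to build the subsets with lapply() and bind them once:

```r
TESTL <- data.frame(
  X = factor(c("Car", "Car", "Car", "Car", "House", "House", "House", "DOG", "House")),
  Y = c(20L, 20L, 20L, 20L, 410L, 410L, 410L, 410L, 60L),
  Z = factor(c("ARGENTINA", "BUENOS AIRES ARGENTINA", "RIVER PLATE ARGENTINA",
               "ARGENTINA", "IN TRANSIT FROM GERMANY", "FROM AUSTRIA",
               "BERLIN GERMANY", "DUBLIN IRELAND", "GERMANY"))
)
l <- c("ARGENTINA", "IRELAND")
z <- as.character(TESTL$Z)
# Rows to drop for every location, computed once outside the loop
excluded <- grepl("IN TRANSIT|FROM", z)
# One subset per location; all share the same factor levels, so rbind is safe
pieces <- lapply(l, function(loc) TESTL[grepl(loc, z) & !excluded, c("X", "Y", "Z")])
combined <- do.call(rbind, pieces)
print(combined)
```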
