Behavior of function "identical" with factors - r

R stores factors as integer codes together with a set of levels. As a result, identical() does not recognise two factors as equal when they hold the same values but have different levels.
Here's an MWE:
y <- structure(list(portfolio_date = structure(c(1L, 1L, 1L, 2L, 2L,
2L), .Label = c("2000-10-31", "2001-04-30"), class = "factor"),
security = structure(c(2L, 2L, 1L, 3L, 2L, 4L), .Label = c("Currency Australia (Fwd)",
"Currency Euro (Fwd)", "Currency Japan (Fwd)", "Currency United Kingdom (Fwd)"
), class = "factor")), .Names = c("portfolio_date", "security"
), row.names = c(10414L, 10417L, 10424L, 21770L, 21771L, 21774L
), class = "data.frame")
x <- structure(list(portfolio_date = structure(1L, .Label = "2000-10-31", class = "factor"),
security = structure(1L, .Label = "Currency Euro (Fwd)", class = "factor")),
.Names = c("portfolio_date", "security"), row.names = 10414L, class = "data.frame")
identical(y[1,], x)
This returns FALSE.
But when printed, the two objects look identical to the user:
y[1,]
portfolio_date security
10414 2000-10-31 Currency Euro (Fwd)
x
portfolio_date security
10414 2000-10-31 Currency Euro (Fwd)
Ultimately I want to be able to do something like the following:
apply(y, 1, identical, x)
10414 10417 10424 21770 21771 21774
TRUE TRUE FALSE FALSE FALSE FALSE
which(apply(y, 1, identical, x))
1 2
Any suggestions as to how to achieve this? Thanks.

One option is to use rowwise() from dplyr to check row by row. If you also need to compare the row names, create an id column in both data frames; otherwise the check will return TRUE for the first two rows.
library(dplyr)
x$id <- row.names(x)
y$id <- row.names(y)
rowwise(y) %>% do(check = isTRUE(all.equal(., x, check.attributes = F))) %>% data.frame
check
1 TRUE
2 FALSE
3 FALSE
4 FALSE
5 FALSE
6 FALSE

To perform the comparison, the factors need to be converted to character vectors.
Here is a solution using base R alone:
apply(apply(y, 2, as.character), 1, identical, apply(x, 2, as.character))
The inner apply() calls convert each column of the source and target data frames to character, and the outer apply() loops over the rows of y.
If the x data frame has more than one row, the result may not be what you expect.

Use the package 'compare'.
library(compare)
result <- NULL
for (i in 1:NROW(y)){
one <- compare(y[i,], x, dropLevels=T)
two <- one$detailedResult[1]==T & one$detailedResult[2]==T
result <- c(result, two)
}
as.character(result)#TRUE TRUE FALSE FALSE FALSE FALSE

Solution for data posted in OP
The example posted in the OP can be easily treated by using droplevels().
Let us first look at why the comparison identical(y[1,], x) returns FALSE:
str(y[1,])
#'data.frame': 1 obs. of 2 variables:
#$ portfolio_date: Factor w/ 2 levels "2000-10-31","2001-04-30": 1
#$ security : Factor w/ 4 levels "Currency Australia (Fwd)",..: 2
whereas
str(x)
#'data.frame': 1 obs. of 2 variables:
#$ portfolio_date: Factor w/ 1 level "2000-10-31": 1
#$ security : Factor w/ 1 level "Currency Euro (Fwd)": 1
So the difference lies in the factor levels, even though both objects are displayed in the same way, as shown in the OP's question.
This is where the function droplevels() is useful: it removes unused factor levels. By applying droplevels() to y[1,] with its unused levels, we obtain:
identical(droplevels(y[1,]), x)
#[1] TRUE
If x also contains unused levels, it will be necessary to wrap it in droplevels(), too. In any case, it won't do any harm:
identical(droplevels(y[1,]), droplevels(x))
#[1] TRUE
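To see why unused levels matter, here is a minimal sketch on a single factor. The names f_full and f_drop are made up for illustration; the point is only that identical() compares the integer codes and the levels attribute, not the printed labels.

```r
# Two factors that print the same value but carry different levels
f_full <- factor("b", levels = c("a", "b"))  # "b" is level 2 of 2
f_drop <- factor("b")                        # "b" is level 1 of 1

as.integer(f_full)  # 2
as.integer(f_drop)  # 1

# identical() compares codes and attributes, so it reports a difference
identical(f_full, f_drop)                              # FALSE

# The labels themselves are the same
identical(as.character(f_full), as.character(f_drop))  # TRUE

# Dropping the unused level "a" makes the two objects match
identical(droplevels(f_full), f_drop)                  # TRUE
```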
General solution
Using droplevels() may not work if the real data is more complex than the data posted in the "MWE" of the OP. Such situations may include, e.g., equivalent entries in x and y[1,] that are stored as different factor levels. An example where droplevels() fails is given in the data section at the end of this answer.
The following solution handles such general situations efficiently. It works for the data posted in the OP as well as for the more complicated case of the data posted below.
First, two auxiliary vectors are created that contain only the characters of each row. By using paste() we can concatenate each row to a single character string:
temp_x <- apply(x, 1, paste, collapse=",")
temp_y <- apply(y, 1, paste, collapse=",")
With these vectors it becomes easy to compare rows of the original data.frames, even if the entries were originally stored as factors with different levels and numbering.
To identify which rows are identical, we can use the %in% operator, which is more appropriate than the function identical() in this case, as the former checks for equality of all possible row combinations, and not just individual pairs.
With these simple modifications the desired output can be obtained quickly and without further loops:
setNames(temp_y %in% temp_x, names(temp_y))
#10414 10417 10424 21770 21771 21774
# TRUE TRUE FALSE FALSE FALSE FALSE
which(temp_y %in% temp_x)
#[1] 1 2
y[temp_y %in% temp_x,]
# portfolio_date security
#10414 2000-10-31 Currency Euro (Fwd)
#10417 2000-10-31 Currency Euro (Fwd)
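The same idea can be wrapped in a small helper. This is only a sketch: rows_in() is a hypothetical name, the toy data is made up, and the separator is an assumption chosen because it is unlikely to occur in the data (a comma could in principle collide with values that themselves contain commas). Unlike the apply() version, it builds the keys column-wise and assumes both data frames have the same columns in the same order.

```r
# Hypothetical helper: which rows of y also occur in x, comparing labels only
rows_in <- function(y, x, sep = "\r") {
  make_key <- function(d) {
    # convert every column to character, then paste the columns row-wise
    cols <- unname(lapply(d, as.character))
    do.call(paste, c(cols, sep = sep))
  }
  make_key(y) %in% make_key(x)
}

# Toy data with factors whose levels differ between y and x
y <- data.frame(
  portfolio_date = factor(c("2000-10-31", "2000-10-31", "2001-04-30")),
  security = factor(c("Currency Euro (Fwd)", "Currency Japan (Fwd)",
                      "Currency Euro (Fwd)"))
)
x <- data.frame(
  portfolio_date = factor("2000-10-31"),
  security = factor("Currency Euro (Fwd)")
)

rows_in(y, x)         # TRUE FALSE FALSE
which(rows_in(y, x))  # 1
```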
data
x <- structure(list(portfolio_date = structure(1:2, .Label = c("2000-05-15",
"2000-10-31"), class = "factor"), security = structure(c(2L, 1L),
.Label = c("Currency Euro (Fwd)", "Currency USD (Fwd)"),
class = "factor")), .Names = c("portfolio_date", "security"),
class = "data.frame", row.names = c("10234", "10414"))
y <- structure(list(portfolio_date = structure(c(1L, 1L, 1L, 2L, 2L, 2L),
.Label = c("2000-10-31", "2001-04-30"), class = "factor"),
security = structure(c(2L, 2L, 1L, 3L, 2L, 4L),
.Label = c("Currency Australia (Fwd)", "Currency Euro (Fwd)",
"Currency Japan (Fwd)", "Currency United Kingdom (Fwd)"),
class = "factor")), .Names = c("portfolio_date", "security"),
row.names = c(10414L, 10417L, 10424L, 21770L, 21771L, 21774L),
class = "data.frame")

Related

R not updating field based on criteria

I have a simple DF:
Dev_Func
agn
agn
ttt
ttt
agn
All I am trying to do is replace the value with "PE" if the field contains "agn".
This is the code that I have written:
test = subset(Final.ds,Device_Function == "AGN" | Device_Function ==
"TTT", select = c(Device_Function))
colnames(test) = c("Device_Function")
as.character(test)
test = within(test, Device_Function[Device_Function == 'AGN'] = 'PE')
But I just keep getting this error:
Warning message:
In `[<-.factor`(`*tmp*`, Device_Function == "AGN", value = "PE") :
invalid factor level, NA generated
and all it does is replace all the "AGN" values with NA.
Help please!
You could do this with gsub:
df$Dev_Func <- gsub("agn", "PE", df$Dev_Func)
df
# Dev_Func
#1 PE
#2 PE
#3 ttt
#4 ttt
#5 PE
An alternative that keeps Dev_Func as a factor (as mentioned by akrun):
df$Dev_Func <- as.factor(gsub("agn", "PE", df$Dev_Func))
class(df$Dev_Func)
[1] "factor"
As the column is a factor, we can assign the levels that are 'agn' to 'PE'
levels(DF$Dev_Func)[levels(DF$Dev_Func)=='agn'] <- 'PE'
and keep it as a factor column
levels(DF$Dev_Func)
#[1] "PE" "ttt"
DF
# Dev_Func
#1 PE
#2 PE
#3 ttt
#4 ttt
#5 PE
NOTE: Assuming that 'agn' is a fixed match and not a substring
In the OP's code, i.e. the call to within(), there are some issues:
1) the assignment should use <- instead of =
2) it cannot do a logical subset assignment
3) the column is a factor and doesn't have a level 'PE', which generates the warning message about "invalid factor level, NA generated"
4) According to the example, 'agn' is lower case and not 'AGN' (could be a typo), and R is case-sensitive
Suppose we add 'PE' to the levels
DF$Dev_Func <- factor(DF$Dev_Func, levels = c(levels(DF$Dev_Func), 'PE'))
then the assignment below would work
DF$Dev_Func[DF$Dev_Func=='agn'] <- 'PE'
This is still less clean than changing the levels directly.
data
DF <- structure(list(Dev_Func = structure(c(1L, 1L, 2L, 2L, 1L), .Label = c("agn",
"ttt"), class = "factor")), .Names = "Dev_Func", row.names = c(NA,
-5L), class = "data.frame")
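The NA behaviour described in point 3 can be reproduced on a bare factor. This is a minimal sketch with made-up object names (f, bad, good); the warning is suppressed here only so the snippet runs quietly.

```r
f <- factor(c("agn", "agn", "ttt", "ttt", "agn"))

# Assigning a value that is not among the levels produces NA
# (normally with the warning "invalid factor level, NA generated")
bad <- f
suppressWarnings(bad[bad == "agn"] <- "PE")
is.na(bad)          # TRUE TRUE FALSE FALSE TRUE

# Renaming the level itself keeps every element intact
good <- f
levels(good)[levels(good) == "agn"] <- "PE"
as.character(good)  # "PE" "PE" "ttt" "ttt" "PE"
levels(good)        # "PE" "ttt"
```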

Multiple matching function in r

I am trying to match two datasets using the following variables: School (unique), with classes that need teachers. Some teachers have one specialty, some have more than one. I have been trying to use the match() and which( %in% ) base functions but I cannot get them to search for all the possible teacher matches. They always stop after the first match. Here is some sample data:
class<-c("english","history","art","art","math","history","art")
school<-c("C.H.S.","B.H.S.","D.H.S.","A.H.S.","Z.H.S.","M.H.S.","L.H.S.")
specialty<-c("math","history","English","history","literature","art","English")
teacher<-c("Jill","Jill","Sam","Liz","Liz","Liz","Rob")
teacher.skills<-data.frame(teacher, specialty)
school.needs<-data.frame(school,class)
teacher.match<-data.frame(Jill,Sam,Rob,Liz)
The final result would look like this:
Jill<-c("No","Yes","No","No","Yes","Yes","No")
Sam<-c("Yes","No","No","No","No","No","No")
Liz<-c("No","Yes","Yes","Yes","No","Yes","Yes")
Rob<-c("Yes","No","No","No","No","No","No")
match.result<-data.frame(school.needs, teacher.match)
match.result
I have even tried working on a little function like this but still can't get the final formatting right.
source.1<-school.needs
source.2<-teacher.skills
dist.name<-adist(source.1$class, source.2$specialty, partial = FALSE, ignore.case = TRUE)
min.name<-apply(dist.name, 1, min)
school.teacher.match<-NULL
for(i in 1:nrow(dist.name))
{
skills.ref<-match(min.name[i], dist.name[i,])
school.ref<-i
school.teacher.match<-rbind(data.frame(skills.ref=skills.ref, school.ref=school.ref, Teacher=source.2[skills.ref,]$teacher, Class=source.1[school.ref,]$class, School=source.1[school.ref,]$school, adist=min.name[i]), school.teacher.match)
school.teacher.match<-subset(school.teacher.match, school.teacher.match$adist==0)
}
school.teacher.match
Any help would be much appreciated, thanks!
Note that I had to modify your input data to change "English" to "english" for each match. The data is given by:
school.needs <- structure(list(school = structure(c(3L, 2L, 4L, 1L, 7L, 6L, 5L
), .Label = c("A.H.S.", "B.H.S.", "C.H.S.", "D.H.S.", "L.H.S.",
"M.H.S.", "Z.H.S."), class = "factor"), class = structure(c(2L,
3L, 1L, 1L, 4L, 3L, 1L), .Label = c("art", "english", "history",
"math"), class = "factor")), .Names = c("school", "class"), row.names = c(NA,
-7L), class = "data.frame")
teacher.skills <- structure(list(teacher = structure(c(1L, 1L, 4L, 2L, 2L, 2L,
3L), .Label = c("Jill", "Liz", "Rob", "Sam"), class = "factor"),
specialty = structure(c(5L, 3L, 2L, 3L, 4L, 1L, 2L), .Label = c("art",
"english", "history", "literature", "math"), class = "factor")), .Names = c("teacher",
"specialty"), row.names = c(NA, -7L), class = "data.frame")
Using merge and dcast from reshape2 (or data.table):
library(reshape2)
## use merge to match needs to skills
m <- merge(school.needs,teacher.skills,by.x="class",by.y="specialty")
m$val <- "Yes" ## add a column for the "Yes"
## go to wide format for the final result filling NA with "No"
result <- dcast(m,school+class~teacher,value.var="val",fill="No")
## school class Jill Liz Rob Sam
##1 A.H.S. art No Yes No No
##2 B.H.S. history Yes Yes No No
##3 C.H.S. english No No Yes Yes
##4 D.H.S. art No Yes No No
##5 L.H.S. art No Yes No No
##6 M.H.S. history Yes Yes No No
##7 Z.H.S. math Yes No No No
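If you would rather avoid reshape2, a similar table can be sketched in base R with merge() and table(). This is an alternative to dcast(), not what the answer above uses. The data below is a cut-down, made-up version of the inputs, and it only lists schools that have at least one match.

```r
school.needs <- data.frame(
  school = c("C.H.S.", "B.H.S.", "Z.H.S."),
  class  = c("english", "history", "math"),
  stringsAsFactors = FALSE
)
teacher.skills <- data.frame(
  teacher   = c("Jill", "Jill", "Sam", "Liz"),
  specialty = c("math", "history", "english", "history"),
  stringsAsFactors = FALSE
)

# one row per (school, teacher) pair whose class and specialty agree
m <- merge(school.needs, teacher.skills, by.x = "class", by.y = "specialty")

# cross-tabulate the pairs and turn the counts into Yes/No
hits   <- unclass(table(m$school, m$teacher)) > 0
yes_no <- ifelse(hits, "Yes", "No")
yes_no
```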
Here's how I'd do it:
(data)
schools <- data.frame(
school = c("C.H.S.", "B.H.S.", "D.H.S.", "A.H.S.","Z.H.S.", "M.H.S.", "L.H.S."),
class = c("english", "history", "art", "art", "math", "history", "art"),
stringsAsFactors = FALSE)
teachers <- data.frame(
teacher = c("Jill", "Jill", "Sam", "Liz", "Liz", "Liz", "Rob"),
specialty = c("math", "history", "English", "history", "literature", "art", "English"),
stringsAsFactors = FALSE)
(key concepts)
# you can get the specialties of a given teacher like this:
subset(teachers, teacher == 'Jill')$specialty
# [1] "math" "history"
# you can get the set of unique teachers like this:
unique(teachers$teacher)
# [1] "Jill" "Sam" "Liz" "Rob"
(solution)
# for each teacher, do any of their specialties match the class need of each school?
matches <-
sapply(unique(teachers$teacher), function(this_t) {
specs <- subset(teachers, teacher == this_t)$specialty
schools$class %in% specs
})
# combine with school data.frame
data.frame(schools, matches)
# school class Jill Sam Liz Rob
# 1 C.H.S. english FALSE FALSE FALSE FALSE
# 2 B.H.S. history TRUE FALSE TRUE FALSE
# 3 D.H.S. art FALSE FALSE TRUE FALSE
# 4 A.H.S. art FALSE FALSE TRUE FALSE
# 5 Z.H.S. math TRUE FALSE FALSE FALSE
# 6 M.H.S. history TRUE FALSE TRUE FALSE
# 7 L.H.S. art FALSE FALSE TRUE FALSE
Some notes:
1) It's way easier to read (and think about) when you include appropriate spacing in your code. Also, rather than create a bunch of vectors and then assemble into data.frames, do this in one step -- it's shorter, it helps show how the vectors relate to each other, and it won't clutter your global environment.
2) I'm leaving the match values as FALSE/TRUE, because since this is boolean data, it makes sense to use the appropriate data type. If you really want No/Yes, though, you can change these values into factors with those labels.
3) The results are a little bit different from what you expected because 'English' == 'english' is FALSE. You might want to clean up your starting data. If you know that cases will be mixed and you want case-insensitive matching, you can coerce all values to lowercase before comparing: tolower(schools$class) %in% tolower(specs)
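Notes 2 and 3 can be combined into one sketch. It uses ifelse() to get plain "Yes"/"No" strings rather than factors, which is a swap made here for brevity, and it applies the tolower() comparison from note 3. The object names yes_no and result are made up.

```r
schools <- data.frame(
  school = c("C.H.S.", "B.H.S.", "D.H.S.", "A.H.S.", "Z.H.S.", "M.H.S.", "L.H.S."),
  class  = c("english", "history", "art", "art", "math", "history", "art"),
  stringsAsFactors = FALSE)
teachers <- data.frame(
  teacher   = c("Jill", "Jill", "Sam", "Liz", "Liz", "Liz", "Rob"),
  specialty = c("math", "history", "English", "history", "literature", "art", "English"),
  stringsAsFactors = FALSE)

# case-insensitive version of the matching step
matches <- sapply(unique(teachers$teacher), function(this_t) {
  specs <- teachers$specialty[teachers$teacher == this_t]
  tolower(schools$class) %in% tolower(specs)
})

# ifelse() keeps the matrix shape and column names of 'matches'
yes_no <- ifelse(matches, "Yes", "No")
result <- data.frame(schools, yes_no, stringsAsFactors = FALSE)
result
```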

R - Why does sapply return lists within this dataframe when searching for values in another dataframe?

I have the following two data.frames:
df1
structure(list(thread_id = c(1L, 1L, 2L, 2L, 2L, 2L), course_week = c(1,
1, 1, 1, 1, 1), user_id = c(1237305, 3001241, 1237305, 1237305,
4455134, 4398594), post_id_unique = c("1-NA", "1-post-1", "2-NA",
"2-post-2", "2-post-2", "2-post-2"), to = list(NULL, 1L, NULL,
2L, 2L, 2L)), .Names = c("thread_id", "course_week", "user_id",
"post_id_unique", "to"), row.names = c(NA, 6L), class = "data.frame")
df2
structure(list(thread_id = c(1L, 1L, 2L, 2L, 2L, 2L), course_week = c(1,
1, 1, 1, 1, 1), user_id = c(1237305, 3001241, 1237305, 1237305,
4455134, 4398594), post_id_unique = c("1-post-1", "1-post-1125",
"2-post-2", "2-post-3", "2-post-43", "2-post-54")), .Names = c("thread_id",
"course_week", "user_id", "post_id_unique"), row.names = c(NA,
6L), class = "data.frame")
I am trying to replace df1$to with the value in df2$user_id that matches the $post_id_unique column in both files.
I've made the following code for it:
from <- as.list(df1$post_id_unique)
replace <- function(i){if(grepl("NA",i)!=TRUE) {df2[df2$post_id_unique==i,1]}}
df1$to <- sapply(from, replace)
Which works almost perfectly... except that every value within df1$to is a list rather than a numeric or character vector:
'data.frame': 6 obs. of 5 variables:
$ thread_id : int 1 1 2 2 2 2
$ course_week : num 1 1 1 1 1 1
$ user_id : num 1237305 3001241 1237305 1237305 4455134 ...
$ post_id_unique: chr "1-NA" "1-post-1" "2-NA" "2-post-2" ...
$ to :List of 6
..$ : NULL
..$ : int 1
..$ : NULL
..$ : int 2
..$ : int 2
..$ : int 2
Why is my original code creating lists within the dataframe? How can I unlist them? Or avoid them to start with.
I know this is similar to a merge(), but I am interested in doing it this way for learning and other reasons.
The "problem" is that sometimes your replace() function doesn't return a value (when the i value contains "NA"); in that case the call returns NULL. sapply produces one result per input element, and a NULL cannot be stored in a simple (atomic) vector, so the result cannot be simplified and sapply returns a list. You can fix this by returning NA instead of nothing:
replace <- function(i){if(grepl("NA",i)!=TRUE) {df2[df2$post_id_unique==i,1]} else {NA}}
But really it looks like you are doing a basic left merge operation. The basic syntax would be
merge(df1, df2, by="post_id_unique", all.x=TRUE)
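The list-versus-vector behaviour is easy to reproduce in isolation. In this sketch the lookup table and the function names f_null and f_na are made up; vapply() is shown as a stricter alternative that raises an error instead of silently returning a list.

```r
lookup <- c(a = 1L, b = 2L)
keys   <- c("a", "zzz", "b")

# returns NULL when the key is missing (no else branch)
f_null <- function(k) if (k %in% names(lookup)) lookup[[k]]
# returns NA when the key is missing
f_na   <- function(k) if (k %in% names(lookup)) lookup[[k]] else NA_integer_

res_list <- sapply(keys, f_null)  # cannot be simplified, stays a list
res_vec  <- sapply(keys, f_na)    # simplifies to an integer vector

is.list(res_list)  # TRUE
is.list(res_vec)   # FALSE

# vapply() demands exactly one integer per element
res_strict <- vapply(keys, f_na, integer(1))
```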

Extracting values from R table within grouped values

I have the following table, ordered and grouped by first, second and Name.
myData <- structure(list(first = c(120L, 120L, 126L, 126L, 126L, 132L, 132L), second = c(1.33, 1.33, 0.36, 0.37, 0.34, 0.46, 0.53),
Name = structure(c(5L, 5L, 3L, 3L, 4L, 1L, 2L), .Label = c("Benzene",
"Ethene._trichloro-", "Heptene", "Methylamine", "Pentanone"
), class = "factor"), Area = c(699468L, 153744L, 32913L,
4948619L, 83528L, 536339L, 105598L), Sample = structure(c(3L,
2L, 3L, 3L, 3L, 1L, 1L), .Label = c("PO1:1", "PO2:1", "PO4:1"
), class = "factor")), .Names = c("first", "second", "Name",
"Area", "Sample"), class = "data.frame", row.names = c(NA, -7L))
Within each group I want to extract the area that corresponds to the specific sample. Several groups don't have areas for every sample, so if the sample isn't detected it should return "NA". Ideally, the final output should have a column for each sample.
I have tried the ifelse function to create one column to each sample:
PO1<-ifelse(myData$Sample=="PO1:1",myData$Area, "NA")
However, this doesn't take the group distribution into account. I want to do this, but within each group (a group has equal values in the first, second and Name columns): if Sample is "PO1:1", return Area, else NA.
For the first group:
structure(list(first = c(120L, 120L), second = c(1.33, 1.33),
Name = structure(c(1L, 1L), .Label = "Pentanone", class = "factor"),
Area = c(699468L, 153744L), Sample = structure(c(2L, 1L), .Label = c("PO2:1",
"PO4:1"), class = "factor")), .Names = c("first", "second", "Name",
"Area", "Sample"), class = "data.frame", row.names = c(NA, -2L))
The output should be:
structure(list(PO1.1 = NA, PO2.1 = 153744L, PO3.1 = NA, PO4.1 = 699468L), .Names =c("PO1.1", "PO2.1", "PO3.1", "PO4.1"), class = "data.frame", row.names = c(NA, -1L))
Any suggestion?
As in the example in the question, I am assuming Sample is a factor. If this is not the case, consider making it one.
First, let's clean up the column Sample so that its levels are legal names, or else it might cause errors
levels(myData$Sample) <- make.names(levels(myData$Sample))
## DEFINE THE CUTS##
# Adjust these as necessary
#--------------------------
max.second <- 3 # max & min range of myData$second
min.second <- 0 #
sprd <- 0.15 # with spread for each group
#--------------------------
# we will cut the myData$second according to intervals, cut(myData$second, intervals)
intervals <- seq(min.second, max.second, sprd*2)
# Next, lets create a group column to split our data frame by
myData$group <- paste(myData$first, cut(myData$second, intervals), myData$Name, sep='-')
groups <- split(myData, myData$group)
samples <- levels(myData$Sample) ## I'm assuming not all samples are present in the example. Manually adjusting with: samples <- sort(c(samples, "PO3.1"))
# Apply over each group, then apply over each sample
myOutput <-
t(sapply(groups, function(g) {
#-------------------------------
# NOTE: If it's possible that within a group there is more than one Area per Sample, then we have to somehow allow for this. Hence the "paste(...)"
res <- sapply(samples, function(s) paste0(g$Area[g$Sample==s], collapse=" - ")) # allowing for multiple values
unlist(ifelse(res=="", NA, res))
## If there is (or should be) only one Area per Sample, then remove the two lines above and uncomment the two below:
# res <- sapply(samples, function(s) g$Area[g$Sample==s]) # <~~ This line will work when only one value per sample
# unlist(ifelse(res==0, NA, res))
#-------------------------------
}))
# Cleanup names
rownames(myOutput) <- paste("Group", 1:nrow(myOutput), sep="-") ## or whichever proper group name
# remove dummy column
myData$group <- NULL
Results
myOutput
PO1.1 PO2.1 PO3.1 PO4.1
Group-1 NA "153744" NA "699468"
Group-2 NA NA NA "32913 - 4948619"
Group-3 NA NA NA "83528"
Group-4 "536339" NA NA NA
Group-5 "105598" NA NA NA
You cannot really expect R to intuit that there is a fourth factor level between PO2 and PO4, now can you?
> reshape(inp, direction="wide", idvar=c('first','second','Name'), timevar="Sample")
first second Name Area.PO4:1 Area.PO2:1 Area.PO1:1
1 120 1.3 Pentanone 699468 153744 NA
3 126 0.4 Heptene 32913 NA NA
4 126 0.4 Heptene 4948619 NA NA
5 126 0.3 Methylamine 83528 NA NA
6 132 0.5 Benzene NA NA 536339
7 132 0.5 Ethene._trichloro- NA NA 105598
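If an undetected sample such as PO3:1 should still get its own column, the level has to be declared explicitly. Here is a small sketch on made-up vectors; all_samples is an assumed complete list of sample names, and tapply() is just one way to show that an empty level yields NA.

```r
sample_id <- factor(c("PO4:1", "PO2:1", "PO4:1", "PO1:1"))
levels(sample_id)  # "PO1:1" "PO2:1" "PO4:1" (no PO3:1)

# declare every sample that could occur, including the undetected one
all_samples <- c("PO1:1", "PO2:1", "PO3:1", "PO4:1")
sample_full <- factor(sample_id, levels = all_samples)
table(sample_full)  # PO3:1 now appears with a count of 0

# group-by-sample layout: empty cells come out as NA
area <- c(699468L, 153744L, 32913L, 536339L)
grp  <- c("g1", "g1", "g2", "g3")
wide <- tapply(area, list(grp, sample_full), sum)
wide
```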

R - ordering in boxplot

I am trying to produce a series of box plots in R that is grouped by 2 factors. I've managed to make the plot, but I cannot get the boxes to order in the correct direction.
The data frame I am using looks like this:
Nitrogen Species Treatment
2 G L
3 R M
4 G H
4 B L
2 B M
1 G H
I tried:
boxplot(mydata$Nitrogen~mydata$Species*mydata$Treatment)
This ordered the boxes alphabetically (the first three were the "High" treatments, and within those three they were ordered alphabetically by species name).
I want the box plot ordered Low>Medium>High then within each of those groups G>R>B for the species.
So I tried using a factor in the formula:
f = ordered(interaction(mydata$Treatment, mydata$Species),
levels = c("L.G","L.R","L.B","M.G","M.R","M.B","H.G","H.R","H.B"))
then:
boxplot(mydata$Nitrogen~f)
However, the boxes are still showing up in the same order. The labels are now different, but the boxes have not moved.
I have pulled out each set of data and plotted them all together individually:
lg = mydata[mydata$Treatment=="L" & mydata$Species=="G", "Nitrogen"]
mg = mydata[mydata$Treatment=="M" & mydata$Species=="G", "Nitrogen"]
hg = mydata[mydata$Treatment=="H" & mydata$Species=="G", "Nitrogen"]
etc ..
boxplot(lg, lr, lb, mg, mr, mb, hg, hr, hb)
This gives what I want, but I would prefer to do this in a more elegant way, so I don't have to pull each one out individually for larger data sets.
Loadable data:
mydata <-
structure(list(Nitrogen = c(2L, 3L, 4L, 4L, 2L, 1L), Species = structure(c(2L,
3L, 2L, 1L, 1L, 2L), .Label = c("B", "G", "R"), class = "factor"),
Treatment = structure(c(2L, 3L, 1L, 2L, 3L, 1L), .Label = c("H",
"L", "M"), class = "factor")), .Names = c("Nitrogen", "Species",
"Treatment"), class = "data.frame", row.names = c(NA, -6L))
The following commands will create the ordering you need by rebuilding the Treatment and Species factors, with explicit manual ordering of the levels:
mydata$Treatment = factor(mydata$Treatment,c("L","M","H"))
mydata$Species = factor(mydata$Species,c("G","R","B"))
edit 1 : oops I had set it to HML instead of LMH. fixing.
edit 2 : what factor(X,Y) does:
If you run factor(X,Y) on an existing factor, it uses the ordering of the values in Y to enumerate the values present in the factor X. Here are some examples with your data.
> mydata$Treatment
[1] L M H L M H
Levels: H L M
> as.integer(mydata$Treatment)
[1] 2 3 1 2 3 1
> factor(mydata$Treatment,c("L","M","H"))
[1] L M H L M H <-- not changed
Levels: L M H <-- changed
> as.integer(factor(mydata$Treatment,c("L","M","H")))
[1] 1 2 3 1 2 3 <-- changed
It does NOT change what the factor looks like at first glance, but it does change how the data is stored.
What's important here is that many plot functions will plot the lowest enumeration leftmost, followed by the next, etc.
If you create factors simply using factor(X) then usually the enumeration is based upon the alphabetical order of the factor levels (e.g. "H","L","M"). If your labels have a conventional ordering different from alphabetical (e.g. "H","M","L"), this can make your graphs seem strange.
At first glance, it may seem like the problem is due to the ordering of data in the data frame - i.e. if only we could place all "H" at the top and "L" at the bottom, then it would work. It doesn't. But if you want your labels to appear in the same order as the first occurrence in the data, you can use this form:
mydata$Treatment = factor(mydata$Treatment, unique(mydata$Treatment))
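For the two-factor plot in the question, the box order follows the levels of the interaction of the two factors. The sketch below uses vectors sp and tr (made-up names mirroring Species and Treatment) to check that order without drawing anything. The first factor varies fastest, which is why Species goes first in the formula.

```r
sp <- factor(c("G", "R", "G", "B", "B", "G"), levels = c("G", "R", "B"))
tr <- factor(c("L", "M", "H", "L", "M", "H"), levels = c("L", "M", "H"))

# the level order is what the plot follows, not the order of the data
as.integer(tr)  # 1 2 3 1 2 3

# boxes appear in this order for a formula like Nitrogen ~ Species * Treatment
levels(interaction(sp, tr))
# "G.L" "R.L" "B.L" "G.M" "R.M" "B.M" "G.H" "R.H" "B.H"
```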
This earlier StackOverflow question shows how to reorder a boxplot based on a numerical value; what you need here is probably just a switch from factor to the related type ordered. But it is hard to say as we do not have your data and you didn't provide a reproducible example.
Edit Using the dataset you posted in variable md and relying on the solution I pointed to earlier, we get
R> md$Species <- ordered(md$Species, levels=c("G", "R", "B"))
R> md$Treatment <- ordered(md$Treatment, levels=c("L", "M", "H"))
R> with(md, boxplot(Nitrogen ~ Species * Treatment))
which creates the chart you were looking to create.
This is also equivalent to the other solution presented here.
