Missing Tree Tips When Plotting Phenogram in R

When I generate a phenogram using the phytools package, the tips and tip labels of the tree are not displayed. Does anyone have any ideas on how to fix this, or another way of plotting a phenogram with nodes and tips placed on a y axis at the value of the trait in question?
Here's what I have:
midpointData <-
structure(list(Species = structure(1:6, .Label = c("Icterus_croconotus",
"Icterus_graceannae", "Icterus_icterus", "Icterus_jamacaii",
"Icterus_mesomelas", "Icterus_pectoralis"), class = "factor"),
bio_1nam = c(243L, 193L, 225L, 209L, 189L, 180L), bio_12nam = c(5127.5,
751.5, 1373, 914.5, 4043.5, 2623.5), bio_16nam = c(1470.5,
442, 656.5, 542, 1392.5, 1074), bio_17nam = c(1094.5, 51.5,
135, 189.5, 768.5, 377.5), bio_2nam = c(97.5, 91.5, 83, 82.5,
81, 102), bio_5nam = c(314, 265.5, 311, 274, 282, 281), bio_6nam = c(167.5,
132.5, 175.5, 154.5, 128, 114)), .Names = c("Species", "bio_1nam",
"bio_12nam", "bio_16nam", "bio_17nam", "bio_2nam", "bio_5nam",
"bio_6nam"), class = "data.frame", row.names = c(NA, -6L))
prunedTargetTree <-
structure(list(edge = structure(c(7L, 7L, 8L, 9L, 9L, 8L, 10L,
11L, 11L, 10L, 1L, 8L, 9L, 2L, 3L, 10L, 11L, 4L, 5L, 6L), .Dim = c(10L,
2L)), Nnode = 5L, tip.label = c("Icterus_mesomelas", "Icterus_pectoralis",
"Icterus_graceannae", "Icterus_croconotus", "Icterus_icterus",
"Icterus_jamacaii"), edge.length = c(0.152443952069696, 0.014866140819964,
0.0311847312922788, 0.106393079957453, 0.106393079957453, 0.0727572150872864,
0.0130293222294024, 0.0517912739330428, 0.0517912739330428, 0.0648205961624452
)), .Names = c("edge", "Nnode", "tip.label", "edge.length"), class = "phylo", order = "cladewise")
library(phytools)
reconBio1 <- ace(midpointData$bio_1nam, prunedTargetTree, type = "continuous", method = "ML")
bio1final <- c(reconBio1$ace, midpointData$bio_1nam)
names(bio1final) <- c(7,8,9,10,11,4,3,5,6,1,2)
plot.new()
phenogram(prunedTargetTree, bio1final, ylim = c(min(bio1final), max(bio1final)))
Here's what the tree looks like:

I have solved the problem, but wanted to share the solution in case others run into the same issue. phenogram() looks for names in the argument x (aka bio1final) that match prunedTargetTree$tip.label, not the numeric index of the tip. Instead of:
bio1final <- c(reconBio1$ace, midpointData$bio_1nam)
names(bio1final) <- c(7,8,9,10,11,4,3,5,6,1,2)
it should read:
bio1final <- c(reconBio1$ace, midpointData$bio_1nam)
names(bio1final) <- c(7,8,9,10,11,as.character(midpointData$Species))
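Note that as.character() is important: midpointData$Species is a factor, and combining it with the numeric node labels in c() keeps only the factor's integer codes (1, 2, ...), so the names still won't match the tip labels and the tips of the tree still won't plot.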
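A base R sketch of both points, using the species names and bio_1nam values from midpointData above. It also reorders the tip values to the tree's tip.label order: the rows of midpointData are alphabetical while the tip labels are not, and as far as I know ape's ace() only matches values to tips by name when x is named (check ?ace for your version). The node values here are a hypothetical NA placeholder standing in for reconBio1$ace, not real estimates.

```r
# Values copied from midpointData and prunedTargetTree in the question
species <- factor(c("Icterus_croconotus", "Icterus_graceannae", "Icterus_icterus",
                    "Icterus_jamacaii", "Icterus_mesomelas", "Icterus_pectoralis"))
bio_1nam <- c(243, 193, 225, 209, 189, 180)
tip_labels <- c("Icterus_mesomelas", "Icterus_pectoralis", "Icterus_graceannae",
                "Icterus_croconotus", "Icterus_icterus", "Icterus_jamacaii")

# Pitfall: c() with a numeric first argument reduces the factor to its integer codes
bad_names <- c(7, 8, 9, 10, 11, species)

# Name the tip values by species, then reorder them to match the tree's tip labels
tip_values <- setNames(bio_1nam, as.character(species))[tip_labels]

# Hypothetical placeholder for reconBio1$ace (internal nodes 7-11)
node_values <- setNames(rep(NA_real_, 5), 7:11)

# Named vector in the form phenogram() expects: node numbers, then tip labels
bio1final <- c(node_values, tip_values)
print(names(bio1final))
```

Passing tip_values (rather than the unnamed midpointData$bio_1nam column) to ace() should keep each value attached to the right species.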

Related

What is the best way to use agricolae to do ANOVAs on a split plot design?

I'm trying to run some ANOVAs on data from a split plot experiment, ideally using the agricolae package. It's been a while since I've taken a stats class and I wanted to be sure I'm analyzing this data correctly, so I did some searching online and couldn't really find consistency in the way people were analyzing their split plot experiments. What is the best way for me to do this?
Here's the head of my data:
dput(head(rawData))
structure(list(Plot = 2111:2116, Variety = structure(c(5L,
4L, 3L, 6L, 1L, 2L), .Label = c("Burbank", "Hodag", "Lamoka",
"Norkotah", "Silverton", "Snowden"), class = "factor"), Rate = c(4L,
4L, 4L, 4L, 4L, 4L), Rep = c(1L, 1L, 1L, 1L, 1L, 1L), totalTubers = c(594L,
605L, 656L, 729L, 694L, 548L), totalOzNoCulls = c(2544.18, 2382.07,
2140.69, 2401.56, 2440.56, 2503.5), totalCWTacNoCulls = c(461.76867,
432.345705, 388.535235, 435.88314, 442.96164, 454.38525), avgLWratio = c(1.260615419,
1.287949374, 1.111981583, 1.08647584, 1.350686661, 1.107173509
), Hollow = c(14L, 15L, 22L, 25L, 14L, 13L), Double = c(10L,
13L, 15L, 22L, 11L, 9L), Knob = c(86L, 80L, 139L, 156L, 77L,
126L), Researcher = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Wang", class = "factor"),
CullsPounds = c(1.75, 1.15, 4.7, 1.85, 0.8, 5.55), CullsOz = c(28,
18.4, 75.2, 29.6, 12.8, 88.8), totalOz = c(2572.18, 2400.47,
2215.89, 2431.16, 2453.36, 2592.3), totalCWTacCulls = c(466.85067,
435.685305, 402.184035, 441.25554, 445.28484, 470.50245)), row.names = c(NA,
6L), class = "data.frame")
For these data, the whole plot is Rate, the split plot is Variety, the block is Rep, and for discussion's sake here, we can look at totalCWTacNoCulls as the response.
Any help would be very much appreciated! I am still getting the hang of Stack Overflow, so if I have made any mistakes or shared my data wrong, please let me know and I'll change it. Thank you!
You can do this with the agricolae package as follows:
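library(agricolae)
# Convert the design variables to factors inside the data frame (avoids attach())
rawData <- within(rawData, {
Rate <- factor(Rate)
Variety <- factor(Variety)
Rep <- factor(Rep)
})
with(rawData, sp.plot(Rep, Rate, Variety, totalCWTacNoCulls))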
According to the agricolae documentation, the usage is
sp.plot(block, pplot, splot, Y)
where block is the replications (blocks), pplot is the main-plot factor, splot is the sub-plot factor, and Y is the response variable.
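As a cross-check without agricolae, a split plot in randomized blocks is commonly fitted in base R with aov() and an Error() term that puts the whole-plot factor inside blocks. A sketch, assuming a standard split-plot layout; it uses the oats data from MASS (shipped with R) because the posted head() contains only one Rate and one Rep. In oats, V (variety) is the whole-plot factor and N (nitrogen) the subplot factor. The whole-plot F test should be comparable to sp.plot's, but I have not compared the two outputs here.

```r
library(MASS)  # recommended package shipped with R; provides the oats data

# oats: B = block, V = variety (whole plot), N = nitrogen (subplot), Y = yield
fit <- aov(Y ~ N * V + Error(B / V), data = oats)
summary(fit)  # one ANOVA table per error stratum

# Same pattern for the question (whole plot = Rate, subplot = Variety, block = Rep),
# after converting Rate, Variety and Rep to factors:
# aov(totalCWTacNoCulls ~ Rate * Variety + Error(Rep / Rate), data = rawData)
```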

Display error while modifying the hoverinfo of the choropleth maps using ggplotly and choroplethrZip package in R

I tried to modify the hover info using plotly's style() function, but every hover point ends up displaying the information for the entire column.
I tried style() with "\n", "<br>", "<br />" and the tooltip argument. The problem does not occur with ordinary ggplot objects converted to plotly objects.
library(tidyverse)
library(plotly)
library(choroplethr)
library(choroplethrZip)
trips$region<- as.character(trips$zip)
trips$value<- as.numeric(trips$AVG_COST_PER_MILE)
title <- paste("Average Cost Per Mile ")
choropleths <- zip_choropleth(trips, title = title, state_zoom = "delaware")
mytext <- paste("Cost per Mile =", trips$AVG_COST_PER_MILE, "<br> Trip Count = ", trips$Trip_count, "<br> Cityname", trips$CITY)
p<- ggplotly(choropleths)
style( p, text=mytext, hoverinfo = "text")
Sample Data:
trips <- structure(list(TIME = c(10L, 10L, 10L, 10L, 10L, 10L,
10L,10L, 10L, 10L), zip = c(19700L, 19711L, 19720L, 19730L, 19731L,
19732L,19735L, 19801L, 19814L, 19901L), AVG_COST_PER_MILE = c(3.33,
2.63, 2.05, 2.85, 2.98, 5.32, 3.37, 2.57, 3.5, 1.95), Trip_count = c(2L,
1L, 7L, 5L, 3L, 1L, 10L, 0L, 1L, 0L), CITY = structure(c(3L,
6L, 5L, 1L, 8L, 9L, 2L, 10L, 7L, 4L), .Label = c(" Odessa", " Winterthur", "Delaware City", "Dover", "New Castle", "Newark",
"Newport", "Port Penn", "Rockland", "Wilmington"), class = "factor"),
latitude = c(39.57032, 39.70056, 39.66922, 39.45648, 39.51816, 39.7945,
39.7944, 39.73856, 39.71363, 39.16426), longitude = c(-75.59066,
-75.7431, -75.59003, -75.65976, -75.57656, -75.57433, -75.5976,
-75.54833, -75.59628, -75.51163)), class = "data.frame", row.names =
c(NA, -10L))
I want each hover point to display the information that belongs to that point (its zip code).
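A guess at the cause, not verified: if ggplotly() draws each zip polygon as its own trace, style(p, text = mytext) hands the whole mytext vector to every trace, which would match the symptom. A base R sketch of a first step is to build one hover string per zip code and key it by region, so each trace can later be given only its own label. It uses the first three rows of the sample data above; how to map the lookup onto the traces (for example through style()'s traces argument) depends on how ggplotly() orders them, which I have not checked.

```r
# First three rows of the sample data (zip, cost, trip count, city)
trips <- data.frame(zip = c(19700L, 19711L, 19720L),
                    AVG_COST_PER_MILE = c(3.33, 2.63, 2.05),
                    Trip_count = c(2L, 1L, 7L),
                    CITY = c("Delaware City", "Newark", "New Castle"),
                    stringsAsFactors = FALSE)
trips$region <- as.character(trips$zip)

# One hover string per zip code, with "<br>" kept inside each string literal
hover_text <- paste0("Cost per Mile = ", trips$AVG_COST_PER_MILE,
                     "<br>Trip Count = ", trips$Trip_count,
                     "<br>City: ", trips$CITY)

# Named lookup: region (zip code) -> its own hover label
hover_by_region <- setNames(hover_text, trips$region)
print(hover_by_region)
```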

Plotting multiple lines in R

I'm pretty new to R, so I don't really know what I'm doing. Anyway, I have data in this format in Excel (as a CSV file):
dt <- data.frame(species = rep(c("a", "b", "c"), each = 4),
cover = rep(1:3, times = 4),
depth = rep(c(15, 30, 60, 90), times = 3),
stringsAsFactors = FALSE)
I want to plot a graph of cover against depth, with a different coloured line for each species, and a key for which species is which colour. I don't even know where to start.
Sorry if something similar has been asked before. Any help would be much appreciated!
I don't know if this is a helpful format, but here's some of the actual data (I think I need to read more about dput):
structure(list(species = structure(c(1L, 1L, 2L, 2L, 3L, 3L,
4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 9L, 9L, 10L, 10L, 11L,
11L), .Label = c("Agaricia fragilis", "bryozoan", "Dichocoenia stokesi",
"Diploria labyrinthiformis", "Diploria strigosa", "Madracis decactis",
"Manicina", "Montastrea cavernosa", "Orbicella franksi", "Porites asteroides",
"Siderastrea radians"), class = "factor"), cover = c(0.021212121,
0.04047619, 0, 0, 0, 0, 1.266666667, 4.269047619, 3.587878788,
3.25, 0.118181818, 0.152380952, 0, 0.007142857, 3.806060606,
2.983333333, 14.13030303, 15.76190476, 0.415151515, 0.2, 0.26969697,
0.135714286), depth = c(30L, 15L, 30L, 15L, 30L, 15L, 30L, 15L,
30L, 15L, 30L, 15L, 30L, 15L, 30L, 15L, 30L, 15L, 30L, 15L, 30L,
15L)), .Names = c("species", "cover", "depth"), row.names = c(NA,
22L), class = "data.frame")
Here is a solution using the ggplot2 package.
# Load packages
library(ggplot2)
# Create example data frame based on the original example the OP provided
dt <- data.frame(species = rep(c("a", "b", "c"), each = 4),
cover = rep(1:3, times = 4),
depth = rep(c(15, 30, 60, 90), times = 3),
stringsAsFactors = FALSE)
# Plot the data
ggplot(dt, aes(x = depth, y = cover, group = species, colour = species)) +
geom_line()
This should get you going!
df1 <- read.csv("//file_location.csv", header = TRUE)
library(dplyr)
df1 <- df1 %>% group_by(species, depth) %>%
summarise(cover = mean(cover))
library(ggplot2)
ggplot(df1, aes(x = depth, y = cover, group = species, colour = species)) +
geom_line()
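If you would rather not install extra packages, base R can draw the same kind of plot. A sketch using the example data frame from the question: tapply() reshapes the data into one column per species and one row per depth (averaging cover if a species/depth pair repeats), and matplot() draws one coloured line per column, with legend() adding the key.

```r
dt <- data.frame(species = rep(c("a", "b", "c"), each = 4),
                 cover = rep(1:3, times = 4),
                 depth = rep(c(15, 30, 60, 90), times = 3),
                 stringsAsFactors = FALSE)

# Rows = depths, columns = species; cells = mean cover
cover_mat <- tapply(dt$cover, list(dt$depth, dt$species), mean)
depths <- as.numeric(rownames(cover_mat))
cols <- seq_len(ncol(cover_mat))

# One line per species, coloured by column, plus a key
matplot(depths, cover_mat, type = "l", lty = 1, col = cols,
        xlab = "depth", ylab = "cover")
legend("topright", legend = colnames(cover_mat), col = cols, lty = 1)
```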

Filter out parent row if child row present

I've got an interesting filter problem. For each TEI, I need to check whether it exists in any CHILDREN_LIST, and if it does, delete the parent row whose CHILDREN_LIST contains it.
For example: TEI 611100 exists in the CHILDREN_LIST for TEI 611000 so I need to delete the 611000 row.
Here is the dput() for the table. Thanks!
structure(list(TEI = c(611000L, 611100L, 238000L, 452000L, 561000L,
621000L, 622000L, 622100L, 623000L, 722000L, 722500L, 722510L
), OWNERSHIP = c(30L, 30L, 50L, 50L, 50L, 50L, 50L, 50L, 50L,
50L, 50L, 50L), RESULT = c(266.9, 259.5, 138, 103.3, 105.8, 130,
230, 214.1, 171.9, 204, 185.2, 185.2), CODE = c(3L, 4L, 3L, 3L,
3L, 3L, 3L, 4L, 3L, 3L, 4L, 5L), CHILDREN_LIST = structure(c(4L,
NA, 1L, 2L, 3L, 5L, 6L, NA, 7L, 8L, 9L, 10L), .Label = c("238100 238200 238300 238900",
"452100 452900", "561100 561200 561300 561400 561500 561600 561700 561900",
"611100 611200", "621100 621200 621300 621400 621500 621600 621900",
"622100 622200 622300", "623100 623200 623300 623900", "722300 722400 722500",
"722510", "722511 722513 722514 722515"), class = "factor"),
ESTIMATE_TYPE = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, TRUE, TRUE), NAICS_LABEL = c(611, 6111,
238, 452, 561, 621, 622, 6221, 623, 722, 7225, 72251), NAICS_TITLE = structure(c(3L,
4L, 11L, 7L, 1L, 2L, 8L, 6L, 9L, 5L, 10L, 10L), .Label = c("Administrative and support services",
"Ambulatory health care services", "Educational services",
"Elementary and secondary schools", "Food services and drinking places",
"General medical and surgical hospitals", "General merchandise stores",
"Hospitals", "Nursing and residential care facilities", "Restaurants",
"Specialty trade contractors"), class = "factor")), .Names = c("TEI",
"OWNERSHIP", "RESULT", "CODE", "CHILDREN_LIST", "ESTIMATE_TYPE",
"NAICS_LABEL", "NAICS_TITLE"), row.names = c(NA, 12L), class = "data.frame")
library(dplyr)
# Assumes the dput() output above has been assigned to df
# Construct a numeric list of children nodes for each row (NA where there are none)
child_list <- df$CHILDREN_LIST %>% as.character %>% strsplit("\\W+") %>% lapply(as.numeric)
# Test whether any of a row's children is itself a TEI in the table
has_child <- sapply(child_list, function(ch) {
any(ch %in% df$TEI)
})
subset(df, !has_child)
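The same approach in base R only, with vapply() so the per-row test always returns a logical vector. This sketch copies just the TEI and CHILDREN_LIST columns from the dput above; with this data it should drop the parent rows 611000, 622000, 722000 and 722500.

```r
# TEI and CHILDREN_LIST copied from the dput() in the question
tbl <- data.frame(
  TEI = c(611000, 611100, 238000, 452000, 561000, 621000,
          622000, 622100, 623000, 722000, 722500, 722510),
  CHILDREN_LIST = c("611100 611200", NA, "238100 238200 238300 238900",
                    "452100 452900",
                    "561100 561200 561300 561400 561500 561600 561700 561900",
                    "621100 621200 621300 621400 621500 621600 621900",
                    "622100 622200 622300", NA, "623100 623200 623300 623900",
                    "722300 722400 722500", "722510", "722511 722513 722514 722515"),
  stringsAsFactors = FALSE)

# Split each list into its child codes (NA stays NA)
children <- strsplit(tbl$CHILDREN_LIST, " ", fixed = TRUE)

# TRUE when at least one child code is itself a TEI in the table
has_child_row <- vapply(children,
                        function(ch) any(as.numeric(ch) %in% tbl$TEI),
                        logical(1))

result <- tbl[!has_child_row, ]
print(result$TEI)
```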
Assuming by any CHILDREN_LIST, you meant as any element in that particular row's list.
Here is what I did. I know using for loops in R is not popular, but here it makes the code clearer for me.
# a is the data frame from the dput() above
which_rows_to_delete <- vector()
for (i in seq_len(nrow(a))) {
#first create a vector of all the TEI in the children list
children <- unlist(strsplit(as.character(a$CHILDREN_LIST[i]), split = " "))
#check if the TEI of the row matches any element of the vector
check <- any(a$TEI[i] == children) & !is.na(a$CHILDREN_LIST[i])
#store that information in another vector
which_rows_to_delete[i] <- check
}
a <- a[!which_rows_to_delete, ]
If instead you need to check whether it matches any element of any entry in the CHILDREN_LIST column, use the following in place of children in the code above:
children_all<-unlist(strsplit(levels(a$CHILDREN_LIST), split=" "))
In the dput you give, no row's TEI appears in its own CHILDREN_LIST, so the first version returns the data frame unchanged. But this code should work in general. :)

Order axis when doing a bubble chart using plotly in R

I have a bubble chart made with plotly in R, but the axis values appear in an odd order.
The output is below, and you can see that the axes are not in the correct order:
The code that I'm using is as follows:
library(plotly)
library(ggplot2)
file <- c("C://link//data.csv")
#dataSource <- read.csv(file, sep =",", header = TRUE)
dataSource <- read.table(file, header=T, sep=",")
dataSource <- na.omit(dataSource)
slope <- 1
dataSource$size <- sqrt(dataSource$Y.1 * slope)
colors <- c('#4AC6B7', '#1972A4') #, '#965F8A', '#FF7070', '#C61951')
plot_ly(dataSource,
x = ~Y.1.vs.Y.2,
y = ~YTD.vs.Y.1.YTD,
color = ~BU,
size = ~size,
colors = colors,
type = 'scatter',
mode = 'markers',
sizes = c(min(dataSource$size), max(dataSource$size)),
marker = list(symbol = 'circle', sizemode = 'diameter',
line = list(width = 2, color = '#FFFFFF')),
text = ~paste('Business Unit:',
BU, '<br>Product:',
Product, '<br>Y.1.vs.Y.2:',
Y.1.vs.Y.2, '<br>YTD.vs.Y.1.YTD:',
YTD.vs.Y.1.YTD)) %>%
layout(title = 'Y.1.vs.Y.2 v. YTD.vs.Y.1.YTD',
xaxis = list(title = 'Y.1.vs.Y.2',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwidth = 2),
yaxis = list(title = 'YTD.vs.Y.1.YTD',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwidth = 2),
paper_bgcolor = 'rgb(243, 243, 243)',
plot_bgcolor = 'rgb(243, 243, 243)')
The data is as follows:
structure(list(BU = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("B", "D"), class = "factor"), Product = structure(c(4L, 5L, 7L, 8L, 9L, 13L, 1L, 3L, 4L, 11L, 12L, 13L), .Label = c("ADT", "BHL", "CEX", "CMX", "CTL", "HTH", "MTL", "SSL", "TLS", "UTV", "WEX", "WLD", "WMX"), class = "factor"), Y.2 = c(4065L, 499L, 20L, 5491L, 781L, 53L, 34L, 1338L, 557L, 428L, 310L, 31L), Y.1 = c(4403L, 550L, 28L, 5225L, 871L, 46L, 22L, 1289L, 602L, 426L, 318L, 37L), Y.1.YTD = c(4403L, 550L, 28L, 5225L, 871L, 46L, 22L, 1289L, 602L, 426L, 318L, 37L), YTD = c(5026L, 503L, 29L, 3975L, 876L, 40L, 62L, 1395L, 717L, 423L, 277L, 35L), Y.1.vs.Y.2 = structure(c(12L, 7L, 11L, 4L, 8L, 1L, 2L, 3L, 12L, 6L, 10L, 9L), .Label = c("-13%", "-35%", "-4%", "-5%", "-76%", "0%", "10%", "12%", "19%", "3%", "40%", "8%"), class = "factor"), YTD.vs.Y.1.YTD = structure(c(8L, 5L, 11L, 3L, 7L, 2L, 9L, 12L, 10L, 1L, 2L, 4L), .Label = c("-1%", "-13%", "-24%", "-5%", "-9%", "0%", "1%", "14%", "182%", "19%", "4%", "8%"), class = "factor")), .Names = c("BU", "Product", "Y.2", "Y.1", "Y.1.YTD", "YTD", "Y.1.vs.Y.2", "YTD.vs.Y.1.YTD"), row.names = c(2L, 3L, 4L, 5L, 6L, 8L, 9L, 10L, 11L, 13L, 14L, 15L), class = "data.frame", na.action = structure(c(1L, 7L, 12L), .Names = c("1", "7", "12"), class = "omit"))
Any ideas on how I can order the axes properly?
Thanks
There are a few ways to manipulate factor levels, but things can get a bit messy if you're not careful. You should familiarize yourself with ?levels and ?factor, and perhaps also ?reorder and ?relevel.
In the meantime, try something like this:
dataSource[[7]] <- factor(dataSource[[7]], levels = c("-76%", "-35%", "-13%", "-5%", "-4%", "0%", "3%", "8%", "10%", "12%", "19%", "40%"))
Edit
To consolidate my answer and comment...
This happens because of the way factors are encoded. Your axis values are strings (read in as factors), and by default factor levels are ordered alphanumerically. So to change their order you have to specify it as above, or else code them numerically and give them the required names. There are many different ways to change them, in several packages. This answer provides a standard base R method for handling factors. For further information, start with the manual pages I suggested.
As for it being "very manual", since factors are categorical (and therefore have a potentially arbitrary order), there is no way to automate their order unless you code them numerically in the desired order.
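For these particular labels, coding them numerically can be automated, because every level is a number followed by a percent sign. A base R sketch, assuming every level has the form "<number>%"; pct_value is a hypothetical helper defined here, and the levels and sample values are copied from the Y.1.vs.Y.2 column in the dput.

```r
# Levels of Y.1.vs.Y.2 as stored in the dput (alphanumeric order)
lv <- c("-13%", "-35%", "-4%", "-5%", "-76%", "0%",
        "10%", "12%", "19%", "3%", "40%", "8%")
x <- factor(c("8%", "10%", "40%", "-5%"), levels = lv)

# Hypothetical helper: strip the "%" sign and convert to a number
pct_value <- function(s) as.numeric(sub("%", "", s, fixed = TRUE))

# Option 1: keep a factor, but order its levels by numeric value
x_ordered <- factor(x, levels = levels(x)[order(pct_value(levels(x)))])

# Option 2: use the numbers directly (plotly then draws a numeric axis)
x_numeric <- pct_value(as.character(x))

print(levels(x_ordered))
print(x_numeric)
```

Option 1 reproduces the manual level vector given above; option 2 is closer to what the follow-up solution below does by recomputing the percentages.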
Thanks to the comments above I've been able to resolve the issue. Find below the full code, which I hope might help other users:
library(plotly)
library(ggplot2)
file <- c("C://link//data.csv")
dataSource <- read.table(file, header=T, sep=",")
dataSource <- na.omit(dataSource)
# Recalculate the percentages as numbers (the CSV stores them as strings such as "8%",
# which read.table() turns into factors), so plotly orders the axes numerically
dataSource$Y.1.vs.Y.2 <- round((dataSource$Y.1 / dataSource$Y.2 - 1) * 100, 2)
dataSource$YTD.vs.Y.1.YTD <- round((dataSource$YTD / dataSource$Y.1.YTD - 1) * 100, 2)
slope <- 1
dataSource$size <- sqrt(dataSource$Y.1 * slope)
colors <- c('#4AC6B7', '#1972A4') #, '#965F8A', '#FF7070', '#C61951')
plot_ly(dataSource,
x = ~Y.1.vs.Y.2,
y = ~YTD.vs.Y.1.YTD,
color = ~BU,
size = ~size,
colors = colors,
type = 'scatter',
mode = 'markers',
sizes = c(min(dataSource$size), max(dataSource$size)),
marker = list(symbol = 'circle', sizemode = 'diameter',
line = list(width = 2, color = '#FFFFFF')),
text = ~paste('Business Unit:', BU,
'<br>Product:', Product,
'<br>YoY:',Y.1.vs.Y.2,
'<br>YTD:',YTD.vs.Y.1.YTD)) %>%
layout(title = 'YoY vs YTD Performance',
xaxis = list(title = 'YoY Performance (%)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwidth = 2),
yaxis = list(title = 'YTD Performance (%)',
gridcolor = 'rgb(255, 255, 255)',
zerolinewidth = 1,
ticklen = 5,
gridwidth = 2),
paper_bgcolor = 'rgb(243, 243, 243)',
plot_bgcolor = 'rgb(243, 243, 243)')
