Advanced stacked bar chart ggplot2 - r

Say I have the data frame:
df<-structure(list(predworker = c(1, 1, 1, 1, 1, 3, 3, 3, 3, 3, 4,
4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8,
8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10, 10,
11, 11, 11, 11, 11
), worker = c(1, 14, 11, 19, 13, 23, 3, 15, 20, 6, 15, 3, 5,
4, 22, 5, 21, 11, 14, 4, 15, 23, 6, 20, 3, 17, 16, 8, 9, 7, 8,
17, 9, 16, 7, 17, 9, 8, 16, 7, 10, 19, 2, 15, 14, 14, 1, 11,
19, 13), finalratio = c(0.358338156170776, 0.328697413978311,
0.200283479825366, 0.0634027658799677, 0.049278184145579, 0.245483741112573,
0.216351263581975, 0.211285529819829, 0.171813670019988, 0.155065795465635,
0.216637792442049, 0.21365067362223, 0.20254559121035, 0.184813787488195,
0.182352155237176, 0.257680316012908, 0.233934275233779, 0.18618378722994,
0.173241645742261, 0.14895997578111, 0.295633225885233, 0.197824577675154,
0.173926460086197, 0.169883366487268, 0.162732369866148, 0.312634332494825,
0.213471605336063, 0.168500990861721, 0.156199312722058, 0.149193758585333,
0.288139828063799, 0.249716321272007, 0.228189414450808, 0.132448859555662,
0.101505576657724, 0.28062982129018, 0.24896481457126, 0.185822099676468,
0.175529116141424, 0.109054148320668, 0.843396823680576, 0.0488581484138975,
0.0419903739183709, 0.0332313337137541, 0.0325233202734015, 0.354288383060293,
0.308159669367751, 0.222981515774462, 0.0731493536310783, 0.0414210781664159
), rank = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L,
4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L,
5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L)),
.Names = c("predworker", "worker", "finalratio", "rank"),
row.names = c(NA, -50L), class = c("grouped_df", "tbl_df","tbl", "data.frame"),
vars = "predworker", drop = TRUE,
indices = list( 0:4, 5:9, 10:14, 15:19, 20:24, 25:29, 30:34, 35:39, 40:44,
45:49),
group_sizes = c(5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L),
biggest_group_size = 5L, labels = structure(list(predworker = c(1, 3, 4, 5, 6, 7, 8, 9, 10, 11)),
row.names = c(NA, -10L), class = "data.frame", vars = "predworker",
drop = TRUE, .Names = "predworker"))
which looks as follows:
predworker worker finalratio rank
<dbl> <dbl> <dbl> <int>
1. 1. 0.358 1
1. 14. 0.329 2
1. 11. 0.200 3
1. 19. 0.0634 4
1. 13. 0.0493 5
3. 23. 0.245 1
I'm trying to make a stacked bar chart using ggplot2. I'm looking for something similar to this:
ggplot(df, aes(x = factor(predworker) ,y = finalratio, fill = factor(rank))) + geom_bar(stat = "identity")
However, there are some other details I don't know how to add to this plot:
I'd like to order the segments within each bar by rank, i.e. the largest segment (rank 1) at the bottom, which is the opposite of the current order.
How can I put two subplots in the same figure? Say I want the first 6 bars in one subplot and the rest in another, each self-contained (something like a facet in ggplot).
How can I write a value inside each segment of each bar? For instance, for each rank I'd like to write the corresponding finalratio and worker value within the limits of that segment.

To stack the segments by rank, with rank 1 at the bottom, reverse the factor levels of rank (5 to 1); position_stack() draws the first level on top.
You could subset the data by predworker and use something like gridExtra::grid.arrange or cowplot::plot_grid to combine the subplots. Alternatively, add a column that indicates the facet and facet on that.
Use geom_text for the labels. You'll want to round finalratio, or there will be too many digits.
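To see why reversing the levels works, here is a minimal sketch using only the question's first bar (predworker 1, finalratio rounded). layer_data() exposes the computed ymin/ymax of every segment after stacking, so you can check that the rank 1 segment is the one starting at 0.

```r
library(ggplot2)

# First bar of the question's data (predworker 1), finalratio rounded
one_bar <- data.frame(
  predworker = 1,
  rank = 1:5,
  finalratio = c(0.358, 0.329, 0.200, 0.063, 0.049)
)

# Levels 5:1 make rank 5 the first level; position_stack() puts the
# first level on top, so rank 1 ends up at the bottom of the bar
one_bar$rank <- factor(one_bar$rank, levels = 5:1)

p <- ggplot(one_bar, aes(factor(predworker), finalratio, fill = rank)) +
  geom_col()

# Segment boundaries computed by the stacking step
stack <- layer_data(p)
bottom <- stack[stack$ymin == 0, ]
top <- stack[which.max(stack$ymax), ]
bottom$ymax
```

The bottom segment spans 0 to 0.358 (rank 1) and the top one is rank 5's 0.049; with the default levels 1:5 the order would be flipped.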
Putting it all together: I'm using the facet approach and ungrouping your grouped tibble because it interferes with mutate:
library(tidyverse)
df %>%
  ungroup() %>%
  # facet is computed while predworker is still numeric (mutate() evaluates in order)
  mutate(facet = ifelse(predworker > 7, 2, 1),
         rank = factor(rank, levels = 5:1),   # rank 1 is stacked at the bottom
         predworker = factor(predworker)) %>%
  # fill in the top-level aes() so geom_text() stacks by the same groups as geom_col()
  ggplot(aes(predworker, finalratio, fill = rank)) +
  geom_col() +
  geom_text(aes(label = paste(worker, "=", round(finalratio, digits = 2))),
            position = position_stack(vjust = 0.5)) +
  facet_grid(~facet, scales = "free_x")
Or to facet vertically:
df %>%
  ungroup() %>%
  mutate(facet = ifelse(predworker > 7, 2, 1),
         rank = factor(rank, levels = 5:1),
         predworker = factor(predworker)) %>%
  ggplot(aes(predworker, finalratio, fill = rank)) +
  geom_col() +
  geom_text(aes(label = paste(worker, "=", round(finalratio, digits = 2))),
            position = position_stack(vjust = 0.5)) +
  # one column of panels, each with its own x axis
  facet_wrap(~facet, scales = "free_x", ncol = 1)
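The other route mentioned above, building separate plots and combining them into one figure, could be sketched as follows. This assumes the cowplot package is installed; stacked_panel() is a hypothetical helper, and the data are four of the question's bars (predworker 1, 3, 8, 9) with finalratio rounded. Unlike facets, each panel gets its own y axis and legend.

```r
library(ggplot2)

# Four of the question's bars, finalratio rounded to 3 digits
bars <- data.frame(
  predworker = rep(c(1, 3, 8, 9), each = 5),
  worker = c(1, 14, 11, 19, 13,  23, 3, 15, 20, 6,
             8, 17, 9, 16, 7,    17, 9, 8, 16, 7),
  finalratio = c(0.358, 0.329, 0.200, 0.063, 0.049,
                 0.245, 0.216, 0.211, 0.172, 0.155,
                 0.288, 0.250, 0.228, 0.132, 0.102,
                 0.281, 0.249, 0.186, 0.176, 0.109),
  rank = rep(1:5, times = 4)
)

# Hypothetical helper: one self-contained stacked-bar plot for a subset
stacked_panel <- function(d) {
  d$rank <- factor(d$rank, levels = 5:1)   # rank 1 at the bottom
  ggplot(d, aes(factor(predworker), finalratio, fill = rank)) +
    geom_col() +
    geom_text(aes(label = paste(worker, "=", round(finalratio, 2))),
              position = position_stack(vjust = 0.5)) +
    labs(x = "predworker")
}

p1 <- stacked_panel(subset(bars, predworker <= 7))
p2 <- stacked_panel(subset(bars, predworker > 7))

# Place the two independent plots side by side (requires cowplot)
combined <- cowplot::plot_grid(p1, p2, ncol = 2)
```

gridExtra::grid.arrange(p1, p2, ncol = 2) would be the equivalent call with gridExtra.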

Related

Order Grouped Geom_lines in ggplot

I want to show multiple geom_points ordered by "season_pts" within each "drafted_qbs" group. The issue is that I'm not sure what to map to the other axis. I have a "team" column, which is just the row number within each group, but using it only orders the first group ("2").
Any way of plotting each group's "season_pts" on the same graph (I'm not interested in faceting), in order of points, would be helpful.
Data
df <- structure(list(team = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L,
1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L, 4L, 5L),
season_pts = c(447.44, 381.62, 416.6, 367.96, 419.92, 490.78,
501.66, 458.56, 484.48, 458.36, 518, 495.7, 511.34, 499.68,
536.42, 522.92, 536.92, 518.46, 538.06, 525.96, 541.84, 523.26,
542.98, 527.4, 527), drafted_qbs = c(2, 2, 2, 2, 2, 3, 3,
3, 3, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -25L))
It usually helps to ask yourself "what is it I am trying to show with this plot?"
If you are trying to show that an increased number of drafted QBs tends to produce an increased number of points, then you can do something like this:
ggplot(df, aes(drafted_qbs, season_pts)) +
geom_point(size = 4, aes(color = factor(team))) +
geom_smooth(color = 'gray20', size = 0.5, linetype = 2, alpha = 0.15) +
scale_color_brewer(palette = 'Set1') +
theme_light(base_size = 16) +
labs(x = 'Drafted QBs', y = 'Season Points', color = 'Team') +
theme(panel.grid.minor.x = element_blank())
If you want to show that not all teams are affected equally by this effect, then something like this might be preferable:
ggplot(df, aes(team, season_pts, color = drafted_qbs)) +
geom_point(size = 4, alpha = 0.5) +
scale_color_gradient(low = 'red3', high = 'blue3') +
theme_light(base_size = 16) +
labs(x = 'Team', y = 'Season Points', color = 'Drafted QBs') +
theme(panel.grid.minor.x = element_blank())
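If the goal is what the question literally asks, each group's points in descending order on one graph, one option (not from the answer above) is to compute each value's position within its drafted_qbs group and use that as the x axis. A minimal sketch with the question's data; pos_in_group is a hypothetical column name:

```r
library(dplyr)
library(ggplot2)

# The question's data (team is replaced by a computed position)
df <- data.frame(
  season_pts = c(447.44, 381.62, 416.6, 367.96, 419.92,
                 490.78, 501.66, 458.56, 484.48, 458.36,
                 518, 495.7, 511.34, 499.68, 536.42,
                 522.92, 536.92, 518.46, 538.06, 525.96,
                 541.84, 523.26, 542.98, 527.4, 527),
  drafted_qbs = rep(c(2, 3, 4, 5, 6), each = 5)
)

ranked <- df %>%
  group_by(drafted_qbs) %>%
  # 1 = highest season_pts within each drafted_qbs group
  mutate(pos_in_group = rank(-season_pts, ties.method = "first")) %>%
  ungroup()

# One line per group, all on the same panel
p <- ggplot(ranked, aes(pos_in_group, season_pts,
                        colour = factor(drafted_qbs))) +
  geom_line() +
  geom_point(size = 3) +
  labs(x = "Position within group (1 = most points)",
       y = "Season Points", colour = "Drafted QBs")
```

Because the position is computed per group, every group is ordered, not just the first one, which is the problem with using team directly.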

set x-axis display range

The example data:
nltt <- structure(list(time = c(0, 1.02579504946471, 1.66430039972234,
1.67435173974229, 1.82009140034063, 1.95574135900067, 2.06963147976273,
2.64869249896209, 3.10438864450297, 0, 0.56927280073675, 1.94864234867127,
3.40490224199942, 0, 0.444318793403606, 1.34697160089298, 5.86547288923207,
0, 1.10824151737219, 1.77801220982399, 1.82246583876729, 2.18034182214015,
2.33051663760657, 3.01615794541527, 0, 0.101501512884473, 0.98261402255534,
1.04492141817475, 1.16554239437694, 1.25441082400256, 1.25777371029976,
1.62464049949719, 1.87253384965378, 1.91118229908154, 1.94105777022533,
2.17755553127212, 2.37899716574036, 2.85003451051712, 3.16711386665322
), num = c(2, 3, 4, 5, 4, 5, 6, 5, 6, 2, 3, 4, 5, 2, 3, 2, 3,
2, 3, 4, 5, 6, 7, 6, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 10, 11,
12, 13, 14), rep = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 5L, 5L,
5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L)), row.names = c(NA,
-39L), class = "data.frame")
The example code:
ggplot2::ggplot(nltt, ggplot2::aes(time, num, group = as.factor(rep), color = as.factor(rep))) +
  ggplot2::geom_line() + ggplot2::coord_cartesian(xlim = c(0, 3)) +
  ggplot2::theme(legend.position = "none") + ggplot2::xlab("age")
The example plot:
I would like each line in the plot to stop precisely at x = 3, but adding coord_cartesian(xlim = c(0, 3)) does not achieve my goal because the lines continue into the right padding area. How can I limit the lines to the range [0, 3] without truncating my raw data?
Up front, two answers: the first removes the margins and requires no change to the data; the second preserves the margins but requires modifying the data before plotting.
Remove the margin(s)
The default behavior is to expand the axis range a little on each side. ggplot2::expansion() controls the multiplicative and additive components of that expansion, but it is supplied through a scale's expand= argument (as in scale_x_continuous(expand = ...)), and setting limits= on that scale, instead of on the coord, turns out-of-bound points into NA.
If you can accept losing the margin on the left as well, add expand = FALSE to coord_cartesian() to get the desired result:
ggplot2::ggplot(nltt, ggplot2::aes(time, num, group = as.factor(rep), color = as.factor(rep))) +
  ggplot2::geom_line() +
  ggplot2::coord_cartesian(xlim = c(0, 3), expand = FALSE) +
  ggplot2::theme(legend.position = "none") +
  ggplot2::xlab("age")
If you want to keep the left margin, you can force it by extending xlim= yourself. The default expansion for continuous scales is expansion(mult = 0.05, add = 0), i.e. 5% of the range on each side, and 5% of 3 is 0.15:
ggplot2::ggplot(nltt, ggplot2::aes(time, num, group = as.factor(rep), color = as.factor(rep))) +
  ggplot2::geom_line() +
  ggplot2::coord_cartesian(xlim = c(-0.15, 3), expand = FALSE) +
  ggplot2::theme(legend.position = "none") +
  ggplot2::xlab("age")
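Another variant, not part of the original answer: coord_cartesian() applies the scale's own expand= setting, so you can request a one-sided expansion (5% on the left, none on the right) while the coord limits still zoom without discarding data. A minimal sketch, assuming ggplot2 >= 3.3.0 (where expansion() exists) and using replicates 2 and 3 of the question's nltt data, rounded:

```r
library(ggplot2)

# Replicates 2 and 3 of the question's nltt data, rounded
nltt_small <- data.frame(
  time = c(0, 0.569, 1.949, 3.405, 0, 0.444, 1.347, 5.865),
  num  = c(2, 3, 4, 5, 2, 3, 2, 3),
  rep  = rep(2:3, each = 4)
)

p <- ggplot(nltt_small, aes(time, num, group = factor(rep), color = factor(rep))) +
  geom_line() +
  # 5% padding below the lower limit, none above the upper limit
  scale_x_continuous(expand = expansion(mult = c(0.05, 0))) +
  # zoom to [0, 3]; unlike scale limits, this keeps the data beyond x = 3
  coord_cartesian(xlim = c(0, 3)) +
  theme(legend.position = "none") +
  xlab("age")

# Visible x range of the panel; lines are clipped at the right panel edge
x_range <- ggplot_build(p)$layout$panel_params[[1]]$x.range
```

The panel runs from -0.15 to exactly 3, so the lines end at x = 3 while the left margin is kept, without hard-coding -0.15.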
Interpolate and truncate, external to ggplot
ggplot2::scale_x_continuous(..., oob=) supports several mechanisms for dealing with out-of-bounds data, including:
the default, scales::censor (replaces out-of-bounds values with NA), which doesn't work because there is no data point at exactly time = 3, so each line would stop short of 3;
scales::squish that will take (for example) x=4 and squish it back to x=3; the unfortunate side-effect of this is that it is univariate (it does not attempt to change the corresponding y= value), so the slopes of the squished line segments will be steeper, and (in my mind at least) this corrupts the data and vis;
a user-defined function that is passed the values and the associated limits; unfortunately, it is also univariate, so we're stuck with the same data/vis slope-corruption as the previous bullet.
This brings me to the suggestion to interpolate the data yourself before passing to ggplot. I'll demo with dplyr but it can be done easily with base R or other dialects as well.
library(dplyr)
library(ggplot2)
group_by(nltt, rep) %>%
  ## step 1: interpolate, returns *just* time=3 data, nothing more
  ## (an unnamed data frame returned in summarize() is unpacked into columns; dplyr >= 1.0.0)
  summarize(as.data.frame(setNames(approx(time, num, xout = 3), c("time", "num")))) %>%
  ## step 2: combine with the original data
  bind_rows(nltt) %>%
  ## step 3: remove data over 3
  dplyr::filter(time <= 3) %>%
  ggplot(aes(time, num, group = as.factor(rep), color = as.factor(rep))) +
  ggplot2::geom_line() + ggplot2::coord_cartesian(xlim = c(0, 3)) +
  ggplot2::theme(legend.position = "none") +
  ggplot2::xlab("age")
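Since the interpolation can also be done in base R, here is a minimal sketch of that using approx(), split() and rbind(). truncate_at() is a hypothetical helper name, and the data are replicates 2 and 3 of the question's nltt, rounded.

```r
# Hypothetical helper: base-R version of interpolate-and-truncate
truncate_at <- function(d, cutoff = 3) {
  pieces <- lapply(split(d, d$rep), function(g) {
    # linear interpolation of num at the cutoff (NA if the group never reaches it)
    at_cut <- approx(g$time, g$num, xout = cutoff)$y
    # keep points up to the cutoff and append the interpolated end point
    rbind(g[g$time <= cutoff, ],
          data.frame(time = cutoff, num = at_cut, rep = g$rep[1]))
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Replicates 2 and 3 of the question's nltt data, rounded
nltt_small <- data.frame(
  time = c(0, 0.569, 1.949, 3.405, 0, 0.444, 1.347, 5.865),
  num  = c(2, 3, 4, 5, 2, 3, 2, 3),
  rep  = rep(2:3, each = 4)
)

truncated <- truncate_at(nltt_small)
```

The result can be passed to the same ggplot() call as above. Linear interpolation matches what geom_line() draws between points, so the truncated segment lies exactly on the original line.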

How to correct a different distance between bars in geom_col

I am making a geom_col in ggplot2. The x-axis is a numerical vector of timepoints (0, 6, 18, 24, 32, 44). The distance between the columns corresponds to the numerical difference between the timepoints, but I want an equal distance between all the columns. I have searched for answers here, but I didn't find a similar issue.
This is my code:
ggplot(data = ny_dataframe_scratch, aes(x=timepoint, y = relative_wound_healing, fill = Condition)) +
geom_col(width = 5, position = position_dodge()) +
scale_x_continuous(breaks=c(0, 6, 18, 24, 32, 44), name = "Time point, hours") +
scale_y_continuous(name = "Relative scratch area") +
scale_fill_manual(values=c("palevioletred4", "slategray")) +
geom_point(data = ny_dataframe_scratch, position = position_dodge(width = 5), aes(x=timepoint, y=relative_wound_healing, fill = Condition))
This is the output of dput():
structure(list(timepoint = c(0, 0, 0, 0, 0, 0, 6, 6, 6, 6, 6,
6, 18, 18, 18, 18, 18, 18, 24, 24, 24, 24, 24, 24, 32, 32, 32,
32, 32, 32, 44, 44, 44, 44, 44, 44), Condition = structure(c(2L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L,
1L, 1L, 1L), .Label = c("Control", "Knockout"), class = "factor"),
relative_wound_healing = c(1, 1, 1, 1, 1, 1, 0.819981, 0.78227,
0.811902, 0.873852, 0.893572, 0.910596, 0.39819, 0.436948,
0.559486, 0.534719, 0.591295, 0.612154, 0.222731, 0.2592,
0.453575, 0.37238, 0.477891, 0.505393, 0.05243246, 0.0809449,
0.2108063, 0.261122, 0.3750218, 0.4129873, 0, 0.0240122,
0.0778219, 0.0806758, 0.2495444, 0.3203724)), class = "data.frame", row.names = c(NA,
-36L))
Picture of how the graph looks:
The x-scale has proportional gaps because ‘ggplot2’ considers the values as continuous rather than categorical.
To make it categorical, you can for instance use factors:
aes(x = factor(timepoint, ordered = TRUE), …
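(factor() sorts numeric values numerically when it creates the levels, so the timepoints keep the order 0, 6, 18, … even without ordered = TRUE; alphabetical ordering, which would put "11" before "5", only happens when the values are character strings. ordered = TRUE does no harm here, but it is not what fixes the order.)
To get one bar per timepoint and condition, you need to compute and plot a summary statistic: geom_col draws a bar for every row, so the three replicates of each group overlap and you only see the tallest one. ‘ggplot2’ lets you do this with stat_summary (instead of geom_col):
stat_summary(fun = mean, geom = "col", position = position_dodge())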
Taken together:
ggplot(ny_dataframe_scratch) +
  aes(x = factor(timepoint, ordered = TRUE), y = relative_wound_healing, fill = Condition) +
  scale_fill_manual(values = c("palevioletred4", "slategray")) +
  stat_summary(fun = mean, geom = "col", position = position_dodge()) +
  # 0.9 matches the default dodge width of the bars, so the points sit on their bars
  geom_point(position = position_dodge(width = 0.9)) +
  labs(x = "Time point, hours", y = "Relative scratch area")
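Your timepoints are numeric. Try coercing them to a factor; ggplot should then place them at equal distances from each other. Because x is then discrete, the scale_x_continuous() call in the question has to become scale_x_discrete():
ny_dataframe_scratch$timepoint <- as.factor(ny_dataframe_scratch$timepoint)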
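A sketch of the question's plot with that change, using the question's data rebuilt compactly from its dput() output. It also swaps geom_col() for stat_summary() as in the previous answer, so there is one mean bar per timepoint and condition; fun = requires ggplot2 >= 3.3.0.

```r
library(ggplot2)

# The question's data, rebuilt from its dput() output
ny_dataframe_scratch <- data.frame(
  timepoint = rep(c(0, 6, 18, 24, 32, 44), each = 6),
  Condition = factor(rep(rep(c("Knockout", "Control"), each = 3), times = 6),
                     levels = c("Control", "Knockout")),
  relative_wound_healing = c(
    1, 1, 1, 1, 1, 1,
    0.819981, 0.78227, 0.811902, 0.873852, 0.893572, 0.910596,
    0.39819, 0.436948, 0.559486, 0.534719, 0.591295, 0.612154,
    0.222731, 0.2592, 0.453575, 0.37238, 0.477891, 0.505393,
    0.05243246, 0.0809449, 0.2108063, 0.261122, 0.3750218, 0.4129873,
    0, 0.0240122, 0.0778219, 0.0806758, 0.2495444, 0.3203724
  )
)

# A factor gives one equally spaced slot per timepoint (levels in numeric order)
ny_dataframe_scratch$timepoint <- as.factor(ny_dataframe_scratch$timepoint)

p <- ggplot(ny_dataframe_scratch,
            aes(x = timepoint, y = relative_wound_healing, fill = Condition)) +
  # one bar per timepoint and condition, at the mean of the replicates
  stat_summary(fun = mean, geom = "col", position = position_dodge(width = 0.9)) +
  geom_point(position = position_dodge(width = 0.9)) +
  scale_x_discrete(name = "Time point, hours") +
  scale_y_continuous(name = "Relative scratch area") +
  scale_fill_manual(values = c("palevioletred4", "slategray"))
```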

Changing size of legend and point shapes in plot from CGPfunctions::Plot2WayANOVA with GGplot2 in R

I would like to adjust the legend size so you can see the different line types in the legend, and also change the point shapes so that each cyl group has a different shape.
The Plot2WayANOVA function has an argument for passing ggplot2 arguments, but maybe it's just for changing components of the theme? Also, there is an option to save the plot, but it saves it as a png, and when I assign the plot to an object and print it, only text shows up.
library(CGPfunctions)
Plot2WayANOVA(formula = mpg ~ am * cyl, dataframe = mtcars)
Update: I get an error when I run it with the following data:
d <- structure(list(Wave = c(1, 2, 3, 4, 3, 1, 2, 3, 4, 1, 2, 3, 4,
1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1,
2, 3, 4, 1, 2, 3, 1, 2, 3, 4, 3, 4, 1, 3, 4, 1, 2, 3, 4, 1, 2,
3, 4, 1, 3, 4, 1, 2, 3, 4, 1, 4, 1, 1, 2, 3, 1, 2, 3, 4, 1, 2,
3, 1, 1, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 2, 4, 1, 2, 3, 4, 1, 2,
3, 4, 1), CigAd = c(1, 0, 4, 1, 15, 8, 4, 1, 2, 36, 17, 14, 16,
21, 16, 34, 29, 30, 17, 32, 24, 23, 5, 15, 17, 26, 26, 29, 12,
22, 30, 30, 37, 25, 0, 25, 22, 22, 22, 30, 21, 29, 36, 37, 14,
23, 0, 0, 0, 0, 0, 0, 0, 0, 17, 15, 34, 0, 0, 0, 13, 15, 24,
19, 18, 25, 40, 4, 0, 3, 1, 0, 0, 10, 18, 16, 18, 29, 1, 15,
14, 30, 21, 22, 27, 26, 28, 28, 24, 4, 5, 0, 0, 5, 0, 9, 4, 17,
16, 26), storetype = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L,
2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L), .Label = c("1",
"2", "3"), class = "factor")), row.names = c(NA, 100L), class = "data.frame")
Thank you!
The problem here is that Plot2WayANOVA is a wrapper function around ggplot that attempts to make plotting your ANOVA easier by handling many of the technical aspects of creating the appropriate ggplot to visualize the data. Although this kind of function is good for learners or for quick visualisation of data, it doesn't lend itself to customizing a plot the way you want.
The option to do all the things you ask doesn't exist within the function itself. You can pass extra geom, scale and theme calls, but the difficulty is that the shape aesthetic for the points is not mapped. It is possible to add a theme that widens the legend key, but when you do so you find that the line for the error bars is also included in the legend as a thin unbroken line overlying the thick patterned lines, which looks messy and again cannot be changed through the function parameters.
You therefore have two options:
Build the plot yourself using ggplot directly: the problem with this is that it would be a pain to get all of the relevant model summary information into the caption below the title, and of course it defeats the point of having Plot2WayANOVA in the first place.
Modify the plot object returned by Plot2WayANOVA: the problem with this is that Plot2WayANOVA doesn't actually return a ggplot object; it only prints the ggplot, then returns a nice big list of information about the analysis. You will need to hack the function to get it to return a ggplot object.
I will demonstrate how you would go about the second method here. It requires a bit less code than building from scratch, but is also a bit more technical and harder to follow.
First, write a new version of Plot2WayANOVA that returns a ggplot instead of the summary object:
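library(ggplot2)
f <- CGPfunctions::Plot2WayANOVA
# Overwrite expression 84 of the function body so the function returns the plot p.
# This index depends on the installed CGPfunctions version; check body(f)[[84]] first.
body(f)[[84]] <- quote(return(p))
My_Plot2WayANOVA <- f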
So now we can get our ggplot object by doing:
p <- My_Plot2WayANOVA(formula = mpg ~ am * cyl, dataframe = mtcars)
We will need to hack at the aesthetic mappings to get them to behave as we want, and overwrite the geom_errorbar to ensure it has a blank key glyph:
p$layers[[3]]$aes_params$shape <- NULL
p$layers[[3]]$aes_params$colour <- NULL
p$layers[[3]]$geom_params$colour <- NULL
p$layers[[3]]$geom_params$shape <- NULL
p$layers[[3]]$mapping <- aes(color = cyl, fill = cyl)
p$mapping <- aes(colour = cyl, shape = cyl,
fill = cyl, group = cyl,
x = am, y = TheMean)
p$layers[[1]] <- geom_errorbar(aes(ymin = LowerBound, ymax = UpperBound),
size = 1, width = 0.1, key_glyph = "blank")
Now p is a ggplot object to which you can add custom scales and themes as you see fit:
# shapes 21-23 have a fill, so they pick up the fill mapping
p + scale_shape_manual(values = 21:23, name = "cyl") +
  theme(legend.key.width = unit(75, "points"))
Whether it is worth going to all this trouble is debatable; it would probably be easier and safer to build the plot yourself from scratch if you really find the original styling unacceptable.
EDIT
Here's a working example with the data added to the question:
p <- My_Plot2WayANOVA(formula = CigAd ~ Wave * storetype, dataframe = d)
p$layers[[3]]$aes_params$shape <- NULL
p$layers[[3]]$aes_params$colour <- NULL
p$layers[[3]]$geom_params$colour <- NULL
p$layers[[3]]$geom_params$shape <- NULL
p$layers[[3]]$mapping <- aes(color = storetype, fill = storetype)
p$mapping <- aes(colour = storetype, shape = storetype,
fill = storetype, group = storetype,
x = Wave, y = TheMean)
p$layers[[1]] <- geom_errorbar(aes(ymin = LowerBound, ymax = UpperBound),
size = 1, width = 0.1, key_glyph = "blank")
p + scale_shape_manual(values = 21:23, name = "storetype") +
  theme(legend.key.width = unit(75, "points"))

correlation in R, when I do "pairwise.complete.obs" I get error "standard deviation is 0"

I am trying to do some correlation by group and have been using this very helpful thread:
spearman correlation by group in R
However, there are some NA values in my 2 variables and in my groupings, so I get NA as the result for each group.
So I tried this:
j <- lapply(split(HTNPS, HTNPS$callcat), function(HTNPS) {
  cor(HTNPS$NPS_int, HTNPS$holdtime_int, use = "pairwise.complete.obs", method = "spearman")
})
but then, although I get more sensible numbers, I get this warning:
In cor(HTNPS$NPS_int, HTNPS$holdtime_int, use = "pairwise.complete.obs", :
the standard deviation is zero
As requested, here is dput(head(HTNPS, 40)) for the relevant columns:
dput(head(HTNPS[, 20:24], 40))
structure(list(holdtime_int = structure(c(6, 11, 7, 7, 5, 7,
6, 5, 3, 6, 3, 5, 6, 105, 7, 6, 353, 5, 6, 9, 6, 6, 12, 5, 5,
5, 249, 5, 7, 11, 5, 7, 5, 290, 6, 6, 6, 6, 5, 6), .Dim = c(40L,
1L)), NPS_int = structure(c(1, NA, NA, 3, NA, 1, 1, 2, NA, NA,
NA, NA, 3, 2, 1, NA, 2, 4, 1, 2, NA, 3, 1, 1, 1, 1, 1, 1, 1,
2, 1, 3, 1, 1, 1, 2, 4, 2, 1, 1), .Dim = c(40L, 1L)), HTnot0 = structure(c(6,
11, 7, 7, 5, 7, 6, 5, 3, 6, 3, 5, 6, 105, 7, 6, 353, 5, 6, 9,
6, 6, 12, 5, 5, 5, 249, 5, 7, 11, 5, 7, 5, 290, 6, 6, 6, 6, 5,
6), .Dim = c(40L, 1L)), callcat = structure(c(NA, NA, "CARD",
"CARD", "GENERAL", "LOAN", "CHANGE DETAILS", "GENERAL", "LOAN",
"CHANGE DETAILS", "LOAN", "CARD", "FUNDS TRANSFER", "FEE", "BALANCE",
NA, "CARD", NA, NA, "STATEMENT", "CARD", "CARD", "GENERAL", "CARD",
"CARD", "TERM DEPOSIT", "CARD", "GENERAL", "CARD", "CARD", "GENERAL",
NA, NA, NA, NA, "CARD", "CARD", "FUNDS TRANSFER", "GENERAL",
"MyBusinessOverride"), .Dim = c(40L, 1L), .Dimnames = list(NULL,
"callcat")), HTcat = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 4L, 1L, 1L, 12L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 9L, 1L, 1L, 1L, 1L, 1L, 1L, 10L, 1L, 1L,
1L, 1L, 1L, 1L), .Dim = c(40L, 1L), .Dimnames = list(NULL, "HTcat"))), .Names = c("holdtime_int",
"NPS_int", "HTnot0", "callcat", "HTcat"), row.names = c(NA, 40L
), class = "data.frame")
If you do that split, many of your samples consist of only a single observation (after removing the NAs). Obviously there's no correlation to be calculated there.
You get the warning when one of the two variables contains only a single distinct value. In your example that is e.g. the data frame for callcat == "FUNDS TRANSFER": holdtime_int has only a single value (6), so its standard deviation is 0 (hence the warning) and the resulting correlation is NA.
I don't know why you're looking at those correlations, but on the data you provided, they hardly make any sense to me. If you want to get rid of the warning, you can build in a check, e.g. like this:
lapply(split(HTNPS,HTNPS$callcat), function(x){
x <- na.exclude( x[c("holdtime_int","NPS_int")] )
if(any(sapply(x, function(i)length(unique(i))) < 2 )){
NA
} else {
cor(x[,1],x[,2], method="spearman")
}
})
This should give you the same result, but without the warning. Note the use of na.exclude to drop the rows containing NAs.
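If you prefer a named numeric vector instead of a list, the same check can be written with vapply(). A minimal sketch using the question's first 40 rows as plain vectors; spearman_by_group() is a hypothetical name, and complete.cases() plays the role of na.exclude.

```r
# The question's first 40 rows as plain vectors (from its dput() output)
holdtime_int <- c(6, 11, 7, 7, 5, 7, 6, 5, 3, 6, 3, 5, 6, 105, 7, 6, 353, 5, 6, 9,
                  6, 6, 12, 5, 5, 5, 249, 5, 7, 11, 5, 7, 5, 290, 6, 6, 6, 6, 5, 6)
NPS_int <- c(1, NA, NA, 3, NA, 1, 1, 2, NA, NA, NA, NA, 3, 2, 1, NA, 2, 4, 1, 2,
             NA, 3, 1, 1, 1, 1, 1, 1, 1, 2, 1, 3, 1, 1, 1, 2, 4, 2, 1, 1)
callcat <- c(NA, NA, "CARD", "CARD", "GENERAL", "LOAN", "CHANGE DETAILS", "GENERAL",
             "LOAN", "CHANGE DETAILS", "LOAN", "CARD", "FUNDS TRANSFER", "FEE",
             "BALANCE", NA, "CARD", NA, NA, "STATEMENT", "CARD", "CARD", "GENERAL",
             "CARD", "CARD", "TERM DEPOSIT", "CARD", "GENERAL", "CARD", "CARD",
             "GENERAL", NA, NA, NA, NA, "CARD", "CARD", "FUNDS TRANSFER", "GENERAL",
             "MyBusinessOverride")

spearman_by_group <- function(x, y, group) {
  # split() drops rows whose group is NA, like split(HTNPS, HTNPS$callcat)
  idx <- split(seq_along(x), group)
  vapply(idx, function(i) {
    keep <- complete.cases(x[i], y[i])   # rows where both values are present
    xi <- x[i][keep]
    yi <- y[i][keep]
    # the correlation is undefined if either variable is constant (sd = 0)
    if (length(unique(xi)) < 2 || length(unique(yi)) < 2) {
      return(NA_real_)
    }
    cor(xi, yi, method = "spearman")
  }, numeric(1))
}

res <- spearman_by_group(holdtime_int, NPS_int, callcat)
```

FUNDS TRANSFER comes out as NA because holdtime_int is 6 in both of its rows, and no warning is raised.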
