How to display separate rows in histogram in R? - r

I have a data set assigned to a variable named data1. I know how to make a histogram of a single column with hist(data1$RT). But the rows are labelled "high", "med", and "low" in a factor column (Frequency), and I want three separate histograms of RT, one per factor level. I can't figure out how to do this. Here's an example of the data:
Frequency Prime_type RT
1 high prime 450
2 high prime 460
3 med prime 520
4 med prime 430
5 low prime 450
6 low prime 420
I can display hist(data1$RT), but how would I display RT for only the 'high' or 'med' rows, for example? I've tried a lot of things and am still stumped.

You can do it by faceting the plot with ggplot2. First, we redefine df$Frequency so that the panels appear in the order high, med, low. Then we create the histogram with explicit breaks and use facet_wrap to split the chart into panels. Note that we set closed = "right" (right-closed, left-open intervals) so the bins are computed the way hist does by default; older ggplot2 releases spelled this right = TRUE.
library(ggplot2)
# Keep the levels in order of first appearance: high, med, low
df$Frequency <- factor(df$Frequency, levels=unique(df$Frequency))
h <- ggplot(df, aes(x=RT)) +
geom_histogram(breaks=seq(420, 520, by=20), col="white", closed="right") +
facet_wrap( ~ Frequency) +
scale_x_continuous(breaks=seq(420, 520, by=20))
h
Data:
df <- structure(list(Frequency = structure(c(1L, 1L, 3L, 3L, 2L, 2L
), .Label = c("high", "low", "med"), class = "factor"), Prime_type = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "prime", class = "factor"), RT = c(450L,
460L, 520L, 430L, 450L, 420L)), .Names = c("Frequency", "Prime_type",
"RT"), class = "data.frame", row.names = c("1", "2", "3", "4",
"5", "6"))

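If you would rather stay with base hist, as in the question, a minimal sketch is to split RT by the factor and draw one histogram per level. The data frame below is rebuilt by hand from the example rows, and the names rt_by_level and brks are only illustrative.

```r
# Example data copied from the question (column names as given there)
data1 <- data.frame(
  Frequency  = factor(c("high", "high", "med", "med", "low", "low"),
                      levels = c("high", "med", "low")),
  Prime_type = "prime",
  RT         = c(450, 460, 520, 430, 450, 420)
)

# Split RT into one numeric vector per Frequency level
rt_by_level <- split(data1$RT, data1$Frequency)

# Shared breaks so the three panels are directly comparable
brks <- seq(420, 520, by = 20)

# Draw one histogram per level, side by side
old_par <- par(mfrow = c(1, length(rt_by_level)))
for (lev in names(rt_by_level)) {
  hist(rt_by_level[[lev]], breaks = brks, main = lev, xlab = "RT")
}
par(old_par)

# A single level can also be plotted on its own
hist(data1$RT[data1$Frequency == "high"], breaks = brks,
     main = "high", xlab = "RT")
```

The last call is the direct answer to "just display the 'high' rows": subset RT with a logical condition on Frequency before passing it to hist.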
Related

manually scale color of a factor in ggplot

Let's say I have a data frame like this:
id password year length Something
1 1234567 2001 7 good
2 pass4 2001 5 bad
3 angel3 2003 6 bad
4 pizza 2004 5 ok
I'm trying to write code that creates a geom_point plot with three variables, but I only want to highlight a single level of the factor Something (here "ok"). I don't want any of the other levels (like good or bad) to be colored, or at least they can stay black.
I was thinking of something like this:
graph <- dat %>%
ggplot(aes(x=(year), y=length, color=Something$ok)+
geom_point()
but I can't use $ there.
You can highlight just one level by giving all points one color and overriding the color of the level you care about. To do this, map color to the comparison Something == "ok" and set the two colors with scale_color_manual.
Data:
dat <- structure(list(id = 1:4, password = structure(c(1L, 3L, 2L, 4L
), .Label = c("1234567", "angel3", "pass4", "pizza"), class = "factor"),
year = c(2001L, 2001L, 2003L, 2004L), length = c(7L, 5L,
6L, 5L), Something = structure(c(2L, 1L, 1L, 3L), .Label = c("bad",
"good", "ok"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L))
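Plot:
library(dplyr) # provides %>%
library(ggplot2)
dat %>%
ggplot(aes(x=year, y=length, color = Something == "ok")) +
geom_point() +
# values are matched in the order FALSE, TRUE: other points blue, "ok" points orange
scale_color_manual(values = c("blue", "orange"))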
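If the other levels should stay black, as the question asks, one possible variation is to name the values so each colour is tied to TRUE or FALSE explicitly. This is only a sketch: the data frame is rebuilt by hand from the question, and is_ok is a helper column introduced here for illustration.

```r
library(ggplot2)

# Data from the question, rebuilt by hand
dat <- data.frame(
  id        = 1:4,
  password  = c("1234567", "pass4", "angel3", "pizza"),
  year      = c(2001L, 2001L, 2003L, 2004L),
  length    = c(7L, 5L, 6L, 5L),
  Something = factor(c("good", "bad", "bad", "ok"))
)

# Logical flag: TRUE only for the level to highlight
dat$is_ok <- dat$Something == "ok"

p <- ggplot(dat, aes(x = year, y = length, colour = is_ok)) +
  geom_point(size = 3) +
  # Named values tie each colour to a flag value, independent of ordering
  scale_colour_manual(values = c("FALSE" = "black", "TRUE" = "orange"))
p
```

Naming the values avoids relying on the FALSE/TRUE ordering of the scale, which is easy to get backwards.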

R stackedBar chart

Suppose this is my dataset:
Surgery Surv_Prob Group
CV 0.5113 Diabetic
Hip 0.6619 Diabetic
Knee 0.6665 Diabetic
QFox 0.7054 Diabetic
CV 0.5113 Non-Diabetic
Hip 0.6629 Non-Diabetic
Knee 0.6744 Non-Diabetic
QFox 0.7073 Non-Diabetic
How do I plot this as a stacked bar plot?
Please note that the values are already cumulative, so the plot should show only a small increase from CV to Hip (delta = 0.6619 - 0.5113).
The order should be CV -> Hip -> Knee -> QFox.
There may be a way to plot the cumulative values directly, but one approach is to recover the individual increments and plot those as a stacked bar chart, fixing the stacking order by converting Surgery to a factor. For the factor levels I used rev(unique(Surgery)) for convenience, since you want the reverse of the order in which they appear in the dataset. For more complex cases you may need to specify the levels manually.
library(tidyverse)
df %>%
group_by(Group) %>%
mutate(Surv_Prob1 = c(Surv_Prob[1], diff(Surv_Prob)),
Surgery = factor(Surgery, levels = rev(unique(Surgery)))) %>%
ggplot() + aes(Group, Surv_Prob1, fill = Surgery, label = Surv_Prob) +
geom_bar(stat = "identity") +
geom_text(size = 3, position = position_stack(vjust = 0.5))
data
df <- structure(list(Surgery = structure(c(1L, 2L, 3L, 4L, 1L, 2L,
3L, 4L), .Label = c("CV", "Hip", "Knee", "QFox"), class = "factor"),
Surv_Prob = c(0.5113, 0.6619, 0.6665, 0.7054, 0.5113, 0.6629,
0.6744, 0.7073), Group = structure(c(1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L), .Label = c("Diabetic", "Non-Diabetic"), class =
"factor")), class = "data.frame", row.names = c(NA, -8L))

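To make the cumulative-to-increment step concrete, here is a small base R sketch of the same arithmetic, using ave instead of group_by/mutate. The columns increment and rebuilt are names chosen here, not part of the original answer.

```r
# Cumulative survival probabilities from the question
df <- data.frame(
  Surgery   = rep(c("CV", "Hip", "Knee", "QFox"), times = 2),
  Surv_Prob = c(0.5113, 0.6619, 0.6665, 0.7054,
                0.5113, 0.6629, 0.6744, 0.7073),
  Group     = rep(c("Diabetic", "Non-Diabetic"), each = 4)
)

# Within each Group, keep the first value and replace the rest with the
# difference from the previous row (the height of each stacked segment)
df$increment <- ave(df$Surv_Prob, df$Group,
                    FUN = function(p) c(p[1], diff(p)))

# Sanity check: stacking the increments reproduces the cumulative values
df$rebuilt <- ave(df$increment, df$Group, FUN = cumsum)
print(df)
```

For the Diabetic group the Hip segment is 0.6619 - 0.5113, which is the small delta mentioned in the question.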
How do I visualize a three way table as a heat map in R

I am new to R and have been struggling to visualize a three-way table as a heat map using geom_tile. I can easily do this in Excel, but cannot find any examples of how to do it in R. I have looked at mosaic plots, but they are not what I want, and while I have found hundreds of examples for two-way tables, there seem to be none for three-way tables.
I want the output to look like this:
My data set looks like this (it's a small snapshot of 30,000 records):
xxx <- structure(list(rfm_score = c(111, 112, 113, 114, 115, 121), n = c(2624L,
160L, 270L, 23L, 5L, 650L), rec = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = c("1", "2", "3", "4", "5"), class = "factor"),
freq = structure(c(1L, 1L, 1L, 1L, 1L, 2L), .Label = c("1",
"2", "3", "4", "5"), class = "factor"), mon = structure(c(1L,
2L, 3L, 4L, 5L, 1L), .Label = c("1", "2", "3", "4", "5"), class = "factor")), row.names = c(NA,
6L), class = "data.frame")
It is essentially an RFM analysis of customer shopping behavior (Recency, Frequency and Monetary). The heat map I want should show the count of customers in each RFM segment. In the heat map I supplied, there are two variables on the left (R = Recency and F = Frequency, each in quintile ranges 1 to 5), and at the top is M = Monetary (quintile ranges 1 to 5). So, for instance, the segment RFM = 555 has a count of 2511 customers.
I have tried the following code and variations of it, but just get errors:
library(ggplot2)
library(RColorBrewer)
library(dplyr)
cols <- rev(brewer.pal(11, 'RdYlBu'))
ols <- brewer.pal(9, 'RdYlGn')
ggplot(xxx)+ geom_tile(aes(x= mon, y = reorder(freq, desc(freq)), fill = n)) +
theme_change +
facet_grid(rec~.) +
# geom_text(aes(label=n)) +
# scale_fill_gradient2(midpoint = (max(xxx$n)/2), low = "red", mid = "yellow", high = "darkgreen") +
# scale_fill_gradient(low = "red", high = "blue") + scale_fill_gradientn(colours = cols) +
# scale_fill_brewer() +
labs(x = "monetary", y= "frequency") +
scale_x_discrete(expand = c(0,0)) + scale_y_discrete(expand = c(0,0)) +
coord_fixed(ratio= 0.5)
I have no idea how to create this heat map in R. Can anyone please help me?
Kind regards
Heinrich
You can use the DT and formattable packages to make a table with conditional colour formatting:
library(DT)
library(formattable)
xxx <- data.frame(rfm_score = c(111, 112, 113, 114, 115, 121),
n = c(2624L, 160L, 270L, 23L, 5L, 650L),
rec = c(1L, 1L, 1L, 1L, 1L, 1L),
freq = c(1L, 1L, 1L, 1L, 1L, 2L),
mon = c(1L, 2L, 3L, 4L, 5L, 1L))
xxx_dt <- formattable(
xxx,
list(
rfm_score = color_tile("pink", "light blue"),
n = color_tile("pink", "light blue"),
rec = color_tile("pink", "light blue"),
freq = color_tile("pink", "light blue"),
mon = color_tile("pink", "light blue")))
as.datatable(xxx_dt)
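Since the question asks specifically about geom_tile, here is a hedged sketch of that route as an alternative to the formattable table above. It puts monetary on the x axis and frequency on the y axis, and uses facet_grid to stack one band per recency level. The colour choices and the object name heat are assumptions, and only the six sample rows are used.

```r
library(ggplot2)

# The six-row sample from the question, with rec/freq/mon kept as factors
xxx <- data.frame(
  rfm_score = c(111, 112, 113, 114, 115, 121),
  n         = c(2624L, 160L, 270L, 23L, 5L, 650L),
  rec       = factor(c(1, 1, 1, 1, 1, 1), levels = 1:5),
  freq      = factor(c(1, 1, 1, 1, 1, 2), levels = 1:5),
  mon       = factor(c(1, 2, 3, 4, 5, 1), levels = 1:5)
)

heat <- ggplot(xxx, aes(x = mon, y = freq, fill = n)) +
  geom_tile(colour = "white") +                        # one tile per R-F-M cell
  geom_text(aes(label = n), size = 3) +                # print the count in the cell
  facet_grid(rec ~ ., labeller = label_both) +         # one band per recency level
  scale_fill_gradient(low = "yellow", high = "red") +
  scale_y_discrete(limits = rev(levels(xxx$freq))) +   # frequency 1 at the top
  labs(x = "monetary", y = "frequency", fill = "customers")
heat
```

With the full 30,000-record data, every rec level would get its own band, which gives the R and F labels on the left and M across the top as described in the question.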

Errorbars in r of two groups ggplot2

I'd like to plot standard deviations of the mean(z)/mean(b) which are grouped by two factors $angle and $treatment:
z =
Tracer  angle  treatment
60      0      S
51      0      S
56.415  15     X
56.410  15     X
b =
Tracer  angle  treatment
21      0      S
15      0      S
16.415  15     X
26.410  15     X
So far I've calculated the mean for each variable based on angle and treatment:
aggmeanz <-aggregate(z$Tracer, list(angle=z$angle,treatment=z$treatment), FUN=mean)
aggmeanb <-aggregate(b$Tracer, list(angle=b$angle,treatment=b$treatment), FUN=mean)
It now looks like this:
aggmeanz
angle treatment x
1 0 S 0.09088021
2 30 S 0.18463353
3 60 S 0.08784315
4 80 S 0.09127198
5 90 S 0.12679296
6 0 X 2.68670392
7 15 X 0.50440692
8 30 X 0.83564470
9 60 X 0.52856956
10 80 X 0.63220093
11 90 X 1.70123025
But when I come to plot it, I can't quite get what I'm after
ggplot(aggmeanz, aes(x=aggmeanz$angle,y=aggmeanz$x/aggmeanb$x, colour=treatment)) +
geom_bar(position=position_dodge(), stat="identity") +
geom_errorbar(aes(ymin=0.1, ymax=1.15),
width=.2,
position=position_dodge(.9)) +
theme(panel.grid.minor = element_blank()) +
theme_bw()
EDIT:
dput(aggmeanz)
structure(list(time = structure(c(1L, 3L, 4L, 5L, 6L, 1L, 2L,
3L, 4L, 5L, 6L), .Label = c("0", "15", "30", "60", "80", "90"
), class = "factor"), treatment = structure(c(1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("S", "X"), class = "factor"),
x = c(56.0841582902523, 61.2014237854156, 42.9900742785269,
42.4688447229277, 41.3354173870287, 45.7164231791512, 55.3943182966382,
55.0574951462903, 48.1575625699563, 60.5527200655174, 45.8412287451211
)), .Names = c("time", "treatment", "x"), row.names = c(NA,
-11L), class = "data.frame")
> dput(aggmeanb)
structure(list(time = structure(c(1L, 3L, 4L, 5L, 6L, 1L, 2L,
3L, 4L, 5L, 6L), .Label = c("0", "15", "30", "60", "80", "90"
), class = "factor"), treatment = structure(c(1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("S", "X"), class = "factor"),
x = c(56.26325504249, 61.751655279608, 43.1687113436753,
43.4147408285209, 41.9113698082799, 46.2800894420131, 55.1550995335947,
54.7531592595068, 47.3280215294235, 62.4629068516043, 44.2590192583692
)), .Names = c("time", "treatment", "x"), row.names = c(NA,
-11L), class = "data.frame")
EDIT 2: I calculated the standard dev as follows:
aggstdevz <-aggregate(z$Tracer, list(angle=z$angle,treatment=z$treatment), FUN=std)
aggstdevb <-aggregate(b$Tracer, list(angle=b$angle,treatment=b$treatment), FUN=std)
Any thoughts would be much appreciated,
Cheers
As others have noted, you'll need to join the two data frames together. There are also some quirks in the dput data you showed, so I've renamed some columns to make sure that they join appropriately and match what you've attempted. NOTE: You'll need to name the two means differently so that they don't get merged together or cause conflicts.
library(plyr) # join() is assumed to be plyr's join
names(aggmeanb)[names(aggmeanb) == "x"] = "mean_b"
names(aggmeanb)[names(aggmeanb) == "time"] = "angle"
names(aggmeanz)[names(aggmeanz) == "x"] = "mean_z"
names(aggmeanz)[names(aggmeanz) == "time"] = "angle"
joined_data = join(aggmeanb, aggmeanz) # joins on the shared columns angle and treatment
joined_data$divmean = joined_data$mean_b/joined_data$mean_z
> head(joined_data)
angle treatment mean_b mean_z divmean
1 0 S 56.26326 56.08416 1.003193
2 30 S 61.75166 61.20142 1.008991
3 60 S 43.16871 42.99007 1.004155
4 80 S 43.41474 42.46884 1.022273
5 90 S 41.91137 41.33542 1.013934
6 0 X 46.28009 45.71642 1.012330
ggplot(joined_data, aes(factor(angle), divmean)) +
geom_boxplot() +
theme_bw() + # apply the complete theme first
theme(panel.grid.minor = element_blank()) # then tweak it; theme_bw() would otherwise reset the tweak
It might be that the data you've included is just part of your real data set, but as it stands there is only one data point per angle-treatment group. With a fuller dataset, you can try something like:
ggplot(joined_data, aes(factor(angle), divmean, group = treatment)) +
geom_boxplot() +
facet_grid(.~angle, scales = "free_x")
That puts each angle in its own panel and draws the boxes per treatment; add fill = treatment to the aesthetics if you also want them filled by treatment.
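If you prefer not to load plyr, the same join can be sketched with base merge, naming the key columns explicitly. The two small frames below are stand-ins with values rounded from the dput output (only four groups), and ratio is computed as mean(z)/mean(b), the direction stated in the question.

```r
# Small stand-ins for the two aggregated frames (values rounded from the dput output)
aggmeanz <- data.frame(angle     = c("0", "30", "0", "15"),
                       treatment = c("S", "S", "X", "X"),
                       x         = c(56.08, 61.20, 45.72, 55.39))
aggmeanb <- data.frame(angle     = c("0", "30", "0", "15"),
                       treatment = c("S", "S", "X", "X"),
                       x         = c(56.26, 61.75, 46.28, 55.16))

# Rename the value columns so they stay distinct after the join
names(aggmeanz)[names(aggmeanz) == "x"] <- "mean_z"
names(aggmeanb)[names(aggmeanb) == "x"] <- "mean_b"

# Join on the two grouping columns explicitly
joined <- merge(aggmeanz, aggmeanb, by = c("angle", "treatment"))

# Ratio asked for in the question: mean(z) / mean(b)
joined$ratio <- joined$mean_z / joined$mean_b
print(joined)
```

Passing by = c("angle", "treatment") makes the join keys visible in the code rather than relying on whichever column names happen to match.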
Think about the problem in two steps:
1) Create a data frame (say data) which contains all the information you would like to visualize. In this case, this seems to be the two factors (angle, treatment), the mean group differences (say dif) and standard errors (say ste).
2) Visualize this information.
Step 2) will be easy. This should probably produce something very similar to your sketch.
ggplot(data, aes(x=angle, y=dif, colour=treatment)) +
geom_point(position=position_dodge(0.1)) +
geom_errorbar(aes(ymin=dif-ste, ymax=dif+ste), width=.1, position=position_dodge(0.1)) +
theme_bw()
However, at this point, you do not provide enough information to get help with Step 1. Try to include code which produces your original data (or the type of data you have) instead of copy-pasting chunks of your data output or pasting the aggregated data which lacks standard errors.
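Combining your two aggregated data frames and generating random numbers as stand-ins for the standard error gives data that can be plotted this way:
# I imported your two aggregated data frames from your dput output.
# Column order after cbind: time, treatment, aggmeanb$x, aggmeanz$x, placeholder ste
data <- cbind(aggmeanb, aggmeanz$x, abs(rnorm(11))) # abs() keeps the placeholder errors non-negative
names(data) <- c("angle", "treatment", "meanb", "meanz", "ste")
data$dif <- data$meanz - data$meanb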
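For Step 1 with real rather than random standard errors, a minimal sketch is to aggregate the raw observations twice, once for the mean and once for the standard error, and merge the results. It uses only the four sample rows of z from the question, handles a single variable rather than the z/b ratio, and std_err is a helper defined here (not a base R function).

```r
# The four sample rows of z shown in the question
z <- data.frame(Tracer    = c(60, 51, 56.415, 56.410),
                angle     = c(0, 0, 15, 15),
                treatment = c("S", "S", "X", "X"))

# Standard error of the mean (helper defined for this sketch)
std_err <- function(v) sd(v) / sqrt(length(v))

# One row per angle-treatment group, for the mean and for the standard error
means <- aggregate(Tracer ~ angle + treatment, data = z, FUN = mean)
ses   <- aggregate(Tracer ~ angle + treatment, data = z, FUN = std_err)

names(means)[names(means) == "Tracer"] <- "mean"
names(ses)[names(ses) == "Tracer"]     <- "ste"

# Combine into one summary frame keyed by the two grouping factors
summary_z <- merge(means, ses, by = c("angle", "treatment"))
print(summary_z)
```

A frame of this shape can be passed to geom_errorbar with ymin = mean - ste and ymax = mean + ste, in the same way as the data frame used in Step 2.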

Ggmap-geompoint, how to make grouping?

Suppose I have this dataframe
latitude longitude category
42.39905 -72.93871 A
42.39905 -73.93871 B
43.37471 -73.36336 A
43.37471 -74.36336 B
44.28322 -74.31423 B
What I would like to do is group the coordinates by their integer (rounded) values. Then, for each group, I would draw a bubble whose size is a function of the number of points in the group.
The colour should diverge between A and B, depending on how many more A than B points the group contains. So far, I've been doing this:
map = get_map(location="jk",zoom=6,source="stamen")
#Plot the point
ggmap(map)+
geom_point(data=zipmap,
aes(x=round(longitude),y=round(latitude),colour=category))+
scale_color_brewer(type='div')
But as you would expect, the colour does not diverge and the bubble size is not implemented. How could I achieve this? I can't use scale_x_continuous, as it is already used somewhere in ggmap.
Here is one direction to try.
dput(df)
structure(list(latitude = c(42.39905, 42.39905, 43.37471, 43.37471,
44.28322), longitude = c(-73, -74, -73, -74, -74), category = structure(c(1L,
2L, 1L, 2L, 2L), .Label = c("A", "B"), class = "factor"), latround = structure(c(1L,
1L, 2L, 2L, 3L), .Label = c("42", "43", "44"), class = "factor"),
longround = structure(c(2L, 1L, 2L, 1L, 1L), .Label = c("-74",
"-73"), class = "factor")), .Names = c("latitude", "longitude",
"category", "latround", "longround"), row.names = c(NA, -5L), class = "data.frame")
df$latround <- as.factor(round(df$latitude)) # round the coords
df$longround <- as.factor(round(df$longitude))
library(dplyr) # group by rounded coordinates and count the categories
df2 <- df %>% group_by(latround) %>% summarise(catnumber = n())
latround catnumber
1 42 2
2 43 2
3 44 1
library(ggmap)
You don't specify where the location jk is, so from here I have only outlined an approach to plotting.
map <- get_map(location="jk",zoom=6,source="stamen")
#Plot the point
# df2 must also carry a numeric longitude column (longround) for this layer to work
ggmap(map)+
geom_point(data=df2, aes(x=longround, y=latround, size=catnumber, colour=catnumber))+
scale_color_distiller(type='div') # continuous counterpart of scale_color_brewer; more is needed in the ggmap code
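One way to fill in the missing pieces is to group on both rounded coordinates, keep them numeric, and compute the share of category A per cell for the diverging colour. This is a sketch that uses plain ggplot2 so it runs without downloading a map. The names cells, n, n_a and share_a are introduced here, and the data are the five rows from the question.

```r
library(ggplot2)

# Coordinates and categories from the question
zipmap <- data.frame(
  latitude  = c(42.39905, 42.39905, 43.37471, 43.37471, 44.28322),
  longitude = c(-72.93871, -73.93871, -73.36336, -74.36336, -74.31423),
  category  = c("A", "B", "A", "B", "B")
)

# Round to whole degrees, keeping the results numeric for use on map axes
zipmap$latround  <- round(zipmap$latitude)
zipmap$longround <- round(zipmap$longitude)

# Helper columns: one per point, and one per point of category A
zipmap$n   <- 1L
zipmap$n_a <- as.integer(zipmap$category == "A")

# One row per rounded cell: total points and number of A points
cells <- aggregate(cbind(n, n_a) ~ latround + longround,
                   data = zipmap, FUN = sum)
cells$share_a <- cells$n_a / cells$n   # 0 = all B, 1 = all A

# Bubble size from the count, diverging colour from the share of A
p <- ggplot(cells, aes(x = longround, y = latround,
                       size = n, colour = share_a)) +
  geom_point() +
  scale_colour_distiller(type = "div", limits = c(0, 1))
p
```

The same geom_point layer can be added to ggmap(map) by passing data = cells to geom_point, so no extra position scale is needed.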
