I have this data frame, and I'm trying to draw a vertical line on a plot whose x-axis is categorical.
data <- data.frame(
condition = c('1', '1', '1', '1', '1', '2', '2', '2', '2', '2', '3', '3', '3', '3', '3'),
AssessmentGrade = c('400', '410', '420', '430', '440', '500', '510', '520', '530', '540',
'300', '310', '320', '330', '340'),
Freq = c('1', '2', '1', '5', '7', '9', '1', '5', '3', '4', '5', '8', '1', '3', '5'),
MathGrade = c('A+', 'B-', 'C-', 'D', 'F', 'A-', 'B', 'C+', 'D-', 'F', 'A+', 'D', 'D', 'F', 'C'),
Condition = c('Condition 1', 'Condition 1', 'Condition 1', 'Condition 1', 'Condition 1',
'Condition 2', 'Condition 2', 'Condition 2', 'Condition 2', 'Condition 2',
'Condition 3', 'Condition 3', 'Condition 3', 'Condition 3', 'Condition 3'))
I tried adding a field to make the grade numeric, and that helped:
data$Gradenum <- as.numeric(data$MathGrade)
I used ggplot to get a bubble graph, but I was wondering how I would edit it to use my company's standard colors:
library(ggplot2)
p <- ggplot(data, aes(x = MathGrade, y = AssessmentGrade, size = Freq, fill = Condition)) +
geom_point(aes(colour = Condition)) +
ggtitle("Main Title") +
labs(x = "First Math Grade", y = "Math Assessment Score")
How can I get a vertical line between C+ and D? I see a lot of information out there if your x-axis is a date, but not for other categorical values.
Hardcoded solutions are error-prone
MrSnake's solution works, but only for the given data set, because the value of 7.5 is hardcoded.
It will fail with just a minor change to the data, e.g., by replacing grade "A+" in row 1 of data by an "A".
Using the hardcoded xintercept of 7.5
p + geom_vline(xintercept = 7.5)
draws the line between grades C- and C+ instead of C+ and D.
This can be solved using ordered factors. But first note that the chart contains another flaw: the grades on the x-axis are ordered alphabetically,
A, A-, A+, B, B-, C, C-, C+, D, D-, F
where I would have expected
A+, A, A-, B, B-, C+, C, C-, D, D-, F
Fixing the x-axis
This can be fixed by turning MathGrade into an ordered factor with levels in a given order:
grades <- c(as.vector(t(outer(LETTERS[1:4], c("+", "", "-"), paste0))), "F")
grades
[1] "A+" "A" "A-" "B+" "B" "B-" "C+" "C" "C-" "D+" "D" "D-" "F"
data$MathGrade <- ordered(data$MathGrade, levels = grades)
factor() would be sufficient to plot a properly ordered x-axis, but we need an ordered factor for the next step, the correct placement of the vertical line.
Programmatically placing the vertical line
Let's suppose that the vertical line should be drawn between grades C- and D+. However, either or both grades may be missing from the data, and factor levels without data won't be plotted. In the sample data set, there are no data with grade D+, so the vertical line should be plotted between grades C- and D.
So, in the order of the factor levels, we need to look for the lowest grade in the data set that is equal to or greater than D+ and the highest grade that is equal to or less than C-:
upper <- as.character(min(data$MathGrade[data$MathGrade >= "D+"]))
lower <- as.character(max(data$MathGrade[data$MathGrade <= "C-"]))
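These are the grades actually present in the data set between which the vertical line is to be plotted. Their positions among the levels that occur in the data give the x-intercept (note that p has to be re-created from the updated data first, because a ggplot object keeps the data it was built with):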
xintercpt <- mean(which(levels(droplevels(data$MathGrade)) %in% c(lower, upper)))
p + geom_vline(xintercept = xintercpt)
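The steps above can be wrapped into a small helper. This is only a minimal sketch under the assumptions of this answer: the name vline_between and its arguments are made up for illustration and are not part of ggplot2, and it assumes there are no missing grades and that at least one grade exists on each side of the line. It uses base R only, so the ggplot call is left as a comment.

```r
# Full grade scale from best to worst (same construction as above)
grades <- c(as.vector(t(outer(LETTERS[1:4], c("+", "", "-"), paste0))), "F")

# Hypothetical helper: x position of a vertical line between `left` and `right`,
# falling back to the nearest grades that actually occur in the data
vline_between <- function(x, left, right, levels) {
  x <- ordered(x, levels = levels)
  lower <- as.character(max(x[x <= left]))   # last grade present at or before `left`
  upper <- as.character(min(x[x >= right]))  # first grade present at or after `right`
  # positions are counted among the levels that are present, as on the plot axis
  mean(which(levels(droplevels(x)) %in% c(lower, upper)))
}

math_grade <- c('A+', 'B-', 'C-', 'D', 'F', 'A-', 'B', 'C+', 'D-', 'F',
                'A+', 'D', 'D', 'F', 'C')

vline_between(math_grade, "C-", "D+", grades)  # 7.5

# Changing the first grade to "A" adds a level, and the position follows
math_grade[1] <- "A"
vline_between(math_grade, "C-", "D+", grades)  # 8.5

# Usage with the plot (p must be built from data with the ordered MathGrade):
# p + geom_vline(xintercept = vline_between(data$MathGrade, "C-", "D+", grades))
```

Because the position is derived from the levels present in the data, the line stays between C- and D even when grades are added or removed.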
Just add geom_vline ;)
p + geom_vline(xintercept = 7.5)
To change the colors to fit your company's scheme, you can add something like:
+ scale_color_manual(values = c('Condition 1' = 'grey20',
'Condition 2' = 'darkred',
'Condition 3' = 'blue'))
I am using the R programming language. I am trying to follow this tutorial over here: http://www.semspirit.com/artificial-intelligence/machine-learning/regression/support-vector-regression/support-vector-regression-in-r/
For the famous Iris dataset, I am trying to plot the 3D decision surface for the random forest algorithm (using tsne dimensions):
library(Rtsne)
library(dplyr)
library(ggplot2)
library(plotly)
library(caret)
library(randomForest)
#data
a = iris
a <- unique(a)
#create two species just to make things easier
s <- c("a","b")
species <- sample(s, nrow(a), replace=TRUE, prob=c(0.3, 0.7))
a$species = species
a$species = as.factor(a$species)
#split data into train/test, and then random forest
index = createDataPartition(a$species, p=0.7, list = FALSE)
train = a[index,]
test = a[-index,]
rf = randomForest(species ~ ., data=train, ntree=50, mtry=2)
#have the model predict the test set
pred = predict(rf, test, type = "prob")
#column 2 of pred holds the probability of class "b"
labels = factor(ifelse(pred[,2]>0.5, "b", "a"), levels = levels(test$species))
confusionMatrix(labels, test$species)
#tsne algorithm
tsne_obj_3 <- Rtsne(test[,-5], perplexity=1, dims=3)
df_m2 <- as.data.frame(tsne_obj_3$Y)
df_m2$labels = test$species
From here, I am trying to plot the 3d decision surface (http://www.semspirit.com/artificial-intelligence/machine-learning/regression/support-vector-regression/support-vector-regression-in-r/) :
axis_1 = df_m2$V1
axis_2 = df_m2$V2
axis_3 = df_m2$V3
plot_ly(x=as.vector(axis_1),y=as.vector(axis_2),z=axis_3, type="scatter3d", mode="markers", name = "Obs", marker = list(size = 3)) %>%
add_trace(x=as.vector(axis_1),y=as.vector(axis_2),z=df_m2$labels, type = "mesh3d", name = "Preds")
But I am getting the following warnings:
2: In RColorBrewer::brewer.pal(N, "Set2") :
minimal value for n is 3, returning requested palette with 3 different levels
3: 'mesh3d' objects don't have these attributes: 'mode', 'marker'
Valid attributes include:
'type', 'visible', 'legendgroup', 'name', 'uid', 'ids', 'customdata', 'meta', 'hoverlabel', 'stream', 'uirevision', 'x', 'y', 'z', 'i', 'j', 'k', 'text', 'hovertext', 'hovertemplate', 'delaunayaxis', 'alphahull', 'intensity', 'intensitymode', 'color', 'vertexcolor', 'facecolor', 'cauto', 'cmin', 'cmax', 'cmid', 'colorscale', 'autocolorscale', 'reversescale', 'showscale', 'colorbar', 'coloraxis', 'opacity', 'flatshading', 'contour', 'lightposition', 'lighting', 'hoverinfo', 'showlegend', 'xcalendar', 'ycalendar', 'zcalendar', 'scene', 'idssrc', 'customdatasrc', 'metasrc', 'xsrc', 'ysrc', 'zsrc', 'isrc', 'jsrc', 'ksrc', 'textsrc', 'hovertextsrc', 'hovertemplatesrc', 'intensitysrc', 'vertexcolorsrc', 'facecolorsrc', 'hoverinfosrc', 'key', 'set', 'frame', 'transforms', '_isNestedKey', '_isSimpleKey', '_isGraticule', '_bbox'
A 3D plot is produced, but the 3D plane is completely gone.
Can someone please tell me what I am doing wrong?
I am also trying to make it so that when you move your mouse over a point, it displays the values of a$Sepal.Length, a$Sepal.Width, a$Petal.Length, a$Petal.Width and a$Species for that point.
Thanks
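When you called add_trace(), z was not assigned correctly. The labels won't plot; you need to plot the probabilities you computed instead, z=df_m2$pred. (This assumes the predicted probabilities have been stored in a pred column of df_m2, which the code in the question does not do.)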
There are multiple ways to fix the issues with the mesh plot, but the easiest would be to use add_mesh instead of add_trace.
plot_ly(x=as.vector(axis_1),
y=as.vector(axis_2),
z=axis_3,
type="scatter3d",
mode="markers",
name = "Obs",
marker = list(size = 3)) %>%
add_mesh(x=as.vector(axis_1),
y=as.vector(axis_2),
z=df_m2$pred,
type = "mesh3d",
name = "Preds")
I'm trying to manipulate the attributes of a subset of edges in an igraph object in R using edge_attr (or alternatively, set_edge_attr) based on certain criteria. For example, in the code below, I'm trying to double the age attribute of edges with weight = 1.
library(igraph)
nodes <- data.frame(name=c('1', '4', '5', '6', '8'))
edges <- data.frame(
from = c('1', '4', '5', '1', '8', '1'),
to = c('4', '5', '6', '8', '6', '6'),
weight = c(1, 1, 1.5, 1.5, 2.5, 5),
age=c(48, 33, 45, 34, 21, 56)
)
graph = graph_from_data_frame(d = edges, vertices = nodes, directed=FALSE)
edgeseq = E(graph)[[weight==1]]
newage <- edge_attr(graph, "age", index = edgeseq)*2
edge_attr(graph, "age", edgeseq) <- newage
#Alternatively:
graph <- set_edge_attr(graph, "age", index = edgeseq, value = newage)
However, this throws an error:
Error in `[[<-`(`*tmp*`, index, value = value) :
attempt to select more than one element in vectorIndex
The error does not occur when I set the attributes without an edge sequence.
Any help would be much appreciated!
The proper way to select edges is with single brackets:
edgeseq = E(graph)[weight==1]
Note the difference:
E(graph)[weight==1]
+ 2/6 edges from 33d7121 (vertex names):
[1] 1--4 4--5
E(graph)[[weight==1]]
+ 2/6 edges from 33d7121 (vertex names):
tail head tid hid weight age
1 1 4 1 2 1 96
2 4 5 2 3 1 66
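Putting this together with the code from the question, a minimal sketch of the whole update could look like the following. It assumes the igraph package is installed. For brevity it builds the graph from the edge list alone, and it assigns the result of set_edge_attr() back to graph, since that function returns a modified copy.

```r
library(igraph)

edges <- data.frame(
  from   = c('1', '4', '5', '1', '8', '1'),
  to     = c('4', '5', '6', '8', '6', '6'),
  weight = c(1, 1, 1.5, 1.5, 2.5, 5),
  age    = c(48, 33, 45, 34, 21, 56)
)
graph <- graph_from_data_frame(d = edges, directed = FALSE)

# Single brackets return a plain edge sequence that can be used as an index
edgeseq <- E(graph)[weight == 1]

# Double the age of the selected edges only
newage <- edge_attr(graph, "age", index = edgeseq) * 2
graph <- set_edge_attr(graph, "age", index = edgeseq, value = newage)

E(graph)$age  # 96 66 45 34 21 56
```

The replacement form, edge_attr(graph, "age", index = edgeseq) <- newage, should behave the same way once edgeseq is created with single brackets.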
I am trying to use a theme to conditionally set the element text based on an attribute, ActivitySort. If the value is 0, I would like to bold the text; otherwise I would like it to stay plain.
require("tidyverse")
task0 <- c('Strategy 1', 'Strategy 1', '2017-04-01', '2020-04-01',0, "Strategy")
task1 <- c('Strategy 1', 'Collect data', '2017-04-01', '2018-04-01',1, "In Progress")
task2 <- c('Strategy 1', 'Clean data', '2018-04-01', '2018-06-01', 1, "Completed")
task3 <- c('Strategy 1', 'Analyse data', '2018-06-01', '2019-04-01',1, "Discontinued")
task10 <- c('Strategy 2', 'Strategy 2', '2017-04-01', '2020-04-01',0, "Strategy")
task11 <- c('Strategy 2', 'Collect data again', '2017-04-01', '2018-04-01',1, "In Progress")
task12 <- c('Strategy 2', 'Clean data again', '2018-04-01', '2018-06-01', 1, "Completed")
task13 <- c('Strategy 2', 'Analyse data again', '2018-06-01', '2019-04-01',1, "Discontinued")
task14 <- c('Strategy 2', 'Write report again', '2019-04-01', '2020-04-01', 1, "Planned")
dataset <- as.data.frame(rbind(task0, task1, task2, task3,task10, task11, task12, task13, task14))
names(dataset) <- c('StrategyName', 'Activity', 'Start', 'End', 'ActivitySort', "Status")
dataset <- as_tibble(dataset)
dataset <- dataset %>% mutate(StartSort = as.Date(Start, "%Y-%m-%d" ))
dataset <- dataset %>% arrange(desc(StrategyName), desc(ActivitySort), desc(StartSort),Activity, End)
acts <- c("Strategy", "Completed","In Progress", "Discontinued","Planned")
actcols <- c("#000000","#548235", "#2E75B6", "#BF9000", "#7030A0")
els <-unique(dataset$Activity)
g.gantt <- gather(dataset, "state", "date", 3:4) %>% mutate(date = as.Date(date, "%Y-%m-%d" ), Status=factor(Status, acts[length(acts):1]), Activity=factor(Activity, els))
plot <- ggplot(g.gantt, aes(x = date, y = Activity, color = Status, group=Activity)) +
geom_line(size = 5) +
scale_color_manual(values=actcols, name="Status", breaks = acts, limits = acts) +
labs(x="Project year", y=NULL, title="Activity timeline")
plot <- plot + facet_grid(rows = vars(StrategyName), scales="free")
plot <- plot + theme(axis.text.y= element_text(face=ifelse((dataset$ActivitySort == 0),"bold","plain")))
plot
The code currently bolds the text using ifelse, but the results are not as expected. I want only the strategies (the items with black lines and a status of Strategy) to be bold. Currently only the second strategy is bold, whereas I expect both strategies to be bold.
Your problem lies in the fact that you're splitting your data over two facets and setting your y-axes to free, which means the facets don't share the same labels. With the ifelse statement, you're just passing on 9 TRUE or FALSE values that pertain to an outside data set. Your upper graph gets these 9 values (of which the first 4 are FALSE, so nothing is in bold face), while your lower graph gets the same 9 values, and since the 5th one is TRUE, Strategy 2 is in bold.
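To make the mismatch concrete, here is a small base R sketch of the vector that ifelse produces. The values of activity_sort are written out by hand in the row order that arrange() yields in the question (Strategy 2 rows first, then Strategy 1), so treat them as an illustration rather than output taken from the plot.

```r
# ActivitySort in the row order of the arranged dataset:
# four Strategy 2 tasks, Strategy 2 itself, three Strategy 1 tasks, Strategy 1 itself
activity_sort <- c(1, 1, 1, 1, 0, 1, 1, 1, 0)

# This is the vector that ends up in element_text(face = ...)
faces <- ifelse(activity_sort == 0, "bold", "plain")

faces[1:4]  # values available to the 4-label facet: all "plain"
faces[1:5]  # values available to the 5-label facet: the 5th is "bold"
```

The vector is indexed by position within each facet, not by the label text, which is why a label-based function is needed.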
You can easily check this by removing the scales="free" part from facet_grid. You should get something like this:
So you'll have to use a custom function that will bold each label based on its content.
I've modified the code from here.
bold_labels <- function(breaks) {
strategy <- filter(dataset, Activity %in% breaks) %>%
mutate(check = str_detect(Activity, "Strategy")) %>%
pull(check)
labels <- purrr::map2(
breaks, strategy,
~ if (.y) bquote(bold(.(.x))) else bquote(plain(.(.x)))
)
parse(text = labels)
}
plot <- ggplot(g.gantt, aes(x = date, y = Activity, color = Status, group=Activity)) +
geom_line(size = 5) +
scale_color_manual(values=actcols, name="Status", breaks = acts, limits = acts) +
labs(x="Project year", y=NULL, title="Activity timeline") +
scale_y_discrete(labels = bold_labels) +
facet_grid(rows = vars(StrategyName), scales="free_y")
plot
The result:
I convert a factor to numeric in my dataset as below:
library(plyr) # revalue() comes from plyr, not dplyr
df = data.frame(level= c( 'low', 'medium', 'high', 'very high'))
df$level = as.numeric(revalue(df$level, c('low' = 1, 'medium' =2, 'high'= 3, 'very high'=4)))
df
That works fine.
The problem arises when I try to apply this rule to a new dataset (I trained the model and want to predict new data):
newdude = data.frame(level = c( 'high'))
newdude$level = as.numeric(revalue(newdude$level, c('low' = 1, 'medium' =2, 'high'= 3, 'very high'=4)))
Error
The following `from` values were not present in `x`: low, medium, very high
> newdude
level
1 1
I get '1' instead of '3'
I cannot do, for example,
newdude$level = as.numeric(revalue(newdude$level, c( 'high'= 3)))
because I can not know in advance what value it will take
How to fix it?
Try instead
newdude = data.frame(level = factor('high', levels = c('low', 'medium', 'high', 'very high')))
newdude$level
[1] high
Levels: low medium high very high
as.numeric(newdude$level)
[1] 3
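The same idea can be applied to both the training data and the new data by defining the level order once. This is a minimal sketch in base R; level_order and encode_level are names made up for illustration.

```r
# One shared definition of the level order
level_order <- c('low', 'medium', 'high', 'very high')

# Hypothetical helper: encode a level column as 1..4 using the fixed level order
encode_level <- function(x) as.numeric(factor(x, levels = level_order))

df <- data.frame(level = c('low', 'medium', 'high', 'very high'))
newdude <- data.frame(level = c('high'))

df$level <- encode_level(df$level)            # 1 2 3 4
newdude$level <- encode_level(newdude$level)  # 3
```

Because the codes come from the fixed level order rather than from the values that happen to be present, a single-row data set is encoded the same way as the training data. A value that is not in level_order becomes NA.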
I recently switched from ggplot to rCharts and have a fairly simple question about the labels.
Sample data
data_1 <- data.table(Filter = c('Filter 1', 'Filter 2'),
Amount = c(100, 50))
data_2 <- data.table(Filter = c('Filter 1'),
Amount = c(100))
Plots
hPlot(Amount ~ Filter, data = data_1, type = 'bar', group.na = 'NA\'s')
hPlot(Amount ~ Filter, data = data_2, type = 'bar', group.na = 'NA\'s')
Question:
Why do we see the correct label in the first plot, but only the first letter of the label in the second plot? This issue always occurs when the number of rows is 1 (as it is in data_2).
Does anyone have a quick fix or workaround?