Graphing 3 Variable Scatterplot R - r

I imported some data from Islander and am trying to graph something with 3 variables. I'm thinking of trying to graph 2 numeric variables with a nominal category (gender). The plot I'm trying to do therefore is a regular scatterplot, but color-coded.
I looked at this starter tutorial on R: Scatterplots, but didn't see any mention of 3 variable plotting.
http://www.laptopmag.com/articles/ssd-upgrade-tutorial
Can anyone help me out? My variables hold values pertaining to number of balls bounced, minutes of physical activity per week, and gender.

Since gender is a binary variable (usually, otherwise ternary), I would plot a 2D scatterplot with color encoding the gender.
Dummy data:
a = data.frame(x=runif(100), y = runif(100)+2, group = round(runif(100))+1 )
Now I would plot y against x using a$group to select the color:
plot(a$x, a$y, pch = 16, col = c('cornflowerblue', 'springgreen')[a$group])
Output:
If you do have missingness I would add a third group to the color vector.
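A minimal sketch of that third-group idea, using made-up data in which some group values are NA. The seed, the name `group_idx`, and the grey colour are assumptions chosen for illustration only:

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Dummy data as above, but with some missing group values
a <- data.frame(x = runif(100),
                y = runif(100) + 2,
                group = sample(c(1, 2, NA), 100, replace = TRUE))

# Map NA to a third index so missing values get their own colour
group_idx <- ifelse(is.na(a$group), 3, a$group)
colours3 <- c("cornflowerblue", "springgreen", "grey60")

plot(a$x, a$y, pch = 16, col = colours3[group_idx], xlab = "x", ylab = "y")
```

Without the remapping, indexing the colour vector with NA yields NA colours, and those points are simply not drawn.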
Here is a bunch of other solutions for 2D scatter with color

Related

R - 2 covariate explanatory variables and a third categorical variable I need to colour code the data with

In R, I am using the command plot(Strength, Weight, col= Area) to plot a scatterplot, with Weight as the explanatory numerical variable, and Area as the categorical explanatory variable, and Strength as the response.
There are, say, 6 areas, 1-6, but how can I tell which colour is associated with which area?
The scatterplot is coming out fine, but I can't tell which area the 6 colours on the scatterplot belong to.
You need to add a legend to your plot, see for instance https://www.geeksforgeeks.org/add-legend-to-plot-in-r/
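As a rough sketch of the base R route, assuming Area is stored as a factor with six levels (the data below are invented stand-ins for the asker's variables):

```r
set.seed(1)  # arbitrary seed for the made-up data

# Invented stand-ins for the asker's variables
Area <- factor(sample(1:6, 60, replace = TRUE), levels = 1:6)
Weight <- runif(60, 50, 100)
Strength <- 2 * Weight + rnorm(60, sd = 10)

# With a factor, col = Area uses the integer codes 1..6 as indices into palette()
plot(Weight, Strength, col = Area, pch = 16)

# Reuse the same indices in the legend so colours and labels line up
legend("topleft", legend = levels(Area),
       col = seq_along(levels(Area)), pch = 16, title = "Area")
```

The key point is that the legend's `col` must use the same colour indices that `plot()` used for the points.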
But it will be easier to use the package ggplot2, which makes a legend for you, automatically. Something like, assuming your variables are in data frame yourdata :
library(ggplot2)
# factor() makes ggplot2 treat Area as discrete, so the legend lists each area
ggplot(yourdata, aes(Strength, Weight, color = factor(Area))) +
  geom_point()
Learning ggplot2 (gg is "grammar of graphics") will save you time in the long run!

visualize relationship between categorical variable and frequency of other variable in one graph?

How, in R, can I make a histogram with a categorical variable on the x-axis and the frequency of a continuous variable on the y-axis?
Is this correct?
There are a couple of ways one could interpret "one graph" in the title of the question. That said, using the ggplot2 package, there are at least a couple of ways to render histograms by group on a single page of results.
First, we'll create a data frame that contains a normally distributed random variable with a mean of 100 and a standard deviation of 20. We also include a group variable that has one of four values, A, B, C, or D.
library(ggplot2)
set.seed(950141237) # for reproducibility of results
df <- data.frame(group = rep(c("A","B","C","D"),200),
                 y_value = rnorm(800,mean=100,sd = 20))
The resulting data frame has 800 rows of randomly generated values from a normal distribution, assigned into 4 groups of 200 observations.
Next, we will render this in ggplot2::ggplot() as a histogram, where the color of the bars is based on the value of group.
ggplot(data = df,aes(x = y_value, fill = group)) + geom_histogram()
...and the resulting chart looks like this:
In this style of histogram the values from each group are stacked atop each other (i.e. the frequency of group A is added to B, etc. before rendering the chart), which might not be what the original poster intended.
We can verify the "stacking" behavior by removing the fill = group argument from aes().
# verify the stacking behavior
ggplot(data = df,aes(x = y_value)) + geom_histogram()
...and the output, which looks just like the first chart, but drawn in a single color.
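If stacking is not wanted, one option (not part of the original answer) is to overlay the groups instead, using `position = "identity"` with some transparency. A minimal sketch, where the `alpha` value and the explicit `bins = 30` are arbitrary choices:

```r
library(ggplot2)

set.seed(950141237)  # same seed as above, for reproducibility
df <- data.frame(group = rep(c("A", "B", "C", "D"), 200),
                 y_value = rnorm(800, mean = 100, sd = 20))

# position = "identity" draws each group's bars from zero instead of stacking;
# alpha makes the overlapping bars partly transparent
p_overlay <- ggplot(df, aes(x = y_value, fill = group)) +
  geom_histogram(position = "identity", alpha = 0.4, bins = 30)
p_overlay
```

With four overlapping groups this can get hard to read, which is one reason to prefer separate panels.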
Another way to render the data is to use group with facet_wrap(), where each distribution appears in a different facet on one chart.
ggplot(data = df,aes(x = y_value)) + geom_histogram() + facet_wrap(~group)
The resulting chart looks like this:
The facet approach makes it easier to see differences in frequency of y values between the groups.

How to color a plot with ggplot2 according to 2 variables?

I want to plot change in weight over time for 9 different groups and I have colored the plot adding color=Group in ggplot.
Now I have another column with information about which box each animal was located in (number of boxes = 3), and I want to add this information to the plot too.
Ideally, I would like to see the lines colored by different shades of 3 colors. How can I do this in R?
library(readxl)   # provides read_excel()
library(ggplot2)
bcs_avg <- read_excel("C:/BCS1.xlsx",sheet = 2,col_names = TRUE)
pl <- ggplot(bcs_avg,aes(x=Month, y=Average,color=factor(Group)))+
  geom_line()

Plot Line Chart of Binary Variable Against Continuous Data

I am looking for a way to better visualize the relationship between an independent continuous variable and a binary response variable.
I am trying to understand how I can add a 2nd y axis to the existing plot I have below. I want to get a sense of the response rate over different numerical ranges visually.
How can I add in the response percent at any given histogram bin? For example if there were 10 observations in a bin and 2 were the positive class, then this would show a response of 20%.
Ideally it's possible that this would be dynamic in that I might change the # of bins. For instance, I have 10 here, I might want 20 the next time.
This would be a connected line chart of those per-bin percentages, plotted against the right y axis.
In other words, I want the share of the positive class in each bin drawn as a line, with the percentage shown on the Y axis.
library(mlbench)
library(tidyverse)
data(Sonar) ## from mlbench
library(ggplot2)
ggplot(Sonar, aes(x=V11, fill=Class)) +
  geom_histogram(col='black', bins = 10) +
  scale_fill_manual(values=c("purple", "green")) +
  labs(title = "Count Left Y Axis; 'R' class percent of BIN in Right Y Axis" ,
       x = 'Variable Value in this case V11', y ='Count of Observations' )
Not sure if this is what you are after but the description you gave sounded very similar to a conditional density plot.
ggplot probably has an alternative to this, but with base R:
cdplot(Class ~ V1, Sonar, col=c("cornflowerblue", "orange"), main="Conditional density plot")
And the result:
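The per-bin response rate described in the question (for example, 2 positives out of 10 observations giving 20%) can also be computed directly in base R. The sketch below uses a hypothetical helper, `bin_response_rate()`, and a tiny made-up data set rather than Sonar; the number of bins is a parameter, so it can be changed from 10 to 20 without touching anything else:

```r
# Hypothetical helper: percentage of positive responses within
# equal-width bins of a continuous variable x
bin_response_rate <- function(x, is_positive, bins = 10) {
  breaks <- seq(min(x), max(x), length.out = bins + 1)
  bin <- cut(x, breaks = breaks, include.lowest = TRUE)
  data.frame(
    bin = levels(bin),
    mid = (head(breaks, -1) + tail(breaks, -1)) / 2,   # bin midpoints
    n = as.vector(table(bin)),                         # observations per bin
    pct_positive = 100 * as.vector(tapply(is_positive, bin, mean))
  )
}

# Made-up data: 10 low values with 2 positives, 10 high values all positive
x <- c(rep(0.05, 10), rep(0.95, 10))
is_positive <- c(TRUE, TRUE, rep(FALSE, 8), rep(TRUE, 10))

rates <- bin_response_rate(x, is_positive, bins = 2)
rates

# Connected line chart of the response rate per bin
plot(rates$mid, rates$pct_positive, type = "b", ylim = c(0, 100),
     xlab = "x (bin midpoint)", ylab = "% positive")
```

Empty bins come out as NA and show up as gaps in the line. Overlaying this on the histogram with a second axis is a separate step that is not shown here.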

R: Plot interaction between categorial Factor and continuous Variable on DV

What I have is a 3-Levels Repeated Measures Factor and a continuous variable (Scores in psychological questionnaire, measured only once pre-experiment, NEO), which showed significant interaction together in a Linear Mixed Effects Model with a Dependent Variable (DV; State-Scores measured at each time level, IAS).
To see the nature of this interaction, I would like to create a plot with time levels on X-Axis, State-Score on Y-Axis and multiple curves for the continuous variable, similar to this. The continuous variable should be categorized in, say quartiles (so I get 4 different curves), which is exactly what I can't achieve. Until now I get a separate curve for each value in the continuous variable.
My goal is also comparable to this, but I need the categorial (time) variable not as separate curves but on the X-Axis.
I tried a lot of different plot functions in R but didn't manage to get what I want, maybe because I am not very skilled with R.
For example,
ggplot(Data_long, aes(x = time, y = IAS, colour = NEO, group = NEO)) +
  geom_line()
from the first link shows me dozens of curves (one for each value in the measurement NEO), and I can't find how to group continuous variables in a meaningful way in that ggplot function.
Edit:
Original Data:
http://www.pastebin.ca/2598926
(I hope it is not too inconvenient.)
This object (Data_long) was created/converted with the following line:
Data_long <- transform(Data_long0, neo.binned=cut(NEO,c(25,38,46,55,73),labels=c(".25",".50",".75","1.00")))
Every value in the neo.binned col seems to be set correctly with enough cases per quantile.
What I then tried and didn't work:
ggplot(Data_long, aes(x = time, y = ias, color = neo.binned)) + stat_summary(fun.y="median",geom="line")
geom_path: Each group consist of only one observation. Do you need to adjust the group aesthetic?
I have 92 subjects and values for NEO between 26 and 73. Any hints on what to enter for the cut and labels arguments? The quantiles are 0% = 26, 25% = 38, 50% = 46, 75% = 55, 100% = 73.
Do you mean something like this? Here, your data is binned according to NEO into three classes, and then the median of IAS over these bins is drawn. Check out ?cut.
Data_long <- transform(Data_long, neo.binned=cut(NEO,c(0,3,7,10),labels=c("lo","med","hi")))
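The breaks above are only placeholders; for NEO values between 26 and 73 they would need to be adapted. One way to get quartile bins without hard-coding the numbers is to pass the output of `quantile()` to `cut()`. A minimal sketch on made-up data (`Data_demo`, the seed, and the labels "Q1" to "Q4" are assumptions, not the asker's data):

```r
set.seed(7)  # arbitrary seed for the made-up data

# Invented stand-in for the long-format data: 92 subjects, 3 time points each
Data_demo <- data.frame(
  subject = rep(1:92, each = 3),
  time    = factor(rep(c("t1", "t2", "t3"), times = 92)),
  NEO     = rep(sample(26:73, 92, replace = TRUE), each = 3),
  IAS     = rnorm(92 * 3, mean = 50, sd = 10)
)

# Quartile break points taken from the data itself
breaks <- quantile(Data_demo$NEO, probs = seq(0, 1, 0.25))

# include.lowest = TRUE keeps the minimum value in the first bin
# (otherwise it falls outside the left-open interval and becomes NA)
Data_demo$neo.binned <- cut(Data_demo$NEO, breaks = breaks,
                            labels = c("Q1", "Q2", "Q3", "Q4"),
                            include.lowest = TRUE)

table(Data_demo$neo.binned)
```

The resulting factor can then be used as the colour or grouping variable in a plot.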
Plot everything in one plot.
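# The "+" must end the line; a line that starts with "+" is parsed as a new statement.
# fun = replaces fun.y, which is deprecated since ggplot2 3.3.0.
ggplot(Data_long, aes(x = time, y = IAS, color = neo.binned)) +
  stat_summary(aes(group=neo.binned),fun="median",geom="line")
And, borrowing from CMichael's answer, you can also split it across multiple panels (you linked to faceted plots in your question). Here group = 1 keeps the points within each panel connected as a single line:
ggplot(Data_long,aes(x=time,y=IAS)) +
  stat_summary(aes(group=1),fun="median",geom="line") +
  facet_grid(neo.binned ~ .)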
Do you mean faceting @ziggystar's initial plot?
quantiles = quantile(Data_long$NEO,c(0.25,0.5,0.75))
Data_long$NEOQuantile = ifelse(Data_long$NEO<=quantiles[1],"first NEO Quantile",
                          ifelse(Data_long$NEO<=quantiles[2],
                                 "second NEO Quantile",
                                 ifelse(Data_long$NEO<=quantiles[3],
                                        "third NEO Quantile","fourth NEO Quantile")))
require(ggplot2)
p = ggplot(Data_long,aes(x=time,y=IAS)) + stat_quantile(quantiles=c(1),formula=y ~ x)
p = p + facet_grid(.~NEOQuantile)
p
