R: Order of points and lines within geom in ggplot2

I am trying to plot a dataframe in ggplot and am having trouble getting the points and lines to display in the desired order.
The data is split on the same column (a factor with levels 0 and 1), and I want 0 to plot over 1 for both the lines and the points (which use data from 4 other separate columns).
I have made a test data frame below to illustrate my point. My real data frame has thousands of points, and I want to plot a number of data frames, so I don't really want to use a workaround like subsetting my data and plotting it as separate layers/geoms.
library(ggplot2)
testdata <- data.frame(Split = c(rep(0,5), rep(1,5)), a = rep(1:5,2),
                       b = c(7,8,9,10,11,6,8,9,10,12), x = c(1:5, 1:5), y = c(1:3,5,6,1.1,2.1,4.1,5.1,7.1))
testdata$Split <- factor(testdata$Split)
ggplot(data = testdata) +
  geom_point(aes(x = x, y = y, colour = Split), size = 4) +
  geom_line(aes(x = a, y = b, colour = Split))
testdata$Split <- ordered(testdata$Split, levels = rev(levels(testdata$Split)))
When I run the line of code that reverses the order of my levels, it swaps which of my lines is brought to the front, but not which set of points. Initially both the points and the line for Split = 0 are behind; after I reverse the order, the line for Split = 0 is in front (what I want), but the points for Split = 0 remain behind the points for Split = 1.
Any idea what's going on here and how I can get this to work would be appreciated.
Thanks

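After investigating the situation for some time, this is what I found and suggest. Two things control which data end up on top: geom_point() draws the points in the order their rows appear in the data frame, while geom_line() draws the lines in the order of the factor levels. In short, I believe the solution is to put the rows for 0 last in the data frame and to make 0 the last factor level (so that it has the value 2 in unclass()).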
library(dplyr)    # for arrange() and desc()
library(ggplot2)
foo <- data.frame(split = rep(c("0", "1"), each = 5),
                  a = rep(1:5,2),
                  b = c(7,8,9,10,11,6,8,9,10,12),
                  x = c(1:5, 1:5),
                  y = c(1:3,5,6,1.1,2.1,4.1,5.1,7.1),
                  stringsAsFactors = FALSE)
First, I sorted the rows so that those for split 0 come last, and then converted split to a factor:
foo <- arrange(foo, desc(split))
foo$split <- as.factor(foo$split)
#> str(foo)
#'data.frame': 10 obs. of 5 variables:
# $ split: Factor w/ 2 levels "0","1": 2 2 2 2 2 1 1 1 1 1
# $ a : int 1 2 3 4 5 1 2 3 4 5
# $ b : num 6 8 9 10 12 7 8 9 10 11
# $ x : int 1 2 3 4 5 1 2 3 4 5
# $ y : num 1.1 2.1 4.1 5.1 7.1 1 2 3 5 6
At this point the rows for 0 come last, but 0 still has the value 1 in unclass(), because "0" is the first level:
#> unclass(foo$split)
# [1] 2 2 2 2 2 1 1 1 1 1
#attr(,"levels")
#[1] "0" "1"
Now I run the following. q (for points) has the ideal outcome. But q2 (for lines) does not.
q <- ggplot(data = foo, aes(x = x, y = y, colour = split))+
geom_point(size = 6)
q2 <- ggplot(data = foo, aes(x = a, y = b, colour = split))+
geom_line()
So I reversed the factor order to see what happens.
### Reorder the factor levels.
foo$split <- ordered(foo$split, rev(levels(foo$split)))
#> str(foo)
#'data.frame': 10 obs. of 5 variables:
#$ split: Ord.factor w/ 2 levels "1"<"0": 1 1 1 1 1 2 2 2 2 2
#$ a : int 1 2 3 4 5 1 2 3 4 5
#$ b : num 6 8 9 10 12 7 8 9 10 11
#$ x : int 1 2 3 4 5 1 2 3 4 5
#$ y : num 1.1 2.1 4.1 5.1 7.1 1 2 3 5 6
#> unclass(foo$split)
#[1] 1 1 1 1 1 2 2 2 2 2
#attr(,"levels")
#[1] "1" "0"
Now both q3 and q4 give the correct result:
q3 <- ggplot(data = foo, aes(x = x, y = y, colour = split))+
geom_point(size = 6)
q4 <- ggplot(data = foo, aes(x = a, y = b, colour = split))+
geom_line()
So here is the final form:
ggplot(data = foo) +
  geom_point(aes(x = x, y = y, colour = split), size = 6) +
  geom_line(aes(x = a, y = b, colour = split))
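For comparison, the same two steps can be sketched in base R (no dplyr), applied to the question's testdata. It relies on the behaviour the experiments above show: geom_point() draws points in row order, while geom_line() draws one line per group in factor-level order, so Split 0 has to come last in both. The plotting part only runs if ggplot2 is installed; this is a sketch, not the answerer's code.

```r
testdata <- data.frame(Split = c(rep(0, 5), rep(1, 5)), a = rep(1:5, 2),
                       b = c(7, 8, 9, 10, 11, 6, 8, 9, 10, 12),
                       x = c(1:5, 1:5), y = c(1:3, 5, 6, 1.1, 2.1, 4.1, 5.1, 7.1))

# Rows for Split == 0 go last, so their points are drawn last (on top).
# order() on a logical puts FALSE before TRUE and keeps ties in their original order.
testdata <- testdata[order(testdata$Split == 0), ]

# "0" becomes the last level, so its line is drawn last (on top).
testdata$Split <- factor(testdata$Split, levels = c("1", "0"))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(testdata) +
    geom_point(aes(x = x, y = y, colour = Split), size = 4) +
    geom_line(aes(x = a, y = b, colour = Split))
  # print(p) to draw it
}
```

Because the row order and the level order are set independently, the same two lines work for any number of data frames without splitting them into separate layers.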

Related

Is there a methodology to assign integer values to factors in R

I am quite new to R, but was wondering if there is a specific way to group/analyze integer values from my data frame, for example:
Sample X : int 1 2 3 4 5
Sample Y : int 6 7 8 9 10
Sample Z : int 11 12 13 14 15
and assign these to my factor variable, which has the corresponding number of levels (5 in this example), called lvl 1, lvl 2, lvl 3, lvl 4 and lvl 5. The goal is to be able to graph the observations at each level: for example, lvl 1 has the observations 1, 6 and 11, lvl 2 has 2, 7 and 12, and so on.
I've found no clean way to do this. My other attempts have included individually typing out the name of each sample and manually linking it to the factor levels, but that has not gone well.
Any advice would be appreciated!
If I understood correctly, you want to have each x, y and z observations associated with a level and plot by level.
library(ggplot2)
library(reshape2)
df = data.frame(x = 1:5, y = 6:10, z = 11:15)
df$level = factor(paste0("lvl",1:5))
df
# x y z level
# 1 1 6 11 lvl1
# 2 2 7 12 lvl2
# 3 3 8 13 lvl3
# 4 4 9 14 lvl4
# 5 5 10 15 lvl5
It's easier to plot long-format data with the ggplot2 package. I use reshape2::melt here, but you could find an equivalent solution with tidyr::pivot_longer.
df <- reshape2::melt(df, id.vars = "level")
df
level variable value
1 lvl1 x 1
2 lvl2 x 2
3 lvl3 x 3
4 lvl4 x 4
5 lvl5 x 5
6 lvl1 y 6
7 lvl2 y 7
8 lvl3 y 8
9 lvl4 y 9
10 lvl5 y 10
11 lvl1 z 11
12 lvl2 z 12
13 lvl3 z 13
14 lvl4 z 14
15 lvl5 z 15
Finally, you can plot. Let's say you want points for each level:
ggplot(df, aes(x = level, y = value)) + geom_point()
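If you would rather not load reshape2, base R's stack() produces the same long format. This is a sketch; the column names value and variable are chosen to match the melt() output above.

```r
df <- data.frame(x = 1:5, y = 6:10, z = 11:15)
df$level <- factor(paste0("lvl", 1:5))

# stack() turns x, y and z into one 'values' column plus an 'ind' column naming
# the source column; level is repeated once for each stacked column.
long <- data.frame(level = rep(df$level, times = 3), stack(df[c("x", "y", "z")]))
names(long) <- c("level", "value", "variable")
```

The result can then be plotted with the same ggplot(long, aes(x = level, y = value)) + geom_point() call. With tidyr, tidyr::pivot_longer(df, -level, names_to = "variable", values_to = "value") gives an equivalent table.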

Creating many line plots using positions as a function of time

I am trying to make a single plot of the trajectory of many particles from a Brownian Motion experiment.
There are five measurements per particle for each of the x and y components of position, 10 in total.
I have the data in several data structures, as I don't know which is most useful for what I want to achieve:
1. All within a single data frame, with my 5 time measurements of x for the 16 particles measured, followed by the 16 for the y component.
2. In two separate data frames, one for the x component and one for the y component.
I have tried to use rbind to create a single array that I can pass to geom_line(), but this gives one single line in which all the particle trajectories are connected to one another.
How could I go about drawing these as separate lines, all within one x-y plane? Thanks
The easiest way to achieve this is to have 3 columns, one for the common x component, one for the y, and one for the particle. To get this you'll need to convert your data to long format:
> df <- data.frame(t=c(1,2,3,4,5), x.1 = c(-1,1,3,4,5), x.2 = c(5,2,1,4,6))
> df
t x.1 x.2
1 1 -1 5
2 2 1 2
3 3 3 1
4 4 4 4
5 5 5 6
> (df <- tidyr::gather(df, "particle", "y", -t))
t particle y
1 1 x.1 -1
2 2 x.1 1
3 3 x.1 3
4 4 x.1 4
5 5 x.1 5
6 1 x.2 5
7 2 x.2 2
8 3 x.2 1
9 4 x.2 4
10 5 x.2 6
Then use the group aesthetic in geom_line() to plot them separately:
ggplot(df, aes(x = t, y = y)) + geom_line(aes(group = particle, color = particle))
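For the asker's second layout (separate x and y data frames), the long format can also be built in base R. This sketch assumes a layout the question only describes: one row per time point, one column per particle, with the same column order in both data frames. The names xs, ys, p1 and p2 and all the values are made up for illustration.

```r
# Hypothetical layout: one row per time point, one column per particle.
xs <- data.frame(p1 = c(0, 1, 2, 3, 4), p2 = c(0, -1, -1, 0, 2))
ys <- data.frame(p1 = c(0, 0, 1, 1, 2), p2 = c(0, 1, 3, 2, 2))
stopifnot(identical(names(xs), names(ys)), nrow(xs) == nrow(ys))

# unlist() reads a data frame column by column (all of p1, then all of p2),
# so time and particle are repeated in that same order.
long <- data.frame(
  time     = rep(seq_len(nrow(xs)), times = ncol(xs)),
  particle = factor(rep(names(xs), each = nrow(xs))),
  x        = unlist(xs, use.names = FALSE),
  y        = unlist(ys, use.names = FALSE)
)
```

Since x and y here are both positions, ggplot(long, aes(x = x, y = y, colour = particle)) + geom_path() draws one trajectory per particle. geom_path() joins points in row (time) order, whereas geom_line() would reorder them along the x axis.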
First, you need your data in this format (data.table() comes from the data.table package):
library(data.table)
data <- data.table(particle = as.factor(rep(1:3, each = 5)),
                   x = sample(-10:10, 15, replace = TRUE),
                   y = sample(-10:10, 15, replace = TRUE))
data
particle x y
1: 1 -8 -4
2: 1 -5 -2
3: 1 -1 -5
4: 1 -3 9
5: 1 4 -7
6: 2 2 1
7: 2 -8 -10
8: 2 -4 -8
9: 2 -6 -4
10: 2 -8 -3
11: 3 -10 10
12: 3 6 -5
13: 3 -5 -6
14: 3 -6 8
15: 3 1 -4
One column identifies the particle, and the other two hold its x and y coordinates.
This link might help you changing your data: http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/
Then just plot, grouping by particle (via the colour aesthetic):
library(ggplot2)
ggplot(data = data,
       aes(x = x, y = y, color = particle)) +
  geom_path(linewidth = 3)   # use size = 3 on ggplot2 versions before 3.4.0
geom_path() connects the points in the order of the rows, so to control the order along each path, add a time column and sort the data by it.
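A minimal base-R sketch of that sorting step, using a small made-up trajectory table whose rows are stored out of time order (the column name time is an assumption; the answer's data has no time column yet):

```r
# Illustrative rows stored out of time order (made-up values).
traj <- data.frame(particle = factor(c(1, 1, 1, 2, 2, 2)),
                   time     = c(3, 1, 2, 2, 3, 1),
                   x        = c(5, 0, 2, 1, 4, -1),
                   y        = c(1, 0, 3, 2, 2, 0))

# geom_path() joins points in row order, so sort by particle, then by time.
traj <- traj[order(traj$particle, traj$time), ]
```

With a data.table such as the one in this answer, data.table::setorder(data, particle, time) does the same sort in place once a time column exists.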

Function in R that creates dummy variables if a condition is met

I am looking to create a function that will convert any factor variable with more than 4 levels into dummy variables. The dataset has ~2311 columns, so I really need a function. Your help would be immensely appreciated.
I have written the code below and was hoping to get it to work.
library(dummies)
# example function
for(i in names(Final_Dataset)){
  if(count (Final_Dataset[i])>4){
    y <- Final_Dataset[i]
    Final_Dataset <- cbind(Final_Dataset, dummy(y, sep = "_"))
  }
}
I was also considering an alternative approach: first get the numbers of all the columns that need dummy variables, then loop through all the columns and, if a column's number is in that set, create dummy variables from it.
Example data
fct = data.frame(a = as.factor(letters[1:10]), b = 1:10, c = as.factor(sample(letters[1:4], 10, replace = TRUE)), d = as.factor(letters[10:19]))
str(fct)
'data.frame': 10 obs. of 4 variables:
$ a: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10
$ b: int 1 2 3 4 5 6 7 8 9 10
$ c: Factor w/ 4 levels "a","b","c","d": 2 4 1 3 1 1 2 3 1 2
$ d: Factor w/ 10 levels "j","k","l","m",..: 1 2 3 4 5 6 7 8 9 10
# keep factor columns with more than 4 levels
fact_cols = sapply(fct, function(x) is.factor(x) && length(levels(x)) > 4)
# create dummy variables for the subset (omit intercept);
# drop = FALSE keeps a data frame even when only one column qualifies
dummy_cols = model.matrix(~. -1, fct[, fact_cols, drop = FALSE])
# cbind new data
out_df = cbind(fct[, !fact_cols, drop = FALSE], dummy_cols)
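One caveat with ~ . - 1: only the first factor in the formula gets a column for every level; each later factor still uses treatment contrasts and loses its first level. A common way to get a full set of 0/1 columns for every factor is to pass contrasts = FALSE through contrasts.arg, sketched here on deterministic example data (c is built with rep() instead of sample() so the result is reproducible):

```r
fct <- data.frame(a = factor(letters[1:10]), b = 1:10,
                  c = factor(rep(letters[1:4], length.out = 10)),
                  d = factor(letters[10:19]))

# TRUE for factor columns with more than 4 levels (here a and d).
fact_cols <- sapply(fct, function(x) is.factor(x) && nlevels(x) > 4)
to_dummy <- fct[, fact_cols, drop = FALSE]

# contrasts(x, contrasts = FALSE) is an identity matrix over the levels of x,
# so every selected factor gets one indicator column per level.
dummy_cols <- model.matrix(~ . - 1, data = to_dummy,
                           contrasts.arg = lapply(to_dummy, contrasts, contrasts = FALSE))

out_df <- cbind(fct[, !fact_cols, drop = FALSE], dummy_cols)
```

Here a and d each contribute 10 columns, so every row of dummy_cols sums to 2 (one indicator per factor).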
You could get all the columns with more than a given number of levels (here n = 4) with something like
n <- 4
which(sapply(Final_Dataset, function (c) length(levels(c)) > n))

Rearranging Numeric Axis

I'm trying to plot data and running into an issue with a numeric axis. It should be plotted in order:
1, 2, 3, 4, 5... 22, X, Y
Instead it's plotting like this:
1, 10, 11, 12... 2, 22, 3, 4..., X, Y
I've tried changing the column in question with as.character, as.factor, as.numeric. I've also checked out a few "rearrange" suggestions, but they all deal with the observations themselves, and not the axis.
What am I overlooking?
Here is a sample of the data:
Chr Chunk A B C
1 1 3 4 4
1 2 3 4 4
1 3 3 2 4
1 4 3 4 9
2 1 3 3 4
2 2 3 4 4
2 3 3 4 4
10 1 3 4 4
10 2 3 4 4
X 1 3 4 5
X 2 3 4 8
Y 1 3 4 5
I'm attempting to make a series of heat plots using ggplot:
heat <- ggplot(data, aes(Chr, Chunk, fill = A, label = sprintf("", A))) + geom_tile() + geom_text() + scale_fill_gradient2(high = "red")
Since you’re dealing with character data, ggplot will simply sort the values for plotting, and character strings are sorted lexicographically, so '10' comes before '2'. To control the order, convert the column to an ordered factor. This means spelling out the order yourself (but in your case that order isn’t too hard to write down):
data$Chr = factor(data$Chr, levels = c(1 : 22, 'X', 'Y'), ordered = TRUE)
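If typing the levels out is inconvenient, they can also be computed. This base-R sketch assumes that every purely numeric label should be sorted as a number and that all remaining labels (X, Y) go after them. The example vector chr is illustrative; with real data, use unique(data$Chr) instead. For the axis order, a plain factor is enough; it does not have to be ordered.

```r
chr <- c("1", "2", "10", "X", "2", "Y", "22", "10")

u <- unique(chr)
is_num <- grepl("^[0-9]+$", u)     # TRUE for purely numeric labels
# Numeric labels sorted as numbers, then the others (X, Y) alphabetically;
# c() coerces the integers back to character.
lev <- c(sort(as.integer(u[is_num])), sort(u[!is_num]))

chr_f <- factor(chr, levels = lev)
```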

Apply function to all possible values of a variable

I would like to get as many plots as factors/values in a variable.
For example, I would like to plot the following variables (v1, v2, v3, v4, v5, v6, v7, v8), which I have defined as a scale, for every value of the variable country. In this case, that gives a total of three different plots.
I know how to plot each one separately; in this case I would use the following:
basicgraph(Data[country==1, scale1] )
basicgraph(Data[country==2, scale1] )
basicgraph(Data[country==3, scale1] )
I would like my function to plot as many graphs as there are factors/values (without specifying their number). I have tried with "apply", but I can't really make it work, so any hint would help.
I have a dataset that looks like:
v1 v2 v3 v4 v5 v6 v7 v8 country
1 NA NA NA NA NA NA NA NA 1
2 5 5 5 5 5 4 5 5 2
3 4 5 3 5 4 5 5 5 3
4 5 5 5 4 2 4 4 5 1
5 4 3 5 4 4 5 4 5 2
6 5 5 5 2 3 4 3 5 3
7 NA NA NA NA NA NA NA NA 1
8 3 5 5 5 4 5 4 4 2
9 4 5 5 4 5 5 4 5 3
10 2 4 4 5 4 5 4 5 1
11 4 5 5 3 4 4 4 5 2
12 4 5 4 4 5 4 4 5 3
13 5 5 4 3 3 5 5 5 1
14 3 5 1 2 3 1 4 5 2
I have defined the scale as:
scale1 <- names(Data) %in% c( "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8")
I have defined a plot function by:
basicgraph <- function(df, title, lab)
{
  for(i in 1:length(df))
  {
    y <- melt(df)
    z <- with(y, as.data.frame(table(variable, value, exclude = NULL)))
    z <- z[!is.na(z$variable), ]
    z$scale <- z$variable
    levelss <- levels(z$variable)
  }
  theme_nogrid <- function (base_size = 12, base_family = "")
  {
    theme_bw(base_size = base_size, base_family = base_family) %+replace%
      theme(panel.grid = element_blank()) +
      theme(axis.text.x = element_text(size = base_size * 0.8, lineheight = 0.9,
                                       vjust = 0.5, hjust = 1, angle = 90))
  }
  plot1 <- function(z) {
    ggplot(data = z, aes(x = variable, y = value, size = Freq)) +
      geom_point(aes(size = Freq), stat = "identity", position = "identity",
                 shape = 20, color = "black", alpha = 0.6) +
      scale_size_continuous(range = c(3,15)) +
      scale_x_discrete(breaks = levelss, labels = lab) +
      xlab("") +            # add/change x-axis title
      ylab("Response") +    # add/change y-axis title
      ggtitle(title) +      # title at the top
      theme_nogrid()
  }
}
This is a pretty confusing question and example. I think you want to produce a different graph for each country value? In that case I'd suggest something like this:
library(reshape2)
library(ggplot2)
Data_m <- melt(Data, id.vars="country") # melt the data into 'long' format
f <- function(d) { # function that produces a graph and waits
  print(ggplot(d, aes(variable, value)) + geom_point() + ggtitle(unique(d$country)))
  readline()
}
library(plyr)
d_ply(Data_m, .(country), f) # produces three separate graphs
The d_ply call splits Data_m into one part per country (three here) and calls f on each, producing a graph of that subset of the data without knowing anything about the data being graphed.
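Since the question asked about the apply family, the same idea can be sketched with base R's split() and lapply() instead of plyr. The data below are the first six rows of the question's Data, v1 to v3 only, to keep the sketch short. The long format is built by hand to mimic melt(), and the ggplot2 part only runs if the package is installed.

```r
# First six rows of the question's Data, v1-v3 only.
Data <- data.frame(v1 = c(NA, 5, 4, 5, 4, 5),
                   v2 = c(NA, 5, 5, 5, 3, 5),
                   v3 = c(NA, 5, 3, 5, 5, 5),
                   country = c(1, 2, 3, 1, 2, 3))

vars <- setdiff(names(Data), "country")
# Long format: one row per (respondent, variable), like melt(Data, id.vars = "country").
Data_m <- data.frame(country  = rep(Data$country, times = length(vars)),
                     variable = factor(rep(vars, each = nrow(Data)), levels = vars),
                     value    = unlist(Data[vars], use.names = FALSE))

# One data frame per country; the number of countries is never hard-coded.
by_country <- split(Data_m, Data_m$country)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  plots <- lapply(by_country, function(d) {
    ggplot2::ggplot(d, ggplot2::aes(x = variable, y = value)) +
      ggplot2::geom_point(na.rm = TRUE) +
      ggplot2::ggtitle(paste("Country", d$country[1]))
  })
  # for (p in plots) print(p)   # draw the plots one after another
}
```

Returning the plots as a list, rather than printing inside the function, leaves the caller free to print them, save them or arrange them later.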
