How do I change the colours on my 3D categorical data plot?

I have categorical data on 3 axes (X1, X2, X3) in df mapping whether something is Y = "Red" or Y = "Green".
> df
# A tibble: 6 x 7
Obs. X1 X2 X3 Y dist Y1
<chr> <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
1 1 0 3 0 R 3 0
2 2 2 0 0 R 2 0
3 3 0 1 3 R 3.16 0
4 4 0 1 2 G 2.24 1
5 5 -1 0 1 G 1.41 1
6 6 1 1 1 R 1.73 0
I followed the procedure here, but I am getting a white spot (presumably when Y1 = 0). I would like to change the colours to red and green.
library(scatterplot3d)
s3d = with(df, scatterplot3d(X1, X2, X3, color = as.numeric(Y1), pch = 19))
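No answer to this question appears here, so what follows is only a sketch of one possible approach, assuming Y holds the codes "R" and "G" as in the printed data. In base graphics, colour number 0 is the background colour, which would explain why the points with Y1 = 0 look white. Looking the colours up by name in a named vector avoids numeric colour codes altogether; cols is a name made up for this example.

```r
# Rebuild the relevant columns of df from the question
df <- data.frame(
  X1 = c(0, 2, 0, 0, -1, 1),
  X2 = c(3, 0, 1, 1, 0, 1),
  X3 = c(0, 0, 3, 2, 1, 1),
  Y  = c("R", "R", "R", "G", "G", "R")
)

# Look up one colour name per row; as.character() guards against Y being
# a factor, and unname() drops the "R"/"G" labels from the result
cols <- unname(c(R = "red", G = "green")[as.character(df$Y)])

# Draw the plot only if the scatterplot3d package is installed
if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  s3d <- with(df, scatterplot3d::scatterplot3d(X1, X2, X3, color = cols, pch = 19))
}
```

The same lookup works for any number of categories, as long as every value of Y has an entry in the named vector.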

Related

issues using first and mutate with group_by

I am using mutate to create a column depending on the first value of a group
library(tidyverse)
test = data.frame(grp = c(1,1,1,2,2,2), x = c(1,2,3,1,2,3), y = c(1,2,3,1,2,3))
test
grp x y
1 1 1 1
2 1 2 2
3 1 3 3
4 2 1 1
5 2 2 2
6 2 3 3
test %>% group_by(grp) %>%
mutate(y = ifelse(grp[[1]] == x[[1]], y-1, y))
grp x y
<dbl> <dbl> <dbl>
1 1 1 0
2 1 2 0
3 1 3 0
4 2 1 1
5 2 2 1
6 2 3 1
However output is not as I expected.
Expected output is
grp x y
<dbl> <dbl> <dbl>
1 1 1 0
2 1 2 1
3 1 3 2
4 2 1 1
5 2 2 2
6 2 3 3
Can you please explain what is happening and how best to get my expected solution?
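The test grp[[1]] == x[[1]] is a single value, so ifelse() returns a single value, and mutate() then recycles that value across the whole group. You need to remove the index [[1]] from grp so that the test has one element per row. Since grp is the grouping variable you should avoid indexing it. Just use it as is, i.e.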
library(dplyr)
test %>%
group_by(grp) %>%
mutate(new_y = ifelse(grp == first(x), y-1, y))
# A tibble: 6 × 4
# Groups: grp [2]
grp x y new_y
<dbl> <dbl> <dbl> <dbl>
1 1 1 1 0
2 1 2 2 1
3 1 3 3 2
4 2 1 1 1
5 2 2 2 2
6 2 3 3 3
Because of the x[[1]], you are always comparing the group values of each row with the x value of the first row. I think you want grp==x within ifelse().
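The behaviour can be reproduced without dplyr. The following base R sketch uses the first group from the question and shows that ifelse() returns a result as long as its test; scalar_test and vector_test are names chosen for this example.

```r
# First group from the question: grp = 1, x = y = 1, 2, 3
grp <- c(1, 1, 1)
x   <- c(1, 2, 3)
y   <- c(1, 2, 3)

# Length-1 test: ifelse() returns a single value (y[1] - 1),
# which mutate() would then recycle over the whole group
scalar_test <- ifelse(grp[[1]] == x[[1]], y - 1, y)

# Length-3 test: one result per row
vector_test <- ifelse(grp == x[1], y - 1, y)

print(scalar_test)
print(vector_test)
```

With the length-1 test only the first element of y - 1 survives, which matches the column of zeros the question shows for group 1.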

Ignore zeros and NAs in cumsum

I need to assign numbers to sets of consecutive values in every column and create new columns. Eventually I want to find a sum of values in z column that correspond to the first consecutive numbers in each column.
My data looks something like this:
library(dplyr)
y1 = c(1,2,3,8,9,0)
y2 = c(0,0,0,4,5,6)
z = c(200,250,200,100,90,80)
yabc <- tibble(y1, y2, z)
# A tibble: 6 × 3
y1 y2 z
<dbl> <dbl> <dbl>
1 1 0 200
2 2 0 250
3 3 0 200
4 8 4 100
5 9 5 90
6 0 6 80
I tried the following formula:
yabc %>%
mutate_at(vars(starts_with("y")),
list(mod = ~ cumsum(c(FALSE, diff(.x)!=1))+1))
that gave me the following result:
# A tibble: 6 × 5
y1 y2 z y1_mod y2_mod
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 200 1 1
2 2 0 250 1 2
3 3 0 200 1 3
4 8 4 100 2 4
5 9 5 90 2 4
6 0 6 80 3 4
I am only interested in numbers greater than zero. I tried replacing zeros with NA, but it did not work either.
# A tibble: 6 × 5
y1 y2 z y1_mod y2_mod
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 NA 200 1 1
2 2 NA 250 1 NA
3 3 NA 200 1 NA
4 8 4 100 2 NA
5 9 5 90 2 NA
6 NA 6 80 NA NA
What I would like the data to look like is:
# A tibble: 6 × 5
y1 y2 z y1_mod y2_mod
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 0 200 1 NA
2 2 0 250 1 NA
3 3 0 200 1 NA
4 8 4 100 2 1
5 9 5 90 2 1
6 0 6 80 NA 1
Is there any way to exclude zeros and start applying the formula only when .x is greater than 0? Or any other way to make the formula work the way I need? Thank you!
FYI: mutate_at has been superseded by across, I'll demonstrate the new method in my code.
yabc %>%
mutate(
across(starts_with("y"),
list(mod = ~ if_else(.x > 0,
cumsum(.x > 0 & c(FALSE, diff(.x) != 1)) + 1L,
NA_integer_) )
)
)
# # A tibble: 6 x 5
# y1 y2 z y1_mod y2_mod
# <dbl> <dbl> <dbl> <int> <int>
# 1 1 0 200 1 NA
# 2 2 0 250 1 NA
# 3 3 0 200 1 NA
# 4 8 4 100 2 2
# 5 9 5 90 2 2
# 6 0 6 80 NA 2
If this is sufficient (you don't care if it's 1 or 2 for the first effective group in y2_mod), then you're good. If you want to reduce them all to be 1-based, then
yabc %>%
mutate(
across(starts_with("y"),
list(mod = ~ if_else(.x > 0,
cumsum(.x > 0 & c(FALSE, diff(.x) != 1)),
NA_integer_))),
across(ends_with("_mod"),
~ if_else(is.na(.x), .x, match(.x, na.omit(unique(.x))))
)
)
# # A tibble: 6 x 5
# y1 y2 z y1_mod y2_mod
# <dbl> <dbl> <dbl> <int> <int>
# 1 1 0 200 1 NA
# 2 2 0 250 1 NA
# 3 3 0 200 1 NA
# 4 8 4 100 2 1
# 5 9 5 90 2 1
# 6 0 6 80 NA 1
Notes:
if_else is helpful to handle the NA-including rows specially; it requires the same class, which can be annoying/confusing. Because of this, we need to pass the specific "class" of NA as the false= (third) argument to if_else. For example, cumsum(.)+1 produces a numeric, so the third arg would need to be NA_real_ (since the default NA is actually logical). Another way to deal with it is to either use cumsum(.)+1L (produces an integer) and NA_integer_ or (as I show in my second example) use cumsum(.) by itself (and NA_integer_) since we match things later (and match(.) returns integer)
I demo the shift from your mutate_at to mutate(across(..)). An important change here from mutate is that we run across without assigning its return to anything. In essence, it returns a named-list where each element of the list is an updated column or a new one, depending on the presence of .names; that takes a glue-like string to allow for renaming the calculated columns, thereby adding new columns instead of the default action (no .names) of overwriting the columns in-place. The alternate way of producing new (not in-place) columns is the way you used, with a named list of functions, still a common/supported way to use a list of functions within across(..).
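The type issue described in the first note can be checked in base R. This small sketch only inspects classes; flags is a made-up logical vector standing in for the condition passed to cumsum().

```r
flags <- c(FALSE, TRUE, FALSE, TRUE)

class(NA)                  # "logical"
class(NA_integer_)         # "integer"
class(NA_real_)            # "numeric"

class(cumsum(flags))       # "integer": cumsum() of a logical vector
class(cumsum(flags) + 1)   # "numeric": 1 is a double
class(cumsum(flags) + 1L)  # "integer": 1L is an integer
```

This is why the first example pairs cumsum(.) + 1L with NA_integer_: both branches of if_else then have the same type.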
library(data.table)
library(tidyverse)
yabc %>%
mutate(across(starts_with('y'),
~ as.integer(factor(`is.na<-`(rleid(.x - row_number()), !.x))),
.names = '{col}_mod'))
# A tibble: 6 x 5
y1 y2 z y1_mod y2_mod
<dbl> <dbl> <dbl> <int> <int>
1 1 0 200 1 NA
2 2 0 250 1 NA
3 3 0 200 1 NA
4 8 4 100 2 1
5 9 5 90 2 1
6 0 6 80 NA 1
The trick lies in knowing that for consecutive numbers, the difference between each number and its row_number() is the same.
For example, consider:
x <- c(1,2,3,6,7,8,10,11,12)
The consecutive numbers can be grouped as:
x - seq_along(x)
[1] 0 0 0 2 2 2 3 3 3
As you can see, the consecutive numbers are grouped together. To get the desired group ids, we can use rleid() from data.table:
rleid(x-seq_along(x))
[1] 1 1 1 2 2 2 3 3 3
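If data.table is not available, the same run ids can be sketched in base R by starting a new id wherever the offset changes. This is an alternative to rleid(), not what the answer above uses; offset and run_id are names chosen for this example.

```r
x <- c(1, 2, 3, 6, 7, 8, 10, 11, 12)

# Consecutive numbers share the same offset from their position
offset <- x - seq_along(x)

# A new run starts at the first element and wherever the offset changes;
# the cumulative sum of those starts numbers the runs 1, 2, 3, ...
run_id <- cumsum(c(TRUE, diff(offset) != 0))

print(run_id)
```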
Another possible solution:
library(tidyverse)
y1=c(1,2,3,8,9,0)
y2=c(0,0,0,4,5,6)
z=c(200,250,200,100,90,80)
yabc<-tibble(y1,y2,z)
yabc %>%
mutate(across(starts_with("y"),
~if_else(.x==0, NA_real_, 1+cumsum(c(1,diff(.x)) != 1)), .names="{.col}_mod"))%>%
mutate(across(ends_with("mod"), ~ factor(.x) %>% as.numeric(.)))
#> # A tibble: 6 × 5
#> y1 y2 z y1_mod y2_mod
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 200 1 NA
#> 2 2 0 250 1 NA
#> 3 3 0 200 1 NA
#> 4 8 4 100 2 1
#> 5 9 5 90 2 1
#> 6 0 6 80 NA 1

fitting a diagonal line to graph in ggplot

I have a data frame that looks like this
> wider_data
# A tibble: 12 x 6
treat FR CC HP Other x
<fct> <int> <int> <int> <int> <fct>
1 1 0 2 5 3 0
2 1 2 1 5 1 1
3 1 3 5 3 2 2
4 1 0 2 4 1 3
5 1 1 2 4 1 4
6 1 0 2 4 0 5
7 2 4 1 4 2 0
8 2 1 5 2 0 1
9 2 4 0 1 4 2
10 2 0 5 2 3 3
11 2 0 3 4 1 4
12 2 1 5 1 2 5
I melt this data to
> m_wider_data <- melt(wider_data) #Using treat, levels, x as id variables
Using treat, x as id variables
> m_wider_data
treat x variable value
1 1 0 FR 0
2 1 1 FR 2
3 1 2 FR 3
4 1 3 FR 0
5 1 4 FR 1
6 1 5 FR 0
7 2 0 FR 4
8 2 1 FR 1
9 2 2 FR 4
10 2 3 FR 0
11 2 4 FR 0
12 2 5 FR 1
13 1 0 CC 2
14 1 1 CC 1
15 1 2 CC 5
16 1 3 CC 2
(48 lines in total)
> class(m_wider_data$x)
[1] "factor"
> class(m_wider_data$value)
[1] "integer"
Then I plot 2 graphs (corresponding to the 2 levels of 'treat') using
plot_test <- m_wider_data %>%
ggplot(aes(x = x, y = value, colour = variable, group = variable)) +
facet_wrap(vars(treat), ncol = 2) +
geom_point() +
geom_line(aes(linetype=variable)) +
geom_abline(slope = 1, intercept = 0) +
labs(x = "X", y = "Y") +
ggtitle('Table') +
theme(legend.title=element_blank()) #turns off the legend title
However, the diagonal line ends up wrong
I want it to go through (0,0) (1,1) (2,2) (3,3) (4,4) (5,5).
In addition, I want to cut the margins so that the graph area starts as close to x=0 and y=0 as possible. I have tried to change x to a numeric variable and use
plot_test +
scale_x_continuous(breaks=c(0,1,2,3,4,5),limits = c(-0.05, 5.05)) +
scale_y_continuous(breaks=c(0,1,2,3,4,5),limits = c(-0.05, 5.05))
But it is not working.
Is there a way to achieve this? Thank you in advance!
Maybe this is what you are looking for:
It's best to convert x to numeric. However, when converting a factor to a numeric you have to keep in mind that you first have to convert to a character. Otherwise your first category "0" becomes 1, your second category "1" becomes 2 and so on. Therefore always use as.numeric(as.character(x)).
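A minimal base R sketch of this pitfall, using a made-up factor f whose labels are not simply 1, 2, 3, ...:

```r
f <- factor(c("0", "1", "2", "5"))

# Direct conversion returns the internal level codes, not the labels
codes <- as.numeric(f)

# Going through character first parses the labels themselves
values <- as.numeric(as.character(f))

print(codes)
print(values)
```

Here codes comes out as 1 2 3 4 while values is 0 1 2 5, so plotting the raw codes would shift every point along the x axis.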
After this change your code should work fine. Nonetheless I made some small changes. I set the limits via coord_cartesian. I also used the expand argument of scale_x/y_continuous to set the default expansion (= 5 percent) to 0. BTW: You don't need to add an extra ".05" to your limits, as ggplot will expand your axes by default. (If you are fine with the default expansion you could simply remove scale_x/y_continuous, or if you want a different expansion try e.g. c(0.025, 0), which will expand the axis by 2.5 percent on both sides.)
library(ggplot2)
library(dplyr)
plot_test <- m_wider_data %>%
ggplot(aes(x = as.numeric(as.character(x)), y = value, colour = variable, group = variable)) +
facet_wrap(vars(treat), ncol = 2) +
geom_point() +
geom_line(aes(linetype=variable)) +
geom_abline(slope = 1, intercept = 0) +
labs(x = "X", y = "Y") +
ggtitle('Table') +
theme(legend.title=element_blank()) +
coord_cartesian(xlim = c(0,5), ylim = c(0,5)) +
scale_x_continuous(expand = c(0, 0)) +
scale_y_continuous(expand = c(0, 0))
plot_test
DATA
m_wider_data <- read.table(text = "treat x variable value
1 1 0 FR 0
2 1 1 FR 2
3 1 2 FR 3
4 1 3 FR 0
5 1 4 FR 1
6 1 5 FR 0
7 2 0 FR 4
8 2 1 FR 1
9 2 2 FR 4
10 2 3 FR 0
11 2 4 FR 0
12 2 5 FR 1
13 1 0 CC 2
14 1 1 CC 1
15 1 2 CC 5
16 1 3 CC 2", header = TRUE)
m_wider_data$x <- factor(m_wider_data$x)
I think your problem is a result of x being a factor and not a number. It is best to perform that conversion early in the process.
Also, I have updated the matrix transformation using the more modern pivot_longer instead of melt
widedf<-read.table(header = TRUE, text=" treat FR CC HP Other x
1 0 2 5 3 0
1 2 1 5 1 1
1 3 5 3 2 2
1 0 2 4 1 3
1 1 2 4 1 4
1 0 2 4 0 5
2 4 1 4 2 0
2 1 5 2 0 1
2 4 0 1 4 2
2 0 5 2 3 3
2 0 3 4 1 4
2 1 5 1 2 5")
library(tidyr)
#make the table long
df <- pivot_longer(widedf, -c("treat", "x"), names_to="variable", values_to="value")
#convert x from a character to an integer
df$x <- as.integer(as.character(df$x))
library(ggplot2)
plot_test <- df %>%
ggplot(aes(x = x, y = value, colour = variable, group = variable)) +
facet_wrap(vars(treat), ncol = 2) +
geom_point() +
geom_line(aes(linetype=variable)) +
geom_abline(slope = 1, intercept = 0) +
labs(x = "X", y = "Y") +
ggtitle('Table') +
theme(legend.title=element_blank()) #turns off the legend title
plot_test

How to create columns from other columns?

I want to build a data frame like df2 from df1, always looking for the name of the column whose value is closest to 0, where:
clossets_1 is the column with the value closest to 0 among x, y and z.
clossets_2 is the column with the value closest to 0 among x and a, because x is the most frequent value in clossets_1.
clossets_3 is the column with the value closest to 0 among a and b, because a is the most frequent value in clossets_2.
df1
# x y z a b
#1 1 2 3 4 3
#2 2 3 4 1 2
#3 3 2 4 2 1
#4 4 3 2 3 6
Desired output:
df2
# x y z clossets_1 a clossets_2 b clossets_3
#1 1 2 3 x 4 x 3 b
#2 2 3 4 x 1 a 2 a
#3 3 2 4 y 2 a 1 b
#4 4 3 2 z 3 a 2 b
Here is the first step to get you started:
cols = c("x","y","z")
df2 = df1
df2$clossets_1 = cols[apply(df1[,cols], 1, function(x) {which(x == min(x))})]
df2
## x y z a b clossets_1
## 1 1 2 3 4 3 x
## 2 2 3 4 1 2 x
## 3 3 2 4 2 1 y
## 4 4 3 2 3 6 z
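The step above can be generalised with a small helper. This is only a sketch: closest_col() is a hypothetical function written for this example, it uses which.min() on absolute values so that ties resolve to the first column instead of returning several indices, and it picks the most frequent name with table().

```r
df1 <- data.frame(
  x = c(1, 2, 3, 4), y = c(2, 3, 2, 3), z = c(3, 4, 4, 2),
  a = c(4, 1, 2, 3), b = c(3, 2, 1, 6)
)

# Hypothetical helper: for each row, the name of the column closest to 0
closest_col <- function(data, cols) {
  cols[apply(abs(as.matrix(data[, cols])), 1, which.min)]
}

closest_1 <- closest_col(df1, c("x", "y", "z"))

# Most frequent name in closest_1 (ties go to the first name alphabetically)
top_1 <- names(which.max(table(closest_1)))

closest_2 <- closest_col(df1, c(top_1, "a"))

print(closest_1)
print(closest_2)
```

The third column would follow the same pattern, comparing the most frequent name in closest_2 with "b".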
I solved it this way, using the first step of @BigFinger's answer and the mlv() function from the package modeest to find the most frequent value in the closest columns
library(DescTools)
library(modeest)
library(tibble)
df1 = tibble(x = c(1,2,3,4),
y = c(2,3,2,3),
z = c(3,4,4,2),
clossest_1 = c("x","y","z")[apply(data.frame(x,y,z),1,function(x){which(x == Closest(x,0))})],
a = c(4,1,2,3),
clossest_2 = c(mlv(clossest_1),"a")[apply(data.frame(get(mlv(clossest_1)),a),1,function(x){which(x == Closest(x,0))})],
b = c(3,2,1,2),
clossest_3 = c(mlv(clossest_2),"b")[apply(data.frame(get(mlv(clossest_2)),b),1,function(x){which(x == Closest(x,0))})])
df1
# A tibble: 4 x 8
# x y z clossest_1 a clossest_2 b clossest_3
# <dbl> <dbl> <dbl> <chr> <dbl> <chr> <dbl> <chr>
#1 1 2 3 x 4 x 3 b
#2 2 3 4 x 1 a 2 a
#3 3 2 4 y 2 a 1 b
#4 4 3 2 z 3 a 2 b

Mutate multiple variable to create multiple new variables

Let's say I have a tibble where I need to take multiple variables and mutate them into new multiple new variables.
As an example, here is a simple tibble:
tb <- tribble(
~x, ~y1, ~y2, ~y3, ~z,
1,2,4,6,2,
2,1,2,3,3,
3,6,4,2,1
)
I want to subtract variable z from every variable with a name starting with "y", and mutate the results as new variables of tb. Also, suppose I don't know how many "y" variables I have. I want the solution to fit nicely within tidyverse / dplyr workflow.
In essence, I don't understand how to mutate multiple variables into multiple new variables. I'm not sure if you can use mutate in this instance? I've tried mutate_if, but I don't think I'm using it right (and I get an error):
tb %>% mutate_if(starts_with("y"), funs(.-z))
#Error: No tidyselect variables were registered
Thanks in advance!
Because you are operating on column names, you need to use mutate_at rather than mutate_if, which selects columns by testing the values within them
tb %>% mutate_at(vars(starts_with("y")), funs(. - z))
#> # A tibble: 3 x 5
#> x y1 y2 y3 z
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 2 4 2
#> 2 2 -2 -1 0 3
#> 3 3 5 3 1 1
To create new columns, instead of overwriting existing ones, we can name the function inside funs()
# add suffix
tb %>% mutate_at(vars(starts_with("y")), funs(mod = . - z))
#> # A tibble: 3 x 8
#> x y1 y2 y3 z y1_mod y2_mod y3_mod
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 0 2 4
#> 2 2 1 2 3 3 -2 -1 0
#> 3 3 6 4 2 1 5 3 1
# remove suffix, add prefix
tb %>%
mutate_at(vars(starts_with("y")), funs(mod = . - z)) %>%
rename_at(vars(ends_with("_mod")), funs(paste("mod", gsub("_mod", "", .), sep = "_")))
#> # A tibble: 3 x 8
#> x y1 y2 y3 z mod_y1 mod_y2 mod_y3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 0 2 4
#> 2 2 1 2 3 3 -2 -1 0
#> 3 3 6 4 2 1 5 3 1
Edit: In dplyr 0.8.0 and later, funs() is deprecated (source1 & source2); use list() with a formula instead
tb %>% mutate_at(vars(starts_with("y")), list(~ . - z))
#> # A tibble: 3 x 5
#> x y1 y2 y3 z
#> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 0 2 4 2
#> 2 2 -2 -1 0 3
#> 3 3 5 3 1 1
tb %>% mutate_at(vars(starts_with("y")), list(mod = ~ . - z))
#> # A tibble: 3 x 8
#> x y1 y2 y3 z y1_mod y2_mod y3_mod
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 0 2 4
#> 2 2 1 2 3 3 -2 -1 0
#> 3 3 6 4 2 1 5 3 1
tb %>%
mutate_at(vars(starts_with("y")), list(mod = ~ . - z)) %>%
rename_at(vars(ends_with("_mod")), list(~ paste("mod", gsub("_mod", "", .), sep = "_")))
#> # A tibble: 3 x 8
#> x y1 y2 y3 z mod_y1 mod_y2 mod_y3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 0 2 4
#> 2 2 1 2 3 3 -2 -1 0
#> 3 3 6 4 2 1 5 3 1
Edit 2: dplyr 1.0.0+ has across() function which simplifies this task even further
Basic usage
across() has two primary arguments:
The first argument, .cols, selects the columns you want to operate on.
It uses tidy selection (like select()) so you can pick variables by
position, name, and type.
The second argument, .fns, is a function or list of functions to apply to
each column. This can also be a purrr style formula (or list of formulas)
like ~ .x / 2. (This argument is optional, and you can omit it if you just want
to get the underlying data; you'll see that technique used in
vignette("rowwise").)
# Control how the names are created with the `.names` argument which
# takes a [glue](http://glue.tidyverse.org/) spec:
tb %>%
mutate(
across(starts_with("y"), ~ .x - z, .names = "mod_{col}")
)
#> # A tibble: 3 x 8
#> x y1 y2 y3 z mod_y1 mod_y2 mod_y3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 0 2 4
#> 2 2 1 2 3 3 -2 -1 0
#> 3 3 6 4 2 1 5 3 1
tb %>%
mutate(
across(num_range(prefix = "y", range = 1:3), ~ .x - z, .names = "mod_{col}")
)
#> # A tibble: 3 x 8
#> x y1 y2 y3 z mod_y1 mod_y2 mod_y3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 0 2 4
#> 2 2 1 2 3 3 -2 -1 0
#> 3 3 6 4 2 1 5 3 1
### Multiple functions
tb %>%
mutate(
across(c(matches("x"), contains("z")), ~ max(.x, na.rm = TRUE), .names = "max_{col}"),
across(c(y1:y3), ~ .x - z, .names = "mod_{col}")
)
#> # A tibble: 3 x 10
#> x y1 y2 y3 z max_x max_z mod_y1 mod_y2 mod_y3
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 2 4 6 2 3 3 0 2 4
#> 2 2 1 2 3 3 3 3 -2 -1 0
#> 3 3 6 4 2 1 3 3 5 3 1
Created on 2018-10-29 by the reprex package (v0.2.1)
