How to plot 3D in R with multi-conditions

I have a data set with 3 features, as below:
V1 V2 V3
0.268 0.917 0.191
0.975 0.467 0.447
0.345 0.898 0.984
0.901 0.043 0.456
0.243 0.453 0.964
0.001 0.464 0.953
0.998 0.976 0.978
0.954 0.932 0.923
How can I plot this data in a 3D graphic based on the following conditions, giving a different colour for each condition?
(v1>=0.90 && v3>=0.90 && v3>=0.90) || (v1>=0.90 && v3< 0.50 && v3< 0.50) || (v1 < 0.50 && v3>=0.90 && v3< 0.50)|| (v1< 0.50 && v3< 0.50 && v3>=0.90)

I assume the second statement in each condition refers to V2, which makes more sense. To colour the points according to which condition is met, first create a column holding that value:
library(dplyr)

df = data.frame(
  "V1" = c(0.268, 0.975, 0.345, 0.901, 0.243, 0.001, 0.998, 0.954),
  "V2" = c(0.917, 0.467, 0.898, 0.043, 0.453, 0.464, 0.976, 0.932),
  "V3" = c(0.191, 0.447, 0.984, 0.456, 0.964, 0.953, 0.978, 0.923)
)

# Label each row with the first condition it satisfies ("5" = none of them)
df = df %>%
  mutate(
    group = case_when(
      V1 >= 0.9 & V2 >= 0.9 & V3 >= 0.9 ~ "1",
      V1 >= 0.9 & V2 < 0.5  & V3 < 0.5  ~ "2",
      V1 < 0.5  & V2 >= 0.9 & V3 < 0.5  ~ "3",
      V1 < 0.5  & V2 < 0.5  & V3 >= 0.9 ~ "4",
      TRUE ~ "5"
    )
  )
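For the sample data this labels the rows as follows (worked out by hand from the rules above, so worth re-checking):
df
#      V1    V2    V3 group
# 1 0.268 0.917 0.191     3
# 2 0.975 0.467 0.447     2
# 3 0.345 0.898 0.984     5
# 4 0.901 0.043 0.456     2
# 5 0.243 0.453 0.964     4
# 6 0.001 0.464 0.953     4
# 7 0.998 0.976 0.978     1
# 8 0.954 0.932 0.923     1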
Then we can use the plotly or scatterplot3d packages to build the graph:
library(scatterplot3d)
library(plotly)

scatterplot3d(x = df$V1, y = df$V2, z = df$V3, color = df$group)
plot_ly(x = df$V1, y = df$V2, z = df$V3, color = df$group)

You can start by creating a logical vector using the vectorized operators & and |:
# Create the logical vector (the conditions exactly as written in the question)
ind <- (mat$v1 >= 0.90 & mat$v3 >= 0.90 & mat$v3 >= 0.90) | (mat$v1 >= 0.90 & mat$v3 < 0.50 & mat$v3 < 0.50) |
  (mat$v1 < 0.50 & mat$v3 >= 0.90 & mat$v3 < 0.50) | (mat$v1 < 0.50 & mat$v3 < 0.50 & mat$v3 >= 0.90)
Now one can plot it, e.g. using plotly:
# Plot only the rows where the condition holds
plotly::plot_ly(x = mat$v1[ind], y = mat$v2[ind], z = mat$v3[ind])
With the data
mat = structure(list(v1 = c(0.268, 0.975, 0.345, 0.901, 0.243, 0.001,
0.998, 0.954), v2 = c(0.917, 0.467, 0.898, 0.043, 0.453, 0.464,
0.976, 0.932), v3 = c(0.191, 0.447, 0.984, 0.456, 0.964, 0.953,
0.978, 0.923)), class = "data.frame", row.names = c(NA, -8L))
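If you would rather keep all points visible and simply highlight the ones satisfying the conditions, one option (a sketch; the two label strings are arbitrary) is to map ind onto the colour:
library(plotly)

# Colour every point by whether it satisfies any of the conditions
plot_ly(x = mat$v1, y = mat$v2, z = mat$v3,
        color = ifelse(ind, "condition met", "other"),
        type = "scatter3d", mode = "markers")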

Related

Convert the factors of a variable into the columns of the dataframe

I have a dataframe that looks like this:
Concentration Value
Low 0.21
Medium 0.85
Low 0.10
Low 0.36
High 2.21
Medium 0.50
High 1.85
I would like to transform it into a dataframe where the column names are the factors of the variable:
Low Medium High
0.21 0.85 2.21
0.10 0.50 1.85
0.36
I've tried using pivot_wider; however, the values for each of the factors end up stored as list vectors:
Low Medium High
c(0.21,...) c(0.85,...) c(2.21,...)
Use an id variable for rows by group:
library(dplyr)
library(tidyr)

dat %>%
  group_by(Concentration) %>%
  mutate(id = row_number()) %>%  # row index within each Concentration group
  pivot_wider(names_from = Concentration, values_from = Value)
id Low Medium High
<int> <dbl> <dbl> <dbl>
1 1 0.21 0.85 2.21
2 2 0.1 0.5 1.85
3 3 0.36 NA NA
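If the helper id column is not wanted in the final result, it can be dropped at the end (a sketch of the same pipeline):
dat %>%
  group_by(Concentration) %>%
  mutate(id = row_number()) %>%
  pivot_wider(names_from = Concentration, values_from = Value) %>%
  select(-id)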
Using unstack from base R:
# Pad each group with NA to the size of the largest group
mx <- max(table(df1$Concentration))
data.frame(lapply(unstack(df1, Value ~ Concentration), `length<-`, mx))
High Low Medium
1 2.21 0.21 0.85
2 1.85 0.10 0.50
3 NA 0.36 NA
data
df1 <- structure(list(Concentration = c("Low", "Medium", "Low", "Low",
"High", "Medium", "High"), Value = c(0.21, 0.85, 0.1, 0.36, 2.21,
0.5, 1.85)), class = "data.frame", row.names = c(NA, -7L))

R: Elegant Way to Filter Dataframe Such that First Few Decimals of a Set of Variables Are Considered

I have this toy data, from which I want to keep only the rows whose v2 value starts with 0.5 and whose v3 value starts with 0.3.
I have tried this:
library(tidyverse)
toy_data <- tibble(
  v1 = c(20215, 20549, 21678, 20562, 20245, 20225, 21245, 22322, 20618, 21993, 22394, 21581),
  v2 = c(0.612, 0.618, 0.642, 0.618, 0.612, 0.593, 0.659, 0.619, 0.651, 0.662, 0.640, 0.509),
  v3 = c(0.533, 0.567, 0.469, 0.545, 0.675, 0.399, 0.322, 0.543, 0.576, 0.457, 0.552, 0.390),
  v4 = c(49, 118, 257, 384, 566, 569, 637, 1028, 1253, 2277, 2300, 2390),
  v5 = rep(NA, 12)
)
toy_data |> filter(v2 >= 0.5, v3 >= 0.3) |> filter(v2 < 0.6, v3 < 0.4)
## A tibble: 2 x 5
# v1 v2 v3 v4 v5
# <dbl> <dbl> <dbl> <dbl> <lgl>
#1 20225 0.593 0.399 569 NA
#2 21581 0.509 0.39 2390 NA
What I Want
Is there an elegant way to do this, not necessarily with dplyr::filter(), such that I can tell R to look for 0.5 in variable v2 and 0.3 in variable v3, along the lines of:
toy_data |> substr(toy_data$v2, 1, 3) %in% 0.5 & substr(toy_data$v3, 1, 3) %in% 0.3
You could multiply by 10 and use floor:
toy_data |> filter(floor(10 * v2) == 5 & floor(10 * v3) == 3)
#> # A tibble: 2 x 5
#> v1 v2 v3 v4 v5
#> <dbl> <dbl> <dbl> <dbl> <lgl>
#> 1 20225 0.593 0.399 569 NA
#> 2 21581 0.509 0.39 2390 NA
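Closer to the substr() idea from the question, comparing the truncated strings also works, since substr() coerces numbers to character (a sketch; it relies on R's default numeric formatting, so the floor() version above is the more robust choice):
toy_data |> filter(substr(v2, 1, 3) == "0.5", substr(v3, 1, 3) == "0.3")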

Computing mean of different columns depending on date

My data set is about forest fires and NDVI values (a value ranging from 0 to 1, indicating how green the surface is). It has an initial column which says when the forest fire in each row took place, and subsequent columns indicating the NDVI value on different dates, before and after the fire happened. NDVI values before the fire are substantially higher than values after the fire. Something like:
data1989 <- data.frame("date_fire" = c("1987-01-01", "1987-07-03", "1988-01-01"),
                       "1986-01-01" = c(0.5, 0.589, 0.66),
                       "1986-06-03" = c(0.56, 0.447, 0.75),
                       "1986-10-19" = c(0.8, NA, 0.83),
                       "1987-01-19" = c(0.75, 0.65, 0.75),
                       "1987-06-19" = c(0.1, 0.55, 0.811),
                       "1987-10-19" = c(0.15, 0.12, 0.780),
                       "1988-01-19" = c(0.2, 0.22, 0.32),
                       "1988-06-19" = c(0.18, 0.21, 0.23),
                       "1988-10-19" = c(0.21, 0.24, 0.250),
                       stringsAsFactors = FALSE)
> data1989
date_fire X1986.01.01 X1986.06.03 X1986.10.19 X1987.01.19 X1987.06.19 X1987.10.19 X1988.01.19 X1988.06.19 X1988.10.19
1 1987-01-01 0.500 0.560 0.80 0.75 0.100 0.15 0.20 0.18 0.21
2 1987-07-03 0.589 0.447 NA 0.65 0.550 0.12 0.22 0.21 0.24
3 1988-01-01 0.660 0.750 0.83 0.75 0.811 0.78 0.32 0.23 0.25
I would like to compute, in a new column, the average of the NDVI values PRIOR to the forest fire. For the first row, that would be the average of columns 2, 3, 4 and 5.
What I need to get is:
date_fire X1986.01.01 X1986.06.03 X1986.10.19 X1987.01.19 X1987.06.19 X1987.10.19 X1988.01.19 X1988.06.19 X1988.10.19 meanPreFire
1 1987-01-01 0.500 0.560 0.80 0.75 0.100 0.15 0.20 0.18 0.21 0.653
2 1987-07-03 0.589 0.447 NA 0.65 0.550 0.12 0.22 0.21 0.24 0.559
3 1988-01-01 0.660 0.750 0.83 0.75 0.811 0.78 0.32 0.23 0.25 0.764
Thanks!
EDIT: SOLUTION
How to adapt the code when there is more than one leading column to exclude:
data1989 <- data.frame("date_fire" = c("1987-02-01", "1987-07-03", "1988-01-01"),
                       "type" = c("oak", "pine", "oak"),
                       "meanRainfall" = c(600, 300, 450),
                       "1986.01.01" = c(0.5, 0.589, 0.66),
                       "1986.06.03" = c(0.56, 0.447, 0.75),
                       "1986.10.19" = c(0.8, NA, 0.83),
                       "1987.01.19" = c(0.75, 0.65, 0.75),
                       "1987.06.19" = c(0.1, 0.55, 0.811),
                       "1987.10.19" = c(0.15, 0.12, 0.780),
                       "1988.01.19" = c(0.2, 0.22, 0.32),
                       "1988.06.19" = c(0.18, 0.21, 0.23),
                       "1988.10.19" = c(0.21, 0.24, 0.250),
                       check.names = FALSE,
                       stringsAsFactors = FALSE)
Using:
# Index of the last pre-fire date column for each row (dates start after the first 3 columns)
j1 <- findInterval(as.Date(data1989$date_fire),
                   as.Date(names(data1989)[-(1:3)], format = "%Y.%m.%d"))
# Row/column index matrix of all pre-fire cells
m1 <- cbind(rep(seq_len(nrow(data1989)), j1), sequence(j1))
data1989$meanPreFire <- tapply(data1989[-(1:3)][m1], m1[, 1], FUN = mean, na.rm = TRUE)
> data1989
date_fire type meanRainfall 1986.01.01 1986.06.03 1986.10.19 1987.01.19 1987.06.19 1987.10.19 1988.01.19 1988.06.19 1988.10.19 meanPreFire
1 1987-02-01 oak 600 0.500 0.560 0.80 0.75 0.100 0.15 0.20 0.18 0.21 0.6525
2 1987-07-03 pine 300 0.589 0.447 NA 0.65 0.550 0.12 0.22 0.21 0.24 0.5590
3 1988-01-01 oak 450 0.660 0.750 0.83 0.75 0.811 0.78 0.32 0.23 0.25 0.7635
Reshape the data to long form and keep only the dates prior to the forest fire:
library(tidyverse)
data1989 %>%
  pivot_longer(-date_fire, names_to = "date") %>%
  mutate(date_fire = as.Date(date_fire),
         date = as.Date(date, "X%Y.%m.%d")) %>%
  filter(date < date_fire) %>%
  group_by(date_fire) %>%
  summarise(meanPreFire = mean(value, na.rm = TRUE))
# # A tibble: 3 x 2
# date_fire meanPreFire
# <date> <dbl>
# 1 1987-01-01 0.62
# 2 1987-07-03 0.559
# 3 1988-01-01 0.764
The solution would be much more concise if we kept the data in long(er) form... but this reproduces the desired output:
library(dplyr)
library(tidyr)
data1989 %>%
  pivot_longer(-date_fire, names_to = "date_NDVI", values_to = "value", names_prefix = "^X") %>%
  mutate(date_fire = as.Date(date_fire, "%Y-%m-%d"),
         date_NDVI = as.Date(date_NDVI, "%Y.%m.%d")) %>%
  group_by(date_fire) %>%
  mutate(period = ifelse(date_NDVI < date_fire, "before_fire", "after_fire")) %>%
  group_by(date_fire, period) %>%
  mutate(average_NDVI = mean(value, na.rm = TRUE)) %>%
  pivot_wider(names_from = date_NDVI, names_prefix = "X", values_from = value) %>%
  pivot_wider(names_from = period, values_from = average_NDVI) %>%
  group_by(date_fire) %>%
  summarise_all(~ sum(.x, na.rm = TRUE))  # purrr-style lambda; funs() is defunct in current dplyr
Returns:
# A tibble: 3 x 12
date_fire `X1986-01-01` `X1986-06-03` `X1986-10-19` `X1987-01-19` `X1987-06-19` `X1987-10-19` `X1988-01-19` `X1988-06-19` `X1988-10-19` before_fire after_fire
<date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1987-01-01 0.5 0.56 0.8 0.75 0.1 0.15 0.2 0.18 0.21 0.62 0.265
2 1987-07-03 0.589 0.447 0 0.65 0.55 0.12 0.22 0.21 0.24 0.559 0.198
3 1988-01-01 0.66 0.75 0.83 0.75 0.811 0.78 0.32 0.23 0.25 0.764 0.267
Edit:
If we stop the expression right after calculating the averages, we can use the data in this structure to easily calculate the variance or account for a variable number of observations. I think it's fine to keep date_fire as its own column, but I'd suggest leaving the other dates in a single long column (because they correspond to observations), especially if we want to do more analysis with the data using ggplot2 and other tidyverse functions.
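For example, stopping after the grouping step, the same long structure gives the mean, the variance, and the number of non-missing observations per period in one pass (a sketch using the same column names as above):
data1989 %>%
  pivot_longer(-date_fire, names_to = "date_NDVI", values_to = "value", names_prefix = "^X") %>%
  mutate(date_fire = as.Date(date_fire),
         date_NDVI = as.Date(date_NDVI, "%Y.%m.%d")) %>%
  mutate(period = ifelse(date_NDVI < date_fire, "before_fire", "after_fire")) %>%
  group_by(date_fire, period) %>%
  summarise(mean_NDVI = mean(value, na.rm = TRUE),
            var_NDVI = var(value, na.rm = TRUE),
            n_obs = sum(!is.na(value)),
            .groups = "drop")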
We can use base R by creating a row/column index. The column index can be obtained from findInterval() with the column names and 'date_fire':
# Index of the last pre-fire date column for each row
j1 <- findInterval(as.Date(data1989$date_fire), as.Date(names(data1989)[-1]))
# Post-fire column positions for each row
l1 <- lapply(j1 + 1, `:`, ncol(data1989) - 1)
# Row/column index matrices for pre- and post-fire cells
m1 <- cbind(rep(seq_len(nrow(data1989)), j1), sequence(j1))
m2 <- cbind(rep(seq_len(nrow(data1989)), lengths(l1)), unlist(l1))
data1989$meanPreFire <- tapply(data1989[-1][m1], m1[, 1], FUN = mean, na.rm = TRUE)
data1989$meanPostFire <- tapply(data1989[-1][m2], m2[, 1], FUN = mean, na.rm = TRUE)
data1989
data1989
# date_fire 1986-01-01 1986-06-03 1986-10-19 1987-01-19 1987-06-19 1987-10-19 1988-01-19 1988-06-19 1988-10-19
#1 1987-01-01 0.500 0.560 0.80 0.75 0.100 0.15 0.20 0.18 0.21
#2 1987-07-03 0.589 0.447 NA 0.65 0.550 0.12 0.22 0.21 0.24
#3 1988-01-01 0.660 0.750 0.83 0.75 0.811 0.78 0.32 0.23 0.25
# meanPreFire meanPostFire
#1 0.6200 0.2650000
#2 0.5590 0.1975000
#3 0.7635 0.2666667
Or using melt/dcast from data.table:
library(data.table)
dcast(melt(setDT(data1989), id.var = 'date_fire')[,
    .(value = mean(value, na.rm = TRUE)),
    .(date_fire, grp = c('postFire', 'preFire')[1 + (as.IDate(variable) < as.IDate(date_fire))])],
  date_fire ~ grp)[data1989, on = .(date_fire)]
# date_fire postFire preFire 1986-01-01 1986-06-03 1986-10-19 1987-01-19 1987-06-19 1987-10-19 1988-01-19 1988-06-19
#1: 1987-01-01 0.2650000 0.6200 0.500 0.560 0.80 0.75 0.100 0.15 0.20 0.18
#2: 1987-07-03 0.1975000 0.5590 0.589 0.447 NA 0.65 0.550 0.12 0.22 0.21
#3: 1988-01-01 0.2666667 0.7635 0.660 0.750 0.83 0.75 0.811 0.78 0.32 0.23
# 1988-10-19
#1: 0.21
#2: 0.24
#3: 0.25
data
data1989 <- data.frame("date_fire" = c("1987-01-01", "1987-07-03", "1988-01-01"),
                       "1986-01-01" = c(0.5, 0.589, 0.66),
                       "1986-06-03" = c(0.56, 0.447, 0.75),
                       "1986-10-19" = c(0.8, NA, 0.83),
                       "1987-01-19" = c(0.75, 0.65, 0.75),
                       "1987-06-19" = c(0.1, 0.55, 0.811),
                       "1987-10-19" = c(0.15, 0.12, 0.780),
                       "1988-01-19" = c(0.2, 0.22, 0.32),
                       "1988-06-19" = c(0.18, 0.21, 0.23),
                       "1988-10-19" = c(0.21, 0.24, 0.250),
                       check.names = FALSE,
                       stringsAsFactors = FALSE)

Ordering a subset of columns by date

I have a data frame in which some of the columns are not in the correct order (they are dates). See:
data1989 <- data.frame("date_fire" = c("1987-02-01", "1987-07-03", "1988-01-01"),
                       "Foresttype" = c("oak", "pine", "oak"),
                       "meanSolarRad" = c(500, 550, 450),
                       "meanRainfall" = c(600, 300, 450),
                       "meanTemp" = c(14, 15, 12),
                       "1988.01.01" = c(0.5, 0.589, 0.66),
                       "1986.06.03" = c(0.56, 0.447, 0.75),
                       "1986.10.19" = c(0.8, NA, 0.83),
                       "1988.01.19" = c(0.75, 0.65, 0.75),
                       "1986.06.19" = c(0.1, 0.55, 0.811),
                       "1987.10.19" = c(0.15, 0.12, 0.780),
                       "1988.01.19" = c(0.2, 0.22, 0.32),
                       "1986.06.19" = c(0.18, 0.21, 0.23),
                       "1987.10.19" = c(0.21, 0.24, 0.250),
                       check.names = FALSE,
                       stringsAsFactors = FALSE)
> data1989
date_fire Foresttype meanSolarRad meanRainfall meanTemp 1988.01.01 1986.06.03 1986.10.19 1988.01.19 1986.06.19 1987.10.19 1988.01.19 1986.06.19 1987.10.19
1 1987-02-01 oak 500 600 14 0.500 0.560 0.80 0.75 0.100 0.15 0.20 0.18 0.21
2 1987-07-03 pine 550 300 15 0.589 0.447 NA 0.65 0.550 0.12 0.22 0.21 0.24
3 1988-01-01 oak 450 450 12 0.660 0.750 0.83 0.75 0.811 0.78 0.32 0.23 0.25
I would like to order the columns by increasing date, and keep the first 5 columns the same. Keep in mind that in my original dataset I have 30 initial columns to be kept the same.
As commented, try to avoid wide-formatted data with columns that contain data elements such as dates, category values, or other indicators. Instead, use long-formatted, tidy data, where ordering is much easier, as are aggregation, merging, plotting, and modeling.
Specifically, consider reshape() to melt the dates into one field, such as quarter, alongside value. The quarter column can then be ordered easily:
# RESHAPE WIDE TO LONG
long_data1989 <- reshape(data1989,
                         varying = names(data1989)[6:ncol(data1989)],
                         times = names(data1989)[6:ncol(data1989)],
                         v.names = "value", timevar = "quarter", ids = NULL,
                         new.row.names = 1:1E4, direction = "long")
# ORDER DATES AND RESET row.names
long_data1989 <- `row.names<-`(with(long_data1989,
                                    long_data1989[order(date_fire, quarter), ]),
                               NULL)
long_data1989
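Should the wide layout still be needed afterwards, the ordered long data can be pivoted back, which leaves the date columns in sorted order (a sketch; idvar assumes the first 5 columns uniquely identify a row):
# LONG BACK TO WIDE, DATE COLUMNS NOW SORTED
reshape(long_data1989, idvar = names(data1989)[1:5],
        timevar = "quarter", direction = "wide")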
If you want to use dplyr, here is an alternative. Note that each column name has to be unique; in your df there were some duplicates (commented out below).
library(dplyr)
data1989 <- data.frame("date_fire" = c("1987-02-01", "1987-07-03", "1988-01-01"),
                       "Foresttype" = c("oak", "pine", "oak"),
                       "meanSolarRad" = c(500, 550, 450),
                       "meanRainfall" = c(600, 300, 450),
                       "meanTemp" = c(14, 15, 12),
                       "1988.01.01" = c(0.5, 0.589, 0.66),
                       "1986.06.03" = c(0.56, 0.447, 0.75),
                       "1986.10.19" = c(0.8, NA, 0.83),
                       "1988.01.19" = c(0.75, 0.65, 0.75),
                       "1986.06.19" = c(0.1, 0.55, 0.811),
                       "1987.10.19" = c(0.15, 0.12, 0.780),
                       # "1988.01.19" = c(0.2, 0.22, 0.32),
                       # "1986.06.19" = c(0.18, 0.21, 0.23),
                       # "1987.10.19" = c(0.21, 0.24, 0.250),
                       check.names = FALSE,
                       stringsAsFactors = FALSE)
# Sort the date column names (replace 6 with the first date column)
sorted_colnames = sort(names(data1989)[6:ncol(data1989)])
# Sort the columns (replace 5 with the last non-date column)
data1989 %>%
  select(1:5, sorted_colnames)
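With recent dplyr/tidyselect versions, passing an external character vector straight into select() triggers a deprecation note; wrapping it in all_of() is the recommended form:
data1989 %>%
  select(1:5, all_of(sorted_colnames))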
We can convert the column names that are dates to Date class, order them, and then use that as a column index:
# Positions of the date-like column names
i1 <- grep('^\\d{4}\\.\\d{2}\\.\\d{2}$', names(data1989))
# Keep the leading columns, then append the date columns in date order
data1989[c(seq_len(i1[1] - 1), order(as.Date(names(data1989)[i1], "%Y.%m.%d")) + i1[1] - 1)]
# date_fire Foresttype meanSolarRad meanRainfall meanTemp 1986.06.03 1986.06.19 1986.06.19.1 1986.10.19 1987.10.19
#1 1987-02-01 oak 500 600 14 0.560 0.100 0.18 0.80 0.15
#2 1987-07-03 pine 550 300 15 0.447 0.550 0.21 NA 0.12
#3 1988-01-01 oak 450 450 12 0.750 0.811 0.23 0.83 0.78
# 1987.10.19.1 1988.01.01 1988.01.19 1988.01.19.1
#1 0.21 0.500 0.75 0.20
#2 0.24 0.589 0.65 0.22
#3 0.25 0.660 0.75 0.32
Base R solution (similar to @Parfait's):
# Reshape dataframe wide --> long:
df_long <-
  reshape(data1989,
          direction = "long",
          varying = which(!is.na(as.Date(names(data1989), "%Y.%m.%d"))),
          idvar = which(is.na(as.Date(names(data1989), "%Y.%m.%d"))),
          v.names = "value",
          times = na.omit(as.Date(names(data1989), "%Y.%m.%d")),
          timevar = "date_surveyed",
          new.row.names = 1:(nrow(data1989) *
                               length(na.omit(as.Date(names(data1989), "%Y.%m.%d")))))
# Order the data frame and reset the index:
ordered_df_long <- data.frame(df_long[with(df_long, order(date_fire, date_surveyed)), ],
                              row.names = NULL)

How to recombine values after a split in R?

I have a data variable X to which I have done the following:
Xnew = split(X$col1, list(X$col3, X$col4))
S = sapply(Xnew, mean)
I now have a vector where each element can be accessed by:
S['SomeValCol3.SomeValCol4']
Now I would like to create a table with one column per unique value of col3, and with col4 added as a column indexing each row. That is:
Col4    | Col3[1]                      | Col3[2] | ...
Col4[0] | S['SomeValCol3.SomeValCol4'] | ...
...
And so on.
As an example, let's say I have the following vector
S['v31.v41'] = 0.5
S['v32.v41'] = 0.25
S['v33.v41'] = 0.35
S['v31.v42'] = 0.5
S['v32.v42'] = 0.25
S['v33.v42'] = 0.35
S['v31.v43'] = 0.5
S['v32.v43'] = 0.25
S['v33.v43'] = 0.35
which I got from the split, and then I want this matrix:
V4  | V31 | V32  | V33
V41   0.5   0.25   0.35
V42   0.5   0.25   0.35
V43   0.5   0.25   0.35
Using base R:
xtabs(values ~ V1 + V2, transform(stack(S), V2 = sub('\\..*', '', ind),
                                  V1 = sub('.*\\.', '', ind)))
# V2
#V1 v31 v32 v33
# v41 0.50 0.25 0.35
# v42 0.50 0.25 0.35
# v43 0.50 0.25 0.35
data
S <- structure(c(0.5, 0.25, 0.35, 0.5, 0.25, 0.35, 0.5, 0.25, 0.35
), .Names = c("v31.v41", "v32.v41", "v33.v41", "v31.v42", "v32.v42",
"v33.v42", "v31.v43", "v32.v43", "v33.v43"))
Using the reshape2 library, I'd first melt the vector S to a data.frame and add row/column variable names:
library(reshape2)
S.melted <- melt(S)  # one row per element; rownames keep the "v3x.v4x" labels
S.melted$v1 <- gsub('\\.v[[:digit:]]+$', '', rownames(S.melted))  # part before the dot
S.melted$v2 <- gsub('^v[[:digit:]]+\\.', '', rownames(S.melted))  # part after the dot
which gives me S.melted in the format below:
value v1 v2
v31.v41 0.50 v31 v41
v32.v41 0.25 v32 v41
...
and then obtain the preferred format using acast:
> acast(S.melted, v1 ~ v2)
v41 v42 v43
v31 0.50 0.50 0.50
v32 0.25 0.25 0.25
v33 0.35 0.35 0.35
