Problem with radarchart using fmsb library - r

I'm helping a friend make figures for her publication. She needs radar charts and sent me the data. I'm using the fmsb library in R, but I have a problem. The data are the following (slightly modified, because they are unpublished :) ):
consequences  timeline  personal control  treatment  control  identity  concern  comprehensibility  emotions
Min           0         0                 0          0        0         0        0                  0
Max           10        10                10         10       10        10       10                 10
Mean          7.74      6.14              5.2        2.82     7.12      7.18     2.44               7.26
The code:
library(fmsb)
radarchart(chaos_story2,
           axistype = 4,
           axislabcol = "black",
           seg = 5,
           caxislabels = c(0, 2, 4, 6, 8, 10),
           cglcol = "black")
The problem is that radarchart seems to plot 1 - mean (on the rescaled axis) instead of the actual mean values. How can I solve this?
Thank you in advance

The issue is that the min and max rows are in the wrong order. According to the docs ?radarchart:
If maxmin is TRUE, this must include maximum values as row 1 and minimum values as row 2 for each variables, and actual data should be given as row 3 and lower rows.
Fixing the order gives the desired result.
Note: There is probably an issue with your example data: the consequences column contains the row names, so I dropped it.
library(fmsb)
# Max has to be first row, min the second
chaos_story2 <- chaos_story2[c(2, 1, 3), ]
radarchart(chaos_story2,
axistype = 4,
axislabcol = "black",
seg = 5,
caxislabels= c(0,2,4,6,8,10),
cglcol="black")
DATA
chaos_story2 <- data.frame(
consequences = c("Min", "Max", "Mean"),
timeline = c(0, 10, 7.74),
personal.control = c(0, 10, 6.14),
treatment = c(0, 10, 5.2),
control = c(0, 10, 2.82),
identity = c(0, 10, 7.12),
concern = c(0, 10, 7.18),
comprehensibility = c(0, 10, 2.44),
emotions = c(0, 10, 7.26)
)
rownames(chaos_story2) <- chaos_story2$consequences
chaos_story2$consequences <- NULL
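Why the swapped rows show up as "1 - mean": following the documented convention, radarchart treats row 1 as each axis's maximum and row 2 as its minimum, and places a value at roughly (value - row 2) / (row 1 - row 2) along the axis. With Min in row 1 and Max in row 2 that becomes (10 - mean) / 10. The sketch below uses a hypothetical helper, axis_fraction(), which is not part of fmsb; it only mimics that rescaling and ignores the small centre gap radarchart adds:

```r
# Hypothetical helper, NOT part of fmsb: approximates where radarchart()
# puts a value on each axis, as a fraction of the axis length. It assumes
# row 1 is the maximum and row 2 the minimum, and ignores the centre gap.
axis_fraction <- function(df) {
  top    <- unlist(df[1, ])
  bottom <- unlist(df[2, ])
  value  <- unlist(df[3, ])
  (value - bottom) / (top - bottom)
}

# Two of the columns from the question, in the original (wrong) row order
wrong_order <- data.frame(timeline  = c(0, 10, 7.74),
                          treatment = c(0, 10, 5.2),
                          row.names = c("Min", "Max", "Mean"))
right_order <- wrong_order[c(2, 1, 3), ]  # Max first, then Min, then data

axis_fraction(wrong_order)  # 0.226, 0.48: i.e. 1 - mean/10
axis_fraction(right_order)  # 0.774, 0.52: i.e. mean/10
```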

Related

Multiply probability distributions in R [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about programming within the scope defined in the help center.
Closed 7 months ago.
I'm trying to multiply some probability density functions so as to update the probability given certain factors. I've tried several things using the pdqr and bayesmeta packages, but none of them work the way I intend. What am I missing?
Below is a reproducible example with two different distributions, a and b, which I want to multiply. The reason is that, as you can see, b has no measurements at the low values, so its probability there is 0, and this should be reflected in the updated distribution.
library(tidyverse)
library(pdqr)
library(bayesmeta)
#measurements
a <- c(1, 2, 2, 4, 5, 5, 6, 6, 7, 7, 7, 8, 7, 8, 2, 6, 9, 10)
b <- c(5, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 9, 9, 9, 7)
#create probability distribution functions
distr_a <- new_d(a, type = "continuous")
distr_b <- new_d(b, type = "continuous")
#try to combine distributions
summarized <- distr_a + distr_b
multiplied <- distr_a * distr_b
mixture <- form_mix(list(distr_a, distr_b))
convolution <- convolve(distr_a, distr_b)
The resulting PDFs are plotted like this:
bayesmeta::convolve() gives the same result as adding the two pdqr PDFs, and both seem to oddly shift the distribution to the right and make it lower than it is supposed to be.
Multiplying the pdqr PDFs directly leaves a very low probability overall.
pdqr::form_mix() seems to even the PDFs out in between, but it leaves probabilities above 0 for the lower x values.
So, to get some insight into what I wanted, I used the PDFs of a and b to compute probabilities at each x value and multiplied those:
#multiply distributions manually
x <- c(1:10)
manual <- data.frame(x) %>%
mutate(a = distr_a(x),
b = distr_b(x),
multiplied = a*b)
This indeed gives the shape I am after, but (logically) the probabilities are too low:
I would like to multiply (multiple) PDFs. What am I doing wrong? Are my statistics wrong, or am I missing a useful function?
UPDATE:
It seems I am a stats noob on this subject, but I would like to achieve something like the distribution below. Given that both situations a and b are true, I would expect the distribution to be something like the dotted line. Is that possible?
multiplied is the correct one. One can check this with log-normal distributions: the product of two independent log-normal random variables is log-normal with µ = µ_a + µ_b and σ² = σ²_a + σ²_b.
library(pdqr)
a <- rlnorm(25000, meanlog = 0, sdlog = 1)
b <- rlnorm(25000, meanlog = 1, sdlog = 1)
distr_a <- new_d(a, type = "continuous")
distr_b <- new_d(b, type = "continuous")
distr_ab <- form_trans(
list(distr_a, distr_b), trans = function(x, y) x*y
)
# or: distr_ab <- distr_a * distr_b
plot(distr_ab, xlim = c(0, 40))
curve(dlnorm(x, meanlog = 1, sdlog = sqrt(2)), add = TRUE, col = "red")
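The log-normal claim can also be checked without pdqr by plain simulation: if log(a) ~ N(0, 1) and log(b) ~ N(1, 1) independently, then log(a * b) = log(a) + log(b) should have mean close to 1 and variance close to 2. A minimal base-R sketch; the seed and sample size are arbitrary choices:

```r
set.seed(42)
n <- 1e5
a <- rlnorm(n, meanlog = 0, sdlog = 1)
b <- rlnorm(n, meanlog = 1, sdlog = 1)

# log(a * b) = log(a) + log(b), a sum of independent normals
log_ab <- log(a * b)
mean(log_ab)  # close to 0 + 1 = 1
var(log_ab)   # close to 1^2 + 1^2 = 2

# Compare the simulated product with the theoretical log-normal
ks.test(a * b, "plnorm", meanlog = 1, sdlog = sqrt(2))
```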
As demonstrated here:
https://www.r-bloggers.com/2019/05/bayesian-models-in-r-2/
# Example distributions
probs <- seq(0,1,length.out= 100)
prior <- dbinom(x = 8, prob = probs, size = 10)
lik <- dnorm(x = probs, mean = .5, sd = .1)
# Multiply distributions
unstdPost <- lik * prior
# To get an actual posterior it must be a probability distribution.
# Dividing by the sum makes the grid values sum to 1 (divide by the grid
# spacing as well if you need a density that integrates to 1):
stdPost <- unstdPost / sum(unstdPost)
# Plot
plot(probs, prior, col = "black",
     type = "l", xlab = "P(Black)", ylab = "Density")
lines(probs, lik / 15, col = "red")  # likelihood rescaled to fit the axis
lines(probs, unstdPost, col = "green")
lines(probs, stdPost, col = "blue")
# Legend order matches the colours used above (1 = black, 2 = red, ...)
legend("topleft", legend = c("Prior", "Lik (rescaled)", "Unstd Post", "Post"),
       text.col = 1:4, bty = "n")
Created on 2022-08-06 by the reprex package (v2.0.1)
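The same multiply-then-normalise idea can be applied to the measurements from the question. A minimal sketch, assuming base R's stats::density() is an acceptable stand-in for pdqr::new_d() and treating a and b as independent pieces of evidence; the grid limits 0 to 12 and n = 512 are arbitrary choices. Because both estimates are evaluated on the same grid they can be multiplied element-wise, and the trapezoidal rule (unlike a plain sum()) accounts for the grid spacing, so the result integrates to 1:

```r
a <- c(1, 2, 2, 4, 5, 5, 6, 6, 7, 7, 7, 8, 7, 8, 2, 6, 9, 10)
b <- c(5, 6, 6, 6, 7, 7, 7, 7, 7, 8, 8, 8, 9, 9, 9, 7)

# Kernel density estimates on a shared grid (from/to/n fix the x values)
dens_a <- density(a, from = 0, to = 12, n = 512)
dens_b <- density(b, from = 0, to = 12, n = 512)

x <- dens_a$x
unnormalised <- dens_a$y * dens_b$y  # pointwise product of the two densities

# Trapezoidal rule for the area under a curve sampled at x
trapz <- function(x, y) sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
posterior <- unnormalised / trapz(x, unnormalised)

plot(x, posterior, type = "l", lty = 2, xlab = "x", ylab = "Density")
lines(dens_a, col = "grey40")
lines(dens_b, col = "grey70")
```

The dashed curve keeps b's near-zero density at low x values while still integrating to 1, which is the kind of shape described in the update.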

How to write a function that collects a specific list of observations from a time series data frame

In the data set created below, assume I randomly picked up 20 flat rocks. Each rock was assigned a unique ID number. I measured the concentration of 7 substances (Copper, Iron, Carbon, Lead, Mg, CaCO, and Zinc) across the surface of the longest axis of each rock. Distance is recorded in mm and is therefore a function of each rock's length. Note that not all rocks are the same length. Location is a grouping variable that describes where the rock was picked up.
ID <- data.frame(ID=rep(c(12,122,242,329,595,130,145,245,654,878), each = 200))
ID2 <- data.frame(ID=rep(c(863,425,24,92,75,3,200,300,40,500), each = 300))
RockID<-data.frame(RockID = c(unlist(ID), unlist(ID2)))
Location <- rep(c("Alpha","Beta","Charlie","Delta","Echo"), each = 1000)
a <- rep(c(1:200),times = 10)
b <- rep(c(1:300), times = 10)
Time <- data.frame(Time = c(unlist(a), unlist(b)))
set.seed(1)
Copper <- rnorm(5000, mean = 0, sd = 5)
Iron <- rnorm(5000, mean = 0, sd = 10)
Carbon <- rnorm(5000, mean = 0, sd = 1)
Lead <- rnorm(5000, mean = 0, sd = 4)
Mg <- rnorm(5000, mean = 0, sd = 6)
CaCO <- rnorm(5000, mean = 0, sd = 2)
Zinc <- rnorm(5000, mean = 0, sd = 3)
data <-cbind(RockID, Location, Time,Copper,Iron,Carbon,Lead,Mg,CaCO,Zinc)
data$ID <- as.factor(data$RockID)
I want to create a new data frame that contains the following information:
1. The first observation and the last observation for each individual
2. The average of the first 3 observations and last 3 observations for each individual
3. The same as step 2 for the first and last 5, 7, and 10 observations
I want the new data frame to be set up like this:
ID FirstPt First3 First5 First7 First10 LastPt Last3 Last5 Last7 Last10
12 … … … … … … … … … …
122
242
329
595
130
145
245
654
878
863
425
etc.
How would I write a function to accomplish this?
We can create functions to calculate the average of the first and last n values. Use pivot_longer to get the data in long format, group_by each RockID and substance, and calculate the means.
library(dplyr)
average_of_first_n_values <- function(value, x) mean(head(value, x))
average_of_last_n_values <- function(value, x) mean(tail(value, x))
data %>%
tidyr::pivot_longer(cols = Copper:Zinc) %>%
group_by(RockID, name) %>%
summarise(first_obs = first(value),
last_obs = last(value),
first_3_avg = average_of_first_n_values(value, 3),
first_5_avg = average_of_first_n_values(value, 5),
first_7_avg = average_of_first_n_values(value, 7),
first_10_avg = average_of_first_n_values(value, 10),
last_3_avg = average_of_last_n_values(value, 3),
last_5_avg = average_of_last_n_values(value, 5),
last_7_avg = average_of_last_n_values(value, 7),
last_10_avg = average_of_last_n_values(value, 10))
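The summarise() call above returns one row per rock and substance (long format). If the exact layout from the question is needed (one row per rock, columns FirstPt to Last10), a base-R sketch for a single substance could look like the following. The toy data frame and the helper name edge_summary() are hypothetical; rows are sorted by Time within each rock so that "first" and "last" are well defined:

```r
# Hypothetical toy data: two rocks, one substance, 12 readings each
toy <- data.frame(
  RockID = rep(c(12, 122), each = 12),
  Time   = rep(1:12, times = 2),
  Copper = c(1:12, seq(10, 120, by = 10))
)

# First/last point plus averages of the first/last n values of one vector
edge_summary <- function(v, ns = c(3, 5, 7, 10)) {
  firsts <- sapply(ns, function(n) mean(head(v, n)))
  lasts  <- sapply(ns, function(n) mean(tail(v, n)))
  c(FirstPt = v[1], setNames(firsts, paste0("First", ns)),
    LastPt = v[length(v)], setNames(lasts, paste0("Last", ns)))
}

# One summary row per rock, ordered by Time inside each rock
by_rock <- lapply(split(toy, toy$RockID), function(d) {
  edge_summary(d$Copper[order(d$Time)])
})
result <- data.frame(ID = as.numeric(names(by_rock)),
                     do.call(rbind, by_rock), row.names = NULL)
result
```

For several substances, the same function can be applied to each column in turn and the results bound side by side with prefixed column names.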

Specifying x values when converting approx() to data frame

I am trying to get a data frame from the output of approx(t, z, n = 120) below. My intent is for the returned input values to be in increments of 0.25, i.e. 0, 0.25, 0.5, 0.75, ..., so I've set n = 120.
However, the data frame I get doesn't return those input values.
t <- c(0, 0.5, 2, 5, 10, 30)
z <- c(1, 0.9869, .9478, 0.8668, .7438, .3945)
data.frame(approx(t, z, n = 120))
I appreciate any assistance in this matter.
There are 121, not 120, points from 0 to 30 inclusive in steps of 0.25:
length(seq(0, 30, 0.25))
## [1] 121
so use this:
approx(t, z, n = 121)
Another approach is:
approx(t, z, xout = seq(min(t), max(t), 0.25))
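A quick check, using t and z from the question, that both calls produce the same 0.25-spaced grid, and why n = 120 does not: with n points, approx() spaces them (max(t) - min(t)) / (n - 1) apart, which is 30 / 119 for n = 120:

```r
t <- c(0, 0.5, 2, 5, 10, 30)
z <- c(1, 0.9869, .9478, 0.8668, .7438, .3945)

# n = 120 gives a step of 30 / 119, slightly more than 0.25
diff(approx(t, z, n = 120)$x)[1]

by_n    <- data.frame(approx(t, z, n = 121))
by_xout <- data.frame(approx(t, z, xout = seq(min(t), max(t), 0.25)))

head(by_n, 4)             # x = 0, 0.25, 0.5, 0.75
all.equal(by_n, by_xout)  # TRUE: both approaches agree
```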

R function with multiple operators

The data range from -6 to 6 and I am trying to create 3 categories; however, my code does not assign anyone to category 2 even though there are people who should be in it.
FFMIBMDcopdcases$lowBMD = ifelse((FFMIBMDcopdcases$copd_Tscore >= -1) , 0,
ifelse((FFMIBMDcopdcases$copd_Tscore < -1), 1,
ifelse((FFMIBMDcopdcases$copd_Tscore <= -2.5), 2, NA)))
Try using the cut() function. Example:
myValues <- runif(n = 20, min = -6, max = 6)
as.numeric(as.character(cut(x = myValues, breaks = c(-Inf, -2.5, -1, Inf), labels = c(2, 1, 0))))
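Since you want a numeric result, it might be easiest to use findInterval(), although you will need to subtract the result from 2 to get the inverse order (2 for lowest and 0 for highest). Pass only the two inner cut points so that findInterval() returns 0, 1 or 2; with c(-Inf, -2.5, -1, Inf) it would return 1 to 3, and subtracting from 2 would give 1, 0 and -1:
FFMIBMDcopdcases$lowBMD = 2 - findInterval(FFMIBMDcopdcases$copd_Tscore,
                                           c(-2.5, -1))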
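Why category 2 never appears in the original code: the nested ifelse() calls test their conditions in order, and every value <= -2.5 already satisfies < -1, so it is assigned 1 before the <= -2.5 test is ever reached. Ordering the tests from most to least restrictive fixes the nested ifelse() as well. A minimal sketch with made-up scores standing in for FFMIBMDcopdcases$copd_Tscore:

```r
# Made-up T-scores; the NA shows that missing values stay missing
copd_Tscore <- c(-4, -2.5, -2, -1, 0.5, NA)

# Most restrictive condition first, so values <= -2.5 reach category 2
lowBMD <- ifelse(copd_Tscore <= -2.5, 2,
                 ifelse(copd_Tscore < -1, 1, 0))
lowBMD  # 2 2 1 0 0 NA
```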

Density distributions in R

An assignment has tasked us with creating a series of variables: normal1, normal2, normal3, chiSquared1 and 2, t, and F. They are defined as follows:
library(tibble)
Normal.Frame <- data_frame(normal1 = rnorm(5000, 0, 1),
normal2 = rnorm(5000, 0, 1),
normal3 = rnorm(5000, 0, 1),
chiSquared1 = normal1^2,
chiSquared2 = normal2^2,
F = sum(chiSquared1/chiSquared2),
t = sum(normal3/sqrt(chiSquared1 )))
We then have to make histograms of the distributions for normal1, chiSquared1 and 2, t, and F, which is simple enough for normal1 and the chiSquared variables, but when I try to plot F and t, the plot space is blank.
Our lecturer recommended limiting the range of F to 0-10, and t to -5 to 5. To do this, I use:
HistT <- hist(Normal.Frame$t, xlim = c(-5, 5))
HistF <- hist(Normal.Frame$F, xlim = c(0, 10))
Like I mentioned, this yields blank plots.
Your t and F are defined as sums, so each is a single value that gets recycled to every row. If that value is outside your xlim range, the histogram will be empty. If you remove the sum() function you should get the desired results.
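A minimal base-R sketch of the corrected definitions (data.frame() is used instead of tibble only to keep it dependency-free; the seed and breaks = 50 are arbitrary). chiSquared1 / chiSquared2 follows an F(1, 1) distribution and normal3 / sqrt(chiSquared1) a t distribution with 1 degree of freedom, so both have very heavy tails. Because hist() computes its breaks from the full data range and xlim only changes the plotting window, this sketch keeps only the values inside the lecturer's ranges before binning:

```r
set.seed(1)
n <- 5000
normal1 <- rnorm(n, 0, 1)
normal2 <- rnorm(n, 0, 1)
normal3 <- rnorm(n, 0, 1)
chiSquared1 <- normal1^2
chiSquared2 <- normal2^2

# Element-wise ratios, no sum(): one F and one t value per row
Normal.Frame <- data.frame(normal1, normal2, normal3,
                           chiSquared1, chiSquared2,
                           F = chiSquared1 / chiSquared2,    # F(1, 1)
                           t = normal3 / sqrt(chiSquared1))  # t with 1 df

# Keep only the window of interest so the bins are computed inside it
t_in_range <- Normal.Frame$t[abs(Normal.Frame$t) <= 5]
F_in_range <- Normal.Frame$F[Normal.Frame$F <= 10]
HistT <- hist(t_in_range, breaks = 50, xlim = c(-5, 5))
HistF <- hist(F_in_range, breaks = 50, xlim = c(0, 10))
```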
