Weighting one parameter in a multiplication formula

We are working on a system to assign a score to workout sessions.
We have 2 parameters that we would like to calculate the score from:
Reps * weight
For example:
6 reps * 30 weight should give a higher score than
10 reps * 20 weight
Now:
6 reps * 30 weight = 180 score
10 reps * 20 weight = 200 score
Our goal would be that:
6 reps * 30 weight = 250 score (or similar)
10 reps * 20 weight = 220 score (or similar)
But we cannot get the formula right.
Thank you!
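One common way to make weight count for more than reps is to raise it to a power greater than 1 before multiplying (a sketch only; the exponent 1.5 is an assumption to tune, not the poster's formula):
score <- function(reps, weight, p = 1.5) reps * weight^p
score(6, 30)   # ~986 -> the heavier set now scores higher
score(10, 20)  # ~894
Dividing both results by a constant (e.g. 4) rescales them to roughly 246 and 224, close to the 250 and 220 targets above.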

Related

Integrate functions for depth integrated species abundance

Hi,
I am trying to calculate the organism quantity per class over the entire depth range (e.g., from 10 m to 90 m). To do that I have the abundance at certain depths (e.g., 10, 30 and 90 m) and I use the integrate function below, which calculates the average abundance between each pair of depths, multiplied by the difference between those depths. The values are summed over the entire water column to get a total abundance.
See an example (only a tiny part of a bigger data set with several locations, years, classes and depths):
View(df)
Class Depth organismQuantity
1 Ciliates 10 1608.89
2 Ciliates 30 2125.09
3 Ciliates 90 1184.92
4 Dinophyceae 10 0.00
5 Dinoflagellates 30 28719.60
6 Dinoflagellates 90 4445.26
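Worked by hand for the Ciliates rows above, that rule gives (matching the 136640.1 reported below):
(1608.89 + 2125.09)/2 * (30 - 10) + (2125.09 + 1184.92)/2 * (90 - 30)
# 37339.8 + 99300.3 = 136640.1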
integrate = function(x) {
  averages = (x$organismQuantity[1:length(x)-1] + x$organismQuantity[2:length(x)]) / 2
  sum(averages * diff(x$Depth))
}
result = ddply(df, .(Class), integrate)
print(result)
But I got this result and a warning message for the classes with NA values:
Class V1
1 Ciliates 136640.1
2 Dinoflagellates NA
3 Dinophyceae 0.0
Warning messages:
1: In averages * diff(x$Depth) :
longer object length is not a multiple of shorter object length
I don't understand why Dinoflagellates got an NA value... It is the same for several other classes in my complete data set (for some classes the integration works, for others I get the warning message).
Thank you for the help!!
Cheers,
Lucie
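A likely cause (an assumption from the code shown, not confirmed by the poster): length(x) on a data frame is the number of columns (3 here), not the number of rows, so the indices only line up for classes that happen to have exactly 3 depths. A minimal sketch of the same function using nrow() instead:
integrate <- function(x) {
  n <- nrow(x)  # number of depth observations in this class (assumes at least two)
  averages <- (x$organismQuantity[1:(n - 1)] + x$organismQuantity[2:n]) / 2
  sum(averages * diff(x$Depth))
}
With this change Dinoflagellates gives 994945.8, the same value as the trapz answer below.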
Here is a way using the function trapz from the package caTools, adapted to the problem.
# Adapted from the original caTools::trapz (library(caTools))
# Author(s): Jarek Tuszynski
trapz <- function(DF, x, y){
  x <- DF[[x]]
  y <- DF[[y]]
  idx <- seq_along(x)[-1]
  as.double((x[idx] - x[idx - 1]) %*% (y[idx] + y[idx - 1])) / 2
}
library(plyr)
ddply(df, .(Class), trapz, x = "Depth", y = "organismQuantity")
# Class V1
#1 Ciliates 136640.1
#2 Dinoflagellates 994945.8
#3 Dinophyceae NA
Data
df <- read.table(text = "
Class Depth organismQuantity
1 Ciliates 10 1608.89
2 Ciliates 30 2125.09
3 Ciliates 90 1184.92
4 Dinophyceae 10 0.00
5 Dinoflagellates 30 28719.60
6 Dinoflagellates 90 4445.26
", header = TRUE)

How to fill a new data frame based on a value and the result of a calculation using that value?

For graphical purposes, I want to create a new data frame with two columns.
The first column is the dose of the treatment received (i; 10 grammes up to 200 grammes).
The second column must be filled with the result of a calculation corresponding to the dose received, i.e. the percentage of patients developing the disease at that dose, which is given by the formula below.
The dose is extracted from a much larger dataset (data_fcpa) of more than 1,000 rows (patients).
percent_i <- round(prop.table(table(data_fcpa$n_chir_act[data_fcpa$cyproterone_dose > i] > 1))[2] * 100, 1)
I know how to create a new data frame (df) with the doses I want to explore:
df <- data.frame(dose = seq(10, 200, by = 10))
names(df) <- c("cpa_dose")
> df
cpa_dose
1 10
2 20
3 30
4 40
5 50
6 60
7 70
8 80
9 90
10 100
11 110
12 120
13 130
14 140
15 150
16 160
17 170
18 180
19 190
20 200
For example, for a dose of 10 grammes the result is:
> round(prop.table(table(data_fcpa$n_chir_act[data_fcpa$cyproterone_dose > 10] > 1))[2] * 100, 1)
TRUE
11.7
I suspect that a loop is needed to produce an output like the little example provided below, but I have no idea how to do it.
cpa_dose percentage
1 10 11.7
2 20
3 30
4 40
Any suggestions are welcome.
Thank you in advance for your help.
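A minimal sketch of that loop, assuming data_fcpa is in scope: wrap the formula above in sapply() over the doses.
# assumes data_fcpa is already loaded; [2] picks the TRUE proportion, as in the formula above
df$percentage <- sapply(df$cpa_dose, function(i) {
  round(prop.table(table(data_fcpa$n_chir_act[data_fcpa$cyproterone_dose > i] > 1))[2] * 100, 1)
})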
It seems that you are describing a situation where you want to show predicted effects from a statistical model? In that case, ggeffects is your best friend.
library(tidyverse)
library(ggeffects)
lm(mpg ~ hp, mtcars) %>%
  ggpredict() %>%
  as_tibble()
Btw, in order to answer your question properly, you need to provide some data and show what you have tried.

Error in using 'ddply' and 'glm' for k-value estimation (hyperbolic delay discounting) in R

I am trying to find the k-value (or discount rate) that best explains my participants' choices between immediate and delayed rewards (where a lower k-value means they choose a lot of immediate options and a higher k-value means they are more "patient").
SS = Smaller Sooner; LL = Larger Later Reward; Delay = in Days; Choice = 0:SS, 1:LL; SV = Subjective Value.
So first I assign 5001 potential k values or discount rates to each trial (from -50 to 200 in steps of 0.05) which results in a data frame with 8001600 rows (50 participants * 32 trials per participant * 5001 potential values).
This is how the k-values were assigned to the data -
uniquek <- seq(-50, 200, 0.05)
DataSoc <- do.call(rbind, lapply(1:length(uniquek), function(i) data.frame(i, SocialData)))
k <- rep(uniquek, times = nrow(SocialData))
DataSoc$k <- k
Then I create an empty data frame (called 'data_simulation' here) with 3 columns (PPN_f, k, r_squared) each 8001600 rows long.
Then I try to apply 'ddply' to a data frame to be able to perform a logistic regression using glm, something like this -
data_simulation <- ddply(DataSoc, .(PPN_f, k), function(x){
  r_squared <- summary(glm(Choice ~ SV_diff, x, family = binomial()))$r_squared
  return(data.frame(r_squared))}, .progress = "win")
Ideally, this would give me an r_squared value for each participant and each candidate k-value, after which I would find the k with the largest r_squared for each participant, and assign that k-value to that participant.
BUT the regression just isn't going through. Could you help solve this issue?
Here's the first 6 rows of my raw data for reference. Thank you for your help!
> head(SocialData)
PPN_f (Participant as factor) SS LL Delay Choice SS_SV LL_SV SV_diff
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 5e7339dac6b16528d49937bc 1000 30000 60 1 1000 1000 0
2 5e7339dac6b16528d49937bc 1000 5000 60 0 1000 1000 0
3 5e7339dac6b16528d49937bc 1000 10000 60 1 1000 1000 0
4 5e7339dac6b16528d49937bc 1000 5000 30 0 1000 1000 0
5 5e7339dac6b16528d49937bc 1000 5000 5 1 1000 1000 0
6 5e7339dac6b16528d49937bc 1000 2500 14 0 1000 1000 0
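One likely reason the call fails (an assumption, since no error message is shown): summary.glm() has no $r_squared element, so the function returns an empty data frame for every group. A minimal sketch that uses McFadden's pseudo-R-squared (1 - residual deviance / null deviance) as a stand-in:
library(plyr)
data_simulation <- ddply(DataSoc, .(PPN_f, k), function(x) {
  fit <- glm(Choice ~ SV_diff, data = x, family = binomial())
  # pseudo-R^2 per participant and candidate k
  data.frame(r_squared = 1 - fit$deviance / fit$null.deviance)
}, .progress = "win")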

Understanding the quality of the KMeans algorithm

After reading Unbalanced factor of KMeans, I am trying to understand how this works. I mean, from my examples, I can see that the smaller the value of the factor, the better the quality of the KMeans clustering, i.e. the more balanced its clusters are. But what is the plain mathematical interpretation of this factor? Is this a known quantity or something?
Here are my examples:
C1 = 10
C2 = 100
pdd = [(C1,10), (C2, 100)]
n = 2 <-- #clusters
total = 110 <-- #points
uf = 10 * 10 + 100 * 100
uf = 10100 * 2 / 12100 = 1.67
C1 = 50
C2 = 60
pdd = [(C1, 50), (C2, 60)]
n = 2
total = 110
uf = 2500 + 3600
uf = 6100 * 2 / 12100 = 1.008
C1 = 1
C2 = 1
pdd = [(C1, 1), (C2, 1)]
n = 2
total = 2
uf = 2
uf = 2 * 2 / (2 * 2) = 1
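Putting the three examples into a small R function (a sketch of the factor exactly as computed above: number of clusters times the sum of squared cluster sizes, divided by the squared total number of points):
uf <- function(sizes) length(sizes) * sum(sizes^2) / sum(sizes)^2
uf(c(10, 100))  # ~1.67  (unbalanced)
uf(c(50, 60))   # ~1.008
uf(c(1, 1))     # 1      (perfectly balanced)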
It appears to be related to the Gini index, an impurity measure that also uses the sum of squared counts, as discussed on Cross Validated: Understanding the quality of the KMeans algorithm.

Apply LR models to another dataframe

I searched SO, but I could not seem to find the right code that is applicable to my question. It is similar to this question: Linear Regression calculation several times in one dataframe
I got a dataframe of LR coefficients following Andrie's code:
Cddply <- ddply(test, .(sumtest), function(test)coef(lm(Area~Conc, data=test)))
sumtest (Intercept) Conc
1 -108589.2726 846.0713372
2 -49653.18701 811.3982918
3 -102598.6252 832.6419926
4 -72607.4017 727.0765558
5 54224.28878 391.256075
6 -42357.45407 357.0845661
7 -34171.92228 367.3962888
8 -9332.569856 289.8631555
9 -7376.448899 335.7047756
10 -37704.92277 359.1457617
My question is how to apply each of these LR models (1-10) to specific row intervals in another dataframe, in order to get x, the independent variable, into a 3rd column. For example, I would like to apply sumtest1 to Samples 6:29, sumtest2 to Samples 35:50, sumtest3 to Samples 56:79, etc., in alternating intervals of 24 and 16 samples. The sample numbers repeat after 200, so sumtest9 will apply to Samples 6:29 again.
Sample Area
6 236211
7 724919
8 1259814
9 1574722
10 268836
11 863818
12 1261768
13 1591845
14 220322
15 608396
16 980182
17 1415859
18 276276
19 724532
20 1130024
21 1147840
22 252051
23 544870
24 832512
25 899457
26 285093
27 4291007
28 825922
29 865491
35 246707
36 538092
37 767269
38 852410
39 269152
40 971471
41 1573989
42 1897208
43 261321
44 481486
45 598617
46 769240
47 229695
48 782691
49 1380597
50 1725419
The resulting dataframe would look like this:
Sample Area Calc
6 236211 407.5312917
7 724919 985.1525288
8 1259814 1617.363812
9 1574722 1989.564693
10 268836 446.0919309
...
35 246707 365.2452551
36 538092 724.3591324
37 767269 1006.805521
38 852410 1111.736505
39 269152 392.9073207
Thank you for your assistance.
Is this what you want? I made up a slightly larger dummy data set of 'area' to make it easier to see how the code worked when I tried it out.
# create 400 rows of area data
set.seed(123)
df <- data.frame(area = round(rnorm(400, mean = 1000000, sd = 100000)))
# "sample numbers repeats after 200" -> add a sample nr 1-200, 1-200
df$sample_nr <- 1:200
# create a factor which cuts the vector of sample_nr into pieces of length 16, 24, 16, 24...
# repeat to a total length of the pieces is 200
# i.e. 5 repeats of (16, 24)
grp <- cut(df$sample_nr, breaks = c(-Inf, cumsum(rep(c(16, 24), 5))))
# add a numeric version of the chunks to data frame
# this number indicates the model from which coefficients will be used
# row 1-16 (16 rows): model 1; row 17-40 (24 rows): model 2;
# row 41-56 (16 rows): model 3; and so on.
df$mod <- as.numeric(grp)
# read coefficients
coefs <- read.table(text = "intercept beta_conc
1 -108589.2726 846.0713372
2 -49653.18701 811.3982918
3 -102598.6252 832.6419926
4 -72607.4017 727.0765558
5 54224.28878 391.256075
6 -42357.45407 357.0845661
7 -34171.92228 367.3962888
8 -9332.569856 289.8631555
9 -7376.448899 335.7047756
10 -37704.92277 359.1457617", header = TRUE)
# add model number
coefs$mod <- rownames(coefs)
head(df)
head(coefs)
# join area data and coefficients by model number
# (use 'join' instead of merge to avoid sorting)
library(plyr)
df2 <- join(df, coefs)
# calculate conc from area and model coefficients
# area = intercept + beta_conc * conc
# conc = (area - intercept) / beta_conc
df2$conc <- (df2$area - df2$intercept) / df2$beta_conc
head(df2, 41)
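As a quick sanity check against the expected output in the question (using its own numbers), Sample 6 falls in the first interval, so model 1's coefficients apply:
(236211 - (-108589.2726)) / 846.0713372
# ~407.53, matching the 407.5312917 expected for Sample 6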
