Calculate the product of two elements based on a condition - XQuery

I am trying to calculate a value and display it in one of the elements. The value should be the product of two other values, chosen based on a condition. In the example below, the unbounded ConsumptionPerRecord structure has an element Consumption whose value should be derived from SizeType and Count:
if SizeType is A, then Consumption should be 0.5 (a constant representing SizeType A) x Count (say 2) = 1
if SizeType is B, then Consumption should be 1.0 x Count (say 2) = 2
and so on.
Note - SizeType comes from a predefined list of values (A, B and C), and Count can be any numeric value (1, 2, 3, 4, ...).
I looked for examples but could not find such a transformation done with XQuery.
<ns0:Response xmlns:ns0="http://testV4">
    <ns0:ReferenceData>Dummy Client reference</ns0:ReferenceData>
    <ns0:DataDetails>
        <ns0:Status>Not Available</ns0:Status>
        <ns0:ConsumptionPerRecord>
            <ns0:SizeType>A</ns0:SizeType>
            <ns0:Count>2</ns0:Count>
            <ns0:Consumption>should be .5 * count variable</ns0:Consumption>
            <ns0:NotConsumed>5</ns0:NotConsumed>
        </ns0:ConsumptionPerRecord>
        <ns0:ConsumptionPerRecord>
            <ns0:SizeType>B</ns0:SizeType>
            <ns0:Count>2</ns0:Count>
            <ns0:Consumption>0</ns0:Consumption>
            <ns0:NotConsumed>2</ns0:NotConsumed>
        </ns0:ConsumptionPerRecord>
    </ns0:DataDetails>
    <ns0:Message>Success</ns0:Message>
    <ns0:StatusCode>1000</ns0:StatusCode>
</ns0:Response>
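A minimal XQuery sketch of this kind of mapping (the function name local:multiplier, the constant for SizeType C, and the external variable $input holding the source document are assumptions; adjust them to your schema):
declare namespace ns0 = "http://testV4";
declare variable $input external;
(: lookup of the constant for each SizeType; the value for C is assumed :)
declare function local:multiplier($sizeType as xs:string) as xs:decimal {
  if ($sizeType = "A") then 0.5
  else if ($sizeType = "B") then 1.0
  else if ($sizeType = "C") then 1.5  (: assumed value :)
  else 0
};
for $rec in $input/ns0:Response/ns0:DataDetails/ns0:ConsumptionPerRecord
return
  <ns0:Consumption>{
    local:multiplier(string($rec/ns0:SizeType)) * xs:decimal($rec/ns0:Count)
  }</ns0:Consumption>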

Related

Run code on each element of a list and place results in a dataframe?

I have a list with 100 elements in it.
Each element is a dataframe like this (shortened version):
ID x y
1 436823 7203241
1 510444 7188575
1 473831 7177197
1 507846 7202005
1 510444 7202006
2 436823 7203241
2 510444 7188575
2 473831 7177197
2 507846 7202005
2 510444 7202006
My aim for each element is to calculate home range sizes for each ID, and place the results in a new dataframe.
My code for calculating home range (plug-in KDE) is:
library(ks)
Hpi1 <- Hpi(x = data, nstage = 2, pilot = "samse", pre = "scale")
coordinates(data) <- c("x", "y")
proj4string(data) <- CRS("+init=epsg:5321")
data2 <- kernelUD(data, h = 0.009153871, grid = 1000, same4all = FALSE,
                  hlim = c(0.1, 20), kern = "bivnorm", extent = 0.5)
ver <- getverticeshr(data2, 95)
After running this, I type ver$area to obtain the home range size.
Once I get the home range sizes for all IDs in each list element, I would like them placed in a new dataset like this (for example):
List element ID area
1 1 60000
1 2 10080
.
.
.
etc.
I was thinking of either doing a loop or function or both, but I'm new to R so am not sure how to go about it. Help would be greatly appreciated!
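One way to approach it (a sketch, assuming your list is called my_list, each element has columns ID, x and y, and the kernelUD settings from your code are what you want):
library(sp)
library(adehabitatHR)
# 95% home-range area for every ID within one data frame
home_ranges <- function(d) {
  d$ID <- as.factor(d$ID)              # kernelUD expects a factor id column
  coordinates(d) <- c("x", "y")
  proj4string(d) <- CRS("+init=epsg:5321")
  ud <- kernelUD(d[, "ID"], h = 0.009153871, grid = 1000,
                 hlim = c(0.1, 20), kern = "bivnorm", extent = 0.5)
  ver <- getverticeshr(ud, 95)
  data.frame(ID = ver$id, area = ver$area)
}
# apply to each list element and bind the results, keeping the element index
res <- do.call(rbind, lapply(seq_along(my_list), function(i)
  cbind(list_element = i, home_ranges(my_list[[i]]))))
Note that getverticeshr() reports areas in hectares by default (its unout argument), assuming coordinates in metres, so check that the EPSG code matches your data.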

How to measure the average gap size in a time series panel per id?

In order to deal with product time series where many of the series show intermittent demand, I want to measure how large the gaps of zero values within each series are.
In the next step I want to measure the average gap length per id. In my example this would be 4.33 for ID 1.
I found an older solution for measuring gap sizes in time series, but it does not give me the result in a form that I can process further to derive measures like average, minimum, and maximum gap size:
Gap size calculation in time series with R
library(tidyverse)
library(lubridate)
library(data.table)
data <- tibble(id = as.factor(c(rep("1",24),rep("2",24),rep("3",24))),
date = rep(c(ymd("2013-01-01")+ months(0:23)),3),
value = c(c(rep(4,5),0,0,0,0,0,0,0,0,7,0,0,0,0,11,23,54,33,45,0),
c(4,6,1,2,3,4,4,6,8,11,18,6,6,1,7,7,13,9,4,33,3,6,81,45),
c(rep(4,5),0,0,0,5,2,0,0,0,7,0,0,8,0,11,23,54,33,0,0))
)
# this gives me the repeated gap size per observation
setDT(data)
data[, gap := rep(rle(value)$lengths, rle(value)$lengths) * (value == 0)]
# I want the distinct gap size per id
1: c(8,4,1)
2: c(0)
3: c(3,3,2,1,2)
If I were able to determine the number of gaps per id, I could also calculate the mean gap size from the total number of zeros per id, which I can retrieve like this (13/3 = 4.33):
# total number of zeros per id
data <- as_tibble(data)
data %>% group_by(id) %>% summarise(zero_sum = length(which(value == 0)))
You could use rle:
library(data.table)
setDT(data)
data[,.(n=with(rle(value==0),lengths*values)),by=id][n>0]
id n
<fctr> <int>
1: 1 8
2: 1 4
3: 1 1
4: 3 3
5: 3 3
6: 3 2
7: 3 1
8: 3 2
or in the expected format:
data[,.(n=list(with(rle(value==0),{r = lengths*values;
r <- r[r!=0];
if (length(r)==0) {r <- 0L};
r }))),by=id]
id n
<fctr> <list>
1: 1 8,4,1
2: 2 0
3: 3 3,3,2,1,2
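From the long-format result of the first variant, the summary measures per id (number of gaps, mean, minimum and maximum gap size) follow directly; a sketch building on that answer:
library(data.table)
setDT(data)
gaps <- data[, .(n = with(rle(value == 0), lengths * values)), by = id][n > 0]
gaps[, .(n_gaps = .N, mean_gap = mean(n), min_gap = min(n), max_gap = max(n)), by = id]
# id 1: 3 gaps, mean 4.33 (= 13/3), min 1, max 8; id 3: 5 gaps, mean 2.2, min 1, max 3
Note that id 2 has no zero runs and therefore drops out here; if you need an explicit 0 for it, merge the result back onto unique(data$id).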

Perform conditional calculations in a R data frame

I have data in a dataframe in R like this:
Value | Metric
10    | KG
5     | lbs
etc.
I want to create a new column (weight) where I can calculate a converted weight based on the Metric - something like: if Metric = "KG" then Value * 1, if Metric = "lbs" then Value * 2.20462.
I also have another use case where I want to do a similar conditional calculation, but based on continuous values, i.e. if x >= 2 then "Classification", else if x >= 1 then "Classification 2", else "Other".
Any ideas that might work for both in R?
Does this work:
library(dplyr)
df %>% mutate(converted_wt = case_when(Metric == 'lbs' ~ Value * 2.20462, TRUE ~ Value))
Value Metric converted_wt
1 10 KG 10.0000
2 5 lbs 11.0231
If you have units other than "KG" and "lbs", you need to include those in the case_when conditions accordingly.
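The same case_when pattern also covers the second use case, because the conditions are evaluated in order and the first match wins; a sketch assuming the continuous variable is a column named x:
library(dplyr)
df %>% mutate(class = case_when(x >= 2 ~ "Classification",
                                x >= 1 ~ "Classification 2",
                                TRUE ~ "Other"))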

How to generate a sample time series data set containing multiple individuals of different length (rows)?

I want to simulate a time series data frame that contains observations of 5 variables that were taken on 10 individuals. I want the number of rows (observations) to be different between each individual. For instance, I could start with something like this:
ID = rep(c("alp", "bet", "char", "delta", "echo"), times = c(1000,1200,1234,980,1300))
in which case ID represents each unique individual (I would later convert it to a factor), and the number of times each ID is repeated represents the number of measurements for that individual. I would next need to create a column called Time with the sequences 1:1000, 1:1200, 1:1234, 1:980, and 1:1300 (representing the length of measurements for each individual). Lastly, I would need to generate 5 columns of random numbers, one for each of the 5 variables.
There are tons of ways to go about generating this data set, but what would be the most practical way to do it?
You can do:
ID = c("alp", "bet", "char", "delta", "echo")
num = c(1000,1200,1234,980,1300)
df <- data.frame(ID = rep(ID, num), num = sequence(num))
df[paste0('rand', seq_along(ID))] <- rnorm(length(ID) * sum(num))
head(df)
# ID num rand1 rand2 rand3 rand4 rand5
#1 alp 1 0.1340386 0.95900538 0.84573154 0.7151784 -0.07921171
#2 alp 2 0.2210195 1.67105483 -1.26068288 0.9171749 -0.09736927
#3 alp 3 1.6408462 0.05601673 -0.35454240 -2.6609228 0.21615254
#4 alp 4 -0.2190504 -0.05198191 -0.07355602 1.1102771 0.88246516
#5 alp 5 0.1680654 -1.75323736 -1.16865142 -0.4849876 0.20559750
#6 alp 6 1.1683839 0.09932759 -0.63474826 0.2306168 -0.61643584
I have used rnorm here; you can use any other distribution to generate the random numbers.
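A quick check that the structure came out as intended, e.g. that each ID received the requested number of rows:
table(df$ID)
#   alp   bet  char delta  echo
#  1000  1200  1234   980  1300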

New variable: sum of numbers from a list powered by value of different columns

This is my first question on Stack Overflow. I am not new to R, although I sometimes struggle with things that might be considered basic.
I want to calculate the count median diameter (CMD) for each of my rows from a Particle Size Distribution dataset.
My data looks like this (several rows and 53 columns in total):
date CPC n3.16 n3.55 n3.98 n4.47 n5.01 n5.62 n6.31 n7.08 n7.94
2015-01-01 00:00:00 5263.434 72.988 140.346 138.801 172.473 344.806 484.415 606.430 739.625 927.082
2015-01-01 01:00:00 4813.182 152.823 80.861 140.017 213.382 264.496 359.455 487.293 840.349 1069.846
Each variable starting with "n" indicates the number of particles for the corresponding size (e.g. variable n3.16 = number of particles with a median size of 3.16 nm). I will divide the values by 100 prior to the calculations, to avoid numbers so high that they prevent the computation.
To compute the CMD, I need to do the following calculation:
CMD = (D1^n1 * D2^n2 * ... * Di^ni)^(1/N)
where Di is the diameter (to be extracted from the column name), ni is the number of particles for diameter Di, and N is the total sum of particles (sum of all the columns starting with "n").
To get the Di, I created a numeric list from the column names that start with n:
D <- as.numeric(gsub("n", "", names(data)[3:54]))
This is my attempt to create a new variable with the calculation of CMD, although it doesn't work.
data$cmd <- for (i in 1:ncol(D)) {
prod(D[[i]]^data[, i + 2])
}
I also tried to use apply, but again it didn't work:
data$cmd <- for (i in 1:ncol(size)) {
apply(data, 1, function(x) prod(size[[i]]^data[, i + 2]))
}
I have different datasets from different sites with different numbers of columns, so I would like to make the code "universal".
Thank you very much
This should work (I had to mutilate your date variable because of read.table, but it is not involved in the calculations, so just ignore that):
> df
date CPC n3.16 n3.55 n3.98 n4.47 n5.01 n5.62 n6.31 n7.08 n7.94
1 2015-01-01 5263.434 72.988 140.346 138.801 172.473 344.806 484.415 606.430 739.625 927.082
2 2015-01-01 4813.182 152.823 80.861 140.017 213.382 264.496 359.455 487.293 840.349 1069.846
N <- sum(df[3:11]) # did you mean the sum of all n.columns over all rows? if not, you'd need to edit this
> N
[1] 7235.488
D <- as.numeric(gsub("n", "", names(df)[3:11]))
> D
[1] 3.16 3.55 3.98 4.47 5.01 5.62 6.31 7.08 7.94
new <- t(apply(df[3:11], 1, function(x, y) (x^y), y = D))
> new
n3.16 n3.55 n3.98 n4.47 n5.01 n5.62 n6.31 n7.08 n7.94
[1,] 772457.6 41933406 336296640 9957341349 5.167135e+12 1.232886e+15 3.625318e+17 2.054007e+20 3.621747e+23
[2,] 7980615.0 5922074 348176502 25783108893 1.368736e+12 2.305272e+14 9.119184e+16 5.071946e+20 1.129304e+24
df$CMD <- rowSums(new)^(1/N)
> df
date CPC n3.16 n3.55 n3.98 n4.47 n5.01 n5.62 n6.31 n7.08 n7.94 CMD
1 2015-01-01 5263.434 72.988 140.346 138.801 172.473 344.806 484.415 606.430 739.625 927.082 1.007526
2 2015-01-01 4813.182 152.823 80.861 140.017 213.382 264.496 359.455 487.293 840.349 1069.846 1.007684
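Since the formula is a count-weighted geometric mean, it can also be evaluated in log space, which sidesteps the very large intermediate products mentioned in the question; a sketch (using the per-row particle total as N, which is one possible reading of the formula):
n <- as.matrix(df[3:11])                         # particle counts per size bin
D <- as.numeric(gsub("n", "", names(df)[3:11]))  # diameters from the column names
N_row <- rowSums(n)                              # total number of particles per row
df$CMD_log <- as.vector(exp(n %*% log(D) / N_row))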
