Cumulative Distinct Numbers and Reset - r

I want to create two columns: one with the cumulative number of unique values seen across rows, and another that marks a new group each time 25 distinct values have been reached.
Let's take an example:
raffle Ball1 Ball2 Ball3 Ball4 Ball5 Ball6 Ball7 Ball8 Ball9 Ball10 Ball11 Ball12 Ball13 Ball14 Ball15
2 23 15 05 04 12 16 20 06 11 19 24 01 09 13 07
3 20 23 12 08 06 01 07 11 14 04 16 10 09 17 24
4 16 05 25 24 23 08 12 02 17 18 01 10 04 19 13
5 15 13 20 02 11 24 09 16 04 23 25 12 08 19 01
6 23 19 01 05 07 21 16 10 15 25 06 02 12 04 17
7 22 04 15 08 16 14 21 23 12 01 25 19 07 10 18
8 19 16 18 09 13 08 05 25 17 10 06 15 01 22 20
9 21 04 17 05 03 13 16 09 20 24 25 19 11 15 10
10 24 19 08 23 06 02 20 11 09 03 04 10 05 12 14
11 24 09 08 19 20 22 06 10 11 16 07 25 23 02 12
12 11 05 25 01 09 08 16 04 07 24 17 02 12 14 10
13 13 06 10 05 08 14 03 11 16 15 09 17 19 07 23
14 14 21 13 19 20 06 09 05 07 23 18 01 15 02 25
15 23 06 21 04 10 24 16 01 15 02 08 19 12 18 25
16 24 17 05 08 07 12 13 02 15 10 19 25 23 21 06
17 13 20 17 01 06 07 02 14 05 09 16 19 03 21 18
18 02 23 10 07 11 14 17 22 15 06 24 08 19 20 18
19 15 17 10 23 11 24 13 14 06 02 08 05 20 16 07
20 04 09 08 24 16 20 03 17 18 19 07 06 23 14 10
21 05 02 01 22 19 08 24 04 25 23 18 20 14 11 16
22 13 15 05 09 07 10 01 03 22 02 25 14 06 04 12
23 10 11 05 19 18 14 06 04 20 01 08 03 12 16 17
24 01 19 21 14 02 23 25 05 20 11 07 10 24 17 03
25 04 23 20 02 05 13 07 09 24 03 01 06 14 22 16
26 19 11 07 16 08 21 05 10 20 13 23 09 17 14 22
27 25 06 22 21 11 24 03 14 12 13 20 08 10 15 18
28 18 21 11 07 09 03 20 16 14 12 13 17 01 19 10
29 13 14 06 01 24 04 08 05 17 22 21 19 20 09 16
30 22 02 01 17 08 04 19 20 11 14 06 21 07 23 03
I have 15 distinct values in the first row,
plus 6 new distinct values in the second row,
plus 3 new distinct values in the third row.
On the seventh row I complete all the numbers, i.e. 25 distinct values.
I need to store this information, like this:
raffle Ball1 Ball15 unique_balls group
1 16 02 15 1
2 22 19 21 1
...
7 24 10 25 1
8 8 1 15 2
When I reach 25 distinct values, I start another group.
I have more than a hundred raffles, so I need to automate this. Please help!

If you want to count the unique values in each row and carry them forward until the threshold is reached, you can use a for loop:
num <- integer(0)   # Vector to store the unique values seen so far
threshold <- 25     # Number of distinct values that triggers a reset
df$group <- 1       # Initialise all group values to 1
count <- 1          # Current group number
# Note: df[i, ] uses every column of row i. If the raffle id is a column of df,
# drop it (e.g. df[i, -1]) so that it is not counted as a ball.
for (i in seq_len(nrow(df))) {
  # Append this row's values to the unique values collected since the last reset
  num <- unique(c(num, as.integer(df[i, ])))
  # The current row always belongs to the current group
  df$group[i] <- count
  # Once the threshold is reached, start a new group from the next row
  if (length(num) >= threshold) {
    num <- integer(0)
    count <- count + 1
  }
}
df$group
# [1] 1 1 1 1 1 1 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 7
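The question also asks for the running count of distinct values (the unique_balls column), which the loop above does not store. A minimal sketch of one way to record both columns follows. The toy data frame `draws`, the threshold of 5 and the helper name `ball_cols` are assumptions made up for illustration; the raffle id column is excluded explicitly so that it is not counted as a ball.

```r
# Hypothetical toy data: 4 draws of 3 balls taken from the numbers 1..5
draws <- data.frame(
  raffle = 1:4,
  Ball1 = c(1, 2, 4, 1),
  Ball2 = c(2, 3, 5, 2),
  Ball3 = c(3, 4, 1, 5)
)
threshold <- 5                                  # 25 in the original question
ball_cols <- setdiff(names(draws), "raffle")    # every column except the id

seen <- integer(0)                # distinct balls seen since the last reset
group_id <- 1L                    # current group number
draws$unique_balls <- NA_integer_
draws$group <- NA_integer_

for (i in seq_len(nrow(draws))) {
  seen <- unique(c(seen, as.integer(unlist(draws[i, ball_cols]))))
  draws$unique_balls[i] <- length(seen)   # cumulative distinct count
  draws$group[i] <- group_id
  if (length(seen) >= threshold) {        # threshold reached: reset
    seen <- integer(0)
    group_id <- group_id + 1L
  }
}
draws
```

With the real data, replacing `draws` by the raffle data frame and setting `threshold <- 25` should give the two requested columns, provided the id column is really called `raffle`.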

Related

rename the months of a ts object in r

Hello I have the following data of a time series object
set.seed(2019)
serie <- ts(rpois(72,25), start = c(2012,1), frequency = 12)
serie
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2012 28 22 36 21 26 27 24 26 32 26 29 16
2013 24 28 21 29 31 20 18 25 38 34 23 22
2014 37 25 28 31 21 25 28 26 29 25 23 23
2015 24 23 23 21 16 21 33 23 17 21 30 31
2016 20 23 23 27 23 28 27 23 31 36 25 20
2017 22 24 19 24 26 23 23 25 31 26 23 20
I need to change the month names of a ts object in R. By default the months are printed in English, but I would like them in Spanish. Any idea how to do this? Here is the vector with the names I want to use in the ts object:
nom <- c("Ene","Feb","Mar","Abr","May","Jun","Jul","Ago","Sep","Oct","Nov","Dic")
print.ts uses .preformat.ts, which hard-codes month.abb, a vector of abbreviated English month names. We can use trace to set month.abb to nom at the top of that function:
trace(.preformat.ts, quote(month.abb <- nom), print = FALSE)
serie
giving:
Ene Feb Mar Abr May Jun Jul Ago Sep Oct Nov Dic
2012 28 22 36 21 26 27 24 26 32 26 29 16
2013 24 28 21 29 31 20 18 25 38 34 23 22
2014 37 25 28 31 21 25 28 26 29 25 23 23
2015 24 23 23 21 16 21 33 23 17 21 30 31
2016 20 23 23 27 23 28 27 23 31 36 25 20
2017 22 24 19 24 26 23 23 25 31 26 23 20
To turn it off:
untrace(.preformat.ts)
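If tracing an internal function feels too invasive, an alternative sketch is to build the display table yourself as a matrix with the desired column names. This assumes the series starts in January and covers whole years, as in the example; the name `tabla` is made up for illustration.

```r
set.seed(2019)
serie <- ts(rpois(72, 25), start = c(2012, 1), frequency = 12)
nom <- c("Ene", "Feb", "Mar", "Abr", "May", "Jun",
         "Jul", "Ago", "Sep", "Oct", "Nov", "Dic")

# One row per year, one column per month (only valid for complete years)
years <- as.character(start(serie)[1]:end(serie)[1])
tabla <- matrix(as.vector(serie), ncol = 12, byrow = TRUE,
                dimnames = list(years, nom))
tabla
```

This only changes what is printed from `tabla`; the ts object itself is untouched and still prints with English month names.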

Custom reordering of dataframes based on column values

I have a dataframe that looks like this (there are hundreds of more rows)
hour magnitude tornadoCount hourlyTornadoCount Percentage Tornadoes
1: 01 AM 0 5 18 0.277777778
2: 01 AM 1 9 18 0.500000000
3: 01 AM 2 2 18 0.111111111
4: 01 AM 3 2 18 0.111111111
5: 01 PM 0 76 150 0.506666667
6: 01 PM 1 45 150 0.300000000
7: 01 PM 2 21 150 0.140000000
8: 01 PM 3 5 150 0.033333333
9: 01 PM 4 3 150 0.020000000
10: 02 AM 0 4 22 0.181818182
11: 02 AM 1 6 22 0.272727273
12: 02 AM 2 11 22 0.500000000
13: 02 AM 4 1 22 0.045454545
14: 02 PM 0 98 173 0.566473988
15: 02 PM 1 36 173 0.208092486
16: 02 PM 2 25 173 0.144508671
17: 02 PM 3 11 173 0.063583815
18: 02 PM 4 2 173 0.011560694
19: 02 PM 5 1 173 0.005780347
20: 03 AM 1 6 9 0.666666667
21: 03 AM 2 2 9 0.222222222
22: 03 AM 3 1 9 0.111111111
23: 03 PM 0 116 257 0.451361868
24: 03 PM 1 84 257 0.326848249
25: 03 PM 2 39 257 0.151750973
26: 03 PM 3 12 257 0.046692607
27: 03 PM 4 6 257 0.023346304
28: 04 AM 0 4 16 0.250000000
29: 04 AM 1 5 16 0.312500000
30: 04 AM 2 5 16 0.312500000
I want to reorganize this so that the data is arranged chronologically according to the "hour" column. Is there a way to do this? Thanks!
You can convert the hour to a 24-hour time with the lubridate parser (%I is the hour as a decimal number from 1 to 12 and %p is the AM/PM indicator) and then sort on that, using dplyr and lubridate:
library(dplyr)
library(lubridate)
ordered_df <- df %>%
  mutate(hour_24 = parse_date_time(hour, '%I %p')) %>%
  arrange(hour_24)
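For comparison, here is a base-R sketch that avoids date parsing altogether and computes the 24-hour value arithmetically. It assumes the hour column always has the form "hh AM" or "hh PM"; the data frame `tornadoes` and its values are made up for illustration.

```r
# Hypothetical sample in the same "hh AM/PM" format as the question
tornadoes <- data.frame(
  hour = c("01 PM", "02 AM", "12 AM", "01 AM", "12 PM"),
  tornadoCount = c(76, 4, 3, 5, 40),
  stringsAsFactors = FALSE
)

hour12 <- as.integer(substr(tornadoes$hour, 1, 2))   # 1..12
is_pm  <- grepl("PM", tornadoes$hour, fixed = TRUE)
# 12 AM maps to 0 and 12 PM maps to 12; other PM hours get 12 added
tornadoes$hour_24 <- hour12 %% 12L + 12L * is_pm

ordered_df <- tornadoes[order(tornadoes$hour_24), ]
ordered_df
```

Adding further columns to `order()`, for example `magnitude`, would keep rows within the same hour in a fixed order.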

Rename several columns (variable number)

I have a dataset where the last columns indicate the number of stops extracted from that dataset.
ColA ColB ColC 1 2 3 4 5 6 7 8 9 10 (...)
a g c a q e r e r q g h q (...)
What I want is to select from column 1, until the last column, and add Stop before it, ending up with Stop1, Stop2, etc...
The problem is that those columns can vary. Sometimes I have 10 after 1 other times I have 6.
I've tried with dplyr and data.table but I'm not sure how to automate this.
EDIT: ColA to ColC are fixed and always the same.
If I understood your problem correctly, the following code should be flexible enough to solve it. Start with this dataset:
set.seed(1)
df <- data.frame(matrix(rpois(130, 20),ncol=13))
names(df) <- c(paste("Col",LETTERS[1:3],sep=""),as.character(1:10))
df
#######
ColA ColB ColC 1 2 3 4 5 6 7 8 9 10
1 17 21 20 13 13 15 29 25 16 15 12 23 17
2 25 17 11 24 23 14 22 23 25 14 18 19 15
3 25 18 22 18 19 30 16 19 23 27 18 19 11
4 21 18 24 25 23 19 19 18 27 23 18 16 18
5 13 21 16 18 21 23 22 18 22 24 22 26 15
6 22 16 17 27 17 20 24 24 14 21 19 17 15
7 23 23 18 22 16 16 20 18 21 27 17 22 14
8 22 22 17 17 26 13 19 25 24 17 15 13 20
9 18 24 21 22 28 26 15 22 23 20 19 15 27
10 26 23 19 16 18 20 17 25 16 20 19 18 19
Now rename the columns as required:
k <- which(names(df) == "1")   # position of the first stop column
names(df)[k:ncol(df)] <- paste("Stop", seq_len(ncol(df) - k + 1), sep = "")
df
#############
ColA ColB ColC Stop1 Stop2 Stop3 Stop4 Stop5 Stop6 Stop7 Stop8 Stop9 Stop10
1 17 21 20 13 13 15 29 25 16 15 12 23 17
2 25 17 11 24 23 14 22 23 25 14 18 19 15
3 25 18 22 18 19 30 16 19 23 27 18 19 11
4 21 18 24 25 23 19 19 18 27 23 18 16 18
5 13 21 16 18 21 23 22 18 22 24 22 26 15
6 22 16 17 27 17 20 24 24 14 21 19 17 15
7 23 23 18 22 16 16 20 18 21 27 17 22 14
8 22 22 17 17 26 13 19 25 24 17 15 13 20
9 18 24 21 22 28 26 15 22 23 20 19 15 27
10 26 23 19 16 18 20 17 25 16 20 19 18 19
I hope it can help you.
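A variant that does not depend on the position of column "1" is to rename every column whose name consists only of digits. This is a sketch under the assumption that the stop columns are the only purely numeric names; the small data frame is made up for illustration.

```r
# Hypothetical data with three fixed columns and a variable number of stops
df <- data.frame(ColA = "a", ColB = "g", ColC = "c",
                 `1` = "a", `2` = "q", `3` = "e",
                 check.names = FALSE, stringsAsFactors = FALSE)

# Prefix every all-digit column name with "Stop"; other names are untouched
names(df) <- sub("^([0-9]+)$", "Stop\\1", names(df))
names(df)
```

Because the pattern is anchored at both ends, names such as ColA that merely contain letters are left as they are.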

Plotting Minimum Distance within clusters and linkage distance between clusters

I am using the hierarchical average-linkage method to do clustering with Euclidean distance. To find the number of clusters (k) at which to cut, I need two plots: one of the minimum distance within clusters against the number of clusters (graph 1) and one of the linkage distance between clusters against the number of clusters (graph 2).
> df
Site1 Site2 Site3 Site4 Site5 Site6
1985 11 0 5 15 13 15
1986 12 12 5 31 14 26
1987 23 21 17 14 25 12
1988 22 25 18 17 24 14
1989 11 16 8 18 13 19
1990 7 5 21 8 9 24
1991 20 13 9 21 22 7
1992 15 11 6 19 17 20
1993 19 18 9 11 21 11
1994 33 9 28 17 26 20
1995 16 14 19 33 17 10
1996 14 21 25 4 6 47
1997 4 0 11 22 14 16
1998 10 31 13 26 12 14
1999 24 17 18 41 19 20
2000 21 17 23 19 23 14
2001 12 8 6 7 19 20
2002 19 24 19 31 24 17
2003 13 29 10 28 7 9
2004 19 14 19 22 20 13
2005 16 8 9 10 11 13
2006 8 9 46 9 20 19
2007 12 10 15 13 10 9
2008 12 18 25 12 47 22
2009 19 18 18 23 21 20
2010 23 10 46 35 25 12
2011 20 35 18 30 22 18
2012 23 13 23 34 25 34
2013 17 28 20 13 19 21
2014 19 22 16 16 21 23
df2 <- data.frame(t(df))
tree <- hclust(dist(df2))
Since no question is stated, I'm assuming that you want to reproduce the figure above with the example dataset. Please correct me if that assumption is wrong.
(i) Find the number of groups for a sequence of linkage distances. The range of linkage distances used here was eyeballed from plot(tree):
library(dplyr)
cls.df <- data.frame(h=40:100)
cls.df$k <- sapply(cls.df$h, function(x) cutree(tree, h=x) %>% max )
(ii) Clean the table by keeping only the minimum linkage distance h for each number of groups k:
cls.df <- cls.df %>%
  group_by(k) %>%
  summarise(h = min(h))
(iii) Plot:
library(ggplot2)
ggplot(cls.df, aes(k, h)) +
  geom_line() +
  geom_point() +
  theme_bw() +
  ylab("Linkage Distance") +
  xlab("Number of Cluster")
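Instead of scanning an eyeballed grid of heights, the merge heights stored in the hclust object can be used directly. The sketch below assumes a monotone linkage such as average or complete, so that `tree$height` is increasing; the four one-dimensional points are made up for illustration.

```r
# Hypothetical toy data: four points on a line
x <- c(0, 1, 3, 7)
tree <- hclust(dist(x), method = "average")

n_obs <- length(tree$order)   # number of clustered observations
# The j-th merge leaves n_obs - j clusters, so pair each merge height
# with the number of clusters that remain after it
cls.df <- data.frame(k = (n_obs - 1):1, h = tree$height)
cls.df
```

The resulting `cls.df` has the same two columns as in steps (i) and (ii), so the ggplot call above can be reused unchanged, with exact merge heights rather than heights rounded to the grid.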

How to sum a field using some conditions in Axapta?

I have a user table like this
ID Date Value
---------------------------
1001 31 01 14 2035.1
1002 31 01 14 1384.65
1003 31 01 14 1011.1
1004 31 01 14 1187.04
1001 28 02 14 2035.1
1002 28 02 14 1384.65
1003 28 02 14 1011.1
1004 28 02 14 1188.86
1001 31 03 14 2035.1
1002 31 03 14 1384.65
1003 31 03 14 1011.1
1004 31 03 14 1188.86
1001 30 04 14 2066.41
1002 30 04 14 1405.95
1003 30 04 14 1026.66
1004 30 04 14 1207.15
And I want to make a sum from this table like this
ID Date Value Total
---------------------------------------
1001 31 01 14 2035.1 2035.1
1002 31 01 14 1384.65 1384.65
1003 31 01 14 1011.1 1011.1
1004 31 01 14 1187.04 1187.04
1001 28 02 14 2035.1 4070.2
1002 28 02 14 1384.65 2769.3
1003 28 02 14 1011.1 2022.2
1004 28 02 14 1188.86 2375.9
1001 31 03 14 2035.1 6105.3
1002 31 03 14 1384.65 4153.95
1003 31 03 14 1011.1 3033.3
1004 31 03 14 1188.86 3564.76
1001 30 04 14 2066.41 8171.71
1002 30 04 14 1405.95 5180.61
1003 30 04 14 1026.66 4059.96
1004 30 04 14 1207.15 4771.91
For each ID, the total for the first month should simply be that month's value. For the second month of that ID, it should be the first month's value plus the second month's value, and so on. How can I do this summation in X++?
Can anyone help me?
It can be done as a display method on the table:
display Amount total()
{
    return (select sum(Value) of Table
        where Table.Id == this.Id &&
              Table.Date <= this.Date).Value;
}
Change the table and field names to fit your setup.
This may not be the fastest way to do it though. In say a report context, it might be better to keep a running total for each id (in a map).
Also it can be done in a select like this:
Table table1, table2;
while select table1
    group Date, Id, Value
    inner join sum(Value) of table2
    where table2.Id == table1.Id &&
          table2.Date <= table1.Date
{
    ...
}
You need to group on the wanted fields, because it is an aggregate select.
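The running-total-per-id idea mentioned above is independent of X++. As a language-neutral illustration of the logic only (written in R, not X++), the sketch below sorts by date and accumulates the values within each id. It uses a subset of the values from the question, with the dates rewritten in ISO format.

```r
# Subset of the question's data: two ids over three month ends
records <- data.frame(
  ID = rep(c(1001, 1002), times = 3),
  Date = as.Date(rep(c("2014-01-31", "2014-02-28", "2014-03-31"), each = 2)),
  Value = c(2035.1, 1384.65, 2035.1, 1384.65, 2035.1, 1384.65)
)

# Sort chronologically, then accumulate Value separately for each ID
records <- records[order(records$Date, records$ID), ]
records$Total <- ave(records$Value, records$ID, FUN = cumsum)
records
```

In an X++ report, the same effect would come from keeping one accumulator per id while looping over the records in date order.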
