Custom reordering of dataframes based on column values - r

I have a dataframe that looks like this (there are hundreds more rows):
hour magnitude tornadoCount hourlyTornadoCount Percentage Tornadoes
1: 01 AM 0 5 18 0.277777778
2: 01 AM 1 9 18 0.500000000
3: 01 AM 2 2 18 0.111111111
4: 01 AM 3 2 18 0.111111111
5: 01 PM 0 76 150 0.506666667
6: 01 PM 1 45 150 0.300000000
7: 01 PM 2 21 150 0.140000000
8: 01 PM 3 5 150 0.033333333
9: 01 PM 4 3 150 0.020000000
10: 02 AM 0 4 22 0.181818182
11: 02 AM 1 6 22 0.272727273
12: 02 AM 2 11 22 0.500000000
13: 02 AM 4 1 22 0.045454545
14: 02 PM 0 98 173 0.566473988
15: 02 PM 1 36 173 0.208092486
16: 02 PM 2 25 173 0.144508671
17: 02 PM 3 11 173 0.063583815
18: 02 PM 4 2 173 0.011560694
19: 02 PM 5 1 173 0.005780347
20: 03 AM 1 6 9 0.666666667
21: 03 AM 2 2 9 0.222222222
22: 03 AM 3 1 9 0.111111111
23: 03 PM 0 116 257 0.451361868
24: 03 PM 1 84 257 0.326848249
25: 03 PM 2 39 257 0.151750973
26: 03 PM 3 12 257 0.046692607
27: 03 PM 4 6 257 0.023346304
28: 04 AM 0 4 16 0.250000000
29: 04 AM 1 5 16 0.312500000
30: 04 AM 2 5 16 0.312500000
I want to reorganize this such that the data is arranged chronologically according to the "hour" column. Is there a way to do this? Thanks!

You can transform the hour to a 24-hour time using lubridate's parser (%I is the decimal hour (01-12) and %p is the AM/PM indicator) and then sort on that. Using dplyr and lubridate:
library(dplyr)
library(lubridate)

ordered_df <- df %>%
  mutate(hour_24 = parse_date_time(hour, '%I %p')) %>%
  arrange(hour_24)
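If you'd rather not keep the helper column, a minimal base R sketch of the same idea (assuming df is the frame shown above):
# strptime() parses "01 AM"-style labels onto today's date;
# ordering by the parsed times sorts the rows chronologically
df_sorted <- df[order(as.POSIXct(strptime(df$hour, "%I %p"))), ]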

Repeat rows in dataframe and change 1 score until a condition is reached by each ID

I can't seem to find any answer to this question, and I have no idea what the best way to do it is.
This is a dummy dataframe:
ID year age death
1  1999 18  2015
2  1984 25  NA
3  2012 24  NA
What I want to do is repeat each row, adding 1 to year and age, until one of a set of conditions is reached: year == 2018 or year == death.
Ultimately, the new dataframe should look like this:
ID year age death
1  1999 18  2015
1  2000 19  2015
1  2001 20  2015
1  2002 21  2015
1  2003 22  2015
1  2004 23  2015
1  2005 24  2015
1  2006 25  2015
1  2007 26  2015
1  2008 27  2015
1  2009 28  2015
1  2010 29  2015
1  2011 30  2015
1  2012 31  2015
1  2013 32  2015
1  2014 33  2015
1  2015 34  2015
2  2010 25  NA
2  2011 26  NA
2  2012 27  NA
2  2013 28  NA
2  2014 29  NA
2  2015 30  NA
2  2016 31  NA
2  2017 32  NA
2  2018 33  NA
3  2012 24  NA
3  2013 25  NA
3  2014 26  NA
3  2015 27  NA
3  2016 28  NA
3  2017 29  NA
3  2018 30  NA
Any suggestions?
Sincerely,
A desperate newbie epidemiologist
A dplyr solution:
library(dplyr)
df %>%
  group_by(ID) %>%
  # expand each row into one row per year, from the start year
  # up to death (or 2018 when death is missing)
  summarise(year = year:min(death, 2018, na.rm = TRUE),
            age = age + 1:length(year) - 1,
            death) %>%
  ungroup()
Output
# A tibble: 59 × 4
ID year age death
<int> <int> <dbl> <int>
1 1 1999 18 2015
2 1 2000 19 2015
3 1 2001 20 2015
4 1 2002 21 2015
5 1 2003 22 2015
6 1 2004 23 2015
7 1 2005 24 2015
8 1 2006 25 2015
9 1 2007 26 2015
10 1 2008 27 2015
11 1 2009 28 2015
12 1 2010 29 2015
13 1 2011 30 2015
14 1 2012 31 2015
15 1 2013 32 2015
16 1 2014 33 2015
17 1 2015 34 2015
18 2 1984 25 NA
19 2 1985 26 NA
20 2 1986 27 NA
21 2 1987 28 NA
22 2 1988 29 NA
23 2 1989 30 NA
24 2 1990 31 NA
25 2 1991 32 NA
26 2 1992 33 NA
27 2 1993 34 NA
28 2 1994 35 NA
29 2 1995 36 NA
30 2 1996 37 NA
31 2 1997 38 NA
32 2 1998 39 NA
33 2 1999 40 NA
34 2 2000 41 NA
35 2 2001 42 NA
36 2 2002 43 NA
37 2 2003 44 NA
38 2 2004 45 NA
39 2 2005 46 NA
40 2 2006 47 NA
41 2 2007 48 NA
42 2 2008 49 NA
43 2 2009 50 NA
44 2 2010 51 NA
45 2 2011 52 NA
46 2 2012 53 NA
47 2 2013 54 NA
48 2 2014 55 NA
49 2 2015 56 NA
50 2 2016 57 NA
51 2 2017 58 NA
52 2 2018 59 NA
53 3 2012 24 NA
54 3 2013 25 NA
55 3 2014 26 NA
56 3 2015 27 NA
57 3 2016 28 NA
58 3 2017 29 NA
59 3 2018 30 NA
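A side note: in recent dplyr (1.1.0+), summarise() warns when a group returns more than one row; reframe() is the supported replacement. A minimal sketch of the same logic, assuming dplyr >= 1.1.0:
library(dplyr)
df %>%
  reframe(year = year:min(death, 2018, na.rm = TRUE),  # one row per year
          age = age + seq_along(year) - 1,             # age advances with year
          death = death,                               # scalar, recycled per row
          .by = ID)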
Using data.table:
library(data.table)
yr_max = 2018L
# expand each ID to one row per year by joining on the full year sequence
DT = DT[,
  .SD[CJ(year = min(year):min(yr_max, death, na.rm = TRUE)), on = 'year'],
  by = ID]
# fill age sequentially and carry death down the expanded rows
DT[, age := seq(first(age), first(age) + .N - 1L), by = ID]
DT[, death := first(death), by = ID]
Result:
ID year age death
# <int> <int> <int> <int>
# 1: 1 1999 18 2015
# 2: 1 2000 19 2015
# 3: 1 2001 20 2015
# 4: 1 2002 21 2015
# 5: 1 2003 22 2015
# 6: 1 2004 23 2015
# 7: 1 2005 24 2015
# 8: 1 2006 25 2015
# 9: 1 2007 26 2015
# 10: 1 2008 27 2015
# 11: 1 2009 28 2015
# 12: 1 2010 29 2015
# 13: 1 2011 30 2015
# 14: 1 2012 31 2015
# 15: 1 2013 32 2015
# 16: 1 2014 33 2015
# 17: 1 2015 34 2015
# 18: 2 1984 25 NA
# 19: 2 1985 26 NA
# 20: 2 1986 27 NA
# 21: 2 1987 28 NA
# 22: 2 1988 29 NA
# 23: 2 1989 30 NA
# 24: 2 1990 31 NA
# 25: 2 1991 32 NA
# 26: 2 1992 33 NA
# 27: 2 1993 34 NA
# 28: 2 1994 35 NA
# 29: 2 1995 36 NA
# 30: 2 1996 37 NA
# 31: 2 1997 38 NA
# 32: 2 1998 39 NA
# 33: 2 1999 40 NA
# 34: 2 2000 41 NA
# 35: 2 2001 42 NA
# 36: 2 2002 43 NA
# 37: 2 2003 44 NA
# 38: 2 2004 45 NA
# 39: 2 2005 46 NA
# 40: 2 2006 47 NA
# 41: 2 2007 48 NA
# 42: 2 2008 49 NA
# 43: 2 2009 50 NA
# 44: 2 2010 51 NA
# 45: 2 2011 52 NA
# 46: 2 2012 53 NA
# 47: 2 2013 54 NA
# 48: 2 2014 55 NA
# 49: 2 2015 56 NA
# 50: 2 2016 57 NA
# 51: 2 2017 58 NA
# 52: 2 2018 59 NA
# 53: 3 2012 24 NA
# 54: 3 2013 25 NA
# 55: 3 2014 26 NA
# 56: 3 2015 27 NA
# 57: 3 2016 28 NA
# 58: 3 2017 29 NA
# 59: 3 2018 30 NA
# ID year age death
Reproducible data:
DT = data.table(
  ID = 1:3,
  year = c(1999L, 1984L, 2012L),
  age = c(18L, 25L, 24L),
  death = c(2015L, NA, NA)
)
Using by:
by(dat, dat$ID, \(x) {
  # upper bound for the year sequence: death, or 2018 when death is missing
  if (is.na(x$death)) u <- 2018 else u <- x$death[1]
  year <- x$year[1]:u
  age <- (year - x$year[1]) + x$age[1]
  cbind(x[setdiff(names(x), c('age', 'year'))], year, age, row.names = NULL)[c(1, 3, 4, 2)]
}) |> c(make.row.names = FALSE) |> do.call(what = rbind)
# ID year age death
# 1 1 1999 18 2015
# 2 1 2000 19 2015
# 3 1 2001 20 2015
# 4 1 2002 21 2015
# 5 1 2003 22 2015
# 6 1 2004 23 2015
# 7 1 2005 24 2015
# 8 1 2006 25 2015
# 9 1 2007 26 2015
# 10 1 2008 27 2015
# 11 1 2009 28 2015
# 12 1 2010 29 2015
# 13 1 2011 30 2015
# 14 1 2012 31 2015
# 15 1 2013 32 2015
# 16 1 2014 33 2015
# 17 1 2015 34 2015
# 18 2 1984 25 NA
# 19 2 1985 26 NA
# 20 2 1986 27 NA
# 21 2 1987 28 NA
# 22 2 1988 29 NA
# 23 2 1989 30 NA
# 24 2 1990 31 NA
# 25 2 1991 32 NA
# 26 2 1992 33 NA
# 27 2 1993 34 NA
# 28 2 1994 35 NA
# 29 2 1995 36 NA
# 30 2 1996 37 NA
# 31 2 1997 38 NA
# 32 2 1998 39 NA
# 33 2 1999 40 NA
# 34 2 2000 41 NA
# 35 2 2001 42 NA
# 36 2 2002 43 NA
# 37 2 2003 44 NA
# 38 2 2004 45 NA
# 39 2 2005 46 NA
# 40 2 2006 47 NA
# 41 2 2007 48 NA
# 42 2 2008 49 NA
# 43 2 2009 50 NA
# 44 2 2010 51 NA
# 45 2 2011 52 NA
# 46 2 2012 53 NA
# 47 2 2013 54 NA
# 48 2 2014 55 NA
# 49 2 2015 56 NA
# 50 2 2016 57 NA
# 51 2 2017 58 NA
# 52 2 2018 59 NA
# 53 3 2012 24 NA
# 54 3 2013 25 NA
# 55 3 2014 26 NA
# 56 3 2015 27 NA
# 57 3 2016 28 NA
# 58 3 2017 29 NA
# 59 3 2018 30 NA
Data:
dat <- structure(list(ID = 1:3, year = c(1999L, 1984L, 2012L), age = c(18L,
25L, 24L), death = c(2015L, NA, NA)), class = "data.frame", row.names = c(NA,
-3L))

How to add data or values to a time series based on time in R

I have a problem: I have one time series, and I have values from a second dataset. I want to add the second dataset's values to the time series as a new column. For better understanding, I am attaching an example below.
The data that I have:
time value
09:00 14
09:10 25
09:13 44
09:16 33
09:24 22
09:30 55
09:45 33
09:55 45
10:00 22
10:05 33
and the second dataset:
time value
09:13 22
09:30 33
09:45 22
So I have data like this, but with more than two thousand rows.
I want to know how to combine the two datasets, matching on the exact same time, to get one data frame with three columns.
For example, my output should look like this:
time value values2
09:00 14 0
09:10 25 0
09:13 44 22
09:16 33 0
09:24 22 0
09:30 55 33
09:45 33 22
09:55 45 0
10:00 22 0
10:05 33 0
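For reference, a sketch of the two input frames as reproducible data (times kept as character, matching the tables above):
df1 <- data.frame(
  time = c("09:00", "09:10", "09:13", "09:16", "09:24",
           "09:30", "09:45", "09:55", "10:00", "10:05"),
  value = c(14, 25, 44, 33, 22, 55, 33, 45, 22, 33)
)
df2 <- data.frame(
  time = c("09:13", "09:30", "09:45"),
  value = c(22, 33, 22)
)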
Base R:
df.merged <- merge(df1, df2, by = "time", all = TRUE)
df.merged
time value.x value.y
1: 09:00 14 NA
2: 09:10 25 NA
3: 09:13 44 22
4: 09:16 33 NA
5: 09:24 22 NA
6: 09:30 55 33
7: 09:45 33 22
8: 09:55 45 NA
9: 10:00 22 NA
10: 10:05 33 NA
with dplyr:
library(dplyr)
df.joined <- df1 %>%
  left_join(df2, by = "time")
df.joined
time value.x value.y
1: 09:00 14 NA
2: 09:10 25 NA
3: 09:13 44 22
4: 09:16 33 NA
5: 09:24 22 NA
6: 09:30 55 33
7: 09:45 33 22
8: 09:55 45 NA
9: 10:00 22 NA
10: 10:05 33 NA
To match the names of your expected output, with base:
colnames(df2)[2] <- "value2"
df.merged <- merge(df1, df2, by = "time", all = TRUE)
df.merged
time value value2
1: 09:00 14 NA
2: 09:10 25 NA
3: 09:13 44 22
4: 09:16 33 NA
5: 09:24 22 NA
6: 09:30 55 33
7: 09:45 33 22
8: 09:55 45 NA
9: 10:00 22 NA
10: 10:05 33 NA
or programmatically with dplyr:
df.joined <- df2 %>%
  rename_with(function(x) paste0(x, which(colnames(.) == x)), .cols = -time) %>%
  right_join(df1, by = "time") %>%
  arrange(time) %>%
  select(time, value, everything())
df.joined
time value value2
1: 09:00 14 NA
2: 09:10 25 NA
3: 09:13 44 22
4: 09:16 33 NA
5: 09:24 22 NA
6: 09:30 55 33
7: 09:45 33 22
8: 09:55 45 NA
9: 10:00 22 NA
10: 10:05 33 NA
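Note that your expected output uses 0 where there is no match, while the joins above leave NA. A one-line follow-up on any of the frames above that have the value2 column (tidyr::replace_na would also work):
df.merged$value2[is.na(df.merged$value2)] <- 0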

How to sum a field using some conditions in Axapta?

I have a user table like this
ID Date Value
---------------------------
1001 31 01 14 2035.1
1002 31 01 14 1384.65
1003 31 01 14 1011.1
1004 31 01 14 1187.04
1001 28 02 14 2035.1
1002 28 02 14 1384.65
1003 28 02 14 1011.1
1004 28 02 14 1188.86
1001 31 03 14 2035.1
1002 31 03 14 1384.65
1003 31 03 14 1011.1
1004 31 03 14 1188.86
1001 30 04 14 2066.41
1002 30 04 14 1405.95
1003 30 04 14 1026.66
1004 30 04 14 1207.15
And I want to compute a running sum from this table, like this:
ID Date Value Total
---------------------------------------
1001 31 01 14 2035.1 2035.1
1002 31 01 14 1384.65 1384.65
1003 31 01 14 1011.1 1011.1
1004 31 01 14 1187.04 1187.04
1001 28 02 14 2035.1 4070.2
1002 28 02 14 1384.65 2769.3
1003 28 02 14 1011.1 2022.2
1004 28 02 14 1188.86 2375.9
1001 31 03 14 2035.1 6105.3
1002 31 03 14 1384.65 4153.95
1003 31 03 14 1011.1 3033.3
1004 31 03 14 1188.86 3564.76
1001 30 04 14 2066.41 8171.71
1002 30 04 14 1405.95 5180.61
1003 30 04 14 1026.66 4059.96
1004 30 04 14 1207.15 4771.91
For each ID, the first month's total should be its own value; for the second month of that ID, it should be the first month's value plus the second month's, and so on. How can I do this summation in X++?
Can anyone help me?
It can be done as a display method on the table:
display Amount total()
{
    return (select sum(Value) from Table
                where Table.Id   == this.Id &&
                      Table.Date <= this.Date).Value;
}
Change the table and field names to fit your schema.
This may not be the fastest way to do it, though. In, say, a report context, it might be better to keep a running total for each ID (in a Map).
It can also be done in a select, like this:
Table table1, table2;

while select table1
    group by Date, Id, Value
    join sum(Value) from table2
        where table2.Id   == table1.Id &&
              table2.Date <= table1.Date
{
    ...
}
You need to group on the wanted fields, because it is an aggregate select.

set limits for scale_x_date in ggplot2 in facet_grid context

I have a data frame named "a":
date individus annee
80 2013-07-23 0 2013
77 2013-07-12 0 2013
63 2013-05-13 7 2013
72 2013-06-25 2 2013
7 2011-04-19 20 2011
58 2013-04-23 6 2013
4 2011-04-11 7 2011
52 2012-07-03 0 2012
56 2012-08-06 9 2012
6 2011-04-15 0 2011
38 2012-05-02 8 2012
67 2013-05-28 1 2013
66 2013-05-24 0 2013
59 2013-04-26 46 2013
73 2013-06-28 9 2013
74 2013-07-02 0 2013
22 2011-06-14 44 2011
70 2013-06-17 0 2013
41 2012-05-11 0 2012
14 2011-05-13 6 2011
42 2012-05-15 0 2012
27 2011-07-18 0 2011
18 2011-05-26 0 2011
36 2012-04-13 39 2012
31 2011-07-29 12 2011
55 2012-07-13 25 2012
49 2012-06-14 17 2012
50 2012-06-18 69 2012
51 2012-06-25 65 2012
57 2013-04-19 41 2013
I would like to plot this data with ggplot2, with facet_grid on annee, using this code:
plot <- ggplot(data=lob.df)+
  # geom_point(aes(x=date, y=individus))+
  geom_smooth(aes(x=date, y=individus, colour=annee))+
  labs(x="Date", y="Nombre d'individus")+
  scale_colour_discrete(name="Année")+
  facet_grid(.~annee)
which gives me this plot (image not shown).
I would like to remove all the blank space, so I have played with scale_x_date, but I wasn't able to reduce the graph limits. :-S
You can set the scales parameter in facet_grid to be "free_x" like this:
plot <- ggplot(data=df)+
  # geom_point(aes(x=date, y=individus))+
  geom_smooth(aes(x=date, y=individus, colour=annee))+
  labs(x="Date", y="Nombre d'individus")+
  facet_grid(.~annee, scales="free_x")
##
print(plot)
I had to modify the aesthetics of your plot a little bit because your code was not running on my machine (I'm not using a very recent release of R), but using facet_grid(.~annee, scales="free_x") should still work fine for you.
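For completeness, scale_x_date does accept limits, but they apply identically to every facet, so they cannot trim each year's panel individually; that is why free scales fit better here. A sketch with hypothetical limits:
# the same date window is imposed on all three panels
plot + scale_x_date(limits = as.Date(c("2011-01-01", "2013-12-31")))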
Data:
df <- read.table(
text=" date individus annee
80 2013-07-23 0 2013
77 2013-07-12 0 2013
63 2013-05-13 7 2013
72 2013-06-25 2 2013
7 2011-04-19 20 2011
58 2013-04-23 6 2013
4 2011-04-11 7 2011
52 2012-07-03 0 2012
56 2012-08-06 9 2012
6 2011-04-15 0 2011
38 2012-05-02 8 2012
67 2013-05-28 1 2013
66 2013-05-24 0 2013
59 2013-04-26 46 2013
73 2013-06-28 9 2013
74 2013-07-02 0 2013
22 2011-06-14 44 2011
70 2013-06-17 0 2013
41 2012-05-11 0 2012
14 2011-05-13 6 2011
42 2012-05-15 0 2012
27 2011-07-18 0 2011
18 2011-05-26 0 2011
36 2012-04-13 39 2012
31 2011-07-29 12 2011
55 2012-07-13 25 2012
49 2012-06-14 17 2012
50 2012-06-18 69 2012
51 2012-06-25 65 2012
57 2013-04-19 41 2013")
##
df$date <- as.Date(df$date)
df$individus <- as.numeric(df$individus)
df$annee <- as.numeric(df$annee)

How can I "roll up" values into subsequent records?

I have a data set (x) that looks like this:
DATE WEEKDAY A B C D
2011-02-04 Friday 113 67 109 72
2011-02-05 Saturday 1 0 0 1
2011-02-06 Sunday 9 5 0 0
2011-02-07 Monday 154 48 85 60
str(x):
'data.frame': 4 obs. of 6 variables:
$ DATE : Date, format: "2011-02-04" "2011-02-05" "2011-02-06" "2011-02-07"
$ WEEKDAY: Factor w/ 7 levels "Friday","Monday",..: 1 3 4 2
$ A : num 113 1 9 154
$ B : num 67 0 5 48
$ C : num 109 0 0 85
$ D : num 72 1 0 60
Tuesday through Saturday values don't change, but I want Sunday to be the sum of Saturday and Sunday, and Monday to be the sum of Saturday, Sunday, and Monday.
I tried shifting Saturday's and Sunday's dates to date + 2 and date + 1, respectively, then aggregating by date, but I lose the weekend records.
For my example, the correct results would be the following:
DATE WEEKDAY A B C D
2011-02-04 Friday 113 67 109 72
2011-02-05 Saturday 1 0 0 1
2011-02-06 Sunday 10 5 0 1
2011-02-07 Monday 164 53 85 61
How can I roll up weekend values into the next day?
Three weeks' worth of data:
DATE WEEKDAY A B C D
1 2011-01-02 Sunday 2 1 0 0
2 2011-01-03 Monday 153 51 7 1
3 2011-01-04 Tuesday 182 103 13 5
4 2011-01-05 Wednesday 192 102 14 12
5 2011-01-06 Thursday 160 67 50 20
6 2011-01-07 Friday 154 96 50 39
7 2011-01-09 Sunday 0 0 0 1
8 2011-01-10 Monday 195 94 48 39
9 2011-01-11 Tuesday 206 72 71 38
10 2011-01-12 Wednesday 232 94 96 52
11 2011-01-13 Thursday 178 113 93 52
12 2011-01-14 Friday 173 97 68 56
13 2011-01-15 Saturday 2 0 1 0
14 2011-01-17 Monday 170 91 66 52
15 2011-01-18 Tuesday 176 76 70 78
16 2011-01-19 Wednesday 164 159 117 37
17 2011-01-20 Thursday 198 87 95 111
18 2011-01-21 Friday 213 86 89 90
19 2011-01-24 Monday 195 73 102 52
20 2011-01-25 Tuesday 193 108 116 70
21 2011-01-26 Wednesday 193 102 118 63
Since you've provided only a small data set, I've not been able to test this on bigger data. But the idea is something like this. I'll use data.table, as I find it can be very efficient here.
The code:
require(data.table)
my_days <- c("Saturday", "Sunday", "Monday")
dt <- data.table(df)
dt[, `:=`(DATE = as.Date(DATE))]
setkey(dt, "DATE")
dt[WEEKDAY %in% my_days, `:=`(A = cumsum(A), B = cumsum(B),
   C = cumsum(C), D = cumsum(D)), by = format(DATE-1, "%W")]
The idea:
First, change the DATE column to an actual Date type using as.Date (line 4).
Second, ensure that the rows are sorted by the DATE column by setting the key of dt to DATE (line 5).
Now, the last line (line 6) is where all the magic happens and is the trickiest:
The first part of the expression, WEEKDAY %in% my_days, subsets the data.table dt to only the days Saturday, Sunday and Monday.
The last part of the same line, by = format(DATE-1, "%W"), groups the data by the week they belong to. Since Monday falls in the next week, subtract 1 from the current date and then take the week number. This groups the dates by week, where Tuesday through the following Monday share the same week number.
The expression in the middle, `:=`(A = ..., D = ...), computes the cumsum and replaces just those values per group, by reference.
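As a side note, line 6 can be written more compactly by looping cumsum over the four columns with .SD; a sketch assuming a reasonably recent data.table:
dt[WEEKDAY %in% my_days,
   c("A", "B", "C", "D") := lapply(.SD, cumsum),
   by = format(DATE - 1, "%W"),
   .SDcols = c("A", "B", "C", "D")]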
For the new data you've posted, I get this as the result. Let me know if it's not what you seek.
# DATE WEEKDAY A B C D
# 1: 2011-01-02 Sunday 2 1 0 0
# 2: 2011-01-03 Monday 155 52 7 1
# 3: 2011-01-04 Tuesday 182 103 13 5
# 4: 2011-01-05 Wednesday 192 102 14 12
# 5: 2011-01-06 Thursday 160 67 50 20
# 6: 2011-01-07 Friday 154 96 50 39
# 7: 2011-01-09 Sunday 0 0 0 1
# 8: 2011-01-10 Monday 195 94 48 40
# 9: 2011-01-11 Tuesday 206 72 71 38
# 10: 2011-01-12 Wednesday 232 94 96 52
# 11: 2011-01-13 Thursday 178 113 93 52
# 12: 2011-01-14 Friday 173 97 68 56
# 13: 2011-01-15 Saturday 2 0 1 0
# 14: 2011-01-17 Monday 172 91 67 52
# 15: 2011-01-18 Tuesday 176 76 70 78
# 16: 2011-01-19 Wednesday 164 159 117 37
# 17: 2011-01-20 Thursday 198 87 95 111
# 18: 2011-01-21 Friday 213 86 89 90
# 19: 2011-01-24 Monday 195 73 102 52
# 20: 2011-01-25 Tuesday 193 108 116 70
# 21: 2011-01-26 Wednesday 193 102 118 63
# DATE WEEKDAY A B C D
