I need to populate a column based on conditional precedence:
If O_M is not zero (e.g. 0.34), I check the previous records (sequenced by TP_N) in the same column O_M. If O_M is zero for 3 or more consecutive previous records, and those records are coded with one of the codes OD03, OT03 or MO03 in column O_N, then I should populate the To_Compute column with the current O_M value (0.34). I need to repeat this for every partition (DT, MNTH, P_ID, A_BR, D_BR, B_BR, DR), sequenced by TP_N. Only these codes from column O_N should be considered: OD03, OT03, MO03.
DT MNTH P_ID A_BR D_BR B_BR TP_N DR O_M O_N TO_Compute
9/29/2016 9 QT21 1506 05Y XS-123 487,006 0 0 ? 0
9/29/2016 9 QT21 1506 05Y XS-123 487,007 0 0 OD03 0
9/29/2016 9 QT21 1506 05Y XS-123 487,008 0 0 OD03 0
9/29/2016 9 QT21 1506 05Y XS-123 487,009 0 0 OD03 0
9/29/2016 9 QT21 1506 05Y XS-123 487,010 0 0 OD03 0
9/29/2016 9 QT21 1506 05Y XS-123 487,011 0 0 OD03 0
9/29/2016 9 QT21 1506 05Y XS-123 487,012 0 0 OD03 0
9/29/2016 9 QT21 1506 05Y XS-123 487,013 0 0 OD03 0
9/29/2016 9 QT21 1506 05Y XS-123 487,014 0 0 MO03 0
9/29/2016 9 QT21 1506 05Y XS-123 487,015 0 0 OT03 0
9/29/2016 9 QT21 1506 05Y XS-123 487,016 0 0 OD03 0
9/29/2016 9 QT21 1506 05Y XS-123 487,017 0 0.34 ? 0.34
9/29/2016 9 QT21 1506 05Y XS-123 487,018 0 0 OD03 0
9/29/2016 9 QT21 1506 05Y XS-123 487,019 0 1.03 ? 0
9/29/2016 9 QT21 1506 05Y XS-123 487,020 0 0.3 ? 0
9/29/2016 9 QT21 1506 05Y XS-123 487,021 0 1.25 ? 0
9/29/2016 9 QT21 1506 05Y XS-123 487,022 0 0 OP04 0
9/29/2016 9 QT21 1506 05Y XS-123 487,023 0 10.53 ? 0
9/29/2016 9 QT21 1506 05Y XS-123 487,024 0 0.37 ? 0
9/29/2016 9 QT21 1506 05Y XS-123 487,025 0 0.28 ? 0
9/29/2016 9 QT21 1506 05Y XS-123 487,026 0 0 OD03 0
9/29/2016 9 QT21 1506 05Y XS-123 487,027 0 0 OD03 0
9/29/2016 9 QT21 1506 05Y XS-123 487,028 0 0.6 ? 0
9/29/2016 9 QT21 1506 05Y XS-123 487,029 0 0.38 ? 0
9/29/2016 9 QT21 1506 05Y XS-123 487,030 0 0.4 ? 0
9/29/2016 9 QT21 1506 05Y XS-123 487,031 0 0.35 ? 0
9/29/2016 9 QT21 1506 05Y XS-123 487,032 0 0.45 ? 0
9/29/2016 9 QT21 1506 05Y XS-123 487,033 0 0 OD03 0
9/29/2016 9 QT21 1506 05Y XS-123 487,034 0 0 OD03 0
9/29/2016 9 QT21 1506 05Y XS-123 487,035 0 0 OD03 0
9/29/2016 9 QT21 1506 05Y XS-123 487,036 0 0.3 ? 0.3
9/29/2016 9 QT21 1506 05Y XS-123 487,037 0 0.35 ? 0
9/29/2016 9 QT21 1506 05Y XS-123 487,038 0 0.52 ? 0
However, if O_M is not zero (e.g. 0.6, the 11th row from the bottom in column O_M), I check the previous records in the same column O_M, and here only 2 previous records are zero (the rule requires 3 or more instances coded with OD03, OT03 or MO03), so I should populate the To_Compute column with 0.
If O_M is not zero (e.g. 0.3, the third-last row in column O_M), I check the previous records in the same column O_M, and here it is zero for 3 or more instances coded with the codes (OD03, OD03, OD03), so I should populate the To_Compute column with the current O_M value (0.3).
I am new to TD. Any pointers would help.
Untested, but this should work based on your description:
case
when sum(O_M) -- previous three rows are all 0 (assuming no negative values exist)
over (partition by ??
order by TP_N
rows between 3 preceding and 1 preceding) = 0
and -- previous three rows all carry one of the searched codes
sum(case when O_N IN ('OD03','OT03','MO03') then 1 else 0 end)
over (partition by ??
order by TP_N
rows between 3 preceding and 1 preceding) = 3
then O_M
else 0
end
I'll provide two possible solutions for you to decide which works best for your situation. The first is simpler but more tedious, as it joins the same table three times; note that it assumes TP_N increases by exactly 1 from one row to the next within a partition. Assume your data above exists in the DATASET table:
select ds1.dt
,ds1.mnth
,ds1.p_id
,ds1.a_br
,ds1.d_br
,ds1.b_br
,ds1.tp_n
,ds1.dr
,ds1.o_m
,ds1.o_n
,case when zeroifnull(ds4.o_m) + zeroifnull(ds3.o_m) + zeroifnull(ds2.o_m) = 0 and ds4.o_n in ('OD03','OT03','MO03') and ds3.o_n in ('OD03','OT03','MO03') and ds2.o_n in ('OD03','OT03','MO03') then ds1.o_m
else 0 end as TO_COMPUTE
from dataset ds1
left join dataset ds2
on ds1.tp_n = ds2.tp_n +1
and ds1.dt = ds2.dt
and ds1.mnth = ds2.mnth
and ds1.p_id = ds2.p_id
and ds1.a_br = ds2.a_br
and ds1.d_br = ds2.d_br
and ds1.b_br = ds2.b_br
and ds1.dr = ds2.dr
left join dataset ds3
on ds1.tp_n = ds3.tp_n +2
and ds1.dt = ds3.dt
and ds1.mnth = ds3.mnth
and ds1.p_id = ds3.p_id
and ds1.a_br = ds3.a_br
and ds1.d_br = ds3.d_br
and ds1.b_br = ds3.b_br
and ds1.dr = ds3.dr
left join dataset ds4
on ds1.tp_n = ds4.tp_n +3
and ds1.dt = ds4.dt
and ds1.mnth = ds4.mnth
and ds1.p_id = ds4.p_id
and ds1.a_br = ds4.a_br
and ds1.d_br = ds4.d_br
and ds1.b_br = ds4.b_br
and ds1.dr = ds4.dr
order by 7;
The second uses partitions within a subquery:
select sub.dt
,sub.mnth
,sub.p_id
,sub.a_br
,sub.d_br
,sub.b_br
,sub.tp_n
,sub.dr
,sub.o_m
,sub.o_n
,case when o_m2 = 0 and o_m3 = 0 and o_m4 = 0 and o_n2 in ('OD03','OT03','MO03') and o_n3 in ('OD03','OT03','MO03') and o_n4 in ('OD03','OT03','MO03') then sub.o_m
else 0 end as TO_COMPUTE
from
(
select ds.dt
,ds.mnth
,ds.p_id
,ds.a_br
,ds.d_br
,ds.b_br
,ds.tp_n
,ds.dr
,ds.o_m
,ds.o_n
,max(ds.o_m) over (partition by ds.dt, ds.mnth, ds.p_id, ds.a_br, ds.d_br, ds.b_br, ds.dr order by ds.tp_n rows between 1 preceding and 1 preceding) as O_M2
,max(ds.o_m) over (partition by ds.dt, ds.mnth, ds.p_id, ds.a_br, ds.d_br, ds.b_br, ds.dr order by ds.tp_n rows between 2 preceding and 2 preceding) as O_M3
,max(ds.o_m) over (partition by ds.dt, ds.mnth, ds.p_id, ds.a_br, ds.d_br, ds.b_br, ds.dr order by ds.tp_n rows between 3 preceding and 3 preceding) as O_M4
,max(ds.o_n) over (partition by ds.dt, ds.mnth, ds.p_id, ds.a_br, ds.d_br, ds.b_br, ds.dr order by ds.tp_n rows between 1 preceding and 1 preceding) as O_N2
,max(ds.o_n) over (partition by ds.dt, ds.mnth, ds.p_id, ds.a_br, ds.d_br, ds.b_br, ds.dr order by ds.tp_n rows between 2 preceding and 2 preceding) as O_N3
,max(ds.o_n) over (partition by ds.dt, ds.mnth, ds.p_id, ds.a_br, ds.d_br, ds.b_br, ds.dr order by ds.tp_n rows between 3 preceding and 3 preceding) as O_N4
from dataset ds
) sub
order by 7;
I have a data frame and have tabulated the output as per my requirement with xtabs:
df1<-data.frame(
Year=sample(2016:2018,100,replace = T),
Month=sample(month.abb,100,replace = T),
category1=sample(letters[1:6],100,replace = T),
catergory2=sample(LETTERS[8:16],100,replace = T),
lic=sample(c("P","F","T"),100,replace = T),
count=sample(1:1000,100,replace = T)
)
Code :
xtabs(count~Month+category1+lic,data=df1)
Output :
, , lic = F
category1
Month a b c d e f
Apr 0 0 0 0 0 0
Aug 418 0 0 0 0 208
Dec 628 0 0 0 0 0
Feb 0 0 0 968 0 701
Jan 388 0 0 0 0 0
Jul 771 0 0 0 0 2514
Jun 987 913 0 216 0 395
Mar 454 0 0 0 0 314
May 0 1298 0 0 0 0
Nov 906 0 526 262 0 1417
Oct 783 0 853 336 310 286
Sep 0 0 0 0 928 0
, , lic = P
category1
Month a b c d e f
Apr 13 0 0 0 0 0
Aug 0 774 0 0 416 652
Dec 0 0 0 241 462 123
Feb 150 857 0 169 6 1
Jan 954 0 567 0 0 0
Jul 481 0 0 0 0 846
Jun 0 0 0 484 0 535
Mar 751 0 0 0 241 0
May 0 549 37 0 0 2
Nov 649 0 0 0 154 692
Oct 0 0 182 0 0 0
Sep 0 0 585 0 493 0
, , lic = T
category1
Month a b c d e f
Apr 0 0 410 0 0 0
Aug 0 0 0 0 0 0
Dec 0 0 833 289 811 0
Feb 0 1223 0 716 366 552
Jan 555 0 802 0 1598 0
Jul 0 0 69 0 0 696
Jun 0 0 0 0 190 0
Mar 0 1165 0 0 0 0
May 979 951 676 0 0 0
Nov 267 0 79 1951 290 530
Oct 230 78 0 679 321 0
Sep 0 871 0 0 0 0
The output matches my requirement, but the months are listed in alphabetical rather than calendar order.
Can I achieve the same thing with a package, or is there an easier way to get the same data?
I suggest making Month an ordered factor:
df1$Month <- ordered(df1$Month, levels = month.abb)
xtabs(count~Month+category1+lic,data=df1)
#, , lic = F
#
# category1
#Month a b c d e f
# Jan 0 0 0 0 563 0
# Feb 0 0 0 826 0 0
# Mar 0 0 3 685 443 814
# Apr 0 848 0 474 0 0
# May 192 412 1942 0 803 545
# Jun 593 0 0 0 520 807
# Jul 829 745 0 0 926 0
# Aug 1474 0 603 376 0 706
# Sep 0 0 0 173 0 0
# Oct 0 0 661 915 814 0
# Nov 0 881 0 0 0 0
# Dec 0 0 0 0 0 0
# ... (remaining output omitted)
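As a small reproducible check of the idea above: xtabs lists rows in the order of the factor levels, so setting the levels to month.abb is what produces calendar order. The sketch below uses a plain factor() instead of ordered(), which is enough for this purpose; the data frame demo and the seed are made up for illustration only.

```r
# Hypothetical demo data; the seed is only there to make the sketch reproducible
set.seed(1)
demo <- data.frame(
  Month = sample(month.abb, 50, replace = TRUE),
  lic   = sample(c("P", "F", "T"), 50, replace = TRUE),
  count = sample(1:1000, 50, replace = TRUE)
)

# Fix the level order to the calendar order stored in month.abb
demo$Month <- factor(demo$Month, levels = month.abb)

# xtabs keeps all factor levels, in level order, as row names
tab <- xtabs(count ~ Month + lic, data = demo)
rownames(tab)
```

Months that never occur in the data still appear as all-zero rows, because xtabs keeps unused factor levels by default.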
Hopefully this is what OP is aiming to do:
library(tidyverse)
df1 <- as_tibble(df1)  # as.tibble() is deprecated in favour of as_tibble()
df1 %>%
arrange(Month)
Year Month category1 catergory2 lic count
<int> <fct> <fct> <fct> <fct> <int>
1 2016 Apr a N F 745
2 2016 Apr b K F 346
3 2016 Apr b O T 61
4 2016 Apr a J T 680
5 2018 Apr d O P 308
6 2017 Apr e M F 408
7 2016 Apr b P P 474
8 2017 Apr b O P 332
9 2016 Apr b P F 321
10 2017 Apr e N T 384
# ... with 90 more rows
How can I get the names of the child lists inside a list in R? My list looks like this:
$sd1
freq value order
11 1.15 17 0
12 2.12 13 0
13 2.81 21 0
14 4.13 15 0
15 4.84 18 0
16 7.54 59 0
17 9.36 17 0
$sd2
freq value order
31 0.63 4 0
32 1.54 3 0
33 3.22 3 0
34 3.98 4 0
35 4.66 38 0
36 7.14 3 0
37 9.39 29 0
$sd3
freq value order
41 0.97 4 0
42 2.03 7 0
43 2.65 4 0
44 3.34 680 0
45 4.15 4 0
46 6.67 10 0
47 7.51 6 0
48 8.35 4 0
49 10.57 4 0
50 15.97 6 0
I'd like to get the names sd1, sd2, ... inside an lapply call and make some changes to each child list (sd1, sd2, etc.).
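One possible approach, sketched under assumptions: names() returns the child names, and iterating over those names (rather than over the elements) makes both the name and the element available inside lapply. The list my_list below is a shortened stand-in for the one in the question, and the added source column is purely illustrative.

```r
# Shortened stand-in for the list in the question (first two rows of sd1 and sd2)
my_list <- list(
  sd1 = data.frame(freq = c(1.15, 2.12), value = c(17, 13), order = c(0, 0)),
  sd2 = data.frame(freq = c(0.63, 1.54), value = c(4, 3), order = c(0, 0))
)

# The names of the child elements
child_names <- names(my_list)

# Loop over the names so each step knows which child it is working on
changed <- lapply(child_names, function(nm) {
  child <- my_list[[nm]]
  child$source <- nm   # hypothetical change: tag every row with its list name
  child
})

# lapply over a character vector returns an unnamed list, so restore the names
names(changed) <- child_names
```

Map(function(child, nm) ..., my_list, names(my_list)) would be an alternative that passes the element and its name together and keeps the names on the result.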
Sorry for the very specific question, but I have a file as such:
Adj Year man mt wm wmt by bytl gr grtl
3 careless 1802 0 126 0 54 0 13 0 51
4 careless 1803 0 166 0 72 0 1 0 18
5 careless 1804 0 167 0 58 0 2 0 25
6 careless 1805 0 117 0 5 0 5 0 7
7 careless 1806 0 408 0 88 0 15 0 27
8 careless 1807 0 214 0 71 0 9 0 32
...
560 mean 1939 21 5988 8 1961 0 1152 0 1512
561 mean 1940 20 5810 6 1965 1 914 0 1444
562 mean 1941 10 6062 4 2097 5 964 0 1550
563 mean 1942 8 5352 2 1660 2 947 2 1506
564 mean 1943 14 5145 5 1614 1 878 4 1196
565 mean 1944 42 5630 6 1939 1 902 0 1583
566 mean 1945 17 6140 7 2192 4 1004 0 1906
Now I have to look up specific values (e.g. [careless, 1804, man] or [mean, 1944, wmt]).
I have no clue how to do that. One possibility would be to split the data.frame and create an array, if I'm correct, but I'd love to have a simpler solution.
Thank you in advance!
Subsetting for specific values in Adj and Year column and selecting the man column will give you the required output.
df[df$Adj == "careless" & df$Year == 1804, "man"]
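To make the pattern reusable for both lookups in the question, the subsetting can be wrapped in a small function. This is only a sketch: lookup is a hypothetical helper name, and df below holds just four rows and two value columns copied from the question's table.

```r
# Four rows taken from the question's table (only the man and wmt columns)
df <- data.frame(
  Adj  = c("careless", "careless", "mean", "mean"),
  Year = c(1804, 1805, 1944, 1945),
  man  = c(0, 0, 42, 17),
  wmt  = c(58, 5, 1939, 2192),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rows are picked by a logical condition, the column by name
lookup <- function(data, adj, year, column) {
  data[data$Adj == adj & data$Year == year, column]
}

lookup(df, "careless", 1804, "man")
lookup(df, "mean", 1944, "wmt")
```

If a combination of Adj and Year does not exist, the result is a zero-length vector, so callers may want to check length() before using the value.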
I have a table that looks similar to this
MUNI YEAR ENTE SALE
D101 1995 F001 1000
D101 1995 F002 1200
D101 1995 F003 1300
D101 1996 F001 1000
D101 1996 F003 1250
D101 1996 F004 1300
D101 1997 F001 1000
D101 1998 F002 1400
D101 1998 F003 1500
D102 1995 F001 1000
D102 1995 F003 1200
D102 1995 F006 1300
D102 1996 F001 1050
D102 1996 F002 1320
D102 1996 F003 1250
D102 1996 F006 1350
D102 1996 F002 1320
...
It is a sales table where MUNI stands for markets and ENTE stands for firms. The data consists of 7 years, 1200 markets and 200 firms. I would like to reorganize this table into a matrix form such that the dimensions are (rows = MUNI X YEAR, Cols = ENTE) and in each cell there is the value of sale, something like this
MUNIxYEAR\ENTE F001 F002 F003 F004 ...
D101x1995 1000 1200 1300 NA ...
D101x1996 1000 NA 1250 1300 ...
...
I am not sure how to do this, or what the best way to proceed is to get the data organized as described above. I have checked other posts and I believe the way to do this is the sparseMatrix command. However, I don't know how to use it when (1) there are multiple criteria (i.e., two conditions for the rows) and (2) the dimensions of the matrix are string IDs (change them into factors and then get the levels?).
Thanks in advance for any help and guidance.
There are many ways and packages to do that. Here is a method using the "tidyr" package:
library(tidyr)
df = data.frame(MUNI = rep(paste0("D10", c(1,1,2,2,3,4)), each = 2),
YEAR = rep(1999:2000,3),
ENTE = paste0("F00", c(1,2,3,3,4,5)),
SALE = sample(1000:2000, 6, replace = T))
df
# MUNI YEAR ENTE SALE
# 1 D101 1999 F001 1670
# 2 D101 2000 F002 1420
# 3 D101 1999 F003 1985
# 4 D101 2000 F003 1914
# 5 D102 1999 F004 1727
# 6 D102 2000 F005 1195
# 7 D102 1999 F001 1670
# 8 D102 2000 F002 1420
# 9 D103 1999 F003 1985
# 10 D103 2000 F003 1914
# 11 D104 1999 F004 1727
# 12 D104 2000 F005 1195
spread(df,ENTE,SALE, fill=0) # in case you decide to have each column separately for querying or further grouping in the future
# MUNI YEAR F001 F002 F003 F004 F005
# 1 D101 1999 1670 0 1985 0 0
# 2 D101 2000 0 1420 1914 0 0
# 3 D102 1999 1670 0 0 1727 0
# 4 D102 2000 0 1420 0 0 1195
# 5 D103 1999 0 0 1985 0 0
# 6 D103 2000 0 0 1914 0 0
# 7 D104 1999 0 0 0 1727 0
# 8 D104 2000 0 0 0 0 1195
df2 = spread(df,ENTE,SALE, fill=0)
unite(df2, "MUNIxYEAR", MUNI,YEAR, sep = " x ") # if you want to combine columns
# MUNIxYEAR F001 F002 F003 F004 F005
# 1 D101 x 1999 1670 0 1985 0 0
# 2 D101 x 2000 0 1420 1914 0 0
# 3 D102 x 1999 1670 0 0 1727 0
# 4 D102 x 2000 0 1420 0 0 1195
# 5 D103 x 1999 0 0 1985 0 0
# 6 D103 x 2000 0 0 1914 0 0
# 7 D104 x 1999 0 0 0 1727 0
# 8 D104 x 2000 0 0 0 0 1195
You can use xtabs
For instance:
# Set random seed for reproducibility
set.seed(12345)
# Generate 500 rows of random data
my.data = data.frame(MUNI = rep(paste0("D", 101:110), each = 50),
YEAR = sample(1990:2000, 500, replace = TRUE),
ENTE = sample(paste0("F00", 1:9), 500, replace = T),
SALE = sample(1000:2000, 500, replace = T)
)
# Create a new column with the string "MUNIxYEAR"
my.data$MUNIxYEAR = paste(my.data$MUNI, my.data$YEAR, sep = "x")
# Call xtabs to get the table!
res <- xtabs(SALE ~ MUNIxYEAR + ENTE, my.data)
First lines of the output:
ENTE
MUNIxYEAR F001 F002 F003 F004 F005 F006 F007 F008 F009
D101x1990 1339 0 0 1693 0 2831 2779 0 0
D101x1991 0 1407 0 3619 0 0 0 1254 0
D101x1992 0 0 0 0 1807 0 1766 0 1657
D101x1993 1174 1154 0 0 1794 0 0 1218 0
D101x1994 0 1015 6636 0 0 0 2126 0 0
D101x1995 0 0 0 0 0 3478 3228 1517 0
D101x1996 0 0 1304 0 0 0 1505 0 0
D101x1997 0 1077 1481 1802 0 2494 0 0 0
D101x1998 0 0 1660 5366 1844 0 0 1006 0
D101x1999 0 1437 0 0 0 0 1844 0 2394
D101x2000 0 0 1714 0 0 0 1950 1758 1108
D102x1990 3761 0 3307 1182 0 0 0 0 0
D102x1991 0 0 0 1539 2716 0 1716 0 0
D102x1992 1980 0 1056 1458 0 0 0 0 1641
D102x1993 0 0 1429 0 1784 0 1114 0 0
D102x1994 0 0 0 0 1377 0 1038 1000 0
D102x1995 0 0 1088 0 0 1031 4205 1764 0
D102x1996 0 0 0 0 1658 0 3559 0 0
D102x1997 0 1048 2453 0 0 1741 0 0 0
D102x1998 1427 5139 0 1336 0 0 1372 0 1395
D102x1999 0 0 0 3957 0 1972 0 0 0
D102x2000 0 3258 0 0 0 3780 0 3299 1360
D103x1990 0 0 0 1247 1526 0 0 0 1234
D103x1991 0 1919 0 0 0 0 0 1704 0
D103x1992 0 1489 0 0 4428 0 1371 0 0
D103x1993 0 1477 0 0 0 0 1319 0 1211
D103x1994 0 2649 0 0 1488 0 0 0 0
The xtabs function can help reformat your data into a 3 dimensional array and then the ftable function can flatten it to the 2 dimensional table.
Other options would be the reshape2 or plyr packages (and probably others as well).
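A minimal sketch of the xtabs plus ftable route, using the first six rows of the question's table. The object names sales, arr and flat are chosen here for illustration. Note that combinations with no sale show up as 0 rather than NA.

```r
# First six rows of the sales table from the question
sales <- data.frame(
  MUNI = c("D101", "D101", "D101", "D101", "D101", "D101"),
  YEAR = c(1995, 1995, 1995, 1996, 1996, 1996),
  ENTE = c("F001", "F002", "F003", "F001", "F003", "F004"),
  SALE = c(1000, 1200, 1300, 1000, 1250, 1300)
)

# Three-dimensional array: MUNI x YEAR x ENTE, cells hold the summed SALE
arr <- xtabs(SALE ~ MUNI + YEAR + ENTE, data = sales)

# Flatten to two dimensions: one row per MUNI/YEAR pair, one column per ENTE
flat <- ftable(arr, row.vars = c("MUNI", "YEAR"))
flat
```

Because xtabs sums the left-hand side, duplicated MUNI/YEAR/ENTE rows in the real data would be added together, which may or may not be what is wanted.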
I am attempting to assign values to a column based on conditional statements but the POSIXct format seems to be throwing me off. I have a column of times and would like to assign them to day/night/dawn/dusk with something like this:
if(t40636$time>t40636$dawn.b&t40636$time<t40636$dawn.e){
t40636$time.periods=1
} else {
if(t40636$time>t40636$mid.day.b&t40636$time<t40636$mid.day.e){
t40636$time.periods=2
} else {
if(t40636$time>t40636$dusk.b&t40636$time<t40636$dusk.e){
t40636$time.periods=3
} else {
if(t40636$time>t40636$mid.night.b&t40636$time<t40636$mid.night.e){
t40636$time.periods=4
} else {
t40636$time.periods=0
}
}
}
}
However, this code does not work because of the format of the columns and yields the matrix seen below (only 0s in the time.periods column).
Date Temp..ºC. Depth..m. Light time time.at.depth dawn.b dawn.e dusk.b
1 2012-06-19 14.47 -21.5 255 15:32 0 01:42 04:42 19:13
2 2012-06-19 16.99 -20.2 255 15:37 5 01:42 04:42 19:13
3 2012-06-19 12.60 -18.8 255 15:41 4 01:42 04:42 19:13
4 2012-06-19 16.36 -17.5 255 15:46 5 01:42 04:42 19:13
5 2012-06-19 16.36 -13.4 255 15:51 5 01:42 04:42 19:13
6 2012-06-19 17.94 -2.7 255 15:56 5 01:42 04:42 19:13
dusk.e mid.day.b mid.day.e mid.night.b mid.night.e time.periods
1 22:13 10:27 13:27 22:27 01:27 0
2 22:13 10:27 13:27 22:27 01:27 0
3 22:13 10:27 13:27 22:27 01:27 0
4 22:13 10:27 13:27 22:27 01:27 0
5 22:13 10:27 13:27 22:27 01:27 0
6 22:13 10:27 13:27 22:27 01:27 0
ifelse yields something close to what I want but I can't do multiple statements with it. Any suggestions are greatly appreciated.
t40636$time.periods=ifelse(t40636$time>t40636$dawn.b&t40636$time<t40636$dawn.e,1,0)
The answer to "fix my multiple if-else statements" is nearly always "Don't use multiple if-else constructions."
The R-language has a very nice switch function, and its help page has some excellent examples.
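For a whole column, a vectorized construction is one alternative worth considering, since if() expects a single TRUE/FALSE rather than one value per row, which is consistent with the whole column being filled with one value. The sketch below uses nested ifelse calls and rests on several assumptions: the times are "HH:MM" strings rather than POSIXct, the helpers to_minutes, in_window and in_period are hypothetical names, and the data frame d is made up, reusing the window boundaries shown in the question.

```r
# Made-up rows; window boundaries copied from the question's printout
d <- data.frame(
  time = c("15:32", "03:00", "11:00", "20:00", "23:30", "00:30"),
  dawn.b = "01:42", dawn.e = "04:42",
  mid.day.b = "10:27", mid.day.e = "13:27",
  dusk.b = "19:13", dusk.e = "22:13",
  mid.night.b = "22:27", mid.night.e = "01:27",
  stringsAsFactors = FALSE
)

# Convert "HH:MM" strings to minutes since midnight
to_minutes <- function(hhmm) {
  parts <- strsplit(as.character(hhmm), ":", fixed = TRUE)
  vapply(parts, function(p) as.numeric(p[1]) * 60 + as.numeric(p[2]), numeric(1))
}

# TRUE where t lies strictly inside (b, e); windows with b > e wrap past midnight
in_window <- function(t, b, e) {
  ifelse(b <= e, t > b & t < e, t > b | t < e)
}

tm <- to_minutes(d$time)

# Test the time column against the ".b" and ".e" columns of one period
in_period <- function(prefix) {
  in_window(tm,
            to_minutes(d[[paste0(prefix, ".b")]]),
            to_minutes(d[[paste0(prefix, ".e")]]))
}

# Nested ifelse: the first matching period wins, 0 if none matches
d$time.periods <- ifelse(in_period("dawn"), 1,
                  ifelse(in_period("mid.day"), 2,
                  ifelse(in_period("dusk"), 3,
                  ifelse(in_period("mid.night"), 4, 0))))
d$time.periods
```

The wrap-around branch matters for the mid.night window in the question, which starts at 22:27 and ends at 01:27 on the following day.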