I have a user table like this
ID Date Value
---------------------------
1001 31 01 14 2035.1
1002 31 01 14 1384.65
1003 31 01 14 1011.1
1004 31 01 14 1187.04
1001 28 02 14 2035.1
1002 28 02 14 1384.65
1003 28 02 14 1011.1
1004 28 02 14 1188.86
1001 31 03 14 2035.1
1002 31 03 14 1384.65
1003 31 03 14 1011.1
1004 31 03 14 1188.86
1001 30 04 14 2066.41
1002 30 04 14 1405.95
1003 30 04 14 1026.66
1004 30 04 14 1207.15
And I want to make a sum from this table like this
ID Date Value Total
---------------------------------------
1001 31 01 14 2035.1 2035.1
1002 31 01 14 1384.65 1384.65
1003 31 01 14 1011.1 1011.1
1004 31 01 14 1187.04 1187.04
1001 28 02 14 2035.1 4070.2
1002 28 02 14 1384.65 2769.3
1003 28 02 14 1011.1 2022.2
1004 28 02 14 1188.86 2375.9
1001 31 03 14 2035.1 6105.3
1002 31 03 14 1384.65 4153.95
1003 31 03 14 1011.1 3033.3
1004 31 03 14 1188.86 3564.76
1001 30 04 14 2066.41 8171.71
1002 30 04 14 1405.95 5180.61
1003 30 04 14 1026.66 4059.96
1004 30 04 14 1207.15 4771.91
For each id, the total for the first month should be just that month's value; for the second month it should be the sum of the first and second months' values; and so on as a running total. How can I do this summation in X++?
Can anyone help me?
It can be done as a display method on the table:
display Amount total()
{
    return (select sum(Value) of Table
            where Table.Id == this.Id
               && Table.Date <= this.Date).Value;
}
Change the table and field names to fit your schema.
This may not be the fastest way to do it, though. In, say, a report context, it might be better to keep a running total for each id (in a map).
It can also be done in a select, like this:
Table table1, table2;

while select table1
    group by Date, Id, Value
    join sum(Value) of table2
        where table2.Id == table1.Id
           && table2.Date <= table1.Date
{
    ...
}
You need to group on the wanted fields, because it is an aggregate select.
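If it helps to see the logic outside X++, here is a minimal sketch of the same per-id running total in Python with pandas. The column names are taken from the question; the data is a made-up subset:

```python
import pandas as pd

# Made-up subset of the question's data: one Value per ID per month-end.
df = pd.DataFrame({
    "ID":    [1001, 1002, 1001, 1002],
    "Date":  pd.to_datetime(["2014-01-31", "2014-01-31",
                             "2014-02-28", "2014-02-28"]),
    "Value": [2035.10, 1384.65, 2035.10, 1384.65],
})

# Sort within each ID by date, then take the cumulative sum of Value.
df = df.sort_values(["ID", "Date"]).reset_index(drop=True)
df["Total"] = df.groupby("ID")["Value"].cumsum()
print(df)
```

This is the vectorised analogue of the "keep a running total for each id" suggestion above.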
I'm trying to use the spread() function to make my data.frame wide, but I'm getting errors I don't even understand.
Part of my dataframe looks like this:
> df
NO2 Month
1 23 01
2 27 01
3 16 01
4 13 01
5 26 01
6 23 01
7 51 01
8 46 01
9 21 01
10 18 01
11 13 01
12 22 01
13 47 01
14 60 01
15 49 01
16 76 01
17 38 01
18 24 01
19 15 01
20 20 01
21 33 01
22 17 01
23 19 01
24 20 01
25 25 01
26 46 01
27 53 01
28 41 01
29 54 01
30 28 01
31 28 01
32 51 02
33 61 02
34 56 02
35 57 02
36 30 02
37 12 02
38 27 02
39 13 02
40 35 02
41 40 02
42 40 02
43 47 02
44 72 02
45 55 02
46 30 02
47 10 02
48 29 02
49 50 02
50 39 02
51 61 02
52 56 02
53 44 02
54 46 02
55 35 02
56 34 02
57 41 02
58 39 02
59 39 02
60 27 03
61 48 03
62 36 03
63 40 03
64 41 03
65 45 03
66 46 03
67 43 03
68 55 03 (...)
So, simply, I have a value for each day of the year, and I want to spread them so I can draw a boxplot() per month, which is easier to read. But since I can't even spread the data, I can't show it the right way.
I'm trying spread() and also reshape(), but I get errors:
df = data.frame(data)
df$Month = as.numeric(format(data$date, format = "%m"))
df = df %>% select(c("NO2", "Month"))
df = reshape(df, idvar = c("NO2", "Month"), direction = "wide", timevar = "Month")
warnings()  ## here I get the first errors (shown below)
df = spread(df, Month, NO2)        ## problems here as well
df = spread(df, df$Month, df$NO2)  ## and here too
The first warning, from the reshape() call, is repeated for each "Month":
1: In reshapeWide(data, idvar = idvar, timevar = timevar, ... :
multiple rows match for Month=1: first taken
For the second attempt I get:
Error in eval_tidy(enquo(var), var_env) : object 'Month' not found
and the third attempt gives:
Error: "var" must evaluate to a single number or a column name, not NULL
Can someone help me? I don't really get it; I've done spreads before and this is the first time I've run into this problem.
You probably need
library(dplyr)
library(tidyr)
df %>%
  group_by(Month) %>%
  mutate(row = row_number()) %>%
  spread(Month, NO2)
which gives you this output
# row `1` `2` `3`
# <int> <int> <int> <int>
# 1 1 23 51 27
# 2 2 27 61 48
# 3 3 16 56 36
# 4 4 13 57 40
# 5 5 26 30 41
# 6 6 23 12 45
# 7 7 51 27 46
# 8 8 46 13 43
#.....
Or
df %>%
  group_by(Month) %>%
  mutate(row = row_number()) %>%
  spread(row, NO2)
which gives you this
# Month `1` `2` `3` `4` `5` `6` `7` `8` ....
# <int> <int> <int> <int> <int> <int> <int> <int> <int> ....
#1 1 23 27 16 13 26 23 51 46 ....
#2 2 51 61 56 57 30 12 27 13 ....
#3 3 27 48 36 40 41 45 46 43 ....
The point is that we need a unique identifier when we want to cast a dataframe from long to wide. As one is not present in your original dataframe, we create it by grouping by Month and assigning a number to every row within each group using row_number().
If you want to achieve the same result with base R reshape(), we can add the same unique identifier using ave() with seq_along as the FUN argument.
df$row <- with(df, ave(NO2, Month, FUN = seq_along))
reshape(df, direction = "wide", idvar = "Month", timevar = "row")
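For comparison only, the same "add a within-group row number, then go wide" trick sketched in Python with pandas, on a hypothetical subset of the data:

```python
import pandas as pd

# Hypothetical subset: three NO2 readings for each of three months.
df = pd.DataFrame({
    "NO2":   [23, 27, 16, 51, 61, 56, 27, 48, 36],
    "Month": [1, 1, 1, 2, 2, 2, 3, 3, 3],
})

# pivot() needs a unique (index, column) pair, so number the rows per month.
df["row"] = df.groupby("Month").cumcount() + 1
wide = df.pivot(index="row", columns="Month", values="NO2")
print(wide)
```

Same idea as in the R answers: cumcount() plays the role of row_number() per group.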
I want to create two columns: one counting the unique values accumulated across rows, and another marking each time the count reaches 25 distinct values.
Let's take an example:
raffle Ball1 Ball2 Ball3 Ball4 Ball5 Ball6 Ball7 Ball8 Ball9 Ball10 Ball11 Ball12 Ball13 Ball14 Ball15
2 23 15 05 04 12 16 20 06 11 19 24 01 09 13 07
3 20 23 12 08 06 01 07 11 14 04 16 10 09 17 24
4 16 05 25 24 23 08 12 02 17 18 01 10 04 19 13
5 15 13 20 02 11 24 09 16 04 23 25 12 08 19 01
6 23 19 01 05 07 21 16 10 15 25 06 02 12 04 17
7 22 04 15 08 16 14 21 23 12 01 25 19 07 10 18
8 19 16 18 09 13 08 05 25 17 10 06 15 01 22 20
9 21 04 17 05 03 13 16 09 20 24 25 19 11 15 10
10 24 19 08 23 06 02 20 11 09 03 04 10 05 12 14
11 24 09 08 19 20 22 06 10 11 16 07 25 23 02 12
12 11 05 25 01 09 08 16 04 07 24 17 02 12 14 10
13 13 06 10 05 08 14 03 11 16 15 09 17 19 07 23
14 14 21 13 19 20 06 09 05 07 23 18 01 15 02 25
15 23 06 21 04 10 24 16 01 15 02 08 19 12 18 25
16 24 17 05 08 07 12 13 02 15 10 19 25 23 21 06
17 13 20 17 01 06 07 02 14 05 09 16 19 03 21 18
18 02 23 10 07 11 14 17 22 15 06 24 08 19 20 18
19 15 17 10 23 11 24 13 14 06 02 08 05 20 16 07
20 04 09 08 24 16 20 03 17 18 19 07 06 23 14 10
21 05 02 01 22 19 08 24 04 25 23 18 20 14 11 16
22 13 15 05 09 07 10 01 03 22 02 25 14 06 04 12
23 10 11 05 19 18 14 06 04 20 01 08 03 12 16 17
24 01 19 21 14 02 23 25 05 20 11 07 10 24 17 03
25 04 23 20 02 05 13 07 09 24 03 01 06 14 22 16
26 19 11 07 16 08 21 05 10 20 13 23 09 17 14 22
27 25 06 22 21 11 24 03 14 12 13 20 08 10 15 18
28 18 21 11 07 09 03 20 16 14 12 13 17 01 19 10
29 13 14 06 01 24 04 08 05 17 22 21 19 20 09 16
30 22 02 01 17 08 04 19 20 11 14 06 21 07 23 03
In the first row I have 15 distinct values.
The second row adds 6 more distinct values.
The third row adds 3 more.
By the seventh row all 25 distinct values have appeared.
I need to record this information, like this:
raffle Ball1 Ball15 unique_balls group
1 16 02 15 1
2 22 19 21 1
...
7 24 10 25 1
8 8 1 15 2
When I get to 25 distinct values, I start another group.
I have more than a hundred raffles; can anyone help?
If you want to accumulate the unique values across rows and carry the count forward until the threshold is reached, we can use a for loop:
num <- numeric(length = 0L)  # Vector to store the unique values seen so far
threshold <- 25              # Threshold value at which to reset
df$group <- 1                # Initialise all group values to 1
count <- 1                   # Current group number

# For every row in the dataframe
for (i in seq_len(nrow(df))) {
  # Take the unique values from previous rows (since the last reset)
  # and append the new unique values from this row
  num <- unique(c(num, as.integer(df[i, ])))
  if (length(num) >= threshold) {
    # The threshold is reached: close this group, empty the vector
    # of unique values and increment the group count
    df$group[i] <- count
    num <- numeric(length = 0L)
    count <- count + 1
  } else {
    # The threshold is not reached: continue the previous group
    df$group[i] <- count
  }
}
df$group
# [1] 1 1 1 1 1 1 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 5 5 5 5 6 6 6 7
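The same loop translated to Python, in case that reads more easily. This is a sketch on a made-up miniature dataframe, with a threshold of 5 instead of 25:

```python
import pandas as pd

# Made-up miniature: each row is one draw; start a new group once
# 5 distinct values have been seen (the question would use 25).
df = pd.DataFrame([[1, 2, 3], [2, 3, 4], [4, 5, 1], [1, 2, 3]])
threshold = 5

seen = set()   # unique values seen in the current group
count = 1      # current group number
groups = []
for _, row in df.iterrows():
    seen.update(int(v) for v in row)  # add this row's values
    groups.append(count)              # the row belongs to the current group
    if len(seen) >= threshold:        # threshold reached: reset, next group
        seen = set()
        count += 1
df["group"] = groups
print(df["group"].tolist())
```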
I have a data table that looks like this:
ID time somevalues change
001 12:33 13 NA
002 12:34 27 speed: 34
003 12:35 45 width: 127
004 12:36 41 NA
005 12:37 44 height: 19.2
006 12:35 45 NA
007 12:36 49 speed: 35
008 12:37 44 speed: 27
009 12:38 45 NA
010 12:39 44 NA
011 12:40 44 height: 18, speed: 28
012 12:41 40 NA
013 12:42 44 height: 18.1
014 12:43 55 width: 128.1
015 12:44 41 NA
... ... ... ...
The table consists of various measurements from a sensor. Some of the measurements were only entered when they changed, and these changes were always entered in the same column. What I need is a data table that looks like this:
ID time somevalues speed height width
001 12:33 13 34 19.1 128
002 12:34 27 34 19.1 128
003 12:35 45 34 19.1 127
004 12:36 41 34 19.1 127
005 12:37 44 34 19.2 127
006 12:35 45 34 19.2 127
007 12:36 49 35 19.2 127
008 12:37 44 27 19.2 127
009 12:38 45 27 19.2 127
010 12:39 44 27 19.2 127
011 12:40 44 28 18 127
012 12:41 40 28 18 127
013 12:42 44 28 18.1 127
014 12:43 55 28 18.1 128.1
015 12:44 41 28 18.1 128.1
... ... ... ... ... ...
I need the data in this format to analyze and visualize it.
Is there a way to do that in R without using multiple if statements?
Does this work for you?
library(dplyr)
# create data - had to remove the spaces in change to read the table, but it shouldn't make a difference
data_temp = read.table(text = "
ID time somevalues change
001 12:33 13 NA
002 12:34 27 speed:34
003 12:35 45 width:127
004 12:36 41 NA
005 12:37 44 height:19.2
006 12:35 45 NA
007 12:36 49 speed:35
008 12:37 44 speed:27
009 12:38 45 NA
010 12:39 44 NA
011 12:40 44 height:18,speed:28
012 12:41 40 speed:29,width:120.1
013 12:42 44 height:18.1,speed:30,with:50
014 12:43 55 width:128.1
015 12:44 41 NA"
, header = T, stringsAsFactors = F)
data_wanted = select(data_temp, ID, time, somevalues)

speed = which(grepl("speed:", data_temp$change))              # in which rows is speed mentioned
speed_string = gsub(".*speed:", "", data_temp$change[speed])  # get the string and remove everything before the speed value
speed_string = gsub(",.*", "", speed_string)                  # remove everything after the speed value

# Set the speed variable via a loop.
# speed contains the positions of rows with information about speed,
# so from row 1 to speed[1]-1 we don't know anything about speed yet, and it stays NA;
# from position speed[1] to speed[2]-1 it is the value of speed_string[1], and so on.
data_wanted$speed = NA
for (i in 1:length(speed))
{
  current = speed[i]  # position of the speed-update information
  # until the position of the following speed-update information,
  # or the end of the dataframe if there is no more update information
  till_next = ifelse(i < length(speed), speed[i + 1] - 1, NROW(data_wanted))
  data_wanted$speed[current:till_next] = as.numeric(speed_string[i])  # set values
}

data_wanted
cbind(data_wanted, data_temp$change)
# ID time somevalues speed data_temp$change
# 1 1 12:33 13 NA <NA>
# 2 2 12:34 27 34 speed:34
# 3 3 12:35 45 34 width:127
# 4 4 12:36 41 34 <NA>
# 5 5 12:37 44 34 height:19.2
# 6 6 12:35 45 34 <NA>
# 7 7 12:36 49 35 speed:35
# 8 8 12:37 44 27 speed:27
# 9 9 12:38 45 27 <NA>
# 10 10 12:39 44 27 <NA>
# 11 11 12:40 44 28 height:18,speed:28
# 12 12 12:41 40 29 speed:29,width:120.1
# 13 13 12:42 44 30 height:18.1,speed:30,with:50
# 14 14 12:43 55 30 width:128.1
# 15 15 12:44 41 30 <NA>
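A more general sketch in Python with pandas, handling all the measurements at once: parse each change entry into key/value pairs, spread them into columns, and forward-fill. The parsing assumes the comma-separated key:value format shown in the question; the data is a shortened, made-up sample:

```python
import pandas as pd

# Shortened, made-up sample in the question's format.
df = pd.DataFrame({
    "ID":     [1, 2, 3, 4, 5],
    "change": [None, "speed:34", "width:127", None, "speed:35,width:128.1"],
})

def parse(entry):
    # Turn "speed:35,width:128.1" into {"speed": 35.0, "width": 128.1}.
    if not isinstance(entry, str):
        return {}
    return {k: float(v) for k, v in (p.split(":") for p in entry.split(","))}

parsed = pd.DataFrame([parse(e) for e in df["change"]], index=df.index)
# Forward-fill: each measurement keeps its last known value until it changes.
result = pd.concat([df[["ID"]], parsed.ffill()], axis=1)
print(result)
```

The forward-fill replaces the per-measurement loop in the R answer with a single vectorised step.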
I have a dataframe that looks like this (there are hundreds more rows):
hour magnitude tornadoCount hourlyTornadoCount Percentage Tornadoes
1: 01 AM 0 5 18 0.277777778
2: 01 AM 1 9 18 0.500000000
3: 01 AM 2 2 18 0.111111111
4: 01 AM 3 2 18 0.111111111
5: 01 PM 0 76 150 0.506666667
6: 01 PM 1 45 150 0.300000000
7: 01 PM 2 21 150 0.140000000
8: 01 PM 3 5 150 0.033333333
9: 01 PM 4 3 150 0.020000000
10: 02 AM 0 4 22 0.181818182
11: 02 AM 1 6 22 0.272727273
12: 02 AM 2 11 22 0.500000000
13: 02 AM 4 1 22 0.045454545
14: 02 PM 0 98 173 0.566473988
15: 02 PM 1 36 173 0.208092486
16: 02 PM 2 25 173 0.144508671
17: 02 PM 3 11 173 0.063583815
18: 02 PM 4 2 173 0.011560694
19: 02 PM 5 1 173 0.005780347
20: 03 AM 1 6 9 0.666666667
21: 03 AM 2 2 9 0.222222222
22: 03 AM 3 1 9 0.111111111
23: 03 PM 0 116 257 0.451361868
24: 03 PM 1 84 257 0.326848249
25: 03 PM 2 39 257 0.151750973
26: 03 PM 3 12 257 0.046692607
27: 03 PM 4 6 257 0.023346304
28: 04 AM 0 4 16 0.250000000
29: 04 AM 1 5 16 0.312500000
30: 04 AM 2 5 16 0.312500000
I want to reorganize this so that the data is arranged chronologically according to the "hour" column. Is there a way to do this? Thanks!
You can transform to a 24-hour time using the lubridate parser (%I is the decimal hour, 1-12, and %p is the AM/PM indicator) and then sort on that. Using dplyr and lubridate:
library(dplyr)
library(lubridate)
ordered_df <- df %>%
  mutate(hour_24 = parse_date_time(hour, '%I %p')) %>%
  arrange(hour_24)
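For reference, Python's standard library uses the very same %I and %p format codes, so a minimal sketch of the sort (with made-up hour labels) looks like this:

```python
from datetime import datetime

# Made-up hour labels in the question's "hh AM/PM" format.
hours = ["01 PM", "03 AM", "11 AM", "02 PM", "01 AM"]

# %I is the 12-hour clock hour and %p the AM/PM marker; parsing gives a
# 24-hour time that sorts chronologically.
ordered = sorted(hours, key=lambda h: datetime.strptime(h, "%I %p"))
print(ordered)
```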
Column data is distributed by YEAR, MONTH and DAY; each row is associated with a fourth column named X.
How do I obtain the sum of X over rows with matching YEAR, MONTH and DAY values, and sort the result? For example:
A:
year month day x
2000 01 01 50
2000 01 02 30
2002 02 03 50
1994 01 01 3
2000 01 01 50
1996 01 02 5
2000 01 01 10
And obtain
A:
year month day x
1994 01 01 3
1996 01 02 5
2000 01 01 110
2000 01 02 30
2002 02 03 50
dplyr is a good option for this:
library(dplyr)
A %>%
  group_by(year, month, day) %>%
  summarise(x = sum(x))
which gives the desired:
year month day x
1994 01 01 3
1996 01 02 5
2000 01 01 110
2000 01 02 30
2002 02 03 50
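For readers outside R, the same group-and-sum sketched in Python with pandas, using the question's data:

```python
import pandas as pd

# The question's data.
A = pd.DataFrame({
    "year":  [2000, 2000, 2002, 1994, 2000, 1996, 2000],
    "month": [1, 1, 2, 1, 1, 1, 1],
    "day":   [1, 2, 3, 1, 1, 2, 1],
    "x":     [50, 30, 50, 3, 50, 5, 10],
})

# Sum x within each (year, month, day); groupby sorts the keys by default,
# which matches the desired output order.
result = A.groupby(["year", "month", "day"], as_index=False)["x"].sum()
print(result)
```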