I am trying to build a function that keeps values constant for a certain number of months (rows) in a time series. I already have a function that keeps a value constant as long as the following rows are NA. I would like to change this function (or write a new one) so that it keeps the following rows constant for a given number of months.
This is my function:
na_locf_until = function(x, n) {
# in time series data, fill in na's until indicated n
l <- cumsum(! is.na(x))
c(NA, x[! is.na(x)])[replace(l, ave(l, l, FUN=seq_along) > (n+1), 0) + 1]
}
Example:
htm <- data.frame (Date = c("Jan 2001", "Feb 2001", "Mar 2001", "Apr 2001", "May 2001", "Jun 2001", "Jul 2001", "Aug 2001", "Aug 2001"),
prc = c(34,35,38,24,22,18,30,32,38),
buy = c(1, 1, 1, 0, 0, 1, 0, 0, 0),
htm_prc = c(34,34,38,38,22,18,18,32,38))
The binary column buy indicates a buy, e.g. in Jan 2001. If the binary variable is 1, the function should keep the price constant, in a new column (or the same one), for that month and the next month, e.g. the value 34 for Jan and Feb 2001. I am struggling because I do not want the htm_prc value in Feb 2001 to be 35. The column htm_prc shows my desired outcome.
Maybe my function works as an inspiration.
Thanks in advance!!
Perhaps this helps
library(dplyr)
library(data.table)
htm %>%
mutate(grp = rleid(buy|lag(buy))) %>%
group_by(grp) %>%
mutate(grp2 =as.integer(gl(n(), 2, n()))) %>%
group_by(grp2, .add = TRUE) %>%
mutate(htm_prc2 = if(1 %in% buy) first(prc) else prc) %>%
ungroup %>%
select(-grp, -grp2)
-output
# A tibble: 9 × 5
Date prc buy htm_prc htm_prc2
<chr> <dbl> <dbl> <dbl> <dbl>
1 Jan 2001 34 1 34 34
2 Feb 2001 35 1 34 34
3 Mar 2001 38 1 38 38
4 Apr 2001 24 0 38 38
5 May 2001 22 0 22 22
6 Jun 2001 18 1 18 18
7 Jul 2001 30 0 18 18
8 Aug 2001 32 0 32 32
9 Aug 2001 38 0 38 38
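As a dependency-free alternative to the pipeline above, the same result can be sketched with an explicit loop in base R. The function name hold_after_buy and the parameter n_hold are made up for illustration. The sketch assumes that a buy falling inside an ongoing hold window is ignored, which is what the desired htm_prc column suggests.

```r
# Hypothetical helper: hold the price of a buy row for n_hold rows in total
# (the buy row itself plus the following n_hold - 1 rows).
hold_after_buy <- function(prc, buy, n_hold = 2) {
  out <- prc
  i <- 1
  while (i <= length(prc)) {
    if (buy[i] == 1) {
      # last row of the hold window, clipped to the end of the series
      end <- min(i + n_hold - 1, length(prc))
      out[i:end] <- prc[i]
      i <- end + 1   # buys inside the window are skipped (assumption)
    } else {
      i <- i + 1
    }
  }
  out
}

prc <- c(34, 35, 38, 24, 22, 18, 30, 32, 38)
buy <- c(1, 1, 1, 0, 0, 1, 0, 0, 0)
htm_prc2 <- hold_after_buy(prc, buy, n_hold = 2)
```

Changing n_hold changes how many rows each buy price is held for.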
I have the following data frame:
customerid payment_month payment_date bill_month charges
         1       January           22    January      30
         1      February           15   February      21
         1         March            2      March      33
         1           May            4      April      43
         1           May            4        May      23
         1          June           13       June      32
         2       January           12    January      45
         2      February           15   February      56
         2         March            2      March      67
         2         April            4      April      65
         2           May            4        May      54
         2          June           13       June      68
         3       January           25    January      45
         3      February           26   February      56
         3         March           30      March      67
         3         April            1      April      65
         3          June            1        May      54
         3          June            1       June      68
(The real data is much larger.) I want to calculate payment efficiency using the following formula:
efficiency = (amount paid not late / total bill amount)*100
A payment is not late if it is made no later than the 21st day of the bill's month (paying January's bill on the 22nd of January is considered late).
I want to calculate the efficiency of each customer with the expected output of
customerid effectivity
         1       59.90
         2      100
         3       37.46
I have tried the following code to calculate it for one id, and it works. But I want to apply it to every id and summarize the result into one column (effectivity) with one row per id. I have tried group_by, aggregate and ifelse, but nothing works. What should I do?
# charges customer 1 paid late (wrong month, or after the 21st)
df1 <- filter(df, customerid == 1 & (payment_month != bill_month | payment_date > 21))
# all charges billed to customer 1
df2 <- filter(df, customerid == 1)
x <- sum(df1$charges)
y <- sum(df2$charges)
100 - (x / y) * 100
An option using dplyr
library(dplyr)
df %>%
group_by(customerid) %>%
summarise(
effectivity = sum(
charges[payment_date <= 21 & payment_month == bill_month]) / sum(charges) * 100,
.groups = "drop")
## A tibble: 3 x 2
#customerid effectivity
# <int> <dbl>
#1 1 59.9
#2 2 100
#3 3 37.5
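For reference, here is a self-contained base R sketch of the same calculation, with the example data typed in from the question. It assumes, like the dplyr answer above, that a payment counts as on time only when it is made in the bill's month on or before the 21st. The names on_time, paid_on_time, total and result are introduced for illustration.

```r
months <- c("January", "February", "March", "April", "May", "June")
df <- data.frame(
  customerid    = rep(1:3, each = 6),
  payment_month = c("January", "February", "March", "May", "May", "June",
                    "January", "February", "March", "April", "May", "June",
                    "January", "February", "March", "April", "June", "June"),
  payment_date  = c(22, 15, 2, 4, 4, 13,
                    12, 15, 2, 4, 4, 13,
                    25, 26, 30, 1, 1, 1),
  bill_month    = rep(months, times = 3),
  charges       = c(30, 21, 33, 43, 23, 32,
                    45, 56, 67, 65, 54, 68,
                    45, 56, 67, 65, 54, 68),
  stringsAsFactors = FALSE
)

# TRUE where the bill was paid in its own month, on or before the 21st
on_time <- df$payment_month == df$bill_month & df$payment_date <= 21

# per-customer sums of on-time charges and of all charges
paid_on_time <- tapply(df$charges * on_time, df$customerid, sum)
total        <- tapply(df$charges, df$customerid, sum)

result <- data.frame(
  customerid  = as.integer(names(total)),
  effectivity = round(100 * as.numeric(paid_on_time) / as.numeric(total), 2)
)
result
```

For the example data this gives about 59.89, 100 and 37.46 for customers 1, 2 and 3.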
df %>%
  group_by(customerid) %>%
  # month names -> month numbers so that they can be compared
  mutate(pay_month_number = match(payment_month, month.name),
         bill_month_number = match(bill_month, month.name),
         # on time: paid in the bill's month, on or before the 21st
         nolate = pay_month_number == bill_month_number & payment_date <= 21) %>%
  summarise(efficiency = sum(charges[nolate]) / sum(charges) * 100)
edit: Solution at the end.
I have a dataframe that contains different variables and the sum of these different variables as a variable called "total".
I want to add a new column that calculates each variable's share of the "total" variable.
Example:
library(dplyr)
name <- c('A','A',
'B','B')
month = c("oct 2018", "nov 2018",
"oct 2018", "nov 2018")
value <- seq_along(month)
df = data.frame(name, month, value)
# Create total variable
dfTotal =
df%>%
group_by(month)%>%
summarize(value = sum(value, na.rm = TRUE))
dfTotal[["name"]] <- "Total"
dfTotal = as.data.frame(dfTotal)
# Add total column to dataframe
df2 = rbind(df, dfTotal)
df2
which gives the dataframe
name month value
1 A oct 2018 1
2 A nov 2018 2
3 B oct 2018 3
4 B nov 2018 4
5 Total nov 2018 6
6 Total oct 2018 4
What I want is to produce a new column with the shares of the total for each month in the above dataframe, so that I get something like
name month value share
1 A oct 2018 1 0.25 (=1/4)
2 A nov 2018 2 0.33 (=2/6)
3 B oct 2018 3 0.75 (=3/4)
4 B nov 2018 4 0.67 (=4/6)
5 Total nov 2018 6 1.00 (=6/6)
6 Total oct 2018 4 1.00 (=4/4)
Does anybody know how I can produce the last column of the second dataframe from the first dataframe?
Solution:
Based on tmfmnk's comment, the following solves the problem:
df2 =
df2 %>%
group_by(month) %>%
mutate(share = value/max(value))
df2
which gives
name month value share
<fct> <fct> <int> <dbl>
1 A oct 2018 1 0.25
2 A nov 2018 2 0.333
3 B oct 2018 3 0.75
4 B nov 2018 4 0.667
5 Total nov 2018 6 1
6 Total oct 2018 4 1
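Note that value/max(value) only works as long as the Total row holds the largest value in each month (for example, when there are no negative values). A slightly more explicit sketch in base R divides by the Total row of the same month instead. The helper vector totals is introduced here for illustration, and the data frame is retyped from the question.

```r
df2 <- data.frame(
  name  = c("A", "A", "B", "B", "Total", "Total"),
  month = c("oct 2018", "nov 2018", "oct 2018", "nov 2018",
            "nov 2018", "oct 2018"),
  value = c(1, 2, 3, 4, 6, 4),
  stringsAsFactors = FALSE
)

# named lookup vector: month -> total for that month
is_total <- df2$name == "Total"
totals <- setNames(df2$value[is_total], as.character(df2$month[is_total]))

# divide every row by the total of its own month
df2$share <- df2$value / unname(totals[as.character(df2$month)])
df2
```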
My time data are in this format:
datatimedf = data.frame(day_time = c('Apr 2005', '1992', "2004", "Jan 2001", "2015"))
I would like to add Jan to the rows that only have a year.
How can I do that?
An example of expected output is this:
datatimedf = data.frame(day_time = c('Apr 2005', 'Jan 1992', "Jan 2004", "Jan 2001", "Jan 2015"))
What I have for a single row is this:
datatimedf[2,1] <- sub("^", "Jan ", datatimedf[2,1])
But how can I apply it to the whole data frame?
Here is a quick way to do it using dplyr:
library(dplyr)
datatimedf$day_time <- as.character(datatimedf$day_time)
datatimedf <- datatimedf %>%
transform(day_time = ifelse(nchar(day_time) == 4, paste("Jan", day_time), day_time))
#> day_time
#> 1 Apr 2005
#> 2 Jan 1992
#> 3 Jan 2004
#> 4 Jan 2001
#> 5 Jan 2015
For each line it checks if the length of the string is 4 and if so adds "Jan" to the beginning, otherwise it keeps the original. This isn't very applicable to other situations but it should get you started if you wanted to make it more generic and able to handle more types of input.
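A slightly stricter variant, sketched in base R, uses a regular expression so that only strings consisting of exactly four digits are changed, rather than any string of length 4. This assumes the year-only entries always look like "1992".

```r
datatimedf <- data.frame(
  day_time = c("Apr 2005", "1992", "2004", "Jan 2001", "2015"),
  stringsAsFactors = FALSE
)

# prefix "Jan " only where the whole string is a four-digit year;
# sub() is vectorised, so this handles the whole column at once
datatimedf$day_time <- sub("^([0-9]{4})$", "Jan \\1", datatimedf$day_time)
datatimedf
```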
I have a data frame like this:
year <-c(floor(runif(100,min=2015, max=2017)))
month <- c(floor(runif(100, min=1, max=13)))
inch <- c(floor(runif(100, min=0, max=10)))
mm <- c(floor(runif(100, min=0, max=100)))
df = data.frame(year, month, inch, mm);
year month inch mm
2016 11 0 10
2015 9 3 34
2016 6 3 33
2015 8 0 77
I only care about the columns year, month, and mm.
I need to re-arrange the data frame so that the first column is the name of the month and the rest of the columns is the value of mm.
Months 2015 2016
Jan # #
Feb
Mar
Apr
May
Jun
Jul
Aug
Sep
Oct
Nov
Dec
So two things need to happen.
(1) The month needs to become a string of the first three letters of the month.
(2) I need to group by year, and then put the mm values in a column under that year.
So far I have this code, but I can't figure it out:
df %>%
select(-inch) %>%
group_by(month) %>%
summarize(mm = mm) %>%
ungroup()
To convert month numbers to names, you can index month.abb. You can then summarise by year and month and spread to wide format:
library(dplyr)
library(tidyr)
df %>%
group_by(year, month = month.abb[month]) %>%
summarise(mm = mean(mm)) %>% # use mean as an example, could also be sum or other
# intended aggregation methods
spread(year, mm) %>%
arrange(match(month, month.abb)) # rearrange month in chronological order
# A tibble: 12 x 3
# month `2015` `2016`
# <chr> <dbl> <dbl>
# 1 Jan 65.50000 28.14286
# 2 Feb 54.40000 30.00000
# 3 Mar 23.50000 95.00000
# 4 Apr 7.00000 43.60000
# 5 May 45.33333 44.50000
# 6 Jun 70.33333 63.16667
# 7 Jul 72.83333 52.00000
# 8 Aug 53.66667 66.50000
# 9 Sep 51.00000 64.40000
#10 Oct 74.00000 39.66667
#11 Nov 66.20000 58.71429
#12 Dec 38.25000 51.50000
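The same month-by-year table can also be sketched in base R with tapply, which avoids the reshaping step. Using mean as the aggregation is an assumption carried over from the answer above, and set.seed is added only to make the example reproducible. The names month_name and wide are introduced for illustration.

```r
set.seed(42)
year  <- floor(runif(100, min = 2015, max = 2017))
month <- floor(runif(100, min = 1, max = 13))
mm    <- floor(runif(100, min = 0, max = 100))
df <- data.frame(year, month, mm)

# a factor with levels in calendar order keeps the rows chronological
month_name <- factor(month.abb[df$month], levels = month.abb)

# rows = months, columns = years, cells = mean mm (NA where there is no data)
wide <- tapply(df$mm, list(Months = month_name, year = df$year), mean)
wide
```

The result is a matrix; as.data.frame(wide) turns it into a data frame with the month abbreviations as row names.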
I have a table that uses unique IDs but inconsistent readable names for those IDs. It is more complex than month names, but for the sake of a more simple example, let's say it looks something like this:
demo_frame <- read.table(text=" Month_id Month_name Number
1 Jan 37
2 Feb 63
3 March 9
3 Mar 150
2 February 49", header=TRUE)
Except that they might have spelled "Feb" or "March" eight different ways. I also have a clean data frame that contains consistent names for the names that have variations:
month_lookup <- read.table(text=" Month_id Month_name
2 Feb
3 Mar", header=TRUE)
I want to get to this:
1 Jan 37
2 Feb 63
3 Mar 9
3 Mar 150
2 Feb 49
I tried merge(month_lookup, demo_frame, by = "Month_id") but that dropped all the January values because "Jan" doesn't exist in the lookup table:
Month_id Month_name.x Month_name.y Number
1 2 Feb Feb 63
2 2 Feb February 49
3 3 Mar March 9
4 3 Mar Mar 150
My read of How to replace data.frame column names with string in corresponding lookup table in R is that I ought to be able to use plyr::mapvalues but I'm unclear from examples and documentation on how I'd map the id to the name. I don't just want to say "Replace 'March' with 'Mar'" -- I need to say SET month_name = 'Mar' WHERE month_id = 3 for each value in lookup.
I think you want this.
library(dplyr)
demo_frame <- read.table(text=" Month_id Month_name Number
1 Jan 37
2 Feb 63
3 March 9
3 Mar 150
2 February 49", header=TRUE, stringsAsFactors = FALSE)
month_lookup <- read.table(text=" Month_id Month_name
2 Feb
3 Mar", header=TRUE, stringsAsFactors = FALSE)
result =
demo_frame %>%
rename(bad_month = Month_name) %>%
left_join(month_lookup) %>%
mutate(month_fix =
Month_name %>%
is.na %>%
ifelse(bad_month, Month_name) )
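The "SET month_name WHERE month_id" idea from the question can also be sketched in base R with match, overwriting the name only where the id is found in the lookup table. The variable idx is introduced for illustration, and the data frames are retyped from the question.

```r
demo_frame <- data.frame(
  Month_id   = c(1, 2, 3, 3, 2),
  Month_name = c("Jan", "Feb", "March", "Mar", "February"),
  Number     = c(37, 63, 9, 150, 49),
  stringsAsFactors = FALSE
)
month_lookup <- data.frame(
  Month_id   = c(2, 3),
  Month_name = c("Feb", "Mar"),
  stringsAsFactors = FALSE
)

# position of each id in the lookup table, NA if the id is not listed
idx <- match(demo_frame$Month_id, month_lookup$Month_id)

# keep the original name where there is no lookup entry (e.g. "Jan")
demo_frame$Month_name <- ifelse(
  is.na(idx),
  demo_frame$Month_name,
  month_lookup$Month_name[idx]
)
demo_frame
```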