Updating a numeric column with characters in R - r

I have a column like this of the Data data.frame:
Month
3
6
9
3
6
9
3
6
9
...
I want to update 3 with March, 6 with Jume, 9 with September. I know how to do it if I have two months 3 and 10 for example with: mutate(Data, Month=if_else(Month==3,"March","October")) How can I do it for three months?
Expected output:
Month
March
June
September
March
June
September
March
June
September
...

You could just use your numerical month values to access month.name, which is R's built-in vector of month names, starting at index 1:
Data <- data.frame(Month=c(3,6,9))
Data$MonthName <- month.name[Data$Month]
Data
Month MonthName
1 3 March
2 6 June
3 9 September

Related

Giving month names to a variable of numbers in R

I have a data set with the variable 'months' from 1 to 12, but need to change them to the month names. i.e "1" needs to be January and so on. Whats the easiest way to do this?
R has an inbuilt vector called month.name for your purpose you could do something like the following:
# Some dummy data
set.seed(1)
df <- data.frame(
month = sample(1:12, size = 10)
)
# Now use your integer month to subset month.name
df$month2 <- month.name[df$month] # Also has month.abb
df
month month2
1 9 September
2 4 April
3 7 July
4 1 January
5 2 February
6 5 May
7 3 March
8 8 August
9 6 June
10 11 November

find number of customers added each month

customer_id transaction_id month year
1 3 7 2014
1 4 7 2014
2 5 7 2014
2 6 8 2014
1 7 8 2014
3 8 9 2015
1 9 9 2015
4 10 9 2015
5 11 9 2015
2 12 9 2015
I am well familiar with R basics. Any help will be appreciated.
the expected output should look like following:
month year number_unique_customers_added
7 2014 2
8 2014 0
9 2015 3
In the month 7 and year 2014, only customers_id 1 and 2 are present, so number of customers added is two. In the month 8 and year 2014, no new customer ids are added. So there should be zero customers added in this period. Finally in year 2015 and month 9, customer_ids 3,4 and 5 are the new ones added. So new number of customers added in this period is 3.
Using data.table:
require(data.table)
dt[, .SD[1,], by = customer_id][, uniqueN(customer_id), by = .(year, month)]
Explanation: We first remove all subsequent transactions of each customer (we're interested in the first one, when she is a "new customer"), and then count unique customers by each combination of year and month.
Using dplyr we can first create a column which indicates if a customer is duplicate or not and then we group_by month and year to count the new customers in each group.
library(dplyr)
df %>%
mutate(unique_customers = !duplicated(customer_id)) %>%
group_by(month, year) %>%
summarise(unique_customers = sum(unique_customers))
# month year unique_customers
# <int> <int> <int>
#1 7 2014 2
#2 8 2014 0
#3 9 2015 3

Mutate column showing increasing/decreasing value compared to previous period [duplicate]

This question already has answers here:
subtract value from previous row by group
(3 answers)
Closed 4 years ago.
Let's say I have a df like this
Dealer Period Revenue
A August 10
B August 10
A September 30
B September 5
How can I use mutate function to create a column which shows the compared result of Revenue to previous period.
The result i want is something like
Dealer Period Revenue Compared_result
A August 10 N/A
B August 10 N/A
A September 30 20
B September 5 -5
library(dplyr)
df %>% group_by(Dealer) %>%
mutate(Comp=Revenue-lag(Revenue))
# A tibble: 4 x 4
# Groups: Dealer [2]
Dealer Period Revenue Comp
<fct> <fct> <int> <int>
1 A August 10 NA
2 B August 10 NA
3 A September 30 20
4 B September 5 -5

How to complete missing values with Na in a list?

I have a data frame that has the following column: Tree ID, month, values. For some months, there is no recorded data, therefore those months do not exist in the data frame. I have completed the list with the missing months but now I do not know how to insert NA in the value column for the added months.
Example:
Tree.Id: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Month: Jan, Feb, Mar, May, Jun, Jul, Sept, Oct, Nov, Dec
Values: 1,0,1,1,0,2,1,1,0,2
The following months are missing: Apr, Aug,
I added them with the code below, and now I want for those 2 added months to introduce NA in the value column.
Here is what I tried:
tree_ls <- list()
for (i in unique(data$Tree.ID)){
mon1 <- data$month[data$Tree.ID == i] ### extract the month for every Tree iD
mon <- min(mon1, na.rm=T):max(mon1, na.rm=T) # completes the numbers with the missing month
dat1 <- data$value[data$Tree.ID == i]
......
After this step, I do not know how to create a list that will add NA for all the added months that were missing, so I will have lists of the same length.
Thanks
This is an old post, but I have a pretty good solution for this:
To begin, your small reproducible code should probably be the following:
month <- c(Jan, Feb, Mar, May, Jun, Jul, Sept, Oct, Nov, Dec)
value <- c(1,0,1,1,0,2,1,1,0,2)
df <- data.frame(id=id, month=month,value=value)
> head(df)
id month value
1 1 Jan 1
2 2 Feb 0
3 3 Mar 1
4 4 May 1
5 5 Jun 0
6 6 Jul 2
Now just simply introduce an entire list of your domain, e.g., your months you want to obtain NA's where missing.
completeMonths <- c("Jan", "Feb", "Mar", "Apr","May", "Jun", "Jul","Aug", "Sept", "Oct", "Nov", "Dec")
df2 <- dataframe(month=completeMonths)
> df2
month
1 Jan
2 Feb
3 Mar
4 Apr
5 May
6 Jun
7 Jul
8 Aug
9 Sept
10 Oct
11 Nov
12 Dec
Now we have a column with all the underlying values, so when we merge, we can fill the missing rows as NA with the following syntax:
merge(df, df2, on=month, all=TRUE)
With our results as follows:
month id value
1 Dec 10 2
2 Feb 2 0
3 Jan 1 1
4 Jul 6 2
5 Jun 5 0
6 Mar 3 1
7 May 4 1
8 Nov 9 0
9 Oct 8 1
10 Sept 7 1
11 Apr NA NA
12 Aug NA NA
Hope this helps, data wrangling sucks.
When you say that you have a data frame with some months that have "no recorded data" and therefore "do not exist", the fact that they are in the data frame at all means they have some representation. I'm going to guess that by "do not exist" you mean that they are blank strings, such as "". If that's the case, you can replace the blank strings with NA values using mutate in the dplyr package and ifelse in the base package as follows:
library(dplyr);
data_with_nas <- mutate(data, value = ifelse(value=="", NA, value));
That reads as "change the data data frame such that its value cells are replaced with NA if they were a blank string, or kept as is otherwise."

insert NA into a time series object in r

i want to sum the months for all the years in a time series that looks like
Jan Feb Mar Apr Jun Jul Aug Sep Oct Nov Dec
2006 4 4 3 4 4 5 5 3 3
2007 3 3 2 2 4 3 3 2 2 5 5
2008 3 3 3 2 2 4 4 3
by using
window(the time series object,start=c(2006,3),end=c(2008,3),frequency=1)
this line gives you a new ts object with just march of 2006-2007. However this does not work when the month does not have any values in it, is there any way to replace the gaps with NA? I have seen questions like this before but the dont answer i think for a ts object.
Assuming that
the_time_series_object <- ts(1:31, frequency = 12, start = c(2006, 3))
then:
window(the time series object, start = c(2006,3), end = c(2008,3), frequency = 12)
Your frequency should be 12 instead of 1. There's no NA problem it's just that one variable that you have wrong

Resources