How to complete missing values with NA in a list? - r

I have a data frame with the following columns: Tree ID, month, and value. For some months there is no recorded data, so those months do not exist in the data frame. I have completed the list with the missing months, but now I do not know how to insert NA in the value column for the added months.
Example:
Tree.Id: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Month: Jan, Feb, Mar, May, Jun, Jul, Sept, Oct, Nov, Dec
Values: 1,0,1,1,0,2,1,1,0,2
The following months are missing: Apr and Aug.
I added them with the code below, and now I want to insert NA in the value column for those two added months.
Here is what I tried:
tree_ls <- list()
for (i in unique(data$Tree.ID)){
mon1 <- data$month[data$Tree.ID == i] ### extract the month for every Tree iD
mon <- min(mon1, na.rm=T):max(mon1, na.rm=T) # completes the numbers with the missing month
dat1 <- data$value[data$Tree.ID == i]
......
After this step, I do not know how to create a list that will add NA for all the added months that were missing, so I will have lists of the same length.
Thanks

This is an old post, but I have a pretty good solution for this:
To begin, your small reproducible code should probably be the following:
id <- 1:10
month <- c("Jan", "Feb", "Mar", "May", "Jun", "Jul", "Sept", "Oct", "Nov", "Dec")
value <- c(1,0,1,1,0,2,1,1,0,2)
df <- data.frame(id=id, month=month, value=value)
> head(df)
id month value
1 1 Jan 1
2 2 Feb 0
3 3 Mar 1
4 4 May 1
5 5 Jun 0
6 6 Jul 2
Now build the complete list of your domain, i.e., every month that should appear, so that the missing ones can be filled with NA.
completeMonths <- c("Jan", "Feb", "Mar", "Apr","May", "Jun", "Jul","Aug", "Sept", "Oct", "Nov", "Dec")
df2 <- data.frame(month=completeMonths)
> df2
month
1 Jan
2 Feb
3 Mar
4 Apr
5 May
6 Jun
7 Jul
8 Aug
9 Sept
10 Oct
11 Nov
12 Dec
Now we have a column with all the underlying values, so when we merge, we can fill the missing rows as NA with the following syntax:
merge(df, df2, by="month", all=TRUE)
With our results as follows:
month id value
1 Dec 10 2
2 Feb 2 0
3 Jan 1 1
4 Jul 6 2
5 Jun 5 0
6 Mar 3 1
7 May 4 1
8 Nov 9 0
9 Oct 8 1
10 Sept 7 1
11 Apr NA NA
12 Aug NA NA
Hope this helps, data wrangling sucks.
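For the per-tree case in the question, the same merge idea can be applied to a grid holding every tree and month combination. The following is only a sketch: it assumes month is stored as a number from 1 to 12 (as the min():max() loop in the question suggests), and the toy data, full_grid, and completed are made-up names for illustration.

```r
# Hypothetical data: two trees, each missing one month inside its own range
data <- data.frame(
  Tree.ID = c(1, 1, 1, 2, 2),
  month   = c(1, 2, 4, 3, 5),
  value   = c(1, 0, 1, 2, 1)
)

# For each tree, list every month between its first and last recorded month
full_grid <- do.call(rbind, lapply(split(data, data$Tree.ID), function(d) {
  data.frame(Tree.ID = d$Tree.ID[1],
             month   = seq(min(d$month), max(d$month)))
}))

# all.x = TRUE keeps every grid row; months without a record get value = NA
completed <- merge(full_grid, data, by = c("Tree.ID", "month"), all.x = TRUE)
completed <- completed[order(completed$Tree.ID, completed$month), ]
completed
```

If a list with one element per tree is needed afterwards, split(completed$value, completed$Tree.ID) would produce it.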

When you say that you have a data frame with some months that have "no recorded data" and therefore "do not exist", the fact that they are in the data frame at all means they have some representation. I'm going to guess that by "do not exist" you mean that they are blank strings, such as "". If that's the case, you can replace the blank strings with NA values using mutate in the dplyr package and ifelse in the base package as follows:
library(dplyr);
data_with_nas <- mutate(data, value = ifelse(value=="", NA, value));
That reads as "change the data data frame such that its value cells are replaced with NA if they were a blank string, or kept as is otherwise."
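A base R equivalent, as a minimal sketch that assumes the blanks really are empty strings in a character column (the toy data below is invented for illustration):

```r
# Hypothetical data: the value column was read as text, with "" for missing
data <- data.frame(
  month = c("Jan", "Feb", "Mar"),
  value = c("1", "", "0"),
  stringsAsFactors = FALSE
)

# Overwrite only the blank cells with NA
data$value[data$value == ""] <- NA

# Once the blanks are gone, the column can be converted to numeric
data$value <- as.numeric(data$value)
data
```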

Related

Can't `relevel` a factor variable in data frame because it is an `integer`, how to fix?

I have a tibble data frame in R called research_fields:
# A tibble: 411 × 2
Response_ID Field
<int> <fct>
1 1 Business
2 2 Psychology
3 3 Medicine and health
4 4 Other
5 5 Medicine and health
6 6 Education
7 7 Public policy
8 8 Computer science
9 9 Biology
10 10 Medicine and health
# … with 401 more rows
The Field variable is already a factor, but I want to use forcats::fct_relevel() to move the "Other" factor to be the last, so I tried this:
fct_relevel(research_fields$"Field", "Other", Inf)
But I get the error:
Error:
! Can't convert a number to a character vector.
I don't understand why because Field is clearly a factor (fct), not an integer. Why is this?
How do I make "Other" come last? Is there a way to do this with mutate() and something else from forcats:: perhaps?
Thank you.
You need to name the after= parameter when calling fct_relevel:
fct_relevel(research_fields$Field, "Other", after=Inf)
This is because the function signature is
fct_relevel(.f, ..., after = 0L)
That means that if you don't use after=, all the values you pass get absorbed into the ... and are assumed to be level names, and the numeric value for infinity is clearly not a valid level name.
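If forcats is not available, the same reordering can be sketched in base R by rebuilding the factor with "Other" moved to the end of its levels. The vector below is a made-up stand-in for research_fields$Field.

```r
# Hypothetical stand-in for the Field column
field <- factor(c("Psychology", "Other", "Business", "Other"))
levels(field)  # default order is alphabetical, so "Other" sits in the middle

# Put every level except "Other" first, then "Other" last
new_levels <- c(setdiff(levels(field), "Other"), "Other")
field2 <- factor(field, levels = new_levels)
levels(field2)
```

Inside a pipeline, the forcats version would typically be written as mutate(Field = fct_relevel(Field, "Other", after = Inf)).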
You might need a better conception of what a "factor" variable is. Let's assume we have this factor variable with the associated levels.
head(df$Field)
# [1] Jan May Jan Sep Oct Apr
# 12 Levels: Apr Aug Dec Feb Jan Jul Jun Mar May Nov ... Sep
An easy thing to do is to create a factor again with arbitrary levels= in the desired order (in this example we want of course the right order of the months).
df$Field <- factor(df$Field, levels=c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep",
"Oct", "Nov", "Dec"))
The variable stays the same, but the levels have changed.
head(df$Field)
# [1] Jan May Jan Sep Oct Apr
# 12 Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct ... Dec
Data:
set.seed(42)
df <- data.frame(Field=factor(month.abb[sample(1:12, 50, replace=TRUE)]))

Updating a numeric column with characters in R

I have a column like this of the Data data.frame:
Month
3
6
9
3
6
9
3
6
9
...
I want to update 3 with March, 6 with June, 9 with September. I know how to do it if I have two months, 3 and 10 for example, with: mutate(Data, Month=if_else(Month==3,"March","October")) How can I do it for three months?
Expected output:
Month
March
June
September
March
June
September
March
June
September
...
You could just use your numerical month values to access month.name, which is R's built-in vector of month names, starting at index 1:
Data <- data.frame(Month=c(3,6,9))
Data$MonthName <- month.name[Data$Month]
Data
Month MonthName
1 3 March
2 6 June
3 9 September
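When the replacement labels are not month names, indexing month.name no longer applies, but a named lookup vector works in the same spirit. This is a sketch only; labels is a hypothetical name, and the mapping is written out by hand.

```r
Data <- data.frame(Month = c(3, 6, 9, 3, 6, 9))

# Names are the old codes (as text), values are the new labels
labels <- c("3" = "March", "6" = "June", "9" = "September")

# Index by name; unname() drops the codes that would otherwise tag each element
Data$MonthName <- unname(labels[as.character(Data$Month)])
Data
```

Any code that is absent from labels comes back as NA, which makes unmapped values easy to spot.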

Giving month names to a variable of numbers in R

I have a data set with the variable 'months' from 1 to 12, but need to change them to the month names, i.e. "1" needs to be January and so on. What's the easiest way to do this?
R has a built-in vector called month.name for this purpose. You could do something like the following:
# Some dummy data
set.seed(1)
df <- data.frame(
month = sample(1:12, size = 10)
)
# Now use your integer month to subset month.name
df$month2 <- month.name[df$month] # Also has month.abb
df
month month2
1 9 September
2 4 April
3 7 July
4 1 January
5 2 February
6 5 May
7 3 March
8 8 August
9 6 June
10 11 November
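One caveat is that plain character month names sort alphabetically. If calendar order matters for sorting or plotting, the names can be stored as a factor whose levels are month.name. A small sketch with made-up data:

```r
df <- data.frame(month = c(9, 4, 7, 1))

# Use month.name both for the labels and for the level order
df$month2 <- factor(month.name[df$month], levels = month.name)

# Sorting on the factor follows calendar order, not alphabetical order
df_sorted <- df[order(df$month2), ]
as.character(df_sorted$month2)
```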

How do I update data from an incomplete lookup table?

I have a table that uses unique IDs but inconsistent readable names for those IDs. It is more complex than month names, but for the sake of a more simple example, let's say it looks something like this:
demo_frame <- read.table(text=" Month_id Month_name Number
1 Jan 37
2 Feb 63
3 March 9
3 Mar 150
2 February 49", header=TRUE)
Except that they might have spelled "Feb" or "March" eight different ways. I also have a clean data frame that contains consistent names for the names that have variations:
month_lookup <- read.table(text=" Month_id Month_name
2 Feb
3 Mar", header=TRUE)
I want to get to this:
1 Jan 37
2 Feb 63
3 Mar 9
3 Mar 150
2 Feb 49
I tried merge(month_lookup, demo_frame, by = "Month_id") but that dropped all the January values because "Jan" doesn't exist in the lookup table:
Month_id Month_name.x Month_name.y Number
1 2 Feb Feb 63
2 2 Feb February 49
3 3 Mar March 9
4 3 Mar Mar 150
My read of How to replace data.frame column names with string in corresponding lookup table in R is that I ought to be able to use plyr::mapvalues but I'm unclear from examples and documentation on how I'd map the id to the name. I don't just want to say "Replace 'March' with 'Mar'" -- I need to say SET month_name = 'Mar' WHERE month_id = 3 for each value in lookup.
I think you want this.
library(dplyr)
demo_frame <- read.table(text=" Month_id Month_name Number
1 Jan 37
2 Feb 63
3 March 9
3 Mar 150
2 February 49", header=TRUE, stringsAsFactors = FALSE)
month_lookup <- read.table(text=" Month_id Month_name
2 Feb
3 Mar", header=TRUE, stringsAsFactors = FALSE)
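result =
demo_frame %>%
# keep the original spelling under a new name so it does not clash with the lookup column
rename(bad_month = Month_name) %>%
# rows whose Month_id is absent from the lookup get Month_name = NA
left_join(month_lookup, by = "Month_id") %>%
# fall back to the original spelling where the lookup had no entry
mutate(month_fix =
Month_name %>%
is.na %>%
ifelse(bad_month, Month_name) )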
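The same "SET name WHERE id" update can also be sketched in base R with match(), which avoids the extra columns. The data frames are rebuilt here so the snippet runs on its own; idx is a hypothetical helper name.

```r
demo_frame <- data.frame(
  Month_id   = c(1, 2, 3, 3, 2),
  Month_name = c("Jan", "Feb", "March", "Mar", "February"),
  Number     = c(37, 63, 9, 150, 49),
  stringsAsFactors = FALSE
)
month_lookup <- data.frame(
  Month_id   = c(2, 3),
  Month_name = c("Feb", "Mar"),
  stringsAsFactors = FALSE
)

# Position of each row's id in the lookup; NA where the id is not listed
idx <- match(demo_frame$Month_id, month_lookup$Month_id)

# Take the lookup name where one exists, otherwise keep the original
demo_frame$Month_name <- ifelse(is.na(idx),
                                demo_frame$Month_name,
                                month_lookup$Month_name[idx])
demo_frame
```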

Read Data into Time Series Object in R

My data looks as follows:
Month/Year;Number
01/2010; 1.0
02/2010;19.0
03/2010; 1.0
...
How can I read this into a ts(object) in R?
Try this (assuming your data is called df)
ts(df$Number, start = c(2010, 01), frequency = 12)
## Jan Feb Mar
## 2010 1 19 1
Edit: this will work only if you don't have missing dates and your data is in the correct order. For a more general solution see @Ananda's answer below.
I would recommend using zoo as a starting point. This will ensure that if there are any month/year combinations missing, they would be handled properly.
Example (notice that data for April is missing):
mydf <- data.frame(Month.Year = c("01/2010", "02/2010", "03/2010", "05/2010"),
Number = c(1, 19, 1, 12))
mydf
# Month.Year Number
# 1 01/2010 1
# 2 02/2010 19
# 3 03/2010 1
# 4 05/2010 12
library(zoo)
as.ts(zoo(mydf$Number, as.yearmon(mydf$Month.Year, "%m/%Y")))
# Jan Feb Mar Apr May
# 2010 1 19 1 NA 12
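If adding zoo is not an option, roughly the same result can be sketched in base R by building the full monthly sequence and matching the observed values onto it. This is an illustration under the assumption that every Month.Year string parses as "%m/%Y"; dates, all_months, and series are made-up names.

```r
mydf <- data.frame(Month.Year = c("01/2010", "02/2010", "03/2010", "05/2010"),
                   Number = c(1, 19, 1, 12))

# Turn "mm/yyyy" into a real date by pinning each month to its first day
dates <- as.Date(paste0("01/", mydf$Month.Year), format = "%d/%m/%Y")

# Every month from the first to the last observation
all_months <- seq(min(dates), max(dates), by = "month")

# Months with no observation have no match and therefore become NA
values <- mydf$Number[match(all_months, dates)]

first <- min(dates)
series <- ts(values,
             start = c(as.integer(format(first, "%Y")),
                       as.integer(format(first, "%m"))),
             frequency = 12)
series
```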
