I am getting an error message when I am attempting to change the name of a data element in a column. This is the structure of the data frame I am using.
'data.frame': 2070 obs. of 7 variables:
$ Period: Factor w/ 188 levels "1 v 1","10 v 10",..: 158 158 158 158 158 158 158 158 158 158 ...
$ Dist : num 7548 7421 9891 8769 10575 ...
$ HIR : num 2676 2286 3299 2455 3465 ...
$ V6 : num 66.2 18.5 81 40 275.1 ...
$ Date : Factor w/ 107 levels "1/3/17","1/4/17",..: 38 38 38 38 38 38 38 38 38 38 ...
$ Type : Factor w/ 28 levels "Captain's Run",..: 5 5 5 5 5 5 5 5 5 5 ...
$ Day : Factor w/ 8 levels "Friday","Monday",..: 1 1 1 1 1 1 1 1 1 1 ...
I wish to change the value Main Session in db$Type to Main Training so I can match this data frame to another I'm using. I'm using the code below to try and do this.
class(db$Type)
db$Type <- as.character(db$Type)
db$Type["Main Session"] = "Main Training"
I am getting this error message when I attempt to run the piece of code.
db$Type["Main Session"] = "Main Training"
Error in `$<-.data.frame`(`*tmp*`, Type, value = c("Main Session", "Main Session", :
replacement has 2071 rows, data has 2070
Being relatively new to R, is there anything I am missing in my code that could resolve this issue? Any suggestions will be greatly appreciated. Thank you.
The error you are encountering is in relation to your subset operation: db$Type["Main Session"] = "Main Training".
Using the iris dataset in R we can reproduce this error:
str(iris)
#> 'data.frame': 150 obs. of 5 variables:
#> $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
class(iris$Species)
#> [1] "factor"
iris$Species<- as.character(iris$Species)
iris$Species["setosa"] <- "new name"
#> Error in `$<-.data.frame`(`*tmp*`, Species, value = structure(c("setosa", : replacement has 151 rows, data has 150
Created on 2018-09-03 by the reprex package (v0.2.0).
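Indexing with a string, as in db$Type["Main Session"], selects the element named "Main Session", not the elements whose value is "Main Session". No element has that name, so R appends a new one, which is why the replacement has one row more than the data. Inside the square brackets you need to subset the vector with a logical condition (i.e. one that evaluates to TRUE or FALSE for each element):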
str(iris)
#> 'data.frame': 150 obs. of 5 variables:
#> $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
iris$Species<- as.character(iris$Species)
unique(iris$Species)
#> [1] "setosa" "versicolor" "virginica"
iris$Species[iris$Species == "setosa"] <- "new name"
unique(iris$Species)
#> [1] "new name" "versicolor" "virginica"
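If the column should stay a factor, renaming the level itself is an alternative to the as.character route shown above. The sketch below first shows how indexing by a non-existent name lengthens a vector, then renames a factor level in place. It uses a toy vector with invented values standing in for db$Type.

```r
# Toy character vector standing in for db$Type (values invented)
type <- c("Main Session", "Captain's Run", "Main Session")

# Indexing by a name that does not exist appends a new, named element
appended <- type
appended["Main Session"] <- "Main Training"
length(appended)  # one longer than length(type)

# Alternative: keep the column as a factor and rename the level in place
type_f <- factor(type)
levels(type_f)[levels(type_f) == "Main Session"] <- "Main Training"
levels(type_f)
```

Renaming the level changes every matching observation at once and avoids the round trip through character.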
I have a list of dataframes (datalist) and I have created wrapped plots with geom_line in ggplot however I am struggling to assign to each plot its title (BA0, OG0, ON0, etc.).
List of 27
$ BA0 :'data.frame': 14587 obs. of 2 variables:
..$ V1 : int [1:14587] 1 2 1 1 2 1 2 1 1 1 ...
..$ V2 : int [1:14587] 43 45 46 48 49 53 55 56 57 58 ...
$ OG0 :'data.frame': 7925 obs. of 2 variables:
..$ V1 : int [1:7925] 1 1 1 1 1 2 5 7 4 10 ...
..$ V2 : int [1:7925] 43 53 84 88 90 91 92 93 94 95 ...
$ ON0 :'data.frame': 8347 obs. of 2 variables:
..$ V1 : int [1:8347] 1 2 10 6 6 3 11 6 6 6 ...
..$ V2 : int [1:8347] 96 97 98 99 100 101 102 103 104 105 ...
Here's the code
names(datalist) <- c("BA0", "OG0", "ON0", ...)
graph<-lapply(datalist,function(x)
p<-ggplot(x,aes(x= V2,y= V1)) +
geom_line(colour = "blue") +
labs(x = "read length", y="occurrences") +
scale_x_continuous(n.breaks =10) +
scale_y_continuous(n.breaks = 8)
)
wrap_plots(graph)
I have tried to add to p the options:
ggtitle(datalist$file) ## here I added an extra column to each df corresponding to the names: BA0, OG0, ON0 etc..
ggtitle(names(datalist))
labs(title = names(datalist))
But every plot ends up with the same title, BA0, which is the first element of the list.
How can I add the correct titles? Or maybe there is a better way to create this series of plots?
Thanks
You can enframe the list of data frames into a table and create a new column title using the function dplyr::mutate. Then you can use purrr::pmap to map the elements of the columns to a formula-style function that creates the plot. Lastly, the plot column can be fed into patchwork::wrap_plots:
library(tidyverse)
library(patchwork)
data <-
list(
iris1 = iris,
iris2 = iris
)
str(data)
#> List of 2
#> $ iris1:'data.frame': 150 obs. of 5 variables:
#> ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ iris2:'data.frame': 150 obs. of 5 variables:
#> ..$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#> ..$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#> ..$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#> ..$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#> ..$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
data %>%
enframe() %>%
mutate(
title = paste0("Fancy title ", name),
plt = list(title, value) %>% pmap(~ {
.y %>%
ggplot(aes(Sepal.Length, Sepal.Width)) +
geom_line() +
labs(title = .x)
})
) %>%
pull(plt) %>%
wrap_plots()
Created on 2022-05-25 by the reprex package (v2.0.0)
Alternatively, one can also create an explicit function for plotting:
my_plot <- function(title, data) {
data %>%
ggplot(aes(Sepal.Length, Sepal.Width)) +
geom_line() +
labs(title = title)
}
data %>%
enframe() %>%
mutate(
title = paste0("Fancy title ", name),
plt = list(title, value) %>% pmap(my_plot)
)
#> # A tibble: 2 × 4
#> name value title plt
#> <chr> <list> <chr> <list>
#> 1 iris1 <df [150 × 5]> Fancy title iris1 <gg>
#> 2 iris2 <df [150 × 5]> Fancy title iris2 <gg>
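The attempts in the question pass the whole names(datalist) vector to every plot, so each plot receives the same title. A lighter alternative to the enframe approach is to iterate over the data frames and their names in parallel with base R's Map. This is a minimal sketch, assuming ggplot2 is installed; the toy datalist and its values are invented for illustration.

```r
library(ggplot2)

# Toy stand-in for datalist (values invented for illustration)
datalist <- list(
  BA0 = data.frame(V1 = c(1, 2, 1), V2 = c(43, 45, 46)),
  OG0 = data.frame(V1 = c(1, 1, 2), V2 = c(43, 53, 84))
)

# Map() calls the function once per element, pairing each data frame
# with its own name, so every plot gets a single title string
graph <- Map(function(df, nm) {
  ggplot(df, aes(x = V2, y = V1)) +
    geom_line(colour = "blue") +
    labs(title = nm, x = "read length", y = "occurrences")
}, datalist, names(datalist))

# patchwork::wrap_plots(graph) would then arrange the titled plots
```

Map keeps the names of its first list argument, so graph is still indexed by BA0, OG0 and so on.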
Using facet_wrap might be suitable. You can try
library(data.table)
library(ggplot2)
#from list to dataframe, adding name of list item as column
dt <- rbindlist(datalist, idcol = TRUE)
#plot using facet_wrap
ggplot(dt,aes(x= V2,y= V1)) +
geom_line(colour = "blue") +
labs(x = "read length", y="occurrences") +
scale_x_continuous(n.breaks =10) +
scale_y_continuous(n.breaks = 8) +
facet_wrap(~.id)
I have an issue similar to this thread:
Search across multiple columns with a regular expression and extract the match
Let's take the iris dataset as an example. I would like to filter data based on values in several columns: let's say >=4 in columns whose names end with ".Length". My real data are much more complex than this reprex, which is why I want to use a regular expression to select columns rather than pick them one by one by their indices.
Tried multiple ways, including the following:
filtered <- iris %>% dplyr::filter(across(matches('.Length')>=4))
to no avail. Please help.
Using dplyr::if_all you could do:
library(dplyr)
iris1 <- iris %>%
filter(if_all(matches(".Length"), ~ .x >= 4))
str(iris1)
#> 'data.frame': 89 obs. of 5 variables:
#> $ Sepal.Length: num 7 6.4 6.9 5.5 6.5 5.7 6.3 6.6 5.9 6 ...
#> $ Sepal.Width : num 3.2 3.2 3.1 2.3 2.8 2.8 3.3 2.9 3 2.2 ...
#> $ Petal.Length: num 4.7 4.5 4.9 4 4.6 4.5 4.7 4.6 4.2 4 ...
#> $ Petal.Width : num 1.4 1.5 1.5 1.3 1.5 1.3 1.6 1.3 1.5 1 ...
#> $ Species : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2 ...
dplyr::if_all and if_any were introduced for exactly this situation, for use in conjunction with filter (https://www.tidyverse.org/blog/2021/02/dplyr-1-0-4-if-any/). across also works here, although later dplyr releases deprecate across inside filter in favour of if_all and if_any.
Its signature is:
across(.cols = everything(), .fns = NULL, ..., .names = NULL)
where the .fns argument can be a purrr-style lambda, e.g. ~ . >= 4:
library(dplyr)
iris1 <- iris %>%
  filter(across(ends_with('.Length'), ~ . >= 4))
> str(iris1)
'data.frame': 89 obs. of 5 variables:
$ Sepal.Length: num 7 6.4 6.9 5.5 6.5 5.7 6.3 6.6 5.9 6 ...
$ Sepal.Width : num 3.2 3.2 3.1 2.3 2.8 2.8 3.3 2.9 3 2.2 ...
$ Petal.Length: num 4.7 4.5 4.9 4 4.6 4.5 4.7 4.6 4.2 4 ...
$ Petal.Width : num 1.4 1.5 1.5 1.3 1.5 1.3 1.6 1.3 1.5 1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2 ...
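One detail worth noting: in a regular expression an unescaped dot matches any character, so matches(".Length") would also select a column such as "XLength". Escaping the dot and anchoring the pattern at the end is stricter. The same filter can also be written without dplyr. This is a base R sketch, not something proposed in the thread.

```r
# Columns whose names end in ".Length"; the dot is escaped and the
# pattern is anchored so names merely containing "Length" do not match
cols <- grep("\\.Length$", names(iris), value = TRUE)

# Keep rows where every selected column is >= 4
keep <- rowSums(iris[cols] >= 4) == length(cols)
iris_base <- iris[keep, ]
nrow(iris_base)
```

On the built-in iris data this should keep the same 89 rows as the if_all call. Replacing `== length(cols)` with `> 0` gives the if_any behaviour (at least one column meets the condition).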
I have a list of dataframes; each dataframe has the same column names and the same number of columns (2):
> str(read_counts_list)
List of 12
$ AM1:'data.frame': 1978 obs. of 2 variables:
..$ miRNA : chr [1:1978] "let-7a-1-3p" "let-7a-2-3p" "let-7a-5p" "let-7b-3p" ...
..$ read_count: num [1:1978] 1383 0 396731 40 5889 ...
$ AM2:'data.frame': 1978 obs. of 2 variables:
..$ miRNA : chr [1:1978] "let-7a-1-3p" "let-7a-2-3p" "let-7a-5p" "let-7b-3p" ...
..$ read_count: num [1:1978] 930 0 293379 24 3051 ...
$ AM3:'data.frame': 1978 obs. of 2 variables:
..$ miRNA : chr [1:1978] "let-7a-1-3p" "let-7a-2-3p" "let-7a-5p" "let-7b-3p" ...
..$ read_count: num [1:1978] 1321 0 408655 23 2353 ...
and so on up to 12 data frames.
What I want to do now is to change the name of the second column in the data frames ("read_count") to be the name of the data frame.
So something looking like this:
List of 12
$ AM1:'data.frame': 1978 obs. of 2 variables:
..$ miRNA : chr [1:1978] "let-7a-1-3p" "let-7a-2-3p" "let-7a-5p" "let-7b-3p" ...
..$ AM1: num [1:1978] 1383 0 396731 40 5889 ...
$ AM2:'data.frame': 1978 obs. of 2 variables:
..$ miRNA : chr [1:1978] "let-7a-1-3p" "let-7a-2-3p" "let-7a-5p" "let-7b-3p" ...
..$ AM2: num [1:1978] 930 0 293379 24 3051 ...
$ AM3:'data.frame': 1978 obs. of 2 variables:
..$ miRNA : chr [1:1978] "let-7a-1-3p" "let-7a-2-3p" "let-7a-5p" "let-7b-3p" ...
..$ AM3: num [1:1978] 1321 0 408655 23 2353 ...
Of course, the idea is not to do it manually with something like <-c("name1","name2"); I have several data frames and I will add more later.
What I have tried so far:
read_counts_list_t <- lapply(read_counts_list,function(x) colnames(x)[2] <- names(read_counts_list["x"]))
read_counts_list_t <- lapply(read_counts_list,function(x) colnames(x)[2] <- names(read_counts_list)["x"])
read_counts_list_t <- lapply(read_counts_list,function(x) colnames(x)[2] <- names(read_counts_list[x])) #invalid subscript type
read_counts_list_t <- lapply(read_counts_list,function(x) colnames(x)[2] <- names(read_counts_list)[x])
read_counts_list_t <- lapply(read_counts_list,function(x) colnames(x)[2] <- names(read_counts_list[[x]])) #invalid subscript type
read_counts_list_t <- lapply(read_counts_list,function(x) colnames(x)[2] <- deparse1(substitute(x)))
read_counts_list_t <- lapply(read_counts_list,function(x) colnames(x)[2] <- deparse(quote(x)))
read_counts_list_t <- lapply(read_counts_list,function(x) colnames(x)[2] <- deparse(substitute(read_counts_list["x"])))
read_counts_list_t <- lapply(read_counts_list,function(x) colnames(x)[2] <- paste0(names(read_counts_list["x"])))
All these options either give a strange list where I lose all my data or give an error.
Reading here I found some code that changes the name of the column, but the problem is that it drops the names of the data frames:
read_counts_list_t <- lapply(names(read_counts_list),function(i){
x <- read_counts_list[[i]]
#set 2nd column to a new name
names(x)[2] <- i
#return
x})
> str(read_counts_list_t)
List of 12
$ :'data.frame': 1978 obs. of 2 variables:
..$ miRNA: chr [1:1978] "let-7a-1-3p" "let-7a-2-3p" "let-7a-5p" "let-7b-3p" ...
..$ AM1 : num [1:1978] 1383 0 396731 40 5889 ...
$ :'data.frame': 1978 obs. of 2 variables:
..$ miRNA: chr [1:1978] "let-7a-1-3p" "let-7a-2-3p" "let-7a-5p" "let-7b-3p" ...
..$ AM2 : num [1:1978] 930 0 293379 24 3051 ...
code by: zx8754
Then I found something that worked, but I really did not understand the code; I would not be able to reproduce it, e.g. with a different column or a slightly different scenario:
read_counts_list_t <- Map(
function(x,n) setNames(x,c(names(x)[1],n)),
read_counts_list,names(read_counts_list)
)
code by: Axeman
If someone knows a way of doing this with simple apply, colnames, or names functions, that would be great :D Or could you explain what the last piece of code is doing? (Yes, I looked at ?Map, but I was even more lost after that.)
You could do that with a loop like this one, which assigns each list element's name to the second column of the corresponding data frame:
#Loop
for(i in seq_along(read_counts_list))
{
  names(read_counts_list[[i]])[2] <- names(read_counts_list)[i]
}
An example:
#Data
LS <- split(iris,iris$Species)
#Loop
for(i in seq_along(LS))
{
  names(LS[[i]])[5] <- names(LS)[i]
}
Output (Only first element):
LS[[1]]
Sepal.Length Sepal.Width Petal.Length Petal.Width setosa
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
7 4.6 3.4 1.4 0.3 setosa
8 5.0 3.4 1.5 0.2 setosa
9 4.4 2.9 1.4 0.2 setosa
10 4.9 3.1 1.5 0.1 setosa
11 5.4 3.7 1.5 0.2 setosa
12 4.8 3.4 1.6 0.2 setosa
13 4.8 3.0 1.4 0.1 setosa
14 4.3 3.0 1.1 0.1 setosa
15 5.8 4.0 1.2 0.2 setosa
16 5.7 4.4 1.5 0.4 setosa
17 5.4 3.9 1.3 0.4 setosa
18 5.1 3.5 1.4 0.3 setosa
19 5.7 3.8 1.7 0.3 setosa
20 5.1 3.8 1.5 0.3 setosa
21 5.4 3.4 1.7 0.2 setosa
22 5.1 3.7 1.5 0.4 setosa
23 4.6 3.6 1.0 0.2 setosa
24 5.1 3.3 1.7 0.5 setosa
25 4.8 3.4 1.9 0.2 setosa
26 5.0 3.0 1.6 0.2 setosa
27 5.0 3.4 1.6 0.4 setosa
28 5.2 3.5 1.5 0.2 setosa
29 5.2 3.4 1.4 0.2 setosa
30 4.7 3.2 1.6 0.2 setosa
31 4.8 3.1 1.6 0.2 setosa
32 5.4 3.4 1.5 0.4 setosa
33 5.2 4.1 1.5 0.1 setosa
34 5.5 4.2 1.4 0.2 setosa
35 4.9 3.1 1.5 0.2 setosa
36 5.0 3.2 1.2 0.2 setosa
37 5.5 3.5 1.3 0.2 setosa
38 4.9 3.6 1.4 0.1 setosa
39 4.4 3.0 1.3 0.2 setosa
40 5.1 3.4 1.5 0.2 setosa
41 5.0 3.5 1.3 0.3 setosa
42 4.5 2.3 1.3 0.3 setosa
43 4.4 3.2 1.3 0.2 setosa
44 5.0 3.5 1.6 0.6 setosa
45 5.1 3.8 1.9 0.4 setosa
46 4.8 3.0 1.4 0.3 setosa
47 5.1 3.8 1.6 0.2 setosa
48 4.6 3.2 1.4 0.2 setosa
49 5.3 3.7 1.5 0.2 setosa
50 5.0 3.3 1.4 0.2 setosa
You may use setNames with Map.
LIST <- Map(setNames, LIST, as.data.frame(t(cbind(sapply(LIST, names)[1,], names(LIST)))))
Result
str(LIST)
# List of 3
# $ AM1:'data.frame': 5 obs. of 2 variables:
# ..$ miRNA: num [1:5] 0.7111 -0.9337 -0.0507 -0.4526 1.4833
# ..$ AM1 : num [1:5] 1.382 -0.5125 -0.0438 -1.091 0.8535
# $ AM2:'data.frame': 5 obs. of 2 variables:
# ..$ miRNA: num [1:5] 0.563 1.256 -1.104 0.367 -0.516
# ..$ AM2 : num [1:5] 0.914 1.308 -0.839 0.403 -1.091
# $ AM3:'data.frame': 5 obs. of 2 variables:
# ..$ miRNA: num [1:5] 0.548 -1.377 2.179 2.264 0.892
# ..$ AM3 : num [1:5] -0.0564 0.6623 -0.7863 -1.5744 0.3109
Data:
LIST <- list(AM1 = structure(list(miRNA = c(-2.51829139109263, -0.877872477016629,
-0.25969747064056, 1.22571401548266, 0.938000291163749), read_count = c(0.766054639939597,
-0.748508051788698, 1.2388957678652, -0.169632288961075, 1.60331976703024
)), class = "data.frame", row.names = c(NA, -5L)), AM2 = structure(list(
miRNA = c(-1.38505365651707, -0.354183187247905, -0.00163202119006995,
0.596080469170588, -0.480439453674378), read_count = c(0.46622524539987,
-2.06658132516899, -0.448024554783029, -0.371688827763805,
0.214663638296237)), class = "data.frame", row.names = c(NA,
-5L)), AM3 = structure(list(miRNA = c(-0.64657040576356, -1.17865215943876,
0.181937607228803, 0.186619326022144, -1.26531611982735), read_count = c(-1.20975273029628,
-0.256061901740592, -0.036373286788934, 0.988560967485261, 0.422093433323588
)), class = "data.frame", row.names = c(NA, -5L)))
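For readers wondering what the Map call quoted in the question does: Map walks over two inputs in parallel and calls the function once per pair, here one data frame together with its own name. The lapply attempts in the question fail for two reasons. lapply passes only the data frame, never its name, and a function whose last expression is an assignment returns the assigned value (the new column name) instead of the modified data frame. Below is a minimal sketch with toy data; rename_second is a hypothetical helper name and the values are invented.

```r
# Toy stand-in for read_counts_list (values invented for illustration)
read_counts_list <- list(
  AM1 = data.frame(miRNA = c("let-7a-5p", "let-7b-3p"), read_count = c(10, 20)),
  AM2 = data.frame(miRNA = c("let-7a-5p", "let-7b-3p"), read_count = c(30, 40))
)

# Hypothetical helper: rename the second column, then return the
# whole data frame (returning df is the step the lapply attempts missed)
rename_second <- function(df, nm) {
  names(df)[2] <- nm
  df
}

# First call: rename_second(read_counts_list$AM1, "AM1"), and so on
renamed <- Map(rename_second, read_counts_list, names(read_counts_list))
str(renamed)
```

Because Map keeps the names of its first list argument, the result is still named AM1, AM2, and so on, unlike the lapply(names(...)) version quoted in the question. To rename a different column, change the index 2 inside the helper.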
In the dataset below the variable Region before subsetting has the following structure:
> levels(corona$Region)
[1] " Montreal, QC"
[2] "Alabama"
[3] "Alameda County, CA"
[4] "Alaska"
[5] "Alberta"
[6] "American Samoa"
[7] "Anhui" ...
including both United States states as well as counties, and cities, etc.
I want to subset just the states in the United States running the code:
require(RCurl)
require(foreign)
require(tidyverse)
corona = read.csv("https://coviddata.github.io/covid-api/v1/regions/cases.csv", sep =",",header = T)
cor <- corona[corona$Country=="United States" & corona$Region %in% state.name,]
which works, in a way, but somehow keeps the original levels for Region:
> levels(cor$Region)
[1] " Montreal, QC"
[2] "Alabama"
[3] "Alameda County, CA"
[4] "Alaska"
[5] "Alberta"
[6] "American Samoa"
[7] "Anhui"
[8] "Arizona"
[9] "Arkansas"
[10] "Aruba" ...
as though the subsetting never happened. How can I keep only the levels subsetted (the states)?
You can try
cor <- droplevels(cor)
Here is an example using the iris dataset:
ir <- subset(iris, Species != "setosa")
> str(ir)
'data.frame': 100 obs. of 5 variables:
$ Sepal.Length: num 7 6.4 6.9 5.5 6.5 5.7 6.3 4.9 6.6 5.2 ...
$ Sepal.Width : num 3.2 3.2 3.1 2.3 2.8 2.8 3.3 2.4 2.9 2.7 ...
$ Petal.Length: num 4.7 4.5 4.9 4 4.6 4.5 4.7 3.3 4.6 3.9 ...
$ Petal.Width : num 1.4 1.5 1.5 1.3 1.5 1.3 1.6 1 1.3 1.4 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 2 2 2 2 2 2 2 2 2 2 ...
Although we removed one level of Species, str still reports 3 factor levels. But if you run:
ir <- droplevels(ir)
> str(ir)
'data.frame': 100 obs. of 5 variables:
$ Sepal.Length: num 7 6.4 6.9 5.5 6.5 5.7 6.3 4.9 6.6 5.2 ...
$ Sepal.Width : num 3.2 3.2 3.1 2.3 2.8 2.8 3.3 2.4 2.9 2.7 ...
$ Petal.Length: num 4.7 4.5 4.9 4 4.6 4.5 4.7 3.3 4.6 3.9 ...
$ Petal.Width : num 1.4 1.5 1.5 1.3 1.5 1.3 1.6 1 1.3 1.4 ...
$ Species : Factor w/ 2 levels "versicolor","virginica": 1 1 1 1 1 1 1 1 1 1 ...
You can see that Species now has 2 factor levels instead of 3.
Does this answer your question?
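droplevels also works on a single factor, which is useful when only one column should be cleaned up. Re-creating the factor with factor() has the same effect. A short sketch on the same iris subset:

```r
ir <- subset(iris, Species != "setosa")

# Drop unused levels from one column only
ir$Species <- droplevels(ir$Species)
levels(ir$Species)

# Re-creating the factor from its current values has the same effect
ir2 <- subset(iris, Species != "setosa")
ir2$Species <- factor(ir2$Species)
nlevels(ir2$Species)
```

For the data in the question this would correspond to cor$Region <- droplevels(cor$Region). Note that since R 4.0.0 read.csv no longer converts strings to factors by default, so on a current R version Region would be read as character and carry no levels at all.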
I am facing the following issues:
I want to replace all NA's of a certain categorical variable with "Unknown", however it does not work.
Here's the code:
x <- "Unknown"
kd$form_of_address[which(is.na(kd$form_of_address))] <- x
The problem arises when I perform
levels(kd$form_of_address)
Sadly, my output does not include "Unknown".
My data also includes ebooks, whose weight is always 0. How can I replace the NAs in the variable weight with 0 for the rows where ebook_count > 0?
Thank you in advance :)
I assume your variable is a factor, which does not let you assign a value that is not among its levels.
Using iris, let's see what may have happened and how it can be solved.
str(iris)
'data.frame': 150 obs. of 5 variables:
$ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
We can see that the Species variable is a factor.
We can put some NAs into it by:
iris[c(11:13),5] = NA
iris[c(11:15), ]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
11 5.4 3.7 1.5 0.2 <NA>
12 4.8 3.4 1.6 0.2 <NA>
13 4.8 3.0 1.4 0.1 <NA>
14 4.3 3.0 1.1 0.1 setosa
15 5.8 4.0 1.2 0.2 setosa
Now, if I try to fill those NAs with "Unknown" using your code:
x = "Unknown"
iris$Species[which(is.na(iris$Species))] = x
which will generate:
Warning message:
In `[<-.factor`(`*tmp*`, which(is.na(iris$Species)), value = c(1L, :
  invalid factor level, NA generated
What you can do is first add the new level to your variable and then make the assignment:
levels(iris$Species) = c(levels(iris$Species), "Unknown")
levels(iris$Species)
[1] "setosa" "versicolor" "virginica" "Unknown"
#You can see now Unknown is one of the levels
iris$Species[which(is.na(iris$Species))] = "Unknown"
table(iris$Species)
setosa versicolor virginica Unknown
47 50 50 3
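The second part of the question (setting missing weight to 0 for ebooks) can be handled with a logical index that combines both conditions. This is a minimal sketch: the column names weight and ebook_count come from the question, while the toy kd data frame and its values are invented for illustration.

```r
# Toy stand-in for kd (values invented for illustration)
kd <- data.frame(
  weight      = c(NA, 250, NA, NA),
  ebook_count = c(1, 0, 0, 2)
)

# Rows where weight is missing and at least one ebook was ordered;
# the !is.na() guard keeps a missing ebook_count from producing an NA index
fill <- is.na(kd$weight) & !is.na(kd$ebook_count) & kd$ebook_count > 0

kd$weight[fill] <- 0
kd
```

Rows with a missing weight but no ebooks (the third row here) are left as NA, since nothing in the question says what their weight should be.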