Unable to change name value in a vector - r

named_vector=c(a=1,b=2,c=3,d=4,e=5,f=6,g=7)
names(named_vector)[names(named_vector)=='c'] <- 'k'
names(named_vector[names(named_vector)])=='c'<-'k'
I'm unable to change the name of member 'c' in named_vector using line 3, although line 2 works fine.
I get this error message:
Error in names(named_vector[names(named_vector)]) == "c" <- "k" :
could not find function "==<-"

You can index by numeric position:
`names(named_vector)[3] <- "new name"`
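If you'd rather not hard-code the position, one option is to look it up with match(). This is only a minimal sketch, assuming the name you want to replace occurs exactly once; the variable pos is just an illustrative name:

```r
named_vector <- c(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7)

# match() returns the position of the first name equal to "c" (NA if absent)
pos <- match("c", names(named_vector))

# Only rename when the name was actually found
if (!is.na(pos)) {
  names(named_vector)[pos] <- "k"
}

names(named_vector)
```

The is.na() check avoids assigning to an NA index when the name is missing.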

Line 3 doesn't work because the expression is nested more deeply than it needs to be. If you break this down
names(named_vector[names(named_vector)]) == 'c' <- 'k'
you get
# Gives you all the names back
names(named_vector)
# [1] "a" "b" "c" "d" "e" "f" "g"
# Putting it back in, you simply get all the values again
names(named_vector[c("a", "b", "c", "d", "e", "f", "g")])
# The inner part simply gives you the `named_vector` again
named_vector[c("a", "b", "c", "d", "e", "f", "g")]
# a b c d e f g
# 1 2 3 4 5 6 7
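On top of that, the left-hand side of the assignment is a comparison, which just evaluates to a logical vector. R cannot assign into that, so it looks for a replacement function called `==<-`, which doesn't exist: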
names(named_vector[names(named_vector)]) == 'c'
# [1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE
So line 2 works because you index the vector's names with a logical test for the label you wish to change, and then assign to that subset of the names.
names(named_vector)[names(named_vector) == 'c'] <- 'k'
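To see the two forms side by side, here is a small sketch that wraps the failing assignment in tryCatch() so the script keeps running. The variable failed is only an illustrative name:

```r
named_vector <- c(a = 1, b = 2, c = 3, d = 4, e = 5, f = 6, g = 7)

# The failing form: the left-hand side is a call to `==`, so R needs a
# replacement function named `==<-`, which does not exist.
failed <- tryCatch(
  {
    names(named_vector[names(named_vector)]) == "c" <- "k"
    FALSE
  },
  error = function(e) TRUE
)

# The working form: subset the names with a logical vector, then assign.
names(named_vector)[names(named_vector) == "c"] <- "k"

failed
names(named_vector)
```

The working form succeeds because both `names<-` and `[<-` are existing replacement functions.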

Related

Revise content of string factor using 'ifelse' isn't working [duplicate]

This question already has answers here:
How can I trim leading and trailing white space?
Kia ora data science community, I'm struggling to get an ifelse statement to work when trying to revise the contents of a data frame factor. I'm working with Trap Types of 5 different types, but two of the trap types aren't being summarized correctly. Here's the summary table of the trap types and number of observations associated with each type:
Trap type                       Observations
DOC150 Double (Fiordland)       107748
DOC150 Single (ATBT)            20260
DOC150 Single (ATBT)            456
DOC200 Double (Run Through)     2324
DOC200 Double (Takaka)          23748
DOC200 Double (ZIP)             2472
DOC200 Single (Takaka)          11258
DOC200 Single (Takaka)          23668
I need DOC150 Single (ATBT) traps to be recognized as the same and summarized as such, with the same being true for DOC200 Single (Takaka). For whatever reason, the trap types are being summarized into individual categories; I suspect that when the information was pulled from the larger dataset that there was something wrong with the spacing of the names.
I've tried using the following code to reclassify one of the errant Trap Types, but to no avail: the categories remain, but the code changes all of the Trap Types from a character factor into a numeric factor and the final tally for each category remains unchanged.
Records2$TrapName<- as.character(ifelse(grepl("Single (Takaka)", Records2$TrapTypeTe), "DOC200 Single (Takaka)", Records2$TrapTypeTe))
Here's the resulting summary table:
1 2 3 4 5 6 7 8
107748 20260 456 2324 23748 2472 11258 23668
I thought I finally understood how to use grepl in ifelse statements, but now I'm stuck. I know how to do this in SAS, but R has thrown me for a loop. Any help would be greatly appreciated. Kia pai to ra, Doug
As mentioned in the comments, the issue was caused by extra whitespace in the column values. You can remove it with trimws, so neither ifelse nor grepl is needed.
Records2$TrapTypeTe <- trimws(Records2$TrapTypeTe)
#Check
table(Records2$TrapTypeTe)
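Since the original data isn't available, here is a small sketch of the same fix on made-up values. The vector trap is a hypothetical stand-in for Records2$TrapTypeTe, and the stray spaces are an assumption about what the real data looks like. As a side note, the numeric labels in the question's second table are most likely because ifelse() does not preserve factors and returns their integer codes.

```r
# Hypothetical stand-in for Records2$TrapTypeTe; two labels carry stray spaces
trap <- factor(c(
  "DOC200 Single (Takaka)",
  "DOC200 Single (Takaka) ",
  "DOC150 Single (ATBT)",
  " DOC150 Single (ATBT)"
))
nlevels(trap)   # the padded labels count as separate levels

# Convert to character, strip leading/trailing whitespace, rebuild the factor
cleaned <- factor(trimws(as.character(trap)))
table(cleaned)
```

Rebuilding the factor after trimming is what merges the duplicate categories.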
Here is an approach using factors. Suppose we accidentally included some lower-case letters in our codes:
x <- c("D", "B", "E", "e", "A", "a", "E", "E", "E", "D", "E", "D",
"d", "A", "A", "b", "D", "D", "B", "C", "e", "b", "D", "d", "D")
table(x)
# x
# a A b B C d D e E
# 1 3 2 2 1 2 7 2 5
x <- factor(x)
levels(x)
# [1] "a" "A" "b" "B" "C" "d" "D" "e" "E"
levels(x) <- c("A", "A", "B", "B", "C", "D", "D", "E", "E")
table(x)
# x
# A B C D E
# 4 4 1 9 7
levels(x)
# [1] "A" "B" "C" "D" "E"
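Assigning a plain character vector to levels(x) relies on knowing the order of the existing levels. A slightly more explicit variant, sketched here on a shorter made-up vector, uses the list form of the levels replacement, where each element names a new level and lists the old levels to merge into it:

```r
x <- factor(c("D", "d", "a", "A", "b", "B", "C"))

# Each list element names a new level and lists the old levels merged into it
levels(x) <- list(A = c("a", "A"), B = c("b", "B"), C = "C", D = c("d", "D"))

levels(x)
table(x)
```

Any old level that is not mentioned in the list would become NA, so it's worth listing every level you want to keep.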

Subsetting in R (Index Explanation)

a <- c("a", "b", "c", "d", "e")
u <- a > "a"
a[u]
The code gives me the output as: "b" "c" "d" "e".
What does a[u] mean? Does vector a now have a new index u of vector type?
u is a logical vector which is used to subset a.
u
#[1] FALSE TRUE TRUE TRUE TRUE
Since only the first element is FALSE, a[u] drops that element and selects every element of a where u is TRUE:
a[u]
#[1] "b" "c" "d" "e"
It will be more clear with another example. Consider
a <- 11:15
u <- c(FALSE, TRUE, TRUE, FALSE, TRUE)
a[u]
#[1] 12 13 15
So all the elements of a where u is TRUE are selected, i.e. 12, 13 and 15.
You can figure this out yourself by looking at the contents of the u vector:
u <- a > "a"
u
[1] FALSE TRUE TRUE TRUE TRUE
When you then subset the vector a using this boolean vector u, you are telling R to output a vector consisting only of the elements for which the corresponding index value is TRUE. This leaves you with just:
[1] "b" "c" "d" "e"
To be more explicit:
"a"  "b" "c" "d" "e"
 F    T   T   T   T
 ^^   |_____________|
drop   keep the rest
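Another way to look at it, as a small sketch: which() converts the logical vector into the positions that are TRUE, and subsetting by those positions gives the same result as subsetting by u directly. The name idx is only illustrative:

```r
a <- c("a", "b", "c", "d", "e")
u <- a > "a"          # logical vector, one entry per element of a

idx <- which(u)       # integer positions where u is TRUE

a[u]                  # subset by logical vector
a[idx]                # subset by position, same elements
identical(a[u], a[idx])
```

So u is not a new index attached to a; it is a separate vector that is only used inside the brackets to choose elements.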

Why are empty levels in my factor tabulated after I assign NAs to missing values?

I have a dataframe df with a column foo containing data of type factor:
df <- data.frame("bar" = c(1:4), "foo" = c("M", "F", "F", "M"))
When I inspect the structure with str(df$foo), I get this:
Factor w/ 3 levels "","F",..: 2 2 2 2 2 2 2 2 2 2 ..
Why does it report 3 levels when there are only 2 in my data?
Edit:
There seems to be a missing value "" that I clean up by assigning it NA.
When I call table(df$foo), it seems to still count the "missing value" level, but finds no occurrences:
  F M
0 2 2
However, when I call df$foo I find it reports only two levels:
Levels: F M
How is it possible that table still counts the empty level, and how can I fix that behaviour?
Check whether your data frame really has no missing values, because it looks as though it does have some. Try this:
# works because factor-levels are integers, internally; "" seems to be level 1
which(as.integer(df$MF) == 1)
# works if your missing value is just ""
which(df$MF == "")
You should then clean up your data frame to properly reflect missing values. A factor will handle NA:
# stringsAsFactors = TRUE is needed from R 4.0.0 on, where strings are no longer converted to factors by default
df <- data.frame("rest" = c(1:5), "sex" = c("M", "F", "F", "M", ""), stringsAsFactors = TRUE)
df$sex[which(as.integer(df$sex) == 1)] <- NA
Once you have cleaned your data, you will have to drop unused levels; otherwise functions such as table will keep counting occurrences of the empty level.
Observe this sequence of steps and its outputs:
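# Build a data frame to reproduce your behaviour
# (stringsAsFactors = TRUE is needed from R 4.0.0 on, where strings
# are no longer converted to factors by default)
> df <- data.frame("Restaurant" = c(1:5), "MF" = c("M", "F", "F", "M", ""), stringsAsFactors = TRUE)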
# notice the empty level "" for the missing value
> levels(df$MF)
[1] "" "F" "M"
# notice how a tabulation counts the empty level;
# this is the first column with a 1 (it has no label because
# there is no label, it is "")
> table(df$MF)
  F M
1 2 2
# find the culprit and change it to NA
> df$MF[which(as.integer(df$MF) == 1)] <- as.factor(NA)
# AHA! So despite us changing the value, the levels of the original
# factor were not updated! I wonder what happens if we tabulate the column...
> levels(df$MF)
[1] "" "F" "M"
# Indeed, the empty level is still present in the factor, but there are
# no occurrences!
> table(df$MF)
  F M
0 2 2
# droplevels to the rescue:
# it is used to drop unused levels from a factor or, more commonly,
# from factors in a data frame.
> df$MF <- droplevels(df$MF)
# factors fixed
> levels(df$MF)
[1] "F" "M"
# tabulation fixed
> table(df$MF)
F M
2 2
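An alternative is to avoid creating the empty level in the first place. As a minimal sketch (assuming you can get at the values as plain character strings before they become a factor, and using the made-up names raw and mf), mark the empty strings as NA and only then build the factor:

```r
# Same toy data as above, kept as plain character strings first
raw <- c("M", "F", "F", "M", "")

# Mark empty strings as missing before creating the factor
raw[raw == ""] <- NA
mf <- factor(raw)

levels(mf)                    # no empty level is created
table(mf, useNA = "ifany")    # the missing value is counted under <NA>
```

With this order of operations no droplevels call is needed, because factor() never sees the empty string.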

Consecutive character matching and extraction with position

I'm trying to write a generic code in R where I look for 2 (or more in the future) explicit characters in a specific order located consecutively in the vector. Every command I am trying will only return a match for the first character.
I have a character string that looks similar to data and I want to extract the positions that have "L" and "V" next to each other only and in that order. So the only matches I have should be positions 3 & 4 and 7 & 8; However, I will get back positions 1, 3, and 7 as a match for L. Is it possible to only return "LV" matches?
Reproducible data to work with:
data <- c("L", "D", "L", "V", "A", "V", "L", "V")
Here are some possibilities:
which(ts(data) == "L" & stats::lag(ts(data)) == "V")
## [1] 3 7
which(head(data, -1) == "L" & tail(data, -1) == "V")
## [1] 3 7
which(apply(t(embed(data, 2)) == c("V", "L"), 2, all))
## [1] 3 7
which(data == "L" & dplyr::lead(data) == "V")
## [1] 3 7
The vector data could first be collapsed into one string with paste. Then we can find the starting positions via gregexpr. After that, we can form a list of the start and finish points by concatenating the result from gregexpr with the adjusted match length attribute.
x <- gregexpr("LV", paste(data, collapse = ""))[[1]]
Map(c, x, x + attr(x, "match.length") - 1)
# [[1]]
# [1] 3 4
#
# [[2]]
# [1] 7 8
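Since the question mentions wanting to handle more than two characters later, here is a rough sketch of a helper that works for a pattern of any length. The function name find_run is made up for illustration and is not from any package; it simply compares every window of the right length against the pattern and returns the starting positions:

```r
# Hypothetical helper: start positions where `pattern` occurs consecutively in `x`
find_run <- function(x, pattern) {
  n <- length(x)
  k <- length(pattern)
  if (k == 0 || k > n) return(integer(0))
  starts <- seq_len(n - k + 1)
  # For each candidate start, check that the whole window equals the pattern
  hits <- vapply(
    starts,
    function(i) all(x[i:(i + k - 1)] == pattern),
    logical(1)
  )
  starts[hits]
}

data <- c("L", "D", "L", "V", "A", "V", "L", "V")
find_run(data, c("L", "V"))
```

For the example data this returns the starting positions 3 and 7; the matching end positions are the starts plus length(pattern) - 1.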

Changing the levels of a pooled DataArray

I'm looking for a way to modify the levels of a DataArray:
result = pool(["a", "a", "b"])
levels(result) = ["A", "B"]
As a quick-and-dirty solution, you can change the pool field of the object -- it happens to be mutable.
result.pool = [ "A", "B" ]
result
# 3-element PooledDataArray{ASCIIString,Uint8,1}:
# "A"
# "A"
# "B"
xdump( result )
# PooledDataArray{ASCIIString,Uint8,1}
# refs: Array(Uint8,(3,)) Uint8[0x01,0x01,0x02]
# pool: Array(ASCIIString,(2,)) ASCIIString["a","b"]
