Conditional mutating of an R data frame based on strings

I am using R and trying to create a new column based on the string information from the existing columns.
My data is like:
risk_code | area
-----------------------------------
DEEP DIGGING ALL | --
CONSTRUCTION PRO | Construction
CLAIMS ONSHORE | --
OFFSHORE CLAIMS | --
And the result I need is:
risk_code | area | area_new
-------------------------------------------------
DEEP DIGGING ALL | -- | Digging
CONSTRUCTION PRO | Construction | Construction
CLAIMS ONSHORE | -- | Onshore
OFFSHORE CLAIMS | -- | Offshore
I understand that I have made several mistakes in the code, but after a whole week of staring at it and searching the internet, I cannot get the result I need.
I appreciate your help.
Thanks in advance.
Occupancy <- read_excel("Occupancy.xlsx")
OccupancyMutated <- mutate(Occupancy, area_new = area)
OccupancyMutated <- as.data.frame(OccupancyMutated)
OccupancyMutated$area_new[Occupancy$area == "--"] <-
{
if (OccupancyMutated$risk_code == %Digging%) {"Digging"}
else if (OccupancyMutated$risk_code == %ONSHORE%) {"Onshore"}
else if (OccupancyMutated$risk_code == %OFFSHORE%) {"Offshore"}
else {"empty"}
}
View(OccupancyMutated)

We can use stringr for this operation. The function word extracts the first word of each string in risk_code, and the function str_to_title converts it to your required format. Both functions are vectorized, so if the keyword is always the first word you can simply do the following. Note that with the sample data above the keyword is not always first, so the result is:
library(stringr)
str_to_title(word(df$risk_code, 1, 1))
#[1] "Deep" "Construction" "Claims" "Offshore"
If it is not always the first word and you need to do it for specific words only, you can do,
str_to_title(str_extract(tolower(df$risk_code), 'digging|offshore|onshore'))
#[1] "Digging" NA "Onshore" "Offshore"

So, this is the answer (thanks to Sotos):
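library(readxl)
library(dplyr)
library(stringr)
Occupancy <- read_excel("Occupancy.xlsx")
OccupancyMutated <- as.data.frame(mutate(Occupancy, area_new = area))
# Fill only the rows where area is "--"; subset both sides so the lengths match.
missing_area <- OccupancyMutated$area == "--"
# The pattern must be lower case because risk_code is lower-cased before matching.
OccupancyMutated$area_new[missing_area] <-
str_to_title(str_extract(tolower(OccupancyMutated$risk_code[missing_area]), 'digging|offshore|onshore'))
View(OccupancyMutated)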
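For anyone without stringr, here is a minimal base R sketch of the same idea, using the sample rows from the question typed in by hand instead of the Excel file. The keyword vector and the lower-case data frame name `occupancy` are assumptions made for illustration. Rows that contain none of the keywords simply keep "--".

```r
# Sample rows from the question, entered by hand (no Excel file needed)
occupancy <- data.frame(
  risk_code = c("DEEP DIGGING ALL", "CONSTRUCTION PRO", "CLAIMS ONSHORE", "OFFSHORE CLAIMS"),
  area = c("--", "Construction", "--", "--"),
  stringsAsFactors = FALSE
)

# Assumed list of keywords to look for in risk_code
keywords <- c("Digging", "Offshore", "Onshore")

# Start from the existing area, then overwrite only the "--" rows
occupancy$area_new <- occupancy$area
for (kw in keywords) {
  # Rows with a missing area whose risk_code contains the keyword (case-insensitive)
  hit <- occupancy$area == "--" & grepl(kw, occupancy$risk_code, ignore.case = TRUE)
  occupancy$area_new[hit] <- kw
}

print(occupancy)
```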

Kusto query calculate 2 metric fields

I'm writing a Kusto query on Azure to get the memory fragmentation value of Redis. This value is obtained by dividing the RSS memory by the used memory. The problem is that I can't do the calculation with these two fields, because each one requires filtering the "Average" column by a different MetricName ("usedmemoryRss" and "usedmemory"). When I apply the filter on the extend line, the query returns no value. The code looks like this:
AzureMetrics
| extend m1 = Average | where MetricName == "usedmemoryRss" and
| extend m2 = Average | where MetricName == "usedmemory"
| extend teste = m1 / m2
When I remove the "where" clause from the lines, it divides the value of each record by itself and returns 1. Is it possible to do this? Thank you in advance for your help.
Thanks for the answer, Justin. You gave me an idea and I solved it this way:
let m1 = AzureMetrics | where MetricName == "usedmemoryRss" | where Average != 0 | project Average;
let m2 = AzureMetrics | where MetricName == "usedmemory" | where Average != 0 | project Average;
print memory_fragmentation=toscalar(m1) / toscalar(m2)
let Average=datatable (MetricName:string, Value:long)
["usedmemoryRss", 10,
"usedmemory", 5];
let m1=Average
| where MetricName =="usedmemoryRss" | project Value;
let m2=Average
| where MetricName =="usedmemory" | project Value;
print teste=toscalar(m1) / toscalar (m2)

Create categorical variables using R based on percentage of black people in given area

I am new to R (and coding in general), so my apologies if I do not use the appropriate terminology.
I want to create 3 categorical variables based on the percentage of black people living in a given area.
For instance:
HighBlack = 1 if a city has over 40% of black residents and 0 otherwise.
MidBlack=1 if a city has between 10% and 40% of black residents and 0 otherwise.
LowBlack=1 if a city has less than 10% black people and 0 otherwise.
I use the following code to create the variable HighBlack. Note that all the cities included have over 40% of black residents.
```pedestrian_stops$High_Black<-ifelse(pedestrian_stops$LOCATION_ZONE == 'Sumner - Glenwood' | 'Willard - Hay' | 'Near - North' | 'Folwell' | 'Phillips West' | 'Cedar Riverside' | 'Webber - Camden' |'Ventura Village' | 'Hawthorne' |'Jordan' | 'Harrison' , 1, 0)```
I get the following error message:
"Error in pedestrian_stops$LOCATION_ZONE == "Sumner - Glenwood" | "Willard - Hay" :
operations are possible only for numeric, logical or complex types"
What to do? I have tried in vain looking at ways to create categorical variables on youtube and other blogs.
Thank you in advance for your help.
Cathy
The problem is the use of the "OR" operator |. It combines logical values, not strings, so you need to apply it to an equality comparison (==) for each of the strings in your list.
i.e.:
pedestrian_stops$High_Black <-
ifelse(pedestrian_stops$LOCATION_ZONE == 'Sumner - Glenwood' | pedestrian_stops$LOCATION_ZONE == 'Willard - Hay' | pedestrian_stops$LOCATION_ZONE == 'Near - North' | pedestrian_stops$LOCATION_ZONE == 'Folwell' | pedestrian_stops$LOCATION_ZONE == 'Phillips West' | pedestrian_stops$LOCATION_ZONE == 'Cedar Riverside' | pedestrian_stops$LOCATION_ZONE == 'Webber - Camden' | pedestrian_stops$LOCATION_ZONE == 'Ventura Village' | pedestrian_stops$LOCATION_ZONE == 'Hawthorne' | pedestrian_stops$LOCATION_ZONE == 'Jordan' | pedestrian_stops$LOCATION_ZONE == 'Harrison' , 1, 0)
As that is cumbersome, I would suggest you use base R's %in% operator with a vector of names (no extra package is needed).
list_of_neighborhoods <- c('Sumner - Glenwood','Willard - Hay','Near - North','Folwell','Phillips West','Cedar Riverside','Webber - Camden','Ventura Village','Hawthorne', 'Jordan','Harrison')
pedestrian_stops$High_Black <-
ifelse(pedestrian_stops$LOCATION_ZONE %in% list_of_neighborhoods, 1, 0)
As Nicolás mentioned, consider putting the names in a vector:
high_percentage <- c('Sumner - Glenwood','Willard - Hay','Near - North','Folwell','Phillips West','Cedar Riverside','Webber - Camden','Ventura Village','Hawthorne', 'Jordan','Harrison')
Then assign to a new variable as follows:
pedestrian_stops$High_Black <- as.integer(pedestrian_stops$LOCATION_ZONE %in% high_percentage)
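If the percentage itself is available as a column, all three indicators from the question can be derived from it directly. The sketch below is only an illustration: the data frame `cities` and the column `pct_black` are made up, and the boundaries follow the definitions in the question (10% and 40% both count as "mid").

```r
# Hypothetical data: `cities` and `pct_black` are assumptions, not from the original post
cities <- data.frame(
  zone = c("A", "B", "C", "D"),
  pct_black = c(55, 25, 5, 40),
  stringsAsFactors = FALSE
)

# A logical comparison converted to integer gives the 1/0 coding
cities$HighBlack <- as.integer(cities$pct_black > 40)
cities$MidBlack  <- as.integer(cities$pct_black >= 10 & cities$pct_black <= 40)
cities$LowBlack  <- as.integer(cities$pct_black < 10)

print(cities)
```

Each row ends up with exactly one of the three indicators set to 1.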

How to separate out letters in a sentence using R

I have a character vector that is a string of letters and punctuation. I want to create a data frame where each column is made up of a letter/character from this string.
e.g.
Character string = I WENT TO THE FAIR
Dataframe = | I | | W | E | N | T | | T | O | | T | H | E | | F | A | I | R |
I thought I could do this using a loop with substr, but I can't work out how to get R to write into separate columns, rather than just writing over the previous letter. I'm new to writing loops etc so struggling a bit to get my head around the way in which to compose what I need.
Thanks for any help and advice that you can offer.
Best wishes,
Natalie
This should get that result
string <- "I WENT TO THE FAIR"
df <- as.data.frame(t(as.data.frame(strsplit(string,""))), row.names = "1")
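A slightly more step-by-step sketch of the same idea, with no loop needed: strsplit with an empty separator splits the string into single characters, and transposing that vector gives one row with one column per character. The names `chars` and `letters_df` are just illustrative.

```r
string <- "I WENT TO THE FAIR"

# strsplit returns a list; [[1]] extracts the character vector for our single string
chars <- strsplit(string, "")[[1]]

# t() turns the vector into a 1-row matrix, which becomes a 1-row data frame
letters_df <- as.data.frame(t(chars), stringsAsFactors = FALSE)

print(letters_df)
```

The columns are named V1, V2, and so on by default, and the spaces are kept as their own columns.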

Replacement and non-matches with 'sub'

Months ago I ended up with a sub statement that originally worked with my input data. It has since stopped working causing me to re-examine my ugly process. I hate to share it but it accomplished several things at once:
active$id[grep("CIR",active$description)] <- sub(".*CIR0*(\\d+).*","\\1",active$description[grep("CIR",active$description)],perl=TRUE)
This statement created a new id column by finding rows that had an id embedded in the description column. The sub statement would find the number following a "CIR0" and populate the id column iff there was an id within a row's description. I recognize it is inefficient with the embedded grep subsetting either side of the assignment.
Is there a way to have a 'sub' replacement be NA or empty if the pattern does not match? I feel like I'm missing something very simple but ask for the community's assistance. Thank you.
Example with the results of creating an id column:
| name | id | description |
|------+-----+-------------------|
| a | 343 | Here is CIR00343 |
| b | | Didn't have it |
| c | 123 | What is CIR0123 |
| d | | CIR lacks a digit |
| e | 452 | CIR452 is next |
I was struggling with the same issue a few weeks ago. I ended up using the str_match function from the stringr package. It returns NA if the target string is not found. Just make sure you subset the result correctly. An example:
library(stringr)
str = "Little_Red_Riding_Hood"
sub(".*(Little).*","\\1",str) # Returns 'Little'
sub(".*(Big).*","\\1",str) # Returns 'Little_Red_Riding_Hood'
str_match(str,".*(Little).*")[1,2] #Returns 'Little'
str_match(str,".*(Big).*")[1,2] # Returns NA
I think in this case you could try using ifelse(), i.e.,
active$id[grep("CIR",active$description)] <- ifelse(match, replacement, "")
where match should evaluate to true if there's a match, and replacement is what that element would be replaced with in that case. Likewise, if match evaluates to false, that element's replaced with an empty string (or NA if you prefer).
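To make the ifelse() idea concrete, here is a minimal base R sketch that uses the example rows from the question. It tests the full pattern with grepl (rather than just "CIR") and falls back to NA when the pattern does not match. The variable names `pattern` and `has_id` are assumptions for illustration.

```r
# Example rows from the question
active <- data.frame(
  name = c("a", "b", "c", "d", "e"),
  description = c("Here is CIR00343", "Didn't have it", "What is CIR0123",
                  "CIR lacks a digit", "CIR452 is next"),
  stringsAsFactors = FALSE
)

pattern <- ".*CIR0*(\\d+).*"

# TRUE only where the whole pattern (CIR followed by digits) matches
has_id <- grepl(pattern, active$description, perl = TRUE)

# Extract the digits where there is a match, NA otherwise
active$id <- ifelse(has_id,
                    sub(pattern, "\\1", active$description, perl = TRUE),
                    NA_character_)

print(active)
```

Because the match test and the replacement use the same pattern, a row such as "CIR lacks a digit" gets NA instead of its unchanged description.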

Code new variable based on grep return in R

I have a variable actor which is a string and contains values like "military forces of guinea-bissau (1989-1992)" and a large range of other different values that are fairly complex. I have been using grep() to find character patterns that match different types of actors. For example I would like to code a new variable actor_type as 1 when actor contains "military forces of", doesn't contain "mutiny of", and the string variable country is also contained in the variable actor.
I am at a loss as to how to conditionally create this new variable without resorting to some type of horrible for loop. Help me!
Data looks roughly like this:
| | actor | country |
|---+----------------------------------------------------+-----------------|
| 1 | "military forces of guinea-bissau" | "guinea-bissau" |
| 2 | "mutiny of military forces of guinea-bissau" | "guinea-bissau" |
| 3 | "unidentified armed group (guinea-bissau)" | "guinea-bissau" |
| 4 | "mfdc: movement of democratic forces of casamance" | "guinea-bissau" |
If your data is in a data.frame df:
> ifelse(!grepl('mutiny of' , df$actor) & grepl('military forces of',df$actor) & apply(df,1,function(x) grepl(x[2],x[1])),1,0)
[1] 1 0 0 0
grepl returns a logical vector, and the result can be assigned to a new column, e.g. df$actor_type.
Breaking that apart:
!grepl('mutiny of', df$actor) and grepl('military forces of', df$actor) satisfy your first two requirements. The last piece, apply(df,1,function(x) grepl(x[2],x[1])), goes row by row and greps for country in actor.
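As a possible variation, the row-wise check can also be written with mapply, which pairs each country with its own actor and does not depend on column positions. This is only a sketch on the sample rows from the question; fixed = TRUE is an added assumption so that country names are matched literally rather than as regular expressions.

```r
# Sample rows from the question
df <- data.frame(
  actor = c("military forces of guinea-bissau",
            "mutiny of military forces of guinea-bissau",
            "unidentified armed group (guinea-bissau)",
            "mfdc: movement of democratic forces of casamance"),
  country = rep("guinea-bissau", 4),
  stringsAsFactors = FALSE
)

# For each row, is the country string contained in the actor string?
country_in_actor <- mapply(grepl, df$country, df$actor,
                           MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)

# Combine the three conditions and store the result as 1/0
df$actor_type <- as.integer(
  grepl("military forces of", df$actor, fixed = TRUE) &
    !grepl("mutiny of", df$actor, fixed = TRUE) &
    country_in_actor
)

print(df)
```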
