Let's say I have a matrix holding an inventory of car models, owners, etc. The first column of the matrix is car_model. I want to assign all Mercedes Benz cars to a variable "mercedes". The issue is that some car model names in the list are written as "Mercedes Benz" and others as "Benz Mercedes". How do I assign both spelling orders to "mercedes"? I know how to assign either of them:
mercedes = which(car_model=="Mercedes Benz")
but how do I assign both? I have tried
mercedes = which(car_model=="Mercedes Benz") + which(car_model=="Benz Mercedes")
but that is not correct
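The `+` attempt fails because it adds the two index vectors element by element instead of combining them. A minimal sketch of one way around this, assuming car_model is a character vector (the sample values below are made up for illustration):

```r
# Hypothetical sample data standing in for the first column of the matrix
car_model <- c("Mercedes Benz", "Audi", "Benz Mercedes", "BMW", "Mercedes Benz")

# %in% tests each element against a set of accepted spellings
mercedes <- which(car_model %in% c("Mercedes Benz", "Benz Mercedes"))

# Equivalent form: combine the two comparisons with a logical OR
mercedes_alt <- which(car_model == "Mercedes Benz" | car_model == "Benz Mercedes")

print(mercedes)
```

Both forms return the positions of every row that uses either spelling.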
Related
I have a variable within a dataframe I want to quality control. The variable lists locations (character). I have another dataframe which includes all the alternate names for the same location. I want to get a true/false whether there is a match or not between one variable in my data frame, and all the alternate names in a separate dataframe. Is there a way to do this?
For example, the variable in my dataframe that I want to quality control is called FishingGround:
FishingGround
Lobster Bay
Deep Cove
Whale Head
Then my other dataframe has all the different possible names for the same location. So I want to create a for loop that goes through each observation in my FishingGround variable and checks whether it matches one of the listed alternate names.
I like to do this as a look-up table. You can use the acceptable names as the names of entries and just look them up. If the name is not on the list, you will get NA as the result. Example:
FishingGround = c("Lobster Bay", "Deep Cove", "Whale Head")
AcceptableNames = c("Lobster Bay", "Lobster Claw",
"Deep Cove", "Shallow Cove", "Whale Tail")
names(AcceptableNames) = AcceptableNames
AcceptableNames[FishingGround]
  Lobster Bay     Deep Cove          <NA>
"Lobster Bay"   "Deep Cove"            NA
The NAs correspond to unacceptable entries:
## Unacceptable names
FishingGround[which(is.na(AcceptableNames[FishingGround]))]
[1] "Whale Head"
## Acceptable names
FishingGround[which(!is.na(AcceptableNames[FishingGround]))]
[1] "Lobster Bay" "Deep Cove"
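If a plain TRUE/FALSE per observation is all that is needed, `%in%` gives that directly without a loop. This is only a sketch: the data frame alt_names and its column name are hypothetical stand-ins for the dataframe of alternate names described in the question.

```r
FishingGround <- c("Lobster Bay", "Deep Cove", "Whale Head")

# Hypothetical data frame of accepted names (the column name is an assumption)
alt_names <- data.frame(
  name = c("Lobster Bay", "Lobster Claw", "Deep Cove", "Shallow Cove", "Whale Tail"),
  stringsAsFactors = FALSE
)

# One logical value per observation: TRUE if the name is on the list
is_known <- FishingGround %in% alt_names$name

# Observations that failed the check
FishingGround[!is_known]
```

The logical vector can be stored as a new column if the result needs to stay next to the original data.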
Hi, I have three input controls: Cars, Bikes and Vehicles. Cars contains the list Hyundai, Maruthi, Audi; Bikes contains Honda, Suzuki, Bajaj; and the Vehicles input control contains Cars and Bikes as a dropdown.
Scenario 1: Hyundai is selected in Cars, Honda is selected in Bikes, and Car is selected in Vehicles.
The report should run by Vehicles and list all the dropdown values related to cars, and vice versa when Bike is selected in Vehicles.
I have created three parameters: $P{Cars}, $P{Bikes} and $P{Vehicles}.
When I use $P{cars==Hyundai}&&$P{Bike==Honda} and $P{Vechiles==Car}, the expression does not work.
(Please excuse my English.)
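You are using the parameter syntax incorrectly. When you write $P{cars==Hyundai} you are not passing an expression; you are referencing a parameter named "cars==Hyundai".
The way to go is
$P{cars} == "Hyundai" or $P{cars}.equals("Hyundai")
If the report's expression language is Java, prefer the .equals form, because == on strings compares object references rather than contents.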
In the company name column of my data table, some companies appear repeatedly under different names, e.g. Apple and Apple_Do Not Call. I want to keep only one name. How do I clean this data? The repeated company has the same values in the other fields.
Company Name         Volume
Apple                150
Wallmart             190
Apple_Do Not Call    150
Sapient              450
Apple inc.           150
If you eyeball the data, Apple appears repeatedly under different names. I want to keep only one value, i.e. Apple.
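You can group_by a different field that has the same value for every variant (Volume in this case), then use mutate to set Company_Name to the first value in each group. Note that this assumes no two distinct companies share the same Volume.
library(dplyr) # provides %>%, group_by, mutate and first
dt %>% group_by(Volume) %>% mutate(Company_Name = first(Company_Name))
Here dt is your data table, with the company column named Company_Name.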
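For anyone without dplyr, here is a base R sketch of the same idea. It swaps group_by/mutate for match(), which returns the position of the first occurrence of each value. The column name Company_Name is an assumption, and the same caveat applies: two different companies with the same Volume would be merged.

```r
# Example data from the question, with the column renamed to Company_Name
dt <- data.frame(
  Company_Name = c("Apple", "Wallmart", "Apple_Do Not Call", "Sapient", "Apple inc."),
  Volume = c(150, 190, 150, 450, 150),
  stringsAsFactors = FALSE
)

# For each row, the index of the first row with the same Volume
first_row <- match(dt$Volume, dt$Volume)

# Overwrite every name with the name found on that first row
dt$Company_Name <- dt$Company_Name[first_row]

print(dt)
```

To also collapse the now-identical rows, unique(dt) can be applied afterwards.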
I am new to R and was wondering how to do the following:
I have a data frame called 'wage' which has features like
First.Name Last.Name Hourly.Pay
Lara Davis 39.29
John Childers 35.12
Lara Grace 40.16
In 'wage' the first name can be non-unique. I have another data frame called 'wage_gender' which has features like
name gender ProbMale ProbFemale
Lara Female 0.0088 0.9912
John Male 0.992 0.008
The names in 'wage_gender' are all unique and correspond to First.Name in 'wage'. The two data frames are not the same size. Also, some names in 'wage' may be missing from 'wage_gender'; in that case the gender should be set to NA.
I want to add a 'gender' feature to the 'wage' data frame by looking up the genders from 'wage_gender'. However, I can't seem to get it to work. Here is what I have:
f = function(r, gen)
r$gender = gen[which(gen$name == r$First.Name),]$gender
apply(wage, 1, f, gen=wage_gender)
Basically, I expect apply to run 'f' over each row, look up the name in 'wage_gender' and assign the appropriate gender, but it throws an error: "Error in r$First.Name : $ operator is invalid for atomic vectors". I am not sure what I am doing wrong.
A different way to do this is to add the names as row.names in wage_gender and then just use that as a lookup table.
row.names(wage_gender) = wage_gender$name
wage_gender[wage$First.Name, "gender"]
[1] "Female" "Male" "Female"
That will also give NA if the name is not in wage_gender.
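The error in the question comes from apply() converting the data frame to a matrix, so each row reaches f as a plain atomic vector on which `$` does not work. A vectorised sketch that avoids apply() and leaves the row names alone uses match(), which compares names exactly. The fourth row ("Zed") is a made-up entry added only to show the NA case.

```r
wage <- data.frame(
  First.Name = c("Lara", "John", "Lara", "Zed"),    # "Zed" is hypothetical
  Last.Name  = c("Davis", "Childers", "Grace", "Doe"),
  Hourly.Pay = c(39.29, 35.12, 40.16, 30.00),
  stringsAsFactors = FALSE
)
wage_gender <- data.frame(
  name   = c("Lara", "John"),
  gender = c("Female", "Male"),
  stringsAsFactors = FALSE
)

# Position of each first name in the lookup table; NA when it is absent
idx <- match(wage$First.Name, wage_gender$name)

# Indexing with NA yields NA, so unmatched names get an NA gender
wage$gender <- wage_gender$gender[idx]

print(wage)
```

The row order of wage is preserved, and non-unique first names in wage are handled because each row is looked up independently.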
Alternatively, rename the column 'name' to 'First.Name' in 'wage_gender':
names(wage_gender)[i] <- "First.Name" # where i is the index of the column called 'name'
You can also rename it by name rather than by position (longer, but more robust):
names(wage_gender)[names(wage_gender) == "name"] <- "First.Name"
Then merge the two data frames. Use all.x = TRUE so that rows of wage with no match in wage_gender are kept, with gender set to NA:
new.df <- merge(wage, wage_gender, by = "First.Name", all.x = TRUE)
I have a categorical variable indicating location of flu clinics as well as an "other" category. Participants who select the "other" category give open-ended responses for their location. In most cases, these open-ended responses fit with one of the existing categories (for example, one category is "public health clinic", but some respondents picked "other" and cited "mall" which was a public health clinic). I could easily do this by hand but want to learn the code to select "mall" strings then use logical expressions to assign these people to "public health clinic" (e.g. create a new variable for location of flu clinics).
My categorical variable is "lrecflu2" and my character string variable is "lfother"
So far I have:
mall <- grep("MALL", Motiv82012$lfother, value = TRUE)
This gives me a vector with all the string responses containing "MALL" (all strings are in caps in the dataframe)
How do I use this vector in a logical expression to create a new variable that assigns these people to the "public health clinic" category and assigns the original value of flu clinic location variable for people that did not select "other" (and do not have values in the character string variable) to the new flu clinic location variable?
Perhaps, grep is not even the right function to be using.
As I understand it, you have a column in a data frame, where you want to reassign one character value to another. If so, you were almost there...
set.seed(1) # for generating an example
df1 <- data.frame(flu2=sample(c("MALL","other","PHC"),size=10,replace=TRUE))
df1$flu2[grep("MALL",df1$flu2)] <- "PHC"
Here grep() is giving you the required vector index; you then subset the vector based on this and change those elements.
Update 2
This should produce a data.frame similar to the one you are using:
set.seed(1)
lreflu2 <- sample(c("PHC","Med","Work","other"),size=10,replace=TRUE)
Ifother <- rep("",10) # blank character vector
s1 <- c("Frontenac Mall","Kingston Mall","notMALL")
Ifother[lreflu2=="other"] <- s1
df1 <- data.frame(lreflu2,Ifother)
### alternative:
### df1 <- data.frame(lreflu2,Ifother, stringsAsFactors = FALSE)
df1
gives:
lreflu2 Ifother
1 Med
2 Med
3 Work
4 other Frontenac Mall
5 PHC
6 other Kingston Mall
7 other notMALL
8 Work
9 Work
10 PHC
If you're looking for an exact string match you don't need grep at all:
df1$lreflu2[df1$Ifother=="MALL"] <- "PHC"
Using a regex:
df1$lreflu2[grep("Mall",df1$Ifother)] <- "PHC"
gives:
lreflu2 Ifother
1 Med
2 Med
3 Work
4 PHC Frontenac Mall
5 PHC
6 PHC Kingston Mall
7 other notMALL
8 Work
9 Work
10 PHC
Whether Ifother is a factor or a vector with mode character doesn't affect things. Before R 4.0.0, data.frame() coerced string vectors to factors by default; since R 4.0.0 the default is stringsAsFactors = FALSE.
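Since the question asked for a new variable rather than overwriting the old one, here is a sketch using grepl() and ifelse(). The sample rows and the column name location_new are made up for illustration; ignore.case = TRUE is used so that "Mall" and "MALL" both match.

```r
# Small hypothetical data set shaped like the one above
df1 <- data.frame(
  lreflu2 = c("Med", "other", "PHC", "other", "other"),
  Ifother = c("", "Frontenac Mall", "", "KINGSTON MALL", "church"),
  stringsAsFactors = FALSE
)

# TRUE only for "other" respondents whose free text mentions a mall
is_mall <- df1$lreflu2 == "other" &
  grepl("mall", df1$Ifother, ignore.case = TRUE)

# New variable: recode mall responses, keep the original value otherwise
df1$location_new <- ifelse(is_mall, "PHC", df1$lreflu2)

print(df1)
```

grepl() returns one logical value per row, which makes it easier to combine with other conditions than the index vector returned by grep().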