Match and list column terms in R from .xlsx - r

Hi, I am quite new to R and I need to match terms across columns of three .xlsx files to get a list of the matched data. The data in the files looks like this:
From one.xlsx:
OneID NameOne
ACR019 Acropectoral Syndrome
ACR020 Acropectorovertebral
GNT015 Genital Dwarfism
ACR023 Acral Dysostosis Dyserythropoiesis Syndrome
From two.xlsx:
TwoID TwoName
607907 DERMATOFIBROSARCOMA PROTUBERANS
304730 DERMOIDS OF CORNEA
605967 ACROPECTORAL SYNDROME
102510 ACROPECTOROVERTEBRAL
From three.xlsx:
ThreeID ThreeName
OM85203 Acropectoral syndrome
OM67092 Dermoids cornea
OM76580 Acardia
OM45632 Hypertryptophanemia
And the final result file in .xlsx must look like this:
OneID NameOne TwoID TwoName ThreeID ThreeName
ACR019 Acropectoral Syndrome 605967 ACROPECTORAL SYNDROME OM85203 Acropectoral syndrome
ACR020 Acropectorovertebral 102510 ACROPECTOROVERTEBRAL -
- 304730 DERMOIDS OF CORNEA OM67092 Dermoids cornea
Thank you very much, any suggestion or help to code this will be welcome.

How about this: since the only common fields across your datasets are the names, we have to use them as the key to connect the .xlsx files, after some small transformations, using the merge() function. (In general, imho, it's not a great idea to use descriptions as keys, but there is no alternative in this case.)
After importing the three Excel files, you can do:
# first, your data (fake)
one <- data.frame(OneID = c('ACR019','ACR020','GNT015','ACR023'),
                  NameOne = c('Acropectoral Syndrome','Acropectorovertebral','Genital Dwarfism','Acral Dysostosis Dyserythropoiesis Syndrome'))
two <- data.frame(TwoID = c('A607907','304730','605967','102510'),
                  NameTwo = c('DERMATOFIBROSARCOMA PROTUBERANS','DERMOIDS OF CORNEA','ACROPECTORAL SYNDROME','ACROPECTOROVERTEBRAL'))
three <- data.frame(ThreeID = c('OM85203','OM67092','OM76580','OM45632'),
                    NameThree = c('Acropectoral syndrome','Dermoids cornea','Acardia','Hypertryptophanemia'))
# then, to get a common key, convert all the names to upper case:
one$ID <- toupper(one$NameOne)
two$ID <- toupper(two$NameTwo)
three$ID <- toupper(three$NameThree)
# after that, merge the data frames (all = TRUE keeps unmatched rows):
merged <- merge(merge(one, two, by = 'ID', all = TRUE), three, by = 'ID', all = TRUE)
# lastly, give the columns the names you want
colnames(merged) <- c('ID', 'OneID','NameOne','TwoID','NameTwo','ThreeID','NameThree')
# here are the results
merged
> merged
ID OneID NameOne TwoID NameTwo
1 ACARDIA <NA> <NA> <NA> <NA>
2 ACRAL DYSOSTOSIS DYSERYTHROPOIESIS SYNDROME ACR023 Acral Dysostosis Dyserythropoiesis Syndrome <NA> <NA>
3 ACROPECTORAL SYNDROME ACR019 Acropectoral Syndrome 605967 ACROPECTORAL SYNDROME
4 ACROPECTOROVERTEBRAL ACR020 Acropectorovertebral 102510 ACROPECTOROVERTEBRAL
5 DERMATOFIBROSARCOMA PROTUBERANS <NA> <NA> A607907 DERMATOFIBROSARCOMA PROTUBERANS
6 DERMOIDS CORNEA <NA> <NA> <NA> <NA>
7 DERMOIDS OF CORNEA <NA> <NA> 304730 DERMOIDS OF CORNEA
8 GENITAL DWARFISM GNT015 Genital Dwarfism <NA> <NA>
9 HYPERTRYPTOPHANEMIA <NA> <NA> <NA> <NA>
ThreeID NameThree
1 OM76580 Acardia
2 <NA> <NA>
3 OM85203 Acropectoral syndrome
4 <NA> <NA>
5 <NA> <NA>
6 OM67092 Dermoids cornea
7 <NA> <NA>
8 <NA> <NA>
9 OM45632 Hypertryptophanemia
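Note that in the output above, 'DERMOIDS CORNEA' and 'DERMOIDS OF CORNEA' land on separate rows, because the upper-cased names are not identical. Below is a minimal sketch of one way to loosen the key, assuming it is acceptable to drop a small list of filler words. The helper name make_key and the stop-word list (only "OF" here) are hypothetical and not part of the original answer, so check the matches it produces against your real data.

```r
# Hypothetical helper: build a looser matching key from a name.
# The stop-word list is an assumption; extend it to suit your data.
make_key <- function(x, stop_words = c("OF")) {
  key <- toupper(trimws(x))
  pattern <- paste0("\\b(", paste(stop_words, collapse = "|"), ")\\b")
  key <- gsub(pattern, " ", key, perl = TRUE)   # drop filler words
  gsub("\\s+", " ", trimws(key))                # collapse repeated spaces
}

two <- data.frame(TwoID = c("304730", "605967"),
                  NameTwo = c("DERMOIDS OF CORNEA", "ACROPECTORAL SYNDROME"),
                  stringsAsFactors = FALSE)
three <- data.frame(ThreeID = c("OM85203", "OM67092"),
                    NameThree = c("Acropectoral syndrome", "Dermoids cornea"),
                    stringsAsFactors = FALSE)

two$ID <- make_key(two$NameTwo)
three$ID <- make_key(three$NameThree)

# Both names now share a key, so each disease ends up on a single row
merged <- merge(two, three, by = "ID", all = TRUE)
merged
```

The same key could be built for all three data frames before the nested merge() call shown earlier.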

Related

converting as factor for a list of data frames

I am trying to create a custom function to give labels to modified list of data frames. For example, I have a data frame like below.
df<-data.frame(
gender = c(1,2,1,2,1,2,1,2,2,2,2,1,1,2,2,2,2,1,1,1,1,1,2,1,2,1,2,2,2,1,2,1,2,1,2,1,2,2,2),
country = c(3,3,1,2,5,4,4,4,4,3,3,4,3,4,2,1,4,2,3,4,4,4,3,1,2,1,5,5,4,3,1,4,5,2,3,4,5,1,4),
Q1=c(1,1,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,1,1,1,NA,1,1,NA,NA,NA,NA,1,NA,NA,NA,NA,1,NA,1),
Q2=c(1,1,1,1,1,NA,NA,NA,NA,1,1,1,1,1,NA,NA,NA,1,1,1,NA,1,1,1,1,1,NA,NA,NA,1,1,1,1,1,1,1,NA,NA,NA),
Q3=c(1,1,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,1,1,1,NA,NA,NA,1,NA,NA,1,1,1,1,1,NA,NA,1),
Q4=c(1,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),
Q5=c(1,2,1,1,1,2,1,2,2,1,2,NA,1,1,2,2,2,1,1,1,2,NA,2,1,1,1,2,2,2,NA,1,2,2,1,1,1,2,2,2)
)
I understand your goal to be the following: You want to take a list of data frames (ldat). For each of the dataframes in the list (df, df2) you want to take some existing columns (Q1, Q2, Q3) and replicate them with new names in the same data frame (Q1_new, Q2_new, Q3_new). This you could achieve like this:
variables = c("Q1","Q2","Q3")
new_label =c("Q1_new","Q2_new","Q3_new")
newdfs <- lapply(ldat, FUN = function(x) {
x[,new_label] = x[,variables]
return(x)})
head(newdfs$ALL)
gender country Q1 Q2 Q3 Q4 Q5 cc2 Q1_new Q2_new Q3_new
1 Male USA Yes Available Partner Depends on sales Local 1 Yes Available Partner
2 female USA Yes Available Partner <NA> Overseas NA Yes Available Partner
3 Male CAN <NA> Available <NA> <NA> Local 1 <NA> Available <NA>
4 female EU <NA> Available <NA> <NA> Local 1 <NA> Available <NA>
5 Male UK <NA> Available <NA> <NA> Local 1 <NA> Available <NA>
6 female BR <NA> <NA> <NA> <NA> Overseas NA <NA> <NA> <NA>
Is this what you had in mind?
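Since the question title mentions converting to factors, here is a self-contained sketch of the same lapply() pattern that also turns the copied columns into labelled factors. The list ldat, its element names, and the label "Yes" are assumptions made up for illustration; the question does not show how ldat was built.

```r
# Hypothetical stand-in for the question's list of data frames
df  <- data.frame(Q1 = c(1, NA, 1), Q2 = c(NA, 1, 1), Q3 = c(1, 1, NA))
df2 <- df[1:2, ]
ldat <- list(ALL = df, SUB = df2)

variables <- c("Q1", "Q2", "Q3")
new_label <- paste0(variables, "_new")

newdfs <- lapply(ldat, function(x) {
  # copy each source column under its new name as a labelled factor
  x[new_label] <- lapply(x[variables], factor, levels = 1, labels = "Yes")
  x
})
str(newdfs$ALL)
```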

How to get rid of weird NA rows in each cell of a dataframe

I have a data frame named 'data' that consists of 500 observations and 2 variables.
In fact,
dim(data)
returns
[1] 500 2
and
str(data)
returns
'data.frame': 500 obs. of 2 variables:
$ Diagnosis : chr "D1" "D2" "D3" "D4" ...
$ Type : Factor w/ 8 levels "T1","T2",..: 6 4 1 6 1 4 4 4 5 5 ...
But when I try to retrieve the value of 'Type' for a specific 'Diagnosis', say 'D4', 11 weird NA values appear in addition to the 'Type' value. It seems that each cell of this data frame holds a vector of 12 values, 11 of which are NAs that have come out of thin air.
In turn,
data[data$Diagnosis=='D4','Type']
returns:
[1] <NA> <NA> <NA> <NA> <NA> <NA>
[7] <NA> <NA> <NA> <NA> <NA> T6
Interestingly:
data[data$Diagnosis=='D4',]
returns:
Diagnosis Type
NA <NA> <NA>
NA.1 <NA> <NA>
NA.2 <NA> <NA>
NA.3 <NA> <NA>
NA.4 <NA> <NA>
NA.5 <NA> <NA>
NA.6 <NA> <NA>
NA.7 <NA> <NA>
NA.8 <NA> <NA>
NA.9 <NA> <NA>
NA.10 <NA> <NA>
503 D4 T6
The data frame was created in Excel and then imported into RStudio; I have made a lot of alterations to it since.
I have two questions:
1. Where did these NAs come from, and how can I delete them?
In fact, I want data[data$Diagnosis=='D4','Type']
to return:
[1] T6
and:
data[data$Diagnosis=='D4',]
to return:
Diagnosis Type
[row number] D4 T6
I cannot use na.omit(data) or complete.cases() on the whole data frame, as I have some legitimate NAs that I don't want to remove.
2. How can I store more than one value in a cell of a data frame? Say person #1 has 2 concomitant diagnoses: how can I store both 'D1' and 'D2' in the 'Diagnosis' of person #1?
I think this explanation will be helpful.
As you can see, the Type column is not a character vector; it is a factor.
Behind the scenes, R treats a factor as a categorical field: the values are stored as integer codes that point into a set of levels, which is why str() shows integers. Converting the Type column to character first, and then doing the operation, avoids surprises from that representation:
data$Type <- as.character(data$Type)
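A factor conversion alone may not remove the extra rows, though. One possible cause worth checking (an assumption, since the full data isn't shown) is missing values in Diagnosis itself: when a logical index contains NA, R returns an all-NA row for each of them, with row names NA, NA.1, and so on, which resembles the output in the question. A minimal sketch on made-up data:

```r
# Toy data: a hypothetical stand-in with one missing Diagnosis
data <- data.frame(Diagnosis = c("D1", NA, "D4"),
                   Type = factor(c("T1", "T2", "T6")),
                   stringsAsFactors = FALSE)

# An NA in the logical index yields an extra all-NA row
bad <- data[data$Diagnosis == "D4", ]

# which() drops the NA comparisons and keeps only the true matches
good <- data[which(data$Diagnosis == "D4"), ]

# %in% never returns NA, so it behaves the same way
also_good <- data[data$Diagnosis %in% "D4", "Type"]
```

Running sum(is.na(data$Diagnosis)) on the real data would show whether this applies; neither which() nor %in% removes the legitimate NAs from the data frame itself.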

Matching data from multiple columns in r

I have two datasets:
Contacts2: This contains a list of ~100,000 contacts, their respective titles and a set of columns which describes the types of work contacts could be involved in. Here's an example dataset:
First<-c("George","Thomas","James","Jimmy","Howard","Herbert")
Last<-c("Washington", "Jefferson", "Madison", "Carter", "Taft", "Hoover")
Title<-c("CEO", "Accountant","Communications Specialist", "President", "Accountant", "CFO")
Finance<-NA
Executive<-NA
Communications<-NA
Contacts2<-as.data.frame(cbind(First,Last,Title,Finance,Executive,Communications))
First Last Title Finance Executive Communications
1 George Washington CEO <NA> <NA> <NA>
2 Thomas Jefferson Accountant <NA> <NA> <NA>
3 James Madison Communications Specialist <NA> <NA> <NA>
4 Jimmy Carter President <NA> <NA> <NA>
5 Howard Taft Accountant <NA> <NA> <NA>
6 Herbert Hoover CFO <NA> <NA> <NA>
Note the last three columns are numeric.
TableOfTitle: This dataset contains a list of ~1,000 unique titles and the same set of columns that describe the types of work the contacts could be involved in. For each title I've put a 1 in the column(s) of the roles that describe that person's job.
Title<-c("CEO","Accountant", "Communications Specialist", "President", "CFO")
Finance<-c(NA,1,NA,1,1)
Executive<-c(1,NA,NA,NA,1)
Communications<-c(NA,NA,1,NA,NA)
TableOfTitle<-as.data.frame(cbind(Title,Finance,Executive,Communications))
Title Finance Executive Communications
1 CEO <NA> 1 <NA>
2 Accountant 1 <NA> <NA>
3 Communications Specialist <NA> <NA> 1
4 President 1 <NA> <NA>
5 CFO 1 1 <NA>
Note the last three columns are numeric.
I'm now trying to copy the check boxes from TableOfTitle into Contacts2 based on the contact's Title field. For example, since TableOfTitle shows that anyone with the title of CFO should have a 1 in the Finance and Executive fields, the record for Herbert Hoover in Contacts2 should have 1s in those columns as well.
Here's a solution that uses dplyr. It is essentially what some commenters have already recommended, except that this fulfills the request of not copying over any pre-existing data in the last 3 columns of Contacts2.
Note that ifelse() can be very slow with large datasets, but for your stated task this shouldn't really be noticeable. Algorithmically, this solution is also a bit clumsy in other ways, but I went for maximum readability here.
library(dplyr)
Contacts2 <- left_join(Contacts2, TableOfTitle, by = "Title") %>%
transmute(First = First,
Last = Last,
Title = Title,
Finance = ifelse(is.na(Finance.x), Finance.y, Finance.x),
Executive = ifelse(is.na(Executive.x), Executive.y, Executive.x),
Communications = ifelse(is.na(Communications.x), Communications.y, Communications.x))
Example output:
First Last Title Finance Executive Communications
George Washington CEO <NA> 1 <NA>
Thomas Jefferson Accountant 1 <NA> <NA>
James Madison Communications Specialist <NA> <NA> 1
Jimmy Carter President 1 <NA> <NA>
Howard Taft Accountant 1 <NA> <NA>
Herbert Hoover CFO 1 1 <NA>
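For comparison, here is a base R sketch of the same idea using match() instead of a join. It is not the answerer's method; the trimmed-down data frames and the pre-existing 1 for the Accountant row are made up to show that existing values are left alone.

```r
# Reduced, hypothetical versions of the two tables
Contacts2 <- data.frame(
  First = c("George", "Thomas", "Herbert"),
  Title = c("CEO", "Accountant", "CFO"),
  Finance = c(NA, NA, NA),
  Executive = c(NA, 1, NA),      # Thomas already has a value here
  stringsAsFactors = FALSE)
TableOfTitle <- data.frame(
  Title = c("CEO", "Accountant", "CFO"),
  Finance = c(NA, 1, 1),
  Executive = c(1, NA, 1),
  stringsAsFactors = FALSE)

# Row of TableOfTitle that corresponds to each contact's title
idx <- match(Contacts2$Title, TableOfTitle$Title)

for (col in c("Finance", "Executive")) {
  missing <- is.na(Contacts2[[col]])
  # fill only the cells that are still empty
  Contacts2[[col]][missing] <- TableOfTitle[[col]][idx[missing]]
}
Contacts2
```

Because match() returns NA for titles absent from TableOfTitle, those contacts simply keep their NAs.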

Replace NA with mode from categorical dataset R

I have a dataset with categorical and NA observations of 10 variables. I want to replace the NA values of each column with the mode. I did a histogram of each variable for identifying the density for each observation and got the mode. I know what values to replace the NAs in each column with.
I saw there was a related post, but I already know what values to replace. Here's the link: Replace mean or mode for missing values in R
Here's code to reproduce the dataset:
> #Create data with missing values
> set.seed(1)
> dat <- data.frame(x=sample(letters[1:3],20,TRUE), y=rnorm(20),
stringsAsFactors=FALSE)
> dat[c(5,10,15),1] <- NA
Here's an example:
> #The head of the first five observations
> head(SmallStoredf, n=5)
Age Gender HouseholdIncome MaritalStatus PresenceofChildren HomeOwnerStatus HomeMarketValue
1 <NA> Male <NA> <NA> <NA> <NA> <NA>
2 45-54 Female <NA> <NA> <NA> <NA> <NA>
5 45-54 Female 75k-100k Married Yes Own 150k-200k
6 25-34 Male 75k-100k Married No Own 300k-350k
7 35-44 Female 125k-150k Married Yes Own 250k-300k
Occupation Education LengthofResidence
1 <NA> <NA> <NA>
2 <NA> <NA> <NA>
5 <NA> Completed High School 9 Years
6 <NA> Completed High School 11-15 years
7 <NA> Completed High School 2 Years
In this example, I want NAs in HomeOwnerStatus replaced with Own, HomeMarketValue with 350K-500K, and Occupation with Professional.
EDIT: I tried inputting the values in, but got an error about three of the columns.
> replacementVals <- c(Age = "45-54", Gender = "Male", HouseholdIncome = "50K-75K",
+ MaritalStatus = "Single", PresenceofChildren = "No",
+ HomeOwnerStatus = "Own", HomeMarketValue = "350K-500K",
+ Occupation = "Professional", Education = "Completed High School",
+ LengthofResidence = "11-15yrs")
> indx1 <- replacementVals[col(df2)][is.na(df2[,names(replacementVals)])]
> df2[is.na(df2[,names(replacementVals)])] <- indx1
#Warning messages:
#1: In `[<-.factor`(`*tmp*`, thisvar, value = c("50K-75K", "50K-75K", :
invalid factor level, NA generated
#2: In `[<-.factor`(`*tmp*`, thisvar, value = c("350K-500K", "350K-500K", :
invalid factor level, NA generated
#3: In `[<-.factor`(`*tmp*`, thisvar, value = c("11-15yrs", "11-15yrs", :
invalid factor level, NA generated
Here's the output:
> head(SmallStoredf)
Age Gender HouseholdIncome MaritalStatus PresenceofChildren HomeOwnerStatus HomeMarketValue
1 45-54 Male <NA> Single No Own <NA>
2 45-54 Female <NA> Single No Own <NA>
5 45-54 Female 75k-100k Married Yes Own 150k-200k
6 25-34 Male 75k-100k Married No Own 300k-350k
7 35-44 Female 125k-150k Married Yes Own 250k-300k
8 55-64 Male 75k-100k Married No Own 150k-200k
Occupation Education LengthofResidence
1 Professional Completed High School <NA>
2 Professional Completed High School <NA>
5 Professional Completed High School 9 Years
6 Professional Completed High School 11-15 years
7 Professional Completed High School 2 Years
8 Professional Completed High School 16-19 years
Only NA values in some columns were replaced.
I amended your reproducible example a little bit, here's the setup
> #Create data with missing values
> set.seed(1)
> dat <- data.frame(x=sample(letters[1:3],20,TRUE), y=rnorm(20),
stringsAsFactors=FALSE)
> dat[c(5,10,15),1] <- NA
> dat[6,2]<-NA
#output
# x y
#1 a 1.511781168450847978590
#2 b 0.389843236411431093291
#3 b -0.621240580541803755210
#4 c -2.214699887177499881830
#5 <NA> 1.124930918143108193874
#6 c NA
#7 c -0.016190263098946087311
#8 b 0.943836210685299215051
#9 b 0.821221195098088552200
#10 <NA> 0.593901321217508826322
#11 a 0.918977371608218240873
#12 a 0.782136300731067102276
#13 c 0.074564983365190601328
#14 b -1.989351695863372793127
#15 <NA> 0.619825747894710232799
#16 b -0.056128739529000784558
#17 c -0.155795506705329295238
#18 c -1.470752383899274429169
#19 b -0.478150055108620353206
#20 c 0.417941560199702411005
Now define your replacement values, named after the columns whose NAs you want replaced:
replacementVals<-c(x="Xreplace", y="Yreplace")
The next call then replaces them all in one shot:
dat[is.na(dat[,names(replacementVals)])]<-replacementVals
# x y
#1 a 1.51178116845085
#2 b 0.389843236411431
#3 b -0.621240580541804
#4 c -2.2146998871775
#5 Xreplace 1.12493091814311
#6 c Yreplace
#7 c -0.0161902630989461
#8 b 0.943836210685299
#9 b 0.821221195098089
#10 Yreplace 0.593901321217509
#11 a 0.918977371608218
#12 a 0.782136300731067
#13 c 0.0745649833651906
#14 b -1.98935169586337
#15 Xreplace 0.61982574789471
#16 b -0.0561287395290008
#17 c -0.155795506705329
#18 c -1.47075238389927
#19 b -0.47815005510862
#20 c 0.417941560199702
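But as akrun pointed out, and subsequently solved, this didn't map well to your second data frame example: the replacement values are recycled over the NA cells in column-major order rather than matched to their columns (note that row 10 above gets Yreplace in the x column). What follows is taken straight from the comments they made (so either way they should probably get the check on this question).
We'll do the setup; I'm not going to print anything except the result.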
HomeOwnerStatus = c(NA,NA,NA ,"Rent", "Rent" )
HomeMarketValue = c(NA,NA,NA, "350k", "350k")
Occupation = c(NA,NA,NA, NA, NA)
SmallStoreddf<-data.frame(HomeOwnerStatus,HomeMarketValue,Occupation, stringsAsFactors=FALSE)
replacementVals<-c("HomeOwnerStatus" = "Own", "HomeMarketValue"="350k", "Occupation"="Professional")
Then in two steps (which could be combined into one really long line) you go
#get the values that we will be replacing
indx1<-replacementVals[col(SmallStoreddf)][is.na(SmallStoreddf[, names(replacementVals)])]
#do the replacement
SmallStoreddf[is.na(SmallStoreddf[,names(replacementVals)])] <-indx1
# HomeOwnerStatus HomeMarketValue Occupation
#1 Own 350k Professional
#2 Own 350k Professional
#3 Own 350k Professional
#4 Rent 350k Professional
#5 Rent 350k Professional
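The "invalid factor level, NA generated" warnings in the question appear when a replacement value is not an existing level of a factor column; the setup above avoids that with stringsAsFactors=FALSE. If the real data already holds factors, one option is to convert them to character before replacing. A minimal sketch on a made-up column:

```r
# Toy factor column; the values are made up for illustration
df <- data.frame(HomeMarketValue = factor(c("150k-200k", NA, "300k-350k")))

# Convert every factor column to character so any string can be assigned
is_fac <- vapply(df, is.factor, logical(1))
df[is_fac] <- lapply(df[is_fac], as.character)

# "350K-500K" was not a level of the original factor, but is fine now
df$HomeMarketValue[is.na(df$HomeMarketValue)] <- "350K-500K"
df
```

The columns can be turned back into factors afterwards with factor() if a later step needs them.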
Try: (Using your second example as it was a bit unclear when you showed two datasets)
indx <- which(is.na(SmallStoredf), arr.ind=TRUE)
SmallStoredf[indx] <- c("Own", "350K-500K", "Professional")[indx[,2]]
SmallStoredf
# HomeOwnerStatus HomeMarketValue Occupation
#1 Own 350K-500K Professional
#2 Own 350K-500K Professional
#3 Own 350K-500K Professional
#4 Rent 350k-500k Professional
#5 Rent 500k-1mm Professional
Upgrading comment.
If you are wanting to replace the missing data with the most frequent category, there may be an equal count of categories within a variable. So in the code below, the replacements are randomly sampled from the categories that are most frequent.
# some example data with missing
set.seed(1)
dat <- data.frame(x=sample(letters[1:3],20,TRUE),
y=sample(letters[1:3],20,TRUE),
w=rnorm(20),
z=sample(letters[1:3],20,TRUE),
stringsAsFactors=FALSE)
dat[c(5,10,15),1] <- NA
dat[c(3,7),2] <- NA
# function to get replacement for missing
# sample is used to randomly select categories, allowing for the case
# when the maximum frequency is shared by more than one category
f <- function(x) {
tab <- table(x)
l <- sum(is.na(x))
sample(names(tab)[tab==max(tab)], l, TRUE)
}
# as we are using sample, set.seed before replacing
set.seed(1)
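for(i in seq_along(dat)){
  # dat[[i]] extracts the column as a vector; dat[i] is a one-column
  # data frame, for which is.numeric() is always FALSE
  if(!is.numeric(dat[[i]]))
    dat[[i]][is.na(dat[[i]])] <- f(dat[[i]])
}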
A gentle warning: you should think carefully before imputing missing data this way. For example, income is often more likely to be missing for the highest and lowest categories, so by this method you may be incorrectly imputing an average wage. You should consider why each variable is missing and whether it is reasonable to assume the data is MCAR or MAR. If so, I would then consider a more robust method of imputation (mice package).

Vector of logicals based on row membership

thank you for your patience.
I am dealing with a large dataset detailing patients and medications.
Medications are hard to code, as they are (usually) meaningless unless matched with doses.
I have a dataframe with vectors (DRUG1, DRUG2, ..., DRUG16) where individual patients are represented by rows.
The vectors are actually factors, with 100s of possible levels (all the drugs the patient could be on).
All I want to do is produce a vector of logicals (TTTTFFFFTTT......) that I could then cbind into a dataframe, which will tell me whether a patient is or is not on a particular drug.
I could then use particularly important drugs' presence or absence as categorical covariates in a model.
I've tried grep, to search along the rows, and I can generate a vector of identifiers, but I cannot seem to generate the vector of logicals.
I realise I'm doing something simply wrong.
names(drugindex)
[1] "book.MRN" "DRUG1" "DRUG2" "DRUG3" "DRUG4" "DRUG5"
[7] "DRUG6" "DRUG7" "DRUG8" "DRUG9" "DRUG10" "DRUG11"
[13] "DRUG12" "DRUG13" "DRUG14" "DRUG15" "DRUG16"
> truvec<-drugindex$book.MRN[as.vector(unlist(apply(drugindex[,2:17], 2, grep, pattern="Lamotrigine")))]
> truvec
[1] 0024633 0008291 0008469 0030599 0027667
37 Levels: 0008291 0008469 0010188 0014217 0014439 0015822 ... 0034262
> head(drugindex)
book.MRN DRUG1 DRUG2 DRUG3 DRUG4 DRUG5
4 0008291 Venlafaxine Procyclidine Flunitrazepam Amisulpiride Clozapine
31 0008469 Venlafaxine Mirtazapine Lithium Olanzapine Metoprolol
3 0010188 Flurazepam Valproate Olanzapine Mirtazapine Esomeprazole
13 0014217 Aspirin Ramipril Zuclopenthixol Lorazepam Haloperidol
15 0014439 Zopiclone Diazepam Haloperidol Paracetamol <NA>
5 0015822 Olanzapine Venlafaxine Lithium Haloperidol Alprazolam
DRUG6 DRUG7 DRUG8 DRUG9 DRUG10 DRUG11 DRUG12
4 Lamotrigine Alprazolam Lithium Alprazolam <NA> <NA> <NA>
31 Lamotrigine Ramipril Alprazolam Zolpidem Trifluoperazine <NA> <NA>
3 Paracetamol Alprazolam Citalopram <NA> <NA> <NA> <NA>
13 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
15 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
DRUG13 DRUG14 DRUG15 DRUG16
4 <NA> <NA> <NA> <NA>
31 <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA>
13 <NA> <NA> <NA> <NA>
15 <NA> <NA> <NA> <NA>
5 <NA> <NA> <NA> <NA>
And what I want is a vector of logicals for each drug, saying whether that patient is on it
Thank you all for your time.
Ross Dunne MRCPsych
"Te occidere possunt sed te edere ne possunt, nefas est".
You were close with your apply attempt, but MARGIN=2 applies the function over columns, not rows. Also, grep returns the locations of the matches; you want grepl, which returns a logical vector. Try this:
apply(drugindex[,-1], 1, function(row) any(grepl("Aspirin", row)))
You could also use %in%, which you may find more intuitive:
apply(drugindex[,-1], 1, "%in%", x="Aspirin")
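As a variation on the same idea, here is a sketch that avoids apply() by comparing a character matrix directly, and that builds flags for several drugs at once. The toy drugindex below and the of_interest vector are made up for illustration; exact matching with == is an assumption, so use grepl() instead if drug names carry doses or extra text.

```r
# Toy stand-in for drugindex; values are illustrative only
drugindex <- data.frame(
  book.MRN = c("0008291", "0010188", "0014217"),
  DRUG1 = c("Venlafaxine", "Flurazepam", "Aspirin"),
  DRUG2 = c("Lamotrigine", "Valproate", NA),
  stringsAsFactors = TRUE)

# Convert the drug columns (factors) to a character matrix once
drugs <- as.matrix(drugindex[, -1])

# One logical per patient: is the drug present anywhere in the row?
on_lamotrigine <- rowSums(drugs == "Lamotrigine", na.rm = TRUE) > 0

# Several drugs at once: one logical column per drug of interest
of_interest <- c("Lamotrigine", "Aspirin")
flags <- sapply(of_interest, function(d) rowSums(drugs == d, na.rm = TRUE) > 0)
result <- cbind(drugindex["book.MRN"], flags)
result
```

The na.rm = TRUE argument treats the empty drug slots as "not on the drug" rather than letting them turn the whole row into NA.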
First, a comment on data structure. You have data in what some call a "wide" format, with a single row per patient and multiple columns for the drugs. It is usually the case that the "long" format, with repeated rows per patient and a single column for drugs, is more amenable to data manipulation. To reshape your data from wide to long and vice versa, take a look at the reshape package. In this case, you would have something like:
library(reshape)
dnow <- melt(drugindex, id.vars='book.MRN')
subset(dnow, value=='Lamotrigine')
Much cleaner, and obvious, code, if I may say so ...
Edit: If you need the old structure back you can use cast:
cast(subset(dnow, value=='Lamotrigine'), book.MRN ~ value)
as suggested by @jonw in the comments.
