I am trying to write a custom function that assigns labels to a modified list of data frames. For example, I have a data frame like the one below.
df<-data.frame(
gender = c(1,2,1,2,1,2,1,2,2,2,2,1,1,2,2,2,2,1,1,1,1,1,2,1,2,1,2,2,2,1,2,1,2,1,2,1,2,2,2),
country = c(3,3,1,2,5,4,4,4,4,3,3,4,3,4,2,1,4,2,3,4,4,4,3,1,2,1,5,5,4,3,1,4,5,2,3,4,5,1,4),
Q1=c(1,1,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,1,1,1,NA,1,1,NA,NA,NA,NA,1,NA,NA,NA,NA,1,NA,1),
Q2=c(1,1,1,1,1,NA,NA,NA,NA,1,1,1,1,1,NA,NA,NA,1,1,1,NA,1,1,1,1,1,NA,NA,NA,1,1,1,1,1,1,1,NA,NA,NA),
Q3=c(1,1,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,1,1,1,NA,NA,NA,1,NA,NA,1,1,1,1,1,NA,NA,1),
Q4=c(1,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),
Q5=c(1,2,1,1,1,2,1,2,2,1,2,NA,1,1,2,2,2,1,1,1,2,NA,2,1,1,1,2,2,2,NA,1,2,2,1,1,1,2,2,2)
)
I understand your goal to be the following: you want to take a list of data frames (ldat). For each of the data frames in the list (df, df2), you want to take some existing columns (Q1, Q2, Q3) and replicate them under new names in the same data frame (Q1_new, Q2_new, Q3_new). You could achieve this as follows:
variables <- c("Q1", "Q2", "Q3")
new_label <- c("Q1_new", "Q2_new", "Q3_new")
# copy the selected columns under their new names in every data frame of the list
newdfs <- lapply(ldat, FUN = function(x) {
  x[, new_label] <- x[, variables]
  return(x)
})
head(newdfs$ALL)
  gender country   Q1        Q2      Q3               Q4       Q5 cc2 Q1_new    Q2_new  Q3_new
1   Male     USA  Yes Available Partner Depends on sales    Local   1    Yes Available Partner
2 female     USA  Yes Available Partner             <NA> Overseas  NA    Yes Available Partner
3   Male     CAN <NA> Available    <NA>             <NA>    Local   1   <NA> Available    <NA>
4 female      EU <NA> Available    <NA>             <NA>    Local   1   <NA> Available    <NA>
5   Male      UK <NA> Available    <NA>             <NA>    Local   1   <NA> Available    <NA>
6 female      BR <NA>      <NA>    <NA>             <NA> Overseas  NA   <NA>      <NA>    <NA>
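If you want to try this without your own data, here is a minimal self-contained sketch. The list ldat below is made up for illustration (two tiny data frames named ALL and SUB); only the lapply step mirrors the code above, and the new names are built with paste0 rather than typed out.

```r
# Hypothetical stand-in for ldat: a named list of two small data frames
df_all <- data.frame(
  Q1 = c("Yes", NA),
  Q2 = c("Available", "Available"),
  Q3 = c("Partner", NA),
  stringsAsFactors = FALSE
)
ldat <- list(ALL = df_all, SUB = df_all[1, ])

variables <- c("Q1", "Q2", "Q3")
new_label <- paste0(variables, "_new")  # "Q1_new" "Q2_new" "Q3_new"

# Copy the selected columns under their new names in each data frame
newdfs <- lapply(ldat, function(x) {
  x[, new_label] <- x[, variables]
  x
})

names(newdfs$ALL)
```

Because lapply returns a new list, the original ldat is left untouched.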
Is this what you had in mind?
Hi, I am quite new to R and I need to match terms across columns from three .xlsx files to get a list of matched records. The data in the files looks like this:
From one.xlsx:
OneID NameOne
ACR019 Acropectoral Syndrome
ACR020 Acropectorovertebral
GNT015 Genital Dwarfism
ACR023 Acral Dysostosis Dyserythropoiesis Syndrome
From two.xlsx:
TwoID TwoName
607907 DERMATOFIBROSARCOMA PROTUBERANS
304730 DERMOIDS OF CORNEA
605967 ACROPECTORAL SYNDROME
102510 ACROPECTOROVERTEBRAL
From three.xlsx:
ThreeID ThreeName
OM85203 Acropectoral syndrome
OM67092 Dermoids cornea
OM76580 Acardia
OM45632 Hypertryptophanemia
And the final result file in .xlsx must look like this:
OneID   NameOne                TwoID   TwoName                ThreeID  ThreeName
ACR019  Acropectoral Syndrome  605967  ACROPECTORAL SYNDROME  OM85203  Acropectoral syndrome
ACR020  Acropectorovertebral   102510  ACROPECTOROVERTEBRAL   -        -
-       -                      304730  DERMOIDS OF CORNEA     OM67092  Dermoids cornea
Thank you very much, any suggestion or help to code this will be welcome.
What about this: since the only fields the datasets have in common are the names, we have to use them as the key to connect the three .xlsx files, after some small transformations, using the merge() function. (Generally, IMHO, it is not a great idea to use descriptions as keys, but there is no alternative in this case.)
After importing the three MS Excel files, you can do:
# first your data (fake)
one <- data.frame(OneID = c('ACR019','ACR020','GNT015','ACR023'),
                  NameOne = c('Acropectoral Syndrome','Acropectorovertebral','Genital Dwarfism','Acral Dysostosis Dyserythropoiesis Syndrome'))
two <- data.frame(TwoID = c('A607907','304730','605967','102510'),
                  NameTwo = c('DERMATOFIBROSARCOMA PROTUBERANS','DERMOIDS OF CORNEA','ACROPECTORAL SYNDROME','ACROPECTOROVERTEBRAL'))
three <- data.frame(ThreeID = c('OM85203','OM67092','OM76580','OM45632'),
                    NameThree = c('Acropectoral syndrome','Dermoids cornea','Acardia','Hypertryptophanemia'))
# then, to get comparable keys, convert all the names to upper case to create the IDs:
one$ID <- toupper(one$NameOne)
two$ID <- toupper(two$NameTwo)
three$ID <- toupper(three$NameThree)
# after that, you can merge the data frames (all = TRUE keeps unmatched rows too):
merged <- merge(merge(one, two, by = 'ID', all = TRUE), three, by = 'ID', all = TRUE)
# lastly, give the columns the names you want
colnames(merged) <- c('ID', 'OneID','NameOne','TwoID','NameTwo','ThreeID','NameThree')
# here are the results
merged
> merged
ID OneID NameOne TwoID NameTwo
1 ACARDIA <NA> <NA> <NA> <NA>
2 ACRAL DYSOSTOSIS DYSERYTHROPOIESIS SYNDROME ACR023 Acral Dysostosis Dyserythropoiesis Syndrome <NA> <NA>
3 ACROPECTORAL SYNDROME ACR019 Acropectoral Syndrome 605967 ACROPECTORAL SYNDROME
4 ACROPECTOROVERTEBRAL ACR020 Acropectorovertebral 102510 ACROPECTOROVERTEBRAL
5 DERMATOFIBROSARCOMA PROTUBERANS <NA> <NA> A607907 DERMATOFIBROSARCOMA PROTUBERANS
6 DERMOIDS CORNEA <NA> <NA> <NA> <NA>
7 DERMOIDS OF CORNEA <NA> <NA> 304730 DERMOIDS OF CORNEA
8 GENITAL DWARFISM GNT015 Genital Dwarfism <NA> <NA>
9 HYPERTRYPTOPHANEMIA <NA> <NA> <NA> <NA>
ThreeID NameThree
1 OM76580 Acardia
2 <NA> <NA>
3 OM85203 Acropectoral syndrome
4 <NA> <NA>
5 <NA> <NA>
6 OM67092 Dermoids cornea
7 <NA> <NA>
8 <NA> <NA>
9 OM45632 Hypertryptophanemia
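Note that in the result above "DERMOIDS OF CORNEA" and "DERMOIDS CORNEA" end up on separate rows, whereas the desired output in the question pairs them. One possible way to handle that, sketched below under the assumption that dropping the word "OF" is an acceptable normalisation for your data, is to clean the key a bit more before merging. The helper normalise_name is hypothetical (not part of any package), and the rule it applies is only an example.

```r
# Hypothetical helper: build a merge key from a free-text name.
# Assumption: upper-casing, dropping the word "OF" and squeezing
# whitespace is enough to make equivalent names identical.
normalise_name <- function(x) {
  x <- toupper(x)
  x <- gsub("\\bOF\\b", " ", x, perl = TRUE)  # remove the stand-alone word OF
  x <- gsub("\\s+", " ", x, perl = TRUE)      # collapse repeated whitespace
  trimws(x)
}

two <- data.frame(TwoID = c("304730", "605967"),
                  NameTwo = c("DERMOIDS OF CORNEA", "ACROPECTORAL SYNDROME"),
                  stringsAsFactors = FALSE)
three <- data.frame(ThreeID = c("OM67092", "OM85203"),
                    NameThree = c("Dermoids cornea", "Acropectoral syndrome"),
                    stringsAsFactors = FALSE)

two$ID <- normalise_name(two$NameTwo)
three$ID <- normalise_name(three$NameThree)

# Both rows now find a partner instead of producing unmatched rows
merged_norm <- merge(two, three, by = "ID", all = TRUE)
merged_norm
```

Any rule like this can also create false matches, so it is worth inspecting the merged rows by eye before relying on them.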
I have an issue when extracting rows to create a new data.frame from an existing one.
So we have:
> head(data.raw)
date id contacted contacted_again region
1 2015-11-29 234 CHAT EMAIL APAC
2 2015-11-29 234 EMAIL EMAIL APAC
3 2015-11-27 257 PHONE PHONE EMEA
4 2015-11-27 278 PHONE EMAIL APAC
5 2015-11-27 293 CHAT EMAIL EMEA
6 2015-11-27 243 EMAIL EMAIL EMEA
market
1 AU/NZ
2 SE Asia (English)
3 Spain
4 China Mainland
5 DACH
6 DACH
However, once I write
data.ru <- data.raw[data.raw$market=="Russia",]
I receive the following mess:
date id contacted contacted_again region market
67 2015-11-25 334 CHAT EMAIL EMEA Russia
NA <NA> <NA> <NA> <NA> <NA> <NA>
NA.1 <NA> <NA> <NA> <NA> <NA> <NA>
NA.2 <NA> <NA> <NA> <NA> <NA> <NA>
NA.3 <NA> <NA> <NA> <NA> <NA> <NA>
NA.4 <NA> <NA> <NA> <NA> <NA> <NA>
How should I write the command to get a normal data.frame containing only the rows where market == "Russia", without any NAs?
I would just use the subset function.
test <- data.frame(x = c("USA", "USA", "USA", "Russia", "Russia", NA), y = c("Orlando", "Boston", "Memphis", NA, "St. Petersburg", "Mexico City"))
print(test)
x y
1 USA Orlando
2 USA Boston
3 USA Memphis
4 Russia <NA>
5 Russia St. Petersburg
6 <NA> Mexico City
subset(test, x == "Russia")
x y
4 Russia <NA>
5 Russia St. Petersburg
You may want to try: data.ru <- data.raw[data.raw$market %in% "Russia",]
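Explanation: I am assuming you have empty lines in your dataset, which are read as NA (missing values). Since R cannot know whether a given NA is equal to "Russia" or not, the comparison with == returns NA for those rows, and indexing a data frame with NA yields a row filled with NAs. %in% returns FALSE for them instead, so they are dropped.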
Illustration in code:
# create sample dataset
example.df <- data.frame(market=c(NA, "Russia", NA), outcome = c(1,2,3))
# match market using ==
example.df$market == "Russia"
example.df[example.df$market == "Russia",]
# match market using %in%
example.df$market %in% "Russia"
example.df[example.df$market %in% "Russia",]
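Two other base R idioms give the same result as %in% here. The sketch below reuses the sample data from above; the variable names keep, with_na, by_which and by_isna are just illustrative.

```r
example.df <- data.frame(market = c(NA, "Russia", NA),
                         outcome = c(1, 2, 3),
                         stringsAsFactors = FALSE)

# == propagates NA, so the logical index contains NA for the missing markets
keep <- example.df$market == "Russia"

# Indexing with NA produces all-NA rows (the "mess" from the question)
with_na <- example.df[keep, ]

# which() keeps only the positions that are TRUE and silently drops NA
by_which <- example.df[which(keep), ]

# Equivalent: explicitly require the comparison to be non-missing and TRUE
by_isna <- example.df[!is.na(keep) & keep, ]

nrow(with_na)   # rows returned when NA is left in the index
nrow(by_which)  # rows returned after dropping NA
```

subset() behaves like the which() variant because it treats NA in the condition as FALSE.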
Thank you for your patience.
I am dealing with a large dataset detailing patients and medications.
Medications are hard to code, as they are (usually) meaningless unless matched with doses.
I have a data frame with vectors (DRUG1, DRUG2, ..., DRUG16) where individual patients are represented by rows.
The vectors are actually factors, with hundreds of possible levels (all the drugs the patient could be on).
All I want to do is produce a vector of logicals (TTTTFFFFTTT......) that I could then cbind into a data frame, telling me whether a patient is or is not on a particular drug.
I could then use particularly important drugs' presence or absence as categorical covariates in a model.
I've tried grep, to search along the rows, and I can generate a vector of identifiers, but I cannot seem to generate the vector of logicals.
I realise I'm doing something simply wrong.
names(drugindex)
[1] "book.MRN" "DRUG1" "DRUG2" "DRUG3" "DRUG4" "DRUG5"
[7] "DRUG6" "DRUG7" "DRUG8" "DRUG9" "DRUG10" "DRUG11"
[13] "DRUG12" "DRUG13" "DRUG14" "DRUG15" "DRUG16"
> truvec<-drugindex$book.MRN[as.vector(unlist(apply(drugindex[,2:17], 2, grep, pattern="Lamotrigine")))]
> truvec
[1] 0024633 0008291 0008469 0030599 0027667
37 Levels: 0008291 0008469 0010188 0014217 0014439 0015822 ... 0034262
> head(drugindex)
book.MRN DRUG1 DRUG2 DRUG3 DRUG4 DRUG5
4 0008291 Venlafaxine Procyclidine Flunitrazepam Amisulpiride Clozapine
31 0008469 Venlafaxine Mirtazapine Lithium Olanzapine Metoprolol
3 0010188 Flurazepam Valproate Olanzapine Mirtazapine Esomeprazole
13 0014217 Aspirin Ramipril Zuclopenthixol Lorazepam Haloperidol
15 0014439 Zopiclone Diazepam Haloperidol Paracetamol <NA>
5 0015822 Olanzapine Venlafaxine Lithium Haloperidol Alprazolam
DRUG6 DRUG7 DRUG8 DRUG9 DRUG10 DRUG11 DRUG12
4 Lamotrigine Alprazolam Lithium Alprazolam <NA> <NA> <NA>
31 Lamotrigine Ramipril Alprazolam Zolpidem Trifluoperazine <NA> <NA>
3 Paracetamol Alprazolam Citalopram <NA> <NA> <NA> <NA>
13 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
15 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
5 <NA> <NA> <NA> <NA> <NA> <NA> <NA>
DRUG13 DRUG14 DRUG15 DRUG16
4 <NA> <NA> <NA> <NA>
31 <NA> <NA> <NA> <NA>
3 <NA> <NA> <NA> <NA>
13 <NA> <NA> <NA> <NA>
15 <NA> <NA> <NA> <NA>
5 <NA> <NA> <NA> <NA>
What I want is a vector of logicals for each drug, saying whether each patient is on it.
Thank you all for your time.
Ross Dunne MRCPsych
"Te occidere possunt sed te edere ne possunt, nefas est".
You were close with your apply attempt, but MARGIN=2 applies the function over columns, not rows. Also, grep returns the positions of the matches; you want grepl, which returns a logical vector. Try this (here x stands for your data frame, e.g. drugindex):
apply(x[,-1], 1, function(row) any(grepl("Aspirin", row)))
You could also use %in%, which you may find more intuitive; unlike grepl, it matches the whole drug name exactly rather than a pattern:
apply(x[,-1], 1, "%in%", x="Aspirin")
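For a quick check of both calls, here is a minimal sketch on a made-up three-patient data frame (toy and its values are invented for illustration, not taken from your data). It also shows a vectorised variant that compares a character matrix against the drug name, which is one way to avoid apply altogether.

```r
# Hypothetical toy data: one ID column plus two drug columns stored as factors
toy <- data.frame(
  book.MRN = c("0008291", "0008469", "0010188"),
  DRUG1 = c("Venlafaxine", "Lamotrigine", "Flurazepam"),
  DRUG2 = c("Lamotrigine", NA, "Valproate"),
  stringsAsFactors = TRUE
)

# Row-wise pattern match: TRUE if any drug column matches the pattern
by_grepl <- apply(toy[, -1], 1, function(row) any(grepl("Lamotrigine", row)))

# Row-wise exact match: is "Lamotrigine" among the values of the row?
by_in <- apply(toy[, -1], 1, "%in%", x = "Lamotrigine")

# Vectorised alternative: compare the whole character matrix at once,
# ignoring NA cells, and flag rows with at least one hit
by_matrix <- rowSums(as.matrix(toy[, -1]) == "Lamotrigine", na.rm = TRUE) > 0

# Attach the result as a logical covariate
toy$on_lamotrigine <- by_grepl
toy
```

To get one logical column per drug of interest, the same call can be wrapped in sapply over a character vector of drug names.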
First, a comment on data structure. You have data in what some call a "wide" format, with a single row per patient and multiple columns for the drugs. The "long" format, with repeated rows per patient and a single column for drugs, is usually more amenable to data manipulation. To reshape your data from wide to long and vice versa, take a look at the reshape package. In this case, you would have something like:
library(reshape)
dnow <- melt(drugindex, id.var='book.MRN')
subset(dnow, value=='Lamotrigine')
Much cleaner, and obvious, code, if I may say so ...
Edit: If you need the old structure back you can use cast:
cast(subset(dnow, value=='Lamotrigine'), book.MRN ~ value)
as suggested by @jonw in the comments.
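If installing reshape is not an option, roughly the same wide-to-long step can be sketched in base R. This is an alternative to melt, not what the package does internally; the toy data frame and the names drug_cols and long are invented for illustration, and the column names variable and value are chosen to mimic the melt output.

```r
# Hypothetical toy data in wide format (character columns for simplicity)
toy <- data.frame(
  book.MRN = c("0008291", "0008469", "0010188"),
  DRUG1 = c("Venlafaxine", "Venlafaxine", "Flurazepam"),
  DRUG2 = c("Lamotrigine", "Mirtazapine", NA),
  stringsAsFactors = FALSE
)

drug_cols <- setdiff(names(toy), "book.MRN")

# Stack the drug columns: one row per patient/drug-slot combination
long <- data.frame(
  book.MRN = rep(toy$book.MRN, times = length(drug_cols)),
  variable = rep(drug_cols, each = nrow(toy)),
  value = unlist(toy[drug_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop the empty slots, then filter exactly as with the melted data
long <- long[!is.na(long$value), ]
on_lam <- subset(long, value == "Lamotrigine")
on_lam
```

With factor columns, as in the original data, each column would need as.character() before unlist() so that the drug names rather than the integer codes are stacked.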