Using the openfda package, I have this code:
# devtools::install_github("ropenhealth/openfda")
library("openfda")
drugs = fda_query("/drug/event.json") %>%
fda_api_key("MY_KEY") %>%
fda_count("patient.drug.medicinalproduct.exact") %>%
fda_exec()
So, I get a list of drugs with a count for each line (fda_count). I'd like to add a column to the data.frame "drugs" with the company name corresponding to each drug. How can I add the data from "patient.drug.openfda.manufacturer_name.exact" as a third column?
I have a data frame that contains one column with strings coding events and one column with participant numbers. Every participant has multiple rows of events. Event codes contain keywords separated by underscores. I would like to count the occurrence of specific keywords per participant and put this in a new dataframe with one row per participant.
I have tried to do this by using grepl to find the keywords and then group_by and summarise to create a new data frame. Here is a minimal example:
part = rep(1:5,4)
events = c("black_white","black_blue","black_yellow","black_white","black_blue","black_yellow","black_white","black_blue","black_yellow","black_white","black_blue","black_yellow","black_white","black_blue","black_yellow","black_white","black_blue","black_yellow","black_white","black_blue")
data = data.frame(part,events)
data_sum = data %>%
group_by(part) %>%
summarise(
black = sum(grepl("black",data$event)),
black_yellow = sum(grepl("black_yellow",data$event))
)
However, when I run this, the counts are not grouped by participant: I get the overall counts, which are the same for everyone.
Does anyone have any tips on what I'm doing wrong?
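Inside summarise(), refer to the column as events rather than data$events. data$events always returns the full, ungrouped column, so every group gets the overall count. You can use this code: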
data_sum = data %>%
group_by(part) %>%
summarise(
black = sum(grepl("black",events)),
black_yellow = sum(grepl("black_yellow",events))
)
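As a rough, self-contained check of the difference, the sketch below runs both versions on the question's example data. The names wrong and right are hypothetical, introduced only for this illustration, and the code assumes dplyr is installed.

```r
library(dplyr)

# Example data from the question: 5 participants, 4 events each
part <- rep(1:5, 4)
events <- rep(c("black_white", "black_blue", "black_yellow"), length.out = 20)
data <- data.frame(part, events)

# data$events ignores the grouping, so each group sees all 20 rows
wrong <- data %>%
  group_by(part) %>%
  summarise(black_yellow = sum(grepl("black_yellow", data$events)))

# The bare column name is evaluated within each group
right <- data %>%
  group_by(part) %>%
  summarise(
    black = sum(grepl("black", events)),
    black_yellow = sum(grepl("black_yellow", events))
  )

print(wrong)
print(right)
```

In wrong, every participant gets the same overall count; in right, the counts differ by participant.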
I have two datasets:
DS1 - contains a list of subjects with columns for name, ID number and Employment status
DS2 - contains the same list of subject names and ID numbers, but some of the ID numbers are missing.
It also contains a third column for Education Level.
I want to merge the Education column onto the first dataset. I have done this using the merge function keyed on ID number, but because some of the ID numbers are missing from the second dataset, I want to merge the remaining Education levels by name as a fallback. Is there a way to do this using dplyr/tidyverse?
There are two ways to do this; choose whichever you prefer.
1st option:
# Left join twice, selecting and renaming columns each time so that no duplicated '.x'/'.y' columns appear
finalDf = DS1 %>%
  dplyr::left_join(DS2 %>%
    dplyr::select(ID, EducationLevel1 = EducationLevel), by = c('ID')) %>%
  dplyr::left_join(DS2 %>%
    dplyr::select(Name, EducationLevel2 = EducationLevel), by = c('Name')) %>%
  dplyr::mutate(FinalEducationLevel = ifelse(is.na(EducationLevel1), EducationLevel2, EducationLevel1))
2nd option:
# First, find the records whose ID is present in the second dataset
commonIds = DS1 %>%
  dplyr::inner_join(DS2 %>%
    dplyr::select(ID, EducationLevel), by = c('ID'))
# Next, take the records whose ID was not found in DS2 and match them by Name instead
idsNotPresent = DS1 %>%
  dplyr::filter(!ID %in% commonIds$ID) %>%
  dplyr::left_join(DS2 %>%
    dplyr::select(Name, EducationLevel), by = c('Name'))
# Bind the two data frames to get the final data frame
finalDf = dplyr::bind_rows(commonIds, idsNotPresent)
Let me know if this works.
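For a quick way to try the ID-then-name fallback on small data, here is a minimal sketch. The toy values and the helper column names EduById and EduByName are made up for illustration, and it swaps ifelse for dplyr::coalesce, which is an assumption about taste rather than part of the answer above.

```r
library(dplyr)

# Toy data (hypothetical values, not from the original question)
DS1 <- data.frame(
  Name = c("Ann", "Bob", "Cat"),
  ID = c(1, 2, 3),
  Employment = c("Full", "Part", "None"),
  stringsAsFactors = FALSE
)
DS2 <- data.frame(
  Name = c("Ann", "Bob", "Cat"),
  ID = c(1, NA, 3),  # Bob's ID is missing in the second dataset
  EducationLevel = c("BSc", "MSc", "PhD"),
  stringsAsFactors = FALSE
)

finalDf <- DS1 %>%
  # First attempt: match on ID, ignoring DS2 rows without an ID
  left_join(DS2 %>% filter(!is.na(ID)) %>%
              select(ID, EduById = EducationLevel), by = "ID") %>%
  # Fallback: match on Name
  left_join(DS2 %>% select(Name, EduByName = EducationLevel), by = "Name") %>%
  # Take the ID match when available, otherwise the Name match
  mutate(EducationLevel = coalesce(EduById, EduByName)) %>%
  select(-EduById, -EduByName)

print(finalDf)
```

This assumes names are unique in DS2; duplicated names would produce extra rows in the second join.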
The second option in makeshift-programmer's answer worked for me. Thank you so much. I had to play around with it for my actual datasets, but the basic structure worked very well and was easy to adapt.
Ciao, I have two columns. Every row represents one student. The first column tells what class the student is in. The second column tells if the student passed an exam.
Here is my reproducible example.
This is the data I have now:
a=c("A","A","A","A","B","B","B","C","C")
b=c(0,0,1,0,0,0,0,1,1)
mydata=data.frame(a,b)
names(mydata)=c("CLASS","PASSED")
This is the data I seek to attain:
a1=c("A","B","C")
b1=c(4,3,2)
c1=c(1,0,2)
mydataWANT=data.frame(a1,b1,c1)
names(mydataWANT)=c("CLASS","SIZE","PASSED")
Here is my attempt with the dplyr package:
mydataWANT <- data.frame(mydata %>%
group_by(CLASS,PASSED) %>%
summarise(N = n()))
yet it does not yield the desired output.
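One possible approach, offered as a sketch rather than an answer from the thread: group by CLASS only, then count the rows for SIZE and sum the 0/1 PASSED column. It assumes PASSED is coded as 0/1, as in the example data.

```r
library(dplyr)

# Example data from the question
mydata <- data.frame(
  CLASS = c("A", "A", "A", "A", "B", "B", "B", "C", "C"),
  PASSED = c(0, 0, 1, 0, 0, 0, 0, 1, 1)
)

mydataWANT <- mydata %>%
  group_by(CLASS) %>%            # one output row per class
  summarise(
    SIZE = n(),                  # number of students in the class
    PASSED = sum(PASSED)         # number of students who passed (PASSED is 0/1)
  ) %>%
  as.data.frame()

print(mydataWANT)
```

Grouping by both CLASS and PASSED, as in the attempt, gives one row per class/outcome pair instead of one row per class.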
I have my data frame below. I want to sum the data as in the first row of the image below (labelled row 38): the total flowering summed over Sections A-D for each date. I also have multiple plots, not just Dry1 but Dry2, Dry3, etc.
It's so simple to do in my head, but I can't work out how to do it in R.
Essentially I want to do this:
with(dat1, sum(dat1$TotalFlowering[dat1$Date=="1997-07-01" & dat1$Plot=="Dry1"]))
This tells me that the sum of total flowers for sections A, B, C and D in plot "Dry1" on the date "1997-07-01" is 166.
I want to write code that does this for every date and plot combination and then puts the result in the data frame,
in the same format as the first row in the image I included :)
Based on your comment it seems like you just want to add a variable to keep track of TotalFlowering for every Date and Plot combination. If that's the case then can you just add a field like TotalCount below?
library(dplyr)
df %>%
group_by(Date, Plot) %>%
mutate(TotalCount = sum(TotalFlowering)) %>%
ungroup()
Or, alternatively, if all you want is the sum you could make use of dplyr's summarise like below
library(dplyr)
df %>%
group_by(Date, Plot) %>%
summarise(TotalCount = sum(TotalFlowering))
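To see that the grouped sum agrees with the manual per-combination sum from the question, here is a small sketch on made-up data. The values in dat1 are hypothetical, and .groups = "drop" assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Toy data (hypothetical values): 2 dates x 2 plots x 4 sections
dat1 <- data.frame(
  Date = rep(c("1997-07-01", "1997-07-08"), each = 8),
  Plot = rep(rep(c("Dry1", "Dry2"), each = 4), times = 2),
  Section = rep(c("A", "B", "C", "D"), times = 4),
  TotalFlowering = 1:16
)

# One row per Date/Plot combination
totals <- dat1 %>%
  group_by(Date, Plot) %>%
  summarise(TotalCount = sum(TotalFlowering), .groups = "drop")

# Manual sum for a single combination, as in the question
manual <- sum(dat1$TotalFlowering[dat1$Date == "1997-07-01" & dat1$Plot == "Dry1"])

print(totals)
print(manual)
```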
I have two data frames of country data.
df1 has all the countries of the world.
df2 has a subset of countries but has the populations in one of its columns.
I want to take the population data and add it to df1 where the country names are a match.
If df1$Column1 = df2$Column1 (same country name) then populate df1$Column2 (currently empty) with the information from df2$Column2 (country's population) where the row is the one for that country match.
I tried to merge the two using the column "NAME", which they both have for country names:
total <- merge(map,Co2_2x, by="NAME")
The columns are all there, but I get empty rows in my new data frame.
I'd like to be able to say "for this row and column matrix position in df1 (the country), get the row (country name match in df2) and column X (population data). Then put it in this row and column Y matrix position in df1 (new population column in df1 for the matched country name)"... There must be an easier way :-)
Here is my code: I'd like to fill map$Measure with data from Co2_2x$premium where the countries match.
library(XML)
library(raster)
library(rgdal)
download.file("http://thematicmapping.org/downloads/TM_WORLD_BORDERS_SIMPL-0.3.zip",destfile="TM_WORLD_BORDERS_SIMPL-0.3.zip")
unzip("TM_WORLD_BORDERS_SIMPL-0.3.zip",exdir=getwd())
polygons <- shapefile("TM_WORLD_BORDERS_SIMPL-0.3.shp")
polygons
map <- as.data.frame(polygons)
map$Measure <- 0
library(rvest)
Co2 <- read_html("https://en.wikipedia.org/wiki/List_of_countries_by_carbon_dioxide_emissions")
Co2_2x<-Co2 %>%
html_nodes("table") %>%
.[[1]] %>%
html_table()
names(Co2_2x)[2]<-paste("premium")
names(Co2_2x)[1]<-paste("NAME")
total <- merge(map,Co2_2x, by="NAME")
Thanks!
To keep the rows of the first dataset that have no match in the other dataset, add the all.x = TRUE option, as follows (see the documentation of merge for details):
total <- merge(map, Co2_2x, by = "NAME", all.x = TRUE)
These rows will then appear with NA in the columns that come from the second dataset.
If the matching doesn't seem to work, make sure that your matching variable (in your case, NAME) is filled in exactly the same way in the two datasets (letter case, possible leading or trailing spaces...).
This answer provides a fine way of doing so.
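As an illustration of the point about letter case and stray spaces, the sketch below builds a normalised join key before merging. The toy values and the column name key are assumptions made for this example; real country lists may also need spelling variants reconciled, which this does not handle.

```r
# Toy data (hypothetical values) with inconsistent case and a trailing space
map <- data.frame(
  NAME = c("France", "Germany ", "italy", "Spain"),
  stringsAsFactors = FALSE
)
Co2_2x <- data.frame(
  NAME = c("france", "Germany", "Italy"),
  premium = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Normalised key: strip leading/trailing whitespace and lower-case
map$key <- tolower(trimws(map$NAME))
Co2_2x$key <- tolower(trimws(Co2_2x$NAME))

# Keep every row of map; unmatched countries get NA in premium
total <- merge(map, Co2_2x[, c("key", "premium")], by = "key", all.x = TRUE)

print(total)
```

Here Spain has no match in the second dataset, so its premium is NA, while the other three rows match despite the formatting differences.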
You can use the sqldf library in R.
Just follow the code below. You'll be able to merge (join) the two datasets that you have:
library(sqldf)
# Left join keeps every row of df1; no GROUP BY is needed because nothing is aggregated
merged_data <- sqldf("select a.country, b.population from df1 as a
left join df2 as b on (a.country = b.country)")
Thanks and happy R-programming!!!