I have two datasets:
df1
ID paddock cow ID
90/123 10 09/123
90/124 11 09/124
90/125 11 09/124
df2
ID paddock
09/123 20
09/124 21
I would like to match df1$cowID with df2$ID and return df2$paddock for whatever row matches. My current code is as follows:
dt <- ifelse(df1$cowID %in% df2$ID, df2$paddock[i], NA)
But I'm getting a return error. Could someone direct me in the right direction please? Thanks in advance!
You might consider joining the datasets.
dplyr::left_join(df1, df2, by = c('cow_ID' = 'ID'))
You should probably use match:
df1$df2_paddock <- df2$paddock[match(df1$cow_ID, df2$ID)]
df1
# ID paddock cow_ID df2_paddock
#1 90/123 10 09/123 20
#2 90/124 11 09/124 21
data
df1 <- structure(list(ID = structure(1:2, .Label = c("90/123", "90/124"
), class = "factor"), paddock = 10:11, cow_ID = structure(1:2, .Label = c("09/123",
"09/124"), class = "factor")), class = "data.frame", row.names = c(NA, -2L))
df2 <- structure(list(ID = structure(1:2, .Label = c("09/123", "09/124"
), class = "factor"), paddock = 20:21), class = "data.frame",
row.names = c(NA, -2L))
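To see what match() does with an ID that has no partner, here is a minimal base R sketch. The value "09/999" is a hypothetical unmatched ID added only for illustration; it is not in the original data.

```r
# Sketch: match() returns, for each df1$cow_ID, the row position in df2$ID,
# or NA when there is no match. "09/999" is a made-up unmatched ID.
df1 <- data.frame(
  ID      = c("90/123", "90/124", "90/125"),
  paddock = c(10, 11, 11),
  cow_ID  = c("09/123", "09/124", "09/999"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID      = c("09/123", "09/124"),
  paddock = c(20, 21),
  stringsAsFactors = FALSE
)

idx <- match(df1$cow_ID, df2$ID)      # one position (or NA) per row of df1
df1$df2_paddock <- df2$paddock[idx]   # indexing with NA yields NA
df1
```

The original ifelse() attempt fails because i is never defined; match() supplies that index for every row at once.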
You can do that by joining the two dataframes and getting the column that you want.
Using Base R
df1 <-
data.frame(
ID = c("90/123", "90/124"),
paddock = c(10, 11),
cow_ID = c("09/123", "09/124")
)
df2 <-
data.frame(
ID = c("90/123", "90/124"),
paddock = c(20, 21)
)
# Joining the two dataframes by ID then choosing column of interest
merge(df1, df2, by = c("ID"), suffixes = c(".x", ".y"))["paddock.y"]
# paddock.y
# 20
# 21
Using Dplyr
library(dplyr)
df1 <-
data.frame(
ID = c("90/123", "90/124"),
paddock = c(10, 11),
cow_ID = c("09/123", "09/124")
)
df2 <-
data.frame(
ID = c("90/123", "90/124"),
paddock = c(20, 21)
)
# Joining the two dataframes by ID then choosing column of interest
df1 %>%
inner_join(df2, by = c("ID"), suffix = c(".x", ".y")) %>%
select(paddock.y) %>%
rename(paddock = paddock.y)
# paddock
# 20
# 21
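In the question the key columns have different names (cow_ID in df1, ID in df2), so a join on a shared ID column does not apply directly. A minimal base R sketch for that case, using the question's three-row df1; the suffixes ".df1" and ".df2" are arbitrary choices for illustration:

```r
# Sketch: join on differently named keys with by.x / by.y.
df1 <- data.frame(
  ID      = c("90/123", "90/124", "90/125"),
  paddock = c(10, 11, 11),
  cow_ID  = c("09/123", "09/124", "09/124"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID      = c("09/123", "09/124"),
  paddock = c(20, 21),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every row of df1 even if no cow_ID matches (left join).
merged <- merge(df1, df2,
                by.x = "cow_ID", by.y = "ID",
                all.x = TRUE, suffixes = c(".df1", ".df2"))
merged$paddock.df2
```

Note that merge() sorts the result by the key column by default, so the row order may differ from df1.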
If you would like to use ifelse(), you could try the following:
with(df2,ifelse(ID %in% df1$cow_ID,paddock,NA))
such that
> with(df2,ifelse(ID %in% df1$cow_ID,paddock,NA))
[1] 20 21
DATA
df1 <- structure(list(ID = structure(1:3, .Label = c("90/123", "90/124",
"90/125"), class = "factor"), paddock = c(10, 11, 11), cow_ID = structure(c(1L,
2L, 2L), .Label = c("09/123", "09/124"), class = "factor")), class = "data.frame", row.names = c(NA,
-3L))
df2 <- structure(list(ID = structure(1:2, .Label = c("09/123", "09/124"
), class = "factor"), paddock = c(20, 21)), class = "data.frame", row.names = c(NA,
-2L))
When using purrr::map_df(), I will occasionally pass in a list of data frames where some items are NULL. When I do, map_df() returns a data frame with fewer rows than the original list.
I assume what's going on is that map_df() calls dplyr::bind_rows() which ignores NULL values. However, I'm not sure how to identify my problematic rows.
Here's an example:
library(purrr)
problemlist <- list(NULL, NULL, structure(list(bounds = structure(list(northeast = structure(list(
lat = 41.49, lng = -71.46), .Names = c("lat", "lng"
), class = "data.frame", row.names = 1L), southwest = structure(list(
lat = 41.49, lng = -71.46), .Names = c("lat", "lng"
), class = "data.frame", row.names = 1L)), .Names = c("northeast",
"southwest"), class = "data.frame", row.names = 1L), location = structure(list(
lat = 41.49, lng = -71.46), .Names = c("lat", "lng"
), class = "data.frame", row.names = 1L), location_type = "ROOFTOP",
viewport = structure(list(northeast = structure(list(lat = 41.49,
lng = -71.46), .Names = c("lat", "lng"), class = "data.frame", row.names = 1L),
southwest = structure(list(lat = 41.49, lng = -71.46), .Names = c("lat",
"lng"), class = "data.frame", row.names = 1L)), .Names = c("northeast",
"southwest"), class = "data.frame", row.names = 1L)), .Names = c("bounds",
"location", "location_type", "viewport"), class = "data.frame", row.names = 1L))
# what actually happens
map_df(problemlist, 'location')
# lat lng
# 1 41.49 -71.46
# desired result
map_df_with_Null_handling(problemlist, 'location')
# lat lng
# 1 NA NA
# 2 NA NA
# 3 41.49 -71.46
I considered wrapping my location accessor in one of purrr's error handling functions (eg. safely() or possibly()), but it's not that I'm running into errors--I'm just not getting the desired results.
What's the best way to handle NULL values with map_df()?
You can use the (as-of-present undocumented) .null argument for any of the map*() functions to tell the function what to do when it encounters a NULL value:
map_df(problemlist, 'location', .null = data_frame(lat = NA, lng = NA) )
# lat lng
# 1 NA NA
# 2 NA NA
# 3 41.49 -71.46
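If the .null argument is not available in your purrr version (I believe later releases renamed it to .default, but check your installed documentation), the same idea can be sketched in base R. This is a swapped-in technique rather than purrr's own mechanism, and the simplified problemlist below is an assumption standing in for the nested structure in the question.

```r
# Sketch: replace NULL entries with a one-row NA placeholder before binding.
# The list below is a simplified stand-in for the question's problemlist.
problemlist <- list(
  NULL,
  NULL,
  list(location = data.frame(lat = 41.49, lng = -71.46))
)

placeholder <- data.frame(lat = NA_real_, lng = NA_real_)

rows <- lapply(problemlist, function(item) {
  loc <- item[["location"]]             # NULL[["location"]] is NULL
  if (is.null(loc)) placeholder else loc
})

result <- do.call(rbind, rows)          # one row per original list element
result
```

Because every element now contributes exactly one row, the row index of result lines up with the position in the original list, which makes the problematic entries easy to identify.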
So I'm running a package in which the output of the function I'm using is something similar to this:
area ID structure
1 150 1 house
I have several of these which I get by looping through some stuff. Basically this is my loop function:
for (k in 1:length(models)) {
for (l in 1:length(patients)) {
print(result[[l]][[k]])
tableData[[l]][[k]] <- do.call(rbind, result[[l]][[k]])
}
}
The print(result[[l]][[k]]) call gives the output I showed at the beginning. My issue is putting all of these into one data frame. So far the do.call approach, which I have read is the one to use when combining lists into data frames, just doesn't work.
So where am I going wrong here ?
Updated:
dput() output (area = value in this case):
list(list(structure(list(value = 0.0394797760472196, ID = "1 house",
structure = "house", model = structure(1L, .Label = "wood", class = "factor")), .Names = c("value",
"ID", "structure", "model"), row.names = c(NA, -1L), class = "data.frame"),
structure(list(value = 0.0394797760472196, ID = "1 house",
structure = "house", model = structure(1L, .Label = "stone", class = "factor")), .Names = c("value",
"ID", "structure", "model"), row.names = c(NA, -1L), class = "data.frame")),
list(structure(list(value = 0.0306923865158472, ID = "2 house",
structure = "house", model = structure(1L, .Label = "wood", class = "factor")), .Names = c("value",
"ID", "structure", "model"), row.names = c(NA, -1L), class = "data.frame"),
structure(list(value = 0.0306923865158472, ID = "2 house",
structure = "house", model = structure(1L, .Label = "stone", class = "factor")), .Names = c("value",
"ID", "structure", "model"), row.names = c(NA, -1L
), class = "data.frame")))
Edit: I initially used purrr::map_dfr to solve this problem, but purrr::reduce is much more appropriate.
The list nesting means we have to bind rows together twice. Here's a solution using the purrr and dplyr packages and assigning your dput list to the variable my_list:
library(purrr)
library(dplyr)
my_df <- reduce(my_list, bind_rows)
#> Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
#> Warning in bind_rows_(x, .id): binding character and factor vector,
#> coercing into character vector
#> Warning in bind_rows_(x, .id): binding character and factor vector,
#> coercing into character vector
#> Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
#> Warning in bind_rows_(x, .id): binding character and factor vector,
#> coercing into character vector
#> Warning in bind_rows_(x, .id): binding character and factor vector,
#> coercing into character vector
my_df
#> value ID structure model
#> 1 0.03947978 1 house house wood
#> 2 0.03947978 1 house house stone
#> 3 0.03069239 2 house house wood
#> 4 0.03069239 2 house house stone
I find map-ing with purrr way more intuitive than do.call. Let me know if this helps!
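For anyone who wants to stay with do.call, here is a minimal base R sketch. It assumes the same two-level nesting as the dput output; the small my_list below is a shortened stand-in with rounded values, not the original data.

```r
# Sketch: flatten one level of nesting, then row-bind once.
# my_list is a shortened stand-in for the dput() output in the question.
make_row <- function(value, id, model) {
  data.frame(value = value, ID = id, structure = "house", model = model,
             stringsAsFactors = FALSE)
}
my_list <- list(
  list(make_row(0.0395, "1 house", "wood"), make_row(0.0395, "1 house", "stone")),
  list(make_row(0.0307, "2 house", "wood"), make_row(0.0307, "2 house", "stone"))
)

# unlist(recursive = FALSE) removes only the outer layer, leaving a flat
# list of data frames that rbind can combine in a single call.
flat  <- unlist(my_list, recursive = FALSE)
my_df <- do.call(rbind, flat)
my_df
```

The original loop likely failed because do.call(rbind, result[[l]][[k]]) was applied to a single data frame, so rbind received its columns as separate arguments instead of a list of data frames.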
My data is structured as follows:
dput(head(CharacterAnalysis,5))
structure(list(Character = c("A", "a", "B", "b", "C"),
Descriptor = c("Jog", "Change Direction", "Shuffle", "Walk", "Stop")),
.Names = c("Character", "Descriptor"),
row.names = c(NA, 5L), class = "data.frame")
I wish to lookup the Character and relevant Descriptor in the following data frame, but am unsure how to do so:
dput(head(StringAnalysis,3))
structure(list(MovementString = c("ACb", "aAaB", "BbCa")),
.Names = c("MovementString"),
row.names = c(NA, 3L), class = "data.frame")
My expected outcome/ data frame would be:
dput(head(Output,3))
structure(list(MovementString = c("ACb", "aAaB", "BbCa"),
MovementPerformed = c("Jog/ Stop/ Walk", "Change Direction/ Jog/ Change Direction/ Shuffle", "Shuffle/ Walk/ Stop/ Change Direction")),
.Names = c("MovementString", "MovementPerformed"),
row.names = c(NA, 3L), class = "data.frame")
I would like a forward slash (/) or similar to separate each Descriptor, as it signals a new movement. Any advice on how to do this? My data frame CharacterAnalysis is over 1 million rows long, so I do not wish to have to search for each MovementString separately!
Thank you.
CharacterAnalysis <-
structure(list(Character = c("A", "a", "B", "b", "C"),
Descriptor = c("Jog", "Change Direction", "Shuffle", "Walk", "Stop")),
.Names = c("Character", "Descriptor"),
row.names = c(NA, 5L), class = "data.frame")
Output <-
structure(list(MovementString = c("ACb", "aAaB", "BbCa"),
MovementPerformed = c("Jog/ Stop/ Walk", "Change Direction/ Jog/ Change Direction/ Shuffle", "Shuffle/ Walk/ Stop/ Change Direction")),
.Names = c("MovementString", "MovementPerformed"),
row.names = c(NA, 3L), class = "data.frame")
# A simple approach based on names
# Build the lookup table just once
m <- CharacterAnalysis$Descriptor
names(m) <- CharacterAnalysis$Character
# Build the MovementPerformed column
Output$MovementPerformed <-
sapply(strsplit(Output$MovementString,""),
FUN = function(x) paste(m[x], collapse = "/ "))
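The same named-vector lookup can be sketched starting from StringAnalysis rather than from the already completed Output. The variable name lookup and the use of vapply() are choices made for this sketch; the data are taken from the question.

```r
# Sketch: named-vector lookup applied to each character of MovementString.
CharacterAnalysis <- data.frame(
  Character  = c("A", "a", "B", "b", "C"),
  Descriptor = c("Jog", "Change Direction", "Shuffle", "Walk", "Stop"),
  stringsAsFactors = FALSE
)
StringAnalysis <- data.frame(
  MovementString = c("ACb", "aAaB", "BbCa"),
  stringsAsFactors = FALSE
)

# Names are matched exactly, so "A" and "a" map to different descriptors.
lookup <- setNames(CharacterAnalysis$Descriptor, CharacterAnalysis$Character)

StringAnalysis$MovementPerformed <- vapply(
  strsplit(StringAnalysis$MovementString, ""),   # split into single characters
  function(chars) paste(lookup[chars], collapse = "/ "),
  character(1)                                   # each result is one string
)
StringAnalysis
```

vapply() behaves like sapply() here but fails loudly if any element does not produce exactly one string.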
I am trying to merge two dataframes. I've been reading the different posts, but I couldn't find a way to obtain my desired output.
dfA:
Name Surname C
Ja Men T
Ale Bu T
Ge Men
dfB:
Name Surname C Ex
Ge Men T hello
Je Di T hello
Desired output:
Merge:
Name Surname C
Ja Men T
Ale Bu T
Ge Men T
Je Di T
That is, fill the columns in dfA with the available columns in dfB and ignore the columns from dfB that are not present in dfA.
I tried:
merge(dfA,dfB, by=c("Name", "Surname", "Caracter"), all.x = T)
And other combinations of merge arguments. I tried using dplyr but couldn't get a satisfactory result.
Any help would be appreciated.
Thanks in advance
Data:
dfA <- data.frame(
name=c("Ja", "Ale", "Ge"),
surname=c("Men", "Bu", "Men"),
C= c("T", "T", NA))
dfB <- data.frame(
name=c("Ge", "Je"),
surname=c("Men","Di"),
C= c("T","T"),
X = c("hello","hello"))
Using dput():
# based on dput(dfA)
dfA <- structure(list(name = structure(c(3L, 1L, 2L), .Label = c("Ale",
"Ge", "Ja"), class = "factor"), surname = structure(c(2L, 1L,
2L), .Label = c("Bu", "Men"), class = "factor"), C = structure(c(1L,
1L, NA), .Label = "T", class = "factor")), .Names = c("name",
"surname", "C"), row.names = c(NA, -3L), class = "data.frame")
# based on dput(dfB)
dfB <- structure(list(name = structure(1L, .Label = "Ge", class = "factor"),
surname = structure(1L, .Label = "Men", class = "factor"),
C = "T", X = structure(1L, .Label = "hello", class = "factor")),
.Names = c("name", "surname", "C", "X"),
row.names = c(NA, -1L), class = "data.frame")
Assuming that the input is as in the output shown at the end of the question, we perform a left join of dfA with dfB. Note that coalesce returns its first non-null argument, and NAs are regarded as SQL nulls:
library(sqldf)
sqldf("select A.Name, A.Surname, coalesce(A.C, B.C) C
from dfA A left join dfB B on A.Name = B.Name and A.Surname = B.Surname")
giving:
name surname C
1 Ja Men T
2 Ale Bu T
3 Ge Men T
We could use safe_full_join from my package safejoin, and resolve column conflicts using dplyr::coalesce:
# devtools::install_github("moodymudskipper/safejoin")
library(safejoin)
library(dplyr)
safe_full_join(dfA, dfB[names(dfA)], by=c("name","surname"), conflict = coalesce, check="")
# name surname C
# 1 Ja Men T
# 2 Ale Bu T
# 3 Ge Men T
# 4 Je Di T
check = "" suppresses the warning that would otherwise be displayed because we're joining on factor columns with different levels.
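If installing extra packages is not an option, a similar result can be sketched in base R with a full merge followed by a manual coalesce. This assumes the two-row dfB from the question's data.frame() call; the suffixes ".a" and ".b" are arbitrary names for this sketch.

```r
# Sketch: full outer join on the key columns, then fill C from dfB where
# dfA has NA. Columns of dfB that are not in dfA (here X) are dropped first.
dfA <- data.frame(
  name    = c("Ja", "Ale", "Ge"),
  surname = c("Men", "Bu", "Men"),
  C       = c("T", "T", NA),
  stringsAsFactors = FALSE
)
dfB <- data.frame(
  name    = c("Ge", "Je"),
  surname = c("Men", "Di"),
  C       = c("T", "T"),
  X       = c("hello", "hello"),
  stringsAsFactors = FALSE
)

merged <- merge(dfA, dfB[names(dfA)],
                by = c("name", "surname"),
                all = TRUE, suffixes = c(".a", ".b"))

# Manual coalesce: prefer dfA's value, fall back to dfB's.
merged$C <- ifelse(is.na(merged$C.a), merged$C.b, merged$C.a)
result <- merged[c("name", "surname", "C")]
result
```

merge() sorts by the key columns, so the rows come back in alphabetical order by name rather than in the order shown in the desired output.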