I have a named vector with some missing values:
x = c(99, 88, 1, 2, 3, NA, NA)
names(x) = c("A", "C", "AA", "AB", "AC", "AD", "CA")
And a second data frame that reflects the hierarchical naming structure (e.g. A is the superordinate of AA, AB, AC, and AD):
filler = data.frame(super = c("A", "A", "A", "A", "C"), sub = c("AA", "AB", "AC", "AD", "CA"))
If a value is missing in x, I want to fill it with the value of its superordinate from filler, so that the outcome would be:
x = c(99, 88, 1, 2, 3, 99, 88)
Does anyone have any clever way to do this without looping through each possibility?
We can create a logical vector ('i1') marking the NA elements, use match to find each element's superordinate via 'filler' and that superordinate's position in 'x', and then do the assignment:
i1 <- is.na(x)
x[i1] <- x[match(filler$super[match(names(x[i1]), filler$sub)], names(x))]
as.vector(x)
#[1] 99 88 1 2 3 99 88
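The same match()-based lookup can be wrapped in a small helper so it can be reused on other vectors. This is a minimal sketch; the function name fill_from_super is hypothetical, and it only fills one level (a missing value whose superordinate is also missing stays NA).

```r
# Hypothetical helper wrapping the match()-based lookup above
fill_from_super <- function(x, lookup) {
  gaps <- is.na(x)
  # Superordinate name for each missing element (NA if it has none)
  parent <- lookup$super[match(names(x)[gaps], lookup$sub)]
  # Copy the superordinate's value into the gap (still NA if unknown)
  x[gaps] <- x[match(parent, names(x))]
  x
}

x <- c(99, 88, 1, 2, 3, NA, NA)
names(x) <- c("A", "C", "AA", "AB", "AC", "AD", "CA")
filler <- data.frame(super = c("A", "A", "A", "A", "C"),
                     sub   = c("AA", "AB", "AC", "AD", "CA"),
                     stringsAsFactors = FALSE)

filled <- fill_from_super(x, filler)
print(unname(filled))
#> [1] 99 88  1  2  3 99 88
```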
As x is a named vector, we could convert it to a data frame (enframe), do a join, replace the NA values with the corresponding superordinate values, and, if needed, convert it back into a vector (deframe).
library(dplyr)
library(tibble)
enframe(x) %>%
left_join(filler, by = c("name" = "sub")) %>%
mutate(value = if_else(is.na(value), value[match(super, name)], value)) %>%
select(-super) %>%
deframe()
# A  C AA AB AC AD CA
#99 88  1  2  3 99 88
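If you would rather avoid the dplyr/tibble dependency, the same enframe, join, deframe idea can be sketched in base R with merge(). This assumes the x and filler objects from the question; merge() sorts rows by the join key, so the original order is restored with match().

```r
x <- c(99, 88, 1, 2, 3, NA, NA)
names(x) <- c("A", "C", "AA", "AB", "AC", "AD", "CA")
filler <- data.frame(super = c("A", "A", "A", "A", "C"),
                     sub   = c("AA", "AB", "AC", "AD", "CA"),
                     stringsAsFactors = FALSE)

# "enframe": one row per element of x
tbl <- data.frame(name = names(x), value = unname(x), stringsAsFactors = FALSE)
# Left join: keep every element, attach its superordinate (NA for top-level names)
tbl <- merge(tbl, filler, by.x = "name", by.y = "sub", all.x = TRUE)
# merge() sorts by the key, so put the rows back in the order of x
tbl <- tbl[match(names(x), tbl$name), ]
# Fill each NA value with the value of its superordinate row
gaps <- is.na(tbl$value)
tbl$value[gaps] <- tbl$value[match(tbl$super[gaps], tbl$name)]
# "deframe": back to a named vector
result <- setNames(tbl$value, tbl$name)
print(result)
```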
I am trying to subset my data in ggplot based on two character variables: model and letter. I want to subset the rows where model is "m1" and letter is "a". In the original data I have multiple rows with "m1" and "a"; below is just a small reproducible example. Can someone guide me on how to subset it inside the ggplot call?
model value letter
m1 5 a
m2 11 b
m3 2 c
m1 4 d
m2 22 e
m3 6 f
structure(list(model = structure(c("m1", "m2", "m3", "m1", "m2",
"m3"), format.stata = "%9s"), value = structure(c(5, 11, 2, 4,
22, 6), format.stata = "%9.0g"), letter = structure(c("a", "b",
"c", "d", "e", "f"), format.stata = "%9s")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
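We could group by model and keep the groups that contain 'a'; the result can be piped straight into ggplot. Note that this keeps every row of a model that has at least one 'a':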
library(dplyr)
library(ggplot2)
df1 %>%
group_by(model) %>%
filter('a' %in% letter) %>%
ggplot(aes(x = letter, y = value)) +
geom_col()
Or, if you only want the rows where model is 'm1' and letter is 'a', apply both conditions in a single filter:
df1 %>%
filter(model == 'm1', letter == 'a') %>%
ggplot(aes(x = letter, y = value)) +
geom_col()
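The two dplyr versions keep different rows: the grouped filter keeps every row of a model that has at least one 'a' (so the 'm1'/'d' row survives too), while the two-condition filter keeps only the 'm1'/'a' rows. A base R sketch on the question's example data, assuming it is stored as df1 as in the answer above, makes the difference visible without plotting:

```r
df1 <- data.frame(model  = c("m1", "m2", "m3", "m1", "m2", "m3"),
                  value  = c(5, 11, 2, 4, 22, 6),
                  letter = c("a", "b", "c", "d", "e", "f"),
                  stringsAsFactors = FALSE)

# Grouped filter: TRUE for every row whose model has at least one "a"
has_a <- ave(df1$letter == "a", df1$model, FUN = any)
by_group <- df1[has_a, ]

# Row-wise filter: only rows that are both "m1" and "a"
by_row <- df1[df1$model == "m1" & df1$letter == "a", ]

print(by_group)
print(by_row)
```

Either data frame can then be handed to ggplot(), e.g. ggplot(by_row, aes(x = letter, y = value)) + geom_col().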
Does this work?
ggplot(subset(df,model=='m1' & letter=='a'),aes(x=letter,y=value))+
geom_point()
Explanation:
In ggplot2, the data argument accepts any expression that evaluates to a data frame, so you can call subset() directly inside ggplot().
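One practical reason to prefer subset() here is that it treats a condition that evaluates to NA as FALSE, whereas [ ] indexing returns an all-NA row for it. The toy data below, with one missing model, is hypothetical and only there to show that difference:

```r
# Hypothetical toy data: the third row has a missing model
df <- data.frame(model  = c("m1", "m2", NA, "m1"),
                 value  = c(5, 11, 2, 4),
                 letter = c("a", "b", "a", "d"),
                 stringsAsFactors = FALSE)

# subset(): condition evaluated inside df, NA results are treated as FALSE
via_subset <- subset(df, model == "m1" & letter == "a")

# [ ]: an NA in the logical index produces an all-NA row
via_bracket <- df[df$model == "m1" & df$letter == "a", ]

print(nrow(via_subset))   # [1] 1
print(nrow(via_bracket))  # [1] 2
```

So ggplot(subset(df, model == "m1" & letter == "a"), aes(x = letter, y = value)) receives only the one matching row.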
I am trying to build sequence data for a recommender system. I have built cross-tabular data (Table 1) and a second table (Table 2).
I have been trying to replace all the 1's in Table 1 with the corresponding "Grade" from Table 2 in R.
Any insight/suggestion is greatly appreciated.
Instead of replacing the values in the first table, we can reshape the second table directly to 'wide' format with dcast:
library(reshape2)
res <- dcast(df2, St.No. ~ Courses, value.var = 'Grade')[names(df1)]
res
#  St.No. Math Phys Chem CS
#1      1         A       B
#2      2         B    B
#3      3    A         A  C
#4      4    B    B       D
If we need to replace the blanks with 0:
res[res == ""] <- "0"
data
df1 <- data.frame(St.No. = 1:4, Math = c(0, 0, 1, 1), Phys = c(1, 1, 0, 1),
Chem = c(0, 1, 1, 0), CS = c(1, 0, 1, 1))
df2 <- data.frame(St.No. = rep(1:4, each = 4), Courses = rep(c("Math",
"Phys", "Chem", "CS"), 4),
Grade = c("", "A", "", "B", "", "B", "B", "",
"A", "", "A", "C", "B", "B", "", "D"),
stringsAsFactors = FALSE)
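To do literally what the question asks (replace the 1's in Table 1 with grades and keep the 0's), a base R sketch is to build a student × course grade matrix by matrix indexing and then pick between it and "0" with ifelse(). It assumes the df1 and df2 defined above, and that each (St.No., Courses) pair appears at most once in df2:

```r
df1 <- data.frame(St.No. = 1:4, Math = c(0, 0, 1, 1), Phys = c(1, 1, 0, 1),
                  Chem = c(0, 1, 1, 0), CS = c(1, 0, 1, 1))
df2 <- data.frame(St.No. = rep(1:4, each = 4),
                  Courses = rep(c("Math", "Phys", "Chem", "CS"), 4),
                  Grade = c("", "A", "", "B", "", "B", "B", "",
                            "A", "", "A", "C", "B", "B", "", "D"),
                  stringsAsFactors = FALSE)

courses <- names(df1)[-1]
# Student x course matrix of grades, filled by (row, column) index pairs
grade_mat <- matrix("", nrow = nrow(df1), ncol = length(courses),
                    dimnames = list(NULL, courses))
grade_mat[cbind(match(df2$St.No., df1$St.No.),
                match(df2$Courses, courses))] <- df2$Grade

# Where Table 1 has a 1, take the grade; otherwise keep "0"
filled <- ifelse(as.matrix(df1[courses]) == 1, grade_mat, "0")
res <- data.frame(St.No. = df1$St.No., filled, stringsAsFactors = FALSE)
print(res)
```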
So I'm trying to filter out certain things in my dataset.
Here's a really pared-down example of my dataset:
fish <- data.frame ("order"=c("a", "a", "a", "b", "b", "c", "c", "d", "d", "e", "e"),
"family"= c("r", "s", "t", "r", "y", "y", "y", "u", "y", "u", "y"),
"species"=c(7, 8, 9, 6, 5, 4, 3, 10, 1, 11, 2))
So I have:
fish <- fish%>%
filter(
!(order %in% c("a", "b", "c"))&
!(family %in% c("r","s","t","u"))
)
which should remove all rows with order a, b, or c and all rows with family r, s, t, or u, leaving me with:
order family species
d y 10
e y 11
But the issue is that there are two species in families that I am filtering out. Say species 1 is in family "r": I want species 1 to stay in the dataset while the rest of family r is filtered out. So I want the output to look like:
order family species
d y 10
e y 11
d r 1
e r 2
How can I make sure that when I'm filtering out the groups of family, it keeps these two species?
Thanks!
You could rbind the results of three separate filters:
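library(dplyr)
# %in% tests set membership; != would compare element-wise with recycling
temp1<-filter(fish,!order %in% c("a","b","c") & !family %in% c("r","s","t","u"))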
temp2<-filter(fish,family=="r"&species==1)
temp3<-filter(fish,family=="s"&species==2)
fish<-rbind(temp1,temp2,temp3)
rm(temp1,temp2,temp3)
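A common pitfall with filters like these is comparing a column against several values with != (or ==), which compares element by element with recycling instead of testing set membership. The sketch below, on the question's fish data, contrasts it with %in%:

```r
fish <- data.frame(order   = c("a", "a", "a", "b", "b", "c", "c", "d", "d", "e", "e"),
                   family  = c("r", "s", "t", "r", "y", "y", "y", "u", "y", "u", "y"),
                   species = c(7, 8, 9, 6, 5, 4, 3, 10, 1, 11, 2),
                   stringsAsFactors = FALSE)

# != recycles c("a", "b", "c") along the column (R warns about the length mismatch)
recycled <- suppressWarnings(fish$order != c("a", "b", "c"))
# %in% tests whether each value is anywhere in the set
membership <- !fish$order %in% c("a", "b", "c")

print(sum(recycled))    # 8 rows kept: not what was intended
print(sum(membership))  # 4 rows kept: only orders d and e
```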
It would be most natural to have the filtering process mirror your logic:
Filter #1: filter-out undesirable order and family
Filter #2: filter desirable family, species pairs
Note: I had to change your family, species pair criteria to get matches.
library(dplyr)
library(purrr)
# your example data
fish <- tibble ("order"=c("a", "a", "a", "b", "b", "c", "c", "d", "d", "e", "e"),
"family"= c("r", "s", "t", "r", "y", "y", "y", "u", "y", "u", "y"),
"species"=c(7, 8, 9, 6, 5, 4, 3, 10, 1, 11, 2))
# put filter criteria in variables
order_filter <- c('a', 'b', 'c')
family_filter <- c('r', 's', 't', 'u')
# Filter 1
df1 <- fish %>%
filter(!order %in% order_filter,
!family %in% family_filter)
# Filter 2
df2 <- map_df(.x = list(c('r', 7), c('s', 8)),
.f = function(x) {fish %>%
filter(family == x[1], species == x[2])})
# Combine two data frames created by Filter 1 and Filter 2
df_final <- bind_rows(df1, df2)
print(df_final)
# A tibble: 4 x 3
# order family species
# <chr> <chr> <dbl>
# 1 d y 1
# 2 e y 2
# 3 a r 7
# 4 a s 8
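The (family, species) exceptions can also be matched without map_df() by pasting each pair into a single key and testing membership with %in%. This base R sketch assumes the same fish data and the same exception pairs as the answer above; the exceptions data frame and the "|" separator are illustrative choices:

```r
fish <- data.frame(order   = c("a", "a", "a", "b", "b", "c", "c", "d", "d", "e", "e"),
                   family  = c("r", "s", "t", "r", "y", "y", "y", "u", "y", "u", "y"),
                   species = c(7, 8, 9, 6, 5, 4, 3, 10, 1, 11, 2),
                   stringsAsFactors = FALSE)
order_filter  <- c("a", "b", "c")
family_filter <- c("r", "s", "t", "u")
# (family, species) pairs to keep regardless of the filters above
exceptions <- data.frame(family = c("r", "s"), species = c(7, 8),
                         stringsAsFactors = FALSE)

# Filter 1: rows that pass both exclusion rules
passes <- !fish$order %in% order_filter & !fish$family %in% family_filter
# Filter 2: rows whose "family|species" key is in the exception list
pair_key      <- paste(fish$family, fish$species, sep = "|")
exception_key <- paste(exceptions$family, exceptions$species, sep = "|")

df_final <- fish[passes | pair_key %in% exception_key, ]
print(df_final)
```

Unlike bind_rows(), a single combined condition keeps the rows in their original order.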
I have a data frame like this one:
df <- data.frame(A = c(1, 2, 3, 4, 2, 2, 1, 5, 3),
B = c("a", "b", "c", "d", NA, "b", NA, NA, NA ))
I want to replace the NA values in this data frame with the values recovered from other observations.
For example, in variable A, the value 1 corresponds to "a" in variable B, so its NA should be replaced by "a".
But for 5 there is no known value to copy, so I keep NA.
How could I do this? I'm stuck.
Thank you.
You could try
df$B <- with(df, ave(as.character(B), A, FUN= function(x)
ifelse(is.na(x), na.omit(x), x)))
Or using data.table
library(data.table)
setDT(df)[ ,B:=ifelse(is.na(B), na.omit(B), B) , A]
Or a variant would be
setDT(df)[,B:=if(any(is.na(B))) unique(na.omit(B)), A][]
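The ifelse(is.na(x), na.omit(x), x) idiom relies on recycling the non-missing values, so when a group has several different known values the NA gets whichever one happens to line up with it. A more defensive base R sketch, under the assumption that a group should be filled only when it has exactly one distinct known value (the helper name fill_group is hypothetical):

```r
df <- data.frame(A = c(1, 2, 3, 4, 2, 2, 1, 5, 3),
                 B = c("a", "b", "c", "d", NA, "b", NA, NA, NA),
                 stringsAsFactors = FALSE)

# Hypothetical helper: fill NAs only if the group has exactly one distinct known value
fill_group <- function(v) {
  known <- unique(v[!is.na(v)])
  if (length(known) == 1) v[is.na(v)] <- known
  v
}

# ave() applies fill_group within each level of A and returns a vector aligned with df
df$B <- ave(df$B, df$A, FUN = fill_group)
print(df$B)
```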
I have a matrix and would like to reorder the rows so that, for example, row 5 moves to row 2 and row 2 moves to row 7. I have a list of all the row names delimited with \n in a txt file, and I thought I could read it into R and then index the matrix (in my case 'k') with it, doing something like k[txt file,] -> k_new. But this does not work, since the identifiers are not in the first column but are defined as row names.
k[ c(1,5,3,4,7,6,2), ] #But probably not what you meant....
Or perhaps (if your 'k' object rownames are something other than the default character-numeric sequence):
k[ char_vec , ] # where char_vec will get matched to the row names.
(dat <- structure(list(person = c(1, 1, 1, 1, 2, 2, 2, 2), time = c(1,
2, 3, 4, 1, 2, 3, 4), income = c(100, 120, 150, 200, 90, 100,
120, 150), disruption = c(0, 0, 0, 1, 0, 1, 1, 0)), .Names = c("person",
"time", "income", "disruption"), row.names = c("h", "g", "f",
"e", "d", "c", "b", "a"), class = "data.frame"))
dat[ c('h', 'f', 'd', 'b') , ]
#-------------
person time income disruption
h 1 1 100 0
f 1 3 150 0
d 2 1 90 0
b 2 3 120 1
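To reorder by a list of row names stored in a text file, read the file with readLines() and index with the resulting character vector. In this sketch the matrix k and the file contents are made up (the file is written to a temporary path just to keep the example self-contained):

```r
# Made-up matrix with row names
k <- matrix(1:12, nrow = 4, dimnames = list(c("r1", "r2", "r3", "r4"), NULL))

# Stand-in for the user's .txt file: one row name per line
order_file <- tempfile(fileext = ".txt")
writeLines(c("r3", "r1", "r4", "r2"), order_file)

# Read the names, dropping stray whitespace and blank lines
wanted <- trimws(readLines(order_file))
wanted <- wanted[nzchar(wanted)]

# Indexing rows by a character vector matches it against rownames(k)
stopifnot(all(wanted %in% rownames(k)))
k_new <- k[wanted, , drop = FALSE]
print(k_new)
```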