I'd like to see which values have a particular entry issue, but I can't get it to work.
For instance, I need to print the values from column "c" conditional on a given value of "b", say where [b==0].
Finally, I need to append a new string to the values for which the condition is true.
df<- structure(list(a = c(11.77, 10.9, 10.32, 10.96, 9.906, 10.7,
11.43, 11.41, 10.48512, 11.19), b = c(2, 3, 2, 0, 0, 0, 1, 2,
4, 0), c = c("q", "c", "v", "f", "", "e", "e", "v", "a", "c")), .Names = c("a",
"b", "c"), row.names = c(NA, -10L), class = "data.frame")
I tried this without success:
if(df[b]==0){
print(df$c)
}
if((df[b]==0)&(df[c]=="v")){
df[c] <-paste("2")
}
Thanks for helping.
The correct syntax is like df[rows, columns], so you could try:
df[df$b==0, "c"]
You can accomplish changing values using ifelse:
df$c <- ifelse(df$b==0 & df$c=="v", paste(df$c, 2, sep=""), df$c)
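The attempt in the question fails for two reasons: if() expects a single TRUE/FALSE rather than one value per row, and df[b] looks for an object named b instead of the column. A minimal sketch of the vectorised alternative, using a cut-down copy of the question's data (the name is_zero is only illustrative):

```r
# Cut-down copy of the question's data (only columns b and c are needed here).
df <- data.frame(
  b = c(2, 3, 2, 0, 0, 0, 1, 2, 4, 0),
  c = c("q", "c", "v", "f", "", "e", "e", "v", "a", "c"),
  stringsAsFactors = FALSE
)

# One TRUE/FALSE per row replaces the scalar if().
is_zero <- df$b == 0

# Print the values of c for the rows where b is 0.
print(df$c[is_zero])

# Assign only to the selected rows; all other rows keep their value.
df$c[is_zero] <- paste0(df$c[is_zero], "2")
```

paste0() is used here so that no space is inserted before the appended "2".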
Does this help?
rows <- which(df$b==0)
if (length(rows)>0) {
print(df$c[rows])
df$c[rows] <- paste(df$c[rows],'2')
## maybe you wanted to have:
# df$c[rows] <- '2'
}
There are several ways to subset data in R, for example:
df$c[df$b == 0]
df[df$b == 0, "c"]
subset(df, b == 0, c)
with(df, c[b == 0])
# ...
To conditionally add another column (here: TRUE/FALSE):
df$e <- FALSE; df$e[df$b == 0] <- TRUE
df <- transform(df, e = ifelse(b == 0, TRUE, FALSE))
df <- within(df, e <- ifelse(b == 0, TRUE, FALSE))
# ...
Example Data:
A<- c(1,2,3,4,1,2,3,4,1,2)
B<- c("A","B","C","D","E","F","G","H","I","J")
C<- c(1,1,1,1,1,1,1,1,1,0)
D<- c(TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,TRUE,FALSE)
df1<-data.frame(A,B,C,D)
df1 %>%
select_if(
###column is <90% one value
)
So I have a table with a few columns that are predominantly one value, like C and D in the example above. I need to get rid of any column in which a single value makes up 90% or more of the entries. How can I drop the columns that meet this criterion?
We can use select with where: get the frequency count of each column with table, convert it to proportions, take the max value, and keep the column only if that value is less than .90.
library(dplyr)
df1 <- df1 %>%
select(where(~ max(proportions(table(.))) < .90))
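To see what the where() condition evaluates for each column, here is a base R sketch of the same check. It uses prop.table(), which computes the same proportions as proportions(); the names top_share and df1_kept are only illustrative.

```r
# Same example data as in the question.
df1 <- data.frame(
  A = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2),
  B = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  C = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 0),
  D = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Share of the most frequent value in each column.
top_share <- sapply(df1, function(col) max(prop.table(table(col))))

# Keep only the columns whose most frequent value covers less than 90%.
df1_kept <- df1[, top_share < 0.90, drop = FALSE]
names(df1_kept)
```

With this data, C and D each have a top share of exactly 0.9, so the strict `<` drops them and A and B remain.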
data
df1 <- structure(list(A = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2), B = c("A",
"B", "C", "D", "E", "F", "G", "H", "I", "J"), C = c(1, 1, 1,
1, 1, 1, 1, 1, 1, 0), D = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,
TRUE, TRUE, TRUE, FALSE)), class = "data.frame", row.names = c(NA,
-10L))
First of all, my question is related to these other ones:
Lazy evaluation to annotations expanding function
R nested map through columns
So, I got this example data:
t <- tibble(a = c("a", "b", "c", "d", "e", "f", "g", "h"),
b = c( 1, 1, 1, 1, 2, 2, 2, 2),
c = c( 1, 1, 2, 2, 3, 3, 4, 4),
d = c( NA, NA, NA, "D", "E", NA, NA, NA),
e = c("A", NA, "C", NA, NA, NA, "G", "H")
)
And these functions:
f1 <- function(data, group_col, expand_col){ #, return_group_col = TRUE, name_group_col = "group_col"){
data %>%
dplyr::group_by({{group_col}}) %>%
dplyr::mutate(
{{expand_col}} := dplyr::case_when(
!is.na({{expand_col}}) ~ {{expand_col}} ,
any( !is.na({{expand_col}}) ) & is.na({{expand_col}}) ~
paste(unique(unlist(str_split(na.omit({{expand_col}}), " ")) ),
collapse = " "),
TRUE ~ NA_character_
)
) %>%
dplyr::ungroup()
}
f2 <- function(data, group_col, expand_col, fun=f1){
v1 <- rlang::syms( colnames(data)[group_col] )
v2 <- rlang::syms( colnames(data)[expand_col] )
V <- tidyr::crossing( v1, v2 )
purrr::reduce2( V$v1, V$v2, fun, .init=data )
}
The function f1 uses two columns: the first, {{group_col}}, is a group identifier, and the second, {{expand_col}}, may contain an annotation or NA. After grouping by {{group_col}}, any NA in {{expand_col}} is filled with the values from the other rows of the same group. Example: f1(t, c, d).
The function f2 just applies f1 over two sets of columns: the first set holds the grouping columns and the second set holds the annotation columns.
Then, I want to modify f1 to create (if needed) another column that records which {{group_col}} was used to fill {{expand_col}}.
That is, if you run t %>% f2(3:2, 4:5) you get this:
structure(list(a = c("a", "b", "c", "d", "e", "f", "g", "h"),
b = c(1, 1, 1, 1, 2, 2, 2, 2), c = c(1, 1, 2, 2, 3, 3, 4,
4), d = c("D", "D", "D", "D", "E", "E", "E", "E"), e = c("A",
"A", "C", "C", "G H", "G H", "G", "H")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -8L))
This is the same as running:
t %>%
f1(c, d) %>%
f1(b, d) %>%
f1(c, e) %>%
f1(b, e)
You may notice that some rows were annotated previously. These rows should be filled with 'self' or something equivalent.
Here is an example of the output I want:
structure(list(a = c("a", "b", "c", "d", "e", "f", "g", "h"),
b = c(1, 1, 1, 1, 2, 2, 2, 2),
c = c(1, 1, 2, 2, 3, 3, 4, 4),
d = c("D", "D", "D", "D", "E", "E", "E", "E"),
e = c("A", "A", "C", "C", "G H", "G H", "G", "H"),
d_fill = c("b", "b", "c", "self", "self", "c", "b", "b"),
e_fill = c("self", "c", "self", "c", "b", "b", "self", "self")
),
class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -8L))
Then I tried this unsuccessful modification:
f1 <- function(data, group_col, expand_col){ #, return_group_col = TRUE, name_group_col = "group_col"){
fill_column <- str_c(deparse(substitute(group_col)), "fill", sep = "_")
data %>%
dplyr::group_by({{group_col}}) %>%
dplyr::mutate(
{{fill_column}} := dplyr::if_else(
!is.na({{expand_col}}) & is.na({{fill_column}}) ~ "self",
is.na({{expand_col}}) & is.na({{fill_column}}) ~ deparse(substitute(group_col)),
TRUE ~ NA_character_
),
{{expand_col}} := dplyr::case_when(
!is.na({{expand_col}}) ~ {{expand_col}} ,
any( !is.na({{expand_col}}) ) & is.na({{expand_col}}) ~
paste(unique(unlist(str_split(na.omit({{expand_col}}), " ")) ),
collapse = " "),
TRUE ~ NA_character_
)
) %>%
dplyr::ungroup()
}
But when I run t %>% f1(c, d) to test it, I got this:
Error: `condition` must be a logical vector, not a `formula` object
Run `rlang::last_error()` to see where the error occurred.
25. stop(fallback)
24. signal_abort(cnd)
23. .abort(text)
22. glubort(fmt_args(args), ..., .envir = .envir)
21. bad_args("condition", "must be a logical vector, not {friendly_type_of(condition)}")
20. dplyr::if_else(!is.na(~d) & is.na(~"c_fill") ~ "self", is.na(~d) &
    is.na(~"c_fill") ~ deparse(substitute(group_col)), TRUE ~
    NA_character_)
19. mutate_impl(.data, dots, caller_env())
18. mutate.tbl_df(., `:=`({
    {
    fill_column
    } ...
17. dplyr::mutate(., `:=`({
    {
    fill_column
    } ...
16. function_list[[i]](value)
15. freduce(value, `_function_list`)
14. `_fseq`(`_lhs`)
13. eval(quote(`_fseq`(`_lhs`)), env, env)
12. eval(quote(`_fseq`(`_lhs`)), env, env)
11. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
10. data %>% dplyr::group_by({
    {
    group_col
    } ...
9. f1(., c, d)
8. function_list[[k]](value)
7. withVisible(function_list[[k]](value))
6. freduce(value, `_function_list`)
5. `_fseq`(`_lhs`)
4. eval(quote(`_fseq`(`_lhs`)), env, env)
3. eval(quote(`_fseq`(`_lhs`)), env, env)
2. withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
1. t %>% f1(c, d)
I haven't figured out what is wrong.
Thanks in advance.
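One observation on the reported error: dplyr::if_else() takes three vectors (condition, true, false), whereas the `condition ~ value` formulas belong to dplyr::case_when(). The sketch below only illustrates that difference on a small made-up vector x (not part of the question's data). It does not attempt the full d_fill / e_fill logic, and other parts of the modified f1 may need changes as well.

```r
library(dplyr)

# Made-up annotation vector: one known value, two missing ones.
x <- c("D", NA, NA)

# case_when() accepts a sequence of `condition ~ value` formulas.
via_case_when <- case_when(
  !is.na(x) ~ "self",
  TRUE      ~ "filled"
)

# if_else() accepts a logical vector plus the values for TRUE and FALSE.
via_if_else <- if_else(!is.na(x), "self", "filled")

identical(via_case_when, via_if_else)
```

Passing formulas to if_else(), as in the attempt, makes its first argument a formula object, which is what the message "`condition` must be a logical vector, not a `formula` object" reports.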
I am trying to build sequence data for a recommender system. I have built a cross-tabulated table (Table 1) and a second table (Table 2), as shown below:
[Image: Table 1 and Table 2]
I have been trying to replace all the 1's in Table 1 by the "Grade" from the Table 2 in R.
Any insight/suggestion is greatly appreciated.
Instead of replacing the values in the first table with those from the second, the second table can be reshaped directly to 'wide' format with dcast:
library(reshape2)
res <- dcast(df2, St.No. ~ Courses, value.var = 'Grade')[names(df1)]
res
#  St.No. Math Phys Chem CS
#1      1         A       B
#2      2         B    B
#3      3    A         A  C
#4      4    B    B       D
If we need to replace the blanks with 0:
res[res == ""] <- "0"
data
df1 <- data.frame(St.No. = 1:4, Math = c(0, 0, 1, 1), Phys = c(1, 1, 0, 1),
Chem = c(0, 1, 1, 0), CS = c(1, 0, 1, 1))
df2 <- data.frame(St.No. = rep(1:4, each = 4), Courses = rep(c("Math",
"Phys", "Chem", "CS"), 4),
Grade = c("", "A", "", "B", "", "B", "B", "",
"A", "", "A", "C", "B", "B", "", "D"),
stringsAsFactors = FALSE)
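If reshape2 is not available, the same wide table can be sketched in base R by filling a character matrix through its dimnames. This assumes, as in the data above, that every (St.No., Courses) pair occurs exactly once in df2; the names courses, grades and res2 are only illustrative.

```r
df1 <- data.frame(St.No. = 1:4, Math = c(0, 0, 1, 1), Phys = c(1, 1, 0, 1),
                  Chem = c(0, 1, 1, 0), CS = c(1, 0, 1, 1))
df2 <- data.frame(St.No. = rep(1:4, each = 4),
                  Courses = rep(c("Math", "Phys", "Chem", "CS"), 4),
                  Grade = c("", "A", "", "B", "", "B", "B", "",
                            "A", "", "A", "C", "B", "B", "", "D"),
                  stringsAsFactors = FALSE)

# Empty character matrix with students as row names and courses as column names.
courses <- names(df1)[-1]
grades <- matrix("", nrow = nrow(df1), ncol = length(courses),
                 dimnames = list(as.character(df1$St.No.), courses))

# A two-column character matrix indexes the cells by (row name, column name).
grades[cbind(as.character(df2$St.No.), df2$Courses)] <- df2$Grade

# Put the student number back as the first column.
res2 <- data.frame(St.No. = df1$St.No., grades, stringsAsFactors = FALSE)
res2
```

Like the dcast version, this takes the grades straight from df2 and does not consult the 0/1 entries of df1.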
I have a dataframe like this one :
df <- data.frame(A = c(1, 2, 3, 4, 2, 2, 1, 5, 3),
B = c("a", "b", "c", "d", NA, "b", NA, NA, NA ))
I want to replace the NA values in this data frame with the value found in the other observations.
For example, the value 1 in variable A corresponds to "a" in variable B, so that NA should be replaced by "a".
But for 5 we can't conclude anything, so I keep NA.
How could I do this? I'm stuck.
Thank you.
You could try
df$B <- with(df, ave(as.character(B), A, FUN= function(x)
ifelse(is.na(x), na.omit(x), x)))
Or using data.table
library(data.table)
setDT(df)[ ,B:=ifelse(is.na(B), na.omit(B), B) , A]
Or a variant would be
setDT(df)[,B:=if(any(is.na(B))) unique(na.omit(B)), A][]
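The same idea can also be sketched in base R with an explicit lookup table, assuming each value of A maps to at most one non-missing value of B (as in the example data). The names known, lookup and fill are only illustrative.

```r
df <- data.frame(A = c(1, 2, 3, 4, 2, 2, 1, 5, 3),
                 B = c("a", "b", "c", "d", NA, "b", NA, NA, NA),
                 stringsAsFactors = FALSE)

# Distinct (A, B) pairs that are actually observed.
known <- unique(df[!is.na(df$B), ])

# Named vector: names are the values of A, elements are the matching B.
lookup <- setNames(known$B, known$A)

# Fill only the missing entries; an A without a known B (here 5) stays NA.
fill <- is.na(df$B)
df$B[fill] <- unname(lookup[as.character(df$A[fill])])
df
```

Looking up a name that is not in the table returns NA, which is why the row with A equal to 5 is left untouched.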
I have a matrix and would like to reorder its rows so that, for example, row 5 moves to row 2 and row 2 moves to row 7. I have a list of all row names, delimited with \n, in a txt file. I thought I could read it into R and then use the matrix (in my case 'k') with something like k[txt file,] -> k_new, but this does not work because the identifiers are not the first column; they are defined as row names.
k[ c(1,5,3,4,7,6,2), ] #But probably not what you meant....
Or perhaps (if your 'k' object rownames are something other than the default character-numeric sequence):
k[ char_vec , ] # where char_vec will get matched to the row names.
(dat <- structure(list(person = c(1, 1, 1, 1, 2, 2, 2, 2), time = c(1,
2, 3, 4, 1, 2, 3, 4), income = c(100, 120, 150, 200, 90, 100,
120, 150), disruption = c(0, 0, 0, 1, 0, 1, 1, 0)), .Names = c("person",
"time", "income", "disruption"), row.names = c("h", "g", "f",
"e", "d", "c", "b", "a"), class = "data.frame"))
dat[ c('h', 'f', 'd', 'b') , ]
#-------------
  person time income disruption
h      1    1    100          0
f      1    3    150          0
d      2    1     90          0
b      2    3    120          1
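For the text-file part of the question, a minimal sketch: readLines() returns one string per line, and that character vector can index the matrix by row name. The matrix k and the temporary file below are made up for illustration; in practice the path would point to the existing txt file.

```r
# Small made-up matrix with row names.
k <- matrix(1:6, nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("x", "y")))

# Stand-in for the txt file: one row name per line, in the desired order.
order_file <- tempfile(fileext = ".txt")
writeLines(c("r3", "r1", "r2"), order_file)

# Read the names and use them to reorder the rows.
ids <- readLines(order_file)
k_new <- k[ids, , drop = FALSE]
k_new
```

A name in the file that does not match any row name makes the indexing fail with a "subscript out of bounds" error, so stray whitespace or blank lines in the file are worth checking first.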