Data are as follows:
df<-read.table(text=" A1 A2 A3 M1 M2 M3
F M F A B A
M M F A B A
F M F A B A
F M F C B A
F M F C B A
M M F C C B
F M F C C B
M F F C C B
F F F D C B
M F F D C B
F F F D A B
F F F D A C
F F F D A C
M F F D A C
F M M B A D
F M M B A D
F M M B D D
M M M B D D
F M M B D D ", h=T)
I want bar plots of A1 with M1, A2 with M2, and A3 with M3. So far I've tried:
library(purrr)
library(ggplot2)
map2(names(df)[4:6], names(df)[1:3], ~
ggplot(df, aes(x = !!rlang::sym(.x), y = !!rlang::sym(.y))) +
geom_bar())
However, I get the following error:
Error: stat_count() must not be used with a y aesthetic.
I struggled to fix the error. Any help?
I want to have this for each plot:
ggplot(df, aes(x = A1, fill = M1)) +
geom_bar(position = position_dodge())
Combining both the working example and what was tried, did you want:
library(purrr)
library(ggplot2)
map2(names(df)[1:3], names(df)[4:6], ~
ggplot(df, aes_string(x = .x, fill = .y)) +
geom_bar(position = position_dodge()))
It looks like you want fill mapped to columns 4-6 (M1, M2, and M3); is that correct? If so, you do not need y in aes, just fill. geom_bar() uses stat_count(), which computes the bar heights itself, and that is why supplying a y aesthetic raises the error.
Also, you can use aes_string instead of rlang::sym.
Finally, I added position_dodge based on your working example. Let me know if this is what you had in mind.
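As a side note, aes_string() is deprecated in recent ggplot2 releases, so here is a hedged sketch of the same loop using the .data pronoun instead. The small data frame and the helper name make_plot are made up for illustration; only the column names follow the question.

```r
library(purrr)
library(ggplot2)

# Toy data with the same column layout as the question (values are illustrative)
df <- data.frame(
  A1 = c("F", "M", "F", "F", "M", "F"),
  A2 = c("M", "M", "F", "F", "M", "F"),
  A3 = c("F", "F", "F", "M", "M", "M"),
  M1 = c("A", "A", "C", "D", "B", "B"),
  M2 = c("B", "B", "C", "A", "A", "D"),
  M3 = c("A", "A", "B", "C", "D", "D")
)

# Hypothetical helper: one dodged bar plot per (x, fill) pair of column names
make_plot <- function(x_var, fill_var) {
  ggplot(df, aes(x = .data[[x_var]], fill = .data[[fill_var]])) +
    geom_bar(position = position_dodge()) +
    labs(x = x_var, fill = fill_var)  # set labels explicitly from the names
}

plots <- map2(names(df)[1:3], names(df)[4:6], make_plot)
```

plots is then a list of three ggplot objects, in the order A1/M1, A2/M2, A3/M3.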
This is a bit of an obscure one, but I am hoping somebody can help. I am having trouble translating a Random Forest model into rules using the inTrees package.
The example below produces the error:
Error in vector("list", rf$ntree) : invalid 'length' argument
require(inTrees)
require(caret)
require(RRF)
x1 <- read.table(header=T, text="B P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12 P13 Target
B1 a a a g g g g g g g g g a 9
B2 a g a g g g g g g g g g j 16
B3 a g a g g g g g g g g g a 13
B4 b i b i i i i i i i i i b 10
B5 a f a f f f f f f f f f a 8
B6 a g a g g g g g g g g g a 8
B7 b b b h i i i i i i i i b 29
B8 e g a g g g g g g g g g j 20
B9 a g a g g g g g g g g g a 14
B10 a a g a a j g g g g g g a 22
B11 a h a h h h h h h h h h a 25
B12 a f a f f f f f f f f f j 11
B13 a b a g g g g g g g g g a 18
B14 a a a g g g g g g g g g a 13
B15 b g b i i i i i i i i i j 21
B16 a g a g g g g g g g g g a 17
B17 d j d j j j j j j j j j d 18
B18 a g a g g g g g g g g g a 18
B19 a g a g g g g g g g g g j 14
B20 a g a g g g g g g g g g a 13
")
#View(x1)
#Partition
set.seed(3456)
trainIndex <- createDataPartition(x1$`Target`, p = .8,
list = FALSE,
times = 1)
dataTrain <- x1[ trainIndex,]
dataTest <- x1[-trainIndex,]
# Model with caret
fitControl <- trainControl(## 10-fold CV
method = "repeatedcv",
number = 10,
## repeated ten times
repeats = 10,
savePredictions = "all"
)
set.seed(825)
RFFit1 <- train(Target ~ ., dataTrain,
method = "RRF",
metric = "RMSE",
trControl = fitControl,
verbose = FALSE
)
RFpred1 <- predict(RFFit1, dataTest, type = "raw")
#Translate model to list
treeList <- RF2List(RFFit1)
When I examine the output of the model, there doesn't seem to be an element called rf$ntree. I'm wondering whether this has anything to do with the fact that I am using cross-validation with caret. If so, is there a way of using cross-validation and still producing an RRF object that RF2List can interpret?
Any help greatly appreciated, as always.
Will.
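A possible explanation, offered as a sketch rather than a confirmed fix: caret's train() returns a wrapper object of class "train" and keeps the fitted model in its finalModel element, so the wrapper itself has no ntree element. The toy list below is only a stand-in for that structure (it is not a real model) and reproduces the failure mode in base R.

```r
# Stand-in for a caret "train" object: the forest sits in $finalModel,
# so the wrapper has no $ntree of its own. (Toy list, not a fitted model.)
fit <- list(method = "RRF", finalModel = list(ntree = 500))

fit$ntree  # NULL: the element does not exist on the wrapper

# vector("list", NULL) fails, which is what the error message points at
res <- tryCatch(vector("list", fit$ntree), error = function(e) e)
failed <- inherits(res, "error")

# Reaching into the underlying model gives a valid length
n_trees <- length(vector("list", fit$finalModel$ntree))

# With the real objects, the corresponding call would presumably be:
# treeList <- RF2List(RFFit1$finalModel)
```

Cross-validation itself should not be the problem under this reading; it is only a matter of which object is handed to RF2List.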
I need to merge multiple data frames on the matching values in column A. What is the most efficient way to do this and get the result below?
df1
A B C
2 x r
1 c r
3 y t
df2
A D E
3 e y
1 t t
2 y t
df3
A F G
1 g y
2 f y
3 h k
result
A B C D E F G
1 c r t t g y
2 x r y t f y
3 y t y t h k
One solution is to use the dplyr package and its inner_join, as follows:
library(dplyr)
df <- inner_join(df1, df2)
df <- inner_join(df, df3)
Resulting output is:
df
A B C D E F G
1 2 x r y t f y
2 1 c r t t g y
3 3 y t e y h k
Note, inner_join keeps only rows where A matches.
If you want it arranged by column A, you can add this line:
arrange(df, A)
A B C D E F G
1 1 c r t t g y
2 2 x r y t f y
3 3 y t e y h k
To merge a variable length list of data frames, it appears Reduce can be helpful along with the above inner_join:
df <- Reduce(inner_join, list(df1, df2, df3))
arrange(df, A)
A B C D E F G
1 1 c r t t g y
2 2 x r y t f y
3 3 y t e y h k
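If you would rather avoid the dplyr dependency, the same Reduce idea should also work with base R's merge. This is a sketch under the assumption that every data frame shares the key column A; the data frames are rebuilt here so the snippet runs on its own.

```r
# Rebuild the three example data frames from the question
df1 <- data.frame(A = c(2, 1, 3), B = c("x", "c", "y"), C = c("r", "r", "t"))
df2 <- data.frame(A = c(3, 1, 2), D = c("e", "t", "y"), E = c("y", "t", "t"))
df3 <- data.frame(A = c(1, 2, 3), F = c("g", "f", "h"), G = c("y", "y", "k"))

# Fold merge() over the list; merge() does an inner join on "A"
# and sorts the result by the key by default
merged <- Reduce(function(x, y) merge(x, y, by = "A"), list(df1, df2, df3))
merged
```

Because merge sorts on the key by default, no separate arrange step is needed here.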
Consider A,B,C,D .... as words.
I have two DFs.
df1:
ColA
A B
B C
C D
E F
G H
A M
M
df2:
ColB
A B C D X Y Z
C D M N F K L
S H A F R M T U
Operation:
I want to search for every element of df1 in df2, then either append all the matching values in a new column or create multiple rows.
Output 1:
ColB COlB
A B C D X Y Z A,A B,B C,C D
C D M N F K L C D,M
S H A F R M T U A,A M
Output2:
ColB Output
A B C D X Y Z A
A B C D X Y Z A B
A B C D X Y Z B C
A B C D X Y Z C D
C D M N F K L C D
C D M N F K L M
S H A F R M T U A
S H A F R M T U A M
I think this will do it, although it differs a bit from your expected answer, which I think is wrong.
First set up the input data frames:
# set up the data
df1 <- data.frame(ColA = c("A B",
"B C",
"C D",
"E F",
"G H",
"A M",
"M"),
stringsAsFactors = FALSE)
df2 <- data.frame(ColB = c("A B C D X Y Z",
"C D M N F K L",
"S H A F R M T"),
stringsAsFactors = FALSE)
Next we will form all the pairwise combinations of the things to search with the things to be searched:
# create a vector of patterns and items to search
intermediate <- as.vector(outer(df2$ColB, df1$ColA, paste, sep = "|"))
# split it into a list
intermediate <- strsplit(intermediate, "|", fixed = TRUE)
Then we can test each row of this full set of combinations. The core is foundMatch, a logical vector indicating, for each row, whether all elements of the ColA entry are present in the ColB entry. In your examples, order does not matter, so here we split both strings into tokens and check that every token of the search term appears among the tokens of the string being searched.
# set up the output data.frame
Output2 <- data.frame(do.call(rbind, intermediate))
names(Output2) <- c("ColB", "Output")
# here is the core, which does the element matching
foundMatch <- apply(Output2, 1, function(x) {
tokens <- strsplit(x, " ", fixed = TRUE)
all(tokens[[2]] %in% tokens[[1]])
})
# filter out the ones with the match
Output2 <- Output2[foundMatch, ]
Output2
## ColB Output
## 1 A B C D X Y Z A B
## 4 A B C D X Y Z B C
## 7 A B C D X Y Z C D
## 8 C D M N F K L C D
## 18 S H A F R M T A M
## 20 C D M N F K L M
## 21 S H A F R M T M
Not exactly what you have above but I think it's correct.
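For the first output format in the question (one row per ColB value, with the matches collapsed into a single string), a sketch along the following lines might work. It redoes the matching with expand.grid and mapply so that it runs on its own; the names pairs, contains_all, matches and collapsed are made up for illustration.

```r
df1 <- data.frame(ColA = c("A B", "B C", "C D", "E F", "G H", "A M", "M"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ColB = c("A B C D X Y Z", "C D M N F K L", "S H A F R M T"),
                  stringsAsFactors = FALSE)

# Every (ColB, ColA) pair, one per row
pairs <- expand.grid(ColB = df2$ColB, Output = df1$ColA,
                     stringsAsFactors = FALSE)

# TRUE when every space-separated token of `needle` occurs in `haystack`
contains_all <- function(haystack, needle) {
  all(strsplit(needle, " ", fixed = TRUE)[[1]] %in%
        strsplit(haystack, " ", fixed = TRUE)[[1]])
}

keep <- mapply(contains_all, pairs$ColB, pairs$Output)
matches <- pairs[keep, ]

# Collapse the matches for each ColB value into one comma-separated string
collapsed <- vapply(split(matches$Output, matches$ColB),
                    paste, character(1), collapse = ",")
collapsed
```

The result is a named character vector; data.frame(ColB = names(collapsed), Output = unname(collapsed)) would turn it back into a two-column data frame.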
It is not obvious to me how your data frames df1 and df2 are built, but you can try flattening each of them into a vector and matching the two sets.
d1 <- sort(as.character(unlist(df1)))
d2 <- sort(as.character(unlist(df2)))
# get the intersection/difference without duplicates
intersect(d1,d2)
setdiff(d1,d2)
# get all values matching with the first or with the second dataset, respectively
d1[ d1 %in% d2 ]
d2[ d2 %in% d1 ]
I have a data frame which I have subsetted using normal indexing. Code below.
dframe <- dframe[1:10, c(-3,-7:-10)]
But when I print dframe$Symbol I get this output:
BABA ORCL LFC TSM ACT ABBV MA ABEV KMI UPS
3285 Levels: A AA AA^B AAC AAN AAP AAT AAV AB ABB ABBV ABC ABEV ABG ABM ABR ABR^A ABR^B ABR^C ABRN ABT ABX ACC ACCO ACE ACG ACH ACI ACM ACN ACP ACRE ACT ACT^A ACW ADC ADM ADPT ADS ADT ADX AEB AEC AED AEE AEG AEH AEK AEL AEM AEO AEP AER AES AES^C AET AF AF^C ... ZX
I'm wondering what is happening here. Does the dframe dataframe only contain 10 rows or still all rows, but only outputs 10 rows?
Thanks
That's just the way factors work. When you subset a factor, it preserves all levels, even those that are no longer represented in the subset. For example:
f1 <- factor(letters);
f1;
## [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
f2 <- f1[1:10];
f2;
## [1] a b c d e f g h i j
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
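So the subset really does contain only the selected rows; only the levels attribute is carried over. If the unused levels get in the way, base R's droplevels() removes them. A minimal sketch with a made-up data frame (the Symbol column follows the question, the Price values are invented):

```r
# Toy data frame with a factor column
dframe <- data.frame(Symbol = factor(c("BABA", "ORCL", "LFC", "TSM")),
                     Price  = c(1, 2, 3, 4))

sub <- dframe[1:2, ]
n_rows        <- nrow(sub)             # 2: only the selected rows remain
levels_before <- nlevels(sub$Symbol)   # 4: all original levels are still listed

# Drop the levels that no longer occur in the subset
sub$Symbol <- droplevels(sub$Symbol)
levels_after <- levels(sub$Symbol)     # "BABA" "ORCL"
```

Calling droplevels() on the whole data frame does the same for every factor column at once.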
To answer your question, it's actually slightly tricky to append all missing levels to a factor. You have to combine the existing factor data with all missing indexes (here I'm referring to the integer indexes that the factor class internally uses to map the actual factor data to its levels vector, which is stored as an attribute on the factor object), and then rebuild a factor (using the original levels) from that combined data. Below I demonstrate this, now randomizing the subset taken from f1 to demonstrate that order does not matter:
set.seed(1); f3 <- sample(f1,10);
f3;
## [1] g j n u e s w m l b
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
factor(c(f3,setdiff(1:nlevels(f3),as.integer(f3))),labels=levels(f3));
## [1] g j n u e s w m l b a c d f h i k o p q r t v x y z
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
I would like to change "U" to "N" in columns 3-9, and change "H" to the value in the "type" column of the same row. For example, the "H" in the first row would be changed to "M", and so on. I would really appreciate any help with the R scripting. Thanks. XW
ID type A01 A02 A03 A04 A05 A06 A07
ss001 M C A U A A H A
ss002 R A H A A A G A
ss003 R H A G A A A U
ss004 R A U A A A A A
ss005 Y C C H T T C C
ss006 Y C T U C C C H
ss007 R A G A H G U G
ss008 K G U T G T H G
ss009 Y T H C T T U C
ss010 K T G T H G T T
This should be a pretty efficient way to do this:
M <- as.matrix(df[-c(1, 2)]) ## Faster to work on a matrix
M[M == "U"] <- "N" ## Replace "U" with "N"
H <- which(M == "H", arr.ind=TRUE) ## Identify the Hs
M[H] <- df[cbind(H[, "row"], 2)] ## Replace with values from "type"
cbind(df[1:2], M) ## Combine
# ID type A01 A02 A03 A04 A05 A06 A07
# 1 ss001 M C A N A A M A
# 2 ss002 R A R A A A G A
# 3 ss003 R R A G A A A N
# 4 ss004 R A N A A A A A
# 5 ss005 Y C C Y T T C C
# 6 ss006 Y C T N C C C Y
# 7 ss007 R A G A R G N G
# 8 ss008 K G N T G T K G
# 9 ss009 Y T Y C T T N C
# 10 ss010 K T G T K G T T
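The key step above is indexing a matrix with the two-column (row, col) matrix that which(..., arr.ind = TRUE) returns. A reduced sketch of just that mechanism, on a made-up 2 x 3 matrix with a separate type vector:

```r
# Toy matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,] "C"  "A"  "H"
# [2,] "H"  "U"  "A"
M <- matrix(c("C", "H", "A", "U", "H", "A"), nrow = 2)
type <- c("M", "R")  # one replacement value per row

# One (row, col) pair per "H" cell
H <- which(M == "H", arr.ind = TRUE)

# Indexing with a two-column matrix addresses exactly those cells;
# each cell receives the type value of its own row
M[H] <- type[H[, "row"]]
M
```

After this, M[2, 1] is "R" and M[1, 3] is "M", while the "U" in M[2, 2] is untouched because it was never selected.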
You can do this with apply called on the rows of your data:
# Read in data frame with data stored as characters
df = read.table(text="ID type A01 A02 A03 A04 A05 A06 A07
ss001 M C A U A A H A
ss002 R A H A A A G A
ss003 R H A G A A A U
ss004 R A U A A A A A
ss005 Y C C H T T C C
ss006 Y C T U C C C H
ss007 R A G A H G U G
ss008 K G U T G T H G
ss009 Y T H C T T U C
ss010 K T G T H G T T", header=T, stringsAsFactors=F)
# Manipulate rows
df.mod = as.data.frame(t(apply(df, 1, function(x) {
to.modify <- x[c(-1, -2)]
to.modify[to.modify == "U"] <- "N"
to.modify[to.modify == "H"] <- x[2]
return(c(x[1:2], to.modify))
})))
names(df.mod) <- names(df)
df.mod
# ID type A01 A02 A03 A04 A05 A06 A07
# 1 ss001 M C A N A A M A
# 2 ss002 R A R A A A G A
# 3 ss003 R R A G A A A N
# 4 ss004 R A N A A A A A
# 5 ss005 Y C C Y T T C C
# 6 ss006 Y C T N C C C Y
# 7 ss007 R A G A R G N G
# 8 ss008 K G N T G T K G
# 9 ss009 Y T Y C T T N C
# 10 ss010 K T G T K G T T