This R code:
Iris <- data.frame(rbind(iris3[,,1], iris3[,,2], iris3[,,3]),
Sp = rep(c("s","c","v"), rep(50,3)))
train <- sample(1:150, 75)
z <- MASS::lda(Sp ~ ., Iris, prior = c(1,1,1)/3, subset = train)
MASS::predict.lda(z)
gives the following error message:
Error: 'predict.lda' is not an exported object from 'namespace:MASS'
The predict.lda function of MASS is documented but, apparently, not part of the package's namespace. Why not?
This problem is important because I need to use predict.lda in a package of my own and this error is making it fail the CRAN checks.
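predict.lda is an S3 method: MASS registers it for the predict generic but does not export it, which is why MASS::predict.lda fails. Load the package and call predict, which dispatches to it: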
library(MASS)
predict(z)
Alternatively, use the ::: operator to reach the unexported function. According to ?":::":
Accessing exported and internal variables, i.e. R objects (including lazy loaded data sets) in a namespace.
MASS:::predict.lda(z)
#$class
# [1] v s s s s c s v s v v v v c v v c v c s s s s c c v c v v c s s v c s s c v s c v v s c s c s c c s v c s s c s s c c c s c s v
#[65] v v v s c s c v v s s
#Levels: c s v
#$posterior
# c s v
#107 3.513603e-03 1.352029e-37 9.964864e-01
#37 2.749629e-26 1.000000e+00 5.088976e-50
# ...
Another option is to fetch the function from the namespace with getFromNamespace:
predictlda <- getFromNamespace("predict.lda", "MASS")
predictlda(z)
#$class
# [1] v s s s s c s v s v v v v c v v c v c s s s s c c v c v v c s s v c s s c v s c v v s c s c s c c s v c s s c s s c c c s c s v
#[65] v v v s c s c v v s s
#Levels: c s v
#$posterior
# c s v
#107 3.513603e-03 1.352029e-37 9.964864e-01
#37 2.749629e-26 1.000000e+00 5.088976e-50
# ..
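If the goal is to use this inside your own package, a sketch that avoids ::: altogether relies on S3 dispatch: calling MASS::lda() loads the MASS namespace, which registers predict.lda for the predict generic, so stats::predict() finds it without library(MASS). The data setup repeats the question's; set.seed() and the explicit newdata argument are additions for reproducibility. Listing MASS under Imports in your DESCRIPTION is an assumption about your package setup.

```r
# Same data as in the question; the seed is an addition for reproducibility.
set.seed(42)
Iris <- data.frame(rbind(iris3[, , 1], iris3[, , 2], iris3[, , 3]),
                   Sp = rep(c("s", "c", "v"), rep(50, 3)))
train <- sample(1:150, 75)
z <- MASS::lda(Sp ~ ., Iris, prior = c(1, 1, 1) / 3, subset = train)

# Call the generic; dispatch reaches the registered (unexported) method.
pred <- stats::predict(z, newdata = Iris[-train, ])
table(pred$class)

# The method is registered for S3 dispatch even though it is not exported.
is.function(utils::getS3method("predict", "lda"))
"predict.lda" %in% getNamespaceExports("MASS")
```

Because only the exported generic is called, this form should not trigger the "not an exported object" error.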
I want to add a total row (as in the Excel tables) while writing my data.frame in a worksheet.
Here is my present code (using openxlsx):
writeDataTable(wb=WB, sheet="Data", x=X, withFilter=F, bandedRows=F, firstColumn=T)
X contains a data.frame with 8 character variables and 1 numeric variable. Therefore the total row should only contain a total for the numeric column. Ideally I would add Excel's own total row feature while writing the table to the workbook object, as I did with firstColumn, rather than adding a total row manually.
I searched for a solution both on StackOverflow and in the official openxlsx documentation, but to no avail. Please suggest solutions using openxlsx.
EDIT:
Adding data sample:
A B C D E F G H I
a b s r t i s 5 j
f d t y d r s 9 s
w s y s u c k 8 f
After Total row:
A B C D E F G H I
a b s r t i s 5 j
f d t y d r s 9 s
w s y s u c k 8 f
na na na na na na na 22 na
library(janitor)
adorn_totals(df, "row")
#> A B C D E F G H I
#> a b s r t i s 5 j
#> f d t y d r s 9 s
#> w s y s u c k 8 f
#> Total - - - - - - 22 -
If you prefer empty space instead of - in the character columns you can specify fill = "" or fill = NA.
Assuming your data is stored in a data.frame called df:
df <- read.table(text =
"A B C D E F G H I
a b s r t i s 5 j
f d t y d r s 9 s
w s y s u c k 8 f",
header = TRUE,
stringsAsFactors = FALSE)
You can create a row using lapply
totals <- lapply(df, function(col) {
if (is.numeric(col)) sum(col) else NA  # total numeric columns only
})
and add it to df using rbind()
df <- rbind(df, totals)
head(df)
A B C D E F G H I
1 a b s r t i s 5 j
2 f d t y d r s 9 s
3 w s y s u c k 8 f
4 <NA> <NA> <NA> <NA> <NA> <NA> <NA> 22 <NA>
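Neither snippet above writes to the workbook. One possible way to do that with openxlsx is sketched here, under the assumption that a plain cell below the table is acceptable. This is not Excel's built-in total-row feature, and the cell sits outside the table range. The idea is to write the table as in the question, then write each numeric column's sum one row below it with writeData(). X is a cut-down stand-in for the question's data, and out_file is an illustrative temporary path.

```r
library(openxlsx)

# Cut-down stand-in for the question's data: character columns plus one numeric.
X <- data.frame(A = c("a", "f", "w"),
                H = c(5, 9, 8),
                I = c("j", "s", "f"),
                stringsAsFactors = FALSE)

WB <- createWorkbook()
addWorksheet(WB, "Data")
writeDataTable(wb = WB, sheet = "Data", x = X,
               withFilter = FALSE, bandedRows = FALSE, firstColumn = TRUE)

# Row 1 holds the header and rows 2..(nrow(X) + 1) the data,
# so the totals go into row nrow(X) + 2.
total_row <- nrow(X) + 2
num_cols <- which(vapply(X, is.numeric, logical(1)))
for (j in num_cols) {
  writeData(WB, sheet = "Data", x = sum(X[[j]]),
            startCol = j, startRow = total_row)
}

out_file <- tempfile(fileext = ".xlsx")
saveWorkbook(WB, out_file, overwrite = TRUE)
```

Only the numeric columns receive a value; the character columns are simply left empty in the totals row.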
Consider A,B,C,D .... as words.
I have two DFs.
df1:
ColA
A B
B C
C D
E F
G H
A M
M
df2:
ColB
A B C D X Y Z
C D M N F K L
S H A F R M T U
Operation:
I want to search for every element of df1 in df2, and then either append all the matching values in a new column or create multiple rows.
Output 1:
ColB COlB
A B C D X Y Z A,A B,B C,C D
C D M N F K L C D,M
S H A F R M T U A,A M
Output2:
ColB Output
A B C D X Y Z A
A B C D X Y Z A B
A B C D X Y Z B C
A B C D X Y Z C D
C D M N F K L C D
C D M N F K L M
S H A F R M T U A
S H A F R M T U A M
I think this will do it, although it differs a bit from your expected answer, which I think is wrong.
First set up the input data frames:
# set up the data
df1 <- data.frame(ColA = c("A B",
"B C",
"C D",
"E F",
"G H",
"A M",
"M"),
stringsAsFactors = FALSE)
df2 <- data.frame(ColB = c("A B C D X Y Z",
"C D M N F K L",
"S H A F R M T"),
stringsAsFactors = FALSE)
Next we will form all the pairwise combinations of the things to search with the things to be searched:
# create a vector of patterns and items to search
intermediate <- as.vector(outer(df2$ColB, df1$ColA, paste, sep = "|"))
# split it into a list
intermediate <- strsplit(intermediate, "|", fixed = TRUE)
Then we can create a function to match the elements for each row of this full combination dataset. The core is foundMatch, a logical vector indicating whether all elements in ColA were present in ColB. In your examples, order does not matter, so here we split the elements and check that all of the first are in the second.
# set up the output data.frame
Output2 <- data.frame(do.call(rbind, intermediate))
names(Output2) <- c("ColB", "Output")
# here is the core, which does the element matching
foundMatch <- apply(Output2, 1, function(x) {
tokens <- strsplit(x, " ", fixed = TRUE)
all(tokens[[2]] %in% tokens[[1]])
})
# filter out the ones with the match
Output2 <- Output2[foundMatch, ]
Output2
## ColB Output
## 1 A B C D X Y Z A B
## 4 A B C D X Y Z B C
## 7 A B C D X Y Z C D
## 8 C D M N F K L C D
## 18 S H A F R M T A M
## 20 C D M N F K L M
## 21 S H A F R M T M
Not exactly what you have above but I think it's correct.
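For the first requested layout (one row per ColB entry, with all matches collapsed into one string), here is a sketch using the same matching rule: a df1 entry matches when all of its words occur in the df2 entry. Matches and split_words are illustrative names, and df2 uses this answer's data, without the trailing U from the question.

```r
df1 <- data.frame(ColA = c("A B", "B C", "C D", "E F", "G H", "A M", "M"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(ColB = c("A B C D X Y Z",
                           "C D M N F K L",
                           "S H A F R M T"),
                  stringsAsFactors = FALSE)

# Split a character vector into a list of word vectors.
split_words <- function(x) strsplit(x, " ", fixed = TRUE)

patterns <- split_words(df1$ColA)

# For each ColB entry, keep the df1 entries whose words are all present,
# then collapse them into one comma-separated string.
df2$Matches <- vapply(split_words(df2$ColB), function(words) {
  hit <- vapply(patterns, function(p) all(p %in% words), logical(1))
  paste(df1$ColA[hit], collapse = ",")
}, character(1))

df2
```

This avoids building the full pairwise data frame, at the cost of a nested loop over the two inputs.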
It is not obvious to me how your data.frames df1 and df2 are built, but you can try to vectorise your data and match both sets.
d1 <- sort(as.character(unlist(df1)))
d2 <- sort(as.character(unlist(df2)))
# get the intersection/difference without duplicates
intersect(d1,d2)
setdiff(d1,d2)
# get all values matching with the first or with the second dataset, respectively
d1[ d1 %in% d2 ]
d2[ d2 %in% d1 ]
I am trying to do something very simple with data.table and I have lost track of the idiomatic way to do it:
library(data.table)
set.seed(1)
DT = data.table(a=sample(letters,1e5,T), b=sample(letters,1e5,T), c=rnorm(1e5))
DT2 = data.table(a=sample(letters,5,T), b=sample(letters,5,T))
DT2
a b
1: k h
2: e v
3: f n
4: m q
5: w v
I want to select the rows of DT that match those of DT2.
As such, the number of rows after the operation will always be smaller than in the initial table.
I want something doing this:
> DT[paste(a,b) %chin% DT2[,paste(a,b)]]
a b c
1: m q -0.4974579
2: e v -0.1325602
3: w v -1.8081050
4: m q 0.9025120
5: w v -0.4958802
---
729: f n 0.5604650
730: f n -1.2607321
731: m q 0.5146013
732: m q -1.8329656
733: k h -0.9752011
> DT2[paste(a,b) %chin% DT[,paste(a,b)]]
a b
1: e v
2: f n
3: k h
4: m q
5: w v
>
An inner join should do:
setkey(DT, a, b)[DT2, nomatch=0]
Produces:
a b c
1: k h -1.6592442
2: k h 1.1946471
3: k h -0.8694933
4: k h 0.7789158
5: k h -1.3142607
---
729: w v -0.3516787
730: w v 0.5272145
731: w v -0.7531717
732: w v 0.3352228
733: w v 0.1182353
If you want to know which values in DT2 exist in DT then:
unique(setkey(DT[, .(a, b)], a, b))[DT2, nomatch=0]
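The inner join above can also be written without setkey(), which sorts DT by reference as a side effect. Here is a minimal sketch on a small made-up table (the values are illustrative, not the question's data), using the on= argument. nomatch = NULL is the newer spelling of nomatch = 0 and assumes a reasonably recent data.table.

```r
library(data.table)

# Small illustrative tables; not the data from the question.
DT  <- data.table(a = c("x", "y", "x", "z"),
                  b = c("p", "q", "p", "r"),
                  c = 1:4)
DT2 <- data.table(a = c("x", "z"),
                  b = c("p", "r"))

# Inner join on both columns; DT is left unkeyed and unsorted.
res <- DT[DT2, on = .(a, b), nomatch = NULL]
res

# To keep the original row order of DT, ask for row numbers instead.
idx <- DT[DT2, on = .(a, b), which = TRUE, nomatch = NULL]
DT[sort(idx)]
```

If DT2 could contain duplicated rows, the join would repeat the matching rows of DT; joining against unique(DT2) avoids that.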
I have a data frame which I have subsetted using normal indexing. Code below.
dframe <- dframe[1:10, c(-3,-7:-10)]
But when I write dframe$Symbol I get the output.
BABA ORCL LFC TSM ACT ABBV MA ABEV KMI UPS
3285 Levels: A AA AA^B AAC AAN AAP AAT AAV AB ABB ABBV ABC ABEV ABG ABM ABR ABR^A ABR^B ABR^C ABRN ABT ABX ACC ACCO ACE ACG ACH ACI ACM ACN ACP ACRE ACT ACT^A ACW ADC ADM ADPT ADS ADT ADX AEB AEC AED AEE AEG AEH AEK AEL AEM AEO AEP AER AES AES^C AET AF AF^C ... ZX
I'm wondering what is happening here. Does the dframe dataframe only contain 10 rows or still all rows, but only outputs 10 rows?
Thanks
That's just the way factors work. When you subset a factor, it preserves all levels, even those that are no longer represented in the subset. For example:
f1 <- factor(letters);
f1;
## [1] a b c d e f g h i j k l m n o p q r s t u v w x y z
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
f2 <- f1[1:10];
f2;
## [1] a b c d e f g h i j
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
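So the subset really does hold only the selected elements, and the long Levels line is metadata carried over from the original factor. Here is a minimal sketch using the standard droplevels() function to discard the unused levels. It also accepts a whole data frame, so droplevels(dframe) should cover the case in the question.

```r
f1 <- factor(letters)
f2 <- f1[1:10]

length(f2)   # number of elements actually kept
nlevels(f2)  # levels are inherited from f1

# Drop the levels that no longer occur in the subset.
f2_dropped <- droplevels(f2)
nlevels(f2_dropped)

# Subsetting with drop = TRUE has the same effect in one step.
f2_alt <- f1[1:10, drop = TRUE]
identical(levels(f2_alt), levels(f2_dropped))
```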
To answer your question, it's actually slightly tricky to append all missing levels to a factor. You have to combine the existing factor data with all missing indexes (here I'm referring to the integer indexes that the factor class internally uses to map the actual factor data to its levels vector, which is stored as an attribute on the factor object), and then rebuild a factor (using the original levels) from that combined data. Below I demonstrate this, now randomizing the subset taken from f1 to demonstrate that order does not matter:
set.seed(1); f3 <- sample(f1,10);
f3;
## [1] g j n u e s w m l b
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
factor(c(f3,setdiff(1:nlevels(f3),as.integer(f3))),labels=levels(f3));
## [1] g j n u e s w m l b a c d f h i k o p q r t v x y z
## Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
I would like to change "U" to "N" in columns 3-9, and change "H" to the character in column "type" of the same row. For example, "H" in the first row would be changed to "M", and so on. I would really appreciate any help with the R scripting. Thanks. XW
ID type A01 A02 A03 A04 A05 A06 A07
ss001 M C A U A A H A
ss002 R A H A A A G A
ss003 R H A G A A A U
ss004 R A U A A A A A
ss005 Y C C H T T C C
ss006 Y C T U C C C H
ss007 R A G A H G U G
ss008 K G U T G T H G
ss009 Y T H C T T U C
ss010 K T G T H G T T
This should be a pretty efficient way to do this:
M <- as.matrix(df[-c(1, 2)]) ## Faster to work on a matrix
M[M == "U"] <- "N" ## Replace "U" with "N"
H <- which(M == "H", arr.ind=TRUE) ## Identify the Hs
M[H] <- df[cbind(H[, "row"], 2)] ## Replace with values from "type"
cbind(df[1:2], M) ## Combine
# ID type A01 A02 A03 A04 A05 A06 A07
# 1 ss001 M C A N A A M A
# 2 ss002 R A R A A A G A
# 3 ss003 R R A G A A A N
# 4 ss004 R A N A A A A A
# 5 ss005 Y C C Y T T C C
# 6 ss006 Y C T N C C C Y
# 7 ss007 R A G A R G N G
# 8 ss008 K G N T G T K G
# 9 ss009 Y T Y C T T N C
# 10 ss010 K T G T K G T T
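The matrix-indexing step may be easier to follow on a smaller case. In this sketch (three made-up rows in the same layout as the question's data), which(..., arr.ind = TRUE) returns one (row, col) pair per "H", and the row numbers alone are enough to look up the replacement in type. Indexing df$type directly is a slight simplification of the lookup used above.

```r
# Three illustrative rows in the same layout as the question's data.
df <- data.frame(ID   = c("ss001", "ss002", "ss003"),
                 type = c("M", "R", "Y"),
                 A01  = c("C", "A", "H"),
                 A02  = c("A", "H", "A"),
                 A03  = c("U", "A", "G"),
                 stringsAsFactors = FALSE)

M <- as.matrix(df[-c(1, 2)])         # marker columns only
M[M == "U"] <- "N"                   # plain value replacement

H <- which(M == "H", arr.ind = TRUE) # one (row, col) pair per "H"
H
M[H] <- df$type[H[, "row"]]          # replace each "H" by its row's type

res <- cbind(df[1:2], M)
res
```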
You can do this with apply called on the rows of your data:
# Read in data frame with data stored as characters
df = read.table(text="ID type A01 A02 A03 A04 A05 A06 A07
ss001 M C A U A A H A
ss002 R A H A A A G A
ss003 R H A G A A A U
ss004 R A U A A A A A
ss005 Y C C H T T C C
ss006 Y C T U C C C H
ss007 R A G A H G U G
ss008 K G U T G T H G
ss009 Y T H C T T U C
ss010 K T G T H G T T", header=TRUE, stringsAsFactors=FALSE)
# Manipulate rows
df.mod = as.data.frame(t(apply(df, 1, function(x) {
to.modify <- x[c(-1, -2)]
to.modify[to.modify == "U"] <- "N"
to.modify[to.modify == "H"] <- x[2]
return(c(x[1:2], to.modify))
})))
names(df.mod) <- names(df)
df.mod
# ID type A01 A02 A03 A04 A05 A06 A07
# 1 ss001 M C A N A A M A
# 2 ss002 R A R A A A G A
# 3 ss003 R R A G A A A N
# 4 ss004 R A N A A A A A
# 5 ss005 Y C C Y T T C C
# 6 ss006 Y C T N C C C Y
# 7 ss007 R A G A R G N G
# 8 ss008 K G N T G T K G
# 9 ss009 Y T Y C T T N C
# 10 ss010 K T G T K G T T