Why does MuPAD replace my E(s) here?

I have the code:
C(s):=E(s)*G(s);
B(s):=C(s)*H(s);
openLoopTransferFunction:=B(s)/E(s)
Why does MuPAD give e(s) as output?
MuPAD output:
G(s)*exp(1)(s) (or written G(s)*e(s) in blue)
How can I stop MuPAD from automatically changing my E(s)?
Bonus: assigning to E(s) does not work either:
E(s):=R(s)-B(s)
Error: The identifier 'E' is protected. [_assign]

There are some protected identifiers in MuPAD. Usually, they use uppercase letters. For example, I denotes the imaginary unit, and E represents Euler's number. See the difference between entering E (upright letter 'e') and e (slanted letter 'e').
If you want to use any of these identifiers in your fashion, you can unprotect them:
unprotect(E)
results in:
ProtectLevelError
which is not an error message but the former protection level that you just changed. Check that the change was successful by entering the same command again. This time, you will get:
ProtectLevelNone
Now you can assign to E as you would to any other identifier:
E(s) := R(s) - B(s);
will give the expected result.
If you do not want to define E explicitly, delete it after lifting the protection:
delete(E)
Then it will be available like any other symbol.
In either case, if you need Euler's number later, you will have to use exp(1) instead of E.
To get a list of all length-one identifiers, type:
select(op(map(op(anames(All)), expr2text)), x -> bool(length(x) = 1))
giving:
"E", "I", "O", "D"
Similarly, for length two, this gives:
"N_", "Re", "R_", "Si", "C_", "is", "Z_", "op", "id", "Li", "ln", "Im", "Ax",
"Q_", "fp", "Ci", "Ei"


calling an API in local Rstudio with multiple values as input for one variable

I can successfully call the API from local RStudio with this script (the API is published on Domino Data Lab):
library(httr)
library(rjson)
url <- "XXX"
response <- POST(
  url,
  authenticate("XXX", "XXX", type = "basic"),
  body = toJSON(list(data = list(
    "product_list_Zone" = "ZONE 1",
    "product_list_brand" = "A1",
    "product_list_FY22SRP" = 390,
    "commodityPrice" = 415.59))),
  content_type("application/json"))
However, I want to pass multiple values for both "product_list_brand" and "product_list_FY22SRP". Say "product_list_brand" can be A1 and A2, and "product_list_FY22SRP" will be 390 (corresponding to A1) and 400 (corresponding to A2).
I tried replacing "product_list_brand"= "A1" with "product_list_brand"= c("A1, A2"), but this does not work. I also tried "product_list_brand"= toJSON(c("A1, A2")), which does not work either.
So how do I get correct output with multiple values for each of the "product_list_brand" and "product_list_FY22SRP" inputs? I think it is a format issue: calling my function directly with "product_list_brand"= c("A1, A2") works, but the same input fails when it goes through the API.
(One interesting thing: I have to use the format body=toJSON(list(data=list())) here, as per the example from Domino Data Lab; other formats have produced errors so far.)
Thanks in advance!
Whether or not you can pass multiple values in one argument is controlled wholly on the API side; it has nothing to do with the client. If the remote end does not support it, the only thing you can do is repeat the query with individual values.
Perhaps:
func <- function(url, bodies, ..., expand = FALSE) {
  # one row per request: all combinations (expand) or values recycled side by side
  bodies <- if (expand) do.call(expand.grid, bodies) else as.data.frame(bodies)
  # build one JSON body per row
  bodies <- do.call(mapply, c(list(FUN = function(...) jsonlite::toJSON(list(data = list(...)))), bodies))
  # one POST per body; the extra arguments in ... are passed on to httr::POST
  lapply(bodies, function(body) do.call(httr::POST, c(list(url = url, body = body), ...)))
}
func("XXX",
     bodies = list("product_list_Zone" = "ZONE 1", "product_list_brand" = c("A1", "A2"),
                   "product_list_FY22SRP" = 390, "commodityPrice" = 415.59),
     httr::authenticate("XXX", "XXX", type = "basic"),
     httr::content_type("application/json"))
This should always return a list in which each element is the result of one call to the API.
The only difference that expand= makes is if/when you provide more than one multi-value argument. For an example, I'll use two zones and two brands, and interrupt func before the POST step so that it shows the body that would be passed in each call:
cat(sep = "\n",
func("XXX", bodies = list("product_list_Zone" = c("ZONE 1", "ZONE 2"), "product_list_brand" = c("A1","A2"), "product_list_FY22SRP" = 390, "commodityPrice" = 415.59), httr::authenticate("XXX", "XXX", type = "basic"), httr::content_type("application/json"))
)
# {"data":{"product_list_Zone":["ZONE 1"],"product_list_brand":["A1"],"product_list_FY22SRP":[390],"commodityPrice":[415.59]}}
# {"data":{"product_list_Zone":["ZONE 2"],"product_list_brand":["A2"],"product_list_FY22SRP":[390],"commodityPrice":[415.59]}}
cat(sep = "\n",
func("XXX", bodies = list("product_list_Zone" = c("ZONE 1", "ZONE 2"), "product_list_brand" = c("A1","A2"), "product_list_FY22SRP" = 390, "commodityPrice" = 415.59), httr::authenticate("XXX", "XXX", type = "basic"), httr::content_type("application/json"), expand = TRUE)
)
# {"data":{"product_list_Zone":["ZONE 1"],"product_list_brand":["A1"],"product_list_FY22SRP":[390],"commodityPrice":[415.59]}}
# {"data":{"product_list_Zone":["ZONE 2"],"product_list_brand":["A1"],"product_list_FY22SRP":[390],"commodityPrice":[415.59]}}
# {"data":{"product_list_Zone":["ZONE 1"],"product_list_brand":["A2"],"product_list_FY22SRP":[390],"commodityPrice":[415.59]}}
# {"data":{"product_list_Zone":["ZONE 2"],"product_list_brand":["A2"],"product_list_FY22SRP":[390],"commodityPrice":[415.59]}}
With expand=FALSE (the default), it is assumed that the arguments will "recycle" (in R's sense) naturally. Every vector has one of these lengths:
length 1, in which case it is repeated to the length of the longest argument
length n, the length of the longest argument
a length that divides n evenly (side point: this is the feature of R's recycling that I do not think is a sane default ...)
For example, data.frame(a=1, b=1:2, d=1:6) will not err, and shows all three of the rules above.
With expand=TRUE, though, it uses expand.grid to find all combinations of all arguments.
Caveat emptor: I did not test this with an actual API, so the POST functionality might need some tweaking.
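The difference between the two modes can be seen without touching any API. The sketch below uses only base R; the list `args` and its element names are made up for illustration and simply mirror the shape of the bodies argument above.

```r
# Hypothetical arguments: two multi-value entries and one single value.
args <- list(zone  = c("ZONE 1", "ZONE 2"),
             brand = c("A1", "A2"),
             price = 390)

# expand = FALSE: vectors are recycled side by side, one row per position.
recycled <- as.data.frame(args, stringsAsFactors = FALSE)

# expand = TRUE: every combination of the values, one row per combination.
expanded <- do.call(expand.grid, c(args, list(stringsAsFactors = FALSE)))

nrow(recycled)  # 2 rows: (ZONE 1, A1) and (ZONE 2, A2)
nrow(expanded)  # 4 rows: 2 zones x 2 brands
```

Each row of either data frame corresponds to one request body, so the row count is also the number of POST calls that would be made.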

Problem with upset plot intersection numbers

I have four sets A, B, C and D like below:
A <- c("ENSG00000103472", "ENSG00000130600", "ENSG00000177335", "ENSG00000177337",
"ENSG00000178977", "ENSG00000180139", "ENSG00000180539", "ENSG00000187621",
"ENSG00000188511", "ENSG00000197099", "ENSG00000203446", "ENSG00000203739",
"ENSG00000203804", "ENSG00000204261", "ENSG00000204282", "ENSG00000204584",
"ENSG00000205056", "ENSG00000205837", "ENSG00000206337", "ENSG00000213057")
B <- c("ENSG00000146521", "ENSG00000165511", "ENSG00000174171", "ENSG00000176659",
"ENSG00000179428", "ENSG00000179840", "ENSG00000180539", "ENSG00000204261",
"ENSG00000204282", "ENSG00000204949", "ENSG00000206337", "ENSG00000223534",
"ENSG00000223552", "ENSG00000223725", "ENSG00000226252", "ENSG00000226751",
"ENSG00000226777", "ENSG00000227066", "ENSG00000227260", "ENSG00000227403")
C <- c("ENSG00000167912", "ENSG00000168405", "ENSG00000172965", "ENSG00000177234",
"ENSG00000177699", "ENSG00000177822", "ENSG00000179428", "ENSG00000179840",
"ENSG00000180139", "ENSG00000181800", "ENSG00000181908", "ENSG00000183674",
"ENSG00000189238", "ENSG00000196668", "ENSG00000196979", "ENSG00000197301",
"ENSG00000203446", "ENSG00000203999", "ENSG00000204261", "ENSG00000206337")
D <- c("ENSG00000122043", "ENSG00000162888", "ENSG00000167912", "ENSG00000176320",
"ENSG00000177699", "ENSG00000179253", "ENSG00000179428", "ENSG00000179840",
"ENSG00000180539", "ENSG00000181800", "ENSG00000185433", "ENSG00000188511",
"ENSG00000189238", "ENSG00000197301", "ENSG00000205056", "ENSG00000205562",
"ENSG00000213279", "ENSG00000214922", "ENSG00000215533", "ENSG00000218018")
An upset plot gave me following result:
library(UpSetR)
mine <- list("A" = A,
"B" = B,
"C" = C,
"D" = D)
upset(fromList(mine), keep.order = TRUE)
But I'm interested in looking at intersections between specific sets. A & B, A & C, A & D. So, I did it like below:
upset(fromList(mine), intersections = list(list("A"),list("B"),list("C"),
list("D"),list("A", "B"),
list("A", "C"),
list("A", "D")), keep.order = TRUE)
But A & B have 4 elements in common, A & C have 4, and A & D have 3. Why does the above upset plot show the wrong numbers?
How can I make it show the correct number of common elements? I don't want the elements common to all sets.
The numbers are correct! They are just counted differently from what you expect.
There are different ways to calculate set intersection sizes:
"distinct" mode: elements that are in the listed sets and in no other set
"intersect" mode: elements that are in all of the listed sets, whether or not they are also in other sets
"union" mode: elements that are in at least one of the listed sets
UpSetR uses the "distinct" mode.
The "intersect" mode may be what the user expects.
The ComplexHeatmap and ComplexUpset packages allow the user to choose which mode to use.
I found a thorough explanation by Jakob Rosenthal here: https://github.com/hms-dbmi/UpSetR/issues/72 (see especially the graphic in that issue).
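To make the modes concrete, here is a minimal base-R sketch. The three toy sets are made up for illustration (they are not the gene lists from the question), and the mode names are used the way they are described above.

```r
# Toy sets, made up for illustration.
A <- c("g1", "g2", "g3", "g4")
B <- c("g2", "g3", "g5")
C <- c("g3", "g4", "g6")

# "intersect" mode: everything A and B share, whatever else contains it.
intersect_AB <- intersect(A, B)                 # "g2" "g3"

# "distinct" mode: shared by A and B and absent from every other set (here C).
distinct_AB <- setdiff(intersect(A, B), C)      # "g2"

# "union" mode: everything found in A or B.
union_AB <- union(A, B)                         # "g1" "g2" "g3" "g4" "g5"

c(intersect = length(intersect_AB),
  distinct  = length(distinct_AB),
  union     = length(union_AB))                 # 2, 1, 5
```

In this toy case a "distinct" count for A & B is 1 while a plain intersect() gives 2, which is the same kind of gap as the one in the question.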

Classifying tags in R with grepl in ifelse

I am having an issue with some R code. I am trying to classify text values from a column into a new column. My data is a collection of tags used on the gis.stackexchange site, which has ~2,500 rows. My goal is to classify the tags as either COTS, FOSS, or other. Reviewing the tags, there are two "scenarios": tags that are used once (e.g. anaconda) and tags that have a term used multiple times (e.g. qgis, qgis-desktop, qgis-server, etc.). This is true for both COTS and FOSS tags.
My approach was to do the following:
create a vector with all tags that represent FOSS
create a vector with all tags that represent COTS
create a new column called software and code using ifelse
ifelse - where the tagName is %in% FOSS then code as FOSS
in the ifelse use grep on the FOSS vector to pattern match tags that may be used multiple times (i.e. qgis) and code as FOSS
Repeat this for COTS
I am getting an issue where the last grep (COTS) is being coded as FOSS. Obviously there is something wrong, but I cannot seem to figure out the issue. Below is the code and a link to the source data.
Shared folder with source CSV
Tag vectors -- FOSS and COTS
foss <- c("anaconda", "android", "apache", "aptana", "google", "blender", "cordova",
"docker", "drupal", "eclipse", "facebook", "firefox", "ftools", "fwtools",
"geodjango", "geopandas", "geomoose", "geonetwork", "geonode", "geotools",
"ggmap", "ggplot2", "gimp", "github", "gme", "chrome", "gvsig", "h2gis",
"hadoop", "inkscape", "lastools", "laszip", "mongodb", "neo4j", "numpy",
"open-data-kit", "opencv", "opendronemap", "openev", "opengeo-suite-composer",
"opengl", "openjump", "openstreetmap", "opentopomap", "opentripplanner", "openwind",
"orfeo-toolbox", "pandas", "pdal", "pgrouting", "pg2shape", "phonegap",
"plpgsql", "ppygis", "pydev", "pygdal", "pyproj", "pyqspatialite", "rasterlite",
"raster2pgsql", "rdal", "saga", "shapely", "shp2pgsql", "sp", "sf",
"spatialite-gui", "three-js", "unity3d", "wordpress", "youtube", "bing-maps",
"dropbox", "instagram", "sketchup", "carto", "django", "gdal", "geoserver",
"grass", "jupyter", "leaflet", "mapbox", "matplotlib", "mysql", "ogr", "openlayers",
"osgeo", "osm", "pgadmin", "postgis", "postgresql", "proj4", "pyqgis", "qgis",
"qt", "scikit", "scipy", "tilemill")
cots <- c("autodesk", "bentley", "cityengine", "drone2map", "ecognition", "envi", "er-mapper",
"et-geowizards", "excel", "geomatica", "geosoft", "global-mapper", "illustrator",
"mac", "matlab", "microstation", "modelbuilder", "pix4d", "plsql", "powerpoint",
"silverlight", "spss", "tableau", "xtools-pro", "mapinfo", "arc", "oracle",
"erdas", "esri", "fme", "microsoft", "-analyst")
Create new column with classified values calculated based on tag vector
tags$software <- ifelse(tags$tagName %in% foss, "FOSS",
ifelse(grep(foss, tags$tagName, fixed = TRUE), "FOSS",
ifelse(tags$tagName %in% cots, "COTS",
ifelse(grep(cots, tags$tagName, fixed = TRUE), "COTS",
"other"))))
When I run the code the following error is produced: argument 'pattern' has length > 1 and only the first element will be used
I am sure it is a very simple issue, but I cannot seem to figure it out.
With tidyverse:
library(dplyr)
tags <- data.frame(tagName = c("opengl", "openglGHSAJKGNKS", "arc", "arc93257", "asnsgn"))
tags %>%
  mutate(software = case_when(
    tagName %in% foss ~ "FOSS",
    grepl(paste(foss, collapse = "|"), tagName) ~ "FOSS",
    tagName %in% cots ~ "COTS",
    grepl(paste(cots, collapse = "|"), tagName) ~ "COTS",
    TRUE ~ "other"))
           tagName software
1           opengl     FOSS
2 openglGHSAJKGNKS     FOSS
3              arc     COTS
4         arc93257     COTS
5           asnsgn    other
Two things. First, you need grepl() rather than grep(), because ifelse() needs a logical vector. Second, grepl() only uses the first element when the pattern is a character vector, so you need to collapse the vector into a single pattern like "anaconda|android|..." and omit fixed = TRUE so that | is interpreted as regex alternation.
This should do it:
tags$software <- ifelse(tags$tagName %in% foss, "FOSS",
                 ifelse(grepl(paste(foss, collapse = "|"), tags$tagName), "FOSS",
                 ifelse(tags$tagName %in% cots, "COTS",
                 ifelse(grepl(paste(cots, collapse = "|"), tags$tagName), "COTS",
                        "other"))))

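If you would rather keep fixed = TRUE (so that no tag is ever read as a regular expression), one alternative is to test each pattern separately and OR the results together. This is only a sketch: `matches_any` is a made-up helper, and the vectors are shortened, illustrative stand-ins for foss, cots and tags$tagName.

```r
# Shortened, illustrative stand-ins for the real vectors.
foss <- c("qgis", "postgis", "gdal")
cots <- c("arc", "esri", "-analyst")
tag_names <- c("qgis-server", "arcgis-desktop", "spatial-analyst", "python")

# TRUE where at least one pattern occurs literally in the tag name.
matches_any <- function(patterns, x) {
  Reduce(`|`, lapply(patterns, grepl, x = x, fixed = TRUE))
}

software <- ifelse(matches_any(foss, tag_names), "FOSS",
            ifelse(matches_any(cots, tag_names), "COTS", "other"))
software  # "FOSS" "COTS" "COTS" "other"
```

Either approach matches substrings, so a short pattern such as "sp" or "arc" will also hit tags that merely contain those letters. Because an exact match is also a substring match, the separate %in% tests are not needed in this version.
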
String pulled directly from source data seems to not match string in source data

I have a string that is failing to evaluate as a match with itself. I am trying to do a simple subset based on one of 8 possible values in a column,
out <- df[df$`Var name` == "string",]
I've had it work multiple times with different strings but for some reason this string fails. I have tried to get the exact string (thinking there may be some character encoding issue) from the source using the four below avenues but have had no success. Even when I make an explicit call to a cell I know contains that string and copy that into an evaluation statement it fails
> df[i,j]
[1] "string"
df[i,j]=="string" # pasted from above line
I don't understand how I can be explicitly pasting the output I was just given and it not match.
## attempts to get exact string to paste into subset statement
# from dput
"IF APPLICABLE – Which of the following best characterizes the expectations with"
# from calling a specific row/col (df[i, j])
[1] "IF APPLICABLE – Which of the following best characterizes the expectations with"
# from the source pane of rstudio
IF APPLICABLE – Which of the following best characterizes the expectations with
# from the source excel file
IF APPLICABLE – Which of the following best characterizes the expectations with
I don't have a clue what could be going on here. I am explicitly drawing the string straight from the data and yet it still fails to evaluate as true. Is there something going on in the background that I'm not seeing? Am I overlooking something ridiculously simple?
edit:
I subset based on another way, below is a dput and actual example of what I'm doing:
> dput(temp)
structure(list(`Item Stem` = "IF APPLICABLE – Which of the following best characterizes the expectations with",
`Item Response` = "It was required.", orgchar_group = "locale",
`Org Characteristic` = "Rural", N = 487, percent = 34.5145287030475,
`Graphs note` = NA_character_, `Report note` = NA_character_,
`Other note` = NA_character_, subsig = 1, overall = 0, varname = NA_character_,
statsig = NA_real_, use = NA_real_, difference = 9.16044821292665), .Names = c("Item Stem",
"Item Response", "orgchar_group", "Org Characteristic", "N",
"percent", "Graphs note", "Report note", "Other note", "subsig",
"overall", "varname", "statsig", "use", "difference"), row.names = 288L, class = "data.frame")
> temp[1,1]
[1] "IF APPLICABLE – Which of the following best characterizes the expectations with"
> temp[1,1] == "IF APPLICABLE – Which of the following best characterizes the expectations with"
[1] FALSE
Turns out it was in fact a non-printable character, shoutout to the commenters for helping me figure it out by 1) suggesting it and 2) showing that it worked for them.
I was able to figure it out using insights from here (& here) and here.
I used a grepl command (from @Tyler Rinker) to determine that there was in fact a non-ASCII character in my string, and a stringi command (from @hadley) to determine what kind. I then used a base solution from @Josh O'Brien to remove it. Turns out it was the hyphen, which is really an en dash (–).
# working in the temp df
> x <- temp[1,1]
> grepl("[^ -~]", x)
[1] TRUE
> stringi::stri_enc_mark(x)
[1] "UTF-8"
> iconv(x, "UTF-8", "ASCII", sub="")
[1] "IF APPLICABLE Which of the following best characterizes the expectations with"
# set x as df$`Var name` and reassign it to fix
df$`Var name` <- iconv(df$`Var name`, "UTF-8", "ASCII", sub="")
Still don't understand it enough to explain why it happened but it's fixed now.
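For anyone hitting the same thing, here is a small base-R sketch for locating the offending character. It assumes the mismatch comes from a non-ASCII character such as the en dash; the strings `stored` and `typed` are shortened, hypothetical stand-ins for the real data.

```r
# Hypothetical strings: 'stored' carries an en dash (U+2013), 'typed' a plain hyphen.
stored <- "IF APPLICABLE \u2013 Which"
typed  <- "IF APPLICABLE - Which"

stored == typed                      # FALSE, although they look almost the same

# Code points above 127 are non-ASCII; report where they are and what they are.
code_points <- utf8ToInt(stored)
positions   <- which(code_points > 127)
labels      <- sprintf("U+%04X", code_points[positions])

# Replace the en dash instead of deleting it, so the spacing stays intact.
repaired <- gsub("\u2013", "-", stored, fixed = TRUE)
repaired == typed                    # TRUE
```

Unlike iconv(..., sub=""), which drops the character, the replacement keeps a visible dash in the text.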

Efficiently match multiple strings/keywords to multiple texts in R

I am trying to efficiently map exact peptides (short sequences of amino acids in the 26 character alphabet A-Z; see footnote 1) to proteins (longer sequences of the same alphabet). The most efficient way to do this I'm aware of is an Aho-Corasick trie (where peptides are the keywords). Unfortunately I can't find a version of AC in R that will work with a non-nucleotide alphabet (Biostrings' PDict and Starr's match_ac are both hard-coded for DNA).
As a crutch I've been trying to parallelize a basic grep approach. But I'm having trouble figuring out a way to do so without incurring significant IO overhead. Here is a brief example:
peptides = c("FSSSGGGGGGGR","GAHLQGGAK","GGSGGSYGGGGSGGGYGGGSGSR","IISNASCTTNCLAPLAK")
if (!exists("proteins"))
{
biocLite("biomaRt", ask=F, suppressUpdates=T, suppressAutoUpdate=T)
library(biomaRt)
ensembl = useMart("ensembl",dataset="hsapiens_gene_ensembl")
proteins = getBM(attributes=c('peptide', 'refseq_peptide'), filters='refseq_peptide', values=c("NP_000217", "NP_001276675"), mart=ensembl)
row.names(proteins) = proteins$refseq_peptide
}
library(snowfall)
library(Biostrings)
library(plyr)
sfInit(parallel=T, cpus=detectCores()-1)
allPeptideInstances = NULL
i=1
increment=100
count=nrow(proteins)
while(T)
{
print(paste(i, min(count, i+increment), sep=":"))
text_source = proteins[i:min(count, i+increment),]
text = text_source$peptide
#peptideInstances = sapply(peptides, regexpr, text, fixed=T, useBytes=T)
peptideInstances = sfSapply(peptides, regexpr, text, fixed=T, useBytes=T)
dimnames(peptideInstances) = list(text_source$refseq_peptide, colnames(peptideInstances))
sparsePeptideInstances = alply(peptideInstances, 2, .fun = function(x) {x[x > 0]}, .dims = T)
allPeptideInstances = c(allPeptideInstances, sparsePeptideInstances, recursive=T)
if (i==count | nrow(text_source) < increment)
break
i = i+increment
}
sfStop()
There are a few issues here:
peptideInstances here is a dense matrix, so returning it from each worker is very verbose. I have broken it up into blocks so that I'm not dealing with a 40,000 (proteins) x 60,000 (peptides) matrix.
Parallelizing over peptides, when it would make more sense to parallelize over the proteins because they're bigger. But I got frustrated with trying to do it by protein because:
This code breaks if there is only one protein in text_source.
Alternatively, if anyone is aware of a better solution in R, I'm happy to use that. I've spent enough time on this I probably would have been better served implementing Aho-Corasick.
1 Some of those are ambiguity codes, but for simplicity, ignore that.
I learned Rcpp and implemented Aho-Corasick myself. CRAN now has a good general-purpose multiple-keyword search package, AhoCorasickTrie.
Here are some usage examples:
library(AhoCorasickTrie)
listEquals = function(a, b) { is.null(unlist(a)) && is.null(unlist(b)) || !is.null(a) && !is.null(b) && all(unlist(a) == unlist(b)) }
# simple search of multiple keywords in a single text
keywords = c("Abra", "cadabra", "is", "the", "Magic", "Word")
oneSearch = AhoCorasickSearch(keywords, "Is Abracadabra the Magic Word?")
stopifnot(listEquals(oneSearch[[1]][[1]], list(keyword="Abra", offset=4)))
stopifnot(listEquals(oneSearch[[1]][[2]], list(keyword="cadabra", offset=8)))
stopifnot(listEquals(oneSearch[[1]][[3]], list(keyword="the", offset=16)))
stopifnot(listEquals(oneSearch[[1]][[4]], list(keyword="Magic", offset=20)))
stopifnot(listEquals(oneSearch[[1]][[5]], list(keyword="Word", offset=26)))
# search a list of lists
# * sublists are accessed by index
# * texts are accessed by index
# * non-matched texts are kept (to preserve index order)
listSearch = AhoCorasickSearchList(keywords, list(c("What in", "the world"), c("is"), "secret about", "the Magic Word?"))
stopifnot(listEquals(listSearch[[1]][[1]], list()))
stopifnot(listEquals(listSearch[[1]][[2]][[1]], list(keyword="the", offset=1)))
stopifnot(listEquals(listSearch[[2]][[1]][[1]], list(keyword="is", offset=1)))
stopifnot(listEquals(listSearch[[3]], list()))
stopifnot(listEquals(listSearch[[4]][[1]][[1]], list(keyword="the", offset=1)))
stopifnot(listEquals(listSearch[[4]][[1]][[2]], list(keyword="Magic", offset=5)))
stopifnot(listEquals(listSearch[[4]][[1]][[3]], list(keyword="Word", offset=11)))
# named search of a list of lists
# * sublists are accessed by name
# * matched texts are accessed by name
# * non-matched texts are dropped
namedSearch = AhoCorasickSearchList(keywords, list(subject=c(phrase1="What in", phrase2="the world"),
verb=c(phrase1="is"),
predicate1=c(phrase1="secret about"),
predicate2=c(phrase1="the Magic Word?")))
stopifnot(listEquals(namedSearch$subject$phrase2[[1]], list(keyword="the", offset=1)))
stopifnot(listEquals(namedSearch$verb$phrase1[[1]], list(keyword="is", offset=1)))
stopifnot(listEquals(namedSearch$predicate1, list()))
stopifnot(listEquals(namedSearch$predicate2$phrase1[[1]], list(keyword="the", offset=1)))
stopifnot(listEquals(namedSearch$predicate2$phrase1[[2]], list(keyword="Magic", offset=5)))
stopifnot(listEquals(namedSearch$predicate2$phrase1[[3]], list(keyword="Word", offset=11)))
# named search of multiple texts in a single list with keyword grouping and aminoacid alphabet
# * all matches to a keyword are accessed by name
# * non-matched keywords are dropped
proteins = c(protein1="PEPTIDEPEPTIDEDADADARARARARAKEKEKEKEPEPTIDE",
protein2="DERPADERPAPEWPEWPEEPEERAWRAWWARRAGTAGPEPTIDEKESEQUENCE")
peptides = c("PEPTIDE", "DERPA", "SEQUENCE", "KEKE", "PEPPIE")
peptideSearch = AhoCorasickSearch(peptides, proteins, alphabet="aminoacid", groupByKeyword=T)
stopifnot(listEquals(peptideSearch$PEPTIDE, list(list(keyword="protein1", offset=1),
list(keyword="protein1", offset=8),
list(keyword="protein1", offset=37),
list(keyword="protein2", offset=38))))
stopifnot(listEquals(peptideSearch$DERPA, list(list(keyword="protein2", offset=1),
list(keyword="protein2", offset=6))))
stopifnot(listEquals(peptideSearch$SEQUENCE, list(list(keyword="protein2", offset=47))))
stopifnot(listEquals(peptideSearch$KEKE, list(list(keyword="protein1", offset=29),
list(keyword="protein1", offset=31),
list(keyword="protein1", offset=33))))
stopifnot(listEquals(peptideSearch$PEPPIE, NULL))
# grouping by keyword without text names: offsets are given without reference to the text
names(proteins) = NULL
peptideSearch = AhoCorasickSearch(peptides, proteins, groupByKeyword=T)
stopifnot(listEquals(peptideSearch$PEPTIDE, list(1, 8, 37, 38)))
stopifnot(listEquals(peptideSearch$DERPA, list(1, 6)))
stopifnot(listEquals(peptideSearch$SEQUENCE, list(47)))
stopifnot(listEquals(peptideSearch$KEKE, list(29, 31, 33)))
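For checking results on small inputs, a plain base-R baseline can be handy. The sketch below is not Aho-Corasick and not part of the package above: `naive_search` is a made-up helper that scans every text once per keyword with gregexpr(fixed = TRUE), so it scales poorly. As far as I know gregexpr also reports only non-overlapping matches, so a case like KEKE inside KEKEKEKE would yield fewer offsets than the trie-based search.

```r
# Naive baseline: one pass over all texts per keyword (no trie).
naive_search <- function(keywords, texts) {
  result <- lapply(keywords, function(kw) {
    hits <- gregexpr(kw, texts, fixed = TRUE)
    # gregexpr returns -1 for "no match"; keep only real start positions
    offsets <- lapply(hits, function(h) as.integer(h[h > 0]))
    names(offsets) <- names(texts)
    offsets[lengths(offsets) > 0]    # drop texts without a match
  })
  names(result) <- keywords
  result
}

# Hypothetical toy input
texts <- c(t1 = "ABCABC", t2 = "XXABC")
res <- naive_search(c("ABC", "ZZ"), texts)
res$ABC$t1      # 1 4
res$ABC$t2      # 3
length(res$ZZ)  # 0
```

The result is grouped by keyword and then by text name, which makes it easy to compare against a groupByKeyword search by hand.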
