How to wrap text within reactive splom lattice plot? - r

Within my shiny app, I am having my users select n number of attributes to plot within my splom() call, which essentially follows these steps:
1. User chooses a number.
2. User chooses variables.
3. Code subsets my data set based on the user's choices.
4. splom is called in a very simple call, similar to: lplot <- splom(data_subset).
However, the panels within splom() are labeled with the attribute names (which are reactive, so I can't label them ahead of time), and the names are so long that you can't read them.
Does anyone know if there is a way to wrap text within the splom panel outputs? The wrapping would be done on the attribute reactive value, so it wouldn't be done on a string.
This is what my output looks like:

After subsetting in step 3, you can manipulate the names of the data subset to wrap appropriately.
## Subset the data in whatever way shiny does (here's a reproducible example)
irisSub <- subset(iris, select = grepl("Width", names(iris)))
Then:
Replace "word" separators with spaces. In your example, you probably just need "_".
names(irisSub) <- gsub("\\.|_", " ", names(irisSub))
Wrap these strings at a fairly narrow width. You might have to experiment with other widths here.
wrapList <- strwrap(names(irisSub), width = 10, simplify = FALSE)
Paste these strings back together with a line break (\n) in between and assign them to the names of the data subset.
names(irisSub) <- vapply(wrapList, paste, collapse = "\n", FUN.VALUE = character(1))
splom should respect the line breaks:
splom(irisSub)
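
A minimal sketch of how the steps above could be bundled into one helper, so it can be applied to whatever subset the reactive code produces. The name wrap_names and the default width of 10 are illustrative assumptions; they are not part of lattice or shiny.

```r
# Hypothetical helper: wrap_names() is not part of lattice or shiny
wrap_names <- function(df, width = 10) {
  # Turn "." and "_" separators into spaces so strwrap() can break on them
  spaced <- gsub("\\.|_", " ", names(df))
  # Wrap each name and rejoin the pieces with line breaks
  names(df) <- vapply(
    spaced,
    function(nm) paste(strwrap(nm, width = width), collapse = "\n"),
    FUN.VALUE = character(1),
    USE.NAMES = FALSE
  )
  df
}

# Same reproducible subset as in the answer
irisSub <- wrap_names(subset(iris, select = grepl("Width", names(iris))))
names(irisSub)

# Build the plot only if lattice is available
if (requireNamespace("lattice", quietly = TRUE)) {
  lplot <- lattice::splom(irisSub)
}
```

In the app, the same call would go right after the subsetting step, for example splom(wrap_names(data_subset)). The width probably needs tuning to the panel size.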

Related

Search large string for multiple instances of smaller string in r

In R, I have taken a JSON format of test results and converted them to a data frame of 14 variables and 1101 entries. In this test, the user must select squares in a particular order for a correct score. Under one variable, "input," the values are long strings with info on which square was selected and the time it took to select the square.
Ex:
"[{\"selectedSquare\":\"1\",\"tapTime\":\"00:00:00:06\"},
{\"selectedSquare\":\"0\",\"tapTime\":\"00:00:01:02\"},
{\"selectedSquare\":\"3\",\"tapTime\":\"00:00:02:00\"},
{\"selectedSquare\":\"2\",\"tapTime\":\"00:00:02:07\"}]"
Some entries have more than others, some have none.
I need to search each entry for the square a student selected, and output the order into a new column. Using the example above:
1,0,3,2
I have tried to access each entry individually to test functions on using df$input[1], but it returns a factor with 219 levels. I cannot find a way to only access the relevant piece of the input entry.
You can do this by using an appropriate regular expression. Try:
library(dplyr)
library(stringr)
pattern <- "(?<=\")\\d(?=\")" ## regular expression with look arounds
df$new.col <- sapply(df$input, function(x) {str_extract_all(x, pattern)[[1]] %>% paste(collapse = ",")})
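
For comparison, a base R sketch of the same look-around idea that needs no extra packages. The sample input vector is made up from the example in the question, and the pattern uses \\d+ instead of \\d on the assumption that square numbers could have more than one digit. If the column is a factor, as in the question, wrapping it in as.character() first avoids surprises.

```r
# Sample values modeled on the question; the second entry has no selections
input <- c(
  '[{"selectedSquare":"1","tapTime":"00:00:00:06"},{"selectedSquare":"0","tapTime":"00:00:01:02"},{"selectedSquare":"3","tapTime":"00:00:02:00"},{"selectedSquare":"2","tapTime":"00:00:02:07"}]',
  '[]'
)

# Digits that sit directly between two double quotes; the tap times contain
# colons, so they never match
pattern <- '(?<=")\\d+(?=")'

# gregexpr() finds every match per entry, regmatches() extracts them
matches <- regmatches(input, gregexpr(pattern, input, perl = TRUE))

# Collapse each entry's matches into one comma-separated string
new_col <- vapply(matches, paste, character(1), collapse = ",")
new_col
```

Entries without any selected square come back as empty strings, which may or may not be the representation you want for missing values.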

Creating Sub Lists from A to Z from a Master List

Task
I am attempting to use better functionality (a loop or vectorization) to parse a larger list down into 26 (maybe 27) smaller lists based on each letter of the alphabet (i.e. the first list contains all entries of the larger list that start with the letter A, the second list those with the letter B ... the possible 27th list contains all remaining entries that start with numbers or other characters).
I am then attempting to identify which names on the list are similar by using the adist function (for instance, I need to correct company names that are misspelled, e.g. Companyy A needs to be corrected to Company A).
Code thus far
#creates a vector for all uniqueID/stakeholders whose name starts with "a" or "A"
stakeA <- grep("^[aA].*", uniqueID, value=TRUE)
#creates a distance matrix for all stakeholders whose name starts with "a" or "A"
stakeAdist <- adist(stakeA, ignore.case=TRUE)
write.table(stakeAdist, "test.csv", quote=TRUE, sep = ",", row.names=stakeA, col.names=stakeA)
Explanation
I was able to complete the first step of my task using the above code; I have created a list of all the entries that begin with the letter A and then calculated the "distance" between each entry (appears in a matrix).
Ask One
I can copy and paste this code 26 times and work my way through the alphabet, but I figure there is likely a more elegant way to do this, and I would like to learn it!
Ask Two
To "correct" the entries, thus far I have resorted to writing a table and moving to Excel. In Excel I have to insert a row entry to have the matrix properly align (I suppose this is a small flaw in my code). To correct the entries, I use conditional formatting to highlight all instances where adist is between say 1 and 10 and then have to manually go through the highlights and correct the lists.
Any help on functions / methods to further automate this / better strategies using R would be great.
It would help to have an example of your data, but this might work.
EDIT: I am assuming your data is in a data.frame named df
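for (i in 1:26) {
  # keep the rows whose uniqueID starts with the i-th letter, in either case
  stake <- subset(df, grepl(paste0('^[', letters[i], LETTERS[i], ']'), uniqueID))
  # pairwise edit distances between the selected IDs
  stakeDist <- adist(as.character(stake$uniqueID), ignore.case = TRUE)
  write.table(stakeDist, paste0("stake_", LETTERS[i], ".csv"), quote = TRUE, sep = ',')
}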
Combining paste0 with the built-in letters and LETTERS vectors builds the grep pattern for each letter.
subset then extracts the rows with matching IDs.
paste0 also creates a unique file name for each write.table() call.
A for() loop ties it all together.
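
One possible way to cover both asks without a per-letter copy of the code, sketched in base R. The sample uniqueID values, the close_pairs helper, and the distance cutoff of 2 are assumptions made for illustration; the cutoff in particular would need tuning on real names.

```r
# Hypothetical sample data; the real uniqueID vector comes from the question
uniqueID <- c("Company A", "Companyy A", "Acme", "Beta Corp", "beta corp.", "3M")

# Group key: upper-cased first letter, or "other" for digits and symbols
first <- toupper(substr(uniqueID, 1, 1))
key <- ifelse(first %in% LETTERS, first, "other")
groups <- split(uniqueID, key)

# For one group, list the pairs whose edit distance lies between 1 and max_dist
close_pairs <- function(ids, max_dist = 2) {
  d <- adist(ids, ignore.case = TRUE)
  # upper.tri() keeps each pair once and drops the diagonal
  keep <- d >= 1 & d <= max_dist & upper.tri(d)
  data.frame(a = ids[row(d)[keep]], b = ids[col(d)[keep]], dist = d[keep],
             stringsAsFactors = FALSE)
}

# One table of candidate misspellings across all groups
candidates <- do.call(rbind, lapply(groups, close_pairs))
rownames(candidates) <- NULL
candidates
```

The resulting table plays the role of the conditional formatting step in Excel: each row is a pair of names that are close enough to be worth reviewing by hand.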

Wrap column name text in ggpairs in R

I am using ggpairs and while plotting the matrix, I receive a matrix as follows
As you can see, some of the labels are long and hence the text is not completely visible. Is there any way that I can wrap the text so that it is completely visible?
Code
ggpairs(df)
I want the text to wrap so that it can be seen something like this
You can use the labeller argument of ggpairs to pass a function to be applied to the facet strip text.
ggplot2 has a ready-made function, label_wrap_gen(), that wraps long labels.
By default ggpairs uses the column names as labels, and those can't contain spaces. label_wrap_gen() needs spaces to split the labels across multiple lines.
This is a solution:
library(ggplot2)
library(GGally)
df <- iris
colnames(df) <- make.names(c('Long colname',
                             'Quite long colname',
                             'Longer than usual colname',
                             'I\'m not even sure this should be a colname',
                             'The ever longest colname that one should be allowed to use'))
ggpairs(df,
        columnLabels = gsub('.', ' ', colnames(df), fixed = TRUE),
        labeller = label_wrap_gen(10))

How to separate a text file into columns

This is what my text file looks like:
1241105.41129.97Y317052.03
2282165.61187.63N364051.40
2251175.87190.72Y366447.49
2243125.88150.81N276045.45
328192.89117.68Y295050.51
2211140.81165.77N346053.11
1291125.61160.61Y335048.3
3273127.73148.76Y320048.04
2191132.22156.94N336051.38
3221118.73161.03Y349349.5
2341189.01200.31Y360048.02
1253144.45180.96N305051.51
2251125.19152.75N305052.72
2192137.82172.25N240046.96
3351140.96174.85N394048.09
1233135.08173.36Y265049.82
1201112.59140.75N380051.25
2202128.19159.73N307048.29
2192132.82172.25Y240046.96
3351148.96174.85Y394048.09
1233132.08173.36N265049.82
1231114.59140.75Y380051.25
3442128.19159.73Y307048.29
2323179.18191.27N321041.12
All these values are continuous and each character indicates something. I am unable to figure out how to separate each value into columns and specify a heading for all these new columns which will be created.
I used this code, however it does not seem to work.
birthweight <- read.table("birthweighthw1.txt", sep="", col.names=c("ethnic","age","smoke","preweight","delweight","breastfed","brthwght","brthlngth"))
Any help would be appreciated.
Assuming that you have a clear definition for every column, you can use regular expressions to solve this in no time.
From your column names and example data, I guess that the regular expression that matches each field is:
ethnic: \d{1}
age: \d{1,2}
smoke: \d{1}
preweight: \d{3}\.\d{2}
delweight: \d{3}\.\d{2}
breastfed: Y|N
brthwght: \d{3}
brthlngth: \d{3}\.\d{1,2}
We can put all this together in a regular expression that captures each of these fields
reg.expression <- "(\\d{1})(\\d{1,2})(\\d{1})(\\d{3}\\.\\d{2})(\\d{3}\\.\\d{2})(Y|N)(\\d{3})(\\d{3}\\.\\d{1,2})"
Note: In R, we need to escape "\", which is why we write \\d instead of \d.
That said, here comes the code to solve the problem.
First, you need to read your strings
lines <- readLines("birthweighthw1.txt")
Now, we define our regular expression and use the function str_match from the stringr package to get your data into a character matrix.
require(stringr)
reg.expression <- "(\\d{1})(\\d{1,2})(\\d{1})(\\d{3}\\.\\d{2})(\\d{3}\\.\\d{2})(Y|N)(\\d{3})(\\d{3}\\.\\d{1,2})"
captured <- str_match(string= lines, pattern= reg.expression)
You can check that the first column in the matrix contains the text matched, and the following columns the data captured. So, we can get rid of the first column
captured <- captured[,-1]
and transform it into a data.frame with appropriate column names
result <- as.data.frame(captured,stringsAsFactors = FALSE)
names(result) <- c("ethnic","age","smoke","preweight","delweight","breastfed","brthwght","brthlngth")
Now every column in result is of type character; you can convert each of them to another type. For example:
require(dplyr)
result <- result %>% mutate(ethnic = as.factor(ethnic),
                            age = as.integer(age),
                            smoke = as.factor(smoke),
                            preweight = as.numeric(preweight),
                            delweight = as.numeric(delweight),
                            breastfed = as.factor(breastfed),
                            brthwght = as.integer(brthwght),
                            brthlngth = as.numeric(brthlngth)
)
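
Because the field widths here are guessed, it may help to check which lines the pattern fails on before trusting the result. The following is a base R sketch of that check, using two lines from the question plus one deliberately malformed line; anchoring the pattern with ^ and $ is an added assumption that each line holds exactly one record. Note that a match alone does not prove the split is right: the question's line 328192.89117.68Y295050.51 still matches, but only by reading age as 2 and smoke as 8, so the parsed values are worth inspecting.

```r
# Two lines copied from the question plus one deliberately malformed line
lines <- c("1241105.41129.97Y317052.03",
           "2282165.61187.63N364051.40",
           "not a record")

# Same field pattern as above, anchored so partial matches are rejected
pattern <- paste0("^(\\d)(\\d{1,2})(\\d)(\\d{3}\\.\\d{2})(\\d{3}\\.\\d{2})",
                  "(Y|N)(\\d{3})(\\d{3}\\.\\d{1,2})$")

m <- regmatches(lines, regexec(pattern, lines, perl = TRUE))

# A full match yields 9 strings: the whole match plus 8 captured fields
ok <- lengths(m) == 9
unmatched <- lines[!ok]

# Stack the matched rows and drop the whole-match column
captured <- do.call(rbind, m[ok])[, -1, drop = FALSE]
result <- as.data.frame(captured, stringsAsFactors = FALSE)
names(result) <- c("ethnic", "age", "smoke", "preweight", "delweight",
                   "breastfed", "brthwght", "brthlngth")

# Convert a couple of columns as an example
result$age <- as.integer(result$age)
result$brthlngth <- as.numeric(result$brthlngth)
```

Anything left in unmatched points at lines whose layout differs from the assumed field definitions.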

text in plot from column, first argument of a ";" divided string

One quick question.
Picture a data frame like
data=data.frame(x=c(1,2,3), y=c(4,5,6), Genes=c("AHS;AKS;AHS","AHS;IO","HU"))
So I want to plot x and y
plot(data$x, data$y)
and label the dots like this
text(data$x+0.2,data$y+0.2,labels=data$Genes)
BUT I don't want to use all the entries from the Genes column, ONLY the first one (i.e. the part before the first ";").
Can you please help me with that?
This is only an example; I have already read my data in with read.delim, so I cannot do a specific "read in" with string separation.
As per my comment, you can use gsub to do this:
gsub('^([A-Z]+);.*$', '\\1', data$Genes)
You could also use strsplit:
unlist(lapply(strsplit(as.character(data$Genes), ';'), '[', 1))
But that's yucky...
It's also probably worth mentioning the stringr package, which collects a lot of these string munging functions into a single place with predictable syntax and names.
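
Put together on the sample data from the question, the two approaches could look like the sketch below. The variable names first_gene and first_gene2 are made up for illustration, and as.character() is there on the assumption that read.delim may have stored Genes as a factor.

```r
# Sample data from the question
data <- data.frame(x = c(1, 2, 3), y = c(4, 5, 6),
                   Genes = c("AHS;AKS;AHS", "AHS;IO", "HU"))

# as.character() guards against Genes being stored as a factor
genes <- as.character(data$Genes)

# Option 1: delete everything from the first ";" onward
first_gene <- sub(";.*$", "", genes)

# Option 2: split on ";" and take the first piece of each element
first_gene2 <- vapply(strsplit(genes, ";", fixed = TRUE), `[`, character(1), 1)

first_gene
```

Either vector can then be passed as the labels, for example text(data$x + 0.2, data$y + 0.2, labels = first_gene).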
