Proxy for Excel's split cell in R

Apologies in advance if anyone finds this to be a duplicate of a question answered before. I haven't found anything, so here it is:
I have a 3x3 contingency table I made in RStudio (I am specifying it as a data frame below, but I can also produce it with as.matrix if that'll work better):
mat.s = data.frame("WT(H)" = 11, "DEL(H)" = 2)
mat.s[2, 1] = 13
mat.s[2, 2] = 500369
row.names(mat.s) = c("DEL(T)", "WT(T)")
mat.s = cbind(mat.s, Total = rowSums(mat.s))
mat.s = rbind(mat.s, Total = colSums(mat.s))
which looks like:
kable(mat.s)
| | WT.H.| DEL.H.| Total|
|:------|-----:|------:|------:|
|DEL(T) | 11| 2| 13|
|WT(T) | 13| 500369| 500382|
|Total | 24| 500371| 500395|
However, if I wanted to split a cell in this table (like you can do in Excel) into two, how would I do that? So I'd like to get something like the following when I render the document with kable:
| | WT.H.| DEL.H.| Total|
|:------|-----:|------:|------:|
|DEL(T) | S D | 2| 13|
| | 8 3 | | |
|WT(T) | 13| 500369| 500382|
|Total | 24| 500371| 500395|
So that when I want to calculate something from this table, I can refer to the split values 8 or 3 individually. Sorry if this is something very simple and easy to do! Still learning. Thanks!
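For what it's worth, kable has no true merged or split cells. One workaround (just a sketch; the split.vals vector below is a made-up holder for the assumed 8/3 breakdown of the 11) is to keep the split values in their own R object for calculations and build a purely cosmetic character version of the table for display:

library(knitr)

# Keep the split values computable in their own vector;
# the displayed table below is cosmetic only.
split.vals <- c(S = 8, D = 3)

# Build the display table as character data, faking the split
# inside one cell ("/" used rather than "|" so the pipe table renders)
disp <- data.frame(
  "WT(H)"  = c(paste("S", split.vals["S"], "/ D", split.vals["D"]), "13", "24"),
  "DEL(H)" = c("2", "500369", "500371"),
  Total    = c("13", "500382", "500395"),
  row.names = c("DEL(T)", "WT(T)", "Total"),
  check.names = FALSE
)
kable(disp)

split.vals["S"]  # the 8 stays available for arithmetic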

Related

Subset rows between two rows containing specific values?

I have multiple data frames with the generic layout below. The strings of text vary in length from a few words to multiple sentences. The title strings on each data frame all vary slightly but they all share a word in common (for example, all of the TitleBs on each data frame share the word “code” in common and all TitleCs share the word “write” in common).
|element|NumbID|String |
|-------|------|----------|
|header |1 |TitleA |
|para |2 |TxtStrng |
|header |3 |TitleB |
|header |4 |Subtit1 |
|para |5 |TxtStrng |
|header |6 |Subtit2 |
|para |7 |TxtStrng |
|header |8 |TitleC |
I am trying to figure out how to write code that can be used on all the data frames and will allow me to extract all the rows starting at TitleB and ending just before TitleC, as in the example below.
|element|NumbID|String |
|:-----:|:----:|:--------:|
|header |3 |TitleB |
|header |4 |Subtit1 |
|para |5 |TxtStrng |
|header |6 |Subtit2 |
|para |7 |TxtStrng |
I thought maybe I could use subset() in some way to do this but I’m really struggling to figure out how to make it work.
So, you need a way to identify the TitleB and TitleC strings. From your description, I'll use grepl("code", String) for TitleB and grepl("write", String) for TitleC.
Then we need to identify rows where TitleB has already occurred but TitleC hasn't; cumsum gives a running count of matches for exactly this:
result = subset(
  your_data,
  cumsum(grepl("code", String)) > 0 &
    cumsum(grepl("write", String)) == 0
)
If you need more help, please make your example more reproducible, preferably using dput() to share a copy/pasteable version of the data in valid R syntax.
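As a sketch of how that runs, here is a hand-built version of the example data (an assumption, since no dput() output was given; the words "code" and "write" are embedded in the TitleB/TitleC strings per the description):

your_data <- data.frame(
  element = c("header", "para", "header", "header",
              "para", "header", "para", "header"),
  NumbID  = 1:8,
  String  = c("TitleA", "TxtStrng", "TitleB code", "Subtit1",
              "TxtStrng", "Subtit2", "TxtStrng", "TitleC write"),
  stringsAsFactors = FALSE
)

# TitleB row onward, stopping just before the TitleC row
result <- subset(
  your_data,
  cumsum(grepl("code", String)) > 0 &
    cumsum(grepl("write", String)) == 0
)
result  # rows 3 through 7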

SQLite Versioning. Is it possible to use EXCEPT to show differences between rows where only one column changes?

I'm quite new to SQLite and I'm trying to use an EXCEPT statement in order to compare two tables with very similar data. The data comes from a CSV file I download daily, and within the file new rows are added and deleted, and old rows can have one or more columns change each day. I'm trying to find a way to select rows that have had a column's data change, when I am unable to predict which column's data will change.
Say for example I have:
TABLE contracts:
|ID|Description|Name|Contract Type|
|--|-----------|----|-------------|
|1 |Plumbing |Bob |Paper |
|2 |Cooking |Ryan|Paper |
|3 |Driving |Eric|Paper |
|4 |Dancing |Emma|Paper |
and:
TABLE updated_contracts:
|ID|Description|Name|Contract Type|
|--|-----------|----|-------------|
|1 |Hiking |Bob |Paper |
|2 |Cooking |Ryan|Paper |
|3 |Driving |Eric|Paper |
|4 |Dancing |Emma|Digital |
I'd like it to return:
|1 |Hiking |Bob |Paper |
|4 |Dancing |Emma|Digital |
because contract 1 has changed the description and contract 4 has changed the contract type.
Is it possible to do this in SQLite?
One way to do it is with a LEFT join of updated_contracts to contracts where the matching rows are filtered out:
select uc.*
from updated_contracts uc left join contracts c
using(id, Description, Name, `Contract Type`)
where c.id is null
EXCEPT can also be used like this:
select * from updated_contracts
except
select * from contracts
This will work only if the two tables have the same number and order of columns. Its advantage over the join is that EXCEPT treats two NULLs in corresponding columns as equal, so rows containing NULLs are still compared correctly.
Results:
| ID | Description | Name | Contract Type |
| --- | ----------- | ---- | ------------- |
| 1 | Hiking | Bob | Paper |
| 4 | Dancing | Emma | Digital |

Control digits in specific cells

I have a table that looks like this:
+-----------------------------------+-------+--------+------+
|                                   | Male  | Female | n    |
+-----------------------------------+-------+--------+------+
| way more than my fair share       | 2,4   | 21,6   | 135  |
| a little more than my fair share  | 5,4   | 38,1   | 244  |
| about my fair share               | 54,0  | 35,3   | 491  |
| a little less than my fair share  | 25,1  | 3,0    | 153  |
| way less than my fair share       | 8,7   | 0,7    | 51   |
| Can't say                         | 4,4   | 1,2    | 31   |
| n                                 | 541,0 | 564,0  | 1105 |
+-----------------------------------+-------+--------+------+
Everything is fine, but I would like to show no decimal places at all in the last row, since it shows the margins (actual case counts). Is there any way in R to manipulate the digits of specific cells?
Thanks!
You could use ifelse to output the numbers in different formats in different rows, as in the example below. However, it will take some additional finagling to get the values in the last row to line up by place value with the previous rows:
library(knitr)
library(tidyverse)

# Fake data
set.seed(10)
dat = data.frame(category = c(LETTERS[1:6], "n"), replicate(3, rnorm(7, 100, 20)))

dat %>%
  mutate_if(is.numeric, ~ sprintf(ifelse(category == "n", "%1.0f", "%1.1f"), .)) %>%
  kable(align = "lrrr")
|category | X1| X2| X3|
|:--------|-----:|-----:|-----:|
|A | 100.4| 92.7| 114.8|
|B | 96.3| 67.5| 101.8|
|C | 72.6| 94.9| 80.9|
|D | 88.0| 122.0| 96.1|
|E | 105.9| 115.1| 118.5|
|F | 107.8| 95.2| 109.7|
|n | 76| 120| 88|
The huxtable package makes it easy to decimal-align the values (see the Vignette for more on table formatting):
library(huxtable)

tab = dat %>%
  mutate_if(is.numeric, ~ sprintf(ifelse(category == "n", "%1.0f", "%1.1f"), .)) %>%
  hux() %>%
  add_colnames()
align(tab)[-1] = "."
tab
When knitted to PDF from an R Markdown document, the huxtable output renders with the columns decimal-aligned.

Combine DataFrame rows into a new column

I am wondering if there is a simple way to achieve this in Julia, besides iterating over the rows in a for-loop.
I have a table with two columns that looks like this:
| Name | Interest |
|------|----------|
| AJ | Football |
| CJ | Running |
| AJ | Running |
| CC | Baseball |
| CC | Football |
| KD | Cricket |
...
I'd like to create a table where each Name in first column is matched with a combined Interest column as follows:
| Name | Interest |
|------|----------------------|
| AJ | Football, Running |
| CJ | Running |
| CC | Baseball, Football |
| KD | Cricket |
...
How do I achieve this?
UPDATE: OK, so after trying a few things, including print_joint and grpby, I realized that the easiest way to do this would be the by() function. I'm 99% there.
by(myTable, :Name, df->DataFrame(Interest = string(df[:Interest])))
This gives me my :Interest column as "UTF8String[\"Running\"]", and I can't figure out which method I should use instead of string() (or where to typecast) to get the desired ASCIIString output.

Creating a unique integer on the basis of a string

I have a larger dataset (a data.table with approx. 9m rows) with a column that I would like to use to aggregate values (min, max, etc.). The column is a combination of various other columns and has a string-based format, like the one below:
string <- "318XXXX | VNSGN | BIER"
To gain some speed in performing tasks, I would like to recode this as a unique integer. Another application that I use regularly has a built-in function that transforms a string like the one above into an integer (e.g. 73823). I was wondering whether there is a similar function in R? The idea is that a particular string will always result in the same integer; this will allow it to be used in merging data.tables etc.
Here a little example of the data.table column that I would like to encode in simple integer values:
sample <- c("318XXXX | VNSGN | BIER", "462XXXX | TZZZH | 9905", "462XXXX | TZZZH | 9905",
"462XXXX | TZZZH | 9905", "511XXXX | FAWOR | 336H", "511XXXX | FAWOR | 336H",
"652XXXX | XXXXR | T136", "652XXXX | XXXXR | T136", "672XXXX | BQQSZ | 7777",
"672XXXX | BQQSZ | 7777")
I am hoping to encode the strings as an additional column in the table, like the one below; note that identical strings result in the same numbers.
String Number
318XXXX | VNSGN | BIER 19872
462XXXX | TZZZH | 9905 78392
462XXXX | TZZZH | 9905 78392
462XXXX | TZZZH | 9905 78392
511XXXX | FAWOR | 336H 23053
511XXXX | FAWOR | 336H 23053
652XXXX | XXXXR | T136 95832
652XXXX | XXXXR | T136 95832
672XXXX | BQQSZ | 7777 71829
672XXXX | BQQSZ | 7777 71829
The data.table package will create indexes for you without making you handle them explicitly, so it would be less work than the approach in the question; see the setkey function in data.table.
Also, the sqldf package can use the SQL create index statement, as per Examples 4h and 4i on the sqldf home page, as can just about any database package.
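If an explicit integer column is still wanted, a minimal base-R sketch is match() against the unique values: identical strings always map to the same integer. (Unlike the hash-like codes shown in the question, these integers are only stable within one vector; reproducing the same code across datasets would need an actual hash, e.g. via the digest package.)

sample <- c("318XXXX | VNSGN | BIER", "462XXXX | TZZZH | 9905",
            "462XXXX | TZZZH | 9905", "511XXXX | FAWOR | 336H",
            "511XXXX | FAWOR | 336H", "652XXXX | XXXXR | T136")

# Each string gets the index of its first occurrence among the
# unique values, so equal strings share the same integer
number <- match(sample, unique(sample))
number  # 1 2 2 3 3 4

In a data.table the same idea works in place, e.g. DT[, Number := match(String, unique(String))] (the column names here are assumptions).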
