Multiple replacement by matching values within one row - r

This may be a slightly silly question, but I can't manage to solve my problem.
I have a table of codes, where some rows contain several codes separated by spaces:
| Codes |
|-------------|
| 12.12 |
| 12.12 12.13 |
| 12.11 12.13 |
| 12.10 |
I have to match these codes with values from another table:
| Code | Value |
|-------|-------|
| 12.10 | AA |
| 12.11 | BB |
| 12.12 | CC |
| 12.13 | DD |
to get the following result (desired separator is comma, but it doesn't really matter):
| Codes |
|-------|
| CC |
| CC,DD |
| BB,DD |
| AA |
I tried to get this result as follows:
dataframe1$Codes <- dataframe2$values[match(unlist(strsplit(dataframe1$Codes)), dataframe2$Code)]
But I get the error: replacement has X rows, data has Y

Your data:
df <- data.frame(Codes = c("12.12", "12.12 12.13", "12.11 12.13", "12.10"),
                 stringsAsFactors = FALSE)
vals <- data.frame(Code = c("12.10", "12.11", "12.12", "12.13"),
                   Value = c("AA", "BB", "CC", "DD"),
                   stringsAsFactors = FALSE)
I use dplyr and iterators:
library(dplyr)
library(iterators)
Make a nested list of Codes in df:
temp <- lapply(iter(df,by="row"),function(x) unlist(strsplit(x," ")))
Match df$Codes to vals$Code, grab paired vals$Value, and paste and convert to data frame:
df1 <- lapply(iter(temp), function(x) paste0(vals$Value[vals$Code %in% x], collapse = ",")) %>%
  do.call(rbind, .) %>%
  as.data.frame() %>%
  rename(Codes = V1)
Output
Codes
1 CC
2 CC,DD
3 BB,DD
4 AA
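A base R alternative, as a minimal sketch rather than the answerer's method: the original attempt fails because unlist() flattens the split codes into one vector that is longer than the number of rows. Keeping the list returned by strsplit() and collapsing each element gives one string per row. Using match() instead of %in% also keeps the values in the order the codes appear within each row, which %in% does not guarantee. The data is the same as above.

```r
df <- data.frame(Codes = c("12.12", "12.12 12.13", "12.11 12.13", "12.10"),
                 stringsAsFactors = FALSE)
vals <- data.frame(Code = c("12.10", "12.11", "12.12", "12.13"),
                   Value = c("AA", "BB", "CC", "DD"),
                   stringsAsFactors = FALSE)

# Split each row on spaces (one list element per row), look up every code
# with match(), then collapse the matched values into one string per row.
df$Codes <- vapply(
  strsplit(df$Codes, " ", fixed = TRUE),
  function(codes) paste(vals$Value[match(codes, vals$Code)], collapse = ","),
  character(1)
)

df$Codes
# [1] "CC"    "CC,DD" "BB,DD" "AA"
```

A code that has no entry in vals would show up as the string "NA" in the result, so it may be worth checking for unmatched codes first.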

Related

Adding a variable name for multiple-answer table expss in R / creating a variable to capture multiple answers

I want to add a variable name for the multiple-answer question Q6, which consists of 12 columns (Q6_1 to Q6_12). Adding a label as follows does not give me the intended result: it adds a total_row column. I just need a label to indicate that this is the table for Q6.
Alternatively, if you know a way to create a single variable that captures all the multiple answers, please let me know.
banner %>%
tab_cells(mrset(Q6_1 %to% Q6_12, lablel="Q6_test")) %>%
tab_stat_cpct() %>%
tab_pivot() %>%
tab_sort_desc()
You have a typo in your code. lablel should be label:
library(expss)
mtcars %>%
tab_cells(mrset(am, cyl, label = "am+cyl")) %>%
tab_stat_cpct() %>%
tab_pivot()
# | | | #Total |
# | ------ | ------------ | ------ |
# | am+cyl | 0 | 59.4 |
# | | 1 | 40.6 |
# | | 4 | 34.4 |
# | | 6 | 21.9 |
# | | 8 | 43.8 |
# | | #Total cases | 32.0 |

R: Subset Large Data Frame with Multiple Conditions

I have a large data frame with 12 million rows and 5 columns. I want to subset the large data frame with multiple conditions. I need to do this multiple times with different criteria, so I created a Look-Up Table and a for loop.
The code below loops through and subsets the large data frame, saving each iteration as a list within a list. After the loop completes, I combine the lists into a data frame.
My current set-up functions, but it is painfully slow (about 15 minutes for 8 loops). Subsetting is actually taking more time than it took to calculate the mean and SD for the 12 million-row table!
Any advice on how to speed this up?
>scaled
| chr | site | Average_CPMn | SD_CPMn |
|------|------|--------------|---------|
| chrI | 1 | 0.071 | 0.070 |
| chrI | 2 | 0.120 | 0.111 |
| chrI | 3 | 0.000 | 0.000 |
| chrI | 4 | 0.000 | 0.000 |
| chrI | 5 | 0.000 | 0.000 |
| chrI | 6 | 0.156 | 0.056 |
...12,000,000 rows
>genes.df
| Gene | Chromosome | Meta_Start | Meta_Stop |
|---------|------------|------------|-----------|
| YGL234W | chrVII | 55982 | 59390 |
| YGR061C | chrVII | 611389 | 616465 |
| YMR120C | chrXIII | 507002 | 509780 |
| YLR359W | chrXII | 843782 | 846230 |
scaled <- read_rds("~/Desktop/scaled.rds")
subset_list <- list()
for (i in 1:nrow(genes.df)) {
  subset <- scaled %>%
    dplyr::filter(chr == genes.df$Chromosome[i] & site >= genes.df$Meta_Start[i] & site <= genes.df$Meta_Stop[i]) %>%
    dplyr::mutate(Gene = genes.df$Gene[i])
  subset_list[[i]] <- subset
}
# combine gene-list into single dataframe
counts_subset <- as.data.frame(do.call(rbind, subset_list)) %>%
  left_join(genes.df, by = "Gene")
You haven't shared the data or a sample, so it is difficult to demonstrate. However, I suggest using semi_join (if you only want to subset) or left_join (if you want to mutate instead), somewhat like this in the tidyverse:
scaled %>%
  semi_join(genes.df %>%
              pivot_longer(c(Meta_Start, Meta_Stop)) %>%
              group_by(Gene, Chromosome) %>%
              complete(value = seq(min(value), max(value), 1)) %>%
              ungroup() %>%
              select(-name),
            by = c('chr' = 'Chromosome', 'site' = 'value'))
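Another option, sketched here under the assumption that dplyr 1.1.0 or later is installed, is a non-equi join with join_by(), which matches each site to the gene range that contains it in one pass and keeps the Gene column. This is not the answerer's method, and the toy data below (geneA, geneB and their coordinates) is invented purely for illustration; how fast it runs on 12 million rows has not been measured.

```r
library(dplyr)

# Hypothetical toy data with the same column names as in the question.
scaled <- data.frame(
  chr = rep(c("chrI", "chrVII"), each = 5),
  site = rep(1:5, 2),
  Average_CPMn = seq(0.1, 1, by = 0.1)
)
genes.df <- data.frame(
  Gene = c("geneA", "geneB"),
  Chromosome = c("chrI", "chrVII"),
  Meta_Start = c(2L, 4L),
  Meta_Stop = c(3L, 5L)
)

# Keep rows of scaled whose chromosome matches and whose site lies
# within [Meta_Start, Meta_Stop] of some gene; Gene is attached per row.
gene_sites <- inner_join(
  scaled, genes.df,
  by = join_by(chr == Chromosome, between(site, Meta_Start, Meta_Stop))
)

gene_sites$Gene
# [1] "geneA" "geneA" "geneB" "geneB"
```

A site that falls inside two overlapping gene ranges would appear once per gene in the result.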

R, get entries from one column based on values of another column in R

I am trying to get the column entries, as a list, that match a list of entries from a data frame.
Here is what I am trying to do:
Data frame named tepo
| | name | shortcut |
|---|--------------|----------|
| 1 | Apples | A |
| 2 | Bananas | B |
| 3 | Oranges | O |
| 4 | Carrots | C |
| 5 | Mangos | M |
| 6 | Strawberries | S |
I have a list FruitList as chr
>FruitList
>[1] "Bananas" "Carrots" "Mangos"
And I would like to get a list, shortcutList, of the corresponding shortcut values:
>shortcutList
>[1] "B" "C" "M"
My attempt:
shortcutList <- tepo$shortcut[tepo$name == FruitList[]]
However, I don't get the desired list output.
Thanks for the help
Use %in% :
shortcutList <- tepo$shortcut[tepo$name %in% FruitList]
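To illustrate why %in% works where == does not: == compares the two vectors element by element and recycles the shorter one, whereas %in% tests each name for membership in the whole list. The sketch below rebuilds the data frame from the question with the names spelled consistently, and also shows match(), which returns the shortcuts in the order of FruitList rather than in table order. The reordered FruitList is an assumption made only to show that difference.

```r
tepo <- data.frame(
  name = c("Apples", "Bananas", "Oranges", "Carrots", "Mangos", "Strawberries"),
  shortcut = c("A", "B", "O", "C", "M", "S"),
  stringsAsFactors = FALSE
)
FruitList <- c("Mangos", "Bananas", "Carrots")

# %in% keeps the rows whose name is in FruitList, in table order.
in_table_order <- tepo$shortcut[tepo$name %in% FruitList]
in_table_order
# [1] "B" "C" "M"

# match() returns positions in tepo$name for each element of FruitList,
# so the result follows the order of FruitList.
in_list_order <- tepo$shortcut[match(FruitList, tepo$name)]
in_list_order
# [1] "M" "B" "C"
```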

How do I merge 2 dataframes without a corresponding column to match by?

I'm trying to use the merge() function in RStudio. Basically, I have two tables with 5000+ rows each. They both have the same number of rows, but there is no corresponding column to merge by. However, the rows are in order and correspond: the first row of dataframe1 should merge with the first row of dataframe2, the 2nd row of dataframe1 with the 2nd row of dataframe2, and so on.
Here's an example of what they could look like:
Dataframe1(df1):
| Name | Sales | Location |
|-------|-------|----------|
| Rod | 123 | USA |
| Kelly | 142 | CAN |
| Sam | 183 | USA |
| Joyce | 99 | NED |
Dataframe2(df2):
| Sex | Age |
|-----|-----|
| M | 23 |
| M | 33 |
| M | 31 |
| F | 45 |
NOTE: this is a downsized example only.
I've tried to use the merge function in RStudio, here's what I've done:
DFMerged <- merge(df1, df2)
This however increases both the rows and columns. It returns 16 rows and 5 columns for this example.
What am I missing from this function? I know there is a merge(x, y, by=) argument, but I have no column to match them on.
The output I would like is:
| Name | Sales | Location | Sex | Age |
|-------|-------|----------|-----|-----|
| Rod | 123 | USA | M | 23 |
| Kelly | 142 | CAN | M | 33 |
| Sam | 183 | USA | M | 31 |
| Joyce | 99 | NED | F | 45 |
I've considered making an extra column in each data frame, say row#, and matching them by that.
You could use cbind:
cbind(df1, df2)
If you want to use merge you could use:
merge(df1, df2, by=0)
You could use:
cbind(df1,df2)
This requires the two data frames to have the same number of rows.
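The two suggestions can be compared on the example data as follows. This is a small sketch, not a definitive recipe: the stopifnot() guard is an added assumption to make the row-count requirement explicit. Note that merging on row names adds a Row.names column, and because row names are matched as character strings, the row order of the merged result may differ from the original on larger tables.

```r
df1 <- data.frame(Name = c("Rod", "Kelly", "Sam", "Joyce"),
                  Sales = c(123, 142, 183, 99),
                  Location = c("USA", "CAN", "USA", "NED"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Sex = c("M", "M", "M", "F"),
                  Age = c(23, 33, 31, 45),
                  stringsAsFactors = FALSE)

# Guard: a positional bind only makes sense when the row counts agree.
stopifnot(nrow(df1) == nrow(df2))

# Row i of df1 is placed next to row i of df2.
bound <- cbind(df1, df2)

# Merging on row names (by = 0) gives the same pairing, but adds a
# Row.names column holding the row names as character strings.
merged <- merge(df1, df2, by = 0)

dim(bound)
# [1] 4 5
```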

Converting comma separated list to dataframe

If I have a list similar to x <- c("Name,Age,Gender", "Rob,21,M", "Matt,30,M"), how can I convert it to a dataframe where Name, Age, and Gender become the column headers?
Currently my approach is,
dataframe <- data.frame(matrix(unlist(x), nrow=3, byrow=T))
which gives me
matrix.unlist.user_data...nrow...num_rows..byrow...T.
1 Name,Age,Gender
2 Rob,21,M
3 Matt,30,M
and doesn't help me at all.
How can I get something which resembles the following from the list mentioned above?
| name | age | gender |
|------|-----|--------|
| ... | ... | ... |
| ... | ... | ... |
We paste the strings into a single string separated by \n and use either read.csv or read.table from base R:
read.table(text=paste(x, collapse='\n'), header = TRUE, stringsAsFactors = FALSE, sep=',')
Alternatively,
data.table::fread(paste(x, collapse = "\n"))
Name Age Gender
1: Rob 21 M
2: Matt 30 M
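As a further base R sketch: the text argument of read.csv also accepts a character vector directly, treating each element as one line, so the paste step can be skipped. The variable name people is hypothetical.

```r
x <- c("Name,Age,Gender", "Rob,21,M", "Matt,30,M")

# Each element of x is read as one line; the first line supplies the header.
people <- read.csv(text = x, stringsAsFactors = FALSE)

names(people)
# [1] "Name"   "Age"    "Gender"
```

Age is parsed as an integer column, while Name and Gender stay character because stringsAsFactors is FALSE.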
