Create a variable "mydata": a data structure that outputs the following

I am struggling to find a way to include a matrix in my list: $mix$b becomes a character vector. Please help.
This is the task:
# Create a variable "mydata" - a data structure with the output presented
# below as comments
# > print(mydata)
# [[1]]
# [1] "Some long text"
#
# [[2]]
# [1] 1 2 3 4 5
#
# $mix
# $mix$a
# [1] "text"
#
# $mix$b
# a b c
# 1 a -2 3
# 2 b 0 4
# 3 c 2 -5
# 4 d 4 77
This is my best attempt
mydata <- list("Some long text", 1:5,
mix = list(a = 'text', b = c(a = "a", b = -2, c = 3)))
mydata
output
[[1]]
[1] "Some long text"
[[2]]
[1] 1 2 3 4 5
$mix
$mix$a
[1] "text"
$mix$b
a b c
"a" "-2" "3"

mydata <- list("Some long text", 1:5,
mix = list(a = 'text',
b = data.frame(a = c("a", "b", "c", "d"),
b = c(-2,0,2,4),
c = c(3,4,-5,77))))
mydata
[[1]]
[1] "Some long text"
[[2]]
[1] 1 2 3 4 5
$mix
$mix$a
[1] "text"
$mix$b
a b c
1 a -2 3
2 b 0 4
3 c 2 -5
4 d 4 77
Here are some examples of how to extract the 77:
mydata$mix$b[4,3]
mydata[[3]][[2]][4,3]
mydata[["mix"]][["b"]][4,3]
mydata[[3]][["b"]][4,3]
mydata[["mix"]][[2]][4,3]
77
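The first attempt printed a character vector because c() builds an atomic vector, and an atomic vector can hold only one type, so the numbers are coerced to strings. A matrix has the same restriction, which is why the printed target (mixed text and numeric columns with row numbers) is a data frame. A minimal sketch in base R; the names v and d are illustrative and not part of the task:

```r
# c() returns an atomic vector: mixing strings and numbers coerces all to character.
v <- c(a = "a", b = -2, c = 3)
class(v)

# A data frame stores one type per column, so b and c stay numeric.
d <- data.frame(a = c("a", "b", "c", "d"),
                b = c(-2, 0, 2, 4),
                c = c(3, 4, -5, 77))

# Converting the same data to a matrix coerces everything to character again.
m <- as.matrix(d)
class(m[1, "b"])

# Nest the data frame inside the list, as in the task.
mydata <- list("Some long text", 1:5, mix = list(a = "text", b = d))
mydata$mix$b[4, 3]
```

Because the numeric columns survive, the last line returns the number 77 rather than the string "77".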

Related

How to unnest a list of lists of data frame in R? [duplicate]

This question already has an answer here:
How to convert from a list of lists to a list in R retaining names?
(1 answer)
Closed 9 years ago.
I have a brief question, I would like to unnest this nested list:
mylist <- list(a = list(A=1, B=5),
b = list(C= 1, D = 2),
c = list(E = 1, F = 3))
Expected result is:
> list(a=c(1, 5), b = c(1, 2), c = c(1, 3))
$a
[1] 1 5
$b
[1] 1 2
$c
[1] 1 3
Any suggestions?
Slight variation on everyone else's and keeping it in base:
lapply(mylist, unlist, use.names=FALSE)
## $a
## [1] 1 5
##
## $b
## [1] 1 2
##
## $c
## [1] 1 3
Take a look at the llply function from the plyr package:
> library(plyr)
> llply(mylist, unlist)
$a
A B
1 5
$b
C D
1 2
$c
E F
1 3
If you want to get rid of the names, then try:
> lapply(llply(mylist, unlist), unname)
$a
[1] 1 5
$b
[1] 1 2
$c
[1] 1 3
I think applying unlist() to each element in your list should give you what you're looking for:
> mylist <- list(a = list(A=1, B=5), b = list(C= 1, D = 2), c = list(E = 1, F = 3))
> mylist2 <- list(a=c(1, 5), b = c(1, 2), c = c(1, 3))
> data.frame(lapply(mylist,unlist))
a b c
A 1 1 1
B 5 2 3
> data.frame(mylist2)
a b c
1 1 1 1
2 5 2 3
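To check that the base approach reproduces the expected result exactly, the two lists can be compared with identical(). This is only a sketch; the names expected and flat are illustrative:

```r
mylist <- list(a = list(A = 1, B = 5),
               b = list(C = 1, D = 2),
               c = list(E = 1, F = 3))
expected <- list(a = c(1, 5), b = c(1, 2), c = c(1, 3))

# Flatten each inner list to a plain numeric vector, dropping the inner names.
flat <- lapply(mylist, unlist, use.names = FALSE)
identical(flat, expected)

# Without use.names = FALSE the inner names (A, B, ...) are kept on the vectors.
named <- lapply(mylist, unlist)
names(named$a)
```

The outer names (a, b, c) are preserved by lapply() in both cases; only the inner names differ.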

Group column according to categories in a list

I have data frame like this:
library(dplyr)
set.seed(5)
# generate sample data
df <- data.frame(value = 1:10,
type = sample(LETTERS, 10))
value type
1 1 B
2 2 K
3 3 O
4 4 Y
5 5 I
6 6 U
7 7 G
8 8 S
9 9 C
10 10 F
I want to group the column "type" according to categories defined in a list:
groups <- list(LETTERS[1:7],
LETTERS[8:15],
LETTERS[16:20],
"other")
print(groups)
# [[1]]
# [1] "A" "B" "C" "D" "E" "F" "G"
#
# [[2]]
# [1] "H" "I" "J" "K" "L" "M" "N" "O"
#
# [[3]]
# [1] "P" "Q" "R" "S" "T"
#
# [[4]]
# [1] "other"
The output should be like:
value type group
1 1 B 1
2 2 K 2
3 3 O 2
4 4 Y other
5 5 I 2
6 6 U other
7 7 G 1
8 8 S 3
9 9 C 1
10 10 F 1
My approach works as follows:
# group data
df_grouped <- df %>%
mutate(group = ifelse(type %in% groups[[1]], 1,
ifelse(type %in% groups[[2]], 2,
ifelse(type %in% groups[[3]], 3, "other"))))
Since I have many more groups, I do not like the nested ifelse calls in the code; they make it hard to maintain. Is there a more efficient way to achieve this?
A simple way to do this would be to convert groups to a data frame using reshape2::melt and perform a left_join:
library(dplyr)
library(tidyr)
library(reshape2)
left_join(df, melt(groups), by = c(type = "value")) %>%
replace_na(list(L1 = "other")) %>%
rename(group = L1)
#> value type group
#> 1 1 B 1
#> 2 2 K 2
#> 3 3 O 2
#> 4 4 Y other
#> 5 5 I 2
#> 6 6 U other
#> 7 7 G 1
#> 8 8 S 3
#> 9 9 C 1
#> 10 10 F 1
A base R method that gives the same result would be
# For each type, find the index of the group that contains it
df$group <- sapply(df$type, function(s) {
  i <- which(sapply(groups, function(g) s %in% g))
  if (length(i) < 1) "other" else i
})
We can use enframe with join
library(dplyr)
library(tibble)
library(tidyr)
enframe(groups, value = 'type') %>%
unnest(c(type)) %>%
right_join(df)
Here is a base R option using stack + merge
out <- type.convert(merge(df,stack(setNames(groups,seq_along(groups))),by.x = "type",by.y = "values",all.x = TRUE))
replace(out,is.na(out),"other")[match(df$value,out$value),]
which gives
type value ind
1 B 1 1
6 K 2 2
7 O 3 2
10 Y 4 other
5 I 5 2
9 U 6 other
4 G 7 1
8 S 8 3
2 C 9 1
3 F 10 1
Convert the list to a named vector and use a standard lookup:
df$group = replace(v <- setNames(rep(seq_along(groups), lengths(groups)),
unlist(groups))[df$type], is.na(v), "other")
Another base alternative: The levels of a factor are renamed using a named list:
df$group = factor(df$type)
levels(df$group) = setNames(groups, seq_along(groups))
Now the "other" group is represented by NA. If you wish to change it:
df$group = as.character(df$group)
df$group[is.na(df$group)] = "other"
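The named-vector lookup above can also be written out step by step, which may be easier to maintain than the one-liner. This is a sketch, assuming the sample data is hard-coded from the data frame printed in the question (sample() can return different letters depending on the R version); lookup and idx are illustrative names:

```r
# Sample data copied from the question's printed data frame (an assumption).
df <- data.frame(value = 1:10,
                 type = c("B", "K", "O", "Y", "I", "U", "G", "S", "C", "F"),
                 stringsAsFactors = FALSE)
groups <- list(LETTERS[1:7], LETTERS[8:15], LETTERS[16:20], "other")

# Named vector: the names are the letters, the values are the group indices.
lookup <- setNames(rep(seq_along(groups), lengths(groups)), unlist(groups))

# Index by name; letters that belong to no group come back as NA.
idx <- unname(lookup[df$type])
df$group <- ifelse(is.na(idx), "other", as.character(idx))
print(df)
```

Adding a group then only requires adding an element to groups; the lookup code stays unchanged.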

Finding the capital letters in a string

I want to find the capital letters in each string and count how many there are in each string.
For example:
t = c("gctaggggggatggttactactGtgctatggactac", "gGaagggacggttactaCgTtatggactacT", "gcGaggggattggcttacG")
ldply(str_match_all(t,"[A-Z]"),length)
When I apply the above function, my output is
1 4 2
But my desired output is
[1] G -1
[2] G -1
C -1
T -2
[3] G -2
You can extract all capital letters and then compute the frequencies with table:
library(stringr)
lapply(str_extract_all(t, "[A-Z]"), table)
# [[1]]
#
# G
# 1
#
# [[2]]
#
# C G T
# 1 1 2
#
# [[3]]
#
# G
# 2
To extend docendo's answer to your exact requested format:
lapply(stringr::str_extract_all(t, "[A-Z]"),
function(x) {
x = table(x)
paste(names(x), x, sep = "-")
})
# [[1]]
# [1] "G-1"
#
# [[2]]
# [1] "C-1" "G-1" "T-2"
#
# [[3]]
# [1] "G-2"
And here is how I would do it in the tidyverse:
library(tidyverse)
data = data.frame(strings = c("gctaggggggatggttactactGtgctatggactac", "gGaagggacggttactaCgTtatggactacT", "gcGaggggattggcttacG"))
data %>%
  mutate(caps_freq = stringr::str_extract_all(strings, "[A-Z]"),
         caps_freq = map(caps_freq, function(letter) data.frame(table(letter)))) %>%
  unnest(caps_freq)
# strings letter Freq
# 1 gctaggggggatggttactactGtgctatggactac G 1
# 2 gGaagggacggttactaCgTtatggactacT C 1
# 3 gGaagggacggttactaCgTtatggactacT G 1
# 4 gGaagggacggttactaCgTtatggactacT T 2
# 5 gcGaggggattggcttacG G 2
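The same counts can be obtained without stringr by using regmatches() and gregexpr() from base R. This is a sketch rather than any of the answers above: the vector is named seqs here (an illustrative name, to avoid masking base t()), and the pattern is swapped for [[:upper:]] because character ranges such as [A-Z] can be locale-dependent in R.

```r
seqs <- c("gctaggggggatggttactactGtgctatggactac",
          "gGaagggacggttactaCgTtatggactacT",
          "gcGaggggattggcttacG")

# Extract every uppercase letter from each string (one character vector per string).
caps <- regmatches(seqs, gregexpr("[[:upper:]]", seqs))

# Tabulate the letters per string, then format each count as "LETTER-count".
counts <- lapply(caps, table)
formatted <- lapply(counts, function(x) paste(names(x), x, sep = "-"))
formatted
```

table() sorts the letters alphabetically, so the entries for the second string appear in the order C, G, T.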

How to ignore case when using subset in R

How to ignore case when using subset function in R?
eos91corr.data <- subset(test.data,select=c(c(X,Y,Z,W,T)))
I would like to select the columns named x, y, z, w, t, regardless of case. What should I do?
Thanks
If you can live without the subset() function, the tolower() function may work:
dat <- data.frame(XY = 1:5, x = 1:5, mm = 1:5,
y = 1:5, z = 1:5, w = 1:5, t = 1:5, r = 1:5)
dat[,tolower(names(dat)) %in% c("xy","x")]
However, this will return a data.frame with the columns in the order they are in the original dataset dat: both
dat[,tolower(names(dat)) %in% c("xy","x")]
and
dat[,tolower(names(dat)) %in% c("x","xy")]
will yield the same result, although the order of the target names has been reversed.
If you want the columns in the result to be in the order of the target vector, you need to be slightly more fancy. The two following commands both return a data.frame with the columns in the order of the target vector (i.e., the results will be different, with columns switched):
dat[,sapply(c("x","xy"),FUN=function(foo)which(foo==tolower(names(dat))))]
dat[,sapply(c("xy","x"),FUN=function(foo)which(foo==tolower(names(dat))))]
You could use regular expressions with the grep function to ignore case when identifying column names to select. Once you have identified the desired column names, then you can pass these to subset.
If your data are
dat <- data.frame(xy = 1:5, x = 1:5, mm = 1:5, y = 1:5, z = 1:5,
w = 1:5, t = 1:5, r = 1:5)
# xy x mm y z w t r
# 1 1 1 1 1 1 1 1 1
# 2 2 2 2 2 2 2 2 2
# 3 3 3 3 3 3 3 3 3
# 4 4 4 4 4 4 4 4 4
# 5 5 5 5 5 5 5 5 5
Then
(selNames <- grep("^[XYZWT]$", names(dat), ignore.case = TRUE, value = TRUE))
# [1] "x" "y" "z" "w" "t"
subset(dat, select = selNames)
# x y z w t
# 1 1 1 1 1 1
# 2 2 2 2 2 2
# 3 3 3 3 3 3
# 4 4 4 4 4 4
# 5 5 5 5 5 5
EDIT If your column names are longer than one letter, the above approach won't work too well. So assuming you can get your desired column names in a vector, you could use the following:
upperNames <- c("XY", "Y", "Z", "W", "T")
(grepPattern <- paste0("^", upperNames, "$", collapse = "|"))
# [1] "^XY$|^Y$|^Z$|^W$|^T$"
(selNames2 <- grep(grepPattern, names(dat), ignore.case = TRUE, value = TRUE))
# [1] "xy" "y" "z" "w" "t"
subset(dat, select = selNames2)
# xy y z w t
# 1 1 1 1 1 1
# 2 2 2 2 2 2
# 3 3 3 3 3 3
# 4 4 4 4 4 4
# 5 5 5 5 5 5
The 'stringr' package is a very neat wrapper for all of this functionality. It has an 'ignore.case' option.
Also, you may want to consider using match rather than subset.
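As a sketch of the match suggestion, both the wanted names and the actual column names can be lowercased before matching, which ignores case and returns the columns in the order of the target vector. The names wanted, idx and res are illustrative, and dat is the example data frame from the first answer:

```r
dat <- data.frame(XY = 1:5, x = 1:5, mm = 1:5,
                  y = 1:5, z = 1:5, w = 1:5, t = 1:5, r = 1:5)
wanted <- c("X", "Y", "Z", "W", "T")

# Position of each wanted name among the column names, ignoring case.
idx <- match(tolower(wanted), tolower(names(dat)))

# Drop names that were not found, then select by position.
idx <- idx[!is.na(idx)]
res <- dat[, idx, drop = FALSE]
names(res)
```

Note that match() returns only the first hit for each name, so if two columns differ only by case, only the first of them is selected.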
