Accessing list entries by name in a loop - r

I have a list where each entry is a vector of named integers, like this.
a = c(1:5)
b = c(2:6)
names(a) = c("a","b","c","d","e")
names(b) = c("b","c","d","e","f")
mylist <- list("0" = a, "1" = b)
The names inside list "mylist" will always start at "0" and increase by 1, but in my real scenario, the list will not always have two entries (can vary from 2 to a few dozen).
What I am trying to do is loop through all names in my list (in this case "0" and "1"), and access the information corresponding to each index. I've tried
for (x in 0:(length(mylist) - 1)) {
print(mylist$x)
}
and
for (x in 0:(length(mylist) - 1)) {
name = as.character(x)
print(mylist$name)
}
which also does not work. I'm aware that I can use normal list indexing such as
mylist[x]
but this includes the names (e.g. "0", "1") which I am trying to provide producing in the output. To clarify, I want my output to look like
print(mylist$"0")
output:
a b c d e
1 2 3 4 5
and not
print(mylist[1])
output:
$`0`
a b c d e
1 2 3 4 5

If we want to show similar to $, use [[
mylist[[1]]
for (x in 0:(length(mylist) - 1)) {
name = as.character(x)
print(mylist[[name]])
}
#a b c d e
#1 2 3 4 5
#b c d e f
#2 3 4 5 6

Related

Grouping data in R by presence of string within lists of strings

I have a number of variables to group on, some are "normal" grouping variables (i.e. numeric/character strings) and some of which are lists of strings. What I want to do is determine a group based on matches between the "normal" variables & the presence of a match between any string within these lists.
Example data:
dat <- data.frame(cod=c(1:5,1:5,3))
dat$lst <- c(list(c("a","a"),c("b"),c("x","x"),c("r","r"),c("t","t"),c("a"),c("e"),c("f","x"),c("e","q"),c("t"),c("f","f")))
dat <- dat %>% arrange(cod)
#Data
cod lst
1 a, a
1 a
2 b
2 e
3 x, x
3 f, x
3 f, f
4 r, r
4 e, q
5 t, t
5 t
So where lst contains a string that is present in several of the same cods I want to create a group, like this:
# Desired output
cod lst grp
1 a, a 1
1 a 1
2 b 2
2 e 3
3 x, x 4
3 f, x 4
3 f, f 4
4 r, r 5
4 e, q 6
5 t, t 7
5 t 7
In grp 4, all three observations should be linked as there are common list items shared between all three (i.e. one observation has a lst value of c(f,x) which links c(f,f) and c(x,x) within the cod group 3)
I tried to just create a logical TRUE/FALSE column that would show if some cods should be grouped via:
dat %>%
group_by(cod) %>%
mutate(grp = ifelse((lst %in% lst),TRUE,FALSE))
As well as via:
for (i in 1:dim(dat)[1]) {
if (all(any(dat$cod[i] %in% dat$cod) & any(dat$lst[[i]] %in% unlist(dat$list[-i])))) {
dat$grp[i] <- TRUE
} else {
dat$grp[i] <- FALSE
}
}
But nothing has been able to isolate unique groups so far. Any help greatly appreciated!!!! Thanks!

R: How to interpolate a string variable into a vector [duplicate]

I'm trying to set the default value for a function parameter to a named numeric. Is there a way to create one in a single statement? I checked ?numeric and ?vector but it doesn't seem so. Perhaps I can convert/coerce a matrix or data.frame and achieve the same result in one statement? To be clear, I'm trying to do the following in one shot:
test = c( 1 , 2 )
names( test ) = c( "A" , "B" )
The setNames() function is made for this purpose. As described in Advanced R and ?setNames:
test <- setNames(c(1, 2), c("A", "B"))
How about:
c(A = 1, B = 2)
A B
1 2
...as a side note, the structure function allows you to set ALL attributes, not just names:
structure(1:10, names=letters[1:10], foo="bar", class="myclass")
Which would produce
a b c d e f g h i j
1 2 3 4 5 6 7 8 9 10
attr(,"foo")
[1] "bar"
attr(,"class")
[1] "myclass"
The convention for naming vector elements is the same as with lists:
newfunc <- function(A=1, B=2) { body} # the parameters are an 'alist' with two items
If instead you wanted this to be a parameter that was a named vector (the sort of function that would handle arguments supplied by apply):
newfunc <- function(params =c(A=1, B=2) ) { body} # a vector wtih two elements
If instead you wanted this to be a parameter that was a named list:
newfunc <- function(params =list(A=1, B=2) ) { body}
# a single parameter (with two elements in a list structure
magrittr offers a nice and clean solution.
result = c(1,2) %>% set_names(c("A", "B"))
print(result)
A B
1 2
You can also use it to transform data.frames into vectors.
df = data.frame(value=1:10, label=letters[1:10])
vec = extract2(df, 'value') %>% set_names(df$label)
vec
a b c d e f g h i j
1 2 3 4 5 6 7 8 9 10
df
value label
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
6 6 f
7 7 g
8 8 h
9 9 i
10 10 j
To expand upon #joran's answer (I couldn't get this to format correctly as a comment): If the named vector is assigned to a variable, the values of A and B are accessed via subsetting using the [ function. Use the names to subset the vector the same way you might use the index number to subset:
my_vector = c(A = 1, B = 2)
my_vector["A"] # subset by name
# A
# 1
my_vector[1] # subset by index
# A
# 1

Extracting one row from multiple data tables and combining into a new data table while keeping column names

Three text files are in the same directory ("data001.txt", "data002.txt", "data003.txt"). I write a loop to read each data file and generate three data tables;
for(i in files) {
x <- read.delim(i, header = F, sep = "\t", na = "*")
setnames(x, 2, i)
assign(i,x)
}
So let's say each individual table looks something like this:
var1 var2 var3
row1 2 1 3
I've used rbind to combine all of the tables...
combined <- do.call(rbind, mget(ls(pattern="^data")))
and get something like this:
var1 var2 var3
row1 2 1 3
var1 var2 var3
row1 3 2 4
var1 var2 var3
row1 1 3 5
leaving me with superfluous column names. At the moment I can get around this by just deleting that specific row containing the column names, but it's a bit clunky.
colnames(combined) = combined[1, ] # make the first row the column names
combined <- combined[-1, ] # delete the now-unnecessary first row
toDelete <- seq(1, nrow(combined), 2) # define which rows to be deleted i.e. every second odd row
combined <- combined[ toDelete ,] # delete them suckaz
This does give me what I want...
var1 var2 var3
row1 2 1 3
row1 3 2 4
row1 1 3 5
But I feel like a better way would simply be to extract the values of "row1" as a vector or as a list or whatever, and combine them all together into one data table. I feel like there is a quick and easy way to do this but I haven't been able to find anything yet. I've had a look here and here and here.
One possibility is to take the second row (that I want), and convert it into a matrix (then transpose it to make it a row instead of column!?) and rbind:
data001.txt <- as.matrix(data001.txt[2,])
data001.txt <- t(data001.txt)
combined <- rbind(data001.txt, data002.txt)
This gives me more or less what I want except without the column name headers (e.g. va1, var2, var3).
v1 v2 v3
2 1 3
3 2 4
Any ideas? Would this second method work well if there is some way to add the column names? I feel like it's less clunky than the first method. Thanks for any input :)
edit - solved in answer below.
Figured it out. Converting to data matrix and using set.names from data.table package required. Say you have a range of text data files like the one that follows, and you want to extract just the seventh column (the one with the numbers, not letters), and combine them together in their own data table including the row names:
chemical1 a b c d e 1 g h i j k l m
chemical2 a b c d e 2 g h i j k l m
chemical3 a b c d e 3 g h i j k l m
chemical4 a b c d e 4 g h i j k l m
chemical5 a b c d e 5 g h i j k l m
setting row.names = 1 and header = F.
setwd("directory")
files <- list.files(pattern = "data") # take all files with 'data' in their name
for(i in files) {
x <- read.delim(i, row.names = 1, header = F, sep = "\t", na = "*")
setnames(x, 6, i) # if the data you want is in column six. Sets data file name as the column name.
x <- as.matrix(x[6]) # just take the sixth column with the numeric data (delete everything else)
x <- t(x) # transform (if you want..)
assign(i,x)
}
combined <- do.call(rbind, mget(ls(pattern="^data"))) # combine the data matrices into one table
write.table(combined, file="filename.csv", sep=",", row.names=T, col.names = NA)

R $ operator is invalid for atomic vectors

I have a dataset where one of the columns are only "#" sign. I used the following code to remove this column.
ia <- as.data.frame(sapply(ia,gsub,pattern="#",replacement=""))
However, after this operation, one of the integer column I had changed to factor.
I wonder what happened and how can i avoid that. Appreciate it.
A more correct version of your code might be something like this:
d <- data.frame(x = as.character(1:5),y = c("a","b","#","c","d"))
> d[] <- lapply(d,gsub,pattern = "#",replace = "")
> d
x y
1 1 a
2 2 b
3 3
4 4 c
5 5 d
But as you'll note, this approach will never actually remove the offending column. It's just replacing the # values with empty character strings. To remove a column of all # you might do something like this:
d <- data.frame(x = as.character(1:5),
y = c("a","b","#","c","d"),
z = rep("#",5))
> d[,!sapply(d,function(x) all(x == "#"))]
x y
1 1 a
2 2 b
3 3 #
4 4 c
5 5 d
Surely if you want to remove an offending column from a data frame, and you know which column it is, you can just subset. So, if it's the first column:
df <- df[,-1]
If it's a later column, increment up.

Create a numeric vector with names in one statement?

I'm trying to set the default value for a function parameter to a named numeric. Is there a way to create one in a single statement? I checked ?numeric and ?vector but it doesn't seem so. Perhaps I can convert/coerce a matrix or data.frame and achieve the same result in one statement? To be clear, I'm trying to do the following in one shot:
test = c( 1 , 2 )
names( test ) = c( "A" , "B" )
The setNames() function is made for this purpose. As described in Advanced R and ?setNames:
test <- setNames(c(1, 2), c("A", "B"))
How about:
c(A = 1, B = 2)
A B
1 2
...as a side note, the structure function allows you to set ALL attributes, not just names:
structure(1:10, names=letters[1:10], foo="bar", class="myclass")
Which would produce
a b c d e f g h i j
1 2 3 4 5 6 7 8 9 10
attr(,"foo")
[1] "bar"
attr(,"class")
[1] "myclass"
The convention for naming vector elements is the same as with lists:
newfunc <- function(A=1, B=2) { body} # the parameters are an 'alist' with two items
If instead you wanted this to be a parameter that was a named vector (the sort of function that would handle arguments supplied by apply):
newfunc <- function(params =c(A=1, B=2) ) { body} # a vector wtih two elements
If instead you wanted this to be a parameter that was a named list:
newfunc <- function(params =list(A=1, B=2) ) { body}
# a single parameter (with two elements in a list structure
magrittr offers a nice and clean solution.
result = c(1,2) %>% set_names(c("A", "B"))
print(result)
A B
1 2
You can also use it to transform data.frames into vectors.
df = data.frame(value=1:10, label=letters[1:10])
vec = extract2(df, 'value') %>% set_names(df$label)
vec
a b c d e f g h i j
1 2 3 4 5 6 7 8 9 10
df
value label
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
6 6 f
7 7 g
8 8 h
9 9 i
10 10 j
To expand upon #joran's answer (I couldn't get this to format correctly as a comment): If the named vector is assigned to a variable, the values of A and B are accessed via subsetting using the [ function. Use the names to subset the vector the same way you might use the index number to subset:
my_vector = c(A = 1, B = 2)
my_vector["A"] # subset by name
# A
# 1
my_vector[1] # subset by index
# A
# 1

Resources