Display a list of matrices without row and column names

I want to display a list of matrices (not a single matrix, as has been asked about elsewhere) without the small [1,] and [,1] row and column indicators.
For example, given myList:
myList <- list(matrix(c(1,2,3,4,5,6), nrow = 2), matrix(c(1,2,3,4,5,6), nrow = 3))
names(myList) <- c("This is the first matrix:", "This is the second matrix:")
I'm looking for some function myFunction() that will output:
> myFunction(myList)
$`This is the first matrix:`
1 3 5
2 4 6
$`This is the second matrix:`
1 4
2 5
3 6
It would be even better if it could eliminate the $... around the list names so that it would display:
This is the first matrix:
1 3 5
2 4 6
This is the second matrix:
1 4
2 5
3 6
After reading all the related questions, I've tried
library(magrittr)  # for %>%
myList %>% lapply(print, row.names = FALSE)
myList %>% lapply(prmatrix, collab = NULL, rowlab = NULL)
myList %>% lapply(write.table, sep = " ", row.names = FALSE, col.names = FALSE)
But none work as intended.

So you are just missing the headers? How about something like
library(purrr)  # for walk2() and %>%
print_with_name <- function(mat, name) {
  cat(name, "\n", sep = "")  # the list name, without the $`...` decoration
  write.table(mat, sep = " ", row.names = FALSE, col.names = FALSE)  # bare matrix rows
}
myList %>% walk2(., names(.), print_with_name)
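If you'd rather avoid the purrr dependency, here is a minimal base-R sketch of the same idea. It pairs each matrix with its name via Map(), and invisible() hides the list of NULLs that Map() returns. It assumes myList is built as in the question.

```r
# Build the example list from the question
myList <- list(matrix(c(1, 2, 3, 4, 5, 6), nrow = 2),
               matrix(c(1, 2, 3, 4, 5, 6), nrow = 3))
names(myList) <- c("This is the first matrix:", "This is the second matrix:")

print_with_name <- function(mat, name) {
  cat(name, "\n", sep = "")                                          # header line
  write.table(mat, sep = " ", row.names = FALSE, col.names = FALSE)  # bare rows
}

# Map() walks the matrices and their names in parallel
invisible(Map(print_with_name, myList, names(myList)))
```

This prints the names and matrix rows in the second layout requested above, with no $ markers and no [1,] / [,1] indicators.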

Related

Assign() to specific indices of vectors, vectors specified by string names

I'm trying to assign values to specific indices of a long list of vectors (in a loop), where each vector is specified by a string name. The naive approach
testVector1 <- c(0, 0, 0)
vectorName <- "testVector1"
indexOfInterest <- 3
assign(x = paste0(vectorName, "[", indexOfInterest, "]"), value = 1)
doesn't work. Instead, it creates a new variable literally named "testVector1[3]" (the goal was to change the value of testVector1 to c(0, 0, 1)).
I know the problem is solvable by overwriting the whole vector:
temporaryVector <- get(x = vectorName)
temporaryVector[indexOfInterest] <- 1
assign(x = vectorName, value = temporaryVector)
but I was hoping for a more direct approach.
Is there some alternative to assign() that solves this?
Similarly, is there a way to assign values to specific elements of columns in data frames, where both the data frames and columns are specified by string names?

If you must do this, you can do it with eval(parse()):
valueToAssign <- 1
stringToParse <- paste0(
vectorName, "[", indexOfInterest, "] <- ", valueToAssign
)
eval(parse(text = stringToParse))
testVector1
# [1] 0 0 1
But this is not recommended. Better to put the desired objects in a named list, e.g.:
testVector1 <- c(0, 0, 0)
dat <- data.frame(a = 1:5, b = 2:6)
l <- list(
testVector1 = testVector1,
dat = dat
)
Then you can assign to them by name or index:
vectorName <- "testVector1"
indexOfInterest <- 3
dfName <- "dat"
colName <- "a"
rowNum <- 3
valueToAssign <- 1
l[[vectorName]][indexOfInterest] <- valueToAssign
l[[dfName]][rowNum, colName] <- valueToAssign
l
# $testVector1
# [1] 0 0 1
# $dat
# a b
# 1 1 2
# 2 2 3
# 3 1 4
# 4 4 5
# 5 5 6
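Since the question mentions doing this in a loop, here is a minimal sketch of the named-list approach driven by a vector of names. testVector2 and testVector3 are hypothetical extra entries added for illustration.

```r
# Named list standing in for many separately named vectors
l <- list(
  testVector1 = c(0, 0, 0),
  testVector2 = c(0, 0, 0),
  testVector3 = c(0, 0, 0)
)
vectorNames <- paste0("testVector", 1:3)  # names held as strings
indexOfInterest <- 3

# Modify one element of each vector, looked up by its string name
for (vectorName in vectorNames) {
  l[[vectorName]][indexOfInterest] <- 1
}
l$testVector2
# [1] 0 0 1
```

If separate variables are really needed afterwards, list2env(l, envir = globalenv()) copies the list elements back out as individual objects.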

How does the table and $freq function work in R

I want a function for the mode of a vector. Abhiroop Sarkar's answer to this question works, but I want to understand why.
Here is the code
Mode <- function(x){
y <- data.frame(table(x))
y[y$Freq == max(y$Freq),1]
}
1) Why do we need to put the table in a data frame?
2) In this line
y[y$Freq == max(y$Freq),1]
what does y$Freq do? Is Freq a default column in the table?

When we convert a one-dimensional table to a data.frame, it creates a two-column data.frame:
set.seed(24)
v1 <- table(sample(1:5, 100, replace = TRUE))
y <- data.frame(v1)
y
# Var1 Freq
#1 1 19
#2 2 24
#3 3 22
#4 4 16
#5 5 19
The first column, 'Var1', holds the names (the distinct values) from the table output, and 'Freq' holds the count for each of those values.
y[y$Freq == max(y$Freq), 1]
#[1] 2
#Levels: 1 2 3 4 5
Here we subset the first column, 'Var1', at the rows where 'Freq' equals its maximum. The result is a vector (a factor here) rather than a data.frame because [ defaults to drop = TRUE when a single column is selected.
If we want to return a data.frame with a single column, add drop = FALSE at the end:
y[y$Freq == max(y$Freq), 1, drop = FALSE]
# Var1
#2 2
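The data frame step is a convenience rather than a requirement. As a minimal base-R sketch (Mode2 is a hypothetical name), the same logic can work on the table directly. Like the original, it returns every value tied for the highest count, but as a character vector of names rather than a factor.

```r
# Hypothetical helper: mode(s) of a vector computed straight from table()
Mode2 <- function(x) {
  tab <- table(x)              # counts, named by the distinct values of x
  names(tab)[tab == max(tab)]  # every value whose count equals the maximum
}
Mode2(c(1, 2, 2, 3, 3, 4))
# [1] "2" "3"
```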
The default name Freq comes from the responseName argument of the as.data.frame.table method, which is what data.frame() ends up calling for a table object:
as.data.frame.table
function (x, row.names = NULL, ..., responseName = "Freq", stringsAsFactors = TRUE,
sep = "", base = list(LETTERS))
{
ex <- quote(data.frame(do.call("expand.grid", c(dimnames(provideDimnames(x,
sep = sep, base = base)), KEEP.OUT.ATTRS = FALSE, stringsAsFactors = stringsAsFactors)),
Freq = c(x), row.names = row.names))
names(ex)[3L] <- responseName
eval(ex)
}
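Because responseName is an ordinary argument, overriding it is a quick way to confirm where the Freq column comes from (the example data here is made up):

```r
tab <- table(c("a", "b", "b"))

default_df <- as.data.frame(tab)                          # columns: Var1, Freq
renamed_df <- as.data.frame(tab, responseName = "Count")  # columns: Var1, Count
names(default_df)
names(renamed_df)
```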

Splitting string into multiple columns

I have the following string structure as a column value in my data frame:
Y: 10 ,W: 3 , cp: 0.05
The numeric values in each row differ, but the structure remains the same. I want to split this string into 3 columns, each containing only the numbers. So there will be one column for Y with the corresponding numeric value, another for W and the last for cp.
I have tried using str_split in the following way:
str_split(string,pattern = " ,",simplify = TRUE )
which obviously gives me:
[,1] [,2] [,3]
[1,] "Y: 40 " "W: 2" " cp: 0.05"
Now, I want to keep only the numbers in each of those columns. Still learning this stuff so not sure how to proceed! Any help is highly appreciated!

There are definitely nicer ways, but this should do the job:
Updated to handle a string vector with more than one element and to return a matrix with three named columns. It should work on vectors of any length.
library(stringr)
string <- c("Y: 10 ,W: 3 , cp: 0.05","Y: 4 ,W: 9 , cp: 2.2")
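# split each string into its "label: value" parts, split those on ": " and keep the values;
# the values come out column by column (all Y, then all W, then all cp)
vec <- str_split(str_split(string, " ,", simplify = TRUE), ": ", simplify = TRUE)[,2]
mtx <- matrix(
  vec,
  nrow = length(vec)/3,
  ncol = 3)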
colnames(mtx) <- c("Y","W","cp")
mtx
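The matrix above still holds character values. As a base-R alternative to the stringr calls (a different technique: regular-expression capture groups via regexec() and regmatches()), the sketch below pulls the three numbers out directly and converts them to numeric. It assumes every string follows the Y / W / cp layout shown in the question; mtx_num is a hypothetical name.

```r
string <- c("Y: 10 ,W: 3 , cp: 0.05", "Y: 4 ,W: 9 , cp: 2.2")

# One capture group per number; spaces around the separators are optional
pattern <- "Y: *([0-9.]+) *, *W: *([0-9.]+) *, *cp: *([0-9.]+)"
parts <- regmatches(string, regexec(pattern, string))

# Each element is c(full match, Y, W, cp): drop the full match and convert
mtx_num <- do.call(rbind, lapply(parts, function(p) as.numeric(p[-1])))
colnames(mtx_num) <- c("Y", "W", "cp")
mtx_num
```

If some strings might not match, check lengths(parts) first: a non-matching string produces an empty element rather than a row.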

Maybe not the most elegant way but it works:
library(dplyr)
library(stringr)
library(tidyr)
tibble(row = c(1,2),
col = c("Y: 10 ,W: 3 , cp: 0.05","Y: 4 ,W: 9 , cp: 2.2")) %>%
separate(col, into=c("col1", "col2", "col3"), sep = ",") %>%
gather(id, col, -row) %>%
select(-id) %>%
mutate(col = str_trim(col)) %>%
separate(col, into=c("letter", "number"), sep=":") %>%
mutate(number = str_trim(number)) %>%
spread(letter, number) %>%
select(-row)
# A tibble: 2 x 3
cp W Y
<chr> <chr> <chr>
1 0.05 3 10
2 2.2 9 4
Note that I had to add a new column named row to your data frame to make this approach work: spread() needs it to know which values belong to the same original row.

I sometimes find that reformatting name: value pair data into an existing structured format helps take care of the complexity. In this case, I've formatted each string as a JSON object and then used stream_in from jsonlite to deal with the data.
This is nice because it names the columns automatically and also handles cases where not every value is present in every row, or where the order changes. E.g.:
txt <- c(
"Y: 10 ,W: 3 , cp: 0.05",
"Y: 6 ,W: 7 , cp: 0.08",
"cp: 0.08, Y: 6 "
)
library(jsonlite)
proctxt <- paste("{", gsub("([A-Za-z]+?):", '"\\1":', txt), "}")
stream_in(textConnection(proctxt))
# Found 3 records...
# Imported 3 records. Simplifying...
# Y W cp
#1 10 3 0.05
#2 6 7 0.08
#3 6 NA 0.08

You can remove all unneeded characters, e.g. with gsub, and then use strsplit or read.csv.
In base R it would look like:
string <- c("Y: 10 ,W: 3 , cp: 0.05", "Y: 10 ,W: 3 , cp: 0.05")
read.csv(text=gsub("[[:alpha:]: ]", "", string), header=FALSE)
# V1 V2 V3
#1 10 3 0.05
#2 10 3 0.05
#or with strsplit
strsplit(gsub("[[:alpha:]: ]", "", string), ",")
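The strsplit() variant returns a list of character vectors. One way to turn it into a numeric matrix with named columns, sketched here in base R and assuming every string yields exactly three values:

```r
string <- c("Y: 10 ,W: 3 , cp: 0.05", "Y: 4 ,W: 9 , cp: 2.2")

# Strip letters, colons and spaces, then split on the remaining commas
parts <- strsplit(gsub("[[:alpha:]: ]", "", string), ",")

# vapply() builds a 3 x n matrix (one column per string); t() puts strings in rows
mat <- t(vapply(parts, as.numeric, numeric(3)))
colnames(mat) <- c("Y", "W", "cp")
mat
```

Because the gsub pattern removes every letter, it would also strip the e from scientific notation such as 1e-05, so this only suits plain decimal numbers.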

Given that your text strings are uniform, this should be relatively simple. The first part would look like this:
txt <- c(
"Y: 10 ,W: 3 , cp: 0.05",
"Y: 6 ,W: 7 , cp: 0.08",
"Y: 5 ,W: 0 , cp: 0.08"
)
x <- do.call(rbind, strsplit(txt, split = " ,"))
That gives you a matrix of your "label: value" strings.
library(stringr)
y <- matrix(data = str_extract(string = x,
pattern = "([0-9.]+)"),
ncol = ncol(x))
This gives you text strings representing your values. If you prefer, you can use str_extract() without the matrix() call to get your values as a vector, and:
z <- matrix(data = as.numeric(y),
ncol = ncol(x))
gives you the matrix as numerics, which sounds like what you're after.
All together it's fairly tidy. Without the intermediate matrix() call (if you don't need it), it would look like:
library(stringr)
txt <- c(
"Y: 10 ,W: 3 , cp: 0.05",
"Y: 6 ,W: 7 , cp: 0.08",
"Y: 5 ,W: 0 , cp: 0.08"
)
x <- do.call(rbind, strsplit(txt, split = " ,"))
y <- str_extract(string = x,
pattern = "([0-9.]+)")
z <- matrix(data = as.numeric(y),
ncol = ncol(x))
With z giving you a matrix of numerics.

I believe this should work:
library(tidyverse)
string <- c("Y: 10 ,W: 3 , cp: 0.05","Y: 4 ,W: 9 , cp: 2.2")
dat <- tibble(x = string) %>%
separate(x,c("Y","W","cp"), sep = " ,")
dat2 <- dat %>% mutate_all(., ~str_remove(.,"\\D+"))

How can I use lapply to remove empty rows from a list of data frames?

I have a list of data frames. There are 28 data frames in my list. Some of the data frames have empty rows but not all. How can I use lapply or a similar function to remove empty rows from all data frames in the list?
Here is what I have tried, which I modified from this question. Unfortunately, this returned only those rows that were empty.
#Get list of all files that will be analyzed
filenames = list.files(pattern = ".csv")
#read in all files in filenames
mydata_run1 = lapply(filenames, read.csv, header = TRUE, quote = "")
#Remove empty rows
mydata_run1 = lapply(mydata_run1, function(x) sapply(mydata_run1, nrow)>0)
Thank you.

I assume you want to remove rows that are empty (NA) across all columns.
If so:
# drop a row only if every one of its columns is NA;
# drop = FALSE keeps single-column data frames from collapsing to vectors
lapply(data, function(x) x[rowSums(is.na(x)) != ncol(x), , drop = FALSE])
output
$df1
A B
1 1 4
3 3 6
$df2
A B
1 1 NA
3 3 6
data
data <- list(
df1 = data.frame(A = c(1,NA,3), B = c(4, NA, 6)),
df2 = data.frame(A = c(1,NA,3), B = c(NA, NA, 6)))
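The answer above treats a row as empty only when every cell is NA. Data read with read.csv() can also contain empty strings in character columns, which is.na() does not catch. A minimal sketch that treats NA and whitespace-only strings alike, using a hypothetical helper drop_blank_rows() and made-up example data:

```r
# Hypothetical helper: drop rows where every cell is NA or blank text
drop_blank_rows <- function(df) {
  # TRUE for NA, "" or whitespace-only values in one column
  is_blank <- function(col) is.na(col) | trimws(as.character(col)) == ""
  # A row is dropped only if it is blank in every column
  all_blank <- Reduce(`&`, lapply(df, is_blank))
  df[!all_blank, , drop = FALSE]
}

dfs <- list(
  df1 = data.frame(A = c("x", "", "z"), B = c(1, NA, 3), stringsAsFactors = FALSE),
  df2 = data.frame(A = c(1, NA, 3), B = c(NA, NA, 6))
)
cleaned <- lapply(dfs, drop_blank_rows)
cleaned
```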

aggregate values in dataframe by partly matching rownames in R

I'm fumbling around with the following problem, but to no avail:
d <- data.frame(value = 1:4, row.names = c("abc", "abcd", "ef", "gh"))
value
abc 1
abcd 2
ef 3
gh 4
l <- nrow(d)
wordmat <- matrix(rep(NA, l^2), l, l, dimnames = list(row.names(d), row.names(d)))
for (i in 1:ncol(wordmat)) {
rid <- agrep(colnames(wordmat)[i], rownames(wordmat), max = 0)
d$matchid[i] <- paste(rid, collapse = ";")
}
# desired output:
(d_agg <- data.frame(value = c(3, 3, 4), row.names = c("abc;abcd", "ef", "gh")))
value
abc;abcd 3
ef 3
gh 4
Is there a function for this?

Here's a possible solution that you might be able to modify to suit your needs.
Some notes:
I couldn't figure out how to deal with rownames() directly, particularly in the last stage, so this depends on you being happy with copying your row names as a new variable.
The function below "hard-codes" the variable names, functions, and so on. That is to say, it is not by any means a generalized function, but one which might be useful as you look further into this problem.
Here's the function.
matches <- function(data, ...) {
temp = vector("list", nrow(data))
for (i in 1:nrow(data)) {
temp1 = agrep(data$RowNames[i], data$RowNames, value = TRUE, ...)
temp[[i]] = data.frame(RowNames = paste(temp1, collapse = "; "),
value = sum(data[temp1, "value"]))
}
temp = do.call(rbind, temp)
temp[!duplicated(temp$RowNames), ]
}
Note that the function needs a column called RowNames, so we'll create that, and then test the function.
d <- data.frame(value = 1:4, row.names = c("abc", "abcd", "ef", "gh"))
d$RowNames <- rownames(d)
matches(d)
# RowNames value
# 1 abc; abcd 3
# 3 ef 3
# 4 gh 4
matches(d, max.distance = 2)
# RowNames value
# 1 abc; abcd 3
# 3 abc; abcd; ef; gh 10
matches(d, max.distance = 4)
# RowNames value
# 1 abc; abcd; ef; gh 10
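Part of why the groupings shift with max.distance is that agrep() does approximate substring matching: a pattern matches any element that contains it within the allowed number of edits. A small check on the example row names shows that "abc" picks up "abcd" even with no edits allowed, while "abcd" only reaches "abc" under the default tolerance (0.1 of the pattern length, rounded up to 1 edit here):

```r
nms <- c("abc", "abcd", "ef", "gh")

exact_abc  <- agrep("abc",  nms, max.distance = 0, value = TRUE)  # "abc" is a substring of both
exact_abcd <- agrep("abcd", nms, max.distance = 0, value = TRUE)  # only an exact containment
fuzzy_abcd <- agrep("abcd", nms, value = TRUE)                    # default 0.1 allows 1 edit
exact_abc
exact_abcd
fuzzy_abcd
```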

This works for your example but may need tweaking for the real thing:
d <- data.frame(value = 1:4, row.names = c("abc", "abcd", "ef", "gh"))
rowclust <- hclust(as.dist(adist(rownames(d))), method="single")
rowgroups <- cutree(rowclust, h=1.5)
rowagg <- aggregate(d, list(rowgroups), sum)
rowname <- unclass(by(rownames(d), rowgroups, paste, collapse=";"))
rownames(rowagg) <- rowname
rowagg
Group.1 value
abc;abcd 1 3
ef 2 3
gh 3 4
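If the real grouping rule is simply "a name belongs with a shorter name it contains", an exact-substring approach avoids tuning distances. This is a different technique from agrep() or clustering: a base-R sketch that maps each name to the shortest name it contains (grepl() with fixed = TRUE) and aggregates by that root. It does not chain groups transitively, so treat it as a starting point rather than a general solution.

```r
d <- data.frame(value = 1:4, row.names = c("abc", "abcd", "ef", "gh"))
nms <- rownames(d)

# For each name, pick the shortest name that it contains as a fixed substring
root <- vapply(nms, function(n) {
  hits <- nms[vapply(nms, function(p) grepl(p, n, fixed = TRUE), logical(1))]
  hits[which.min(nchar(hits))]
}, character(1))

# Sum values per root and build "abc;abcd"-style labels for each group
sums   <- tapply(d$value, root, sum)
labels <- tapply(nms, root, paste, collapse = ";")
d_agg  <- data.frame(value = as.vector(sums),
                     row.names = as.vector(labels[names(sums)]))
d_agg
```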