merging values horizontally in dataframe - r

I am trying to merge two subsets of a dataframe together, but neither merge nor cbind seems to do exactly what I want. So far I have this:
library(psych)
df1<-NULL
df1$a<-c(1,2,3,4,5)
df1$b<-c(4,5,2,6,1)
df1$c<-c(0,9,0,6,3)
df1$gender<-c(0,0,0,1,1)
df1<-as.data.frame(df1)
male<-subset(df1,gender<1)
male<-male[,-c(4)]
female<-subset(df1,gender>=1)
female<-female[,-c(4)]
merge(corr.test(male)$r,corr.test(female)$r)
My end goal is something like this in every cell:
a b c
a 1/1 -0.6546537/-1 0/-1
....

You can concatenate the corresponding entries of the two matrices with sprintf, which returns a plain character vector, and then set the dimensions of that vector to match the corr.test output using dim<-, i.e. dim(x) <- value.
## Concatenate the entries
strs <- sprintf("%s/%s", round(corr.test(male)$r,2),
round(corr.test(female)$r, 2))
## Set the dimensions
dim(strs) <- c(3,3)
## Or (to have the value returned at the same time)
`dim<-`(strs, c(3, 3))
# [,1] [,2] [,3]
# [1,] "1/1" "-0.65/-1" "0/-1"
# [2,] "-0.65/-1" "1/1" "0.76/1"
# [3,] "0/-1" "0.76/1" "1/1"
Another trick, if you want to keep the row and column names from the corr.test output and not have to worry about dimensions, is to overwrite the contents of one result in place:
## Get one result
ctest <- corr.test(male)$r
## Concatenate
strs <- sprintf("%s/%s", round(ctest,2),
round(corr.test(female)$r, 2))
## Overwrite the matrix with the strings
ctest[] <- strs
ctest
# a b c
# a "1/1" "-0.65/-1" "0/-1"
# b "-0.65/-1" "1/1" "0.76/1"
# c "0/-1" "0.76/1" "1/1"
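If psych is not available, the same overwrite-in-place trick should work with base R's cor(), which for complete data and default settings is expected to give the same correlation matrix as corr.test(...)$r. A minimal sketch under that assumption (the names vars, r_male, r_female and combined are illustrative, not from the original post):

```r
# Rebuild the example data from the question
df1 <- data.frame(a = c(1, 2, 3, 4, 5),
                  b = c(4, 5, 2, 6, 1),
                  c = c(0, 9, 0, 6, 3),
                  gender = c(0, 0, 0, 1, 1))
vars <- c("a", "b", "c")

# Base R cor() is swapped in here for psych::corr.test()$r (assumption)
r_male   <- cor(df1[df1$gender < 1, vars])
r_female <- cor(df1[df1$gender >= 1, vars])

# Start from one matrix so its dimensions and dimnames are kept,
# then overwrite every cell with the "male/female" string
combined <- r_male
combined[] <- paste(round(r_male, 2), round(r_female, 2), sep = "/")
combined
```

Assigning through combined[] replaces the values but leaves the attributes alone, which is why the a/b/c labels survive without any call to dim or dimnames.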

Related

Creating vectors with different names in for loop using R

I want to create vectors having a different character name in a loop but without indexing them to a number as it is shown in this post.
The names I want for the vectors are already in a vector I created.
Id <- c("Name1","Name2")
My current code creates a list
ListEx<- vector("list", length(Id))
## add names to list entries
names(ListEx) <- Id
## We run the loop
for (i in 1:length(Id)){
ListEx[[i]]<-getbb(Id[i])
}
## The getbb function returns a 2x2 matrix with the minimum and maximum
## latitude/longitude of the city (whose name is "Namei")
## Check the values of a matrix inside the list
ListEx[["Name1"]]
I would like to have Name1 as an object of its own containing the values of ListEx[["Name1"]].
You are just missing one line of code. I have created a dummy function that creates a 2x2 matrix here:
Id <- c("Name1","Name2")
ListEx<- vector("list", length(Id))
## add names to list entries
names(ListEx) <- Id
f <- function(v) { return(matrix(rnorm(4), nrow=2))}
## We run the loop
for (i in 1:length(Id)){
ListEx[[i]]<-f(Id[i])
}
## Check the values of a matrix inside the list
ListEx[["Name1"]]
list2env(ListEx, globalenv()) # This is the relevant line
Name1
# [,1] [,2]
# [1,] -0.4462014 0.3178423
# [2,] 1.8384113 0.7546780
Name2
# [,1] [,2]
# [1,] -1.3315121 2.1159171
# [2,] 0.2517896 0.1966196
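If you would rather not write into the global environment, list2env also accepts any other environment through its envir argument. A small sketch of that variation, where make_bbox is a hypothetical stand-in for getbb and env is an illustrative name:

```r
Id <- c("Name1", "Name2")

# Hypothetical stand-in for getbb(): returns a fixed 2x2 matrix
make_bbox <- function(name) matrix(seq_len(4), nrow = 2)

# Build the named list without an explicit loop
ListEx <- setNames(lapply(Id, make_bbox), Id)

# Copy each list element into a dedicated environment instead of globalenv()
env <- new.env()
list2env(ListEx, envir = env)

ls(env)                     # names of the objects that were created
get("Name1", envir = env)   # same matrix as ListEx[["Name1"]]
```

Keeping the objects in their own environment (or simply in the list) avoids accidentally overwriting existing variables that happen to share a name with an entry in Id.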

R nrow function returning NULL or 1

This should be very simple but I cannot resolve it. I derive what I think is a matrix from the str_match_all function. It appears this is not the case, despite its appearance. I am able to extract the timestamp value from the first row of the 'matrix' by hard coding the indices in sapply [1,2]. I want to do the same thing for the last entry in the matrix and thought I would easily be able to extract the number of rows in the matrix to do this e.g. [nrow(sm),2], but cannot! See below:
sm <- str_match_all(regex_text, regex_list[row, "regex_pattern"] )
print(sm)
#This gives me this (which is good):
# [[1]]
# [,1] [,2] [,3]
# [1,] "09/08/2014 13:01CONTENT_ACCESS.preparing" "09/08/2014 13:01" "CONTENT_ACCESS.preparing"
# [2,] "09/08/2014 13:06CONTENT_ACCESS.preparing" "09/08/2014 13:06" "CONTENT_ACCESS.preparing"
# [3,] "09/08/2014 13:08CONTENT_ACCESS.preparing" "09/08/2014 13:08" "CONTENT_ACCESS.preparing"
#Get the first timestamp
start_t_stamp <- sapply(sm, function(x) x[1,2])
print(start_t_stamp)
# Also good, I get [1] "09/08/2014 13:01"
#Get the last timestamp. How do I extract the 'number of rows' in sm?
#This returns NULL
print(nrow(sm))
#transform to matrix???
t_sm <- t(sm)
#This then prints "[1,] Character,9"
print(t_sm)
#Therefore this prints 1
print(nrow(t_sm))
Thanks in advance...
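For what it's worth, the [[1]] at the top of the printout suggests that sm is a list holding one matrix rather than a matrix itself, which would explain why nrow(sm) is NULL. A minimal sketch of that reading, using a hand-built list as a stand-in for the str_match_all result (an assumption, since the original regex_text and pattern are not shown):

```r
# Stand-in for the str_match_all() output: a list containing one character matrix
sm <- list(matrix(c(
  "09/08/2014 13:01CONTENT_ACCESS.preparing", "09/08/2014 13:01", "CONTENT_ACCESS.preparing",
  "09/08/2014 13:06CONTENT_ACCESS.preparing", "09/08/2014 13:06", "CONTENT_ACCESS.preparing",
  "09/08/2014 13:08CONTENT_ACCESS.preparing", "09/08/2014 13:08", "CONTENT_ACCESS.preparing"),
  ncol = 3, byrow = TRUE))

nrow(sm)        # NULL, because a list has no dim attribute
nrow(sm[[1]])   # row count of the matrix stored in the list

# Take the row count inside the function, where x is the matrix itself
start_t_stamp <- sapply(sm, function(x) x[1, 2])
end_t_stamp   <- sapply(sm, function(x) x[nrow(x), 2])
end_t_stamp
```

The first-row version already worked because x inside sapply is the matrix; the same holds for nrow(x), so the last row can be indexed the same way.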

R: Creating a data frame from list with missing values.

I have a list here that looks like this:
head(h)
[[1]]
[1] "gene=dnaA" "locus_tag=CD630_00010" "location=1..1320"
[[2]]
character(0)
[[3]]
[1] "locus_tag=CD630_05950" "location=719777..720313"
[[4]]
[1] "gene=dnrA" "locus_tag=CD630_00010" "location=50..1320"
I'm having trouble trying to manipulate this list to create a data.frame with three columns. For the rows with missing gene info, I want to list them as "gene=unnamed" and completely remove the empty rows into a matrix as shown:
[,1] [,2] [,3]
[1,] "gene=dnaA" "locus_tag=CD630_00010" "location=1..1320"
[2,] "gene=thrA" "locus_tag=CD630_05950" "location=719777..720313"
[3,] "gene=dnrA" "locus_tag=CD630_00010" "location=50..1320"
This is what I have right now, but I get an error about missing values in the gene column. Any suggestions?
h <- data.frame(h[lapply(h,length)>0])
h <- t(h)
rownames(h) <- NULL
# Data
l <- list(c("gene=dnaA","locus_tag=CD630_00010", "location=1..1320"),
character(0), c("locus_tag=CD630_05950", "location=719777..720313"),
c("gene=dnrA","locus_tag=CD630_00010" ,"location=50..1320" ))
# Manipulation
n <- sapply(l, length)
seq.max <- seq_len(max(n))
# Pad every element to the maximum length (missing entries become NA)
df <- t(sapply(l, "[", i = seq.max))
# Move the NAs to the front of each row so the gap lands in the gene column
df <- t(apply(df,1,function(x){
c(x[is.na(x)],x[!is.na(x)])}))
# Drop rows that are entirely NA, then label the remaining gaps
df <- df[rowSums(!is.na(df))>0, ]
df[is.na(df)] <- "gene=unnamed"
Output:
[,1] [,2] [,3]
[1,] "gene=dnaA" "locus_tag=CD630_00010" "location=1..1320"
[2,] "gene=unnamed" "locus_tag=CD630_05950" "location=719777..720313"
[3,] "gene=dnrA" "locus_tag=CD630_00010" "location=50..1320"
There are a number of methods for binding lists with unequal lengths: see bind_rows from dplyr, rbind.fill from plyr, or rbindlist from data.table. Here is one way using base R:
## Sample data
h <- list(letters[1:3],
character(0),
letters[4:5])
out <- do.call(rbind, lapply(h, `length<-`, 3)) # fix lengths and make matrix
out <- out[rowSums(!is.na(out))>0, ] # remove empty rows
out[is.na(out)] <- "gene=unnamed" # rename NA
data.frame(out)
# X1 X2 X3
# 1 a b c
# 2 d e gene=unnamed
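Both answers rely on position, so the placeholder ends up wherever the padding happens to fall. If the entries always carry a key= prefix, as they do in the question, one alternative is to look each field up by its key. A sketch under that assumption (keys and pick are illustrative names, not from the original posts):

```r
h <- list(c("gene=dnaA", "locus_tag=CD630_00010", "location=1..1320"),
          character(0),
          c("locus_tag=CD630_05950", "location=719777..720313"),
          c("gene=dnrA", "locus_tag=CD630_00010", "location=50..1320"))

keys <- c("gene", "locus_tag", "location")

# Drop the empty elements first
h <- h[lengths(h) > 0]

# Return the entry for one key, or "<key>=unnamed" when it is absent
pick <- function(x, key) {
  hit <- x[startsWith(x, paste0(key, "="))]
  if (length(hit) == 0) paste0(key, "=unnamed") else hit[1]
}

# One row per list element, one column per key
out <- t(sapply(h, function(x) vapply(keys, function(k) pick(x, k), character(1))))
out
```

This keeps a missing gene in the gene column even if a record were to list its fields in a different order.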

Triplicates in R

I have a set of 80 samples, with 2 variables, each measured as triplicate:
sample var1a var1b var1c var2a var2b var2c
1 -169.784 -155.414 -146.555 -175.295 -159.534 -132.511
2 -180.577 -180.792 -178.192 -177.294 -171.809 -166.147
3 -178.605 -184.183 -177.672 -167.321 -168.572 -165.335
and so on. How do I apply functions like mean, sd, se etc. for each row for var1 and var2? Also, the dataset contains NAs. Thanks for bothering with such basic questions
What is your expected result when there are NAs? apply(df[-1], 1, mean) (or whatever function) will work, but it returns NA for any row that contains an NA. If replacing NA with 0 is acceptable, you could run df[is.na(df)] <- 0 first and then the apply call to get the results.
One approach could be to reshape your data set. Another is simply to apply a function over the rows of a subset of the data frame.
So, for var2X you have:
apply(dat[5:7], 1, function(x){m <- mean(x); s <- sd(x); da <-c(m, s) })
[,1] [,2] [,3]
[1,] -155.78000 -171.750000 -167.076000
[2,] 21.63763 5.573734 1.632348
and for var1X:
apply(dat[2:4], 1, function(x){m <- mean(x); s <- sd(x); da <-c(m, s) })
[,1] [,2] [,3]
[1,] -157.25100 -179.853667 -180.153333
[2,] 11.72295 1.443055 3.520835
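Since the data set contains NAs, another option is to skip them with na.rm = TRUE rather than replacing them with 0, which would pull the means towards zero. A rough sketch using the three rows from the question; one value is set to NA here purely for illustration, and row_stats is a hypothetical helper name:

```r
dat <- data.frame(sample = 1:3,
                  var1a = c(-169.784, -180.577, -178.605),
                  var1b = c(-155.414, -180.792, -184.183),
                  var1c = c(-146.555, -178.192, -177.672),
                  var2a = c(-175.295, -177.294, -167.321),
                  var2b = c(-159.534, -171.809, -168.572),
                  var2c = c(NA, -166.147, -165.335))  # NA inserted for illustration

# Row-wise mean, sd and standard error, ignoring missing replicates
row_stats <- function(m) {
  m <- as.matrix(m)
  n <- rowSums(!is.na(m))            # replicates actually present in each row
  s <- apply(m, 1, sd, na.rm = TRUE)
  data.frame(mean = rowMeans(m, na.rm = TRUE), sd = s, se = s / sqrt(n), n = n)
}

var1 <- row_stats(dat[grep("^var1", names(dat))])
var2 <- row_stats(dat[grep("^var2", names(dat))])
var1
var2
```

The standard error divides by the number of non-missing replicates in each row, so a row with only two measurements is not treated as if it had three.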

Keep column name when select one column from a data frame/matrix in R

In R, when I select only one column from a data frame/matrix, the result becomes a vector and loses the column name. How can I keep the column name?
For example, if I run the following code,
x <- matrix(1,3,3)
colnames(x) <- c("test1","test2","test3")
x[,1]
I will get
[1] 1 1 1
Actually, I want to get
test1
[1,] 1
[2,] 1
[3,] 1
The following code gives me exactly what I want, but is there an easier way to do this?
x <- matrix(1,3,3)
colnames(x) <- c("test1","test2","test3")
y <- as.matrix(x[,1])
colnames(y) <- colnames(x)[1]
y
Use the drop argument:
> x <- matrix(1,3,3)
> colnames(x) <- c("test1","test2","test3")
> x[,1, drop = FALSE]
test1
[1,] 1
[2,] 1
[3,] 1
Another possibility is to use subset:
> subset(x, select = 1)
test1
[1,] 1
[2,] 1
[3,] 1
The question mentions 'matrix or dataframe' as an input. If x is a dataframe, use LIST SUBSETTING notation, which will keep the column name and will NOT simplify by default!
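x <- matrix(1,3,3)
colnames(x) <- c("test1","test2","test3")
x <- as.data.frame(x)
x[,1] # matrix-style subsetting: simplifies to a vector, the name is lost
x[1] # list-style subsetting: one-column data frame, the name is kept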
Data frames possess the characteristics of both lists and matrices: if you subset with a single vector, they behave like lists; if you subset with two vectors, they behave like matrices.
There's an important difference if you select a single column: matrix subsetting simplifies by default, list subsetting does not.
source: See http://adv-r.had.co.nz/Subsetting.html#subsetting-operators for details
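To put the options from the answers side by side, here is a short sketch that applies them to both a matrix and a data frame (the object names m, d, m1, d1, d2, d3 and v are illustrative):

```r
m <- matrix(1, 3, 3, dimnames = list(NULL, c("test1", "test2", "test3")))
d <- as.data.frame(m)

m1 <- m[, 1, drop = FALSE]   # still a 3x1 matrix with column name "test1"
d1 <- d[, 1, drop = FALSE]   # matrix-style subsetting, simplification turned off
d2 <- d[1]                   # list-style subsetting by position
d3 <- d["test1"]             # list-style subsetting by name
v  <- d[[1]]                 # double brackets extract the bare vector

colnames(m1)
names(d1)
is.data.frame(d2)
v
```

For data frames, d[1] and d["test1"] are usually the least error-prone, since they do not depend on remembering the drop argument.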
