Hi, I'm trying to use some dates as the column names of my data frame, but they only appear as numbers (11595, for example), even when I force them with as.Date.
Is there another way to do this? Thanks!
dates <- seq(as.Date("2000-1-1"), as.Date("2018-10-1"), by="3 months") -1
d.test <- data.frame(matrix(0, ncol = 8, nrow = 8))
for (i in 1:8) {
colnames(d.test)[i] <- as.Date(dates[i], "yyyy-mm-dd")
}
Convert the dates with as.character(), and avoid for loops when they aren't necessary. See ?colnames or ?rownames: the value assigned should be a character vector.
dates <- seq(as.Date("2000-1-1"), as.Date("2018-10-1"), by="3 months") -1
d.test <- data.frame(matrix(0, ncol = 8, nrow = 8))
colnames(d.test) <- as.character(dates)[1:ncol(d.test)]
See ?colnames. Values for column names should be a character vector and will be coerced using as.character(). Column names are just labels, not variable types.
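Here is a minimal sketch of why the loop in the question produces numbers, using the same dates vector. Assigning a Date into one element of the existing character vector of names goes through the default [<- subassignment, which ignores the Date class and stores the underlying day count. Converting the whole vector up front with format() (or as.character()) avoids this and lets you choose the date format. The variable name bad is only for illustration.

```r
# Quarter-end dates, as in the question (dates[8] is 2001-09-30, day number 11595)
dates <- seq(as.Date("2000-01-01"), as.Date("2018-10-01"), by = "3 months") - 1
d.test <- data.frame(matrix(0, ncol = 8, nrow = 8))

# Element-wise assignment into a character vector keeps only the number
bad <- colnames(d.test)
bad[8] <- dates[8]
print(bad[8])

# Convert all names at once; format() controls the text of each date
colnames(d.test) <- format(dates[seq_len(ncol(d.test))], "%Y-%m-%d")
print(colnames(d.test))
```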
Is there a way to create a data frame by supplying the row and column names from two different files?
File for rows:
sample1_44849
sample2_56479
sample3_98764
sample4_54321
and so on ...
File for columns:
e000456.c1
e000567.c1
e003456.c1
e000786.c1
and 22000 more like these ...
This data frame will contain values of 0 or 1.
The easiest way would be to first create a matrix with dimnames corresponding to the names from the files and then convert it to a data.frame.
r.names <- read.table(text = "sample1_44849
sample2_56479
sample3_98764
sample4_54321")
c.names <- read.table(text = "e000456.c1
e000567.c1
e003456.c1
e000786.c1")
res <- matrix(NA, nrow = nrow(r.names), ncol = nrow(c.names),
dimnames = list(unlist(r.names), unlist(c.names)))
res <- data.frame(res)
This code solves the original question. The comment at the end is a different matter.
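A sketch of the same idea that reads the names directly from two files, assuming each file holds one name per line. The file paths here are hypothetical stand-ins created with tempfile(); point readLines() at your real files instead. readLines() returns a plain character vector (no factors), the matrix is filled with 0L because the table will only hold 0 or 1, and check.names = FALSE keeps the names exactly as read, even if some are not syntactically valid R names.

```r
# Hypothetical input files: one name per line
row_file <- tempfile(fileext = ".txt")
col_file <- tempfile(fileext = ".txt")
writeLines(c("sample1_44849", "sample2_56479", "sample3_98764", "sample4_54321"), row_file)
writeLines(c("e000456.c1", "e000567.c1", "e003456.c1", "e000786.c1"), col_file)

# Read the names as character vectors
r.names <- readLines(row_file)
c.names <- readLines(col_file)

# 0/1 table, initialised to 0, with the names as dimnames
res <- matrix(0L, nrow = length(r.names), ncol = length(c.names),
              dimnames = list(r.names, c.names))

# Convert to a data frame without altering the column names
res <- data.frame(res, check.names = FALSE)

# Set a single cell by row and column name
res["sample2_56479", "e003456.c1"] <- 1L
```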
I've been working on a process to create all possible combinations of unique integers for lengths 1:n. I found the nCr function (the combn function in the combinat package) to be useful here.
Once all unique combinations have been generated, they are appended to a consolidation table that contains every possible length and combination of the digits 1:n. A subset of the final table's relevant column (one record) looks like this (the column is named String and the subset table is f1):
c(1,3,4,5,9,10)
I need to select these columns from a secondary data source (df) one at a time (I am going to loop through this table), so my logic was to use this code:
df[,f1$String]
However, I get a message that says that undefined columns are selected, but if I copy and paste the contents of the cell such as:
df[,c(1, 3, 4, 5, 9, 10)]
it works fine ... I've tried all I can think of at this point; if anyone has some insight it would be greatly appreciated.
Code to reproduce is:
library(combinat)
library(data.table)
library(plyr)
rm(list=ls())
NCols=10
NRows=10
myMat<-matrix(runif(NCols*NRows), ncol=NCols)
XVars <- as.data.frame(myMat)
colnames(XVars) <- c("a","b","c","d","e","f","g","h","i","j")
x1 <- as.data.frame(colnames(XVars[1:ncol(XVars)]))
colnames(x1) <- "Independent.Variable"
setDT(x1)[, Index := .GRP, by = "Independent.Variable"]
colClasses = c("character", "numeric", "numeric")
col.names = c("String", "r!", "n!")
Combination <- read.table(text = "", colClasses = colClasses, col.names = col.names)
for(i in 1:nrow(x1)){
x2<- as.data.frame(combn(nrow(x1),i))
for (i in 1:ncol(x2)){
x3 <- paste("c(",paste(x2[1:nrow(x2),i], collapse = ", "), ")", sep="")
x3 <- as.data.frame(x3)
colnames(x3) <- "String"
x3 <- mutate(x3, "r!" = nrow(x2))
x3 <- mutate(x3, "n!" = nrow(x1))
Combination <- rbind(Combination, x3)
}
}
setDT(Combination)[, Index := .GRP, by = c("String", "r!", "n!")]
f1 <- Combination[717,]
f1$String <- as.character(f1$String)
## reference to data frame
myMat[,(f1$String)]
## pasted element
myMat[, c(1, 3, 4, 5, 9, 10)]
f1$String is the string "c(1, 3, 4, 5, 9, 10)". When you use myMat[,(f1$String)], R will look for the column with name "c(1, 3, 4, 5, 9, 10)". To get column numbers 1,3,4,5,9,10, you have to parse the string to an R expression and evaluate it first:
myMat[,eval(parse(text=f1$String))]
As @user3794498 noticed, you converted f1$String with as.character(), so you cannot use it directly to get the columns you want.
You can change the way you define f1, or extract the column numbers from f1$String. Something like this should also work (load stringr first; it also re-exports the %>% pipe): myMat[, f1$String %>% str_match_all("[0-9]+") %>% unlist %>% as.numeric]
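Another option, sketched here under the assumption that you can change how the combinations are stored, is to avoid building strings at all. utils::combn(n, k, simplify = FALSE) returns each combination as an integer vector, so a list of them can index the matrix directly. If you already have the strings, base R's regmatches() and gregexpr() can pull out the numbers without eval(parse()). The names combos, idx, sub1, sub2 and f1_string are illustrative; utils:: is spelled out because library(combinat) masks combn.

```r
set.seed(1)
myMat <- matrix(runif(100), ncol = 10)
n <- ncol(myMat)

# All combinations of 1:n for lengths 1..n, each stored as an integer vector
combos <- unlist(lapply(seq_len(n), function(k) utils::combn(n, k, simplify = FALSE)),
                 recursive = FALSE)
length(combos)   # 2^n - 1 combinations in total
combos[[717]]    # 1 3 4 5 9 10, the same record as f1 in the question

# The list elements can be used as column indices directly
sub1 <- myMat[, combos[[717]]]

# If the combination is already a string, extract the digits instead of eval(parse())
f1_string <- "c(1, 3, 4, 5, 9, 10)"
idx <- as.integer(regmatches(f1_string, gregexpr("[0-9]+", f1_string))[[1]])
sub2 <- myMat[, idx]
```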
Let's say there is a matrix, 'mat', which has 115 columns.
There is another matrix, 'res_mat', which has a column containing 38 of the column names of 'mat'.
I want to create a third matrix, 'fin_mat', which is a subset of 'mat' containing only the columns whose names are stored in that column of 'res_mat'.
Or in other words, I have a list of column names which is stored in a variable. How can I create a subset of the first matrix containing the columns which are stored in a variable?
Doesn't seem very difficult. If I understand your question correctly, something like this will do it.
# First make up some matrix
mat <- matrix(1:24, ncol = 6)
colnames(mat) <- paste0("Col", 1:6)
# These would be the columns to keep
res_mat <- matrix(c("Col1", "Col3", "Col4"), ncol = 1)
fin_mat <- mat[, res_mat[, 1]]
fin_mat
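Two things are worth guarding against when the list of names comes from another object, shown here in a small sketch on the same made-up matrix: a name that isn't in mat makes [ stop with "subscript out of bounds", and a list with a single name returns a plain vector unless you add drop = FALSE. "Col9" below is a deliberately missing name, used only for illustration.

```r
mat <- matrix(1:24, ncol = 6)
colnames(mat) <- paste0("Col", 1:6)

# "Col9" is not a column of mat; mat[, wanted] would fail with "subscript out of bounds"
wanted <- c("Col1", "Col3", "Col9")
missing_cols <- setdiff(wanted, colnames(mat))
print(missing_cols)

# Keep only the names that exist, and keep the result a matrix
fin_mat <- mat[, intersect(wanted, colnames(mat)), drop = FALSE]

# With a single name, drop = FALSE keeps a 4 x 1 matrix instead of a vector
one_col <- mat[, "Col2", drop = FALSE]
```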
One way would be to use the dplyr package with the functions select() and one_of(). one_of() lets you select columns by their names, given as strings.
Here is a simple example with the iris table, in which I extract the columns named "Sepal.Length" and "Sepal.Width".
library(dplyr)
mat1 <- iris
mat2 <- data.frame(names = c("Sepal.Length", "Sepal.Width")) %>%
mutate(names = as.character(names)) #make sure the names are characters
results <- mat1 %>% select(one_of(mat2$names))
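In recent dplyr versions (via tidyselect), one_of() is superseded by all_of() and any_of(). A minimal sketch, assuming a recent dplyr is installed: all_of() raises an error if any name is missing, while any_of() silently skips missing names. "Not.A.Column" is a made-up name for illustration. A plain character vector of names is enough here; there is no need to wrap it in a data frame first.

```r
library(dplyr)

wanted <- c("Sepal.Length", "Sepal.Width")

# all_of(): every name must exist, otherwise select() raises an error
res_all <- iris %>% select(all_of(wanted))

# any_of(): names that don't exist are ignored
res_any <- iris %>% select(any_of(c(wanted, "Not.A.Column")))

names(res_all)
names(res_any)
```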
It can be done pretty easily. In the code below, I'm creating a data frame mat and another one, res_mat. mat holds the data and has 10 columns named a, b, c, ..., j. res_mat has a single column named select_these_cols with five rows: a, b, c, d, e. All that needs to be done is to pass res_mat$select_these_cols to mat:
a <- matrix(rnorm(1000), nrow = 100, ncol = 10)
mat <- as.data.frame(a)
names(mat) <- letters[1:10]
res_mat <- data.frame(x = letters[1:5], stringsAsFactors = FALSE) # keep the names as character, not factor
names(res_mat) <- 'select_these_cols'
fin_mat <- mat[res_mat$select_these_cols] # subsetting operation
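Why the character conversion matters, as a minimal sketch: in R versions before 4.0.0, data.frame() turned strings into factors by default, and indexing with a factor uses its integer codes rather than its labels (see ?Extract). In the example above that goes unnoticed because the levels a to e happen to map onto columns 1 to 5; with other names it silently picks the wrong columns. The factor sel is made up for illustration.

```r
mat <- as.data.frame(matrix(rnorm(1000), nrow = 100, ncol = 10))
names(mat) <- letters[1:10]

# Levels are sorted ("a", "e"), so the integer codes are 2 and 1
sel <- factor(c("e", "a"))

wrong <- mat[sel]                # indexes by codes: columns "b" and "a"
right <- mat[as.character(sel)]  # indexes by labels: columns "e" and "a"

names(wrong)
names(right)
```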
Why is my last step converting the data frame to a vector? I want to keep the first 6000 observations in the data frame key.
set.seed(1)
key <- data.frame(matrix(NA, nrow = 10000, ncol = 1))
names(key) <- "ID"
key$ID <- replicate(10000,
rawToChar(as.raw(sample(c(48:57,65:90,97:122), 8, replace=T))))
key <- unique(key) # still a data frame
key <- key[1:6000,] # no longer a data frame
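Adding drop = FALSE prevents the data frame from being converted to a vector (apply it to the data frame returned by unique(), not after the last line above has already turned key into a vector):
key1 <- key[1:6000, , drop = FALSE] # keeps the result a data frame
According to the documentation (?Extract.data.frame):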
drop: logical. If ‘TRUE’ the result is coerced to the lowest
possible dimension. The default is to drop if only one
column is left, but not to drop if only one row is left.
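Alternatively, you could use subset(), though it is usually a bit slower. Selecting by row position is safer than comparing row names: unique() keeps the original row names of the rows it retains, and a condition such as as.numeric(rownames(key)) < 6000 would return only 5999 rows even when no duplicates were removed.
key2 <- subset(key, seq_len(nrow(key)) <= 6000)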
is.data.frame(key2)
#[1] TRUE
This works because the data.frame method of subset() uses drop = FALSE by default:
## S3 method for class 'data.frame'
subset(x, subset, select, drop = FALSE, ...) #by default it uses drop=F
It's being coerced to a vector because it can be: when only one column remains, dropping the result to a vector is the default (drop = TRUE). R is trying to be "helpful".
This will keep it as a dataframe:
set.seed(1)
key <- data.frame(matrix(NA, nrow = 10000, ncol = 1))
names(key) <- "ID"
key$ID <- replicate(10000,
rawToChar(as.raw(sample(c(48:57,65:90,97:122), 8, replace=T))))
key <- unique(key)
key <- as.data.frame(key[1:6000,]) # still a data frame, but the column is no longer named "ID"
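A small sketch comparing these approaches on a made-up four-row key: key[rows, ] drops to a character vector, as.data.frame() restores a data frame but the column loses the name "ID", while drop = FALSE (or head(), which returns the first rows of a data frame as a data frame) keeps both the class and the column name.

```r
key <- data.frame(ID = c("a1", "b2", "c3", "d4"), stringsAsFactors = FALSE)

v <- key[1:2, ]                 # one column left: dropped to a character vector
w <- as.data.frame(key[1:2, ])  # a data frame again, but the column is not named "ID"
d <- key[1:2, , drop = FALSE]   # stays a data frame with column "ID"
h <- head(key, 2)               # also a data frame with column "ID"

str(v)
names(w)
names(d)
```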
I've created an R script that calculates the percentage of missing values in each column of a data frame and then removes the columns that exceed a preset threshold. The column names need to be kept.
The names are maintained when there is more than one column in the data frame after column deletion, but not when there is only one column.
Code where the column names are kept:
df <- data.frame(A=rnorm(10, 10, 1), B=rep(NA, 10), C=rnorm(10, 10, 1))
threshold <- 80
pmiss <- function(x) {
ifelse(sum(is.na(x))/length(x)*100 > threshold, TRUE, FALSE)
}
temp <- sapply(df, pmiss)
deletecols <- names(temp[temp==TRUE])
df <- as.data.frame(df[,!(names(df) %in% deletecols)])
names(df) #prints
[1] "A" "C"
However, if I define df as
df <- data.frame(A=rnorm(10, 10, 1), B=rep(NA, 10))
and
names(df) #prints
[1] "df[, !(names(df) %in% deletecols)]"
Does anybody know why the column names are not kept when there is only one column?
You've been bitten by an R FAQ. Add drop = FALSE to your data frame subsetting (as a side effect, you'll notice that you no longer need as.data.frame()).
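A minimal sketch of the script from the question with drop = FALSE added, keeping the same 80% threshold. The mean of is.na(x) is the fraction of missing values, so it can replace sum()/length(), and the comparison already returns TRUE or FALSE, so ifelse() isn't needed. List-style indexing, df[keep], would also always return a data frame.

```r
set.seed(1)
df <- data.frame(A = rnorm(10, 10, 1), B = rep(NA, 10))
threshold <- 80

# TRUE when more than `threshold` percent of the column is NA
pmiss <- function(x) mean(is.na(x)) * 100 > threshold

deletecols <- names(df)[vapply(df, pmiss, logical(1))]

# drop = FALSE keeps a one-column data frame, and its column name
df <- df[, !(names(df) %in% deletecols), drop = FALSE]

names(df)
```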