I have a data.frame in R whose row.names are character, and I would like them to be numeric. I tried the approach from a similar question (linked here), but it doesn't work.
Here is my code:
attr(DF1, "row.names")
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20"
After I apply the approach linked above:
DF1$id <- as.integer(row.names(DF1))
DF1[order(DF1$id), ]
I get the same result:
attr(DF1, "row.names")
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19" "20"
and I would like the result to look like data frame DF2:
attr(DF2, "row.names")
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
The help page ?rownames says (emphasis mine):
For a data frame, value for rownames should be a character vector of non-duplicated and non-missing names (this is enforced), and for colnames a character vector of (preferably) unique syntactically-valid names. In both cases, value will be coerced by as.character, and setting colnames will convert the row names to character.
You could make them integer like this:
df <- data.frame(x = 1:3)
rownames(df) <- as.character(5:7)
attr(df, "row.names")
#> [1] "5" "6" "7"
rownames(df) <- as.integer(rownames(df))
attr(df, "row.names")
#> [1] 5 6 7
Note that row.names will always return a character vector. See ?row.names.
row.names(df)
#> [1] "5" "6" "7"
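If the character row names are just the default sequence "1", "2", ..., another option is to reset them, which makes R fall back to automatic integer row names. The following is a minimal sketch, assuming a small stand-in for DF1 (the real DF1 is not shown in the question). Note that the attempt in the question only adds an id column and prints a sorted copy, so the row.names attribute is never modified.

```r
# Stand-in for DF1 with character row names (assumed for illustration)
DF1 <- data.frame(x = c(10, 20, 30))
rownames(DF1) <- c("1", "2", "3")
is.character(attr(DF1, "row.names"))   # stored as character

# Dropping the row names restores automatic (integer) row names
rownames(DF1) <- NULL
is.integer(attr(DF1, "row.names"))     # now stored as integer
```

This only makes sense when the row names carry no information beyond the row position; otherwise the as.integer() route above keeps the original values.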
This question already has answers here:
How to convert a data frame column to numeric type?
===========================================================================
Update 2/20/2021:
I looked into the problem and found that it is in the second file.
Sex was originally coded as "F" and "M". When I change it with:
subject.info[subject.info$Sex=='F',]$Sex=1
subject.info[subject.info$Sex=='M',]$Sex=2
the weird thing is that R silently changed 1 to "1". What is even weirder is that the values look numeric when you print the data frame.
My question is why this happens, not how to convert the type of values in a data.frame. I don't understand why someone insists it is a duplicate question, even though similar answers can solve the problem.
=================================================================================
I have two text files. One file is .txt and the other is .csv.
The .csv file has one additional column (with NA values). All the others are the same.
When I read those files with the commands:
subject.info = read.table(paste(data_dir, "outd01_all_subject_info.txt", sep = slash), header=TRUE)
subject.info = read.csv("data_d01_features/outd01_all_subject_info2.txt", sep = ',', header=TRUE, stringsAsFactors = F)
The dataframe subject.info looks the same, but when I run:
as.matrix(subject.info)
All the data in the second file are converted to strings:
SUBJID Sex age trauma_age ptsd
[1,] "600039015048" "2" "11" NA "0"
[2,] "600110937794" "1" "10" NA "0"
[3,] "600129552715" "1" "11" " 8" "2"
[4,] "600210241146" "1" "18" "16" "2"
[5,] "600294620965" "1" "13" NA "0"
[6,] "600409285352" "2" "16" "15" "1"
[7,] "600460215379" "1" "10" NA "0"
[8,] "600547831711" "1" "10" " 6" "1"
[9,] "600561317124" "2" "19" "19" "1"
[10,] "600635899969" "2" "11" NA "0"
[11,] "600647003585" "1" "18" NA "0"
[12,] "600682103788" "1" "18" "15" "2"
[13,] "600689706588" "1" "16" "15" "2"
[14,] "600747749665" "2" " 9" " 7" "1"
This does not happen for the first file:
SUBJID Sex age ptsd
[1,] 600039015048 2 10 0
[2,] 600110937794 1 9 0
[3,] 600129552715 1 10 2
[4,] 600210241146 1 17 2
[5,] 600294620965 1 13 0
[6,] 600409285352 2 15 1
[7,] 600460215379 1 8 0
[8,] 600547831711 1 8 1
[9,] 600561317124 2 19 1
[10,] 600635899969 2 11 0
[11,] 600647003585 1 19 0
[12,] 600682103788 1 18 2
[13,] 600689706588 1 15 2
[14,] 600747749665 2 8 1
Is this due to the NA values? But when I replace NAs with 0 in the second file, the problem still exists:
SUBJID Sex age trauma_age ptsd
[1,] "600039015048" "2" "11" " 0" "0"
[2,] "600110937794" "1" "10" " 0" "0"
[3,] "600129552715" "1" "11" " 8" "2"
[4,] "600210241146" "1" "18" "16" "2"
[5,] "600294620965" "1" "13" " 0" "0"
[6,] "600409285352" "2" "16" "15" "1"
[7,] "600460215379" "1" "10" " 0" "0"
[8,] "600547831711" "1" "10" " 6" "1"
[9,] "600561317124" "2" "19" "19" "1"
[10,] "600635899969" "2" "11" " 0" "0"
[11,] "600647003585" "1" "18" " 0" "0"
[12,] "600682103788" "1" "18" "15" "2"
[13,] "600689706588" "1" "16" "15" "2"
[14,] "600747749665" "2" " 9" " 7" "1"
The problem also persists if I convert the second file to a .csv file, or if I use read.table or read.csv2.
From the output, it looks like column trauma_age is of class character, which turns everything into character. Check class(subject.info$trauma_age).
Turn it into numeric with:
subject.info$trauma_age <- as.numeric(subject.info$trauma_age)
and then try converting to a matrix again, i.e. as.matrix(subject.info).
You can also use type.convert to convert every column to its appropriate type automatically, without having to name the columns.
subject.info <- type.convert(subject.info, as.is = TRUE)
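The mechanism behind the "why" can be sketched as follows. This is a small made-up data frame, not the asker's file, and it uses a simpler assignment form than the one in the question: assigning a number into part of a character column coerces the number to character, and as.matrix() on a data frame with any non-numeric column returns a character matrix.

```r
# Toy data standing in for subject.info (values are illustrative only)
subject.info <- data.frame(
  SUBJID = c(600039015048, 600110937794, 600129552715),
  Sex    = c("M", "F", "F"),
  age    = c(11, 10, 11),
  stringsAsFactors = FALSE
)

# Replacing some elements of a character vector keeps the vector character,
# so the numbers 1 and 2 are stored as "1" and "2"
subject.info$Sex[subject.info$Sex == "F"] <- 1
subject.info$Sex[subject.info$Sex == "M"] <- 2
class(subject.info$Sex)               # still "character"

# One character column is enough to make the whole matrix character
is.character(as.matrix(subject.info))

# type.convert() re-detects the column types
fixed <- type.convert(subject.info, as.is = TRUE)
is.numeric(as.matrix(fixed))
```

Printing a data frame does not quote character values, which is why the column looks numeric on screen; str(subject.info) shows the actual types.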
I create a data frame from different vectors:
pat1<-c(11, 12, 13, 14, 15)
pat2<-c(1:5)
pat3<-seq(1,10, by=2)
pat4<-seq(-5,3, by=2)
pat5<-c(pat1+pat2)
variables<-c("a","b","c","d","e")
mydata<-data.frame(variables, pat1, pat2,pat3, pat4, pat5)
mydata<-t(mydata)
I transpose the data frame so that columns become rows and I get the correct table, but the numbers are not doubles:
          [,1] [,2] [,3] [,4] [,5]
variables "a"  "b"  "c"  "d"  "e"
pat1      "11" "12" "13" "14" "15"
pat2      "1"  "2"  "3"  "4"  "5"
pat3      "1"  "3"  "5"  "7"  "9"
pat4      "-5" "-3" "-1" " 1" " 3"
pat5      "12" "14" "16" "18" "20"
How can I get doubles for my pat values?
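One possible approach, sketched under the assumption that the variables column is meant to label the observations rather than to be data: a matrix can hold only one type, so transposing a data frame that contains a character column yields a character matrix. Keeping the labels as row names leaves only numeric columns, and the transpose stays numeric.

```r
pat1 <- c(11, 12, 13, 14, 15)
pat2 <- c(1:5)
pat3 <- seq(1, 10, by = 2)
pat4 <- seq(-5, 3, by = 2)
pat5 <- c(pat1 + pat2)
variables <- c("a", "b", "c", "d", "e")

# Use the labels as row names instead of as a data column
mydata <- data.frame(pat1, pat2, pat3, pat4, pat5, row.names = variables)

# All remaining columns are numeric, so t() returns a numeric matrix
tdata <- t(mydata)
typeof(tdata)
tdata["pat5", "a"]
```

After the transpose, the labels appear as column names of tdata, and the pat vectors are its rows.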
My data grows each quarter, and the start date varies between data sets.
I have written code which runs lots of tests and produces forecasts, and it is automatically documented with graphs and tables of the data.
Everything works fine until the length of the data or the start date changes, because the data in the tables then either has the wrong length or doesn't line up with the correct quarter.
Here is an example:
Test.data <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27)
Test.dates <- c("08Q1","08Q2","08Q3","08Q4","09Q1","09Q2","09Q3","09Q4","10Q1","10Q2","10Q3","10Q4","11Q1","11Q2","11Q3","11Q4","12Q1","12Q2","12Q3","12Q4","13Q1","13Q2","13Q3","13Q4","14Q1","14Q2","14Q3")
Test <- matrix(c(Test.data,""),nrow=4,byrow=FALSE)
colnames(Test) <- c("'08","'09","'10","'11","'12","'13","'14")
rownames(Test) <- c("Qtr 1", "Qtr 2", "Qtr 3", "Qtr 4")
Which quite nicely gives:
      '08 '09 '10 '11 '12 '13 '14
Qtr 1   1   5   9  13  17  21  25
Qtr 2   2   6  10  14  18  22  26
Qtr 3   3   7  11  15  19  23  27
Qtr 4   4   8  12  16  20  24
However, in the next quarter the data will grow by one value and the code will fail with an error:
Warning message:
In matrix(c(Test.data, ""), nrow = 4, byrow = FALSE) :
data length [29] is not a sub-multiple or multiple of the number of rows [4]
Error in `colnames<-`(`*tmp*`, value = c("'08", "'09", "'10", "'11", "'12", :
length of 'dimnames' [2] not equal to array extent
Or if a data set begins in 08Q2 instead of 08Q1 then the data will all be next to the wrong quarter.
I need to display my data in the specific way of:
'yr1 'yr2 'yr3 ...
Qtr 1
Qtr 2
Qtr 3
Qtr 4
Does anyone have any suggestions on how I can get this to adjust automatically to fit my data without having to change anything? Very soon it will be connected to a database that will constantly produce results, so it cannot be changed by hand each time the data has a different length.
Thank you for your help.
Please comment below if you want any more information
Test.data.padded <- as.character(Test.data)
length(Test.data.padded) <- ceiling(length(Test.data.padded) / 4) * 4
Test.data.padded[is.na(Test.data.padded)] <- ""
Test <- matrix(Test.data.padded, nrow=4, byrow=FALSE)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#[1,] "1" "5" "9" "13" "17" "21" "25"
#[2,] "2" "6" "10" "14" "18" "22" "26"
#[3,] "3" "7" "11" "15" "19" "23" "27"
#[4,] "4" "8" "12" "16" "20" "24" ""
Then use a regex to extract the years from your Test.dates.
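The regex step, together with padding for a series that does not start in Q1, could be sketched as below. It assumes the dates always have the form "yyQn" and that the quarters are contiguous with no gaps; the variable names years, quarters, lead and padded are introduced here for illustration only.

```r
Test.data  <- 1:27
# Rebuild the question's dates: "08Q1", "08Q2", ..., "14Q3"
Test.dates <- paste0(sprintf("%02d", rep(8:14, each = 4)), "Q", 1:4)[1:27]

# Split each date into its year and quarter part
years    <- sub("Q[1-4]$", "", Test.dates)
quarters <- as.integer(sub("^.*Q", "", Test.dates))

# Pad the front when the series starts after Q1, and the back to fill the last year
lead   <- quarters[1] - 1
padded <- c(rep("", lead), as.character(Test.data))
length(padded) <- ceiling(length(padded) / 4) * 4
padded[is.na(padded)] <- ""

# Column names come from the data, so they follow the series length
Test <- matrix(padded, nrow = 4,
               dimnames = list(paste("Qtr", 1:4), paste0("'", unique(years))))
Test
```

Because the number of columns and their names are both derived from Test.dates, nothing has to be edited when another quarter is appended.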
Not sure if this helps.
library(stringi)
n <- 4
l <- length(Test.data)
m1 <- stri_list2matrix(split(Test.data,as.numeric(gl(l,n,l))), fill='')
nm1 <- do.call(rbind,strsplit(Test.dates, '(?<=[0-9])(?=[Q])', perl=TRUE))
dimnames(m1) <- list(unique(nm1[,2]), unique(nm1[,1]))
m1
# 08 09 10 11 12 13 14
#Q1 "1" "5" "9" "13" "17" "21" "25"
#Q2 "2" "6" "10" "14" "18" "22" "26"
#Q3 "3" "7" "11" "15" "19" "23" "27"
#Q4 "4" "8" "12" "16" "20" "24" ""
I am trying to accomplish the following task to get to matrix d:
d1<-matrix(as.factor(rep(sample(1:10,10,T),5)),ncol=5)
d2<-matrix(as.factor(rep(sample(1:10,10,T),5)),ncol=5)
d3<-matrix(as.factor(rep(sample(1:10,10,T),5)),ncol=5)
d<-cbind(
cbind(d1[,2],d1[,5]),
cbind(d2[,2],d2[,5]),
cbind(d3[,2],d3[,5])
)
But for many matrices d1...dn, say.
More generally I would like to select the same column numbers from a series of matrices and append into a single matrix. The focus of this task is on combining, not creating the matrices. The factor-type column vectors should be preserved.
I thought about something along the lines of
d<-matrix(nrow=10)
dl<-list(d1,d2,d3)
for (i in 1:3){
d<-cbind(d,dl[[i]][,2],dl[[i]][,5])
}
But maybe there is a better way.
You can create a list of your matrices and use do.call and lapply to get what you want:
matList <- list(d1, d2, d3)
do.call(cbind, lapply(matList, function(x) x[, c(2, 5)]))
# [,1] [,2] [,3] [,4] [,5] [,6]
# [1,] "3" "3" "3" "3" "10" "10"
# [2,] "4" "4" "2" "2" "3" "3"
# [3,] "6" "6" "7" "7" "7" "7"
# [4,] "10" "10" "4" "4" "2" "2"
# [5,] "3" "3" "8" "8" "3" "3"
# [6,] "9" "9" "5" "5" "4" "4"
# [7,] "10" "10" "8" "8" "1" "1"
# [8,] "7" "7" "10" "10" "4" "4"
# [9,] "7" "7" "4" "4" "9" "9"
# [10,] "1" "1" "8" "8" "4" "4"
By the way, the data type in your matrix is character, not factor. See the help page at ?matrix where you will find the following:
The method for data frames will return a character matrix if there is only atomic columns and any non-(numeric/logical/complex) column, applying as.vector to factors and format to other non-character columns.
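A short sketch of both points, using made-up matrices built the same way as in the question (make_mat and keep are hypothetical helper names): the type check shows that the matrices already hold character data, and the same do.call/lapply pattern works for any number of matrices in the list. If factor columns are really needed, one option is to convert the combined matrix to a data frame, since a matrix cannot store factors.

```r
set.seed(42)  # only to make the sketch reproducible

# Hypothetical helper that builds one matrix as in the question
make_mat <- function() {
  matrix(as.factor(rep(sample(1:10, 10, TRUE), 5)), ncol = 5)
}
matList <- list(make_mat(), make_mat(), make_mat())

# matrix() drops the factor class, so the contents are character
typeof(matList[[1]])

# Select the same columns from every matrix and bind them side by side
keep <- c(2, 5)
d <- do.call(cbind, lapply(matList, function(m) m[, keep, drop = FALSE]))
dim(d)

# A data frame can hold factor columns, unlike a matrix
d_df <- as.data.frame(d, stringsAsFactors = TRUE)
sapply(d_df, is.factor)
```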
I have a list named d like this:
V1 is an integer set from 0 - 50
V2 is a real set from 1500 - 1800
V3 is an integer set from 1 - 50
In total, the list contains 5100 objects
Now I would like to plot the histogram of V2, with V1 = a certain number (0, 1 or 10, etc.)
I tried different ways:
factor(d$V1)
qplot(V2, data=d, V1 = 1) --> not successful
d.subset <- subset(d, d$V1 = 1) --> not successful
This is driving me crazy. I checked the characteristics of d$V1 but found nothing strange. Could anyone help me out?
is.factor(d$V1)
[1] TRUE
str(d$V1)
 Factor w/ 51 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
levels(d$V1)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
[20] "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32" "33" "34" "35" "36" "37" "38"
[39] "39" "40" "41" "42" "43" "44" "45" "46" "47" "48" "49" "50" "51"
Change the line:
d.subset <- subset(d, d$V1 = 1)
to
d.subset <- subset(d, V1 == 1)
Notice the double equals (==), which is the comparison operator. A single = is used for assignment (or for naming function arguments), so it does not subset the data frame.
Finally, you might mean to put the 1 in quotes if you want to get the "1" level of the factor (which might not be the same as the numeric 1).
d.subset <- subset(d, V1 == "1")
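The difference between a factor's labels and its internal codes can be seen in a small sketch. The data frame below is made up for illustration (the real d is not shown), and base hist() stands in for the qplot call from the question.

```r
# Toy data: V1 is a factor whose labels are "0", "1", "2"
d <- data.frame(
  V1 = factor(rep(0:2, each = 4)),
  V2 = seq(1500, 1800, length.out = 12),
  V3 = 1:12
)

# Compare against the label to pick the rows where V1 is "1"
d.subset <- subset(d, V1 == "1")
nrow(d.subset)

# Pitfall: as.integer() on a factor returns the internal level codes,
# so the label "1" in row 5 has code 2 here
as.integer(d$V1)[5]
as.integer(as.character(d$V1))[5]   # converts the label itself

# Histogram of V2 for the selected rows
hist(d.subset$V2, main = "V2 where V1 is 1", xlab = "V2")
```

Going through as.character() first is the usual way to recover the numeric values that the labels represent.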