I want to take an existing MxN matrix and create a new (M-1)xN matrix such that, for each column, the elements are the differences between adjacent row elements of the original matrix.
The idea is that the data go from a cumulative type to a rate type.
For example, I have the following (where each column is a specific data series):
[,1] [,2] [,3]
[1,] "17" "16" "15"
[2,] "34" "32" "32"
[3,] "53" "47" "48"
[4,] "72" "62" "63"
[5,] "90" "78" "79"
[6,] "109" "94" "96"
I would like:
[,1] [,2] [,3]
[1,] "17" "16" "17"
[2,] "19" "15" "16"
[3,] "19" "15" "15"
[4,] "18" "16" "16"
[5,] "19" "16" "17"
It's very simple for numeric data (not sure why you have characters):
diff(m)
With the character data, convert to numeric first while keeping the dimensions:
diff(matrix(as.numeric(m), nrow = nrow(m)))
It's a bit strange with the character format, but here's a way:
# Set up the data
mymat <- matrix(c("17", "16", "15",
                  "34", "32", "32",
                  "53", "47", "48",
                  "72", "62", "63",
                  "90", "78", "79",
                  "109", "94", "96"), nrow = 6, byrow = TRUE)
Use the apply function with an anonymous function built around diff:
apply(mymat, 2, function(x) as.character(diff(as.numeric(x))))
# [,1] [,2] [,3]
# [1,] "17" "16" "17"
# [2,] "19" "15" "16"
# [3,] "19" "15" "15"
# [4,] "18" "16" "16"
# [5,] "19" "16" "17"
If the data are numeric to begin with and a numeric result is desired, then the above could be simplified to:
apply(mymat, 2, diff)
In case you want the differences between adjacent columns of a matrix (and not adjacent rows), try:
col.diff <- t(diff(t(mat)))
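A minimal sketch tying the approaches above together, assuming the matrix from the question is stored in a variable named m. The names m_num, rates and col_diff are only illustrative. It converts the character matrix to numeric while keeping its dimensions, then takes row-wise and column-wise differences:

```r
# Character matrix from the question (assumed to be called m)
m <- matrix(c("17", "16", "15",
              "34", "32", "32",
              "53", "47", "48",
              "72", "62", "63",
              "90", "78", "79",
              "109", "94", "96"), nrow = 6, byrow = TRUE)

# Convert the storage type in place; dim() is preserved
m_num <- m
storage.mode(m_num) <- "double"

# diff() on a matrix differences adjacent rows, column by column
rates <- diff(m_num)

# Differences between adjacent columns instead of rows
col_diff <- t(diff(t(m_num)))

dim(rates)     # one row fewer than m
dim(col_diff)  # one column fewer than m
```

Converting once up front keeps the rest of the code free of as.numeric() calls; whether the result should be turned back into characters depends on what consumes it.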
===========================================================================
Update 2/20/2021:
I looked into the problem and found that it is in the second file.
Sex was originally coded as "F" and "M". When I recode it with:
subject.info[subject.info$Sex=='F',]$Sex=1
subject.info[subject.info$Sex=='M',]$Sex=2
the weird thing is that R silently changes 1 to "1". What is even weirder is that the values look numeric when you print the data frame.
My question is why this happens, not how to convert the type of values in a data.frame. I don't understand why someone insists it is a duplicate question, even though similar answers can solve the problem.
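As for why this happens, a likely explanation is that a column is an atomic vector and can hold only one type, so a number assigned into part of a character column is coerced to a string. The sketch below uses a hypothetical toy data frame named df (not the real file) and a simpler but equivalent form of the assignment:

```r
# Hypothetical toy data; column names mirror the question
df <- data.frame(Sex = c("F", "M", "F"),
                 age = c(11, 10, 11),
                 stringsAsFactors = FALSE)

# Assigning numbers into part of a character vector coerces them to strings
df$Sex[df$Sex == "F"] <- 1
df$Sex[df$Sex == "M"] <- 2

class_after_recode <- class(df$Sex)   # the column is still character
print(df)                             # data frames print strings without quotes

# One character column is enough to make the whole matrix character
m_chr <- as.matrix(df)

# Converting the column explicitly restores a numeric matrix
df$Sex <- as.numeric(df$Sex)
m_num <- as.matrix(df)
```

The missing quotes in the printed data frame are only a display convention, which is why the column looks numeric until as.matrix() or class() is used.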
=================================================================================
I have two text files. One file is .txt and the other is .csv.
The .csv file has one additional column (with NA values). All the others are the same.
When I read those files with the commands:
subject.info = read.table(paste(data_dir, "outd01_all_subject_info.txt", sep = slash), header=TRUE)
subject.info = read.csv("data_d01_features/outd01_all_subject_info2.txt", sep = ',', header=TRUE, stringsAsFactors = F)
The dataframe subject.info looks the same, but when I run:
as.matrix(subject.info)
All the data in the second file are converted to strings:
SUBJID Sex age trauma_age ptsd
[1,] "600039015048" "2" "11" NA "0"
[2,] "600110937794" "1" "10" NA "0"
[3,] "600129552715" "1" "11" " 8" "2"
[4,] "600210241146" "1" "18" "16" "2"
[5,] "600294620965" "1" "13" NA "0"
[6,] "600409285352" "2" "16" "15" "1"
[7,] "600460215379" "1" "10" NA "0"
[8,] "600547831711" "1" "10" " 6" "1"
[9,] "600561317124" "2" "19" "19" "1"
[10,] "600635899969" "2" "11" NA "0"
[11,] "600647003585" "1" "18" NA "0"
[12,] "600682103788" "1" "18" "15" "2"
[13,] "600689706588" "1" "16" "15" "2"
[14,] "600747749665" "2" " 9" " 7" "1"
This does not happen for the first file:
SUBJID Sex age ptsd
[1,] 600039015048 2 10 0
[2,] 600110937794 1 9 0
[3,] 600129552715 1 10 2
[4,] 600210241146 1 17 2
[5,] 600294620965 1 13 0
[6,] 600409285352 2 15 1
[7,] 600460215379 1 8 0
[8,] 600547831711 1 8 1
[9,] 600561317124 2 19 1
[10,] 600635899969 2 11 0
[11,] 600647003585 1 19 0
[12,] 600682103788 1 18 2
[13,] 600689706588 1 15 2
[14,] 600747749665 2 8 1
Is this due to the NA values? But when I replace NAs with 0 in the second file, the problem still exists:
SUBJID Sex age trauma_age ptsd
[1,] "600039015048" "2" "11" " 0" "0"
[2,] "600110937794" "1" "10" " 0" "0"
[3,] "600129552715" "1" "11" " 8" "2"
[4,] "600210241146" "1" "18" "16" "2"
[5,] "600294620965" "1" "13" " 0" "0"
[6,] "600409285352" "2" "16" "15" "1"
[7,] "600460215379" "1" "10" " 0" "0"
[8,] "600547831711" "1" "10" " 6" "1"
[9,] "600561317124" "2" "19" "19" "1"
[10,] "600635899969" "2" "11" " 0" "0"
[11,] "600647003585" "1" "18" " 0" "0"
[12,] "600682103788" "1" "18" "15" "2"
[13,] "600689706588" "1" "16" "15" "2"
[14,] "600747749665" "2" " 9" " 7" "1"
The problem also persists if I convert the second file to a .csv file, or if I use read.table or read.csv2.
From the output, it looks like the column trauma_age is of class character, which turns everything into character. Check class(subject.info$trauma_age).
Turn it into numeric by doing:
subject.info$trauma_age <- as.numeric(subject.info$trauma_age)
and then try converting to a matrix, i.e. as.matrix(subject.info).
You can also use type.convert to convert the data automatically to the appropriate types without worrying about column names.
subject.info <- type.convert(subject.info, as.is = TRUE)
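A small sketch of how the culprit column could be located before converting anything, assuming a hypothetical toy data frame named df in place of subject.info. Listing the class of every column shows which one is character, and type.convert then handles all columns at once:

```r
# Hypothetical stand-in for subject.info; here Sex is the character column
df <- data.frame(Sex = c("2", "1", "1"),
                 age = c(11, 10, 11),
                 trauma_age = c(NA, NA, 8),
                 stringsAsFactors = FALSE)

# Inspect the class of every column to find the non-numeric one(s)
classes_before <- sapply(df, class)
classes_before

m_before <- as.matrix(df)             # character matrix

# Convert every column to the most specific type that fits its values
df <- type.convert(df, as.is = TRUE)
m_after <- as.matrix(df)              # numeric matrix
```

Checking all columns first avoids guessing from the matrix output, where numeric columns are padded and quoted and so look the same as genuine strings.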
I create a data frame from different vectors:
pat1 <- c(11, 12, 13, 14, 15)
pat2 <- 1:5
pat3 <- seq(1, 10, by = 2)
pat4 <- seq(-5, 3, by = 2)
pat5 <- pat1 + pat2
variables <- c("a", "b", "c", "d", "e")
mydata <- data.frame(variables, pat1, pat2, pat3, pat4, pat5)
mydata <- t(mydata)
I transpose the data frame so that columns become rows and I get the correct table, but the numbers are not doubles:
[,1] [,2] [,3] [,4] [,5]
variables "a" "b" "c" "d" "e"
pat1 "11" "12" "13" "14" "15"
pat2 "1" "2" "3" "4" "5"
pat3 "1" "3" "5" "7" "9"
pat4 "-5" "-3" "-1" " 1" " 3"
pat5 "12" "14" "16" "18" "20"
How can I get doubles for my pat values?
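One possible approach, offered as a sketch rather than the only answer: a matrix can hold only one type, so transposing a data frame that contains the character column variables yields a character matrix. Assuming the labels can serve as column names instead of a row of data, transposing only the numeric columns keeps the values as doubles (the name num is illustrative):

```r
pat1 <- c(11, 12, 13, 14, 15)
pat2 <- 1:5
pat3 <- seq(1, 10, by = 2)
pat4 <- seq(-5, 3, by = 2)
pat5 <- pat1 + pat2
variables <- c("a", "b", "c", "d", "e")
mydata <- data.frame(variables, pat1, pat2, pat3, pat4, pat5)

# Drop the label column before transposing so the matrix stays numeric
num <- t(as.matrix(mydata[, -1]))

# Keep the labels as column names rather than as a row of strings
colnames(num) <- as.character(mydata$variables)

typeof(num)   # "double"
num
```

If the labels must remain a row of the table, the result has to stay a data frame or a character matrix, because a numeric matrix cannot store them.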
My data grow longer each quarter, and the start date varies between data sets.
I have written code that runs many tests, produces forecasts, and is automatically documented with graphs and tables of the data.
Everything works fine until the length of the data or the start date changes, because the data in the tables then either have the wrong length or do not line up with the correct quarter.
Here is an example:
Test.data <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27)
Test.dates <- c("08Q1","08Q2","08Q3","08Q4","09Q1","09Q2","09Q3","09Q4","10Q1","10Q2","10Q3","10Q4","11Q1","11Q2","11Q3","11Q4","12Q1","12Q2","12Q3","12Q4","13Q1","13Q2","13Q3","13Q4","14Q1","14Q2","14Q3")
Test <- matrix(c(Test.data,""),nrow=4,byrow=FALSE)
colnames(Test) <- c("'08","'09","'10","'11","'12","'13","'14")
rownames(Test) <- c("Qtr 1", "Qtr 2", "Qtr 3", "Qtr 4")
Which quite nicely gives:
'08 '09 '10 '11 '12 '13 '14
Qtr 1 1 5 9 13 17 21 25
Qtr 2 2 6 10 14 18 22 26
Qtr 3 3 7 11 15 19 23 27
Qtr 4 4 8 12 16 20 24
However, in the next quarter the data will grow by one value, which produces an error:
Warning message:
In matrix(c(Test.data, ""), nrow = 4, byrow = FALSE) :
data length [29] is not a sub-multiple or multiple of the number of rows [4]
Error in `colnames<-`(`*tmp*`, value = c("'08", "'09", "'10", "'11", "'12", :
length of 'dimnames' [2] not equal to array extent
Or, if a data set begins in 08Q2 instead of 08Q1, all the data will sit next to the wrong quarter.
I need to display my data in this specific layout:
'yr1 'yr2 'yr3 ...
Qtr 1
Qtr 2
Qtr 3
Qtr 4
Does anyone have suggestions on how I can make this adjust automatically to fit my data without changing anything by hand? Very soon it will be connected to a database that constantly produces results, so it cannot be edited each time the data have a different length.
Thank you for your help.
Please comment below if you want any more information.
Test.data.padded <- as.character(Test.data)
length(Test.data.padded) <- ceiling(length(Test.data.padded) / 4) * 4
Test.data.padded[is.na(Test.data.padded)] <- ""
Test <- matrix(Test.data.padded, nrow=4, byrow=FALSE)
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#[1,] "1" "5" "9" "13" "17" "21" "25"
#[2,] "2" "6" "10" "14" "18" "22" "26"
#[3,] "3" "7" "11" "15" "19" "23" "27"
#[4,] "4" "8" "12" "16" "20" "24" ""
Then use a regex to extract the years from your Test.dates.
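A hedged sketch of what that regex step might look like, assuming every label has the form "yyQn" as in the example (the names years, quarters, yr_levels and Test are illustrative). It places each value by its own year and quarter, so a series starting in Q2 or ending mid-year still lines up:

```r
# Example inputs rebuilt to match the question: 08Q1 through 14Q3
Test.data  <- 1:27
Test.dates <- paste0(rep(sprintf("%02d", 8:14), each = 4), "Q", 1:4)[1:27]

# Split each label into its year part and its quarter number
years    <- sub("Q[1-4]$", "", Test.dates)
quarters <- as.integer(sub("^.*Q", "", Test.dates))
yr_levels <- unique(years)

# Start from an empty 4 x (number of years) table with the desired labels
Test <- matrix("", nrow = 4, ncol = length(yr_levels),
               dimnames = list(paste("Qtr", 1:4), paste0("'", yr_levels)))

# Fill each cell by (quarter, year) position using matrix indexing
Test[cbind(quarters, match(years, yr_levels))] <- as.character(Test.data)
Test
```

Because the table size is derived from the labels, nothing needs editing when a new quarter arrives; cells with no data stay as empty strings.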
Not sure if this helps.
library(stringi)
n <- 4
l <- length(Test.data)
m1 <- stri_list2matrix(split(Test.data,as.numeric(gl(l,n,l))), fill='')
nm1 <- do.call(rbind,strsplit(Test.dates, '(?<=[0-9])(?=[Q])', perl=TRUE))
dimnames(m1) <- list(unique(nm1[,2]), unique(nm1[,1]))
m1
# 08 09 10 11 12 13 14
#Q1 "1" "5" "9" "13" "17" "21" "25"
#Q2 "2" "6" "10" "14" "18" "22" "26"
#Q3 "3" "7" "11" "15" "19" "23" "27"
#Q4 "4" "8" "12" "16" "20" "24" ""
I have a list named d like this:
V1 is a set of integers from 0 to 50
V2 is a set of reals from 1500 to 1800
V3 is a set of integers from 1 to 50
In total, the list contains 5100 objects.
Now I would like to plot the histogram of V2 for the rows where V1 equals a certain number (0, 1, 10, etc.).
I tried different ways:
factor(d$V1)
qplot(V2, data=d, V1 = 1) --> not successful
d.subset <- subset(d, d$V1 = 1) --> not successful
This is driving me crazy. I checked the characteristics of d$V1 but found nothing strange. Could anyone help me out?
is.factor(d$V1)
[1] TRUE
str(d$V1)
Factor w/ 51 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 9 10 ...
levels(d$V1)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
[20] "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32" "33" "34" "35" "36" "37" "38"
[39] "39" "40" "41" "42" "43" "44" "45" "46" "47" "48" "49" "50" "51"
Change the line:
d.subset <- subset(d, d$V1 = 1)
to
d.subset <- subset(d, V1 == 1)
Notice the double equals (==), which is the logical comparison operator. A single = is used for assignment (or for naming arguments in a call) and doesn't subset the data frame.
Finally, you might need to put the 1 in quotes if you want to get the "1" level of the factor (which might not be the same as the numeric value 1).
d.subset <- subset(d, V1 == "1")