rollmean fill NAs with original value - r
I followed this example to compute a rolling mean (and a rolling minimum, similar to rollmax in the zoo package) in R.
But the first few values are filled with NAs. How can I fill those NAs with the original values so that I don't lose data points?
We can use coalesce() with the original vector: wherever the rolling mean is NA, it is replaced by the corresponding element of the original vector.
library(dplyr)
library(zoo)
coalesce(rollmeanr(x, 3, fill = NA), x)
If the data are in a data.frame, apply the same idea within each group:
ctd %>%
group_by(station) %>%
mutate(roll_mean_beam = coalesce(rollmeanr(beam_coef,
k = 5, fill = NA), beam_coef))
data
x <- 1:10
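For readers without zoo or dplyr at hand, the same fallback idea can be sketched in base R. This is only an illustrative cross-check, not the answer's method: it assumes a right-aligned window of width 3 and uses stats::filter for the rolling mean; the names k, roll and filled are made up for the example.

```r
# Base-R sketch: right-aligned rolling mean of width k,
# falling back to the original value where the window is incomplete.
x <- 1:10
k <- 3

# sides = 1 uses only the current and past values, so the first k - 1 results are NA
roll <- as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))

# Replace each NA with the corresponding original value
filled <- ifelse(is.na(roll), x, roll)
print(filled)
```

The first two entries of filled are x[1] and x[2]; from the third position on, each entry is the mean of the current value and the two before it.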
1) Using the original values seems a bit bizarre. Taking the rolling minimum of 1:10 using a width of 3 and filling the first two positions with the original values would give
1 2 1 2 3 4 5 6 7 8
I think what you really want is to apply min to however many points are available so that in this example we get
1 1 1 2 3 4 5 6 7 8
Now rollapplyr with partial=TRUE will use whatever number of points are available if fewer than width=3 exist at that point. At the first point only one point is available so it returns min(x[1]). At the second only two points are available so it returns min(x[1:2]). For all the rest it can use three points. Only zoo is used.
library(zoo)
x <- 1:10
rollapplyr(x, 3, min, partial = TRUE)
## [1] 1 1 1 2 3 4 5 6 7 8
2) The above seems more logical than filling the first two points with the first two input values but if you really wanted to do that anyways then simply prefix the series with the original values using c or use one of the other alternatives shown below. Only zoo is used.
c(x[1:2], rollapplyr(x, 3, min))
## [1] 1 2 1 2 3 4 5 6 7 8
pmin(rollapplyr(x, 3, min, fill = max(x)), x)
## [1] 1 2 1 2 3 4 5 6 7 8
replace(rollapplyr(x, 3, min, fill = NA), 1:2, x[1:2])
## [1] 1 2 1 2 3 4 5 6 7 8
Min <- function(x) if (length(x) < 3) tail(x, 1) else min(x)
rollapplyr(x, 3, Min, partial = TRUE)
## [1] 1 2 1 2 3 4 5 6 7 8
Related
R - Shift specified columns using minimum value into positive values
I'm looking for an easy way to add the minimum value for each column inside my dataframe. This feels like a common thing, but I haven't been able to find any good answers yet... maybe I'm missing something obvious. Let's say I've got two columns (in reality I have close to 100) with positive and negative numbers.
w <- c(9, 9, 9, 9)
x <- c(-2, 0, 1, 3)
y <- c(-1, 1, 3, 4)
z <- as.data.frame(cbind(w, x, y))
  w  x  y
1 9 -2 -1
2 9  0  1
3 9  1  3
4 9  3  4
I want z to look like this after a transformation for only x and y columns [,2:3]
  w x y
1 9 0 0
2 9 2 2
3 9 3 4
4 9 5 5
Does that make sense?
library(dplyr)
dplyr::mutate(z, across(c(x, y), ~ . + abs(min(.))))
  w x y
1 9 0 0
2 9 2 2
3 9 3 4
4 9 5 5
You can also select by column position rather than column name by changing c(x, y) to 2:3, or to c(2:3, 5) for non-sequential column positions.
It depends on exactly what you mean and what you want to happen if there aren't negative values. No matter the values, this will anchor the minimum at 0, but you should be able to adapt it if you want something slightly different.
z[] = lapply(z, function(col) col - min(col))
z
#   x y
# 1 0 0
# 2 2 2
# 3 3 4
# 4 5 5
As a side note, as.data.frame(cbind(x, y)) is bad: if you have a mix of numeric and character values, cbind() will convert everything to character. It's shorter and better to simplify to data.frame(x, y).
Do you want this?
z[] <- lapply(z, function(columnValues) columnValues + abs(min(columnValues)))
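The lapply-based answers above act on every column of z. A minimal sketch of restricting the same shift to chosen columns, assuming only x and y should be anchored at 0 and w should be left alone (the cols vector is a name introduced here for illustration):

```r
# Rebuild the example data directly with data.frame()
z <- data.frame(w = c(9, 9, 9, 9),
                x = c(-2, 0, 1, 3),
                y = c(-1, 1, 3, 4))

# Hypothetical selection of the columns to shift; w is deliberately excluded
cols <- c("x", "y")

# Subtract each selected column's minimum so that its smallest value becomes 0
z[cols] <- lapply(z[cols], function(col) col - min(col))
print(z)
```

Subtracting the minimum anchors each selected column at 0 whether or not it contains negative values; adding abs(min(col)) instead gives the same result only when the minimum is negative or zero.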
How to repeat the indices of a vector based on the values of that same vector?
Given a random integer vector below:
z <- c(3, 2, 4, 2, 1)
I'd like to create a new vector that contains all of z's indices, each repeated the number of times given by the corresponding element of z. To illustrate, the desired result in this case should be:
[1] 1 1 1 2 2 3 3 3 3 4 4 5
There must be a simple way to do this.
You can use rep and seq to repeat the indices of a vector based on the values of that same vector: seq to get the indices and rep to repeat them.
rep(seq(z), z)
# [1] 1 1 1 2 2 3 3 3 3 4 4 5
Start with all the indices of the vector z. These are given by:
1:length(z)
Then these elements should be repeated. The number of times each index is repeated is specified by the values of z. This can be done using a combination of the lapply or sapply function and the rep function:
unlist(lapply(X = 1:length(z), FUN = function(x) rep(x = x, times = z[x])))
[1] 1 1 1 2 2 3 3 3 3 4 4 5
unlist(sapply(X = 1:length(z), FUN = function(x) rep(x = x, times = z[x])))
[1] 1 1 1 2 2 3 3 3 3 4 4 5
Both alternatives give the same result.
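One caveat worth illustrating: seq(z) and 1:length(z) behave as expected for the example vector, but seq() treats a single number as an upper bound and 1:length(z) counts down for an empty vector. A small sketch using seq_along, a base R function, which avoids both cases (idx, z1 and idx1 are names made up for this example):

```r
z <- c(3, 2, 4, 2, 1)

# seq_along(z) always yields the indices 1..length(z)
idx <- rep(seq_along(z), times = z)
print(idx)

# Edge case: a vector with a single element
z1 <- 3
idx1 <- rep(seq_along(z1), times = z1)  # index 1 repeated three times
print(idx1)
print(seq(z1))                          # seq() expands this to 1 2 3 instead
```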
R - Cut non-zero values
I have time series data in a data table (let's say it has columns "date" and "y"), and I would like to cut the non-zero values of y into quartiles by date, so that each quartile gets a label 1-4 and the zero values get the label 0. I know that if I just wanted to do this for all values of y, I would run:
dt <- dt %>% group_by(date) %>% mutate(quartile = cut(y, breaks = 4, labels = (1:4)))
But I can't figure out how to get labels 0-4, with 0 allocated to the 0-values of y, and 1-4 being the quartiles of the non-zero values.
Edit: To clarify, what I want to do is the following: for each date, I would like to divide the values of y in that date into 5 groups: 1) y=0, 2) bottom 25% of y (in that date), 3) 2nd 25% of y, 4) 3rd 25% of y, 5) the top 25% of y.
Edit 2: So I have found 2 more solutions for this:
dt[, quartile := cut(y, quantile(dt[y>0]$y, probs = 0:4/4), labels = (1:4)), by = date]
and
dt %>% group_by(date) %>% mutate(quartile = findInterval(y, quantile(dt[y>0]$y, probs = 0:4/4)))
But what both of these seem to do is to first calculate the break points for the entire data and then cut the data by date. I want the break points to be calculated by date, since the distribution of observations can differ between dates.
You can pass the output of quantile to the breaks argument of cut. By default, quantile will produce quartile breaks.
x <- rpois(100,4)
table(x)
x
 0  1  2  3  4  5  6  7  8  9 10 12
 1  7 17 19 17 18 12  5  1  1  1  1
cut(x,breaks=quantile(x),labels=1:4)
 [1] 2    2    2    1    2    1    1    2    3    3    1    4    1    4    1
[16] 2    4    2    4    2    3    1    4    1    2    2    1    1    2    2
[31] 1    2    2    3    4    1    4    2    2    1    2    4    4    3    1
[46] 3    1    1    3    3    2    4    2    2    1    2    2    4    1    1
[61] 1    2    2    4    4    3    3    2    1    1    3    2    3    2    3
[76] 2    4    2    <NA> 2    3    2    4    2    1    4    4    3    4    1
[91] 2    4    3    2    2    3    4    4    3    2
Levels: 1 2 3 4
Note that the minimum value is excluded by default. If you want your ranges to be computed including zero, the zeros will be NAs, and you can use this to your advantage by using is.na to treat them differently afterwards. However, if you want to exclude the zeros before computing the breaks, you will need to reduce the minimum break value slightly to ensure all values are given a label. You can do this by using quantile(x[x>0])-c(1e-10,rep(0,4)), for example. The zeros will again appear as NAs in this case.
I'm admittedly not sure what you mean by "cutting the non-zero values of y into quartiles by date", and I'm afraid I don't have enough reputation to ask. If 'date' is an actual date column, and you mean "the new variable 'quartile' should indicate what part of the year y occurred in, assuming y isn't 0, in which case it should be 0", I'd do it like this:
library(dplyr)
library(lubridate)
# create example
dt <- data.frame(y = c(0, 1, 3, 4), date = c("01-02-18", "01-06-18", "01-12-16", "01-04-17"))
dt <- dt %>%
  ## change 'date' to an actual date
  mutate(date = as_date(date)) %>%
  ## extract the quarter
  mutate(quartile = quarter(date)) %>%
  ## replace all quarters with 0 where y was 0
  mutate(quartile = if_else(y == 0, 0, as.double(quartile)))
EDIT: I think I understand the problem now. This is probably a little verbose, but I think it does what you want:
library(dplyr)
dt <- tibble(y = c(20, 30, 40, 20, 30, 40, 0), date = c("01-02-16", "01-02-16", "01-02-16", "01-08-18", "01-08-18", "01-08-18", "01-08-18"))
new_dt <- dt %>%
  # keep only the cases where y is greater than 0
  filter(y > 0) %>%
  # group by date
  group_by(date) %>%
  # cut the y values per date
  mutate(quartile = cut(y, breaks = 4, labels = c(1:4)))
dt <- dt %>%
  # take the original dt, add in the newly calculated quartiles
  full_join(new_dt, by = c("y", "date")) %>%
  # replace the NAs by 0
  mutate(quartile = ifelse(is.na(quartile), 0, quartile))
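The two ideas in this thread (quantile-based breaks, and breaks computed separately for each date) can be combined. The following base-R sketch is one possible way to do it, not a method taken from either answer: quartile_label is a hypothetical helper, the toy data are invented, and it assumes the non-zero values within each date are distinct enough for quantile() to return unique breaks (cut() raises an error otherwise).

```r
# Hypothetical helper: label zeros as 0 and non-zero values 1-4 by quartile
quartile_label <- function(y) {
  out <- integer(length(y))        # every position starts at 0
  pos <- y > 0
  if (any(pos)) {
    # Breaks are computed from the non-zero values of this group only
    brks <- quantile(y[pos], probs = 0:4 / 4)
    # include.lowest = TRUE keeps the group's minimum in the first bin
    out[pos] <- cut(y[pos], breaks = brks, labels = FALSE, include.lowest = TRUE)
  }
  out
}

# Invented toy data: two dates with different scales
dt <- data.frame(date = rep(c("d1", "d2"), each = 5),
                 y = c(0, 1, 2, 3, 4, 0, 10, 20, 30, 40))

# ave() applies the helper within each date and returns results in row order
dt$quartile <- ave(dt$y, dt$date, FUN = quartile_label)
print(dt)
```

Because the breaks are recomputed inside each date, the values 1-4 on the first date and 10-40 on the second receive the same labels even though their scales differ.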
Zoo::Rollmax How to shorten width to prevent errors
I have 10 days of values, and for each day I want to know the max of the previous 4 days. If there aren't 4 days' worth of values, then I want the max of the last 3 days, etc. Code example:
set.seed(131)
Index <- 1:10
Val <- c(sample(10, 10, replace = T))
df = data.frame(Index, Val)
dfoo = df %>%
  mutate(Lag1 = lag(Val, 1, default = 0), #get last days value
         Last4Max = rollmax(Lag1, 4, partial = T, fill = 0, align = "right")) #get max of last 4 days
This works for all but days 2/3, since there aren't 4 values in Lag1 (day 1 should be 0/NA because there's no "previous" day).
   Index Val Lag1 Last4Max
1      1   3    0        0
2      2   2    3        0
3      3   3    2        0
4      4   4    3        3
5      5   9    4        4
6      6   6    9        9
7      7   6    6        9
8      8   3    6        9
9      9   4    3        9
10    10  10    4        6
So Last4Max should be 3 for index 2/3, and 0/NA for 1. Is there a way to change the width size to account for having width > row numbers? My alternative is to create 4 variables, one for each lag (with default = 0), and then take the max of all 4. I know this would work but it seems clunky, and it'd limit me if I wanted to quickly do the max of the last 10 days on a bigger dataset. Thanks
1) Note that:
- as per ?rollmax it does not have a partial argument; however, we can use rollapply or rollapplyr with a partial argument and specify FUN = max.
- rollapplyr (and also rollmaxr) with an r on the end defaults to align = "right", allowing one to avoid writing that argument out.
- the width argument can specify a one-component list of offsets, so to specify that the prior 4 elements are to be used we can specify width = list(-seq(4)), eliminating the need for a separate lag column.
Putting all these together we get:
rollapplyr(Val, list(-seq(4)), max, partial = TRUE, fill = 0)
## [1] 0 3 3 3 4 9 9 9 9 6
2) Another way to do this is to use a width of 5 but not use the last element when taking the maximum. In this case we don't need fill = 0, since it is able to process each component of Val, leaving nothing to fill.
Max <- function(x) if (length(x) > 1) max(head(x, -1)) else 0
rollapplyr(Val, 5, Max, partial = TRUE)
2a) If we knew that all elements of Val were non-negative then we could alternatively use this shorter definition for Max:
Max <- function(x) max(head(x, -1), 0)
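To make the "max of up to 4 previous values" logic explicit, here is a base-R cross-check that does not use zoo. It is only a sketch for verifying results: Val is copied from the table in the question, and prev_max is a name introduced here.

```r
# Values of Val as printed in the question's table
Val <- c(3, 2, 3, 4, 9, 6, 6, 3, 4, 10)

prev_max <- sapply(seq_along(Val), function(i) {
  if (i == 1) return(0)          # no previous day: use 0, like fill = 0
  start <- max(1, i - 4)         # shorten the window near the beginning
  max(Val[start:(i - 1)])        # max over the available previous values
})
print(prev_max)
```

The result matches the output shown for approach 1): 0 3 3 3 4 9 9 9 9 6.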
R rearrange data
I have a bunch of texts written by the same person, and I'm trying to estimate the templates they use for each text. The way I'm going about this is:
1. create a TermDocumentMatrix for all the texts
2. take the raw Euclidean distance of each pair
3. cut out any pair greater than X distance (10 for the sake of argument)
4. flatten the forest
5. return one example of each template with some summarized stats
I'm able to get to the point of having the distance pairs, but I am unable to convert the dist instance to something I can work with. There is a reproducible example at the bottom. The data in the dist instance looks like this: The row and column names correspond to indexes in the original list of texts, which I can use to achieve step 5. What I have been trying to get out of this is a sparse matrix with col name, row name, value.
col, row, value
1 2 14.966630
1 3 12.449900
1 4 13.490738
1 5 12.688578
1 6 12.369317
2 3 12.449900
2 4 13.564660
2 5 12.922848
2 6 12.529964
3 4 5.385165
3 5 5.830952
3 6 5.830952
4 5 7.416198
4 6 7.937254
5 6 7.615773
From this point I would be comfortable cutting out all pairs greater than my cutoff and flattening the forest, i.e. returning 3 templates in this example: a group containing only document 1, a group containing only document 2, and a third group containing documents 3, 4, 5, and 6. I have tried a bunch of things, from creating a matrix out of this and then trying to make it sparse, to directly using the vector inside of the dist class, and I just can't seem to figure it out.
Reproducible example: tdm <- matrix(c(1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,1,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,2,0,0,0,0,0,1,0,0,0,0,0,3,1,2,2,2,3,2,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,1,2,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,4,1,1,1,1,1,0,0,1,1,1,1,0,1,0,0,0,0,0,0,1,1,1,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,1,1,1,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,2,0,0,1,0,1,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,2,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,1,1,0,1,1,1,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,1,0,1,0,0,1,1,1,1,0,1,0,1,0,0,2,0,0,0,0,0,1,0,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,3,1,1,1,1,1,0,0,0,0,0,0,0,1,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,1,1,1,1,0,0,1,1,1,1,0,0,0,1,0,0,2,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,3,1,1,1,1,0,1,0,0,0,0,1,2,0,1,1,0,0,0,0,1,0,0,1,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,1,1,1,1,0,1,0,0,0,0,0,0,0,1,0,0,1,1,1,1,1,1,0,0,0,0,1,0,0,1,0,1,0,0,0,0,1,1,1,1,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,1,1,0,1,0,0,0,0,0,1,1,1,2,1,1,1,0,0,0,0,1,2,2,1,1,1,1,0,1,0,0,0,0,0,1,0,0,0,0,0,1,1,1,1,1,1,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,1,1,1,0,2,0,0,0,0,0,0,1,1,1,1,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,2,0,2,2,3,2,1,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,1,1,1,1,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,2,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,2,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,2,1,1,1,1,1,0,1,0,0,0,0,1,1,0,
0,0,0,1,0,0,0,1,0,0,1,1,1,1,1,1,0,0,0,0,0,1,0,0,0,0,1,0,1,1,1,1,1,0,0,1,1,1,0,0,1,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,1,2,1,1,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,1,1,0,2,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,3,0,1,1,1,1,0,0,1,0,1,1,1,0,0,0,0,0,1,0,0,0,0,0,4,2,4,6,4,3,1,0,1,2,1,1,0,1,0,0,0,0,2,0,0,0,0,0,0,1,1,1,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,2,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,1,1,1,1,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,2,1,2,2,2,2,1,0,1,2,1,1,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,2,2,2,2,2,2,3,3,4,5,3,1,2,1,1,1,1,1,1,0,0,0,0,3,3,0,0,1,1,0,1,0,0,0,0), nrow=6) rownames(tdm) <- 1:6 colnames(tdm) <- paste("term", 1:229, sep="") tdm.dist <- dist(tdm) # I'm stuck turning tdm.dist into what I have shown
A classic approach to turn a "matrix"-like object into a [row, col, value] "data.frame" is the as.data.frame(as.table(.)) route. Specifically here, we need:
subset(as.data.frame(as.table(as.matrix(tdm.dist))), as.numeric(Var1) < as.numeric(Var2))
But that involves way too many coercions and the creation of a larger object only to be subset immediately. Since dist stores its values in a "lower.tri"angle form, we could use combn to generate the row/col indices and cbind them with the "dist" object:
data.frame(do.call(rbind, combn(attr(tdm.dist, "Size"), 2, simplify = FALSE)), c(tdm.dist))
Also, the "Matrix" package has some flexibility that, along with its memory efficiency in creating objects, could be used here:
library(Matrix)
tmp = combn(attr(tdm.dist, "Size"), 2)
summary(sparseMatrix(i = tmp[2, ], j = tmp[1, ], x = c(tdm.dist), dims = rep_len(attr(tdm.dist, "Size"), 2), symmetric = TRUE))
Additionally, among the different functions that handle "dist" objects,
cutree(hclust(tdm.dist), h = 10)
#1 2 3 4 5 6
#1 2 3 3 3 3
groups the documents by specifying the cut height.
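The combn route relies on dist storing the lower triangle column by column, which is the same order in which combn enumerates index pairs. A small self-contained sketch on invented toy points (not the question's tdm) shows the pairing and the cutoff step; m, d, pairs, long and close are names made up for the example, and the cutoff of 6 is arbitrary.

```r
# Three toy points on a line through the origin: pairwise distances are 5, 10 and 5
m <- matrix(c(0, 0,
              3, 4,
              6, 8), ncol = 2, byrow = TRUE)
d <- dist(m)

# combn lists pairs as (1,2), (1,3), (2,3), matching the storage order of dist
pairs <- t(combn(attr(d, "Size"), 2))
long <- data.frame(row = pairs[, 1], col = pairs[, 2], value = as.vector(d))
print(long)

# Keep only the pairs at or below an arbitrary cutoff
close <- long[long$value <= 6, ]
print(close)
```

The same three lines should apply to tdm.dist unchanged, since only the "Size" attribute and the stored values are used.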
That's how I've done a very similar thing in the past using the dplyr and tidyr packages. You can run the chained (%>%) script step by step to see how the dataset is updated at each stage.
# tdm and tdm.dist are built exactly as in the question's reproducible example
library(dplyr)
library(tidyr)
tdm.dist %>%
  as.matrix() %>%                      # update dist object to a matrix
  data.frame() %>%                     # update matrix to a data frame
  setNames(nm = 1:ncol(.)) %>%         # update column names
  mutate(names1 = 1:nrow(.)) %>%       # use rownames as a variable
  gather(names2, value , -names1) %>%  # reshape data
  filter(names1 <= names2)             # keep the values only once
#    names1 names2     value
# 1       1      1  0.000000
# 2       1      2 14.966630
# 3       2      2  0.000000
# 4       1      3 12.449900
# 5       2      3 12.449900
# 6       3      3  0.000000
# 7       1      4 13.490738
# 8       2      4 13.564660
# 9       3      4  5.385165
# 10      4      4  0.000000
# 11      1      5 12.688578
# 12      2      5 12.922848
# 13      3      5  5.830952
# 14      4      5  7.416198
# 15      5      5  0.000000
# 16      1      6 12.369317
# 17      2      6 12.529964
# 18      3      6  5.830952
# 19      4      6  7.937254
# 20      5      6  7.615773
# 21      6      6  0.000000