Generate an image with specific dimensions from a data frame in R - r

I have a data frame in R with the following dimensions [15750,93]. I want to construct an image using this data such that there are 3 row coordinates and 31 column coordinates in the image. Each column in the data frame corresponds to data from one coordinate position in the image. The columns in the data frame have been arranged based on their respective coordinates in the following manner [1,1], [2,1], [3,1], [1,2], [2,2], [3,2] ......... [1,31],[2,31],[3,31]
To generate the image, for each column I would like to have an average of all values, a sum of all values and the highest value in each column. This way there will be exactly one value corresponding to a coordinate. And, with the 3 variations, I should get three types of images - average, sum and highest value.
Can someone help me generate the overall image from this data, or guide me using data with smaller dimensions?
Some demo data below:
Dimensions of the data frame are [11, 15]
0 0 0 0 0 46 0 0 0 0 0 0 0 78 0
0 734 0 0 0 0 932 0 0 56 0 0 0 0 0
0 0 0 115 0 0 0 0 0 0 64 0 0 0 0
0 67 0 0 0 45 0 0 0 0 0 546 0 12 0
0 0 0 0 65 5 56 0 54 0 0 0 0 0 0
667 0 430 0 0 0 0 456 0 0 787 0 0 467 0
0 0 0 0 54 0 0 0 0 0 0 456 90 0 0
778 45 0 0 0 0 24 913 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 26 0 0 0
234 0 0 620 0 0 0 0 0 106 0 0 901 0 0
0 0 0 0 0 0 45 0 34 0 0 0 0 0 0
I would like to have an image with the dimensions [3,5]. The columns in the above data frame are arranged by their respective coordinates in the following order: [1,1], [2,1], [3,1], [1,2], [2,2], [3,2] ... and so on.
The image coordinate arrangement
[1,1] [1,2] [1,3] [1,4] [1,5]
[2,1] [2,2] [2,3] [2,4] [2,5]
[3,1] [3,2] [3,3] [3,4] [3,5]

This function reads in your dataset and finds the mean (or max or sum) of each column (yielding a series of numbers, one per column). It then reshapes that series into your desired output dimensions and displays as an image.
df <- read.table(header=FALSE,text="
0 0 0 0 0 46 0 0 0 0 0 0 0 78 0
0 734 0 0 0 0 932 0 0 56 0 0 0 0 0
0 0 0 115 0 0 0 0 0 0 64 0 0 0 0
0 67 0 0 0 45 0 0 0 0 0 546 0 12 0
0 0 0 0 65 5 56 0 54 0 0 0 0 0 0
667 0 430 0 0 0 0 456 0 0 787 0 0 467 0
0 0 0 0 54 0 0 0 0 0 0 456 90 0 0
778 45 0 0 0 0 24 913 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 26 0 0 0
234 0 0 620 0 0 0 0 0 106 0 0 901 0 0
0 0 0 0 0 0 45 0 34 0 0 0 0 0 0
")
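# The columns are ordered [1,1], [2,1], [3,1], [1,2], ... (column-major),
# so the summaries fill a tall x wide matrix directly.
# image() draws matrix rows along the x axis, so flip the rows and
# transpose to show row 1 at the top of the plot.
img <- function(data, op, tall, wide) {
  m <- matrix(sapply(data, op), nrow = tall, ncol = wide)
  image(t(m[tall:1, , drop = FALSE]), col = gray((0:32) / 32))
}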
img(df, mean, 3, 5)
img(df, max, 3, 5)
img(df, sum, 3, 5)
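To check that each summary value lands on the intended coordinate, the reshaping step can be inspected on its own. The following is a minimal sketch, assuming the column order stated in the question ([1,1], [2,1], [3,1], [1,2], ...); the vector `1:15` is a hypothetical stand-in for the per-column summaries, chosen so that each value equals its column number in the data frame.

```r
# Hypothetical stand-in for the 15 per-column summaries (value == column number).
vals <- 1:15

# matrix() fills column by column, which matches the [1,1], [2,1], [3,1], [1,2], ... order.
m <- matrix(vals, nrow = 3, ncol = 5)

# m[r, c] now holds data-frame column (c - 1) * 3 + r.
m[2, 1]  # data-frame column 2
m[1, 2]  # data-frame column 4

# image() maps matrix rows to the x axis and draws y upwards, so flipping the
# rows and transposing makes the plot look like the printed matrix.
z <- t(m[nrow(m):1, , drop = FALSE])
dim(z)   # 5 x 3: five positions along x, three along y
```

Passing `z` to `image()` should then show coordinate [1,1] in the top-left corner.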

Related

Adding multiple columns in between columns in a data frame using a For Loop

outputdata (df)
Store.No Task
1 70
2 50
3 20
I am trying to add 53 columns after the 'Task' column using its position, not its name. I then want the column names to run from 1 to 53, with 0 in every row. There are 3 rows in this example, but the number could vary, so would it be possible to use the nrow function to specify the number of rows rather than hard-coding it?
outputdata- Desired Outcome
Store.No Task 1 2 3 4 5 6 7 8 9 10 ...53
1 70 0 0 0 0 0 0 0 0 0 0
2 50 0 0 0 0 0 0 0 0 0 0
3 20 0 0 0 0 0 0 0 0 0 0
Code used
x <- 1
y <- 0
for (i in 1:53){
outputdata <- add_column(outputdata, x = 0, .after = Fo+y)
y <- y + 1
x <- x + 1
}
The problem is that the columns are being named x, x.1, x.2, x.3, x.4 ... x.53 rather than 1, 2, 3, 4 ... 53, and I am not sure why.
I am still quite new to R, so if there is a far more efficient way of doing this, please let me know.
Many thanks
You do not need to loop to do this:
as.data.frame(cbind(df, matrix(0, nrow = nrow(df), ncol = 53)))
Store.No Task Third Fourth 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
1 1 70 4 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 2 50 5 8 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 3 20 6 9 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
matrix will create a matrix with 53 columns and 3 rows filled with 0
cbind will add this matrix to the end of your data
as.data.frame will convert it to a dataframe
Update
To insert these zero columns positionally you can subset your df into two parts: df[, 1:2] are the first and second columns, while df[,3:ncol(df)] are the third to end of your dataframe.
as.data.frame(cbind(df[,1:2], matrix(0, nrow = nrow(df), ncol = 53), df[,3:ncol(df)]))
Store.No Task 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
1 1 70 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 2 50 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 3 20 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 Third Fourth
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 7
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 8
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 9
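The positional insert can also be wrapped in a small helper that reorders column indices instead of slicing the data frame in two. This is only a sketch: `insert_zero_cols` is a hypothetical name, not a function from any package. It relies on base R's `append()` with its `after` argument, and it also covers the case where the insert position is the last column (as in the question's two-column data), where `3:ncol(df)` would count backwards.

```r
# Hypothetical helper: insert n zero-filled columns named "1".."n" after column `pos`.
insert_zero_cols <- function(df, pos, n) {
  zeros <- as.data.frame(matrix(0, nrow = nrow(df), ncol = n))
  names(zeros) <- as.character(seq_len(n))
  out <- cbind(df, zeros)  # cbind on data frames keeps the numeric-looking names
  # Move the indices of the new columns to sit right after position `pos`.
  ord <- append(seq_len(ncol(df)), ncol(df) + seq_len(n), after = pos)
  out[, ord, drop = FALSE]
}

# The question's data: only two columns, insert after 'Task' (position 2).
outputdata <- data.frame(Store.No = 1:3, Task = c(70, 50, 20))
res2 <- insert_zero_cols(outputdata, pos = 2, n = 53)

# The answer's four-column data: the zero columns land between Task and Third.
df4 <- data.frame(Store.No = 1:3, Task = c(70, 50, 20), Third = 4:6, Fourth = 7:9)
res4 <- insert_zero_cols(df4, pos = 2, n = 53)
names(res4)[c(1:3, 55:57)]
```

Because the number of rows comes from `nrow(df)`, nothing about the row count is hard-coded.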
add_column
Alternatively you can use the add_column function from the tibble package as you were in your post using the .after argument to insert after the second column:
library(tibble)
tibble::add_column(df, as.data.frame(matrix(0, nrow = nrow(df), ncol = 53)), .after = 2)
Note: this function will fix the column names to add a "V" before any column name that starts with a number. So 1 will become V1.
Data
df <- data.frame(Store.No = 1:3,
Task = c(70, 50, 20),
Third = 4:6,
Fourth = 7:9)

Turn a long data structure to a wide matrix structure

I have the following data structure:
ID value
1 1 1
2 1 63
3 1 2
4 1 58
5 2 3
6 2 4
7 3 34
8 3 25
Now I want to turn it into a kind of dyadic data structure: all IDs that share the same value should be linked to each other.
I tried several options, and:
df_wide <- dcast(df, ID ~ value)
... have brought me a long way down the road...
ID 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 39 40
1 1001 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 1006 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 1007 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 2 0 0
4 1011 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
5 1018 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 1020 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
7 1030 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0
8 1036 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Now my main problem is turning it into a proper matrix so that I can build an igraph object from it.
df_wide_matrix <- data.matrix(df_wide)
df_aus_wide_g <- graph.edgelist(df_wide_matrix ,directed = TRUE)
doesn't get me there...
I also tried to transform it into an adjacency matrix...
df_wide_matrix <- get.adjacency(graph.edgelist(as.matrix(df_wide), directed=FALSE))
... but it didn't work either
If you want to create an edge between all IDs with the same value, try something like this instead. First merge the data frame onto itself by the value. Then, remove the value column, and remove all (undirected) edges that are duplicate or just points. Finally, convert to a two-column matrix and create the edges.
res <- merge(df, df, by='value', all=FALSE)[,c('ID.x','ID.y')]
res <- res[res$ID.x<res$ID.y,]
resg <- graph.edgelist(as.matrix(res))
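The self-merge step can be tried in base R before involving igraph. The sketch below uses hypothetical toy data in which some IDs share a value (the sample in the question has no shared values, so it would produce no edges). Note the added `unique()` call, which is an assumption about the desired result: it collapses pairs that share more than one value into a single edge.

```r
# Hypothetical toy data: IDs 1 and 2 share value 10, IDs 1 and 3 share value 20.
df <- data.frame(ID = c(1, 1, 2, 2, 3), value = c(10, 20, 10, 30, 20))

# Joining the data frame to itself on `value` lists every pair of IDs per value.
pairs <- merge(df, df, by = "value")

# Keep each undirected pair once (ID.x < ID.y drops self-pairs and mirror images).
edges <- unique(pairs[pairs$ID.x < pairs$ID.y, c("ID.x", "ID.y")])
edges
```

The two-column result can then be converted with `as.matrix(edges)` and handed to `graph.edgelist()`, with `directed = FALSE` if the relationships are meant to be undirected.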

Losing observations when I use reshape in R

I have a data set:
> head(pain_subset2, n= 50)
PatientID RSE SE SECODE
1 1001-01 0 0 0
2 1001-01 0 0 0
3 1001-02 0 0 0
4 1001-02 0 0 0
5 1002-01 0 0 0
6 1002-01 1 2a 1
7 1002-02 0 0 0
8 1002-02 0 0 0
9 1002-02 0 0 0
10 1002-03 0 0 0
11 1002-03 0 0 0
12 1002-03 1 1 1
> dim(pain_subset2)
[1] 817 4
> table(pain_subset2$RSE)
0 1
788 29
> table(pain_subset2$SE)
0 1 2a 2b 3 4 5
788 7 5 1 6 4 6
> table(pain_subset2$SECODE)
0 1
788 29
I want to create an n x 6 matrix (n: number of PatientIDs; columns: the 6 levels of SE).
When I use reshape, I lose many observations:
> dim(p)
[1] 246 9
My code:
p <- reshape(pain_subset2, timevar = "SE", idvar = c("PatientID","RSE"),v.names = "SECODE", direction = "wide")
p[is.na(p)] <- 0
> table(p$RSE)
0 1
226 20
Compared with the original table of RSE, I lost 9 patients with RSE = 1.
This is the output I get:
PatientID RSE SECODE.0 SECODE.2a SECODE.1 SECODE.5 SECODE.3 SECODE.2b SECODE.4
1 1001-01 0 0 0 0 0 0 0 0
3 1001-02 0 0 0 0 0 0 0 0
5 1002-01 0 0 0 0 0 0 0 0
6 1002-01 1 0 1 0 0 0 0 0
7 1002-02 0 0 0 0 0 0 0 0
10 1002-03 0 0 0 0 0 0 0 0
12 1002-03 1 0 0 1 0 0 0 0
13 1002-04 0 0 0 0 0 0 0 0
15 1003-01 0 0 0 0 0 0 0 0
18 1003-02 0 0 0 0 0 0 0 0
21 1003-03 0 0 0 0 0 0 0 0
24 1003-04 0 0 0 0 0 0 0 0
27 1003-05 0 0 0 0 0 0 0 0
30 1003-06 0 0 0 0 0 0 0 0
32 1003-07 0 0 0 0 0 0 0 0
35 1004-01 0 0 0 0 0 0 0 0
36 1004-01 1 0 0 0 1 0 0 0
40 1004-02a 0 0 0 0 0 0 0 0
Does anyone know what is happening? I would really appreciate any help.
Thanks for your help, best.
Try:
library(dplyr)
library(tidyr)
pain_subset2 %>%
spread(SE, SECODE)
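For context, `reshape()` in the wide direction returns one row per combination of the `idvar` columns, so repeated rows for the same PatientID and RSE are collapsed into one rather than kept separately. If the goal is a patient-by-SE matrix of counts, a base R cross-tabulation is one possible alternative. The sketch below uses a small hypothetical data frame shaped like the first rows of `pain_subset2`; it is not the original data.

```r
# Hypothetical subset shaped like pain_subset2 (PatientID and SE only).
pain <- data.frame(
  PatientID = c("1001-01", "1001-01", "1002-01", "1002-01", "1002-03", "1002-03"),
  SE        = c("0", "0", "0", "2a", "0", "1")
)

# One row per patient, one column per SE level, cells count the observations.
counts <- table(pain$PatientID, pain$SE)
counts

# Row sums recover the number of original observations per patient,
# so no rows are dropped from the tally.
rowSums(counts)
```

Converting the counts to 0/1 indicators with `(counts > 0) * 1` would give a presence matrix, if that is what is wanted.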

R igraph adjacency matrix weighted graph: plot is not weighted

I am trying to plot a weighted graph of terms used in tweets. Basically, I made a term-document matrix, removed sparse terms, built an adjacency matrix of the remaining words, and would like to plot them.
I can't figure out where the problem is. I tried to follow this example exactly: http://www.rdatamining.com/examples/text-mining
Here's my code:
tweet_corpus = Corpus(VectorSource(df$CONTENT))
tdm = TermDocumentMatrix(
tweet_corpus,
control = list(
removePunctuation = TRUE,
stopwords = c("hehe", "haha", stopwords_phil, stopwords("english"), stopwords("spanish")),
removeNumbers = TRUE, tolower = TRUE)
)
m = as.matrix(tdm)
termDocMatrix <- m
termDocMatrix[5:10,1:20]
Docs
Terms 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
aabutin 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
aad 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
aaf 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
aali 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
aannacm 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
aantukin 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
myTdm2 <- removeSparseTerms(tdm, sparse =0.98)
m2 <- as.matrix(myTdm2)
m2[5:10,1:20]
Docs
Terms 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
filipino 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
give 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0
god 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
good 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
guy 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
haiyan 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
myTdm2
<<TermDocumentMatrix (terms: 34, documents: 27395)>>
Non-/sparse entries: 39769/891661
Sparsity : 96%
Maximal term length: 9
Weighting : term frequency (tf)
termDocMatrix2 <- m2
termDocMatrix2[termDocMatrix2>=1] <- 1
termMatrix2 <- termDocMatrix2 %*% t(termDocMatrix2)
termMatrix2[5:10,5:10]
Terms
Terms disaster give god good guy test
disaster 623 6 53 11 4 19
give 6 592 98 16 8 6
god 53 98 2679 135 38 29
good 11 16 135 816 21 5
guy 4 8 38 21 637 5
test 19 6 29 5 5 610
g2 <- graph.adjacency(termMatrix2, weighted=T, mode="undirected")
g2 <- simplify(g2)
V(g2)$label <- V(g2)$name
V(g2)$degree <- degree(g2)
set.seed(3952)
layout1 <- layout.fruchterman.reingold(g2)
plot(g2, layout=layout1)
plot(g2, layout=layout.kamada.kawai)
V(g2)$label.cex <- 2.2 * V(g2)$degree / max(V(g2)$degree)+ .2
V(g2)$label.color <- rgb(0, 0, .2, .8)
V(g2)$frame.color <- NA
egam <- (log(E(g2)$weight)+.4) / max(log(E(g2)$weight)+.4)
E(g2)$color <- rgb(.5, .5, 0, egam)
E(g2)$width <- egam
plot(g2, layout=layout1)
This then looks like:
but I would like to have something like this:
Apparently the weighting doesn't work, but why?
Thank you guys in advance!
Even though your graph is weighted, the layout algorithm does not use the weights unless you explicitly tell it to do so. Try this:
layout1 <- layout.fruchterman.reingold(g2, weights=E(g2)$weight)
However, if your weights are wildly varying in terms of magnitude, it is usually better to use the logarithm of the weights (plus some constant to make all of them strictly positive) as the input of the layout algorithm.
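The log transform mentioned above can be sketched in base R. The weights below are a few off-diagonal counts copied from the `termMatrix2` excerpt in the question, and the constant 1 is an arbitrary assumption: co-occurrence counts are at least 1, so their logarithms are at least 0 and adding 1 keeps every weight strictly positive.

```r
# A few co-occurrence counts taken from the termMatrix2 excerpt.
w <- c(4, 6, 53, 98, 135)

# Compress the range with a logarithm, then shift so all weights are > 0.
lw <- log(w) + 1

# The spread between the largest and smallest weight shrinks considerably.
range_raw <- max(w) / min(w)
range_log <- max(lw) / min(lw)

# With igraph loaded, the transformed edge weights could then be passed on, e.g.
# layout1 <- layout.fruchterman.reingold(g2, weights = log(E(g2)$weight) + 1)
```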

zoo's NA handling methods in r

I am experimenting with different imputation methods in zoo.
So far I have tried na.locf, na.approx and na.spline on my dataset. However, when I tried the same dataset with na.StructTS, which uses a seasonal Kalman filter, it returned the following error:
Error in StructTS(y) : 'x' must be numeric
Did I miss something? Any help is appreciated.
UPD1
my code:
empty <-zoo(order.by=seq.Date(head(index(df1.zoo),1),tail(index(df1.zoo),1),by="days"))
merged<-na.StructTS(merge(df1.zoo,empty))
here is df1.zoo:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
2012-01-01 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 42
2012-01-02 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 57
2012-01-03 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 51
2012-01-04 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 41
2012-01-05 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 56
2012-01-06 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 55
here is empty:
