PCA analysis wrong output - R

Here is the example data:
structure(c(368113, 87747.35, 508620.5, 370570.5, 87286.5, 612728,
55029, 358521, 2802880, 2045399.5, 177099, 317974.5, 320687.95,
6971292.55, 78949, 245415.95, 50148.5, 67992.5, 97634, 56139.5,
371719.2, 80182.7, 612078.5, 367822.5, 80691, 665190.65, 28283.5,
309720, 2853241.5, 1584324, 135482.5, 270959, 343879.1, 6748208.5,
71534.9, 258976, 28911.75, 78306, 56358.7, 46783.5, 320882.85,
53098.3, 537383.5, 404505.5, 89759.7, 624120.55, 40406, 258183.5,
3144610.45, 1735583.5, 122013.5, 249741, 362585.35, 5383869.15,
23172.2, 223704.45, 40543.7, 68522.5, 43187.05, 29745, 356058.5,
89287.25, 492242.5, 452135.5, 97253.55, 575661.95, 65739.5, 334703.5,
3136065, 1622936.5, 131381.5, 254362, 311496.3, 5627561, 68210.6,
264610.1, 45851, 65010.5, 32665.5, 39957.5, 362476.75, 59451.65,
548279, 345096.5, 93363.5, 596444.2, 11052.5, 252812, 2934035,
1732707.55, 208409.5, 208076.5, 437764.25, 16195882.45, 77461.25,
205803.85, 30437.5, 75540, 49576.75, 48878, 340380.5, 43785.35,
482713, 340315, 64308.5, 517859.85, 11297, 268993.5, 3069028.5,
1571889, 157561, 217596.5, 400610.65, 5703337.6, 50640.65, 197477.75,
40070, 66619, 81564.55, 41436.5, 367592.3, 64954.9, 530093, 432025,
87212.5, 553901.65, 20803.5, 333940.5, 3027254.5, 1494468, 195221,
222895.5, 494429.45, 7706885.75, 60633.35, 192827.1, 29857.5,
81001.5, 112588.65, 68904.5, 338822.5, 56868.15, 467350, 314526.5,
105568, 749456.1, 19597.5, 298939.5, 2993199.2, 1615231.5, 229185.5,
280433.5, 360156.15, 5254889.1, 79369.5, 175434.05, 40907.05,
70919, 65720.15, 53054.5), .Dim = c(20L, 8L), .Dimnames = list(
c("Anne", "Greg", "thomas", "Chris", "Gerard", "Monk", "Mart",
"Mutr", "Aeqe", "Tor", "Gaer", "Toaq", "Kolr", "Wera", "Home",
"Terlo", "Kulte", "Mercia", "Loki", "Herta"), c("Day_Rep1",
"Day_Rep2", "Day_Rep3", "Day_Rep4", "Day2_Rep1", "Day2_Rep2",
"Day2_Rep3", "Day2_Rep4")))
I would like to perform a PCA analysis. I expect the replicates from Day to cluster nicely with each other, and likewise the replicates from Day2. I was trying to perform the analysis using the code below:
## log transform
data_log <- log(data[, 1:8])
#vec_EOD_EON
dt_PCA <- prcomp(data_log,
                 center = TRUE,
                 scale. = TRUE)
library(devtools)
install_github("vqv/ggbiplot")
library(ggbiplot)
g <- ggbiplot(dt_PCA, obs.scale = 1, var.scale = 1,
              groups = colnames(dt_PCA), ellipse = TRUE,
              circle = TRUE)
g <- g + scale_color_discrete(name = "")
g <- g + theme(legend.direction = 'horizontal',
               legend.position = 'top')
print(g)
However, the output is not what I am looking for. I would like a dot for each row of the data and a different color for each replicate, ideally with similar colors for the Day replicates and likewise for the Day2 replicates.
Plot obtained with ggplot:

Let's imagine you save your data into df.
library(ggplot2)
pc_df <- prcomp(t(df), scale. = TRUE)
pc_table <- as.data.frame(pc_df$x[, 1:2]) # extract the 1st and 2nd components
# extract replicate and condition from your experiment names
experiment_regex <- '(^[^_]+)_Rep(\\d+)'
pc_table$replicate <- as.factor(sub(experiment_regex, '\\2', rownames(pc_table)))
pc_table$condition <- as.factor(sub(experiment_regex, '\\1', rownames(pc_table)))
# extract the percentage of variance of each PC and print it on the axes
ggplot(pc_table, aes(PC1, PC2, color = condition, shape = replicate)) +
  geom_point() +
  xlab(sprintf('PC1 - %.1f%%', summary(pc_df)$importance[2, 1] * 100)) +
  ylab(sprintf('PC2 - %.1f%%', summary(pc_df)$importance[2, 2] * 100))
The first thing you have to do to get your data into the correct shape is to transpose it with t(). This might already be what you are looking for.
I prefer to do the plots with my own function, and I wrote down the steps to get a nice plot with ggplot2.
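As a quick sanity check, here is a minimal sketch (reusing the df from above) confirming that the experiments, not the genes, end up as the observations after transposing:
# prcomp() treats rows as observations, so the 20 x 8 gene-by-experiment
# matrix must be transposed before the PCA
dim(df)          # 20 x 8: genes in rows, experiments in columns
dim(t(df))       # 8 x 20: experiments in rows, as prcomp() expects
rownames(t(df))  # "Day_Rep1", "Day_Rep2", ... are now the observations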
UPDATE:
Since you were asking in the comments, here is an example where an experiment was repeated on a different day: replicates 1 and 2 on one day, replicates 3 and 4 a few days later.
The difference between the two days is larger than the difference between the conditions (the day explains 49% of the variance, the experiment only 20%).
This is not a good experiment and should be repeated!


R "Error in draw.quad.venn, Impossible: produces negative area" despite numbers being correct

I'm trying to generate a four way Venn diagram using draw.quad.venn in the VennDiagram package in R, but it keeps throwing up the error message:
ERROR [2019-05-14 11:28:24] Impossible: a7 <- n234 - a6 produces negative area
Error in draw.quad.venn(length(gene_lists[[1]]), length(gene_lists[[2]]), :
Impossible: a7 <- n234 - a6 produces negative area
I'm using 4 different lists of genes as the input. calculate.overlap works fine, and I then get the numbers by applying length() over the overlap values, parsed as a list. I pass all of the overlap values, along with the appropriate total group sizes, to the draw.quad.venn function, but it keeps claiming that one of the groups is impossible because it generates a negative number.
I've checked the numbers manually and they clearly add up to the correct values. I've also tested the script on a random set of 20000 genes, generated using something similar to the script below, and it works fine i.e. generates a four way Venn diagram. There are no differences between the randomly generated gene lists and the ones I've curated from actual results files, apart from their sizes. A minimal working example can be seen below:
# working example that fails
library(VennDiagram)
# get a vector of 10000 elements (representative of a gene list)
values <- c(1:10000)
# generate 4 subsets by random sampling
list_1 <- sample(values, size = 5000, replace = FALSE)
list_2 <- sample(values, size = 4000, replace = FALSE)
list_3 <- sample(values, size = 3000, replace = FALSE)
list_4 <- sample(values, size = 2000, replace = FALSE)
# compile them into a list
lists <- list(list_1, list_2, list_3, list_4)
# find the overlap between all possible combinations (11, plus 4 unique to each list = 15 total)
overlap <- calculate.overlap(lists)
# get the lengths of each list - these will be the numbers used for the Venn diagram
overlap_values <- lapply(overlap, function(x) length(x))
# rename the overlap values (easier to identify which groups are intersecting)
names(overlap_values) <- c("n1234", "n123", "n124", "n134", "n234", "n12", "n13",
                           "n14", "n23", "n24", "n34", "n1", "n2", "n3", "n4")
# generate the venn diagram
draw.quad.venn(length(lists[[1]]), length(lists[[2]]), length(lists[[3]]),
               length(lists[[4]]), overlap_values$n12, overlap_values$n13,
               overlap_values$n14, overlap_values$n23, overlap_values$n24,
               overlap_values$n34, overlap_values$n123, overlap_values$n124,
               overlap_values$n134, overlap_values$n234, overlap_values$n1234)
I expect a four way Venn diagram regardless of whether or not some groups are 0; they should still be there, just labelled as 0.
I'm not sure if it's because I have 0 values in the real data, i.e. certain groups where there is no overlap? Is there any way to force draw.quad.venn() to take any values? If not, is there another package that I can use to achieve the same result? Any help greatly appreciated!
So nothing I tried could solve the error with draw.quad.venn in the VennDiagram package; there seems to be something wrong with the way it's written. As long as all of the numbers in each of the 4 ellipses add up to the total number of elements in that particular list, the Venn diagram is valid. For some reason, VennDiagram only accepts data where fewer intersections lead to higher numbers, e.g. the intersection of groups 1, 2 and 3 MUST be higher than the intersection of all 4 groups. This doesn't represent real-world data: it's entirely possible for groups 1, 2 and 3 not to intersect at all whilst all 4 groups do intersect. In a Venn diagram, all of the numbers are independent and represent the total number of elements common at each intersection; they do not have to have any bearing on each other.
I had a look at the eulerr package, but actually found a very simple method of plotting the Venn diagram using venn from gplots, as follows:
# simple 4-way Venn diagram using gplots
library(gplots)
# get some mock data
values <- c(1:20000)
list_1 <- sample(values, size = 5000, replace = FALSE)
list_2 <- sample(values, size = 4000, replace = FALSE)
list_3 <- sample(values, size = 3000, replace = FALSE)
list_4 <- sample(values, size = 2000, replace = FALSE)
lists <- list(list_1, list_2, list_3, list_4)
# name the list (required for gplots)
names(lists) <- c("G1", "G2", "G3", "G4")
# get the venn table
v.table <- venn(lists)
# show the venn table
print(v.table)
# plot the Venn diagram
plot(v.table)
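If you also need the members of each region rather than just the counts, gplots stores them as an attribute of the returned venn object; a minimal sketch, assuming the v.table from above (depending on your gplots version you may need to call venn(lists, intersections = TRUE) explicitly):
# list the elements behind each region of the Venn diagram
isect <- attr(v.table, "intersections")
str(isect, max.level = 1)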
I now consider the matter solved. Thank you zx8754 for your help!
I have had a look at the source code of the package. In case you are still interested in the reason for the error: there are two ways to send data to venn.diagram. One is the nxxxx (e.g., n134) form and the other is the an (e.g., a5) form. In the examples, n134 means "which elements belong at least to groups 1, 3 and 4", whereas a5 means "which elements belong only to groups 1, 3 and 4". The relationship between the two forms is quite convoluted; for instance, a6 corresponds to n1234. This means that n134 = a5 + a6.
The problem is that calculate.overlap gives the numbers in the an form, whereas by default draw.quad.venn expects numbers in the nxxxx form. To use the values from calculate.overlap, you can set direct.area to TRUE and provide the result of calculate.overlap in the area.vector parameter. For instance,
tmp <- calculate.overlap(list(a = c(1, 2, 3, 4, 10), b = c(3, 4, 5, 6),
                              c = c(4, 6, 7, 8, 9), d = c(4, 8, 1, 9)))
overlap_values <- lapply(tmp, function(x) length(x))
draw.quad.venn(area.vector = c(overlap_values$a1, overlap_values$a2, overlap_values$a3,
                               overlap_values$a4, overlap_values$a5, overlap_values$a6,
                               overlap_values$a7, overlap_values$a8, overlap_values$a9,
                               overlap_values$a10, overlap_values$a11, overlap_values$a12,
                               overlap_values$a13, overlap_values$a14, overlap_values$a15),
               direct.area = TRUE, category = c('a', 'b', 'c', 'd'))
If you are interested in something simpler and more flexible, I made the nVennR package for this type of problem:
library(nVennR)
g1 <- c('AF029684', 'M28825', 'M32074', 'NM_000139', 'NM_000173', 'NM_000208', 'NM_000316', 'NM_000318', 'NM_000450', 'NM_000539', 'NM_000587', 'NM_000593', 'NM_000638', 'NM_000655', 'NM_000789', 'NM_000873', 'NM_000955', 'NM_000956', 'NM_000958', 'NM_000959', 'NM_001060', 'NM_001078', 'NM_001495', 'NM_001627', 'NM_001710', 'NM_001716')
g2 <- c('NM_001728', 'NM_001835', 'NM_001877', 'NM_001954', 'NM_001992', 'NM_002001', 'NM_002160', 'NM_002162', 'NM_002258', 'NM_002262', 'NM_002303', 'NM_002332', 'NM_002346', 'NM_002347', 'NM_002349', 'NM_002432', 'NM_002644', 'NM_002659', 'NM_002997', 'NM_003032', 'NM_003246', 'NM_003247', 'NM_003248', 'NM_003259', 'NM_003332', 'NM_003383', 'NM_003734', 'NM_003830', 'NM_003890', 'NM_004106', 'AF029684', 'M28825', 'M32074', 'NM_000139', 'NM_000173', 'NM_000208', 'NM_000316', 'NM_000318', 'NM_000450', 'NM_000539')
g3 <- c('NM_000655', 'NM_000789', 'NM_004107', 'NM_004119', 'NM_004332', 'NM_004334', 'NM_004335', 'NM_004441', 'NM_004444', 'NM_004488', 'NM_004828', 'NM_005214', 'NM_005242', 'NM_005475', 'NM_005561', 'NM_005565', 'AF029684', 'M28825', 'M32074', 'NM_005567', 'NM_003734', 'NM_003830', 'NM_003890', 'NM_004106', 'AF029684', 'NM_005582', 'NM_005711', 'NM_005816', 'NM_005849', 'NM_005959', 'NM_006138', 'NM_006288', 'NM_006378', 'NM_006500', 'NM_006770', 'NM_012070', 'NM_012329', 'NM_013269', 'NM_016155', 'NM_018965', 'NM_021950', 'S69200', 'U01351', 'U08839', 'U59302')
g4 <- c('NM_001728', 'NM_001835', 'NM_001877', 'NM_001954', 'NM_005214', 'NM_005242', 'NM_005475', 'NM_005561', 'NM_005565', 'ex1', 'ex2', 'NM_003890', 'NM_004106', 'AF029684', 'M28825', 'M32074', 'NM_000139', 'NM_000173', 'NM_000208', 'NM_000316', 'NM_000318', 'NM_000450', 'NM_000539')
myV <- plotVenn(list(g1=g1, g2=g2, g3=g3, g4=g4))
myV <- plotVenn(nVennObj = myV)
myV <- plotVenn(nVennObj = myV)
The last command is repeated on purpose. The result:
You can then explore the intersections:
> getVennRegion(myV, c('g1', 'g2', 'g4'))
[1] "NM_000139" "NM_000173" "NM_000208" "NM_000316" "NM_000318" "NM_000450" "NM_000539"
There is a vignette with more information.

Generate multiple plots in base R with loop function then concatenate by matching group variables

I have a data frame (below; my apologies for the verbose code, this is my first attempt at generating reproducible random data) that I'd like to loop through and generate individual plots in base R (specifically, ethograms) for each subject's day and video clip (e.g. subj-1/day1/clipB). After generating the n graphs, I'd like to concatenate a PDF for each subject that includes all days and clips, with each row corresponding to a single day. I haven't been able to get past generating the individual graphs, however, so any help would be greatly appreciated!
Data frame
n <- 20000
library(stringi)
test <- data.frame(Subj = stri_rand_strings(n, 2, '[A-Z]'))
test$Day <- sample(1:3, size = length(test$Subj), replace = TRUE)
test$Time <- sample(0:600, size = length(test$Subj), replace = TRUE)
test$Behavior <- as.factor(sample(c("peck", "eat", "drink", "fly", "sleep"),
                                  size = length(test$Time), replace = TRUE))
test$Vid_Clip <- sample(c("Clip_A", "Clip_B", "Clip_C"),
                        size = length(test$Time), replace = TRUE)
Sample data from data frame:
> head(test)
Subj Day Time Behavior Vid_Clip
1 BX 1 257 drink Clip_B
2 NP 2 206 sleep Clip_B
3 ZF 1 278 peck Clip_B
4 MF 2 391 sleep Clip_A
5 VE 1 253 fly Clip_C
6 ID 2 359 eat Clip_C
After adapting this code, I am able to successfully generate a single plot (one at a time):
Subset single subj/day/clip:
single_subj_day_clip <- test[test$Vid_Clip == "Clip_B" & test$Subj == "AA" & test$Day == 1,]
After which, I can generate the graph I'm after by running the following lines:
beh_numb <- nlevels(single_subj_day_clip$Behavior)
mar.default <- c(5, 4, 4, 2) + 0.1
par(mar = mar.default + c(0, 4, 0, 0))
plot(single_subj_day_clip$Time,
     xlim = c(0, max(single_subj_day_clip$Time)), ylim = c(0, beh_numb),
     type = "n", ann = F, yaxt = "n", frame.plot = F)
for (i in 1:length(single_subj_day_clip$Behavior)) {
  ytop <- as.numeric(single_subj_day_clip$Behavior[i])
  ybottom <- ytop - 0.5
  rect(xleft = single_subj_day_clip$Subj[i], xright = single_subj_day_clip$Time[i + 1],
       ybottom = ybottom, ytop = ytop, col = ybottom)
}
axis(side = 2, at = (1:beh_numb - 0.25), labels = levels(single_subj_day_clip$Behavior), las = 1)
mtext(text = "Time (sec)", side = 1, line = 3, las = 1)
Example graph from randomly generated data (sorry for the link; until I'm at 10 reputation points I can't embed an image directly)
Example graph from actual data
Ideal per-subject graph
Thank you all in advance for your input.
Cheers,
Dan
New and hopefully correct answer
The code is too long to post here, so there is a link to a Dropbox folder with the data and code. You can check the html document there or run the .Rmd file on your machine. Please check that all required packages are installed. The output of the script is in the same folder.
There is an additional problem in the analysis: some events are registered only once, at a single time point between other events, so such bars have no "width". I assigned a width of 1000 ms to these events, so some of them (around 100 per 20000 observations) run off the scale if they occur at the very beginning or at the very end of the experiment. You can play with the code to fix this behavior.
Another problem is that the same factors get different colors on different plots. I need some fresh air to fix that as well.
Looking at the graphs, you may notice that some observations with a very short time seem to overlap with other observations. But if you zoom the pdf to the maximum, you will see that they do not, and that there are 'holes' in the underlying intervals where they are supposed to be.
Lines connecting the intervals for the different kinds of behavior help to follow the time course of the experiment. You can uncomment the corresponding parts of the code if you wish.
Please let me know if it works.
Old answer
I am not sure it is the best way to do it, but you can probably use split() and then lapply through the resulting tables.
Split your data.frame by Subj, Day, and Vid_Clip:
testl <- split(test, test[, c(1, 2, 5)], drop = T)
testl[[1123]]
# Subj Day Time Behavior Vid_Clip
#8220 ST 2 303 fly Clip_A
#9466 ST 2 463 fly Clip_A
#9604 ST 2 32 peck Clip_A
#10659 ST 2 136 peck Clip_A
#13126 ST 2 47 fly Clip_A
#14458 ST 2 544 peck Clip_A
Loop through the list with your data and plot to .pdf:
mar.default <- c(5, 4, 4, 2) + 0.1
par(mar = mar.default + c(0, 4, 0, 0))
nbeh <- nlevels(test$Behavior)
pdf("plots.pdf")
invisible(
  lapply(testl, function(l){
    plot(x = l$Time, xlim = c(0, max(l$Time)), ylim = c(0, nbeh),
         type = "n", ann = F, yaxt = "n", frame.plot = F)
    lapply(seq_len(nrow(l)), function(i){ # loop over the rows, as in the original plot code
      ytop <- as.numeric(l$Behavior[i]); ybot <- ytop - .5
      rect(l$Subj[i], ybot, l$Time[i + 1], ytop, col = ybot)
    })
    axis(side = 2, at = 1:nbeh - .25, labels = levels(l$Behavior), las = 1)
    mtext(text = "Time (sec)", side = 1, line = 3, las = 1)
  })
)
dev.off()
You should probably check the output here before you run the code on your machine. I didn't edit your plot code much, so please check it twice.
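To get one PDF per subject with all days and clips, as asked, here is a minimal sketch building on the same split() idea; the 3 x 3 days-by-clips page layout and the file naming are assumptions:
# one PDF per subject; each page is a grid with rows = days, cols = clips
invisible(lapply(split(test, test$Subj), function(s) {
  pdf(paste0("subj_", s$Subj[1], ".pdf"), width = 12, height = 9)
  par(mfrow = c(3, 3), mar = c(5, 8, 4, 2) + 0.1)
  for (d in sort(unique(s$Day))) {
    for (v in sort(unique(s$Vid_Clip))) {
      l <- s[s$Day == d & s$Vid_Clip == v, ]
      if (nrow(l) == 0) { plot.new(); next } # skip empty day/clip combinations
      plot(l$Time, xlim = c(0, max(l$Time)), ylim = c(0, nbeh), type = "n",
           ann = FALSE, yaxt = "n", frame.plot = FALSE,
           main = paste("Day", d, "/", v))
      # reuse the rect() drawing loop from the plot code above here
      axis(side = 2, at = 1:nbeh - .25, labels = levels(l$Behavior), las = 1)
    }
  }
  dev.off()
}))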

Mapping slope of an area and returning percent above and below a threshold in R

I am trying to figure out the proportion of an area that has a slope within +/- 5 degrees of 0; in other words, anything above +5 degrees or below -5 degrees is bad. I am trying to find the actual number, and a graphic.
To achieve this I turned to R, using the raster package.
Let's use a generic country, in this case, the Philippines
list.of.packages <- c("sp", "raster", "rasterVis", "maptools", "rgeos")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[, "Package"])]
if (length(new.packages)) install.packages(new.packages)
library(sp) # classes for spatial data
library(raster) # grids, rasters
library(rasterVis) # raster visualisation
library(maptools)
library(rgeos)
Now let's get the altitude information and plot the slopes.
elevation <- getData("alt", country = "PHL")
x <- terrain(elevation, opt = c("slope", "aspect"), unit = "degrees")
plot(x$slope)
Not very helpful due to the scale, so let's simply look at the Island of Palawan
e <- drawExtent(show=TRUE) #to crop out Palawan (it's the long skinny island that is roughly midway on the left and is oriented between 2 and 8 O'clock)
gewataSub <- crop(x,e)
plot(gewataSub, 1)## Now visualize the new cropped object
A little better to visualize: I get a sense of the magnitude of the slopes, and that with a 5-degree restriction I am mostly confined to the coast. But I need a little bit more for the analysis.
I would like the results to be in two parts:
1. "35% (made up) of the selected area has a slope exceeding +/- 5 degrees" or "65% of the selected area is within +/- 5 degrees" (with the code to get it).
2. A picture where everything within +/- 5 degrees is one color (call it good, or green) and everything else is in another color (call it bad, or red).
Thanks
There are no negative slopes, so I assume you want the cells with a slope of less than 5 degrees:
library(raster)
elevation <- getData('alt', country='CHE')
x <- terrain(elevation, opt='slope', unit='degrees')
z <- x <= 5
Now you can count cells with freq
f <- freq(z)
If you have a planar coordinate reference system (that is, with units in meters or similar) you can do
f <- cbind(f, area=f[,2] * prod(res(z)))
to get areas. But for lon/lat data, you would need to correct for different sized cells and do
a <- area(z)
zonal(a, z, fun=sum)
And there are different ways to plot, but the most basic one
plot(z)
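To also get the requested percentage directly, here is a minimal sketch building on the freq() counts from above (NA cells outside the country are excluded):
# share of cells with slope <= 5 degrees (those cells have value 1 in z)
f <- freq(z, useNA = "no")
100 * f[f[, "value"] == 1, "count"] / sum(f[, "count"])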
You can use reclassify from the raster package to achieve that. The function assigns each cell value that lies within a defined interval a certain value. For example, you can assign cell values within interval (0,5] to value 0 and cell values within the interval (5, maxSlope] to value 1.
library(raster)
library(rasterVis)
elevation <- getData("alt", country = "PHL")
x <- terrain(elevation, opt = c("slope", "aspect"), unit = "degrees")
plot(x$slope)
e <- drawExtent(show = TRUE)
gewataSub <- crop(x, e)
plot(gewataSub$slope, 1)
m <- c(0, 5, 0,
       5, maxValue(gewataSub$slope), 1)
rclmat <- matrix(m, ncol = 3, byrow = TRUE)
rc <- reclassify(gewataSub$slope, rclmat)
levelplot(rc,
          margin = F,
          col.regions = c("wheat", "gray"),
          colorkey = list(at = c(0, 1, 2),
                          labels = list(at = c(0.5, 1.5), labels = c("<= 5", "> 5"))))
After the reclassification you can calculate the percentages:
length(rc[rc == 0]) / (length(rc[rc == 0]) + length(rc[rc == 1])) # <= 5 degrees
[1] 0.6628788
length(rc[rc == 1]) / (length(rc[rc == 0]) + length(rc[rc == 1])) # > 5 degrees
[1] 0.3371212
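Since the elevation data is in lon/lat, cells closer to the equator cover more area than cells further away, so counting cells slightly biases these percentages. Here is a minimal sketch of an area-weighted version, building on the rc object from above:
# weight each class by cell area (km^2) instead of counting cells
a <- area(rc)                  # per-cell areas
za <- zonal(a, rc, fun = sum)  # total area per class: 0 = <= 5, 1 = > 5
100 * za[, 2] / sum(za[, 2])   # area-weighted percentages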

Renko Chart in R

I am trying to construct a Renko chart using data obtained from Yahoo Finance and was wondering if there is any package to do so. I had a look at most of the financial packages but was only able to find candlestick charts.
For more information on Renko charts use the link given here
Really cool question! Apparently, there is really nothing of that sort available for R. There were some attempts to do similar things (e.g., waterfall charts) on various sites, but they all don't quite hit the spot. Soooo... I made a little weekend project out of it with data.table and ggplot.
rrenko
There are still bugs, instabilities, and visual things that I would love to optimize (and the code is full of commented-out debug notes), but the main idea should be there. I'm open to feedback and points for improvement.
Caveats: there are still cases where the data transformation screws up, especially if the size is very small or very large. This should be fixable in the near future. Also, the renko() function at the moment expects a dataframe with two columns: date (x-axis) and close (y-axis).
Installation
devtools::install_github("RomanAbashin/rrenko")
library(rrenko)
Code
renko(df, size = 5, style = "modern") +
  scale_y_continuous(breaks = seq(0, 150, 10)) +
  labs(x = "", y = "")
renko(df, size = 5, style = "classic") +
  scale_y_continuous(breaks = seq(0, 150, 10)) +
  labs(x = "", y = "")
Data
set.seed(1702)
df <- data.frame(date = seq.Date(as.Date("2014-05-02"), as.Date("2018-05-04"), by = "week"),
close = abs(100 + cumsum(sample(seq(-4.9, 4.9, 0.1), 210, replace = TRUE))))
> head(df)
date close
1: 2014-05-02 104.0
2: 2014-05-09 108.7
3: 2014-05-16 111.5
4: 2014-05-23 110.3
5: 2014-05-30 108.9
6: 2014-06-06 106.5
I'm an R investment developer. I used some parts of Roman's code to optimize some lines of my Renko code. Roman's ggplot skills are awesome; the plot function was only possible because of his code.
If someone is interested:
https://github.com/Kinzel/k_rrenko
It needs the packages xts, ggplot2 and data.table.
"Ativo" needs to be an xts object with one of its columns named "close".
EDIT:
After TeeKea's request, here is how to use it:
"Ativo" is a EURUSD 15-min xts from 2015-01-01 to 2015-06-01. If no "close" column is found, the last column is used.
> head(Ativo)
Open High Low Close
2015-01-01 20:00:00 1.20965 1.21022 1.20959 1.21006
2015-01-01 20:15:00 1.21004 1.21004 1.20979 1.21003
2015-01-01 20:30:00 1.21033 1.21041 1.20982 1.21007
2015-01-01 20:45:00 1.21006 1.21007 1.20978 1.21002
2015-01-01 21:00:00 1.21000 1.21002 1.20983 1.21002
2015-01-02 00:00:00 1.21037 1.21063 1.21024 1.21037
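If your quotes start out as a plain data.frame, here is a minimal sketch (with made-up values) of building an xts object like Ativo:
# krenko_plot() looks for a column named "close", so name it explicitly
library(xts)
raw <- data.frame(time = as.POSIXct("2015-01-01 20:00:00", tz = "UTC") +
                         seq(0, by = 900, length.out = 5),
                  close = c(1.21006, 1.21003, 1.21007, 1.21002, 1.21002))
Ativo <- xts(raw[, "close", drop = FALSE], order.by = raw$time)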
How to use krenko_plot:
krenko_plot(Ativo, 0.01,withDates = F)
Link to image krenko_plot
Compared to plot.xts
plot.xts(Ativo, type='candles')
Link to image plot.xts
There are two main parameters: size and threshold.
"size" is the size of the bricks and is required.
"threshold" is the threshold for a new brick; the default is 1.
The first brick is removed to ensure reliability.
Here's a quick and dirty solution, adapted from a python script here.
# Get some test data
library(rvest)
url <- read_html("https://coinmarketcap.com/currencies/bitcoin/historical-data/?start=20170602&end=20181126")
df <- url %>% html_table() %>% as.data.frame()
# Make sure to have your time sequence the right way up
data <- apply(df[nrow(df):1, 3:4], 1, mean)
# Build the renko function
renko <- function(data, delta){
  pre <- data[1]
  xpos <- NULL
  ypos <- NULL
  xneg <- NULL
  yneg <- NULL
  for(i in 1:length(data)){
    increment <- data[i] - pre
    incrementPerc <- increment / pre
    pre <- data[i]
    if(incrementPerc > delta){
      xpos <- c(xpos, i)
      ypos <- c(ypos, data[i])
    }
    if(incrementPerc < -delta){
      xneg <- c(xneg, i)
      yneg <- c(yneg, data[i])
    }
  }
  signal <- list(xpos = xpos,
                 ypos = unname(ypos),
                 xneg = xneg,
                 yneg = unname(yneg))
  return(signal)
}
# Apply the renko function and plot the outcome
signals <- renko(data = data, delta = 0.05)
plot(1:length(data), data, type = "l")
points(signals$xneg, signals$yneg, col = "red", pch = 19)
points(signals$xpos, signals$ypos, col = "yellowgreen", pch = 19)
NOTE: this is not a true Renko chart (thanks to @Roman for pointing that out); it only displays buy and sell signals. See the reference mentioned above.
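For comparison, here is a minimal sketch (not part of the adapted script) of how true fixed-size Renko bricks could be derived: a brick is only added once the price has moved a full brick size away from the edge of the previous brick. The usual 2-brick reversal rule is deliberately left out to keep it short:
# derive fixed-size renko bricks from a price series
renko_bricks <- function(prices, size) {
  bricks <- data.frame(i = integer(), level = numeric(), dir = integer())
  last <- prices[1]                # edge of the most recent brick
  for (i in seq_along(prices)) {
    move <- prices[i] - last
    n <- floor(abs(move) / size)   # whole bricks covered by the move
    if (n > 0) {
      lvls <- last + sign(move) * size * seq_len(n)
      bricks <- rbind(bricks, data.frame(i = i, level = lvls, dir = sign(move)))
      last <- lvls[n]
    }
  }
  bricks
}
# usage with the 'data' series from above, e.g. bricks of 200 USD:
# b <- renko_bricks(data, size = 200)
# plot(b$i, b$level, type = "s")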

R blueprints / floorplan

I'm trying to represent an office building in R. Later I'll need to represent multiple floors, but for now I need to start with one floor. There are clusters of cubes, all in a regular structure. There are four small cubes for junior staff (4x4) and two larger cubes for a senior engineer and a manager (4x6). Once these are mapped out, I need to be able to show whether they are occupied or free for new hires, by color (like red for occupied, green for available). These are all laid out the same way, with the big ones on one end. For example,
+----+--+--+
| S |J1|J2|
+----+--+--+
<-hallway-->
+----+--+--+
| M |J3|J4|
+----+--+--+
I first thought I could use ggplot and just scatter-plot everybody, but I can't figure out how to capture the different cube sizes with geom_point. I spent some time looking at maps, but it seems like I can't really take advantage of the regular structure of my floorplan; or maybe that really is the way to go, and I should exploit the regular structure when building out a map? Does R have a concept I should Google for this kind of structure?
In the end, I'll have a long data file with the type of cubicle, the x and y coordinates of the cluster, and an "R" or "G" (4 columns).
You could also write a low-level graphics function; it's sometimes easier to tune than removing more and more components from a complex plot:
library(grid)
library(gridExtra)
floorGrob <- function(S = c(TRUE, FALSE), J = c(TRUE, FALSE, TRUE, TRUE),
                      draw = TRUE, newpage = is.null(vp), vp = NULL){
  m <- rbind(c(1, 3, 4), # S1 J1 J2
             c(7, 7, 7), # hall
             c(2, 5, 6)) # S2 J3 J4
  fills <- c(c("#FBB4AE", "#CCEBC5")[c(S, J) + 1], "grey90")
  cellGrob <- function(f) rectGrob(gp = gpar(fill = f, col = "white", lwd = 2))
  grobs <- mapply(cellGrob, f = fills, SIMPLIFY = FALSE)
  g <- arrangeGrob(grobs = grobs, layout_matrix = m, vp = vp, as.table = FALSE,
                   heights = unit(c(4/14, 1/14, 4/14), "null"),
                   widths = unit(c(6/14, 4/14, 4/14), "null"), respect = TRUE)
  if(draw) {
    if(newpage) grid.newpage()
    grid.draw(g)
  }
  invisible(g)
}
floorGrob()
How about this?
df <- expand.grid(x = 0:5, y = 0:5)
df$color <- factor(sample(c("green", "red"), 36, replace = T))
head(df)
# x y color
# 1 0 0 green
# 2 1 0 green
# 3 2 0 green
# 4 3 0 red
# 5 4 0 green
# 6 5 0 red
library(ggplot2)
ggplot(df, aes(x, y, fill = color)) +
  geom_tile() +
  scale_fill_manual(name = "Is it open?",
                    values = c("lightgreen", "#FF3333"),
                    labels = c("open", "not open"))
