Rearranging Numeric Axis - r

I'm trying to plot data and running into an issue with a numeric axis. It should be plotted in order:
1, 2, 3, 4, 5... 22, X, Y
Instead it's plotting like this:
1, 10, 11, 12... 2, 22, 3, 4..., X, Y
I've tried changing the column in question with as.character, as.factor, as.numeric. I've also checked out a few "rearrange" suggestions, but they all deal with the observations themselves, and not the axis.
What am I overlooking?
Here is a sample of the data:
Chr Chunk A B C
1 1 3 4 4
1 2 3 4 4
1 3 3 2 4
1 4 3 4 9
2 1 3 3 4
2 2 3 4 4
2 3 3 4 4
10 1 3 4 4
10 2 3 4 4
X 1 3 4 5
X 2 3 4 8
Y 1 3 4 5
I'm attempting to make a series of heat plots using ggplot:
heat <- ggplot(data, aes(Chr, Chunk, fill = A, label = sprintf("%s", A))) +
  geom_tile() +
  geom_text() +
  scale_fill_gradient2(high = "red")

Since you’re dealing with character data, ggplot simply sorts the axis values, and character strings sort lexicographically, so '10' comes before '2'. If you want to influence the order, convert the column to an ordered factor. Unfortunately this requires providing the order manually, but in your case that order isn’t too hard to write down:
data$Chr = factor(data$Chr, levels = c(1 : 22, 'X', 'Y'), ordered = TRUE)
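A quick way to see why the axis comes out in that order (a small illustration, separate from the question's data): character strings sort lexicographically, while the factor conversion pins the sort order, and hence the axis, to the levels you supply.

# lexicographic order: "10" sorts before "2"
sort(c("1", "2", "10", "22", "X", "Y"))
## [1] "1"  "10" "2"  "22" "X"  "Y"

# with explicit levels, the same values sort in chromosome order
chr <- factor(c("1", "2", "10", "22", "X", "Y"),
              levels = c(1:22, "X", "Y"), ordered = TRUE)
as.character(sort(chr))
## [1] "1"  "2"  "10" "22" "X"  "Y"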

Related

rollmean fill NAs with original value

I followed this example to compute a rolling mean/rolling minimum (rollmin) in R, similar to the zoo package's rollmax.
But the first few values are filled with NAs. How can I fill the NAs with the original values so that I don't lose data points?
We may use coalesce with the original vector to replace each NA with the corresponding element of the original vector:
library(dplyr)
library(zoo)
coalesce(rollmeanr(x, 3, fill = NA), x)
If it is a data.frame
ctd %>%
  group_by(station) %>%
  mutate(roll_mean_beam = coalesce(rollmeanr(beam_coef, k = 5, fill = NA),
                                   beam_coef))
data
x <- 1:10
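For the vector above, coalesce() fills the two leading NAs produced by the right-aligned rolling mean with the original values; a quick check:

library(dplyr)
library(zoo)

x <- 1:10
rollmeanr(x, 3, fill = NA)              # first two entries are NA
##  [1] NA NA  2  3  4  5  6  7  8  9
coalesce(rollmeanr(x, 3, fill = NA), x)
##  [1] 1 2 2 3 4 5 6 7 8 9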
1) Using the original values seems a bit bizarre. Filling the first two points with the original values and taking the rolling minimum of 1:10 with a width of 3 would give
1 2 1 2 3 4 5 6 7 8
I think what you really want is to apply min to however many points are available so that in this example we get
1 1 1 2 3 4 5 6 7 8
Now rollapplyr with partial=TRUE will use whatever number of points are available if fewer than width=3 exist at that point. At the first point only one point is available so it returns min(x[1]). At the second only two points are available so it returns min(x[1:2]). For all the rest it can use three points. Only zoo is used.
library(zoo)
x <- 1:10
rollapplyr(x, 3, min, partial = TRUE)
## [1] 1 1 1 2 3 4 5 6 7 8
2) The above seems more logical than filling the first two points with the first two input values, but if you really want to do that anyway, simply prefix the series with the original values using c, or use one of the other alternatives shown below. Only zoo is used.
c(x[1:2], rollapplyr(x, 3, min))
## [1] 1 2 1 2 3 4 5 6 7 8
pmin(rollapplyr(x, 3, min, fill = max(x)), x)
## [1] 1 2 1 2 3 4 5 6 7 8
replace(rollapplyr(x, 3, min, fill = NA), 1:2, x[1:2])
## [1] 1 2 1 2 3 4 5 6 7 8
Min <- function(x) if (length(x) < 3) tail(x, 1) else min(x)
rollapplyr(x, 3, Min, partial = TRUE)
## [1] 1 2 1 2 3 4 5 6 7 8
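The same partial = TRUE idea carries over to a rolling mean, in case that is what the question title was really after; a small sketch:

library(zoo)
x <- 1:10
# average however many points are available instead of padding with NA
rollapplyr(x, 3, mean, partial = TRUE)
##  [1] 1.0 1.5 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0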

R - Shift specified columns using minimum value into positive values

I'm looking for an easy way to add the absolute value of each column's minimum back to that column in my data frame, so that the columns are shifted into positive values.
This feels like a common thing, but I haven't been able to find any good answers yet...maybe I'm missing something obvious.
Let's say I've got two columns (in reality I have close to 100) with positive and negative numbers.
w <- c(9, 9, 9, 9)
x <- c(-2, 0, 1, 3)
y <- c(-1, 1, 3, 4)
z <- as.data.frame(cbind(w, x, y))
w x y
1 9 -2 -1
2 9 0 1
3 9 1 3
4 9 3 4
I want z to look like this after transforming only the x and y columns ([, 2:3]):
w x y
1 9 0 0
2 9 2 2
3 9 3 4
4 9 5 5
Does that make sense?
library(dplyr)
dplyr::mutate(z, across(c(x, y), ~ . + abs(min(.))))
w x y
1 9 0 0
2 9 2 2
3 9 3 4
4 9 5 5
You can also do by column position rather than column name by changing c(x,y) to 2:3 or c(2:3, 5) for non-sequential column positions.
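For instance, the position-based version of the same call might look like this (2:3 picks out the x and y columns of the example data):

library(dplyr)

# shift columns 2 and 3 up by the absolute value of their own minimum
mutate(z, across(2:3, ~ . + abs(min(.))))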
It depends exactly what you mean and what you want to happen if there aren't negative values. No matter the values, this anchors each column's minimum at 0, but you should be able to adapt it if you want something slightly different. Note that applied to the whole data frame it also shifts w down to 0; use z[2:3] on both sides if you only want to touch those two columns.
z[] <- lapply(z, function(col) col - min(col))
z
#   w x y
# 1 0 0 0
# 2 0 2 2
# 3 0 3 4
# 4 0 5 5
As a side note, as.data.frame(cbind(x, y)) is bad - if you have a mix of numeric and character values, cbind() will convert everything to character. It's shorter and better to simplify to data.frame(x, y).
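A small demonstration of that coercion (toy vectors, not the question's data):

# cbind() builds a matrix, and a matrix holds a single type,
# so the numbers are silently converted to character strings
cbind(1:3, c("a", "b", "c"))
##      [,1] [,2]
## [1,] "1"  "a"
## [2,] "2"  "b"
## [3,] "3"  "c"

# data.frame() keeps each column's own type
str(data.frame(n = 1:3, s = c("a", "b", "c")))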
Do you want
z[2:3] <- lapply(z[2:3], function(columnValues) columnValues + abs(min(columnValues)))

R: Creating bins by a factor when number of observations not divisible by number of bins?

I have a data set in which I have a number of DVs for each level of a factor. The number of DVs per factor is not consistent. I would like to create quantile bins, such that for each level of the factor the smallest values are assigned to bin 1, the next smallest to bin 2, and so on.
I have found a package with a NEAR perfect solution: schoRsch, in which the function ntiles creates bins based on levels of the factor, like so:
library(schoRsch)

dv <- c(5, 2, 10, 15, 3, 7, 20, 44, 18)
factor <- c(1, 1, 2, 2, 2, 2, 3, 3, 3)
tmpdata <- data.frame(cbind(dv, factor))
tmpdata$factor <- as.factor(tmpdata$factor)
head(tmpdata)
tmpdata$bins <- ntiles(tmpdata, dv = "dv", bins = 2, factors = "factor")
tmpdata
The output looks like:
dv factor bins
1 5 1 2
2 2 1 1
3 10 2 2
4 15 2 2
5 3 2 1
6 7 2 1
7 20 3 2
8 44 3 2
9 18 3 1
My problem occurs when the number of DVs for a particular factor level is not divisible by the number of bins. In the example above, factor 3 has 3 observations, and when sorting into two bins the first bin gets one observation and the second gets two. However, I would like the first bin to get priority when assigning a DV, then the second, and so on. In my actual data set, for instance, I have a factor with 79 associated DVs and 5 bins, so I would want 16 observations in each of bins 1-4 and then 15 in bin 5. However, this method gives me 16 observations in bins 1 and 3-5, and 15 in bin 2.
Is there any way to specify my desired order of binning here? Or is there an alternative way to solve this problem with another method that lets me bin on the basis of a factor or, more helpfully, multiple factors?
Thank you!
Something like this?
# split a vector of length(x) into `bins` labels: each of the first bins - 1
# bins gets ceiling(length(x)/bins) elements, and the last bin gets the rest
foo = function(x, bins) {
  len = length(x)
  n1 = ceiling(len / bins)        # size of each of the first bins - 1 bins
  n2 = len - n1 * (bins - 1)      # remainder goes into the last bin
  c(rep(1:(bins - 1), each = n1), rep(bins, n2))
}
table(foo(1:79, 5))
# 1 2 3 4 5
#16 16 16 16 15
library(dplyr)
tmpdata %>% group_by(factor) %>% mutate(bin = foo(dv, 2))
## A tibble: 9 x 3
## Groups: factor [3]
# dv factor bin
# <dbl> <fct> <dbl>
#1 5 1 1
#2 2 1 2
#3 10 2 1
#4 15 2 1
#5 3 2 2
#6 7 2 2
#7 20 3 1
#8 44 3 1
#9 18 3 2
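Note that foo() as written assigns bins by row order within each group, not by the size of dv. If you want the smallest values in bin 1 (as ntiles does) while keeping the earlier bins larger, one possible tweak, not part of the answer above, is to index foo()'s output by the rank of dv:

library(dplyr)

tmpdata %>%
  group_by(factor) %>%
  # rank() maps each dv to its position in sorted order, so the smallest
  # values pick up the bin labels at the front of foo()'s output
  mutate(bin = foo(dv, 2)[rank(dv, ties.method = "first")]) %>%
  ungroup()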

How to count entries with specific values in R [duplicate]

This question already has an answer here:
Include levels of zero count in result of table()
Let's say I have two vectors
x <- c(1, 2, 2, 3, 4, 4, 5, 5, 5)
y <- c(3, 3, 3, 4, 5, 6, 6, 7, 7)
The unique numbers across both vectors are 1:7. I know that if I use the table function in R, I can count the occurrences of each unique value in a vector. For example, if I apply the table function to the first vector, I will get
table(x)
#x
# 1 2 3 4 5
# 1 2 1 2 3
Applying it to the second vector, I will get
table(y)
# y
# 3 4 5 6 7
# 3 1 1 2 2
How can I get it to count the number of occurrences of all unique entries in both vectors? For example, I'd like to produce the following results:
1 2 3 4 5 6 7
1 2 1 2 3 0 0
for the first vector and
1 2 3 4 5 6 7
0 0 3 1 1 2 2
First, gather the values you want counts for from both vectors:
lvl <- unique(c(x, y))
Then list those values explicitly as the levels of a factor before calling table:
table(factor(x, lvl))
table(factor(y, lvl))
table(factor(x, unique(union(x,y))))
table(factor(y, unique(union(x,y))))
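Either way, with these x and y the zero counts show up as requested (adding sort() just so the levels print in ascending order):

lvl <- sort(unique(c(x, y)))   # 1 2 3 4 5 6 7

table(factor(x, lvl))
## factor(x, lvl)
## 1 2 3 4 5 6 7
## 1 2 1 2 3 0 0

table(factor(y, lvl))
## factor(y, lvl)
## 1 2 3 4 5 6 7
## 0 0 3 1 1 2 2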

R ggplot2 number of rows of the same values in a column

I'm new to R and plotting in R. This might be a very simple question, but here it is.
Suppose I have a data frame like this:
a b c d
1 5 6 7
2 3 5 7
1 4 6 2
2 3 5 NA
1 4 4 2
2 2 4 2
1 2 5 1
2 3 4 NA
Here a, b, c, d are column names. I want to plot a bar chart that has the values in column d on the x axis and the number of rows with that value on the y axis. So 7 has 2 rows, 1 has 1, and 2 has 3. It's not important to include the missing values in between (3, 4, 5, 6).
So the result would be something like a histogram. I know I can do counting on column d and then do the plotting but I feel there must be a better way to do this.
Here's an approach. If I understand your question, columns a, b, and c are immaterial to what you are doing, which is plotting frequencies of column d.
library(ggplot2)
library(reshape)
## get frequencies of col d
test.summary <- table(test$d)

## re-shape the data
test.summary.m <- melt(test.summary)

ggplot(test.summary.m, aes(x = as.factor(Var.1), y = value)) +
  geom_bar(stat = 'identity')
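A possibly simpler route, assuming the data frame is called test as above, is to let geom_bar() do the counting itself, since its default stat is "count":

library(ggplot2)

# one bar per distinct value of d, bar height = number of rows with that value;
# the NA rows are dropped first so they don't get their own bar
ggplot(subset(test, !is.na(d)), aes(x = factor(d))) +
  geom_bar()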
