Why does this xlim error occur in circlize initialization?

I want to initialize a new chord diagram with circlize, but I'm getting an error that doesn't seem to make any sense given the data I'm feeding into it:
Error: Since `xlim` is a matrix, it should have same number of rows as the length of the level of `sectors` and number of columns of 2.
I understand the requirement, but when I try to produce different plots, it fails for some but not others. Here's the relevant code snippet, with some debugging output:
dev.new()
circos.clear()
circos.par(cell.padding=c(0,0,0,0), track.margin=c(0,0.01), gap.degree=1)
xlim = cbind(0, regionTotal)
print(class(region))
print(length(region))
print(class(xlim))
print(dim(xlim))
circos.initialize(factors=region, xlim=xlim)
The output for a plot that works fine:
[1] "character"
[1] 24
[1] "matrix" "array"
[1] 24 2
And for one that returns the error:
[1] "character"
[1] 50
[1] "matrix" "array"
[1] 50 2
Error: Since `xlim` is a matrix, it should have same number of rows as the length of the level of `sectors` and number of columns of 2.
I am aware of these questions:
this one led me to check the class
and this one led me to check my circlize version (0.4.11)
What am I missing? Thanks for any help you can provide.

After a lot of hair pulling, I figured out the problem: there was a repeated value in my region variable (the factors/sectors argument to circos.initialize), so the effective number of sector levels was lower than the number of rows in xlim. Hopefully nobody else makes this mistake, but just in case they do, here is one more thing to check if you come across this error.
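A minimal sketch of the mismatch, using made-up sector names (the region and regionTotal values below are hypothetical stand-ins for the asker's data):
# Hypothetical data: "chr2" appears twice, so there are 4 rows but only 3 levels
region      <- c("chr1", "chr2", "chr2", "chr3")
regionTotal <- c(100, 200, 150, 300)
xlim        <- cbind(0, regionTotal)
nrow(xlim)               # 4 rows in xlim ...
nlevels(factor(region))  # ... but only 3 sector levels -> triggers the error
# Quick check for the offending duplicates before initializing:
region[duplicated(region)]
# [1] "chr2"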

Related

Why does mutate() command create NAs?

I am currently working on an Amazon dataset with many rows, which makes it hard to spot issues in the data.
My goal is to look at the Amazon data and see whether certain products have a higher variance in star ratings than others. I have a variable indicating product ID (asin), a variable indicating the star rating (overall), and want to create a variance variable.
I have therefore used dplyr's group_by function in combination with mutate. Even though none of the input variables have NAs/missings, my output variable does. I have tried to find a solution, yet only found answers about what to do if the input has NAs.
See my code attached:
any(is.na(data$asin))
#[1] FALSE
any(is.na(data$overall))
# [1] FALSE
#create variable that represents variance of rating, grouped by product type
data <- data %>%
group_by(asin) %>%
mutate(ProductVariance = var(overall))
any(is.na(data$ProductVariance))
# [1] TRUE
sum(is.na(data$ProductVariance))
# [1] 289
I would much appreciate your help! Even though the number of NAs is small relative to the number of reviews, I would still like to get accurate means (the NAs hinder the use of tapply) and to be as precise as possible in follow-up analyses.
Thank you in advance!
var will return NA if the input has length one, so any ASIN that appears only once in your data will have NA variance. Depending on what you're doing with it, you may find it convenient to change those NAs to 0s:
var(1)
# [1] NA
...
mutate(ProductVariance = coalesce(var(overall), 0))
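A small reproducible sketch of the effect, with made-up data (the reviews tibble below is hypothetical, not the asker's dataset):
library(dplyr)
# Hypothetical data: product "B2" has only a single review
reviews <- tibble(
  asin    = c("A1", "A1", "B2"),
  overall = c(5, 3, 4)
)
reviews %>%
  group_by(asin) %>%
  mutate(ProductVariance = var(overall))
# "B2" gets ProductVariance = NA, because var() of a single value is NA
reviews %>%
  group_by(asin) %>%
  mutate(ProductVariance = coalesce(var(overall), 0))
# the single-review group now gets 0 instead of NA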
Is it possible that what you're seeing is that "empty" groups are not showing up? You can change the default with .drop.
When .drop = TRUE, empty groups are dropped.

plot more than 50 components in RSSA package in R

require(Rssa)
t=ssa(co2,200) #200 here should be the number of components
plot(t) # this way it plots only the first 50 not 200!
The above code produces a graph of the first 50 components only, but I need to plot more than 50 components.
I tried
plot(t$sigma[1:200],type='l',log='y')
but it didn't work!
Example: similar to this case: accessing eigenvalues in RSSA package in R
Looking at the help page for ?ssa, we see a parameter named neig, which is documented as:
integer, number of desired eigentriples. If 'NULL', then sane default value will be used, see 'Details'
Using that as a named parameter:
t=ssa(co2, neig=200)
plot(t)
And:
> t$sigma
[1] 78886.190749 329.031810 327.198387 184.659743 88.695271 88.191805 52.380502
[8] 40.527875 31.329930 29.409384 27.157698 22.334446 17.237926 14.175096
[15] 14.111402 12.976716 12.943775 12.216524 11.830642 11.614243 11.226010
[22] 10.457529 10.435998 ... (remaining values snipped)
(Apparently, the package authors do not consider 200 a "sane" number to use, although comparing the results from neig=50 and neig=200 I do not see a discernible cutpoint at the 50th eigenvalue. But the default must be set somewhere in the code, which I've shown you how to access.)
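With enough eigentriples computed, the manual plot the asker attempted should also work; a short sketch assuming the same fit as above:
require(Rssa)
t <- ssa(co2, neig = 200)
# Plot all 200 singular values on a log scale
plot(t$sigma[1:200], type = "l", log = "y",
     xlab = "Component", ylab = "Singular value")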

Loss of decimal places when calculating mean in R

I have a list entitled SET1Bearing1slope with nine numbers, and each number has at least 10 decimal places. When I use the mean() function on the list I get one arithmetic mean, yet if I list the numbers individually and then use the mean() function, I get a different output.
I know that this is caused by rounding and that the second mean is more accurate. Is there a way to avoid this issue? What method can I use to avoid rounding errors when calculating the mean?
In R, mean() expects a vector of values, not multiple values. It is also a generic function, so it is tolerant of additional parameters it doesn't understand (but doesn't warn you about them). See:
mean(c(1,5,6))
# [1] 4
mean(1, 5, 6) #only "1" is used here, 5 and 6 are ignored.
# [1] 1
So in your example there are no rounding errors, you are just calling the function incorrectly.
Look at the difference in the way you're calling the function:
mean(c(1,2,5))
[1] 2.666667
mean(1,2,5)
[1] 1
As pointed out by MrFlick, in the first case you're passing a vector of numbers (the correct way); in the second, you're passing a list of arguments, and only the first one is considered.
As for the number of digits, you can specify it using options():
options(digits = 10)
x <- runif(10)
x
[1] 0.49957540398 0.71266139182 0.07266473584 0.90541790240 0.41799820261
[6] 0.59809536533 0.88133668737 0.17078919476 0.92475634208 0.48827998806
mean(x)
[1] 0.5671575214
But remember that a greater number of digits is not necessarily better. There's a reason why R and other languages limit the number of digits. Check this topic: https://en.wikipedia.org/wiki/Significance_arithmetic
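If the extra digits are only needed for display, formatting the single value is an alternative to changing the global option (a small sketch, not from the original answers):
x <- c(1.2345678901, 2.3456789012, 3.4567890123)
m <- mean(x)
# Show this one value with more digits, leaving options(digits) untouched
format(m, digits = 12)
# sprintf("%.12f", m) prints 12 decimal places instead of 12 significant digits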

How to get an Element from a vector without using numbers or indices?

Let's say I have these two vectors, Atom.Type and Molar.Mass, in my R workspace with the following content:
> Atom.Type
[1] "Oxygen" "Lithium" "Nitrogen" "Hydrogen"
> Molar.Mass
[1] 16 6.9 14 1
I now want to assign the Molar.Mass belonging to "Lithium" (i.e. 6.9) to a new variable called mass.
The problem is: I have to do that without using any numbers or indices.
Does anyone have a suggestion for this problem?
This should work: mass <- Molar.Mass[Atom.Type == "Lithium"]. Clearly this assumes the two vectors are of the same length and sorted so that their elements correspond. See the additional comment from Roland below.
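A short sketch of that answer, plus a named-vector alternative that is not from the original answer:
Atom.Type  <- c("Oxygen", "Lithium", "Nitrogen", "Hydrogen")
Molar.Mass <- c(16, 6.9, 14, 1)
# Logical indexing, as in the answer above
mass <- Molar.Mass[Atom.Type == "Lithium"]
mass
# [1] 6.9
# Alternative: name the masses by atom type and index by name
names(Molar.Mass) <- Atom.Type
Molar.Mass["Lithium"]
# Lithium
#     6.9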

Define variables in an implicit way

I wanted to know how I can define variables in an implicit way in R.
For example, let's assume I have z <- 0.5 and x <- 2, and I want to define y such that the following holds: z = beta(x, y).
Obviously, if I enter z <- beta(x, y), I get the following error: Error in beta(x, y) : object 'y' not found.
I tried to find a solution on Google but strangely I didn't find anything.
Thank you in advance!
For your example you could use uniroot to find the value of y:
(y <- uniroot(function(y) beta(x,y)-z, interval=c(0,100)))
$root
[1] 1
$f.root
[1] -1.08689e-07
$iter
[1] 13
$estim.prec
[1] 6.103516e-05
beta(x,y$root)==z
[1] FALSE
all.equal(beta(x,y$root),z, tol=1e-5)
[1] TRUE
beta(x,1)==z
[1] TRUE
However, this relies on a number of assumptions, such as there being only one value of y that satisfies the equation and your being able to give uniroot a sensible search interval. In general the equation may not admit a solution, and the approach may be slow if you need to compute a large number of y values. You also need to consider that a numerical solution may not be exact, so comparisons need to be made with care (hence the all.equal() check above).
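A small sketch wrapping the same idea in a reusable helper (the function name and default interval are my own assumptions, not part of the original answer):
# Numerically solve beta(x, y) = z for y; the caller still has to supply a
# sensible interval, as discussed above
solve_beta_y <- function(x, z, interval = c(1e-8, 100)) {
  uniroot(function(y) beta(x, y) - z, interval = interval)$root
}
y <- solve_beta_y(x = 2, z = 0.5)
y
# approximately 1; compare with all.equal() rather than ==, since the root is numerical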
