Plot going off graph in gvisMotionChart - r

I have created a plot in R using googleVis, specifically gvisMotionChart, plotting a number of variables.
I am primarily using the line graph, and it looks fine when I view the graph with all variables. However, when I select some of the individual variables, it zooms in such that part of the plot for that variable is no longer on the graph. I know it should zoom in to show just this variable and exclude the others (which is a good feature), but it zooms in so far that the variable I am after is not entirely on the graph.
This doesn't happen with all variables, and I can get around it by also selecting other variables either side of the one which I want to view, but it would be good if I could fix this. Has anyone come across a similar problem before and know a way around it?
Thanks in advance
EDIT: I have an example of this using the Batting data from the Lahman package. (I know nothing about baseball, so the analysis probably doesn't make sense; in fact, looking at the results, it almost certainly doesn't, but it illustrates my point.) If you run the following code:
library(Lahman)
library(googleVis)
recent <- subset(Batting, yearID > 2000)
homeruns <- aggregate(HR ~ stint + yearID, data = recent, FUN = sum)
avgHR <- mean(homeruns$HR)
homeruns$HR <- homeruns$HR - avgHR
m <- gvisMotionChart(data = homeruns, idvar = "stint", timevar = "yearID")
plot(m)
Then select the line graph and subset on number 2; the top part of the graph is cut off.

It seems to be Google's bug. I could even reproduce this same error in their "Visualization Playground" (https://code.google.com/apis/ajax/playground/?type=visualization#motion_chart) making part of the data negative.
I've already reported the issue as a bug: https://code.google.com/p/google-visualization-api-issues/issues/detail?id=1479
May the force be with them!

I just had the same problem w/ a Sankey plot. I resolved it by deleting entries with value==0. However, I just tried to reproduce your example and could not reproduce your bug, so perhaps this has already been solved?

Related

RNAseq - Plotting log2foldchange-basemean but has weird data points

I am new to processing RNA seq data and am now practicing by reproducing a published figure related to RNA seq. This is the paper, and Fig2A is what I'm trying to achieve.
In brief, I downloaded the code with recount3 and subset the sample for groups that I want (control vs condition 1, control vs condition 2, etc). Then I performed the following code:
library(DESeq2)
dds_4uM_30min <- DESeqDataSetFromMatrix(countData = ha_4uM_30min_data,
                                        colData = ha_4uM_30min_meta,
                                        design = ~ type)
dds2_4uM_30min <- DESeq(dds_4uM_30min)
res_4uM_30min <- results(dds2_4uM_30min, tidy = FALSE)
(type is the column that I made to contain the information of whether it's control or condition 1)
This is the figure I get, which confuses me since it is nowhere near the original figure.
I thought that they might have done additional processing of the data, but I have no idea what the common or reasonable approaches are.
Furthermore, there seem to be data points that form lines (as can be seen in the above figure), which are not seen in the original figure. I am wondering what causes this kind of distribution and how to get rid of it.
Thanks in advance for any opinion or suggestion.
I have been trying to use the function lfcShrink but the figure still has this weird line.
Any suggestions on how to further process RNA seq data?
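For what it's worth, arcs or lines of points at the low-baseMean end of this kind of plot typically come from genes with very few reads, where only a handful of discrete count combinations are possible. A minimal sketch of prefiltering low-count genes before running DESeq is below. It uses simulated data from makeExampleDESeqDataSet (so the design column is condition rather than type), and the threshold of 10 is an arbitrary assumption, not something taken from the paper:

```r
library(DESeq2)

# Simulated stand-in data (NOT the recount3 data from the question)
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# Drop genes with very few reads in total; the cutoff of 10 is an assumption
keep <- rowSums(counts(dds)) >= 10
dds <- dds[keep, ]

dds <- DESeq(dds)
res <- results(dds)

# log2 fold change against mean of normalized counts
plotMA(res)
```

Whether this gets closer to the published figure depends on what filtering and shrinkage the authors applied, which the code above does not try to guess.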

How to plot a histogram of a specific data frame column in R

I am super new to coding with R; I'm taking it as part of a bachelor's degree program. I am super stuck on something I feel should be basic, but I cannot get my code to work and I am not sure why. The prompt is:
"In this problem we will be using the mpg data set, to get access to the data set you need to load the tidyverse library.
Complete the following steps:
Create a histogram for the cty column with 10 bins"
and for my code I have:
library(tidyverse)
print(mpg)
df <- mpg[ , c("city")]
histo <- ggplot(data = df, aes(x=median)) + geom_histogram(bins=10)
print(histo)
The first print was just to make sure the data loaded correctly, which it did. I am not sure about the second print call, the histo one. I've gotten various error messages or bugs, so I've just been moving stuff around and trying different commands to get it to work. I'm following the steps previously outlined in our reading, but I cannot seem to get this to work. Any help would be appreciated.
I have tried removing the print(histo) call and just leaving the ggplot, but that gives me a blank white box instead of a plot, or no plot is printed.
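Two things stand out in the code above: the column in mpg is named cty (there is no city column), and aes(x = median) points at the function median rather than a column of the data. A minimal sketch of a version that should work, assuming the goal is just the 10-bin histogram from the prompt (library(ggplot2) is used here because mpg ships with it; library(tidyverse) attaches it as well):

```r
library(ggplot2)  # mpg is part of ggplot2

# Map the cty column directly; no need to subset the data frame first
histo <- ggplot(data = mpg, aes(x = cty)) +
  geom_histogram(bins = 10)

print(histo)
```

Passing the whole mpg data frame and naming the column inside aes() avoids the intermediate df step entirely.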

How do I use prodlim function with a non-binary variable in formula?

I am trying to (eventually) plot data by groups, using the prodlim function.
I'm adjusting and adapting code that someone else (not available for questions) has written, and I'm not very familiar with the prodlim library/function. There are definitely other ways to do what I'd like to, but I'm trying to keep it consistent with what the previous person did.
I have code that works, when dividing the data into 2 groups, but when I try to adjust for a 4 group situation, I get an error.
Of note, the data is coming over from SAS using StatTransfer, which has been working fine.
I am new to coding, but I have compared the dataframes I'm trying to work with. The second is just a subset of the first (where the code does work), with all the same variables, and both of the variables I'm trying to group by are integer values.
Hist(medpop$dz_time, medpop$dz_status) works just fine, so the problem must be with the prodlim function, and I haven't understood much of what I've looked up about it, sadly :/ But the documentation seems to indicate it supports continuous or categorical variables, and doesn't seem limited to binary either. None of the options seem applicable as I understand them.
this works:
M <- prodlim(Hist(dz_time, dz_status)~med, data=pop)
where med is a binary value =1 when a member of this population is taking it, and dz is a disease that some portion develop.
this does not:
(either of these get the error as below)
N <- prodlim(Hist(dz_time, dz_status)~strength, data=medpop)
N <- prodlim(Hist(dz_time, dz_status)~strength, data=pop, subset=pop$med==1)
medpop = the subset of the original population taking the med,
strength = categorical variable ("1","2","3","4")
For the line that does work, the next step is just plot(M), giving a plot with two lines, med==0 and med==1 (showing cumulative incidence of dz_status by dz_time).
For the other line, I get an error saying
Error in KernSmooth::dpik(cumtabx/N, kernel = "box") :
scale estimate is zero for input data
I don't know what that means or how to fix it.. :/
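One possible explanation, assuming strength is stored as an integer: as I read the prodlim documentation, numeric covariates with more unique values than discrete.level (default 3) are treated as continuous and smoothed over neighborhoods, which is where KernSmooth::dpik gets called. A binary med stays under that limit, while a four-level strength does not. A minimal sketch of converting the variable to a factor first, using simulated stand-in data (the values below are made up; only the column names come from the question):

```r
library(prodlim)

# Simulated stand-in for medpop (hypothetical values)
set.seed(1)
n <- 200
medpop <- data.frame(
  dz_time   = rexp(n, rate = 0.1),
  dz_status = rbinom(n, 1, 0.6),
  strength  = sample(1:4, n, replace = TRUE)
)

# Make the grouping variable explicitly categorical
medpop$strength <- factor(medpop$strength)

N <- prodlim(Hist(dz_time, dz_status) ~ strength, data = medpop)
plot(N)  # one curve per strength level
```

Raising discrete.level in the prodlim call may be an alternative, but converting to a factor keeps the intent visible in the data itself.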

Rstudio - how to write smaller code

I'm brand new to programming and am picking up Rstudio as a stats tool.
I have a dataset which includes multiple questionnaires divided by weeks, and I'm trying to organize the data into meaningful chunks.
Right now this is what my code looks like:
w1a=table(qwest1,talm1)
w2a=table(qwest2,talm2)
w3a=table(quest3,talm3)
Where quest and talm are the names of the variable and the number denotes the week.
Is there a way to compress all those lines into one line of code so that I could make w1a,w2a,w3a... each their own object with the corresponding questionnaire added in?
Thank you for your help, I'm very new to coding and I don't know the etiquette or all the vocabulary.
This might do what you wanted (but not what you asked for):
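# SIMPLIFY = FALSE keeps the result a list even when all tables have the same shape
tbl_list <- mapply(table, list(qwest1, qwest2, quest3),
                   list(talm1, talm2, talm3), SIMPLIFY = FALSE)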
names(tbl_list) <- c('w1a', 'w2a','w3a')
You are committing a fairly typical new-R-user error in creating multiple similarly named and structured objects but not putting them in a list. This is my effort at pushing you in that direction. Could also have been done via:
qwest_lst <- list(qwest1, qwest2, quest3)
talm_lst <- list(talm1, talm2, talm3)
tbl_lst <- mapply(table, qwest_lst, talm_lst, SIMPLIFY = FALSE)
names(tbl_lst) <- paste0('w', 1:3, 'a')
There are other ways to programmatically access objects with character vectors using get or mget.
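A minimal sketch of the mget route, assuming the weekly variables follow a consistent naming pattern (quest1..quest3 and talm1..talm3 here are made-up stand-ins, not the asker's data):

```r
# Hypothetical stand-ins for the weekly questionnaire variables
set.seed(1)
quest1 <- sample(1:3, 20, replace = TRUE); talm1 <- sample(c("a", "b"), 20, replace = TRUE)
quest2 <- sample(1:3, 20, replace = TRUE); talm2 <- sample(c("a", "b"), 20, replace = TRUE)
quest3 <- sample(1:3, 20, replace = TRUE); talm3 <- sample(c("a", "b"), 20, replace = TRUE)

# Build the object names as strings, then fetch them all at once
quest_lst <- mget(paste0("quest", 1:3))
talm_lst  <- mget(paste0("talm", 1:3))

# Map always returns a list, one table per week
tbl_lst <- Map(table, quest_lst, talm_lst)
names(tbl_lst) <- paste0("w", 1:3, "a")

tbl_lst$w2a  # the week-2 table
```

This scales to any number of weeks by changing 1:3, which is the main payoff of keeping the tables in a list.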

Discretization of data in R -crazy values

Hello again stackoverflow-ers ! hope you are well
I am working on a project and am essentially trying to create a decision tree. The data is for a bank's campaign and concerns how well the campaign incentivized customers to open a term deposit.
Anyhow, I've worked through the coding etc. with some assistance from online resources and hit the wall on one part.
One of the columns is the term deposit amount for all customers, and I plotted the data to visualize it (please see the attached plot).
Since the data is so dispersed i wanted to discretize it. I used the following code:
BankTraining$TDepositAMTD <- cut(BankTraining$TermDepositAMT, right = FALSE,
                                 breaks = c(0, 5000, 10000, 15000, 20000, max(BankTraining$TermDepositAMT)))
The Y axis is the number of observations and X axis is the dollar amount of term deposits.
However, viewing the column after this step i see :
table(BankTraining$TDepositAMTD)
      [0,5e+03)   [5e+03,1e+04) [1e+04,1.5e+04) [1.5e+04,2e+04)   [2e+04,3e+04)
           5213            8631            8367            1698            3121
Now, clearly this is no good. Once the decision tree is created it shows these weird categories which I cannot interpret.
Could someone shed light on this issue please? Much gratitude for your help.
Since it seems you are not happy with the cuts you are producing, have a go at it with:
library(Hmisc)
Groups <- cut2(data, g = 5) # g is the number of groups or levels I want
The package Hmisc can be found here.
As for your weird categories, we would need to see what packages/ algorithms along with how you call it as these categories may be a product of your binning and some consequence of default behavior. Happy to edit when more information is available.
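If the concern is only that the labels are printed in scientific notation (5e+03 instead of 5000), base cut can handle that: its dig.lab argument controls how many digits are used when formatting the break labels, and the default of 3 is what produces the e+03 style. A minimal sketch on made-up amounts (amt below is simulated, not the BankTraining data):

```r
# Simulated deposit amounts (hypothetical stand-in for TermDepositAMT)
set.seed(42)
amt <- round(runif(1000, min = 0, max = 30000))

breaks <- c(0, 5000, 10000, 15000, 20000, max(amt))

# dig.lab = 5 formats the breaks as plain numbers;
# include.lowest = TRUE keeps the maximum value in the last bin
# (with right = FALSE alone it would fall outside and become NA)
binned <- cut(amt, breaks = breaks, right = FALSE,
              include.lowest = TRUE, dig.lab = 5)

table(binned)
```

Explicit names can also be supplied through the labels argument of cut if something more readable than interval notation is wanted in the tree.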

Resources