prevent [R] data frames with long lines from being truncated?

prevent [R] data frames with long lines from being truncated? - r

I have data frames that have wide rows, either because they have a lot of columns of numerical values or because they have long text columns
I typically use RStudio and Rmd Notebooks. I can not figure out how to get the entire line printed without a truncation. I get the same behavior when I run the code in the RStudio console so I do not think it is an R markdown issue.
I also want to use capture.out() to save the formatted output from functions like anova. The capture.out() file is also truncated
Here is an example from the RStudio console
> fdf <- mcols(res.lfcShrink)
> fdf
DataFrame with 4 rows and 2 columns
type description
<character> <character>
baseMean intermediate mean of normalized c..
log2FoldChange results log2 fold change (MA..
lfcSE results posterior SD: diseas..
svalue results FSOS s-value (T=1): ..
> mcols(res.lfcShrink)$description
[1] "mean of normalized counts for all samples"
[2] "log2 fold change (MAP): diseaseState PDAC vs healthy"
[3] "posterior SD: diseaseState PDAC vs healthy"
[4] "FSOS s-value (T=1): diseaseState PDAC vs healthy"
I tried setting "limit length of lines displayed in console to" 500. Did not help
thanks in advance
Andy

Related

Error during harmonisation in TwoSampleMR (R-package)

I am trying to perform Mandalian Randomisation using the R package “TwoSampleMR”.
As exposure data, I use instruments from the GWAS catalog. (Phenotype - Sphingolipid levels).
As a outcome data, I use GISCOME ischemic stroke outcome GWAS (http://www.kp4cd.org/index.php/node/391)
I have an error when I do harmonization by the command harmonise_data().
The text of the error is:
**Error in data.frame(…, check.names = FALSE) : arguments imply differing number of rows: 1, 0**.
I have noticed that the error is caused by some exact lines in the file with outcomes. When I make a text file that contains only one line from the original file and use it as outcome data, some lines cause an error, and someones don’t.
As an example this one causes an error:
MarkerName CHR POS Allele1 Allele2 Freq1 Effect StdErr P-value
rs10938494 4 47563448 a g 0.2139 0.0294 0.0519 0.5706
This one doesn’t:
rs1000778 11 61655305 a g 0.2559 0.0939 0.0493 0.05705
Here is all commands that I use.
library(TwoSampleMR)
library(MRInstruments)
data(gwas_catalog)
exp <- subset(gwas_catalog, grepl("Sphingolipid levels", Phenotype))
exp_dat<-format_data(exp)
exp_dat<-clump_data(exp_dat)
exp_dat
out_dat<-read_outcome_data(
snps=exp_dat$SNP,
filename='giscome.012vs3456.age-gender-5PC.meta1.txt'
sep='\t', snp_col='MarkerName',
beta_col='Effect',
se_col='StdErr',
effect_allele_col='Allele1',
other_allele_col='Allele2',
eaf_col='Freq1',
pval_col='Р-value'
)
dat<-harmonise_data(exporsure_dat=exp_dat, outcome_dat=out_dat)
What would be the reason for this problem?
Thank you.

It is difficult to comment without looking at your sample input file but you might encounter this sort of error when there are inconsistencies with naming the exposure columns in your data frame.
Please see this thread on.
https://github.com/MRCIEU/TwoSampleMR/issues/226

What is the best way to manage/store result from either posthoc.krukal.dunn.test() or dunn.test() - where my input data is in dataframe format?

I am a newbie in R programming and seek help in analyzing the Metabolomics data - 118 metabolites with 4 conditions (3 replicates per condition). I would like to know, for each metabolite, which condition(s) is significantly different from which. Here is part of my data
> head(mydata)
Conditions HMDB03331 HMDB00699 HMDB00606 HMDB00707 HMDB00725 HMDB00017 HMDB01173
1 DMSO_BASAL 0.001289121 0.001578235 0.001612297 0.0007772231 3.475837e-06 0.0001221674 0.02691318
2 DMSO_BASAL 0.001158363 0.001413287 0.001541713 0.0007278363 3.345166e-04 0.0001037669 0.03471329
3 DMSO_BASAL 0.001043537 0.002380287 0.001240891 0.0008595932 4.007387e-04 0.0002033625 0.07426482
4 DMSO_G30 0.001195253 0.002338346 0.002133992 0.0007924157 4.189224e-06 0.0002131131 0.05000778
5 DMSO_G30 0.001511538 0.002264779 0.002535853 0.0011580857 3.639661e-06 0.0001700157 0.02657079
6 DMSO_G30 0.001554804 0.001262859 0.002047611 0.0008419137 6.350990e-04 0.0000851638 0.04752020
This is what I have so far.
I learned the first line from this post
kwtest_pvl = apply(mydata[,-1], 2, function(x) kruskal.test(x,as.factor(mydata$Conditions))$p.value)
and this is where I loop through the metabolite that past KW test
tCol = colnames(mydata[,-1])[kwtest_pvl <= 0.05]
for (k in tCol){
output = posthoc.kruskal.dunn.test(mydata[,k],as.factor(mydata$Conditions),p.adjust.method = "BH")
}
I am not sure how to manage my output such that it is easier to manage for all the metabolites that passed KW test. Perhaps saving the output from each iteration appending to excel? I also tried dunn.test package since it has an option of table or list output. However, it still leaves me at the same point. Kinda stuck here.
Moreover, should I also perform some kind of adjusted p-value, i.e FWER, FDR, BH right after KW test - before performing the posthoc test?
Any suggestion(s) would be greatly appreciated.

How to plot data from Excel using the R corrplot function?

I am trying to learn R, and use the corrplot library to draw Y:City and X: Population graph. I wrote the below code:
When you look at the picture above, there are 2 columns City and population. When I run the code I get this error message:
Error in cor(Illere_Gore_Nufus) : 'x' must be numeric.
My excel data:

In general, correlation plot (Scattered plot) can be plotted only when you have two continuous variable. Correlation is a value that tells you how two continuous variables are linearly related. The Correlation value will always fall between -1 and 1, where correlation value of -1 depicts weak linear relationship and correlation value of 1 depicts strong linear relationship between the two variables. Correlation value of 0 says that there is no linear relationship between the two variables, however, there could be curvi-linear relationship between the two variables
For example
Area of the land Vs Price of the land
Here is the Data
The correlation value for this data is 0.896, which means that there is a strong linear correlation between Area of the land and Price of the land (Obviously!).
Scatter plot in R would look like this
Scatter plot
The R code would be
area<-c(650,785,880,990,1100,1250,1350,1800,2200,2800)
price<-c(250,275,280,290,350,340,400,335,420,460)
cor(area,price)
plot(area,price)
In Excel, for the same example, you can select the two columns, go to Insert > Scatter plot (under charts section)
Scatter plot
In your case, the information can be plotted in bar graph with city in y axis and population in x axis or vice versa!
Hope I have answered you query!

Some assumptions
You are asking how to do this in Excel, but your question is tagged R and Power BI (also RStudio, but that has been edited away), so I'm going to show you how to do this with R and Power BI. I'm also going to show you why you got that error message, and also why you would get an error message either way because your dataset is just not sufficient to make a correlation plot.
My answer
I'm assuming you would like to make a correlation plot of the population between the cities in your table. In that table you'd need more information than only one year for each city. I would check your data sources and see if you could come up with population numbers for, let's say, the last 10 years. In lack of the exact numbers for the cities in your table, I'm going to use some semi-made up numbers for the population in the 10 most populous countries (following your datastrutcture):
Country 2017 2016 2015 2014 2013
China 1415045928 1412626453 1414944844 1411445597 1409517397
India 1354051854 1340371473 1339431384 1343418009 1339180127
United States 326766748 324472802 325279622 324521777 324459463
Indonesia 266794980 266244787 266591965 265394107 263991379
Brazil 210867954 210335253 209297939 209860881 209288278
Pakistan 200813818 199761249 200253292 197655630 197015955
Nigeria 195875237 192568158 195757661 191728478 190886311
Bangladesh 166368149 165630262 165936711 166124290 164669751
Russia 143964709 143658415 143146914 143341653 142989754
Mexcio 137590740 137486490 136768870 137177870 136590740
Writing and debugging R code in Power BI is a real pain, so I would recommend installing R studio, write your little R snippets there, and then paste it into Power B.
The reason for your error message is that the function cor() onlyt takes numerical data as arguments. In your code sample the city names are given as arguments. And there are more potential traps in your code sample. You have to make sure that your dataset is numeric. And you have to make sure that your dataset has a shape that the cor() will accept.
Below is an R script that will do just that. Copy the data above, and store it in a file called data.xlsx on your C drive.
The Code
library(corrplot)
library(readxl)
# Read data
setwd("C:/")
data <- read_excel("data.xlsx")
# Set Country names as row index
rownames(data) <- data$Country
# Remove Country from dataframe
data$Country <- NULL
# Transpose data into a readable format for cor()
data <- data.frame(t(data))
# Plot data
corrplot(cor(data))
The plot
Power BI
In Power BI, you need to import the data before you use it in an R visual:
Copy this:
Country,2017,2016,2015,2014,2013
China,1415045928,1412626453,1414944844,1411445597,1409517397
India,1354051854,1340371473,1339431384,1343418009,1339180127
United States,326766748,324472802,325279622,324521777,324459463
Indonesia,266794980,266244787,266591965,265394107,263991379
Brazil,210867954,210335253,209297939,209860881,209288278
Pakistan,200813818,199761249,200253292,197655630,197015955
Nigeria,195875237,192568158,195757661,191728478,190886311
Bangladesh,166368149,165630262,165936711,166124290,164669751
Russia,143964709,143658415,143146914,143341653,142989754
Mexcio,137590740,137486490,136768870,137177870,136590740
Save it as countries.csv in a folder of your choosing, and pick it up in Power BI using
Get Data | Text/CSV, click Edit in the dialog box, and in the Power Query Editor, click Use First Row as headers so that you have this table in your Power Query Editor:
Click Close & Apply and make sure that you've got the data available under VISUALIZATIONS | FIELDS:
Click R under VISUALIZATIONS:
Select all columns under FIELDS | countries so that you get this setup:
Take parts of your R snippet that we prepared above
library(corrplot)
# Set Country names as row index
data <- dataset
rownames(data) <- data$Country
# Remove Country from dataframe
data$Country <- NULL
# Transpose data into a readable format for cor()
data <- data.frame(t(data))
# Plot data
corrplot(cor(data))
And paste it into the Power BI R script Editor:
Click Run R Script:
And you're gonna get this:
That's it!
If you change the procedure to importing data from an Excel file instead of a textfile (using Get Data | Excel , you've successfully combined the powers of Excel, Power BI and R to produce a scatterplot!
I hope this is what you were looking for!

Issue with Boxplot formula or variable definition

I have a csv file having 4 columns labeled AGE, DIASTOLIC, BMI and EVER.PREGNANT and 700 rows. The last column consists of only yes or no. I wish to plot the data BMI vs EVER.PREGNANT with an intent to comparing BMI of those with yes in the fourth column and no in the same column. What code should I write to get the required boxplot?
I have tried the following code:
Sheet=read.csv(/Downloads/1739230_1284354330_PIMA.csv - 1739230_1284354330_PIMA.csv.csv, sep=",")
boxplot(BMI~EVER.PREGNANT,data=sheet, main="BMI vs PREG",xlab="BMI",ylab="PREGNANT")
The error that I get is
Error in eval(expr,envr,enclos): object 'Sheet' not found
Similarly, what modifications can be done to plot AGE vs DIASTOLIC, where both columns are numbers? Will I get the 700 odd values nicely?

I answer here because it tells me not to extend the discussion :-).
I think you haven't loaded correctly your data set. You need to add header = T when loading to tell the program that your first row corresponds with the names of the variables.
Sheet=read.csv("/Downloads/1739230_1284354330_PIMA.csv", sep=",", header = T)

how to use the functcomp code in R

I am having trouble using the functcomp package in R.
I have 2 datasets: one with species frequency, and the other listing the functional traits of my species. The frequency dataset has 264 species listed in the first row and 27 sites listed in the first column, all values in dataset are between 0-1. The functional trait dataset has the same 264 species (copied & pasted from the frequency dataset to make sure identical) listed in the first column, and 5 different functional traits listed in the 1st row (height, life history, life form, origin, palatability).
I am using the following code:
traits.df <- read.table("species_functional_traits_6_ August.txt", header = TRUE)
frequency.df <- read.table("Spring 2014 - combined table - 6 August.txt", header = TRUE)
x <- (as.matrix(traits.df))
a <- (as.matrix(frequency.df))
functcomp(x, a, CWM.type = c("dom", "all"), bin.num = height)
But keep getting the following error message:
Error in functcomp(x, a, CWM.type = c("dom", "all"), bin.num = height) :
Different number of species in 'x' and 'a'.
I have tried fiddling with a couple of things in the code and datasets, but cannot work out what I am doing wrong here. Any help would be greatly appreciated!
Here are links the frequency & trait data (a subset of it, but still get same error message with this data) as a tab-delimited text file
frequency: https://www.dropbox.com/s/girs3nrq1ciyg1a/frequency%20-%20small.txt?dl=0
traits: https://www.dropbox.com/s/l888sallx7mu3f6/traits%20-%20small.txt?dl=0

try stating row.names=1 when read in your table, this solved my problem -
Anna

Develop Reference

r css asp.net wordpress firebase qt symfony nginx http apache-flex

prevent [R] data frames with long lines from being truncated? - r

Related

Error during harmonisation in TwoSampleMR (R-package)

What is the best way to manage/store result from either posthoc.krukal.dunn.test() or dunn.test() - where my input data is in dataframe format?

How to plot data from Excel using the R corrplot function?

Issue with Boxplot formula or variable definition

how to use the functcomp code in R

Categories

Resources