Hi, I have two nearly identical data sets in R; one has some values the other doesn't, and I'm trying to compare them. I'd like to create a list of the observations that aren't shared between the two data sets, but I'm struggling with how to do this. I'm relatively new to R.
You should try the arsenal package:
install.packages("arsenal")
library(arsenal)
captureVariable <- summary(arsenal::comparedf(list1, list2))
captureVariable[["diffs.byvar.table"]]
There are some other helpful outputs that will be captured by captureVariable if that particular table doesn't suit your needs.
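If the goal is simply a list of the rows that appear in one data set but not the other, dplyr's anti_join() is a common alternative. A minimal sketch, assuming list1 and list2 share the same column names:
library(dplyr)
only_in_1 <- anti_join(list1, list2)   # rows of list1 with no match in list2
only_in_2 <- anti_join(list2, list1)   # rows of list2 with no match in list1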
Related
I am trying to run stamppFst() and stamppConvert() with haplotype data. The data I have is a sequence of nucleotides stored in a DNAbin object. I have tried to find ways to turn it into a matrix, but what I have read goes way over my head since this is the first time I have ever used R.
This is an example of one of the data sets I want to use.
I apologize if this is a very basic question. Thanks for any help!
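In case it helps as a starting point, a minimal sketch, assuming the sequences are stored in an aligned DNAbin object called seqs (a hypothetical name), showing how ape can turn it into a plain character matrix. Whether that matrix is directly usable by stamppConvert() is a separate question, since stampp expects genotype data:
library(ape)   # the DNAbin class comes from ape
# as.matrix() forces a list-style DNAbin into matrix form (sequences must be the same length);
# as.character() then returns an ordinary character matrix,
# one row per sequence and one column per nucleotide position
seq_matrix <- as.character(as.matrix(seqs))
dim(seq_matrix)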
I am pretty new to R, but I am working with several datasets that contain the same kind of data, just from different days.
For my analysis I only need some specific columns from each dataset, so I created a new dataset with only those columns (I do not want to overwrite or delete the old dataset). I am using the following code to do this:
subset01012018 <- dataset01012018[, c(1, 2, 3, 4, 10, 11, 14, 15, 16)]
Now I want to apply the same subsetting to all the datasets. How could I do something like this? Could I do it with a for loop, or do I need an apply function?
Hope someone can help me!
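One way to do it, as a sketch, assuming the data frames follow the naming pattern datasetDDMMYYYY and all share the same column layout:
keep_cols <- c(1, 2, 3, 4, 10, 11, 14, 15, 16)
dataset_names <- ls(pattern = "^dataset")   # find every object whose name starts with "dataset"
datasets <- mget(dataset_names)             # gather them into a named list
# subset each data frame; the results stay together in a list
subsets <- lapply(datasets, function(d) d[, keep_cols])
# optionally turn them back into separate objects named subset...
names(subsets) <- sub("^dataset", "subset", dataset_names)
list2env(subsets, envir = .GlobalEnv)
Keeping the subsets in a single list is usually easier to work with than many separate objects.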
I am very familiar with Excel but new to R. I have several years' worth of data across multiple spreadsheets:
data1996.csv
data1997.csv
...
data2013.csv
Each csv is about 500,000 rows by 1700 columns.
I want to manipulate this data in D3 and plan to remove columns that are not essential to the calculation. My goal is to create a year slider that drives a corresponding visualization. I want to know the easiest way to aggregate these massive datasets. I suppose it could be done manually, but that would be cumbersome and inefficient.
I am very new to R, but if there is another means to aggregate the data into a single CSV, that would work fine as well.
Any help and suggestions are appreciated.
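A sketch of one way to do this, assuming the files sit in the working directory and share the same layout; data.table is used here because the files are large, but read.csv/write.csv would also work. The column names in keep_cols are hypothetical placeholders:
library(data.table)
years <- 1996:2013
keep_cols <- c("site", "measure", "value")   # placeholder names; use the columns you actually need
combined <- rbindlist(lapply(years, function(y) {
  dt <- fread(paste0("data", y, ".csv"), select = keep_cols)
  dt[, year := y]                            # tag each row with its source year for the slider
  dt
}))
fwrite(combined, "data_all_years.csv")       # one file to feed into D3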
The sample for a survey I am analysing was not selected randomly and so I need to apply a vector of weights to make the findings representative of the population. I have used wtd.table() (from gmodels) successfully to create frequency tables but now want to create a contingency table to compare two categorical variables and conduct a Chi-sqrd test. I'm struggling to find the right function. The svytable() function in the survey package sounds promising but I don't see where I should input the weight vector. I'm new to R. Could anyone explain how to use svytable() or suggest an alternative?
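In case it helps, a minimal sketch: svytable() never takes the weights directly; they go into a design object created with svydesign(), which is then passed to svytable() and svychisq(). Here dat, wt, var1 and var2 are hypothetical names for the data frame, the weight column, and the two categorical variables:
library(survey)
des <- svydesign(ids = ~1, weights = ~wt, data = dat)
svytable(~var1 + var2, design = des)    # weighted contingency table
svychisq(~var1 + var2, design = des)    # design-based chi-squared test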
I have just started learning about PCA and I wish to use it for a huge microarray dataset with more than 400,000 rows. My columns are samples and my rows are genes/loci. I went through some tutorials on using PCA and came across princomp() and prcomp(), among a few others.
Now, as I learned here, in order to plot "samples" in the biplot I would need to have them in the rows and the genes/loci in the columns, so I will have to transpose my data before using it for PCA.
However, since there are more than 400,000 rows, I am not really able to transpose them into columns, because the number of columns is limited. So my question is: is there any way to perform a PCA on my data without transposing it, using these R functions? If not, can anyone suggest another way or method to do so?
Why do you hesitate to transpose your data? It's easy!
If you read your data into R (for example as the matrix microarray.data), you can transpose it with a single command:
transposed.microarray.data <- t(microarray.data)
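From there, a short sketch of the PCA step itself, assuming the transposed matrix is numeric (skip scaling if any gene has zero variance, since scale. = TRUE fails on constant columns):
pca <- prcomp(transposed.microarray.data, center = TRUE, scale. = TRUE)
summary(pca)    # variance explained by each component
biplot(pca)     # samples plotted as points, genes as arrows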