Turning data (DNAbin) into a matrix in R

I am trying to run stamppFst() and stamppConvert() with haplotype data. The data I have is a sequence of nucleotides in a DNAbin object. I have tried to find ways to turn it into a matrix, but what I have read goes way over my head, since this is the first time I have ever used R.
This is an example of one of the data sets I want to use.
I apologize if this is a very basic question. Thanks for any help!
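A minimal sketch of one possible conversion, assuming the ape package (which defines the DNAbin class) is installed. The three short sequences are a made-up stand-in for the real data. Whether the resulting matrix is in a layout that stamppConvert() accepts is not confirmed here.

```r
library(ape)  # assumed installed; defines the DNAbin class

# Hypothetical stand-in for the real data: three aligned sequences.
seqs <- rbind(
  ind1 = c("a", "c", "g", "t"),
  ind2 = c("a", "c", "g", "a"),
  ind3 = c("a", "t", "g", "t")
)
dna <- as.DNAbin(seqs)

# as.character() turns an aligned DNAbin object into a character matrix
# with one row per sequence and one column per site.
nuc_matrix <- as.character(dna)
dim(nuc_matrix)
```

If the DNAbin object is stored as a list rather than a matrix, as.matrix(dna) may be needed first, and that only works when all sequences have the same length.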

Related

How to create a loop with R function specgram(signal)

I am working with many signals, each one a time series. There are too many to handle one by one: I need to compute more than 1000 spectrograms, and I am not sure how to implement this, because I need not only the plots but also the output values of each spectrogram stored in a file or an R object. I am sorry that I don't have an approach yet. Can anyone help, please?
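One way to approach this, as a rough sketch: keep the signals in a list and use lapply() to call specgram() on each one. This assumes the signal package is installed. The simulated signals, the sampling rate fs, and the window length n = 128 are all placeholders, not values from the question.

```r
library(signal)  # assumed installed; provides specgram()

# Hypothetical input: a list of numeric vectors, one per signal.
set.seed(1)
signals <- replicate(
  5,
  sin(2 * pi * 50 * (1:1000) / 1000) + rnorm(1000),
  simplify = FALSE
)

fs <- 1000  # assumed sampling rate in Hz

# Compute one spectrogram per signal and keep the results in a list.
spectrograms <- lapply(signals, function(x) specgram(x, n = 128, Fs = fs))

# Each element holds S (complex matrix), f (frequencies) and t (times).
dim(spectrograms[[1]]$S)

# Store everything in one file for later use.
out_file <- file.path(tempdir(), "spectrograms.rds")
saveRDS(spectrograms, out_file)
```

The saved list can be read back with readRDS(), and plot() can be called on any single element when a figure is needed.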

Comparing two lists in R

Hi so I have two nearly identical data sets, however one has some values the other doesn't and I'm trying to compare them in R. I'm trying to create a list of the observations in the two data sets that aren't shared between the two, but I'm struggling with how to do this. I'm relatively new to R.
You could try the arsenal package:
install.packages("arsenal")
library(arsenal)
captureVariable <- summary(arsenal::comparedf(list1,list2))
captureVariable[["diffs.byvar.table"]]
There are some other helpful outputs that will be captured by captureVariable if that particular table doesn't suit your needs.
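If installing a package is not an option, a similar comparison can be sketched in base R. This assumes both data sets are data frames sharing a key column; the column name id and the example values are hypothetical.

```r
# Hypothetical example data; "id" is an assumed key column.
list1 <- data.frame(id = c(1, 2, 3, 4), value = c("a", "b", "c", "d"))
list2 <- data.frame(id = c(2, 3, 4, 5), value = c("b", "c", "d", "e"))

# Keys present in one data set but not in the other.
only_in_1 <- setdiff(list1$id, list2$id)
only_in_2 <- setdiff(list2$id, list1$id)

# Rows that are not shared, labelled by the data set they come from.
not_shared <- rbind(
  cbind(source = "list1", list1[list1$id %in% only_in_1, ]),
  cbind(source = "list2", list2[list2$id %in% only_in_2, ])
)
not_shared
```

This only finds observations whose key is missing from the other data set; it does not report rows that share a key but differ in other columns.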

How to handle a large collection of time series in R?

I have data that represents about 50,000 different 2-year monthly time series. What would be the most convenient and tidyverse-ish way to store that in R? I'll be using R to review each series, trying to extract characteristic features of their shapes.
Somehow a data frame with 50,000 rows and 24 columns (plus a few more for metadata) seems awkward, because the time axis is in the columns. But what else should I be using? A list of xts objects? A data frame with 50,000x24 rows? A three-dimensional matrix? I'm not really seeing anything obviously convenient, and my friend Google hasn't found any great examples for me either. I imagine this means I'm overlooking the obvious solution, so maybe someone can suggest it. Any help?
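The "50,000x24 rows" option mentioned above is the long format, with one row per series and month. A small base R sketch of it follows, using 3 simulated series instead of 50,000; the column names series_id, month and value are made up for illustration.

```r
# Hypothetical wide data: one row per series, one column per month.
set.seed(42)
wide <- data.frame(series_id = 1:3, matrix(rnorm(3 * 24), nrow = 3))
names(wide)[-1] <- paste0("m", 1:24)

# Long format: one row per (series, month) observation.
long <- data.frame(
  series_id = rep(wide$series_id, times = 24),
  month     = rep(1:24, each = nrow(wide)),
  value     = unlist(wide[-1], use.names = FALSE)
)

# A per-series feature is then a grouped summary.
features <- aggregate(value ~ series_id, data = long, FUN = mean)
features
```

In this layout each shape feature becomes one grouped computation over series_id, and metadata can live in a separate table keyed by the same column.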

How to run a cluster on data that is strings only R

I am trying to run a cluster analysis on a very large data set. It contains only strings as values. I have removed the NAs and replaced them with a dummy value. My k-means in R keeps failing due to NA coercion. How would the community cluster this data? I am showing 10 rows of a dummy example below. In this situation, let's call the data frame cluster_data.
Any help would be greatly appreciated. I am trying to see if any of the columns cause the data to break earlier than another, to try to understand a possible structure. I thought clustering with k-means was the best approach, but I do not see how to do it with strings. I have converted the columns to factors in R and still have issues. Any example code is greatly appreciated.
Question: how do you run kmeans clustering with strings?
Answer: You can't run k means cluster analysis on categorical data. You need data that a distance function can make sense of.
K-means is designed for continuous variables, where least squares and using the mean as a cluster center make sense.
For other data types, it is better to use other algorithms, such as PAM, HAC, DBSCAN, OPTICS, ...
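As a rough illustration of the PAM option, here is a sketch using the cluster package with Gower dissimilarity, which can handle factor columns. The toy cluster_data below and the choice of k = 2 are assumptions, not taken from the question.

```r
library(cluster)  # ships with standard R installations

# Hypothetical stand-in for cluster_data: string columns only.
cluster_data <- data.frame(
  colour = c("red", "red", "blue", "blue", "red", "blue"),
  size   = c("small", "small", "large", "large", "small", "large"),
  shape  = c("round", "round", "square", "square", "round", "square"),
  stringsAsFactors = TRUE
)

# Gower dissimilarity works directly on factor columns.
d <- daisy(cluster_data, metric = "gower")

# PAM (k-medoids) clusters on the dissimilarities; k = 2 is an assumption.
fit <- pam(d, k = 2)
fit$clustering
```

A full dissimilarity matrix grows quadratically with the number of rows, so for a very large data set this would likely need sampling or a different method.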

matching among multiple variables in R

I am a beginner in R, so I was unsure how to title my question; sorry for that. Let me try to explain.
My professor gave me a NetCDF atmospheric data file (18.3 MB). The file has 8 dimensions and 8 variables, and I have to work with 4 of the variables. Each variable (time, site number, urban site, pm10) has 683016 values. Suppose:
urban site number: [2, 5],
site number: [1, 2, 3, 4, 5, 6],
time: [1-3-2012, 2-3-2012, ...] (24 hourly values are taken each day),
pm10: [1, 2, 3, 4, 5, 6, ...] (different for every hourly value, with some missing values)
I have to restrict this data set to the urban sites and to 1-3-2012 (in effect, I have to turn this spatio-temporal data into spatial data). I want my final data set to look like this:
Column 1 (time): 1-3-2012, 1-3-2012, 1-3-2012, 1-3-2012, 1-3-2012, 1-3-2012
Column 2 (urban site number): 2, 2, 2, 5, 5, 5
Column 3 (pm10 value): 1, 2, 3, NA, 4, 5
As I only know very basic R commands, I can't work out how to solve this problem. I don't even understand how to find an example of this type of problem on the internet.
So please give me some suggestions or links about what I have to learn to solve this problem in R.
I think you're trying to reshape the dataset, but I'm afraid I can't see what your current dataset looks like.
Could you elaborate on what your dataset looks like right now?
There are packages that help with reshaping, such as {reshape} or {plyr}, but I need more detail to suggest which one you should use.
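If the four variables can be read into a single data frame with one row per site and time, the target table may need only a filter. The sketch below works under that assumption: the data frame pm, its column names, the daily (rather than hourly) rows, and reading 1-3-2012 as 1 March 2012 are all guesses for illustration.

```r
# Hypothetical flat table, as it might look after reading the NetCDF variables.
pm <- data.frame(
  time = as.Date(rep(c("2012-03-01", "2012-03-02"), each = 6)),
  site = rep(1:6, times = 2),
  pm10 = c(1, 2, 3, NA, 4, 5, 6, 7, 8, 9, 10, 11)
)
urban_sites <- c(2, 5)

# Keep only the urban sites on the day of interest.
result <- subset(
  pm,
  site %in% urban_sites & time == as.Date("2012-03-01"),
  select = c(time, site, pm10)
)
result
```

With hourly data each urban site would contribute up to 24 rows for the chosen day, which matches the repeated site numbers in the desired output.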

Resources