Matching among multiple variables in R

I am a beginner in R, so I was not sure how to title my question; sorry about that. Let me try to explain.
My professor gave me a NetCDF atmospheric data file (18.3 MB). The file has 8 dimensions and 8 variables, and I have to work with 4 of the variables (time, site number, urban site, pm10). Each of them has 683016 values. Suppose, for example:
Urban site number: [2, 5]
site number: [1, 2, 3, 4, 5, 6]
time: [1-3-2012, 2-3-2012, ...] (24 hourly values are recorded for each day)
pm10: [1, 2, 3, 4, 5, 6, ...] (a different value for every hour, with some missing values)
I need to restrict this data set to the urban sites and to 1-3-2012 (in effect, I have to turn this spatio-temporal data into spatial data). I want my final data set to look like this:
Column 1 (time): 1-3-2012, 1-3-2012, 1-3-2012, 1-3-2012, 1-3-2012, 1-3-2012
Column 2 (urban site number): 2, 2, 2, 5, 5, 5
Column 3 (pm10 value): 1, 2, 3, NA, 4, 5
I only know very basic R commands, so I can't work out how to solve this problem. I don't even know how to search for examples of this type of problem on the internet.
So please give me some suggestions or links about what I have to learn to solve this problem in R.

I think you're trying to reshape the dataset, but I'm afraid I can't see what your current dataset looks like.
Could you elaborate on what your dataset looks like right now?
There are packages that help with reshaping, such as {reshape} or {plyr}, but I need more detail to suggest which one you should use.
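
In the meantime, here is a minimal sketch of the kind of filtering described in the question, using only base R. It assumes the four variables have already been read into equal-length vectors and that time is stored as POSIXct; the toy values and the names dat, urban_sites and result are hypothetical, and the real file may store time differently (for example as a numeric offset that needs converting first).

```r
# Toy stand-in for the real data (hypothetical values): 6 sites,
# 2 hourly readings on each of 2 days, some pm10 values missing.
hours <- as.POSIXct("2012-03-01 00:00:00", tz = "UTC") + 3600 * c(0, 1, 24, 25)
dat <- data.frame(
  time = hours[rep(1:4, times = 6)],
  site = rep(1:6, each = 4),
  pm10 = as.numeric(1:24)
)
dat$pm10[c(7, 18)] <- NA

# Site numbers flagged as urban (assumed to be available as a plain vector).
urban_sites <- c(2, 5)

# Keep rows that belong to an urban site AND fall on the chosen day.
keep <- dat$site %in% urban_sites &
  as.Date(dat$time, tz = "UTC") == as.Date("2012-03-01")
result <- dat[keep, ]
print(result)
```

The %in% operator does the "matching" between the site column and the list of urban sites; missing pm10 values are simply carried through as NA.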

Related

(r) turning data (DNAbin) into a matrix

I am trying to run stamppFst() and stamppConvert() with haplotype data. The data I have is a sequence of nucleotides in a DNAbin object. I have tried to find ways to turn it into a matrix, but what I have read goes way over my head, since this is the first time I have ever used R.
data
This is an example of one of the data sets I want to use.
I apologize if this is a very basic question. Thanks for any help!

I want to know how a csv data table with age_groups should be handled in R

This is my dumbed down example dataset.
When I read the CSV into R with read.csv and run, for example, "table(mydata$England)", the console shows frequencies that make no sense. How do I make R understand that the frequencies have already been calculated in the table and split by age group?
I feel like it is something really obvious that I'm just blind to; if there is a tutorial for this, feel free to just link me!
Should I be using levels()? I can't find any resources on Google that explain this particular problem.
My code snippet as it stands:
mydatasheet <- read.csv("A2.csv")
View(mydatasheet)
# Displays fine
Eng_v <- table(mydatasheet$ENGLAND, mydatasheet$Ages)
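
One possible reading of the problem, sketched under assumptions: table() counts how often each value occurs, so applying it to a column that already holds counts produces a count of counts. If the sheet has one row per age group and a column of precomputed frequencies, xtabs() with the count column on the left of the formula keeps those frequencies as they are. The column names Ages and ENGLAND are taken from the snippet above; the toy values are hypothetical, since the real sheet is only shown as an image.

```r
# Toy stand-in for the CSV (hypothetical values): one row per age group,
# with the frequency for England already computed.
mydatasheet <- data.frame(
  Ages    = c("0-15", "16-64", "65+"),
  ENGLAND = c(100, 300, 80)
)

# Use the existing counts as the cell values instead of recounting rows.
eng_tab <- xtabs(ENGLAND ~ Ages, data = mydatasheet)
print(eng_tab)

# Proportions per age group follow directly from the table.
print(prop.table(eng_tab))
```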

Is repeated ANOVA what I am looking for?

I'm studying the NDVI (normalized vegetation index) behaviour of some soils and cultivars. My database has 33 days of acquisition, 17 kinds of soil and 4 different cultivars. I have built it in two different ways, which you can see attached. I am running into trouble and errors with both layouts.
The first question is: is repeated ANOVA the correct way of analysing my data? I want to see whether there are any differences between the behaviours of the different cultivars and the different soils. I ran an ANOVA for each day and there are statistical differences on each day, but the per-day results are not what interests me overall, because I would like to investigate the behaviour over the whole year.
The second question is: how can I perform it? I've tried different tutorials, but I either hit unexpected errors or didn't manage to complete the analysis.
Last but not least: I'm coding with RStudio.
Any help is appreciated. I'm still new to statistics but really interested in improving!
horizontal database
vertical database
I believe you can use ANOVA, but as always, you have to know whether that really is what you're looking for. Either way, since this is a platform for programming questions, I'll write code that should work for the vertical version. However, since I don't have your data, I can't know for sure (for future reference, dput(data) creates easily importable code for those trying to answer you).
summary(aov(suolo ~ CV, data = data))

How to quickly identify which type of plot I should use?

There are many types of plots available in R. Every time I get a dataset, I have to think for a long time about which type of plot to use in order to get the information I want (I'm a beginner in R). I don't know whether this comes down to my maths and stats knowledge or just to being unfamiliar with R. Can anybody tell me the reason and how to improve? Many thanks in advance.

Variable selection and adding noise data

It's my first post and English is not my first language, so please bear with me.
I have searched the forum for my problem, but I have not found a suitable answer yet.
Here is my problem: I'm trying to use the spikeslab package as a variable selection tool for the first time. I have a data set of 1000 examples and 8 variables, but I think I need more variables to evaluate the effectiveness of the package, and I don't know how to add more random variables to my data set.
Is there any command in R that does this? Can you please help me?
I appreciate your input.
Thanks.
The code I've used:
diabet=read.csv(data,header=T,sep=",")
diabet
library(spikeslab)
obj <- spikeslab(BS~ . , diabet)
print(obj)
plot(obj)
https://imgur.com/a/NerKn
As you can see, all of my variables are included as top variables.
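
A minimal sketch of one way to append random variables, assuming that pure Gaussian noise columns are acceptable for the evaluation. The toy diabet data frame stands in for the real CSV, and the names n, k, noise and diabet_noisy are hypothetical; only the response name BS comes from the code above.

```r
set.seed(1)  # make the random columns reproducible

# Toy stand-in for the real data: 1000 rows, response BS plus 8 predictors.
n <- 1000
diabet <- data.frame(BS = rnorm(n), matrix(rnorm(n * 8), nrow = n))

# Build k noise columns that are unrelated to BS by construction.
k <- 20
noise <- matrix(rnorm(n * k), nrow = n,
                dimnames = list(NULL, paste0("noise", seq_len(k))))

# Append them to the original data frame.
diabet_noisy <- cbind(diabet, noise)
print(dim(diabet_noisy))
```

The enlarged data frame can then be passed to the same call as before, spikeslab(BS ~ ., diabet_noisy). If the selection is working, one would expect the noise columns to rank low, although that depends on the data.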
