So I imported the file and saved it as Purity, and it's clearly imported. I tried a t-test but R doesn't recognize my variables. I used the names() function to retrieve the variable names and they are exactly what I'm entering, V1 and V2. I also tried Lab-1 and Lab-2, and I tried just using dataset=Purity, all to no avail.
I took a screenshot to show the code and that the data is in RStudio. Can anyone tell me why this is not working?
Apologies if this is painfully obvious. I was only introduced to R for stats last week and am still a beginner, and I don't have much experience with programming in general. I have looked at other similar problems but just can't see why my variables aren't being recognised when others are.
You've got 2 issues here:
1). You don't show how you imported the dataset but you need to either remove the first row or (better) name the columns correctly. I'm assuming you imported the data with read.table(). If so, then include the argument header = TRUE when you import the data.
2). You need to tell R where you want it to get Lab-1 and Lab-2 from.
with(Purity, t.test(`Lab-1`, `Lab-2`, paired = TRUE))  # backticks needed: unquoted Lab-1 parses as Lab minus 1
R is case sensitive. It looks like your script is using a lowercase "v" when it's an uppercase "V" in your variable names.
The problem is in the way you have named your variables. R doesn't recognise the hyphen (-) as a legitimate part of a variable name; it reads it as a minus sign. Try using an underscore (_) instead.
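A minimal sketch of how the header and hyphen issues interact, assuming the data has a header row containing Lab-1 and Lab-2 (the purity values below are made up for illustration). By default read.table() converts names that are not syntactically valid, so Lab-1 becomes Lab.1; with check.names = FALSE the original names are kept and must be wrapped in backticks.

```r
# Hypothetical stand-in for the imported file; the numbers are invented.
raw <- "Lab-1 Lab-2
99.1 98.8
98.7 98.9
99.4 99.0
98.9 98.5"

# Default behaviour: header names are made syntactically valid (Lab-1 -> Lab.1).
Purity <- read.table(text = raw, header = TRUE)
names(Purity)
res_dot <- with(Purity, t.test(Lab.1, Lab.2, paired = TRUE))

# Keeping the original names: they now need backticks wherever they are used.
Purity_raw <- read.table(text = raw, header = TRUE, check.names = FALSE)
names(Purity_raw)
res_tick <- with(Purity_raw, t.test(`Lab-1`, `Lab-2`, paired = TRUE))
```

Both calls run the same paired test; only the way the columns are named differs.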
I'm learning the very basics of the R language.
I would like to loop (with either a for loop or a while loop, since that's what I'm learning) through all the values of a specific dataset column called "Kid.Height". Let's say the dataset is called test.
I can target a standard column like this: "test$KidHeight". But not one with a dot in its name ("test$Kid.Height").
I would like to do something like this:
for(i in test$Kid.Height) {
print(Kid.Height.value);
}
So that I can read all the row values of that column.
I can't find any example on the web that tells me how to deal with dots in column names.
I know how to target a column by its index but not by name so that it always works, however fancy it is.
PS: since I'm learning the basics, if I can ask you the most clean and recommended way to achieve this, so that I can learn from scratch, I would be grateful.
Thank you.
Columns with difficult names can also be accessed using square brackets. Try:
test['Kid.Height']
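A small sketch of the loop, assuming a made-up data frame named test with a Kid.Height column (the heights are invented). A dot is a legal character in an R name, so test$Kid.Height works as written; the loop variable, not the column name, is what gets printed. Note that single brackets return a one-column data frame, while double brackets return the values themselves.

```r
# Hypothetical data; only the column name comes from the question.
test <- data.frame(Kid.Height = c(110.5, 123.0, 98.2))

# The loop variable takes each value of the column in turn.
for (height in test$Kid.Height) {
  print(height)
}

# Double brackets with a quoted name return the same vector,
# and work for any column name, however unusual.
heights <- test[["Kid.Height"]]

# Single brackets keep the data frame structure (one column).
one_col <- test["Kid.Height"]
```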
So I've used the decompose function and I want to export all the lists it generates, not just the plot it creates. I tried converting the lists into either a matrix or a data frame, but that gets rid of the date header and year columns. If someone knows how to convert it and keep the list formatting, that would solve my issue, I think.
Anyway, the closest I've got to doing this while keeping the list format is:
capture.output(decompose, file = "filename.csv")
As you can see from the image attached, though:
Sometimes the months aren't all together in a row, which is really not helpful or what I want. It also puts everything in one column, so I have to go into Excel afterwards and use the text-to-columns option, which is going to get old really quickly.
Any help would be greatly appreciated. I'm really new to R, so I apologise if there is an obvious fix I'm missing.
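One possible approach, sketched under assumptions: the built-in monthly co2 series stands in for the real data, and the output file name is hypothetical. The object returned by decompose() stores the observed series and each component as time series, so time() and cycle() can rebuild the year and month columns in a data frame that write.csv() exports with one value per cell.

```r
# Hypothetical example: the built-in monthly co2 series stands in for your data.
parts <- decompose(co2)

# time() gives decimal years; cycle() gives the position within the year (1-12).
# The small offset guards against floating-point values just below a whole year.
out <- data.frame(
  year     = floor(as.numeric(time(parts$x)) + 1e-6),
  month    = as.integer(cycle(parts$x)),
  observed = as.numeric(parts$x),
  trend    = as.numeric(parts$trend),
  seasonal = as.numeric(parts$seasonal),
  random   = as.numeric(parts$random)
)

csv_path <- file.path(tempdir(), "decomposed.csv")  # hypothetical output path
write.csv(out, csv_path, row.names = FALSE)
```

The trend and random columns contain NA at both ends of the series, because the moving average used by decompose() is undefined there.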
I cannot figure out how to assign the column headers from my imported xlsx sheet as variables. I have several column headers, for example DAY_CHG and INPUT_CHG. So far, I can only run gls(DAY_CHG~INPUT_CHG) by first assigning the values to variables, e.g. X<-mydata$DAY_CHG. Is there some command to get these variables assigned automatically when I import?
I had horrible problems getting the program up and running, by the way, due to firewalls at the firm for which I'm working, wondering if that's causing some of the issue.
Any help is much appreciated. Thanks!
attach(mydata) will allow you to use the variable names directly. However, attach may cause problems, especially with more complex data/analyses (see "Do you use attach() or call variables by name or slicing?" for a discussion).
An alternative would be to use with, such as with(mydata, gls(DAY_CHG~INPUT_CHG))
I would suggest using $ to refer to the headers as variables, which still lets you work with other data sets. Once the data is assigned to an object such as your mydata, putting a $ immediately after the object name lets you refer to each header as a variable.
As an example for your case, instead of creating a new object X, simply take what you assigned to X and put it directly into your command.
gls(mydata$DAY_CHG ~ mydata$INPUT_CHG)
When things become more complicated, with more data sets, this still gives you access to all of them without limiting yourself to the data set you attach().
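Another option worth mentioning: like most modelling functions, gls() from the nlme package accepts a data argument, so the column names can be used in the formula without attach(), with(), or $. A minimal sketch, assuming a made-up mydata (the values are invented; only the column names come from the question):

```r
library(nlme)  # gls() lives in nlme, a recommended package shipped with R

# Hypothetical stand-in for the imported spreadsheet.
mydata <- data.frame(
  INPUT_CHG = c(0.5, -1.2, 0.3, 2.1, -0.7, 1.4),
  DAY_CHG   = c(0.2, -0.9, 0.4, 1.5, -0.3, 0.8)
)

# The formula is evaluated inside mydata, so bare column names work.
fit <- gls(DAY_CHG ~ INPUT_CHG, data = mydata)
coef(fit)
```

This keeps the formula readable and makes it explicit which data set each model uses.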
I have a raw dataset and the columns are not clearly defined at all. When I import the data using read.table() in R, it automatically tries to guess where the columns begin and end, but it gets this wrong. I know the number of characters per variable, but I am not sure how to specify them as one would in Excel (=LEFT(x,3) or =MID(x,4,1), etc.). Some variables are separated by spaces, some aren't. It is not consistent.
FYI: The document was originally ".dat", then I saved the file as a ".R" file.
Here is an example of my data
Any help is much appreciated! Let me know
You can use read_fwf from the great readr package to specify the fixed width of each variable.
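As an illustration of the fixed-width idea, here is a sketch using base R's read.fwf(), swapped in so the example needs no extra packages; readr's read_fwf() with fwf_widths() takes the same kind of width specification. The sample lines, widths, and column names are all hypothetical.

```r
# Hypothetical fixed-width file: 3 characters, then 1, then 4, with no separators.
path <- tempfile(fileext = ".dat")
writeLines(c("ABC123.5", "XYZ007.2"), path)

# widths gives the number of characters per variable, in order.
dat <- read.fwf(
  path,
  widths = c(3, 1, 4),
  col.names = c("code", "flag", "value")
)
dat
```

This plays the role of =LEFT(x,3) and =MID(x,4,1) from Excel: each width slices the next run of characters off the line.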
I have a dataset in Stata and I want to take it to R, but there are some missing values in state, and they are represented using a period. I want to get the data into R, which I do by loading the foreign package and then using the read.table() function. How do I convert the periods in state, which are genuinely missing, to NA in R?
If I understand you correctly, you first load the foreign package in order to read a .dta file, correct?
library("foreign")
Then you would read in your data by using:
myRFile <- read.dta(file="someStataFile.dta")
You are asking for a way to convert Stata's missing value, usually shown as a dot (.), to R's missing value, NA. Is that also correct?
One thing to know here is that Stata handles missing values "behind the scenes" in multiple ways. There are actually 27 different missing values in Stata (. and .a through .z), which users usually do not need to distinguish. You do not need to know them for your problem, though, because read.dta() handles them itself.
To learn how to tackle a simple problem like this yourself in the future, always check the help file for your function first:
help(read.dta)
Here you can see that the function handles Stata's extended missing-data types automatically and correctly.
If you want information about which type of missing value was recognized, you can set the argument missing.type=TRUE:
myRFile <- read.dta(file="someStataFile.dta", missing.type=TRUE)
Then, according to the help file, the following will happen:
If missing.type is TRUE a separate list is created with the same
variable names as the loaded data. For string variables the list value
is NULL. For other variables the value is NA where the observation is
not missing and 0–26 when the observation is missing. This is attached
as the "missing" attribute of the returned value.
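If the data was instead exported from Stata as plain text and read with read.table(), as the question mentions, the periods arrive as literal "." strings. A minimal sketch, assuming a made-up text export (the column names and values are hypothetical), uses the na.strings argument to turn them into NA on import:

```r
# Hypothetical text export in which "." marks a missing value.
raw <- "id state income
1 NY 50000
2 . 42000
3 CA ."

# na.strings lists the strings that should be read as NA.
dat <- read.table(
  text = raw,
  header = TRUE,
  na.strings = ".",
  stringsAsFactors = FALSE
)

is.na(dat$state)
is.na(dat$income)
```

Because the periods are converted before type detection, a numeric column such as income stays numeric instead of being read as text.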