I want to find the mean of the variables that satisfy certain attribute properties.
For example, let att1 and att2 be two variable attributes. I want to create a variable that is the average of the variables for which both att1=3 and att2=3.
The problem with the syntax below is that I don't know the length of the vector vec in advance, and I cannot create it without specifying its length.
COMPUTE newvar = MEAN(vec).
EXECUTE.
Related
I am trying to use the cor() function to find correlation between different attributes in a dataframe. When I pass attributes individually like in:
cor_or1_or7 = cor(plant_data$Orientation1, plant_data$Orientation7,
method = "pearson", use = "complete.obs")
I get the correlation between plant_data$Orientation1 and plant_data$Orientation7 to be -0.8696721
But when I pass the whole dataframe, that contains all the attributes, I get a value of -0.89070093 for the same pair of attributes.
The code I used for passing the entire dataframe into the cor() function:
correlation_table <- cor(plant_data[2:19], method = "pearson", use = "complete.obs")
There are a total of 18 attributes and 724 instances in the data frame.
I can't seem to figure out why the same function, applied to the same set of values, is giving different answers. Can anyone please tell me what's going on here? Thanks!
This happens because of how use = "complete.obs" behaves: rows are dropped casewise, so the sample changes depending on where the NAs fall in the other variables in columns 2-19. Say Orientation 1 and Orientation 7 are both complete with n = 50, for example; then all 50 data points are used when you call cor() on just those two variables. If another variable has 3 NAs, then calling cor() on the whole data frame uses only the n = 47 complete rows for every pair of variables, and so the value for Orientation 1 and 7 changes.
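To see the effect in isolation, here is a minimal sketch on made-up data (the data frame toy and its columns x, y, z are hypothetical, not the asker's plant_data). It compares the two-column call, the whole-data-frame call, and use = "pairwise.complete.obs", which drops rows only for the pair being computed.

```r
# Hypothetical toy data: x and y are complete, z has three missing values.
set.seed(1)
toy <- data.frame(x = rnorm(50), y = rnorm(50), z = rnorm(50))
toy$z[c(3, 17, 42)] <- NA

# Two-column call: x and y have no NAs, so all 50 rows are used.
r_pair <- cor(toy$x, toy$y, method = "pearson", use = "complete.obs")

# Whole-data-frame call: rows 3, 17 and 42 are dropped for every pair,
# including the (x, y) pair, because z is NA there.
r_full <- cor(toy, method = "pearson", use = "complete.obs")["x", "y"]

# Pairwise deletion: rows are dropped only for the pair being computed.
r_pairwise <- cor(toy, method = "pearson",
                  use = "pairwise.complete.obs")["x", "y"]

# Reproduce the casewise result by hand on the fully complete rows.
keep <- complete.cases(toy)
r_manual <- cor(toy$x[keep], toy$y[keep])

print(c(pair = r_pair, full = r_full, pairwise = r_pairwise, manual = r_manual))
```

If the goal is for the matrix entry to match the two-column call, "pairwise.complete.obs" is one option, at the cost of each entry of the matrix possibly being based on a different set of rows.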
I am new to R and this is probably an easy question:
I have the following vector:
P <- c(23,45,98)
These values represent row numbers.
Now, I have a table with only one column, and I would like to take the value in each of the rows listed in the vector above and store each one in a separate object (variable).
E.g., row #23 has the value P05.14, and for this first element of the vector "P" I want to create a variable or object like: A = P05.14. The same goes for the other two values of that vector.
Thanks for your help.
If you only have the three values, just do it manually:
A <- dat[23,]
B <- dat[45,]
C <- dat[98,]
For more values, you can assign them in a loop:
# Creates one object per row number: A23, A45, A98
for (value in P) {
  assign(paste0("A", value), as.character(dat[value, ]))
}
I should note that in a situation such as this, it would be best to use a list and not litter the workspace with variables. But to each their own. Good luck!
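For what it's worth, the list approach might look something like the sketch below. The one-column data frame dat and the element names row23, row45, row98 are made up for illustration; only P comes from the question.

```r
# Hypothetical stand-in for the one-column table from the question.
dat <- data.frame(code = paste0("P", 1:100), stringsAsFactors = FALSE)
P <- c(23, 45, 98)

# Pull the three rows at once and keep them together in a named list,
# instead of creating three separate objects in the workspace.
vals <- setNames(as.list(dat[P, 1]), paste0("row", P))

vals$row23       # value stored in row 23
vals[["row98"]]  # value stored in row 98
```

Everything stays in one object, so it can be looped over with lapply() or extended to more rows without touching the workspace.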
I would like to know if it is possible to assign more than one value in a row using only one statement. For example, suppose that I have a data frame made of 3 columns (name, age and sex) and I want to modify, let's say, row #40. Normally I would do something like this:
df[40,]$name <- 'Foo'
df[40,]$age <- 75
However, I am wondering if it can be done in one statement (like Python multiple assignment). It's okay if it can be done using an external package.
Yes, you can do
df[40, c("name", "age")] <- list("Foo", 75)
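As long as the name column is not a factor, it should be fine. If it is a factor, assigning a value that is not one of its existing levels gives NA with a warning.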
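A small self-contained check of this, using a made-up two-row data frame rather than the asker's df (so row 2 plays the role of row #40):

```r
# Hypothetical data frame with character (not factor) name column.
df <- data.frame(name = c("Ann", "Bob"),
                 age  = c(30, 41),
                 sex  = c("F", "M"),
                 stringsAsFactors = FALSE)

# One statement, two columns: a list keeps each value's own type,
# so name stays character and age stays numeric.
df[2, c("name", "age")] <- list("Foo", 75)

str(df)
```

Using list() rather than c() matters here: c("Foo", 75) would coerce both values to character before the assignment.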
I am using the dataset that can be accessed with the following command - load(url("http://bit.ly/dasi_gss_data"))
When I run the query table(gss$premarsx), it returns a column called Other with count 0. When I plot a graph of the same variable (premarsx), there is a column Other with zero height. Is there a way to remove the variable value Other from the variable definition so that it does not appear in the results of any queries/plots?
You can pass it through the factor() function, which rebuilds the factor using only the levels that are actually present in the data:
gss$premarsx <- factor(gss$premarsx)
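Here is the same idea on a tiny made-up factor (f is hypothetical, not the gss data), with droplevels() shown as an equivalent alternative:

```r
# Hypothetical factor with an "Other" level that never occurs in the data.
f <- factor(c("a", "b", "a"), levels = c("a", "b", "Other"))
table(f)            # "Other" is listed with a count of 0

# Re-creating the factor keeps only the levels that are present.
f2 <- factor(f)
levels(f2)

# droplevels() does the same thing and also works on whole data frames.
f3 <- droplevels(f)
levels(f3)
```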
In R, I have a matrix: matClust4 which holds all vectors that are in cluster 4 after executing the kmeans algorithm.
matClust4 has dimensions 27 X 31 and has the rownames attribute set for each vector.
What I would like to do is give another attribute to each row vector in matClust4
I would prefer to use the apply function. I would like to try something like this:
apply(matClust4, 1, function(v) SOME_ATTRIBUTE(v) = idClust4)
#where idClust4 is some previous calculated result
How can I create/use an attribute of matClust4 to do this?
You would not need to use apply for that purpose if the to-be-assigned values had already been computed (and had the same number of elements as matClust4 had rows). You can just assign an R attribute with:
attr(matClust4, 'SOME_ATTRIBUTE') = idClust4
This is how Frank Harrell creates value labels for datasets he imports from SAS. You do need to be careful that reordering or altering the matrix or data frame could upset the association with the vector, since there would be no enforcement of consistency by [<-, sort, or order.
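A quick sketch of both points, on a made-up 3 x 2 matrix (m, ids and the attribute name "cluster_id" are placeholders for matClust4, idClust4 and SOME_ATTRIBUTE):

```r
# Hypothetical small matrix with row names, standing in for matClust4.
m <- matrix(1:6, nrow = 3, dimnames = list(c("a", "b", "c"), NULL))
ids <- c(4L, 4L, 4L)   # one precomputed value per row

# Attach the vector to the matrix as an attribute and read it back.
attr(m, "cluster_id") <- ids
attr(m, "cluster_id")

# Subsetting or reordering with `[` keeps dim and dimnames but drops
# custom attributes, so the association is lost.
m2 <- m[c(3, 1), ]
attr(m2, "cluster_id")   # NULL
```

If the values need to follow the rows through reordering, storing them as an extra column of a data frame is probably the safer design.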