Remove a variable value from the list of possible values

I am using the dataset that can be loaded with the following command: load(url("http://bit.ly/dasi_gss_data"))
When I run table(gss$premarsx), the result includes a column called Other with a count of 0. When I plot the same variable (premarsx), there is a bar for Other with zero height. Is there a way to remove the value Other from the variable definition so that it does not appear in the results of any queries or plots?

You can pass it through the factor() function again, which rebuilds the levels from only the values that are actually present:
gss$premarsx <- factor(gss$premarsx)
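As a minimal sketch of the effect, the example below uses a small made-up factor (a hypothetical stand-in for gss$premarsx, not the real data) with an unused "Other" level. It also shows droplevels(), a base R function that removes unused levels in the same way.

```r
# Hypothetical stand-in for gss$premarsx: "Other" is a declared level
# but no observation actually uses it.
premarsx <- factor(
  c("Always Wrong", "Not Wrong At All", "Always Wrong"),
  levels = c("Always Wrong", "Not Wrong At All", "Other")
)

table(premarsx)  # "Other" is listed with a count of 0

# Re-running factor() keeps only the levels that occur in the data.
refactored <- factor(premarsx)

# droplevels() does the same job and also accepts a whole data frame.
dropped <- droplevels(premarsx)

table(refactored)  # "Other" no longer appears
```

droplevels() can be convenient when several factor columns in a data frame carry unused levels, since it can be applied to the whole data frame at once.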

Related

Names above the numbers of a numeric vector in R: how to change them?

In R, I have several times come across a numeric vector that shows a name above each value, for example:
class(oral_NO_AR_comp$clustering$clust1)
output: [1] "numeric"
and the content looks like this:
The point is that I need to change the names shown above the numbers. Is there a way to do that?

You can get those names with
names(oral_NO_AR_comp$clustering$clust1)
You can use
names(oral_NO_AR_comp$clustering$clust1) <- <whatever you want>
# or
setNames(oral_NO_AR_comp$clustering$clust1, <whatever you want>)
to change the names if you like. You can also remove the names with
unname(oral_NO_AR_comp$clustering$clust1)
Note that these functions (with the exception of names<-) do not change the original value; they return a new value. If you want to replace the original value, be sure to assign the result back to the original variable with <-.
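The difference between the in-place form and the functions that return a copy can be sketched with a small named vector. The vector clust1 and the new names below are made up for illustration.

```r
# Hypothetical named numeric vector standing in for
# oral_NO_AR_comp$clustering$clust1
clust1 <- c(a = 1, b = 2, c = 3)

names(clust1)  # read the current names

# setNames() returns a renamed copy; clust1 itself is unchanged
renamed <- setNames(clust1, c("x", "y", "z"))

# names<- modifies clust1 directly
names(clust1) <- c("p", "q", "r")

# unname() returns a copy without names; assign it to keep the result
bare <- unname(clust1)
```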

Convert range of column titles to variables for vars() function

I have a data frame with 100+ variables in columns and one subject per row. I'd like to loop through each column to perform an ANOVA. The loop itself works fine; the step I am stuck on is listing which columns to loop through.
Currently the loop runs through my list of vars, which I build by typing the column names manually:
variables <- vars(height, width, strength)
This only loops over the 3 variables I typed in, out of the 100+ that I have.
I had thought I could pass the range of column names of the data frame df, columns 3 to 100, inside the vars expression:
variables <- vars(colnames(df[3:100]))
This just produces a single variable named colnames(df[3:100]).
Any ideas for avoiding typing the names, inserting commas, or removing quotation marks by hand for 100+ different variable names? Thanks in advance.

Consider do.call, which calls a function with its arguments supplied as a list. Its second argument must be a list, so the column names are wrapped in as.list(). Specifically:
variables <- do.call(vars, as.list(colnames(df)[3:100]))
passes each column name as its own argument, like the expanded version:
variables <- vars(colnames(df)[3], colnames(df)[4], ..., colnames(df)[100])
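How do.call() spreads a list into separate arguments can be sketched in base R. To keep the example free of package dependencies, collect_args() below is a hypothetical stand-in for vars() that simply returns its arguments, and the column names are made up.

```r
# Hypothetical stand-in for vars(): returns its arguments as a list
collect_args <- function(...) list(...)

# Made-up column names; the real data frame has 100+ columns
col_names <- c("id", "group", "height", "width", "strength")
selected <- col_names[3:5]

# do.call() needs a list, so convert the character vector first
via_do_call <- do.call(collect_args, as.list(selected))

# Same result as writing every name out by hand
written_out <- collect_args("height", "width", "strength")
same <- identical(via_do_call, written_out)

# Passing the bare character vector fails, because it is not a list
failed <- inherits(try(do.call(collect_args, selected), silent = TRUE),
                   "try-error")
```

Note that this passes the names as strings. Whether the function that later consumes the result accepts strings or requires bare symbols depends on that function, so this is an assumption to check against the package documentation.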

How can I create the following function in R?

I need to remove some outliers from two variables in my dataset. What I've been thinking of is to replace those outliers with the value of Q3 +/- 1.5*IQR. Is there a function available to do this, or how can I create a function that replaces the values of the observations that exceed Q3 + 1.5*IQR with the value of Q3 + 1.5*IQR itself? Thank you in advance.
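One possible approach, sketched in base R: compute the quartiles with quantile(), then clamp the values with pmin() and pmax(). The function name cap_outliers and the sample data are made up for illustration. The sketch also assumes the lower bound should be Q1 - 1.5*IQR (the usual boxplot fence), which the question does not state explicitly.

```r
# Hypothetical helper: cap values outside the fences at the fence value.
# Assumption: lower fence is Q1 - k*IQR, upper fence is Q3 + k*IQR.
cap_outliers <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  lower <- q[1] - k * iqr
  upper <- q[2] + k * iqr
  # pmax() raises values below the lower fence, pmin() lowers values
  # above the upper fence; NA values stay NA
  pmin(pmax(x, lower), upper)
}

values <- c(1, 2, 3, 4, 100)
capped <- cap_outliers(values)
capped  # 1 2 3 4 7
```

To apply it to two columns of a data frame at once, something like df[cols] <- lapply(df[cols], cap_outliers) should work, where cols is a character vector of the column names.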

Determining the number of objects in a vector

I have a vector of character data with repeated values. My ultimate goal is to create a bar plot showing how often each unique value occurs in the vector. A long way of doing it would be as follows:
object1=length(df$vector[df$vector=="object1"])
object2=length(df$vector[df$vector=="object2"])
object3=length(df$vector[df$vector=="object3"])
amounts=c(object1,object2, object3)
barplot(amounts)
This works but is cumbersome when there are many unique values, which suggests to me that a loop could be used. I know I can get the unique values of the original vector with unique(), but I'm not sure where to go from there. The following posts gave me some ideas but did not answer my question.
Counting the number of elements with the values of x in a vector
R for loop on character variables

You could use ggplot2.
Install the package:
install.packages('ggplot2')
Load the library:
library(ggplot2)
Plot the bar chart:
ggplot(df,aes(x=as.factor(vector)))+geom_bar()
If your vector is numeric, as.factor() converts it into a categorical variable.
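If you would rather stay in base R, table() counts the occurrences of each unique value in one step, and the result can be passed straight to barplot(). The data frame below is a made-up example that reuses the names from the question.

```r
# Made-up data using the names from the question
df <- data.frame(
  vector = c("object1", "object2", "object1",
             "object3", "object1", "object2"),
  stringsAsFactors = FALSE
)

# table() returns the frequency of every unique value
counts <- table(df$vector)
counts

# barplot() accepts the table directly and labels bars with the values
barplot(counts)
```

This avoids both the explicit loop and the call to unique(), since table() handles the grouping itself.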

using a function with lapply to create a column and match values

I have two datasets, H and G. Each has a column named 'diff' that, as the name suggests, holds the difference between two columns within that dataset. I used lapply to calculate the percentage for each dataset (I have more datasets than H and G, so I would like to calculate the percentage of the two columns in each one). lapply prints the output, but for some reason it doesn't create the "perc" column in the datasets that pass through it. What am I doing wrong here?
H<-data.frame(replicate(10,sample(0:20,10,rep=TRUE)))
G<-data.frame(replicate(10,sample(0:20,10,rep=TRUE)))
H[c(2,3,7,9),9]<-NA
G[c(1,5,7,8),9]<-NA
H$diff<-H$X10-H$X9
G$diff<-G$X10-G$X9
dsay<-list(H,G)
lapply(dsay,function(x)x$perc<-round((x$diff/x$X10)*100,1))
Extension of this question:
Once I have the percentage differences as columns using:
H<-data.frame(replicate(10,sample(0:20,10,rep=TRUE)))
G<-data.frame(replicate(10,sample(0:20,10,rep=TRUE)))
H[c(2,3,7,9),9]<-NA
G[c(1,5,7,8),9]<-NA
H$diff<-H$X10-H$X9
G$diff<-G$X10-G$X9
H$perc<-round((H$diff/H$X10)*100,1)
G$perc<-round((G$diff/G$X10)*100,1)
I generated a plot using:
xyplot(X8+X9+X10~X1,H,type=c('p','l','g'),
col = c('yellow', 'green', 'blue','red'),
ylab='Count',layout=c(3, 1),
xlab=paste("H",'difference',min(pmin(H$perc, na.rm = TRUE),na.rm=TRUE),
'% change count'))
Never mind the plot it generates. What I'm trying to do is also display the value from the "diff" column that corresponds to the lowest percentage difference (which is what the min function finds). I've tried using "match" in vain. Could someone help, please?

If the changes need to be reflected in the original data frame objects as well, list2env or assign can be used. However, I would do all the computations within the list itself.
list2env(
  lapply(mget(c('H', 'G')), function(x) {
    x$perc <- round((x$diff / x$X10) * 100, 1)
    x
  }),
  envir = .GlobalEnv
)
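The key point is that the function passed to lapply must return the whole data frame, and the result must be assigned, because lapply works on copies and never modifies its input in place. A minimal sketch of keeping everything inside a named list follows. The seed and the list names are assumptions added for reproducibility, and the last lines suggest one possible way to handle the follow-up question with which.min().

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

H <- data.frame(replicate(10, sample(0:20, 10, replace = TRUE)))
G <- data.frame(replicate(10, sample(0:20, 10, replace = TRUE)))
H$diff <- H$X10 - H$X9
G$diff <- G$X10 - G$X9

# Keep the data frames in a named list
dsay <- list(H = H, G = G)

# Return the modified data frame (the trailing x) and assign the result
dsay <- lapply(dsay, function(x) {
  x$perc <- round((x$diff / x$X10) * 100, 1)
  x
})

# Follow-up: the "diff" value on the row with the lowest "perc".
# which.min() skips NA and NaN; a zero in X10 can produce -Inf here.
i <- which.min(dsay$H$perc)
lowest <- dsay$H[i, c("diff", "perc")]
```

The values in lowest could then be pasted into the xlab string in place of the nested min(pmin(...)) call.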
