In R, I have several times come across a numeric vector that has a name attached to each value, like this:
class(oral_NO_AR_comp$clustering$clust1)
output: [1] "numeric"
and the content looks like this:
The point is that I need to change the names shown above the numbers. Is there a way to do that?
You can get those names with
names(oral_NO_AR_comp$clustering$clust1)
You can use
names(oral_NO_AR_comp$clustering$clust1)<- <whatever you want>
# or
setNames(oral_NO_AR_comp$clustering$clust1, <whatever you want>)
to change the names if you like. You can also remove the names with
unname(oral_NO_AR_comp$clustering$clust1)
Note that these functions (with the exception of names<-) do not change the original value; they return a new value. If you want to replace the original, assign the result back to the original variable with <-.
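A minimal sketch of the difference between the in-place form and the functions that return a copy. The vector x and the new names here are made up for illustration; they stand in for oral_NO_AR_comp$clustering$clust1.

```r
# Hypothetical named numeric vector standing in for the real data
x <- c(a = 1.5, b = 2.5, c = 3.5)

names(x)                                         # inspect the current names

# setNames() and unname() return modified copies; x itself is untouched
y <- setNames(x, c("first", "second", "third"))
z <- unname(x)

# names<- is the one that modifies x in place
names(x) <- c("A", "B", "C")
```

After running this, y carries the new names, z has none, and x has been renamed directly.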
I have a data frame with 100+ variables in columns and one subject per row. I'd like to loop through each column to perform an ANOVA. The loop itself works fine; the step I am stuck on is specifying which columns to loop through. I can set these by typing or pasting each variable name, but this is obviously not practical.
At the moment the loop runs through my list of vars, which I build by typing the column names manually:
variables <- vars(height, width, strength)
This only loops over the 3 variables (out of 100+) that I typed in.
I had thought I could pass the column names of data frame df for columns 3 to 100 inside the vars expression, as below:
variables <- vars(colnames(df[3:100]))
This just provides one variable of the name colnames(df[3:100]).
Any ideas to avoid typing or manually inserting commas/removing quotation marks from 100+ different variable names? Thanks in advance.
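Consider do.call, which calls a function with its arguments supplied as a list. Note that the second argument must be a list, so the character vector of names is wrapped in as.list. Specifically, the following:
variables <- do.call(vars, as.list(colnames(df)[3:100]))
is equivalent to the expanded version: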
variables <- vars(colnames(df)[3], colnames(df)[4], ..., colnames(df)[100])
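To make the do.call mechanics concrete without depending on the data frame above, here is a small base R sketch. It uses paste as a stand-in for vars, and the three names are just the ones from the question.

```r
# The second argument of do.call() must be a list, so convert the
# character vector with as.list() first
arg_list <- as.list(c("height", "width", "strength"))

# One call built from the list ...
via_do_call <- do.call(paste, arg_list)

# ... behaves like the call written out by hand
written_out <- paste("height", "width", "strength")

identical(via_do_call, written_out)
```

Passing the bare character vector instead of a list makes do.call stop with an error, which is why the as.list step matters.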
I need to remove some outliers from two variables in my dataset. What I've been thinking of is to replace those outliers with the value of Q3 +/- 1.5 IQR. Is there a function available to do this, or how can I create a function that replaces the values of those observations that exceed Q3 + 1.5 IQR with the value of Q3 + 1.5 IQR itself? Thank you in advance.
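One possible sketch in base R, assuming the goal is only to cap values above the upper fence Q3 + 1.5 IQR, as described. The function name cap_upper_outliers and the sample vector are hypothetical, and quantile() is used with its default method.

```r
# Replace every value above Q3 + 1.5 * IQR with the fence value itself.
# NA values are ignored when computing the fence and stay NA in the result.
cap_upper_outliers <- function(x) {
  q3 <- quantile(x, 0.75, na.rm = TRUE, names = FALSE)
  fence <- q3 + 1.5 * IQR(x, na.rm = TRUE)
  pmin(x, fence)
}

# Hypothetical example data with one obvious high outlier
x <- c(1, 2, 3, 4, 5, 100)
capped <- cap_upper_outliers(x)
```

To apply it to two columns of a data frame, something like df[c("a", "b")] <- lapply(df[c("a", "b")], cap_upper_outliers) should work, where "a" and "b" are placeholder column names. A lower cap could be added the same way with pmax and a lower fence.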
I have a vector of character data with repeat values. My ultimate goal is to create a bar plot displaying the frequency at which each unique value occurs in the vector. A long way of doing it would be as follows:
object1=length(df$vector[df$vector=="object1"])
object2=length(df$vector[df$vector=="object2"])
object3=length(df$vector[df$vector=="object3"])
amounts=c(object1,object2, object3)
barplot(amounts)
This works but is cumbersome when there are many unique values, which indicates to me that a loop could be used. I know I can get a vector of the unique values in the original vector via the "unique()" command, but I'm not sure where to go from there. The following posts have made me think, but weren't able to answer my question.
Counting the number of elements with the values of x in a vector
R for loop on character variables
You could use ggplot2.
Install the package:
install.packages('ggplot2')
Load the library:
library(ggplot2)
Plot the bar chart:
ggplot(df,aes(x=as.factor(vector)))+geom_bar()
geom_bar() counts the occurrences of each value for you. If your vector is numeric, as.factor() converts it into a categorical variable.
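If you would rather stay in base R, a different route from the ggplot2 one above is table(), which counts each unique value without a loop. The vector fruit below is made-up example data.

```r
# Hypothetical character vector with repeated values
fruit <- c("apple", "pear", "apple", "fig", "apple", "pear")

# table() tallies how often each unique value occurs
counts <- table(fruit)

# barplot() accepts the table directly and labels the bars by value
barplot(counts)
```

With the data from the question this would be barplot(table(df$vector)).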
I have two datasets, H and G. Each has a column named 'diff' that, as the name suggests, holds the difference between two columns within that dataset. I used lapply to calculate the percentage for each dataset (I have more datasets than H and G, so I would like to calculate the percentage for the two columns in each dataset). lapply gives me the output, but it doesn't create the "perc" column in the datasets that pass through it. What am I doing wrong here?
H<-data.frame(replicate(10,sample(0:20,10,rep=TRUE)))
G<-data.frame(replicate(10,sample(0:20,10,rep=TRUE)))
H[c(2,3,7,9),9]<-NA
G[c(1,5,7,8),9]<-NA
H$diff<-H$X10-H$X9
G$diff<-G$X10-G$X9
dsay<-list(H,G)
lapply(dsay,function(x)x$perc<-round((x$diff/x$X10)*100,1))
Extension of this question:
once I have the percent differences as columns using:
H<-data.frame(replicate(10,sample(0:20,10,rep=TRUE)))
G<-data.frame(replicate(10,sample(0:20,10,rep=TRUE)))
H[c(2,3,7,9),9]<-NA
G[c(1,5,7,8),9]<-NA
H$diff<-H$X10-H$X9
G$diff<-G$X10-G$X9
H$perc<-round((H$diff/H$X10)*100,1)
G$perc<-round((G$diff/G$X10)*100,1)
I generated a plot using:
xyplot(X8+X9+X10~X1,H,type=c('p','l','g'),
col = c('yellow', 'green', 'blue','red'),
ylab='Count',layout=c(3, 1),
xlab=paste("H",'difference',min(pmin(H$perc, na.rm = TRUE),na.rm=TRUE),
'% change count'))
Never mind the plot it will generate. What I'm trying to do is also display the corresponding value from the "diff" column along with the lowest percent difference (which is what the min function finds). I've tried using "match" in vain. Could someone help, please?
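One way this could be approached, as a sketch rather than a tested answer for the plot above: which.min() returns the position of the lowest perc and skips NA values, so the same position can be used to pull the matching diff. The small data frame here is made up to keep the example short.

```r
# Hypothetical cut-down version of H with only the columns needed here
H <- data.frame(X10 = c(10, 20, 5, 8), diff = c(2, -4, 1, NA))
H$perc <- round((H$diff / H$X10) * 100, 1)

# Row index of the lowest percent difference (NA values are skipped)
i <- which.min(H$perc)

lowest_perc <- H$perc[i]
matching_diff <- H$diff[i]

# Label text that could be passed to xlab
label <- paste("H difference", lowest_perc, "% change count, diff", matching_diff)
```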
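The anonymous function returns only the value of the assignment, and lapply works on copies, so H and G are never modified. If the changes need to be reflected in the data frame objects as well, list2env or assign can be used, with the function returning the modified x. That said, I would do all the computations within the list itself.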
list2env(lapply(mget(c('H','G')), function(x) {
  x$perc <- round((x$diff / x$X10) * 100, 1)
  x  # return the modified data frame, not just the new column
}), envir = .GlobalEnv)
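For the "keep everything in the list" route, a minimal sketch might look like the following. The two small data frames are made up, and naming the list elements is my own choice so they can be reached as dsay$H and dsay$G.

```r
# Hypothetical small data frames standing in for H and G
H <- data.frame(X9 = c(1, NA, 3), X10 = c(4, 5, 6))
G <- data.frame(X9 = c(2, 2, NA), X10 = c(8, 4, 2))

# A named list, so results can be accessed as dsay$H and dsay$G
dsay <- list(H = H, G = G)

# Assign the result of lapply back to the list; the function must
# return the whole data frame, not just the new column
dsay <- lapply(dsay, function(x) {
  x$diff <- x$X10 - x$X9
  x$perc <- round((x$diff / x$X10) * 100, 1)
  x
})
```

The original H and G in the workspace stay as they were; the updated versions live in dsay.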