Input:
df = data.frame(col1 = 1:5, col2 = 5:9)
rownames(df) <- letters[1:5]
#add jitter
jitter(df) #Error in jitter(df) : 'x' must be numeric
Expected output: jitter will be added to the columns of df. Thanks!
jitter takes a numeric vector as input, so you cannot simply run it on a whole data.frame. You need to loop over the columns instead. You can do:
data.frame(lapply(df, jitter))
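If the data frame also contains non-numeric columns, or you want to keep the row names, a variant along these lines might help. This is only a sketch: the `grp` column and the names `is_num` and `df_jit` are made up for illustration and are not part of the question.

```r
# Hypothetical data: two numeric columns plus a character column `grp`
df <- data.frame(col1 = 1:5, col2 = 5:9, grp = c("u", "v", "w", "x", "y"),
                 stringsAsFactors = FALSE)
rownames(df) <- letters[1:5]

# Flag the numeric columns; only these can be passed to jitter()
is_num <- vapply(df, is.numeric, logical(1))

# Assigning into df_jit[is_num] keeps the data.frame structure and row names
df_jit <- df
df_jit[is_num] <- lapply(df[is_num], jitter)
```

Because the result is assigned back into a copy of `df`, the non-numeric column and the row names are left untouched.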
jitter is meant to be applied to a numeric vector, not a data frame.
If you want to apply jitter to all of your columns, this should do it (note that apply returns a matrix rather than a data frame):
apply(df, 2, jitter)
Just adding random numbers?
df_jit <- df + matrix(rnorm(nrow(df) * ncol(df), sd = 0.1), ncol = ncol(df))
I am trying to write an R program that can create a vector.
Suppose I have 3 factors (X with levels 1,2,3, Y with levels 1,2, and Z with levels 1,2,3,4). If I want to represent them in a contingency table, there are 3x2x4=24 cells (for example, 111, 112, 121 and 222 are typical cells).
I want to write a for loop whose output is a vector of all the cells, that is, a vector of length 24.
vector1 <- factor(x = c(1,2,3))
vector2 <- factor(x = c(1,2))
vector3 <- factor(x = c(1,2,3,4,3,2))
df1 <- expand.grid(levels(vector1),levels(vector2),levels(vector3))
results <- paste0(df1$Var1,df1$Var2,df1$Var3)
factor(results)
Here's another solution using lapply:
create dummy data and assign it to a list
set.seed(10)
listofvectors<-list(
factor(sample(1:3,20, replace = TRUE)),
factor(sample(1:2,10, replace = TRUE)),
factor(sample(1:4,15, replace = TRUE))
)
generate a table of combinations
combinations <-expand.grid(lapply(listofvectors, levels))
combine row-wise then create vector
unlist(do.call("paste0", combinations))
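Since the question asked for a for loop, here is a sketch of what an explicit loop could look like. The names `x_levels`, `y_levels`, `z_levels` and `cells` are illustrative only. The loops are nested so that the first factor varies fastest, which is the same ordering expand.grid produces.

```r
# Levels of the three factors from the question
x_levels <- 1:3
y_levels <- 1:2
z_levels <- 1:4

# Preallocate the result: one entry per cell of the contingency table
cells <- character(length(x_levels) * length(y_levels) * length(z_levels))

k <- 0
for (z in z_levels) {
  for (y in y_levels) {
    for (x in x_levels) {
      k <- k + 1
      cells[k] <- paste0(x, y, z)  # e.g. "111", "211", ...
    }
  }
}
```

In practice the expand.grid versions above are shorter, but the loop makes the ordering of the 24 cells explicit.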
Hey, I have a bit of a misunderstanding and need a little guidance. I want to compute the correlation between a vector (or a data frame with one column) and each line of a data frame.
I made a graphic for a better understanding:
(image: https://ibb.co/51Fk5KB)
Every row has a date that matches a unique as.Date value in the other data frame. Because I want to compute the correlation in a rolling window of 12 months, I run:
df1 <- read.zoo(df1)
df2 <- read.zoo(df2)
new_df <- rollapplyr(??????????, 12, function(x) cor(x[, 1], x[, 2]), by.column = TRUE, fill = NA)
new_df <- fortify.zoo(new_df)
Now I ask you: what do I have to insert in the ?????????? spot? Or do I even have to change/add something else?
You can calculate the correlation between a vector and the columns of a data frame like so: cor(vector, dataframe)
Example
Create a vector and a data frame:
set.seed(1234)
vec <- (runif(150, 0, 10))
iris2 <- iris[,c(1:4)] # 150 x 4 dataframe
Now calculate correlations
cor(vec, iris2)
# Correlations
# -0.0187099581910839078691 -0.0233219261874525844724 -0.0063229780212239634907 0.0138003706052788940178
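For the rolling-window part of the question, the same cor(vector, matrix) call can be applied window by window. The following is a base R sketch on simulated data, not the asker's zoo objects: `vec`, `mat`, `width` and `roll_cor` are all made-up names, and each window is assumed to be right-aligned, meaning it ends at the current row.

```r
set.seed(42)
n <- 36                      # e.g. 36 monthly observations (simulated)
vec <- rnorm(n)              # the single series
mat <- matrix(rnorm(n * 3), ncol = 3,
              dimnames = list(NULL, c("a", "b", "c")))  # the other series
width <- 12                  # rolling window length

# For each row i, correlate the last `width` values of vec with each column
roll_cor <- t(sapply(seq_len(n), function(i) {
  if (i < width) return(rep(NA_real_, ncol(mat)))  # not enough data yet
  idx <- (i - width + 1):i
  as.vector(cor(vec[idx], mat[idx, , drop = FALSE]))
}))
colnames(roll_cor) <- colnames(mat)
```

The result has one row per time point and one column per series, with NA in the first `width - 1` rows.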
I cut test$income into 25 levels and stored the resulting interval boundaries in a variable called levels. I now want to cut train$income using the same intervals. I tried the code below, but I am not sure why some of the values in train$income were coerced to NA.
What went wrong? Is there a better way to do this? Thank you!
test$income <- cut(test$income,b=25)
levels <- c(-0.853,-0.586,-0.325,-0.0643,0.196,0.457,0.718,0.978,1.24,1.5,1.76,2.02,2.28,2.54,2.8,3.06,3.32,3.59,3.85,4.11,4.37,4.63,4.89,5.15,5.41,5.68)
train$income <- cut(train$income,levels)
As @JohnGilfillan says, one likely reason is that some values of train$income are greater than 5.68 or not greater than -0.853 (by default, cut intervals are open on the left). Those values become NA, while the rest are binned normally. Another possible reason (in other cases) is that a character vector was used to specify the breaks in your actual code (calling levels on a cut object returns a character vector). In that case you would get a vector with only NAs (written as <NA>).
The solution is to expand the extremes of your levels vector.
Try this:
set.seed(1)
a <- runif(100, -6, 6)
set.seed(2)
b <- runif(100, -6, 6)
levs <- levels(cut(a, 25))
levs <- gsub("\\(", "", levs)
levs <- gsub("\\]", "", levs)
levs <- c(as.numeric(sapply(strsplit(levs, ","), "[", 1)),
as.numeric(sapply(strsplit(levs, ","), "[", 2))[length(levs)])
cut.b <- cut(b, levs)
## Both NA values are outside levs
b[is.na(cut.b)]
cut.b.new <- cut(b, c(-6, levs[c(-1, -length(levs))], 6))
## No NAs
any(is.na(cut.b.new))
PS: Using function names as object names is not recommended, hence levs instead of levels.
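An alternative that avoids parsing the level labels is to build the numeric breaks once and reuse them for both vectors. This is a sketch on simulated data: `test_income`, `train_income`, `inner` and `breaks` are hypothetical names, and using -Inf and Inf as the outer breaks is an assumption about what should happen to out-of-range values (they fall into the first or last bin).

```r
set.seed(1)
test_income  <- runif(100, -6, 6)   # simulated stand-in for test$income
train_income <- runif(100, -8, 8)   # simulated, with a wider range

# 26 equally spaced points give 25 intervals over the range of the test data
inner <- seq(min(test_income), max(test_income), length.out = 26)

# Replace the two outer break points so nothing can fall outside
breaks <- c(-Inf, inner[-c(1, 26)], Inf)

test_bin  <- cut(test_income, breaks)
train_bin <- cut(train_income, breaks)
```

Both factors then share the same 25 levels, and no value is turned into NA.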
I've got a frame with a set of different variables - integers, factors, logicals - and I would like to recode all of the "NAs" as a numeric across the whole dataset while preserving the underlying variable class. For example:
frame <- data.frame("x" = rnorm(10), "y" = rep("A", 10))
frame[6,] <- NA
dat <- as.data.frame(apply(frame,2, function(x) ifelse(is.na(x)== TRUE, -9, x) ))
dat
str(dat)
However, here the integers turn into factors; when I include as.numeric(x) in the apply() function, this introduces errors. Thanks for any and all thoughts on how to deal with this.
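apply first coerces the data frame to a matrix; because of the non-numeric column, that matrix is of type character. as.data.frame then turns the character columns into factors by default (in R versions before 4.0.0). Instead, you could do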
dat <- as.data.frame(lapply(frame, function(x) ifelse(is.na(x), -9, x) ) )
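One caveat: ifelse applied to a factor column returns the underlying integer codes, so factor columns do not survive the call above unchanged. If factors need to stay factors, a small helper along these lines might work. It is only a sketch: `recode_na` is a made-up name, and storing the missing code as the factor level "-9" is an assumption about what is wanted.

```r
# Hypothetical helper: replace NA with `value` while keeping the column class
recode_na <- function(x, value = -9) {
  if (is.factor(x)) {
    # A factor can only hold existing levels, so add the new level first
    levels(x) <- c(levels(x), as.character(value))
    x[is.na(x)] <- as.character(value)
  } else {
    x[is.na(x)] <- value
  }
  x
}

frame <- data.frame(x = c(1.5, NA, 3), y = factor(c("A", NA, "A")))

# Assigning into dat[] keeps the result a data.frame
dat <- frame
dat[] <- lapply(frame, recode_na)
str(dat)
```

Logical columns cannot hold -9, so with this helper they would be coerced to numeric.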
a<- data.frame(sex=c(1,1,2,2,1,1),bq=factor(c(1,2,1,2,2,2)))
library(Hmisc)
label(a$sex)<-"gender"
label(a$bq)<-"xxx"
str(a)
b<-data.frame(lapply(a, as.character), stringsAsFactors=FALSE)
str(b)
When I convert the columns of data frame a to character, the column labels disappear. My data frame has many columns; the two here are only an example. How can I keep the column labels when converting numeric columns to character? Thank you!
Labels are not a commonly used R feature; they come from the Hmisc package, and as.character drops attributes. Unfortunately, you will have to copy them over yourself:
b <- data.frame(lapply(a, function(x) { y <- as.character(x); label(y) <- label(x); y }), stringsAsFactors = FALSE)
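Hmisc stores the label in an attribute named "label", so the same idea can be sketched in base R without loading the package. Here the labels are set by hand with attr, purely for illustration.

```r
a <- data.frame(sex = c(1, 1, 2, 2, 1, 1), bq = factor(c(1, 2, 1, 2, 2, 2)))

# Set the "label" attribute directly (what Hmisc's label<- stores)
attr(a$sex, "label") <- "gender"
attr(a$bq, "label") <- "xxx"

# Convert each column to character, then copy the label attribute across
b <- data.frame(lapply(a, function(x) {
  y <- as.character(x)
  attr(y, "label") <- attr(x, "label", exact = TRUE)
  y
}), stringsAsFactors = FALSE)

str(b)
```

Other attributes, such as the "labelled" class that Hmisc adds, are not copied in this sketch.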