How to add jitter in a data frame in R

Input:
df = data.frame(col1 = 1:5, col2 = 5:9)
rownames(df) <- letters[1:5]
#add jitter
jitter(df) #Error in jitter(df) : 'x' must be numeric
Expected output: jitter will be added to the columns of df. Thanks!

jitter is a function that takes a numeric vector as input, so you cannot simply run it on a whole data.frame. You need to loop through the columns. You can do:
data.frame(lapply(df, jitter))
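One thing to watch for: data.frame(lapply(df, jitter)) rebuilds the data frame from a plain list, so the row names set in the question are replaced by default ones. A minimal sketch that assigns the jittered columns back into a copy instead, and skips non-numeric columns (the name df_jit and the extra label column are hypothetical, added only for illustration):

```r
# Example data from the question, plus a hypothetical non-numeric column
df <- data.frame(col1 = 1:5, col2 = 5:9, label = letters[5:1])
rownames(df) <- letters[1:5]

# Work on a copy so the original stays untouched
df_jit <- df

# Find the numeric columns and jitter only those;
# assigning into df_jit[...] keeps row names and column order
num_cols <- vapply(df_jit, is.numeric, logical(1))
df_jit[num_cols] <- lapply(df_jit[num_cols], jitter)

df_jit
```

Because the result is assigned into an existing data frame rather than built from scratch, the row names a to e and the non-numeric column survive unchanged.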

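jitter is meant to be applied to a numeric vector, not a data frame.
If you want to apply jitter to all your columns, this should do it (note that apply returns a matrix, so wrap the result in as.data.frame() if you need a data frame):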
apply(df, 2, jitter)

Just adding random numbers?
df_jit <- df + matrix(rnorm(nrow(df) * ncol(df), sd = 0.1), ncol = ncol(df))

Related

Create a vector of factors with multiple level

I am trying to write an R program that can create a vector.
Suppose I have 3 factors (X with levels 1,2,3, Y with levels 1,2, and Z with levels 1,2,3,4). If I want to represent them in a contingency table, there are 3x2x4=24 cells (for example, 111, 112, 121, and 222 are typical cells).
I want to write a for loop whose output is a vector of all cells, that is, a vector of length 24.

vector1 <- factor(x = c(1,2,3))
vector2 <- factor(x = c(1,2))
vector3 <- factor(x = c(1,2,3,4,3,2))
df1 <- expand.grid(levels(vector1),levels(vector2),levels(vector3))
results <- paste0(df1$Var1,df1$Var2,df1$Var3)
factor(results)

Here's another solution using lapply:
create dummy data and assign it to a list
set.seed(10)
listofvectors<-list(
factor(sample(1:3,20, replace = TRUE)),
factor(sample(1:2,10, replace = TRUE)),
factor(sample(1:4,15, replace = TRUE))
)
generate a table of combinations
combinations <-expand.grid(lapply(listofvectors, levels))
combine row-wise then create vector
unlist(do.call("paste0", combinations))
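Since the question asks specifically for a for loop, here is a rough sketch of the same result with nested loops. The variable names (x_levels, y_levels, z_levels, cells) are made up for illustration. Note that the order differs from expand.grid, which varies the first factor fastest, while this loop varies the last one fastest.

```r
# Levels of the three factors from the question
x_levels <- 1:3
y_levels <- 1:2
z_levels <- 1:4

# Pre-allocate the result: one entry per cell of the contingency table
cells <- character(length(x_levels) * length(y_levels) * length(z_levels))

k <- 0
for (x in x_levels) {
  for (y in y_levels) {
    for (z in z_levels) {
      k <- k + 1
      cells[k] <- paste0(x, y, z)  # e.g. "111", "112", ...
    }
  }
}

cells <- factor(cells)
length(cells)  # 24
```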

How to compute the correlation between a vector and each column of a data.frame

Hey, I have a bit of a misunderstanding and need some guidance. I want to compute the correlation between a vector (or a df with 1 column) and each column of a data frame.
I made a graphic for a better understanding:
(Image: https://ibb.co/51Fk5KB)
All rows have a date and fit to a unique as.Date of the other dataframe. Because I want to compute it in a rolling window of 12 months I run:
df1 <- read.zoo(df1)
df2 <- read.zoo(df2)
new_df <- rollapplyr(??????????, 12, function(x) cor(x[, 1], x[, 2]), by.column = TRUE, fill = NA)
new_df <- fortify.zoo(new_df)
Now I ask you: what do I have to insert in the ?????????? spot? Or do I even have to change/add something else?

You can calculate the correlation between a vector and the columns of a data frame like so: cor(vector, dataframe)
Example
Create a vector and dataframe :
set.seed(1234)
vec <- (runif(150, 0, 10))
iris2 <- iris[,c(1:4)] # 150 x 4 dataframe
Now calculate correlations
cor(vec, iris2)
# Correlations
# -0.0187099581910839078691 -0.0233219261874525844724 -0.0063229780212239634907 0.0138003706052788940178
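For the rolling 12-month part of the question, one possible sketch in base R is shown here. It does not use zoo's rollapplyr from the question; it simply loops over window end points and calls cor(vector, dataframe) on each window. The data and the names vec, dat, width, and roll_cor are all hypothetical.

```r
set.seed(1234)
n <- 36                                  # e.g. 36 monthly observations
vec <- rnorm(n)                          # the single series
dat <- data.frame(a = rnorm(n), b = rnorm(n))  # the columns to correlate against
width <- 12                              # rolling window length

# For each end point i, correlate the last `width` observations of vec
# with the same rows of every column of dat; NA until a full window exists
roll_cor <- t(vapply(seq_len(n), function(i) {
  if (i < width) return(rep(NA_real_, ncol(dat)))
  idx <- (i - width + 1):i
  as.numeric(cor(vec[idx], dat[idx, ]))
}, numeric(ncol(dat))))
colnames(roll_cor) <- names(dat)

head(roll_cor, 14)
```

The result has one row per date and one column per column of dat, right-aligned like rollapplyr with fill = NA.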

Cut a column based on intervals of another column in r

I cut test$income into 25 levels and stored the resulting interval boundaries in a variable called levels. I now wish to cut train$income based on the same intervals. I tried the code below, but I am not sure why some of my values in train$income were coerced to NA.
What went wrong? Is there a better way to do this? Thank you!
test$income <- cut(test$income,b=25)
levels <- c(-0.853,-0.586,-0.325,-0.0643,0.196,0.457,0.718,0.978,1.24,1.5,1.76,2.02,2.28,2.54,2.8,3.06,3.32,3.59,3.85,4.11,4.37,4.63,4.89,5.15,5.41,5.68)
train$income <- cut(train$income,levels)

As @JohnGilfillan says, one reason can be that some values of train$income are higher than 5.68 or lower than -0.853. In this case those values would become NA, while the others would be assigned to an interval. This is the likely case, but another reason (in another instance) could be that you used a character vector to specify the breaks in your actual code (levels() of a cut object returns a character vector). In that case you would get a vector with only NAs (printed as <NA>).
The solution is to expand the extremes of your levels vector.
Try this:
set.seed(1)
a <- runif(100, -6, 6)
set.seed(2)
b <- runif(100, -6, 6)
levs <- levels(cut(a, 25))
levs <- gsub("\\(", "", levs)
levs <- gsub("\\]", "", levs)
levs <- c(as.numeric(sapply(strsplit(levs, ","), "[", 1)),
as.numeric(sapply(strsplit(levs, ","), "[", 2))[length(levs)])
cut.b <- cut(b, levs)
## Both NA values are outside levs
b[is.na(cut.b)]
cut.b.new <- cut(b, c(-6, levs[c(-1, -length(levs))], 6))
## No NAs
any(is.na(cut.b.new))
PS: It is not recommended to use function names as object names. Therefore levs instead of levels.
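An alternative sketch that avoids parsing the level labels: build the numeric breaks yourself and open the two outer intervals to -Inf and Inf, so that no value in either vector can fall outside. This is a different technique from the one above, and the breaks are not exactly the ones cut(x, 25) would pick, since cut extends the range slightly. All names (test_income, train_income, brks) are made up for illustration.

```r
set.seed(1)
test_income  <- runif(100, -6, 6)
train_income <- runif(100, -7, 7)   # deliberately wider than the test range

# 26 equally spaced break points give 25 intervals
brks <- seq(min(test_income), max(test_income), length.out = 26)

# Open the outermost intervals so nothing can fall outside
brks[1] <- -Inf
brks[length(brks)] <- Inf

test_bin  <- cut(test_income, brks)
train_bin <- cut(train_income, brks)

# Both factors share the same 25 levels and contain no NAs
any(is.na(train_bin))
```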

How to change values in data frame by column class in R

I've got a frame with a set of different variables - integers, factors, logicals - and I would like to recode all of the "NAs" as a numeric across the whole dataset while preserving the underlying variable class. For example:
frame <- data.frame("x" = rnorm(10), "y" = rep("A", 10))
frame[6,] <- NA
dat <- as.data.frame(apply(frame,2, function(x) ifelse(is.na(x)== TRUE, -9, x) ))
dat
str(dat)
However, here the integers turn into factors; when I include as.numeric(x) in the apply() function, this introduces errors. Thanks for any and all thoughts on how to deal with this.

apply returns a matrix of type character. as.data.frame turns this into factors by default (in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE). Instead, you could do
dat <- as.data.frame(lapply(frame, function(x) ifelse(is.na(x), -9, x) ) )
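If the goal is to keep each column's class, a possible variation is to recode only the numeric columns and assign them back in place. This is a sketch under the assumption that non-numeric columns should keep their NAs; the example data and the name is_num are hypothetical.

```r
# Hypothetical frame with double, integer, logical and character columns
frame <- data.frame(
  x    = c(1.5, NA, 3),
  n    = c(1L, NA, 3L),
  flag = c(TRUE, NA, FALSE),
  y    = c("A", NA, "B"),
  stringsAsFactors = FALSE
)

# Recode NA as -9 in numeric columns only, matching the column's own type
is_num <- vapply(frame, is.numeric, logical(1))
frame[is_num] <- lapply(frame[is_num], function(col) {
  col[is.na(col)] <- if (is.integer(col)) -9L else -9
  col
})

str(frame)
```

Replacing values by index, rather than through ifelse, leaves the type of each column as it was.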

How to keep columns labels when numeric convert to character

a<- data.frame(sex=c(1,1,2,2,1,1),bq=factor(c(1,2,1,2,2,2)))
library(Hmisc)
label(a$sex)<-"gender"
label(a$bq)<-"xxx"
str(a)
b<-data.frame(lapply(a, as.character), stringsAsFactors=FALSE)
str(b)
When I convert the columns of data frame a to character, the column labels disappear. My data frame has many columns; the example here has only two. How can I keep the column labels when converting numeric columns to character? Thank you!

Labels are not a commonly used R feature. Unfortunately, you will have to do it yourself:
b <- data.frame(lapply(a, function(x) { y <- as.character(x); label(y) <- label(x); y }), stringsAsFactors = FALSE)
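The same idea can be sketched without loading Hmisc, assuming the label is stored in an attribute called "label" on each column (which is, as far as I know, where Hmisc's label() keeps it). The example below uses only base R and sets that attribute by hand.

```r
a <- data.frame(sex = c(1, 1, 2), bq = factor(c(1, 2, 1)))

# Stand-in for Hmisc's label(): set the "label" attribute directly (assumption)
attr(a$sex, "label") <- "gender"
attr(a$bq,  "label") <- "xxx"

# Convert every column to character, then copy the label attribute across
b <- a
b[] <- lapply(a, function(x) {
  y <- as.character(x)
  attr(y, "label") <- attr(x, "label")
  y
})

str(b)
```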
