This question already has an answer here:
Selecting only unique values from a comma separated string [duplicate]
(1 answer)
Closed 2 years ago.
I am looking to find the unique values within each row of a column.
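# The three vectors have different lengths, so rbind() recycles the
# shorter ones (with a warning) to fill six columns, V1 to V6.
df <- as.data.frame(rbind(c('10','20','30','10','45','34'),
                          c('a','b','c','a','b'),
                          c("fs","pp","dd","dd")))
# Paste the six columns into one comma-separated string per row.
df$f7 <- paste(df$V1, df$V2, df$V3, df$V4, df$V5, df$V6, sep = ",")
df_1 <- as.data.frame(df[, 7])
names(df_1)[1] <- "f1"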
The expected output is:
Row1: 10,20,30,45,34
Row2: a,b,c
Row3: fs,pp,dd
Any help is highly appreciated.
Regards,
R
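We can loop over the rows with apply (MARGIN = 1 gives a row-wise loop), take the unique values of each row, and paste them together with toString. Apply it to the original columns V1 to V6 only, so that the pasted f7 column is not counted as one more value:
apply(df[paste0("V", 1:6)], 1, FUN = function(x) toString(unique(x)))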
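If only the comma-separated string column is available (as in df_1 from the question), the same idea can be applied after splitting each string. The following is a minimal sketch, not part of the original answer; the column name unique_f1 is made up for illustration, and paste(collapse = ",") is used instead of toString so the result has no spaces, matching the expected output.

```r
# Single-column data frame holding comma-separated strings,
# mirroring df_1$f1 from the question.
df_1 <- data.frame(
  f1 = c("10,20,30,10,45,34", "a,b,c,a,b", "fs,pp,dd,dd"),
  stringsAsFactors = FALSE
)

# Split each string on commas, keep the first occurrence of each value,
# and collapse the survivors back into one string per row.
# unique_f1 is a hypothetical column name chosen for this sketch.
df_1$unique_f1 <- vapply(
  strsplit(df_1$f1, ",", fixed = TRUE),
  function(parts) paste(unique(parts), collapse = ","),
  character(1)
)

print(df_1$unique_f1)
```

vapply is used rather than sapply so that the result is guaranteed to be a character vector with one element per row.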
This question already has answers here:
Get rank for every column using dplyr
(1 answer)
Adding multiple ranking columns in a dataframe in R
(3 answers)
Closed last month.
Given a data frame such as the following, how do I get a rank column (i.e. an integer column ranking the values in descending order as 1, 2, 3, ...) for every column without writing out every single column?
df <- data.frame(
col1 = rnorm(100),
col2 = rnorm(100),
col3 = rnorm(100),
col4 = rnorm(100))
rank_col1
rank_col2
rank_col3
etc...
Is this what you want?
df <- cbind(df, as.data.frame(apply(df, 2, rank)))
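Note that rank() ranks in ascending order by default and that apply() keeps the original column names, so the one-liner above produces ascending ranks in columns that share names with the originals. A possible variant that ranks in descending order and uses the rank_ prefix from the question is sketched below. The seed, the smaller data frame, and the name df_ranked are assumptions made only for this illustration.

```r
set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible
df <- data.frame(col1 = rnorm(5), col2 = rnorm(5), col3 = rnorm(5))

# rank(-x) ranks in descending order: the largest value gets rank 1.
# ties.method = "first" breaks ties by position instead of averaging.
ranks <- as.data.frame(lapply(df, function(x) rank(-x, ties.method = "first")))

# Prefix the new columns so they do not clash with the originals.
names(ranks) <- paste0("rank_", names(df))

df_ranked <- cbind(df, ranks)
print(names(df_ranked))
```

Using lapply() instead of apply() avoids converting the data frame to a matrix first, which matters if the columns are not all of the same type.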
This question already has an answer here:
How to replace outlier values?
(1 answer)
Closed 1 year ago.
I want to find all the outliers in a dataframe and replace them by the mean of the variable (column).
This is a big dataframe, composed of 46 obs. of 147 variables.
I was thinking of doing something like
new_df <- for (i in scaled.df) {
  i[!i %in% boxplot.stats(i)$out]
}
and then replacing the NULL values, but that code creates a NULL object. I believe the reason is that the new vectors created won't have the same length.
Any ideas? Thx
You can write a function to do this -
replace_outlier_with_mean <- function(x) {
replace(x, x %in% boxplot.stats(x)$out, mean(x))
}
To apply for multiple columns you can use lapply -
scaled.df[] <- lapply(scaled.df, replace_outlier_with_mean)
Or in dplyr -
library(dplyr)
scaled.df %>% mutate(across(everything(), replace_outlier_with_mean))
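To see what the function does, here is a small worked sketch on a toy data frame (the data and the names toy, res_mean and res_clean are invented for illustration). It also shows a hypothetical variant, not part of the answer above, that averages only the non-outlying values, since mean(x) in the original function still includes the outliers it is replacing.

```r
# Function from the answer: outliers are replaced by the mean of the
# whole column (outliers included in that mean).
replace_outlier_with_mean <- function(x) {
  replace(x, x %in% boxplot.stats(x)$out, mean(x))
}

# Hypothetical variant: use the mean of the non-outlying values only.
replace_outlier_with_clean_mean <- function(x) {
  is_out <- x %in% boxplot.stats(x)$out
  replace(x, is_out, mean(x[!is_out], na.rm = TRUE))
}

# Toy data: column a has one obvious outlier (100), column b has none.
toy <- data.frame(a = c(1, 2, 3, 4, 100), b = c(10, 11, 12, 13, 14))

res_mean <- toy
res_mean[] <- lapply(toy, replace_outlier_with_mean)

res_clean <- toy
res_clean[] <- lapply(toy, replace_outlier_with_clean_mean)

print(res_mean$a)   # 100 is replaced by mean(c(1, 2, 3, 4, 100)) = 22
print(res_clean$a)  # 100 is replaced by mean(c(1, 2, 3, 4)) = 2.5
```

Assigning into `res_mean[]` keeps the data frame structure, whereas assigning the lapply() result directly would leave a plain list. If the columns contain NA values, mean(x) in the first function returns NA unless na.rm = TRUE is added.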
This question already has answers here:
Extract columns from data table by numeric indices stored in a vector
(2 answers)
Closed 1 year ago.
Given a data.table, how can I select a set of columns using a variable?
Example:
df[, 1:3]
is OK, but
idx <- 1:3
df[, idx]
is not OK: column named "idx" does not exist.
How can I use idx to select the columns in the simplest possible way?
In data.table, we can either prefix the variable name with .. so that idx is looked up in the calling scope rather than among the columns, or pass with = FALSE:
library(data.table)
df[, ..idx]
df[, idx, with = FALSE]
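A self-contained sketch of both forms follows, assuming the data.table package is installed. The table dt and its column names are made up for illustration. The `.SDcols` form is a third option that was not mentioned in the answer.

```r
library(data.table)

# Hypothetical table used only for this illustration.
dt <- data.table(w = 1:3, x = letters[1:3], y = c(TRUE, FALSE, TRUE), z = 4:6)
idx <- 1:3

# The .. prefix tells data.table to look idx up outside the table.
by_prefix <- dt[, ..idx]

# with = FALSE makes j behave like a plain column index, as in data.frame.
by_with <- dt[, idx, with = FALSE]

# .SDcols restricts .SD to the selected columns.
by_sdcols <- dt[, .SD, .SDcols = idx]

# The same forms accept a character vector of column names.
cols <- c("w", "y")
by_name <- dt[, ..cols]

print(names(by_prefix))
print(names(by_name))
```

All three index-based calls should return the first three columns as a data.table.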
This question already has answers here:
Replace all occurrences of a string in a data frame
(7 answers)
Closed 2 years ago.
I would like to replace a series of "99"s in my dataframe with NA. To do this for one column I am using the following line of code, which works just fine.
data$column[data$column == "99"] = NA
However, as I have a large number of columns I want to apply this to all columns. The following line of code isn't doing it. I assume it is because the third "x" is again a reference to the dataframe and not to a specific column.
data = lapply(data, function(x) {x[x == "99"] = NA})
Any advice on what I should change?
If you want to replace all 99, simply do
data[data=="99"] <- NA
If you want to stick with the apply family (note that apply returns a matrix rather than a data frame):
apply(data, 2, function(x) replace(x, x=="99", NA))
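The lapply attempt from the question fails because the anonymous function returns the value of the assignment (NA) rather than the modified column, and because assigning the result to data replaces the data frame with a plain list. A minimal corrected sketch, using made-up toy data, could look like this:

```r
# Toy data frame invented for this illustration.
data <- data.frame(
  a = c("1", "99", "3"),
  b = c("99", "5", "99"),
  stringsAsFactors = FALSE
)

# Return x explicitly after modifying it, and assign into data[] so the
# result stays a data frame instead of becoming a list.
data[] <- lapply(data, function(x) {
  x[x == "99"] <- NA
  x
})

print(data)
```

Unlike apply(), this keeps each column's original type, because the data frame is never converted to a matrix.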
This question already has answers here:
How do I split a data frame among columns, say at every nth column?
(1 answer)
What is the algorithm behind R core's `split` function?
(1 answer)
Closed 4 years ago.
Is there an easy way in base R to split a data frame into a list of data frames, column-wise, based on an index of factor levels (taken from another data frame)?
For example,
x = data.frame(num1 = 1:26, let = letters, num2 = 10:35, LET = LETTERS)
ls = list(x[, 1:2], x[, 3:4])
But let's say we had an index indicating factor levels for the columns. Can split be used?
indx = c(1,1,2,2)
? split(x, indx)
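Use the default method of split. Calling split on a data frame dispatches to split.data.frame, which splits by rows, whereas calling split.default directly treats the data frame as a list of columns and groups those columns by indx: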
out <- split.default(x, indx)
identical(ls, setNames(out, NULL))
#[1] TRUE
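The same call also works with a character or factor index, in which case the list elements are named after the levels. A small sketch with invented data and group labels:

```r
# Toy data frame and group labels invented for this illustration.
x <- data.frame(
  id    = 1:3,
  name  = c("a", "b", "c"),
  score = c(0.5, 0.7, 0.9),
  flag  = c(TRUE, FALSE, TRUE)
)

# One label per column: the first two columns go to "meta",
# the last two to "values".
groups <- c("meta", "meta", "values", "values")

# split.default groups the columns (not the rows) by the labels.
out <- split.default(x, groups)

print(names(out))
print(names(out$meta))
print(names(out$values))
```

Each element of out is itself a data frame with all the original rows and only the columns belonging to that group.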