Apply a function to each column in a dataframe in R [duplicate] - r

This question already has answers here:
Replace all occurrences of a string in a data frame
(7 answers)
Closed 2 years ago.
I would like to replace a series of "99"s in my dataframe with NA. To do this for one column I am using the following line of code, which works just fine.
data$column[data$column == "99"] = NA
However, as I have a large number of columns I want to apply this to all columns. The following line of code isn't doing it. I assume it is because the third "x" is again a reference to the dataframe and not to a specific column.
data = lapply(data, function(x) {x[x == "99"] = NA})
Any advice on what I should change?

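To replace every "99" in the data frame, index the whole data frame at once:
data[data=="99"] <- NA
If you want to stick to the apply function, note that apply() returns a matrix rather than a data frame, so convert the result back if you need one:
apply(data, 2, function(x) replace(x, x=="99", NA))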
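The lapply attempt in the question fails because the anonymous function returns the value of its last expression, which is the assigned NA, not the modified column. A minimal sketch of a fix follows, assuming a small made-up data frame (the column names a and b are hypothetical): return x explicitly, and assign into data[] so the result stays a data frame.

```r
# Hypothetical example data; the column names are made up for illustration.
data <- data.frame(a = c("1", "99", "3"),
                   b = c("99", "5", "99"),
                   stringsAsFactors = FALSE)

# Return the modified column explicitly; assigning into data[] keeps
# the data.frame structure instead of replacing it with a plain list.
data[] <- lapply(data, function(x) {
  x[x == "99"] <- NA
  x
})

print(data)
```

Unlike apply(), this keeps each column's original type, because the columns are never coerced to a common matrix type.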

Related

Replace outliers of a dataframe with the mean value [duplicate]

This question already has an answer here:
How to replace outlier values?
(1 answer)
Closed 1 year ago.
I want to find all the outliers in a dataframe and replace them by the mean of the variable (column).
This is a big dataframe, composed of 46 obs. of 147 variables.
I was thinking of doing something like
new_df <- for (i in scaled.df){
i[!i %in% boxplot.stats(i)$out]
And then replace the NULL values, but that loop produces a NULL object. I believe the reason is that the new vectors won't have the same length.
Any ideas? Thanks.
You can write a function to do this -
replace_outlier_with_mean <- function(x) {
  replace(x, x %in% boxplot.stats(x)$out, mean(x))
}
To apply for multiple columns you can use lapply -
scaled.df[] <- lapply(scaled.df, replace_outlier_with_mean)
Or in dplyr -
library(dplyr)
scaled.df %>% mutate(across(everything(), replace_outlier_with_mean))
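A small worked example may make the behaviour easier to check. This sketch uses a hypothetical toy data frame (toy, with made-up columns a and b) rather than the asker's scaled.df. Note that mean(x) is computed over the whole column, so the outlier itself contributes to the replacement value.

```r
replace_outlier_with_mean <- function(x) {
  replace(x, x %in% boxplot.stats(x)$out, mean(x))
}

# Hypothetical toy data; 100 is the only value boxplot.stats() flags in column a.
toy <- data.frame(a = c(1, 2, 3, 4, 100),
                  b = c(5, 6, 7, 8, 9))

# toy[] keeps the data.frame structure while every column is replaced.
toy[] <- lapply(toy, replace_outlier_with_mean)
print(toy)
```

Here the mean of column a is (1 + 2 + 3 + 4 + 100) / 5 = 22, so 100 becomes 22, while column b is left unchanged.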

unique values from one column from each row [duplicate]

This question already has an answer here:
Selecting only unique values from a comma separated string [duplicate]
(1 answer)
Closed 2 years ago.
I am looking to find the unique values within each row of a column.
df <- as.data.frame(rbind(c('10','20','30','10','45','34'),
c('a','b','c','a','b'),
c("fs","pp","dd","dd")))
df$f7 <-paste0(df$V1,
',',
df$V2,
',',
df$V3,',',df$V4,',',df$V5,',',df$V6)
df_1 <- as.data.frame(df[,c(7)])
names(df_1)[1] <-"f1"
The expected output is :
Row1 :10,20,30,45,34
Row2: a,b,c
Row3:fs,pp,dd
Any help is highly appreciated.
Regards,
R
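We can loop over the rows with apply (MARGIN = 1 for a row-wise loop), get the unique values of each row, and paste them together with toString. Run it on the six original columns only, so that the pasted f7 column is not counted as a value:
apply(df[1:6], 1, FUN = function(x) toString(unique(x)))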
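If only the combined comma-separated column is available (as in df_1 from the question), the same idea can be applied to the strings directly. The following is a sketch under that assumption; the data frame below is a hand-written stand-in for df_1, and the column name unique_f1 is made up for illustration.

```r
# Hypothetical stand-in for df_1 from the question:
# one comma-separated string per row.
df_1 <- data.frame(f1 = c("10,20,30,10,45,34",
                          "a,b,c,a,b,a",
                          "fs,pp,dd,dd,fs,pp"),
                   stringsAsFactors = FALSE)

# Split each string on commas, drop repeated values (keeping first
# occurrences in order), and glue the remainder back together.
df_1$unique_f1 <- vapply(strsplit(df_1$f1, ",", fixed = TRUE),
                         function(parts) paste(unique(parts), collapse = ","),
                         character(1))

print(df_1$unique_f1)
```

paste(..., collapse = ",") is used here instead of toString() because toString() separates values with a comma followed by a space.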

How to loop through a vector of data frame names to print first columns of the df's? [duplicate]

This question already has answers here:
How to extract certain columns from a list of data frames
(3 answers)
Closed 2 years ago.
x is a character vector. I am trying to print the first column of each data frame whose name is stored in the vector. So far I have tried the code below, but it doesn't seem to work.
x = (c('Ethereum,another Df..., another DF...,'))
for (i in x){
print(i[,1])
}
sapply(toString(Ethereum), function(i) print(i[1]))
You can use get() to look up each data frame by its name:
x <- c('Ethereum','anotherDf',...)
for (i in x){
  print(get(i)[,1])
}
You can use mget to collect the data frames in a list, then use lapply to extract the first column of each data frame in the list.
data <- lapply(mget(x), `[`, 1)
#Use `[[` to get it as vector.
#data <- lapply(mget(x), `[[`, 1)
A similar solution using purrr::map:
data <- purrr::map(mget(x), `[`, 1)
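To make the mget approach concrete, here is a self-contained sketch. The two data frames and their columns are hypothetical stand-ins, since the question does not show the real data, and first_cols is a made-up variable name.

```r
# Hypothetical data frames standing in for the ones named in the question.
Ethereum  <- data.frame(price = c(1.5, 2.5), volume = c(10, 20))
anotherDf <- data.frame(id = c("a", "b"), value = c(3, 4),
                        stringsAsFactors = FALSE)

# The vector holds the *names* of the data frames, one name per element.
x <- c("Ethereum", "anotherDf")

# mget() looks up each name and returns a named list of data frames;
# `[[` with index 1 pulls the first column out as a plain vector.
first_cols <- lapply(mget(x), `[[`, 1)
print(first_cols)
```

Note that each name must be its own element of x. A single string containing several comma-separated names, as in the question, is one name as far as get() and mget() are concerned.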

Duplicate line according to 1 column of a dataframe in R [duplicate]

This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 4 years ago.
I have a dataframe like this (the original screenshot is not reproduced here).
I would like to duplicate each line the number of times indicated in the column "nombreIndividus".
I tried rep() with each = and/or times =, but I can't get it to work.
Example :
incomeGlobalCopie <- incomeGlobalCopie[rep(1:nrow(incomeGlobalCopie),
each=incomeGlobalCopie$nombreIndividus)]
Can you help me ?
Thanks
Completely inelegant, but it does the trick:
names <- c("lion","tiger","flamengo")
replication <- c(4,5,3)
species <- data.frame(names, replication)
speciesCopy <- data.frame(matrix(ncol=2,nrow=0))
for(i in 1:length(species$names)){
  for(j in 1:species$replication[i]){
    speciesCopy <- rbind(speciesCopy, species[i,])
  }
}
speciesCopy
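The nested loop can also be written as a single indexing step. This sketch reuses the toy species data from the answer above and repeats each row index with rep(..., times = ). Compared with the attempt in the question, it uses times = instead of each = (each = expects a single number) and includes the comma that selects rows rather than columns.

```r
# Same toy data as in the loop-based answer.
species <- data.frame(names = c("lion", "tiger", "flamengo"),
                      replication = c(4, 5, 3),
                      stringsAsFactors = FALSE)

# rep(..., times = ) repeats row index i exactly replication[i] times;
# the trailing comma inside [ ] selects rows, keeping all columns.
speciesCopy <- species[rep(seq_len(nrow(species)),
                           times = species$replication), ]

# Reset the row names, which otherwise come out as 1, 1.1, 1.2, ...
rownames(speciesCopy) <- NULL
print(speciesCopy)
```

Applied to the data in the question, the same pattern would presumably use incomeGlobalCopie$nombreIndividus as the times = argument.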

Complete blanks by last non-blank value in R [duplicate]

This question already has answers here:
Replacing NAs with latest non-NA value
(21 answers)
Closed 4 years ago.
I'm struggling to solve this apparently simple problem in R, but have had no success so far.
I have a data.frame with a character variable that has some blank and some non-blank values. I'm trying to fill those blanks with the last non-blank value found in the same variable, going from top to bottom, as in the following example for the variable 'Species' in the data.frames 'want' vs 'have'.
If someone could help, thanks in advance!
set.seed(12346)
foi <- split(iris, iris$Species)
want <- do.call("rbind", lapply(foi, function(x){
  x[1:sample(1:10, 1), ]
}))
row.names(want) <- NULL
want$Species <- as.character(want$Species)
have <- want
have$Species[2:10] <- ""
have$Species[12:16] <- ""
have$Species[18:21] <- ""
head(have, 20)
head(want, 20)
A simple for loop, assuming the first value is non-missing:
for(i in which(have$Species=="")) have$Species[i]=have$Species[i-1]
If speed is an issue and your file is huge, you could instead split the variable into blocks of consecutive blank values and fill each block with the nearest preceding non-blank value.
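One possible vectorised reading of that block-based idea is sketched below. It is an assumption about what the answer has in mind, not the answerer's own code, and fill_blanks is a hypothetical helper name. cumsum() over the non-blank positions numbers the blocks, and each element then looks up the value that opened its block.

```r
# Hypothetical helper: fill "" entries with the last non-blank value above them.
fill_blanks <- function(x) {
  non_blank <- x != ""
  # Block id: 0 before the first non-blank value, then 1, 2, ...
  block <- cumsum(non_blank)
  # Prepend "" so that block 0 (leading blanks) stays blank.
  c("", x[non_blank])[block + 1]
}

# Small made-up vector for illustration.
species <- c("setosa", "", "", "versicolor", "", "virginica")
filled <- fill_blanks(species)
print(filled)
```

With the data from the question, this would be used as have$Species <- fill_blanks(have$Species). The sketch assumes blanks are empty strings and that the vector contains no NA values.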
