This question already has answers here:
Replacing NAs with latest non-NA value
(21 answers)
Closed 4 years ago.
I'm struggling to solve this apparently simple problem in R, with no success so far.
I have a data.frame with a character variable that has some blank and some non-blank values. I'm trying to fill those blanks with the last non-blank value found in the same variable, going top-down, as in the following example for the variable 'Species' in the data.frames 'have' vs 'want'.
If someone could help, thanks in advance!
set.seed(12346)
foi <- split(iris, iris$Species)
want <- do.call("rbind", lapply(foi, function(x){
x[1:sample(1:10, 1), ]
}))
row.names(want) <- NULL
want$Species <- as.character(want$Species)
have <- want
have$Species[2:10] <- ""
have$Species[12:16] <- ""
have$Species[18:21] <- ""
head(have, 20)
head(want, 20)
A simple for loop, assuming the first value is not blank:
for(i in which(have$Species=="")) have$Species[i]=have$Species[i-1]
If speed is an issue and your file is huge, you could instead split the variable into blocks of consecutive blank values and fill each block with the nearest preceding non-blank value.
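The block idea can be sketched without an explicit loop. This is only one possible reading of it: `fill_down` is a hypothetical helper name, and the sketch assumes the column contains blanks ("") but no NA values.

```r
# Hypothetical helper: forward-fill blank strings with the last non-blank value.
# Assumes x is a character vector without NA.
fill_down <- function(x) {
  non_blank <- x != ""            # TRUE where a value is present
  block <- cumsum(non_blank)      # block id: 0 before the first value, then 1, 2, ...
  values <- c("", x[non_blank])   # slot 1 keeps any leading blanks blank
  values[block + 1]               # look up the value that opens each block
}

fill_down(c("setosa", "", "", "versicolor", ""))
```

With the question's data this would be applied as have$Species <- fill_down(have$Species).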
This question already has answers here:
Finding ALL duplicate rows, including "elements with smaller subscripts"
(9 answers)
Closed 9 months ago.
I have this code that generates random values as part of a sequence. I need to keep the duplicated values and remove the values that are NOT repeated. Any help with this?
Apparently, a solution is supposed to contain '%/%, as.numeric, names, table, >'
Here is the original code.
x <- sample(c(-10:10), sample(20:40, 1), replace = TRUE)
Any help would be appreciated!
I would use table(). It returns every distinct value together with the number of times it occurs.
vec <- c(1,2,3,4,5,3,4,5,4,5)
valueset <- unique(vec) # all unique values in vec, will be used later
# now we determine which values occur more than once
valuecounts <- table(vec) # count of each unique value; the values are now stored in the names as strings, regardless of the original data type
morethanone <- names(valuecounts)[valuecounts>1] # values with count > 1
morethanone <- as.numeric(morethanone) # converts the strings back to numeric
valueset[valueset %in% morethanone] # the original values that passed the above criterion
As a function....
duplicates <- function(vector){
  # Returns all unique values that occur more than once in a vector
  valueset <- unique(vector)
  valuecounts <- table(vector)
  morethanone <- names(valuecounts)[valuecounts>1]
  morethanone <- as.numeric(morethanone)
  return( valueset[valueset %in% morethanone] )
}
Now try running duplicates(x)
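The function above returns each repeated value once. If the goal is instead to keep every occurrence of the repeated values and drop only the values that appear a single time, a minimal sketch could look like this (`keep_repeated` is a hypothetical name; it uses base R's duplicated() rather than the table() route the question hints at):

```r
# Hypothetical helper: keep all elements whose value occurs more than once.
keep_repeated <- function(v) {
  # duplicated() flags second and later occurrences; scanning from the end
  # as well also flags the first occurrence of each repeated value.
  v[duplicated(v) | duplicated(v, fromLast = TRUE)]
}

vec <- c(1, 2, 3, 4, 5, 3, 4, 5, 4, 5)
keep_repeated(vec)   # 1 and 2 are dropped; every 3, 4 and 5 is kept
```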
This question already has answers here:
Update data frame via function doesn't work
(6 answers)
Closed 1 year ago.
I have written a function to change the non-NA values in each column into a new value. The following example illustrates the problem:
df <- data.frame(A=c(1,NA,1,1,NA),
B=c(NA,1,NA,1,NA),
C=c(1,NA,1,NA,1))
1's should be changed into 0's with the function:
cambio <- function(d,v){
d[[v]][!is.na(d[[v]])] <- 0
}
The column is named within the function with [[]], and it is passed with quotes as argument to the function. I learned this in a clear and useful response to the post Pass a data.frame column name to a function.
However, after running the function, for example, with the first variable,
cambio(df,"A")
the values of that column remain unchanged.
Why does this function not work as expected?
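You have
d[[v]][!is.na(d[[v]])] <- 0
This puts a zero on every non-NA value, which is what you want, but it only changes the function's local copy of d. The function never returns that copy, so df stays as it was. You're just missing the return(d) statement:
cambio <- function(d,v){
  d[[v]][!is.na(d[[v]])] <- 0
  return(d)
}
Then assign the result back, e.g. df <- cambio(df, "A").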
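As a quick check, here is a self-contained sketch using the question's data. It keeps the question's !is.na() condition, returns the modified copy, and assigns the result back to df, since R functions work on a copy of their arguments.

```r
df <- data.frame(A = c(1, NA, 1, 1, NA),
                 B = c(NA, 1, NA, 1, NA),
                 C = c(1, NA, 1, NA, 1))

cambio <- function(d, v) {
  d[[v]][!is.na(d[[v]])] <- 0   # changes the function's local copy only
  d                             # return the modified copy
}

df <- cambio(df, "A")           # assign the result back to update df
df$A
```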
Here are a few base R solutions:
one:
replace(df, df == 1, 0)
two:
replace(df, !is.na(df), 0)
three:
data.frame(lapply(df, pmin, 0))
This question already has answers here:
Replace all occurrences of a string in a data frame
(7 answers)
Closed 2 years ago.
I would like to replace a series of "99"s in my dataframe with NA. To do this for one column I am using the following line of code, which works just fine.
data$column[data$column == "99"] = NA
However, as I have a large number of columns I want to apply this to all columns. The following line of code isn't doing it. I assume it is because the third "x" is again a reference to the dataframe and not to a specific column.
data = lapply(data, function(x) {x[x == "99"] = NA})
Any advice on what I should change?
If you want to replace all 99, simply do
data[data=="99"] <- NA
If you want to stick with the apply function (note that apply() returns a matrix, not a data frame)
apply(data, 2, function(x) replace(x, x=="99", NA))
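The lapply() attempt in the question fails because the anonymous function returns the value of the assignment (NA) instead of the modified column. A minimal sketch of a corrected version follows; the example data is hypothetical, and data[] <- is used so that the result stays a data frame.

```r
# Hypothetical example data; the real `data` comes from the question.
data <- data.frame(a = c("1", "99", "3"),
                   b = c("99", "2", "99"),
                   stringsAsFactors = FALSE)

data[] <- lapply(data, function(x) {
  x[x == "99"] <- NA
  x                 # return the column, not the value of the assignment
})
data
```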
This question already has answers here:
Repeat each row of data.frame the number of times specified in a column
(10 answers)
Closed 4 years ago.
I have a data frame like this:
[image of the data frame, not available]
I would like to duplicate each row the number of times indicated in the column "nombreIndividus".
I tried rep() with each = and/or times =, but I can't get it to work.
Example :
incomeGlobalCopie <- incomeGlobalCopie[rep(1:nrow(incomeGlobalCopie),
each=incomeGlobalCopie$nombreIndividus)]
Can you help me?
Thanks
Completely inelegant, but it does the trick:
names <- c("lion","tiger","flamengo")
replication <- c(4,5,3)
species <- data.frame(names, replication)
speciesCopy <- data.frame(matrix(ncol=2,nrow=0))
for(i in 1:length(species$names)){
for(j in 1:species$replication[i]){
speciesCopy <- rbind(speciesCopy, species[i,])
}
}
speciesCopy
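The same result can be had without loops by indexing the rows with rep(). A sketch on the example data above: `idx` is a name introduced here. The attempt in the question used each =, which only takes a single number, and was missing the comma after the row index; times = accepts one count per row.

```r
species <- data.frame(names = c("lion", "tiger", "flamengo"),
                      replication = c(4, 5, 3))

# Repeat row number i exactly replication[i] times, then select those rows.
idx <- rep(seq_len(nrow(species)), times = species$replication)
speciesCopy <- species[idx, ]   # note the comma: rows are selected, all columns kept
speciesCopy
```

For the question's data frame the analogous call would be incomeGlobalCopie[rep(seq_len(nrow(incomeGlobalCopie)), times = incomeGlobalCopie$nombreIndividus), ].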
This question already has answers here:
Get the index of the values of one vector in another?
(3 answers)
Closed 5 years ago.
I would like to map every value in the dat vector to a corresponding value that I define in another vector (new_value). I already managed to do it somehow, but I would like a solution without a loop:
dat <- c("a","c","b","a","a","c","a","b","b","a")
old_value <- names(table(dat))
new_value <- 1:length(old_value)
new_dat <- rep(NA, length(dat))
for(z in 1:length(old_value)){
new_dat[dat==old_value[z]] <- c(1:length(new_value))[z]
}
new_dat
I don't want to use additional libraries. Please only base solutions.
We can use match
new_dat <- new_value[match(dat, old_value)]
For the current example, where new_value is just 1, 2, 3, ..., even
match(dat, old_value)
is enough
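To see why the new_value[...] lookup matters in general, here is a small sketch with replacement values that are not simply 1, 2, 3. The values 10, 20, 30 are made up for illustration.

```r
dat <- c("a", "c", "b", "a", "a", "c", "a", "b", "b", "a")
old_value <- c("a", "b", "c")
new_value <- c(10, 20, 30)   # hypothetical replacement values

# match() gives the position of each element of dat in old_value;
# indexing new_value by that position does the recoding in one step.
new_dat <- new_value[match(dat, old_value)]
new_dat
```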