I have a data set:
           x y z
1      apple a 4
2     orange d 3
3     banana b 2
4 strawberry c 1
How can I change the name "banana" to "grape"? I want to get:
           x y z
1      apple a 4
2     orange d 3
3      grape b 2
4 strawberry c 1
Reproducible code:
example <- data.frame(x = c("apple", "orange", "banana", "strawberry"), y = c("a", "d", "b", "c"), z = 4:1)
Below is a solution using the tidyverse:
library(tidyverse)
example %>%
  mutate(x = as.character(x)) %>%
  mutate(x = replace(x, x == 'banana', 'grape'))
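For comparison, the same replacement can be sketched in base R with logical indexing, with no packages loaded. This assumes x is a character column; stringsAsFactors = FALSE is set explicitly (an addition not in the original code) so the sketch also behaves on R versions before 4.0, where strings became factors by default.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps x as character
example <- data.frame(
  x = c("apple", "orange", "banana", "strawberry"),
  y = c("a", "d", "b", "c"),
  z = 4:1,
  stringsAsFactors = FALSE
)

# Overwrite only the elements of x that equal "banana"
example$x[example$x == "banana"] <- "grape"
example
```

Unlike the piped version, this modifies example in place, so no reassignment is needed.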
I have to import a table that looks like the following data frame:
> df = data.frame(x = c("a", "a.b","a.b.c","a.b.d", "a.d"))
> df
      x
1     a
2   a.b
3 a.b.c
4 a.b.d
5   a.d
I'd like to split the first column into one or more columns, depending on how many separators each value contains.
The output should look like this:
> df_separated
  col1 col2 col3
1    a <NA> <NA>
2    a    b <NA>
3    a    b    c
4    a    b    d
5    a    d <NA>
I tried the separate function from tidyr, but it requires me to specify in advance how many output columns I need.
Thank you very much for your help
You can first count the maximum number of columns needed (the largest number of separators in any value, plus one) and then use separate.
nmax <- max(stringr::str_count(df$x, "\\.")) + 1
tidyr::separate(df, x, paste0("col", seq_len(nmax)), sep = "\\.", fill = "right")
#  col1 col2 col3
#1    a <NA> <NA>
#2    a    b <NA>
#3    a    b    c
#4    a    b    d
#5    a    d <NA>
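If tidyr and stringr are not available, the same idea (find the widest row, then pad the rest with NA) can be sketched in base R. This is an illustrative alternative rather than the answer's method; the intermediate names parts and padded are made up for the example, and stringsAsFactors = FALSE is added so that x is a character column on any R version.

```r
df <- data.frame(x = c("a", "a.b", "a.b.c", "a.b.d", "a.d"),
                 stringsAsFactors = FALSE)

# Split on a literal dot; fixed = TRUE avoids regex escaping
parts <- strsplit(df$x, ".", fixed = TRUE)

# The widest row determines the number of output columns
nmax <- max(lengths(parts))

# `length<-` extends each shorter vector with NA up to nmax
padded <- lapply(parts, `length<-`, nmax)

# Stack the padded vectors as rows and name the columns col1, col2, ...
df_separated <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(df_separated) <- paste0("col", seq_len(nmax))
df_separated
```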
I have a data frame in which the same code appears in several rows with different values. I need to sum the values by code and keep the original values in a separate column.
data <- data.frame(cod = c("A", "B", "C", "A", "A", "B"),
                   Values = c(3, 4, 5, 1, 2, 5))
data
cod Values
  A      3
  B      4
  C      5
  A      1
  A      2
  B      5
I expect the following, where the original Values column is kept the same and the group sum is added as a new column, Values2:
> data2
cod Values Values2
  A      3       6
  B      4       9
  C      5       5
  A      1       6
  A      2       6
  B      5       9
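A base R option is ave(), which returns a vector the same length as its input, with each element replaced by the summary of its group: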
data$Values2 <- with(data, ave(Values, cod, FUN = sum))
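To make the grouping step explicit, here is a minimal sketch that computes the per-group sums with tapply() and then looks them up for every row. It is meant as an illustration of what the ave() call does, not a replacement for it; group_sums is a name introduced for the example, and the column is assumed to be spelled Values as in the printed data.

```r
data <- data.frame(cod = c("A", "B", "C", "A", "A", "B"),
                   Values = c(3, 4, 5, 1, 2, 5),
                   stringsAsFactors = FALSE)

# One sum per group: A = 6, B = 9, C = 5
group_sums <- tapply(data$Values, data$cod, sum)

# Look up each row's code in the per-group sums;
# as.vector() drops the names and dimensions left over from tapply()
data$Values2 <- as.vector(group_sums[data$cod])
data
```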
Given a vector, I want to convert it to a data frame using a 'key' value that appears at irregular intervals in the vector and marks the start of each row. In this case, "z" would be the first value in each row.
vd <- c("z","a","b","c","z","a","b","c","z","a","b","c","d")
The resultant data should look like:
# using magrittr for %>%; transpose() is assumed here to be data.table::transpose()
data.frame(x1 = c("z","a","b","c", NA), x2 = c("z","a","b","c", NA), x3 = c("z","a","b","c","d")) %>%
  data.table::transpose()
One solution would be to find the largest distance between 'keys' in the vector and then pad the end of each shorter 'section' with NA values until it matches the longest one, so that matrix() could be used.
What would be the best way to do this?
plyr::ldply(split(vd, cumsum(vd == "z")), rbind)[-1]
(copied from here)
result:
  1 2 3 4    5
1 z a b c <NA>
2 z a b c <NA>
3 z a b c    d
We can use cumsum to identify the groups and then split the vector by them. Then we pad the shorter vectors with NA and combine them into a data.frame.
x <- split(vd, cumsum(vd == "z"))
maxl <- max(lengths(x))
as.data.frame(lapply(x, function(y) c(y, rep(NA, maxl - length(y)))))
#     X1   X2 X3
# 1    z    z  z
# 2    a    a  a
# 3    b    b  b
# 4    c    c  c
# 5 <NA> <NA>  d
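Both answers rely on the same trick: cumsum(vd == "z") increases by one at every key, so it labels each element with the number of the section it belongs to. The sketch below shows that intermediate vector and then builds the row-wise layout the question asked for as a character matrix. The names grp, rows, and mat are made up for the illustration, and the sketch assumes the vector begins with the key.

```r
vd <- c("z","a","b","c","z","a","b","c","z","a","b","c","d")

# Each "z" increments the running count, so it labels the start of a new group
grp <- cumsum(vd == "z")
grp

# One character vector per group
rows <- split(vd, grp)
maxl <- max(lengths(rows))

# Pad every group to maxl with NA; sapply() returns one column per group,
# so transpose to get one row per group
mat <- t(sapply(rows, `length<-`, maxl))
mat
```

Wrapping the result in as.data.frame() would turn the matrix into a data frame if that is what later code expects.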
I have no idea where to start with this, but what I am trying to do is create a new column that holds the number of times each value appears in another column.
For example
# Existing Data
key newcol
a ?
a ?
a ?
b ?
b ?
c ?
c ?
c ?
Would like the output to look like
key newcol
a 3
a 3
a 3
b 2
b 2
c 3
c 3
c 3
Thanks!
This can be achieved with the doBy package like so:
require(doBy)
#original data frame
df <- data.frame(key = c('a', 'a', 'a', 'b', 'b', 'c', 'c', 'c'))
#add counter
df$count <- 1
#use summaryBy to count number of instances of key
counts <- summaryBy(count ~ key, data = df, FUN = sum, var.names = 'newcol', keep.names = TRUE)
#merge counts into original data frame
df <- merge(df, counts, by = 'key', all.x = TRUE)
df then looks like:
> df
  key count newcol
1   a     1      3
2   a     1      3
3   a     1      3
4   b     1      2
5   b     1      2
6   c     1      3
7   c     1      3
8   c     1      3
If key is a vector such as key <- rep(c("a", "b", "c"), c(3,2,3)), you can get what you want by using table to count the occurrences of each key element:
> N <- table(key)
> data.frame(key, newcol=rep(N,N))
  key newcol
1   a      3
2   a      3
3   a      3
4   b      2
5   b      2
6   c      3
7   c      3
8   c      3
On the other hand, if key is a column of a data.frame, the same approach applies:
key.df <- data.frame(key = rep(letters[1:3], c(3, 2, 3)))
N <- table(key.df$key)
data.frame(key=key.df, newcol=rep(N, N))
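Note that rep(N, N) lines up with the rows only when the keys are already sorted into contiguous groups. As a hedged alternative that does not depend on row order, the count can be attached with ave(), which applies a function within each group and returns one value per row. This is a sketch under the assumption that the data frame is called df with a key column, as in the doBy answer above.

```r
df <- data.frame(key = c("a", "a", "a", "b", "b", "c", "c", "c"),
                 stringsAsFactors = FALSE)

# ave() applies length() within each key group and returns one value per row;
# seq_along() just supplies a vector of the right length to be counted
df$newcol <- ave(seq_along(df$key), df$key, FUN = length)
df
```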
I have a dataset of, say, 150 countries, from which I would like to select the records for, say, 50 countries that I already have in a vector. How can I filter for just those countries? It is tedious to chain | repeatedly, like:
filter(mydata, country == "A" | country == "B")
Any recommendations would be much appreciated.
You can use %in%.
An example data set:
mydata <- data.frame(country = LETTERS[1:10])
#    country
# 1        A
# 2        B
# 3        C
# 4        D
# 5        E
# 6        F
# 7        G
# 8        H
# 9        I
# 10       J
Vector of letters:
vec <- c("A", "B", "C")
The code:
library(dplyr)
filter(mydata, country %in% vec)
#   country
# 1       A
# 2       B
# 3       C
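To see what %in% contributes on its own, here is a small base R sketch of the same filter without dplyr. The names mask, kept, and dropped are introduced only for the illustration, and stringsAsFactors = FALSE is added so that country is a character column on any R version.

```r
mydata <- data.frame(country = LETTERS[1:10], stringsAsFactors = FALSE)
vec <- c("A", "B", "C")

# %in% returns one TRUE/FALSE per element of country
mask <- mydata$country %in% vec

# Base R equivalent of filter(); drop = FALSE keeps the result a data frame
kept <- mydata[mask, , drop = FALSE]

# Negate the mask to keep every country that is not in vec
dropped <- mydata[!mask, , drop = FALSE]
kept
```

The same negation works inside dplyr as filter(mydata, !(country %in% vec)).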