I have a dataframe with two columns: df$user and df$type. The users are a list of different user names and the type category has two values: 'high_user' and 'small_user'
I want to create some code so that one user cannot be both types. For example if the user is high_user he cannot also be a small_user.
head(df$user)
[1] RompnStomp Vladiman Celticdreamer54 Crimea is Russia shrek1978third annietattooface
head(df$type)
"high_user" "high_user" "small_user" "high_user" "high_user" "small_user"
Any help would be greatly appreciated.
One way would be to assign each user's first type to all of that user's rows.
df$new_type <- df$type[match(df$User, df$User)]  # match() returns the row of each user's first occurrence
df
# User type new_type
#1 a high_user high_user
#2 b high_user high_user
#3 a small_user high_user
#4 c small_user small_user
#5 c high_user small_user
This can also be done using grouped operations.
library(dplyr)
df %>% group_by(User) %>% mutate(new_type = first(type))
data
df <- data.frame(User = c('a', 'b', 'a', 'c', 'c'),
type = c('high_user', 'high_user', 'small_user', 'small_user', 'high_user'))
An option with base R
df$new_type <- with(df, ave(type, User, FUN = function(x) x[1]))
data
df <- data.frame(User = c('a', 'b', 'a', 'c', 'c'),
type = c('high_user', 'high_user', 'small_user', 'small_user', 'high_user'))
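Whichever variant is used, it can be worth checking that every user really ends up with exactly one type. A minimal sketch, assuming the sample data above; `types_per_user` is a name introduced here for illustration only:

```r
# Sample data from the answers (strings kept as character on any R version)
df <- data.frame(User = c('a', 'b', 'a', 'c', 'c'),
                 type = c('high_user', 'high_user', 'small_user', 'small_user', 'high_user'),
                 stringsAsFactors = FALSE)

# Give every row of a user the type found in that user's first row
df$new_type <- with(df, ave(type, User, FUN = function(x) x[1]))

# Count the distinct types per user; each count should be 1
types_per_user <- tapply(df$new_type, df$User, function(x) length(unique(x)))
all(types_per_user == 1)
```

If the last line returns FALSE, some user still carries more than one type.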
Related
I have the following datasets and information: first, I have i different plots that I want to analyze. In each plot, I have j species for which I want to obtain some information, such as:
plot1 = c(rep(1, 3), rep(2, 4), rep(3, 5))
spp1 = c('a', 'b', 'c', 'a', 'b', 'c', 'd', 'b', 'b', 'b', 'e', 'f')
data.1 = data.frame(plot1, spp1)
The above mentioned information repeats for a second dataframe of similar structure:
plot2 = c(rep(1, 2), rep(2, 3), rep(3, 5))
spp2 = c('a', 'a', 'b', 'c', 'c', 'b', 'b', 'b', 'e', 'f')
data.2 = data.frame(plot2, spp2)
What I'm trying to do is, for each i plot, setdiff(unique(data.1$spp1), unique(data.2$spp2)) and add the obtained information to a dataframe that has 2 columns: plot and spp_name
For the example datasets I'd like to obtain a final dataframe such as:
df_result = data.frame(plot = c(1,1,2,2,3), spp_name = c('b','c','a','d',0))
0 (or similar) must be returned when setdiff(unique()) returns 'character(0)'. So, in a way, my df_result needs to have, for each i plot, as many rows as there are setdiff strings between data.1$spp1 and data.2$spp2.
The first thing I did was use a for loop over each i plot. Getting the setdiff() string result is fine, but I don't know how to add this information to an empty dataframe... do I need to loop over each species? I really hope my question is comprehensible.
Thanks already
You could use anti_join and add rows for the missing values:
library(dplyr)
anti_join(data.1, data.2, by = c("plot1" = "plot2", "spp1" = "spp2")) %>%
add_row(plot1 = setdiff(data.1$plot1, .$plot1))
# plot1 spp1
#1 1 b
#2 1 c
#3 2 a
#4 2 d
#5 3 <NA>
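Since the question asks about a per-plot loop, here is a rough base R sketch of that idea, not the answer's method. It assumes the corrected sample data from the question and uses NA rather than 0 as the placeholder; `pieces` and `only_in_1` are names made up for this example:

```r
plot1 <- c(rep(1, 3), rep(2, 4), rep(3, 5))
spp1  <- c('a', 'b', 'c', 'a', 'b', 'c', 'd', 'b', 'b', 'b', 'e', 'f')
data.1 <- data.frame(plot1, spp1, stringsAsFactors = FALSE)

plot2 <- c(rep(1, 2), rep(2, 3), rep(3, 5))
spp2  <- c('a', 'a', 'b', 'c', 'c', 'b', 'b', 'b', 'e', 'f')
data.2 <- data.frame(plot2, spp2, stringsAsFactors = FALSE)

# One small data frame per plot, holding the species found only in data.1
pieces <- lapply(unique(data.1$plot1), function(p) {
  only_in_1 <- setdiff(data.1$spp1[data.1$plot1 == p],
                       data.2$spp2[data.2$plot2 == p])
  # Keep a placeholder row when nothing differs (assumption: NA instead of 0)
  if (length(only_in_1) == 0) only_in_1 <- NA_character_
  data.frame(plot = p, spp_name = only_in_1, stringsAsFactors = FALSE)
})

# Stack the per-plot pieces instead of growing an empty data frame row by row
df_result <- do.call(rbind, pieces)
df_result
```

Collecting the pieces in a list and binding them once avoids having to append to an empty data frame inside the loop.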
I have two lists:
source <- list(c(5,10,20,30))
source.val <- list(c('A', 'B', 'C', 'D'))
Each element in source has a corresponding value in source.val. I want to create a dataframe from the two lists above that looks like this:
source.val_5 source.val_10 source.val_20 source.val_30
A B C D
I did this
tempList <- list()
for(i in 1:lengths(source)){
tempList[[i]] <- data.frame(variable = paste0('source.val_',source[[1]][[i]]),
value = source.val[[1]][[i]])
}
temp.dat <- do.call('rbind', tempList)
temp.dat_wider <- tidyr::pivot_wider(temp.dat, names_from = variable, values_from = value)  # reshape the long table built above
Now I want to do this across a bigger list
source <- list(c(5,10,20,30),
c(5,10,20,30),
c(5,10,20,30),
c(5,10,20,30))
source.val <- list(c('A', 'B', 'C', 'D'),
c('B', 'B', 'D', 'D'),
c('C', 'B', 'A', 'D'),
c('D', 'B', 'B', 'D'))
The resulting table will have 4 rows looking like this:
# A tibble: 4 x 4
source.val_5 source.val_10 source.val_20 source.val_30
A B C D
B B D D
C B A D
D B B D
What is the best way to use a function like mapply to achieve my desired result?
For the example shared, where all the elements of source have the same order you can do :
cols <- paste0('source.val_', sort(unique(unlist(source))))
setNames(do.call(rbind.data.frame, source.val), cols)
# source.val_5 source.val_10 source.val_20 source.val_30
#1 A B C D
#2 B B D D
#3 C B A D
#4 D B B D
However, in the general case, where the elements of source are not all in the same order, you can first reorder source.val based on source:
source.val <- Map(function(x, y) y[order(x)], source, source.val)
and then use the above code.
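To see the reordering step at work, here is a small sketch with a second element that is deliberately out of order. The names `src`, `src.val` and `src.val_sorted` are introduced here to avoid masking base R's source(), and the rows are bound through a matrix rather than rbind.data.frame, which is a substitution made for this example:

```r
# Two elements; the second lists the same keys in a different order
src     <- list(c(5, 10, 20, 30), c(20, 5, 30, 10))
src.val <- list(c('A', 'B', 'C', 'D'), c('D', 'B', 'D', 'B'))

# Reorder each value vector so that it follows the sorted keys
src.val_sorted <- Map(function(x, y) y[order(x)], src, src.val)

# Column names come from the sorted unique keys
cols <- paste0('source.val_', sort(unique(unlist(src))))

# Bind the vectors as matrix rows, then convert and name the columns
result <- setNames(as.data.frame(do.call(rbind, src.val_sorted),
                                 stringsAsFactors = FALSE), cols)
result
```

The second row comes out as B, B, D, D, matching the key order 5, 10, 20, 30.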
As a beginner in R, I am having trouble creating a column.
I have a table of students' grades with two columns: one grade based on points and one based on percentile.
I wish to create a new column called Finalgrade. And to do so, I would like to compare these two columns and assign the higher grade as finalgrade. Can anyone help me with this?
Let's assume that the grading system has a sequence like below
grade_seq <- c('A', 'AB', 'B', 'BC', 'C', 'D', 'E', 'F')
then
library(dplyr)
df <- df %>%
mutate_if(is.factor, as.character) %>%
mutate(Finalgrade = grade_seq[pmin(match(Gradepoints, grade_seq), match(Gradepercentile, grade_seq))])
gives
Gradepoints Gradepercentile Finalgrade
1 A B A
2 A D A
3 F D D
4 F F F
5 AB BC AB
6 AB C AB
Sample data:
df <- data.frame(Gradepoints = c('A','A','F','F','AB','AB'),
Gradepercentile = c('B','D','D','F','BC','C'))
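The same index arithmetic can be followed step by step in base R. This is only a sketch under the assumed grade order given above (earlier in grade_seq means better); `best` is a helper name introduced here:

```r
grade_seq <- c('A', 'AB', 'B', 'BC', 'C', 'D', 'E', 'F')

df <- data.frame(Gradepoints     = c('A', 'A', 'F', 'F', 'AB', 'AB'),
                 Gradepercentile = c('B', 'D', 'D', 'F', 'BC', 'C'),
                 stringsAsFactors = FALSE)

# Position of each grade in the sequence; a smaller position is a better grade
best <- pmin(match(df$Gradepoints, grade_seq),
             match(df$Gradepercentile, grade_seq))

# Translate the winning position back into a grade label
df$Finalgrade <- grade_seq[best]
df
```

A grade that is missing from grade_seq would give NA here, which makes unexpected labels easy to spot.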
This is an extension of the question asked in Count number of times combination of events occurs in dataframe columns, I will reword the question again so it is all here:
I have a data frame and I want to calculate the number of times each combination of events in two columns occur (in any order), with a zero if a combination doesn't appear.
For example say I have
df <- data.frame('x' = c('a', 'b', 'c', 'c', 'c'),
'y' = c('c', 'c', 'a', 'a', 'b'))
So
x y
a c
b c
c a
c a
c a
c b
a and b do not occur together, a and c 4 times (rows 2, 4, 5, 6) and b and c twice (3rd and 7th rows) so I would want to return
x-y num
a-b 0
a-c 4
b-c 2
I hope this makes sense? Thanks in advance
This should do it:
res = table(df)
To convert to data frame:
resdf = as.data.frame(res)
The resdf data.frame looks like:
x y Freq
1 a a 0
2 b a 0
3 c a 2
4 a b 0
5 b b 0
6 c b 1
7 a c 1
8 b c 1
9 c c 0
Note that this answer takes order into account. If ordering of the columns is unimportant, then modifying the original data.frame prior to the process will remove the effect of ordering (a-c treated the same as c-a).
df1 = as.data.frame(t(apply(df,1,sort)))
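As a sketch of how the sorted version could then be counted, assuming the five-row df from the question's code: the names are dropped before sorting so that the resulting columns get default names, and `res1` is a name chosen for this example.

```r
df <- data.frame(x = c('a', 'b', 'c', 'c', 'c'),
                 y = c('c', 'c', 'a', 'a', 'b'),
                 stringsAsFactors = FALSE)

# Sort each row so that a-c and c-a end up as the same pair
df1 <- as.data.frame(t(apply(df, 1, function(r) sort(unname(r)))),
                     stringsAsFactors = FALSE)

# Count the now order-free pairs
res1 <- as.data.frame(table(df1))
res1
```

Note that pairs which never occur, such as a-b, do not show up with a zero count in this form, because table() only knows the values present in each column.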
As said, you can do this with factor() and expand.grid() (or another way to get all possible combinations)
all.possible <- expand.grid(c('a','b','c'), c('a','b','c'))
all.possible <- all.possible[all.possible[, 1] != all.possible[, 2], ]
all.possible <- unique(apply(all.possible, 1, function(x) paste(sort(x), collapse='-')))
df <- data.frame('x' = c('a', 'b', 'c', 'c', 'c'),
'y' = c('c', 'c', 'a', 'a', 'b'))
table(factor(apply(df , 1, function(x) paste(sort(x), collapse='-')), levels=all.possible))
An alternative, because I was a bit bored. Perhaps a bit more generalised? But probably still uglier than it could be...
library(plyr)  # provides ddply() and .()
df2 <- as.data.frame(table(df))
df2$com <- apply(df2[,1:2],1,function(x) if(x[1] != x[2]) paste(sort(x),collapse='-'))
df2 <- df2[df2$com != "NULL",]
ddply(df2, .(unlist(com)), summarise,
num = sum(Freq))
I have
vetor <- c(1,2,3)
data <- data.frame(id=c('a', 'b', 'a', 'c', 'a'))
I need a data.frame output that matches each value of the vector to a specific id, resulting in:
id vector1
1 a 1
2 b 2
3 a 1
4 c 3
5 a 1
Here are two approaches I often use for similar situations:
vetor <- c(1,2,3)
key <- data.frame(vetor=vetor, mat=c('a', 'b', 'c'))
data <- data.frame(id=c('a', 'b', 'a', 'c', 'a'))
data$vector1 <- key[match(data$id, key$mat), 'vetor']
#or with merge
merge(data, key, by.x = "id", by.y = "mat")
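A third variant that is sometimes handy is a named vector used as a lookup table. This is a sketch only; `lookup` is a name introduced here, and the pairing of 1, 2, 3 with 'a', 'b', 'c' is taken from the key above:

```r
vetor <- c(1, 2, 3)

# Named vector: the names are the ids, the values are the numbers to assign
lookup <- setNames(vetor, c('a', 'b', 'c'))

data <- data.frame(id = c('a', 'b', 'a', 'c', 'a'), stringsAsFactors = FALSE)

# Index the lookup by id; unname() drops the names carried over from lookup
data$vector1 <- unname(lookup[data$id])
data
```

Like the match() version, this keeps the original row order, whereas merge() sorts its result by the key column by default.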
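So you want one unique integer for each distinct value in the id column?
This is what a factor is in R. Before R 4.0, data.frame() converted strings to factors by default; from R 4.0 on, ask for it with stringsAsFactors = TRUE (or wrap the column in factor()).
To convert to a numeric representation, use as.numeric:
data <- data.frame(id=c('a', 'b', 'a', 'c', 'a'), stringsAsFactors = TRUE)
data$vector1 <- as.numeric(data$id)
This works because data$id is stored as a factor, not as a column of strings, so as.numeric() returns each value's integer level code.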
Here's an answer I found that follows the "mathematical.coffee" tip:
vector1 <- c('b','a','a','c','a','a') # 3 elements to be labeled: a, b and c
labels <- factor(vector1, labels= c('char a', 'char b', 'char c') )
data.frame(vector1, labels)
The only thing to watch is that factor(vector1, ...) sorts the distinct values of vector1 to build its levels, so the labels must be given in that sorted order.
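One way to make that ordering explicit, rather than relying on the default sort, is to pass levels alongside labels. A minimal sketch using the same vector; `labelled` is a name chosen here:

```r
vector1 <- c('b', 'a', 'a', 'c', 'a', 'a')

# levels and labels are paired by position: 'a' -> 'char a', 'b' -> 'char b', ...
labelled <- factor(vector1,
                   levels = c('a', 'b', 'c'),
                   labels = c('char a', 'char b', 'char c'))

data.frame(vector1, labelled)
```

With levels spelled out, reordering the labels by accident is easier to notice, since the two vectors sit side by side.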