As a beginner in R, I am having trouble creating a new column.
I have a table of students' grades based on points and percentile.
Let's say it has two columns of letter grades, one based on points and one based on percentile.
I wish to create a new column called Finalgrade. To do so, I would like to compare these two columns and assign the higher of the two grades as Finalgrade. Can anyone help me with this?
Let's assume that the grades are ordered from best to worst, like below
grade_seq <- c('A', 'AB', 'B', 'BC', 'C', 'D', 'E', 'F')
then
library(dplyr)
df <- df %>%
mutate_if(is.factor, as.character) %>%
mutate(Finalgrade = grade_seq[pmin(match(Gradepoints, grade_seq), match(Gradepercentile, grade_seq))])
gives
Gradepoints Gradepercentile Finalgrade
1 A B A
2 A D A
3 F D D
4 F F F
5 AB BC AB
6 AB C AB
Sample data:
df <- data.frame(Gradepoints = c('A','A','F','F','AB','AB'),
Gradepercentile = c('B','D','D','F','BC','C'))
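For comparison, the same idea can be sketched in base R without dplyr. This is only a sketch under two assumptions: grade_seq runs from best to worst, and both columns contain only grades listed in grade_seq (anything else would come out as NA).

```r
# Grades ordered from best to worst (assumed ordering, as in the answer above)
grade_seq <- c("A", "AB", "B", "BC", "C", "D", "E", "F")

df <- data.frame(Gradepoints     = c("A", "A", "F", "F", "AB", "AB"),
                 Gradepercentile = c("B", "D", "D", "F", "BC", "C"),
                 stringsAsFactors = FALSE)

# Position of each grade in grade_seq: a smaller position means a better grade
pts <- match(df$Gradepoints, grade_seq)
pct <- match(df$Gradepercentile, grade_seq)

# Row-wise minimum position, mapped back to the grade label
df$Finalgrade <- grade_seq[pmin(pts, pct)]
df$Finalgrade
# "A" "A" "D" "F" "AB" "AB"
```

Working with positions rather than the letters themselves avoids relying on alphabetical order, which would rank "AB" ahead of "B" only by accident.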
I want to extract a column from a data frame in R based on a condition on another column in the same data frame. The data frame is given below.
b <- c(1,2,3,4)
g <- c("a", "b" ,"b", "c")
df <- data.frame(b,g)
row.names(df) <- c("aa", "bb", "cc" , "dd")
I want to extract all values of column b, as a data frame that keeps its row names, where column g has the value 'b'.
My required output is given below:
df
   b
bb 2
cc 3
I have tried several methods, such as which and subset, but could not get them to work. I also searched Stack Overflow for an answer but was not able to find one. Is there a way to do it?
Thanks,
You can use the subset function in base R -
subset(df, g == 'b', select = b)
# b
#bb 2
#cc 3
Using data.table
library(data.table)
setDT(df, key = 'g')['b', .(b)]
b
1: 2
2: 3
Or with collapse
library(collapse)
sbt(df, g == 'b', b)
b
1 2
2 3
This is the basic way of slicing data in R
df[df$g == 'b',]['b']
Or the tidyverse answer
library(dplyr)
df %>%
  filter(g == 'b') %>%
  select(b)
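If the row names matter, as they do in the question, one base R option is to index rows and columns in a single step and pass drop = FALSE. This is a minimal sketch using the question's data; drop = FALSE is what keeps the result a one-column data frame instead of a plain vector.

```r
# The data frame from the question, with row names set at construction
df <- data.frame(b = c(1, 2, 3, 4),
                 g = c("a", "b", "b", "c"),
                 row.names = c("aa", "bb", "cc", "dd"))

# Rows where g is "b", column b only; drop = FALSE keeps the data frame
# structure (and therefore the row names)
res <- df[df$g == "b", "b", drop = FALSE]
res
#    b
# bb 2
# cc 3
```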
I have a dataframe with two columns: df$user and df$type. The users are a list of different user names and the type category has two values: 'high_user' and 'small_user'
I want to create some code so that one user cannot be both types. For example if the user is high_user he cannot also be a small_user.
head(df$user)
[1] RompnStomp Vladiman Celticdreamer54 Crimea is Russia shrek1978third annietattooface
head(df$type)
"high_user" "high_user" "small_user" "high_user" "high_user" "small_user"
Any help would be greatly appreciated.
One way would be to give every row of a user the type found in that user's first row. match(df$User, df$User) returns, for each row, the position of the first occurrence of that user.
df$new_type <- df$type[match(df$User, df$User)]
df
# User type new_type
#1 a high_user high_user
#2 b high_user high_user
#3 a small_user high_user
#4 c small_user small_user
#5 c high_user small_user
This can also be done using grouped operations.
library(dplyr)
df %>% group_by(User) %>% mutate(new_type = first(type))
data
df <- data.frame(User = c('a', 'b', 'a', 'c', 'c'),
type = c('high_user', 'high_user', 'small_user', 'small_user', 'high_user'))
An option with base R
df$new_type <- with(df, ave(type, User, FUN = function(x) x[1]))
data
df <- data.frame(User = c('a', 'b', 'a', 'c', 'c'),
type = c('high_user', 'high_user', 'small_user', 'small_user', 'high_user'))
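Before overwriting anything, it may help to see which users actually carry both types. The sketch below uses the sample data from the answers; the names n_types and conflicting are made up for illustration. It then applies the first-occurrence rule with match().

```r
df <- data.frame(User = c("a", "b", "a", "c", "c"),
                 type = c("high_user", "high_user", "small_user",
                          "small_user", "high_user"),
                 stringsAsFactors = FALSE)

# Number of distinct types recorded for each user
n_types <- tapply(df$type, df$User, function(x) length(unique(x)))

# Users that appear with more than one type
conflicting <- names(n_types)[n_types > 1]
conflicting
# "a" "c"

# match(x, x) gives the index of the first occurrence of each value,
# so every row of a user receives that user's first type
df$new_type <- df$type[match(df$User, df$User)]
```

Taking the first occurrence is only one possible rule; the most frequent type per user would be another, and which one is right depends on the data.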
I want to obtain the minimum distance between 2 columns, however the same name may appear in both Column A and Column B. See example below;
Patient1 Patient2 Distance
A B 8
A C 11
A D 19
A E 23
B F 6
C G 25
So the output I need is:
Patient Patient_closest_distance Distance
A B 8
B F 6
C A 11
I have tried using the list function
library(data.table)
DT <- data.table(Full_data)
j1 <- DT[ , list(Distance = min(Distance)), by = Patient1]
j2 <- DT[ , list(Distance = min(Distance)), by = Patient2]
However, I just get the minimum distance for each column, i.e. C will have 2 results as it is in both columns rather than showing the closest patient considering both columns. Also, I only get a list of distances, so I can't see which patient is linked to which;
Patient1 SNP
1: A 8
This code below works.
# Create sample data frame
df <- data.frame(
Patient1 = c('A','B', 'A', 'A', 'C', 'B'),
Patient2 = c('B', 'A','C', 'D', 'D', 'F'),
Distance = c(10, 1, 20, 3, 60, 20)
)
# Format as character variable (instead of factor)
df$Patient1 <- as.character(df$Patient1); df$Patient2 <- as.character(df$Patient2);
# If you want mirror paths included, you'll need to add them.
# Ex.) A to C at a distance of 20 is equivalent to C to A at a distance of 20
# If you don't need these mirror paths, you can ignore these two lines.
df_mirror <- data.frame(Patient1 = df$Patient2, Patient2 = df$Patient1, Distance = df$Distance)
df <- rbind(df, df_mirror); rm(df_mirror)
# Group the pairs and keep the minimum distance of each pair
library(dplyr)
df <- summarise(group_by(df, Patient1, Patient2), Distance = min(Distance))
# Re-sort so that the smallest distances come first
nearest <- df[order(df$Distance), ]
# Keep only the first (closest) row for each Patient1
nearest <- nearest[!duplicated(nearest$Patient1), ]
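The same mirror-then-deduplicate idea can be sketched in base R on the data from the question. This assumes that distances are symmetric (A to B equals B to A); the output column names follow the question's desired table.

```r
# Data from the question
d <- data.frame(Patient1 = c("A", "A", "A", "A", "B", "C"),
                Patient2 = c("B", "C", "D", "E", "F", "G"),
                Distance = c(8, 11, 19, 23, 6, 25),
                stringsAsFactors = FALSE)

# Add the mirrored pairs so every patient appears in the first column
mirror <- data.frame(Patient1 = d$Patient2, Patient2 = d$Patient1,
                     Distance = d$Distance, stringsAsFactors = FALSE)
both <- rbind(d, mirror)

# Sort by patient, then by distance, and keep the first row per patient
both <- both[order(both$Patient1, both$Distance), ]
nearest <- both[!duplicated(both$Patient1), ]
names(nearest) <- c("Patient", "Patient_closest_distance", "Distance")
rownames(nearest) <- NULL
nearest[nearest$Patient %in% c("A", "B", "C"), ]
#   Patient Patient_closest_distance Distance
# 1       A                        B        8
# 2       B                        F        6
# 3       C                        A       11
```

Patients that only ever appear in the second column (D to G here) also get a row, which may or may not be wanted.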
I have multiple values in certain rows of one column in a data frame. I would like a data frame with a separate row for each of those values. I have gotten the values separated but am not certain how to go forward. Any thoughts?
Here is an example:
## input
tibble(
code = c(
85310,
47730,
61900,
93110,
"56210,\r\n70229",
"93110,\r\n93130,\r\n93290"),
vary2 = LETTERS[1:6])
## desired output
tibble(
code = c(85310, 47730, 61900, 93110, 56210, 70229,
93110, 93130, 93290),
vary2 = c('A', 'B', 'C', 'D', 'E', 'E', 'F', 'F', 'F')
)
## one unsuccessful approach
tibble(
code = c(
85310,
47730,
61900,
93110,
"56210,\r\n70229",
"93110,\r\n93130,\r\n93290"),
vary2 = LETTERS[1:6]) %>%
separate(col = 'code', into = LETTERS[1:3], sep = ',\\r\\n')
We can use separate_rows from tidyr (df1 here is the input tibble from the question)
library(tidyverse)
df1 %>%
separate_rows(code, sep="[,\r\n]+")
# A tibble: 9 x 2
# code vary2
# <chr> <chr>
#1 85310 A
#2 47730 B
#3 61900 C
#4 93110 D
#5 56210 E
#6 70229 E
#7 93110 F
#8 93130 F
#9 93290 F
As @KerryJackson mentioned in the comments, if we don't specify sep, the default pattern ([^[:alnum:].]+) splits on any run of characters other than letters, digits and periods. If we want to limit the split to a particular delimiter, it is better to pass sep explicitly.
df1 %>%
separate_rows(code)
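For readers without tidyr, a rough base R equivalent is sketched below. It assumes the code column is character (as it is in the question, where numbers and strings are mixed in one vector) and converts the pieces to numeric at the end; the names parts and long are made up for illustration.

```r
df1 <- data.frame(code = c("85310", "47730", "61900", "93110",
                           "56210,\r\n70229",
                           "93110,\r\n93130,\r\n93290"),
                  vary2 = LETTERS[1:6],
                  stringsAsFactors = FALSE)

# Split each entry on runs of commas, carriage returns and newlines
parts <- strsplit(df1$code, "[,\r\n]+")

# One row per piece; repeat vary2 once for every piece of its row
long <- data.frame(code  = as.numeric(unlist(parts)),
                   vary2 = rep(df1$vary2, lengths(parts)),
                   stringsAsFactors = FALSE)
nrow(long)
# 9
```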
This is an extension of the question asked in "Count number of times combination of events occurs in dataframe columns". I will restate the question so that it is all here:
I have a data frame and I want to calculate the number of times each combination of events in two columns occurs (in any order), with a zero if a combination doesn't appear.
For example say I have
df <- data.frame('x' = c('a', 'b', 'c', 'c', 'c'),
'y' = c('c', 'c', 'a', 'a', 'b'))
So
x y
a c
b c
c a
c a
c a
c b
a and b do not occur together, a and c 4 times (rows 2, 4, 5, 6) and b and c twice (3rd and 7th rows) so I would want to return
x-y num
a-b 0
a-c 4
b-c 2
I hope this makes sense? Thanks in advance
This should do it:
res = table(df)
To convert to data frame:
resdf = as.data.frame(res)
The resdf data.frame looks like:
x y Freq
1 a a 0
2 b a 0
3 c a 2
4 a b 0
5 b b 0
6 c b 1
7 a c 1
8 b c 1
9 c c 0
Note that this answer takes order into account. If the order of the columns is unimportant, sort the values within each row first, so that a-c is treated the same as c-a, and then build the table from the sorted data frame (df1 below) instead of df.
df1 = as.data.frame(t(apply(df,1,sort)))
As said, you can do this with factor() and expand.grid() (or another way to get all possible combinations)
all.possible <- expand.grid(c('a','b','c'), c('a','b','c'))
all.possible <- all.possible[all.possible[, 1] != all.possible[, 2], ]
all.possible <- unique(apply(all.possible, 1, function(x) paste(sort(x), collapse='-')))
df <- data.frame('x' = c('a', 'b', 'c', 'c', 'c'),
'y' = c('c', 'c', 'a', 'a', 'b'))
table(factor(apply(df , 1, function(x) paste(sort(x), collapse='-')), levels=all.possible))
An alternative, because I was a bit bored. Perhaps a bit more generalised? But probably still uglier than it could be...
library(plyr)  # provides ddply() and summarise()
df2 <- as.data.frame(table(df))
df2$com <- apply(df2[,1:2],1,function(x) if(x[1] != x[2]) paste(sort(x),collapse='-'))
df2 <- df2[df2$com != "NULL",]
ddply(df2, .(unlist(com)), summarise,
num = sum(Freq))
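A base R sketch of the order-insensitive count, with zeros for pairs that never occur, is given below. It uses the five-row df defined in the code above, which has one fewer c/a row than the table printed in the question, so a-c comes out as 3 here. The names vals, all_pairs, observed and res are made up for illustration.

```r
df <- data.frame(x = c("a", "b", "c", "c", "c"),
                 y = c("c", "c", "a", "a", "b"),
                 stringsAsFactors = FALSE)

# Every unordered pair of distinct values, e.g. "a-b", "a-c", "b-c"
vals <- sort(unique(c(df$x, df$y)))
all_pairs <- combn(vals, 2, paste, collapse = "-")

# Label each row with its pair, smaller value first, so that
# a-c and c-a receive the same label
observed <- paste(pmin(df$x, df$y), pmax(df$x, df$y), sep = "-")

# Using all_pairs as factor levels keeps the zero counts
counts <- table(factor(observed, levels = all_pairs))
res <- data.frame(pair = names(counts), num = as.vector(counts))
res
#   pair num
# 1  a-b   0
# 2  a-c   3
# 3  b-c   2
```

Rows where x and y are equal would fall outside all_pairs and be dropped as NA by factor(); whether that is desirable depends on the data.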