Creating new column with condition - r

I have this data set:
ID Type Frequency
1 A 0.136546185
2 A 0.228915663
3 B 0.006024096
4 C 0.008032129
I want to create a new column that changes Frequency values less than 0.01 to "other" and keeps everything else as it is, like this:
ID Type Frequency New_Frequency
1 A 0.136546185 0.136546185
2 A 0.228915663 0.228915663
3 B 0.006024096 other
4 C 0.008032129 other
I used mutate, but I don't know how to keep the original frequency when it is 0.01 or larger.
Can you please help me?

You can't achieve exactly what you want in R, because a single vector (and therefore a data frame column) cannot mix character and numeric values. If you are willing to convert everything to character, the other answers will work. If you want to keep the column numeric, you need to use NA rather than "other". You can also try the labelled package, which allows something like SPSS labels or SAS formats on numeric data.
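A minimal sketch of the numeric-with-NA option, assuming the 0.01 threshold used in the other answers. The Is_Other flag column is a hypothetical addition, not part of the question, that records which rows were masked:

```r
df <- data.frame(ID = 1:4,
                 Type = c("A", "A", "B", "C"),
                 Frequency = c(0.136546185, 0.228915663, 0.006024096, 0.008032129))

# Keep the column numeric: small values become NA instead of "other"
df$New_Frequency <- ifelse(df$Frequency < 0.01, NA_real_, df$Frequency)

# Hypothetical helper column: TRUE where the value was replaced
df$Is_Other <- df$Frequency < 0.01

df
```

Because New_Frequency stays numeric, functions such as mean(df$New_Frequency, na.rm = TRUE) still work on it.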

Using mutate():
library(dplyr)
d <- tibble(ID = 1:4,
            Type = c("A", "A", "B", "C"),
            Frequency = c(0.136546185, 0.228915663, 0.006024096, 0.008032129))
d %>%
  mutate(New_Frequency = case_when(Frequency < .01 ~ "other",
                                   TRUE ~ as.character(Frequency)))

You can use ifelse
transform(df, Frequency = ifelse(Frequency < 0.01, 'Other', Frequency))
# ID Type Frequency
#1 1 A 0.136546185
#2 2 A 0.228915663
#3 3 B Other
#4 4 C Other
Note that the Frequency column is now character, since a column can hold data of only one type.


How to select rows with a certain value in r?

I am trying to edit my dataframe but cannot seem to find the function that I need to sort this out.
I have a dataframe that looks roughly like this:
Title Description Rating
Beauty and the Beast a 2.5
Aladdin b 3
Coco c 2
etc.
(rating is between 1 and 3)
I am trying to edit my dataframe so that I get a new dataframe with no decimal values in the rating column.
i.e: the new dataframe would be:
Title Description Rating
Aladdin b 3
Coco c 2
As Beauty and the Beast's rating is not 1, 2 or 3.
I feel like there's a simple function in R that I just cannot find on Google, and I was hoping someone could help.
We can use subset (from base R) and keep the rows where 'Rating' equals its integer-converted value:
subset(df1, Rating == as.integer(Rating))
# Title Description Rating
#2 Aladdin b 3
#3 Coco c 2
Or, if we are comparing against a specific set of values, use %in%:
subset(df1, Rating %in% 1:3)
data
df1 <- structure(list(Title = c("Beauty and the Beast", "Aladdin", "Coco"
), Description = c("a", "b", "c"), Rating = c(2.5, 3, 2)),
class = "data.frame", row.names = c(NA,
-3L))
You can get the remainder after dividing by 1 and select rows where the remainder is 0.
subset(df, Rating %% 1 == 0)
# Title Description Rating
#2 Aladdin b 3
#3 Coco c 2
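If the ratings come out of a calculation rather than being typed in, exact comparisons such as Rating %% 1 == 0 can miss values that are whole only up to floating-point error. A hedged sketch of a tolerance-based check follows; the helper name is_whole and the tolerance 1e-8 are assumptions made for illustration:

```r
df1 <- data.frame(Title = c("Beauty and the Beast", "Aladdin", "Coco"),
                  Description = c("a", "b", "c"),
                  Rating = c(2.5, 3, 2),
                  stringsAsFactors = FALSE)

# Hypothetical helper: TRUE when x is within tol of a whole number
is_whole <- function(x, tol = 1e-8) abs(x - round(x)) < tol

whole_only <- df1[is_whole(df1$Rating), ]
whole_only

# A computed value that is not exactly 2 still passes the tolerance check
x <- sqrt(2) * sqrt(2)
x == 2
is_whole(x)
```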
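You can also use the filter function from dplyr. Note that this only drops the specific value 2.5; for other non-integer ratings, use one of the conditions above.
library(dplyr)
df1 %>%
  filter(Rating != 2.5)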

R - function to extract value according to rank [duplicate]

Suppose I have a data frame that I have ordered according to rate such that it now looks something like this:
Name Rate
A 10
D 11
C 11
E 12
B 13
F 14
I am trying to write a function that takes a rank value as an argument (e.g. rank = 2) and outputs the corresponding names, such that if there are ties in ranks, it would output the name that comes first alphabetically.
In this case, the data should look something like this:
Name Rate Rank
A 10 1
C 11 2
D 11 3
E 12 4
B 13 5
F NA 6
so that rank=2 would output "C" (not D)
and rank = 5 would output "B"
Suppose that the function's rank input is called "num", this is what I've tried to do:
rankName <- df[!is.na(df[,2]),]
rankName <- sort(rankName[,2],) #sorting according to Rate
rank<-seq(1,length(rankName),by=1) #creating a sequence for rank
rankName <- cbind(rankHosp,rank) #combining rankName & rank seq.
comp <- rankName[rankName[,3]==num,] #finding rate value where rank = num
rankName <- rankName[rankName[,2]==comp,] #finding rows where rates are
#equal at that rank
rankName<-rankName$Name #extracting by Name
if (length(rankName)>1){
rankName <- sort(rankName)
rankName <- rankName[1]
}
I'm getting the following error:
Error in `[.data.frame`(rankName, , 3) : undefined columns selected
I'm assuming that, regardless of my error, there's a significantly simpler way to accomplish this, but I haven't been able to figure it out.
Any advice is appreciated. Thank you!
One way of doing this is to use base::rank() together with the grouping functionality provided by packages like dplyr:
df <- read.table(header = TRUE, text = "Name Rate
A 10
D 11
C 11
E 12
B 13
F 14")
# Tied rates share the lowest rank of their group (D and C both get 2)
df$rnk <- rank(df$Rate, na.last = TRUE, ties.method = "min")
df
library(dplyr)
# Within each tie group, offset the shared rank by the alphabetical rank of Name
finaldf <- df %>% group_by(rnk) %>% mutate(Rank = rnk + rank(Name) - 1) %>%
  as.data.frame() %>% select(Name, Rate, Rank)
finaldf
rnk is first computed with ties.method = "min", so tied rows share one value (2 for both D and C). Grouping by rnk and adding the alphabetical rank of Name within each group then breaks the ties.
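For the function the question asks for, a simpler base R sketch is to sort by Rate and then by Name, and take the row at position num. The function name name_at_rank is hypothetical, and the sample data uses the NA rate for F from the desired output:

```r
df <- data.frame(Name = c("A", "D", "C", "E", "B", "F"),
                 Rate = c(10, 11, 11, 12, 13, NA),
                 stringsAsFactors = FALSE)

# Hypothetical helper: name at rank `num`, ties broken alphabetically
name_at_rank <- function(df, num) {
  ranked <- df[!is.na(df$Rate), ]                      # drop rows without a rate
  ranked <- ranked[order(ranked$Rate, ranked$Name), ]  # sort by Rate, then Name
  if (num > nrow(ranked)) return(NA_character_)        # rank beyond the data
  ranked$Name[num]
}

name_at_rank(df, 2)
name_at_rank(df, 5)
```

Passing two keys to order() handles the tie-breaking, so no separate rank column is needed.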

How to change values in a column of a data frame based on conditions in another column?

I would like to have an equivalent of the Excel function "if". It seems basic enough, but I could not find relevant help.
I would like to assign "NA" to specific cells if two consecutive cells in a different column are not identical. In Excel, the command would be the following (say in C1): if(A1 = A2, B1, "NA"). I then just need to extend it to the rest of the column.
But in R, I am stuck!
Here is an equivalent of my R code so far.
df = data.frame(Type = c("1","2","3","4","4","5"),
File = c("A","A","B","B","B","C"))
df
To get the following Type of each Type in another column, I found a useful function on StackOverflow that does the job.
# determines the following Type of each Type
shift <- function(x, n){
c(x[-(seq(n))], rep(6, n))
}
df$TypeFoll <- shift(df$Type, 1)
df
Now, I would like to keep TypeFoll in a specific row when the File for this row is identical to the File on the next row.
Here is what I tried. It failed!
for(i in 1:length(df$File)){
df$TypeFoll2 <- ifelse(df$File[i] == df$File[i+1], df$TypeFoll, "NA")
}
df
In the end, my data frame should look like:
aim = data.frame(Type = c("1","2","3","4","4","5"),
File = c("A","A","B","B","B","C"),
TypeFoll = c("2","3","4","4","5","6"),
TypeFoll2 = c("2","NA","4","4","NA","6"))
aim
Oh, and by the way, if someone would know how to easily put the columns TypeFoll and TypeFoll2 just after the column Type, it would be great!
Thanks in advance
I would do it as follows (without keeping the result of the shift function):
df = data.frame(Type = c("1","2","3","4","4","5"),
                File = c("A","A","B","B","B","C"), stringsAsFactors = FALSE)
# This replaces your shift function: pair each File with the File on the next row
len <- nrow(df)
A1 <- df$File[1:(len-1)]
A2 <- df$File[2:len]
# Why do you save the result of the shift function in the df?
Then apply if(A1 = A2, B1, "NA"). As akrun mentioned, ifelse is vectorised, so no loop is needed. By the way, this is also how you append a column to a data.frame:
df$TypeFoll2 <- c(ifelse(A1 == A2, df$Type[2:len], NA), 6) #Why 6?
Since 6 is hardcoded here, something like
df$TypeFoll2 <- c(ifelse(A1 == A2, df$Type[2:len], NA), max(as.numeric(df$Type)) + 1)
is more generic.
First off, 'for' loops are pretty slow in R, so try to think of this as vector manipulation instead.
df = data.frame(Type = c("1","2","3","4","4","5"),
File = c("A","A","B","B","B","C"));
Create shifted types and files and put it in new columns:
df$TypeFoll = c(as.character(df$Type[2:nrow(df)]), "NA");
df$FileFoll = c(as.character(df$File[2:nrow(df)]), "NA");
Now, df looks like this:
> df
Type File TypeFoll FileFoll
1 1 A 2 A
2 2 A 3 B
3 3 B 4 B
4 4 B 4 B
5 4 B 5 C
6 5 C NA NA
Then, create TypeFoll2 by combining these:
df$TypeFoll2 = ifelse(df$File == df$FileFoll, df$TypeFoll, "NA");
And you should have something that looks a lot like what you want:
> df;
Type File TypeFoll FileFoll TypeFoll2
1 1 A 2 A 2
2 2 A 3 B NA
3 3 B 4 B 4
4 4 B 4 B 4
5 4 B 5 C NA
6 5 C NA NA NA
If you want to remove the FileFoll column:
df$FileFoll = NULL;
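The question also asks how to place TypeFoll and TypeFoll2 right after Type. A minimal sketch, assuming real NA values are acceptable in place of the string "NA", builds both columns without a helper column and then reorders by column name:

```r
df <- data.frame(Type = c("1", "2", "3", "4", "4", "5"),
                 File = c("A", "A", "B", "B", "B", "C"),
                 stringsAsFactors = FALSE)

# Value on the following row; the last row has no successor, hence NA
next_type <- c(df$Type[-1], NA)
next_file <- c(df$File[-1], NA)

df$TypeFoll  <- next_type
# Keep the following Type only when the following File is the same
df$TypeFoll2 <- ifelse(df$File == next_file, next_type, NA)

# Reorder columns by name so the new ones sit right after Type
df <- df[, c("Type", "TypeFoll", "TypeFoll2", "File")]
df
```

Indexing with a character vector of column names returns the columns in exactly that order, which is usually the easiest way to rearrange a data frame.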

Finding unique tuples in R but ignoring order

Since my data is much more complicated, I made a smaller sample dataset (I left the reshape in to show how I generated the data).
set.seed(7)
x = rep(seq(2010,2014,1), each=4)
y = rep(seq(1,4,1), 5)
z = matrix(replicate(5, sample(c("A", "B", "C", "D"))))
temp_df = cbind.data.frame(x,y,z)
colnames(temp_df) = c("Year", "Rank", "ID")
head(temp_df)
require(reshape2)
dcast(temp_df, Year ~ Rank)
which results in...
> dcast(temp_df, Year ~ Rank)
Using ID as value column: use value.var to override.
Year 1 2 3 4
1 2010 D B A C
2 2011 A C D B
3 2012 A B D C
4 2013 D A C B
5 2014 C A B D
Now I essentially want to use a function like unique, but ignoring order to find where the first 3 elements are unique.
Thus in this case:
I would have A,B,C in row 5
I would have A,B,D in rows 1&3
I would have A,C,D in rows 2&4
Also I need counts of these "unique" events
Also 2 more things. First, my values are strings, and I need to leave them as strings.
Second, if possible, I would have a column between year and 1 called Weighting, and then when counting these unique combinations I would include each's weighting. This isn't as important because all weightings will be small positive integer values, so I can potentially duplicate the rows earlier to account for weighting, and then tabulate unique pairs.
You could do something like this:
df <- dcast(temp_df, Year ~ Rank)
combos <- apply(df[, 2:4], 1, function(x) paste0(sort(x), collapse = ""))
combos
# 1 2 3 4 5
# "BCD" "ABC" "ACD" "BCD" "ABC"
For each row of the data frame, the values in columns 1, 2, and 3 (as labeled in the post) are sorted using sort, then concatenated using paste0. Since order doesn't matter, this ensures that identical cases are labeled consistently.
Note that the paste0 function is equivalent to paste(..., sep = ""). The collapse argument says to concatenate the values of a vector into a single string, with vector values separated by the value passed to collapse. In this case, we're setting collapse = "", which means there will be no separation between values, resulting in "ABC", "ACD", etc.
Then you can get the count of each combination using table:
table(combos)
# ABC ACD BCD
# 2 1 2
This is the same solution as @Alex_A's, but using tidyverse functions:
library(reshape2)
library(purrr)
library(dplyr)
df <- dcast(temp_df, Year ~ Rank)
distinct(df, ID = pmap_chr(select(df, num_range("", 1:3)),
                           ~paste0(sort(c(...)), collapse="")))
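The weighting part of the question is not covered above. One possible sketch sums a weight per combination with tapply. The wide table is typed in from the dcast output shown in the question, and the Weighting values are made up purely for illustration:

```r
# Wide table copied from the dcast output in the question
df <- data.frame(Year = 2010:2014,
                 `1` = c("D", "A", "A", "D", "C"),
                 `2` = c("B", "C", "B", "A", "A"),
                 `3` = c("A", "D", "D", "C", "B"),
                 `4` = c("C", "B", "C", "B", "D"),
                 check.names = FALSE, stringsAsFactors = FALSE)

# Hypothetical weights (small positive integers, as described in the question)
df$Weighting <- c(1, 2, 1, 3, 1)

# Order-independent key for the first three ranks: sort, then paste
df$combo <- apply(df[, c("1", "2", "3")], 1,
                  function(x) paste(sort(x), collapse = ""))

# Weighted count per combination
weighted <- tapply(df$Weighting, df$combo, sum)
weighted
```

With all weights equal to 1 this reduces to the plain table(combos) count, so duplicating rows beforehand is not needed.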

Subsetting data.frame in a specific order in R (for setting vertex attributes)

I have information in a data.frame consisting of two columns, e.g.:
name age
a 10
b 20
c 30
and I have a list of names c b d. Now I want to obtain a data.frame (or list or anything) of the attributes of the original data frame in the order of the list. For the above example, that would be
name age
c 30
b 20
d NA
I feel that this shouldn't be too difficult (even in-line maybe) but I can't find a way to do it in R.
Background:
I have a 'network' object created from an edge list. I have another data frame of vertex attributes, but no control over how either of these is ordered initially. Now I want to assign these attributes to the network vertices.
To use
network %v% "age" <- dataframe[,2]
I'd need the data frame to be in the right order, and for
set.vertex.attribute(network, "age", hhs$age, v = hhs$di)
I'd need the vertex ids.
I took your list of names, ls, and made it a data.frame with the same column name, name.
I then used left_join from dplyr:
library(dplyr)
ls<-c("c","b","d")
df2<-data.frame(name=ls)
df2 %>% left_join(df,by="name")->new_df
> new_df
name age
1 c 30
2 b 20
3 d NA
Or, if you're unfamiliar with the dplyr/magrittr piping, you could re-write this as:
new_df<-left_join(df2,df,by="name")
As it yields the same result:
> new_df
name age
1 c 30
2 b 20
3 d NA
In fact, since df2 only has name, you don't even need to specify the by= argument.
new_df<-left_join(df2,df)
yields the same result.
This can be done in a single line in base R with the match function:
data.frame(name=names, age=df$age[match(names, df$name)])
# name age
# 1 c 30
# 2 b 20
# 3 d NA
Data:
names <- c("c", "b", "d")
df <- data.frame(name=c("a", "b", "c"), age=c(10, 20, 30))
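When the attribute table has more than one column, the same match idea can reorder whole rows at once. This is a sketch under the assumption that names missing from the table should appear with NA attributes; the variable names attrs and wanted are chosen here for illustration:

```r
attrs <- data.frame(name = c("a", "b", "c"),
                    age = c(10, 20, 30),
                    stringsAsFactors = FALSE)
wanted <- c("c", "b", "d")

# Row positions of the wanted names; NA where a name is not in attrs
idx <- match(wanted, attrs$name)

# Indexing rows with NA yields an all-NA row, so unknown names are kept
out <- attrs[idx, , drop = FALSE]
out$name <- wanted          # restore the requested names, including "d"
rownames(out) <- NULL
out
```

The resulting out$age is in the same order as wanted, so it can be assigned as a vertex attribute directly, provided wanted matches the vertex order.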
