I am trying to edit my dataframe but cannot seem to find the function that I need to sort this out.
I have a dataframe that looks roughly like this:
Title Description Rating
Beauty and the Beast a 2.5
Aladdin b 3
Coco c 2
etc.
(rating is between 1 and 3)
I am trying to edit my dataframe so that I get a new dataframe that keeps only the rows whose Rating is a whole number (no decimals).
i.e., the new dataframe would be:
Title Description Rating
Aladdin b 3
Coco c 2
Beauty and the Beast is dropped because its rating is not 1, 2 or 3.
I feel like there's a simple function in R that I just cannot find on Google, and I was hoping someone could help.
We can use subset (from base R) and keep the rows where Rating equals its integer-converted value:
subset(df1, Rating == as.integer(Rating))
# Title Description Rating
#2 Aladdin b 3
#3 Coco c 2
Or, if we are comparing against a specific set of values, use %in%:
subset(df1, Rating %in% 1:3)
data
df1 <- structure(list(Title = c("Beauty and the Beast", "Aladdin", "Coco"),
                      Description = c("a", "b", "c"),
                      Rating = c(2.5, 3, 2)),
                 class = "data.frame", row.names = c(NA, -3L))
You can get the remainder after dividing by 1 (with %%) and select the rows where the remainder is 0:
subset(df1, Rating %% 1 == 0)
# Title Description Rating
#2 Aladdin b 3
#3 Coco c 2
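All of the comparisons above test for exact equality, which is fine for ratings typed in as 2, 2.5 or 3. If the ratings come out of a calculation, floating-point noise can make a value print as 3 yet fail Rating %% 1 == 0. Below is a minimal base R sketch with a tolerance; the data frame `ratings`, the extra title and the `tol` value are hypothetical, chosen only to illustrate the issue.

```r
# Hypothetical data: the ratings from df1 plus one rating computed as
# (0.1 + 0.2) * 10, which prints as 3 but is not exactly 3 in floating point.
ratings <- data.frame(
  Title  = c("Beauty and the Beast", "Aladdin", "Coco", "Mulan"),
  Rating = c(2.5, 3, 2, (0.1 + 0.2) * 10),
  stringsAsFactors = FALSE
)

# Exact test: the computed rating is dropped
exact <- subset(ratings, Rating %% 1 == 0)

# Tolerance test: keep ratings within tol of the nearest integer
# (tol = 1e-8 is an assumed value; pick one that suits your data)
is_whole <- function(x, tol = 1e-8) abs(x - round(x)) < tol
tolerant <- subset(ratings, is_whole(Rating))

exact$Title     # "Aladdin" "Coco"
tolerant$Title  # "Aladdin" "Coco" "Mulan"
```

If your ratings are always entered by hand, the exact tests above are simpler and sufficient.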
With the dplyr package, you can use filter():
library(dplyr)
df1 %>%
  filter(Rating != 2.5)  # removes only the 2.5 row; Rating %% 1 == 0 drops any non-whole rating
Apologies for the unclear title. Although not effective, I couldn't think of a better way to describe this problem.
Here is a sample dataset I am working with:
test = data.frame(
Value = c(1:5, 5:1),
Index = c(1:5, 1:5),
GroupNum = c(rep.int(1, 5), rep.int(2, 5))
)
I want to create a new column (called "Value_Standardized") by grouping the data by GroupNum and then dividing each Value observation by that group's Value in the row where Index is 1.
Here's what I've come up with so far.
test2 = test %>%
group_by(GroupNum) %>%
mutate(Value_Standardized = Value / special_function(Value))
The special_function would represent a way to get the Value where Index == 1.
That is also precisely the problem - I cannot figure out a way to get the denominator to be the value when index == 1 in that group. Unfortunately, the value when the index is 1 is not necessarily the max or the min of the group.
Thanks in advance.
Edit: Emphasis added for clarity.
There is a super simple tidyverse way of doing this with cur_data(), which returns the tibble for the current subset (group) of the data so you can act on it:
library(dplyr)
test2 <- test %>%
group_by(GroupNum) %>%
mutate(output=Value/cur_data()$Value[1])
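cur_data() grabs the group's tibble, you extract the Value column as usual with $Value, and because the Index 1 row is the first row of each group in this data, you just take element [1]. If that row is not guaranteed to come first, index with cur_data()$Index == 1 instead; inside mutate() a plain Value[Index == 1] does the same, since the expression is already evaluated per group. Note that cur_data() is deprecated as of dplyr 1.1.0 in favour of pick().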
Nice and neat. There are a whole bunch of other cur_... functions you can use; check them out here:
Not sure if this is what you meant, nor if it's the best way to do this but...
Instead of using group_by(), I used a nested pipe: filter the Index == 1 rows, then left_join() the table to itself.
library(dplyr)
test = data.frame(
Value = c(1:5, 5:1),
Index = c(1:5, 1:5),
GroupNum = c(rep.int(1, 5), rep.int(2, 5))
)
test %>%
left_join(test %>%
filter(Index == 1) %>%
select(Value,GroupNum),
by = "GroupNum",
suffix = c('','_Index_1')) %>%
mutate(Value = Value/Value_Index_1)
output:
Value Index GroupNum Value_Index_1
1 1.0 1 1 1
2 2.0 2 1 1
3 3.0 3 1 1
4 4.0 4 1 1
5 5.0 5 1 1
6 1.0 1 2 5
7 0.8 2 2 5
8 0.6 3 2 5
9 0.4 4 2 5
10 0.2 5 2 5
A quick base R solution:
test = data.frame(
Value = c(1:5, 5:1),
Index = c(1:5, 1:5),
GroupNum = c(rep.int(1, 5), rep.int(2, 5)),
Value_Standardized = NA
)
groups <- levels(factor(test$GroupNum))
for(currentGroup in groups) {
test$Value_Standardized[test$GroupNum == currentGroup] <- test$Value[test$GroupNum == currentGroup] / test$Value[test$GroupNum == currentGroup & test$Index == 1]
}
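This only works under the assumption that each group has exactly one observation with Index 1, though. It's easy to run into trouble otherwise: if a group has no such row, the assignment fails with an error, and if it has several, R recycles the denominators and silently or with a warning produces wrong values.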
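For comparison, here is a loop-free base R sketch of the same lookup; it is essentially what the left_join answer does, but with match() instead of a join. It shares the one-Index-1-row-per-group assumption: match() silently uses the first Index 1 row if a group has several, and returns NA for a group with none, so that group's result becomes NA rather than an error.

```r
test <- data.frame(
  Value = c(1:5, 5:1),
  Index = c(1:5, 1:5),
  GroupNum = c(rep.int(1, 5), rep.int(2, 5))
)

# One row per group: the Value observed where Index == 1
base_rows <- test[test$Index == 1, c("GroupNum", "Value")]

# For every row, look up its group's Index 1 value (the denominator)
denom <- base_rows$Value[match(test$GroupNum, base_rows$GroupNum)]

test$Value_Standardized <- test$Value / denom
test$Value_Standardized
# 1.0 2.0 3.0 4.0 5.0 1.0 0.8 0.6 0.4 0.2
```

Because the lookup is by Index rather than by position, the row order within each group does not matter.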
I have this data set:
ID Type Frequency
1 A 0.136546185
2 A 0.228915663
3 B 0.006024096
4 C 0.008032129
I want to create a new column that changes the Frequency values less than 0.01 into "other" and keeps the other values as they are, like this:
ID Type Frequency New_Frequency
1 A 0.136546185 0.136546185
2 A 0.228915663 0.228915663
3 B 0.006024096 other
4 C 0.008032129 other
I used mutate() but I don't know how to keep the original frequencies that are 0.01 or larger.
Can you please help me?
You can't achieve exactly what you want in R, because a vector (and therefore a data frame column) cannot mix characters and numerics. If you are willing to convert everything to character, the other answers will work. If you want to keep the column numeric, you need to use NA rather than "other". You can also try the labelled package, which allows something like SPSS labels or SAS formats on numeric data.
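A base R sketch of both options just described, using the question's data. The 0.01 cutoff is taken from the question's example output, and the column name Frequency_NA is only illustrative; the original Frequency column stays numeric in both cases.

```r
d <- data.frame(
  ID = 1:4,
  Type = c("A", "A", "B", "C"),
  Frequency = c(0.136546185, 0.228915663, 0.006024096, 0.008032129),
  stringsAsFactors = FALSE
)

# Character column: numbers converted to text, small ones replaced by "other"
d$New_Frequency <- ifelse(d$Frequency < 0.01, "other", as.character(d$Frequency))

# Numeric column: small values replaced by NA, so arithmetic still works
d$Frequency_NA <- ifelse(d$Frequency < 0.01, NA, d$Frequency)

str(d)
```

With the NA version, functions such as mean() need na.rm = TRUE to skip the replaced values.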
Using mutate():
library(dplyr)
d <- tibble(ID = 1:4,
Type = c("A", "A", "B", "C"),
Frequency = c(0.136546185, 0.228915663, 0.006024096, 0.008032129))
d %>%
mutate(New_Frequency = case_when(Frequency < .01 ~ "other",
TRUE ~ as.character(Frequency)))
You can use ifelse():
transform(df, Frequency = ifelse(Frequency < 0.01, 'Other', Frequency))
# ID Type Frequency
#1 1 A 0.136546185
#2 2 A 0.228915663
#3 3 B Other
#4 4 C Other
Note that the Frequency column is now character, since a column can hold data of only one type.
I created a column of names like this:
df:
PC
1 word
2 word
3 Hate
4 Look
5 Check
I want to create another column based on this one, and I was able to do that with
df <- df %>%
  mutate(PCcode = LETTERS[factor(PC)])
However, this assigned the letters in alphabetical order of the names, which I do not want! I need to assign letters from A to Z in the order in which the names first appear in the column, like this:
df:
PC PCcode
1 word A
2 word A
3 Hate B
4 Look C
5 Check D
You can use match + unique to get an index based on the order in which each value first occurs.
transform(df, PCcode = LETTERS[match(PC, unique(PC))])
# PC PCcode
#1 word A
#2 word A
#3 Hate B
#4 Look C
#5 Check D
If you prefer to do it in dplyr:
library(dplyr)
df %>% mutate(PCcode = LETTERS[match(PC, unique(PC))])
data
df <- structure(list(PC = c("word", "word", "Hate", "Look", "Check"
)), class = "data.frame", row.names = c(NA, -5L))
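To see what match() and unique() each contribute, here is the same computation on the PC values step by step, plus an equivalent version that builds a factor whose levels follow first appearance (instead of the default alphabetical order, which is what produced the unwanted codes). One caveat: LETTERS has only 26 entries, so with more than 26 distinct names the extra codes come out as NA.

```r
PC <- c("word", "word", "Hate", "Look", "Check")

first_seen <- unique(PC)         # "word" "Hate" "Look" "Check" (order of first appearance)
idx <- match(PC, first_seen)     # 1 1 2 3 4
PCcode <- LETTERS[idx]           # "A" "A" "B" "C" "D"

# Equivalent: factor levels in order of first appearance, then integer codes
PCcode2 <- LETTERS[as.integer(factor(PC, levels = unique(PC)))]
```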
I'm new to R and here and I need some help to structure my data.
I have two data sets:
One of them is a large within-subjects data set in long format that looks a little like this:
long.format <- data.frame(subject.no = c(1, 1, 1, 1, 2, 2, 2, 2), condition = c("prime", "prime", "prime", "prime", "control", "control","control","control"), response = c(1,1,1,0,1,1,1,0))
subject.no condition response
1 1 prime 1
2 1 prime 1
3 1 prime 1
4 1 prime 0
5 2 control 1
6 2 control 1
7 2 control 1
8 2 control 0
The other one is already in wide format and looks like this:
wide.format <- data.frame(subject = c(1, 2), age = c(26,27), gender = c("m","f"))
subject age gender
1 1 26 m
2 2 27 f
The only thing I want to do now is to get the value in "condition" (and only this!) from the long format data frame to the corresponding subject in the wide data frame by adding a new column in the wide data frame (by using the columns subject.no and subject, respectively).
So the final data frame should look like this:
wide.format.aim <- data.frame(subject = c(1, 2), age = c(26,27), gender = c("m","f"), condition = c("prime","control"))
subject age gender condition
1 1 26 m prime
2 2 27 f control
I've tried merging, but this gives a long-format data frame (one row per trial) with the information from the wide-format data frame added, whereas I want it the other way around.
This is what I've tried:
test.it <- merge(x=wide.format, y=long.format[,c("subject.no", "condition")], all.x=T, by.x="subject", by.y="subject.no")
Any suggestions?
Thanks in advance!
You are interested in merging only the unique rows of long.format[,c("subject.no", "condition")]:
unique(long.format[,c("subject.no", "condition")])
# subject.no condition
#1 1 prime
#5 2 control
You can merge using those values:
merge(x = wide.format,
y = unique(long.format[,c("subject.no", "condition")]),
by.x = "subject",
by.y = "subject.no")
# subject age gender condition
#1 1 26 m prime
#2 2 27 f control
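An alternative base R sketch that skips merge() and adds the column directly with match(). It assumes, as in the example, that condition is constant within each subject (the stopifnot() check enforces this). Unlike merge(), which sorts the result by the key by default, this keeps wide.format's rows and row order, and a subject missing from long.format simply gets NA.

```r
long.format <- data.frame(
  subject.no = c(1, 1, 1, 1, 2, 2, 2, 2),
  condition = c("prime", "prime", "prime", "prime",
                "control", "control", "control", "control"),
  response = c(1, 1, 1, 0, 1, 1, 1, 0),
  stringsAsFactors = FALSE
)
wide.format <- data.frame(subject = c(1, 2), age = c(26, 27),
                          gender = c("m", "f"), stringsAsFactors = FALSE)

# Sanity check: each subject should appear with exactly one condition
pairs <- unique(long.format[, c("subject.no", "condition")])
stopifnot(!anyDuplicated(pairs$subject.no))

# For each subject in the wide data, take the condition of its first row in the long data
wide.format$condition <- long.format$condition[match(wide.format$subject, long.format$subject.no)]
wide.format
```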
Since my data is much more complicated, I made a smaller sample dataset (I left the reshape in to show how I generated the data).
set.seed(7)
x = rep(seq(2010,2014,1), each=4)
y = rep(seq(1,4,1), 5)
z = matrix(replicate(5, sample(c("A", "B", "C", "D"))))
temp_df = cbind.data.frame(x,y,z)
colnames(temp_df) = c("Year", "Rank", "ID")
head(temp_df)
require(reshape2)
dcast(temp_df, Year ~ Rank)
which results in...
> dcast(temp_df, Year ~ Rank)
Using ID as value column: use value.var to override.
Year 1 2 3 4
1 2010 D B A C
2 2011 A C D B
3 2012 A B D C
4 2013 D A C B
5 2014 C A B D
Now I essentially want to use a function like unique, but ignoring order, to find the unique combinations of the first three ranks (columns 1, 2 and 3).
Thus in this case:
I would have A,B,C in row 5
I would have A,B,D in rows 1&3
I would have A,C,D in rows 2&4
Also, I need counts of these "unique" combinations.
Two more things. First, my values are strings, and I need to leave them as strings.
Second, if possible, I would have a column called Weighting between Year and 1, and when counting these unique combinations I would include each row's weighting. This isn't as important, because all weightings will be small positive integers, so I could duplicate the rows beforehand to account for the weighting and then tabulate the unique combinations.
You could do something like this:
df <- dcast(temp_df, Year ~ Rank)
combos <- apply(df[, 2:4], 1, function(x) paste0(sort(x), collapse = ""))
combos
# 1 2 3 4 5
# "BCD" "ABC" "ACD" "BCD" "ABC"
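For each row of the data frame, the values in columns 1, 2, and 3 (as labeled in the post) are sorted using sort, then concatenated using paste0. Since order doesn't matter, this ensures that identical cases are labeled consistently. (The output shown does not match the dcast table in the question, for which the combinations would be ABD, ACD, ABD, ACD and ABC; it was presumably produced from a different random draw.)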
Note that the paste0 function is equivalent to paste(..., sep = ""). The collapse argument says to concatenate the values of a vector into a single string, with vector values separated by the value passed to collapse. In this case, we're setting collapse = "", which means there will be no separation between values, resulting in "ABC", "ACD", etc.
Then you can get the count of each combination using table:
table(combos)
# ABC ACD BCD
# 2 1 2
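For the weighting mentioned in the question, one option is to sum the weights per combination instead of counting rows, which avoids duplicating rows. A minimal base R sketch, assuming a hypothetical Weighting column: the letters are copied from the dcast table in the question, the weights are made up, and the data frame is built directly so the example does not depend on reshape2.

```r
# Wide table as in the question, plus an assumed Weighting column
df <- data.frame(
  Year = 2010:2014,
  Weighting = c(2, 1, 1, 3, 1),
  `1` = c("D", "A", "A", "D", "C"),
  `2` = c("B", "C", "B", "A", "A"),
  `3` = c("A", "D", "D", "C", "B"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Order-insensitive key for the first three ranks (values stay strings)
combos <- apply(df[, c("1", "2", "3")], 1, function(x) paste0(sort(x), collapse = ""))

# Weighted count: sum the weights within each combination
weighted <- tapply(df$Weighting, combos, sum)
weighted
```

For this data, table(combos) gives the unweighted counts (ABC 1, ABD 2, ACD 2), while the weighted totals are ABC 1, ABD 3, ACD 4.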
This is the same solution as @Alex_A's, but using tidyverse functions:
library(purrr)
library(dplyr)
df <- dcast(temp_df, Year ~ Rank)
distinct(df, ID = pmap_chr(select(df, num_range("", 1:3)),
~paste0(sort(c(...)), collapse="")))