Pull specific rows - r

Let's say that I have a data frame that looks like this...
City <- c("x","x","y","y","z","z","a","a")
Number <-c(1,2,3,4,5,6,7,8)
mat <- cbind.data.frame(City ,Number)
"City" "Number"
x 1
x 2
y 3
y 4
z 5
z 6
a 7
a 8
Now I want to be able to pull the data for...
list <- c("x","y", "a")
And the desired outcome would look something like this...
x y a
1 3 7
2 4 8
I tried using which(list %in% City) to find the indices of the rows to pull, but that does not produce the rows that I want.
UPDATE
Make sure that when you use Chris' answer, the data type of "City" is "chr"; otherwise you will get an error message, as I did originally before applying the "as.character" function.
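A minimal sketch of the conversion the update describes, assuming City arrived as a factor (the default for character columns before R 4.0.0). The stringsAsFactors = TRUE argument is only there to reproduce that situation on newer versions of R.

```r
# Rebuild the example data with City stored as a factor (assumption:
# this mimics the pre-4.0.0 default that triggered the error).
City <- c("x", "x", "y", "y", "z", "z", "a", "a")
Number <- c(1, 2, 3, 4, 5, 6, 7, 8)
mat <- data.frame(City, Number, stringsAsFactors = TRUE)

was_factor <- is.factor(mat$City)

# Convert the column to character before keying or matching on it.
mat$City <- as.character(mat$City)
str(mat)
```

Running str(mat) before and after the conversion is a quick way to check which type the column has.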

I renamed your variable list to test, because list is a function name. You can do this using data.table:
library(data.table)
test <- c("x", "y", "a")
matdt <- as.data.table(mat)
setkey(matdt, City)
sapply(test, function(x) matdt[x, Number])
x y a
[1,] 1 3 7
[2,] 2 4 8

You need to pass the City names to the extraction function one by one. In this case sapply will deliver a matrix, as you expect, but if the number of results varied per city, the returned object would be a list rather than a matrix:
sapply( list, function(city) mat[ mat$City %in% city, "Number"] )
x y a
[1,] 1 3 7
[2,] 2 4 8
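To illustrate the caveat about a varying number of results per city, here is a small sketch. The data frame mat2 and the vector wanted are made up for this example; city "y" deliberately has only one row.

```r
# Hypothetical data: "y" has one row, the other cities have two.
mat2 <- data.frame(City = c("x", "x", "y", "a", "a"),
                   Number = c(1, 2, 3, 7, 8),
                   stringsAsFactors = FALSE)
wanted <- c("x", "y", "a")

# Same extraction as above; the per-city results now differ in length,
# so sapply cannot simplify them into a matrix.
res <- sapply(wanted, function(city) mat2[mat2$City %in% city, "Number"])

is.list(res)  # TRUE
res$y         # the single value for city "y"
```

If a list is always acceptable, lapply (or sapply with simplify = FALSE) gives the same shape regardless of the counts.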

Using dplyr and tidyr:
library(dplyr)
library(tidyr)
mat %>%
filter(City %in% c("x", "y", "a")) %>%
group_by(City) %>%
mutate(Index = 1:n()) %>%
spread(City, Number) %>%
select(-Index)
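The same reshaping can probably also be written with pivot_wider, which newer versions of tidyr offer as the successor to spread. This is a sketch under that assumption (tidyr 1.0.0 or later), not part of the original answer; the result name wide is made up here.

```r
library(dplyr)
library(tidyr)

mat <- data.frame(City = c("x", "x", "y", "y", "z", "z", "a", "a"),
                  Number = c(1, 2, 3, 4, 5, 6, 7, 8),
                  stringsAsFactors = FALSE)

wide <- mat %>%
  filter(City %in% c("x", "y", "a")) %>%
  group_by(City) %>%
  mutate(Index = row_number()) %>%   # position of each row within its city
  ungroup() %>%
  pivot_wider(names_from = City, values_from = Number) %>%
  select(-Index)

wide
```

The Index column is what lets the rows line up: without it there is nothing to identify which value goes in which output row.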


How to transpose the first rows into new columns in R?

I want to transpose the first two rows into two new columns and keep the rest of the data frame. How do I do it in R?
My original data
A <- c("2012","PL",3,2)
B <- c("2012","PL",6,1)
C <- c("2012","PL",7,4)
DF <- data.frame(A,B,C)
My final data after transpose
V1 <- c("2012","2012")
V2 <- c("PL","PL")
A <- c(3,2)
B <- c(6,1)
C <- c(7,4)
DF <- data.frame(V1,V2,A,B,C)
Where V1 and V2 are the names for new columns and they are created automatically.
Thank you for any assistance.
Base R:
cbind(t(DF[1:2, 1, drop=FALSE]), DF[-(1:2),])
# Warning in data.frame(..., check.names = FALSE) :
# row names were found from a short variable and have been discarded
# 1 2 A B C
# 1 2012 PL 3 6 7
# 2 2012 PL 2 1 4
though I have some concerns about the apparent key property of "2012" and "PL". That is, you start with three instances of each and end with two. Logically it makes sense, though really to me it looks as if you have a matrix of numbers associated with a single "2012","PL", but perhaps that's not how the data is coming to you. (If you can change the format of the data before getting to this point such that you have a matrix and its associated keys, then it might make data munging more direct, declarative, and resistant to bugs.)
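If the warning and the column names 1 and 2 are a concern, one possible variation is to pull the two key values out explicitly and name them V1 and V2, as the question asks. This is a sketch only; the names keys and out are made up here, and the final conversion to numeric assumes the remaining rows really hold numbers.

```r
DF <- data.frame(A = c("2012", "PL", "3", "2"),
                 B = c("2012", "PL", "6", "1"),
                 C = c("2012", "PL", "7", "4"),
                 stringsAsFactors = FALSE)

# The first two entries of column A act as the keys.
keys <- DF[1:2, 1]

# Recycle the keys across the remaining rows and reset the row names.
out <- data.frame(V1 = keys[1], V2 = keys[2], DF[-(1:2), ],
                  row.names = NULL, stringsAsFactors = FALSE)

# The value columns were stored as character; convert them to numeric.
out[c("A", "B", "C")] <- lapply(out[c("A", "B", "C")], as.numeric)
out
```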
Here is an option with slice
library(dplyr)
DF %>%
select(A) %>%
slice(1:2) %>%
t %>%
as.data.frame %>%
bind_cols(DF %>%
slice(-(1:2)))

How can I store replaced values after filter() %>% mutate()?

I'm attempting to replace empty values in column z based on the values in column x.
I've used filter() to narrow down to the rows of importance and applied mutate() afterwards, but the mutated values are not replaced in the original dataframe. I can store the result as a new dataframe, but merging afterwards would be a considerable headache, as this is happening across dozens of conditionals.
# make dummy data
xx <- data.frame(x = c(1,2,3), y = c("a","","c"), z=c(5,5,""))
xx %>% filter(x == 3) %>% # filter to value of interest
filter(z == "") %>% # filter to NA values to be replaced
mutate(z = replace(z, z =="", 5) ) # mutate to replace NA value
If I do:
xx <- xx %>% filter(x == 3) %>% # filter to value of interest
filter(z == "") %>% # filter to NA values to be replaced
mutate(z = replace(z, z =="", 5) ) # mutate to replace NA value
then only the single row is stored...
I'm looking for a way to keep all of the other dataframe data but replace the mutated data.
It feels like it should be a quick fix, but I've been stuck on it for a while.
You can use an ifelse() statement within dplyr::mutate().
df <- data.frame(x=sample(1:10,100,T),
y=sample(c(NA,1:5),100,T))
df %>% mutate(y=ifelse(is.na(y),x,y))
x y
1 7 7
2 10 3
3 7 1
4 7 1
5 10 4
6 3 3
...
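Applied to the data from the question, the same idea might look like the sketch below. It assumes z is stored as character (so the replacement value is the string "5") and folds both filter() conditions into the ifelse() test, which keeps every row of the data frame.

```r
library(dplyr)

# Data from the question; z is written as character here because
# mixing 5 and "" in c() coerces everything to character anyway.
xx <- data.frame(x = c(1, 2, 3),
                 y = c("a", "", "c"),
                 z = c("5", "5", ""),
                 stringsAsFactors = FALSE)

# Replace z only where x == 3 and z is empty; leave other rows untouched.
xx <- xx %>%
  mutate(z = ifelse(x == 3 & z == "", "5", z))

xx
```

Because no rows are filtered out, assigning the result back to xx does not lose any data, and further conditionals can be chained as additional mutate() calls.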

Renaming columns according to vector inside pipe

I have a data.frame df with columns A and B:
df <- data.frame(A = 1:5, B = 11:15)
There's another data.frame, df2, which I'm building by various calculations that ends up having generic column names X1 and X2, which I cannot control directly (because it passes through being a matrix at one point). So it ends up being something like:
mtrx <- matrix(1:10, ncol = 2)
mtrx %>% data.frame()
I would like to rename the columns in df2 to be the same as df. I could, of course, do it after I finish building df2 with a simple assignment:
names(df2)<-names(df)
My question is: is there a way to do this directly within the pipe? I can't seem to use dplyr::rename, because its arguments have to be in the form newname=oldname, and I can't seem to vectorize it. The same goes for the data.frame call itself: I can't just give it a vector of column names, as far as I can tell. Is there another option I'm missing? What I'm hoping for is something like
mtrx %>% data.frame() %>% rename(names(df))
but this doesn't work - gives error Error: All arguments must be named.
Cheers!
You can use setNames
mtrx %>%
data.frame() %>%
setNames(., nm = names(df))
# A B
#1 1 6
#2 2 7
#3 3 8
#4 4 9
#5 5 10
Or use purrr's equivalent set_names
mtrx %>%
data.frame() %>%
purrr::set_names(., nm = names(df))
A third option is "names<-"
mtrx %>%
data.frame() %>%
"names<-"(names(df))
We can use rename_all from tidyverse
library(tidyverse)
mtrx %>%
as.data.frame %>%
rename_all(~ names(df))
# A B
# 1 1 6
# 2 2 7
# 3 3 8
# 4 4 9
# 5 5 10
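For completeness, the setNames approach should also work with the native pipe, without loading any package. This sketch assumes R 4.1.0 or later, where |> is available.

```r
df <- data.frame(A = 1:5, B = 11:15)
mtrx <- matrix(1:10, ncol = 2)

# Native pipe: each step receives the previous result as its first argument.
df2 <- mtrx |>
  data.frame() |>
  setNames(names(df))

df2
```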

R - using LIKE operator with variable

I want to substitute a variable for a string in the %like% function from the DescTools package. What I want to do afterwards is run a loop where the variable changes value and I get different results.
I've tried a few ways but can't get it working.
Here is my sample code:
library(DescTools)
library(dplyr)
x <- c(1,2,3,4,5,6)
y <- c("a","b","c","a","a","a")
df <- data.frame(x = x, y = y)
df
Here is what I get if I search for "a" in the y column. This is the desired output.
df %>% filter(y %like% "%a%")
# desired output
> df %>% filter(y %like% "%a%")
x y
1 1 a
2 4 a
3 5 a
4 6 a
Now I want to create a variable which will hold the value I want to search
# create a variable which will take out the value I'm looking for
let <- '"%a%"'
If I use that variable in place of the string, I get either no result or the wrong result.
Is there any way for me to use a variable instead of a string?
#not working
df %>% filter(y %like% let)
> df %>% filter(y %like% let)
[1] x y
<0 rows> (or 0-length row.names)
#not working
df %>% filter(y %like% cat(let))
> df %>% filter(y %like% cat(let))
"%a%" x y
1 1 a
2 2 b
3 3 c
4 4 a
5 5 a
6 6 a
Option 1: Evaluate the variable.
df %>% filter(y %like% eval(parse(text = let)))
Option 2: Take advantage of the filter_ function in dplyr (deprecated since dplyr 0.7.0).
df %>% filter_(paste0("y %like%", let))
Edit: actually, the comments are better answers because they are less convoluted; the quote level was the problem.
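A sketch of the simpler fix the edit alludes to: store the pattern without the extra inner quotes, so the variable holds %a% rather than "%a%". Base subsetting is used here instead of filter() to keep the example short; the loop variable letter and the vector counts are made up for illustration.

```r
library(DescTools)

df <- data.frame(x = 1:6,
                 y = c("a", "b", "c", "a", "a", "a"),
                 stringsAsFactors = FALSE)

# The pattern itself, with no embedded quote characters.
let <- "%a%"
df[df$y %like% let, ]

# Looping over several search values, building the pattern each time.
counts <- sapply(c("a", "b", "c"), function(letter) {
  pattern <- paste0("%", letter, "%")
  sum(df$y %like% pattern)
})
counts
```

The same variable can be passed to filter(y %like% let) inside a dplyr pipe.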

R: Add new column to dataframe using function

I have a data frame df that has two columns, term and frequency. I also have a list of terms with given IDs stored in a vector called indices. To illustrate both, I have the following:
> head(indices)
Term
1 hello
256 i
33 the
Also, for the data frame.
> head(df)
Term Freq
1 i 24
2 hello 12
3 the 28
I want to add a column in df called TermID which will just be the index of the term in the vector indices. I have tried using dplyr::mutate, but to no avail. Here is my code:
library(dplyr)
whichindex <- function(term){
ind <- which(indices == as.character(term))
ind}
mutate(df, TermID = whichindex(Term))
What I am getting as output is a df that has a new column called TermID, but all the values for TermID are the same.
Can someone help me figure out what I am doing wrong? It would be nice as well if you can recommend a more efficient algorithm to do this in [R]. I have implemented this in Python and I have not encountered such issues.
Thanks in advance.
What about this?
df %>% rowwise() %>% mutate(TermID = grep(Term,indices))
With example data:
library(dplyr)
indices <- c("hello","i","the")
df <- data_frame(Term = c("i","hello","the"), Freq = c(24,12,28))
df_res <- df %>% rowwise() %>% mutate(TermID = grep(Term,indices))
df_res
gives:
Source: local data frame [3 x 3]
Groups: <by row>
Term Freq TermID
1 i 24 2
2 hello 12 1
3 the 28 3
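A vectorized alternative worth considering is base R's match(), which looks up each term's position in indices by exact equality. The grep approach above uses regular-expression matching, so a term that is a substring of other entries (for example "he" against "hello" and "the") could match more than one position. The sketch below uses the same example data, stored in a plain data.frame.

```r
indices <- c("hello", "i", "the")
df <- data.frame(Term = c("i", "hello", "the"),
                 Freq = c(24, 12, 28),
                 stringsAsFactors = FALSE)

# match() returns, for each Term, its first position in indices
# (NA if the term is not present). No rowwise() is needed.
df$TermID <- match(df$Term, indices)
df
```

Inside a dplyr pipe the equivalent would be mutate(df, TermID = match(Term, indices)).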
