Matching Multiple Rows To Find A Value - R


I think this is similar to, but not the same as, a previous question that I asked here: Pull specific rows
Here is the code that I am now working with:
City <- c("x","x","y","y","z","z")
Type <- c("a","b","a","b","a","b")
Value <- c(1,3,2,5,6,10)
cbind.data.frame(City,Type,Value)
Which produces:
City Type Value
1 x a 1
2 x b 3
3 y a 2
4 y b 5
5 z a 6
6 z b 10
I want to do something similar to before, but now two different conditions must be met to pull a specific number. Let's say we had a matrix,
testmat <- matrix(c("x","x","y","a","b","b"),ncol=2)
Which looks like this:
[,1] [,2]
[1,] "x" "a"
[2,] "x" "b"
[3,] "y" "b"
The desired outcome is
[,1] [,2] [,3]
[1,] "x" "a" 1
[2,] "x" "b" 3
[3,] "y" "b" 5
Another Question PLEASE ANSWER THIS PART
City <- c("x","x","x","x","y","y","x","z")
Type <- c("a","a","a","a","a","b","a","b")
Value <- c(1,3,2,5,6,10,11,15)
mat <- cbind.data.frame(City,Type,Value)
mat
testmat <- matrix(c("y","x","b","a"),ncol=2)
testmat <- data.frame(testmat)
testmat
library(dplyr)  # provides inner_join()
test <- inner_join(mat,testmat,by = c("City"="X1", "Type"="X2"))
Why does inner_join give me a warning message when I run this? Here is the warning message that I get:
In inner_join_impl(x, y, by$x, by$y) :
joining factors with different levels, coercing to character vector
The desired output is...
City Type Value
1 y b 10
2 x a 1
3 x a 3
4 x a 2
5 x a 5
6 x a 11
but it is producing...
City Type Value
1 x a 1
2 x a 3
3 x a 2
4 x a 5
5 y b 10
6 x a 11
I want inner_join to return the rows in the order in which they appear in testmat, as shown above. So since City "y" of Type "b" comes first in testmat, I want it to come first in "test".

The solution is to just switch the order of testmat and mat, like so:
test <- inner_join(testmat,mat,by = c("X1"="City", "X2"="Type"))
I find it interesting that the by parameter has to follow the order of the data frames passed to inner_join: the names on the left refer to columns of the first data frame, and the names on the right to columns of the second.

The warning appears because data.frame() converts character vectors to factors by default (in R versions before 4.0.0). You can change this behaviour by running the following code at the start of your script:
options(stringsAsFactors = FALSE)
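To see where the warning comes from, here is a minimal sketch. The two small data frames are made up for illustration, and stringsAsFactors = TRUE is passed explicitly to reproduce the old default. The key columns become factors whose level sets differ, which is the situation the join complains about.

```r
# Hypothetical miniature versions of the two tables, built with the old default
mat <- data.frame(City = c("x", "y", "z"), stringsAsFactors = TRUE)
testmat <- data.frame(X1 = c("y", "x"), stringsAsFactors = TRUE)

levels(mat$City)     # three levels: "x" "y" "z"
levels(testmat$X1)   # two levels:   "x" "y"

# The level sets are not the same, hence the coercion to character
same_levels <- identical(levels(mat$City), levels(testmat$X1))

# With stringsAsFactors = FALSE the column stays a plain character vector
mat_chr <- data.frame(City = c("x", "y", "z"), stringsAsFactors = FALSE)
is.character(mat_chr$City)
```

The coercion itself is harmless here; keeping the columns as characters simply avoids the message.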

Answer to second part:
The warning states that you are trying to join on two factors with different levels. Therefore, the variables are coerced to "character" before joining; there's no problem with that. As Mostafa Rezaei mentioned in his answer, R converts character vectors to factors when creating a data frame. Usually it's best to keep them as characters:
mat <- data.frame(City,Type,Value, stringsAsFactors=FALSE)
testmat <- data.frame(testmat, stringsAsFactors=FALSE)
Concerning your real question:
The order of the result of a join is not defined. If order is crucial to you, you can use an additional sorting variable:
mat %>%
mutate(rn = row_number()) %>%
semi_join(testmat, by = c("City"="X1", "Type"="X2")) %>%
arrange(rn)
btw: I think you're looking for a semi_join rather than an inner_join; read the help file for the differences.
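For the first part of the question (one Value per City/Type pair, returned in the order of testmat), a base R sketch is possible with match() on a combined key. This is only one way to do it, not the approach used in the answers above. The names df, key_df and key_test are made up for illustration, and the sketch assumes that no City or Type value contains the "|" separator.

```r
City  <- c("x", "x", "y", "y", "z", "z")
Type  <- c("a", "b", "a", "b", "a", "b")
Value <- c(1, 3, 2, 5, 6, 10)
df <- data.frame(City, Type, Value, stringsAsFactors = FALSE)

testmat <- matrix(c("x", "x", "y", "a", "b", "b"), ncol = 2)

# Build one lookup key per row from both conditions
key_df   <- paste(df$City, df$Type, sep = "|")
key_test <- paste(testmat[, 1], testmat[, 2], sep = "|")

# match() returns, for each testmat row, the position of the first matching row in df
result <- data.frame(testmat, Value = df$Value[match(key_test, key_df)],
                     stringsAsFactors = FALSE)
result
```

Because match() walks through key_test in order, the result keeps the row order of testmat. It only returns the first match per key, so it does not cover the second example, where several rows share the same City and Type.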

Related

A more elegant way to combine two vectors as separate columns (or dataframes), match the rows, and have NA where they do not match

I have two vectors of the same 'thing' that I want to combine into a dataframe. Each vector will become its own column, with rows lined up where the values are the same and NA introduced in one column where a value has no match in the other vector. Since the data starts as just two vectors, there are no common id values or anything to match up other than the vector values.
I got this to work in a toy data test using a simple and straightforward approach, but would like to know if there is a more direct and elegant way to do this.
My current approach requires assigning a unique value by which I can then merge the two vectors, but I am curious if I can do this without it and rely instead on the vector values. My other attempts tried to not adopt a new id value, exploring functions like merge and join, cbind, rbind, bind_rows, bind_cols, intersect and union. Perhaps I wasn't using them as well as I could. I found some other useful posts on SO (like this one), but they all already start with a unique identifier.
Here is my toy data test with a final output how I want it to look. It does not matter to me if the final output has an id column or not. Note, my actual data will be character, hence my use of letters here.
# create toy data
x <- letters[1:5]
y <- letters[2:6]
# combine into dataframe, keep only unique values & assign id
xy <- data.frame(xy=unique(c(x,y))); xy
xy$id <- 1:length(xy$xy); xy
# match id back to original toy data as dataframes
x <- data.frame(x)
x$id <- match(x$x, xy$xy)
y <- data.frame(y)
y$id <- match(y$y, xy$xy)
# merge using id
xy2 <- merge(x, y, by="id", all=TRUE)
xy2
# results in
id x y
1 1 a <NA>
2 2 b b
3 3 c c
4 4 d d
5 5 e e
6 6 <NA> f

Using tidyverse you can try using full_join and create keys based on your 2 vectors:
library(tidyverse)
full_join(data.frame(key=x, x),
data.frame(key=y, y), by="key") %>%
select(-key)
Alternatively, you can just use merge in base R:
merge(data.frame('key'=x, x), data.frame('key'=y, y), by='key', all=T)[-1]
Output
x y
1 a <NA>
2 b b
3 c c
4 d d
5 e e
6 <NA> f

Here's an alternative one-liner in base R:
cbind(x[match(unique(c(x, y)), x)], y[match(unique(c(x, y)), y)])
#> [,1] [,2]
#> [1,] "a" NA
#> [2,] "b" "b"
#> [3,] "c" "c"
#> [4,] "d" "d"
#> [5,] "e" "e"
#> [6,] NA "f"

Issue while executing drop() function in R

I am trying to understand how the drop() function is used. The documentation says that a matrix or array can be the input object, but when I call the function the size of the matrix or array does not change. Can someone explain its actual usage and how it works?
I am using R version 3.2.1. Code snippet:
data1 <- matrix(data=(1:10),nrow=1,ncol=1)
drop(data1)
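As a minimal sketch of what base R's drop() does: it deletes array dimensions of extent one and leaves everything else alone, so the effect only shows up when at least one dimension has length 1. The objects m, a and m2 below are made up for illustration.

```r
m <- matrix(1:10, nrow = 1)     # a 1 x 10 matrix
dim(m)                          # 1 10
v <- drop(m)                    # the dimension of extent 1 is removed
dim(v)                          # NULL: v is now a plain vector of length 10

a <- array(1:6, dim = c(1, 3, 2))
dim(drop(a))                    # 3 2

m2 <- matrix(1:6, nrow = 2)     # no dimension of extent 1
unchanged <- identical(drop(m2), m2)   # drop() returns m2 as it was
```

This is also why a matrix with more than one row and more than one column appears untouched by drop().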

R has factors, which are very cool (and somewhat analogous to labeled levels in Stata). Unfortunately, the factor list sticks around even if you remove some data such that no examples of a particular level still exist.
# Create some fake data
x <- as.factor(sample(head(colors()),100,replace=TRUE))
levels(x)
x <- x[x!="aliceblue"]
levels(x) # still the same levels
table(x) # even though one level has 0 entries!
The solution is simple: run factor() again:
x <- factor(x)
levels(x)
If you need to do this on many factors at once (as is the case with a data.frame containing several columns of factors), use drop.levels() from the gdata package:
library(gdata)  # provides drop.levels()
x <- x[x!="antiquewhite1"]
df <- data.frame(a=x,b=x,c=x)
df <- drop.levels(df)
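If installing gdata is not an option, base R's droplevels() does the same job for a single factor or for every factor column of a data frame. A small sketch with made-up colour values:

```r
x <- factor(c("red", "green", "blue", "green"))
x <- x[x != "red"]
levels(x)                  # "red" is still listed although no element uses it

x_clean <- droplevels(x)   # unused levels removed from a single factor
levels(x_clean)

df <- data.frame(a = x, b = x)
df <- droplevels(df)       # applied to all factor columns at once
levels(df$a)
```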

An R matrix is a two-dimensional array. R has many operators and functions that make matrix handling very convenient.
Matrix assignment:
>A <- matrix(c(3,5,7,1,9,4),nrow=3,ncol=2,byrow=TRUE)
>A
[,1] [,2]
[1,] 3 5
[2,] 7 1
[3,] 9 4
Matrix row and column count:
>rA <- nrow(A)
>rA
[1] 3
>cA <- ncol(A)
>cA
[1] 2
t(A) function returns a transposed matrix of A:
>B <- t(A)
>B
[,1] [,2] [,3]
[1,] 3 7 9
[2,] 5 1 4
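Element-wise multiplication (the * operator multiplies matching entries; it is not the matrix product, which is written %*%):
>C <- A * A
>C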
[,1] [,2]
[1,] 9 25
[2,] 49 1
[3,] 81 16
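For comparison, the true matrix product uses the %*% operator and requires the number of columns of the left operand to equal the number of rows of the right one. A short sketch with the same A (the names P and Q are made up here):

```r
A <- matrix(c(3, 5, 7, 1, 9, 4), nrow = 3, ncol = 2, byrow = TRUE)

P <- A %*% t(A)   # (3 x 2) times (2 x 3) gives a 3 x 3 matrix
Q <- t(A) %*% A   # (2 x 3) times (3 x 2) gives a 2 x 2 matrix

dim(P)
dim(Q)
Q
```

A %*% A would fail with a "non-conformable arguments" error, because A is 3 x 2.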
Matrix Addition:
>C <- A + A
>C
[,1] [,2]
[1,] 6 10
[2,] 14 2
[3,] 18 8
Matrix subtraction (-) and division (/) operations ... ...
Sometimes a matrix needs to be sorted by a specific column, which can be done with the order() function.
Following is a csv file example:
,t1,t2,t3,t4,t5,t6,t7,t8
r1,1,0,1,0,0,1,0,2
r2,1,2,5,1,2,1,2,1
r3,0,0,9,2,1,1,0,1
r4,0,0,2,1,2,0,0,0
r5,0,2,15,1,1,0,0,0
r6,2,2,3,1,1,1,0,0
r7,2,2,3,1,1,1,0,1
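The following R code reads the above file into a data frame, sorts it by its fourth column (t3, because the first column holds the row labels), then writes the result to an output file:
x <- read.csv("sortmatrix.csv", header=TRUE, sep=",")
x <- x[order(x[,4]),]
# write.table() returns nothing useful, so its result is not assigned back to x
write.table(x, file="tp.txt", sep=",")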
The result is:
"X","t1","t2","t3","t4","t5","t6","t7","t8"
"1","r1",1,0,1,0,0,1,0,2
"4","r4",0,0,2,1,2,0,0,0
"6","r6",2,2,3,1,1,1,0,0
"7","r7",2,2,3,1,1,1,0,1
"2","r2",1,2,5,1,2,1,2,1
"3","r3",0,0,9,2,1,1,0,1
"5","r5",0,2,15,1,1,0,0,0
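The same idea works on an in-memory matrix without any file. The sketch below uses a small made-up matrix m: order() returns the row permutation, and indexing with it rearranges whole rows.

```r
m <- matrix(c(3, 1, 2,
              9, 8, 7), ncol = 2)   # rows are (3,9), (1,8), (2,7)

asc  <- m[order(m[, 1]), ]                      # ascending by column 1
desc <- m[order(m[, 1], decreasing = TRUE), ]   # descending by column 1

asc
desc
```

Passing several columns to order(), for example order(m[, 1], m[, 2]), breaks ties in the first column by the second.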

The DROP function supports natively compiled, scalar user-defined functions.
Removes one or more user-defined functions from the current database
To execute DROP FUNCTION, at a minimum, a user must have ALTER permission on the schema to which the function belongs, or CONTROL permission on the function.
DROP FUNCTION will fail if there are Transact-SQL functions or views in the database that reference this function and were created by using SCHEMA BINDING, or if there are computed columns, CHECK constraints, or DEFAULT constraints that reference the function.
DROP FUNCTION will fail if there are computed columns that reference this function and have been indexed.
DROP FUNCTION { [ schema_name. ] function_name } [ ,...n ]

Keep column name when select one column from a data frame/matrix in R

In R, when I select only one column from a data frame/matrix, the result becomes a vector and loses the column name. How can I keep the column name?
For example, if I run the following code,
x <- matrix(1,3,3)
colnames(x) <- c("test1","test2","test3")
x[,1]
I will get
[1] 1 1 1
Actually, I want to get
test1
[1,] 1
[2,] 1
[3,] 1
The following code gives me exactly what I want; however, is there an easier way to do this?
x <- matrix(1,3,3)
colnames(x) <- c("test1","test2","test3")
y <- as.matrix(x[,1])
colnames(y) <- colnames(x)[1]
y

Use the drop argument:
> x <- matrix(1,3,3)
> colnames(x) <- c("test1","test2","test3")
> x[,1, drop = FALSE]
test1
[1,] 1
[2,] 1
[3,] 1

Another possibility is to use subset:
> subset(x, select = 1)
test1
[1,] 1
[2,] 1
[3,] 1

The question mentions 'matrix or dataframe' as an input. If x is a data frame, use list subsetting notation, which keeps the column name and does not simplify by default:
x <- matrix(1,3,3)
colnames(x) <- c("test1","test2","test3")
x <- as.data.frame(x)
x[,1]  # matrix-style subsetting: simplifies to a plain vector
x[1]   # list-style subsetting: stays a data frame with the column name
Data frames possess the characteristics of both lists and matrices: if you subset with a single vector, they behave like lists; if you subset with two vectors, they behave like matrices.
There's an important difference if you select a single column: matrix subsetting simplifies by default, list subsetting does not.
source: See http://adv-r.had.co.nz/Subsetting.html#subsetting-operators for details
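The three subsetting styles can be compared side by side. This is a small sketch with a made-up data frame df:

```r
df <- data.frame(test1 = 1:3, test2 = 4:6)

by_matrix <- df[, 1]                 # simplifies: plain integer vector, name lost
by_list   <- df[1]                   # list-style: one-column data frame
by_drop   <- df[, 1, drop = FALSE]   # matrix-style, but simplification switched off

is.data.frame(by_matrix)   # FALSE
is.data.frame(by_list)     # TRUE
names(by_drop)             # "test1"
```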

How to order list by median?

I'm quite new to R and having some problems understanding the reorder function.
Let's say I have a list with 3 vectors like:
myList <- (c(7,5,2),c(2,3,4),c(1,1,1))
and I want my list to be reordered by the median of each vector so that boxplotting the list gives me an ordered plot. How would I do this? I read the help page for ?reorder but I can't seem to adapt the given example to my list.
Any help would be appreciated.

I think you want
myList <- list(c(7,5,2),c(2,3,4),c(1,1,1))
unordered.median <- unlist(lapply(myList, median))
ordered.median <- order(unordered.median)
myList[ordered.median]
[[1]]
[1] 1 1 1
[[2]]
[1] 2 3 4
[[3]]
[1] 7 5 2
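A slightly more compact variant of the same idea uses sapply(), which returns the medians as a vector directly. The element names a, b and c are made up here so the reordering is visible in the result (and would show up as labels in a boxplot):

```r
myList <- list(a = c(7, 5, 2), b = c(2, 3, 4), c = c(1, 1, 1))

med <- sapply(myList, median)    # named vector of medians: a = 5, b = 3, c = 1
sorted <- myList[order(med)]     # list elements rearranged by increasing median

names(sorted)
# boxplot(sorted) would then draw the boxes in this order
```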

"replace" function examples

I don't find the help page for the replace function from the base package to be very helpful. Worst part, it has no examples which could help understand how it works.
Could you please explain how to use it? An example or two would be great.

If you look at the function (by typing its name at the console) you will see that it is just a simple functionalized version of the [<- function, which is described at ?"[". [ is a rather basic function in R, so you would be well-advised to look at that page for further details. Especially important is learning that the index argument (the second argument in replace) can be logical, numeric or character classed values. Recycling will occur when there are differing lengths of the second and third arguments.
You should "read" the function call as: "within the first argument, use the second argument as an index for placing the values of the third argument into the first":
> replace( 1:20, 10:15, 1:2)
[1] 1 2 3 4 5 6 7 8 9 1 2 1 2 1 2 16 17 18 19 20
Character indexing for a named vector:
> replace(c(a=1, b=2, c=3, d=4), "b", 10)
a b c d
1 10 3 4
Logical indexing:
> replace(x <- c(a=1, b=2, c=3, d=4), x>2, 10)
a b c d
1 2 10 10

You can also use logical tests
x <- data.frame(a = c(0,1,2,NA), b = c(0,NA,1,2), c = c(NA, 0, 1, 2))
x
x$a <- replace(x$a, is.na(x$a), 0)
x
x$b <- replace(x$b, x$b==2, 333)

Here are two simple examples:
> x <- letters[1:4]
> replace(x, 3, 'Z') #replacing 'c' by 'Z'
[1] "a" "b" "Z" "d"
>
> y <- 1:10
> replace(y, c(4,5), c(20,30)) # replacing 4th and 5th elements by 20 and 30
[1] 1 2 3 20 30 6 7 8 9 10

Be aware that in the examples given above the third parameter (value) is a constant (e.g. 'Z' or c(20,30)).
Defining the third parameter using values from the data frame itself can lead to confusion.
E.g. with a simple data frame such as this (using dplyr::data_frame):
tmp <- data_frame(a=1:10, b=sample(LETTERS[24:26], 10, replace=T))
This will create something like this:
a b
(int) (chr)
1 1 X
2 2 Y
3 3 Y
4 4 X
5 5 Z
..etc
Now suppose you wanted to multiply the values in column 'a' by 2, but only where column 'b' is "X". My immediate thought would be something like this:
with(tmp, replace(a, b=="X", a*2))
That will not provide the desired outcome, however. The a*2 is evaluated once, as a fixed vector, rather than as a reference to the matching rows of the 'a' column. The vector 'a*2' will thus be
[1] 2 4 6 8 10 12 14 16 18 20
at the start of the 'replace' operation. Thus, in the first row where 'b' equals "X", the value in 'a' will be replaced by 2. The second time, it will be replaced by 4, etc. It will not be replaced by two times the value of 'a' in that particular row.
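One way around this pitfall is to index the replacement values with the same condition, or to use ifelse(), which works row by row. The sketch below uses plain vectors with a fixed b instead of the random sample so that the result is reproducible; the names wrong, right and also_right are made up for illustration.

```r
a <- 1:10
b <- c("X", "Y", "Y", "X", "Z", "X", "Z", "Y", "X", "Z")

# The pitfall: the first four values of a*2 (2, 4, 6, 8) land in the "X" rows.
# R also warns that the lengths do not match, which is suppressed here.
wrong <- suppressWarnings(replace(a, b == "X", a * 2))

# Subset the replacement with the same condition so the values line up
right <- replace(a, b == "X", a[b == "X"] * 2)

# Equivalent element-wise formulation
also_right <- ifelse(b == "X", a * 2, a)

wrong
right
```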

Here's an example where I found the replace( ) function helpful for giving me insight. The problem required a long integer vector be changed into a character vector and with its integers replaced by given character values.
## figuring out replace( )
(test <- c(rep(1,3),rep(2,2),rep(3,1)))
which looks like
[1] 1 1 1 2 2 3
and I want to replace every 1 with an A and 2 with a B and 3 with a C
letts <- c("A","B","C")
so in my own secret little "dirty-verse" I used a loop
for(i in 1:3)
{test <- replace(test,test==i,letts[i])}
which did what I wanted
test
[1] "A" "A" "A" "B" "B" "C"
In the first sentence I purposefully left out that the real objective was to make the big vector of integers a factor vector and assign the integer values (levels) some names (labels).
So another way of doing the replace( ) application here would be
(test <- factor(test,labels=letts))
[1] A A A B B C
Levels: A B C
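If only the character vector is needed, neither the loop nor a factor is required. Since the integers are exactly the positions in letts, they can be used directly as an index. This is a different technique from replace(), shown only as a sketch, and the name out is made up for illustration.

```r
test  <- c(rep(1, 3), rep(2, 2), rep(3, 1))   # 1 1 1 2 2 3
letts <- c("A", "B", "C")

# Each value of test picks the element of letts at that position
out <- letts[test]
out
```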
