Convert all features of an image to a row vector - R

I currently have:
arr<-array(1:3, c(2,4,6))
dim(arr)
#[1] 2 4 6
mg <-data.frame(arr)
dim(mg)
#[1] 2 24
dim(mg2)
#[1] 2 24
And I want to get a row vector with the result:
1 * 48
I've tried to use:
as.vector(t(mg2))
But the result doesn't have the 2*24 = 48 elements I expect.
How can I get the result?

To transform the array to a vector you can use any of the following:
> c(arr)
> as.vector(arr)
> as.matrix(arr)
> t(as.matrix(arr))
The first two produce a plain vector of length 48 (with no dim attribute), while the last two produce matrices of dimension 48*1 and 1*48 respectively.
If you first convert the array to a data frame, remember the dimensions of your array: the first dimension is 2, so the data frame has two rows, and the remaining 4*6 = 24 slices become columns. That's why it gives you 2*24. You can still turn it into a vector from there.
The code as.vector(t(mg)) will give a vector, but the values will be read row by row instead of column by column. Thus for the example above the result will be 1 3 2 1 3 2 ... instead of 1 2 3 1 2 3 .... You can fix this by doing a double transpose on mg, i.e. as.vector(t(t(mg))), or by using c(as.matrix(mg)).
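As a minimal sketch of the difference in reading order, using the arr and mg objects from the question (the names v_direct, v_rowwise, v_colwise and row_vec are only illustrative):

```r
arr <- array(1:3, c(2, 4, 6))
mg <- data.frame(arr)            # 2 rows, 24 columns

v_direct  <- as.vector(arr)      # plain vector, column-major order
v_rowwise <- as.vector(t(mg))    # reads the data frame row by row
v_colwise <- c(as.matrix(mg))    # reads the data frame column by column

head(v_direct, 6)                # 1 2 3 1 2 3
head(v_rowwise, 6)               # 1 3 2 1 3 2
identical(v_colwise, v_direct)   # TRUE

# If an explicit 1 x 48 row "vector" is needed, reshape it as a matrix
row_vec <- matrix(v_direct, nrow = 1)
dim(row_vec)                     # 1 48
```

A base R vector has no row or column orientation, so the matrix() call is only needed when a 1*48 shape is required downstream.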

Related

Sort function in R when index.return=TRUE

I have the following vector in R:
> A<-c(8.1915935, 3.0138083, 0.3245712, 10.7353747, 13.7505131 ,63.2337407, 16.7505131, 5.7781297)
I want to sort it and, at the same time, know each element's position in the sorted vector. So I use the following function:
sort(A, index.return=T)
And I get the following output, which I don't clearly understand:
$x
[1] 0.3245712 3.0138083 5.7781297 8.1915935 10.7353747 13.7505131 16.7505131 63.2337407
$ix
[1] 3 2 8 1 4 5 7 6
Looking at the original vector A, the first element goes in the 4th position of the sorted vector. So the first element of "$ix" should be 4. Why is it 3?
Then, the biggest number of the vector is the 6th of A. But the 6th element of $ix is not 8 (the length of the vector), as I expected, but 6. Why?
And so on, for all the elements. Clearly, there is something I don't understand about this output.
$ix gives, for each element of the sorted vector x, its position in the original vector; you were hoping for the reverse -- for each element of the original vector, its position in x. This is the difference between order() and rank():
> order(A)
[1] 3 2 8 1 4 5 7 6
> rank(A)
[1] 4 2 1 5 6 8 7 3
Note that order(order(A)) == rank(A) when there are no ties (with ties, rank() averages by default), so one way to get the answer you're looking for is
result <- sort(A, index.return = TRUE)
order(result$ix)
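A short sketch that checks both directions on the vector from the question (res and pos are illustrative names):

```r
A <- c(8.1915935, 3.0138083, 0.3245712, 10.7353747,
       13.7505131, 63.2337407, 16.7505131, 5.7781297)

res <- sort(A, index.return = TRUE)

# res$ix[k] is where the k-th smallest value sits in A
all(A[res$ix] == res$x)   # TRUE

# pos[i] is where A[i] ends up in the sorted vector
pos <- order(res$ix)
pos                       # 4 2 1 5 6 8 7 3
all(res$x[pos] == A)      # TRUE
```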

R: match() only returns first occurrence

I have a dataframe
names2 <- c('AdagioBarber','AdagioBarber', 'Beethovan','Beethovan')
Value <- c(33,55,21,54)
song.data <- data.frame(names2,Value)
I would like to arrange it according to this character vector
names <- c('Beethovan','Beethovan','AdagioBarber','AdagioBarber')
I am using match() to achieve this
data.frame(song.data[match((names), (song.data$names2)),])
The problem is that match() returns only the first occurrence of each name:
          names2 Value
3      Beethovan    21
3.1    Beethovan    21
1   AdagioBarber    33
1.1 AdagioBarber    33
You can use order, as @zx8754 and @Evan Friedland have pointed out.
> name.order <- c('Beethovan','AdagioBarber')
> song.data$names2 <- factor(song.data$names2, levels= name.order)
> song.data[order(song.data$names2), ]
        names2 Value
3    Beethovan    21
4    Beethovan    54
1 AdagioBarber    33
2 AdagioBarber    55
Basically, factor turns the strings into integers and creates a lookup table of which integers correspond to which strings. The levels argument specifies what you want that lookup table to be. Without that argument, the levels default to the sorted unique values (alphabetical order for strings).
So for example:
> as.numeric(factor(letters[1:5]))
[1] 1 2 3 4 5
> as.numeric(factor(letters[1:5], levels=c("d","b","e","a","c")))
[1] 4 2 5 1 3
Note: You'll need to be absolutely sure you get all your (correctly spelled) levels in that name.order vector, otherwise you'll end up with NA's in the output from order.
(sort() can sort a factor vector on its own; order() is needed here because we are reordering the rows of a data frame rather than a single vector.)
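If you would rather not convert the column to a factor, one alternative (not the approach from the answer above) is to match each row's name against the desired order and sort on that position. A minimal sketch using the data from the question; name.order and sorted are illustrative names:

```r
names2 <- c('AdagioBarber', 'AdagioBarber', 'Beethovan', 'Beethovan')
Value <- c(33, 55, 21, 54)
song.data <- data.frame(names2, Value, stringsAsFactors = FALSE)

# Desired order of the distinct names
name.order <- c('Beethovan', 'AdagioBarber')

# match() gives each row the position of its name in name.order;
# order() then sorts rows by that position, keeping ties in original order
sorted <- song.data[order(match(song.data$names2, name.order)), ]
sorted$Value   # 21 54 33 55
```

Names missing from name.order get NA from match() and are placed last by order().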

Find string in data.frame

How do I search for a string in a data.frame? As a minimal example, how do I find the locations (columns and rows) of 'horse' in this data.frame?
> df = data.frame(animal=c('goat','horse','horse','two', 'five'), level=c('five','one','three',30,'horse'), length=c(10, 20, 30, 'horse', 'eight'))
> df
  animal level length
1   goat  five     10
2  horse   one     20
3  horse three     30
4    two    30  horse
5   five horse  eight
... so rows 4 and 5 have the wrong order. Any output that would allow me to identify that 'horse' has shifted to the level column in row 5 and to the length column in row 4 is good. Maybe:
> magic_function(df, 'horse')
col row
'animal', 2
'animal', 3
'length', 4
'level', 5
Here's what I want to use this for: I have a very large data frame (around 60 columns, 20,000 rows) in which some columns are messed up for some rows. It's too large to eyeball in order to identify the different ways that order can be wrong, so searching would be nice. I will use this info to move data to the correct columns for these rows.
What about:
which(df == "horse", arr.ind = TRUE)
#      row col
# [1,]   2   1
# [2,]   3   1
# [3,]   5   2
# [4,]   4   3
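To get from that index matrix to something like the col/row listing asked for in the question, the column indices can be looked up in colnames(df). A small sketch (hits and result are illustrative names):

```r
df <- data.frame(animal = c('goat', 'horse', 'horse', 'two', 'five'),
                 level  = c('five', 'one', 'three', 30, 'horse'),
                 length = c(10, 20, 30, 'horse', 'eight'),
                 stringsAsFactors = FALSE)

# Matrix of (row, col) positions where the cell equals "horse"
hits <- which(df == "horse", arr.ind = TRUE)

# Replace column indices by column names and sort by row
result <- data.frame(col = colnames(df)[hits[, "col"]],
                     row = hits[, "row"],
                     stringsAsFactors = FALSE)
result <- result[order(result$row), ]
result$col   # "animal" "animal" "length" "level"
result$row   # 2 3 4 5
```

Note that == tests for exact equality of the whole cell; use grepl() per column if partial matches should count as well.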
Another way around:
l <- sapply(colnames(df), function(x) grep("horse", df[,x]))
$animal
[1] 2 3
$level
[1] 5
$length
[1] 4
If you want the output to be matrix:
sapply(l,'[',1:max(lengths(l)))
     animal level length
[1,]      2     5      4
[2,]      3    NA     NA
We can get the linear indices of the cells that are equal to horse. The comparison matrix is stored column by column, so integer-dividing the zero-based index by the number of rows (nrow) gives the column index, and the remainder of that division gives the row index.
We use colnames to get column names instead of indices.
idx <- which(df == "horse")
data.frame(col = colnames(df)[(idx - 1) %/% nrow(df) + 1],
row = (idx - 1) %% nrow(df) + 1)
#     col row
#1 animal   2
#2 animal   3
#3  level   5
#4 length   4
Another way to do it is the following:
library(data.table)
library(zoo)
library(dplyr)
library(timeDate)
library(reshape2)
Assume the data frame is named tbl_Account.
First, transpose it:
temp = t(tbl_Account)
Then put it into a list:
temp = list(temp)
Transposing turns the data frame into a single character matrix, stored here as temp[[1]], so the whole data frame can be searched in one go.
Then do the searching:
temp[[1]][grep("Horse", temp[[1]])] # returns the matching values themselves
grep("Horse", temp[[1]]) # returns the positions of the matches in the transposed matrix, i.e. counted row by row through the original data frame
Hope this helps :)

Count number of short strings in a long string in R [duplicate]

Suppose I have a long string such as:
c<-"abcabcdabcdeabcdefghijkabcdabcaba"
My question is how to quickly count the number of exact "abcd" in c.
1) gregexpr
First paste "abcd" onto c so that there is at least one match. (This is needed because gregexpr returns -1, rather than a zero-length numeric vector, for any component of c that has no match.) gregexpr then returns a list of numeric vectors holding the starting positions of the matches, one list component per component of c. Here c has only one component, but the code below works more generally. Finally, take the lengths of the components of that list and subtract 1 to account for the extra "abcd" we added. No packages are used.
Example 1
lengths(gregexpr("abcd", paste(c, "abcd"))) - 1
## [1] 4
Note: If we knew that every component had at least one match, it could be slightly simplified to: lengths(gregexpr("abcd", c)).
Example 2
Here is another example. Here DF has 3 rows and the corresponding components of c have 4, 4, and 0 occurrences of "abcd".
DF <- data.frame(c = c(c, c, "X")) # test input
lengths(gregexpr("abcd", paste(DF$c, "abcd"))) - 1
## [1] 4 4 0
2) regmatches
Here is an alternative approach. This approach has the advantage that no special code is needed for the no-match case. Again, no packages are used.
Here are the same two examples:
lengths(regmatches(c, gregexpr("abcd", c)))
## [1] 4
lengths(regmatches(DF$c, gregexpr("abcd", DF$c)))
## [1] 4 4 0
Using the stringr package, you can do it as follows (on a larger data set it should be fairly fast and efficient):
library(stringr)
c <- "abcabcdabcdeabcdefghijkabcdabcaba"
c
[1] "abcabcdabcdeabcdefghijkabcdabcaba"
str_count(c, 'abcd')
[1] 4
This will work on a column of a data frame as follows:
df <- data.frame(txt = rep(c, 10))
df$abcd_count <- str_count(df$txt, 'abcd')
df
                                 txt abcd_count
1  abcabcdabcdeabcdefghijkabcdabcaba          4
2  abcabcdabcdeabcdefghijkabcdabcaba          4
3  abcabcdabcdeabcdefghijkabcdabcaba          4
4  abcabcdabcdeabcdefghijkabcdabcaba          4
5  abcabcdabcdeabcdefghijkabcdabcaba          4
6  abcabcdabcdeabcdefghijkabcdabcaba          4
7  abcabcdabcdeabcdefghijkabcdabcaba          4
8  abcabcdabcdeabcdefghijkabcdabcaba          4
9  abcabcdabcdeabcdefghijkabcdabcaba          4
10 abcabcdabcdeabcdefghijkabcdabcaba          4
Here is one method using base R's gsub and strsplit:
# example
temp <- "abcabcdabcdeabcdefghijkabcdabcaba"
# substitute pattern for character not in string, here 9
temp2 <- gsub("abcd", "9", temp)
# split on 9, and count number of elements
length(strsplit(temp2, split="9")[[1]]) - 1
You need the [[1]] because strsplit is designed to operate over vectors of strings and returns a list; here the vector is of length 1. An alternative to [[1]] in this case is unlist.
Also, 1 is subtracted because splitting produces one more piece than there are abcd patterns.
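One caveat with the split-and-count approach: strsplit drops a trailing empty string, so the count comes out one too low when the text ends with the pattern (for example, the string "abcd" alone would give 0). A possible alternative, sketched here with a hypothetical helper count_occurrences, is to delete the pattern and compare string lengths:

```r
# Hypothetical helper: count non-overlapping occurrences of a fixed pattern
count_occurrences <- function(text, pattern) {
  # Remove every occurrence of the pattern (treated literally, not as a regex)
  removed <- gsub(pattern, "", text, fixed = TRUE)
  # Each occurrence removed shortens the string by nchar(pattern) characters
  (nchar(text) - nchar(removed)) / nchar(pattern)
}

temp <- "abcabcdabcdeabcdefghijkabcdabcaba"
count_occurrences(temp, "abcd")     # 4
count_occurrences("abcd", "abcd")   # 1, also correct at the end of a string
count_occurrences("xyz", "abcd")    # 0
```

Because nchar and gsub are vectorised, the same helper also works on a character column of a data frame.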

How do I go from cell given by dist back to row and column numbers [duplicate]

Say I have
> x<-1:5
> dist(x)
  1 2 3 4
2 1
3 2 1
4 3 2 1
5 4 3 2 1
> which(dist(x)==max(dist(x)))
[1] 4
How do I get from the index 4 back to the row and column numbers (5,1)?
There might be a tidier way ...
dist.x <- dist(x)
which(as.matrix(dist.x) == max(dist.x) & lower.tri(dist.x), arr.ind=TRUE)
#   row col
# 5   5   1
There is an as.matrix method for dist objects, which is useful here. You can try this:
kk <- as.matrix(dist(x))
which(kk == max(kk), arr.ind=TRUE)
For your example,
  row col
5   5   1
1   1   5
dist returns an object of class "dist". You should start by reading the help file, which says:
Value
dist returns an object of class "dist".
The lower triangle of the distance matrix stored by columns in a vector, say do. If n is the number of observations, i.e., n <- attr(do, "Size"), then for i < j ≤ n, the dissimilarity between (row) i and j is do[n*(i-1) - i*(i-1)/2 + j-i]. The length of the vector is n*(n-1)/2, i.e., of order n^2.
The other answers posted modify the "dist" object in useful ways for you.
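The indexing formula quoted from the help file can also be inverted directly, which avoids building the full n*n matrix. The following is only a sketch; dist_index_to_rowcol is a hypothetical helper, not part of base R:

```r
# Hypothetical helper: map linear indices k of a "dist" object of size n
# back to (row, col) positions in the lower triangle
dist_index_to_rowcol <- function(k, n) {
  i <- seq_len(n - 1)
  # Linear index of the first entry stored for column i, i.e. the pair
  # (i, i + 1), from the help-file formula n*(i-1) - i*(i-1)/2 + j - i
  starts <- n * (i - 1) - i * (i - 1) / 2 + 1
  col_idx <- findInterval(k, starts)          # column whose block contains k
  row_idx <- k - starts[col_idx] + col_idx + 1
  cbind(row = row_idx, col = col_idx)
}

x <- 1:5
d <- dist(x)
k <- which(d == max(d))                       # 4
res <- dist_index_to_rowcol(k, attr(d, "Size"))
res                                           # row 5, col 1
```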
