Extracting exact value from a data.frame in R

I cannot find the answer to this question. All the answers my search engine gives me are about how to round a number, not how to get the unrounded value. Suppose I have a data.frame:
a <- c(1:4)
b <- c(1.123456789, 2.123456789, 3.123456789, 4.123456789)
df <- data.frame(a, b)
All the methods I know return the number rounded to the 6th digit after the decimal point:
df[2,2]
# [1] 2.123457
df[2,]
# a b
# 2 2 2.123457
df$b
# [1] 1.123457 2.123457 3.123457 4.123457
df$b[2]
# [1] 2.123457
df[df$a == 2, ]
# a b
# 2 2 2.123457
So, how do I get the exact value? My desired output would be
[1] 2.123456789
Thank you!
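Nothing is actually lost here: df[2, 2] holds the full double-precision value, and only the printed representation is rounded to getOption("digits") significant digits (7 by default). A minimal sketch of a few base R ways to display more digits, using the question's df:

```r
a <- c(1:4)
b <- c(1.123456789, 2.123456789, 3.123456789, 4.123456789)
df <- data.frame(a, b)

x <- df[2, 2]

# Ask print() for more significant digits than the default 7
print(x, digits = 10)
# [1] 2.123456789

# format() and sprintf() return the value as a string with the requested precision
format(x, digits = 10)
sprintf("%.9f", x)

# The stored value equals the original literal; it was never rounded
x == 2.123456789
# [1] TRUE
```

Setting options(digits = 10) would change the default for the whole session instead of a single call.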

Related

How to delete from vector using ifelse condition in R

I have a vector a with values (1,2,3,4) and another vector b with values (1,1,0,1). Using the elements of b as flags, I want to remove the elements of a at the positions where b is 0.
a <- c(1,2,3,4)
b <- c(1,1,0,1)
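for(i in 1:length(b))
{
  # works here because b has a single 0; with several zeros,
  # removing an element shifts the positions of the later ones
  if(b[i] == 0)
  {
    a <- a[-i]
  }
}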
I get the desired output
a
[1] 1 2 4
But using ifelse, I do not get the output as required.
a <- c(1,2,3,4)
b <- c(1,1,0,1)
for(i in 1:length(b))
{
a <- ifelse(b[i] == 0,a[-i],a)
}
Output:
a
[1] 1
How to use ifelse in such situations?

I think ifelse isn't the correct function here, since ifelse returns a result of the same length as its test, and we want to subset values here. You don't need a loop either. You can directly do
a[b != 0]
#[1] 1 2 4
data
a <- 1:4
b <- c(1, 1, 0, 1)
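To see why ifelse falls short here, a minimal sketch using the same a and b: ifelse returns a result shaped like its test, so a length-1 test (as inside the loop) gives a length-1 result, and a full-length test keeps every position.

```r
a <- c(1, 2, 3, 4)
b <- c(1, 1, 0, 1)

# A length-1 test yields a length-1 result: here just a[1]
ifelse(b[1] == 0, a[-1], a)
# [1] 1

# A full-length test keeps all four positions; nothing is removed
ifelse(b == 0, NA, a)
# [1]  1  2 NA  4

# Logical subsetting actually drops the flagged element
a[b != 0]
# [1] 1 2 4
```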

Another option could be:
a[as.logical(b)]
[1] 1 2 4

If you want to use ifelse, you can use the following code
na.omit(ifelse(b==0,NA,a))
which gives
> na.omit(ifelse(b==0,NA,a))
[1] 1 2 4
attr(,"na.action")
[1] 3
attr(,"class")
[1] "omit"

We can also use double negation
a[!!b]
#[1] 1 2 4
data
a <- 1:4
b <- c(1, 1, 0, 1)

Remove a sequence in a character in R

I have the following character string in a data.frame:
b <- "http://datos.labcd.mx/dataset/5b18cc1e-d2f2-46b0-bf2c-e699ae2af713/resource/e265a46f-7a9f-4a30-ae0d-d5937fff17c1/download/201003.csv"
I just want to extract the number 201003.
How should I do that?

b <- "http://datos.labcd.mx/dataset/5b18cc1e-d2f2-46b0-bf2c-e699ae2af713/resource/e265a46f-7a9f-4a30-ae0d-d5937fff17c1/download/201003.csv"
Try this on 'b':
file_name <- basename(b)
file_name
# [1] "201003.csv"
number <- strsplit(file_name, "\\.")[[1]]
number
# [1] "201003" "csv"
number = as.numeric(number[1])
number
# [1] 201003
Hope this helped.
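A shorter sketch of the same idea, assuming the value you want is always the file name without its extension: tools::file_path_sans_ext() (the tools package ships with R) drops the ".csv" part directly. A regex variant is also shown; it assumes the URL always ends in "/<digits>.csv".

```r
b <- "http://datos.labcd.mx/dataset/5b18cc1e-d2f2-46b0-bf2c-e699ae2af713/resource/e265a46f-7a9f-4a30-ae0d-d5937fff17c1/download/201003.csv"

# basename() keeps "201003.csv"; file_path_sans_ext() removes ".csv"
number <- as.numeric(tools::file_path_sans_ext(basename(b)))
number
# [1] 201003

# Regex alternative: capture the digits between the last "/" and ".csv"
sub(".*/([0-9]+)\\.csv$", "\\1", b)
# [1] "201003"
```

basename(), file_path_sans_ext() and sub() are all vectorized, so either version can be applied to a whole data.frame column at once.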

How do you determine which element in a list contains a value matching some other value?

If I have the following list:
a <- list(1:3, 4:5, 6:9)
a
[[1]]
[1] 1 2 3
[[2]]
[1] 4 5
[[3]]
[1] 6 7 8 9
I want to determine which element of the list a specific value is in. For example, I might want to find which element the number 5 falls under. In this case it would be [[2]].
My goal is to have something like
match(5,a)
return the value 2.
However, match only finds list elements that are exactly equal to the value, so it returns NA here:
match(5,a)
[1] NA
Further, unlist only tells me the position of my number of interest in the flattened vector of all values:
match(5,unlist(a))
[1] 5
Thoughts?

You can use the grep function:
grep(5, a)
# [1] 2
grep(9, a)
# [1] 3
Updated Answer
After reading @nicola's comment, I realized that grep only works for numbers at the start or end of each list element and not for those in between: grep converts each element to a string such as "6:9", so grep(7, a) finds nothing.
You can try the code below for a complete solution:
a <- list(1:3, 4:5, 6:9)
df <- data.frame(unlist(a))
df$group <- 0
k <- 1
x <- integer(length(a))  # number of values in each list element
for(i in seq_along(a))
{
  x[i] <- length(a[[i]])
  for(j in seq_len(x[i]))
  {
    df$group[k] <- i  # row k of df came from list element i
    k <- k+1
  }
}
colnames(df)[1] <- "num"
df[df$num == 5, ]$group
# [1] 2
df[df$num == 9, ]$group
# [1] 3
df[df$num == 8, ]$group
# [1] 3
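Two loop-free sketches of the same lookup, using only base R (lengths() requires R 3.2.0 or later). The first tests each list element for membership with %in%; the second builds the same group column as the loop above in one vectorized step.

```r
a <- list(1:3, 4:5, 6:9)

# Option 1: which list element(s) contain the value?
which(vapply(a, function(v) 5 %in% v, logical(1)))
# [1] 2

# Option 2: repeat each element's index once per value it holds
group <- rep(seq_along(a), lengths(a))
group
# [1] 1 1 1 2 2 3 3 3 3

# then look up the group at the value's position in the flattened vector
group[match(8, unlist(a))]
# [1] 3
```

Option 1 returns every matching element if a value occurs in several of them, while option 2 (via match) only reports the first occurrence.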

"replace" function examples

I don't find the help page for the replace function from the base package very helpful. Worst of all, it has no examples that could help me understand how it works.
Could you please explain how to use it? An example or two would be great.

If you look at the function (by typing its name at the console), you will see that it is just a simple functionalized version of the [<- function, which is described at ?"[". [ is a rather basic function in R, so you would be well-advised to look at that page for further details. Especially important is learning that the index argument (the second argument in replace) can be logical, numeric, or character. Recycling will occur when the second and third arguments differ in length:
You should "read" the function call as "within the first argument, use the second argument as an index for placing the values of the third argument into the first":
> replace( 1:20, 10:15, 1:2)
[1] 1 2 3 4 5 6 7 8 9 1 2 1 2 1 2 16 17 18 19 20
Character indexing for a named vector:
> replace(c(a=1, b=2, c=3, d=4), "b", 10)
a b c d
1 10 3 4
Logical indexing:
> replace(x <- c(a=1, b=2, c=3, d=4), x>2, 10)
a b c d
1 2 10 10
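A small sketch of the "functionalized [<-" point: replace() gives the same result as assigning through [<- on a copy, and it leaves its first argument untouched, so you must assign the result back if you want to keep it.

```r
x <- c(a = 1, b = 2, c = 3, d = 4)

# replace() returns a modified copy ...
y1 <- replace(x, x > 2, 10)

# ... which matches assigning through `[<-` on a copy of x
y2 <- x
y2[y2 > 2] <- 10

identical(y1, y2)
# [1] TRUE

x  # the original is unchanged
# a b c d
# 1 2 3 4
```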

You can also use logical tests
x <- data.frame(a = c(0,1,2,NA), b = c(0,NA,1,2), c = c(NA, 0, 1, 2))
x
x$a <- replace(x$a, is.na(x$a), 0)
x
x$b <- replace(x$b, x$b==2, 333)

Here are two simple examples:
> x <- letters[1:4]
> replace(x, 3, 'Z') #replacing 'c' by 'Z'
[1] "a" "b" "Z" "d"
>
> y <- 1:10
> replace(y, c(4,5), c(20,30)) # replacing 4th and 5th elements by 20 and 30
[1] 1 2 3 20 30 6 7 8 9 10

Be aware that in the examples given above, the third parameter (values) is a constant (e.g. 'Z' or c(20,30)).
Defining the third parameter using values from the data frame itself can lead to confusion.
E.g. with a simple data frame such as this (using dplyr::data_frame, which has since been deprecated in favour of tibble::tibble):
tmp <- data_frame(a=1:10, b=sample(LETTERS[24:26], 10, replace=TRUE))
This will create something like this:
a b
(int) (chr)
1 1 X
2 2 Y
3 3 Y
4 4 X
5 5 Z
..etc
Now suppose you wanted to multiply the values in column 'a' by 2, but only where column 'b' is "X". My immediate thought would be something like this:
with(tmp, replace(a, b=="X", a*2))
That will not produce the desired outcome, however. a*2 is evaluated once, as a fixed vector, rather than row by row against the 'a' column. The vector a*2 will thus be
[1] 2 4 6 8 10 12 14 16 18 20
at the start of the replace operation. So the first row where 'b' equals "X" gets the value 2, the second gets 4, and so on; 'a' is not replaced by two times its own value in that particular row.
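A minimal sketch of the pitfall and one way around it. Here tmp is a hypothetical base R data.frame with b fixed instead of sampled, so the output is reproducible. The fix is to subset the replacement values with the same logical index; ifelse() is shown as an element-wise alternative.

```r
# Hypothetical data: b is fixed so the result is reproducible
tmp <- data.frame(a = 1:10,
                  b = c("X", "Y", "Y", "X", "Z", "X", "Y", "Z", "Z", "X"))

# Pitfall: a * 2 is the full 10-element column, consumed in order for the 4 "X" rows
# (R is expected to warn about the length mismatch, so the warning is suppressed)
wrong <- suppressWarnings(with(tmp, replace(a, b == "X", a * 2)))
wrong
# [1] 2 2 3 4 5 6 7 8 9 8

# Fix: pick the replacement values with the same logical index
right <- with(tmp, replace(a, b == "X", a[b == "X"] * 2))
right
# [1]  2  2  3  8  5 12  7  8  9 20

# Element-wise alternative giving the same result
alt <- with(tmp, ifelse(b == "X", a * 2, a))
identical(right, alt)
# [1] TRUE
```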

Here's an example where I found the replace( ) function helpful for giving me insight. The problem required a long integer vector to be changed into a character vector, with its integers replaced by given character values.
## figuring out replace( )
(test <- c(rep(1,3),rep(2,2),rep(3,1)))
which looks like
[1] 1 1 1 2 2 3
and I want to replace every 1 with an A and 2 with a B and 3 with a C
letts <- c("A","B","C")
so in my own secret little "dirty-verse" I used a loop
for(i in 1:3)
{test <- replace(test,test==i,letts[i])}
which did what I wanted
test
[1] "A" "A" "A" "B" "B" "C"
In the first sentence I purposefully left out that the real objective was to make the big vector of integers a factor vector and assign the integer values (levels) some names (labels).
So another way of doing the replace( ) application here would be
(test <- factor(test,labels=letts))
[1] A A A B B C
Levels: A B C
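For this particular case there is also a loop-free sketch: because the codes are the consecutive integers 1, 2, 3, they can be used directly as positions into letts. This relies on that assumption; arbitrary codes would need match() or a named lookup vector instead.

```r
test <- c(rep(1, 3), rep(2, 2), rep(3, 1))
letts <- c("A", "B", "C")

# Each code selects the label at that position in letts
letts[test]
# [1] "A" "A" "A" "B" "B" "C"
```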

R - preserve order when using matching operators (%in%)

I am using matching operators to grab values that appear in a matrix from a separate data frame. However, the resulting matrix has the values in the order they appear in the data frame, not in the original matrix. Is there any way to preserve the order of the original matrix using the matching operator?
Here is a quick example:
vec=c("b","a","c"); vec
df=data.frame(row.names=letters[1:5],values=1:5); df
df[rownames(df) %in% vec,1]
This produces > [1] 1 2 3 which is the order "a" "b" "c" appears in the data frame. However, I would like to generate >[1] 2 1 3 which is the order they appear in the original vector.
Thanks!

Use match.
df[match(vec, rownames(df)), ]
# [1] 2 1 3
Be aware that if you have duplicate values in either vec or rownames(df), match may not behave as expected.
Edit:
I just realized that row name indexing will solve your issue a bit more simply and elegantly:
df[vec, ]
# [1] 2 1 3
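A short sketch of what match() returns, including the caveats about duplicates and non-matches mentioned above, using the question's vec and df:

```r
vec <- c("b", "a", "c")
df <- data.frame(row.names = letters[1:5], values = 1:5)

# For each element of vec, its position in rownames(df), in vec's order
df[match(vec, rownames(df)), "values"]
# [1] 2 1 3

# With duplicates in the table, only the first position is returned
match(c("x", "y"), c("y", "x", "x"))
# [1] 2 1

# Values with no match come back as NA
match("z", rownames(df))
# [1] NA
```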

Use match (and get rid of the NA values for elements in either vector for those that don't match in the other):
Filter(function(x) !is.na(x), match(rownames(df), vec))

Since row name indexing also works on vectors, we can take this one step further and define:
'%ino%' <- function(x, table) {
  xSeq <- seq(along = x)
  names(xSeq) <- x
  Out <- xSeq[as.character(table)]
  Out[!is.na(Out)]
}
We now have the desired result:
df[rownames(df) %ino% vec, 1]
[1] 2 1 3
Inside the function, names<- automatically converts x to character and table is converted with as.character(), so this also works correctly when the inputs to %ino% are numbers:
LETTERS[1:26 %in% 4:1]
[1] "A" "B" "C" "D"
LETTERS[1:26 %ino% 4:1]
[1] "D" "C" "B" "A"
As with %in%, values that are not found are dropped:
LETTERS[1:26 %in% 3:-5]
[1] "A" "B" "C"
LETTERS[1:26 %ino% 3:-5]
[1] "C" "B" "A"
With %in%, the logical vector is recycled along the dimension of the object being subsetted; this is not the case with %ino%, which returns positions:
data.frame(letters, LETTERS)[1:5 %in% 3:-5,]
letters LETTERS
1 a A
2 b B
3 c C
6 f F
7 g G
8 h H
11 k K
12 l L
13 m M
16 p P
17 q Q
18 r R
21 u U
22 v V
23 w W
26 z Z
data.frame(letters, LETTERS)[1:5 %ino% 3:-5,]
letters LETTERS
3 c C
2 b B
1 a A
