Checking whether a row name exists in a data frame? - r

I would like to address the rows of a data frame by a string name, and the table will be built sequentially. I want to do something like
> mytab <- data.frame(city=c("tokyo","delhi","lima"),price=c(9,8,7),row.names=1)
> mytab
price
tokyo 9
delhi 8
lima 7
> # I can add a new row
> mytab["london",] = 8.5
I now need to check whether a row name already exists.
> mytab["ny",]
[1] NA
Is there anything better that I can do other than
> if (is.na(mytab["ny",])) { mytab["ny",]=9;}
since an NA may also arise for other reasons?

Something like this
if (!('ny' %in% row.names(mytab))) {mytab['ny',]=9}
might do the trick.
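A minimal sketch of wrapping that check in a function, reusing the question's mytab. The helper name add_if_missing is made up for illustration and is not a base R function.

```r
# Hypothetical helper: add a row only when its name is not already present.
add_if_missing <- function(tab, name, value) {
  if (!(name %in% row.names(tab))) {
    tab[name, ] <- value  # same assignment form as in the question
  }
  tab
}

mytab <- data.frame(city = c("tokyo", "delhi", "lima"),
                    price = c(9, 8, 7), row.names = 1)
mytab <- add_if_missing(mytab, "ny", 9)     # "ny" is new, so it is added
mytab <- add_if_missing(mytab, "lima", 99)  # "lima" exists, so nothing changes
```

Because the test is on the row names rather than on the cell value, an existing row that legitimately holds NA is not overwritten.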

There are plenty of ways to do this. One of the easiest is to use the any() function like this:
# Returns true if any of the row names is 'lima', false otherwise.
any(row.names(mytab) == 'lima')
Since this returns a single logical value (TRUE or FALSE), you can use it directly as the condition of an if statement.

Here's a slightly different approach if you want to check several cities in one go. That could help speed things up...
mytab <- data.frame(city=c("tokyo","delhi","lima"),price=c(9,8,7),row.names=1)
# Check several cities in one go:
newcities <- c('foo', 'delhi', 'bar')
# Returns the missing cities:
setdiff(newcities, row.names(mytab))
#[1] "foo" "bar"
# Returns the existing cities:
intersect(newcities, row.names(mytab))
#[1] "delhi"
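Building on the setdiff() idea, here is a rough sketch of adding all missing cities in one step. The newprices vector and its values are invented for the example; rbind() is just one way to append the rows.

```r
mytab <- data.frame(city = c("tokyo", "delhi", "lima"),
                    price = c(9, 8, 7), row.names = 1)

# Hypothetical new data: names are cities, values are prices.
newprices <- c(foo = 1, delhi = 2, bar = 3)

# Keep only the cities that are not yet row names of mytab.
missing <- setdiff(names(newprices), row.names(mytab))

# Append them as new rows; existing rows (here "delhi") are left untouched.
mytab <- rbind(mytab,
               data.frame(price = unname(newprices[missing]),
                          row.names = missing))
```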

Related

How to create a R dictionary [duplicate]

I want to make the equivalent of a python dict in R. Basically in python, I have:
visited = {}
if atom_count not in visited:
    # Do stuff
    visited[atom_count] = 1
The idea is that once I have seen a specific atom_count, I set visited[atom_count] = 1. Thus, if I see that atom_count again, I don't "Do stuff". atom_count is an integer.
Thanks!
The closest thing to a python dict in R is simply a list. Like most R data types, lists can have a names attribute that can allow lists to act like a set of name-value pairs:
> l <- list(a = 1,b = "foo",c = 1:5)
> l
$a
[1] 1
$b
[1] "foo"
$c
[1] 1 2 3 4 5
> l[['c']]
[1] 1 2 3 4 5
> l[['b']]
[1] "foo"
Now for the usual disclaimer: they are not exactly the same; there will be differences. So you will be inviting disappointment if you try to use lists exactly the way you would use a dict in Python.
If, like in your case, you just want your "dictionary" to store values of the same type, you can simply use a vector, and name each element.
> l <- c(a = 1, b = 7, f = 2)
> l
a b f
1 7 2
If you want to access the "keys", use names.
> names(l)
[1] "a" "b" "f"
I believe that using a hash table (creating a new environment) may be the solution to your problem. I'd type out how to do this, but I just did so yesterday at talkstats.com.
If your dictionary is large and only two columns then this may be the way to go. Here's the link to the talkstats thread with sample R code:
HASH TABLE LINK
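For reference, a minimal sketch of the environment-as-hash-table idea applied to the "visited" pattern from the question. This is not the code from the linked thread; the atom_counts input and the processed vector are placeholders standing in for the real data and for "Do stuff".

```r
visited <- new.env(hash = TRUE)

atom_counts <- c(3L, 5L, 3L, 7L, 5L)  # hypothetical input
processed <- integer(0)               # records what "Do stuff" was run on

for (atom_count in atom_counts) {
  key <- as.character(atom_count)     # environment keys must be strings
  if (!exists(key, envir = visited, inherits = FALSE)) {
    processed <- c(processed, atom_count)  # stands in for "Do stuff"
    assign(key, 1, envir = visited)
  }
}

processed    # each distinct value appears once, in first-seen order
ls(visited)  # the "keys" of the dictionary
```

Setting inherits = FALSE matters: without it, exists() would also look in enclosing environments and could report a key that was never stored in visited.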

Ignore or display NA in a row if the search word is not available in a list- R

How can I print NA (not available) when a search term from my list (Table_search) is not found in the input? The input has three lines, and I have three keywords to search for in those lines. If a line contains a keyword, I want to print that line; otherwise I want to print NA, as shown in the desired output below.
My code only prints the matching lines, which doesn't help, because I also need to know where a keyword is missing.
Table_search <- list("Table 14", "Source Data:","VERSION")
Table_match_list <- sapply(Table_search, grep, x = tablelist, value = TRUE)
Input:
Table 14.1.1.1 (Page 1 of 2)
Source Data: Listing 16.2.1.1.1
Summary of Subject Status by Respiratory/Non-Ambulatory at Event Entry
Desired Output:
Table 14.1.1.1 (Page 1 of 2)
Source Data: Listing 16.2.1.1.1
NA
@r2evans
sapply(unlist(Table_search), grepl, x = dat)
I get a good output with this code actually, but instead of true or false I would like to print the actual data.
I think a single regex will do it:
replace(dat, !grepl(paste(unlist(Table_search), collapse="|"), dat), NA)
# [1] "Table 14.1.1.1 (Page 1 of 2)" "Source Data: Listing 16.2.1.1.1"
# [3] NA
One problem with using sapply(., grep) is that grep returns integer indices, and if no match is made it returns a length-0 vector. For sapply (a class-unsafe function), this means that you may or may not get an integer vector in return. Each return may be length 0 (nothing found) or length 1 (something found), and when sapply finds that the return values are not all exactly the same length, it returns a list instead (ergo my "class-unsafe" verbiage above).
This doesn't change when you use value=TRUE: change my reasoning above from "0 or 1 integer" to "0 or 1 character", and it's exactly the same problem.
Because of this, I suggest grepl: it should always return logical indicating found or not found.
Further, since you don't appear to need to differentiate which of the patterns is found, just "at least one of them", then we can use a single regex, joined with the regex-OR operator |. This works with an arbitrary length of your Table_search list.
If you somehow needed to know which of the patterns was found, then you might want something like:
sapply(unlist(Table_search), grepl, x = dat)
#      Table 14 Source Data: VERSION
# [1,]     TRUE        FALSE   FALSE
# [2,]    FALSE         TRUE   FALSE
# [3,]    FALSE        FALSE   FALSE
and then figure out what to do with the different columns (each row indicates a string within the dat vector).
One way (that is doing the same as my first code suggestion, albeit less efficiently) is
rowSums(sapply(unlist(Table_search), grepl, x = dat)) > 0
# [1] TRUE TRUE FALSE
where the logical return value indicates if something was found. If, for instance, you want to know whether two or more of the patterns were found, you might use rowSums(.) >= 2.
Data
Table_search <- list("Table 14", "Source Data:","VERSION")
dat <- c("Table 14.1.1.1 (Page 1 of 2)", "Source Data: Listing 16.2.1.1.1", "Summary of Subject Status by Respiratory/Non-Ambulatory at Event Entry")
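One way to sidestep the class-safety concern with sapply described above is vapply, which is told up front what each result must look like and fails loudly otherwise. This is a sketch of that alternative, not part of the original answer; the names patterns, hits, found and result are chosen for the example.

```r
Table_search <- list("Table 14", "Source Data:", "VERSION")
dat <- c("Table 14.1.1.1 (Page 1 of 2)",
         "Source Data: Listing 16.2.1.1.1",
         "Summary of Subject Status by Respiratory/Non-Ambulatory at Event Entry")

patterns <- unlist(Table_search)

# Each grepl() call must return a logical vector of length(dat);
# vapply() raises an error if it does not, instead of silently
# switching to a list.
hits <- vapply(patterns, grepl, logical(length(dat)), x = dat)

# A line counts as found if at least one pattern matched it.
found <- rowSums(hits) > 0

# Keep matching lines, replace the rest with NA.
result <- ifelse(found, dat, NA_character_)
```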

List Rows from column selection

Hello, I would like to select rows from a data frame in the form of a list. Here is my data frame:
df2 <- data.frame("user_id" = 1:2, "username" = c(215,154), "password" = c("John4","Dora4"))
With this data frame I can only select one column at a time to view its values, which I did with this code:
df2[["user_id"]]
output is
[1] 1 2
But when I try this with more columns I am told it is out of bounds. What is the problem here?
df2[["user_id", "username"]]
How can I resolve this and get the rows as a list?
If I understood your question correctly, you need to familiarize yourself with subsetting in R. These are ways to select multiple columns in R:
df2[,c('user_id', 'username')]
or
df2[,1:2]
If you want to return all columns as a list, you can use something like this:
lapply(seq_len(ncol(df2)), function(x) df2[,x])
The format is df2['rows','columns'], so you should use:
df2[,c("user_id", "username")]
To get them 'in form of list', do:
as.list(df2[,c("user_id", "username")])
The double bracket [[ notation is used to select a single element without its name (in this case a single column as a plain vector, since data frames are essentially lists of columns).
See this answer for more on double vs single bracket notation: https://stackoverflow.com/a/1169495/8444966
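A short sketch contrasting the two bracket forms on the question's df2. As far as I can tell, the "out of bounds" error comes from df2[["user_id", "username"]] being read as row "user_id", column "username", and no row has that name. The variable names below are just for the example.

```r
df2 <- data.frame(user_id = 1:2,
                  username = c(215, 154),
                  password = c("John4", "Dora4"))

# [[ ]] with one name: a single column, returned as a plain vector.
one_col <- df2[["user_id"]]

# [ ] with several names: still a data frame, restricted to those columns.
sub_df <- df2[c("user_id", "username")]

# Dropping the data frame class leaves a plain named list of columns.
as_list <- as.list(sub_df)
```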
This should give you a list of rows (there's got to be an existing answer for this somewhere here).
row_list<- as.list(as.data.frame(t(df2[c("user_id", "username")])))
#$V1
#[1] 1 215
#$V2
#[1] 2 154
If you want to keep the row names:
df2_subset <- df2[c("user_id", "username")]
setNames(split(df2_subset, seq(nrow(df2_subset))), rownames(df2_subset))
#$`1`
# user_id username
#1 1 215
#$`2`
# user_id username
#2 2 154

Lookup Values in Named List to Replace in another Named List in R

I have a list of coded answers to a survey, along with a codex which has each possible coded answer to a particular question in said survey with the actual answer stored with it. The data is set up as a list, which is built something like this for context:
mylist=list(a=list(AA="Yes",AB="No",AC="Maybe"),b=list(BA="Yes",BB="No",BC="Maybe"))
myanswers<-list(a="AA",b="BC")
So currently the data looks like:
myanswers
$a
[1] "AA"
$b
[1] "BC"
but I would like
myanswers
$a
[1] "Yes"
$b
[1] "Maybe"
I have tried using different lapply methods but have not been able to get those to work. Also, the indexes do not always line up, so trying a for loop has not garnered the best results either.
You can do this in base R with the mapply function, assuming the lists are in the same order:
mapply(function(a,b) a[b], mylist, myanswers)
or, if that's not the case, you can Map over the names:
Map(function(x) {
mylist[[x]][[myanswers[[x]]]]
}, names(myanswers))
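To illustrate why the name-based version is robust to ordering, here is a small check using the question's data with myanswers deliberately reordered. The reordering and the variable name decoded are assumptions made for the example.

```r
mylist <- list(a = list(AA = "Yes", AB = "No", AC = "Maybe"),
               b = list(BA = "Yes", BB = "No", BC = "Maybe"))

# Same answers as in the question, but listed in a different order
# than the questions appear in mylist.
myanswers <- list(b = "BC", a = "AA")

# Look up each answer code inside the codex entry with the same name.
decoded <- Map(function(x) {
  mylist[[x]][[myanswers[[x]]]]
}, names(myanswers))
```

Map names its result after the character vector it iterates over, so decoded keeps the question names in the order of myanswers.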
Another option where you first create a lookup table
(lookup <- do.call(rbind, lapply(mylist, stack)))
# values ind
#a.1 Yes AA
#a.2 No AB
#a.3 Maybe AC
#b.1 Yes BA
#b.2 No BB
#b.3 Maybe BC
And then use lapply and match for the replacement
lapply(myanswers, function(x) lookup$values[match(x, lookup$ind)])
#$a
#[1] "Yes"
#
#$b
#[1] "Maybe"

R: manipulating data.frames containing strings and booleans

I have a data.frame in R; it's called p. Each element in the data.frame is either TRUE or FALSE. My variable p has, say, m rows and n columns. For every row there is strictly only one TRUE element.
It also has column names, which are strings. What I would like to do is the following:
1. For every row in p where I see a TRUE, I would like to replace it with the name of the corresponding column.
2. I would then like to collapse the data.frame, which now contains FALSEs and column names, to a single vector, which will have m elements.
I would like to do this in an R-thonic manner, so as to continue my enlightenment in R and contribute to a world without for-loops.
I can do step 1 using the following for loop:
for (i in seq(length(colnames(p)))) {
  p[p[,i]==TRUE,i]=colnames(p)[i]
}
but there's no beauty here and I have totally subscribed to this for-loops-in-R-are-probably-wrong mentality. Maybe wrong is too strong but they're certainly not great.
I don't really know how to do step 2. I kind of hoped that the sum of a string and FALSE would return the string but it doesn't. I kind of hoped I could use an OR operator of some kind but can't quite figure that out (Python responds to False or 'bob' with 'bob'). Hence, yet again, I appeal to you beautiful Rstats people for help!
Here's some sample data:
df <- data.frame(a=c(FALSE, TRUE, FALSE), b=c(TRUE, FALSE, FALSE), c=c(FALSE, FALSE, TRUE))
You can use apply to do something like this:
names(df)[apply(df, 1, which)]
Or without apply by using which directly:
idx <- which(as.matrix(df), arr.ind=TRUE)
names(df)[idx[order(idx[,1]),"col"]]
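Both variants rely on every row containing exactly one TRUE; if a row had none or several, apply(df, 1, which) would return a list rather than a vector. A sketch that checks this assumption first, using the sample data from the question (the names m and labels are chosen for the example):

```r
df <- data.frame(a = c(FALSE, TRUE, FALSE),
                 b = c(TRUE, FALSE, FALSE),
                 c = c(FALSE, FALSE, TRUE))

m <- as.matrix(df)

# Guard the "exactly one TRUE per row" assumption before indexing.
stopifnot(all(rowSums(m) == 1))

# For each row, find the position of its TRUE and map it to a column name.
labels <- colnames(m)[apply(m, 1, which)]
labels
```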
Use apply to sweep your index through, and use that index to access the column names:
> df <- data.frame(a=c(TRUE,FALSE,FALSE),b=c(FALSE,FALSE,TRUE),
+ c=c(FALSE,TRUE,FALSE))
> df
a b c
1 TRUE FALSE FALSE
2 FALSE FALSE TRUE
3 FALSE TRUE FALSE
> colnames(df)[apply(df, 1, which)]
[1] "a" "c" "b"
