How to create an R dictionary

I want to make the equivalent of a Python dict in R. Basically, in Python I have:
visited = {}
if atom_count not in visited:
    Do stuff
    visited[atom_count] = 1
The idea is that once I have seen a specific atom_count, I set visited[atom_count] = 1. Thus, if I see that atom_count again, I don't "Do stuff". atom_count is an integer.
Thanks!

The closest thing to a Python dict in R is simply a list. Like most R data types, lists can have a names attribute that lets them act like a set of name-value pairs:
> l <- list(a = 1,b = "foo",c = 1:5)
> l
$a
[1] 1
$b
[1] "foo"
$c
[1] 1 2 3 4 5
> l[['c']]
[1] 1 2 3 4 5
> l[['b']]
[1] "foo"
Now for the usual disclaimer: they are not exactly the same, and there will be differences. So you will be inviting disappointment if you try to use lists exactly the way you would use a dict in Python.
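Applied to the question, a named list can serve as the visited set. This is a minimal sketch, assuming the atom_count values arrive one at a time in a loop; atom_counts and processed are hypothetical names used only for illustration, and processed stands in for "Do stuff". List names are strings, so each integer key is converted with as.character().

```r
# Sketch: a named list used as a "visited" set keyed by atom_count.
# `[[` on a list returns NULL for a name that is not present.
atom_counts <- c(3L, 5L, 3L, 7L, 5L)  # hypothetical input
visited <- list()
processed <- integer(0)

for (atom_count in atom_counts) {
  key <- as.character(atom_count)
  if (is.null(visited[[key]])) {
    processed <- c(processed, atom_count)  # stands in for "Do stuff"
    visited[[key]] <- 1
  }
}

processed       # 3 5 7
names(visited)  # "3" "5" "7"
```

Note that `[[` matches names exactly, whereas `$` allows partial matching, so `[[` is the safer choice for dictionary-style lookups.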

If, as in your case, you just want your "dictionary" to store values of the same type, you can simply use a vector and name each element.
> l <- c(a = 1, b = 7, f = 2)
> l
a b f
1 7 2
If you want to access the "keys", use names.
> names(l)
[1] "a" "b" "f"
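A sketch of the named-vector idea applied to the visited check, assuming integer keys converted to character names (the atom_count values below are hypothetical):

```r
# Sketch: a named numeric vector as a lightweight dictionary.
l <- c(a = 1, b = 7, f = 2)
l[["b"]]            # 7 (the value for key "b", without its name)
"z" %in% names(l)   # FALSE: key lookup goes through names()

# The "visited" check from the question, keyed by atom_count
visited <- numeric(0)
for (atom_count in c(3L, 5L, 3L)) {   # hypothetical input
  key <- as.character(atom_count)
  if (!(key %in% names(visited))) {
    # Do stuff here
    visited[key] <- 1   # assigning to a new name appends an element
  }
}
names(visited)      # "3" "5"
```

Appending to a vector may copy it, so this is comfortable for a modest number of keys; for very many keys an environment is likely to scale better.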

I believe that using a hash table (created as a new environment) may be the solution to your problem. I'd type out how to do this, but I just did so yesterday at talkstats.com.
If your dictionary is large and has only two columns (keys and values), this may be the way to go. Here's the link to the talkstats thread with sample R code:
HASH TABLE LINK
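A minimal base-R sketch of the environment-as-hash-table idea, applied to the visited check. It uses only new.env(), exists() and assign(); the input values are hypothetical, and keys must be character strings, so atom_count is converted with as.character().

```r
# Sketch: an environment used as a hash table (keys are strings).
visited <- new.env(hash = TRUE)

for (atom_count in c(3L, 5L, 3L, 7L)) {   # hypothetical input
  key <- as.character(atom_count)
  # inherits = FALSE restricts the lookup to this environment only
  if (!exists(key, envir = visited, inherits = FALSE)) {
    # Do stuff here
    assign(key, 1, envir = visited)
  }
}

ls(visited)        # "3" "5" "7"
visited[["5"]]     # 1
```

Unlike lists and vectors, environments are modified in place (reference semantics), so adding a key does not copy the existing contents.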

Related

How do I find the position of a (fuzzy) match within a string?

I have a text processing problem in R. I want to find the character position within a string where a different string makes an exact match and/or a fuzzy match within some edit distance. For example:
A = "blahmatchblah"
B = "match"
C = "latch"
I would like to get back something telling me that, for a search with either B or C, the match begins at the 5th character of string A. All the pattern-matching tools I'm aware of tell me whether there is a (fuzzy) match for B and C within A, but not where that match begins.
The base function aregexec() is used for approximate string position matching. Unfortunately it's not vectorized over pattern, so we'll have to use a loop to get the positions for both B and C.
sapply(c(B, C), aregexec, A)
# $match
# [1] 5
# attr(,"match.length")
# [1] 5
#
# $latch
# [1] 5
# attr(,"match.length")
# [1] 5
See help(aregexec) for more.
I don't have enough reputation to comment, but at least for the first part of your question: gregexpr(B, A)[[1]][1] returns 5 because "match" occurs as an exact substring of A.
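For the positions themselves, base R's regexpr() (exact) and aregexec() (approximate) both report where a match starts. A minimal sketch using the strings from the question; max.distance = 1L is an assumed tolerance of one edit, chosen for illustration:

```r
A <- "blahmatchblah"
B <- "match"
C <- "latch"

# Exact match: regexpr() gives the start of the first match (-1 if none);
# fixed = TRUE treats the pattern as a literal string, not a regex
exact <- regexpr(B, A, fixed = TRUE)
as.integer(exact)                # 5
attr(exact, "match.length")      # 5

# Approximate match: an integer max.distance is an absolute edit count
fuzzy <- aregexec(C, A, max.distance = 1L)
fuzzy[[1]][1]                    # start position (-1 if no match)
regmatches(A, fuzzy)             # the matched substring(s)
```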
A few months back I made an R interface to the fuzzywuzzy Python package, which has the get_matching_blocks() method (it's pretty close to what you are actually asking for).
Assuming you want to find the matching blocks between two strings,
A = "blahmatchblah"
B = "match"
library(fuzzywuzzyR)
init <- SequenceMatcher$new(string1 = A, string2 = B)
init$get_matching_blocks()
returns,
[[1]]
Match(a=4, b=0, size=5)
[[2]]
Match(a=13, b=5, size=0)
The first sublist is the matching block of the two strings: a = 4 is its starting index in string A and b = 0 is its starting index in string B (indexing starts from 0, so a = 4 corresponds to the 5th character of A). size = 5 is the number of characters that match (here the matching block is "match", which has 5 characters). The second sublist, with size = 0, is a dummy block marking the end of the list.
The documentation, especially for SequenceMatcher, has more info.

R: return column using get() and paste()

Why does get() in combination with paste() work for data frames but not for columns within a data frame? How can I make it work?
ab<-12
get(paste("a","b",sep=""))
# gives: [1] 12
ab<-data.frame(a=1:3,b=3:5)
ab$a
# gives: [1] 1 2 3
get(paste("a","b",sep=""))
# gives the whole dataframe
get(paste("ab$","a",sep=""))
# gives: Error in get(paste("ab$", "a", sep = "")) : object 'ab$a' not found
Columns in data frames are not first-class objects. Their "names" are really index values for list extraction. Despite the understandable confusion caused by the existence of the names function, they are not true R names (i.e. unquoted tokens or symbols) in the list of R objects; see the ?is.symbol help page. The get function takes a character value, looks up an object with exactly that name in the workspace, and returns it for further processing.
> ab<-data.frame(a=1:3,b=3:5)
> ab$a
[1] 1 2 3
> get(paste("a","b",sep=""))
a b
1 1 3
2 2 4
3 3 5
>
> # and this would be the way to get the 'a' column of the ab object
get(paste("ab",sep=""))[['a']]
If there were a named object target with the value "a", then you could also do:
target <- "a"
get(paste("ab",sep=""))[[target]] # notice no quotes around target
# because `target` is a _real_ R name
It doesn't work because get() interprets the string it's passed as referring to an object named "ab$a" (not as referring to the element named "a" of the object named "ab") . Here's probably the best way to see what that means:
ab<-data.frame(a=1:3,b=3:5)
`ab$a` <- letters[1:3]
get("ab$a")
# [1] "a" "b" "c"
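Putting these explanations together: resolve the object name with get(), then extract the column with [[. A small sketch, where obj_name and col_name are hypothetical variable names:

```r
ab <- data.frame(a = 1:3, b = 3:5)

obj_name <- paste("a", "b", sep = "")  # "ab": the name of an object
col_name <- "a"                        # a column name, not an object name

# get() resolves only the object name; the column is then extracted with [[
col <- get(obj_name)[[col_name]]
col                                    # 1 2 3

# getElement() is a base-R function form of x[[name]]
identical(getElement(get(obj_name), col_name), col)  # TRUE
```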

Converting a text to command

I would like to know if there’s a way to convert text to a command and execute it. Here’s the relevant part of the script I’m working on:
Var_name <- as.character(data_list[1,1])
# Var_name is now "U4005"
(data_list contains just the names of vectors in a bigger data frame called vectors)
Comm <- paste(Var_name, "<-", "vectors$", Var_name, sep = "")
Comm
# "U4005<-vectors$U4005"
The following code should work:
e <- "a <- 1"
eval(parse(text = e))
a
# [1] 1
Does it help?
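Applied to the question's own string, the eval(parse()) approach might look like this sketch; the vectors data frame here is a hypothetical stand-in for the asker's data:

```r
vectors <- data.frame(U4005 = c(10, 20, 30))  # hypothetical stand-in

Var_name <- "U4005"
Comm <- paste(Var_name, "<-", "vectors$", Var_name, sep = "")
Comm                     # "U4005<-vectors$U4005"

# parse() turns the string into an unevaluated expression; eval() runs it
eval(parse(text = Comm))
U4005                    # 10 20 30
```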
This seems a strange thing to do, but I think you are better off using assign rather than creating "interactive" commands to parse.
assign(Var_name,vectors[[Var_name]])
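This way you can loop over your names to pull things out quite easily. When assign() is called inside a function (such as the anonymous function passed to sapply() below), you need to specify the global environment as the target; otherwise the objects are created in the function's local environment.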
x <- data.frame(a=1:3,b=letters[1:3])
ls()
[1] "x"
invisible(sapply(names(x),function(y) assign(y,x[[y]],.GlobalEnv)))
ls()
[1] "a" "b" "x"
a
[1] 1 2 3
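b
[1] a b c
Levels: a b c
# (factor output as in R < 4.0.0; since R 4.0.0, data.frame() keeps strings as character by default, so b prints as "a" "b" "c")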
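A sketch of the assign() loop for the question's setup, using hypothetical stand-ins for vectors and data_list. Because this loop runs at top level rather than inside a function, assign() writes to the global environment without an explicit envir argument:

```r
# Hypothetical stand-ins for the question's objects
vectors <- data.frame(U4005 = c(1.5, 2.0), U4006 = c(3, 4))
data_list <- data.frame(name = c("U4005", "U4006"))

# At top level, assign() defaults to the current (global) environment
for (Var_name in as.character(data_list[, 1])) {
  assign(Var_name, vectors[[Var_name]])
}

U4005  # 1.5 2.0
U4006  # 3 4
```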

How to save the data.frames in a list separately?

Here's my question; I'd really appreciate any help.
I have a list containing several data.frames with different lengths but the same structure.
Now I want to save each data.frame in the list separately.
Note: I do not want to combine them into one single data.frame using do.call(rbind, ...). I also want to name each data.frame using the names in a vector.
a=c(1,2,3,4,5,6)
b=c(4,5,6,5,5,5)
c=c(3,4,5,6,7,8)
A=data.frame(a=a,b=b,c=c)
B=data.frame(a=c,b=b,c=a)
C=data.frame(a=b,b=c,c=a)
l <- list(A, B, C)
names.list <- c("NewYear_Data", "Thanks_giving", "Christmas")
Now I want to save A, B and C from the list using the names in names.list.
To be specific: I have a list l that contains several data.frames, and I want to save each data.frame in l under the corresponding name in names.list.
I tried unlist, get, and apply...
It would be great if anyone can solve this using plyr, reshape, or data.table methods.
Thanks a lot!
Here is one solution:
l <- list(A, B, C)
nms <- c("NewYear_Data", "Thanks_giving", "Christmas")
names(l) = nms
Now you can use names like this:
l$Christmas
If you want to access the data frames directly by name, without going through the list, attach it:
attach(l)
Christmas
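To save them in a binary file (envir = list2env(l) lets save() find the data frames by name, since they live inside the list rather than in the workspace):
save(list = nms, envir = list2env(l), file = 'file.Rdata')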
Or as text files:
for (i in seq_along(l))
  write.csv(l[[i]], paste0(nms[i], '.txt'))
Note: it is best not to call your variable names (I used nms instead), because names() is a base R function.
If the question is "How do I save a list of data frames as separate files in a folder?", you could use saveRDS rather than save.
names(l) <- names.list
lapply(names(l), function(df)
saveRDS(l[[df]], file = paste0(df, ".rds")))
list.files(pattern = "\\.rds$")
[1] "Christmas.rds" "NewYear_Data.rds" "Thanks_giving.rds"
# To load an individual dataframe later
df <- readRDS(file = "NewYear_Data.rds")
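A self-contained sketch of the saveRDS() round trip, writing to a temporary directory so it leaves no files in the working directory; the example list and its names are hypothetical:

```r
# Hypothetical example list; the names become the file names
l <- list(data.frame(a = 1:3), data.frame(a = 4:6))
names(l) <- c("first", "second")
out_dir <- tempdir()

# One .rds file per data frame
for (nm in names(l)) {
  saveRDS(l[[nm]], file = file.path(out_dir, paste0(nm, ".rds")))
}

# Read them back into a named list
restored <- lapply(setNames(names(l), names(l)), function(nm)
  readRDS(file.path(out_dir, paste0(nm, ".rds"))))
identical(restored, l)   # TRUE
```

Unlike save(), saveRDS() stores a single object without its name, so the name is chosen at load time (e.g. df <- readRDS(...)).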
If you want to use save, the following should work. Note that save stores objects under their variable names, hence the with() call and the use of list =.
with(l,
lapply(names(l), function(df)
save(list = df, file = paste0(df, ".rda"))))
list.files(pattern = "\\.rda$")
[1] "Christmas.rda" "NewYear_Data.rda" "Thanks_giving.rda"
Otherwise, you can save the entire list as a single file:
save(l, file = "Holidays.Rda")
Working with a single list with named elements is almost always preferable to working with lots of objects in your workspace. However, two functions that may be convenient for achieving your aims are setNames() and list2env(). The following line will create named data.frame objects in your global environment using the names in names.list:
list2env( setNames( l , names.list ) , .GlobalEnv )
setNames() is a convenience function that sets the names on an object and returns the object. list2env() assigns each element from a named list into the specified environment so you end up with 3 data.frame objects.
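If you do want separate objects but prefer not to clutter the global environment, list2env() can also target a dedicated environment. A sketch with small hypothetical data frames standing in for A, B and C:

```r
A <- data.frame(a = 1:2)
B <- data.frame(a = 3:4)
C <- data.frame(a = 5:6)
l <- list(A, B, C)
names.list <- c("NewYear_Data", "Thanks_giving", "Christmas")

# Assign into a dedicated environment rather than .GlobalEnv
holidays <- new.env()
invisible(list2env(setNames(l, names.list), envir = holidays))

ls(holidays)             # "Christmas" "NewYear_Data" "Thanks_giving"
holidays$Christmas       # same data as C
```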
After running your code, we have:
> ls()
[1] "a" "A" "b" "B" "c" "C"
[7] "l" "names.list"
You can then assign the data.frames to the names in names.list:
> invisible(mapply(function(x, y) assign(x, y, envir=globalenv()), names.list, l))
> ls()
[1] "a" "A" "b" "B" "c"
[6] "C" "Christmas" "l" "names.list" "NewYear_Data"
[11] "Thanks_giving"
> Christmas
a b c
1 4 3 1
2 5 4 2
3 6 5 3
4 5 6 4
5 5 7 5
6 5 8 6
> NewYear_Data
a b c
1 1 4 3
2 2 5 4
3 3 6 5
4 4 5 6
5 5 5 7
6 6 5 8
Then, if you want to clean up your workspace and remove what you used to create the data, you can run the following (careful, this will remove EVERYTHING in your workspace, except the data frames we just created):
> rm(list=ls()[!(ls() %in% names.list)])
> ls()
[1] "Christmas" "NewYear_Data" "Thanks_giving"
Frankly, I would recommend @Andrey's answer with attach, but if you really want the data frames in your workspace and want to get rid of the objects you used to create them, this is an option.

In R, how can I construct an anonymous list containing an element whose name is contained in a variable?

I would like to insert an element into a list in R. The problem is that I would like it to have a name contained within a variable.
> list(c = 2)
$c
[1] 2
Makes sense. I obviously want a list item named 'c', containing 2.
> a <- "b"
> list(a = 1)
$a
[1] 1
Whoops. How do I tell R when I want it to treat a word as a variable, rather than as a name, when I am creating a list?
Some things I tried:
> list(eval(a)=2)
Error: unexpected '=' in "list(eval(a)="
> list(a, 2)
[[1]]
[1] "b"
[[2]]
[1] 2
> list(get(a) = 2)
Error: unexpected '=' in "list(get(a) ="
I know that if I already have a list() lying around, I could do this:
> ll<-list()
> ll[[a]]=456
> ll
$b
[1] 456
...But:
> list()[[a]]=789
Error in list()[[a]] = 789 : invalid (NULL) left side of assignment
How can I construct an anonymous list containing an element whose name is contained in a variable?
One option:
a <- "b"
> setNames(list(2),a)
$b
[1] 2
or the somewhat more "natural":
l <- list(2)
names(l) <- a
If you look at the code in setNames, you'll see that these two methods are essentially identical: "the way to do this" in R really is to create the object and then set its names, in two steps. setNames is just a convenient way to do that.
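Besides setNames() and names<-, base R's structure() can attach the names in the same expression. A short sketch comparing the three forms:

```r
a <- "b"

x1 <- setNames(list(2), a)          # create and name in one call
x2 <- structure(list(2), names = a) # base-R alternative via attributes
x3 <- list(2); names(x3) <- a       # the two-step form

identical(x1, x2) && identical(x1, x3)  # TRUE
names(x1)                               # "b"
```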
