I have a list containing some basic characteristics of factories (like capacity, turnover). All values are initially set to NULL:
#My List:
list.var <- list(Capacity = NULL, Production = NULL)
list <- list(Factory1 = list.var, Factory2 = list.var)
> list
$Factory1
$Factory1$Capacity
NULL
$Factory1$Production
NULL
$Factory2
$Factory2$Capacity
NULL
$Factory2$Production
NULL
I also have data frames that contain the "missing" values, one data frame per characteristic, covering all factories, like this:
> #My Data Frame:
> df.capacity <- data.frame(Factory = c("Factory1", "Factory2"), Capacity = c(100,200))
> df.capacity
Factory Capacity
1 Factory1 100
2 Factory2 200
I want to assign the capacity values in df.capacity to the corresponding factory in my list. The result should look like this:
$Factory1
$Factory1$Capacity
[1] 100
$Factory1$Production
NULL
$Factory2
$Factory2$Capacity
[1] 200
$Factory2$Production
NULL
How can I do this? Note that I have many factories and even more characteristics, so this should work automatically each time, like a left join for data frames. I tried converting the data frame to a list and then combining it with the original one, but that didn't work for me.
From base R, you could also do:
modifyList(list, split(df.capacity[-1], df.capacity[1]))
$Factory1
$Factory1$Capacity
[1] 100
$Factory1$Production
NULL
$Factory2
$Factory2$Capacity
[1] 200
$Factory2$Production
NULL
We could match to get the corresponding values and then do the assignment
library(purrr)
imap(list, ~ {
.x$Capacity <- df.capacity$Capacity[match(.y, df.capacity$Factory)]
.x})
Or with Map from base R
Map(function(x, y) {
x$Capacity <- df.capacity$Capacity[match(y, df.capacity$Factory)]
x
},
list, names(list))
-output
$Factory1
$Factory1$Capacity
[1] 100
$Factory1$Production
NULL
$Factory2
$Factory2$Capacity
[1] 200
$Factory2$Production
NULL
Or using a for loop
for(i in seq_along(df.capacity$Factory)) list[[as.character(df.capacity$Factory[i])]]$Capacity <- df.capacity$Capacity[i]
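Since the question mentions many characteristics, the modifyList approach can be folded over several data frames. This is a minimal sketch, assuming one data frame per characteristic with the factory name in the first column. The names df.production, fill_from_df and factories (used instead of list, to avoid masking the base function) are hypothetical:

```r
# Nested list with NULL placeholders, as in the question
list.var <- list(Capacity = NULL, Production = NULL)
factories <- list(Factory1 = list.var, Factory2 = list.var)

# One data frame per characteristic; df.production is a made-up example
df.capacity <- data.frame(Factory = c("Factory1", "Factory2"), Capacity = c(100, 200))
df.production <- data.frame(Factory = "Factory2", Production = 50)

# Split the value columns by factory name, then merge them into the nested list
fill_from_df <- function(lst, df) {
  modifyList(lst, split(df[-1], df[[1]]))
}

# Apply every data frame in turn, starting from the original list
factories <- Reduce(fill_from_df, list(df.capacity, df.production), factories)
```

Factories missing from a data frame are left untouched, which is what gives the left-join behaviour.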
In my R environment, I already have some variables. Their names are:
id_01_r
id_02_l
id_05_l
id_06_r
id_07_l
id_09_l
id_11_l
So the pattern is id_ followed by two digits, then _ and either r or l.
Each of them is a data frame, but with different dim() output.
There are also other variables in the environment, so first I need to extract these data frames. For this, I was going to use:
> a <- list(ls()[grep("id*",ls())]) # a small sample matching just id*, I know
But this puts all the names into a single element, so I don't think it's a good way:
> length(a)
[1] 1
I know how to read them in, as shown below, but I'm confused about how to extract them and apply the same processing to each.
i_set <- Sys.glob(paths='mypath/////id*.txt')
for (i in i_set) {
assign(substring(i, startx, endx),read.table(file=i,header=F))
}
The key point is that I want to run the same series of data processing steps on each of these frames. How can I do that without handling them one by one?
Thanks for your kind consideration.
Here is an example:
id_01_r <- iris
id_02_l <- mtcars
foo <- 42
vars <- grep("^id_\\d{2}_[rl]$", ls(), value = TRUE)
# [1] "id_01_r" "id_02_l"
process_data <- function(df) {
dim(df)
}
processed_data <- lapply(
mget(vars),
process_data
)
# $id_01_r
# [1] 150 5
#
# $id_02_l
# [1] 32 11
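If the files can be read again, another option is to skip assign() and read them straight into a named list. This is only a sketch: the temporary directory and the two small files are invented so the example is self-contained, and the real glob pattern and read.table arguments would come from the question:

```r
# Create two small hypothetical files in a temporary directory
tmp <- tempfile("ids")
dir.create(tmp)
write.table(head(iris), file.path(tmp, "id_01_r.txt"), row.names = FALSE, col.names = FALSE)
write.table(head(mtcars), file.path(tmp, "id_02_l.txt"), row.names = FALSE, col.names = FALSE)

# Read every matching file into one list element
files <- Sys.glob(file.path(tmp, "id*.txt"))
frames <- lapply(files, read.table, header = FALSE)

# Name each element after its file, without directory or extension
names(frames) <- tools::file_path_sans_ext(basename(files))

# The same processing step now runs over all frames at once
dims <- lapply(frames, dim)
```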
I have a data frame of some 90 financial symbols (will use 3 for simplicity)
> View(syM)
symbol
1 APPL
2 YAHOO
3 IBM
I created a function that gets JSON data for these symbols and produces an output. Basically:
nX <- function(x) {
#get data for "x", format it, and store it in "nX"
nX <- x
return(nX)
}
I used a loop to get the data and store the zoo series named after each symbol accordingly.
for (i in 1:nrow(syM)) {
assign(x = paste0(syM[i,]),
value = nX(x = syM[i,]))
Sys.sleep(time = 1)
}
Which results in:
[1] "APPL" "YAHOO" "IBM"
Each is a zoo series with 5 columns of data.
Further, I want to get some plotting done to each series and output the result, preferably using a for loop or something better.
yN <- function(y) {
#plot "y" series, columns 2 and 3, and store it in "yN"
yN <- y[,2:3]
return(yN)
}
Following a similar logic to my previous loop I tried:
for (i in 1:nrow(syM)) {
assign(x = paste0(pairS[i,],".plot"),
value = yN(y = paste0(syM[i,])))
}
But so far the data is not being sent to the function, only the name of the symbol, so I naturally get:
y[,2:3] : incorrect number of dimensions
I have also tried:
for (i in 1:nrow(syM)) {
assign(x = paste0(syM[i,],".plot"),
value = yN(y = ls(pattern = paste0(syM[i,]))))
}
With similar results. When I input the name of the series manually it does save the plot of the first symbol as "APPL.Plot".
assign(paste0(syM[1,], ".Plot"),
value = yN(y = APPL))
Consider lapply with setNames to create a named list of nX returned objects:
nX_list <- setNames(lapply(syM$symbol, nX), syM$symbol)
# OUTPUT ZOO OBJECTS BY NAMED INDEX
nX_list$APPL
nX_list$YAHOO
nX_list$IBM
# CREATE SEPARATE OBJECTS FROM LIST
# BUT NO NEED TO FLOOD GLOBAL ENVIR W/ 90 OBJECTS, JUST USE 1 LIST
list2env(nX_list, envir=.GlobalEnv)
For the plot function, first retrieve the object inside the function by its string name (index the list, or use get if you created separate objects), then similarly run lapply with setNames:
yN <- function(y) {
#plot "y" series, columns 2 and 3, and store it in "yN"
yobj <- nX_list[[y]] # IF USING ABOVE LIST
# yobj <- get(y) # IF USING SEPARATE OBJECTS (use instead of the line above)
yN <- yobj[,2:3]
return(yN)
}
plot_list <- setNames(lapply(syM$symbol, yN), paste0(syM$symbol, ".plot"))
# OUTPUT PLOTS BY NAMED INDEX
plot_list$APPL.plot
plot_list$YAHOO.plot
plot_list$IBM.plot
# CREATE SEPARATE OBJECTS FROM LIST
# BUT NO NEED TO FLOOD GLOBAL ENVIR W/ 90 OBJECTS, JUST USE 1 LIST
list2env(plot_list, envir=.GlobalEnv)
As you note, you're calling yN with a character argument in:
for (i in 1:nrow(syM)) {
assign(x = paste0(pairS[i,],".plot"),
value = yN(y = paste0(syM[i,])))
}
paste0(syM[i,]) is going to resolve to a character and not the zoo object it appears you're trying to reference. Instead, use something like get():
for (i in 1:nrow(syM)) {
assign(x = paste0(pairS[i,],".plot"),
value = yN(y = get(paste0(syM[i,]))))
}
Or perhaps just store your zoo objects in a list in the first place and then operate on all elements of the list with something like lapply()...
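A rough sketch of that list-based approach follows. To keep it runnable without the zoo package or the JSON download, fetch_series is a hypothetical stand-in for nX that returns a plain 5-column matrix:

```r
# Hypothetical stand-in for nX(): a 5-column matrix instead of a zoo series
fetch_series <- function(symbol) {
  matrix(seq_len(20), ncol = 5, dimnames = list(NULL, paste0("V", 1:5)))
}

symbols <- c("APPL", "YAHOO", "IBM")

# One named list holds every series, so nothing is looked up by string later
series <- setNames(lapply(symbols, fetch_series), symbols)

# The function receives the object itself, not its name
cols23 <- lapply(series, function(s) s[, 2:3])
names(cols23) <- paste0(names(cols23), ".plot")
```

Because lapply passes each element directly, the "incorrect number of dimensions" error from subsetting a character string cannot occur.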
I am trying to do the following:
If there is nothing in the dataframe, print "no_match".
If there is something, bind it to the ID of dataframe df2:
if(df == []){
print("nomatch")
}else{
cbind(df, df2$id2)
}
You could get the information about the dimensions of your data frame via dim. For example running the code:
data(mtcars)
dim(mtcars)
will show you the dimensions:
[1] 32 11
For a NULL object you would get:
mtcars <- NULL
dim(mtcars)
NULL
dim also handles a data.frame with no rows:
data(mtcars) # reload, since mtcars was set to NULL above
mtcars <- mtcars[-c(1:dim(mtcars)[1]),]
you will get
> dim(mtcars)
[1] 0 11
IF statements
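Constructing if statements is very simple. Depending on what you want to check, you can do one of the following.
Object is NULL
The object is NULL, with no rows and no columns.
# note: dim(df) == NULL returns logical(0), which if() rejects, so use is.null
if (is.null(df)) {
}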
No rows
This data frame has columns but no observations.
if (dim(df)[1] == 0) {
}
No columns
The object is still of class data.frame but has no data.
if (dim(df)[2] == 0) {
}
You would construct the object like that (if of interest):
data(mtcars)
mtcars <- mtcars[,-c(1:dim(mtcars)[2])]
Naturally, you can combine conditions to check for either or both kinds of emptiness.
It depends: is your data.frame actually empty, or are all of its elements something you consider empty?
If the data.frame is empty you can use nrow as a simple check.
tmp <- data.frame(A = numeric())
nrow(tmp)
[1] 0
if(nrow(tmp) == 0){
print("data.frame is empty")
}else{
print("data.frame contains data")
}
EDIT - OP asks about object existence
You can check if an object has been defined with exists
exists("tmp2")
[1] FALSE
exists("tmp")
[1] TRUE
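Does min(dim(df)) == 0 do the trick? It is TRUE when the data frame has no rows or no columns, whereas max(dim(df)) == 0 requires both to be zero.
if (min(dim(df)) == 0) {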
print("nomatch")
} else {
cbind(df, df2$id2)
}
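Putting the checks from this thread together, the original goal could be sketched as a small helper. The function name attach_ids and the sample data are hypothetical, and the sketch assumes df and df2 have the same number of rows whenever df is not empty:

```r
# Hypothetical helper: report "no_match" for a NULL or zero-row df, else bind the id
attach_ids <- function(df, df2) {
  # is.null() goes first so nrow() is never evaluated on NULL
  if (is.null(df) || nrow(df) == 0) {
    print("no_match")
    return(invisible(NULL))
  }
  cbind(df, id2 = df2$id2)
}

df2 <- data.frame(id2 = c("a", "b"))
df <- data.frame(x = 1:2)

res <- attach_ids(df, df2)                       # data frame with columns x and id2
empty <- attach_ids(df[0, , drop = FALSE], df2)  # prints "no_match"
```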
I would like to rename the columns of a bunch of data frames with the names function, but I am not able to do it with lapply or a loop.
I have a group of data frames named qcew.2007, qcew.2014, etc. I have a vector with the column names I would like all of the data frames to have. They are all the same. The vector is named colnm:
colnm = c("area_fips" , "own_code", "industry_code", "agglvl_code") # example shortened
# group has the names of all data frames and goes to 2013
group = c("qcew.2007", "qcew.2008", "qcew.2009")
# using lapply
names <- lapply(group, function(d){
n = paste0(d)
names(n) = colnm
})
# using loop does not work either
for (i in seq(group)) {
names(group[[i]]) = colnm
}
Neither option works; R says I am comparing vectors with uneven lengths. I must be missing something obvious. Thanks.
Here you go. You need to use get otherwise you're assigning names to the character vectors in group:
# sample data
qcew.2007 <- data.frame(a=1, b=2, c=3, d=4)
qcew.2008 <- data.frame(a=3, b=4, c=5, d=6)
qcew.2009 <- data.frame(a=5, b=6, c=7, d=8)
for(i in 1:3)
assign(group[i], `names<-`(get(group[i]), colnm))
names(qcew.2007)
# [1] "area_fips" "own_code" "industry_code" "agglvl_code"
names(qcew.2008)
# [1] "area_fips" "own_code" "industry_code" "agglvl_code"
names(qcew.2009)
# [1] "area_fips" "own_code" "industry_code" "agglvl_code"
Here you use get to get the object named in each position in group and then use assign to reassign the modified object (modified by changing column names) back into that named object.
Also:
list2env(lapply(mget(group), setNames, colnm),envir=.GlobalEnv)
names(qcew.2007)
#[1] "area_fips" "own_code" "industry_code" "agglvl_code"
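If the data frames do not need to live in the global environment, a list-only variant avoids get, assign and list2env altogether. This is a sketch with made-up sample data; the list name qcew and the column-count guard are assumptions added for illustration:

```r
colnm <- c("area_fips", "own_code", "industry_code", "agglvl_code")

# Hypothetical sample data kept in one named list
qcew <- list(
  qcew.2007 = data.frame(a = 1, b = 2, c = 3, d = 4),
  qcew.2008 = data.frame(a = 3, b = 4, c = 5, d = 6)
)

# Rename the columns of every data frame, failing early on a length mismatch
qcew <- lapply(qcew, function(d) {
  stopifnot(ncol(d) == length(colnm))
  setNames(d, colnm)
})
```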
I have a vector of values, call it X, and a data frame, call it dat.fram. I want to run something like "grep" or "which" to find all the indices of dat.fram[,3] which match each of the elements of X.
This is the very inefficient for loop I have below. Notice that there are many observations in X and each member of "match.ind" can have zero or more matches. Also, dat.fram has over 1 million observations. Is there any way to use a vector function in R to make this process more efficient?
Ultimately, I need a list since I will pass the list to another function that will retrieve the appropriate values from dat.fram .
Code:
match.ind=list()
for(i in 1:150000){
match.ind[[i]]=which(dat.fram[,3]==X[i])
}
UPDATE:
Ok, wow, I just found an awesome way of doing this... it's really slick. Wondering if it's useful in other contexts...?!
### define v as a sample column of data - you should define v to be
### the column in the data frame you mentioned (dat.fram[,3])
v = sample(1:150000, 1500000, replace=TRUE)
### now here's the trick: concatenate the indices for each possible value of v,
### to form mybiglist - the names of mybiglist give you the possible values
### of v, and the values in mybiglist give you the index points
mybiglist = tapply(seq_along(v),v,c)
### now you just want the parts of this that intersect with X... again I'll
### generate a random X but use whatever X you need to
X = sample(1:200000, 150000)
mylist = mybiglist[which(names(mybiglist)%in%X)]
And that's it! As a check, let's look at the first 3 elements of mylist:
> mylist[1:3]
$`1`
[1] 401143 494448 703954 757808 1364904 1485811
$`2`
[1] 230769 332970 389601 582724 804046 997184 1080412 1169588 1310105
$`4`
[1] 149021 282361 289661 456147 774672 944760 969734 1043875 1226377
There's a gap at 3, as 3 doesn't appear in X (even though it occurs in v). And the
numbers listed against 4 are the index points in v where 4 appears:
> which(X==3)
integer(0)
> which(v==3)
[1] 102194 424873 468660 593570 713547 769309 786156 828021 870796
883932 1036943 1246745 1381907 1437148
> which(v==4)
[1] 149021 282361 289661 456147 774672 944760 969734 1043875 1226377
Finally, it's worth noting that values that appear in X but not in v won't have an entry in the list, but this is presumably what you want anyway as they're NULL!
Extra note: You can use the code below to create an NA entry for each member of X not in v...
blanks = sort(setdiff(X,names(mylist)))
mylist_extras = rep(list(NA),length(blanks))
names(mylist_extras) = blanks
mylist_all = c(mylist,mylist_extras)
mylist_all = mylist_all[order(as.numeric(names(mylist_all)))]
Fairly self-explanatory: mylist_extras is a list with all the additional list stuff you need (the names are the values of X not featuring in names(mylist), and the actual entries in the list are simply NA). The final two lines firstly merge mylist and mylist_extras, and then perform a reordering so that the names in mylist_all are in numeric order. These names should then match exactly the (unique) values in the vector X.
Cheers! :)
ORIGINAL POST BELOW... superseded by the above, obviously!
Here's a toy example with tapply that might well run significantly quicker... I made X and d relatively small so you could see what's going on:
X = 3:7
n = 100
d = data.frame(a = sample(1:10,n,rep=TRUE), b = sample(1:10,n,rep=TRUE),
c = sample(1:10,n,rep=TRUE), stringsAsFactors = FALSE)
tapply(X,X,function(x) {which(d[,3]==x)})
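A related base R sketch uses split instead of tapply and returns one element per entry of X, in the order of X, with integer(0) where there is no match (the same result as the question's which loop). The small v and X below are made up for illustration, and the lookup by as.character(X) assumes the values print the same way in both vectors, as integers do:

```r
v <- c(3L, 1L, 3L, 2L, 1L)   # stands in for dat.fram[,3]
X <- c(1L, 3L, 4L)           # values to look up; 4 does not occur in v

# Group the positions of v by value in a single pass
idx <- split(seq_along(v), v)

# Look up each X by name; values absent from v come back as NULL
res <- unname(idx[as.character(X)])

# Replace the NULL entries with integer(0), as which() would return
res[lengths(res) == 0] <- list(integer(0))

# Reference result from the original loop-style approach
expected <- lapply(X, function(x) which(v == x))
```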