plot element of a list - r

I'm learning to work with lists in R. I have searched online and in some books, but I haven't found a solution.
I have a data frame with n rows and several columns. I would like a simple, quick way to plot one column (e.g. value1) for each year (another column).
First, I created a list from the data frame using split:
lst<-split(X, X$Year)
So now I have the data frame divided into one subset per year, and that's fine.
But how can I now create a plot of value1 for each year?
I tried to write a short script, but it doesn't work at all:
lst<-split(X, X$Year)
for (i in names(lst)) {
plot(i$value1)
}

Is this what you are after:
library(xts)
data<-xts(data.frame(a=c(1,2,3), b=c(4,5,6)), c(as.POSIXct("1970-01-01"), as.POSIXct("1971-01-01"), as.POSIXct("1972-01-01")))
plot(data[,1], ylim=c(min(data), max(data)))
for (i in 2:ncol(data)) {
lines(data[,i])
}
Crude but works...
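For the split-based approach in the question, a minimal sketch follows. It assumes the loop fails because names(lst) yields character strings, so i$value1 is applied to a string rather than to a data frame. The sample data frame X below is hypothetical and only stands in for the asker's data.

```r
# Hypothetical sample data standing in for the asker's data frame X
X <- data.frame(
  Year   = rep(c(2001, 2002, 2003), each = 4),
  value1 = c(1, 3, 2, 5, 4, 6, 5, 7, 2, 2, 3, 1)
)

# One data frame per year, named by the year
lst <- split(X, X$Year)

# names(lst) gives strings, so index the list with [[ ]] to get each data frame
for (yr in names(lst)) {
  plot(lst[[yr]]$value1, type = "b",
       main = paste("Year", yr),
       xlab = "Observation", ylab = "value1")
}
```

Each iteration draws a separate plot. To see them side by side, par(mfrow = c(1, length(lst))) could be called before the loop.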

Related

R for loop: creating data frames using split?

I have data that I want to separate by date. I have managed to do this manually with:
tsssplit <- split(tss, tss$created_at)
and then creating a data frame from each list element, which I then use:
t1 <- tsssplit[[1]]
t2 <- tsssplit[[2]]
But I don't know how many splits I will need, as sometimes the original data frame may have 6 dates to split by, and sometimes it may have 5, etc. So I want to create a for loop.
Within the for loop, I want to incorporate this code, which connects to a function:
bscore3 <- score.sentiment(t3$cleaned_text,pos.words,neg.words,.progress='text')
score3 <- as.integer(bscore3$score[[1]])
Then I want to be able to create a new data frame that has the scores for each list.
So essentially I want the for loop to:
1. Split the data into a list using split
2. Turn each list element into a separate data frame, one per day
3. Come up with a score for each data frame
4. Put the scores into a new data frame
It doesn't have to be exactly like this as long as I can come up with a visualisation of the scores at the end.
Thanks!
Creating separate data frames in the global environment is not recommended, because they are difficult to keep track of. Put them in a list instead. You have started off well by using split to create a list of data frames. You can then iterate over each data frame in the list and apply the function to each one.
Using by, this would look like:
result <- by(tss, tss$created_at, function(x) {
  bscore3 <- score.sentiment(x$cleaned_text, pos.words, neg.words, .progress = 'text')
  score3 <- as.integer(bscore3$score[[1]])
  return(score3)
})
result
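To get the scores into a new data frame for plotting, one possible sketch is shown here. score.sentiment is not defined in this thread, so toy_score below is a hypothetical stand-in that counts positive minus negative words, and the sample tss data and word lists are made up for illustration. Unlike the code above, it sums over all texts of a day rather than taking only the first score.

```r
# Hypothetical stand-in for score.sentiment(): positive minus negative word count
toy_score <- function(text, pos, neg) {
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  sum(words %in% pos) - sum(words %in% neg)
}

# Made-up sample data with the column names used in the question
tss <- data.frame(
  created_at   = c("2020-01-01", "2020-01-01", "2020-01-02"),
  cleaned_text = c("good day", "bad awful day", "great good news"),
  stringsAsFactors = FALSE
)
pos.words <- c("good", "great")
neg.words <- c("bad", "awful")

# One score per date, however many dates the data happens to contain
scores <- sapply(split(tss, tss$created_at), function(d) {
  toy_score(d$cleaned_text, pos.words, neg.words)
})

# Collect the per-date scores into a single data frame
score_df <- data.frame(created_at = names(scores), score = unname(scores))
```

A data frame in this shape can be passed directly to a plotting call such as barplot(score_df$score, names.arg = score_df$created_at).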

Extracting a Point From Multiple Data Frames Within a List

I am trying to isolate one point in the same location (same column and row) from 1000 data frames. Each data frame has the same 8 columns and a varying number of rows (at least one), and I only need the points from the first row for now. These data frames are in a list created with the lapply function. Here is how I did that:
list <- list.files(pattern=".aei")
files <- lapply(list, read.table, ...)
Now, I need to isolate points from each data frame in Row 1 and Column 2. I was able to do this for one data frame with the following code:
a <- data.frame(files[1])[1,2]
However, I can't get this to work for all 1000 files. I've tried several pieces of code, such as:
all <- data.frame(files[1:999])[1,2]
all<- lapply(files data.frame)[1,2]
all<- lapply(files, data.frame[1,2])
and even two different for loops:
for(i in files [[1:999]]) {
list(files[1:999])[1,2]
}
for(i in files [[1:999]]) {
data.frame(files[1:999])[1,2]
}
Are any of these methods on the right track, or are they completely wrong? I've been stuck on this for a while and seem to have hit a dead end. Please let me know of any suggestions you may have!
We can use an anonymous function (lambda function) to extract the element:
lapply(files, function(x) x[1,2])
The read.table already gives a data.frame, so there is no need to wrap with data.frame
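If a plain vector is more convenient than a list, a variant with vapply may help. This is only a sketch: the three small data frames below are hypothetical stand-ins for the files read with read.table, and it assumes column 2 is numeric in every file.

```r
# Hypothetical stand-ins for the data frames returned by read.table
files <- list(
  data.frame(a = 1:3, b = c(10.5, 11, 12)),
  data.frame(a = 4L,  b = 20.25),
  data.frame(a = 5:6, b = c(30, 31))
)

# Row 1, column 2 of every data frame, returned as a numeric vector;
# vapply stops with an error if any result is not a single number
first_vals <- vapply(files, function(x) x[1, 2], numeric(1))
```

lapply returns the same values wrapped in a list, which is the safer choice when the column type is not known in advance.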

Doing operation on multiple numbered tables in R

I'm new to programming in R and I'm working with a huge dataset containing hundreds of variables and thousands of observations. Among these variables is Age, which is my main concern. I want to get the mean of each of the other variables as a function of Age. I can get smaller tables with this:
for(i in 18:84)
{
n<- sprintf("SortAgeM%d",i)
assign(x=n,subset(SortAgeM,subset=(SortAgeM$AGE>=i & SortAgeM$AGE<i+1)))
}
"SortAgeM85plus"<-subset(SortAgeM,subset=(SortAgeM$AGE>=85 & SortAgeM$AGE<100))
This gives me a sub-dataset for each age I'm concerned with. I would then like to get the mean of each column. Each column is an observation of the volume of a specific brain region. I'm interested in how the volume decreases with time, and I would like to be able to tell whether individuals of a given age are close to the mean for their age or not.
Now, I would like to add one more row with the mean of each column. So I tried this:
for(i in 18:85) {
addmargins((SortAgeM%d,i), margin=1, FUN= "mean")
}
But it didn't work... I'm stuck, and I'm not familiar enough with R functions to find a solution on the net...
Thank you for your help.
Victor
Post answer edit: This is what I finally did:
for(i in 18:84)
{
n<- sprintf("SortAgeM%d",i)
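item <- subset(SortAgeM, subset = (SortAgeM$AGE >= i & SortAgeM$AGE < i + 1)) # rows for age i
Ajustment <- c(NA, NA, NA, NA, NA, NA, NA) # first 7 variables aren't numeric
Ligne1 <- colMeans(item[, 8:217], na.rm = TRUE) # column means of the numeric variables
Ligne <- c(Ajustment, Ligne1)
assign(x = n, rbind(item, Ligne)) # store the subset plus its row of means under the name in n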
}
If you simply want an additional row with the means of each column, you can rbind the colMeans of your df like this (colMeans requires every column of df to be numeric):
df_new <- rbind(df, colMeans(df))
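As an alternative to creating one table per age, the means for all ages can be computed in one step with aggregate. This is a sketch only: the tiny SortAgeM below is hypothetical (the real data has 217 columns), and the column names ID, vol1 and vol2 are invented for illustration.

```r
# Hypothetical miniature version of SortAgeM
SortAgeM <- data.frame(
  ID   = c("a", "b", "c", "d"),
  AGE  = c(18.2, 18.9, 19.5, 19.1),
  vol1 = c(10, 12, 9, 11),
  vol2 = c(5, 7, 6, 8)
)

# Group by whole years of age and average each numeric column,
# ignoring missing values within each column
means_by_age <- aggregate(
  SortAgeM[, c("vol1", "vol2")],
  by = list(AgeYear = floor(SortAgeM$AGE)),
  FUN = mean, na.rm = TRUE
)
```

The result has one row per age, which makes it straightforward to compare an individual's values with the mean for their age.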

For Loop Over List of Data Frames and Create New Data Frames from Every Iteration Using Variable Name

I cannot for the life of me figure out where the simple error is in my for loop. I want to perform the same analysis over multiple data frames and save each iteration's result as a new data frame, named after the input data frame plus an extra string to identify it.
Here is my code:
john and jane are 2 data frames among many that I am hoping to loop over and compare to bcm to find rows with duplicate results.
x <- list(john,jane)
for (i in x) {
test <- rbind(bcm,i)
test$dups <- duplicated(test$Full.Name,fromLast=T)
test$dups2 <- duplicated(test$Full.Name)
test <- test[which(test$dups==T | test$dups2==T),]
newname <- paste("dupl",i,sep=".")
assign(newname, test)
}
Thus far, I can either get the naming to work correctly without including the x data or the loop to complete correctly without naming the new data frames correctly.
Intended Result: I am hoping to create new data frames dupl.john and dupl.jane to show which rows are duplicated in comparison to bcm.
I understand that lapply() might be better to use and am very open to that form of solution. I could not figure out how to use it to solve my problem, so I turned to the more familiar for loop.
EDIT:
Sorry if I'm not being more clear. I have about 13 data frames in total that I want to run the same analysis over to find the duplicate rows in $Full.Name. I could do the first 4 lines of my loop and then dupl.john <- test 13 times (for each data frame), but I am purposely trying to write a for loop or lapply() to gain more knowledge in R and because I'm sure it is more efficient.
If I understand your intended result correctly, match_df could be an option.
library(plyr)
dupl.john <- match_df(john, bcm)
dupl.jane <- match_df(jane, bcm)
dupl.john and dupl.jane will both be data frames, and each will contain the rows that appear in both that data frame and bcm. Is this what you are trying to achieve?
EDITED after the first comment
library(plyr)
l <- list(john, jane)
res <- lapply(l, function(x) {match_df(x, bcm, on = "Full.Name")} )
dupl.john <- res[[1]]  # [[ ]] extracts the data frame itself
dupl.jane <- res[[2]]
Now, res will have a list of the data frames with the matches, based on the column "Full.Name".
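The naming problem in the question can be handled by naming the list elements, since a data frame inside a for loop does not carry its variable name. The sketch below uses base R only and swaps in %in% for match_df, so it keeps just the rows of each data frame whose Full.Name also occurs in bcm. The three data frames are hypothetical examples.

```r
# Hypothetical example data
bcm  <- data.frame(Full.Name = c("Ann Lee", "Bob Ray", "Cy Poe"),
                   stringsAsFactors = FALSE)
john <- data.frame(Full.Name = c("Bob Ray", "Dee Fox"),
                   stringsAsFactors = FALSE)
jane <- data.frame(Full.Name = c("Ann Lee", "Cy Poe", "Eve Kim"),
                   stringsAsFactors = FALSE)

# A named list keeps track of which data frame is which
frames <- list(john = john, jane = jane)

# Keep the rows whose Full.Name also appears in bcm
dupl <- lapply(frames, function(d) {
  d[d$Full.Name %in% bcm$Full.Name, , drop = FALSE]
})

# Prefix the names, giving dupl.john and dupl.jane as list elements
names(dupl) <- paste("dupl", names(frames), sep = ".")
```

The results are then available as dupl$dupl.john and dupl$dupl.jane, and the same code works unchanged for 13 data frames.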

Applying a function to a dataframe to trim empty columns within a list environment R

I am a naive user of R and am attempting to come to terms with the 'apply' family of functions, which I now need to use due to the complexity of my data sets.
I have a large, ragged data frame that I wish to reshape before conducting a sequence of regression analyses. It is further complicated by having interlaced rows of descriptive data (characters).
My approach to date has been to use a factor to split the data frame into sets with equal row lengths (i.e. a list), then attempt to remove the trailing empty columns, make two new matching lists (one of data and one of chars), use reshape to produce a common column number, and then recombine the sets in each list. A simplified example:
myDF <- as.data.frame(rbind(c("v1",as.character(1:10)),
c("v1",letters[1:10]),
c("v2",c(as.character(1:6),rep("",4))),
c("v2",c(letters[1:6], rep("",4)))))
myDF[,1] <- as.factor(myDF[,1])
myList <- split(myDF, myDF[,1])
myList[[1]]
I can remove the empty columns for an individual set and can split the data frame into two sets from the interlacing rows, but I have been stumped by the syntax for applying the following operation to every element of the list, though 'lapply' with 'seq_along' should do it?
Thus for the individual set:
DF <- myList[[2]]
DF <- DF[,!sapply(DF, function(x) all(x==""))]
DF
(from an earlier answer to a similar, but simpler example on this site). I have a large data set and would like an elegant solution (I could use a loop but that would not use the capabilities of R effectively). Once I have done that I ought to be able to use the same rationale to reshape the frames and then recombine them.
regards
jac
Try
lapply(split(myDF, myDF$V1), function(x) x[!colSums(x=='')])
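A more explicit variant, sketched here on the example data from the question, keeps the original all(x == "") test, so a column is dropped only when every value in it is empty. The one-liner above instead drops a column as soon as it contains any empty string. The names is_empty and trimmed are introduced for illustration, and stringsAsFactors = FALSE is an added assumption so the comparison is done on character columns.

```r
# Example data from the question, kept as character columns
myDF <- as.data.frame(rbind(c("v1", as.character(1:10)),
                            c("v1", letters[1:10]),
                            c("v2", c(as.character(1:6), rep("", 4))),
                            c("v2", c(letters[1:6], rep("", 4)))),
                      stringsAsFactors = FALSE)

# One data frame per group in the first column
myList <- split(myDF, myDF$V1)

# For each group, drop the columns in which every value is an empty string
trimmed <- lapply(myList, function(x) {
  is_empty <- sapply(x, function(col) all(col == ""))
  x[, !is_empty, drop = FALSE]
})
```

No seq_along is needed, because lapply already passes each data frame in the list to the function in turn.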
