Split lists by half of their lengths in R - r

I have a tibble with variable, that contains a lists. Each list has a different lengths. I would like to have two new variables, let’s say “lon” and “lat”. In variable “lon” I’d like to have first half of each list, and in variable “lat” the second half.
data:
file_url <- "https://github.com/slawomirmatuszak/Covid.UA/raw/master/sample.Rda?raw=true"
load(url(file_url))
I can achieve that by filtering lists, but I’d like to do this by more universal code (based on lengths, not a specific number).
sample.data$lon <- lapply(sample.data$geometry, function(x) unlist(x)[x<40])
sample.data$lat <- lapply(sample.data$geometry, function(x) unlist(x)[x>40])

Probably, you can try with length to get first half and second half of geometry column.
sample.data$lat <- sapply(sample.data$geometry, function(x)
{tmp <- unlist(x);tmp[1:(length(tmp)/2)]})
sample.data$lon <- sapply(sample.data$geometry, function(x)
{tmp <- unlist(x);tmp[((length(tmp)/2) + 1):length(tmp)]})

Related

Find difference of same column names across different data frames in a list in R

I have a list of data frames with same column names where each dataframe corresponds to a month
June_2018 <- data.frame(Features=c("abc","def","ghi","jkl"), Metric1=c(100,200,250,450), Metric2=c(1000,2000,5000,6000))
July_2018 <- data.frame(Features=c("abc","def","ghi","jkl"), Metric1=c(140,250,125,400), Metric2=c(2000,3000,2000,3000))
Aug_2018 <- data.frame(Features=c("abc","def","ghi","jkl"), Metric1=c(200,150,250,600), Metric2=c(1500,2000,4000,2000))
Sep_2018 <- data.frame(Features=c("abc","def","ghi","jkl"), Metric1=c(500,500,1000,100), Metric2=c(500,4000,6000,8000))
lst1 <- list(Aug_2018,June_2018,July_2018,Sep_2018)
names(lst1) <- c("Aug_2018","June_2018","July_2018","Sep_2018")
I intend to create a new column in each of the data frames in the list named Percent_Change_Metric1 and Percent_Change_Metric2 by doing below calculation
for (i in names(lst1)){
lst1[[i]]$Percent_Change_Metric1 <- ((lst1[[i+1]]$Metric1-lst1[[i]]$Metric1)*100/lst1[[i]]$Metric1)
lst1[[i]]$Percent_Change_Metric2 <- ((lst1[[i+1]]$Metric2-lst1[[i]]$Metric2)*100/lst1[[i]]$Metric2)
}
However, obviously the i in for loop is against the names(lst1) and wouldn't work
Also, the dataframes in my list in random order and not ordered by month-year. So the calculation to subtract successive dataframes' columns isn't entirely accurate.
Please advise
How I go about with adding the Percent_change_Metric1 and
Percent_change_Metric2
How to choose the dataframe corresponding
to next month to arrive at the correct Percent_Change
Thanks for the guidance
Here is one option with base R
lst1[-length(lst1)] <- Map(function(x, y)
transform(y, Percent_Change_Metric1 = (x$Metric1 - Metric1) * 100/Metric1,
Percent_Change_Metric2 = (x$Metric2 - Metric2) * 100/Metric2),
lst1[-1], lst1[-length(lst1)])

Data.frame of Data.frames

I'm using a data.frame that contains many data.frames. I'm trying to access these sub-data.frames within a loop. Within these loops, the names of the sub-data.frames are contained in a string variable. Since this is a string, I can use the [,] notation to extract data from these sub-data.frames. e.g. X <- "sub.df"and then df[42,X] would output the same as df$sub.df[42].
I'm trying to create a single row data.frame to replace a row within the sub-data.frames. (I'm doing this repeatedly and that's why my sub-data.frame name is in a string). However, I'm having trouble inserting this new data into these sub-data.frames. Here is a MWE:
#Set up the data.frames and sub-data.frames
sub.frame <- data.frame(X=1:10,Y=11:20)
df <- data.frame(A=21:30)
df$Z <- sub.frame
Col.Var <- "Z"
#Create a row to insert
new.data.frame <- data.frame(X=40,Y=50)
#This works:
df$Z[3,] <- new.data.frame
#These don't (even though both sides of the assignment give the correct values/dimensions):
df[,Col.Var][6,] <- new.data.frame #Gives Warning and collapses df$Z to 1 dimension
df[7,Col.Var] <- new.data.frame #Gives Warning and only uses first value in both places
#This works, but is a work-around and feels very inelegant(!)
eval(parse(text=paste0("df$",Col.Var,"[8,] <- new.data.frame")))
Are there any better ways to do this kind of insertion? Given my experience with R, I feel like this should be easy, but I can't quite figure it out.

Looping a rep() function in r

df is a frequency table, where the values in a were reported as many times as recorded in column x,y,z. I'm trying to convert the frequency table to the original data, so I use the rep() function.
How do I loop the rep() function to give me the original data for x, y, z without having to repeat the function several times like I did below?
Also, can I input the result into a data frame, bearing in mind that the output will have different column lengths:
a <- (1:10)
x <- (6:15)
y <- (11:20)
z <- (16:25)
df <- data.frame(a,x,y,z)
df
rep(df[,1], df[,2])
rep(df[,1], df[,3])
rep(df[,1], df[,4])
If you don't want to repeat the for loop, you can always try using an apply function. Note that you cannot store it in a data.frame because the objects are of different lengths, but you could store it in a list and access the elements in a similar way to a data.frame. Something like this works:
df2<-sapply(df[,2:4],function(x) rep(df[,1],x))
What this sapply function is saying is for each column in df[,2:4], apply the rep(df[,1],x) function to it where x is one of your columns ( df[,2], df[,3], or df[,4]).
The below code just makes sure the apply function is giving the same result as your original way.
identical(df2$x,rep(df[,1], df[,2]))
[1] TRUE
identical(df2$y,rep(df[,1], df[,3]))
[1] TRUE
identical(df2$z,rep(df[,1], df[,4]))
[1] TRUE
EDIT:
If you want it as a data.frame object you can do this:
res<-as.data.frame(sapply(df2, '[', seq(max(sapply(df2, length)))))
Note this introduces NAs into your data.frame so be careful!

R: t tests on rows of 2 dataframes

I have two dataframes and I would like to do independent 2-group t-tests on the rows (i.e. t.test(y1, y2) where y1 is a row in dataframe1 and y2 is matching row in dataframe2)
whats best way of accomplishing this?
EDIT:
I just found the format: dataframe1[i,] dataframe2[i,]. This will work in a loop. Is that the best solution?
The approach you outlined is reasonable, just make sure to preallocate your storage vector. I'd double check that you really want to compare the rows instead of the columns. Most datasets I work with have each row as a unit of observation and the columns represent separate responses/columns of interest Regardless, it's your data - so if that's what you need to do, here's an approach:
#Fake data
df1 <- data.frame(matrix(runif(100),10))
df2 <- data.frame(matrix(runif(100),10))
#Preallocate results
testresults <- vector("list", nrow(df1))
#For loop
for (j in seq(nrow(df1))){
testresults[[j]] <- t.test(df1[j,], df2[j,])
}
You now have a list that is as long as you have rows in df1. I would then recommend using lapply and sapply to easily extract things out of the list object.
It would make more sense to have your data stored as columns.
You can transpose a data.frame by
df1_t <- as.data.frame(t(df1))
df2_t <- as.data.frame(t(df2))
Then you can use mapply to cycle through the two data.frames a column at a time
t.test_results <- mapply(t.test, x= df1_t, y = df2_t, SIMPLIFY = F)
Or you could use Map which is a simple wrapper for mapply with SIMPLIFY = F (Thus saving key strokes!)
t.test_results <- Map(t.test, x = df1_t, y = df2_t)

apply a function separately on each element of a list

I have a question about applying a function on each elements of a list.
Here's my problem:
I have a list of DF (I divided a bigger DF by days):
mydf <- data.frame(x=c(1:5), y=c(21:25),z=rnorm(1:5))
mylist <- rep(list(mydf),5)
names(mylist) <-c("2006-01-01","2006-01-02","2006-01-03","2006-01-04","2006-01-05")
Don't care about this fake data if it's identical), it's just for the example. I've my results in column "z" for each DF of the list, and 2 other columns "x" and "y" representing some spatial coordinates.
I have another independent DF containing a list of "x" and "y" too, representing some specific regions (imagine 10 regions):
region <- data.frame(x=c(1:10),y=c(21:30),region=c(1:10))
The final aim is to have for each 10 regions, a value "z" (of my results) from the nearest point (according to coordinates) of each of the DF of my list.
That means for one region: 10 results "z" from DF1 of my list, then 10 other results "z" from DF2, ...
My final DF should look like this if possible (for the structure):
final1 <- data.frame("2006-01-01"=rnorm(1:10),"2006-02-01"=rnorm(1:10),
"2006-03-01"=rnorm(1:10),"2006-04-01"=rnorm(1:10),"2006-05-01"=rnorm(1:10))
With one column for one day (so one DF of the list) and one value for each row (so for example for 2006-01-01: the value "z" from the nearest point with the first region).
I already have a small function to look for the nearest value:
min.dist <- function(p, coord){
which.min( colSums((t(coord) - p)^2) )
}
Then, I'm trying to make a loop to have what I want, but I have difficulties with the list. I would need to put 2 variables in the loop, but it doesn't works.
This works approximately if I just take 1 DF of my list:
for (j in 1:nrow(region)){
imin <- min.dist(c(region[j,1],region[j,2]),mylist[[1]][,1:2])
imin[j] <- min.dist(c(region[j,1],region[j,2]),mylist[[1]][,1:2])
final <- mylist[[1]][imin[j], "z"]
final[j] <- mylist[[1]][imin[j], "z"]
final <- as.data.frame(final)
}
But if I select my whole list (in order to have one column of results for each DF of the list in the object "final"), I have errors.
I think the first problem is that the length of "regions" is different of the length of my list, and the second maybe is about adding a second variable for the length of my list.
I'm not very familiar with loop, and so with 2-variables loops.
Could you help me to change in the loop what should be changed in order to have what I'm looking for?
Thank you very much!
You can use lapply() to apply a function over a list.
This should work. It returns a list of vectors.
lapply(
mylist,
FUN = function(mydf)
mydf[apply(
region[, -3],
1,
FUN = function(x)
which.min(apply(
mydf[, -3],
1,
FUN = function(y)
dist(rbind(x, y))
))
), 3]
)

Resources