I have a list (selected_key_ratios) containing 4 data frames ($nestle, $unilever, $pepsico, $abf). Each data frame contains the financial data of one company. All data frames have the same row index and almost the same columns (only the currency sometimes differs). Here is a screenshot of the list.
I'm trying to make a new list where each item is one column of the data frames, grouped by company. Here is a graphical example:
And so on for each column of the data frames. I have tried things with lapply for hours, but nothing produces the desired result.
Do you have any clues? Thanks a lot!
You could try something like this nested lapply:
# Recreation of your list of dataframes
w <- list(
abc = data.frame(
"eps_usd" = runif(10) * 10,
"eps_gbp" = runif(10) * 8
),
def = data.frame(
"eps_usd" = runif(10) * 15,
"eps_eur" = runif(10) * 13
),
ghi = data.frame(
"eps_gbp" = runif(10) * 35,
"eps_aud" = runif(10) * 19
),
jkl = data.frame(
"eps_usd" = runif(10) * 2,
"eps_aud" = runif(10) * 1.4
)
)
# Create a new dataframe with the year column
result <- data.frame(year = 2007:2016)
# Apply to each name in the list
lapply(names(w), function(tbl) {
  # Apply to each column name of each df
  lapply(colnames(w[[tbl]]), function(col) {
    # Assign the corresponding column from the list of dfs to the result df
    # (<<- modifies `result` in the enclosing environment)
    result[[paste0(tbl, "_", col)]] <<- w[[tbl]][[col]]
  })
})
Output:
> result
year abc_eps_usd abc_eps_gbp def_eps_usd def_eps_eur ghi_eps_gbp ghi_eps_aud jkl_eps_usd jkl_eps_aud
1 2007 8.107360 3.419094 11.660133 9.9744151 3.801628 1.936746299 1.36976914 0.58472812
2 2008 7.527040 2.342307 11.407357 5.6755403 13.433364 8.595490269 0.31085568 0.06655984
3 2009 5.155562 4.272123 8.506886 8.5367400 20.305427 18.191703109 0.01993349 0.31829031
4 2010 2.947270 2.983519 5.686625 5.2630734 14.064397 9.049538589 0.92122668 0.55233980
5 2011 8.645507 2.657100 12.445061 6.9406141 5.056093 18.787235097 0.41227465 0.01664083
6 2012 7.192367 5.695391 3.620765 9.1173421 26.452499 0.002014068 1.84031115 0.38873530
7 2013 4.878473 1.527182 11.769227 9.6991108 16.232696 6.934076956 1.07328960 0.28808505
8 2014 1.766486 5.272151 12.656086 0.7318888 32.855694 15.643783443 1.33677381 1.09871196
9 2015 9.428541 6.462755 11.473938 4.3658361 7.547359 17.634770134 1.27743503 1.35510589
10 2016 6.047083 3.437785 13.845070 12.9766045 7.401827 18.032713128 1.73208881 0.03394082
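If the goal is a list with one item per column rather than one wide data frame, the same idea can be turned around. The following is only a sketch on a small made-up list (the values and the names all_cols and by_column are assumptions for illustration); it groups by column name, so a company simply drops out of an item when it does not report that currency:

```r
# Hypothetical stand-in for the list of company data frames
w <- list(
  abc = data.frame(eps_usd = 1:3, eps_gbp = 4:6),
  def = data.frame(eps_usd = 7:9, eps_eur = 10:12)
)

# Every column name that occurs in at least one data frame
all_cols <- unique(unlist(lapply(w, colnames)))

# One data frame per column name, with one column per company that has it
by_column <- lapply(setNames(all_cols, all_cols), function(col) {
  has_col <- Filter(function(df) col %in% colnames(df), w)
  as.data.frame(lapply(has_col, `[[`, col))
})

by_column$eps_usd
```

Here by_column$eps_usd holds the columns abc and def, while by_column$eps_gbp holds only abc.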
Since no dataset was provided, I made one up.
set.seed(5489)
n <- 20
df_list <- list(
nestle = data.frame(A = runif(n), B = runif(n), C = runif(n)),
unilever = data.frame(D = runif(n), E = runif(n), F = runif(n)),
abf = data.frame(G = runif(n), H = runif(n), I = runif(n))
)
The code that follows assumes that you want to extract the first column of each data frame, and that you want to name the columns of the result with a combination of the names of the original df's names and of those first columns.
result <- as.data.frame(do.call(cbind, lapply(df_list, `[[`, 1)))
names(result) <- paste(names(result), sapply(df_list, function(DF) names(DF)[1]))
row.names(result) <- row.names(df_list[[1]])
head(result)
# nestle A unilever D abf G
#1 0.2348625 0.007785561 0.6453142
#2 0.5951392 0.494773356 0.2167643
#3 0.3001674 0.381868381 0.7182713
#4 0.1745270 0.983473145 0.8829462
#5 0.3387269 0.178523104 0.6042962
#6 0.1103261 0.211874225 0.4545857
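To repeat this for every column rather than only the first, the column position can be wrapped in another lapply. This is a sketch under the assumption that all data frames have the same number of columns in the same order; the seed, the sizes and the name by_position are made up for the example:

```r
set.seed(1)  # arbitrary seed, only to make the example reproducible
df_list <- list(
  nestle   = data.frame(A = runif(4), B = runif(4)),
  unilever = data.frame(D = runif(4), E = runif(4))
)

# One data frame per column position j, one column per company
by_position <- lapply(seq_len(ncol(df_list[[1]])), function(j) {
  out <- as.data.frame(lapply(df_list, `[[`, j))
  names(out) <- paste(names(df_list), sapply(df_list, function(DF) names(DF)[j]))
  out
})

names(by_position[[2]])
```

The result is a plain list, so by_position[[2]] holds the second column of every company.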
I need help please. I have two lists: the first contains ndvi time series for distinct points, the second contains precipitation time series for the same plots (plots are in the same order in the two lists).
I need to combine the two lists. I want to add the column called precipitation from one list to the corresponding ndvi column from the other list, matching on the dates (represented here by letters in the row names), for a later analysis of the correlation between the columns. However, the ndvi and precipitation time series have different lengths and different dates.
I created the two lists to be used as example of my dataset. However, in my actual dataset the row names are monthly dates in the format "%Y-%m-%d".
library(tidyverse)
set.seed(100)
# First variable is ndvi.mon1 (monthly ndvi)
ndvi.mon1 <- vector("list", length = 3)
for (i in seq_along(ndvi.mon1)) {
aux <- data.frame(ndvi = sample(randu$x,
sample(c(seq(1,20, 1)),1),
replace = T))
ndvi.mon1[i] <- aux
ndvi.mon1 <- ndvi.mon1 %>% map(data.frame)
rownames(ndvi.mon1[[i]]) <- sample(letters, size=seq(letters[1:as.numeric(aux %>% map(length))]) %>% length)
}
# Second variable is precipitation
precipitation <- vector("list", length = 3)
for (i in seq_along(ndvi.mon1)){
prec_aux <- data.frame(precipitation = sample(randu$x*500,
26,
replace = T))
row.names(prec_aux) <- seq(letters[1:as.numeric(prec_aux %>% map(length))])
precipitation[i] <- prec_aux
precipitation <- precipitation %>% map(data.frame)
rownames(precipitation[[i]]) <- letters[1:(as.numeric(precipitation[i] %>% map(dim) %>% map(first)))]
}
Can someone help me please?
Thank you!!!
Marcio.
Maybe like this?
library(dplyr)
library(purrr)
library(tibble)  # provides rownames_to_column()
precipitation2 <- precipitation %>%
map(rownames_to_column) %>%
map(rename, precipitation = 2)
ndvi.mon2 <- ndvi.mon1 %>%
map(rownames_to_column) %>%
map(rename, ndvi = 2)
purrr::map2(ndvi.mon2, precipitation2, left_join, by = "rowname")
[[1]]
rowname ndvi precipitation
1 k 0.354886 209.7415
2 x 0.596309 103.3700
3 r 0.978769 403.8775
4 l 0.322291 354.2630
5 c 0.831722 348.9390
6 s 0.973205 273.6030
7 h 0.949827 218.6430
8 y 0.443353 61.9310
9 b 0.826368 8.3290
10 d 0.337308 291.2110
The code below returns a list of data.frames, each merged on its row names:
lapply(seq_along(ndvi.mon1), function(i) {
merge(
x = data.frame(date = rownames(ndvi.mon1[[i]]), ndvi = ndvi.mon1[[i]][,1]),
y = data.frame(date = rownames(precipitation[[i]]), precip = precipitation[[i]][,1]),
by="date"
)
})
Output:
[[1]]
date ndvi precip
1 b 0.826368 8.3290
2 c 0.831722 348.9390
3 d 0.337308 291.2110
4 h 0.949827 218.6430
5 k 0.354886 209.7415
6 l 0.322291 354.2630
7 r 0.978769 403.8775
8 s 0.973205 273.6030
9 x 0.596309 103.3700
10 y 0.443353 61.9310
[[2]]
date ndvi precip
1 g 0.415824 283.9335
2 k 0.573737 311.8785
3 p 0.582422 354.2630
4 y 0.952495 495.4340
[[3]]
date ndvi precip
1 b 0.656463 332.5700
2 c 0.347482 94.7870
3 d 0.215425 431.3770
4 e 0.063100 499.2245
5 f 0.419460 304.5190
6 g 0.712057 226.7125
7 h 0.666700 284.9645
8 i 0.778547 182.0295
9 k 0.902520 82.5515
10 l 0.593219 430.6630
11 m 0.788715 443.5345
12 n 0.347482 132.3950
13 q 0.719538 79.1835
14 r 0.911370 100.7025
15 s 0.258743 309.3575
16 t 0.940644 142.3725
17 u 0.626980 335.4360
18 v 0.167640 390.4915
19 w 0.826368 63.3760
20 x 0.937211 439.8685
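With real "%Y-%m-%d" row names, the same merge can be done on a proper Date column, and the correlation follows directly. The sketch below uses two tiny made-up series; the helper to_keyed and all values are assumptions for illustration. merge() keeps only dates present in both series by default, which is what a correlation needs:

```r
# Hypothetical monthly series with "%Y-%m-%d" row names
ndvi <- data.frame(ndvi = c(0.2, 0.4, 0.6, 0.8),
                   row.names = c("2020-01-01", "2020-02-01", "2020-03-01", "2020-05-01"))
prec <- data.frame(precipitation = c(10, 20, 30, 40, 50),
                   row.names = c("2020-01-01", "2020-02-01", "2020-03-01",
                                 "2020-04-01", "2020-05-01"))
ndvi_list <- list(ndvi)
prec_list <- list(prec)

# Move the row names into a Date column and name the value column
to_keyed <- function(df, value_name) {
  out <- data.frame(date = as.Date(rownames(df)), value = df[[1]])
  names(out)[2] <- value_name
  out
}

# Inner join per plot, then one correlation per plot
merged_list <- Map(function(n, p) {
  merge(to_keyed(n, "ndvi"), to_keyed(p, "precipitation"), by = "date")
}, ndvi_list, prec_list)

correlations <- sapply(merged_list, function(m) cor(m$ndvi, m$precipitation))
```

Passing all.x = TRUE to merge() would instead keep every ndvi date and fill missing precipitation with NA.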
I have 8 data frames and I want to create a variable in each of them. I use a for loop; the code is given below:
year <- 2001
dflist <- list(bhps01, bhps02, bhps03, bhps04, bhps05, bhps06, bhps07, bhps08)
for (df in dflist){
df[["year"]] <- as.character(year)
assign()
year <- year + 1
}
bhps01,...,bhps08 are the data frame objects and year is a character variable. bhps01 is the data frame for year 2001, bhps02 is the data frame for year 2002 and so on.
Each data frame corresponds to a year, so bhps01 corresponds to 2001, bhps02 to 2002, and so on. I want to create a year variable in each of them: "2001" for bhps01, "2002" for bhps02, and so on.
The code runs, but it does not create the year variable in any of the data frames, only in the local variable df.
Can someone please explain the error in the above code? Or is there an alternative of doing the same thing?
The syntax in the for loop is wrong. I am not entirely sure what you are trying to accomplish, but let us try this:
year = 2001
A = data.frame(a = c(1, 1), b = c(2, 2))
B = data.frame(a = c(1, 1), b = c(2, 2))
L = list(A, B)
for (i in seq_along(L)) {
L[[i]][, dim(L[[i]])[2] + 1] = as.character(rep(year,dim(L[[i]])[1]))
year = year + 1
}
with output
> L
[[1]]
a b V3
1 1 2 2001
2 1 2 2001
[[2]]
a b V3
1 1 2 2002
2 1 2 2002
That is what you intend as output, correct?
In order to change the column name to "year" you can do
L = lapply(L, function(x) {colnames(x)[3] = "year"; x})
You take a copy of the dataframe from the list, and add the variable "year" to it, but then do not assign it anywhere, which is why it is discarded (i.e. not stored in a variable). Here's a fix:
year <- 2001
dflist <- list(bhps01, bhps02, bhps03, bhps04, bhps05, bhps06, bhps07, bhps08)
counter <- 0
for (df in dflist){
counter <- counter + 1
df[["year"]] <- as.character(year)
dflist[[counter]] <- df
year <- year + 1
}
If you want the original data frames themselves to be modified, you can assign the result back to the original object rather than only into the list. This is a somewhat indirect route; note that dflist is now created with names. We modify the df and then assign it to its original name. For example:
year <- 2001
dflist <- list(bhps01 = bhps01, bhps02 = bhps02, bhps03 = bhps03, bhps04 = bhps04, bhps05 = bhps05, bhps06 = bhps06, bhps07 = bhps07, bhps08 = bhps08)
counter <- 0
for (df in dflist){
counter <- counter + 1
df[["year"]] <- as.character(year)
dflist[[counter]] <- df
assign(names(dflist)[counter], df)
year <- year + 1
}
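The counter can also be avoided by walking over the data frames and the years in parallel with Map. This is only a sketch with two small made-up stand-ins for the bhps data frames (the id column and the vector years are assumptions); list2env then copies the modified data frames back over the originals:

```r
# Hypothetical stand-ins for bhps01, bhps02, ...
bhps01 <- data.frame(id = 1:2)
bhps02 <- data.frame(id = 3:4)
dflist <- list(bhps01 = bhps01, bhps02 = bhps02)

# One year per data frame, starting at 2001
years <- seq(2001, length.out = length(dflist))

# Map pairs each data frame with its year and keeps the list names
dflist <- Map(function(df, yr) {
  df[["year"]] <- as.character(yr)
  df
}, dflist, years)

# Optional: overwrite the original objects in the current environment
# (the global environment when run at top level)
list2env(dflist, envir = environment())

bhps02
```

Keeping the data frames in the named list and skipping the list2env step is usually enough if later steps also work on the list.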
I want to select the top 10 voted restaurants and plot them together.
So I want to create a plot that shows the restaurant names and their votes.
I used:
topTenVotes <- top_n(dataSet, 10, Votes)
and it showed me all the columns of the dataset for the 10 highest vote counts; however, I want just the number of votes and the restaurant names.
My question is: how do I select only the top 10 highest votes and their restaurant names, and plot them together?
expected output:
Restaurant Names Votes
A 300
B 250
C 230
D 220
E 210
F 205
G 200
H 194
I 160
J 120
K 34
And then a bar plot that shows these restaurant names and their votes
Another simple approach with base functions creating another variable:
df <- data.frame(Names = LETTERS, Votes = sample(40:400, length(LETTERS)))
x <- df$Votes
names(x) <- df$Names # x <- setNames(df$Votes, df$Names) is another approach
barplot(sort(x, decreasing = TRUE)[1:10], xlab = "Restaurant Name", ylab = "Votes")
Or a one-line solution with base functions:
barplot(sort(xtabs(Votes ~ Names, df), decreasing = TRUE)[1:10], xlab = "Restaurant Names")
I'm not seeing a data set to use, so here's a minimal example to show how it might work:
library(tidyverse)
df <-
tibble(
restaurant = c("res1", "res2", "res3", "res4"),
votes = c(2, 5, 8, 6)
)
df %>%
arrange(-votes) %>%
head(3) %>%
ggplot(aes(x = reorder(restaurant, votes), y = votes)) +
geom_col() +
coord_flip()
The top_n command also works in this case but is designed for grouped data.
It's more efficient, though less readable, to use base functions:
#toy data
d <- data.frame(list(Names = sample(LETTERS, size = 15), value = rnorm(25, 10, n = 15)))
head(d)
Names value
1 D 25.592749
2 B 28.362303
3 H 1.576343
4 L 28.718517
5 S 27.648078
6 Y 29.364797
#reorder by, and retain, the top 10
newdata <- data.frame()
for (i in 1:10) {
newdata <- rbind(newdata,d[which(d$value == sort(d$value, decreasing = T)[1:10][i]),])
}
newdata
Names value
8 W 45.11330
13 K 36.50623
14 P 31.33122
15 T 30.28397
6 Y 29.36480
7 Q 29.29337
4 L 28.71852
10 Z 28.62501
2 B 28.36230
5 S 27.64808
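The loop above can also be written as a single indexing step with order(), which avoids growing a data frame with rbind and keeps only the two columns of interest. A minimal sketch on made-up data (the seed, the values and the name top10 are assumptions):

```r
set.seed(42)  # arbitrary seed, only to make the example reproducible
d <- data.frame(Names = sample(LETTERS, size = 15),
                value = rnorm(15, mean = 25, sd = 10))

# Sort the rows by value, largest first, and keep the first 10
top10 <- head(d[order(d$value, decreasing = TRUE), c("Names", "value")], 10)
top10
```

The result can then be plotted by passing setNames(top10$value, top10$Names) to barplot(). Unlike the which() lookup in the loop, this always returns exactly 10 rows, even when values are tied.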
I have a dataset consisting of pairs of data.frames (which are almost exact pairs, but not enough to merge directly) which I need to munge together. Luckily, each df has an identifier for the date it was created which can be used to reference the pair. E.g.
df_0101 <- data.frame(a = rnorm(1:10),
b = runif(1:10))
df_0102 <- data.frame(a = rnorm(5:20),
b = runif(5:20))
df2_0101 <- data.frame(a2 = rnorm(1:10),
b2 = runif(1:10))
df2_0102 <- data.frame(a2 = rnorm(5:20),
b2 = runif(5:20))
Therefore, the first thing I need to do is mutate a new column on each data.frame consisting of this date (01_01/ 01_02 / etc.) i.e.
df_0101 <- df_0101 %>%
mutate(df_name = "df_0101")
but obviously in a programmatic manner.
I can call every data.frame in the global environment using
l_df <- Filter(function(x) is(x, "data.frame"), mget(ls()))
head(l_df)
$df_0101
a b
1 0.7588803 0.17837296
2 -0.2592187 0.45445752
3 1.2221744 0.01553190
4 1.1534353 0.72097071
5 0.7279514 0.96770448
$df_0102
a b
1 -0.33415584 0.53597308
2 0.31730849 0.32995013
3 -0.18936533 0.41024220
4 0.49441962 0.22123885
5 -0.28985964 0.62388478
$df2_0101
a2 b2
1 -0.5600229 0.6283224
2 0.5944657 0.7384586
3 1.1284180 0.4656239
4 -0.4737340 0.1555984
5 -0.3838161 0.3373913
$df2_0102
a2 b2
1 -0.67987149 0.65352466
2 1.46878953 0.47135011
3 0.10902751 0.04460594
4 -1.82677732 0.38636357
5 1.06021443 0.92935144
but no idea how to then pull the names of each df down into a new column on each. Any ideas?
Thanks for reading,
We can use Map in base R
Map(cbind, names = names(l_df), l_df)
If we are going by the tidyverse way, then
library(tidyverse)
map2(names(l_df), l_df, ~(cbind(names = .x, .y)))
Also, the list can be combined into a single dataset with bind_rows, which stores the names in an id column:
bind_rows(l_df, .id = "names")
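Since the end goal is to pair the data frames by their date identifier, it may help to store that identifier as its own column too. The sketch below assumes the identifier is everything after the last underscore in the object name; the column name date_id, the object pairs and the small data frames are made up for illustration:

```r
# Hypothetical stand-in for l_df
l_df <- list(df_0101  = data.frame(a = 1:2),
             df_0102  = data.frame(a = 3:4),
             df2_0101 = data.frame(a2 = 5:6))

# Add the object name and the date part of the name to each data frame
l_df <- Map(function(df, nm) {
  df$df_name <- nm
  df$date_id <- sub("^.*_", "", nm)  # "df2_0101" becomes "0101"
  df
}, l_df, names(l_df))

# Group the data frames that share a date identifier
pairs <- split(l_df, sub("^.*_", "", names(l_df)))
names(pairs[["0101"]])
```

Each element of pairs is then a list holding the data frames that belong to the same date.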
I am trying to create a data frame (BOS.df) in order to explore the structure of a future analysis before I receive the actual data. In this scenario, let's say that there are 4 restaurants looking to run ad campaigns (the "Restaurant" variable). The total number of days that the campaign will last is cmp.lngth. I want random numbers for how much they are billing for the ads (ra.num). The ad campaigns start on StartDate. Ultimately, I want to create a data frame that cycles through each restaurant and adds a random billing number for each day of the ad campaign by adding rows.
#Create Data Placeholders
set.seed(123)
Restaurant <- c('B1', 'B2', 'B3', 'B4')
cmp.lngth <- 42
ra.num <- rnorm(cmp.lngth, mean = 100, sd = 10)
StartDate <- as.Date("2017-07-14")
BOS.df <- data.frame(matrix(NA, nrow =0, ncol = 3))
colnames(BOS.df) <- c("Restaurant", "Billings", "Date")
for(i in 1:length(Restaurant)){
for(z in 1:cmp.lngth){
BOS.row <- c(as.character(Restaurant[i]),ra.num[z],StartDate +
cmp.lngth[z]-1)
BOS.df <- rbind(BOS.df, BOS.row)
}
}
My code is not functioning correctly right now. The column names are incorrect, and the data is not being placed correctly if at all. The output comes through as follows:
X.B1. X.94.3952435344779. X.17402.
1 B1 94.3952435344779 17402
2 B1 <NA> <NA>
3 B1 <NA> <NA>
4 B1 <NA> <NA>
5 B1 <NA> <NA>
6 B1 <NA> <NA>
How can I obtain the correct output? Is there a more efficient way than using a for loop?
Using expand.grid:
cmp.lngth <- 2
StartDate <- as.Date("2017-07-14")
set.seed(1)
df1 <- data.frame(expand.grid(Restaurant, StartDate + seq(cmp.lngth) - 1))
colnames(df1) <- c("Restaurant", "Date")
df1$Billings <- rnorm(nrow(df1), mean = 100, sd = 10)
df1 <- df1[ order(df1$Restaurant, df1$Date), ]
df1
# Restaurant Date Billings
# 1 B1 2017-07-14 93.73546
# 5 B1 2017-07-15 103.29508
# 2 B2 2017-07-14 101.83643
# 6 B2 2017-07-15 91.79532
# 3 B3 2017-07-14 91.64371
# 7 B3 2017-07-15 104.87429
# 4 B4 2017-07-14 115.95281
# 8 B4 2017-07-15 107.38325
You can use rbind, but this would be another way to do it.
Also, the length of the data frame should be cmp.lngth*length(Restaurant), not cmp.lngth.
#Create Data Placeholders
set.seed(123)
Restaurant <- c('B1', 'B2', 'B3', 'B4')
cmp.lngth <- 42
ra.num <- rnorm(cmp.lngth, mean = 100, sd = 10)
StartDate <- as.Date("2017-07-14")
BOS.df <- data.frame(matrix(NA, nrow = cmp.lngth*length(Restaurant), ncol = 3))
colnames(BOS.df) <- c("Restaurant", "Billings", "Date")
count <- 1
for(name in Restaurant){
for(z in 1:cmp.lngth){
BOS.row <- c(name, ra.num[z], as.character(StartDate + z - 1))
BOS.df[count,] <- BOS.row
count <- count + 1
}
}
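One thing to keep in mind with the loop above: c() returns an atomic vector, so mixing a name, a number and a date turns everything into character, and Billings ends up as a character column. The following is a sketch of a loop-free alternative that keeps the column types by building each column with rep(); it reuses the names from the question and, like the loop, repeats the same ra.num values for every restaurant:

```r
set.seed(123)
Restaurant <- c("B1", "B2", "B3", "B4")
cmp.lngth <- 42
ra.num <- rnorm(cmp.lngth, mean = 100, sd = 10)
StartDate <- as.Date("2017-07-14")

# c() coerces mixed inputs to one common type, here character
class(c("B1", ra.num[1]))

# Build each column at full length; the types are preserved
BOS.df <- data.frame(
  Restaurant = rep(Restaurant, each = cmp.lngth),
  Billings   = rep(ra.num, times = length(Restaurant)),
  Date       = rep(StartDate + seq_len(cmp.lngth) - 1, times = length(Restaurant))
)

str(BOS.df)
```

The data frame has cmp.lngth * length(Restaurant) rows, with Billings numeric and Date of class Date.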
I would also recommend looking at the tidyverse packages and using add_row with a tibble instead of a data frame. Here is some sample code:
library(tidyverse)
BOS.tb <- tibble(Restaurant = character(),
Billings = numeric(),
Date = character())
for(name in Restaurant){
  for(z in 1:cmp.lngth){
    # Append one typed row per restaurant and campaign day
    BOS.tb <- add_row(BOS.tb,
                      Restaurant = name,
                      Billings = ra.num[z],
                      Date = as.character(StartDate + z - 1))
  }
}