R: drop columns from tibbles inside a function

This is a follow-up to a previous topic. Here are my 3 tibbles:
dftest_tw <- structure(list(text = c("RT #BitMEXdotcom: A new high: US$500M turnover in the last 24 hours, over 80% of it on $XBTUSD. Congrats to the team and thank you to our u…",
"RT #Crowd_indicator: Thank you for this nice video, #Nicholas_Merten",
"RT #Crowd_indicator: Review of #Cindicator by DataDash: t.co/D0da3u5y3V"
), Tweet.id = c("896858423521837057", "896858275689398272", "896858135314538497"
), created.date = structure(c(17391, 17391, 17391), class = "Date"),
created.week = c(33, 33, 33)), .Names = c("text", "Tweet.id",
"created.date", "created.week"), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"))
dftest1_tw <- dftest_tw
dftest2_tw <- dftest_tw
myUserList <- ls(,pattern = "_tw")
Following yesterday's topic, I get the desired result when running this:
library(tidyverse)
lst <- mget(myUserList) %>%
map2(myUserList, ~mutate(.data = .x, Twitter.name = .y))
list2env(lst, envir = .GlobalEnv)
I need to drop a few columns from each data frame. This does the job when run on a single data frame:
select_(dftest_tw, quote(-text), quote(-Tweet.id), quote(-created.date))
It seems I have a serious problem when it comes to applying code to each member of a list. I can't find a way to apply it to all the data frames, either with lapply or by writing a function:
MySelect <- function(x){
select_(x, quote(-text), quote(-Tweet.id), quote(-created.date))
x
}
for(var in myUserList){MySelect(get(var))}
Thank you for your help.
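No answer to this question appears in this section, so here is a minimal base-R sketch of one way to do it. As written, MySelect() computes the reduced data frame but then returns the unchanged x, and the for loop discards whatever MySelect() returns. The sketch assumes small plain-data-frame stand-ins for the tibbles and a hypothetical helper drop_cols(); it returns the reduced data frame and writes each result back by name with list2env(), mirroring the Twitter.name approach above.

```r
# Small stand-ins for the question's tibbles (plain data frames, same columns)
dftest_tw <- data.frame(
  text = c("tweet 1", "tweet 2", "tweet 3"),
  Tweet.id = c("896858423521837057", "896858275689398272", "896858135314538497"),
  created.date = as.Date(c(17391, 17391, 17391), origin = "1970-01-01"),
  created.week = c(33, 33, 33),
  stringsAsFactors = FALSE
)
dftest1_tw <- dftest_tw
dftest2_tw <- dftest_tw
myUserList <- ls(pattern = "_tw$")

# Hypothetical helper: return x without the unwanted columns.
# The key point is that the reduced data frame is returned, not the untouched x.
drop_cols <- function(x, cols = c("text", "Tweet.id", "created.date")) {
  x[, setdiff(names(x), cols), drop = FALSE]
}

# Apply the helper to every data frame, then overwrite the originals by name
lst <- lapply(mget(myUserList), drop_cols)
invisible(list2env(lst, envir = .GlobalEnv))
```

With dplyr loaded, the body of drop_cols() could equally be select(x, -text, -Tweet.id, -created.date); the [ , drop = FALSE] form also works on tibbles.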

Related

use dplyr to get list items from dataframe in R

I have a dataframe being returned from Microsoft365R:
SKA_student <- structure(list(name = "Computing SKA 2021-22.xlsx", size = 22266L,
lastModifiedBy =
structure(list(user =
structure(list(email = "my#email.com",
id = "8ae50289-d7af-4779-91dc-e4638421f422",
displayName = "Name, My"), class = "data.frame", row.names = c(NA, -1L))),
class = "data.frame", row.names = c(NA, -1L)),
fileSystemInfo = structure(list(
createdDateTime = "2021-09-08T16:03:38Z",
lastModifiedDateTime = "2021-09-16T00:09:04Z"), class = "data.frame", row.names = c(NA,-1L))), row.names = c(NA, -1L), class = "data.frame")
I can return all the lastModifiedBy data through:
SKA_student %>% select(lastModifiedBy)
lastModifiedBy.user.email lastModifiedBy.user.id lastModifiedBy.user.displayName
1 my#email.com 8ae50289-d7af-4779-91dc-e4638421f422 Name, My
But if I want a specific item in the lastModifiedBy list, it doesn't work, e.g.:
SKA_student %>% select(lastModifiedBy.user.email)
Error: Can't subset columns that don't exist.
x Column `lastModifiedBy.user.email` doesn't exist.
I can get this working in base R, but would really like a dplyr answer.
This function flattens all the list columns (I found it ages ago on SO but can't find the original post to credit):
SO_flat_cols <- function(data) {
ListCols <- sapply(data, is.list)
cbind(data[!ListCols], t(apply(data[ListCols], 1, unlist)))
}
Then you can select as you like.
SO_flat_cols (SKA_student) %>%
select(lastModifiedBy.user.email)
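Here lastModifiedBy is a data-frame column whose user column is itself a data frame, so a recursive helper can flatten the nesting into dotted top-level names that match the printed header (lastModifiedBy.user.email). This is a minimal base-R sketch, assuming only data-frame columns need flattening (ordinary list-columns are out of scope); flatten_df_cols() is a hypothetical name. Unlike the apply() route, which passes the nested fields through a character matrix, each nested column keeps its own type.

```r
# The question's nested data frame (data-frame columns inside data-frame columns)
SKA_student <- structure(
  list(
    name = "Computing SKA 2021-22.xlsx",
    size = 22266L,
    lastModifiedBy = structure(
      list(user = structure(
        list(email = "my#email.com",
             id = "8ae50289-d7af-4779-91dc-e4638421f422",
             displayName = "Name, My"),
        class = "data.frame", row.names = c(NA, -1L))),
      class = "data.frame", row.names = c(NA, -1L)),
    fileSystemInfo = structure(
      list(createdDateTime = "2021-09-08T16:03:38Z",
           lastModifiedDateTime = "2021-09-16T00:09:04Z"),
      class = "data.frame", row.names = c(NA, -1L))
  ),
  row.names = c(NA, -1L), class = "data.frame")

# Hypothetical helper: recursively turn data-frame columns into
# top-level columns named parent.child (e.g. lastModifiedBy.user.email)
flatten_df_cols <- function(df, prefix = "") {
  out <- list()
  for (nm in names(df)) {
    col <- df[[nm]]
    full <- paste0(prefix, nm)
    if (is.data.frame(col)) {
      # descend one level and splice the child's columns in
      out <- c(out, as.list(flatten_df_cols(col, paste0(full, "."))))
    } else {
      out[[full]] <- col
    }
  }
  as.data.frame(out, stringsAsFactors = FALSE, check.names = FALSE)
}

flat <- flatten_df_cols(SKA_student)
flat$lastModifiedBy.user.email
```

After flattening, select(lastModifiedBy.user.email) from dplyr works on the result as in the question.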
Alternatively, you can reach the nested column by pulling each nested level in turn:
SKA_student %>%
pull(lastModifiedBy) %>%
pull(user) %>%
select(email)
You could use
library(dplyr)
library(tidyr)
SKA_student %>%
unnest_wider(lastModifiedBy) %>%
select(email)
This returns
# A tibble: 1 x 1
email
<chr>
1 my#email.com

merging outputs from a loop

I have two datasets, named E and eF respectively.
E <- structure(list(Inception_Date = structure(c(962323200, 962323200,
810950400, 988675200, 1042502400, 1536624000), tzone = "UTC", class =
c("POSIXct", "POSIXt")), Name = c("Calvert Social Index B",
"Calvert US Large Cap Core Rspnb Idx A", "Green Century Equity Individual Investor",
"Praxis Value Index A", "Vanguard FTSE Social Index I",
"Amundi IS Amundi MSCI USA SRI ETF DR")), row.names = c(NA, -6L),
class = c("tbl_df", "tbl", "data.frame"))
eF <- structure(list(Inception_Date = structure(c(760233600, 519868800,
1380067200, 1101772800, 1325203200, 628473600, 1325203200, 1123804800
), tzone = "UTC", class = c("POSIXct", "POSIXt")), Name = c("Amana Growth Investor",
"Amana Income Investor", "Amana Income Institutional",
"American Century Sustainable Equity A",
"Ariel Appreciation Institutional", "Ariel Appreciation Investor",
"Ariel Focus Institutional", "Baywood Socially Responsible Invs"
)), row.names = c(NA, -8L), class = c("tbl_df", "tbl", "data.frame"
))
I applied the following code to E and eF:
for (k in 1:nrow(E)) {
F_temp <- eF;
G_temp <- F_temp %>% filter(abs(F_temp$Inception_Date-
E$Inception_Date[k]) <= 1500);
print(G_temp)}
In the Global Environment, G_temp shows only 0 obs. of 2 variables (presumably the result of the last iteration of the loop). How can I write a .csv file that contains all the G_temp results merged together, with duplicates removed?
Thanks
Using your exact filter criteria, would this do it?
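library(dplyr)
G_temp <- data.frame(Inception_Date = as.POSIXct(character()),
Name = character())
for (k in 1:nrow(E)) {
# abs() of a POSIXct difference is a difftime with automatically chosen units (days for these data)
G_temp_int <- eF %>%
filter(abs(Inception_Date - E$Inception_Date[k]) <= 1500)
# accumulate the matches instead of overwriting them on every iteration
G_temp <- bind_rows(G_temp, G_temp_int)
}
# drop rows that matched more than one date in E
G_temp <- G_temp %>%
distinct(Inception_Date, Name)
write.csv(G_temp, "G_temp.csv", row.names = FALSE)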
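One caveat with the filter above: abs() of a POSIXct difference is a difftime whose units R picks automatically from the data, so 1500 could mean seconds, minutes, hours or days depending on the inputs. A base-R sketch that fixes the units with difftime(units = "days") and collects the per-row matches in a list before binding once; the E and eF stand-ins are made-up data, and "1500 days" is an assumption about what the asker intended.

```r
# Illustrative stand-ins for E and eF (made-up names and dates)
E <- data.frame(
  Inception_Date = as.POSIXct(c("2000-06-30", "2003-06-30"), tz = "UTC"),
  Name = c("Fund E1", "Fund E2"),
  stringsAsFactors = FALSE
)
eF <- data.frame(
  Inception_Date = as.POSIXct(c("2002-01-01", "1980-01-01", "2010-01-01", "2005-01-01"),
                              tz = "UTC"),
  Name = c("Fund A", "Fund B", "Fund C", "Fund D"),
  stringsAsFactors = FALSE
)

# One data frame of matches per row of E, collected in a list
matches <- lapply(seq_len(nrow(E)), function(k) {
  # Explicit units: the threshold always means 1500 days here
  gap <- abs(difftime(eF$Inception_Date, E$Inception_Date[k], units = "days"))
  eF[as.numeric(gap) <= 1500, , drop = FALSE]
})

# Stack all matches once, then drop rows matched by more than one E date
G_temp <- unique(do.call(rbind, matches))
rownames(G_temp) <- NULL

# Write to a temporary file; use "G_temp.csv" to write to the working directory
out_file <- file.path(tempdir(), "G_temp.csv")
write.csv(G_temp, out_file, row.names = FALSE)
```

Fund A lies within 1500 days of both E dates, so it appears twice before unique() and once afterwards.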

How to plot layers of tibbles on the same plot in R?

I am trying to plot time against NDVI for each region on the same plot. I think I first have to convert the time column from character to dates and then plot each layer, but I cannot figure out how to do this. Any thoughts?
list(structure(list(observation = 1L, HRpcode = NA_character_,
timeseries = NA_character_), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame")), structure(list(observation = 1:6, time = c("2014-01-01",
"2014-02-01", "2014-03-01", "2014-04-01", "2014-05-01", "2014-06-01"
), ` NDVI` = c("0.3793765496776215", "0.21686891782421552", "0.3785652933528299",
"0.41027240624704164", "0.4035578030242673", "0.341299793064468"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
)), structure(list(observation = 1:6, time = c("2014-01-01",
"2014-02-01", "2014-03-01", "2014-04-01", "2014-05-01", "2014-06-01"
), ` NDVI` = c("0.4071076986818826", "0.09090719657570319", "0.35214166081795284",
"0.4444311032927228", "0.5220702877666005", "0.5732370503295022"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
)), structure(list(observation = 1:6, time = c("2014-01-01",
"2014-02-01", "2014-03-01", "2014-04-01", "2014-05-01", "2014-06-01"
), ` NDVI` = c("0.3412131556625801", "0.18815996897460135", "0.5218904976415136",
"0.6970128777711452", "0.7229657162729096", "0.535967435470161"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
)))
First, we need to clean your data. The first element in this list is empty, so drop it:
df = df[-1]
Now combine the remaining elements into a single data.frame:
df = do.call(rbind, df)
I am going to add a region variable, rename NDVI to remove the leading space, convert ndvi to a numeric vector, and convert time to a Date object:
library(dplyr)
df = df %>%
mutate(region = factor(rep(1:3, rep(6, 3)))) %>%
rename(ndvi = ' NDVI') %>%
mutate(ndvi = as.numeric(ndvi)) %>%
mutate(time = as.Date(time))
Now we can use ggplot2 to plot the data by region
library(ggplot2)
g = df %>%
ggplot(aes(x = time, y = ndvi, col = region)) +
geom_line()
g
This draws one NDVI line per region on a single plot.
Here's an approach with lubridate to handle dates and dplyr to make the binding of the data.frames easier to understand.
Note that the group names are taken from the names of the list, and since those don't exist in the data you provided, we have to set them in advance.
library(lubridate)
library(ggplot2)
library(dplyr)
data <- data[-1] # drop the empty first element
names(data) <- 1:3
data <- bind_rows(data, .id = "group")
data$time <- ymd(data$time)
data <- rename(data, NDVI = ` NDVI`) # remove the leading space (setnames() would need data.table)
data$NDVI <- as.numeric(data$NDVI)
ggplot(data, aes(x = time, y = NDVI, color = group)) + geom_line()
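For comparison, a base-R sketch of the same steps without lubridate, dplyr or ggplot2: keep only elements that have a time column, stack them with a region id taken from list position (an assumption, since the list has no names), convert the types, and draw one line per region. The list is an abbreviated copy of the question's data.

```r
# Abbreviated copy of the question's list: an empty first element plus two series
data <- list(
  data.frame(observation = 1L, HRpcode = NA_character_, timeseries = NA_character_),
  data.frame(observation = 1:3,
             time = c("2014-01-01", "2014-02-01", "2014-03-01"),
             ` NDVI` = c("0.3793765496776215", "0.21686891782421552", "0.3785652933528299"),
             check.names = FALSE, stringsAsFactors = FALSE),
  data.frame(observation = 1:3,
             time = c("2014-01-01", "2014-02-01", "2014-03-01"),
             ` NDVI` = c("0.4071076986818826", "0.09090719657570319", "0.35214166081795284"),
             check.names = FALSE, stringsAsFactors = FALSE)
)

# Keep only the elements that actually contain a time series
series <- Filter(function(d) "time" %in% names(d), data)

# Stack the series, tagging each row with its list position as a region id
long <- do.call(rbind, Map(function(d, i) {
  data.frame(region = factor(i, levels = seq_along(series)),
             time = as.Date(d$time),
             ndvi = as.numeric(d[[" NDVI"]]))
}, series, seq_along(series)))

# One line per region on shared axes (base graphics instead of ggplot2)
plot(long$time, long$ndvi, type = "n", xlab = "time", ylab = "NDVI")
for (r in levels(long$region)) {
  one <- long[long$region == r, ]
  lines(one$time, one$ndvi, col = as.integer(r))
}
legend("topleft", legend = levels(long$region),
       col = seq_along(levels(long$region)), lty = 1)
```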

Casting data correctly in R using the grep function

I'm trying to reshape my data based on the value in a particular column (i.e. "up" and "down"). The Up and Down rows are not in the same order in the data frame, so I'm having difficulty "casting" the data into the right shape.
I've tried using the cast function to reshape the data, but I can't get it to work consistently (i.e. accurately).
This is my input:
input = structure(list(X = 1:6, Report = c("Sales.csv", "Sales.csv",
"Sales.csv", "Sales.csv", "Sales.csv", "Sales.csv"), Shock = c("Currencies.USD_Up",
"Currencies.USD_Down", "Currencies.AUD_Up", "Currencies.AUD_Down",
"Currencies.EUR_Down", "Currencies.EUR_Up"), Result = c(-519375.9816,
-7388851.423, -42950.77683, -667.367063, -12819532.15, -138054.0061
), FX = c("USD", "USD", "AUD", "AUD", "EUR", "EUR")), class = "data.frame", row.names = c(NA,
-6L))
and this is my preferred output:
output = structure(list(X = 1:3, Report = c("Sales.csv", "Sales.csv",
"Sales.csv"), Shock = c("Currencies.USD", "Currencies.AUD", "Currencies.EUR"
), Currency = c("USD", "AUD", "EUR"), Up = c(-519375.9816, -42950.77683,
-138054.0061), Down = c(-7388851.423, -667.367063, -12819532.15
)), class = "data.frame", row.names = c(NA, -3L))
Because the EUR rows in the input are in a different order (Down before Up), I can't get the data into the right shape. I've tried using the grep function to reorder them, but I can't make it work. Can anyone suggest a better way?
This is a tidyverse approach to do it:
library(dplyr)
library(tidyr)
library(tibble)
input %>%
as_tibble() %>%
separate(Shock, c("Shock", "tmp"), sep = "_") %>%
rename(Currency = FX) %>%
select(-X) %>%
spread(tmp, Result) %>%
mutate(X = row_number()) %>%
select(X, Report, Shock, Currency, Up, Down)
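The underlying difficulty is pairing each Up row with its Down row regardless of row order. A base-R sketch that splits Shock into a key and a direction, then looks up both values for every key with match(); the answer above does the pairing with tidyr's spread(), which is now superseded by pivot_wider(). The variable names key, direction and shocks are illustrative.

```r
input <- structure(list(X = 1:6, Report = rep("Sales.csv", 6),
  Shock = c("Currencies.USD_Up", "Currencies.USD_Down", "Currencies.AUD_Up",
            "Currencies.AUD_Down", "Currencies.EUR_Down", "Currencies.EUR_Up"),
  Result = c(-519375.9816, -7388851.423, -42950.77683, -667.367063,
             -12819532.15, -138054.0061),
  FX = c("USD", "USD", "AUD", "AUD", "EUR", "EUR")),
  class = "data.frame", row.names = c(NA, -6L))

# Split "Currencies.USD_Up" into a key ("Currencies.USD") and a direction ("Up")
key <- sub("_(Up|Down)$", "", input$Shock)
direction <- sub("^.*_", "", input$Shock)
shocks <- unique(key)  # one output row per key, in order of first appearance

is_up <- direction == "Up"
is_down <- direction == "Down"

# match() looks each key up explicitly, so the Up/Down row order does not matter
output <- data.frame(
  X = seq_along(shocks),
  Report = input$Report[match(shocks, key)],
  Shock = shocks,
  Currency = input$FX[match(shocks, key)],
  Up = input$Result[is_up][match(shocks, key[is_up])],
  Down = input$Result[is_down][match(shocks, key[is_down])],
  stringsAsFactors = FALSE
)
output
```

Because rows follow the first appearance of each key, this reproduces the question's preferred output, including its USD, AUD, EUR row order.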

R: add a new column to dataframes from a function

I have many tibbles similar to this:
dftest_tw <- structure(list(text = c("RT #BitMEXdotcom: A new high: US$500M turnover in the last 24 hours, over 80% of it on $XBTUSD. Congrats to the team and thank you to our u…",
"RT #Crowd_indicator: Thank you for this nice video, #Nicholas_Merten",
"RT #Crowd_indicator: Review of #Cindicator by DataDash: t.co/D0da3u5y3V"
), Tweet.id = c("896858423521837057", "896858275689398272", "896858135314538497"
), created.date = structure(c(17391, 17391, 17391), class = "Date"),
created.week = c(33, 33, 33)), .Names = c("text", "Tweet.id",
"created.date", "created.week"), row.names = c(NA, -3L), class = c("tbl_df",
"tbl", "data.frame"))
For testing, we add another one:
dftest2_tw <- dftest_tw
I have this vector of my data frame names:
myUserList <- ls(,pattern = "_tw")
What I am looking to do, all inside a function, is:
1- add a new column named Twitter.name
2- fill that column with the data frame's name.
The following code works for each data frame taken one by one:
dftest_tw %>% rowwise() %>% mutate(Twitter.name = myUserList[1])
The desired result is this:
MyRes <- structure(list(text = c("RT #BitMEXdotcom: A new high: US$500M turnover in the last 24 hours, over 80% of it on $XBTUSD. Congrats to the team and thank you to our u…",
"RT #Crowd_indicator: Thank you for this nice video, #Nicholas_Merten",
"RT #Crowd_indicator: Review of #Cindicator by DataDash: t.co/D0da3u5y3V"
), Tweet.id = c("896858423521837057", "896858275689398272", "896858135314538497"
), created.date = structure(c(17391, 17391, 17391), class = "Date"),
created.week = c(33, 33, 33), retweet = c(0, 0, 0), custom = c(0,
0, 0), Twitter.name = c("dftest_tw", "dftest_tw", "dftest_tw"
)), .Names = c("text", "Tweet.id", "created.date", "created.week",
"retweet", "custom", "Twitter.name"), class = c("rowwise_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L))
When it comes to writing a function to apply to all my data frames (more than 100), I can't get it to work. Any help would be appreciated.
We can use tidyverse tools. Get the objects named in `myUserList` with `mget`, then use `map2` from purrr to create the new column `Twitter.name` in each data set of the list, filled with the corresponding element of `myUserList`:
library(tidyverse)
lst <- mget(myUserList) %>%
map2(myUserList, ~mutate(.data = .x, Twitter.name = .y))
To modify the objects in the global environment, use list2env:
list2env(lst, envir = .GlobalEnv)
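A dependency-free version of the same idea: base R's Map() plays the role of purrr::map2(), walking the data frames and their names in parallel. The stand-in data frames and the helper add_twitter_name() are hypothetical.

```r
# Two small stand-ins for the question's tibbles
dftest_tw <- data.frame(Tweet.id = c("896858423521837057", "896858275689398272"),
                        created.week = c(33, 33), stringsAsFactors = FALSE)
dftest2_tw <- dftest_tw
myUserList <- ls(pattern = "_tw$")

# Hypothetical helper: add a constant column holding the data frame's name
add_twitter_name <- function(df, nm) {
  df$Twitter.name <- nm  # a length-1 value is recycled to every row
  df
}

# Map() pairs each data frame with its name; the result keeps mget()'s names
lst <- Map(add_twitter_name, mget(myUserList), myUserList)
invisible(list2env(lst, envir = .GlobalEnv))
```

The rowwise() call from the question is not needed here, since a single value is recycled to every row.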
