Updating multiple date columns in R conditioned on a particular column

I have a table that consists only of columns of type Date. The data describe the shopping behavior of customers on a website. Each column holds the first time an event was triggered by a customer (NULL if the event never occurred). One of the columns is the purchase event.
Here's an MRE for the starting state of the data:
structure(list(purchase = structure(c(NA, NA, 10729, NA, 10737
), class = "Date"), action_A = structure(c(NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), class = "Date"), action_B = structure(c(NA,
NA, 10713, NA, 10613), class = "Date"), action_C = structure(c(10707,
10729, 10739, NA, NA), class = "Date")), row.names = c(NA, -5L
), class = c("tbl_df", "tbl", "data.frame"))
I want to update the table so that, within each row, every event date that did not occur within 30 days prior to the purchase is replaced with NULL. However, if the purchase event is NULL, I'd like to keep the dates of the other events.
So after my envisioned transformation, the above table should look like the following:
structure(list(purchase = structure(c(NA, NA, 10729, NA, 10737
), class = "Date"), action_A = structure(c(NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), class = "Date"), action_B = structure(c(NA,
NA, 10713, NA, NA), class = "Date"), action_C = structure(c(10707,
10729, NA, NA, NA), class = "Date")), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
I have not yet been able to achieve this transformation and would appreciate any help!
Finally, I'd like to transform the above table into a binary format. I've achieved this via the code segment below; however, I'd like to know if I can do this in a simpler way.
df_c <- df_b %>%
  is.na() %>%
  magrittr::not() %>%
  data.frame()
df_c <- df_c * 1
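For reference, a possibly simpler one-step sketch of that conversion (assuming dplyr is loaded and df_b is the date-filtered table):
library(dplyr)
# Sketch: mark each column as 1 if a date is present and 0 otherwise,
# in a single mutate()/across() call.
df_c <- df_b %>%
  mutate(across(everything(), ~ as.integer(!is.na(.x))))
This keeps the tibble structure and avoids the intermediate magrittr::not() and the * 1 step.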

I assume that by saying "replaced by NULL" you actually mean "replaced by NA".
I also assume that the first structure in your question is df_a.
df_b <- df_a %>%
  mutate(across(starts_with("action"),
                ~ if_else(purchase - . > 30, as.Date(NA), .)))
mutate(across(cols, func)) applies func to all selected cols.
The real trick here is to use if_else and cast NA to the Date class; otherwise, the dates would be converted to numeric vectors.
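A small illustration (not part of the original answer) of what goes wrong without that cast and if_else:
# base ifelse() drops the Date class and returns the underlying day count,
# while dplyr::if_else() keeps the class when both branches are Dates.
ifelse(TRUE, as.Date("1999-05-18"), as.Date(NA))
#> [1] 10729
dplyr::if_else(TRUE, as.Date("1999-05-18"), as.Date(NA))
#> [1] "1999-05-18"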
Result:
# Tibble (class tbl_df) 5 x 4:
  │purchase  │action_A│action_B  │action_C
 1│NA        │NA      │NA        │NA
 2│NA        │NA      │NA        │NA
 3│1999-05-18│NA      │1999-05-02│1999-05-28
 4│NA        │NA      │NA        │NA
 5│1999-05-26│NA      │NA        │NA
One problem remains as a homework exercise: how do you modify the if_else so that you keep the action when purchase is NA? (This should now be very simple!) I did not include it on purpose because you omitted that case from the question.

Related

Catching an issue in imp.dates when two events are taking place on the same day

I am creating a calendar using a dynamic dataframe of the following pattern.
structure(list(Date = structure(c(19304, 19305, 19311,
19311, 19312), class = "Date"), Category = c("4",
"6", "1", "0",
"3"), Units_Sold = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), Raised = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_), Method = c("Trad",
"Trad", "Unknown", "Trad",
"Unknown"), Day = c(8, 9, 15, 15, 16)), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
This is then fed into imp.dates like this:
imp.dates <- rep(NA, 31)
imp.dates[df$Day] <- df$Category
This is then fed into calendR and used to plot some colour-coded events.
As you can probably see above, there are two events of two different categories taking place on the same day. This causes a problem insofar as the code doesn't know which category to plot them under or what text to put in the calendar on that date.
In an attempt to catch this issue, I'm thinking of putting a condition on the dataframe that checks whether there is more than one event taking place on the same day, and then dropping and renaming the category to say something like "2 events took place today".
My question is whether this would be the best solution and, if not, whether there are better options available. Any advice or pointers would be greatly appreciated.
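A hedged sketch of that collapse-and-relabel idea (assuming the dput'd frame is named df, that calendR only needs one label per day, and that the message text is just a placeholder):
library(dplyr)
# Collapse rows that share a Day before building imp.dates, so a single label
# such as "2 events today" is plotted instead of two conflicting categories.
df_plot <- df %>%
  group_by(Day) %>%
  summarise(Category = if (n() > 1) paste(n(), "events today") else first(Category),
            .groups = "drop")
imp.dates <- rep(NA, 31)
imp.dates[df_plot$Day] <- df_plot$Category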

Trouble combining double and character

I am having trouble merging datasets. I am going to merge many of them together, so I need to figure out a way to automate getting past the following error:
"Error: Can't combine `C:/Users/gabri/AppData/Local/Cache/R/noaa_lcd/2006_72038163885.csv$HourlyWetBulbTemperature` <double> and `C:/Users/gabri/AppData/Local/Cache/R/noaa_lcd/2009_72038163885.csv$HourlyWetBulbTemperature` <character>."
I have examined the data and see that in one of the files some of the NAs are marked with *, so I know why the problem occurs. I would like to add a command that converts everything either to character or to numeric so that I can merge, but when I try adding as.character I receive this error:
Error: Names repair functions can't return `NA` values
Here is the relevant code I am trying to run which produces the error.
library(rnoaa)
library(tidyverse)
library(fs)
super_big_df <- map_df(my_files, read_csv, col_select = c(1,2,21,32,80), col_types = "cTddd", .id = "file")
Here is the output of dput for the relevant columns of the two datasets:
structure(list(STATION = c(72038163885, 72038163885, 72038163885
), DATE = structure(c(1230768000, 1230769200, 1230770400), tzone = "UTC", class = c("POSIXct",
"POSIXt")), HourlyWetBulbTemperature = c("*", "38", "37"), DailyAverageWetBulbTemperature = c(NA,
NA, NA), MonthlyWetBulb = c(NA, NA, NA)), row.names = c(NA, -3L
), class = c("tbl_df", "tbl", "data.frame"))
structure(list(STATION = c(72038163885, 72038163885, 72038163885
), DATE = structure(c(1146459600, 1146460800, 1146462000), tzone = "UTC", class = c("POSIXct",
"POSIXt")), HourlyWetBulbTemperature = c(NA_real_, NA_real_,
NA_real_), DailyAverageWetBulbTemperature = c(72, NA, NA), MonthlyWetBulb = c(NA,
NA, NA)), row.names = c(NA, -3L), class = c("tbl_df", "tbl",
"data.frame"))
In sum, I am wondering if there is a command I can put in map_df() that will convert everything to be the same (either character or numeric) so that the rest of the command will still run.
Untested, but the best way forward, as @GregorThomas suggested, is to read it in properly the first time. In this case, it's likely something like:
super_big_df <- map_df(
  my_files, read_csv, na = c("", "NA", "*"),
  col_select = c(1,2,21,32,80), col_types = "cTddd",
  .id = "file")
If you need to fix it after the fact, then you'll need to read them into a list-of-frames, perhaps changing map_df to map (note that map() has no .id argument, so the file id moves to the bind_rows() step):
super_big_df <- map(
  my_files, read_csv, na = c("", "NA", "*"),
  col_select = c(1,2,21,32,80), col_types = "cTddd")
bind_rows(super_big_df, .id = "file")
# Error: Can't combine `..1$HourlyWetBulbTemperature` <character> and `..2$HourlyWetBulbTemperature` <double>.
and then something like
library(dplyr) # in case you did not already have it loaded
purrr::map(super_big_df, ~ mutate(., HourlyWetBulbTemperature = suppressWarnings(as.numeric(HourlyWetBulbTemperature)))) %>%
bind_rows()
# # A tibble: 6 x 5
#       STATION DATE                HourlyWetBulbTemperature DailyAverageWetBulbTemperature MonthlyWetBulb
#         <dbl> <dttm>                                 <dbl>                          <dbl> <lgl>
# 1 72038163885 2009-01-01 00:00:00                       NA                             NA NA
# 2 72038163885 2009-01-01 00:20:00                       38                             NA NA
# 3 72038163885 2009-01-01 00:40:00                       37                             NA NA
# 4 72038163885 2006-05-01 05:00:00                       NA                             72 NA
# 5 72038163885 2006-05-01 05:20:00                       NA                             NA NA
# 6 72038163885 2006-05-01 05:40:00                       NA                             NA NA
The suppressWarnings here is because we know there is a non-number ("*") in that column somewhere. For that one frame, it will fix that column; for the other frames, it should be a no-op since the column is already numeric.
Note that I hard-coded the column name here since we know what it is ahead of time. If there are more columns that need repairing (i.e., you get more errors after fixing this one), then it might be advantageous to go with a more dynamic/programmatic approach; one possible sketch follows.
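A hedged sketch of such a dynamic approach (it assumes all frames share the same column names, which they do here because of col_select):
library(dplyr)
library(purrr)
# Find columns whose class differs across the frames, then coerce those
# columns to numeric in every frame before binding.
col_classes <- map(super_big_df,
                   function(d) vapply(d, function(col) class(col)[1], character(1)))
mixed_cols <- Filter(
  function(nm) length(unique(sapply(col_classes, `[[`, nm))) > 1,
  names(col_classes[[1]])
)
repaired <- super_big_df %>%
  map(function(d) mutate(d, across(all_of(mixed_cols),
                                   ~ suppressWarnings(as.numeric(.x))))) %>%
  bind_rows()
This blindly coerces every mixed-type column to numeric, which is only appropriate if you know (as here) that the character versions are just numbers plus stray markers like "*".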
Data
super_big_df <- list(
structure(list(STATION = c(72038163885, 72038163885, 72038163885), DATE = structure(c(1230768000, 1230769200, 1230770400), tzone = "UTC", class = c("POSIXct", "POSIXt")), HourlyWetBulbTemperature = c("*", "38", "37"), DailyAverageWetBulbTemperature = c(NA, NA, NA), MonthlyWetBulb = c(NA, NA, NA)), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame")),
structure(list(STATION = c(72038163885, 72038163885, 72038163885), DATE = structure(c(1146459600, 1146460800, 1146462000), tzone = "UTC", class = c("POSIXct", "POSIXt")), HourlyWetBulbTemperature = c(NA_real_, NA_real_, NA_real_), DailyAverageWetBulbTemperature = c(72, NA, NA), MonthlyWetBulb = c(NA, NA, NA)), row.names = c(NA, -3L), class = c("tbl_df", "tbl", "data.frame"))
)

R: manipulate multiple files in a folder and combine

I have a folder with many .csv files that I want to a) manipulate and then b) combine, so that each file becomes a new row in the new file.
See an example file that I have below. All others have the same format:
data <- structure(list(view_history = c("[{\"page_index\":0,\"viewing_time\":3078.7250000284985},{\"page_index\":1,\"viewing_time\":1287.8200000268407}]",
NA, NA, NA, NA, NA), rt = c("4367.33", "32741.89", "84982.255",
"44164.12", "16395.195", "21816.545"), trial_type = c("instructions",
"html-button-response", "survey-multi-choice", "survey-multi-choice",
"survey-multi-choice", "survey-multi-choice"), trial_index = c(0,
1, 2, 3, 4, 5), time_elapsed = c(4369, 37115, 122101, 166268,
182665, 204484), internal_node_id = c("0.0-0.0", "0.0-1.0", "0.0-2.0",
"0.0-3.0", "0.0-4.0", "0.0-5.0"), stimulus = c(NA, "The price of hourly piano course has a mean of $100 with a standard devation of $20. Random samples are taken from the population from small to large sample sizes.</br><img src='LLI_wrong.png' style= 'width:25%; height:30%'><img src= 'LLI_graph_2.png' style= 'width:25%; height:30%'> <br/><img src= 'LLI_wrong2.png' style= 'width:25%; height:30%'><img src= 'LLI_wrong3.png' style= 'width:25%; height:30%'>",
NA, NA, NA, NA), button_pressed = c(NA, 3, NA, NA, NA, NA), responses = c(NA,
NA, "{\"WQ2\":\"<strong>B.</strong> You should go to the large office.\"}",
"{\"WQ3\":\"<strong>B.</strong> The number of days on which mean heights were over 71 inches would be greater for the large post office than for the small post office.\"}",
"{\"WQ4\":\"<strong>B.</strong> The large street\"}", "{\"R2\":\"<strong>A. </strong> As the sample size increases, its mean will tend to be closer to that of the population\"}"
), question_order = c(NA, NA, "[0]", "[0]", "[0]", "[0]"), correct_response = c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), accuracy = c(NA,
NA, NA, NA, NA, NA), key_press = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
Next, I'm organizing and manipulating this data:
#keep only the related columns
data2 <- select(data, time_elapsed, button_pressed, responses, accuracy)
#add time on task
data3 <- mutate(data2, time = tail(time_elapsed, 1))
#data shrunk + time on task added
transformed_data <- select(data3, -time_elapsed)
#select the necessary cells and turn the data into a vector
new_data <- c(transformed_data$button_pressed[2], transformed_data$responses[3:6],
transformed_data$button_pressed[7], transformed_data$time[1])
Next, I transpose the data and write it to a csv file:
new_data <- t(new_data)
write.csv(as.data.frame(new_data), "hello_data.csv")
What I want to do next, and what I couldn't figure out, is to loop this process through all .csv files in the folder so that each row in the new file corresponds to the data from one file.
Get all the files from the folder with list.files
files <- list.files(path = "/path/to/folder", pattern = "\\.csv$", full.names = TRUE)
Then loop over the files, reading and transforming each one:
library(dplyr)
library(purrr)
library(stringr)
out <- map_dfr(files, ~ {
  transformed_data <- readr::read_csv(.x) %>%
    dplyr::select(time_elapsed, button_pressed, responses, accuracy) %>%
    dplyr::mutate(time = time_elapsed[n()], time_elapsed = NULL)
  new_data <- as.data.frame(list(transformed_data$button_pressed[2], transformed_data$responses[3:6],
                                 transformed_data$button_pressed[7], transformed_data$time[1]))
  new_data
})
readr::write_csv(out, "hello_data.csv")
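If you also want the output columns to have readable headers, a small variation is possible (a sketch; the column names below are made up, and the cell positions are copied from the code above):
library(dplyr)
library(purrr)
library(readr)
out <- map_dfr(files, ~ {
  d <- read_csv(.x) %>%
    select(time_elapsed, button_pressed, responses, accuracy) %>%
    mutate(time = time_elapsed[n()], time_elapsed = NULL)
  # Build a one-row tibble per file with hypothetical column names
  tibble::tibble(
    button_first = d$button_pressed[2],
    response_1   = d$responses[3],
    response_2   = d$responses[4],
    response_3   = d$responses[5],
    response_4   = d$responses[6],
    button_last  = d$button_pressed[7],
    time_on_task = d$time[1]
  )
})
write_csv(out, "hello_data.csv")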

How to create a for loop based on unique user IDs and specific event types

I have two data frames: users and events.
Both data frames contain a field that links events to users.
How can I create a for loop where every user's unique ID is matched against events of a particular type, and the number of occurrences is stored in a new column within users (users$conversation_started, users$conversation_missed, etc.)?
In short, it is a conditional for loop.
So far I have this but it is wrong:
for(i in users$id){
users$conversation_started <- nrow(event[event$type = "conversation-started"])
}
An example of how to do this would be ideal.
The idea is:
for(each user)
find the matching user ID in events
count the number of event types == "conversation-started"
assign count value to user$conversation_started
end for
Important note:
The type field can contain one of five values so I will need to be able to effectively filter on each type for each associate:
> events$type %>% table %>% as.matrix
                               [,1]
conversation-accepted          3120
conversation-already-accepted 19673
conversation-declined            27
conversation-missed             831
conversation-request          23427
Data frames (note that these are reduced versions as confidential information has been removed):
users <- structure(list(`_id` = c("JTuXhdI4Ai", "iGIeCEXyVE", "6XFtOJh0bD",
"mNN986oQv9", "9NI71KBMX9", "x1jH7t0Cmy"), language = c("en",
"en", "en", "en", "en", "en"), registering = c(TRUE, TRUE, FALSE,
FALSE, FALSE, NA), `_created_at` = structure(c(1485995043.131,
1488898839.838, 1480461193.146, 1481407887.979, 1489942757.189,
1491311381.916), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
`_updated_at` = structure(c(1521039527.236, 1488898864.834,
1527618624.877, 1481407959.116, 1490043838.561, 1491320333.09
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), lastOnlineTimestamp = c(1521039526.90314,
NA, 1480461472, 1481407959, 1490043838, NA), isAgent = c(FALSE,
NA, FALSE, FALSE, FALSE, NA), lastAvailableTime = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), class = c("POSIXct",
"POSIXt"), tzone = ""), available = c(NA, NA, NA, NA, NA,
NA), busy = c(NA, NA, NA, NA, NA, NA), joinedTeam = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), class = c("POSIXct",
"POSIXt"), tzone = ""), timezone = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
)), row.names = c("list.1", "list.2", "list.3", "list.4",
"list.5", "list.6"), class = "data.frame")
and
events <- structure(list(`_id` = c("JKY8ZwkM1S", "CG7Xj8dAsA", "pUkFFxoahy",
"yJVJ34rUCl", "XxXelkIFh7", "GCOsENVSz6"), expirationTime = structure(c(1527261147.873,
NA, 1527262121.332, NA, 1527263411.619, 1527263411.619), class = c("POSIXct",
"POSIXt"), tzone = ""), partId = c("d22bfddc-cd51-489f-aec8-5ab9225c0dd5",
"d22bfddc-cd51-489f-aec8-5ab9225c0dd5", "cf4356da-b63e-4e4d-8e7b-fb63035801d8",
"cf4356da-b63e-4e4d-8e7b-fb63035801d8", "a720185e-c300-47c0-b30d-64e1f272d482",
"a720185e-c300-47c0-b30d-64e1f272d482"), type = c("conversation-request",
"conversation-accepted", "conversation-request", "conversation-accepted",
"conversation-request", "conversation-request"), `_p_conversation` = c("Conversation$6nSaLeWqs7",
"Conversation$6nSaLeWqs7", "Conversation$6nSaLeWqs7", "Conversation$6nSaLeWqs7",
"Conversation$bDuAYSZgen", "Conversation$bDuAYSZgen"), `_p_merchant` = c("Merchant$0A2UYADe5x",
"Merchant$0A2UYADe5x", "Merchant$0A2UYADe5x", "Merchant$0A2UYADe5x",
"Merchant$0A2UYADe5x", "Merchant$0A2UYADe5x"), `_p_associate` = c("D9ihQOWrXC",
"D9ihQOWrXC", "D9ihQOWrXC", "D9ihQOWrXC", "D9ihQOWrXC", "D9ihQOWrXC"
), `_wperm` = list(list(), list(), list(), list(), list(), list()),
`_rperm` = list("*", "*", "*", "*", "*", "*"), `_created_at` = structure(c(1527264657.998,
1527264662.043, 1527265661.846, 1527265669.435, 1527266922.056,
1527266922.059), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
`_updated_at` = structure(c(1527264657.998, 1527264662.043,
1527265661.846, 1527265669.435, 1527266922.056, 1527266922.059
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), read = c(TRUE,
NA, TRUE, NA, NA, NA), data.customerName = c("Shopper 109339",
NA, "Shopper 109339", NA, "Shopper 109364", "Shopper 109364"
), data.departmentName = c("Personal advisors", NA, "Personal advisors",
NA, "Personal advisors", "Personal advisors"), data.recurring = c(FALSE,
NA, TRUE, NA, FALSE, FALSE), data.new = c(TRUE, NA, FALSE,
NA, TRUE, TRUE), data.missed = c(0L, NA, 0L, NA, 0L, 0L),
data.customerId = c("84uOFRLmLd", "84uOFRLmLd", "84uOFRLmLd",
"84uOFRLmLd", "5Dw4iax3Tj", "5Dw4iax3Tj"), data.claimingTime = c(NA,
4L, NA, 7L, NA, NA), data.lead = c(NA, NA, FALSE, NA, NA,
NA), data.maxMissed = c(NA, NA, NA, NA, NA, NA), data.associateName = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), data.maxDecline = c(NA, NA, NA, NA, NA, NA
), data.goUnavailable = c(NA, NA, NA, NA, NA, NA)), row.names = c("list.1",
"list.2", "list.3", "list.4", "list.5", "list.6"), class = "data.frame")
Update: 21st September 2018
This solution now results in an NA-only data frame being produced at the end of the function. When written to a .csv, all I get are blank cells (naturally, Excel displays NA values as blanks).
My data source has not changed, nor has my script.
What might be causing this?
My guess is that this is an unforeseen case where a step had 0 hits; as such, is there a way to record 0 in those cases, rather than NA/blank values, and avoid this?
New solution based on the provided data.
Note: As your data had no overlap in _id, I changed the events$_id to be the same as in users.
Simplified example data:
users <- structure(list(`_id` = structure(c(4L, 3L, 1L, 5L, 2L, 6L),
.Label = c("6XFtOJh0bD", "9NI71KBMX9", "iGIeCEXyVE",
"JTuXhdI4Ai", "mNN986oQv9", "x1jH7t0Cmy"),
class = "factor")), .Names = "_id",
row.names = c(NA, -6L), class = "data.frame")
events <- structure(list(`_id` = c("JKY8ZwkM1S", "CG7Xj8dAsA", "pUkFFxoahy",
"yJVJ34rUCl", "XxXelkIFh7", "GCOsENVSz6"),
type = c("conversation-request", "conversation-accepted",
"conversation-request", "conversation-accepted",
"conversation-request", "conversation-request")),
.Names = c("_id", "type"), class = "data.frame",
row.names = c("list.1", "list.2", "list.3", "list.4", "list.5", "list.6"))
events$`_id` <- users$`_id`
> users
_id
1 JTuXhdI4Ai
2 iGIeCEXyVE
3 6XFtOJh0bD
4 mNN986oQv9
5 9NI71KBMX9
6 x1jH7t0Cmy
> events
_id type
list.1 JTuXhdI4Ai conversation-request
list.2 iGIeCEXyVE conversation-accepted
list.3 6XFtOJh0bD conversation-request
list.4 mNN986oQv9 conversation-accepted
list.5 9NI71KBMX9 conversation-request
list.6 x1jH7t0Cmy conversation-request
We can use the same approach I suggested before, just enhance it a bit.
First we loop over unique(events$type) to store a table() of every type of event per id in a list:
test <- lapply(unique(events$type), function(x) table(events$`_id`, events$type == x))
Then we store the specific type as the name of the respective table in the list:
names(test) <- unique(events$type)
Now we use a simple for-loop to match() the users$`_id` with the rownames of the table and store the information in a new variable with the name of the event type:
for (i in names(test)) {
  users[, i] <- test[[i]][, 2][match(users$`_id`, rownames(test[[i]]))]
}
Result:
> users
         _id conversation-request conversation-accepted
1 JTuXhdI4Ai                    1                     0
2 iGIeCEXyVE                    0                     1
3 6XFtOJh0bD                    1                     0
4 mNN986oQv9                    0                     1
5 9NI71KBMX9                    1                     0
6 x1jH7t0Cmy                    1                     0
Hope this helps!
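Regarding the 21 September update in the question: match() returns NA for any user id that never appears in a given table, and indexing with NA propagates the NA. If the blanks come from users with zero matching events, one hedged tweak to the loop above would be to zero-fill after the match():
for (i in names(test)) {
  # counts is NA wherever the user id has no row in this event-type table
  counts <- test[[i]][, 2][match(users$`_id`, rownames(test[[i]]))]
  users[, i] <- ifelse(is.na(counts), 0, counts)
}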

Remove NA columns in a list of dataframes

I am having some trouble cleaning data that I imported from Excel with readxl. readxl created a large list of objects with classes c("data.frame", "tbl_df", "tbl") (I would also like to know why/how it has multiple classes assigned to it). Each of those objects is one of the sheets in the original Excel workbook. The problem is that each of those objects (sheets) may have many columns entirely filled with NAs. I have scanned through Stack Overflow, found some similar problems, and tried to apply the given solutions, like here and here (the first one is the most like my problem). However, when I try this:
lapply(x, function(y) y[, !is.na(y)])
I get the following error:
Error in `[.data.frame`(y, , !is.na(y)) : undefined columns selected
I've also tried this:
lapply(x, function(y) y[!is.na(y)])
but it reduces all of my dataframes to only the first column. I think it's something to do with my dataframe-within-list syntax. I've experimented with different iterations of y[[]][] and even recently found this interesting pattern in lapply: lapply(x, "[[", y), but couldn't make it work.
Here are the first two objects in my list of dataframes (any hints on how to be more efficient in dput-ing this data are also appreciated). As you can see, the first object has no NA columns, whereas the second has 5 NA columns. I would like to remove those 5 NA columns, but do so for all objects in my list.
Any help is greatly appreciated!
dput(head(x[[1]]))
structure(list(Date = structure(c(1305504000, 1305504000, 1305504000,
1305504000, 1305504000, 1305504000), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Time = structure(c(-2209121912, -2209121612,
-2209121312, -2209121012, -2209120712, -2209120412), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), Level = c(106.9038, 106.9059, 106.89,
106.9121, 106.8522, 106.8813), Temperature = c(6.176, 6.173,
6.172, 6.168, 6.166, 6.165)), .Names = c("Date", "Time", "Level",
"Temperature"), row.names = c(NA, 6L), class = c("tbl_df", "tbl",
"data.frame"))
dput(head(x[[2]]))
structure(list(Date = structure(c(1305504000, 1305504000, 1305504000,
1305504000, 1305504000, 1305504000), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), Time = structure(c(-2209121988, -2209121688,
-2209121388, -2209121088, -2209120788, -2209120488), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), LEVEL = c(117.5149, 117.511, 117.5031,
117.5272, 117.4523, 117.4524), TEMPERATURE = c(5.661, 5.651,
5.645, 5.644, 5.644, 5.645), `NA` = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), `NA` = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), `NA` = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), `NA` = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), `NA` = c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_)), .Names = c("Date", "Time", "LEVEL",
"TEMPERATURE", NA, NA, NA, NA, NA), row.names = c(NA, 6L), class =
c("tbl_df", "tbl", "data.frame"))
How about this:
lapply(df_list, function(df) df[, colSums(is.na(df)) == 0])
Or maybe:
lapply(df_list, function(df) df[, colSums(is.na(df)) < nrow(df)])
if you want to keep columns where some, but not all, of the values are NA.
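A tidyverse variant of the same idea (a sketch, assuming x is the list of data frames from the question):
library(dplyr)
# Keep only the columns of each sheet that are not entirely NA.
x_clean <- lapply(x, function(df) select(df, where(function(col) !all(is.na(col)))))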
