How to create a master data source in Tableau Desktop version 9.3?

I have 5 TDEs that contain data for the last 5 years. I am trying to create a single view or master data source based on these 5 TDEs so that I can show all 5 years of data together. I tried to append the data using the 'Append Data from File' option, but it reports '0 rows added successfully'. I just want to create a view/master data source that holds the data for all 5 years.

Related

Is there a way to automate the extraction of the Microsoft Office 365 Quarantine Data CSV file into Multiple CSV Files?

When I ran a security report through the Office 365 Admin Email Explorer to obtain detailed information about emails and their respective attack types, I downloaded the .csv file and manually used Microsoft Excel to filter out rows with an exact email subject and save them to their own .csv file. Creating the individual CSV files took a long time, since there were quite a lot of emails with the same or differing subject titles.
Downloaded the .csv file from the Office 365 Admin portal with a date range of 7 days into the past (date-range).
Imported into R using the R command below:
Office_365_Report_CSV = "C:/Users/absnd/Documents/2022-11-18office365latestquarantine.csv"
Loaded the data.table library:
require(data.table)
Created a new variable to convert the data into a data-frame.
quarantine_data = fread(paste0(Office_365_Report_CSV), sep = ",", header = TRUE, check.names = FALSE)
Pulled the columns needed for filtering from the data frame:
Quarantine_Columns = quarantine_data[,c("Email date (UTC)","Recipients","Subject","Sender","Sender IP","Sender domain","Delivery action","Latest delivery location","Original delivery location","Internet message ID","Network message ID","Mail language","Original recipients","Additional actions","Threats","File threats","File hash","Detection technologies","Alert ID","Final system override","Tenant system override(s)","User system override(s)","Directionality","URLs","Sender tags","Recipient tags","Exchange transport rule","Connector","Context" )]
Steps needed to be done (I am not sure where to go from here):
- I would like R to write each set of rows sharing the same "Subject" value to its own .csv file, containing all of the column data listed in step 5.
Sub-step ex. 1: if a row's "Threats" column equals "Phish", generate a file named "YYYY-MM-DD Phishing <number increment +1>.csv".
Sub-step ex. 2: if a row's "Threats" column equals "Phish, Spam", generate a file named "YYYY-MM-DD Phishing and Spam <number increment +1>.csv".
Step 6 and onward would filter rows with the same "Subject" value and save them into a single file named according to the if-conditions in the sub-steps above.
First of all, you are looking to do this in R; RStudio is just an IDE that makes using R easier.
If you save your data frames in a list, and then set a vector of the names of the files that you want to give each of those files, you can then use purrr::walk2() to iterate through the saving. Some reproducible code as an example:
library(purrr)
library(glue)
library(readr)
mydfs_l <- list(mtcars, iris)
file_names <- c("mtcars_file", "iris_file")
walk2(mydfs_l, file_names, function(x, y) {
  write_excel_csv(x, glue("mypath/to/files/{y}.csv"))
})
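For the per-Subject splitting asked about in the question, one possible approach is base R's split() plus a loop that derives a file name from the date and the Threats value. A minimal sketch with made-up data (quarantine_df, the shortened column names, and the output folder are stand-ins for the question's Quarantine_Columns):

```r
# Stand-in for the Quarantine_Columns data frame built in the steps above;
# column names shortened here for readability.
quarantine_df <- data.frame(
  email_date = c("2022-11-18", "2022-11-18", "2022-11-17"),
  Subject    = c("Invoice due", "Invoice due", "Password reset"),
  Threats    = c("Phish", "Phish", "Phish, Spam"),
  stringsAsFactors = FALSE
)

# One data frame per distinct Subject value.
by_subject <- split(quarantine_df, quarantine_df$Subject)

# Name each file from the date, a label derived from Threats, and a counter,
# then write each group out. out_dir is illustrative; use your own folder.
out_dir <- tempdir()
for (i in seq_along(by_subject)) {
  grp <- by_subject[[i]]
  label <- if (grp$Threats[1] == "Phish") "Phishing" else "Phishing and Spam"
  file_name <- sprintf("%s %s %d.csv", grp$email_date[1], label, i)
  write.csv(grp, file.path(out_dir, file_name), row.names = FALSE)
}
```

The same loop works on the real data frame once the Subject and Threats column names are substituted in.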

Converting missing values to NA within sapply function

I want to import JSON data from a fitness tracker in order to run some analysis on it. The individual JSON files are quite large, while I am only interested in specific numbers per training session (each JSON file is one training session).
I managed to read in the names of the files and to grab the interesting content out of them. Unfortunately, my code does not work correctly if one or more pieces of information are missing from some of the JSON files (e.g. distance is not available because it was an indoor training session).
I stored all json files with training sessions in a folder (=path in the code) and asked R to get a list of the files in that folder:
json_files<- list.files(path,pattern = ".json",full.names = TRUE) #this is the list of files
jlist<-as.list(json_files)
Then I wrote this function to get the data I'm interested in from each single file (as reading in the full content of every file exceeded my available RAM):
library(jsonlite)

importPFData <- function(x) {
  testimport <- fromJSON(x)
  sport <- testimport$exercises$sport
  starttimesession <- testimport$exercises$startTime
  endtimesession <- testimport$exercises$stopTime
  distance <- testimport$exercises$distance
  durationsport <- testimport$exercises$duration
  maxHRsession <- testimport$exercises$heartRate$max
  minHRsession <- testimport$exercises$heartRate$min
  avgHRsession <- testimport$exercises$heartRate$avg
  calories <- testimport$exercises$kiloCalories
  VO2max_overall <- testimport$physicalInformationSnapshot$vo2Max
  return(c(starttimesession, endtimesession, sport, distance, durationsport,
           maxHRsession, minHRsession, avgHRsession, calories, VO2max_overall))
}
Next I applied this function to all elements of my list of files:
dataTest<-sapply(jlist, importPFData)
I receive a list with one entry per file, as expected. Unfortunately, not all of the data was available in every file, which results in some entries having 7 values and others having 8, 9 or 10.
I struggle with getting this into a proper data frame, as the missing information is not shown as NA or 0; it is just left out.
Is there an easy way to have the function above return NA when a specific detail is not found in an individual JSON file (e.g. distance not available → NA for distance in that entry)?
Example (csv) of the content of a file with 10 entries:
"","c..2013.01.06T08.52.38.000....2013.01.06T09.52.46.600....RUNNING..."
"1","2013-01-06T08:52:38.000"
"2","2013-01-06T09:52:46.600"
"3","RUNNING"
"4","6890"
"5","PT3608.600S"
"6","234"
"7","94"
"8","139"
"9","700"
"10","48"
Example (csv) for a file with only 7 entries (columns won't match Example 1):
"","c..2014.01.22T18.38.30.000....2014.01.22T18.38.32.000....RUNNING..."
"1","2014-01-22T18:38:30.000"
"2","2014-01-22T18:38:32.000"
"3","RUNNING"
"4","0"
"5","PT2S"
"6","0"
"7","46"
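One common way to handle the missing fields is a small helper that maps a NULL field to NA, so every file yields the same ten values; sapply() then returns a regular matrix instead of a ragged list. A sketch of the question's function with that change (assuming the jsonlite package):

```r
library(jsonlite)

# Replace a missing (NULL) field with NA so every file yields the same length.
null_to_na <- function(x) if (is.null(x)) NA else x

importPFData <- function(x) {
  testimport <- fromJSON(x)
  ex <- testimport$exercises
  c(
    starttimesession = null_to_na(ex$startTime),
    endtimesession   = null_to_na(ex$stopTime),
    sport            = null_to_na(ex$sport),
    distance         = null_to_na(ex$distance),
    durationsport    = null_to_na(ex$duration),
    maxHRsession     = null_to_na(ex$heartRate$max),  # NULL$max is NULL, so this is safe
    minHRsession     = null_to_na(ex$heartRate$min),
    avgHRsession     = null_to_na(ex$heartRate$avg),
    calories         = null_to_na(ex$kiloCalories),
    VO2max_overall   = null_to_na(testimport$physicalInformationSnapshot$vo2Max)
  )
}
```

With this change, dataTest <- sapply(jlist, importPFData) gives a 10-row matrix (one column per file), and data.frame(t(dataTest)) turns it into one row per session with NA where a detail was absent.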

How to import an ASCII text file into R - the NIBRS

Currently, I am trying to import the National Incidence Based Reporting System (NIBRS) 2019 data into R. The data comes in an ASCII text format, and so far I've tried readr::read_tsv and readr::read_fwf. However, I can't seem to import the data correctly - read_tsv shows only 1 column, while read_fwf needs column arguments that I do not understand how to decipher based on the text file.
Here is the link to the NIBRS. I used the Master File Downloads to download the zipped file for the NIBRS in 2019.
My overall goal is to have a typical dataframe/tibble for this data set with column names being the type of crime, and the rows being the number of incidents.
I have seen a few other examples of importing this data through this help page, but their copies of the data only cover up to 2015 (my data needs to range from 2015-2019).
Use read.fwf(). Column widths are listed here
We can use read_fwf with col_positions:
library(readr)
read_fwf(filename, col_positions = fwf_widths(c(2, 5, 10)))
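A runnable sketch of the same idea, with invented widths and column names (the real widths and names must be taken from the NIBRS record layout/codebook):

```r
library(readr)

# Two fake fixed-width records; each is segment (2 chars), state (2 chars),
# count (2 chars). These widths are invented for illustration only.
tf <- tempfile()
writeLines(c("07AB12", "09CD34"), tf)

records <- read_fwf(
  tf,
  col_positions = fwf_widths(c(2, 2, 2), c("segment", "state", "count")),
  show_col_types = FALSE
)
```

fwf_positions() is the alternative helper when the codebook gives start/end positions rather than widths.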

Beginner R and required to teach: Selecting a defined column using $

I recently began using R, with no prior coding experience, after I was transferred to a new department, and I want to understand how some R functions work. I have this written code:
read.csv("something.csv",header=TRUE)$DATE123
The csv file contains a time series with a header that begins with DATE in cell A1.
How does R work out that column A is DATE123? Is it because of header=TRUE and the $?
As explained in the comments, header=TRUE indicates that the first row of your file contains the column names. Thus, every value in that row becomes the name of a column. In your case, there is probably a field in the first row of your csv file called DATE123.
A data frame consists of rows and columns. Each column can be accessed with the $ operator. If the data frame is named df and one of its columns is named DATE123, you can extract all data from that column with the following command:
df$DATE123
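A tiny self-contained illustration (the csv here is generated on the fly; something.csv from the question would work the same way):

```r
# Write a two-column csv whose first row holds the column names,
# mirroring the structure of the question's file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("DATE123,value",
             "2021-01-01,10",
             "2021-01-02,12"), csv_path)

# header = TRUE turns the first row into column names;
# $DATE123 then extracts that single column as a vector.
dates <- read.csv(csv_path, header = TRUE)$DATE123
print(dates)  # "2021-01-01" "2021-01-02"
```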

Passing dataframe from one code to another

I am preparing an extensive report that pulls data from various sources and merges them. For fetching data from each source I am using a separate R file. These files need to interact with each other (passing date ranges and resulting data frames).
To pass the date range from my main.R file to the fetch_data.R file I am using:
system("cmd.exe", input = paste('"C:\\Program Files\\R\\R-3.3.2\\bin\\Rscript.exe" "Rcodes\\Audit\\fetch_data.R"',
arg1=start_date,
arg2=end_date))
Once fetch_data.R receives the dates, I get the resulting data frame table_1 in fetch_data.R.
I need to pass table_1 from fetch_data.R back to my original main.R and carry on with the remaining code in main.R.
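Since the child script runs in a separate R process, the data frame cannot be returned in memory; a common pattern is to have fetch_data.R serialize table_1 to disk with saveRDS() and have main.R read it back with readRDS(). A sketch with illustrative paths (the commented lines show where each piece would live in the two scripts):

```r
# In fetch_data.R the dates would arrive via commandArgs() and the result
# would be written out at the end, e.g.:
#   args <- commandArgs(trailingOnly = TRUE)
#   start_date <- args[1]; end_date <- args[2]
#   saveRDS(table_1, "table_1.rds")

# Simulate the child script's output here so the sketch is runnable:
rds_path <- file.path(tempdir(), "table_1.rds")
table_1 <- data.frame(id = 1:3, value = c(10, 20, 30))  # stand-in data
saveRDS(table_1, rds_path)

# In main.R: launch the child, passing the dates as plain arguments, then
# load the data frame it produced and carry on.
#   system2("Rscript", c("Rcodes/Audit/fetch_data.R", start_date, end_date))
result <- readRDS(rds_path)
```

saveRDS()/readRDS() preserve the full R object (column types, factors, attributes), which a round-trip through CSV would not.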
