How to put data frame in R including count of complete cases in separate files - r

I'm a new student at R. I have a directory containing EXCEL files and I need to make a summary in a data frame with complete cases in each file. How can I do this. I tried the following code buwt doesn't work. Appreciate your support

Always begin with the steps required. You will need to do the following:
Read in your data
Clean up your data
Since you do not have any code shown, I will provide you with pseudo code.
library(readxl)
df <- read_xls(path, other options)
df <- complete.cases(df)
You'll want to do that for all of your files. You can use lapply once you are more advanced, and loop over your list.files() list of excel files.

Related

When I upload my excel file to R, the column titles are in the rows and the data seems all jumbled. How do I fix this?

hi literally day one new coder
On the excel sheet, my data looks organized, but when I upload my file to R, it's not able to read the excel properly and the column headers are in the rows and the data seems randomized.
So far I have tried:
library(readxl)
dataset <-read_excel("pathname")
View(dataset)
Also tried:
dataset <-read_excel("pathname", sheet=1, colNames=TRUE)
Also tried to use the package openxlsx
but nothing is giving me the correct, organized data set.
I tried formatting my Excel to a CSV file, and the CSV file looks exactly like the data that shows up on R (both are messed up).
How should I approach this problem?
I deal with importing .xlsx into R frequently. It can be challenging due to the flexibility of the excel platform. I generally use readxl::read_xlsx() to fetch data from .xlsx files. My suggestions:
First, specify exactly the data you want to import with the range argument.
A cell range to read from, as described in cell-specification. Includes typical Excel
ranges like "B3:D87", possibly including the sheet name like "Budget!B2:G14"
Second, if there are there merged cells or other formatting challenges in column headers, I resort to setting col_names = FALSE. And supplying clean names after import with names(df) <- c("first_col", "second_col")
Third, if there are merged cells elsewhere in the spreadsheet I generally I resort to "fixing" them in excel (not ideal but easier for my use case), however, others may have suggestions on a programmatic fix.
It may be helpful to provide a screenshot of your spreadsheet.

import multiple csv from web into one data frame

i want to read out several csv files from the web and save the data into a data frame.
If the files were on my computer this would be very easy as I have seen but I don't always want to download the files.
The example:
"https://www.football-data.co.uk/mmz4281/1819/F1.csv",
"https://www.football-data.co.uk/mmz4281/1718/F1.csv",
"https://www.football-data.co.uk/mmz4281/1617/F1.csv",
"https://www.football-data.co.uk/mmz4281/1516/F1.csv",
"https://www.football-data.co.uk/mmz4281/1415/F1.csv",
"https://www.football-data.co.uk/mmz4281/1314/F1.csv",
"https://www.football-data.co.uk/mmz4281/1213/F1.csv",
"https://www.football-data.co.uk/mmz4281/1112/F1.csv",
"https://www.football-data.co.uk/mmz4281/1011/F1.csv"
These are the CSV files. maybe its possible with a function or a loop but i dont know how.
Maybe you can help me.
Greetings
Reading files from the web is just as easy as reading them from your file system; you can just pass a URL instead of a file-path to readr::read_csv() (you tagged your question with readr so I assume you want to use that).
Assuming your files are in a vector:
files <- c("https://www.football-data.co.uk/mmz4281/1819/F1.csv",
"https://www.football-data.co.uk/mmz4281/1718/F1.csv",
"https://www.football-data.co.uk/mmz4281/1617/F1.csv",
"https://www.football-data.co.uk/mmz4281/1516/F1.csv",
"https://www.football-data.co.uk/mmz4281/1415/F1.csv",
"https://www.football-data.co.uk/mmz4281/1314/F1.csv",
"https://www.football-data.co.uk/mmz4281/1213/F1.csv",
"https://www.football-data.co.uk/mmz4281/1112/F1.csv",
"https://www.football-data.co.uk/mmz4281/1011/F1.csv")
You can use readr::read_csv to read a specific file, and combine them into one data-frame with purrr::map_dfr:
df <- purrr::map_dfr(files, readr::read_csv)
This iterates over the contents of files, applies readr::read_csv to each of those elements, and combines them into one data frame, rowwise (hence dfr).

crawl excel data automatically based on contents

I want to get data from many excel files with the same format like this:
What I want is to output the ID data from column B to a CSV file. I have many files like this and for each file, the number of columns may not be the same but the ID data will always be in the B column.
Is there a package in Julia that can crawl data in this format? If not, what method should I use?
You can use the XLSX package.
If the file in your screenshot is called JAKE.xlsx and the data shown is in a sheet called DataSheet:
data = XLSX.readtable("JAKE.xlsx", "DataSheet")
# `data[1]` is a vector of vectors, each with data for a column.
# that way, `data[1][2]` correponds to column B's data.
data[1][2]
This should give you access to a vector with the data you need. After getting the IDs into a vector, you can use the CSV package to create an output file.
If you add a sample xlsx file to your post it might be possible to give you a more complete answer.

How to write table on Juliabox?

I define a DataFrame named data and want to write it into .csv file. I used writetable("result_data.csv", data) but it doesn't work.
This is the dataframe
error details
To write a data frame to a disk you should use the CSV.jl package like this (also make sure that you have write right to the directory you want to save the file on Juliabox):
using CSV
CSV.write("result_data.csv", data)
If this fails then please report back in the comment I will investigate it further.

Reading and Setting Up CSV files on R Programming Language

I would like to clarify my understanding here on both converting a file into CSV and also reading it. Let's use a dataset from R for instance, titled longley.
To set up a data frame, I can just use the write.table command as follows, right?
d1<-longley
write.table(d1, file="", sep="1,16", row.names=TRUE, col.names=TRUE)
Has this already become a data frame or am I missing something here?
Now let's say if I want to read this CSV file. Then would my code be something like:
read.table(<dframe1>, header=FALSE, sep="", quote="\"")
It seems like before that I have to use a function called setwd(). I'm not really sure what it does or how it helps. Can someone help me here?
longley and, therefore, d1 are already data frames (type class(d1) in the console). A data frame is a fundamental data structure in R. Writing a data frame to a file saves the data in the data frame. In this case, you're trying to save the data in the data frame in CSV format, which you would do like this:
write.csv(d1, "myFileName.csv")
write.csv is a wrapper for write.table that takes care of the settings needed for saving in CSV format. You could also do:
write.table(d1, "myFileName.csv", sep=",")
sep="," tells R to write the file with values separated by a comma.
Then, to read the file into an R session you can do this:
df = read.csv("myFileName.csv", header=TRUE, stringsAsFactors=FALSE)
This creates a new object called df, which is the data frame created from the data in myFileName.csv. Once again, read.csv is a wrapper for read.table that takes care of the settings for reading a CSV file.
setwd is how you change the working directory--that is, the default directory where R writes to and reads from. But you can also keep the current working directory unchanged and just give write.csv or read.csv (or any other function that writes or reads R objects) the full path to wherever you want to read from or write to. For example:
write.csv(d1, "/path/for/saving/file/myFileName.csv")

Resources