How to read a one-line CSV in R?

I have been working on a dummy dataset recently and found that the data provided to me was all on a single line. A similar example is shown below:
Name,Age,Gender,Occupation A,10,M,Student B,11,M,Student C,11,F,Student
I want to import the data and obtain the following output:
Name Age Gender Occupation
A 10 M Student
B 11 M Student
C 11 F Student
A value might also be missing in some records, so the import logic needs to handle that case. Can anyone help me work out how to import such data sets?
I tried a normal import with the read.csv() function, but it didn't give me the expected result.
EDIT: What if the data looks like this:
Name,Age,Gender,Occupation ABC XYZ,10,M,Student B,11,M,Student C,11,F,Student
and I want output like this:
Name Age Gender Occupation
ABC XYZ 10 M Student
B 11 M Student
C 11 F Student

You could read your file in with readLines, turn spaces into line breaks, and then read it with read.csv:
# txt <- readLines("my_data.txt") # with a real data file
txt <- readLines(textConnection("Name,Age,Gender,Occupation A,10,M,Student B,11,M,Student C,11,F,Student"))
read.csv(text=gsub(" ","\n",txt))
output
Name Age Gender Occupation
1 A 10 M Student
2 B 11 M Student
3 C 11 F Student
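The case from the question's EDIT, where the first field can itself contain spaces (e.g. "ABC XYZ"), is not covered by replacing every space. A minimal sketch of one possible workaround follows. It assumes every record has exactly four comma-separated fields and that the last field (Occupation) never contains a space; neither assumption comes from the original answer.

```r
# One-line input where the first field may contain spaces.
txt <- "Name,Age,Gender,Occupation ABC XYZ,10,M,Student B,11,M,Student C,11,F,Student"

# Assumption: each record has exactly 4 fields and the last field has no spaces.
# The pattern consumes three "field," groups plus the space-free last field,
# then replaces the single space that follows with a newline.
fixed <- gsub("((?:[^,]*,){3}[^, ]*) ", "\\1\n", txt, perl = TRUE)

# Parse the now multi-line text as an ordinary CSV.
df <- read.csv(text = fixed, stringsAsFactors = FALSE)
print(df)
```

Because the pattern counts commas rather than spaces, an empty numeric field such as `B,,M,Student` still lines up with the right column and is read as NA by read.csv.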

If you have millions of records, you will probably want to speed up this process, so I suggest using data.table's fread instead of read.csv. fread can also take a shell command to pre-process the file before reading it into R, and sed will be a lot faster than doing the string manipulation in R.
E.g., if you have this CSV stored at /tmp/x.csv, you can try something like the following (recent data.table versions expect the shell command in the cmd argument, and the \n in the replacement relies on GNU sed):
> data.table::fread(cmd = "sed 's/ /\\n/g' /tmp/x.csv")
Name Age Gender Occupation
1: A 10 M Student
2: B 11 M Student
3: C 11 F Student

Related

How to input a variable which contains multiple simultaneous options

I am analyzing a data set of feedback from teachers. Each row in the data frame is a teacher and each of their answers is a variable. However, I've run into a problem entering the year level for each teacher, because many of the teachers teach multiple grades.
For example:
Teacher Year
a 1
b 3
c 1/2
d 7
e 3/4
How can I enter this data into an Excel sheet and then into R and analyse it usefully? I've never dealt with a variable that contains multiple options on the same row before.
Suppose you already have this data in R in an object called teacher_data. I will show you the way to deal with such responses that I have seen most commonly employed: you create additional columns so that each answer gets its own cell via the convenient tidyr function separate().
library(tidyr)
separate(teacher_data, col = "Year", into = paste0("Year", 1:2), sep = "/")
Here's the result:
Teacher Year1 Year2
1 a 1 <NA>
2 b 3 <NA>
3 c 1 2
4 d 7 <NA>
5 e 3 4
How you then use those columns depends on what sort of question you're trying to answer with the data. This part of your question is probably best asked at the sister site Cross Validated (Stack Exchange for statistics).
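If the analysis needs one row per teacher and year level rather than extra columns, a long layout is another common option. The following is a minimal base-R sketch under that assumption; the name teacher_long is made up for illustration, and the data is re-created here so the block runs on its own.

```r
# Same example data as in the question.
teacher_data <- data.frame(
  Teacher = c("a", "b", "c", "d", "e"),
  Year = c("1", "3", "1/2", "7", "3/4"),
  stringsAsFactors = FALSE
)

# Split each Year entry on "/" into a list of character vectors.
years <- strsplit(teacher_data$Year, "/", fixed = TRUE)

# Repeat each teacher once per year level they teach.
teacher_long <- data.frame(
  Teacher = rep(teacher_data$Teacher, lengths(years)),
  Year = as.integer(unlist(years)),
  stringsAsFactors = FALSE
)

# Count how many teachers cover each year level.
print(table(teacher_long$Year))
```

In this layout a teacher who covers two grades simply appears twice, so counts and group summaries by Year work without special handling of NA columns.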
As far as Excel goes, I would not even deal with Excel as an intermediate step; it's just unnecessary. If you write the data out when you're done into a CSV, Excel can read CSVs just fine:
write.csv(teacher_data, file = "teacher_data.csv", row.names = FALSE)
Also, just so you know, I put your data into R via the following:
teacher_data <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
Teacher Year
a 1
b 3
c 1/2
d 7
e 3/4")

grepl function in R

I have a data set and I want to find the rows which include a specific word "result". I used the following function but it seems it doesn't work correctly. Any suggestion?
data$new<-data.frame(grepl("result",col1))
data:
col1 col2
ABC result VDCbvdc home 22
fgc school 34
university result home exam 45
exam math stat 65
Try data$new <- grepl("result", data$col1)
data$new should be assigned a vector, but you're trying to feed it a data frame. Also, col1 only exists inside data, so you need to refer to it as data$col1.
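As a self-contained illustration of the fix, the sketch below rebuilds the example data and then uses the logical vector both as a new column and as a row filter. The name matches is introduced here for illustration only.

```r
# Re-create the example data from the question.
data <- data.frame(
  col1 = c("ABC result VDCbvdc home", "fgc school",
           "university result home exam", "exam math stat"),
  col2 = c(22, 34, 45, 65),
  stringsAsFactors = FALSE
)

# grepl returns one TRUE/FALSE per element of data$col1.
# fixed = TRUE treats "result" as a literal string rather than a regex.
data$new <- grepl("result", data$col1, fixed = TRUE)

# Keep only the rows whose col1 contains the word.
matches <- data[data$new, ]
print(matches)
```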

Combine all dataframe in list to single dataframe in r [duplicate]

This question already has answers here:
Simultaneously merge multiple data.frames in a list
(9 answers)
Closed 5 years ago.
This one is a doozy. I've been trying to figure this out for a while, but I keep hitting the wall. So, I'm crowd sourcing this in the name of science.
A Brief Introduction
I have about 93 files with unique names in a directory. I read these files into a list using R.
files.measurements <- as.character(list.files(path = "~/measurements/", full.names = TRUE))
So, what this is doing is just finding the names of all files in the directory. All these files are .csv. Saves me a lot of hassle.
I then read the names of the files.
measurements.filenames <- gsub("\\.csv$", "", basename(files.measurements))
The reason to read these files is because each file name represents the name of the measurement. The same item in the file may or may not exist in multiple files.
For Example
There are 5 file names, viz., NameA, NameB, NameC, NameD, NameE. Each file has 8 column names: id, name, sex, dob, ..., measurement. (These column names are the same in every file.)
Of course, the id is unique, but an id that exists in NameA may or may not exist in NameB.
Need
So, what I need to do is merge these 93 files into a single dataframe that contains id, name, sex, dob, ... and, instead of measurement, a column named after the file (NameA, for example). The other values should be the same for the same id. If the id doesn't exist yet, the row should be added to the dataframe with the additional column; if the id already exists, the measurement should just be added in the column with the new name (NameB).
Can you please help? This is to gather the data for cardiovascular and HIV diseases for research.
EDIT
DATA
NameA
id gender dob status date measurement
1 F 5/24/1942 Rpt 1/12/2018 2.9
2 F 12/1/2017 Rpt 1/12/2018 0.622
3 M 11/15/1957 Rpt 1/11/2018 3.6
4 M 5/17/1947 Rpt 1/11/2018 3.5
5 F 7/17/1955 Rpt 1/11/2018 2.7
NameB
id gender dob status date measurement
1 F 5/24/1942 Rpt 1/12/2018 3.5
2 F 12/1/2017 Rpt 1/12/2018 2.5
8 M 11/15/1957 Rpt 1/11/2018 1.9
10 M 5/17/1947 Rpt 1/11/2018 0.8
11 F 7/17/1955 Rpt 1/11/2018 1.2
Explanation
So, as you see, all the columns in both tables are the same, but the last one, measurement, is different. Please ignore the gender, dob, status and date columns for now. Let's focus on id and measurement. As you can see, ids 1 and 2 are in both tables NameA and NameB. If that's the case, then the measurement from NameB should be added to the dataframe right next to the measurement from NameA, under its own name (like NameB-measurement). All the ids from NameB that don't exist in NameA should be added as new rows, with the measurement from NameA left blank but NameB-measurement filled in.
I know it's convoluted, but that's how the researchers gave me the data. I need to clean this up somehow.
Try the following:
# collecting all the csv files in a given folder
files.measurements <- base::list.files(path = ".", pattern = "\\.csv$", full.names = TRUE)
# reading all csv files into a list of dataframes
files.combined <- purrr::map(files.measurements, read.csv)
# combining the individual dataframes into a single dataframe
# (rbind.fill stacks rows and fills columns missing from a file with NA)
finaldf <- plyr::rbind.fill(files.combined)
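Stacking rows keeps a single measurement column, whereas the question asks for one measurement column per file, matched on id. A minimal base-R sketch of that merge follows, in the spirit of the linked duplicate ("Simultaneously merge multiple data.frames in a list"). It uses a cut-down version of the NameA and NameB tables with only id, gender and measurement, and it assumes the non-measurement columns agree across files for the same id; the helper names frames, renamed and merged are made up for illustration.

```r
# Cut-down versions of the two example tables from the question.
NameA <- data.frame(id = c(1, 2, 3, 4, 5),
                    gender = c("F", "F", "M", "M", "F"),
                    measurement = c(2.9, 0.622, 3.6, 3.5, 2.7),
                    stringsAsFactors = FALSE)
NameB <- data.frame(id = c(1, 2, 8, 10, 11),
                    gender = c("F", "F", "M", "M", "F"),
                    measurement = c(3.5, 2.5, 1.9, 0.8, 1.2),
                    stringsAsFactors = FALSE)

# Named list: the names stand in for the file names.
frames <- list(NameA = NameA, NameB = NameB)

# Rename each "measurement" column to the name of its source file.
renamed <- Map(function(df, nm) {
  names(df)[names(df) == "measurement"] <- nm
  df
}, frames, names(frames))

# Full outer join on the shared key columns; ids missing from a file get NA.
merged <- Reduce(function(x, y) merge(x, y, by = c("id", "gender"), all = TRUE),
                 renamed)
print(merged)
```

With real files, the list could be built by reading each path and naming the elements with the file names stripped of their extension, and the by argument would list all shared non-measurement columns.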

using subset with a string in R

I have the following data frame in R (made up stuff to learn the program):
country population civilised
1 Town 13 5
2 city 69 9
3 Home 24 2
4 Stuff 99 9
and I am trying to access specific rows with the subset function, like
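test <- subset(t, country==Town). But all I ever get is "object not found".
We need to quote the string. Without quotes, R looks for an object named Town instead of comparing against the text "Town".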
test <- subset(t, country=='Town')
test
# country population civilised
#1 Town 13 5
NOTE: t is a function name (check ?t). It is better to give objects names that are not already function names.
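Putting both points together, here is a minimal self-contained sketch. The data frame name places is a hypothetical replacement for t, chosen only to avoid the clash with the base function.

```r
# Example data from the question, stored under a name that is not a function.
places <- data.frame(
  country = c("Town", "city", "Home", "Stuff"),
  population = c(13, 69, 24, 99),
  civilised = c(5, 9, 2, 9),
  stringsAsFactors = FALSE
)

# Quoting "Town" makes it a string to compare against, not an object to look up.
test <- subset(places, country == "Town")

# Equivalent row selection with bracket indexing.
same <- places[places$country == "Town", ]
print(test)
```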

How to create madata in R?

How to create madata?
write.madata(madata, datafile="madata.txt", designfile="design.txt")
I have the following links
http://cgd.jax.org/churchill-apps/jmaanova-1.0.0/help/8e11221e.html
http://rgm2.lab.nig.ac.jp/RGM2/func.php?rd_id=maanova:write.madata
But I do not understand how to create them for the dataset I have created.
For example: How to convert a data set like this to madata:
ID Name Age
1 ABC 15
2 PQR 80
3 XZY 15
From the help page that you linked ?read.madata, all you have to do is call read.madata with a matrix or tab delimited file as the first argument.
