Let's say I have a list of 23 elements.
ls <- list(1:23)
Which I want to write to a file which has 5 elements on each line, separated by a tab until not possible anymore:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23
How would I go about doing this? I don't see any options in writeLines or write.table.
The code by @akrun works best:
cat(gsub("\\s*((\\d+\\s+){1,4}\\d+)", "\\1\n",
    paste(unlist(ls), collapse = "\t")), '\n', file = 'file1.txt')
With a minor error for decimal values, as the resulting file1.txt looks like this:
0.0005862 0.0005983 0.0006225 0.0006637 0
.0006622 0.0006197 0.000599 0.0005983 0
.0006247 0.0006707 0.0006641 0.0006253 0
.0006087 0.0006234 0.0006807 0.0007485 0
.0007546 0.0007 0.000643 0.0006183 0
.0006264 0.0006819 0.000697 0.0006453 0
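One possible remedy (my own tweak, not from the thread) is to match runs of non-whitespace instead of digit runs, so decimal points and scientific notation survive the wrapping:
cat(gsub("\\s*((\\S+\\s+){1,4}\\S+)", "\\1\n",
    paste(unlist(ls), collapse = "\t")), '\n', file = 'file1.txt')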
It can be done with cat and gsub: unlist the list, paste the values into a single string, insert a newline (\n) after every block of 'n' values, and use cat to write to the console.
cat(gsub("\\s*((\\d+\\s+){1,4}\\d+)", "\\1\n",
    paste(unlist(ls), collapse = "\t")), '\n')
#1 2 3 4 5
#6 7 8 9 10
#11 12 13 14 15
#16 17 18 19 20
#21 22 23
or write to a file
cat(gsub("\\s*((\\d+\\s+){1,4}\\d+)", "\\1\n",
    paste(unlist(ls), collapse = "\t")), '\n', file = 'file1.txt')
If the data are more complex (scientific notation, etc.), we can split the vector into a list and pad the shorter elements with NA at the end:
v1 <- unlist(ls)
# groups of 5 values per output line
lst1 <- split(v1, (seq_along(v1) - 1) %/% 5 + 1)
# pad the last group with NA so all rows have equal length
mat1 <- do.call(rbind, lapply(lst1, `length<-`, max(lengths(lst1))))
# write() outputs a matrix column by column, so transpose to keep the rows intact
write(t(mat1), 'file2.txt', ncolumns = 5)
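For the 1:23 example, the padded matrix looks like this (note the NA fill in the last row, which write() prints literally):
mat1
#   [,1] [,2] [,3] [,4] [,5]
# 1    1    2    3    4    5
# 2    6    7    8    9   10
# 3   11   12   13   14   15
# 4   16   17   18   19   20
# 5   21   22   23   NA   NA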
You first need to define the chunks. I used BBmisc, which has a chunk function to obtain chunks of N elements (five in your case). Then you can use write.table, which has an append option.
library(BBmisc)
x <- list(1:20)
n <- 5
chunks <- chunk(x[[1]], chunk.size = n)
for (i in seq_along(chunks)) {
  line <- paste(chunks[[i]], collapse = "\t")
  write.table(line, file = "output.txt", sep = "\t",
              row.names = FALSE, col.names = FALSE, quote = FALSE, append = TRUE)
}
I have various CSV documents that are all in the same folder. All of these documents have 65 columns, are titled with the same two header lines, and need to be merged into a single document. Furthermore, I need to merge the two header lines.
The structure looks more or less like this:
B2.csv:
TP1 TP1 TP2 TP2 TP2
Value Measurement Condition Time Max_Value
1.09 2.779 1 120 5.885
5.09 2.005 2 180 7.555
9.33 1.889 3 240 1.444
5.00 6.799 4 300 9.125
8.88 3.762 5 360 6.223
B4.csv:
TP1 TP1 TP2 TP2 TP2
Value Measurement Condition Time Max_Value
2.11 4.339 7 120 6.115
5.69 8.025 8 180 7.555
8.38 5.689 9 240 5.244
9.70 7.795 10 300 8.824
8.78 3.769 11 360 3.883
The final document should then look like this:
TP1_Value TP1_Measurement TP2_Condition TP2_Time TP2_Max_Value
1.09 2.779 1 120 5.885
5.09 2.005 2 180 7.555
9.33 1.889 3 240 1.444
5.00 6.799 4 300 9.125
8.88 3.762 5 360 6.223
2.11 4.339 7 120 6.115
5.69 8.025 8 180 7.555
8.38 5.689 9 240 5.244
9.70 7.795 10 300 8.824
8.78 3.769 11 360 3.883
To merge the documents, I have used this code:
setwd("C:/Users/XXXX/Desktop/Data/.")
# Get a List of all files in directory named with a key word, say all `.csv` files
filenames <- list.files("C:/Users/XXXX/Desktop/Data/.", pattern="*.csv", full.names=TRUE)
# Read and row-bind all data sets (rbindlist and fread come from data.table)
library(data.table)
data <- rbindlist(lapply(filenames, fread))
# Generate new CSV document (write.csv always uses ",", so no sep argument is needed)
write.csv(data, file = "C:/Users/XXXX/Desktop/Data/OneHeader.csv", row.names = FALSE)
However, with this code, the second title line remains in the data file. To merge these titles, I would use this code:
# Merging first two lines into one single header
data[] <- lapply(data, as.character)
names(data) <- paste(names(data), data[1, ], sep = "_")
new_data <- data[-1,]
Could you help me, how I could combine these two parts of the code in a way that it does the merging automatically?
I would be very grateful, if somebody could help me hereby, as I am a very beginner using R. Or are there any other (better) ways to achieve this task?
Thank you very much for your help!
Here is a data.table approach, mostly using fread(). Since it reads the column names per file, it will also work if your files contain different headers. Use fill = TRUE in rbindlist() to fill missing columns with NA.
library( data.table )
#get list of files to read
files <- list.files( pattern = "^B[0-9].csv", full.names = TRUE )
#read files to list using lapply
l <- lapply( files, function(x) {
  #read the first two rows of each file and paste them together to get the column names
  col_names <- transpose( fread( x, nrows = 2, header = FALSE ) )[, .(paste(V1, V2, sep = "_") )][[1]]
  #read the file, skipping the first two rows, and use col_names as the header
  fread( x, skip = 2, col.names = col_names )
})
#bind list together
rbindlist( l, fill = TRUE )
# TP1_Value TP1_Measurement TP2_Condition TP2_Time TP2_Max_Value
# 1: 1.09 2.779 1 120 5.885
# 2: 5.09 2.005 2 180 7.555
# 3: 9.33 1.889 3 240 1.444
# 4: 5.00 6.799 4 300 9.125
# 5: 8.88 3.762 5 360 6.223
# 6: 2.11 4.339 7 120 6.115
# 7: 5.69 8.025 8 180 7.555
# 8: 8.38 5.689 9 240 5.244
# 9: 9.70 7.795 10 300 8.824
# 10: 8.78 3.769 11 360 3.883
Then write the result to disk.
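For example, a minimal sketch using data.table's fwrite() (the output file name here is my assumption):
fwrite( rbindlist( l, fill = TRUE ), "OneHeader.csv" )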
This is a base R solution.
First, get the file names. The regex pattern assumes that they all start with an uppercase "B" followed by 1 or more digits and that the file extension is ".csv".
fnames <- list.files(pattern = "^B\\d+\\.csv")
Second, read them all in with an lapply loop, skipping the first two rows. Then, rbind the several dataframes together.
df_list <- lapply(fnames, read.table, skip = 2, sep = ",")
df_final <- do.call(rbind, df_list)
Now for the column names.
readLines reads the header lines as text and strsplit separates them into the column names' components.
header <- readLines(fnames[1], n = 2)
header <- strsplit(header, ",")
names(df_final) <- paste(header[[1]], header[[2]], sep = "_")
See the result.
df_final
# TP1_Value TP1_Measurement TP2_Condition TP2_Time TP2_Max_Value
#1 1.09 2.779 1 120 5.885
#2 5.09 2.005 2 180 7.555
#3 9.33 1.889 3 240 1.444
#4 5.00 6.799 4 300 9.125
#5 8.88 3.762 5 360 6.223
#6 2.11 4.339 7 120 6.115
#7 5.69 8.025 8 180 7.555
#8 8.38 5.689 9 240 5.244
#9 9.70 7.795 10 300 8.824
#10 8.78 3.769 11 360 3.883
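A possible final step (the file name here is just an assumption) writes the merged result out with base R:
write.csv(df_final, "OneHeader.csv", row.names = FALSE)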
Since you always have the same headers from what I gather, I'd just use a regex to remove these second header lines from the combined data object, like this:
data <- data[!grepl("Value", data$TP1), ] # removes all the rows that have the term Value in the data$TP1 column
Then you can just rename your first header as you please with:
colnames(data) <- c('TP1_Value', ....)
Try this:
filenames <- list.files("C:/Users/XXXX/Desktop/Data/.", pattern="*.csv", full.names=TRUE)
data <- lapply(filenames, read.csv, skip = 2, header = FALSE)
dataDF <- as.data.frame(do.call("rbind", data), stringsAsFactors = FALSE)
headersDF<- read.csv(filenames[[1]], nrows= 2, header = FALSE, stringsAsFactors = FALSE)
names(dataDF) <- paste(headersDF[1,], headersDF[2,], sep = "_")
write.csv(dataDF, file = "C:/Users/XXXX/Desktop/Data/OneHeader.csv", row.names = FALSE)
Basically this does the following:
Row 1 creates a vector with the names of the csv files in the directory you provide.
Row 2 reads the data from all files into a list of data frames. It skips the first two rows in every file.
Row 3 binds the different dataframes from the list into one. (Now you have your data; what you are lacking is the column names.)
Row 4 reads the first two rows from the first file (your header) into a data.frame.
Row 5 pastes the two rows elementwise using a "_" as separator and sets this string as column names.
Row 6 writes your csv.
I have a series of .txt files that look like this:
Button,Intensity,Acc,Intensity,RT,Time
0,30,0,0,0,77987.931
1,30,1,13.5,0,78084.57
1,30,1,15,0,78098.624
1,30,1,6,0,78114.132
1,30,1,15,0,78120.669
They have file names like 1531_Day49.txt, 1531_Day50.txt, 1532_Day49.txt, 1532_Day50.txt etc
I want to load all the files in this directory into data frames, append a column with the difference from the Time in the row above (tdelta), and append two more columns: an ID column holding the first 4 digits of the file name (i.e. 1531, 1532), and a PrePost column with the Day code decoded, so each row would be "Pre" if the filename says Day49 and "Post" if it says Day50.
So ideal output for a 1531 Day 49 file would be:
Button,Intensity,Acc,Intensity,RT,Time,Tdelta,ID,PrePost
0,30,0,0,0,77987.931,0,1531,Pre
1,30,1,13.5,0,78084.57,96.639,1531,Pre
1,30,1,15,0,78098.624,14.054,1531,Pre
So far I have:
#call library
library(data.table)
#batch enter .txt files and put them into a data frame
setwd("~/Documents/PVTPASAT/PVT")
temp = list.files(pattern="*.txt")
list.DFs <- lapply(temp, fread)
#view print out to visually check
View(list.DFs)
#add column of time difference
list.DFs <- lapply(list.DFs, cbind, tDelta = c(0, diff(df$Time)))
#Add empty columns for ID and PrePost
list.DFs <- lapply(list.DFs, cbind, ID = c(""))
list.DFs <- lapply(list.DFs, cbind, PrePost = c(""))
#print one to visually check
View(list.DFs[3])
I would create a function to do the processing and then apply it to your list of files like so:
example <- read.delim(textConnection('
Button,Intensity,Acc,Intensity,RT,Time
0,30,0,0,0,77987.931
1,30,1,13.5,0,78084.57
1,30,1,15,0,78098.624
1,30,1,6,0,78114.132
1,30,1,15,0,78120.669'),
header = T,
sep = ','
)
write.table(example, '1531_Day49.txt', row.names = F)
temp <- list.files(pattern="*.txt")
process_txt <- function(x) {
dat <- data.table::fread(x, header = T)
dat$tdelta <- c(0, diff(dat$Time))
dat$ID <- substr(x, 1, 4)
dat$PrePost <- if (grepl('49\\.', x)) {'Pre'} else {'Post'}
dat
}
out <- lapply(temp, process_txt)
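If you then want a single table instead of a list, a small follow-up step (using data.table again, since fread comes from there) could be:
#bind the per-file tables into one data.table
combined <- data.table::rbindlist(out)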
@Heather, the main guidance is to first solve one file properly. Then, place all that working code into a function.
library(data.table) ## for fread
library(dplyr) ## for lag function
library(stringr) ## for str_detect
# make two test files
dt <- read.csv(text=
'Button,Intensity,Acc,Intensity,RT,Time
0,30,0,0,0,77987.931
1,30,1,13.5,0,78084.57
1,30,1,15,0,78098.624
1,30,1,6,0,78114.132
1,30,1,15,0,78120.669
')
write.csv(dt,"1531_Day49.txt")
write.csv(dt,"1532_Day50.txt")
# function to do the work for one file name - returns a dataframe
doOne <- function (file) {
# read
contents <- fread(file)
# compute delta
contents$Tdelta <- contents$Time - lag(contents$Time)
# prefix up to underscore
contents$ID <- strsplit(file, c("_"))[[1]][[1]]
# add the prepost using ifelse and str_detetct
contents$PrePost <- ifelse(str_detect(file, "Day49"), "Pre", "Post")
return(contents)
}
#test files
files <- c("1531_Day49.txt", "1532_Day50.txt")
# call the function for each file -- result is
# a list of dataframes
lapply(files, doOne)
# better get them all into a single data frame for analysis
do.call(rbind, lapply(files, doOne))
# V1 Button Intensity Acc Intensity.1 RT Time Tdelta ID PrePost
# 1: 1 0 30 0 0.0 0 77987.93 NA 1531 Pre
# 2: 2 1 30 1 13.5 0 78084.57 96.639 1531 Pre
# 3: 3 1 30 1 15.0 0 78098.62 14.054 1531 Pre
# 4: 4 1 30 1 6.0 0 78114.13 15.508 1531 Pre
# 5: 5 1 30 1 15.0 0 78120.67 6.537 1531 Pre
# 6: 1 0 30 0 0.0 0 77987.93 NA 1532 Post
# 7: 2 1 30 1 13.5 0 78084.57 96.639 1532 Post
# 8: 3 1 30 1 15.0 0 78098.62 14.054 1532 Post
# 9: 4 1 30 1 6.0 0 78114.13 15.508 1532 Post
# 10: 5 1 30 1 15.0 0 78120.67 6.537 1532 Post
I am analysing some data and need help.
Basically, I have a dataset that looks like this:
date <- seq(as.Date("2017-04-01"),as.Date("2017-05-09"),length.out=40)
switch <- c(rep(1:2,each=10),rep(1:2,each=10))
O2 <- runif(40,min=21.02,max=21.06)
CO2 <- runif(40,min=0.076,max=0.080)
test.data <- data.frame(date,switch,O2,CO2)
As can be seen, there's a switch column that switches between 1 and 2 every 10 data points. I want to write code that does the following: when the switch column changes its value (from 1 to 2, or from 2 to 1), delete the first 5 rows of data after the switch (i.e. leaving the last 5 data points for all 4 variables), average the remaining data points for O2 and CO2, and put the averages in 2 new columns (avg.O2 and avg.CO2) before the next switch. Then repeat this process until the end.
It's quite easy to do manually on paper or in Excel, but my real dataset comprises thousands of data points, and I would like R to do it automatically for me. Does anyone have ideas that could help me?
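For orientation, here is a compact dplyr sketch of that logic (my own illustration, not one of the answers below; it assumes every switch block has more than 5 rows):
library(dplyr)
test.data %>%
  mutate(grp = cumsum(c(1, diff(switch) != 0))) %>%   # number the switch blocks 1, 2, 3, ...
  group_by(grp) %>%
  slice(-(1:5)) %>%                                   # drop the first 5 rows of each block
  mutate(avg.O2 = mean(O2), avg.CO2 = mean(CO2)) %>%  # per-block averages as new columns
  ungroup()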
Please find my edits, which should work for both regular and irregular switching:
date <- seq(as.Date("2017-04-01"),as.Date("2017-05-09"),length.out=40)
switch <- c(rep(1:2,each=10),rep(1:2,each=10))
O2 <- runif(40,min=21.02,max=21.06)
CO2 <- runif(40,min=0.076,max=0.080)
test.data <- data.frame(date,switch,O2,CO2)
CleanMachineData <- function(Data, SwitchData, UnreliableRows = 5){
# First, we can properly turn your switch column into a grouping column (1,2,1,2)->(1,2,3,4)
grouplength <- rle(Data[,"switch"])$lengths
# mapply lets us input vector arguments into typically one/first-element-only argument functions.
# Here we repeat each group id once per row of its run, giving a single grouping vector
groups <- mapply(rep, 1:length(grouplength), each = grouplength)
# if the switch frequency was irregular, the result is a list; if regular, a matrix
# convert either into a plain vector as follows:
if (is.list(groups)) {
groups <- unlist(groups)
} else {
groups <- as.vector(groups)
}
Data$group <- groups
#
# vector of the first row of each new switch (except the starting 0)
switchRow <- c(0,which(abs(diff(SwitchData)) == 1))+1
# I use "as.vector" to turn the matrix output of mapply into a sequence of numbers.
# "ToRemove" will have all the row numbers to get rid of from your original data, except for what happens before (in this case) row 10
ToRemove <- c(1:UnreliableRows, as.vector(mapply(seq, switchRow, switchRow+(UnreliableRows)-1)))
# Keep is the complement of the rows to drop (computed here for reference, but unused below)
Keep <- seq(nrow(Data))[-c(1:UnreliableRows, ToRemove)]
# Create the new data, (in case you don't know: data[<ROW>,<COLUMN>])
newdat <- Data[-ToRemove,]
# print the results
newdat
}
dat <- CleanMachineData(test.data, test.data$switch, 5)
dat
date switch O2 CO2 group
6 2017-04-05 1 21.03922 0.07648886 1
7 2017-04-06 1 21.04071 0.07747368 1
8 2017-04-07 1 21.05742 0.07946615 1
9 2017-04-08 1 21.04673 0.07782362 1
10 2017-04-09 1 21.04966 0.07936446 1
16 2017-04-15 2 21.02526 0.07833825 2
17 2017-04-16 2 21.04511 0.07747774 2
18 2017-04-17 2 21.03165 0.07662803 2
19 2017-04-18 2 21.03252 0.07960098 2
20 2017-04-19 2 21.04032 0.07892145 2
26 2017-04-25 1 21.03691 0.07691438 3
27 2017-04-26 1 21.05846 0.07857017 3
28 2017-04-27 1 21.04128 0.07891908 3
29 2017-04-28 1 21.03837 0.07817021 3
30 2017-04-29 1 21.02334 0.07917546 3
36 2017-05-05 2 21.02890 0.07723042 4
37 2017-05-06 2 21.04606 0.07979641 4
38 2017-05-07 2 21.03822 0.07985775 4
39 2017-05-08 2 21.04136 0.07781525 4
40 2017-05-09 2 21.05375 0.07941123 4
aggregate(cbind(O2,CO2) ~ group, dat, mean)
group O2 CO2
1 1 21.04675 0.07812336
2 2 21.03497 0.07819329
3 3 21.03967 0.07834986
4 4 21.04166 0.07882221
# crazier, irregular switching
test.data2 <- test.data
test.data2$switch <- unlist(mapply(rep, 1:2, times = 1, each = c(10,8,10,5,3,10)))[1:20]
dat2 <- CleanMachineData(test.data2, test.data2$switch, 5)
dat2
date switch O2 CO2 group
6 2017-04-05 1 21.03922 0.07648886 1
7 2017-04-06 1 21.04071 0.07747368 1
8 2017-04-07 1 21.05742 0.07946615 1
9 2017-04-08 1 21.04673 0.07782362 1
10 2017-04-09 1 21.04966 0.07936446 1
16 2017-04-15 2 21.02526 0.07833825 2
17 2017-04-16 2 21.04511 0.07747774 2
18 2017-04-17 2 21.03165 0.07662803 2
24 2017-04-23 1 21.05658 0.07669662 3
25 2017-04-24 1 21.04452 0.07983165 3
26 2017-04-25 1 21.03691 0.07691438 3
27 2017-04-26 1 21.05846 0.07857017 3
28 2017-04-27 1 21.04128 0.07891908 3
29 2017-04-28 1 21.03837 0.07817021 3
30 2017-04-29 1 21.02334 0.07917546 3
36 2017-05-05 2 21.02890 0.07723042 4
37 2017-05-06 2 21.04606 0.07979641 4
38 2017-05-07 2 21.03822 0.07985775 4
# You can try a range of values for UnreliableRows with the following
lapply(5:7, function(x) {
dat <- CleanMachineData(test.data2, test.data2$switch, x)
list(data = dat, means = aggregate(cbind(O2,CO2)~group, dat, mean))
})
Use
test.data[rep(c(FALSE, TRUE), each=5),]
to always select the last five rows from each group of 10 rows.
Then you can use aggregate:
d2 <- test.data[rep(c(FALSE, TRUE), each=5),]
aggregate(cbind(O2, CO2) ~ 1, data=d2, FUN=mean)
If you want the average for every 5-rows-group:
aggregate(cbind(O2, CO2) ~ gl(k=5, n=nrow(d2)/5L), data=d2, FUN=mean)
Here is a generalization for the situation of arbitrary number of rows in test.data:
stay <- rep(c(FALSE, TRUE), each=5, length.out=nrow(test.data))
d2 <- test.data[stay,]
group <- gl(k=5, n=nrow(d2)/5L+1L, length=nrow(d2))
aggregate(cbind(O2, CO2) ~ group, data=d2, FUN=mean)
Here is a variant for mixing the data with the averages:
group <- gl(k=10, n=nrow(test.data)/10L+1L, length=nrow(test.data))
L <- split(test.data, group)
mySummary <- function(x) {
if (nrow(x) <= 5) return(NULL)
x <- x[-(1:5),]
d.avg <- aggregate(cbind(O2, CO2) ~ 1, data=x, FUN=mean)
rbind(x, cbind(date=NA, switch=-1, d.avg))
}
lapply(L, mySummary) # as list of dataframes
do.call(rbind, lapply(L, mySummary)) # as one dataframe
Problem setup: creating a function that takes multiple CSV files selected by an ID column, combines them into 1 data set, and then outputs the number of observations per ID.
Expected:
complete("specdata", 30:25) ##notice descending order of IDs requested
## id nobs
## 1 30 932
## 2 29 711
## 3 28 475
## 4 27 338
## 5 26 586
## 6 25 463
I get:
> complete("specdata", 30:25)
id nobs
1 25 463
2 26 586
3 27 338
4 28 475
5 29 711
6 30 932
Which is "wrong" because it has been sorted by id.
The CSV file I read from does have the data in descending order. My snippet:
dfTable<-read.csv("~/progAssign1/specdata/tmpdata.csv")
ccTab<-complete.cases(dfTable)
xTab3<-as.data.frame(table(dfTable$ID[ccTab]),)
colnames(xTab3)<-c("id","nobs")
And as near as I can tell, the third line is where sorting occurs. I broke out the expression and it happens in the table() call. I've not found any option or parameter I can pass to make something like sort=FALSE. You'd think...
Anyway. Any help appreciated!
So, the problem is in the output of table, which is sorted by default. For example:
> r = sample(5,15,replace = T)
> r
[1] 1 4 1 1 3 5 3 2 1 4 2 4 2 4 4
> table(r)
r
1 2 3 4 5
4 3 2 5 1
If you want to take the order of first appearance, you are going to get your hands a little bit dirty by re-implementing the table function:
unique_r = unique(r)
table_r = rbind(label=unique_r, count=sapply(unique_r,function(x)sum(r==x)))
table_r
[,1] [,2] [,3] [,4] [,5]
label 1 4 3 5 2
count 4 5 2 1 3
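A shorter route to the same result (my addition, reusing r from above): fix the factor levels to first-appearance order, and table() itself keeps that order:
f <- factor(r, levels = unique(r))
table(f)
# f
# 1 4 3 5 2
# 4 5 2 1 3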
One way to get around this is...don't use table. Here's an example where I create three one-line data sets from your data. Then I read them in with a descending sequence, with read.table and it seems to be okay.
The real big thing here is that multiple data sets should be placed in a list upon being read into R. You'll get the exact order of data sets you want that way, among other benefits.
Once you've read them into R the way you want them, it's much easier to order them at the very end. Ordering of rows (for me) is usually the very last step.
> dat <- read.table(h=T, text = "id nobs
1 25 463
2 26 586
3 27 338
4 28 475
5 29 711
6 30 932")
Write three one-line files:
> write.table(dat[3,], "dat3.csv", row.names = FALSE)
> write.table(dat[2,], "dat2.csv", row.names = FALSE)
> write.table(dat[1,], "dat1.csv", row.names = FALSE)
Read them in using a 3:1 order:
> do.call(rbind, lapply(3:1, function(x){
read.table(paste0("dat", x, ".csv"), header = TRUE)
}))
# id nobs
# 1 27 338
# 2 26 586
# 3 25 463
Then, if we change 3:1 to 1:3, the rows "comply" with our request:
> do.call(rbind, lapply(1:3, function(x){
read.table(paste0("dat", x, ".csv"), header = TRUE)
}))
# id nobs
# 1 25 463
# 2 26 586
# 3 27 338
And just for fun
> fun <- function(z){
do.call(rbind, lapply(z, function(x){
read.table(paste0("dat", x, ".csv"), header = TRUE) }))
}
> fun(c(2, 3, 1))
# id nobs
# 1 26 586
# 2 27 338
# 3 25 463
You may try something like this:
t1 <- c(5,3,1,3,5,5,5)
as.data.frame(table(t1)) ##result in ascending order
# t1 Freq
#1 1 1
#2 3 2
#3 5 4
t1 <- factor(t1)
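# reorder ranks the levels by the sum of rep(-1, ...) within each level, i.e. by decreasing frequency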
as.data.frame(table(reorder(t1, rep(-1, length(t1)),sum)))
# Var1 Freq
#1 5 4
#2 3 2
#3 1 1
In your case you are complaining about the actions of the table function with a single argument, which returns the items with the names in ascending order when you want them in descending order. You could simply have used the rev() function around the table call.
xTab3<-as.data.frame( rev( table( dfTable$ID[ccTab] ) ),)
(I'm not sure what that last comma is doing in there.) The sort order in the original would not be expected to determine the order of a table operation. Generally R will return results with discrete labels sorted in alpha (ascending) order unless the levels of a factor item have been specified differently. That's one of those R-specific rules that may be difficult to intuit. The other R-specific rule that may be difficult to grasp (although not really a problem here) is that arguments are often expected to be in the form of R-lists.
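For instance, with a small vector (my own quick illustration):
t1 <- c(5, 3, 1, 3, 5, 5, 5)
rev(table(t1))
# 5 3 1
# 4 2 1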
It's probably wise to think about R table objects at this point (and about what happens in the as.data.frame call). Table objects are actually R arrays, so the feature that you wanted to sort by was actually the rownames of that table object, and those are of class character:
r = sample(5,15,replace = T)
table(r)
#r
#2 3 4 5
#5 3 2 5
rownames(table(r))
#[1] "2" "3" "4" "5"
str(as.data.frame(table(r)))
#-------
'data.frame': 4 obs. of 2 variables:
$ r : Factor w/ 4 levels "2","3","4","5": 1 2 3 4
$ Freq: int 5 3 2 5
I just want to share this homework I've done:
complete <- function(directory, id = 1:332) {
  setwd("E:/Coursera")
  files <- dir(directory, full.names = TRUE)
  data <- lapply(files, read.csv)
  specdata <- do.call(rbind, data)
  cleandata <- specdata[!is.na(specdata$sulfate) & !is.na(specdata$nitrate), ]
  result <- data.frame(id = numeric(0), nobs = numeric(0))
  for (i in id) {
    targetdata <- cleandata[cleandata$ID == i, ]
    result <- rbind(result, data.frame(table(targetdata$ID)))
  }
  names(result) <- c("id", "nobs")
  result
}
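Because the for loop walks the requested ids in the order given, the rows come out in that same order; calling complete("specdata", 30:25) reproduces the expected output shown above.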
A simple solution that no one has proposed yet is combining table() with the unique() function. The unique() function gives exactly the behaviour you are looking for (it lists unique IDs in order of appearance).
In your case it would be something like this:
dfTable <- read.csv("~/progAssign1/specdata/tmpdata.csv")
ccTab <- complete.cases(dfTable)
x <- dfTable$ID[ccTab] # IDs of the complete cases
# index the table by name (hence as.character) to reorder it into order of appearance
xTab3 <- as.data.frame(table(x)[as.character(unique(x))])
colnames(xTab3) <- c("id","nobs")
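A quick check with toy IDs (my own illustration):
x <- c(30, 30, 29, 29, 29, 28)
as.data.frame(table(x)[as.character(unique(x))])
# rows come out in order of first appearance: 30 (2 obs), 29 (3 obs), 28 (1 obs)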