I know you can use read.table to read one matrix from a file, but I would like to read two matrices of the same size (m by n) from one file in R and put them in two separate R variables.
For example, this file contains two 3 by 2 matrices:
6 3
2 5
5 4
4 3
6 3
3 4
Here is my shot at it.
split(read.table("data.txt"), gl(2, 3, labels=c("x1", "x2")))
It should be easy to generalize this and wrap it up into a function.
I hope this helps.
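For the generalization, here is a minimal sketch, assuming the matrices are stacked vertically and all have the same number of rows m (the function name read.matrices is made up for illustration):

```r
# Read k matrices of m rows each, stacked vertically in one file,
# and return them as a named list.
read.matrices <- function(file, m, k, labels = paste0("x", seq_len(k))) {
  split(read.table(file), gl(k, m, labels = labels))
}

# mats <- read.matrices("data.txt", m = 3, k = 2)
# mats$x1 is the first 3-by-2 matrix, mats$x2 the second
```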
I have written a loop that stores data frames in a list and would like to use strings stored in a vector as their names. This way, I could refer to the data frames stored in the list by their names without having to use indexes. I have searched the internet extensively for a solution to this issue, but so far have not found one.
So far, I have used a workaround: I loop over a list of data frame names using read.csv(). In each iteration, I write the imported data frame to the global environment using assign(), which allows me to set a variable name. Using get() and a pattern-matching approach, I then fetch the data frames from the global environment and store them in a list.
This approach is quite cumbersome and only works when data frame names follow a shared pattern.
Preferably, I would like to rename data frames without having to use assign():
Name of imported data frame 1 <- First element of vector containing the data frame names
How could I achieve this?
Any help is highly appreciated!
My approach to this sort of problem is to use lapply to create the loop and then supply names for the elements of the resulting list. This gives a simple, two-line solution once the "create a data frame" function has been written.
For example, generating a random data.frame rather than reading a csv file for easy reproduction:
createDataFrame <- function(x) {
  data.frame(X = x, Y = rnorm(5))
}
beatles <- lapply(1:4, createDataFrame)
names(beatles) <- c("John", "Paul", "George", "Ringo")
beatles
$John
X Y
1 1 -1.1590175
2 1 0.6872888
3 1 -0.8868616
4 1 -0.3458603
5 1 1.1136297
$Paul
X Y
1 2 -0.3761409
2 2 -0.9059801
3 2 -0.7039736
4 2 -0.4490143
5 2 1.1337149
$George
X Y
1 3 -0.4804286
2 3 1.0573272
3 3 -1.9000426
4 3 0.8887967
5 3 0.6550380
$Ringo
X Y
1 4 -0.7539840
2 4 -0.3743590
3 4 -0.9748449
4 4 -1.1448570
5 4 -1.3277712
beatles$George
X Y
1 3 -0.4804286
2 3 1.0573272
3 3 -1.9000426
4 3 0.8887967
5 3 0.6550380
Make the obvious changes to createDataFrame for your actual use case.
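Applied to the original question (reading csv files), the same pattern would look roughly like this; the file paths and names below are placeholders for your own vectors:

```r
# Hypothetical vectors of file paths and desired data frame names.
files    <- c("first.csv", "second.csv", "third.csv")
df_names <- c("first", "second", "third")

data_list <- lapply(files, read.csv)   # read each file into a list element
names(data_list) <- df_names           # name the elements from the vector

# Refer to data frames by name instead of index, e.g. data_list$first
```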
Suppose I have the dataset that has the following information:
1) Number (of products bought, for example)
1 2 3
2) Frequency for each number (e.g., how many people purchased that number of products)
2 5 10
Let's say I have the above information for each of the 2 groups: control and test data.
How do I format the data such that it would look like this:
controldata<-c(1,1,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3)
(each number * frequency listed as a vector)
testdata<- (similar to above)
so that I can perform the two independent sample t-test on R?
If I don't even need to make them a vector / if there's an alternative clever way to format the data to perform the t-test, please let me know!
It would be simple if the vector were small like the one above, but the frequency can exceed 10,000 for each number.
P.S.
Control and test data have a different sample size.
Thanks!
Use rep. With your data above:
rep(c(1, 2, 3), c(2, 5, 10))
# [1] 1 1 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
Or, for your case
control_data = rep(n_bought, frequency)
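Putting it together for the t-test, here is a sketch; the control frequencies come from the question, while the test-group frequencies are made up for illustration:

```r
n_bought     <- c(1, 2, 3)
control_freq <- c(2, 5, 10)  # frequencies from the question
test_freq    <- c(4, 6, 7)   # hypothetical frequencies for the test group

control_data <- rep(n_bought, control_freq)  # expands to 17 observations
test_data    <- rep(n_bought, test_freq)

# Welch two-sample t-test; unequal sample sizes are fine
t.test(control_data, test_data)
```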
I have data file of the form:
unimportant1 unimportant2 unimportant3 matrixdata[i]
1e4 2e5 3e2 1 2 3 4 5
2e3 1e1 7e3 5 4 3 2 1
... ... ... ...
2e3 1e4 4e2 4 4 4 4 4
So it has column headers (here "unimportant1" to "unimportant3") as the first row. I want gnuplot to ignore these first three unimportant columns, i.e. the data entries in exponential notation, and to plot the matrixdata as a matrix. So as if I did it like this:
#!/usr/bin/gnuplot -p
plot '-' matrix with image
1 2 3 4 5
5 4 3 2 1
...
4 4 4 4 4
e
How do I get gnuplot to ignore the first three columns and the header row and plot the rest as a matrix image? For compatibility, I would prefer a gnuplot built-in to do that, but I could also write a shell script and use the `plot '< ...'` syntax to preprocess the data file.
Edit: So neuhaus' answer almost solved it. The only thing I'm missing is how to ignore the first row (line) with the text header data. every seems to expect numeric data, and so the whole plot fails as it's not a matrix. I don't want to comment out the first line, as I'm using the unimportant data sets for other 2D plots that, in turn, use the header data.
So how do I skip a row in a matrix plot that already uses every to skip columns?
When using matrix, gnuplot must first parse the data file before it can skip rows and columns. Now, your first row evaluates to four invalid numbers, the second row has 8 numbers, and I get an error that the matrix does not represent a grid.
If you don't want to comment out the first line or skip it with an external tool like < tail -n +2 matrix.dat, then you could change it to contain some dummy strings like
unimportant1 unimportant2 unimportant3 matrixdata[i] B C D E
1e4 2e5 3e2 1 2 3 4 5
2e3 1e1 7e3 5 4 3 2 1
... ... ... ...
2e3 1e4 4e2 4 4 4 4 4
Now your first row has as many entries as the other rows, and you can plot this file with
plot 'test.txt' matrix every ::3:1 with image
This still gives you a warning: matrix contains missing or undefined values, but you can safely ignore it.
I'm not familiar with matrix plots, but I got some sample data and
plot 'matrix.dat' matrix every ::3 with image
seems to do the trick.
You could probably use shell commands, for instance, the following skips the first six lines of a file:
plot '<tail -n +7 terrain0.dem' matrix with image
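If tail alone isn't enough (it only drops rows, not columns), the whole preprocessing can be done with awk; a sketch, where matrix.dat is a placeholder file name, that drops the header row and the first three columns in one pass:

```shell
# Skip the header line (NR > 1) and print only fields 4..NF of each row.
awk 'NR > 1 { for (i = 4; i <= NF; i++) printf "%s%s", $i, (i < NF ? " " : "\n") }' matrix.dat
```

The result can then be fed to gnuplot with the `plot '< awk ...' matrix with image` syntax, with no every clause needed.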
Let's say I have a vector of integers 1:6
w=1:6
I am attempting to obtain a matrix of 90 rows and 6 columns that contains the multinomial combinations from these 6 integers taken as 3 groups of size 2.
6!/(2!*2!*2!)=90
So, columns 1 and 2 of the matrix would represent group 1, columns 3 and 4 would represent group 2 and columns 5 and 6 would represent group 3. Something like:
1 2 3 4 5 6
1 2 3 5 4 6
1 2 3 6 4 5
1 2 4 5 3 6
1 2 4 6 3 5
...
Ultimately, I would want to expand this to other multinomial combinations of limited size (because the numbers get large rather quickly) but I am having trouble getting things to work. I've found several functions that do binomial combinations (only 2 groups) but I could not locate any functions that do this when the number of groups is greater than 2.
I've tried two approaches to this:
Building up the matrix from nothing using for loops, and attempting things with the reshape package (thinking there might be something there for this with melt())
working backwards from the permutation matrix (720 rows) by attempting to retain unique rows within groups and/or removing duplicated rows within groups
Neither worked for me.
The permutation matrix can be obtained with
library(gtools)
dat=permutations(6, 6, set=TRUE, repeats.allowed=FALSE)
I think working backwards from the full permutation matrix is a bit excessive, but I'm trying anything at this point.
Is there a package with a prebuilt function for this? Does anyone have any ideas how I should proceed?
Here is how you can implement your "working backwards" approach:
gps <- list(1:2, 3:4, 5:6)                          # column indices of each group
get.col <- function(x, j) x[, j]                    # extract a group's columns
is.ordered <- function(x) !colSums(diff(t(x)) < 0)  # TRUE where a row's group is in increasing order
is.valid <- Reduce(`&`, Map(is.ordered, Map(get.col, list(dat), gps)))
dat <- dat[is.valid, ]
nrow(dat)
# [1] 90
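An alternative that skips the full 720-row permutation matrix entirely is to build the groups directly with base R's combn(), choosing the first group and recursing on what's left; a sketch (the function name is made up for illustration):

```r
# Enumerate the ways to split `elements` into ordered groups of the given
# sizes; within each group the members are kept in increasing order.
group.combinations <- function(elements, sizes) {
  if (length(sizes) == 1) return(matrix(elements, nrow = 1))
  first <- combn(elements, sizes[1])  # candidate members of the first group
  do.call(rbind, lapply(seq_len(ncol(first)), function(i) {
    rest <- group.combinations(setdiff(elements, first[, i]), sizes[-1])
    cbind(matrix(first[, i], nrow(rest), sizes[1], byrow = TRUE), rest)
  }))
}

m <- group.combinations(1:6, c(2, 2, 2))
nrow(m)  # 90, matching 6!/(2!*2!*2!)
```

This also generalizes directly to unequal group sizes, e.g. group.combinations(1:5, c(2, 3)).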
I want to (as ever) use code that performs better but functions equivalently to the following:
write.table(results.df[seq(1, ncol(results.df),2)],file="/path/file.txt", row.names=TRUE, sep="\t")
write.table(results.df[seq(2, ncol(results.df),2)],file="/path/file2.txt",row.names=TRUE, sep="\t")
results.df is a dataframe that looks something thus:
row.names 171401 171401 111201 111201
1 1 0.8320923 10 0.8320923
2 2 0.8510621 11 0.8510621
3 3 0.1009001 12 0.1009001
4 4 0.9796110 13 0.9796110
5 5 0.4178686 14 0.4178686
6 6 0.6570377 15 0.6570377
7 7 0.3689075 16 0.3689075
There is no consistent patterning in the column headers except that each one is repeated twice consecutively.
I want to create (1) one file with only odd-numbered columns of results.df and (2) another file with only even-numbered columns of results.df. I have one solution above, but was wondering whether there is a better-performing means of achieving the same thing.
IDEA UPDATE: I was thinking there may be some way of excising each processed column - deleting it from memory - rather than just copying it. This way the size of the dataframe would progressively decrease, which might improve performance.
The code is only slightly shorter but...
# Instead of
results.df[seq(1, ncol(results.df), 2)]
results.df[seq(2, ncol(results.df), 2)]
#you could use
results.df[c(T,F)]
results.df[c(F,T)]
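This works because a logical index shorter than the number of columns is recycled across them; a tiny illustration:

```r
df <- data.frame(a = 1, b = 2, c = 3, d = 4)
names(df[c(TRUE, FALSE)])  # "a" "c"  (odd-numbered columns)
names(df[c(FALSE, TRUE)])  # "b" "d"  (even-numbered columns)
```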