reaching max.print on R - r

I just found a bunch of weather data that I would like to play around with in glmnet in R. First I've been reading and organizing the data in R, and right now I am just trying to look at the raw data of each variable. Unfortunately, each variable has a lot of data and R isn't able to print it all. Is there a way I can view all the raw data in R or just in the file itself? I've tried opening the file in excel to no success. Thanks!

Try to use Frequency tables, you can group by segments.
str() , summary(), table(), pairs(), plots() etc. There are several libraries (such as decr) which facilitate analyzing numerical and factor levels. Let me know if you need help with any.

Related

R: convert data frame columns to least memory demanding data type without loss of information

My data is massive and I was wondering if there is a way I could tell R to convert each column to data types which are less memory demanding without any loss of information.
In Stata, there is a function called compress that does that. I was wondering if there is something similar in R.
I would also be grateful if you have other simple advice of how to handle large datasets in R (in addition to using data.table instead of dplyr).

Non-programmer, ascii file data extract (can I even learn to code?)

As the title says, I'm not a programmer. I've tried R before, got very confused and abandoned it. I'm a physician, and I do all my statistics either with SPSS or Excel. I'd like to learn some coding for when I get into problems like this:
I have an ascii file that I'd like to extract data from. The fields are contained within columns of variable width. 90% of the file is useless to me. For example, the fields I'm interested in extracting are encoded in columns 00645-00649, 03315-03319, etc. I'd like to get this into a format so I can run stats in SPSS/Excel. Should I be looking to use R, Python, something else or am I totally beyond hope?
Thanks in advance.
It's impossible to say for certain given only the information here, but the DATA LIST command in SPSS may well allow you to read the data into SPSS directly from the current file. If you can specify the column locations of the desired variables, you can specify those on that command, and SPSS will simply skip over the unnamed columns.

Data structure and package for a radial dendrogram in R

I'd like to create a radial dendrogram in R, but being new to the software, I don't know if I chose the correct data structure and package.
I've created a YAML file that looks as follows:
Data structure
I know the exact hierachy of the languages, but I need R to calculate x and y values. I'd use hclust for that, I think?
I found this instruction here for example: https://stats.stackexchange.com/questions/4062/how-to-plot-a-fan-polar-dendrogram-in-r, but it uses the mtcars dataset. I'd just like to know whether it makes sense to set up my data as above or whether I should use a different structure. When I try to import the datasets I get an error message saying I've got more columns than column headers so I must be doing something wrong.

rangedummarizedexperiment for deseq2

I'm trying to use the DESeq2 package in R for differential gene expression, but I'm having trouble creating the required RangedSummarizedExperiment object from my input data. I have found several tutorials and vignettes for doing this, but they all seem to apply to a raw data set that is different from mine. My data has gene names as row names and patient id as column names, and the data is simply integer count data. There has to be a simple way to create the RangedSummarizedExperiment object from this type of input data, but I haven't yet found a way. Can anybody help? Thanks.
I had a similar problem understanding how to use this data structure. I eventually managed to do without it by using DESeqDataSetFromMatrix. You can see an example in the first code block of Modify r object with rpy2 (this code is pure R, rpy2 stuff comes after). In this example, I have genes as rows and samples as columns, so it is likely you will be able to adopt the same approach.

pass value from R (run in SPSS) to open SPSS dataset

Completely new to R here. I ran R in SPSS to solve some complex polynomials from SPSS datasets. I managed to get the result from R back into SPSS, but it was a very inelegant process:
begin program R.
z <- polyroot(unlist(spssdata.GetDataFromSPSS(variables=c("qE","qD","qC","qB","qA"),cases=1),use.names=FALSE))
otherVals <- spssdata.GetDataFromSPSS(variables=c("b0","b1","Lc","tInv","sR","c0","c1","N2","xBar","DVxSq"),cases=1)
b0<-unlist(otherVals["b0"],use.names=FALSE)
b1<-unlist(otherVals["b1"],use.names=FALSE)
Lc<-unlist(otherVals["Lc"],use.names=FALSE)
tInv<-unlist(otherVals["tInv"],use.names=FALSE)
sR<-unlist(otherVals["sR"],use.names=FALSE)
c0<-unlist(otherVals["c0"],use.names=FALSE)
c1<-unlist(otherVals["c1"],use.names=FALSE)
N2<-unlist(otherVals["N2"],use.names=FALSE)
xBar<-unlist(otherVals["xBar"],use.names=FALSE)
DVxSq<-unlist(otherVals["DVxSq"],use.names=FALSE)
z2 <- Re(z[abs(c(abs(b0+b1*Re(z)-tInv*sR*sqrt(1/(c0+c1*Re(z))^2+1/N2+(Re(z)-xBar)^2/DVxSq))-Lc))==min(abs(c(abs(b0+b1*Re(z)-tInv*sR*sqrt(1/(c0+c1*Re(z))^2+1/N2+(Re(z)-xBar)^2/DVxSq))-Lc)))])
varSpec1 <- c("Xd","Xd",0,"F8","scale")
dict <- spssdictionary.CreateSPSSDictionary(varSpec1)
spssdictionary.SetDictionaryToSPSS("results", dict)
new = data.frame(z2)
spssdata.SetDataToSPSS("results", new)
spssdictionary.EndDataStep( )
end program.
Honestly, it was mostly pieced together from somewhat-related examples and seems more complicated than it should be. I had to take the new dataset created by R and run MATCH FILES with my original dataset. All I want to do is a) pull numbers from SPSS into R, b) manipulate them-in this case, finding a polyroot that fit certain criteria- , and c) put the results right back into the SPSS dataset without messing up any of the previous data.
Am I missing something that would make this more simple? Keep in mind that I have zero R experience outside of this attempt, but I have decent experience in programming SPSS and matlab.
Thanks in advance for any help you give!
R in SPSS can create new SPSS datasets, but it can't modify an existing one. There are a lot of situations where the data from R would be dimensionally inconsistent with the active SPSS dataset. So you need to create a dictionary and data frame using the apis above and then do whatever is appropriate on the SPSS side if you need to match back. You might want to submit an enhancement request for SPSS at suggest#us.ibm.com

Resources