As the title says, I'm not a programmer. I've tried R before, got very confused and abandoned it. I'm a physician, and I do all my statistics either with SPSS or Excel. I'd like to learn some coding for when I get into problems like this:
I have an ASCII file that I'd like to extract data from. The fields are contained within columns of variable width. 90% of the file is useless to me. For example, the fields I'm interested in extracting are encoded in columns 00645-00649, 03315-03319, etc. I'd like to get this into a format so I can run stats in SPSS/Excel. Should I be looking to use R, Python, or something else, or am I totally beyond hope?
Thanks in advance.
It's impossible to say for certain given only the information here, but the DATA LIST command in SPSS may well allow you to read the data into SPSS directly from the current file. If you can specify the column locations of the desired variables, you can specify those on that command, and SPSS will simply skip over the unnamed columns.
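If you would rather do the extraction in R, here is a minimal sketch. It assumes the positions quoted in the question are 1-based character columns that are the same on every line. The synthetic file, the helper make_line, and the names field_a and field_b are made up for illustration and are not part of your data.

```r
# Build a tiny synthetic fixed-width file so the sketch runs on its own.
# (Hypothetical data; with a real file, point `path` at it instead.)
make_line <- function(a, b) {
  line <- strrep(" ", 3319)          # a blank line 3319 characters wide
  substr(line, 645, 649) <- a        # fill columns 645-649
  substr(line, 3315, 3319) <- b      # fill columns 3315-3319
  line
}
path <- tempfile(fileext = ".txt")
writeLines(c(make_line("12345", "00042"), make_line("67890", "00007")), path)

# Read every line as plain text, then cut out only the columns of interest.
lines <- readLines(path)
extracted <- data.frame(
  field_a = substr(lines, 645, 649),
  field_b = substr(lines, 3315, 3319),
  stringsAsFactors = FALSE
)

# Save as CSV, which both SPSS and Excel can open.
out_path <- tempfile(fileext = ".csv")
write.csv(extracted, out_path, row.names = FALSE)
```

The fields are kept as text here so that leading zeros survive; convert with as.numeric() only where you actually need numbers.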
I have a large population survey dataset for a project and the first step is to make exclusions and have a final dataset for analyses. To organize my work, I must continue my work in a new file where I derive survey variables correctly. Is there a command used to continue work by saving all the previous data and code to the new file?
I don't think I understand the problem you have. You can always create multiple .R files and split the code among them as you wish, and you can also arrange those files as you see fit in the file system (group them in the same folder with informative names and comments, etc.).
As for the data side of the problem, you can load your data into R, make any changes or filters needed, and then save the result to another file with one of the many functions that write to disk: write.table() from base R, fwrite() from data.table (which can be MUCH faster), etc.
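A minimal sketch of that load, filter, save cycle, using only base R. The data frame, the exclusion rule, and the file names are all made up for illustration.

```r
# Hypothetical survey data standing in for the real import
# (e.g. the result of read.csv() on your raw file).
survey <- data.frame(
  id      = 1:6,
  age     = c(17, 25, 40, 16, 33, 70),
  consent = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

# Apply the exclusions: keep consenting adults only.
final <- subset(survey, age >= 18 & consent)

# Option 1: plain-text output that other programs can read.
csv_path <- tempfile(fileext = ".csv")
write.table(final, csv_path, sep = ",", row.names = FALSE)

# Option 2: an R-native file that keeps column types exactly as they are.
rds_path <- tempfile(fileext = ".rds")
saveRDS(final, rds_path)

# In the next script, pick up where you left off.
reloaded <- readRDS(rds_path)
```

The new .R file then only needs the readRDS() line at the top; the exclusion code stays in the earlier script.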
I feel that my answer is way too obvious. When you say "project", do you mean "something I have to get done" or the actual projects that you can create in RStudio? If it's the first, then I think I have covered it. If it's the second, I have never used that feature, so I won't be able to help :(
Maybe you can elaborate a bit more.
I have a large R Markdown file with many different outputs. The dataset is still being collected, and I often reknit the file to get an update including the most recent data. I would like to automatically see what has changed from the last time without needing to page through the entire output.
A) Is there an easier strategy than writing code to extract all the values from the output and formatting a side-by-side presentation myself?
B) The output includes several figures. I would like to compare these as well, but I would be happy with a solution that only compares numbers.
C) I would also be satisfied with a function or package that saves a defined subset of variables and lets me compare them to the values of variables saved with the same name in the past.
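For option C, one possible shape is sketched below, using only base R. The function name compare_to_last, the snapshot file, and the example values are hypothetical; the idea is to store a named list at the end of each knit and report which names changed since the previous one.

```r
# Compare a named list of values against the snapshot saved last time,
# then overwrite the snapshot with the current values.
compare_to_last <- function(current, path) {
  changed <- character(0)
  if (file.exists(path)) {
    previous <- readRDS(path)
    shared <- intersect(names(current), names(previous))
    same <- vapply(
      shared,
      function(n) isTRUE(all.equal(current[[n]], previous[[n]])),
      logical(1)
    )
    changed <- shared[!same]
  }
  saveRDS(current, path)
  changed
}

snapshot_path <- tempfile(fileext = ".rds")  # hypothetical location

# First knit: nothing to compare against yet.
first <- compare_to_last(list(n = 100, mean_age = 41.2), snapshot_path)

# Second knit: the sample size has grown, the mean has not moved.
second <- compare_to_last(list(n = 120, mean_age = 41.2), snapshot_path)
print(second)
```

This only covers numbers and other R objects, not figures, and it only checks names present in both snapshots.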
I'm pretty new to R.
I just imported a CSV file into my R environment. I see the name of the dataframe and the name of the columns, but there is information below and I don't know what to make of it.
It looks like it might be records of the data types that R guessed when it imported the data, but I'm not sure. Can you confirm what it's communicating? Thanks!
The functions in the readr package add an extra spec attribute to the data they read in. It won't affect anything in later code. I assume it is intended for debugging: if your data is imported poorly, you can see which column types were used and then try something different with a manual override.
It's mentioned a bit in the readr vignette, especially the last two sections. You can access the spec attribute directly with either attr(your_data, "spec") or spec(your_data).
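A small sketch of both ways to look at it, assuming the readr package is installed. The file and its columns are made up for illustration.

```r
library(readr)

# Hypothetical CSV written to a temporary file so the sketch is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("station,temp", "A,21.5", "B,19.0"), csv_path)

weather <- read_csv(csv_path)

# Either call returns the column specification readr used for the import.
column_spec <- spec(weather)
same_spec   <- attr(weather, "spec")

print(column_spec)
```

The printed specification lists each column with the type readr guessed for it.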
I am learning to use R and I am working with the for loop.
Here is an example:
for (loopvalues in c(1, 5, 8, 10, 19)) {
  print(paste("The number is", loopvalues))
}
I was wondering what can be done if the list of values is as big as 100 or 1000 different values and they follow no patterns.
I imagined I can have the values saved in a csv or a txt file beforehand, but how could I tell the loop command to read the values from that file?
I am sure the question is very basic, so I thank you beforehand for your help!
For loops can be used for extremely long lists. However, you will often find that they become slow, and you will want to use other commands such as the apply family.
You do not have to type out all the values in the loop. The part after in can be any vector, including one returned by a function. Here is an example using the mtcars data set that is preloaded into R.
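# carb_value is used instead of c so the loop variable does not mask the c() function
for (carb_value in unique(mtcars$carb)) {
  print(carb_value)
}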
By using the unique function, I don't even have to know what all the possible values of mtcars$carb are but I can still loop through them.
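The same idea covers the question about reading the values from a file: read them into a vector first, then loop over that vector. Here is a minimal sketch, assuming a plain text file with one number per line. The file is created on the fly so the example runs as-is, and the last lines show the apply-style alternative.

```r
# Hypothetical values file; in practice this would already exist on disk.
values_path <- tempfile(fileext = ".txt")
writeLines(c("1", "5", "8", "10", "19"), values_path)

# scan() reads whitespace-separated numbers into a numeric vector.
loopvalues <- scan(values_path, quiet = TRUE)

# The loop no longer cares how many values there are.
for (value in loopvalues) {
  print(paste("The number is", value))
}

# Apply-family alternative: build all the messages in one call.
messages <- vapply(
  loopvalues,
  function(value) paste("The number is", value),
  character(1)
)
```

For a CSV file, read.csv() followed by picking the relevant column with $ gives you the same kind of vector.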
Additionally, you probably want to practice your googling skills instead of asking StackOverflow. Most of the questions you're going to ask when learning R are already out there.
I just found a bunch of weather data that I would like to play around with in glmnet in R. So far I've been reading and organizing the data in R, and right now I am just trying to look at the raw data of each variable. Unfortunately, each variable has a lot of data and R isn't able to print it all. Is there a way I can view all the raw data in R or just in the file itself? I've tried opening the file in Excel with no success. Thanks!
Try using frequency tables; you can group the values into segments.
str(), summary(), table(), pairs(), plot(), etc. There are several libraries (such as descr) which facilitate analyzing numerical variables and factor levels. Let me know if you need help with any.
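A short sketch of those functions in action. Since I don't have your weather file, it uses airquality, a small weather-related data set that ships with R; the 10-degree bin width is an arbitrary choice for illustration.

```r
# Compact overview: one line per column with its type and first few values.
str(airquality)

# Min, quartiles, mean and max for a single variable.
print(summary(airquality$Temp))

# Frequency table grouped into 10-degree segments.
temp_bins <- table(cut(airquality$Temp, breaks = seq(50, 100, by = 10)))
print(temp_bins)

# First rows of the raw data, instead of printing everything.
print(head(airquality, 10))
```

Summaries like these usually tell you more than scrolling through every raw value, and they stay readable no matter how large the data is.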