Apologies for what is probably an already-answered question; I couldn't find what I was looking for in the archives.
I'm currently trying to merge multiple Excel files into one data frame for analysis.
It's experimental data across different versions, and the column holding each variable differs between files (i.e., in Version 1, ReactionTime is in column AB; in Version 2 it's in AG). I need to merge the values of specified variables from the ~24 data files, each with a different column structure, into one long-format data frame.
I've only ever used an Excel macro to merge files before, and I'm unsure how to specify the variable names for merging. Any help you could provide would be appreciated!
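A minimal base-R sketch, assuming each file has a header row with the same variable names (e.g. ReactionTime) even though the column letters differ. Selecting by name rather than by position makes the column letter irrelevant. The helper stack_by_name() and the version column are hypothetical, and reading the workbooks with readxl::read_excel() is an assumption, not something the question uses:

```r
# Hypothetical helper: keep only the named variables from each data frame,
# tag each row with the file it came from, and stack everything (long format).
stack_by_name <- function(dfs, vars, id_col = "version") {
  if (is.null(names(dfs))) names(dfs) <- paste0("file", seq_along(dfs))
  pieces <- Map(function(d, id) {
    absent <- setdiff(vars, names(d))
    if (length(absent) > 0) {
      stop("missing in ", id, ": ", paste(absent, collapse = ", "))
    }
    out <- as.data.frame(d)[, vars, drop = FALSE]  # select and reorder by name
    out[[id_col]] <- id                            # remember the source file
    out
  }, dfs, names(dfs))
  combined <- do.call(rbind, unname(pieces))
  rownames(combined) <- NULL
  combined
}

# Usage sketch (assumes the readxl package and a vector of file paths):
# dfs <- lapply(files, readxl::read_excel)
# names(dfs) <- basename(files)
# long <- stack_by_name(dfs, vars = c("Subject", "ReactionTime"))
```

The function stops if a file lacks a requested variable instead of silently filling it; if filling with NA is preferred, dplyr::bind_rows() does that when stacking.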
Apologies if this has already been answered somewhere else.
So I have a dataset in R that contains a certain number of variables. When I preview it, some of the variables I need for my analysis are there, but they are not included in the variable count, as if they were subsections of the data frame itself.
Now, I managed to access some of them using sapply (these were sublists in the data frame), but there are several others I cannot reach.
I am still unable to access a column containing the country information for my dataset.
It looks as if it is contained in another variable.
Any suggestions on how to bring this variable to the same level as the others in the dataset and eliminate these nested "subfolders"?
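If the "subsections" are data-frame columns nested inside the data frame (common after importing JSON, for example), one option is to promote every nested column to the top level under a parent.child name. This is an assumption about the structure that str(df) would confirm. The helper flatten_df_cols() is hypothetical; jsonlite::flatten() does something similar for data imported with jsonlite. List columns (plain sublists) are not handled by this sketch:

```r
# Hypothetical helper: recursively lift data-frame columns to the top level,
# naming them parent.child (e.g. a nested "loc" with "country" -> "loc.country").
flatten_df_cols <- function(df, sep = ".") {
  out <- list()
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.data.frame(col)) {
      inner <- flatten_df_cols(col, sep)          # handle deeper nesting too
      names(inner) <- paste(nm, names(inner), sep = sep)
      out <- c(out, as.list(inner))
    } else {
      out[[nm]] <- col                            # ordinary column: keep as-is
    }
  }
  as.data.frame(out, stringsAsFactors = FALSE, check.names = FALSE)
}
```

After flat <- flatten_df_cols(df), the country column would appear under a name such as loc.country, where the prefix is whatever the parent column is actually called.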
I have to deal with data organized by row, so R reads observations as variables and variables as observations. I have tried to transpose using the function t(), but R converted all the data to character.
The original file is a .csv.
Thank you.
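t() works on matrices, and a matrix holds a single type, so mixing text and numbers forces everything to character. A minimal sketch that transposes and then re-infers each column's type with type.convert(), assuming the first column of the CSV holds the variable names and every other column is one observation (the helper transpose_wide() is hypothetical):

```r
# Hypothetical helper: rows = variables in the file -> columns in the result.
transpose_wide <- function(raw) {
  raw[] <- lapply(raw, as.character)         # uniform character input for t()
  var_names <- raw[[1]]                      # first column: variable names
  m <- t(as.matrix(raw[-1]))                 # one row per observation
  out <- as.data.frame(m, stringsAsFactors = FALSE)
  names(out) <- var_names
  rownames(out) <- NULL
  out[] <- lapply(out, type.convert, as.is = TRUE)  # restore numeric/logical types
  out
}

# Usage sketch (file name is a placeholder):
# raw  <- read.csv("file.csv", header = FALSE, colClasses = "character")
# tidy <- transpose_wide(raw)
```

Reading everything as character first avoids R guessing types for the wrong orientation; the real types are inferred only once each variable is a column.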
It's the first time I've dealt with MATLAB files in R.
We saved the information as a .mat file because of its length (the dataset contains 226518 rows); we were worried that Excel (and then a CSV) would not hold them.
I can upload the original file if necessary.
So I have my MATLAB file, and when I open it in MATLAB everything looks fine.
There are various arrays, and the one I want is called "allPoints".
I can open it and see that it contains values around 0.something.
What I want to do is to extract the same data in R.
library(R.matlab)
df <- readMat("170314_Col_HD_R20_339-381um_DNNhalf_PPP1-EN_CellWallThickness.mat")
str(df)
And here I get stuck. How do I pull "allPoints" out of it? $ does not seem to work.
I will have multiple files that need to be combined into a single data frame in R, so the plan is to mutate each extracted df to add a column for the sample and then rbind them together.
Could anybody help?
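readMat() returns a named list of the variables in the file, so mat[["allPoints"]] should reach the array if that is its name in names(mat); if $ fails, printing names(mat) shows what the element is actually called. A minimal sketch of the extract, label and rbind plan, assuming allPoints is a plain numeric matrix (if it is a MATLAB struct, readMat() returns a nested structure that needs further unpacking). The helper extract_points() and the sample column are hypothetical:

```r
# Hypothetical helper: pull one variable out of a readMat()-style named list,
# turn it into a data frame and tag every row with a sample label.
extract_points <- function(mat, sample_id, var = "allPoints") {
  if (!var %in% names(mat)) {
    stop("'", var, "' not found; available: ", paste(names(mat), collapse = ", "))
  }
  pts <- as.data.frame(as.matrix(mat[[var]]))  # columns become V1, V2, ...
  pts$sample <- sample_id
  pts
}

# Usage sketch over several .mat files (requires the R.matlab package):
# files <- list.files(pattern = "\\.mat$")
# all_points <- do.call(rbind, lapply(files, function(f)
#   extract_points(R.matlab::readMat(f), sample_id = basename(f))))
```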
Can someone provide equivalent code in SPSS for merging datasets, to replicate the rbind and cbind commands available in R? Many thanks!
To add rows from dataset1 to dataset2, you can use ADD FILES. This works best when both datasets hold the same variables, with matching variable names and formats; variables present in only one file are set to missing for the cases from the other.
To add columns from dataset1 to dataset2, use MATCH FILES. This command matches the values of the desired variables in dataset2 to the right rows in dataset1 using keys present in both files (such as a respondent ID). The keys are defined in the BY subcommand, and both files must be sorted in ascending order of the key variables before matching.
Please note that R and SPSS work in totally different ways. In short, SPSS (mainly) works with datasets in which variables are defined and formatted, while R can handle single values, vectors, matrices, data frames, etc. Simply copying columns from an existing dataset into another one (without paying attention to how the files are sorted), or adding rows without matching the variable names and types of the existing dataset, is very unusual in SPSS.
If you post an example of what you are trying to achieve, I could give you a more useful answer...
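For comparison, a small R sketch with made-up data of what the two SPSS commands roughly correspond to: rbind() on data frames stacks rows and matches columns by name (roughly ADD FILES), while merge(..., by = "id") joins on a key rather than on row position (roughly MATCH FILES with BY). A bare cbind() pairs rows purely by position, which is the pitfall mentioned in the note on sorting:

```r
d1 <- data.frame(id = c(1, 2, 3), age = c(30, 41, 25))
d2 <- data.frame(id = c(3, 1, 2), score = c(7, 9, 4))   # same people, different order

# Roughly ADD FILES: append cases; rbind() matches columns by name
new_cases <- data.frame(age = 52, id = 4)
stacked <- rbind(d1, new_cases)

# Roughly MATCH FILES /BY id: join on the key, not on row position
matched <- merge(d1, d2, by = "id")

# Positional pairing: the row for id 1 gets id 3's score
naive <- cbind(d1, score = d2$score)
```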
I am very familiar with Excel but new to R. I have several years worth of data across multiple spreadsheets:
data1996.csv
data1997.csv
...
data2013.csv
Each CSV is about 500,000 rows by 1,700 columns.
I want to manipulate this data in D3 and plan to remove columns that are not essential to the calculations. My goal is to create a year slider that displays the corresponding visualization. What is the easiest way to aggregate these massive datasets? I suppose it could be done manually, but that would be cumbersome and inefficient.
I am very new to R, but if there is another way to aggregate the data into a single CSV, that would work fine as well.
Any help and suggestions are appreciated.
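A minimal base-R sketch, assuming every file follows the dataYYYY.csv naming shown above and contains the columns you want to keep. It reads only the needed columns (a colClasses entry of "NULL" tells read.csv() to skip that column, which matters at 1,700 columns), adds a year column taken from the file name, stacks the years and optionally writes one CSV. The helper combine_years() and any column names passed to it are hypothetical; for files this size, data.table::fread() with its select argument is a faster alternative:

```r
# Hypothetical helper: combine dataYYYY.csv files, keeping only `keep` columns.
# `keep` must use the column names as R reports them (after make.names()).
combine_years <- function(dir, keep, out_file = NULL) {
  files <- list.files(dir, pattern = "^data[0-9]{4}\\.csv$", full.names = TRUE)
  pieces <- lapply(files, function(f) {
    hdr <- names(read.csv(f, nrows = 1))        # header as R sees it
    cls <- rep("NULL", length(hdr))             # "NULL" = skip the column
    cls[hdr %in% keep] <- NA                    # NA = let R guess the type
    d <- read.csv(f, colClasses = cls, stringsAsFactors = FALSE)
    d <- d[, keep, drop = FALSE]                # same column order every year
    d$year <- as.integer(sub("^data([0-9]{4})\\.csv$", "\\1", basename(f)))
    d
  })
  combined <- do.call(rbind, pieces)
  if (!is.null(out_file)) write.csv(combined, out_file, row.names = FALSE)
  combined
}

# Usage sketch: all_years <- combine_years(".", keep = c("colA", "colB"), out_file = "all_years.csv")
```

The single output file then has one row per record with a year column, which is a convenient shape for filtering by a slider value in D3.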