read.spss Extra variables/fields - r

With the foreign package, I'm reading in a .sav file. When I open the file with PSPP there are 95 variables, but read.spss("file") returns a list of 353 variables. The extra variables are blank-filled fields of 220 spaces. Has anyone else experienced this?
Before you ask, I am unable to provide a reproducible example, as the data file and its contents are proprietary.
One obvious solution would be to search for list elements that contain only spaces and set those elements to NULL, or to set each 220-space value to NA and then drop the all-NA columns.
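That workaround would look roughly like this (a minimal sketch; the file name is a placeholder and I'm assuming the blank fields come through as character columns):

library(foreign)

dat <- read.spss("file.sav", to.data.frame = TRUE)
# TRUE for character columns whose values are nothing but whitespace
all_blank <- vapply(dat, function(x) is.character(x) && all(trimws(x) == ""), logical(1))
dat <- dat[, !all_blank, drop = FALSE]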
But I'd like to avoid having to post-process my files if possible. Does anyone have a fix for this?

I've had something similar before. This happened when the data was exported from SPSS CATI (the field interview application), rather than the SPSS we know and love.
In my case the resolution was to play around with the arguments to read.spss. I found that setting use.missings=FALSE resolved the problem, i.e. something like:
read.spss(global$datafile, to.data.frame=TRUE, use.missings=FALSE)
Good luck, and my sympathy. I know how frustrating this was for me.

Related

Add row into table even when some data is not found in Power Automate?

Hi, I am trying to use AI Builder to scan some titles and populate a spreadsheet once the PDF is dropped into a folder.
It works fine when it finds all the data, but if it cannot find any data in the columns starting with SOL then it doesn't bring anything through. I would like it to still bring through any data from the first 3 columns even if nothing is found for the "SOL" columns. Can anyone help please?
Example output as needed; currently row 3 will not come through.
I tried some conditions and a Compose action.
Maybe you can also post your message in the Power Automate community.

Update a table with English words/vocabulary (tsv file) with dictionary definitions

I have a .tsv file with a list of English words/vocabulary from my Amazon Kindle, without translations or dictionary definitions, and I want to update the table with each word's definition from some dictionary. There are over 1,000 words on that list and I have no way of doing this manually.
Is there any app or program that might do the trick?
If programming something is necessary, I'm pretty good in R and know a bit of Swift. I haven't found an R package that might apply.
Does anyone have any ideas? I would really appreciate it. Thanks!
Here is a sample.
Most of that table is blank on the right side. I'd like some sort of definition for each word in those blanks.
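If R is the way to go, one idea I'm toying with would be to hit a free dictionary web API for each word, roughly like this (dictionaryapi.dev is just an example; the file name and the assumption that the words are in the first column are placeholders, and the JSON parsing may need adjusting to the structure actually returned):

library(jsonlite)

words <- read.delim("vocab.tsv", stringsAsFactors = FALSE)

lookup_definition <- function(word) {
  url <- paste0("https://api.dictionaryapi.dev/api/v2/entries/en/", utils::URLencode(word))
  res <- tryCatch(fromJSON(url), error = function(e) NULL)
  if (is.null(res)) return(NA_character_)
  # take the first definition of the first meaning of the first entry returned
  res$meanings[[1]]$definitions[[1]]$definition[1]
}

words$definition <- vapply(words[[1]], lookup_definition, character(1))
write.table(words, "vocab_defined.tsv", sep = "\t", row.names = FALSE, quote = FALSE)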

RStudio won't recognize my variables? Any ideas why?

So I imported the file and saved it as Purity, and it's clearly imported. I tried a t-test but it doesn't recognize my variables. I used the names function to retrieve the variable names and they are exactly what I'm entering, V1 and V2. I also tried Lab-1 and Lab-2, and just using dataset=Purity, all to no avail.
I took a screenshot to show the code and that the data is loaded in RStudio; can anyone tell me why this is not working?
Apologies if this is painfully obvious; I was only introduced to R for stats last week and am still a beginner, and I don't have much experience with programming in general. I have looked at other similar problems but just can't see why my variables aren't being recognised when others' are.
You've got 2 issues here:
1) You don't show how you imported the dataset, but you need to either remove the first row or (better) name the columns correctly. I'm assuming you imported the data with read.table(); if so, include the argument header = TRUE when you import it.
2) You need to tell R where to find Lab-1 and Lab-2, and because a hyphen isn't valid in a bare variable name you have to wrap those names in backticks:
with(Purity, t.test(`Lab-1`, `Lab-2`, paired = TRUE))
R is case sensitive. It looks like your script is using a lowercase 'v' where it's an uppercase 'V' in your variable names.
The problem is in the way you have named your variables. R doesn't accept the hyphen (-) as a legitimate part of a variable name. Try using an underscore (_) instead.
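Putting those suggestions together, a minimal sketch (the file name and the renamed columns are assumptions):

Purity <- read.table("purity.txt", header = TRUE)  # keep the first row as column names
names(Purity) <- c("Lab_1", "Lab_2")               # underscores instead of hyphens
with(Purity, t.test(Lab_1, Lab_2, paired = TRUE))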

Specify end of record (EOL) delimiter while importing from a text file?

I'm trying to import into R a large number of pipe-delimited files that were created in a Windows environment, with CR+LF as the end-of-record (=EOL) delimiter. However, they also have stray CRs scattered about periodically, which results in frequent, inappropriately split lines. Ideally, I want an efficient way to solve this problem from within R, either by finding a way to specify the EOL delimiter when I import, or, if necessary, by reading in the text file and excising the CRs before any parsing of lines is done.
The creators of the files comment on this problem and recommend adding "TERMSTR= CRLF" into your SAS code, and I can find lots of discussions of how to do this in other languages as well. For R, however, all I can find is this discussion, here on stackoverflow:
Possible to change the record delimiter in R?
The sample problem given is a great match for my problem. The solution identified is nice for their specific situation of having a single file like this, but for me it would require coding up separate scripts for importing each of the dozens of files, since each has different primary keys that would need to be recognized after the fact to repair the inappropriate import. Alternatively, I could open each file in something like Notepad++ to remove the extra CRs, but again, that seems quite inefficient and would have to be repeated by hand every time the initial data set was updated by its producers.
Given how frequent a problem this seems to be, and the existence of hard-coded solutions in other programming languages, I'm confused as to why there isn't a fix in R and feel like I must be missing something. It seems clear (I think?) that there's no way to do this directly from read.table or even from readLines, but is there perhaps a way to do this using scan that I'm missing?
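For concreteness, the fallback I have in mind would be something like this (a rough sketch; the file name is a placeholder): read the file as raw text, strip the CRs that aren't part of a CRLF pair, then parse the cleaned text.

raw_txt <- readChar("data.txt", file.size("data.txt"), useBytes = TRUE)
clean_txt <- gsub("\r(?!\n)", "", raw_txt, perl = TRUE)  # drop CRs not followed by LF
dat <- read.table(text = clean_txt, sep = "|", header = TRUE, quote = "", stringsAsFactors = FALSE)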
Thanks for any thoughts!

Reading in large text files in R

I want to read in a large ido file that has just under 110,000,000 rows and 8 columns. The columns are made up of 2 integer columns and 6 logical columns, and the delimiter "|" is used in the file. I tried using read.big.matrix and it took forever. I also tried dumpDf and it ran out of RAM. I tried ff, which I heard was a good package, but I am struggling with errors. I would like to do some analysis with this table if I can read it in some way. If anyone has any suggestions that would be great.
Kind Regards,
Lorcan
Thank you for all your suggestions. I managed to figure out what was causing the error. I'll share the answer and suggestions so no one makes my mistake again.
First of all, the data that was given to me contained some errors, so I was doomed to fail from the start. I was unaware of this until a colleague came across it in another piece of software. A column that should have contained only integers had some letters in it, so when read.table.ffdf tried to read in the data set it got confused. In any case, I was given another sample of data, 16,000,000 rows and 8 columns with correct entries, and it worked perfectly. The code that I ran is as follows and took about 30 seconds to read:
setwd("D:/data test")
library(ff)
ffdf1 <- read.table.ffdf(file = "test.ido", header = TRUE, sep = "|")
Thank you all for your time and if you have any questions about the answer feel free to ask and I will do my best to help.
Do you really need all the data for your analysis? Maybe you could aggregate your dataset (say, from minute values to daily averages). This aggregation only needs to be done once, and can hopefully be done in chunks. That way you do not need to load all your data into memory at once.
Reading in chunks can be done using scan; the important arguments are skip and n. Alternatively, put your data into a database and extract the chunks that way. You could even use the functions from the plyr package to process chunks in parallel; see this blog post of mine for an example.
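A rough sketch of that chunked approach (the file name, column types, and chunk size are assumptions; it uses a connection with nlines so the file is only scanned once, rather than re-reading from the start with skip):

con <- file("test.ido", open = "r")
invisible(readLines(con, n = 1))  # skip the header line
chunk_size <- 1e6
repeat {
  chunk <- scan(con, what = list(integer(), integer(), logical(), logical(),
                                 logical(), logical(), logical(), logical()),
                sep = "|", nlines = chunk_size, quiet = TRUE)
  if (length(chunk[[1]]) == 0) break
  # ... aggregate or summarise this chunk before reading the next one ...
}
close(con)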
