I am new to R and trying to use it to run a report I currently produce in Excel. Most of the topics here have been very helpful for translating Excel formulas into R code; however, I am struggling to write code for the Excel IF statement below:
=IF(AND(G2="SEA",OR(F2="FCL",F2="BCN")),W2*40,IF(G2="AIR",X2/1000*66,""))
Column G corresponds to Container/Product.
Column F corresponds to Transport Mode.
Columns AI and AJ correspond to the volumes associated with each transport mode.
Appreciate all the help. Thanks
Here's the link to data exported to R
We can do a nested ifelse after reading the dataset:
df1 <- read.csv("yourfile.csv", stringsAsFactors = FALSE)
# columns: 7 = G (Container/Product), 6 = F (Transport Mode), 35 = AI, 36 = AJ
ifelse(df1[,7] == "SEA" & df1[,6] %in% c("FCL", "BCN"),
       df1[,35] * 40,
       ifelse(df1[,7] == "AIR", df1[,36] / 1000 * 66, NA))
NOTE: Here we are referring to numeric indices for extracting the columns, as a reproducible example was not shown.
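If your export keeps the original column headers, the same logic can also be written against column names with dplyr::case_when; a sketch, assuming the columns are called Product, Mode, VolumeSea and VolumeAir (adjust these to your actual names):

library(dplyr)

df1 <- df1 %>%
  mutate(charge = case_when(
    Product == "SEA" & Mode %in% c("FCL", "BCN") ~ VolumeSea * 40,
    Product == "AIR" ~ VolumeAir / 1000 * 66,
    TRUE ~ NA_real_
  ))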
I am analysing student-level data from PISA 2015. The data is available in SPSS format here.
I can load the data into R using the read_sav function in the haven package. I need to be able to edit the data in R and then save/export the data in SPSS format with the original value labels that are included in the SPSS download intact. The code I have used is:
library(haven)
student <- read_sav("CY6_MS_CMB_STU_QQQ.sav", user_na = TRUE)
student2 <- data.frame(student)
# some edits to data
write_sav(student2, "testdata1.sav")
When my colleague (who works in SPSS) tries to open "testdata1.sav", the value labels are missing. I've read through the haven documentation and can't seem to find a solution for this. I have also tried read.spss/write.spss in the foreign package, but have issues loading the dataset.
I am using R version 3.4.0 and the latest build of haven.
Does anyone know if there is a solution for this? I'd be very grateful of your help. Please let me know if you require any additional information to answer this.
library(foreign)
df <- read.spss("spss_file.sav", to.data.frame = TRUE)
This may not be exactly what you are looking for, because it uses the labels as the data. So if you have an SPSS file with 0 for "Male" and 1 for "Female," you will have a df with values that are all Males and Females. It gets you one step further, but perhaps isn't the whole solution. I'm working on the same problem and will let you know what else I find.
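If you would rather keep the underlying numeric codes instead of converting them to labels, read.spss has a use.value.labels argument; a minimal sketch:

library(foreign)

# keep the numeric codes instead of converting labelled values to factors
df_codes <- read.spss("spss_file.sav", to.data.frame = TRUE,
                      use.value.labels = FALSE)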
library ("sjlabelled")
student <- sjlabelled::read_spss("CY6_MS_CMB_STU_QQQ.sav")
student2 <-student
write_spss(student2,"testdata1.sav")
I did not try it myself, but I hope it works. The sjlabelled package handles non-ASCII characters such as German umlauts well.
But keep in mind that R saves the labels as attributes. These attributes can be lost when doing data transformations (subsetting the data, for example). Once they are lost in R, they won't show up in SPSS either. The sjlabelled::copy_labels function is helpful in those cases:
student2 <- copy_labels(student2, student) #after data transformations and before export to spss
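For example, a rough sketch of the full round trip (the subsetting condition and column name are made up for illustration):

library(sjlabelled)

student <- sjlabelled::read_spss("CY6_MS_CMB_STU_QQQ.sav")

# any transformation such as subsetting may drop the label attributes ...
student2 <- student[student$SOME_FILTER_COLUMN == 1, ]  # hypothetical column

# ... so copy them back from the original object before exporting
student2 <- copy_labels(student2, student)
write_spss(student2, "testdata1.sav")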
I think you need to recover the value labels in the data frame after importing the dataset into R, and then write that data frame to a .sav file.
# load libraries
library(haven)
library(purrr)
library(dplyr)

# load dataset
student <- read_sav("CY6_MS_CMB_STU_QQQ.sav", user_na = TRUE)

# map over the columns to find the class of each one
map_dataset <- map(student, function(x) attr(x, "class"))

# run a for loop to identify all haven-labelled variables
factor_variable <- c()
for (i in 1:length(map_dataset)) {
  if (!is.null(map_dataset[[i]])) {
    name <- names(map_dataset[i])
    factor_variable <- c(factor_variable, name)
  }
}

# convert all haven-labelled variables into factors
student2 <- student %>%
  mutate_at(factor_variable, as_factor)

# write dataset
write_sav(student2, "testdata1.sav")
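A shorter alternative, assuming a reasonably recent haven and dplyr, is to convert only the labelled columns directly and skip the loop:

library(dplyr)
library(haven)

# convert every haven-labelled column to a factor, leaving the rest untouched
student2 <- student %>%
  mutate_if(haven::is.labelled, haven::as_factor)

write_sav(student2, "testdata1.sav")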
We are working in Stata with data created in R that have been exported using the haven package. We stumbled upon an issue with variables that have a dot in their name. To replicate the problem, here is some minimal R code:
library("haven")
var.1 <- c(1,2,3)
var_2 <- c(1,2,3)
test_df <- employ.data <- data.frame(var.1, var_2)
str(test_df)
write_dta(test_df, "D:/test_df.dta")
Now, in Stata, when I do:
use "D:\test_df.dta"
d
First problem: I get an empty dataset. Second problem: we get a variable name with a dot, which in Stata should be illegal. Therefore any command using the variable name directly, such as
drop var.1
returns an error:
factor variables and time-series operators not allowed
r(101);
What is causing such behaviour? Any solutions to this problem?
This will drop var.1 in Stata:
drop var?1
Here (as in Excel), ? is used as a wildcard for a single character (the regular-expression equivalent of .).
Unfortunately, this will also drop var_1, if it exists.
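If you can change the R side, renaming the offending columns before exporting avoids the issue entirely; a minimal sketch using the example data above:

library(haven)

# replace dots with underscores so the names are legal Stata names
names(test_df) <- gsub(".", "_", names(test_df), fixed = TRUE)

write_dta(test_df, "D:/test_df.dta")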
I am not sure about the missing values when writing a .dta file with haven. I am able to replicate this result in Stata 14.1 and haven 0.2.0.
However, using the read_dta function from haven,
temp2 <- read_dta("test_df.dta")
returns the data.frame. As an alternative to haven, I have used the readstata13 package in the past without issues.
library(readstata13)
save.dta13(test_df, "testdf.dta")
While this code has the same variable-name issue, it produced a .dta file that contained the correct values when read into Stata 14.1. There is a convert.underscore argument to save.dta13 that is intended to remove characters that are not valid in Stata variable names. I verified that it works properly in this example with readstata13 version 0.8.5, but it had a bug in some earlier versions, including 0.8.2.
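For completeness, a minimal sketch of that option:

library(readstata13)

# convert.underscore replaces characters that are invalid in Stata variable names
save.dta13(test_df, "testdf.dta", convert.underscore = TRUE)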
Is there any way to replace NA with a blank or nothing, without converting the columns to character?
I used
library(data.table)
data_model <- sapply(data_model, as.character)
data_model[is.na(data_model)] <- " "
data_model <- data.table(data_model)
however, it converts all the columns to character.
I want to save the data set and use it in SAS, which does not understand NA.
Here's a somewhat belated (and shamelessly self-promoting) answer from The R Primer on how to export a data frame to SAS. It should handle your NAs correctly and automatically:
First you can use the foreign package to export the data frame as a SAS xport dataset. Here, I'll just export the trees data frame.
library(foreign)
data(trees)
write.foreign(trees, datafile = "toSAS.dat",
              codefile = "toSAS.sas", package = "SAS")
This gives you two files, toSAS.dat and toSAS.sas. It is easy to get the data into SAS since the codefile toSAS.sas contains a SAS script that can be read and interpreted directly by SAS and reads the data in toSAS.dat.
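If a plain-text export is acceptable for your SAS import, another option is to leave the column types alone and simply write NA as an empty string; a minimal sketch:

# write missing values as blanks without changing any column types
write.csv(data_model, "data_model.csv", na = "", row.names = FALSE)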
I have successfully added information to shapefiles before (see my post on http://rusergroup.swansea.ac.uk/Healthmap.ashx?HL=map ).
However, I just tried to do it again with a slightly different shapefile (new local health boards for Wales) and the code fails at spCbind with a "row names not identical" error:
o <- match(wales.lonlat$NEW_LABEL, wds$HB_CD)
wds.xtra <- wds[o,]
wales.ncchd <- spCbind(wales.lonlat, wds.xtra)
My rows did have different names before and that didn't cause any problems. I relabeled the column in wds.xtra to match "NEW_LABEL" and that doesn't help.
The labels and order of labels do match exactly between wales.lonlat and wds.xtra.
(I'm using Revolution R 5.0, which is built on R 2.13.2)
I use match to merge data into the sp object's data slot based on row names (or any other common ID). This avoids needing maptools for the spCbind function.
# Based on row names
sdata@data = data.frame(sdata@data, new.df[match(rownames(sdata@data), rownames(new.df)), ])
# Based on a common ID
sdata@data = data.frame(sdata@data, new.df[match(sdata@data$ID, new.df$ID), ])
# where sdata is your sp object and new.df is a data.frame object that you want to merge into sdata
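With a more recent version of sp, the same attribute join can also be done with its merge method, which keeps the geometry and data aligned for you (the ID column name is an assumption):

library(sp)

# sdata is the Spatial*DataFrame, new.df the attribute table to attach
sdata <- merge(sdata, new.df, by = "ID")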
I had the same error and could resolve it by dropping all the other columns that were not actually meant to be added. I suppose they confused spCbind because the matching tried to match all row elements, not only the one given. In my example, I used
xtra2 <- data.frame(xtra$ID_3, xtra$COMPANY)
to extract the relevant fields and then fed them to spCbind afterwards:
gadm <- spCbind(gadm, xtra2)
I have a question about using sqlSave. How does RODBC map the columns of a data frame to the columns of the database table?
If I have a table with columns X and Y and a data frame with columns X and Y, RODBC puts X into X and Y into Y (I found out by trial and error). But can I explicitly tell R how to map data.frame columns to database table columns, for example to put A into X and B into Y?
I'm rather new to R and find the RODBC manual a bit cryptic, and I can't find an example on the internet.
I'm now doing it this way (maybe that's also what you meant):
colnames(dat) <- c("A", "B")
sqlSave(channel, dat, tablename = "tblTest", rownames=FALSE, append=TRUE)
It works for me. Thanks for your help.
You should find the fine R manuals of great help as you start to explore R, and its help facilities are very good too.
If you start with
help(sqlSave)
you will see the colNames argument. Supplying a vector c("A", "B") would put your first data.frame column into a table column A etc.
I'm having massive problems using sqlSave with an IBM DB2 database. I'm working around it by using sqlQuery to create the table with the correct formatting first, and then using sqlSave with append = TRUE to force my R data frame into that database table. This resolves a lot of problems, such as date formats and floating-point numbers (instead of doubles).
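A minimal sketch of that workaround (the DSN, table name and column definitions are assumptions):

library(RODBC)

channel <- odbcConnect("myDB2DSN")

# create the table yourself so the column types are exactly what you want
sqlQuery(channel, "CREATE TABLE tblTest (A DATE, B DOUBLE)")

# then append the data frame into the pre-built table
sqlSave(channel, dat, tablename = "tblTest",
        rownames = FALSE, append = TRUE, fast = FALSE)

odbcClose(channel)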