I'm trying to create a sparse matrix and get this error message:
Error: fnames == names(mf) are not all TRUE
I think it has something to do with the column names of my data. Maybe you can help.
Here are the column names:
colnames(trainDataShrinkage) <- c("Bildungsgrad2_Lower_secondary_education"
,"Bildungsgrad3_Upper_secondary_education"
,"Bildungsgrad4_Post-secondary_non-tertiary_education"
,"Bildungsgrad5_Short-cycle_tertiary_education"
,"Bildungsgrad6_Bachelors_or_equivalent_level"
,"Bildungsgrad7_Masters_or_equivalent_level"
,"Bildungsgrad8_Doctoral_or_equivalent_level"
,"Familienstand2_Verheiratet,_getrenntlebend"
,"Familienstand3_Ledig"
,"Familienstand4_Geschieden,_eing._gleichg._Partn._aufgehoben"
,"Familienstand5_Verwitwet,_Lebenspartner/in_verstorben"
,"Familienstand6_Eing._gleichg._Partn.,_zusammenlebend"
,"Geschlecht2_Weiblich"
,"Migrationshintergrund2_direkter_Migrationshintergrund"
,"Migrationshintergrund3_indirekter_Migrationshintergrund"
,"Bundesland2_Hamburg"
,"Bundesland3_Niedersachsen"
,"Bundesland4_Bremen"
,"Bundesland5_Nordrhein-Westfalen"
,"Bundesland6_Hessen"
,"Bundesland7_Rheinland-Pfalz"
,"Bundesland8_Baden-Wuerttemberg"
,"Bundesland9_Bayern"
,"Bundesland10_Saarland"
,"Bundesland11_Berlin_(West_und_Ost)"
,"Bundesland12_Brandenburg"
,"Bundesland13_Mecklenburg-Vorpommern"
,"Bundesland14_Sachsen"
,"Bundesland15_Sachsen-Anhalt"
,"Bundesland16_Thueringen"
,"Unternehmengroesse2_5 bis_10"
,"Unternehmengroesse3_11_bis_unter_20"
,"Unternehmengroesse6_20_bis_unter_100"
,"Unternehmengroesse7_100_bis_unter_200"
,"Unternehmengroesse9_200_bis_unter_2000"
,"Unternehmengroesse10_2000_und_mehr"
,"Erwerbsstatus2_Teilzeitbeschaeftigung"
,"Erwerbsstatus4_Geringfuegig_beschaeftigt"
,"Stundenlohn"
,"AlterEins"
,"AlterZwei"
,"AlterDrei"
,"AlterFuenF"
,"AlterSechs"
,"BildungsjahreEins"
,"BildungsjahreZwei"
,"BildungsjahreDrei"
,"BildungsjahreVier"
,"BildungsjahreFuenf"
,"ArbeitsmarkterfahrungVollzeitEins"
,"ArbeitsmarkterfahrungVollzeitZwei"
,"ArbeitsmarkterfahrungVollzeitDrei"
,"ArbeitsmarkterfahrungVollzeitVier"
,"ArbeitsmarkterfahrungVollzeitFuenf"
,"ArbeitsmarkterfahrungTeilzeitEins"
,"ArbeitsmarkterfahrungTeilzeitZwei"
,"ArbeitsmarkterfahrungTeilzeitDrei"
,"ArbeitsmarkterfahrungTeilzeitVier"
,"ArbeitsmarkterfahrungTeilzeitFuenf"
,"BruttoverdienstLetztenMonatEins"
,"BruttoverdienstLetztenMonatZwei"
,"BruttoverdienstLetztenMonatDrei"
,"BruttoverdienstLetztenMonatVier"
,"BruttoverdienstLetztenMonatFuenf")
It does not like some special characters in the column names. I faced issues with column names starting with #, 1, . and /. Can you try replacing these occurrences with '_'?
The simplest way would be to strip any special characters from your column names. Let me know if you cannot rename them due to some limitation.
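A minimal base-R sketch of that renaming, assuming (as the answer suggests) that the problem is any character other than letters, digits and underscores. The helper name sanitize_names is hypothetical, and only a few of the question's names are used:

```r
# Hypothetical helper: replace characters that model formulas cannot parse
sanitize_names <- function(nms) {
  out <- gsub("[^A-Za-z0-9_]", "_", nms)  # "-", ",", "/", "(", ")", "." and spaces -> "_"
  out <- gsub("_+", "_", out)             # collapse runs such as "__" into a single "_"
  out <- sub("_$", "", out)               # drop a trailing "_" left by a closing bracket
  make.names(out, unique = TRUE)          # fix names starting with a digit, de-duplicate
}

nms <- c("Bildungsgrad4_Post-secondary_non-tertiary_education",
         "Familienstand2_Verheiratet,_getrenntlebend",
         "Bundesland11_Berlin_(West_und_Ost)",
         "Unternehmengroesse2_5 bis_10",
         "Stundenlohn")

# Names that make.names() would alter are the non-syntactic ones
nms[make.names(nms) != nms]

sanitize_names(nms)
```

With the real data this would be colnames(trainDataShrinkage) <- sanitize_names(colnames(trainDataShrinkage)) before building the matrix. Calling make.names() alone would also give syntactic names, but it substitutes "." rather than "_".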
Related
I am trying to clean a dataset with the column: ltaCpInfoDF$weekdays_rate_1
For some of the rows, I would like to do this:
input: Daily(7am-11pm): $1.20 ; output: 7am-11pm: $1.20
The values within the bracket can be different timings for the rows.
Initially, I thought of removing it piece by piece: first removing "Daily(" with gsub, then removing ")". However, I ran into issues with that.
ltaCpInfoDF$weekdays_rate_1 <- gsub("Daily(", "", ltaCpInfoDF$weekdays_rate_1)
Here is the error shown:
Error in gsub("Daily(", "", ltaCpInfoDF$weekdays_rate_1) :
invalid regular expression 'Daily(', reason 'Missing ')''
In addition: Warning message:
In gsub("Daily(", "", ltaCpInfoDF$weekdays_rate_1) :
TRE pattern compilation error 'Missing ')''
Could someone share with me a better way? Thank you in advance!
Use sub with a capture group:
input <- "Daily(7am-11pm): $1.20"
output <- sub("\\S+\\s*\\((.*?)\\)", "\\1", input)
output
[1] "7am-11pm: $1.20"
Alternatively, without a capture group:
gsub("^[^(]+\\(|\\)", "", str1)
[1] "7am-11pm: $1.20"
data
str1 <- "Daily(7am-11pm): $1.20"
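The asker's original two-step idea also works once "(" is no longer read as a regex metacharacter. A sketch using fixed = TRUE (literal matching); the second element of rates is made up to show a different timing:

```r
rates <- c("Daily(7am-11pm): $1.20", "Daily(8am-6pm): $0.60")  # second value is hypothetical

# fixed = TRUE matches the pattern literally, so "(" needs no escaping
step1 <- sub("Daily(", "", rates, fixed = TRUE)
step2 <- sub(")", "", step1, fixed = TRUE)
step2

# Regex equivalent: escape each bracket with a double backslash instead
sub("\\)", "", sub("Daily\\(", "", rates))
```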
I would like to prepend "miR-" to strings that do not already start with "miR", and skip those that do.
paste("miR", ....)
in
c("miR-26b", "miR-26a", "1297", "4465", "miR-26b", "miR-26a")
out
c("miR-26b", "miR-26a", "miR-1297", "miR-4465", "miR-26b", "miR-26a")
One way is to remove a leading "miR-" with sub() where it is present, and then paste "miR-" onto every string regardless.
paste0("miR-", sub("^miR-","", x))
#[1] "miR-26b" "miR-26a" "miR-1297" "miR-4465" "miR-26b" "miR-26a"
data
x <- c("miR-26b", "miR-26a", "1297", "4465", "miR-26b", "miR-26a")
vec <- c("miR-26b", "miR-26a", "1297", "4465", "miR-26b", "miR-26a")
sub("^(?!miR)(.*)$", "miR-\\1", vec, perl = TRUE)
#[1] "miR-26b" "miR-26a" "miR-1297" "miR-4465" "miR-26b" "miR-26a"
If you want to learn more:
type ?sub into the R console
learn regex; have a closer look at negative lookahead and capturing groups
I've used perl = TRUE because R's default regex engine does not support lookaheads such as (?!miR), so without it sub() throws an error.
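A lookahead-free alternative, which therefore needs no perl = TRUE, tests the prefix with base R's startsWith() and pastes only where it is missing:

```r
vec <- c("miR-26b", "miR-26a", "1297", "4465", "miR-26b", "miR-26a")

# Keep elements that already start with "miR"; prefix the rest with "miR-"
out <- ifelse(startsWith(vec, "miR"), vec, paste0("miR-", vec))
out
```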
This issue is giving me a lot of trouble, even though it should be easy to fix. I have a dataset with the columns id and poster. I want to change the poster's value if the id value contains a certain string. See data below:
test_df
id poster
143537222999_2054 Kevin
143115551234_2049 Dave
14334_5334 Eric
1456322_4334 Mandy
143115551234_445633 Patrick
143115551234_4321 Lars
143537222999_56743 Iris
I would like to get
test_df
id poster
143537222999_2054 User
143115551234_2049 User
14334_5334 Eric
1456322_4334 Mandy
143115551234_445633 User
143115551234_4321 User
143537222999_56743 User
Both the columns are characters. I would like to change the poster's value to "User" if id value contains "143537222999", OR "143115551234". I have tried the following codes:
Match within/which
test_df <- within(test_df, poster[match('143115551234', test_df$id) | match('143537222999', test_df$id)] <- 'User')
This code gave me no errors, but it didn't change any of the values in the poster column. When I replace within with which, I get the error:
test_df <- which(test_df, poster[match('143115551234', test_df$id) | match('143537222999', test_df$id)] <- 'User')
Error in which(test_df, poster[match("143115551234", test_df$id) | :
argument to 'which' is not logical
Match different variant
test_df <- test_df[match(id, test_df, "143115551234") | match(id, test_df, "143537222999"), test_df$poster] <- 'User'
This code gives me the error:
Error in `[<-.data.frame`(`*tmp*`, match(id, test_df, "143115551234") | :
missing values are not allowed in subscripted assignments of data frames
In addition: Warning messages:
1: In match(id, test_df, "143115551234") :
NAs introduced by coercion to integer range
2: In match(id, test_df, "143537222999") :
NAs introduced by coercion to integer range
After looking up this error, I found out that integers in R are 32-bit and the maximum value of an integer is 2147483647. I'm not sure why I'm getting this error, because R says my column is character.
> lapply(test_df, class)
$poster
[1] "character"
$id
[1] "character"
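Why the match() attempts behave this way can be checked directly. match(x, table, nomatch) tests whole-element equality, not substrings, and its third argument is nomatch, which R coerces to an integer. "143115551234" is larger than .Machine$integer.max, hence the coercion warning even though id is character. A small sketch on three of the ids:

```r
ids <- c("143537222999_2054", "143115551234_2049", "14334_5334")

# match() looks for an element exactly equal to "143115551234": none exists
match("143115551234", ids)

# Passed as the third argument (nomatch), the string must become an integer,
# but it exceeds .Machine$integer.max, so the result is NA (with a warning)
suppressWarnings(as.integer("143115551234"))
.Machine$integer.max

# grepl() is the substring test the question actually needs
grepl("143115551234", ids, fixed = TRUE)
```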
Grepl
test_df[grepl("143115551234", id | "143537222999", id), poster := "User"]
This code raises the error:
Error in `:=`(poster, "User") : could not find function ":="
I'm not sure what the best way to fix this is; I have tried multiple variations and keep running into different errors.
I have tried multiple answers from previous questions on here, but I still can't fix some of these errors.
Use grepl with ifelse:
df$poster <- ifelse(grepl("143537222999|143115551234", df$id), "User", df$poster)
You may try this using grepl.
df[grepl('143115551234|143537222999', df$id),"poster"] <- "User"
All rows where either pattern matches get their poster value replaced by "User":
> df[grepl('143115551234|143537222999', df$id),"poster"] <- "User"
> df
id poster
1 143537222999_2054 User
2 143115551234_2049 User
3 14334_5334 Eric
4 1456322_4334 Mandy
5 143115551234_445633 User
6 143115551234_4321 User
7 143537222999_56743 User
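Both answers match the numbers anywhere inside id. If, as in the sample data, they are always the part before the underscore, an exact comparison avoids accidental partial matches. A sketch rebuilding the question's test_df; the "prefix before the underscore" rule is an assumption about the data:

```r
test_df <- data.frame(
  id = c("143537222999_2054", "143115551234_2049", "14334_5334",
         "1456322_4334", "143115551234_445633", "143115551234_4321",
         "143537222999_56743"),
  poster = c("Kevin", "Dave", "Eric", "Mandy", "Patrick", "Lars", "Iris"),
  stringsAsFactors = FALSE
)

prefixes <- c("143537222999", "143115551234")

# Part of id before the first underscore, compared exactly with %in%
id_prefix <- sub("_.*$", "", test_df$id)
test_df$poster[id_prefix %in% prefixes] <- "User"
test_df
```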
How do I select multiple columns by name without having to type out each name?
For example I have the following code:
CTDB[, c(
"ENJOY_TV_RADIO_CHILD",
"ENJOY_FMLY_CLOSE_FRND_CHILD",
"ENJOY_HOBBIES_CHILD",
"ENJOY_FAV_MEAL_CHILD",
"ENJOY_SHOWER_CHILD",
"ENJOY_SCENT_CHILD",
"ENJOY_PPL_SMILE_CHILD",
"ENJOY_LOOK_SMART_CHILD",
"ENJOY_READ_CHILD",
"ENJOY_FAV_DRINK_CHILD",
"ENJOY_SMALL_THINGS_CHILD",
"ENJOY_LANDSCAPE_CHILD",
"ENJOY_HELP_OTHR_CHILD",
"ENJOY_PRAISE_CHILD")] <-revalue(as.matrix(CTDB[, c(
"ENJOY_TV_RADIO_CHILD",
"ENJOY_FMLY_CLOSE_FRND_CHILD",
"ENJOY_HOBBIES_CHILD",
"ENJOY_FAV_MEAL_CHILD",
"ENJOY_SHOWER_CHILD",
"ENJOY_SCENT_CHILD",
"ENJOY_PPL_SMILE_CHILD",
"ENJOY_LOOK_SMART_CHILD",
"ENJOY_READ_CHILD",
"ENJOY_FAV_DRINK_CHILD",
"ENJOY_SMALL_THINGS_CHILD",
"ENJOY_LANDSCAPE_CHILD",
"ENJOY_HELP_OTHR_CHILD",
"ENJOY_PRAISE_CHILD")]), c("0"=3, "1"=2, "2"=1, "3"=0))
All the columns are in order but instead of selecting by number like below
CTDB[,74:87] <-revalue(as.matrix(CTDB[,74:87]), c("0"=3, "1"=2, "2"=1, "3"=0))
I would like to select by the name of the column.
Thank you!
You can use grep or grepl:
CTDB[, grep("^ENJOY.*CHILD$", colnames(CTDB))]
or
CTDB[, grepl("^ENJOY.*CHILD$", colnames(CTDB))]
If you need to do this as part of a pipe, you can also use dplyr::select and its helper functions in two equivalent ways, including one that can avoid regular expressions:
CTDB %>% select(matches("^ENJOY.*CHILD$"))
CTDB %>% select(intersect(starts_with("ENJOY"), ends_with("CHILD")))
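For the recoding itself, a base-R sketch that combines grep(value = TRUE) with a named lookup vector in place of plyr::revalue. The small CTDB below is a hypothetical stand-in for the real data (which has these columns at positions 74:87), and the scores are assumed to be stored as the strings "0" to "3":

```r
# Hypothetical toy stand-in for CTDB
CTDB <- data.frame(ID = 1:3,
                   ENJOY_READ_CHILD  = c("0", "1", "3"),
                   ENJOY_SCENT_CHILD = c("2", "2", "0"),
                   OTHER = c("a", "b", "c"),
                   stringsAsFactors = FALSE)

# Column names (not positions) selected by pattern
cols <- grep("^ENJOY.*CHILD$", names(CTDB), value = TRUE)

# Reverse-scoring map from the question: "0" -> 3, "1" -> 2, "2" -> 1, "3" -> 0
lookup <- c("0" = 3, "1" = 2, "2" = 1, "3" = 0)

# Index the lookup by each value, drop the names, and write the columns back
CTDB[cols] <- lapply(CTDB[cols], function(x) unname(lookup[as.character(x)]))
CTDB
```

Values not present in lookup would become NA, which makes unexpected codes easy to spot.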
PROBLEM
I have many .RData files in one folder and I want to extract the coordinates contained in each .RData file. I'd also like to attach the corresponding file name (use_hab) and date time (dt) to each row of coordinates.
CODE
file.namez<-list.files("C:/fitting/fitdata/7 27 2015") #name of files
#file.namez.rev<-file.namez[grep(".RData",file.namez)]
datastor<-data.frame(matrix(NA,length(file.namez),4))
names(datastor)<-c("use_hab",paste("B",1:3,sep=""))
allresults<-NULL
for(i in 1:length(file.namez))
{
datastor<-NULL
print(file.namez[i])
load(paste("C:/fitting/fitdata/7 27 2015/",file.namez[i], sep=""))
use_hab <- as.character(as.data.frame(strsplit(file.namez[i],"_an"))[2,])# this line is used to remove unwanted parts of the file name
use_hab <- gsub(".RData","", use_hab)
datastor <- fitdata$coords
datastor$use_hab <- use_hab
datastor$dt <- fitdata$dt
allresults <- rbind(allresults, datastor[,c(3,4,1,2)])
}
This is only result before the error message:
[1] "fitdata_anw514_yr2008.RData"
ERROR
Error in datastor[, c(3, 4, 1, 2)] : incorrect number of dimensions
In addition: Warning message:
In datastor$use_hab <- use_hab : Coercing LHS to a list
QUESTION
How am I getting the incorrect number of dimensions? Each file should have 1098 coordinates and date times. In total, that is 63 files x 1098 rows with 4 columns (filename, datetime, x, y).
The desired result is to have the file name as the first column, the date time as the second column, and the x and y coordinates as the third and fourth columns.
Replace
datastor <- fitdata$coords
with
datastor$coords <- fitdata$coords
The warning "Coercing LHS to a list" is raised when you assign with $ to an object that does not support it, such as an atomic vector or matrix. datastor <- fitdata$coords changes datastor to whatever type fitdata$coords has.
Also, I'd change
allresults<-NULL
datastor<-NULL
to
allresults <- data.frame()
datastor <- data.frame()
but this may just be my personal preference.
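A sketch of the whole loop that sidesteps the problem by converting fitdata$coords to a data frame before adding columns. It assumes, since the "Coercing LHS to a list" warning points that way, that coords is a two-column matrix and that dt has one entry per row; read_fit_file is a hypothetical helper name:

```r
# Sketch: assumes every file holds an object `fitdata` whose `coords` element has
# two columns (x, y) and whose `dt` element has one timestamp per coordinate row
read_fit_file <- function(path) {
  env <- new.env()
  load(path, envir = env)             # load into a private environment, not the global one
  fd <- env$fitdata
  coords <- as.data.frame(fd$coords)  # a matrix becomes a data frame, so [, ] indexing works
  names(coords) <- c("x", "y")
  # Same naming rule as the question: the part after "_an", without ".RData"
  use_hab <- strsplit(basename(path), "_an", fixed = TRUE)[[1]][2]
  use_hab <- sub("\\.RData$", "", use_hab)
  # Column order requested in the question: file name, date time, x, y
  data.frame(use_hab = use_hab, dt = fd$dt, coords, stringsAsFactors = FALSE)
}

files <- list.files("C:/fitting/fitdata/7 27 2015", pattern = "\\.RData$",
                    full.names = TRUE)
# One data frame per file, stacked once at the end instead of growing inside a loop
allresults <- do.call(rbind, lapply(files, read_fit_file))
```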