Hi, I'm trying to extract the full company name from a text description of the company using bert-base-NER. I am open to trying other methods, but I couldn't find one. The model tags the organizations correctly, but it tags them per word/token, so I can't extract the full company name without concatenating the pieces myself.
Is there an easier way or model to do this?
Here is my code:
from transformers import AutoTokenizer, AutoModelForTokenClassification
from transformers import pipeline
tokenizer = AutoTokenizer.from_pretrained("dslim/bert-base-NER")
model = AutoModelForTokenClassification.from_pretrained("dslim/bert-base-NER")
nlp = pipeline("ner", model=model, tokenizer=tokenizer)
ner_results = nlp(text1)
print(ner_results)
Here is my output for one text string:
[{'entity': 'B-ORG', 'score': 0.99965024, 'index': 1, 'word': 'Orion', 'start': 0, 'end': 5}, {'entity': 'I-ORG', 'score': 0.99945647, 'index': 2, 'word': 'Metal', 'start': 6, 'end': 11}, {'entity': 'I-ORG', 'score': 0.99943095, 'index': 3, 'word': '##s', 'start': 11, 'end': 12}, {'entity': 'I-ORG', 'score': 0.99939036, 'index': 4, 'word': 'Limited', 'start': 13, 'end': 20}, {'entity': 'B-LOC', 'score': 0.9997398, 'index': 14, 'word': 'Australia', 'start': 78, 'end': 87}]
I faced a similar issue and solved it by switching to the model "xlm-roberta-large-finetuned-conll03-english", which worked better for me than the one you're using and returns the complete organization name rather than the broken pieces. Note that the pipeline below is created with aggregation_strategy="simple", which is what groups the individual tokens into whole entities. The code extracts the list of organizations from a PDF or Word document. Please accept the answer if you find it useful.
import time

import docx2txt
from pdfminer.high_level import extract_text
from transformers import pipeline

start = time.time()

model_checkpoint = "xlm-roberta-large-finetuned-conll03-english"
token_classifier = pipeline(
    "token-classification", model=model_checkpoint, aggregation_strategy="simple"
)


def text_extraction(file):
    """Extract text from a PDF or Word file."""
    if file.endswith(".pdf"):
        return extract_text(file)
    resume_text = docx2txt.process(file)
    if resume_text:
        return resume_text.replace('\t', ' ')
    return None


# Organisation names extraction
def org_name(file):
    # Extract the complete text of the document
    extracted_text = text_extraction(file)
    entities = token_classifier(extracted_text)
    # Keep only the entities whose "entity_group" is "ORG"
    orgs = [item for item in entities if item["entity_group"] == "ORG"]
    # Collect the text of each matching entity
    names = [org["word"] for org in orgs]
    unique_names = list(set(names))  # Remove duplicates
    final = list(filter(None, unique_names))  # Remove empty strings
    print(final)


org_name("your file name")
end = time.time()
print("The time of execution of above program is :", round((end - start), 2))
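If you want to stay with dslim/bert-base-NER, the token-level output from the question can also be merged by hand. The following is a minimal sketch and not part of either post: merge_entities is a hypothetical helper, and it assumes the pipeline output keeps the 'entity', 'word', 'start' and 'end' keys shown in the question, with word pieces prefixed by '##'.

```python
def merge_entities(ner_results):
    """Merge token-level B-/I- tagged pieces into whole entity spans."""
    merged = []
    for tok in ner_results:
        prefix, _, label = tok["entity"].partition("-")
        word = tok["word"]
        last = merged[-1] if merged else None
        # A token continues the previous entity if it carries the same label
        # and is either an inside tag (I-) or a word piece (##...).
        continues = (
            last is not None
            and last["label"] == label
            and (prefix == "I" or word.startswith("##"))
        )
        if continues:
            if word.startswith("##"):
                last["word"] += word[2:]      # glue the word piece on
            elif tok["start"] == last["end"]:
                last["word"] += word          # adjacent in the text, no space
            else:
                last["word"] += " " + word    # separate word
            last["end"] = tok["end"]
        else:
            merged.append(
                {"label": label, "word": word, "start": tok["start"], "end": tok["end"]}
            )
    return merged


# Token-level output copied from the question (scores omitted)
sample = [
    {"entity": "B-ORG", "word": "Orion", "start": 0, "end": 5},
    {"entity": "I-ORG", "word": "Metal", "start": 6, "end": 11},
    {"entity": "I-ORG", "word": "##s", "start": 11, "end": 12},
    {"entity": "I-ORG", "word": "Limited", "start": 13, "end": 20},
    {"entity": "B-LOC", "word": "Australia", "start": 78, "end": 87},
]
entities = merge_entities(sample)
org_names = [e["word"] for e in entities if e["label"] == "ORG"]
print(org_names)
```

Passing aggregation_strategy="simple" to pipeline(...), as the answer's code does, should make this manual step unnecessary in most cases.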
My data has two grouping factors:
id: 268 levels
group: 10 levels
I fitted a mixed-effects model using gamm() from the mgcv package, with smooth terms for age and bmi, but I cannot interpret the random-effects output returned by ranef().
model:
ilrgamm1 <-gamm(y~z1+z2+z3+s(age)+sex+s(bmi)+time,random=list(id=~1+time,group=~1),data=data,method = "REML")
The output of ranef() contains entries I did not expect, named g and g.0:
names(ranef(ilrgamm1$lme))
[1] "g" "g.0" "id" "group"
ranef(ilrgamm1$lme)[1:2]
$g
Xr1 Xr2 Xr3 Xr4 Xr5 Xr6 Xr7 Xr8
1 0.1130164 0.6108163 -0.1332607 0.4076337 -0.04366085 2.503919 -0.9792176 -0.5911858
$g.0
Xr.01 Xr.02 Xr.03 Xr.04 Xr.05 Xr.06 Xr.07 Xr.08
1/1 0.1983299 0.2758039 -1.100218 -0.4742126 -0.5449821 1.477916 -0.6329114 1.053759
What are Xr1 and Xr2?
Furthermore, group has 268 random effects, not 10.
$group
(Intercept)
1/1/102/1 0.0172579674
1/1/103/1 0.0201196786
1/1/104/1 -0.0281116571
1/1/105/1 -0.0217217446
1/1/106/1 0.0124654493
1/1/108/1 -0.0282589006
1/1/109/1 -0.0499878886
1/1/110/1 0.0492600500
1/1/111/1 0.0507119068
1/1/113/1 0.0546332994
1/1/114/1 0.0393550975
1/1/115/1 -0.0148861329
1/1/116/1 0.0375339571
1/1/117/1 0.0148069805
1/1/118/1 -0.0351320894
1/1/119/1 -0.1195068445
1/1/120/1 -0.1160370216
1/1/121/1 0.0473366382
1/1/122/1 -0.0111156856
1/1/123/1 0.0076577605
1/1/124/1 -0.0042122818
1/1/125/1 0.0249031339
1/1/126/1 -0.1207996724
1/1/127/1 0.0275137051
1/1/128/1 -0.0004621387
1/1/130/1 -0.0080189325
1/1/132/1 -0.0147162203
1/1/133/1 0.0019108355
1/1/134/1 0.0048134559
1/1/135/1 -0.0275929191
1/1/136/1 0.0024070977
1/1/138/1 -0.0364971159
1/1/139/1 -0.0250644476
1/1/140/1 -0.0161684667
1/1/143/1 0.0097684438
1/1/144/1 -0.0254024942
1/1/145/1 -0.0308170535
1/1/146/1 -0.0314913020
1/1/147/1 0.0047849092
1/1/148/1 0.0398563674
1/1/149/1 -0.0328543231
1/1/201/2 -0.0386289339
1/1/202/2 -0.0164038050
1/1/203/2 -0.0310222871
1/1/204/2 -0.0465893084
1/1/206/2 -0.0639166021
1/1/207/2 0.0178124681
1/1/208/2 -0.0215777533
1/1/209/2 -0.0008097909
1/1/211/2 -0.0276369553
1/1/218/2 -0.0233586483
1/1/219/2 -0.0381510950
1/1/220/2 -0.0245044572
1/1/221/2 0.0257439303
1/1/222/2 -0.0526194229
1/1/223/2 -0.0598638388
1/1/224/2 -0.0564427102
1/1/226/2 -0.0682312455
1/1/227/2 0.0025178471
1/1/228/2 0.0050413163
1/1/229/2 0.0006566180
1/1/230/2 -0.0394159991
1/1/233/2 -0.0339136266
1/1/234/2 -0.0355879691
1/1/235/2 0.0264388355
1/1/236/2 -0.0190059575
1/1/237/2 -0.0466046545
1/1/238/2 -0.0103843873
1/1/239/2 0.0030630609
1/1/242/2 -0.0385347399
1/1/246/2 -0.0233604289
1/1/247/2 -0.0549077802
1/1/249/2 -0.0309410264
1/1/250/2 -0.0138412118
1/1/251/2 -0.0236995292
1/1/252/2 -0.0263367786
1/1/253/2 -0.0158340565
1/1/254/2 -0.0003306973
1/1/255/2 -0.0106150344
1/1/256/2 -0.0223922258
1/1/258/2 0.0042958519
1/1/301/3 0.1100838962
1/1/302/3 0.0240153141
1/1/303/3 0.0403893185
1/1/306/3 0.0483381436
1/1/307/3 -0.0129870303
1/1/309/3 0.0173975588
1/1/310/3 -0.0189250961
1/1/313/3 0.0357035256
1/1/315/3 0.0012214394
1/1/316/3 0.0325373842
1/1/317/3 -0.0085589305
1/1/319/3 0.0524899049
1/1/321/3 0.0416124283
1/1/322/3 0.0095534385
1/1/325/3 0.0321591953
1/1/326/3 -0.0054073693
1/1/327/3 0.0050364482
1/1/328/3 0.0531385640
1/1/331/3 0.0232251446
1/1/332/3 0.0189221949
1/1/333/3 -0.0181158192
1/1/334/3 -0.0359340965
1/1/335/3 0.0083524511
1/1/336/3 -0.0118781160
1/1/337/3 -0.0085829648
1/1/338/3 0.0095829746
1/1/401/4 -0.0512378233
1/1/402/4 -0.0219261499
1/1/403/4 -0.0160446585
1/1/407/4 0.0017872369
1/1/408/4 -0.0371254332
1/1/409/4 0.0303154843
1/1/411/4 -0.0026150821
1/1/412/4 0.1418719283
1/1/414/4 -0.0556019328
1/1/415/4 0.0073027068
1/1/416/4 -0.0122557311
1/1/417/4 0.0367134933
1/1/418/4 -0.0253763258
1/1/419/4 -0.0203686506
1/1/421/4 -0.0187932155
1/1/422/4 -0.0189659510
1/1/423/4 -0.0306159126
1/1/424/4 0.0273308724
1/1/425/4 0.0040657657
1/1/426/4 0.0312199779
1/1/429/4 0.0036135869
1/1/430/4 -0.0256442792
1/1/433/4 0.0438767257
1/1/434/4 0.0150299855
1/1/435/4 -0.0058240553
1/1/436/4 0.0028309330
1/1/437/4 -0.0023443246
1/1/438/4 0.0115472464
1/1/439/4 -0.0071635162
1/1/441/4 -0.0187692003
1/1/442/4 -0.0301687031
1/1/443/4 -0.0054707553
1/1/501/5 0.0233900218
1/1/502/5 0.0270437356
1/1/503/5 -0.0505494678
1/1/504/5 -0.0555547708
1/1/506/5 -0.0232974224
1/1/508/5 -0.0316901016
1/1/510/5 0.0498275109
1/1/511/5 0.0140125034
1/1/513/5 -0.1284098189
1/1/514/5 0.0336408919
1/1/515/5 -0.0328592365
1/1/516/5 -0.0264024730
1/1/601/6 -0.0064048726
1/1/602/6 0.0136098007
1/1/603/6 0.0437196138
1/1/604/6 0.0685239133
1/1/605/6 -0.0141230573
1/1/606/6 0.0555226687
1/1/607/6 -0.0411745650
1/1/608/6 0.0219745785
1/1/609/6 -0.0045706685
1/1/610/6 -0.0176662070
1/1/611/6 0.0408741543
1/1/612/6 0.0187626096
1/1/613/6 0.0561545743
1/1/614/6 0.0284241671
1/1/615/6 0.0157012751
1/1/616/6 0.0496079608
1/1/701/7 -0.0398327297
1/1/702/7 -0.0140910866
1/1/705/7 0.0286548362
1/1/706/7 0.0369761615
1/1/708/7 0.0116733825
1/1/709/7 0.0001330362
1/1/710/7 0.0274371733
1/1/711/7 0.0090225922
1/1/712/7 0.0765875063
1/1/713/7 0.0148952419
1/1/714/7 -0.0054933850
1/1/716/7 0.0043641233
1/1/717/7 -0.0119174808
1/1/719/7 0.0010953154
1/1/723/7 -0.0371240564
1/1/801/8 0.0636698316
1/1/803/8 0.0246677751
1/1/804/8 -0.0445965919
1/1/806/8 -0.0289816619
1/1/807/8 0.0076561215
1/1/808/8 0.0237686430
1/1/809/8 0.0450896739
1/1/810/8 0.0149585857
1/1/811/8 0.0075693911
1/1/812/8 0.0085475577
1/1/813/8 -0.0136763527
1/1/814/8 0.0117384418
1/1/815/8 0.0067855948
1/1/816/8 0.0140344652
1/1/817/8 0.0103800524
1/1/818/8 -0.0361848876
1/1/819/8 0.0449431626
1/1/820/8 -0.0092320086
1/1/822/8 -0.0404730405
1/1/823/8 -0.0494073578
1/1/824/8 -0.0029941736
1/1/825/8 -0.0145742585
1/1/826/8 -0.0314564014
1/1/828/8 0.0183565957
1/1/829/8 0.0288121410
1/1/830/8 0.0286684412
1/1/831/8 0.0059331890
1/1/832/8 0.0341139486
1/1/833/8 0.0386864016
1/1/834/8 0.0147205534
1/1/835/8 -0.0031409478
1/1/901/9 0.0660687434
1/1/902/9 0.0564001190
1/1/903/9 0.0756466936
1/1/904/9 0.0096398307
1/1/905/9 0.0221015690
1/1/906/9 0.0046220720
1/1/907/9 0.0412366347
1/1/908/9 0.0284303878
1/1/909/9 0.0452359853
1/1/910/9 -0.0195940019
1/1/911/9 -0.0154676475
1/1/912/9 0.0184574647
1/1/913/9 0.0460445032
1/1/914/9 -0.0067133484
1/1/915/9 -0.0087355534
1/1/916/9 -0.0043938763
1/1/917/9 -0.0470434649
1/1/919/9 0.0794927553
1/1/920/9 0.0555903561
1/1/921/9 -0.0036186615
1/1/922/9 0.0078238313
1/1/923/9 0.0143975055
1/1/924/9 0.0731162776
1/1/925/9 -0.0065668921
1/1/926/9 0.0549429919
1/1/927/9 0.0368946293
1/1/928/9 0.0247474240
1/1/929/9 -0.0404517417
1/1/930/9 -0.0076552298
1/1/1001/10 0.0117082112
1/1/1002/10 0.0068444544
1/1/1003/10 0.0327977955
1/1/1004/10 0.0071551344
1/1/1005/10 -0.0052717304
1/1/1006/10 0.0483668189
1/1/1007/10 -0.0167403419
1/1/1008/10 -0.0364566907
1/1/1009/10 -0.0254350538
1/1/1010/10 0.0504571167
1/1/1011/10 -0.0039537094
1/1/1012/10 -0.0054692797
1/1/1013/10 0.0224140597
1/1/1014/10 -0.0310392331
1/1/1015/10 -0.0498130767
1/1/1016/10 -0.0223939677
1/1/1017/10 0.0041103780
1/1/1018/10 0.0880528857
1/1/1019/10 -0.0467056887
1/1/1022/10 -0.0769873686
1/1/1025/10 -0.0229126779
1/1/1028/10 -0.0340772236
1/1/1029/10 -0.0251866535
1/1/1030/10 0.0307034344
1/1/1031/10 -0.0369165146
1/1/1035/10 -0.0056637857
Why are there not 10 random effects, matching the number of levels in group?
Is there a reason you're using gamm() instead of gam()? Also, why list time as both a fixed and random effect?
If there is no particular reason for these choices, one possible solution is to fit your model as ilrgamm1 <- gam(y~z1+z2+z3+s(age)+sex+s(bmi)+s(time, bs="re"), data=data, method = "REML"), where s(time, bs="re") specifies time as a random effect.
You can then use summary(ilrgamm1) to look at the results, and partial effect plots from plot.gam() or ggpredict() (from the ggeffects package) to visualize trends in the smoothed variables.
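On the question of why $group has 268 rows: the row labels can be read as a path through the grouping levels listed by names(ranef(ilrgamm1$lme)), that is g/g.0/id/group. Assuming that reading (nlme treats the factors in a random list as nested in the order given, so group ends up nested within id), the sketch below parses a few labels copied from the output and shows that there is one row per id, while the last component only takes the group values. The helper name split_label is hypothetical.

```python
from collections import defaultdict

# A few row labels copied from the $group output in the question
labels = ["1/1/102/1", "1/1/103/1", "1/1/201/2", "1/1/202/2", "1/1/1001/10"]


def split_label(label):
    """Split a 'g/g.0/id/group' row label into (id, group)."""
    _g, _g0, subject, group = label.split("/")
    return subject, group


ids_per_group = defaultdict(set)
for label in labels:
    subject, group = split_label(label)
    ids_per_group[group].add(subject)

n_rows = len(labels)           # one row per id-within-group combination
n_groups = len(ids_per_group)  # distinct values of the last component
print(n_rows, n_groups, dict(ids_per_group))
```

Applied to the full output, the same count would give 268 rows but only 10 distinct group values, which is what you would expect if each intercept belongs to a group-within-id cell rather than to a group.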
I am trying to use and adapt a notebook based on Hugging Face models: Text Classification on GLUE (https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/text_classification.ipynb#scrollTo=YZbiBDuGIrId)
My goal is to classify sentences into 16 predefined classes.
So I followed the notebook. My data looks like this:
id data label langue
0 text_1 label_1 Français
0 text_2 label_2 Français
1 text_3 label_3 Français
import pandas as pd
import numpy as np
from datasets import load_dataset, load_metric, DatasetDict, Features, Value, ClassLabel, Dataset
I have a labeldict like this
{'label_1': 0,
'label_2': 1,
...}
dataset = load_dataset('csv', sep="|", data_files={"train" : train_paths, "test" : test_paths})
output:
DatasetDict({
train: Dataset({
features: ['id', 'data', 'label', 'langue'],
num_rows: ...
})
test: Dataset({
features: ['id', 'data', 'label', 'langue'],
num_rows: ...
})
})
I ran everything before this point in the notebook, and when I try to do this:
trainer = Trainer(
model,
args,
train_dataset=encoded_dataset["train"],
eval_dataset=encoded_dataset[validation_key],
tokenizer=tokenizer,
compute_metrics= compute_metrics,
callbacks=[MLflowCallback()]
)
trainer.train()
I get the following message and error:
The following columns in the training set don't have a corresponding argument in CamembertForSequenceClassification.forward and have been ignored: langue, id, data.
IndexError: tuple index out of range
What can I do?
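One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis of the IndexError: the label column loaded from the CSV holds strings such as label_1, while a sequence classification model expects integer class ids. The sketch below applies a label dictionary like the one in the question to individual rows. The name encode_label and the three-entry dictionary are hypothetical stand-ins for the real 16-class mapping.

```python
# Hypothetical excerpt of the 16-class label dictionary from the question
label2id = {"label_1": 0, "label_2": 1, "label_3": 2}


def encode_label(example):
    """Return a copy of the row with its string label replaced by an integer id."""
    return {**example, "label": label2id[example["label"]]}


# Rows shaped like the data shown in the question
rows = [
    {"id": 0, "data": "text_1", "label": "label_1", "langue": "Français"},
    {"id": 0, "data": "text_2", "label": "label_2", "langue": "Français"},
    {"id": 1, "data": "text_3", "label": "label_3", "langue": "Français"},
]
encoded_rows = [encode_label(row) for row in rows]
print([row["label"] for row in encoded_rows])
```

With a datasets DatasetDict, the same function could be applied with dataset.map(encode_label). It is also worth confirming that the model was created with num_labels equal to the number of classes.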
I have trained a newsmap model in the Newsmap package for quanteda in R and am trying to export the large dictionary it constructed based on my corpus (not the seed dictionary).
I have tried the code below, but it only gives me the 10 most associated terms per country as a list, which I have also been unable to convert into a dictionary object I can use in R.
Dict <-coef(model)
I would really appreciate any and all help!
You only need to extract the names of the coefficient vectors, passing the desired number of words per key as n:
> quanteda::dictionary(lapply(coef(model, n = 1000), FUN = names))
Dictionary object with 226 key entries.
- [bi]:
- burundi, burundi's, bujumbura, burundian, nkurunziza, uprona, msd, nduwimana, hutus, tutsi, radebe, drcongo, rapporteur, elderly, mushikiwabo, generation, kayumba, faustin, hutu, olga [ ... and 980 more ]
- [dj]:
- djibouti, djibouti's, djiboutian, western-led, pretty, photo, watkins, ask, entebbe, westerners, mujahideen, salvation, osprey, persistent, horn, afdb, donors, ismael, nevis, grenade [ ... and 980 more ]
- [er]:
- eritrea, eritreans, eritrean, keetharuth, issaias, eritrea's, binnie, sheila, somaliland, catania, mandeb, brutal, sicily's, lana, horn, lampedusa, aman, afdb, donors, monitoring [ ... and 980 more ]
- [et]:
- ethiopia, ethiopian, addis, ababa, addis, ababa, hailemariam, desalegn, ethiopians, maasho, ethiopia's, mandeb, igad, dibaba, genzebe, mesfin, bekele, spla, shrikesh, laxmidas [ ... and 980 more ]
- [ke]:
- kenya, kenyan, nairobi, nairobi, uhuru, lamu, mombasa, mpeketoni, kenyans, kws, nairobi's, akwiri, ruto, westgate, kenyatta's, mombasa, makaburi, kenyatta, kenya's, ol [ ... and 980 more ]
- [km]:
- comoros, mazen, emiratis, oil-rich, canterbury, lahiya, shoukri, gender, wadia, lombok, brisbane's, entire, christiana, blahodatne, everest's, culiacan, kamensk-shakhtinsky, protestants, pk-5, parwan [ ... and 980 more ]
[ reached max_nkey ... 220 more keys ]
I have the following data:
479117.562500000 -100.366333008
479117.625000000 -100.292800903
479117.687500000 -100.772460937
479117.750000000 -101.344261169
479117.812500000 -102.828948975
479117.875000000 -103.842330933
479117.937500000 -102.289733887
479118.000000000 -101.856155396
479118.062500000 -101.972282410
479118.125000000 -101.272254944
479118.187500000 -101.042846680
479118.250000000 -101.957427979
479118.312500000 -101.363922119
479118.375000000 -101.065864563
479118.437500000 -99.710098267
479118.500000000 -98.789115906
479118.562500000 -99.854644775
479118.625000000 -100.956558228
479118.687500000 -100.456512451
479118.750000000 -100.779090881
479118.812500000 -101.598800659
479118.875000000 -100.329147339
479118.937500000 -100.486946106
479119.000000000 -102.275772095
479119.062500000 -103.128715515
479119.125000000 -103.075996399
479119.187500000 -103.266349792
479119.250000000 -102.390190125
479119.312500000 -101.386428833
479119.375000000 -102.008850098
479119.437500000 -103.579475403
479119.500000000 -103.382720947
479119.562500000 -100.842361450
479119.625000000 -98.478569031
479119.687500000 -98.745864868
479119.750000000 -99.653961182
479119.812500000 -100.032035828
479119.875000000 -99.955345154
479119.937500000 -99.842536926
479120.000000000 -100.187896729
479120.062500000 -100.456726074
479120.125000000 -101.258850098
479120.187500000 -102.649017334
479120.250000000 -104.833518982
479120.312500000 -102.760551453
479120.375000000 -101.653732300
479120.437500000 -102.729179382
479120.500000000 -102.752014160
479120.562500000 -103.103675842
479120.625000000 -102.842521667
479120.687500000 -102.692077637
479120.750000000 -102.499221802
479120.812500000 -101.806587219
479120.875000000 -102.124893188
479120.937500000 -101.700584412
479121.000000000 -101.385307312
479121.062500000 -101.242889404
479121.125000000 -100.172935486
479121.187500000 -100.230110168
479121.250000000 -100.861007690
479121.312500000 -101.013366699
479121.375000000 -100.585502625
479121.437500000 -100.897743225
479121.500000000 -101.453987122
479121.562500000 -102.233383179
479121.625000000 -102.231163025
479121.687500000 -99.512817383
479121.750000000 -97.662391663
479121.812500000 -97.647987366
479121.875000000 -100.217674255
479121.937500000 -102.411224365
479122.000000000 -101.892311096
479122.062500000 -102.475875854
479122.125000000 -103.164466858
479122.187500000 -103.406997681
479122.250000000 -104.319549561
479122.312500000 -102.138801575
479122.375000000 -99.946632385
479122.437500000 -100.355888367
479122.500000000 -101.683120728
479122.562500000 -101.582458496
479122.625000000 -99.907981873
479122.687500000 -100.329666138
479122.750000000 -100.243255615
479122.812500000 -100.713218689
479122.875000000 -102.436210632
479122.937500000 -103.173072815
479123.000000000 -103.720008850
479123.062500000 -105.225852966
479123.125000000 -104.841903687
479123.187500000 -103.589698792
479123.250000000 -101.543907166
479123.312500000 -101.051879883
479123.375000000 -103.181671143
479123.437500000 -104.825492859
479123.500000000 -103.848281860
479123.562500000 -102.969032288
479123.625000000 -101.002128601
479123.687500000 -100.698005676
479123.750000000 -102.078453064
479123.812500000 -103.582519531
479123.875000000 -105.085006714
479123.937500000 -103.349472046
479124.000000000 -100.479156494
479124.062500000 -100.558197021
479124.125000000 -101.563316345
479124.187500000 -101.261054993
479124.250000000 -102.108535767
479124.312500000 -104.861206055
479124.375000000 -105.044944763
479124.437500000 -105.712318420
479124.500000000 -105.045219421
479124.562500000 -104.131736755
479124.625000000 -104.060478210
479124.687500000 -103.435829163
479124.750000000 -103.167121887
479124.812500000 -102.186767578
479124.875000000 -101.180900574
479124.937500000 -101.686195374
479125.000000000 -102.167709351
479125.062500000 -102.771011353
479125.125000000 -103.367576599
479125.187500000 -103.127212524
479125.250000000 -103.924591064
479125.312500000 -103.187667847
479125.375000000 -102.220222473
479125.437500000 -102.674034119
479125.500000000 -101.717445374
479125.562500000 -100.879615784
479125.625000000 -100.964996338
479125.687500000 -102.864616394
479125.750000000 -102.009140015
479125.812500000 -99.761398315
479125.875000000 -99.798591614
479125.937500000 -101.713653564
479126.000000000 -103.273422241
479126.062500000 -102.664245605
479126.125000000 -101.682983398
479126.187500000 -101.853103638
479126.250000000 -103.193588257
479126.312500000 -104.359184265
479126.375000000 -105.037651062
479126.437500000 -104.446434021
479126.500000000 -103.674736023
479126.562500000 -103.374031067
479126.625000000 -102.921363831
479126.687500000 -103.374008179
479126.750000000 -104.299362183
479126.812500000 -104.015937805
479126.875000000 -103.758834839
479126.937500000 -103.698440552
479127.000000000 -103.501396179
479127.062500000 -101.677307129
479127.125000000 -101.010841370
479127.187500000 -103.159111023
479127.250000000 -105.232284546
479127.312500000 -105.949432373
479127.375000000 -104.999694824
479127.437500000 -104.207763672
479127.500000000 -103.822082520
479127.562500000 -103.189147949
479127.625000000 -102.943603516
479127.687500000 -102.586914062
479127.750000000 -102.973297119
479127.812500000 -104.049942017
479127.875000000 -106.436325073
479127.937500000 -105.395500183
479128.000000000 -106.032653809
479128.062500000 -106.538482666
479128.125000000 -105.961471558
479128.187500000 -106.049240112
479128.250000000 -104.937507629
479128.312500000 -104.842300415
479128.375000000 -104.720268250
479128.437500000 -105.791313171
479128.500000000 -106.022468567
479128.562500000 -103.848289490
479128.625000000 -103.887428284
479128.687500000 -104.258583069
479128.750000000 -105.152420044
479128.812500000 -107.673591614
479128.875000000 -107.705734253
479128.937500000 -105.925376892
479129.000000000 -105.528671265
479129.062500000 -106.021476746
479129.125000000 -107.750610352
479129.187500000 -108.693489075
479129.250000000 -108.675323486
479129.312500000 -109.919746399
479129.375000000 -110.940391541
479129.437500000 -109.279312134
479129.500000000 -108.321495056
479129.562500000 -107.995155334
479129.625000000 -109.164222717
479129.687500000 -111.977653503
479129.750000000 -113.194961548
479129.812500000 -114.239585876
479129.875000000 -115.780212402
479129.937500000 -116.979713440
479130.000000000 -117.042602539
479130.062500000 -116.658126831
479130.125000000 -116.624031067
479130.187500000 -116.923446655
479130.250000000 -118.727882385
479130.312500000 -120.354904175
479130.375000000 -121.513587952
479130.437500000 -121.322601318
479130.500000000 -121.338325500
479130.562500000 -120.500923157
479130.625000000 -116.656593323
479130.687500000 -113.295486450
479130.750000000 -111.713729858
479130.812500000 -111.394592285
479130.875000000 -109.731071472
479130.937500000 -108.571876526
479131.000000000 -109.059860229
479131.062500000 -106.810707092
479131.125000000 -106.095306396
479131.187500000 -106.258293152
479131.250000000 -106.243156433
479131.312500000 -106.613525391
479131.375000000 -105.910820007
479131.437500000 -104.405731201
479131.500000000 -102.325592041
479131.562500000 -101.502128601
479131.625000000 -103.445144653
479131.687500000 -105.970573425
479131.750000000 -105.379684448
479131.812500000 -102.992294312
479131.875000000 -100.679176331
479131.937500000 -99.553001404
479132.000000000 -100.532035828
479132.062500000 -102.480346680
479132.125000000 -104.630592346
479132.187500000 -103.669296265
479132.250000000 -101.364990234
479132.312500000 -100.193199158
479132.375000000 -98.483375549
479132.437500000 -98.084083557
479132.500000000 -100.955741882
479132.562500000 -102.788536072
479132.625000000 -102.540054321
479132.687500000 -102.550140381
479132.750000000 -101.182907104
479132.812500000 -100.926239014
479132.875000000 -100.933807373
479132.937500000 -101.358642578
479133.000000000 -100.544723511
479133.062500000 -99.536102295
479133.125000000 -99.533355713
479133.187500000 -100.520698547
479133.250000000 -99.944213867
479133.312500000 -100.118461609
479133.375000000 -101.425323486
479133.437500000 -102.523521423
479133.500000000 -102.540794373
479133.562500000 -101.491882324
479133.625000000 -100.919067383
479133.687500000 -100.623329163
479133.750000000 -99.431541443
479133.812500000 -99.252487183
479133.875000000 -101.166763306
479133.937500000 -102.311378479
479134.000000000 -101.306701660
479134.062500000 -100.665534973
479134.125000000 -100.248069763
479134.187500000 -99.179161072
479134.250000000 -100.506088257
479134.312500000 -101.349990845
479134.375000000 -101.028564453
479134.437500000 -101.089591980
479134.500000000 -100.819961548
479134.562500000 -100.899681091
479134.625000000 -102.236335754
479134.687500000 -101.911392212
479134.750000000 -101.253051758
479134.812500000 -102.417984009
479134.875000000 -101.647750854
479134.937500000 -100.494926453
479135.000000000 -99.920089722
479135.062500000 -101.046142578
479135.125000000 -102.893470764
479135.187500000 -102.895072937
479135.250000000 -103.607261658
479135.312500000 -104.568321228
479135.375000000 -104.253341675
479135.437500000 -102.952842712
479135.500000000 -101.928634644
479135.562500000 -101.746994019
479135.625000000 -102.218338013
479135.687500000 -102.627662659
479135.750000000 -102.185234070
479135.812500000 -103.266464233
479135.875000000 -104.480552673
479135.937500000 -102.991035461
479136.000000000 -101.333572388
479136.062500000 -102.019165039
479136.125000000 -100.434211731
479136.187500000 -99.072113037
479136.250000000 -100.616592407
479136.312500000 -101.648803711
479136.375000000 -102.449073792
479136.437500000 -103.141990662
479136.500000000 -101.611976624
479136.562500000 -101.742187500
479136.625000000 -102.974266052
479136.687500000 -101.894943237
479136.750000000 -101.637077332
479136.812500000 -101.545288086
479136.875000000 -101.042068481
479136.937500000 -101.836784363
479137.000000000 -103.539382935
479137.062500000 -105.681159973
479137.125000000 -102.126953125
479137.187500000 -98.450904846
479137.250000000 -98.859046936
479137.312500000 -102.353157043
479137.375000000 -105.606437683
479137.437500000 -104.589866638
479137.500000000 -103.607994080
479137.562500000 -102.202362061
479137.625000000 -101.861511230
479137.687500000 -101.010215759
479137.750000000 -100.456481934
479137.812500000 -101.639465332
479137.875000000 -102.876907349
479137.937500000 -103.880729675
479138.000000000 -105.811225891
479138.062500000 -106.915206909
479138.125000000 -108.233901978
479138.187500000 -106.625434875
479138.250000000 -103.686866760
479138.312500000 -102.977874756
479138.375000000 -105.153343201
479138.437500000 -106.966751099
479138.500000000 -104.752532959
479138.562500000 -104.894256592
479138.625000000 -105.125381470
479138.687500000 -102.721633911
479138.750000000 -102.299125671
479138.812500000 -102.762176514
479138.875000000 -101.316398621
479138.937500000 -100.695121765
479139.000000000 -101.257949829
479139.062500000 -102.382209778
479139.125000000 -104.331405640
479139.187500000 -106.033081055
479139.250000000 -105.467399597
479139.312500000 -104.492301941
479139.375000000 -104.413681030
479139.437500000 -103.263702393
479139.500000000 -103.199569702
479139.562500000 -104.447860718
479139.625000000 -104.169952393
479139.687500000 -105.357246399
479139.750000000 -105.624694824
479139.812500000 -104.329673767
479139.875000000 -104.890480042
479139.937500000 -103.739471436
479140.000000000 -102.343170166
479140.062500000 -102.630371094
479140.125000000 -103.861930847
479140.187500000 -102.614120483
479140.250000000 -102.544586182
479140.312500000 -103.947563171
479140.375000000 -104.194770813
479140.437500000 -103.187141418
479140.500000000 -102.442695618
479140.562500000 -103.064849854
479140.625000000 -104.047111511
479140.687500000 -103.641082764
479140.750000000 -104.192665100
479140.812500000 -105.001426697
479140.875000000 -106.180221558
479140.937500000 -106.504646301
479141.000000000 -104.772674561
479141.062500000 -104.167114258
479141.125000000 -102.925132751
479141.187500000 -102.731872559
479141.250000000 -104.101806641
479141.312500000 -104.532470703
479141.375000000 -103.677726746
479141.437500000 -103.467483521
479141.500000000 -104.314605713
479141.562500000 -106.088348389
479141.625000000 -105.849678040
479141.687500000 -104.784294128
479141.750000000 -104.685859680
479141.812500000 -102.816184998
479141.875000000 -103.009178162
479141.937500000 -105.581695557
479142.000000000 -104.964607239
479142.062500000 -103.978279114
479142.125000000 -104.709609985
479142.187500000 -105.373786926
479142.250000000 -105.477348328
479142.312500000 -107.076698303
479142.375000000 -108.599830627
479142.437500000 -107.518699646
I want to fit the following function to this data:
y = n0 + n1 * 4(f - f1)^2 / (4(f - f1)^2 + (4((f - f0)/df0)(f - f1) - df1)^2)
While the formula is kind of a beast, it has physical meaning, so I would like to not change it.
I have the following code:
index_min <- which(mydf[,2] == min(mydf[,2]))[1]
n0start <- -119
n1start <- 16
df0start <- 120
df1start <- 1
f0start <- mydf[index_min,1]
f1start <- mydf[index_min,1]
plot(x=mydf[,1],y=mydf[,2])
eq = function(f,n0, n1, f0, f1, df0, df1){ n0+n1*4*(f-f1)^2/(4*(f-f1)^2+(4*((f-f0)/df0)*(f-f1)-df1)^2)}
lines(mydf[,1], eq(mydf[,1],n0start, n1start, f0start, f1start, df0start, df1start), col="red" )
res <- try(nlsLM( y ~ n0+n1*4*(f-f1)^2/(4*(f-f1)^2+(4*((f-f0)/df0)*(f-f1)-df1)^2),
start=c(n0=n0start, n1=n1start,f0=f0start,df0=df0start,f1=f1start,df1=df1start) , data = mydf))
coef(res)
As you can see, the starting values look rather decent, but I get the "singular gradient matrix at initial parameter estimates" error. I have looked through all the other posts, however, I don't see why my formula is overdetermined or why the starting values should be bad.
Okay, I figured out the mistake. nlsLM requires data to be a data-frame and not just a bare matrix. The error message is simply misleading.
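As a quick sanity check on the starting values, the model function from the question can be transcribed to Python. This is a sketch for illustration only, since the fit itself is done in R. The value used for f0 and f1 is the frequency at the minimum of the posted data, which is what mydf[index_min,1] selects.

```python
def eq(f, n0, n1, f0, f1, df0, df1):
    """Model function from the question, transcribed term by term."""
    num = 4 * (f - f1) ** 2
    den = num + (4 * ((f - f0) / df0) * (f - f1) - df1) ** 2
    return n0 + n1 * num / den


# Starting values from the question
n0, n1 = -119, 16
df0, df1 = 120, 1
f0 = f1 = 479130.375  # frequency of the minimum value in the posted data

# At f == f1 the numerator vanishes, so the curve sits at its baseline n0
at_dip = eq(f1, n0, n1, f0, f1, df0, df1)
# One unit away from the dip the curve has already risen towards n0 + n1
near_dip = eq(f1 + 1.0, n0, n1, f0, f1, df0, df1)
print(at_dip, near_dip)
```

This supports the observation in the question that the starting curve is reasonable, which is consistent with the error coming from the data argument rather than from the starting values.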