Reference Structuring

By Dominic Prakash (LinkedIn)

This is my long-running research project. Here I try to split a reference into its valid pieces and structure it into three different valid XML types. As an electronic publishing professional, I always find it challenging to process or convert text input from different people. Each author has their own style of writing, and each one follows (or tries to follow) some kind of style and standard. The purpose of this project is to identify and structure bibliographic references (citations) without changing any parameters. The three XML types are:

  • NLM XML
  • Proprietary Type 1 XML
  • Proprietary Type 2 XML

I would like to hear from you. You are welcome to send your suggestions and, more importantly, your advice on better structuring to dominic (at) dlink.org or through LinkedIn.

Sandbox

Copy your references (maximum 10 at a time) into the text box below and select the XML type. Let me know what you think. Your browser may not be able to render the copied reference properly if it contains certain Unicode characters, e.g. [S. Şahin, G. Yalçιn, and M. Übeyli, Ann. Nuc. Energy, 30, 245259, (2003).]

Certain browsers (e.g. Chrome) may not display the generated XML properly. If you cannot see the Unicode characters as they are, check your browser settings.

Note: Make sure to use only Unicode text below, NOT junk/encoded characters (“) or entities ( ).
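
If your references come from HTML or XML and still contain entities, they can be converted to plain Unicode before pasting. The following is a minimal sketch and not part of this tool. It relies on Python's standard `html.unescape` and `unicodedata.normalize`; the helper name `to_unicode_text` is hypothetical, and replacing non-breaking spaces with ordinary spaces is an assumption about what the engine prefers.

```python
import html
import unicodedata


def to_unicode_text(raw: str) -> str:
    """Convert character references in a pasted reference to plain Unicode."""
    # html.unescape resolves named (&eacute;), decimal (&#233;) and
    # hexadecimal (&#x00E9;) character references.
    text = html.unescape(raw)
    # NFC composes a base letter plus combining mark into one code point.
    text = unicodedata.normalize("NFC", text)
    # Assumption: ordinary spaces are safer than non-breaking spaces (&nbsp;).
    return text.replace("\u00a0", " ")


print(to_unicode_text("Caf&#x00E9; Soci&eacute;t&eacute;, 30,&nbsp;245"))
```

Text that is already mis-decoded ("junk" characters) cannot be repaired this way; it has to be re-read from the original file with the correct encoding.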

Reference Samples

Copy the following sample references and paste them into the text area above. About 2000 unique references have been tested in this system, and the following box shows five types on each page refresh.

The random samples above are generally: 1. Chapter, 2. Book, 3. Article, 4. Article without title, and 5. Miscellaneous/problematic type.

A big reference with 40+ different author names (sample only; copy it into the box above and try it). It has been tested, but a few users mentioned that their browsers cannot handle the Unicode characters below.

Zhang, XD., J.F. van der Veen, Hu YY., Hu,YY., J.Garcia-Fernandez, A.J.Garcia-Fernandez, Groot, A.D. de, Heerden, J. van, Bisson, P. A., Aaron, P.G., Emden. H.F., Han, C.-H, Mateu, Jaume, SB McMahon, Mackay LE, Rengpipat S., van Emden. H.F., van der Linden, A., Van der Zanden, A.G. M., de Souza ML, R. H. Whittaker, R. Frieden. C.S. Vikram, Stray-Gundersen J, J. M. Kohlmeyer, III. Bennett JP Jr, Roberts LJ 2nd, Smith, II LM., W. Strunk Jr., Strunk, W., Jr., Bagnuolo, W. G., Jr., Bennett, W. R. Jr., La Porta R., Lopez-de-Silanes F., D’Alessandro W., Panchenko, V. Ia., Perdikis, D.Ch., J.F. van der Veen, Stephanie Schmitt-Groh, Maria Eugenia Munoz, Gustavo R. Heudebert., Nelson, James Lindemann., Emonds, Joseph E. (1997). Nervous system – NO production. Brain Res 748 (1-2), 1-11.

Note

  1. Check the section “Why this structuring engine fails?” below.
  2. This web version does not have the PubMed linking feature.
  3. It does not have the journal database cross-checking feature.
  4. It fails if the title spans multiple lines.
  5. In article-type references, the volume information is required.
  6. In chapter- or book-type references, the publisher and location information is required.
  7. References that span multiple lines may not work, and titles with more than one “.” will not be structured.
  8. References with a month name are only partially handled (e.g. S. Reveliotis, M. Lawley, and P. Ferreira, “Polynomial complexity deadlock avoidance”, IEEE Trans. Automat. Contr., 42, 1344-1357, Oct. 1997.)
  9. Conferences and proceedings may work if they contain publisher and location details.
  10. Grouped references, or multiple references on a single line, may work only with the “Proprietary Type 2” XML selection.
  11. If it fails to structure a particular reference, it could be because:
    a) The reference pattern is not available
    b) There is an error in the reference (possibly two elements merged together)
  12. In some versions of Firefox the references get merged and you may not see proper output.
  13. You can try 10 references at a time, 25 times a day, so you can process about 250 references a day. This limit controls the traffic on this site. Please contact me if you would like to use this software as a service.
  14. Please feel free to contact me [dominic (at) dlink.org] if you need any help.
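
To stay within the limits above when preparing a long reference list, the list can be split into batches beforehand. This is only a sketch for preparing input on your own machine; the function name `daily_batches` and the constants are hypothetical. It also joins each reference onto a single line, since multi-line references may not work.

```python
from typing import Iterator, List

MAX_PER_REQUEST = 10        # 10 references at a time
MAX_REQUESTS_PER_DAY = 25   # 25 times a day


def daily_batches(references: List[str]) -> Iterator[List[str]]:
    """Yield batches of references that fit within one day's quota."""
    # Collapse line breaks and repeated spaces; drop empty entries.
    cleaned = [" ".join(ref.split()) for ref in references if ref.strip()]
    daily_limit = MAX_PER_REQUEST * MAX_REQUESTS_PER_DAY  # 250 references
    for start in range(0, min(len(cleaned), daily_limit), MAX_PER_REQUEST):
        yield cleaned[start:start + MAX_PER_REQUEST]


refs = ["Said R (1962) The geology of Egypt.\n  Elsevier, Amsterdam."] * 23
for number, batch in enumerate(daily_batches(refs), start=1):
    print(number, len(batch))
```

References beyond the first 250 are simply not yielded and would have to wait for another day.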

More NLM Samples
The following citation samples were collected from the NLM site.

  1. Herbert JR, Barone J. On the possible relationship between AIDS and nutrition. Med Hypotheses 1988 Sep;27(1):51-54.
  2. You CH, Lee KY, Chey RY, Menguy R. Electrogastrographic study of patients with unexplained nausea, bloating and vomiting. Gastroenterology 1980 Aug;79(2):311-314.
  3. Bedford CD, Harris RN, 3rd, Howd RA, Goff DA, Koolpe GA, Petesch M, Miller A, Nolen HW, 3rd, Musallam HA, Pick RO, et al. Quaternary salts of 2-[(hydroxyimino)methyl]imidazole. J Med Chem 1989 Feb;32(2):493-503.
  4. The Royal Marsden Hospital Bone-Marrow Transplantation Team. Failure of syngeneic bone-marrow graft without preconditioning in post-hepatitis marrow aplasia. Lancet 1977 Oct 8;2(8041):742-744.
  5. Magni F, Rossoni G, Berti F. BN-52021 protects guinea-pig from heard anaphylaxis. Pharm Res Commun 1988 Dec;20 Suppl 5:75-78.
  6. Hoyme HE, Jones KL, Dixon SD, Jewett T, Hanson JW, Robinson LK, Small ME, Allanson J. Maternal cocain use and fetal vascular disruption [abstract]. Am J Hum Genet 1988;43(3 Suppl):A56.
  7. Gardos G, Cole JO, Haskell D, Marby D, Paine SS, Moore P. The natural history of tardive dyskinesia. J Clin Psychopharmacol 1988 Aug;8(4 Suppl):31S-37S.
  8. Ferriero DM, Wong DF, Townsend R, Simon RP. Neurologic complications in infants of cocaine-abusing mothers [abstract]. Neurology 1988;38(3 Suppl 1):163.
  9. Hanley C. Metaphysics and innateness: a psycholanalytic perspective. Int J Psychoanal 1988;69(Pt 3):89-99.
  10. Edwards L, Meyskens F, Levine N. Effect of oral isotretinoin on dysplastic nevi. J Am Acad Dermatol 1989 Feb;20(2 Pt 1):257-260.
  11. More to Follow…
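
To illustrate what "structuring" means for the simplest of these samples, here is a minimal sketch that splits one journal citation with a regular expression and builds NLM/JATS-style XML. It is not the engine behind this page. The pattern, the function names `parse_citation` and `to_xml`, and the choice of `element-citation` with its child elements are assumptions; the tool's actual output is not shown here. Supplement volumes such as "20 Suppl 5", author suffixes such as "3rd" and "et al" are not handled.

```python
import re
import xml.etree.ElementTree as ET

# Assumed form: Authors. Title. Journal Year Mon Day;Volume(Issue):First-Last.
CITATION = re.compile(
    r"^(?P<authors>.+?)\. "
    r"(?P<title>.+)\. "
    r"(?P<journal>[^.;]+?) "
    r"(?P<year>\d{4})"
    r"(?: (?P<month>[A-Z][a-z]{2})(?: (?P<day>\d{1,2}))?)?"
    r";(?P<volume>\d+)"
    r"(?:\((?P<issue>[^)]+)\))?"
    r":(?P<fpage>[A-Z]?\d+[A-Z]?)(?:-(?P<lpage>[A-Z]?\d+[A-Z]?))?\.$"
)
# "Surname Initials", e.g. "Barone J" or "Herbert JR".
AUTHOR = re.compile(r"^(?P<surname>.+) (?P<initials>[A-Z]{1,3})$")


def parse_citation(text):
    """Return a dict of fields, or None when the assumed pattern does not fit."""
    match = CITATION.match(text.strip())
    if match is None:
        return None
    fields = {k: v for k, v in match.groupdict().items() if v is not None}
    fields["authors"] = fields["authors"].split(", ")
    return fields


def to_xml(fields):
    """Build an <element-citation> string from the parsed fields."""
    cit = ET.Element("element-citation", {"publication-type": "journal"})
    group = ET.SubElement(cit, "person-group", {"person-group-type": "author"})
    for author in fields["authors"]:
        m = AUTHOR.match(author)
        if m:
            name = ET.SubElement(group, "name")
            ET.SubElement(name, "surname").text = m.group("surname")
            ET.SubElement(name, "given-names").text = m.group("initials")
        else:
            # Group or unrecognised author: keep the text unchanged.
            ET.SubElement(group, "collab").text = author
    tags = [("title", "article-title"), ("journal", "source"), ("year", "year"),
            ("month", "month"), ("day", "day"), ("volume", "volume"),
            ("issue", "issue"), ("fpage", "fpage"), ("lpage", "lpage")]
    for key, tag in tags:
        if key in fields:
            ET.SubElement(cit, tag).text = fields[key]
    return ET.tostring(cit, encoding="unicode")


sample = ("Herbert JR, Barone J. On the possible relationship between AIDS "
          "and nutrition. Med Hypotheses 1988 Sep;27(1):51-54.")
print(to_xml(parse_citation(sample)))
```

A single fixed pattern like this breaks as soon as the reference deviates from the assumed form, which is why the failure cases listed in the next section matter.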

Why this structuring engine fails?

  1. Supports Unicode characters only. For example, if you have entities like &#x00E9;, they should be converted into é.
  2. Any month along with year or multiple years
    a. DeGrazia, D. (1998/99), “Animal Ethics…
  3. Any alpha characters along with volume?
  4. In article type, the volume number is mandatory to structure. For example:
    Iyer S, Jayanthi A (2003) Synlett 1125–1128.
  5. Any two elements merged (Journal merged with volume/year etc)
    a. 3rd ed Springer, Wien New York. : Edition and publisher are merged
  6. The tool expects both publisher and location details in certain references
  7. Publisher’s name is missing in the config file
  8. No proper delimiters for Location and Publishers
    a. Book Title, Bloomington, IN, Indiana University Press.
    b. Said R (1962) The geology of Egypt. Elsevier Publication Amsterdam.
  9. If author names are NOT separated from the rest of the reference by the year, check for name consistency.
    Inconsistent Author Names
    a. Whitty, G. Power, S., Halpin, D. : First author's initials are not followed by a comma
    b. Rama Chellappa, P.J. Phillips, and Azriel Rosenfeld : Only the second author's given names are abbreviated to initials
    c. Cannot handle names like: Pham Dinh Tao and Le Thi Hoai An, DC optimization algorithms…
    d. Will not structure: Zhang L Xiong C Deng X (1995) J Appl Polym Sci 56:103
  10. Abnormal Italics/bold
    a. <i>Human Values, 16</i> : Wrong
    b. <i>Human Values</i>, <i>16(1)</i> : Will not structure
    c. <i>Human Values</i>, <i>7</i>(<i>1)</i>, 281–304. : Will not structure
    d. <i>Human Values</i>, <b>16</b> : Can be structured
  11. More than one period (dot) in the title
  12. Check for an en dash (–) in journal names
  13. Volume cannot have too many letters (5, A5, 5A, A5-6, 5A-6, 5-6, A5B-6)
  14. <sup> tag/type in edition number will not work: 12<sup>th</sup>ed.,
  15. Check for other languages in author name
    a. Sennewald, K. und Wälde: ‘Und’ should be ‘and’
  16. Part of the reference may not be structured properly
    a. Two elements are joined by an emphasis such as italics, bold, etc.
    <i>Probability Theory. Volume II</i>: Here Volume may not be structured separately.
  17. When a contribution is made by an individual and a team – Will not structure
    a. Byles J and the Women’s Health Researchers. Study on Women’s Health. Australas J Ageing 1999;18(3):55-61
  18. Location – US States with multiple dots (N.Y. should be NY.)
    a. Location comes with “and” or “&” (London & New York)
  19. If a publisher name comes with Co, Inc., etc., it should end with a “.” (like Llc. or Inc.) [Update: Works for certain cases]
  20. Publisher’s name contains a “.” in the middle:
    a. Pattern and Process. W.H. Freeman: San Francisco, CA.[Update: Works for certain cases]
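
Some of the problems above can be spotted before a reference is submitted. The following is a rough pre-check sketch covering only items 1, 2, 10, 14 and 15 plus multi-line input. The patterns and the name `lint_reference` are hypothetical, they are not taken from the engine, and they will miss cases (for example item 10a).

```python
import re

# (pattern, message) pairs; each pattern is an approximation of one item above.
CHECKS = [
    (re.compile(r"&#?\w+;"),
     "item 1: entity found; convert it to a Unicode character"),
    (re.compile(r"\b\d{4}/\d{2,4}\b"),
     "item 2: multiple years such as 1998/99"),
    (re.compile(r"</i>[\s,(]*<i>"),
     "item 10: italics split across adjacent elements"),
    (re.compile(r"<sup>", re.IGNORECASE),
     "item 14: <sup> tag found"),
    (re.compile(r"\bund\b"),
     "item 15: 'und' between author names; use 'and'"),
    (re.compile(r"\n"),
     "reference spans multiple lines"),
]


def lint_reference(reference):
    """Return a list of warnings for one reference (empty if none matched)."""
    text = reference.strip()
    return [message for pattern, message in CHECKS if pattern.search(text)]


for ref in [
    "DeGrazia, D. (1998/99), Animal Ethics",
    "<i>Human Values</i>, <i>16(1)</i>",
    "<i>Human Values</i>, <b>16</b>",
]:
    print(ref, "->", lint_reference(ref))
```

An empty list only means that none of these few patterns matched; it does not guarantee that the reference will be structured.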
