Tip #35: Non-Latin Characters in PubMed

Back in January, I published "Tip #32: Non-Latin Characters in Ovid MEDLINE", which introduced my investigation of non-Latin/Roman characters across databases. This week we will look at how PubMed handles the following examples:

  1. 17β-HSD (17 beta-HSD)
  2. cáncer (Spanish for cancer)
  3. سرطان (Arabic for cancer)
  4. diabético (Spanish/Italian/Portuguese for diabetic)

 

17β-HSD

When you run the above search in PubMed, the search details show that it automatically transliterates the β to "beta". PubMed can handle Greek characters, so feel free to use either the Greek character or the transliterated version in your search.
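If you want to prepare the transliterated version yourself (for example, to reuse the same term in a database that does not handle Greek characters), here is a minimal Python sketch. The `GREEK_NAMES` table and the `spell_out_greek` function are my own illustrative names, and the mapping is only a small hand-picked sample, not PubMed's actual translation table.

```python
# Illustrative mapping only: a hand-picked sample of Greek letters,
# NOT the table PubMed uses internally.
GREEK_NAMES = {
    "α": "alpha",
    "β": "beta",
    "γ": "gamma",
    "δ": "delta",
    "κ": "kappa",
}


def spell_out_greek(term: str) -> str:
    """Replace each mapped Greek letter with its spelled-out name."""
    return "".join(GREEK_NAMES.get(ch, ch) for ch in term)


print(spell_out_greek("17β-HSD"))  # 17beta-HSD
```

Note that this produces "17beta-HSD" with no space after the number, which is only one of several spellings worth testing.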

 
Some variations you may also want to consider include phrases with hyphens and/or double quotes, and different spacing (the result count follows each search):

  1. 17beta-HSD - 833 (same as running "17beta HSD")
  2. 17betaHSD - 927
  3. "17 beta HSD" - 281
  4. 17 beta HSD - 374
  5. "17-Hydroxysteroid Dehydrogenases"[Mesh] - 2922
  6. "17 beta-hydroxysteroid dehydrogenase" - 1516
  7. 17 beta-hydroxysteroid dehydrogenase - 2777
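Spelling variations like the ones above can be listed systematically so that none are missed. The sketch below is one possible way to do it in Python; `spacing_variants` is a hypothetical helper of my own, and it only generates the strings. You still need to run each one in PubMed and compare the counts and search details yourself.

```python
def spacing_variants(prefix: str, middle: str, suffix: str) -> list:
    """Return hyphenated, closed-up, and spaced spellings of a three-part term.

    Any spelling that contains a space is also returned in double quotes,
    since a quoted phrase and an unquoted one can retrieve different results.
    """
    spellings = [
        f"{prefix}{middle}-{suffix}",   # e.g. 17beta-HSD
        f"{prefix}{middle}{suffix}",    # e.g. 17betaHSD
        f"{prefix} {middle} {suffix}",  # e.g. 17 beta HSD
        f"{prefix} {middle}-{suffix}",  # e.g. 17 beta-HSD
    ]
    variants = []
    for spelling in spellings:
        variants.append(spelling)
        if " " in spelling:
            variants.append(f'"{spelling}"')
    return variants


# Print one variant per line, ready to paste into the search box one at a time.
for variant in spacing_variants("17", "beta", "HSD"):
    print(variant)
```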

Pay special attention to how PubMed translates each of them! For example, search #7 maps to some additional MeSH terms that may or may not be relevant.

cáncer

PubMed completely ignores the accent in "cáncer" and runs the search as "cancer", including all automatic term mapping.
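To preview what a term looks like once its accents are ignored, you can strip the diacritics yourself. This is a minimal sketch using Python's standard `unicodedata` module; it illustrates the general technique (decompose, then drop the combining marks) and is an assumption about the effect, not a description of how PubMed implements it. The function name `strip_diacritics` is my own.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks (accents) while keeping the base letters."""
    # NFD splits an accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark, keeping everything else unchanged.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever remains into the usual NFC form.
    return unicodedata.normalize("NFC", stripped)


print(strip_diacritics("cáncer"))     # cancer
print(strip_diacritics("diabético"))  # diabetico
```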

سرطان  

PubMed was unable to run a search for سرطان and returned zero results.

And even for articles that do have non-English/non-Latin abstracts, PubMed doesn't appear to search with the non-Roman/Latin characters.
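Before pasting a term into PubMed, it can help to check which scripts it contains, so you know whether a transliterated or translated version may be needed. The sketch below is a rough heuristic of my own: it takes the first word of each letter's Unicode character name (for example "LATIN", "GREEK", "ARABIC") as the script. That is an approximation, not the official Unicode Script property, and `scripts_in` is a hypothetical helper name.

```python
import unicodedata


def scripts_in(text: str) -> set:
    """Return the approximate script of every letter in the text.

    Heuristic: the first word of the Unicode character name, such as
    LATIN, GREEK, or ARABIC. Digits, hyphens, and spaces are skipped.
    """
    found = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                found.add(name.split()[0])
    return found


print(sorted(scripts_in("17β-HSD")))  # ['GREEK', 'LATIN']
print(sorted(scripts_in("cáncer")))   # ['LATIN']
print(sorted(scripts_in("سرطان")))    # ['ARABIC']
```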

diabético

This one was interesting because, unlike the cáncer example, diabético isn't also an English-language term. It also contains an acute accent, this time over the e. PubMed ignored the diacritical mark, as we saw in the cáncer example, but because "diabético" isn't an English word, it didn't apply any additional automatic term mapping.
 

The 906 results primarily include the term in author affiliations, non-English titles, bilingual abstracts, and the untranslated author-supplied keyword fields.

Language-related notes from PubMed's help documentation:

Non-English abstracts

"PubMed may include non-English abstracts if supplied by the publisher. The abstract text defaults to English when a citation has an accompanying non-English abstract. Links to display the additional language(s) are available on the Abstract display. To retrieve citations with non-English abstracts, use the query hasnonenglishabstract"  

Non-English Language Article Title

Transliterated Title [tt]

"Words and numbers in title originally published in a non-English language, in that language. Non-Roman alphabet language titles are transliterated. Transliterated title is not included in Text Word [TW] retrieval."

Language [la]

"The language search field includes the language in which the article was published. Note that many non-English articles have English language abstracts. You may search using either the language or the first three characters of most languages, e.g., chi [la] retrieves the same results as chinese [la]. The most notable exception is jpn [la] for Japanese."

Title/Abstract [tiab]

"Words and numbers included in a citation's title, collection title, abstract, other abstract and author keywords (Other Term [ot] field). English language abstracts are taken directly from the published article. If an article does not have a published abstract, NLM does not create one."
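The field tags quoted above can be combined into a single query string. As a minimal sketch, the Python below builds such a query and the matching URL for NCBI's E-utilities ESearch endpoint, which is a separate programmatic interface not covered in this post. It only constructs the URL and does not send a request; `tagged` and `esearch_url` are hypothetical helper names, and the example query is my own illustration rather than a search I have reported results for.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (programmatic access to PubMed).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(term: str, tag: str) -> str:
    """Attach a PubMed field tag, e.g. tagged('spa', 'la') -> 'spa[la]'."""
    return f"{term}[{tag}]"


def esearch_url(query: str) -> str:
    """Build (but do not send) an ESearch URL for a PubMed query."""
    params = {"db": "pubmed", "term": query, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


# Example: a word in the transliterated title field, limited to Spanish.
query = f"{tagged('diabetico', 'tt')} AND {tagged('spa', 'la')}"
print(query)  # diabetico[tt] AND spa[la]
print(esearch_url(query))
```

The same query string can also be pasted directly into the PubMed search box.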
