Tip #66: Searching for author names in PubMed (an overview, headaches, and an Excel shortcut!)

As a librarian, I'm commonly asked to generate reports for departments on their publication outputs. This usually means creating a PubMed (or sometimes Scopus) search based on a list of names the department provides.

After a few years of these requests, I've compiled some insights and, perhaps most importantly, some shortcuts that speed up the process in PubMed. Note that I will only discuss generating the author-name search string, not how to structure affiliation search strings (which is a whole other can of worms!).

Basic structure of a PubMed author name search

Author searches in PubMed are structured as last name followed by initials (e.g., Wilson P[au] OR Wilson PJ[au]). PubMed automatically truncates after the last initial, so you don't need to add an asterisk (in fact, truncating an author name yourself can cause problems, as discussed in the following section). Automatic truncation captures records whether or not a middle initial is given (e.g., Wilson P[au] will retrieve records authored by P. Wilson, P.A. Wilson, P.J. Wilson, etc.), which is particularly helpful if you don't know an author's middle initial.

If you know an author's middle initial, or the author doesn't have or use one, you can turn off automatic truncation by using quotation marks. For example, searching "Wilson P"[au] OR "Wilson PJ"[au] will retrieve records by P. Wilson or P.J. Wilson only.
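If you build these strings with a script rather than by hand, the two patterns above (truncated and quoted) boil down to a couple of string templates. A minimal Python sketch; the function names are hypothetical, and it only formats text, it doesn't talk to PubMed:

```python
def author_term(last: str, initials: str, exact: bool = False) -> str:
    """Build one PubMed author term, e.g. Wilson P[au] or "Wilson PJ"[au].

    exact=True wraps the name in quotation marks, which turns off PubMed's
    automatic truncation after the last initial.
    """
    name = f"{last} {initials}".strip()
    return f'"{name}"[au]' if exact else f"{name}[au]"


def author_search(last: str, initial_sets: list[str], exact: bool = False) -> str:
    """OR together several initial combinations for the same last name."""
    return " OR ".join(author_term(last, i, exact) for i in initial_sets)


print(author_search("Wilson", ["P", "PJ"]))
# Wilson P[au] OR Wilson PJ[au]
print(author_search("Wilson", ["P", "PJ"], exact=True))
# "Wilson P"[au] OR "Wilson PJ"[au]
```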

Some…interesting PubMed quirks

As this blog has shown on multiple occasions, databases never like to make things too simple, and PubMed throws in a few quirks of its own.

A. Troublesome truncation

As noted previously, PubMed automatically truncates after the last initial without you needing to add an asterisk. If you do add one, however, PubMed does some strange things. For example, searching Wilson P*[au] retrieves records by P. Wilson, but also records by F. Perry Wilson, A. Peter R. Wilson, etc. It also retrieves records from authors with multiple last names, such as H.E. Wilson-Perez. After puzzling over this with my fellow UX Caucus bloggers, our best guess is that the truncated search also picks up records where the author's first initial is followed by a second given name starting with "P," as well as records where a second last name starts with "P." Why? We haven't figured that out just yet (if you have, please let us know!).
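When a batch string runs to hundreds of names, a stray asterisk is easy to miss. A quick scan before you run the search can catch it. A minimal Python sketch; the regular expression and the field-tag spellings it covers ([au] and [Author]) are my own assumptions, not anything PubMed documents:

```python
import re

# Matches an author-tagged term that ends in an asterisk, e.g. Wilson P*[au]
# or "Wilson P*"[Author]. Only the [au] and [Author] tags are covered here
# (an assumption); extend the pattern if your strings use other spellings.
TRUNCATED_AUTHOR = re.compile(r'\*"?\s*\[(?:au|author)\]', re.IGNORECASE)


def find_truncated_authors(query: str) -> list[str]:
    """Return each OR-separated chunk that truncates an author name with '*'."""
    chunks = [c.strip() for c in re.split(r"\bOR\b", query)]
    return [c for c in chunks if TRUNCATED_AUTHOR.search(c)]


print(find_truncated_authors("Wilson P[au] OR Wilson P*[au] OR Smith J[au]"))
# ['Wilson P*[au]']
```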

B. Automatic Term Mapping headaches

Another quirk I've found is that PubMed will sometimes apply automatic term mapping to names it can't find. Even more frustrating, it doesn't notify you when it decides to do this! For example, if I search Example S[au], I don't receive any errors, but when I go to Advanced Search and expand the Details, I can see that PubMed ran the search as follows:

Search: Example S[au]
("example"[All Fields] OR "examples"[All Fields]) AND S[Author]
Translations
Example: "example"[All Fields] OR "examples"[All Fields]

As any search professional can imagine, this can cause a lot of clutter! So be sure to double-check how PubMed is interpreting your author searches behind the scenes (in the Details section of Advanced Search), even if you don't see an error on the results page.
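If you have a long list of names to check, you can also ask PubMed for its translation programmatically. The sketch below assumes NCBI's ESearch E-utility, whose JSON output includes a querytranslation field; the function names are hypothetical, and the [All Fields] test is only a heuristic I'm assuming will flag the kind of mapping shown above, so treat a hit as "go look," not proof:

```python
import json
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def query_translation(term: str) -> str:
    """Ask PubMed (via NCBI ESearch) how it translated `term`. Needs network."""
    params = urllib.parse.urlencode(
        {"db": "pubmed", "term": term, "retmode": "json", "retmax": 0}
    )
    with urllib.request.urlopen(f"{ESEARCH}?{params}", timeout=30) as resp:
        data = json.load(resp)
    return data["esearchresult"]["querytranslation"]


def was_auto_mapped(translation: str) -> bool:
    """Heuristic: a pure author search should not expand to [All Fields]."""
    return "[All Fields]" in translation


# Example (requires network access):
# t = query_translation("Example S[au]")
# print(t, was_auto_mapped(t))
```

If you loop over many names, pause between calls; NCBI limits E-utilities traffic without an API key to about three requests per second.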

To correct for this issue, you can put quotation marks around the name (e.g., "Example S"[au]); however, you will then miss records that include a middle initial, so it may be worth tracking down the author's middle initial and adding a second quoted form with OR.

C. Authors with hyphenated last names

Lastly, if your authors have hyphenated last names, you will need to include both names for their records to be retrieved in PubMed. For example, if an author's name is H. Wilson-Perez, searching Wilson H[au] OR Perez H[au] won't retrieve that author's publications. Instead, you will need to search Wilson Perez H[au] or Wilson-Perez H[au]. Interestingly, one of my fellow UX bloggers found that searching without a space between the last names seems to work as well (e.g., WilsonPerez H[au]).
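If you generate strings in bulk, a small helper can emit all of those spellings at once. A minimal Python sketch, assuming the last name arrives as a single hyphenated string; the function name is hypothetical, and the run-together form is optional since it rests on one blogger's observation:

```python
def hyphenated_variants(last: str, initials: str) -> str:
    """OR together the spellings of a hyphenated last name reported to work
    in PubMed: spaced, hyphenated, and run together (WilsonPerez)."""
    parts = [p for p in last.split("-") if p]
    if len(parts) < 2:
        # Not hyphenated: just return the ordinary author term.
        return f"{last} {initials}[au]"
    forms = [" ".join(parts), "-".join(parts), "".join(parts)]
    return " OR ".join(f"{form} {initials}[au]" for form in forms)


print(hyphenated_variants("Wilson-Perez", "H"))
# Wilson Perez H[au] OR Wilson-Perez H[au] OR WilsonPerez H[au]
```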

Excel as a shortcut

Before creating a publication tracking search, I require departments to provide an Excel sheet with the researchers' last names in Column A, first initials in Column B, and, when possible, middle initials in Column C. This layout lets me use Excel formulas to generate the author search string automatically (which can certainly be a blessing for departments with over a hundred researchers!).

A template of my Excel sheet can be found here. If you download the sheet, you can simply paste your list of last names, first initials, and middle initials into Columns A through C (respectively), and it should auto-generate the PubMed searches for you in Column G.
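If you'd rather script this step than use a spreadsheet, here is a minimal Python sketch of the same idea. It is not a translation of the template's formulas; the rule it applies (quoted forms when a middle initial is known, automatic truncation otherwise), the header-free three-column CSV layout, and the function names are my assumptions:

```python
import csv
import io


def row_to_terms(last: str, first: str, middle: str = "") -> str:
    """Build the author search for one researcher.

    With a known middle initial, quote both forms to switch off truncation
    ("Wilson P"[au] OR "Wilson PJ"[au]); without one, rely on PubMed's
    automatic truncation (Wilson P[au]).
    """
    last, first, middle = last.strip(), first.strip(), middle.strip()
    if middle:
        return f'"{last} {first}"[au] OR "{last} {first}{middle}"[au]'
    return f"{last} {first}[au]"


def terms_from_csv(text: str) -> list[str]:
    """Read rows of last name, first initial, optional middle initial."""
    rows = csv.reader(io.StringIO(text))
    # Pad short rows so a missing middle-initial column becomes "".
    return [row_to_terms(*(row + ["", ""])[:3]) for row in rows if row and row[0].strip()]


for line in terms_from_csv("Wilson,P,J\nSmith,A,\n"):
    print(line)
# "Wilson P"[au] OR "Wilson PJ"[au]
# Smith A[au]
```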

Once you've generated those search strings:

  1. Copy Column G, from G2 down to the cell for the last name in your list
  2. Paste into a Word document as text only
  3. Press Ctrl+H to open Find and Replace
  4. In the Find what box, enter: ^p
  5. Leave the Replace with box blank
  6. Click Replace All
  7. Delete the final OR at the end of the string
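The Word steps above amount to joining the lines and dropping the dangling OR. If you already have the Column G output as plain text, a minimal Python sketch does the same; it assumes each line holds one search, optionally ending in OR, and the function name is hypothetical:

```python
import re

# A trailing OR (plus surrounding whitespace) at the end of a line.
TRAILING_OR = re.compile(r"\s*\bOR\s*$")


def join_author_lines(lines: list[str]) -> str:
    """Join one-search-per-line output into a single OR'd string,
    skipping blank lines and leaving no dangling OR at the end."""
    pieces = []
    for line in lines:
        piece = TRAILING_OR.sub("", line.strip())
        if piece:
            pieces.append(piece)
    return " OR ".join(pieces)


column_g = ["Wilson P[au] OR", "Smith A[au] OR", ""]
print(join_author_lines(column_g))
# Wilson P[au] OR Smith A[au]
```

To use it on a saved text file, pass the file's lines, e.g. join_author_lines(open("column_g.txt", encoding="utf-8").readlines()), where column_g.txt is a hypothetical export of Column G.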

And that's it! You now have a batch author search in no time flat, which you can then combine with your affiliation search hedge. This method has saved me countless hours when running searches for departments with 100+ names.
