Tip#66: Searching for author names in PubMed (an overview, headaches, and an Excel shortcut!)
As a librarian, a common part of my job is to generate reports for departments on their publication output. This usually means building a PubMed (or sometimes Scopus) search from a list of names the department provides.
After a few years of these requests, I've been able to compile some insights and, perhaps most importantly, shortcuts to speed up the process in PubMed. Note that I will only discuss generating the author name search string, not how to structure affiliation search strings (which is a whole other can of worms!).
Basic structure of a PubMed author name search
Author searches in PubMed are structured as last name followed by initials (e.g., Wilson P[au] OR Wilson PJ[au]). PubMed automatically truncates after the last initial in the name, so you don't need to add a truncation symbol (in fact, using an asterisk with an author name can cause issues with the search, as discussed in the following section). This automatic truncation captures records whether or not middle initials are given (e.g., Wilson P[au] will retrieve records authored by P. Wilson, P.A. Wilson, P.J. Wilson, etc.). This feature is particularly helpful if you don't know the authors' middle initials.
If you know an author's middle initial or an author does not have or use a middle initial, you can turn off this automatic truncation feature by using
quotation marks. For example, searching "Wilson P"[au] OR "Wilson
PJ"[au] will retrieve records by P. Wilson or P.J. Wilson, only.
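For anyone who would rather script this than type it, here is a minimal Python sketch of the two forms described above. The function name `author_term` and its `exact` flag are my own inventions for illustration; the only PubMed syntax it relies on is the [au] tag and the quotation marks already discussed.

```python
def author_term(last_name, initials, exact=False):
    """Build a PubMed author search term such as Wilson P[au].

    exact=False leaves the name unquoted, so PubMed's automatic
    truncation after the last initial still applies.
    exact=True wraps the name in quotation marks to switch it off.
    """
    name = f"{last_name.strip()} {initials.strip()}"
    if exact:
        return f'"{name}"[au]'
    return f"{name}[au]"


# Unquoted: also picks up P.A. Wilson, P.J. Wilson, and so on.
print(author_term("Wilson", "P"))

# Quoted: limited to the initials exactly as given.
print(" OR ".join([
    author_term("Wilson", "P", exact=True),
    author_term("Wilson", "PJ", exact=True),
]))
```

The second print statement produces the same "Wilson P"[au] OR "Wilson PJ"[au] string shown in the example above.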
Some…interesting PubMed quirks
As evinced by this blog on multiple occasions, databases
never like to make things too simple, and PubMed has thrown in a few quirks.
A. Troublesome truncation
As noted previously, PubMed automatically truncates after the last initial without you needing to use an asterisk. If you do use an asterisk, however, it does some strange things. For example, searching Wilson P*[au], in addition to retrieving records by P. Wilson, will retrieve records by F. Perry Wilson, A. Peter R. Wilson, etc. It will also retrieve records from authors with multiple last names, such as H.E. Wilson-Perez. After puzzling it over with my fellow UX Caucus blogging team, we're thinking the truncation is retrieving additional records where the author has a first initial followed by a second, non-last name starting with "P," and also records where the second last name starts with a "P." Why? We haven't figured that out just yet (if you have, please let us know!).
B. Automatic Term Mapping headaches
Another quirk I've found is that PubMed will sometimes apply automatic term mapping to names it can't find. Even more frustrating, it doesn't notify you when it decides to do this! For example, if I search Example S[au], I don't receive any errors, but when I go to the advanced search and expand the details, I can see that PubMed ran the search as follows:
As any search professional can imagine, this can cause a lot of clutter! So be sure to double check how PubMed is interpreting your author searches in the back end, even if you don't see an error on the results page!
To correct for this issue, you can put quotation marks
around the name (e.g., "Example S"[au]); however, you are
going to miss instances where a middle initial is included in the record, so it
may be in your best interest to hunt down that middle initial to combine with
OR.
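If you have a long list of names, checking the search details one at a time gets tedious. Below is a rough Python sketch of one way to do the same check in bulk. It assumes NCBI's E-utilities ESearch endpoint and assumes its JSON response carries the translated query in a `querytranslation` field; please confirm both against the current NCBI documentation before relying on it. The function names are hypothetical, and nothing here is part of my actual workflow.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the public ESearch endpoint of NCBI E-utilities.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term):
    """Build an ESearch request URL for a PubMed search term."""
    params = {"db": "pubmed", "term": term, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def query_translation(response):
    """Pull the translated query out of a parsed ESearch JSON response.

    Assumption: the translation lives under
    esearchresult -> querytranslation. Returns "" if it is missing.
    """
    return response.get("esearchresult", {}).get("querytranslation", "")


def fetch_translation(term):
    """Send the search to NCBI and return how PubMed interpreted it."""
    with urlopen(esearch_url(term), timeout=30) as resp:
        return query_translation(json.load(resp))
```

Calling `fetch_translation("Example S[au]")` for each name and skimming the results for anything that no longer looks like a plain author search should surface the same surprises as the advanced search details. If you run this over a long list, pace your requests according to NCBI's usage guidelines.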
C. Authors with hyphenated last names
Lastly, if your authors have hyphenated last names you will need to include both names in order for their records to be retrieved in PubMed. For example, if an author's name was H. Wilson-Perez, searching Wilson H[au] OR Perez H[au] won't retrieve that author's publications. Instead, you will need to search Wilson Perez H[au] or Wilson-Perez H[au]. Interestingly, one of my fellow UX bloggers found that searching without a space between the last names seems to work as well, e.g., WilsonPerez H[au].
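As a small illustration, this Python sketch generates the three spellings mentioned above (hyphenated, spaced, and run together) for a double last name. The function name `hyphenated_variants` is hypothetical, and whether you actually need all three variants in one search is a judgment call I'm leaving to you.

```python
import re


def hyphenated_variants(last_name, initials):
    """Return [au] terms for each spelling of a double last name."""
    # Split on hyphens or spaces so "Wilson-Perez" and "Wilson Perez"
    # are treated the same way.
    parts = [p for p in re.split(r"[-\s]+", last_name.strip()) if p]
    if len(parts) < 2:
        # Single last name: nothing to vary.
        return [f"{last_name.strip()} {initials}[au]"]
    surnames = ["-".join(parts), " ".join(parts), "".join(parts)]
    return [f"{surname} {initials}[au]" for surname in surnames]


print(" OR ".join(hyphenated_variants("Wilson-Perez", "H")))
# Wilson-Perez H[au] OR Wilson Perez H[au] OR WilsonPerez H[au]
```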
Excel as a shortcut
Before creating a publication tracking search, I require departments to provide me with an Excel sheet that has the researchers' last names in Column A, first initials in Column B, and, when possible, middle initials in Column C. This formatting lets me use Excel formulas to generate the author search string for me (which can certainly be a blessing for departments with over a hundred researchers!).
A template of my Excel sheet can be found here. If you
download the sheet, you can simply paste your list of last names, first initials,
and middle initials into Columns A through C (respectively), and it should
auto-generate the PubMed searches for you in Column G.
Once you've generated those search strings:
- Copy Column G from G2 down to the row of the final name in your list
- Paste into a Word doc as text only
- Press Ctrl+H on your keyboard to open Find and Replace
- In the find box, insert this text: ^p
- Leave the replace box blank
- Click "replace all"
- Delete the last OR
And that's it! You now have a batch author search in no time flat, which you can then combine with your affiliation search hedge. This method has saved me countless hours when running searches for departments with 100+ names.
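If Excel and Word aren't your thing, the same idea can be sketched in a few lines of Python. To be clear, this is not the formula from my template: the function names are made up for illustration, and the quoting rule is an assumption based on the earlier sections (quote both forms when a middle initial is known, otherwise leave the name unquoted so automatic truncation still applies).

```python
def author_clauses(last, first, middle=""):
    """Return the [au] clauses for one researcher.

    Assumed rule: with a known middle initial, search the quoted
    "Last F" and "Last FM" forms; without one, search the unquoted
    Last F form so PubMed truncates after the initial.
    """
    last = last.strip()
    first = first.strip().upper()
    middle = middle.strip().upper()
    if middle:
        return [f'"{last} {first}"[au]', f'"{last} {first}{middle}"[au]']
    return [f"{last} {first}[au]"]


def batch_search(rows):
    """Join every researcher's clauses into one OR'd search string.

    Each row is (last name, first initial) or
    (last name, first initial, middle initial), mirroring Columns A to C.
    """
    clauses = []
    for row in rows:
        if not row or not row[0].strip():
            continue  # skip blank spreadsheet rows
        clauses.extend(author_clauses(*row))
    # Joining with " OR " means there is no trailing OR to delete.
    return " OR ".join(clauses)


rows = [("Wilson", "P", "J"), ("Example", "S", "")]
print(batch_search(rows))
# "Wilson P"[au] OR "Wilson PJ"[au] OR Example S[au]
```

The rows could just as easily come from a CSV export of the department's spreadsheet.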