I was in my local pub quiz recently when as usual we were faced with the anagram speed round. The quiz-master Dr Paul reads out a collection of words with a clue to a mystery person’s identity, and the first team to correctly shout out the answer wins a spot prize. It’s often a box of Tunnock’s Teacakes – so well worth the effort!
We were told on this particular occasion that the jumbled letters ‘OR ANAL MR BOND’ (!) were an anagram of a famous actor. As the words are being read out the well-worn technique is to write them in a “cloud” on the page, and then try to decipher the correct name. This can often take some time, but on this night myself and a couple of other teams almost instantaneously shouted “Orlando Bloom!”. One of the easiest anagrams we’d had, I thought. But I was wrong. It wasn’t Orlando Bloom, and so we had to get to thinking again. It turned out that the correct name we were after was old-Godfather himself, Marlon Brando.
I had to double-check that Orlando Bloom didn’t fit the original character set – that indeed, Orlando Bloom and Marlon Brando aren’t anagrams of each other. They’re close, but they’re different. This got me to wondering whether there are any famous actors who’s names are anagrams of each other, and to investigate this I took to Python.
The Internet Movie Database (or IMDb for short) is an encyclopedia of all things film and TV. On the page http://www.imdb.com/list/ls058011111/ I found a list of the “Top 1000 Actors and Actresses” created by user hagennelson. This forms the basis for the following research.
Since the list spreads over four pages we first need to download the elements and concatenate them into a single pandas data frame.
import pandas as pd frame_list = [] for start in [1, 251, 501, 751]: URL = "http://www.imdb.com/list/ls058011111/?start=" + str(start) + "&view=compact&sort=listorian:asc" dfs = pd.read_html(URL, match="Name", header=0, index_col=0) frame_list.append(dfs[0]) df = pd.concat(frame_list) df.index = df.index.astype(int)
The next task is to create a new column containing the alphabetically sorted characters of the actors’ names:
df['Chars'] = df['Name'].apply(lambda x: ''.join(sorted(x.lower().strip().replace(" ", ""))))
We can see the effect of this in the top five rows of the data frame using df.head()
.
One quick way to check for anagrams now is just to look for duplicates in the Chars column. If we have 1000 unique character strings then we know that we have no anagrams at all. In fact it turns out that we have only 997 unique character strings, so it looks like we have three pairs of anagrams.
One quick way to check for anagrams now is just to look for duplicates in the Chars column. If we have 1000 unique character strings then we know that we have no anagrams at all. In fact it turns out that we have only 997 unique character strings, so it looks like we have three pairs of anagrams.
997 Name Chars # 292 Neve Campbell abceeellmnpv 846 Neve Campbell abceeellmnpv Name Chars # 773 Paul Scofield acdefillopsu 949 Paul Scofield acdefillopsu Name Chars # 708 Patrick Wilson aciiklnoprstw 954 Patrick Wilson aciiklnoprstw
Unfortunately what we actually find above is that three of the actors appear erroneously on the list twice each! The actual list of 997 actors do not contain any self-anagrams whatsoever.
Given that we weren’t able to find anagrams, and that the downloaded list contains the actors birth dates, I thought it would be fun to see whether any of the famous thespians share the same date of birth. This turns out to be far more fruitful.
First we need to rid ourselves of the duplicated actors discovered above.
df = df.drop_duplicates(subset='Name')
Similar to above we now wish to find out which, if any, birth dates are duplicated within our dataframe.
df_duplicates = df[df.duplicated(subset='Birth Date')][['Name', 'Birth Date']] for bd in df_duplicates['Birth Date']: print df[df['Birth Date']==bd][['Name', 'Birth Date']].to_csv(header=False)
195,Julianne Moore,3 December 1960 199,Daryl Hannah,3 December 1960 238,Lena Headey,3 October 1973 292,Neve Campbell,3 October 1973 397,Anna Paquin,24 July 1982 493,Elisabeth Moss,24 July 1982 145,Glenda Jackson,9 May 1936 571,Albert Finney,9 May 1936 494,Adrianne Palicki,6 May 1983 587,Gabourey Sidibe,6 May 1983 523,Tom Wilkinson,5 February 1948 668,Christopher Guest,5 February 1948 248,Gillian Anderson,9 August 1968 673,Eric Bana,9 August 1968 379,Ellen Page,21 February 1987 779,Ashley Greene,21 February 1987 786,Thandie Newton,6 November 1972 827,Rebecca Romijn,6 November 1972 747,Keri Russell,23 March 1976 836,Michelle Monaghan,23 March 1976 753,Djimon Hounsou,24 April 1964 871,Cedric the Entertainer,24 April 1964 77,Angelina Jolie,4 June 1975 920,Russell Brand,4 June 1975 629,Catalina Sandino Moreno,19 April 1981 932,Hayden Christensen,19 April 1981 425,Chris Pine,26 August 1980 946,Macaulay Culkin,26 August 1980 913,Justin Bartha,21 July 1978 953,Josh Hartnett,21 July 1978 828,Michelle Rodriguez,12 July 1978 964,Topher Grace,12 July 1978 858,John Leguizamo,22 July 1964 970,David Spade,22 July 1964
And above we get a set of 17 pairs of actors from our original list who were born on exactly the same day. To pick out just three pairs we have:
3 December 1960: Julianne Moore and Daryl Hannah.
21 February 1987: Ellen Page and Ashley Greene.
4 June 1975: Angelina Jolie and Russell Brand.
You just never know — this could all come in useful at a pub quiz one day!
Leave a Reply