Cleaning Up Scraped Usernames From LinkedIn

One of the things pentesters do during an engagement is generate a list of potential usernames to use in various attacks on the target organization. Several utilities have been developed over the years to gather this info from LinkedIn, but I have not had great success with them for one reason or another. This is certainly not to discredit any of them, but I’ve typically resorted to scraping LinkedIn manually, copying/pasting entire pages into a text file, adding a little SublimeText magic, and coming out on the other side with a user list. A bit time-consuming? Sure…but most of the organizations I’ve dealt with don’t have thousands and thousands of users…so it wasn’t overly painful.

Still, it wasn’t exactly great either. A few weeks ago, I remembered reading about a new technique for doing this, posted by Carrie Roberts over at the Black Hills Information Security blog:

Gathering Usernames from Google LinkedIn Results Using Burp Suite Pro

In short, this worked flawlessly. My target organization is small, but within a few minutes of setting up Burp and crawling the results, I had a decent list of about 120 users ready to go. Of course, this is in raw format, not the final format we’ll want for usernames. In other words, the file contained “John Smith” and “Jane Doe,” and we’ll need “jsmith” and “jdoe” for our final format. The format will vary depending on the organization, but first initial plus last name is still very common. So how can we take the raw output and get it into a list of usernames or emails that we can work with in Burp or other utilities?

As Carrie points out, you can then take that raw data and import it into Excel, but I thought it might be useful to show some of the quick “cleaning” methods I use to make that import as easy and clean as possible.

Let’s say that after going through the process, we end up with a text file containing the following 5 users we found at SomeRandomCompany Inc. (all completely fictitious, of course):

John Smith
Linda Rogers Jones
Jane Doe
Lisa Campbell-Smith
Fred Rogers - MBA

These results are representative of the output you will get. Some people put their degree info in their last name field, some users have hyphenated last names, and some users have 3 or sometimes 4 components to their name. So what can we do with this? If we copy/paste into Excel, everything will appear in a single column…that’s no good. Here’s what I ended up doing that worked fairly well. (Note: This was in SublimeText, so YMMV depending on your text editor)

  1. “Find –> Find” and search for a space followed by a hyphen
  2. Click “Find All”
  3. “Edit –> Text –> Delete to End” – This should strip all of the junk from the end of any record that has a degree or some other hyphen-separated suffix after the last name.
    • Note – if your list is large, you may lose a few hyphenated last names depending on how they are written. A name typed with a space before the hyphen (for example, “Campbell -Smith”) will be cut off too.
  4. “Find –> Replace” and search for a single space.
  5. In the replace field, put a comma.
  6. Click “Replace All”
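
The six steps above can also be approximated in a few lines of Python. This is a rough sketch rather than the workflow described here: `clean_names` is a hypothetical helper name, and it assumes the raw file has one name per line.

```python
def clean_names(lines):
    """Mirror the editor steps: drop ' -...' suffixes, then swap spaces for commas."""
    cleaned = []
    for line in lines:
        name = line.strip()
        if not name:
            continue  # skip blank lines left over from copy/paste
        # Steps 1-3: cut everything from the first " -" to the end of the line.
        name = name.split(" -", 1)[0].strip()
        # Steps 4-6: replace each single space with a comma.
        cleaned.append(name.replace(" ", ","))
    return cleaned


raw = [
    "John Smith",
    "Linda Rogers Jones",
    "Jane Doe",
    "Lisa Campbell-Smith",
    "Fred Rogers - MBA",
]
for row in clean_names(raw):
    print(row)
```

Like the editor version, this leaves “Campbell-Smith” alone but would truncate a name typed as “Campbell -Smith”.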

At this point, our list should look like this:

John,Smith
Linda,Rogers,Jones
Jane,Doe
Lisa,Campbell-Smith
Fred,Rogers

Save the file, then import that into Excel as comma delimited:

“Data –> Get External Data –> Import Text File”


Next: (you can leave the column types as General also)

Finish. We end up with:

Typically, what I’ll do is make each hyphenated user into 3, like this:

Once we have our list with just 2 columns, then in C1, use the following formula (again, this is if your username format is first initial, last name):


You can apply that formula to all of column C by highlighting it, and clicking “Fill –> Down”
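
The same first-initial-plus-last-name step can be sketched outside Excel as well. The Python below is a minimal illustration, not the spreadsheet formula itself. The names `to_username` and `last_name_variants` are hypothetical, usernames are assumed to be lowercase (as in “jsmith”), and expanding a hyphenated last name into each half plus the full hyphenated form is only one plausible reading of the three-way split mentioned earlier.

```python
def to_username(first, last):
    """First initial + last name, lowercased (e.g. John Smith -> jsmith)."""
    return (first[:1] + last).lower()


def last_name_variants(last):
    """ASSUMPTION: a hyphenated last name yields each part on its own
    plus the full hyphenated form; other names are returned unchanged."""
    if "-" not in last:
        return [last]
    return last.split("-") + [last]


# Two-column rows (first name, last name), as they would sit in columns A and B.
rows = [("John", "Smith"), ("Jane", "Doe"), ("Lisa", "Campbell-Smith")]

usernames = [
    to_username(first, variant)
    for first, last in rows
    for variant in last_name_variants(last)
]
print("\n".join(usernames))
```

If the target organization uses a different convention (first.last, for example), only `to_username` would need to change.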

And finally, we end up with:

I’m guessing there are 100 other ways to do this, but I’m not a sed/awk ninja by any means. Hopefully this helps someone out there. I know it’s a little oversimplified and might not work quite as well with larger lists, but it will at least get you started.