
Archive for March, 2012

Data-Mining Craigslist (No Boys Allowed)

Moving home is never fun, and that seems to go double in a big city. First, you ask around among your friends, and most likely, although none of them are moving, they all know a couple of people who have a room or are looking for someone else to move with. One by one, you talk to these pre-vetted strangers and quickly discover the vast range of requirements that people have. This time around I saw all the usual suspects: people who want to pay more or less rent; live in a different area; move on a different date; or keep a vegan-only kitchen. But I also saw one novel factor: a whole bunch of people who only wanted to live with girls.

New York is a girl-heavy city (unlike, say, Middlesbrough, a heavy-girl city), but it’s also a very liberal place. So I was intrigued by this trend, and wondered whether it was simply a statistical blip from my small sample size. I threw together a Perl script and scraped tens of thousands of postings from the rooms & shares section of New York’s Craigslist, which is the most common method of finding roommates in a big city (Gumtree is an alternative in other countries). I then fed the results into a number of data-mining suites I’ve built or used as part of my day job and waited for the results.

The first thing to do was to normalise the data. All capitalisation and punctuation was removed, carefully. It’s not enough to simply convert all hyphens and colons to spaces, for example. The code for marijuana on such sites is ‘4-20’ or ‘420’ (as in “must be 420 friendly”), so splitting this into ‘4 20’ is meaningless, but the oddly popular ‘washing-machine’ should clearly be split into two words. Data from Princeton’s WordNet, together with some manual effort, took care of this. Contractions such as “you’re” need to be expanded, and there are many other boring details to be taken care of. Fortunately, I already have the means to do most of this automatically.
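The normalisation step can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the original Perl pipeline: the tiny `VOCABULARY` set is a hypothetical stand-in for WordNet, and `PROTECTED` and `CONTRACTIONS` are hypothetical lookup tables. A hyphenated token is split only when every part is a known word, so ‘washing-machine’ becomes two words while ‘4-20’ survives intact.

```python
import re

# Hypothetical stand-ins: the original used Princeton's WordNet plus manual
# effort; this tiny hand-written vocabulary plays that role here.
VOCABULARY = {"washing", "machine", "must", "be", "friendly", "you", "are", "clean"}
PROTECTED = {"4-20", "420"}  # tokens that must never be split or stripped
CONTRACTIONS = {"you're": "you are", "we'll": "we will", "don't": "do not", "i'm": "i am"}


def clean(piece):
    """Remove every character that is not a lowercase letter or digit."""
    return re.sub(r"[^a-z0-9]", "", piece)


def normalise(text):
    """Lowercase, expand contractions and strip punctuation, carefully."""
    text = text.replace("\u2019", "'").lower()  # curly apostrophe -> straight
    tokens = []
    for raw in text.split():
        word = raw.strip(".,!?;:()\"")
        if word in PROTECTED:
            tokens.append(word)
            continue
        # A contraction may expand into several words.
        for piece in CONTRACTIONS.get(word, word).split():
            parts = [p for p in (clean(x) for x in piece.split("-")) if p]
            if len(parts) > 1 and all(p in VOCABULARY for p in parts):
                tokens.extend(parts)           # 'washing-machine' -> two words
            elif parts:
                tokens.append("".join(parts))  # anything else: drop the hyphen
    return tokens


print(normalise("You’re 4-20 friendly? Washing-machine!"))
# ['you', 'are', '4-20', 'friendly', 'washing', 'machine']
```

Unknown hyphenated compounds are simply joined here; a fuller version would need a much larger dictionary and the manual exceptions mentioned above.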

Even after this, the initial data was, as always, useless. First, I looked at popular ngrams – sequences of n consecutive words – and found predictably boring results.

Most popular trigrams
21% ‘looking for a’
18% ‘to move in’
14% ‘the room is’
11% ‘if you are’
10% ‘i am a’
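A minimal Python sketch of the trigram count, assuming the percentages above are the share of posts containing each trigram at least once (the post does not say exactly how they were computed) and that each post has already been normalised into a list of tokens. The function names `trigrams` and `top_trigrams` are illustrative only.

```python
from collections import Counter


def trigrams(tokens):
    """Every run of three consecutive tokens, as tuples."""
    return zip(tokens, tokens[1:], tokens[2:])


def top_trigrams(posts, k=5):
    """Return the k trigrams found in the most posts, with the % of posts."""
    counts = Counter()
    for tokens in posts:
        counts.update(set(trigrams(tokens)))  # set(): count each post once
    total = len(posts)
    return [(" ".join(gram), 100.0 * n / total) for gram, n in counts.most_common(k)]


posts = [
    ["looking", "for", "a", "room"],
    ["i", "am", "looking", "for", "a", "roommate"],
    ["the", "room", "is", "sunny"],
]
print(top_trigrams(posts, k=1))  # [('looking for a', 66.66666666666667)]
```

Counting each post once, rather than every occurrence, keeps a single rambling post that repeats “looking for a” from inflating the figure.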

But even these had value. It seems the posters aren’t particularly egotistical, with first-person utterances only slightly more common than those in the second person. Interestingly, the first-person statements were definitive (“I am a” or “we will be”), whereas the second-person statements were surprisingly imperative (“you are a” or “you must”). Despite this, posters were polite, with ‘please’ occurring in 35% of posts. The most common non-mundane ngram followed by exclamation marks was “Washer and dryer” or “Washer/dryer”, as in “Washer and dryer in building!!”, a rare and desirable treat in New York.

Requirements similar to ‘you must be…’
1. Respectful
2. Clean
3. A good fit (with current roommates/with us)
4. Financially stable/have a job
5. Female

9. Normal

A lexicon unique to the site has also emerged, with the aforementioned “420 friendly” (generally positive, i.e. required) and “bring the party home/back” (generally negative) appearing fairly frequently. Posters cared far more that a potential roommate was ‘respectful’ than ‘clean’, and were far more likely to identify themselves as ‘white’ than as any other ethnicity (and yes, I checked they weren’t simply talking about the colour of the walls). The word ‘lesbian’ was used only twice, compared to hundreds of occurrences of ‘gay female’, a trend of which I was previously unaware, and the word ‘chill’ (laid-back, cool, fun) was very likely to be found within a post littered with errors of grammar and spelling.

Initial exploration over, I turned my attention to the question that had started all this: do New Yorkers really find the idea of living with men unbearable? It seems they do. Somewhere between 11% and 13% of posts explicitly required a female roommate, and around 17% preferred one. The only more common requirement was ‘no pets’ (around 35%), but this is a tickbox on Craigslist. Only about 10% specified ‘no smoking’, 8% ‘no hard drugs’ and 6% ‘no drugs’. By comparison, male roommates were required in less than 1% of posts.
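Sorting posts into ‘required’ and ‘preferred’ can be approximated with pattern matching. The sketch below is hypothetical: the regular expressions are guesses at common phrasings, not the patterns actually used, and a real classifier would need many more variants. It also accepts the misspelling ‘perfer’, as seen in the example that follows. A post matching a hard requirement is counted as ‘required’ even if it also mentions a preference.

```python
import re
from collections import Counter

# Hypothetical patterns: guesses at common phrasings, not the ones actually used.
REQUIRED = re.compile(
    r"\b(?:females? only|female roommates? only|girls? only|no (?:men|guys|males))\b"
)
PREFERRED = re.compile(
    r"\b(?:females?|girls?) preferred\b"
    r"|\bp(?:re|er)fer (?:a )?(?:females?|girls?)\b"  # also catches 'perfer'
)


def classify(post):
    """Label a post 'required', 'preferred' or 'neither'."""
    text = post.lower()
    if REQUIRED.search(text):  # a hard requirement wins over a preference
        return "required"
    if PREFERRED.search(text):
        return "preferred"
    return "neither"


def shares(posts):
    """Percentage of posts carrying each label."""
    labels = Counter(classify(p) for p in posts)
    return {name: 100.0 * labels[name] / len(posts)
            for name in ("required", "preferred", "neither")}


sample = [
    "Females only, no pets",
    "I am female chef would perfer female bout will consider the right very clean guy.",
    "Looking for a clean, respectful roommate",
]
print([classify(p) for p in sample])  # ['required', 'preferred', 'neither']
```

The word boundaries matter: without the trailing `\b`, “no mention of gender” would be misread as “no men”.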

A ‘female preferred’ example
…NON SMOKER >>>NO DRUGS no exceptions …GAY friendly bc I am Gay . So no closed minds . I am female chef would perfer female bout will consider the right very clean guy.

The above is one of the few examples where any reason whatsoever is given; for the rest, I can only guess, and relate the guesses of some New York friends. One has suggested that there exists a stereotype that gentlemen are ‘obnoxious and slobbish’, which is somewhat corroborated by the above quote. I can’t argue against that one, but we do have redeeming qualities too. Another notes that ‘what if he turns out to be a rapist?’ is, apparently, a question to be asked of a potential new roommate. Whilst Craigslist does contain the odd posting warning seekers away from a certain location, including one recent set of posts warning that a man on the Upper East Side tried to rape a girl who came to view an apartment, I feel the average rapist doesn’t sign a one-year lease with his victims prior to the act.

It isn’t just girls who only want to live with girls, either. One male German banker posts:

I want to live with more girls, my roommates that are leaving are girls and I want to keep the vibe.

Another chap who I’m simply assuming without evidence to be a banker is looking for:

A girl to live in my downtown apartment, very cheap rent, must be willing to be a tease and if we hit it off, maybe more.

Charming, I’m sure.

So, whilst the handful of guys with the ‘females only’ requirement have explained themselves, I’m still at a loss as to the motivations of the girls that specify this. But, as I’m not particularly interested in living with someone who’d make such a distinction, or those who specify ‘straight people only’ (very rare) or ‘I’m not racist but I prefer to live with people from the same background’ (just the one), I can live without knowing.

Next, I intend to keep gathering data from New York Craigslist and, when I have enough, to compare attitudes across the various boroughs. I also intend to get data from other cities and see how they differ. Are the Californian cities really much more liberal? Do the Aussies care about anything other than the quality of barbecuing facilities on offer?

First though, I need to stop procrastinating and find myself somewhere to live.


posted by newyorkgeek in Uncategorized and have Comments (9)


American pie is a much-vaunted cultural export, a sweet comfort food that Americans from any state can unite in enjoying and the subject of much regional pride. Don McLean wrote an eight-and-a-half-minute song about this delicious dessert, and a whole series of Hollywood documentaries have been produced detailing the delight that American Pie can bring to school-aged teens. However, in the pie-speaking world, American pie, and especially the beloved apple pie, is a poor cousin to the other pies that Americans miss out on in their devotion to sweet and sugary snacks.

Historically, the pie-polarity across the Atlantic was reversed. With few apple trees available in the Americas, the colonists would much more commonly eat meat pies, and the apples they did have would go towards making (hard) cider instead. Now, cider in America is normally non-alcoholic and served to children, meat pies are a relative rarity and apple pie has become as American as… well, you see my point.

Pie in America is generally a sweet dish, with a double crust encasing a layer of fruit. Some variants exist, such as deep-dish apple pie, which has only a top crust, and various bottom-crust tarts, such as the Christmas staple of pecan pie, or The Official Pie of the State of Florida, key lime pie, whose status was determined by a 38-1 vote in the Florida Senate and a 106-14 vote in the Florida House of Representatives. Four of the 14 naysayers in the House, it should be noted, later changed their votes in favour of the pie.

I. Summary:
This bill designates key lime pie as the official state pie.
This bill creates section 15.052, Florida Statutes.

II. Present Situation:
Currently, no pie is designated as the official state pie.

Meat pie is a relative rarity in the USA, but Americans do have one offering that comes close: pot pie (most commonly filled with chicken), which I have yet to have a good experience with. A pot pie is closer to a pasty than a true pie, and generally requires a pie dish to maintain its shape, being fully encased in puff pastry. The interior is often a light stew, not to be confused with British chicken & mushroom or chicken & leek pies.

A true meat pie, however, is an experience not to be trifled with. Instead of a flaky puff pastry, the bottom crust of this pie is a solid, stable crust made with suet (raw beef fat) or lard (rendered pig fat), neither of which appears to be readily available in the USA, though both can apparently be obtained from some butchers. The inside is a thick gravy, stuffed with meat and bursting at the seams as soon as it is touched. Local pubs and breweries in the UK will often offer a steak & ale pie made with their signature ale and served with mash & peas; there’s no better treat after a day out in the countryside.

Steak Pie

Britain has yet more to offer aside from these. Cornish pasties, which the Cornish claim as their ‘national dish’ – refusing to accept English dominance in true Celtic style – are an inimitable treat and finding one at the bottom of a well-used rucksack that hasn’t been touched for weeks is a cause for immediate celebration. They are, however, best enjoyed warm and from within Cornwall, with the various supermarket brands a mere shadow of the real stuff. A Cornish pasty is generally made with minced beef, potato, swede and onions, wrapped in a thick crust, crimped at one side.

Cornish Pasty

Pork pies likewise confuse the Americans. They’re one of the few meat products which can be brought through US customs without being wrapped in innumerable jumpers and accidentally missed off the customs declaration forms, but when my father brought one over for me on a visit, the customs officers were very much intrigued and spent a long time questioning him about it. Then again, it possibly didn’t help when he replied to the innocent question, “What’s a pork pie?” with “It’s pork. In a pie.” For the record, it’s cooked pork and pork fat, wrapped in pork jelly, sealed in a hot-water crust pastry, generally eaten cold, and it’s lovely.

Pork Pie

Finally, I failed to work this into the narrative, but I still want to take a moment to note that there’s such a thing as a pie bird. In the old days, at kingly feasts, a bird would be placed on top of each pie to indicate the filling, and this led to the concept of the pie bird, a decorative cooking aid that allows steam to escape from a cooking pie.

Now, if you’ll excuse me, I’m off to wipe the saliva from my keyboard before it drowns.


posted by newyorkgeek in Uncategorized and have Comment (1)


I’ve mostly cut alcohol out of my life since the start of this year, other than the occasional spree when a friend’s having a party and something tasty and refreshing is on offer. I’m past the point where I feel a need to drink to fit in, or have a good time – goodbye University! – and I’m 95% of the way to being able to go to a bar and order a diet Coke without an overwhelming feeling of shame and emasculation.

It’s partly a health thing, and partly related to the fact that I just wasn’t happy meeting someone for the fourth time and having no recollection of them, or having no memory of the night between 2am and waking up the next morning. The final reason is that, with three hours of Capoeira on Saturday and Sunday, together with a two-hour session once or twice during the week, I simply can’t afford to be hungover and dehydrated. My latest novelty is Bikram Yoga (90 minutes in a room heated to ~105°F/40°C with a humidity of 40%), which I’m aiming to do for 30 days straight – 3 down so far – and less frequently thereafter, and I think I’d pass out if I tried that after a night out.

There are two downsides I’ve found so far to sobriety. The first is that I’m essentially (self-)excluded from certain events, such as those with a high entry fee and ‘free drinks’, or day-drinking events (such as St. Patrick’s Day in NY), where I’d quickly realise the drunk guy hugging me and telling me he loves me isn’t The Best Guy Ever but just some idiot with a hip-flask. The second, much worse, downside has been that I’m now acutely aware of exactly how long it takes to get home at the end of a night, rather than nodding off happily drunk on the subway or in a cab.

On the plus side, physically, I feel great; mentally I feel a lot more alert and active; and socially I feel exactly as awkward as I did before, but with the advantage that no-one now knows me as ‘the guy who…’ from some ill-advised drunken escapade. Also, – previously the bane of my life with its 7am “You have exceeded your budget for alcohol & bars” message the worst complement to a hangover – tells me I’m saving a surprisingly large amount of money per month, which is always fun.

I’m by no means converted to the hippy ‘your body is a temple’ mentality (I love bacon too much, for starters), but I am completely sold on the benefits of drinking rarely, eating healthily and exercising a lot.


posted by newyorkgeek in Uncategorized and have No Comments

Inventor of the Year

When people ask me what I do for work, I generally just say ‘research’, ‘math(s)’ or ‘data science’ and hope they stop talking. It’s not that I don’t love my job; it’s just that I get really, really bored by small talk, and most people don’t really care in the first place.

In the case that the other person’s interest is actually piqued (or, I suppose, that they really have nothing else to say and no-one else to talk to), I go on to describe how I draw on whiteboards, erase what I’ve drawn on whiteboards, make some graphs on a computer, look at the graphs, delete the graphs, and then go home. This, combined with the fact that my occasional stint of working at home is often indistinguishable from staring blankly into the distance, often leads people to think that I do essentially nothing.

As such, it’s always nice when there’s a tangible output to what I’ve done, and people end up using something I’ve invented in their real lives. Last year I filed a couple of patents and had a research paper published, and this year the algorithms behind those are now starting to help experts in cancer research, lawyers and traders to do the things they do.

Less spiritually fulfilling, but much more instantaneously gratifying, I also won the Thomson Reuters Inventor of the Year award for one of the 2011 patents, which comes with a cash prize and the opportunity to subvert professional norms by offering a photo of me doing Capoeira in Central Park as my caption for the announcement.

Capoeira in Central Park

Next goal, get an award that’s actually recognised outside of my company. Or, at least, some new whiteboard pens.


posted by newyorkgeek in Uncategorized and have No Comments

Welsh for Americans (Ear Beer Hoff Bye)

The two national emblems of Wales are the leek (cenhinen) and the daffodil (cenhinen Bedr; or “rock leek”). The national animal – and major contributor to Wales’ clear dominance in the Most Awesome Flag awards – is a dragon. The national language is Welsh (Cymraeg) and the national religion is rugby (rygbi).

Right now, my weekend mornings follow a strict tradition. I roll out of bed at around 10 or 11am, grab a dodgy Internet feed from Qatari Al-Jazeera Sports +3, Argentinian ESPN Plus Vivo, Italian Dolce Sport or some other such provider, and enjoy the rygbi. If it’s a Welsh game, I’ll be wearing my Wales shirt and drinking out of my ‘Wales RWC Heroes 2011’ mug. I’ve never seen a mug commemorating a fourth-place finish before, but I love it anyway. Wales are currently leading the 2012 Six Nations tournament, with games held around Europe, and the afternoon kick-offs match up nicely with an early-morning start in New York.

For the Rugby World Cup last year I watched the games (which kicked off between midnight and 4am) at an Aussie bar, often with new Welsh friends and some of my American ones. I taught one of my housemates a good portion of the rules of rygbi, and the Welsh national anthem, to sing before the games. Welsh is a tricky language, far removed from English and the Romance languages, with a number of phonemes that a native English speaker will find very difficult to enunciate. The apparent lack of vowels adds to the confusion, and so my roommate decided to write out the Welsh national anthem phonetically. It’s priceless.

Welsh National Anthem (Phonetic)

Here’s how it’s meant to sound:

Sing along now.


posted by newyorkgeek in Uncategorized and have No Comments

St. Patrick

I’m not religious (sorry, Mam). I do, however, have a lot of respect for a number of religions. I grew up as a Roman Catholic and, for a while at university, I would attend both temple and prayers at the local mosque. Both Judaism and Islam were supremely welcoming, willing to offer advice, answer questions and take time out so that I could understand and join in with the celebrations, prayers and services. If you’re curious and have some spare time, I heartily recommend contacting a local priest, imam or rabbi and trying this out for yourself. But, please, remember to be respectful and follow their lead entirely.

In all these religions, small local congregations offer a wonderful community, and help many people to set moral guidelines and evaluate their situations and choices. However, when I visit anywhere new, I always seem drawn to the monolithic religious buildings, beautiful and imposing as they are. In New York, the cathedrals are dominated by the nearby skyscrapers, but their strikingly different architectural style still draws the eye once one is near enough.

My favourite of these is St. Patrick’s Cathedral. The building itself is beautiful, the reflections in the glass skyscraper that rises beside and above the Cathedral are truly impressive, and the interior is a testament to the glory days of religious hedonism and wealth. St. Patrick’s, rather awesomely, holds more than 2,400 Masses a year and attracts 5.5 million visitors in the same period. This means the services are, predictably, rather impersonal, but still more inviting than some I’ve been to in continental Europe. The ministers understand and cater to the international crowd, and tourism is effectively prohibited whilst services are being celebrated.

St. Patrick’s Reflected

Architecture, décor and sanctity aside, however, there’s one reason any New Yorker or tourist should make time to visit St. Patrick’s at least once. At scattered times throughout the week, and always following Sunday morning Mass, organist Donald Dumler will play a postlude, and he is a big fan of J.S. Bach. Hearing Prelude in G Minor or Toccata & Fugue in D Minor played in the environment they were written for is an experience that can’t be recreated and never fails to overwhelm me. If you don’t wish to attend the service, you can simply arrive around 11.30am and wait at the back until the end of Mass.

This visual, spiritual and aural treat, however, isn’t all that St. Patrick gave to New York. Whilst St. David’s Day, feast of the Patron Saint of Wales, went nearly unnoticed here, New Yorkers have an amazing affinity towards St. Patrick. Or, more precisely, towards drinking and reinforcing Irish stereotypes on his feast day.

Hoboken – just over the river in New Jersey – cancelled its annual St. Patrick’s Day parade (held two weeks before the actual day) this year to try to avoid the hellish mess that ensues, but was still the host/victim of an impromptu pub crawl by those who weren’t deterred by the decision. Only 53 people needed ambulance attention this year, which is apparently a statistic meant to inspire praise. Already bars are decorating themselves with shamrocks and tempting punters with offers of ‘green beer’.

Green Beer

As yet there’s no information as to whether the Cathedral will be offering sanctuary to those severely hungover on March 18th. I’ll keep checking.


posted by newyorkgeek in Uncategorized and have No Comments