Microformats for webscraping

What are they?
Microformats are a set of standard approaches to HTML/XHTML markup that allow data intended for an end user in a browser to be easily read by a machine. Think of them as a universal API for certain types of data, baked into a website's markup. You probably use them every day without even realizing it. For example, hCard is a microformat for representing contact information. Its properties map one-to-one onto those of vCard, a standard used nearly everywhere contact information is utilized.
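To make that concrete, here is a minimal sketch of pulling contact details out of hCard markup using only Python's standard library. The sample HTML, the `HCardParser` class, and the `parse_hcard` helper are all made up for illustration, and the sketch only looks at four property class names (`fn`, `org`, `tel`, `email`). It also does not check that a property sits inside a `vcard` root element, which a real parser should do.

```python
from html.parser import HTMLParser

# Assumed subset of hCard property class names to collect (illustrative only).
HCARD_PROPERTIES = {"fn", "org", "tel", "email"}
# Void elements never get a closing tag, so they are kept off the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HCardParser(HTMLParser):
    """Collect the text of elements whose class names an hCard property."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self._stack = []  # one entry per open element: property name or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in HCARD_PROPERTIES), None)
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost open element that names a property.
        prop = next((p for p in reversed(self._stack) if p), None)
        text = data.strip()
        if prop and text:
            self.card[prop] = (self.card.get(prop, "") + " " + text).strip()


def parse_hcard(html):
    """Return a dict of the hCard properties found in an HTML string."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.card


# Hypothetical markup, written for this example.
SAMPLE_HTML = """
<div class="vcard">
  <span class="fn">Jane Doe</span>
  <span class="org">Example Corp</span>
  <a class="email" href="mailto:jane@example.com">jane@example.com</a>
  <span class="tel">555-0100</span>
</div>
"""

card = parse_hcard(SAMPLE_HTML)
print(card)
```

The point is that no site-specific selectors are needed: any page that marks up contact details with the hCard class names can be read the same way.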

Read More

I built trellogd for your job search

Today I learned how to pip!
Not in the sense of using it as a package manager, that’s water well under the bridge. I learned how to pip to share what I’m doing and contribute to the open source community, and it feels fantastic!

I’ve been working on my job search, and I have been using Trello for organization and record keeping. If you aren’t familiar with it, I suggest you check it out. It’s a simple, clean, and incredibly flexible hub for organizing anything you can think of. What’s better is that they have a great API, and there is a good, stable Python wrapper for interacting with it.
Given all that, I wanted to create a simple command line tool to act as a pipeline from a Glassdoor job listing to a consistent, organized, and feature-rich addition to the Trello board of my choosing.

Read More

Jupyter Notebook Automator

I use Jupyter Notebook almost exclusively in my exploration phase. It’s only when I get to the point of needing to refactor into modules or run web applications that I move over to Sublime Text. If you work with Python, or any other of the rapidly expanding list of supported languages, I suggest you do the same.

Read More

Password strength with NLP

Latent Semantic Analysis re-imagined
Account takeover attacks are incredibly lucrative for criminals. The average stolen credit card can be bought and sold on the dark web for pennies, while compromised accounts fetch $3-$20 each depending upon what is accessible through them. Additionally, the extent of possible damage to the victim is far greater: accounts can be drained, identities stolen, personal data scraped, and the list goes on. Firms have conflicting incentives when it comes to pushing password strength. Any friction in the customer acquisition pipeline is universally abhorred, so there is real resistance to implementing strong measures. At the same time, high-profile account takeovers are incredibly damaging to a firm's reputation, not to mention its bottom line, as fraudulent attacks from a previously trusted source are next to impossible to prevent. The fine-line balancing act present in everything related to security and fraud prevention is certainly in full force here.

Read More

How predictable... Diving into OKCupid data

Dating Data: Who are you?
Dating sites are now ubiquitous. Whether you think that’s a good thing or a bad thing is up to you. The fact of the matter is that online dating has seen a significant increase in popularity over the last decade, resulting in an industry worth over $2 billion. One thing that’s difficult to refute is that dating apps create a whole lot of interesting data. So when faced with the challenge of building out a project using classification algorithms, what better place to peek under the hood of the human condition than a bunch of people looking for love… or something like that?

Read More

Metis clock T+ 79.78 hours

What a week. I was excited walking into this, and I can confidently say that excitement has only increased. As a relatively new blogger, I’m not entirely comfortable with the format yet. However, part of the reason I started this whole adventure was to shake up my comfort zone, so why not here too!

Read More