Cloud OCR Project: Learn to Develop for Windows Azure Today

Yuriy Guts
R&D engineer and solutions architect at ELEKS

A few months ago, we ran a Cloud Development training course for our engineers. The attendees learned the main concepts of cloud computing, explored the capabilities of the most popular cloud providers on the market, and studied the key design decisions behind highly available, scalable, and fault-tolerant distributed systems in the cloud.

We concluded the course with a simple hands-on project: a Windows Azure service that performs OCR (Optical Character Recognition) in the cloud on images that users upload through a Web interface. The service is distributed across a fleet of Web and Worker roles that communicate asynchronously through queues.
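To make this message flow concrete, here is a minimal Python sketch of the Web role / Worker role pattern. It is not code from the reference implementation, which targets .NET and the real Azure services. It assumes in-memory stand-ins: a dict plays the blob container, Python's `queue.Queue` plays the Azure queue, and a second dict plays the results table. The names `web_role_upload`, `worker_role_loop` and `fake_ocr` are hypothetical, and `fake_ocr` simply upper-cases the bytes in place of Tesseract.

```python
import queue
import threading
import uuid

# Hypothetical in-memory stand-ins for Azure Blob, Queue and Table storage.
blobs = {}                # blob store: job id -> uploaded image bytes
ocr_queue = queue.Queue() # queue messages carry only the job id, not the image
results = {}              # table stand-in: job id -> recognized text
results_lock = threading.Lock()


def fake_ocr(image: bytes) -> str:
    """Placeholder for Tesseract: decode the bytes and upper-case them."""
    return image.decode("utf-8").upper()


def web_role_upload(image: bytes) -> str:
    """Web role: store the image as a 'blob', enqueue a job, return immediately."""
    job_id = str(uuid.uuid4())
    blobs[job_id] = image
    ocr_queue.put(job_id)  # the Web role never waits for OCR to finish
    return job_id


def worker_role_loop() -> None:
    """Worker role: process jobs until it receives the None shutdown sentinel."""
    while True:
        job_id = ocr_queue.get()
        if job_id is None:
            ocr_queue.task_done()
            break
        text = fake_ocr(blobs[job_id])  # read the 'blob' and run 'OCR'
        with results_lock:
            results[job_id] = text      # persist the result to the 'table'
        ocr_queue.task_done()


if __name__ == "__main__":
    # Three worker threads stand in for a fleet of Worker role instances.
    workers = [threading.Thread(target=worker_role_loop) for _ in range(3)]
    for w in workers:
        w.start()
    ids = [web_role_upload(f"page {n}".encode()) for n in range(5)]
    ocr_queue.join()      # block until every queued job has been processed
    for _ in workers:
        ocr_queue.put(None)  # one shutdown sentinel per worker
    for w in workers:
        w.join()
    for job_id in ids:
        print(job_id, results[job_id])
```

The upload path writes only the image and a small message, so adding Worker instances raises OCR throughput without touching the Web role. A real Azure queue also hides a dequeued message for a visibility timeout and makes it visible again if the worker fails before deleting it; this sketch does not model that retry behavior.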

The reference implementation is available on GitHub under the Apache License 2.0. It uses Windows Azure Storage Tables, Blobs, and Queues for data persistence and communication, Tesseract for OCR, and ASP.NET MVC 4 with Twitter Bootstrap for the Web application. If you have always wanted to learn the basics of implementing Web and Worker roles for Windows Azure and using its cloud storage services, this is a great project to start with. It avoids heavy abstraction and advanced features, so it keeps you focused on the essentials.

You’re welcome to fork this project and send us pull requests.
Have fun!