Systems Projects

Students who have completed courses with me often approach me to ask if I have prepared projects or other materials for learning more about systems topics, or to practice programming the types of projects that we have implemented in class. This page contains project ideas and resources for systems projects in a variety of areas.

Projects and recommendations on this page are in no particular order. You should feel free to browse the page until something catches your eye. Follow up with me if you have questions about how to get started on one of these topics!

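This document has become long, so here is a brief table of contents.

  • Systems Programming
  • Operating Systems
  • Embedded Systems
  • Architectures
  • Compilers
  • Networking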

Systems Programming

“Systems programming” is a very broad topic, and it is difficult to know both where to begin and where to end on this front. The other sections of this page (and in particular, Operating Systems) follow more or less directly from the material in CSE 220: Systems Programming, and it is surprisingly feasible to move directly from systems programming into those topics, but there are certainly potential stops along the way.

Exercises in Programming

A good way to learn more about systems programming is to simply do some programming. That programming need not necessarily even be systems programming.

A great place to start is reimplementing some of the classic Unix utilities; the text-processing utilities like cat, cut, and wc are good first targets (see their man pages or the POSIX Utilities specification for descriptions). Books like Software Tools by Kernighan and Plauger describe similar tasks. More systems-oriented Unix utilities like ls or find (although find is … quite complicated) give practice interacting with the OS itself. A classic project is, of course, to implement a shell, although a complete POSIX shell is probably not a reasonable goal for a student project; limit yourself to simple pipelines and redirection of standard input, output, and error to start with!
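The core of a shell pipeline is only a handful of system calls: pipe, fork, dup2, and exec. Here is a minimal sketch in Python, whose os module wraps those same calls almost one-to-one, so the structure carries over directly to C. It assumes a POSIX system and omits everything a real shell needs beyond plain pipelines (parsing, redirection syntax, job control, signals); run_pipeline is a hypothetical helper, not a standard function.

```python
import os

def run_pipeline(commands, stdout_fd=1):
    """Run commands[0] | commands[1] | ... and return each stage's exit status.

    Each command is an argv list such as ["wc", "-l"]. The last stage writes
    to stdout_fd (standard output by default).
    """
    pids = []
    prev_read = None  # read end of the pipe feeding the current stage
    for i, argv in enumerate(commands):
        last = i == len(commands) - 1
        if not last:
            read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: wire up standard input/output, then exec the command.
            try:
                if prev_read is not None:
                    os.dup2(prev_read, 0)
                    os.close(prev_read)
                if not last:
                    os.close(read_fd)
                    os.dup2(write_fd, 1)
                    os.close(write_fd)
                elif stdout_fd != 1:
                    os.dup2(stdout_fd, 1)
                os.execvp(argv[0], argv)
            finally:
                os._exit(127)  # reached only if exec failed
        # Parent: close the pipe ends the children now own. Keeping a write
        # end open here would stop the next stage from ever seeing end-of-file.
        if prev_read is not None:
            os.close(prev_read)
        if not last:
            os.close(write_fd)
            prev_read = read_fd
        pids.append(pid)
    return [os.waitstatus_to_exitcode(os.waitpid(pid, 0)[1]) for pid in pids]
```

For example, run_pipeline([["ls"], ["wc", "-l"]]) behaves like the shell command ls | wc -l. A C version follows the same structure with pipe(2), fork(2), dup2(2), execvp(3), and waitpid(2).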

The exercises from The C Programming Language by Kernighan and Ritchie, or from another C or C++ textbook, are reasonable practice. For that matter, if you haven’t simply sat down and worked through K&R, now is a great time to do that!

Programming challenges like Advent of Code are a good choice, as well: pick a language or system that you’d like to learn, and work through some of the past years’ challenges in that language. Work through assignments from CSE courses that you have previously taken in a new language or on a new system, or simply reimplement them using the knowledge you’ve gained since the first time.

For practice with binary files, image and audio formats are an easy place to start, and they produce satisfying results. For example, the Windows BMP, TARGA, Baseline TIFF, and PBM/PGM/PNM image formats are all relatively simple binary formats consisting of a small amount of metadata and essentially raw pixel information in some particular layout. The Netpbm package, available in most Linux distributions, has converters for all of these file formats, and most Linux image viewers can display them. The Microsoft WAV and Sun AU file formats are similar in spirit, but store digital audio. Given one of these image or audio formats, reasonable projects would be a converter between two formats, an image or audio filter, or simple signal processing of the data (such as edge detection on an image, or a spectrum display for audio). Choosing a well-understood problem like edge detection or a spectrum display lets you focus on the binary data manipulation by using an existing library or implementing a documented algorithm, or lets you derive and implement your own solution if you so choose.
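As an example of how little metadata these formats carry, here is a sketch in Python that writes a canonical 44-byte WAV header followed by 16-bit mono PCM samples, using the struct module to lay out the little-endian fields. It assumes the simplest case (one channel, 16-bit samples, no extra chunks); wav_bytes and sine are illustrative names.

```python
import math
import struct

def wav_bytes(samples, sample_rate=8000):
    """Encode signed 16-bit mono samples as a canonical PCM WAV file."""
    channels, bits = 1, 16
    block_align = channels * bits // 8                 # bytes per sample frame
    data = struct.pack(f"<{len(samples)}h", *samples)  # little-endian int16
    header = struct.pack(
        "<4sI4s"      # RIFF chunk: id, size of the rest of the file, format
        "4sIHHIIHH"   # "fmt " chunk: id, size, PCM=1, channels, rate, byte rate, align, bits
        "4sI",        # "data" chunk: id, size of the sample data
        b"RIFF", 36 + len(data), b"WAVE",
        b"fmt ", 16, 1, channels, sample_rate, sample_rate * block_align, block_align, bits,
        b"data", len(data),
    )
    return header + data

def sine(freq, seconds, sample_rate=8000, amplitude=0.5):
    """Generate a sine tone as a list of signed 16-bit sample values."""
    count = int(seconds * sample_rate)
    return [int(amplitude * 32767 * math.sin(2 * math.pi * freq * i / sample_rate))
            for i in range(count)]
```

Writing wav_bytes(sine(440, 1.0)) to a file such as tone.wav should give one second of a 440 Hz tone that ordinary audio players can open. Python’s own wave module is handy for checking your output against an independent reader.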

Programming Languages

Learning a new programming language is always a good way to learn more about the operation of a computer system. It is true that some languages are considered more “systems-y” than others, but there is a lot to be learned about systems topics from even very high level languages.

Rust

Rust is a very interesting systems-oriented programming language with a wealth of online information. The Rust playground lets you try many of the Rust tutorials in your web browser. The Rust book is available for free online, or can be purchased in print. Rust provides many language features associated with traditional functional programming languages, along with automatic memory management (through ownership and borrowing rather than garbage collection), but is careful to maintain predictable running time and low run-time complexity with a small library footprint, making it suitable even for embedded systems development. On the other hand, it was developed at Mozilla for implementing critical parts of the Mozilla Project’s browsers, so it is also appropriate for large desktop application development!

Go

Go is another interesting systems-oriented programming language that occupies a different niche than Rust or C. Like an interpreted or virtual machine language, it has implicit memory management provided by garbage collection, but like a systems language it is statically typed and compiles to native code. It is widely used in distributed systems and microservice development. It also has an online playground that allows you to complete Go tutorials in your browser, as well as the well-regarded book The Go Programming Language. Go was developed at Google for large systems implementation projects by Robert Griesemer, Rob Pike, and Ken Thompson (the latter two veterans of AT&T’s Bell Labs), and shares a lot of attitude with C, Unix, Plan 9, Inferno, and other Bell Labs projects.

Forth

Forth is a fascinating language, originally used to develop astronomical systems, that has seen success in embedded systems, workstation and desktop computer ROM monitors, video game programming, and many other fields. It is simultaneously a very low-level language (it is really only comfortable manipulating the platform-native word, and everything else is built on top of that) and an extremely high-level one (it has an interpreted, interactive development interface). Thanks to a long-lived and prolific support organization called the Forth Interest Group, there are numerous open source implementations of Forth under the name FIG-Forth for a staggering variety of platforms. The book Starting Forth (freely available) is an excellent introduction to the language, and the book Thinking Forth (freely available) and the Moving Forth articles (freely available) provide a great jumping-off point for a Forth implementer. There are also modern implementations of Forth, like Gforth, for application development. Because Forth is a small and notionally simple language, fascinating educational exercises like Jonesforth provide a brilliant introduction to how a language, run-time system, and development environment can be implemented in very approachable assembly language. Planckforth and Sector Forth are examples of just how little is required to bootstrap a Forth environment.
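To get a feel for why Forth is so easy to bootstrap, here is a toy interpreter for a tiny Forth-like subset in Python: a data stack, a dictionary of built-in words, and colon definitions. It is only a sketch; a real Forth compiles definitions into threaded code and has a return stack, both of which this skips, and it handles only integers. The function name forth is hypothetical.

```python
def forth(source):
    """Interpret a tiny Forth-like subset; return whatever '.' printed."""
    stack, out = [], []
    builtins = {
        "+": lambda: stack.append(stack.pop() + stack.pop()),
        "*": lambda: stack.append(stack.pop() * stack.pop()),
        # Pops the top (b) first, then a, and pushes a - b.
        "-": lambda: stack.append(-stack.pop() + stack.pop()),
        "dup": lambda: stack.append(stack[-1]),
        "drop": lambda: stack.pop(),
        "swap": lambda: stack.extend([stack.pop(), stack.pop()]),
        ".": lambda: out.append(str(stack.pop())),
    }
    user = {}  # colon definitions: word name -> list of tokens

    def execute(token):
        if token in user:              # user definitions shadow built-ins
            for t in user[token]:
                execute(t)
        elif token in builtins:
            builtins[token]()
        else:
            stack.append(int(token))   # anything else must be a number

    tokens = iter(source.split())
    for token in tokens:
        if token == ":":
            # ": name body ;" records the body instead of running it.
            name = next(tokens)
            body = []
            for t in tokens:
                if t == ";":
                    break
                body.append(t)
            user[name] = body
        else:
            execute(token)
    return " ".join(out)
```

For example, forth(": square dup * ; 7 square .") returns "49". Notice that defining a new word is just adding an entry to a dictionary, which is the property that makes small Forths like Jonesforth possible.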

Scheme

Scheme is a programming language in the Lisp family that is, in one sense, a dubious systems programming language, but in another sense the quintessential one. It is a very high-level, garbage-collected, (sometimes) interpreted language, but its use in the book Structure and Interpretation of Computer Programs (freely available), a truly excellent text on all facets of computer science, sets it apart as a spectacular language for learning about computing systems. There are several robust implementations of Scheme, as well as a wide variety of related languages such as Racket and Emacs Lisp. Common Lisp, a later dialect of Lisp, is available in implementations such as SBCL and CMUCL, and there are entire operating systems written in Common Lisp (in the tradition of Lisp Machine Lisp). As with Forth, there are many quite small implementations of Scheme and Lisp, such as SectorLISP and uLisp.

Systems Programming Books

There are a number of excellent systems programming books. There are also many excellent books about programming that apply at least as much to systems programming as they do to other programming.

  • Advanced Programming in the Unix Environment, by W. Richard Stevens. This is a book about the intricacies of Unix programming, including common portability concerns (some of which are less problematic now than they were when the book was written!) and the details of some of the more complicated system interfaces.
  • The Practice of Programming by Brian W. Kernighan and Rob Pike. This is a book about general programming practices and techniques, but it includes many details and tips that are helpful to the systems programmer.
  • The Design and Evolution of C++ by Bjarne Stroustrup. While it has become dated with respect to the design of modern (certainly post-C++17) C++, the principles and design considerations detailed in this book are timeless. Even if you are not a huge C++ fan (I am not!), the rigor of design and careful reconciliation of conflicting desires described in this book are illuminating.

Operating Systems

After learning a bit about systems programming, a good next stop is operating systems. There are at least two major approaches to learning from operating systems: using and exploring operating systems that might teach you something about systems design, and modifying or implementing operating systems yourself. Fortunately, there are many operating systems that allow you to do either one, or even both!

Exploring Existing Systems

There are many complete operating systems available for exploration, and quite a few of them are either open source or code-available. Some even have associated textbooks, although several of those will be covered in the next section, about implementing and modifying operating systems.

A good way to try out an operating system is to install it in a virtual machine or an emulator. Several of these operating systems will run on commodity PC hardware, which makes installing them in VMware, VirtualBox, or QEMU/KVM a very reasonable option. As the systems get older, that becomes more difficult (even if they ran on PC hardware at the time!), and you may need an emulator or simulator. Where this is necessary, I have linked to appropriate software for the task.

Sometimes using the operating system is not as important as learning a little bit about it. Start by reading up on the system and trying to decide what it has to offer.

Linux

Linux is a great place to start if you are not currently primarily a user of a Unix-like operating system. I recommend Debian as a friendly and capable system, but there are many reasonable distributions out there. Pick one, and go to town!

FreeBSD

FreeBSD is another great choice for a primary-use operating system in the Unix family. Unlike Linux, FreeBSD descends directly from Bell Labs’ Unix system, although that heritage is an academic notion in the 21st century. Linux is an operating system kernel collected together with hundreds of utilities and necessary packages from other providers (such as the Free Software Foundation’s GNU Project), whereas FreeBSD is a complete operating system from a single code base. This makes it rather easier to understand and build from scratch than a Linux system, and also means that the source code is in some sense cleaner: FreeBSD utilities need only work with FreeBSD, not with whatever system they might be built for! There is a famous book on the FreeBSD system that has been revised relatively recently (and the first edition is a true classic).

NetBSD

NetBSD is interesting for many of the reasons that FreeBSD is, and is in some ways even simpler. It also runs on an absolutely staggering number of platforms. If you have a computer of any kind, there’s a very good chance that NetBSD has been ported to it.

Plan 9

Plan 9 from Bell Labs is an operating system designed as a successor to Unix, intended to reflect all of the lessons learned between the Unix of the 1970s and the modern, 32-bit, networked world of the late 1980s. Its influence is visible in most modern operating systems (including, without question, Linux and BSD) and in programming languages like Go. Plan 9 takes the Unix “everything is a file” philosophy to its logical conclusion, representing even computing services, graphical displays, and network connections as files and filesystems. The 9front project is a (perhaps rather idiosyncratic) fork of Plan 9 under continued development. Ports of many of the Plan 9 user space tools are available for Unix in the form of Plan 9 from User Space and 9base, one or both of which may be packaged for your existing Unix-like system.

BeOS

BeOS (or The BeOS, as it was styled on the R4.5 media I have) was an operating system developed from scratch by Apple expats in the early 1990s. By that time it was obvious that the Apple Macintosh operating system (then called simply System 6) was not viable in the long term, and that it was being left behind by modern hardware and competing OS products. (Interestingly, despite this, it would take another 10 years for Apple to replace it with Mac OS X; System 7 through Mac OS 9 enhanced its capabilities, but did not fundamentally leave its limitations behind.) However, the founders of Be felt that the other products on the market were either equally disappointing in technical merit or lacking in aesthetics and user experience. Thus BeOS was created as a modern, multitasking operating system with a native graphical environment, strong multimedia capabilities, and a fresh, object-oriented system API. BeOS 5 Personal Edition is freely available for evaluation, although it is increasingly difficult to install. An open source reimplementation of BeOS called Haiku is available and runs easily in virtualization.

V6 Unix

Version 6 Unix is a classic operating system for student analysis. It was the first version of Unix to be widely distributed outside of Bell Labs, mostly to academic and research institutions. Complete source code is available (the Unix History repository is a fantastic way to access a large number of Unix releases), and the system can be run on the SIMH historical computer simulator, either from pre-installed software images from that site or from tape images downloaded from The Unix History Project Archives. Installing and configuring V6 requires some care for the modern user, but there are many guides online. Lions’ Commentary on UNIX 6th Edition by John Lions is an excellent book about the Unix kernel, available both freely online and in modern reprint; copies of the original printings are hard to come by. V6 Unix is an excellent example of a real operating system that is quite small and that a single developer can understand in its entirety.

Implementing and Modifying Operating Systems

It is absolutely reasonable to write a small operating system from scratch! This process can be greatly simplified (relative to the pain of yesteryear) by buying an embedded systems development board that can be targeted with a modern symbolic debugger. However, a fresh operating system is probably not where you want to start. Fortunately, there are a number of well-documented educational systems available for your use and study.

As with the operating systems described above, it may not be necessary to implement, port, or develop for these systems to learn some of the lessons that they have to teach you. Look at their documentation, peruse the textbooks, and think about why they are built the way they are. Several of these operating systems are available with source code for more than one platform; look at the differences between the platforms and at how classic OS problems such as scheduling, block I/O, and memory management are handled on different architectures and in different systems.

Xinu

Xinu, designed for educational use by Douglas Comer, is a small operating system almost as venerable as Unix, and it is documented in an excellent textbook that accompanies the system. Xinu has been ported to dozens of architectures, and porting Embedded Xinu to a new platform is a very manageable task. (I have ported it twice!) The Xinu philosophy borrows somewhat from the Unix tradition, but rather than saying “everything is a file,” Xinu says “everything is a device.” The Xinu APIs and basic system organization will seem familiar to the student of Unix. Editions of the Xinu book are available for a wide variety of architectures, from PDP-11 to x86 to MIPS and ARM.
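The “everything is a device” idea boils down to a table of drivers that all present the same interface, so a single read or write call can be dispatched to any of them. Xinu does this in C with a table of function pointers (a device switch table); the Python sketch below only imitates the idea with objects, and all of its names (Device, NullDevice, LoopbackDevice, devtab, and so on) are hypothetical rather than Xinu’s own.

```python
class Device:
    """Hypothetical common driver interface that every device implements."""
    def read(self, count):
        raise NotImplementedError

    def write(self, data):
        raise NotImplementedError

class NullDevice(Device):
    """Discards all output and never has input, like /dev/null."""
    def read(self, count):
        return b""

    def write(self, data):
        return len(data)

class LoopbackDevice(Device):
    """Anything written can be read back, like a serial port wired to itself."""
    def __init__(self):
        self.buffer = bytearray()

    def read(self, count):
        data = bytes(self.buffer[:count])
        del self.buffer[:count]
        return data

    def write(self, data):
        self.buffer.extend(data)
        return len(data)

# Device switch table: a device descriptor is just an index into this list.
NULLDEV, LOOPBACK = 0, 1
devtab = [NullDevice(), LoopbackDevice()]

def read(descriptor, count):
    """Device-independent read: dispatch to whichever driver owns the descriptor."""
    return devtab[descriptor].read(count)

def write(descriptor, data):
    """Device-independent write, dispatched the same way."""
    return devtab[descriptor].write(data)
```

Adding a new kind of device means writing one more driver and adding an entry to the table; code that calls read and write does not change, which is the point of the design.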

MINIX

MINIX is another small operating system designed for educational use, by Andrew Tanenbaum; in the 21st century it is possibly best known as the inspiration for Linus Torvalds’s initial implementation of Linux. (Pre-1.0 Linux used the MINIX filesystem as its native filesystem, for example!) Like Xinu, MINIX is described in an excellent textbook. Architecturally, MINIX is a microkernel system, which means that the core operating system kernel is a very small piece of code, and many of the services that are typically thought of as part of the operating system kernel are more like applications that run alongside the kernel.

Xv6

Xv6 is a reimplementation of Version 6 Unix on the x86 architecture in an embedded system style. A commentary on xv6 (freely available) in the spirit of John Lions’ commentary on V6 is also available. A student who wishes to explore V6 Unix in a somewhat more gentle environment may wish to start with xv6.

Write an OS

What defines an “operating system” is a matter of perspective, but for educational purposes a small system that either abstracts a few devices or provides thread scheduling (or, of course, both!) is a plausible and educational goal. One or the other of these tasks can probably be accomplished in just a few hundred lines of code. It is important to choose your platform carefully, however, as the reset semantics of some platforms are very complicated. ARM Cortex-M and the PDP-11 both have very simple and understandable boot sequences and may be a reasonable place to start; I recommend ARM if you want to start with C development, and the PDP-11 if you’d like to try your hand at assembly. Both can be run in simulation if you wish, and ARM development boards (see, for example, STM32, below) are readily available.
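The heart of thread scheduling is a ready queue and a rule for choosing what runs next. This Python sketch uses generators as stand-ins for threads, so every yield is a voluntary context switch and the scheduler is cooperative and round-robin. On real hardware the same loop would instead save and restore registers and stack pointers (and a timer interrupt would make it preemptive); scheduler and worker are hypothetical names.

```python
from collections import deque

def scheduler(tasks):
    """Run generator-based tasks round-robin until all finish; return a trace."""
    ready = deque(tasks)              # the ready queue
    trace = []
    while ready:
        task = ready.popleft()        # pick the task at the head of the queue
        try:
            trace.append(next(task))  # run it until its next yield
        except StopIteration:
            continue                  # task has exited; do not requeue it
        ready.append(task)            # yielded: go to the back of the line
    return trace

def worker(name, steps):
    """A toy thread that does one 'step' of work between context switches."""
    for i in range(steps):
        yield f"{name}{i}"
```

Running scheduler([worker("A", 2), worker("B", 3)]) interleaves the tasks and returns ["A0", "B0", "A1", "B1", "B2"]. Swapping the deque for a priority queue is a natural next experiment.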

Operating Systems Books

Several of the operating systems listed above are associated with excellent textbooks or design books. There are also many classic texts on operating systems. Some particularly good examples are:

  • Operating Systems Design: The Xinu Approach by Douglas Comer. There are many editions of this book for different platforms. They have different strengths and weaknesses, but they are all good. Choose one for the lowest price or for the most relevance to your direct interests; you’ll be fine either way.
  • Operating Systems Design and Implementation by Andrew S. Tanenbaum. There are several editions of this book, as well, and they’re all good.
  • The Design and Implementation of the FreeBSD Operating System by Marshall Kirk McKusick and George V. Neville-Neil. Comer and Tanenbaum each describe a small operating system kernel in exhaustive detail, but this book describes a large general-purpose kernel at a somewhat higher level of abstraction. It does dive into deep details in many places, but it also covers a lot more ground in a similar volume of text.
  • Lions’ Commentary on UNIX 6th Edition by John Lions. This book is available in print, and the commentary and source code are both available online, as well. It is a complete annotated listing of the Version 6 Unix kernel from 1975 (or thereabouts) with a description of all of the major components and their operation. It is more or less the model for the above books, although they describe purpose-built educational operating systems, while it describes a research system designed for general use.
  • Operating System Concepts by Abraham Silberschatz, Peter B. Galvin, and Greg Gagne. This book has gone through ten editions, and once again, the edition is not critical. While the preceding books in this list focus on the specific implementation of a particular operating system (with source included), this is a more conceptual book.
  • The Design of the UNIX Operating System by Maurice J. Bach. Another classic text, this book explores the similarities and differences in implementation between several Unix implementations during the substantial fragmentation of Unix in the 1980s. As a bonus, it touches on some architectural considerations for interesting 32-bit systems that students may not encounter today (such as VAX and SPARC).

Embedded Systems

Embedded systems are more approachable now than they have ever been before, and present an excellent opportunity to do systems programming. Development boards are available for a few dollars, and can be programmed without any operating system or libraries at all for the ultimate “bare metal” programming experience. Powerful, free real-time operating systems are readily available for developing sophisticated applications. Even quite high-level programming languages have been ported to many popular platforms.

Hardware Platforms

It is difficult to choose an embedded systems platform, but there are a few good starting points. The problem is typically that there are so many to choose from that it is hard to know where to begin.

Arduino

Arduino and Arduino-compatible boards are readily available and have excellent support for many devices. There are education-focused vendors like Adafruit and SparkFun that provide a wealth of Arduino-compatible devices, peripherals, and software libraries. The Arduino environment can become constraining, and the Arduino IDE is somewhat painful, but it is still a reasonable starting point. Native Arduino code is written in a simple subset of C++.

STM32

I like the STM32 line of ARM Cortex-M microcontrollers, and in particular their Discovery development boards. They provide high-level, pleasant-to-use peripheral devices and a wide variety of attached external devices on affordable boards. The ST-provided programming libraries can be frustrating, but ST’s hardware documentation is excellent and reasonably clear, and programming the bare microcontroller is a reasonable educational exercise.

ESP32

ESP32 is a proprietary architecture with poor English-language system documentation (although that has improved a lot), but excellent connectivity including Wi-Fi and Bluetooth, and excellent support from many open source libraries, languages, and products. An ESP32 development board is only a few dollars, and can be programmed in Lua, Python, and other high-level languages as well as C and C++.

Raspberry Pi Pico

The Pi Pico is another ARM Cortex-M device, based on the RP2040 microcontroller. It has excellent software support and solid documentation from the Pi Foundation, and the RP2040 chip is also available in many more-or-less compatible boards with different feature sets. It is easily programmed in C, C++, or Python, and other languages are available with a little bit of work.

Software Platforms

Once you have a hardware platform, you will need software to run on it. The typical starting points are things like the Arduino IDE or Mbed, and those are good choices. However, once you have a more complicated project, or are ready to learn more, you are likely to want to move to an open source real-time operating system (RTOS) and a standard compiler or language run-time such as the GNU C compiler or Python.

The learning curve for moving to such a system is higher than starting with a pre-packaged system, so you may wish to start out with Arduino or Mbed and move to a custom system once you have some comfort developing for your platform of choice.

GCC on Bare Metal

It is reasonable to program many microcontrollers using GCC, without a C library and without any particular external support. You probably would not want to do this for a commercial product (there are advantages to using preexisting systems, for sure!), but as an educational exercise it can be very rewarding. Most Linux distributions have a C compiler for the arm-none-eabi platform, which will build for Cortex-M microcontrollers with no operating system. Software such as OpenOCD will let you send your program to the controller’s flash storage, either directly or from the gdb debugger.

FreeRTOS

FreeRTOS is a popular and widely-ported open source RTOS. It supports most common microcontrollers, and both the RTOS itself and applications for it can be built with gcc.

Python

Both MicroPython and CircuitPython run on many popular microcontrollers, and let you develop in a constrained Python 3 environment. The Pi Pico is capable of running MicroPython code out of the box.

Port an OS

Operating systems like FreeRTOS, Xinu and xv6 can be ported with relatively little effort. Exactly how much or how little effort varies, and it requires significant knowledge of the target platform. There are, however, few ways to learn either the platform or the operating system more intimately than porting!

Embedded Systems Books

I am not aware of as many excellent embedded systems books as I am of books on some other topics, unfortunately. Certainly, several of the systems programming and operating systems books above apply directly to this topic.

  • Making Embedded Systems by Elecia White. One of the great values of this book is that it covers some of the hurdles involved with getting connected to your embedded system to begin with. Relatedly, Elecia White’s Embedded.fm podcast is an excellent resource.

Architectures

Learning about processor and system architectures is a powerful way to learn more about operating systems and systems programming. There are certain architectures which are, in my opinion, particularly illuminating for one reason or another.

PDP-11

The PDP-11 (I recommend the 1976 04-55 or 1981 handbook; the former is simpler, but the latter describes every PDP-11 processor) is the system on which the C programming language and the “modern” Unix system were developed. It is, however, just old enough that its complexity is much more comparable to that of a modern microcontroller than to that of a modern microprocessor architecture, and it has a particularly elegant CISC instruction set. As described above under Version 6 Unix, it can be simulated in the SIMH historical computer simulator. It is reasonable to read the entire PDP-11 architecture handbook front-to-back and understand almost all of it, which is something that cannot be said for most modern architectures.
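To see why the PDP-11 instruction set is considered elegant, note that a double-operand instruction such as MOV fits in one 16-bit word: a 4-bit opcode followed by two 6-bit operand fields, each a 3-bit addressing mode and a 3-bit register. Because those fields fall on 3-bit boundaries, instructions read naturally in octal (MOV R0, R1 is 010001). The Python sketch below decodes only this one instruction format; it does not fetch the extra word that index modes 6 and 7 use (shown as a placeholder X), and it prints PC-based operands in raw register form rather than in assembler syntax such as #n.

```python
# Double-operand opcodes, keyed by bits 15-12 of the instruction word.
OPCODES = {
    0o01: "MOV", 0o02: "CMP", 0o03: "BIT", 0o04: "BIC", 0o05: "BIS", 0o06: "ADD",
    0o11: "MOVB", 0o12: "CMPB", 0o13: "BITB", 0o14: "BICB", 0o15: "BISB", 0o16: "SUB",
}
REGISTERS = ["R0", "R1", "R2", "R3", "R4", "R5", "SP", "PC"]
# Assembler syntax for the eight addressing modes; X stands for the index word
# that follows the instruction in modes 6 and 7 (not decoded here).
MODES = ["{r}", "({r})", "({r})+", "@({r})+", "-({r})", "@-({r})", "X({r})", "@X({r})"]

def operand(field):
    """Format a 6-bit operand field: 3-bit mode, then 3-bit register."""
    mode, reg = field >> 3, field & 0o7
    return MODES[mode].format(r=REGISTERS[reg])

def decode(word):
    """Decode a double-operand instruction word, or return None for other formats."""
    name = OPCODES.get(word >> 12)
    if name is None:
        return None
    src, dst = (word >> 6) & 0o77, word & 0o77
    return f"{name} {operand(src)}, {operand(dst)}"
```

For example, decode(0o012421) returns "MOV (R4)+, (R1)+": a single instruction that copies a word and advances both pointers, which is the inner step of a memory copy loop.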

ARM

ARM is a RISC architecture (in contrast to the PDP-11’s CISC) that was revolutionary when it was introduced in 1985 and is still one of the most common and influential architectures in modern systems. There are numerous ARM architecture manuals available on the linked documentation site; I recommend the Cortex-M3 or Cortex-M4 as a good starting point, although if you have chosen to look at an ARM embedded systems development board, you would do well to use that architecture as your reference!

The Raspberry Pi is an easy way to get a high quality ARM board capable of running Linux, and the Raspberry Pi Foundation has a wealth of free educational material for using the Pi (or the Pi Pico mentioned above, an embedded ARM board) to solve all sorts of problems and develop all sorts of software. A reasonable Pi setup is only about $50, making it an affordable way to play with desktop ARM.

RISC-V

RISC-V is a very interesting “new” architecture whose instruction set specification is completely open. This makes it readily accessible for study, of course. The RISC-V ISA Specification is fascinating reading. There are several textbooks either on the RISC-V architecture itself or using RISC-V for pedagogical purposes, including a RISC-V edition of Patterson and Hennessy’s Computer Organization and Design.

Compilers

[Watch this space.]

Networking

There are many aspects of networking to learn, and most of them have a strong systems component. I personally think that a solid understanding of network configuration and management is a powerful tool for a systems programmer, although most CSE-type degrees do not focus on this. Learning about such topics is much easier if one has access to a handful of computers and some networking equipment, but it can also be done in simulation. Another important aspect of networking for the systems programmer is network and socket programming: using network connections (in particular, TCP/IP) in software.

Understanding Networks

Learning about and understanding computer networks is more of a textbook activity than a programming activity. Some textbooks that I would particularly recommend are:

  • Internetworking with TCP/IP Volume I: Principles, Protocols, and Architecture by Douglas Comer. There are many editions of this book, and they are all excellent. This is the most readable “big picture” TCP/IP networking book that I am aware of.
  • TCP/IP Illustrated, Volume 1: The Protocols by W. Richard Stevens. There are two editions of this book, as well. As I keep saying, edition is not so important. This book has more details and specific protocol information than Comer; I think Comer is a better introduction, but Stevens offers a bit more for the implementer.

The standards that comprise the Internet are almost all open, as well. They are mostly Requests for Comments (RFCs) published by the IETF. Some good starting points are:

  • RFC 791: Internet Protocol
  • RFC 793: Transmission Control Protocol (since obsoleted by RFC 9293, which consolidates its later updates)
  • RFC 5681: TCP Congestion Control
  • RFC 8200: Internet Protocol, Version 6 (IPv6) Specification
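To give a sense of how concrete these documents are, RFC 791 specifies the IPv4 header down to the bit, including a header checksum computed as the 16-bit ones’ complement of the ones’ complement sum of the header’s 16-bit words (RFC 1071 discusses computing it efficiently). This Python sketch decodes the fixed 20-byte part of a header and verifies that checksum; it ignores options and is not a complete packet parser, and the function names are illustrative.

```python
import ipaddress
import struct

def internet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed part of an IPv4 header (options are not decoded)."""
    (ver_ihl, _tos, total_len, _ident, _flags_frag,
     ttl, proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    header_len = (ver_ihl & 0x0F) * 4        # IHL counts 32-bit words
    return {
        "version": ver_ihl >> 4,
        "header_len": header_len,
        "total_len": total_len,
        "ttl": ttl,
        "protocol": proto,                   # e.g., 6 = TCP, 17 = UDP
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
        # Summing a header that includes a correct checksum yields zero.
        "checksum_ok": internet_checksum(packet[:header_len]) == 0,
    }
```

Applied to the sample header 45 00 00 73 00 00 40 00 40 11 b8 61 c0 a8 00 01 c0 a8 00 c7, it reports version 4, protocol 17 (UDP), source 192.168.0.1, destination 192.168.0.199, and a valid checksum. Comparing its output with what Wireshark shows for a captured packet is a good exercise.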

You will also want to explore “real” traffic on networks. Good tools for this are tcpdump (which can be used to watch brief text summaries of network traffic or save network traffic to disk) and Wireshark (which is excellent for analyzing network packets and protocols either from saved dump files or in real-time on the network).

Network Configuration and Management

As mentioned above, hands-on network configuration and management is often neglected in CSE education (it tends to be left to MIS and IT degrees), but I think it is a valuable and important topic for the network programmer. Building and configuring networks does require some commitment to actual hardware manipulation, as simulation can only take you so far. Fortunately, networking equipment that is “outdated” by commercial standards can make fine educational platforms, and small embedded systems like the Raspberry Pi can rapidly populate a complex network configuration.

The student of hands-on networking would do well to acquire:

  • A computer with multiple (ideally 3 or more) Ethernet ports to serve as a router and network simulator
  • A wired Ethernet switch, preferably managed
  • A wireless access point
  • Several additional computers (which may be something like Raspberry Pi systems)

None of these devices need to be new, fast, or powerful. Craigslist, eBay, the local e-waste facility, and cast-off equipment from local businesses are perfect sources. The goal is to be able to place them in several configurations and observe how they interact. Software packages like Mininet can help expand your logical network to a larger scope than the set of physical network components that you have on hand.

I recommend spending some time to set up at least the following protocols and services:

  • IPv4 and IPv6
  • DHCP and DHCPv6
  • Network Address Translation (NAT)
  • A DNS server (even if only as a proxy)
  • An HTTP/HTTPS (they require different knowledge!) web server
  • An HTTP proxy
  • Some interior gateway protocol, even as simple as RIP

Network Programming

Once a network is set up, taking advantage of it is a different problem. Fortunately, this is a problem that we address more effectively in CSE education. There are several aspects of network programming for the systems programmer to consider, from implementing the networking stack itself (e.g., TCP and IP) to writing applications on top of the network stack.

Writing Network Applications

Network application programming is part of almost any software engineering job, from the point of view of the service provider, the application implementer, or both. Many modern programming languages have simple yet powerful network abstractions. However abstract network handling becomes, it is nonetheless different from local I/O, and it requires the programmer to address a different set of concerns, such as latency, partial reads and writes, and peers that disconnect or fail unexpectedly.
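One such concern is framing. A TCP socket delivers a stream of bytes rather than a sequence of messages, so a single recv() call may return half of a protocol line, or several lines at once. Below is a minimal Python sketch of line framing for a CRLF-delimited text protocol; the LineBuffer class is a hypothetical helper for illustration, not a library API.

```python
class LineBuffer:
    """Reassemble terminator-delimited lines from arbitrarily split byte chunks.

    Hypothetical helper for illustration; not part of any library.
    """

    def __init__(self, terminator: bytes = b"\r\n") -> None:
        self.terminator = terminator
        self.pending = b""  # bytes received but not yet part of a complete line

    def feed(self, chunk: bytes) -> list[bytes]:
        """Add a chunk from recv() and return every line it completes."""
        self.pending += chunk
        parts = self.pending.split(self.terminator)
        # The last element is an incomplete line (possibly empty); keep it for later.
        self.pending = parts.pop()
        return parts
```

A program would pass each chunk returned by recv() to feed() and act only on the complete lines that come back; anything after the last terminator waits in pending until more data arrives.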

Socket programming in C is a very low-level approach to network programming with a much less comfortable interface than higher-level languages like Python or Go, but it gives the programmer a good view of how the underlying protocols actually do their jobs. The book Unix Network Programming: Volume 1 by W. Richard Stevens is the quintessential overview of this topic. While I find its primary value for the experienced programmer to be as a reference, it is structured as a textbook and contains both explanatory material and valuable exercises.

Higher-level languages often remove much of the tedium of resolving hosts and making network connections, which is convenient for the network programmer. If you feel like you have a good understanding of the underlying protocols, or you are not as interested in that part of the stack, implementing network protocols in Python or Go (or another language with a mature network library) will be more pleasant than socket programming in C. In particular, connecting to a remote computer by its hostname is a single function call in these languages, versus a multi-step process using baroque structures in C.
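To illustrate the difference, here is a minimal Python sketch. socket.create_connection() resolves the name and connects in one call; the step-by-step version walks through essentially the same sequence that a C program must write out by hand with getaddrinfo(3), socket(2), and connect(2), trying each resolved address (IPv6 or IPv4) until one succeeds. The two function names are illustrative, not part of any library.

```python
import socket


def connect_one_call(host: str, port: int) -> socket.socket:
    """High-level: resolution, address-family choice, and connect in one call."""
    return socket.create_connection((host, port), timeout=5)


def connect_step_by_step(host: str, port: int) -> socket.socket:
    """Low-level: the resolve-then-try-each-address loop a C client writes by hand."""
    last_error = None
    # getaddrinfo may return several candidates (e.g. an IPv6 and an IPv4 address).
    for family, socktype, proto, _canonname, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        s = socket.socket(family, socktype, proto)
        try:
            s.connect(sockaddr)
            return s  # first address that accepts the connection wins
        except OSError as e:
            last_error = e
            s.close()  # release the failed socket before trying the next address
    raise last_error or OSError(f"no usable addresses for {host!r}")
```

In C, each of these steps additionally involves filling in a struct addrinfo of hints, walking the returned linked list, and calling freeaddrinfo() afterward, which is where most of the tedium comes from.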

Deciding what application to implement depends on what it is that you want to learn as a network programmer. Classic Internet protocols tend to use relatively simple text-based interactions, while newer protocols may have much more complicated data structures or exchange binary data. I recommend starting with the former, as they are typically easier to debug. The Wireshark “follow stream” functionality is also very helpful for debugging.
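As an example of binary data on the wire, MQTT (one of the protocols suggested below) prefixes each packet's body with a “remaining length” field encoded as a variable-length integer: seven bits of value per byte, with the high bit set when another byte follows, up to a maximum of four bytes. The following is a minimal Python sketch of encoding and decoding that field, following the algorithm given in the MQTT 3.1.1 specification; the function names are hypothetical.

```python
MAX_REMAINING_LENGTH = 268_435_455  # largest value that four length bytes can hold


def encode_remaining_length(n: int) -> bytes:
    """Encode n as an MQTT variable-length 'remaining length' field."""
    if not 0 <= n <= MAX_REMAINING_LENGTH:
        raise ValueError(f"remaining length out of range: {n}")
    out = bytearray()
    while True:
        byte = n % 128  # low seven bits of the remaining value
        n //= 128
        if n > 0:
            byte |= 0x80  # continuation bit: another length byte follows
        out.append(byte)
        if n == 0:
            return bytes(out)


def decode_remaining_length(data: bytes) -> tuple[int, int]:
    """Return (value, number_of_bytes_consumed) from the start of data."""
    value = 0
    for i, byte in enumerate(data[:4]):
        value += (byte & 0x7F) << (7 * i)  # least significant group comes first
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("incomplete or malformed remaining length")
```

For instance, a complete PINGREQ packet is just the fixed-header byte 0xC0 followed by encode_remaining_length(0), i.e. the two bytes C0 00. A mistake in a field like this is much harder to spot in a packet capture than a typo in a text command.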

Classic applications for implementation include:

  • Discard and Echo are the quintessential first projects. They have (rather humorous, if you ask me) RFC standards, but in essence a discard server reads and ignores all incoming data, while an echo server reads all incoming data and returns it unchanged. The standard services run on low-numbered ports (9 for discard, 7 for echo), and binding to ports below 1024 requires root privileges on Unix-like systems, so playful implementations should probably run on non-standard ports of 1024 or above.
  • An IRC client is a very common project, although IRC is not as popular now as it once was. It is a relatively simple real-time chat protocol which suffers from poor standardization but is typically quite forgiving of implementation errors.
  • An HTTP/1.1 client or server. Implementing a client is a good first step, as modern web browsers can be difficult clients for a naïve server implementation. A client like curl makes a good test subject for a small server.
  • MQTT is a relatively simple protocol for small publish/subscribe services often used in Internet of Things devices. An MQTT client is a good binary protocol client that can be readily tested against an existing MQTT implementation (such as Mosquitto) with very little application logic, while an MQTT server is a good opportunity to build a long-lived service that must manage some persistent (but relatively simple) state and handle unexpected client arrivals and departures.
  • XMPP is a much larger, XML-based real-time messaging protocol with publish/subscribe features. A complete XMPP client is a very large project, but a small client or “bot” may be reasonable. The Prosody XMPP server is a good implementation against which to develop a client.
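A minimal Python sketch of the echo server described above, handling each client on its own thread. The port 7007 used in the usage note is an arbitrary unprivileged choice, not a standard; changing handle_echo to drop the data instead of calling sendall() turns it into a discard server.

```python
import socket
import threading


def handle_echo(conn: socket.socket) -> None:
    """Send back every byte received on conn until the peer closes the connection."""
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:  # empty read: the client has closed its side
                return
            conn.sendall(data)  # sendall() keeps writing until every byte is sent


def serve_echo(listener: socket.socket) -> None:
    """Accept clients forever, handling each one on a daemon thread."""
    while True:
        conn, _addr = listener.accept()
        threading.Thread(target=handle_echo, args=(conn,), daemon=True).start()
```

To try it, call serve_echo(socket.create_server(("", 7007))) and connect with a simple client such as nc localhost 7007; each line you type should come straight back.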

Implementing a TCP/IP Stack

A full network stack implementation is a very large amount of work; TCP in particular is quite a complicated protocol. However, a minimal stack consisting of ARP, IP, and UDP is feasible. The most difficult part of this process is often implementing a device driver for the network hardware. This can be mitigated by running the stack over a serial port using PPP or a similar protocol, or by choosing an embedded device and associated operating system that already has a working Ethernet device driver. Embedded Xinu or FreeRTOS may be able to provide this.
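One piece every such stack needs is the Internet checksum (RFC 1071), which protects the IPv4 header and is also used by UDP (computed over a pseudo-header plus the datagram) and ICMP. It is the ones' complement of the ones' complement sum of the data taken as 16-bit big-endian words. A minimal Python sketch follows, checked against a sample 20-byte IPv4 header; the function name is illustrative.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum of data, returned as a 16-bit integer."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back into 16 bits
    return ~total & 0xFFFF  # ones' complement of the sum
```

When building a header, compute the checksum with the checksum field set to zero and store the result in that field; on receipt, running the same function over the whole header, checksum included, yields 0 for an intact header.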

The books and RFCs listed in Understanding Networks will be valuable in this endeavor, and both Comer and Stevens have an associated Volume II that works through an implementation. (Comer’s implementation runs on Xinu, while Stevens’s walks through the 4.4BSD-Lite networking code.)

To give a sense of scope, implementing the minimal ARP/IP/UDP stack over Ethernet described above is a common semester project for a second graduate networking course. It is certainly not a trivial task, but it can be accomplished in a semester!

Distributed Systems

[Watch this space.]