IEnumerable Is Not A List!
By James Charlesworth
18 March 2020
Demystifying the IEnumerable interface in .NET
Have a close look at the following C# function and see if you can spot what's wrong:
public void LogLongAndShortNames(IEnumerable<string> names)
{
    foreach (var name in names.Where(x => x.Length > 10))
        Console.WriteLine($"{name} is a long name");
    foreach (var name in names.Where(x => x.Length < 5))
        Console.WriteLine($"{name} is a short name");
}
It's simple, right? We are taking in a list of names, looping through the long names and writing them to the console, then looping through the short names and writing them.
Except we aren't.
That names parameter is not a List<string> at all, it is an IEnumerable<string>, and they are not the same thing.
From the Microsoft docs:
List<T> Class
Represents a strongly typed list of objects that can be accessed by index.
… and …
IEnumerable<T> Interface
Exposes the enumerator, which supports a simple iteration over a collection of a specified type.
The important distinction here is that while a List<T> is an actual collection of objects stored somewhere in memory, IEnumerable<T> is simply something that exposes an enumerator. Nothing more. There is nothing about a class implementing IEnumerable<T> that says it must contain the same data each time you enumerate it. There is nothing that specifies the data is even held in memory. It just exposes the enumerator, which supports a simple iteration over a collection of a specified type.
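To make the "not even held in memory" point concrete, here is a minimal sketch of an IEnumerable<int> that computes its items on demand. The LazySequences class and its Squares method are hypothetical names invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazySequences
{
    // Hypothetical helper: yields square numbers on demand.
    // No list or array is ever built; each value is computed when it is requested.
    public static IEnumerable<int> Squares()
    {
        for (var i = 1; ; i++)
        {
            yield return i * i;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Take(5) stops asking after five items, so the endless loop above never runs away.
        Console.WriteLine(string.Join(", ", LazySequences.Squares().Take(5)));
        // prints "1, 4, 9, 16, 25"
    }
}
```

Nothing here could be indexed or counted like a list, yet it is a perfectly valid IEnumerable<int>.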
Let's take another look at that function, but this time I'm going to turn ReSharper on and let it analyse the code for me.
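public void LogLongAndShortNames(IEnumerable<string> names)
{
    // ReSharper warning on "names": Possible multiple enumeration of IEnumerable
    foreach (var name in names.Where(x => x.Length > 10))
        Console.WriteLine($"{name} is a long name");
    // ReSharper warning on "names": Possible multiple enumeration of IEnumerable
    foreach (var name in names.Where(x => x.Length < 5))
        Console.WriteLine($"{name} is a short name");
}
ReSharper flags both uses of names (underlined with squiggly lines in the editor) with the warning: Possible multiple enumeration of IEnumerable.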
Since the names parameter is specified as just something that exposes an enumerator, the function may not work entirely as expected. For example, what would happen if I passed in an instance of the following:
public class AlternatingEnumerable : IEnumerable<string>
{
    private readonly List<string> _setA = new List<string> { "aaa", "bbb", "ccc" };
    private readonly List<string> _setB = new List<string> { "aaaaaaaaaaaa", "bbbbbbbbbbbbbbb", "ccccccccccccccc" };
    private bool flip = false;

    public IEnumerator<string> GetEnumerator()
    {
        // Toggle on every call, so successive enumerations alternate between the two sets
        return (flip = !flip) ? _setA.GetEnumerator() : _setB.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
Here AlternatingEnumerable is an implementation of IEnumerable<string> that returns a different set of strings each time you enumerate it. Here is the result...
LogLongAndShortNames(new [] { "aaaaaaaaaaa", "bbb" });
// outputs...
// aaaaaaaaaaa is a long name
// bbb is a short name
LogLongAndShortNames(new AlternatingEnumerable());
// outputs nothing!
Weird, I know, but entirely possible. And if you are writing that LogLongAndShortNames(...) function for another developer to call, you can bet money that one day somebody will pass in something weird and your function will not act the way you expected.
Now you might be thinking this AlternatingEnumerable is not realistic, but there are real-world situations like this. What if your enumerable was reading from a file and the file were changed between each enumeration? What if it were reading from an un-buffered HTTP stream that can only be read once? You can't ever rely on the data being the same each time it is enumerated, or even being able to enumerate it more than once in the first place. You should write your code so that is not an issue.
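As a rough sketch of the "can only be read once" case, the example below wraps a forward-only reader in an iterator. OneShotLines and ReadLines are hypothetical names, and a StringReader stands in for a real file or network stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class OneShotLines
{
    // Hypothetical helper: lazily yields lines from a reader that can only move forward.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Assumption for this sketch: StringReader plays the part of a file or HTTP stream.
        var names = OneShotLines.ReadLines(new StringReader("Alice\nBob\nCharlotte"));

        Console.WriteLine(names.Count()); // prints 3: the first pass consumes the reader
        Console.WriteLine(names.Count()); // prints 0: the reader is already at the end
    }
}
```

The second enumeration compiles and runs without complaint; it just silently sees no data, which is exactly the kind of bug the original function invites.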
So why do we love lists and arrays?
Attribute Substitution may be to blame for this. When we deal with interfaces in our code we often subconsciously replace the abstract interface with something more concrete for the purposes of holding all the logic in our minds. The following pattern is common in the .NET enterprise software world:
public class MyService : IMyService
{ }

public class MyOtherService
{
    public MyOtherService(IMyService myService)
    {
        // It is common, but incorrect, to think of myService as an instance of the concrete class MyService
    }
}
Our minds don't deal so well with the idea that myService is an abstract behaviour and not an actual object backed by specific code we can read. An interface can have so many different implementations that it's just too much to consider, so we like to think of it as a real class with a known implementation.
So when it comes to IEnumerable<T> we equally like to think of it as something simple, like a list or an array, as opposed to what it really is - something abstract and vague. The problem is, it is not good practice to just accept a List<T> or an array as an input parameter to our function.
Accept the Most Generic Type
It is generally (and correctly) accepted that you should accept the most generic type and return the most specific type when writing function signatures. After all, you wouldn't want to write your methods like this:
public void Foo(List<string> strings, Guid[] guids)
{
    // Bad signature
}
A much better method signature would be:
public void Foo(IEnumerable<string> strings, IEnumerable<Guid> guids)
{
    // Better signature
}
It's better because now we aren't restricting the two parameters to being a list and an array. Anything that implements IEnumerable<T> can be passed to our method, which makes our code far more reusable!
But just because a List<string> can be passed as an IEnumerable<string> does not mean every IEnumerable<string> is a list.
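A small sketch of what that flexibility looks like in practice: the hypothetical TotalLength method below (invented for this example) accepts an array, a list, a set, and a lazy LINQ query through the same IEnumerable<string> parameter, and only ever iterates it once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Lengths
{
    // Hypothetical method: it iterates exactly once, so IEnumerable<string> is all it needs.
    public static int TotalLength(IEnumerable<string> strings)
    {
        var total = 0;
        foreach (var s in strings)
        {
            total += s.Length;
        }
        return total;
    }
}

public static class Program
{
    public static void Main()
    {
        var array = new[] { "ab", "cde" };
        var list = new List<string> { "ab", "cde" };
        var set = new HashSet<string> { "ab", "cde" };
        var query = array.Where(s => s.Length > 2); // a lazy LINQ query, not a collection

        Console.WriteLine(Lengths.TotalLength(array)); // prints 5
        Console.WriteLine(Lengths.TotalLength(list));  // prints 5
        Console.WriteLine(Lengths.TotalLength(set));   // prints 5
        Console.WriteLine(Lengths.TotalLength(query)); // prints 3
    }
}
```

Only two of those four arguments can be indexed, and the last one is not a stored collection at all.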
IEnumerable as a function
So hopefully I've now convinced you that you can't treat a parameter of type IEnumerable<T> as a list or an array - so what should you treat it as? Well, the best answer is that you should treat it as a function. A function that can give you enumerators. This means you can free yourself from horrible 1990s-style code and start writing pure functions that act on enumerables. For example, take this method:
public List<string> GetInterweavedResults(List<string> sourceA, List<string> sourceB)
{
    // The result holds strings, so it must be a List<string>
    var results = new List<string>();
    for (int i = 0; i < sourceA.Count; i++)
    {
        results.Add(sourceA[i]);
        results.Add(sourceB[i]);
    }
    return results;
}
The method above takes two lists and returns a new list that contains alternating items from the source lists.
Sure it works, but it's long-winded and horrible because we are working with lists. But look what happens if we change to IEnumerable<T> and treat the inputs as functions, not lists:
public IEnumerable<string> GetInterweavedResults(IEnumerable<string> sourceA, IEnumerable<string> sourceB)
{
    using (var enumeratorA = sourceA.GetEnumerator())
    using (var enumeratorB = sourceB.GetEnumerator()) // one enumerator per source
    {
        while (enumeratorA.MoveNext() && enumeratorB.MoveNext())
        {
            yield return enumeratorA.Current;
            yield return enumeratorB.Current;
        }
    }
}
This code still does the same job (and simply stops when either source runs out), but it treats the inputs as what they are - functions that return an enumerator.
Let's revisit our original function but with the names parameter renamed to getNames to reflect thinking of it as a function:
// Parameter renamed to "getNames" to reflect its functional nature
public void LogLongAndShortNames(IEnumerable<string> getNames)
{
    foreach (var name in getNames.Where(x => x.Length > 10))
        Console.WriteLine($"{name} is a long name");
    foreach (var name in getNames.Where(x => x.Length < 5))
        Console.WriteLine($"{name} is a short name");
}
Suddenly the multiple enumeration seems much less acceptable. After all, you wouldn't call an actual Func<string[]> multiple times, would you?
// Parameter replaced with an actual function
public void LogLongAndShortNames(Func<string[]> getNames)
{
    foreach (var name in getNames().Where(x => x.Length > 10))
        Console.WriteLine($"{name} is a long name");
    foreach (var name in getNames().Where(x => x.Length < 5))
        Console.WriteLine($"{name} is a short name");
}
Here it is much more apparent that calling getNames() twice is bad design, yet the code is surprisingly similar to our original snippet. If we were asked to refactor this version of the method we would probably cache the result of the function and end up with something like the below:
public void LogLongAndShortNames(Func<string[]> getNames)
{
    // Call the function only once!
    var namesArray = getNames();
    foreach (var name in namesArray.Where(x => x.Length > 10))
        Console.WriteLine($"{name} is a long name");
    foreach (var name in namesArray.Where(x => x.Length < 5))
        Console.WriteLine($"{name} is a short name");
}
Putting IEnumerable<string> back in gives us the correct way to write the function at the top of the page:
public void LogLongAndShortNames(IEnumerable<string> nameSource)
{
    var namesArray = nameSource.ToArray();
    foreach (var name in namesArray.Where(x => x.Length > 10))
        Console.WriteLine($"{name} is a long name");
    foreach (var name in namesArray.Where(x => x.Length < 5))
        Console.WriteLine($"{name} is a short name");
}
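One way to see what the ToArray() call buys us is to count how often the source is actually asked for an enumerator. The CountingEnumerable<T> wrapper below is a hypothetical helper written for this sketch, not part of .NET.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper that counts how often a consumer asks for an enumerator.
public sealed class CountingEnumerable<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _inner;

    public CountingEnumerable(IEnumerable<T> inner)
    {
        _inner = inner;
    }

    public int EnumerationCount { get; private set; }

    public IEnumerator<T> GetEnumerator()
    {
        EnumerationCount++;
        return _inner.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class Program
{
    public static void Main()
    {
        var data = new[] { "Al", "Bartholomew", "Cy" };

        // Filtering the source directly: each query walks the source again.
        var direct = new CountingEnumerable<string>(data);
        var longDirect = direct.Count(x => x.Length > 10);
        var shortDirect = direct.Count(x => x.Length < 5);
        Console.WriteLine(direct.EnumerationCount); // prints 2

        // Materialising first: the source is walked once, then the array is reused.
        var snapshotSource = new CountingEnumerable<string>(data);
        var namesArray = snapshotSource.ToArray();
        var longSnapshot = namesArray.Count(x => x.Length > 10);
        var shortSnapshot = namesArray.Count(x => x.Length < 5);
        Console.WriteLine(snapshotSource.EnumerationCount); // prints 1
    }
}
```

The trade-off is that ToArray() holds every item in memory at once, so it is a sensible default for modest inputs rather than a universal answer.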
Conclusion
So in conclusion I'd like to stress that yes, you should write your methods to return the most specific type and accept the most generic type, but this doesn't mean just blindly reducing everything down to IEnumerable<T>. And remember to treat any IEnumerable<T> you see as a function that can return an enumerator, not a list of things.