We recently migrated some APIs to Azure Function with Cosmos-DB as database. We use the .net Cosmos SDK for accessing the database.

When searching for all persons who don’t have a ZipCode provided we would write it as such:

public record Person(string Id, string Firstname, string Lastname, int? ZipCode, ...);

var iterator = container.GetItemLinqQueryable<Person>()
	.Where(p => p.ZipCode == null)
	.ToFeedIterator();
...

The result set was suprising as we were expecting to see a lot more records that clearly do not have a ZipCode. It took some time and digging until I found the reason for this. By default all properties with NULL-values in Cosmos-DB are omitted when storing a document. So in our example a Person without the ZipCode would not be saved as:

{
	Id = "1",
	Firstmame = "Michael",
	Lastnme = "Knight",
	ZipCode = null
}

but instead as:

{
	Id = "1",
	Firstmame = "Michael",
	Lastnme = "Knight"
}

this is done to save on disc space, traffic and also aligns way better to the document database philosophy. One can force Azure Cosmos to save properties with NULL-values but to me this is a waste of resources and note really adhering to document DB standards.

When querying data Cosmos-DB SQL makes a clear distinction between a NULL property (first JSON) and an undefined property (second JSON). Unfortunately this distinction is carried over into the .net Cosmos SDK. In order to get all Persons without a ZipCode one has to make use of the extension Method IsDefined() as shown here:


var iterator = container.GetItemLinqQueryable<Person>()
	.Where(p => p.ZipCode == null && p.ZipCode.IsDefined() == false )
	.ToFeedIterator();

In theory the NULL-check could be ommitted, if one is sure that null properties are always left out by all writers to the database. But one cannot be sure, so I would have checks for both always.

Personally, I think that making this distinction in .net code is a design flaw, as this IsDefined()-Method is not known by the majority of .net-Developers that switch over from SQL Server (Entity Framework Core). A good API should not surprise users and this one clearly does.

There is a GitHub-Issue that is stil open, discussing the issue. Proposed solutions would be a parameter like treatUndefinedAsNull or an analyzer that warns about this behavior.

PS: I usually blog for myself (mostly, I guess, don’t have any analytics installed) and in German, but this topic might be of interest to a broader audience, thus I wrote it in English. Hope it finds its readers :)