Azure Cosmos-DB LINQ Null-Handling
We recently migrated some APIs to Azure Function with Cosmos-DB as database. We use the .net Cosmos SDK for accessing the database.
When searching for all persons who don’t have a ZipCode provided we would write it as such:
public record Person(string Id, string Firstname, string Lastname, int? ZipCode, ...);
var iterator = container.GetItemLinqQueryable<Person>()
.Where(p => p.ZipCode == null)
.ToFeedIterator();
...
The result set was suprising as we were expecting to see a lot more records that clearly do not have a ZipCode. It took some time and digging until I found the reason for this. By default all properties with NULL-values in Cosmos-DB are omitted when storing a document. So in our example a Person without the ZipCode would not be saved as:
{
Id = "1",
Firstmame = "Michael",
Lastnme = "Knight",
ZipCode = null
}
but instead as:
{
Id = "1",
Firstmame = "Michael",
Lastnme = "Knight"
}
this is done to save on disc space, traffic and also aligns way better to the document database philosophy. One can force Azure Cosmos to save properties with NULL-values but to me this is a waste of resources and note really adhering to document DB standards.
When querying data Cosmos-DB SQL makes a clear distinction between a NULL property (first JSON) and an undefined property (second JSON). Unfortunately this distinction is carried over into the .net Cosmos SDK. In order to get all Persons without a ZipCode one has to make use of the extension Method IsDefined()
as shown here:
var iterator = container.GetItemLinqQueryable<Person>()
.Where(p => p.ZipCode == null && p.ZipCode.IsDefined() == false )
.ToFeedIterator();
In theory the NULL-check could be ommitted, if one is sure that null properties are always left out by all writers to the database. But one cannot be sure, so I would have checks for both always.
Personally, I think that making this distinction in .net code is a design flaw, as this IsDefined()-Method is not known by the majority of .net-Developers that switch over from SQL Server (Entity Framework Core). A good API should not surprise users and this one clearly does.
There is a GitHub-Issue that is stil open, discussing the issue. Proposed solutions would be a parameter like treatUndefinedAsNull
or an analyzer that warns about this behavior.
PS: I usually blog for myself (mostly, I guess, don’t have any analytics installed) and in German, but this topic might be of interest to a broader audience, thus I wrote it in English. Hope it finds its readers :)