Credential errors in Synapse On-demand pools
A few weeks ago I created a data lake in Azure and filled it with some CSV files. Then I spun up a Synapse Analytics Workspace and queried the files using Azure Synape Analytics On-demand pools to query the CSV files via the Synapse Analytics Studio. This works great - if you haven’t tried running SQL on text files in Azure data lake, stop reading and go check it out.
Next, I created a database in the on-demand pool, and added a view to it, referencing the select OPENROWSET statement. That view can now be used in, say, Power BI or other tools that can connect to the on-demand pool endpoint.
Let’s try it. Using SQL Server Management Studio, I connected to the endpoint as the SQLADMINUSER user (the default user created when setting up the workspace), and all was well.
Until today.
Today, I was hit with a nasty error saying something about credentials.
Msg 15151, Level 16, State 1, Line 9
Cannot find the CREDENTIAL 'https://xxx.dfs.core.windows.net/filesystem/path/file.csv',
because it does not exist or you do not have permission.
After banging my head against this for more time than I care to mention, I had a chat with Mark Pryce-Maher of Microsoft. He pointed out that the issue had to do with missing permissions - the Azure Synapse Workspace managed identity (that gets automatically created when you stand up a workspace) need Storage Blob Data Contributor privileges on the Azure data lake to function. The problem was that it already had said privileges, so we were barking up the wrong tree.
We then tried connecting to the on-demand endpoint using my AAD account - something that worked like charm. This lead us to think that there might be some issue with the context that the user logged into the on-demand pool was running under.
The documentation on controlling access to storage is actually rather good and turned out to be the solution to get things working again, but not WHY this happened. It would seem that sometime during the last couple of weeks, changes were made to the way the SQLADMINUSER account is treated. Apparently it used to get special treatment with credentials automatically created, but not anymore.
The solution is thus to treat the SQLADMINUSER account like any other, and create a server-scoped credential inside of the on-demand pool backed by a shared access signature on the data lake - like this (no, the SAS does not work anymore 😁):
CREATE CREDENTIAL [https://xxx.dfs.core.windows.net/filesystem/path/file.csv]
WITH IDENTITY='SHARED ACCESS SIGNATURE'
, SECRET = '?sv=2020-02-10&ss=bfqt&srt=sco&sp=rwdlacupx&se=2024-06-02T21:12:50Z&st=
2021-06-02T13:12:50Z&spr=https&sig=owUEX9er5N0sdfffsas1rJYoKpenzUI0H46TJNyZK%2BCiWv4%3D';
GO
Hopefully this can save some time for others hitting the same surprise. A huge shoutout to Mark Pryce-Maher, Cathrine Wilhelmsen and Simon Whitely to get me pointed in the right direction.