TL;DR: Clone this git project, set the parameters and run 0_script.sh to deploy 1 ADLSgen2 hub and N Databricks spokes.
A data lake is a centralized repository of data that allows enterprises to create business value from data. Azure Databricks is a popular tool to analyze data and build data pipelines. This blog discusses how Azure Databricks can be connected to an ADLSgen2 storage account in a secure and scalable way. The following is key:
TL;DR: Create an Azure DevOps git project using azure-pipelines.yml, create a build artifact, deploy ADFv2 and a SQLDB bacpac, and trigger pytest to run unit tests.
Unit testing is a software engineering practice that focuses on testing individual parts of code. In unit testing, the following best practices are applicable:
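To make "testing individual parts of code" concrete, here is a minimal sketch of a pytest-style unit test. The function `clean_name` and its tests are hypothetical and only serve as an illustration; they are not taken from the project itself.

```python
def clean_name(raw: str) -> str:
    """Hypothetical transformation step: collapse whitespace and normalize casing."""
    # split() without arguments drops leading/trailing and repeated whitespace
    return " ".join(raw.split()).title()


def test_strips_whitespace():
    # Each test checks one behavior of one small unit
    assert clean_name("  ada   lovelace ") == "Ada Lovelace"


def test_normalizes_case():
    assert clean_name("ada LOVELACE") == "Ada Lovelace"
```

When saved in a file such as `test_clean_name.py` (an assumed name), running `pytest` discovers and runs both test functions.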
Disaster recovery aims for the continuation of IT systems after disruptive events. Data services are a vital part of every IT system and must be protected against these events. Most Azure PaaS data services have service tiers that support zone redundancy. This implies that when a disaster is limited to a single zone, data services are not impacted or the impact is minimized. However, some disruptive events require more planning than just selecting the right tier. These events are grouped into three major scenarios as follows:
Creating a data pipeline is one thing; bringing it into production is another. This is especially true for a modern data pipeline in which multiple services are used for advanced analytics. Examples are transforming unstructured data to structured data, training of ML models and embedding OCR. Integration of multiple services can be complicated and deployment to production has to be controlled. In this blog, an example project is provided as follows:
The code from…
The Microsoft identity platform is key to secure access to your web app. Users can authenticate to your app using their Azure AD identity or social accounts. The authorization model can be used to grant permissions to your backend app or standard APIs like Microsoft Graph. In this blog, a web application is discussed that does the following:
The code of the project can be found here…
Python Flask is a popular tool to create web applications. Using Azure AD, users can authenticate to the REST APIs and retrieve data from Azure SQL. In this blog, a sample Python web application is created as follows:
Azure Cosmos DB is a fully managed multi-model database service. It enables you to build highly responsive applications worldwide. As part of Cosmos DB, Gremlin is supported for graph databases. Since Cosmos DB is optimized for fast processing (OLTP), traversal limits may apply for heavy analytic workloads (OLAP). In that case, Azure Databricks and GraphFrames can be used as an alternative to do advanced analytics, see also the architecture below.
In the remainder of this blog, the following is done:
Selenium is the standard tool for automated web browser testing. On top of that, Selenium is a popular tool for web scraping. When creating a web scraper in Azure, Azure Functions is a logical candidate to run your code in. However, the default Azure Functions image does not contain the dependencies that Selenium requires. In this blog, a web scraper in Azure Functions is created as follows:
Update 2021-07-27: Code deployed successfully, git repo and blog are up to date.
The architecture of web…
Azure Storage always stores multiple copies of your data. When Geo-redundant Storage (GRS) is used, the data is also replicated to the paired region. This way, GRS prevents data loss in case of a disaster. However, GRS cannot prevent data loss when application errors corrupt data. Corrupted data is then simply replicated to the other zones/regions. In that case, a backup is needed to restore your data. Two backup strategies are as follows:
Secure Azure Functions with Azure AD, Key Vault and VNETs. Then connect to Azure SQL using firewall rules and the Managed Identity of the Function.
Azure Functions is a popular tool to create small snippets of code that can execute simple tasks. Azure Functions can be triggered using queue triggers, HTTP triggers or timer triggers. A typical pattern of an Azure Function is as follows:
The pattern is depicted below, in which data is retrieved from…