Digitalization is leading to an unprecedentedly high volume of data in companies. Many see running SQL Server databases from Microsoft Azure or Amazon Web Services (AWS) as the best way to ensure IT performance in the face of growing data volumes and more complex analysis requirements. However, the initial hope of being able to work more cost-effectively by switching to the cloud is not being fulfilled for some. One significant reason for this could be that data assets have not been optimized for the new cloud environment in advance. The migration should therefore only be completed after thorough preparation.
Migrating to the cloud is similar to moving into a new home: while you're clearing out shelves and looking at your belongings, objects appear that you didn't even know you had. The question that inevitably arises is: is every item in the entire household still relevant for the new house? Or has the time come to spring clean and remove the clutter?
This realization can also be applied to the migration of SQL Server databases to the cloud. Since the new environment has different rules than on-premises, a smooth move should be preceded by appropriate clean-up work in the database. To do this, database administrators (DBAs) must first of all gain an overview of how all databases interact with the connected applications. This allows them to clean up unnecessary clutter in their data sets and revise codes if necessary. Migration should therefore be preceded by a two-step process consisting of evaluation and examination phases.
Evaluation phase: Data selection for the migration
One of the most common reasons for the failure of cloud migrations is excessive cost. In many cases, this can be attributed to the fact that the new cloud tariff model has not been sufficiently taken into account. Unused data, the amount of which is largely negligible in on-premises operation, can put a significant strain on the budget in the cloud, where the tariff is determined by CPU, storage, and IOPs. In contrast, completing a comprehensive evaluation in advance helps to ensure that the new environment is used as efficiently as possible. To do this, determine all inventory data records and assign them to three categories - cleansing, archiving, migration - one after the other.
Large amounts of junk data or data sets that are simply no longer useful are suitable for cleanup prior to cloud migration. This category includes data that was created in the past but may be of poor quality and only needed to be stored for legal reasons. If the legally required period of time has elapsed, these can now be deleted. If it is personal data, the data stock should also be considered in light of GDPR and any other data protection regulations.
During their investigations, DBAs may also encounter the opposite case: some data sets which, although outdated, are of suitable quality for current and future trend analyses. Here it is advisable to continue to use the data in read-only mode. For example, if migration to Microsoft Azure is planned, the data can simply be moved to a comparatively less expensive storage level using a SQL Stretch database. There, the data is still available in read-only mode and can be retrieved as required for business intelligence operations, for the application of AI or machine learning functions and for the creation of predictive analyses.
After the data needing to be cleaned and archived has been identified, the amount of data suitable for migration has automatically been formed. Although this data comes from local production systems, this does not mean that it can be transferred directly to a cloud-based production system. To prevent possible complaints from users that their reports no longer make sense since the migration, the next step is to subject this data to a thorough quality check.
Examination phase: Quality check for databases
Since no changes should be made to applications and databases during a migration process, it is important to eliminate any features that prevent solid performance. Additional quality checks are necessary to ensure smooth interaction between the application and database levels. The following points should therefore be ensured:
- Consistent naming standards for objects such as tables, views, triggers, stored procedures, and user-defined functions (UDFs).
- Do not use oversized columns, for example CHAR(500), if none of the values contained exceed 32 characters.
- GUIDs (Globally Unique Identifiers) are not used as clustered indices. This is only permitted for small tables that are not extended. You must also check whether GUIDs are used as cluster primary keys, since this can cause numerous performance problems.
- There are no data types defined as MAX size, such as NVARCHAR(MAX).
- There are no implicit conversions, as these can cause serious code problems. In particular, when Object Relational Mapping (ORM) tools are used, conversion problems are more likely to occur because ORMs usually use GUIDs as cluster indices by default.
Furthermore, the coding of the query timeouts should be examined once again. If server timeouts already occur in the on-premises environment for certain queries, these will increase in the cloud. To prevent this, the code should be revised so that it is more resilient in the cloud compared to query timeouts and the associated queries are optimized accordingly.
Another necessary, but in some cases possibly painful, task is the evaluation and testing of popular functions, such as the creation of temporary tables. While such features are often used to improve the logic of the coding, only a few of them have a positive effect on performance. To avoid any nasty surprises in the cloud, it's best to schedule a test for the most commonly used database features.
Reliable documentation facilitates the switch to the cloud
On the whole, the step into the cloud requires nothing less than creating comprehensive documentation based on a data catalog. To avoid discovering after migration that applications and users have literally pulled the rug out from underneath their feet, a further step must be added: record which applications access the data recorded in the catalog.
This may seem as unpleasant to DBAs as having to deal with long forgotten possessions when moving house, but it is just as essential in this situation. In order to simplify the documentation process, it is worth using appropriate management tools that can, among other things, automatically create a detailed overview of the data origin. In this way, suitable conditions can be created for smooth migration and efficient cloud use.