problem solving database

#1. A Lack of Indexes

Database indexes let queries efficiently retrieve data from a database. These indexes work similarly to the index at the end of a book: You can look at 10 pages to find the information that you need rather than searching through the full 1,000 pages.

How Indexes Increase Efficiency – Source: Essential SQL

You should use database indexes in many situations, such as:

Foreign Keys – Indexing foreign keys lets you efficiently query relationships, such as a “belongs to” or “has many” relationship.
Sorted Values – Indexing frequently sorted columns lets the database return rows in order without sorting them at query time, which speeds up “order by” and often “group by” queries.
Uniqueness – Indexing unique values in a database enables you to efficiently look up information and instantly validate uniqueness.

Database indexes can dramatically improve query performance on large datasets, but that doesn't mean that you should use them in every situation. After all, database indexes are separate data structures that take up additional space and must be updated on every write – they're not free by any means.
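To make the trade-off concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and the index name are hypothetical, and other database engines report their query plans differently.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one string (detail column only)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 50, i * 1.5) for i in range(1000)],
)

lookup = "SELECT * FROM orders WHERE customer_id = ?"
plan_before = query_plan(conn, lookup, (7,))  # no index yet: a table scan

# Index the foreign key column used in the WHERE clause.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, lookup, (7,))   # now a search via the index
```

Printing `plan_before` and `plan_after` should show the lookup changing from a scan of `orders` to a search that uses `idx_orders_customer`. The index speeds up this read, but every later insert into `orders` also has to update it.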

#2. Inefficient Querying

Most developers aren't database experts, and the code they write can lead to significant performance issues. The most common problem is the so-called N+1 query, which causes insidious performance bottlenecks in web applications.

For example, suppose you want to query the database for a list of items.

You might write an iterator in PHP that fetches each item with its own query.

The problem with this approach is that you're executing many separate queries rather than combining them into a single query. It's much faster to execute one query with 100 results than 100 queries with a single result.

The solution is to batch the queries, for example by fetching all of the rows with a single `WHERE id IN (...)` clause.
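As a rough illustration in Python rather than PHP, using the built-in `sqlite3` module and a hypothetical `items` table, the two approaches might look like this. The function names are made up for the example.

```python
import sqlite3

def make_db():
    """Build an in-memory database with 100 hypothetical items."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        [(i, f"item-{i}") for i in range(1, 101)],
    )
    return conn

def fetch_one_by_one(conn, ids):
    """N+1 style: one query per id. Returns (names, number of queries)."""
    names, queries = [], 0
    for item_id in ids:
        row = conn.execute(
            "SELECT name FROM items WHERE id = ?", (item_id,)
        ).fetchone()
        queries += 1
        names.append(row[0])
    return names, queries

def fetch_batched(conn, ids):
    """Batched style: one query for all ids. Returns (names, number of queries)."""
    ids = list(ids)
    if not ids:
        return [], 0
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT name FROM items WHERE id IN ({placeholders}) ORDER BY id", ids
    ).fetchall()
    return [row[0] for row in rows], 1
```

For ids 1 through 100, both functions return the same names, but the first issues 100 queries and the second issues one. Against a networked database, each of those extra queries also pays a round trip.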

N+1 queries can be difficult to detect because they don't show up until you have a lot of data. You may be testing on a development machine with only a handful of records without an issue, but as soon as you push the code to production, you could see dramatic performance issues under production loads.

#3. Improper Data Types

Databases can store nearly any type of data, but it's important to choose the correct data type for the job to avoid performance issues. For instance, it’s usually a good idea to select the data type with the smallest size and best performance for a given task.

There are a few best practices to keep in mind:

Size – Use the smallest data size that will accommodate the largest possible value. For example, you should use `tinyint` instead of `int` if you're only storing small values.
Keys – Primary keys should use simple `integer` or `string` data types rather than `float` or `datetime` to avoid unnecessary overhead.
NULL – Avoid NULL values in fixed-length columns since NULL consumes the same space as an input value would. These values can add up quickly in large columns.
Numbers – Avoid storing numbers as strings even if they aren't used in mathematical operations. For example, a ZIP code or phone number integer is smaller than a string.
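The size guideline above is easy to quantify. The sketch below assumes MySQL-style integer widths (other engines may differ) and ignores per-row overhead and indexes, so treat the result as a lower bound rather than an exact figure. The function names are hypothetical.

```python
# Assumed MySQL-style integer widths in bytes; other engines may differ.
INTEGER_WIDTH_BYTES = {"tinyint": 1, "smallint": 2, "int": 4, "bigint": 8}

def column_bytes(type_name, row_count):
    """Raw storage for one integer column across `row_count` rows."""
    return INTEGER_WIDTH_BYTES[type_name] * row_count

def bytes_saved(wide_type, narrow_type, row_count):
    """Bytes saved by storing the column as `narrow_type` instead of `wide_type`."""
    return column_bytes(wide_type, row_count) - column_bytes(narrow_type, row_count)

# Switching one column from int to tinyint on a 100-million-row table.
saved = bytes_saved("int", "tinyint", 100_000_000)
```

Under these assumptions, `saved` is 300,000,000 bytes for a single column, before counting any index that includes it.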

There are also several complex data types that can add value in certain circumstances but come with gotchas.

For example, PostgreSQL’s ENUM data type makes it easy to include an enumerable list in a single database item. While ENUM is great for reducing join complexity and storage space, long lists of values, or values that change often, can introduce performance issues since they make database queries a lot more expensive.

How to Spot Problems

The problem with inefficient database queries is that they’re difficult to detect until it's too late. For example, a developer may experience very little lag with a small sample database and just his local machine, but a production-level number of simultaneous queries on a much larger database could grind the entire application to a halt.

Load tests are the best way to avoid these problems by simulating production-level traffic before pushing any code to production users. While most load tests simulate traffic using protocols, the best platforms spin up real browser instances for the most accurate performance results.

LoadNinja’s In-Depth Analytics – Source: LoadNinja

LoadNinja enables you to record and instantly replay load tests in minutes rather than hours. These tests are run across tens of thousands of real browsers in the cloud to generate a realistic load. Engineers can then access step times, browser resources, and other actionable metrics.

Load tests are only helpful if they're run in advance of a production deployment. Using a continuous integration (CI) and continuous deployment (CD) process, you can ensure that load tests run with each merge into the master or pre-production branch, well before the final push to production.
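A full load-testing platform does far more than this, but the core idea can be sketched in a few lines of Python. This is a simplified, protocol-level stand-in with no real browsers; `run_load`, `percentile`, and the `action` callable are hypothetical names, and `action` would be whatever request you want to measure.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor

def run_load(action, users, requests_per_user):
    """Call `action` from `users` concurrent workers; return all latencies in seconds."""
    def one_user(_):
        latencies = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            action()
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))
    return [latency for user in per_user for latency in user]

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

In a CI job, a step could call `run_load` against a staging endpoint and fail the build if `percentile(latencies, 95)` exceeds an agreed budget.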

It’s an Easy Fix

Database performance issues are a common cause of web application bottlenecks. Most of these problems boil down to a lack of indexing, inefficient queries, and the misuse of data types, which can all be easily fixed. The challenge is identifying them before they reach production.

Load testing is the best way to ensure database bottlenecks – and other performance issues – are identified before they reach production users. Using this solution, you can easily incorporate these tests into your CI/CD pipeline and make fixes before they become costly problems.

5 Common Database Management Challenges & How to Solve Them

Since nearly every application or tool in your tech stack connects to a database, it’s no surprise that 57% of organizations find themselves constantly managing database challenges.

Storing and accessing huge volumes of data poses problems when teams are responsible for managing the security, reliability, and uptime of multiple databases in a hybrid IT environment on top of their day-to-day tasks. Yet, these teams often run into the same issues across their tech stack and may not even recognize it.

Here are the 5 most common database challenges your team should watch out for and how to solve them.

1. Managing Scalability with Growing Data Volumes

As data volumes continue growing at an average of 63% per month, organizations often don’t have their databases set up to effectively scale.

Not only are individual tools and applications delivering larger datasets into databases, but there’s also a good chance your data is being updated and queried more frequently. Your queries are getting more complex, your data is more distributed, and you’re defining more relationships between your data. Many relational databases aren’t designed to support all of these factors.

Even if your database is designed to scale with your data needs, you may need to pay to manage and query your increasing amount of data. Vertical scaling can only go so far before memory upgrade costs become untenable.

Every organization should consider whether it is actually using the data it stores. Create retention policies that reduce the amount of data you store as you scale. For example, you can decrease the amount of data you store by removing transient data from permanent storage, allowing you to better leverage the storage you have available.
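A retention policy can be as simple as a scheduled job that deletes rows past a cutoff. The following is a minimal sketch with Python's `sqlite3` module; the function name is hypothetical, and it assumes timestamps are stored as ISO 8601 text so that string comparison matches chronological order.

```python
import sqlite3
from datetime import datetime, timedelta

def purge_older_than(conn, table, ts_column, cutoff):
    """Delete rows whose timestamp is before `cutoff`; return how many were removed."""
    # Table and column names are interpolated into the SQL, so they must
    # come from trusted code, never from user input.
    cur = conn.execute(
        f"DELETE FROM {table} WHERE {ts_column} < ?", (cutoff.isoformat(),)
    )
    conn.commit()
    return cur.rowcount
```

For example, `purge_older_than(conn, "session_events", "created_at", datetime.now() - timedelta(days=30))` would drop transient events older than 30 days, where `session_events` is a made-up table name.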

2. Maintaining Database Performance

Slow database performance isn’t just inconvenient for your team; it can also stall applications and impact your end-users. Providing the best experience for employees and customers is a must, so it’s crucial to solve database performance issues quickly.

Beyond scalability issues, high latency in databases is often related to slow read/write speeds. Caching to a remote host is one way to scale databases whose data doesn’t need to be updated frequently. This is a great way to offload the database, particularly if some of your data only needs to be accessed in read-only mode.
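The caching idea can be sketched as a read-through cache with a time-to-live. This example keeps entries in an in-process dictionary as a stand-in for a remote cache host; the class name, the `loader` callable, and the injectable `clock` are assumptions made for illustration.

```python
import time

class ReadThroughCache:
    """Serve repeated reads from memory; call `loader` only on a miss or expiry."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader        # function that reads from the database
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # fresh hit: the database is not touched
        value = self._loader(key)    # miss or stale: go to the database
        self._entries[key] = (now + self._ttl, value)
        return value
```

A longer TTL offloads more reads but serves older data, which is why this approach suits data that changes rarely.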

You should also focus on improving query performance. In some cases, that may involve creating indexes to retrieve data more efficiently. In others, that may involve leveraging more skilled employees with more experience working with databases. Otherwise, inexperienced users can create unexpected performance bottlenecks.

3. Database Access Concerns

Even if your organization sets up and regularly monitors database security, you may continue running into security issues based on your access permissions.

Embracing a least-privilege approach is a must if you’re experiencing database security issues. Reducing the number of people with access using role-based access control, attribute-based access control, or a combination of the two reduces the likelihood of insider threats, phishing or malware attacks, and human error that impacts your data quality. Limit access to users with the right skills to maintain peak performance.
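As a small illustration of role-based access control with least privilege, the sketch below grants nothing by default and allows an action only when one of the user's roles explicitly includes it. The role names, the `sales_db` database, and the `is_allowed` function are all hypothetical.

```python
# Hypothetical roles mapped to explicit (database, action) grants.
ROLE_PERMISSIONS = {
    "analyst": {("sales_db", "read")},
    "dba": {("sales_db", "read"), ("sales_db", "write"), ("sales_db", "admin")},
}

def is_allowed(user_roles, database, action, role_permissions=ROLE_PERMISSIONS):
    """Return True only if some role explicitly grants (database, action)."""
    return any(
        (database, action) in role_permissions.get(role, set())
        for role in user_roles
    )
```

Unknown roles and users with no roles are denied, so access has to be granted deliberately rather than revoked after the fact.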

Thankfully, you don’t have to manage access independently for every database. A robust infrastructure access platform can help you manage what access is appropriate across multiple databases based on roles and functions.

4. Misconfigured or Incomplete Security

There’s no doubt that misconfigured security poses a significant risk to databases, particularly in cloud environments. Often, incomplete cloud security without encryption can expose your data to external attacks. Yet, when you’re managing multiple databases, it’s easy to overlook correct configuration or security patches.

Newly deployed or updated databases are particularly at risk for attacks. Regularly monitoring and upgrading databases can enhance security, but those efforts fall short if your database isn’t properly encrypted. Some databases have encryption on by default, so query your database to confirm that either transparent data encryption (TDE) or tablespace encryption (TSE) is enabled.

Plus, poor database configuration or implementation can lead to both intentional data loss—through unauthorized access or exporting—and unintentional data loss through corruption or incomplete logs. Activating logging features helps organizations keep better track of their data, discover and triage data issues, and remediate lost data incidents. Tracking data movement, traces, and logs with full-stack observability tools gives your team the visibility needed to monitor databases and identify threats before sensitive data is at risk.

5. Data Integration and Quality Problems

Without data standardization, your organization can experience integration issues within your database. Finding and aggregating data for queries is especially difficult when data types and formats aren’t aligned across all sources. Plus, data silos across your organization may leave datasets incomplete, resulting in poor queries that both create performance issues and waste company time, resources, and money.

Not all data integration tools are created equal. Leverage platforms and tools that let your organization create rules to standardize your data for each source before it’s integrated into your data pipeline. From there, use the same standardization processes for the existing data in your database and employ automation to limit redundant or incomplete data.
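A standardization rule can be a small function applied to every record before it is loaded. The sketch below assumes three source date formats and two fields, `email` and `signup_date`; the formats, field names, and function names are illustrative only.

```python
from datetime import datetime

# Assumed source formats, tried in order; extend per data source.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%m-%d-%Y")

def standardize_date(raw):
    """Convert a date string in any known source format to ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")

def standardize_record(record):
    """Return a copy of the record with normalized email and date fields."""
    return {
        "email": record["email"].strip().lower(),
        "signup_date": standardize_date(record["signup_date"]),
    }
```

Rejecting unrecognized formats, instead of guessing, keeps bad rows out of the pipeline so they can be fixed at the source.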

It’s also important to ensure all your sources are integrated seamlessly and regularly into your database. Automation plays a crucial role in data integration, and many tools can push data into your database in real-time or more frequently. However, you may still want to set up integration frequency for different sources, since real-time updates for all data can impact performance if your solution isn’t prepared to support your data.

Managing Database Challenges with Confidence

With more data comes more database challenges. But, with the right tools and preparation, your organization doesn’t have to constantly focus on mitigating database issues. For instance, adopting a modern access control solution like strongDM, where workflows are streamlined for DBAs and developers, will go a long way towards ensuring easy, secure access to databases.

At the end of the day, by overcoming these five common challenges, your organization can keep data quality high, improve its security posture, and maintain data accessibility for your organization.

Problem solving database or data warehouse production issues with PRIDESTES DEPLOY Principle

This article talks about a golden rule called PRIDESTES DEPLOY to sort out any SQL BI (business intelligence) related issue and it is particularly useful for resolving database, data warehouse or even reporting related issues where the database is built through modern tools like Azure Data Studio.

I am coining a term that has perhaps never been used before, but the practice behind it is common in almost every IT environment where teams are busy resolving issues around the clock.

The PRIDESTES DEPLOY principle is not a solution in itself; rather, it is the key to finding a solution in any sensitive environment, including the Production environment.

About PRIDESTES DEPLOY

After taking part in numerous professional scenarios to resolve production issues in database and data warehouse business intelligence solutions, I decided to formalize the principle that has helped us and can help other professionals and teams. Many of them already use it, so consider this a polite reminder.

I call it PRIDESTES DEPLOY. Someone else may call it something different, but as long as we follow its essence, it is a fast-track way to find either a solution or the cause that leads to the solution.

Let us explore this in detail.

What is PRIDESTES DEPLOY?

PRIDESTES DEPLOY is an acronym and a principle that can be adopted in order to take steps in resolving an issue related to any IT framework including database or data warehouse-related issues.

What does PRIDESTES DEPLOY Stand for?

PRIDESTES DEPLOY stands for the following things:

P: Preliminary Analysis

R: Replicating the Problem

I: Identifying the cause

DES: Designing the Solution/Fix

TES: Testing the Solution/Fix

DEPLOY: Deploying the Solution/Fix

Preliminary Analysis (P)

As the name indicates, any problem-solving story begins with getting to know the IT or SQL/BI-based problem, and the best way to understand the nature of the problem and the factors surrounding it is to analyze the issue.

Preliminary analysis suggests that the first step is always to analyze the problem domain by doing some checks to understand exactly what the problem is.

So, the P in PRIDESTES DEPLOY encourages the solution developer to start by analysing the problem itself, which requires performing preliminary analysis.

Please remember that the most important part of preliminary analysis is analysing the source that reported the problem, such as a database issue reported by a customer, to gather as much information as possible. However, this phase is quite challenging.

In a typical environment, the source reporting the problem can be one of the following:

A Customer can simply raise a support ticket about a database/data warehouse/reporting problem he/she is faced with
Post-deployment checks by you or other team members can sometimes reveal a potential issue
Any automated checks can also raise an issue provided they are configured or designed to contain enough information for the preliminary analysis by the respective developer/analyst

No matter which source reported the issue, your preliminary analysis should always target the most important piece of information in the reported problem so that you gather precise information and speed up the problem-solving task.

One more important point concerns the assignment of the preliminary analysis task. For example, a Power BI related issue must first be assigned to the organization's Power BI team so that they can pass it on to the relevant expert, often the developer, for preliminary analysis.

Please remember, the preliminary analysis must focus on the following:

What is the problem?
A rough idea of solving the problem
Information about next steps
Information about the best person who can resolve the issue

Professional Tip

Finally, if you are an automation fan, I suggest that any work item raised by the customer should automatically trigger preliminary analysis after it is assigned to the most appropriate team. Automating the analysis further, such as parsing the problem statement and extracting more information, is a plus but not always required.
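One very simple way to automate the assignment step is keyword scoring on the ticket text. The sketch below is deliberately naive (plain substring matching), and the team names, keywords, and `route_ticket` function are hypothetical; a real setup would use whatever routing rules your ticketing system supports.

```python
# Hypothetical teams and the keywords that suggest each one.
TEAM_KEYWORDS = {
    "power-bi-team": ("power bi", "report", "dashboard"),
    "database-team": ("stored procedure", "deadlock", "table", "database"),
    "data-warehouse-team": ("etl", "fact table", "data warehouse", "pipeline"),
}
DEFAULT_TEAM = "triage"

def route_ticket(text):
    """Pick the team whose keywords appear most often; fall back to triage."""
    lowered = text.lower()
    scores = {
        team: sum(keyword in lowered for keyword in keywords)
        for team, keywords in TEAM_KEYWORDS.items()
    }
    best_team = max(scores, key=scores.get)
    return best_team if scores[best_team] > 0 else DEFAULT_TEAM
```

Tickets that match nothing go to a default triage queue, so a person still performs the preliminary analysis when the automation cannot decide.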

Replicating the Problem (R)

The R in PRIDESTES DEPLOY refers to “Replicating the Problem”. Now, this is a very crucial step and should never be skipped, ignored, or underestimated unless you have a genuine reason to do so.

In the world of problem solvers, replicating a problem successfully is half the solution, and sometimes it is the solution: once you know what went wrong, you just need to build the fix. However, please stay focussed, as replicating a problem, especially a production issue, is not child’s play; it requires a solid understanding of, and experience in, the related field.

For example, if a customer reports an error in a Power BI report then you have to replicate the exact error to understand what might have caused the error.

Replicating a database or data warehouse related issue requires a fair amount of effort and time, and it assumes you have done the homework of setting up some means to replicate the problem. Whether a production issue should be replicated in Production (Prod) or in a QA or Test environment is a matter of much discussion.

Please bear in mind that sometimes Production issues can only be replicated in Production, which is why they are rightly called Production issues; in that case, please have a solid strategy for replicating them.

For example, if a customer says their record failed to save in the database, try to save the exact record (row) into the database and expect to see the error, but also keep an eye on the underlying database structure and the objects (procedures) involved in saving the record, because that is often where the problem lies.
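The failed-save scenario can be rehearsed in a scratch database. The sketch below uses Python's `sqlite3` module with a hypothetical `customers` table; `make_test_db` and `try_save` are illustrative names, and the constraint shown is just one possible cause of a failed save.

```python
import sqlite3

def make_test_db():
    """Scratch database mirroring the (hypothetical) production table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
    )
    conn.execute("INSERT INTO customers (id, email) VALUES (1, 'ann@example.com')")
    conn.commit()
    return conn

def try_save(conn, record):
    """Attempt the exact insert that was reported; return the error text, or None."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customers (id, email) VALUES (:id, :email)", record
            )
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)
```

Saving the customer's exact row, for instance one that reuses an existing email, surfaces the constraint that rejected it, which points straight at the table definition or procedure to inspect next.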

Please remember that for Power BI error handling it is always handy to have a live test workspace with a test user so that you can replicate any problem reported by customers.

Identifying the cause (I)

Now, what is a problem without identifying the cause?

For example, a team informs you (a self-discovered production error) that for some strange reason duplicates are being generated in the final FACT table. Unless you know the exact reason, removing the duplicates from the current result set is not going to keep the system calm, as this may happen again.

That’s why we must identify the actual cause of the issue to resolve it completely and the preceding steps (preliminary analysis and replicating the problem) can help a lot to identify the cause of the problem. In my opinion, this is a very demanding step as it can test the patience of the developer assigned to do the job of resolving the production issue because he may have to run numerous tests to find out the exact cause of the issue.
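For the duplicate-rows example, a grouping query is a common first diagnostic because it shows which business keys are repeated and how often. The sketch below uses Python's `sqlite3` module; the `fact_sales` table, its key columns, and the function name are assumptions for illustration.

```python
import sqlite3

def find_duplicate_keys(conn, table, key_columns):
    """Return one row per repeated key: the key values followed by the copy count."""
    # Identifiers are interpolated, so pass only trusted table/column names.
    cols = ", ".join(key_columns)
    sql = (
        f"SELECT {cols}, COUNT(*) AS copies FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1 ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()
```

Knowing which keys are duplicated, and whether they share a load date or source, usually narrows the search to a specific workflow or join rather than the table as a whole.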

However, experience and a solid background in that area (such as database, data warehouse or reporting) can be handy here. If you have spent over a decade resolving such issues, then identifying the cause may turn out to be a piece of cake, but every Production issue can be unique and challenging, even if you have seen something similar before and know the cause.

Please remember that being fully aware of the underlying architecture and the processes involved in a reporting solution (when handling a reporting issue reported by the customer) can help you identify the cause of an issue, although you should be prepared to run a couple of stringent tests to get to the point.

Designing the Solution/Fix (DES)

Finally designing the solution or fix is proof that you have worked hard in the previous steps.

You cannot perform well in this phase if you have not done much in the previous phases. That is why, in order to design the solution, you must have done the right amount of preliminary analysis, followed by replicating the problem and identifying the cause.

If you know the cause you can fix the problem and to fix the problem, you design the solution.

Sometimes designing the solution is as simple as modifying a small stored procedure to behave correctly for a special use case that caused the system error. You modify the stored procedure in a way that addresses the reported issue without breaking the existing code.

On some other days, designing a solution can be adding a new data workflow to handle archiving requirements that are not correctly handled by the existing data warehouse architecture.

In a traditional professional problem-solving scenario, designing the fix or solution ultimately means developing it with the help of tools and technologies. This also requires you to define how your fix fits into the current database or data warehouse architecture, including the data movement activities needed to achieve the desired goal. That may mean using a data integration service such as Azure Data Factory or Integration Services Projects (SSIS Packages) to extract, transform and load data.

Testing the Solution/Fix (TES)

Designing the solution is not enough; it has to be tested to confirm that it meets its requirements. Testing is a skill in itself, and I have often seen developers spot an issue in their code even before a tester picks it up. However, testing is also a very broad area.

You have to be pretty specific in this step. It is somewhat similar to unit testing, provided you understand that the unit being tested can be as big as a report showing wrong figures, which you then narrow down to the object (table) behind that error or the workflow (data loading activity) that keeps the object active.

Testing your fix simply means you have to check from the user perspective whether the problem has been resolved or not and this can be a step where you can do your part of testing and then hand it over to the team of professionals responsible for the overall testing of the whole module.
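A developer-side check for a fix can be a small automated regression test. The sketch below uses Python's `unittest` and `sqlite3` modules and assumes a hypothetical fix: a `load_fact_rows` function that must not create duplicates when the same batch is loaded twice. The table and function names are illustrative.

```python
import sqlite3
import unittest

def load_fact_rows(conn, rows):
    """The (hypothetical) fixed loader: rows with an existing key are skipped."""
    conn.executemany(
        "INSERT OR IGNORE INTO fact_sales (date_key, product_key, amount) "
        "VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()

class LoadFactRowsTest(unittest.TestCase):
    def setUp(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE fact_sales (date_key INTEGER, product_key INTEGER, "
            "amount REAL, UNIQUE (date_key, product_key))"
        )

    def test_reloading_same_batch_creates_no_duplicates(self):
        batch = [(1, 10, 5.0), (2, 10, 7.0)]
        load_fact_rows(self.conn, batch)
        load_fact_rows(self.conn, batch)  # simulate the workflow running twice
        count = self.conn.execute("SELECT COUNT(*) FROM fact_sales").fetchone()[0]
        self.assertEqual(count, 2)
```

A test like this reproduces the reported scenario first and then proves the fix, and it can be handed to the wider testing team as part of the change.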

If it is a small issue, then I suggest the person who is developing the fix should test it first if he completely understands the bigger picture. For example, sometimes a business intelligence developer/analyst is smart enough to know that fixing this piece of the puzzle solves the mystery although he is not completely aware of all the other components that interact with the data warehouse such as data models, Power BI reports or real-time analysis in Excel.

Deploying the Solution/Fix (DEPLOY)

Finally, you have to deploy the solution or the fix that you have worked so hard on.

Again, a deployment can range from a small piece of code, such as a stored procedure modification, to a fully functional process consisting of several objects and sub-processes, but the ultimate goal remains a smooth deployment to the Production server.

With modern tools and technologies, the deployment can be completely automated, for example by using Azure DevOps Build and Release pipelines. Nothing stops you from using a simpler, even manual, way of deploying an object to the Production system (with the help of the teams responsible for finally pushing it to the server), as long as you have a predefined strategy that is acceptable, workable and shareable.

If your data warehouse (database) is managed by a SQL database project, you can deploy a small fix, such as a modified stored procedure or view that addresses the issue, to Dev (the development database) using a publish script or the schema compare tool. From there, after successful unit testing, you can let the next team handle it, or you can deploy to Test and UAT yourself. This works whether you have a sophisticated, fully managed deployment strategy or just a manual checklist that is shared with other team members and is as solid as an automated deployment strategy.

I am not against automated builds and releases, but I don’t recommend that a sole developer overcomplicate supporting tasks at the expense of focussing on the main objective, which is expected to be delivered as soon as possible.

A Word of Advice

We all know that practice makes perfect, but I read in an interesting article that practice does not make you perfect if you are doing things incorrectly; you are only mastering how to do them incorrectly.

In other words, surround yourself with experts in your field and build a knowledge-based understanding of your area of expertise through knowledge-sharing sessions and reputable courses. Find time to learn from credible sources and to implement what you learn. Most importantly, be aware of standard practices and of how to encourage others to follow them.

Once you have a solid foundation and experience, you can find your own ways. Feel free to experiment with PRIDESTES DEPLOY and adapt it to your level of comfort, but without compromising the standards. Please keep in mind that without dedication and commitment the journey is difficult, so be proactive and be ready to improve and to embrace new technologies and tools that can help you solve Production issues. At the same time, work on the methodology, even if it is just a few dry steps that actually help you, your team and your organization solve real-time production issues efficiently and effectively.

DBmarlin Blog

CPU - has your database CPU consumption jumped?
Disk I/O - is your database suddenly doing far more I/O than before?
SQL - what are your top resource-consuming queries?
Locks - have you any blocking locks?
Object sizes - have any key tables or indexes suddenly jumped in size?
Sessions - has the number of sessions suddenly spiked?

Taking these in turn, let’s expand a bit.

Which users are affected? Is this a single user issue or are all users affected?

If this is a single user problem, not only is it easier to identify what is going on by talking to them, but also in reality this is a lower priority than something affecting all users. So, prioritise accordingly and grab a nice cup of coffee if you can. ☕️
If all users are affected, then it’s a much bigger problem and you might be firefighting the problem under pressure. In this case, ensuring that you have all the information readily available and presented clearly before the event will ease the stress.
DBmarlin lets you drill down into Users, Clients, Sessions, Programs and Database or Schema so you can easily see who is affected.

Releases - have there been any software releases or changes on the system today?

Knowing about the software releases is key to understanding variable system performance, especially in today’s world of agile and rapid release cycles.
Have any new database objects been dropped in?
Have your key queries changed execution paths because of new objects or changes in object size? (i.e. have 3,000 row tables become 3,000,000 row tables overnight?)
DBmarlin can help you here with its change tracking function showing changes along the timeline on the landing page for your database.

Disk - are any of your filesystems or drives full?

It may seem obvious, but servers do not react well to having key filesystems full, and neither do databases. This doesn’t just include tablespace and transaction log areas; audit file space is also required, and when looking at trace files you may find that the app is suddenly core dumping and causing unexpected issues.
Server root mounts and drives
Software mounts and drives
Online transaction/redo log areas
Archived transaction/redo log areas
Database Temporary locations
Audit trail locations
Trace file locations
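A basic fullness check over locations like these can be scripted with Python's standard `shutil.disk_usage`. The function names and the 90% threshold below are assumptions, and the paths you pass in would be the mounts that matter on your own server.

```python
import shutil

def used_fraction(path):
    """Fraction (0.0 to 1.0) of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total

def nearly_full(paths, threshold=0.9):
    """Return the paths whose filesystem usage is at or above `threshold`."""
    return [path for path in paths if used_fraction(path) >= threshold]
```

Running `nearly_full` on a schedule against the log, temp, audit, and trace locations gives early warning before a full filesystem takes the database down.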

Transaction logs - has your log production jumped?

An unexpected increase in transaction load can stress the transaction log system and cause issues with archived/saved logs too.
Oracle Redo/Archive logs
MS-SQL Transaction Log
PostgreSQL WAL logs
MySQL binary logs
Ensure your logs are sized correctly for the transaction throughput so that log switches do not affect your system.

CPU - has your database CPU consumption jumped?

Knowing your system’s performance during key times of day is critical to diagnosing issues.
DBmarlin’s Time Comparison feature allows you to easily see Total DB Time differences between two periods. By default it will compare the last hour with the previous one allowing you to see if something has just happened.
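The idea of comparing one period against the previous one can be illustrated generically. This is not DBmarlin's implementation or API; it is a minimal sketch with hypothetical function names that computes the percentage change for each metric between two snapshots.

```python
def percent_change(previous, current):
    """Percentage change from `previous` to `current`; None if there is no baseline."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100.0

def compare_periods(previous, current):
    """Map each metric in `current` to its percentage change versus `previous`."""
    return {
        name: percent_change(previous.get(name, 0), value)
        for name, value in current.items()
    }
```

A metric that is new in the current period has no baseline, so it is reported as `None` instead of a misleading percentage.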

Disk I/O - is your database suddenly doing far more I/O than before?

Again - having an historical view is key to troubleshooting.
DBmarlin’s Time Comparison feature allows you to easily see the Top Waits in your database and what has changed. Again - by default, it will compare the last hour with the previous one allowing you to see if something has just happened.

SQL - what are your top consuming queries?

Being able to quickly determine what your top resource-consuming queries are is critical to getting to the bottom of that issue.
DBmarlin shows you a graphical representation of what is occurring now on the landing page of your database, and once again, the Time Comparison feature allows you to pick a baseline time, such as this hour last week, when everything was peachy.

Locks - have you any blocking locks?

Blocking locks can quickly hang up your key applications.
DBmarlin can help start you in the right direction by showing the wait time on the Wait Events pie chart on your database’s landing page.

Object sizes? Have any key tables or indexes suddenly jumped in size?

Keeping a record of the size of your database objects is a subject all on its own. Being able to track them over time allows you to see if there has been a sudden increase, altering your execution plans and killing the performance of your application’s key queries.
DBmarlin’s Change History facility allows you to see if a release could be causing your current issue. After selecting the change you are interested in, examine the Statements to find the ones with multiple execution plans, and use the Execution Plans drop-down to compare the plans.
Being able to see at a glance if you are being hit by an application server that has an issue is of real value here.
On DBmarlin’s database landing page, the DB Time will be the first thing you see in the top left corner indicating the trend. If you see it rocketing upwards, change the data of the pie chart to Sessions and Clients to identify the source of the issue.

So in summary, troubleshooting is rarely a simple exercise, but it can be made a lot easier if you have a structured approach supported by a tool which visualises database performance. Having been involved in many major incidents over the years, I have found that the technical teams will usually communicate well, reach a common understanding of the issues and decide on next steps. I have also often found it very useful to have pictures to show ‘The Management’ as a way of increasing their understanding of complex issues and gaining their buy-in as to where focus should be put.

As is common with standalone tuning exercises, don’t try to boil the ocean when fixing things - you are here to restore service. Having an historical comparison available to show you ‘what good looks like’ is incredibly useful here.

And - when you’ve restored service - stop!


27 Sep 2021


Real World Problem Solving with SQL

Tutorial: Real World Problem Solving with SQL
Description: Examples of how to use SQL to solve real problems, as discussed in the database@home event. https://asktom.oracle.com/pls/apex/asktom.search?oh=8141
Tags: match_recognize, sum, analytic functions
Area: SQL Analytics
Contributor: Chris Saxon (Oracle)
Created: Tuesday May 05, 2020

Prerequisite SQL

Predicting stock shortages.

To start, we'll build a prediction engine that estimates when shops will run out of fireworks to sell. This will use data from tables storing the shops, orders received, and daily and hourly sales predictions:

Calculating Running Totals

By adding the OVER clause to SUM, you can calculate running totals. This has three clauses:

Partition by - split the data set up into separate groups
Order by - sort the data, defining the order totals are calculated
Window clause - which rows to include in the total

This returns the cumulative sales for each shop by date:

The clauses in this SUM work as follows:

partition by sales.shopid calculates the running total for each shop separately
order by sales.saleshour sorts the rows by hour, so this gives the cumulative sales by date
rows between unbounded preceding and current row sums the sales for this row and all previous rows
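
As a rough illustration of what these three clauses do, here is a minimal Python sketch of the same running total. It is not the tutorial's SQL: the sample rows and the names `sales` and `running_totals` are made up for the example.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows: (shop_id, sale_hour, quantity).
sales = [
    (1, "2020-11-05 09:00", 5),
    (2, "2020-11-05 09:00", 7),
    (1, "2020-11-05 10:00", 3),
    (1, "2020-11-05 11:00", 4),
    (2, "2020-11-05 10:00", 1),
]

def running_totals(rows):
    """Return (shop_id, sale_hour, quantity, running_total) rows."""
    result = []
    # Sorting by (shop, hour) plays the role of PARTITION BY plus ORDER BY.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for shop_id, group in groupby(ordered, key=itemgetter(0)):
        total = 0  # the total restarts for each partition (shop)
        for _, sale_hour, qty in group:
            total += qty  # unbounded preceding up to the current row
            result.append((shop_id, sale_hour, qty, total))
    return result

cumulative_sales = running_totals(sales)
```

Each shop's total starts again from zero, which is the effect of partitioning by shop.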

Creating Hourly Sales Predictions

The budget figures are daily. To combine these with hourly sales, we need to convert them to hourly figures.

This query does this by cross joining the day and hour budget tables. Then multiplying the daily budget by the hourly percentage to give the expected sales for that hour:
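
The cross join and multiplication can be sketched in Python as follows. This is only an illustration under assumed data: the tuple layouts, the percentages, and the function name `hourly_budget` are hypothetical and do not come from the tutorial's tables.

```python
from itertools import product

# Hypothetical daily budgets: (shop_id, day, daily_budget_qty).
daily_budget = [(1, "2020-11-05", 200), (1, "2020-11-06", 100)]
# Hypothetical share of a day's sales expected in each hour: (hour, percent).
hourly_share = [(9, 25.0), (10, 50.0), (11, 25.0)]

def hourly_budget(days, hours):
    """Pair every day with every hour and scale the daily figure."""
    rows = []
    # product() yields every (day, hour) combination, like a cross join.
    for (shop_id, day, qty), (hour, pct) in product(days, hours):
        rows.append((shop_id, day, hour, qty * pct / 100))
    return rows

predicted = hourly_budget(daily_budget, hourly_share)
```

Because the hourly percentages add up to 100, the hourly figures for each day sum back to that day's budget.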

Finding When Predicted Sales Exceed Stock Level

We can now combine the hourly actual and predicted sales figures. We want to include actual figures up to the time of the last recorded sale. After this, the query should use the expected figures.

This query returns the real or projected sales figures. Then computes the running total of sales of this combined figure:

It also predicts the remaining stock at each time by subtracting the running total of hourly sales (real or predicted) from the starting stock. Notice that this has this window clause:

This means include all the previous rows and exclude the current row. This is because we want to return the expected stock at the start of the hour. Including the current row returns the expected level at the end of the hour.
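
To make the "previous rows only" idea concrete, here is a small Python sketch. The figures and the names `hours`, `starting_stock` and `stock_forecast` are assumptions for illustration; it mirrors the logic described above rather than reproducing the tutorial's query.

```python
# Hypothetical hourly figures for one shop: (hour, actual_qty, predicted_qty).
# actual_qty is None when no sale has been recorded for that hour yet.
hours = [
    (9, 12, 10),
    (10, 8, 10),
    (11, None, 15),
    (12, None, 15),
]
starting_stock = 40

def stock_forecast(rows, stock):
    """Return (hour, qty_used, stock_at_start_of_hour) for each hour."""
    forecast = []
    sold_before = 0  # running total over the previous rows only
    for hour, actual, predicted in rows:
        # Use the real figure where one exists, else the prediction.
        qty = actual if actual is not None else predicted
        forecast.append((hour, qty, stock - sold_before))
        sold_before += qty  # only counted from the next row onwards
    return forecast

forecast = stock_forecast(hours, starting_stock)
```

Adding `qty` to the total before appending the row would instead give the stock level at the end of each hour.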

Predicting Exact Time of Zero Stock

The previous query returned every hour and the expected stock level. To find the time stock is expected to run out, find the last date where the remaining stock is greater than zero:

To estimate the exact time stock will run out, we can take the ratio of remaining stock to expected sales for the hour it's due to run out. This gives the fraction of that hour that passes before the stock reaches zero. The query does this with these functions:

This uses the KEEP clause to return the value for stock and quantity for the maximum HOUR. This is necessary because STOCKNEM & QTYNUM are not in the GROUP BY or in an aggregate function.
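
The estimate itself is simple arithmetic. The sketch below shows one way to compute it in Python, assuming sales are spread evenly across the hour; the row layout and the name `zero_stock_time` are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical forecast rows: (hour_start, expected_qty, stock_at_start_of_hour).
forecast = [
    (datetime(2020, 11, 5, 11), 15, 20),
    (datetime(2020, 11, 5, 12), 15, 5),
    (datetime(2020, 11, 5, 13), 15, -10),
]

def zero_stock_time(rows):
    """Estimate when stock hits zero, or None if that cannot be determined."""
    in_stock = [row for row in rows if row[2] > 0]
    if not in_stock:
        return None  # already out of stock at the first hour
    # Take the values from the latest hour that still has stock.
    hour_start, qty, stock = max(in_stock, key=lambda row: row[0])
    if qty < stock:
        return None  # stock outlasts the hours in the forecast
    # Assumption: sales are spread evenly across the hour.
    return hour_start + timedelta(hours=stock / qty)

runs_out_at = zero_stock_time(forecast)
```

Here 5 units remain at 12:00 and 15 are expected to sell during that hour, so the estimate falls a third of the way through it.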

This technique of calculating the previous running total up to some limit has many other applications. These include:

Stock picking algorithms
Calculating SLA breach times for support tickets

We'll look at how you can use this for stock picking algorithms next.

Stock Picking Routines

Next we'll look at how to use SQL to find which stock location to get inventory from to fulfil orders. This will use these tables:

The algorithm needs to list all the locations stock pickers need to choose from. There must be enough locations for the sum of the quantities in stock to reach the ordered quantity of that product.

Get the Cumulative Quantity of Stock Picked

The algorithm needs to keep selecting locations until the total quantity picked reaches the ordered quantity. As with the previous problem, this is possible with a running SUM.

This query gets the cumulative selected quantity for the current row and the previous quantity:

To filter this to those locations needed to fulfil the order, we need all the rows where the previous picked quantity is less than the ordered quantity.
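
A minimal Python sketch of this filter, using made-up locations and quantities (the names `locations`, `ordered_qty` and `locations_to_pick` are assumptions, not the tutorial's schema):

```python
# Hypothetical stock locations for one product: (location, quantity_in_stock),
# already listed in the order the picker should consider them.
locations = [("A-1", 10), ("A-7", 5), ("B-2", 20), ("C-4", 8)]
ordered_qty = 18

def locations_to_pick(stock, wanted):
    """Return (location, quantity_to_take) until the order is covered."""
    picks = []
    picked_before = 0  # running total of the previous rows
    for location, qty in stock:
        # Keep the row only while the previous total is below the order.
        if picked_before >= wanted:
            break
        picks.append((location, min(qty, wanted - picked_before)))
        picked_before += qty
    return picks

picks = locations_to_pick(locations, ordered_qty)
```

The last location chosen is only partly emptied: the sketch takes just enough from it to complete the order.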

Try Different Stock Picking Algorithms

Like the previous problem, the SQL needs to keep adding rows to the result until the running total for the previous row reaches the ordered quantity.

This starts the SQL to find all the needed locations. You can implement different stock picking algorithms by changing the ORDER BY for the running total. Experiment by replacing /* TODO */ with different columns to see what effect this has on the locations chosen:
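
To get a feel for how the sort order changes the outcome, here is a Python sketch that runs the same running-total filter with three different sort keys. The data, the column choice (a purchase date) and the function `pick` are all hypothetical.

```python
# Hypothetical stock rows: (location, quantity, purchase_date as ISO text).
stock = [
    ("A-1", 4, "2020-03-01"),
    ("B-2", 30, "2020-02-10"),
    ("C-3", 10, "2020-01-15"),
]

def pick(rows, wanted, sort_key):
    """Choose locations in sort_key order until the order is covered."""
    chosen, picked_before = [], 0
    for location, qty, _ in sorted(rows, key=sort_key):
        if picked_before >= wanted:
            break
        chosen.append(location)
        picked_before += qty
    return chosen

oldest_first = pick(stock, 12, sort_key=lambda r: r[2])    # oldest stock first
largest_first = pick(stock, 12, sort_key=lambda r: -r[1])  # fewest locations
smallest_first = pick(stock, 12, sort_key=lambda r: r[1])  # clear small lots
```

With this data, sorting by the largest quantity needs a single location, while the other two orderings each visit two.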

Row Numbering Methods

Whichever algorithm you decide to use to select stock, it may choose a poor route for the stock picker to walk around the warehouse. This can lead to them walking back down an aisle when it would be better to continue up to the top of the aisle and then walk back down the next one.

One way to do this is to number the aisles needed. Then walk up the odds and back down the evens. This means locations in the same aisle need the same aisle number. We want to assign these numbers!

Oracle Database has three row numbering functions:

Rank - An Olympic ranking system. Rows with the same sort key have the same rank. After ties there is a gap in the ranks: numbering resumes from the row's position in the results.
Dense_Rank - Like RANK, this sets the rank to be the same for rows with the same sort key. But this has no gaps in the sequence.
Row_Number - This gives unique consecutive values

This compares the different ranking functions:
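
As a stand-in illustration in Python rather than SQL, the sketch below computes all three numbers for a small list of sort keys. The function name `rank_rows` and the sample keys are made up for the example.

```python
def rank_rows(keys):
    """Return (key, rank, dense_rank, row_number) for keys in sorted order."""
    out = []
    rank = dense = 0
    previous = object()  # sentinel that never equals a real key
    for row_number, key in enumerate(sorted(keys), start=1):
        if key != previous:
            rank = row_number  # RANK jumps to the row's position
            dense += 1         # DENSE_RANK moves up by one, so no gaps
            previous = key
        out.append((key, rank, dense, row_number))
    return out

ranked = rank_rows([10, 20, 20, 30])
```

For the key 30 that follows the tie, the rank is 4, the dense rank is 3 and the row number is 4.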

Change Stock Picking Route

To improve the routing algorithm, we want to give locations row numbers. Locations on the same aisle must have the same rank. We can then alternate the route up and down the aisles by sorting:

By ascending position for odd aisles
By descending position for even aisles

To do this locations in the same aisle must have the same rank. And there must be no gaps in the ranks. Thus DENSE_RANK is the correct function to use here. This sorts by warehouse number, then aisle:

To alternate ascending/descending sorts, take this rank modulus two. Return the position when it's one and negate the position when it's zero:
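
The same up-and-down ordering can be sketched in Python. The pick list, the tuple layout and the name `picking_route` are assumptions for illustration only.

```python
# Hypothetical pick list: (warehouse, aisle, position).
picks = [
    (1, "A", 3), (1, "A", 1),
    (1, "C", 2), (1, "C", 5),
    (1, "D", 4), (1, "D", 1),
]

def picking_route(rows):
    """Sort picks so odd aisles are walked upwards and even aisles downwards."""
    # Dense rank over (warehouse, aisle): same aisle, same number, no gaps.
    aisles = sorted({(w, a) for w, a, _ in rows})
    aisle_no = {aisle: n for n, aisle in enumerate(aisles, start=1)}

    def sort_key(row):
        w, a, pos = row
        n = aisle_no[(w, a)]
        # Rank modulus two: keep the position when odd, negate it when even.
        return (n, pos if n % 2 == 1 else -pos)

    return sorted(rows, key=sort_key)

route = picking_route(picks)
```

Aisle B has nothing to pick, so aisle C becomes the second aisle visited and is walked in descending order. A plain rank with gaps would not guarantee this alternation.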

The complete query for this is:

Finding Consecutive Rows - Tabibitosan

For the third problem we're searching for consecutive dates in this running log:

The goal is to split these rows into groups of consecutive dates. For each group, return the start date and number of days in it.

There's a trick you can use to do this:

Assign unique, consecutive numbers sorted by date to each row
Subtract this row number from the date

After applying this method, consecutive dates will have the same value:

You can then summarise these data by grouping by the expression above and returning min, max, and counts to find start, end, and numbers of rows:

This technique is referred to as the Tabibitosan method.
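
A minimal Python sketch of the same trick, assuming each date appears at most once (the dates and the name `consecutive_groups` are hypothetical):

```python
from datetime import date, timedelta
from itertools import groupby

# Hypothetical run dates, one row per day on which a run was logged.
run_dates = [
    date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 3),
    date(2020, 1, 6), date(2020, 1, 7),
    date(2020, 1, 10),
]

def consecutive_groups(dates):
    """Return (start_date, end_date, number_of_days) per consecutive run."""
    numbered = enumerate(sorted(dates), start=1)  # row number by date
    # The date minus its row number is constant within a consecutive run.
    keyed = [(d - timedelta(days=n), d) for n, d in numbered]
    groups = []
    for _, members in groupby(keyed, key=lambda pair: pair[0]):
        days = [d for _, d in members]
        groups.append((min(days), max(days), len(days)))
    return groups

groups = consecutive_groups(run_dates)
```

Whenever there is a gap, the date moves ahead by more than one day while the row number moves by exactly one, so the difference changes and a new group starts.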

Finding Consecutive Rows - Pattern Matching

Added in Oracle Database 12c, the row pattern matching clause, MATCH_RECOGNIZE, offers another way to solve this problem.

To do this, you need to define pattern variables: criteria for rows to meet. Then create a regular expression using these variables.

To find consecutive dates, you need to look for rows where the current date equals the previous date plus one. This pattern variable does this:

To search for a series of consecutive rows, use this pattern:

This matches one instance of INIT followed by any number of CONSECUTIVE.

But what's this INIT variable? It has no definition!

Undefined variables are "always true". This matches any row. This enables the pattern to match the first row in the data set. Without this CONSECUTIVE will always be false, because the previous RUN_DATE will always be null.

This query returns the first date and number of rows in each group:

By default MATCH_RECOGNIZE returns one row per group. To make it easier to see what's going on, this query adds the clause:

This returns all the matched rows. In the MEASURES clause it also adds the CLASSIFIER function. This returns the name of the pattern variable this row matches:
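
To see how rows end up labelled, here is a Python sketch that imitates the matching logic for this particular pattern: one always-true starting row followed by any number of consecutive rows. It is an approximation of the behaviour described above, not an implementation of MATCH_RECOGNIZE, and the names `classify` and `summary` are made up.

```python
from datetime import date, timedelta

# Hypothetical run dates.
run_dates = [
    date(2020, 1, 1), date(2020, 1, 2),
    date(2020, 1, 4), date(2020, 1, 5), date(2020, 1, 6),
]

def classify(dates):
    """Return (match_number, run_date, label) for every row."""
    rows = []
    match_no = 0
    previous = None
    for d in sorted(dates):
        if previous is not None and d == previous + timedelta(days=1):
            label = "CONSECUTIVE"  # current date = previous date + 1
        else:
            label = "INIT"         # always-true variable starts a new match
            match_no += 1
        rows.append((match_no, d, label))
        previous = d
    return rows

rows = classify(run_dates)

# One row per match: first date and number of rows in the group.
summary = {}
for match_no, d, _ in rows:
    start, count = summary.get(match_no, (d, 0))
    summary[match_no] = (start, count + 1)
```

The `rows` list corresponds to returning every matched row with its label, while `summary` corresponds to the default of one row per group.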

Counting Number of Child Nodes in a Tree

To finish, we'll build an organization tree using the classic EMP table:

We want to augment this by adding the total number of reports each person has. I.e. count the number of nodes in the tree below this one.

You can do this with the following query:

The subquery calculates the hierarchy for every row in the table. So it queries EMP once for each row in the table. This leads to a huge amount of extra work. You can see this by getting its execution plan:

Note the FULL TABLE SCAN at line three happens 14 times. Reading a total of 196 rows. As you add more rows to EMP, this query will scale terribly.
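
The quadratic growth is easy to reproduce outside the database. The Python sketch below counts reports by scanning the whole table once per employee and keeps a tally of rows read. It uses a hypothetical six-person slice of an org chart, and the names `manager_of`, `is_below` and `count_reports_naive` are made up for illustration.

```python
# Hypothetical org chart: employee -> manager (None for the root).
manager_of = {
    "KING": None, "JONES": "KING", "BLAKE": "KING",
    "SCOTT": "JONES", "FORD": "JONES", "ADAMS": "SCOTT",
}

def is_below(emp, boss, managers):
    """True if boss appears anywhere in emp's management chain."""
    current = managers[emp]
    while current is not None:
        if current == boss:
            return True
        current = managers[current]
    return False

def count_reports_naive(managers):
    """Return ({employee: number_of_reports}, rows_read)."""
    rows_read = 0
    counts = {}
    for boss in managers:      # one "subquery" per employee
        total = 0
        for emp in managers:   # a full scan of the table every time
            rows_read += 1
            if is_below(emp, boss, managers):
                total += 1
        counts[boss] = total
    return counts, rows_read

counts, rows_read = count_reports_naive(manager_of)
```

With 6 employees this reads 36 rows. With 14 it would read 196, matching the figure above.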

Counting Child Nodes - Pattern Matching

You can overcome the performance issues for the previous query with this algorithm:

Create the hierarchy returning the rows using depth first search.
Add a row number to the results showing which order this returns rows
Walk through the tree in this order
For each row, count the number of rows after it which are at a lower depth (the LEVEL is greater)

This uses the fact that when using depth-first search, all the children of a node will be at a lower depth. The next node that is the same depth or higher as the current is not a child.

You can implement this in MATCH_RECOGNIZE with these clauses:

Like searching for consecutive rows, this starts with an always true variable. Then looks for zero or more rows which are at a greater depth.

A key difference is this clause:

After matching the pattern for the first row, this instructs the database to repeat the process for the second row in the data set. Then the third, fourth, etc.

This contrasts with the default behaviour for MATCH_RECOGNIZE: after completing a pattern, continue the search from the last matched row. Because all rows are children of the root, the default would match every row. Then have nothing left to match! So it only returns the count of the children for KING!

All together this gives:

Notice that now the pattern only reads EMP once, for a total of 14 rows read. This scales significantly better than using a subquery!
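
The algorithm can also be sketched in plain Python, which may help when reasoning about why it works. This is an illustration of the steps listed earlier, not a translation of the MATCH_RECOGNIZE query. The org chart is a hypothetical six-person slice, and the names `depth_first` and `count_reports` are assumptions.

```python
# Hypothetical org chart: employee -> manager (None for the root).
manager_of = {
    "KING": None, "JONES": "KING", "BLAKE": "KING",
    "SCOTT": "JONES", "FORD": "JONES", "ADAMS": "SCOTT",
}

def depth_first(managers):
    """Return [(employee, level)] in depth-first order, root at level 1."""
    children = {}
    for emp, boss in managers.items():  # the table is read once, here
        children.setdefault(boss, []).append(emp)
    ordered = []

    def walk(emp, level):
        ordered.append((emp, level))
        for child in sorted(children.get(emp, [])):
            walk(child, level + 1)

    for root in sorted(children.get(None, [])):
        walk(root, 1)
    return ordered

def count_reports(managers):
    """Count, for each row, the following rows with a greater level."""
    ordered = depth_first(managers)
    counts = {}
    for i, (emp, level) in enumerate(ordered):  # restart the search at every row
        n = 0
        for _, later_level in ordered[i + 1:]:
            if later_level <= level:
                break  # same depth or higher: no longer a descendant
            n += 1
        counts[emp] = n
    return counts

counts = count_reports(manager_of)
```

The outer loop restarts the search from each row in turn, which is what lets every employee receive a count rather than only the root.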



How to diagnose and troubleshoot database performance problems

Database performance problems can wreak havoc with website performance and cost your customers lots of money. Diagnosing problems is much easier when you rely on this systematic approach.

Hilary Cotter

Service provider takeaway: Database service providers should follow an established methodology when troubleshooting database performance problems.

Relational database management systems (RDBMSes), such as SQL Server, Oracle, DB2 and Sybase, are highly scalable and capable of responding to thousands of requests per second. Mission-critical applications are dependent on highly responsive database systems to provide their clients with sub-second performance. Unfortunately, performance problems on a database have the potential to drive your customer's Web users to another site, causing significant financial losses to the company.

It is essential that service providers use a proven methodology to diagnose and troubleshoot performance problems in customers' databases. Following a methodology enables you to approach the diagnosis of problems in an orderly, logical manner, which increases the chances that you'll find the exact cause of the problem quickly and accurately. Failure to use a methodology will result in the service provider attempting to solve symptoms with no clear understanding of what the underlying problem is, and whether the solution offered has solved the problem.

What changed?

With change management systems in place, it will be easy for service providers to isolate code or schema changes that are responsible for the performance problems and to fix them or revert back to an earlier version of the code. There are several tools available commercially for change management, such as Quest's Change Director and Idera's Change Manager.

Examining baseline and benchmarking metrics

Baselines and benchmarking will also help service providers determine what has changed on the system and what the bottleneck is. A baseline is performance data gathered during a simulated load or actual load. The same metrics can be gathered and compared with the current workloads to determine how these metrics have changed. These comparisons provide a quick window into bottlenecks. For example, comparing current performance counters with the baseline might reveal higher-than-normal disk I/O patterns. Memory bottlenecks normally manifest themselves as memory errors in the event log. Disk I/O bottlenecks typically manifest as transitory performance problems. CPU bottlenecks typically manifest as high CPU utilization, but disk I/O bottlenecks can also cause high CPU utilization.

Benchmarks are performance counters collected during a variety of loads. They are used to determine how your server will respond under load and what bottlenecks will exist. For example, you could run a load test to determine how the database system will perform during the holiday rush.

While baselines provide a basis to compare performance at various times throughout the lifetime of your points of comparison, benchmarks allow you to compare performance under various workloads. For example, you might want to have a benchmark for how the system responds during peak demand cycles. This benchmark can be compared with current performance counters to see if the system response is unexpected for that workload.

Let's look at an example of how this works. You have a baseline and a benchmark for various workloads for your RDBMS. You know your baseline for transactions per second (tps) is 1000 tps. Your monitoring reveals that it is dropping because it is the holiday season and your RDBMS is under peak load. You compare the current tps with the benchmark you have recorded for peak demand and discover that your current tps value is below the recorded benchmark. Clearly something is wrong and needs to be addressed. Further examination of the peak season baseline shows that the CPU counters are running much higher than you previously recorded. With this information, you can then quickly start diagnosing what is consuming the CPU.
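
The comparison in this example boils down to a few lines of arithmetic. The Python sketch below shows one possible way to express it. The 1000 tps baseline comes from the example above, while the peak benchmark figure, the 10% tolerance and the function name `assess_throughput` are assumptions chosen purely for illustration.

```python
def assess_throughput(current_tps, baseline_tps, peak_benchmark_tps,
                      under_peak_load, tolerance=0.10):
    """Compare current throughput with the reference for this workload."""
    # Under peak load, the benchmark is the fair comparison, not the baseline.
    reference = peak_benchmark_tps if under_peak_load else baseline_tps
    shortfall = (reference - current_tps) / reference
    return {
        "reference_tps": reference,
        "shortfall": shortfall,
        "investigate": shortfall > tolerance,  # assumed threshold
    }

# Hypothetical readings: 600 tps now, against an assumed 800 tps peak benchmark.
result = assess_throughput(
    current_tps=600, baseline_tps=1000,
    peak_benchmark_tps=800, under_peak_load=True,
)
```

Comparing against the peak benchmark rather than the everyday baseline avoids raising an alarm for a drop that is expected under that workload.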

Baselines and benchmarks can also quickly identify changing load patterns, which may dictate the need for more powerful hardware.

In summary, benchmarks and baselines provide a basis to:

See how your system response is changing with your current workload.
See if this change is expected or unexpected.
See if you are maxing out your hardware.
Identify the bottleneck.
Proactively address performance degradation.

Setting performance tuning goals

Once you have determined what the source of the bottleneck is, you should determine whether you need to upgrade hardware, change the architecture or improve indexing. At this point you also need to create performance tuning goals with customers. Interview customers to determine what they feel is the pressing performance problem, and then tune the system to address these goals. It is important to understand performance problems of the RDBMS before the client interview; otherwise you will be chasing symptoms of the problem rather than attacking the problem itself. You will also be in a position to tell the client what you feel is achievable and whether new hardware is required. That said, if you address only what you see as the performance problems, you may or may not solve the performance problems that the clients are seeing. The job will not be done until you have solved the performance problem from the client's perspective.

Using performance tuning tools

Once you have determined the bottlenecks and set performance tuning goals, you can begin the process of performance tuning. Tools such as profiler, perfmon or the SQL Server DMV (Dynamic Management View) will enable you to see what is going on in the system. In any RDBMS under load, many processes will simultaneously be running; some processes will be executing and other processes will be waiting for resources to become available. DMVs allow you to see which processes are not currently executing and what they are waiting on. The DMV will also tell you whether these waits are normal. The DMV can also be used to identify problem processes or queries, sometimes even at the statement level, and provide indications of how to solve them. Most frequently, you will be able to solve performance problems by adding indexes, rebuilding them or rewriting queries.

Once this process is completed, capture another baseline to ensure that performance problems have been solved and to provide another performance record that you can measure future workloads against.

By following this methodology, you can be proactive in addressing performance problems as they develop.

About the author Hilary Cotter has been involved in IT for more than 20 years as a Web and database consultant. Microsoft first awarded Cotter the Microsoft SQL Server MVP award in 2001. He is the author of a book on SQL Server transactional replication and is currently working on books on merge replication and Microsoft search technologies.


9 Common Database Management Challenges and How to Fix Them

16 May 2022

Picking a suitable database can be challenging, given that there are many options available today. At the same time, more and more businesses worldwide are becoming reliant on data when running their day-to-day operations and making educated business decisions.

With plenty of data being created, it becomes challenging to manage data dispersed across various geolocations and several business line applications.

In this post, we’ll walk you through some of the most common data management challenges, as well as how you can solve them:

1. Managing scalability as data volume increases

With data growing by 63% per month, most companies don’t have their databases set up to scale effectively.

Various tools and apps deliver ever larger datasets into databases, and that data is also likely to be updated and queried frequently. As these queries become increasingly complex and your data more distributed, you’ll define more relationships between your data. Most relational databases aren’t designed to support all of these factors.

Every organization needs to consider whether it will actually use the data it stores. You must develop retention policies to decrease the amount of information you’re keeping as you scale.

For instance, you can reduce the amount of data you store by removing transient data from permanent storage, which lets you make better use of the storage you have.

2. Maintaining database performance

Slow database performance is inconvenient for your team, and it also stalls applications and impacts end-users. Offering your employees and customers the best experience is essential. That’s why you must solve database performance problems quickly.

Caching to a remote host is one of the best ways to scale databases whose data doesn’t need to be updated regularly. It’s an excellent way to offload the database, especially if some of your data only needs to be accessed in read-only mode.

In the same way, you should also work on improving query performance. It might involve developing indexes that allow you to retrieve data efficiently.

For some, it may also mean bringing in more skilled employees who have more experience working with databases. Otherwise, inexperienced users may cause unexpected performance bottlenecks. Getting the proper database management support is essential to overcoming these unexpected challenges.

3. Multiple data storage

Multiple data stores are one of the most significant challenges most businesses encounter. Big organizations may develop tens of business solutions, each with its own data repository, such as CRM, ERP, and other databases.

Having multiple data stores poses a significant barrier to evaluating and handling your data. If data sits in separate siloed systems, it’s hard to identify it and consolidate it in a universal data platform that would speed up data-driven choices.

Therefore, make sure you come up with a single source of truth for your data. Your organization’s principal focus should be to get rid of data silos and link data from consumers, products, and suppliers.

4. Data safety

Data loss costs your business money. And that doesn’t even count the damage to your business’s reputation or the possibility of it closing down.

While your database should process your data to ensure nothing will be lost, ensure that you always back up your data. See to it that you duplicate information and then store copies separately. Doing so will spare you from unnecessary hassles down the road.

5. Limitations on mitigation

Most software applications have limitations, and database servers are no exception. Forward-thinking companies that focus on transaction volume know their catalog components, data structure, hardware configuration, and computer systems.

They know that all of these can contribute to data loss, and that they need to adopt the right solutions at the right time.

6. Data management and distribution

Data management has its pros and cons. Businesses need to know how much data needs to be distributed and what will be the best way that you can undo its power.

Aside from that, companies should also know the appropriate level of power allocation to communities. One of the biggest challenges in managing and creating a distributed database is the lack of integrated information for all data.

7. Misconfigured or incomplete security

There’s no doubt about it. Misconfigured security can pose a significant risk to databases, especially in cloud environments.

Incomplete cloud security with no encryption will expose your data to external attacks. And when you manage multiple databases, it’s pretty easy to overlook a proper configuration or a security patch.

8. Data Integration

Database management may have been pretty simple before. However, as you continue to scale databases, new complexities emerge. Now, you’re stumped on how to modify your DBM.

You must integrate data from different sources if you’re offering omnichannel services. You can do this with software that’s specially created for this purpose.

Everyone hates dealing with slow computers. If you get stressed out every single time you try to retrieve data, then it is high time that you optimized your systems. Ensure that you index correctly and do not include too many joins in SQL queries.

If this isn’t the issue, it may be time to increase your bandwidth, or you may have caught a virus. This is why you must run database health checks regularly.

Over to You

So, there you have it. These challenges and solutions can be a handy guide for picking a suitable database for your company if you want it to succeed.

As you know, databases are the core software resource that your business depends on. Therefore, it’s an important decision to make the first time around.

Databases are known to be information warehouses. That’s why securing them should be one of your main priorities. See to it that you configure and deploy your database. At the same time, all aspects of security should be maintained to resist any attacks that might come your way.

By overcoming these challenges, your organization can keep the data quality high, enhance its security posture, and maintain data accessibility in your organization.


The Best 9 Websites to Practice SQL Online

Meenakshi Agarwal

In this article, we’ll explore the 9 best websites to practice SQL, providing you with platforms rich in data, examples, and guidance to help you use them easily.

As you might know, SQL, or Structured Query Language, is a programming language for interacting with databases. It is the standard way to work with tables, and most relational databases support it. Moreover, SQL is one of the most in-demand skills in the tech industry: data analysts, data scientists, and software engineers all need to know it.

Best Platforms to Practice SQL

If you are looking to practice SQL, there are a number of great platforms available.

Why Practice SQL?

Before we delve into the platforms, it’s important to understand the benefits of practicing SQL:

Skill Development : SQL proficiency is a basic skill in data-related professions. Practicing SQL enhances your ability to manage and analyze data effectively.
Problem-Solving : SQL enables you to extract, transform, and manipulate data. Regular practice sharpens your problem-solving skills, making you adept at tackling real-world data challenges.
Career Advancement : SQL expertise is in high demand across various job roles. Whether you’re looking to become a data analyst, a web developer, or a database administrator, SQL proficiency can give you a competitive edge.

In order to help you, I have compiled a list including some of the best platforms for practicing SQL, catering to various skill levels:

1. SQLZoo (A Beginner’s Friend)

Website : SQLZoo
Skill Level : Beginner to Intermediate

SQLZoo is an ideal starting point for beginners. It offers a structured set of interactive SQL tutorials and exercises. You can practice SQL queries directly in your web browser. The tutorials cover a wide range of SQL topics, ensuring you can progress at your own pace.

Example : SQLZoo provides a beginner’s tutorial on “SELECT from WORLD,” which allows you to practice basic SQL SELECT statements by querying data about countries worldwide.

Cool Features:

Interactive Learning : It helps you learn SQL by doing, not just reading.
Teaches Basics to Advanced : Covers SQL from simple to more complex.
Great for Beginners : If you’re new to SQL, this is a great place to start.
Instant Feedback : It tells you if you’re doing it right or wrong.
Covers a Lot : You can learn many things about SQL here.
Not Real Databases : It doesn’t use real databases, so it’s not like working on a real job.
Less Practical : You might not feel how SQL is used in the real world.

2. Codecademy (Beginner-Friendly)

Website : Codecademy SQL Course
Skill Level : Beginner

Codecademy offers an interactive SQL course intended for beginners. The course includes hands-on coding exercises with instant feedback, making it easy to learn and practice SQL step by step. Here’s a sample beginner exercise:

Example : You can practice SQL by retrieving specific information from a database. For instance, you can learn to extract data from a table called employees.
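A beginner exercise of this kind might look like the sketch below. It is not taken from the Codecademy course: the employees columns and rows are hypothetical, and Python's built-in sqlite3 module is used only so the query can be run without installing a database server.

```python
import sqlite3

# Hypothetical sample data; the course's real "employees" table will differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL);
    INSERT INTO employees (name, department, salary) VALUES
        ('Asha', 'Engineering', 85000),
        ('Ben', 'Marketing', 62000),
        ('Chloe', 'Engineering', 91000);
""")

# Retrieve only the columns you need, filtered by a condition.
engineers = conn.execute(
    "SELECT name, salary FROM employees WHERE department = ? ORDER BY name",
    ("Engineering",),
).fetchall()
print(engineers)
```

The `?` placeholder passes the department as a parameter rather than pasting it into the SQL string, which is a good habit to build early.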

Cool Features :

Interactive Practice : You can practice SQL in a friendly online environment.
Tracks Your Progress : It keeps an eye on how you’re doing.
Hands-On Learning : You learn by doing, not just listening.
Easy to Follow : The lessons are set up in an easy-to-follow way.
Fun to Use : It’s designed to make learning fun.
Not for Advanced : If you’re already good at SQL, you might find it too basic.
Paid Features : Some advanced stuff might need a paid subscription.

3. SQLFiddle (Intermediate)

Website : SQLFiddle
Skill Level : Intermediate

SQLFiddle is a web-based tool that allows you to write and execute SQL queries in various database systems. This platform is perfect for practicing SQL in a real-world context. Here’s a sample use case:

Example : You can use SQLFiddle to experiment with different SQL database systems like MySQL, PostgreSQL, or SQLite. For instance, you can create a table and insert data into it.

Test on Different Systems : You can practice SQL on different systems like MySQL and PostgreSQL.
Feels Real : It’s like playing with a real database.
Learn on Real Systems : You get to know how SQL works on different databases.
Share Your Work : You can show your SQL to others and ask for help.
See Results Right Away : It quickly tells you what happened after you run a query.
Not Much Learning Material : It’s more for practice, not so much for learning from scratch.
Not Great for Newbies : If you’re new to SQL, it can be tough.

4. HackerRank (Intermediate to Advanced)

Website : HackerRank SQL Challenges
Skill Level : Intermediate to Advanced

HackerRank offers a series of SQL challenges and competitions that are suitable for intermediate and advanced learners. The challenges cover a wide range of SQL topics, and you can earn badges to showcase your skills. Let’s look at a more advanced example:

Example : HackerRank challenges often involve complex SQL queries, such as calculating the average salary of employees in a specific department.
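As a rough illustration of that kind of challenge (not an actual HackerRank problem), the sketch below averages salaries within one department. The table layout and figures are assumptions, and Python's built-in sqlite3 module stands in for whatever database the challenge uses.

```python
import sqlite3

# Hypothetical data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary REAL);
    INSERT INTO employees VALUES
        ('Asha', 'Engineering', 80000),
        ('Ben', 'Engineering', 90000),
        ('Chloe', 'Marketing', 60000);
""")

# AVG() is applied only to the rows that survive the WHERE filter.
avg_salary = conn.execute(
    "SELECT AVG(salary) FROM employees WHERE department = ?",
    ("Engineering",),
).fetchone()[0]
print(avg_salary)
```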

Challenges and Competitions : It’s like a game with SQL problems.
Earn Points : You can get points and show off your skills.
Fun Learning : It’s like a game, so it’s fun and keeps you going.
Talk to Others : You can chat with others, ask questions, and learn from them.
For All Levels : There are easy and hard challenges for everyone.
Some Are Tough : Some challenges are really hard, so if you’re new, you might get stuck.
Not Many Tutorials : It’s more about solving problems, not so much about learning SQL from scratch.

5. LeetCode (Advanced)

Website : LeetCode SQL Problems
Skill Level : Advanced

LeetCode is renowned for its coding challenges, including SQL problems. These challenges are perfect for those looking to take their SQL skills to an advanced level. The platform also provides community discussions and solutions. Here’s an example of an advanced SQL challenge:

Example : LeetCode challenges may involve complex data manipulations and aggregations, such as finding the second-highest salary in a database.
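One common way to approach the second-highest-salary problem is a subquery that excludes the maximum, sketched below. The table and values are hypothetical, and sqlite3 from Python's standard library is used only to make the query runnable; other solutions (for example with window functions) are equally valid.

```python
import sqlite3

# Hypothetical salaries, including a duplicate to show it is handled.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary REAL);
    INSERT INTO employees VALUES ('A', 100), ('B', 200), ('C', 200), ('D', 300);
""")

# The inner query finds the top salary; the outer MAX() then looks only
# at salaries strictly below it. The result is NULL (None) if no such row exists.
second_highest = conn.execute("""
    SELECT MAX(salary)
    FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
""").fetchone()[0]
print(second_highest)
```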

Puzzles for SQL : Like solving puzzles with SQL.
Learn from Others : See how others solve problems.
Advanced Stuff : Good if you want to get really good at SQL.
Talk to People : You can see what others do and learn from them.
Good for Interviews : Some questions help you get ready for job interviews.
Not for Beginners : It can be really hard if you’re just starting with SQL.
Not Much for Learning : It’s more about solving problems, not learning step by step.

6. W3Schools SQL Tutorial (Beginner-Friendly)

Website : W3Schools SQL Tutorial

W3Schools offers an extensive and beginner-friendly SQL tutorial. It includes interactive examples and exercises that allow you to practice SQL in a structured manner. The tutorials cover various SQL topics, making it a valuable resource for beginners. Here’s an example of a simple query:

Example : You can practice SQL by learning how to retrieve data from a table in the tutorial. For instance, you can learn to select data from a table named “customers.”

Real Databases : You set up your own database like a real job.
Learn by Doing : You learn by practicing on a real database.
Real-Life Practice : This is the closest thing to working on a real job.
Choose Your System : You can use the kind of database you want, like MySQL or PostgreSQL.
Good for Everyone : If you’re just starting or already know SQL, you can use this.
Setting Up Can Be Hard : It might be tricky to set up a real database if you’re new.
No Step-by-Step Learning : You have to find your own way because it’s more about practice.

7. SQLBolt (Interactive)

Website : SQLBolt

SQLBolt is not just for beginners. It also offers intermediate and advanced SQL tutorials. You can move on to more complex exercises, including JOIN operations and subqueries, as you progress. Here’s an example of an intermediate-level query:

Example : You can practice SQL by learning how to perform SQL JOINs on tables. For example, you can retrieve data from two related tables, “orders” and “customers.”
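A join between those two tables could be sketched as follows. The column names and rows are assumptions made for the example (they are not SQLBolt's lesson data), and Python's built-in sqlite3 module is used so the query can be run as-is.

```python
import sqlite3

# Hypothetical schema: each order points at one customer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Dana'), (2, 'Eli');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# An inner join keeps only orders that have a matching customer row.
order_rows = conn.execute("""
    SELECT c.name, o.order_id, o.amount
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
print(order_rows)
```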

Lots of Lessons : You can learn a lot about SQL here.
Try It Live : It’s like practicing in a real coding editor.
Learn Lots : You can learn from the basics to the advanced stuff.
Practice as You Go : It’s easy to practice right where you’re learning.
Organized and Simple : The lessons are easy to follow and clear.
Not Real Databases : You don’t practice much with real databases.
Lots to Learn : Some might find all the lessons a bit too much.

8. Mode Analytics (Advanced)

Website : Mode Analytics

Mode Analytics provides an advanced SQL tutorial with a focus on data analysis and exploration. It’s designed for users who want to leverage SQL for in-depth data analytics and visualization. You can practice complex queries and data manipulation. Here’s an example of an advanced SQL query:

Example : You can practice SQL by creating advanced data visualizations using SQL queries, like plotting time series data or creating interactive dashboards.

Data Analysis Focus : It’s all about using SQL for data analysis.
See Your Data : You can turn data into charts and dashboards.
Advanced Data Analysis : Good if you want to learn how to analyze data deeply.
Useful Skills : You learn how to use data for real-world decisions.
Good for Analysts : People who want to work with data will like this.
Not for Starters : If you’re just starting, you might find it too tough.
Not for All SQL : It’s not good for learning all SQL things, especially for databases.

9. SQL Pad (Intermediate to Advanced)

Website : SqlPad

StrataScratch offers an interactive SQL platform with a vast library of real-world SQL challenges. It’s suitable for users looking to enhance their SQL skills through practical problem-solving. You can practice SQL by tackling real data problems from various domains. Here’s an example of an advanced challenge:

Example : You can practice SQL by solving complex problems, such as optimizing a query for performance or analyzing datasets to draw actionable insights.

Real-Life Problems : You solve real data problems from different jobs.
Learn by Doing : You get better at SQL by doing it.
Solve Real Stuff : You learn to solve real problems with data.
Many Kinds of Problems : It has lots of different things to practice on.
Good for Work : If you want to work with data, this is great practice.
Not for Starters : If you’re just starting with SQL, it might be too hard.
Some Paid Stuff : If you want to use all the things, you might need to pay.

Remember, the choice of the platform depends on your skill level and your specific SQL learning objectives. Each of these platforms offers a unique approach to learning and practicing SQL, so explore them to find the one that best suits your needs and preferences. Happy SQL practicing!

10. Your Local Database (Practical)

Skill Level : All Levels

For a real-world experience, consider setting up your own local database using software like MySQL, PostgreSQL, or SQLite. This hands-on practice allows you to create databases, load data, and run queries. Here’s a practical example:

Example : You can create a simple table in your local database and insert data into it.
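A minimal sketch of that workflow, assuming SQLite accessed through Python's built-in sqlite3 module (MySQL or PostgreSQL would need their own drivers and a running server). The books table and the practice.db file name are made up for the example.

```python
import os
import sqlite3
import tempfile

# Create a database file in a temporary folder; any writable path works.
db_path = os.path.join(tempfile.mkdtemp(), "practice.db")
conn = sqlite3.connect(db_path)

# Hypothetical table for practice.
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year INTEGER)"
)
conn.executemany(
    "INSERT INTO books (title, year) VALUES (?, ?)",
    [("SQL Basics", 2019), ("Query Tuning", 2022)],
)
conn.commit()  # make the inserts durable on disk

titles = [row[0] for row in conn.execute("SELECT title FROM books ORDER BY year")]
print(titles)
conn.close()
```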

How to Use SQL Practice Platforms

The best way to test your SQL programming skills is to start with the basics and then gradually work your way up to more challenging exercises. As you practice, be sure to pay attention to the feedback that the platform provides. This feedback can help you identify your strengths and weaknesses, and it can also help you learn from your mistakes.

Here are some additional tips for practicing SQL:

Practice regularly. The more you practice, the better you will become at SQL.
Use a variety of resources. There are many different SQL practice resources available, so don’t be afraid to try different ones until you find one that works for you.
Challenge yourself. Don’t be afraid to try more challenging exercises, even if you make mistakes. The more you challenge yourself, the better you will become at SQL.
Get help. If you get stuck on a particular exercise, don’t be afraid to ask for help from a friend, mentor, or online forum.

Practicing SQL is a valuable investment in your career, and the platforms mentioned in this tutorial cater to different skill levels. Start with beginner-friendly options, progress to more advanced challenges, and don’t forget to apply your skills to real-world projects. As you practice and gain confidence, you’ll become a proficient SQL user. Happy querying!

Written by Meenakshi Agarwal

Meenakshi Agarwal, 10+ years in IT. Managing TechBeamers.com, creating insightful tutorials, quizzes, and exercises in Python, Java, SQL, Selenium, C-Sharp.


Database News

DBAs often have a need to identify why a problem has occurred, or is occurring in their SQL Server database. This article covers some of the tools you can use to look for clues, and the steps you might go through to help troubleshoot a SQL Server problem.

We all have problems that occur from time to time, where we need to work through some problem solving steps to identify why the problem occurred, or is occurring. In these situations, you need to act like a Crime Scene Investigator (CSI) to uncover the root cause of the problem. No, we do not put yellow caution tape around our servers, our network cables, and our desktop machines. However, in rare cases we do unplug the network cable to prevent the machine from being further contaminated. Sometimes it is obvious what caused the problem, but not always. Regardless of the problem, you will need to do some forensic analysis to determine the cause of the problem. In this article, I will cover some of the tools you can use to look for clues, and the steps you might go through to help troubleshoot a problem.

Problem Solving Tools

There are a number of different tools you can use for troubleshooting. I cannot cover, nor do I know of, all the tools you could possibly use for troubleshooting, so I will cover the most common tools that are available within SQL Server and the Windows OS. In most cases, these tools should give you enough clues to determine the cause of a particular problem.

Here is a list of those commonly used tools:

Profiler
Notepad
Event Viewer

When SQL Server starts up, it starts a default trace event, provided the default trace is enabled. Profiler can be used to review the information captured from the default trace event. It is amazing what you can find by exploring the default trace information. Additionally you might find it useful to create your own traces while troubleshooting a problem.

Notepad does not seem like much of a tool; however, it can be used to open different log files. Notepad allows you to do string searches within large log files to quickly locate information. If your log files are too big you might have to use WordPad as an alternative.

Sometimes when your system is having problems, event records will be written to the Windows event logs. You can browse through the events one at a time using the Event Viewer. Events in the event log may provide a quick answer to why your SQL Server instance is not behaving as it should, provided there are event records associated with the problem you are trying to solve.

Information Gathering Phase

In order to diagnose a problem you first need to gather some information about the problem. You also need to review log files to determine what kind of system error messages and log records exist that might help you to diagnose the problem. Below are a set of steps you should consider when going through the information gathering phase of your forensic analysis.

Step 1: Gathering the Facts

The first step in any problem solving exercise is to gather the facts. You need to know what kind of problem is happening. This is where you need to interview the customer, or programmer to understand how and when the problem occurs. You need to determine if it is system wide, or is it more localized to a particular application, or component of an application. You also need to know the timeframe around when the problem occurred, and whether or not it is still a problem. In addition to this, you need to know the last time the system was working correctly. You need to determine if any new system or application changes were introduced that might have caused the problem. Armed with some facts about the problem you can start to look for clues that might help identify the root cause of the problem.

Step 2: Test in Different Environments and Machines

It is worth testing in different environments, if you have them. This is a fact gathering exercise, but I spelled it out as a separate step because lots of times seasoned staff do not think about performing tests in separate environments when they gather facts.

You might find only one environment is affected, a set of environments or all environments. If only one environment is affected, the problem might be a configuration issue with that environment, or the other working environments. Alternatively, it might be the data in the environment that is causing the problem.

Additionally, you might want to try different client machines or application servers. Occasionally, you might find that a different configuration or setup is causing the application to work, or not work. You need to explore all the different setup and configuration options and then document those that work, and those that do not work.

Step 3: Review the SQL Server Error Log

SQL Server creates a log file called "ERRORLOG". A new ERRORLOG file is created every time SQL Server starts up. By default, SQL Server keeps six old error log files, each with a sequential number. The ERRORLOG file by default is stored in the "Log" folder within the standard "…\Program Files\Microsoft SQL Server\…" folder structure.

Find the log file that is associated with the timeframe for when the problem first occurred. Look to see if there are any anomalies in the messages being output by SQL Server. Sometimes, if SQL Server detects a change or encounters a problem, it will be logged in the ERRORLOG file.

Step 4: Review the Event Log

You should use the Event Viewer to look at the different event log records. The event log contains informational, warning and error events. You should look at all the events that occurred shortly before, during and after the timeframe of the identified problem. You need to make sure you review both the "Application" and "System" events, as well as the "Security" events.

Step 5: Review the Default Trace

The default trace, as stated earlier, is a trace that SQL Server starts automatically when it starts up, if the default trace option is enabled. It is similar to the flight recorder in a modern jet. This trace captures all configuration changes to an instance. By reviewing the default trace information you can identify what kind of database changes might have been made during the period of time when the problem was identified.

The default trace files are stored in the same log folder as the ERRORLOG. They are named like "log_xxx.trc", where xxx is a sequential number. You can open these files with Profiler to see the recorded events. Alternatively, you can process a file using T-SQL by selecting from the "fn_trace_gettable" function, passing the full path of the trace file as its first argument.

Step 6: Review the Change Log

Review your organization’s change log. I hope your organization has one. A change log is some centralized location that identifies all changes that have been introduced. If your organization has one, this helps identify any changes that have recently occurred. This log might provide you with some clues as to why a particular problem is occurring, especially if the application that is having the problem is the one that has been modified recently. If your organization does not have a change log, then during step 1 you might ask a programmer when the last application change was made.

Analysis Phase

Now that you have gathered some information, you need to analyze the data that you have gathered. Review the information collected in each step. Look for anomalies that would support the problem identified by the customer or programmer.

Take the situation identified in step 1 above and try to determine how each log or trace file might help you identify why the problem is occurring. Review the information available in each step to see if there are any clues that will allow you to gain a better understanding of what is causing the problem.

After you have done this analysis, you might be lucky and identify the cause of the problem. However, there will be times when the steps above do not yield a solution to the problem. In this case, you will need to move on and do some additional testing and information gathering.

Additional Testing and Information Gathering

If you can’t find the problem by reviewing the different system logs, the change log, or the default trace, then you will need to resort to analyzing the actual process that caused the problem. This means you might have to look at code and even run some different tests. The rest of the steps identified here are only a starting point. They should help you organize your thoughts on how you might go about performing additional testing and gathering more information to help you resolve the problem at hand.

Step 7: Develop a Testing Plan

Sit down with the customer and the application programmer and document the steps they are going through, which is causing the problem. Much of this information might have already been collected in step 1, but it is at least worth going over again. Identify if the problem is repeatable. If it cannot be repeated then it might be difficult to determine what caused the problem. The point here is to identify how the application is connecting to SQL Server and the T-SQL code that is being executed. Based on the problem being investigated you will need to develop a set of tests to run and what information you might want to capture along the way to identify what is going on. Prior to doing any testing, I would suggest you go to the next step.

Step 8: Backup Database

Before moving forward with additional testing, analysis and troubleshooting, it might be prudent to do a backup of the problem database. This backup can be a full, differential or log backup depending on your current database backup strategy, and the status of your last backup. This backup will provide you a recovery point should you want to start tweaking SQL Server as part of the diagnostic troubleshooting identified in step 9.

Step 9: Perform Additional Tests and Logging

Break up your test into small logical pieces if possible. For those steps of the tests that connect to SQL Server, you might consider turning on SQL Server Profiler so you can monitor what kind of T-SQL statements and batches are being executed. Profiler will allow you to capture the code that is running, which sometimes is different than what the programmer expects. Sometimes Profiler and the additional steps are all it takes to narrow down what is causing the issues at hand.

CSI Approach to Resolving Problems

One of the most rewarding things as a DBA is helping a programmer and customer resolve a problem. The harder the problem is to resolve, the greater the reward. Above, I have identified a stepwise approach to troubleshooting a problem. The approach identified some logs to look at and some tools to use, to help you identify and troubleshoot problems. You may find that these steps do not necessarily meet your needs, and that is ok. The important point is to understand that you need to develop and follow a troubleshooting process for your environment. Having a set of questions, steps and tools will allow you to be proactive in developing an approach that you can use in resolving problems quickly and methodically.

» See All Articles by Columnist Gregory A. Larsen


We have covered a wide range of topics in the sections beginner, intermediate and advanced.

Basic Retrieval
Arithmetic Operations and Comparisons
Aggregation Functions
Group By and Having
Window Functions
Conditional Statements
DateTime Operations
Creating and Aliasing
Constraints
Stored Procedures
Transactions

Let’s create the table schemas and insert some sample data into them.

Create Sales table

sale_id	product_id	quantity_sold	sale_date	total_price
1	101	5	2024-01-01	2500.00
2	102	3	2024-01-02	900.00
3	103	2	2024-01-02	60.00
4	104	4	2024-01-03	80.00
5	105	6	2024-01-03	90.00

Create Products table

product_id	product_name	category	unit_price
101	Laptop	Electronics	500.00
102	Smartphone	Electronics	300.00
103	Headphones	Electronics	30.00
104	Keyboard	Electronics	20.00
105	Mouse	Electronics	15.00
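The two tables above can be recreated with a short script. This is a sketch under assumptions: the column types are guesses from the sample data, and Python's built-in sqlite3 module is used so it runs without a server, whereas the exercises themselves appear to target a MySQL-style dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Column types are assumptions inferred from the sample rows.
    CREATE TABLE Products (
        product_id INTEGER PRIMARY KEY,
        product_name TEXT,
        category TEXT,
        unit_price REAL
    );
    CREATE TABLE Sales (
        sale_id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES Products(product_id),
        quantity_sold INTEGER,
        sale_date TEXT,
        total_price REAL
    );
    INSERT INTO Products VALUES
        (101, 'Laptop', 'Electronics', 500), (102, 'Smartphone', 'Electronics', 300),
        (103, 'Headphones', 'Electronics', 30), (104, 'Keyboard', 'Electronics', 20),
        (105, 'Mouse', 'Electronics', 15);
    INSERT INTO Sales VALUES
        (1, 101, 5, '2024-01-01', 2500), (2, 102, 3, '2024-01-02', 900),
        (3, 103, 2, '2024-01-02', 60), (4, 104, 4, '2024-01-03', 80),
        (5, 105, 6, '2024-01-03', 90);
""")

product_count = conn.execute("SELECT COUNT(*) FROM Products").fetchone()[0]
sale_count = conn.execute("SELECT COUNT(*) FROM Sales").fetchone()[0]
print(product_count, sale_count)
```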

This hands-on approach provides a practical environment for beginners to experiment with various SQL commands, gaining confidence through real-world scenarios. By working through these exercises, newcomers can solidify their understanding of fundamental concepts like data retrieval, filtering, and manipulation, laying a strong foundation for their SQL journey.

1. Retrieve all columns from the Sales table.

Explanation:

This SQL query selects all columns from the Sales table, denoted by the asterisk (*) wildcard. It retrieves every row and all associated columns from the Sales table.

2. Retrieve the product_name and unit_price from the Products table.

product_name	unit_price
Laptop	500.00
Smartphone	300.00
Headphones	30.00
Keyboard	20.00
Mouse	15.00

This SQL query selects the product_name and unit_price columns from the Products table. It retrieves every row but only the specified columns, which are product_name and unit_price.

3. Retrieve the sale_id and sale_date from the Sales table.

sale_id	sale_date
1	2024-01-01
2	2024-01-02
3	2024-01-02
4	2024-01-03
5	2024-01-03

This SQL query selects the sale_id and sale_date columns from the Sales table. It retrieves every row but only the specified columns, which are sale_id and sale_date.

4. Filter the Sales table to show only sales with a total_price greater than $100.

sale_id	product_id	quantity_sold	sale_date	total_price
1	101	5	2024-01-01	2500.00
2	102	3	2024-01-02	900.00

This SQL query selects all columns from the Sales table but only returns rows where the total_price column is greater than 100. It filters out sales with a total_price less than or equal to $100.

5. Filter the Products table to show only products in the ‘Electronics’ category.

This SQL query selects all columns from the Products table but only returns rows where the category column equals ‘Electronics’. It filters out products that do not belong to the ‘Electronics’ category.

6. Retrieve the sale_id and total_price from the Sales table for sales made on January 3, 2024.

sale_id	total_price
4	80.00
5	90.00

This SQL query selects the sale_id and total_price columns from the Sales table but only returns rows where the sale_date is equal to ‘2024-01-03’. It filters out sales made on any other date.

7. Retrieve the product_id and product_name from the Products table for products with a unit_price greater than $100.

product_id	product_name
101	Laptop
102	Smartphone

This SQL query selects the product_id and product_name columns from the Products table but only returns rows where the unit_price is greater than $100. It filters out products with a unit_price less than or equal to $100.
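The filtering exercises above (4, 6 and 7) all rely on a WHERE clause. The sketch below runs one possible query for each against an in-memory copy of the sample tables; the column types and the use of Python's sqlite3 module are assumptions made to keep it self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (product_id INTEGER PRIMARY KEY, product_name TEXT,
                           category TEXT, unit_price REAL);
    CREATE TABLE Sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER,
                        quantity_sold INTEGER, sale_date TEXT, total_price REAL);
    INSERT INTO Products VALUES
        (101, 'Laptop', 'Electronics', 500), (102, 'Smartphone', 'Electronics', 300),
        (103, 'Headphones', 'Electronics', 30), (104, 'Keyboard', 'Electronics', 20),
        (105, 'Mouse', 'Electronics', 15);
    INSERT INTO Sales VALUES
        (1, 101, 5, '2024-01-01', 2500), (2, 102, 3, '2024-01-02', 900),
        (3, 103, 2, '2024-01-02', 60), (4, 104, 4, '2024-01-03', 80),
        (5, 105, 6, '2024-01-03', 90);
""")

# Exercise 4: numeric comparison.
big_sales = conn.execute(
    "SELECT sale_id FROM Sales WHERE total_price > 100 ORDER BY sale_id"
).fetchall()

# Exercise 6: equality on a date stored as 'YYYY-MM-DD' text.
jan3_sales = conn.execute(
    "SELECT sale_id, total_price FROM Sales WHERE sale_date = '2024-01-03' ORDER BY sale_id"
).fetchall()

# Exercise 7: filter on one column while selecting others.
pricey_products = conn.execute(
    "SELECT product_id, product_name FROM Products WHERE unit_price > 100 ORDER BY product_id"
).fetchall()

print(big_sales, jan3_sales, pricey_products)
```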

8. Calculate the total revenue generated from all sales in the Sales table.

total_revenue
3630.00

This SQL query calculates the total revenue generated from all sales by summing up the total_price column in the Sales table using the SUM() function.

9. Calculate the average unit_price of products in the Products table.

average_unit_price
173

This SQL query calculates the average unit_price of products by averaging the values in the unit_price column in the Products table using the AVG() function.

10. Calculate the total quantity_sold from the Sales table.

total_quantity_sold
20

This SQL query calculates the total quantity_sold by summing up the quantity_sold column in the Sales table using the SUM() function.
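Exercises 8 to 10 each reduce a column to a single value. A possible version of all three queries is sketched below on an in-memory copy of the sample data; the schema types and the choice of Python's sqlite3 module are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (product_id INTEGER PRIMARY KEY, product_name TEXT,
                           category TEXT, unit_price REAL);
    CREATE TABLE Sales (sale_id INTEGER PRIMARY KEY, product_id INTEGER,
                        quantity_sold INTEGER, sale_date TEXT, total_price REAL);
    INSERT INTO Products VALUES
        (101, 'Laptop', 'Electronics', 500), (102, 'Smartphone', 'Electronics', 300),
        (103, 'Headphones', 'Electronics', 30), (104, 'Keyboard', 'Electronics', 20),
        (105, 'Mouse', 'Electronics', 15);
    INSERT INTO Sales VALUES
        (1, 101, 5, '2024-01-01', 2500), (2, 102, 3, '2024-01-02', 900),
        (3, 103, 2, '2024-01-02', 60), (4, 104, 4, '2024-01-03', 80),
        (5, 105, 6, '2024-01-03', 90);
""")

def scalar(sql):
    """Run a query that returns one row with one column and return that value."""
    return conn.execute(sql).fetchone()[0]

total_revenue = scalar("SELECT SUM(total_price) AS total_revenue FROM Sales")
average_unit_price = scalar("SELECT AVG(unit_price) AS average_unit_price FROM Products")
total_quantity_sold = scalar("SELECT SUM(quantity_sold) AS total_quantity_sold FROM Sales")
print(total_revenue, average_unit_price, total_quantity_sold)
```

How many decimal places are displayed depends on the database and client, so the values may be formatted differently from the tables above even though they are equal.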

11. Retrieve the sale_id, product_id, and total_price from the Sales table for sales with a quantity_sold greater than 4.

sale_id	product_id	total_price
1	101	2500.00
5	105	90.00

This SQL query selects the sale_id, product_id, and total_price columns from the Sales table but only returns rows where the quantity_sold is greater than 4.

12. Retrieve the product_name and unit_price from the Products table, ordering the results by unit_price in descending order.

This SQL query selects the product_name and unit_price columns from the Products table and orders the results by unit_price in descending order using the ORDER BY clause with the DESC keyword.

13. Calculate the sum of total_price across all sales, rounding the result to two decimal places.

total_revenue
3630.00

This SQL query calculates the total sales revenue by summing up the total_price column in the Sales table and rounds the result to two decimal places using the ROUND() function.

14. Calculate the average total_price of sales in the Sales table.

average_total_price
726.000000

This SQL query calculates the average total_price of sales by averaging the values in the total_price column in the Sales table using the AVG() function.

15. Retrieve the sale_id and sale_date from the Sales table, formatting the sale_date as ‘YYYY-MM-DD’.

sale_id	formatted_date
1	2024-01-01
2	2024-01-02
3	2024-01-02
4	2024-01-03
5	2024-01-03

This SQL query selects the sale_id and sale_date columns from the Sales table and formats the sale_date using the DATE_FORMAT() function to display it in ‘YYYY-MM-DD’ format.
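DATE_FORMAT() is specific to MySQL-style databases. As a hedged illustration of the same idea in another engine, the sketch below uses SQLite's strftime() through Python's built-in sqlite3 module; the table is a cut-down, assumed version of Sales.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT);
    INSERT INTO Sales VALUES
        (1, '2024-01-01'), (2, '2024-01-02'), (3, '2024-01-02'),
        (4, '2024-01-03'), (5, '2024-01-03');
""")

# SQLite's counterpart to MySQL's DATE_FORMAT(sale_date, '%Y-%m-%d').
formatted = conn.execute("""
    SELECT sale_id, strftime('%Y-%m-%d', sale_date) AS formatted_date
    FROM Sales
    ORDER BY sale_id
""").fetchall()
print(formatted)
```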

16. Calculate the total revenue generated from sales of products in the ‘Electronics’ category.

This SQL query calculates the total revenue generated from sales of products in the ‘Electronics’ category by joining the Sales table with the Products table on the product_id column and filtering sales for products in the ‘Electronics’ category.

17. Retrieve the product_name and unit_price from the Products table, filtering the unit_price to show only values between $20 and $600.

product_name	unit_price
Laptop	500.00
Smartphone	300.00
Headphones	30.00
Keyboard	20.00

This SQL query selects the product_name and unit_price columns from the Products table but only returns rows where the unit_price falls within the range of $20 to $600 using the BETWEEN operator.
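BETWEEN includes both endpoints, which is why the Keyboard at exactly 20.00 appears in the result. A small sketch that shows this, assuming the sample Products data and Python's built-in sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (product_id INTEGER PRIMARY KEY, product_name TEXT,
                           category TEXT, unit_price REAL);
    INSERT INTO Products VALUES
        (101, 'Laptop', 'Electronics', 500), (102, 'Smartphone', 'Electronics', 300),
        (103, 'Headphones', 'Electronics', 30), (104, 'Keyboard', 'Electronics', 20),
        (105, 'Mouse', 'Electronics', 15);
""")

# Equivalent to: unit_price >= 20 AND unit_price <= 600
in_range = conn.execute("""
    SELECT product_name, unit_price
    FROM Products
    WHERE unit_price BETWEEN 20 AND 600
    ORDER BY product_id
""").fetchall()
print(in_range)
```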

18. Retrieve the product_name and category from the Products table, ordering the results by category in ascending order.

product_name	category
Laptop	Electronics
Smartphone	Electronics
Headphones	Electronics
Keyboard	Electronics
Mouse	Electronics

This SQL query selects the product_name and category columns from the Products table and orders the results by category in ascending order using the ORDER BY clause with the ASC keyword.

19. Calculate the total quantity_sold of products in the ‘Electronics’ category.

This SQL query calculates the total quantity_sold of products in the ‘Electronics’ category by joining the Sales table with the Products table on the product_id column and filtering sales for products in the ‘Electronics’ category.

20. Retrieve the product_name and total_price from the Sales table, calculating the total_price as quantity_sold multiplied by unit_price.

product_name	total_price
Laptop	2500.00
Smartphone	900.00
Headphones	60.00
Keyboard	80.00
Mouse	90.00

This SQL query joins the Sales table with the Products table on the product_id column, retrieves the product_name from the Products table, and calculates the total_price by multiplying quantity_sold (from Sales) by unit_price (from Products).
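A sketch of the computed column, with illustrative table aliases s and p:

```sql
-- total_price is derived per sale rather than read from Sales.total_price
SELECT p.product_name,
       s.quantity_sold * p.unit_price AS total_price
FROM Sales AS s
JOIN Products AS p ON s.product_id = p.product_id;
```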

These exercises are designed to challenge you beyond basic queries, delving into more complex data manipulation and analysis. By tackling these problems, you’ll solidify your understanding of advanced SQL concepts like joins, subqueries, functions, and window functions, ultimately boosting your ability to work with real-world data scenarios effectively.

1. Calculate the total revenue generated from sales for each product category.

category	total_revenue
Electronics	3630.00

This query joins the Sales and Products tables on the product_id column, groups the results by product category, and calculates the total revenue for each category by summing up the total_price.
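The grouping described above might be written like this (aliases are illustrative):

```sql
-- One output row per category, with the summed sale prices
SELECT p.category,
       SUM(s.total_price) AS total_revenue
FROM Sales AS s
JOIN Products AS p ON s.product_id = p.product_id
GROUP BY p.category;
```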

2. Find the product category with the highest average unit price.

category
Electronics

This query groups products by category, calculates the average unit price for each category, orders the results by the average unit price in descending order, and selects the top category with the highest average unit price using the LIMIT clause.
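A minimal sketch, assuming a dialect that supports LIMIT (such as MySQL):

```sql
-- Sort categories by their average unit price and keep only the first
SELECT category
FROM Products
GROUP BY category
ORDER BY AVG(unit_price) DESC
LIMIT 1;
```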

3. Identify products with total sales exceeding $30.

product_name
Headphones
Keyboard
Laptop
Mouse
Smartphone

This query joins the Sales and Products tables on the product_id column, groups the results by product name, calculates the total sales revenue for each product, and selects products with total sales exceeding $30 using the HAVING clause.
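A sketch of the HAVING filter. The threshold of 30 follows the explanation and the sample output, in which every product qualifies.

```sql
-- HAVING filters on the aggregate, after grouping; WHERE could not do this
SELECT p.product_name
FROM Sales AS s
JOIN Products AS p ON s.product_id = p.product_id
GROUP BY p.product_name
HAVING SUM(s.total_price) > 30;
```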

4. Count the number of sales made in each month.

month	sales_count
2024-01	5

This query formats the sale_date column to extract the month and year, groups the results by month, and counts the number of sales made in each month.
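Assuming MySQL, the month can be extracted with DATE_FORMAT and the same expression reused for grouping:

```sql
-- '%Y-%m' yields values such as 2024-01
SELECT DATE_FORMAT(sale_date, '%Y-%m') AS month,
       COUNT(*) AS sales_count
FROM Sales
GROUP BY DATE_FORMAT(sale_date, '%Y-%m');
```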

5. Determine the average quantity sold for products with a unit price greater than $100.

average_quantity_sold
4.0000

This query joins the Sales and Products tables on the product_id column, filters products with a unit price greater than $100, and calculates the average quantity sold for those products.
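A sketch of this query. In the sample data only the Laptop (5 sold) and the Smartphone (3 sold) cost more than $100, which gives the average of 4 shown above.

```sql
-- Average quantity over sales of products priced above 100
SELECT AVG(s.quantity_sold) AS average_quantity_sold
FROM Sales AS s
JOIN Products AS p ON s.product_id = p.product_id
WHERE p.unit_price > 100;
```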

6. Retrieve the product name and total sales revenue for each product.

product_name	total_revenue
Laptop	2500.00
Smartphone	900.00
Headphones	60.00
Keyboard	80.00
Mouse	90.00

This query joins the Sales and Products tables on the product_id column, groups the results by product name, and calculates the total sales revenue for each product.
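The per-product version of the revenue query could look like this:

```sql
-- One output row per product, with the summed sale prices
SELECT p.product_name,
       SUM(s.total_price) AS total_revenue
FROM Sales AS s
JOIN Products AS p ON s.product_id = p.product_id
GROUP BY p.product_name;
```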

7. List all sales along with the corresponding product names.

sale_id	product_name
1	Laptop
2	Smartphone
3	Headphones
4	Keyboard
5	Mouse

This query joins the Sales and Products tables on the product_id column and retrieves the sale_id and product_name for each sale.
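A minimal sketch of the join:

```sql
-- Inner join: only sales with a matching product are returned
SELECT s.sale_id, p.product_name
FROM Sales AS s
JOIN Products AS p ON s.product_id = p.product_id;
```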

8. Find the top three product categories by percentage of total sales revenue.

category	category_revenue	revenue_percentage
Electronics	3630.00	100.000000

This query will give you the top three product categories contributing to the highest percentage of total revenue generated from sales. However, if you only have one category (Electronics) as in the provided sample data, it will be the only result.
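One possible way to compute the percentage is a scalar subquery for the grand total. This is a sketch rather than the original solution, and it assumes MySQL for LIMIT and for reusing the alias in ORDER BY.

```sql
-- Each category's revenue as a share of all revenue, largest first
SELECT p.category,
       SUM(s.total_price) AS category_revenue,
       SUM(s.total_price) / (SELECT SUM(total_price) FROM Sales) * 100
           AS revenue_percentage
FROM Sales AS s
JOIN Products AS p ON s.product_id = p.product_id
GROUP BY p.category
ORDER BY revenue_percentage DESC
LIMIT 3;
```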

9. Rank products based on total sales revenue.

product_name	total_revenue	revenue_rank
Laptop	2500.00	1
Smartphone	900.00	2
Mouse	90.00	3
Keyboard	80.00	4
Headphones	60.00	5

This query joins the Sales and Products tables on the product_id column, groups the results by product name, calculates the total sales revenue for each product, and ranks products based on total sales revenue using the RANK() window function.
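A sketch of the ranking query. Window functions require MySQL 8.0 or later; the window is evaluated after GROUP BY, so it can order by the aggregate.

```sql
-- RANK() gives tied revenues the same rank and leaves a gap afterwards
SELECT p.product_name,
       SUM(s.total_price) AS total_revenue,
       RANK() OVER (ORDER BY SUM(s.total_price) DESC) AS revenue_rank
FROM Sales AS s
JOIN Products AS p ON s.product_id = p.product_id
GROUP BY p.product_name;
```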

10. Calculate the running total revenue for each product category.

category	product_name	sale_date	running_total_revenue
Electronics	Laptop	2024-01-01	2500.00
Electronics	Smartphone	2024-01-02	3460.00
Electronics	Headphones	2024-01-02	3460.00
Electronics	Keyboard	2024-01-03	3630.00
Electronics	Mouse	2024-01-03	3630.00

This query joins the Sales and Products tables on the product_id column, partitions the results by product category, orders the results by sale date, and calculates the running total revenue for each product category using the SUM() window function.
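The running total could be sketched as follows. With ORDER BY and no explicit frame, the default window frame treats rows with the same sale_date as peers, which is why both sales on 2024-01-02 show 3460.00 in the output above.

```sql
-- Cumulative revenue within each category, in sale_date order
SELECT p.category,
       p.product_name,
       s.sale_date,
       SUM(s.total_price) OVER (
           PARTITION BY p.category
           ORDER BY s.sale_date
       ) AS running_total_revenue
FROM Sales AS s
JOIN Products AS p ON s.product_id = p.product_id;
```

To accumulate row by row instead, a frame such as ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, together with a tie-breaking sort column, would be needed.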

11. Categorize sales as “High”, “Medium”, or “Low” based on total price (e.g., > $200 is High, $100-$200 is Medium, < $100 is Low).

sale_id	sales_category
1	High
2	High
3	Low
4	Low
5	Low

This query categorizes sales based on total price using a CASE statement. Sales with a total price greater than $200 are categorized as “High”, sales with a total price between $100 and $200 are categorized as “Medium”, and sales with a total price less than $100 are categorized as “Low”.
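A sketch of the categorization using the thresholds from the exercise:

```sql
-- Branches are tested in order; anything not matched falls through to 'Low'
SELECT sale_id,
       CASE
           WHEN total_price > 200 THEN 'High'
           WHEN total_price BETWEEN 100 AND 200 THEN 'Medium'
           ELSE 'Low'
       END AS sales_category
FROM Sales;
```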

12. Identify sales where the quantity sold is greater than the average quantity sold.

sale_id	product_id	quantity_sold	sale_date	total_price
1	101	5	2024-01-01	2500.00
5	105	6	2024-01-03	90.00

This query selects all sales where the quantity sold is greater than the average quantity sold across all sales in the Sales table.
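A minimal sketch with a scalar subquery. The average quantity in the sample data is 4, so only the sales with quantities 5 and 6 are returned.

```sql
-- The subquery is evaluated once and compared against every row
SELECT *
FROM Sales
WHERE quantity_sold > (SELECT AVG(quantity_sold) FROM Sales);
```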

13. Extract the month and year from the sale date and count the number of sales for each month.

month	sales_count
2024-01	5

14. Calculate the number of days between the current date and the sale date for each sale.

sale_id	days_since_sale
1	185
2	184
3	184
4	183
5	183

This query calculates the number of days between the current date and the sale date for each sale using the DATEDIFF function.
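Assuming MySQL, where DATEDIFF(a, b) returns a minus b in days, the query might be:

```sql
-- The result depends on the day the query is run
SELECT sale_id,
       DATEDIFF(CURDATE(), sale_date) AS days_since_sale
FROM Sales;
```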

15. Identify sales made during weekdays versus weekends.

sale_id	day_type
1	Weekday
2	Weekday
3	Weekday
4	Weekday
5	Weekday

This query categorizes sales based on the day of the week using the DAYOFWEEK function. Sales made on Sunday (1) or Saturday (7) are categorized as “Weekend”, while sales made on other days are categorized as “Weekday”.
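A sketch of the classification, assuming MySQL's DAYOFWEEK numbering (1 = Sunday through 7 = Saturday):

```sql
-- Sunday (1) and Saturday (7) count as the weekend
SELECT sale_id,
       CASE
           WHEN DAYOFWEEK(sale_date) IN (1, 7) THEN 'Weekend'
           ELSE 'Weekday'
       END AS day_type
FROM Sales;
```

The sample sale dates, 2024-01-01 through 2024-01-03, fall on a Monday, Tuesday, and Wednesday.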

This section goes deeper into advanced features such as views, indexes, constraints, transactions, and stored procedures. By tackling these more challenging exercises, you can refine your SQL skills and handle real-world data analysis scenarios with greater confidence and efficiency.

1. Write a query to create a view named Total_Sales that displays the total sales amount for each product along with their names and categories.

product_name	category	total_sales_amount
Laptop	Electronics	2500.00
Smartphone	Electronics	900.00
Headphones	Electronics	60.00
Keyboard	Electronics	80.00
Mouse	Electronics	90.00

This query creates a view named Total_Sales that displays the total sales amount for each product along with their names and categories.
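The view definition might look like this sketch (table aliases are illustrative):

```sql
-- The view stores the query, not the data; it is re-evaluated on each read
CREATE VIEW Total_Sales AS
SELECT p.product_name,
       p.category,
       SUM(s.total_price) AS total_sales_amount
FROM Products AS p
JOIN Sales AS s ON p.product_id = s.product_id
GROUP BY p.product_name, p.category;

-- Usage
SELECT * FROM Total_Sales;
```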

2. Retrieve the product details (name, category, unit price) for products that have a quantity sold greater than the average quantity sold across all products.

product_name	category	unit_price
Laptop	Electronics	500.00
Mouse	Electronics	15.00

This query retrieves the product details (name, category, unit price) for products that have a quantity sold greater than the average quantity sold across all products.
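One way to write this, assuming each sale row is compared against the average quantity per sale:

```sql
-- Products with at least one sale above the average quantity sold
SELECT p.product_name, p.category, p.unit_price
FROM Products AS p
JOIN Sales AS s ON p.product_id = s.product_id
WHERE s.quantity_sold > (SELECT AVG(quantity_sold) FROM Sales);
```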

3. Explain the significance of indexing in SQL databases and provide an example scenario where indexing could significantly improve query performance in the given schema.

sale_id	product_id	quantity_sold	sale_date	total_price
4	104	4	2024-01-03	80.00
5	105	6	2024-01-03	90.00

With an index on the sale_date column, the database can quickly locate the rows that match the specified date without scanning the entire table. The index allows for efficient lookup of rows based on the sale_date value, resulting in improved query performance.
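A sketch of the index and a query that benefits from it. The index name idx_sale_date matches the name that appears in the execution plan later in this article.

```sql
-- Build a secondary index on the column used in the filter
CREATE INDEX idx_sale_date ON Sales (sale_date);

-- This lookup can now use the index instead of scanning every row
SELECT *
FROM Sales
WHERE sale_date = '2024-01-03';
```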

4. Add a foreign key constraint to the Sales table that references the product_id column in the Products table.

This query adds a foreign key constraint to the Sales table that references the product_id column in the Products table, ensuring referential integrity between the two tables.
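A minimal sketch of the constraint. The constraint name fk_sales_product is hypothetical, and the statement assumes Products.product_id is a primary key or otherwise indexed.

```sql
-- Reject any Sales row whose product_id has no match in Products
ALTER TABLE Sales
ADD CONSTRAINT fk_sales_product
FOREIGN KEY (product_id) REFERENCES Products (product_id);
```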

5. Create a view named Top_Products that lists the top 3 products based on the total quantity sold.

product_name	total_quantity_sold
Mouse	6
Laptop	5
Keyboard	4

This query creates a view named Top_Products that lists the top 3 products based on the total quantity sold.
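The view could be defined as follows, assuming MySQL, which allows ORDER BY and LIMIT inside a view definition:

```sql
-- Top three products by units sold
CREATE VIEW Top_Products AS
SELECT p.product_name,
       SUM(s.quantity_sold) AS total_quantity_sold
FROM Sales AS s
JOIN Products AS p ON s.product_id = p.product_id
GROUP BY p.product_name
ORDER BY total_quantity_sold DESC
LIMIT 3;
```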

6. Implement a transaction that deducts the quantity sold from the Products table when a sale is made in the Sales table, ensuring that both operations are either committed or rolled back together.

The quantity in stock for product with product_id 101 should be updated to 5. The transaction should be committed successfully.

7. Create a query that lists the product names along with their corresponding sales count.

product_name	sales_count
Headphones	1
Keyboard	1
Laptop	1
Mouse	1
Smartphone	1

This query selects the product names from the Products table and counts the number of sales (using the COUNT() function) for each product by joining the Sales table on the product_id. The results are grouped by product name using the GROUP BY clause.
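A sketch of the count per product:

```sql
-- Inner join: products without any sale are omitted from the result
SELECT p.product_name,
       COUNT(s.sale_id) AS sales_count
FROM Products AS p
JOIN Sales AS s ON p.product_id = s.product_id
GROUP BY p.product_name;
```

Switching to LEFT JOIN would also list products that have never been sold, with a count of 0.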

8. Write a query to find all sales where the total price is greater than the average total price of all sales.

The subquery (SELECT AVG(total_price) FROM Sales) calculates the average total price of all sales. The main query selects all columns from the Sales table where the total price is greater than the average total price obtained from the subquery.
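Put together, the subquery and the main query described above look like this:

```sql
-- Sales priced above the average of all sales
SELECT *
FROM Sales
WHERE total_price > (SELECT AVG(total_price) FROM Sales);
```

For the sample data the average is 3630.00 / 5 = 726.00, so sales 1 and 2 qualify.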

9. Analyze the performance implications of indexing the sale_date column in the Sales table, considering the types of queries commonly executed against this column.

Query without indexing:

Operation	Details
Filter: (sales.sale_date = DATE'2024-01-01')	(cost=0.75 rows=1) (actual time=0.020..0.031 rows=1 loops=1)
Table scan on Sales	(cost=0.75 rows=5) (actual time=0.015..0.021 rows=5 loops=1)

Query with indexing:

Operation	Details
Index lookup on Sales using idx_sale_date (sale_date=DATE'2024-01-01')	(cost=0.35 rows=1) (actual time=0.024..0.024 rows=1 loops=1)

Without an index, the query performs a full table scan and filters every row on sale_date. With the index, it looks up the matching rows directly, which lowers the estimated cost from 0.75 to 0.35 here. On a five-row table the measured times are nearly identical; the benefit grows with the size of the table.
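Plans in this form can be produced with EXPLAIN ANALYZE, available in MySQL 8.0.18 and later. The sketch below assumes the index is named idx_sale_date, as in the plan above.

```sql
-- Run once before creating the index: expect a table scan with a filter
EXPLAIN ANALYZE
SELECT * FROM Sales WHERE sale_date = '2024-01-01';

CREATE INDEX idx_sale_date ON Sales (sale_date);

-- Run again afterwards: expect an index lookup on idx_sale_date
EXPLAIN ANALYZE
SELECT * FROM Sales WHERE sale_date = '2024-01-01';
```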

10. Add a check constraint to the quantity_sold column in the Sales table to ensure that the quantity sold is always greater than zero.

sale_id	product_id	quantity_sold	sale_date	total_price
1	101	5	2024-01-01	2500.00
2	102	3	2024-01-02	900.00
3	103	2	2024-01-02	60.00
4	104	4	2024-01-03	80.00
5	105	6	2024-01-03	90.00

All rows in the Sales table meet the condition of the check constraint, as each quantity_sold value is greater than zero.
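A sketch of the constraint. The name chk_quantity_sold is hypothetical, and note that MySQL only enforces CHECK constraints from version 8.0.16 onward.

```sql
-- Fails if any existing row violates the condition
ALTER TABLE Sales
ADD CONSTRAINT chk_quantity_sold CHECK (quantity_sold > 0);
```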

11. Create a view named Product_Sales_Info that displays product details along with the total number of sales made for each product.

product_id	product_name	category	unit_price	total_sales
101	Laptop	Electronics	500.00	1
102	Smartphone	Electronics	300.00	1
103	Headphones	Electronics	30.00	1
104	Keyboard	Electronics	20.00	1
105	Mouse	Electronics	15.00	1

This view provides a concise and organized way to view product details alongside their respective sales information, facilitating analysis and reporting tasks.
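One possible definition of the view. The LEFT JOIN is a design choice made here so that products without sales still appear with a count of 0; the original solution may have used an inner join.

```sql
CREATE VIEW Product_Sales_Info AS
SELECT p.product_id,
       p.product_name,
       p.category,
       p.unit_price,
       COUNT(s.sale_id) AS total_sales  -- counts only matched sale rows
FROM Products AS p
LEFT JOIN Sales AS s ON p.product_id = s.product_id
GROUP BY p.product_id, p.product_name, p.category, p.unit_price;
```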

12. Develop a stored procedure named Update_Unit_Price that updates the unit price of a product in the Products table based on the provided product_id.
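A sketch of such a procedure in MySQL syntax. The parameter types INT and DECIMAL(10, 2) are assumptions, since the column definitions are not shown here.

```sql
-- Change the statement delimiter so the procedure body can contain semicolons
DELIMITER //

CREATE PROCEDURE Update_Unit_Price (
    IN p_product_id INT,
    IN p_new_price DECIMAL(10, 2)
)
BEGIN
    UPDATE Products
    SET unit_price = p_new_price
    WHERE product_id = p_product_id;
END //

DELIMITER ;

-- Example call: set the price of product 101 to 550.00
CALL Update_Unit_Price(101, 550.00);
```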

The stored procedure Update_Unit_Price takes two parameters: p_product_id (the product ID for which the unit price needs to be updated) and p_new_price (the new unit price to set).

product_id	product_name	category	unit_price
101	Laptop	Electronics	550.00
102	Smartphone	Electronics	300.00
103	Headphones	Electronics	30.00
104	Keyboard	Electronics	20.00
105	Mouse	Electronics	15.00

Calling the procedure with product_id 101 and a new price of 550.00 updates the unit price of that product in the Products table, as the output above shows.

13. Implement a transaction that inserts a new product into the Products table and then adds a corresponding sale record into the Sales table, ensuring that both operations are either fully completed or fully rolled back.
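For the transaction described in exercise 13, a minimal sketch in MySQL might look like the following. The inserted values (product 106, 'Monitor', sale 6, and so on) are hypothetical and chosen only so that they do not collide with the sample data.

```sql
START TRANSACTION;

-- Step 1: add the new product
INSERT INTO Products (product_id, product_name, category, unit_price)
VALUES (106, 'Monitor', 'Electronics', 150.00);

-- Step 2: record a sale of that product
INSERT INTO Sales (sale_id, product_id, quantity_sold, sale_date, total_price)
VALUES (6, 106, 2, '2024-01-04', 300.00);

-- Make both inserts permanent together
COMMIT;
```

If either INSERT fails, the calling code (or an error handler in a stored procedure) should issue ROLLBACK instead of COMMIT so that neither row is kept.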

14. Write a query that calculates the total revenue generated from each category of products for the year 2024.

category	total_revenue
Electronics	3630.00

When you execute this query, you will get the total revenue generated from each category of products for the year 2024.
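A sketch of the yearly filter. The date-range predicate is a choice made here because it can use an index on sale_date; wrapping the column as YEAR(s.sale_date) = 2024 would return the same rows.

```sql
-- Revenue per category for sales dated within 2024
SELECT p.category,
       SUM(s.total_price) AS total_revenue
FROM Sales AS s
JOIN Products AS p ON s.product_id = p.product_id
WHERE s.sale_date >= '2024-01-01'
  AND s.sale_date < '2025-01-01'
GROUP BY p.category;
```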

If you’re looking to sharpen your SQL skills and gain more confidence in querying databases, consider delving into these articles. They’re packed with query-based SQL questions designed to enhance your understanding and proficiency in SQL.

By practicing with these exercises, you’ll not only improve your SQL abilities but also boost your confidence in tackling various database-related tasks. The questions are as follows:

How to Insert a Value that Contains an Apostrophe in SQL?
How to Select Row With Max Value in SQL?
How to Efficiently Convert Rows to Columns in SQL?
How To Use Nested Select Queries in SQL
How to Select Row With Max Value on a Column in SQL?
How to Specify Condition in Count() in SQL?
How to Find the Maximum of Multiple Columns in SQL?
How to Update Top 100 Records in SQL?
How to Select the Last Records in a One-To-Many Relationship Using SQL Join
How to Join First Row in SQL?
How to Insert Row If Not Exists in SQL?
How to Use GROUP BY to Concatenate Strings in SQL?
How Inner Join works in LINQ to SQL
How to Get the Identity of an Inserted Row in SQL
How to Declare a Variable in SQL?

Mastering SQL requires consistent practice and hands-on experience. By working through these SQL practice exercises, you’ll strengthen your skills and gain confidence in querying relational databases.

Whether you’re just starting or looking to refine your expertise, these exercises provide valuable opportunities to hone your SQL abilities. Keep practicing, and you’ll be well-equipped to tackle real-world data challenges with SQL.

10 Beginner SQL Practice Exercises With Solutions

Table of Contents

The Dataset
Exercise 1: Selecting All Columns From a Table
Exercise 2: Selecting a Few Columns From a Table
Exercise 3: Selecting a Few Columns and Filtering Numeric Data in WHERE
Exercise 4: Selecting a Few Columns and Filtering Text Data in WHERE
Exercise 5: Selecting a Few Columns and Filtering Data Using Two Conditions in WHERE
Exercise 6: Filtering Data Using WHERE and Sorting the Output
Exercise 7: Grouping Data by One Column
Exercise 8: Grouping Data by Multiple Columns
Exercise 9: Filtering Data After Grouping
Exercise 10: Selecting Columns From Two Tables
That Was Fun! Now, Time to Do SQL Practice on Your Own

Solve these ten SQL practice problems and test where you stand with your SQL knowledge!

This article is all about SQL practice. It’s the best way to learn SQL. We show you ten SQL practice exercises where you need to apply essential SQL concepts. If you’re an SQL rookie, no need to worry – these examples are for beginners.

Use them for practice or as a way to learn new SQL concepts. For more theoretical background and (even more!) exercises, there’s our interactive SQL Basics course. It teaches you how to select data from one or more tables, aggregate and group data, write subqueries, and use set operations. The course comprises 129 interactive exercises, so there is no lack of opportunities for SQL practice, especially if you add some of the 12 ways of learning SQL online to it.

Speaking of practice, let’s start with our exercises!

The question is always where to find data for practicing SQL. We’ll use our dataset for all exercises. No need to limit yourself to this, though – you can find other free online datasets for practicing SQL.

Our dataset consists of two tables.

The table distribution_companies lists movie distribution companies with the following columns:

id – The ID of the distribution company. This is the primary key of the table.
company_name – The name of the distribution company.

The table is shown below.

id	company_name
1	Columbia Pictures
2	Paramount Pictures
3	Warner Bros. Pictures
4	United Artists
5	Universal Pictures
6	New Line Cinema
7	Miramax Films
8	Produzioni Europee Associate
9	Buena Vista
10	StudioCanal

The second table is movies. These are the columns:

id – The ID of the movie. This is the primary key of the table.
movie_title – The movie title.
imdb_rating – The movie rating on IMDb.
year_released – The year the movie was released.
budget – The budget for the movie in millions of dollars.
box_office – The earnings of the movie in millions of dollars.
distribution_company_id – The ID of the distribution company, referencing the table distribution_companies (foreign key).
language – The language(s) spoken in the movie.

id	movie_title	imdb_rating	year_released	budget	box_office	distribution_company_id	language
1	The Shawshank Redemption	9.2	1994	25.00	73.30	1	English
2	The Godfather	9.2	1972	7.20	291.00	2	English
3	The Dark Knight	9.0	2008	185.00	1,006.00	3	English
4	The Godfather Part II	9.0	1974	13.00	93.00	2	English, Sicilian
5	12 Angry Men	9.0	1957	0.34	2.00	4	English
6	Schindler's List	8.9	1993	22.00	322.20	5	English, German, Yiddish
7	The Lord of the Rings: The Return of the King	8.9	2003	94.00	1,146.00	6	English
8	Pulp Fiction	8.8	1994	8.50	213.90	7	English
9	The Lord of the Rings: The Fellowship of the Ring	8.8	2001	93.00	898.20	6	English
10	The Good, the Bad and the Ugly	8.8	1966	1.20	38.90	8	English, Italian, Spanish

Exercise: Select all data from the table distribution_companies.

Solution explanation: Select the data using the SELECT statement. To select all the columns, use an asterisk (*). The table from which the data is selected is specified in the FROM clause.
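A query matching this explanation would be:

```sql
-- The asterisk selects every column of the table
SELECT *
FROM distribution_companies;
```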

Solution output:

Exercise: For each movie, select the movie title, the IMDb rating, and the year the movie was released.

Solution explanation: List all the columns needed (movie_title, imdb_rating, and year_released) in the SELECT statement, separated by commas. Reference the table movies in the FROM clause.
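A query matching this explanation would be:

```sql
-- Only the three listed columns are returned, in this order
SELECT movie_title, imdb_rating, year_released
FROM movies;
```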

movie_title	imdb_rating	year_released
The Shawshank Redemption	9.2	1994
The Godfather	9.2	1972
The Dark Knight	9.0	2008
The Godfather Part II	9.0	1974
12 Angry Men	9.0	1957
Schindler's List	8.9	1993
The Lord of the Rings: The Return of the King	8.9	2003
Pulp Fiction	8.8	1994
The Lord of the Rings: The Fellowship of the Ring	8.8	2001
The Good, the Bad and the Ugly	8.8	1966

Exercise: Select the columns movie_title and box_office from the table movies. Show only movies with earnings above $300 million.

Solution explanation: List the columns in SELECT and reference the table in FROM. Use a WHERE clause to filter the data – write the column box_office and use the ‘greater than’ operator (>) to show only values above $300 million.
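A query matching this explanation would be the following. Since box_office is stored in millions of dollars, the threshold is written as 300.

```sql
-- Keep only rows whose earnings exceed 300 (million dollars)
SELECT movie_title, box_office
FROM movies
WHERE box_office > 300;
```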

movie_title	box_office
The Dark Knight	1,006.00
Schindler's List	322.20
The Lord of the Rings: The Return of the King	1,146.00
The Lord of the Rings: The Fellowship of the Ring	898.20

Exercise: Select the columns movie_title, imdb_rating, and year_released from the table movies. Show movies that have the word ‘Godfather’ in the title.

Solution explanation: List the columns in SELECT and reference the table in the FROM clause. Use a WHERE clause to filter the data. After writing the column name, use the LIKE logical operator to look for ‘Godfather’ in the movie title, written in single quotes. To find the word anywhere in the movie title, place the wildcard character (%) before and after the word.
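A query matching this explanation would be:

```sql
-- % matches any sequence of characters, including none
SELECT movie_title, imdb_rating, year_released
FROM movies
WHERE movie_title LIKE '%Godfather%';
```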

movie_title	imdb_rating	year_released
The Godfather	9.2	1972
The Godfather Part II	9.0	1974

Exercise: Select the columns movie_title, imdb_rating, and year_released from the table movies. Show movies that were released before 2001 and had a rating above 9.

Solution explanation: List the columns in SELECT and reference the table in FROM. Set the first condition that the year released is before 2001 using the ‘less than’ (<) operator. To add another condition, use the AND logical operator. Use the same logic as the first condition, this time using the ‘greater than’ operator with the column imdb_rating.
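A query matching this explanation would be:

```sql
-- Both conditions must hold for a row to be returned
SELECT movie_title, imdb_rating, year_released
FROM movies
WHERE year_released < 2001
  AND imdb_rating > 9;
```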

movie_title	imdb_rating	year_released
The Shawshank Redemption	9.2	1994
The Godfather	9.2	1972

Exercise: Select the columns movie_title, imdb_rating, and year_released from the table movies. Show movies released after 1991. Sort the output by the year released in ascending order.

Solution explanation: List the columns in SELECT and reference the table in FROM. Filter the data with WHERE by applying the ‘greater than’ operator to the column year_released. To sort the data, use an ORDER BY clause and write the column name by which you wish to sort. The type of sorting is specified by writing ASC (ascending) or DESC (descending). If the type is omitted, the output is sorted in ascending order by default.
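A query matching this explanation would be:

```sql
-- Filter first, then sort the remaining rows by release year
SELECT movie_title, imdb_rating, year_released
FROM movies
WHERE year_released > 1991
ORDER BY year_released ASC;
```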

movie_title	imdb_rating	year_released
Schindler's List	8.9	1993
The Shawshank Redemption	9.2	1994
Pulp Fiction	8.8	1994
The Lord of the Rings: The Fellowship of the Ring	8.8	2001
The Lord of the Rings: The Return of the King	8.9	2003
The Dark Knight	9.0	2008

Exercise: Show the count of movies for each language category.

Solution explanation: Select the column language from the table movies. To count the number of movies, use the aggregate function COUNT(). Use the asterisk (*) to count the rows, which equals the count of movies. To give this column a name, use the AS keyword followed by the desired name. To show the count by language, you need to group the data by it, so write the column language in the GROUP BY clause.
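A query matching this explanation would be:

```sql
-- One row per distinct value of language, with the number of movies in it
SELECT language,
       COUNT(*) AS number_of_movies
FROM movies
GROUP BY language;
```

Note that a value such as 'English, Sicilian' is treated as its own category, separate from 'English'.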

language	number_of_movies
English	7
English, German, Yiddish	1
English, Sicilian	1
English, Italian, Spanish	1

Exercise: Show the count of movies by year released and language. Sort results by the release date in ascending order.

Solution explanation: List the columns year_released and language from the table movies in SELECT. Use COUNT(*) to count the number of movies and give this column a name using the AS keyword. Specify the columns by which you want to group in the GROUP BY clause. Separate each column name with a comma. Sort the output using ORDER BY with the column year_released and the ASC keyword.
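A query matching this explanation would be:

```sql
-- Each combination of year and language forms one group
SELECT year_released,
       language,
       COUNT(*) AS number_of_movies
FROM movies
GROUP BY year_released, language
ORDER BY year_released ASC;
```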

year_released	language	number_of_movies
1957	English	1
1966	English, Italian, Spanish	1
1972	English	1
1974	English, Sicilian	1
1993	English, German, Yiddish	1
1994	English	2
2001	English	1
2003	English	1
2008	English	1

Exercise: Show the languages spoken and the average movie budget by language category. Show only the languages with an average budget above $50 million.

Solution explanation: Select the column language from the table movies. To compute the average budget, use the aggregate function AVG() with the column budget in parentheses. Name the column in the output by using the AS keyword. Group the data by language using GROUP BY. To filter the data after grouping, use a HAVING clause. In it, use the same AVG() construct as in SELECT and set the values to be above 50 using the ‘greater than’ operator.
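A query matching this explanation would be:

```sql
-- HAVING filters groups by their aggregate; budget is in millions of dollars
SELECT language,
       AVG(budget) AS movie_budget
FROM movies
GROUP BY language
HAVING AVG(budget) > 50;
```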

language	movie_budget
English	59.01

Exercise: Show movie titles from the table movies, each with the name of its distribution company.

Solution explanation: List the columns movie_title and company_name in SELECT. In the FROM clause, reference the table distribution_companies. Give it an alias dc to shorten its name for use later. The AS keyword is omitted here; you may use it if you wish. To access the data from the other table, use JOIN (it may also be written as INNER JOIN) and write the table name after it. Give this table an alias also. The join used here is an inner type of join; it returns only the rows that match the joining condition specified in the ON clause. The tables are joined where the column id from the table distribution_companies is equal to the column distribution_company_id from the table movies. To specify which column is from which table, use the corresponding alias of each table.
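A query matching this explanation would be the following. The alias dc comes from the explanation; the alias m for movies is an assumption.

```sql
-- Inner join: companies without any movie in the table are not returned
SELECT movie_title, company_name
FROM distribution_companies dc
JOIN movies m ON dc.id = m.distribution_company_id;
```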

movie_title	company_name
The Shawshank Redemption	Columbia Pictures
The Godfather Part II	Paramount Pictures
The Godfather	Paramount Pictures
The Dark Knight	Warner Bros. Pictures
12 Angry Men	United Artists
Schindler's List	Universal Pictures
The Lord of the Rings: The Fellowship of the Ring	New Line Cinema
The Lord of the Rings: The Return of the King	New Line Cinema
Pulp Fiction	Miramax Films
The Good, the Bad and the Ugly	Produzioni Europee Associate

These ten SQL practice exercises give you a taste of what practicing SQL looks like. Whether you are at the beginner, intermediate, or advanced level, it’s the same. What changes is the complexity of the problems you solve and of the code you write.

Look for more challenges in the SQL Basics course and the Monthly SQL Practice track. Both are excellent for your SQL practice online. This is true, especially if you do not have an opportunity to use SQL on a daily basis in your job.

So, don’t try to test how long it takes to forget what you once knew in SQL! Use every opportunity to solve as many SQL practice problems as possible.