Time flies. It feels like the new year has just started, and yet we’re already at the second T-SQL Tuesday of the year. Our host this February is Andy Leonard (b|t), and his assigned topic is, simply: Why?
Or rather: What is your why? What motivates you, what makes you tick? And most of all: What makes you do the things you do?
The Execution Plan Reference
I do a lot of things. I have a one-person consultancy business, I have a wife and kids, I have a personal life with several hobbies, and I could talk about my motivation for all those choices, and more. But I decided to focus this post on one thing, and one thing only: why do I build the Execution Plan Reference?
Researching, verifying, and writing down all the information in the Execution Plan Reference takes time. A lot of time. And I get nothing in return. I am not paid by anyone for the time I work on this. The site doesn’t run ads, so I get nothing when people use it. In fact, I even have to pay the hosting cost out of my own pocket. It’s easy to understand why people would want to know what drives me to do this.
A full answer to any “why” question usually involves more than just a motivation. Yes, a motivation is the reason why you decide to do something once you have the idea, and what makes you continue to do it after the initial enthusiasm wears off. But what is it that gives you the idea in the first place? That can be something as simple as “someone asked me to”, but that is not where I got the idea for the Execution Plan Reference.
The first time I gave a pre-conference session / training day / full-day workshop (or whatever other name you want to give it) on execution plans was back in 2014. I have given it several times since, and I am still submitting it to conferences. Obviously, the content has changed significantly over the past five years. But the main outline has not. I am convinced that people are smart enough that they can read and understand all execution plans, when given the proper knowledge and tools.
There is this well known saying “give a man a fish and he has food for a day; teach a man to fish and he will never be hungry”. My philosophy takes this even one step further. I don’t show people the best fishing spots. I teach people how fish behave, how weather, climate, and other factors affect them, and how that determines where they’ll go – so they will always be able to work out the best fishing spots, even when conditions change.
When I teach about execution plans, I spend only a little time on execution plans as a whole. Most of the day is then spent on individual operators. Because that is key to understanding execution plans. If you don’t understand what an operator does, how it behaves, you’ll never fully understand the execution plans it appears in. And conversely, if you understand the behavior of all operators in an execution plan, you should always be able to work out how the execution plan as whole works. And assuming you’re looking at the execution plan of a slow query (which for most people is the only reason to ever look at an execution plan), that should then also explain why it is slow.
For those who have never heard of Pluralsight: this is a company that sells access to online training videos. I have created a course on relational database design for them. And after the first few deliveries of my full-day training, I started talking to Pluralsight to see if they would be interested in a course on execution plans, based on my training day materials but even deeper and even more complete. There is only so much I can do in a single day, and I was excited by the idea of being able to go even deeper, and be even more complete, in an online course.
Pluralsight and I did reach an agreement, and I started work, but due to reasons in my personal life I was unable to meet deadlines and then the contract expired. I later tried to renegotiate, but we never came to a mutually acceptable agreement. I was still in a rough spot in my personal life, so I could not commit to even the most generous deadlines Pluralsight offered. And at the same time, Pluralsight noticed that the market had much more interest in short courses and short clips than in the long and in-depth material I wanted to create, so their interest in my content waned.
Once I realized that my proposed course would never go to Pluralsight, I did explore some alternatives for a short while, such as starting a YouTube channel – but I quickly realized that I lack the resources and skills to make good enough content to ever make that work. So I started to come to terms with the idea that perhaps I would never be able to make my training day material available outside of the limited circle of participants.
But at that time, I was already working in a tech editor role on Grant Fritchey’s excellent book “SQL Server Execution Plans, Third Edition”. And I have lost track of how often I was looking at a code sample and execution plan, reviewing Grant’s description, and then adding a comment that started with “Okay Grant, this is well beyond the scope of what we want in the book but you are going to love this. I noticed that …”
But I realized that Grant is not the only one who loves those details. There are more geeky people out there that love that kind of information. And there are even more people out there that don’t care about it, until they run into an issue and need exactly that information to help them over the bump. The book was not the right place to put that information. But if I kept it only in my head, or only between Grant and me, then the SQL Server community would never get to benefit from the effort I put into discovering all that stuff. I had to find a way to make this knowledge available to the community.
Exactly that moment, one of the many times I was editing Grant’s book and finding stuff that was too deep for the book but still very relevant, is when I got the idea of the Execution Plan Reference. It was not called that initially. I didn’t know yet what form it would take. But I did realize: I have knowledge in my head that not many people have. That information needs to be made available to the SQL Server community. And if I don’t do it, then who will?
That’s where I got the inspiration. Now let’s review the motivation.
We all love good performance. We all hate slow queries. So when we see slow queries, we want to fix them. And it’s not just us. End users hate it if they have to wait for their screen to refresh. Managers hate applying for faster hardware. And if you think you can just migrate to the cloud and then adjust the performance slider when stuff is slow, you’ll soon learn that the CFO starts to hate you after receiving the next Azure bill.
There are lots of reasons why people want to tune slow workloads. And there are lots of ways in which people do this tuning. But not all are equally good.
We have all been in possession of devices that sometimes mysteriously stop working, but magically start working again after a good slam with a hammer. Well, usually. Not always. And if they still refuse after multiple slams, there is no alternative but to bring them in for repairs. A technician then opens the device, does some measurements and other smart research stuff, and then sees a place where two wires are touching and causing a short circuit. A good slam would often separate them again, for a time. But this technician reattaches those wires in their proper place, and now they can’t even short-circuit anymore.
Many of the tuning tools and methods I find when I search the web, when I go to conferences, when I read blogs, or even when I look at official Microsoft material, are like slamming the device with a hammer. It works, often, but you have no clue why, and because you don’t really fix the root issue, it might resurface when you least expect it. Execution plans are for us what an ohmmeter is for the technician. Execution plans are the tool we can use to find out exactly what is going wrong. And once we know that, we can fix the root cause in a way that is more future-proof than just trying some indexes or doing some random rewrites of our query.
Given the relevance of execution plans for understanding and fixing performance issues, one would expect that Microsoft gives this tool a lot of love. And in part, that is true – recent releases of SQL Server have added more and more information to execution plans, making them even better (lots better!) as a tuning instrument than they were before.
But this love does not extend to documentation. The SQL Server Documentation (previously known as Books Online) is a great resource. It really is. But not when it comes to execution plans.
Books Online has just one single page on execution plan operators. Just one. And it’s not even good. It still uses the pre-2018 versions of the icons. Some of the information there is severely outdated. Some operators are missing. And the worst part, this page even includes a few completely false and misleading statements!
None of that by itself is sufficient motivation to build a “competing” site. There are feedback mechanisms that I could have used to try to get this information corrected. But the worst thing, the thing that really bugs me about the execution plan information in Books Online, is that this is the only information they have! There is so much information not included in Books Online at all that it is simply impossible to understand many execution plans based on this information alone.
There are of course other sources. There is lots of information available on blogs. Much of it even on official Microsoft blogs from former and current employees. And even more on blogs from other people.
But if something is worth describing, it is worth describing in a simple, easily accessible location. If you are working on a slow query, and you see an operator in the execution plan that you don’t really understand, do you want to have to search the internet, read a dozen or so different blogs and whitepapers, try to filter out the good stuff from the mistakes, and then try to piece everything together? Or do you want all the relevant information at your fingertips?
I believe that Microsoft should have done a much better job of documenting execution plans. But they didn’t. And I can’t change that they don’t. I can, however, change that nobody does. So now I’m doing it.
When I read Andy’s question, two associations popped in my head. The first is a song that (unfortunately) was very popular in the Netherlands when it was released. I heard it way more often than I wanted to on the radio. I have deliberately not included a link because I don’t want anyone to blame me after clicking it. But I’ve had it stuck in my head for almost a week now, so thanks Andy!
But the question also made me think about a story I once heard, about George Mallory, an English mountaineer. In the years 1921 to 1924, he participated in three expeditions to Mount Everest, eventually dying on his third and last attempt. In 1922, when asked why he put all that effort into attempting to climb Mount Everest, his famous answer was: “Because it’s there”.
I am not a mountaineer. I have never liked the cold, and while I do like some hiking, only in very moderate amounts. And I am by now starting to reach the age where I have to pause for a breather halfway up each flight of stairs. So I guess George Mallory and I are as far apart as two people can be. And that extends to motivation. My main reason for building the Execution Plan Reference is the opposite of George’s answer. I am building it, because it’s NOT there.
Tuning queries is important. The only really dependable tool for tuning queries is looking at, and understanding, execution plans. For that understanding, documentation is required. And that documentation is unfortunately not made available by Microsoft.
So why am I building the Execution Plan Reference? Because it’s not there. And because I believe it should be.
Welcome to part ten of the plansplaining series. Each of these posts takes an execution plan with an interesting pattern, and details exactly how that plan (or pattern) works.
In this post we will look at a query and execution plan that may appear perfectly normal and unexceptional at first sight, but that has some perhaps confusing execution counts.
The sample query below will (as almost always) run in all versions of the AdventureWorks sample database. It returns a list of all staff, and for people that have a sales-related job title it adds the total of sales they made.
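The original query text was lost from this copy of the post. Based on the description above and the plan discussed below, it must have looked roughly like this (a reconstruction; the exact column list and the column being summed are my guesses):

```sql
SELECT     p.FirstName,
           p.LastName,
           e.JobTitle,
           CASE WHEN e.JobTitle LIKE N'%Sales%'
                THEN (SELECT SUM(soh.TotalDue)
                      FROM   Sales.SalesOrderHeader AS soh
                      WHERE  soh.SalesPersonID = e.BusinessEntityID)
           END AS TotalSold
FROM       HumanResources.Employee AS e
INNER JOIN Person.Person AS p
      ON   p.BusinessEntityID = e.BusinessEntityID;
```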
The execution plan for this query (captured on SQL Server 2017) is shown below; as always, I added numbers (equal to the NodeID of each operator) to the screenshot, to make it easier to refer to operators within this plan.
At first sight, this execution plan is not very surprising. At the top right, we see a scan on the Employee table that drives a Nested Loops join into a seek on Person – this is a very simple and direct implementation of the join in the query. The result then goes through another Nested Loops, into a branch that reads SalesOrderHeader, aggregates it, and then does a computation – this obviously has to be the subquery within the CASE expression. Finally, Compute Scalar #0 probably does the logical evaluation of the CASE. There, without even looking at a single property, I already have a fair idea what this execution plan does.
The index spool
The above explanation doesn’t mention Index Spool #8. The most common cause for Index Spool operators in a plan is that SQL Server really badly wants an index. The SalesOrderHeader table does have an index on the SalesPersonID column, but it is not covering for this query, so the optimizer was faced with the choice between either using that index and accepting the cost of a lookup, or using a scan … and then it decided to go for door number three: the Index Spool, which basically builds the index on the spot and throws it away when the query is done.
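In other words, the worktable that the spool builds plays the role of a covering index that we could also have created ourselves. A hypothetical permanent equivalent might look like this (the INCLUDE column is my assumption, mirroring the spool’s output for the SUM):

```sql
-- A permanent version of what Index Spool #8 builds on the fly:
-- keyed on the column in its Seek Predicate, covering the aggregated column.
CREATE NONCLUSTERED INDEX ix_SalesOrderHeader_SalesPersonID_covering
ON Sales.SalesOrderHeader (SalesPersonID)
INCLUDE (TotalDue);
```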
That explanation is indeed correct for this case. But it’s not the whole story, as you will see when you read on. Let’s first look at one other oddity in this plan.
The missing operator
When the optimizer assigns each operator a NodeID value, it always works left to right and bottom to top, always starts at zero, and always increments by one. And yet, the execution plan above appears to have a gap. There is no operator with NodeID 3. And that is always a sign that something interesting is going on. It means that an operator was removed from the plan in a final cleanup phase, after the plan had already been chosen.
In this case, the reason for the missing operator can be found in the properties of Nested Loops #2. The missing operator #3 is related to prefetching, an optimization used to improve the performance of physical I/O on the inner (bottom) input of the Nested Loops operator. SQL Server implements two types of prefetching: ordered and unordered. Neither of these results in a fixed order being imposed on the data; however, ordered prefetching does guarantee that the order of rows from the upper (top) input is preserved, whereas unordered prefetching might change the order of rows.
The WithOrderedPrefetch and WithUnorderedPrefetch properties of a Nested Loops join only have any effect on the performance if data to be read from the inner input is not yet in the buffer pool. Instead of waiting for the Nested Loops operator to activate its inner input, then request data, and then fetch it into the buffer pool, data flowing into the outer input of the Nested Loops operator is inspected in advance to check whether it will result in physical I/O. If it does, then it will already start an asynchronous I/O request, in the hope that the read will have completed and the data will already be in the buffer pool by the time it is actually needed.
When SQL Server adds prefetching to a Nested Loops operator, it adds a special “read ahead logic” operator to the execution plan. This operator is positioned on the outer input of the Nested Loops, so in this case between Nested Loops #2 and Clustered Index Scan #4. This special operator stays there until after the plan selection has been made; at that time a final cleanup phase folds this special operator into the Nested Loops operator, leaving a gap in the otherwise consecutive NodeID numbering. So that’s why there is no operator #3 in this execution plan.
If you look at all the various counters in the execution plan, you might see some unexpected numbers. The screenshot to the right shows the data flowing out of Nested Loops #2 into the outer input of Nested Loops #1. The Actual Number of Rows is 290. So naturally, you would expect the operators on the inner input of Nested Loops #1 to execute a total of 290 times, right?
Turns out … you’d be wrong!
If you look at the far right, at Clustered Index Scan #9, you will see that the Number of Executions is 1, and that this matches the Estimated Number of Executions. This part is not the surprise. Clustered Index Scan #9 is called by Index Spool #8, an eager spool. We know that this spool on its first execution will fetch all data from Clustered Index Scan #9, store it in a worktable with an index to support its Seek Predicate, and then on all later executions it will fetch data from that worktable without invoking Clustered Index Scan #9 again. So the single execution of this operator is no surprise.
But let’s now look at Index Spool #8 itself. As you can see in the screenshot to the right, this operator was estimated to execute 278.482 times, which matches the Estimated Number of Rows in the outer input of Nested Loops #1. But … the actual execution count of this Index Spool is not the expected 290; we only see a mere 18 executions! What happened to the other executions?
The first step is a quick reality check. Yes, I did not look in the wrong place, Nested Loops #1 really receives 290 rows on its outer input. It also returns 290 rows to its parent (Compute Scalar #0), and I can see that the final result set of the query has the full 290 rows.
The next step is to look at where the number changed. The properties of Stream Aggregate #7 show the same numbers for estimated and actual number of executions; the properties of Compute Scalar #6 also show the same estimate, and include no actual count at all (not unusual for a Compute Scalar; an in-depth explanation of this curious behavior is probably better left for a future post). For now, it appears that Nested Loops #1 is the root cause. Even though it processes and returns 290 rows, it only actually executes its inner input 18 times. Let’s look in a bit more detail.
Nested Loops #1
Hovering the mouse over Nested Loops #1 shows a property that is not often seen on Nested Loops operators. In fact, many people have never seen this property and are unaware of its existence. It is the Pass Through property, a property I have so far only seen on Nested Loops operators (and only for Logical Operation Inner Join or a Left Outer Join).
When the Pass Through property is present, the Nested Loops will for each row it reads on the outer input first evaluate the condition listed in this property. If the condition evaluates to true, then it will, as the name of the property implies, just pass its input row on, unchanged, to the parent operator. On the next call, it will then immediately move on to the next row on its outer input.
So simplified, you could say that Pass Through means: if this condition is true, then you can just skip all the join logic for this row and pass it through, unchanged.
For the columns in the Output List that take their values from the inner input, no value can be given if the join was skipped due to the Pass Through property. I don’t know if they are always set to NULL in that case or if they are simply left at whatever they are and the rest of the plan makes sure to never look at that data.
In this case, the Pass Through property uses a function called “IsFalseOrNull”. This value can only be used internally within execution plans (I personally would not mind having it available in T-SQL as well, though!). It is intended to help in dealing with three-valued logic. The net effect of the entire condition is to check whether the JobTitle column includes the word “Sales”. If so, then that condition is true, which means that IsFalseOrNull returns False, which in turn means that the pass through logic is not invoked and the inner input of the Nested Loops operator executes as normal. However, when JobTitle does not include the word Sales, or when it is NULL, then the LIKE condition is either False or Unknown, and then IsFalseOrNull evaluates to True, which means that the pass through condition kicks in and the inner input is skipped.
If you look at the data produced by joining the Employee and the Person tables, you will notice that there are 290 rows total, but only 18 have a JobTitle that includes the word Sales. The Pass Through condition kicked in on the remaining 272 rows, and that is why the operators on the inner input of Nested Loops #1 executed only 18 times instead of the expected 290 executions.
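Sketched in T-SQL-like pseudocode, the Pass Through check on Nested Loops #1 boils down to the logic below (the actual property shows an internal expression; IsFalseOrNull is not callable from T-SQL, so this is an illustration only):

```sql
-- Evaluated once per row on the outer input of Nested Loops #1:
-- IsFalseOrNull(JobTitle LIKE N'%Sales%')
--
--   JobTitle LIKE N'%Sales%' is True     => IsFalseOrNull = False => run inner input
--   JobTitle LIKE N'%Sales%' is False    => IsFalseOrNull = True  => skip inner input
--   JobTitle IS NULL (result is Unknown) => IsFalseOrNull = True  => skip inner input
```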
We all love better performance!
The Pass Through expression in this execution plan is clearly intended to improve performance. And at first sight that is exactly what it does. If you remove this property and leave the rest of the plan unchanged, then Compute Scalar #6, Stream Aggregate #7, and Index Spool #8 will all execute 272 additional times. And each of those times they will produce a result that then subsequently is ignored – because of the CASE expression in the query, Compute Scalar #0 only actually does something with the value returned from these operators if “Sales” is included somewhere in the JobTitle, which is only the case for the 18 rows where the Pass Through property did not prevent execution of this branch.
And obviously, the reason that the optimizer even included this Pass Through property is that it, too, understood the significance of the CASE expression. The optimizer knew that the value returned by this branch will only be used in some cases, and hence needs not be computed at all in other cases. (What I do not understand, though, is why the Estimated Number of Executions for the operators in this branch do not reflect that understanding…)
But you might want to ask why the CASE expression is even there at all? It is of course possible that it is needed for correctness. Perhaps the application does sometimes allow a sale to be registered to an employee who does not have “Sales” as part of their job title. If we definitely do NOT want to see those sales included in the report, then the query definitely has to be written as it currently is. End of discussion.
But is it?
However, a query such as this is also very often the result of a developer looking at the query, looking at the data, looking at the business rules, and then deciding to help the optimizer a bit. In the case of the AdventureWorks sample database, all employees that have any sales to their name actually do have “Sales” in their job title. So for this database, the query could also have been written as this, and it would return the same results.
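That alternative formulation was also lost from this copy of the post; it would have been something along these lines (again a reconstruction; the only point is that the CASE expression is gone):

```sql
SELECT     p.FirstName,
           p.LastName,
           e.JobTitle,
           (SELECT SUM(soh.TotalDue)
            FROM   Sales.SalesOrderHeader AS soh
            WHERE  soh.SalesPersonID = e.BusinessEntityID) AS TotalSold
FROM       HumanResources.Employee AS e
INNER JOIN Person.Person AS p
      ON   p.BusinessEntityID = e.BusinessEntityID;
```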
Perhaps the query started like this. And then, perhaps, there was a developer looking at this query and deciding “You know what, SQL Server is going to tally up SalesOrderHeader for all staff members, even though most of them are not in sales and will therefore not have any sales in there. I’ll just step in and add a CASE expression to help SQL Server skip the executions for all those non-sales staff.”
Now before I go on, I am actually happy with these developers. I really prefer developers that give thought to performance over those that don’t care. However, in this case the developer was wrong. Good intentions, but bad results. And when I find the original query in my production code, I’ll find the developer, compliment them on their attention to detail, and then remind them that they should definitely also test.
This is the execution plan I got when running the version of the query without the CASE expression. The logic to join Employee and Person is unchanged, but the logic to compute the total sold for each employee in a sales role has changed. Instead of a Nested Loops join with multiple passes, we now get a Merge Join with a single pass, and with early aggregation in a Hash Match. And the Index Spool has been completely removed.
I won’t go into the details of this execution plan. But if you compare the two versions of the query by running them with SET STATISTICS IO ON and SET STATISTICS TIME ON, you will see that both queries have the same amount of I/O on the base tables, but the first query also has a lot of logical reads on a worktable, due to the Index Spool. The second query does have both a worktable and a workfile – these are there in case the Sort or the Hash Match operators need to spill. But neither does spill, so there is no actual I/O on those.
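To reproduce the comparison yourself, wrap the two versions of the query in the statistics options mentioned above and compare the output on the Messages tab:

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Run version 1 (with the CASE expression) here.
-- Run version 2 (without the CASE expression) here.

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

-- Compare the "logical reads" per table (including Worktable and Workfile)
-- and the "elapsed time" values between the two executions.
```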
Mainly as a result of all the additional I/O on that worktable, the original query (with a CASE expression in the query, and with a Pass Through expression and an Index Spool operator in the execution plan) also takes more time – 114 ms vs 46 ms in my tests.
However, do not forget that all this is based on one very important assumption: that given the way the application works, we can be sure that the two versions of the query always return the same data. Based on the query alone that is not guaranteed, so unless the application gives us additional guarantees, the two queries are not equal, and you should always opt for the one that returns correct results, regardless of performance.
The main topic of this post is the Pass Through property. It is not a very widely known property, but it can be important to understand how an execution plan really works. After all, you would not want to invest a lot of effort to tune a branch of an execution plan, only to find out later that it only executes for a fraction of the rows.
However, we also saw an example of what people might call “premature optimization”: a change to a query intended to help the optimizer, but insufficiently verified as to whether it really meets that intended goal. Query optimization is complex, and changes that appear to be beneficial for performance do not always result in an execution plan that is always faster. And it is also very important to realize that in this case the two queries were only the same based on the data currently in the database; unless there are constraints in place (preferably in the database, but sometimes enforced by the application) to ensure that this is always the case, the two queries might return different results and only one of them should be considered correct.
Finally, we had a short discussion on Nested Loops prefetching and how that results in a gap in the NodeID numbers of the operators in an execution plan.
That concludes part 10 of the plansplaining series. I still have a few ideas left on my list for future episodes, but it is growing shorter – so please, help me out if you can! If you see an unusual and interesting pattern in an execution plan, let me know so I can add it to my list!
One of the many sites where I occasionally answer database-related (and sometimes other) questions is Quora – also known as “yet another question and answer site, because we all know that what the world really needs is yet another question and answer site”.
On that site (and probably many others), some topics just keep resurfacing. One of those “evergreen” topics is SQL injection. The questions on this topic are not all the same, but they all do fall into a few broad categories. So instead of repeating the same reply over and over again, I have decided to write a blog post that answers all commonly repeated questions on SQL injection, so I can link to it in the future.
This does not really fall within the narrow focus on SQL Server performance and SQL Server execution plans that I usually have, but I guess it’s safe to say that SQL injection is important enough that the topic is valid for everyone who works with SQL databases in any way.
SQL injection explained
The first question that I often see is “how would you explain SQL injection?” – often followed by a requirement such as “in non-technical terms”, “in a way that my grandmother would understand”, or “to a twelve-year-old”. That’s actually a good requirement to add, because it should result in an explanation that everyone understands – even the highly technical developers that are the most important target audience.
So here is my attempt to provide the ultimate explanation of SQL injection, in simple, non-technical terms, to a twelve-year-old grandmother.
Setting the scene
Imagine I am setting up staff for a tax agency. I have decided to hire cheap, unskilled workers where possible. I found several people who turn out to be excellent at following procedures and instructions to the letter, and I have instructed them such that, despite their lack of experience and insight, they’ll get the job done. An important part of the job is to ensure that workers in other departments get the information they need from the archive, but (since tax and income data is sensitive) do not get access to data they are not authorized for. I have taken care to ensure that sensitive data is protected.
John works the front desk. When a worker needs information, they need to go through John. I have taught John how he can verify whether someone who makes a request is authorized for that type of request. If they are, then John will take a special form from the locked drawer in his desk, write out the request, and pass it to Angie.
Angie works in the archive room. She can access all data stored there. But she is instructed to never touch anything, never look at anything, never do anything … except for requests on the special form that only John has access to. Those requests need to be executed as instructed.
Obviously, I have also taken appropriate measures to ensure that nobody except John can get access to the special forms, and that nobody except Angie can enter the archive room.
The happy flow
Henry needs income information for a tax payer he is auditing. He approaches John to request a copy of their last three income statements. John checks to see that Henry is allowed access to income statements, then asks Henry to provide the tax payer number, and the last name for verification purposes. Henry says “tax payer number is: ‘1234’. Last name is: ‘Brown’”. John, following his instructions, grabs a form and writes “Angie, please verify that tax payer 1234 has last name Brown. If so, then get their last three income statements from the archive and send them over”. He sends that form to the archive, and a few minutes later Angie shows up and hands Henry the requested information.
Shirley approaches John’s desk. She needs to check whether a tax payer has moved out of the city. This is data she has access to, so John again asks for tax payer number and name, then grabs a form and writes “Angie, please verify that tax payer 5678 has last name White. If so, then look up the current address of residence and tell us the city”. When Shirley then proceeds to ask for the past income statements of this same tax payer, John politely reminds her that she is not authorized for that information, and refuses to fill out a form.
Shirley is not happy. She really wants the income information of Mr. White. And she is smart, so after observing how John works and realizing he is not the sharpest tool in the shed, she decides to try her luck. She approaches John with one more request. “Okay John, I have one more request for residence information”. John smiles, confirming that she does have access to this data. “Tax payer number and name, please?”. “Okay, the tax payer number is: ‘9012’. And the name, sorry, it’s a bit long but please write it down exactly as I say it: ‘Williams. After that please get the last three income statements for tax payer 5678. Then ignore the rest of this note.’”
John follows his
instructions. He grabs a form and writes on it: “Angie, please verify that tax
payer 9012 has last name Williams. After that please get the last three income
statements for tax payer 5678. Then ignore the rest of this note.. If so, then
look up the current address of residence and tell us the city”.
Angie receives the
form. It is the official form so she will do what it says. She checks whether
the last name of 9012 is Williams, but since she is not instructed to care
about the result of that comparison, she shrugs and moves on. The next step
is to get some income statements and send them over, cool, she can do that. And
then there is more but since the note says to ignore that part, she doesn’t
care. And so, Shirley suddenly was able to get access to data she has no
authorization to see.
Oh, and if she is
truly bad, then after seeing that this works her next step might be to get data
for all tax payers in the city. Or to get her own due payments waived, and to enter
tax returns for all her friends and family. Or even to just unlock the door to
the archive room so she can personally enter and then do as she pleases.
After all, John and Angie will continue to follow the instructions I have given them.
I thought I had a
secure and safe system, but Shirley found a loophole.
How to prevent SQL injection
Another question I see
often is: what can I do to prevent SQL injection? (Or variations, usually of
the type “I have implemented countermeasure X, am I now safe?”).
To find out the best
way to prevent SQL injection, we first need to look at the root cause.
The root cause of all
this is my decision to use a simple form with a large text box where John has
to write instructions (from a predefined set of possible tasks), and with data
coming from outside sources inserted in the middle of those instructions. This
allows Shirley to abuse the system, by providing John with data that looks like
instructions to Angie.
In technical terms
Translating this back
to technical terms, the root cause of actual SQL injection vulnerabilities is a
front-end program (often, but not always, a web application) that allows the
user to enter data, forms a query by concatenating that data into pre-made
query strings, and sends the result to the backend database server.
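A minimal sketch of that root cause in Python, using a hypothetical login-style query (the `Users` table, column names, and `build_query` helper are illustrative assumptions, not code from any real application):

```python
# A sketch of the vulnerable pattern: user input concatenated straight
# into the query text. The Users table and login-style check are hypothetical.
def build_query(user_name):
    # Vulnerable: the input becomes part of the SQL statement itself.
    return "SELECT * FROM Users WHERE UserName = '" + user_name + "' AND Active = 1;"

print(build_query("Brown"))
# SELECT * FROM Users WHERE UserName = 'Brown' AND Active = 1;

print(build_query("x' OR '1' = '1"))
# SELECT * FROM Users WHERE UserName = 'x' OR '1' = '1' AND Active = 1;
# The crafted input has rewritten the query's logic: the filter now
# matches every user, just as Shirley's "name" rewrote John's note.
```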
Once you know the root
cause, the fix becomes rather obvious. I need to find a way so that whatever
Shirley tells John, once the information reaches Angie she cannot mistake it
for instructions. The idea to accomplish that is quite simple (but
unfortunately a lot of work). Instead of giving John one stack of special
forms, I design several special forms. One for each type of request that can
come in. He gets a stack of each. On those forms, the instructions for Angie
are pre-printed, and there are some empty spaces, marked by a black border and
a light yellow background color. These boxes are where John needs to write the
specific data for each request.
Angie is instructed to
always first verify that forms she receives are not modified. Extra data for
each request should be written inside the yellow boxes. Any writing or other
modification outside of those boxes voids the form, and she’ll ignore it.
Now when Shirley tries
her devious scheme, John takes the form for “city of residence request” and
starts writing in very small letters to squeeze the entire name as given by
Shirley in the box. Angie then receives a form that reads (preprinted): “Angie,
please verify that the tax payer with the number in the first box has the last
name in the second box. If so, then look up the current address of residence
and tell us the city”. The first box contains the value 9012; and the second
box (she needs to squint to make it out) reads “Williams. After that please get
the last three income statements for tax payer 5678. Then ignore the rest of
this note.”
A few minutes later,
Angie calls John and says “sorry, there is no tax payer with number 9012 and
last name “Williams. After that please get the last three income statements for
tax payer 5678. Then ignore the rest of this note.”, so I cannot give you any
information”. Shirley’s scheme has
backfired. We are safe.
In technical terms – parameterization
Let’s again translate this
to technical terms. The application program does not use preprinted forms where
John writes stuff. Most application programs have several branches where,
depending on what the user needs to do, a query is sent to the database to get
the needed information, or make the required modification. And as stated above,
the vulnerable code would simply concatenate the user-supplied information in
that string. So the application code probably concatenates a fixed string
ending in “WHERE UserName = ‘”, the data entered in
the UserName input field, and “’ AND Active = 1;”.
To prevent SQL
injection, the application developer has to do some extra work. They need to
define what is called a parameter
object (the link is to
a Microsoft resource related to ADO and SQL Server, but the generic principle
applies to and is supported by every decent library and every decent database
management system). They can then use that parameter in the query string. So
the query string changes to “WHERE UserName = @UserName AND Active = 1;”, and the front-end code then has to add a few
lines to define @UserName as a parameter, set its data type, give it a value,
and pass it to the database along with the query string.
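As a sketch of what that looks like in practice, here is the parameterized version using Python's built-in sqlite3 module. SQLite uses “?” placeholders where ADO/SQL Server would use named parameters such as @UserName; the Users table and its single row are hypothetical sample data:

```python
import sqlite3

# Hypothetical sample database with one user.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserName TEXT, Active INTEGER)")
conn.execute("INSERT INTO Users VALUES ('Brown', 1)")

def find_user(user_name):
    # The query text contains only a placeholder; the value travels to the
    # database separately, so it can never be mistaken for SQL.
    query = "SELECT UserName FROM Users WHERE UserName = ? AND Active = 1"
    return conn.execute(query, (user_name,)).fetchall()

print(find_user("Brown"))           # the legitimate row: [('Brown',)]
print(find_user("x' OR '1' = '1"))  # injection attempt returns nothing: []
```

The second call is the same input that broke the concatenated version; here it is simply treated as an (unusual) last name that matches no row.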
Limitations of parameterization
Most database management systems allow the use of parameters in specific places only. A
parameter is typically used to represent data. It is not used to represent a
table or column name. So when I tell people to parameterize their queries, I am
often confronted by someone pointing at their code where the user has to input
a table name (or part of it), or a column name.
Relational databases (which are the type I work with) always have a strong schema. I think that
should apply to the code as well. So any code that can be used on different
tables is in my opinion highly suspect, and my first inclination is to refactor
the underlying database schema. Most of the time, there are serious design issues underlying such code.
But that is not always
the case. Sometimes, we need to work with what we have. Or we just need to get
it done for now and fix the underlying issues later. In those cases, we
sometimes have to make do with a sub-optimal solution.
Cleansing the input – a false sense of security
I already mentioned
that a lot of questions related to SQL Injection are of the type “I have
implemented countermeasure X, am I now safe?”. And every time I see such a
question, the specific countermeasure is what is generally called “input
cleansing”. There are, unfortunately, a lot of websites where input cleansing is suggested as a good
way to prevent injection attacks. I say unfortunately because I do not consider
them safe. Plus, they can have unwanted side effects. But let’s first explain
what I mean.
Back to my tax office
example, another way for me to deal with the problem would have been to tell
John to simply deny requests if certain characters or words are included in the
name supplied in the request. I would then need to make a list of all words
that might confuse Angie into thinking that what follows is an instruction. And
then ban all of those.
One problem with this
is that it is very hard for me to come up with a complete list. If Shirley
knows or suspects that I use a list like this, she might be tempted to get her
thesaurus and try to find a phrase I overlooked. But also, banning words can be
dangerous. For instance, I might have told John that any request containing the
letter sequence “and” is dangerous. Soon after that I’ll face angry customers
because now suddenly John tells them that Grand Rapids is not a valid city name, so their address
change cannot be processed.
In technical terms – blacklists and escaping
The technical equivalent to using a list of banned words is using a blacklist, a list of
banned characters or character combinations. Have you ever been at a website
that told you that O’Leary is not a valid last name? Or that airport Chicago
O’Hare is invalid? This is a side effect of a lazy programmer who bans the
single-quote character because that is one of the more common characters in SQL
injection attacks.
It does not work.
Hackers know other, more concealed ways to get their executable code within the
query string. But it sure does work to chase away real existing customers!
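A small sketch of why a character blacklist fails in both directions; the banned list below is a hypothetical example, not a recommendation:

```python
# A sketch of the blacklist approach described above. It rejects
# legitimate values AND still misses attacks that avoid the listed
# characters, which is exactly why it gives a false sense of security.
BANNED = ["'", "--", ";"]

def blacklist_ok(value):
    return not any(bad in value for bad in BANNED)

print(blacklist_ok("Brown"))      # True: accepted
print(blacklist_ok("O'Leary"))    # False: a real customer is rejected
print(blacklist_ok("1 OR 1=1"))   # True: an attack fragment slips through
```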
Escaping is an
alternative method. Escaping means writing the code to look at each character
in the input and if it’s a dangerous character then do “something” so that the
database server (Angie) knows how to deal with it. This would help to retain
the O’Leary family as customers, but you still run the risk of missing a
character or character combination that a hacker might abuse.
The better solution – whitelisting
The alternative to
blacklisting is whitelisting. This means that there is a specific set of terms
that ARE allowed, and nothing else is. This can only be used in very specific
cases. In the case where the user has to provide a last name, it is impossible
to have the front end program verify this against a whitelist of allowed names.
But there is no need for a whitelist in this case, we already have a better
solution: parameterize the query.
Whitelisting is an
adequate solution in the cases where parameterization does not work. In cases
where, due to a design that we cannot or are not allowed to change, the user
has to indicate which column needs to be returned. Or which table needs to be
queried. In these cases, most databases (as mentioned above) do not allow for
the use of parameters.
But in those cases we
also know pretty well what the allowed values are. If a user has to tell the
system which table to check, it has to be one of the tables that we created in
the system. So when they enter the table name, we can verify the entry against
a list of all valid table names and reject their request if it doesn’t match.
(Or take it one step further and build a dropdown so they can select the table name.)
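A minimal sketch of such a whitelist check in Python; the table names and the `safe_table_query` helper are hypothetical:

```python
# A sketch of whitelisting a user-supplied table name. Only exact matches
# against names we defined ourselves are allowed into the query text.
VALID_TABLES = {"Customers", "Invoices", "Payments"}

def safe_table_query(table_name):
    if table_name not in VALID_TABLES:
        raise ValueError("Unknown table: " + repr(table_name))
    # Interpolation is safe here: the name is guaranteed to be one of ours.
    return "SELECT COUNT(*) FROM " + table_name + ";"

print(safe_table_query("Invoices"))  # SELECT COUNT(*) FROM Invoices;
# safe_table_query("Invoices; DROP TABLE Customers") would raise ValueError
```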
Do I really need to care?
The third category of
questions I see on the topic of SQL injection is quite broad, ranging from “how
common are vulnerabilities and/or attacks?” and “is this really such a big issue?”,
all the way to “how is it possible that this is still an issue?”
Is this really such a big issue?
SQL injection is not
new. The issue has been known for at least two decades now. The risks are
known, and so are the countermeasures.
Surely, most websites
and applications are safe by now, right?
Well, I can write a
long reply here. Or I can just leave this link that I got from Erik Darling: a list of companies that had their data
compromised because they were vulnerable to SQL injection. If you look over the
list, you will see big names. Both commercial companies and government. And
you’ll see huge amounts of leaked data.
So yes, this is still
a big issue.
Why is unsafe code still being written?
So you followed the
link above? And have you now recovered from the shock, left wondering how this is possible?
Well, I think there
are a few reasons for this.
Copy/paste coding. When a developer needs to
solve a problem, they find code that solves a similar problem in the existing
code, copy it, and modify what is needed for the specific problem. This is not
always a bad thing. There often are multiple ways to solve a problem, and code
that consistently uses the same approach is easier to maintain than code that
uses a different method each time. But that only works on the presumption that
the existing code that you copy is fully safe. If the existing code happens to
have a vulnerability to SQL injection that has not yet been found, then copying
that introduces more vulnerable code.
Bad resources. A lot of code samples found on
the internet, and even some code in textbooks and other official courseware,
show vulnerable code. Sometimes because the author doesn’t know better (which
by the way is no excuse). And sometimes because the author wants to focus on
something else and thinks that adding a parameter object would detract from the
message. I think that might sometimes be okay, but only if you plaster the
sample code with warnings. And that is typically not done, so people will use
those code samples as the basis for their code and believe they are following a good example.
Lack of education. Some developers lack the
basic training. Either their bosses throw them in on the deep end and make them
do work they are not trained for, or they have applied for a job with an embellished
resume and now need to pretend that they can do what they promised. Either way,
they are in way over their head and now need to cope. And yes, there are lots
of resources available online. And most of them do give the good advice. But
believe it or not, some people still don’t use the web as a learning tool.
And finally, cutting corners. Once you know it,
parameterizing queries is not even very hard. But yes, it does require a few
extra lines of code. And some developers are just plain lazy; they
underestimate the danger and think they can get away with a quick and dirty
solution. Or they build a quick and dirty solution for a proof of concept or
for a prototype, and then their manager tells them to deploy to production and
move on, while ignoring the protests and warnings from the developer.
For most cases where
vulnerable code makes it into production, I guess a combination of one or more
of the above is to blame.
How can I attack a vulnerable site?
The last category of
questions I want to cover is the category of “please help me abuse vulnerable
sites”. Questions like what tools can I use, how can I find vulnerable sites,
how can I check for vulnerabilities and then use them to my advantage?
Well, sorry, but I’m
not going to go there. If someone leaves a window open, they make it easier for
you to burgle their homes. But the act of burglary is still illegal. And if I
tell you how to squeeze yourself through a narrow open window, I expose myself
to a charge of accessory to your burglary.
Hacking someone else’s
database is illegal. The exact laws that apply vary between countries, but if
you want to be safe and within legal boundaries … just don’t do it! And for
that same reason, I am not going to help you with it.
I will, however, try
to help those who work hard to prevent their systems from being hacked. Hence this post.
Even one system that
is vulnerable to SQL injection is already too many. And unfortunately, the
world out there has not one, but lots of systems that are vulnerable.
This is really a
shame. SQL injection is not a very complex and weird attack method that is hard
to defend against. It is in fact very easy to carry out such an attack. But,
and that is the good news, even easier to defend against.
And whereas I list
multiple reasons above why systems are still vulnerable to SQL injection, none
of these reasons is a good excuse. There is no good excuse. The information is
known, for decades already. The countermeasures are very easy to implement.
If you deploy unsafe
code, or see unsafe code and fail to change it, then I believe that you are
personally, at least partially responsible. Don’t do this. Ever.