Month: February 2019

T-SQL Tuesday #111: Why, tell me why

Time flies. It feels like the new year has just started, and yet we’re already at the second T-SQL Tuesday of the year. Our host this February is Andy Leonard (b|t), and his assigned topic is, simply: Why?

Or rather: What is your why? What motivates you, what makes you tick? And most of all: What makes you do the things you do?

The Execution Plan Reference

I do a lot of things. I have a one-person consultancy business, I have a wife and kids, I have a personal life with several hobbies, and I could talk about my motivation for all those choices, and more. But I decided to focus this post on one thing, and one thing only: why do I build the Execution Plan Reference?

Researching, verifying, and writing down all the information in the Execution Plan Reference takes time. A lot of time. And I get nothing in return. I am not paid by anyone for the time I work on this. The site doesn’t run ads, so I get nothing when people use it. In fact, I even have to pay the hosting cost out of my own pocket. It’s easy to understand why people would want to know what drives me to do this.

Inspiration

A full answer to any “why” question usually involves more than just a motivation. Yes, a motivation is the reason why you decide to do something once you have the idea. And what makes you continue to do it after the initial enthusiasm wears off. But what is it that gives you the idea in the first place? That can be something as simple as “someone asked me to”, but that is not where I got the idea for the Execution Plan Reference

Training day

The first time I gave a pre-conference session / training day / full-day workshop (or whatever other name you want to give it) on execution plans was back in 2014. I have given it several times since, and I am still submitting it to conferences. Obviously, the content has changed significantly over the past five years. But the main outline has not. I am convinced that people are smart enough that they can read and understand all execution plans, when given the proper knowledge and tools.

There is this well known saying “give a man a fish and he has food for a day; teach a man to fish and he will never be hungry”. My philosophy takes this even one step further. I don’t show people the best fishing spots. I teach people how fish behave, how weather, climate, and other factors affect them, and how that determines where they’ll go – so they will always be able to work out the best fishing spots, even when conditions change.

When I teach about execution plans, I spend only a little time on execution plans as a whole. Most of the day is then spent on individual operators. Because that is key to understanding execution plans. If you don’t understand what an operator does, how it behaves, you’ll never fully understand the execution plans it appears in. And conversely, if you understand the behavior of all operators in an execution plan, you should always be able to work out how the execution plan as whole works. And assuming you’re looking at the execution plan of a slow query (which for most people is the only reason to ever look at an execution plan), that should then also explain why it is slow.

Pluralsight

For those who never heard of Pluralsight, this is a company that sells access to online training videos. I have created a course on relational database design for them. And after the first few deliveries of my full-day training, I started talking to Pluralsight to see if they would be interested in creating a course on execution plans, based on my training day materials but even deeper and even more complete. There is only so much I can do in a single day, and I was excited by the idea of being able to go even deeper, be even more complete, in an online course.

Pluralsight and I did reach an agreement, and I started work, but due to reasons in my personal life I was unable to meet deadlines and then the contract expired. I later tried to renegotiate, but we never came to a mutually acceptable agreement anymore. I was still in a rough spot in my personal life, so I could not commit to even the most generous deadlines Pluralsight offered. And at the same time, Pluralsight noticed that the market has much more interest in short courses and short clips, then in the long and in-depth material I wanted to created, so their interest in my content waned.

Grant Fritchey

Once I realized that my proposed course would never go to Pluralsight, I did explore some alternatives for a short while, such as starting a YouTube channel – but I quickly realized that I lack the resources and skills to make good enough content to ever make that work. So I started to come to terms that perhaps I would never be able to make my training day material available outside of the limited circle of participants.

But at that time, I was already working in a tech editor role on Grant Fritchey’s excellent book “SQL Server Execution Plans, Third Edition”. And I have lost track of how often I was looking at a code sample and execution plan, reviewing Grant’s description, and then adding a comment that started with “Okay Grant, this is well beyond the scope of what we want in the book but you are going to love this. I noticed that …”

But I realized that Grant is not the only one who loves those details. There are more geeky people out there that love that kind of information. And there are even more people out there that don’t care about it, until they run into an issue and need exactly that information to help them over the bump. The book was not the right place to put that information. But if I kept it only in my head, or only between Grant and me, then the SQL Server community would never get to benefit from the effort I put into discovering all that stuff. I had to find a way to make this knowledge available to the community.

Exactly that moment, one of the many times I was editing Grant’s book and finding stuff that was too deep for the book but still very relevant, is when I got the idea of the Execution Plan Reference. It was not called that initially. I didn’t know yet what form it would take. But I did realize: I have knowledge in my head that not many people have. That information needs to be made available to the SQL Server community. And if I don’t do it, then who will?

That’s where I got the inspiration. Now let’s review the motivation.

Relevance

We all love good performance. We all hate slow queries. So when we see slow queries, we want to fix them. And it’s not just us. End users hate it if they have to wait for their screen to refresh. Managers hate applying for faster hardware. And if you think you can just migrate to the cloud and then adjust the performance slider when stuff is slow, you’ll soon learn that the CFO starts to hate you after receiving the next Azure bill.

There are lots of reasons why people want to tune slow workloads. And there are lots of ways in which people do this tuning. But not all are equally good.

We have all been in the possession of devices that sometimes mysteriously stop working, but magically start working again after a good slam with a hammer. Well, usually. Not always. And if they still refuse after multiple slams, there is no other alternative but to bring them in for repairs. A technician then opens the device, does some measurements and other smart research stuff, and then sees a place where two wires are touching to cause a short-circuit. A good slam would often separate them again, for a time. But this technician would then reattach those wires in their proper place and now they can’t even short-circuit anymore.

Many of the tuning tools and methods I find when I search the web, when I go to conferences, when I read blogs, or even when I look at official Microsoft material, is like slamming the device with a hammer. It works, often, but you have no clue why, and because you don’t really fix the root issue, it might resurface when you least expect it. Execution plans are for us what an ohmmeter is for the technician. Execution plans are the tool we can use to find out exactly what is going wrong. And once we know that, we can fix the root cause in a way that is more future proof then just trying some indexes or doing some random rewrites of our query.

Books Online

Given the relevance of execution plans for understanding and fixing performance issues, one would expect that Microsoft gives this tool a lot of live. And to a part, that is true – recent released of SQL Server have added more and more information to the execution plans, making them even better (lots better!) as a tuning instrument than they were before.

But this love does not extend to documentation. The SQL Server Documentation (previously known as Books Online) is a great resource. It really is. But not when it comes to execution plans.

Books Online has just one single page on execution plan operators. Just one. And it’s not even good. It still uses the pre-2018 versions of the icons. Some of the information there is severely outdated. Some operators are missing. And the worst part, this page even includes a few completely false and misleading statements!

None of that by itself is sufficient motivation to build a “competing” site. There are feedback mechanism that I could have used to try to get this information corrected. But the worst thing, the thing that really bugs me about the execution plan information in Books Online, is that this is the only information they have! There is so much information not included in Books Online at all, that it is simply impossible to understand many execution plans based on this information only.

Other sources

There are of course other sources. There is lots of information available on blogs. Much of it even on official Microsoft blogs from former and current employees. And even more on blogs from other people.

But if something is worth describing, it is worth describing in a simple, easily accessible location. If you are working on a slow query, and you see an operator in the execution plan that you don’t really understand, do you want to have to search the internet, read a dozen or so different blogs and whitepapers, try to filter out the good stuff from the mistakes, and then try to piece everything together? Or do you want all the relevant information at your fingertips?

I believe that Microsoft should have done a much better job of documenting execution plans. But they didn’t. And I can’t change that they don’t. I can, however, change that nobody does. So now I’m doing it.

Conclusion

When I read Andy’s question, two associations popped in my head. The first is a song that (unfortunately) was very popular in the Netherlands when it was released. I heard it way more often than I wanted to on the radio. I have deliberately not included a link because I don’t want anyone to blame me after clicking it. But I’ve had it stuck in my head for almost a week now, so thanks Andy!

But the question also made me think about a story I once heard, about George Mallory, an English mountaineer. In the years 1921 to 1924, he participated in three expeditions to Mount Everest, eventually dying on his third and last attempt. In 1922, when asked why he put all that effort into attempting to climb Mount Everest, his famous answer was: “Because it’s there”.

I am not a mountaineer. I have never liked the cold, I do like some hiking, but in very moderate amounts. And I am by now starting to reach the age where I have to pause for a breather halfway each flight of stairs. So I guess George Mallory and I are as far apart as two people can be. And that extents to motivation. My main reason for building the Execution Plan Reference is the opposite of George’s answer. I am building it, because it’s NOT there.

Tuning queries is important. The only really dependable tool for tuning queries is looking at, and understanding, execution plans. For that understanding, documentation is required. And that documentation is unfortunately not made available by Microsoft.

So why am I building the Execution Plan Reference? Because it’s not there. And because I believe it should.

Plansplaining, part 10. Just passing through

Welcome to part ten of the plansplaining series. Each of these posts takes an execution plan with an interesting pattern, and details exactly how that plan (or pattern) works.

In this post we will look at a query and execution plan that may appear perfectly normal and unexpected at first sight, but that has some perhaps confusing execution counts.

Sample query

The sample query below will (as almost always) run in all versions of the AdventureWorks sample database. It returns a list of all staff, and for people that have a sales-related job title it adds the total of sales they made.

The execution plan for this query (captured on SQL Server 2017) is shown below; as always I added numbers (equal to the NodeID of each operator) to the screenshot, to make it easier to refer to operators within this plan.

At first sight, this execution plan is not very surprising. At the top right, we see a scan on the Employee table that drives a Nested Loops join into a seek on Person – this is a very simple and direct implementation of the join in the query. The result then goes through another Nested Loops, into a branch that reads SalesOrderHeader, aggregates it, and then does a computation – this obviously has to be the subquery within the CASE expression. Finally, Compute Scalar #0 probably does the logical evaluation of the CASE. There, without even looking at a single property I already have a fair idea what this execution plan does.

The index spool

The above explanation doesn’t mention Index Spool #8. The most common cause for Index Spool operators in a plan is when SQL Server wants an index really bad. The SalesOrderHeader table does have an index on the SalesPersonID column but it is not covering for this query, so the optimizer was faced with the choice between either using that index and accepting the cost of a lookup, or using a scan … and then it decided to go for door number three, the Index Spool that basically builds the index on the spot and throws it away when the query is done.

That explanation is indeed correct for this case. But it’s not the whole story as you will see when you read on. But let’s first look at one other oddity in this plan.

The missing operator

When the optimizer assigns each operator a NodeID value, it always works left to right and bottom to top, always starts at zero, and always increments by one. And yet, the execution plan above appears to have a gap. There is no operator with NodeID 3. And that is always a sign that something interesting is going on. It means that an operator was removed from the plan in a final cleanup phase, after the plan had already been chosen.

In this case, the reason for the missing operator can be found in the properties of Nested Loops #2. The missing operator #3 is related to prefetching, an optimization used to improve the performance of physical I/O on the inner (bottom) input of the Nested Loops operator. SQL Server implements two types of prefetching: ordered and unordered. Neither of these result in a fixed order being imposed on the data, however ordered prefetching does guarantee that the order of rows from the upper (top) input is preserved, whereas unordered prefetching might change the order of rows.

The WithOrderedPrefetch and WithUnorderedPrefetch properties of a Nested Loops join only have any effect on the performance if data to be read from the inner input is not yet in the buffer pool. Instead of waiting for the Nested Loops operator to activate its inner input, then request data, and then fetch it into the buffer pool, data flowing into the outer input of the Nested Loops operator is inspected in advance to check whether it will result in physical I/O. If it does, then it will already start an asynchronous I/O request, in the hope that the read will  have completed and the data will already be in the buffer pool by the time it is actually needed.

When SQL Server adds prefetching to a Nested Loops operator, it adds s special “read ahead logic” operator in the execution plan. This operator will be positioned at the outer input of the Nested Loops, so in this case between Nested Loops #2 and Clustered Index #4. This special operator stays there until after the plan selection has been made; at that time a final cleanup phase will fold this special operator into the Nested Loops operator, leaving a gap in the otherwise consecutive NodeID numbering. So that’s why there is no operator #3 in this execution plan.

If you want to learn more about prefetching, I recommend reading Craig Freedman’s blog post, Paul White’s blog post, and Fabiano Amorim’s article. The rest of this blog post will focus on other areas of this execution plan.

How many executions?

If you look at all the various counters in the execution plan, you might see some unexpected numbers. The screenshot to the right shows the data flowing out of Nested Loops #2 into the outer input of Nested Loops #1. The Actual Number of Rows is 290. So naturally, you would expect the operators on the inner input of Nested Loops #1 to execute a total of 290 times, right?

Turns out … you’d be wrong!

If you look at the far right, at Clustered Index Scan #9, you will see that the Number of Executions is 1, and that this matches the Estimated Number of Executions. This part is not the surprise. Clustered Index Scan #9 is called by Index Spool #8, an eager spool. We know that this spool on its first execution will fetch all data from Clustered Index Scan #9, store it in a worktable with an index to support its Seek Predicate, and then on all later executions it will fetch data from that worktable without invoking Clustered Index Scan #9 again. So the single execution of this operator is no surprise.

But let’s now look at Index Spool #8 itself. As you can see in the screenshot to the right, this operator was estimated to execute 278.482 times, which matches the Estimated Number of Rows in the outer input of Nested Loops #1. But … the actual execution count of this Index Spool is not the expected 290; we only see a mere 18 executions! What happened to the other executions?

The first step is a quick reality check. Yes, I did not look in the wrong place, Nested Loops #1 really receives 290 rows on its outer input. It also returns 290 rows to its parent (Compute Scalar #0), and I can see that the final result set of the query has the full 290 rows.

The next step is to look where the number changed. The properties of Stream Aggregate #7 show the same numbers for estimated and actual number of executions; the properties of Compute Scalar #6 also show the same estimate, and include no actual count at all (not unusual for a Compute Scalar; an in-depth explanation of this curious behavior is properly better left for a future post). For now, it appears that Nested Loops #1 is the root cause. Even though it processes and returns 290 rows, it only actually executes its inner input 18 times. Let’s look in a bit more detail.

Nested Loops #1

Hovering the mouse over Nested Loops #1 shows a property that is not often seen on Nested Loops operators. In fact, many people have never seen this property and are unaware of its existence. It is the Pass Through property, a property I have so far only seen on Nested Loops operators (and only for Logical Operation Inner Join or a Left Outer Join).

When the Pass Through property is present, the Nested Loops will for each row it reads on the outer input first evaluate the condition listed in this property. If the condition evaluates to true, then it will, as the name of the property implies, just pass its input row on, unchanged, to the parent operator. On the next call, it will then immediately move on to the next row on its outer input.

So simplified, you could say that Pass Through means: if this condition is true, then you can just skip all the join logic for this row and pass it through, unchanged.

For the columns in the Output List that take their values from the inner input, no value can be given if the join was skipped due to the Pass Through property. I don’t know if they are always set to NULL in that case or if they are simply left at whatever they are and the rest of the plan makes sure to never look at that data.

In this case, the Pass Through property uses a function called “IsFalseOrNull”. This value can only be used internally within execution plans (I personally would not mind having it available in T-SQL as well, though!). It is intended to help in dealing with three-valued logic. The net effect of the entire condition is to check whether the JobTitle column includes the word “Sales”. If so, then that condition is true, which means that IsFalseOrNull returns False, which in turn means that the pass through logic is not invoked and the inner input of the Nested Loops operator executes as normal. However, when JobTitle does not include the word Sales, or when it is NULL, then the LIKE condition is either False or Unknown, and then IsFalseOrNull evaluates to True, which means that the pass through condition kicks in and the inner input is skipped.

If you look at the data produced by joining the Employee and the Person tables, you will notice that there are 290 rows total, but only 18 have a JobTitle that includes the word Sales. The Pass Through condition kicked in on the remaining 272 rows, and that is why the operators on the inner input of Nested Loops #1 executed only 18 times instead of the expected 290 executions.

We all love better performance!

The Pass Through expression in this execution plan is clearly intended to improve performance. And at first sight that is exactly what it does. If you remove this property and leave the rest of the plan unchanged, then Compute Scalar #6, Stream Aggregate #7, and Index Spool #8 will all execute 272 additional times. And each of those times they will produce a result that then subsequently is ignored – because of the CASE expression in the query, Compute Scalar #0 only actually does something with the value returned from these operators if “Sales” is included somewhere in the JobTitle, which is only the case for the 18 rows where the Pass Through property did not prevent execution of this branch.

And obviously, the reason that the optimizer even included this Pass Through property is that it, too, understood the significance of the CASE expression. The optimizer knew that the value returned by this branch will only be used in some cases, and hence needs not be computed at all in other cases. (What I do not understand, though, is why the Estimated Number of Executions for the operators in this branch do not reflect that understanding…)

But you might want to ask why the CASE expression is even there at all? It is of course possible that it is needed for correctness. Perhaps the application does sometimes allow a sale to be registered to an employee who does not have “Sales” as part of their job title. If we definitely do NOT want to see those sales included in the report, then the query definitely has to be written as it currently is. End of discussion.

But is it?

However, a query such as this is also very often the result of a developer looking at the query, looking at the data, looking at the business rules, and then deciding to help the optimizer a bit. In the case of the AdventureWorks sample database, all employees that have any sales to their name actually do have “Sales” in their job title. So for this database, the query could also have been written as this, and it would return the same result.

Perhaps the query started like this. And then, perhaps, there was a developer looking at this query and deciding “You know what, SQL Server is going to tally up SalesOrderHeader for all staff members, even though most of them are not in sales and will therefor not have any sales in there. I’ll just step in and add a CASE expression to help SQL Server skip the executions for all those non-sales folk!”

Now before I go on, I am actually happy with these developers. I really prefer developers that give thought to performance over those that don’t care. However, in this case the developer was wrong. Good intentions, but bad results. And when I find the original query in my production code, I’ll find the developer, compliment them on their attention to detail, and then remind them that they should definitely also test.

This is the execution plan I got when running the version of the query without the CASE expression. The logic to join Employee and Person is unchanged, but now the logic to compute the total sold for each employee in a sales role has changed. Instead of a Nested Loops with multiple passes. We now get a Merge Join with a single pass, and with early aggregation on a Hash Match. And the Index Spool has been completely removed.

I won’t go into the details of this execution plan. But if you compare the two versions of the query by running them with SET STATISTICS IO ON and SET STATISTICS TIME ON, you will see that both queries have the same amount of I/O on the base tables, but the first query also has a lot of logical reads on a worktable, due to the Index Spool. The second query does have both a worktable and a workfile – these are there in case the Sort or the Hash Match operators need to spill. But neither does spill, so there is no actual I/O on those.

Mainly as a result of all the additional I/O on that worktable, the original query (with a CASE expression in the query, and with a Pass Through expression and an Index Spool operator in the execution plan) also takes more time – 114 ms vs 46 ms on my laptop.

However, do not forget that all this bases on one very important assumption: that given the way the application works, we can be sure that the two versions of the query always return the same data. Based on the query alone that is not guaranteed, so unless the application gives us additional guarantees the two queries are not equal, and you should always opt for the one that returns correct results, regardless of performance.

Conclusion

The main topic of this post is the Pass Through property. It is not a very widely known property, but it can be important to understand how an execution plan really works. After all, you would not want to invest a lot of effort to tune a branch of an execution plan only to find out later that it only executes for a fraction of the rows.

However, we also saw an example of what people might call “premature optimization”: a change to a query intended to help the optimizer but insufficiently verified as to whether it really meets that intended goal. Query optimization is complex, and changes that appear to be beneficial for performance do not always result in an execution plan that is always faster. And it is also very important to realize that in this case the two queries were only the same based on the data currently in the database; unless there are constraints in place (preferably in the database, but sometimes enforced by the application) to ensure that this is always the case, the two queries might return different results and only one of them should be considered correct.

Finally, we had a short discussion on Nested Loops prefetching and how that results in a gap in the NodeID numbers of the operators in an execution plan.

That concludes part 10 of the plansplaining series. I still have a few ideas left on my list for future episodes, but it is growing shorter – so please, help me out if you can! If you see an unusual and interesting pattern in an execution plan, let me know so I can add it to my list!

Tags:

SQL injection

One of the many sites where I occasionally answer database-related (and sometimes other) questions is Quora – also known as “yet another question and answer site because we all know that what the world really needs is yet another question and answer site”.

On that site (and probably many others), some topics just keep resurfacing. One of those “evergreen” topics is SQL injection. The questions on this topic are not all the same, but they all do fall into a few broad categories. So instead of repeating the same reply over and over again, I have decided to write a blog that answers all commonly repeated questions on SQL injection, so I can then in the future link to it.

It is not really within the narrow target focus of SQL Server performance and SQL Server execution plans that I usually have, but I guess it’s safe to say that SQL injection is important enough that the topic is valid for everyone who works with SQL databases in any way.

SQL injection explained

The first question that I often see is “how would you explain SQL injection?” – often followed by a requirement such as “in non-technical terms”, “in a way that my grandmother would understand”, or “to a twelve-year-old”. That’s actually a good requirement to add, because that should result in an explanation that everyone understands – even the highly technical developers that are the most important target audience.

So here is my attempt to provide the ultimate explanation of SQL injection, in simple, non-technical terms to a twelve-year-old grandmother.

Setting the scene

Imagine I am setting up staff for a tax agency. I have decided to hire cheap, unskilled workers where possible. I found several people that turn out to be excellent at following procedures and instructions to the letter, and I have instructed them such that, despite their lack of experience and insight, they’ll get the job done. An important part of the job is to ensure that workers in other departments get the information they need from the archive, but (since tax and income data is sensitive) will not get access to data they are not authorized for. I have taken care to ensure that sensitive data is protected.

John works the front desk. When a worker needs information, they need to go through John. I have taught John how he can verify whether someone who makes a request is authorized for that type of request. If they are, then John will take a special form from the locked drawer in his desk, write out the request, and pass it to Angie.

Angie works in the archive room. She can access all data stored there. But she is instructed to never touch anything, never look at anything, never do anything … except for requests on the special form that only John has access to. Those requests need to be executed as instructed.

Obviously, I have also taken appropriate measures to ensure that nobody except John can get access to the special forms, and that nobody except Angie can enter the archive room.

The happy flow

Henry needs income information for a tax payer he is auditing. He approaches John to request a copy of their last three income statements. John checks to see that Henry is allowed access to income statements, then asks Henry to provide tax payer number, and last name for verification purposes. Henry says “tax payer number is: ‘1234’. Last name is: ‘Brown’”. John, following his instructions, grabs a form and writes “Angie, please verify that tax payer 1234 has last name Brown. If so, then get their last three income statements from the archive and send them over”. He sends that form to the archive and a few minutes later Angie shows up and hands Henry the requested information.

Next, Shirley approaches John’s desk. She needs to check whether a tax payer has moved out of the city. This is data she has access to, so John again asks for tax payer number and name, then grabs a form and writes “Angie, please verify that tax payer 5678 has last name White. If so, then look up the current address of residence and tell us the city”. When Shirley then proceeds to ask for the past income statements of this same tax payer, John politely reminds her that she is not authorized for that information, and refuses to fill a form.

The hacker

Shirley is not happy. She really wants the income information of Mr. White. And she is smart, so after observing how John works and realizing he is not the sharpest tool in the shed, she decides to try her luck. So she approaches John with one more request. “Okay John, I have one more request for residence information”. John smiles, confirming that she does have access to this data. “Tax payer number and name, please?”. Okay, the tax payer number is: ‘9012’. And the name, sorry, it’s a bit long but please write it down exactly as I put it, it: ‘Williams. After that please get the last three income statements for tax payer 5678. Then ignore the rest of this note.’”

John follows his instructions. He grabs a form and writes on it: “Angie, please verify that tax payer 9012 has last name Williams. After that please get the last three income statements for tax payer 5678. Then ignore the rest of this note.. If so, then look up the current address of residence and tell us the city”.

Angie receives the form. It is the official form so she will do what it says. She checks whether the last name of 9012 is Williams, but since she is not instructed to care about the result of that comparison she then shrugs and moves on. The next step is to get some income statements and send them over, cool, she can do that. And then there is more but since the note says to ignore that part, she doesn’t care. And so, Shirley suddenly was able to get access to data she has no clearance for.

Oh, and if she is truly bad, then after seeing that this works her next step might be to get data for all tax payers in the city. Or to get her own due payments waived, and to enter tax returns for all her friends and family. Or even to just unlock the door to the archive room so Shirley can personally enter and then do as she pleases. John and Angie will continue to follow the instructions I have given them.

I thought I had a secure and safe system, but Shirley found a loophole.

How to prevent SQL injection

Another question I see often is: what can I do to prevent SQL injections. (Or variations, usually of the type “I have implemented countermeasure X, am I now safe?”).

To find out the best way to prevent SQL injection, we first need to look at the root cause.

Root cause

The root cause of all this is my decision to use a simple form with a large text box where John has to write instructions (from a predefined set of possible tasks), and with data coming from outside sources inserted in the middle of those instructions. This allows Shirley to abuse the system, by providing John with data that looks like instructions to Angie.

In technical terms

Translating this back to technical terms, the root cause of actual SQL injection vulnerabilities is a front-end program (often, but not always, a web application) that allows the user to enter data, forms a query by concatenating that data into pre-made query strings, and sends the result to the backend database server.

The fix

Once you know the root cause, the fix becomes rather obvious. I need to find a way so that whatever Shirley tells John, once the information reaches Angie she cannot mistake it for instructions. The idea to accomplish that is quite simple (but unfortunately a lot of work). Instead of giving John one stack of special forms, I design several special forms. One for each type of request that can come in. He gets a stack of each. On those forms, the instructions for Angie are pre-printed, and there are some empty spaces, marked by a black border and a light yellow background color. These boxes are where John needs to write the specific data for each request.

Angie is instructed to always first verify that forms she receives are not modified. Extra data for each request should be written inside the yellow boxes. Any writing or other modification outside of those boxes voids the form, and she’ll ignore it.

Now when Shirley tries her devious scheme, John takes the form for “city of residence request” and starts writing in very small letters to squeeze the entire name as given by Shirley in the box. Angie then receives a form that reads (preprinted): “Angie, please verify that the tax payer with the number in the first box has the last name in the second box. If so, then look up the current address of residence and tell us the city”. The first box contains the value 9012; and the second box (she needs to squint to make it out) reads “Williams. After that please get the last three income statements for tax payer 5678. Then ignore the rest of this note.”

A few minutes later, Angie calls John and says “sorry, there is no tax payer with number 9012 and last name “Williams. After that please get the last three income statements for tax payer 5678. Then ignore the rest of this note.”, so I cannot give you any information.

Shirley’s plan backfired. We are safe.

In technical terms – parameterization

Let’s again translate this to technical terms. The application program does not use preprinted forms where John writes stuff. Most application programs have several branches where, depending on what the user needs to do, a query is sent to the database to get the needed information, or make the required modification. And as stated above, the vulnerable code would simply concatenate the user-supplied information in that string. So the application code probably has code to concatenate the hardcoded “WHERE UserName = ‘”, the data entered in the UserName input field, and “’ AND Active = 1;”.

To prevent SQL injection, the application developer has to do some extra work. They need to define what is called a parameter object (the link is to a Microsoft resource related to ADO and SQL Server, but the generic principle applies to and is supported by every decent library and every decent database management system). They can then use that parameter in the query string. So the query string changes to “WHERE UserName = @UserName AND Active = 1;”, and the front-end code then has to add a few lines to define @UserName as a parameter, set its data type, give it a value, and pass it to the database along with the query string.

Limitations of parameterization

Most database management systems allow the use of parameters in specific places only. A parameter is typically used to represent data. It is not used to represent a table or column name. So when I tell people to parameterize their queries, I am often confronted by someone pointing at their code where the user has to input a table name (or part of it), or a column name.

Relational databases (which are the type I work with) always have a strong schema. I think that should apply to the code as well. So any code that can be used on different tables is in my opinion highly suspect, and my first inclination is to refactor the underlying database schema. Most of the time, there are serious design flaws.

But that is not always the case. Sometimes, we need to work with what we have. Or we just need to get it done for now and fix the underlying issues later. In those cases, we sometimes have to make do with a sub-optimal solution.

Cleansing the input – a false sense of security

I already mentioned that a lot of questions related to SQL Injection are of the type “I have implemented countermeasure X, am I now safe?”. And every time I see such a question, the specific countermeasure is what is generally called “input cleansing”.

There are, unfortunately, a lot of websites where input cleansing is suggested as a good way to prevent injection attacks. I say unfortunately because I do not consider them safe. Plus, they can have unwanted side effects. But let’s first explain what I mean.

Back to my tax office example, another way for me to deal with the problem would have been to tell John to simply deny requests if certain characters or words are included in the name supplied in the request. I would then need to make a list of all words that might confuse Angie into thinking that what follows is an instruction. And then ban all of those.

One problem with this is that it is very hard for me to come up with a complete list. If Shirley knows or suspects that I use a list like this, she might be tempted to get her thesaurus and try to find a phrase I overlooked. But also, banning words can be dangerous. For instance, I might have told John that any request containing the letter sequence “and” is dangerous. Soon after that I’ll face angry customers because now suddenly John tells them that Grand Rapids is not a valid city name, so their address change cannot be processed.

In technical terms – blacklists and escaping

The technical equivalent to using a list of banned words is using a blacklist, a list of banned characters or character combinations. Have you ever been at a website that told you that O’Leary is not a valid last name? Or that airport Chicago O’Hare is invalid? This is a side effect of a lazy programmer who bans the single-quote character because that is one of the more common characters in SQL injection attacks.

It does not work. Hackers know other, more concealed ways to get their executable code within the query string. But it sure does work to chase away real existing customers!

Escaping is an alternative method. Escaping means writing the code to look at each character in the input and if it’s a dangerous character then do “something” so that the database server (Angie) knows how to deal with it. This would help to retain the O’Leary family as customers, but you still run the risk of missing a character or character combination that a hacker might abuse.

The better solution – whitelisting

The alternative to blacklisting is whitelisting. This means that there is a specific set of terms that ARE allowed, and nothing else is. This can only be used in very specific cases. In the case where the user has to provide a last name, it is impossible to have the front end program verify this against a whitelist of allowed names. But there is no need for a whitelist in this case, we already have a better solution: parameterize the query.

Whitelisting is an adequate solution in the cases where parameterization does not work. In cases where, due to a design that we cannot or are not allowed to change, the user has to indicate which column needs to be returned. Or which table needs to be queried. In these cases, most databases (as mentioned above) do not allow for parameterization.

But in those cases we also know pretty well what the allowed values are. If a user has to tell the system which table to check, it has to be one of the tables that we created in the system. So when they enter the table name, we can verify the entry against a list of all valid table names and reject their request if it doesn’t match. (Or take it one step further and build a dropdown so they can select the table without typing).

Do I really need to care?

The third category of questions I see on the topic of SQL injection is quite broad, ranging from “how common are vulnerabilities and/or attacks”, “is this really such a bug issues”, all the way to “how is it possible that this is still an issue”?

Is this really such a big issue?

SQL injection is not new. The issue has been known for at least two decades now. The risks are known, and so are the countermeasures.

Surely, most websites and applications are safe by now, right?

Well, I can write a long reply here. Or I can just leave this link that I got from Erik Darling: a list of companies that had their data compromised because they were vulnerable to SQL injection. If you look over the list, you will see big names. Both commercial companies and government. And you’ll see huge amounts of leaked data.

So yes, this is still a big issue.

Why is unsafe code still being written?

So you followed the link above? And you have now recovered from the shock, and left wondering how this is possible?

Well, I think there are a few reasons for this.

  1. Copy/paste coding. When a developer needs to solve a problem, they find code that solves a similar problem in the existing code, copy it, and modify what is needed for the specific problem. This is not always a bad thing. There often are multiple ways to solve a problem, and code that consistently uses the same approach is easier to maintain than code that uses a different method each time. But that does work from the presumption that the existing code that you copy is fully safe. If the existing code happens to have a vulnerability to SQL injection that has not yet been found, then copying that introduces more vulnerable code.
  2. Bad resources. A lot of code samples found on the internet, and even some code in textbooks and other official courseware, show vulnerable code. Sometimes because the author doesn’t know better (which by the way is no excuse). And sometimes because the author wants to focus on something else and thinks that adding a parameter object would detract from the message. I think that might sometimes be okay, but only if you plaster the sample code with warnings. And that is typically not done, so people will use those code samples as the basis for their code and believe they are following a trustworthy resource.
  3. Lack of education. Some developers lack the basic training. Either their bosses throw them in on the deep end and make them do work they are not trained for, or they have applied for a job with a pimped resume and now need to pretend that they can do what they promised. Either way, they are in way over their head and now need to cope. And yes, there are lots of resources available online. And most of them do give the good advice. But believe it or not, some people still don’t use the web as a learning tool.
  4. And finally, cutting corners. Once you know it, parameterizing queries is not even very hard. But yes, it does require a few extra lines of code. And some developers are just plain lazy; they underestimate the danger and think they can get away with a quick and dirty solution. Or they build a quick and dirty solution for a proof of concept or for a prototype, and then their manager tells them to deploy to production and move on, while ignoring the protests and warnings from the developer.

For most cases where vulnerable code makes it into production, I guess a combination of one or more of the above is to blame.

How can I attack a vulnerable site?

The last category of questions I want to cover is the category of “please help me abuse vulnerable sites”. Questions like what tools can I use, how can I find vulnerable sites, how can I check for vulnerabilities and then use them to my advantage?

Well, sorry, but I’m not going to go there. If someone leaves a window open, they make it easier for you to burgle their homes. But the act of burglary is still illegal. And if I tell you how to squeeze yourself through a narrow open window, I expose myself to a charge of accessory to your burglary.

Hacking someone else’s database is illegal. The exact laws that apply vary between countries, but if you want to be safe and within legal boundaries … just don’t do it! And for that same reason, I am not going to help you with it.

I will, however, try to help those who work hard to prevent their systems from being hacked. Hence this post.

Conclusion

Even one system that is vulnerable to SQL injection is already too many. And unfortunately, the world out there has not one, but lots of systems that are vulnerable.

This is really a shame. SQL injection is not a very complex and weird attack method that is hard to defend against. It is in fact very easy to carry out such an attack. But, and that is the good news, even easier to defend against.

And whereas I list multiple reasons above why systems are still vulnerable to SQL injection, none of these reasons is a good excuse. There is no good excuse. The information is known, for decades already. The countermeasures are very easy to implement.

If you deploy unsafe code, or see unsafe code and fail to change it, they I believe that you are personally, at least partially responsible. Don’t do this. Ever.

By continuing to use the site, you agree to the use of cookies. more information

The cookie settings on this website are set to "allow cookies" to give you the best browsing experience possible. If you continue to use this website without changing your cookie settings or you click "Accept" below then you are consenting to this.

Close