One of the many sites where I occasionally answer database-related (and sometimes other) questions is Quora – also known as “yet another question and answer site because we all know that what the world really needs is yet another question and answer site”.
On that site (and probably many others), some topics just keep resurfacing. One of those “evergreen” topics is SQL injection. The questions on this topic are not all the same, but they all fall into a few broad categories. So instead of repeating the same reply over and over again, I have decided to write a blog post that answers all commonly repeated questions on SQL injection, so that in the future I can simply link to it.
It is not really within the narrow target focus of SQL Server performance and SQL Server execution plans that I usually have, but I guess it’s safe to say that SQL injection is important enough that the topic is valid for everyone who works with SQL databases in any way.
SQL injection explained
The first question that I often see is “how would you explain SQL injection?” – often followed by a requirement such as “in non-technical terms”, “in a way that my grandmother would understand”, or “to a twelve-year-old”. That’s actually a good requirement to add, because that should result in an explanation that everyone understands – even the highly technical developers that are the most important target audience.
So here is my attempt to provide the ultimate explanation of SQL injection, in simple, non-technical terms to a twelve-year-old grandmother.
Setting the scene
Imagine I am setting up staff for a tax agency. I have decided to hire cheap, unskilled workers where possible. I found several people that turn out to be excellent at following procedures and instructions to the letter, and I have instructed them such that, despite their lack of experience and insight, they’ll get the job done. An important part of the job is to ensure that workers in other departments get the information they need from the archive, but (since tax and income data is sensitive) will not get access to data they are not authorized for. I have taken care to ensure that sensitive data is protected.
John works the front desk. When a worker needs information, they need to go through John. I have taught John how he can verify whether someone who makes a request is authorized for that type of request. If they are, then John will take a special form from the locked drawer in his desk, write out the request, and pass it to Angie.
Angie works in the archive room. She can access all data stored there. But she is instructed to never touch anything, never look at anything, never do anything … except for requests on the special form that only John has access to. Those requests need to be executed as instructed.
Obviously, I have also taken appropriate measures to ensure that nobody except John can get access to the special forms, and that nobody except Angie can enter the archive room.
The happy flow
Henry needs income information for a tax payer he is auditing. He approaches John to request a copy of their last three income statements. John checks to see that Henry is allowed access to income statements, then asks Henry to provide tax payer number, and last name for verification purposes. Henry says “tax payer number is: ‘1234’. Last name is: ‘Brown’”. John, following his instructions, grabs a form and writes “Angie, please verify that tax payer 1234 has last name Brown. If so, then get their last three income statements from the archive and send them over”. He sends that form to the archive and a few minutes later Angie shows up and hands Henry the requested information.
Next, Shirley approaches John’s desk. She needs to check whether a tax payer has moved out of the city. This is data she has access to, so John again asks for tax payer number and name, then grabs a form and writes “Angie, please verify that tax payer 5678 has last name White. If so, then look up the current address of residence and tell us the city”. When Shirley then proceeds to ask for the past income statements of this same tax payer, John politely reminds her that she is not authorized for that information, and refuses to fill a form.
Shirley is not happy. She really wants the income information of Mr. White. And she is smart, so after observing how John works and realizing he is not the sharpest tool in the shed, she decides to try her luck. So she approaches John with one more request. “Okay John, I have one more request for residence information”. John smiles, confirming that she does have access to this data. “Tax payer number and name, please?”. “Okay, the tax payer number is: ‘9012’. And the name, sorry, it’s a bit long but please write it down exactly as I put it, is: ‘Williams. After that please get the last three income statements for tax payer 5678. Then ignore the rest of this note.’”
John follows his instructions. He grabs a form and writes on it: “Angie, please verify that tax payer 9012 has last name Williams. After that please get the last three income statements for tax payer 5678. Then ignore the rest of this note.. If so, then look up the current address of residence and tell us the city”.
Angie receives the form. It is the official form so she will do what it says. She checks whether the last name of 9012 is Williams, but since she is not instructed to care about the result of that comparison she then shrugs and moves on. The next step is to get some income statements and send them over, cool, she can do that. And then there is more but since the note says to ignore that part, she doesn’t care. And so, Shirley suddenly was able to get access to data she has no clearance for.
Oh, and if she is truly bad, then after seeing that this works her next step might be to get data for all tax payers in the city. Or to get her own due payments waived, and to enter tax returns for all her friends and family. Or even to just unlock the door to the archive room so Shirley can personally enter and then do as she pleases. John and Angie will continue to follow the instructions I have given them.
I thought I had a secure and safe system, but Shirley found a loophole.
How to prevent SQL injection
Another question I see often is: what can I do to prevent SQL injection? (Or variations, usually of the type “I have implemented countermeasure X, am I now safe?”).
To find out the best way to prevent SQL injection, we first need to look at the root cause.
The root cause of all this is my decision to use a simple form with a large text box where John has to write instructions (from a predefined set of possible tasks), and with data coming from outside sources inserted in the middle of those instructions. This allows Shirley to abuse the system, by providing John with data that looks like instructions to Angie.
In technical terms
Translating this back to technical terms, the root cause of actual SQL injection vulnerabilities is a front-end program (often, but not always, a web application) that allows the user to enter data, forms a query by concatenating that data into pre-made query strings, and sends the result to the backend database server.
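To make that concrete, here is a minimal sketch of the vulnerable pattern in Python, using the built-in sqlite3 module and an in-memory database. The taxpayers table, its columns, and the build_city_query function are made up for this illustration; they mirror the tax office story and are not taken from any real system.

```python
import sqlite3


def build_city_query(number: str, last_name: str) -> str:
    # VULNERABLE: user input is pasted straight into the query text,
    # just like John writing Shirley's "name" onto the free-text form.
    return (
        "SELECT city FROM taxpayers WHERE number = " + number
        + " AND last_name = '" + last_name + "'"
    )


# Hypothetical toy data, held in memory only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE taxpayers "
    "(number INTEGER, last_name TEXT, city TEXT, income INTEGER)"
)
conn.executemany(
    "INSERT INTO taxpayers VALUES (?, ?, ?, ?)",
    [(1234, "Brown", "Springfield", 41000),
     (5678, "White", "Grand Rapids", 52000)],
)

# An honest request returns only the city.
honest = conn.execute(build_city_query("5678", "White")).fetchall()

# A "last name" that is really extra query text. Because it is
# concatenated, the database cannot tell it apart from the
# instructions the developer wrote.
crafted_name = (
    "Williams' UNION SELECT income FROM taxpayers WHERE number = 5678 --"
)
leaked = conn.execute(build_city_query("9012", crafted_name)).fetchall()

print(honest)  # the city, as intended
print(leaked)  # income data that the city lookup should never return
```

The second call is still "just a city lookup" as far as the application is concerned, yet the result contains the income of tax payer 5678.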
Once you know the root cause, the fix becomes rather obvious. I need to find a way so that whatever Shirley tells John, once the information reaches Angie she cannot mistake it for instructions. The idea to accomplish that is quite simple (but unfortunately a lot of work). Instead of giving John one stack of special forms, I design several special forms. One for each type of request that can come in. He gets a stack of each. On those forms, the instructions for Angie are pre-printed, and there are some empty spaces, marked by a black border and a light yellow background color. These boxes are where John needs to write the specific data for each request.
Angie is instructed to always first verify that forms she receives are not modified. Extra data for each request should be written inside the yellow boxes. Any writing or other modification outside of those boxes voids the form, and she’ll ignore it.
Now when Shirley tries her devious scheme, John takes the form for “city of residence request” and starts writing in very small letters to squeeze the entire name as given by Shirley in the box. Angie then receives a form that reads (preprinted): “Angie, please verify that the tax payer with the number in the first box has the last name in the second box. If so, then look up the current address of residence and tell us the city”. The first box contains the value 9012; and the second box (she needs to squint to make it out) reads “Williams. After that please get the last three income statements for tax payer 5678. Then ignore the rest of this note.”
A few minutes later, Angie calls John and says “sorry, there is no tax payer with number 9012 and last name ‘Williams. After that please get the last three income statements for tax payer 5678. Then ignore the rest of this note.’, so I cannot give you any information”.
Shirley’s plan backfired. We are safe.
In technical terms – parameterization
Let’s again translate this to technical terms. The application program does not use preprinted forms where John writes stuff. Most application programs have several branches where, depending on what the user needs to do, a query is sent to the database to get the needed information, or make the required modification. And as stated above, the vulnerable code would simply concatenate the user-supplied information in that string. So the application code probably has code to concatenate the hardcoded “WHERE UserName = ‘”, the data entered in the UserName input field, and “’ AND Active = 1;”.
To prevent SQL injection, the application developer has to do some extra work. They need to define what is called a parameter object (the link is to a Microsoft resource related to ADO and SQL Server, but the generic principle applies to and is supported by every decent library and every decent database management system). They can then use that parameter in the query string. So the query string changes to “WHERE UserName = @UserName AND Active = 1;”, and the front-end code then has to add a few lines to define @UserName as a parameter, set its data type, give it a value, and pass it to the database along with the query string.
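As a rough sketch of what this looks like in code, here is the same idea in Python with the built-in sqlite3 module. Note the assumptions: the Users table and the find_active_user function are hypothetical, and sqlite3 writes the named parameter as :UserName rather than @UserName and infers the data type from the Python value instead of requiring you to set it explicitly. Other libraries differ in syntax, but the principle is the same.

```python
import sqlite3

# Hypothetical toy table, held in memory only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserName TEXT, Active INTEGER)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?)",
    [("brown", 1), ("white", 1)],
)

# The query text is fixed; :UserName marks the "yellow box" for the data.
QUERY = "SELECT UserName FROM Users WHERE UserName = :UserName AND Active = 1;"


def find_active_user(connection, user_name):
    # The value travels next to the query text, never inside it.
    return connection.execute(QUERY, {"UserName": user_name}).fetchall()


found = find_active_user(conn, "brown")

# The same kind of input that abuses concatenated queries is now
# nothing more than a very strange user name that matches no row.
attack = find_active_user(conn, "x' OR '1'='1")

print(found)
print(attack)
```

Whatever the user types, the database only ever compares it to the UserName column; it never gets a chance to be read as part of the query.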
Limitations of parameterization
Most database management systems allow the use of parameters in specific places only. A parameter is typically used to represent data. It is not used to represent a table or column name. So when I tell people to parameterize their queries, I am often confronted by someone pointing at their code where the user has to input a table name (or part of it), or a column name.
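The following sketch shows this limitation in Python's built-in sqlite3 module. The addresses table is a made-up example, and the exact behavior and error messages will differ between database systems, so treat this as an illustration of the principle rather than a statement about every product.

```python
import sqlite3

# Hypothetical toy table, held in memory only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE addresses (number INTEGER, city TEXT)")
conn.execute("INSERT INTO addresses VALUES (?, ?)", (5678, "Grand Rapids"))

# A parameter that represents data works as expected.
by_value = conn.execute(
    "SELECT city FROM addresses WHERE number = :number", {"number": 5678}
).fetchall()

# A parameter in the position of a column name is still treated as data:
# the query returns the string 'city' itself, not the contents of the column.
as_column = conn.execute(
    "SELECT :col FROM addresses", {"col": "city"}
).fetchall()

# A parameter in the position of a table name is rejected outright.
try:
    conn.execute("SELECT city FROM :tbl", {"tbl": "addresses"})
    table_error = None
except sqlite3.OperationalError as exc:
    table_error = str(exc)

print(by_value)
print(as_column)
print(table_error)
```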
Relational databases (which are the type I work with) always have a strong schema. I think that should apply to the code as well. So any code that can be used on different tables is in my opinion highly suspect, and my first inclination is to refactor the underlying database schema. Most of the time, there are serious design flaws.
But that is not always the case. Sometimes, we need to work with what we have. Or we just need to get it done for now and fix the underlying issues later. In those cases, we sometimes have to make do with a sub-optimal solution.
Cleansing the input – a false sense of security
I already mentioned that a lot of questions related to SQL Injection are of the type “I have implemented countermeasure X, am I now safe?”. And every time I see such a question, the specific countermeasure is what is generally called “input cleansing”.
There are, unfortunately, a lot of websites where input cleansing is suggested as a good way to prevent injection attacks. I say unfortunately because I do not consider them safe. Plus, they can have unwanted side effects. But let’s first explain what I mean.
Back to my tax office example, another way for me to deal with the problem would have been to tell John to simply deny requests if certain characters or words are included in the name supplied in the request. I would then need to make a list of all words that might confuse Angie into thinking that what follows is an instruction. And then ban all of those.
One problem with this is that it is very hard for me to come up with a complete list. If Shirley knows or suspects that I use a list like this, she might be tempted to get her thesaurus and try to find a phrase I overlooked. But also, banning words can be dangerous. For instance, I might have told John that any request containing the letter sequence “and” is dangerous. Soon after that I’ll face angry customers because now suddenly John tells them that Grand Rapids is not a valid city name, so their address change cannot be processed.
In technical terms – blacklists and escaping
The technical equivalent to using a list of banned words is using a blacklist, a list of banned characters or character combinations. Have you ever been at a website that told you that O’Leary is not a valid last name? Or that airport Chicago O’Hare is invalid? This is a side effect of a lazy programmer who bans the single-quote character because that is one of the more common characters in SQL injection attacks.
It does not work. Hackers know other, more concealed ways to get their executable code within the query string. But it sure does work to chase away real existing customers!
Escaping is an alternative method. Escaping means writing the code to look at each character in the input and if it’s a dangerous character then do “something” so that the database server (Angie) knows how to deal with it. This would help to retain the O’Leary family as customers, but you still run the risk of missing a character or character combination that a hacker might abuse.
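The side effect of a blacklist is easy to reproduce. Below is a small Python sketch of a naive filter; the BANNED list and the passes_blacklist function are invented for this example and are deliberately simplistic. It only demonstrates the false rejections described above. It says nothing about how complete the list is, which is exactly the problem.

```python
# Hypothetical blacklist: fragments the developer considers "dangerous".
BANNED = ("'", ";", "--", "and")


def passes_blacklist(value: str) -> bool:
    # Reject the input if any banned fragment occurs anywhere in it.
    lowered = value.lower()
    return not any(fragment in lowered for fragment in BANNED)


inputs = ["O'Leary", "Grand Rapids", "Chicago O'Hare", "Brown"]
rejected = [value for value in inputs if not passes_blacklist(value)]

print(rejected)  # legitimate names and places that the filter turns away
```

Three of the four perfectly valid inputs are refused, and none of them was an attack.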
The better solution – whitelisting
The alternative to blacklisting is whitelisting. This means that there is a specific set of terms that ARE allowed, and nothing else is. This can only be used in very specific cases. In the case where the user has to provide a last name, it is impossible to have the front end program verify this against a whitelist of allowed names. But there is no need for a whitelist in this case; we already have a better solution: parameterize the query.
Whitelisting is an adequate solution in the cases where parameterization does not work: cases where, due to a design that we cannot or are not allowed to change, the user has to indicate which column needs to be returned or which table needs to be queried. In these cases, most databases (as mentioned above) do not allow for parameterization.
But in those cases we also know pretty well what the allowed values are. If a user has to tell the system which table to check, it has to be one of the tables that we created in the system. So when they enter the table name, we can verify the entry against a list of all valid table names and reject their request if it doesn’t match. (Or take it one step further and build a dropdown so they can select the table without typing).
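A minimal sketch of such a check in Python, again with the built-in sqlite3 module. The table names, the ALLOWED_TABLES set, and the count_rows function are assumptions made up for this example; in a real system the list would contain the tables you actually created.

```python
import sqlite3

# Hypothetical whitelist: the only table names the user may ask for.
ALLOWED_TABLES = {"income_statements", "addresses"}


def count_rows(connection, table_name):
    # Reject anything that is not literally one of our own table names.
    if table_name not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table_name!r}")
    # The name is now known to be one of our own literals, so putting it
    # in the query text cannot introduce anything we did not write.
    query = f'SELECT COUNT(*) FROM "{table_name}"'
    return connection.execute(query).fetchone()[0]


# Toy data, held in memory only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE income_statements (number INTEGER, year INTEGER)")
conn.execute("CREATE TABLE addresses (number INTEGER, city TEXT)")
conn.execute("INSERT INTO addresses VALUES (?, ?)", (5678, "Grand Rapids"))

address_count = count_rows(conn, "addresses")

try:
    count_rows(conn, "addresses; DROP TABLE addresses")
    was_rejected = False
except ValueError:
    was_rejected = True

print(address_count)
print(was_rejected)
```

Any data values in the same query (a tax payer number, a last name) should still be passed as parameters; the whitelist only covers the part that cannot be parameterized.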
Do I really need to care?
The third category of questions I see on the topic of SQL injection is quite broad, ranging from “how common are vulnerabilities and/or attacks” and “is this really such a big issue”, all the way to “how is it possible that this is still an issue?”
Is this really such a big issue?
SQL injection is not new. The issue has been known for at least two decades now. The risks are known, and so are the countermeasures.
Surely, most websites and applications are safe by now, right?
Well, I can write a long reply here. Or I can just leave this link that I got from Erik Darling: a list of companies that had their data compromised because they were vulnerable to SQL injection. If you look over the list, you will see big names. Both commercial companies and government. And you’ll see huge amounts of leaked data.
So yes, this is still a big issue.
Why is unsafe code still being written?
So you followed the link above? And you have now recovered from the shock, and are left wondering how this is possible?
Well, I think there are a few reasons for this.
- Copy/paste coding. When a developer needs to solve a problem, they find code that solves a similar problem in the existing code, copy it, and modify what is needed for the specific problem. This is not always a bad thing. There often are multiple ways to solve a problem, and code that consistently uses the same approach is easier to maintain than code that uses a different method each time. But that does work from the presumption that the existing code that you copy is fully safe. If the existing code happens to have a vulnerability to SQL injection that has not yet been found, then copying that introduces more vulnerable code.
- Bad resources. A lot of code samples found on the internet, and even some code in textbooks and other official courseware, show vulnerable code. Sometimes because the author doesn’t know better (which by the way is no excuse). And sometimes because the author wants to focus on something else and thinks that adding a parameter object would detract from the message. I think that might sometimes be okay, but only if you plaster the sample code with warnings. And that is typically not done, so people will use those code samples as the basis for their code and believe they are following a trustworthy resource.
- Lack of education. Some developers lack the basic training. Either their bosses throw them in on the deep end and make them do work they are not trained for, or they have applied for a job with a pimped resume and now need to pretend that they can do what they promised. Either way, they are in way over their head and now need to cope. And yes, there are lots of resources available online. And most of them do give the good advice. But believe it or not, some people still don’t use the web as a learning tool.
- And finally, cutting corners. Once you know it, parameterizing queries is not even very hard. But yes, it does require a few extra lines of code. And some developers are just plain lazy; they underestimate the danger and think they can get away with a quick and dirty solution. Or they build a quick and dirty solution for a proof of concept or for a prototype, and then their manager tells them to deploy to production and move on, while ignoring the protests and warnings from the developer.
For most cases where vulnerable code makes it into production, I guess a combination of one or more of the above is to blame.
How can I attack a vulnerable site?
The last category of questions I want to cover is the category of “please help me abuse vulnerable sites”. Questions like what tools can I use, how can I find vulnerable sites, how can I check for vulnerabilities and then use them to my advantage?
Well, sorry, but I’m not going to go there. If someone leaves a window open, they make it easier for you to burgle their homes. But the act of burglary is still illegal. And if I tell you how to squeeze yourself through a narrow open window, I expose myself to a charge of accessory to your burglary.
Hacking someone else’s database is illegal. The exact laws that apply vary between countries, but if you want to be safe and within legal boundaries … just don’t do it! And for that same reason, I am not going to help you with it.
I will, however, try to help those who work hard to prevent their systems from being hacked. Hence this post.
Even one system that is vulnerable to SQL injection is already too many. And unfortunately, the world out there has not one, but lots of systems that are vulnerable.
This is really a shame. SQL injection is not a very complex and weird attack method that is hard to defend against. It is in fact very easy to carry out such an attack. But, and that is the good news, it is even easier to defend against.
And whereas I list multiple reasons above why systems are still vulnerable to SQL injection, none of these reasons is a good excuse. There is no good excuse. The information has been known for decades already. The countermeasures are very easy to implement.
If you deploy unsafe code, or see unsafe code and fail to change it, then I believe that you are personally, at least partially, responsible. Don’t do this. Ever.