Formatting Rules In SQL Injection By albro

Formatting Rules In SQL Injection

In this post, I'm going to talk about the rules for formatting queries. These rules help us to be safe from SQL Injection, but it should be noted that applying these rules manually isn't correct and may cause problems for our program.

Formatting rules in MySQL

The truth is that formatting rules are not that simple and you can't put them all in one list because they change based on different situations. The interesting thing here is that programmers usually treat SQL as if they were dealing with a string literal, while SQL is like a program, like a PHP script, and like any other program, it has its own structure and components. has the Therefore, each of these parts needs a special formatting that is specific to itself and does not work for other components. These rules in MySQL are as follows:

Strings:
- Either they should be added through prepared statements or they should be inside Quotation (' ' or " " signs).
- Special characters must be escaped.
- Encoding must be clearly specified in advance or work based on hex.
Numbers:
- They should be added through prepared statement.
- or filtered to add only numeric values.
Identifiers:
- must be placed between two backtick signs (the ` sign - the key above the tab key, on the left and top of the keyboard)
- Special characters must be escaped.
Operators and keywords:
- There are no special rules for operators and keywords.

Identifiers are actually the database name, table name, row name and column name.

As you can see, these rules are different in different parts, so you can't escape all your data or use prepared statements everywhere!

Many of you must be saying, I used to do these things all the time. Is my app safe? No! The problem is that you do these things manually. You should never manually implement each and every one of these things, but you should have a mechanism that does it for you automatically. Why?

Manual Formatting

Why do you think formatting manually is a problem? This is because it is done manually, and anything done manually increases the chance of error dozens of times. Formatting manually depends on the level of literacy of the programmer, the state of the programmer, whether the programmer is tired or happy or sad, and so on.

In fact, the reliance of a system on the condition of the programmer is one of its biggest problems, and it can be said that manual formatting is the biggest and perhaps the only reason for SQL Injection attacks in the world.

but why?

Manual formatting can be incomplete: remember the Bobby Tables example from the previous post? This example is one of the best examples of incomplete formatting that was done manually because our string was only placed inside the quotation but not escaped. If you look at the rules above, I have said that in addition to using quotations, escaping and proper encoding should also be used in the fields. Imagine that your site has thousands of lines of code and it will be really difficult and time-consuming to comply with each of these items.
We may format a wrong literal: of course, as long as we follow the formatting rules, it is not so dangerous, because it will stop the program, and we will notice it during the site programming process and correct it, but if the formatting is The incomplete form is done, practically everything is finished! For example, many answers in Stack Overflow say that it is better to escape identifiers as well. This is completely wrong and may cause SQL Injection.
Manual formatting is completely optional: apart from the fact that we may get distracted and not implement the formatting perfectly, another interesting point is seen among PHP programmers. Some of them do not format some data at all because they think that data is divided into categories such as clean (non-harmful) and unclean (harmful) or user input and non-user input. The misconception is that they think that secure data does not need to be formatted; Remember the name Leo O'Hara from the previous post? This name is not a harmful code, but because of the presence of ' in the family, our program is in trouble. Your attention as a programmer should be on the literal data type, not the data source (where the data came from). If the data type is string, then it should be formatted as strings. This optionality of manual formatting and the fact that we can not use it wherever we want causes many problems.
Manual formatting can be very far from query execution: this problem is almost ignored, but it is the most important problem of manual formatting.
Manual formatting makes the code more crowded and cluttered when we didn't need to confuse ourselves like that.

The problem of the distance between manual formatting and query execution is a very important issue. Almost all of us are tempted to clean the data at once instead of sanitizing it in different parts of the program. When we do this, we may have cleared data that has a long way to run. What do you think is the problem?

First of all, we should note that if the distance between clearing and execution is long, it means that the corresponding query is not in front of our eyes, and in this case, we cannot be sure what kind of data this literal is. Therefore, the first and second rules of in-place formatting are violated.

Also, if we want to avoid this problem and clear the data in several different places, we will still face a big problem; It is possible that due to the similarity of some data, we may mistakenly think that we have already cleaned such data and enter it into the system without cleaning. Imagine what consequences such a problem can have in the group work that is done between several developers!

On the other hand, we may clean the same data twice; For example, we have cleared the data upon receiving, but we must clear it again before execution. This is not dangerous, but it is embarrassing for a professional developer.

The last problem is that cleaning or formatting before execution causes our variable to change and can no longer be used anywhere except for the query. Many times we want to make other uses of the input data because the input data is not only useful for the database, but if it is formatted from the beginning, it can no longer be used.

what's the solution?

What is the solution with these interpretations? What mechanism should we use to get rid of these problems?

What is a prepared statement?

The concept of prepared statements in simple language is as follows:

Instead of entering data directly into the database, we can use elements that take the place of the data and send the data itself later.

In the following example, I enter data directly into the database through a variable:

uName = getRequestString("username");
uPass = getRequestString("userpassword");
sql = 'SELECT * FROM Users WHERE Name ="' + uName + '" AND Pass ="' + uPass + '"'

If the user fills the form honestly, an example like the following will appear:

SELECT * FROM Users WHERE Name ="albro Hive" AND Pass ="myPass"

But the situation is different if the hacker enters code like the following code:

SELECT * FROM Users WHERE Name ="" or ""="" AND Pass ="" or ""=""

This query is always true (this is the logic of the SQL language) and will provide the user/hacker with our table information. SQL injection methods are many and sometimes complicated. I have mentioned only one simple example of it here so that you can familiarize yourself with this topic.

First of all, I would like you to understand the difference between two main issues:

The ability to use prepared statements natively in the world's leading DBMS.
The concept of using placeholders as data representatives in a query.

The ability to use prepared statements in Native form is very simple and smart; The query and its main data are sent to the server separately, and there is no possibility that there will be any interference between them, so SQL Injection becomes impossible. Of course, it should be said that this type of use has its limitations because they only support two types of literals (strings and numbers) and they are not sufficient for use in the real world.

There are misconceptions about native prepared statements:

They are faster: not so in PHP! Because PHP does not allow you to reuse a prepared statement in different calls, and on the other hand, repeated queries that are in the same instance rarely happen, so there is no speed.
They are safer: This statement is correct, but not because they are native, but because they are prepared statements and do not use manual formatting.

The best and best advantage of using prepared statements is to eliminate the risk of SQL injection:

A prepared statement takes care of all formatting issues and does them automatically without the need for programmer intervention.
A prepared statement does the formatting without any errors and is safe.
A prepared statement performs the formatting exactly at the right time, that is, just before executing the query.

In fact, this is the reason for hating and avoiding manual formatting and praising prepared statements. Using prepared statements has two other advantages that are not critical but still a good help:

prepared statements do not manipulate the original data, so the original data can be used elsewhere; For example, it can be displayed in the browser or stored in cookies.
In some cases, programmers can make their codes much shorter by using prepared statements.

Therefore, being native of a prepared statement is not critical, we can emulate prepared statements and send a query to the server at once with PDO::ATTR_EMULATE_PREPARES turned on (set to true). In this case, the data is still formatted correctly and it can be said that it is safe.

Prepared statements can be used even with the old MySQL extension. The following function can give you maximum security by using the mentioned extension:

function paraQuery()
{
    $args  = func_get_args();
    $query = array_shift($args);
    $query = str_replace("%s","'%s'",$query); 
    foreach ($args as $key => $val)
    {
        $args[$key] = mysql_real_escape_string($val);
    }
    $query  = vsprintf($query, $args);
    $result = mysql_query($query);
    if (!$result)
    {
        throw new Exception(mysql_error()." [$query]");
    }
    return $result;
}
$query  = "SELECT * FROM table where a=%s AND b LIKE %s LIMIT %d";
$result = paraQuery($query, $a, "%$b%", $limit);

As you can see, in this example, the security is observed at the PDO level and everything is parameterized. So our rule of thumb is:

All dynamic values (i.e. values that change, such as variables) must be entered into the query with prepared statements. Why dynamic values? This is because the fixed parts of a query never change and any problems in this field will be identified during the development and coding of the website. Therefore, there is no need to manipulate fixed parts. In the example above, SELECT and FROM are fixed. Can you imagine a situation where they would cause trouble?

What was my purpose of this discussion?

My goal was for you to understand that the main security belongs to the discussion of prepared statements and is not specific to PDO, but my suggestion is to use PDO.

Formatting Rules In SQL Injection By albro | Ecency