In this post, I'm going to talk about the rules for formatting queries. These rules help us to be safe from SQL Injection, but it should be noted that applying these rules manually isn't correct and may cause problems for our program.
The truth is that formatting rules are not that simple and you can't put them all in one list because they change based on different situations. The interesting thing here is that programmers usually treat SQL as if they were dealing with a string literal, while SQL is like a program, like a PHP script, and like any other program, it has its own structure and components. has the Therefore, each of these parts needs a special formatting that is specific to itself and does not work for other components. These rules in MySQL are as follows:
Identifiers are actually the database name, table name, row name and column name.
As you can see, these rules are different in different parts, so you can't escape all your data or use prepared statements everywhere!
Many of you must be saying, I used to do these things all the time. Is my app safe? No! The problem is that you do these things manually. You should never manually implement each and every one of these things, but you should have a mechanism that does it for you automatically. Why?
Why do you think formatting manually is a problem? This is because it is done manually, and anything done manually increases the chance of error dozens of times. Formatting manually depends on the level of literacy of the programmer, the state of the programmer, whether the programmer is tired or happy or sad, and so on.
In fact, the reliance of a system on the condition of the programmer is one of its biggest problems, and it can be said that manual formatting is the biggest and perhaps the only reason for SQL Injection attacks in the world.
but why?
Leo O'Hara from the previous post? This name is not a harmful code, but because of the presence of ' in the family, our program is in trouble. Your attention as a programmer should be on the literal data type, not the data source (where the data came from). If the data type is string, then it should be formatted as strings. This optionality of manual formatting and the fact that we can not use it wherever we want causes many problems.
The problem of the distance between manual formatting and query execution is a very important issue. Almost all of us are tempted to clean the data at once instead of sanitizing it in different parts of the program. When we do this, we may have cleared data that has a long way to run. What do you think is the problem?
First of all, we should note that if the distance between clearing and execution is long, it means that the corresponding query is not in front of our eyes, and in this case, we cannot be sure what kind of data this literal is. Therefore, the first and second rules of in-place formatting are violated.
Also, if we want to avoid this problem and clear the data in several different places, we will still face a big problem; It is possible that due to the similarity of some data, we may mistakenly think that we have already cleaned such data and enter it into the system without cleaning. Imagine what consequences such a problem can have in the group work that is done between several developers!
On the other hand, we may clean the same data twice; For example, we have cleared the data upon receiving, but we must clear it again before execution. This is not dangerous, but it is embarrassing for a professional developer.
The last problem is that cleaning or formatting before execution causes our variable to change and can no longer be used anywhere except for the query. Many times we want to make other uses of the input data because the input data is not only useful for the database, but if it is formatted from the beginning, it can no longer be used.
What is the solution with these interpretations? What mechanism should we use to get rid of these problems?
The concept of prepared statements in simple language is as follows:
Instead of entering data directly into the database, we can use elements that take the place of the data and send the data itself later.
In the following example, I enter data directly into the database through a variable:
uName = getRequestString("username");
uPass = getRequestString("userpassword");
sql = 'SELECT * FROM Users WHERE Name ="' + uName + '" AND Pass ="' + uPass + '"'
If the user fills the form honestly, an example like the following will appear:
SELECT * FROM Users WHERE Name ="albro Hive" AND Pass ="myPass"
But the situation is different if the hacker enters code like the following code:
SELECT * FROM Users WHERE Name ="" or ""="" AND Pass ="" or ""=""
This query is always true (this is the logic of the SQL language) and will provide the user/hacker with our table information. SQL injection methods are many and sometimes complicated. I have mentioned only one simple example of it here so that you can familiarize yourself with this topic.
First of all, I would like you to understand the difference between two main issues:
The ability to use prepared statements in Native form is very simple and smart; The query and its main data are sent to the server separately, and there is no possibility that there will be any interference between them, so SQL Injection becomes impossible. Of course, it should be said that this type of use has its limitations because they only support two types of literals (strings and numbers) and they are not sufficient for use in the real world.
There are misconceptions about native prepared statements:
The best and best advantage of using prepared statements is to eliminate the risk of SQL injection:
In fact, this is the reason for hating and avoiding manual formatting and praising prepared statements. Using prepared statements has two other advantages that are not critical but still a good help:
Therefore, being native of a prepared statement is not critical, we can emulate prepared statements and send a query to the server at once with PDO::ATTR_EMULATE_PREPARES turned on (set to true). In this case, the data is still formatted correctly and it can be said that it is safe.
Prepared statements can be used even with the old MySQL extension. The following function can give you maximum security by using the mentioned extension:
function paraQuery()
{
$args = func_get_args();
$query = array_shift($args);
$query = str_replace("%s","'%s'",$query);
foreach ($args as $key => $val)
{
$args[$key] = mysql_real_escape_string($val);
}
$query = vsprintf($query, $args);
$result = mysql_query($query);
if (!$result)
{
throw new Exception(mysql_error()." [$query]");
}
return $result;
}
$query = "SELECT * FROM table where a=%s AND b LIKE %s LIMIT %d";
$result = paraQuery($query, $a, "%$b%", $limit);
As you can see, in this example, the security is observed at the PDO level and everything is parameterized. So our rule of thumb is:
All dynamic values (i.e. values that change, such as variables) must be entered into the query with prepared statements. Why dynamic values? This is because the fixed parts of a query never change and any problems in this field will be identified during the development and coding of the website. Therefore, there is no need to manipulate fixed parts. In the example above, SELECT and FROM are fixed. Can you imagine a situation where they would cause trouble?
What was my purpose of this discussion?
My goal was for you to understand that the main security belongs to the discussion of prepared statements and is not specific to PDO, but my suggestion is to use PDO.