Sanitizing is a great way to protect WordPress code

It’s no secret that security is an essential part of any software. In this article, I will tell you about data sanitization and how a few small rules can go a long way toward securing your codebase. If sanitization is the antidote, then what is the poison? Code injection, one of the most common vulnerabilities in web development. Let’s take a deeper look at the problem.

Sanitizing is defined as cleaning something to make it free of bacteria or disease-causing elements.

Yeah, I like the definition from outside computer science because it’s a great real-world analogy. Bacteria and other disease-causing elements can lead to illness, possibly severe, possibly incurable. Let’s start with why sanitizing is important.

Code injection is a surprise in user data

Code injection is the exploitation of a computer bug that is caused by processing invalid data. The injection is used by an attacker to introduce (or “inject”) code into a vulnerable computer program and change the course of execution. The result of successful code injection can be disastrous, for example, by allowing computer viruses or computer worms to propagate.

Code injection vulnerabilities occur when an application sends untrusted data to an interpreter. Injection flaws are most often found in SQL, LDAP, XPath, NoSQL queries, OS commands, XML parsers, SMTP headers, program arguments, etc. Injection flaws tend to be easier to discover when examining source code than via testing.

Wikipedia

Code injection is the common name for all vulnerabilities that rely on adding and running malicious code. All code injections work in about the same way, so let’s use SQL injection as an example and walk through the process:

<?php

$search = $_POST['search'];

global $wpdb;

$posts = $wpdb->get_results(
	'SELECT ID, post_title, post_content FROM ' . $wpdb->posts . '
	WHERE post_type="post"
	AND post_content LIKE "%' . $search . '%"'
);

wp_send_json_success( $posts );

Can you see the place for injection? Yes, it’s the $search = $_POST['search']; line, whose value goes into the query untouched. In the happy case, when the client enters a word or phrase, everything works without any problems. But what happens in the extreme case, when someone enters SQL code here? That’s where the fun begins…

With this code, an attacker can read practically any data from the database. Don’t believe it? Let’s start with a simple example and send %" OR 1 = "1 in the search field. That way we get all the entries from the posts table, including other post types, private posts, etc.

We can also get other data, such as usernames and passwords, by sending %" UNION SELECT 0, user_pass, user_login FROM wp_users WHERE user_pass LIKE "% in the same field. Even though we get a hash instead of the real password, weak passwords can be cracked with password-guessing programs like John the Ripper.
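To see why both payloads work, it helps to print the final SQL string. The sketch below is not part of the original example: build_vulnerable_query is a hypothetical helper that repeats the same concatenation, with the table name hard-coded as wp_posts (an assumed default prefix) so it runs without WordPress.

```php
<?php

// Hypothetical helper: repeats the concatenation from the vulnerable example.
// The table name is hard-coded (assumed default prefix) so no WordPress is needed.
function build_vulnerable_query( string $search ): string {
	return 'SELECT ID, post_title, post_content FROM wp_posts
	WHERE post_type="post"
	AND post_content LIKE "%' . $search . '%"';
}

// A normal search term stays inside the LIKE pattern.
echo build_vulnerable_query( 'hello' ), "\n\n";

// The payload closes the quote early and appends its own condition,
// so the last line becomes: AND post_content LIKE "%%" OR 1 = "1%"
echo build_vulnerable_query( '%" OR 1 = "1' ), "\n";
```

Because AND binds tighter than OR, the appended OR condition sits outside the post_type check, which is why other post types leak into the result.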

Fortunately, the main recipe for preventing injections is simple: remove unexpected characters from user data or, in other words, sanitize user input.

Sanitize early: the secure input technique

To sanitize means to remove unexpected characters (e.g., HTML symbols, some code, special symbols, etc.) from user input.

When dealing with user input, you must always sanitize this data once, as soon as possible.

Remember that even one unexpected request can cause a lot of problems, sleepless nights, and headaches.

Although the definitions are brief, we need to be clear about what each part means. What counts as user input? Fortunately, PHP has a clear and concise answer: the superglobal variables $_REQUEST, $_POST, $_GET, $_FILES, $_COOKIE, and in some cases $_ENV, $_SERVER, and $_SESSION.

Every developer should use a powerful tool like an IDE, which can remind you when certain actions are needed, and sanitizing is no exception. The WordPress team maintains the WordPress Coding Standards, which, in addition to checking code style, remind us to sanitize user input.

You’ve now read the word sanitizing a dozen times, but what does it mean in practice? It means removing from the data any unexpected characters that could potentially break your application or, in other words, passing the data through PHP or WordPress sanitizing functions.

PHP sanitizing functions list

Here is the list of PHP sanitizing functions:

  • filter_input( int $type, string $var_name, int $filter = FILTER_DEFAULT, array|int $options = 0 ): mixed
  • filter_input_array( int $type, array|int $options = FILTER_DEFAULT, bool $add_empty = true ): array|false|null
  • filter_var( mixed $value, int $filter = FILTER_DEFAULT, array|int $options = 0 ): mixed
  • filter_var_array( array $array, array|int $options = FILTER_DEFAULT, bool $add_empty = true ): array|false|null
  • (int) $var
  • (float) $var
  • intval( mixed $value, int $base = 10 ): int
  • floatval( mixed $value ): float
  • etc.
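
Here is a short sketch of how a few of these behave on hostile-looking input. It uses filter_var rather than filter_input, since filter_input only reads real request data; the sample strings are made up for illustration.

```php
<?php

// Casting keeps the leading numeric part and drops the rest.
$id = (int) '42; DROP TABLE wp_users';

// floatval() behaves the same way for decimals.
$price = floatval( '3.14abc' );

// FILTER_VALIDATE_INT rejects the whole value instead of trimming it.
$strict_id = filter_var( '42abc', FILTER_VALIDATE_INT );

// Options can narrow the accepted range.
$page = filter_var(
	'150',
	FILTER_VALIDATE_INT,
	[
		'options' => [
			'min_range' => 1,
			'max_range' => 100,
		],
	]
);

// FILTER_VALIDATE_EMAIL returns the address when valid, false otherwise.
$email = filter_var( 'user@example.com', FILTER_VALIDATE_EMAIL );

var_dump( $id, $price, $strict_id, $page, $email );
```

The casts return 42 and 3.14, while the two strict integer checks return false. Which behavior you want depends on whether a partly valid value should be cleaned up or rejected outright.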

WordPress sanitizing functions list

And WordPress sanitizing functions:

  • absint( mixed $maybeint ): int
  • esc_url_raw( string $url, string[] $protocols = null ): string
  • sanitize_bookmark( stdClass|array $bookmark, string $context = 'display' ): stdClass|array
  • sanitize_bookmark_field( string $field, mixed $value, int $bookmark_id, string $context ): mixed
  • sanitize_category( object|array $category, string $context ): object|array
  • sanitize_category_field( string $field, mixed $value, int $cat_id, string $context ): mixed
  • sanitize_file_name( string $filename ): string
  • sanitize_email( string $email ): string
  • sanitize_hex_color( string $color ): string|void
  • sanitize_hex_color_no_hash( string $color ): string|null
  • sanitize_html_class( string $class, string $fallback = '' ): string
  • sanitize_key( string $key ): string
  • sanitize_meta( string $meta_key, mixed $meta_value, string $object_type, string $object_subtype = '' ): mixed
  • sanitize_mime_type( string $mime_type ): string
  • sanitize_option( string $option, string $value ): string
  • sanitize_post( object|WP_Post|array $post, string $context = 'display' ): object|WP_Post|array
  • sanitize_post_field( string $field, mixed $value, int $post_id, string $context = 'display' ): mixed
  • sanitize_sql_orderby( string $orderby ): string|false
  • sanitize_term( array|object $term, string $taxonomy, string $context = 'display' ): array|object
  • sanitize_term_field( string $field, string $value, int $term_id, string $taxonomy, string $context ): mixed
  • sanitize_text_field( string $str ): string
  • sanitize_textarea_field( string $str ): string
  • sanitize_title( string $title, string $fallback_title = '', string $context = 'save' ): string
  • sanitize_title_for_query( string $title ): string
  • sanitize_title_with_dashes( string $title, string $raw_title = '', string $context = 'display' ): string
  • sanitize_user( string $username, bool $strict = false ): string
  • sanitize_user_field( string $field, mixed $value, int $user_id, string $context ): mixed
  • wp_filter_comment( array $commentdata ): array
  • wp_filter_content_tags( string $content, string $context = null ): string
  • wp_filter_nohtml_kses( string $data ): string
  • wp_filter_object_list( array $list, array $args = [], string $operator = 'and', bool|string $field = false ): array
  • wp_filter_oembed_iframe_title_attribute( string $result, object $data, string $url ): string
  • wp_filter_oembed_result( string $result, object $data, string $url ): string
  • wp_filter_pre_oembed_result( null|string $result, string $url, array $args ): null|string
  • wp_kses( string $string, array[]|string $allowed_html, string[] $allowed_protocols = [] ): string
  • etc.

With so many functions above, how do you find the right one for your case? Choose the most restrictive function that still accepts valid data. In other words, look at your data and think logically. Naturally, your best friends are critical thinking and autocomplete in PhpStorm. Also, feel free to dive into these functions and investigate their code. Let’s look at a few examples:

<?php

// Bail out early when the nonce is missing or invalid.
if ( ! isset( $_POST['_wpnonce'] ) || ! wp_verify_nonce( sanitize_key( $_POST['_wpnonce'] ), 'security::example' ) ) {
	return;
}

$post_title   = isset( $_POST['post_title'] ) ? sanitize_text_field( wp_unslash( $_POST['post_title'] ) ) : '';
$post_content = isset( $_POST['post_content'] ) ? sanitize_textarea_field( wp_unslash( $_POST['post_content'] ) ) : '';
$email        = isset( $_POST['email'] ) ? sanitize_email( $_POST['email'] ) : '';
$post_id      = isset( $_GET['post_id'] ) ? absint( $_GET['post_id'] ) : 0;
$referer_url  = ! empty( $_SERVER['HTTP_REFERER'] ) ? esc_url_raw( wp_unslash( $_SERVER['HTTP_REFERER'] ) ) : '';
$orderby      = ! empty( $_COOKIE['my_plugin_orderby'] ) && strtoupper( sanitize_text_field( wp_unslash( $_COOKIE['my_plugin_orderby'] ) ) ) === 'ASC' ?
	'ASC' : 'DESC';
$admin_color  = ! empty( $_SESSION['my_plugin_admin_color'] ) ? sanitize_hex_color( wp_unslash( $_SESSION['my_plugin_admin_color'] ) ) : '#181818';

To make it easier, I’ve prepared a shortlist of what to use and when:

  • numbers – (int)..., (float)..., absint( ... )
  • nonces – sanitize_key( ... )
  • one-line field – sanitize_text_field( wp_unslash( ... ) )
  • multi-line field – sanitize_textarea_field( wp_unslash( ... ) )
  • email – sanitize_email( ... )
  • urls – esc_url( ... ), esc_url_raw( ... )
  • file names – sanitize_file_name( wp_unslash( ... ) )

Why do we use wp_unslash sometimes?

WordPress adds extra slashes to the request superglobals ($_GET, $_POST, $_COOKIE, and $_SERVER) on every request, a holdover from PHP’s old magic quotes feature. So if someone submits <script>alert( 'Bingo' );</script>, your code receives it as <script>alert( \'Bingo\' );</script>. These slashes are not real protection against XSS attacks (a massive topic that we’ll discuss in another article), and they corrupt legitimate values that contain quotes, so we remove them with wp_unslash before sanitizing.
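
The effect is easy to reproduce outside WordPress. In the sketch below, addslashes stands in for the slashing WordPress applies to request data, and stripslashes stands in for wp_unslash on a single string; both substitutions are assumptions made so the snippet runs on plain PHP.

```php
<?php

// What the user typed into the form.
$typed = "O'Reilly";

// Stand-in for how the request value looks after WordPress adds slashes.
$slashed = addslashes( $typed );

// Stand-in for wp_unslash(): removes the added backslashes again.
$unslashed = stripslashes( $slashed );

echo $slashed, "\n";   // O\'Reilly
echo $unslashed, "\n"; // O'Reilly
```

Sanitizing the slashed value directly would leave the backslash in the stored text, which is why the examples above unslash first.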

How to protect your WordPress database?

Another important thing that needs your attention is working with the database. Whenever possible, use WordPress functions such as get_posts, get_terms, get_users, etc., as they are safe. Okay, but what about custom tables and SQL queries?

Sanitization cannot guarantee protection against SQL injection.

Let’s investigate the following example:

<?php

$search = sanitize_text_field( wp_unslash( $_POST['search'] ) );

global $wpdb;

$posts = $wpdb->get_results(
	'SELECT ID, post_title, post_content FROM ' . $wpdb->posts . '
	WHERE post_type="post"
	AND post_content LIKE "%' . $search . '%"' // SQL Injection here.
);

wp_send_json_success( $posts );

The code above is still vulnerable, because sanitize_text_field doesn’t remove quotes. SQL injection can arrive in different ways:

  • Multi-queries — close a previous query and write one more
  • Subqueries — use a subquery as a value for the main query
  • Break quotes — by closing a quote and adding code after
  • The UNION statement — add a new query via UNION SELECT ... construction.

The multi-query example: SELECT * FROM wp_posts; SELECT * FROM wp_users.

Technically, with the mysqli extension that WordPress uses, PHP only runs several statements in one call through the mysqli_multi_query function (or its object-oriented counterpart, mysqli::multi_query).

Firstly, the good news is that WordPress doesn’t use it, so to protect your code from multi-query SQL injections, always run custom queries through wpdb.

Secondly, subqueries can be injected only into numeric values and into the IN statement. Hence two rules: sanitize numeric values via (int), (float), absint( … ), etc., and don’t use changeable strings inside the IN statement (yep, it’s hard, but I don’t have another recipe).
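
For lists of numeric IDs, one common approach (not from the original article) is to cast every value to an integer and generate one %d placeholder per value for wpdb::prepare. The helper names below are hypothetical, and the wpdb call is shown only as a comment so the sketch runs without WordPress.

```php
<?php

// Hypothetical helper: keeps only positive integers from untrusted input.
function security_sanitize_ids( array $raw_ids ): array {
	$ids = array_map( 'intval', $raw_ids );

	return array_values(
		array_filter(
			$ids,
			static function ( int $id ): bool {
				return $id > 0;
			}
		)
	);
}

// Hypothetical helper: returns one %d placeholder per ID, comma-separated.
function security_in_placeholders( array $ids ): string {
	return implode( ', ', array_fill( 0, count( $ids ), '%d' ) );
}

$ids          = security_sanitize_ids( [ '7', '3 OR 1=1', 'abc', '12' ] );
$placeholders = security_in_placeholders( $ids );

// Inside WordPress the pieces would be combined roughly like this:
// $wpdb->get_results(
//     $wpdb->prepare(
//         "SELECT ID FROM {$wpdb->posts} WHERE ID IN ( $placeholders )",
//         $ids
//     )
// );
```

Only the placeholder string, which is built from a fixed template, is interpolated into the SQL; the values themselves still go through prepare. Skip the query entirely when the sanitized list is empty, since IN () is not valid SQL.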

Last but not least, to prevent quote breaking and injected UNION statements, use the dedicated wpdb methods for creating, updating, replacing, and deleting, and wrap the query in wpdb::prepare in all other cases.

To summarize all these rules for custom queries:

  • Always use wpdb
  • Always sanitize values
  • Numerical values should be sanitized with extra care
  • Don’t use changeable string values inside the IN statements
  • For creating, updating, replacing, and deleting, use the dedicated wpdb methods:
    • $wpdb->insert( ... );
    • $wpdb->update( ... );
    • $wpdb->replace( ... );
    • $wpdb->delete( ... );
  • For reading and any other custom query, always use $wpdb->prepare inside:
    • $wpdb->query( $wpdb->prepare( … ) );
    • $wpdb->get_results( $wpdb->prepare( … ) );
    • $wpdb->get_var( $wpdb->prepare( … ) );
    • $wpdb->get_col( $wpdb->prepare( … ) );
    • $wpdb->get_row( $wpdb->prepare( … ) );

In addition, here are a few examples of proper wpdb method usage:

<?php

global $wpdb;

$wpdb->insert(
	$wpdb->posts,
	[
		'post_title' => sanitize_text_field( wp_unslash( $_POST['post_title'] ) ),
	],
	[
		'%s',
	]
);

$wpdb->update(
	$wpdb->posts,
	[
		'post_title' => sanitize_text_field( wp_unslash( $_POST['post_title'] ) ),
	],
	[
		'ID' => absint( $_POST['post_id'] ),
	],
	[
		'%s',
	]
);

$wpdb->query(
	$wpdb->prepare(
		"SELECT * FROM {$wpdb->posts} WHERE ID > %d AND post_title LIKE %s",
		absint( $_POST['post_id'] ),
		'%' . $wpdb->esc_like( sanitize_text_field( wp_unslash( $_POST['search'] ) ) ) . '%'
	)
);
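
The esc_like call in the last query matters because % and _ are wildcards inside LIKE. As a rough illustration, the hypothetical function below escapes them with addcslashes, which is, to my understanding, what wpdb::esc_like does internally; treat it as a stand-in for experimenting outside WordPress, not as a replacement.

```php
<?php

// Hypothetical stand-in for $wpdb->esc_like(): escapes the LIKE wildcards
// (% and _) and the backslash itself so they are matched literally.
function security_escape_like( string $text ): string {
	return addcslashes( $text, '_%\\' );
}

// Wrap the escaped term in real wildcards afterwards, as in the query above.
$pattern = '%' . security_escape_like( '100%_done' ) . '%';

echo $pattern, "\n"; // %100\%\_done%
```

The escaped pattern still has to go through prepare, since escaping wildcards does not make a string safe for SQL on its own.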

How to create a custom sanitize function?

When you are working on a large project, you will most likely need your own functions to sanitize the data. In the example below, we have a fairly large function to handle the sort order.

function security_sanitize_orderby( $string, $default = 'ASC' ) {

	$string          = strtoupper( trim( sanitize_text_field( wp_unslash( $string ) ) ) );
	$allowed_orderby = [
		'RAND()',
		'ASC',
		'DESC',
	];
	$default         = in_array( $default, $allowed_orderby, true ) ? $default : 'ASC';

	foreach ( $allowed_orderby as $orderby ) {
		if ( $string === $orderby ) {
			return $orderby;
		}
	}

	return $default;
}

However, adding the function isn’t enough: PHPCS doesn’t know about it and will keep reporting unsanitized input, and who wants to mute PHPCS rules all the time? Therefore, let’s register our new function in the coding standards.

<?xml version="1.0"?>
<ruleset name="MyCodingStandards">
	<!-- ... -->
	<rule ref="WordPress.Security.ValidatedSanitizedInput">
		<properties>
			<property name="customSanitizingFunctions" type="array">
				<element value="security_sanitize_orderby"/>
			</property>
			<property name="customUnslashingSanitizingFunctions" type="array">
				<element value="security_sanitize_orderby"/>
			</property>
		</properties>
	</rule>
	<!-- ... -->
</ruleset>

Above, we added two properties: customSanitizingFunctions and customUnslashingSanitizingFunctions. The first registers your function as a sanitizing function. The second tells PHPCS that the function also unslashes the data itself, so you can skip wp_unslash before calling it.

To wrap up, always sanitize user input to be sure that your data is under your control. Sanitize data once and as soon as possible. Even one injection can cause a lot of sleepless nights. And if you work on open-source code, be doubly careful about sanitizing.
