Fake News Detection using Deep Learning

Kevin W
Jun 28, 2021

The topic of “fake news” remains of central concern in contemporary political and social discourse. In this post, I will expand upon my previous post to explore different ways to use deep learning to detect whether a given news article is reliable (‘real news’) or unreliable (‘fake news’). I’ll create four neural networks in this article: 1) a baseline, dense neural network; 2) a Convolutional Neural Network (CNN); 3) a hybrid CNN-LSTM (Long Short-Term Memory) network; and 4) a Bidirectional LSTM network.

Part 1: Data Preprocessing

I’ll be using the same Kaggle dataset for all four networks. The data is stored in a csv file with five columns: id, title, author, text, and label. Missing values were filled in with whitespace. For more information on the data, including data visualization and more comprehensive EDA, see my previous post. I fit a tokenizer on the training text that also filtered out punctuation and lowercased all text, used that tokenizer to transform the text into sequences, and then padded those sequences so that the input arrays would all be the same size. Finally, I used scikit-learn’s train_test_split function to split the data into training and testing sets.

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split

max_features = 4500
# fit a tokenizer that strips punctuation and lowercases the text
tokenizer = Tokenizer(num_words=max_features, filters='!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n', lower=True, split=' ')
tokenizer.fit_on_texts(texts=train_data['text'])
word_index = tokenizer.word_index
vocab_size = len(word_index)
print(vocab_size)
# convert each article to an integer sequence and pad them all to the same length
X = tokenizer.texts_to_sequences(texts=train_data['text'])
X = pad_sequences(X, maxlen=max_features, padding='pre')
print(X.shape)
y = train_data['label'].values
print(y.shape)
# splitting the training data for training and validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, shuffle=True)

Part 2: Creating a Baseline Model

Before diving into more complex models, I first wanted to create a simple Dense Neural Network to serve as a point of reference. I built it using the Sequential API, alternating Dense layers of 512 neurons (with L2 regularization) with Dropout layers set to a dropout rate of 0.5. The model achieved only 52% accuracy, significantly worse than the shallow models I previously created.

import tensorflow as tf
from tensorflow.keras import regularizers

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=max_features, output_dim=120),
    tf.keras.layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001), activation='elu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001), activation='elu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001), activation='elu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(512, kernel_regularizer=regularizers.l2(0.0001), activation='elu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1)  # single output unit (the later models add an explicit sigmoid here)
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
history = model.fit(X_train, y_train, epochs=5, batch_size=100, validation_data=(X_test, y_test))
import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper right')
plt.show()

Part 3: Creating a CNN Model

Next, I created a multilayer Convolutional Neural Network. Following a pattern similar to my baseline, I repeated a sequence of Conv1D layers with a decreasing number of filters, MaxPooling layers, and Dropout layers with a dropout rate of 0.5. This model performed slightly better than the baseline, achieving 57% accuracy after 5 epochs, but still not well enough to be deployable.

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=max_features, output_dim=120),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Conv1D(128, 4, activation='relu', kernel_regularizer=regularizers.l2(0.0001)),
    tf.keras.layers.MaxPool1D(pool_size=4),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Conv1D(64, 4, activation='relu', kernel_regularizer=regularizers.l2(0.0001)),
    tf.keras.layers.MaxPool1D(pool_size=4),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Conv1D(32, 4, activation='relu', kernel_regularizer=regularizers.l2(0.0001)),
    tf.keras.layers.MaxPool1D(pool_size=4),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
history = model.fit(X_train, y_train, epochs=5, batch_size=100, validation_data=(X_test, y_test))

Part 4: Creating a Hybrid CNN-RNN Model

The next model I created was a hybrid CNN-RNN model. After a convolutional layer, I used two LSTM layers and then two Dense layers. I also used three dropout layers set to 0.5 to apply some regularization to the model.

This model once again improved on the one created above, achieving 96% accuracy after 5 epochs.

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=max_features, output_dim=120),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Conv1D(64, 5, activation='relu', kernel_regularizer=regularizers.l2(0.0001)),
    tf.keras.layers.MaxPooling1D(pool_size=4),
    tf.keras.layers.LSTM(20, return_sequences=True),
    tf.keras.layers.LSTM(20),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(512, activation='relu', kernel_regularizer=regularizers.l2(0.0001)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(256, activation='relu', kernel_regularizer=regularizers.l2(0.0001)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
history = model.fit(X_train, y_train, epochs=5, batch_size=100, validation_data=(X_test, y_test))

Part 5: Creating a Bidirectional-LSTM Model

Lastly, I created a Bidirectional-LSTM model. After 5 epochs, this model achieved 95% validation accuracy; however, it had already reached 97% validation accuracy after only two epochs. This architecture appears to cause the model to overfit rather quickly, whereas the model above was still seeing validation accuracy rise throughout the training process.
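A minimal sketch of a Bidirectional-LSTM classifier in the same style as the models above is shown below; the layer widths and dropout rates here are illustrative choices, not necessarily the hyperparameters behind the accuracy figures just quoted.

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=max_features, output_dim=120),
    tf.keras.layers.Dropout(0.5),
    # the Bidirectional wrapper runs an LSTM over the sequence in both directions
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.0001)),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()
history = model.fit(X_train, y_train, epochs=5, batch_size=100, validation_data=(X_test, y_test))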

Conclusion: Model Evaluation and Considerations for Further Research

Throughout this article, I demonstrated how to create some common deep learning models. The models that incorporated LSTM cells performed best, and one of them would likely be my choice if I were tasked with deploying a model into a production environment.
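Before deploying, it is worth looking beyond raw accuracy. Here is a minimal sketch of scoring the held-out set with whichever trained model is chosen, assuming the X_test/y_test split from Part 1 and that label 0 means reliable and 1 means unreliable, as in the Kaggle dataset:

from sklearn.metrics import classification_report

# predicted probabilities on the held-out set, thresholded at 0.5
y_prob = model.predict(X_test)
y_pred = (y_prob > 0.5).astype(int).ravel()

# per-class precision, recall, and F1 (assumes 0 = reliable, 1 = unreliable)
print(classification_report(y_test, y_pred, target_names=['reliable', 'unreliable']))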

The task of identifying fake news, detecting its appearance online, and stopping its spread will likely continue to grow in importance. While classifying article text is an important accomplishment in this area, there is still much more research to be done. Take, for instance, the fact that these models only consider the raw content of news articles. One could also consider how fake news propagates through a social network, such as in an exciting paper that demonstrates the efficacy of graph neural networks for modeling the spread of fake news across Twitter.

I hope you enjoyed my articles on this topic! All code can be found on my GitHub :)
