Information about the location and extent of informal settlements is necessary to guide decision making and resource allocation for their upgrading. Very high resolution (VHR) satellite images can provide this useful information, however, different urban settlement types are hard to be automatically discriminated and extracted from VHR imagery, because of their abstract semantic class definition. State-of-the-art classification techniques rely on hand-engineering spatial-contextual features to improve the classification results of pixel-based methods. In this paper, we propose to use convolutional neural networks (CNNs) for learning discriminative spatial features, and perform automatic detection of informal settlements. The experimental analysis is carried out on a QuickBird image acquired over Dar es Salaam, Tanzania. The proposed technique is compared against support vector machines (SVMs) using texture features extracted from grey level co-occurrence matrix (GLCM) and local binary patterns (LBP), which result in accuracies of 86.65% and 90.48%, respectively. CNN leads to better classification, resulting in an overall accuracy of 91.71%. A sensitivity analysis shows that deeper networks result in higher accuracies when large training sets are used. The study concludes that training CNN in an end-to-end fashion can automatically learn spatial features from the data that are capable of discriminating complex urban land use classes.