Plant-parasitic nematodes (PPN) cause damage to crops across the world and are a major threat to global food security. A phylogenetic analysis of the phylum Nematoda [1,2] has shown that the ability to parasitize plants has arisen independently on at least four separate occasions within the phylum. The majority of the most economically important PPN species are located in Clade 12 (Tylenchida), and include migratory endoparasitic species as well as the biotrophic, sedentary endoparasitic root-knot and cyst nematodes. These nematodes, and the Clade 10 plant parasite Bursaphelenchus xylophilus, have been intensively studied, and extensive genome and transcriptome resources are available for them. These resources include full genome sequences for several root-knot and cyst nematodes (e.g., [3,4,5,6]) and B. xylophilus as well as extensive transcriptome analysis for a wide range of other species in these clades (reviewed in ). In contrast to endoparasitic nematodes, which are restricted to Clades 12 and 10, ectoparasitic nematode species can be found in all four plant parasite Clades . However, very little genome or transcriptome information is available for the ectoparasitic nematodes in Clades 1 (e.g., Trichodoridae) and 2 (e.g., Longidoridae) other than a small-scale expressed sequence tag project for the longidorid Xiphinema index . Consequently, the molecular processes by which Clades 1 and 2 ectoparasitic nematodes infect plants are poorly understood.
Ectoparasitic nematodes from Clades 1 and 2 cause damage to plants, either through direct feeding or by transmission of plant viruses. The economic damage caused by plant viruses explains why major vector species, including the nematodes X. index and Longidorus elongatus, are among the most studied ectoparasites. Both these nematodes belong to the family Longidoridae and are members of Clade 2 of the phylum. Longidorus elongatus has mainly been found in temperate regions. It feeds on a wide variety of herbaceous annual and perennial crops and weeds [11,12] and transmits two major plant viruses, Raspberry ringspot virus (RRV) and Tomato black ring virus (TBRV). Xiphinema index is primarily recorded on grapevine, though it can develop on other perennial crops . Its native area is the Middle East, from where it has been spread with cultivated grapevine to viticulture regions globally . It has a high economic impact by transmitting Grapevine fanleaf virus (GFLV), the major grapevine virus worldwide [15,16]. Both L. elongatus and X. index are diploid (with n = 7 and n = 10 chromosomes, respectively). Both reproduce by meiotic parthenogenesis  and rarely have males. For X. index, sexual reproduction has been described  but it is the exception, and the biotic or environmental factors that activate sexual reproduction are unknown. As for other Longidorids, multiplication of both nematodes is slow compared to Clade 12 plant parasites, and development from egg to adult may take one to several years [11,19].
All life stages of X. index and L. elongatus occur outside the host. Xiphinema index feeds using its long hollow odontostyle to penetrate root cells and ingest cell contents. When feeding at the root tip, large multinucleate, metabolically active cells, similar in appearance and ontogeny to the giant cells of Clade 12 root-knot nematodes, are formed . At sites other than the root tip, the nematode feeds from a column of cells in the root [21,22] and can induce necrosis of damaged tissues. Responses of the host plant may also differ depending on the host status of the plant. For example, Ficus carica is a good host for X. index and modified host cells are seen frequently, whereas on a poor host (e.g., Solanum lycopersicum) non-modified cells are observed at root tips . Feeding by X. index alternates phases of withdrawal and ingestion of cell contents with phases of inactivity when the nematode is thought to be injecting saliva. Feeding behaviour and gall formation by L. elongatus are similar, even though this nematode feeds exclusively on the root tips of its hosts and does not induce multinucleate cells because mitosis and cytokinesis occur together during hyperplasia . Changes to nuclei and DNA levels within galls induced by L. elongatus have been recorded .
Analysis of the genomes and transcriptomes of PPN to date has shown that horizontal gene transfer (HGT) has substantially contributed to plant-parasitic nematode genomes and played a key role in the evolution of plant parasitism [8,25]. While not all genes acquired via HGT have been implicated in parasitism, and similarly not all genes implicated in parasitism are acquired via HGT, some genes acquired via HGT are indeed clearly involved in the parasitism process. For instance, a wide range of cell wall-degrading enzymes, including cellulases and pectate lyases, is present in Clade 10 and 12 PPN (reviewed in ). A key finding is that multiple independent HGT events have occurred and may have facilitated the evolution of plant parasitism in several different groups. This is best illustrated by the presence of different cellulases in Clade 10 and Clade 12 PPN; the Clade 12 PPN contain Glycoside Hydrolase Family 5 (GH5) cellulases that are most likely to have been acquired from bacteria , while the Clade 10 PPN B. xylophilus contains Glycoside Hydrolase Family 45 (GH45) cellulases that were probably acquired from fungi . Besides degradation of the plant cell wall, some genes acquired via HGT have been shown to be involved in processing of nutrients from the plants or manipulation of the plant defence system . For example, root-knot and cyst nematodes have acquired invertases from bacteria that convert sucrose from the plant into glucose and fructose, which are readily processed by animals .
The absence of genome and transcriptome information for early-branching ectoparasitic nematodes means that it is not known whether HGT has also been important in the evolution of plant parasitism in these groups. In order to better understand the mechanisms underpinning parasitism by these ectoparasitic nematodes, and to determine the extent of HGT in these independently evolved PPN, we report the deep sequencing of transcriptomes of X. index and L. elongatus and an analysis of genes potentially acquired via HGT in these species. We demonstrate the presence of a biochemically active Glycoside Hydrolase Family 12 (GH12) cellulase of likely bacterial origin in X. index, confirming that independent HGT has occurred in this group and may have played a role in the evolution of plant parasitism.
Horizontal gene transfer (HGT) is the transmission of genes between organisms by means other than direct (vertical) inheritance from parental lineages to their offspring. HGT is prevalent in prokaryotes , with substantial proportions of bacterial genes having been acquired horizontally rather than vertically inherited . These horizontally acquired genes play important roles in bacteria, including the spread of antibiotic resistance and the emergence of pathogenicity [3,4,5,6]. Although HGT is much less prevalent in eukaryotes, and particularly multicellular eukaryotes, there are reported cases in the literature, including in Viridiplantae and Metazoa [7,8,9,10,11,12]. Some of the reported examples also point to important roles in the recipient organism. This suggests that the genomic and biological impact of HGT could be more widespread in the tree of life than initially thought .
These HGT events challenge the view of a purely tree-like backbone underlying inheritance of genetic information across species . With technological progress and the falling cost of genome sequencing, genomes from an ever wider diversity of organisms are becoming available, and this trend will continue. To provide a more comprehensive view of the contribution of HGT to the genomes of the different organisms across the tree of life, methods to rapidly detect candidate HGT are needed.
Two classical ways to identify candidate HGT are to study (i) the intrinsic (e.g., GC content, codon usage distribution) and/or (ii) the extrinsic (e.g., percent identity, BLAST  E-value against other species) characteristics of the genes of a species of interest. For instance, in the ‘intrinsic’ category, a gene whose GC content and codon usage deviate from those of the rest of the genes of a species of interest (receiver) might be considered a candidate for horizontal acquisition. Similarly, in the ‘extrinsic’ category, if a gene from a species of interest (receiver) shows higher similarity (lower E-value, higher bit score in BLAST) to sequences from distantly related (donor) species than to genes of close relatives, this gene is a candidate HGT. Another frequent ‘extrinsic’ approach is to assess the phylogenomic distribution of genes across a panel of diverse organisms by clustering genes in groups of orthologs and looking for those with a patchy distribution (phylogenomic profiles). For a more comprehensive overview of the different categories of methods to detect HGT, please refer to the recent review by Ravenhall et al. .
Methods in the intrinsic category can be efficient to identify HGT, as long as the GC content and codon usage distributions of the receiver and donor genomes are sufficiently different. However, if these intrinsic metrics are closely related or if the HGT event is ancient and the acquired gene has adapted to the GC content and codon usage of the receiver organism, such methods will not be able to identify HGT .
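As an illustration of the intrinsic approach, the following minimal sketch flags genes whose GC content deviates strongly from the rest of a gene set. It is not taken from any of the tools cited here: the function names, the use of a z-score, and the cutoff of 2 standard deviations are assumptions chosen for the example, and codon usage is ignored.

```python
from statistics import mean, pstdev


def gc_content(seq: str) -> float:
    """Fraction of G or C bases in a nucleotide sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_outliers(genes: dict[str, str], z_cutoff: float = 2.0) -> list[str]:
    """Return ids of genes whose GC content lies more than z_cutoff
    standard deviations from the mean of the gene set.

    z_cutoff is an illustrative assumption, not a published threshold.
    """
    gc = {gene_id: gc_content(seq) for gene_id, seq in genes.items()}
    mu = mean(gc.values())
    sd = pstdev(gc.values())
    if sd == 0:
        # All genes have identical GC content: nothing stands out.
        return []
    return [gene_id for gene_id, value in gc.items()
            if abs(value - mu) / sd > z_cutoff]


# Toy gene set: nine AT-rich genes and one GC-rich gene.
toy_genes = {f"gene{i}": "ATGA" * 10 for i in range(9)}
toy_genes["alien"] = "GCGC" * 10
candidates = gc_outliers(toy_genes)
```

As noted above, such a filter only works when donor and receiver compositions differ; an ancient transfer whose sequence has drifted toward the receiver's GC content would not be flagged.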
Methods in the extrinsic category do not suffer from indistinguishable GC content and codon usage, as they rely only on a difference of magnitude in BLAST or other similarity metrics between closely related and distant taxa. These methods are better suited to identifying HGT between distant species (e.g., trans-kingdom) and require a reference sequence library that is as comprehensive as possible and covers a diversity of species. One such method in the extrinsic category is the Alien Index metric. It was first introduced by Gladyshev et al.  to identify HGT of non-metazoan origin in a metazoan species, the bdelloid rotifer Adineta vaga. Briefly, an Alien Index (AI) is calculated to measure a difference of magnitude between the best non-metazoan and best metazoan E-value. If the best non-metazoan E-value is closer to 0 than the best metazoan E-value, the AI will be positive; an AI ≥ 45 has been taken to indicate that an HGT event is very likely. This AI method was used again to assess the total contribution of genes of non-metazoan origin in the whole genome of the bdelloid rotifer once it was sequenced . Using custom Perl scripts, our laboratory and collaborators also used AI to identify candidate HGT of non-metazoan origin in the genome of the plant-parasitic nematode Globodera rostochiensis , in the genomes of several panagrolaimid nematodes  as well as in the transcriptomes of the nematodes Nacobbus aberrans , Xiphinema index  and other as yet unpublished genomes. Calculation of AI scores for whole proteomes had previously been implemented in a software package called AlienG . It was used to highlight the importance of HGT in the colonization of land by plants  and in several other studies of HGT across different lineages [24,25,26]. In the original AlienG publication, an AI > 30 was deemed a good indicator of acquisition via HGT. However, as far as we know, this software is neither publicly available for download nor deployed on a web server.
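The AI calculation described above can be sketched as follows. The formulation used here, a difference of natural logarithms of the two best E-values with a small constant added to avoid taking the logarithm of zero, is the one commonly attributed to Gladyshev et al.; the value 1e-200 for that constant, the default E-value of 1 when a group has no hit, and all function names are assumptions of this sketch rather than details given in the text.

```python
import math

# Assumed small constant that keeps log() defined when BLAST reports E = 0.
E_FLOOR = 1e-200
# Assumed E-value assigned to a taxonomic group with no hit at all.
NO_HIT_EVALUE = 1.0


def best_evalue(evalues: list[float]) -> float:
    """Smallest E-value in a list of hits, or NO_HIT_EVALUE if there are none."""
    return min(evalues) if evalues else NO_HIT_EVALUE


def alien_index(best_metazoan: float, best_nonmetazoan: float) -> float:
    """AI = ln(best metazoan E + floor) - ln(best non-metazoan E + floor).

    Positive when the best non-metazoan hit has the smaller E-value.
    """
    return math.log(best_metazoan + E_FLOOR) - math.log(best_nonmetazoan + E_FLOOR)


# Toy example: best metazoan hit at 1e-10, best non-metazoan hit at 1e-50.
ai = alien_index(best_evalue([1e-10, 1e-3]), best_evalue([1e-50, 1e-20]))
is_candidate = ai >= 45  # threshold quoted in the text for a very likely HGT
```

Under this formulation an AI of 45 corresponds to a ratio of roughly 10^20 between the two E-values, and the score saturates near 460 when one group has no hit and the other has an E-value of 0.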
Hence, no user-friendly web tool or downloadable software is available so far to compute AI scores directly from BLAST results. Furthermore, extracting the taxonomic information from BLAST results against the National Center for Biotechnology Information's (NCBI) (Bethesda, MD, USA) protein libraries can be a long and difficult task, which hinders wider adoption of such methods.
To circumvent this difficulty of retrieving taxonomic information, it is tempting to divide the NCBI’s non-redundant protein library (NR) or other sequence libraries into taxonomic subsets (i.e., a non-alien subset consisting of sequences from species closely related to the receiver species of interest, and one or several subsets consisting of sequences from distantly related (alien) candidate donors). This approach has been implemented in a Perl program named alien_index . However, E-values obtained from BLAST against different sequence libraries are not comparable, which makes calculation of an AI questionable, especially if the libraries are of different sizes. This is one of the reasons why another score, named HGT index (or h), has been proposed . The principle is very similar to the AI score, except that it is calculated as a difference between the best donor and best recipient species bit scores. In contrast to E-values, bit scores are comparable between sequence libraries of different sizes. This sidesteps the problem of extracting taxonomic information from a single large BLAST result, because several different BLASTs can be run against several taxonomic subdivisions of a sequence library (e.g., NCBI’s NR, Swissprot, Uniprot ). However, it requires candidate donor taxa to be defined a priori and multiple BLASTs to be run against different libraries that must either be downloaded and formatted or constructed by the user. The HGT index method was first used to determine genes of non-metazoan origin in the transcriptome of the bdelloid rotifer Adineta ricciae  and later used to infer the contribution of HGT to the transcriptomes of several different metazoan species, including vertebrates .
A refined scoring method has recently been proposed that takes into account not only sequence similarity between receiver and candidate donors, measured as BLAST bit score, but also taxonomic distance, measured as the number of steps in the NCBI’s taxonomic lineage to reach the common ancestor of the query and subject species . The software is publicly available and was initially developed to identify HGT in fungal genomes. However, it requires local installation of voluminous data and manual configuration, and expert skills are needed to tune and adapt the method to identify HGT in other phyla.
Similarly, a method called DarkHorse, initially developed to identify HGT in bacteria, is available as software that can be downloaded and installed locally . This method proposes a lineage probability index score (LPI) based on the taxonomic ranks of top hits to identify candidate HGT from BLAST against protein databases. Here again, the method can be generalized to other taxa of interest. However, the installation is restricted to users with computational skills on a Unix-based platform and requires installing, configuring and managing a MySQL database as well as downloading the whole NCBI NR database and the NCBI taxonomy.
Phylogenetic methods that compare a gene or protein tree to a reference species tree and identify inconsistencies are the gold standard for predicting HGT. While such methods exist [32,33,34,35], they need to be implemented by expert users. These phylogeny-based methods require a reference species tree for comparison. Such reference trees are not always available for the group of species of interest, although some methods generate a reference species tree and compute individual gene trees in parallel . Furthermore, producing phylogenetic trees for a whole proteome or large gene set can be extremely time consuming, especially if the initial homology search retrieves numerous homologs for alignment and phylogenetic reconstruction. Because of their computationally demanding nature, phylogeny-based methods are currently poorly suited to the analysis of large genomes or large numbers of genes. Hence, AI or HGT-index based methods offer a useful way to rapidly identify candidate HGT and narrow down the number of genes to be phylogenetically analyzed afterwards.
As far as we know, there is currently no publicly available web tool that allows users to easily and rapidly identify HGT in large datasets directly from a BLAST result.
Here, we propose Alienness, a user-friendly web application that requires no software installation and is publicly accessible at http://alienness.sophia.inra.fr. Alienness requires nothing more than BLAST results against any sequence library at the NCBI and a few parameters to calculate AI for a set of query sequences (e.g., a whole proteome). Alienness can be applied to any genome of interest to identify candidate HGT from any donor to any recipient taxonomic group.
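The contrast between E-values and bit scores can be made concrete with a small sketch. It assumes the standard BLAST relation E = m · n · 2^(−S′), where S′ is the bit score, m the query length and n the library length, and ignores edge-effect corrections; the HGT index is written as the plain difference of best bit scores described above. Function names and the convention of a bit score of 0 for a group with no hit are assumptions of this example.

```python
import math


def evalue_from_bitscore(bit_score: float, query_len: float, db_len: float) -> float:
    """Expected number of chance hits for a given bit score: E = m * n * 2**(-S').

    Uses raw lengths; real BLAST applies effective-length corrections.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


def hgt_index(best_donor_bitscore: float, best_recipient_bitscore: float) -> float:
    """h = best donor-group bit score minus best recipient-group bit score.

    A group with no hit is assumed to contribute a bit score of 0.
    """
    return best_donor_bitscore - best_recipient_bitscore


# The same alignment (bit score 50, query of 300 residues) searched
# against a small and a large library gives E-values 1000-fold apart.
e_small = evalue_from_bitscore(50.0, 300, 1e6)
e_large = evalue_from_bitscore(50.0, 300, 1e9)

# The bit-score difference is unaffected by library size.
h = hgt_index(250.0, 100.0)
```

Because the library length enters the E-value directly, an AI computed from separate searches against subsets of different sizes mixes a similarity signal with a size effect, whereas h does not.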
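The taxonomic-distance component can be illustrated with a short sketch that counts how many steps separate a query species from its common ancestor with a subject species, given two lineages listed from root to species. This is not the published scoring function: the lineages below are abbreviated and purely illustrative, and the function name is hypothetical.

```python
def steps_to_common_ancestor(query_lineage: list[str],
                             subject_lineage: list[str]) -> int:
    """Number of steps up the query lineage needed to reach the deepest
    node it shares with the subject lineage.

    Both lineages are ordered from the root down to the species.
    """
    shared = 0
    for query_node, subject_node in zip(query_lineage, subject_lineage):
        if query_node != subject_node:
            break
        shared += 1
    return len(query_lineage) - shared


# Abbreviated, illustrative lineages (real NCBI lineages have many more ranks).
x_index = ["cellular organisms", "Eukaryota", "Metazoa", "Nematoda",
           "Xiphinema index"]
l_elongatus = ["cellular organisms", "Eukaryota", "Metazoa", "Nematoda",
               "Longidorus elongatus"]
bacterium = ["cellular organisms", "Bacteria", "Proteobacteria"]

near = steps_to_common_ancestor(x_index, l_elongatus)
far = steps_to_common_ancestor(x_index, bacterium)
```

A hit that combines a high bit score with a large number of steps is the pattern such a combined score is meant to reward.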
We tested the accuracy of Alienness on the genomes of two plant-parasitic nematodes, for which phylogenetically supported HGT of a whole series of genes involved in plant parasitism had been previously identified [37,38]. We found that all phylogenetically supported cases could be retrieved by Alienness with an AI > 9 and that this AI threshold corresponded to a low rate of putative false positives.
We believe Alienness will promote a more rapid exploration of candidate HGT across the tree of life and will contribute to assessing more globally the evolutionary significance of HGT.