The Carnegie Institution, Department of Plant Biology, Stanford, California 94305
The last decade has led to an explosion of genomic information that is being used to help researchers understand the gene content of organisms, how gene content and expression patterns may explain the ecological niche in which the organism lives, the ways in which gene content has been arranged and modified by evolution, the movement of genes and gene clusters among different organisms, and environmental and developmental processes that modulate the expression of genes. In this introductory manuscript, I discuss select algae and how genomics is impacting our understanding of these organisms. Four algae for which near-full genome information has become or will shortly become available are the red alga Cyanidioschyzon merolae, the green alga Chlamydomonas reinhardtii, the diatom Thalassiosira pseudonana, and the marine picoeukaryote Ostreococcus tauri. There is also the full sequence of the vestigial red algal genome associated with the nucleomorph of the Cryptomonad Guillardia theta. A number of other algal genomes, such as that of Phaeodactylum tricornutum, are currently being sequenced. Furthermore, there has been a substantial body of cDNA sequence information generated from various algae. Algae are important contributors to global productivity and biogeochemical cycling, but genomics of these organisms is still in its infancy, and the resources to support large scale projects concerning algal genomes and global gene expression are limited. However, it is useful to discuss the algae that are currently being examined using genomic technologies, some of the information that has been generated from genomic analyses, criteria that may be used for choosing specific organisms for future genome studies (and viable candidates for such studies), and how the information gained might help us better understand structural, functional, developmental, and evolutionary aspects of photosynthetic organisms.
Genomics is often viewed as the generation and analysis of nucleotide sequences of the full or near-full genome as well as cDNA collections. From sequence information, researchers identify individual genes and repeat elements, analyze the organization and arrangement of genes, and make comparisons among genomes with respect to gene arrangement and sequence identity/similarity; sometimes descriptions of genomics extend to the use of methods for examining global gene expression using microarray technology.
A number of different bacterial and mammalian systems (including humans) that serve as models for genomic studies have been developed because the information gained from such studies can be of immediate importance with respect to human health. However, other systems, including the algae, are gradually benefiting from rapid, widespread use of genomic techniques. Although many would consider the development of algal genomic systems as less urgent than those associated with humans, mice, and pathogenic bacteria, the algae are critical components of many habitats on the planet and are major producers of fixed carbon, especially in marine ecosystems.
The algae are a highly diverse group of photosynthetic organisms that are ubiquitous on the Earth and are critical for maintaining terrestrial and atmospheric conditions. These organisms come in a variety of forms ranging from the tiny picoplankton that inhabit open oceans (Díez et al., 2001; Biegala et al., 2003; see also http://www.sb-roscoff.fr/Phyto/PICODIV/PICODIV_publications.html) to the macrophytic organisms that form turf meadows and forests in coastal waters (Graham and Wilcox, 2000). The diversity among the algae is enormous, not only with respect to size and shape of the organisms, but also with respect to the production of various chemical compounds through novel biosynthetic pathways. For example, the different pigments that comprise the light-harvesting antennae in algae are visually striking and biochemically diverse. In the green algae, the light-harvesting antennae contain mostly chlorophylls a and b, with a significant level of carotenoids, while the antennae pigments of the red algae and cyanobacteria are predominantly the phycobiliproteins, in which bilin chromophores (phycoerythrobilin and phycocyanobilin) are covalently bonded to apophycobiliproteins. In contrast, diatoms and dinoflagellates use oxygenated carotenoids as their major light-harvesting pigments. The composition of polysaccharides and cell walls also shows enormous diversity among the algae. For example, some algae have microfibrillar walls of cellulose or other polysaccharides and others have proteinaceous or siliceous walls or scales.
Algae are also economically important since they serve as a source of food, and in many parts of the world they can be used in salads, soups, and as garnish. Most well known among algal foods is the wrap for sushi, or nori, which is derived from the dried fronds of the red alga Porphyra. Algae are also used as a vitamin source by the health food industry (http://www.1001beautysecrets.com/nutrition/algae/), especially cyanobacteria or blue-green algae (http://www.crystalpurewater.com/health.htm) since they can be rich in the vitamin A precursor β-carotene. But there is a wide range of uses for algae and algal products. They are used as feed additives for aquaculture, as coloring agents to enhance the appeal of food, and as fluorescent tags to identify, quantify, or localize surface antigens for specific medical assays. Algae also synthesize a number of different polysaccharides and lipids that, in addition to serving as carbon storage compounds, perform biological functions and have commercial value. Some of the polysaccharides are anionic and bind metal ions, chelate heavy metals, and help maintain a hydration shell around the alga. The commercially valuable polysaccharides are agar, carrageenans, alginates, and fucoids (Berteau and Mulloy, 2003; Feizi and Mulloy, 2003; Drury et al., 2004; Matsubara, 2004). Certain of these polysaccharides have anticoagulant characteristics (Matsubara, 2004), while others are used for making solid medium for growing bacteria in the laboratory, gels for the delivery of medicines, thickeners in food products such as ice cream, and numerous products including cosmetics, cleaners, ceramics, and toothpaste (http://www.nmnh.si.edu/botany/projects/algae/Alg-Prod.htm).
Furthermore, both diatoms and dinoflagellates synthesize long chain polyunsaturated fatty acids (fish oils) that appear to be beneficial for mammalian brain development (Chamberlain, 1996; Salem et al., 2001); these fatty acids are sold as health food products but are also being incorporated into baby formula in many countries throughout the world.
While most algae thrive as free-living organisms, some are more prevalent in symbiotic associations, and still others have evolved into parasites (Goff and Coleman, 1995). Many of the symbiotic associations established by algae are critical for survival of the heterotrophic host organism in environments with low levels of organic carbon compounds. For example, the dinoflagellate Symbiodinium sp. populates and transfers fixed carbon to the tissue of corals, allowing for the establishment and maintenance of the coral reefs that physically stabilize the coastal environment (Murdoch, 1996). Rising temperatures are causing bleaching of the reefs, which could have a pronounced impact on the environment (Coles and Brown, 2003). The growth of specific algae in oceans, estuaries, and lakes can be of concern since they can attain very high densities or blooms that stimulate the proliferation of consumers and the generation of anoxic conditions that can suffocate aquatic animals. A number of the algae and cyanobacteria that form such blooms also produce neurotoxins and are a threat to global water supplies and fisheries (especially with respect to the shellfish industry). Furthermore, the composition of phytoplankton communities has implications with respect to carbon fluxes and the trophic transfer of carbon in food chains.
One difficulty facing algal biologists is the challenge to move from morphological, chemical, and geophysical descriptors of algal/bacterial communities to more molecular descriptors that include both gene content and expression levels. Indeed, our understanding of biological, biophysical, and geochemical processes will all be informed by the wealth of data that can be acquired using a spectrum of biotechnological methods that have been developed over the last 20 years. Much of this information will have its origins in acquiring the full-gene content of an organism, combined with tools to determine the level of expression of specific genes under different environmental conditions, at different developmental stages, and in different tissue types. Naturally, genomic studies are expensive and the resources to support such studies are limited. It is critical that societies and scientific communities with knowledge of the scientific and economic importance of particular groups of organisms, such as the algae, make informed choices as to which organisms would be of most benefit for genomic examination, whether involving whole genome or cDNA projects. It would be most efficient to solicit the aid of large, well-equipped centers that have an expert staff to complete the required sequencing tasks efficiently. However, the first important step for the scientific community with a working knowledge of the field is to define the organisms for which full-genome and cDNA sequences should be obtained, to develop collaborations to facilitate the generation and analysis of genomic information, to petition various agencies for the funds required to obtain the sequence information, and to help train the community, either through courses or workshops and tutorials over the internet, in ways in which the genomic information can be used and extended.
SEQUENCED GENOMES
Currently, there are few algae for which the nuclear genome has been sequenced. Recently, complete or nearly completed sequences of the genomes of the red alga Cyanidioschyzon merolae (http://merolae.biol.s.u-tokyo.ac.jp/; Matsuzaki et al., 2004), the diatom Thalassiosira pseudonana (http://genome.jgi-psf.org/thaps1/thaps1.home.html; Armbrust et al., 2004), and the green alga C. reinhardtii (http://genome.jgi-psf.org/chlre2/chlre2.home.html) have been made publicly available. Other genomes either sequenced and not released or in the process of being sequenced include Ostreococcus tauri (http://www.iscb.org/ismb2004/posters/stromATpsb.ugent.be_844.html; Derelle et al., 2002), Volvox carteri, and Phaeodactylum tricornutum (see http://trace.ensembl.org/perl/traceview?attr=tt_ce_sp&tt_1=1). In addition, the complete sequences of the three chromosomes that constitute the nucleomorph genome of G. theta, which represents a vestigial red algal genome, have been reported (Douglas et al., 2001). But this is just the beginning of an era that is triggering an explosion of information on gene content, gene organization, and the sequences that control gene expression from numerous organisms within the different kingdoms of life. Below, I discuss the algae for which there is significant genomic sequence information (discussed in various articles in this issue of Plant Physiology, especially for C. reinhardtii), but I also try to raise issues concerning the direction of algal genomics and ways to decide on organisms for which full genome sequences will be most immediately useful.
Nucleomorph Genome of G. theta
Of the chlorophyll c-containing chromophytic algae, the Cryptomonads are the only organisms to retain the enslaved red algal nucleus that resulted from a secondary endosymbiotic event (Cavalier-Smith, 2000; Maier et al., 2000). This reduced nucleus or nucleomorph has an envelope membrane with nuclear pores, but the genetic content of the nucleomorph is highly reduced relative to a red algal genome. The DNA of the nucleomorph of the Cryptomonad G. theta has now been sequenced.
The nucleomorph of G. theta contains 3 mini-chromosomes that together constitute 551 kb. This genome is predicted to have 464 genes encoding polypeptides, of which nearly one-half encode proteins of unknown function. The genes are highly compacted in the genome (which has almost no noncoding DNA), and only 17 of the protein coding genes contain introns that can be removed by a spliceosome. Most of the introns are near the 5' ends of the transcripts, and 11 of these 17 intron-containing genes encode ribosomal proteins.
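The degree of compaction described above can be checked with simple arithmetic. The following is a minimal Python sketch that uses only the figures quoted in this section (551 kb, 464 predicted protein-coding genes, 17 genes with spliceosomal introns, 11 of which encode ribosomal proteins). The variable names are illustrative, and the calculation ignores RNA genes, so the result is a rough estimate rather than a published statistic.

```python
# Figures quoted in the text for the G. theta nucleomorph genome.
GENOME_SIZE_KB = 551          # three mini-chromosomes combined
PROTEIN_GENES = 464           # predicted polypeptide-encoding genes
INTRON_GENES = 17             # protein-coding genes with spliceosomal introns
INTRON_RIBOSOMAL_GENES = 11   # intron-containing genes encoding ribosomal proteins

# Average stretch of genome available per protein-coding gene (kb per gene).
kb_per_gene = GENOME_SIZE_KB / PROTEIN_GENES

# Fraction of protein-coding genes that contain a spliceosomal intron.
intron_fraction = INTRON_GENES / PROTEIN_GENES

# Fraction of the intron-containing genes that encode ribosomal proteins.
ribosomal_fraction = INTRON_RIBOSOMAL_GENES / INTRON_GENES

print(f"{kb_per_gene:.2f} kb per gene")
print(f"{intron_fraction:.1%} of protein-coding genes contain introns")
print(f"{ribosomal_fraction:.1%} of intron-containing genes encode ribosomal proteins")
```

On these numbers each protein-coding gene occupies, on average, a little over 1 kb of the genome, which is consistent with the statement that almost no noncoding DNA remains.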
There are a number of interesting aspects with respect to the protein coding sequences of the nucleomorph genome. Most proteins encoded on the nucleomorph genome are needed for the replication of the chromosomes, gene expression, and perpetuation of periplastid ribosomes, with few required for other cellular functions. For example, a number of the nucleomorph-encoded proteins participate in the processing of mRNA, the removal of tRNA introns, and the maturation of rRNA. However, the genome does encode 30 chloroplast-targeted proteins, 3 transporters, and a few enzymes (one anabolic and some regulatory). Since the plastid genome houses a small percentage of the genes required for the biogenesis of functional chloroplasts, and the nucleomorph only encodes an additional 30 chloroplast-localized proteins, most of the proteins that function in the chloroplast must be synthesized in the cytoplasm of the cell and traverse the rough endoplasmic reticulum (ER), the periplastid membrane, pass through the periplastid space, and then cross the double envelope membrane of the plastid to reach their site of function within the organelle. The arrangement of these membranes and the location of the nucleomorph within the periplastid space are clearly diagrammed by Douglas et al. (2001).
Of the plastid-localized polypeptides encoded on the nucleomorph genome, only a few function in photosynthesis (rubredoxin and HLIP; the latter is a small protein in the light-harvesting complex (LHC) protein family important for survival during high light stress in cyanobacteria [He et al., 2001]), plastid division and gene expression, nucleic acid metabolism, and protein translocation into the plastid and thylakoids. The nucleomorph-encoded plastid proteins have amino-terminal extensions that, in the case of rubredoxin, have been shown to function as a transit peptide that enables the protein to traverse the plastid envelope membrane (Wastl et al., 2000). The nucleomorph genome also encodes RNA polymerase subunits, regulatory proteins that may influence starch accumulation, protein synthesis, and nucleomorph DNA replication and division; three core histones plus a histone acetylase and deacetylase; and proteins of the ubiquitin-proteasome degradation pathway. There are also proteins essential for nucleomorph functions that are not encoded by the nucleomorph genome; these proteins, which include the subunits of DNA polymerase, would have to be routed from the cytoplasm of the cell to the nucleomorph.
Elucidating steps involved in the biosynthesis of the plastid, the nucleomorph, and periplastid compartment, and developing an understanding of coordinate expression of genes encoded on the nuclear, plastid, and nucleomorph genomes will increase our understanding of the roles of the various compartments in cellular processes, the communications between the different genetic compartments of a cell, and the ways in which proteins and metabolites are exchanged among these compartments. Ultimately, defining the genetic content of all of the different genomes in the Cryptomonads will help elucidate the loss of genetic information in the genome of the endosymbiont following the secondary endosymbiotic event and the exchange of genetic information among the genomes.
C. merolae
The Cyanidiales is a group of unicellular, asexual red algae that grow at high temperatures and under acidic conditions. This group includes the genera Cyanidium, Cyanidioschyzon, and Galdieria, although recent work suggests an unexpectedly high level of genetic diversity among the Cyanidiales (Ciniglia et al., 2004). The first algal nuclear genome to be sequenced was that of a member of the Cyanidiales, C. merolae, whose genome is among the smallest found in photosynthetic eukaryotes. C. merolae is an organism that grows in hot springs (45°C) at a pH of 1.5 and is considered one of the most primitive algal species (Kuroiwa et al., 1998; Matsuzaki et al., 2004; Nozaki et al., 2004). Its subcellular structure is relatively simple with a single Golgi apparatus and ER and a relatively small number of internal membrane structures. The plastid genome of this organism, which is about 150 kb and contains 243 genes, has been sequenced (Ohta et al., 2003). Interestingly, there is an overlap between the protein coding sequences for many of these genes (40%), which has resulted in a highly compacted plastid genome.
C. merolae has also been the subject of a number of interesting studies concerning mechanisms by which mitochondria and plastids divide (Kuroiwa et al., 1998; Miyagishima et al., 1999; Kuroiwa, 2000; Miyagishima et al., 2001a, 2001b, 2001c, 2003; Nishida et al., 2004). Furthermore, it may be possible to introduce exogenous DNA into these organisms by electroporation; the introduced DNA appears to integrate into the nuclear genome by homologous recombination (Minoda et al., 2004).
The recently sequenced nuclear genome of C. merolae (which still contains some gaps) is approximately 16.5 Mb, with 5,331 genes packed into 20 chromosomes. Within the genome there are only three rDNA units that are not tandemly arranged but define separate loci (Maruyama et al., 2004). The nucleolus is small and not associated with chromatin, which might make it a relatively simple model for defining the composition and biochemical features of a minimal nucleolus. Of the predicted genes contained in the nuclear genome, only 26 have introns and all but 1 of these have single introns. This organism has a very minimal set of motor proteins that includes a set of tubulin subunits, two actins, and both intermediate filament and kinesin family proteins. Furthermore, there are only 2 dynamin-encoding genes (most organisms have a family of dynamin genes containing at least 10 members), which function in mitochondrion and chloroplast division, and no genes encoding myosin or dynein motors. These findings suggest that a highly reduced set of motor proteins accomplishes cytokinesis and cell motility in this organism.
The analysis of the C. merolae genomic sequence also has implications with respect to the endosymbiont origins of the plastid. The enzymes of the Calvin cycle originated from a combination of genes derived from a cyanobacterial endosymbiont and its eukaryotic host. This mosaic gene composition is similar in C. merolae and Arabidopsis (Arabidopsis thaliana), suggesting that they originated from a common ancestral organism and that this composition remained stable even after the separation of the two lineages. There are many other interesting observations/deductions developing from the sequence of the C. merolae genome, including the finding that the tRNAs contain ectopic introns, that there are no genes encoding two of the major classes of photoreceptors associated with plants (the phototropins, which are blue/UV-A light photoreceptors, and the phytochromes, which are red light photoreceptors), and that there is only a single His kinase and no response regulators other than those encoded on the plastid genome. A seemingly limited repertoire of signaling elements encoded on the nuclear genome of this alga may reflect the specialized environmental niche in which this organism grows. It would also be interesting to learn more about the transport proteins associated with the cytoplasmic membrane of this and related organisms and the mechanisms by which they deal with the low external pH of the environment (the pumps and exclusion mechanisms that may be associated with maintaining the pH of the cytoplasm of the cell).
T. pseudonana
Diatoms are a diverse group of organisms present in marine, freshwater, and terrestrial environments. They are estimated to be represented by tens of thousands of species on the Earth (Round et al., 1990) and may be responsible for as much as 20% of global primary productivity. These organisms can have different gross morphologies (pennate, centric, coccoid, triangular) with precisely patterned and beautifully ornamented silicified cell walls or frustules. Recent work on diatoms has employed sophisticated molecular techniques, and many different diatom species can now be transformed using biolistic procedures (Dunahey et al., 1995; Apt et al., 1996; Zaslavskaia et al., 2000). Reporter genes have also been successfully introduced into diatoms to study gene expression; these reporters include the Escherichia coli uidA gene encoding β-glucuronidase, the Tn9-derived cat gene encoding chloramphenicol acetyl transferase, the firefly luc gene encoding luciferase (Falciatore et al., 1999), a variant of the green fluorescent protein gene (egfp), and the aequorin gene from the jellyfish Aequorea victoria (Falciatore et al., 2000). Genes encoding proteins fused to GFP have been introduced into the diatoms and the fusion proteins targeted to various subcellular compartments, including the lumen of the ER (Apt et al., 2002), the chloroplast (Apt et al., 2002), and the cytoplasmic membranes (Zaslavskaia et al., 2001). A chimeric gene encoding the human Glc transporter fused to GFP was introduced into P. tricornutum. The expressed protein integrated into the cytoplasmic membranes and converted this diatom from an obligate photoautotroph to a heterotroph (growth in the dark on exogenous Glc; Zaslavskaia et al., 2001). One significant problem in working with the diatoms stems from the fact that they are diploid and researchers have not been able to consistently achieve sexual crosses, making it difficult to obtain mutants in which both alleles for a specific gene have been modified.
Hopefully, continued analyses of the life cycle of the diatoms will help reveal factors that elicit and control sexuality in these organisms (Vaulot et al., 1986, 1987; Armbrust and Chisholm, 1990; Mann, 1993; Armbrust, 1999; Mann et al., 1999; Armbrust and Galindo, 2001).
The choice of the diatom species used in the development of genomic studies was based on several criteria including ecological importance, the capacity of the organism for biomineralization, the ease with which the organism can be manipulated at genetic and molecular levels, and the estimated size of the genome; there is an obvious bias toward sequencing small genomes. There is little information on the sizes of diatom genomes, with most of it coming from the work of Veldhuis et al. (1997), who estimated the genome sizes of seven diatom species by staining the DNA in the cells with PicoGreen or SYTOX Green and monitoring fluorescence of the individual cells using flow cytometry. The sizes of the genomes varied from 34 to approximately 700 Mb. Ultimately, the centric diatom T. pseudonana and the pennate diatom P. tricornutum were considered to be the most appropriate for generating genomic information. T. pseudonana, a silicified diatom, represents a species with a small genome (estimated by Veldhuis et al. to be 34 Mb) in which other members of the group are ubiquitous and ecologically important; Thalassiosira weissflogii appears to be much more ecologically relevant than T. pseudonana, but the former was found to have a genome that is approximately 20 times larger than that of the latter. The T. pseudonana strain that was sequenced, CCMP 1335, was collected from Moriches Bay (Long Island, NY) in 1958 and is available from the Center for Culture of Marine Phytoplankton (http://ccmp.bigelow.org/). The physiological knowledge base for T. pseudonana is not well developed, and most molecular tools (e.g. transformation) have not been tested with this organism. The sequence of the T. pseudonana nuclear genome has been completed (Armbrust et al., 2004) by the Joint Genome Institute (JGI; http://genome.jgi-psf.org/thaps1/thaps1.home.html), with many cDNA sequences to help identify coding regions of genes.
The genome size, based on sequence analyses, was found to be very close to the fluorescence-based size estimate (approximately 34 Mb), and, from an optical map (Jing et al., 1998), the genome was determined to consist of 24 chromosomes ranging in size from 0.66 to 3.32 Mb. The nucleotide sequence of the genome predicts at least 11,242 protein coding genes and indicates that the organism contains a number of metabolic pathways associated with heterotrophic growth. The genome, among the smallest diatom genomes known (the genome of P. tricornutum is smaller), has few repeat elements, and many of the interspersed repeats represent remnants of transposable elements.
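To put the genome sizes and gene counts quoted in this section side by side, the short Python sketch below computes the number of protein-coding genes per Mb for the G. theta nucleomorph, C. merolae, and T. pseudonana, and scales the 34 Mb estimate by the approximately 20-fold difference reported for T. weissflogii. The `Genome` record and the helper names are hypothetical and introduced only for illustration. The T. pseudonana gene count is a lower bound ("at least 11,242"), and the 20-fold factor is approximate, so the outputs are rough comparisons and not measured values.

```python
from typing import NamedTuple


class Genome(NamedTuple):
    """Genome size and predicted protein-coding gene count, as quoted in the text."""
    name: str
    size_mb: float
    protein_genes: int


GENOMES = [
    Genome("G. theta nucleomorph", 0.551, 464),
    Genome("C. merolae nucleus", 16.5, 5331),
    Genome("T. pseudonana nucleus", 34.0, 11242),  # gene count is a lower bound
]


def genes_per_mb(genome: Genome) -> float:
    """Return the average number of protein-coding genes per Mb of genome."""
    return genome.protein_genes / genome.size_mb


def scaled_size_mb(reference_mb: float, fold: float) -> float:
    """Scale a reference genome size by an approximate fold difference."""
    return reference_mb * fold


for genome in GENOMES:
    print(f"{genome.name}: {genes_per_mb(genome):.0f} genes per Mb")

# T. weissflogii: approximately 20 times the T. pseudonana estimate of 34 Mb.
weissflogii_mb = scaled_size_mb(34.0, 20)
print(f"T. weissflogii: roughly {weissflogii_mb:.0f} Mb")
```

The scaled figure for T. weissflogii falls near the upper end of the 34 to approximately 700 Mb range reported for the seven diatom species, which illustrates why the smaller T. pseudonana genome was the more practical sequencing target.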
There are numerous areas of biology for which genetic and genomic analyses of diatoms would be extremely valuable. One of the major areas of interest over the last decade concerns cell wall or frustule formation. Frustules are silicified cell walls of the diatoms in which the deposition of the silica creates a precise, nano-scale pattern; these structures have the potential for exploitation as substrates for nanotechnology development. Furthermore, researchers are just beginning to gain an understanding of the transport of silicic acid into diatom cells (Hildebrand et al., 1997, 1998; Hildebrand and Wetherbee, 2003); there is little understanding of the intracellular movement of silica and the processes involved in the assembly of this compound into a precisely patterned frustule. Analyses of cell wall biogenesis and the ability to manipulate cell wall structure may provoke the development of new strategies for silicon-based fabrication technology. From a biological perspective, understanding the synthesis of wall components and how they are put together will enhance our knowledge of factors that modulate the assembly of an extracellular matrix, the ways in which this matrix is patterned, the role of patterning in biological function, and the means for modifying biological patterns. It has been known for quite a while that silica polymerization in diatoms occurs in the silica deposition vesicle, a specialized compartment within the cell delimited by a membrane called the silicalemma (Reimann et al., 1966; Crawford and Schmid, 1986). Cytoskeletal components such as microtubules and actin function in silicification; the former is involved in positioning the site at which silicification is initiated and may also influence valve morphology (Pickett-Heaps and Kowalski, 1981; Pickett-Heaps, 1983). 
The recently characterized polyanionic phosphoproteins of the cell wall, the silaffins (Kröger et al., 1999, 2000, 2002; Poulsen et al., 2003; Poulsen and Kröger, 2004), are associated with silica deposition and cell wall patterning processes; there are five silaffin encoding genes on the T. pseudonana genome. Other components of the cell wall that appear to function in silica polymerization are linear, long-chain polyamines (Kröger et al., 2000). A number of copies of genes thought to be involved in the synthesis of spermine and spermidine, which are likely intermediates in the biosynthesis of long chain polyamines, have also been identified on the T. pseudonana genome. Another family of genes associated with cell wall structure encodes the frustulins, wall glycoproteins that may be important for wall biogenesis but not specifically for the assembly of the silica building blocks (Vrieling et al., 1999). Interestingly, the diatoms appear to have a complete urea cycle, which probably occurs in mitochondria, and they can use urea as a sole nitrogen source. Ornithine, an intermediate in this cycle, is a precursor of the metabolites spermine and spermidine (Morgan, 1999; Igarashi and Kashiwagi, 2000). The urea cycle may also serve in the generation of creatine phosphate, a high energy molecule that can drive certain cellular processes.
There are a number of other areas that will be interesting to explore with respect to sequence analyses of the diatom genome. These include the ways in which diatoms position themselves in the water column, the function and evolution of light-harvesting components (Buchel, 2003; Oeltjen et al., 2004), the mechanisms associated with nonphotochemical quenching of excess absorbed light energy (Lohr and Wilhelm, 1999; Lavaud et al., 2002, 2003), carbon metabolism and the potential role of the C4 pathway in CO2 fixation (Reinfelder et al., 2004), the biosynthesis of long chain polyunsaturated fatty acids (Lebeau and Robert, 2003; Wen and Chen, 2003; Tonon et al., 2004), the role of Ca2+ in signaling in cellular processes (Falciatore et al., 2000), the identification and functional analyses of photoreceptors, the development of different cell morphotypes, and the control of morphogenesis.
Some diatoms, including those in the genus Thalassiosira, can control their position in the water column, which can influence light and nutrient availability, via extrusion of chitin fibers through frustule pores (Round et al., 1990). There are numerous genes encoding enzymes involved in the biosynthesis and degradation of chitin that may help regulate the dynamics of chitin extrusion and help the organism modulate its position in the environment.
The PSBS protein, a member of the extended LHC protein family, is critical for xanthophyll cycle-mediated energy dissipation in plants (Li et al., 2000; Peterson and Havir, 2001; Aspinall-O'Dea et al., 2002). The diatoms also have a xanthophyll cycle that is thought to be involved in the dissipation of excess absorbed light energy (Lohr and Wilhelm, 1999; Lavaud et al., 2002). Interestingly, no gene encoding PSBS has been identified on the genome of T. pseudonana, although such a gene has recently been discovered in the genome sequence database of C. reinhardtii (Gutman and Niyogi, 2004). It will be important to determine if there is a protein that is functionally analogous to PSBS and which diatom proteins are important for xanthophyll-dependent energy dissipation. Furthermore, T. pseudonana has no identified genes encoding LHC-like, stress-associated ELIPs and SEPs, although there are two genes encoding the related HLIPs.
Several other findings concerning genes present (or absent) in the T. pseudonana genome are interesting to note. While many of the enzymes involved in C4 metabolism are encoded on the T. pseudonana genome, an enzyme that would decarboxylate C4 acids in the plastid to generate the CO2 substrate for ribulose 1,5-bisphosphate carboxylase was not identified; this is intriguing since the C4 pathway appears important for the fixation of inorganic carbon in T. weissflogii (Reinfelder et al., 2004). Whether the gene encoding the decarboxylating enzyme was just missed in the analyses of the genome or whether a novel (or highly diverged) enzyme functions in this capacity remains to be established. Also, a high proportion of the fatty acids synthesized by T. pseudonana are the commercially valuable long chain polyunsaturated fatty acids eicosapentaenoic and docosahexaenoic acids. The genes involved in their biosynthesis have been identified on the genome. With respect to photoreceptors, genes encoding members of both the phytochrome and cryptochrome families have been identified on the T. pseudonana genome, although there do not appear to be genes encoding the phototropin or rhodopsin photoreceptors. Currently, much more extensive analysis of the T. pseudonana genome is needed, along with efforts to link the genomic information with physiological/ecological processes.
It has recently been announced that JGI will sequence the full genome of P. tricornutum. Completion of this sequence will allow a comparison between centric and pennate species and may also help clarify the genetic basis of morphotype differentiation. Like