Genome size vs. Splicing index (a), Proteome size vs. Splicing index (b), Proteome size vs. Neutral DNA (c). N=210; all regressions are significant except for the PGLS on Genome size versus Splicing index. See Table 6 for detailed statistics.}
\label{some example}
\end{figure}

\section{Discussion}

Genome evolution is one of the most debated topics in evolutionary biology. Here we tested three adaptive evolutionary relationships among genomic features predicted by previous theoretical work (Rigato and Fusco, in prep). Simple OLS analyses provided strong, significant support for all three predictions. Under the OLS assumptions, the neutral DNA proportion and the splicing index therefore appear to be predictors of organismal complexity (proteome size). In this scenario, however, the relation between the splicing index and proteome size could be due solely to the relation between genome size and the splicing index itself, which would lend support to the economic hypothesis for the origin of splicing.

Analysis of the data in a phylogenetic framework, however, revealed high levels of phylogenetic signal (lambda) for all traits, and evolutionary model fitting indicated that the evolution of all traits is likely to be characterized by some degree of selective pressure (OU model). In other words, adaptive relations may exist among these genomic features but, because of the phylogenetic signal, estimates of them may be distorted if phylogeny is not taken into account. Indeed, the PGLS analysis under the best-fitting correlation structures supported only two of the three relations: the relation between genome size and the splicing index disappeared under PGLS (Fig. 3a). It is therefore very likely that genome size is not a predictor of the splicing index, and that the relation found in the preceding OLS analysis was an effect of common ancestry.
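
The contrast between the two estimators can be made explicit. As a sketch in standard notation (the symbols $\mathbf{y}$, $\mathbf{X}$, $\beta$, $\varepsilon$, $\sigma^{2}$ and $\mathbf{C}$ are introduced here for illustration only and are not taken from the analyses above), OLS and PGLS fit the same linear model and differ only in the covariance assumed for the residuals:
\[
\mathbf{y} = \mathbf{X}\beta + \varepsilon, \qquad
\varepsilon \sim \mathcal{N}\left(\mathbf{0}, \sigma^{2}\mathbf{C}\right), \qquad
\hat{\beta}_{\mathrm{GLS}} = \left(\mathbf{X}^{\top}\mathbf{C}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^{\top}\mathbf{C}^{-1}\mathbf{y}.
\]
Here $\mathbf{C}$ is the $N \times N$ matrix of expected covariances among species implied by the phylogeny and by the chosen evolutionary model. OLS is the special case $\mathbf{C} = \mathbf{I}$, which treats species as independent observations. When closely related species resemble each other, that assumption tends to understate the uncertainty of the estimated slope, which is consistent with a relation that is significant under OLS but not under PGLS.
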
PGLS was the best-fitting model in all comparisons, which strengthens the case that the splicing index and the neutral DNA proportion are good candidate mechanisms of phenotypic robustness, since both are positively related to complexity.

In addition, if neutral DNA and the splicing index are components of robustness, their increase should precede the escalation of complexity in evolution, because it gives access to higher levels of phenotypic robustness and thus to higher levels of complexity. This is also why we treated proteome size as the response variable and the neutral DNA proportion and the splicing index as predictor variables. Comparative studies have already indicated that alternative splicing preceded multicellularity in evolution, and suggested that this mechanism might have been co-opted to assist in the development of multicellular organisms \citep{irimia2007functional}. Similarly, we suggest that increases in the neutral DNA proportion should have preceded any increase in organismal complexity, because higher levels of robustness allow higher levels of complexity to be explored. The neutral DNA proportion and the splicing index are certainly not the only sources of robustness: other candidate sources have been shown to be positively related to complexity proxies such as proteome size, for instance the mean level of protein disorder \citep{schad2011relationship}. Further factors contributing to robustness could be tested in the future.

The evolution of the neutral DNA proportion and of the splicing index can also help to explain the puzzle of genome size evolution. This is because in most eukaryotes a large proportion of the genome is composed of neutral DNA (the mean in our eukaryotic dataset is 76\%), and because the splicing index influences how much information can be stored in the coding genome.

Simple linear explanations of the genome size problem that lack a phylogenetic framework have long been proposed, the latest and most famous being that of Lynch \citep{lynch2003origins}, which showed that nearly 60\% of the variance in genome size can be explained by variance in effective population size (with a taxon sampling of N=21). However, these results were subsequently refuted by a phylogenetic analysis of the same dataset by Whitney and Garland \citep{whitney2010did}. Genome size is a complex trait that is unlikely to be explained by univariate analyses \citep{charlesworth2004genome, gregory2005genome}. Phylogenetic comparative methods should be combined with multivariate models capable of distinguishing the contributions of highly correlated predictor variables. We suggest that the relation among the neutral DNA proportion, the splicing index and complexity should be taken into account in any such analysis of genome size evolution.

These first results could be improved in the near future by using better proxies of complexity, such as the number of complexity dimensions ($n$) of a Fisher geometric model, which in principle can be derived from population genetic data (i.e. dN/dS estimation) \citep{orr2006distribution}, or the number of cell types of the organism. Here we found strong support for our two main predictions even after accounting for the effect of phylogeny; however, a wider and deeper analysis, based on multivariate phylogenetic models and on different proxies of robustness and complexity, should be carried out in the future.