Rigid geometry solves "curse of dimensionality" effects

This post summarizes my submitted manuscript, “Rigid geometry solves “curse of dimensionality” effects: an application to proteomics.” The draft is available on bioRxiv at doi: https://doi.org/10.1101/094391.

 

To evaluate the physiological state of biological samples preserved at low temperature, we stored HEK-293 samples in liquid nitrogen for 2 years and compared them by LC/MS with control samples kept under various conditions. However, the high dimensionality of the system appeared to produce a “curse of dimensionality”: the data set was sparse, and the metric given by the LC/MS unused values did not seem to work well. Various clustering analyses failed to separate the 2-year samples from the control samples, although neural network analysis could still separate them.

To improve on this, we developed a new metric ‘v’ based on algebraic geometry. Roughly speaking, in this algebraic-geometric treatment the convergence or divergence of the values is assumed to be isomorphic to nilpotents, and chaotic oscillation to the value -1. This approach lets us extract characteristic parameters specific to the system. We used rigid geometry to resolve the problem, through calculations that are locally free but globally convergent. Stated directly, ‘v’ is ln(unused value)/ln(p-adic metric). The blowup cancelled out the curse of dimensionality, and with ‘v’ every method we tried succeeded in separating the 2-year samples from the control samples.
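As a rough illustration of the ratio ln(unused value)/ln(p-adic metric) stated above, here is a minimal Python sketch. It only shows the arithmetic of the log ratio. How the p-adic metric is actually obtained for each protein signal is defined in the manuscript and is not reproduced here, so the function name `v_metric`, the argument names, and the numeric inputs are all hypothetical placeholders.

```python
import math


def v_metric(unused_value, padic_metric):
    """Return ln(unused_value) / ln(padic_metric).

    Both arguments are assumed to be positive real numbers supplied by the
    caller; this sketch does not compute the p-adic metric itself.
    """
    if unused_value <= 0 or padic_metric <= 0:
        raise ValueError("both inputs must be positive for the logarithm")
    if padic_metric == 1:
        raise ValueError("ln(1) = 0, so the ratio is undefined")
    return math.log(unused_value) / math.log(padic_metric)


# Hypothetical inputs, not data from the manuscript.
example_unused_values = [2.0, 4.0, 8.0]
example_padic_metric = 2.0

example_v = [v_metric(u, example_padic_metric) for u in example_unused_values]
```

Because the ratio is a change of logarithm base, ‘v’ in this sketch is simply the exponent that relates the unused value to the chosen metric value; the guard clauses reject the inputs for which that exponent is undefined.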

We also discuss a possible interpretation of a group of protein signals as a quasiparticle, a Majorana fermion. That is, the variance space of the protein signals includes weakly interacting protein components in 16 dimensions. This differs from the 4 dimensions for species and the 24 dimensions for populations in our previous arXiv paper. This suggests a possible evolutionary track of the biological hierarchy: species in a community, with no coupling components, in 4 dimensions contain individual cells in a cell population, with chaotic coupling, in 24 dimensions, which frequently bursts or collapses. Finally, proteins in a cell, weakly coupled and highly tuned, are reached as the final output in the lowest layer. The evolution of the system thus proceeds in the direction from community to protein. This rigid-geometry methodology might be generally applicable to any data set that approximately obeys a Boltzmann distribution.