FAQ

1. What is the EV MemProt Atlas?

EV MemProt Atlas is an online database developed by the National Center for Protein Science (Beijing). It provides a manually curated catalogue of human extracellular vesicle (EV) membrane proteins, identified through text mining and manual curation workflows, together with their supporting literature evidence.

2. How do I search for EV membrane proteins, related diseases, and their supporting literature evidence?

EV MemProt Atlas provides three search entry points: gene symbol, protein symbol, and disease term.

Search by gene or protein symbol

Users can enter a gene or protein symbol in the corresponding search box. A dropdown menu will auto-complete gene/protein symbols available in EV MemProt Atlas. After selecting a target entry and clicking the Search button, the system will return search results in a table that lists EV membrane proteins, associated diseases, and corresponding supporting literature evidence. Clicking the Reset button clears all entered search terms.

Search by gene or protein symbol

Search by disease term

Users can enter a disease name in the Related Disease search box. A dropdown menu will auto-complete disease terms stored in the database. After selecting a disease entry and clicking the Search button, the system will return a results table that lists the matched EV membrane proteins, their linked diseases, and the relevant literature evidence. Clicking the Reset button clears all entered search terms.

Search by related disease

Clicking a hyperlinked protein or gene symbol in the search results opens the protein detail page. This page provides comprehensive information on the target protein, including Ensembl and UniProtKB accession IDs that can be used for cross-referencing in those databases.

Protein detail page with external identifiers

3. How do I sort and filter proteins on the browse page?

Users may click triangle icons on column headers to sort table entries. The top search box supports fuzzy protein matching; two dropdown menus enable filtering proteins by protein classification and transmembrane count. Click hyperlinked protein symbols to view detailed information on the target protein.

Browse page sorting and filtering

4. How do I retrieve and download extracellular vesicle membrane protein datasets?

Navigate to the Browse tab on the navigation bar to view the full collection of human EV membrane proteins and download all literature-supported entries for downstream research. Users may also search for individual target proteins and download their corresponding datasets separately.

Download EV membrane protein data
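For downstream work, a downloaded table could be filtered and sorted locally in the same spirit as the Browse page controls. The following is only a minimal sketch: the column names (`symbol`, `classification`, `tm_count`), the CSV layout, and the sample rows are all hypothetical placeholders, since the actual download format is not described here.

```python
import csv
import io

# Made-up sample rows in a hypothetical layout; the real download's
# file format and column names may differ.
SAMPLE_CSV = """symbol,classification,tm_count
PROT_A,Receptor,1
PROT_B,Transporter,12
PROT_C,Receptor,7
"""


def filter_proteins(rows, classification=None, min_tm=None):
    """Keep rows matching an optional classification and minimum transmembrane count."""
    kept = []
    for row in rows:
        if classification is not None and row["classification"] != classification:
            continue
        if min_tm is not None and int(row["tm_count"]) < min_tm:
            continue
        kept.append(row)
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Filter by protein classification.
receptors = filter_proteins(rows, classification="Receptor")

# Filter by transmembrane count, then sort in descending order.
multi_pass = sorted(
    filter_proteins(rows, min_tm=2),
    key=lambda r: int(r["tm_count"]),
    reverse=True,
)

print([r["symbol"] for r in receptors])
print([r["symbol"] for r in multi_pass])
```

To apply this to a real download, the in-memory sample would be replaced with the opened file and the column names adjusted to match its header row.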

5. How do I search for a single protein and access the original database links?

The Browse page supports bulk browsing and downloading of the complete EV membrane protein dataset. Users can also search for a specific protein, such as CD9, to open its dedicated detail page. On this page, users can download the data related to CD9 and follow the embedded hyperlinks to the original external databases for full gene and protein annotation.

Single protein detail links and download

6. How does our team collect and curate extracellular vesicle membrane proteins and supporting literature evidence to construct EV MemProt Atlas?

We established three independent curation workflows. Each pipeline first screened human extracellular vesicle (EV/exosome) proteins, then assessed whether candidate proteins satisfied unified subcellular localization criteria for EV membrane proteins.

A human EV protein is retained as a membrane candidate only if it is annotated as a cell membrane component in at least two of four authoritative resources: the high-confidence Cell Surface Protein Atlas (CSPA), UniProt entries tagged with cell membrane, cell membrane annotations from the Human Protein Atlas (HPA), and Gene Ontology cellular component (GO-CC) cell membrane terms.
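The "at least two of four" rule can be sketched as a simple vote count. This is an illustrative sketch rather than the curation pipeline itself: the resource keys, the function name, and the example annotation flags are assumptions introduced here.

```python
# Hypothetical keys for the four resources named in the criterion.
RESOURCES = ("CSPA", "UniProt", "HPA", "GO-CC")
MIN_SUPPORT = 2  # "at least two of four"


def is_membrane_candidate(annotations: dict) -> bool:
    """Return True when at least MIN_SUPPORT resources flag the protein
    as a cell membrane component. Missing resources count as no support."""
    support = sum(1 for name in RESOURCES if annotations.get(name, False))
    return support >= MIN_SUPPORT


# Made-up annotation flags for illustration only, not real database content.
example_two_sources = {"CSPA": True, "UniProt": True, "HPA": False, "GO-CC": False}
example_one_source = {"GO-CC": True}

print(is_membrane_candidate(example_two_sources))
print(is_membrane_candidate(example_one_source))
```

Treating a missing annotation as "no support" is a conservative choice made for this sketch; how absent annotations were actually handled is not stated.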

ExoCarta database workflow

We first extracted all human EV proteins recorded in ExoCarta, filtered candidates against the unified cell membrane localization criteria, and finally identified 1,253 human EV membrane proteins.

PubMed text mining workflow (EV and exosome keywords)

We queried PubMed with keywords related to extracellular vesicles and exosomes and retrieved 11,797 abstracts. Protein entities were extracted with a biomedical named entity recognition (bio-NER) tool, yielding 10,867 valid abstracts and 50,690 annotated sentences. After two rounds of manual curation, 3,000 abstracts, 6,268 sentences, and 1,180 candidate EV proteins were retained. We then filtered these candidates against the cell membrane localization criteria and identified 220 human EV membrane proteins.
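A keyword query of this kind could be expressed against PubMed through the NCBI E-utilities `esearch` endpoint. The sketch below only builds the request URL and does not contact the network. The exact keywords, the field restriction, and the use of E-utilities at all are assumptions for illustration; the retrieval tooling actually used is not specified.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint for Entrez databases.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Hypothetical keyword list; the actual search terms are not given.
EXAMPLE_KEYWORDS = ["extracellular vesicle", "exosome"]


def build_pubmed_query(keywords, field="Title/Abstract"):
    """OR together quoted keywords, each restricted to one PubMed field."""
    return " OR ".join(f'"{k}"[{field}]' for k in keywords)


def build_esearch_url(keywords, retmax=100):
    """Build an esearch URL that would return matching PubMed IDs as JSON."""
    params = {
        "db": "pubmed",
        "term": build_pubmed_query(keywords),
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


url = build_esearch_url(EXAMPLE_KEYWORDS)
print(url)
```

Fetching the URL would return a list of PubMed IDs, whose abstracts could then be passed to an entity recognition step.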

Large-scale EV proteomic dataset integration workflow

We combined EV, exosome, surfaceome, and proteome-related keywords for PubMed retrieval and initially retrieved 3,961 abstracts. Only human proteomic studies with downloadable human EV protein lists were retained, resulting in 614 eligible articles and 60,721 candidate EV proteins. Applying the same membrane localization filter, we obtained 2,069 human EV membrane proteins.

Candidate proteins recovered from the three workflows (1,253 + 220 + 2,069) were pooled into a combined set of 3,542 entries. After removing redundant entries, we obtained a final non-redundant dataset of 2,176 unique human EV membrane proteins, which forms the EV MemProt Atlas resource.
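The pooling arithmetic can be checked directly, and the redundancy removal step can be illustrated with a set union. In this sketch the workflow labels are shorthand, the toy protein lists are invented, and using an upper-cased symbol as the deduplication key is an assumption, since the identifier actually used for deduplication is not stated.

```python
# Per-workflow counts as reported in the text.
WORKFLOW_COUNTS = {
    "ExoCarta": 1253,
    "PubMed text mining": 220,
    "Proteomic datasets": 2069,
}
FINAL_UNIQUE = 2176  # reported size of the non-redundant set

pooled_total = sum(WORKFLOW_COUNTS.values())
redundant_removed = pooled_total - FINAL_UNIQUE
print(pooled_total, redundant_removed)


def pool_unique(*workflows):
    """Union several protein lists, normalizing symbols to upper case
    (an assumed deduplication key)."""
    unique = set()
    for proteins in workflows:
        unique.update(symbol.strip().upper() for symbol in proteins)
    return unique


# Toy lists for illustration only; these are not the curated results.
toy_a = ["CD9", "CD63", "CD81"]
toy_b = ["cd9", "ITGB1"]
toy_c = ["CD81", "ITGB1", "EGFR"]

toy_pooled = len(toy_a) + len(toy_b) + len(toy_c)
toy_unique = pool_unique(toy_a, toy_b, toy_c)
print(toy_pooled, len(toy_unique))
```

The reported figures imply that 1,366 of the 3,542 pooled entries were duplicates shared across workflows.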

7. How do we guarantee the data quality of EV MemProt Atlas?

To ensure the reliability of all EV membrane proteins deposited in the database, we built a three-tier quality control system composed of independent manual screening, iterative expert re-review, and quantitative scoring validation.

First, four researchers independently screened all literature sentences linked to candidate EV proteins to retain only credible evidence supporting each protein. Second, all preliminarily filtered datasets underwent two rounds of rigorous re-curation by a two-person expert panel. Third, we implemented a dual quantitative scoring framework for supplementary quality evaluation: a literature frequency score and an EVMP score.

All records curated manually and validated via quantitative scoring are displayed as verified supporting evidence at the top of each protein's detail page.