Immune repertoire analysis with MiXCR

Date: 2021-02-04

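Earlier I walked everyone through the IMGT database to get familiar with immune repertoires, and we downloaded the immune repertoire FASTA sequences from IMGT together. The main objects of study in immune repertoire research are the three BCR chains IGH, IGK and IGL, plus the TCR chains TRA, TRB, TRD and TRG. Each of them has V, D (optional), J and C genes.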

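After that we looked at immune repertoire sequencing data and some of its characteristics, and used igblast to analyze it. That is only a basic alignment, though. It does find the V, D (optional), J and C genes for each read and gives you the CDR3 sequence, but there are quite a few intermediate steps: besides the usual read filtering, you have to merge the paired-end reads and convert fastq to fasta. igblast itself is also hard to use, and building its database files is tedious. In fact, there are packaged one-stop workflows for this.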

NGS and de-multiplexing was performed on an Illumina MiSeq sequencer (600–cycle, single indexed, paired-end run). Analysis of the TRB puted using the MiXCR analysis tool

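MiXCR is developed by MiLaboratory. Their lab has released three tools, MiXCR, MIGEC and VDJtools, all of which are well known in the immune repertoire field. The MiXCR workflow diagram is shown below: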

[Figure: MiXCR workflow diagram]

Installing MiXCR

MiXCR ships as a binary release, so I strongly recommend downloading that directly. Unzip it and it is ready to use!

milaboratory/mixcr/releases/download/v3.0.13/mixcr-3.0.13.zip (the latest version when I checked on May 5 this year). Documentation: en/

However, MiXCR is built on the Java platform, so before starting it you need to make sure your Java environment works, and Java 8 is required.
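
Before launching MiXCR it can help to confirm which Java is on the PATH. The sketch below is only an illustration: `java_major` is a hypothetical helper name, and it assumes the usual version-string formats (`1.8.0_292` for Java 8, `11.0.2` for later releases).

```shell
#!/usr/bin/env bash
# java_major: print the major Java version for a version string.
# Hypothetical helper; handles "1.8.0_292" (-> 8) and "11.0.2" (-> 11).
java_major() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # legacy scheme: 1.8.x means Java 8
  esac
  printf '%s\n' "${v%%[._-]*}"
}

# `java -version` writes to stderr; take the quoted version string if Java is installed.
if command -v java >/dev/null 2>&1; then
  ver=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  echo "Java major version: $(java_major "$ver")"
else
  echo "java not found on PATH"
fi
```

If the reported major version is not what you expect, check which `java` executable comes first on your PATH.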

On Linux or macOS it is very simple: just use the mixcr script inside the package to call the program.

On Windows, you need to use java -Xmx4g -Xms3g -jar to call the jar file inside the downloaded MiXCR binary release.

I am on macOS. After downloading and unzipping, the package looks like this:

(base) jmzengdeMBP:mixcr-3.0.13 jmzeng$ pwd
/Users/jmzeng/biosoft/MiXCR/mixcr-3.0.13
(base) jmzengdeMBP:mixcr-3.0.13 jmzeng$ tree -h
├── [1.4K] LICENSE
├── [ 96] libraries
├── [4.0K] mixcr
└── [9.3M] mixcr.jar
1 directory, 3 files

Using the one-stop workflow

MiXCR provides an analyze mode that runs the whole chain (align, assemblePartial, extend, assemble, assembleContigs and export) in one go. Read the manual at en/ carefully. The example it gives is:

mixcr analyze amplicon \
 -s species \
 --starting-material startingMaterial \
 --5-end 5End --3-end 3End \
 --adapters adapters \
 [OPTIONS] input_file1 [input_file2] analysis_name

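The immune repertoire sequencing data we looked at earlier is human, sequenced on a MiSeq with a PE300 strategy, targeting TRB, from DNA. So the command we build is: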

mixcr=/Users/jmzeng/biosoft/MiXCR/mixcr-3.0.13/mixcr
$mixcr analyze amplicon \
 -s hsa \
 --starting-material dna \
 --5-end v-primers --3-end c-primers \
 --adapters adapters-present \
 raw/ERR3445170_1.fastq.gz raw/ERR3445170_2.fastq.gz ERR3445170

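Because I set --adapters adapters-present, the software also takes care of the adapter sequences in our reads. That is why I ran it directly on the raw fastq files, without the trim_galore step we needed earlier for igblastn.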
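
If several samples follow the same raw/<ID>_1.fastq.gz and raw/<ID>_2.fastq.gz naming as above, the same command can be generated once per sample. This is a minimal sketch: `build_cmd` is a hypothetical helper, `SAMPLE_B` is a made-up placeholder ID, and the commands are only printed (a dry run) so they can be reviewed before anything is executed.

```shell
# Path to the mixcr launcher; falls back to "mixcr" on PATH (an assumption).
mixcr=${mixcr:-mixcr}

# build_cmd: print the analyze command for one sample ID (does not run it).
build_cmd() {
  id="$1"
  printf '%s analyze amplicon -s hsa --starting-material dna --5-end v-primers --3-end c-primers --adapters adapters-present raw/%s_1.fastq.gz raw/%s_2.fastq.gz %s\n' \
    "$mixcr" "$id" "$id" "$id"
}

# SAMPLE_B is a placeholder, not a real accession.
for id in ERR3445170 SAMPLE_B; do
  build_cmd "$id"
done
```

Once the printed commands look right, they can be piped to `sh` to actually run them.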

The results came out with very little effort, and it looked like the run used all of the compute resources on my Mac.

[Figure: screenshot of resource usage during the run]

The run log is as follows:

NOTE: report file is not specified, using ERR3445170.report to write report.
Alignment: 2.1%
Alignment: 15.4% ETA: 00:00:38
Alignment: 27.9% ETA: 00:00:17
Alignment: 39.5% ETA: 00:00:15
Alignment: 51.5% ETA: 00:00:12
Alignment: 63% ETA: 00:00:09
Alignment: 74.2% ETA: 00:00:06
Alignment: 85.4% ETA: 00:00:03
Alignment: 96.9% ETA: 00:00:00
============= Report ==============
Analysis time: 29.52s
Total sequencing reads: 84833
Successfully aligned reads: 83421 (98.34%)
Paired-end alignment conflicts eliminated: 3027 (3.57%)
Alignment failed, no hits (not TCR/IG?): 850 (1%)
Alignment failed because of absence of V hits: 4 (0%)
Alignment failed because of absence of J hits: 507 (0.6%)
No target with both V and J alignments: 49 (0.06%)
Alignment failed because of low total score: 2 (0%)
Overlapped: 81097 (95.6%)
Overlapped and aligned: 80179 (94.51%)
Alignment-aided overlaps: 1313 (1.64%)
Overlapped and not aligned: 918 (1.08%)
V gene chimeras: 29 (0.03%)
J gene chimeras: 4 (0%)
TRB chains: 83418 (100%)
IGH chains: 3 (0%)
Realigned with forced non-floating bound: 10098 (11.9%)
Realigned with forced non-floating right bound in left read: 3 (0%)
Realigned with forced non-floating left bound in right read: 3 (0%)
Initialization: progress unknown
Assembling initial clonotypes: 100%
Building clones: 25.2%
Writing clones: 0%
============= Report ==============
Analysis time: 2.33s
Final clonotype count: 2116
Average number of reads per clonotype: 35.18
Reads used in clonotypes, percent of total: 74448 (87.76%)
Reads used in clonotypes before clustering, percent of total: 76478 (90.15%)
Number of reads used as a core, percent of used: 75378 (98.56%)
Mapped low quality reads, percent of used: 1100 (1.44%)
Reads clustered in PCR error correction, percent of used: 2030 (2.65%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 33 (0.04%)
Reads dropped due to the lack of a clone sequence, percent of total: 4236 (4.99%)
Reads dropped due to low quality, percent of total: 116 (0.14%)
Reads dropped due to failed mapping, percent of total: 2590 (3.05%)
Reads dropped with low quality clones, percent of total: 1 (0%)
Clonotypes eliminated by PCR error correction: 832
Clonotypes dropped as low quality: 1
Clonotypes pre-clustered due to the similar VJC-lists: 27
TRB chains: 2116 (100%)
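
According to the first line of the log, the report is also written to ERR3445170.report, so the headline numbers can be pulled out with standard text tools. This is a minimal sketch, assuming the report file keeps the "Label: value" layout shown above. `report_value` is a hypothetical helper, and the here-document stands in for the real file.

```shell
# report_value LABEL FILE: print the text after "LABEL: " on the first matching line.
report_value() {
  awk -v label="$1" '
    index($0, label ": ") == 1 { print substr($0, length(label) + 3); exit }
  ' "$2"
}

# Stand-in for ERR3445170.report, using lines copied from the log above.
report=$(mktemp)
cat > "$report" <<'EOF'
Total sequencing reads: 84833
Successfully aligned reads: 83421 (98.34%)
Overlapped: 81097 (95.6%)
Final clonotype count: 2116
Reads used in clonotypes, percent of total: 74448 (87.76%)
EOF

report_value "Total sequencing reads" "$report"
report_value "Successfully aligned reads" "$report"
report_value "Final clonotype count" "$report"
```

Matching on the full label plus ": " avoids confusing "Overlapped" with "Overlapped and aligned".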

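The software has a built-in immune repertoire database, so given the species name it aligns against the three BCR chains IGH, IGK and IGL and the TCR chains TRA, TRB, TRD and TRG on its own, each with their V, D (optional), J and C genes. Our test data is TRB, so only the TRB file contains results, as shown below: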

 2117 ERR3445170.clonotypes.ALL.txt
 1 ERR3445170.clonotypes.IGH.txt
 1 ERR3445170.clonotypes.IGK.txt
 1 ERR3445170.clonotypes.IGL.txt
 1 ERR3445170.clonotypes.TRA.txt
 2117 ERR3445170.clonotypes.TRB.txt
 1 ERR3445170.clonotypes.TRD.txt
 1 ERR3445170.clonotypes.TRG.txt
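
The listing above looks like `wc -l` output. Each file starts with a header line, so a file with 1 line holds no clonotypes, and 2117 lines means 2116 clonotypes, which matches the "Final clonotype count" in the log. A small sketch of that subtraction, with `count_clones` as a hypothetical helper and tiny stand-in files instead of the real ones:

```shell
# count_clones FILE: number of data rows, i.e. total lines minus the header line.
count_clones() {
  awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}

# Stand-in files: one with a header only, one with a header plus two clonotypes.
tmpdir=$(mktemp -d)
printf '%s\t%s\n' cloneId cloneCount > "$tmpdir/demo.clonotypes.TRA.txt"
printf '%s\t%s\n' cloneId cloneCount 0 20112.0 1 9264.0 > "$tmpdir/demo.clonotypes.TRB.txt"

for f in "$tmpdir"/demo.clonotypes.*.txt; do
  printf '%s\t%s\n' "$(count_clones "$f")" "$(basename "$f")"
done
```
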

Understanding MiXCR's output

It looks like this:

[Figure: screenshot of the MiXCR clonotype table]

There are quite a few output columns:

 1 cloneId
 2 cloneCount
 3 cloneFraction
 4 targetSequences
 5 targetQualities
 6 allVHitsWithScore
 7 allDHitsWithScore
 8 allJHitsWithScore
 9 allCHitsWithScore
 10 allVAlignments
 11 allDAlignments
 12 allJAlignments
 13 allCAlignments
 14 nSeqFR1
 15 minQualFR1
 16 nSeqCDR1
 17 minQualCDR1
 18 nSeqFR2
 19 minQualFR2
 20 nSeqCDR2
 21 minQualCDR2
 22 nSeqFR3
 23 minQualFR3
 24 nSeqCDR3
 25 minQualCDR3
 26 nSeqFR4
 27 minQualFR4
 28 aaSeqFR1
 29 aaSeqCDR1
 30 aaSeqFR2
 31 aaSeqCDR2
 32 aaSeqFR3
 33 aaSeqCDR3
 34 aaSeqFR4
 35 refPoints

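They are actually fairly easy to understand: one group describes the alignments to the V, D (optional) and J genes, and another gives the sequences of FR1-4 and CDR1-3.

Part 1: the core immune repertoire sequences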

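Because the V, D, J and C genes share a certain degree of similarity, you do not need the full sequence of several hundred bases to work out which V, D, J and C genes a clonotype is made of.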

cloneId cloneCount cloneFraction targetSequences
0 20112.0 0.270717 TGTGCCAGCAGCTTGGCCCGAGAAGGGATTGAAAACACCATATATTTT
1 9264.0 0.010316 TGTGCCAGCAGCCGCCTGGACCTAGGATTTTACGAGCAGTACTTC
2 5037.0 0.77111 TGTGCCAGCAATGTACAGCTCCTATAACACTGAAGCTTTCTTT
3 2710.0 0.0364085 TGCAGCGGGCAAGGGGGGGAGAGGGGAGCAGATACGCAGTATTTT
4 1851.0 0.310768 TGTGCCAGCAGTTTATCTCAAGGGACGCTATACAATGAGCAGTTCTTC
5 1084.0 0.0051794 TGCAGCGTTGTACCCGATAGCACAGATACGCAGTATTTT
6 1082.0 0.0485279 TGCGCCAGCAGCTTGACAAACACCATATATTTT
7 890.0 0.009972 TGCAGTGTGTGGTGGACCGGGGGGTATGGCTACACCTTC
8 538.0 0.392865 TGTGCCTGGAGTCCCGGGTTTGGGGATTCACCCCTCCACTTT

The command is: cut -f 1-4 ERR3445170.clonotypes.TRB.txt | head

Part 2: the V, D, J and C genes

These columns are mainly useful for tallying the share of each V, D, J and C gene in the sample.

allVHitsWithScore allJHitsWithScore allVAlignments allJAlignments
TRBV5-5*00(886.1) TRBJ1-3*00(197) 549|567|585|0|18||90.0 26|42|70|32|48||80.0
TRBV6-5*00(872.3) TRBJ2-7*00(175.7) 519|530|556|0|11||55.0 24|39|67|30|45||75.0
TRBV6-5*00(863.7) TRBJ1-1*00(217.9) 519|529|556|0|10||50.0 22|40|68|25|43||90.0
TRBV29-1*00(942.1) TRBJ2-3*00(204.2) 700|707|734|0|7||35.0 24|41|69|28|45||85.0
TRBV27*00(923.6) TRBJ2-1*00(200.4) 556|573|593|0|17||85.0 24|42|70|30|48||90.0
TRBV29-1*00(955.5) TRBJ2-3*00(230.1) 700|710|734|0|10||50.0 19|41|69|17|39||110.0
TRBV5-1*00(693.5) TRBJ1-3*00(195.9) 554|569|590|0|15||75.0 26|42|70|17|33||80.0
TRBV20-1*00(935.1) TRBJ1-2*00(208.9) 759|766|793|0|7||35.0 25|40|68|24|39||75.0
TRBV30*00(904.5) TRBJ1-6*00(160.6) 787|799|821|0|12||60.0 28|45|73|25|42||85.0

The command is: cut -f 6-13 ERR3445170.clonotypes.TRB.txt | head | cut -f 1,3,5,7
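
For the gene-usage tally mentioned above, one option is to sum cloneCount per top V hit. This is a minimal sketch, assuming a tab-separated clonotype table with the cloneCount and allVHitsWithScore columns listed earlier, and assuming multiple hits in one cell are comma-separated. `v_usage` is a hypothetical helper, and the demo table is a cut-down stand-in built from the first rows shown above.

```shell
# v_usage FILE: sum cloneCount per best V gene (first hit, allele and score stripped).
v_usage() {
  awk -F'\t' '
    NR == 1 {                         # locate columns by header name
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    {
      v = $(col["allVHitsWithScore"])
      sub(/,.*/, "", v)               # keep only the first (top-scoring) hit
      sub(/\*.*/, "", v)              # drop the "*00(score)" suffix
      sum[v] += $(col["cloneCount"])
    }
    END { for (v in sum) printf "%s\t%d\n", v, sum[v] }
  ' "$1" | sort -t "$(printf '\t')" -k2,2nr
}

# Stand-in table: two columns only, values copied from the rows above.
demo=$(mktemp)
printf '%s\t%s\n' \
  cloneCount allVHitsWithScore \
  20112.0 'TRBV5-5*00(886.1)' \
  9264.0  'TRBV6-5*00(872.3)' \
  5037.0  'TRBV6-5*00(863.7)' \
  2710.0  'TRBV29-1*00(942.1)' > "$demo"

v_usage "$demo"
```

On the real file, the same function can be pointed at ERR3445170.clonotypes.TRB.txt. Swapping allVHitsWithScore for allJHitsWithScore gives J gene usage.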

Parts 3 and 4: in practice, most people only care about the CDR3 region

Part 3 is the nucleotide sequence of the CDR3 region, and part 4 is its amino acid sequence.

nSeqCDR3 aaSeqCDR3
TGTGCCAGCAGCTTGGCCCGAGAAGGGATTGAAAACACCATATATTTT CASSLAREGIENTIYF
TGTGCCAGCAGCCGCCTGGACCTAGGATTTTACGAGCAGTACTTC CASSRLDLGFYEQYF
TGTGCCAGCAATGTACAGCTCCTATAACACTGAAGCTTTCTTT CASNVQL_YNTEAFF
TGCAGCGGGCAAGGGGGGGAGAGGGGAGCAGATACGCAGTATTTT CSGQGGERGADTQYF
TGTGCCAGCAGTTTATCTCAAGGGACGCTATACAATGAGCAGTTCTTC CASSLSQGTLYNEQFF
TGCAGCGTTGTACCCGATAGCACAGATACGCAGTATTTT CSVVPDSTDTQYF
TGCGCCAGCAGCTTGACAAACACCATATATTTT CASSLTNTIYF
TGCAGTGTGTGGTGGACCGGGGGGTATGGCTACACCTTC CSVWWTGGYGYTF
TGTGCCTGGAGTCCCGGGTTTGGGGATTCACCCCTCCACTTT CAWSPGFGDSPLHF

The command is: cut -f 24,33 ERR3445170.clonotypes.TRB.txt | head
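
In the table above, the aaSeqCDR3 of the third clone contains an underscore. As far as I understand, MiXCR writes `_` for an incomplete codon (a frameshift) and `*` for a stop codon, so such clonotypes are usually treated as non-productive. A small sketch under that assumption, with `is_productive` as a hypothetical helper:

```shell
# is_productive AA_SEQ: succeed only if the CDR3 is non-empty and has
# no "_" (incomplete codon) and no "*" (stop codon).
is_productive() {
  case "$1" in
    *_* | *\** | "") return 1 ;;
    *) return 0 ;;
  esac
}

# Three aaSeqCDR3 values copied from the table above.
for aa in CASSLAREGIENTIYF CASNVQL_YNTEAFF CSVVPDSTDTQYF; do
  if is_productive "$aa"; then
    echo "$aa productive"
  else
    echo "$aa non-productive"
  fi
done
```
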

You can also build your own database

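As noted above, the software has a built-in immune repertoire database and, given the species name, aligns against the BCR chains IGH, IGK and IGL and the TCR chains TRA, TRB, TRD and TRG on its own, each with their V, D (optional), J and C genes.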

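The built-in version may lag behind the latest release, or the species you study may not be covered by the software at all. In those cases you may need to build the database yourself.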

See en/