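Earlier I walked everyone through the immune repertoire using the IMGT database, and we downloaded the repertoire-related FASTA sequences from IMGT together. The main objects of immune repertoire research are the three BCR loci (IGH, IGK, IGL) and the four TCR loci (TRA, TRB, TRD, TRG), each of which has V, D (optional), J and C genes.
After that we got to know immune repertoire sequencing data and some of its characteristics, and analyzed it with igblast. That, however, is only a basic alignment. It does find the V, D (optional), J and C genes of each sequenced fragment and yields the CDR3 sequence, but there are quite a few intermediate steps: besides routine read filtering, the paired-end reads have to be merged and the FASTQ files converted to FASTA. igblast itself is also hard to use, and building its database files is complicated. In fact, packaged one-stop pipelines exist.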
NGS and de-multiplexing was performed on an Illumina MiSeq sequencer (600-cycle, single indexed, paired-end run). Analysis of the TRB [...] computed using the MiXCR analysis tool
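MiXCR is developed by MiLaboratory. The lab has released three tools, MiXCR, MIGEC and VDJtools, all of which are well known in the immune repertoire field. The MiXCR workflow diagram is shown below:
Because MiXCR ships as a prebuilt binary release, I strongly recommend downloading that directly; once unzipped it is ready to use.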
milaboratory/mixcr/releases/download/v3.0.13/mixcr-3.0.13.zip (the latest version when checked on May 5 this year)
Documentation: en/
MiXCR is built on the Java platform, so a working Java environment is needed to start it, and Java 8 is required.
On Linux or macOS it is very simple: just call the mixcr command inside the unzipped directory.
On Windows, use java -Xmx4g -Xms3g -jar to invoke the mixcr.jar file inside the downloaded binary release.
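Since MiXCR depends on Java, it can help to check the installed version before the first run. The following is a minimal sketch, not part of MiXCR: the function names `java_major_version` and `detect_java_version` are hypothetical, and the parsing assumes the two common version string styles ("1.8.0_252" for Java 8, "11.0.7" for later releases).

```bash
#!/usr/bin/env bash
# Extract the major Java version from a version string such as
# "1.8.0_252" (Java 8 style) or "11.0.7" (Java 9+ style).
java_major_version() {
    local v="$1"
    # Legacy scheme: "1.8.0_252" means Java 8, so drop the leading "1."
    case "$v" in
        1.*) v="${v#1.}" ;;
    esac
    # Keep everything before the first dot.
    printf '%s\n' "${v%%.*}"
}

# Read the version reported by the local JVM, if any.
# `java -version` writes to stderr and puts the version in double quotes.
detect_java_version() {
    java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}'
}

if command -v java >/dev/null 2>&1; then
    echo "Java major version: $(java_major_version "$(detect_java_version)")"
else
    echo "java not found on PATH"
fi
```

If the reported major version is below 8, MiXCR is unlikely to start.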
I am on macOS. After downloading and unzipping, the software looks like this:
(base) jmzengdeMBP:mixcr-3.0.13 jmzeng$ pwd
/Users/jmzeng/biosoft/MiXCR/mixcr-3.0.13
(base) jmzengdeMBP:mixcr-3.0.13 jmzeng$ tree -h
├── [1.4K]  LICENSE
├── [  96]  libraries
├── [4.0K]  mixcr
└── [9.3M]  mixcr.jar

1 directory, 3 files

Using the one-stop pipeline
MiXCR provides an analyze mode that runs the whole chain of steps (align, assemblePartial, extend, assemble, assembleContigs and export) in one go. Read the en/ documentation carefully; the example it gives is:
mixcr analyze amplicon -s species \
    --starting-material startingMaterial \
    --5-end 5End --3-end 3End \
    --adapters adapters \
    [OPTIONS] input_file1 [input_file2] analysis_name
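The immune repertoire sequencing data we looked at earlier is human, sequenced on a MiSeq with a PE300 strategy, targets TRB, and starts from DNA. So the command we build is: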
mixcr=/Users/jmzeng/biosoft/MiXCR/mixcr-3.0.13/mixcr
$mixcr analyze amplicon \
    -s hsa \
    --starting-material dna \
    --5-end v-primers --3-end c-primers \
    --adapters adapters-present \
    raw/ERR3445170_1.fastq.gz raw/ERR3445170_2.fastq.gz ERR3445170
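Because I set --adapters adapters-present, MiXCR knows that adapter sequences are still present in the reads and deals with them itself. That is why I ran it directly on the raw fastq files here, without the trim_galore step used earlier for igblastn.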
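With more than one sample, the same command can be generated per FASTQ pair. The sketch below only prints the commands (a dry run) so they can be inspected before anything is executed. The helper name `build_mixcr_cmd` and the `raw/*_1.fastq.gz` / `*_2.fastq.gz` naming scheme are assumptions modeled on the single sample above; the mixcr options are copied unchanged from that command.

```bash
#!/usr/bin/env bash
# Print (not run) the mixcr command for one paired-end sample.
# $1 is the path to the *_1.fastq.gz file; the mate is assumed to be *_2.fastq.gz.
build_mixcr_cmd() {
    local r1="$1"
    local r2="${r1%_1.fastq.gz}_2.fastq.gz"
    local sample
    sample="$(basename "$r1" _1.fastq.gz)"
    printf '%s\n' "mixcr analyze amplicon -s hsa --starting-material dna --5-end v-primers --3-end c-primers --adapters adapters-present $r1 $r2 $sample"
}

# Dry run over every R1 file under raw/ (hypothetical directory layout).
for r1 in raw/*_1.fastq.gz; do
    [ -e "$r1" ] || continue   # skip the unexpanded glob when raw/ is empty
    build_mixcr_cmd "$r1"
done
```

Once the printed commands look right, they can be redirected to a file and run with bash.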
The results came out with very little effort, and the run appeared to use all of my Mac's computing resources.
The run log is as follows:
NOTE: report file is not specified, using ERR3445170.report to write report.
Alignment: 2.1%
Alignment: 15.4%  ETA: 00:00:38
Alignment: 27.9%  ETA: 00:00:17
Alignment: 39.5%  ETA: 00:00:15
Alignment: 51.5%  ETA: 00:00:12
Alignment: 63%  ETA: 00:00:09
Alignment: 74.2%  ETA: 00:00:06
Alignment: 85.4%  ETA: 00:00:03
Alignment: 96.9%  ETA: 00:00:00
============= Report ==============
Analysis time: 29.52s
Total sequencing reads: 84833
Successfully aligned reads: 83421 (98.34%)
Paired-end alignment conflicts eliminated: 3027 (3.57%)
Alignment failed, no hits (not TCR/IG?): 850 (1%)
Alignment failed because of absence of V hits: 4 (0%)
Alignment failed because of absence of J hits: 507 (0.6%)
No target with both V and J alignments: 49 (0.06%)
Alignment failed because of low total score: 2 (0%)
Overlapped: 81097 (95.6%)
Overlapped and aligned: 80179 (94.51%)
Alignment-aided overlaps: 1313 (1.64%)
Overlapped and not aligned: 918 (1.08%)
V gene chimeras: 29 (0.03%)
J gene chimeras: 4 (0%)
TRB chains: 83418 (100%)
IGH chains: 3 (0%)
Realigned with forced non-floating bound: 10098 (11.9%)
Realigned with forced non-floating right bound in left read: 3 (0%)
Realigned with forced non-floating left bound in right read: 3 (0%)
Initialization: progress unknown
Assembling initial clonotypes: 100%
Building clones: 25.2%
Writing clones: 0%
============= Report ==============
Analysis time: 2.33s
Final clonotype count: 2116
Average number of reads per clonotype: 35.18
Reads used in clonotypes, percent of total: 74448 (87.76%)
Reads used in clonotypes before clustering, percent of total: 76478 (90.15%)
Number of reads used as a core, percent of used: 75378 (98.56%)
Mapped low quality reads, percent of used: 1100 (1.44%)
Reads clustered in PCR error correction, percent of used: 2030 (2.65%)
Reads pre-clustered due to the similar VJC-lists, percent of used: 33 (0.04%)
Reads dropped due to the lack of a clone sequence, percent of total: 4236 (4.99%)
Reads dropped due to low quality, percent of total: 116 (0.14%)
Reads dropped due to failed mapping, percent of total: 2590 (3.05%)
Reads dropped with low quality clones, percent of total: 1 (0%)
Clonotypes eliminated by PCR error correction: 832
Clonotypes dropped as low quality: 1
Clonotypes pre-clustered due to the similar VJC-lists: 27
TRB chains: 2116 (100%)
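The percentages in the log can be cross-checked from the raw counts, which is a quick way to confirm how each figure is defined. The sketch below pulls numbers out of report-style "label: value" lines and recomputes three of them. The helper names `report_value` and `percent` are hypothetical, and it is an assumption that the ERR3445170.report file uses the same line layout as the console log; here a few lines are copied from the log into a temporary file instead.

```bash
#!/usr/bin/env bash
# Print the first number after "<label>: " in a MiXCR-style report.
report_value() {
    local file="$1" label="$2"
    awk -F': ' -v label="$label" '$1 == label {split($2, a, " "); print a[1]; exit}' "$file"
}

# Percentage of part in total, rounded to two decimals.
percent() {
    awk -v part="$1" -v total="$2" 'BEGIN {printf "%.2f\n", 100 * part / total}'
}

# A few lines copied from the log above, written to a temporary file.
report="$(mktemp)"
trap 'rm -f "$report"' EXIT
cat > "$report" <<'EOF'
Total sequencing reads: 84833
Successfully aligned reads: 83421 (98.34%)
Final clonotype count: 2116
Reads used in clonotypes, percent of total: 74448 (87.76%)
EOF

total=$(report_value "$report" "Total sequencing reads")
aligned=$(report_value "$report" "Successfully aligned reads")
used=$(report_value "$report" "Reads used in clonotypes, percent of total")
clones=$(report_value "$report" "Final clonotype count")

echo "aligned: $(percent "$aligned" "$total")%"
echo "used in clonotypes: $(percent "$used" "$total")%"
awk -v u="$used" -v c="$clones" 'BEGIN {printf "reads per clonotype: %.2f\n", u / c}'
```

The recomputed values agree with the log: 98.34% aligned, 87.76% of reads used in clonotypes, and 35.18 reads per clonotype, so the last figure is reads used in clonotypes divided by the final clonotype count.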
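MiXCR has a built-in immune repertoire reference library, so given the species name it aligns reads on its own against the three BCR loci (IGH, IGK, IGL) and the four TCR loci (TRA, TRB, TRD, TRG), each with its V, D (optional), J and C genes. Because our test data is TRB, only the TRB file contains results, as the line counts show: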
2117 ERR3445170.clonotypes.ALL.txt
   1 ERR3445170.clonotypes.IGH.txt
   1 ERR3445170.clonotypes.IGK.txt
   1 ERR3445170.clonotypes.IGL.txt
   1 ERR3445170.clonotypes.TRA.txt
2117 ERR3445170.clonotypes.TRB.txt
   1 ERR3445170.clonotypes.TRD.txt
   1 ERR3445170.clonotypes.TRG.txt

Understanding the MiXCR output
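A note on the line counts first. Each export file starts with a header line, so a file with 1 line holds no clonotypes, and 2117 lines correspond to 2116 clonotypes, the same number as "Final clonotype count" in the log. A minimal sketch that reports clonotypes rather than lines (the function name `count_clonotypes` is hypothetical):

```bash
#!/usr/bin/env bash
# Number of clonotypes in a MiXCR export file: data lines, excluding the header line.
count_clonotypes() {
    awk 'NR > 1 {n++} END {print n + 0}' "$1"
}

for f in ERR3445170.clonotypes.*.txt; do
    [ -e "$f" ] || continue   # nothing to do if the files are not in this directory
    printf '%s\t%s\n' "$(count_clonotypes "$f")" "$f"
done
```

The `n + 0` makes awk print 0 instead of an empty string for header-only files.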
The export has quite a few columns:
 1 cloneId
 2 cloneCount
 3 cloneFraction
 4 targetSequences
 5 targetQualities
 6 allVHitsWithScore
 7 allDHitsWithScore
 8 allJHitsWithScore
 9 allCHitsWithScore
10 allVAlignments
11 allDAlignments
12 allJAlignments
13 allCAlignments
14 nSeqFR1
15 minQualFR1
16 nSeqCDR1
17 minQualCDR1
18 nSeqFR2
19 minQualFR2
20 nSeqCDR2
21 minQualCDR2
22 nSeqFR3
23 minQualFR3
24 nSeqCDR3
25 minQualCDR3
26 nSeqFR4
27 minQualFR4
28 aaSeqFR1
29 aaSeqCDR1
30 aaSeqFR2
31 aaSeqCDR2
32 aaSeqFR3
33 aaSeqCDR3
34 aaSeqFR4
35 refPoints
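They are actually fairly easy to understand: one group describes the alignment of the V, D (optional) and J genes, and another gives the sequences of FR1-4 and CDR1-3.

Part 1: the core repertoire sequence
Because the V, D, J and C genes share a certain degree of similarity, the full sequence of several hundred bases is not needed to determine which V, D, J and C genes a clone is built from.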
cloneId  cloneCount  cloneFraction  targetSequences
0        20112.0     0.270717       TGTGCCAGCAGCTTGGCCCGAGAAGGGATTGAAAACACCATATATTTT
1        9264.0      0.010316       TGTGCCAGCAGCCGCCTGGACCTAGGATTTTACGAGCAGTACTTC
2        5037.0      0.77111        TGTGCCAGCAATGTACAGCTCCTATAACACTGAAGCTTTCTTT
3        2710.0      0.0364085      TGCAGCGGGCAAGGGGGGGAGAGGGGAGCAGATACGCAGTATTTT
4        1851.0      0.310768       TGTGCCAGCAGTTTATCTCAAGGGACGCTATACAATGAGCAGTTCTTC
5        1084.0      0.0051794      TGCAGCGTTGTACCCGATAGCACAGATACGCAGTATTTT
6        1082.0      0.0485279      TGCGCCAGCAGCTTGACAAACACCATATATTTT
7        890.0       0.009972       TGCAGTGTGTGGTGGACCGGGGGGTATGGCTACACCTTC
8        538.0       0.392865       TGTGCCTGGAGTCCCGGGTTTGGGGATTCACCCCTCCACTTT
The command is: cut -f 1-4 ERR3445170.clonotypes.TRB.txt|head
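A clone's fraction should be its read count divided by the total count over all clones, so the cloneFraction column can be recomputed from cloneCount. Several fractions printed above do not decrease with the counts (clone 2 shows a larger fraction than clone 0, for instance), which makes this a reasonable sanity check before using the column. The sketch below assumes a tab-separated file with a header, cloneId in column 1 and cloneCount in column 2. The function name `clone_fractions` is hypothetical and the toy counts are made up, not taken from ERR3445170.

```bash
#!/usr/bin/env bash
# Recompute each clone's fraction as cloneCount / sum(cloneCount).
# Input: tab-separated file with a header; column 1 = cloneId, column 2 = cloneCount.
# The file is read twice: the first pass sums the counts, the second pass prints.
clone_fractions() {
    awk -F'\t' '
        NR == FNR { if (FNR > 1) total += $2; next }   # first pass: sum counts
        FNR > 1   { printf "%s\t%d\t%.6f\n", $1, $2, $2 / total }
    ' "$1" "$1"
}

# Toy input (made-up counts, not the ERR3445170 data).
toy="$(mktemp)"
trap 'rm -f "$toy"' EXIT
printf 'cloneId\tcloneCount\n0\t60\n1\t30\n2\t10\n' > "$toy"
clone_fractions "$toy"
```

On the real export, the same function can be pointed at ERR3445170.clonotypes.TRB.txt; counts written as 20112.0 are converted to integers by the `%d` format.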
Part 2: the V, D, J and C genes. These columns are mainly useful for computing the sample's V, D, J and C gene usage.
allVHitsWithScore    allJHitsWithScore   allVAlignments            allJAlignments
TRBV5-5*00(886.1)    TRBJ1-3*00(197)     549|567|585|0|18||90.0    26|42|70|32|48||80.0
TRBV6-5*00(872.3)    TRBJ2-7*00(175.7)   519|530|556|0|11||55.0    24|39|67|30|45||75.0
TRBV6-5*00(863.7)    TRBJ1-1*00(217.9)   519|529|556|0|10||50.0    22|40|68|25|43||90.0
TRBV29-1*00(942.1)   TRBJ2-3*00(204.2)   700|707|734|0|7||35.0     24|41|69|28|45||85.0
TRBV27*00(923.6)     TRBJ2-1*00(200.4)   556|573|593|0|17||85.0    24|42|70|30|48||90.0
TRBV29-1*00(955.5)   TRBJ2-3*00(230.1)   700|710|734|0|10||50.0    19|41|69|17|39||110.0
TRBV5-1*00(693.5)    TRBJ1-3*00(195.9)   554|569|590|0|15||75.0    26|42|70|17|33||80.0
TRBV20-1*00(935.1)   TRBJ1-2*00(208.9)   759|766|793|0|7||35.0     25|40|68|24|39||75.0
TRBV30*00(904.5)     TRBJ1-6*00(160.6)   787|799|821|0|12||60.0    28|45|73|25|42||85.0
The command is: cut -f 6-13 ERR3445170.clonotypes.TRB.txt|head|cut -f 1,3,5,7
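Gene usage can be tallied from these columns with a short awk script. The sketch below sums cloneCount per V gene. It assumes column 2 is cloneCount and column 6 is allVHitsWithScore, that hits look like "TRBV5-5*00(886.1)", and that when several hits are listed they are comma-separated with the best one first. The function name `v_gene_usage` is hypothetical. The toy input reuses the counts and V hits of the first three clones shown above, with "." as a placeholder for the other columns.

```bash
#!/usr/bin/env bash
# Sum cloneCount per top V gene and sort by read count, largest first.
# Assumes column 2 = cloneCount and column 6 = allVHitsWithScore.
v_gene_usage() {
    awk -F'\t' '
        NR > 1 {
            split($6, hits, ",")        # first entry is taken as the best hit
            gene = hits[1]
            sub(/[*(].*/, "", gene)     # drop the allele suffix and the score
            usage[gene] += $2
        }
        END { for (g in usage) printf "%s\t%d\n", g, usage[g] }
    ' "$1" | sort -t "$(printf '\t')" -k2,2nr -k1,1
}

# Toy input: first three clones from the tables above, other columns as ".".
toy="$(mktemp)"
trap 'rm -f "$toy"' EXIT
{
    printf 'cloneId\tcloneCount\tcloneFraction\ttargetSequences\ttargetQualities\tallVHitsWithScore\n'
    printf '0\t20112\t.\t.\t.\tTRBV5-5*00(886.1)\n'
    printf '1\t9264\t.\t.\t.\tTRBV6-5*00(872.3)\n'
    printf '2\t5037\t.\t.\t.\tTRBV6-5*00(863.7)\n'
} > "$toy"
v_gene_usage "$toy"
```

Swapping column 6 for column 8 (allJHitsWithScore) gives J gene usage in the same way.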
Parts 3 and 4 both concern only the CDR3 region: part 3 is the CDR3 nucleotide sequence and part 4 is the CDR3 amino acid sequence.
nSeqCDR3                                            aaSeqCDR3
TGTGCCAGCAGCTTGGCCCGAGAAGGGATTGAAAACACCATATATTTT    CASSLAREGIENTIYF
TGTGCCAGCAGCCGCCTGGACCTAGGATTTTACGAGCAGTACTTC       CASSRLDLGFYEQYF
TGTGCCAGCAATGTACAGCTCCTATAACACTGAAGCTTTCTTT         CASNVQL_YNTEAFF
TGCAGCGGGCAAGGGGGGGAGAGGGGAGCAGATACGCAGTATTTT       CSGQGGERGADTQYF
TGTGCCAGCAGTTTATCTCAAGGGACGCTATACAATGAGCAGTTCTTC    CASSLSQGTLYNEQFF
TGCAGCGTTGTACCCGATAGCACAGATACGCAGTATTTT             CSVVPDSTDTQYF
TGCGCCAGCAGCTTGACAAACACCATATATTTT                   CASSLTNTIYF
TGCAGTGTGTGGTGGACCGGGGGGTATGGCTACACCTTC             CSVWWTGGYGYTF
TGTGCCTGGAGTCCCGGGTTTGGGGATTCACCCCTCCACTTT          CAWSPGFGDSPLHF
The command is: cut -f 24,33 ERR3445170.clonotypes.TRB.txt|head
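The third clone in the table has a 43-nt CDR3, which is not a multiple of 3, and its amino acid sequence contains "_". Such out-of-frame sequences are usually separated from productive ones before downstream statistics. The following is a minimal sketch of that check. The function name `cdr3_status` is hypothetical, and it assumes that "_" marks an incomplete codon and "*" a stop codon in the aaSeqCDR3 column.

```bash
#!/usr/bin/env bash
# Label a CDR3 as "productive" or "non-productive".
# $1 = nucleotide sequence, $2 = amino acid sequence.
# Assumption: "_" marks an incomplete codon and "*" a stop codon in the translation.
cdr3_status() {
    local nt="$1" aa="$2"
    if [ $(( ${#nt} % 3 )) -ne 0 ]; then
        echo "non-productive"      # length not a multiple of 3: out of frame
        return
    fi
    case "$aa" in
        *[_*]*) echo "non-productive" ;;   # stop codon or incomplete codon
        *)      echo "productive" ;;
    esac
}

# Two rows taken from the table above.
cdr3_status TGCGCCAGCAGCTTGACAAACACCATATATTTT CASSLTNTIYF
cdr3_status TGTGCCAGCAATGTACAGCTCCTATAACACTGAAGCTTTCTTT CASNVQL_YNTEAFF
```

The first call prints "productive" (33 nt, 11 residues) and the second prints "non-productive".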
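You can also build your own library
MiXCR ships with a built-in immune repertoire reference library, so given the species name it aligns on its own against the three BCR loci (IGH, IGK, IGL) and the four TCR loci (TRA, TRB, TRD, TRG), each with its V, D (optional), J and C genes.
The built-in library may lag behind the latest release, or the species you study may not be covered by the software at all. In those cases you may need to build the library yourself.
See en/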