diff --git a/GSoC.md b/GSoC.md
new file mode 100644
index 0000000..d631352
--- /dev/null
+++ b/GSoC.md
@@ -0,0 +1,45 @@
+@def title = "JuliaHealth - Google Summer of Code"
+
+This page lists our [Google Summer of Code (GSoC)](https://summerofcode.withgoogle.com) fellows and their experiences working across the JuliaHealth ecosystem.
+Students interested in becoming GSoC fellows should review these past projects to get a sense of what we look for in projects that contribute to the JuliaHealth ecosystem.
+
+\toc
+
+# GSoC 2023
+
+## JuliaHealth's Tools for Patient-Level Predictions: Strengthening Capacity and Innovation
+
+**Student:** Fareeda Abdelazeez
+
+**Mentor:** Jacob Zelko
+
+[Project Proposal](https://docs.google.com/document/d/18-p6VG6MwvzFdyA45MvXyqxOLVEByFP6D_gff9-E1XE/edit#heading=h.zgq6k5hzq0t)
+
+**Summary:** Working with the OMOP CDM (Observational Medical Outcomes Partnership Common Data Model) involves handling large datasets, which requires tools that efficiently extract the data needed for various analyses.
+The first part of the project focused on improving JuliaHealth's infrastructure by increasing the range of tools available to users.
+This involved enabling connections to various databases and building an understanding of how to work robustly with observational health data.
+The second goal was to leverage the capacity built in the first phase to develop a comprehensive framework for patient-level prediction.
+This framework explored how to predict patient cohort outcomes with given treatments and was tested on the [MIMIC-III dataset](https://physionet.org/content/mimiciii/1.4/) after it was converted to the OMOP CDM.
+
+
+
+**Fellowship core accomplishments:**
+
+The [PR](https://github.com/JuliaHealth/OMOPCDMCohortCreator.jl/pull/54) for OMOPCDMCohortCreator added new features:
+
+- Enriched OMOPCDMCohortCreator tools
+- Extensive tests for the new functions
+- Updated the documentation
+
+The [PR](https://github.com/JuliaDatabases/DBConnector.jl/pull/13)
+for DBConnector:
+
+- Created documentation
+- Rewired the tools used to connect to SQLite, PostgreSQL, and MySQL
+- Created unit tests
+
+A Jupyter notebook shows the workflow for creating a prediction model from OMOP CDM data using the packages developed during the program.
+You can find it in the tutorial section of the JuliaHealth website.
+For more details, this [blog post](https://medium.com/@fareedaabdelazeez/google-summer-of-code-2023-strengthening-healthcare-with-juliahealth-7b8fde5af9ec) summarizes the achievements of the GSoC program. Check the acknowledgments too!
+
+- Poster presentation at [JuliaCon 2023, _JuliaHealth's Tools for Patient-Level Predictions: Strengthening Capacity and Innovation_](/assets/JuliaCon-gsoc.pdf)
diff --git a/_assets/JuliaCon-gsoc.pdf b/_assets/JuliaCon-gsoc.pdf
new file mode 100644
index 0000000..ff8835d
Binary files /dev/null and b/_assets/JuliaCon-gsoc.pdf differ
diff --git a/_assets/JuliaHealth-Patient-level-prediction.html b/_assets/JuliaHealth-Patient-level-prediction.html
new file mode 100644
index 0000000..5ad673a
--- /dev/null
+++ b/_assets/JuliaHealth-Patient-level-prediction.html
@@ -0,0 +1,16901 @@
+
+
+
+ + +using SQLite
+using LibPQ
+using DataFrames
+using DBInterface
+import OMOPCDMCohortCreator as occ
+# using OMOPCDMDatabaseConnector
+# This package will connect to the database directly in the future, but it is still under maintenance.
+
+# Connect the basic way, through DBInterface and LibPQ
+DBconn = DBInterface.connect(LibPQ.Connection,"**********************************")
+occ.GenerateDatabaseDetails(
+ :postgresql,
+ "omop"
+ )
+tables = occ.GenerateTables(DBconn)
+occ.GetDatabasePersonIDs(DBconn)
[ Info: Global database dialect set to: postgresql
+[ Info: Global schema set to: omop
+[ Info: measurement table generated internally
+[ Info: payer_plan_period table generated internally
+[ Info: location table generated internally
+[ Info: source_to_concept_map table generated internally
+[ Info: note_nlp table generated internally
+[ Info: visit_detail_assign table generated internally
+[ Info: visit_occurrence table generated internally
+[ Info: vocabulary table generated internally
+[ Info: procedure_occurrence table generated internally
+[ Info: relationship table generated internally
+[ Info: domain table generated internally
+[ Info: dose_era table generated internally
+[ Info: concept table generated internally
+[ Info: death table generated internally
+[ Info: metadata table generated internally
+[ Info: concept_class table generated internally
+[ Info: drug_era table generated internally
+[ Info: note table generated internally
+[ Info: specimen table generated internally
+[ Info: condition_occurrence table generated internally
+[ Info: concept_ancestor table generated internally
+[ Info: cohort table generated internally
+[ Info: fact_relationship table generated internally
+[ Info: drug_exposure table generated internally
+[ Info: person table generated internally
+[ Info: observation_period table generated internally
+[ Info: cost table generated internally
+[ Info: cohort_attribute table generated internally
+[ Info: observation table generated internally
+[ Info: condition_era table generated internally
+[ Info: concept_relationship table generated internally
+[ Info: provider table generated internally
+[ Info: concept_synonym table generated internally
+[ Info: attribute_definition table generated internally
+[ Info: cdm_source table generated internally
+[ Info: cohort_definition table generated internally
+[ Info: drug_strength table generated internally
+[ Info: visit_detail table generated internally
+[ Info: care_site table generated internally
+[ Info: device_exposure table generated internally
+
46520-element Vector{Int64}:
+ 622701440
+ 622684030
+ 622692774
+ 622709475
+ 622705072
+ 622691611
+ 622703768
+ 622697129
+ 622701153
+ 622682262
+ 622705465
+ 622711774
+ 622693042
+ ⋮
+ 622690452
+ 622698813
+ 622691630
+ 622689444
+ 622691888
+ 622678894
+ 622709120
+ 622702234
+ 622706201
+ 622680998
+ 622693599
+ 622698890
+### The concepts for AFib and Stroke from ATLAS OHDSI
+
+Afib = [4199501,313217]
+stroke = [4164092,
+44784623,
+40480002,
+43530679,
+437544,
+43531622,
+4112018,
+374055,
+759831,
+442615,
+437540,
+4201094,
+4326561,
+4029497,
+380747,
+372924,
+316437,
+375557,
+376713,
+374384,
+441874,
+381316,
+381591,
+44782819,
+36712779,
+438873,
+438881,
+438270,
+434166,
+440537,
+4014781,
+372721,
+4159164,
+4153380,
+37109512,
+432346,
+43530687,
+40479575,
+4306943,
+441246,
+40481762,
+40484522,
+40484513,
+436277,
+437427,
+439190,
+439847,
+42873157,
+444197,
+444198,
+444196,
+433624,
+436526,
+40492969,
+434155,
+4310996,
+434056,
+40480938,
+40481842,
+378774,
+377254,
+444091,
+443790,
+443864,
+314667,
+436430,
+4162038,
+435378,
+433037,
+372654,
+443599,
+436519,
+260841,
+313272,
+313833,
+40480449,
+432923,
+4134162,
+441709,
+440244,
+378544,
+4318408,
+439040,
+433050,
+618759,
+4045745,
+373503,
+4154699,
+4136546,
+4017107,
+380423,
+434656,
+43531583]
+93-element Vector{Int64}:
+ 4164092
+ 44784623
+ 40480002
+ 43530679
+ 437544
+ 43531622
+ 4112018
+ 374055
+ 759831
+ 442615
+ 437540
+ 4201094
+ 4326561
+ ⋮
+ 4318408
+ 439040
+ 433050
+ 618759
+ 4045745
+ 373503
+ 4154699
+ 4136546
+ 4017107
+ 380423
+ 434656
+ 43531583
+### MIMIC-III alters patients' dates of birth for privacy reasons, so each patient's age is retrieved from the MIMIC data with this query
+## IMPORTANT NOTE: adding 229896253 to each mimic_id is a special case for our database; in most cases you will not need to add it
+
+Mconn = LibPQ.Connection("*******************************************")
+LibPQ.execute(Mconn,"set search_path to mimiciii")
+
+### In our database, each mimic_id differs from its corresponding person_id by 229896253
+
+age_mimic = LibPQ.execute(Mconn,"SELECT pat.subject_id, (pat.mimic_id) AS person_id,
+ CAST(CAST(EXTRACT(epoch FROM adm.admittime - pat.dob)/(60*60*24*365.242) AS numeric) AS integer) AS age
+FROM icustays ie
+INNER JOIN admissions adm
+ ON ie.hadm_id = adm.hadm_id
+INNER JOIN patients pat
+ ON ie.subject_id = pat.subject_id
+;") |> DataFrame
+age_mimic = unique(age_mimic, :person_id)
+sort(age_mimic, :person_id)
+| Row | subject_id | person_id | age |
|---|---|---|---|
| | Int32? | Int32? | Int32? |
| 1 | 249 | 622672103 | 75 |
| 2 | 250 | 622672104 | 24 |
| 3 | 251 | 622672105 | 20 |
| 4 | 252 | 622672106 | 55 |
| 5 | 253 | 622672107 | 84 |
| 6 | 255 | 622672108 | 78 |
| 7 | 256 | 622672109 | 77 |
| 8 | 257 | 622672110 | 82 |
| 9 | 258 | 622672111 | 0 |
| 10 | 260 | 622672112 | 0 |
| 11 | 261 | 622672113 | 76 |
| 12 | 262 | 622672114 | 64 |
| 13 | 263 | 622672115 | 56 |
| ⋮ | ⋮ | ⋮ | ⋮ |
| 46465 | 44065 | 622718611 | 66 |
| 46466 | 44069 | 622718612 | 67 |
| 46467 | 44071 | 622718613 | 60 |
| 46468 | 44073 | 622718614 | 88 |
| 46469 | 44082 | 622718615 | 66 |
| 46470 | 44083 | 622718616 | 54 |
| 46471 | 44084 | 622718617 | 58 |
| 46472 | 44089 | 622718618 | 85 |
| 46473 | 44115 | 622718619 | 37 |
| 46474 | 44123 | 622718620 | 85 |
| 46475 | 44126 | 622718621 | 52 |
| 46476 | 44128 | 622718622 | 51 |
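The age column above comes from the SQL expression that divides the seconds between `dob` and `admittime` by `60*60*24*365.242` and casts the result to an integer. A minimal sketch of the same arithmetic in Julia, using only the `Dates` standard library, is shown here; `age_at_admission` and `SECONDS_PER_YEAR` are hypothetical names, and the sketch assumes the cast rounds to the nearest integer.

```julia
using Dates

# Seconds in an average year, matching the divisor used in the SQL query.
const SECONDS_PER_YEAR = 60 * 60 * 24 * 365.242

# Hypothetical helper: age in whole years at admission time.
function age_at_admission(dob::DateTime, admittime::DateTime)
    # Subtracting two DateTimes yields a Millisecond period.
    elapsed_seconds = Dates.value(admittime - dob) / 1000
    # Assumption: the SQL integer cast rounds to the nearest integer.
    return round(Int, elapsed_seconds / SECONDS_PER_YEAR)
end

age_at_admission(DateTime(2000, 1, 1), DateTime(2050, 1, 1))
```

Doing the calculation in the database, as the notebook does, avoids transferring raw timestamps; the sketch is only meant to make the formula easier to check.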
### Get the person_ids of patients with AFib or stroke using OMOPCDMCohortCreator
+AFib_combined_df = DataFrame()
+stroke_combined_df = DataFrame()
+AFib_combined_df = occ.ConditionFilterPersonIDs(Afib,DBconn)
+AFib_combined_df[!, :has_AFib] .= 1
+stroke_combined_df = occ.ConditionFilterPersonIDs(stroke,DBconn)
+stroke_combined_df[!, :has_stroke] .= 1
+df = outerjoin(AFib_combined_df, stroke_combined_df, on = [:person_id => :person_id], matchmissing = :equal)
+df = coalesce.(df, 0)
+| Row | person_id | has_AFib | has_stroke |
|---|---|---|---|
| | Int32 | Int64 | Int64 |
| 1 | 622707430 | 1 | 1 |
| 2 | 622716907 | 1 | 1 |
| 3 | 622704382 | 1 | 1 |
| 4 | 622695167 | 1 | 1 |
| 5 | 622679460 | 1 | 1 |
| 6 | 622673865 | 1 | 1 |
| 7 | 622690154 | 1 | 1 |
| 8 | 622679180 | 1 | 1 |
| 9 | 622678192 | 1 | 1 |
| 10 | 622702721 | 1 | 1 |
| 11 | 622693672 | 1 | 1 |
| 12 | 622697655 | 1 | 1 |
| 13 | 622713836 | 1 | 1 |
| ⋮ | ⋮ | ⋮ | ⋮ |
| 15126 | 622707415 | 0 | 1 |
| 15127 | 622700696 | 0 | 1 |
| 15128 | 622679790 | 0 | 1 |
| 15129 | 622687676 | 0 | 1 |
| 15130 | 622691527 | 0 | 1 |
| 15131 | 622717579 | 0 | 1 |
| 15132 | 622698960 | 0 | 1 |
| 15133 | 622702426 | 0 | 1 |
| 15134 | 622708972 | 0 | 1 |
| 15135 | 622701212 | 0 | 1 |
| 15136 | 622717243 | 0 | 1 |
| 15137 | 622690412 | 0 | 1 |
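The table above results from an outer join of the two condition lists followed by `coalesce.(df, 0)`, so every patient with either condition gets a 0/1 flag for each. The same logic can be sketched with plain vectors and sets; the IDs below are made up for illustration and stand in for the results of the two `ConditionFilterPersonIDs` calls.

```julia
# Hypothetical person_id lists standing in for the two query results.
afib_ids = [101, 102, 103]
stroke_ids = [102, 104]

# An outer join keeps every person_id that appears in either list.
all_ids = sort(union(afib_ids, stroke_ids))

# A missing flag becomes 0, mirroring coalesce.(df, 0).
afib_set, stroke_set = Set(afib_ids), Set(stroke_ids)
flags = [
    (person_id = id, has_AFib = Int(id in afib_set), has_stroke = Int(id in stroke_set))
    for id in all_ids
]
```

Patient 102 appears in both lists and gets both flags set, while 104 has only `has_stroke = 1`.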
# Get each patient's gender and race
+df = occ.GetPatientGender(df, DBconn)
+df = occ.GetPatientRace(df, DBconn)
+| Row | person_id | race_concept_id | gender_concept_id | has_AFib | has_stroke |
|---|---|---|---|---|---|
| | Int32? | Int32? | Int32? | Int64? | Int64? |
| 1 | 622707430 | 8527 | 8507 | 1 | 1 |
| 2 | 622716907 | 8527 | 8507 | 1 | 1 |
| 3 | 622704382 | 8527 | 8507 | 1 | 1 |
| 4 | 622695167 | 8527 | 8532 | 1 | 1 |
| 5 | 622679460 | 4087921 | 8532 | 1 | 1 |
| 6 | 622673865 | 8527 | 8532 | 1 | 1 |
| 7 | 622690154 | 8527 | 8507 | 1 | 1 |
| 8 | 622679180 | 8527 | 8507 | 1 | 1 |
| 9 | 622678192 | 8527 | 8507 | 1 | 1 |
| 10 | 622702721 | 4188159 | 8507 | 1 | 1 |
| 11 | 622693672 | 4218674 | 8532 | 1 | 1 |
| 12 | 622697655 | 8527 | 8507 | 1 | 1 |
| 13 | 622713836 | 8527 | 8507 | 1 | 1 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 15126 | 622707415 | 8527 | 8507 | 0 | 1 |
| 15127 | 622700696 | 4087921 | 8532 | 0 | 1 |
| 15128 | 622679790 | 8527 | 8507 | 0 | 1 |
| 15129 | 622687676 | 4188159 | 8532 | 0 | 1 |
| 15130 | 622691527 | 8527 | 8532 | 0 | 1 |
| 15131 | 622717579 | 8527 | 8532 | 0 | 1 |
| 15132 | 622698960 | 8527 | 8507 | 0 | 1 |
| 15133 | 622702426 | 8515 | 8507 | 0 | 1 |
| 15134 | 622708972 | 8527 | 8507 | 0 | 1 |
| 15135 | 622701212 | 38003599 | 8532 | 0 | 1 |
| 15136 | 622717243 | 8527 | 8532 | 0 | 1 |
| 15137 | 622690412 | 8527 | 8532 | 0 | 1 |
# Join the table with the patients' ages
+df = leftjoin(df, age_mimic, on = [:person_id => :person_id], matchmissing = :equal, makeunique=true)
+sort(df, :person_id)
+
+ArgumentError: column :person_id not found in the left data frame
+
+Stacktrace:
+ [1] DataFrames.DataFrameJoiner(dfl::DataFrame, dfr::DataFrame, on::Vector{Pair{Symbol, Symbol}}, matchmissing::Symbol, kind::Symbol)
+ @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/join/composer.jl:54
+ [2] _join(df1::DataFrame, df2::DataFrame; on::Vector{Pair{Symbol, Symbol}}, kind::Symbol, makeunique::Bool, indicator::Nothing, validate::Tuple{Bool, Bool}, left_rename::typeof(identity), right_rename::typeof(identity), matchmissing::Symbol, order::Symbol)
+ @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/join/composer.jl:504
+ [3] #leftjoin#673
+ @ ~/.julia/packages/DataFrames/58MUJ/src/join/composer.jl:940 [inlined]
+ [4] top-level scope
+ @ In[9]:2
+#df.age .= ifelse.(df.age .> 89, 90, df.age)
+df
+| Row | person_id | race_concept_id | gender_concept_id | has_AFib | has_stroke | subject_id | age |
|---|---|---|---|---|---|---|---|
| | Int32 | Int32 | Int32 | Int64 | Int64 | Int32 | Int32 |
| 1 | 622672122 | 4218674 | 8507 | 0 | 1 | 270 | 80 |
| 2 | 622672559 | 38003599 | 8507 | 0 | 1 | 274 | 66 |
| 3 | 622672560 | 8527 | 8507 | 0 | 1 | 275 | 82 |
| 4 | 622672566 | 8527 | 8532 | 0 | 1 | 282 | 74 |
| 5 | 622672568 | 8527 | 8532 | 1 | 0 | 284 | 87 |
| 6 | 622672570 | 8527 | 8532 | 1 | 0 | 286 | 85 |
| 7 | 622672573 | 4218674 | 8507 | 1 | 0 | 290 | 74 |
| 8 | 622672587 | 38003599 | 8532 | 1 | 1 | 304 | 300 |
| 9 | 622672588 | 8527 | 8532 | 1 | 1 | 305 | 73 |
| 10 | 622672589 | 8527 | 8532 | 0 | 1 | 306 | 61 |
| 11 | 622672590 | 8527 | 8532 | 0 | 1 | 307 | 75 |
| 12 | 622672595 | 4218674 | 8532 | 0 | 1 | 313 | 79 |
| 13 | 622672603 | 4218674 | 8532 | 1 | 0 | 321 | 75 |
| ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
| 15116 | 622713243 | 4218674 | 8532 | 1 | 0 | 94879 | 300 |
| 15117 | 622713245 | 8527 | 8507 | 1 | 0 | 94889 | 53 |
| 15118 | 622713246 | 8527 | 8507 | 1 | 1 | 94896 | 77 |
| 15119 | 622713249 | 8527 | 8507 | 1 | 0 | 94906 | 80 |
| 15120 | 622713254 | 8527 | 8507 | 1 | 0 | 94916 | 300 |
| 15121 | 622713255 | 8527 | 8507 | 1 | 0 | 94921 | 69 |
| 15122 | 622713256 | 8527 | 8532 | 0 | 1 | 94924 | 75 |
| 15123 | 622713257 | 38003599 | 8532 | 1 | 1 | 94926 | 87 |
| 15124 | 622713258 | 8527 | 8532 | 0 | 1 | 94932 | 47 |
| 15125 | 622713261 | 8527 | 8507 | 1 | 1 | 94942 | 80 |
| 15126 | 622713264 | 8527 | 8532 | 0 | 1 | 94953 | 53 |
| 15127 | 622713265 | 4218674 | 8532 | 1 | 0 | 94954 | 68 |
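Several rows in the table above have an age of 300. MIMIC-III is understood to shift the dates of birth of patients older than 89 so that their computed age lands around 300, which is likely what these values reflect. The commented-out line in the cell above caps such ages at 90; a small sketch of that capping on a made-up vector:

```julia
# Made-up ages, including two of the inflated values seen in the table.
ages = [80, 66, 300, 74, 307]

# Same expression as the commented-out notebook line: anything above 89 becomes 90.
capped = ifelse.(ages .> 89, 90, ages)
```

Whether to cap, drop, or keep these rows is a modeling decision; leaving them as-is inflates the mean age and stretches the feature range seen by the model.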
[count(ismissing,col) for col in eachcol(df)]
+dropmissing!(df)
+[count(ismissing,col) for col in eachcol(df)]
+5-element Vector{Int64}:
+ 0
+ 0
+ 0
+ 0
+ 0
+eltype.(eachcol(df))
+7-element Vector{DataType}:
+ Int32
+ Int32
+ Int32
+ Int64
+ Int64
+ Int32
+ Int32
+# dropping person_id and subject_id columns because they won't be needed
+df = select(df, Not(:person_id))
+df = select(df, Not(:subject_id))
+
+ArgumentError: column name :subject_id not found in the data frame
+
+Stacktrace:
+ [1] lookupname
+ @ ~/.julia/packages/DataFrames/58MUJ/src/other/index.jl:413 [inlined]
+ [2] getindex
+ @ ~/.julia/packages/DataFrames/58MUJ/src/other/index.jl:422 [inlined]
+ [3] getindex
+ @ ~/.julia/packages/DataFrames/58MUJ/src/other/index.jl:227 [inlined]
+ [4] manipulate(df::DataFrame, c::InvertedIndex{Symbol}; copycols::Bool, keeprows::Bool, renamecols::Bool)
+ @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/abstractdataframe/selection.jl:1836
+ [5] select(df::DataFrame, args::Any; copycols::Bool, renamecols::Bool, threads::Bool)
+ @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/abstractdataframe/selection.jl:1299
+ [6] select(df::DataFrame, args::Any)
+ @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/abstractdataframe/selection.jl:1299
+ [7] top-level scope
+ @ In[8]:3
+eltype.(eachcol(df))
+df.age = convert.(Int64, df.age)
+df.race_concept_id = convert.(Int64, df.race_concept_id)
+df.gender_concept_id = convert.(Int64, df.gender_concept_id)
+df.has_AFib = convert.(Int64, df.has_AFib)
+df.has_stroke = convert.(Int64, df.has_stroke)
+df = select(df, Not(:gender_concept_id))
+df = select(df, Not(:race_concept_id))
+
+ArgumentError: column name :age not found in the data frame
+
+Stacktrace:
+ [1] lookupname
+ @ ~/.julia/packages/DataFrames/58MUJ/src/other/index.jl:413 [inlined]
+ [2] getindex
+ @ ~/.julia/packages/DataFrames/58MUJ/src/other/index.jl:422 [inlined]
+ [3] getindex(df::DataFrame, #unused#::typeof(!), col_ind::Symbol)
+ @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/dataframe/dataframe.jl:557
+ [4] getproperty(df::DataFrame, col_ind::Symbol)
+ @ DataFrames ~/.julia/packages/DataFrames/58MUJ/src/abstractdataframe/abstractdataframe.jl:431
+ [5] top-level scope
+ @ In[10]:2
+
describe(df)
+| Row | variable | mean | min | median | max | nmissing | eltype |
|---|---|---|---|---|---|---|---|
| | Symbol | Float64 | Int64 | Float64 | Int64 | Int64 | DataType |
| 1 | has_AFib | 0.681766 | 0 | 1.0 | 1 | 0 | Int64 |
| 2 | has_stroke | 0.44256 | 0 | 0.0 | 1 | 0 | Int64 |
| 3 | age | 87.8847 | 0 | 74.0 | 307 | 0 | Int64 |
using Pkg
+using StatsBase
+countmap(df.has_stroke)
+Dict{Int64, Int64} with 2 entries:
+ 0 => 2892
+ 1 => 2296
+#using MLJ
+#df, df_test = partition(df, 0.7, rng=123, shuffle= true)
+(3632×3 DataFrame
+  Row │ has_AFib  has_stroke  age
+      │ Int64     Int64       Int64
+──────┼─────────────────────────────
+    1 │        1           0     69
+    2 │        1           0     67
+    3 │        1           0     62
+    4 │        0           1     49
+    5 │        0           1     63
+    6 │        1           0     44
+    7 │        1           0     88
+    8 │        1           0     83
+    9 │        1           0     85
+   10 │        1           0     79
+   11 │        1           0     67
+    ⋮ │    ⋮          ⋮         ⋮
+ 3623 │        1           0     80
+ 3624 │        0           1     81
+ 3625 │        1           1     77
+ 3626 │        0           1     79
+ 3627 │        1           0     73
+ 3628 │        1           0     76
+ 3629 │        1           0     71
+ 3630 │        1           0     88
+ 3631 │        1           0     80
+ 3632 │        1           0     84
+            3611 rows omitted, 1556×3 DataFrame
+  Row │ has_AFib  has_stroke  age
+      │ Int64     Int64       Int64
+──────┼─────────────────────────────
+    1 │        1           0     77
+    2 │        1           1    300
+    3 │        1           0     80
+    4 │        1           0     84
+    5 │        1           0     84
+    6 │        1           0     89
+    7 │        1           0     86
+    8 │        0           1     60
+    9 │        1           1     67
+   10 │        1           0     83
+   11 │        1           0     85
+    ⋮ │    ⋮          ⋮         ⋮
+ 1547 │        1           0     59
+ 1548 │        1           1     71
+ 1549 │        1           0     73
+ 1550 │        1           0     59
+ 1551 │        1           0    300
+ 1552 │        1           1     53
+ 1553 │        1           1     58
+ 1554 │        0           1     74
+ 1555 │        1           0     59
+ 1556 │        1           0     85
+            1535 rows omitted)
+
using MLJ
+using BetaML
+
+# Split the data into training and test sets
+#selected_features = [:gender_concept_id, :age, :race_concept_id, :has_AFib]
+selected_features = [ :age, :has_AFib]
+
+X = select(df, selected_features)
+y = df.has_stroke
+train, test = partition(df, 0.7, rng=123, shuffle=true)
+train_X = select(train, selected_features)
+train_y = train.has_stroke
+test_X = select(test, selected_features)
+test_y = test.has_stroke
+
+models(matching(X, y))
+3-element Vector{NamedTuple{(:name, :package_name, :is_supervised, :abstract_type, :deep_properties, :docstring, :fit_data_scitype, :human_name, :hyperparameter_ranges, :hyperparameter_types, :hyperparameters, :implemented_methods, :inverse_transform_scitype, :is_pure_julia, :is_wrapper, :iteration_parameter, :load_path, :package_license, :package_url, :package_uuid, :predict_scitype, :prediction_type, :reporting_operations, :reports_feature_importances, :supports_class_weights, :supports_online, :supports_training_losses, :supports_weights, :transform_scitype, :input_scitype, :target_scitype, :output_scitype)}}:
+ (name = EvoTreeCount, package_name = EvoTrees, ... )
+ (name = NeuralNetworkClassifier, package_name = BetaML, ... )
+ (name = NeuralNetworkRegressor, package_name = BetaML, ... )
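The cell above splits the data 70/30 with `partition(df, 0.7, rng=123, shuffle=true)`. Conceptually, a shuffled split permutes the row indices and cuts the permutation at the requested fraction. The sketch below illustrates that idea with the `Random` standard library only; `split_indices` is a hypothetical helper and is not MLJ's implementation, so it will not reproduce the exact rows MLJ selects.

```julia
using Random

# Hypothetical helper: shuffle the row indices, then cut at `fraction`.
function split_indices(n::Integer, fraction::Real; rng = MersenneTwister(123))
    idx = shuffle(rng, 1:n)
    n_train = round(Int, fraction * n)
    return idx[1:n_train], idx[n_train+1:end]
end

train_idx, test_idx = split_indices(10, 0.7)
```

Fixing the seed keeps the split reproducible between runs, which matters when comparing models on the same held-out rows.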
+Tree = @load EvoTreeClassifier pkg = EvoTrees
+using EvoTrees
+config = EvoTreeClassifier(
+ loss=:mse,
+ nrounds=100,
+ max_depth=6,
+ nbins=32,
+ eta=0.1)
+import EvoTrees ✔ ++
[ Info: For silent loading, specify `verbosity=0`. ++
EvoTreeClassifier(
+  nrounds = 100,
+  lambda = 0.0,
+  gamma = 0.0,
+  eta = 0.1,
+  max_depth = 6,
+  min_weight = 1.0,
+  rowsample = 1.0,
+  colsample = 1.0,
+  nbins = 32,
+  alpha = 0.5,
+  tree_type = "binary",
+  rng = Random.MersenneTwister(123))
+
train_X = Matrix(train_X)
+m = fit_evotree(config; x_train = train_X, y_train = train_y)
+┌ Info: EvoTreeClassifier{EvoTrees.MLogLoss}
+│  - nrounds: 100
+│  - lambda: 0.0
+│  - gamma: 0.0
+│  - eta: 0.1
+│  - max_depth: 6
+│  - min_weight: 1.0
+│  - rowsample: 1.0
+│  - colsample: 1.0
+│  - nbins: 32
+│  - alpha: 0.5
+│  - tree_type: binary
+└  - rng: Random.MersenneTwister(123, (0, 4008, 3006, 626))
+
EvoTree{EvoTrees.MLogLoss, 2}
+ - Contains 101 trees in field `trees` (incl. 1 bias tree).
+ - Data input has 2 features.
+ - [:target_levels, :fnames, :feattypes, :edges, :featbins] info accessible in field `info`
+
+test_X = Matrix(test_X)
+preds = MLJ.predict(m, test_X)
+1090×2 Matrix{Float32}:
+ 0.800264 0.199736
+ 0.814238 0.185762
+ 1.36711f-5 0.999986
+ 0.847743 0.152257
+ 0.827981 0.172019
+ 1.36711f-5 0.999986
+ 0.741776 0.258224
+ 0.822133 0.177867
+ 1.36711f-5 0.999986
+ 0.847743 0.152257
+ 1.36711f-5 0.999986
+ 0.800264 0.199736
+ 1.36711f-5 0.999986
+ ⋮
+ 1.36711f-5 0.999986
+ 0.797935 0.202065
+ 0.735359 0.264641
+ 0.735655 0.264345
+ 0.741776 0.258224
+ 0.860911 0.139089
+ 0.908374 0.0916265
+ 1.36711f-5 0.999986
+ 0.908374 0.0916265
+ 0.826605 0.173395
+ 1.36711f-5 0.999986
+ 1.36711f-5 0.999986
+features_gain = EvoTrees.importance(m)
+2-element Vector{Pair{String, Float64}}:
+ "feat_2" => 0.9758581174651332
+ "feat_1" => 0.024141882534866845
+using Plots
+plot(preds)
+# Threshold the second column at 0.5 to turn probabilities into 0/1 labels
+preds[:, 2] = [x < 0.5 ? 0 : 1 for x in preds[:, 2]];
+preds
+1090×2 Matrix{Float32}:
+ 0.800264 0.0
+ 0.814238 0.0
+ 1.36711f-5 1.0
+ 0.847743 0.0
+ 0.827981 0.0
+ 1.36711f-5 1.0
+ 0.741776 0.0
+ 0.822133 0.0
+ 1.36711f-5 1.0
+ 0.847743 0.0
+ 1.36711f-5 1.0
+ 0.800264 0.0
+ 1.36711f-5 1.0
+ ⋮
+ 1.36711f-5 1.0
+ 0.797935 0.0
+ 0.735359 0.0
+ 0.735655 0.0
+ 0.741776 0.0
+ 0.860911 0.0
+ 0.908374 0.0
+ 1.36711f-5 1.0
+ 0.908374 0.0
+ 0.826605 0.0
+ 1.36711f-5 1.0
+ 1.36711f-5 1.0
+pred = convert.(Int64, preds[:,2])
+prediction_df = DataFrame(y_actual = test_y, y_predicted = pred, prob_predicted = preds[:,1]);
+prediction_df.correctly_classified = prediction_df.y_actual .== prediction_df.y_predicted
+1090-element BitVector:
+ 1
+ 1
+ 1
+ 1
+ 0
+ 1
+ 0
+ 0
+ 1
+ 1
+ 1
+ 1
+ 1
+ ⋮
+ 1
+ 1
+ 1
+ 1
+ 1
+ 1
+ 1
+ 1
+ 1
+ 1
+ 1
+ 1
+
accuracy = mean(prediction_df.correctly_classified)
+print("Accuracy of the model is : ",accuracy)
+Accuracy of the model is : 0.865137614678899+
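Accuracy alone can hide how a classifier treats each class, especially when the classes are not evenly balanced. As a hedged sketch, the same 0.5 threshold used above can feed a confusion-matrix count from which precision and recall follow. The labels and probabilities below are made up and stand in for `test_y` and the second column of `preds`; the variable names are illustrative only.

```julia
# Made-up labels and class-1 probabilities for illustration.
y_actual = [1, 0, 1, 0, 1, 0]
prob_stroke = [0.9, 0.2, 0.4, 0.7, 0.8, 0.1]

# Same rule as the notebook: below 0.5 is class 0, otherwise class 1.
y_predicted = Int.(prob_stroke .>= 0.5)

# Confusion-matrix cells.
tp = count((y_actual .== 1) .& (y_predicted .== 1))
fp = count((y_actual .== 0) .& (y_predicted .== 1))
fn = count((y_actual .== 1) .& (y_predicted .== 0))
tn = count((y_actual .== 0) .& (y_predicted .== 0))

acc = (tp + tn) / length(y_actual)
precision_score = tp / (tp + fp)   # share of predicted strokes that were real
recall_score = tp / (tp + fn)      # share of real strokes that were found
```

For a clinical outcome such as stroke, recall is often the more important of the two, since a missed case is usually costlier than a false alarm.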